[LinuxPPS] [PATCH 02/11] Streamline interrupt handling; get rid of a lock and idr_find()
Rodolfo Giometti
giometti at enneenne.com
Tue Feb 10 17:51:48 CET 2009
On Tue, Feb 10, 2009 at 11:03:29AM -0500, George Spelvin wrote:
> Rodolfo Giometti <giometti at enneenne.com> wrote:
> > No "reasonable contracts" are allowed here, you _must_ provide bugs
> > free code! If pps_unregister_source() should remove a PPS source from
> > the system, it must be do it under any circumstance.
>
> Could you try to rephrase this? The grammar is sufficiently confusing that
> I can't figure out what you're trying to say.
> "It must be do it" makes no sense to me.
Sorry, my english is not good. :)
See below. I hope I can explain it better.
> pps_unregister_source will unregister a source as long as it exists.
> Exists is defined as has been registered and has not yet been
> unregistered.
>
> Once you have unregistered it, it no longer exists; you may no longer
> refer to it in any way.
This is not completely true... if you take a look at older LinuxPPS
code and exchanged e-mails between me and kernel gurus you can see
that in the past pps_event() was called inside the serial port IRQ
handler (using ldisc is just the latest news). This fact (which is
related to that special implementation) was discussed a lot and kernel
gurus asked to me to provide a core code which can get ride of these
particular cases. So it's normal that someone may call
pps_unregister_source() while pps_events() executes, and only the last
one who uses the resource can destroy it is usage drop to 0.
Hope I was clear now.
>
> >> widely used in multiple bits of the kernel. the unregister function
> >> frees the pointer, so you shouldn't have anything referencing that
> >> pointer still running.
> >
> > This is exacly what current implementation is trying to do. :)
> >
> > The problem here is that pps_event() can be called while
> > pps_unregister_source() executes and viceversa so your code must
> > prevent crashes due this fact.
>
> That is not possible. Period.
That's is _not_ true.
>
> Example:
> - pps_tty_close() -> pps_unregister_source()
> - Module is unloaded (because the reference count is now 0)
> - pps_dcd_change() -> pps_event()
Again: this is the _current_ implementation but it cannot be the only
one! See the code _before_ ldisc support and you can understand
why. Since ldisc solution is not accepted yet we must provide a
LinuxPPS core that can support concourrency between pps_event() and
pps_register/unregister_source().
>
> Just what the heck is a module supposed to do about that?
>
> If you think this kind of abuse is legal, could yiu specify EXACTLY what
> kind of behavior is acceptable and what is not? For example, here are
> possible ways to abuse the functions exported in <linux/pps.h>:
>
> struct pps_device *pps_get_source(int source);
> - Source values that no longer exist?
> - Source values that never did exist?
> extern void pps_put_source(struct pps_device *pps);
> - pps pointers that have already been put
> - pps pointers that are completely made up
> - pps pointers that are unaligned
> - pps pointers that are NULL
> - pps pointers that are to I/O registers
> extern int pps_register_source(struct pps_source_info *info,
> int default_params);
> - info->name contains control characters
> - info->name not null-terminated
> - info->path not null-terminated
> - info->echo points to hyperspace
> - info->owner invalid
> - (info->mode & PPS_CAPTUREBOTH) == 0
> extern void pps_unregister_source(int source);
> - Called with a source still in active use
> - Called with a source already unregistered
> - Called with a completely imaginary source
> extern int pps_register_cdev(struct pps_device *pps);
> - Called multiple times on the same pps
> - Called after the source is already unregistered
> - Called with invalid pps pointers
> extern void pps_unregister_cdev(struct pps_device *pps);
> - Called on a pps that was never registered as a cdev
> - Called multiple times on the same pps
> - Called with invalid pps pointers
> extern void pps_event(int source, struct pps_ktime *ts, int event, void *data);
> - Called with source values that no longer exist
> - Called with source values that never did exist
> - Called with invalid pps_ktime pointers
> - Called with pps_ktime values that are non-canonical
> (nsec negative or > 999999999)
> - Called with event == 0
>
> Which of those are legal and which are not?
>
> >> Kind of like you shouldn't free_pages() until the DMA is complete.
> >>
> >> I agree that using an index looked up in an IDR makes the code crash-proof
> >> (you can always detect invalidity of the index), but there are ways of
> >> doing that with pointers, too.
> >
> > Good! Let me see how, I don't think your code is doing it.
>
> No, it's not, because it's slow and annoying and not worth the
> bother.
I don't understand you... =:-o
>
> >> More to the point, using an index looked up in an IDR does NOT
> >> make the code bug-free. Consider the following hypothetical code
> >> that calls pps_unregister_source in parallel with pps_event:
> >>
> >>
> >> CPU0 CPU1 CPU2
> >> call pps_register_source
> >> allocate PPS struct
> >> call pps_event
> >> call pps_unregister_source
> >> lock IDR
> >> lock IDR...(wait)
> >> lock IDR...(wait)
> >> idr_find(/dev/pps0)
> >> decrement usage to 0
> >> remove pps from IDR
> >> unlock IDR
> >> deallocate PPS
> >> ... lock acquired
> >> Allocate ID 0
> >> unlock IDR
> >> ...lock acquired
> >> idr_find(/dev/pps0)
> >> Get unexpected PPS device!
> >> pps->usage++
> >> unlock IDR
> >> Do things to wrong PPS device
> >
> > No, it gets _exactly_ the right device, that is pps0. It's your
> > businness to inform CPU0 on which PPS source ID it must call
> > pss_event(). The core is doing _exactly_ what its documentation
> > states.
>
> CPU 2 would disagree. It registered a device, and never sent it any events,
> yet that device is reporting that events have arrived.
>
> That sure looks like a bug to me.
Not to me, since this is currenlty fully functional.
>
> If you're allowed to say "it's not my problem because it's documented",
> then why is this okay, and crashing due to a stale pointer not okay?
Because crashes are not recoverable but you can provide a good way to
suggest to pps_event() which source should be updated.
> We can document *that*, instead!
It's already documented.
>
> Then, just like this, the unwanted behavior is not a bug in the
> PPS core, it's a bug in the code that called it.
Which bug are you talking about? If each clients use the exported
methods everything works well.
>
> >> Now, the current pps_register_source code actually delays initializing
> >> the pps struct until *after* idr_get_new, so the CPU0 pps_event()
> >> call can crash on an uninitialized structure, but even if that's fixed
> >> (which isn't too hard), you still have the pps_event() calling idr_find
> >
> > Yes, you find a bug! But I can fix it simply initializing the struct
> > before calling idr_get_new(). See the following modifications:
> >
> > diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
> > index 917b379..e0b75ed 100644
>
> Still buggy. There is a brief window during which pps->id is not set
> correctly. (See my patch 03/11 for how to close the window completely.)
Why? It's protected by pps_idr_lock, isn't it? =:-o
>
> >> and getting the WRONG pps device.
> >
> > This is NOT a core problem, it DOES exactly what is expected to do,
> > that is updating the PPS source 0's event data.
>
> You're defining it as someone else's prolem. That doesn't fix the
> problem, which is that a PPS device is getting events that it doesn't
> want.
No, if the client correctly set up PPS IDs.
>
> If the user of the core does that, buggy behavior (of the whole *system*)
> results. The user of the core must not do that if they don't want to
> cause bugs.
>
> >> The pps-ldisc client relies on the guarantees that the ldisc layer
> >> makes. Your code does that already. Suppose that pps_tty_dcd_change
> >> could overlap with pps_tty_close. Well, after pps_tty_close comes
> >> tty_kref_put, which calls release_one_tty, which calls free_tty_struct(tty);
> >>
> >> CPU 0 CPU 1
> >>
> >> pps_tty_dcd_change(tty)
> >> pps_tty_close(tty)
> >> tty_kref_put(tty)
> >> release_one_tty(tty)
> >> free_tty_struct(tty)
> >> kfree(tty)
> >> Fetch tty->id: SEGFAULT!
> >>
> >> (This doesn't happen because the code in tty_release_dev that calls
> >> all of this takes care to ensure that the ldisc is *fully* shut down
> >> before getting on to tty_kref_put().)
> >
> > Ok, I don't know all possible solutions, what I know is that
> > __if_we_want_kernel_inclusion__ we have to provide a PPS core with a
> > well defined API who does _exactly_ what stated into its documentation
> > and not related with any particular client implementation.
>
> So document the fact that any reference to a PPS device after
> pps_unregister_device is a bug that will make demons fly out of your
> nose.
> ("Nasal demons" are from http://www.catb.org/jargon/html/N/nasal-demons.html)
>
> It will be just like the thousand other acquire/release primitives inside
> the kernel. I could start listing them, if you'd like:
> - kmalloc/kfree
> - fget/fput
> - get_unused_fd/put_unused_fd
> - get_page/put_page
> - device_create/device_destroy
> - tty_kref_get/tty_kref_put
> - tty_ldisc_ref/tty_ldisc_deref
>
> I could go on for quite a while. In every single case, you have to not
> have any operations im progress on the released object when you release
> it, or boom.
LinuxPPS's pps_register_source(), pps_unregister_source() and
pps_event() have different features due PPS nature, that is
pps_event() __can_be_called_while_the_others_execute__.
>
> That's the way the kernel works. That's the way most software works,
> in the absence of garbage collection.
>
> Linux doesn't believe in slowing the system down to catch every
> conceivable kernel programmer error.
Currently LinuxPPS implementation needs the above requirement. After
kernel inclusion and _____after_____ serial _____and_____ parallel
client inclusion we can discuss about relaxing some requirements due
specific clients implementation.
If you can provide better/faster/cleaner code that has the above
requirement I'll be happy to accept it, otherwise I'm sorry but I have
to refuse it.
Ciao,
Rodolfo
--
GNU/Linux Solutions e-mail: giometti at enneenne.com
Linux Device Driver giometti at linux.it
Embedded Systems phone: +39 349 2432127
UNIX programming skype: rodolfo.giometti
More information about the LinuxPPS
mailing list