[LinuxPPS] [PATCHv3 03/16] pps: fix race in PPS_FETCH handler
Alexander Gordeev
lasaine at lvk.cs.msu.su
Mon Aug 9 12:29:41 CEST 2010
On Fri, 6 Aug 2010 09:30:57 -0700 (PDT)
tlhackque <tlhackque at yahoo.com> wrote:
> > > Yes, it will freeze the fds (if they don't use timeouts). But in
> > > normal circumstances, i.e. when pps_event is called twice a second,
> > > it will overflow after ~68 years of uninterrupted work. Well, it's
> > > the same kind of problem as an overflow of struct timespec. I
> > > thought it's not actually a problem. Should I use u64 instead of
> > > unsigned int or add a runtime check somewhere?
> >
> > If we're using 1PPS it's ~68 years, but someone is trying 5PPS now
> > (it would overflow in ~13.6 years) - what if someone tries e.g. 100PPS?
> > It's not the same as overflow of struct timespec! I think it deserves
> > some treatment.
>
> I don't like this approach in any code. There is no reason to write code that
> isn't robust in the face of overflow.
>
> Two alternatives:
> - if all you care about is that there's a change, use a comparison for !=
>
> - If you really need less than, do a modulo compare (There's reasonably
> efficient code for this, see any network stack's sequence number comparisons.)
>
>
> In either case, the width of the counter needs to be how many unrecognized
> events you can have (maybe 2x for a cheap modulo compare), not some length of
> time before the system hangs. This will be much, much less than 64 bits.
>
> Every time someone thinks that their length of time is acceptable, it bites
> someone else later. Technology changes. Or your code gets sent on an
> interstellar mission that really is expected to run 120 years :-)
>
> Seriously, I've seen these kinds of counters break in all kinds of embedded
> systems - and there's no reason for it. Should I tell the story about the
> mainframe that crashed reproducibly after about 6 months of uptime because
> everyone knew that a 32-bit uptime counter used to manage timeouts would NEVER
> overflow in a disk controller? No controller ever went that long without being
> reset...until it became the least reliable component in the system.
Ok, you're absolutely right, thanks for the review!
--
Alexander
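
For reference, the two alternatives suggested in the quoted review can be
sketched roughly as below. This is only an illustration, not the code from
the patch: the helper names seq_changed() and seq_before() are hypothetical,
and a 32-bit event counter is assumed. The modulo compare is the usual
sequence-number trick: it gives the right answer across a wrap as long as
the two values are less than 2^31 events apart.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Alternative 1 (hypothetical helper): only ask whether the counter
 * moved at all. Inequality is unaffected by wraparound.
 */
static bool seq_changed(uint32_t last_seen, uint32_t current)
{
	return last_seen != current;
}

/*
 * Alternative 2 (hypothetical helper): modulo "less than".
 * The subtraction is done in unsigned arithmetic, so it wraps modulo
 * 2^32; the difference is then reinterpreted as signed. The result is
 * true when a is behind b by fewer than 2^31 events.
 * Assumption: converting an out-of-range value to int32_t wraps, as it
 * does on common two's-complement compilers.
 */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}
```

With these, a waiter that remembers the last sequence number it saw can
test seq_changed(last, current) or seq_before(last, current), and the test
keeps working after the counter wraps.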