A Close Look at a Spinlock(blog.regehr.org) |
A Close Look at a Spinlock(blog.regehr.org) |
When I asked why the rationale was something like, "If a spinlock is ever held more than such and so number of microseconds, it's an error that it's a spinlock and actually it should be a mutex." Instead of doing degenerate behavior in this case, they just made the spinlock actually act like a mutex.
And once this happens, you now have a descheduled thread that owns the lock and a spinning thread waiting to acquire it, which is a recipe for priority inversion. As far as the kernel can tell, it’s the spinning thread that’s doing some important work so it has no reason to preempt it and schedule the waiting thread, which is exactly the wrong thing to do in this situation.
As the term is pervasively used in industry, "spinlocks" always imply locking against other physical hardware, and that can't be done without spinning (I mean, you could fall back to blocking, but then the blocking implementation in the kernel would have to lock its own data structures to effect the scheduling; it's a chicken and egg problem). They usually, but not quite always, imply protection against asynchronous interrupts or signals too.
[1] https://www.kernel.org/doc/html/latest/locking/locktypes.htm...
[2] https://github.com/apple/darwin-libplatform/blob/main/src/os...
In kernel-level code, for areas of the kernel that must use real spinlocks, they do not fallback to the equivalent of a Futex because that would mean the current thread of execution would be swapped out and kernel-level spinlocks are designed for critical sections that cannot be swapped out. In critical sections of the kernel that can be swapped out, you'd use a Mutex which does the equivalent of a user-level Futex (and swap out) when the Mutex is contended.
If you're facing such low-latency constraints that you need a spinlock, you wouldn't want the thread to sleep, instead you'd pin the core and have the dedicated core constantly spinning. That's how it's done in HFT. You can use a fancy network card like SolarFlare with efvi_drivers to bypass the kernel completely for network IO (spin waiting on the network card). Basically you need to treat the kernel like it's a paedophile trying to molest your children, and do everything you can to stop it touching your low-latency threads.
That could be a wait queue, or it could be keeping track of what lock (if any) a thread is blocked on and iterating all threads to see if anything is blocked (this is relatively easy to implement, but not great if you have many threads), or ???. If you're running SMP or a re-entrant kernel, you need to be careful about how you do your checks so you don't have race conditions; you don't want a thread blocked on an unlocked lock.
Or in embedded systems where code is running at supervisor level, regions where an interrupt should be deferred, thereby allowing interrupt-disable and interrupt-enable instructions to be avoided.
An alternative which is reasonably fast on modern CPUs is to have a spinlock implementation in the kernel VDSO (or equivalent).
Something a bit like that was done in uCLinux for old ARMs, except instead of a spinlock, they put atomic compare-and-swap in the VDSO, as the CPU didn't have an instruction for that. The kernel checked the userspace address on interrupts, and altered the PC register to redo the compare if the sequence was interrupted.
So that didn't extend the timeslice to allow the operation to complete. Instead it unwound the operation to ensure correct semantics. With a spinlock/mutex combination, cutting short the spinning phase when the task is descheduled by the kernel would be similar, and would allow the "normal" spin count to be unlimited.
B) The problem is the descheduled thread which has the lock and isn't in the lock sequence. Simply killing the time slice of the spinning thread without knowing what it's waiting on may lead to even worse behavior.
The answer here is simple: just don't use pure spinlocks if you can be preempted.
Did I claim it was...?
> Simply killing the time slice of the spinning thread without knowing what it's waiting on may lead to even worse behavior.
It's not obvious to me how likely this is compared to the other case, but if you're trying to make a case for why a kernel doing that would result in worse behavior in typical cases, it would probably help to explain this.
> The answer here is simple: just don't use pure spinlocks if you can be preempted.
I don't get why you're arguing with me on this. I wasn't telling anyone to use spinlocks, nor claiming this is the one and only problem with spinlocks. I was just saying a kernel could be a little smarter about a particular case by examining the instruction sequence.