Race Conditions Can Be Useful for Parallelism(shwestrick.github.io) |
Race Conditions Can Be Useful for Parallelism(shwestrick.github.io) |
[...] the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events
If we take the first example - Parallel BFS - the correctness of the output could be considered "system's substantive behavior". Properly implemented atomic checks (as demonstrated) would still guarantee to lead to correct output in all possible combinations of events. Therefore, the system's "substantive behavior" is not dependent on the sequence or timing of other uncontrollable events. Therefore, there is no "race condition" involved.Of course, the term "race condition" here is taken colloquially, for the sake of familiarity to the reader - the article has correctly recognized that the appropriate term for this kind of behavior is "non-determinism".
> A race condition occurs in a parallel program execution when two or more threads access a common resource, e.g., a variable in shared memory, and the order of the accesses depends on the timing, i.e., the progress of individual threads.
> The disposition for a race condition is in the parallel program. In different executions of the same program on the same input access events that constitute a race can occur in different order, which may but does not generally result in different program behaviors (non-determinacy).
Sometimes you deliberately program a full-on data race (which isn't a bug by definition, as the article says) for performance reasons.
> Data races that are not manifestations of program bugs are called benign data races.
No, in overwhelmingly common usage, programmers use the term "race condition" as a category of software bug. We mean it in the correctness sense, not the one used in the linked article nor your reference. You'd be met with some very weird stares if you tried to explain how arbitrary SMP ordering of log entries or whatever was a "race condition".
Dijkstra’s guarded commands don’t specify an order for the conditional. The semantics is that the process is free to execute any one of the cases that has a true guard. Nevertheless he found them useful for developing and describing many algorithms.
Is this the paper you're referring to? If not, could you please provide a reference to which PADUA you're referring to? I'd really like to read more on the subject, especially if the source is, as you claim, an industry reference.
Using the term "race condition" in context of correct programs would make it cover exactly the same universe of programs as the term "non-determinism". I that think the distinction, however trivial (race condition = incorrect behavior, non-determinism = correct behavior), is still useful.
Great article, by the way. I did not mean to criticize it in any way. My "slight nitpick" about meaning of the words is really just that - a nitpick :)
Race conditions are hard enough to explain to people and misusing the term just makes it more difficult.
The definition you quote has no linked citation on Wikipedia. Usually a good sign that you should not treat those statements as definitive. A good Wikipedia article should not state any "facts" without a direct means of verification. Otherwise it's considered "original research" and against the wiki policy for a high quality article.
https://en.m.wikipedia.org/wiki/Wikipedia:No_original_resear...
Searching the internet for "race condition definition" and taking the top few results brings several definitions that all agree in spirit with the Wikipedia one (see below).
If you know of any more reliable source that doesn't agree with Wikipedia on the definition of "race condition", please post it here. This is a honest request - I am always grateful to those who correct my mistakes (in good faith).
wordnik [0]: A flaw in a system or process whereby the output or result is unexpectedly and critically dependent on the sequence or timing of other events.
techtarget [1]: A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly.
techterms [2]: A race condition occurs when a software program depends on the timing of one or more processes to function correctly.
javatpoint [3]: When the output of the system or program depends on the sequence or timing of other uncontrolled events, this condition is called Race Condition.
technopedia [4]: A race condition is a behavior which occurs in software applications or electronic systems, such as logic systems, where the output is dependent on the timing or sequence of other uncontrollable events.
[0] https://www.wordnik.com/words/race%20condition[1] https://www.techtarget.com/searchstorage/definition/race-con...
[2] https://techterms.com/definition/race_condition
[3] https://www.javatpoint.com/what-is-race-condition
[4] https://www.techopedia.com/definition/10313/race-condition
One notable exception is the Racy Single-Check Idiom: http://javaagile.blogspot.com/2013/05/the-racy-single-check-...
It is particularly suitable for lazy initialization in code that is typically (but not necessarily) executed single-threaded, and is famously used in Java’s String.hashCode() implementation.
I would hope that the primary takeaway from this post is that race conditions are not necessarily bugs. Race conditions are not necessarily something that need to be "fixed".
EDIT: and don't get me started on tensor cores and clever tricks to have them do 'fp32-alike' accuracy. Yes, wonderful magic but how do you reason about these new objects without a whole new slew of tools.
For example, this project https://github.com/EmbarkStudios/texture-synthesis generates textures and if you run the same code with the same input various times, the results will be slightly different. Here https://github.com/EmbarkStudios/texture-synthesis#notes it says: "When using multiple threads for generation, the output image is not guaranteed to be deterministic with the same inputs. To have 100% determinism, you must use a thread count of one"
Anyway, as time passes by I veer off equality and think about the actual necessary accuracy and wish there was a way to set it as a spec for proof (SPARK/Ada or a higher level DSL that can be lowered to proper accuracy analysis tools...
I wish I could also specify 'no NaNs please' as a postcondition. Need to check in with the SPARK team and get an introduction article going...
When talking about this kind of stuff to people who are unfamiliar with, say, lock-freedom, I've found that "non-determinism" is too vague --- people start thinking about things like randomness, or user interaction, etc. In contrast, the term "race condition" seems to hit the nail on the head.
But certainly, "race condition" also carries with it a bit of baggage ;)
The race exists because there are multiple accesses. This is resolved when there is a protocol for deciding who proceeds and who waits for the other.
Other commenters have already covered that. The links you shared aren't good primary sources, and probably took their definition from Wikipedia itself, creating a circular reference issue.
I remember learning about race conditions in college, and they were always mentioned in the context of bugs - that's why I took Wikipedia itself for granted, as its definition fit my current understanding of the word.
It seems to be a case of popular usage of the word differing from the original. It also raises a question of whether or not the original intended meaning is the authority, if the majority of programmers use it in a different way. Who should, in general, be the authority on the meaning of a word? But that's more of a philosophical question, I suppose.
The catch-fire terminology comes from the analogy that, as soon as a data race occurs, the semantics of the program completely explodes, and all guarantees are lost---the program is then allowed to do literally anything. This is sometimes known as "DRF-SC or catch fire": either the program is data-race-free (and therefore its executions are sequentially consistent), or the program has undefined behavior.
Infamously, the C memory model has the catch-fire problem. And therefore, any language which relies on the C memory model can catch-fire. As of today, I believe this includes C/C++, Swift, Rust, and probably a few others.
Tear-free does not imply no out-of-thin-air. But, afaik, the java memory model protects from both tearing and oota.
This was exactly my point.
race condition = incorrect behavior
non-determinism = correct behavior
Language is tricky sometimes.There are plenty of examples of entirely deterministic parallel models. Fork-join for one.
I mean, yes, what you say is true, but it just amounts to saying "synchronization is a solvable problem". At their core, ALL synchronization paradigms (spin polling, interrupt masking, OS-managed process suspend, hardware memory barriers, weird lockless tricks like Dekker's algorithm, you name it) can be understood to be ways of enforcing "para-determinism" on environments that don't provide it.
In SMP, things don't happen in reliable orders. They don't. They never will. They can't. But your hardware and OS platforms are smart and provide trickery so that at least you can guarantee that "some" things happen in reliable orders. And then you construct software such that correct behavior is dependent only on that subset of ordering and not everything.
Because there is no determinism in SMP, only that which you construct.
Your processor executes a single thread of instructions also in a non-deterministic order, based on complex internal state. We'd never say it was non-deterministic, as you can't detect it.
Moreover, polite explanations may bring enlightenment, but aggressive scoldings are almost guaranteed stop people from accepting your words, even if they're true, because in order to accept their truth, they have to accept the unpleasant implications of your words.
I think learning should generally be as pleasant as possible. Putting the work in is hard enough as it is.
If you have any relevant information, would you care to elaborate more on why is it considered industry standard and how did it gain such status?
This is the first time I'm hearing of both the author and the book (which probably says more about me), and it seems kind of odd that the definition differs from what is usually taught in class (at least mine). Of course, it wouldn't be the first time that popular use of some word differs from its original intended meaning, but I'm just interested in the wider historical context.
Thanks again.
https://cs.illinois.edu/about/people/faculty/padua
> David Padua has served as program committee member, program chair, or general chair to more than 70 conferences and workshops. He was the Editor-in-Chief of Springer‐Verlag’s Encyclopedia of Parallel Computing and is currently a member of the editorial board of the Communications of the ACM, the Journal of Parallel and Distributed Computing, and the International Journal of Parallel Programming.
Not directed at you specifically, but just a reminder for anyone who hasn’t been burned by it yet: floating point addition is not associative. (a+b)+c != a+(b+c). It’s close, but if you’re not careful you can get bad results where the accumulated small errors turn into wrong answers.
In fact the whole concept of "symmetric multiprocessing" demands it.
Of course if the result is non-deterministic it doesn't satisfy those conditions.
Doubly so if you must guarantee determinism across multiple platforms! IEEE 754-2008 helps but it defines cross-platform determinism for just a subset of operations. Compilers can also sometimes botch your FP code (there's a number of gotchas - for example, if any library anywhere in your program uses -ffast-math, it may infect the whole program https://stackoverflow.com/questions/68938175/what-happens-if...)
You can look at the crlibm papers for example.
And FWIW: CPU-internal instruction reordering is observable too, though the details there get complicated. x86 hides (almost!) all the complexity from you, but ARM's OOO is leakier and requires careful attention to memory barriers.
And also, tell that to the people that went from a compiler using x87 instructions to one using SSE instructions and between two binaries from the same code get different results. Yes, the exact same suite of FP instructions should always give the same results. And that's also supposing you're not loading some library that sets ugly fast-maths flags (see the recent yak-shaving session by @moyix).
I wish we had a language that guaranteed that the results of a computation were deterministic, all the while it properly enabled the use of all available hardware resources (so: using SIMD, all CPU cores, and also offloading some code to the a GPU if available), even if it had some overhead. Doing this manually is ridiculously difficult if you want to write high performance software, specially if you use GPUs.
[*] See https://stackoverflow.com/questions/42181795/is-ieee-754-200... - the amazing Rapier physics engine https://rapier.rs/ leverages IEEE 754-2008 to have a cross-platform deterministic mode that will run physics exactly the same way in every supported platform https://rapier.rs/docs/user_guides/rust/determinism/ - but this means taking a huge performance hit: you can't use SIMD and you must run the physics on a single thread.
Floating-point is not associative. Reordering operations yields different results, so no compiler will do so, unless you specifically disable standards conformance.
The use of SIMD, which is just a type of instruction-level parallelism, has no effect on the result of floating-point operations, unless of course you reorder your operations so that they may be parallelized.
What does affect the result of floating-point operations is when rounding happens and at what precision. If we're talking about C, the compiler is allowed to run intermediate operations with higher precision than that mandated by its type. This is merely so that it can use x87 which is 96-bit long by default and only round when it spills to memory and needs to store a 64-bit or 32-bit value. Compilers have flags to disable that behaviour, and it doesn't apply when the SSE unit instead of x87 is used. Using SSE for floating-point doesn't necessarily mean it's using SIMD, most of the instructions have scalar variants.
Another example is FMA, which might be substituted for any multiply+add operations.
In practice if your code breaks with this it just means it was incorrect in the first place.
[1]: https://randomascii.wordpress.com/2012/03/21/intermediate-fl...
Also, that kind of language is absolutely not warranted.
If you want pleasant learning on the Internet, don't "learn by teaching" through blogs that mislead other learners, even if only in terminology use.
People will tear that apart.
Regardless, that's still one more reason not to be unpleasant to people when correcting them - you might just find yourself in the wrong.
I'll also point out that fma is relatively new, so it's pretty easy to write code that works fine when compiled with default x86_64/SSE2 but will break when compiled for a more recent target cpu.
The compiler may use increased precision for intermediate computations. That means sometimes it will, sometimes it won't. If you understand the basics of the situations where it will do so, you can see it depends on register allocation, which of course not only depends on optimization level, but also can change anytime you change anything at all in the source code.