Teaching Concurrency (2009) [pdf](research.microsoft.com) |
Teaching Concurrency (2009) [pdf](research.microsoft.com) |
Concurrency can result in increased maintenance costs and complexity.
Concurrency is also not more efficient on a single core.
Concurrency can help with latency and response time.
In embedded systems in particular, there is an over-use of concurrency which often results in bloated, complex code.
Concurrency can be more efficient even on a single core. When blocking synchronous I/O is involved, concurrency may help saturate the bandwidth with multiple in-flight requests.
Otherwise, no.
(and in general, people just throw concurrency at the problem instead of analysing whether they are in fact I/O bound or CPU bound).
(Downvoted why??)
Concurrency is usually not a feature of the algorithm but a feature of the problem domain. If you have many requests coming in at the same time, all competing for a limited amount of computational resources -- you have concurrency.
How you handle it is a different matter. You can decide to handle the requests one by one, but then, by Little's law, your throughput would be very low, and your server will crash if the rate of requests is over some small limit (which depends on the time it takes to service each request).
Concurrency improves performance when a process accesses both the same computational resource and some other high-latency sharable resource.
But in fact concurrency is inevitable in absence of OS threads that can be blocked (another potential clusterfuck) or of some form of continuations support, because it is a direct consequence of asynchrony.
And asynchrony isn't avoidable, all you can do is to find abstractions that make it more deterministic.
We are all taught from uni about how concurrency is implemented and most applications use concurrency as a design-tool to decompose a problem into its functions.
Unfortunately, this tends to produce a sub-optimal result as the inefficiencies become visible on small embedded/real-time systems.
Its difficult to give any general advice, but have a look at real-time analysis to get an idea of the real issues.... and don't blindly throw tasks at a problem when a simple superloop is more efficient/simple/maintainable.
This isn't a hard and fast rule. There is overhead to parallelism.
For I/O bound code, concurrency can give you some benefit.
...but in general, people do not analyse this before throwing tasks at a problem.
When I grew up to be a programmer, never had much problems with concurrent stuff.
IMO designing concurrent programs is conceptually similar to building complex high-throughput low latency railway networks in the game.
You can serialize the transitions if you want, but that usually costs performance. This is especially true for distributed parallel computing, where the state is also distributed.
This is our partial correctness property
PartialCorrectness ≜ AllDone ⇒ ∃ p ∈ ProcSet : y[p] = 1
And this is the inductive invariant: Inv ≜ PartialCorrectness
∧ ∀ p ∈ ProcSet : pc[p] ≠ "Line1" ⇒ x[p] = 1
We need to show that Inv implies PartialCorrectness (trivial), that Inv holds in the initial state, and that if it holds in any state s, then it holds in any possible consecutive step s'. It's easy to see that it holds in the initial state. Now, let's assume it holds in s, and prove for s'. To make this transition, some process p either executes line 1 or executes line 2. If it executes 1, then PartialCorrectness doesn't change because no new process is done. The second conjunct holds because we've just left line 1 and x has been assigned (by the definition of line 1). If we are currently in line 2, the second conjunct of the invariant doesn't change. By the definition of this action, we'll be done. Here we have two cases, either we set y to 1, or we set y to zero. If we set y to 1, we're done and PartialCorrectness holds. If we set y to 0 then by the assumption of the invariant, the process we depend on must not be done, hence AllDone is false, and PartialCorrectness holds. QED.When you combine several sequences of state transitions (one sequence per thread), you don’t get another sequence of state transitions. When two CPU cores perform two state transitions at the same time, you typically cannot determine whichever of those transitions happened first.
We might have different definitions what’s sequence. I mean this: https://en.wikipedia.org/wiki/Sequence As you see, global order is required for bunch of elements (in this case, state transitions) to form a sequence. And in concurrent and especially parallel programming, there’s no global order for those transitions. Hence, those transitions don’t form a sequence.
A transition is not one machine head changing its position and state. {head0loc=0, head0state=2}->{head0loc=1, head0state=2} is not a transition because that does not identify a pair of whole-system states.
No it’s not. If the heads aren’t moving in lockstep, what actually happens is you’ll extremely rarely have a moment of time when both locations are whole numbers.
At one moment you have { head1loc=0, head2loc=4.17 }, head #1 reached the cell position on the tape, head #2 is still moving. After a while, you’ll have { head1loc=0.872, head2loc=5}, head #1 is on its way, head #2 reached the cell position.
Looks like for a Turing machine with two asynchronously moving heads, there’s no usable concept of “state” at all. That’s very close to what happens when doing actual parallel programming on real hardware.
Good luck applying your idea of computation as a sequence of state transitions to that.
When was the last time you monitored what parts of your system were truly I/O bound and not just blocked on another task?
Of course it helps when the platform has the proper tools for it. I end up using JMH and YourKit Profiler weekly, because a memory or CPU leak can crash our process and we have pretty strict reliability requirements with a single server processing about 5000 events per second - not extremely demanding, but when it crashes, we can lose money, with the redundancy infrastructure being pretty new.
Would that be just anecdotal ?
..and I can tell you that precisely none of them performed a correct analysis of a systems concurrency needs before 'designing' it.
A common artifact of this is far too many tasks.
The sequence is heavily non-deterministic, but that's life in a concurrent universe.
The thing is called “partially ordered set”: https://en.wikipedia.org/wiki/Partially_ordered_set
I’d like to point out, the very moment you hooked second asynchronously moving head to the machine, the abstraction leaked spectacularly. With two asynchronous heads, the machine no longer has state; suddenly we need to consider physical time, not just count steps; and so on.
Turing machine is useful abstraction for sequential computing, but nearly useless to research parallel computing problems. The same happens with many other abstractions and approaches when going parallel.
> either one of those moves completes before the other
Yeah, but in parallel computing, we don’t know that order, and have no way of knowing. Meaning the “sequence of state transitions” idea is IMO nearly useless.
...depending on how fast you're moving relative to the machine heads and in what direction. Simultaneity does not exist in the real world.