The Problem with Threads (2006) [pdf] (www2.eecs.berkeley.edu)

80 points by DonbunEf7 8 years ago | 55 comments

jgtrosh 8 years ago |

This piece seems to have predicted a very active field in everyday software development since then.

What alternative paradigms have actually come into common use? Coroutines and async/await are what I hear about online, but what are the others? I've seen people tout zmq-communicating-processes with standard patterns as the solution to all problems, and I'm happy not to have to maintain the results.

Have we effectively “solved” the concurrency problem, and if so what's left as an exercise for the future?

chrisseaton 8 years ago | |

Although this paper talks initially about concurrency, you can see that really he's talking about concurrency specifically for the purpose of parallelism.

Coroutines don't solve the parallelism part, because they're concurrent but exclusive.
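To make "concurrent but exclusive" concrete, here is a minimal Python sketch using asyncio. The names `worker` and `log` are made up for illustration. The two coroutines take turns at each `await`, but every step runs on the same thread, so nothing ever executes in parallel.

```python
import asyncio
import threading

log = []  # (worker name, step, thread id), in execution order

async def worker(name, steps):
    for i in range(steps):
        log.append((name, i, threading.get_ident()))
        # Yield to the event loop so the other coroutine gets a turn.
        await asyncio.sleep(0)

async def main():
    # Both workers are in flight at once (concurrent)...
    await asyncio.gather(worker("a", 3), worker("b", 3))

asyncio.run(main())

order = [name for name, _, _ in log]
# ...but only one thread ever runs them (exclusive).
thread_ids = {tid for _, _, tid in log}
```

Because control only changes hands at an `await`, code between two awaits cannot be interrupted by the other coroutine. That removes a class of races, but it also means a second core gives no speedup.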

Async/await as implemented in JavaScript doesn't solve the parallelism part either for the same reason, and async/await as implemented in C# has exactly the same problem as threads.

There are many ideas for how to solve the problem - but I think anyone who is honest will tell you none are a perfect solution to all situations where you want parallelism or concurrency.

For example, to use zmq-communicating-processes effectively you need a problem where you can divide the data or tasks cleanly a priori. We simply don't have the mathematical understanding of how to do that for some important algorithms that people really need to run in parallel today, such as triangulation or mesh refinement.

We probably need some radical new idea, or maybe, as looks increasingly likely, only a mix of ideas will work.

amelius 8 years ago | | |

Triangulation and mesh refinement seem suitable for divide and conquer, except perhaps in pathological cases.

toast0 8 years ago | |

At the risk of being that guy, the actor model (à la Erlang) is pretty good at concurrency. If you're unfamiliar, it's basically no shared state, with communication between actors (Erlang processes) done by sending asynchronous messages to the other actor's message queue.

The code for each actor is usually pretty small and easy to reason about. However, the emergent behavior of the system and the ordering of messages from multiple actors can become tricky. Also, long-term exposure to this idea will warp your mind :)
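For readers who haven't seen the pattern, a rough Python sketch of the idea follows. It is not how Erlang implements processes (those are lightweight and scheduled by the VM); here an OS thread plus a `queue.Queue` stands in for an actor and its mailbox, and `CounterActor` and its message names are invented for the example.

```python
import queue
import threading

class CounterActor:
    """Owns its state; other threads can only send it messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        # Asynchronous send: enqueue and return immediately.
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Messages are handled one at a time, in mailbox order.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

def send_many(actor, n):
    for _ in range(n):
        actor.send("increment")

actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()

reply = queue.Queue()
actor.send("get", reply_to=reply)
total = reply.get()  # all increments were enqueued before "get"
actor.send("stop")
```

No lock guards `_count` because only the actor's thread touches it. The tricky part mentioned above still applies: the relative order of messages from different senders is not fixed, only the order from any single sender.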

chrisseaton 8 years ago | | |

The actor model is very vulnerable to race conditions - that’s a big downside to it.

acjohnson55 8 years ago | | |

The actor model is a great tool, but I think it's best looked at as a low level concurrency primitive. Most of the time, folks should be working with higher level constructs in conceptually simpler control flow paradigms like call-and-return (async/await) or streams.

dnautics 8 years ago | | |

It's also relatively easy to implement async/await using actor primitives.

pcwalton 8 years ago | |

Data parallelism in CUDA, OpenGL, and other GPU APIs is doing fantastically and has been for decades. (If writes are allowed, these APIs technically have the same problems as threads, but in practice they're easier to deal with since traditional mutex locks and condition variables are mostly unavailable in that environment, and the APIs force you to carefully declare the sharing semantics of your data buffers.)

Most parallel (not concurrent) problems map well to the data parallel model. Even Make is basically a data parallel API with read-only constant data, just with a more complex dependency graph.
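The Make comparison can be sketched in a few lines of Python using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The build graph, the `build` function, and `run_build` are hypothetical stand-ins, and threads are used only to illustrate the scheduling, not to claim a real speedup.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical build graph: target -> set of prerequisites.
deps = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c"},
    "util.o": {"util.c"},
    "main.c": set(),
    "util.c": set(),
}

def build(target):
    # Stand-in for a real build step: reads its inputs, shares no mutable state.
    return f"built {target}"

def run_build(deps, workers=4):
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    order = []    # completion order, recorded by the main thread only
    pending = {}  # future -> target
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Every target whose prerequisites are done may run in parallel.
            for target in sorter.get_ready():
                pending[pool.submit(build, target)] = target
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                target = pending.pop(future)
                future.result()      # re-raise any build error
                order.append(target)
                sorter.done(target)  # unlocks dependents
    return order

order = run_build(deps)
```

Independent targets such as `main.o` and `util.o` can be built at the same time, and the only synchronization is the dependency graph itself.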

barbegal 8 years ago | |

Promises in JavaScript have become quite popular. Unfortunately, they're not well understood, so they're mostly being used to reduce nested callbacks rather than in areas where they could improve the performance of JavaScript applications.

klodolph 8 years ago | | |

There's nothing wrong with using promises to reduce nested callbacks.

acjohnson55 8 years ago | | |

Really? At my company, promises have long since won the day. Now I'm trying to get people on async/await, which gets you like 90% of the way to the simplicity of synchronous code.

nickpsecurity 8 years ago | |

On the parallelism side, it might spark ideas to look at the languages that have attempted it so far. IBM's X10, Cray's Chapel, and Taft's ParaSail come to mind.

pcwalton 8 years ago | | |

You don't even need to get that exotic. OpenGL shaders, for instance, offer a simple, safe data parallelism model.

zzzcpan 8 years ago | |

What do you mean by "the concurrency problem"?

rurban 8 years ago | | |

Deadlocks and data races.

These boil down to problems created by the POSIX implementation, with its condvars, mutexes, and semaphores, and its lack of lock-free and wait-free data structures.

With threads there are also minor hidden constants: limited stack size and the high cost of context switches. And the order of evaluation is random.

Lockless threading semantics need to know ownership, copy versus reference, and relationships to be able to fix these problems. I know of only a few, not well-known, languages that have actually solved these problems.

zbentley 8 years ago | | |

My hunch: modern programming often requires concurrent execution of software, but most of the ways we have to model concurrency in code are hard to learn at best, and are frequently orders of magnitude harder than that to learn and use.

rectang 8 years ago |

I've always liked section 3 of this paper, specifically the concept that "infinite interleavings" make threads executing in parallel non-deterministic and difficult to reason about. That gets to the heart of why threaded programs are so prone to heisenbugs.

"They make programs absurdly nondeterministic, and rely on programming style to constrain that nondeterminism to achieve deterministic aims."

You can't write an infinite number of test cases for all those interleavings, and it requires hard thought to suss out where any problems might lie.
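A toy model (mine, not from the paper) shows how quickly this gets out of hand. Assume two threads, each doing an unsynchronized increment split into two steps: load `x`, then store `x + 1`. The helper names `interleavings` and `run` are made up for the sketch, which enumerates every schedule and records the final value.

```python
from itertools import combinations
from math import comb

def interleavings(n_a, n_b):
    """Yield every schedule of A's n_a steps and B's n_b steps as an 'A'/'B' string."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        yield "".join("A" if i in a_slots else "B" for i in range(total))

def run(schedule):
    """Simulate the schedule; each thread loads x, then stores x + 1."""
    x = 0
    loaded = {}  # per-thread local copy of x
    for thread in schedule:
        if thread not in loaded:
            loaded[thread] = x      # step 1: load
        else:
            x = loaded[thread] + 1  # step 2: store
    return x

outcomes = {}
for schedule in interleavings(2, 2):
    outcomes.setdefault(run(schedule), []).append(schedule)

# Number of schedules for two threads of n steps each is C(2n, n).
schedules_2_steps = comb(4, 2)     # 6
schedules_10_steps = comb(20, 10)  # 184756
```

Of the six schedules, only `AABB` and `BBAA` produce 2; the other four lose an update and produce 1. With ten steps per thread there are already 184,756 schedules, which is why a test suite that happens to pass says so little.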