Joe Armstrong: Solving the wrong problem (joearms.github.com)

237 points by geoffhill 13 years ago | 168 comments

mncolinlee 13 years ago |

I worked in Cray's compiler department for seven years. If we couldn't dramatically parallelize someone's code, we couldn't sell a vector supercomputer. Period.

Automatic parallelization is very possible. The problem is that it tends to be less efficient. A decent developer can often do a better job than the compiler by performing manual code restructuring. The compiler cannot always determine which changes are safe without pragmas to guide it. With that said, our top compiler devs did some amazing work adding automatic parallelization to some awful code.

We inevitably sold our supercomputers because we had application experts who would manually restructure the most mission-critical code to fit cache lines and fill the vectors. Most other problems would perform quite adequately with the automatically-generated code.

What this article lacks is a description of why Erlang is more uniquely suited to writing parallel code than all the other natively parallel languages like Unified Parallel C, Fortran2008, Chapel, Golang, etc. There are so many choices and many have been around for a long, long time.

larsberg 13 years ago | |

I completely agree. As someone who works on a parallel functional language, it's very hard to sell a parallel language that isn't as fast as parallel fortran or hand-tuned C code that uses pthreads and the fastest parallel implementation of BLAS and other libraries.

The people who really care about performance are using those. The ones who don't are honestly mostly still writing code that has large constant factors of _sequential_ performance available as low-hanging fruit. Sure, they'd take free performance, but the rewrite/porting/debug costs (even in automatic parallel compilers for the same language) are at least as high as just firing up a profiler.

I'm increasingly of the opinion that if you can't win a top spot in the supercomputer performance fight, you have to have a unique application domain. Erlang's seems to be reliability. I suspect that a parallel variant of javascript that runs in every browser will end up being the next compelling parallel language, as opposed to all of us who are either inventing or attempting to resurrect languages that target x86 or GPUs.

necrodome 13 years ago | | |

Does/can Manticore have such a unique application domain?

rdtsc 13 years ago | |

As the complexity increases, one thing that starts to show is not speedup from parallelization but fault tolerance.

Debugging a non-concurrent program can be difficult; now throw in threads, shared memory, and pointers, and it quickly becomes a nightmare. The system could be fast, but if it crashes every week, is it useful? Often the answer is yes. But in some cases the answer is no.

There is no free lunch. Shared-nothing architecture doesn't come for free. You pay a toll in _sequential_ performance. It might or might not matter to you.

> Automatic parallelization is very possible.

For many numerical algorithms, and at the scale of a small function scope, I can see that. But the problem (and what Joe was pointing out) is that applications and algorithms have to be designed for concurrency to start with.

A compiler will not refactor code in which 100k clients access a single database and acquire a lock into code that uses some eventually consistent or event-sourcing data store. It is something that has to be built from the ground up.

Same thing with fault tolerance: it has to be built in from the ground up. Adding it later is not easy.

seanmcdirmid 13 years ago | |

Erlang is not designed for parallel programming; it is designed for concurrent programming. These are two very different programming domains with different problems.

Every time someone conflates parallelism with concurrency...everyone gets very confused.

unoti 13 years ago | | |

Isn't it really fair to say that it's designed for both? The way it uses immutable state and something-similar-to-s-expressions to express data makes it very straightforward (or even transparent) to distribute work between multiple processes and separate computers, in addition to how it makes it practical and simple to break work into small chunks that can be interleaved easily within the same thread. It's really designed for doing both very well, wouldn't you say?

willvarfar 13 years ago | | |

Isn't Joe's post specifically about parallelism and how Erlang is designed for it?

reeses 13 years ago | |

Fortran (especially ancient, wheezy Fortran) lent itself to supervised automatic parallelization because of its lack of dynamic arrays. It was "easy" to vectorize code at compile time when you had so much information about the runtime expectations.

We can do some of this now in most languages with hot-spot profiling, basic block analysis, selective inlining, and other innovations. However, you really can't beat low-level languages that explicitly "hint" at their execution paths.

By the same token, Cray's applications were...so...slow if you were foolish enough to run them on the expensive hardware and not the FEPs.

SoftwareMaven 13 years ago | |

If the code isn't efficient, you'll run into Amdahl's Law much more quickly. In fact, I think your comment aligns with what Joe was saying: automated parallelization is not going to happen. You will have to go through and find all your contention points, just like your application experts did.
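For anyone who wants to put numbers on Amdahl's Law, here is a minimal sketch. The function name and the 90% figure are just illustrative assumptions, not anything taken from the article or from Cray's workloads.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program runs in parallel.

    parallel_fraction: share of the runtime that can be spread across cores
    cores: number of cores available
    """
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, round(amdahl_speedup(fraction, 24), 2))
```

With a hypothetical 90% parallel fraction, 24 cores give at most about a 7.3x speedup, which is why the remaining contention points matter so much.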

I completely agree with your last sentence. For those of us who have dived in a way, the advantages become clear, but TFA was really just preaching to the choir.

konstruktor 13 years ago |

> At this point in time, sequential programs started getting slower, year on year, and parallel programs started getting faster.

The first part of this statement is plain wrong. Single thread performance has improved a lot due to better CPU architecture. Look at http://www.cpubenchmark.net/singleThread.html and compare CPUs with the same clock rate, say 2.5 GHz. An April 2012 Intel Core i7-3770T scores 1971 points while a July 2008 Intel Core2 Duo T9400 scores 1005 points. This is almost double the score in less than four years. Of course, one factor is the larger cache that the quad core has, but this refutes Armstrong's point that the multicore age is bad for single thread performance even more.

For exposure to a more balanced point of view, I would highly recommend Martin Thompson's blog mechanical-sympathy.blogspot.com. It is a good starting point on how far single threaded programs can be pushed and where multi-threading can even be detrimental.

Also, I think that fault tolerance is where Erlang really shines. More than a decade after OTP, projects like Akka and Hystrix are finally venturing in the right direction.

daleharvey 13 years ago |

I can't help but read a lot of irony in this.

Erlang solved a problem really well over 20 years ago. It's the sanest language by far that I have used when dealing with concurrent programming (I haven't tried Go or Dart yet), and I owe a lot of what I know to the very smart people building Erlang.

However, it has barely evolved in the last 10 years. Will 2013 be the year of the structs? (I doubt it.) Every new release comes with some nice-sounding benchmark about how much faster your programs will run in parallel, and there is never a mention of what's actually important to programmers: a vibrant ecosystem and community, language improvements that don't make it feel like you are programming in the 80's, and better constructs for reusing and packaging code in a sane way.

It's fairly trivial in most languages to get the concurrency you need. I think Erlang is solving the wrong problem in 2013.

dvt 13 years ago |

Same old hype. Erlang is good I guess, and I've used it in production a couple of times. But it's just a language that solves 3 problems but creates another 30. Just like C++11, Dart, Go, etc.

This kind of belligerent rhetoric (we're solving the right problems, everyone else is dumb) is the kind of drivel that gives momentum to language zealots that think language X is better than language Y.

I've contributed to Google Go in the early phases and I was naïve and really believed that Go was the "next big thing." But it turned out to be yet another general-purpose language with some things that were really interesting (goroutines, garbage collection, etc.) but some things that were just same-old same-old. Now, I'm editing a book about Dart and I've since lost my enthusiasm for new languages; I can already see that Dart solves some problems but often creates new ones.

And in a lot of ways Erlang sucks, too. The syntax is outdated and stupid (Prolog lol), it has weird type coercion, memory management isn't handled that well (and many more). Of course, since Facebook uses it, people think it's a magic bullet (Erlang is to Facebook like Python is to Google).

The article also forces readers to attack a straw man. Often times, algorithms simply cannot be parallelized. The Fibonacci sequence is a popular example (unless you use something called a prefix sum -- but that's a special case). So in many ways, the rhetorical question posed by the article -- "but will your servers with 24 cores be 24 times faster?" -- is just silly.

djvu9 13 years ago |

We used Erlang several years ago. The code base has ~100k lines of code so it should be representative. We abandoned it later and switched to C++ because of performance (mostly in mnesia) and quality issues (some drivers in OTP). We didn't expect too much from performance considering it is functional (which seldom does in place update) but it is still below expectation.

It is understandable though. Just think about how much resources have been put into development of Erlang VM and the runtime/libraries(OTP), and compare it with JVM/JDK. There is just no magic in software development. When talking about high concurrency and performance, the essential things are data layout, cache locality and CPU scheduling etc for your business scenario, not the language.

acqq 13 years ago | |

The tests here confirm your experience, Erlang is for many algorithms significantly slower, even when more cores are used:

http://benchmarksgame.alioth.debian.org/u64q/erlang.php

digitalzombie 13 years ago | | |

Do not use Erlang for numeric performance. I tried it with Project Euler and wrote some prime number generators in Erlang. It is slow as hell.

Use it for what it is built for!

eridius 13 years ago |

If zlib could be rewritten in Erlang to be lock-free, why not just rewrite it in C to be lock-free instead of porting it? AFAIK Erlang isn't some magical language that allows traditionally-locked data structures to become lock-free.

eksith 13 years ago |

There's one big problem Erlang couldn't solve that I live with to this day :

Unlike other general-purpose languages (like, say, C++ or C#), it doesn't allow me to grasp what's happening after staring at it for 30 seconds. This is the same problem I have with Lisp.

Maybe I'm just dyslexic, but these rhetoric pieces for one language or another that says it's concurrent (which it is), fast (obviously), more C than C, will bring the dead to life, create unicorns and other wonderful, fantastic things that I'm sure are all true, just don't seem to be capable of passing into my grey matter.

You know another thing all these amazing super power languages haven't been able to do that even a crappy, broken, in many ways outright wrong, carcinogenic etc... etc... language like even PHP has allowed me to do? Ship in 48 hours.

Before I get flamed, I already tried that with Nitrogen (http://nitrogenproject.com). It didn't end well, but maybe it will work for someone already familiar with Erlang.

It's like you've written the Mahabharata; it's a masterpiece and it's one of the greatest epics of all time. Unfortunately, it's written in Sanskrit.

acdha 13 years ago |

zlib is fine as long as you don't give it a non-threadsafe memory allocator - see http://www.gzip.org/zlib/zlib_faq.html#faq21. As far as I can tell, it either means that the summary was imprecise and the slowdown was in the image processing code and not zlib, or that they chose to rewrite (and debug) a big chunk of code rather than read the zlib documentation.

Ignoring that point, this seems like a poor point for comparison as it's a trivially parallelized task because zlib operates on streams and shouldn't have any thread contention. There's very little information in the description but unless there are key details missing, this doesn't sound like a problem where Erlang has much interesting to add. The most interesting aspect would be the relative measures for implementation complexity and debugging.
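To illustrate the "independent streams, no contention" point, here is a rough sketch using Python's standard zlib binding and a thread pool. The helper name compress_all and the worker count are made up for the example; it shows only that separate buffers can be compressed side by side without shared state, not how the system in the article was built and not any particular speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_all(blobs, workers=4):
    """Compress each buffer independently; no state is shared between calls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(zlib.compress, blobs))


if __name__ == "__main__":
    blobs = [bytes([i]) * 100_000 for i in range(8)]
    compressed = compress_all(blobs)
    # Round-trip check: every buffer decompresses back to its original.
    print(all(zlib.decompress(c) == b for c, b in zip(compressed, blobs)))
```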

Uchikoma 13 years ago |

Because the lock-free Erlang meme doesn't die:

  1. Erlang has locks and semaphores [1]: receive is a lock, actors are semaphores. Erlang chose a 1 semaphore / 1 lock per process model
  2. Erlang scales better not because of being lock-free (see above), but because it easily uses async compared to other languages
  3. Async prevents deadlocks not Erlang being lock-free (see above)

Some 4-year-old reading: http://james-iry.blogspot.de/2009/04/erlang-style-actors-are...

[1] http://en.wikipedia.org/wiki/Semaphore_(programming)

splicer 13 years ago |

> We’re right and the rest of the world is wrong. We (that is Erlang folks) are solving the right problem, the rest of the world (non Erlang people) are solving the wrong problem.

> The problem that the rest of the world is solving is how to parallelise legacy code.

As a member of the rest of the world, I can assure you that I'm not trying to solve either of these problems. :p

InclinedPlane 13 years ago |

People have been thinking this, that it's vastly better to design for concurrency upfront, for literally decades. And every single time there has been a big sea change in processor technology it's always been the next generation which will see things like VLIW or Erlang and so forth come to the fore while what I will call "iterative advancements" and "patched solutions" turn out to have too many weaknesses to be competitive. In reality the reverse has happened, and new specialized languages and instruction sets have been relegated to niches.

It'll be the same over the next 20 years as well.

I predict that we'll see a lot of technological leaps which will serve as much to maintain the ability to run "old code" in new and interesting ways as to enable a brave new world of purpose-built languages.

In the next few decades we'll see advances in micro-chip fabrication and design as well as memory and storage technology (such as memristors) which will result in even handheld battery powered devices having vastly more processing power than high-end workstations do today.

Is that an environment in which one seeks to trade programmer effort and training in order to squeeze out the maximum possible efficiency from hugely abundant resources? Seems unlikely to me, to be honest.

Indeed, it seems like the trend of relying on even bloatier languages (like Java) will continue. Do you think anyone is going to seriously consider rewriting the code for a self-service point-of-sale terminal in Erlang in order to improve performance? That's not the long pole, it never has been, and it's becoming a shorter and shorter pole over time.

In the future we'll be drowning in processor cycles. The most important factor will very much not be figuring out how to use them most efficiently, it'll be figuring out how to maximize the value of programmer time and figuring out how to use any amount of cycles to provide value to customers effectively.

(I think that advancements in core, fundamental language design and efficiency will happen and take hold in the industry, but mostly via back-door means and blue sky research, rather than being forced into it through some impending limitation due to architecture.)

meshko 13 years ago |

The lack of understanding is amazingly widespread. I often have to explain to people that when they look at their CPU utilization and it is at 10% it means "you are throwing money away", not "you are efficient".

masklinn 13 years ago | |

That's not really true though, or at least not on all workloads. Much as you are not "throwing money away" by not pegging your car engine in the red zone 100% of the time, you're not throwing money away by not being at 100% CPU all the time. There are other metrics, values and issues to take into account: a pegged CPU on an unresponsive computer is useless for a desktop; a CPU that is pegged because the machine is swapping like mad, and so can't serve requests, is useless for a server; and so is a server at 100% CPU with no load on it, which will just keel over when people start trying to actually interact with it.

jeremyjh 13 years ago | | |

It sounds like you missed the point here. If an eight-core server is at 10% utilization, it effectively has a single processor nearly pegged and the process doing it is thus CPU bound (and maybe serving responses at a high latency) while you have other cores sitting idle. Conserving CPU resources and running under capacity is wise, but has nothing at all to do with this comment.
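The arithmetic behind that, as a tiny sketch. It assumes the worst case where all the work sits on one thread, and the function name is just for illustration.

```python
def busiest_core_load(total_utilization, cores):
    """Load on a single core if the whole workload runs on one thread.

    total_utilization: machine-wide CPU utilization as a fraction (0.10 = 10%)
    cores: number of cores the utilization figure is averaged over
    """
    # A machine-wide average hides the skew: one busy core out of N
    # contributes only 1/N of its load to the overall number.
    return min(1.0, total_utilization * cores)


if __name__ == "__main__":
    print(busiest_core_load(0.10, 8))
```

So 10% on eight cores can mean one core at roughly 80% and seven cores doing nothing.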

alexchamberlain 13 years ago | |

It really depends why you are at 10%. A file server will probably spend the vast majority of its time waiting on I/O... That's not necessarily a bad thing.

kamaal 13 years ago |

This blog post shows everything that is wrong with languages like Lisp and Erlang. This is total disregard for what the rest of the world considers valuable.

The problem with these languages remains unchanged. The syntax is so strange and esoteric that doing even basic things with them will likely require months of learning and practice. This fact alone will make them impractical for 99% of all programmers in the world.

No serious company will ever use a language like Erlang or Lisp until it's absolutely unavoidable (and the situation gets completely unworkable without it), because everyone knows the number of skilled people in the market who know Erlang is close to zero. And those who can work for you are going to be crazy expensive. Not to mention the nightmare of maintaining code in this kind of language for years. There is no friendly documentation or easy way for an ordinary programmer to learn these languages. And the level of reusable solutions available for these languages is nowhere near what exists for mainstream C-based languages.

In short, using these languages attracts massive maintenance nightmares.

The concurrency/parallelisation problem today is very similar to what memory management was in the 80's and 90's. Programmers hate to do it themselves. These are the sort of things that the underlying technologies (compilers/VMs) are supposed to do for us.

I bet most of these super power languages will watch other pragmatic languages like Perl/Python/Ruby/Php etc eat their lunch over the next decade or so when they figure out more pragmatic means of achieving these goals.

zzzeek 13 years ago |

> The road to automatic parallelisation of sequential programs is littered with corpses. It can’t be done. (not quite true, in some specific circumstances it can, but this is by no means easy).

vs three paragraphs later

> Alexander’s talk gave us a glimpse of the future. His company concurix is showing us where the future leads. They have tools to automate the detection of sequential bottlenecks in Erlang code.

why is that not a contradiction? because an erlang program isn't "sequential" to start with?

masklinn 13 years ago | |

> why is that not a contradiction? because an erlang program isn't "sequential" to start with?

Yes. The point is that in a well-coded erlang program only bottlenecks should be sequential (and the bulk should be concurrent). The tool's goal would be (I haven't seen the presentation so I'm throwing ideas at the wall) to see which dependencies lead to sequences in the system that reduce overall concurrency, leaving the developer to handle fixing that part if possible.

It doesn't try to automatically parallelize a sequential program, and it does not start from fully sequential programs in the first place.

(not saying I agree with Joe's assertions, they're quite inflammatory and at a very fundamental level lack solid evidence. I have to say I prefer his milder tone to this new "rha rha" one, though this one may yield more visibility for the language I fear the drama)

jameskilton 13 years ago | |

They're different statements. Taking a sequential program and automatically parallelizing it is a very hard problem. What this tool does, as I read it, is simply find sequential parts of code, and it's up to the devs to figure out how to parallelize said code.

zem 13 years ago | |

they aren't trying to automatically parallelise the program, they are just using the tool as a diagnostic aid to show where the sequential bottlenecks are, and where it would be most productive to rewrite that segment in a more parallel manner. think of it as a next-generation profiler.

dustismo 13 years ago | |

I think you missed the "automatic" part. Completely rewriting a program in a new language is certainly not automatic.

zzzeek 13 years ago | | |

both phrases feature the term "automate"...but yes, one is detection, one is resolution

surferbayarea 13 years ago |

Have you even used zlib in c++? The largest ecommerce site out there uses zlib in a multithreaded c++ application (24 cores, 100s of threads, 1000s requests/sec/server) and it works just fine! Bet you erlang can't come within a tenth of the performance of c++...

coldtea 13 years ago | |

>1000s requests/sec/server

That's not particularly impressive you know.

surferbayarea 13 years ago | | |

yes, because it also does other computation. Point was to illustrate that zlib can be used in a concurrent computing setting with high performance. The blog writer had claimed that zlib doesn't work in a multithreaded setting.

querulous 13 years ago | |

whatsapp is approaching 3 million simultaneous connected users on a single server using erlang

dap 13 years ago |

This completely misses the fact that many network services are not compute bound, and multi-tenancy (as we get from "the cloud") lets "legacy" code make very efficient use of CPU resources, even a larger number of relatively slow cores.

alexchamberlain 13 years ago |

A very very interesting article, but I was extremely disappointed with the example they gave.

So, there was an error in someone's code which you rewrote without the error and it ran faster? Well done detective...

We need more parallel programs, no doubt, but we need more, better programmers, who are willing to write in compiled languages with low-overhead.

artsrc 13 years ago |

Excel solves this right problem, and Erlang does not.

Erlang allows you to create concurrent programs, i.e.: programs where the result is schedule dependent.

One right problem is allowing people to write deterministic parallel programs. This gives you the speed (from parallel) with the reliability (from deterministic).
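A small sketch of the distinction, using Python's concurrent.futures as a stand-in (the function names are made up for the example; this is not how Excel or Erlang work internally). Both functions run the same pure work in parallel, but only the first has a result that is independent of the schedule.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(x):
    """A pure function: the output depends only on the input."""
    return x * x


def deterministic(items, workers=4):
    # map() hands results back in input order, whatever the thread schedule.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, items))


def schedule_dependent(items, workers=4):
    # as_completed() yields results in completion order, which can vary
    # from run to run even though every individual result is the same.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, x) for x in items]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    print(deterministic(range(10)))
    print(schedule_dependent(range(10)))
```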

spenrose 13 years ago |

In conventional blocking languages, you can get a start on parallelizing your programs this way:

  - break program into function calls that match the steps that can happen in parallel

  - wrap the function calls in messages passed over the network
     + i.e. process(thing) -> post(thing)/poll_for_things()

  - split the sender and receiver into different processes

OF COURSE there are big advantages to using a language (Erlang) or a heavyweight framework (map/reduce) designed for concurrency. Rolling your own process-centric concurrency is a different set of tradeoffs, not a panacea. But it's worth considering for some problems.
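A rough sketch of those steps, with loud caveats: an in-process queue stands in for the network, and a worker thread stands in for the separate receiver process, so this only shows the shape of the process(thing) -> post(thing)/poll_for_things() split. The channel, the sentinel, and run() are hypothetical helpers added for the example.

```python
import queue
import threading

# Hypothetical stand-in for a network channel; a real setup would use a
# socket, a message broker, or an HTTP endpoint between two processes.
channel = queue.Queue()
_DONE = object()  # sentinel telling the receiver to stop polling


def process(thing):
    """The original blocking step; here it just squares a number."""
    return thing * thing


def post(thing):
    """Sender side: enqueue a message instead of calling process(thing)."""
    channel.put(thing)


def poll_for_things(results):
    """Receiver side: pull messages and process them until the sentinel."""
    while True:
        thing = channel.get()
        if thing is _DONE:
            break
        results.append(process(thing))


def run(things):
    """Wire one sender and one receiver together and collect the results."""
    results = []
    receiver = threading.Thread(target=poll_for_things, args=(results,))
    receiver.start()
    for thing in things:
        post(thing)
    channel.put(_DONE)
    receiver.join()
    return results


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

With a single receiver the messages are handled in the order they were posted; adding more receivers is where the ordering and failure-handling tradeoffs start.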

uwiger 13 years ago |

To say that Erlang fails to deliver what most programmers need misses the point. If you have a mainstream problem, use a mainstream language!

I've spent many years developing and reviewing products in the telecoms realm, and have found that failing to realize when something like Erlang brings life-saving concepts to your project may well make the difference between delivering on time and disappearing into a black hole of endless complexity. It's not for everyone, but when it fits, boy does it help!

damian2000 13 years ago |

I would hazard a guess that 90%+ of the world's programmers are working on projects that don't really need to use parallelism to get the job done.

tunesmith 13 years ago | |

The question is, will that percentage go up or down as time goes on?

ternaryoperator 13 years ago |

Joe Armstrong: "The problem that the rest of the world is solving is how to parallelise legacy code."

Donald Knuth: "During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading."

fulafel 13 years ago |

all true. to add: we are in a local optimum where we have a lot of fast non-parallel solutions and the parallel way needs >10x parallelism. we'll see if we ever get to ubiquitous 100-way parallelism with no need for backward compatibility.

jlebrech 13 years ago |

Which frameworks can you use?

This? http://nitrogenproject.com/

melchebo 13 years ago | |

I suppose you mean web framework. There's Chicago Boss: http://www.chicagoboss.org/

Also a recent book using other Erlang web technology: http://www.amazon.com/Building-Web-Applications-Erlang-Worki...