Introducing the B3 JIT compiler

Introducing the B3 JIT compiler (webkit.org)

213 points by basugasubaku 10 years ago | 128 comments

munificent 10 years ago |

Really cool article. Posts like this always make me wonder what the state of programming would be if browsers hadn't sucked up almost all of the world's compiler optimizers.

beagle3 10 years ago | |

Look at what Mike Pall did with LuaJIT2 - I assume if the world wasn't so focused on the web, we would see more of it in other languages. But really, things aren't that bad. Microsoft has enough good people working on RyuJIT, PyPy has some of the best people advancing metatracing JITs, and Mike Pall is a god among men.

eyan 10 years ago | | |

The Father, The Son, The Holy Ghost, and Mike Pall.

ajross 10 years ago | |

To be fair, GPU vendors sucked up a ton too.

But considering that optimized scalar code performance has moved, what, maybe 40% over the last two decades, I'm going to say "not much". Compilers are sexy, but they're very much a solved problem. If we were all forced to get by with the optimized performance we saw from GCC 2.7.2, I think we'd all survive. Most of us wouldn't even notice the change.

kannanvijayan 10 years ago | | |

I'd disagree. Classical compiler work is very mature, yes, and things like register allocation and backend IR-based optimization are well-trodden ground.

But in the context of JIT compilers for dynamically typed languages, in particular the space involving runtime inferred hidden type models, there is a TON of work left on the table.

It hasn't received much attention in academia, IMHO largely because of a historical perspective on optimizing dynamic languages as "not classy" work among language theorists. I hope that perspective changes over time.

munificent 10 years ago | | |

> Compilers are sexy, but they're very much a solved problem.

Not for all of the other widely-used languages that still have incredibly simple interpreters. Think how much energy could have been saved if Ruby, Python, and PHP were all as fast as your average JS engine.

mafribe 10 years ago | | |

    Compilers are sexy, but they're very much a solved problem.

This may be true for sequential languages, but is very much false for the compilation of concurrency and parallelism. It's basically not known how to do this well. Part of the problem is that CPU architectures with parallel features have not yet stabilised.

For sequential languages the problem has shifted: how can I get a performant compiler easily? The most interesting answer to this question is PyPy's meta-tracing, and that work is from 2009, and far from played out.

pcwalton 10 years ago | | |

> Most of us wouldn't even notice the change.

A 40% decrease in optimization is enough to drop framerates from 60fps to 30fps easily, so I'm pretty sure we would notice it.
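The arithmetic behind that jump can be sketched as follows. This is a minimal illustration, assuming a 60 Hz display with vsync (so a frame that misses one refresh waits for the next) and a hypothetical frame that takes 14 ms with today's optimizers; neither number comes from the thread.

```python
import math

VSYNC_HZ = 60.0
FRAME_BUDGET_MS = 1000.0 / VSYNC_HZ  # about 16.67 ms per refresh


def displayed_fps(frame_time_ms):
    """Frame rate seen on screen when every frame must land on a refresh."""
    # A frame that overruns the budget occupies a whole extra refresh interval.
    intervals = max(1, math.ceil(frame_time_ms / FRAME_BUDGET_MS))
    return VSYNC_HZ / intervals


# Hypothetical frame cost with a modern optimizer (assumption).
fast_frame_ms = 14.0
# Assumption: "40% decrease" means the code runs at 60% of its former speed.
slow_frame_ms = fast_frame_ms / (1.0 - 0.40)

print(displayed_fps(fast_frame_ms))
print(displayed_fps(slow_frame_ms))
```

Under these assumptions the slower frame takes roughly 23 ms, overruns the 16.67 ms budget, and is shown every second refresh, which halves the displayed rate even though the code is nowhere near twice as slow.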

nly 10 years ago | | |

> optimized scalar code performance has moved, what, maybe 40% over the last two decades

I'm not convinced. Raw single-thread number crunching performance is somewhere around _two to three fold_, clock-for-clock, on Intel x86, over that of 10-15 years ago. What methodology do you use to attribute only a fraction of those gains to language optimizers? And even if you are correct, why is it meaningful? Who is going to have invested energy in optimising the shit out of mundane codegen when hardware performance will have just come and stolen your thunder a few months later?

The problem we have now is that CPUs are gaining ever more complex behaviour, peculiarities, and sensitivities. I'd say compiler engineering is far from a "solved problem", even for statically-typed languages.

Joky 10 years ago | | |

> Compilers are sexy, but they're very much a solved problem

No they're not, and won't be for a long time (ever?). However it does not matter because they are "good enough".

Compilers are driven by heuristics which provide "reasonable" results in most cases for common architectures. But they still leave a lot on the table. Compiler writers have to trade compile time against execution time. Now we're not talking about an order of magnitude, but rather ~20%-30% in some workloads. When it matters (I guess for people like Google/Facebook/Amazon/... it translates into electricity bills and the number of racks to add to the datacenter) people may have to get down to the assembly level for a very small (and hot) part of the program.

mrspeaker 10 years ago | |

That seems a bit chicken-and-the-egg-y though: if the web didn't become a global phenomenon then there would be far less interest in improving the browsers, far less business need for programmers, and far fewer people working on whatever they'd be working on if they weren't working on browsers.

sklogic 10 years ago | |

They're solving a non-issue. The rest of the world is perfectly fine with the statically typed, well designed languages that are easy to compile. And only the web world is so obsessed with smart compilers compensating (impressively, but still far from being sufficient) for multiple deficiencies in the language design.

pcwalton 10 years ago | | |

> The rest of the world is perfectly fine with the statically typed, well designed languages that are easy to compile.

Python, Ruby, PHP, and Perl aren't "the rest of the world"? As far as compilers are concerned, all of those languages have more troublesome semantics than JavaScript does.

> And only the web world is so obsessed with smart compilers compensating (impressively, but still far from being sufficient) for multiple deficiencies in the language design.

You have no idea how much compilers have to compensate for the deficiencies in C and C++'s design.

geofft 10 years ago | | |

> statically typed, well designed languages that are easy to compile

Which ones are these? All of the statically-typed and well-designed languages I can think of are, at the least, hard to compile well, if not hard to compile in the first place. (Haskell and Rust both come to mind; there are few Haskell compilers other than GHC, and no Rust compilers other than rustc.) The languages that are easier to compile are either not statically-typed in a useful way, or not well-designed, or both.

yxhuvud 10 years ago | |

While they have sucked up a lot, it is far from certain that all of them would have been employed optimizing compilers if that hadn't happened. Demand tends to create the supply.

Also, many of the improvements done for JS will certainly trickle down to Python, Ruby, PHP eventually.

cpr 10 years ago |

Good to see the WebKit team (mostly Apple) continue putting serious energy into JS performance. Takes a bit of guts to throw out the whole LLVM layer in order to get compilation performance...

It's also encouraging to see them opening up about future directions rather than just popping full-blown features from the head of Zeus every so often. (Not that they owe us anything... ;-)

(Edit: it's also damned impressive for 2 people in 3-4 months.)

MrBuddyCasino 10 years ago | |

Gutsy move indeed. Though I wonder, what was the development cost in real life cash for gaining those 5% of performance?

pizlonator 10 years ago | | |

3 months. Two people working on it (me and @awfulben).

om2 10 years ago | | |

As Phil said, the development cost was not that high all things considered. But even if it was - think about the number of user-hours that 5% is leveraged across, and the resulting time savings and battery life savings for those users. It would be worth investing even more than we did for 5%!

legulere 10 years ago |

tl;dr: B3 will replace LLVM in the FTL JIT of WebKit. LLVM isn't performing fast enough for JIT mainly because it's so memory hungry and misses optimisations that depend on JavaScript semantics. They got around a 5x compile time reduction and a performance boost of anywhere from 0% to around 10% in general.

zepto 10 years ago | |

Actually the bigger reason is compile time - better optimizations based on JavaScript semantics are a secondary advantage.

pizlonator 10 years ago | | |

I think that's mostly accurate, in the sense that we wouldn't have done this if it was only motivated by specializing for JavaScript semantics. We had gotten pretty good at having our high-level compiler (DFG) burn away the JavaScript crazy and leave behind fairly tight code for LLVM to optimize.

But as soon as we realized that we had such a huge compile time opportunity, of course we optimized the heck out of the new compiler for the kinds of things that we always wished LLVM could do - like very lightweight patchpoints and some opcodes that are an obvious nod for what dynamic languages want (ChillDiv, ChillMod, CheckAdd, CheckSub, CheckMul, etc).
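To make those opcode names concrete, here is a rough Python sketch of what checked and "chill" 32-bit arithmetic could look like. The semantics are an assumption inferred from the names and from what JavaScript's `(x / y) | 0` needs, not something stated in this thread; `Overflow`, `check_add`, `chill_div` and `chill_mod` are hypothetical helpers, not B3's API.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


class Overflow(Exception):
    """Stands in for a JIT bailout to slower, fully generic code (assumption)."""


def check_add(a, b):
    # Checked add: produce the int32 sum, or bail out if it does not fit.
    result = a + b
    if result < INT32_MIN or result > INT32_MAX:
        raise Overflow
    return result


def chill_div(a, b):
    # "Chill" division: return a defined value where C-style int32 division
    # would trap or be undefined (assumed semantics).
    if b == 0:
        return 0
    if a == INT32_MIN and b == -1:
        return INT32_MIN
    # Truncate toward zero, as hardware integer division does.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def chill_mod(a, b):
    # "Chill" remainder, with the same two special cases (assumed semantics).
    if b == 0:
        return 0
    if a == INT32_MIN and b == -1:
        return 0
    return a - b * chill_div(a, b)
```

The point of having these as single opcodes, if the assumption holds, is that the compiler does not have to spell out the guard branches around every add or divide in the IR.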

alberth 10 years ago |

>> "tl;dr: B3 will replace LLVM in the FTL JIT of WebKit. LLVM isn't performing fast enough for JIT mainly because it's so memory hungry and misses optimisations that depend on JavaScript semantics. They got around a 5x compile time reduction and a performance boost of anywhere from 0% to around 10% in general." [1]

Is this a knock on LLVM then?

I wonder then specifically if this brings to light any concerns over Swift (another dynamic language, and one created by the same person who created LLVM). [2]

Seems weird that the original creator of LLVM was able to make a dynamic language such as Swift - without any problems.

[1] https://news.ycombinator.com/item?id=11105231

[2] http://nondot.org/sabre/

jlebar 10 years ago |

It's worth noticing that most of the optimizations here are for "space" -- reducing the working set size or the number of memory accesses. CPUs have gotten much faster than memory blah blah. This is the sort of thing where microbenchmarks may mislead you, because your WSS is probably not realistic.

I think we don't have great tools for helping with this sort of optimization. One can use perf to find cache misses, but that doesn't necessarily tell the whole story, as you might blame some other piece of code for causing a miss. Maybe I should try cachegrind again...

panic 10 years ago |

Cool stuff! Does anyone know why the geometric mean is used for averaging benchmark scores rather than the usual arithmetic mean?

jsnell 10 years ago | |

I would say that geometric mean is the usual way of averaging benchmark scores. It has the property that a given relative speedup on a component benchmark always has the same effect on the aggregate score. With an arithmetic mean the component benchmarks with a longer runtime will dominate the aggregate. Normalizing the results before applying the arithmetic mean doesn't really help either -- the first X% improvement to a component benchmark would still be valued more than the second X% speedup.
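A small sketch of that property, using made-up runtimes for three component benchmarks (the numbers are hypothetical and chosen only to make the effect visible):

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean of the logs, i.e. the n-th root of the product.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def speed_up(times, index, factor):
    """Return a copy of `times` with one component made `factor` times faster."""
    out = list(times)
    out[index] = out[index] / factor
    return out


# Hypothetical runtimes in milliseconds: short, medium, long benchmark.
baseline = [10.0, 100.0, 1000.0]
fast_small = speed_up(baseline, 0, 2.0)  # 2x speedup on the short benchmark
fast_large = speed_up(baseline, 2, 2.0)  # 2x speedup on the long benchmark

# Improvement in the aggregate score (ratio > 1 means faster).
geo_small = geometric_mean(baseline) / geometric_mean(fast_small)
geo_large = geometric_mean(baseline) / geometric_mean(fast_large)
arith_small = arithmetic_mean(baseline) / arithmetic_mean(fast_small)
arith_large = arithmetic_mean(baseline) / arithmetic_mean(fast_large)

# Normalized scores (runtime / baseline runtime) under the arithmetic mean:
# halving one component twice in a row is rewarded less the second time.
first_halving = arithmetic_mean([1.0, 1.0, 1.0]) / arithmetic_mean([0.5, 1.0, 1.0])
second_halving = arithmetic_mean([0.5, 1.0, 1.0]) / arithmetic_mean([0.25, 1.0, 1.0])

print(geo_small, geo_large)
print(arith_small, arith_large)
print(first_halving, second_halving)
```

With the geometric mean, both 2x speedups improve the aggregate by the same factor (the cube root of 2, about 1.26). With the arithmetic mean, the speedup on the long benchmark dominates and the one on the short benchmark barely registers, and after normalization the first halving still counts for more than the second.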

DannyBee 10 years ago |

Interestingly, many of their complaints around pointer chasing, etc, are things LLVM plans on solving in the next 6-8 months. I'm a bit surprised they never bothered to email the mailing list and say "hey guys, any plans to resolve this" before going and doing all of this work. But building new JITs is fun and shiny, so ...

Joky 10 years ago | |

LLVM instruction selection is slow; there is a "fast-path" which hasn't received much attention (it is only used for -O0 in clang). The new instruction selector work has just started and will take a couple of years. Consider the tradeoff: spend 3 months on a new backend, or wait a few years for LLVM to be improved (without any guarantee of LLVM reaching the same speed as what they achieved). See also some thoughts from an LLVM developer on optimizing for high-level languages: http://lists.llvm.org/pipermail/llvm-dev/2016-February/09546...

DannyBee 10 years ago | | |

The problem with the path they've taken is it has a finite end.

This is why, when they compare it to v8/etc, it's kind of funny. They all have the same curve.

Basically all of these things, all of them, end up with roughly the same deficiencies once you cherry pick the low hanging fruit[1], and then they stall out, and get replaced a few years later when someone decides thing X can't do the job, and they need to write a new one. None of them ever get to a truly good state.

Rinse, wash, repeat.

The only way these things make real progress is by doing what LLVM did - someone works on it for years.

Let me quote a former colleague at IBM - "there is no secret silver bullet to really good compilers, it's just a lot of long hard work". If you keep resetting that hard work every couple years, that seems ... silly.

TL;DR If you really believe they've totally gotten everywhere they need to be in 3 months, I've got a bridge to sell you

[1] For example, good loop vectorization and SLP vectorization is hard.

Alphasite_ 10 years ago | |

I imagine they did, considering the head of LLVM is a (probably distant) coworker of theirs.

DannyBee 10 years ago | | |

"The head of LLVM" - no such thing exists, but okay. (People have such weird ideas about how these projects work in practice).

ck2 10 years ago |

tl;dr

https://webkit.org/blog-files/kraken.png

https://webkit.org/blog-files/octane.png

seriously though, dang, how many years of coding to get to that level of expertise

Ecco 10 years ago |

I'm really wondering about the politics behind all this. I mean both LLVM and WebKit are Apple projects (even though they're Open Source). So it would have been reasonable to expect an improvement of LLVM instead of ditching it altogether.

pcwalton 10 years ago | |

I highly doubt there are any politics behind it. LLVM is an AOT C/C++ compiler at its core, and the tradeoffs it makes don't always make sense for dynamic, JIT compiled languages with extreme emphasis on compilation speed like JS. Personally, I expected this to happen.

gsg 10 years ago | |

Specialising LLVM for dynamic compilation might well come at the expense of LLVM's strengths as an AoT compiler, and it would probably not be easy in any case.

In addition to that, dedicated implementations can take various shortcuts to make their job easier - there are some nice examples given in the link. LuaJIT is an example of a compiler project that benefits from being heavily specialised to a particular job, to remarkable effect.

zepto 10 years ago | |

If you read the technical explanation you will see that there are zero politics behind this.