Making Julia as Fast as C++ (2019) (flow.byu.edu)

86 points by d_tr 56 days ago | 62 comments

Punchline: they rewrote the code to look almost identical to C++, hand-held the compiler with @ macros to disable safety checks, and forced SIMD codegen and fastmath on.

End result: code that is uglier and still much slower than C++. Kind of a shame.

celrod 54 days ago

I was once a bit of a Julia performance expert, but moved toward C++ for hobby projects even while still using Julia professionally.

I wrote a blog post at the time with exactly that punchline (not explicitly stated, but just look at the code!): https://spmd.org/posts/multithreadedallocations/ The example was similar to a real production-critical hot path from work.

Maybe things have changed since I left Julia, but that was December 2023, four years after this blog post.

arbitrandomuser 54 days ago

Hey, what happened to LoopModels?

SatvikBeri 54 days ago

This is 7 years old. Julia is a totally different language by now.

As a quick anecdote, in our take-home interview exercise, we usually receive answers in C++ or Julia, and the two fastest answers have been in Julia.

HarHarVeryFunny 54 days ago

I'd have to guess that this is because of ease of use. C++ lets you get as close to the metal as you choose to, so there is no reason why a C++ solution shouldn't be at least as fast as one written in any other language, and yet ...

Of course, it also depends on which additional libraries you are using, especially for parallel/GPU programming in C++, but it's easy to believe that Julia out of the box makes it easy to write high-performance parallel software.

d_tr 54 days ago

> This is 7 years old.

Yeah, I actually totally forgot to check the date...

neutrinobro 54 days ago

Hardly seems worth the effort; perhaps things have improved since 2019. It would be interesting to see an updated benchmark, but if you're going to end up with code that looks like C++ to get proper performance, you might as well write it in C++. My biggest problem with Julia is that they decided to use column-major order for multi-dimensional arrays (i.e. FORTRAN/MATLAB style). This makes interoperability with C/C++ and Python NumPy a real pain, since you can't do zero-copy array sharing between them without one side being forced into strided access. For that reason alone I haven't adopted it in any of my workflows.

adrian_b 54 days ago

Actually the column-major order of Fortran is more efficient for some linear algebra operations than the order of C, which has been inherited by many modern languages that do not care about high performance in scientific computations.

So I would say that the culprit for the interoperability problems is C and its descendants, not Fortran or Julia. The designers of C and of the languages that imitated C did not give any thought to which order for multi-dimensional arrays is better, so the users of such languages have no right to blame the languages that did the right thing. Even if the Fortran order had not been better, it had already been in use for some 15 years before C, so there was no reason to choose a different order.

C chose to store arrays in the order in which humans typically read them on paper, but this is like the choice between big-endian and little-endian: big-endian is how Europeans wrote numbers, yet little-endian is more efficient on computers.

An example of why column-major order is preferable is the matrix-vector product, i.e. the evaluation of a linear map between vector spaces.

The matrix-vector product should not be computed as it is typically taught in school, by scalar products of the rows of the matrix with the vector, because this is less efficient, requiring more memory accesses. The right way to compute a matrix-vector product is by AXPY operations: each column of the matrix, scaled by the corresponding element of the vector operand, is added into the output (segments of the output are held in registers until all partial AXPY results are accumulated, avoiding memory accesses). In this case, each AXPY operation reads one column of the input matrix, which is much more efficient when the elements of a column are stored contiguously in memory, avoiding the need for strided accesses.
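A minimal C++ sketch of the column-by-column (AXPY) matrix-vector product described above, assuming a plain column-major buffer where element (i, j) of an m-by-n matrix sits at `a[i + j*m]`. `matvec_axpy` is a hypothetical name; in practice one would call an optimized BLAS `gemv`. This version does not do the explicit register blocking mentioned above and leaves that to the compiler; it only shows that every matrix read in the inner loop is unit-stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for an m-by-n matrix A stored column-major:
// element (i, j) is at a[i + j * m], so each column is contiguous.
std::vector<double> matvec_axpy(const std::vector<double>& a,
                                const std::vector<double>& x,
                                std::size_t m, std::size_t n) {
    assert(a.size() == m * n && x.size() == n);
    std::vector<double> y(m, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        const double xj = x[j];                 // scalar for this AXPY step
        const double* col = a.data() + j * m;   // start of column j (unit stride)
        for (std::size_t i = 0; i < m; ++i) {
            y[i] += xj * col[i];                // y += x[j] * A[:, j]
        }
    }
    return y;
}
```

For the 2-by-3 matrix [1 2 3; 4 5 6], stored column-major as {1, 4, 2, 5, 3, 6}, and x = {1, 1, 1}, `matvec_axpy(a, x, 2, 3)` returns {6, 15}.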

The same thing happens for matrix-matrix products, which must not be computed in the naive way taught in school, by scalar products of rows of the first matrix with columns of the second matrix, but by tensor (outer) products of columns of the first matrix with rows of the second matrix.
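The outer-product formulation can be sketched the same way, again assuming column-major storage (A is m-by-k, B is k-by-n). `matmul_outer` is a hypothetical name, and optimized BLAS `gemm` implementations add cache and register blocking that this sketch omits. The product is built as k rank-1 updates, and the innermost loop runs down contiguous columns of A and C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B, all column-major: A is m-by-k, B is k-by-n, C is m-by-n.
// Computed as a sum of k rank-1 (outer-product) updates:
// C += A[:, p] * B[p, :] for p = 0..k-1.
std::vector<double> matmul_outer(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t p = 0; p < k; ++p) {           // one outer product per p
        const double* acol = a.data() + p * m;      // column p of A (contiguous)
        for (std::size_t j = 0; j < n; ++j) {
            const double bpj = b[p + j * k];        // element (p, j) of B
            double* ccol = c.data() + j * m;        // column j of C (contiguous)
            for (std::size_t i = 0; i < m; ++i) {
                ccol[i] += acol[i] * bpj;           // unit-stride AXPY
            }
        }
    }
    return c;
}
```

For A = [1 2; 3 4] (stored as {1, 3, 2, 4}) and B = [5 6; 7 8] (stored as {5, 7, 6, 8}), the result is {19, 43, 22, 50}, i.e. [19 22; 43 50].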

csvance 54 days ago

Just reverse the axis order on one side, typically the Julia side. This is the convention used in Lux.jl/Flux.jl. I share memory between the two with zero additional copying for my workflows on a daily basis. If you are really allergic to doing this, I’m sure it’s possible to use metaprogramming / the type system to write it the same way in both places with zero performance overhead.
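Why reversing the axis order gives zero-copy sharing can be illustrated in C++, as a sketch assuming a contiguous row-major buffer like the one a C++ or NumPy array would hand over. `ColMajorView` and `reversed_axes_view` are hypothetical names, not part of Lux.jl, Flux.jl, or any other library. Reading a row-major rows-by-cols buffer with column-major addressing and the dimensions swapped yields exactly its transpose, so nothing is copied and the fastest-varying index stays unit-stride on both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning column-major view over a buffer owned elsewhere.
struct ColMajorView {
    const double* data;
    std::size_t nrows, ncols;
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < nrows && j < ncols);
        return data[i + j * nrows];   // column-major addressing
    }
};

// Wrap a row-major rows-by-cols buffer as a column-major cols-by-rows view.
// No data is copied: view(j, i) reads the same element as buf[i * cols + j],
// i.e. the column-major side sees the transpose of the row-major array.
ColMajorView reversed_axes_view(const std::vector<double>& buf,
                                std::size_t rows, std::size_t cols) {
    assert(buf.size() == rows * cols);
    return ColMajorView{buf.data(), cols, rows};
}
```

For the row-major 2-by-3 buffer {1, 2, 3, 4, 5, 6} (rows [1 2 3] and [4 5 6]), `reversed_axes_view(buf, 2, 3)` is a 3-by-2 column-major view whose element (0, 1) is 4, the same value the row-major side sees at (1, 0).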

brabel 54 days ago

> code that is uglier and still much slower than C++.

Oh, such a shame indeed! They didn’t even manage to at least produce better-looking code?? Julia was looking great in 2019, but it was still very buggy, so I stopped looking. I had hopes that by now it would be a good choice over C++ and Rust, with similar performance.

cmrdporcupine 54 days ago

There's simply no way it'd ever have similar performance to those. It's not possible.

I have always seen it as a potential alternative to Java, and definitely better than Python.

My experience working in it professionally was that it was... fine. But the GC in it was not good under load and not competitive with Java's.

drnick1 54 days ago

Came here to say that. It's just easier to write C++ in the first place, and LLMs now make this easier than ever.

2ndorderthought 54 days ago

I don't get the appeal. It's like an OSS Matlab, but all contributions are used directly so the language developers can make money for a parent company? Most OSS languages aren't run that way. Seems kind of scammy.

KenoFischer 54 days ago

It always amuses me when people assume that the nefarious scheme is taking open source contributions and selling them. That's not the nefarious scheme. The nefarious scheme is going to partners, funding agencies and investors and saying "look at this unique capability / important research / profitable business opportunity that we can do together, but oops, all of our code is written in Julia, so I guess we'd better pay some people to maintain it or it'll all come crashing down, wouldn't want that to happen".

Also, I'm of course using nefarious in jest here in both cases. While we don't directly try to monetize our open source work, I respect that sometimes people need to do that. As long as people are transparent about it, I don't have a problem. Doing the thing we're doing seems to work, but it's a lot harder, because you have to build a successful piece of software and one (or more) successful something-elses that have a critical dependency on it. It's like hitting the lottery twice.

csvance 54 days ago

Your baseline for comparison is a company that doesn't give anything away for free?

Also, contributing to open source is a choice, not a mandate. I greatly benefit from Julia and its ecosystem, so I chose to contribute back some of my work; no one forced me. I chose the MIT license because I want other people to be able to make money with it, just like I make money with other people's MIT-licensed stuff.

postflopclarity 54 days ago

the parent company is a consumer of Julia, and has no formal role in oversight or governance; they are of course invested in the success and performance of the language, but so are all other users!

andyferris 54 days ago

Meh, I’ve never been associated with the company, and AFAICT they provide value through platforms for enterprises. Not everyone gets OSS sponsorships to fund a team (and using a social media presence to achieve this was a post-Julia phenomenon).

It’s nothing like Google-the-ad-company influencing Chrome. The company consumes Julia for products to sell, rather. Maybe this affects the ordering of features landing, but… meh.

Syzygies 54 days ago

Julia is reasonably fast. I returned to a language comparison project specific to my math research, to see how I might do better. My agents and I studied the advice in the post and various more recent links from the comments, but we were already mostly on target and nothing left moved the needle.

My work is more combinatorial. Julia does excel at numerical computation. There's a tribal divide in math between people who can't go 30 seconds away from the real or complex numbers, and those whose tolerance is about that long. I try to keep an open mind, but I'm closer to the second camp. Julia is good enough to consider either way.

A development of recent months: AI can now assist in general-purpose Lean 4 programming, no longer getting confused by the dominant proof-oriented training corpus. If one is a functional programmer who believes that Haskell was on the right track, then Lean is the most interesting language choice for shaping one's thoughts. Benchmarks are inherently misleading if a better language makes it possible to express algorithms out of reach of more primitive languages.

https://github.com/Syzygies/Compare

            C++  100    13.08s  ±0.08s
           Rust   99    13.16s  ±0.02s
          Julia   90    14.54s  ±0.01s
             F#   90    14.54s  ±0.04s
  Kotlin-native   88    14.79s  ±0.01s
         Kotlin   86    15.18s  ±0.01s
          Scala   79    16.50s  ±0.08s
   Scala-native   76    17.14s  ±0.02s
            Nim   65    20.17s  ±0.01s
          Swift   64    20.54s  ±0.04s
          Ocaml   52    25.38s  ±0.04s
           Chez   49    26.64s  ±0.02s
        Haskell   37    34.96s  ±0.06s
           Lean   29    45.39s  ±0.15s
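The second column is not labeled, but it appears to be speed relative to C++, scaled so that C++ scores 100, i.e. round(100 × 13.08 s / time). A small C++ check of that reading, using the times copied from the table (the `Row` struct and function names are illustrative only):

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Rows copied from the table above: language, reported score, mean time (s).
struct Row {
    std::string language;
    int reported_score;
    double seconds;
};

const std::vector<Row> kRows = {
    {"C++", 100, 13.08},          {"Rust", 99, 13.16},
    {"Julia", 90, 14.54},         {"F#", 90, 14.54},
    {"Kotlin-native", 88, 14.79}, {"Kotlin", 86, 15.18},
    {"Scala", 79, 16.50},         {"Scala-native", 76, 17.14},
    {"Nim", 65, 20.17},           {"Swift", 64, 20.54},
    {"Ocaml", 52, 25.38},         {"Chez", 49, 26.64},
    {"Haskell", 37, 34.96},       {"Lean", 29, 45.39},
};

// Speed relative to a baseline time, scaled so the baseline scores 100.
long relative_speed(double baseline_seconds, double seconds) {
    return std::lround(100.0 * baseline_seconds / seconds);
}

// True if every reported score equals round(100 * t_C++ / t_language).
bool scores_match(const std::vector<Row>& rows) {
    assert(!rows.empty());
    const double baseline = rows.front().seconds;   // first row is C++
    for (const Row& r : rows)
        if (relative_speed(baseline, r.seconds) != r.reported_score) return false;
    return true;
}
```

Under this reading every row is consistent: `scores_match(kRows)` returns true.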

kmaitreys 54 days ago

I really like Julia as a language but I have struggled to adopt it and be productive in it. Part of it is because of the JIT runtime and a sub-par LSP (at least when I last tried).

To those who regularly write Julia code, what is your workflow? The whole thing with Revise.jl did not suit me, honestly. I have enjoyed programming in Rust orders of magnitude more because there's no runtime and you can do AOT compilation. My intention is not to write scripts but high-performance numerical/scientific code, and with Julia's JIT-based design, rapid iteration (to me at least) feels slower than in Rust (!).

ekjhgkejhgk 54 days ago

Phew. A 7-year-old post about a 10-year-old language. Triggers all the LLMs posting empty, generic responses: "Very interesting, exposes limitations...".

Prelude of what's to come in the self-reinforcing cycle of machines talking to machines and drowning everything else.

kelipso 54 days ago

It's a very predictable pattern, I swear. I thought it was mostly a Reddit thing, but dead internet theory is looking more and more real, even here.

mgkuhn 54 days ago

I'm always surprised when people describe Julia syntax as "Pythonic": Julia's syntax was clearly inspired by MATLAB rather than Python.

And that's a good thing, because Python+NumPy syntax is far more cumbersome than either Julia or MATLAB's.

You can see this at a glance from this nice trilingual cheat sheet:

https://cheatsheets.quantecon.org/

SatvikBeri 54 days ago

It's definitely closer to matlab than python, but it's closer to python than most mainstream programming languages. I ported ~20k lines of python code to Julia over a couple years manually, and for the most part could do line-by-line translations that worked (but weren't necessarily performant until I profiled and switched to using Julia idioms.)

ForceBru 54 days ago

Recent discussion on Julia Discourse: https://discourse.julialang.org/t/making-julia-as-fast-as-c/

mgkuhn 54 days ago

Note that this article is about Julia 1.0.3, whereas today you should consider as obsolete any experience reports involving Julia versions prior to Julia 1.10 (the current LTS version), the most significant milestone in the maturity and usability of the language.

orthogonal_cube 54 days ago

Dang, haven’t read much on Julia as of late. I remember using it for a CS 300-level course around 2016 when learning about tokenizing and parsing as part of language fundamentals. Julia has undoubtedly made some significant performance improvements since then. Would love to see a follow-up that explores what, if anything, from this still holds true and what improvements can be made.

FattiMei 54 days ago

Very interesting post and I think this exposes the limitations of the Julia compiler. Note that an old version of the compiler is used (1.0.3 from 2019).

One could say that we can almost replicate the semantics of a C++ program while writing in Julia. For example, we can remove bounds checks on arrays or remove hidden memory allocations.

But the goal of a language for numerical computing is capturing the mathematical formulas using high level constructs closer to the original representation while compiling to efficient code.

Domain scientists want to play with the math and the formulas, not do common subexpression elimination by hand in their programs. Just curious to see how it evolves.

northzen 54 days ago

I think the best compromise would be to get the best of both worlds: perform bounds checks by default, but have a compiler flag that skips them. That might break many programs written with the default behaviour in mind, but it would allow additional optimizations.

postflopclarity 54 days ago

this is exactly what julia does. bounds checks are on by default, and they can be disabled either locally, via the `@inbounds` macro, or globally, with the `--check-bounds=no` command-line flag

Woodi 52 days ago

Over the years there have already been almost identical articles about making a program in language X as fast as C or C++... And the results were exactly the same: write C/C++-style programs!

Why ?

Because of CPU architecture: given a CPU, one just needs to structure code in a way the CPU can execute efficiently! Is it so surprising that all the sugar and multi-functional smartness comes at the cost of all those ifs and loops, like maps? The CPU is just rock stupid and can't do anything else!

That's where all those specialized instructions come from, and programs just need to be structured or compiled the CPU architecture's way to run as fast as the CPU and the rest of the hardware allow...

And there are some "Java machines", and that is exactly the same story: use the CPU's native language :) As much as possible.

So: give us better cpus pls :)

kasperset 54 days ago

I wonder how Mojo ranks along with Julia. Mojo was discussed yesterday here. Mojo seems to be more python focused while Julia is very much focused on Scientific computation. I may be wrong.

vivzkestrel 53 days ago

- Why are all the newer posts on pages 1 and 2 of the blog empty? I literally only see the titles.

- Not a single post has anything inside here: https://flow.byu.edu/posts/

slwvx 56 days ago

From 2019