Julia 1.6 Highlights (julialang.org)
The Julia implementation ended up being about 2 to 3 times faster. I timed the core learning loops (the network evaluations and the gradient calculations and applications), and PyTorch and Julia performed similarly there. So it wasn't that Julia was faster at learning. Instead, all the in-between "bookkeeping" that was slow in Python was much faster in Julia, enough that overall it was 2 to 3 times faster.
(I was training on a CPU though. Things may be different if you're using a GPU, I don't know.)
Chose Julia instead. It took 4 hours to get everything sorted out (including getting IT to allow Julia's package manager to actually download stuff) and to have the first model running, just by putting a paper into code. Since the code is essentially just the math written out, this is a vast improvement in communication.
After fiddling around with it at home for a week, this was my first professional experience with it, and I'm blown away.
Thanks to everyone who is working on this language!
That requires a linear search through a list of all possible combinations of input arguments.
In a single dispatch language like most object oriented languages, you can do a simple dictionary/hash table lookup. Much faster.
With the JIT, Julia is able to optimize away most of these super slow lookups at runtime. Hence you get multiple dispatch for all functions, but with fantastic performance. Nobody had done that before.
JIT is not new, multiple dispatch is not new, and multiple dispatch + JIT also isn't new, but no existing languages combined them in a way that allows for the fantastic, efficient devirtualization of generic methods that julia is so good at.
This is why things like addition and multiplication are not generic functions in Common Lisp: it's too slow in CL because CLOS is not able to efficiently devirtualize the dispatch. In julia, everything is a generic function, and we use this fact to great effect.
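A minimal sketch of what dispatching on all arguments looks like (the `Shape`, `Circle`, `Square`, and `collide` names are hypothetical, chosen only for illustration). A single-dispatch language would select a method from the first argument alone; here the most specific method matching the types of both arguments is chosen. When the argument types are known at compile time, Julia can typically resolve such a call statically, so no lookup happens at runtime.

```julia
abstract type Shape end
struct Circle <: Shape end
struct Square <: Shape end

# One generic function; each method is selected on the types of BOTH arguments.
collide(::Shape, ::Shape)   = "generic shapes"
collide(::Circle, ::Square) = "circle hits square"
collide(::Square, ::Circle) = "square hits circle"
collide(::Circle, ::Circle) = "two circles"

# The element type here is the abstract Shape, so each call below dispatches
# at run time; the most specific matching method wins.
shapes = Shape[Circle(), Square()]
results = [collide(a, b) for a in shapes, b in shapes]  # results[i, j] == collide(shapes[i], shapes[j])

println(results)
```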
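To illustrate that arithmetic is an ordinary generic function in Julia, here is a small sketch with a hypothetical `Money` type that adds one method to `Base.:+`. Base's existing `+` methods are untouched, and calls on `Int` or `Float64` still go to their own specialized methods.

```julia
# `+` is a regular generic function with many methods defined in Base.
n_before = length(methods(+))

# Hypothetical user-defined type, for illustration only.
struct Money
    cents::Int
end

# Add one more method to Base's generic `+`.
Base.:+(a::Money, b::Money) = Money(a.cents + b.cents)

total = Money(150) + Money(275)
n_after = length(methods(+))

println(total, "  methods of +: ", n_before, " -> ", n_after)
```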
CLOS and Dylan laid a ton of important groundwork for these developments, but they're also not the same.
I guess you could try to hack together some kind of templating feature to make new type-specific classes on the fly, but this won’t work well with subtyping. Your template system could probably have (matrix float) as a subclass of matrix, but not of (matrix real) or (matrix number). I think you’d lose too much in Common Lisp’s hodge-podge type system.
A big innovation of Julia was figuring out how to make generic functions and multiple dispatch work in a good way with the kind of generic data structures you need for good performance. And this was not a trivial problem at all. Julia’s system lets you write generic numeric matrix code while still having float matrix multiplication done by LAPACK, which seems desirable.
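A minimal sketch of that pattern, assuming a hypothetical `mymul` function: a generic fallback method works for any element type that supports `*` and `+` (here exact rationals), while a more specific method for dense `Float64` matrices hands the work to BLAS through `LinearAlgebra.mul!`. Dispatch picks the BLAS path automatically; callers never have to choose.

```julia
using LinearAlgebra

# Generic fallback: any element type with * and + (rationals, integers, ...).
function mymul(A::AbstractMatrix, B::AbstractMatrix)
    m, n = size(A)
    n2, p = size(B)
    n == n2 || throw(DimensionMismatch("inner dimensions differ"))
    T = promote_type(eltype(A), eltype(B))
    C = zeros(T, m, p)
    for j in 1:p, i in 1:m, k in 1:n
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

# More specific method: dense Float64 matrices are handed to BLAS via mul!.
mymul(A::Matrix{Float64}, B::Matrix{Float64}) =
    mul!(similar(A, size(A, 1), size(B, 2)), A, B)

R = [1//2 1//3; 1//4 1//5]   # Matrix{Rational{Int}}: generic fallback, exact arithmetic
F = rand(3, 3)                # Matrix{Float64}: BLAS-backed method
println(mymul(R, R))
println(mymul(F, F) ≈ F * F)
```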
The other thing is that Julia is a language where generic functions are a low-level thing all over the standard library whereas Common Lisp has a mix of a few generic functions (er, documentation is one; there are more in cltl2), a few “pre-clos” generic functions like mathematical functions, sequence functions and to some extent some array functions, and a whole lot of non-generic functions.
Although CLOS and others do support it, Julia seems to take the cake by most metrics, highlighting that it is a core paradigm of the language, more so than in the others.
I just did a “using Plots” in 1.6.0, and it was fast enough to not care about the delta between Plots and, say, R loading ggplot.
Huge kudos to the Julia team.
julia> @time let
using Plots
plot([sin, cos])
end
11.267558 seconds (17.98 M allocations: 1.114 GiB, 4.83% gc time)
Versus Matlab, which probably takes about 15 seconds just to open the editor, but plotting is very fast:
>> tic
fplot( @(x) [sin(x) cos(x)])
toc
Elapsed time is 0.374394 seconds.
Julia is just about as fast as Matlab for plotting after the first run.
[1] https://www.oxinabox.net/2021/02/13/Julia-1.6-what-has-chang...
But I'll certainly give it a try, because Julia hits a sweet spot between expressiveness and speed (at least for the kind of work I do: matrix, algorithm, and graph computations).
I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
> I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
If you use immutable data-structures in julia, you're rather unlikely to end up with any heap allocations at all. Unlike Java, Julia is very capable of stack allocating user defined types.
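A small sketch of that claim, using a hypothetical immutable `Vec2` type: the loop creates a new `Vec2` on every iteration, yet measuring inside a compiled function (after a warm-up call, so compilation isn't counted) should report zero heap allocations, since the values can live in registers or on the stack. Exact numbers may vary across Julia versions.

```julia
# Immutable struct with plain-bits fields (hypothetical, for illustration).
struct Vec2
    x::Float64
    y::Float64
end

Base.:+(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)

function sum_vecs(n::Int)
    acc = Vec2(0.0, 0.0)
    for i in 1:n
        acc += Vec2(i, 2i)   # a new immutable value each iteration
    end
    return acc
end

# Measure heap allocations inside a compiled function, after warming up.
allocs(n) = @allocated sum_vecs(n)
sum_vecs(1); allocs(1)

println(sum_vecs(1000), "  allocated bytes: ", allocs(1000))
```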
Check out StaticCompiler.jl.
Or are there steps to produce a binary (much like Go or C or Rust)?
Making a nice small static binary is technically possible using an approach similar to what GPUCompiler.jl does, but the CPU equivalent of that isn’t quite ready for primetime.
not that my suggestion is good, but what they have now is bad
Will there be an M1 Mac version for 1.7?
> `git clone https://github.com/JuliaLang/julia` and `make` should be enough at this point.
https://github.com/JuliaLang/julia/issues/36617#issuecomment...
For the package ecosystem as a whole, we will be slowly increasing the number of third-party packages that are built for aarch64-darwin, but this is a major undertaking, so I don't expect it to be truly "finished" for 3-6 months. This is due to both technical issues (packages may not build cleanly on aarch64-darwin and may need some patching or updating, especially since some of our compilers, like gfortran, are prerelease testing builds; building for aarch64-darwin means that the packages must be marked as compatible with Julia 1.6+ only, due to a limitation in Julia 1.5 and earlier; etc.) and practical ones (our packaging team is primarily volunteers, and they only have so much bandwidth to help fix compilation issues).
Julia almost consistently loses to Go, Crystal, Nim, Rust, Kotlin, and Python (PyPy, NumPy): https://github.com/kostya/benchmarks
Is this because of bad typing, or because they didn't use Julia properly, in an idiomatic manner?
https://github.com/JuliaLang/julia/issues/40190
Edit: it's now building:
https://github.com/JuliaLang/docs.julialang.org/runs/2196972...
By the download speed, it might take a few hours before I can plot something.
It also seems that just doing "git clone JuliaRegistries/General.git" is much faster than doing "] add Plots"
See the docs which kinda read like blog posts: https://juliasimd.github.io/LoopVectorization.jl/stable/
And then replacing the matmul.jl with the following:
@avx for i = 1:m, j = 1:p
    z = 0.0
    for k = 1:n
        z += a[i, k] * b[k, j]
    end
    out[i, j] = z
end
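For anyone without LoopVectorization installed, here is a plain-Julia version of the same loop nest, written as a hypothetical `matmul!` function, that can serve as a correctness check and a timing baseline for the `@avx` version. It checks sizes up front and then disables bounds checks with `@inbounds`; the `j`-outer, `i`-inner order walks `out` in column-major order.

```julia
# Plain-Julia baseline of the same triple loop (no LoopVectorization).
function matmul!(out::AbstractMatrix, a::AbstractMatrix, b::AbstractMatrix)
    m, n = size(a)
    n2, p = size(b)
    n == n2 || throw(DimensionMismatch("size(a, 2) != size(b, 1)"))
    size(out) == (m, p) || throw(DimensionMismatch("output has the wrong size"))
    @inbounds for j in 1:p, i in 1:m
        z = zero(eltype(out))
        for k in 1:n
            z += a[i, k] * b[k, j]
        end
        out[i, j] = z
    end
    return out
end

a = rand(200, 300); b = rand(300, 100)
out = matmul!(zeros(200, 100), a, b)
println(out ≈ a * b)
```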
I get a 4x speedup, from 2.72s to 0.63s. And with @avxt (threaded) using 8 threads, it goes down to 0.082s on my AMD Ryzen CPU. (So this is not dispatching to MKL/OpenBLAS/etc.) Doing the same in native Python takes 403.781s on this system; I haven't tried the others.
A fair comparison with C++ would be to at least include the compilation/linking time in the time reported.
Ditto for Java or any JVM language (you'd have JVM startup cost but that doesn't count the compilation time for bytecode).
Generally, for stuff like this (scientific computing benchmarks), you want to run a lot of computation precisely to avoid this, i.e. you want to allow the cost of compilation and startup to be fairly amortized.
Reactions like this are seen everywhere in the Julia community. My advice is to embrace the negativity around the language, try to understand whether it is fabricated or legitimate, and address the shortcomings.
Julia is a beautiful language, and I hope some of its warts get fixed.
* interesting (not fun) to find out how Julia works
* annoying AF to discover that much of the teaching material was hidden behind some 3rd party website, presumably in videos (I didn't bother to register, but started browsing the manual instead). What's wrong with text?
* unnecessarily complex because the documentation for the basic functions is nearly inaccessible to beginners.
But, I managed to get a simple layout system up and running, and it wasn't fast. I rewrote it in Go (the language in which I'm currently working most), and it was literally >100x faster. And that should not be due to the startup costs, because a backtracker shouldn't have that much overhead JIT-ing.
I think I can now say that I can't see the use case for Julia. "Faster than Python" is simply not good enough, and for the rest there are no redeeming features. Perhaps the fabled partial differential equation module is worth it, but that can get ported to other languages, I guess.
Coming from a Python and C++ background, I found it sufficient to just read the docs and do some Advent of Code problems to get productive in Julia. What videos are you talking about? https://docs.julialang.org/en/v1/manual/performance-tips/ I found to be a pretty good document on why and when Julia can be slow.
Julia performance should be on par with Go; if it's slower, read the performance tips in the manual. As for teaching material on 3rd party websites, I don't know what you mean. The Julia manual is available from the julialang.org website.
As for re-writing DifferentialEquations, that is extremely strongly tied to the multiple dispatch paradigm, re-writing it would be hard. What you can get is wrappers like diffeqpy and diffeqr, which call out to Julia.
Here's code I ran, with results:
julia> using GalaxyBrain, BenchmarkTools
julia> bench = bf"""
>++[<+++++++++++++>-]<[[>+>+<<-]>[<+>-]++++++++
[>++++++++<-]>.[-]<<>++++++++++[>++++++++++[>++
++++++++[>++++++++++[>++++++++++[>++++++++++[>+
+++++++++[-]<-]<-]<-]<-]<-]<-]<-]++++++++++."""
julia> @benchmark $(bench)(; output=devnull, memory_size=100)
BenchmarkTools.Trial:
memory estimate: 352 bytes
allocs estimate: 3
--------------
minimum time: 96.706 ms (0.00% GC)
median time: 97.633 ms (0.00% GC)
mean time: 98.347 ms (0.00% GC)
maximum time: 102.814 ms (0.00% GC)
--------------
samples: 51
evals/sample: 1
julia> mandel = bf"(not printing for brevity's sake)"
julia> @benchmark $(mandel)(; output=devnull, memory_size=500)
BenchmarkTools.Trial:
memory estimate: 784 bytes
allocs estimate: 3
--------------
minimum time: 1.006 s (0.00% GC)
median time: 1.009 s (0.00% GC)
mean time: 1.011 s (0.00% GC)
maximum time: 1.022 s (0.00% GC)
--------------
samples: 5
evals/sample: 1
Note that, conservatively, GalaxyBrain is about 8 times faster than C++ on "bench.b" and 13 times faster than C on "mandel.b," with each being the fastest language for the respective benchmarks. In addition, it allocates almost no memory relative to the other programs, which measure memory usage in MiB.
You could argue that I might see a similar speedup for other languages on my machine, assuming I have a spectacularly fast setup, but this person ran their benchmarks on a tenth-generation Intel CPU, whereas mine's an eighth-generation Intel CPU:
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
This package is 70 lines of Julia code. You can check it out for yourself here: https://github.com/OTDE/GalaxyBrain.jl
I talk about this package in-depth here: https://medium.com/@otde/six-months-with-julia-parse-time-tr...
But note that OP uses larger cells (`int` = 32 bit in the C version, `Int` = 64 bit in the Julia version) while GalaxyBrain seems to use 8 bit cells. Not that I expect this to make a major difference (but perhaps a minor one?)
Second, to write very fast Julia you need to know a lot of "tricks", and in most cases it won't be as easy as writing normal code.
And everyone claiming this benchmark measures compilation time (XD?), or doesn't include JIT time, could just look at the code/README for 5s before commenting.
Julia is fast and can be as fast as C, but not in all cases, and not as easily as it seems.
That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.
In terms of performance, Julia provides the following:
1. Zero-cost abstractions. And since it has homoiconic macros, users can create their own zero-cost abstractions, e.g. AoS to SoA conversions, auto-vectorization. Managing the complexity-performance trade-off is critical. But you don't see that in micro-benchmarks.
2. Fast iteration speed. Julia is optimized for interactive computing. I can compile any function into its SSA form, LLVM IR, or native assembly. And I can inspect this in a Pluto notebook. Optimizing Julia is fun, which is less true in other languages.
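For example, the inspection functions can also be called programmatically. A sketch with a hypothetical `axpy` function: `code_typed` returns the inferred SSA-form IR, and `code_llvm`/`code_native` from the InteractiveUtils standard library (loaded automatically in the REPL) print LLVM IR and assembly, which `sprint` captures as strings. In the REPL, the macro forms `@code_typed`, `@code_llvm`, and `@code_native` do the same for a call expression.

```julia
using InteractiveUtils   # code_llvm / code_native; already loaded in the REPL

axpy(a, x, y) = a * x + y

sig = Tuple{Float64, Float64, Float64}
typed   = code_typed(axpy, sig)            # Vector of (CodeInfo => inferred return type)
llvm_ir = sprint(code_llvm, axpy, sig)     # LLVM IR as a String
asm     = sprint(code_native, axpy, sig)   # native assembly as a String

println(typed[1])
println(llvm_ir)
```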
I don't know about linked lists, though.
I’m not sure what issue you think you identified by googling such a short problem description, but it seems like it could be any of:
- slow internet connection
- firewall / proxy issues
- antivirus gumming things up
- file system being slow when dealing with lots of small files (Windows mostly)
- precompiling Plots took longer than expected
- precompiling Plots hit a deadlock
- loading Plots took longer than expected
- loading Plots hit a deadlock
- something else?
Worse, what “stuck” means is also ambiguous. Does that mean it failed with an error? Does that mean a download started but then was too slow for the user’s taste? Does it mean that a download started but never got any data at all? How long did the user wait?
My best guess is that git cloning the registry on Windows is taking a long time and isn’t actually stuck. There’s a fix for that being worked on for 1.7 (don’t unpack registries).
All four take seconds to start (or even less), and Julia felt like a huge step back in productivity.
If it's really fixed, might be good trying again.
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.6.0 (2021-03-24)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
(@v1.6) pkg> add Plots
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Installed Qt_jll ─ v5.15.2+3
Downloaded artifact: Qt
Updating `~/.julia/environments/v1.6/Project.toml`
[91a5bcdd] + Plots v1.11.0
Updating `~/.julia/environments/v1.6/Manifest.toml`
[ede63266] ↑ Qt_jll v5.15.2+2 ⇒ v5.15.2+3
Progress [========================================>] 246/246
246 dependencies successfully precompiled in 140 seconds
julia> @time using Plots
3.689727 seconds (6.58 M allocations: 472.965 MiB, 7.49% gc time, 0.13% compilation time)
julia> @time begin
using Plots
x = 1:10; y = rand(10); # These are the plotting data
plot(x, y)
end
3.050765 seconds (3.63 M allocations: 218.824 MiB, 4.87% gc time, 59.07% compilation time)
julia> @time begin
using Plots
x = 1:10; y = rand(10); # These are the plotting data
plot(x, y)
end
0.001435 seconds (2.61 k allocations: 161.836 KiB)
It also has a feature (I can’t recall the name) which is a bit like fexprs (let’s say macros whose inputs are the types of the arguments of a function) that can generate customised code (e.g. an FFT depending on the input size) on the fly.
(but I don't find it helpful to compare to fexprs, which I think of as more about deferring evaluation, whereas generated functions are about "staged programming".)
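A minimal sketch of that staged-programming idea (the function name `tuple_sum` is hypothetical; Base's own `sum` already handles tuples well, so this only illustrates the mechanism). Inside an `@generated` function, the body runs at compile time with the argument *types* and static parameters, and returns an expression that becomes the method body for those types; here it emits a fully unrolled sum whose length depends on the tuple length `N`.

```julia
# The generator sees N and T as values and builds an expression;
# the runtime argument `t` is only referenced inside the quoted code.
@generated function tuple_sum(t::NTuple{N, T}) where {N, T}
    ex = :(zero($T))
    for i in 1:N
        ex = :($ex + t[$i])   # unrolled: zero(T) + t[1] + t[2] + ... + t[N]
    end
    return ex
end

println(tuple_sum((1, 2, 3)))   # body generated for NTuple{3, Int}
println(tuple_sum((1.5, 2.5)))  # separate body generated for NTuple{2, Float64}
```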
It's a very poor design: they don't have variable declarations, and they have to jump through hoops to provide acceptable behaviour.
Anyway, it is a bad thing in the language; let's not defend it (I hope).
Julia still has a lot to offer; I guess it can afford one design flaw.
And as I said: I wrote a straightforward backtracker. It's just recursive function calls: check a possible state for the current item, and when successful, update the overall state and move on to the next item; on return, try another state for the current item, until the search space is exhausted. There's not a lot to optimize, nor is there a lot of work for a JIT compiler.
> on the most flimsy basis
I've got more gripes. Forward type declaration to name one. But I'm not spewing disdain: I just don't see Julia take a larger role in general software development.
As for performance, I'm not really talking about 'optimization'. Your implementation may simply have used some pattern that should be avoided, such as global variables, type instabilities, abstract types in structs, or some inappropriate data structures. If it's a microbenchmark, then there are some things to keep in mind.
These are not really optimizations, but basic performance principles. I cannot know that you are unaware of them, but your statement that 'there's not a lot to optimize' makes me suspect that this could be the case. The unusual thing about Julia is that it's both dynamic and compiled, so that code that would simply not compile in static languages instead ends up slow.
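As a concrete example of the "abstract types in structs" pitfall mentioned above, here is a sketch with two hypothetical box types. With a field declared as the abstract type `Real`, code that reads `b.x` can't know its concrete type and falls back to dynamic dispatch, whereas the parametric version records the concrete field type in the struct's own type. Both compute the same result; only the generated code differs.

```julia
struct SlowBox
    x::Real          # abstract field type: the concrete type of x is unknown to the compiler
end

struct FastBox{T<:Real}
    x::T             # concrete once T is known, e.g. FastBox{Float64}
end

double(b) = 2 * b.x

println(fieldtype(SlowBox, :x), " vs ", fieldtype(FastBox{Float64}, :x))
println(double(SlowBox(1.5)), " == ", double(FastBox(1.5)))
```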
Failed or bad type inference is almost always the cause of performance issues in Julia. Getting a feel for when the compiler can infer things or not takes practice, but it’s a lot easier than the semantics of generic programming systems IMO.
The REPL is really great for learning. If you type “Array{Int} == Array{Int, 1}” the result is false. If you type “?Array” it prints the docstring which gives some guidance on how to use one versus the other.
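The same experiment, plus a few related checks; the comments give what Julia reports for each line:

```julia
println(Array{Int} == Array{Int, 1})    # false: Array{Int} leaves the rank N unspecified
println(Array{Int} isa UnionAll)        # true: it means Array{Int, N} where N
println(Array{Int, 1} === Vector{Int})  # true: Vector{T} is an alias for Array{T, 1}
println(Array{Int, 1} <: Array{Int})    # true: each Array{Int, N} is a subtype of it
```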
The really nice thing about CLOS, though, is that the meta-object protocol lets you choose an implementation of OOP that makes sense for your use-case.
Julia makes the runtime cost of (type stable) dispatch zero, and hence does not even give julia programmers an *option* to write non-generic functions (though it can be hacked in like with FunctionWrappers.jl). I'm not familiar with Strandh's work, but has it made the overhead of generic functions low, or has it completely eliminated it?
Another thing I'll mention is that Julia's type system is parametric, and we allow values (not just types) in our type parameters, which is immensely useful for writing generic high-performance code. You can specialize methods on matrices of complex integers separately from rank 5 arrays of rational Int8s, for instance. This is not a capability that CLOS or Dylan has as far as I'm aware, though the common refrain is that you can do it with macros, but that neglects that it's rather hard to get right, and will have limited use because such macro implementations of type parameters won't be ubiquitous.
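A minimal sketch of that kind of specialization, with hypothetical function names: the first two methods dispatch on the element type and on the array rank `N`, an integer stored as a type parameter, and `Val` lifts an ordinary value (here a `Symbol`) into the type domain so it can be dispatched on as well.

```julia
describe(::AbstractMatrix{Complex{T}}) where {T<:Integer} = "matrix of complex integers"
describe(::AbstractArray{Rational{Int8}, 5}) = "rank-5 array of Rational{Int8}"
describe(::AbstractArray) = "some other array"

# The rank N is an integer living in the type, so it can be read back as a value.
rank_of(::AbstractArray{T, N}) where {T, N} = N

# Val lifts a plain value (here a Symbol) into a type parameter for dispatch.
unit_scale(::Val{:km}) = 1000
unit_scale(::Val{:m})  = 1

println(describe(zeros(Complex{Int}, 2, 2)))
println(describe(zeros(Rational{Int8}, 1, 1, 1, 1, 1)))
println(describe(rand(3)))
println(rank_of(zeros(2, 3)), " ", unit_scale(Val(:km)))
```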
________________________________
To be clear though, I'm not hating on Common Lisp or CLOS. The Common Lisp ecosystem is awesome and can do all sorts of really cool things that I wish we had in julia. I'm mostly just pushing back on the notion that Julia doesn't do anything new or interesting.
It’s not that it’s the first, but it seems to be the first where the use of multiple dispatch throughout the community was sufficiently pervasive to kick-start the emergence of the strong network effects we’re now seeing w/r/t composability.
I would not be surprised to see more languages working to emulate this kind of combination of multiple dispatch and JAOT compilation in the future.
It's also nice to see that you (personally) are sponsoring zig development. There is so much more room for improvement in the arena of programming languages. Infrastructure like this is a huge multiplier.
Most of this is just a relic from StaticCompiler.jl being a very straightforward slight repurposing of GPUCompiler.jl. It will take some work to make it robust on CPU code, but the path to doing so is pretty straightforward. It just requires dedicated work, but it's not a top priority for anyone who has the know-how currently.
Do you have any links to what has been currently done?
EDIT: Found this thread so far: https://discourse.julialang.org/t/statically-compiled-and-st...
I think this isn't really a great place for beginners though unfortunately. This project is tightly coupled to undocumented internals of not only julia's compiler, but also LLVM.jl and GPUCompiler.jl. It'll require a lot of learning to be able to meaningfully contribute at this stage.
"Put your code into functions, people!" is the reason why most people don't notice the weird scoping rules.
You will only hit the weirdness face first, if you write scripts with global variables, which is usually what beginners do
Most advanced users, and library writers, probably hardly notice it
You were courageous to even try :-)
From their refusal to see any use in explicit variables declarations, their (somewhat related) huge scope debacle, to its strange and irregular 'resolution', not to mention the original absurdly weird propositions they had made to resolve it: the scope and variable declaration subject is pretty hopeless in Julia land. I quickly gave up on it years ago (long before the scope debacle), as I had no intention of losing my time, when I saw the arguments and the logic they used.
Framing this as a case of irrational and illogical behaviour is unnecessary and unreasonable in my opinion. A lot of serious thought and debate went into the resolution. There is no need to disrespect and badmouth people because they have different priorities than you.
In a delightful twist, it seems that there is a Julia implementation of a Brainfuck JIT that is much faster than the fastest interpreter that is benchmarked here, so even by this somewhat esoteric benchmark, Julia ends up being absurdly fast.
Does the man have to be obsequious every time he discusses his language in an informal setting?
I've never seen a 10 million entry immutable set on the stack but I could be wrong.
Edit: reading your other post, it seems like you may mean persistent data structures, a la Clojure, rather than immutable structures, which are quite different. The former would indeed always be heap-allocated (it's necessary since they are quite pointer-heavy). Immutable structures, on the other hand, are detached from any particular location in memory.
Moreover, if the elements in an array are mutable, e.g. Java objects, then each one needs to be individually heap allocated with a vtable pointer, and the array has to be an array of pointers to those individually allocated objects. For pointer-sized objects (say an object that has a single pointer-sized field), that takes 3x memory to store x objects, so that's already brutal, but worse is that since the objects are all individually allocated, the GC needs to look at every single one, and freeing the space is a fragmentation nightmare. If the objects are immutable (and the type is final; btw all concrete types are final in Julia), then you can store them inline with no overhead, and the GC can deal with them as a single big block.
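The layout difference can be checked directly. In this sketch (hypothetical `PointI`/`PointM` types), the immutable struct is an `isbits` type, so a `Vector{PointI}` stores the two `Float64` fields of each element inline, while a `Vector{PointM}` would store a pointer per element to a separately heap-allocated object.

```julia
struct PointI            # immutable: a plain-bits value
    x::Float64
    y::Float64
end

mutable struct PointM    # mutable: each instance is a separate heap object
    x::Float64
    y::Float64
end

println(isbitstype(PointI), " ", isbitstype(PointM))

# Elements stored inline: 1_000 points * 16 bytes, no per-element header or pointer.
v = Vector{PointI}(undef, 1_000)
println(sizeof(v))
```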
Btw, I had to vouch for you to undead your posts in order to reply. Looks like you got downvoted a bunch.
In other posts you actually argue that GCs help you reduce complexity because manual memory management is too much of a hassle.
Maybe immutable is not the correct term; persistent data structures are what I'd like support for: that is my use-case.
I think you can have efficient persistent data structures without a GC, but that requires fast reference counting and in turn, that requires a lot of work to be competitive with the JVM.
I also understand that my use-case is not Julia's focus. That's perfectly fine.
Do you happen to have any links to the static binary procedure? Or links to the current state of efforts for this?
t = time()
results = calc(n)
elapsed = time() - t
So startup time at least isn't included.
One might argue that this is still biased against Julia due to its compilation strategy, but fixing that would mean you'd have to figure out what the appropriate way to get 'equivalent' timings for any of the other languages would be as well, something far more involved than just slapping a timer around a block of code in all cases...
edit: As pointed out below, the Julia code should indeed already have been 'warmed up' due to a preceding sanity check. My apologies for 'lying'...
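The warm-up pattern the benchmark relies on can be sketched as follows. Here `calc` is a hypothetical stand-in workload, not the benchmark's actual function: the first call of a method pays for JIT compilation, so timing only after a warm-up call measures the compiled code. For interactive work, BenchmarkTools' `@btime`/`@benchmark` (as used elsewhere in this thread) take care of warm-up themselves.

```julia
# Hypothetical stand-in workload (not the benchmark's real calc).
calc(n) = sum(i -> sin(i)^2, 1:n)

t_cold = @elapsed calc(10)   # first call: includes JIT compilation of calc
t_warm = @elapsed calc(10)   # compiled method already exists

# Time the real run only after the warm-up, as the benchmark code does.
t = time()
results = calc(1_000_000)
elapsed = time() - t
println("cold: ", t_cold, " s, warm: ", t_warm, " s, timed run: ", elapsed, " s")
```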
I think what s/he meant to say is that Julia is not "magically" faster than other languages. The real questions are:
1. Can unoptimised Julia code run as fast as unoptimised c/c++ code? I think the linked benchmark suggests this is not really the case.
2. Can optimised Julia code run faster than comparably (i.e. requiring similar amount of effort and expertise) optimised c/c++ code? If not, then why use Julia?
That's somewhat true, and is at the end-point of some mismatched expectations when folks come to Julia. Julia is a high-level dynamic language whose semantics are conducive to creating the ~same performance as static languages.
So if your unoptimized Julia program relies upon traditional "dynamic" features like `Any[]` arrays, then you should expect to see dynamic- (read: python-) like performance out of Julia. Julia should match performance of other dynamic languages here, but the compiler doesn't have all the typical dynamic optimizations because, well, it's often easy to write your code in a manner that ends up hitting the happy path that gets the static-like performance.
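A small sketch of the difference between the two paths (the `total` function is hypothetical): the same loop over an `Any[]` array must look up the `+` method for every element at run time, while a `Vector{Float64}` lets the compiler generate a tight, statically dispatched loop. The results are identical; only the speed differs.

```julia
xs_any = Any[rand() for _ in 1:10_000]   # element type Any: each element is boxed
xs_f64 = Float64[x for x in xs_any]      # element type Float64: stored inline

function total(xs)
    s = 0.0
    for x in xs          # for xs_any, each `+` is resolved at run time
        s += x
    end
    return s
end

println(total(xs_any) == total(xs_f64))  # same answer, very different code paths
```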
Conversely, if your dynamic language baseline is just glue to an optimized static library, then you should expect to see static-like (read: C/C++-like) performance out of your dynamic language. Julia really should match performance here, and if it doesn't, open an issue: it's a bug.
Where Julia truly excels are the cases where you don't have a library implementation (like numpy) to lean on and find yourself writing a hot `for` loop in a dynamic language. Further, it excels at facilitating library creation, leading to more and more first-class ecosystems that are best-in-class like DiffEq.
Dynamic dispatch is slow in any language, including C/C++ (provided that the compiler can't devirtualize the method). This is why such things are never done in an inner loop.
In C++, it's harder to "accidentally" use dynamic dispatch because you have to explicitly annotate a function as being virtual. In Julia, which is much more concise, type stability or instability is implicit. But it can be inspected statically via @code_warntype. Good IDE plug-ins can make it easier.
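Type instability can also be checked without an IDE. In this sketch (hypothetical `unstable`/`stable` functions), `Base.return_types`, an internal but long-standing reflection function, reports what inference concluded; `@code_warntype unstable(1)` in the REPL shows the same `Union` highlighted.

```julia
unstable(x) = x > 0 ? x : 0.0      # Int input: may return an Int or a Float64
stable(x)   = x > 0 ? x : zero(x)  # return type always matches the input type

println(Base.return_types(unstable, (Int,)))  # inferred: Union{Float64, Int64}
println(Base.return_types(stable, (Int,)))    # inferred: Int64
```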
If you have saved a couple of minutes or hours of coding and are only going to run that code a handful of times, it should not matter if it runs a second or two slower than C/C++. This is the same rationale that Python and other scripting languages have. But unlike Python, you should be able to match the speed of C/C++ or get pretty close by optimizing your code.
1. Can unoptimised Julia code run faster than unoptimised Python code (with numpy being used to do the heavy lifting)?
Let's say one is prototyping some algorithm so iteration speed is more relevant than running speed. Then one can choose either Julia or Python (with the help of numpy perhaps) and get an implementation in similar timeframes. So Julia won't necessarily be more attractive here.
Now if the prototype proved that running speed is very critical to the successful application of the algorithm, then it would mean the developer now has to optimise the hell out of it. One can either:
1. Optimise the Julia codebase, if Julia was used to prototype, following the many tips and tricks available (e.g. type stability, various macros, etc.).
2. Port the algorithm to C/C++, applying the many performance best practices that people have accumulated over the years.
So if the optimised C/C++ port is capable of being any faster than the optimised Julia code, then the rational choice would be to port the implementation using C/C++; it would also mean Python would have some advantage over Julia in the prototyping phase too due to its popularity. Otherwise I'd agree that using a single language to both do prototyping and production is the best.
The downside is that generating system images can be quite slow, so we're still working on ways to generate them incrementally. In any case, if you're inspired to work on this kind of stuff, it's definitely something the entire community is interested in!
That however means that some packages get a preferred status in the Julia ecosystem.
Try this:
matlab -nosplash -nodesktop -r "tic; fplot( @(x) [sin(x) cos(x)]); toc"
Regarding performance, I don't know much about this in Dylan. Was Dylan able to completely remove the runtime overhead of multiple dispatch for type stable code?
Also note that it was competing against C++ as the Newton OS system programming language, it only lost due to politics.
Pioneering and ahead of its time in many ways, but for whatever reason the use of multiple dispatch in Dylan seems not to have (yet?) led to the same level of ecosystem-wide composability.
I have heard it has to do with how Windows antivirus works. Since the registry is something like 10,000 separate files, it chokes on them.
I have heard there is an upcoming feature to allow the package manager to work with the registry being kept inside a tarball, which is specifically being added to deal with this
I gave up because it's just too slow (more than 15 min to get to 3%).
Maybe plotting is indeed faster, but I don't think I'll actually get to that point.
Problem disappeared.
PackageCompiler.jl just compiles everything and packages it up. It generates huge files, because it doesn't discriminate on which compiled stuff to include.
n = length(ARGS) > 0 ? parse(Int, ARGS[1]) : 100
left = calc(101) # <------- THIS IS COLD START JITTING CALL
right = -18.67
if abs(left - right) > 0.1
    println(stderr, "$(left) != $(right)")
    exit(1)
end
notify("Julia (no BLAS)\t$(getpid())")
t = time()
results = calc(n)
elapsed = time() - t
notify("stop")
It's true that a simpler GC might get you acceptable performance for your requirements if your allocation rate is relatively low, but you still won't get OpenJDK performance. So I'd say that if you design your language to require fewer objects, then you can get by with a simple GC if your performance requirements aren't too demanding.
All that dereferencing puts a higher load on data structure traversal (which is why Java is getting "flattenable" types) than on the GC. The main reason for Java's particular GC challenges isn't its pointer-heavy (pre-Valhalla) design but the mere fact that it is the GCed platform that sees the heaviest workloads and most challenging requirements by far. Java's GC needs to work hard mostly for the simple reason that Java is asked to do a lot (and the better some automated mechanism works, the more people push it).
Go even reverted a generational collector because it had no performance benefits, since most young objects would be stack allocated anyway. Julia’s JIT and far more advanced LLVM backend should do even better than Go at keeping objects stack-local and inline.
'Unoptimized' code should still observe most of the performance tips in the manual (such as avoiding globals and type instability), while 'naive' code frequently does not. With some experience, you never write naive code, even for quick prototypes.
In those cases, Julia should outperform other dynamic languages significantly, and approach static languages in most cases.
Proper optimization means going in and removing allocations, ensuring that operations vectorize (simd), tailoring data structures for performance, adding parallelism etc. In the latter case Julia should virtually _always_ match static languages closely, otherwise it merits investigation.
The thing with Python is that 'naive' Python code is already pretty close to 'unoptimised' Python code, so one can write naive Python code with numpy and still end up with not-too-shabby performance, provided they chose an efficient algorithm, of course. In other words, there are not as many performance mistakes one can make with Python (perhaps because it can't get any worse). I imagine that's also why so many Python users who tried Julia were disappointed that direct translations of their Python program fail to perform as fast as advertised.
So there's often a huge difference between "unoptimized code" (something written by an experienced developer who's deliberately taking the easy way out) and "naive code" (something a newcomer might write). The performance gap can literally be orders of magnitude.
I agree that there isn't as much to learn about Python performance. But of course that's largely because Python offers far fewer opportunities for optimization.
So I'm not saying you must prefer Java to Go (even though GC tuning is a thing of the past as of JDK 15 or 16), or that Go's performance isn't adequate for many reasonable workloads, only that 1. a flatter object landscape might still not match Java's memory management performance without sophisticated GCs, and 2. I wouldn't extrapolate from Go to Julia, as they are languages targeting very different workloads. For example, Julia might well prefer higher throughput over lower latency, and the throughput of Go's GC is not great.
Java sacrifices some performance for the "one paradigm" of everything being an object and then invested heavily in the GC, but in many cases, like writing a BLAS, it still won't match the performance of highly tuned code, whereas in Julia you can, for example, write really fast BLAS code like Octavian.jl.
Julia is multi-paradigm in a way that is purposely designed for how these features compose. I think it's important to appreciate that design choice, in both its pros and cons.
Octavian uses stack-allocated temporaries when "packing" the left matrix ("A" in "A*B"). These temporaries can have tens of thousands of elements, so that's a non-trivial stack allocation (and the memory is mutable to boot). No heap allocations or GC activity are needed (just a GC.@preserve to mark its lifetime). If I understand correctly, this isn't something that would be possible in Java?
To be fair, you can also just use preallocated global memory for your temporaries, since the maximum amount of memory needed is known ahead of time.
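The packing idea from the two comments above can be sketched in Python. This is a toy blocked matrix multiply, not Octavian's actual kernel, and the `BLOCK` size and function name are hypothetical. Each panel of rows of A is repacked in k-major order into a contiguous scratch buffer, so the inner loop reads adjacent A values. Because the scratch size (`BLOCK * n`) is known ahead of time, the buffer is allocated once and reused for every panel, in the spirit of preallocated temporaries. Python cannot place this buffer on the stack, so the sketch shows only the data movement, not the allocation strategy.

```python
from array import array

BLOCK = 2  # hypothetical panel height; real kernels tune this to cache sizes


def matmul_packed(A, B, n):
    """Multiply two n x n row-major matrices stored as flat arrays of doubles."""
    C = array("d", [0.0] * (n * n))
    scratch = array("d", [0.0] * (BLOCK * n))  # allocated once, reused per panel
    for i0 in range(0, n, BLOCK):
        rows = min(BLOCK, n - i0)
        # Pack rows i0 .. i0+rows-1 of A in k-major order:
        # scratch[k * BLOCK + r] holds A[i0 + r][k].
        for k in range(n):
            for r in range(rows):
                scratch[k * BLOCK + r] = A[(i0 + r) * n + k]
        # Compute the matching rows of C from the packed panel.
        for j in range(n):
            for r in range(rows):
                acc = 0.0
                for k in range(n):
                    acc += scratch[k * BLOCK + r] * B[k * n + j]
                C[(i0 + r) * n + j] = acc
    return C
```

When `n` is not a multiple of `BLOCK`, the last panel simply uses fewer rows of the same buffer.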
Indeed, it is always about design choices and trade-offs. I can see why BLAS code is important and why Julia is an optimal choice for computation heavy problems.
It is entirely possible to design a garbage-collected language that doesn't generate so goddamned much garbage, and this works much, much better because a relatively simple GC can easily keep up. Julia and Go are good examples of this. Julia uses immutable types extensively and by default, while Go uses value semantics, which has a similar effect on garbage (but has other issues). With a language design that doesn't spew so much garbage, if you only care about throughput, a relatively simple generational mark-and-sweep collector is totally fine. This is what Julia has. If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when its time slice is up and resumed later).
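The garbage difference between pointer-heavy and flat layouts can be sketched in Python, with the caveat that Python has no direct equivalent of Julia's immutable isbits structs or Go's value types. The stdlib `array` module is used here as the closest analog. `Point`, `make_boxed` and `make_flat` are hypothetical names. The boxed version creates one heap object per element, while the flat version stores raw doubles in two contiguous buffers.

```python
from array import array


class Point:
    # Each instance is a separate heap object with its own header and refcount.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def make_boxed(n):
    # n separate Point objects, each referenced from the list by a pointer.
    return [Point(float(i), 2.0 * i) for i in range(n)]


def make_flat(n):
    # Two contiguous buffers of raw C doubles: no per-element Python objects.
    xs = array("d", (float(i) for i in range(n)))
    ys = array("d", (2.0 * i for i in range(n)))
    return xs, ys


def sum_x_boxed(points):
    # Follows one pointer per element.
    return sum(p.x for p in points)


def sum_x_flat(xs):
    # Walks a single contiguous buffer.
    return sum(xs)
```

Both layouts compute the same result, but only the boxed one leaves the memory manager with `n` extra objects to track and eventually free.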
Persistent data structures are a whole different question that I haven't really spent much time thinking about. Clojure seems to be the state of the art there but I have no idea if that's because of the JVM or despite it.
How feasible would it be for Julia to add a low-pause collector like that? I keep thinking Julia would be great for graphical environments and gaming, but high GC latency won't work there.
Unfortunately, persistent data structures tend to produce (short-lived) garbage which the JVM is very good at collecting!
So yes, Clojure benefits immensely from the JVM.
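Why persistent data structures produce short-lived garbage can be seen in a minimal persistent linked list. This is a Python sketch, not Clojure's actual implementation, and all names are hypothetical. Updating element `i` rebuilds ("path-copies") the first `i + 1` nodes and shares the rest. Once the old version is no longer referenced, its copied prefix becomes garbage. On the JVM that is the kind of short-lived garbage a generational collector handles cheaply; CPython frees it through reference counting instead.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    # Immutable cons cell: never modified after creation.
    head: int
    tail: Optional["Node"]


def cons(x, lst):
    # O(1): the new list shares the entire old list as its tail.
    return Node(x, lst)


def from_iter(items):
    lst = None
    for x in reversed(list(items)):
        lst = cons(x, lst)
    return lst


def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


def set_at(lst, i, value):
    # Path copying: nodes 0..i are rebuilt, nodes after i are shared.
    if i == 0:
        return Node(value, lst.tail)
    return Node(lst.head, set_at(lst.tail, i - 1, value))
```

Both versions stay valid after an update, which is what makes sharing them between threads safe without locks.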
It is also an interesting research topic whether (optimised) reference counting would be a better approach.
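For contrast with tracing collection, here is the naive baseline of reference counting as a Python sketch. `RcBox` is a hypothetical illustration, not a real library type. An object is freed deterministically the moment its count reaches zero, without any pass over the heap. Optimized schemes try to cut the per-operation cost of these increments and decrements, for example by eliding redundant retain/release pairs. This sketch does none of that and, like plain reference counting in general, cannot reclaim cycles. CPython itself combines reference counting with a separate cycle collector.

```python
class RcBox:
    """Hypothetical manually reference-counted box (illustration only)."""

    def __init__(self, value, on_free):
        self.value = value
        self.count = 1  # the creator holds the first reference
        self._on_free = on_free

    def retain(self):
        # A new owner takes a reference.
        self.count += 1
        return self

    def release(self):
        # An owner drops its reference; free immediately at zero.
        if self.count <= 0:
            raise RuntimeError("released more times than retained")
        self.count -= 1
        if self.count == 0:
            self._on_free(self.value)
            self.value = None
```

The appeal for latency-sensitive uses is that freeing happens at predictable points; the cost is a counter update on every ownership change.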
Regarding objects, there is also a "middle ground" to consider:
Split big (immutable) arrays into smaller ones, connect them with pointers, and you are still cache friendly.
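The "middle ground" of splitting a big immutable array into linked chunks can be sketched as follows. This is a Python sketch with a hypothetical `ChunkedVector` class and chunk size. An update copies only the chunk that contains the changed element and shares every other chunk with the previous version, so the garbage produced per update is bounded by one chunk. In Python each chunk still holds object references, so this shows only the sharing structure. The cache-friendliness argument assumes unboxed chunk storage, as in Julia arrays of isbits types or JVM primitive arrays.

```python
CHUNK = 4  # hypothetical chunk size; a real design would pick a cache-sized value


class ChunkedVector:
    """Immutable vector stored as a list of fixed-size tuple chunks."""

    def __init__(self, chunks, length):
        self._chunks = chunks  # list of tuples, each at most CHUNK long
        self._len = length

    @classmethod
    def from_iter(cls, items):
        items = list(items)
        chunks = [tuple(items[i:i + CHUNK]) for i in range(0, len(items), CHUNK)]
        return cls(chunks, len(items))

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if not 0 <= i < self._len:
            raise IndexError(i)
        return self._chunks[i // CHUNK][i % CHUNK]

    def __iter__(self):
        for chunk in self._chunks:
            yield from chunk

    def set(self, i, value):
        # Copy only the chunk containing i; all other chunks are shared.
        if not 0 <= i < self._len:
            raise IndexError(i)
        c, off = divmod(i, CHUNK)
        old = self._chunks[c]
        chunks = list(self._chunks)  # copies chunk references, not element data
        chunks[c] = old[:off] + (value,) + old[off + 1:]
        return ChunkedVector(chunks, self._len)
```

Larger chunks mean faster scans but more copying per update, which is the trade-off a real implementation would tune.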
Also, you can do a lot at the application level to reduce garbage, and most Java programmers don't bother with that precisely because of the JVM.
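One common application-level way to reduce garbage is to recycle objects instead of allocating fresh ones. A minimal Python sketch of a free-list pool follows; `Pool` is a hypothetical name, not a library API. Released buffers are handed back out on the next `acquire`, so steady-state operation allocates nothing new. Callers must reset a recycled object's contents themselves. Whether pooling actually pays off depends on the runtime and its collector.

```python
class Pool:
    """Hypothetical free-list pool that recycles objects from a factory."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []

    def acquire(self):
        # Reuse a previously released object if one is available.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        # Return the object for reuse; its contents are not cleared here.
        self._free.append(obj)


pool = Pool(lambda: bytearray(16))
buf = pool.acquire()
pool.release(buf)
```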
Not anymore. That future is here. Java is getting "flattenable" types not because of GC, but because of iteration.
As part of this change, existing built-in primitive types (like int or double) will retroactively become instances of these more general objects.
Then there are the McLarens, Lotuses and F1 cars, but that is another level altogether.