New Computer Language Benchmarks Game metric: time + source code size (benchmarksgame-team.pages.debian.net)
What if you shifted time to nanoseconds? Or source code size to megabytes? The rankings could change. The culprit is the '+'.
I would think the geometric mean of (time x gzipped source code size) is the correct way to compare languages. It would not matter what the units of time or size are in that case.
[Here the geometric mean is the geometric mean of (time x gzipped size) of all benchmark programs of a particular language.]
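To illustrate the unit-dependence point, here is a minimal Python sketch. The two programs and their numbers are invented for illustration, and it assumes the raw time and size values are combined directly with no normalisation, which may not be what the site actually does. It shows the sum-based ranking flipping when time is expressed in nanoseconds, while the product-based ranking stays put.

```python
# Hypothetical measurements (not real benchmark data):
# name -> (time in seconds, gzipped source size in bytes)
programs = {
    "lang_a": (1.0, 3000),
    "lang_b": (3.0, 500),
}

NS_PER_S = 1e9  # conversion factor: seconds -> nanoseconds


def rank_by_sum(data, time_scale=1.0, size_scale=1.0):
    """Rank names by (time + size) after rescaling the units; lower is better."""
    return sorted(
        data,
        key=lambda name: data[name][0] * time_scale + data[name][1] * size_scale,
    )


def rank_by_product(data, time_scale=1.0, size_scale=1.0):
    """Rank names by (time * size) after rescaling the units; lower is better."""
    return sorted(
        data,
        key=lambda name: (data[name][0] * time_scale) * (data[name][1] * size_scale),
    )


print(rank_by_sum(programs))                           # seconds + bytes
print(rank_by_sum(programs, time_scale=NS_PER_S))      # nanoseconds + bytes
print(rank_by_product(programs))                       # seconds * bytes
print(rank_by_product(programs, time_scale=NS_PER_S))  # nanoseconds * bytes
```

Rescaling a unit multiplies every product by the same constant, so the product ordering cannot change; a sum has no such guarantee.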
$ insect '5s + 10MB'
Conversion error:
Cannot convert unit MB (base units: bit)
to unit s
$ insect '5s * 10MB'
50 s·MB
The whole point of benchmarks is to protect against accidental bias in your calculations. Adding time and size goes totally against my intuition. If you did want to give time more weight, I would raise it to some power. For example, the geometric mean of (time x time x source size) would give time much more importance in an arguably more principled way.
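A rough Python sketch of that weighting idea, under my own assumptions: `weighted_geomean` and `time_weight` are hypothetical names, and the (time, size) pairs are made up. With `time_weight=2.0` it computes the geometric mean of (time x time x size).

```python
import math


def weighted_geomean(results, time_weight=1.0):
    """Geometric mean of time**time_weight * size over (time, size) pairs.

    Computed in log space, so very large products do not overflow.
    """
    logs = [time_weight * math.log(t) + math.log(s) for t, s in results]
    return math.exp(sum(logs) / len(logs))


# Hypothetical (seconds, gzipped bytes) pairs for one language's programs.
results = [(2.0, 8.0), (8.0, 2.0)]

print(weighted_geomean(results))                   # time x size
print(weighted_geomean(results, time_weight=2.0))  # time x time x size
```

Changing the time unit by a factor c multiplies every language's score by c**time_weight, so the ranking is still independent of units.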
That annotation does seem to have caused much frothing and gnashing.
Here's how the calculation is made — "How not to lie with statistics: The correct way to summarize benchmark results."
[pdf] http://www.cse.unsw.edu.au/~cs9242/11/papers/Fleming_Wallace...
That is a huge metric I care about.
You can figure it out somewhat by clicking on each language benchmark, but it is not aggregated.
BTW, as a biased guy in the Java world, I can tell you this is one area where Java is actually mostly the winner, apparently even beating out many scripting languages.
This is basically meaningless. I don't see why you'd even need to do this. You can easily show code size and performance on the same graph.
The text is not clear enough, but "geometric mean" is not the benchmark. The 11 problems are listed in https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
The results of the 11 problems are combined using the "geometric mean" into a single number. Some people prefer the "geometric mean", other people prefer the "arithmetic mean" to combine the numbers, other people prefer the maximum, and there are many other methods (like the average excluding both borders).
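For a feel of how much the choice of summary matters, here is a small Python sketch comparing those methods. The scores are made up, and `trimmed_mean` is my own hypothetical helper for "the average excluding both borders".

```python
from statistics import geometric_mean, mean

# Hypothetical per-problem scores for one language (not real results).
scores = [1.0, 10.0, 100.0]


def trimmed_mean(values):
    """Arithmetic mean after dropping the smallest and the largest value."""
    inner = sorted(values)[1:-1]
    return mean(inner)


summary = {
    "geometric_mean": geometric_mean(scores),
    "arithmetic_mean": mean(scores),
    "maximum": max(scores),
    "trimmed_mean": trimmed_mean(scores),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The single outlier of 100 pulls the arithmetic mean far more than the geometric mean, which is one reason people argue about which to use.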
Thanks, that makes more sense; that's another issue for context then. I don't have anything against geometric means, but basic statistics like average, max, min, ... should be available as well.
https://twitter.com/ChapelLanguage/status/152442889069266944...
https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Actually, if you look at all the top .NET Core submissions, the only fast ones are those using low-level intrinsics, etc.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Do you mean "fast" like a C program using low level intrinsics?
This thing has been a long running joke in the software industry, exceeded only by the level of their defensiveness.
SMH.
After trying hard to use Julia for about a year, I came to the conclusion that it's one of the slowest things around. Maybe things have changed? Maybe, but Julia code still remains incorrect.
I hope they fix both things, speed (including start up speed, it counts A LOT) and correctness.
“Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation, implemented using LLVM.”
Julia 1.7 Documentation, Introduction
You could imagine there's a 1 sec/byte multiplier on the bytes value, saying in effect "for every byte of gzipped source, penalise the benchmark by one second".
Your explanation makes sense. However, the main issue is we don’t know if this “penalty” is fair or correct or has some justifiable basis. In the absence of any explanation it would make more sense to multiply them together as a “sane default”. Later, having done some research, we can attach some weighting, perhaps appealing to some physical laws or information theory. Even then I doubt that + would be the operator I would use to combine them.
Read '+' as '&'.
"Julia features optional typing, multiple dispatch, and good performance, achieved using type inference and just-in-time (JIT) compilation (and optional ahead-of-time compilation), implemented using LLVM."
https://docs.julialang.org/en/v1.9-dev/
So it'll be updated when v1.9 comes out? Anyways, it's a somewhat interesting thing that Julia still gets 3rd even though it's measuring compilation time.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
:and these pages:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
:and these pages:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
>Python 3: 1h 09 minutes
Well damn.
"Or even acknowledge — Anyone else a bit shocked by how well Javascript on V8 performs? I might need to rethink my assumptions…"
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
[0] I wanted to output an image and the C code only ran statistics so I would have had to figure out some random C image library which wasn’t how I wanted to spend my day.
"Are they similar-enough to be comparable for your purposes?"
n-body Chapel #3 program
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
n-body C clang program
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
n-body Java program
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
median, quartiles
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Therefore the stats you mentioned are all there: min, max, and average, with two different definitions of average given (geometric mean and 50th percentile).
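If you want to compute the same kind of box-plot summary yourself, a minimal Python sketch could look like this. The timings are invented, and note that quartile conventions differ between tools; this uses the default "exclusive" method of `statistics.quantiles`, which may not match what the site uses.

```python
from statistics import median, quantiles

# Hypothetical timings in seconds (not real benchmark data).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# n=4 yields the three cut points between quartiles: Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(times, n=4)

summary = {
    "min": min(times),
    "q1": q1,
    "median": median(times),
    "q3": q3,
    "max": max(times),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```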
https://www.merriam-webster.com/dictionary/average
https://www.itl.nist.gov/div898/handbook/eda/section3/boxplo...
https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...