Crystal: Fast as C, Slick as Ruby (blog.codeship.com)
Edit: I'm getting hella downvoted but I'm leaving this here. Ruby fanboys can't silence me!!! ;)
Choosing a programming language based on the syntax is like choosing your significant other based on looks alone. You're going to be spending a lot of time together, what's inside is what counts.
A language with a nice syntax is easier to learn, easier to read and understand, and delightful to write.
Crystal's syntax is a great differentiator between it and its statically typed, garbage-collecting competition.
It's a personal preference of course.
there is something f~*#ing wrong with the article.
If you want to talk about efficient GCed languages, you have many choices, most of which have more tooling and mindshare advantages than your language has tooling advantages.
Really, GCed languages are commodities these days. A lot of people have put a lot of work into the fundamental building blocks, and now people are just combining them in various ways.
Things like list comprehensions and do notation are examples of nice syntactic sugar.
Sensitive whitespace like Python can be nice to write and read, but it's hard(er) to write a parser for it and that's why Python's tooling hasn't been great in the past.
That said, Ruby is even harder to parse and I don't like it aesthetically either.
But it still has a GC :(. Rust has completely spoiled me with making it easy to minimize dynamic memory allocation and copies, and to know (almost always) deterministically when something will go away.
EDIT: I should also say that if you want to bash on Rust's lack of these things, 3 out of the 4 items I cited have solutions being actively worked on (either at planning, RFC, or implementation phase). I don't think Rust's sigils are going away any time soon, but I have no idea how you'd do that and preserve semantics anyway.
But most likely - you simply do not need a language without a GC. If you look at the sheer number of applications written in interpreted languages, anything compiled straight to machine code is a win, even with a GC. The interpreter and runtime overhead is so much bigger that a GC does not really matter in them, unless you're talking about highly tuned precompiled bytecode that is JIT'ed like Java and .NET, or natively compiled languages like Crystal and Go. So yes, when compiling to native code, the GC can become the "next" bottleneck - but only after you just removed/avoided the biggest one. And that 'next' bottleneck is something most applications will never encounter. I initially thought of mentioning database engines in the above list of "huge projects with heavy performance constraints", but then I realized a good number of specialized databases actually use runtimes with a GC. Hadoop stack with especially Cassandra, Elasticsearch? Java. Prometheus and InfluxDB? Go.
Just face it: there is a need for something intermediate to fill the gap of a script-like, native compiled, low-overhead, modern language, and a GC is part of this. The popularity and "I want to be cool so I hate it" trend of Go proves this, but the devops space is getting new useful cool toys at a breakneck speed, pretty much exclusively written in Go.
So I really don't get the whole GC hate. If you don't want GC, there are already many options out there, with Rust being the latest cool boy in town. But in reality there are huge opportunities and fields of applications for languages like Crystal and Go. And most likely - you could use such a language, only you don't think you do because you have an "oh no, a GC!" knee-jerk reaction.
Absolutely. That doesn't mean I can't want predictable performance or deterministic destruction. I also think it's a shame that we waste so much electricity and rare earth minerals on keeping ourselves from screwing up (i.e. on the overhead of managed runtimes and GCs). Before, I'd have argued that it was just necessary. Having spent a bunch of time with Rust, I don't think so any more, and I'm really excited to see non-GC languages build on Rust's ideas in the future.
> Hadoop stack with especially Cassandra, Elasticsearch? Java. Prometheus and InfluxDB? Go.
Cassandra has a drop-in-ish C++ replacement (Scylla, IIRC?) which supposedly blows the Java implementation away in performance. A magic JIT (and HotSpot is really magic) doesn't make everything better all of a sudden.
In a somewhat recent panel (https://www.infoq.com/presentations/c-rust-go), the CEO of InfluxDB basically admitted that if Rust had been more stable when they started they would have been able to use it instead of Go and would have had to do far fewer shenanigans to avoid the GC penalty.
> Just face it: there is a need for something intermediate to fill the gap of a script-like, native compiled, low-overhead, modern language, and a GC is part of this.
Indeed. I'm not in denial of this. I made an offhand remark about my personal preferences and what I'd like to see from future languages. I still write a ton of Python for things where speed really doesn't matter.
> "oh no, a GC!" knee-jerk reaction
I don't think having a refreshing experience without a GC counts as a "knee-jerk reaction." I've thoroughly enjoyed not having to tune that aspect of performance, and I remarked on it. I think Crystal shows great promise, and certainly has the potential to offer easier ergonomics than Rust.
Beyond that, however, there are many uses for ownership beyond controlling memory resources. Closing a TCP connection, releasing an OpenGL texture... there are lots of applications of having life cycles built in to the code rather than the runtime.
EDIT: fixed typo
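To make the point about non-memory resources concrete, here is a minimal sketch in Python. Python has no ownership system, so this uses a `with` block as a rough analogue of scope-bound release: the resource is closed at a known point in the code rather than whenever a collector gets around to it. `Connection` and `handle_request` are hypothetical names standing in for a TCP connection or a GPU texture.

```python
events = []


class Connection:
    """Hypothetical stand-in for a TCP connection or an OpenGL texture."""

    def __init__(self, name):
        self.name = name
        self.is_open = True
        events.append(f"open {name}")

    def close(self):
        # Idempotent: closing twice records only one event.
        if self.is_open:
            self.is_open = False
            events.append(f"close {self.name}")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def handle_request():
    with Connection("db") as conn:
        events.append(f"use {conn.name}")
    # The connection is already closed here, independent of any GC cycle.
    events.append("after block")


handle_request()
print(events)  # ['open db', 'use db', 'close db', 'after block']
```

The release point is visible in the source, which is the property the comment is asking for; an ownership-based language enforces it at compile time instead of relying on the programmer to write the `with`.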
Actually two of your three examples are no longer correct: game engines often use a core GCd heap because that's how Unreal Engine works since v3, and Chrome has switched to using garbage collection in the core Blink renderer as well. The GC project is called oilpan.
The benefits of GC are so huge, that they're used even for very latency and resource sensitive apps like browsers and AAA games.
Could the answer be lbstanza when it gets there? Lbstanza.org
Regardless, for a language which is meant to operate in the same domain as ruby and be as easy and declarative, not having a GC would be a puzzling decision.
As a side note, I'm curious what areas you are programming in where the presence of a GC is such a downside. Having written almost exclusively in garbage-collected languages over the last few years, it's something I almost never think about (and happy not to). Of course I don't deny that stricter memory control is sometimes necessary.
A tracing GC means that you either have to deal with potentially long GC pauses or you need a lot of extra free memory at all times to give the GC time to catch up before running out of memory [1].
Go says it can achieve 10ms max pause time using 20% of your CPU cores provided you give it 100% extra memory. In other words, memory utilisation must be kept below 50%.
Cloud/VPS prices scale roughly linearly with memory usage. So using a tracing GC doubles your hardware costs. Whether or not that is cheap depends entirely on what share of your costs is hardware cost and how much productivity gain you expect from using a tracing GC.
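The arithmetic behind this claim can be sketched as follows. It takes the figures in the comment at face value (100% extra memory for the GC, cost linear in RAM); the function names, the 8 GB workload and the price per GB are made up for illustration.

```python
def required_ram_gb(live_heap_gb: float, gc_headroom: float) -> float:
    """RAM to provision when the GC needs `gc_headroom` extra memory.

    A headroom of 1.0 means 100% extra, the figure quoted above.
    """
    return live_heap_gb * (1.0 + gc_headroom)


def max_utilisation(gc_headroom: float) -> float:
    """Largest fraction of provisioned RAM that live data may occupy."""
    return 1.0 / (1.0 + gc_headroom)


def monthly_cost(ram_gb: float, price_per_gb: float) -> float:
    """Assumes price scales linearly with RAM, as the comment does."""
    return ram_gb * price_per_gb


live = 8.0   # GB of live data (hypothetical workload)
price = 5.0  # currency units per GB per month (hypothetical)

no_gc = monthly_cost(required_ram_gb(live, 0.0), price)
with_gc = monthly_cost(required_ram_gb(live, 1.0), price)

print(max_utilisation(1.0))  # 0.5
print(with_gc / no_gc)       # 2.0
```

Under these assumptions the factor of two applies only to the memory-priced part of the bill; a smaller headroom shrinks it proportionally.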
I would be very interested in learning how much CPU and memory overhead Swift's reference counting has, because in terms of productivity Swift is certainly competitive compared to languages using a tracing GC.
[1] Azul can do pauseless, but I don't know exactly what tradeoffs their approach makes. Their price is too high for me to even care.
So basically C/C++ replacement is the only niche that is left to fill. It would be even better if a new language could replace even GC languages, so I can write fast low-level libraries or websites in a single language, without sacrificing productivity. That would be the Holy Grail I guess.
Some languages, like erlang, do slightly better by garbage collecting erlang processes individually, so other erlang processes can continue running during GC.
The real time capabilities are not always done in pure SW, there are some FPGAs, but when you do rely on SW, you often can not afford to spend even a few milliseconds in GC. In some cases, that would mean killing or maiming someone.
And you are often tied to the HW vendor toolchain for a specific DSP, MCU, ... that is only supporting C or C++. This is a domain that is moving very slowly, currently my most optimistic time table would be able to have vendor support for Rust toolchain in 10 or 15 years but I don't foresee any GC language coming to replace the critical part written today in C or C++.
Anything where memory or interactivity needs to be tightly controlled is problematic with a gc. Not only that, but a gc doesn't scale as well with lots of threads. Ultimately you need thread local allocation since you will eventually be bottlenecked by the fact that typical allocation (with malloc, VirtualAlloc, mmap, etc) is protected by a mutex, and deallocation suffers the same fate.
Re: application domains, I've recently been doing some work in CPU/memory constrained applications (not embedded, running big >500GB jobs on HPC clusters), and a GC is unfortunately a non-starter for this kind of data processing.
I have also been watching with great anticipation the work being done on "big data" processing with Rust (https://github.com/frankmcsherry/timely-dataflow) and how that might obviate the need for a GC with the various JVM RAM-hogs which dominate that field.
There are also many areas where people work (many of whom provide the tools that programmers of GC'd languages use for their jobs) which can't admit a garbage collector.
For example, I currently deploy Django code (running on an interpreter that needs to implement, not run on top of, a GC) to a machine with a Linux kernel, running nginx, backed by another machine running PostgreSQL, with caching in Redis. None of those very important tools can reasonably offer the performance needed in a garbage collected language.
For another example, I'm typing this (quite lengthy) response in a low-latency application (a browser) which would also be difficult to implement in a garbage-collected language.
you mean like webservers, where GC has been the #1 cause of operational problems for essentially forever?
LLVM has been enabling fantastic new programming languages, and while it has support for a GC, I have not found a GC library that would be easy to embed in a new compiler/runtime environment.
Now there are dozens of LLVM-based languages (or language prototypes) that have different, incompatible implementations of GC with varying degrees of quality. If there was a relatively simple but efficient GC available, it would be much easier to implement a new language on LLVM.
At one point there was a project called HLVM, but it was targeted at implementing JVM and .NET -style virtual machines. This is not what I'm looking for and I think the project is dead now.
If anyone knows about a GC implementation for LLVM, I'd really like to take a look. If it's a part of a programming language project but would be relatively easy to rip out of the rest of the compiler/runtime, it's not a problem.
That said, I prefer languages without GC.
For me, the only viable alternative to GC are substructural type systems like in Rust's case.
Not quite LLVM, but take a look at the Eclipse OMR project.
OMR intends to provide a set of reusable components, such as a GC, a port library and, given more effort, a JIT, that can be reused in existing language runtimes or used to build a whole new language.
Bear in mind that .NET can do AOT compilation and the JVM is getting it (and some other non-OpenJDK JVMs already have it).
https://groups.google.com/forum/?fromgroups#!topic/crystal-l...
https://crystal-lang.org/2016/07/15/fibonacci-benchmark.html
I'm not familiar with very many scenarios where one has a garbage collector but doesn't need to free some piece of memory when it's no longer used. Could you clarify what you mean here?
It also seems to allow tweaking for soft realtime systems, e.g. games.
Now, the greater density of concepts in shorthand notation can be abused, and too much of that often shifts the cost-benefit ratio further to the cost side for all but the most expert in the language, but that's a problem of excess, not one inherent in their use at all.
GC isn't terrible, though. Azul has struck an amazing balance between latency and eagerness; even if you can't afford it, the technology does exist. If you don't have latency, memory restrictions, or embedding requirements, Rust may be overkill.
You just annotate functions that you don't want to use the GC in with it and it'll assert that they don't use it.
The simplest solution is to add the moral equivalent of 'null' -- objects that transition to an idempotently destructible state, which solves a lot of complexity with the data flow and analysis (yay!) at the cost of some safety (boo), and nulls (louder boo).
The Crystal website itself makes a more modest claim than "fast as C" under its language goals: "Compile to efficient native code", which it clearly does.
For now, if you want a fast language with the beauty and productivity of Ruby, check out Elixir [0] and its web framework, Phoenix [1]. I've been using Phoenix for a year, and it's the first framework that I've actually liked more over time. And I've been a web developer for a decade. With its recent 1.0 release, Phoenix is gaining a lot of momentum.
If you want some idea of the performance differences between Phoenix and Rails, see [2] and [3].
[1] http://www.phoenixframework.org/
[2] https://github.com/mroth/phoenix-showdown/blob/master/RESULT...
[3] http://www.phoenixframework.org/blog/the-road-to-2-million-w...
I had never worked with compiled languages before I tried Crystal, but had always had a huge interest in getting into that. When I wanted to learn the compiled ecosystem I looked at languages like Go and Rust, but the learning curve for those was a bit overwhelming for a newbie. A while later I found Crystal, and much thanks to the simple syntax of the language I learned a ton of new things about compiled languages very quickly. The absolutely best part of the language is that it is written in plain Crystal, and I've been looking at their own implementations for various things a lot - something I've never done before, having worked mostly with Node, Lua and PHP before.
Nowadays I can delve into Go documentation, packages are clear to me and I just understand how things should and should not be implemented to achieve a good efficiency and performance level. The Little Go Book makes sense, the I/O package is simple and this is probably all thanks to the syntax of Crystal, the amazing language & standard library documentation but most importantly the source of Crystal being written in Crystal.
I'm currently working on building a business using Go, because I absolutely need Windows target support - something which Crystal does not yet have. But the second it gets that, I'm moving back. Don't get me wrong, Go is really great and nice to work with - but Crystal is my mentor. Please note that I have not worked with Ruby before, so the whole language was new to me.
To summarize; even if you only wish to learn, Crystal is in my personal opinion the best choice to go with.
"However, Crystal might give incorrect results, while Ruby makes sure to always give the correct result."
https://crystal-lang.org/2016/07/15/fibonacci-benchmark.html
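Assuming the quoted sentence refers to fixed-width integer overflow (Crystal's default integer type has a fixed size, while Ruby promotes to arbitrary precision), the effect can be imitated in Python by wrapping every result to a signed 32-bit value. `wrap_i32` and `fib` are illustrative helpers, not code from the linked post, and silent wraparound is an assumption about the behaviour being described.

```python
def wrap_i32(n: int) -> int:
    """Interpret n modulo 2**32 as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def fib(n: int, fixed_width: bool) -> int:
    """Iterative Fibonacci; optionally wrap each step to 32 bits."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
        if fixed_width:
            a, b = wrap_i32(a), wrap_i32(b)
    return a


print(fib(46, fixed_width=False), fib(46, fixed_width=True))
# 1836311903 1836311903
print(fib(47, fixed_width=False), fib(47, fixed_width=True))
# 2971215073 -1323752223
```

The 47th Fibonacci number is the first that no longer fits in a signed 32-bit integer, so the fixed-width version is fast and right up to 46 and fast and wrong from 47 on.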
I believe Rust is also implemented in Rust, and Go, after a few years of being implemented in C, now has a compiler written in Go.
Could you expound upon that point a bit? What's the difference, in your mind?
This is a large problem space that you can gloss over by using C as a layer of interaction between your language and the underlying machine, but it makes your language a: not truly a "system language" and b: it also ties you to C philosophy, API/ABI, calling conventions and so on.
Since 1.5 came out, a year ago.
It certainly is great to be able to jump right into Crystal coming from Ruby. It isn't very hard to convert most Ruby code to Crystal -- you just have to go through and "typify" everything. A few methods have different names and of course some don't exist but most of it is there.
My one gripe with Crystal, however, and why I haven't adopted it more generally, is that many of the "Lisp-like" features of Ruby are all but lost. Crystal makes up for some of this with macros, but it doesn't quite cut it. For example, you can't splat an array into a lambda in Crystal. Arguments have to be tuples which are compile-time bound. Little things like this feel very limiting to an experienced Ruby developer.
Blog: https://crystal-lang.org/2014/12/06/another-language.html Github Issue: https://github.com/crystal-lang/crystal/issues/681
Scala has type inference, Crystal has optional typing. In Scala, there are certain situations when the type is discernible by the compiler, and can be omitted. For example
val x = 1 + 2 + 3
the compiler infers that x is an Int. However, omitting type information in Scala is the exception, not the rule. Method parameters, for example, must have type annotations. In practice, Crystal also infers types. But in Crystal you can omit almost any type annotation, including those in method and function definitions. This probably poses a different challenge for the compiler authors. The Type Restrictions section provides some more examples: https://crystal-lang.org/docs/syntax_and_semantics/type_rest...
I was interested in Crystal but the lack of apps using it in production and proof of concept on the field is making me doubt its usefulness.
We've been using Crystal in production (at Protel) for more than 6 months for some heavy-load APIs (100-200 req/s). We've replaced our Rails API running 64 unicorns with just 1 Kemal process, and it's not even breaking a sweat while consuming nearly 100x fewer resources and 30x less CPU.
You can ask me about our experience.
Sure, thanks. Would love to hear that.
A blog post would probably be more appropriate since it will have a wider audience and will be good for Crystal and its community.
ex: non-scoped (everything in foo is added to the global scope)
import foo
{
bar.do()
}
ex: scoped (everything in foo is added to the local scope, and assigned a namespace)
{
bar = import foo
bar.do()
}
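The same contrast exists in Python, which makes for a quick sketch of the two styles above. Here the standard `math` module plays the role of `foo`, and each import is run into its own dictionary so the resulting namespaces can be inspected; `user_names` is a hypothetical helper.

```python
def user_names(namespace):
    """Names an import added, ignoring interpreter internals like __builtins__."""
    return sorted(name for name in namespace if not name.startswith("__"))


# Scoped: the module is bound to one name and everything stays behind it.
scoped_ns = {}
exec("import math as m", scoped_ns)

# Non-scoped: every public name of the module lands in the namespace.
global_ns = {}
exec("from math import *", global_ns)

print(user_names(scoped_ns))                      # ['m']
print("sqrt" in scoped_ns, "sqrt" in global_ns)   # False True
print(len(user_names(global_ns)) > 10)            # True
```

With the scoped form a reader can always tell where `m.sqrt` comes from; with the wildcard form, any of the injected names may silently shadow one that was already defined.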
I find it much easier to manage programs where there are no "hidden" global variables. It's especially hard when the included files can also include files, which all add to the global scope.
can you make it faster than C though please? (seriously) i think it might even happen by accident in some cases already though. the places where C can be beaten for performance are, in my experience, from design choices in the C standards, users not understanding or leveraging those things for performance and the architecture of the compilation-unit/link process.
things like the struct layout rules - instead of the compiler organising things to be optimal it follows those rules for memory layout, or the calling conventions - you often have to use funky extensions to get efficient function calls.
other things are the lack of ability to hint the compiler that e.g. mathematical structures underlie types that can be leveraged for optimisation. that const or functional purity can be trusted... etc.
One typical example of this was a few years ago (if I'm not mistaken) in the monitoring world, when Shinken released a Nagios-compatible engine in Python, and, basically, the reaction in the Nagios community was that the modifications involved in Nagios (C) were just too important to be worth it.
(0) Does Crystal have a lot of undefined behavior like C?
(1) Does understanding Crystal programs require a lot of trial and error just like in Ruby?
(2) How good a job does Crystal do at preventing me from shooting myself in the foot?
A language isn't to be judged just by the amazing programs you can write in it. (Turing-completeness and I/O facilities have that covered.) Far more important are the totally stupid programs that you can't write in it.
Pluses of Nim:
* powerful compiler that can produce C, C++, JS and Objective-C code
* GC can be completely removed to adapt to the program
* supports parallelism via threading
Pluses of Crystal:
* union types let you mix types in almost every data structure, leaving you pondering whether the language is really strongly typed
* so similar to Ruby that porting a 100-line library (with no fancy metaprogramming) to Crystal is often a matter of a few hours
* green threads fit the HTTP request/response cycle very nicely (as in Go and Erlang), where using threads/processes consumes more memory and CPU
What Crystal still lacks is parallelism, but the core team is working on that.
That said, both are modern, fast, elegant languages, with a good standard library and a vibrant community.
Python syntax with macros => Nim
Ruby syntax => Crystal
https://crystal-lang.org/2014/06/19/crystal-0.1.0-released.h...
The question was asked in the comments as with the reply. I'm sorry that I can't link to it directly but it's the seventh comment from the top.
Yeah, I discovered Common Lisp back in 2006 and have been using it ever since …
I discovered Go back in 2009 and have been using it ever since, too.
What does Crystal get me that these two don't?
I have been using it for years already, and it is really performant, somewhere between C and Rust. I am still wondering why so few people use it.
Benchmark: https://github.com/kostya/benchmarks
Nim vs Rust: http://arthurtw.github.io/2015/01/12/quick-comparison-nim-vs...
Performance discussion: http://forum.nim-lang.org/t/2261
Embedded Nim: https://hookrace.net/blog/nim-binary-size/
Nim on LLVM: https://github.com/arnetheduck/nlvm
If you don't have to write cutting-edge games or embedded software for tiny systems, why do you have to care about allocations at all? Today's systems and RAM are so fast that garbage collections don't really matter in most cases. Consider SBCL (compiled Common Lisp) which is almost as performant as Java and C++.
http://benchmarksgame.alioth.debian.org/u64q/lisp.html
I used to develop software in C and C++ for many years, and a garbage collector was the thing I wanted the most. GC-free programming is unnecessarily tough in most cases, unless you desperately need it for games and embedded systems.
Which one is more minimalistic, 'new Foo' or a collection of various custom-tuned allocation methods? Which one is more terse, 'myList.Where(foo).Select(bar).Aggregate(baz)' or an explicit for loop?
Exactly! I cannot agree more.
I have a small test program I port to different languages to test the length of the code and the speed of the program. Of course it only represents a single use case.
* C is first, of course.
* twice as slow, come Pascal, D and... Crystal!
* x3 to x5, come Nim, Go, C++ (and Unicon).
* x6 to x9, come Tcl, Perl, BASIC (and Awk).
* x15 to x30, come Little, Falcon, Ruby and Python.
* x60 to x90, come Pike, C#, Bash.
* x600 to x1000, come Perl6 and Julia.
This list looks byzantine, I know :-) The trends I can get out of it:
* the last 2 are languages with JIT compilation, and that's horrid for short programs.
* the "old" interpreted (or whatever you name it nowadays) languages (Tcl, Perl) are not so bad compared to compiled languages, and much faster than "modern" one (Ruby, Python). (Again, this is only valid for my specific use.)
* compiled languages should all end up in the same ballpark, shouldn't they? Well, they don't. The more they offer nice data structures, the more you use them. The more they have some kind of functional style (I mean the tendency to create new variables all the time instead of modifying existing ones) the more you allocate and create and copy loads of data. In the end, being readable and idiomatic in those languages means being lazy and inefficient, but what's the point of using those languages if you don't use what they offer? C forces you to use proper data structures and not re-use existing ones. It comes naturally. What is unnatural in C is to copy the data again and again; it is simpler to modify the existing one and work on the right parts of it, not to pass the whole chunks every time you need one single bit. In more evolved languages, compilation won't save you by doing some hypothetical magic tricks, it cannot remove the heavy continuous data copying and moving you instructed your program to do. And that is what made the difference in speed between C on one side, and D, C++, Go on the other side.
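The copying cost described in the last bullet can be counted directly. This is a small sketch, not the test program mentioned above: both functions add 1 to every element of a list, one by building a fresh list per change (the "functional" habit), the other by modifying the existing buffer. The function names and the size of 1000 are arbitrary.

```python
def increment_copying(data):
    """Functional style: build a new list for every single change."""
    copied = 0
    for i in range(len(data)):
        # Slicing and concatenation copy the whole list each time.
        data = data[:i] + [data[i] + 1] + data[i + 1:]
        copied += len(data)
    return data, copied


def increment_in_place(data):
    """C-like style: modify the existing buffer, copying nothing."""
    for i in range(len(data)):
        data[i] += 1
    return data, 0


a, copied_a = increment_copying([0] * 1000)
b, copied_b = increment_in_place([0] * 1000)
print(a == b, copied_a, copied_b)  # True 1000000 0
```

Both versions produce the same result, but the first moves a million elements to change a thousand, and no compiler flag removes work the program explicitly asked for.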
EDIT: There's also this: https://github.com/nsf/pnoise
Two data points (one-off timings of a few lines of code doing the same work load) just don't make for a comparison we should spend time bothering about.
Whatever you think of the benchmarks game, I don't see why we need to waste time with comparisons that don't meet that low standard:
- a few different tasks
- more than a code snippet
- a few repeat measurements
- a few different workloads
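A harness that meets that low standard does not have to be large. The sketch below is one possible shape, with two made-up tasks (`fib` and `sum_squares`), two workloads each, and repeated measurements summarised by best and median time; none of it is taken from the benchmarks game itself.

```python
import statistics
import time


def fib(n):
    """Naive recursive Fibonacci, a deliberately CPU-bound task."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def sum_squares(n):
    """A second, loop-heavy task so results do not hinge on one program."""
    return sum(i * i for i in range(n))


# A few different tasks, each with a few different workloads.
TASKS = {
    "fib": (fib, [10, 15]),
    "sum_squares": (sum_squares, [1_000, 10_000]),
}


def measure(func, arg, repeats=5):
    """Time func(arg) several times and return all samples in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        samples.append(time.perf_counter() - start)
    return samples


def run_all(repeats=5):
    results = {}
    for name, (func, workloads) in TASKS.items():
        for arg in workloads:
            samples = measure(func, arg, repeats)
            results[(name, arg)] = (min(samples), statistics.median(samples))
    return results


results = run_all()
for (name, arg), (best, median) in results.items():
    print(f"{name}({arg}): best {best:.6f}s, median {median:.6f}s")
```

Reporting both the best and the median time makes it obvious when a single run was an outlier, which a one-off timing cannot show.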
>Remember: The cake is a lie, and so are benchmarks. You won’t have 35x increase in performance all the time, but you can expect 5x or more in complex applications, more if it’s CPU intensive.
Could it be startup time? That's less of an issue when the application has started up.
A fibonacci application is not a very good benchmark anyway.
That said, it's a great language worth recommending.
Depends what we mean by fast. I have seen the Erlang VM handle 100k requests per second on a distributed cluster. That's plenty fast. Moreover, because of fault tolerance, it means the ability to have better uptime, with fewer people on-call. "Fast" can also be measured to include that: if a system does 200k requests per second, but crashes at midnight and stays down for a few hours, the average "speed" can be quite low. In a laptop demo that's not visible, but in practice that's money and customers lost.
But if fast means, "let's multiply some matrices", then yeah, you can probably use Rust or C for that. It all depends on the problem domain.
Not bad really for a language that's meant to be slow at computational stuff :^)
@compile [:native, {:hipe, [:verbose, :o3]}]
I've never used elixir but I assume it has a similar performance profile to erlang as it shares the vm.
I guess it's a good thing that people like it so much, but it's really starting to feel marketing-y by now.
That's a good sign!
You know why? Because it has a great community and is very friendly for newcomers. Jose, Eric and the rest of the team made that a priority and it shows. It doesn't just mean being nice on IRC, it also means putting usability first, putting more effort into how examples look, how documentation looks and so on.
If Google invented a language then proceeded to push and sponsor it, by paying authors to work on it, organizing marketing, hackathons etc, then it is hard to say if it is popular because of Google's backing or because it has its own merits.
Case in point, LFE (Lisp Flavored Erlang) was created by one of the original designers of Erlang, Robert Virding, has great support for a small FOSS project, true macros, but the popularity of Ruby has rocketed Elixir way ahead in terms of repositories and users. Erlang Solutions has it on the site, but it is not as touted as Elixir. People go with what they know, and let's admit it, Lisp is a great language, but not as popular in the web-dev crowd sans Clojure (which I don't see as so Lispy).
From the early looks of it, having come from industry and academia, Pony lang looks poised to muscle in on Erlang/BEAM/OTP, Elixir and LFE anyway. I personally don't like the syntax, but syntax is not semantics, and you get over it.
Popularity doesn't always win the day: if you do something a bit more off the main road, there is potential to earn more researching what you love. Look at kdb/q devs and jobs, and Haskell has started increasing in uptake in fintech. Go with what you like, or as Joseph Campbell said, 'Follow your bliss' and the rest will fall into place.
But don't listen to me. I spend many waking moments fiddling with J (jsoftware.com). Not actually the most loved or known PL out there. I think the array languages J/APL/K/Q will have their day due to where software and hardware are heading: Multicores, array processing (GPU/FPGA hybrids, custom computers).
As someone who has made the transition from Ruby to Elixir, I'm really, really intrigued by Crystal.
Though, upon a cursory look into the Crystal docs and community, a couple things are clear...
Elixir killed it on all the things surrounding the language.
- docs
- testing
- Slack chan, IRC, mailing list
- package manager (Hex)
- build tool (Mix)
- web framework (plug & Phoenix)
- books from major publishers (manning, pragprog, etc.)
- ElixirConf
- ancillary teaching (ElixirSips, LearnElixir, Elixir Fountain, etc.)
While the language may not be as computationally performant as some of the others mentioned, all the things above lower the barrier to entry for adoption and make Elixir a more attractive language than some of its counterparts. And it's amazing that a language this young has nailed it on these fronts.
Crystal, on the other hand, is as if I'm writing Ruby.
[1] http://elixir-lang.org/getting-started/pattern-matching.html
[2] http://elixir-lang.org/getting-started/case-cond-and-if.html...
JVM+FP FTW.
Of course the Crystal people probably don't have the same number of developers working on it as Go did even early on.
Edit: Go took a little while to support windows, not until around July 2010. See this question from November 2009: http://stackoverflow.com/questions/1717652/can-go-compiler-b...
It does exist unlike Crystal, so you should explain what you mean by poor. I never had a problem compiling a project with CGo on Windows.
The way they interact with different types, fibers and allocation on the stack vs heap, etc. Makes sense?
Edit: To give you an example of how friendly the Crystal lang & API documentation is to developers unfamiliar with the language, let's look at the Iterator: https://crystal-lang.org/api/0.18.7/Iterator.html
It comes with a great "introduction" to what it is, what it does and gives an example of the advantages it has over the Enumerable. It also explains how you can implement your own Iterator.
We can also look at the IO module for Crystal and the io package for Go: https://crystal-lang.org/api/0.18.7/IO.html https://golang.org/pkg/io/
From a beginner's standpoint, you have to see that the Crystal documentation is way friendlier.
To be clear, I have no strong opinions about Crystal and will probably never use it. But comments like yours are simply grandstanding and it's annoying that they are confused for contribution.
(1) Not at all, look at the source implementation of their language implementation. For example, the lexer: https://github.com/crystal-lang/crystal/blob/master/src/comp... Seems pretty clear to me.
(2) Not entirely sure what you mean since that is such a broad case, but as stated, Crystal stdlib is mostly safe.
Nice.
> (1) Not at all, look at the source implementation of their language implementation. For example, the lexer: https://github.com/crystal-lang/crystal/blob/master/src/comp.... Seems pretty clear to me.
How am I supposed to learn the language's semantics from a lexer?
> (2) Not entirely sure what you mean since that is such a broad case, but as stated, Crystal stdlib is mostly safe.
Consider this use case: I spawn five fibers. Can I send the same mutable object to all five? If so, can they attempt to mutate the object without properly taking turns? (e.g., using a mutex)
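The use case can be made concrete with a sketch in Python, using threads as a stand-in for fibers. That is a loose analogy: Crystal fibers are cooperatively scheduled, so this shows the shape of the question rather than Crystal's actual behaviour. Five workers share one mutable object and take turns through a lock; `Counter` and `worker` are hypothetical names.

```python
import threading


class Counter:
    """A mutable object shared by all workers."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self.lock:
            self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(5)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # 50000
```

Nothing in Python stops a worker from touching `counter.value` without the lock; the discipline lives entirely in the programmer's head, which is exactly what the question is probing about Crystal.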
This is why I came to peace with Go's way of life.
Way better to push for less code being written in C, than argue about the language design decisions.
Firefox does the entire UI in XUL and Javascript: all GCd.
Chrome uses a GCd heap for much of its C++ code, and of course the web pages themselves are fully GCd.
I'm looking for something to plug into LLVM's GC mechanisms to be used for new languages.
I very much doubt that a fib code-snippet is what he was asking about.
So if the number of survivors << the number of allocated objects, which it always is in many functional languages, then GC can be faster than manual memory management. Especially if you use a copying GC algorithm which makes allocation extremely cheap.
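To make the "cost follows survivors, not allocations" point concrete, here is a toy semispace copying collector. It is only a sketch: the `Heap` and `Obj` classes and their methods are invented for illustration, and a real collector works on raw memory rather than Python lists.

```python
class Obj:
    """A toy heap object: a name plus references to other objects."""
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = list(refs or [])


class Heap:
    """Hypothetical semispace heap: allocation just appends to from-space."""
    def __init__(self):
        self.from_space = []

    def alloc(self, name, refs=None):
        obj = Obj(name, refs)
        self.from_space.append(obj)  # "bump" allocation: no free-list search
        return obj

    def collect(self, roots):
        """Copy everything reachable from roots; return (new_roots, objects_copied)."""
        to_space = []
        forwarded = {}  # id(old object) -> its copy in to-space

        def copy(obj):
            if id(obj) not in forwarded:
                new = Obj(obj.name, obj.refs)  # refs still point at old objects
                forwarded[id(obj)] = new
                to_space.append(new)
            return forwarded[id(obj)]

        new_roots = [copy(r) for r in roots]
        scan = 0
        while scan < len(to_space):  # Cheney-style scan of copied objects
            obj = to_space[scan]
            obj.refs = [copy(r) for r in obj.refs]
            scan += 1
        self.from_space = to_space  # garbage is dropped without being visited
        return new_roots, len(to_space)


heap = Heap()
keep = heap.alloc("keep", [heap.alloc("child")])
for i in range(10_000):
    heap.alloc(f"garbage-{i}")  # never referenced again

roots, work = heap.collect([keep])
print(work)  # 2
```

The collector touches two objects even though 10,002 were allocated; the dead ones are never looked at.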
You can get by in a GC'd system if you're careful not to allocate while being in the "hot path", but it's much more difficult than manual memory management (you need to know the internals of the GC algorithm) and interference from other threads may spoil your hard work.
Minecraft is a prime example of annoying (Java) GC pauses causing annoying interruptions. Another one is Kerbal Space Program's choppy audio (from C#/Mono GC). Although these games made millions or billions of dollars regardless, so you might argue it's a non-issue.
> currently my most optimistic time table would be able to have vendor support for Rust toolchain in 10 or 15 years
Not sure how much you'd need changes for the Rust compiler to be able to use it on MCUs and DSPs, but LLVM is more and more common and it might be (almost) enough to have the LLVM backend ported to the target arch. LLVM is moving fast, so for some targets it might be viable much sooner than your estimate.
Lack of multithreading is an issue for _ME_ because I'm working on an app that could really benefit from it. Just because Ruby is used almost exclusively for webapps doesn't mean Crystal is. Also, the "load another copy of the entire app into memory" that Rails people use is a really crappy way of doing "multithreading".
Indeed, it may not be minimalistic in terms of the code size.
Please show one or more of those programs.
What? RAM is not fast at all, the latencies have almost not improved in 20 years (compared to the improvement of other subsystems like the CPU, of course).
Also, using numbers from a benchmark game is not representative of the performance of real world applications. If you look at the code, you'll find that it's written in a style that avoids heap objects and GC wherever possible. Forcing heap allocation is what makes Java slightly slower than C in many cases.
Granted, most software isn't like that, but it's certainly not only games and embedded software on tiny systems.
However, I think there's a sweet spot where you could implement a fairly nice boilerplate/framework that would be an 80% solution to the problem which would be a vast improvement over the current state.
The missing 20% would be language specifics and that would be either solved by forking the boilerplate code or writing some kind of callbacks for discovering references given a root object.
edit: Additionally, there's no simple example of using a GC with LLVM. It would be very helpful if there was, for example, a GC'd version of the Kaleidoscope language used in the LLVM tutorials. Even a trivial Lisp-style cons/car/cdr object system coupled with the simplest possible mark'n'sweep GC would be good.
http://projects.eclipse.org/projects/technology.omr
The guys behind The Mu Micro Virtual Machine were the ones that did the L4 OS formal verification.
Eclipse OMR is based on IBM's J9, they are making it language agnostic to support PHP, Python, Ruby and whatever else one can think of.
However they aren't LLVM based and the caveats of a generic GC do still apply.
I also agree with you, the problem is how much those 20% actually are in terms of effort.
EDIT: typo where => were, on => one
But correctly implemented reference counting is essentially pause-free. It's consistently "slow", which is better for some cases than unpredictably "fast".
For example, the classical "The Garbage Collection Handbook"
LFE had no documentation, no tools, no learning resources for a really long time. Compare that to Elixir that focused on those aspects since day one. Furthermore, LFE has reached 1.0 only recently, almost 1.5 years after Elixir, and that has an impact on industry adoption.
LFE also was, for a long time, literally Lisp-flavored Erlang while Elixir attempted from the beginning to bring its own abstractions such as protocols, collections, Unicode support, UTF-8 strings, and they are still pushing it forward: http://elixir-lang.org/blog/2016/07/14/announcing-genstage/
So I think you are selling both languages short. There is much more happening in Elixir besides the "popularity of Ruby" and there is a lot of potential in LFE now that they are focusing on being more than a "lispy" Erlang.
Lily is statically-typed, built-in classes can't be inherited from, and there's no C-like casting.
With those rules in mind, most objects can't become cyclical. It's impossible for a list of strings to loop back onto itself, for example. It helps that the value classes backing enums (like Option and Either) are immutable, which I so far suspect prevents a cycle.
That at least allows you to group classes into three groups:
These never cycle (Integer)
These may cycle (List)
These always cycle (Dynamic, linked lists?)
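One way to picture that grouping is a reachability check over the type graph: a type can only form a cycle if it can, directly or indirectly, hold a reference to its own type. This is a guess at the idea, not Lily's actual implementation; the `fields` table and the `Node` class are hypothetical.

```python
def can_cycle(type_name, fields):
    """True if a value of type_name could transitively reference its own type.

    fields maps each type to the types its instances may hold references to.
    """
    seen = set()
    stack = list(fields.get(type_name, ()))
    while stack:
        t = stack.pop()
        if t == type_name:
            return True
        if t not in seen:
            seen.add(t)
            stack.extend(fields.get(t, ()))
    return False


# Hypothetical type table for illustration only.
fields = {
    "Integer": [],
    "String": [],
    "List[String]": ["String"],
    "Node": ["Integer", "Node"],  # a linked-list style class
}

print(can_cycle("Integer", fields))       # False
print(can_cycle("List[String]", fields))  # False
print(can_cycle("Node", fields))          # True
```

Types that come back `False` could skip cycle detection entirely, which is the payoff of the static rules described above.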
And yes, that ~60 lines per platform is really all that's needed. (And actually, I should be able to merge more of it for SysV platforms.)
So, I've thought about a GC, but I'd really prefer not to have it.
However, there are compile time versions of method_missing, delegates and instance_exec available (usually with slightly different naming due to not having the exact same semantics as the ruby counterparts), so it is still possible to do some magic.
It's also quite a joy to program in, and I'm a JavaScript developer, so I'm coming from the other side of the spectrum.
I consider myself a javascript developer because that's what I've done my best work in and that's what I enjoy the most.
[1] https://www.techempower.com/benchmarks/#section=data-r12&hw=...
"We don't know what caused the errors and unfortunately we didn't have a chance to collaborate with them on a true run. A few months ago they added Phoenix in a preview, but it was a very poor implementation. They were testing JSON benchmarks through the :browser pipeline, complete with crsf token generation. They had a dev DB pool size of 10, where other frameworks were given of pool size of 100. And they also had heavy IO logging, where other frameworks did no logging. We sent a PR to address these issues, and I was hoping to see true results in the latest runs, but no preview was provided this time and we weren't able to work with them on the errors. tldr; these results are not representative of the framework."
From my personal experience going from Rails/Sinatra to Phoenix it feels a lot faster but I haven't done any benchmarks so take that with a grain of salt.
In the Techempower benchmarks the Phoenix tests had a ton of errors and there was no preview run so whoever submitted them wasn't able to fix them. Look at the error column. I assume they'll be fixed in the next run.
Take a look at the "Single Query" test [1]. The entry for Elli has zero errors and is still slower than many Python & Ruby frameworks.
[1]: https://www.techempower.com/benchmarks/#section=data-r12&hw=...
https://www.techempower.com/benchmarks/#section=data-r12&hw=...
we should all be using Dart on the server coupled with MongoDB perhaps.
Hard-realtime is always "allocate everything up front". It has to be. Allocation of dynamic sizes is not a problem you can make fully deterministic.
Crystal will at some point also be forced to optimize their GC for these cases, although it currently uses an out-of-the-box GC called Boehm-Demers-Weiser conservative garbage collector http://www.hboehm.info/gc/ which they have acknowledged they need to replace sooner or later.
Regardless of how good GC is you would never use it in a hard real-time system because it is non-deterministic. IRC chat is only soft real-time.
The principle is to regularly use a bounded amount of time for collecting - in line with the latency requirements for the whole system. I think the relevant term is 'tick tock', as in tick - compute, tock - collect.
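A rough sketch of that tick/tock idea: the marker below does a bounded slice of work per call, so the caller decides how much collection time each cycle gets. The class and its `tock` method are made up for illustration, and the sketch ignores the hard part (the program mutating the graph between slices, which real incremental collectors handle with barriers).

```python
from collections import deque


class IncrementalMarker:
    """Hypothetical incremental marker: scans at most `budget` objects per tock."""
    def __init__(self, roots, edges):
        self.edges = edges            # object -> list of objects it references
        self.marked = set(roots)
        self.pending = deque(roots)

    def tock(self, budget):
        """Do one bounded slice of marking; return True once marking is complete."""
        for _ in range(budget):
            if not self.pending:
                break
            obj = self.pending.popleft()
            for child in self.edges.get(obj, ()):
                if child not in self.marked:
                    self.marked.add(child)
                    self.pending.append(child)
        return not self.pending


edges = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "x": ["y"]}
marker = IncrementalMarker(["a"], edges)

while not marker.tock(budget=2):
    pass  # tick: the application would do its own bounded work here

print(sorted(marker.marked))  # ['a', 'b', 'c', 'd']
```

Anything left unmarked (`x` and `y` here) is garbage; the sweep could be sliced up the same way.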
If your program stops for 50 ms, do you really care if it was because of a GC cycle or something else? If you really do care, then you are not allowed to target Linux, Windows or OS X all of which are decidedly not real time operating systems.
- Firstly, we are a Ruby shop and most of the time the errors are pretty self descriptive that we can easily solve. Also the IRC/Gitter room is full of helpful people that you can get instant feedback about any issue (even a compiler bug). Note that we're not afraid of writing code :)
- I've updated our application once in the last 6 months. It's only a minor change for adding type annotations which took like 10 minutes.
- Actually I think that Crystal is already in beta quality (for us). Calling it Alpha is really an understatement.
I feel like Crystal is one of the best projects out there but it's not getting much attention. What do you think it will take to get people to start writing projects and contributing?
IMO the number one thing will be to start getting as many articles like this out as possible, and go beyond that. TodoMVC/Yelp Clone tutorials would go a LONG way I think, especially since a lot of people will just fork those projects to mess around in them.
This is with pointer happy java code, not with special effort to have pointer less data.
[1] http://openjdk.java.net/jeps/189 [2] https://fedoraproject.org/wiki/Changes/Shenandoah
I'm a bit surprised that indirection is efficient enough to be worth the trouble (since you need reads and writes to branch for the indirected-object case), but I can't argue with results.
It depends on the problem domain as always. If you're doing a lot of naïve single-threaded number-crunching, have fun. But Elixir/Phoenix haven't failed me for web applications, even in very intensive situations. It's the first time I've barely had to do any tuning beyond external factors such as network and database queries (which, by the way, Phoenix's Ecto handles very gracefully and explicitly).
This is my experience with every Elixir/Phoenix app I've worked on thus far. I apologize if I made it sound like some sort of universal truth.
I think that's a little simplistic. It's true that it's early days for cross platform Swift but clear progress is being made:
1. https://swift.org/download/
2. https://github.com/SwiftAndroid
3. https://github.com/tinysun212/swift-windows
4. https://swiftforwindows.codeplex.com/
Better cross platform support is a goal of Swift 3.0:
I certainly wouldn't invest my code anywhere near Apple unless that was also my market. Who knows what direction it's moving, aside from in Apple's interest. I'll stick with rust and go: between the two I get everything but easy objective c interop.
Plus, their design decisions with respect to null ability is... Interesting. It's gonna feel gimped by legacy needs for a long time.
Keep also in mind that my reply was within the context of a thread on Crystal. OP sort of sold Elixir as a fast language that we can use now while we wait for Crystal to mature.
My point is not that Elixir is useless. My point is that we must not oversell Elixir as a fast language. Generally speaking, it isn't. It excels at horizontal scaling, which is great, but I wouldn't call it "fast" without proper qualifiers.
as-in FAQ 1.4 "What sort of problems is Erlang not particularly suitable for? … The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant-factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination."
Not what I wanted to achieve, but the lexer contains some frequently used methods and is fairly simple and straight-forward, but if you want to learn the semantics why not just go to their docs? https://crystal-lang.org/docs/syntax_and_semantics/index.htm...
> Consider this use case: I spawn five fibers. Can I send the same mutable object to all five? If so, can they attempt to mutate the object without properly taking turns? (e.g., using a mutex)
Taken from the docs:
Crystal has Channels inspired by CSP[1]. They allow communicating data between fibers without sharing memory and without having to worry about locks, semaphores or other special structures.
...
Because at this moment there's only a single thread executing your code, accessing and modifying a global variable in different fibers will work just fine. However, once multiple threads (parallelism) is introduced in the language, it might break. That's why the recommended mechanism to communicate data is using channels and sending messages between them. Internally, a channel implements all the locking mechanisms to avoid data races, but from the outside you use them as communication primitives, so you (the user) don't have to use locks.
[1] - https://en.wikipedia.org/wiki/Communicating_sequential_proce...
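For anyone who hasn't used CSP-style channels, the pattern in the quoted docs looks roughly like this. The sketch is in Python rather than Crystal, with `queue.Queue` standing in for a channel and threads standing in for fibers; that mapping is an assumption made for illustration, not how Crystal implements it.

```python
import queue
import threading


def worker(channel):
    # Workers never touch the shared total; they only send messages.
    for _ in range(1000):
        channel.put(1)
    channel.put(None)  # sentinel: this worker is done


channel = queue.Queue()
workers = [threading.Thread(target=worker, args=(channel,)) for _ in range(5)]
for t in workers:
    t.start()

# Only this thread owns and mutates `total`, so no explicit lock is needed.
total = 0
finished = 0
while finished < len(workers):
    msg = channel.get()
    if msg is None:
        finished += 1
    else:
        total += msg

for t in workers:
    t.join()
print(total)  # 5000
```

Note that this is a convention, not a guarantee: nothing stops a worker from mutating a shared object directly, which is exactly the gap the question above is probing.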
Yep, fair enough. I'm somewhat worried about how abstract classes work: https://crystal-lang.org/docs/syntax_and_semantics/virtual_a...
Apparently, Crystal can infer the methods of an abstract class from the methods of its subclasses. In the Animal/Dog/Cat example, what happens if, in a separate module, I define a Snake class that doesn't have a `talk` method? There are several possibilities, sadly all pretty bad:
(0) Does the type checker retroactively decide that not all Animals can talk?
(1) Does the type checker decide that Animal subclasses can't be defined in a separate module?
(2) Does the type checker decide that, if an Animal subclass is defined in a separate module, it must have all the common methods to all Animal subclasses defined in the same module as Animal?
(3) Is type checking not modular?
> That's why the recommended mechanism to communicate data is using channels and sending messages between them.
I'm asking about the errors that the language prevents, not the community's conventions.
My understanding is that, at the moment, the language doesn't have native threads yet. The design of multi-threading behavior and errors the language prevents is still being worked out.
The abstract class defines which methods every class that inherits from it must define by using abstract methods. If you write a class that inherits from an abstract class, and you do not define a method that the abstract class says that you absolutely must define, the compiler will raise an error for you. You can inherit from classes between modules.
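The same contract can be sketched in Python with the `abc` module, reusing the Animal/Dog/Snake names from the question. The important difference, and the reason this is only an analogy, is that Python rejects the incomplete subclass at instantiation time, whereas the claim above is that Crystal's compiler rejects it at compile time.

```python
from abc import ABC, abstractmethod


class Animal(ABC):
    @abstractmethod
    def talk(self):
        """Every concrete Animal must define this."""


class Dog(Animal):
    def talk(self):
        return "Woof!"


class Snake(Animal):
    pass  # forgot to define talk


print(Dog().talk())
try:
    Snake()  # abstract method still missing, so this is refused
except TypeError as err:
    print("rejected:", err)
```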
> I'm asking about the errors that the language prevents, not the community's conventions.
I haven't actually tried it, definitely something I'll check out another time. I understand your concern though!
Memory management isn't hard --- you just need to pay attention to detail and not say "YOLO, let's abort on OOM" like the Rust stdlib does. Rust is an unacceptable language for anyone who cares about robustly responding to heap exhaustion.
You're quite right. The problem is that every bit of attention you spend on that detail is attention that you're not spending on details that are actually solving your problem.
I programmed in C for decades. I do not miss malloc() and free() in the least.
(I still do use C when the situation warrants, but the situations where it is warranted are becoming rarer and rarer with each passing year).
Add power consumption to that list.
-> 'extreymely difficult to get INcorrect in the way rust does'
Hard real-time GC systems exist. In these systems, you can prove that pauses last no longer than a certain number of milliseconds. They're definitely applicable to programs with UI.
Can you prove that dropping a reference doesn't free an arbitrarily large number of objects? You can probably convince yourself in specific cases for specific programs that you don't see arbitrarily large refcount-release times, but any change you make to the code might invalidate this analysis.
A hard real-time GC stays hard real time.
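The cascading-free problem is easy to show with a toy reference count. This is hand-rolled bookkeeping for illustration (the `Node` class and `release` function are hypothetical), not how CPython or any particular runtime does it.

```python
class Node:
    """Toy manually reference-counted list node."""
    def __init__(self, next_node=None):
        self.refcount = 1      # the reference held by whoever created us
        self.next = next_node  # takes over the creator's reference to next_node


def release(node):
    """Drop one reference; return how many nodes ended up being freed."""
    freed = 0
    while node is not None:
        node.refcount -= 1
        if node.refcount > 0:
            break              # someone else still holds it, cascade stops
        freed += 1
        nxt = node.next
        node.next = None       # "free" this node, then release what it held
        node = nxt
    return freed


head = None
for _ in range(100_000):
    head = Node(head)

print(release(head))  # 100000
```

One innocent-looking `release` call did work proportional to the length of the list, so the pause is bounded only by the size of the structure you happen to drop.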
All of these can happen with non-GCd languages through heap fragmentation (i.e. even when correctly allocating and deallocating memory, you can still end up with a fragmented heap.) The only way to avoid this is to avoid all dynamic allocation (which is indeed done in a lot of systems) or exclusively use memory pools instead of a traditional heap.
EDIT: I'd also add that this is such a niche[1] area of programming that expecting any mainstream language to meaningfully support it is... optimistic and that mainstream languages shouldn't try to support it. (Soft real-time may be reasonable, but I believe that can be achieved with GC as demonstrated by the Azul JVM.)
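A minimal sketch of the pool approach, assuming fixed-size buffers: everything is allocated once at startup and then recycled, so steady-state operation never asks the general-purpose heap for memory. The `Pool` class is invented for illustration, and Python itself still allocates behind the scenes, so read this as the shape of the technique rather than a real-time-safe implementation.

```python
class Pool:
    """Hypothetical fixed-capacity pool: every buffer is created up front."""
    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        if not self._free:
            raise MemoryError("pool exhausted")  # fail loudly instead of growing
        return self._free.pop()

    def release(self, buf):
        buf[:] = bytes(len(buf))  # zero the contents; the bytearray is reused
        self._free.append(buf)


pool = Pool(count=2, size=64)
a = pool.acquire()
b = pool.acquire()
a[0] = 255
pool.release(a)
c = pool.acquire()
print(c is a, c[0])  # True 0
```

Because all blocks are the same size, any free block satisfies any request, which is why pools sidestep fragmentation.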
[1] Niche, but obviously important, but perhaps not lucrative enough for anything to displace C or perhaps Ada -- given that these industries tend to be extremely conservative. (I wonder if ATS is used, though. Can't claim it's pretty, but proof seems like it would be a good thing for these systems?)
Before Aonix got bought by PTC they had a few contracts for weapons control for missiles and battle cruisers.
- disable the GC
- deregister threads so that they are not stopped by GC
- eventually avoid the runtime altogether
There really is no realtime system that D can't do.
The whole anti-GC thing is a giant strawman that considers all GC stop-the-world, unavoidable, and overarching. Academia decided in favor of GC decades ago, and industry has been following suit for good reasons: the mental overhead associated with finding owners for everything.
As they say, you can write FORTRAN in any language, but you don't necessarily want to (this is unfair to modern fortran which I hear is actually a decent language).
On the theoretical side, not reasoning about ownership means sharing data between threads is done with copies (slower) or locking (slower and error prone); if you know about ownership you can share references to data for free while it can't be mutated.
Ownership is also important for any non-memory resource (file handles, mutexes, etc). GCs release those "whenever", maybe never, unless you close manually.
And even though manual memory management has some small non-deterministic overhead for heap coalescing (which one can usually work around with pools), most GCs I've worked with add measurable overhead. This equates to more cost per server, more load, more battery life drained, higher response times...
I don't think it follows and it's rather the reverse: if what I share has a global owner (i.e. the GC), I don't have to lock or copy by definition: once it stops being reachable it will be collected. That's why some lock-free algorithms are enabled by the GC. With ownership you would have to have a unique owner, or reference counts. GC does require write barriers or stop-the-world though so let's say it's a draw :)
> Ownership is also important for any non-memory resource (file handles, mutexes, etc). GCs release those "whenever", maybe never, unless you close manually.
Yeah, it's a big problem that the GC even attempts, poorly, to close them. But D has scope guards and RAII built in, so for the 50% of non-memory resources you still have to think about ownership indeed. That's more complicated than the C++ situation. But it does not prevent realtime work, and you may well find yourself having more time to optimize :)
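The scope-guard idea translates to most GC'd languages. A small Python sketch, with a made-up `Handle` class standing in for a file or mutex: the close runs when the block exits, error or not, instead of "whenever" a finalizer gets around to it.

```python
class Handle:
    """Hypothetical resource that records when it is opened and closed."""
    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("open")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("close")  # runs on scope exit, even on error
        return False              # do not swallow the exception


log = []
try:
    with Handle(log):
        log.append("work")
        raise RuntimeError("boom")
except RuntimeError:
    log.append("handled")

print(log)  # ['open', 'work', 'close', 'handled']
```

The catch is the one raised above: the programmer still has to decide which scope owns the resource, so ownership thinking does not go away just because memory is collected.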
While Nim is my favorite language, I can understand that it has a small userbase, for these reasons:
1. No major backer like Google for Go or Mozilla for Rust
2. No killer feature like "memory safety and performance without GC" for Rust, instead a mix of all the reasonable down-to-earth features I want in a programming language
3. Some unique decisions instead of what you're used to from other languages, for example partial case sensitivity
2+2 * 5 = 20
2 + 2 * 5 = 12
Partial case sensitivity and the special underscore case are features I can live with. Unfortunately this has actually become a stumbling block for a wider adoption of Nim.
All strange special features should be optional, not default.
If you put space around some operators and not around others, in a way that doesn't correspond to precedence, you're going to confuse anyone who reads your code, in any language.
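For anyone puzzled by how `2+2 * 5` could come out as 20, here is a toy evaluator where operators written without spaces bind before spaced ones. The rule is deliberately simplified and only handles non-negative integers with `+` and `*`; it illustrates the idea and makes no claim about how Nim actually parses.

```python
import re


def eval_normal(tokens):
    """Evaluate a flat [num, op, num, ...] list where * binds tighter than +."""
    total, product = 0, tokens[0]
    for op, value in zip(tokens[1::2], tokens[2::2]):
        if op == "*":
            product *= value
        else:  # "+": close the running product and start a new one
            total += product
            product = value
    return total + product


def tokenize_tight(chunk):
    """Split a space-free chunk such as '2+2' into [2, '+', 2]."""
    parts = re.split(r"([+*])", chunk)
    return [p if p in ("+", "*") else int(p) for p in parts]


def eval_spaced(expr):
    """Toy rule: each space-free chunk is evaluated first, as if parenthesized."""
    tokens = []
    for chunk in expr.split():
        if chunk in ("+", "*"):
            tokens.append(chunk)
        else:
            tokens.append(eval_normal(tokenize_tight(chunk)))
    return eval_normal(tokens)


print(eval_spaced("2+2 * 5"))    # 20
print(eval_spaced("2 + 2 * 5"))  # 12
```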
I think this goes beyond syntactic sugar. Holding the hand of the developer too much?
Personally, as a Python programmer I like interfacing with C++ code like Qt via PyQt. If I see a camelCase method I know where it came from, but if I see a PEP-8 style name or method I know it's our own code, not from Qt.
I'd be very interested in a language that is roughly as low level as C, but has some obvious warts "fixed" while still being able to run on bare metal or with a minimal runtime system. I also don't care about a standard lib as long as I can call open(), close(), read(), write(), socket(), etc.
Native threads is another requirement for me.
Things I'd like to see in a language:
- compile to native executable
- type inference
- module system without header files
- easy to call into native C code, and export functions so they can be called from C or any other language
- first class SIMD structures (this is missing from Rust!), so that you don't have to duplicate code for sin4f and sin8f (which would be line-by-line equal, except types)
- perhaps some kind of modern polymorphism (ie. not class based OOP)
- can target GPUs via LLVM or SPIR-V
- memory safety is optional, but nice to have. I'd be mostly interested in using this kind of language for GPU kernels and tight inner loops, where you wouldn't be allocating anyways
I have a bunch of design ideas and prototypes in my drawer waiting for a lot of free time and inspiration appearing.
I like my tools sharp, even if it means there's going to be blood occasionally.
One of the virtues of the language would also be the direct correspondence between the HLL code and the emitted VM instructions, without any optimisation passes. This makes it much easier to reason about code performance and to write code which performs consistently and predictably (albeit a bit slower).
Also, I am not sure what you mean by "first class SIMD structures", but you can definitely have a single definition for sin4f and sin8f if they are line by line equal except types, by using union types.
[0]: https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md
It doesn't compile to a native executable, but since it produces .o files you should be able to just set your entry point and go from there.
The only thing I would add would be that compared to Ruby, Nim still takes you quite a while to put something together, so defaulting to Ruby isn't necessarily a great idea.
> Rust regex! runs faster than Regex
which is a very old claim - Regex should now be much faster than regex! ever was. Any pre-1.0 Rust benchmarks are probably wrong (to be fair, most benchmarks are probably wrong anyway).
No it's not, it is very different from Node. Elixir/Erlang know how to take advantage of multiple CPU cores easily. Node by default doesn't.
A single JVM server can do that load, scaling and providing fault tolerance for a server that just accepts requests is trivial these days, also, if your requests do computationally intensive stuff you are going to have a very bad time with Erlang.
It all depends on the problem domain.
Exactly, and the domain for Elixir/Erlang is way more niche and specialized than applicable domains of other languages.
I don't have anything against Elixir but part of its crowd just advertises it as the best thing for everything.
Furthermore, assuming each request is mapped to an Erlang process, each of them get their own VM -- no stop-the-world.
Not a very usable one. It means that atomicity is something you have to fight very hard to actually get, and very easy to accidentally destroy.
Granted, I'm completely head over heels for Rust, and I agree completely that ConcurrentModificationException is a crappy answer, but it is defined behavior (AFAIK).
Just look at the hand performance articles on C... People talk about it because you can do it, not because you have to do it.
It's crazy I can't tell Java and NodeJS "use the memory you need". Instead I have to specify max memory sizes (and then watch as they inevitably consume all of it).
Define the 'memory you need'? You know the computer doesn't have a crystal ball to know what latency vs memory usage trade-off you want.
I don't write java, but my impression is the articles being talked about are much more java specific than the C ones are (C specific).
In my limited experience writing performance-critical Python code the improvements always came from choosing better libraries (eg for faster serializations) or improving our own code. The GC never showed up in profiling as an issue for us.
And yet, it performs great, including predictable STW latencies, all with a relatively simple and straightforward algorithm. With that in mind, the Java GC's manifold ways of tuning the GC in all its aspects for minimal performance improvements sound more like something that was purposely built as something people can build their livelihood upon by providing consulting services, rather than something that was built for the best performance possible for everyone.
And they tend to be very memory hungry. Often, the memory overhead is the difference between running a program or having a bunch of browser's tabs open.
Since memory deallocation is not deterministic, there has to be a tradeoff between lazy scheduling (which increases memory consumption) and frequent scheduling (which has a performance overhead).
You can fine-tune the balance between those variables, but that means a high-performance system with a low memory footprint is very challenging to build on a tracing garbage collector (the kind in Java and Node).
Go has been able to perform well with a simple GC because it doesn't suffer from this problem.
These articles are bullshit. Most settings are either obsolete or forcing the default. The rest is just useless.
I spent months doing performance tuning of applications stacks which were using Java (for app, database or both). Most of the settings are useless and barely change +-1% in performance.
The JVM has had good defaults for a while. The only thing one MUST configure is the -Xms and -Xmx options, which set the initial and maximum heap size of the Java process (set both to the same value).
That's as hostile to newcomers as it gets. Contrast this with the Rust book, that gets you from "I have no idea how anything works" to "hey I just wrote a small program!" in a few minutes.
I learned Elixir totally from Getting started guide and then the documentation. Then for OTP, I have read an Erlang book to understand it well. Elixir's documentation is really awesome and one of the best I have seen.
[0] - http://elixir-lang.org/getting-started/introduction.html
I've used these before when playing with it:
* http://elixir-lang.org/getting-started/introduction.html
* https://howistart.org/posts/elixir/1
* Asking on #elixir-lang on freenode IRC
Here is a specific chapter on distribution and nodes:
http://elixir-lang.org/getting-started/mix-otp/distributed-t...
Their IRC channel, they were pretty helpful for me. And also I am sure they'd want to hear about your experience to also improve their docs and tutorials.
Exactly. To just add to your point, there is no longer a reason to settle for GC pauses with Rust. It does require more thought while writing the code, but what you gain is a firmly consistent runtime. If your memory allocation is slow, you can create your own allocator/slab, and then use that for hot memory space and optimize it out.
As a longtime Java geek who never understood the argument against GC, this has been a mind altering experience. I was a big C++ person before, but after one too many memory leaks and segfaults, I could never imagine not wanting a GC. Then Rust came along and taught me better.
1) Rust appears to be significantly less productive than a true GCd language. I see a lot of people talking about "fighting the borrow checker" with Rust and I see a lot of articles describing basic patterns that would be simple in any other language, but are complex in Rust.
2) If you want to invoke code that assumes a GC you need to have one.
You can do manual memory allocation in Java by the way and a few high performance libraries do. It's just not common.
It's interesting to note that the Chrome guys have gone in the direction of deploying a GC into C++ whereas the Mozilla guys have gone in the direction of moving manual memory management into the type system. I've got nothing against Rust but I'm a Chrome user, personally.
My experience is that this is an initial hurdle to clear. As an example, I've almost exclusively worked in GC'd languages for a while, and after learning Rust for a few months I very very rarely have borrow check errors.
The fact that it occasionally requires a complex pattern to do right should get better with time (non-lexical lifetimes would help), and there's also discussion around GC integration so that you could interact with a scripting language GC when writing a plugin for it, or you could farm out GC'd objects when you need to have cycles (i.e. in graph algorithms).
> the Chrome guys have gone in the direction of deploying a GC into C++
Interesting. I'm curious how much of the browser relies on it. I'm also curious whether it's an attempt to paper over C++ with a little memory safety, or whether it actually offers performance improvements. My original point was not that GC is bad, per se, but that I quite like being able to avoid it when it's reliable to do so, which is not the case in C++, IMO.
[0]: https://hacks.mozilla.org/2014/09/generational-garbage-colle...
[1]: https://hacks.mozilla.org/2015/07/compacting-garbage-collect...
> I see a lot of people talking about "fighting the borrow checker" with Rust
It's also usually described as "at first, I fought the borrow checker, but then I internalized its rules and it's now second nature." You're not wrong that it's a hump to get over, but once you do, it's not a big deal.
> that would be simple in any other language,
Any other _GC'd_ language. You still fight the same kinds of complexity when you don't have GC.
Also I am yet to see any large scale production deployment of those Hadoop alternatives.
But it might still be like 5 years from now, so who knows how it will evolve.
You're assuming your compiler or operating system won't cause memory to be freed at different times.
I also think it's a shame that we waste so much electricity and rare earth minerals on keeping ourselves from screwing up.
Wasting man hours on manufactured problems is far worse than wasting coal.
Manufactured problems? When what's now coastline is underwater, I'll be glad to see if you remain as smug.
I get trying new things, but some things are pretty well agreed upon.
I look forward to those changes coming to Java, and I think that stack-based value types could do a lot for the language. That said, the Scylla folks seem to have gotten a lot of their performance gains from CPU/thread affinity and async I/O (http://www.scylladb.com/2016/03/18/generalist-engineer-cassa...). NIO is pretty great in Java-land, IIRC, but CPU/thread affinity is, I imagine, hard to pull off with a garbage collector.
Another thing I'm curious about w.r.t. value types in Java -- hasn't C# had those for a while? If so, and if your claim that value types will provide large performance benefits is true, why isn't C# always blowing Java away in benchmarks? Perhaps it is and I'm just not seeing them. Perhaps Java's escape analysis is already pretty good and solves the 60/70/80% case? Perhaps I'm not well versed enough in the subject to understand the interactions here.
NGEN was just good enough for allowing quick application startup.
Also they didn't invest too much in optimizations in the old JIT.
Especially since .NET always had good interop to native code via C++/CLI, P/Invoke and RCW.
There were some improvements like multicore JIT in .NET 4.0 and PGO support in .NET 4.5, but not much in terms of optimization algorithms.
Hence why .NET 4.6 got a new revamped JIT called RyuJIT with SIMD support and lots of nice optimizations.
But this is only for the desktop.
.NET for the Windows Store is AOT compiled with the same backend that Visual C++ uses. In Windows 8 and 8.1 they came up with MDIL from Singularity/Midori, but with Windows 10 they improved the workflow to what is nowadays known as .NET Native.
With the ongoing refactorings they plan to make C2 (Visual C++ backend) a kind of LLVM for their languages, similar to the Phoenix MSR project they did a few years ago.
If you watch the Build 2015 and 2016 talks, most of them are making use of C# with the new WinRT (COM based) APIs, leaving C++ just for the DX related ones.
So they are quite serious about taking their learnings from project Midori and improving overall .NET performance.
I think this is because, on a platform like Java or .NET that uses generational garbage collection, the heap starts to behave like a stack in a lot of ways. Allocations are fast, since you just put objects at the top of the heap. And then, since they're at the top of the heap, they tend to stay in the cache where access is fast, so pointer chasing doesn't end up being such a big deal. On the other hand, if you use a struct, every time you pass or return it you end up creating a shallow copy of the data structure instead of just passing a single pointer.
(Disclaimer: preceding comment is very speculative.)
I think you mean specialized generics (i.e. no autoboxing of primitives when used in generics)? Reified generics implies carrying around all generic type information at runtime, which will not be the case and also has nothing to do with performance. Non-value generics will still be erased I thought.
https://www.youtube.com/watch?v=Tc9vs_HFHVo&list=PLX8CzqL3Ar...
They will change the constant pool to have some kind of template information that gets specialized (what they call type species) into a specific set of types.
The plan is that even if Java cannot fully take advantage of all possibilities due to backwards compatibility with existing libraries in binary format, the JVM will support it for other languages not tied to Java semantics and backwards compatibility.
let a = 10;
a / 5
output> 2
let b = pwd();
b/temp
output> Directory<"~/temp">
b / 2
error> b:Directory does not implement method "divide(:number)"
a/temp
error> a:int does not implement method "get(:string)"
I understand the hate and everything, but honestly I think it presents a fun and refreshing way of solving problems.
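The type-directed `/` in the shell sketch above can be approximated in an existing language with operator overloading. A minimal sketch in Rust, assuming a hypothetical `Directory` wrapper (the name and its single field are illustrative, not taken from any real shell): `/` on integers divides, `/` on a `Directory` appends a path segment, and the mismatched combinations fail to compile.

```rust
use std::ops::Div;
use std::path::PathBuf;

/// Hypothetical stand-in for the `Directory` value in the example above.
#[derive(Debug, Clone, PartialEq)]
struct Directory(PathBuf);

// `dir / "name"` appends a path segment instead of dividing.
impl Div<&str> for Directory {
    type Output = Directory;

    fn div(self, segment: &str) -> Directory {
        Directory(self.0.join(segment))
    }
}

fn main() {
    let a = 10;
    println!("{}", a / 5); // plain integer division

    let b = Directory(PathBuf::from("~"));
    let temp = b.clone() / "temp";
    println!("{:?}", temp);

    // Neither line below compiles, because no matching `Div` impl exists:
    // let _ = b / 2;
    // let _ = a / "temp";
}
```

The two rejected lines correspond to the two `error>` cases in the example, except that the mismatch is reported at compile time.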
Also npm is pretty awesome, aside from how massive the node_modules folder gets.
Maybe using numbers for chapters, letters for sections, and a 1.A notation for the headers? At the least, adding the chapter numbers to the header, so it says "1. Introduction"? Putting the chapter numbers in the URL would help too. So would adding some highlighting in the right column index to indicate the location of the current page.
It seems like fantastic introductory material, but only if people can find it. Usually, the first thing I do when I encounter a paginated manual like that is to search for a "single page" or "print" or "PDF" link. Is there one there that I couldn't find? If not, adding one might be a simple (partial) fix.
Edit: If others have more (and better) suggestions, they can be added to the issue I filed here: https://github.com/elixir-lang/elixir-lang.github.com/issues...
I don't know if the Erlang/Dialyzer docs cover that specifically, but I know the original paper on Dialyzer and Dialyzer type specs does.
Edit: here is a source: http://nim-lang.org/docs/manual.html#syntax-strong-spaces ("... if the experimental parser directive #?strongSpaces is used..."). The last time this was discussed I said that it would be removed completely, and I still believe it will be. It's simply not a priority for us right now.
Some of those features are default in Nim, some (strongspaces) are not. I say that all such weird features should be optional in general so that newcomers don't get scared off.
Also case and underscore should work like in C per default since Nim interoperates with C seamlessly anyway. Case insensitivity and ignoring underscore are ok if optional.
Also a reason why C# 7 is getting them as return types in addition to ref/out parameters.
In their example, the type checker can infer that Animal has a `talk` method, even if it's never explicitly defined:
abstract class Animal
  # no talk method here!
end
class Dog < Animal
  def talk
    "Woof!"
  end
end
class Cat < Animal
  def talk
    "Miau"
  end
end
class Person
  getter pet
  def initialize(@name : String, @pet : Animal)
  end
end
john = Person.new "John", Dog.new
In their own words, “Now the code compiles:”
john.pet.talk #=> "Woof!"
Now, what happens if, in a separate module, I define a Snake subclass of Animal, without a talk method? What happens in this corner case isn't documented anywhere.
Type system design is very serious business, and can't be done by mindless trial and error. When a type system has a safety hole, patching it is pretty much guaranteed to break other people's code.
You can't compile a part of your application in Crystal; there are no linked Crystal libraries. You compile your whole app, with all class definitions. The compiler then has all the type information to know if a method is missing in a subclass when that method is used from a parent class.
For future reference: The easiest way to reply to a message that doesn't have a “reply” link below, is to click on the “X minutes ago” link above.
(0) If Snake is in the same module as Animal, the error is that “john.pet.talk” is a call to a nonexistent method.
(1) If Snake is in a different module from Animal, the error is that Snake doesn't have a talk method.
This isn't nice.
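For contrast, here is a sketch of the same example in Rust, where the required method is declared on the interface instead of being inferred from whole-program analysis. This is only an illustration of the alternative design, not a description of how Crystal works. The hypothetical `Snake` is rejected at its own `impl`, whichever module it lives in.

```rust
// The interface states up front that every Animal can talk.
trait Animal {
    fn talk(&self) -> String;
}

struct Dog;
struct Cat;

impl Animal for Dog {
    fn talk(&self) -> String {
        "Woof!".to_string()
    }
}

impl Animal for Cat {
    fn talk(&self) -> String {
        "Miau".to_string()
    }
}

struct Person {
    name: String,
    pet: Box<dyn Animal>,
}

// A Snake without `talk` is reported where Snake is defined, not at the call site:
// struct Snake;
// impl Animal for Snake {} // does not compile: `talk` is not implemented

fn main() {
    let john = Person { name: "John".to_string(), pet: Box::new(Dog) };
    let jane = Person { name: "Jane".to_string(), pet: Box::new(Cat) };
    println!("{}'s pet says {}", john.name, john.pet.talk());
    println!("{}'s pet says {}", jane.name, jane.pet.talk());
}
```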
@type pair(t, u) :: {t, u}
@type result(t, e) :: {:ok, t} | {:error, e}
I'm more talking about type parameters on functions, and also constraining by behaviors and protocols. For example the typespec for Elixir's Stream.map/2 (http://elixir-lang.org/docs/stable/elixir/Stream.html#map/2): @spec map(Enumerable.t, (element -> any)) :: Enumerable.t
Note all the other similarly unspecific type definitions. In a parametrically polymorphic lang, you could do something like: @spec map[e1: Enumerable, e2: Enumerable](e1.t, (e1.element -> e2.element)) :: e2.t
And perhaps even intersections of behaviours and protocols: @spec foo[a: Eq & Ord](a.t, a.t) :: bool
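For comparison, the two constrained signatures above can be written in a language that already has bounded parametric polymorphism. A rough sketch in Rust: `map_into` and `foo` are hypothetical names, and the body of `foo` is an arbitrary placeholder, since the pseudo-spec only gives its type.

```rust
// Rough analogue of map[e1, e2](e1.t, (e1.element -> e2.element)) :: e2.t:
// the closure's input type is tied to the input collection's element type,
// and its output type is tied to the output collection's element type.
fn map_into<I, B, O, F>(input: I, f: F) -> O
where
    I: IntoIterator,
    F: FnMut(I::Item) -> B,
    O: FromIterator<B>,
{
    input.into_iter().map(f).collect()
}

// Rough analogue of foo[a: Eq & Ord](a.t, a.t) :: bool.
// `Ord` already implies `Eq` in Rust; both are spelled out to mirror the spec.
fn foo<A: Eq + Ord>(x: A, y: A) -> bool {
    x <= y
}

fn main() {
    let lengths: Vec<usize> = map_into(vec!["a", "bcd"], |s| s.len());
    println!("{:?}", lengths);
    println!("{}", foo(1, 2));
}
```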
('scuse the Elixir syntax...)
Oh interesting. Does it run callbacks on multiple threads? That's new, I haven't followed it for a couple of years.
That said, it's not even like a single-GC approach is incompatible with Rust: https://blog.mozilla.org/research/2014/08/26/javascript-serv...
UB is simply the latest stick to hit C with. In day-to-day working nobody worries about UB at all as you generally don't notice it.
Same with lack of a GC; this is a plus point for C for most applications, not a negative.
(and yes, I do have plenty of experience in it, I've been using it for the past 20 years, have you?).
I agree that in day-to-day working nobody worries about it, and I see odd behaviors all the time because of it. In particular as fewer applications are using C, a much higher fraction of C becomes systems and embedded where you are more likely to accidentally run afoul of choices that were made to compete with FORTRAN on numerical performance.
A read from NULL will crash on most unixen, but will not crash on some targets without an MPU and when running in kernel mode, so may be left lurking (see the linux kernel).
The C89 aliasing rules in particular are completely at odds with a lot of kernel and device driver code, and in addition where the int size was 2 bytes previously but is now 4 bytes you can have signed overflow where before the behavior was well defined:
UINT2 x; // 16 bit integer
...
x *= x; // multiplication mod 2**16 on 16 bit targets; on 32 bit targets x is promoted to signed int, and the product can overflow, which is undefined behavior.
These are some real-world bugs I've dealt with.
> (and yes, I do have plenty of experience in it, I've been using it for the past 20 years, have you?).
I've been using it professionally for only about 15 years, but I started using C at home in '92.
[edit]
> Same with lack of a GC; this is a plus point for C for most applications, not a negative.
This is a bit of a non-sequitur, as I didn't mention memory management at all. C doesn't need a GC. It could use more memory safety though. There's been plenty of academic research on improving C's memory safety without significant runtime overhead; a lot of those techniques were used in Rust. There are plenty of tools that can catch a large fraction of memory errors at compile time, which is a good thing.
Honest question, which is the case?
(0) You find it easy to determine, by visual inspection, whether a piece of code has undefined behavior.
(1) Your coding practices make it difficult to accidentally introduce undefined behavior in the first place.
> Same with lack of a GC; this is a plus point for C for most applications, not a negative.
Agreed. C addresses use cases for which GC (or any other feature requiring heavy runtime support) is simply unacceptable.
The vast majority of all code you write will not invoke UB. Most people tend to stick to an 'easy' subset of syntax, unlike, say, C++, where everyone uses a different subset of features, making it in effect multiple languages.
A combination of testing the known edge cases, wraparound issues, size issues, static analysis and tooling means running into an example of UB is extremely rare in most cases.
It used to be that people used dynamic memory allocation to beat C with, but that is just a resource management issue. TBH, this is not rocket science. If you need dynamic memory allocation, you had damn well better know how to use it properly.
It's an example of laziness and people ignoring the machine.
Another example is performance; saying that a language comes within a factor of 2 of C's performance and therefore is fast is absolutely ridiculous. a factor of 2 is huge.
You have to remember that people who write C are dealing with machine-specifics day-in, day-out. we're bit-fiddling and writing MPU code and drivers, etc.
Basically, we're much more aware of the machine than higher-level softies, so what would normally be UB is actually DB in most cases, it's defined by the compiler and hardware that we're intimately familiar with.
...and that isn't to say that you can't write high level abstracted code in C, the simplicity of the language lends itself to (properly) efficient implementation, not efficient in the sense of Java or Ruby ;o)
Then how do you avoid data races? Two shared references which can mutate your shared data require either a copy, a lock, immutability, or a single writer.
Trivially parallel algorithms do benefit from constructs like "parallel foreach" and implicit single writers, but in general, one either has to stick to those models (where the cognitive overhead is low but manageable) or if one ventures into more complex territory, one has to either deal with a higher mental complexity (ownership or locks), performance degradation (copying), or immutable data (if it fits your problem and doesn't decrease performance, win-win).
My argument is simply that GC doesn't fix everything, and the mental overhead of tracking ownership of memory (to me) isn't a huge burden, especially since I have to do it for non-memory resources and memory resources shared between threads already.
I'm not against a GC - I like languages that mix GC and non-GC side-by-side - because sometimes I do want to just forget about my memory, but only if it fits my problem domain. But I don't think GC beats non-GC hands-down for-all-cases.
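Of the four options listed above (copy, lock, immutability, single writer), the lock is the easiest to show in a few lines. A minimal sketch in Rust, with `parallel_count` as a hypothetical helper: several threads increment one shared counter, and the `Mutex` is what makes the shared mutation legal.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each thread adds `per_thread` increments to one shared counter.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // Arc shares ownership across threads; Mutex serializes the writes.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

Sharing a plain `&mut u64` between the threads instead would be rejected by the compiler, which is the "higher mental complexity" trade described above: the ownership bookkeeping is explicit, but it is checked.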
Targeting GPUs is a deal-breaker. I'm sure the Nim compiler would be pretty easy to retarget to GPUs via SPIR-V (the new binary IR for Vulkan/OpenCL shaders and kernels) or OpenCL/CUDA C. But I don't think that would work for Nim's runtime system or existing Nim libraries (including any standard libs it has).
Also Nim's pauseless low latency automatic memory management (I guess you can call it a "GC") is very interesting but it's not what I'm after.
> Also, I am not sure what you mean by "first class SIMD structures",
I mean this:
def multiply_and_add(a : <n x f32>, b : <n x f32>, c : <n x f32>) : <n x f32> {
return (a*b) + c;
// TODO: figure out how to use "madd" from FMA4 or NEON instruction set
}
The trivial piece of code above should be "generic" so that it can be called with any width of vector.
Now the example above is very trivial, but more complex examples might pose challenges for a correct implementation of the type checker. In particular, doing vector shuffles (i.e. the equivalent of __builtin_shufflevector in the GCC/Clang vector extensions) would need to have a strange type. Shader languages typically use a syntax like `myvector.wxzy`, which might work.
This might perhaps be possible with an ungodly mess of C++ templates and explicit template specialization for each vector type (and hoping that the compiler is aggressive enough in inlining). But I'm not really a fan of template-heavy C++.
In fact, the kind of solution I've been thinking about would be semantically similar to what I'd do with C++ templates.
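The width-generic part of `multiply_and_add` above can be sketched with const generics in Rust. This is only an approximation of the idea: a plain `[f32; N]` array stands in for a real vector type, and whether the loop becomes SIMD instructions is left to the optimizer, so it does not give the explicit, assembly-style control asked for here.

```rust
// One definition, usable at any lane count N known at compile time.
fn multiply_and_add<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        // Plain multiply then add, as in the example above (no fused madd).
        out[i] = a[i] * b[i] + c[i];
    }
    out
}

fn main() {
    // The same function instantiated for 4 lanes and for 8 lanes.
    let four = multiply_and_add([1.0, 2.0, 3.0, 4.0], [2.0; 4], [0.5; 4]);
    let eight = multiply_and_add([1.0f32; 8], [3.0; 8], [1.0; 8]);
    println!("{:?}", four);
    println!("{:?}", eight);
}
```

Passing arrays of different lengths in one call is a type error, which covers the simple part of the type-checking problem; shuffles with a result width that depends on the mask are the hard part and are not attempted here.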
> but you can definitely have a single definition for sin4f and sin8f if they are line by line equal except types, by using union types.
I'm not familiar enough with Nim's union types to be sure, but my guess is that this would not compile to efficient low level code apart from the most trivial of circumstances. This is my (not very) educated guess based on other high level languages with some concept of union types.
Anyway, Nim is a very cool language that I will check out sometime in the near future. It just isn't what I'm looking for in my very specific use case.
Thanks, I've read about it before, but haven't spent too much time looking at it.
However, this "single program, multiple data" isn't exactly what I'm looking for (it would solve the sin4f vs. sin8f issue mentioned above, though). I need explicit, low level access to SIMD, coupled with genericity over vector widths. This means doing almost assembly-style SIMD code with explicit shuffles, blending, etc as well as access to intrinsics where needed.
I also need portability (ispc is from Intel, it probably doesn't support ARM NEON) and targeting GPUs.
I'm very well aware that my needs are very specific. I need to do math stuff for 3d graphics and physics applications.
All I need is for a lot of free time to appear from out of nowhere and I can write a prototype compiler for this myself :)
ISPC actually has some preliminary support for targeting Nvidia PTX btw. It compiles using LLVM.
However, when using specific intrinsics they are for a specific width. It might take some "library code" to take advantage of some instructions like dot products, etc.
Perfect agreement.
This might be true in a deterministic setting, since the likelihood that a test suite will find the error is very high. But in a non-deterministic concurrent setting, throwing an exception that might be only caught once in a blue moon is just as bad as not doing anything about errors.
The fundamental problem still remains, that in neither case can I use the language's semantics to guide the design of my program.
> and as long as you have one instance of the exception you have a stack trace to start from
Oh, really? How am I supposed to find what the problem is in the general case, from just a single exception stack trace that I might or might not get?
If you're really serious about data integrity, ask yourself - does any language even attempt to describe data structures stored on disk?
The compiler does some incremental compilation for the generated object files, so compile times are kept relatively small. Other than that, I don't think it makes a big difference for a developer except saving compile times. On the other hand, in this way you can't have a compiled library conflict so you have to "clean and rebuild" when this happens.
For example in Ruby there's no such thing as pre-compiling a gem, and every time you run an app the whole app is "analyzed" (parsed and converted to VM instructions), and so you can think in Crystal it's the same (except that it's compiled to native code)
Just like every compiled, non-GC program gets. It's annoying af to fiddle with interpreter/VM "maximum heap sizes".
That's a bit like saying “the vast majority of the haystack doesn't contain any needles”.
> most people tend to stick to an 'easy' subset of syntax
I'm not sure I understand what you mean. Undefined behavior has nothing to do with syntax. It's strictly a semantic issue.
> It used to be that people used dynamic memory allocation to beat C with, but that is just a resource management issue. TBH, this is not rocket science.
If I understand correctly, the objection isn't that it's rocket science, but rather that you get little help from your tools if you do it wrong. Memory debuggers will only tell you about memory management bugs that manifest themselves in a particular program run. If a bug will only manifest itself under conditions that are hard to replicate, you're out of luck.
Of course, none of this is an indictment of manual memory management per se, or suggests that garbage collection is a universally good solution. But manual memory management has usability issues, which are fortunately being addressed in more modern language designs like Rust.
> If you need dynamic memory allocation, you had damn well better know how to use it properly.
Sure, but are better compile-time diagnostics too much to ask for? Notice that compile time diagnostics don't introduce any runtime performance penalty.
> Another example is performance; saying that a language comes within a factor of 2 of C's performance and therefore is fast is absolutely ridiculous. a factor of 2 is huge.
No disagreement here.
> so what would normally be UB is actually DB in most cases,
As far as I can tell, the trend among C and C++ compiler writers is to optimize programs very aggressively under the assumption that UB simply will never happen, rather than to turn UB into DB.
> its defined by the compiler and hardware that we're intimately familiar with.
Well, “works on this machine” isn't good enough for most of us.
I'm not implying that UB doesn't exist, simply saying that using C is a different mindset.
If you use C, you don't just use the language, you use the language, the toolchain and the machine, you're familiar with the whole stack, quite often down to the metal.
The point about memory management is that memory management is just one case of the general problem, i.e. resource management. Resource management is a skill you need to have if you're a softie, and making it easier in one specific case (RAM) is not a generic solution. Better that you learn how to do it properly, then apply that knowledge in all situations (files, RAM, power, etc). E.g. where is the GC-equivalent for power management? Or file handles? It's the same problem in a different domain.
Wholeheartedly agree. I'm aware that GC is no solution for this problem. But I'm not arguing in favor of GC - I'm arguing in favor of making manual resource management safer, for example, like Rust does. Resource management is every single bit as manual as in C - the only difference is that the compiler yells if you do it wrong.
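To make the "same problem in a different domain" point concrete, here is a small Rust sketch in which the resource is not memory at all. `Handle` and its `open_count` counter are hypothetical stand-ins for a file handle or a power-domain reference; release is tied to scope exit through `Drop`, and no garbage collector is involved.

```rust
use std::cell::Cell;

// Hypothetical resource that must be released exactly once.
// `open_count` stands in for whatever the OS or driver would track.
struct Handle<'a> {
    open_count: &'a Cell<u32>,
}

impl<'a> Handle<'a> {
    fn acquire(open_count: &'a Cell<u32>) -> Handle<'a> {
        open_count.set(open_count.get() + 1);
        Handle { open_count }
    }
}

impl Drop for Handle<'_> {
    // Runs when the handle goes out of scope.
    fn drop(&mut self) {
        self.open_count.set(self.open_count.get() - 1);
    }
}

fn main() {
    let open = Cell::new(0);
    {
        let _h = Handle::acquire(&open);
        println!("inside scope: {} open", open.get());
    } // `_h` is dropped here, releasing the resource
    println!("after scope: {} open", open.get());
}
```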
Even integer addition very easily leads to undefined behaviour.
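As a point of comparison, some languages make the overflow policy part of the operation instead of leaving it undefined. A short sketch using Rust's standard integer methods on `u16`; the two helper functions are hypothetical wrappers added only for illustration.

```rust
// Overflow is reported as `None` instead of being undefined.
fn add_two_checked(x: u16) -> Option<u16> {
    x.checked_add(2)
}

// Overflow wraps modulo 2^16, on every target.
fn add_two_wrapping(x: u16) -> u16 {
    x.wrapping_add(2)
}

fn main() {
    let x = u16::MAX;
    println!("{:?}", add_two_checked(x));
    println!("{}", add_two_wrapping(x));
    println!("{}", x.saturating_add(2)); // clamps at u16::MAX
}
```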
> It used to be that people used dynamic memory allocation to beat C with, but that is just a resource management issue. TBH, this is not rocket science. If you need dynamic memory allocation, you had damn well better know how to use it properly.
If you're going to solve a quadratic equation you should damn well know how to do it properly, by completing the square. But once you know that you should use the formula, because it makes it a lot easier. If you complete the square every time out of pride, you're just wasting everyone's time.
> A combination of testing the known edge cases, wraparound issues, size issues, static analysis and tooling means running into an example of UB is extremely rare in most cases.
Sure. You can do enough work to eliminate it. Or you can use a language where you don't need to.
> Its an example of laziness and people ignoring the machine.
Laziness is one of the cardinal virtues of a programmer
> Another example is performance; saying that a language comes within a factor of 2 of C's performance and therefore is fast is absolutely ridiculous. a factor of 2 is huge.
A factor of 2 is irrelevant most of the time. If you're growing exponentially, a factor of 2 will let you put off the point where you have to start scaling out by maybe a few months. If you're not growing exponentially, you probably won't hit performance limits at all.
> Basically, we're much more aware of the machine than higher-level softies, so what would normally be UB is actually DB in most cases, its defined by the compiler and hardware that we're intimately familiar with.
Until the compiler adds new optimizations. Sure, if you're never going to upgrade the compiler maybe you can get away with C.
> ...and that isn't to say that you can't write high level abstracted code in C, the simplicity of the language lends itself to (properly) efficient implementation, not efficient in the sense of Java or Ruby ;o)
Without native tagged unions you won't get far up the abstraction ladder. You can write your own with macros sure, but they won't interoperate with anyone else's or any libraries you'd want to use.
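For readers who have not used them, this is roughly what a native tagged union buys you, sketched in Rust. `Shape` and `area` are illustrative names: the tag and the payload travel together, and the compiler checks that every variant is handled.

```rust
// A tagged union: each variant carries its own payload.
#[derive(Debug, PartialEq)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    // The match must cover every variant; adding a new variant to `Shape`
    // makes this function fail to compile until it is handled.
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rect { width: 2.0, height: 3.0 },
    ];
    for shape in &shapes {
        println!("{:?} has area {}", shape, area(shape));
    }
}
```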
What if the error is difficult to reproduce in the first place? This isn't too uncommon in concurrent programming.
Of course I can look at the code. Trust me the reason why I introduced the bug wasn't because I was looking at something else at the moment.
> see why you can't prove it correct,
Realistically, this is because the language and the program's design conspire to make proving anything about the program an uphill battle. If the language could perform basic sanity checks (e.g., no attempting to use objects after ownership has been transferred to someone else), then at least I could have a fighting chance to manually prove more interesting properties.
> or work backward to see how the code could have gotten to such a state.
Doing this on a per case basis is an incredibly mind-numbing task.
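The sanity check mentioned above (no use of an object after its ownership has been transferred) is the kind of thing a compiler can enforce. A minimal sketch in Rust, with `consume` as a hypothetical function that takes ownership of a buffer:

```rust
// Takes ownership of the buffer; the caller can no longer use it afterwards.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
}

fn main() {
    let buf = vec![1u8, 2, 3];
    let n = consume(buf); // ownership of `buf` moves into `consume` here

    // Uncommenting the next line is a compile-time error (use of a moved value),
    // so the mistake never needs to be reproduced at run time:
    // println!("{:?}", buf);

    println!("{}", n);
}
```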