The Crystal Programming Language (crystal-lang.org)
With Crystal, at least when it matures a bit more, this hypothesis could be tested.
There are very logical reasons why dynamic typing at first appears better than static for Rubyists, that I think don't hold up as well after you scratch the surface:
* many Rubyists came from Java, and that kind of typing does slow you down. You need a modern type system with at least local type inference (Crystal seems to have global type inference)
* dynamic typing does actually help develop things more quickly in some cases, definitely in the case of small code bases for newer developers. A developer only has to think about runtime. With static typing a developer also must think about compile-time types, which takes time to master and integrate into their development. The relative payoff of preventing bugs grows exponentially as the complexity of the code base increases and at least linearly with size of the code base.
But it isn't really like Ruby with static typing. The language isn't Rubyish (no mixins or blocks, for example). It makes you put everything in a class, a la Java. It can be verbose and is occasionally downright clunky (though syntactically it's categorically slicker than Java). The .NET ecosystem doesn't have the Ruby characteristic of lots of small, fast-evolving libraries that are easy to use. In fact, the C# open source ecosystem is kinda poor in general and not a huge part of most developers' lives, whereas Ruby's ecosystem is vibrant and an integral part of its coding culture.
Another way to put all that is that if C# were purely dynamically typed, it wouldn't feel anything like Ruby.
I do see what you're saying: LINQ feels like a static (and lazy!) version of Ruby's Enumerable module, the lambdas look similar, C# actually does have optional dynamic typing, and C# is increasingly full of nice developer-friendly features. In general, I'm a fan. But switching between them doesn't feel like just a static/dynamic change.
I've had the complete opposite experience. I've found when it comes to libraries, C# has fewer, but higher quality than Java ones. I've also found C# to have a much more intuitive standard library. In C#, I can often just figure out how a standard library class works purely through the type system and the IDE, while in Java, I'd have to search through documentation more frequently.
Shows some of the "dynamic" goodness of C#.
That has always been my reaction to most comments about dynamically typed scripting languages, including Python and Ruby. Most of the time, turning compile-time type errors into runtime exceptions is not a feature.
Heck, look at all that FactoryFactoryFactory stuff you have to deal with when you want to swap out core parts of a framework - you end up with config files and XML and you have to make sure the guy who made the original framework designed it to allow you to change the part you want to change with your modular swap... in a dynamic language? Monkey patch. It's ugly, but it works.
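To make the monkey-patch point concrete, here is a minimal sketch in Ruby. The `Framework::Greeter` class and its methods are hypothetical names invented for illustration, not part of any real framework; the point is only that a class can be reopened and one method swapped without the original author having planned for it.

```ruby
# Hypothetical framework code that we do not control.
module Framework
  class Greeter
    def greet(name)
      "Hello, #{format_name(name)}"
    end

    # The framework author never exposed a hook for changing this.
    def format_name(name)
      name
    end
  end
end

# Application code: reopen the class and replace just the part we want.
module Framework
  class Greeter
    def format_name(name)
      name.upcase
    end
  end
end

greeting = Framework::Greeter.new.greet("ada")
puts greeting
```

Every existing and future `Greeter` instance picks up the new `format_name`, which is both the convenience and the danger being described.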
Heck, look at serialization. If you want to serialize/deserialize static objects, you need metadata that includes the types of everything - stuff like XSD in XML. Dynamic languages don't need that stuff, which is part of the modern popularity of JSON... Javascript and its buddies just play nicer with JSON. I actually wish there was a popular simplified analogue to XSD for Json because I actually miss the ease of serializing into objects that you get using XML/XSD in C# or JSON in Javascript.
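As a rough illustration of the schema-free round trip, here is a sketch using Ruby's standard `json` library. The payload and its field names are made up for the example; nothing is declared about the types of the fields before parsing.

```ruby
require "json"

# Made-up payload; no schema or type metadata accompanies it.
payload = '{"name": "Ada", "age": 36, "tags": ["x", "y"]}'

# Parsing yields plain hashes, arrays, strings and numbers.
data = JSON.parse(payload)
name = data["name"]
tags = data["tags"]

# Serializing back needs no type declarations either.
round_trip = JSON.parse(JSON.generate(data))
puts round_trip == data
```

The flip side is the "no net" problem: nothing here checks that `age` is actually a number until some code tries to use it as one.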
The dynamic-ish nature of exceptions, which seems like an unholy, abominable hole in the type system in static languages (or a source of unending agony in Java's checked exceptions), suddenly fits nicely into a dynamically typed language paradigm. Python embraces an "easier to ask forgiveness than permission" approach, throwing exceptions willy-nilly, and it makes for nice clean code.
Plus, working in a dynamically typed language heavily discourages premature optimization because you already threw performance out the window.
But yeah, you're basically working without a net, and that kinda sucks.
I don't think Java-style typing is that much of a hindrance. It's irritating boilerplate, but people using those languages can slam it out very quickly.
I don't think reasoning about runtime types is any easier than reasoning about compile-time types; it's in fact a higher cognitive load, because you cannot ignore it and rely on a type-checking phase that covers all your paths without explicit test cases.
I personally found Ruby to be productive[1] due to the expressive metaprogramming, how easy it is to make DSLs, blocks and yield for CPS, generators, and co-routines, and how everything is re-definable. I don't know how much dynamic typing factors into that, but I think if you could get the same things with equally expressive syntax, Rubyists would still like it.
[1] It used to be my favorite. I still like it (and love it for scripting), but prefer GADTs and pattern-matching on type constructors now.
You could test that now with http://rubyluwak.com/
RubyLuwak is statically typed with local type inference.
Exceptions include frequently reused code (libraries, components, frameworks) and well-specified problems (rewriting a known problem, implementing an algorithm, shuttle-like high-risk projects). Here, static types are also useful as documentation.
As Brooks said: plan to throw one away; you will, anyhow. i.e. code to understand, then to solve. You don't understand it well enough to have a correct theory the first time, and it's less feasible to rewrite a project from scratch the larger it is.
if some_condition
a = 1
else
a = 1.5
end
If I'm working in a compiled and typed language, the last thing I want is a language that automatically gives union types to variables. As far as I'm concerned, the type inference should fail at this point. In the above example, now I'm forcing the compiler to maintain a union which is going to have a pretty significant overhead on every computation the variable `a` is involved in.
For example, in Scala:
val x = if (some_condition) Employer else Employee
If Employer and Employee both derive from Person, then x will be of type Person. The run time uses dynamic dispatch to figure out how members are accessed from x.
If you want to constrain x, you need to specify the type explicitly.
In your example, the type of x is not (Employer \/ Employee), it is their shared superclass - Person. The analogous example would be if
val x = if (some_condition) Employer else Employee
succeeded even though Employer and Employee did not share a superclass. Very few languages use union types - Typed Racket comes to mind, and Algol apparently did too.
You can always have a static analyzer tool (that works, because Crystal is compiled) that can pin-point all the locations of union types. You can then put some type restrictions wherever you need them, to know where the union types come from.
The idea is that you can start prototyping something that works and is quite fast, and later you can always improve the performance without ever having to write C code.
Also, the union of an int and float will probably be just float so it won't have any performance overhead.
Hint: if I'm manipulating/interpolating a string, it's an eval, not a (lisp-y) macro.
That said, python is the most productive language I've used so far but type management is really just one part of that.
I respect Charles Oliver Nutter but Java is something I want less of in my life. This seems like a great alternative for people seeking performant Ruby interpreters.
Given the syntax looks similar, could it run Ruby source, unaltered?
# In a foo.cr file
fun init_foo = Init_foo : Void
puts "Init foo! :-)"
end
Compiling it: bin/crystal foo.cr -o foo.bundle
Then in Ruby: $ irb
irb(main):001:0> require "./foo"
Init foo! :-)
=> true
But more complex things don't work right now (because of the initial implementation of the GC, which needs to be turned off for this case).
So, yes, we are not that far from allowing you to write Ruby extensions in Crystal.
We'd also like to write Objective-C code that way, and also Erlang extensions.
I think it's an interesting project. But not being a Ruby focused coder these days, I can't see myself choosing this over other compiled languages at this point.
Garbage collection, inlining virtual calls etc.
That's a common misconception: what makes a language "faster" than another is not only more or less optimizations in the compiler or efficiencies in the runtime. There are language features that just kill performance. The typical culprits, in descending order of cost:
- dynamic typing
- dynamic dispatch (virtual methods)
- mandated bounds checking
- existence of an "eval" function
- mandated introspection
http://blog.headius.com/2013/05/on-languages-vms-optimizatio...
No, it's actually not even close. The things that make Ruby slower than C are not just "implementation issues".
There's bits of Ruby that aren't implemented (I believe some due to iOS, some due to difficulty?) but what am I missing about the "compile-to-C"?
However, I have never seen the problem in writing C extensions for Ruby when I am pressed for speed. Only complaint that I can think of is that my first extension was "problematic" in terms of not being able to find any resources, which made me resort to reading the source, which in turn made me a better Rubyist!
Mono always seems to be behind the M$ release of .NET and I'd rather not bother.
Try Haskell for static typing done right.
Maybe it's a matter of what you are accustomed to but not having explicit declared types gives me headaches. I find code without types horribly unreadable - because I can't see at the first glance what data a function processes, etc.
Also dynamic typing and its runtime type checking gives me that uncanny feeling of "something might be wrong but I won't find out till I hit it".
long sum(int *a, int len)
{
long ret = 0;
for (int i = 0; i < len; i++)
ret += a[i];
return ret;
}
The entire loop (the conditional test, the increment, and the `ret` update) could probably be implemented in less than 10 native instructions depending on your machine. Faaaaast. If Ruby were a compile-to-C language, I would expect it to produce C code that looked somewhat like this. So let's look at the same snippet in Ruby:
# sum the first len elements of a
def sum(a, len)
ret = i = 0
while i < len
ret += a[i]
i += 1
end
ret
end
(This is far from being idiomatic Ruby code, but this solution is the simplest and it also seems like it would be the easiest to directly translate to C.) Semantically, here's what that would translate to (in a C-like pseudocode):
RubyObject *sum(RubyObject *a, RubyObject *len)
{
RubyObject *ret = newRubyInteger(0);
RubyObject *i = newRubyInteger(0);
while (call(getMethod(i, "<"), len)) {
ret = call(getMethod(ret, "+"), call(getMethod(a, "[]"), i));
i = call(getMethod(i, "+"), newRubyInteger(1));
}
return ret;
}
Why is this so complicated? Because I've captured Ruby's dynamic typing and dynamic dispatch within the function itself. `ret` isn't a long, it's a Ruby variable that can hold any type of object, so we need to capture that in the source. Same with `i`, `a`, and `len`. When we say `a[i]`, we're not jumping to the `i`th element of the integer array `a`, which would be super fast. Instead, we have to dynamically dispatch the `[]` method, which will perform bounds checking and a bunch of type-checking. We also have to dynamically dispatch the `<` and `+` methods everywhere, which perform type-checking themselves. Obviously, this all takes much, much more than 10 native instructions.
You can't generally optimize out the method dispatches, since you are generally allowed in Ruby to redefine methods of built-in classes wherever you want. You might be able to perform some static analysis to get rid of some dynamic types, but you have to be careful with machine integers, since they overflow without warning. You'd have to check after every operation you do that the operation didn't overflow, and switch it out for a big integer if that happens. Any of these methods could raise exceptions and that's a nontrivial problem to deal with. The garbage collector is also running in the background.
And this is just a simple example, too. Things get a hell of a lot more complicated when you introduce blocks and dynamic scoping (which I purposefully stayed away from). So that should paint a somewhat clear picture of why it's not just an issue of waiting 10 years until Ruby gets as fast as C. I don't know how close it's even possible to get without messing with the semantics of the language.
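A small sketch of why those dispatches can't simply be compiled away: the same `sum` body from the Ruby version above has to work for every case below. The `Squares` class is a hypothetical stand-in invented for this example; everything else is plain Ruby.

```ruby
# sum the first len elements of a (same shape as the Ruby version above)
def sum(a, len)
  ret = i = 0
  while i < len
    ret += a[i]
    i += 1
  end
  ret
end

ints   = sum([1, 2, 3], 3)             # every + is Integer#+
floats = sum([1, 2.5], 2)              # Integer#+ hands back a Float
big    = sum([2**62, 2**62, 2**62], 3) # too large for a signed 64-bit long

# Hypothetical stand-in: anything that answers [] can be passed as `a`.
class Squares
  def [](i)
    i * i
  end
end

squares = sum(Squares.new, 4)          # 0 + 1 + 4 + 9

puts ints, floats, big, squares
```

A C `long sum(int *a, int len)` commits to one element type, one result width, and one meaning of `[]` up front; the Ruby method commits to none of them until each call happens.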
someList.FindAll(i => i < 2)
.Select(i => i * 2)
.GroupBy(i => (i % 2) == 0 ? "even" : "odd")
Is about as close to Ruby's Enumerable as I've found in a mainstream/enterprise language (unless you include Scala).
One really key difference with LINQ is that it doesn't produce arrays (or dictionaries, as in your example); it produces Enumerators, which you then have to call ToList() or ToDictionary() on. That laziness is actually an awesome feature and my favorite thing about LINQ, because it can massively improve performance by shortcutting work and not creating intermediate arrays. You can even work on infinite sequences with it. Besides performance, it's just tastier. It's so great I actually wrote a Ruby library to imitate it: https://github.com/icambron/lazer
One of the biggest performance issues I've seen with modern .NET code is people abusing LINQ and lambdas. Chaining functions like this is most decidedly not fast. I once wrote a library that had to do some heavy signal processing on large data sets, and since I wanted to ship the first version as soon as possible, I just used LINQ in a lot of functions to save time. It wasn't very performant, so later I rewrote most of the functions to use plain loops for iteration, hashmaps for caching and all sorts of improvements like that. I completely got rid of LINQ in that version, and for many functions the runtime went down from something like 500ms-1000ms to the microsecond range.
So sure, LINQ makes development fast and it's very nice to be able to write code such as .Skip(10).Take(50).Where(x => ...). On most web projects, it won't make a huge difference. I've seen Rails "developers" use ActiveRecord in such a way that they would create double and triple nested loops and then hit the database multiple times by using enumerable functions on ActiveRecord objects without realizing how this works, what's going on behind the curtains and so on. I've seen .NET devs do similar things using EntityFramework.
So yeah, it's convenient and all, but it can also be very dangerous when used by someone who doesn't understand the fundamentals behind these principles.
class Person
end
class Employer < Person
end
class Employee < Person
end
x = some_condition ? Employer.new : Employee.new
# x is a Person+
This is not said in the "happy birthday" article (or anywhere else, IIRC).
In the beginning we typed x as Employer | Employee. But, as the hierarchy grew bigger compile times became huge. Then we decided to let x be the lowest superclass of all the types in the union (and mark it with a "+", meaning: it's this class, or any subclass). This made compile times much faster, and in most (if not all) cases this is what you want when you assign different types under the same hierarchy to a variable.
What this does mean, though, is that the following won't compile:
# Yes, there are abstract classes in Crystal
abstract class Animal
end
class Dog < Animal
def talk
end
end
class Cat < Animal
def talk
end
end
class Mouse < Animal
end
x = foo ? Dog.new : Cat.new
x.talk # undefined method 'talk' for Mouse
That is, even though "x" is never assigned a Mouse, Crystal infers the type of "x" to be Animal+, so it really doesn't know which types are in and considers all cases.
Again, this is most of the time something good: if you introduce a new class in your hierarchy you probably want it to respond to some same methods as the other classes in the hierarchy.
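For contrast, a sketch of the same hierarchy in plain Ruby, where the missing method only shows up at runtime and only if a Mouse actually reaches the call. The `speak` helper and the return values are made up for the example.

```ruby
class Animal
end

class Dog < Animal
  def talk
    "Woof"
  end
end

class Cat < Animal
  def talk
    "Meow"
  end
end

class Mouse < Animal
end

# Hypothetical helper: report what happens when we ask an animal to talk.
def speak(animal)
  animal.talk
rescue NoMethodError
  :no_talk
end

dog_result   = speak(Dog.new)
mouse_result = speak(Mouse.new)
puts dog_result, mouse_result
```

Ruby is happy with this program until a Mouse shows up; the Animal+ inference described above rejects it before it ever runs.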
If you do:
a = [1, 'a', 1.5, "hello"]
you get Array(Int32 | Char | Float64 | String)
In a way, the Super+ type is a union type of all the subtypes of Super, including itself, but just with a shorter name.
No, it wouldn't; that's the really important point about LINQ I was, clumsily, trying to express above [1]. Take this admittedly totally contrived example:
someList
.Where(i => i % 2 == 0)
.Select(i => i + 7)
.Take(5)
This is not equivalent to a bunch of sequential loops. What it is is a bunch of nested Enumerators. Here's how it works. It gets the list's Enumerator, which is an interface that has a MoveNext() method and a Current property. In this case, MoveNext() just retrieves the next element of the list. Then the Where() call wraps that enumerator with another enumerator [2], but this time its implementation of MoveNext() calls the wrapped MoveNext() until it finds a number divisible by 2, and then sets its Current property to that. That enumerator is wrapped with one whose MoveNext() calls underlying.MoveNext() and sets Current to underlying.Current + 7. Take just stops after 5 underlying MoveNext() calls: from then on its own MoveNext() returns false.
So all that returns an enumerable, so as written above, it actually hasn't done any real work yet. It's just wrapped some stuff in some other stuff.
Once we walk the enumerable--either by putting a foreach around it or by calling ToList() on it--we start processing list elements. But they come through one at a time as these MoveNext() calls bring through the list items; think of them as working from the inside out, with each MoveNext() call asking for one item, however that layer of the onion has defined "one item". The item is pulled up through the chain, only "leaving" the original list when it's needed. The entire list is traversed at most once, and in our example, possibly far less: the Take(5) stops calling MoveNext() after it's received 5 values, so we stop processing the list after that happens. If someList were the list of natural numbers, we'd only read the first 10 values from the list.
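The same pull-based behavior can be sketched in Ruby with `Enumerator::Lazy`, which is Ruby's analogue rather than LINQ itself. The `naturals` source and the `pulled` counter are names made up for this example; the counter records how many elements actually leave the source.

```ruby
pulled = 0

# An endless source of natural numbers that counts every element it hands out.
naturals = Enumerator.new do |yielder|
  n = 1
  loop do
    pulled += 1
    yielder << n
    n += 1
  end
end

# Building the chain does no work yet; it only wraps enumerators.
chain = naturals.lazy.select(&:even?).map { |i| i + 7 }
pulled_before = pulled

# Walking the chain pulls items through one at a time and stops early.
result = chain.first(5)

puts result.inspect
puts pulled
```

Nothing is read while the chain is being built, and the infinite source is abandoned as soon as the fifth value has come through, so only the first 10 naturals are ever produced.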
Now, those nested Enumerator calls aren't completely free, but they're not bad either, and you certainly shouldn't be seeing a one second vs microseconds difference. If you craft the chain correctly, it's functionally equivalent to having all of the right short circuitry in the manual for-loop version, and obviously it's way nicer.
So why are you seeing such poor perf on your LINQ chains? Hard to say without looking at them, but a few pointers are: (1) Never call ToList() or ToDictionary() until the end of your chain. Or anything else that would prematurely "eat" the enumerable. (2) Order the chain so that filters that eliminate the most items go at the start of the chain, similar to how you'd put their equivalent if (...) continue; checks at the beginning of your loop body. (3) Just be cognizant of how LINQ chains actually work.
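A rough Ruby sketch of the filter-ordering pointer, again using `Enumerator::Lazy` as a stand-in for a LINQ chain. The `expensive` lambda and the `calls` counter are made up for the example; both chains produce the same values, but the costly step runs a very different number of times.

```ruby
calls = 0
# Stand-in for a costly per-item transformation; counts its invocations.
expensive = lambda do |i|
  calls += 1
  i * i
end

data = (1..1000).to_a

# Filter first: only items that survive the filter reach the costly step.
filter_first = data.lazy
                   .select { |i| i % 100 == 0 }
                   .map { |i| expensive.call(i) }
                   .to_a
calls_filter_first = calls

# Filter last: every item pays for the costly step before being discarded.
calls = 0
filter_last = data.lazy
                  .map { |i| expensive.call(i) }
                  .select { |sq| sq % 10_000 == 0 }
                  .to_a
calls_filter_last = calls

puts calls_filter_first, calls_filter_last
```

Laziness avoids the intermediate arrays in both versions, but it can't reorder the work for you: the chain still runs each item through the stages in the order they were written.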
[1] In the example in the parent, FindAll isn't actually a LINQ method, so there is one extra loop in there. Always use Where() if you're chaining; use FindAll() when you want a simple List -> List transformation.
[2] A detail elided here: each level actually returns an Enumerable and the layer wrapping it does a GetEnumerator() call on that.
The nice thing about Enumerable methods is that they can significantly speed up development and most projects won't suffer for it. However, for speed critical code it's probably not the best tool in the box.