Helix: Rust and Ruby, Without the Glue (blog.skylight.io)
I think this could help Ruby regain some of its early excitement, and remove the negativity of its 'just a scripting language', 'only for rails, which is too slow for modern development anyway' image.
I would pay a decent chunk of money for a cleanly integrated MRI implementation inside of Rust, but I also will not be holding my breath for it.
Sounds extremely exciting!
Typos in the article: "slimed down", "you code could"
Is "needle_length = needle.length" actually necessary? I thought repeated calls would be zero cost, but I'm guessing I'm wrong.
Conceptually, a Set will do much better even with this sort of optimization.
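A rough sketch of the Set suggestion, assuming the task is "are all of these items present in that other collection". The function names and the `u32` item type are hypothetical and not taken from the article:

```rust
use std::collections::HashSet;

/// Linear-scan version: every needle walks the haystack, O(n * m) worst case.
fn contains_all_scan(haystack: &[u32], needles: &[u32]) -> bool {
    needles.iter().all(|n| haystack.contains(n))
}

/// Set-based version: build the set once, then each lookup is O(1) on average.
fn contains_all_set(haystack: &[u32], needles: &[u32]) -> bool {
    let set: HashSet<u32> = haystack.iter().copied().collect();
    needles.iter().all(|n| set.contains(n))
}
```

Both functions return the same answer; only the cost of each lookup changes, which is why caching the needle length matters far less than the choice of data structure.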
And isn't there some common C lib that exposes Unicode functions like is_whitespace? Granted, using a cargo crate is easier than finding and adding a .h, and far easier than getting and linking another lib.
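For what it's worth, Rust's standard library already exposes a Unicode-aware `char::is_whitespace`, so a blank check needs no extra crate. A minimal sketch (the name `is_blank` is hypothetical, and this is not necessarily how the article implements it):

```rust
/// Returns true if the string is empty or consists only of Unicode whitespace.
fn is_blank(s: &str) -> bool {
    // `all` returns true for an empty iterator, so "" counts as blank.
    s.chars().all(char::is_whitespace)
}
```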
In this case, the coercion needs to ask Ruby for the encoding tag and ask Ruby to validate the encoding (which it does often enough that it's often cached), but after that we can safely coerce directly into a UTF8 string.
If we wanted to support other encodings, we could fall back to using Ruby's transcoding support (string.encode("UTF8")) and again, once someone does the work once it'll work for all helix users.
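The shape of that check can be sketched in plain Rust. Everything here is an assumption for illustration: `EncodingTag`, `CoerceError`, and `coerce_to_str` are hypothetical names rather than Helix's API, and the validation is done with `std::str::from_utf8` instead of asking Ruby:

```rust
/// Hypothetical stand-in for the encoding tag Ruby reports for a string.
#[derive(Debug, PartialEq)]
enum EncodingTag {
    Utf8,
    Other,
}

#[derive(Debug, PartialEq)]
enum CoerceError {
    UnsupportedEncoding,
    InvalidUtf8,
}

/// Check the tag first, then validate the bytes before handing out a `&str`.
fn coerce_to_str(tag: EncodingTag, bytes: &[u8]) -> Result<&str, CoerceError> {
    if tag != EncodingTag::Utf8 {
        // A real implementation could fall back to transcoding here.
        return Err(CoerceError::UnsupportedEncoding);
    }
    std::str::from_utf8(bytes).map_err(|_| CoerceError::InvalidUtf8)
}
```

The point is that no `&str` is ever produced from bytes that have not passed both checks.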
I wasn't present for the talk, just saw his slides.
Checking input, however, is a whole 'nother ballgame.
Anyway, I think the same could be applied to Ruby Core as well. I have been calling for a Rusted Ruby for a long time.
Though I am not sure if this is a good thing for other Ruby implementations like JRuby.
Rust, as far as I know (which is not at all, honestly), doesn't interface with Java that well, so for JVM projects, it's nice to have a familiar language like Ruby that can tie in and make prototyping way easier.
I think people misunderstand how awesome JRuby is, because Ruby has never performed that well and JRuby has performed very well. Performance is not the only reason to bother implementing a language. There's also the issue of mindshare, where a large group of people might already know Ruby, and it'd be easier in those scenarios to just give them JRuby and let them go to town than try and drag them through learning Java.
Rusty Ruby will do the same thing for Rust, I think. Rust is a very intricate, well-thought-out language, and I think it would benefit a lot from playing off the shared knowledge of thousands of Ruby users.
Besides that, I really don't get why Perforce is renaming their version control system to Helix. Perforce is a well-known name; it seems weird to just drop a strong brand.
The article talked about 30 minutes to run through a "is this set of items in this other set of a bunch of items".
While it would probably make sense to do this efficiently in memory the reality is the company mentioned are probably doing it inefficiently in memory; asking some tool which can do this type of work effectively and is fairly well known was my basic suggestion.
Once they port all Rails libraries to Rust, you can still use the extremely-ergonomic Ruby programming language, but it won't be "slow as hell" anymore. And ruby performance is "tolerable in most circumstances" as-is.
I certainly agree that it could be awesome to have Rust on Rails for those who would find even Rails on Rust performance intolerable, and Iron's abstractions inadequate.
Rust provides a safer option - which is worthwhile. There have been several examples of gems with memory issues. It doesn't change anything, unless this is a general "Ruby is too slow" sort of rant.
Helix is set up to do the right thing – it already goes through a coercion protocol, so we can easily add the encoding check there. We just missed that detail when porting the code, will fix it soon.
I suppose that echoes my point about how systems programming is hard to get right; there are just too many details you have to remember!
This is why having a shared solution like Helix is beneficial. By moving all the unsafe code into a common library, it's more likely that someone will notice the problem and fix it for everyone.
This actually touches on an interesting point I would like to elaborate on. When we say {Helix/Rust/Ruby} is safe, there is an important caveat – {Helix/Rust/Ruby} themselves could of course have bugs. I have definitely experienced segfaults on Ruby myself.
While true, this caveat is not particularly interesting. It is not a sleight of hand. Moving code around doesn't magically remove human errors; that's not the point. It's about establishing clear boundaries for responsibility. (This is why unsafe blocks in Rust are great.)
When you get a segfault on Ruby, you know for certain that your code is not the problem. Sure, you might be doing something weird, but it is part of the contract that the VM is not supposed to crash no matter what you do. As a result, memory safety is just not a thing you have to constantly worry about when programming in Ruby.
It is the same thing as saying JavaScript code on a website "cannot" crash the browser, segfaults in user-space code "cannot" cause a kernel panic or malicious code "cannot" fry your chip. All of these could of course (and do) happen – but from the programmer's perspective, you can work with the assumption that they are not going to happen (and when they do, it's someone else's fault). It's not "cannot" in the "mathematically proven" sense, but it's just a useful abstraction boundary.
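As a small illustration of the boundary idea (a hypothetical function, not code from Helix): the `unsafe` block marks exactly where the responsibility sits, and callers of the safe wrapper cannot trigger undefined behavior through it.

```rust
/// Safe wrapper: the bounds check lives here, so callers never need `unsafe`.
fn first_byte(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so index 0 is in bounds.
    Some(unsafe { *data.get_unchecked(0) })
}
```

If this function ever misbehaves, the bug is in the few lines around the `unsafe` block and not in any of its callers.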
<form accept-charset="UTF-8">
so these days, the non-UTF-8 usage in Rails apps should be pretty tiny, I would think? It'd be stuff coming from outside of forms.
It's their own words.
Not to start a flame war, but in my experience Ruby 1.9 is already quite good; later versions of Ruby only introduce minor syntax & semantic changes, which are trivial to work around. This is nothing like the big differences between 1.8 and 1.9.
I agree with package availability problem of mruby, tho.
Encoding is purely an artifact of I/O if your language has a character type that can represent all possible characters you might want to read or write.
Rust's strings are almost this; if there were no way to get a string's raw representation, nor perform bytewise slices, then how the string was stored in RAM would be an implementation detail rather than part of the public API. Rust, being a systems language, probably does need to specify this so that it doesn't incur encode/decode overhead when dealing with foreign code that can understand utf-8.
For example if we have a few tables we can determine which menu items a user could eat:
users
user_ingredient_exclusions
ingredients
menu_ingredients
menu
-- items a user can eat: menu items with no ingredient on the user's exclusion list
select m.*
from menu m
where not exists (
  select 1
  from menu_ingredients mi
  inner join user_ingredient_exclusions e on e.ingredient_id = mi.ingredient_id
  where mi.menu_id = m.id
  and e.user_id = @id
)
I did not know that strings in Ruby have encodings. Is there a reason for that? I personally don't like mixing characters and opaque byte sequences as they are very different.
The representation of a Rust String in memory is guaranteed valid UTF-8. To me, a "sequence of Unicode scalar values" is an abstract description, because it could be implemented via UTF-8, UTF-16 or UTF-32.
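To make the distinction concrete, here is a tiny sketch (the helper name `lengths` is hypothetical) that contrasts the UTF-8 storage size with the number of Unicode scalar values:

```rust
/// Returns (length in UTF-8 bytes, number of Unicode scalar values).
fn lengths(s: &str) -> (usize, usize) {
    // `len` is the size of the UTF-8 storage; `chars` iterates scalar values.
    (s.len(), s.chars().count())
}
```

For pure ASCII the two numbers agree; for anything else the byte length is larger, which is the storage detail that leaks through the API.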
> I personally find it unfortunate that they dictate the storage of it at the API level
It is extraordinarily convenient and provides a very transparent way to analyze the performance of string operations.
For transcoding, there is the in-progress `encoding` crate: https://github.com/lifthrasiir/rust-encoding
I note that Go does things very similarly (`string` is conventionally UTF-8) and it works famously for them. They have a much more mature set of encoding libraries, but they work the same as the equivalent libraries would work in Rust: transcode to and from UTF-8 at the boundaries. See: https://godoc.org/golang.org/x/text
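"Transcode at the boundaries" can be sketched with the standard library alone for one easy case, ISO-8859-1 (Latin-1), whose bytes map one-to-one onto U+0000..=U+00FF. The function names are hypothetical, and this does not use the `encoding` crate's API:

```rust
/// Decode Latin-1 bytes into a UTF-8 `String` on the way in.
fn latin1_to_string(bytes: &[u8]) -> String {
    // Each Latin-1 byte is the Unicode scalar value with the same number.
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Encode back to Latin-1 on the way out; None if a char does not fit.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect()
}
```

Everything between those two calls can then assume UTF-8 and never look at an encoding tag.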
[edit] I remember a talk where Matz was asked this specific question and tried to explain it clearly, but seemed confused as to how the questioner could have such a poor grasp of Unicode (the difference between monolingual Americans and Japanese, I guess).
On the other hand, if you just track the encoding in your string type, then you don't have to pay a conversion cost at the boundary, but each encoding will have different memory-usage and performance characteristics.