The Broken Promises of MRI/REE/YARV (timetobleed.com)
The .NET CLR has the exact same problem (perhaps a harder one, since the CLR has a moving GC), so any time they touch GC references (pointers to objects that are collectible) it's always wrapped in an explicit GC stack frame (think GC struct that lives on the stack). Furthermore, all reads/writes are carefully done with macros (which of course expand to volatile + some other stuff) to make sure the compiler doesn't optimize them away.
On the one hand, this is nice because they don't need to scan the C stack (it scans the VM stack and the fake GC frame stacks -- well, it's one stack but you skip the native C frames); on the other hand, this means that any time a GC object is used in C code (ok, actually it's C++) they have to be real careful to guard it.
Of course bugs crop up all the time where an object gets collected when it shouldn't have been; it happens so often that there is a name for it -- "GC Hole".
Astute readers and users of p/invoke may remark that they don't have to set up any "GC frames" -- that is because this complicated scheme is not exposed outside of the CLR source. Regular users of .NET who want to marshal pointers between native/managed can simply request that a GC reference gets pinned, at which point I'm mostly sure it won't get collected until it's unpinned.
The bad news is I'm almost positive there is nothing you can do with just C here to make this problem go away. You'd want stuff to magically just happen under the hood, and C++ is the right way to go for that.
It's probably possible to create an RAII-style C++ GC smart pointer that would be 99% foolproof at the expense of some performance. It gets a little bit trickier if we are doing a moving collector. I am thinking it could ref/unref at creation/destruction, and disallow any direct raw pointer usage so you can't shoot yourself in the foot.
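To make the idea concrete, here is a rough sketch of such a smart pointer against a toy, non-moving heap. Everything in it (`ToyHeap`, `Object`, `Handle`, the pin counter) is made up for illustration and is not how any real VM does it; it only shows the ref-at-creation / unref-at-destruction shape.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical GC-managed object.
struct Object {
    int payload;
};

// Toy heap: collect() frees every object whose pin count is zero.
class ToyHeap {
public:
    Object* allocate(int payload) {
        auto obj = std::make_unique<Object>(Object{payload});
        Object* raw = obj.get();
        objects_.emplace(raw, Entry{std::move(obj), 0});
        return raw;
    }
    void pin(Object* o) { ++objects_.at(o).pins; }
    void unpin(Object* o) { --objects_.at(o).pins; }
    std::size_t live_count() const { return objects_.size(); }

    // Returns how many objects were freed.
    std::size_t collect() {
        std::size_t freed = 0;
        for (auto it = objects_.begin(); it != objects_.end();) {
            if (it->second.pins == 0) {
                it = objects_.erase(it);
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

private:
    struct Entry {
        std::unique_ptr<Object> obj;
        int pins;
    };
    std::unordered_map<Object*, Entry> objects_;
};

// RAII handle: pins on construction, unpins on destruction.
// Only operator-> is exposed, so the raw pointer never leaks out.
class Handle {
public:
    Handle(ToyHeap& heap, Object* obj) : heap_(heap), obj_(obj) { heap_.pin(obj_); }
    ~Handle() { heap_.unpin(obj_); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    Object* operator->() const { return obj_; }

private:
    ToyHeap& heap_;
    Object* obj_;
};

// The object survives a collection while a Handle exists and is freed after.
bool handle_keeps_object_alive() {
    ToyHeap heap;
    Object* raw = heap.allocate(42);
    bool survived = false;
    {
        Handle h(heap, raw);
        heap.collect();
        survived = heap.live_count() == 1 && h->payload == 42;
    }
    std::size_t freed = heap.collect();
    return survived && freed == 1 && heap.live_count() == 0;
}
```

A moving collector would need the handle to go through an indirection the GC can update, which this sketch does not attempt.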
Of course the people writing the GC still need to worry about this..
What makes it 1/ irreversible and 2/ bad for today's users?
EDIT: as well, I wouldn't stop using Ruby because of that; I would use JRuby or Rubinius or IronRuby (if I understand well, these ones are not affected?)
A plausible rewrite of that function in an XS for Ruby would leave the function declaration and wrapper code up to your equivalent of xsubpp, which would execute your DSL and transform the wrapped code into fully functional C. If you build a C-using extension from Perl, you'll find an XS file like http://cpansearch.perl.org/src/SIMON/Devel-Pointer-1.00/Poin... which during the `perl Makefile.PL && make` step is transformed via `xsubpp Pointer.xs > Pointer.c` and then compiled as normal C.
Shit! MRI/YARV/REE are inherently fatally flawed! All that code I have running in production must be a FIGMENT OF MY IMAGINATION! SAVE YOURSELVES
Yours in perpetual bogglement,
Lil' B
If Rubinius/IronRuby/JRuby have no such issues, this may become moot eventually, as Rubinius has been gaining a lot of traction recently and gets faster with each release, outperforming the standard Ruby VMs in many cases.
However, I would like to see Matz' response to the recommended steps for a fix at the end. Sounds like a reasonable goal to add for Ruby 2.0.
Note to self: Listening to Papoose while writing a technical blog post turns your otherwise important observations into a Chicken Littleish, end-of-the world rant.
I don't intend this to be an inflammatory question, I'm sort of a perpetual ruby novice, it's never been my day job and I've never managed to sort of catch up with the community, as soon as I feel pretty good with something I find it's been obsoleted a couple times. I like it but how does the community at large deal with stuff like this? This guy found a real bug and invested some time in it, do other rubyists just deal with crashes and restart their stuff? Do they just consider it part of "being on the cutting edge?" Or do they not even notice?
That's what makes the hyperbolic tone of this article so douchey; he wrote up an interesting dissection of an edge case issue as though it were an ongoing catastrophe, mostly just to inject a bunch of chest-thumping rock-star bravado that added nothing of value to the discussion.
This is the implementation of `select.epoll`. Some things you'll notice: there are no GC details (allocations outside the GC of C-level structs are handled nicely with a context manager), and we have a declarative (rather than imperative) mechanism for specifying argument parsing for Python-level methods. This ensures consistency in readability as well as error handling, etc.
That said, I like Handle, the RAII thing that V8 uses. It also allows for compacting collection. Too bad C doesn't do RAII.
[1] http://www.shafqatahmed.com/2008/05/memory-control.html
[2] http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index....
The faulty assumption seems to be that counting references only to RVALUEs (Ruby objects in the heap) is enough to determine whether a part of memory can be freed. This breaks down in C extensions, where macros extract some part of the object, or something pointed to by it, for use. In this case RSTRING_PTR extracts the C char array used by str for zstream_append_input to use (let's call it arr).
If zstream_append_input or any call underneath it tries to allocate a new Ruby object, the GC may get called and str (and thus arr) may get freed because there are no references left to it anymore (none on the heap, on the stack, or in a register, because the register value was overwritten).
And this seems to require all Ruby C-extension writers to lock the objects they're using through macros with RB_GC_GUARD.
Edit: note that there are no references left to str
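A rough sketch of what guarding looks like, with made-up names (`StrObj`, `consume`, `append_input`) standing in for the real Ruby types and functions. This is not the actual definition of RB_GC_GUARD; it only shows the general trick of keeping the object pointer in a volatile local and reading it again after the call, so the compiler has to keep the value in that variable across the call. Whether that variable ends up somewhere a conservative stack scan will see it is still up to the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a heap-allocated string object.
struct StrObj {
    const char* ptr;   // separately owned character buffer ("arr")
    std::size_t len;
};

// Hypothetical callee that only ever sees the raw buffer.
// In a real extension this is where an allocation could trigger the GC.
std::size_t consume(const char* data, std::size_t len) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] != '\0') {
            ++seen;
        }
    }
    return seen;
}

std::size_t append_input(StrObj* arg) {
    // volatile local: every read and write of str must actually be performed.
    StrObj* volatile str = arg;
    StrObj* s = str;
    std::size_t n = consume(s->ptr, s->len);
    // Volatile read after the call: str must still hold the object
    // pointer at this point, so it cannot be dropped before consume() returns.
    (void)str;
    return n;
}
```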
The Ruby C API is returning objects that are not correctly reference-counted for a short period of time and are incorrectly subject to GC.
This doesn't seem fatal to me, just not reasonably fixable from the GC side. It might be true that a new API is needed to hold refs on the C side.
Funktacularly yours,
Lil' B
BTW the CLR is not a good alternative runtime for Ruby, might not ever be: http://www.zdnet.com/blog/microsoft/whats-next-for-microsoft...
You did good work here -- don't hurt your credibility with overstatement.
"Volatile" is the wrong fix, by the way. That's just depending on yet another non-required behavior. There is in fact no further reference to "str" between the function call and the reassignment at the start of the next iteration, so there's nothing for "volatile" to chew on. This particular version of this particular compiler just happens to add an extra pair of stack operations in this case, but it's not truly required to. A real fix would not only mark the variable as volatile but also add a reference after the function call. The same "(void)str;" type of statement that's often used to suppress "unused argument/variable" warnings should count as a reference to force correct behavior here.
Perhaps the GC could be modified to track pointers not just to the head of an object but to any address within it. Alternatively, C coders working with Ruby could just say "I'm using this GC object" before calling C code.
I don't see this as a fatal flaw at all. Sounds like it's just a bug. Now if, as many here assert, this bug is present all over the Ruby VM, then that's pretty unfortunate. Is that the case, or just hyperbole?
This certainly isn't an awesome solution but couldn't the GC backtrace(3) the current process and look at %eax at all C stack frames to additionally include that value in the "pointers currently plausibly in flight" list?
The correct way to handle this is to add the object reference to the GC's "root" set while you're using its guts, and removing it again when you're done.
Another possible solution is to allocate the string object and its character representation in one chunk of memory. This only works for immutable strings which never share substructure, though. The reason this works is that most conservative GCs will consider objects live as long as there is a pointer pointing to somewhere within a chunk of memory, not necessarily at the beginning.
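For illustration, the interior-pointer rule looks roughly like this in a toy conservative marker. The names (`Chunk`, `find_chunk`, `mark_from_words`, `kBlockSize`) are invented for the sketch, and a real collector would use a much faster lookup than a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one allocated chunk.
struct Chunk {
    std::uintptr_t start;
    std::size_t size;
    bool marked;
};

// Returns the chunk that `word` points into (head or interior), else nullptr.
Chunk* find_chunk(std::vector<Chunk>& chunks, std::uintptr_t word) {
    for (Chunk& c : chunks) {
        if (word >= c.start && word - c.start < c.size) {
            return &c;
        }
    }
    return nullptr;
}

// Treat every candidate word as a possible pointer and mark what it hits.
void mark_from_words(std::vector<Chunk>& chunks,
                     const std::vector<std::uintptr_t>& words) {
    for (std::uintptr_t w : words) {
        if (Chunk* c = find_chunk(chunks, w)) {
            c->marked = true;
        }
    }
}

// One block holding a header followed by the characters.
constexpr std::size_t kBlockSize = 24;

// Does a word at base + offset keep the block alive?
bool word_marks_block(std::size_t offset) {
    std::vector<char> block(kBlockSize);
    auto base = reinterpret_cast<std::uintptr_t>(block.data());
    std::vector<Chunk> chunks{{base, block.size(), false}};
    std::uintptr_t word = base + offset;
    mark_from_words(chunks, {word});
    return chunks[0].marked;
}
```

With the header and characters in one block, a pointer to the characters (say offset 8) marks the block just like a pointer to its head, while a word one past the end does not.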
[1] note: I'm not a Ruby coder but I fixed a very similar problem in a Lua implementation about 4 years ago. That one wasn't even conservative GC. EDIT: I told the story of that bug on HN 3 years (!) ago http://news.ycombinator.com/item?id=217189
[2] worse, it probably doesn't blow up immediately and instead causes memory corruption.
Both problems are hard and the current state of affairs is apparently some random amount of the time we'll get memory corruption bugs.
Just figuring this out is a non-trivial project.
Obviously all non-trivial code working in production not only can have bugs, but will have bugs. Just as obviously, no reasonable person would consider those "fatal flaws" for any reasonable definition of the word fatal.
MRI/YARV's Conservative GC opens up some bedevilling classes of bugs for gem writers, obviously. Calling that a "fatal flaw" when millions of lines of production code continue to function despite its presence is nothing but over-the-top hyperbole.
(EDIT: I guess people don't like unpopular views at all, that's fine, long live jokes, forget the facts.)
I'm not sure if those are both related to this or not, but I've had drastically more segfaults lately than in my past 6 years of ruby programming. It's getting pretty bad imo.
I know I can't run typhoeus + thin on 1.9.2 on OSX as it reliably crashes every ten minutes and I have no clue on how to debug it, but it is not a problem with the interpreter, it's a problem with external libraries.
The analysis was good, but the tone was ludicrous. It sounds far too much like: "Hey! everyone in the world should abandon MRI because of a bug I found!! That's right, me!!"
The question really is: how much data corruption is occurring that _does not_ cause world ending segfaults? THAT is what you need to worry about. Check yourself before you wreck yourself.
And yo check this: maybe this "fatal flaw" is actually just an edge case bug that isn't cropping up much in practice. Fo'shizzle!
And maybe we can drop the ridiculously asinine slang and douchey bravado, "bro".
imma talk the way i talk and dont give a fuck if you like it or not.
This issue is no doubt a pain in the ass for gem authors to debug, but it's definitely not something that library users are running into with any sort of frequency.