Node.js - A Giant Step Backwards(fenn.posterous.com) |
While evented I/O is great for a certain class of problems: building network servers that move bits around in memory and across network pipes at both ends of a logic sandwich, it is a totally asinine way to write most logic. I'd rather deal with threading's POTENTIAL shared mutable state bullshit than have to write every single piece of code that interacts with anything outside of my process in async form.
In node, you're only really saved from this if you don't have to talk to any other processes and you can keep all of your state in memory and never have to write it to disk.
Further, threads are still needed to scale out across cores. What the hell do these people plan on doing when CPUs are 32 or 64 core? Don't say fork(), because until there are cross-process heaps for V8 (aka never), that only works for problems that fit well into the message-passing model.
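For reference, the message-passing model mentioned above can be sketched with Node's `worker_threads` module, which was added to Node long after this thread was written. This is only an illustration: `workerSource` and `sumInWorker` are hypothetical names, and the worker shares no heap with the main thread, so only messages cross the boundary.

```javascript
const { Worker } = require('worker_threads');

// Source for the worker, passed as a string so the sketch stays in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.upTo; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Run the CPU-bound loop on another thread; only messages cross the boundary.
function sumInWorker(upTo) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { upTo } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

This fits problems where the input and output are small compared to the work done, which is the same limit the comment describes for fork().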
It won't work for every problem, of course.
dnode is a good way to easily talk to other node.js processes without the HTTP overhead. It can talk over HTTP too, with socket.io.
node-http-proxy is useful as a load balancer, and a load balancer can distribute work between cores.
Finally, most of the node.js people I've met, online and offline, are polyglots, and are happy to pick a good tool for a job. But right now node.js has great libraries for realtime apps, the ability to share code on the client and server in a simple way, and good UI DSLs like jade, less, and stylus.
I feel you about the polyglot and tend to agree, but I think some people are really trying to force awkward things into node, like people attempting to write big full-stack webapps using it.
To handle branching flow-control like 'if' statements, Twisted gives you the Deferred object[1], which is basically a data structure that represents what your call stack would look like in a synchronous environment. For example, his example would look something like this, with a hypothetical JS port:
d = asynchronousCache.get("id:3244"); // returns a Deferred
d.addCallback(function (result) {
    if (result == null) {
        return asynchronousDB.query("SELECT * from something WHERE id = 3244");
    } else {
        return result;
    }
});
d.addCallback(function (result) {
    // Do various stuff with myThing here
});
Not quite as elegant as the original synchronous version, but much tidier than banging raw callbacks together - and more composable. Deferred also has a .addErrback() method that corresponds to try/catch in synchronous code, so asynchronous error-handling is just as easy.
For the second issue raised, about asynchronous behaviour in loops, Twisted supplies the DeferredList - if you give it a list (an Array, in JS) of Deferreds, it will call your callback function when all of them have either produced a result or raised an exception - and give you the results in the same order as the original list you passed in.
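A rough analogue of both ideas, sketched with standard JavaScript Promises rather than a Twisted port. The `asynchronousCache` and `asynchronousDB` objects are hypothetical in-memory stand-ins, and `getThing` / `getMany` are illustrative names. Here `.then` plays the role of addCallback, `.catch` the role of addErrback, and `Promise.allSettled` is probably the closest counterpart to DeferredList, since it waits for every entry and keeps the input order.

```javascript
// Hypothetical stand-ins for an async cache and database (not real libraries).
const asynchronousCache = {
  store: new Map(),
  get(key) {
    return Promise.resolve(this.store.has(key) ? this.store.get(key) : null);
  },
};
const asynchronousDB = {
  query(sql) {
    return Promise.resolve({ id: 3244, source: 'db', sql });
  },
};

// Branching flow: returning a promise from a callback pauses the chain
// until that promise settles, much like returning a Deferred in Twisted.
function getThing() {
  return asynchronousCache
    .get('id:3244')
    .then((result) => {
      if (result == null) {
        return asynchronousDB.query('SELECT * FROM something WHERE id = 3244');
      }
      return result;
    })
    .catch((err) => {
      // Plays the role of addErrback / try-catch.
      console.error('lookup failed:', err);
      throw err;
    });
}

// Loop-style flow: wait for every lookup, successful or not, in input order.
function getMany(promises) {
  return Promise.allSettled(promises);
}
```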
It is a source of endless frustration to me that despite Twisted having an excellent abstraction for dealing with asynchronous control-flow (one that would be even better with JavaScript's ability to support multi-statement lambda functions), JavaScript frameworks generally continue to struggle along with raw callbacks. Even the frameworks that do support some kind of Deferred or Promise object generally miss some of the finer details. For example, jQuery's Deferred is inferior to Twisted's Deferred: http://article.gmane.org/gmane.comp.python.twisted/22891
[1]: http://twistedmatrix.com/documents/current/core/howto/defer....
The differences between your example and the common JavaScript practice for promises (when they're used; most of the time they aren't) are that then is used instead of addCallback and that chaining is available and taken advantage of.
getSomething("id:3244", function (err, theThing) {
    // one true code path
});
function getSomething(id, callback) {
    // look up the id that was passed in, not a hard-coded key
    var myThing = synchronousCache.get(id);
    if (myThing) {
        callback(null, myThing);
    } else {
        async(id, callback);
    }
}
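One caveat with the version above: on a cache hit the callback fires before getSomething returns, while on a miss it fires later. A minimal sketch that keeps the ordering consistent by deferring the hit with `process.nextTick`. The Map-based `cache` and the `loadFromDb` function are hypothetical stand-ins, not part of the original comment.

```javascript
// Hypothetical stand-ins: an in-memory cache and a slow lookup.
const cache = new Map();

function loadFromDb(id, callback) {
  setTimeout(() => callback(null, { id, source: 'db' }), 5);
}

function getSomething(id, callback) {
  const myThing = cache.get(id);
  if (myThing) {
    // Defer the hit so the callback never runs before getSomething returns.
    process.nextTick(callback, null, myThing);
  } else {
    loadFromDb(id, (err, thing) => {
      if (!err) cache.set(id, thing);
      callback(err, thing);
    });
  }
}
```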
A minor quibble with language style isn't exactly what I would call "A Giant Step Backwards".
I was under the impression that you could not do _anything_ synchronous? What if the call blocks for 100ms? Or 1000ms? Won't that delay all other clients and all other requests?
It also made for a better title than "Confusion, Then Indifference, Slowly Turning Into Understanding & Affinity"
I recently wrote a project that needs to do hundreds or thousands of possibly slow network requests per second. The first try was Ruby threads. That was a disaster (as I should have predicted). I had an entire 8-core server swamped and wasn't getting near the performance I needed.
The next try was node. I got it running and the performance was fantastic. A couple orders of magnitude faster than the Ruby solution and a tenth of the load on the box. But, all those callbacks just didn't sit right. Finding the source of an exception was a pain and control flow was tricky to get right. So, I started porting to other systems to try to find something better. I tried Java (Akka), EventMachine with/without fibers, and a couple others (not Erlang though).
I could never get anything else close to the performance of Node. They all had the same problems I have with Node (mainly that if something breaks, the entire app just hangs and you never know what happened), but they were way more complicated, _harder_ to debug, and slower.
I have a new appreciation for Node now. And now that I'm much more used to it, it's still difficult to do some of the more crazy async things, but I enjoy it a lot more. It's a bit of work, and you have to architect things carefully to avoid getting indented all the way to the 80-char margin on your editor, but you get a lot for that work.
Also the first example, the cache hitting and missing, could be rewritten with async, too.
async.waterfall([
    function (callback) {
        asynchronousCache.get("id:3244", callback);
    },
    function (myThing, callback) {
        if (myThing == null) {
            asynchronousDB.query("SELECT * from something WHERE id = 3244", callback);
        } else {
            callback(null, myThing); // first argument is the error slot
        }
    },
    function (myThing, callback) {
        // We now have a thing from the DB or cache, do something with result
        // ...
    }
]);
From a readability standpoint I'll take the "old" version any day:
function getFromDB(foo) {
    var result = asynchronousCache.get("id:3244");
    if ( null == result ) {
        result = asynchronousDB.query("SELECT * from something WHERE id = 3244");
    }
    return result;
}
x = db.getFutureResult("x");
y = db.getFutureResult("y");
whenFuturesReady([x,y], function(x, y) {
    useResults(x,y);
});
This looks reasonably similar to typical synchronous code,
x = db.getResult("x")
y = db.getResult("y")
useResults(x,y)
but it allows db queries to happen simultaneously and doesn't break the node paradigm.
http://gfxmonk.net/2010/07/04/defer-taming-asynchronous-java...
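In later versions of JavaScript the futures idea above maps fairly directly onto Promises with async/await. A minimal sketch, assuming a hypothetical `db` stub whose `getFutureResult` resolves after a short delay. Both queries are started before either is awaited, so they overlap.

```javascript
// Hypothetical database stub: each lookup resolves after a short delay.
const db = {
  getFutureResult(key) {
    return new Promise((resolve) => setTimeout(() => resolve(key.toUpperCase()), 10));
  },
};

function useResults(x, y) {
  return x + y;
}

async function run() {
  // Start both queries before waiting on either, so they overlap.
  const xFuture = db.getFutureResult('x');
  const yFuture = db.getFutureResult('y');
  const [x, y] = await Promise.all([xFuture, yFuture]);
  return useResults(x, y);
}
```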
There's a die-hard core of callback proponents (especially in twisted- and lately in node-land) who claim the pure callback-style is more predictable, robust and testable.
This is not my experience. I've been through that with twisted (heavily), some with EventMachine and some with node.js.
The range of use-cases where I'd benefit from that style was extremely narrow.
For most tasks it would turn into a tedium of keeping track of callbacks and errbacks, littering supposedly linear code-paths with a ridiculous number of branches, and constantly working against test-frameworks that well covered the easy 90% but then fell down on the interesting 10% (i.e. verifying the interaction between multiple requests or callback-paths).
I'm sticking to coroutines where possible now (eventlet/concurrence) and remain baffled over the node-crew's resistance against adding meaningful abstractions to the core.
I like javascript a lot (more so with coffee), but I see little benefit in dealing with the spaghetti when that doesn't even give me transparent multi-process or multi-machine scalability.
And to prevent the obligatory: Yes, I know about Step, dnode and the likes. They remain kludges as long as the default style (i.e. the way all libraries and higher level frameworks are written) is callback-bolognese.
I believe that JavaScript could become the dominant language on the server. We just need to have a set of consistent synchronous interfaces across the major server side JavaScript platforms. This would allow for innovation and code reuse higher up the stack.
I'm doing my bit by maintaining Common Node (https://github.com/olegp/common-node), which is a synchronous CommonJS compatibility layer for Node.js.
Wouldn't it be better to describe it as running serially, using non-blocking asynchronous function calls? Guess that doesn't really roll off the tongue, though.
https://github.com/scalien/scaliendb/blob/master/src/Framewo...
I guess the OP is saying inlining [in a language where this is even possible] leads to unreadable code, which sounds about right.
function handler(yes, no) {
    return function (err, data) {
        if (data) {
            yes(err, data);
        }
        else {
            no(err, data);
        }
    }
}
function get() {
    function done(err, data) {
        // do something with data
    }
    function db() {
        asynchronousDb.query("SELECT * from something where id = 3244", done);
    }
    asynchronousCache.get("id:3244", handler(done, db));
}
My experience (mostly in perl - EV, AnyEvent, etc.) is that combining events with finite state machines gives more structured code, with smaller functions that interact in a predefined manner.
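The events-plus-state-machine approach could look roughly like this in Node, using the built-in EventEmitter. The `transitions` table and the `Lookup` class are hypothetical, made up here for the cache-then-database example. The point is that each handler stays small and only the listed transitions are legal.

```javascript
const { EventEmitter } = require('events');

// Allowed transitions: state -> event -> next state.
const transitions = {
  idle: { lookup: 'checkingCache' },
  checkingCache: { hit: 'done', miss: 'queryingDb' },
  queryingDb: { row: 'done', error: 'failed' },
};

class Lookup extends EventEmitter {
  constructor() {
    super();
    this.state = 'idle';
  }

  // Move to the next state, or throw if the event is not valid here.
  fire(event, payload) {
    const next = (transitions[this.state] || {})[event];
    if (!next) {
      throw new Error(`event "${event}" not allowed in state "${this.state}"`);
    }
    this.state = next;
    // Listeners subscribe to state names, e.g. lookup.on('done', ...).
    this.emit(next, payload);
  }
}
```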
Meanwhile there are other choices that are about as easy, like Python libraries and Google's Go. Too bad they don't have the same zealous community support.
There is SpiderNode, not sure what the status of it is, but it replaces V8 in node.js with SpiderMonkey. SpiderMonkey already has yield and much other new JS syntactic sugar.
http://blog.zpao.com/post/4620873765/about-that-hybrid-v8mon...
Mentions they're working closely with the node team here. And that whole talk is about fixing up JavaScript into a modern language, remove the weird syntax quirks around classes, modules, etc. Say what you mean instead of the weird closure soup.
I had many of the same concerns with node.js. Every time I attempted to wrap my head around how I'd write the code I needed to write, it seemed like node was making it more complicated. Since I learned erlang several years ago, and first started thinking about parallel programming a couple decades ago, this seemed backwards to me. Why do event driven programming, when erlang is tried and true and battle tested?
The reason is, there isn't something like node.js for erlang, and so I set out to fix that.
For about a year I've been thinking about design, and for a couple months I've been implementing a new web application platform that I'm calling Nirvana. (Sorry if that sounds pretentious. It's my personal name- I've been storing up over a decade's worth of requirements for my "ideal" web framework.)
Nirvana is made up of an embarrassingly small amount of code. It allows you to build web apps and services in coffeescript (or javascript) and have them execute in parallel in erlang, without having to worry too much about the issues of parallel programming.
It makes use of some great open source projects (which do all the heavy lifting): Webmachine, erlang_js and Riak. I plan to ship it with some appropriate server side javascript and coffee script libraries built in.
Some advantages of this approach: (from my perspective)
1) Your code lives in Riak. This means rather than deploying your app to a fleet of servers, you push your changes to a database.
2) All of the I/O actions your code might do are handled in parallel. For instance, to render a page, you might need to pull several records from the database, and then based on them, generate a couple map/reduce queries, and then maybe process the results from the queries, and finally you want to render the results in a template. The record fetches happen in parallel automagically in erlang, as do the map/reduce queries, and components defined for your page (such as client js files, or css files you want to include) are fetched in parallel as well.
3) We've adopted Riak's "No Operations Department" approach to scalability. That is to say, every node of Nirvana is identical, running the same software stack. To add capacity, you simply spin up a new node. All of your applications are immediately ready to be hosted on that node, because they live in the database.
4) Caching is built in, you don't have to worry about it. It is pretty slick- or I think it will be pretty slick-- because Basho did all the heavy lifting already in Riak. We use a Riak in-memory backend, recently accessed data is stored in RAM on one of the nodes. This means each machine you add to your cluster increases the total amount of cache RAM available.
5) There's a rudimentary sessions system built in, and built in authentication and user accounts seem eminently doable, though not at first release. Also templating, though use any js you want if you don't like the default.
So, say, you're writing a blog. You write a couple handlers, one for reading an article, one for getting a list of articles and one for writing an article. You tie them to /, /blog/article-id, and /post. For each of these handlers, any session information is present in the context of your code.
To get the list of articles, you just run the query, format the results as you like with your template preference and emit the html. If it is a common query, you just set a "freshness" on it, and it will be cached for that long. (EG: IF you post new articles once a week, you could set the freshness to an hour and it would pull results from the cache, only doing the actual query once an hour.)
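The "freshness" idea can be illustrated in a few lines of JavaScript. This is not Nirvana's implementation (which is described as living in Erlang and Riak), just a sketch of the behaviour: `withFreshness` is a hypothetical helper that reuses a query result until it is older than the given freshness, and the `now` parameter is only there so the clock can be replaced in tests.

```javascript
// Wrap a query function so results are reused until they are older than
// freshnessMs. `now` is injectable so the behaviour can be tested without waiting.
function withFreshness(runQuery, freshnessMs, now = Date.now) {
  let cachedAt = -Infinity;
  let cachedValue;
  return function query() {
    if (now() - cachedAt >= freshnessMs) {
      // Stale or never fetched: run the real query and remember when.
      cachedValue = runQuery();
      cachedAt = now();
    }
    return cachedValue;
  };
}
```

With a freshness of one hour (3600000 ms), repeated calls within the hour return the cached list and the underlying query runs at most once per hour.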
To display a particular article, run a query for the article id from the URL (which is extracted for you) and, again this can be cached. For posting, you can check the session to see if the person is authorized, or the header (using cookies) and push the text into a new record, or update an existing record. Basically this is like most other frameworks, only your queries are handled in parallel.
The goal is to allow rapid development of apps, easy code re-use, and easy, built-in scalability, without having to think much about scalability, or have an ops department.
This is the very first time I've publicly talked about the project. I think that I'm doing something genuinely new, and genuinely worth doing, but it's possible I've overlooked something important, or otherwise embarrassed myself. I don't mean to hijack this thread, but felt that I needed to out my project sometime. A real announcement will come when I ship.
If you're interested in keeping up to date with the project I describe above, please follow me on twitter @NirvanaCore.
EDIT TO ADD: -- This uses Riak as the database with data persisted to disk in BitCask. The Caching is done by a parallel backend in Riak (Riak supports multiple simultaneous backends) which lives in RAM. So, the RAM works as a cache but the data is persisted to disk.
It's asynchronous, not actually parallel. Only a single CPU core will be used in node.js.
However, waiting asynchronous tasks will let other tasks run meanwhile, which can feel like parallelism.
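A small sketch of that interleaving, using only timers. The `task` helper and the `events` log are hypothetical names. Both tasks start immediately on the one JavaScript thread, and the shorter wait completes while the longer one is still pending.

```javascript
// Record the order in which things happen on Node's single JavaScript thread.
const events = [];

function task(name, delayMs) {
  events.push(`${name} start`);
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push(`${name} done`);
      resolve(name);
    }, delayMs);
  });
}

// Both tasks are started immediately; while "slow" waits, "fast" finishes.
const finished = Promise.all([task('slow', 30), task('fast', 5)]);
```

Nothing here runs in parallel: if either callback did heavy computation, the other would have to wait for it to return.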
I don't mean to be offensive, but welcome to at least the 1980s. We've known this doesn't scale for ages. The fact that you even tried it and thought it might be a viable solution just shows your education has failed you. I am highly biased against Node, I think it is a giant step backwards. Every blog post I have read that says the opposite admits they have no experience in anything else so they just default to Node being good. I only hope Node is a fad.
This holier-than-thou attitude is exactly the thing that prevents more people from becoming educated on these kind of subjects. Knowledge and experience on these kinds of subjects are _not_ trivial and are _not_ easy to obtain! Information about what scales, what does not, and why, are scattered all over the place and difficult to find. It may be very obvious to you after you already know it but it's really not. If, instead of spending so much time on declaring other people as dumb or uneducated, people would spend more time on educating other people, then the world would be much better off.
And on a side note "your education has failed you"? Seriously? You can't just preface something with "I don't mean to be offensive" and then say whatever you like. I don't mean to be offensive, but get yourself some social skills.
Ship it tomorrow! ;)
Yes, you have overlooked something important, there will be something to be embarrassed by -- whether it turns up next week or next decade -- and we'll all have a good laugh. Don't sweat it. And don't worry that the thing isn't finished; the kind of geeks who might sign on at this stage like unfinished things; that is why they can't resist reinventing the wheel. Plus, it doesn't have to be finished to give people ideas, which is half the point. You are ready to start spreading the news; your writeup says as much.
The public repository beckons!
(Frankly, this sounds like a great experiment, although I would never be too quick to predict the end of the ops department. ;)
I also shouldn't predict the end of the ops department, until I've had it running in production with a significant number of users.
I think it would be better to say- my goal is to have the ops department working on really interesting stuff, rather than shepherding a fleet of servers, every one of which has a different configuration.
I have some plans in this area, but I couldn't guess how to best fit into other people's workflows.
What might be nice is if there was a way to sync a git repository with Riak, and then Nirvana could just pull the relevant code from that. Seems like it would be the best solution, but looking into that- from looking at possibly integrating with a github API (do they have one?) to command line scripts is something I'm punting on to focus on the essentials.
But I do agree with your points!
That is not true, see: https://github.com/hookio/hook.io, been in development in Node.js for over two years.
By which, I meant to say "platform for building server applications in javascript, backed by the power of the erlang OTP platform."
Node.js gives server side javascript a platform, that's great. What I'm working on is giving server side coffeescript and javascript access to the erlang platform (and some really great erlang technologies.)
d.then(func1);
d.then(func2);
You can use: d.then(func1).then(func2);
In my own code, I tend not to use chaining because "methods returning self" is not a common idiom in Python (although tools like jQuery have given it currency in the JS world) and because I haven't yet figured out a way of formatting a multi-line method invocation that doesn't look messy.
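One thing worth checking before swapping between the two forms: with standard JavaScript Promises they are not equivalent, because `then` returns a new promise instead of the same object. Twisted's addCallback, by contrast, adds to the one Deferred's callback chain. A sketch with hypothetical `func1` and `func2`:

```javascript
const d = Promise.resolve(1);

const func1 = (n) => n + 1;
const func2 = (n) => n * 10;

// Two separate .then calls on the same promise: both see the original value.
const branchA = d.then(func1); // resolves to 2
const branchB = d.then(func2); // resolves to 10

// Chained: func2 receives whatever func1 returned.
const chained = d.then(func1).then(func2); // resolves to 20
```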
Also it's a reminder for Node.js developers to make good use of async patterns, lest their code look silly. Besides async, which Fenn mentions in the comments, there's EventEmitters and Backbone.js for doing different styles of async programming. And there are a few other libraries that are a lot like https://github.com/caolan/async .
Perhaps if your background is more on the lines of traditional server-side web development (ie. PHP, Django, Rails) then this is some new territory.
The difference is in what happened afterward. What I did then was to try to understand why Node.js did things differently and how I could accomplish my goals in an idiomatic way. I didn't try to shoe-horn in my existing mental framework for how things should work, and then throw my hands in the air when they didn't.
However, what I didn't do was immediately run to Alert the Internets about what "A Giant Step Backward" Node.js is.
Why?
But don't put words in my mouth, I didn't call anyone dumb, I said his education has failed him. This could be himself failing to properly research the problem space, it could be his school for not properly introducing him to the subject, it could be a whole host of things. I never argued he was incapable of learning (clearly he did). And I do spend a lot of time educating people, don't take a singular snapshot of a comment on HN as indication of how my entire life is spent.
I would even argue that the Ruby threading problems he's experiencing may not necessarily be because Ruby threads don't scale, but possibly because he's using them wrong or because he's not using the right version of Ruby. Ruby 1.8 uses select() to schedule I/O between threads, so the more threads and the more sockets you have, the slower things become because select() is linear time. The use of select() also results in a hard limit of about 1024 file descriptors per Ruby 1.8 process. Also, context switching in Ruby 1.8 requires copying the stack. Ruby 1.9 is much better in this regard since it uses native threads and no longer uses select() to schedule threads that are blocked on I/O. I'm running a multithreaded, multiprocess Ruby (1.8!) analytics daemon that generates 12 GB of data per day. It flies. VMWare CloudFoundry's router is written in Ruby + EventMachine. That thing has to process tons and tons of requests and they've found Ruby + EventMachine to be fast enough. To simply say "Ruby doesn't scale and is slow" is too simplistic, and ignoring the underlying more complex reasons would result in one bumping against the same problems in a different context. So no, it isn't so obvious from day 1 that using Ruby would be a problem.
Finally, I didn't say "ruby is too slow and doesn't scale", I pointed out that the various issues with doing things fast in Ruby have been known for a long time, even to someone only following Ruby. What I did say was that the basic approach the original commenter chose is known to not scale (which it didn't). This is a fundamentally different approach than the VMWare product you mentioned, which has chosen a solution similar to Twisted. That approach is known to scale far better than the original solution.
I'd love to read some well-thought-out arguments against node from people who've seriously given node a shot, but I haven't seen any. Granted, I haven't been looking for them, so please prove us wrong.
I wrote a comment on reddit that expands on my reasons more, although the second point is less of a problem if people use something like TameJS with Node (which I don't think most people are doing).
http://www.reddit.com/r/programming/comments/ilols/the_node_...
Furthermore C10K is not the complete picture. It describes only connection management, not what you actually do with the connection. The latter plays a non-trivial role in actual scalability.
If you're going to argue that those with knowledge need to distribute that knowledge better, that's fine. Knowledge can almost always be distributed better, perhaps someone could make a nice centralized website that has better information than highscalability.com. But at the same time you've just told me that a document that is a great introductory resource on scaling connection handling is not a "useful educational document". You may have better things to do with your time than read kernel source, but is your time so precious you can't do some google searches? Perhaps read an industrial white paper or academic paper on the subject of scalability? You can write all the software you want but if you're ignorant of how to overcome scalability problems are you accomplishing much? And if you're doing tests and learning about what scales but keeping it to yourself you are just as culpable of not educating people.
Also callbacks to me are kind of like parentheses in Lisp. They're annoying but they're for the greater good. :)
Is your problem with threads or shared mutable state? Web applications should be stateless and can be written as long request-response pipelines on top of a pool of actor threads, with the only shared state existing at either end of the pipeline, probably hidden by a framework anyway.
There are plenty of async idioms that make callbacks a breeze.
Would all the developers writing apps on node.js who are doing 10,000+ concurrent requests per process please stand up?
You obviously haven't written a lot of async code so not sure why you're so against the idea.
4000+ requests being sent every second to the server in total, the server is a single node process.
Also, I'd say I've written enough code on top of node.js to be qualified to comment on this. Here's some of it that's open source:
https://github.com/rbranson/glob-trie.js https://github.com/rbranson/twerk https://github.com/rbranson/node-ffi
At one stage I had the same train of thought that you have (http://chris6f.com/synchronous-nodejs / https://github.com/chriso/synchronous/blob/master/lib/protot...) - it would be nice to have the option of fibers, but it's not going to happen.
IMO async code isn't as difficult or ugly as you make it out to be. Is async code as easy to write and follow as sync code? No. Is it worth the benefits I've mentioned? For me, yes.
This sounds like a tacit admission that you're willing to deal with it because you don't think you have other options, but you do. There is at least one CPS compiler for Node (TameJS), and there are other languages that allow for the same result, but with more straightforward implementations of concurrent code (Erlang, Ocaml/LWT, Haskell). I'm not saying you should use those, but we can do better and we should, even if it's just compiling back to JS in the end.
The cost of hosting a webapp tends to be a rounding error in contrast to the cost of developing the webapp.
The additional benefit is that I can take the same program and handle 20,000+ concurrent users on two servers— which is when I suddenly become very glad that my hardware costs are significant compared to my dev costs.
That's a weird way to look at it, unless you're in the webapp hosting business? For everyone else there is usually only one webapp that they care about.
I can take the same program and handle 20,000+ concurrent users on two servers
Sorry to break it, but that's not how it works. Unless you have one of those rare webapps that never need to touch a database.
Anyway, what's good for the webapp hosting business is good for web developers, and what's good for web developers is good for the technical ecosystem in general (and then the world). Of course going from VPSes to EC2s was a significant improvement. But that isn't as good as it gets. EC2 rates were cheap already, but when Amazon started the free tier it represented a significantly lower barrier to entry. That's good for everyone.
And seriously, come on. This is a way of making programs run faster, and not a little faster, but a hundred times faster. It's the very definition of technological progress. It's absurd that we're here arguing about whether it matters or not.
Sorry, but if anything then that statement is absurd.
Faster than what? And where's that "hundred times faster" figure coming from?
It seems there's a bit of a misconception about the bottlenecks and cost structure in real world web applications.
Rails (aka the slowest web framework known to man) is popular because it trades hardware for development velocity. Hardware is cheap, developer salaries are not.
But node multiplies that, a lot. Which is nice, because you know it won't break or slow down if a bunch of people use it for some reason. And so you don't have to re-architect your system for a while longer, which is valuable time.
Yes. Rails is measured in hundreds per second. Node in thousands per second.
The point that you still seem to be missing is that the monetary amounts involved have normally turned into a rounding error long before you reach a traffic-volume where this difference becomes relevant.
Or, in other words, hosting a "webapp" already is nearly free in terms of hardware.