Automatic Flushing: The Rails 3.1 Plan (yehudakatz.com)
Currently, when an exception occurs, the system can simply change the response (the response hasn’t been sent to the client yet; it is only buffered inside the system). With this approach, a response can be in any of x different states: before flushing, after the 1st flush, … and after the xth flush. And after the 1st flush, the status, the headers and some content have already been sent to the client.
Imagine that something raises an exception after the 1st flush. A 200 status has already been sent, together with some headers and some content. First of all, the system has to make sure the HTML is valid and at least give the user some feedback. That’s not impossible, but it is still quite a hard problem (because ERB doesn’t give us any hint of where tags are opened or closed). The system also needs to handle all x states and return correct HTML in each of them.
Another issue is that we’re actually sending an error page with a 200 status. This means that the response is cacheable under whatever caching rules you decided on earlier in the controller (before you knew that an error would occur). Suddenly you have your 500.html cached all over the place: on the client side, in your reverse proxy and everywhere else.
Let’s not forget that exceptions don’t always render the error page, but do other things as well. For instance, sometimes an exception is raised to tell the system that the user needs to be authenticated or doesn’t have permission to do something. These are often implemented as Rack middlewares, but with automatic flushing they also need to handle each of the x states. If a middleware needs to redirect the user, for instance, it can’t change the status/headers to a 302/Location once the response is in the 1st state, and therefore has to inject a <script>window.location='foo'</script> into a cacheable 200 response.
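To make the "already committed" problem concrete, here is a minimal sketch in plain Ruby. TinyServer and FailingBody are hypothetical stand-ins (not Rack or Rails classes), and the '/error' redirect target is made up; the only assumption is that the server writes each chunk to the wire as soon as the body yields it.

```ruby
# Hypothetical stand-in for a server that writes each chunk to the socket
# as soon as the body yields it. This is not a real Rack server.
class TinyServer
  attr_reader :wire

  def initialize
    @wire = [] # everything "sent" to the client, in order
  end

  def serve(status, headers, body)
    # Status and headers go out before the first body chunk.
    @wire << "HTTP/1.1 #{status}"
    headers.each { |name, value| @wire << "#{name}: #{value}" }
    body.each { |chunk| @wire << chunk }
  rescue StandardError
    # Too late for a 500 or a 302: the status line is already on the wire.
    # All that is left is appending more body to the 200 response.
    @wire << "<script>window.location='/error'</script>"
  end
end

# A body that flushes the start of the layout, then fails while rendering.
class FailingBody
  def each
    yield "<html><head><title>Posts</title></head><body>"
    raise "database went away"
  end
end

server = TinyServer.new
server.serve(200, { "Cache-Control" => "public, max-age=3600" }, FailingBody.new)
puts server.wire
```

The status line and the Cache-Control header stay exactly as the controller set them, so whatever gets appended after the exception is still served as a cacheable 200.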
Of course, the views shouldn’t really raise any exceptions, because they should be dumb. However, in Rails it’s very common to defer the expensive method calls to the view. The controller sets everything up, but the work isn’t actually done until the view is rendered. This increases the possibility that an exception is raised in the rendering phase.
Maybe I’m just not smart enough, but I just can’t come up with a way to tackle all of these problems (completely automated) without requiring any changes in the app.
Anything that might happen while actually rendering a view is a concern here (and as you get more lazy that could be quite a lot), but you'd normally sort out auth before actually rendering/flushing anything.
However, my point is that in order to take advantage of flushing, you want to start sending the HTML as soon as possible and this forces you to decide the status and the headers. If there's something in your stack which requires different status/header, you either need to evaluate it earlier in the request or hack around it by appending different HTML.
The more you decide to evaluate earlier, the less efficient the flushing becomes. So for every piece in the stack you have to make a choice: What is the chance that this requires a different status/headers/content? How much do we gain by deferring? How can we hack around it if we've already started sending a response?
This is something you can't do automatically, and as far as I can see, this isn't mentioned in Yehuda's post at all.
- Allowing users to flush manually (people screw this up real bad)
- Changing the rack spec (allowing for #each on the body to be lazily yielded, and terminating on nil or the like)
- Moving to an always async stack (totally kills most users)
Yes, there are plenty of issues with this, and I agree with your concern, but it is also something which can have a marked effect on performance for users. It's also worth noting that a well componentised partial can render an error in-place of the partial itself, for example, rendering a page that contains the whole layout, and a single red box of errors (say a render of the _new partial can be added to the buffer after a _create fails, instead of rendering the success box). Yes, that requires some refactoring of the application (rather than using for example, the standard 302 approach).
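A rough sketch of the "error box in place of the partial" idea, using only ERB from the standard library. render_partial_safely is a hypothetical helper, not a Rails API, and the templates are invented for illustration.

```ruby
require "erb"

# Hypothetical helper: render one partial and, if it raises, put an error
# box in its place so the surrounding markup stays well-formed.
def render_partial_safely(name, template, locals = {})
  ERB.new(template).result_with_hash(locals)
rescue StandardError => e
  message = ERB::Util.html_escape(e.message)
  %(<div class="error">Could not render #{name}: #{message}</div>)
end

post_template = "<p><%= post.fetch(:title) %></p>"

# Each chunk could be flushed as soon as it is produced.
chunks = []
chunks << "<html><body><h1>Posts</h1>"
chunks << render_partial_safely("post", post_template, post: { title: "Hello" })
chunks << render_partial_safely("post", post_template, post: {}) # fetch raises KeyError
chunks << "</body></html>"

puts chunks
```

This only keeps the page well-formed when a partial is a self-contained unit of markup; it does nothing about the status code that has already been sent.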
It's also worth noting that a larger class of applications that would find this actually useful should generally have reasonable test coverage and code maturity. Whilst this isn't always the case, we also don't protect users from eval, and other evil tools, in ruby or rails.
For Rails 3.1, we wanted a mostly-compatible solution with the same programmer benefits as the existing model, but with all the benefits of automatic flushing
And from there he goes on with very specific implementation details, and the only caveat is some API change. This gives the impression that this is something you can easily enable for any app.
I just want to point out that 100% automatic flushing is pretty much impossible with the current state of Rack/Rails, and there's still plenty of work before there's anything near flushing support in Rails.
In addition, everyone should be aware of the trade-off you're making with flushing (potentially sending 500 responses as 200 Ok etc.)
class Dummy
  def initialize(controller)
    @controller = controller
  end
  def each
    @controller.render.each { |part| yield part }
  end
end
@body = Dummy.new(self)
It'd be nice to see this implemented in Django as well...
@things = Thing.where(:it => "good")
And this view code:
<% for thing in @things %> <%= thing.name %> <% end %>
But the SQL query doesn't fire in the controller. It gets kicked in the view when you "for x in y"
Concerns still wonderfully separated.
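For anyone who hasn't seen this in action, here is a small plain-Ruby imitation of the behaviour. LazyRelation is a hypothetical class, not ActiveRecord; it only mimics "declare in the controller, run the query on first iteration in the view".

```ruby
# Hypothetical stand-in for a lazy relation. It is not ActiveRecord; it only
# mimics the "declare now, query on first iteration" behaviour.
class LazyRelation
  include Enumerable

  attr_reader :query_count

  def initialize(&query)
    @query = query
    @query_count = 0
    @records = nil
  end

  def loaded?
    !@records.nil?
  end

  def each(&block)
    load_records.each(&block)
  end

  private

  # Runs the query at most once and remembers the result.
  def load_records
    return @records if @records

    @query_count += 1
    @records = @query.call
  end
end

# "Controller": declares the data, nothing runs yet.
things = LazyRelation.new { [{ name: "spoon" }, { name: "fork" }] }
loaded_in_controller = things.loaded?

# "View": iterating is what triggers the query.
html = things.map { |thing| "<li>#{thing[:name]}</li>" }.join
puts loaded_in_controller # false
puts things.loaded?       # true
puts html
```

The flip side, as discussed above, is that any failure in the query now surfaces while the view is rendering.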
In only the trivial cases can you defer the actual SQL queries from being performed before the view is rendered.
I'm surprised he didn't raise this in the article, but I guess the article was more about how it could work, than how it will in the final build, or how it should in all cases.
I've got to disagree partially, or at least present a different point of view about the 200/500 argument. It could be considered acceptable for the response to be a 200, as the server has not completely errored out, it is only a portion of the response that has errored, and at the application level. It seems that some apps would return a 500 in this case, and then render a page, suggesting that the server and app are broken. This is really something that could very quickly turn into a bikeshed discussion, but you can probably see my point even if you don't agree (I'm not sure I agree in all cases, but it is food for thought).
As you probably well know, I've been doing the async on rails and other frameworks game in ruby for about as long as anyone else that's publicly producing code in this arena (in ruby), and I have to say that the solution they came up with is actually far better than most of the other hacks. It would be really nice if ruby had performant generators for yielding, but without them, we're left with fewer options. This fiber approach is getting us pretty close, and I think it's worth exploring, even if it turns out to be a bad idea for most people.
The Rack API won't really ever change significantly for the better in this arena. People don't seem to want to lose the simple #call returning a tuple protocol, and without either a) making that asynchronous (in as much as not relying on a return value, but on a call to a response object), or b) making some significant changes to the contract for body, we can't really find many ways to provide both simplicity and reasonable levels of granular control of IO. ryah of node.js fame was writing an IO driven server years ago when I was first working on the async rack hack that is in thin, zbatery, rainbows and flow. We had a lot of discussions back then about how best to fit this stuff into the Rack API. We both desired to use something similar to Enumerator#next, but this would never fit much better into the existing setups, and as already noted, is not very performant by default.
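For readers unfamiliar with Enumerator#next, a minimal sketch of what pulling a body chunk by chunk looks like. ChunkedBody is a hypothetical body object with the usual #each contract; nothing here is part of Rack itself.

```ruby
# Hypothetical body object with the usual #each contract. The log records
# how far rendering has progressed.
class ChunkedBody
  attr_reader :log

  def initialize
    @log = []
  end

  def each
    @log << :head
    yield "<head>...</head>"
    @log << :content
    yield "<body>content</body>"
    @log << :done
  end
end

body   = ChunkedBody.new
chunks = body.enum_for(:each) # wrap #each so the consumer can pull chunks

# The consumer decides when the next part gets rendered.
first = chunks.next
rendered_after_first = body.log.dup # only the :head step has run so far

rest = []
loop { rest << chunks.next } # Kernel#loop stops on StopIteration

puts first
puts rest
```

Pulling inverts control so the server decides when to ask for more, but external enumeration has to suspend and resume #each on every call, which is where the performance concern comes from.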
As you say, this is not trivial, but I would argue that this means we need to experiment with more approaches, as neither of the currently presented solutions (my async api, and this fiber api) are ideal, nor is buffering large volumes of response data in memory, or taking a bigpipe / highly ajax style approach.
If Rails can help you make it easier, that's great, I just don't see how it's a mostly-compatible solution as Yehuda wrote in the post.
I fully agree that we should explore this option, but at the same time we should make people aware of the trade-offs and not present it as some setting you can simply enable by config.automatic_flushing = true (which makes Rails do all the hard work).
Yehuda didn't even mention the word exception in the blog post, so I wasn't sure if he was aware of the issues or not.
Including associations is an orthogonal issue. As far as I know they get the same benefit out of the box in Rails 3. For example, I think you can say:
@posts = Post.where(:published => true).includes(:comments)
And it will still load it all eagerly, but it won't do it until you actually iterate over @posts.
While this does satisfy the objection to association loading, you still have the general problem of delaying the flushing until after all controller processing has completed. For nontrivial applications, this may indeed take quite some time (talking with disparate backends for a SOA, for example).
Deferring only works if having a stub of a request is sufficient to proceed to the next step in the execution path. Unless you are going to implement your own conditionals (which, admittedly, is doable in ruby,) then you are going to force the evaluation of the request as soon as you want to use it to make a decision.
Claiming that lazy-loaded queries are only a benefit for "trivial cases" is a strawman. It's a hugely powerful piece of functionality in ActiveRecord that you can utilize in many ways, and it would be very hard to implement without low-level support.
Cached attributes can often easily be made available via concise single model methods that operate transparently without the controller OR the view needing to know they are cache-backed. Plus, even if you are loading stuff out of memcached in the controller, it's going to be fast, because that's the whole point of memcached.
ActiveRecord meanwhile, normally takes a huge percentage of rendering time. Being able to defer those queries while still allowing the controller to declare them is actually a huge combination of performance flexibility and separation of concerns. Previously, if you wanted to defer them "cleanly", you'd have to create model methods, but even there you would have to pass params through somehow or generally do something uglier than what you have to do now.
1. The performance goal is to return HTTP & HTML headers as quickly as possible.
2. Business logic belongs in the models, as triggered by method invocations from the controller.
3. A good deal of time is spent on business logic and request handling (authentication / set-up / before_filter stuff). Often, this requires network I/O to backend systems (or databases). While some data retrieval is necessary only to render the view (and can thus be deferred gracefully), other times your application logic depends on the completion of these lengthy requests in order to complete the desired state modification.
4. Since we want to return the headers as quickly as possible, we can either a) figure out how to send the headers before the controller or b) figure out how to delay the processing until after the controller.
---
I think that a) is better than b).
I like views that exclusively take data and format it for output. I like having the core business logic in the models, and I like having the system guards and request setup concentrated in the controllers. This way, I can have an exhaustive understanding of the tasks performed by an action without having to read through all of the views and their partials.
If we hide some of the processing within model actions and call those model actions from the view, then I no longer can assume that all major processing / I/O has happened by simply reading the controller and the model methods it invokes. Instead, I also have to read all of the views.
This violates the expectations I have about processing times and MVC.
If instead, we could detect the requested format and return immediately with the headers, then we could perform the overwhelming majority of the processing while the browser is busy downloading static assets.
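Roughly what that would look like, as a plain-Ruby sketch. HeadFirstBody is a hypothetical class and the stylesheet path is made up; the assumption is that the server flushes each chunk as soon as it is yielded.

```ruby
# Hypothetical body: yields the static <head> first, then runs the slow
# action and yields whatever it renders.
class HeadFirstBody
  def initialize(head, &action)
    @head = head
    @action = action
  end

  def each
    yield @head        # the browser can start fetching CSS/JS now
    yield @action.call # the slow work happens after the first flush
  end
end

events = []
head = %(<html><head><link rel="stylesheet" href="/app.css"></head>)

body = HeadFirstBody.new(head) do
  events << :slow_work_started
  "<body>report</body></html>"
end

# Stand-in for the server loop: record the first few characters of each flush.
body.each { |chunk| events << [:flushed, chunk[0, 6]] }

p events
```

The ordering in events is the whole point: the head is flushed before any of the expensive work starts. The cost is the one raised earlier in the thread, namely that the status and headers are fixed before that work has had a chance to fail.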
However, I see the core team's point that they don't want to completely overhaul the API in a way that's going to break arguably a majority of existing apps, and at the same time require the developer to pay attention to more details than may be necessary.
However, on the topic of breaking expectations about processing times, I couldn't disagree more. To me, the main benefit of MVC (or any architecture choice) is in isolating responsibilities, not isolating code execution. It's not a big mental leap to conceive of an ActiveRecord relation as declaring some data that's needed that may or may not be actually queried, depending on whether the view needs it. This helps your MVC separation, because it means you don't need to extract view logic from your template just to make sure the controller doesn't load something unnecessarily. Lazy execution of this form is a very powerful optimization technique; just ask any Haskeller.
(not really, boom, hat trick: http://img.skitch.com/20100908-tpruqq1pqsw3rsyh2ggn5qb7ac.pn...)
I'll let the meme die now. Doing it in a before filter would have the same effect: unnecessarily increasing the latency before the headers are sent to the web browser. Whether the time is spent in a before filter or a controller action, the key is that the time is being spent before the headers are delivered to the browser. If you are trying to minimize the time it takes to return the headers, then you'll want to do that before almost anything else.