Tokio internals: Understanding Rust's async I/O framework (cafbit.com)

110 points by bowyakka 8 years ago | 64 comments

carllerche 8 years ago |

I'm one of the authors of Tokio; hopefully I can clarify some points.

> Unfortunately, Tokio is notoriously difficult to learn due to its sophisticated abstractions.

IMO, this is largely due to the current state of the docs (which are going to be rewritten as soon as some API changes land).

The docs were written at a point where we were still trying to figure out how to present Tokio, and they ended up focusing on the wrong things.

The Tokio docs currently focus on a very high-level concept (`Service`), which is an RPC-like abstraction similar to Finagle.

The problem is that Tokio also includes a novel runtime model and future system, and the docs don't spend any time explaining them.

The next iteration of Tokio's docs is going to focus entirely on the "tokio-core" level, which is the reactor, runtime model, and TCP streams.

tl;dr, I think the main reason people have trouble learning Tokio is that the current docs are terrible.

> Aren’t abstractions supposed to make things easier to learn?

Tokio's goal is to provide abstractions that are as ergonomic as possible without adding runtime overhead. Tokio will never be as "easy" as high-level runtimes, simply because we don't accept the overhead that comes with them.

The abstractions are also structured to help you avoid a lot of errors that tend to be introduced in asynchronous applications. For example, Tokio doesn't add any implicit buffering anywhere. A lot of other async libraries hide difficult details by adding unlimited buffering layers.
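To make the buffering point concrete, here is a minimal sketch that uses only the standard library, not Tokio's own API. It assumes a bounded channel as a stand-in for an explicit buffer: once the buffer is full, the producer is told so, instead of a hidden queue growing without limit. The function name `try_fill` and the capacities are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Try to push `n` items into a channel that holds at most `capacity`
/// items while nothing drains it. Returns how many items were accepted.
fn try_fill(capacity: usize, n: usize) -> usize {
    // `_rx` keeps the receiving end alive but never reads from it.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // The buffer is full: the producer sees backpressure explicitly
            // instead of the queue silently growing.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With an unbounded queue the same loop would accept every item, and the cost would show up later as memory growth rather than as an error the caller can handle.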

olix0r 8 years ago |

I've written a few production-facing applications with Tokio, and I think the author identifies exactly the stumbling blocks I hit while learning the landscape.

My takeaway from working with Tokio is that it's a fairly low-level abstraction and doesn't do much to address the challenges of building networked _applications_. And this is OK.

We'll need higher-level layers that use Tokio, however, to address more specific use cases. I'll point to the nascent tower-grpc[1] library as something in this direction. I hope to see more things like this fall out of our work on Conduit[2].

[1] https://github.com/tower-rs/tower-grpc

[2] https://github.com/runconduit/conduit

kindfellow92 8 years ago |

> Unfortunately, Tokio is notoriously difficult to learn due to its sophisticated abstractions.

Aren’t abstractions supposed to make things easier to learn? Something about the idea of “complex abstractions” seems wrong.

(Edit: this is not a criticism of Tokio, it’s a criticism of the OP’s characterization of “sophisticated abstractions” which IMO should reduce complexity)

jcrites 8 years ago | |

> Aren’t abstractions supposed to make things easier to learn?

Not always. Some abstractions are designed to make it easier to solve hard problems correctly than it would be without them.

For example, consider Rust's memory model. Many people criticize that model as difficult to learn. By comparison, you might argue that C's memory model is simpler to learn. Yet, the C approach to allocating, using, and freeing memory is highly error-prone. C programs historically have frequently had mistakes such as use-after-free errors, or buffer under/overflow/reuse errors. The high-profile OpenSSL Heartbleed vulnerability was an example of a weakness in C's memory model and memory handling abstractions [1].

Rust's memory model may be more difficult to learn than C's, but once learned, its abstractions provide an advantage in building correct software by ruling out certain classes of mistakes. (GC in languages like C#, Java, and Go can also prevent these mistakes, but comes with a runtime cost. Rust aims to provide zero-cost abstractions.)

Building correct async IO programs using kernel abstractions is difficult for reasons similar to why it's difficult to write correct programs with C's memory model. It's especially difficult if you want the async IO program to be portable across multiple OSes/kernels. I have not used Tokio, but I would guess that its Rust-powered abstractions make it difficult or impossible to leak memory or sockets, or to fail to handle error cases that might arise when handling async IO.
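As a rough illustration of the "hard to leak a socket" guess, here is a minimal sketch in plain Rust, not Tokio code. `Connection` is a hypothetical stand-in for a socket that only counts how many are open; the point is that `Drop` runs on every exit path, including the early error return.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a socket: it only tracks how many are open.
struct Connection {
    open: Rc<Cell<u32>>,
}

impl Connection {
    fn open(counter: &Rc<Cell<u32>>) -> Connection {
        counter.set(counter.get() + 1);
        Connection { open: Rc::clone(counter) }
    }
}

impl Drop for Connection {
    // Runs automatically when the value goes out of scope, including on
    // early returns, so the "close" step cannot be forgotten.
    fn drop(&mut self) {
        self.open.set(self.open.get() - 1);
    }
}

/// Opens a connection and possibly bails out early; either way the
/// connection is closed by the time the function returns.
fn handle(counter: &Rc<Cell<u32>>, fail: bool) -> Result<(), String> {
    let _conn = Connection::open(counter);
    if fail {
        return Err("request failed".to_string());
    }
    Ok(())
}
```

After calling `handle` with either value of `fail`, the counter is back at zero, which is the property a C program would have to maintain by hand on every error path.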

[1] https://www.seancassidy.me/diagnosis-of-the-openssl-heartble...

lurr 8 years ago | | |

Yeah, but I actually have a reasonable chance of accomplishing what I want in C++.

vs Rust where I bash my head against it for 2 days then give up. I'm not smart enough for Rust, oh well.

kindfellow92 8 years ago | | |

I think you’re comparing apples and oranges here.

Writing memory safe code without Rust is harder than using Rust’s abstractions to do the same task. If you agree with that then my comment stands.

tatterdemalion 8 years ago | |

There are various criticisms of tokio, coming from different directions. Some have to do with the fact that some abstractions in the futures ecosystem are leaky today and that makes them less easy to use than they could be (though they won't always be leaky[]). But others have to do with understanding the internal implementation of these abstractions - people who feel they must understand how their library works internally before using it.

Of course, schedulers are just complicated. Most of the time you don't think about how complicated your scheduler is, because it's either an OS primitive in the kernel or a language primitive in your language's runtime. But since tokio is a library, and a modular one, it gets criticized for being complex, which in my opinion is unfair.

[] To be more concrete: a future is essentially a state machine representing the stack state at any yield point; it can't (currently) contain lightweight references into itself because they'd be invalidated when the future is moved around. This means using borrowing in futures programs is often infeasible today. Solutions are in the works.
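The "future is a state machine" description can be sketched by hand. Note the assumption: this uses the `std::future::Future` trait from current Rust, which differs from the futures crate API in use when this thread was written. `AddTwo`, `NoopWake`, and `poll_twice` are hypothetical names made up for the illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future that yields once, then finishes. Each variant holds
/// exactly the "stack state" that must survive across the yield point.
enum AddTwo {
    Start { value: u32 },
    Yielded { value: u32 },
    Done,
}

impl Future for AddTwo {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // `AddTwo` holds no references into itself, so it is `Unpin` and
        // we can take a plain mutable reference.
        let this = self.get_mut();
        match *this {
            AddTwo::Start { value } => {
                *this = AddTwo::Yielded { value: value + 1 };
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
            AddTwo::Yielded { value } => {
                *this = AddTwo::Done;
                Poll::Ready(value + 1)
            }
            AddTwo::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future twice by hand and returns both results.
fn poll_twice(start: u32) -> (Poll<u32>, Poll<u32>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = AddTwo::Start { value: start };
    let first = Pin::new(&mut fut).poll(&mut cx);
    let second = Pin::new(&mut fut).poll(&mut cx);
    (first, second)
}
```

Because the whole state lives in one movable enum value, a variant that stored a reference to a sibling field would dangle after a move. The `Pin` in the `poll` signature is how present-day Rust addresses that self-reference problem.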

kindfellow92 8 years ago | | |

My point is that “sophisticated abstractions” should reduce complexity, not increase it.

If Tokio’s abstractions are seemingly increasing complexity, maybe they aren’t sophisticated abstractions.

This is a criticism of the OP, not Tokio.

curun1r 8 years ago | |

It depends on whether an abstraction is intended for novices or, for lack of a better term, power users. Tokio is, IMHO, more of the latter. It's a complex set of concepts that is designed to scale up very well as the complexity of the task increases but doesn't scale down very well for simple tasks*. Learning quite often involves those simple tasks, so Tokio gets a reputation for being hard to learn. But when you take an easy-to-learn abstraction and try to scale it up to handle very complex problems, you often find the abstraction breaks down much more easily than something like Tokio does.

Tokio doesn't subscribe to the Larry Wall philosophy of making the easy things easy and the hard things possible. It seems more focused on making the hard things as easy as possible without much regard for the easy things.

* Before anyone attacks this...yes, you can accomplish simple tasks in Tokio, but it requires learning a lot more concepts than should be necessary to accomplish that simple task.

lurr 8 years ago | | |

So what, if you aren't already an expert, then go away, you aren't wanted?

Groxx 8 years ago | |

I'd argue it should be "abstractions take care of problems you don't care about". What you care about may be different from others, hence the variety of abstractions. Some (many!) favor ease-of-learning for general use, some favor safety at all costs, some favor explicit memory layout, some hardware independence, etc.

niftich 8 years ago | |

I think the author's quote is a bit of an overstatement. Reading the docs on tokio_core, it seems that knowledge could be readily transferred if you've worked with Node, browser JS, Java Futures, or a game engine -- which, of course, means you've been exposed to similar abstractions, potentially with different terminology.

I think the quote should be read this way: if you've only ever seen blocking IO, and have never seen async IO or deferred computation, you have to learn some things first, but this isn't unique to Tokio.

Yoric 8 years ago | | |

Actually, Tokio's futures are a bit different from the async IO (including most implementations of futures) in other ecosystems. Nothing world-breaking, but they have shaped the abstraction a bit differently in order to be more efficient with respect to memory management, so it can be a bit surprising.

lossolo 8 years ago | |

Reading the Rust subreddit, it seems like around 90% of opinions about Tokio are negative; that's why they are rewriting it.

Const-me 8 years ago |

> The tokio-core crate provides the central event loop

Does it mean it only uses a single thread for IO notifications?

If yes, the performance won’t be exceptionally great, especially on servers with many CPU cores and fast network interfaces.

The underlying OS APIs (epoll, kqueue, and IOCP) all support multithreaded asynchronous IO, so that’s not a platform limitation.

carllerche 8 years ago | |

You can spawn as many reactors as you like. The only thing a reactor does is receive events from epoll (or another system selector) and notify the associated task. The task could be on the reactor thread or on a different thread.
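A toy model of that division of labor, using only standard-library threads and channels rather than Tokio or epoll. Everything here is an assumption made for illustration: the `Event` type, the `run_reactor` function, and the use of a channel send as the "notification". In a real reactor the events would come from the system selector.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical readiness event: "the source owned by task `task_id` is ready".
struct Event {
    task_id: usize,
}

/// Runs a toy "reactor" on its own thread. It only receives events and
/// notifies the task registered for each one; the tasks themselves run on
/// separate threads. Returns how many notifications each task received.
fn run_reactor(events: Vec<Event>, task_count: usize) -> Vec<usize> {
    let mut notifiers: HashMap<usize, Sender<()>> = HashMap::new();
    let mut tasks = Vec::new();

    for id in 0..task_count {
        let (tx, rx) = channel::<()>();
        notifiers.insert(id, tx);
        // Each task counts its wake-ups until the reactor hangs up.
        tasks.push(thread::spawn(move || rx.iter().count()));
    }

    let reactor = thread::spawn(move || {
        for event in events {
            // Events for unknown tasks are ignored.
            if let Some(tx) = notifiers.get(&event.task_id) {
                let _ = tx.send(());
            }
        }
        // `notifiers` is dropped here, which closes every channel.
    });

    reactor.join().unwrap();
    tasks.into_iter().map(|t| t.join().unwrap()).collect()
}
```

The reactor thread does no application work itself; it only routes readiness to whichever thread owns the task, which is why several such loops can run side by side.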

Generally speaking, how to optimize concurrency for a network based application is pretty use case specific.

tl;dr, you can fully take advantage of many core systems w/ Tokio.

Const-me 8 years ago | | |

High I/O systems usually spawn multiple reactors, e.g. one reactor per CPU core, and run these reactors on the same set of file descriptors.

Does the library support such a use case? Or does it imply a one-to-many relation between reactors and files/sockets? The latter doesn’t scale well.

br1 8 years ago | | |

We don't want to have to decide which thread will handle each connection, just pick an idle one.
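The "just pick an idle one" policy can be sketched with a shared queue. This is a standard-library model of the scheduling decision only, not of epoll and not of anything Tokio provides; `serve` and the integer connection ids are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Distributes `connections` (hypothetical connection ids) across `workers`
/// threads through one shared queue. No thread is chosen up front: whichever
/// worker is idle takes the next id. Returns the ids each worker handled.
fn serve(connections: Vec<u32>, workers: usize) -> Vec<Vec<u32>> {
    let (tx, rx) = channel::<u32>();
    let rx: Arc<Mutex<Receiver<u32>>> = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut handled = Vec::new();
                loop {
                    // The lock is released as soon as one item has been taken.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(id) => handled.push(id),
                        Err(_) => break, // queue closed and drained
                    }
                }
                handled
            })
        })
        .collect();

    for id in connections {
        tx.send(id).unwrap();
    }
    drop(tx); // close the queue so workers exit once it is empty

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Which worker ends up with which connection varies from run to run; the only guarantee is that every connection is handled exactly once.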

jbirer 8 years ago |

I prefer Go's syscall and os packages. Just direct interfaces to the C functions, no abstractions and no cruft.