Sans-IO: The secret to effective Rust for network services (firezone.dev)

233 points by wh33zle 1 year ago | 86 comments

This is billed as something revolutionary and forward progress but that’s exactly how we used to do async in $lang - including Rust - before language support for async/await landed.

The biggest productivity boost to my Rust embedded firmware development was when I could stop manually implementing state machines and marshalling all local variables into custom state after custom state between each I/O operation, and let Rust do that for me by using async/await syntax!

That’s, after all, what async desugars to in rust: an automatic state machine that saves values across I/O (await) points for you.

wh33zle 1 year ago | |

I tried to address this at the end of the post: If what you are implementing is mostly _sequential_ IO operations, then this model becomes a bit painful.

That isn't always the case though. In more packet-oriented use cases (QUIC, WebRTC & IP), doing the actual IO bit is easy: send & receive individual packets / datagrams.

There isn't really much the compiler can generate for you because you don't end up with many `.await` points. At the same time, the state management across all these futures becomes spaghetti code because many of these aspects should run concurrently and thus need to be in their own future / task.

tel 1 year ago | |

I don't think that's quite true. The lift here is that the state machine does not do any IO on its own. It always delegates that work to the event loop that's hosting it, which allows it to be interpreted in different contexts. That makes it more testable and more composable as it makes fewer assumptions about the runtime environment.
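To make the "delegates to the event loop" point concrete, here is a minimal sketch of such a state machine. The `poll_transmit` name comes from the article as quoted further down the thread; `Binding`, `Transmit`, its fields, `handle_input` and the `b"request"` payload are placeholders assumed for illustration, not the article's actual code.

```rust
use std::collections::VecDeque;
use std::net::SocketAddr;

/// A datagram the state machine wants sent. It never touches a socket itself.
#[derive(Debug, PartialEq)]
pub struct Transmit {
    pub dst: SocketAddr,
    pub payload: Vec<u8>,
}

/// Hypothetical request/response exchange modelled without any IO.
pub struct Binding {
    server: SocketAddr,
    pending: VecDeque<Transmit>,
    response: Option<Vec<u8>>,
}

impl Binding {
    pub fn new(server: SocketAddr) -> Self {
        let mut pending = VecDeque::new();
        // Placeholder payload; a real implementation would encode a protocol message.
        pending.push_back(Transmit { dst: server, payload: b"request".to_vec() });
        Self { server, pending, response: None }
    }

    /// The hosting event loop calls this and performs the actual send.
    pub fn poll_transmit(&mut self) -> Option<Transmit> {
        self.pending.pop_front()
    }

    /// The hosting event loop calls this with whatever it received.
    pub fn handle_input(&mut self, from: SocketAddr, packet: &[u8]) {
        // Only the first packet from the expected server is recorded.
        if from == self.server && self.response.is_none() {
            self.response = Some(packet.to_vec());
        }
    }

    pub fn response(&self) -> Option<&[u8]> {
        self.response.as_deref()
    }
}
```

The same struct can be hosted by a blocking loop around `std::net::UdpSocket`, by an async runtime, or by a unit test that just calls the methods in order.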

Theoretically, you could do the same thing with async/await constructing the state machines for you, although in practice it's pretty painful and most async/await code is impure.

There are lots of more experimental languages which have exceptional support for this style of programming (Eff, Koka, Frank). Underlying all of Haskell's IO discourse is a very deep investment into several breeds of this kind of technology (free monads and their variants).

Lately, Unison has been a really interesting language which explores lots of new concepts but also has at its core an extensible effects system that provides excellent language-level support for this kind of coding.

sriram_malhar 1 year ago | | |

> I don't think that's quite true. The lift here is that the state machine does not do any IO on its own.

Here is a simple counter example. Suppose you have to process a packet that contains many sequences (strings/binary blobs) prefixed by 4 bytes of length.

You are not always guaranteed to get the length bytes or the string all in one go. In a sequential system you'd accumulate the string as follows

   handle_input(...)
       while not received 4 bytes
          accumulate in buf

       len = toInt(buf[0..4])

       while not received len bytes
          accumulate in buf

If implemented as a state machine, these would require two await points to assemble the string. Flattening this out into a state machine manually is a pain.
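For comparison, this is roughly what the hand-flattened version of the pseudocode above could look like. It is only a sketch: `FrameDecoder`, `State` and `push` are made-up names, and big-endian length bytes are an assumption since the example doesn't specify a byte order.

```rust
/// The two "await points" of the pseudocode become two explicit states.
enum State {
    ReadingLen,
    ReadingBody { len: usize },
}

/// Hypothetical push-style decoder for blobs prefixed by a 4-byte length.
pub struct FrameDecoder {
    state: State,
    buf: Vec<u8>,
}

impl FrameDecoder {
    pub fn new() -> Self {
        Self { state: State::ReadingLen, buf: Vec::new() }
    }

    /// Feed whatever bytes arrived; returns every blob completed by this call.
    pub fn push(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(data);
        let mut frames = Vec::new();
        loop {
            match self.state {
                State::ReadingLen => {
                    if self.buf.len() < 4 {
                        break; // need more bytes for the length prefix
                    }
                    // Assumption: the length is encoded big-endian.
                    let len = u32::from_be_bytes([
                        self.buf[0], self.buf[1], self.buf[2], self.buf[3],
                    ]) as usize;
                    self.buf.drain(..4);
                    self.state = State::ReadingBody { len };
                }
                State::ReadingBody { len } => {
                    if self.buf.len() < len {
                        break; // need more bytes for the body
                    }
                    frames.push(self.buf.drain(..len).collect());
                    self.state = State::ReadingLen;
                }
            }
        }
        frames
    }
}
```

Two states are still manageable; the pain described above grows with every additional sequential step, because each one needs its own variant and its own carried-over locals.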

PaulHoule 1 year ago | |

It was how we did I/O in assembly language in the 1980s. How else would you write an interrupt-driven YMODEM implementation?

k_bx 1 year ago | |

Yep. The only thing about async that bothers me is the need to write ".await" everywhere. I wish there were a way to invert this, and actually just run ".await" by default, while having a special construct not to.

vlmutolo 1 year ago | | |

It’s important to be able to see where the async function might pause execution.

For example, if you’re holding a mutex lock, you probably want to avoid holding it “across” an await point so that it’s not locked for longer than necessary if the function is paused.
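A small std-only sketch of that mutex point, under some assumptions: `YieldOnce` is a made-up future standing in for a real IO wait, and `poll_once` is a hand-rolled helper that polls with a do-nothing waker so the example needs no runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// A future that suspends exactly once, standing in for an IO wait.
pub struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

pub async fn bump(counter: Arc<Mutex<u32>>) -> u32 {
    let seen = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard is dropped here, before the suspension point
    YieldOnce(false).await; // the visible `.await` marks where we may pause
    seen
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` a single time with a waker that does nothing.
pub fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}
```

While `bump` is parked at the `.await`, other code can still take the lock. If the guard were kept alive across that line, it would stay locked for as long as the task is paused.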

mdtusz 1 year ago | | |

You mean `.await`, I assume?

Aissen 1 year ago | |

It is not billed as revolutionary. From the article:

> This pattern isn't something that we invented! The Python world even has a dedicated website about it.

And yet it is too common to find protocol libraries doing I/O in the wild :-(

zamalek 1 year ago |

I had been mulling over this problem space in my head, and this is a seriously great approach to the direction I have been thinking (though still needs work, footnote 3 in the article).

What got me thinking about this was the whole fn coloring discussion, and a happy accident on my part. I had been writing a VT100 library and was doing my head in trying to unit test it. The problem was that I was essentially `parser::new(stdin())`. During the 3rd or 4th rewrite I changed the parser to `parser::push(data)` without really thinking about what I was doing. I then realized that Rust was punishing me for using an enterprise OOPism anti-pattern I have since been calling "encapsulation infatuation." I now see it everywhere (not just in I/O) and the havoc it wreaks.

The irony is that this solution is taught pre-tertiary education (and again early tertiary). The simplest description of a computer is a machine that takes input, processes/transforms data, and produces output. This is relevant to the fn coloring discussion because only input and output need to be concerned with it, and the meat-and-potatoes is usually data transformation.

Again, this is patently obvious - but if you consider the size of the fn coloring "controversy;" we've clearly all been missing/forgetting it because many of us have become hard-wired to start solving problems by encapsulation first (the functional folks probably feel mighty smug at this point).

Rust has seriously been a journey of more unlearning than learning for me. Great pattern, I am going to adopt it.

Edit: code in question: https://codeberg.org/jcdickinson/termkit/src/branch/main/src...
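A rough sketch of the `parser::push(data)` shape described above. This is not the code from the linked repository: `Parser`, `Event` and the heavily simplified CSI handling (parameter bytes collected until a final byte in 0x40..=0x7E) are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum Event {
    Print(u8),
    /// A CSI sequence: ESC [ <params> <final byte>
    Csi { params: Vec<u8>, final_byte: u8 },
}

enum State {
    Ground,
    Escape,
    Csi(Vec<u8>),
}

/// The parser owns no reader; callers push bytes from wherever they got them.
pub struct Parser {
    state: State,
}

impl Parser {
    pub fn new() -> Self {
        Self { state: State::Ground }
    }

    pub fn push(&mut self, data: &[u8]) -> Vec<Event> {
        let mut events = Vec::new();
        for &byte in data {
            // Take the state out so each arm can move its contents.
            let state = std::mem::replace(&mut self.state, State::Ground);
            self.state = match (state, byte) {
                (State::Ground, 0x1b) => State::Escape,
                (State::Ground, b) => {
                    events.push(Event::Print(b));
                    State::Ground
                }
                (State::Escape, b'[') => State::Csi(Vec::new()),
                // Other escape sequences are dropped in this sketch.
                (State::Escape, _) => State::Ground,
                (State::Csi(params), b) if (0x40..=0x7e).contains(&b) => {
                    events.push(Event::Csi { params, final_byte: b });
                    State::Ground
                }
                (State::Csi(mut params), b) => {
                    params.push(b);
                    State::Csi(params)
                }
            };
        }
        events
    }
}
```

A unit test can split an escape sequence across two `push` calls at any byte boundary, which is awkward to arrange when the parser reads from `stdin()` itself.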

ziziman 1 year ago |

How does this design compare to using channels to send data to dedicated handlers? When using channels I've found multiple issues: (1) Web-shaped code that is often hard to follow (2) You have to manually implement message types that can then be converted to network-sendable messages (3) You have to explicitly give a transmitter to interested/allowed entities (4) You get a result if your channel message failed to transmit but NOT if your message failed to transmit over the network

But besides that it's pretty convenient. Let's say you have a ws_handler channel, you just send your data through that and there is a dedicated handler somewhere that may or may not send that message if it's able to.

K0nserv 1 year ago | |

For 4 you can implement that with a channel passed along with the message to send a result back. You can then block the sending side all the way to the callsite if you wish.
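A sketch of that reply-channel idea using only `std::sync::mpsc`. All names here (`Outgoing`, `spawn_handler`, `send_and_wait`, `fake_network`) are invented for the example, and `fake_network` merely stands in for a real socket write.

```rust
use std::sync::mpsc;
use std::thread;

pub struct Outgoing {
    pub payload: Vec<u8>,
    /// Where the handler reports whether the network send worked.
    pub reply: mpsc::Sender<Result<(), String>>,
}

/// Stand-in for the real network write: fails on empty payloads.
pub fn fake_network(payload: &[u8]) -> Result<(), String> {
    if payload.is_empty() {
        Err("empty payload".to_string())
    } else {
        Ok(())
    }
}

/// Spawns the dedicated handler and returns the channel used to reach it.
pub fn spawn_handler(
    send_on_network: fn(&[u8]) -> Result<(), String>,
) -> mpsc::Sender<Outgoing> {
    let (tx, rx) = mpsc::channel::<Outgoing>();
    thread::spawn(move || {
        for msg in rx {
            let outcome = send_on_network(&msg.payload);
            // Ignore the error if the caller stopped waiting for the reply.
            let _ = msg.reply.send(outcome);
        }
    });
    tx
}

/// Blocks the call site until the handler reports the network result.
pub fn send_and_wait(
    handler: &mpsc::Sender<Outgoing>,
    payload: Vec<u8>,
) -> Result<(), String> {
    let (reply, result) = mpsc::channel();
    handler
        .send(Outgoing { payload, reply })
        .map_err(|_| "handler is gone".to_string())?;
    result
        .recv()
        .map_err(|_| "handler dropped the reply".to_string())?
}
```

The caller now sees both failure modes: the channel send failing, and the network send failing.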

My feeling is that sans-IO is particularly useful for libraries, although it can be used for applications too. In a library it means you don't force decisions about how I/O happens on your consumer, making it strictly more useful. This is important for Rust because there's already a bunch of ecosystem fragmentation between sync and async IO (not to mention different async runtimes).

wh33zle 1 year ago | | |

The line between applications and libraries is fairly blurry, isn't it? In my experience, most applications grow to the point where you have internal libraries or could at least split out one or more crates.

I would go as far as saying that whatever functionality your application provides, there is a core that can be modelled without depending on IO primitives.

wh33zle 1 year ago | |

Channels work fine if you are happy for your software to have an actor-like design.

But as you say, it comes with problems: Actors / channels can be disconnected for example. You also want to make sure they are bounded otherwise you don't have backpressure. Plus, they require copying so achieving high-throughput may be tricky.
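The bounded-channel point can be seen with the standard library's `sync_channel`. This is a minimal sketch; `fill_until_full` is a made-up helper and nothing drains the receiver on purpose.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns how many packets were accepted before the queue pushed back.
pub fn fill_until_full(capacity: usize, attempts: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected, but never read.
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for i in 0..attempts {
        match tx.try_send(vec![i as u8]) {
            Ok(()) => accepted += 1,
            // The queue is full: this is the backpressure signal.
            Err(TrySendError::Full(_)) => break,
            // The receiver went away: the "disconnected actor" case.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With an unbounded `channel()` the loop would accept every packet and the queue would simply grow. Note also that every `Vec<u8>` is moved into the queue, which is part of the throughput cost mentioned above.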

hardwaresofton 1 year ago |

See also: monads and in particular the Free(r) monad, and effects systems[0].

The idea of separating logic from execution is a whole thing, well trodden by the Haskell ecosystem.

[EDIT] Also, they didn't mention how they encapsulated the `tokio::select!` call that shows up when they need to do time-related things -- are they just carrying around a `tokio::Runtime` that they use to make the loop code async without requiring the outside code to be async?

[EDIT2] Maybe they weren't trying to show an encapsulated library doing that, but rather to show that the outside application can use the binding in an async context...

I would have been more interested in seeing how they could implement an encapsulated function in the sans-IO style that had to do something like wait on an action or a timer -- or maybe the answer they're expecting there is just busy-waiting, or carrying your own async runtime instance (that can essentially do the busy waiting for you, with something like block_in_place).

[0]: https://okmij.org/ftp/Computation/free-monad.html
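One way a sans-IO component can deal with timers without busy-waiting or owning a runtime is to only report deadlines and let the host wait for them. The sketch below assumes a `poll_timeout` / `handle_timeout` pair; `Retry` and its fields are hypothetical and only illustrate the shape.

```rust
use std::time::{Duration, Instant};

/// Hypothetical retry timer: the state machine reports deadlines,
/// the host decides how to wait for them.
pub struct Retry {
    next_attempt: Option<Instant>,
    attempts_left: u32,
    interval: Duration,
    to_send: u32,
}

impl Retry {
    pub fn new(now: Instant, attempts: u32, interval: Duration) -> Self {
        Self {
            next_attempt: if attempts > 0 { Some(now) } else { None },
            attempts_left: attempts,
            interval,
            to_send: 0,
        }
    }

    /// When does the host need to call `handle_timeout` next?
    pub fn poll_timeout(&self) -> Option<Instant> {
        self.next_attempt
    }

    /// The host passes in the current time; no clock is read in here.
    pub fn handle_timeout(&mut self, now: Instant) {
        match self.next_attempt {
            Some(deadline) if now >= deadline && self.attempts_left > 0 => {
                self.attempts_left -= 1;
                self.to_send += 1;
                self.next_attempt = if self.attempts_left > 0 {
                    Some(now + self.interval)
                } else {
                    None
                };
            }
            _ => {} // called too early, or nothing left to do
        }
    }

    /// Number of requests queued for sending since the last call.
    pub fn take_pending_sends(&mut self) -> u32 {
        std::mem::take(&mut self.to_send)
    }
}
```

An async host could sleep until the reported deadline inside its select loop, a blocking host could use a socket read timeout, and a test can just pass in whatever `Instant` it likes.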

r3trohack3r 1 year ago |

Oh hey thomaseizinger!

I got half way through this article feeling like this pattern was extremely familiar after spending time down inside rust-libp2p. Seems like that wasn't a coincidence!

Firezone looks amazing, connect all the things!

wh33zle 1 year ago | |

Haha thank you!

Yes there are indeed similarities to rust-libp2p! Over there, things are more interleaved though because the actual streams and connections are still within `Future`-like constructs and not strictly split like in the sans-IO case here.

amluto 1 year ago |

> Also, sequential workflows require more code to be written. In Rust, async functions compile down to state machines, with each .await point representing a transition to a different state. This makes it easy for developers to write sequential code together with non-blocking IO. Without async, we need to write our own state machines for expressing the various steps.

Has anyone tried to combine async and sans-io? At least morally, I ought to be able to write an async function that awaits sans-io-aware helpers, and the whole thing should be able to be compiled down to a state machine inside a struct with a nice sans-io interface that is easily callable by non-async code.

I’ve never tried this, but the main issues I would foresee would be getting decent ergonomics and dealing with Pin.
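As a thought experiment, here is one std-only way this could be wired up. It is a sketch, not an established pattern: `Exchange`, `Mailbox`, `Recv` and the `b"request"` payload are all invented, the async body talks to the wrapper through a shared `Rc<RefCell<..>>` mailbox, and `Pin` is sidestepped by boxing the future.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Shared mailbox between the async body and the sans-IO wrapper.
#[derive(Default)]
struct Mailbox {
    outgoing: VecDeque<Vec<u8>>,
    incoming: VecDeque<Vec<u8>>,
}

/// Resolves once the wrapper has pushed an input packet.
struct Recv(Rc<RefCell<Mailbox>>);

impl Future for Recv {
    type Output = Vec<u8>;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        match self.0.borrow_mut().incoming.pop_front() {
            Some(packet) => Poll::Ready(packet),
            None => Poll::Pending,
        }
    }
}

/// Sequential logic written with async/await; it only touches the mailbox.
async fn exchange(mailbox: Rc<RefCell<Mailbox>>) -> usize {
    mailbox.borrow_mut().outgoing.push_back(b"request".to_vec());
    let reply = Recv(mailbox.clone()).await;
    reply.len()
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Sans-IO facade: callable from non-async code, no runtime involved.
pub struct Exchange {
    fut: Pin<Box<dyn Future<Output = usize>>>,
    mailbox: Rc<RefCell<Mailbox>>,
    result: Option<usize>,
}

impl Exchange {
    pub fn new() -> Self {
        let mailbox = Rc::new(RefCell::new(Mailbox::default()));
        let fut = Box::pin(exchange(mailbox.clone()));
        let mut this = Self { fut, mailbox, result: None };
        this.advance(); // run up to the first suspension point
        this
    }

    /// Polls the compiler-generated state machine unless it already finished.
    fn advance(&mut self) {
        if self.result.is_some() {
            return;
        }
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        if let Poll::Ready(value) = self.fut.as_mut().poll(&mut cx) {
            self.result = Some(value);
        }
    }

    pub fn poll_transmit(&mut self) -> Option<Vec<u8>> {
        self.mailbox.borrow_mut().outgoing.pop_front()
    }

    pub fn handle_input(&mut self, packet: &[u8]) {
        self.mailbox.borrow_mut().incoming.push_back(packet.to_vec());
        self.advance();
    }

    pub fn result(&self) -> Option<usize> {
        self.result
    }
}
```

It works, but it illustrates both concerns: the shared mailbox and boxed future cost an allocation and some indirection, and the waker is ignored entirely because the wrapper decides when to poll.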

ethegwo 1 year ago |

Good job! Exposing state could make any async function 'pure'. All the user needs to do is push the state machine to the next state. I have tried to bind OpenSSL to async Rust before, its async API follows a similar design.

wh33zle 1 year ago | |

I did some quick research and found that there is an "async job" API in OpenSSL. That one appears to do IO though, it even says that creating a job is a very expensive operation and thus jobs should be reused.

Is the similarity you are seeing that the work itself that gets scheduled via a job is agnostic over how it is executed?

From this example [0] it looks more like that async API is very similar to Rust's futures:

- Within a job you can access a "wait context"

- You can suspend on some condition

- You can trigger a wake-up to continue executing

[0]: https://www.openssl.org/docs/man1.1.1/man3/ASYNC_is_capable....

ethegwo 1 year ago | | |

Yes, you're right. It's not entirely similar, and it's not IO-less. But in async Rust (or any other stackless coroutine runtime), IO should be bound to the scheduler. This allows IO event callbacks to notify the scheduler and wake the task they are bound to. Exposing and manually pushing state is a good way to decouple IO from the scheduler.

mgaunard 1 year ago |

This is just normal asynchronous I/O with callbacks instead of coroutines.

mpweiher 1 year ago |

Reading the article and some of the comments, it sounds like they reinvented the hexagonal or ports/adapters architectural style?

Uptrenda 1 year ago |

I don't know what the take away is supposed to be here. Everything spoken about here is already basic network programming. It seems to focus on higher level plumbing and geeks out on state management even though this is just a matter of preference and has nothing to do with networking.

The most interesting thing I learned from the article is that cloudflare runs a public stun server. But even that isn't helpful because the 'good' and 'useful' version of the STUN protocol is the first version of the protocol which supports 'change requests' -- a feature that allows for NAT enumeration. Later versions of the STUN protocol removed that feature thanks to the 'helpful suggestions' of Cisco engineers who contributed to the spec.

K0nserv 1 year ago | |

The big thing, in the context of Rust, I think is how this solves function colouring, but it also makes testing really simple as outlined in the post.

The current situation in Rust is that if you implement a library, say one that does WebRTC, on top of the Tokio async runtime, then it's very cumbersome for folks to use it if they are doing sync IO, using a different runtime (smol, async-std etc.), using io_uring directly etc. With this approach you don't force the IO choice on consumers and make the library useful to more people.

solidninja 1 year ago | | |

The parallels with abstracting over the effect type and Free(r) monads are really apparent if you've had exposure to that style of programming. As you said, the benefit is that you can separate the business logic (what you want to do) from the execution model (how you do it) and this is very much an ongoing theme in programming language development.

screcth 1 year ago |

It would be better if the compiler could take the async code and transform it automatically to its sans-IO equivalent. Doing it manually seems error-prone and makes it much harder to understand what the code is doing.

tmd83 1 year ago |

Does the actual traffic go through the gateway, or is the gateway only used for setting up the connection?

wh33zle 1 year ago | |

Yes, traffic is routed to the gateway through a WireGuard tunnel. Broadly speaking, what happens is:

- Client and gateway perform ICE to agree on a socket pair (this is where hole-punching happens or if that fails, a relay is used)

- The socket pair determined by ICE is used to set up a WireGuard tunnel (i.e. a noise handshake using ephemeral keys).

- IP traffic is read from the TUN device and sent via the WireGuard tunnel to the gateway.

- Gateway decrypts it and emits it as a packet from its TUN device, thereby forwarding it to the actual destination.

It is worth noting that a WireGuard tunnel in this case is "just" the Noise Protocol [0] layered on top of UDP. This ensures the traffic is end-to-end encrypted.

[0]: https://noiseprotocol.org

ibotty 1 year ago |

That's just an initial encoding as described e.g. here: https://peddie.github.io/encodings/encodings-text.html

Am I missing something?

Animats 1 year ago |

"... be it from your Android phone, MacOS computer or Linux server. "

Why would you want this in a client? It's not like a client needs to manage tens of thousands of connections. Unless it's doing a DDOS job.

wh33zle 1 year ago | |

In Firezone's case, things are built on top of UDP so technically there aren't any (kernel-managed) connections and only a single file descriptor is allocated for the UDP socket.

The main benefit is being able to use `&mut` everywhere: At the time when we read an IP packet from the TUN device, we don't yet know which gateway (exit node) it needs to go to. We first have to look at the user's policies and then encrypt and send it via a WireGuard tunnel.

Similarly, we need to concurrently receive on all of these tunnels. The tunnels are just a user-space concept though. All we do is receive on the UDP socket and index into the corresponding data structure based on the sending socket.

If all of these "connections" used their own task and UDP socket, we would have to use channels (and thus copying) to dispatch them. Additionally, the policy state would have to be in an `Arc<Mutex>` because it is shared among all connections.
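A toy version of that layout, to show why plain `&mut` is enough when one struct owns everything. `Node`, `Tunnel`, `Policy` and the length check are placeholders assumed for illustration, not Firezone's actual types.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical per-peer state; a real tunnel would hold session keys.
#[derive(Debug, Default)]
pub struct Tunnel {
    pub packets_in: u64,
}

/// Hypothetical policy state consulted for every packet.
pub struct Policy {
    pub max_packet_len: usize,
}

/// One owner for the policy and all tunnels: no Arc, no Mutex, no channels.
pub struct Node {
    policy: Policy,
    tunnels: HashMap<SocketAddr, Tunnel>,
}

impl Node {
    pub fn new(policy: Policy) -> Self {
        Self { policy, tunnels: HashMap::new() }
    }

    pub fn add_tunnel(&mut self, peer: SocketAddr) {
        self.tunnels.entry(peer).or_default();
    }

    /// Called by the event loop for every datagram read from the one UDP socket.
    /// Returns true if the packet was accepted.
    pub fn handle_input(&mut self, from: SocketAddr, packet: &[u8]) -> bool {
        // Shared policy is a plain field access.
        if packet.len() > self.policy.max_packet_len {
            return false;
        }
        // The "connection" is just a map entry keyed by the sending socket.
        match self.tunnels.get_mut(&from) {
            Some(tunnel) => {
                tunnel.packets_in += 1;
                true
            }
            None => false,
        }
    }

    pub fn packets_in(&self, peer: SocketAddr) -> Option<u64> {
        self.tunnels.get(&peer).map(|t| t.packets_in)
    }
}
```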

Arnavion 1 year ago |

See also this discussion from a few months ago about sans-io in Rust: https://news.ycombinator.com/item?id=39957617

cryptonector 1 year ago |

> This pattern isn't something that we invented! The Python world even has a dedicated website about it.

I mean, this is basically what the IO monad and monadic programming in Haskell end up pushing Haskell programmers to do.

joshka 1 year ago |

This article / idea really refactors two things out of some IO code

- the event loop

- the state machine of data states that occur

But async rust is already a state machine, so the stun binding could be expressed as a 3 line async function that is fairly close to sans-io (if you don't consider relying on abstractions like Stream and Sink to be IO).

    async fn stun(
        server: SocketAddr,
        mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report>
            + Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>>
            + Unpin
            + Send
            + 'static,
    ) -> Result<SocketAddr> {
        socket.send((BindingRequest, server)).await?;
        let (message, _server) = socket.next().await.ok_or_eyre("No response")??;
        Ok(message.address)
    }

If you look at how the underlying async primitives are implemented, they look pretty similar to what you've implemented. sink.send is just a future for Option<SomeMessage>, a future is just something that can be polled at some later point, which is exactly equivalent to your event loop constructing the StunBinding and then calling poll_transmit to get the next message. And the same goes for the stream.next call, it's the same as setting up a state machine that only proceeds when there is a next item that is being fed to it. The Tokio runtime is your event loop, but just generalized.

Restated simply: the stun function above returns a future that combines the same methods you have with a contract about how that interacts with a standard async event loop.

The above is testable without hitting the network. Just construct the test Stream / Sink yourself. It also easily composes to add timeouts etc. To make it work with the network instead pass in a UdpFramed (and implement codecs to convert the messages to / from bytes).

Adding timeout can be either composed from the outside caller if it's a timeout imposed by the application, or inside the function if it's a timeout you want to configure on the call. This can be tested using tokio test-utils and pausing / advancing the time in your tests.

---

The problem with the approach suggested in the article is that it splits the flow (event loop) and logic (state machine) from places where the flow is the logic (send a stun binding request, get an answer).

Yes, there's arguments to be made about not wanting to use async await, but when you effectively create your own custom copy of async await, just without the syntactic sugar, and without the various benefits (threading, composability, ...), it's worth considering whether you could use async instead.

    // Made up `gen` syntax: gen(yield_type, resume_type)
    gen(Transmit, &[u8]) fn stun_binding(server: SocketAddr) -> SocketAddr {
        let req = make_stun_request();

        yield Transmit { server, payload: req };

        let res = .await; // Made up "suspend and resume with argument"-syntax.

        let addr = parse_stun_response(res);

        addr
    }


1. Run N clients of the TURN protocol.

2. Run M connections (ICE agent + WireGuard tunnel).

3. Allow each connection to use each TURN client.

4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).