SSH and User-Mode IP WireGuard (fly.io)
It could help you replace the TUN with something more cross platform, and possibly with less overhead. You can pass in the hostname using %h, so you can even have virtual DNS.
https://github.com/majek/slirpnetstack/
(btw, gvisor netstack, while not without problems, is likely to be faster than libslirp, see benchmarks https://github.com/rootless-containers/rootlesskit/pull/101#... )
Having complete control of TCP/IP in userland like this, with so little code, is so valuable I feel like there needs to be some special name for the technique.
The whole thing is kind of a vindication for Go's standard library network interface, which I have always hated.
Yes! Userspace TCP/IP is how we implement a firewall on Android (which doesn't expose iptables on non-rooted devices but lets you set up TUN interfaces via VPN APIs). Right now, we rely on LwIP (wrapped in Go) and it has worked wonderfully well, especially since it is lightweight and single-threaded, so there is no locking overhead, and that bodes well for battery-powered devices.
> The whole thing is kind of a vindication for Go's standard library network interface, which I have always hated.
The Fuchsia team at Google is re-implementing netstack in Rust as netstack3 (and hence you're probably right to call it "gVisor netstack") due to what I presume are performance and efficiency reasons (which is of interest to us because we develop for smartphones). Of course, flyctl doesn't need that, but since you wrote about pulling in heavy dependencies, I am interested in your take on it.
As a non-Android developer, I've been working on a project the last few months that involves running an HTTP server on the device and tunneling out so it can receive requests from the outside world, and the platform feels nerfed at every level from filesystem access to keeping your server from being battery-killed.
I'll note that while all of gVisor's user-mode Linux is in the same Go module, we've actually gone to decent lengths to keep the network stack logically separate from the rest of the user-mode Linux code.
So while go.sum might look a bit frightening, Brad's depaware shows that the extra code you pull in to binaries by using netstack is actually quite minimal: https://github.com/tailscale/tailscale/commit/5aa5db89d6a9a6....
Curious about this. I've generally found Go's net libs to be pretty pleasant. Can you compare/contrast it with others you like better?
Anyways. Wrong about that one! Movin' on!
Many years ago, when we could take always-on desktop PCs more or less for granted, I developed a product that let the user connect back to their home PC from another PC, to stream music from home or grab a file (this was also pre-Dropbox). NAT was already ubiquitous by this point, and Windows XP SP2 (first version with Windows Firewall) came out that year, so I knew it couldn't just make a direct TCP connection to the user's home PC. So I did a stupid relay implementation, where both the client and the home server (that's what we actually called the tray applet on the home PC) made outgoing TCP connections to our central server, which would relay packets back and forth. If I'd had access to a TCP-in-userspace thing like the gvisor network stack, I could have run TCP end-to-end, the way it's meant to be used. It almost makes me want to reimplement that old system using Go and WireGuard, even though the functionality is basically irrelevant in today's world.
ssh dogmatic-potato-342@jump.fly.io
And tunnel the connection over WireGuard on the jump server
Also, it saves moving cables around when I break stuff.
I really enjoy this style of writing from a company.
Regarding the article, it seems like Fly has pulled off some insane networking nonsense, but I don’t know enough about networking yet to understand it. Saving this page for later and gonna get back to the TCP/IP Guide.
Fly is essentially building a Tailscale-esque infrastructure to service one part of their cloud offering. The amount of heavy lifting they do to make it all work is indeed insane. They seem like a cross between packetfabric, gitops, docker, and hashicorp, but with far fewer engineers on the team.
Most of the time these implementations are too tightly tied to the rest of the company's infra to be useful standalone. When one of those companies succeeds a common pattern is for engineers to cash out, leave, and build a new startup around one technology from the success story.
I would not be surprised if this is one of the forces that drives the consumer -> infra -> consumer -> infra cycle. A consumer wave leads to inventing lots of interesting but bespoke infra while it is growing like crazy. When it plateaus, folks spin out the interesting infra bits until the next consumer wave (generally larger) starts rising.
We mostly just try to pick the right primitives. And frequently get that wrong. Like that time we wrote our own JS runtime ...
I'm hopeful we'll also see some robust QUIC-based tunneling tools over the next couple years.
It also seems rather obvious to extend WireGuard to run over QUIC in addition to UDP. But the movement on that front has been very limited.
Also, this post prompted me to look closer at Fly.io, and it's leapfrogged to the top of my shortlist for an imminent client "edge proxy" project.
Last year I implemented TCP/IP over AWS Cloudwatch. Tons of "can you believe that actually works?" stuff possible with it:
https://medium.com/clog/tcp-ip-over-amazon-cloudwatch-logs-c...
Just realized this was written by security guru tptacek, nice. What is the contextual meaning of “AFFIANT SAYS NOTHING FURTHER.”?
"That's all, folks"
Normal SSH still works, and is usually going to be what people end up using. You just have to have WireGuard installed and running.
The product feature here is less interesting than how we did it.
(I admit that I haven't looked much into mesh networking / edge servers, so I don't know what the problems are. I always preferred Internet -> Identity Aware Proxy type thing -> mTLS mesh that is useless to humans. And, I don't ssh to stuff much anymore... I have my software collect debugging information and send it to something I can access through a browser or API, and control that software through an API. So everything is editing config files, basically, not SSHing places ;)
WireGuard is dead simple, but setting it up is extra cognitive friction if you've never dealt with it before (or if you're in an environment where you can't create a network interface). Jason Donenfeld did some magic with a Google user space networking stack that lets us "hide" the WireGuard component. People using our CLI will soon be able to connect to their private network + SSH into a container with one command.
Basically, WireGuard is cool and being able to connect into a wireguard network from a userland program is really helpful for building a straightforward UX.
What is the relationship with micro kernels? Is the feature available separate from the deployment/hosting?
By the way, as an elixir developer Fly.io looks extremely cool. But my (mostly public sector) customers want to hear something similar to the words "AWS" when asked about hosting – so is it running on top of AWS or Azure or GCP? (instances look like they may be GCP, which is fine too).
My first thought was "Wow, can we make this _more_ complicated please?", and then I read the rest of the post.
I hate technology.
Perhaps it's just me, but this is something I would accept as a "hey, I was bored and worked on something in my free time. It's probably broken but nobody cares because it's a toy thing, but it's sooo cool". I wouldn't accept it as "Fly.io OKR 1.3 (2021): SSH and User-mode IP WireGuard"... it sounds pretty much like a hack.
Wait until I find a reason to put a whole virtual memory manager into `flyctl`. I'll probably knock out a whole bunch of MBOs that way, and gVisor has me covered.
Any Git opening will do, for private cloning.
I imagine a merger! Tailscale's mission is to "simplify the long tail of software development", and coincidentally, fly does just that (if only for server-side apps right now).
This userland wireguard project was helpful for making "flyctl run console" work.
See also: https://twitter.com/bradfitz/status/1303776199907311617?s=19
Android development is a bit tedious compared to iOS because you have to support multiple API levels and account for subtleties across OEM implementations, but things have drastically improved in the last few years, especially after Oreo (Android 8).
> ...from filesystem access
Watch out for tutorials still recommending workarounds that aren't necessarily needed due to Jetpack and friends: https://developer.android.com/modern-android-development.
> ...to keeping your server from being battery-killed.
See: https://dontkillmyapp.com/
Process reaping is also, I believe, a problem on iOS? One way to keep a process out of OutOfMemory/LowMemoryKiller's reach is to make it a foreground service (what stuff like Music Players do) and generally be very stringent with resource use. It is easy to profile for resource usage thanks to Android Studio's built-in profiler and tools like https://perfetto.dev/
But Android seems to be working hard to "catch up" to iOS.
I'm mostly comparing to native Linux development. Obviously you may need to make some changes for security, but I feel like they've gone way overboard with things like forcing the storage access framework/media storage APIs, killing even foreground services (doze mode etc), and so on.
At the end of the day, if you're using software to purposefully limit what hardware is capable of, I think that's wrong. Even if you're worried about security, add a simple escape hatch for power users.
Does this imply that the user-space TCP/IP-over-WireGuard trick described here wouldn't work through NAT, or on a mobile OS (assuming you can get a Go toolchain up and running)?
I know they did a bunch of work to get wireguard-go working on iOS. It sounds hard to me!
Let's simplify it: from your Go WireGuard connection, just do an HTTP GET. What's your next step?
I was confused because Tailscale does not bring its own userland TCP/IP. It can - as a VPN solution - rely on OS-provided TCP/IP stack, but you wanted to avoid having to hook up flyctl into OS as a virtual network interface, right?
We could too! This is all in `wireguard-go`. But we'd have to prompt users to escalate privileges every time they tried to SSH somewhere (or, worse, install a long-term resident thingy, just to SSH to things). We don't want to own your VPN connections!
This is an end-run around all of that; we just take responsibility for all of TCP/IP, in our dumb little command line program.
I had another question – this seems similar to what Hashicorp is doing with Boundary. Have you looked at Boundary and how this potentially compares with that, from an architecture standpoint? Of course there are parts of this that are bespoke to your infrastructure, but I'm just more curious from a nerdy-aspect of it because we're evaluating boundary as a replacement to our current setup (Wireguard bastion host), for all the other benefits like auth and logging.
I think our take on end-user access management is lower-level than what Boundary is trying to do. Boundary, as I understand it, sees the world the way an IdP RP does, mostly in terms of bearer tokens. We see stuff as infrastructure; a static configuration on an EC2 instance or a CI container; "just Unix". If we weren't building a PAAS, we'd probably lean much more strongly towards Boundary's way of looking at things.
As well, we care about minimizing and understanding as much of the code we expose as possible. For all the talking I've done about SSH here, the serverside of this feature is just a couple hundred lines of code; it is dwarfed by the clientside code. I couldn't say that about a Hashi product. (HashiCorp could though!)
Curious about fiddling with something similar with firecracker at home.
Think it'd be neat to spin up bespoke micro-VMs with WireGuard enabled.
If you're turning up microvms with a linux kernel, it might just be easier to use kernel mode wireguard. It works pretty well!
Just thought it'd be fun to futz with network code for once given the most I do is http usually.
Been checking out gliderlabs/ssh the past few hours, which is neat. I can think of fun ways to pair it with a micro-VM and step-ca.
However, once authorized, the actual session uses a TLS stack generated for that individual session to establish a secure tunnel. It's explained at https://www.boundaryproject.io/docs/concepts/security/connec... if you're interested.
As for complexity, while Boundary overall is by no means a couple hundred lines, I will simply say that the vast majority of code (nearly the entire API) is related to user and resource management...how users are defined and authenticated, how infrastructure and services are described for access, RBAC, etc. The actual networking code performing the secure proxying is quite minimal because at least for the TCP tunnel it's more or less specifying the acceptable TLS parameters for that session and from there you're mostly in `io.Copy` land... it probably works out to a couple hundred lines :-D
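For anyone wondering what "`io.Copy` land" looks like, here is a rough sketch of a plain TCP forwarder. It is not Boundary's code and it leaves out the per-session TLS entirely; `splice`, `serveProxy`, `startEcho`, and `roundTrip` are names invented for this example, and the echo backend is only there so the thing can be run on one machine.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// splice copies bytes in both directions until each side has reached EOF
// (or an error), then closes both connections.
func splice(a, b net.Conn) {
	done := make(chan struct{}, 2)
	pipe := func(dst, src net.Conn) {
		io.Copy(dst, src)
		// Pass the EOF along: half-close if the connection supports it.
		if cw, ok := dst.(interface{ CloseWrite() error }); ok {
			cw.CloseWrite()
		} else {
			dst.Close()
		}
		done <- struct{}{}
	}
	go pipe(a, b)
	go pipe(b, a)
	<-done
	<-done
	a.Close()
	b.Close()
}

// serveProxy forwards every connection accepted on ln to target.
func serveProxy(ln net.Listener, target string) {
	for {
		in, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func() {
			out, err := net.Dial("tcp", target)
			if err != nil {
				in.Close()
				return
			}
			splice(in, out)
		}()
	}
}

// startEcho starts a throwaway echo backend on a random local port.
func startEcho() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()
	return ln, nil
}

// roundTrip sends msg through the proxy to the echo backend and returns
// whatever comes back.
func roundTrip(msg string) (string, error) {
	backend, err := startEcho()
	if err != nil {
		return "", err
	}
	defer backend.Close()
	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go serveProxy(front, backend.Addr().String())

	c, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte(msg)); err != nil {
		return "", err
	}
	c.(*net.TCPConn).CloseWrite() // signal that we are done sending
	reply, err := io.ReadAll(c)
	return string(reply), err
}

func main() {
	reply, err := roundTrip("ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```

Once the session is authorized and both ends are connected, the data path really is just two copies running in opposite directions; the interesting decisions all happen before `splice` is called.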
What you're doing in Fly looks super cool and the stuff you're doing on gVisor (including the user-mode Wireguard stuff) is super cool too! Thanks for writing it up. And it sounds like the two solutions are more complementary than competing, so maybe at some point in the future you'll find that Boundary has a niche to fill in your setup as well!
But I'm still trying to fully understand what they're doing with Boundary. The abstractions just feel a bit off to me unlike other Hashicorp products (it's odd to me that you have to tell boundary to treat a database connection differently rather than just giving me any TCP or UDP access).
But their team does great work and has elegant designs so I trust it's more likely that the lightbulb just hasn't gone off in my head yet with Boundary.
It's all TCP though. Eventually we'll do more interesting things with specific protocol types.
I would take credit for this, but it's Ben's c--- hey, wait, I paid Ben Burkert for this, I'm going to take full credit.
So I have been actually looking at the code under pkg/wg and tracing stuff back into the wireguard-go pkg and so on for a bit. (Which is some very nice and clean code haha, so you definitely got what you paid for. :P)
I guess the conceptual hurdle I'm stuck on now is: great, I've got this wg tunnel open in my Go code. How do I actually force packets over it? Say I've got an sshd listening on the other end of the tunnel with netfilter rules that only allow access over this tunnel.
Can I just do normal ssh calls and use the wg tunnel remote addr to do stuff?
Is it that simple and I'm vastly overthinking things, or is it more complicated than I thought?
Incidentally, fly.io is awesome!
Might have to see about getting our workloads running on it for any customers who might want to run them.
It's definitely given me some fun ideas for custom wg and sshd impls running on micro-VMs at home haha.