Coding Agent VMs on NixOS with Microvm.nix(michael.stapelberg.ch) |
Coding Agent VMs on NixOS with Microvm.nix(michael.stapelberg.ch) |
Is that really where we are at? Just outsource convenience to a few big players that can afford the hardware? Just to save on typing and god forbid…thinking?
“Sorry boss, I can’t write code because cloudflare is down.”
Generally speaking, once you have a working NixOS config, incremental changes become extremely trivial, safe, and easy to rollback.
You're opting for "sorry boss, it's going to take me 10 times as long, but it's going to be loving craftsmanship, not industrial production" instead. You want different tools, for a different job.
But the one-time setup seems like a really fair investment for having a more secure development. Of course, what concerns the problem of getting malicious code to production, this will not help. But this will, with a little overhead, I think, really make development locally much more secure.
And you can automate it a lot. And it will be finally my chance to get more into NixOS :D
https://github.com/5L-Labs/amp_in_a_box
I was going to add Gemini / OpenCode Kilo next.
There is some upfront cost to define what endpoints to map inside, but it definitely adds a veneer of preventing the crazy…
Without nix I mean
I’m curious what gVisor is getting you in your setup — of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise make the agent break out of the kubernetes pod? Like, do you have examples you’ve observed where gVisor has saved the day?
The huge gVisor drawback is that it __drastically_ slows down applications (despite startup time being faster.)
For agents, the startup time latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, then there's a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.
The performance of gVisor is often a big limiting factor in deployment.
I read the thesis on arxiv. Do you see any limitations from using Xen instead of KVM? I think that was the biggest surprise for me as I have very rarely seen teams build on Xen.
An alternative is to “infect” a VM running in whatever cloud and convert it into a NixOS VM in-place: https://github.com/nix-community/nixos-anywhere
In fact, it is a common practice to use the latter to install NixOS on new machines. You start off by booting into a live USB with SSH enabled, then use nixos-anywhere to install NixOS and partition disks via disko. Here is an example I used recently to provision a new gaming desktop:
nix run github:nix-community/nixos-anywhere -- \
--flake .#myhost \
--target-host user@192.168.0.100 \
--generate-hardware-config nixos-generate-config ./hosts/myhost/hardware-configuration.nix
At the end of this invocation, you end up with a NixOS machine running your config partitioned based on your disk config. My disko config in this case (ZFS pool with 1 disk vdev): https://gist.github.com/aksiksi/7fed39f17037e9ae82c043457ed2...The middle ground we've built is that a real Linux kernel interfaces with your application in the VM (we call it a zone), but that kernel then can make specialized and specific interface calls to the host system.
For example with NVIDIA on gVisor, the ioctl()'s are passed through directly, with NVIDIA driver vulnerabilities that can cause memory corruption, it leads directly into corruption in the host kernel. With our platform at Edera (https://edera.dev), the NVIDIA driver runs in the VM itself, so a memory corruption bug doesn't percolate to other systems.
This isn't true. You can look at the code right here[1], there is no code path in gVisor that calls fork() on the host. In fact, the only syscalls gVisor is allowed to make to the host are listed right here in their seccomp filters[2].
[1] https://github.com/google/gvisor/blob/master/pkg/sentry/sysc...
[2] https://github.com/google/gvisor/tree/master/runsc/boot/filt...
I think it's a small distinction. fork() itself isn't all that useful anyways.
However, consider reading a file in gVisor. This passes through the IO layers, which ultimately will end up a read in the kernel, through one of the many interfaces to do so.
Wait until they find a hole. Then good luck.
$ nixos-rebuild build-image --flake .#myhost --image-variant amazon
$ aws-cli image upload < result/images/image.ami
$ aws-cli create vm --image={image}Same with the CI platforms, instead of `setup-*` steps in GHA it could have just take flake in. Yes, I know I can build OCI image with nix, again, not the issue.
My private CI runs on top of nix, all workers on the same host share /nix/store. My pipelines focused on running actual things rather than getting a worker ready to run things. If I didn't want output to be parsed by CI, I could have just reduced my pipeline to `nix flake check`.
I share the exact same pipeline and worker image across multiple projects in multiple languages, all because everything is hidden behind devenv's tasks. When I switched project different rust and node versions, I didn't have to touch my CI at all. When I added a bunch of native deps that usually needed to be installed separately on GHA - again, didn't have to touch anything beyond my nix env once.