Bell Labs' Plan 9 research project looks to tomorrow (1990) (doc.cat-v.org)
We'll never have any of the things it really promised until we give up on POSIX, tbh.
Not necessarily bloated or inelegant, but over engineered and over confident and missing reasons the first system was successful.
Purifying and perfecting some of the concepts from unix is nice, but someone running a file server or CAD program or editing source code or compiling code or running shell scripts and piping a lot of commands together to do cool things with data just does not see a whole lot of incremental benefit beyond what unix gave them.
Unix was successful because it was there and accessible and pretty easy and pretty good and evolved quickly (if not always elegantly) to meet new requirements.
For example: purists talk about sockets as some kind of catastrophe. But in all honesty they're not that bad, and once you have a few networking tools you can use in shell pipelines you really don't have to have absolutely everything as a file. Simple, standard, composable tools are more important for practical use than "everything is a file".
I think most of the missteps in plan9 are above the "everything is a file" layer though. And while sockets aren't that bad, I do think I would still rather interact with them in the plan9 way than the unix way if I could.
What about POSIX is in conflict with Plan 9? I would have called Plan 9 a subset of POSIX.
The most pressing problem for implementing plan9-like semantics in a POSIX system is the permission system. In particular, setuid as a mechanism for privilege escalation. This is a big part of why users can't make their own namespaces on linux without help/intervention from root-owned processes (like dockerd or systemd).
Think about it: if you can make the file namespace any shape you want, and then run `sudo`, which is a setuid process that looks at /etc/sudoers to decide whether your escalation is allowed, how do you secure it?
How do you even begin to do distributed permissions if everything's looking at /etc/passwd and /etc/group in the current process' namespace to decide who you are?
POSIX is very much built on the idea of a canonical view of the filesystem, and plan9 is built on a vfs that may as well be sand.
POSIX standardized and attempted to unify a dozen different incompatible systems that developed independently on top of the original unix from bell labs. Those systems were developed by building new functionality on top of what unix provided. In order to keep at least some sort of compatibility, the old and at times obsolete functionality was kept in the system.
Plan 9 on the other hand intentionally broke compatibility with its predecessors and had those same features that were glued recklessly on top of each other in various unices thoughtfully redesigned from scratch, often omitting stuff that didn't seem relevant enough to its authors.
Note that this - as almost everything else that is plan 9 related - is dated.
For example: consider how you'd write some generic code to forward all ioctls transparently across the network. Keep in mind that the data attached to the ioctls is machine dependent, driver dependent, and has no information about how it's formatted. Every ioctl for every driver is its own special case.
Meanwhile, faithfully forwarding all devices in a plan 9 system is trivial. Control messages aren't strictly formatted -- but they're done via reads and writes on file descriptors, so sending them to the devices that understand them, and relaying back the result, is trivial. It's just 9p: https://man.9front.org/5/0intro
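To make the contrast concrete, here is a rough sketch in Python (not 9p, and not real driver code). The ioctl request numbers, the struct layouts, the `CtlFile` class and the `baud` command are all made up for illustration. The point is only that the ioctl forwarder needs a marshalling rule per request, while the read/write relay never looks inside the bytes.

```python
import struct

# Hypothetical ioctl-style interface: every request number carries an opaque,
# driver-specific binary argument, so a forwarder needs one rule per request.
IOCTL_LAYOUTS = {
    0x1001: "<I",   # made-up request: one little-endian uint32
    0x1002: "<HH",  # made-up request: two little-endian uint16 values
}


def forward_ioctl(request, native_arg):
    """Re-encode an ioctl argument for the wire; fails for unknown requests."""
    layout = IOCTL_LAYOUTS.get(request)
    if layout is None:
        raise KeyError(f"no marshalling rule for ioctl {request:#x}")
    fields = struct.unpack(layout, native_arg)
    # Same fields, network byte order.
    return struct.pack("!" + layout[1:], *fields)


class CtlFile:
    """Made-up stand-in for a Plan 9 style ctl file that takes text commands."""

    def __init__(self):
        self.settings = {}

    def write(self, data):
        key, value = data.decode().split()
        self.settings[key] = value
        return len(data)


def forward_write(remote_file, data):
    """Generic relay: passes the bytes through without interpreting them."""
    return remote_file.write(data)


dev = CtlFile()
forward_write(dev, b"baud 9600")
print(dev.settings)
```

A new device with new control messages needs no change to `forward_write`, whereas `forward_ioctl` raises until someone adds a layout for the new request.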
Doing this fully, for all devices (except /srv, which is a bit magical) is implemented here, in a short shell script. This is the remote login program used by 9front, which gives you something resembling ssh or vnc, but with full access to the data and devices on your local machine, graphics, audio, mouse, keyboard, USB, network, and anything else, even if it hasn't been implemented yet. It does both client and server side:
https://git.9front.org/plan9front/plan9front/HEAD/rc/bin/rcp...
The client side sets itself as a file server, using exportfs. It exports everything in its namespace, including /dev, over to the server.
The server takes the client's namespace, and mounts it over /mnt/term. Then, it takes /mnt/term/dev/cons and binds that over /dev/cons and starts a shell. That means that every time a program is run, it opens /dev/cons to interact with the user, using the client's mouse, keyboard, and so on, forwarding all the operations transparently over the network.
The idea can go further: instead of using network translation layers, for example, a plan 9 machine would import a different machine's network stack and mount it over itself:
# whats my ip?
% cat /net/ipselftab
192.168.1.11 01 4u
% hget https://api.ipify.org
74.{home.address}
# ok, let's import another machine's network stack and use it.
% rimport orib.dev /net/ /net
# what's my ip now? look ma, I'm proxying!
% cat /net/ipselftab
144.202.1.203 01 4u
% hget https://api.ipify.org
144.202.1.203
There are no special hooks in the network stack for this. It doesn't know. This happens for free because the network stack is accessible through the filesystem API. This kind of thing happens everywhere, because everything goes through 9p, and everything can be namespaced. There isn't any other special case to consider: if you forward 9p, you forward all operations you can do with a device. Or any other file server.
If everything is in a namespace, you don't pull the devices other programs are using out from under them, so you can put one login in one sandbox with a remote mouse and keyboard, and a different one in a different sandbox with a different network stack.
This falls apart when you have the 53,719 special cases bundled with posix. If you need a special case for each operation you forward through the network, or interpose in userspace, you're in for a rough time.
Plan 9 works because it's relatively simple and uniform.
Hard drives are cheap, so space is not an argument anymore, for reasonable uses of disk space. And most uses are reasonable!
Ah, but you might say, if a shared library is compromised, it's easy to push a fix! But how do you think it got so widely compromised in the first place? Perhaps because it was a widely shared library? Sharing is a double-edged sword.
The impetus behind the virtual environments for scripting languages, like Python's venv and Ruby's RVM, is isolation from the base system. Untold developer hours have been lost in attempts to run software with different dependencies than the base system. It's a total mess.
We shouldn't expect an operating system to be a monolith that dictates the dependency versions for all the code that runs on it. Code should be deployed in sandboxes and it should be independent of the base system. When the code is removed, it should be like it was never there.
It even steamrollered its own successor. Plan 9 is brilliant, but Unix already served most people's needs so why change.
Mental game: If they had managed to quickly push the whole thing out as what is now called open source, while Unix was still proprietary, how would the world look now?
For example there is this mandatory covid testing. So each department is handed an excel file and they log the tests. And there is another excel which summarizes those dept’s ones.
Can Plan9 be useful in such a situation?
If there is anything we should thank Plan 9 for, it has to be UTF-8.
Not sure I understand. I imagine without dynamic linking/shared libraries, most widely used shared libraries would be widely used statically linked libraries, and so a vulnerability in them would indeed be harder to fix, as you'd need to relink all the binaries using them instead of just the dynamically linked library?
(Also, memory usage seems more concerning to me than disk space. Shared libraries are called "shared" because all their non-mutable pages in memory are shared across processes. To even approximate the same with static libraries, you'd pretty much have to have deduplication of pages in memory. Link-time optimization might then spoil even that plan entirely. Of course on the other hand, dynamic linking precludes LTO for that.)
All of that code would have to be brought into the app binary itself, making mobile apps even larger in code size than they already are. Dead code elimination could eliminate some of that bloat, but I doubt it'd eliminate much of it, given that we're only measuring resident pages to begin with.
Only allowing static linking might make sense on servers (it certainly makes deployment a lot easier). But it wouldn't work on mobile devices without significantly changing the way mobile apps and OSes are architected.
In an ideal world, archives would be well structured executables and my system would automatically deduplicate libraries out of them when they landed on the local filesystem. The linker would automatically pull a whitelist of "known bad" and do overrides for me from my package manager/security source automatically.
I see two unfortunate issues with that:
1. you lose the ability to use manifest and static types for your interfaces (everything's a text or binary protocol, dynamically typed)
2. you lose performance due to (de)serialization at any interface
By this logic, every game developer should be writing their own versions of Vulkan for every GPU they want to target. They would have to ship their own set of GPU drivers with every game.
Also, every app that wants to make a TLS connection would have to develop its own cryptography primitives. Hope you enjoy writing ASN.1 parsers (in C, because that's the Plan 9 way).
Sorry, but this "no dependencies" utopia is completely impractical.
For decades static linking was the only option on most platforms.
An AT&T that was willing to do that would have been willing to let BSDI slide. Linux would have died in the crib and we'd all be using BSD right now.
Computer system security would be vastly superior, I imagine. Webpages would be mounted file systems with restricted permissions systems, and browser apps would be command line utilities. Both would have benefited from the same security that UNIX systems use these days.
Kind of sad that Plan 9 got rid of them literally decades ago and we're still stuck dealing with their mess.
UNIX only steamrollered other OSes because it was already effectively "open source" during its first decade, until AT&T got back the right to sell it, and then came the BSD lawsuit alongside the prohibition of the annotated UNIX V6 source book.
Another annoying issue is symlink loops, but I'm not sure if Plan 9's solution solves that.
https://tbhaxor.com/understanding-linux-capabilities/
https://blog.container-solutions.com/linux-capabilities-in-p...
Linux capabilities don't really change any of the issues around namespace security because they don't inherently provide a way to elevate privileges without setuid.
This is the kind of thing I mean about "not knowing what you don't know," because you're looking at namespaces through the lens linux does, which is that they exist to limit capabilities.
Plan 9 uses namespaces to allow users to control their own environment. It's not a special operation, it's just a thing you do all the time.
For a small practical example: there's no PATH environment variable in plan9. You just union mount things into /bin, and /bin is where your shell looks for things to run. It's that much of an every day operation.
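A toy model of that lookup, in Python rather than rc. The `Namespace` class, the dict-as-directory trick and the paths are all invented for illustration; `before=True` is loosely modeled on `bind -b` and the default on `bind -a`. It is a sketch of the idea, not how the kernel does it.

```python
class Namespace:
    """Toy per-process namespace: a mount point maps to an ordered list of directories."""

    def __init__(self):
        self.unions = {}

    def bind(self, source, target, before=False):
        """Add a directory (a dict of name -> file) to the union at target."""
        stack = self.unions.setdefault(target, [])
        if before:
            stack.insert(0, source)
        else:
            stack.append(source)

    def lookup(self, target, name):
        """Return the first match, searching the union in order."""
        for directory in self.unions.get(target, []):
            if name in directory:
                return directory[name]
        raise FileNotFoundError(f"{target}/{name}")


# Hypothetical directory contents; the values stand in for the real binaries.
system_bin = {"ls": "/386/bin/ls", "cat": "/386/bin/cat"}
home_bin = {"ls": "/usr/glenda/bin/ls", "mytool": "/usr/glenda/bin/mytool"}

ns = Namespace()
ns.bind(system_bin, "/bin")
ns.bind(home_bin, "/bin")  # appended after the system directory

print(ns.lookup("/bin", "ls"))      # the system copy is found first
print(ns.lookup("/bin", "mytool"))  # only the home directory has this one
```

Each process gets its own `Namespace`, so changing the search order is a local decision and there is no shared variable like PATH to parse.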
If you put a user under a uid namespace in linux, and then give them the right to create their own filesystem namespaces then sure, you've enabled them to potentially do things like this. But you've also blocked them from escalating their privileges, because now they can't use setuid binaries to obtain "real root" or whatever.
So you're left with one or the other: either you can manage your own namespace, but you have to be protected from potentially breaching root security through a setuid or cap flag on a binary; or you have to be prevented from managing your namespace outright in order to avoid lying to sudo about who can do what.
What it sounds like plan9 is doing is giving a local view of the root that local processes see. Which Linux can do too. Not with the same use-cases in mind as Plan9 though, as such capabilities were added for sandboxing/containerization. But the mechanisms are probably(?) general enough to do Plan9 in Linux.
A `sudo` that is seeing a local view of the root is going to have privileged access to that local root, not to the global system root. And that is correct. That is what sudo does. It gives root access to the same root that contains /etc, not to any "outer" or "more global" root.
It doesn't mean you can't have any access to the global root from the local root though. There are many ways to arrange such privilege escalation. (They do have to be arranged, of course, by someone writing the userspace code -- like sudo had to be written.)
>If you put a user under a uid namespace in linux, and then give them the right to create their own filesystem namespaces then sure, you've enabled them to potentially do things like this. But you've also blocked them from escalating their privileges, because now they can't use setuid binaries to obtain "real root" or whatever.
Privileged processes can have a global view of the namespace while the user does not. An ordinary setuid binary on a filesystem the user controls can't get a global view, only because the user does not (should not) have authority to do that. A process with the global view and root can grant the authority though.
The important thing, it seems to me, is that the global outer namespace can grant to the process local namespace any capabilities available through the outer namespace. I'm not sure if this is 100% completed but the ongoing containerization efforts do involve reaching toward that 100% mark.
There has to be a better answer than “Wow, I can make my computer that can’t run anything people want to run secure in a hypothetical hierarchical organization structure of permissions that can each have their own subtree sudo. I even call it treedo, ha-ha!” .. it just doesn’t resonate.
If you want to give actual real global root, I think you can do it by having a gifting process put the real global root process into the same process namespace as the giftee process.
What I did not understand is how static linking would have precluded the problem in the first place. I don't think that would have made those libraries less widely used.
Yes, but if a shared library update includes a bug, it affects all your programs! That's why it is a double-edged sword: The same mechanism that solves the bugs can also deliver bugs.
I would frequently love to have the ability to just mount a bunch of cpus off a beefier machine onto my laptop and take advantage of that to speed up my builds. I can use distcc but holy hell is it a lot more complicated to set up.
Or like, mounting a zip file as a directory, without needing a whole enormous systemd or gnome hairball along with fuse or gfs to make it happen as a regular user. Or hell, mount a usb stick even!
These are literally things I wish were easier every day as a software developer. The vaguely plan9-shaped bits that have been added to linux over the years have brought me no closer to them.
In plan9 mount is not a privileged operation. Anyone can do it at any time for any reason. It does not impact or interact with the security of the system (except that you can implicitly remove access to things by unmounting them).
The authentication mechanisms on most Linuxes are based around suid binaries that read configuration files in order to decide on what to do, so if you can bind in a namespace, you can fool the authentication mechanisms.
In plan 9, this is solved with the kernel capability device. It's not particularly exciting, it's just one of the things that need to happen when you remove the concept of a global 'root user' from the system.
The point is that you DON'T have root, and you DON'T have access to write to the file. But you're free to rearrange your namespace WITHOUT having root, and you want to arrange for SOME users to escalate privileges and, say, debug the kernel. Or do something else dangerous that requires elevated privileges in the global context.
> A `sudo` that is seeing a local view of the root is going to have privileged access to that local root, not to the global system root. And that is correct. That is what sudo does. It gives root access to the same root that contains /etc, not to any "outer" or "more global" root.
Yes, and that's a concise description of the flaw: you can't use suid+file based privilege escalation to modify system-wide configuration without restricting the ability to freely manage your namespace.
This is what unix does by design, and why the authentication design from unix isn't going to work for systems used in the style of plan 9.
Suid programs on Plan9 could not possibly behave any differently. If the user rebinds `/bin` and then runs a suid program that calls other programs, that suid program cannot use the rebound `/bin`. That kind of binding simply can't be allowed to cross security contexts.
They are not. The whole point of this subthread is that the ability to create namespaces as an unprivileged user[1] would be key to actually 'doing plan9 in linux'. You can not believe that if you want, but I'd suggest you read up a bit more on plan9 if so, because it becomes obvious pretty quick that it's the case.
[1] And here by 'unprivileged user' I mean someone who is still a user of the machine, and not a user who has been containered away into a separate user namespace, let alone into a whole docker-style container.
Last time I ran into the “edit an ext4 image as an unprivileged user” problem I used a small VM.
There are people trying to fix this problem and there’s a legitimate reason why it’s hard: https://lwn.net/Articles/755593/
IIRC the patches needed for what’s described in that article are already there on Ubuntu.
Plan9's design makes all of this very simple, and a big part of why is the specific choice to eschew standard UNIX semantics and use a different kind of mechanism for privilege management that allowed for flexible namespaces managed outside the kernel. It also has the advantage of moving all filesystem operations out of the kernel.
So this really backs up my point, rather than contradicts it: It's only through mitigating and otherwise contradicting traditional POSIX semantics that linux is able to approach this kind of thing.
Well yeah, that’s exactly what FUSE + user namespaces does to solve this problem on Linux.
Of course normal filesystems don’t do this because it would be way too slow if the kernel can’t share data structures with the file system.
In neither Plan9, nor Linux, nor any other potential system, could you have the caller of a privileged program control the namespaces accessed under the privilege of that program.
Like the whole thing about rebinding /bin instead of $PATH... well neither rebinding, nor editing $PATH, nor any other such thing, $LD_PRELOAD, anything, that would affect how the process found files to execute, could be secure if allowed to affect a privileged process. It only means you disable environment sharing, and you disable namespace sharing, etc., any other kind of sharing (no matter how implemented... Linux way, Plan9 way, anything). Plan9 has no better way than that, no way to make /bin rebinding work in a way that makes sense for privilege escalation.
I think you mean, 'and therefore, it's not a problem.'
In plan 9, the program that allows you to switch users needs almost no privileges.
> Plan9 has no better way than that, no way to make /bin rebinding work in a way that makes sense for privilege escalation.
Processes aren't allowed to become privileged without obtaining a cryptographic capability token via negotiation with the authentication agent. The auth agent which was started at boot will write the secret to the kernel, and give you the hash of it so you can prove you are the rightful recipient of that uid switch request. If you don't have the right secrets, you don't get a token.
You can't swap the devcap in the authenticator's namespace, and negotiating with a rebound devcap is simply not going to work, because your authentication token wasn't written into it. You don't have the ability to change what the program doing authentication sees.
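Roughly, the handshake can be sketched like this in Python. This is a toy model, not the kernel code: `CapDevice`, `auth_agent_grant` and the user names are invented here. The sketch has the agent register a hash with the kernel and hand the matching key to the requesting process, which is how I understand cap(3) to work, so treat that direction and the HMAC-SHA1 choice as assumptions.

```python
import hashlib
import hmac
import secrets


def _digest(user_from, user_to, key):
    """Keyed hash of 'from@to', as assumed for this sketch."""
    message = f"{user_from}@{user_to}".encode()
    return hmac.new(key.encode(), message, hashlib.sha1).digest()


class CapDevice:
    """Toy model of a kernel capability device holding hashes of pending grants."""

    def __init__(self):
        self._hashes = set()

    def write_caphash(self, digest):
        # In the real system only the trusted auth agent may register hashes.
        self._hashes.add(digest)

    def write_capuse(self, capability):
        """Redeem 'from@to@key'; return the new user name or raise PermissionError."""
        user_from, user_to, key = capability.split("@")
        digest = _digest(user_from, user_to, key)
        if digest not in self._hashes:
            raise PermissionError("capability not recognized")
        self._hashes.remove(digest)  # each capability is single use
        return user_to


def auth_agent_grant(dev, user_from, user_to):
    """Hypothetical agent step: register the hash, hand the capability to the caller."""
    key = secrets.token_hex(16)
    dev.write_caphash(_digest(user_from, user_to, key))
    return f"{user_from}@{user_to}@{key}"


kernel_cap = CapDevice()  # the device the auth agent talks to
rebound_cap = CapDevice()  # a look-alike the user bound into their own namespace

cap = auth_agent_grant(kernel_cap, "glenda", "bootes")
print(kernel_cap.write_capuse(cap))
```

Redeeming the same capability against `rebound_cap` raises `PermissionError`, because the agent never wrote a hash into it. Rebinding only changes what the caller sees, not what the agent registered.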
In summary: There's no information attached to the binary, and no capability grants in a namespace you can rebind.
If you want to namespace this sort of authentication agent, it's also possible -- you can authenticate, escalate permissions, and start your own agent talking to devcap -- but the capability to start a functional authentication agent is guarded by capability tokens. You need to be authenticated to allow authentication.
Removing suid binaries with config files as a method for privilege escalation is the right path. They don't play well with namespaces.
Cryptographic capability tokens that you can delegate to other programs do.
> You don't have the ability to change what the program doing authentication sees.
Obviously not. It has to behave just like sudo in this regard! It's the only option!
Since you can't do that, your rebindings are just as localized as what Linux permits.
> Processes aren't allowed to become privileged without obtaining a cryptographic capability token via negotiation with the authentication agent. The auth agent which was started at boot will write the secret to the kernel, and give you the hash of it so you can prove you are the rightful recipient of that uid switch request. If you don't have the right secrets, you don't get a token.
This is pretty cool, and also something that could be put into a Linux userspace daemon to authenticate privileged operations. I mean this is all userspace details from a Linux perspective. Systemd could do this or some PAM module.
> Removing suid binaries with config files as a method for privilege escalation is the right path. They don't play well with namespaces.
Suid binaries aren't used that commonly for privilege escalation anyway. Much more ordinary is to use privilege inherited from init.
Again, going back to the example I was using: How does that help with securely allowing `$get_permissions debug-my-kernel`?
Sudo is a HOLE IN THE SIDE OF A BOAT. It is not a problem for Plan 9, because plan 9 does not have a hole, and is therefore not doing contortions to avoid filling with water. Designing a boat without a hole in its side is generally considered a good idea. Designing a security model without a suid in its side is a similarly good idea.
If you want `auth/as`, it's there. But it does not use suid, and therefore does not have the problems created by suid.
Ok, so let's make this more concrete. The scenario we're discussing is a user who wishes to (illegitimately) elevate their privileges, and has the ability to mount, and in particular bind/union mount:
> mkdir ~/my-etc
> echo 'badperson ALL=NOPASSWD:(ALL:ALL) ALL' > ~/my-etc/sudoers
> mount --bind ~/my-etc /etc # note: you could also insert a command to bump us into a new filesystem namespace before this if you want, or assume that say logind did it for us, it wouldn't change anything about what happens next
> sudo -s
# rm -rf /*
This happens because sudo is a setuid program, and it inherits the filesystem namespace of the shell that ran it. In that namespace, the real sudoers file has been masked with a fake one that says badperson can sudo to any user they want, and so it lets them. That this is the mechanism of privilege escalation that linux uses is fundamental to why mounting (and bind mounting) is a privileged operation. Plan9 has neither root, nor setuid programs, nor filesystem-stored capabilities, so it does not suffer from this, and you can manipulate your namespace to your heart's content.
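The same scenario as a toy simulation in Python, for anyone who wants to poke at it without a root shell. `View`, `sudoers_allows` and the one-line sudoers parsing are made up and heavily simplified. The only thing it models is that the decision depends on whose view of /etc/sudoers the check reads.

```python
class View:
    """Toy per-process view of the file tree: a shared base plus private binds."""

    def __init__(self, base):
        self.base = base
        self.binds = {}

    def bind(self, path, content):
        # Only this view changes; the shared base is untouched.
        self.binds[path] = content

    def read(self, path):
        return self.binds.get(path, self.base[path])


def sudoers_allows(view, user):
    """Simplified check: a user is allowed if any line starts with their name."""
    for line in view.read("/etc/sudoers").splitlines():
        fields = line.split()
        if fields and fields[0] == user:
            return True
    return False


system_files = {"/etc/sudoers": "root ALL=(ALL:ALL) ALL\n"}

trusted_view = View(system_files)
attacker_view = View(system_files)
attacker_view.bind("/etc/sudoers", "badperson ALL=NOPASSWD:(ALL:ALL) ALL\n")

# A setuid-style check that inherits the caller's namespace is fooled.
print(sudoers_allows(attacker_view, "badperson"))  # True
# The same check against a view the caller cannot rebind is not.
print(sudoers_allows(trusted_view, "badperson"))   # False
```

A file-based check is only trustworthy if the caller cannot rearrange the view it reads from, which is why the check and free namespace manipulation cannot both be handed to the same user.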
> Plan9 has no better way than that, no way to make /bin rebinding work in a way that makes sense for privilege escalation.
Please just read some things about plan9 already.