Abusing Privileged and Unprivileged Linux Containers(nccgroup.trust) |
To do this efficiently they've had to make a bunch of changes on the VM side so the overhead is much smaller than an ordinary VM (of the order of 150ms and 20MB of RAM). I've also been looking at this and am hoping to give a talk about it at the KVM Forum in August (http://events.linuxfoundation.org/events/kvm-forum).
One note: you can run multiple "container processes", like redis and a redis dashboard, inside of these Clear Container VMs. This means in the case of Kubernetes we will only incur the cost of the startup time and Kernel/init overhead once per pod instead of once per process.
If you meant "running a process in a VM meets what requirement?", the requirement is security, which as the paper here proves is not available with simple containers running as processes on the host.
Instead, this is left up to other tools like LXC. Also note that higher-level features such as network support are left up to the higher-level tool as well.
Docker and LXC have core differences in vision of what a container should be [2]. Also, Docker used to be based on LXC, but has since written its own library, libcontainer, which handles the interaction with the kernel primitives.
To me, Docker's philosophy and libcontainer implementation is...as you say, fugly, but LXC's approach and implementation is not.
I also don't think of the kernel exposing primitives and letting user space tools bind them together as inherently bad. I actually prefer it this way and think it leaves the kernel cleaner/leaner/better off.
[1] http://www.slideshare.net/jpetazzo/anatomy-of-a-container-na...
You can just skim this paper to see the problems: non-namespaced identifiers leak in procfs, UID "slides" expose containers to each other's resource limits, there are non-namespaced, non-containerized kernel functions exposed to root inside of containers, and so on.
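To make the UID "slide" point concrete, here is a minimal Python sketch. It assumes the "slide" refers to the user namespace mapping that Linux exposes in /proc/<pid>/uid_map (three columns: first UID inside the namespace, first UID on the host, range length). The class and function names and the sample ranges are made up for illustration and are not taken from the paper. The idea is that two containers whose maps land on the same host UIDs can end up sharing anything the kernel accounts per host UID.

```python
from typing import List, NamedTuple, Optional


class UidRange(NamedTuple):
    inside_start: int   # first UID as seen inside the container
    outside_start: int  # corresponding first UID on the host
    length: int         # number of consecutive UIDs in the mapping


def parse_uid_map(text: str) -> List[UidRange]:
    """Parse the three-column text format used by /proc/<pid>/uid_map."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(UidRange(*(int(f) for f in fields)))
    return ranges


def to_host_uid(ranges: List[UidRange], container_uid: int) -> Optional[int]:
    """Translate a UID inside the container to the UID the host sees."""
    for r in ranges:
        if r.inside_start <= container_uid < r.inside_start + r.length:
            return r.outside_start + (container_uid - r.inside_start)
    return None  # unmapped inside this namespace


def host_ranges_overlap(a: List[UidRange], b: List[UidRange]) -> bool:
    """True if two containers map onto at least one common host UID."""
    for ra in a:
        for rb in b:
            if (ra.outside_start < rb.outside_start + rb.length
                    and rb.outside_start < ra.outside_start + ra.length):
                return True
    return False


# Hypothetical sample maps: A and B use the same slide, C uses its own.
container_a = parse_uid_map("0 100000 65536\n")
container_b = parse_uid_map("0 100000 65536\n")
container_c = parse_uid_map("0 165536 65536\n")
```

With these sample maps, UID 1000 in both A and B is host UID 101000, so the two containers overlap on the host, while C does not overlap with either. Giving each container a disjoint range is one way to avoid that kind of sharing.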
You can have the best of both worlds: a secure container substrate, designed from the ground up as a coherent whole like Jails; and the vast packaging ecosystem provided by Ubuntu.
Glad to see you've added a lot of detail to your research. It's very necessary!
edit: (original comment this was in reply to said 'that's an unfortunate username')
So I don't really see how this is considered a big vulnerability, unless the goal is security by obscurity, but then we could go even further and obfuscate the whole system.
>NET_RAW abuse
Hard to blame LXC/Docker for something that has to do with the configuration of the bridge, plus for some setups this is desired functionality.
>DoS
Some of these are interesting, but I don't see how filling up the disk space is a problem with containers and not operating systems in general. I feel like a lot of these DoS attacks are just basic OS limitations, but I don't know enough to make an informed statement.
Regarding NET_RAW, this is a case where you want reasonable defaults. Needing raw sockets is an exceptional condition for most container setups, and again, gives a greater threat exposure. Even ignoring the potential for things like ARP spoofing, filling up a MAC table on a lot of switches makes them fail over into being essentially rackmount hubs, which can allow for even greater amounts of service denial and information leakage.
Filling up disk space is problematic with Linux-based containers because, to keep a runaway or malicious process from using up all disk space, you have to do things like set up fixed-size loopback filesystems ahead of time. These impose performance and space constraints that make your containers less flexible than, for example, Solaris zones. Under ZFS, you can directly configure a container to use only x amount of space, without needing to set up loopback devices or other complexities. This lets you set limits, and if a dataset later needs more space, you just run a single command to give it more.
Yes, a lot of these issues can be easily mitigated; however, they're all symptoms of poor defaults. A good container system should help manage and mitigate these sorts of issues, so they only need to be thought of once, instead of by everyone implementing them.
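As a rough illustration of that contrast, the Python sketch below only builds the command lines each approach would need; it does not run them. The helper names, paths, dataset name, and sizes are hypothetical, and the exact flags are an assumption about a typical ext4-on-loopback setup rather than what LXC or Docker do by default.

```python
from typing import List


def loopback_quota_commands(image: str, mountpoint: str, size: str) -> List[List[str]]:
    """Commands for a fixed-size loopback filesystem (size chosen up front)."""
    return [
        ["truncate", "-s", size, image],             # allocate a sparse image file
        ["mkfs.ext4", "-F", image],                  # -F: allow formatting a regular file
        ["mount", "-o", "loop", image, mountpoint],  # attach it via a loop device
    ]


def zfs_quota_command(dataset: str, size: str) -> List[str]:
    """Single command to set, or later raise, a quota on a ZFS dataset."""
    return ["zfs", "set", f"quota={size}", dataset]


# Hypothetical names, for illustration only.
loopback_steps = loopback_quota_commands(
    "/var/lib/containers/web.img", "/var/lib/containers/web", "10G"
)
zfs_step = zfs_quota_command("tank/containers/web", "10G")
```

Raising the limit in the ZFS case means calling `zfs_quota_command` again with a larger size, whereas the loopback image has to be grown and its filesystem resized.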
Hope there was previous disclosure.
I could be wrong...but that path dependency seems to indicate that while they were implemented as more general kernel features...one of their motivating use cases was container isolation.
Can anyone more informed clarify the history for me?
To justify the question a bit: booting traditionally meant physically turning a system on. The boot time included BIOS initialization, a concept now blurred by the advent of virtualization.
150ms is such an absurdly short amount of time that I'm left wondering what booting is in this context.
Clear Linux was announced about a year ago, and it does boot absurdly quickly
https://www.kernel.org/doc/Documentation/filesystems/dax.txt
I imagine that most of the time would be in mocking some/all of the hardware interfaces to present to the VM, and running your init processes (and all that entails for whatever OS you're running).
Taking a 50x hit to run "exit" from a container doesn't sound bad, but it doesn't sound all that far fetched either.
[1] time util from pstools, as installed by scoop.sh - similar to why gcc (not eg msvc - it's all in my path atm, no work needed :)
I can't give up Debian's package system, though, so I'm left hoping that kFreeBSD will amount to something someday and I use Xen or KVM in the meantime... :-(
Why not? What would you miss from it?
What really does my head in is that a default Debian install can pull down 2 megabytes a second from a server over SFTP, and a default FreeBSD 10 server can only do ~800 kilobytes per second (FreeBSD 9 was worse).
"We can prove the existence of FreeBSD jails being actively used in the PS4's kernel through the auditon system call being impossible to execute within a jailed environment"
This quote is from: https://cturt.github.io/ps4.html
Shouldn't be that much of a difference. You might try OpenSSH from ports, maybe the HPN patches will help if you're on a high latency connection.