Firecracker – Lightweight Virtualization for Serverless Computing

Firecracker – Lightweight Virtualization for Serverless Computing (aws.amazon.com)

368 points by leef 7 years ago | 110 comments

sudhirj 7 years ago |

What this allows, and I'm hoping a full-fledged service will be announced on Thursday or Friday, is running containers as Lambdas, i.e. if your application starts fast enough, you can just set a container to start and run as a request comes in. It can also shut down when it's done running.

This allows things like per second billing for container runs, serverless containers (there's no container running 24/7, only when there's traffic), etc.

Twirrim 7 years ago | |

Why run a container? What value does that abstraction provide here?

To my mind this completely negates any value proposition of the container. The only thing missing, at face value, is something as straightforward as the Dockerfile for building base images. I imagine that shouldn't be hard using things like guestfish etc. from libguestfs-tools.

sudhirj 7 years ago | | |

Want to run Ruby on Rails on Lambda, with no changes from the servers I run on my laptop. Or maybe I want to run Crystal. Or maybe I’m writing my own language. Doesn’t really matter.

Lambda works great as a deployment and execution model. This allows anything to run on Lambda, not just specially prepared runtimes.

nine_k 7 years ago | | |

Your container = your environment.

When your container is not running (say, 99% of time), other customers' containers are running. No need to ever boot the kernel, etc.

One might say that a unikernel has advantages over it. But it also has a higher barrier to entry.

woogley 7 years ago | |

I'm expecting/hoping for this as well. GCP already has something like this in alpha: https://services.google.com/fb/forms/serverlesscontainers/ More info on it in the "Serverless containers" section here: https://cloud.google.com/blog/products/gcp/cloud-functions-s...

nazka 7 years ago | |

On its website, in the FAQ, they say that it can't run Docker and others yet. I hope this is coming soon too!

discodave 7 years ago | |

> containers as Lambdas.

How similar is AWS Fargate to what you're describing?

sudhirj 7 years ago | | |

I need to run 1+ Fargate containers 24/7, which is useless and wasteful.

With Fargate-Lambda crossover I wouldn't be running anything 24/7, and it would be a lot less resource intensive than one Lambda-Container per request as well.

Google's App Engine gets / got this right when they first launched, but to make it work they had to demand apps be written for their sandbox (like AWS Lambda), because of which the model isn't as general purpose. Firecracker would allow regular containers to be used this way, making a Firecracker service the first service to allow general purpose servers to be started and stopped (all the way to zero) based on incoming traffic.

tybit 7 years ago | | |

Another missing piece, in addition to the billing aspect mentioned elsewhere, is all the existing event-based integrations Lambda provides, e.g. reacting to Kinesis, SQS, SNS, etc. events. Having AWS manage the event plumbing in addition to start/pause/stop is really nice.

xmly 7 years ago | | |

Pricing model is not serverless. The basic serverless principle is no-use-no-pay.

blasdel 7 years ago |

There's a GitHub Pages FAQ describing why it was made and how it fits with other solutions: https://firecracker-microvm.github.io/

and a high-level design document about how it works https://github.com/firecracker-microvm/firecracker/blob/mast...

espeed 7 years ago | |

Interesting name choice. When I clicked on the link and saw the name and design, my first thought was, "Is this a Firebase knockoff...?" [1] ... and then I scrolled to the bottom to see the copyright and saw this project is by Amazon Web Services.

[1] https://firebase.google.com

krat0sprakhar 7 years ago |

> Firecracker was built in a minimalist fashion. We started with crosvm and set up a minimal device model in order to reduce overhead and to enable secure multi-tenancy. Firecracker is written in Rust, a modern programming language that guarantees thread safety and prevents many types of buffer overrun errors that can lead to security vulnerabilities.

This is awesome! Really excited to try this out!

talawahtech 7 years ago |

This is huge! It basically removes the VM as the security boundary for something like Fargate [1]. This should lead to a significant reduction in pricing, since Fargate will no longer need to overprovision in the background because VMs were being used even for tiny Fargate launch types.

It should hopefully eliminate the cost disparity between using Fargate vs. running your own instances. It should also mean much faster scale-out, since your containers don't need to wait on an entire VM to boot!

Will be interesting to see what kind of collaboration they get on the project. This is a big test of AWS stewardship of an open source project. It seems to be competing directly with Kata Containers [2] so it will be interesting to see which solution is deemed technically superior.

[1] https://aws.amazon.com/fargate/ [2] https://katacontainers.io/

Aissen 7 years ago | |

Indeed, this seems very similar to kata+runv+kvmtool(lkvm). I'm curious why they don't provide a comparison. Here's what I gathered:

- it seems to boot faster (how ?)

- it does not provide a pluggable container runtime (yet)

- a single tool/binary does both the VMM and the API server, in a single language.

Can anyone else chime in ?

coder543 7 years ago | | |

> I'm curious why they don't provide a comparison

They do, if you read the FAQs: https://firecracker-microvm.github.io/#faq

justincormack 7 years ago | | |

From memory, the original version of Intel Clear Containers had its own KVM-based VMM, but they moved back to QEMU (or a more minimal patched version they maintain). They are working on containerd support, so it should be similar to Kata soon.

tlrobinson 7 years ago | |

It sounds like it’s already being used in Lambda and Fargate, though I’m not sure how long that’s been the case:

> Firecracker has been battled-tested and is already powering multiple high-volume AWS services including AWS Lambda and AWS Fargate

kraemate 7 years ago |

Clear containers (now called kata containers) did this more than three years ago, with similar performance numbers (sub 200 ms boot times). It is frustrating, but not surprising, to see the same regurgitated solution receive this much excitement. The firecracker documentation also does not mention the similarity with prior work, oh well.

[Not affiliated with Intel in any way---just a long-time proponent of the clear containers approach.]

xaduha 7 years ago |

> microVMs, which provide enhanced security and workload isolation over traditional VMs, while enabling the speed and resource efficiency of containers.

Reminds me of rkt + kvm stage 1 https://github.com/rkt/rkt/blob/master/Documentation/running...

Too bad it didn't take off.

tlrobinson 7 years ago |

This looks great, I’m just wondering what Amazon’s motivation for open sourcing it is. It seems like some pretty critical secret sauce for making services like Lambda and Fargate both secure and efficient.

xrd 7 years ago |

My big question is: is this something only exciting for people doing lambda at massive scale?

QEMU is exciting technology and has paved the way for all kinds of interesting layers. So, creating a slimmed-down improvement that really makes it faster and provides a new Lambda-ish execution context is great.

I'm sure Amazon cares about that. I'm sure people doing millions of lambda calls a day care about that.

But, if I'm an entrepreneur thinking about building something entirely new, is there something I'm missing about this that would make me want to consider it?

Lambda and Firebase Functions are exciting partially because they break services into easy to deploy chunks. And, perhaps more importantly, easy things to reason about.

But that's not the big deal: the integration with storage, events, and everything else in AWS (or Firebase) is what really makes it shine. It's all about the integration.

When I read this documentation, I'm left wondering whether I want to write something that uses the REST API to manage thousands of micro vms. That seems like extra work that Amazon should do, not me.
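For a sense of what driving that API by hand involves: per the project's getting-started guide, a microVM is configured with a few PUT requests to an API served on a Unix domain socket and then started with an `InstanceStart` action. A minimal sketch, assuming the endpoint and field names from that guide (check them against the current API spec); the kernel and rootfs paths, the sizes, and the `UnixHTTPConnection`/`send_plan` helpers are hypothetical placeholders, not part of Firecracker:

```python
import http.client
import json
import socket


def boot_plan(kernel, rootfs, vcpus=1, mem_mib=128):
    """Return the (method, path, body) requests that configure and start one microVM."""
    return [
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,  # placeholder: an uncompressed guest kernel
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,       # placeholder: an ext4 root filesystem image
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """Hypothetical helper: plain HTTP/1.1 over a Unix domain socket."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        # Replace the TCP connect with a connect to the API socket file.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def send_plan(socket_path, plan):
    """Send each request in order; stops at the first non-2xx response."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()  # drain the (usually empty) response body
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
```

Calling `send_plan` requires a Firecracker process started with `--api-sock` pointing at the same path. Doing this for thousands of microVMs, one socket and one request sequence each, is the orchestration work the comment expects a managed service to take on.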

Am I missing something important here? Surely Amazon will integrate this solution somewhat soon and connect it to all the fun pieces of AWS, but the fact that they didn't consider or mention it makes me think it is something I should not consider now.

Tehnix 7 years ago |

I really hope this helps with the cold start times on Lambda. We are currently looking heavily into moving our API from Lambda to EKS, but if this improves cold start times, I think we will see what it ends up looking like in practice.

sudhirj 7 years ago | |

Most cold start time problems on Lambda I've seen are VPC-related: public network Lambdas start in milliseconds, with the main lag being the userspace code startup time.

Starting a Lambda inside a VPC involves attaching a high-security network adapter individually to each running process, which is likely what takes so long. I assume AWS is working on that, though; they've claimed some speedups unofficially.

If your security model allows, try running your Lambdas off-VPC.

Tehnix 7 years ago | | |

The VPC startup times are insane, so we quickly moved our Lambdas out of that, accepting the trade-off.

Our normal cold starts are in the 1-2 second range, and the app initialization comes after. Too high for an API facing users :/

mneves 7 years ago | |

One solution is invoking a scheduled Lambda (with a test payload) at regular intervals to keep the function warm.
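A minimal sketch of that keep-warm trick, assuming a Python handler and a scheduled rule that invokes the function with a payload like `{"warmup": true}`. The `warmup` key is an arbitrary convention chosen for this example, not an AWS feature; the handler returns immediately for those pings so they stay cheap:

```python
import json
import time

# Module-level code runs once per execution environment, so this timestamp
# only changes on a cold start; warm invocations reuse it.
COLD_START_AT = time.time()


def handler(event, context):
    # Hypothetical convention: scheduled pings carry {"warmup": true}.
    if isinstance(event, dict) and event.get("warmup"):
        return {"warmed": True, "container_started_at": COLD_START_AT}

    # Normal request path: placeholder for the real application logic.
    body = {"message": "hello"}
    return {"statusCode": 200, "body": json.dumps(body)}
```

One periodic ping keeps roughly one execution environment warm, so bursts of concurrent traffic beyond that can still hit cold starts.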

colemickens 7 years ago |

The crosvm and Rust have me intrigued. I was hoping for something like this since I saw the first hints of Rust showing up in ChromeOS in crosvm.

A compare/contrast with Kata Containers would also be interesting. Their architectures look similar. (Kata Containers [1] being another solution for running containers in KVM-isolated VMs, that has working integrations with Kubernetes and containerd already. Not affiliated, but I'm tinkering with it in a current project, though I'm also now keen to get `firecracker` working as well.)

Obviously, if nothing else, qemu vs crosvm is a big difference, and probably significant since my understanding is that Google chose to also eschew using qemu for Google Cloud.

[1]: https://katacontainers.io/

aliguori 7 years ago | |

Kata Containers is a lot of infrastructure for running containers and it uses QEMU to run the actual VMs. Firecracker just replaces the QEMU part and we're eager to work with folks like the Kata community.

I love QEMU, it's an amazing project, but it does a ton and it's very oriented towards running disk images and full operating systems. We wanted to explore something really focused on serverless. So far, I'm really happy with the results and I hope others find it interesting too.

zaxcellent 7 years ago | | |

We felt the same way about QEMU before we started crosvm. Glad to see you all found some use out of it.

wirelessben 7 years ago | |

The devops training site katacoda.com will be interesting to watch. They spin up and tear down _so_ many VMs, their cloud bill must be monstrous. Firecracker is much leaner, so they would save a lot of cycles by spinning up Firecracker over Kata.

colemickens 7 years ago | | |

Katacoda has nothing to do with Kata Containers...

I'm not sure how you can draw any of these conclusions anyway, unless you know a lot of seemingly private details about how Katacoda is implemented.

tatoalo 7 years ago |

It’s QEMU without all the legacy stuff, they also open sourced it, interesting.

jetzzz 7 years ago | |

QEMU can do much more than this.

sitkack 7 years ago | | |

Which is exactly the problem.

tatoalo 7 years ago | | |

That’s why the attack surface is way larger and harder to keep an eye on.

mcrute 7 years ago |

More discussion here: https://news.ycombinator.com/item?id=18539532

sudhirj 7 years ago |

@zackbloom, @kentonv hint hint. Isn't this roughly the same memory footprint as a Worker? CONTAINERS ON ALL THE CLOUDFLARE THINGS!

zackbloom 7 years ago | |

Heh. Truthfully, what I'm most excited about right now is being able to start a worker in less time than it takes to make an internet request. When you can do that, you get magical autoscaling and it becomes just as cheap to run it in hundreds of places as in one. As long as you have to invest ~100ms of CPU to get one of these VMs running, I'm not sure it will have quite the same economics.
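To make the economics argument concrete: if a VM costs about B ms of CPU to boot and then serves n requests of r ms each, the boot's share of total CPU is B / (B + n·r). The sketch uses the ~100 ms figure from the comment; the 5 ms of work per request is an illustrative assumption, not a measurement:

```python
def boot_overhead_fraction(boot_ms, requests, per_request_ms):
    """Fraction of total CPU time spent booting, amortized over `requests` requests."""
    total = boot_ms + requests * per_request_ms
    return boot_ms / total


# ~100 ms boot (from the comment) vs. an assumed 5 ms of work per request.
for n in (1, 10, 100, 1000):
    print(n, round(boot_overhead_fraction(100, n, 5), 3))
```

Under these assumptions a VM booted for a single request spends about 95% of its CPU on booting, and the share only drops under 2% after 1,000 requests. A location that sees occasional traffic never reaches that point, which is the gap a near-zero startup cost closes.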

sudhirj 7 years ago | | |

Yeah, jokes aside I simply don’t think it makes sense to run full processes on the edge. Not yet, anyway.

Script isolates make a lot of sense with current hardware limitations, but full processes at the edge are coming sooner or later.

ec109685 7 years ago | |

You still have a full Linux kernel running inside the VM with Firecracker, though, versus essentially a fiber with Cloudflare.

sudhirj 7 years ago | |

If you can implement it by tomorrow afternoon before the Andy Jassy keynote you might be able to steal some thunder.

whalesalad 7 years ago |

I’m very excited to play with this technology in the same way I love playing with Elixir/Erlang and userland concurrency models. I also love the idea of docker (and use it daily) but dislike the ergonomics. My first thought is, particularly with the emphasis on oversubscription, how does the kernel of the host schedule work?

mark212 7 years ago |

Still seems much slower than the model used by Cloudflare for what they call "workers."[1] A blog post from a few weeks back was the subject of considerable discussion here[2], and it seems to me to be doing much the same thing as Firecracker, but faster because there's less overhead. But maybe I'm missing something.

[1] https://blog.cloudflare.com/cloud-computing-without-containe...

[2] https://news.ycombinator.com/item?id=18415708

tlrobinson 7 years ago | |

> But maybe I'm missing something.

From the "Disadvantages" section of your first link:

"No technology is magical, every transition comes with disadvantages. An Isolate-based system can’t run arbitrary compiled code. Process-level isolation allows your Lambda to spin up any binary it might need. In an Isolate universe you have to either write your code in Javascript (we use a lot of TypeScript), or a language which targets WebAssembly like Go or Rust."

"If you can’t recompile your processes, you can’t run them in an Isolate. This might mean Isolate-based Serverless is only for newer, more modern, applications in the immediate future. It also might mean legacy applications get only their most latency-sensitive components moved into an Isolate initially. The community may also find new and better ways to transpile existing applications into WebAssembly, rendering the issue moot."

tuananh 7 years ago | |

The way I see it, Firecracker is more flexible but Cloudflare Workers' isolates are faster. Amazon can't afford the limitations of isolates, hence this project.

solatic 7 years ago |

"Process Jail – The Firecracker process is jailed using cgroups and seccomp BPF, and has access to a small, tightly controlled list of system calls."

So basically, a gVisor alternative?

perbu 7 years ago | |

Firecracker contains a machine emulator. This emulator will jail itself before launching the OS to reduce the attack surface the emulator has towards the host.

ec109685 7 years ago | |

gVisor doesn't use KVM:

"Machine-level virtualization, such as KVM and Xen, exposes virtualized hardware to a guest kernel via a Virtual Machine Monitor (VMM). This virtualized hardware is generally enlightened (paravirtualized) and additional mechanisms can be used to improve the visibility between the guest and host (e.g. balloon drivers, paravirtualized spinlocks). Running containers in distinct virtual machines can provide great isolation, compatibility and performance (though nested virtualization may bring challenges in this area), but for containers it often requires additional proxies and agents, and may require a larger resource footprint and slower start-up times."

solatic 7 years ago | | |

Yeah but one of the main ways in which gVisor provides security is by intercepting system calls and strictly limiting which calls can be made. Firecracker may use KVM instead of running entirely in usermode, but as far as most of us are concerned, that's an implementation detail. The pertinent question is whether the price of security is limiting the possible system calls, which means that Firecracker won't be able to run arbitrary containers, just as gVisor doesn't guarantee that it can run arbitrary code (which may require filtered system calls).

sdart 7 years ago |

Does this provide any multi host cluster management capabilities?

polskibus 7 years ago |

Does it support Windows?

steveklabnik 7 years ago | |

https://firecracker-microvm.github.io/ says

> What operating systems are supported by Firecracker?

> Firecracker supports Linux host and guest operating systems with kernel versions 4.14 and above. The long-term support plan is still under discussion. A leading option is to support Firecracker for the last two Linux stable branch releases.

chupasaurus 7 years ago | |

KVM-based, so no it doesn't.

perbu 7 years ago | | |

KVM supports Windows just fine, which is why you can run Windows on GCP and OpenStack. And Firecracker seems to emulate enough of a machine to boot Windows, as long as the Windows instance has drivers for virtio block devices and a virtio NIC.

However, it seems they boot in a slightly unconventional way: they take an ELF64 binary and execute it. This works for Linux and likely some other operating systems that can produce ELF64 binaries. Windows supports legacy x86 boot and UEFI, but likely not ELF64 "direct boot".

So if you can get Windows into an ELF64 binary and have it run without a GPU, you could have it boot. So, likely not. But the reason isn't KVM.
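The "direct boot" format can be checked locally: an ELF file starts with the magic bytes `0x7f 'E' 'L' 'F'`, and byte 4 of the header (`EI_CLASS`) is 2 for 64-bit objects. A minimal sketch that inspects a kernel image before handing it to Firecracker's `kernel_image_path`; it only tests the file format, not whether the image will actually boot, and `file_is_elf64` is a hypothetical helper name:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4    # index of the class byte in e_ident
ELFCLASS64 = 2  # e_ident[EI_CLASS] value for 64-bit objects


def is_elf64(header: bytes) -> bool:
    """True if the given leading bytes look like a 64-bit ELF header."""
    return (
        len(header) > EI_CLASS
        and header[:4] == ELF_MAGIC
        and header[EI_CLASS] == ELFCLASS64
    )


def file_is_elf64(path: str) -> bool:
    # e_ident is the first 16 bytes of every ELF file.
    with open(path, "rb") as f:
        return is_elf64(f.read(16))
```

Windows boot binaries are typically PE files, which begin with `MZ`, so they fail this check as-is.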

testbotlo2 7 years ago |

Can someone explain to me how this works? Is it an orchestration service for containers like Kubernetes, or is it something different?

nunez 7 years ago |

I am extremely excited by this. I wonder if this can be used to provision JIT Kubernetes workers.

polskibus 7 years ago |

How does this compare to containers?

perbu 7 years ago | |

Containers share the host OS kernel and some services. This is a virtual machine monitor, so it deals with virtual machines. A Linux host can only run Linux containers.

Firecracker can likely run other operating systems, such as IncludeOS. You can't run those in containers.