The cost of a system call [pdf](cs.cmu.edu) |
They observe the cache damage from traditional system calls and propose batching them in a queue, ideally with a different core servicing them. This is not the traditional Unix programming model, so they create a threading package that transparently makes your traditional synchronous Unix system calls work. They benchmark Apache with all of this new apparatus and it performs very well.
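To make the queueing idea concrete, here is a rough user-space sketch of the shape of it, not the paper's actual mechanism. `SyscallBatcher`, `submit`, `call` and `batch_size` are names I made up for illustration. Callers post requests to a queue, a single worker thread drains whatever is pending and services it in one go, and a blocking `call` wrapper keeps the familiar synchronous interface. A real implementation would live in the kernel and pin the servicing thread to its own core; in CPython the GIL also means this shows only the structure, not any speedup.

```python
import os
import queue
import threading
from concurrent.futures import Future


class SyscallBatcher:
    """Hypothetical sketch: queue requests and service them in batches on one worker."""

    def __init__(self, batch_size=8):
        self.batch_size = batch_size          # assumed upper bound on one batch
        self.requests = queue.Queue()
        self.worker = threading.Thread(target=self._serve, daemon=True)
        self.worker.start()

    def submit(self, func, *args):
        """Post a request without waiting; the caller can keep doing other work."""
        fut = Future()
        self.requests.put((fut, func, args))
        return fut

    def call(self, func, *args):
        """Synchronous facade: looks like an ordinary blocking call to the caller."""
        return self.submit(func, *args).result()

    def _serve(self):
        while True:
            # Block for the first request, then grab whatever else is already queued.
            batch = [self.requests.get()]
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            # Service the whole batch back to back on the worker thread.
            for fut, func, args in batch:
                try:
                    fut.set_result(func(*args))
                except Exception as exc:
                    fut.set_exception(exc)


batcher = SyscallBatcher()
read_fd, write_fd = os.pipe()

# Four writes are queued before any result is awaited, so they can be batched.
futures = [batcher.submit(os.write, write_fd, b"x") for _ in range(4)]
written = sum(f.result() for f in futures)

# The synchronous wrapper hides the queue entirely.
data = batcher.call(os.read, read_fd, 16)
```

The `call` wrapper is the part that corresponds to the threading package: existing code keeps its blocking style while the requests actually travel through the queue.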
It's painful to realize that, after a context switch, modern CPUs can need 11,000 cycles to get back to full speed, with the right stuff in the caches and pipelines. Maybe we need CPUs which handle context switches better.
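For a sense of scale, the cycle count can be turned into wall-clock time. The 3 GHz clock below is my assumption, not a number from the paper, and `cycles_to_microseconds` is just a throwaway helper.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


# 11,000 cycles is the figure quoted above; 3 GHz is an assumed clock rate.
recovery_us = cycles_to_microseconds(11_000, 3e9)
print(f"{recovery_us:.2f} us to get back to full speed")
```

At that assumed clock rate it comes to a few microseconds per switch, which adds up quickly on a server doing many thousands of switches per second.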
It's the other ugly state changes that hurt a lot. Switching address spaces burns a few hundred cycles and zaps the TLB (fix coming in Linux 3.7, maybe). Interrupts are a few thousand cycles.
Some papers I read in no particular order:
Synthesis OS (http://valerieaurora.org/synthesis/SynthesisOS/) might be interesting for you. They do lots of runtime code synthesis.
Exokernels (follow links from https://en.wikipedia.org/wiki/Exokernel#Bibliography). And more recently Mirage (https://mirage.io/) and HaLVM (https://github.com/GaloisInc/HaLVM).
(I assume you already know how to program. Otherwise, brush up on that as step 0. C is still the canonical choice for OS work. But if you are feeling adventurous there's more choice.)
I'd be really concerned about trust issues, but I'm sure it could be done safely. Lots of room for corner cases, especially w/NUMA.
It looks like this is not what the article's implementation does, but I think it would be possible.
> The system call interface is very similar to POSIX's system calls