A fork() in the road

257 points by ralish 7 years ago | 178 comments

sfink 7 years ago |

I read the paper, and they make a lot of good points about fork's warts.

But I really wanted some explanation of why Windows process startup seems to be so heavyweight. Why does anything that spawns lots of little independent processes take so bloody long on Windows?

I'm not saying "lots of processes on Windows is slow, lots of processes on Linux is fast, Windows uses CreateProcess, Linux uses fork, CreateProcess is an alternative to fork/exec, therefore fork/exec is better than any alternative." I can imagine all kinds of reasons for the observed behavior, few of which would prove that fork is a good model. But I still want to know what's going on.

ralish 7 years ago | |

I'm a bit rusty on this but from memory the overhead is by and large specific to the Win32 environment. Creating a "raw" process is cheap and fast (as you'd reasonably expect), but there's a lot of additional initialisation that needs to occur for a "fully-fledged" Win32 process before it can start executing.

Beyond the raw Process and Thread kernel objects, which are represented by EPROCESS + KPROCESS and ETHREAD + KTHREAD structures in kernel address space, a Win32 process also needs to have:

- A PEB (Process Environment Block) structure in its user address space

- An associated CSR_PROCESS structure maintained by Csrss (Win32 subsystem user-mode)

- An associated W32PROCESS structure for Win32k (Win32 subsystem kernel-mode)

I'm pretty sure these days the W32PROCESS structure only gets created on-demand with the first creation of a GDI or USER object, so presumably CLI apps don't have to pay that price. But either way, those latter three structures are non-trivial. They are complicated structures and I assume involve a context switch (or several) at least for the Csrss component. At least some steps in the process also involve manipulating global data structures which block other process creation/destruction (Csrss steps only?).

I expect all this Win32-specific stuff largely doesn't apply to e.g. the Linux subsystem, and so creating processes should be much faster. The key takeaway is it's all the Win32 stuff that contributes the bulk of the overhead, not the fundamental process or thread primitives themselves.

EDIT: If you want to learn more, Mark Russinovich's Windows Internals has a whole chapter on process creation which I'm sure explains all this.

intea 7 years ago | | |

The WSL processes are called pico processes.

https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process...

Gibbon1 7 years ago | | |

> created on-demand with the first creation of a GDI or USER object, so presumably CLI apps don't have to pay that price

This tickles my brain. I read some blog post bitching that because Windows DLLs are kinda heavyweight, it's way too easy to end up paying that price without realizing it.

speedplane 7 years ago | |

I used to work on a cross-platform project, and spent several weeks trying to figure out why our application ran significantly faster on Linux than on Windows. One major culprit was process creation (another was file creation). I never really uncovered the true reason, but I suspect it had to do with the large number of DLLs that Windows would automatically link if you weren't very careful. Linux, of course, can also load shared code objects, but in my experience, they are smaller and lighter weight.

fanf2 7 years ago | | |

Anti-virus software makes process and file operations a lot slower.

zenexer 7 years ago | |

This probably isn't the technical explanation you're looking for, but, in general, processes on Windows and processes on Unix aren't the same--or, at least, they're not meant to be used the same way. Creating lots of small processes on Windows has long been discouraged and considered poor design, whereas the opposite is true on Unix.

One could probably argue that processes on Windows need to be lighter-weight now that sandboxing is a common security practice. These days, programs like web browsers opt to create a large number of processes both for security and stability purposes. In much the same way that POSIX should deprecate the fork model, Windows should provide lighter-weight processes.

wvenable 7 years ago | | |

Windows now has minimal processes that have almost no setup and pico processes (based on minimal processes) that are the foundation for Linux processes in WSL.

SifJar 7 years ago | |

Others have mentioned DLLs being pulled in; the following post might be interesting:

https://randomascii.wordpress.com/2018/12/03/a-not-called-fu...

chris_wot 7 years ago | | |

It's not process creation that is tricky, it's process termination!

To see how Libreoffice does it, see https://opengrok.libreoffice.org/xref/core/sal/osl/w32/proce...

naasking 7 years ago | |

Microsoft Research doesn't just do research on Windows. They employ lots of researchers that are free to pursue many different topics.

kazinator 7 years ago | |

CreateProcess requires an application to initialize from scratch. When you fork, you cheaply inherit the initialized state of the whole application image. Only a few pages that are mutated have to be subject to copy-on-write. Even that copy-on-write is cheaper than calculating the contents of those pages from scratch.
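A minimal Python sketch of the point above, assuming a POSIX system where `os.fork()` is available. The names `expensive_init` and `lookup_in_forked_child` are hypothetical, and the table is only a stand-in for real application start-up state. The child answers a lookup from the inherited table without rebuilding it.

```python
import os


def expensive_init():
    # Stand-in for costly application start-up (hypothetical workload).
    return {n: n * n for n in range(200_000)}


def lookup_in_forked_child(table, key):
    """Fork, then let the child answer a lookup from the inherited table."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the table is already present; nothing is re-initialised.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(table[key]).encode())
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    # Parent: collect the child's answer and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    table = expensive_init()
    print(lookup_in_forked_child(table, 12))
```

One caveat for CPython in particular: reference counting writes to object headers even on reads, so fewer pages stay shared than the copy-on-write model would suggest.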

JdeBP 7 years ago | | |

There has been a lot of discussion in recent years about how cheap that "cheaply" really is.

* https://news.ycombinator.com/item?id=9653238

* https://news.ycombinator.com/item?id=18071278

* https://news.ycombinator.com/item?id=19622503

muststopmyths 7 years ago | |

If I had to guess, I'd point to DLLs. The minimal Windows process loads probably half a dozen, plus the entry points are called in a serialized manner.

richardwhiuk 7 years ago | | |

Pretty much identical to shared objects on Linux

rgovostes 7 years ago |

On macOS, fork() is a bit weird: https://opensource.apple.com/source/Libc/Libc-997.90.3/sys/f...

Many frameworks are backed by XPC services, where the parent process has a socket-like connection to a backend server. After forking, the child would have no valid connection to the server. The fork() function establishes a new connection in the child for libSystem, to allow Unix programs to port easily to macOS, but other services' connections are not re-established. This makes fork on macOS (i) slow, and (ii) unsafe for code that touches virtually any of Apple's APIs.

cryptonector 7 years ago | |

fork() is generally unsafe for that reason, and OS X is only special in this regard in that it has more of these hidden C library handles that can blow up on the child-side of fork(). vfork()+exec()-or-_exit() is much safer.

ksherlock 7 years ago | |

BeOS GUI applications also had problems with fork()

dis-sys 7 years ago |

I agree with most parts of the paper.

Fork() is now basically the root of a looong list of special cases in so many aspects of programming. Things get even worse when you use a language with a built-in runtime such as Golang, for which multi-threaded programming is the default behaviour. If fork() can't even handle multiple threads, what is the real point of having it when a 8 core 16 threads AMD processor costs about $150?

sqrt17 7 years ago | |

> If fork() can't even handle multiple threads, what is the real point of having it when a 8 core 16 threads AMD processor ...

These threads and those threads are not the same. The 16-threads SMT processor will happily chew on 16 different programs, processes or whatever the load at the moment is, e.g. if you use Python's multiprocessing you can create 16 processes and they'll be executed in parallel.

fork() can handle multiple threads but you have to be attentive when cleaning up etc. - quite often, code using fork() will get confused when you spawn threads, and code using threads will get confused when you fork()
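One reason threaded code gets confused is that only the thread that calls fork() exists in the child. A small sketch of that, assuming a POSIX system and CPython; `count_threads_before_and_after_fork` is a hypothetical name. Python 3.12 and later warn about forking a multi-threaded process, so the sketch silences that warning locally.

```python
import os
import threading
import warnings


def count_threads_before_and_after_fork():
    """Return (threads in the parent, threads seen by the forked child)."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait, daemon=True)
    helper.start()
    before = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning for fork() with threads.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: the helper thread was not copied, only the forking thread.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        in_child = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    helper.join()
    return before, in_child


if __name__ == "__main__":
    print(count_threads_before_and_after_fork())
```

Any lock the missing thread held at the moment of the fork stays locked in the child, which is where the clean-up trouble tends to start.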

swiftcoder 7 years ago |

Fork has really weird semantics, and a lot of fun gotchas around managing resources. Good riddance?

vbernat 7 years ago |

fork() is also used to daemonize and for privilege separation, two tasks where posix_spawn() cannot be used. I suppose daemonization can be seen as something of the past, but privilege separation is not. On Linux, privileges are attached to a thread, so it should be possible to spawn a new thread instead of a new process. However, a privileged thread sharing the same address space as an unprivileged one doesn't seem a good idea.

The paper also mentions the use case of multiprocess servers, which rely heavily on fork(), but dismisses it because such servers could be implemented with threads. With threads, though, a crash in one worker would crash the whole application, whereas a crashed worker process could simply be restarted.

A proper use case of removing fork() from an actual program would help. For example, how is nginx on Windows implemented?

alkonaut 7 years ago | |

I can’t answer for Nginx but normally on windows if you want “worker processes” you just start N of them and have them read work from a shared memory queue. That is, workers live longer than the tasks they perform. If one crashes, a new one is spawned. This does seem like a more sensible way of doing things than forking tbh. It isolates work in processes but doesn’t pay for process creation per request.
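A rough sketch of that long-lived-worker pattern in Python, with two loudly stated substitutions: tasks travel over pipes rather than a shared memory queue, and dispatch is synchronous round-robin to keep the example short. `WORKER_SOURCE`, `start_worker` and `run_tasks` are hypothetical names; the literal task `crash` simulates a worker bug.

```python
import subprocess
import sys

# Worker program: reads one task per line and prints its square.
# The literal task "crash" makes it exit abnormally, to simulate a bug.
WORKER_SOURCE = """
import sys
for line in sys.stdin:
    task = line.strip()
    if task == "crash":
        sys.exit(1)
    print(int(task) ** 2, flush=True)
"""


def start_worker():
    # Each worker is a fresh process; no fork() of the parent's state.
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )


def run_tasks(tasks, worker_count=2):
    """Dispatch tasks round-robin; replace any worker that dies."""
    workers = [start_worker() for _ in range(worker_count)]
    results = []
    try:
        for index, task in enumerate(tasks):
            slot = index % worker_count
            worker = workers[slot]
            worker.stdin.write(f"{task}\n")
            worker.stdin.flush()
            reply = worker.stdout.readline()
            if reply == "":
                # EOF means the worker died: clean up and spawn a new one.
                worker.stdin.close()
                worker.stdout.close()
                worker.wait()
                workers[slot] = start_worker()
                results.append(None)
            else:
                results.append(int(reply))
    finally:
        for worker in workers:
            worker.stdin.close()  # EOF on stdin lets the worker exit
            worker.wait()
            worker.stdout.close()
    return results


if __name__ == "__main__":
    print(run_tasks([2, "crash", 3, 4]))
```

The workers outlive individual tasks, so process creation is paid once per worker (plus once per crash) rather than once per request.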

DblPlusUngood 7 years ago | | |

Is recovery of a shared memory queue after one of the workers crashes even possible, in general? (what if the worker crashed before releasing a lock?)

mikekchar 7 years ago | | |

I may be strange, but that's the way I've always used fork() as well. It's one of the reasons why named pipes exist (or at least that's what I've always thought).

bloak 7 years ago | | |

"If one crashes, a new one is spawned."

I suppose that makes sense on an OS on which crashing is expected behaviour, though some people would want to know what bug caused the crash and whether that bug has security implications.

cryptonector 7 years ago | |

posix_spawn() attributes can do a lot of this, and a helper program can do much if not all of the rest.
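For illustration, a short sketch of posix_spawn() with a file action, using Python's `os.posix_spawn` wrapper (Python 3.9+ on a POSIX system is assumed). The function name `spawn_and_capture` is hypothetical. The single dup2 file action does the stdout redirection that fork-based code would do by hand between fork() and exec().

```python
import os
import sys


def spawn_and_capture(code):
    """Run `code` in a fresh interpreter via posix_spawn; return (exit code, stdout)."""
    read_fd, write_fd = os.pipe()
    # File actions run in the child before the new program starts.
    # os.pipe() descriptors are non-inheritable, so the originals close on
    # exec; only the dup2'ed copy on fd 1 survives into the new program.
    file_actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]
    pid = os.posix_spawn(
        sys.executable,
        [sys.executable, "-c", code],
        os.environ,
        file_actions=file_actions,
    )
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), output.decode()


if __name__ == "__main__":
    print(spawn_and_capture("print('hello from child')"))
```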

Removing fork() will take a long, long time. Every popular use case needs an alternative that doesn't suck.

But then again, fork() is kinda awful[0].

[0] https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c...

cryptonector 7 years ago |

I've been saying this for quite some time. Here's a gist I wrote about it: https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c...

ktpsns 7 years ago |

Can anybody explain why fork() is still used in Chromium or Node.js? They are not old-grown traditional forking Unix servers (unlike Apache or the databases mentioned in the paper). I would expect them to implement some of the alternatives and have fork() only as a fallback in the code (i.e. after a cascade of #ifdefs) if no other API is available. Therefore, I wonder where the fork() bottlenecks really appear in everyday life.

evmar 7 years ago | |

Chrome on Windows uses CreateProcess, and Windows came first, so Chrome is mostly architected around an approach that would fit posix_spawn better. However, fork has some benefits that I went into here:

http://neugierig.org/software/chromium/notes/2011/08/zygote....

xyzzyz 7 years ago | |

> why fork() is still used in Chromium

To support a multi-process web browser architecture that Chromium pioneered, you need to spawn processes. See https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...

Dylan16807 7 years ago | | |

That's not what the page says. It says the use of fork() saves 8MB and a few tens of milliseconds per process spawn.

IshKebab 7 years ago | |

This is explained in the paper. It's to get access to copy-on-write memory so you can make a pre-initialised process cheaply.

eesmith 7 years ago |

It points out that "1304 Ubuntu packages (7.2% of the total) calling fork, compared to only 41 uses of the more modern posix_spawn()".

In section 7 it suggests "We should therefore strongly discourage the use of fork in new code, and seek to remove it from existing apps."

Is anyone here going to help work on changing those 1304 packages?

I have already over-volunteered for thankless FOSS tasks like this, so I know it won't be me.

wbl 7 years ago | |

There are things that don't fit the posix_spawn limitations, especially with fd or capability manipulation.

eesmith 7 years ago | | |

Yes, certainly. The paper covers many of those limitations.

The goal is not "remove", but "seek to remove". The relevant definition of "seek" here is "to make an attempt" says https://www.merriam-webster.com/dictionary/seek .

How many of those 1304 Ubuntu packages require fork()? Are there benefits to replacing (say) 1283 of them with posix_spawn()?

harryf 7 years ago |

From the paper...

> 7. GET THE FORK OUT OF MY OS!

Someone couldn't resist...

heavenlyblue 7 years ago |

If they're going to remove fork, then Python's multiprocessing is going to be dead. Maybe then the community will be forced to get rid of the GIL?

fopen64 7 years ago |

When I learnt how fork() and select() worked, I just fell in love with Unix. The Win32 API was so ad-hoc and unnatural in direct comparison.

IshKebab 7 years ago | |

You fell in love with Unix because of a hacky unintuitive syscall? I'd suggest reading the paper!

pova 7 years ago | |

For me it was poll() due to its simple and intuitive API. Also, it's much faster than select() when you have a large number of file descriptors being monitored.

fopen64 7 years ago | | |

For some reason the book (Beginning Linux Programming, Wrox Press, 1998 edition I think) explained select() first, so like the proverbial little duck that names 'mother' the first thing it sees moving after hatching, select() caught my heart.

stuaxo 7 years ago |

And there I was thinking they would expose their fork() implementation.

Interested to see what this paper has to say.

zbentley 7 years ago |

While fork() might be sub-optimal for launching different programs (fork() + exec() vs. posix_spawn()), it's absolutely essential in several types of common systems that don't use it to launch different programs.

Fork-requiring program class 1:

The biggest example of where fork() is needed is webservers/long-running programs with significant unchanging memory overhead and/or startup time.

Many large applications written in a language or framework that prefers the single-process/single-thread model for executing requests (e.g. Python/gunicorn, Perl, a lot of Ruby, NodeJS with ‘cluster’ for multicore, etc.) are basically dependent on fork(). Such applications often have a huge amount of memory required at startup (due to loading libraries and initializing frameworks/constant state). Creating workers that can execute requests in parallel but don’t require any additional memory overhead (just what they consume per request) is essential for them. fork()ing without exec()ing a new program facilitates this memory sharing; everything is copy-on-write, and most big webapps don’t need to write most of the startup-initialized memory they have, though they may need to read it.

Additionally, starting up such programs can take a long time due to costly initialization (seconds or minutes in the worst cases); using fork() allows them to quickly replace failed or aged-out subprocesses without having to pay that overhead (which also typically pegs a CPU core) to change their parallelism. “Quickly” might not be quick enough if a program needs to continually launch new subprocesses, but for periodically forking (or just forking-at-startup) long-running servers with a big footprint, it’s far better than re-initializing the whole runtime. For better or worse, we’ve come far enough from old-school process-per-request CGI that it is no longer feasible in most production deployments.

Anticipated rebuttals:

Q: Wouldn't it be nice if everyone wrote apps small enough that startup time was minimized and memory footprint was low?

A: Sure, but they won’t.

Q: People should just write their big, long-running services in a framework that starts fast, has low memory requirements, and uses threads instead of fork()s.

A: See previous answer. Also see zzzcpan’s response.

Q: Can you access some of those benefits with careful use of shared memory?

A: Yes, but it’s much harder to do than it is to use fork() in most cases (caveat Windows, but it’s still hard).

Q: Do tools exist in single-proc/single-thread forking frameworks/languages which switch from forking to hybrid async/threaded paradigms (like gevent) instead?

A: Yes, but they’re not nearly as mature, capable, or useful (especially when you need to utilize multiple cores).

Fork-requiring program class 2:

Programs which fork infrequently in order to parallelize uncommon tasks over shared memory. Redis does this to great effect; it doesn’t exec(), it just forks off a child process which keeps the memory image at the time of fork from the parent, and writes most of that memory state to disk so that the parent can keep handling requests while the child snapshots.
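The snapshot idea can be sketched in a few lines of Python. This is an illustration of the technique only, not Redis's actual implementation; `snapshot_while_mutating` and the JSON file format are assumptions made for the example, and a POSIX system is required for `os.fork()`.

```python
import json
import os
import tempfile


def snapshot_while_mutating(store, path):
    """Fork a child to write `store` to `path` while the parent keeps mutating it."""
    pid = os.fork()
    if pid == 0:
        # Child: sees the store exactly as it was at the moment of the fork.
        status = 1
        try:
            with open(path, "w") as handle:
                json.dump(store, handle)
            status = 0
        finally:
            os._exit(status)
    # Parent: free to change the store; the child's copy is unaffected.
    store["written_after_fork"] = True
    _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "dump.json")
        data = {"counter": 1}
        snapshot_while_mutating(data, target)
        with open(target) as handle:
            print(json.load(handle))  # the state at fork time
        print(data)                   # the parent's later state
```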

Python’s multiprocessing excels at these kinds of cases as well. If you’re launching and destroying multiprocessing pools multiple times a second, then sure, you’re holding it wrong, but many people get huge wins from using multiprocessing to do parallel operations on big data sets that were present in memory at the time multiprocessing fork()ed off processes. While this isn’t cross-platform, it can be a really massive performance advantage: no need to serialize data and pass it to a multiprocessing child (this is what apply_async does under the covers) if the data is already accessible in memory when the child starts. Node's 'cluster' module will do this too, if you ask nicely. Many other languages and frameworks support similar patterns: the common thread is making fork()ing parallelism "easy enough" with the option of spending a little extra effort to make it really really cheap to get pre-fork memory state into children for processing. Oh, and you basically don't have to worry about corrupting anyone else's in-memory state if you do this (not so with threads).
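A hedged sketch of that multiprocessing pattern, assuming a POSIX system where the `fork` start method exists. `DATASET`, `_sum_slice` and `parallel_sum` are hypothetical names. The data set is built before the workers start, so each forked worker reads it from inherited memory; only two indices and one integer result cross the process boundary.

```python
import multiprocessing

# Built before any worker exists; forked children inherit it as-is.
DATASET = list(range(1_000_000))


def _sum_slice(start, stop, results):
    # Reads the inherited DATASET directly; nothing large is serialized.
    results.put(sum(DATASET[start:stop]))


def parallel_sum(worker_count=4):
    """Sum DATASET in parallel using forked workers."""
    ctx = multiprocessing.get_context("fork")  # POSIX only
    results = ctx.SimpleQueue()
    chunk = len(DATASET) // worker_count
    workers = []
    for i in range(worker_count):
        start = i * chunk
        # The last worker also takes any remainder.
        stop = len(DATASET) if i == worker_count - 1 else start + chunk
        worker = ctx.Process(target=_sum_slice, args=(start, stop, results))
        worker.start()
        workers.append(worker)
    total = sum(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return total


if __name__ == "__main__":
    print(parallel_sum())
```

With the `spawn` start method the same code would have to rebuild or receive `DATASET` in every worker, which is the serialization cost the comment describes avoiding.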

Anticipated Rebuttals:

Q: $language provides a really accessible way to use true threads that isn’t nearly as tricky as e.g. multiprocessing or knowing all the gotchas (e.g. accidental file descriptor sharing between non-fork-safe libraries) of fork(); why not use that?

A: Many people still prefer languages with primarily-forking parallelism[1] constructs for reasons besides their fork-based concurrency capabilities--nobody’s claiming multiprocessing beats goroutines for API friendliness--so fork() remains useful in much more than a legacy capacity.

Q: Why not use $tool which does this via threads or why not bind $threaded_language to $scripting_language and use threads on the other side of the FFI boundary?

A: People won’t switch. They won’t switch because it’s hard (don't tell me threaded Rust is as easy to pick up as multiprocessing--Rust has a lot of advantages in this space, but that ain't one of them) and because there’s a positive benefit to staying within a given platform, even if some infrequent tasks (hopefully your Python doesn’t invoke multiprocessing too much) are a bit more cumbersome than usual. Also, “Friendly, easy-to-use concurrency with threads” is often a very false promise. There’s a reason Antirez is resistant to threading.

--------------

TL;DR perhaps using fork() and exec() for launching new programs needs to stop. But fork() itself is absolutely essential for common real-world use cases.

[1] References to parallelism via fork() above assume you have more than one core to schedule processes onto. Otherwise it’s not that parallel.

EDITs: grammar. There will be several because essay. I won't change the substance.

makach 7 years ago |

fork() must go too??!!

zzzcpan 7 years ago |

It's hard to take them seriously when they imply the mess that threads are is somehow acceptable and necessary, but the nicer, less error-prone and simpler fork isn't. Threads are a nasty hack and a liability for the modern programmer to use. And systems researchers really should acknowledge that their continued existence as first-class OS primitives is holding back systems research much more than fork is. I guess they are looking to spread FUD and justify the mess that Windows got itself into, not doing actual research.

IshKebab 7 years ago | |

What's wrong with threads exactly?

a1369209993 7 years ago | | |

Aliasable, mutable memory (ie race conditions) is evil, and threads perfuse the entire programming environment with it. This is a dirty implementation detail that operating system kernels have to deal with, and we should be burying it in the same hole as memory swapping and TCP retransmits, not making it a fundamental hazard every application developer has to worry about.

zerr 7 years ago |

Interesting that Redis uses fork() for COW implementation.

Mic92 7 years ago | |

It should be possible to achieve the same with mmap() and MAP_PRIVATE
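A small sketch of what MAP_PRIVATE provides, using Python's `mmap` module on a POSIX system; `private_mapping_demo` is a hypothetical name. It shows only the copy-on-write mechanism (writes to the mapping never reach the backing file), not a full snapshot design.

```python
import mmap
import tempfile


def private_mapping_demo():
    """Write through a MAP_PRIVATE mapping; return (mapped bytes, file bytes)."""
    with tempfile.TemporaryFile() as backing:
        backing.write(b"original")
        backing.flush()
        view = mmap.mmap(
            backing.fileno(), 0,  # length 0 maps the whole file
            flags=mmap.MAP_PRIVATE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
        try:
            # The write triggers a private copy of the affected page.
            view[:8] = b"modified"
            in_memory = bytes(view[:8])
        finally:
            view.close()
        backing.seek(0)
        on_disk = backing.read()
    return in_memory, on_disk


if __name__ == "__main__":
    print(private_mapping_demo())
```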

bbsimonbb 7 years ago |

https://www.youtube.com/watch?v=p-mGXLgGqkY

chasil 7 years ago |

I readily admit that I am unfamiliar with posix_spawn() and its benefits over fork().

However, may I point out that Microsoft SQL Server benchmarks have been posted that show Linux TPC-H outperforming Windows?

https://www.dbbest.com/blog/running-sql-server-on-linux/

While I am sure that this is wise criticism, it might also be concluded that Windows itself contains no small amount of architectural decisions that limit performance.

kazinator 7 years ago |

Fork is quite excellent, except in cases when the intent is to run a different program or when threads are involved (threads are basically an incompatible, competing model of concurrency).

The use of fork as a concurrency mechanism (creating a new thread of control that executes in a copy of the address space) is very good and useful.

In the POSIX shell language, the subshell syntax (command1; command2; ...) is easily implemented using fork. This is useful: all destructive manipulations in the subshell like assignments to variables or changing the current directory do not affect the parent.
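The subshell behaviour can be imitated in Python as a rough sketch, assuming a POSIX system. `run_in_subshell`, `settings` and `destructive_action` are hypothetical names. The child changes its working directory and a variable, and the parent sees neither change.

```python
import os


def run_in_subshell(action):
    """Run `action` in a forked child, like ( ... ) in the shell; return its exit code."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            action()
            status = 0
        finally:
            os._exit(status)  # the child never returns into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status)


settings = {"mode": "parent"}


def destructive_action():
    # Both changes happen only in the child's copy of the process.
    os.chdir("/")
    settings["mode"] = "child"


if __name__ == "__main__":
    before = os.getcwd()
    run_in_subshell(destructive_action)
    print(os.getcwd() == before, settings["mode"])
```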

Check out the fork-based Perl solution to the Amb task in Rosetta code: https://rosettacode.org/wiki/Amb#Using_fork

This essentially simulates continuations (in a way). (If the parent process does nothing but wait for the child to finish, fork can be used to perform speculative execution, similar to creating a continuation and immediately invoking it).

Microsoft "researchers" can stuff it and their company's flagship piece of shit OS.

lawl 7 years ago |

I haven't read the entire thing yet, but reading from "replacing fork" to the end it reads too much like embrace extend extinguish.

saagarjha 7 years ago | |

It's suggesting posix_spawn, which is standardized and has nothing to do with Microsoft.

lawl 7 years ago | | |

> Just as a programming course would not today begin with goto, we suggest teaching either posix_spawn() or CreateProcess(), and then introducing fork as a special case with its historic context (§2).

Or CreateProcess(), which has a lot to do with Microsoft.

eesmith 7 years ago | |

Nowhere there did the authors "embrace" fork(). Quite the contrary.

While the article points out that the NT kernel natively supports fork, it certainly isn't arguing for any extension of the call.

So all we're left with is "extinguish", which this article certainly does. And it is persuasive. I will look at posix_spawn() for my own code in the future.

Solomoriah 7 years ago |

Okay, this one has me laughing out loud. Of COURSE Microsoft doesn't like fork()... Windows pretty much can't do it. I'll admit, there have been a lot of times I wish there was a more streamlined way to spawn processes on Linux (particularly daemons) but when I don't have fork() I always end up missing it. I'd take this paper a lot more seriously if it came from someone with a less obvious bias.

localhostdotdev 7 years ago |

"every other system has a feature except us, and we are not going to add it (because reasons) even if it's very widely used"

also:

> When a fork syscall is made on WSL, lxss.sys does some of the initial work to prepare for copying the process. It then calls internal NT APIs to create the process with the correct semantics and create a thread in the process with an identical register context. Finally, it does some additional work to complete copying the process and resumes the new process so it can begin executing.

https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-c...

alkonaut 7 years ago | |

They must have considered it many times (not least when making the partial posix support for NT) but felt that supporting it wouldn’t help deprecating it either.

AFAIK it’s only unix/Linux (posix) OSes that implement fork. Perhaps that’s what you meant by “every other system”, ie unix + clones/derivatives?

dfawcus 7 years ago | | |

Also DRI's FlexOS 2.2 had it; it was not a unix clone per se.

The COMMAND SVC had/has 4 variants:

- Execute program (akin to posix_spawn)

- Chain program (akin to posix_spawn, and parent exit)

- Execute subprocess (start a thread, one supplies code + stack address)

- Execute fork process (ala fork, but one supplies code + stack address like with 'subprocess' above)

Originally it only had the first two forms, 2.1 added the subprocess form, 2.2 added the fork form.

It didn't have a direct equivalent to exec(), but did have an OVERLAY SVC which loaded fresh code into the process, and I expect that could be used to make something like exec(). Not that I ever tried, given there was no real need for it.

The other way to create an exec() like behaviour would have been with the CONTROL SVC, akin to ptrace(), but that would have been painful to do.

dijit 7 years ago | | |

VAX/VMS implements "vfork()", which is an implementation of the most common use-cases of fork.

VAX and VMS are not POSIX or UNIX-like.

stirfrykitty 7 years ago |

We already have posix_spawn. I guess MS isn't aware of this.

Methinks MS need to focus on their own issues and leave the nix world alone. While many people find their involvement in FOSS welcome, I do not and never have. They are still a for-profit company beholden to shareholders.

The purchase by MS of GitHub may, again, be welcomed by many, but I find it disastrous. I smell triple E here no matter what anyone says. This is why distros like Debian and Slackware are still so important. All nix needs to do is start adopting MS ideas and then it's a matter of time before distros adopt disastrous code like systemd. MS does want to control everything around them like every other for-profit company. I cannot see this any other way. They are involved for their own good, for things like Azure and their own "cloud". MS needs to focus on their own garden and not that of *nix. I always have and always will prefer the "us and them" mentality when dealing with MS. Don't forget EEE. It's still a reality should you care to look hard enough.

naasking 7 years ago | |

Talk about uncharitable. MS Research produces world-class research. They don't just research Windows, they do research in all operating systems, programming languages and more.

stirfrykitty 7 years ago | | |

It's not about being "uncharitable". It's about protecting nix from being controlled by outside forces. MS does, indeed, have world-class research, but they are sticking their heads in the nix camp, which some of us don't like. We're not all in this together, despite what some will tell you.

Sadly, UNIX (umbrella term here) is not what it was a few years ago. I dearly miss Solaris, for example. Nothing touched it in its day, not even AIX or HP-UX. I was a UNIX admin for 10 years. I've used them all. Nothing MS can produce will ever be better than pure UNIX. There is a reason it's still being made. FreeBSD can outperform anything MS has on offer. Hell, they borrowed networking code because they couldn't come up with better.

Not all of us see us all under the same tent. I surely don't and never will. It's us and them. To say otherwise would indicate we are all on a level playing field and we're all working together to a common good. We're not. Good research aside, I don't like their history, stewardship, or about anything else they do. Agenda...

HappyJoy 7 years ago | |

The paper mentions posix_spawn... so I guess they are aware

* redirect stdin, stdout, and stderr

* open files that might be needed and close files that aren't

* change process limits

* drop privileges

* change the root directory

* change namespaces