Concurrency in Rust

282 points by SirNoobsAlot 10 years ago | 155 comments

gregwebs 10 years ago |

Send + Sync are great. The downside of concurrency in Rust is:

1) There isn't transparent integration with IO in the runtime as in Go or Haskell. Rust probably won't ever do this because although such a model scales well in general, it does create overhead and a runtime.

2) OS threads are difficult to work with compared to a nice M:N threading abstraction (which, again, is the default in Go or Haskell). OS threads lead to lowest-common-denominator APIs (there is no way to kill a thread in Rust) and some difficulty in reasoning about performance implications. I am attempting to solve this aspect by using the mioco library, although due to point #1 IO is going to be a little awkward.

pcwalton 10 years ago | |

> 1) There isn't transparent integration with IO in the runtime as in Go or Haskell. Rust probably won't ever do this because although such a model scales well in general, it does create overhead and a runtime.

By "transparent integration with the runtime" you mean M:N threading. M:N threading is just delegating work to userspace that the kernel is already doing. There can be valid reasons for doing it, but its absence isn't a case of us skipping work that we could have done. In fact, we had M:N threading for a long time and went to great pains to remove it.

In addition to the downsides you mentioned, M:N threading interacts poorly with C libraries, and stack allocation becomes a major problem without a precise GC to be able to relocate stacks with.

M:N will never be as fast as an optimized async/await implementation can be, anyway. There is no way to reach nginx levels of performance with stackful coroutines.

> OS threads leads to lowest common denominator APIs (there is no way to kill a thread in Rust)

This has nothing to do with the reason why you can't kill threads in Rust. We could expose pthread_kill()/pthread_cancel() on Unix and TerminateThread() on Windows if we wanted to. The reason why you can't terminate threads that way is that there's no good reason to: if you have any locks anywhere then it's an unsafe operation.

> some difficulty in reasoning about performance implications.

I would actually expect the opposite to be true: 1:1 is easier to reason about in performance, because there are fewer magic runtime features like moving or segmented stacks involved. Could you elaborate?

IshKebab 10 years ago | |

There's no way to kill goroutines either. In fact, are there any systems that allow you to cleanly kill threads?

rdtsc 10 years ago | | |

Yes.

In Erlang:

    exit(kill).

or

    exit(Pid, kill).

Will kill a process. It has an isolated heap, so it won't affect other (possibly hundreds of thousands of) running processes. That memory will be garbage collected, safely and efficiently.

This will also work in Elixir, LFE and other languages running on the BEAM VM platform.

EDIT: As user masklinn correctly pointed out below, the example should be exit/2, that is, exit(Pid, kill). In fact it is just exit(Pid, Reason), where Reason can be any other exit reason, such as my_socket_failed. In that case, however, the process could catch and handle that signal instead of being unconditionally killed.

steveklabnik 10 years ago | | |

Every one I know of has regretted it, and seen it as an antipattern. For example, Java way back in 1.5: http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPr...

I think Erlang might be okay with it, because "this thread can fail at any time" is a core value of Erlang. But it's an exception.

gregwebs 10 years ago | | |

Yes. In Haskell you use `killThread`, which throws an asynchronous exception to the thread. It is certainly difficult to clean up resources perfectly in the face of asynchronous exceptions. However, once there are functions available to help you with this (e.g. use a bracket function whenever using resources), it becomes tractable.

This functionality is critical to being able to time out a thread.

the_why_of_y 10 years ago | | |

GHC has throwTo, which raises an exception in another thread:

http://hackage.haskell.org/package/base-4.6.0.1/docs/Control...

This is used to provide the killThread function:

http://hackage.haskell.org/package/base-4.6.0.1/docs/Control...

Manishearth 10 years ago | | |

Note that this isn't exactly a safe operation, since a killed thread may stop in the midst of something. It's safer to have it process messages on a loop and include a quit message.
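
A minimal sketch of that message-loop approach, using only the standard library. The names `Msg` and `run_worker` are made up for illustration and are not from any crate mentioned in this thread; the worker simply sums the numbers it receives until it sees a quit message.

```rust
use std::sync::mpsc;
use std::thread;

// Messages the worker understands; `Quit` asks it to stop cleanly.
enum Msg {
    Add(i64),
    Quit,
}

// Spawns a worker, feeds it `inputs`, then asks it to quit.
// Returns the total the worker accumulated before shutting down.
fn run_worker(inputs: Vec<i64>) -> i64 {
    let (tx, rx) = mpsc::channel::<Msg>();

    let worker = thread::spawn(move || {
        let mut total = 0;
        // Process messages until told to quit or until every sender is gone.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Quit => break,
            }
        }
        total
    });

    for n in inputs {
        tx.send(Msg::Add(n)).expect("worker hung up early");
    }
    tx.send(Msg::Quit).expect("worker hung up early");

    worker.join().expect("worker panicked")
}
```

Calling `run_worker(vec![1, 2, 3])` returns 6. The worker only ever stops between messages, so it is never interrupted halfway through an update.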

amluto 10 years ago | | |

POSIX has pthread_cancel. It's a big mess.

thegenius2000 10 years ago | | |

Can you clarify what you mean by "kill goroutines"? My understanding was that if you return while inside a goroutine, it gets handled by the GC immediately, and (as someone else mentioned) you can use context to send deadlines/cancellation signals to goroutines.

tonyhb 10 years ago | | |

You can use contexts to send a cancellation signal to goroutines: https://blog.golang.org/context.

This is more of an implementation decision you make on a case-by-case basis than something built into Go.

catnaroek 10 years ago | |

> There is no way to kill a thread in Rust

Think about the interaction with (non-memory) resource ownership. This is just horrible, and I wouldn't even want it in a higher-level language. If you want to carefully notify threads that they must terminate, set up a channel, or write to a shared variable, but please do not just forcibly terminate threads.
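
A rough sketch of the shared-variable variant, assuming an `AtomicBool` flag is enough for the job. The function name `run_until_stopped` and the 1 ms sleep are arbitrary choices for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Runs a worker that counts loop iterations until `stop` is set.
// Returns how many iterations completed before the worker noticed the flag.
fn run_until_stopped(run_for: Duration) -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let stop_for_worker = Arc::clone(&stop);

    let worker = thread::spawn(move || {
        let mut iterations = 0u64;
        loop {
            // One unit of work; resources are in a consistent state afterwards.
            iterations += 1;
            // The worker checks the flag only at a point where stopping is safe.
            if stop_for_worker.load(Ordering::Acquire) {
                break;
            }
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    });

    thread::sleep(run_for);
    // Ask the worker to terminate instead of killing it.
    stop.store(true, Ordering::Release);
    worker.join().expect("worker panicked")
}
```

Because the worker decides when to look at the flag, any locks or other resources it holds are released through the normal control flow rather than being abandoned.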

conceit 10 years ago | | |

If thinking about ownership of orphaned data is horrible, that still doesn't mean that there is no general solution.

I can't support your argument, because I'm not capable. If I were, I probably wouldn't be asking in the first place.

Manishearth 10 years ago |

For a more in-depth explanation of how Send and Sync work theoretically, see http://manishearth.github.io/blog/2015/05/30/how-rust-achiev... or http://huonw.github.io/blog/2015/02/some-notes-on-send-and-s...

nindalf 10 years ago |

I think Steve Klabnik could clarify this, but the book at that link is in the process of being rewritten. I think it might be good to wait until it is. I personally found it slightly difficult to follow compared to other options like the soon-to-be-published Programming Rust.

steveklabnik 10 years ago | |

I am in the middle of working on a second draft of the book. This page is one of the oldest bits of docs, overall, and isn't my best work. It's not _wrong_, I just have very high personal standards. It was adapted from older documentation and was written in the time up to 1.0, where I had a LOT on my plate.

galonk 10 years ago | | |

It certainly is _confusing_ if not _wrong_. You silently add "move" to the closure without any mention or explanation (then later say "note that we're copying i" without explaining that you're talking about the "move" keyword).

The bit about Mutex also has a "just type this to fix the problem with no explanation of how or why it works" flavour (although I guess if you already grok mutexes then its use here might be obvious to you).

Not a criticism of the tutorial, but it does something common in Rust tutorials which is really a problem with the language at this point: Rust tutorials always spend a lot of time interpreting Rust's notoriously poor error messages (e.g. "what this message that doesn't mention Sync is trying to tell you is that you need Sync on this type"). That's great when you're doing the tutorial, but as soon as you're on your own man are those errors frustrating.

nindalf 10 years ago | | |

Thanks Steve, you're doing great work. Can't wait to read the book once it's done!

jonreem 10 years ago |

Another thing to know about Rust concurrency is that it supports safe "scoped" threads, that is, threads which hold plain references into their parent thread's stack.

This makes it very easy to write, for instance, a concurrent in-place quicksort (this example uses the scoped-pool crate, which provides a thread pool supporting scoped threads):

    extern crate scoped_pool; // scoped threads
    extern crate itertools; // generic in-place partition
    extern crate rand; // for choosing a random pivot

    use rand::Rng;
    use scoped_pool::{Pool, Scope};

    pub fn quicksort<T: Send + Sync + Ord>(pool: &Pool, data: &mut [T]) {
        pool.scoped(move |scoped| do_quicksort(scoped, data))
    }

    fn do_quicksort<'a, T: Send + Sync + Ord>(scope: &Scope<'a>, data: &'a mut [T]) {
        scope.recurse(move |scope| {
            if data.len() > 1 {
                // Choose a random pivot.
                let mut rng = rand::thread_rng();
                let len = data.len();
                let pivot_index = rng.gen_range(0, len);

                // Swap the pivot to the end.
                data.swap(pivot_index, len - 1);

                let split = {
                    // Retrieve the pivot.
                    let mut iter = data.into_iter();
                    let pivot = iter.next_back().unwrap();

                    // In-place partition the array.
                    itertools::partition(iter, |val| &*val <= &pivot)
                };

                // Swap the pivot back in at the split point by putting
                // the element currently there at the end of the slice.
                data.swap(split, len - 1);

                // Sort both halves (in-place!).
                let (left, right) = data.split_at_mut(split);
                do_quicksort(scope, left);
                do_quicksort(scope, &mut right[1..]);
            }
        })
    }

In this example, quicksort will block until the array is fully sorted, then return.
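
For comparison, here is a small sketch of the same borrowing idea using `std::thread::scope`, which was added to the standard library well after this discussion. It is not the scoped-pool API used above, and `double_in_halves` is a made-up name for illustration.

```rust
use std::thread;

// Doubles every element in place, using two scoped threads that each
// borrow one half of the caller's slice.
fn double_in_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);

    // `thread::scope` joins every spawned thread before it returns, which is
    // why the threads may hold plain borrows of `left` and `right`.
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x *= 2));
    });
}
```

Given a vector `[1, 2, 3, 4, 5]`, the call leaves it as `[2, 4, 6, 8, 10]`. No `Arc` or cloning is needed because the compiler can see that neither thread outlives the slice.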

pmarreck 10 years ago |

Reading all this is making me happy about pursuing Elixir (which is of course a language addressing largely different use-cases)

djhworld 10 years ago |

I'm having a tough time trying to understand this snippet

    for i in 0..3 {
        thread::spawn(move || {
            data[i] += 1;
        });
    }

What is the 'move' thing here before the ||?

z1mm32m4n 10 years ago |

Does Rust have a way to work with SIMD concurrency as opposed to just fork/join concurrency? Something along the lines of how OpenMP or Cilk let you do a parallel for-all?

fndjdh 10 years ago | |

You seem to have concurrency confused with parallelism. Concurrency just means working on another task before the prior one has completed. You're describing parallelism which is doing multiple tasks at the same time.

m0th87 10 years ago | | |

Parallelism necessarily implies concurrency [1]

So, "SIMD concurrency" is not incorrect (although SIMD parallelism is more correct :)

1: http://programmers.stackexchange.com/a/155110

z1mm32m4n 10 years ago | | |

By SIMD concurrency, I meant that I was curious about a construct that actually translated into simultaneously executing code, not just a construct that denotes work that "could" be done simultaneously.

Manishearth 10 years ago | |

Rust supports SIMD, with some utility structs that let you do it easily. I don't know of any libraries that auto-vectorize things, though IIRC LLVM can sometimes do this on its own.

alkonaut 10 years ago | | |

I think the parent was referring to "lightweight task"-based concurrency like C#'s PLinq:

    // C# (Interlocked.Add keeps the shared sum free of data races)
    int sum = 0;
    Parallel.ForEach(myCollection, item => Interlocked.Add(ref sum, item));

This operates using a reasonably sized thread pool rather than a thread per item. Something similar in Rust (excuse my rusty Rust) would look like:

    let someList = ...
    parallel::for(someList.iter(), |item| {
       // Do thing with each item
    });

I think there is ongoing work on this topic, but it will likely only be library-level and not language-level (just as Linq is a language-level feature in C# but PLinq is a library). There are some third-party crates that do this, like simple_parallel.

Edit: Low-level SIMD exists as intrinsics and, of course, through LLVM's auto-vectorization.
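
A library-level "parallel for" of this kind can be approximated with the standard library alone. The sketch below assumes `std::thread::scope` (added to std after this discussion) and a caller-chosen worker count; `parallel_sum` is a made-up name, not the simple_parallel API.

```rust
use std::thread;

// Sums `items` by splitting the slice into roughly equal chunks and handing
// each chunk to one scoped thread, instead of one thread per item.
fn parallel_sum(items: &[i64], workers: usize) -> i64 {
    if items.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (items.len() + workers - 1) / workers;

    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<i64>()))
            .collect();

        // Each thread returns a partial sum; combine them on the caller's thread.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<i64>()
    })
}
```

Each thread produces its own partial result, so there is no shared accumulator to protect. `parallel_sum(&[1, 2, 3, 4, 5, 6, 7], 3)` returns 28.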

Chilinot 10 years ago | |

A simple Google search would have told you that there is work being done to introduce SIMD in Rust. There appear to be some basics there right now, but not much, if I understood my quick search.

kbenson 10 years ago | | |

An active discussion about concurrency in Rust, with the language's developers commenting, is a very appropriate place to ask that question, and ultimately much more likely to yield valid and current information than Google.

armitron 10 years ago |

This looks terribly overcomplicated/overengineered to me, to the point where I doubt many are going to adopt/switch to this style, esp when used to more convenient approaches [even the standard C++ approach, faulty as it may be].

Also note how much boilerplate one has to write and how the code snippets bypass error handling (do it differently in "real" code but don't show us how). Bleh.

bluejekyll 10 years ago | |

Try it before you mock it.

This prevents boilerplate issues, and allows the compiler to help you discover threading issues at compile time rather than runtime.

It's easy enough to just mark all your structs Send + Sync and still shoot your foot off just like in any language. The point is, you need to be explicit that you're trying to shoot your foot off, as opposed to other languages which basically pull the trigger for you.
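
To make the "explicit" part concrete, here is a small sketch of what that opt-in looks like. `Handle` and `bump_on_other_thread` are hypothetical names for illustration; the only real API involved is the `Send` marker trait and `Box::into_raw`/`Box::from_raw`.

```rust
use std::thread;

// A hypothetical wrapper around a raw pointer. Raw pointers are not Send,
// so by default this type cannot be moved to another thread.
struct Handle(*mut u32);

// The explicit opt-in. The compiler stops checking here; the author is
// promising that moving a Handle between threads is sound.
unsafe impl Send for Handle {}

impl Handle {
    // Takes back ownership of the allocation behind the pointer.
    fn into_box(self) -> Box<u32> {
        // Sound only if the pointer came from Box::into_raw and is used once.
        unsafe { Box::from_raw(self.0) }
    }
}

// Moves a Handle to another thread and increments the value behind it.
fn bump_on_other_thread(value: Box<u32>) -> u32 {
    let handle = Handle(Box::into_raw(value));
    let worker = thread::spawn(move || {
        let mut owned = handle.into_box();
        *owned += 1;
        *owned
    });
    worker.join().expect("worker panicked")
}
```

Without the `unsafe impl Send` line, the `thread::spawn` call is rejected at compile time. With it, correctness is the author's responsibility, and the `unsafe` keyword marks exactly where that promise was made.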

Manishearth 10 years ago | |

> overcomplicated/overengineered to me

You don't have to worry about most of this. Doing concurrent things in Rust is pretty clean. Designing new concurrent abstractions from scratch is where you need to worry about Send and Sync and be careful. And it's totally worth it: entire classes of concurrency errors just go away.

The error handling can get verbose, though with the new `?` operator and `catch` syntax it's much cleaner now.
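
As a small illustration of the `?` operator only (the `catch` syntax is not shown), assuming a made-up helper `sum_csv` that parses a comma-separated line:

```rust
use std::num::ParseIntError;

// Parses a comma-separated list of integers and returns their sum.
// `?` returns early with the first parse error instead of unwrapping.
fn sum_csv(line: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for field in line.split(',') {
        total += field.trim().parse::<i64>()?;
    }
    Ok(total)
}
```

`sum_csv("1, 2, 3")` yields `Ok(6)`, while `sum_csv("1, x, 3")` yields an `Err` carrying the parse failure, with no explicit `match` at each step.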

RasmusWL 10 years ago |

Can someone enlighten me as to why the first snippet has a data race? Won't the resulting array become [2,3,4]?

    let mut data = vec![1, 2, 3];

    for i in 0..3 {
        thread::spawn(move || {
            data[i] += 1;
        });
    }

Manishearth 10 years ago | |

That's a mistake, clarified: https://github.com/rust-lang/rust/pull/32538

gsjs 10 years ago | |

I don't think there's a data race there, but the compiler can't check that. What the compiler sees is more than one thread accessing the variable `data`, which could cause a data race.
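
One way to get the snippet past the compiler is to share the vector through `Arc` and guard it with a `Mutex`. This is a minimal sketch under that assumption; `increment_each` is a made-up name, and the threads are joined so the result can be inspected.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increments each of the three elements on its own thread and returns
// the resulting vector.
fn increment_each() -> Vec<i32> {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    let handles: Vec<_> = (0..3usize)
        .map(|i| {
            // Each thread gets its own handle to the shared vector.
            let data = Arc::clone(&data);
            // `move` transfers ownership of that handle and of `i` into the
            // closure, so the thread does not borrow from this stack frame.
            thread::spawn(move || {
                let mut guard = data.lock().expect("mutex poisoned");
                guard[i] += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    let result = data.lock().expect("mutex poisoned").clone();
    result
}
```

The function returns `[2, 3, 4]`. The lock is what lets the compiler accept several threads mutating the same `Vec`, even though in this particular case each thread touches a different index.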

askyourmother 10 years ago |

What about the assumption (fatally flawed decision?) that malloc never fails when Rust asks? Sounds like something that could affect concurrency.

Manishearth 10 years ago | |

A failed malloc aborts the process. If this is important to you, don't use the heap abstractions in the stdlib then. This is no different from the situation in C++.

(You can also plug in a custom allocator which behaves differently)
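
For individual allocations there is also a fallible route in newer versions of the standard library: `Vec::try_reserve` and `Vec::try_reserve_exact`, which were stabilized long after this thread. A minimal sketch, with `zeroed_buffer` as a made-up helper name:

```rust
use std::collections::TryReserveError;

// Tries to build a zero-filled buffer of `len` bytes, reporting allocation
// failure as an error value instead of aborting the process.
fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Ask the allocator up front; this returns Err if the request fails.
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this does not need to reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

A request for `usize::MAX` bytes comes back as an `Err` rather than an abort. On systems that overcommit memory, a large reservation may still succeed and only cause trouble once the pages are touched, so this covers only failures the allocator actually reports.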

qznc 10 years ago | | |

On Linux malloc never fails actually. Instead, the kernel kills processes if it runs out of memory.

quotemstr 10 years ago | | |

No custom allocator can make a task that fails to allocate gracefully report an error. Rust's error handling design is just terrible, and mostly a consequence of eschewing exceptions.

Had Rust opted for exceptions, it'd be a much better, and actually usable, language. Rust's terrible error-handling strategy is the chief reason not to use it.

losvedir 10 years ago | |

That's not a property of Rust the language, but of the standard library. I believe you could use other allocators that behave differently. In any case, the standard library panics on OOM, and panics are described at the bottom of the linked page.

kzrdude 10 years ago | | |

It calls abort(), not panic!(). This is important since unwinding does not happen with the former.
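
To show what unwinding buys you in the panic case, here is a sketch assuming the default `panic = "unwind"` setting. `Cleanup` and `panic_in_worker` are made-up names; the destructor sets a flag so the parent can check that it ran.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Sets a shared flag when dropped, so the caller can observe that cleanup ran.
struct Cleanup(Arc<AtomicBool>);

impl Drop for Cleanup {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

// Panics on a worker thread and reports (join_failed, destructor_ran).
fn panic_in_worker() -> (bool, bool) {
    let cleaned_up = Arc::new(AtomicBool::new(false));
    let guard = Cleanup(Arc::clone(&cleaned_up));

    let worker = thread::spawn(move || {
        // Keep the guard alive on the worker's stack until the panic.
        let _guard = guard;
        panic!("simulated failure");
    });

    // A panic unwinds only the worker; the parent sees it as an Err from join.
    let join_failed = worker.join().is_err();
    (join_failed, cleaned_up.load(Ordering::SeqCst))
}
```

With unwinding, the function returns `(true, true)`: the parent thread survives and the destructor ran. An abort gives neither; the whole process ends on the spot and no destructors run.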

anon4 10 years ago | |

Malloc never fails, but you might die if you touch the memory. In general, modern OSes don't have a good story about exhausting available memory beyond "let's kill a bunch of processes to free up memory".

masklinn 10 years ago | | |

> Malloc never fails

malloc can fail, even on default Linux (overcommit enabled), for instance if you go above the process's vmem limit (because it is 32-bit or rlimited). And of course not all OSes overcommit; Windows famously does not.