Async Rust: history strikes back.

Those who do not study history, are doomed to repeat it.

Those who study history, are doomed to sit and watch while some idiot repeats it.

Last week we released a new version of Glommio, a thread-per-core asynchronous executor for Rust. Having each individual executor work within the confines of a single thread allows us to make a lot of guarantees that the ecosystem at large can’t. In our latest release, we were positive we had crafted a well-defined API that would simplify the creation of asynchronous Glommio programs, only to find out we were simply repeating the mistakes of giants that came before us.

As a rule of thumb, we as people like to talk about our successes and not so much about our failures. Upending that a bit, I figured I would take the time to write about that recent failure, and to reflect a bit on how asynchronous Rust can be much harder and more full of surprises than it seems, and on what this means for its future.

The problem

One of the well-known sources of friction for newcomers in Rust is the borrow checker with its lifetime rules. Yet, the borrow checker is what gives Rust its unique flavour and guarantees memory safety: the compiler enforces those rules for you, and as long as you don’t write code marked with the unsafe keyword, memory corruption is impossible.

Despite the initial friction, lifetimes are, once understood, usually easy to handle: an object is alive until it goes out of scope, and you can only keep one mutable reference to it at a time.

This model gets a bit more complex with asynchronous code: by its very nature, asynchronous code can execute at any time in the future (or not at all), which makes scoping unpredictable. Because of that, code like this requires a 'static bound: a special lifetime that indicates that an object is alive for the entire duration of the program.

See, for example, how to spawn a new thread in Rust:
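
A minimal sketch of such a call, using only std::thread::spawn from the standard library. The helper name sum_in_thread and the summing workload are illustrative assumptions:

```rust
use std::thread;

// Hypothetical helper, named for this sketch only.
fn sum_in_thread(data: Vec<u64>) -> u64 {
    // `move` transfers ownership of `data` into the closure, so the closure
    // holds no borrows and satisfies the `'static` bound on `thread::spawn`.
    let handle = thread::spawn(move || data.iter().sum::<u64>());

    // Borrowing instead would be rejected by the compiler, because the
    // thread could outlive the borrowed value:
    //
    //     let local = vec![1, 2, 3];
    //     thread::spawn(|| local.iter().sum::<u64>());
    //
    handle.join().expect("worker thread panicked")
}
```

Calling sum_in_thread(vec![1, 2, 3]) hands the vector to the new thread and returns its sum once the thread is joined.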

Although this is considered synchronous code, the problem is the same and is a good example of how generic this issue is: because threads will keep running in the background, your data could be long dead by the time it gets accessed in the thread, so you need 'static data.

In asynchronous Rust one doesn’t spawn threads, but rather tasks. Like threads, tasks may run independently of their original context and thus need the 'static bound too. Tokio’s spawn has a signature similar to the one used for threads, async-std’s is the same, and Glommio, of course, is no exception. The main difference from threads is that you pass futures and not synchronous closures to those functions.

Since references to dynamically created objects are not 'static, and references that span the entire program are rare (they only happen with statically defined variables), a 'static bound more commonly means no reference at all. Ownership of the object is moved into the asynchronous context, avoiding the lifetime issue entirely.

But often we still want to use the object in its original context, or across many asynchronous tasks. The solution is to use reference-counted pointers, and because such pointers only give shared, immutable access to their contents, the interior mutability pattern must also be used.

In the simple Glommio example below, we want to share the keep_running variable between the task and the original context, and use it to control how long the task will run. Notice that it has to be wrapped in a shared pointer (Rc), and since shared pointers are immutable in Rust, we need to use the interior mutability pattern (Cell).
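
The sharing pattern itself can be sketched with the standard library alone. In the hedged sketch below, a plain loop and a closure stand in for the Glommio task and its original context (run_until_stopped and share_flag are hypothetical names, not Glommio APIs); only the Rc<Cell<bool>> handling mirrors what the text describes:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for a task: calls `step` until the flag is cleared
// and returns how many iterations it performed.
fn run_until_stopped(keep_running: Rc<Cell<bool>>, mut step: impl FnMut(u32)) -> u32 {
    let mut iterations = 0;
    while keep_running.get() {
        iterations += 1;
        step(iterations);
    }
    iterations
}

fn share_flag() -> u32 {
    let keep_running = Rc::new(Cell::new(true));
    // Cloning the Rc copies the pointer and bumps the reference count;
    // both handles refer to the same Cell.
    let task_flag = Rc::clone(&keep_running);
    // The "original context" clears the flag on the third iteration.
    // Cell::set takes &self, so no mutable reference is needed.
    run_until_stopped(task_flag, move |i| {
        if i == 3 {
            keep_running.set(false);
        }
    })
}
```

Neither handle is ever borrowed mutably: the Rc provides shared ownership, and the Cell is what makes the write through a shared handle legal.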

Reference-counted pointers are not the end of the world. As a matter of fact, in an earlier essay I argued that they are the price we pay to live in a civilized asynchronous world. And it is not even that high of a cost: compared to a simple memory access, a shared pointer adds an arithmetic operation and a likely cached dereference. Both should be really cheap for modern processors.

But that doesn’t mean they’re free and we are obviously still better off if we can avoid them. Furthermore, blanket statements about the cost or lack thereof of any primitive are problematic. Surely there must be situations in which reference counting can truly become expensive?

And indeed there are: when the difference is no longer between how much work you do to access the data, but between having to do work or doing no work at all.

Consider, for instance, a large vector of references. Once the vector goes out of scope, the program can simply deallocate its backing memory and be done with it, which is very fast. But if you instead accumulate a vector of reference-counted pointers, the generated code now has to iterate through each of them, decreasing their reference count and potentially freeing each of them individually. In this case, there is a lot of extra work to be done, compared to zero work if we could use simple references. This doesn’t matter if you have a handful of references, but it starts to matter as the vector grows.
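
The difference can be observed with the standard library alone. The following is only a sketch, and the helper names counts_around_drop and element_needs_drop are made up for illustration:

```rust
use std::mem::needs_drop;
use std::rc::Rc;

// Returns the strong count of a shared value while a vector of `n` clones is
// alive, and again after that vector has been dropped.
fn counts_around_drop(n: usize) -> (usize, usize) {
    let shared = Rc::new(0u64);
    let handles: Vec<Rc<u64>> = (0..n).map(|_| Rc::clone(&shared)).collect();
    let before = Rc::strong_count(&shared);
    // Dropping the vector runs Rc's destructor once per element: n decrements,
    // plus a deallocation for every value whose count reaches zero.
    drop(handles);
    let after = Rc::strong_count(&shared);
    (before, after)
}

// Reports whether each element type has any destructor work to do:
// (plain reference, reference-counted pointer).
fn element_needs_drop() -> (bool, bool) {
    (needs_drop::<&u64>(), needs_drop::<Rc<u64>>())
}
```

For a vector of a thousand clones, the count goes from 1001 back to 1, which means a thousand decrements happened during the drop. A vector of plain references has no per-element work, so only the backing buffer is released.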

Can we do better?

Tasks can outlive their original scope, but they don’t always. As a matter of fact, most of the time they don’t. Asynchronous code ends up looking like the code below:

More often than not, tasks have a well-defined lifetime: they need to be .awaited, and we know that once that happens they will no longer execute. If they are dropped without being awaited, they get cancelled immediately, so they won’t do any harm either. To keep the task alive outside its original scope one needs to explicitly .detach() it (in the case of Glommio, although other executors will be similar). So what if we provided a version of Task that cannot ever be detached, and will either die right away or terminate at a specific point?

After some discussion, we came up with the ScopedTask. As the name implies, the ScopedTask has a well-defined scope. No methods are provided to allow background execution, so we know precisely when it will finish. As with a normal task, the ScopedTask starts executing right away, so it can still be used to drive concurrency.

History repeats itself?

Aside from the newcomer’s friction issue, I had recently come across a couple of instances where the drop-a-vector-of-Rc issue showed up significantly hot in profiles. So I was thrilled to finally have the ScopedTask as a way to solve this problem.

Except, of course, we were not the first ones to do it. With any luck, this article will help us be the last. As it turns out, as early as 2015 Rust’s standard library removed a similar API from its synchronous threading implementation. Reading that, I definitely took solace in the fact that much more experienced Rustaceans than I am also did not see this problem at first. As usual, things are not at all easy to see until they are discovered, at which point they become utterly obvious.

The original code in the report is a bit hard to follow, but it boils down to the fact that in Rust, there is no guarantee that destructors will ever run. While this can seem weird at first, it is not hard to see that there are valid cases in which that can happen, like resource leaks and cyclic reference counts. Those are, for sure, most likely bugs. But the whole point of Rust’s lifetime rules is that bugs, while a fact of life, should never cause invalid memory to be accessed.

A much easier to follow, albeit more artificial, example is as follows:

The call to std::mem::forget prevents the destructor from running. So it doesn’t, and our poor task that never completed never got cancelled either, and lived on to create mayhem and blow through Rust’s safety guarantees.
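
The mechanism can be reproduced without any executor. In the sketch below, CancelOnDrop is a hypothetical stand-in for a scoped task handle (it is not Glommio’s ScopedTask): it borrows a flag from its parent scope and relies on its destructor to do the cancelling.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical stand-in for a scoped task handle: it borrows state from its
// parent scope and "cancels" the task in its destructor.
struct CancelOnDrop<'a> {
    cancelled: &'a Cell<bool>,
}

impl<'a> Drop for CancelOnDrop<'a> {
    fn drop(&mut self) {
        self.cancelled.set(true);
    }
}

// Returns whether cancellation had run by the time the handle was gone.
fn cancelled_after(forget: bool) -> bool {
    let cancelled = Cell::new(false);
    let guard = CancelOnDrop { cancelled: &cancelled };
    if forget {
        // Entirely safe code: the handle is gone and the borrow has ended,
        // but the destructor never runs.
        mem::forget(guard);
    } else {
        drop(guard);
    }
    cancelled.get()
}
```

When the handle is forgotten, cancellation never happens even though no unsafe code was written. A real task in that position would keep running and could touch borrowed data after its owner was gone.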

What now?

We were very fortunate: although this unfortunately survived the review process, once it reached a wider audience through the release notes, one of our users and frequent contributors, who had apparently studied history, caught the problem and decided they would not sit and watch some idiot repeat it. The new version was yanked from crates.io, and a new one will follow in which this API is marked as unsafe until we figure out what to do.

This is also an opportunity for reflection. As I am getting closer to my first anniversary of writing Rust code, what does this tell me? On some level, I find it reassuring that things like this show how seriously the safety guarantees of Rust are taken by the community. A similar design was removed from the standard library despite protests, resisting the temptation of dismissing this example as ill-posed.

On the other hand, it is sad that the memory model forces many things in the asynchronous Rust ecosystem to impose higher costs than necessary. Having to rely on shared pointers is the example du jour, but there are also other well-known examples, such as the fact that using asynchronous traits forces a memory allocation.
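
To make the allocation concrete, here is a hedged sketch of the boxing workaround for asynchronous trait methods, written by hand with the standard library only. The Fetch trait, the Doubler type and the poll_once helper are all hypothetical names introduced for this illustration:

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait: the method returns a boxed, type-erased future, which
// costs one heap allocation per call.
trait Fetch {
    fn fetch<'a>(&'a self, key: u32) -> Pin<Box<dyn Future<Output = u32> + 'a>>;
}

struct Doubler;

impl Fetch for Doubler {
    fn fetch<'a>(&'a self, key: u32) -> Pin<Box<dyn Future<Output = u32> + 'a>> {
        // An `async move { ... }` block would be boxed in exactly the same way.
        Box::pin(ready(key * 2))
    }
}

// Waker that does nothing; enough to poll futures that never suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls a future once and returns its output if it completed immediately.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn fetch_doubled(key: u32) -> Option<u32> {
    let doubler = Doubler;
    poll_once(doubler.fetch(key))
}
```

The Box::pin call inside fetch is the allocation in question: every call pays for it, even when the future it wraps is trivial.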

All things considered, I still see a bright future for Rust. The async foundations working group seems well aware of those issues and is working on them diligently. They are definitely committed to making asynchronous Rust better. There is also the recent jaw-dropping news that Rust is poised to be the first language ever aside from C (well, and technically asm) to be used for writing Linux kernel code. As someone who wrote Linux kernel code for a living for the best part of a decade, I can say that’s no small feat.

For now, all I can hope for is that my recent adventures help you too to learn the history of this community, and avoid the doom of repeating it.