A curiously recurring lifetime issue(blog.dureuill.net) |
A curiously recurring lifetime issue(blog.dureuill.net) |
The main innovation of Cap'n Proto serialization compared to Protobuf is that it doesn't copy anything, it generates a nice API where all the accessor methods are directly backed by the underlying buffer. Hence the generated classes that you use all act as "views" into the single buffer.
C++, meanwhile, is famously an RAII lanugage, not garbage-collected. In such languages, you have to keep track of which things own which other things so that everyone knows who is responsible for freeing memory.
Thus in an RAII language, you generally don't expect view types to own the underlying data -- you must separately ensure that whatever does own the backing data structure stays alive. C++ infamously doesn't really help you with this job -- unlike Rust, which has a type system capable of catching mistakes at compile time.
You might argue that backing buffers should be reference counted and all of Cap'n Proto's view types should hold a refcount on the buffer. However, this creates new footguns. Should the refcounting be atomic? If so, it's really slow. If not, then using a message from multiple threads (even without modifying it) may occasionally blow up. Also, refcounting would have to keep the entire backing buffer alive if any one object is pointing at it. This can lead to hard-to-understand memory bloat.
In short, the design of Cap'n Proto's C++ API is a natural result of what it implements, and the language it is implemented in. It is well-documented that all these types are "pointer-like", behaving as views. This kind of API is very common in C++, especially high-performing C++. New projects should absolutely choose Rust instead of C++ to avoid these kinds of footguns.
In my experience each new developer makes this mistake once, figures it out, and doesn't have much trouble using the API after that.
is it not possible to delete the rvalue reference overload of ‘getList’?
as far as i can tell, the error producing code wouldn’t have produced a diagnostic, but failed to build in the first instance, like the rust case?
int i = call.send().getSomeStruct().getValue();
Here, even though `send()` returns a response that is not saved anywhere, and a struct reader is constructed from it, the struct reader is used immediately in the same line, so there's no use-after-free.Someone else mentioned using lifetimebound annotations. This will probably work a lot better, avoiding the false positives. It just hadn't been done because the annotations didn't exist at the time that most of Cap'n Proto was originally written.
It absolutely is. It's a fairly basic principle that APIs should be difficult to misuse, and that fact that you made the same mistake 2/3 times shows that it is very easy to misuse. In other words it is a badly designed API. At the very least it should be called ListView.
I am not a big fan of CapnProto. It has some neat features but it's very complicated, full of footguns like this, and the API is extremely unergonomic.
I haven't used non-owning types like string_view or span too much because I haven't needed that level of performance or memory optimization yet, and so those just seem like footguns as compared to just a reference without those needs. I do like to use a technique in classes that use non-owning references that would work for those too to prevent this particular problem.
For that, there are two methods with the same name, but different access - an lvalue version and an rvalue version. Then, you delete the rvalue method like this:
class Response {
auto getListView() & -> ListView {
return ListView(m_List);
}
void getListView() && = delete;
}
Then you get a compile error like in Rust when you try to call getListView() from a temporary object, but if you call the method from an lvalue it still works at least as long as the object is in scope.I don't agree. The API is what it is because it is specifically a zero copy API for performance. If you don't care about performance, why are you using C++ (stupid) and a zero-copy API (doubly stupid)?
I absolutely do NOT expect a zero copy API to own things. If I drop the underlying reference that is really an alias, how on earth is that the fault of the zero copy API?
The combination of aliasing and lifetimes are C++ footguns--full stop. This is aptly demonstrated by how quickly Rust kills this cold.
If you use sharp knives, sometimes you cut your fingers. People like you would claim the knife is the problem.
> If you use sharp knives, sometimes you cut your fingers. People like you would claim the knife is the problem.
This is more like a cutting yourself on a razor sharp butter knife. If it's a sharp knife it should look like a sharp knife.
I have a general rule that "resource" types which own a heap allocation should usually be given a variable name with explicit scope (and likely even an explicit type, rather than `auto response` like in this post). This is a general guideline to avoid holding a reference to a temporary that gets destroyed, but doesn't protect against returning a dangling reference into a resource type from a function.
In other places, where languages make the opposite decision (from this blog post) to extend the lifetime of a temporary variable with a destructor when you call methods on it, you get things like C++'s temporary lifetime extension (not a bug, note that I don't understand it well), and footguns like Rust's `match lock.lock().something {}` (https://github.com/rust-lang/lang-team/blob/master/design-me...).
I see no *, no & and no -> in the code. So I would assume everything to behave as if it was owned or even copied. Had it returned actual pointers, or pointer-like objects like iterators, it would have been more obvious.
auto x = y();
Is x a pointer or reference? There’s no way to tell. Maybe if you then do x->foo();
You have some idea that x is pointer-ish, but unique_ptr works like this and isn’t very pointer-ish.Rather than use a non-owning reference I'd rather use a design that didn't need it, or just use a std::shared_ptr owning reference instead. I realize there are potential cases (i.e. one can choose to design such cases) of circular references where a non-owning reference might be needed to break the circular chain, or where one wants a non-owning view of a data structure, but without very careful API design and code review these are easy to mess up.
1. Write a test program that exercises your segfault.
2. Build it in debug mode.
3. Run `valgrind <my-program>`
And it just tells you where your problem is. Not much more to it.
But if you would like, I could post one to my blog.
If you would like me to, contact me privately. [1]
FWIW I don't work on an "app", I work on a cloud platform that hosts web apps, operating in hundreds of datacenters and serving millions of requests per second. Small optimizations are worth a lot of money here -- and garbage collection is a nonstarter in this kind of systems work, for more reasons than just performance.
Maybe you're advocating for Rust, but in C++ you can of course always fall back to raw pointers for efficiency, while retaining well defined (e.g. class/scope based) lifetimes to avoid bugs.
This is certainly what I'd recommend wherever possible. It's OK to give Foo a reference to Bar as long as Foo is declared after Bar in some common scope, so will be destroyed first. If you find it's overly difficult to prove this relationship then you need to refactor your code or use some reference counting.
And then of course Rust actually checks the relationships for you.
myReader.getFoo().getBar()
Here, `myReader` is already a view type; ownership of the backing buffer lives elsewhere. `getFoo()` returns a reader for some sub-struct, and `getBar()` returns a member of that struct. If we say getters are not permitted to be called on rvalues, this expression is illegal, but there's no actual problem with it and in practice we write code like this all the time.regardless of that outcome, i think i’d prefer to require a value preserving the lifetime of the reader/view. in the cases that it may not be necessary, i'd prefer to lean on the optimiser to take care of it..!
> i think i’d prefer to require a value preserving the lifetime of the reader/view. in the cases that it may not be necessary, i'd prefer to lean on the optimiser to take care of it..!
We'd all prefer APIs that cannot be used unsafely but realistically there's no magic the optimizer can do to make the problems with refcounting go away. You need to use a language like Rust to solve this.
perhaps for values like this you’re fine. i think my point still stands about the reader of a built-in list/sequence type, surely?
and, not to sound facetious, that’s exactly what optimisers do :)
the c++ type system is more than capable about reasoning about lifetimes, the issue is that, with c++, it’s an optional part of the language. also, the lack of non-destructive moves. but to require both of those things in the language would require, essentially, the borrow checker in rust.
E.g. if you do:
{
auto foo = std::make_shared<Foo>();
bar->baz(foo);
}
The compiler has to know what `baz()` does in order to know whether it can elide heap allocation and refcounting of `foo`. `baz()` could, after all, add a refcount on `foo` and keep it somewhere.If `baz()` is virtual, or just implemented in a source file that the compiler cannot see at the time of compiling the calling code, then there's no ability to optimize at all. Even if the compiler does know the full implementation of `baz()`, eliding the heap allocation is not going to be easy. Maybe if `baz()` is very simple, it can do it? I actually don't know if the compiler is even capable of this when using shared_ptr.
Of course you can always say "well a sufficiently smart compiler could reason about your whole codebase including every implementation of a virtual call" but we program to the compiler we have, not the one we want. And frankly, if you had a compiler that smart it would be able to detect your use-after-free bugs and warn about them, so you wouldn't need to use shared_ptr everywehre.
of course, across the TU boundary things get difficult, but i don’t think it’s fair to dismiss LTO entirely (although.. i agree with the thesis that it’s not particularly.. good)
similarly, de-virtualisation is an optimisation technique compilers will aggressively use to improve performance, although you’re right that it can’t look through another source file, so it is not without limitations here.
but we’re not being general, we’re being specific; the safety issues that are being discussed are well within the remit of the c++ type system here, and i don’t think we’re doing any favours to anyone by letting this rvalue be accessed in this way. it is certainly not idiomatic to provide library code that can so violently implode with seemingly regular use. i find it difficult to believe that lifetime issues like this are undiagnosable.