Abusing Lifetimes for Perf and Memory Safety
A problem that I run into every once in a while is creating a reference to an item in a data structure that doesn’t hold a reference to the data structure so you can modify it while having the reference and has an API that prevents any kind of misuse.
For a vector, just handing out the index of an element almost works. It can be used to quickly get access to the
element in the vector but it can be easily misused and can cause a panic. The index is just a usize which is not
tied to the vector that returned it. You can easily pass the index from one vector into another vector without the type
system complaining at all.
A solution I’ve found to this problem is abusing lifetimes to brand an index or reference to a value so that it can only be used with the container that created it. I did not come up with this idea myself but learned about it from ghost_cell and this video from Gamozo.
That crate and video are great introductions to the idea but I found that it was impossible to really understand exactly how it worked and how to use the idea without trying to implement it myself.
Main Idea⌗
To tie one value to another, you have to create a new lifetime and use it as a field in the container and the reference type in such a way that the lifetime is invariant. It will look something like the following:
type Brand<'b> = PhantomData<fn(&'id ()) -> &'id ()>;
struct Container<'brand> {
data: Vec<u8>,
_brand: Brand<'brand>,
}
struct Reference<'brand> {
index: usize,
_brand: Brand<'brand>,
}
{{ note(header=“Unstable”, body=“There is an experimental type PhantomInvariantLifetime in the standard library that will hopefully be stabilized soon and can be used instaed of an ad hoc pantom data type.”)}}
Both Container and Reference use 'brand as an invariant lifetime which means that the lifetime may not be
shortened or extended any time it is used. This will make sure that the lifetime matches exactly and no other lifetime
can be used where 'brand is expected.
The API of Container will hand out a Reference with the matching lifetime and require a Reference with a matching
lifetime when accessing a value in the container.
impl<'b> Container<'b> {
pub fn push(&mut self, value: u8) -> Reference<'b> {
let idx = self.data.len();
self.data.push(value);
Reference {
index: idx,
_brand: PhantomData,
}
}
pub fn get(&self, r: Reference<'b>) -> &u8 {
// SAFETY: `r` must be a reference that was produced by this container so it has a valid index. That index is still
// valid because this container does not expose a way to remove or rearrange elements.
unsafe { self.data.get_unchecked(r.index) }
}
}
The safety comment in this code example adds some of the other requirements for this to work. Using the branded type
will ensure that a reference is always used with the correct container instance but will not ensure that the index is
still valid when it is used. To make sure that the index is not invalidated between when it was created and when it is
used, the backing Vec cannot have any elements removed from it. Otherwise, a previously valid index could bast the
end of the vector and access dangling or freed memory.
Creating a Lifetime⌗
One of the main drawbacks of using lifetimes to brand types is that you need to create a new lifetime that will be used as the branding lifetime.
I don’t understand this part perfectly, but what I do know is that the only way to guarantee the creation of a new
lifetime in rust right now is to use a closure and the for<'a> syntax. In a small project where I was testing this
out, I had an interface like the following:
pub fn update_vec<'v, U>(data: &'v mut Vec<u8>, update: U)
where
U: for<'b> FnOnce(SafeVec<'b, 'v>)
{
let sv = SafeVec {
data, _brand: PhantomData,
};
update(sv);
}
pub struct SafeVec<'b, 'v> {
pub(crate) data: &'v mut Vec<u8>,
_brand: Brand<'b>,
}
impl<'b, 'v> SafeVec<'b, 'v> {
pub fn push(&mut self, val: u8) -> SafeIndex {
let idx = self.data.len();
self.data.push(val);
SafeIndex {
idx, _brand: PhantomData,
}
}
pub fn get_mut(&mut self, idx: SafeIndex<'b>) -> &mut u8 {
unsafe { self.data.get_unchecked_mut(idx.idx) }
}
}
pub struct SafeIndex {
pub(crate) idx: usize,
_brand: Brand<'b>,
}
Using the closure syntax, a new lifetime can be created which will brand the safe version of the vector and allow it to create indexes that are branded with the same lifetime.
An additionaly bit of safety that needs to be kept in mind is that the end user should not be able to create the safe version of the container or any instance of a safe index. That could allow the user to violate the safety assumptions that all instances were created with the required constraints.
An example of using it:
fn main() {
let data = Vec::new();
update_vec(&mut data, |sv| {
let first = sv.push(1);
let second = sv.push(2);
*sv.get_mut(first) = 10;
*sv.get_mut(second) = 20;
});
}
And the following will fail to compile:
fn main() {
let vec1 = Vec::new();
let vec2 = Vec::new();
update_vec(&mut vec1, |sv1| {
update_vec(&mut vec2, |sv2| {
let first = sv1.push(10);
*sv2.get_mut(first) = 20;
});
});
}
The error message is not the best as it generates a scary sounding error about borrowed data escaping outside of a closure. However, it’s better to have some error message when something goes wrong than to have something compile and then cause weird behavior at runtime.
I have an example of creating a more extensive api using branded types here. One thing I noticed while writing it is that you either have to implement the API twice for the “unsafe” version and again for the “safe” version or decide how to split up which methods are usable on which version of the API.