Lifetimes

In Safety Features of Rust, we looked at one borrow checker rule that Rust enforces, namely aliasing XOR mutability. Rust enforces a second borrow checker rule as well: a referent must outlive every reference to it. We have already seen this rule at work with moves.

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // Compile error: `a` was moved into `print_str()`,
                            // so it can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}

The above example is the same one we saw in Safety Features of Rust. Here, a plays the role of a reference and the String containing "a" is its referent (strictly speaking, a is the owning variable rather than a & reference, but the same reasoning applies). The string is deallocated when print_str() returns. If the move did not invalidate a, then a would outlive its referent, and using it would be a use-after-free.
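
One way to keep a usable after the call, sketched here as an illustration rather than as the only fix, is to lend the string to the function instead of moving it. The function name print_str_borrowed and its return value are hypothetical additions that are not part of the original example.

```rust
// Hypothetical variant of `print_str()` that borrows its argument
// instead of taking ownership of it.
fn print_str_borrowed(x: &str) -> usize {
    println!("String {}", x);
    x.len() // Returned only so that the effect is easy to check.
}

fn main() {
    let a = String::from("a");
    let len = print_str_borrowed(&a); // `a` is only borrowed, not moved.

    // `a` still owns the string, so this line compiles.
    println!("a is {} (length {})", a, len);
}
```

Because the string is still owned by a, it is deallocated only when a goes out of scope at the end of main(), after the borrow has ended.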

In general, in order to make sure that a referent does outlive its reference, Rust needs to know how long a referent would live for (i.e., not be deallocated) and how long a reference would live (i.e., be pointing to a valid object in memory). The problem is that this is sometimes difficult to infer and Rust asks programmers to tell the compiler using a concept called lifetimes. Below, we will first look at functions and how lifetimes are important for functions. Later, we will look at other uses of lifetimes.

The Problem with References in Functions

Lifetimes are frequently used in functions, and most of the time Rust is able to take care of lifetimes automatically so programmers do not need to worry about them. However, that is not always the case. Let's look at the following example to understand this further.

The example below uses &str which is called a string slice. You can read more about it in the Rust book.

fn ref_return(x: &str) -> &str {
    // Assume that this function does some complicated things
    // and returns a string slice.
}

fn main() {
    let s = String::from("string");

    let res = ref_return(&s);
    // Assume that the rest of the code does things with `res`.
}

In the above code, res is a reference (a string slice) returned from ref_return(), and the string data it points to is the referent. Thus, Rust needs to check whether the referent outlives the reference. Determining how long the reference (res) lives is easy: it lives until the end of the main() function. However, it is not easy to determine how long the referent lives. It comes from ref_return(), and generally speaking, unless you execute the function, you won't know what it returns and which memory location res will point to. Thus, from the call site alone, Rust is unable to check whether the referent outlives the reference.
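
To make the difficulty concrete, here are two hypothetical bodies that a function like ref_return() could have. Both function names and both bodies are assumptions made for illustration. In the first, the result points into the caller's string. In the second, it points at unrelated data.

```rust
// Hypothetical body 1: the result points into the caller's string.
fn ref_return_from_input(x: &str) -> &str {
    x.trim()
}

// Hypothetical body 2: the result ignores the input and points at a
// string literal stored in the program binary.
fn ref_return_from_literal(_x: &str) -> &'static str {
    "fallback"
}

fn main() {
    let s = String::from("  string  ");

    let res1 = ref_return_from_input(&s); // Valid only while `s` is alive.
    let res2 = ref_return_from_literal(&s); // Valid regardless of `s`.

    println!("{:?} {:?}", res1, res2);
}
```

The two calls look identical at the call site, yet the referents have very different lifespans. This is why the compiler relies on the function signature rather than on the function body.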

Lifetimes in Functions

Due to the above problem, if a function returns a reference, Rust asks programmers to tell the compiler how long the returned reference is valid. This duration is called the reference's lifetime. There is an important thing to keep in mind, however. Rust does not ask programmers to spell out a concrete lifetime. Instead, Rust only asks programmers to state how the lifetime of the returned reference relates to the lifetimes of the input parameters. It would be very difficult, if not impossible, for a programmer to determine exactly how long a reference would be valid, and Rust does not ask for that.

Let's look at a more concrete example to understand what this means. The example below is from the Rust book.

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

If you compile the code, the compiler will complain that the return type is missing a lifetime specifier. The error message will tell you what lifetime parameters look like and what to do. A lifetime parameter looks like 'a: an apostrophe followed by the name of the parameter. It represents the duration for which a reference is valid, and it comes right after the &.

Using lifetime parameters, programmers tell the compiler how long a returned reference is valid. As mentioned earlier, Rust does not expect programmers to work out a concrete duration by hand. All Rust expects is how long the returned reference is valid in relation to the input parameters. Let's take a look at the following example.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

There are a few things to note in the above code. First, <'a> uses a syntax similar to the one we saw for generics in More on Struct and Traits. It declares 'a as a generic lifetime parameter, i.e., it stands for any lifetime, not a particular one. We then use 'a for both input parameters and the return type, which tells the compiler that the returned reference is valid for as long as both x and y are valid. This expresses the relationship between the input parameters' lifetimes and the return value's lifetime. As mentioned earlier, Rust doesn't ask programmers to specify a concrete lifetime. It only asks programmers to show what the relationships are. For a function that returns a reference, this relationship must always be known, either because you write it out or because the compiler can infer it. Since the return value is either x or y, telling the Rust compiler that the input parameters and the return value share a lifetime is exactly what we want to express.

If we called this function from another function, Rust would be able to see that the return value would be valid as long as input parameters are valid. Using this information, Rust would be able to check if a referent would outlive its references.
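
The following is a minimal sketch of such a call site, modeled on the kind of example the Rust book uses. The variable names outer, inner, and result are chosen here for illustration.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let outer = String::from("a long string");
    {
        let inner = String::from("short");
        // `result` may point at either argument, so it may only be used
        // while both `outer` and `inner` are alive.
        let result = longest(&outer, &inner);
        println!("The longest string is {}", result);
    } // `inner` is dropped here.

    // If `result` were declared in the outer scope and printed below this
    // point, the compiler would reject the program, because `inner` would
    // not live long enough.
}
```

The compiler reaches this conclusion from the signature of longest() alone, without looking at which branch the function takes at run time.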

`'static`

There is a special lifetime called the static lifetime, meaning that the reference can live "during the entire duration of the program". The best example is a string literal that is embedded in a program's binary. Since it is always accessible, the lifetime of a string literal is always 'static.

fn string_literal() -> &'static str {
    let literal = "a string literal"; // This string is embedded in the program binary.
    literal
}

Oftentimes, the Rust compiler's error messages suggest using 'static, and doing so is an easy way to satisfy the compiler and get your code compiled. However, 'static is frequently not what you should use. Thus, it is important to think hard about whether 'static is actually appropriate before using it.
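
As a rough sketch of the difference, the hypothetical function greeting() below returns string literals, so its result is not tied to any scope. The commented-out lines show a case where 'static would be the wrong annotation, since the data is created at run time and dropped at the end of its scope.

```rust
// Hypothetical helper: both branches return string literals, so the
// result is valid for the whole duration of the program.
fn greeting(formal: bool) -> &'static str {
    if formal {
        "Good day"
    } else {
        "Hi"
    }
}

fn main() {
    let kept: &'static str;
    {
        // The literal is not tied to this inner scope.
        kept = greeting(true);
    }
    println!("{}", kept);

    // By contrast, a slice of a `String` created at run time is not `'static`.
    // The following would be rejected because `owned` does not live long enough:
    // let owned = String::from("temporary");
    // let bad: &'static str = &owned;
}
```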

The Implication of Lifetimes in Functions

As the above examples show, when you return a reference, you have to tell the Rust compiler how the lifetime of the reference is related to the lifetimes of the input parameters. This has an interesting implication: a function can only return a reference that is derived from its input arguments (or one that is 'static, such as a string literal). For example, suppose you have a custom struct and you create an instance of it within a function. You cannot return a reference to the newly-created instance of the struct.

There are actually two reasons why you cannot return a reference to a newly-created instance of a struct. One is that you cannot express the lifetime of the reference in terms of any input parameter, and the other is that the instance gets dropped at the end of the function, so the reference would dangle.

When you feel the need to create a new object and return a reference to it, you need to return an owned value instead, which transfers ownership to the caller. Examples of types that own their data are Box, Vec, and String.

fn box_creation_and_return() -> Box<i32> {
    let a: i32 = 1;
    Box::new(a) // Ownership of the box moves to the caller.
}
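
The same idea applies to a custom struct, as in the sketch below. The struct Point and the functions make_point() and make_label() are hypothetical names introduced only for this illustration.

```rust
// Hypothetical struct used only for illustration.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Returning the value itself moves ownership to the caller.
fn make_point() -> Point {
    Point { x: 1, y: 2 }
}

// A `String` owns its heap buffer, so it can be returned as well.
fn make_label() -> String {
    let mut label = String::from("point");
    label.push_str("-1");
    label
}

// This version would be rejected. There is no input lifetime to relate
// the result to, and `p` is dropped when the function returns:
// fn make_point_ref() -> &Point {
//     let p = Point { x: 1, y: 2 };
//     &p
// }

fn main() {
    let p = make_point();
    let label = make_label();
    println!("{}: ({}, {})", label, p.x, p.y);
}
```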

When Do We Need Lifetime Parameters?

If you read the Rust book or the above description of lifetimes, you may get the impression that lifetimes are optional. This is not true: every reference has a lifetime. The reason you don't see lifetimes written out all the time is that the Rust compiler is often smart enough to fill them in for you. One place where it does not do so is type definitions. If you use a reference in a struct or enum, you have to write a lifetime.

struct ProblematicStruct {
    str_slice: &str,
}

The above does not work because it has a reference as a field and there is no lifetime. The following fixes it.

struct ProblematicStruct<'a> {
    str_slice: &'a str,
}

As mentioned earlier, <'a> means that you're using a generic lifetime parameter ('a) in the definition of the struct. You can then use it for the reference field. In function signatures, however, Rust is quite often able to do the work for you. This is called lifetime elision.
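
A small usage sketch may help here. The struct name SliceHolder and the function first_word() are hypothetical. The point is that an instance of such a struct cannot outlive the string it borrows from.

```rust
// Hypothetical struct that holds a borrowed string slice.
struct SliceHolder<'a> {
    str_slice: &'a str,
}

// Hypothetical helper: the returned struct borrows from `text`, so both
// share the lifetime `'a`.
fn first_word<'a>(text: &'a str) -> SliceHolder<'a> {
    let word = text.split_whitespace().next().unwrap_or("");
    SliceHolder { str_slice: word }
}

fn main() {
    let text = String::from("hello world");
    let holder = first_word(&text);
    // `holder` may be used only while `text` is still alive.
    println!("{}", holder.str_slice);
}
```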

There is a simple algorithm that the Rust compiler uses to determine what lifetimes should be, and if the algorithm cannot determine lifetimes, the compiler throws an error. Then you need to manually annotate lifetimes. The algorithm works as follows.

First, the algorithm assigns a distinct lifetime parameter to each reference among the input parameters. For example, if fn ex_func(x: &str, y: &str) -> &str {...} is the function, then the algorithm assigns 'a to x and 'b to y like this: fn ex_func<'a, 'b>(x: &'a str, y: &'b str) -> &str {...}.
Second, if there is exactly one input lifetime (e.g., x is the only reference parameter), then that lifetime is assigned to all output references. For example, if the function is fn ex_func(x: &str) -> &str {...}, then the algorithm assigns 'a to both the input parameter and the return reference like this: fn ex_func<'a>(x: &'a str) -> &'a str {...}.
Third, if there are multiple input lifetimes but one of them belongs to &self or &mut self (i.e., the function is a method), all output references get the same lifetime as self.

In the above longest() example, since there are two reference parameters, the algorithm assigns 'a to the first parameter and 'b to the second. After that, there is nothing else the algorithm can do. The second rule doesn't apply because there is more than one input lifetime. The third rule doesn't apply either because there is no self reference. Thus, the algorithm fails to determine the lifetime of the return value, and the compiler asks for manual annotation.
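
For contrast, the sketch below shows two signatures where elision does succeed. The names first_field(), first_field_explicit(), Named, and label() are hypothetical and serve only to illustrate the second and third rules.

```rust
// Second rule: there is exactly one input lifetime, so the returned
// reference is assumed to borrow from `x`.
fn first_field(x: &str) -> &str {
    x.split(',').next().unwrap_or(x)
}

// The same signature with the lifetime written out by hand.
fn first_field_explicit<'a>(x: &'a str) -> &'a str {
    first_field(x)
}

// Hypothetical struct used to illustrate the third rule.
struct Named {
    name: String,
}

impl Named {
    // Third rule: there are two input lifetimes (`&self` and `_prefix`),
    // but the returned reference gets the lifetime of `&self`.
    fn label(&self, _prefix: &str) -> &str {
        &self.name
    }
}

fn main() {
    let line = String::from("alice,30");
    println!("{}", first_field(&line));
    println!("{}", first_field_explicit(&line));

    let n = Named { name: String::from("hits") };
    println!("{}", n.label("ignored"));
}
```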

CMPT 479/982