Safety Features of Rust

Now that you are familiar with the basic syntax of Rust, let's talk about what makes Rust different from other languages, namely, its safety features. Rust employs many features that safeguard you from writing code that might cause problems at run time. The language and the compiler perform many checks that push you toward writing a more reliable program. These very features sometimes make a program hard to compile. But you will soon find that once your program compiles, it is much more likely to work as you intended.

Variable Mutability

The first safety feature to discuss is variable mutability/immutability. By default, a variable in Rust is immutable, meaning that once you assign a value to a variable, you cannot assign another value to it. If you try to compile the following code, the compiler will complain. The error message basically says that a is an immutable variable and you cannot assign a value to it twice.

fn main() {
    let a: i32 = 1;
    println!("a is {}", a);

    a = 2;
    println!("a is {}", a);
}

In order to make a variable mutable, you need to declare a mutable variable like the following.

fn main() {
    let mut a: i32 = 1;
    println!("a is {}", a);

    a = 2;
    println!("a is {}", a);
}

You use the mut keyword to define a mutable variable that you can assign a value to more than once.

Since you define mutable variables explicitly, the Rust compiler knows which variables can be modified. Thus, the Rust compiler can statically check (i.e., check at compile time) if the mutable variables are the only ones modified in your code. For you as a Rust developer, this is a safety feature for a couple of reasons. First, it makes you think hard about whether or not you will need to modify a variable when you define it. In other words, it gives you an opportunity to think about how you intend to use each and every variable in your code. Second, it prevents you from defining a variable in one place with the assumption that you will not modify it, and later modifying the variable inadvertently.

In addition to defining a mutable variable with mut, you can also reuse the same variable name.

fn main() {
    let a: i32 = 1;
    println!("a is {}", a);

    let a: i32 = a + 1;
    println!("a is {}", a);

    let a: i64 = 3; // Different type
    println!("a is {}", a);
}

This is called shadowing, which effectively defines a new variable with the same name. This is useful in certain scenarios, e.g., when you have a big chunk of code that uses a variable heavily, but then later realize that you need to do some quick transformations on the variable before it gets heavily used. If that's the case, you can shadow the variable and still take advantage of the Rust compiler's immutability check.
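As a minimal sketch of the "quick transformation" case, the following example shadows a raw text input with its trimmed form and then with its parsed integer value. The function name parse_count and the input value are hypothetical and chosen only for illustration.

```rust
// Hypothetical helper: turns raw text into an integer count.
fn parse_count(raw: &str) -> i32 {
    let count = raw;          // `count` is the raw text.
    let count = count.trim(); // Shadowed: whitespace removed, still text.
    // Shadowed again with a different type; falls back to 0 if the text
    // is not a number.
    let count: i32 = count.parse().unwrap_or(0);
    count
}

fn main() {
    let count = parse_count("  42  ");
    println!("count is {}", count);
}
```

Each `count` is immutable, so the compiler still rejects any accidental reassignment between the transformation steps.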

The Rust book has an excellent section on variable mutability, so please go read it.

Ownership

Ownership is perhaps the most distinguishing feature of Rust that everybody talks about, and frankly it will give you some headaches when you try to get the Rust compiler to compile your code. However, it is an important feature of Rust that provides safety.

The whole concept has to do with how to manage memory. In languages like C/C++, the approach is to leave memory management to programmers. What it means is that C/C++ programmers need to allocate memory and free memory by themselves. This has caused many programs to suffer from memory leak problems since it is easy to allocate memory and not free it. Other languages like Java and Python use an automated memory management approach where a garbage collector runs from time to time to reclaim allocated memory that no longer is in use. This approach unburdens programmers from worrying about memory management but has a performance cost because a garbage collector needs to run, which interferes with the normal program execution.

Rust takes a different automated approach to memory management. When you define a variable, Rust allocates a piece of memory that the variable will use. This variable is called the owner of that allocated memory. Later, Rust deallocates the memory when its owner goes out of scope. Rust calls this dropping of memory. In addition, there are certain cases where Rust moves the ownership of a piece of memory from one variable to another. Thus, it is not always the case that the first variable that owns a piece of memory remains its owner the whole time. However, it is the case that there is only a single variable that is the owner of a piece of memory.

There are a few things to unpack here, so let's look at them one by one.

The Scope of a Variable

Let's first look at what a scope is for a variable. A variable's scope in Rust is similar to other languages and you can easily determine what it is by looking at the block where the variable is first defined. The following examples show two cases to illustrate what a scope is for a variable.

fn main() {
    let a: i32 = 1; // The scope for `a` starts here.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The scope for `a` ends here.

fn main() {
    {
        let a: i32 = 1; // The scope for `a` starts here.
    } // The scope for `a` ends here.
    println!("a is {}", a); // This fails to compile,
                            // because `a` is no longer valid.
}

As you can see from the examples, a variable has a scope and it is only valid within its scope. In fact, this is the way most other languages work as well. The difference is that Rust uses a variable's scope to automatically manage memory. In the above examples, when a goes out of scope, Rust drops a's memory automatically. At this point, you may think that it is still the way other languages work. You are correct. For stack-allocated memory such as local variables, you never have to worry about allocating or deallocating memory in other languages either. The difference for Rust is that you also do not need to worry about it for heap-allocated memory. Let's discuss this a little further. (If you need a refresher on the stack and the heap, please read the Rust book on the stack and the heap.)

Automated Heap Memory Management

In languages like C/C++, programmers allocate or deallocate heap memory by invoking memory management functions such as malloc() and free(). In languages like Java and Python, programmers do not allocate or deallocate heap memory explicitly because it is by default hidden from the programmers and there is a garbage collector that manages memory.

In Rust, heap allocation/deallocation is by default hidden from the programmers as well, and through the combination of the Rust compiler and Rust's standard library, Rust handles the allocation/deallocation of heap memory. Rust allocates heap memory through convenient data structures such as String, Vec, and Box. These data structures hide all the details of allocating heap memory. Of course, Rust is a low-level language, so you can allocate heap memory by yourself. But Rust developers typically do not use the heap that way.

Rust deallocates heap memory via a function called drop() and this is where ownership plays a critical role. When a variable that is the owner of a piece of heap memory goes out of scope, Rust invokes drop() automatically. By "automatically," we mean that the Rust compiler injects a piece of code that invokes drop(). The Rust compiler provides the default drop() implementation and deallocates the heap memory used by its owner.
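One way to observe when Rust invokes drop() is to give a type its own drop() through the standard Drop trait (traits are covered later). The sketch below is only an illustration: the type Noisy, the counter DROP_COUNT, and the function drops_after_inner_block() are hypothetical names, and the counter exists solely so that the drops can be counted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter: how many `Noisy` values have been dropped so far.
static DROP_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type that owns a heap-allocated `String`.
struct Noisy {
    name: String,
}

impl Drop for Noisy {
    // Rust calls this automatically when a `Noisy` value goes out of scope.
    fn drop(&mut self) {
        DROP_COUNT.fetch_add(1, Ordering::SeqCst);
        println!("dropping {}", self.name);
    }
}

// Returns how many drops happened before the function's own scope ended.
fn drops_after_inner_block() -> usize {
    let before = DROP_COUNT.load(Ordering::SeqCst);
    let _outer = Noisy { name: String::from("outer") };
    {
        let _inner = Noisy { name: String::from("inner") };
        println!("end of inner block");
    } // `_inner` goes out of scope here, so Rust calls its `drop()`.
    DROP_COUNT.load(Ordering::SeqCst) - before
} // `_outer` goes out of scope here, after the return value is computed.

fn main() {
    let count = drops_after_inner_block();
    println!("drops seen before the function returned: {}", count);
}
```

Note that the code never calls drop() itself. The calls are inserted at the closing braces where each owner goes out of scope.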

Mechanism-wise, this is similar to C/C++, which use memory allocation/deallocation functions (e.g., malloc() and free()). It is just that by default, Rust programmers do not need to invoke them by themselves. This is different from languages like Java or Python where a separate runtime component, i.e., a garbage collector, is used to manage memory.

The following examples use the Box data structure to allocate heap memory.

fn main() {
    let a = Box::new(1); // The heap memory for `a` gets allocated.
                         // Don't worry about the syntax for `Box` for now.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The heap memory for `a` gets deallocated.

fn main() {
    {
        let a = Box::new(1); // The heap memory for `a` gets allocated.
    } // The heap memory for `a` gets deallocated.
    println!("a is {}", a); // This fails to compile,
                            // because `a` is no longer valid.
}

Earlier we said that each allocated piece of memory in Rust has an owner and there is always a single owner. Since drop() by default takes care of deallocation when an owner goes out of scope, we mostly do not need to worry about memory leak problems. One caveat is that Rust does not prevent programmers from manually allocating and deallocating heap memory. Thus, it is possible to suffer from memory leaks when a programmer tries to manage memory explicitly and does not do a thorough job of it. However, Rust programmers typically do not choose to manage heap memory by themselves, so there is a low chance of getting into memory leak problems.

Determining Ownership

Since Rust calls drop() when an owner goes out of scope, it is absolutely critical to be able to determine whether or not a variable is the owner of a piece of memory. If a single variable accesses a piece of heap memory exclusively throughout a whole program, it is easy to determine the ownership. However, it is too restrictive to not allow two or more variables to access the same piece of memory. Thus, Rust employs a few mechanisms to keep track of ownership.

Move

By default, when you assign a variable to another variable, Rust moves the ownership. This is probably one of the most surprising aspects of Rust for a beginner. Let's look at the following code to see what this means.

fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.

    println!("b is {}", b); // This works fine.
}

fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.

    println!("a is {}", a); // This fails to compile,
                            // because `a` no longer has access to the `Box`.
}

As you can see, if you assign a variable to another variable, Rust no longer allows us to use the original variable. The same thing happens with function calls and return values.

fn main() {
    let a = String::from("a"); // Don't worry about the syntax for `String` for now.
    print_str(a); // `a` moves to the function `print_str()`.
}

fn print_str(x: String) { // `x` is the (new) owner of the string passed in.
    println!("The string is {}", x);
}

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // This fails to compile,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}

fn get_str() -> String {
    let x = String::from("a");
    x // `x` moves to the caller
}

fn main() {
    let a = get_str(); // `a` is the new owner of the String "a".
    println!("a is {}", a);
}

You might be wondering why this is necessary. Let's take another look at the function call example to understand further.

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // This fails to compile,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("The string is {}", x);
} // Since `x` is the owner, Rust deallocates the String at this point,
  // because `x` is out of scope.

In the code, you can see that x becomes the new owner of the String "a" and it goes out of scope when the function print_str() is done, so Rust drops the String at that point. Thus, a must not be able to access the memory location after the function returns. Otherwise, a would access memory that has already been dropped.

Generally speaking, if you have two different variables that can access the same heap location (called aliases), it can cause problems. For example, one variable can free the memory at one point while the other variable accesses the memory at some later point. This is called use-after-free and it is a well-known bug that can cause a vulnerability. Similarly, one variable can free the memory at one point and the other variable can free the same memory again at some later point. This is called double free and it is also a well-known bug that can cause a vulnerability. By moving the ownership and not allowing the original variable to access the value it had, Rust helps prevent problems caused by two variables accessing the same heap location.

However, you might think that this is too restrictive. For example, if you can't use variables every time you call a function and pass them as arguments, it will be very difficult to write a program. Thus, Rust provides many ways to help you deal with the restriction.

Copy and Clone

By default, primitive data types such as i32, i64, etc. do not move ownership. Instead, they just copy the value to a new memory location. The following example illustrates that.

fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b); // This does not move the ownership.

    println!("a is {}", a); // This works fine.
    println!("b is {}", b); // This works fine.
    println!("s is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    x + y // This does not move the ownership either.
}

As we can see, even if we pass a and b as arguments to sum(), we can still use them later. It is the same with the return value of sum(), although the code does not directly illustrate that. All this is because primitive data types copy instead of move, hence do not transfer ownership.

Rust distinguishes copy and move by looking at whether or not a data type implements something called the Copy trait (we will look at what a trait is later). Primitive scalar types such as integers, floating-point numbers, bool, and char implement the Copy trait, while data structures like String and Box do not. You can define your custom data structure and implement the Copy trait to use the copy semantics instead of the move semantics for your data structure, provided that every field of the data structure is itself Copy. A typical criterion to use when deciding whether or not you want to implement the Copy trait is the cost and complexity of copying. For example, primitive data types are small in size and the sizes are fixed. Thus, it is relatively inexpensive and easy to copy. However, String and Box point to a location on the heap, and the size of the data there is often not known a priori. Thus, it may not be easy or inexpensive to copy.

Another way to copy is cloning. If a data type implements the Clone trait, you can call clone() to explicitly create a duplicated object. This is different from copy because there has to be an explicit call.
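The difference between the two can be sketched as follows. The struct Point and the function shift_right() are hypothetical names used only for illustration. The #[derive(...)] line asks the compiler to generate the trait implementations, which works for Copy here because both fields are i32.

```rust
// Hypothetical type: both fields are `Copy`, so the struct can be `Copy` too.
// `Copy` requires `Clone`; `Debug` and `PartialEq` are only for printing
// and comparing.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Takes a `Point` by value. Because `Point` is `Copy`, the caller keeps its own.
fn shift_right(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = shift_right(p); // `p` is copied, not moved.
    println!("p is {:?}, q is {:?}", p, q); // `p` is still usable.

    let s = String::from("a");
    let t = s.clone(); // `String` is not `Copy`, so we duplicate it explicitly.
    println!("s is {}, t is {}", s, t); // Both are usable.
}
```

Without the call to clone(), `let t = s;` would move the String and the last println! would fail to compile.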

Borrow

Rust provides another alternative to move, which is called a borrow. This uses & to represent that a variable is borrowing a value from another variable.

fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
}

When you pass a variable to a function to borrow it instead of moving it, there are two things you need to do. First, you need to pass a variable and add &, and second, you need to use & in your function definition as part of the type for each borrow parameter. Similar to C/C++, & is called a reference, but in Rust, it's better to think of it as a borrow rather than a pointer.

Mutable Reference

One caveat for borrowing is that a plain & borrow is read-only.

fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This fails to compile since `s` is read-only.
    println!("The new string is {}", s);
}

Again, this is quite restrictive since you cannot modify the value coming in as an argument. Thus, Rust provides a mutable borrow.

fn main() {
    let mut str = String::from("a"); // `mut` is used.
    print_str(&mut str); // `&mut` is used.
    println!("Can still access str: {}", str);
}

fn print_str(s: &mut String) { // `&mut` is used.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This works now.
    println!("The new string is {}", s);
}

There are three different things here. First, when defining str, we use mut to represent that str has a mutable value. Second, when passing str to print_str(), we use &mut to represent that it is a mutable borrow, i.e., we are saying that print_str() not only borrows the value but also modifies the value. Third, in the parameter definition of s in print_str(), we use &mut to represent that print_str() modifies the value it is borrowing.

The Borrow Checker and the Aliasing XOR Mutability Principle

Mutable borrowing gives us flexibility of being able to modify a borrowed value within a function. However, it has a risk of data races. If you need a refresher on data races, please read the Rust book on mutable references, which explains the data race problem. In a nutshell, if two references have mutable access to the same memory location, then one can modify the value without the other knowing. Data races are known to be difficult to track down and fix.

Rust safeguards its programs from experiencing this problem by employing a principle commonly known as aliasing XOR mutability. It means that you get either aliasing or mutability, but not both. As mentioned earlier, aliasing means having two or more references to the same (heap) memory location. Mutability means having the ability to modify the value at a memory location. Thus, aliasing XOR mutability means that, at any given time, you have either exactly one mutable reference (a variable defined with &mut) or any number of read-only references (variables defined with just &), but not both. The following illustrates the principle.

fn main() {
    let a = String::from("a");
    let b = &a;
    let c = &a; // So far we have two additional references to `a`.
                // This is aliasing, which is fine, as long as
                // those references don't have mutability.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &a;
    let c = &mut a; // This is a problem because `b` is an alias,
                    // and `c` has mutability. This is
                    // both aliasing and mutability, not XOR.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &mut a;
    let c = &mut a; // This does not work either,
                    // because both `b` and `c` are mutable aliases.
                    // I.e., both aliasing and mutability, not XOR.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

The Rust compiler has a component called the borrow checker that enforces the aliasing XOR mutability principle at compile time. Oftentimes, borrow checking gives beginners a hard time, and people say they're "fighting the borrow checker" because the Rust compiler keeps rejecting a program due to borrow checking rules. Thus, it is important to understand how exactly borrow checking works. Practice is a must here, and also make sure you read the Rust book on borrow checking.
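One detail that helps when working with the borrow checker: in current Rust, a reference counts as in use only up to the last place it is used, not until the end of the enclosing block. The following sketch, with hypothetical function names append_suffix() and build(), compiles because the read-only references are finished before the mutable one is created.

```rust
// Hypothetical helper that needs mutable access to the string.
fn append_suffix(s: &mut String) {
    s.push_str("_added_more");
}

// Hypothetical function that mixes read-only and mutable borrows safely.
fn build() -> String {
    let mut a = String::from("a");

    let b = &a;
    let c = &a;
    println!("b is {}, c is {}", b, c); // Last use of `b` and `c`.

    let d = &mut a; // Fine: no read-only reference is used after this point.
    append_suffix(d);

    a // The mutable borrow has ended, so `a` can be returned.
}

fn main() {
    println!("a is {}", build());
}
```

If the println! for b and c were moved below the call to append_suffix(), the read-only and mutable borrows would overlap and the compiler would reject the program.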

`Option`

Another important safety aspect of Rust is its approach to handling variables with no values. If you have experience with programming, you probably know already that there are many cases where a variable does not have a meaningful value. In those cases, values like null or just plain 0 are used to represent that a variable doesn't have a meaningful value. However, this has led to numerous bugs and vulnerabilities since programmers often forget to handle null or 0 and get a runtime error, e.g., a null pointer exception.

In Rust, there is no null value that you can use. Instead, the standard Rust library provides an alternative called Option. Rust programmers use Option heavily and you can find it everywhere, e.g., the standard library, external crates, etc. Thus, it is absolutely critical to understand what Option is and how to use it.

Option is defined as follows.

enum Option<T> {
    None,
    Some(T),
}

The definition of Option uses enum, which is something we have not discussed yet. It is similar to enumeration types in other languages like C/C++ or Java. An enum defines a custom type and lists all possible values that a variable of that type can have. For example, the following code defines an enum type called Ex and it has two possibilities.

enum Ex {
    FirstPossibility,
    SecondPossibility,
}

These possibilities are called enum variants, and when using a variant from an enum, you need to use ::.

enum Ex {
    FirstPossibility,
    SecondPossibility,
}

fn main() {
    let a: Ex = Ex::FirstPossibility;
}

You can find more details in the Rust book's section on enum.

If you look at the Option definition, it defines an enum that has two variants, one is Option::None used when a variable does not have a meaningful value, and the other is Option::Some used when a variable does have a meaningful value.

Option::Some has a couple of additional details to discuss. First is the use of T found in Option<T> and Some(T). This T is called a generic type parameter (and it does not have to be the letter T). If you know the support for generics in other languages, such as C++ templates or Java generics, you can probably understand what it is quickly. T is a variable that can take a type instead of a value. What this means is that instead of defining Option for every single type there is, e.g., an Option for i32, an Option for i64, etc., we can define it once using a generic type variable and instantiate an Option for any type. In the Option definition above, a generic type variable T is used in Option<T> to declare that the enum Option is defined for all types.

The second detail is the definition Some(T). This declares that Some is a variant that should take a value of the type T. This is different from FirstPossibility or None in the above examples because it is a variant that expects a value of a certain type.

The following example demonstrates all of these.

fn main() {
    let option_some_for_i32: Option<i32> = Some(1);
    let option_none_for_i32: Option<i32> = None;
    let option_some_for_string: Option<String> = Some(String::from("str"));
}

You can find more details in the Rust book's section on generic data types.

By declaring a variable with Option, you are explicitly saying that a variable may or may not have a meaningful value and more importantly, you are forcing yourself to deal with both cases in your code.

In the above example, you might have noticed that Some and None are not used with Option::, i.e., not as Option::Some or Option::None but as Some and None. This is because Rust automatically imports the definitions so you can use them without having the Option:: qualifier. This is called the prelude, i.e., things that every Rust program automatically imports by default.
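To illustrate how Option makes you deal with both cases, here is a minimal sketch. The functions first_even() and describe() are hypothetical names used only for illustration. The match in describe() must cover both Some and None; leaving out either arm makes the match non-exhaustive and the compiler rejects it.

```rust
// Hypothetical helper: returns the first even number, if there is one.
fn first_even(numbers: &[i32]) -> Option<i32> {
    for &n in numbers {
        if n % 2 == 0 {
            return Some(n); // A meaningful value exists.
        }
    }
    None // No meaningful value to return.
}

// Hypothetical helper: handles both variants explicitly.
fn describe(value: Option<i32>) -> String {
    match value {
        Some(n) => format!("found {}", n),
        None => String::from("found nothing"),
    }
}

fn main() {
    println!("{}", describe(first_even(&[1, 3, 4, 5])));
    println!("{}", describe(first_even(&[1, 3])));
}
```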

There are a lot of details that we do not discuss here regarding Option and enum. Make sure you read the Rust book on enum and pattern matching as well as on generic data types.

`Result`

The last safety aspect of Rust to highlight is its approach to error handling. Some languages use values to represent an error condition, e.g., null or a negative integer such as -1. Other languages use an error reporting mechanism that is outside of regular return paths, e.g., throw and try-catch in Java. Rust unifies these two approaches and uses an enum called Result to return a value or report an error. Similar to Option, Rust programmers heavily use Result and you can find it everywhere. Thus, it is also critical to understand what Result is and how to use it.

The definition looks like the following.

enum Result<T, E> {
    Ok(T),
    Err(E),
}

The first variant Result::Ok represents a success with a value. The second variant Result::Err represents an error with an error value. Thus, Result is typically used as a return value type.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

A common way to work with Result (as well as Option) is to use match, which we discussed earlier. The following example also demonstrates the power of match for pattern matching, which was briefly mentioned earlier.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);

    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }

    let result = function_with_result(false);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }
}

As we can see, match not only recognizes that result is either Ok() or Err() but also assigns the value of Ok() (or Err()) to success_result (or error_result).

Another common way is to use if let, which is similar to match.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);

    if let Ok(success_result) = result {
        println!("Success: {}", success_result);
    } else {
        println!("Error");
    }

    let result = function_with_result(false);
    if let Err(error_result) = result {
        println!("Error: {}", error_result);
    } else {
        println!("Success");
    }
}

if let attempts to perform a pattern match and if it is successful, it executes the if let block. Otherwise it executes the else block.
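The same construct works for Option. The following is a small sketch with a hypothetical function describe_length(), used only to show if let matching on Some.

```rust
// Hypothetical helper: describes an optional name.
fn describe_length(name: Option<&str>) -> String {
    if let Some(n) = name {
        // The pattern matched, so `n` holds the value inside `Some`.
        format!("{} has {} characters", n, n.len())
    } else {
        // The value was `None`.
        String::from("no name given")
    }
}

fn main() {
    println!("{}", describe_length(Some("Rust")));
    println!("{}", describe_length(None));
}
```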

You can find more details on if let and pattern matching in the Rust book.

Similar to Option, by declaring a return type as Result, you are forcing yourself to handle both the success case and the error case. There are a lot of details about Result and error handling that we do not cover here, so please make sure you read the Rust book on error handling.
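Many standard library functions return Result as well. As a minimal sketch, the following uses the standard str::parse() method, which returns a Result, and handles both outcomes with match. The function parse_and_double() is a hypothetical name used only for illustration.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses text as an integer and doubles it.
fn parse_and_double(text: &str) -> Result<i32, ParseIntError> {
    match text.parse::<i32>() {
        Ok(n) => Ok(n * 2), // Parsing succeeded: return the doubled value.
        Err(e) => Err(e),   // Parsing failed: pass the error to the caller.
    }
}

fn main() {
    match parse_and_double("21") {
        Ok(n) => println!("Success: {}", n),
        Err(e) => println!("Error: {}", e),
    }

    match parse_and_double("abc") {
        Ok(n) => println!("Success: {}", n),
        Err(e) => println!("Error: {}", e),
    }
}
```

The caller cannot get at the integer without going through the Result, so the error case cannot be silently ignored the way a forgotten null check can.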

CMPT 479/982