Safety Features of Rust
Now that you are familiar with the basic syntax of Rust, let's talk about what makes Rust different from other languages, namely, its safety features. Rust employs many features that safeguard you from writing code that might cause problems at run time. The Rust language and compiler perform many checks to make sure that you are writing a more reliable program. These very features sometimes make a program hard to compile. But you will soon realize that once your program compiles, it often just works as you intended.
Variable Mutability
The first safety feature to discuss is variable mutability/immutability. By default, a variable in Rust is immutable, meaning that once you assign a value to a variable, you cannot assign another value to it. If you try to compile the following code, the compiler will complain. The error message basically says that the variable a is immutable and you cannot assign to it twice.
```rust
fn main() {
    let a: i32 = 1;
    println!("a is {}", a);
    a = 2; // Error: cannot assign twice to immutable variable `a`.
    println!("a is {}", a);
}
```
In order to make a variable mutable, you need to declare a mutable variable like the following.
```rust
fn main() {
    let mut a: i32 = 1;
    println!("a is {}", a);
    a = 2;
    println!("a is {}", a);
}
```
You use the mut keyword to define a mutable variable that you can assign a value to more than once.
Since you define mutable variables explicitly, the Rust compiler knows which variables can be modified. Thus, the Rust compiler can statically check (i.e., check at compile time) that only mutable variables are modified in your code. For you as a Rust developer, this is a safety feature for a couple of reasons. First, it makes you think hard about whether or not you will need to modify a variable when you define it. In other words, it gives you an opportunity to think about how you intend to use each and every variable in your code. Second, it prevents you from defining a variable in one place with the assumption that you will not modify it, and later modifying the variable inadvertently.
In addition to defining a mutable variable with mut, you can also reuse the same variable name.
```rust
fn main() {
    let a: i32 = 1;
    println!("a is {}", a);
    let a: i32 = a + 1;
    println!("a is {}", a);
    let a: i64 = 3; // Different type
    println!("a is {}", a);
}
```
This is called shadowing, which effectively defines a new variable with the same name. This is useful in certain scenarios, e.g., when you have a big chunk of code that uses a variable heavily, but then realize that you need to do some quick transformations on the variable before it gets used. In that case, you can shadow the variable and still take advantage of the Rust compiler's immutability check.
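Two properties of shadowing are worth seeing side by side: a shadow declared inside an inner block only lasts until the end of that block, and, unlike assigning to a mut variable, shadowing can change the variable's type. The following is a small sketch; the function names shadow_in_block() and parse_count() are made up for illustration.

```rust
// A shadow declared in an inner block ends with that block.
fn shadow_in_block() -> (i32, i32) {
    let x = 5;
    let inner = {
        let x = x * 2; // This `x` shadows the outer `x` only inside this block.
        x
    };
    (inner, x) // Back in the outer scope, `x` is still 5.
}

// Shadowing can change the type, which assigning to a `mut` variable cannot.
fn parse_count() -> i32 {
    let count = " 42 "; // `count` is a `&str` here.
    let count: i32 = count.trim().parse().expect("not a number"); // Now it is an `i32`.
    count + 1
}

fn main() {
    println!("{:?}", shadow_in_block()); // (10, 5)
    println!("{}", parse_count()); // 43
}
```

If count had been declared with let mut instead, the second line would have to be a plain assignment, and the compiler rejects assigning an i32 to a variable whose type is &str.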
The Rust book has an excellent section on variable mutability, so please go read it.
Ownership
Ownership is perhaps the most distinguishing feature of Rust that everybody talks about, and frankly it will give you some headaches when you try to get the Rust compiler to compile your code. However, it is an important feature of Rust that provides safety.
The whole concept has to do with how to manage memory. In languages like C/C++, the approach is to leave memory management to programmers. What it means is that C/C++ programmers need to allocate memory and free memory by themselves. This has caused many programs to suffer from memory leak problems since it is easy to allocate memory and not free it. Other languages like Java and Python use an automated memory management approach where a garbage collector runs from time to time to reclaim allocated memory that is no longer in use. This approach unburdens programmers from worrying about memory management but has a performance cost because a garbage collector needs to run, which interferes with normal program execution.
Rust takes a different automated approach to memory management. When you define a variable, Rust allocates a piece of memory that the variable will use. This variable is called the owner of that allocated memory. Later, when its owner goes out of scope, Rust deallocates the memory. Rust calls this dropping. In addition, there are certain cases where Rust moves the ownership of a piece of memory from one variable to another. Thus, the first variable that owns a piece of memory does not always remain its owner the whole time. However, at any given time, a piece of memory has exactly one owner.
There are a few things to unpack here, so let's look at them one by one.
The Scope of a Variable
Let's first look at what a scope is for a variable. A variable's scope in Rust is similar to other languages and you can easily determine what it is by looking at the block where the variable is first defined. The following examples show two cases to illustrate what a scope is for a variable.
```rust
fn main() {
    let a: i32 = 1; // The scope for `a` starts here.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The scope for `a` ends here.
```
```rust
fn main() {
    {
        let a: i32 = 1; // The scope for `a` starts here.
    } // The scope for `a` ends here.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}
```
As you can see from the examples, a variable has a scope and it is only valid within its scope. In fact, this is the way most other languages work as well. The difference is that Rust uses a variable's scope to automatically manage memory. In the above examples, when a goes out of scope, Rust drops a's memory automatically. At this point, you may think that this is still the way other languages work, and you are correct. For stack-allocated memory such as local variables, you never have to worry about allocating or deallocating memory in other languages either. The difference is that in Rust, you also do not need to worry about it for heap-allocated memory. Let's discuss this a little further. (If you need a refresher on the stack and the heap, please read the Rust book's section on the stack and the heap.)
Automated Heap Memory Management
In languages like C/C++, programmers allocate or deallocate heap memory by invoking memory management functions such as malloc() and free(). In languages like Java and Python, programmers do not allocate or deallocate heap memory explicitly because it is hidden from them by default and a garbage collector manages memory.
In Rust, heap allocation/deallocation is by default hidden from programmers as well; through the combination of the Rust compiler and Rust's standard library, Rust handles the allocation/deallocation of heap memory. Rust allocates heap memory through convenient data structures such as String, Vec, and Box. These data structures hide all the details of allocating heap memory. Of course, Rust is a low-level language, so you can allocate heap memory by yourself. But Rust developers typically do not use the heap that way.
Rust deallocates heap memory by dropping values, and this is where ownership plays a critical role. When a variable that is the owner of a piece of heap memory goes out of scope, Rust drops the value automatically. By "automatically," we mean that the Rust compiler injects the code that performs the drop at the end of the owner's scope. For heap-managing types like String, Vec, and Box, the standard library supplies the drop logic (through the Drop trait and its drop() method) that deallocates the heap memory owned by the value.
Mechanism-wise, this is similar to C/C++, which use memory allocation/deallocation functions (e.g., malloc() and free()). It is just that, by default, Rust programmers do not need to invoke them by themselves. This is different from languages like Java or Python, where a separate runtime component, i.e., a garbage collector, is used to manage memory.
The following examples use the Box data structure to allocate heap memory.
```rust
fn main() {
    let a = Box::new(1); // The heap memory for `a` gets allocated.
                         // Don't worry about the syntax for `Box` for now.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The heap memory for `a` gets deallocated.
```
```rust
fn main() {
    {
        let a = Box::new(1); // The heap memory for `a` gets allocated.
    } // The heap memory for `a` gets deallocated.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}
```
Earlier we said that each allocated piece of memory in Rust has an owner and there is always a single owner. Since drop() by default takes care of deallocation when an owner goes out of scope, we mostly do not need to worry about memory leak problems. One caveat is that Rust does not prevent programmers from manually allocating and deallocating heap memory. Thus, it is possible to suffer from memory leaks when a programmer manages memory explicitly and does not do a thorough job of it. However, Rust programmers rarely choose to manage heap memory by themselves, so the chance of running into memory leak problems is low.
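For contrast, here is roughly what managing heap memory explicitly looks like, using the low-level functions in the standard library's std::alloc module. This is a sketch for illustration only; manual_heap_roundtrip() is a hypothetical name. Every alloc() must be paired with a dealloc() that uses the same Layout, the calls require an unsafe block, and nothing but the programmer's care ensures that the dealloc() actually happens.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Allocates room for one `i32` on the heap by hand, stores `value` there,
// reads it back, and frees the memory again.
fn manual_heap_roundtrip(value: i32) -> i32 {
    let layout = Layout::new::<i32>();
    unsafe {
        let ptr = alloc(layout) as *mut i32;
        if ptr.is_null() {
            handle_alloc_error(layout); // Allocation failed; this does not return.
        }
        ptr.write(value);
        let read_back = ptr.read();
        // Skipping this call would leak the allocation, and the compiler would not warn.
        dealloc(ptr as *mut u8, layout);
        read_back
    }
}

fn main() {
    println!("{}", manual_heap_roundtrip(7)); // 7
}
```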
Determining Ownership
Since Rust calls drop() when an owner goes out of scope, it is absolutely critical to be able to determine whether or not a variable is the owner of a piece of memory. If a single variable accesses a piece of heap memory exclusively throughout a whole program, it is easy to determine the ownership. However, it would be too restrictive to prohibit two or more variables from accessing the same piece of memory. Thus, Rust employs a few mechanisms to keep track of ownership.
Move
By default, when you assign a variable to another variable, Rust moves the ownership. This is probably one of the most surprising aspects of Rust for beginners. Let's look at the following code to see what this means.
```rust
fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.
    println!("b is {}", b); // This works fine.
}
```
```rust
fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.
    println!("a is {}", a); // This throws an error,
                            // because `a` no longer has access to the `Box`.
}
```
As you can see, once you assign a variable to another variable, Rust no longer allows you to use the original variable. The same thing happens with function arguments and return values.
```rust
fn main() {
    let a = String::from("a"); // Don't worry about the syntax for `String` for now.
    print_str(a); // `a` moves to the function `print_str()`.
}

fn print_str(x: String) { // `x` is the (new) owner of the string passed in.
    println!("The string is {}", x);
}
```
```rust
fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.
    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}
```
```rust
fn get_str() -> String {
    let x = String::from("a");
    x // `x` moves to the caller.
}

fn main() {
    let a = get_str(); // `a` is the new owner of the String "a".
    println!("a is {}", a);
}
```
You might be wondering why this is necessary. Let's take another look at the example that fails to compile to understand further.
```rust
fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.
    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("The string is {}", x);
} // Since `x` is the owner, Rust deallocates the String at this point,
  // because `x` is out of scope.
```
In the code, you can see that x becomes the new owner of the String "a" and goes out of scope when the function print_str() returns. Thus, Rust drops the String at that point. Therefore, a must not be able to access that memory after the function returns; otherwise, a would access memory that has already been dropped.
Generally speaking, if two different variables can access the same heap location (such variables are called aliases), it can cause problems. For example, one variable can free the memory at one point while the other variable accesses the memory at some later point. This is called use-after-free and it is a well-known bug that can cause a vulnerability. Similarly, one variable can free the memory at one point and the other variable can free the same memory again at some later point. This is called double free and it is also a well-known bug that can cause a vulnerability. By moving the ownership and not allowing the original variable to access the value it had, Rust prevents these problems caused by two variables accessing the same heap location.
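The following sketch makes the single-owner guarantee visible. It uses a hypothetical Tracked type that increments a shared counter (a Cell) every time a value of that type is dropped. Even though the value passes through three owners (a, b, and the parameter of consume()), it is dropped exactly once, so there is no opportunity for a double free.

```rust
use std::cell::Cell;

// A hypothetical type that counts how many times values of it are dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Takes ownership of `_t`; the value is dropped when this function returns.
fn consume(_t: Tracked<'_>) {}

// Returns the drop count right after a move and right after `consume()`.
fn drop_counts() -> (u32, u32) {
    let drops = Cell::new(0);
    let a = Tracked { drops: &drops };
    let b = a; // Ownership moves from `a` to `b`; nothing is dropped.
    let after_move = drops.get();
    consume(b); // Ownership moves into `consume()`, which drops the value.
    let after_call = drops.get();
    // `a` and `b` no longer own anything, so nothing else is dropped
    // when they go out of scope.
    (after_move, after_call)
}

fn main() {
    println!("{:?}", drop_counts()); // (0, 1)
}
```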
However, you might think that this is too restrictive. For example, if you could never use a variable again after passing it as an argument to a function, it would be very difficult to write a program. Thus, Rust provides several ways to deal with this restriction.
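Before looking at those, note that moves alone already offer a clumsy workaround: a function can hand ownership back to the caller through its return value. The sketch below (length_and_return() is a hypothetical name) returns the String together with its length so that the caller can keep using it. Doing this for every function quickly becomes tedious, which is what the following mechanisms address.

```rust
// Takes ownership of `s`, computes its length, and gives ownership back.
fn length_and_return(s: String) -> (String, usize) {
    let len = s.len();
    (s, len)
}

fn main() {
    let a = String::from("hello");
    let (a, len) = length_and_return(a); // `a` moves in, and ownership comes back out.
    println!("{} has length {}", a, len); // hello has length 5
}
```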
Copy and Clone
By default, primitive data types such as i32 and i64 do not move ownership. Instead, their values are simply copied to a new memory location. The following example illustrates this.
```rust
fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b); // This does not move the ownership.
    println!("a is {}", a); // This works fine.
    println!("b is {}", b); // This works fine.
    println!("s is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    x + y // This does not move the ownership either.
}
```
As we can see, even though we pass a and b as arguments to sum(), we can still use them later. The same is true for the return value of sum(), although the code does not directly illustrate that.
All this is because primitive data types are copied instead of moved, and hence do not transfer ownership. Rust distinguishes copy from move by looking at whether or not a data type implements something called the Copy trait (we will look at what a trait is later). Primitive scalar types such as integers, floating-point numbers, bool, and char implement the Copy trait, while data structures like String and Box do not. You can also implement the Copy trait for your own custom data structure to use copy semantics instead of move semantics, as long as all of its fields implement Copy (a Copy type must also implement the Clone trait). A typical criterion for deciding whether or not to implement the Copy trait is the cost and complexity of copying. For example, primitive data types are small and have fixed sizes, so they are relatively inexpensive and easy to copy. However, String or Box point to a location on the heap, and the size of that heap data is often not known a priori, so copying it may be neither easy nor inexpensive.
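As a sketch of opting into copy semantics, the hypothetical Point struct below derives both Clone and Copy (Copy requires Clone). This compiles only because both fields are i32, which is Copy; adding a String field would make the derive fail. Passing a Point to shift() copies it, so the original is still usable afterward.

```rust
// `Copy` requires `Clone`, so both are derived; `Debug` and `PartialEq`
// are only there for printing and comparison.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Returns a new point moved `dx` units along the x-axis.
fn shift(p: Point, dx: i32) -> Point {
    Point { x: p.x + dx, y: p.y }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = shift(a, 10); // `a` is copied into `shift()`, not moved.
    println!("a = {:?}, b = {:?}", a, b); // `a` is still usable here.
}
```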
Another way to copy is cloning. If a data type implements the Clone trait, you can call clone() to explicitly create a duplicate. This is different from Copy because cloning requires an explicit call, whereas copying happens implicitly.
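For example, String implements Clone but not Copy, so duplicating one requires calling clone() explicitly. In the sketch below (clone_demo() is a hypothetical name), the clone gets its own heap buffer, so modifying it leaves the original untouched and both variables remain valid owners.

```rust
// Returns the original string and a modified clone of it.
fn clone_demo() -> (String, String) {
    let a = String::from("hello");
    let mut b = a.clone(); // Explicit duplicate: `b` owns a separate heap buffer.
    b.push_str(", world"); // Changing `b` does not affect `a`.
    (a, b) // Both `a` and `b` are still valid owners.
}

fn main() {
    let (a, b) = clone_demo();
    println!("a is {}, b is {}", a, b); // a is hello, b is hello, world
}
```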
Borrow
Rust provides another alternative to move, which is called a borrow. This uses & to represent that a variable is borrowing a value from another variable.
```rust
fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
}
```
When you pass a variable to a function to borrow it instead of moving it, there are two things you need to do. First, you need to add & in front of the variable you pass, and second, you need to use & in your function definition as part of the type of each borrowed parameter. Similar to C++, a value created with & is called a reference, but in Rust, it's better to think of it as a borrow rather than a pointer.
Mutable Reference
One caveat is that a borrow created with & is read-only.
```rust
fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This throws an error since `s` is read-only.
    println!("The new string is {}", s);
}
```
Again, this is quite restrictive since you cannot modify the value coming in as an argument. Thus, Rust provides a mutable borrow.
```rust
fn main() {
    let mut str = String::from("a"); // `mut` is used.
    print_str(&mut str); // `&mut` is used.
    println!("Can still access str: {}", str);
}

fn print_str(s: &mut String) { // `&mut` is used.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This works now.
    println!("The new string is {}", s);
}
```
There are three things to note here. First, when defining str, we use mut to represent that str holds a mutable value. Second, when passing str to print_str(), we use &mut to represent that it is a mutable borrow, i.e., we are saying that print_str() not only borrows the value but may also modify it. Third, in the parameter definition of s in print_str(), we use &mut to represent that print_str() may modify the value it is borrowing.
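Because a mutable borrow ends when the function returns, the caller can mutably borrow the same variable again later and sees every change. The sketch below uses a hypothetical add_item() helper that appends to a Vec<String> through &mut.

```rust
// Appends `item` to `list` through a mutable borrow of the caller's vector.
fn add_item(list: &mut Vec<String>, item: &str) {
    list.push(item.to_string());
}

fn main() {
    let mut groceries = vec![String::from("milk")];
    add_item(&mut groceries, "eggs"); // The mutable borrow ends when the call returns,
    add_item(&mut groceries, "bread"); // so we can mutably borrow again here.
    println!("{:?}", groceries); // The caller still owns `groceries` and sees both changes.
}
```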
The Borrow Checker and the Aliasing XOR Mutability Principle
Mutable borrowing gives us the flexibility to modify a borrowed value within a function. However, it carries a risk of data races. If you need a refresher on data races, please read the Rust book's section on mutable references, which explains the data race problem. In a nutshell, a data race can happen when two or more references access the same memory location at the same time, at least one of them writes to it, and nothing synchronizes the accesses, so one reference can modify the value without the other knowing. Data races are known to be difficult to track down and fix.
Rust safeguards its programs from this problem by employing a principle commonly known as aliasing XOR mutability. It means that you get either aliasing or mutability, but not both. As mentioned earlier, aliasing means having two or more references to the same (heap) memory location. Mutability means having the ability to modify the value at a memory location. Thus, aliasing XOR mutability means that, at any given time, you can have either exactly one mutable reference (created with &mut) or any number of immutable references (created with just &), but not both. The following examples illustrate the principle.
```rust
fn main() {
    let a = String::from("a");
    let b = &a;
    let c = &a; // So far we have two additional references to `a`.
                // This is aliasing, which is fine, as long as
                // those references don't have mutability.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}
```
```rust
fn main() {
    let mut a = String::from("a");
    let b = &a;
    let c = &mut a; // This is a problem because `b` is an alias,
                    // and `c` has mutability. This is
                    // both aliasing and mutability, not XOR.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}
```
```rust
fn main() {
    let mut a = String::from("a");
    let b = &mut a;
    let c = &mut a; // This does not work either,
                    // because both `b` and `c` are mutable aliases.
                    // I.e., both aliasing and mutability, not XOR.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}
```
The Rust compiler has a component called the borrow checker that enforces the aliasing XOR mutability principle at compile time. Borrow checking often gives beginners a hard time; people say they are "fighting the borrow checker" because the Rust compiler keeps rejecting their programs due to borrow checking rules. Thus, it is important to understand exactly how borrow checking works. Practice is a must here, and make sure you read the Rust book on borrow checking.
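In practice, the borrow checker looks at where each reference is last used, not only at where its variable's scope ends (this is known as non-lexical lifetimes). So one common way to satisfy it is to finish using the shared references before creating the mutable one. The sketch below is a reordered variant of the rejected aliasing-plus-mutability example above, and it compiles; borrows_in_sequence() is a hypothetical name.

```rust
fn borrows_in_sequence() -> String {
    let mut a = String::from("a");

    let b = &a;
    let c = &a; // Aliasing without mutability is fine.
    println!("b is {}, c is {}", b, c); // Last use of `b` and `c`: their borrows end here.

    let d = &mut a; // OK: no other reference to `a` is still in use.
    d.push_str("_added_more");

    a // `d` is no longer used, so `a` can be moved out and returned.
}

fn main() {
    println!("a is {}", borrows_in_sequence()); // a is a_added_more
}
```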
Option
Another important safety aspect of Rust is its approach to handling variables with no values. If you have programming experience, you probably already know that there are many cases where a variable does not have a meaningful value. In those cases, values like null or just plain 0 are used to represent that a variable doesn't have a meaningful value. However, this has led to numerous bugs and vulnerabilities since programmers often forget to handle null or 0 and get a runtime error, e.g., a null pointer exception.
In Rust, there is no null value that you can assign to an ordinary variable or reference. Instead, Rust's standard library provides an alternative called Option. Rust programmers use Option heavily and you can find it everywhere, e.g., in the standard library, external crates, etc. Thus, it is absolutely critical to understand what Option is and how to use it.
Option is defined as follows.

```rust
enum Option<T> {
    None,
    Some(T),
}
```
The definition of Option uses enum, which is something we have not discussed yet. It is similar to enumeration types in other languages like C/C++ or Java. An enum defines a custom type and lists all possible values that a variable of that type can have. For example, the following code defines an enum type called Ex, which has two possibilities.

```rust
enum Ex {
    FirstPossibility,
    SecondPossibility,
}
```

These possibilities are called enum variants, and when using a variant of an enum, you need to use ::.

```rust
enum Ex {
    FirstPossibility,
    SecondPossibility,
}

fn main() {
    let a: Ex = Ex::FirstPossibility;
}
```
You can find more details in the Rust book's section on enum.
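Enums pair naturally with match, and the compiler checks that every variant is handled. The sketch below reuses the Ex enum from above; describe() is a hypothetical helper. Deleting either match arm would make the program fail to compile, and this exhaustiveness check is what lets Option force you to handle a missing value.

```rust
enum Ex {
    FirstPossibility,
    SecondPossibility,
}

// `match` must cover every variant; leaving one out is a compile-time error.
fn describe(e: &Ex) -> &'static str {
    match e {
        Ex::FirstPossibility => "first",
        Ex::SecondPossibility => "second",
    }
}

fn main() {
    let a = Ex::FirstPossibility;
    let b = Ex::SecondPossibility;
    println!("{} {}", describe(&a), describe(&b)); // first second
}
```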
If you look at the Option definition, it defines an enum that has two variants: Option::None, used when a variable does not have a meaningful value, and Option::Some, used when a variable does have a meaningful value.
Option::Some has a couple of additional details to discuss. The first is the use of T found in Option<T> and Some(T). This T is called a generic type parameter (and it does not have to be the letter T). If you know the support for generics in other languages like C++ or Java, you can probably understand what it is quickly. T is a variable that can take a type instead of a value. What this means is that instead of defining Option for every single type there is, e.g., an Option for i32, an Option for i64, etc., we can define it once using a generic type variable and instantiate an Option for any type. In the Option definition above, a generic type variable T is used in Option<T> to declare that the enum Option is defined for all types.

The second detail is the definition Some(T). This declares that Some is a variant that should take a value of the type T. This is different from FirstPossibility or None in the above examples because it is a variant that expects a value of a certain type.

The following example demonstrates all of these.
```rust
fn main() {
    let option_some_for_i32: Option<i32> = Some(1);
    let option_none_for_i32: Option<i32> = None;
    let option_some_for_string: Option<String> = Some(String::from("str"));
}
```
You can find more details in the Rust book's section on generic data types.
By declaring a variable with Option, you are explicitly saying that a variable may or may not have a meaningful value and, more importantly, you are forcing yourself to deal with both cases in your code.
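As a small sketch of what dealing with both cases looks like, the hypothetical safe_divide() below returns None instead of dividing by zero. The caller cannot use the u32 inside without going through a match (or a similar construct) that also says what happens for None. The standard library offers a ready-made version of this idea, u32::checked_div, which also returns an Option.

```rust
// Returns `None` instead of panicking when `b` is zero.
fn safe_divide(a: u32, b: u32) -> Option<u32> {
    if b == 0 {
        None
    } else {
        Some(a / b)
    }
}

// To get at the quotient, the caller must also handle the `None` case.
fn describe_division(a: u32, b: u32) -> String {
    match safe_divide(a, b) {
        Some(q) => format!("{} / {} = {}", a, b, q),
        None => format!("{} / {} is undefined", a, b),
    }
}

fn main() {
    println!("{}", describe_division(7, 2)); // 7 / 2 = 3
    println!("{}", describe_division(7, 0)); // 7 / 0 is undefined
}
```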
In the above example, you might have noticed that Some and None are not used with Option::, i.e., not as Option::Some or Option::None but as Some and None. This is because Rust automatically imports these definitions, so you can use them without the Option:: qualifier. This set of automatically imported items is called the prelude, i.e., things that every Rust program imports by default.
There are a lot of details regarding Option and enum that we do not discuss here. Make sure you read the Rust book on enum and pattern matching as well as on generic data types.
Result
The last safety aspect of Rust to highlight is its approach to error handling. Some languages use values to represent an error condition, e.g., null or a negative integer such as -1. Other languages use an error reporting mechanism that is outside of regular return paths, e.g., throw and try-catch in Java. Rust unifies these two approaches and uses an enum called Result to return either a value or an error. Similar to Option, Rust programmers use Result heavily and you can find it everywhere. Thus, it is also critical to understand what Result is and how to use it.
The definition looks like the following.
```rust
enum Result<T, E> {
    Ok(T),
    Err(E),
}
```
The first variant, Result::Ok, represents a success with a value. The second variant, Result::Err, represents an error with an error value. Thus, Result is typically used as a return value type.
```rust
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}
```
A common way to work with Result (as well as Option) is using match, which we have discussed earlier. The following example shows this and also demonstrates the power of match for pattern matching that was briefly mentioned earlier.

```rust
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }

    let result = function_with_result(false);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }
}
```
As we can see, match not only recognizes that result is either Ok() or Err() but also assigns the value inside Ok() (or Err()) to success_result (or error_result).

Another common way is to use if let, which is similar to match.

```rust
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);
    if let Ok(success_result) = result {
        println!("Success: {}", success_result);
    } else {
        println!("Error");
    }

    let result = function_with_result(false);
    if let Err(error_result) = result {
        println!("Error: {}", error_result);
    } else {
        println!("Success");
    }
}
```
if let attempts to perform a pattern match and, if it succeeds, executes the if let block. Otherwise, it executes the else block.

You can find more details on if let and pattern matching in the Rust book.
Similar to Option, by declaring a return type as Result, you are forcing yourself to handle both the success case and the error case. There are a lot of details about Result and error handling that we do not cover here, so please make sure you read the Rust book on error handling.
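A common real-world source of Result values is parsing text. In the standard library, str::parse returns a Result whose Err variant carries a ParseIntError when parsing an integer fails. The sketch below (parse_age() and describe_age() are hypothetical names) handles both cases with match.

```rust
use std::num::ParseIntError;

// `str::parse` returns `Ok` with the number, or `Err` describing why parsing failed.
fn parse_age(input: &str) -> Result<u8, ParseIntError> {
    input.trim().parse::<u8>()
}

// The caller has to decide what to do in both the success and the error case.
fn describe_age(input: &str) -> String {
    match parse_age(input) {
        Ok(age) => format!("age is {}", age),
        Err(e) => format!("invalid age {:?}: {}", input, e),
    }
}

fn main() {
    println!("{}", describe_age("42")); // age is 42
    println!("{}", describe_age("forty-two"));
    println!("{}", describe_age("300")); // Too large for `u8`, so this is an `Err` too.
}
```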