Expressive Power and Unsafe Rust

Rust is a low-level language, meaning you can do almost anything that other low-level languages allow you to do. For example, you can allocate and deallocate memory yourself, as mentioned earlier, and you can perform other operations that C or C++ permit but higher-level languages such as Java or Python do not. However, Rust recognizes that it is not always desirable or safe to allow operations that can introduce bugs or vulnerabilities. Thus, Rust tries to strike a balance by distinguishing what is commonly called unsafe Rust from what is commonly called safe Rust.

Unsafe Rust exists because of the tension between expressiveness and safety. In safe Rust, especially when you write low-level code such as shell commands, system libraries, or a kernel, you might encounter cases where what you want is hard, or even impossible, to express.

Example

The most famous example is linked lists. There is actually a whole online book about writing lists in Rust. It is not our goal to look at all the details, but let's take a look at an example to make our discussion a little more concrete.

In the example for linked lists below, we use struct, which is similar to the one in C/C++. It defines a custom type with a list of members. You can read the Rust book on struct to learn about the details. Below is a simple example for a struct definition and initialization.

struct Ex {
    member1: i32,
    member2: String,
}

fn main() {
    let struct_var: Ex = Ex {
        member1: 1,
        member2: String::from("member2"),
    };

    println!("Member1: {}", struct_var.member1);
    println!("Member2: {}", struct_var.member2);
}

The example here illustrates how a circular linked list, which is not difficult to express in other languages, does not translate easily to Rust.

struct Node {
    next: Option<Box<Node>>, // The definition of `Box` is actually
                             // `Box<T>` with a generic type parameter `T`.
}

fn main() {
    let mut tail = Box::new(Node {
        next: None,
    }); // A tail node that doesn't have the next node for now.

    let head = Box::new(Node {
        next: Some(tail),
    }); // A head node that has the tail node as the next node.

    tail.next = Some(head); // An attempt to have the tail node
                            // point back to the head node
}

The problem here is ownership. First, we bind a Box to tail. We then move tail into head.next via Some(tail), so head becomes the owner of that Box. From that point on, tail can no longer be used. But then we try to assign Some(head) to tail.next, which accesses tail after it has been moved, so the compiler rejects the program.
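To make the move concrete, here is a minimal sketch of the non-circular part of the example, which does compile. The helper function len is a hypothetical addition for illustration and is not part of the original example.

```rust
struct Node {
    next: Option<Box<Node>>,
}

// Hypothetical helper: counts nodes by following `next` links until `None`.
fn len(node: &Node) -> usize {
    let mut count = 1;
    let mut current = node;
    while let Some(next) = current.next.as_deref() {
        count += 1;
        current = next;
    }
    count
}

fn main() {
    let tail = Box::new(Node { next: None });

    // `tail` is moved into `head.next`; from here on, `head` owns that node.
    let head = Box::new(Node { next: Some(tail) });

    // Uncommenting the next line is rejected by the compiler,
    // because `tail` has already been moved:
    // let _ = tail.next.is_none();

    // The tail node is still reachable, but only through its new owner.
    println!("list length is {}", len(&head));
}
```

The tail node has not disappeared; it simply has exactly one owner, and that owner is now head. A cycle would require the tail node to own head as well, which single ownership cannot express.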

This is one example of how safe Rust trades expressive power for better safety. Besides linked lists, there are many other cases where safe Rust limits expressive power.

The unsafe Keyword

As mentioned earlier, Rust is a low-level language and you can do almost anything that other languages allow you to do. However, as we have just seen, Rust limits its expressive power to provide better safety. These two goals conflict with each other, and Rust deals with the conflict by distinguishing what is considered safe from what is considered unsafe via the unsafe keyword.

You can use unsafe in order to do certain things that Rust by default does not allow you to do. Let's first look at how to use unsafe and then look at what you can do with unsafe.

unsafe Blocks, Functions, and Traits

You can use it in three ways.

The first way is to define an unsafe block. The code below does not actually need unsafe (the compiler will warn that the block is unnecessary). It is only for demonstration purposes.

fn main() {
    unsafe {
        let a = 1;
        println!("a is {}", a);
    }
}

Another way is to define an unsafe function. An unsafe function can only be called from within an unsafe block (or from another unsafe function).

unsafe fn unsafe_fn(a: i32) {
    println!("a is {}", a);
}

fn main() {
    unsafe {
        unsafe_fn(1);
    }
}
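
The function above has nothing unsafe about it, so here is a sketch of a case where the keyword carries real meaning: a function whose correctness depends on a promise the compiler cannot check. The names read_at and get are hypothetical, chosen only for illustration.

```rust
/// Reads the `i32` located `index` elements past `ptr`.
///
/// # Safety
/// The caller must guarantee that `ptr` points to at least `index + 1`
/// valid, initialized `i32` values.
unsafe fn read_at(ptr: *const i32, index: usize) -> i32 {
    // The pointer arithmetic and the dereference are the unsafe operations.
    unsafe { *ptr.add(index) }
}

// A safe wrapper: the bounds check establishes the precondition,
// so callers of `get` never need to write `unsafe` themselves.
fn get(values: &[i32], index: usize) -> Option<i32> {
    if index < values.len() {
        // SAFETY: `index` is in bounds, so the read stays inside `values`.
        Some(unsafe { read_at(values.as_ptr(), index) })
    } else {
        None
    }
}

fn main() {
    let values = [10, 20, 30];
    println!("{:?}", get(&values, 1)); // Some(20)
    println!("{:?}", get(&values, 5)); // None
}
```

Marking read_at as unsafe moves the burden of proof to the caller, while the wrapper get shows the common pattern of confining unsafe code behind a safe interface.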

The third way is to define an unsafe trait. However, since we have not discussed traits yet, we will discuss the use of unsafe for traits later when we discuss traits.

unsafe Capabilities

The Rust book has a section on unsafe Rust that gives an overview of what you can do with unsafe. There is also a separate book called The Rustonomicon that is dedicated to unsafe Rust. These resources discuss all the capabilities of unsafe as well as important nuances that you need to know when you use unsafe.

Among all the things that unsafe allows you to do, the use of raw pointers is perhaps the most common case. Raw pointers are similar to pointers in C/C++, and there are two types: immutable raw pointers, written *const T, and mutable raw pointers, written *mut T. You can create raw pointers without unsafe, but you can only dereference a raw pointer inside unsafe. The following are two examples.

fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;

    println!("a through raw_ptr is {}", *raw_ptr); // This does not work.
}

fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;

    unsafe {
        println!("a through raw_ptr is {}", *raw_ptr); // This does work.
    }
}

Unlike references, raw pointers lack the safety guarantees that Rust provides. Most notably, Rust does not check the aliasing XOR mutability rule for raw pointers. This means that you can have any number and combination of mutable and immutable raw pointers, and the Rust compiler will not complain.

fn main() {
    let mut a = 1;
    let immutable_raw_ptr: *const i32 = &a;
    let mutable_raw_ptr: *mut i32 = &mut a;

    unsafe {
        println!("a through immutable_raw_ptr is {}", *immutable_raw_ptr);
        println!("a through mutable_raw_ptr is {}", *mutable_raw_ptr);

        *mutable_raw_ptr = 2;

        println!("*immutable_raw_ptr now is {}", *immutable_raw_ptr);
        println!("*mutable_raw_ptr now is {}", *mutable_raw_ptr);
    }
}
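
Compiling is not the same as being well-defined, though. Rust's aliasing rules for raw pointers are still being specified, and an interleaving like the one above, where a new &mut a is created after a pointer was derived from &a, is the kind of pattern those rules may not permit. A more conservative sketch, offered as an illustration rather than a definitive rule, derives both raw pointers from a single mutable borrow. The function name write_then_read is hypothetical.

```rust
// Writes through the mutable raw pointer, then returns the value
// seen through the immutable raw pointer.
fn write_then_read(value: &mut i32, new_value: i32) -> i32 {
    // Both raw pointers originate from the same mutable borrow.
    let mutable_raw_ptr: *mut i32 = value;
    let immutable_raw_ptr: *const i32 = mutable_raw_ptr;

    unsafe {
        *mutable_raw_ptr = new_value;
        *immutable_raw_ptr
    }
}

fn main() {
    let mut a = 1;
    let seen = write_then_read(&mut a, 2);
    println!("value seen through immutable_raw_ptr is {}", seen);
    println!("a is now {}", a);
}
```

The compiler checks neither version, so it is up to the programmer to make sure the pointers are used in a way the language actually allows.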

Another notable aspect of raw pointers is that Rust does not deallocate memory automatically for raw pointers. The following is an example that shows manual allocation and deallocation (modified from this page).

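fn main() {
    unsafe {
        // Describe the size and alignment of the allocation (a single `u8`).
        let layout = std::alloc::Layout::new::<u8>();
        let ptr: *mut u8 = std::alloc::alloc(layout);
        if ptr.is_null() {
            // `alloc` returns a null pointer when the allocation fails.
            std::alloc::handle_alloc_error(layout);
        }

        *ptr = 42;
        println!("*ptr is {}", *ptr);

        // Free the memory manually, using the same layout it was allocated with.
        std::alloc::dealloc(ptr, layout);
    }
}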

In this example, Rust does not automatically deallocate what ptr points to. It needs to be done manually.
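
Raw pointers and manual deallocation together are enough to express the circular list that safe Rust rejected earlier. The following is only a sketch under a few assumptions: the value field and the function name walk_cycle are hypothetical additions, and Box::into_raw and Box::from_raw are used instead of calling the allocator directly.

```rust
struct Node {
    value: i32,      // Hypothetical payload, added so the traversal is visible.
    next: *mut Node, // Raw pointer: no ownership, no automatic cleanup.
}

// Builds a two-node cycle, follows it for `steps` hops while recording
// the values visited, and then frees both nodes manually.
fn walk_cycle(steps: usize) -> Vec<i32> {
    // `Box::into_raw` gives up ownership and returns a raw pointer,
    // so Rust will no longer free these nodes automatically.
    let head = Box::into_raw(Box::new(Node { value: 1, next: std::ptr::null_mut() }));
    let tail = Box::into_raw(Box::new(Node { value: 2, next: head }));

    let mut visited = Vec::new();
    unsafe {
        (*head).next = tail; // head -> tail -> head

        let mut current = head;
        for _ in 0..steps {
            visited.push((*current).value);
            current = (*current).next;
        }

        // Manual deallocation: turn each pointer back into a `Box` exactly once.
        drop(Box::from_raw(head));
        drop(Box::from_raw(tail));
    }
    visited
}

fn main() {
    println!("{:?}", walk_cycle(5)); // [1, 2, 1, 2, 1]
}
```

Forgetting either Box::from_raw call would leak a node, and calling it twice on the same pointer would be a double free. These are exactly the kinds of mistakes that safe Rust rules out and that unsafe leaves to the programmer.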

Although unsafe gives you more expressive power and lets you perform low-level operations such as pointer manipulation, its use is generally discouraged because it bypasses the safety net provided by the Rust compiler. Thus, it is critical to have a clear understanding of what it does. As mentioned earlier, Rust already provides great resources (the unsafe section of the Rust book and the Rustonomicon). You are highly encouraged to read these before you start using unsafe.