Syllabus

This course is a research-oriented course where you are expected to produce research results on a specific topic of your choice, related to the overall theme of the course. To this end, you will carry out a number of tasks that a researcher in the field of computer science typically does. This includes reading papers to understand the literature, identifying an important problem to solve, solving the problem, evaluating how good your solution is, and potentially repeating the process to find a better solution.

Consequently, this course assumes that you are interested in doing research in computer science, especially in the areas within this course's interest---systems and software engineering. If you are a grad student interested in doing research in these areas, or if you are an undergrad student who is thinking of going to grad school to do research in these areas, this course can be a good fit for you.

The overall theme of this course is techniques for reliably building systems software. Though this can be interpreted in many different ways, this course has specific topics of interest, such as fuzzing, property-based testing, symbolic/concolic execution, interactive verification, and automated verification, all applied to building systems software such as an OS kernel. This course also features Rust as another focus, due to its memory safety guarantees that can improve software reliability.

Upon completing this course, you will have gone through the complete cycle of research---identifying a research problem, doing a literature review, coming up with a solution, and evaluating the solution. In addition, you will have learned the aforementioned techniques (to a limited extent) and read a number of papers related to those techniques.

Although this course is open to anybody who meets the minimum prerequisites, it may not be a good fit for you for the following reasons.

  • This course is a research course, meaning that it is less structured. Though there are various structured learning components such as lectures and assignments, those are not the sole focus of the course. Rather, the course expects you to explore, experiment, and do your own learning on the way. It is ultimately your responsibility to make the most out of the course.
  • Rust is a relatively new language, which means that the infrastructure around it may not be mature yet. For example, though symbolic execution, property-based testing, and fuzz testing tools for Rust are mostly usable, they may still have some rough edges. This fits the spirit of the course, which encourages you to explore and experiment. However, you may run into various issues as you use different tools, and you may need to change direction.
  • The course content and schedule are fluid and can change. For example, if certain tools turn out to be unusable, the course content and schedule will adapt to that new piece of information.
  • This course is different from other courses where there are known, preset answers and your job is to find those answers. Instead, you need to identify a research problem yourself and solve it. The problems that you identify can be open-ended, and it may not be clear whether a solution exists.
  • The course is new, so it might have some rough edges of its own.

Think carefully about the above before you commit to this course. You are welcome to take this course if none of the above is an issue for you.

Administrative Information

Time and Location

Thursdays: 11:30 AM - 2:20 PM (AQ 5037, Burnaby)

Instructor and TA Information

Instructor: Steve Ko <steveyko@sfu.ca>
TA: Anant Awasthy <anant_awasthy@sfu.ca>

Office Hours

TBA

Prerequisites

  • CMPT 300 with a minimum grade of C-
  • Mastery of using Linux's command line interface

System Requirements

  • An installation of Linux with sudo access
  • An editor/IDE set up for Rust, e.g., Vim/Neovim, Emacs, VS Code, CLion, etc.

Grading

Grading Component            Weight
Course project               40%
Class prep                   20%
Paper presentation           10%
Programming assignment 1     10%
Programming assignment 2     10%
Class participation          10%

Late Submission Policy

All assignments have hard deadlines. No late submissions are allowed.

Regrading Policy

Assignments and exams (if any) may be submitted for regrading to correct grading errors.

  • Regrade requests are due no later than one (1) week after the grades are posted.
  • Regrade requests must be clearly written and attached to the assignment.
  • Regrade requests are intended to correct grading errors, NOT to negotiate for a higher grade. When work is submitted for a regrade, the entire work may be regraded, which may result in a lower grade.

Accessibility Resources

If you would like reasonable accommodations to participate in this course, please contact the instructor as well as the Centre for Accessible Learning (CAL). The staff at CAL will provide you with information and review appropriate arrangements for reasonable accommodations.

Academic Honesty Statement

This course has a very high standard for academic integrity. Any type of academic integrity violation will result in an F for the semester. In general, this course follows the SFU Academic Honesty and Student Conduct Policies.

COVID and Mask Policies

This course follows the COVID and mask policies set by the university. There is a university website that contains general information regarding returning to campuses.

Course Schedule

The following schedule is subject to change.

Week 1 (Sep 8)
  • Discussion: Course Introduction & Rust
Week 2 (Sep 15)
  • Discussion: Rust (continued) & How to Read a Paper
  • Class prep:
    • The Rust Book up to Chapter 11
    • Rust by Example
    • Rust Design Patterns
    • How to Read a Paper
  • Due: Milestone 1: Team Formation
Week 3 (Sep 22)
  • Discussion: Fuzz Testing, Property-Based Testing, Symbolic Execution, and SAT/SMT
  • Class prep:
    • An Empirical Study of the Reliability of UNIX Utilities
    • QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs
    • Symbolic Execution and Program Testing
    • Problem Solving for the 21st Century
  • Due: Milestone 2: Topic Selection
Week 4 (Sep 29)
  • Class canceled
Week 5 (Oct 6)
  • Discussion: Rust Analyses
  • Class prep:
    • The Usability of Ownership
    • How Do Programmers Use Unsafe Rust
  • Due:
    • Programming Assignment 1
    • Milestone 3: Problem Selection & Proposal
Week 6 (Oct 13)
  • Discussion: Rust OSes (Cars in Space)
  • Class prep:
    • RedLeaf: Towards An Operating System for Safe and Verified Firmware
    • Theseus: an Experiment in Operating System Structure and State Management
Week 7 (Oct 20)
  • Discussion: Symbolic Execution (Black Diamond)
  • Class prep:
    • KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
    • Verifying Dynamic Trait Objects in Rust
Week 8 (Oct 27)
  • Discussion: Hybrid Fuzzing (Rusting Away)
  • Class prep:
    • Driller: Augmenting Fuzzing Through Selective Symbolic Execution
    • HFL: Hybrid Fuzzing on the Linux Kernel
Week 9 (Nov 3)
  • Discussion: Formal Methods
Week 10 (Nov 10)
  • Discussion: Formal Methods (Continued)
  • Class prep:
    • How Amazon Web Services Uses Formal Methods
    • Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3
  • Due: Milestone 4: Intermediate Demo
Week 11 (Nov 17)
  • Discussion: Interactive Verification (Lionfish)
  • Class prep:
    • seL4: Formal Verification of an OS Kernel
    • CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels
  • Due: Programming Assignment 2
Week 12 (Nov 24)
  • Discussion: Automated Verification (In the Loop)
  • Class prep:
    • Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
    • A Formally Verified NAT
Week 13 (Dec 1)
  • Discussion: Project Demos and Presentations
  • Due: Milestone 5: Final Demo & Report

Class Prep

To prepare for each class, you are expected to read a few papers and write summaries for them. For each paper, you need to read it using the method described in How to Read a Paper, up to the second pass. Each week, the instructor posts a set of questions that you need to answer in your paper summaries for the coming week.

Due

Two nights before each class (i.e., every Tuesday night)

Submission

First, accept the repo invite for class preparation from our GitHub Classroom. Once that's done, you should push your summaries for each week to the repo by the deadline. You should create a file named week_X.md (where X is the week's number) and write your summaries in plain text or Markdown.

Grading

Your class prep is 20% toward the overall final grade.

Programming Assignments

Assignment 1: Toybox Command Reimplementation

This assignment asks you to reimplement a Toybox command, file. The goal is to produce a working command that implements the same features as the original Toybox implementation.

Due

Friday Oct 7 at 00:00 AM (Note: this is the night of the 6th.)

Requirements

  • You need to reimplement the file command.
  • It should implement the same set of features as in the original Toybox's file implementation.
  • You should not use any external crates, but there are two exceptions. The first exception is when the original Toybox source uses an external library. If that is the case, you can find a Rust crate that provides similar functionality and use it. The second exception is for command line arguments. There are a number of good external crates that process command line arguments, and you can use any one of those. If you do use an external crate, you need to get approval first.
  • Your command should work as a stand-alone executable. Note that the original Toybox produces a single executable for all commands that it supports.
  • You should not use unsafe Rust. However, if it is unavoidable, you need to get approval first. You then need to carefully document it as comments on the source itself and explain why you have to use unsafe Rust.
  • You might need to also reimplement some of the shared code, e.g., toys.h, toybox/lib/, etc. This is part of your assignment.
  • We grade your submission using the test cases from the original Toybox test source.

How to Submit

Submit on GitHub Classroom as follows.

  • Accept the assignment invitation on GitHub Classroom.
  • Make sure you push your code before the deadline. Be careful: the repo still accepts pushes after the deadline, but you should not push anything after it. Grading is done on the last version pushed before the deadline.

Grading

TBA

Assignment 2: Simple Symbolic Execution Engine for Rust

This assignment asks you to implement a simple symbolic execution engine for Rust.

Due

Friday Nov 18 at 00:00 AM (Note: this is the night of the 17th.)

Requirements

How to Submit

Grading

Course Project

The goal of your course project is to identify a research problem, solve it, and evaluate the effectiveness of your solution. At the end of the semester, you are expected to show a demo of your research prototype to the class and submit a project report. The topic of your project should be one of the topics we discuss in class.

Though the course schedule has structured components to help you make progress on your project in a timely fashion, you are highly encouraged to talk to the instructor about your project's direction and progress. This is especially true if you do not have much experience in carrying out a research project.

The following is the timeline for your project.

Milestone 1: Team Formation

The first thing to do is to find your teammates. Your team can be up to 3 people.

Due

Friday of Week 2

Submission

Your "submission" is to accept the repo invite for the course project with your teammates from our GitHub Classroom. Please make sure that you do this by the end of Week 2 (Friday).

Grading

There is no grading for this milestone. However, there is a 1% penalty for missing the deadline, deducted from the final grade of your course project.

Milestone 2: Topic Selection

With your teammates, you can go through the course schedule and find a topic that is appealing to your team. Once you find a topic, you can skim through the papers listed under the topic to gain a better understanding of it.

When selecting a topic, one good starting point is to think about what you want to gain a deeper understanding of. Doing research on a topic is ultimately about gaining expertise in that topic. One way to make the most out of this course is to choose a research topic that you want to gain expertise in.

Due

Friday of Week 3

Submission

Your submission is to push a file named TOPIC.md to your GitHub Classroom repo. The file should contain your team's topic written in plain text or Markdown. Please make sure you do this by the end of Week 3 (Friday).

Grading

There is no grading for this milestone. However, there is a 1% penalty for missing the deadline, deducted from the final grade of your course project.

Milestone 3: Problem Selection & Proposal

After you decide on a topic, you are expected to identify a problem to work on. The goal is to generate as many ideas as possible, settle on one problem, and write a proposal for it. As with selecting a topic, one good starting point is to think about what you want to gain a deeper understanding of.

The exact kind of problem you work on can depend on your topic. For example, if your topic is fuzzing, you can identify a problem in fuzzing and solve it by implementing your solution in a fuzzing tool. If your topic is automated verification, you can write a program and verify it using a tool like Dafny. If you are stuck, you can always talk to the instructor.

It is ideal if your platform is based on Rust. However, this is not a requirement; it is fine if your topic or problem has nothing to do with Rust.

Due

Friday of Week 5

Submission

Your submission is to push a plain text or Markdown file named PROPOSAL.md to your GitHub Classroom repo. The file should clearly answer the following questions.

  • What is your problem statement?
  • What is a rough idea for your solution?
  • What is your team going to build for the solution? Clearly specify the tools that you will either leverage or modify.
  • What are you going to accomplish by the intermediate demo deadline?

You might find this article useful when writing your proposal.

Grading

Your proposal is 5% toward your overall semester final grade. It is graded on quality, not length. If you answer the questions adequately, your proposal does not have to be long. However, keep in mind that very short answers often do not contain enough information to be clearly understood.

Milestone 4: Intermediate Demo

We have an intermediate checkpoint to make sure that your team is making good progress on the project. For this, we set aside some time in class for every team to show an intermediate demo. The goal is to show the demo that you promised in your proposal. Of course, you may run into challenges that you did not foresee and may not be able to show the demo as planned. This is fine as long as you make an honest effort. If this is the case, you need to talk to the instructor well before the deadline (e.g., 1-2 weeks earlier) to come up with a new demo plan. This is entirely your responsibility: the instructor will not check with individual teams about changes to their plans.

Due

Week 10 class

Submission

Your "submission" is to show a demo in class.

Grading

Your intermediate demo is 10% toward your overall semester final grade.

Milestone 5: Final Demo & Report

At the end of the semester, you are expected to show a demo of your final prototype in class. You are also expected to submit a project report. The final demos take place in the last class of the semester. The report should be in PDF and 5 pages long (excluding references) with a 10-pt Times New Roman font. The report should contain sections that roughly correspond to the following.

  • Introduction
    • Include a clear problem statement here.
  • Overview of the solution
    • Diagrams typically help for an overview.
  • Detailed description of the solution
    • Discuss your algorithms, methodologies, etc. as well as what worked and what didn't work. Include diagrams, tables, etc. as appropriate.
  • Results
    • First design and run experiments. Then plot the results and include them in your report.
  • Conclusion
    • Draw a conclusion regarding what your solution and results tell you.
  • Responsibilities
    • Describe who did what in your team.

You might find this talk useful.

Due

The final demos are during the class in Week 13. The final report is due on Dec 6.

Submission

Your submission for the final report is to push a file named final_report.pdf to your GitHub Classroom repo. This should be a PDF file, not plain text or Markdown. Please make sure that you push it to the remote repo by the deadline.

Grading

Your final demo is 10% toward your overall semester final grade, and your final report is 15% toward your overall semester final grade.

Paper Discussion

Along with your team members, you are expected to give presentations in class. These student-led classes start from Week 4. Each presentation should consist of the following two parts.

Class-Prep Paper Discussion

The first part of your presentation is discussing class-prep papers. To do this, you are expected to read the class-prep papers up to the third pass described in How to Read a Paper. This will give you a deeper understanding of the papers. The instructor will provide you with a set of questions, derived from student summaries, to discuss in class. You are expected to prepare your own answers to those questions, and discuss the questions and answers in class.

Paper Presentation

The second part of your presentation is presenting additional papers. If you look at the schedule, there are additional papers for each week that are not part of the class-prep paper list. You are expected to explain those additional papers to the class by preparing a presentation. Since other students do not read those papers, your goal is to explain the papers as clearly as possible. This cannot be rushed: it requires careful reading (i.e., read the papers well in advance) and careful thought about presentation strategies that maximize clarity. You should rehearse the presentation with your team a few times to check that everything is clear. You might find this talk useful.

Grading

Each part is 5% toward the overall semester final grade.

Approval Instructions

In order to get approval regarding a piece of code (e.g., using an external crate or unsafe Rust), create a new issue from the code and mention @steveyko. Mentioning @steveyko is important as it notifies the instructor.

Week 1: Rust

The focus of the first couple of weeks is learning Rust. Although we do have in-class lectures on Rust in CMPT 479/982, the best resource for learning Rust is the Rust book. You have to read the Rust book in addition to following the lectures, since the lectures are not meant to replace it.

In the first lecture, we cover the basics of Rust syntax: variables, types, functions, conditionals, etc. These share similarities with other languages, and if you are familiar with any popular language such as C++, Java, or Python, you will be able to pick up Rust's syntax quickly.

System Setup

The first thing we need to talk about is setting up your system so you can write, compile, and run Rust programs.

Installing Rust: rustup

Rust has a streamlined process of maintaining all its software packages. It is all done by a tool called rustup. Head over to the Rust book on installation and follow the instructions there to install rustup and other software for Rust.

rustup has many features to support Rust development, and it has its own book.

By default, rustup installs the latest stable version of Rust. However, this course later uses some experimental features of Rust. Thus, it is necessary to install "nightly" Rust that supports those experimental features. The following command will do just that.

$ rustup toolchain install nightly

If you are interested, you can read about various release channels of Rust (nightly, beta, and stable).

If later you determine that you want to use nightly Rust by default rather than stable Rust, you can do it as follows.

$ rustup default nightly

If you want to switch the default back to stable, you can do it as follows.

$ rustup default stable

If you want to list all installed versions of Rust, you can do the following.

$ rustup toolchain list

At this point, it will print out two versions, the latest stable version and the latest nightly version.

The following command updates the stable and nightly versions that you have.

$ rustup update

However, if you start using nightly Rust, be careful not to break your setup: your code might depend on a particular nightly version, and updating to a newer nightly can break it. For this reason, rustup allows you to install a specific version of Rust based on its date. For example, the following command installs a nightly Rust based on the source from 2021-09-01.

$ rustup toolchain install nightly-2021-09-01

Writing Rust Programs: Editors

Once you have a working installation, you need to have a proper editor/IDE to write a Rust program. There are many options available to choose from, but whatever editor/IDE you use, it is important to enable Rust plugins. Rust plugins support syntax highlighting, code formatting, error detection, etc., which will make your life easier.

  • Vim/Neovim: If you use Vim or Neovim, follow the instructions here and use the combination of rust-analyzer, coc.vim, and one of the LSP clients listed in the instructions. This is the personal setup that the instructor uses for Rust development (with ALE as the LSP client).
  • Emacs: You can also use rust-analyzer for Emacs. Here are the instructions.
  • VS Code: VS Code has an official Rust plugin. Open VS Code, go to Extensions, and search Rust. You can also use rust-analyzer with VS Code.
  • CLion: CLion is the only true IDE among the options listed here. You can get a free student license. CLion can use the IntelliJ Rust plugin.

Make sure that you install rustfmt and enable it in your Rust plugin (e.g., to run on save). That way, your code will be formatted automatically.

Compiling Rust Programs: cargo

Once you set up an editor/IDE, you are ready to write Rust programs. Although it is possible to create Rust source files and compile them directly with the Rust compiler (rustc), you are most likely not going to do that. Instead, cargo is the command that you will use most of the time. Head over to the Rust book on cargo to learn about cargo.

One thing to add, due to our prior discussion on using nightly Rust, is that you can easily choose which Rust version to use when compiling with cargo. For example, the following command compiles your code using nightly Rust even if you do not set it as the default.

$ cargo +nightly build

The same + syntax works for rustc as well.

It is also possible to set the version of Rust that you want to use for a specific directory by using rustup. For example, the following commands will set the default to nightly Rust for the directory tmp/.

$ cd tmp
$ rustup override set nightly

Basic Syntax of Rust

Pretty much every (procedural) programming language provides constructs to (1) define a variable, (2) define a function, and (3) control the flow of execution. If you understand those constructs for a given language, you will be able to start writing simple programs in that language. Rust is no exception, so let's take a look at how Rust provides those constructs at the very basic level first.

Variable Definitions

The following program shows how to define variables in Rust.

fn main() {
    let a: i32 = 1;
    let b: i32 = 2;

    println!("a + b is {}", a + b);
}

But before we go further, let's note a couple of things.

  • main() is a function and it is the default entry point for a Rust program, similar to C/C++. If you create a new cargo project using cargo new, it will create a main() function in src/main.rs.
  • Let's not worry about println!() for now. All you need to know at this point is that it prints out formatted strings.

In the above program, we can make a few observations regarding variable definitions.

  • There are two variable definitions in the program.
    • let a: i32 = 1 means that a is a 32-bit integer and 1 is its initial value.
    • let b: i32 = 2 means that b is a 32-bit integer and 2 is its initial value.
  • A variable definition starts with let, e.g., let a: i32 = 1.
  • Each variable has a type, e.g., i32 (a 32-bit integer) in let a: i32 = 1. When you declare the type for a variable, the syntax is the variable name, followed by :, followed by the type, e.g., a: i32.
  • A variable definition ends with a semicolon (;), as do most statements in Rust. Rust makes a distinction between a statement and an expression, which we will discuss later.

Although Rust requires each variable to have a type, you don't always need to declare it because oftentimes the Rust compiler can infer a variable's type. We can revise the above code as follows and the Rust compiler will automatically understand the types for a and b.

fn main() {
    let a = 1; // No explicit type declaration here.
    let b = 2; // Not here either.

    println!("a + b is {}", a + b);
}

Keep in mind, however, that it's not always possible for the Rust compiler to infer a variable's type. If that's the case, you still need to explicitly declare the type.
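A common case where inference is not enough is parsing a string into a number: parse() can produce many different types, so the compiler has to be told which one you want. The following minimal sketch supplies that information with an explicit annotation. The function name parse_and_increment() is hypothetical, and, as with println!(), you don't need to worry about unwrap() for now.

```rust
// Hypothetical helper used only for this illustration.
fn parse_and_increment() -> i32 {
    // `parse()` can turn a string into many types (i32, u64, f64, ...),
    // so the `: i32` annotation tells the compiler which one we want.
    // Don't worry about `unwrap()` for now.
    let n: i32 = "42".parse().unwrap();
    n + 1
}

fn main() {
    println!("n + 1 is {}", parse_and_increment());
}
```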

The Rust book has a section on data types and you can take a look at the primitive data types available in Rust.

Function Definitions

Let's look at another piece of code that shows how to define a function.

fn main() {
    let a = 1;
    let b = 2;
    print_sum(a, b);
}

fn print_sum(x: i32, y: i32) {
    let sum = x + y;
    println!("Sum: {}", sum);
}

Here also, we can make a few observations.

  • There are two function definitions. One is for main() and the other is for print_sum(). Again, main() is the default entry point for Rust programs.
  • A function definition starts with fn.
  • print_sum() has two parameters, one is x and the other is y.
    • x: i32 means that the parameter name is x and the type is i32.
    • y: i32 means that the parameter name is y and the type is i32.

Unlike variables, parameters always need a type. Run the following code (that doesn't declare types for parameters) and see what error messages you get.

fn main() {
    let a = 1;
    let b = 2;
    print_sum(a, b);
}

fn print_sum(x, y) {
    let sum = x + y;
    println!("Sum: {}", sum);
}

Typically, you want to return something as a result of calling a function. The following code shows the syntax for it.

fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b);

    println!("a + b is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    x + y
}

There are a couple of differences that we can see.

  • sum() declares a return type, -> i32, in its definition. It means that it returns a 32-bit integer (i32) as a return value.
  • x + y is the return value. Notice a couple of things there.
    • There is no return keyword.
    • There is no semicolon at the end.

Because x + y does not end with a semicolon, it is an expression rather than a statement. As mentioned earlier, we will discuss the difference later.

You can use return in a function, either at the end or in the middle (for an early return). The following code works exactly the same way except that it uses a return statement (notice ; at the end) instead.

fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b);

    println!("a + b is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    return x + y;
}
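To see an early return in the middle of a function, consider the following sketch. The function name min_of() is hypothetical, and it uses if, which is covered under Execution Control below.

```rust
// Returns the smaller of `x` and `y`.
fn min_of(x: i32, y: i32) -> i32 {
    if x < y {
        // Early return: the rest of the function is skipped.
        return x;
    }
    // Reached only when x >= y. This is the final expression, so no `return`.
    y
}

fn main() {
    println!("min_of(3, 7) is {}", min_of(3, 7));
    println!("min_of(9, 2) is {}", min_of(9, 2));
}
```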

Read the Rust book on functions to understand the details about functions.

Execution Control

To control the flow of execution, Rust provides a few constructs that are similar to the ones provided by other languages. Although they are similar, Rust's constructs for execution control are often more powerful. There are typically two types of control you'd like---branching and looping. Rust provides if-else if-else and match for branching and loop, while, and for for looping.

Branching with if

Let's look at the following code, which shows how if-else works in Rust.

fn main() {
    let a: i32 = 1;

    if a == 1 {
        println!("a is 1");
    } else if a == 2 {
        println!("a is 2");
    } else {
        println!("a is something else");
    }
}

It looks very similar to the if-else constructs in other popular languages such as C/C++ or Java, except that the branch condition is not wrapped in parentheses. However, there is a notable difference: if-else evaluates to a value. To understand what this means, let's look at the following code.

fn main() {
    let a = 1;

    let b = if a == 1 {
        2
    } else if a == 2 {
        1
    } else {
        0
    };

    println!("b is {}", b);
}

As you can see, b gets its initial value assigned from the result of if-else. This is because if is an expression, not a statement. An expression in Rust evaluates to a value while a statement doesn't. In Rust, most of the constructs are expressions, e.g., if, for, a code block {}, etc. If you add a semicolon at the end of an expression, it becomes a statement (called an expression statement) and the value that the expression evaluates to gets ignored.

Exactly what value an if evaluates to needs some more explanation. The first thing to understand is that a block is also an expression and it evaluates to the last expression of the block. Run the following code and see what the result looks like.

fn main() {
    let a: i32 = {
        println!("In the block");
        1 // This is an expression. There's no `;`.
    };
    println!("a is {}", a);
}

Since a block evaluates to its last expression, a gets 1. In order to see the difference between an expression and a statement, run the following code and see what error messages you get.

fn main() {
    let a: i32 = {
        println!("In the block");
        1; // This is a statement, not an expression due to `;`.
    };
    println!("a is {}", a);
}

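If a block does not end with an expression (e.g., because its last line ends with a semicolon), it evaluates to (). This is why the code above fails to compile: the block's type is (), not i32. The type (), sometimes referred to as the unit type, has a single value, which is also written (), and it is used "when there's no other meaningful value that could be returned".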
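The following sketch checks this directly: a block whose last line ends with ; evaluates to (), and so does a call to a function with no declared return type, such as print_sum() from earlier. In println!(), {:?} prints a value using its debug format, which shows the unit value as ().

```rust
// Same as `print_sum()` from earlier: no return type is declared.
fn print_sum(x: i32, y: i32) {
    let sum = x + y;
    println!("Sum: {}", sum);
}

fn main() {
    // The block's last line ends with `;`, so the block has no final
    // expression and evaluates to `()`.
    let from_block = {
        println!("In the block");
        1;
    };
    // A function without a declared return type also returns `()`.
    let from_function = print_sum(1, 2);

    println!("{:?} {:?}", from_block, from_function);
    assert_eq!(from_block, ());
    assert_eq!(from_function, ());
}
```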

Now, an if evaluates to the value of the block whose condition is true. In the following code (which is the same code from above), the if evaluates to 2 because a == 1 and the corresponding block evaluates to 2.

fn main() {
    let a = 1;

    let b = if a == 1 {
        2
    } else if a == 2 {
        1
    } else {
        0
    };

    println!("b is {}", b);
}
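Because if is an expression, it can also serve as a function's final expression, in which case its value becomes the return value. The following sketch (the name sign_of() is hypothetical; the standard library already provides i32::signum() for the same job) returns -1, 0, or 1 depending on the sign of its argument. Note that every branch must evaluate to the same type; for example, an if whose branches evaluate to 2 and "two" does not compile.

```rust
// Returns -1 for negative numbers, 0 for zero, and 1 for positive numbers.
fn sign_of(x: i32) -> i32 {
    // No `return` and no `;`: the value of the whole `if` is returned.
    if x < 0 {
        -1
    } else if x == 0 {
        0
    } else {
        1
    }
}

fn main() {
    println!("sign_of(-5) is {}", sign_of(-5));
    println!("sign_of(0) is {}", sign_of(0));
    println!("sign_of(7) is {}", sign_of(7));
}
```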

Branching with match

Rust provides another branching construct called match, and you will probably find yourself using it very often due to its power. However, understanding its power requires understanding Rust's pattern matching, so we will not delve into that for now. The following code is a revised version of the very first if example, using match instead of if-else if-else.

fn main() {
    let a: i32 = 1;

    match a {
        1 => { // If `a` matches `1`, execute this.
            println!("a is 1");
        }
        2 => { // If `a` matches `2`, execute this.
            println!("a is 2");
        }
        _ => { // `_` is a wildcard pattern that matches any value.
            println!("a is something else");
        }
    }
}

With match, you provide an expression to match (e.g., a) and list the execution options (e.g., 1 => {}, 2 => {}, etc.). Each option is called a match arm. Unlike the condition of an if, the expression you match on does not have to be a boolean, as you can see from the code above.

As mentioned earlier, match is much more powerful than if due to pattern matching, but we will cover that later.
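Like if, match is an expression, so the value of the selected arm becomes the value of the whole match. The following sketch uses a hypothetical days_in_month() function that ignores leap years and returns 0 for an invalid month (an arbitrary choice for this illustration). It also uses |, which lets one arm match several values. Rust requires the arms to cover every possible value of the matched expression, which is why the final _ arm is needed.

```rust
// Number of days in `month` (1 = January) in a non-leap year.
// Returns 0 for anything that is not a valid month number.
fn days_in_month(month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31, // `|` matches any of these values.
        4 | 6 | 9 | 11 => 30,
        2 => 28,
        _ => 0, // Without this arm, the match would not cover every u32.
    }
}

fn main() {
    println!("February has {} days", days_in_month(2));
}
```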

Loops

Rust provides three types of loops---loop, while, and for.

loop

loop is probably the most interesting loop construct, especially in conjunction with break. loop is an infinite loop construct and if you want to break out of the loop, you need to use break.

fn main() {
    let mut a: usize = 0; // Don't worry about `mut` for now.

    loop {
        println!("This is infinite...");
        a += 1;
        if a == 10 {
            println!("Unless there's a break");
            break;
        }
    }
}

An interesting property of loop and break is that loop is an expression: it evaluates to the value passed to break.

fn main() {
    let mut a: usize = 0; // Don't worry about `mut` for now.

    let b = loop {
        println!("This is infinite...");
        a += 1;
        if a == 10 {
            println!("Unless there's a break");
            break a;
        }
    };

    println!("b is {}", b);
}

In the code, break a makes the loop evaluate to a's value. That value is then assigned to b.
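A common use of break with a value is a search loop that stops as soon as it finds an answer. The sketch below uses a hypothetical helper, first_square_above(), which returns the smallest n whose square exceeds a limit. It assumes the limit is small enough that n * n does not overflow.

```rust
// Returns the smallest `n` whose square is greater than `limit`.
// Assumes `limit` is small enough that `n * n` never overflows a `u32`.
fn first_square_above(limit: u32) -> u32 {
    let mut n = 0;
    loop {
        if n * n > limit {
            break n; // The whole `loop` evaluates to `n`.
        }
        n += 1;
    }
}

fn main() {
    let n = first_square_above(50);
    println!("n is {}", n); // prints "n is 8"
}
```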

while

while is almost exactly what you would expect.

fn main() {
    let mut a: usize = 0; // Don't worry about `mut` for now.

    while a < 5 {
        println!("a is {}", a);
        a += 1;
    }
}

As with other languages, while provides a conditional loop.
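As a small illustration, Euclid's algorithm for the greatest common divisor fits while naturally, because it repeats until the remainder becomes zero. The function gcd() below is our own sketch, not a standard library function.

```rust
// Euclid's algorithm: keep replacing (a, b) with (b, a % b) while `b` is non-zero.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn main() {
    println!("gcd is {}", gcd(48, 18)); // prints "gcd is 6"
}
```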

for

You can use for in place of while, but for is typically used to iterate over a collection such as an array.

fn main() {
    let a = [0, 1, 2, 3, 4];

    for i in a {
        println!("a contains {}", i);
    }
}

To iterate over a collection with for, you need an iterator for that collection. In the code above, it looks like you are using a directly, but that is not quite the case. The for loop calls IntoIterator::into_iter() on a to obtain an iterator, then loops over the values that iterator produces.
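To make this concrete, here is a simplified sketch of roughly what the for loop does behind the scenes. It obtains an iterator with IntoIterator::into_iter(), then calls next() until next() returns None. The sketch uses match with Some and None, which are explained later. The helper names are hypothetical.

```rust
// Sums an array with a `for` loop.
fn sum_with_for(a: [i32; 5]) -> i32 {
    let mut total = 0;
    for i in a {
        total += i;
    }
    total
}

// A simplified version of what the `for` loop above does: get an iterator
// from the array and keep calling `next()` until it returns `None`.
fn sum_with_iterator(a: [i32; 5]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(a);
    loop {
        match iter.next() {
            Some(i) => total += i, // The iterator produced another element.
            None => break,         // The iterator is exhausted.
        }
    }
    total
}

fn main() {
    let a = [0, 1, 2, 3, 4];
    println!("{} {}", sum_with_for(a), sum_with_iterator(a)); // prints "10 10"
}
```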

You can also use the range operators .. and ..= with for. A range written with .. excludes its right end, while a range written with ..= includes its right end.

fn main() {
    let a: usize = 1;

    for i in 0..a {
        println!("Right exclusive iteration: {}", i);
    }

    for i in 0..=a {
        println!("Right inclusive iteration: {}", i);
    }
}
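To see exactly which numbers each range produces, the sketch below collects them into a Vec (a growable array that we have not covered yet) using the iterator method collect(). With a = 1, as in the code above, 0..a yields only 0, while 0..=a yields 0 and 1. The helper names are hypothetical.

```rust
// Collects every number produced by a right-exclusive range.
fn exclusive(end: usize) -> Vec<usize> {
    (0..end).collect()
}

// Collects every number produced by a right-inclusive range.
fn inclusive(end: usize) -> Vec<usize> {
    (0..=end).collect()
}

fn main() {
    println!("{:?}", exclusive(1)); // prints "[0]"
    println!("{:?}", inclusive(1)); // prints "[0, 1]"
}
```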

Safety Features of Rust

Now that you are familiar with the basic syntax of Rust, let's talk about what makes Rust different from other languages: its safety features. Rust employs many features that safeguard you from writing code that might cause problems at run time. The Rust language and compiler perform many checks to make sure you are writing a more reliable program. These very features sometimes make a program hard to compile. But you will soon find that once your program compiles, it often just works as you intended.

Variable Mutability

The first safety feature to discuss is variable mutability and immutability. By default, a variable in Rust is immutable. This means that once you assign a value to a variable, you cannot assign another value to it. If you try to compile the following code, the compiler will complain. The error message essentially says that a is immutable, so you cannot assign to it twice.

fn main() {
    let a: i32 = 1;
    println!("a is {}", a);

    a = 2;
    println!("a is {}", a);
}

In order to make a variable mutable, you need to declare a mutable variable like the following.

fn main() {
    let mut a: i32 = 1;
    println!("a is {}", a);

    a = 2;
    println!("a is {}", a);
}

You use the mut keyword to define a mutable variable that you can assign a value to more than once.

Since you define mutable variables explicitly, the Rust compiler knows which variables can be modified. Thus, the Rust compiler can statically check (i.e., check at compile time) if the mutable variables are the only ones modified in your code. For you as a Rust developer, this is a safety feature for a couple of reasons. First, it makes you think hard about whether or not you will need to modify a variable when you define it. In other words, it gives you an opportunity to think about how you intend to use each and every variable in your code. Second, it prevents you from defining a variable in one place with the assumption that you will not modify it, and later modifying the variable inadvertently.

In addition to defining a mutable variable with mut, you can also reuse the same variable name.

fn main() {
    let a: i32 = 1;
    println!("a is {}", a);

    let a: i32 = a + 1;
    println!("a is {}", a);

    let a: i64 = 3; // Different type
    println!("a is {}", a);
}

This is called shadowing, which effectively defines a new variable with the same name. Shadowing is useful in certain scenarios. For example, you may have a big chunk of code that uses a variable heavily, and later realize that you need to apply a few quick transformations to the variable before that code uses it. In that case, you can shadow the variable and still take advantage of the Rust compiler's immutability check.
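One way to apply this idea is sketched below with a hypothetical helper. The variable is mutable only long enough to transform it. It is then shadowed again with an immutable binding, so the compiler rejects any later modification. vec! creates a Vec, which we have not covered yet.

```rust
// Sorts a list once, then keeps it immutable for the rest of the function.
fn sorted_and_frozen() -> Vec<i32> {
    let data = vec![3, 1, 2];

    // Shadow `data` with a mutable binding just long enough to sort it...
    let mut data = data;
    data.sort();

    // ...then shadow it again with an immutable binding.
    let data = data;
    // data.push(4); // Would not compile: `data` is immutable again.

    data
}

fn main() {
    println!("{:?}", sorted_and_frozen()); // prints "[1, 2, 3]"
}
```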

The Rust book has an excellent section on variable mutability, so please go read it.

Ownership

Ownership is perhaps the most distinguishing feature of Rust that everybody talks about, and frankly it will give you some headaches when you try to get the Rust compiler to compile your code. However, it is an important feature of Rust that provides safety.

The whole concept has to do with how memory is managed. Languages like C/C++ leave memory management to programmers. This means that C/C++ programmers need to allocate and free memory themselves. As a result, many programs suffer from memory leaks, because it is easy to allocate memory and forget to free it. Other languages like Java and Python use automated memory management, where a garbage collector runs from time to time to reclaim allocated memory that is no longer in use. This approach relieves programmers from worrying about memory management. However, it has a performance cost: the garbage collector needs to run, and it interferes with normal program execution.

Rust takes a different automated approach to memory management. When you define a variable, Rust allocates a piece of memory that the variable will use. This variable is called the owner of that memory. When the owner goes out of scope, Rust deallocates the memory, which Rust calls dropping the value. In certain cases, Rust also moves the ownership of a piece of memory from one variable to another. Thus, the first variable that owns a piece of memory does not necessarily remain its owner the whole time. However, at any given time, a piece of memory has exactly one owner.

There are a few things to unpack here, so let's look at them one by one.

The Scope of a Variable

Let's first look at what a variable's scope is. Scope in Rust works much as in other languages. A variable's scope starts where the variable is declared and ends at the end of the block in which it is declared. The following examples show two cases that illustrate a variable's scope.

fn main() {
    let a: i32 = 1; // The scope for `a` starts here.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The scope for `a` ends here.

fn main() {
    {
        let a: i32 = 1; // The scope for `a` starts here.
    } // The scope for `a` ends here.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}

As you can see from the examples, a variable is only valid within its scope. In fact, most other languages work this way as well. The difference is that Rust uses a variable's scope to manage memory automatically. In the above examples, when a goes out of scope, Rust drops a's memory automatically. At this point, you may think that this is still how other languages work, and you would be correct. For stack-allocated memory such as local variables, you never have to worry about allocating or deallocating memory in other languages either. The difference is that in Rust, you also do not need to worry about heap-allocated memory. Let's discuss this a little further. (If you need a refresher on the stack and the heap, please read the Rust book's section on the stack and the heap.)

Automated Heap Memory Management

In languages like C/C++, programmers allocate or deallocate heap memory by invoking memory management functions such as malloc() and free(). In languages like Java and Python, programmers do not allocate or deallocate heap memory explicitly because it is by default hidden from the programmers and there is a garbage collector that manages memory.

In Rust, heap allocation and deallocation are also hidden from programmers by default. The Rust compiler and Rust's standard library work together to handle the allocation and deallocation of heap memory. Rust allocates heap memory through convenient data structures such as String, Vec, and Box, which hide all the details of allocating heap memory. Of course, Rust is a low-level language, so you can allocate heap memory yourself. But Rust developers typically do not use the heap that way.

Rust deallocates heap memory via a function called drop(), and this is where ownership plays a critical role. When a variable that owns a piece of heap memory goes out of scope, Rust invokes drop() automatically. By "automatically," we mean that the Rust compiler inserts code that invokes drop(). For data structures such as String, Vec, and Box, the standard library provides the code that runs when they are dropped. That code deallocates the heap memory they own.

Mechanism-wise, this is similar to C/C++, which use memory allocation and deallocation functions (e.g., malloc() and free()). The difference is that, by default, Rust programmers do not need to invoke them themselves. This is also different from languages like Java or Python, where a separate runtime component (a garbage collector) manages memory.

The following examples use the Box data structure to allocate heap memory.

fn main() {
    let a = Box::new(1); // The heap memory for `a` gets allocated.
                         // Don't worry about the syntax for `Box` for now.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The heap memory for `a` gets deallocated.

fn main() {
    {
        let a = Box::new(1); // The heap memory for `a` gets allocated.
    } // The heap memory for `a` gets deallocated.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}

Earlier we said that each allocated piece of memory in Rust has exactly one owner. Since drop() takes care of deallocation when an owner goes out of scope, we mostly do not need to worry about memory leaks. One caveat is that Rust does not prevent programmers from manually allocating and deallocating heap memory. Thus, a program can still leak memory when a programmer manages memory explicitly and does not do so carefully. However, Rust programmers rarely choose to manage heap memory themselves, so the chance of running into memory leaks is low.
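To observe when drop() runs, a type can implement the Drop trait itself (traits are covered later). The sketch below uses a hypothetical type, Noisy. Whenever a Noisy value is dropped, it prints a message and increments a global counter. The counter shows that the inner value is dropped exactly at the end of its block, while the outer value survives until the end of the function.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter of how many `Noisy` values have been dropped so far.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type used only to observe when `drop()` runs.
struct Noisy {
    name: &'static str,
}

impl Drop for Noisy {
    // Rust calls this automatically when a `Noisy` value goes out of scope.
    fn drop(&mut self) {
        println!("dropping {}", self.name);
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns how many `Noisy` values were dropped before the function ends.
fn demo() -> usize {
    let before = DROPPED.load(Ordering::SeqCst);
    // A name starting with `_` (unlike a bare `_`) keeps the value alive
    // until the end of the scope.
    let _outer = Noisy { name: "outer" };
    {
        let _inner = Noisy { name: "inner" };
    } // `_inner` goes out of scope here, so its `drop()` runs now.
    DROPPED.load(Ordering::SeqCst) - before
} // `_outer` goes out of scope here, after the return value is computed.

fn main() {
    let dropped = demo();
    println!("dropped inside demo(): {}", dropped); // prints "dropped inside demo(): 1"
}
```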

Determining Ownership

Since Rust calls drop() when an owner goes out of scope, it is absolutely critical to be able to determine whether a variable owns a piece of memory. If a single variable accesses a piece of heap memory exclusively throughout the whole program, determining ownership is easy. However, forbidding two or more variables from accessing the same piece of memory would be too restrictive. Thus, Rust employs a few mechanisms to keep track of ownership.

Move

By default, when you assign a variable to another variable, Rust moves the ownership. This is probably one of the most surprising aspects of Rust for beginners. Let's look at the following code to see what this means.

fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.

    println!("b is {}", b); // This works fine.
}

fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.

    println!("a is {}", a); // This throws an error,
                            // because `a` no longer has access to the `Box`.
}

As you can see, once you assign a variable to another variable, Rust no longer allows you to use the original variable. The same thing happens with function arguments and return values.

fn main() {
    let a = String::from("a"); // Don't worry about the syntax for `String` for now.
    print_str(a); // `a` moves to the function `print_str()`.
}

fn print_str(x: String) { // `x` is the (new) owner of the string passed in.
    println!("The string is {}", x);
}

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}

fn get_str() -> String {
    let x = String::from("a");
    x // `x` moves to the caller
}

fn main() {
    let a = get_str(); // `a` is the new owner of the String "a".
    println!("a is {}", a);
}

You might be wondering why this is necessary. Let's take a closer look at the example that tries to use a after calling print_str().

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("The string is {}", x);
} // Since `x` is the owner, Rust deallocates the String at this point,
  // because `x` is out of scope.

In the code, x becomes the new owner of the String "a" and goes out of scope when print_str() returns. Rust therefore drops the String at that point. If a could still be used after the function returns, it would access memory that has already been dropped.

Generally speaking, having two different variables that can access the same heap location (called aliases) can cause problems. For example, one variable can free the memory at one point while the other variable accesses the memory later. This is called use-after-free, a well-known bug that can lead to vulnerabilities. Similarly, one variable can free the memory and the other variable can later free the same memory again. This is called double free, another well-known bug that can lead to vulnerabilities. Rust moves the ownership and does not let the original variable access its old value. This helps prevent problems caused by two variables accessing the same heap location.

However, you might think that this is too restrictive. For example, if you could never use a variable again after passing it as a function argument, writing a program would be very difficult. Thus, Rust provides several ways to work around this restriction.

Copy and Clone

By default, primitive data types such as i32 and i64 do not move ownership. Instead, their values are simply copied to a new memory location. The following example illustrates this.

fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b); // This does not move the ownership.

    println!("a is {}", a); // This works fine.
    println!("b is {}", b); // This works fine.
    println!("s is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    x + y // This does not move the ownership either.
}

As we can see, we can still use a and b after passing them as arguments to sum(). The same applies to the return value of sum(), although the code does not directly illustrate that. All this is because primitive data types are copied instead of moved, so no ownership is transferred.

Rust distinguishes copying from moving by checking whether a data type implements the Copy trait (we will look at what a trait is later). Primitive types such as integers, floating-point numbers, bool, and char implement the Copy trait, while data structures like String and Box do not. You can implement the Copy trait for your own data structure to get copy semantics instead of move semantics, as long as every part of the data structure is itself Copy. A typical criterion for deciding whether to implement Copy is the cost and complexity of copying. For example, primitive data types are small and have fixed sizes, so they are inexpensive and easy to copy. String and Box, however, point to a location on the heap whose size is often not known a priori. Copying them may therefore be neither easy nor inexpensive.

Another way to copy is cloning. If a data type implements the Clone trait, you can call clone() to explicitly create a duplicate of a value. Cloning differs from copying in that it always requires an explicit call.
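The sketch below contrasts the two. Point is a hypothetical struct (structs are covered later) that derives Clone and Copy, so assigning it copies it. Deriving Copy only compiles when every field is itself Copy, which is why a struct containing a String cannot be Copy. A String, by contrast, has to be duplicated explicitly with clone(). The helper copy_and_clone() is also hypothetical.

```rust
// `#[derive(...)]` asks the compiler to implement the listed traits.
// `Copy` is allowed here because both fields (`i32`) are `Copy`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn copy_and_clone() -> (Point, Point, String, String) {
    let p = Point { x: 1, y: 2 };
    let q = p; // Copies `p`; `p` is still usable afterwards.

    let s = String::from("a");
    let t = s.clone(); // Explicitly duplicates the heap data; `s` is still usable.

    (p, q, s, t)
}

fn main() {
    let (p, q, s, t) = copy_and_clone();
    println!("p = {:?}, q = {:?}", p, q);
    println!("s = {}, t = {}", s, t);
}
```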

Borrow

Rust provides another alternative to moving, called borrowing. A borrow is written with &, which indicates that a variable borrows a value from another variable.

fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
}

To pass a variable to a function as a borrow instead of moving it, you need to do two things. First, add & in front of the variable you pass. Second, use & in the function definition as part of the type of each borrowed parameter. As in C++, & is called a reference, but in Rust it is better to think of it as a borrow than as a pointer.

Mutable Reference

One caveat is that a borrow created with & is read-only.

fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This throws an error since `s` is read-only.
    println!("The new string is {}", s);
}

Again, this is quite restrictive since you cannot modify the value coming in as an argument. Thus, Rust provides a mutable borrow.

fn main() {
    let mut str = String::from("a"); // `mut` is used.
    print_str(&mut str); // `&mut` is used.
    println!("Can still access str: {}", str);
}

fn print_str(s: &mut String) { // `&mut` is used.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This works now.
    println!("The new string is {}", s);
}

There are three things to note here. First, when defining str, we use mut to indicate that str holds a mutable value. Second, when passing str to print_str(), we use &mut to indicate a mutable borrow: print_str() not only borrows the value but may also modify it. Third, in the definition of the parameter s of print_str(), we use &mut to indicate that print_str() may modify the value it borrows.
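A mutable borrow works the same way for simple values such as integers. In the sketch below, increment() is a hypothetical helper. The * operator dereferences the borrow, so the code modifies the borrowed i32 itself rather than the reference.

```rust
// Adds one to the integer that `x` mutably borrows.
fn increment(x: &mut i32) {
    *x += 1; // `*` reaches through the borrow to the borrowed `i32`.
}

fn main() {
    let mut count = 0;
    increment(&mut count);
    increment(&mut count);
    println!("count is {}", count); // prints "count is 2"
}
```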

The Borrow Checker and the Aliasing XOR Mutability Principle

Mutable borrowing gives us the flexibility to modify a borrowed value within a function. However, it comes with a risk of data races. If you need a refresher on data races, please read the Rust book's section on mutable references, which explains the problem. In a nutshell, if two references have mutable access to the same memory location, one can modify the value without the other knowing. Data races are notoriously difficult to track down and fix.

Rust protects programs from this problem with a principle commonly known as aliasing XOR mutability: you get either aliasing or mutability, but not both. As mentioned earlier, aliasing means having two or more references to the same (heap) memory location. Mutability means having the ability to modify the value at a memory location. Thus, under aliasing XOR mutability, a value has either exactly one mutable reference (created with &mut) or any number of immutable references (created with just &), but never both at the same time. The following examples illustrate the principle.

fn main() {
    let a = String::from("a");
    let b = &a;
    let c = &a; // So far we have two additional references to `a`.
                // This is aliasing, which is fine, as long as
                // those references don't have mutability.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &a;
    let c = &mut a; // This is a problem because `b` is an alias,
                    // and `c` has mutability. This is
                    // both aliasing and mutability, not XOR.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &mut a;
    let c = &mut a; // This does not work either,
                    // because both `b` and `c` are mutable aliases.
                    // I.e., both aliasing and mutability, not XOR.

    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

The Rust compiler has a component called the borrow checker that enforces the aliasing XOR mutability principle at compile time. Borrow checking often gives beginners a hard time. People say they are "fighting the borrow checker" because the compiler keeps rejecting their programs over borrowing rules. Thus, it is important to understand exactly how borrow checking works. Practice is a must here, and make sure you also read the Rust book on borrow checking.
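One detail helps when fighting the borrow checker. A borrow lasts until its last use, not necessarily until the end of its scope (a feature known as non-lexical lifetimes). The sketch below is a variation of the rejected example above, wrapped in a hypothetical helper. Here, the shared borrow b is no longer used when the mutable borrow c is created, so the compiler accepts the program.

```rust
fn borrow_then_mutate() -> String {
    let mut a = String::from("a");

    let b = &a; // A shared borrow of `a` starts here.
    println!("b is {}", b); // Last use of `b`, so the shared borrow ends here.

    let c = &mut a; // Accepted: no shared borrow of `a` is still in use.
    c.push_str("_more");

    a // `c` is no longer used, so `a` can be returned (moved out).
}

fn main() {
    println!("a is {}", borrow_then_mutate()); // prints "a is a_more"
}
```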

Option

Another important safety aspect of Rust is how it handles variables that have no value. If you have programming experience, you probably know that there are many cases where a variable does not have a meaningful value. In those cases, values like null or just plain 0 are used to represent the absence of a meaningful value. However, this has led to numerous bugs and vulnerabilities. Programmers often forget to handle null or 0 and get a runtime error, e.g., a null pointer exception.

In Rust, there is no null value that you can use. Instead, the Rust standard library provides an alternative called Option. Rust programmers use Option heavily, and you can find it everywhere, e.g., in the standard library and in external crates. Thus, it is absolutely critical to understand what Option is and how to use it.

Option is defined as follows.

enum Option<T> {
    None,
    Some(T),
}

The definition of Option uses enum, which is something we have not discussed yet. It is similar to enumeration types in other languages like C/C++ or Java. An enum defines a custom type and lists all possible values that a variable of that type can have. For example, the following code defines an enum type called Ex and it has two possibilities.

enum Ex {
    FirstPossibility,
    SecondPossibility,
}

These possibilities are called enum variants. To use a variant, you write the enum name, ::, and the variant name.

enum Ex {
    FirstPossibility,
    SecondPossibility,
}

fn main() {
    let a: Ex = Ex::FirstPossibility;
}

You can find more details in the Rust book's section on enum.

If you look at the Option definition, it defines an enum with two variants. Option::None is used when a variable does not have a meaningful value, and Option::Some is used when it does.

Option::Some has a couple of additional details to discuss. The first is the use of T in Option<T> and Some(T). This T is called a generic type parameter (and it does not have to be the letter T). If you are familiar with generics in languages like C++ or Java, you can probably understand it quickly. T is a variable that stands for a type instead of a value. Without it, we would have to define a separate Option for every single type, e.g., one for i32, another for i64, and so on. Instead, we define Option once using a generic type parameter and instantiate it for any type. In the Option definition above, T is used in Option<T> to declare that the enum Option is defined for all types.

The second detail is the definition Some(T). This declares that Some is a variant that holds a value of type T. This is different from FirstPossibility or None in the above examples, which do not hold any value.

The following example demonstrates all of these.

fn main() {
    let option_some_for_i32: Option<i32> = Some(1);
    let option_none_for_i32: Option<i32> = None;
    let option_some_for_string: Option<String> = Some(String::from("str"));
}

You can find more details in the Rust book's section on generic data types.

By declaring a variable with Option, you are explicitly saying that a variable may or may not have a meaningful value and more importantly, you are forcing yourself to deal with both cases in your code.
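As an illustration, here is a sketch of a hypothetical divide() function. It returns None instead of crashing when the divisor is zero, and its caller must handle both variants before it can use the result. The standard library already provides i32::checked_div() for this purpose, and that method also handles overflow, which this sketch ignores. The hand-written version simply makes the two variants visible.

```rust
// Hypothetical helper: integer division that reports a zero divisor with `None`.
// (It does not handle the overflow case `i32::MIN / -1`.)
fn divide(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        None
    } else {
        Some(a / b)
    }
}

// The caller cannot use the quotient without also deciding what to do about `None`.
fn describe(result: Option<i32>) -> String {
    match result {
        Some(value) => format!("result is {}", value),
        None => String::from("cannot divide by zero"),
    }
}

fn main() {
    println!("{}", describe(divide(10, 2))); // prints "result is 5"
    println!("{}", describe(divide(1, 0))); // prints "cannot divide by zero"
}
```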

In the above example, you might have noticed that Some and None are written without the Option:: qualifier, i.e., not as Option::Some or Option::None. This is because Rust automatically imports these definitions into every program. The set of items that every Rust program imports automatically is called the prelude.

There are a lot of details that we do not discuss here regarding Option and enum. Make sure you read the Rust book on enum and pattern matching as well as on generic data types.

Result

The last safety aspect of Rust to highlight is its approach to error handling. Some languages use special values to represent an error condition, e.g., null or a negative integer such as -1. Other languages report errors outside the regular return path, e.g., with throw and try-catch in Java. Rust unifies these two approaches and uses an enum called Result to either return a value or report an error. As with Option, Rust programmers use Result heavily, and you can find it everywhere. Thus, it is also critical to understand what Result is and how to use it.

The definition looks like the following.

enum Result<T, E> {
    Ok(T),
    Err(E),
}

The first variant, Result::Ok, represents success and carries a value. The second variant, Result::Err, represents an error and carries an error value. Thus, Result is typically used as a return type.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

A common way to work with Result (as well as Option) is match, which we discussed earlier. The following code shows an example. It also demonstrates the pattern matching power of match that was briefly mentioned earlier.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);

    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }

    let result = function_with_result(false);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }
}

As we can see, match not only recognizes whether result is Ok() or Err(), but also binds the value inside Ok() (or Err()) to success_result (or error_result).

Another common way is to use if let, which is similar to match.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);

    if let Ok(success_result) = result {
        println!("Success: {}", success_result);
    } else {
        println!("Error");
    }

    let result = function_with_result(false);
    if let Err(error_result) = result {
        println!("Error: {}", error_result);
    } else {
        println!("Success");
    }
}

if let attempts a pattern match. If the match succeeds, it executes the if let block; otherwise, it executes the else block.

You can find more details on if let and pattern matching in the Rust book.

Similar to Option, by declaring a return type as Result, you are forcing yourself to handle both the success case and the error case. There are a lot of details about Result and error handling that we do not cover here, so please make sure you read the Rust book on error handling.
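The standard library itself uses Result heavily. For example, parsing a string into a number with parse() returns a Result, and for integers the error type is std::num::ParseIntError. The helper functions below are hypothetical. They only sketch how a caller handles both outcomes.

```rust
use std::num::ParseIntError;

// Hypothetical helper: `parse` returns `Ok` with the number or `Err` with the reason.
fn parse_number(input: &str) -> Result<i32, ParseIntError> {
    input.trim().parse::<i32>()
}

// Turns either outcome into a message, so both cases are handled explicitly.
fn report(input: &str) -> String {
    match parse_number(input) {
        Ok(n) => format!("parsed {}", n),
        Err(e) => format!("could not parse {:?}: {}", input, e),
    }
}

fn main() {
    println!("{}", report("42")); // prints "parsed 42"
    println!("{}", report("abc"));
}
```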

Expressive Power and Unsafe Rust

Rust is a low-level language, meaning you can do almost anything that other languages allow you to do. For example, as mentioned earlier, you can allocate and deallocate memory yourself. You can also do other things that low-level languages such as C or C++ allow but higher-level languages such as Java or Python do not. However, Rust recognizes that it is not always desirable or safe to allow operations that can introduce bugs or vulnerabilities. Thus, Rust tries to strike a balance between what is "safe" and what is "unsafe." It distinguishes what is commonly called unsafe Rust from safe Rust.

Unsafe Rust exists because of the tension between expressiveness and safety in Rust as a language. With safe Rust, you might find some things hard or even impossible to express. This is especially likely when you write low-level code such as shell commands, system libraries, or an OS kernel.

Example

The most famous example is linked lists. There is actually a whole online book about writing lists in Rust. It is not our goal to look at all the details, but let's take a look at an example to make our discussion a little more concrete.

The linked list example below uses struct, which is similar to struct in C/C++: it defines a custom type with a list of members. You can read the Rust book on struct to learn the details. Below is a simple example of a struct definition and initialization.

struct Ex {
    member1: i32,
    member2: String,
}

fn main() {
    let struct_var: Ex = Ex {
        member1: 1,
        member2: String::from("member2"),
    };

    println!("Member1: {}", struct_var.member1);
    println!("Member2: {}", struct_var.member2);
}

The following example shows that a circular linked list, which is not difficult to express in other languages, does not translate easily to Rust.

struct Node {
    next: Option<Box<Node>>, // The definition of `Box` is actually
                             // `Box<T>` with a generic type parameter `T`.
}

fn main() {
    let mut tail = Box::new(Node {
        next: None,
    }); // A tail node that doesn't have the next node for now.

    let head = Box::new(Node {
        next: Some(tail),
    }); // A head node that has the tail node as the next node.

    tail.next = Some(head); // An attempt to have the tail node
                            // point back to the head node
}

The problem here is ownership. First, we assign a Box to tail. We then use Some(tail) as the next node of head, so head becomes the owner of the Box at that point, and tail no longer has access to it. But then we try to assign Some(head) to tail.next. That would require accessing the original Box, which tail can no longer access.
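For contrast, a chain without the cycle compiles without complaint, because every Box has exactly one owner. The sketch below uses the same Node shape, with hypothetical helpers build_two_node_list() and len(). The len() function uses while let, which works like if let but repeats, together with Option::as_deref(). This lets it look at the next node without taking ownership of it.

```rust
// Same shape as above: each node owns the next one through a `Box`.
struct Node {
    next: Option<Box<Node>>,
}

// Builds head -> tail. Each `Box` has exactly one owner, so this compiles.
fn build_two_node_list() -> Node {
    let tail = Box::new(Node { next: None });
    Node { next: Some(tail) } // `tail` moves into the head node.
}

// Counts the nodes reachable from `head` by following `next` links.
fn len(head: &Node) -> usize {
    let mut count = 1;
    let mut current = head;
    // `as_deref()` turns `&Option<Box<Node>>` into `Option<&Node>` (a borrow).
    while let Some(next) = current.next.as_deref() {
        count += 1;
        current = next;
    }
    count
}

fn main() {
    let head = build_two_node_list();
    println!("the list has {} nodes", len(&head)); // prints "the list has 2 nodes"
}
```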

This example shows how safe Rust trades expressive power for safety. In other words, safe Rust sometimes sacrifices expressive power to provide better safety. Besides linked lists, there are many other examples where safe Rust limits expressive power.

The unsafe Keyword

As mentioned earlier, Rust is a low-level language, and you can do almost anything that other low-level languages allow you to do. However, as we have just seen, Rust limits its expressive power to provide better safety. These two goals are in conflict, and Rust reconciles them by distinguishing what is considered safe from what is considered unsafe via the unsafe keyword.

You can use unsafe in order to do certain things that Rust by default does not allow you to do. Let's first look at how to use unsafe and then look at what you can do with unsafe.

unsafe Blocks, Functions, and Traits

You can use unsafe in three ways.

The first way is to define an unsafe block. The code below does not actually need unsafe. It is only for demonstration purposes.

fn main() {
    unsafe {
        let a = 1;
        println!("a is {}", a);
    }
}

Another way is to define an unsafe function. When you want to invoke an unsafe function, you can only do it within an unsafe block.

unsafe fn unsafe_fn(a: i32) {
    println!("a is {}", a);
}

fn main() {
    unsafe {
        unsafe_fn(1);
    }
}
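A function is typically marked unsafe when it has a precondition that the compiler cannot check, so the caller takes responsibility for upholding it. A common convention, sketched below with hypothetical function names, is to document that precondition in a `# Safety` section, justify each unsafe block with a `SAFETY:` comment, and expose a safe wrapper that checks the precondition itself.

```rust
/// Returns `values[index]` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `index < values.len()`.
unsafe fn read_unchecked(values: &[i32], index: usize) -> i32 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

// A safe wrapper: it checks the precondition itself, so its callers
// never need to write `unsafe`.
fn read_checked(values: &[i32], index: usize) -> Option<i32> {
    if index < values.len() {
        // SAFETY: we just checked that `index` is in bounds.
        Some(unsafe { read_unchecked(values, index) })
    } else {
        None
    }
}

fn main() {
    let data = [10, 20, 30];
    println!("{:?}", read_checked(&data, 1)); // Some(20)
    println!("{:?}", read_checked(&data, 5)); // None
}
```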

The third way is to define an unsafe trait. However, since we have not discussed traits yet, we will discuss the use of unsafe for traits later when we discuss traits.

unsafe Capabilities

The Rust book has a section on unsafe Rust that gives an overview of what you can do with unsafe. There is also a separate book, The Rustonomicon, that is dedicated to unsafe Rust. These resources discuss all the capabilities of unsafe as well as important nuances that you need to know when you use unsafe.

Among all the things that unsafe allows you to do, using raw pointers is perhaps the most common. Raw pointers are similar to pointers in C/C++, and there are two types: immutable raw pointers, written *const T, and mutable raw pointers, written *mut T. You can create raw pointers outside of unsafe, but you can dereference a raw pointer only inside unsafe. The following two examples show the difference.

fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;

    println!("a through raw_ptr is {}", *raw_ptr); // This does not compile.
}

fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;

    unsafe {
        println!("a through raw_ptr is {}", *raw_ptr); // This works.
    }
}

Unlike references, raw pointers lack the safety guarantees that Rust provides. Most notably, Rust does not check the aliasing XOR mutability rule for raw pointers. This means that you can have any number and combination of mutable and immutable raw pointers, and the Rust compiler will not complain.

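fn main() {
    let mut a = 1;
    // Both raw pointers are derived from the same `&mut a`. Creating them from
    // separate `&a` and `&mut a` borrows would also compile, but reading through
    // the `*const` pointer after writing through the `*mut` one would then be
    // undefined behavior (Miri flags it).
    let mutable_raw_ptr: *mut i32 = &mut a;
    let immutable_raw_ptr: *const i32 = mutable_raw_ptr;

    unsafe {
        println!("a through immutable_raw_ptr is {}", *immutable_raw_ptr);
        println!("a through mutable_raw_ptr is {}", *mutable_raw_ptr);

        *mutable_raw_ptr = 2;

        println!("*immutable_raw_ptr now is {}", *immutable_raw_ptr);
        println!("*mutable_raw_ptr now is {}", *mutable_raw_ptr);
    }
}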

Another notable aspect of raw pointers is that Rust does not automatically deallocate the memory they point to. The following example shows manual allocation and deallocation (modified from this page).

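fn main() {
    unsafe {
        // Describe the memory we want: the size and alignment of one `u16`.
        let layout = std::alloc::Layout::new::<u16>();
        // `alloc` returns a `*mut u8`; cast it to the type we store.
        let ptr = std::alloc::alloc(layout) as *mut u16;
        if ptr.is_null() {
            std::alloc::handle_alloc_error(layout); // Allocation failed.
        }

        *ptr = 42;
        println!("*ptr is {}", *ptr);

        // Nothing frees this memory for us: we must deallocate it manually,
        // passing the same layout that we used to allocate it.
        std::alloc::dealloc(ptr as *mut u8, layout);
    }
}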

In this example, Rust does not automatically deallocate what ptr points to. It needs to be done manually.
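The same holds for a raw pointer obtained from a Box. Box::into_raw() gives up the Box's automatic cleanup and hands you a raw pointer; the memory is freed only if you later turn the pointer back into a Box with Box::from_raw() (or release it in some other correct way). A minimal sketch, where the bump_through_raw() helper is hypothetical:

```rust
// Moves `value` into a heap allocation, modifies it through a raw pointer,
// and then hands the memory back to a `Box` so that it is freed automatically.
fn bump_through_raw(value: i32) -> i32 {
    let boxed = Box::new(value);
    // From here on, nothing frees this memory unless we arrange it ourselves.
    let raw: *mut i32 = Box::into_raw(boxed);
    unsafe {
        *raw += 1;
        // Re-wrapping the pointer gives ownership back to a `Box`, which
        // deallocates the memory when `restored` goes out of scope.
        let restored = Box::from_raw(raw);
        *restored
    }
}

fn main() {
    println!("{}", bump_through_raw(41)); // 42
}
```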

Although unsafe gives you more expressive power and lets you perform low-level operations such as pointer manipulation, its use is generally discouraged because it bypasses the safety net provided by the Rust compiler. If you do use it, it is critical to have a clear understanding of what it does. As mentioned earlier, Rust provides great resources on this (the unsafe section of the Rust book and the Rustonomicon), and you are highly encouraged to read them before you start using unsafe.

More on struct and trait

There are a couple of things that we have already used without explanation in the previous chapters, so let's tie up some loose ends.

More on struct

You might remember how we created a new Box or a new String in some of the earlier examples.

fn main() {
    let s = String::from("a");
    let b = Box::new(1);

    println!("This is a String: {}", s);
    println!("This is a Box: {}", b);
}

from() and new() are called associated functions, and they are associated with the structs String and Box, respectively. The syntax for defining and calling an associated function is as follows.

struct Ex {
    field1: i32,
    field2: bool,
}

impl Ex {
    fn associated_fn(x: i32, b: bool) -> Ex {
        println!("Creating a new Ex with {} and {}", x, b);
        Ex { field1: x, field2: b }
    }
}

fn main() {
    let ex = Ex::associated_fn(1, true);
}

Another kind of function that you can define for a struct is a method. The difference between an associated function and a method is that a method takes self as its first parameter, which refers to the struct instance that the method is called on. This is similar to methods in a Python class. The following example extends the above example with a method.

struct Ex {
    field1: i32,
    field2: bool,
}

impl Ex {
    fn associated_fn(x: i32, b: bool) -> Ex {
        println!("Creating a new Ex with {} and {}", x, b);
        Ex { field1: x, field2: b }
    }

    fn print_field1(self) {
        println!("field1 is {}", self.field1);
    }
}

fn main() {
    let ex = Ex::associated_fn(1, true);
    ex.print_field1(); // Invoking a method with a `.`
                        // `self` is automatically passed.
}

The self parameter follows the same ownership and borrowing rules as any other parameter. Thus, when you call ex.print_field1(), ex moves into print_field1(), because print_field1() takes self by value. This means that the next example does not compile.

struct Ex {
    field1: i32,
    field2: bool,
}

impl Ex {
    fn associated_fn(x: i32, b: bool) -> Ex {
        println!("Creating a new Ex with {} and {}", x, b);
        Ex { field1: x, field2: b }
    }

    fn print_field1(self) {
        println!("field1 is {}", self.field1);
    }

    fn print_field2(self) {
        println!("field2 is {}", self.field2);
    }
}

fn main() {
    let ex = Ex::associated_fn(1, true);
    ex.print_field1();
    ex.print_field2();
}

As the compiler error says, in the first call (ex.print_field1()), ex moves into print_field1(). Thus, the second call (ex.print_field2()) cannot use ex anymore. However, you can borrow self, just like any other variable or parameter.

struct Ex {
    field1: i32,
    field2: bool,
}

impl Ex {
    fn associated_fn(x: i32, b: bool) -> Ex {
        println!("Creating a new Ex with {} and {}", x, b);
        Ex { field1: x, field2: b }
    }

    fn print_field1(&self) { // Immutable borrow
        println!("field1 is {}", self.field1);
    }

    fn print_field2(&mut self) { // Mutable borrow
        println!("field2 is {}", self.field2);
    }
}

fn main() {
    let mut ex = Ex::associated_fn(1, true);
    ex.print_field1();
    ex.print_field2();
}

Associated functions are often used for initialization (creating a new instance), while methods are used for operations on a specific instance.
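The following sketch puts the three forms of self side by side, using a hypothetical Counter type: &self for reading the instance, &mut self for modifying it, and self for consuming it. Unlike print_field2() above, increment() actually needs its mutable borrow.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: no `self`, called as `Counter::new()`.
    fn new() -> Counter {
        Counter { count: 0 }
    }

    // `&mut self`: may modify the instance.
    fn increment(&mut self) {
        self.count += 1;
    }

    // `&self`: may only read the instance.
    fn get(&self) -> u32 {
        self.count
    }

    // `self`: takes ownership, so the caller cannot use the instance afterward.
    fn into_count(self) -> u32 {
        self.count
    }
}

fn main() {
    let mut counter = Counter::new();
    counter.increment();
    counter.increment();
    println!("count is {}", counter.get()); // count is 2

    let final_count = counter.into_count(); // `counter` moves here.
    println!("final count is {}", final_count);
}
```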

trait

Earlier, we mentioned that if a type implements the Copy trait, Rust copies the value instead of moving ownership. We also mentioned that unsafe can be used to define an unsafe trait. A trait is similar to an interface in other languages, and it is used to define behavior shared across different types. A trait declares functions, and each type that implements the trait must provide those functions (a trait can also supply default implementations, which a type may use as-is). For example, the following code defines a trait called TraitEx with a single function to implement, shared_behavior().

trait TraitEx {
    fn shared_behavior(&self) -> String;
}

You can implement a trait for your struct as follows.

trait TraitEx {
    fn shared_behavior(&self) -> String;
}

struct StructEx;

impl TraitEx for StructEx {
    fn shared_behavior(&self) -> String {
        String::from("string")
    }
}

Traits are used heavily in Rust, and you will frequently encounter code like the following, which might look confusing at first (these examples are taken from the Rust book).

fn notify<T: Summary>(item: &T) {}

// The same signature, written with `impl Trait` syntax:
fn notify(item: &impl Summary) {}

The above two definitions are essentially equivalent. They mean that the parameter item can be a reference to any type (hence the generic type parameter T) that implements the Summary trait. This is called a trait bound, because it bounds (constrains) the generic type to types that implement the trait. In fact, we can specify multiple trait bounds for a parameter.

fn notify<T: Summary + Display>(item: &T) {}

The above requires the type of item to implement two traits, Summary and Display. When there are many trait bounds, a where clause makes the signature easier to read, as follows.

use std::fmt::{Debug, Display};

fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    unimplemented!()
}
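To tie these pieces together, here is a small sketch with a hypothetical Describe trait standing in for Summary. It shows a required function, a default implementation that one type overrides, a function with a trait bound, and a where clause; all names are our own.

```rust
use std::fmt::Display;

trait Describe {
    // Required: every implementing type must provide this.
    fn name(&self) -> String;

    // Default implementation: types may use it as-is or override it.
    fn describe(&self) -> String {
        format!("This is {}", self.name())
    }
}

struct Cat;

struct Robot {
    id: u32,
}

impl Describe for Cat {
    fn name(&self) -> String {
        String::from("a cat")
    }
}

impl Describe for Robot {
    fn name(&self) -> String {
        format!("robot #{}", self.id)
    }

    // Overrides the default implementation.
    fn describe(&self) -> String {
        format!("Beep. I am {}", self.name())
    }
}

// Trait bound: accepts a reference to any type implementing `Describe`.
fn announce<T: Describe>(item: &T) -> String {
    item.describe()
}

// The same idea with two generic types, using a `where` clause.
fn announce_with_tag<T, U>(item: &T, tag: &U) -> String
where
    T: Describe,
    U: Display,
{
    format!("[{}] {}", tag, item.describe())
}

fn main() {
    println!("{}", announce(&Cat));
    println!("{}", announce(&Robot { id: 7 }));
    println!("{}", announce_with_tag(&Robot { id: 7 }, &"info"));
}
```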

Week 2: Rust (Continued)

There are a few remaining topics to highlight in Rust, which you will see and use frequently even as a beginner.

Lifetimes

In Safety Features of Rust, we looked at one borrow checker rule that Rust enforces, namely aliasing XOR mutability. Rust enforces another borrow checker rule as well: a referent must outlive its references. We saw this rule at play earlier when we discussed moves.

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.

    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}

The above example is the same one we saw in Safety Features of Rust. a refers to a String referent that contains "a". When a moves into print_str(), the parameter x becomes the owner, and the string is deallocated when print_str() returns. If Rust still allowed a to be used after the move, a would outlive its referent, which would cause a use-after-free problem.

In general, to make sure that a referent outlives its references, Rust needs to know how long a referent lives (i.e., until it is deallocated) and how long a reference is used (i.e., how long it must point to a valid object in memory). The problem is that this is sometimes difficult to infer, so Rust asks programmers to provide this information using a concept called lifetimes. Below, we first look at functions and why lifetimes are important for them. Later, we look at other uses of lifetimes.

The Problem with References in Functions

Lifetimes are frequently used in functions, and most of the time Rust takes care of lifetimes automatically, so programmers do not need to worry about them. However, that is not always the case. Let's look at the following example to understand this further.

The example below uses &str, which is called a string slice. You can read more about it in the Rust book.

fn ref_return(x: &str) -> &str {
    // Assume that this function does some complicated things
    // and returns a string slice.
}

fn main() {
    let s = String::from("string");

    let res = ref_return(&s);
    // Assume that the rest of the code does things with `res`.
}

In the above code, res is a reference to the string slice returned from ref_return() (the referent). Thus, Rust needs to check whether or not the referent outlives the reference. Determining how long the reference (res) would live is easy---it's until the end of the main() function. However, it is not easy to determine how long the referent that res points to would live---it's coming from ref_return() and generally speaking, unless you execute the function, you won't know what it's going to return and which memory location res will point to. Thus, Rust is unable to check if the referent would outlive the reference.

Lifetimes in Functions

Due to the above problem, if a function returns a reference, Rust may ask programmers to tell the compiler how long the returned reference is valid. This is called the reference's lifetime. However, there is an important thing to keep in mind: Rust does not ask programmers to specify the actual lifetime of a reference. Instead, Rust only asks programmers to describe the relationship between the lifetimes of the input parameters and the lifetime of the returned reference. It would be very difficult, if not impossible, for a programmer to determine exactly how long a reference is valid, and Rust does not ask for that.

Let's look at a more concrete example to understand what this means. The example below is from the Rust book.

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

If you compile the code, the compiler will complain that the return type is missing a lifetime specifier. The error message shows what a lifetime parameter looks like and what to do. A lifetime parameter is written as an apostrophe followed by a name, as in 'a. It represents the duration for which a reference is valid, and in a reference type it comes right after the &, as in &'a str.

With lifetime parameters, programmers tell the compiler how long references are valid. As mentioned earlier, Rust does not expect programmers to figure out these durations manually; it only expects them to state how long a returned reference is valid in relation to the input parameters. Let's take a look at the following example.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

There are a few things to note in the above code. First, <'a> uses a syntax similar to the one we saw for generics in More on Struct and Traits. It declares 'a as a generic lifetime parameter, i.e., it stands for some lifetime rather than one particular lifetime. We then use 'a for both input parameters and the return type to say that they all share the same lifetime. In other words, the annotation only describes the relationship between the input parameters' lifetimes and the return value's lifetime; it does not specify any concrete lifetime. Since the return value is either x or y, telling the Rust compiler that the input parameters and the return value have the same lifetime is exactly what we want to express.

When this function is called from another function, Rust can see that the returned reference is valid only as long as both input arguments are valid (in effect, for the shorter of their two lifetimes). Using this information, Rust can check whether a referent outlives its references.
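A caller of longest() might look like the following sketch (the variable names are our own). The call compiles because result is used only inside the inner block, while both strings are still alive.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let outer = String::from("a fairly long string");
    {
        let inner = String::from("short");
        // Here `'a` is limited by the shorter-lived `inner`, and `result`
        // is used only while both strings are still alive.
        let result = longest(outer.as_str(), inner.as_str());
        println!("The longest string is: {}", result);
    } // `inner` is dropped here.
}
```

If result were instead declared in the outer scope and used after the inner block ends, the compiler would reject the program, reporting that inner does not live long enough, even though result happens to point to outer at run time. The compiler reasons only from the signature, not from which branch actually runs.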

'static

There is a special lifetime called the static lifetime, 'static, meaning that the reference can live "during the entire duration of the program". The most common example is a string literal, which is embedded in the program's binary. Since it is always accessible, a string literal always has the 'static lifetime.

fn string_literal() -> &'static str {
    let literal = "a string literal"; // This string is embedded in the program binary.
    literal
}

The Rust compiler's error messages often suggest using 'static, and adding it is indeed an easy way to get your code to compile. However, 'static is often not what you should use. Thus, think carefully about why 'static is appropriate for your case before using it.

The Implication of Lifetimes in Functions

As the above example shows, when you return a reference, you have to tell the Rust compiler how the lifetime of the returned reference is related to the lifetimes of the input parameters. This has an interesting implication: a function can return a reference only if it refers to data that comes from the input arguments (or to data with the 'static lifetime, such as a string literal). For example, suppose you have a custom struct and you create an instance of it within a function. You cannot return a reference to that newly created instance.

There are actually two reasons why you cannot return a reference to a newly created instance of a struct. One is that you cannot express the lifetime of such a reference in terms of the input parameters, and the other is that the instance is dropped at the end of the function, so the reference would dangle.

When you need to create a new object and give it to the caller, return the object itself instead of a reference to it, either directly by value or inside a type that owns its data, such as Box, Vec, or String. Returning an owned value transfers ownership to the caller.

fn box_creation_and_return() -> Box<i32> {
    let a: i32 = 1;
    Box::new(a) // Ownership of the new Box moves to the caller.
}

When Do We Need Lifetime Parameters?

If you read the Rust book or the above description of lifetimes, you may get the impression that lifetimes are optional. This is not true: every reference has a lifetime. The reason you don't see lifetimes all the time is that the Rust compiler can often infer them for you. In some places, however, you must always write them explicitly. For example, if a struct or enum contains a reference, its definition needs a lifetime parameter.

struct ProblematicStruct {
    str_slice: &str,
}

The above does not work because it has a reference as a field and there is no lifetime. The following fixes it.

struct ProblematicStruct<'a> {
    str_slice: &'a str,
}
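A struct with a lifetime parameter is used like any other struct; the compiler then checks that an instance never outlives the data its field borrows. A minimal sketch, where the Excerpt struct and first_sentence() helper are hypothetical:

```rust
// The struct borrows a string slice, so it carries a lifetime parameter.
struct Excerpt<'a> {
    part: &'a str,
}

// Returns an `Excerpt` borrowing the text before the first '.', or all of
// `text` if there is no '.'. The result cannot outlive `text`.
fn first_sentence<'a>(text: &'a str) -> Excerpt<'a> {
    Excerpt {
        part: text.split('.').next().unwrap_or(text),
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let excerpt = first_sentence(&novel);
    // `excerpt` borrows from `novel`, so `novel` must outlive `excerpt`;
    // dropping `novel` while `excerpt` is still in use would not compile.
    println!("{}", excerpt.part);
}
```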

As mentioned earlier, <'a> means that you are using a generic lifetime parameter ('a) in the definition of the struct. You can then use it for the reference field. For function signatures, however, Rust is quite often able to work out the lifetimes for you. This is called lifetime elision.

The Rust compiler uses a simple algorithm (the lifetime elision rules) to determine what lifetimes should be. If the algorithm cannot determine the lifetimes, the compiler reports an error, and you need to annotate the lifetimes manually. The algorithm works as follows.

  • First, the algorithm assigns a separate lifetime parameter to each reference among the input parameters. For example, if the function is fn ex_func(x: &str, y: &str) -> &str {...}, then the algorithm assigns 'a to x and 'b to y like this: fn ex_func(x: &'a str, y: &'b str) -> &str {...}.
  • Second, if there is exactly one input lifetime (e.g., a single reference parameter), then that lifetime is assigned to all output references. For example, if the function is fn ex_func(x: &str) -> &str {...}, then the algorithm assigns 'a to both the input parameter and the returned reference like this: fn ex_func(x: &'a str) -> &'a str {...}.
  • Third, if there are multiple input lifetimes but one of them belongs to &self or &mut self (i.e., the function is a method), then the lifetime of self is assigned to all output references.
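The rules can be seen in a small sketch (the function and type names are hypothetical). first_word() and title_if_ends_with() compile without any lifetime annotations because rule 2 or rule 3 fully determines the output lifetime, and first_word_explicit() spells out what rule 2 infers for first_word().

```rust
// Rule 2: exactly one input lifetime, so the output gets that lifetime.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// What the compiler effectively infers for `first_word` above.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

struct Document {
    title: String,
}

impl Document {
    // Rule 3: there are two input lifetimes (`&self` and `suffix`), but one
    // of them is `&self`, so the output gets the lifetime of `self`.
    fn title_if_ends_with(&self, suffix: &str) -> Option<&str> {
        if self.title.ends_with(suffix) {
            Some(self.title.as_str())
        } else {
            None
        }
    }
}

fn main() {
    let doc = Document { title: String::from("notes.txt") };
    println!("{}", first_word("hello lifetime elision"));
    println!("{}", first_word_explicit("explicit version"));
    println!("{:?}", doc.title_if_ends_with(".txt"));
}
```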

In the above longest() example, since there are two input parameters, the algorithm assigns 'a to the first parameter and 'b to the second parameter. The problem is that there is nothing else the algorithm can do. The second rule doesn't apply because there is more than one input lifetime, and the third rule doesn't apply either because there is no self reference. Thus, the algorithm fails to determine the output lifetime, and the compiler asks for manual annotation.

?, dyn, and Macros

There are a few last things to highlight. We want to look at these because they are frequently used in the standard library and external crates.

?

? is a convenient operator that you might frequently use. It is a shortcut for handling Result and Option.

Recall from Safety Features of Rust that you typically use a Result to handle successful return values and errors together. A Result wraps either a return value in Ok() or an error in Err(), and you can use match to get the return value or the error.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn function_that_handles_result() -> Result<(), String> {
    let result = function_with_result(true);

    match result {
        Ok(success_result) => {
            println!("success_result: {}", success_result);
            Ok(())
        },
        Err(error_result) => Err(error_result)
    }
}

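fn main() {
    // A `Result` should not be silently ignored; report the error case.
    if let Err(error) = function_that_handles_result() {
        println!("error: {}", error);
    }
}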

This works well, but it can be repetitive and tiring, since you typically make many function calls that each return a Result. Thus, Rust provides ?, an operator that makes handling a Result easy. If the Result is Ok(), ? evaluates to the value inside Ok(); if the Result is Err(), ? immediately returns that Err() from the enclosing function (converting the error type with From if needed). Using ?, the above code can be revised as follows.

fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn function_that_handles_result() -> Result<(), String> {
    let success_result = function_with_result(true)?;
    println!("success_result: {}", success_result);
    Ok(())
}

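fn main() {
    // A `Result` should not be silently ignored; report the error case.
    if let Err(error) = function_that_handles_result() {
        println!("error: {}", error);
    }
}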

? can also be used on an Option inside a function that returns Option. If the value is None, ? exits the function early and returns None. If the value is Some(), ? evaluates to the value inside Some().
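For example, the following sketch (sum_first_two() is a hypothetical helper) uses ? twice on an Option; whichever lookup fails first ends the function with None.

```rust
// Adds the first two elements of a slice, if both exist.
fn sum_first_two(values: &[i32]) -> Option<i32> {
    let first = values.first()?; // Returns `None` early if the slice is empty.
    let second = values.get(1)?; // Returns `None` early if there is no second element.
    Some(first + second)
}

fn main() {
    println!("{:?}", sum_first_two(&[1, 2, 3])); // Some(3)
    println!("{:?}", sum_first_two(&[1]));       // None
}
```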

dyn

dyn is a keyword used to write a trait object type. A trait object refers to a value of any type that implements the trait, where the concrete type is known only at run time. You can see it used in error propagation, as in the following example from the Rust book.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let f = std::fs::File::open("hello.txt")?;

    Ok(())
}

Box<dyn std::error::Error> means a Box that contains a value of any type that implements the Error trait.

You may recall that in More on Struct and Traits, we talked about generic type parameters with trait bounds. The purpose is very similar, i.e., both represent any type that implements certain traits. However, dyn differs in two ways.

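  • dyn can take only one trait, apart from auto traits such as Send and Sync, which can be added (e.g., dyn Error + Send + Sync). (There are some further nuances here; see the Rust Reference on traits for details.)
  • dyn is dynamic, while a generic type parameter with a trait bound is static. For a generic type with a trait bound, the compiler generates a separate copy of the code for each concrete type that is actually used (this is called monomorphization). With dyn, the compiler instead generates code that looks up the right implementation at run time through a table of function pointers (a vtable); this is called dynamic dispatch.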
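The difference can be seen in a small sketch with a hypothetical Shape trait. area_static() is monomorphized for each concrete type, while area_dyn() exists once and dispatches through the vtable. A Vec of Box<dyn Shape> can also mix different concrete types, which a Vec<T> with a single generic T cannot.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// Static dispatch: the compiler generates a separate copy of this function
// for each concrete type `T` it is called with.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: a single copy of this function; the right `area()`
// is looked up at run time through the trait object's vtable.
fn area_dyn(shape: &dyn Shape) -> f64 {
    shape.area()
}

fn main() {
    // Trait objects let one collection hold different concrete types.
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Square(2.0)),
        Box::new(Rect { w: 1.0, h: 3.0 }),
    ];
    // Each `area()` call here is dynamically dispatched.
    let total: f64 = shapes.iter().map(|shape| shape.area()).sum();
    println!("total area: {}", total);

    println!("via area_static: {}", area_static(&Square(2.0)));
    println!("via area_dyn: {}", area_dyn(&Rect { w: 1.0, h: 3.0 }));
}
```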

Macros

Similar to C/C++, Rust provides macros. The first example you typically see is println!(). ! indicates that it is a macro. A good resource to learn about macros is (again) the Rust book. It has a section on macros.
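Unlike C/C++ preprocessor macros, which substitute text, Rust's declarative macros (defined with macro_rules!) match their input against patterns of Rust syntax fragments, such as expressions, and expand into Rust code. As a small illustration, here is a hypothetical max_of! macro; it is not part of the standard library.

```rust
// Returns the largest of one or more expressions.
macro_rules! max_of {
    // Base case: a single expression is its own maximum.
    ($x:expr) => {
        $x
    };
    // Recursive case: compare the first expression with the maximum of the rest.
    ($x:expr, $($rest:expr),+) => {{
        let first = $x;
        let rest = max_of!($($rest),+);
        if first > rest { first } else { rest }
    }};
}

fn main() {
    println!("{}", max_of!(3));       // 3
    println!("{}", max_of!(1, 5, 2)); // 5
}
```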

Useful Crates and Tools

Rust is not just a language; it is an ecosystem for development with various development tools and crates. cargo is one such example, but there are many more tools as well as crates. Here we highlight some of the popular ones.

Crates

A crate refers to a binary or a library in Rust. You may be familiar with it already since cargo creates a binary or library crate in a package. There are many crates out there that are very useful for development.

crates.io & lib.rs

crates.io is the place to find external crates that you can use in your code. Documentation for crates published on crates.io is built automatically and hosted on Docs.rs. lib.rs is an alternative index of Rust crates that provides similar functionality.

serde

serde is the de facto standard serialization library for Rust. It allows you to transform a data structure into a different format (e.g., JSON) and later restore it into the same data structure. serde is highly popular and convenient, and many Rust programs rely on it.

Tokio

Tokio provides an asynchronous runtime for Rust. Asynchrony here means that a call returns right away, without waiting for the operation to finish, so the program can do other work in the meantime. For workloads that spend much of their time waiting, such as network I/O, this can give much better performance than synchronous counterparts. You could also use Async instead.

Logging Crates

Logging is essential for most kinds of development, and Rust has good support for it. The most common crate to use is the log crate, which provides a logging interface. Since the crate only provides an interface, you need another crate that provides an implementation, e.g., env_logger.

Command-Line Argument Parsing Crates

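Argument parsing is often necessary for programs, and Rust has convenient external crates for it. clap is one such crate and is easy to use. structopt is another popular one, although its derive-based approach has since been integrated into clap (version 3 and later), and structopt itself is now in maintenance mode.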

Tools

Clippy

The first tool to highlight is Clippy, a linter that checks your code and catches common mistakes and stylistic issues. It is widely used; you can install it with rustup (rustup component add clippy) and run it with cargo clippy. To show how useful it is, here are some examples of what Clippy can catch.

  • Example 1

    Catches:

    fn func(opt: Option<Result<u64, String>>) {
        let n = match opt {
            Some(n) => match n {
                Ok(n) => n,
                _ => return,
            }
            None => return,
        };
    }

    Suggests:

    fn func(opt: Option<Result<u64, String>>) {
        let n = match opt {
            Some(Ok(n)) => n,
            _ => return,
        };
    }

  • Example 2

    Catches:

    for i in iter {
        if let Some(value) = i.parse().ok() {
            vec.push(value)
        }
    }

    Suggests:

    for i in iter {
        if let Ok(value) = i.parse() {
            vec.push(value)
        }
    }

  • Example 3

    Catches:

    fn bar(stool: &str) {}
    let x = Some("abc");

    match x {
        Some(ref foo) => bar(foo),
        _ => (),
    }

    Suggests:

    fn bar(stool: &str) {}
    let x = Some("abc");

    if let Some(ref foo) = x {
        bar(foo);
    }

rustfmt

rustfmt is the default code formatter for Rust, and you can run it on a whole project with cargo fmt. If you are not using it already, you should. It is best to integrate it with your editor, e.g., through a Rust plugin for your editor, typically combined with rust-analyzer (which has replaced the now-deprecated rls).

Miri

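Miri is an interpreter for Rust's mid-level intermediate representation (MIR) that can detect certain kinds of undefined behavior in your code, especially in its unsafe parts, such as out-of-bounds memory accesses, use-after-free, and violations of Rust's aliasing rules. You need a nightly version of Rust in order to use Miri.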

cargo test

Rust has great support for testing via cargo test. You can find the details from the Rust book and the Cargo book.
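Unit tests typically live next to the code they test, in a module annotated with #[cfg(test)] so that it is compiled only for testing; cargo test then runs every function marked with #[test]. A minimal sketch, with a hypothetical add() function:

```rust
// The function under test.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    println!("2 + 3 = {}", add(2, 3));
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*; // Brings `add` into scope.

    #[test]
    fn adds_two_numbers() {
        assert_eq!(add(2, 3), 5);
    }

    #[test]
    fn adding_zero_changes_nothing() {
        assert_eq!(add(7, 0), 7);
    }
}
```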

Idioms and Design Patterns

In any language, learning idioms and design patterns is important for writing readable and maintainable code. Rust is no exception, and there are good resources on the topic. For example, Rust by Example, Rust Design Patterns, Idiomatic Rust, and the Rust book's chapter on implementing an object-oriented design pattern are all excellent resources. Rust by Example and Rust Design Patterns are especially useful. Here we highlight some idioms that are useful for beginners.

Use Expressions

Source: https://cheats.rs/#idiomatic-rust

In Rust, if, match, loop, and blocks are all expressions that evaluate to a value. (for and while are expressions too, but they always evaluate to the unit value ().)

Don't write:

fn main() {
    let condition = true;
    let assignment;

    if condition {
        assignment = true;
    } else {
        assignment = false;
    }
}

Do write:

fn main() {
    let condition = true;
    let assignment = if condition {
        true
    } else {
        false
    };
}
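The same idea applies to match and loop. In the sketch below (the function names are hypothetical), each function body is a single expression whose value is returned; a loop produces its value through break.

```rust
// `match` is an expression, so its result can be returned directly.
fn parity(n: i32) -> &'static str {
    match n % 2 {
        0 => "even",
        _ => "odd",
    }
}

// `loop` is an expression too: `break value` makes the loop evaluate to `value`.
// (Assumes `n` is small enough that doubling `candidate` does not overflow.)
fn next_power_of_two_above(n: u32) -> u32 {
    let mut candidate = 1;
    loop {
        if candidate > n {
            break candidate;
        }
        candidate *= 2;
    }
}

fn main() {
    let n = 7;
    println!("{} is {}", n, parity(n)); // 7 is odd
    println!("{}", next_power_of_two_above(7)); // 8
}
```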

Use Pattern Matching and Destructuring

Sources:
https://doc.rust-lang.org/book/ch18-01-all-the-places-for-patterns.html

Rust's pattern matching is powerful and it can destructure complex data types.

Slice Destructuring

fn main() {
    let array = [0, 1, 2];
    let [a, b, c] = array;

    println!("{}, {}, {}", a, b, c);
}

Subslice Destructuring

fn slice_pattern(array: &[i32]) {
    match array {
        [] => println!("Empty array"),
        [x0] => println!("One element: {}", x0),
        [x0, y @ .., xn] => println!("First: {}, Last: {}, Middle: {:?}", x0, xn, y),
    }
}

fn main() {
    slice_pattern(&[]);
    slice_pattern(&[0]);
    slice_pattern(&[0, 1, 2, 3, 4]);
}

struct, enum, Tuple Destructuring

struct ExStruct<'a> {
    field1: i32,
    field2: &'a str,
}

enum ExEnum<'a> {
    Variant(ExStruct<'a>),
}

fn main() {
    // Tuple destructuring
    let ex_tuple = (1, 2, 3);
    let (x, y, z) = ex_tuple;

    println!("x: {}, y: {}, z: {}", x, y, z);

    // `struct` destructuring
    let ex_struct = ExStruct {
        field1: 1,
        field2: "string",
    };

    let ExStruct { field1: i, field2: s } = ex_struct;
    let ExStruct { field1, field2 } = ex_struct;

    println!("field1: {}, field2: {}", i, s);
    println!("field1: {}, field2: {}", field1, field2);

    // Nested `enum` & `struct` destructuring
    let ex_enum = ExEnum::Variant(ex_struct);

    let ExEnum::Variant(ExStruct { field1, field2 }) = ex_enum;
    println!("field1: {}, field2: {}", field1, field2);
}

Reference Destructuring

fn main() {
    let mut x = 32;

    match x {
        ref x_ref => println!("Reference: {:?}", x_ref),
    }

    match x {
        ref mut x_mut_ref => println!("Mutable reference: {:?}", x_mut_ref),
    }
}

Use Iterators

Source: https://cheats.rs/#idiomatic-rust

Use iterators whenever possible.

Don't write:

fn main() {
    let a = [0, 1, 2, 3, 4];
    let mut i = 0;

    while i < a.len() {
        println!("{}", a[i]);
        i += 1;
    }
}

Do write:

fn main() {
    let a = [0, 1, 2, 3, 4];

    for e in a {
        println!("{}", e);
    }
}
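Iterator adapters go further than a plain for loop: filtering, transforming, and combining elements can be chained without any index bookkeeping. A short sketch with a hypothetical sum_of_even_squares() function:

```rust
// Sum of the squares of the even numbers, written as an iterator chain
// instead of an index-based loop.
fn sum_of_even_squares(values: &[i32]) -> i32 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // Keep only even numbers.
        .map(|&v| v * v)          // Square each one.
        .sum()                    // Add them up.
}

fn main() {
    let a = [0, 1, 2, 3, 4];
    println!("{}", sum_of_even_squares(&a)); // 0 + 4 + 16 = 20

    // When you also need the index, use `enumerate()` rather than a counter.
    for (i, e) in a.iter().enumerate() {
        println!("a[{}] = {}", i, e);
    }
}
```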

Avoid Declaring First

Source: https://doc.rust-lang.org/rust-by-example/variable_bindings/declare.html

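You can declare a variable first and initialize it later. However, it is better to avoid this form: it separates a variable from its value and may lead to attempts to use the variable before it is initialized (which the compiler rejects with an error). The examples below are adapted from the above source.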

Don't write:

fn main() {
    let a_binding;
    let x = 2;

    // Initialize the binding
    a_binding = x * x;
}

Do write:

fn main() {
    let x = 2;
    let a_binding = x * x;
}

Think about clone() Before Using It

Source: https://rust-unofficial.github.io/patterns/anti_patterns/borrow_clone.html

It is easy to satisfy the compiler by calling clone(). However, it is important to understand what this means before doing it. clone() creates a separate copy of the value, which means that making a change to the clone does not change the original. Cloning also has a cost, which may or may not be cheap; for example, cloning a large Vec or String allocates memory and copies all of its contents. Thus, think it through before using clone(). This does not mean that you should never use clone(); it means that you should use it deliberately.
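A small sketch of both points (the total() helper is hypothetical): the clone is independent of the original, and when you only need to read the data, borrowing avoids the copy entirely.

```rust
// Borrowing reads the data without copying it.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let original = vec![1, 2, 3];

    // `clone()` allocates a new Vec and copies every element into it.
    let mut cloned = original.clone();
    cloned.push(4); // Changes only the clone.

    println!("original: {:?}", original); // original: [1, 2, 3]
    println!("cloned: {:?}", cloned);     // cloned: [1, 2, 3, 4]

    // If you only need to read the data, borrow it instead of cloning it.
    println!("total: {}", total(&original)); // total: 6
}
```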

Use the Debug Formatter

Most of Rust's standard library types, including collections such as Vec, are printable using the debug formatter {:?} or the "pretty printing" formatter {:#?}.

fn main() {
    let vec = vec![0, 1, 2, 3];

    println!("Debug: {:?}", vec);
    println!("Pretty print: {:#?}", vec);
}

In contrast, the default formatter {} does not work here: Vec does not implement the Display trait, so the following code fails to compile.

fn main() {
    let vec = vec![0, 1, 2, 3];

    println!("{}", vec);
}

Use #[derive()]

Source: https://doc.rust-lang.org/rust-by-example/trait/derive.html

Rust has a feature called attributes, which you write as #[...]. There are many useful attributes, and one of them is derive. It automatically generates implementations of traits that provide a derive macro.

For example, the following code automatically implements the Debug trait, so you can use the debug formatter.

#[derive(Debug)]
struct Ex {
    field1: i32,
    field2: i64,
}

fn main() {
    let ex = Ex { field1: 32, field2: 64 };
    println!("{:?}", ex);
}

There are other useful traits to derive.

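  • Clone provides clone().
  • Debug makes your custom type (e.g., struct or enum) printable with the debug formatter.
  • Default creates a default instance of your custom type, with each field set to its own default value (e.g., 0 for integers and an empty String for String); for an enum, you mark the default variant with #[default].
  • Hash lets values of your type be hashed, e.g., so that they can be used as keys in a HashMap (together with PartialEq and Eq).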
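Several traits can be derived at once. The following sketch uses a hypothetical Config type to exercise each derived trait from the list above; PartialEq and Eq are derived as well because HashMap keys need them alongside Hash.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
struct Config {
    name: String,
    retries: u32,
}

fn main() {
    // `Default`: every field gets its own default (`String::new()` and `0`).
    let base = Config::default();

    // `Clone`: an independent copy that we can change without affecting `base`.
    let mut custom = base.clone();
    custom.name = String::from("custom");
    custom.retries = 3;

    // `Hash` + `Eq`: `Config` values can be used as `HashMap` keys.
    let mut seen: HashMap<Config, u32> = HashMap::new();
    *seen.entry(custom.clone()).or_insert(0) += 1;
    *seen.entry(custom.clone()).or_insert(0) += 1;

    // `Debug`: printable with `{:?}`.
    println!("{:?}", base);
    println!("{:?} seen {} times", custom, seen[&custom]);
}
```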

Implement the Display Trait

Source: https://doc.rust-lang.org/rust-by-example/hello/print/print_display.html

Deriving Debug is convenient, but you often want to customize how your custom type is printed. Implementing the Display trait gives you that control and also lets you print the type with the default formatter {}.

struct Ex {
    field1: i32,
    field2: i32,
}

impl std::fmt::Display for Ex {
    // You need to implement the following function.
    // You can use `write!()`, which is similar to
    // `println!()` but writes to a `Formatter`.
    // `write!()` returns the right type for `fmt()`.
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "field1: {}, field2: {}", self.field1, self.field2)
    }
}

fn main() {
    let ex = Ex { field1: 32, field2: 64 };
    println!("{}", ex);
}
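One practical consequence: the standard library provides a blanket ToString implementation for every type that implements Display, so implementing Display also gives your type a to_string() method and lets it be embedded in format!() strings. The sketch below reuses the Ex struct from the example above.

```rust
use std::fmt;

struct Ex {
    field1: i32,
    field2: i32,
}

impl fmt::Display for Ex {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "field1: {}, field2: {}", self.field1, self.field2)
    }
}

fn main() {
    let ex = Ex { field1: 32, field2: 64 };

    // to_string() comes from the standard library's blanket ToString
    // implementation for all Display types.
    let s = ex.to_string();
    assert_eq!(s, "field1: 32, field2: 64");

    // Display output can also be embedded in other formatted strings.
    let wrapped = format!("[{}]", ex);
    println!("{}", wrapped); // prints "[field1: 32, field2: 64]"
}
```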

Implement new()

Source: https://rust-unofficial.github.io/patterns/idioms/ctor.html?highlight=construc#constructors

Rust has no special constructor syntax; by convention, a new instance is created with an associated function called new(). When you define your own type, implement a new() function that creates a new instance.

#![allow(unused)]
fn main() {
struct Ex {
    field1: i32,
    field2: i32,
}

impl Ex {
    fn new() -> Self {
        Ex {
            field1: 0,
            field2: 0,
        }
    }
}
}
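Because new() is an ordinary associated function, it can also take arguments. The variant below is a sketch of that; the parameter names simply mirror the fields, which lets the body use Rust's field init shorthand.

```rust
struct Ex {
    field1: i32,
    field2: i32,
}

impl Ex {
    // An ordinary associated function; the name `new` is only a convention.
    fn new(field1: i32, field2: i32) -> Self {
        // Field init shorthand: `field1` is short for `field1: field1`.
        Self { field1, field2 }
    }
}

fn main() {
    // Associated functions are called with `Type::function` syntax.
    let ex = Ex::new(32, 64);
    println!("field1: {}, field2: {}", ex.field1, ex.field2);
}
```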

Implement the Default Trait Along with new()

Sources:
https://rust-lang.github.io/rust-clippy/master/#new_without_default
https://rust-unofficial.github.io/patterns/idioms/default.html

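Users of your type may expect it to implement Default, for example so that it works with Option::unwrap_or_default() or as a field of another struct that derives Default. Clippy's new_without_default lint flags public types that have a new() taking no arguments but no Default implementation. Implementing Default by calling new() is also convenient for you.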

#![allow(unused)]
fn main() {
struct Ex {
    field1: i32,
    field2: i32,
}

impl Ex {
    fn new() -> Self {
        Ex {
            field1: 0,
            field2: 0,
        }
    }
}

impl std::default::Default for Ex {
    fn default() -> Self {
        Ex::new()
    }
}
}
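When the default value of every field is what you want, you can instead derive Default and have new() delegate to it, as sketched below. Implementing Default also enables struct update syntax (..Default::default()), which fills in every field you do not set explicitly.

```rust
// Deriving Default sets field1 and field2 to 0, the default for i32.
#[derive(Debug, Default)]
struct Ex {
    field1: i32,
    field2: i32,
}

impl Ex {
    fn new() -> Self {
        // Delegate to the derived implementation.
        Self::default()
    }
}

fn main() {
    let a = Ex::new();
    // Struct update syntax: set field1, take every other field from Default.
    let b = Ex { field1: 5, ..Default::default() };
    println!("{:?}", a); // prints "Ex { field1: 0, field2: 0 }"
    println!("{:?}", b); // prints "Ex { field1: 5, field2: 0 }"
}
```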

match De-Nesting

You often end up with deeply nested code, especially when match is involved, which reduces readability. There are a few things you can try for de-nesting.

Use if let

Source: https://doc.rust-lang.org/rust-by-example/flow_control/if_let.html

Sometimes if let can make your match more readable and concise. The source above provides a good example.

Don't write:

fn main() {
    let optional = Some(7);

    match optional {
        Some(i) => {
            println!("This is a really long string and `{:?}`", i);
            // ^ Needed 2 indentations just so we could destructure
            // `i` from the option.
        },
        _ => {},
        // ^ Required because `match` is exhaustive. Doesn't it seem
        // like wasted space?
    };
}

Do write:

fn main() {
    let optional = Some(7);

    if let Some(i) = optional {
        println!("This is a really long string and `{:?}`", i);
    }
}

Use while let

Source: https://doc.rust-lang.org/rust-by-example/flow_control/while_let.html

Similar to if let, while let can make your match more readable and concise. The source above provides a good example.

Don't write:

fn main() {
    let mut optional = Some(0);

    // Repeatedly try this test.
    loop {
        match optional {
            // If `optional` destructures, evaluate the block.
            Some(i) => {
                if i > 9 {
                    println!("Greater than 9, quit!");
                    optional = None;
                } else {
                    println!("`i` is `{:?}`. Try again.", i);
                    optional = Some(i + 1);
                }
                // ^ Requires 3 indentations!
            },
            // Quit the loop when the destructure fails:
            _ => { break; }
            // ^ Why should this be required? There must be a better way!
        }
    }
}

Do write:

fn main() {
    let mut optional = Some(0);

    // Repeatedly try this test.
    while let Some(i) = optional {
        if i > 9 {
            println!("Greater than 9, quit!");
            optional = None;
        } else {
            println!("`i` is `{:?}`. Try again.", i);
            optional = Some(i + 1);
        }
    }
}

Use Tuple Matching

Source: https://github.com/ferrous-systems/elements-of-rust#tuple-matching

If you need to match on multiple variables, you can group them as a tuple and flatten the nested match expressions.

Don't write:

fn main() {
    let first_match: Option<i32> = Some(1);
    let second_match: Option<i32> = None;

    match first_match {
        Some(_) => {
            match second_match {
                None => println!("None found"),
                _ => (),
            }
        },
        _ => ()
    }
}

Do write:

fn main() {
    let first_match: Option<i32> = Some(1);
    let second_match: Option<i32> = None;

    match (first_match, second_match) {
        (Some(_), None) => println!("None found"),
        _ => (),
    }
}

Use match Guards

Source: https://doc.rust-lang.org/rust-by-example/flow_control/match/guard.html

match guards are additional conditions you can use to filter a match arm: an arm is selected only if its pattern matches and its guard evaluates to true. The following examples are adapted from the above source.

fn main() {
    let pair = (2, -2);

    println!("Tell me about {:?}", pair);
    match pair {
        (x, y) => {
            if x == y {
                println!("These are twins");
            } else if x + y == 0 {
                println!("Antimatter, kaboom!");
            } else if x % 2 == 1 {
                println!("The first one is odd");
            } else {
                println!("No correlation...");
            }
        },
    }
}

The above can be revised as follows.

fn main() {
    let pair = (2, -2);

    println!("Tell me about {:?}", pair);
    match pair {
        (x, y) if x == y => println!("These are twins"),
        // The ^ `if condition` part is a guard
        (x, y) if x + y == 0 => println!("Antimatter, kaboom!"),
        (x, _) if x % 2 == 1 => println!("The first one is odd"),
        _ => println!("No correlation..."),
    }
}

Use match Binding

Source: https://doc.rust-lang.org/rust-by-example/flow_control/match/binding.html

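You can bind a matching value to a variable with the @ operator, which tests a value against a pattern (such as a literal or a range) and binds it at the same time. The above source has good examples.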

Don't write:

fn main() {
    let num = Some(42);

    match num {
        Some(n) => {
            if n == 42 {
                println!("The Answer: {}!", n);
            } else {
                println!("Not interesting... {}", n);
            }
        },
        _            => (),
    }
}

Do write:

fn main() {
    let num = Some(42);

    match num {
        Some(n @ 42) => println!("The Answer: {}!", n),
        Some(n)      => println!("Not interesting... {}", n),
        _            => (),
    }
}

This is also shorter than the equivalent version that uses a match guard:

fn main() {
    let num = Some(42);

    match num {
        // Got `Some` variant, match if its value, bound to `n`,
        // is equal to 42.
        Some(n) if n == 42 => println!("The Answer: {}!", n),
        // Match any other number.
        Some(n)      => println!("Not interesting... {}", n),
        // Match anything else (`None` variant).
        _            => (),
    }
}

Don't write:

fn age() -> u32 {
    15
}

fn main() {
    let age = age();
    match age {
        0         => println!("I haven't celebrated my first birthday yet"),
        1  ..= 12 => {
            println!("I'm a child of age {:?}", age);
        },
        13 ..= 19 => {
            println!("I'm a teen of age {:?}", age);
        },
        // Nothing bound. Return the result.
        _         => println!("I'm an old person of age {:?}", age),
    }
}

Do write:

fn age() -> u32 {
    15
}

fn main() {
    match age() {
        0             => println!("I haven't celebrated my first birthday yet"),
        n @ 1  ..= 12 => println!("I'm a child of age {:?}", n),
        n @ 13 ..= 19 => println!("I'm a teen of age {:?}", n),
        // Nothing bound. Return the result.
        n             => println!("I'm an old person of age {:?}", n),
    }
}

Processing Collections of Items Using Functional Language Features

Sources
https://doc.rust-lang.org/book/ch13-00-functional-features.html
https://rust-unofficial.github.io/patterns/functional/index.html

Rust provides many functional language features, and one of the most common uses is processing a collection of items through an iterator. Some example iterator methods are fold(), filter(), map(), and reduce().

The following uses an imperative programming style to calculate a sum.

fn main() {
    let mut sum = 0;
    for i in 1..11 {
        sum += i;
    }
    println!("{}", sum);
}

We can accomplish the same task using fold(). The method signature of fold() looks like the following.

// Declared in the std::iter::Iterator trait (simplified; body omitted).
fn fold<B, F>(self, init: B, f: F) -> B
where
    F: FnMut(B, Self::Item) -> B,

The first parameter after self, init, is the initial value. The second parameter, f, takes a closure, which is an anonymous (lambda) function in Rust. The syntax for a closure is |param1, param2, ...| { function body }. If the body is a single expression, you can omit {}, i.e., |param1, param2, ...| expression. For fold(), the closure takes two parameters, the value accumulated so far and the current item, e.g., it.fold(init, |acc, x| { /* function body */ });.

This means that fold() is called on an iterator and takes two arguments: the initial value and a closure. fold() first calls the closure with the initial value and the first item of the iterator. It then iterates: each time, it calls the closure with the result of the previous call and the next item. It returns the result of the final call (or the initial value if the iterator is empty). Thus, the following code calculates the sum of 1 through 10.

fn main() {
    println!("{}", (1..11).fold(0, |a, b| a + b));
}
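Two further properties of fold() follow from the description above: the accumulator type B does not have to match the item type, and an empty iterator returns init without ever calling the closure. The sketch below shows both; join_numbers is a hypothetical helper. For a plain sum, the standard Iterator::sum() method expresses the intent more directly.

```rust
/// Hypothetical helper: joins 1..=n into a comma-separated String.
/// The items are i32, but the accumulator is a String.
fn join_numbers(n: i32) -> String {
    (1..=n).fold(String::new(), |mut acc, x| {
        if !acc.is_empty() {
            acc.push(',');
        }
        acc.push_str(&x.to_string());
        acc
    })
}

fn main() {
    println!("{}", join_numbers(3)); // prints "1,2,3"

    // The range is empty, so the closure is never called and init (0) is returned.
    println!("{}", (1..1).fold(0, |a, b| a + b)); // prints "0"

    // sum() computes the same total as the fold() example above.
    let total: i32 = (1..11).sum();
    println!("{}", total); // prints "55"
}
```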

Use Method Chaining

A common pattern in Rust programs is method chaining. You often see it with iterator methods such as map() and filter() when a series of transformations needs to be applied to a collection.

fn main() {
    println!(
        "{}",
        [0, 1, 2, 3, 4]
            .iter()
            .map(|x| x * x)
            .filter(|x| *x > 5)
            .fold(0, |a, b| a + b)
    );
}

Here is another example that executes a command-line program.

fn main() {
    std::process::Command::new("sh")
            .arg("-c")
            .arg("echo hello")
            .output()
            .expect("failed to execute process");
}

A common way to support method chaining in your own code is to have your methods return self (or &mut Self, as Command's arg() does in the example above).
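Below is a minimal sketch of the by-value style, assuming a hypothetical Config type. Each setter takes self by value, modifies it, and returns it, so calls can be chained starting from new(). The field names and values are made up for illustration.

```rust
/// A hypothetical configuration type used only to illustrate chaining.
#[derive(Debug, Default)]
struct Config {
    name: String,
    verbose: bool,
    retries: u32,
}

impl Config {
    fn new(name: &str) -> Self {
        Self { name: name.to_string(), ..Default::default() }
    }

    // Taking `self` by value and returning it lets callers chain calls.
    fn verbose(mut self, on: bool) -> Self {
        self.verbose = on;
        self
    }

    fn retries(mut self, n: u32) -> Self {
        self.retries = n;
        self
    }
}

fn main() {
    let config = Config::new("demo").verbose(true).retries(3);
    println!("{:?}", config);
}
```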

Error Handling

Sources:
https://blog.burntsushi.net/rust-error-handling/
https://doc.rust-lang.org/rust-by-example/error.html

Use ?

Source: https://cheats.rs/#idiomatic-rust

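Your code will be much more readable and concise with ?. When applied to a Result, ? evaluates to the value inside Ok; if the Result is an Err, it returns that error from the enclosing function early (converting it with From::from if the function's error type differs).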

Don't write:

#[allow(unused)]
fn main() -> std::io::Result<()> {
    let f = std::fs::File::open("hello.txt");

    let f = match f {
        Ok(file) => file,
        Err(e) => return Err(e),
    };

    Ok(())
}

Do write:

#[allow(unused)]
fn main() -> std::io::Result<()> {
    let f = std::fs::File::open("hello.txt")?;

    Ok(())
}
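The ? operator also works on Option inside a function that returns Option: a None returns early, and a Some is unwrapped. The helper below, second_word_initial, is hypothetical and exists only to show this.

```rust
/// Hypothetical helper: returns the first character of the second
/// whitespace-separated word in `s`, or None if there is no second word.
fn second_word_initial(s: &str) -> Option<char> {
    // nth(1) returns None when there is no second word; `?` then
    // returns None from this function immediately.
    let word = s.split_whitespace().nth(1)?;
    word.chars().next()
}

fn main() {
    println!("{:?}", second_word_initial("hello world")); // prints "Some('w')"
    println!("{:?}", second_word_initial("hello")); // prints "None"
}
```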

Error Trait Object

Source: https://doc.rust-lang.org/rust-by-example/error/multiple_error_types/boxing_errors.html

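Sometimes (but not always) it is useful to return the Error trait object, Box<dyn std::error::Error>, to propagate the original errors. This is convenient when a function can fail with several different error types, because ? can convert any of them into the boxed trait object. The trade-off is that callers can no longer match on a concrete error type directly; they have to downcast it.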

Do write:

#[allow(unused)]
fn error_propagation() -> Result<(), Box<dyn std::error::Error>> {
    let f = std::fs::File::open("hello.txt")?;

    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    error_propagation()
}
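The trait object pays off when one function can fail in more than one way. In the sketch below, read_number (a hypothetical helper) can fail with std::io::Error when reading the file or with std::num::ParseIntError when parsing its contents, and ? converts both into Box<dyn Error>. The file name number.txt is just an example. A caller that needs the concrete error type can recover it with downcast_ref().

```rust
use std::error::Error;
use std::fs;
use std::num::ParseIntError;

/// Hypothetical helper: reads a file and parses its contents as an i64.
fn read_number(path: &str) -> Result<i64, Box<dyn Error>> {
    // Fails with std::io::Error if the file cannot be read.
    let text = fs::read_to_string(path)?;
    // Fails with ParseIntError if the contents are not an integer.
    let n = text.trim().parse::<i64>()?;
    Ok(n)
}

fn main() {
    match read_number("number.txt") {
        Ok(n) => println!("read {}", n),
        // downcast_ref() checks whether the boxed error is a specific type.
        Err(e) if e.downcast_ref::<ParseIntError>().is_some() => {
            println!("not a number: {}", e)
        }
        Err(e) => println!("other error: {}", e),
    }
}
```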

Use type to Create a (Shorter) Result Alias

Source: https://doc.rust-lang.org/rust-by-example/error/result/result_alias.html

type allows you to create a type alias. You can use it to make your Result type more concise.

#![allow(unused)]
fn main() {
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
}
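With the alias in scope, function signatures only need to name the success type. The sketch below assumes the alias is defined at module level, and parse_pair is a hypothetical function. It also relies on the standard From conversion from &str to Box<dyn Error>, which lets ? turn a plain string message into an error.

```rust
use std::error::Error;

// Module-level alias; within this module, Result<T> means
// std::result::Result<T, Box<dyn Error>>.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Hypothetical function: parses "a,b" into a pair of integers.
fn parse_pair(s: &str) -> Result<(i32, i32)> {
    // ok_or() turns a missing comma into an Err(&str), which `?`
    // converts into Box<dyn Error>.
    let (a, b) = s.split_once(',').ok_or("missing comma")?;
    let a = a.trim().parse::<i32>()?;
    let b = b.trim().parse::<i32>()?;
    Ok((a, b))
}

fn main() {
    println!("{:?}", parse_pair("3, 4").ok()); // prints "Some((3, 4))"
    println!("{}", parse_pair("34").is_err()); // prints "true"
}
```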

Use map(), map_or(), and(), and_then(), unwrap(), unwrap_or(), Etc.

Sources:
https://doc.rust-lang.org/rust-by-example/error/result.html
https://doc.rust-lang.org/rust-by-example/error/option_unwrap.html

Both Result and Option provide many helper methods for processing them, which you can often use instead of match. For example, map() on an Option turns Some(x) into Some(f(x)) by applying the provided closure f, and leaves None as None.

The following is an example of using match to process Option.

fn main() {
    let opt = Some(1);

    let doubled = match opt {
        Some(i) => Some(i * 2),
        None => None,
    };
    println!("{:?}", doubled);
}

The same can be done with map().

fn main() {
    let opt = Some(1);

    let doubled = opt.map(|i| i * 2);
    println!("{:?}", doubled);
}

There are other convenient functions like map_or(), map_or_else(), and(), and_then(), unwrap(), unwrap_or(), etc. for both Result and Option.
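The sketch below exercises a few of these helpers on both Option and Result. halve is a hypothetical function that returns None for odd numbers, used to show how and_then() chains computations that may themselves fail. Note that unwrap() panics on None or Err, so this sketch uses unwrap_or() to supply a fallback instead.

```rust
/// Hypothetical helper: halves even numbers and rejects odd ones.
fn halve(x: i32) -> Option<i32> {
    if x % 2 == 0 { Some(x / 2) } else { None }
}

fn main() {
    let some: Option<i32> = Some(4);
    let none: Option<i32> = None;

    // map_or(): apply the closure to a Some value, or return the default for None.
    println!("{}", some.map_or(0, |x| x * 10)); // prints "40"
    println!("{}", none.map_or(0, |x| x * 10)); // prints "0"

    // unwrap_or(): take the value, or fall back to a default instead of panicking.
    println!("{}", none.unwrap_or(-1)); // prints "-1"

    // and_then(): chain a computation that can itself return None.
    println!("{:?}", some.and_then(halve)); // prints "Some(2)"
    println!("{:?}", some.and_then(halve).and_then(halve)); // prints "Some(1)"
    println!("{:?}", Some(3).and_then(halve)); // prints "None"

    // The same helpers exist on Result.
    let parsed = "42".parse::<i32>();
    println!("{}", parsed.map_or(0, |n| n + 1)); // prints "43"
}
```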

New Type Idiom

Source: https://doc.rust-lang.org/rust-by-example/generics/new_types.html

If you use the same type for different purposes, often it is better to create a new type for each purpose. The above source shows an example where i64 is used for both years and days to count an age. Since the same type i64 is used for two different purposes, you can create one new type for years and another new type for days as follows.

#![allow(unused)]
fn main() {
struct Years(i64);
struct Days(i64);
}

The benefit of doing this is that you can use the compiler's type checker to ensure that you are using i64 for the right purpose. If you make a call to a function that needs to use years instead of days, then you can use Years as the type instead of i64. This will make sure that you are passing the right value with the right type to the function that expects to use years, not days.

#![allow(unused)]
fn main() {
fn old_enough(age: &Years) -> bool {
    age.0 >= 18
}
}

The above function expects years, not days, and the compiler now makes sure that callers pass a Years value rather than a Days value.

struct Years(i64);
struct Days(i64);

fn old_enough(age: &Years) -> bool {
    age.0 >= 18
}

fn main() {
    let years = Years(32);
    let days = Days(years.0 * 365);

    println!("{}", old_enough(&years));

    // Uncomment the following line and run it to see an error.
    // println!("{}", old_enough(&days));
}
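The source above goes one step further and gives each type a conversion method, so switching units has to be done explicitly. A sketch along those lines follows, assuming a 365-day year and truncating partial years; to_days and to_years are the only additions to the example above.

```rust
struct Years(i64);
struct Days(i64);

impl Years {
    // Assumes a 365-day year (leap days are ignored).
    fn to_days(&self) -> Days {
        Days(self.0 * 365)
    }
}

impl Days {
    // Integer division truncates any partial year.
    fn to_years(&self) -> Years {
        Years(self.0 / 365)
    }
}

fn old_enough(age: &Years) -> bool {
    age.0 >= 18
}

fn main() {
    let days = Years(17).to_days();
    // The conversion is visible at the call site; passing `&days`
    // directly would still be a type error.
    println!("{}", old_enough(&days.to_years())); // prints "false"
}
```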

Use &str, &T, and &[T] for Function Parameters

Sources:
https://rust-unofficial.github.io/patterns/idioms/coercion-arguments.html?highlight=string#use-borrowed-types-for-arguments
https://hermanradtke.com/2015/05/03/string-vs-str-in-rust-functions.html

When borrowing String, Box<T>, and Vec<T> in a function, use &str, &T, and &[T] instead of &String, &Box<T>, and &Vec<T>. There are two advantages.

  • String, Box<T>, and Vec<T> already own their data through a pointer to the heap, so adding & introduces a second layer of indirection.
  • &str, &T, and &[T] provide more flexibility. Thanks to deref coercion, they accept not only &str, &T, and &[T] but also &String, &Box<T>, and &Vec<T>.

One of the sources above illustrates this well.

fn print_me(msg: &str) { println!("msg = {}", msg); }

fn main() {
    let string = "hello world";
    print_me(string);

    let owned_string = String::from("hello world"); // or "hello world".to_string()
    print_me(&owned_string);

    let boxed_string = std::boxed::Box::new(String::from("hello world"));
    print_me(&boxed_string);

    let counted_string = std::rc::Rc::new(String::from("hello world"));
    print_me(&counted_string);

    let atomically_counted_string = std::sync::Arc::new(String::from("hello world"));
    print_me(&atomically_counted_string);
}
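The same reasoning applies to &[T]. In the sketch below, total (a hypothetical function) takes &[i32], so it accepts a borrowed Vec, a borrowed array, or a sub-slice without any conversion at the call site.

```rust
/// Hypothetical helper: sums a slice of integers.
/// Taking &[i32] rather than &Vec<i32> makes it accept any contiguous sequence.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let v = vec![1, 2, 3, 4];
    let a = [10, 20, 30];

    println!("{}", total(&v)); // &Vec<i32> coerces to &[i32]; prints "10"
    println!("{}", total(&a)); // &[i32; 3] coerces to &[i32]; prints "60"
    println!("{}", total(&v[1..3])); // a sub-slice; prints "5"
}
```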

Week 3: Fuzz Testing, Property-Based Testing, Symbolic Execution, and SAT/SMT

Fuzz Testing and Property-Based Testing

Symbolic Execution

SAT/SMT

Week 5: Rust Analyses

Week 6: Rust OSes

Week 7: Symbolic Execution

Week 8: Hybrid Fuzzing

Week 9: Formal Methods

Week 10: Formal Methods (Continued)

Week 11: Interactive Verification

Week 12: Automated Verification