Syllabus
This course is a research-oriented course where you are expected to produce research results on a specific topic of your choice, related to the overall theme of the course. To this end, you will carry out a number of tasks that a researcher in the field of computer science typically does. These include reading papers to understand the literature, identifying an important problem to solve, solving the problem, evaluating how good your solution is, and potentially repeating the process to find a better solution.
Consequently, this course assumes that you are interested in doing research in computer science, especially in the areas within this course's interest---systems and software engineering. If you are a grad student interested in doing research in these areas, or if you are an undergrad student who is thinking of going to grad school to do research in these areas, this course can be a good fit for you.
The overall theme of this course is techniques for reliably building systems software. Though this can be interpreted in many different ways, this course has specific topics of interest, such as fuzzing, property-based testing, symbolic/concolic execution, interactive verification, and automated verification, all applied to building systems software such as an OS kernel. This course also features Rust as another focus, due to its memory safety guarantees that can improve software reliability.
Upon completing this course, you will have gone through the complete cycle of research---identifying a research problem, doing a literature review, coming up with a solution, and evaluating the solution. In addition, you will have learned the aforementioned techniques (to a limited extent) and read a number of papers related to those techniques.
Although this course is open to anybody who meets the minimum prerequisites, it may not be a good fit for you for the following reasons.
- This course is a research course, meaning that it is less structured. Though there are various structured learning components such as lectures and assignments, those are not the sole focus of the course. Rather, the course expects you to explore, experiment, and do your own learning on the way. It is ultimately your responsibility to make the most out of the course.
- Rust is a relatively new language, which means that the infrastructure around it may not be mature yet. For example, though symbolic execution, property-based testing, or fuzz testing tools for Rust are mostly usable, they may still have some rough edges. This goes well with the spirit of the course that encourages you to explore and experiment. However, you may run into various issues as you use different tools and it may be necessary to pivot around.
- The course content and schedule are fluid and can change. For example, if certain tools turn out to be unusable, the course content and schedule will adapt to that new piece of information.
- This course is different from other courses where there are known, preset answers and your job is to find those answers. Instead, you need to identify a research problem yourself and solve it. The problems that you identify can be open-ended, and it may not be clear whether a solution exists.
- The course is new so it might have some rough edges itself.
Think carefully regarding the above before you commit to this course. You are welcome to take this course if the above are not an issue for you.
Administrative Information
Time and Location
Thursdays: 11:30 AM - 2:20 PM (AQ 5037, Burnaby)
Instructor and TA Information
Instructor: Steve Ko <steveyko@sfu.ca>
TA: Anant Awasthy <anant_awasthy@sfu.ca>
Office Hours
TBA
Prerequisites
- CMPT 300 with a minimum grade of C-
- Mastery of using Linux's command line interface
System Requirements
- An installation of Linux with `sudo` access
- An editor/IDE set up for Rust, e.g., Vim/Neovim, Emacs, VS Code, CLion, etc.
Grading
Grading Component | Weight |
---|---|
Course project | 40% |
Class prep | 20% |
Paper presentation | 10% |
Programming assignment 1 | 10% |
Programming assignment 2 | 10% |
Class participation | 10% |
Late Submission Policy
All assignments have hard deadlines. No late submissions are allowed.
Regrading Policy
Assignments and exams (if any) may be submitted for regrading to correct grading errors.
- Regrade requests are due no later than one (1) week after the grades are posted.
- Regrade requests must be clearly written and attached to the assignment.
- Regrade requests are intended to correct grading errors, NOT to negotiate for a higher grade. When work is submitted for regrade, the entire work may be regraded, which may result in a lower grade.
Accessibility Resources
If you would like reasonable accommodations to participate in this course, please contact the instructor as well as the Centre for Accessible Learning (CAL). The staff at CAL will provide you with information and review appropriate arrangements for reasonable accommodations.
Academic Honesty Statement
This course has a very high standard for academic integrity. Any type of academic integrity violation will result in an F for the semester. In general, this course follows the SFU Academic Honesty and Student Conduct Policies.
COVID and Mask Policies
This course follows the COVID and mask policies set by the university. There is a university website that contains general information regarding returning to campuses.
Course Schedule
The following schedule is subject to change.
Class Prep
To prepare for each class, you are expected to read a few papers and write summaries for them. For each paper, you need to read it using the method described in How to Read a Paper, up to the second pass. Each week, the instructor posts a set of questions that you need to answer in your paper summaries for the coming week.
Due
Two nights before each class (i.e., every Tuesday night)
Submission
First, accept the repo invite for class preparation from our GitHub Classroom. Once that's done, you should push your summaries for each week to the repo by the deadline. You should create a file named `week_X.md` (where `X` is the week's number) and write your summaries in plain text or Markdown.
Grading
Your class prep is 20% toward the overall final grade.
Programming Assignments
Assignment 1: Toybox Command Reimplementation
This assignment asks you to reimplement a Toybox command, `file`. The goal is to produce a working command that implements the same features as the original Toybox implementation.
Due
Friday Oct 7 at 00:00 AM (Note: this is the night of the 6th.)
Requirements
- You need to reimplement the `file` command.
- It should implement the same set of features as in the original Toybox's `file` implementation.
- You should not use any external crates, but there are two exceptions. The first exception is when the original Toybox source uses an external library. If that is the case, you can find a Rust crate that provides similar functionality and use it. The second exception is for command line arguments. There are a number of good external crates that process command line arguments, and you can use any one of those. If you do use an external crate, you need to get approval first.
- Your command should work as a stand-alone executable. Note that the original Toybox produces a single executable for all commands that it supports.
- You should not use unsafe Rust. However, if it is unavoidable, you need to get approval first. You then need to carefully document it as comments on the source itself and explain why you have to use unsafe Rust.
- You might need to also reimplement some of the shared code, e.g., `toys.h`, `toybox/lib/`, etc. This is part of your assignment.
- We grade your submission using the test cases from the original Toybox test source.
How to Submit
Submit on GitHub Classroom as follows.
- Accept the assignment invitation on GitHub Classroom.
- Make sure you push your code before the deadline. Be careful: the repo still accepts pushes after the deadline, but you should not push then. Grading is done on the last version pushed before the deadline.
Grading
TBA
Assignment 2: Simple Symbolic Execution Engine for Rust
This assignment asks you to implement a simple symbolic execution engine for Rust.
Due
Friday Nov 18 at 00:00 AM (Note: this is the night of the 17th.)
Requirements
How to Submit
Grading
Course Project
The goal of your course project is to identify a research problem, solve it, and evaluate the effectiveness of it. At the end of the semester, you are expected to show a demo of your research prototype to class and submit a project report. The topic of your project should be one of the topics we discuss in class.
Though the course schedule has structured components to help you make progress on your project in a timely fashion, you are highly encouraged to talk to the instructor about your project's direction and progress. This is especially true if you do not have much experience in carrying out a research project.
The following is the timeline for your project.
Milestone 1: Team Formation
The first thing to do is to find your teammates. Your team can be up to 3 people.
Due
Friday of Week 2
Submission
Your "submission" is to accept the repo invite for the course project with your teammates from our GitHub Classroom. Please make sure that you do this by the end of Week 2 (Friday).
Grading
There is no grading for this milestone. However, there is a 1% penalty for missing the deadline, deducted from the final grade of your course project.
Milestone 2: Topic Selection
With your teammates, you can go through the course schedule and find a topic that is appealing to your team. Once you find a topic, you can skim through the papers listed under the topic to gain a better understanding of it.
When selecting a topic, one good starting point is to think about what you want to gain deeper understanding of. Doing research on one topic is ultimately about gaining expertise on that topic. One way to make the most out of this course is to choose a research topic that you want to gain expertise on.
Due
Friday of Week 3
Submission
Your submission is to push a file named `TOPIC.md` to your GitHub Classroom repo. The file should contain your team's topic written in plain text or Markdown. Please make sure you do this by the end of Week 3 (Friday).
Grading
There is no grading for this milestone. However, there is a 1% penalty for missing the deadline, deducted from the final grade of your course project.
Milestone 3: Problem Selection & Proposal
After you decide on a topic, you are expected to identify a problem to work on. The goal is to generate as many ideas as possible, settle on one problem, and write a proposal for it. Just as when selecting a topic, one good starting point is to think about what you want to gain a deeper understanding of.
The exact kind of problem you want to work on can depend on your topic. For example, if your topic is fuzzing, you can find a problem of fuzzing and solve the problem by implementing your solution for a fuzzing tool. If your topic is automated verification, you can write a program and verify it by using a tool like Dafny. If you are stuck, you can always talk to the instructor.
It is ideal if your platform is based on Rust. However, this is not a requirement. Your topic or problem may have nothing to do with Rust, which is fine.
Due
Friday of Week 5
Submission
Your submission is to push a plain text or Markdown file named `PROPOSAL.md` to your GitHub Classroom repo. The file should clearly answer the following questions.
- What is your problem statement?
- What is a rough idea for your solution?
- What is your team going to build for the solution? Clearly specify the tools that you will either leverage or modify.
- What are you going to accomplish by the intermediate demo deadline?
You might find this article useful when writing your proposal.
Grading
Your proposal is 5% toward your overall semester final grade. The grading is not based on length but quality. If you answer adequately, your proposal does not have to be long. However, keep in mind that short answers often do not contain enough information to understand clearly.
Milestone 4: Intermediate Demo
We have an intermediate checkpoint to make sure that your team is making good progress on the project. For this, we set aside some time in class for every team to show an intermediate demo. The goal is to show the demo that you promised in your proposal. Obviously, there can be challenges that you did not foresee, and you may not be able to show a demo as planned. This is fine as long as you make an honest effort. If this is the case, you need to talk to the instructor well before the deadline (e.g., 1-2 weeks earlier) in order to come up with a new demo plan. This is entirely your responsibility. The instructor does not check with individual teams about changes to their plans.
Due
Week 10 class
Submission
Your "submission" is to show a demo in class.
Grading
Your intermediate demo is 10% toward your overall semester final grade.
Milestone 5: Final Demo & Report
At the end of the semester, you are expected to show a demo of your final prototype in class. You are also expected to submit a project report. The final demos take place in the last class of the semester. The report should be in PDF and 5 pages long (excluding references) with a 10-pt Times New Roman font. The report should contain sections that roughly correspond to the following.
- Introduction
  - Include a clear problem statement here.
- Overview of the solution
  - Diagrams typically help for an overview.
- Detailed description of the solution
  - Discuss your algorithms, methodologies, etc. as well as what worked and what didn't work. Include diagrams, tables, etc. as appropriate.
- Results
  - First design and run experiments. Then plot the results and include them in your report.
- Conclusion
  - Draw a conclusion regarding what your solution and results tell you.
- Responsibilities
  - Describe who did what in your team.
You might find this talk useful.
Due
The final demos are during the class in Week 13. The final report is due on Dec 6.
Submission
Your submission for the final report is to push a file named `final_report.pdf` to your GitHub Classroom repo. This should be a PDF file, not plain text or Markdown. Please make sure that you push it to the remote repo by the deadline.
Grading
Your final demo is 10% toward your overall semester final grade, and your final report is 15% toward your overall semester final grade.
Paper Discussion
Along with your team members, you are expected to give presentations in class. These student-led classes start from Week 4. Each presentation should consist of the following two parts.
Class-Prep Paper Discussion
The first part of your presentation is discussing class-prep papers. In order to do this, you are expected to read the class-prep papers up to the third pass described in How to Read a Paper. This will give you deeper understanding of the papers. The instructor will provide you with a set of questions, derived from student summaries, to discuss in class. You are expected to prepare your own answers for those questions, and discuss the questions and answers in class.
Paper Presentation
The second part of your presentation is presenting additional papers. If you look at the schedule, there are additional papers that are not part of the class-prep paper list for each week. You are expected to explain those additional papers to class by preparing a presentation. Since other students do not read those papers, your goal is to explain the papers as clearly as possible. This cannot be rushed---it requires careful reading (i.e., read the papers well in advance) and thinking about your presentation strategies that can maximize clarity. You need to run it among your team a few times to see if everything's clear. You might find this talk useful.
Grading
Each part is 5% toward the overall semester final grade.
Approval Instructions
In order to get approval regarding a piece of code (e.g., using an external crate or unsafe Rust), create a new issue from the code and mention @steveyko. Mentioning @steveyko is important as it notifies the instructor.
Week 1: Rust
The focus of the first couple weeks is learning Rust. Although we do have in-class lectures for Rust in CMPT 479/982, the best resource to use to learn Rust is the Rust book. You have to read the Rust book in addition to following the lectures since the lectures are not meant to replace the Rust book.
In the first lecture, we cover the basics of the Rust syntax. We discuss variables, types, functions, conditionals, etc. These share similarities with other languages and if you are familiar with any of the popular languages such as C++, Java, or Python, you will be able to pick up Rust's syntax quickly.
System Setup
The first thing we need to talk about is setting up your system so you can write, compile, and run Rust programs.
Installing Rust: rustup
Rust has a streamlined process of maintaining all its software packages. It is all done by a tool called `rustup`. Head over to the Rust book on installation and follow the instructions there to install `rustup` and other software for Rust.
`rustup` has many features to support Rust development, and it has its own book.
By default, `rustup` installs the latest stable version of Rust. However, this course later uses some experimental features of Rust. Thus, it is necessary to install "nightly" Rust that supports those experimental features. The following command will do just that.
$ rustup toolchain install nightly
If you are interested, you can read about various release channels of Rust (nightly, beta, and stable).
If later you determine that you want to use nightly Rust by default rather than stable Rust, you can do it as follows.
$ rustup default nightly
If you want to switch the default back to stable, you can do it as follows.
$ rustup default stable
If you want to list all installed versions of Rust, you can do the following.
$ rustup toolchain list
At this point, it will print out two versions, the latest stable version and the latest nightly version.
The following command updates the stable and nightly versions that you have.
$ rustup update
However, if you start using nightly Rust, it is important to not break the installation because your code might depend on a particular version of Rust. Thus, `rustup` allows you to install a specific version of Rust based on its date. For example, the following command installs a nightly Rust based on the source from 2021-09-01.
$ rustup toolchain install nightly-2021-09-01
Writing Rust Programs: Editors
Once you have a working installation, you need to have a proper editor/IDE to write a Rust program. There are many options available to choose from, but whatever editor/IDE you use, it is important to enable Rust plugins. Rust plugins support syntax highlighting, code formatting, error detection, etc., which will make your life easier.
- Vim/Neovim: If you use Vim or Neovim, follow the instructions here and use the combination of rust-analyzer, coc.vim, and one of the LSP clients listed in the instructions. This is the personal setup that the instructor uses for Rust development (with ALE as the LSP client).
- Emacs: You can also use rust-analyzer for Emacs. Here are the instructions.
- VS Code: VS Code has an official Rust plugin. Open VS Code, go to Extensions, and search Rust. You can also use rust-analyzer with VS Code.
- CLion: CLion is the only true IDE among the options listed here. You can get a free student license. CLion can use the IntelliJ Rust plugin.
Make sure that you install and enable `rustfmt` with your Rust plugins (e.g., on save). That way, it will automatically format your code.
Compiling Rust Programs: cargo
Once you set up an editor/IDE, you are ready to write Rust programs. Although it is possible to create Rust source files and compile them with the Rust compiler (`rustc`), you are most likely not going to do that. Instead, `cargo` is the command that you will use most of the time. Head over to the Rust book on `cargo` to learn about `cargo`.
One thing to add, due to our prior discussion on using nightly Rust, is that you can easily choose which Rust version to use when compiling with `cargo`. For example, the following command compiles your code using nightly Rust even if you do not set it as the default.
$ cargo +nightly build
The same `+` syntax works for `rustc` as well.
It is also possible to set the version of Rust that you want to use for a specific directory by using `rustup`. For example, the following commands will set the default to nightly Rust for the directory `tmp/`.
$ cd tmp
$ rustup override set nightly
Basic Syntax of Rust
Pretty much every (procedural) programming language provides constructs to (1) define a variable, (2) define a function, and (3) control the flow of execution. If you understand those constructs for a given language, you will be able to start writing simple programs in that language. Rust is no exception, so let's take a look at how Rust provides those constructs at the very basic level first.
Variable Definitions
The following program shows how to define variables in Rust.

    fn main() {
        let a: i32 = 1;
        let b: i32 = 2;
        println!("a + b is {}", a + b);
    }

But before we go further, let's note a couple of things.
- `main()` is a function and it is the default entry point for a Rust program, similar to C/C++. If you create a new `cargo` project using `cargo new`, it will create a `main()` function in `src/main.rs`.
- Let's not worry about `println!()` for now. All you need to know at this point is that it prints out formatted strings.
In the above program, we can make a few observations regarding variable definitions.
- There are two variable definitions in the program.
  - `let a: i32 = 1` means that `a` is a 32-bit integer and `1` is its initial value.
  - `let b: i32 = 2` means that `b` is a 32-bit integer and `2` is its initial value.
- A variable definition starts with `let`, e.g., `let a: i32 = 1`.
- Each variable has a type, e.g., `i32` (a 32-bit integer) in `let a: i32 = 1`. When you declare the type for a variable, the syntax is the variable name, followed by `:`, followed by the type, e.g., `a: i32`.
- A variable definition ends with a semicolon (`;`). In Rust, all statements end with a semicolon (`;`). Rust makes a distinction between a statement and an expression, which we will discuss later.
Although Rust requires each variable to have a type, you don't always need to declare it because oftentimes the Rust compiler can infer a variable's type. We can revise the above code as follows and the Rust compiler will automatically understand the types for `a` and `b`.

    fn main() {
        let a = 1; // No explicit type declaration here.
        let b = 2; // Not here either.
        println!("a + b is {}", a + b);
    }

Keep in mind, however, that it's not always possible for the Rust compiler to infer a variable's type. If that's the case, you still need to explicitly declare the type.
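One common case where inference is not enough is parsing text into a number: `parse()` can produce many different numeric types, so the compiler needs to be told which one you want. The following is a minimal sketch of that situation. The helper name `is_even` and the input `"42"` are made up for illustration. If you remove `: i32`, the compiler asks for a type annotation.

```rust
// `is_even` is a hypothetical helper used only for this illustration.
fn is_even(text: &str) -> bool {
    // `parse()` can produce many numeric types, so the annotation is what
    // tells the compiler to parse the text as an `i32`.
    let n: i32 = text.parse().expect("input should be a valid i32");
    n % 2 == 0
}

fn main() {
    println!("Is 42 even? {}", is_even("42"));
}
```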
The Rust book has a section on data types and you can take a look at the primitive data types available in Rust.
Function Definitions
Let's look at another piece of code that shows how to define a function.

    fn main() {
        let a = 1;
        let b = 2;
        print_sum(a, b);
    }

    fn print_sum(x: i32, y: i32) {
        let sum = x + y;
        println!("Sum: {}", sum);
    }

Here also, we can make a few observations.
- There are two function definitions. One is for `main()` and the other is for `print_sum()`. Again, `main()` is the default entry point for Rust programs.
- A function definition starts with `fn`.
- `print_sum()` has two parameters, one is `x` and the other is `y`.
  - `x: i32` means that the parameter name is `x` and the type is `i32`.
  - `y: i32` means that the parameter name is `y` and the type is `i32`.
Unlike variables, parameters always need a type. Run the following code (that doesn't declare types for parameters) and see what error messages you get.

    fn main() {
        let a = 1;
        let b = 2;
        print_sum(a, b);
    }

    fn print_sum(x, y) {
        let sum = x + y;
        println!("Sum: {}", sum);
    }

Typically, you want to return something as a result of calling a function. The following code shows the syntax for it.

    fn main() {
        let a = 1;
        let b = 2;
        let s = sum(a, b);
        println!("a + b is {}", s);
    }

    fn sum(x: i32, y: i32) -> i32 {
        x + y
    }

There are a couple of differences that we can see.
- `sum()` has a return type declared in its definition, `-> i32`. It means that it returns a 32-bit integer (`i32`) as a return value.
- `x + y` is the return value. Notice a couple of things there.
  - There is no `return` keyword.
  - There is no semicolon at the end.

Not having a semicolon after `x + y` means that `x + y` is an expression, and as mentioned earlier, we will discuss it later.
You can use `return` in a function, either at the end or in the middle (for an early return). The following code works exactly the same way except that it uses a `return` statement (notice `;` at the end) instead.

    fn main() {
        let a = 1;
        let b = 2;
        let s = sum(a, b);
        println!("a + b is {}", s);
    }

    fn sum(x: i32, y: i32) -> i32 {
        return x + y;
    }

Read the Rust book on functions to understand the details about functions.
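To illustrate an early return, here is a small sketch that mixes both styles in one function. The function name `clamp_to_zero` is made up for this example, and it uses `if`, which the next section covers.

```rust
// `clamp_to_zero` is a hypothetical example function.
fn clamp_to_zero(x: i32) -> i32 {
    if x < 0 {
        return 0; // Early return: the rest of the function is skipped.
    }
    x // No `return` and no `;`: this expression is the return value.
}

fn main() {
    println!("{}", clamp_to_zero(-5));
    println!("{}", clamp_to_zero(7));
}
```

The first call takes the early return and the second falls through to the final expression.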
Execution Control
To control the flow of execution, Rust provides a few constructs that are similar to the ones provided by other languages. Although they are similar, Rust's constructs for execution control are often more powerful. There are typically two types of control you'd like---branching and looping. Rust provides `if-else if-else` and `match` for branching and `loop`, `while`, and `for` for looping.
Branching with if
Let's look at the following code, which shows how `if-else` works in Rust.

    fn main() {
        let a: i32 = 1;
        if a == 1 {
            println!("a is 1");
        } else if a == 2 {
            println!("a is 2");
        } else {
            println!("a is something else");
        }
    }

It looks very similar to the `if-else` constructs in other popular languages such as C/C++ or Java, except that there are no parentheses for a branch condition. However, there is a notable difference---`if-else` evaluates to a value. To understand what this means, let's look at the following code.

    fn main() {
        let a = 1;
        let b = if a == 1 {
            2
        } else if a == 2 {
            1
        } else {
            0
        };
        println!("b is {}", b);
    }

As you can see, `b` gets its initial value assigned from the result of `if-else`. This is because `if` is an expression, not a statement. An expression in Rust evaluates to a value while a statement doesn't. In Rust, most of the constructs are expressions, e.g., `if`, `for`, a code block `{}`, etc. If you add a semicolon at the end of an expression, it becomes a statement (called an expression statement) and the value that the expression evaluates to gets ignored.
Exactly what value an `if` evaluates to needs some more explanation. The first thing to understand is that a block is also an expression and it evaluates to the last expression of the block. Run the following code and see what the result looks like.

    fn main() {
        let a: i32 = {
            println!("In the block");
            1 // This is an expression. There's no `;`.
        };
        println!("a is {}", a);
    }

Since a block evaluates to its last expression, `a` gets `1`. In order to see the difference between an expression and a statement, run the following code and see what error messages you get.

    fn main() {
        let a: i32 = {
            println!("In the block");
            1; // This is a statement, not an expression due to `;`.
        };
        println!("a is {}", a);
    }

If a block does not have the last expression, it gets `()` (sometimes referred to as the unit type) as its type. `()` has a single value, which is also `()`, and it is used "when there's no other meaningful value that could be returned".
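As a small sketch of the unit type in practice, the code below binds `()` from a block whose last line ends with a semicolon and from a function that declares no return type. The function name `greet` is made up for this example.

```rust
// `greet` is a hypothetical function. There is no `->` in its signature,
// so its return type is `()`.
fn greet() {
    println!("Hello");
}

fn main() {
    // The last line of the block ends with `;`, so the block evaluates to `()`.
    let a: () = {
        println!("In the block");
    };
    let b: () = greet();
    // `()` has exactly one value, so any two unit values are equal.
    println!("a == b is {}", a == b);
}
```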
Now `if` evaluates to the value of the block that corresponds to the correct condition. In the following code (which is the same code from above), `if` evaluates to `2` because `a == 1` and the corresponding block evaluates to `2`.

    fn main() {
        let a = 1;
        let b = if a == 1 {
            2
        } else if a == 2 {
            1
        } else {
            0
        };
        println!("b is {}", b);
    }

Branching with match
Rust provides another branching construct called `match` and you will probably find yourself using it very often due to its power. However, understanding its power requires the understanding of Rust's pattern matching ability, so we will not delve into that for now. The following code is a revised version of the very first code we saw for `if` and it uses `match` instead of `if-else if-else`.

    fn main() {
        let a: i32 = 1;
        match a {
            1 => { // If `a` matches `1`, execute this.
                println!("a is 1");
            }
            2 => { // If `a` matches `2`, execute this.
                println!("a is 2");
            }
            _ => { // `_` works as a wild card and it matches any expression
                println!("a is something else");
            }
        }
    }

With `match`, you provide an expression to match (e.g., `a`) and list out execution options (e.g., `1 => {}`, `2 => {}`, etc.). Each option is called a match arm. Unlike `if`, a `match` expression does not have to be a boolean expression as you can see from the code above.
As mentioned earlier, `match` is much more powerful than `if` due to pattern matching, but we will cover that later.
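Like `if`, `match` is an expression, so it can also produce a value. The sketch below mirrors the earlier `if-else` example that assigns to `b`. The function name `pick` is made up for illustration.

```rust
// `pick` is a hypothetical helper that mirrors the earlier `if-else` example.
fn pick(a: i32) -> i32 {
    // The whole `match` is the last expression of the function, so the value
    // of the selected arm becomes the return value.
    match a {
        1 => 2,
        2 => 1,
        _ => 0,
    }
}

fn main() {
    let b = pick(1);
    println!("b is {}", b);
}
```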
Loops
Rust provides three types of loops---`loop`, `while`, and `for`.
loop
`loop` is probably the most interesting loop construct, especially in conjunction with `break`. `loop` is an infinite loop construct and if you want to break out of the loop, you need to use `break`.

    fn main() {
        let mut a: usize = 0; // Don't worry about `mut` for now.
        loop {
            println!("This is infinite...");
            a += 1;
            if a == 10 {
                println!("Unless there's a break");
                break;
            }
        }
    }

An interesting thing about `loop` and `break` is that `loop` is an expression and it evaluates to what `break` returns.

    fn main() {
        let mut a: usize = 0; // Don't worry about `mut` for now.
        let b = loop {
            println!("This is infinite...");
            a += 1;
            if a == 10 {
                println!("Unless there's a break");
                break a;
            }
        };
        println!("b is {}", b);
    }

In the code, `break` returns `a` and `loop` evaluates to it. Thus, `a`'s value is assigned to `b`.
while
`while` is almost exactly what you would expect.

    fn main() {
        let mut a: usize = 0; // Don't worry about `mut` for now.
        while a < 5 {
            println!("a is {}", a);
            a += 1;
        }
    }

As with other languages, `while` provides a conditional loop.
for
You can use `for` instead of `while`, but you will typically use `for` to iterate over a collection such as an array.

    fn main() {
        let a = [0, 1, 2, 3, 4];
        for i in a {
            println!("a contains {}", i);
        }
    }

In order to use `for` to iterate over a collection such as an array, you need to get an iterator for the collection. In the code above, even though it looks like you are using `a` directly, that's actually not the case. The Rust compiler understands that you are iterating over `a` and it replaces it with a proper iterator.
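To make that replacement visible, the sketch below writes the same loop twice: once with `for i in a`, and once with the iterator obtained explicitly through the standard `IntoIterator::into_iter` call, which is what a `for` loop uses to get its iterator. The function names are made up for illustration.

```rust
// Hypothetical helper: iterates over the array directly.
fn sum_with_for(a: [i32; 5]) -> i32 {
    let mut total = 0;
    for i in a {
        total += i;
    }
    total
}

// Hypothetical helper: the same loop with the iterator written out.
fn sum_with_explicit_iterator(a: [i32; 5]) -> i32 {
    let mut total = 0;
    for i in IntoIterator::into_iter(a) {
        total += i;
    }
    total
}

fn main() {
    let a = [0, 1, 2, 3, 4];
    println!("{}", sum_with_for(a));
    println!("{}", sum_with_explicit_iterator(a));
}
```

Both functions visit the same elements in the same order, so they return the same sum.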
You can also use range operators, `..` or `..=`, with `for`. The first one, `..`, is a right exclusive range operator while `..=` is a right inclusive range operator.

    fn main() {
        let a: usize = 1;
        for i in 0..a {
            println!("Right exclusive iteration: {}", i);
        }
        for i in 0..=a {
            println!("Right inclusive iteration: {}", i);
        }
    }

Safety Features of Rust
Now that you are familiar with the basic syntax of Rust, let's talk about what makes Rust different from other languages, namely, the safety features of Rust. Rust employs many features that safeguard you from writing code that might cause problems at run time. The Rust language and compiler have many checks to enforce that you are writing a more reliable program. These very features sometimes make a program hard to compile. But soon you will realize that once you get your program compiled, it will just work as you have intended.
Variable Mutability
The first safety feature to discuss is variable mutability/immutability. By default, a variable in Rust is immutable meaning that once you assign a value to a variable, you cannot assign another value to it. If you run the following code, the compiler will complain. The error message basically says that the variable `a` is an immutable variable and you cannot assign a value twice.

    fn main() {
        let a: i32 = 1;
        println!("a is {}", a);
        a = 2;
        println!("a is {}", a);
    }

In order to make a variable mutable, you need to declare a mutable variable like the following.
fn main() { let mut a: i32 = 1; println!("a is {}", a); a = 2; println!("a is {}", a); }
You use the mut
keyword to define a mutable variable that you can assign a value to more than
once.
Since you define mutable variables explicitly, the Rust compiler knows which variables can be modified. Thus, the Rust compiler can statically check (i.e., check at compile time) if the mutable variables are the only ones modified in your code. For you as a Rust developer, this is a safety feature for a couple of reasons. First, it makes you think hard about whether or not you will need to modify a variable when you define it. In other words, it gives you an opportunity to think about how you intend to use each and every variable in your code. Second, it prevents you from defining a variable in one place with the assumption that you will not modify it, and later modifying the variable inadvertently.
In addition to defining a mutable variable with mut
, you can also reuse the same variable name.
fn main() {
    let a: i32 = 1;
    println!("a is {}", a);
    let a: i32 = a + 1;
    println!("a is {}", a);
    let a: i64 = 3; // Different type
    println!("a is {}", a);
}
This is called shadowing, which is effectively redefining a new variable with the same name. This is useful in certain scenarios, e.g., when you have a big chunk of code that uses a variable heavily, but then later realize that you need to do some quick transformations for the variable before it gets heavily used. If that's the case, you can shadow the variable and still take advantage of the Rust compiler's immutability check.
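As a small sketch of the "quick transformation" scenario, the following function reuses the name input while the value goes from raw text to trimmed text to a number. The function name parse_input and the fallback value 0 are assumptions made for illustration, and &str (a borrowed string) is a type that has not been introduced yet.

```rust
// Parses a raw text field into a number, reusing the name `input` at each step.
fn parse_input(input: &str) -> i32 {
    let input = input.trim(); // Still text, with surrounding whitespace removed.
    let input: i32 = input.parse().unwrap_or(0); // Now a number; 0 if parsing fails.
    input
}

fn main() {
    println!("parsed: {}", parse_input("  42\n"));
    println!("parsed: {}", parse_input("not a number"));
}
```

Each let creates a new immutable variable, so none of the three versions of input can be modified by accident.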
The Rust book has an excellent section on variable mutability, so please go read it.
Ownership
Ownership is perhaps the most distinguishing feature of Rust that everybody talks about, and frankly it will give you some headaches when you try to get the Rust compiler to compile your code. However, it is an important feature of Rust that provides safety.
The whole concept has to do with how to manage memory. In languages like C/C++, the approach is to leave memory management to programmers. What it means is that C/C++ programmers need to allocate memory and free memory by themselves. This has caused many programs to suffer from memory leak problems since it is easy to allocate memory and not free it. Other languages like Java and Python use an automated memory management approach where a garbage collector runs from time to time to reclaim allocated memory that no longer is in use. This approach unburdens programmers from worrying about memory management but has a performance cost because a garbage collector needs to run, which interferes with the normal program execution.
Rust takes a different automated approach to memory management. When you define a variable, Rust allocates a piece of memory that the variable will use. This variable is called the owner of that allocated memory. Later, Rust deallocates the memory when its owner goes out of scope. Rust calls this dropping of memory. In addition, there are certain cases where Rust moves the ownership of a piece of memory from one variable to another. Thus, it is not always the case that the first variable that owns a piece of memory remains its owner the whole time. However, it is the case that there is only a single variable that is the owner of a piece of memory.
There are a few things to unpack here, so let's look at them one by one.
The Scope of a Variable
Let's first look at what a scope is for a variable. A variable's scope in Rust is similar to other languages and you can easily determine what it is by looking at the block where the variable is first defined. The following examples show two cases to illustrate what a scope is for a variable.
fn main() {
    let a: i32 = 1; // The scope for `a` starts here.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The scope for `a` ends here.

fn main() {
    {
        let a: i32 = 1; // The scope for `a` starts here.
    } // The scope for `a` ends here.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}
As you can see from the examples, a variable has a scope and it is only valid within its scope. In fact, this is the way most other languages work as well. The difference is that Rust uses a variable's scope to automatically manage memory. In the above examples, when a goes out of scope, Rust drops a's memory automatically. At this point, you may think that it is still the way other languages work. You are correct. For stack-allocated memory such as local variables, you never have to worry about allocating or deallocating memory in other languages either. The difference for Rust is that you also do not need to worry about it for heap-allocated memory. Let's discuss this a little further. (If you need a refresher on the stack and the heap, please read the Rust book on the stack and the heap).
Automated Heap Memory Management
In languages like C/C++, programmers allocate or deallocate heap memory by invoking memory
management functions such as malloc()
and free()
. In languages like Java and Python, programmers
do not allocate or deallocate heap memory explicitly because it is by default hidden from the
programmers and there is a garbage collector that manages memory.
In Rust, heap allocation/deallocation is by default hidden from the programmers as well, and through the combination of the Rust compiler and Rust's standard library, Rust handles the allocation/deallocation of heap memory. Rust allocates heap memory through convenient data structures such as String, Vec, and Box. These data structures hide all the details of allocating heap memory. Of course, Rust is a low-level language, so you can allocate heap memory by yourself. But Rust developers typically do not use the heap that way.
Rust deallocates heap memory via a function called drop()
and this is where ownership plays a
critical role. When a variable that is the owner of a piece of heap memory goes out of scope, Rust
invokes drop()
automatically. By "automatically," we mean that the Rust compiler injects a piece
of code that invokes drop()
. The Rust compiler provides the default drop()
implementation and
deallocates the heap memory used by its owner.
Mechanism-wise, this is similar to C/C++ that use memory allocation/deallocation functions (e.g.,
malloc()
and free()
). It is just that by default, Rust programmers do not need to invoke them by
themselves. This is different from languages like Java or Python where a separate runtime component,
i.e., a garbage collector, is used to manage memory.
The following examples use the Box
data structure to allocate heap memory.
fn main() {
    let a = Box::new(1); // The heap memory for `a` gets allocated.
                         // Don't worry about the syntax for `Box` for now.
    println!("a is {}", a); // This works fine because `a` is still valid.
} // The heap memory for `a` gets deallocated.

fn main() {
    {
        let a = Box::new(1); // The heap memory for `a` gets allocated.
    } // The heap memory for `a` gets deallocated.
    println!("a is {}", a); // This throws an error,
                            // because `a` is no longer valid.
}
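If you want to see the moment at which drop() runs, one option is to give a type its own drop() through the standard Drop trait (traits are discussed later). The sketch below is only an illustration: the type Noisy and the function drop_order() are hypothetical, and Rc and RefCell are used solely so that the values can write to a shared log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A hypothetical type that records its own name in a shared log when dropped.
struct Noisy {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name.clone());
    }
}

// Creates two values in nested scopes and returns the order in which they were dropped.
fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Noisy { name: String::from("outer"), log: Rc::clone(&log) };
        {
            let _inner = Noisy { name: String::from("inner"), log: Rc::clone(&log) };
        } // `_inner` goes out of scope, so its `drop()` runs here.
    } // `_outer` goes out of scope, so its `drop()` runs here.
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("drop order: {:?}", drop_order());
}
```

The inner value is dropped first because its scope ends first, which matches the scope rules shown in the examples above.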
Earlier we said that each allocated piece of memory in Rust has an owner and there is always a
single owner. Since drop()
by default takes care of deallocation when an owner goes out of scope,
we mostly do not need to worry about memory leak problems. One caveat is that Rust does not
prevent programmers from manually allocating and deallocating heap memory. Thus, it is possible to
suffer from memory leaks when a programmer tries to manage memory explicitly and does not do a
thorough job for it. However, Rust programmers typically do not choose to manage heap memory by
themselves, so there is a low chance of getting into memory leak problems.
Determining Ownership
Since Rust calls drop()
when an owner goes out of scope, it is absolutely critical to be able to
determine whether or not a variable is the owner of a piece of memory. If a single variable accesses
a piece of heap memory exclusively throughout a whole program, it is easy to determine the
ownership. However, it is too restrictive to not allow two or more variables to access the same
piece of memory. Thus, Rust employs a few mechanisms to keep track of ownership.
Move
By default, when you assign a variable to another variable, Rust moves the ownership. This is probably one of the most surprising aspects about Rust as a beginner. Let's look at the following code to see what this means.
fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.
    println!("b is {}", b); // This works fine.
}

fn main() {
    let a = Box::new(1); // `a` is the owner of the memory for `Box`.
    let b = a; // Rust moves the ownership of the `Box` from `a` to `b`.
    println!("a is {}", a); // This throws an error,
                            // because `a` no longer has access to the `Box`.
}
As you can see, if you assign a variable to another variable, Rust no longer allows us to use the original variable. The same thing happens with function calls and return values.
fn main() {
    let a = String::from("a"); // Don't worry about the syntax for `String` for now.
    print_str(a); // `a` moves to the function `print_str()`.
}

fn print_str(x: String) { // `x` is the (new) owner of the string passed in.
    println!("The string is {}", x);
}

fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.
    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("String {}", x);
}

fn get_str() -> String {
    let x = String::from("a");
    x // `x` moves to the caller
}

fn main() {
    let a = get_str(); // `a` is the new owner of the String "a".
    println!("a is {}", a);
}
You might be wondering why this is necessary. Let's take a look at the first example to understand further.
fn main() {
    let a = String::from("a");
    print_str(a); // `a` moves into the function `print_str()`.
    println!("a is {}", a); // This throws an error,
                            // because `a` can no longer access the string.
}

fn print_str(x: String) {
    println!("The string is {}", x);
} // Since `x` is the owner, Rust deallocates the String at this point,
  // because `x` is out of scope.
In the code, you can see that x
becomes the new owner of the String "a"
and it goes out of scope
when the function print_str()
is done. Thus, Rust will drop the String
at that point. Thus,
a
should not be able to access the memory location after the function returns. Otherwise, a
will
access the memory location that is already dropped.
Generally speaking, if you have two different variables that can access the same heap location (called aliases), it can cause problems. For example, one variable can free the memory at one point while the other variable accesses the memory at some later point. This is called use-after-free and it is a well-known bug that can cause a vulnerability. Similarly, one variable can free the memory at one point and the other variable can free the same memory again at some later point. This is called double free and it is also a well-known bug that can cause a vulnerability. By moving the ownership and not allowing the original variable to access the value it had, Rust helps prevent problems caused by two variables accessing the same heap location.
However, you might think that this is too restrictive. For example, if you can't use variables every time you call a function and pass them as arguments, it will be very difficult to write a program. Thus, Rust provides many ways to help you deal with the restriction.
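The simplest, if somewhat clunky, way follows directly from the move rules above: a function that takes ownership can hand it back through its return value. The sketch below assumes a hypothetical function add_suffix() that is not part of the original examples.

```rust
// Takes ownership of a string, appends to it, and hands ownership back.
fn add_suffix(mut s: String) -> String {
    s.push_str("_suffix");
    s // `s` moves to the caller.
}

fn main() {
    let a = String::from("a");
    let a = add_suffix(a); // `a` moves in, and the result moves back out.
    println!("a is {}", a); // This works because the new `a` owns the string.
}
```

This works, but threading every value in and out of every function quickly becomes tedious, which is why the mechanisms discussed next exist.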
Copy and Clone
By default, primitive data types such as i32, i64, etc. do not move ownership. Instead, they just copy the value to a new memory location. The following example illustrates that.
fn main() {
    let a = 1;
    let b = 2;
    let s = sum(a, b); // This does not move the ownership.
    println!("a is {}", a); // This works fine.
    println!("b is {}", b); // This works fine.
    println!("s is {}", s);
}

fn sum(x: i32, y: i32) -> i32 {
    x + y // This does not move the ownership either.
}
As we can see, even if we pass a
and b
as arguments to sum()
, we can still use them later. It
is the same with the return value of sum()
, although the code does not directly illustrate that.
All this is because primitive data types copy instead of move, hence do not transfer ownership.
Rust distinguishes copy and move by looking at whether or not a data type implements something called the Copy trait (we will look at what a trait is later). Primitive data types such as integers, floating-point numbers, bool, and char implement the Copy trait while data structures like String and Box do not. You can define your custom data structure and implement the Copy trait to use the copy semantics instead of the move semantics for your data structure, as long as every field of the data structure is itself Copy. A typical criterion to use when deciding whether or not you want to implement the Copy trait is the cost and complexity of copying. For example, primitive data types are small in size and the sizes are fixed. Thus, they are relatively inexpensive and easy to copy. However, String and Box point to a location on the heap, and the sizes are often not known a priori. Thus, they may not be easy or inexpensive to copy.
Another way to copy is cloning. If a data type implements the Clone
trait, you can call clone()
to explicitly
create a duplicated object. This is different from copy because there has to be an explicit call.
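The following sketch puts the two side by side. The types Point and Label and the two consume functions are hypothetical, and #[derive(...)] is a shorthand, not yet covered here, that asks the compiler to generate the trait implementations.

```rust
// A hypothetical small type: both fields are `Copy`, so the type can be `Copy` too.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// A hypothetical type that owns heap memory through `String`, so it is `Clone` only.
#[derive(Clone)]
struct Label {
    text: String,
}

fn consume_point(p: Point) -> i32 {
    p.x + p.y
}

fn consume_label(l: Label) -> usize {
    l.text.len()
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let sum = consume_point(p); // `p` is copied implicitly.
    println!("sum is {}, p.x is still {}", sum, p.x);

    let l = Label { text: String::from("abc") };
    let len = consume_label(l.clone()); // An explicit duplicate moves in; `l` stays valid.
    println!("len is {}, l.text is still {}", len, l.text);
}
```

Passing l without clone() would move it, and the last println!() would no longer compile.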
Borrow
Rust provides another alternative to move, which is called a borrow. This uses &
to represent
that a variable is borrowing a value from another variable.
fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
}
When you pass a variable to a function to borrow it instead of moving it, there are two things you
need to do. First, you need to pass a variable and add &
, and second, you need to use &
in your
function definition as part of the type for each borrow parameter. Similar to C/C++, &
is called a
reference, but in Rust, it's better to think of it as a borrow rather than a pointer.
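Borrowing is not limited to String. As a further sketch, the hypothetical function total_len() below borrows a whole Vec of strings, reads from it, and leaves ownership with the caller.

```rust
// Borrows a vector of strings and returns their combined length in bytes.
fn total_len(words: &Vec<String>) -> usize {
    let mut total = 0;
    for w in words {
        total += w.len();
    }
    total
}

fn main() {
    let words = vec![String::from("ab"), String::from("cde")];
    let n = total_len(&words); // `words` is only borrowed here.
    println!("{} words, {} bytes in total", words.len(), n); // Still usable.
}
```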
Mutable Reference
One caveat for borrowing is that it is read-only.
fn main() {
    let str = String::from("a");
    print_str(&str); // `&` is used to represent a borrow.
    println!("Can still access str: {}", str);
}

fn print_str(s: &String) { // `&` is used along with the type.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This throws an error since `s` is read-only.
    println!("The new string is {}", s);
}
Again, this is quite restrictive since you cannot modify the value coming in as an argument. Thus, Rust provides a mutable borrow.
fn main() {
    let mut str = String::from("a"); // `mut` is used.
    print_str(&mut str); // `&mut` is used.
    println!("Can still access str: {}", str);
}

fn print_str(s: &mut String) { // `&mut` is used.
    println!("The string is {}", s);
    s.push_str("_added_more"); // This works now.
    println!("The new string is {}", s);
}
There are three different things here. First, when defining str
, we use mut
to represent that
str
has a mutable value. Second, when passing str
to print_str()
, we use &mut
to represent
that it is a mutable borrow, i.e., we are saying that print_str()
not only borrows the value but
also modifies the value. Third, in the parameter definition of s
in print_str()
, we use &mut
to represent that print_str()
modifies the value it is borrowing.
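The same three steps apply to any type. The sketch below uses a hypothetical function increment() on an i32; note that for a plain number you need * to reach the value behind the mutable reference.

```rust
// Mutably borrows a counter and increments it in place.
fn increment(counter: &mut i32) {
    *counter += 1; // `*` reaches the value behind the reference.
}

fn main() {
    let mut count = 0; // `mut` on the variable.
    increment(&mut count); // `&mut` at the call site.
    increment(&mut count); // A second mutable borrow is fine once the first has ended.
    println!("count is {}", count);
}
```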
The Borrow Checker and the Aliasing XOR Mutability Principle
Mutable borrowing gives us flexibility of being able to modify a borrowed value within a function. However, it has a risk of data races. If you need a refresher on data races, please read the Rust book on mutable references, which explains the data race problem. In a nutshell, if two references have mutable access to the same memory location, then one can modify the value without the other knowing. Data races are known to be difficult to track down and fix.
Rust safeguards its programs from experiencing this problem by employing a principle commonly known as aliasing XOR mutability. It means that you get either aliasing or mutability, but not both. As mentioned earlier, aliasing means having two or more references to the same (heap) memory location. Mutability means having the ability to modify the value at a memory location. Thus, aliasing XOR mutability means that, at any given point, you have either exactly one mutable reference (a variable defined with &mut) or any number of immutable references (variables defined with just &), but not both. The following illustrates the principle.
fn main() {
    let a = String::from("a");
    let b = &a;
    let c = &a; // So far we have two additional references to `a`.
                // This is aliasing, which is fine, as long as
                // those references don't have mutability.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &a;
    let c = &mut a; // This is a problem because `b` is an alias,
                    // and `c` has mutability. This is
                    // both aliasing and mutability, not XOR.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}

fn main() {
    let mut a = String::from("a");
    let b = &mut a;
    let c = &mut a; // This does not work either,
                    // because both `b` and `c` are mutable aliases.
                    // I.e., both aliasing and mutability, not XOR.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}
The Rust compiler has a component called the borrow checker that enforces the aliasing XOR mutability principle at compile time. Borrow checking often gives beginners a hard time, and people say they're "fighting the borrow checker" because the Rust compiler keeps rejecting a program due to borrow checking rules. Thus, it is important to understand how exactly borrow checking works. Practice is a must here, and also make sure you read the Rust book on borrow checking.
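One detail that helps when reading borrow checker errors is that a borrow lasts until the last place the reference is used, not until the end of the enclosing block. The sketch below, with a hypothetical function read_then_modify(), compiles because the two immutable references are no longer used by the time the mutable one is created.

```rust
// Reads through two immutable borrows, then modifies through one mutable borrow.
fn read_then_modify(mut s: String) -> String {
    let b = &s;
    let c = &s;
    let combined_len = b.len() + c.len(); // Last use of `b` and `c`.

    let d = &mut s; // Allowed: no immutable borrow is used after this point.
    d.push_str("_modified");

    println!("combined length before the change: {}", combined_len);
    s
}

fn main() {
    println!("{}", read_then_modify(String::from("a")));
}
```

If you moved a use of b or c below the line that creates d, the compiler would reject the program for the same reason it rejects the second example above.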
Option
Another important safety aspect of Rust is its approach to handling variables with no values. If you have experience with programming, you probably know already that there are many cases where a variable does not have a meaningful value. In those cases, values like null or just plain 0 are used to represent that a variable doesn't have a meaningful value. However, this has led to numerous bugs and vulnerabilities since programmers often forget to handle null or 0 and get a runtime error, e.g., a null pointer exception.
In Rust, there is no null
value that you can use. Instead, the standard Rust library provides an
alternative called Option
. Rust programmers use Option
heavily and you can find it everywhere,
e.g., the standard library, external crates, etc. Thus, it is absolutely critical to understand what
Option
is and how to use it.
Option
is defined as follows.
enum Option<T> {
    None,
    Some(T),
}
The definition of Option uses enum, which is something we have not discussed yet. It is similar to enumeration types in other languages like C/C++ or Java. An enum defines a custom type and lists all possible values that a variable of that type can have. For example, the following code defines an enum type called Ex and it has two possibilities.
enum Ex {
    FirstPossibility,
    SecondPossibility,
}
These possibilities are called enum variants, and when using a variant from an enum, you need to use ::.
enum Ex {
    FirstPossibility,
    SecondPossibility,
}

fn main() {
    let a: Ex = Ex::FirstPossibility;
}
You can find more details in the Rust book's section on enum.
If you look at the Option
definition, it defines an enum
that has two variants, one is
Option::None
used when a variable does not have a meaningful value, and the other is
Option::Some
used when a variable does have a meaningful value.
Option::Some has a couple of additional details to discuss. First is the use of T found in Option<T> and Some(T). This T is called a generic type parameter (and it does not have to be the letter T). If you know the support for generics in other languages like C/C++ or Java, you can probably understand what it is quickly. T is a variable that can take a type instead of a value. What this means is that instead of defining Option for every single type there is, e.g., an Option for i32, an Option for i64, etc., we can define it once using a generic type variable and instantiate an Option for any type. In the Option definition above, a generic type variable T is used in Option<T> to declare that the enum Option is defined for all types.
The second detail is the definition Some(T). This declares that Some is a variant that should take a value of the type T. This is different from FirstPossibility or None in the above examples because it is a variant that expects a value of a certain type.
The following example demonstrates all of these.
fn main() { let option_some_for_i32: Option<i32> = Some(1); let option_none_for_i32: Option<i32> = None; let option_some_for_string: Option<String> = Some(String::from("str")); }
You can find more details in the Rust book's section on generic data types.
By declaring a variable with Option
, you are explicitly saying that a variable may or may not have
a meaningful value and more importantly, you are forcing yourself to deal with both cases in your
code.
In the above example, you might have noticed that Some
and None
are not used with Option::
,
i.e., not as Option::Some
or Option::None
but as Some
and None
. This is because Rust
automatically imports the definitions so you can use them without having the Option::
qualifier.
This is called the prelude, i.e., things that
every Rust program automatically imports by default.
There are a lot of details that we do not discuss here regarding Option
and enum
. Make sure you
read the Rust book on enum
and pattern
matching as well as on generic data
types.
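To see how Option forces both cases to be handled, consider the following sketch. The functions first_even() and describe() are hypothetical; match is used to take the Option apart, and format!() builds a String the same way println!() prints one.

```rust
// Returns the first even number, or `None` when there is none.
fn first_even(numbers: &Vec<i32>) -> Option<i32> {
    for n in numbers {
        if n % 2 == 0 {
            return Some(*n);
        }
    }
    None
}

// Handles both variants; leaving one out would be a compile error.
fn describe(result: Option<i32>) -> String {
    match result {
        Some(n) => format!("found {}", n),
        None => String::from("found nothing"),
    }
}

fn main() {
    println!("{}", describe(first_even(&vec![1, 3, 4, 6])));
    println!("{}", describe(first_even(&vec![1, 3])));
}
```

A caller cannot use the i32 inside the Option without first deciding what to do when it is None, which is exactly the safety property described above.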
Result
The last safety aspect of Rust to highlight is its approach to error handling. Some languages use values to represent an error condition, e.g., null or a negative integer such as -1. Other languages use an error reporting mechanism that is outside of regular return paths, e.g., throw and try-catch in Java. Rust unifies these two approaches and uses an enum called Result to return a value or report an error. Similar to Option, Rust programmers heavily use Result and you can find it everywhere. Thus, it is also critical to understand what Result is and how to use it.
The definition looks like the following.
enum Result<T, E> {
    Ok(T),
    Err(E),
}
The first variant Result::Ok
represents a success with a value. The second variant Result::Err
represents an error with an error value. Thus, Result
is typically used as a return value type.
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}
A common way to work with Result (as well as Option) is using match, which we have discussed earlier. The following example also demonstrates the power of match for pattern matching that was briefly mentioned earlier.
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }

    let result = function_with_result(false);
    match result {
        Ok(success_result) => println!("Success: {}", success_result),
        Err(error_result) => println!("Error: {}", error_result),
    }
}
As we can see, match not only recognizes that result is either Ok() or Err() but also assigns the value of Ok() (or Err()) to success_result (or error_result).
Another common way is to use if let, which is similar to match.
fn function_with_result(success_or_fail: bool) -> Result<String, String> {
    match success_or_fail {
        true => Ok(String::from("success")),
        false => Err(String::from("fail")),
    }
}

fn main() {
    let result = function_with_result(true);
    if let Ok(success_result) = result {
        println!("Success: {}", success_result);
    } else {
        println!("Error");
    }

    let result = function_with_result(false);
    if let Err(error_result) = result {
        println!("Error: {}", error_result);
    } else {
        println!("Success");
    }
}
if let attempts to perform a pattern match and if it is successful, it executes the if let block. Otherwise it executes the else block.
You can find more details on if let and pattern matching in the Rust book.
Similar to Option
, by declaring a return type as Result
, you are forcing yourself to handle both
the success case and the error case. There are a lot of details about Result
and error handling
that we do not cover here, so please make sure you read the Rust book on error
handling.
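As a slightly more realistic sketch, the function below combines a Result from the standard library with an Option. The function name parse_and_double() and the error messages are assumptions made for illustration; parse() and checked_mul() are standard library methods that return a Result and an Option, respectively.

```rust
// Parses text as an integer and doubles it, reporting any failure as a message.
fn parse_and_double(text: &str) -> Result<i32, String> {
    match text.trim().parse::<i32>() {
        Ok(n) => match n.checked_mul(2) {
            Some(doubled) => Ok(doubled),
            None => Err(format!("doubling {} overflows", n)),
        },
        Err(e) => Err(format!("cannot parse '{}': {}", text, e)),
    }
}

fn main() {
    for input in ["21", "abc", "2000000000"] {
        match parse_and_double(input) {
            Ok(value) => println!("Success: {}", value),
            Err(message) => println!("Error: {}", message),
        }
    }
}
```

Neither failure can be ignored silently: the caller receives a Result and has to match on it before it can reach the number.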
Expressive Power and Unsafe Rust
Rust is a low-level language, meaning you can do mostly anything that other languages allow you to do. For example, you can allocate and deallocate memory by yourself as mentioned earlier. You can also do other things that low-level languages such as C or C++ allow you to do, while higher-level languages such as Java or Python do not. However, Rust recognizes that it is not always desirable or safe to allow potentially bug- or vulnerability-inducing operations. Thus, Rust tries to strike a balance between what is "safe" and what is "unsafe" and distinguishes what is commonly called unsafe Rust from (what is commonly called) safe Rust.
Unsafe Rust exists because of the tension between expressiveness and safety that Rust faces as a language. Using safe Rust, especially when you write low-level code such as shell commands, system libraries, or a kernel, you might encounter cases where you have a hard time expressing, or cannot express at all, what you want to express.
Example
The most famous example is linked lists. There is actually a whole online book about writing lists in Rust. It is not our goal to look at all the details, but let's take a look at an example to make our discussion a little more concrete.
In the example for linked lists below, we use struct, which is similar to the one in C/C++. It defines a custom type with a list of members. You can read the Rust book on struct to learn about the details. Below is a simple example for a struct definition and initialization.
struct Ex {
    member1: i32,
    member2: String,
}

fn main() {
    let struct_var: Ex = Ex {
        member1: 1,
        member2: String::from("member2"),
    };
    println!("Member1: {}", struct_var.member1);
    println!("Member2: {}", struct_var.member2);
}
The example here illustrates how a circular linked list, which is not difficult to express in other languages, does not translate easily to Rust.
struct Node {
    next: Option<Box<Node>>, // The definition of `Box` is actually
                             // `Box<T>` with a generic type parameter `T`.
}

fn main() {
    let mut tail = Box::new(Node {
        next: None,
    }); // A tail node that doesn't have the next node for now.
    let head = Box::new(Node {
        next: Some(tail),
    }); // A head node that has the tail node as the next node.
    tail.next = Some(head); // An attempt to have the tail node
                            // point back to the head node
}
The problem here is ownership. First, we assign a Box
to tail
. We then assign Some(tail)
to
head.next
, so head
becomes the owner of the Box
at that point. This means that tail
no
longer has access to the Box
. But then we try assigning Some(head)
to tail.next
, meaning we
try to access the original Box
that tail
no longer can access.
This is one example that shows how safe Rust trades off expressive power for safety. In other words, safe Rust sometimes sacrifices expressive power in order to provide better safety. In addition to linked lists, there are many other examples where safe Rust limits the expressive power.
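For contrast, a list that only points forward fits the ownership model, because each node is owned by exactly one other node (or by a variable). The following is a minimal sketch: the value field and the functions build_list() and sum() are hypothetical additions, and while let is the loop form of if let.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

// Builds a two-node list; `head` owns `tail`, and nothing points backward.
fn build_list() -> Box<Node> {
    let tail = Box::new(Node { value: 2, next: None });
    let head = Box::new(Node { value: 1, next: Some(tail) });
    head
}

// Walks the list through borrows only, so no ownership moves.
fn sum(head: &Node) -> i32 {
    let mut total = head.value;
    let mut current = &head.next;
    while let Some(node) = current {
        total += node.value;
        current = &node.next;
    }
    total
}

fn main() {
    let head = build_list();
    println!("sum is {}", sum(&head));
} // Dropping `head` also drops `tail`, because `head` owns it.
```

The circular version fails precisely because its last assignment would require the tail node to have two owners at once.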
The unsafe Keyword
As mentioned earlier, Rust is a low-level language and you can do mostly anything that other
languages allow you to do. However, as we have just seen, Rust limits its expressive power to
provide better safety. Obviously, these two things are in conflict with each other, and Rust deals
with it by distinguishing what is considered safe and unsafe via the unsafe
keyword.
You can use unsafe
in order to do certain things that Rust by default does not allow you to do.
Let's first look at how to use unsafe
and then look at what you can do with unsafe
.
unsafe Blocks, Functions, and Traits
You can use unsafe in three ways.
The first way is to define an unsafe
block. The code below does not actually need unsafe
. It is
only for demonstration purposes.
fn main() { unsafe { let a = 1; println!("a is {}", a); } }
Another way is to define an unsafe
function. When you want to invoke an unsafe
function, you can
only do it within an unsafe
block.
unsafe fn unsafe_fn(a: i32) { println!("a is {}", a); } fn main() { unsafe { unsafe_fn(1); } }
The third way is to define an unsafe
trait. However, since we have not discussed traits yet, we
will discuss the use of unsafe
for traits later when we discuss traits.
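A common pattern that combines the first two ways is to keep the unsafe part small and wrap it in a safe function that checks the required condition first. The sketch below assumes two hypothetical functions, get_without_check() and get_or_zero(); get_unchecked() is a standard library method that skips the bounds check and is therefore unsafe to call.

```rust
// An unsafe function: the caller must guarantee that `index < values.len()`.
unsafe fn get_without_check(values: &Vec<i32>, index: usize) -> i32 {
    unsafe { *values.get_unchecked(index) }
}

// A safe wrapper: it performs the check, so its callers do not need `unsafe`.
fn get_or_zero(values: &Vec<i32>, index: usize) -> i32 {
    if index < values.len() {
        // The check above upholds the requirement of `get_without_check()`.
        unsafe { get_without_check(values, index) }
    } else {
        0
    }
}

fn main() {
    let values = vec![10, 20, 30];
    println!("index 1: {}", get_or_zero(&values, 1));
    println!("index 3: {}", get_or_zero(&values, 3));
}
```

The unsafe keyword marks the places where the programmer, rather than the compiler, is responsible for correctness, so keeping those places few and small makes them easier to review.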
unsafe Capabilities
The Rust book has a section on unsafe Rust that overviews what you can do with unsafe. There is also a separate book called The Rustonomicon that is dedicated to unsafe Rust. These resources discuss all the capabilities of unsafe as well as important nuances that you need to know when you use unsafe.
Among all the things that unsafe
allows you to do, the use of raw pointers is perhaps the most
common case. Raw pointers are similar to the pointers in C/C++ and there are two types---one
type is immutable raw pointers defined as *const T
and the other type is mutable raw pointers
defined as *mut T
. You can still create raw pointers without using unsafe
but when you
dereference a raw pointer, you can only do it inside unsafe
. The following are two examples.
fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;
    println!("a through raw_ptr is {}", *raw_ptr); // This does not work.
}

fn main() {
    let a = 1;
    let raw_ptr: *const i32 = &a;
    unsafe {
        println!("a through raw_ptr is {}", *raw_ptr); // This does work.
    }
}
Unlike references, raw pointers lack safety guarantees that Rust provides. Most notably, Rust does not check the aliasing XOR mutability rule for raw pointers. This means that you can have any number and combinations of mutable and immutable raw pointers, and the Rust compiler will not complain.
fn main() { let mut a = 1; let immutable_raw_ptr: *const i32 = &a; let mutable_raw_ptr: *mut i32 = &mut a; unsafe { println!("a through immutable_raw_ptr is {}", *immutable_raw_ptr); println!("a through mutable_raw_ptr is {}", *mutable_raw_ptr); *mutable_raw_ptr = 2; println!("*immutable_raw_ptr now is {}", *immutable_raw_ptr); println!("*mutable_raw_ptr now is {}", *mutable_raw_ptr); } }
Another notable aspect about raw pointers is that Rust does not deallocate memory automatically for raw pointers. The following is an example that shows manual allocation and deallocation (modified from this page).
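fn main() {
    unsafe {
        // Describe the memory to allocate: room for a single `u8`,
        // matching the `*mut u8` pointer used below.
        let layout = std::alloc::Layout::new::<u8>();
        let ptr: *mut u8 = std::alloc::alloc(layout);
        if ptr.is_null() {
            // Allocation failed; abort instead of writing through a null pointer.
            std::alloc::handle_alloc_error(layout);
        }
        *ptr = 42;
        println!("*ptr is {}", *ptr);
        std::alloc::dealloc(ptr, layout); // Manual deallocation.
    }
}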
In this example, Rust does not automatically deallocate what ptr
points to. It needs to be done
manually.
Although unsafe gives you more expressive power and you can do low-level operations such as pointer manipulations, its use is generally discouraged since it escapes the safety net provided by the Rust compiler. Thus, it is critical to have a clear understanding of what it does. As mentioned earlier, Rust already provides great resources (the unsafe section from the Rust book and the Rustonomicon). You are highly encouraged to read these before you start using unsafe.
More on struct and trait
There are a couple of things that we already used without explaining in the previous chapters, so let's tie up some loose ends.
More on struct
You might remember how we created a new Box
or a new String
in some of the earlier examples.
fn main() { let s = String::from("a"); let b = Box::new(1); println!("This is a String: {}", s); println!("This is a Box: {}", b); }
`from()` and `new()` are called associated functions, and they are associated with `struct String` and `struct Box`, respectively. The syntax for defining and calling an associated function is as follows.
struct Ex { field1: i32, field2: bool, } impl Ex { fn associated_fn(x: i32, b: bool) -> Ex { println!("Creating a new Ex with {} and {}", x, b); Ex { field1: x, field2: b } } } fn main() { let ex = Ex::associated_fn(1, true); }
There is another kind of function that you can define for a `struct`, called a method. The difference between an associated function and a method is that a method takes `self` as its first parameter, which refers to a `struct` instance. This is similar to methods in a Python class.
The following example extends the above example and includes methods.
struct Ex { field1: i32, field2: bool, } impl Ex { fn associated_fn(x: i32, b: bool) -> Ex { println!("Creating a new Ex with {} and {}", x, b); Ex { field1: x, field2: b } } fn print_field1(self) { println!("field1 is {}", self.field1); } } fn main() { let ex = Ex::associated_fn(1, true); ex.print_field1(); // Invoking a method with a `.` // `self` is automatically passed. }
The parameter `self` adheres to the same ownership and borrow checker rules as any other parameter. Thus, when you call `ex.print_field1()`, `ex` moves into `print_field1()` since `self` is passed by value. This means that the next example does not work.
struct Ex { field1: i32, field2: bool, } impl Ex { fn associated_fn(x: i32, b: bool) -> Ex { println!("Creating a new Ex with {} and {}", x, b); Ex { field1: x, field2: b } } fn print_field1(self) { println!("field1 is {}", self.field1); } fn print_field2(self) { println!("field2 is {}", self.field2); } } fn main() { let ex = Ex::associated_fn(1, true); ex.print_field1(); ex.print_field2(); }
As the compiler says, in the first call (`ex.print_field1()`), `ex` moves into `print_field1()`. Thus, the second call (`ex.print_field2()`) cannot use `ex` anymore. However, you can borrow `self`, just like any other variable or parameter.
struct Ex { field1: i32, field2: bool, } impl Ex { fn associated_fn(x: i32, b: bool) -> Ex { println!("Creating a new Ex with {} and {}", x, b); Ex { field1: x, field2: b } } fn print_field1(&self) { // Immutable borrow println!("field1 is {}", self.field1); } fn print_field2(&mut self) { // Mutable borrow println!("field2 is {}", self.field2); } } fn main() { let mut ex = Ex::associated_fn(1, true); ex.print_field1(); ex.print_field2(); }
Oftentimes, you use associated functions for initialization. You use methods for instance-specific operations.
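The split between initialization and instance-specific operations can be sketched as follows. `Counter` and its functions are hypothetical names used only for illustration; the example assumes nothing beyond the standard language features described above.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: takes no `self` and is used for initialization.
    fn new() -> Counter {
        Counter { count: 0 }
    }

    // Method with a mutable borrow: changes the instance in place.
    fn increment(&mut self) {
        self.count += 1;
    }

    // Method with an immutable borrow: only reads the instance.
    fn get(&self) -> u32 {
        self.count
    }
}

fn main() {
    let mut counter = Counter::new(); // Called with `::` on the type.
    counter.increment();              // Called with `.` on the instance.
    counter.increment();
    println!("count is {}", counter.get());
}
```

Because both methods borrow `self` instead of taking it by value, `counter` remains usable after each call.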
trait
Earlier, we mentioned that if a type implements a Copy
trait, Rust does not move ownership but
copies the value directly. We also mentioned that unsafe
can be used to define an unsafe
trait.
A trait is similar to an interface in other languages and is used to define shared behavior across different types. It typically declares functions, and a type then needs to implement those functions. For example, the following code defines a trait called `TraitEx` with a single function to implement, `shared_behavior()`.
#![allow(unused)] fn main() { trait TraitEx { fn shared_behavior(&self) -> String; } }
You can implement a trait
for your struct
as follows.
#![allow(unused)] fn main() { trait TraitEx { fn shared_behavior(&self) -> String; } struct StructEx; impl TraitEx for StructEx { fn shared_behavior(&self) -> String { String::from("string") } } }
Traits are heavily used in Rust, and you will frequently encounter things like the following that might look confusing (the examples below are taken from the Rust book).
#![allow(unused)] fn main() { fn notify<T: Summary>(item: &T) {} fn notify(item: &impl Summary) {} }
The above two are actually the same definition. What they mean is that the type of the parameter
item
can be a borrow of any type (hence the use of the generic parameter type T
) that implements
the Summary
trait. This is called a trait bound, meaning that we are binding a parameter type to
a trait. In fact, we can have multiple trait bounds for a parameter.
#![allow(unused)] fn main() { fn notify<T: Summary + Display>(item: &T) {} }
The above defines the parameter item
to have a type that implements two traits, Summary
and
Display
. If we want to use many trait bounds, we can use where
as follows.
#![allow(unused)] fn main() { fn some_function<T, U>(t: &T, u: &U) -> i32 where T: Display + Clone, U: Clone + Debug {} }
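The signatures above have empty bodies, so here is a minimal runnable sketch of the same ideas. The `Summary` trait below is a simplified stand-in for the one in the Rust book, and `Article`, `notify`, and `describe` are hypothetical names used only for illustration.

```rust
use std::fmt::{Debug, Display};

// Simplified stand-in for the `Summary` trait from the Rust book.
trait Summary {
    fn summarize(&self) -> String;
}

struct Article {
    title: String,
}

impl Summary for Article {
    fn summarize(&self) -> String {
        format!("Article: {}", self.title)
    }
}

// A trait bound in generic-parameter form: `T` must implement `Summary`.
fn notify<T: Summary>(item: &T) -> String {
    format!("Breaking news! {}", item.summarize())
}

// Multiple trait bounds expressed with a `where` clause.
fn describe<T, U>(t: &T, u: &U) -> String
where
    T: Display + Clone,
    U: Clone + Debug,
{
    format!("{} and {:?}", t, u)
}

fn main() {
    let article = Article { title: String::from("Rust") };
    println!("{}", notify(&article));
    println!("{}", describe(&1, &vec![2, 3]));
}
```

Passing a type that does not implement `Summary` to `notify()` is rejected at compile time, which is the point of the bound.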
Week 2: Rust (Continued)
There are a few remaining topics to highlight in Rust, which you will see and use frequently even as a beginner.
Lifetimes
In Safety Features of Rust, we looked at one borrow checker
rule that Rust enforces, namely aliasing XOR mutability. There is in fact another borrow checker
rule that Rust enforces, which is that a referent should outlive its references. We looked at this
at play with move
before.
fn main() { let a = String::from("a"); print_str(a); // `a` moves into the function `print_str()`. println!("a is {}", a); // This throws an error, // because `a` can no longer access the string. } fn print_str(x: String) { println!("String {}", x); }
The above example is the same example we saw before in Safety Features of Rust. Here, `a` refers to a `String` (the referent) that contains `"a"`. This string gets deallocated when `print_str` returns. If the move did not occur, `a` would outlive the referent, and using it afterwards would cause a use-after-free problem.
In general, in order to make sure that a referent does outlive its reference, Rust needs to know how long a referent would live for (i.e., not be deallocated) and how long a reference would live (i.e., be pointing to a valid object in memory). The problem is that this is sometimes difficult to infer and Rust asks programmers to tell the compiler using a concept called lifetimes. Below, we will first look at functions and how lifetimes are important for functions. Later, we will look at other uses of lifetimes.
The Problem with References in Functions
Lifetimes are frequently used in functions, and most of the time Rust is able to take care of lifetimes automatically so that programmers do not need to worry about them. However, that is not always the case. Let's look at the following example to understand this further.
The example below uses
&str
which is called a string slice. You can read more about it in the Rust book.
fn ref_return(x: &str) -> &str { // Assume that this function does some complicated things // and returns a string slice. } fn main() { let s = String::from("string"); let res = ref_return(&s); // Assume that the rest of the code does things with `res`. }
In the above code, res
is a reference to the string slice returned from ref_return()
(the
referent). Thus, Rust needs to check whether or not the referent outlives the reference. Determining
how long the reference (res
) would live is easy---it's until the end of the main()
function.
However, it is not easy to determine how long the referent that res
points to would live---it's
coming from ref_return()
and generally speaking, unless you execute the function, you won't know
what it's going to return and which memory location res
will point to. Thus, Rust is unable to
check if the referent would outlive the reference.
Lifetimes in Functions
Due to the above problem, if a function returns a reference, Rust asks programmers to tell the Rust compiler how long the reference would live. This is called the reference's lifetime. However, there is an important thing to keep in mind. Rust does not ask programmers to specify the lifetime of a reference. Instead, Rust only asks programmers to represent the relationship of the lifetimes of input parameters and returned references. It would be very difficult, if not impossible, for a programmer to determine how long a reference would be valid and Rust does not ask for it.
Let's look at a more concrete example to understand what this means. The example below is from the Rust book.
#![allow(unused)] fn main() { fn longest(x: &str, y: &str) -> &str { if x.len() > y.len() { x } else { y } } }
If you run the code, the compiler will complain that the return type is missing a lifetime parameter. The error message will tell you what lifetime parameters look like and what to do. A lifetime parameter looks like `'a`: an apostrophe (`'`) followed by the name of the parameter. It represents the duration for which a reference is valid, and it comes right after `&`.
Using a lifetime parameter, Rust expects programmers to tell its compiler how long a reference would be valid for. As mentioned earlier, Rust does not expect programmers to manually figure out how long a reference would be valid for. All Rust expects is how long a reference would be valid for in relation to input parameters. Let's take a look at the following example.
#![allow(unused)] fn main() { fn longest<'a>(x: &'a str, y: &'a str) -> &'a str { if x.len() > y.len() { x } else { y } } }
There are a few things to note in the above code. First, <'a>
is using a syntax similar to the one
we saw for generics in More on Struct and Traits. It means that
'a
is a generic lifetime parameter, i.e., it represents any lifetime, not a particular lifetime.
We then use 'a
for the input parameters and the return type to represent that they all have the
same lifetime. This is basically showing the relationship between the input parameters' lifetimes
and the return value's lifetime. As mentioned earlier, Rust doesn't ask programmers to specify a
lifetime. It only asks programmers to show what the relationships are. For functions, you always
need to show what the relationship is between input parameters' lifetimes and the return value's
lifetime. Since the return value is either x
or y
, telling the Rust compiler that the input
parameters and the return value have the same lifetime is exactly what we want to represent.
If we called this function from another function, Rust would be able to see that the return value would be valid as long as input parameters are valid. Using this information, Rust would be able to check if a referent would outlive its references.
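A minimal sketch of such a caller is shown below, reusing `longest()` from above. The variable names `outer` and `inner` are hypothetical.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

fn main() {
    let outer = String::from("a long string");
    {
        let inner = String::from("short");
        // The returned reference is treated as valid only while both
        // `outer` and `inner` are valid, i.e., until the end of this block.
        let result = longest(outer.as_str(), inner.as_str());
        println!("The longest string is {}", result);
    }
    // Storing `result` in a variable declared outside the block and using it
    // here would be rejected, because `inner` has already been dropped.
}
```

The compiler does not need to know which of the two arguments is returned at run time; the shared lifetime `'a` is enough for the check.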
'static
There is a special lifetime called the static lifetime, meaning that the reference can live
"during the entire duration of the
program". The best
example is a string literal that is embedded in a program's binary. Since it is always accessible,
the lifetime of a string literal is always 'static
.
#![allow(unused)] fn main() { fn string_literal() -> &'static str { let literal = "a string literal"; // This string is embedded in the program binary. literal } }
Oftentimes, the Rust compiler's error messages suggest using `'static`, and it is an easy way to satisfy the compiler and get your code to compile. However, `'static` is often not what you should use. Thus, it is important to think hard about why `'static` is appropriate for you before using it.
The Implication of Lifetimes in Functions
As the above example shows, when you return a reference, you have to tell the Rust compiler how the lifetime of the reference is related to the lifetimes of the input parameters. This has an interesting implication: apart from `'static` data, you can only return a reference that is derived from the input arguments. For example, suppose you have a custom `struct` and you create an instance of it within a function. You cannot return a reference to the newly created instance of the `struct`.
There are actually two reasons why you cannot return a reference to a newly-created instance of a
struct
. One is that you cannot represent a lifetime of the reference, and the other is that the instance gets dropped at the end of the function.
When you feel the need to create a new object and return a reference for it, instead of returning a
reference directly, you need to use a data structure that transfers ownership, e.g., Box
, Vec
,
or String
.
#![allow(unused)] fn main() { fn box_creation_and_return() -> Box<i32> { let a: i32 = 1; return Box::new(a); } }
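The same idea applies to a custom `struct`: return the value itself rather than a reference to it. The sketch below uses hypothetical names (`Config`, `make_config`, `config_name`) purely for illustration.

```rust
struct Config {
    name: String,
    retries: u32,
}

// Returning `&Config` here would not compile: the instance is created inside
// the function and dropped when the function returns. Returning the value
// itself moves ownership to the caller instead.
fn make_config(name: &str) -> Config {
    Config { name: String::from(name), retries: 3 }
}

// A reference derived from an input parameter can be returned, because its
// lifetime is tied to that of the input.
fn config_name(config: &Config) -> &str {
    &config.name
}

fn main() {
    let config = make_config("demo");
    println!("{} with {} retries", config_name(&config), config.retries);
}
```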
When Do We Need Lifetime Parameters?
If you read the Rust book or the above description of lifetimes, you may get the impression that lifetimes are optional. This is actually not true; lifetimes are mandatory for all references. The reason why you don't see lifetime annotations all the time is that the Rust compiler is often smart enough to do the work for you. When it is not, you have to write them yourself. For example, if you use a reference in a `struct` or `enum`, you have to annotate its lifetime.
#![allow(unused)] fn main() { struct ProblematicStruct { str_slice: &str, } }
The above does not work because it has a reference as a field and there is no lifetime. The following fixes it.
#![allow(unused)] fn main() { struct ProblematicStruct<'a> { str_slice: &'a str, } }
As mentioned earlier, <'a>
means that you're using a generic lifetime parameter ('a
) in the
definition of the struct
. You can then use it for the reference field. However, within a
function, Rust is quite often able to do the work for you. This is called lifetime
elision.
There is a simple algorithm that the Rust compiler uses to determine what lifetimes should be, and if the algorithm cannot determine lifetimes, the compiler throws an error. Then you need to manually annotate lifetimes. The algorithm works as follows.
- First, the algorithm assigns a lifetime parameter for each input parameter. For example, if `fn ex_func(x: &str, y: &str) -> &str {...}` is the function, then the algorithm assigns `'a` to `x` and `'b` to `y` like this: `fn ex_func(x: &'a str, y: &'b str) -> &str {...}`.
- Second, if there is exactly one input lifetime (i.e., one parameter), then that lifetime is assigned to all output references. For example, if the function is `fn ex_func(x: &str) -> &str {...}`, then the algorithm assigns `'a` to both the input parameter and the return reference like this: `fn ex_func(x: &'a str) -> &'a str {...}`.
- Third, for methods with `&self` or `&mut self`, all output references get the same lifetime as `self`.
In the above `longest()` example, since there are two input parameters, the algorithm assigns `'a` to the first parameter and `'b` to the second parameter. The problem is that there is nothing else the algorithm can do. The second rule doesn't apply because there is more than one parameter. The third rule doesn't apply either because there is no `self` reference. Thus, the algorithm fails to determine the lifetime of the return value and asks for manual annotation.
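To make the second and third rules concrete, the sketch below shows functions where elision succeeds, next to the explicit form that the compiler infers. All names (`first_word`, `first_word_explicit`, `Holder`, `text_or`) are hypothetical.

```rust
// Elided form: the second rule applies because there is exactly one input lifetime.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same function with the inferred lifetime written out explicitly.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

struct Holder {
    text: String,
}

impl Holder {
    // The third rule applies: the returned reference gets the lifetime of
    // `&self`, even though there is a second reference parameter.
    fn text_or(&self, _fallback: &str) -> &str {
        &self.text
    }
}

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));
    let holder = Holder { text: String::from("kept") };
    println!("{}", holder.text_or("unused"));
}
```

Note that under the third rule, returning `_fallback` from `text_or()` would not compile without explicit annotations, because the return type is tied to `self`.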
?, dyn, and Macros
There are a few last things to highlight. We want to look at these because they are frequently used in the standard library and external crates.
?
?
is a convenient operator that you might frequently use. It is a shortcut for handling Result
and
Option
.
Recall from Safety Features of Rust that you typically use a `Result` to handle successful return values and errors together. A `Result` wraps a return value or an error inside `Ok()` or `Err()`, and you can use `match` to get the return value or the error.
fn function_with_result(success_or_fail: bool) -> Result<String, String> { match success_or_fail { true => Ok(String::from("success")), false => Err(String::from("fail")), } } fn function_that_handles_result() -> Result<(), String> { let result = function_with_result(true); match result { Ok(success_result) => { println!("success_result: {}", success_result); Ok(()) }, Err(error_result) => Err(error_result) } } fn main() { function_that_handles_result(); }
This works well but it can be repetitive and tiring since you typically make many function calls and
need to handle a Result
. Thus, Rust provides ?
, an operator to handle a Result
easily. What it
does is that, if the Result
is Ok()
, it pulls out what's inside Ok()
, and if the Result
is
Err()
, it returns Err()
and exits from the entire function. Using ?
, the above code can be
revised as follows.
fn function_with_result(success_or_fail: bool) -> Result<String, String> { match success_or_fail { true => Ok(String::from("success")), false => Err(String::from("fail")), } } fn function_that_handles_result() -> Result<(), String> { let success_result = function_with_result(true)?; println!("success_result: {}", success_result); Ok(()) } fn main() { function_that_handles_result(); }
?
can also be used to handle Option
. If the value is None
, it exits the function early and
returns None
. If the value is Some()
, it pulls the value out of Some()
.
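A minimal sketch of `?` with `Option` follows. The function `second_word_initial` is a hypothetical helper; it relies only on standard iterator methods.

```rust
// Hypothetical helper: returns the first character of the second word, if any.
fn second_word_initial(text: &str) -> Option<char> {
    // `nth(1)` yields `None` when there is no second word; `?` returns it early.
    let second = text.split_whitespace().nth(1)?;
    // `chars().next()` yields `None` for an empty string; `?` propagates it too.
    let initial = second.chars().next()?;
    Some(initial)
}

fn main() {
    println!("{:?}", second_word_initial("hello world"));
    println!("{:?}", second_word_initial("hello"));
}
```

As with `Result`, `?` on an `Option` can only be used inside a function whose return type is compatible, here `Option<char>`.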
dyn
dyn
is a keyword that represents a trait
object. A trait object is an object of
any type that implements the trait. You can see the use of it in error propagation like the
following example from the Rust
book.
fn main() -> Result<(), Box<dyn std::error::Error>> { let f = std::fs::File::open("hello.txt")?; Ok(()) }
`Box<dyn Error>` means that it is a `Box` that contains an object of any type that implements the `Error` trait.
You may recall that in More on Struct and Traits, we talked about
generic type parameters with trait bounds. The purpose is almost exactly the same, i.e., it
represents any type that implements certain traits. dyn
is different in two ways.
- `dyn` can only take one trait. (There are some nuances here; see the Rust Reference on traits for details.)
- `dyn` is dynamic while a generic type parameter with a trait bound is static. This means that for a generic type with a trait bound, the compiler automatically generates code for each concrete type that is actually used. With `dyn`, the compiler instead inserts code that finds the right implementation at run time (which is called dynamic dispatch).
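The difference between the two can be sketched as follows. The trait `Shape` and the types `Square` and `Rect` are hypothetical and exist only for this illustration.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

// Static dispatch: the compiler generates one copy of this function for each
// concrete type `T` it is called with.
fn area_static<T: Shape>(shape: &T) -> f64 {
    shape.area()
}

// Dynamic dispatch: the right `area()` is looked up at run time, so a single
// collection can hold different types that implement `Shape`.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    println!("{}", area_static(&Square(2.0)));
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("{}", total_area(&shapes));
}
```

A `Vec<T>` with a trait bound could not mix `Square` and `Rect`, since `T` must be a single concrete type; this is a common reason to reach for `dyn`.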
Macros
Similar to C/C++, Rust provides macros. The first example you typically see is println!()
. !
indicates that it is a macro. A good resource to learn about macros is (again) the Rust book. It has
a section on macros.
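As a small taste of what a declarative macro looks like, here is a sketch using `macro_rules!`. The macro `max_of!` is a hypothetical example; it accepts a variable number of arguments, which a regular Rust function cannot do.

```rust
// Hypothetical macro: evaluates to the largest of its arguments.
macro_rules! max_of {
    // Base case: a single expression is its own maximum.
    ($x:expr) => { $x };
    // Recursive case: compare the first expression with the maximum of the rest.
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

fn main() {
    println!("{}", max_of!(3));
    println!("{}", max_of!(1, 7, 4));
}
```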
Useful Crates and Tools
Rust is not just a language, it is an ecosystem for development with various development tools and
crates. cargo
is one such example but there are many more tools as well as crates. Here we
highlight some of the popular ones.
Crates
A crate refers to a binary or a library in Rust. You may be familiar with it already since cargo
creates a binary or library crate in a package. There are many crates out there that are very useful
for development.
crates.io & lib.rs
crates.io is the place to find external crates that you can leverage in your code. Documentation for crates published on crates.io is automatically built and hosted on Docs.rs. lib.rs is an alternative to crates.io and provides similar functionality.
serde
`serde` is a de facto standard serialization library for Rust. It allows you to transform a data structure into a different format (e.g., JSON) and later restore it into the same data structure. `serde` is highly popular and convenient, and many Rust programs rely on it.
Tokio
Tokio provides an asynchronous runtime for Rust. Asynchrony here means that you make a call and return right away without waiting until the call finishes. Thus, for workloads that spend much of their time waiting (e.g., on network I/O), you can often get better performance than with synchronous counterparts. You could also use Async instead.
Logging Crates
Logging is necessary for any kind of development and Rust has good support for it. The most common
crate to use is the log
crate, which provides an interface. Since
the crate only provides an interface, you need to use another crate that provides an implementation,
e.g., env_logger
.
Command-Line Argument Parsing Crates
Argument parsing is often necessary for programs and Rust has convenient external crates that you can use. `clap` is one such crate and it is easy to use. `structopt` is another popular one, although its derive-based interface has since been integrated into `clap` (version 3 and later).
Tools
Clippy
The first one to highlight is called Clippy, which is a
linter that checks your code and catches mistakes or stylistic issues. You can install it with
rustup
and it is widely popular. Just to show the usefulness of it, here are some of the examples
that Clippy can catch.
-
Example 1
Catches:
#![allow(unused)] fn main() { fn func(opt: Option<Result<u64, String>>) { let n = match opt { Some(n) => match n { Ok(n) => n, _ => return, } None => return, }; } }
Suggests:
#![allow(unused)] fn main() { fn func(opt: Option<Result<u64, String>>) { let n = match opt { Some(Ok(n)) => n, _ => return, }; } }
-
Example 2
Catches:
#![allow(unused)] fn main() { for i in iter { if let Some(value) = i.parse().ok() { vec.push(value) } } }
Suggests:
#![allow(unused)] fn main() { for i in iter { if let Ok(value) = i.parse() { vec.push(value) } } }
-
Example 3
Catches:
#![allow(unused)] fn main() { fn bar(stool: &str) {} let x = Some("abc"); match x { Some(ref foo) => bar(foo), _ => (), } }
Suggests:
#![allow(unused)] fn main() { fn bar(stool: &str) {} let x = Some("abc"); if let Some(ref foo) = x { bar(foo); } }
rustfmt
`rustfmt` is the default code formatter for Rust. You should use it if you are not already doing so. It is best to integrate it with your editor, e.g., by using a Rust plugin for your editor or through the combination of plugins and `rls` (which has since been deprecated in favor of rust-analyzer).
Miri
Miri is a runtime interpreter for Rust that can check some of
the problems with your code especially with the unsafe
part of your code. You need to use a
nightly version of Rust in order to use Miri.
cargo test
Rust has great support for testing via cargo test
. You can find the details from the Rust
book and the Cargo
book.
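A minimal sketch of what a unit test looks like is shown below. The function `add` and the test names are hypothetical; the `#[cfg(test)]` module and `#[test]` attribute are the standard mechanism that `cargo test` picks up.

```rust
// Hypothetical function under test.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// This module is compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_two_numbers() {
        assert_eq!(add(2, 3), 5);
    }

    #[test]
    fn adds_negative_numbers() {
        assert_eq!(add(-2, -3), -5);
    }
}

fn main() {
    println!("2 + 3 = {}", add(2, 3));
}
```

Running `cargo test` in the package directory builds the code with the test module enabled and reports each `#[test]` function as passed or failed.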
Idioms and Design Patterns
In any language, learning idioms and design patterns is important to write readable and maintainable code. Rust is no exception and there are good resources regarding the topic. For example, Rust by Example, Rust Design Patterns, Idiomatic Rust, and the Rust book on Object-Oriented Design Pattern are all excellent resources. Especially, Rust by Example and Rust Design Patterns are highly useful. Here we highlight some of the ones useful for beginners.
Use Expressions
if
, match
, loop
, for
, while
, etc. are all expressions in Rust that evaluate to a value.
Don't write:
fn main() { let condition = true; let assignment; if condition { assignment = true; } else { assignment = false; } }
Do write:
fn main() { let condition = true; let assignment = if condition { true } else { false }; }
Use Pattern Matching and Destructuring
Sources:
https://doc.rust-lang.org/book/ch18-01-all-the-places-for-patterns.html
Rust's pattern matching is powerful and it can destructure complex data types.
Slice Destructuring
fn main() { let array = [0, 1, 2]; let [a, b, c] = array; println!("{}, {}, {}", a, b, c); }
Subslice Destructuring
fn slice_pattern(array: &[i32]) { match array { [] => println!("Empty array"), [x0] => println!("One element: {}", x0), [x0, y @ .., xn] => println!("First: {}, Last: {}, Middle: {:?}", x0, xn, y), } } fn main() { slice_pattern(&[]); slice_pattern(&[0]); slice_pattern(&[0, 1, 2, 3, 4]); }
`struct`, `enum`, Tuple Destructuring
struct ExStruct<'a> { field1: i32, field2: &'a str, } enum ExEnum<'a> { Variant(ExStruct<'a>), } fn main() { // Tuple destructuring let ex_tuple = (1, 2, 3); let (x, y, z) = ex_tuple; println!("x: {}, y: {}, z: {}", x, y, z); // `struct` destructuring let ex_struct = ExStruct { field1: 1, field2: "string", }; let ExStruct { field1: i, field2: s } = ex_struct; let ExStruct { field1, field2 } = ex_struct; println!("field1: {}, field2: {}", i, s); println!("field1: {}, field2: {}", field1, field2); // Nested `enum` & `struct` destructuring let ex_enum = ExEnum::Variant(ex_struct); let ExEnum::Variant(ExStruct { field1, field2 }) = ex_enum; println!("field1: {}, field2: {}", field1, field2); }
Reference Destructuring
fn main() { let mut x = 32; match x { ref x_ref => println!("Reference: {:?}", x_ref), } match x { ref mut x_mut_ref => println!("Mutable reference: {:?}", x_mut_ref), } }
Use Iterators
Use iterators whenever possible.
Don't write:
fn main() { let a = [0, 1, 2, 3, 4]; let mut i = 0; while i < a.len() { println!("{}", a[i]); i += 1; } }
Do write:
fn main() { let a = [0, 1, 2, 3, 4]; for e in a { println!("{}", e); } }
Avoid Declaring First
Source: https://doc.rust-lang.org/rust-by-example/variable_bindings/declare.html
You can declare a variable first and later initialize it. However, it is better to avoid it as it may lead to using uninitialized variables. The examples below are adapted from the above source.
Don't write:
fn main() { let a_binding; let x = 2; // Initialize the binding a_binding = x * x; }
Do write:
fn main() { let x = 2; let a_binding = x * x; }
Think about `clone()` Before Using It
Source: https://rust-unofficial.github.io/patterns/anti_patterns/borrow_clone.html
It is easy to satisfy the compiler by calling `clone()`. However, it is important to understand what it means before doing so. It creates a separate copy of the value, which means that making a change to the cloned value does not change the original value. Also, cloning has a cost, which may or may not be cheap. Thus, it is important to think it through before using `clone()`. This does not mean that you should never use `clone()`, only that you should use it deliberately.
Use the Debug Formatter
Most of Rust's standard library types are printable using the debug formatter `{:?}` or the "pretty printing" formatter `{:#?}`.
fn main() { let vec = vec![0, 1, 2, 3]; println!("Debug: {:?}", vec); println!("Pretty print: {:#?}", vec); }
In contrast, the default formatter `{}` does not work here: the code below fails to compile because `Vec` does not implement the `Display` trait.
fn main() { let vec = vec![0, 1, 2, 3]; println!("{}", vec); }
Use #[derive()]
Source: https://doc.rust-lang.org/rust-by-example/trait/derive.html
Rust has a feature called attributes, and you use `#[]` to annotate the attributes you want to use. There are many useful attributes, and one is called `derive`. It automatically generates implementations of traits that provide derive macros.
For example, the following code automatically implements the Debug
trait, so you can use the debug
formatter.
#[derive(Debug)] struct Ex { field1: i32, field2: i64, } fn main() { let ex = Ex { field1: 32, field2: 64 }; println!("{:?}", ex); }
There are other useful traits to derive.
- `Clone` provides `clone()`.
- `Debug` makes your custom type (e.g., `struct` or `enum`) printable.
- `Default` creates a default instance of your custom type (e.g., `struct` or `enum`) with default values (e.g., zero or empty) for its fields.
- `Hash` computes a hash.
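The traits listed above can be derived together, as the following sketch shows. `Point` is a hypothetical type; `PartialEq` and `Eq` are derived in addition to the listed traits because `HashSet` requires them alongside `Hash`.

```rust
use std::collections::HashSet;

// `PartialEq` and `Eq` are added here because `HashSet` needs them with `Hash`.
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let origin = Point::default(); // Both fields are 0, the default for `i32`.
    let copy = origin.clone();     // Provided by `Clone`.

    let mut seen = HashSet::new();
    seen.insert(origin);           // Requires `Hash` and `Eq`.

    // `{:?}` is provided by `Debug`.
    println!("{:?} already seen: {}", copy, seen.contains(&copy));
}
```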
Implement the `Display` Trait
Source: https://doc.rust-lang.org/rust-by-example/hello/print/print_display.html
Deriving Debug
is convenient but oftentimes you want to customize how your custom type prints out.
You can implement the Display
trait to control that.
struct Ex { field1: i32, field2: i32, } impl std::fmt::Display for Ex { // You need to implement the following function. // You can use `write!()`, which is similar to // `println!()` but writes to a `Formatter`. // `write!()` returns the right type for `fmt()`. fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { write!(f, "field1: {}, field2: {}", self.field1, self.field2) } } fn main() { let ex = Ex { field1: 32, field2: 64 }; println!("{}", ex); }
Implement new()
Source: https://rust-unofficial.github.io/patterns/idioms/ctor.html?highlight=construc#constructors
Rust's convention for constructing a new object is via new()
. When you define your own type,
implement new()
that creates a new instance.
#![allow(unused)] fn main() { struct Ex { field1: i32, field2: i32, } impl Ex { fn new() -> Self { Ex { field1: 0, field2: 0, } } } }
Implement the `Default` Trait Along with `new()`
Sources:
https://rust-lang.github.io/rust-clippy/master/#new_without_default
https://rust-unofficial.github.io/patterns/idioms/default.html
The user of your type might expect to be able to use Default
. It is also more convenient for you.
#![allow(unused)] fn main() { struct Ex { field1: i32, field2: i32, } impl Ex { fn new() -> Self { Ex { field1: 0, field2: 0, } } } impl std::default::Default for Ex { fn default() -> Self { Ex::new() } } }
`match` De-Nesting
You often end up having deeply-nested code especially when match
is involved, which reduces
readability. There are a few things you can try for de-nesting.
Use if let
Source: https://doc.rust-lang.org/rust-by-example/flow_control/if_let.html
Sometimes if let
can make your match
more readable and concise. The source above provides a good
example.
Don't write:
fn main() { let optional = Some(7); match optional { Some(i) => { println!("This is a really long string and `{:?}`", i); // ^ Needed 2 indentations just so we could destructure // `i` from the option. }, _ => {}, // ^ Required because `match` is exhaustive. Doesn't it seem // like wasted space? }; }
Do write:
fn main() { let optional = Some(7); if let Some(i) = optional { println!("This is a really long string and `{:?}`", i); } }
Use while let
Source: https://doc.rust-lang.org/rust-by-example/flow_control/while_let.html
Similar to if let
, while let
can make your match
more readable and concise. The source above
provides a good example.
Don't write:
fn main() { let mut optional = Some(0); // Repeatedly try this test. loop { match optional { // If `optional` destructures, evaluate the block. Some(i) => { if i > 9 { println!("Greater than 9, quit!"); optional = None; } else { println!("`i` is `{:?}`. Try again.", i); optional = Some(i + 1); } // ^ Requires 3 indentations! }, // Quit the loop when the destructure fails: _ => { break; } // ^ Why should this be required? There must be a better way! } } }
Do write:
fn main() { let mut optional = Some(0); // Repeatedly try this test. while let Some(i) = optional { if i > 9 { println!("Greater than 9, quit!"); optional = None; } else { println!("`i` is `{:?}`. Try again.", i); optional = Some(i + 1); } } }
Use Tuple Matching
Source: https://github.com/ferrous-systems/elements-of-rust#tuple-matching
If you need to match on multiple variables, you can group them as a tuple and flatten the nested
match
expressions.
Don't write:
fn main() { let first_match: Option<i32> = Some(1); let second_match: Option<i32> = None; match first_match { Some(_) => { match second_match { None => println!("None found"), _ => (), } }, _ => () } }
Do write:
fn main() { let first_match: Option<i32> = Some(1); let second_match: Option<i32> = None; match (first_match, second_match) { (Some(_), None) => println!("None found"), _ => (), } }
Use `match` Guards
Source: https://doc.rust-lang.org/rust-by-example/flow_control/match/guard.html
match
guards are additional conditions you can use to filter a match
arm. The following examples
are adapted from the above source.
fn main() { let pair = (2, -2); println!("Tell me about {:?}", pair); match pair { (x, y) => { if x == y { println!("These are twins"); } else if x + y == 0 { println!("Antimatter, kaboom!"); } else if x % 2 == 1 { println!("The first one is odd"); } else { println!("No correlation..."); } }, } }
The above can be revised as follows.
fn main() { let pair = (2, -2); println!("Tell me about {:?}", pair); match pair { (x, y) if x == y => println!("These are twins"), // The ^ `if condition` part is a guard (x, y) if x + y == 0 => println!("Antimatter, kaboom!"), (x, _) if x % 2 == 1 => println!("The first one is odd"), _ => println!("No correlation..."), } }
Use `match` Binding
Source: https://doc.rust-lang.org/rust-by-example/flow_control/match/binding.html
You can bind a matching value to a variable. The above source has good examples.
Don't write:
fn main() { let num = Some(42); match num { Some(n) => { if n == 42 { println!("The Answer: {}!", n); } else { println!("Not interesting... {}", n); } }, _ => (), } }
Do write:
fn main() { let num = Some(42); match num { Some(n @ 42) => println!("The Answer: {}!", n), Some(n) => println!("Not interesting... {}", n), _ => (), } }
It is even shorter than using a match
guard.
fn main() { let num = Some(42); match num { // Got `Some` variant, match if its value, bound to `n`, // is equal to 42. Some(n) if n == 42 => println!("The Answer: {}!", n), // Match any other number. Some(n) => println!("Not interesting... {}", n), // Match anything else (`None` variant). _ => (), } }
Don't write:
fn age() -> u32 { 15 } fn main() { let age = age(); match age { 0 => println!("I haven't celebrated my first birthday yet"), 1 ..= 12 => { println!("I'm a child of age {:?}", age); }, 13 ..= 19 => { println!("I'm a teen of age {:?}", age); }, // Nothing bound. Return the result. _ => println!("I'm an old person of age {:?}", age), } }
Do write:
    fn age() -> u32 {
        15
    }

    fn main() {
        match age() {
            0 => println!("I haven't celebrated my first birthday yet"),
            // `n @ range` binds the matched value to `n`.
            n @ 1..=12 => println!("I'm a child of age {:?}", n),
            n @ 13..=19 => println!("I'm a teen of age {:?}", n),
            // Any other age is bound to `n` directly.
            n => println!("I'm an old person of age {:?}", n),
        }
    }
Processing Collections of Items Using Functional Language Features
Sources
https://doc.rust-lang.org/book/ch13-00-functional-features.html
https://rust-unofficial.github.io/patterns/functional/index.html
Rust provides many functional language features, and their most common use is processing a collection of items. Some example methods are fold(), filter(), map(), reduce(), etc.
The following uses an imperative programming style to calculate a sum.
    fn main() {
        let mut sum = 0;
        for i in 1..11 {
            sum += i;
        }
        println!("{}", sum);
    }
We can accomplish the same task using fold(). The method signature of fold() looks like the following.
    fn fold<B, F>(self, init: B, f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B
After self, the first parameter init is an initial value. The second parameter f takes a closure, which is Rust's name for an anonymous function or a lambda function. The syntax for a closure is |param1, param2, ...| { function body }. If the function body is a single expression, you can omit {}, i.e., |param1, param2, ...| single_expression. In the case of fold(), the closure takes two parameters, e.g., it.fold(init, |acc, x| { /* function body */ });.
This means that fold() can be called on an iterator, and it takes two arguments---one is the initial value and the other is a closure. fold() first takes the initial value and the first item in the iterator. Using those as arguments, fold() calls the closure. From there, fold() iterates---it takes the result of the closure from the previous iteration as well as the next item in the iterator, and calls the closure again using those as arguments. It returns the result of the final call to the closure. Thus, the following code calculates a sum.
    fn main() {
        // `a` is the running total and `b` is the next item.
        println!("{}", (1..11).fold(0, |a, b| a + b));
    }
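To make the iteration described above concrete, here is a minimal sketch of what fold() does, written as an ordinary loop. The function name my_fold is made up for illustration and is not part of the standard library; the real implementation may differ in detail.

```rust
// A hand-written stand-in for `iter.fold(init, f)` (hypothetical helper).
fn my_fold<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    // Start from the initial value.
    let mut acc = init;
    // Feed the previous result and the next item back into the closure.
    for item in iter {
        acc = f(acc, item);
    }
    // The result of the last call is the overall result.
    acc
}

fn main() {
    let by_hand = my_fold(1..11, 0, |acc, x| acc + x);
    let built_in = (1..11).fold(0, |acc, x| acc + x);
    println!("{} {}", by_hand, built_in); // 55 55
}
```

If the iterator is empty, the loop body never runs and the initial value is returned unchanged, which matches the behavior of fold().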
Use Method Chaining
A common pattern found in Rust programs is method chaining. You can see this often with the collection-processing functions such as map(), filter(), etc., when a series of transformations needs to be applied to a collection.
    fn main() {
        println!(
            "{}",
            [0, 1, 2, 3, 4]
                .iter()
                .map(|x| x * x)
                .filter(|x| *x > 5)
                .fold(0, |a, b| a + b)
        );
    }
Here is another example that executes a command-line program.
    fn main() {
        std::process::Command::new("sh")
            .arg("-c")
            .arg("echo hello")
            .output()
            .expect("failed to execute process");
    }
A common way to use method chaining in your own code is to return self for your methods.
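As a minimal sketch of returning self, the following builder-style type lets its setters be chained. The Greeting type and its methods are hypothetical and exist only for this illustration.

```rust
// Hypothetical type used only to illustrate chaining.
#[derive(Default)]
struct Greeting {
    name: String,
    punctuation: String,
    shout: bool,
}

impl Greeting {
    fn new() -> Self {
        Self::default()
    }

    // Each setter takes `self` by value, modifies it, and returns it,
    // so the next method can be called on the result.
    fn name(mut self, name: &str) -> Self {
        self.name = name.to_string();
        self
    }

    fn punctuation(mut self, punctuation: &str) -> Self {
        self.punctuation = punctuation.to_string();
        self
    }

    fn shout(mut self, shout: bool) -> Self {
        self.shout = shout;
        self
    }

    // Ends the chain by producing the final value.
    fn build(&self) -> String {
        let text = format!("Hello, {}{}", self.name, self.punctuation);
        if self.shout {
            text.to_uppercase()
        } else {
            text
        }
    }
}

fn main() {
    let msg = Greeting::new()
        .name("Rust")
        .punctuation("!")
        .shout(true)
        .build();
    println!("{}", msg); // HELLO, RUST!
}
```

Taking self by value works well for one-shot builders like this. Taking &mut self and returning &mut Self is the usual alternative when the caller wants to keep the object around, as std::process::Command does.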
Error Handling
Sources:
https://blog.burntsushi.net/rust-error-handling/
https://doc.rust-lang.org/rust-by-example/error.html
Use ?
Your code will be much more readable and concise with ?, which unwraps an Ok value or returns early with the Err.
Don't write:
    #[allow(unused)]
    fn main() -> std::io::Result<()> {
        let f = std::fs::File::open("hello.txt");
        let f = match f {
            Ok(file) => file,
            Err(e) => return Err(e),
        };
        Ok(())
    }
Do write:
    #[allow(unused)]
    fn main() -> std::io::Result<()> {
        let f = std::fs::File::open("hello.txt")?;
        Ok(())
    }
Error Trait Object
Source: https://doc.rust-lang.org/rust-by-example/error/multiple_error_types/boxing_errors.html
Sometimes (but not always) it is useful to use the Error trait object to propagate original errors.
Do write:
    #[allow(unused)]
    fn error_propagation() -> Result<(), Box<dyn std::error::Error>> {
        let f = std::fs::File::open("hello.txt")?;
        Ok(())
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        error_propagation()
    }
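The trait object is most useful when one function can fail with several unrelated error types. The following sketch assumes a made-up helper, parse_number, that can fail with either a Utf8Error or a ParseIntError; ? converts both into Box<dyn Error>.

```rust
use std::error::Error;

// Hypothetical helper: decodes bytes as UTF-8 text, then parses an integer.
fn parse_number(bytes: &[u8]) -> Result<i32, Box<dyn Error>> {
    // May fail with std::str::Utf8Error.
    let text = std::str::from_utf8(bytes)?;
    // May fail with std::num::ParseIntError.
    let n = text.trim().parse::<i32>()?;
    Ok(n)
}

fn main() {
    println!("{:?}", parse_number(b" 42 "));

    // Both failures come back through the same boxed error type.
    if let Err(e) = parse_number(b"abc") {
        println!("not a number: {}", e);
    }
    if let Err(e) = parse_number(&[0xffu8, 0xfe]) {
        println!("not UTF-8: {}", e);
    }
}
```

The trade-off is that the caller no longer sees the concrete error type in the signature. If the caller needs it, the original error can still be recovered with downcast_ref().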
Use type to Create a (Shorter) Result Alias
Source: https://doc.rust-lang.org/rust-by-example/error/result/result_alias.html
type allows you to create a type alias. You can use it to make your Result type more concise.
    type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
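Here is a short sketch of how such an alias reads in function signatures. The parse_port function is hypothetical and only serves to show the alias in use.

```rust
// The alias from the text: only the success type needs to be written out.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

// Hypothetical helper that returns the aliased Result.
fn parse_port(text: &str) -> Result<u16> {
    // A ParseIntError is converted into the boxed error by `?`.
    let port: u16 = text.trim().parse()?;
    if port == 0 {
        // A string message can be converted into the boxed error as well.
        return Err("port must be non-zero".into());
    }
    Ok(port)
}

fn main() -> Result<()> {
    let port = parse_port("8080")?;
    println!("{}", port); // 8080
    Ok(())
}
```

Note that the alias shadows the standard Result name inside the module where it is defined, so a two-parameter Result there has to be written as std::result::Result<T, E>.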
Use map(), map_or(), and(), and_then(), unwrap(), unwrap_or(), Etc.
Sources:
https://doc.rust-lang.org/rust-by-example/error/result.html
https://doc.rust-lang.org/rust-by-example/error/option_unwrap.html
Both Result and Option provide many helper methods to process them in various ways. You can use them instead of match. For example, map() for Option maps Some to Some by applying the provided lambda function, and None to None.
The following is an example of using match to process Option.
    fn main() {
        let opt = Some(1);
        match opt {
            Some(i) => println!("{}", i),
            None => (),
        }
    }
The same can be done with map().
    fn main() {
        let opt = Some(1);
        opt.map(|i| println!("{}", i));
    }
There are other convenient functions like map_or(), map_or_else(), and(), and_then(), unwrap(), unwrap_or(), etc. for both Result and Option.
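The following sketch shows a few of these helpers on Option. The functions parse_positive and describe are made up for the example; the methods they call are the standard ones.

```rust
// Hypothetical helper: returns Some(n) only for text that parses to n > 0.
fn parse_positive(text: &str) -> Option<u32> {
    text.trim()
        .parse::<u32>()
        // ok() turns Result<u32, _> into Option<u32>, dropping the error.
        .ok()
        // and_then() chains a step that may itself produce None.
        .and_then(|n| if n > 0 { Some(n) } else { None })
}

// Hypothetical helper: map_or() supplies a default for None and
// applies the closure to the value inside Some.
fn describe(text: &str) -> String {
    parse_positive(text).map_or("invalid".to_string(), |n| format!("got {}", n))
}

fn main() {
    // map() transforms the value inside Some and leaves None alone.
    println!("{:?}", parse_positive("21").map(|n| n * 2)); // Some(42)
    // unwrap_or() extracts the value or falls back to a default.
    println!("{}", parse_positive("oops").unwrap_or(0)); // 0
    println!("{}", describe("7")); // got 7
    println!("{}", describe("-3")); // invalid
    // and() returns its argument if self is Some, and None otherwise.
    println!("{:?}", Some(1).and(Some("one"))); // Some("one")
}
```

Unlike unwrap_or(), plain unwrap() panics on None (or Err), so it is best kept for cases where a missing value really is a bug.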
New Type Idiom
Source: https://doc.rust-lang.org/rust-by-example/generics/new_types.html
If you use the same type for different purposes, often it is better to create a new type for each purpose. The above source shows an example where i64 is used for both years and days to count an age. Since the same type i64 is used for two different purposes, you can create one new type for years and another new type for days as follows.
    struct Years(i64);
    struct Days(i64);
The benefit of doing this is that you can use the compiler's type checker to ensure that you are using i64 for the right purpose. If you make a call to a function that needs to use years instead of days, then you can use Years as the type instead of i64. This will make sure that you are passing the right value with the right type to the function that expects to use years, not days.
    fn old_enough(age: &Years) -> bool {
        age.0 >= 18
    }
The above function expects years, not days, and the compiler now makes sure that you pass years.
    struct Years(i64);
    struct Days(i64);

    fn old_enough(age: &Years) -> bool {
        age.0 >= 18
    }

    fn main() {
        let years = Years(32);
        let days = Days(years.0 * 365);
        println!("{}", old_enough(&years));
        // Uncomment the following line and run it to see an error.
        // println!("{}", old_enough(&days));
    }
Use &str, &T, and &[T] for Function Parameters
Sources:
https://rust-unofficial.github.io/patterns/idioms/coercion-arguments.html?highlight=string#use-borrowed-types-for-arguments
https://hermanradtke.com/2015/05/03/string-vs-str-in-rust-functions.html
When borrowing String, Box<T>, and Vec<T> in a function, use &str, &T, and &[T] instead of &String, &Box<T>, and &Vec<T>. There are two advantages.
- String, Box<T>, and Vec<T> are basically pointers already, so adding & introduces another layer of indirection.
- &str, &T, and &[T] provide more flexibility. They accept not only &str, &T, and &[T] but also &String, &Box<T>, and &Vec<T>.
One of the sources above illustrates this well.
    fn print_me(msg: &str) {
        println!("msg = {}", msg);
    }

    fn main() {
        let string = "hello world";
        print_me(string);

        let owned_string = String::from("hello world"); // or "hello world".to_string()
        print_me(&owned_string);

        let boxed_string = std::boxed::Box::new(String::from("hello world"));
        print_me(&boxed_string);

        let counted_string = std::rc::Rc::new(String::from("hello world"));
        print_me(&counted_string);

        let atomically_counted_string = std::sync::Arc::new(String::from("hello world"));
        print_me(&atomically_counted_string);
    }
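The example above covers &str. A similar sketch for &[T] and &T might look like the following, where total and is_even are made-up functions for illustration.

```rust
// Hypothetical function: accepts any contiguous sequence of i32 values.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Hypothetical function: accepts a borrowed i32 regardless of where it lives.
fn is_even(n: &i32) -> bool {
    n % 2 == 0
}

fn main() {
    let vector = vec![1, 2, 3];
    let array = [4, 5, 6];
    println!("{}", total(&vector)); // &Vec<i32> coerces to &[i32]: 6
    println!("{}", total(&array)); // a borrowed array works too: 15
    println!("{}", total(&vector[1..])); // so does a sub-slice: 5

    let boxed = Box::new(10);
    let plain = 7;
    println!("{}", is_even(&boxed)); // &Box<i32> coerces to &i32: true
    println!("{}", is_even(&plain)); // false
}
```

Had total() taken &Vec<i32>, the array and sub-slice calls would not compile without first copying the data into a new Vec.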
Week 3: Fuzz Testing, Property-Based Testing, Symbolic Execution, and SAT/SMT
Fuzz Testing and Property-Based Testing
Symbolic Execution
SAT/SMT
Week 5: Rust Analyses
- The Usability of Ownership
- How Do Programmers Use Unsafe Rust
- Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale
- Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs
Week 6: Rust OSes
- RedLeaf: Towards An Operating System for Safe and Verified Firmware
- Theseus: an Experiment in Operating System Structure and State Management
- RedLeaf: Isolation and Communication in a Safe Operating System
- Multiprogramming a 64 kB Computer Safely and Efficiently
Week 7: Symbolic Execution
- KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
- Verifying Dynamic Trait Objects in Rust
- SAGE: Whitebox Fuzzing for Security Testing
- S2E: A Platform for In-Vivo Multi-Path Analysis of Software Systems
- Symbolic Execution with SymCC: Don’t Interpret, Compile!
Week 8: Hybrid Fuzzing
- Driller: Augmenting Fuzzing Through Selective Symbolic Execution
- HFL: Hybrid Fuzzing on the Linux Kernel
- QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing
- RAZZER: Finding Kernel Race Bugs through Fuzzing
Week 9: Formal Methods
Week 10: Formal Methods (Continued)
- How Amazon Web Services Uses Formal Methods
- Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3
Week 11: Interactive Verification
- seL4: Formal Verification of an OS Kernel
- CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels
- Jitk: A Trustworthy In-Kernel Interpreter Infrastructure
- Using Crash Hoare Logic for Certifying the FSCQ File System