A programmer working in a systems programming language like Rust or C has more things to worry about than a programmer using a managed language like C#. The biggest difference between these two kinds of languages is how memory is managed. A C# programmer might not even have to think about memory management. Thanks to garbage collection, a C# programmer can simply instantiate an object with the new operator and not worry about how that object will later be cleaned up. In a systems language we must have a strategy for freeing memory that is no longer in use; otherwise, the memory footprint of our program will keep growing and performance will degrade.
One of Rust’s most noteworthy features is the ownership system. This system allows the compiler to verify that the programmer is managing memory appropriately. What most languages call variables, Rust calls “variable bindings”. The distinction emphasizes that a value is bound to a particular name. In Rust every value is “owned” by exactly one binding at a time.
Let’s say we have a function defined like this:
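A minimal sketch of such a function, assuming the parameter is a vector of i32 values. The body, which just returns the length, is hypothetical; what matters is that the parameter is taken by value, so ownership moves into the function.

```rust
// Hypothetical body: taking `Vec<i32>` by value moves ownership into the function.
fn do_something1(v: Vec<i32>) -> usize {
    // `v` owns the vector here; it is freed when the function returns.
    v.len()
}

fn main() {
    let array = vec![1, 2, 3];
    let n = do_something1(array);
    println!("do_something1 saw {} elements", n);
}
```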
This function is defined to take ownership of the parameter that is passed into it. As a consequence of this the following snippet of code will result in a compilation error.
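A sketch of the failing pattern, assuming the same hypothetical “do_something1” that takes a Vec<i32> by value. The offending line is left commented out so the program still compiles; uncommenting it produces error E0382 (use of a moved value).

```rust
// Hypothetical definition: takes the vector by value (moves it).
fn do_something1(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let array = vec![1, 2, 3];
    do_something1(array); // ownership of the vector moves into the function

    // Uncommenting the next line makes compilation fail with E0382,
    // because `array` no longer owns the vector:
    // println!("{:?}", array);
}
```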
Because we passed the vector bound to “array” to the function named “do_something1”, the compiler knows that “array” no longer owns that vector, and it will not allow us to access that memory after we have given away ownership. Moving values this way would be painful for primitive types like integers. Luckily, Rust’s trait system lets us mark types as “Copy”. The primitive scalar types, such as integers, floating-point numbers, booleans, and characters, implement “Copy” out of the box. This tells Rust that when we pass or rebind a value of a “Copy” type, it should copy the value instead of moving it, so the original binding remains usable.
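A short sketch of copy semantics. The names “double”, “Point”, and “shift” are hypothetical. i32 is Copy already, and a struct can opt in with #[derive(Clone, Copy)] as long as all of its fields are Copy.

```rust
// `i32` implements `Copy`, so passing it to a function copies the value.
fn double(x: i32) -> i32 {
    x * 2
}

// A user-defined type can opt in to copy semantics by deriving `Copy`
// (which also requires `Clone`). This only works if every field is `Copy`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn shift(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y + 1 }
}

fn main() {
    let n = 21;
    let doubled = double(n);
    // `n` is still usable: the function received a copy, not the original.
    println!("{} doubled is {}", n, doubled);

    let p = Point { x: 0, y: 0 };
    let q = shift(p);
    // `p` is still valid because `Point` is `Copy`.
    println!("{:?} shifted is {:?}", p, q);
}
```

A Vec<i32> cannot derive Copy because it owns heap memory; duplicating it requires an explicit .clone().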
The benefits of the ownership system are huge! We no longer need to manually deallocate our objects. Because the compiler tracks which binding owns each value, it knows exactly when that owner goes out of scope, and it inserts the cleanup code (a call to the value’s “drop” method) at that point.
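To make the automatic cleanup visible, this sketch implements the standard Drop trait for a hypothetical “Tracked” type that prints a message and bumps a global counter (also hypothetical, for illustration only) when it is dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped (illustration only).
static DROPPED: AtomicUsize = AtomicUsize::new(0);

struct Tracked(&'static str);

impl Drop for Tracked {
    // Called automatically when the owning binding goes out of scope.
    fn drop(&mut self) {
        println!("dropping {}", self.0);
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let _outer = Tracked("outer");
    {
        let _inner = Tracked("inner");
        // `_inner` goes out of scope at this closing brace and is dropped.
    }
    println!("dropped so far: {}", DROPPED.load(Ordering::SeqCst));
    // `_outer` is dropped when `main` returns.
}
```

Running it prints “dropping inner”, then “dropped so far: 1”, then “dropping outer”: no explicit free call appears anywhere.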
What if we want “do_something1” to do some sort of operation on our vector but let the calling function keep ownership? We could define a function as follows:
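A minimal sketch, assuming a hypothetical “do_something2” (named to follow the post’s numbering) that pushes an element and then returns the vector, handing ownership back to the caller.

```rust
// Takes ownership, modifies the vector, then returns it to the caller.
fn do_something2(mut v: Vec<i32>) -> Vec<i32> {
    v.push(4);
    v // returning `v` moves ownership back out of the function
}

fn main() {
    let array = vec![1, 2, 3];
    // Rebind the returned vector so the caller owns it again.
    let array = do_something2(array);
    println!("{:?}", array); // [1, 2, 3, 4]
}
```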
Now the function will return the vector and give ownership back to the caller. There is a simpler way to accomplish the same thing using a technique called borrowing.
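A sketch of the borrowing version. “do_something3” is the post’s name, but its body here (summing the elements) is a hypothetical stand-in. The caller declares “array” with let mut and pushes to it after the call, which works because it never gave up ownership.

```rust
// Borrows the vector immutably: the caller keeps ownership.
fn do_something3(v: &Vec<i32>) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut array = vec![1, 2, 3];
    let total = do_something3(&array);
    // `array` is still owned by `main`, so it can be modified afterwards.
    array.push(total);
    println!("{:?}", array); // [1, 2, 3, 6]
}
```

In idiomatic Rust the parameter would more often be written as &[i32], which accepts a borrowed vector as well as other slices; &Vec<i32> is kept here to match the prose.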
The & indicates that the function accepts a reference. The vector is now being borrowed, and the caller retains ownership of the vector without having to return it explicitly at the end of “do_something3”.
In the previous example we declared the array binding using “let mut”, which makes the binding mutable. If the variable had been bound with just “let”, the compiler would not have allowed us to call “push” on the vector. The compiler can enforce this because Vec’s “push” method takes a mutable reference to self (&mut self). The following is an example definition of a function that takes a mutable reference.
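A minimal sketch, assuming a hypothetical “do_something4” that appends to the caller’s vector through a &mut borrow. The caller must still declare the binding with let mut.

```rust
// Takes a mutable borrow, so it can modify the caller's vector in place
// without taking ownership.
fn do_something4(v: &mut Vec<i32>) {
    v.push(4);
}

fn main() {
    // `let mut` is required: a `&mut` borrow cannot be taken from an
    // immutable binding.
    let mut array = vec![1, 2, 3];
    do_something4(&mut array);
    println!("{:?}", array); // [1, 2, 3, 4]
}
```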
Rust’s borrowing rules allow either a single mutable reference or any number of immutable references to a value at any given time, but never both. Because of this, two threads cannot modify the same data simultaneously without synchronization, which rules out data races at compile time.
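The rule can be seen in a single function. This sketch (the name “borrow_demo” is hypothetical) shows shared borrows coexisting and a mutable borrow being exclusive; the two commented-out lines would each be rejected with the error code noted beside them.

```rust
// Returns the vector so the result can be checked.
fn borrow_demo() -> Vec<i32> {
    let mut data = vec![1, 2, 3];

    // Any number of shared (`&`) borrows may exist at the same time.
    let a = &data;
    let b = &data;
    println!("{} {}", a.len(), b.len());

    // A mutable borrow is exclusive. It is allowed here because `a` and `b`
    // are never used again after this point.
    let m = &mut data;
    m.push(4);
    // Uncommenting either line below makes compilation fail, because `m`
    // is still used afterwards:
    // let m2 = &mut data;      // E0499: second mutable borrow
    // println!("{}", a.len()); // E0502: shared borrow still alive
    m.push(5);

    data
}

fn main() {
    println!("{:?}", borrow_demo()); // [1, 2, 3, 4, 5]
}
```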
Working with the borrow checker is certainly challenging. I noticed right away that I had to be much more thoughtful about how I allocate and use memory. This way of programming may be painful for many programmers, but forcing more rigor into the design of data structures tends to produce code with fewer bugs. Rust’s memory management model also has big implications for security.
In 2014 a vulnerability dubbed Heartbleed was discovered in OpenSSL. It allowed an attacker to read memory adjacent to a buffer. The cause was a buffer over-read in OpenSSL’s implementation of the TLS heartbeat extension: the peer specified how many bytes of payload to echo back, but OpenSSL never verified that this length stayed within the bounds of the data actually received. This mistake was easy for a programmer to make because C allows unrestricted memory access through raw pointers. Rust also allows the use of raw pointers, but dereferencing them is only permitted inside a block explicitly marked as unsafe. When the programmer absolutely must use pointers, the unsafe operations are clearly marked and can be isolated and reasoned about more easily. Since idiomatic Rust accesses data through bounds-checked built-in types such as slices and vectors rather than raw pointers, this vulnerability would have been much less likely to exist in a Rust implementation.
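To illustrate the bounds checking, here is a heavily simplified, hypothetical heartbeat-style echo (“echo_payload” is not OpenSSL’s code or any real API). The peer claims a length, and slice::get returns None instead of reading past the end of the data.

```rust
// Hypothetical, simplified echo: return `claimed_len` bytes of the payload,
// or `None` if the claimed length exceeds what was actually received.
fn echo_payload(payload: &[u8], claimed_len: usize) -> Option<&[u8]> {
    // `get` checks the range against the slice length; no over-read occurs.
    payload.get(..claimed_len)
}

fn main() {
    let payload = b"hello";
    println!("{:?}", echo_payload(payload, 5)); // Some([104, 101, 108, 108, 111])
    println!("{:?}", echo_payload(payload, 64)); // None
}
```

Even direct indexing such as &payload[..claimed_len] would panic on an oversized length rather than silently returning adjacent memory.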
If you have never written Rust before, I encourage you to try it out. Even if you have no use for a systems programming language, the ideas around memory management are highly transferable to other languages. C# might not have a borrow checker, but the thought process developed while using Rust helps with design work in other languages as well. What’s your take on Rust’s memory management model? Do you think it can help developers write code with fewer bugs?