Welcome back, fellow Rustaceans 🦀

In the last article, we learned about our first heap-allocated data structure in Rust, the String. In today's article, we'll circle back to our discussion on how different languages manage heap-allocated structures and learn how Rust approaches this issue.

On Day 10, we learned how Python uses a garbage collector, based on reference counting, to manage heap-allocated structures. Although it's a great solution for managing heap memory, it comes with runtime costs and the memory overhead required by the garbage collector. This may not suit systems and embedded engineers, who need precise control over memory and usually work with constrained resources. By contrast, C gives users total control of heap memory through malloc and free. This comes with its own set of problems, like dangling pointers and double frees, which have been the source of countless security vulnerabilities.

So, what makes Rust so different? Let's start with the C++ code below from Day 10:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {

    vector<string> treasure_map = {"gold", "diamonds", "ruby"};
    vector<string> another_map = treasure_map;
    another_map[0] = "shiny new sword";

    cout << "treasure_map : " << treasure_map[0] << endl;
    cout << "another_map : " << another_map[0] << endl;

    return 0;
}

In the above code, we create a vector of strings, treasure_map, whose elements are allocated on the heap. Then we create another variable, another_map, and initialize it from treasure_map. C++ performs a deep copy here, creating two independent copies of the data, so the program prints "gold" for treasure_map and "shiny new sword" for another_map. This makes sense because, unlike Python, C++ has no garbage collector tracking reference counts. If you wanted a shallow copy instead, you would need to track every use of both treasure_map and another_map throughout the code, since modifying one would affect the other. Even worse, you would have to make sure both become invalid at the same time: freeing the data through one while still using the other is a use-after-free bug, which can crash the program with a segmentation fault or silently corrupt memory.
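For comparison, here is a minimal sketch of the same program in Rust. In Rust, plain assignment would move the vector rather than copy it (more on that below), so to reproduce the C++ behavior we ask for a deep copy explicitly with clone(). The helper name make_maps is a hypothetical choice for this sketch.

```rust
// Sketch: the treasure_map example rewritten in Rust.
// `let another_map = treasure_map;` would MOVE the vector, so to get the
// C++ behavior (two independent copies) we request a deep copy explicitly.
fn make_maps() -> (Vec<String>, Vec<String>) {
    let treasure_map = vec![
        String::from("gold"),
        String::from("diamonds"),
        String::from("ruby"),
    ];
    // clone() allocates a new buffer and copies every String into it.
    let mut another_map = treasure_map.clone();
    another_map[0] = String::from("shiny new sword");
    (treasure_map, another_map)
}

fn main() {
    let (treasure_map, another_map) = make_maps();
    println!("treasure_map : {}", treasure_map[0]); // prints "treasure_map : gold"
    println!("another_map : {}", another_map[0]); // prints "another_map : shiny new sword"
}
```

Making the copy explicit means an expensive deep copy never happens by accident: it is visible right where clone() is called.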

Over the decades, many tools, processes, and code review practices have been deployed to avoid these kinds of issues. Still, memory safety bugs remain the most common class of vulnerability in C/C++ codebases. In 2019, Microsoft security researchers reported that about 70% of the vulnerabilities Microsoft fixes and assigns a CVE to are memory safety issues.

Source: "A proactive approach to more secure code", MSRC Blog, Microsoft Security Response Center. Citing Matt Miller's 2019 presentation at BlueHat IL, the post notes that "the majority of vulnerabilities fixed and with a CVE assigned are caused by developers inadvertently inserting memory corruption bugs into their C and C++ code."

Ideally, we would like the speed of shallow copies without having to rely on a garbage collector to free memory. Rust addresses both concerns with a different approach, which it calls the ownership model.

Imagine a library book: it can be checked out by only one person at a time. Ownership in Rust works in a similar way. Just as the person holding the book is responsible for returning it, a new variable in Rust assumes ownership of the data when the data is moved to it from another variable. The original variable is no longer responsible for the data's memory (and can no longer be used to access it), so there is always exactly one owner that will free it.

fn main() {
  let name = String::from("Alice"); // "name" owns the string "Alice"
  println!("Hello, {}", name); 
}

In this code, the expression String::from("Alice") creates a new String containing the text "Alice", with its characters stored on the heap. The let statement makes the variable name the owner of this String. When main finishes executing, name goes out of scope, and Rust automatically drops the String it owns, releasing the heap memory that holds "Alice".
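To make the end of a value's lifetime visible, here is a small sketch built around a hypothetical Book type that records a line in a shared log when it is dropped (the Drop trait is the hook Rust calls at that moment). Following the library analogy, ownership of the book moves from alice to bob, and the book is returned, that is, dropped, exactly once: when bob goes out of scope. The names Book, library_day, alice, and bob are illustrative only.

```rust
use std::cell::RefCell;

// Hypothetical `Book` type, used only to make drops visible: when it is
// dropped, it appends a line to a shared log instead of printing directly.
struct Book<'a> {
    title: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Book<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("returned {}", self.title));
    }
}

fn library_day(log: &RefCell<Vec<String>>) {
    let alice = Book { title: "Dune", log }; // `alice` owns the book
    {
        let bob = alice; // ownership moves to `bob`; `alice` is now unusable
        log.borrow_mut().push(format!("bob reads {}", bob.title));
    } // `bob` goes out of scope: the book is dropped exactly once, here
    log.borrow_mut().push(String::from("closing time"));
} // `alice` was moved from, so nothing is dropped again here

fn main() {
    let log = RefCell::new(Vec::new());
    library_day(&log);
    // Prints: "bob reads Dune", "returned Dune", "closing time"
    for line in log.borrow().iter() {
        println!("{line}");
    }
}
```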

But examining only one variable doesn't give us the complete picture. The interesting part comes when we create another variable from the original one. In Python, both variables end up referring to the same object in memory, and in C++, we get two independent copies, as we discussed earlier. But what happens in Rust?

fn main() {
  let name1 = String::from("Alice");
  let name2 = name1; // Ownership of the string is moved to "name2"

  println!("The name after move: {}", name1); // compilation error
}

This code moves ownership of the string "Alice" from name1 to name2 and then tries to print name1. The compiler rejects it with error E0382 ("borrow of moved value"), pointing at the use of name1 after the move.

So what happened? Unlike in Python, initializing a variable in Rust from another variable moves ownership for types such as String. The statement let name2 = name1; moves ownership of the string from name1 to name2. From that point on, name1 can no longer be used, and any attempt to access it results in a compile-time error.
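One way to fix the program is to print through the new owner, name2. The sketch below does that, and also uses as_ptr() to check that the move did not copy the text "Alice": name2 points at the same heap buffer that name1 used to own, so a move costs no more than a shallow copy. The helper move_keeps_buffer is hypothetical.

```rust
// Sketch: a move copies only the String's stack part (pointer, length,
// capacity); the heap buffer holding "Alice" stays where it is.
fn move_keeps_buffer() -> bool {
    let name1 = String::from("Alice");
    let before = name1.as_ptr(); // address of the heap buffer
    let name2 = name1; // move: no new allocation, no byte copying
    // `name1` can no longer be used here; `name2` is the sole owner.
    name2.as_ptr() == before
}

fn main() {
    let name1 = String::from("Alice");
    let name2 = name1;
    println!("The name after move: {}", name2); // use the new owner instead
    println!("same heap buffer after move: {}", move_keeps_buffer()); // true
}
```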

Moving is like gifting a book: ownership is transferred permanently. In Rust, values of types like String are moved whenever they are assigned to another variable, passed to a function by value, or returned from a function. This way, Rust ensures that each value has exactly one owner at a time.
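Function calls follow the same ownership rule as assignment. In this sketch, the hypothetical functions check_out and check_out_and_return take a String by value, so calling them moves ownership into the function. The first lets the String be dropped when it returns, while the second hands ownership back to the caller.

```rust
// Hypothetical helper: takes a String *by value*, so calling it moves
// ownership into the function. The String is dropped when it returns.
fn check_out(book: String) -> usize {
    book.len()
} // `book` goes out of scope here and its heap memory is freed

// Hypothetical helper: returning a value moves ownership back to the caller.
fn check_out_and_return(book: String) -> String {
    book
}

fn main() {
    let title = String::from("Dune");
    let n = check_out(title);
    // println!("{}", title); // would not compile: `title` was moved into check_out
    println!("length: {n}"); // prints "length: 4"

    let title2 = String::from("Emma");
    let title2 = check_out_and_return(title2); // ownership comes back
    println!("still ours: {title2}"); // prints "still ours: Emma"
}
```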

While Rust's ownership model revolves around ownership transfers and borrowing, it offers flexibility in how data is handled. Not all data types are treated the same:

Simple data types like integers (i32), booleans (bool), and floating-point numbers (f64) implement the Copy trait and are therefore called "copy" types. When you assign a copy-type variable to another, a bit-wise copy of the value is made instead of a move. Both variables remain valid, each holding its own independent value. This is similar to copying a number onto another piece of paper: you end up with two identical copies.

fn main() {
    let x = 42; // x owns the value 42
    let y = x;  // 42 is copied into y; x remains valid

    println!("x: {}", x); // Output: x: 42
    println!("y: {}", y); // Output: y: 42
}

Rust does this because such scalar values have a fixed size known at compile time and don't own any heap memory, so copying them is cheap and never involves allocating or freeing memory at runtime.

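While copying works well for simple data types, it falls short for types that own heap data. A String, for example, is made up of a pointer to its heap buffer, a length, and a capacity. A bit-wise copy would duplicate only these three fields, leaving both variables pointing at the same heap buffer: modifying the data through one would affect the other, and when both went out of scope, each would try to free the same memory (a double free). That's why types like String and Vec are not Copy, and assigning them moves ownership instead.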
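The same distinction applies to your own types. In this sketch, with hypothetical Point and Player structs, a struct made only of fixed-size Copy fields can derive Copy, while a struct that owns a String can only derive Clone. Copying a Player therefore has to be requested with the clone() method (covered next), and plain assignment moves it.

```rust
// Hypothetical struct made only of fixed-size Copy fields: it may derive Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical struct that owns heap data (a String): it cannot derive Copy.
// Adding `Copy` to this derive list would be a compile error.
#[derive(Debug, Clone, PartialEq)]
struct Player {
    name: String,
    pos: Point,
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a; // bit-wise copy; `a` is still usable
    println!("{:?} {:?}", a, b);

    let p1 = Player { name: String::from("Alice"), pos: a };
    let p2 = p1.clone(); // explicit deep copy: a new heap buffer for `name`
    let p3 = p1; // move: `p1` can no longer be used after this line
    println!("{:?} {:?}", p2, p3);
}
```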

When you do want a true deep copy in Rust, call the clone() method explicitly. It is provided by types that implement the Clone trait; for types like String and Vec, it allocates new heap memory and copies all of the internal data into it, so the original and the clone each own their own memory.

fn main() {
    let mut complex_data = vec![1, 2, 3]; // complex_data owns the vector (mut so we can push)
    let deep_copy = complex_data.clone(); // a deep copy of the vector is created

    complex_data.push(4); // modify only the original vector

    println!("complex_data: {:?}", complex_data); // Output: complex_data: [1, 2, 3, 4]
    println!("deep_copy: {:?}", deep_copy);       // Output: deep_copy: [1, 2, 3]
}
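clone() also handles nested data. The sketch below clones a Vec<String> and then uses as_ptr() to check that neither the outer buffer nor any inner String's buffer is shared with the original. The helper shares_no_heap is hypothetical, and the pointer comparison assumes non-empty strings (an empty String may not allocate at all).

```rust
// Sketch: clone() on a Vec<String> copies the outer buffer AND every
// inner String, so no heap memory is shared between the two values.
// Hypothetical helper; assumes all strings are non-empty.
fn shares_no_heap(original: &[String], copy: &[String]) -> bool {
    original.as_ptr() != copy.as_ptr()
        && original.iter().zip(copy).all(|(a, b)| a.as_ptr() != b.as_ptr())
}

fn main() {
    let names = vec![String::from("Alice"), String::from("Bob")];
    let mut copy = names.clone();
    copy[0].push_str(" Smith"); // modifies only the clone's inner String

    println!("{:?}", names); // ["Alice", "Bob"]
    println!("{:?}", copy); // ["Alice Smith", "Bob"]
    println!("independent heap data: {}", shares_no_heap(&names, &copy)); // true
}
```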


Conclusion

Great! Today we learned the basics of Rust's ownership model and how it manages heap-allocated structures safely without a garbage collector. The idea of tying a resource's cleanup to the scope of its owner isn't unique to Rust: C++ has it in the form of RAII (Resource Acquisition Is Initialization). What sets Rust apart is that ownership is enforced by the compiler: you can't opt out of it in safe Rust, and mistakes such as using a moved-from value are caught at compile time rather than at runtime.

In upcoming articles, we'll take a deep dive into the intricacies of the Rust ownership model. We'll explore how it defines lifetimes and how other variables can borrow data without violating the ownership model. This will equip you with the knowledge to write memory-efficient and secure Rust programs.

For more articles like this on Rust, see: https://inpyjama.com/tag/rust/