Day 11: Handling Heap
Welcome back, fellow Rustacians! 🦀
In our previous article, we delved into the memory layout of an application and took a quick look at heap memory, along with some of the challenges that come with using it. Today, we’ll explore how different programming languages handle heap memory, with a focus on Python’s garbage collection and C/C++'s manual control. So, let’s get started!
Managing Heap is a critical aspect of any programming language. It involves allocating and deallocating memory dynamically during the execution of a program. Depending on their design, languages typically adopt one of two common strategies for managing heap memory: manual control, as seen in C/C++, and automatic management using a garbage collector, as done by Python.
Shallow vs Deep copy
Before we delve into the concept of a garbage collector, it’s crucial to understand how different languages create a new variable from another variable that’s allocated on the heap. For this discussion, let’s consider strings as an example.
Consider the following Python code:
# Python code
treasure_map = ["gold", "diamonds", "ruby"]
another_map = treasure_map
# Modifying another_map changes the original! (Shallow Copy)
another_map[0] = "shiny new sword"
print(treasure_map) # Output: ["shiny new sword", "diamonds", "ruby"]
In this code, we create a list of strings called treasure_map
. Then we create a new variable another_map
from treasure_map
. Our general expectation might be that treasure_map
and another_map
are independent variables and modifying one should not affect the other. Let's run this code.
Hmmm, this is not what we expected. When we modify another_map
, treasure_map
also gets updated.
Now, let’s try the same thing with C++:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main () {
vector<string> treasure_map = {"gold", "diamonds", "ruby"};
vector<string> another_map = treasure_map ;
another_map[0] = "shiny new sword";
cout << "treasure_map : " << treasure_map[0] << endl;
cout << "another_map : " << another_map[0] << endl;
return 0;
}
When we run this code, it matches our expectation: updating another_map
does not update treasure_map
. So, did we find a bug in Python? Not at all!
This is because Python performs what’s known as a shallow copy, while C++, at least for vectors and strings, performs a deep copy.
A shallow copy creates a new object and inserts references to the elements of the original object. So, if you modify the original object, the changes will be reflected in the shallow copy. Shallow copying is faster and uses less memory because it only copies the references to objects, not the objects themselves. In some cases, having multiple references to the same object can be useful. For example, if multiple parts of your code need to interact with the same object, a shallow copy can be an excellent choice. However, Since the copied variables point to the same memory location, changes to the original object will affect the copied object and vice versa. This can lead to unexpected behavior if not handled carefully.
A deep copy, on the other hand, creates a new object and recursively adds copies of the elements of the original object. Changes to the original object do not affect the deep copy. This is useful when you need to work with a copy of an object without affecting the original. However, as you can predict performing a deep copy takes a toll on both memory and cpu cycles. Deep copying is slower and uses more memory than shallow copying because it involves creating copies of all elements of the copied object. Implementing deep copy can be complex, especially for objects with nested or recursive data structures. You need to recursively make a copy of all the members of the structures.
Garbage Collection
Alright, now that we have a solid grasp of deep and shallow copying, let’s turn our attention to how Python manages heap memory. Python employs a mechanism known as garbage collection for automatic memory management. The garbage collector, a part of Python’s standard library, is engineered to reclaim memory that’s occupied by objects no longer in use by the program.
The garbage collector in Python operates using a method called reference counting. Each object has a count that keeps track of the number of references to it. When an object’s reference count falls to zero, it becomes unreachable and is marked for garbage collection.
As we observed in the previous example, when we created a new variable another_map
from treasure_map
, Python performed a shallow copy instead of allocating a completely new copy of the object. It accomplished this by increasing the reference count of the object.
As variables fall out of scope, the reference count decreases to zero. Python runs its garbage collector periodically. When the garbage collector encounters an object with a reference count of zero, it frees up its memory.
There are two statements here that might raise eyebrows for an embedded or systems engineer. Firstly, Python runs its garbage collector periodically, which means the garbage collector consumes additional CPU cycles and memory. This can be critical at times, especially since most embedded devices have very constrained memory. Secondly, when the garbage collector encounters an object with a reference count of zero, it frees up its memory. This implies that memory isn’t freed immediately. This can be crucial when working with limited memory where you want to know exactly when the memory is freed.
Manual Control
On the contrary, dynamic memory management in C is manually controlled by the programmer. This means that you, as the programmer, have direct control over when and how memory is allocated and deallocated. The functions malloc
and free
are used for this purpose. malloc
is a function that allocates a specified amount of memory and returns a pointer to the beginning of the allocated block. Here’s an example:
int *p = malloc(sizeof(int) * 10); // Allocates memory for 10 integers
In this example, malloc
allocates enough memory to hold 10 integers and returns a pointer to the first byte of this memory. The pointer p
can now be used like an array.
free
is a function that deallocates memory previously allocated by malloc
. Here’s an example:
free(p); // Deallocates the memory
In this example, free
deallocates the memory that p
points to. After this line, p
should not be used as it is now a dangling pointer.
Manual control of memory management can lead to more efficient programs because it gives you the power to decide exactly when and how memory is used. It also saves precious CPU cycles consumed by the garbage collector periodically. This can be particularly important in systems with limited resources, such as embedded systems.
While manual control gives you more power, it also comes with more responsibility and potential problems:
1- Memory Leaks: If you forget to call free
, the memory remains allocated, leading to a memory leak. This can be a problem in long-running programs where memory leaks can add up over time and consume a lot of memory.
int *p = malloc(sizeof(int) * 10);
// Forgot to call free(p), causing a memory leak
2- Dangling Pointers: If you call free
and then continue to use the pointer, you have a dangling pointer. This can lead to undefined behavior and hard-to-find bugs.
free(p);
*p = 5; // Undefined behavior, p is a dangling pointer
Conclusion
Today we explored how different languages handle dynamic memory – that's the memory your program grabs and releases while it's running. We've even looked at the pros and cons of different approaches. But what about Rust?
In our next article, we'll look into how Rust handles dynamic memory. We'll start by investigating our first heap-allocated type String. Using Strings, we'll see exactly how Rust's approach sets it apart from other languages.
Discussion