Welcome back fellow Rustaceans 🦀

In the last article, we learned how different languages move around variables allocated on the heap. Then we looked at how this memory is allocated and freed in different languages. Before we look at how all of this is done in Rust, we will look at our first heap-allocated type in Rust: Strings. A String is a growable, heap-allocated buffer of UTF-8 encoded text. It can grow and shrink in size, and it offers a variety of methods for manipulation.

In Rust, we use the String type to represent strings. To create a string in Rust, you can use the from method of the String type from the Rust standard library. When we call this method, the Rust library internally allocates memory on the heap and initializes it with the content passed to the from method.

let mut my_string = String::from("Rust🦀"); 

One of the things you will notice is that we can have emojis as part of strings. So awesome, right? Well, if you remember what we learned on day 5 of this series, Rust's char type is not limited to a single byte. Instead, it represents a single Unicode Scalar Value. Similarly, Rust strings are not limited to ASCII: a String is always valid UTF-8, so it can hold any Unicode text. This ensures your Rust programs can handle text from around the world!

Now let's try to access the first element of the string. From what we know in C, strings can be thought of as arrays of char and can be accessed using indexes. Let's try that in Rust:

let mut message = String::from("Rust🦀"); 
println!("First element of message is {}", message[0]);

Okay, the Rust compiler is not happy again. Can you guess why?

First, let me ask you a question: what is the size or length of this string? You might say 4 single-byte chars plus whatever the size of the 🦀 is. Well, here's your answer! Strings can have elements of varying sizes, because using the same size for every char would waste a lot of memory. In UTF-8, the 🦀 needs 4 bytes, but R can be represented using a single byte, so this string holds 5 chars in 8 bytes. Since the elements of a Rust string are not all the same size, a plain index is ambiguous: a byte offset could land in the middle of a character, and finding the nth char means walking the string from the start. That is why Rust does not let us index a String with an integer directly.
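To make the varying sizes concrete, here is a minimal sketch that prints how many bytes each char of our string takes in UTF-8. The helper name utf8_widths is made up for this illustration; chars(), len_utf8(), len() and count() come from the standard library.

```rust
// Hypothetical helper for illustration: pairs every char in `text`
// with the number of bytes it occupies when encoded as UTF-8.
fn utf8_widths(text: &str) -> Vec<(char, usize)> {
    text.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    let message = String::from("Rust🦀");

    for (c, width) in utf8_widths(&message) {
        println!("{} takes {} byte(s)", c, width);
    }

    // len() counts bytes, while chars().count() counts Unicode scalar values.
    println!("len() = {}", message.len()); // 8
    println!("chars().count() = {}", message.chars().count()); // 5
}
```

Notice that len() and chars().count() disagree as soon as the string contains anything outside ASCII.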

So, how can I access individual elements in Rust? Well, the Rust compiler, as always, suggests a solution: use either chars().nth() or bytes().nth(). What Rust wants you to do is turn the string into an iterator of chars or an iterator of bytes and then pick the element you want. Both calls return an Option, which is None if the index is past the end. However, you should be cautious, since elements are of varying sizes. In "Rust🦀", the crab occupies bytes 4 through 7, so bytes().nth(4) gives you only the first byte of the emoji, which is not what we expect. Hence, the right way to read whole characters is chars().nth(), and chars().nth(4) returns the crab.
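Here is a small sketch comparing the two approaches on our string. The wrappers nth_char and nth_byte are hypothetical names used only for this example; they simply forward to chars().nth() and bytes().nth().

```rust
// Hypothetical wrappers for illustration only.
fn nth_char(text: &str, index: usize) -> Option<char> {
    text.chars().nth(index)
}

fn nth_byte(text: &str, index: usize) -> Option<u8> {
    text.bytes().nth(index)
}

fn main() {
    let message = String::from("Rust🦀");

    // Char positions: R=0, u=1, s=2, t=3, 🦀=4.
    println!("{:?}", nth_char(&message, 0)); // Some('R')
    println!("{:?}", nth_char(&message, 4)); // Some('🦀')

    // Byte offsets 4 through 7 hold the crab's UTF-8 encoding: F0 9F A6 80.
    println!("{:?}", nth_byte(&message, 4)); // Some(240), i.e. 0xF0
    println!("{:?}", nth_byte(&message, 5)); // Some(159), i.e. 0x9F

    // Out-of-range indexes give None instead of panicking.
    println!("{:?}", nth_char(&message, 10)); // None
}
```

Keep in mind that chars().nth(n) has to walk the string from the start, so it is not a constant-time lookup like array indexing in C.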

Let's have a look at how strings are represented in memory. A String internally has three parts: a pointer to the data on the heap, the length of the string (how many bytes), and its capacity (the total allocated space on the heap). One of the things to notice is that Rust's String type explicitly stores the length of the string. This eliminates the need for null termination and prevents the risk of reading invalid memory.
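We can peek at these three parts from safe code. The sketch below uses a made-up helper, string_parts, that returns what as_ptr(), len() and capacity() report. The pointer value changes from run to run, and the last line assumes the String handle is exactly three machine words, which holds on current compilers but is not a documented guarantee.

```rust
// Hypothetical helper for illustration: returns the three parts of a String
// as reported by the standard library accessors.
fn string_parts(text: &String) -> (*const u8, usize, usize) {
    (text.as_ptr(), text.len(), text.capacity())
}

fn main() {
    let message = String::from("Rust🦀");
    let (pointer, length, capacity) = string_parts(&message);

    println!("pointer  = {:p}", pointer); // address of the heap buffer
    println!("length   = {} bytes", length); // bytes currently in use
    println!("capacity = {} bytes", capacity); // bytes allocated, always >= length

    // Assumption: on current compilers the handle is three words wide.
    println!(
        "size of the String handle = {} bytes",
        std::mem::size_of::<String>()
    );
}
```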

So, what happens when you want to expand this string? You can use push_str to append a string slice (or push to append a single char). If the current capacity is enough, Rust will simply write the new bytes after the existing ones. However, if it's not, Rust will allocate a new chunk of memory on the heap (likely larger than the original) to accommodate the expanded string and then copy the content over.
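The sketch below watches the capacity before and after push_str to see when the buffer has to grow. The helper push_and_report is a hypothetical name for this example, and the starting capacity of 16 is an arbitrary choice. The exact capacity after growth depends on the allocator strategy, so the code only reports whether it changed.

```rust
// Hypothetical helper for illustration: appends `extra` to `text` and
// reports whether the heap buffer had to grow to fit it.
fn push_and_report(text: &mut String, extra: &str) -> bool {
    let before = text.capacity();
    text.push_str(extra);
    text.capacity() > before
}

fn main() {
    // Reserve room for at least 16 bytes up front (arbitrary number).
    let mut message = String::with_capacity(16);
    message.push_str("Rust");

    // 4 bytes used + 4 bytes for the crab fits in the reserved space.
    let grew = push_and_report(&mut message, "🦀");
    println!("grew after pushing the crab: {}", grew); // false

    // Appending as many bytes as the whole capacity cannot fit, so Rust reallocates.
    let filler = "!".repeat(message.capacity());
    let grew = push_and_report(&mut message, &filler);
    println!("grew after pushing the filler: {}", grew); // true
    println!("len = {}, capacity = {}", message.len(), message.capacity());
}
```

If you know roughly how large a string will get, String::with_capacity lets you pay for the allocation once instead of reallocating as the string grows.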

Conclusion

Okay, great! Today we worked with our first heap-allocated type and learnt a bit about Rust Strings. However, there is so much more to Strings that a single article is not going to be enough to master them. You can learn more about Rust strings from the Rust standard library documentation, or you can refer to the Rust Book. With Strings, we laid the foundation for what we are going to learn next: how Rust handles dynamic memory and how it is different from the other languages that we learned about in previous articles.