A linked list is a basic data structure widely used among developers to implement stack, queue, hash table, graph, etc. It is quite extensively used in the Linux kernel to implement process and memory management. In this blog, I will explain why the Linux kernel implements linked lists differently and how.

A generic Linked List node contains data and reference pointers pointing to the next or previous node.

struct node {
    int data;
    struct node *prev;
    struct node *next;
}

The head pointer points to the start of the linked list. A typical linked list implementation requires three basic routines -

1. initialize(struct node *head);
2. struct node * add_entry(struct node *head, struct node *new_node);
3. struct node * remove_entry(struct node *head);
  1. initialize: To initialize a linked list head.
  2. add_entry: Add a new node at the start of the list.
  3. remove_entry: Remove a node from the start of the list.

The problem with this approach

Let us say I want to create another linked list struct student and use these routines for list operations.

Will it work? The answer is - NO! Why?

We will have to re-implement all these three functions again for struct student. The link pointers (prev/next) in struct node are pointers to the node itself and can't be made to point to any other struct. In simpler words, the above initialize, add_entry,remove_entry functions can't be used to create another linked list of struct student as it only accepts struct node pointer.

Intrusive Implementation: The Linux way

The idea is to make the link pointers independent of the struct node it is pointing to. So what if we isolate link pointers in a separate struct list_head and add it to struct node.

struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

Now let us create the actual struct node containing data and add struct list_head to it.

struct node {
  int data;
  struct list_head list;
};

Let us create a list head pointing to the start of the list and initialize it.

struct list_head list;
void initialize(struct list_head *list) {
  list->next = list;
  list->prev = list;
}

...

initialize(&list);

Now, let us add a new node to the list using add_entry function.

void add_entry(struct list_head *list, struct list_head *new) {
    struct list_head *next;
    
    next = list->next;
    
    next->prev = new;
    list->next = new;
    new->prev = list;
    new->next = list;
}

...

struct node A;
A.data = 1;

...

add_entry(&list, &(A.list));

Let us add another struct node.

struct node B;
B.data = 2;

add_entry(&list, &(B.list));

Here, the link pointers in a node points to the link pointers in the newly added node and not the node itself. This makes the linked list independent of the actual data node being added.

You might be wondering if we have a reference for the list_head in the struct node but not the node itself. How do we access the data part of struct node.

Let us say you have a pointer to the struct node, would you be able to get the pointer to list_node contained in it?

Going ahead we will learn how to retrieve the reference to struct node given the reference to list_head.

💡
All the fields in a struct node gets a contiguous chunk of memory space

Introducing OFFSET_OF Macro

#define OFFSET_OF(Type, Field)  (unsinged long long) ((&(((Type*)0)->Field)))

offset_of_list = OFFSET_OF(struct node, list);

This macro calculates the offset of a field Field in a struct Type. For example, let us say you have a reference to a struct node, you can calculate the address of struct list contained in it by &(head->list).

Consider head to be a pointer to a struct node with the starting address as 0x0, then &(head->list) will be the offset of the field list.

Macro simplified

  • ((Type *)0x0): Takes the integer 0x0the and casts it to the pointer of the type Type.
  • (((Type*)0x0)->Field): Dereferences that pointer to point to structure member Field.
  • (&(((Type*)0x0)->Field)): Computes the address of Field, which effectively becomes the offset of the Field as well.
💡
We can assume placing a struct node at the memory address zero and then finding out at what address a particular field list is.

Now that you can retrieve the address of a field within a struct. Can we do the reverse? Let us say we have the address of a field list in struct node, Can I compute the address of struct node itself?

Yes, All we need to do is subtract the offset of the list from the address of list. This can be done by a beautiful macro.

The CONTAINER_OF Macro

#define CONTAINER_OF(ptr, Type, Field) \
                        (Type*)((unsigned long long)(ptr) - OFFSET_OF(Type, Field))

That's how using the macros and the pointer tracking infra that Linux provides, we can access all the linked list nodes without maintaining a direct pointer to it...