Documentation

Kernel & OS

Operating system kernels are among the most demanding environments for data structure design. Every allocation must be fast, every structure must work without the luxury of garbage collection or dynamic resizing, and correctness failures cause system crashes rather than program exceptions.

Why Kernel Code Has Strict Data Structure Requirements

Kernel code runs with interrupts disabled in critical sections, cannot call malloc in contexts where sleeping is forbidden, and must never leak memory or corrupt adjacent allocations. These constraints rule out most high-level data structures that rely on dynamic allocation or locking. Instead, kernel developers use intrusive data structures — where the node pointers are embedded directly in the object being managed — to eliminate separate allocations. The Linux kernel's generic list, red-black tree, and radix tree implementations are all intrusive, and understanding why requires understanding both the performance and safety requirements of kernel code.

Linked Lists in the Linux Kernel

Linux uses a circular doubly-linked list (struct list_head) embedded directly inside every major kernel object: task_struct for processes, file for open files, inode for filesystem nodes. Because the list node is inside the object, splicing a process onto a run queue or removing it is O(1) with no allocation. The kernel provides container_of() to recover the enclosing struct from a pointer to the embedded list_head, making the pattern both type-safe and zero-overhead. This design appears thousands of times in the kernel source and is the canonical example of an intrusive linked list. Device driver writers use the same list_head to manage queues of pending I/O requests.

Heaps and Priority Queues in CPU Scheduling

The Completely Fair Scheduler (CFS) in Linux uses a red-black tree ordered by virtual runtime (vruntime) to select the next task to run. The leftmost node of the tree — cached for O(1) access — is always the task that has run the least, making schedule() fast even with thousands of runnable tasks. Real-time schedulers use priority queues (binary heaps) to pick the highest-priority runnable thread in O(log n). Timer management also relies on a min-heap: the kernel's hrtimer subsystem keeps all pending timers in a heap so that the next expiry can be found in O(1) and cancelled in O(log n). These structures make the difference between a scheduler that scales to 256-core machines and one that collapses under load.

Memory Allocators and Free Lists

The kernel's slab allocator (SLUB in modern Linux) maintains per-size-class free lists of pre-allocated objects, making kmalloc for common sizes effectively O(1). Each slab is a contiguous set of pages carved into fixed-size slots; a free list threads through the unused slots using a simple embedded pointer. For larger allocations, the buddy allocator manages pages in power-of-two blocks, merging adjacent free buddies into larger blocks and splitting blocks on allocation — a process that requires only array indexing and bit manipulation. The design ensures that fragmentation stays bounded and that allocation never requires a full heap scan. Understanding free lists and buddy systems is essential for anyone writing device drivers or kernel modules.

Real-Time Constraints and Data Structure Choice

In hard real-time kernels (PREEMPT_RT Linux, VxWorks, FreeRTOS), worst-case execution time matters more than average-case throughput. Hash tables with open addressing can have O(n) worst-case lookup on pathological inputs, which is unacceptable when missing a deadline causes a safety failure. Red-black trees guarantee O(log n) worst-case for all operations, which is why they dominate in scheduler and timer code. Stack depth must also be bounded — recursive tree traversals are replaced with iterative ones using an explicit stack to avoid overflowing the small, fixed kernel stack. Every data structure choice in a real-time kernel is an argument about worst-case bounds, not averages.