Memory Management¶
Goals¶
The OS manages physical memory (DRAM) on behalf of executing processes. To decouple process address spaces from physical memory constraints, the OS provides virtual memory — virtual addresses are translated to physical addresses transparently.
Virtual address range (V0 to Vmax) can be much larger than physical memory
OS must allocate physical memory and arbitrate access to it
Allocation requires data structures and algorithms to track used/free physical memory, plus policies for replacing contents when physical memory is smaller than virtual memory (swapping to/from secondary storage).
Arbitration requires fast translation and validation of virtual addresses → physical addresses. Modern systems use page tables (page-based) or segment registers (segment-based).
Paging is the dominant method: virtual memory is divided into fixed-size pages, physical memory into same-size page frames. The OS maps pages to page frames.
Hardware Support¶
Memory management relies on a combination of OS software and hardware mechanisms:
Memory Management Unit (MMU):
Translates virtual addresses → physical addresses
Generates faults on illegal access, permission violations, or when a page is not present in memory (must be fetched from disk)
Registers:
Point to the currently active page table (e.g., x86 CR3 register)
In segment-based systems: base address, limit, segment count
Translation Lookaside Buffer (TLB):
Small hardware cache of recent virtual → physical translations
Eliminates repeated page table lookups for frequently accessed addresses
Includes protection/validity bits for access verification
Hardware dictates supported memory management modes (paging, segmentation, or both), page sizes, and virtual/physical address formats. Software controls allocation policies and page replacement algorithms.
Page Tables¶
A page table maps virtual page numbers (VPN) to physical frame numbers (PFN). Only the VPN portion of a virtual address indexes into the page table; the remaining bits are the offset within the page.
Translation: VPN → page table lookup → PFN; PFN + offset = physical address.
Allocation on first touch: physical memory is allocated only when a process first accesses a virtual page, not when the address space is declared. Unused allocations consume no physical memory.
Page reclamation: pages unused for a long time may be evicted from physical memory (swapped to disk). When re-accessed, the page is loaded into a (potentially different) physical frame and the page table is updated.
Per-process page tables: the OS maintains a separate page table for each process. On context switch, the active page table pointer is updated (x86: CR3 register).
Page Table Entries¶
Each entry contains the physical frame number plus status/control bits:
Present/Valid bit — 1 if page is in physical memory; 0 triggers a page fault
Dirty bit — set on write; useful for tracking modified pages (e.g., file cache writeback)
Accessed bit — set on any read or write; used by page replacement algorithms
Protection bits — read-only vs. read/write, user-mode vs. supervisor-only
x86 Pentium page table entry includes: Present, Dirty, Accessed, R/W (0=read-only, 1=read/write), U/S (user/supervisor), cache control bits, and reserved bits.
Page faults: when the MMU cannot complete a translation, it pushes an error code onto the kernel stack and traps into the OS page fault handler, which determines the action — bring page from disk, enforce protection, etc. On x86, the faulting address is stored in CR2.
Page Table Size¶
For a flat (single-level) page table:
Entries = total virtual pages = (address space size) / (page size)
32-bit system, 4KB pages, 4B entries → page table = 4 MB per process
64-bit system, 4KB pages, 8B entries → page table = 32 PB per process (impractical)
Processes rarely use the entire virtual address space, yet a flat page table requires entries for every possible VPN.
Multi-Level Page Tables¶
Hierarchical structure that only allocates page table entries for used virtual memory regions:
Two-level example:
Outer level (page table directory) — entries point to inner page tables
Inner page tables — entries contain PFN + protection bits
Inner tables exist only for valid virtual memory regions; gaps in the address space save entire inner tables
Address format (e.g., 32-bit, 10+10+12):
First 10 bits → index into outer page table directory
Next 10 bits → index into inner page table → produces PFN
Last 12 bits → offset within page
Each inner page table covers 2^10 pages × 2^12 bytes/page = 1 MB. Any 1 MB gap eliminates one inner table.
64-bit architectures use 3 or 4 levels — larger, sparser address spaces benefit more from hierarchical tables. Trade-off: more levels reduce table size but increase translation latency (more memory accesses per lookup).
TLB (Translation Lookaside Buffer)¶
Multi-level page tables increase translation cost. The TLB cache mitigates this:
TLB hit → bypass all page table accesses (single-cycle translation)
TLB miss → walk all page table levels
Even a small TLB is effective due to temporal and spatial locality in memory references.
x86 (Intel i7) TLB sizes: 64-entry data TLB, 128-entry instruction TLB (per core), 512-entry shared L2 TLB.
Inverted Page Tables¶
Entries correspond to physical frames (not virtual pages) — table size proportional to physical memory, not virtual address space. Each entry stores (PID, virtual page number).
Translation: search the table for matching (PID, VPN) → entry index = physical frame number.
Problem: requires linear search (unordered table). Mitigated by TLB (catches most lookups) and hashing page tables — hash on part of the address to narrow search to a linked list of candidates.
Segmentation¶
Address space divided into variable-size segments (code, heap, stack, etc.). Virtual address = segment descriptor + offset. The descriptor indexes a descriptor table to produce the segment’s base address; base + offset = linear address.
Segments defined by base address and limit (size) registers. In practice, segmentation and paging are combined: the linear address from segmentation is passed through a multi-level page table to produce the final physical address.
x86 support: 32-bit hardware supports both segmentation and paging (Linux: up to 8K per-process + 8K global segments). 64-bit defaults to paging only (segmentation available for backward compatibility).
Page Size¶
The page offset field width determines page size. Common sizes on Linux/x86:
4 KB (default) — 12-bit offset
2 MB (“large pages”) — 21-bit offset
1 GB (“huge pages”) — 30-bit offset
Larger pages → fewer page table entries → smaller page tables (512× reduction per step), more TLB coverage. Downside: internal fragmentation — wasted space if the page is not densely used.
4 KB is the common default. Large/huge pages are beneficial for databases and in-memory data stores with large, dense memory footprints.
Memory Allocation¶
Kernel-level allocators:
Allocate pages for kernel state, process code/stack/static data
Track free physical memory
User-level allocators (e.g., malloc/free):
Manage dynamic heap allocations
Request pages from the kernel, then manage subdivisions internally
External Fragmentation¶
Interleaved allocate/free operations create non-contiguous free pages, preventing large contiguous allocations even when total free memory is sufficient.
Mitigation: allocators that leave gaps aligned to expected request sizes, enabling coalescing of adjacent free regions.
Linux Kernel Allocators¶
Buddy Allocator:
Manages memory in power-of-two sized chunks
On allocation: recursively subdivide until smallest fitting power-of-two chunk is found
On free: merge (“coalesce”) with adjacent buddy (same-size neighbor differing by one bit in address) and propagate merges up
Benefit: fast coalescing — buddy’s address is computed by flipping one bit
Drawback: internal fragmentation for non-power-of-two sizes
Slab Allocator:
Builds object caches on top of contiguous physical memory slabs
Pre-creates caches for common kernel objects (e.g.,
task_structat 1.7 KB)Allocations go to the matching cache; freed slots are reused by same-type objects
Eliminates internal fragmentation (exact object sizes) and external fragmentation (uniform slot sizes)
The combination of buddy + slab allocators effectively addresses both fragmentation types.
Demand Paging¶
Physical memory is smaller than virtual memory, so not all pages need to reside in DRAM simultaneously. Pages are swapped between main memory and secondary storage (swap partition on disk, flash, or remote memory).
Page fault flow:
Access to a page with present bit = 0 in the page table
MMU raises a page fault → traps into the OS kernel
OS locates the page on disk, issues I/O to retrieve it
OS places the page in a free physical frame, updates the page table entry
Process resumes, re-executes the faulting instruction (now succeeds)
The page’s physical address after swap-in may differ from its original location.
Page pinning: disables swapping for specific pages — required for DMA (Direct Memory Access) operations where devices need stable physical addresses.
Page Replacement¶
When to swap out: when memory usage exceeds a high watermark threshold, a page-out daemon selects pages to evict (preferably when CPU usage is low).
Which pages to swap out:
Least Recently Used (LRU): evict the page not accessed for the longest time. Uses the hardware accessed bit. Intuition: recently used pages are likely needed again soon.
Prefer clean pages: pages without the dirty bit set don’t need to be written to disk, saving I/O overhead.
Never swap pinned pages (kernel state, DMA buffers).
LRU pathological case: an array of N+1 pages accessed in a loop on a system with N page frames → 100% page fault rate (every access evicts the next needed page). Demonstrates that LRU can fail badly under certain access patterns.
Linux uses a second-chance LRU variant (two scans before eviction) and categorizes pages by type to guide replacement decisions. Administrators can tune swap thresholds and rates.
Copy-on-Write (COW)¶
On fork(), the child’s virtual address space is mapped to the same physical pages as the parent, with pages marked write-protected.
Read-only access: both processes share the same physical page — saves memory and copy time
Write access: MMU generates a page fault → OS creates the actual copy of the page, updates page tables, removes write protection for the faulting process
Only pages that are actually modified incur the copy cost.
Checkpointing¶
Periodically save the entire process state for failure recovery — on crash, restart from the nearest checkpoint instead of the beginning.
Optimized checkpointing using MMU support:
Write-protect the entire address space
Copy the process state (without pausing execution)
Track dirtied pages via hardware dirty bits
Save only the diffs (incremental checkpoint)
Trade-off: more frequent checkpointing → faster recovery, but higher overhead and more state captured (each write observed individually vs. amortized across infrequent checkpoints).
Related techniques:
Rewind-Replay (debugging): restart from a checkpoint, replay execution to locate bugs
Migration: checkpoint to another machine, restart there — used for disaster recovery and datacenter consolidation (incremental migration: repeated fast checkpoints until remaining dirty state is small enough for a final pause-and-copy)