Memory Management

Goals

The OS manages physical memory (DRAM) on behalf of executing processes. To decouple process address spaces from physical memory constraints, the OS provides virtual memory — virtual addresses are translated to physical addresses transparently.

  • Virtual address range (V0 to Vmax) can be much larger than physical memory

  • OS must allocate physical memory and arbitrate access to it

Allocation requires data structures and algorithms to track used/free physical memory, plus policies for replacing contents when physical memory is smaller than virtual memory (swapping to/from secondary storage).

Arbitration requires fast translation and validation of virtual addresses → physical addresses. Modern systems use page tables (page-based) or segment registers (segment-based).

Paging is the dominant method: virtual memory is divided into fixed-size pages, physical memory into same-size page frames. The OS maps pages to page frames.

Hardware Support

Memory management relies on a combination of OS software and hardware mechanisms:

Memory Management Unit (MMU):

  • Translates virtual addresses → physical addresses

  • Generates faults on illegal access, permission violations, or when a page is not present in memory (must be fetched from disk)

Registers:

  • Point to the currently active page table (e.g., x86 CR3 register)

  • In segment-based systems: base address, limit, segment count

Translation Lookaside Buffer (TLB):

  • Small hardware cache of recent virtual → physical translations

  • Eliminates repeated page table lookups for frequently accessed addresses

  • Includes protection/validity bits for access verification

Hardware dictates supported memory management modes (paging, segmentation, or both), page sizes, and virtual/physical address formats. Software controls allocation policies and page replacement algorithms.

Page Tables

A page table maps virtual page numbers (VPN) to physical frame numbers (PFN). Only the VPN portion of a virtual address indexes into the page table; the remaining bits are the offset within the page.

Translation: VPN → page table lookup → PFN; PFN + offset = physical address.

Allocation on first touch: physical memory is allocated only when a process first accesses a virtual page, not when the address space is declared. Unused allocations consume no physical memory.

Page reclamation: pages unused for a long time may be evicted from physical memory (swapped to disk). When re-accessed, the page is loaded into a (potentially different) physical frame and the page table is updated.

Per-process page tables: the OS maintains a separate page table for each process. On context switch, the active page table pointer is updated (x86: CR3 register).

Page Table Entries

Each entry contains the physical frame number plus status/control bits:

  • Present/Valid bit — 1 if page is in physical memory; 0 triggers a page fault

  • Dirty bit — set on write; useful for tracking modified pages (e.g., file cache writeback)

  • Accessed bit — set on any read or write; used by page replacement algorithms

  • Protection bits — read-only vs. read/write, user-mode vs. supervisor-only

x86 Pentium page table entry includes: Present, Dirty, Accessed, R/W (0=read-only, 1=read/write), U/S (user/supervisor), cache control bits, and reserved bits.

Page faults: when the MMU cannot complete a translation, it pushes an error code onto the kernel stack and traps into the OS page fault handler, which determines the action — bring page from disk, enforce protection, etc. On x86, the faulting address is stored in CR2.

Page Table Size

For a flat (single-level) page table:

  • Entries = total virtual pages = (address space size) / (page size)

  • 32-bit system, 4KB pages, 4B entries → page table = 4 MB per process

  • 64-bit system, 4KB pages, 8B entries → page table = 32 PB per process (impractical)

Processes rarely use the entire virtual address space, yet a flat page table requires entries for every possible VPN.

Multi-Level Page Tables

Hierarchical structure that only allocates page table entries for used virtual memory regions:

Two-level example:

  • Outer level (page table directory) — entries point to inner page tables

  • Inner page tables — entries contain PFN + protection bits

  • Inner tables exist only for valid virtual memory regions; gaps in the address space save entire inner tables

Address format (e.g., 32-bit, 10+10+12):

  • First 10 bits → index into outer page table directory

  • Next 10 bits → index into inner page table → produces PFN

  • Last 12 bits → offset within page

Each inner page table covers 2^10 pages × 2^12 bytes/page = 1 MB. Any 1 MB gap eliminates one inner table.

64-bit architectures use 3 or 4 levels — larger, sparser address spaces benefit more from hierarchical tables. Trade-off: more levels reduce table size but increase translation latency (more memory accesses per lookup).

TLB (Translation Lookaside Buffer)

Multi-level page tables increase translation cost. The TLB cache mitigates this:

  • TLB hit → bypass all page table accesses (single-cycle translation)

  • TLB miss → walk all page table levels

Even a small TLB is effective due to temporal and spatial locality in memory references.

x86 (Intel i7) TLB sizes: 64-entry data TLB, 128-entry instruction TLB (per core), 512-entry shared L2 TLB.

Inverted Page Tables

Entries correspond to physical frames (not virtual pages) — table size proportional to physical memory, not virtual address space. Each entry stores (PID, virtual page number).

Translation: search the table for matching (PID, VPN) → entry index = physical frame number.

Problem: requires linear search (unordered table). Mitigated by TLB (catches most lookups) and hashing page tables — hash on part of the address to narrow search to a linked list of candidates.

Segmentation

Address space divided into variable-size segments (code, heap, stack, etc.). Virtual address = segment descriptor + offset. The descriptor indexes a descriptor table to produce the segment’s base address; base + offset = linear address.

Segments defined by base address and limit (size) registers. In practice, segmentation and paging are combined: the linear address from segmentation is passed through a multi-level page table to produce the final physical address.

x86 support: 32-bit hardware supports both segmentation and paging (Linux: up to 8K per-process + 8K global segments). 64-bit defaults to paging only (segmentation available for backward compatibility).

Page Size

The page offset field width determines page size. Common sizes on Linux/x86:

  • 4 KB (default) — 12-bit offset

  • 2 MB (“large pages”) — 21-bit offset

  • 1 GB (“huge pages”) — 30-bit offset

Larger pages → fewer page table entries → smaller page tables (512× reduction per step), more TLB coverage. Downside: internal fragmentation — wasted space if the page is not densely used.

4 KB is the common default. Large/huge pages are beneficial for databases and in-memory data stores with large, dense memory footprints.

Memory Allocation

Kernel-level allocators:

  • Allocate pages for kernel state, process code/stack/static data

  • Track free physical memory

User-level allocators (e.g., malloc/free):

  • Manage dynamic heap allocations

  • Request pages from the kernel, then manage subdivisions internally

External Fragmentation

Interleaved allocate/free operations create non-contiguous free pages, preventing large contiguous allocations even when total free memory is sufficient.

Mitigation: allocators that leave gaps aligned to expected request sizes, enabling coalescing of adjacent free regions.

Linux Kernel Allocators

Buddy Allocator:

  • Manages memory in power-of-two sized chunks

  • On allocation: recursively subdivide until smallest fitting power-of-two chunk is found

  • On free: merge (“coalesce”) with adjacent buddy (same-size neighbor differing by one bit in address) and propagate merges up

  • Benefit: fast coalescing — buddy’s address is computed by flipping one bit

  • Drawback: internal fragmentation for non-power-of-two sizes

Slab Allocator:

  • Builds object caches on top of contiguous physical memory slabs

  • Pre-creates caches for common kernel objects (e.g., task_struct at 1.7 KB)

  • Allocations go to the matching cache; freed slots are reused by same-type objects

  • Eliminates internal fragmentation (exact object sizes) and external fragmentation (uniform slot sizes)

The combination of buddy + slab allocators effectively addresses both fragmentation types.

Demand Paging

Physical memory is smaller than virtual memory, so not all pages need to reside in DRAM simultaneously. Pages are swapped between main memory and secondary storage (swap partition on disk, flash, or remote memory).

Page fault flow:

  1. Access to a page with present bit = 0 in the page table

  2. MMU raises a page fault → traps into the OS kernel

  3. OS locates the page on disk, issues I/O to retrieve it

  4. OS places the page in a free physical frame, updates the page table entry

  5. Process resumes, re-executes the faulting instruction (now succeeds)

The page’s physical address after swap-in may differ from its original location.

Page pinning: disables swapping for specific pages — required for DMA (Direct Memory Access) operations where devices need stable physical addresses.

Page Replacement

When to swap out: when memory usage exceeds a high watermark threshold, a page-out daemon selects pages to evict (preferably when CPU usage is low).

Which pages to swap out:

  • Least Recently Used (LRU): evict the page not accessed for the longest time. Uses the hardware accessed bit. Intuition: recently used pages are likely needed again soon.

  • Prefer clean pages: pages without the dirty bit set don’t need to be written to disk, saving I/O overhead.

  • Never swap pinned pages (kernel state, DMA buffers).

LRU pathological case: an array of N+1 pages accessed in a loop on a system with N page frames → 100% page fault rate (every access evicts the next needed page). Demonstrates that LRU can fail badly under certain access patterns.

Linux uses a second-chance LRU variant (two scans before eviction) and categorizes pages by type to guide replacement decisions. Administrators can tune swap thresholds and rates.

Copy-on-Write (COW)

On fork(), the child’s virtual address space is mapped to the same physical pages as the parent, with pages marked write-protected.

  • Read-only access: both processes share the same physical page — saves memory and copy time

  • Write access: MMU generates a page fault → OS creates the actual copy of the page, updates page tables, removes write protection for the faulting process

Only pages that are actually modified incur the copy cost.

Checkpointing

Periodically save the entire process state for failure recovery — on crash, restart from the nearest checkpoint instead of the beginning.

Optimized checkpointing using MMU support:

  1. Write-protect the entire address space

  2. Copy the process state (without pausing execution)

  3. Track dirtied pages via hardware dirty bits

  4. Save only the diffs (incremental checkpoint)

Trade-off: more frequent checkpointing → faster recovery, but higher overhead and more state captured (each write observed individually vs. amortized across infrequent checkpoints).

Related techniques:

  • Rewind-Replay (debugging): restart from a checkpoint, replay execution to locate bugs

  • Migration: checkpoint to another machine, restart there — used for disaster recovery and datacenter consolidation (incremental migration: repeated fast checkpoints until remaining dirty state is small enough for a final pause-and-copy)