Virtualization

What is Virtualization

Virtualization originated in the 1960s at IBM to allow multiple operating systems to concurrently run on shared mainframe hardware. Each OS deployed on a physical platform has the illusion it owns the underlying hardware resources (or a subset). An OS together with its applications and virtual resources is called a virtual machine (VM), also referred to as a guest or domain.

The virtualization layer — called a virtual machine monitor (VMM) or hypervisor — manages allocation of physical resources and provides isolation across VMs.

Defining Virtualization

Popek and Goldberg (1974) formally define a virtual machine as an efficient, isolated duplicate of the real machine. The VMM has three essential properties:

  • Fidelity — the VM environment is essentially identical to the original machine (same CPU type, device types, etc.)

  • Performance — programs show at worst only minor speed decrease; given equivalent resources, performance matches native execution

  • Safety/Control — the VMM has complete control of system resources, ensuring isolation among VMs; it decides which VMs get direct hardware access and prevents VMs from violating those policies

By this definition, VirtualBox qualifies as virtualization. The Java Virtual Machine (a language runtime, not a hardware duplicate) and hardware emulators (e.g., Virtual GameBoy, which presents hardware unlike the host's and so fails the fidelity and performance criteria) do not.

Benefits of Virtualization

  • Consolidation — run multiple VMs on a single physical platform, reducing hardware count, space, power, and admin overhead

  • Migration & cloning — VMs decouple OS+apps from physical hardware, enabling live migration and cloning for availability and reliability

  • Isolation & security — bugs or malicious behavior are contained within VM boundaries

  • OS research — rapid prototyping and debugging of OS features without hardware restarts

  • Legacy support — legacy OSes can be packaged in VMs and co-located with modern workloads

Historically, virtualization was not widely adopted because mainframes were uncommon and commodity x86 hardware was cheap, so companies simply bought more servers. The result was utilization rates of roughly 10-20%, sprawling data centers, rising administration costs, and power/cooling bills reportedly consuming ~70% of IT budgets. Revisiting virtualization as a way to consolidate workloads became economically necessary.

Virtualization Models

Bare-Metal (Type 1)

The VMM/hypervisor sits directly on hardware and manages physical resources. A privileged service VM (running a standard OS with full hardware access) handles device drivers, management, and configuration.

  • Xen — open-source hypervisor (also commercialized as Citrix XenServer). VMs are called domains: dom0 is the privileged domain running drivers; domUs are guest VMs.

  • VMware ESX — hypervisor with largest market share in virtualized server cores. Exports APIs for third-party configuration. Originally had a Linux-based control core; now uses remote APIs.

  • Microsoft Hyper-V — also bare-metal.

Hosted (Type 2)

A full host OS manages all hardware. A VMM module provides virtual platform interfaces and scheduling for guest VMs, invoking host OS drivers as needed. Native applications can run alongside guest VMs.

  • KVM (Kernel-based VM) — Linux kernel module combined with QEMU (hardware emulator used in virtualizer mode). Guest VMs access actual hardware resources; QEMU intervenes for critical operations (e.g., I/O) and passes control to the KVM module/host OS. KVM leverages the Linux open-source ecosystem for rapid adaptation to new hardware, devices, and security patches. Originally developed to let Linux apps use virtualization hardware; became a full virtualization solution within three months.

  • VMware Fusion, VirtualBox, VMware Player — all hosted.

Note: KVM’s host OS switches into a hypervisor-like mode via the kernel module, making the rest of the OS play a secondary role similar to a privileged partition.

Hardware Protection Levels

x86 provides four protection rings:

  • Ring 0 — highest privilege, full hardware access (native OS resides here)

  • Ring 3 — least privilege (applications reside here)

  • Rings 1-2 available for intermediate privilege levels

For virtualization: hypervisor at ring 0, guest OS at ring 1, applications at ring 3.

Modern x86 adds root/non-root modes, each with four rings:

  • Root mode (host) — all operations permitted; hypervisor runs at ring 0 of root mode

  • Non-root mode (guest) — certain operations not permitted; guest OS at ring 0, apps at ring 3

Privileged operations in non-root mode cause VMexits (traps to root mode/hypervisor). The hypervisor returns control via VMentry.

Processor Virtualization

Trap-and-Emulate

Guest instructions execute directly on hardware — the VMM does not interpose on every instruction. As long as the guest operates within allocated resources, instructions run at hardware speed.

When a privileged instruction is issued, the processor traps to the hypervisor, which either:

  • Allows the operation — emulates the expected behavior so the guest sees correct results

  • Denies the operation — terminates the VM or returns an error

This trap-and-emulate mechanism is the foundational technique for efficient CPU virtualization.
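A minimal Python sketch of this control flow, under stated assumptions: the instruction names, the `PRIVILEGED` set, and the `ToyVMM` and `VCPU` classes are hypothetical stand-ins, and the "trap" is modeled as an ordinary function call rather than a hardware event.

```python
from dataclasses import dataclass

# Hypothetical toy instruction set; real privileged instructions are ISA-specific.
PRIVILEGED = {"cli", "sti", "hlt"}


@dataclass
class VCPU:
    """Virtual CPU state that the VMM maintains on behalf of one guest."""
    interrupts_enabled: bool = True
    acc: int = 0


class ToyVMM:
    def __init__(self, allowed=("cli", "sti")):
        self.allowed = set(allowed)
        self.trap_count = 0

    def handle_trap(self, vcpu, op):
        """Emulate an allowed privileged op on virtual state, or deny it."""
        self.trap_count += 1
        if op not in self.allowed:
            raise PermissionError(f"operation denied to guest: {op}")
        if op == "cli":
            vcpu.interrupts_enabled = False
        elif op == "sti":
            vcpu.interrupts_enabled = True


def run_guest(vmm, vcpu, program):
    for op, arg in program:
        if op in PRIVILEGED:
            vmm.handle_trap(vcpu, op)  # stands in for the hardware trap
        elif op == "add":
            vcpu.acc += arg            # direct execution, VMM not involved
        else:
            raise ValueError(f"unknown instruction: {op}")


vmm = ToyVMM()
vcpu = VCPU()
run_guest(vmm, vcpu, [("add", 2), ("cli", None), ("add", 3)])
```

Only the single `cli` reaches the VMM; the two `add` instructions never do, which is the point of the technique.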

x86 Challenges (Pre-2005)

Pre-2005 x86 had only the four rings (no root/non-root modes). 17 sensitive instructions (ones that read or modify privileged state) did not trap when issued from ring 1; they failed silently. Example: POPF/PUSHF, which access the flags register, including its interrupt enable/disable bit. When executed from ring 1:

  • The instruction fails silently

  • The hypervisor is never notified, so it cannot emulate the intended behavior

  • The guest OS continues executing, falsely assuming the operation succeeded

This made pure trap-and-emulate inapplicable to pre-2005 x86.
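The silent failure can be illustrated with a deliberately simplified Python model. The interrupt-enable flag is bit 9 of the x86 flags register; everything else here (the `popf` function and its `ring` parameter) is a hypothetical sketch that ignores IOPL and the other system flags.

```python
IF_BIT = 1 << 9  # interrupt-enable flag in the x86 flags register


def popf(current_flags, new_flags, ring):
    """Toy model of pre-2005 POPF semantics.

    In ring 0 the whole flags value is replaced. In any other ring the
    IF bit silently keeps its old value and no trap is raised, so a
    hypervisor running in ring 0 never learns what the guest attempted.
    """
    if ring == 0:
        return new_flags
    return (new_flags & ~IF_BIT) | (current_flags & IF_BIT)


# A guest kernel in ring 1 tries to disable interrupts by clearing IF.
flags_after = popf(current_flags=IF_BIT, new_flags=0, ring=1)
interrupts_still_enabled = bool(flags_after & IF_BIT)
```

Here `interrupts_still_enabled` ends up `True` even though the guest asked for interrupts to be disabled, and nothing signals the discrepancy.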

Binary Translation

Pioneered by Mendel Rosenblum’s group at Stanford (later commercialized as VMware), binary translation enables full virtualization — running unmodified guest OSes.

The approach:

  1. Dynamically capture instruction sequences (at basic-block granularity — loops, functions) from the VM binary

  2. Inspect each block for any of the 17 problematic instructions

  3. If safe: execute natively at hardware speed

  4. If unsafe: translate the problematic instruction into an alternative sequence that emulates the desired behavior (possibly via an explicit trap to the hypervisor)

Optimizations include caching translated code fragments and analyzing only kernel code (not application code).
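The inspect, translate, and cache steps can be sketched as follows. This is a toy model, not VMware's implementation: basic blocks are tuples of mnemonic strings, and the `UNSAFE` set, the `ToyTranslator` class, and the `hypervisor_emulate` tag are all assumptions made for illustration.

```python
# Stand-ins for the 17 problematic instructions (only two are named in the text).
UNSAFE = {"popf", "pushf"}


class ToyTranslator:
    def __init__(self):
        self.cache = {}        # block start address -> translated block
        self.translations = 0  # how many blocks were actually translated

    def translate(self, block):
        """Rewrite unsafe instructions; leave safe ones to run natively."""
        self.translations += 1
        return tuple(
            ("hypervisor_emulate", op) if op in UNSAFE else ("native", op)
            for op in block
        )

    def fetch(self, addr, block):
        """Return the translated block, translating only on a cache miss."""
        if addr not in self.cache:
            self.cache[addr] = self.translate(block)
        return self.cache[addr]


translator = ToyTranslator()
first = translator.fetch(0x1000, ("mov", "popf", "add"))
second = translator.fetch(0x1000, ("mov", "popf", "add"))  # served from cache
```

Caching by block address is what keeps the cost of translation off the hot path: a loop body is inspected once and then reused.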

Paravirtualization

Instead of running unmodified guests, paravirtualization modifies the guest OS to be aware it runs on a hypervisor. The guest makes explicit hypercalls to the hypervisor (analogous to system calls) rather than attempting privileged operations that would fail.

Hypercall flow: guest packages context/state → issues hypercall → traps to VMM → VMM performs operation → returns control and data to guest.
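A rough Python sketch of that flow, assuming a hypothetical `ToyHypervisor` with a single made-up hypercall named `set_interrupts`. Real hypercall interfaces (such as Xen's) are defined by the hypervisor's ABI and look quite different.

```python
class ToyHypervisor:
    def __init__(self):
        # Dispatch table: hypercall name -> handler (names are hypothetical).
        self.handlers = {"set_interrupts": self._set_interrupts}
        self.vm_state = {}  # per-VM virtual state kept by the hypervisor

    def _set_interrupts(self, vm_id, enabled):
        self.vm_state.setdefault(vm_id, {})["interrupts"] = bool(enabled)
        return 0  # success code returned to the guest

    def hypercall(self, vm_id, name, *args):
        """Entry point the guest invokes explicitly, like a system call."""
        handler = self.handlers.get(name)
        if handler is None:
            return -1  # unknown hypercall: report an error to the guest
        return handler(vm_id, *args)


hv = ToyHypervisor()
status = hv.hypercall(1, "set_interrupts", False)
```

The guest never attempts the privileged operation itself; it asks the hypervisor to perform it and receives a status code back.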

Pioneered and popularized by Xen (University of Cambridge, later XenSource/Citrix). Trades guest modification for better performance by avoiding binary translation overhead.

Trap behavior: accessing a swapped-out page causes a hardware MMU fault and traps to the hypervisor regardless of virtualization approach. A page table entry update traps only if the guest OS lacks write permission to its page tables.

Memory Virtualization

Full Virtualization

The guest must see a contiguous physical address space starting at address 0. Three address types:

  • Virtual addresses — used by guest applications

  • Physical addresses — what the guest thinks are real addresses

  • Machine addresses — actual physical addresses on the hardware

Two approaches:

  1. Dual page tables: the guest maintains VA→PA mappings and the hypervisor maintains PA→MA mappings. Every memory reference then needs two translations, and without hardware support the second must be done in software, which is too expensive.

  2. Shadow page tables (preferred): the hypervisor maintains a direct VA→MA mapping. The hardware MMU uses this shadow table, so guest applications’ virtual addresses translate directly to machine addresses at hardware speed. The hypervisor write-protects the guest page table so that any guest attempt to install or change a VA→PA mapping traps, letting the hypervisor update the shadow table to match. Shadow entries must be invalidated on a guest context switch, because a different page table becomes active.
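The shadow-table bookkeeping can be sketched with plain dictionaries keyed by page number. This is an illustrative model only: the `ShadowPager` class is hypothetical, a method call stands in for the write-protection trap, and the shadow table is rebuilt eagerly on a context switch where a real hypervisor would typically refill it on demand.

```python
class ShadowPager:
    def __init__(self, pa_to_ma):
        self.pa_to_ma = dict(pa_to_ma)  # hypervisor-owned PA -> MA mapping
        self.guest_pt = {}              # guest's VA -> PA table (write-protected)
        self.shadow = {}                # VA -> MA table the MMU actually uses
        self.traps = 0

    def guest_map(self, vpn, ppn):
        """Guest write to its page table; modeled as a trap to the hypervisor."""
        self.traps += 1
        if ppn not in self.pa_to_ma:
            raise MemoryError(f"guest physical page {ppn} is not backed")
        self.guest_pt[vpn] = ppn
        self.shadow[vpn] = self.pa_to_ma[ppn]  # keep the shadow in sync

    def translate(self, vpn):
        """Single lookup, as the hardware would do with the shadow table."""
        return self.shadow[vpn]

    def context_switch(self, new_guest_pt):
        """Guest switches address spaces: old shadow entries are invalid."""
        self.guest_pt = dict(new_guest_pt)
        self.shadow = {v: self.pa_to_ma[p] for v, p in self.guest_pt.items()}


pager = ShadowPager({0: 7, 1: 3})
pager.guest_map(10, 1)           # guest maps virtual page 10 to physical page 1
machine_page = pager.translate(10)
```

After `guest_map(10, 1)`, virtual page 10 resolves to machine page 3 in one step, because the composition VA→PA→MA was done once at trap time.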

Paravirtualized

The guest OS knows it runs virtualized, so it:

  • Does not require contiguous physical memory starting at zero

  • Explicitly registers its page tables with the hypervisor (no need for shadow tables)

  • Still lacks write permissions to its page tables (to prevent corrupting other VMs)

  • Can batch page table updates into a single hypercall, amortizing trap costs
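The batching idea in the last point can be sketched as below. The `ToyHypervisorMMU` class and its `apply_batch` method are hypothetical names chosen for this example and are not any real hypervisor's interface; the ownership check stands in for whatever validation the hypervisor performs.

```python
class ToyHypervisorMMU:
    def __init__(self, owned_frames):
        self.owned = set(owned_frames)  # machine frames assigned to this VM
        self.page_table = {}            # validated VPN -> machine frame mapping
        self.hypercalls = 0

    def apply_batch(self, updates):
        """One hypercall that validates and applies many page table updates."""
        updates = list(updates)
        self.hypercalls += 1
        # Validate everything first so a bad entry leaves the table untouched.
        for _vpn, frame in updates:
            if frame not in self.owned:
                raise PermissionError(f"frame {frame} is not owned by this VM")
        self.page_table.update(updates)


mmu = ToyHypervisorMMU(owned_frames={100, 101, 102})
mmu.apply_batch([(0, 100), (1, 101), (2, 102)])  # three updates, one trap
```

Three mappings are installed at the cost of a single transition into the hypervisor, instead of one trap per entry.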

Modern hardware (extended page tables, tagged TLBs) has substantially reduced or eliminated these memory virtualization overheads.

Device Virtualization

Devices present greater diversity and less standardization than CPUs/ISAs, requiring specialized virtualization approaches.

Passthrough Model

The VMM configures access permissions, then the guest VM gets exclusive, direct access to a physical device (also called VMM bypass).

  • Pros: near-native device performance, no hypervisor overhead on device access

  • Cons: no device sharing between VMs; requires exact device type match between guest driver and physical device; breaks VM-hardware decoupling, making migration difficult (device-specific state must also be migrated)

Hypervisor Direct Model

The hypervisor intercepts every device access, translates it to a generic I/O representation, and traverses its own I/O stack to invoke the actual device driver.

  • Pros: VM is decoupled from physical device; simplifies sharing and migration

  • Cons: adds latency due to emulation; hypervisor must integrate all device drivers and is exposed to driver complexity/bugs

This model was originally adopted by VMware ESX. It was sustainable because VMware’s market position gave device vendors an incentive to supply drivers for the hypervisor.

Split Device Driver Model

Device access involves two components:

  • Front-end driver (in guest VM) — a modified driver that wraps device API calls into messages

  • Back-end driver (in service VM or host OS) — the standard, unmodified device driver

The front-end packages device operations and sends them to the back-end, which performs actual device access. Applications in the guest are unmodified.
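A minimal sketch of the message path, assuming a shared in-memory queue between the two halves. The `FrontEnd` and `BackEnd` classes, the message fields, and the `device_log` list standing in for the physical device are all hypothetical.

```python
from collections import deque


class FrontEnd:
    """Runs in the guest: wraps device operations into messages."""

    def __init__(self, vm_id, ring):
        self.vm_id = vm_id
        self.ring = ring

    def write(self, data):
        self.ring.append({"vm": self.vm_id, "op": "write", "data": data})


class BackEnd:
    """Runs in the service VM or host: owns the real driver."""

    def __init__(self, ring):
        self.ring = ring
        self.device_log = []  # stands in for calls into the real driver

    def service(self):
        # Central point where sharing policy could be applied; here, FIFO.
        while self.ring:
            msg = self.ring.popleft()
            self.device_log.append((msg["vm"], msg["op"], msg["data"]))


ring = deque()
vm1, vm2 = FrontEnd(1, ring), FrontEnd(2, ring)
vm1.write(b"a")
vm2.write(b"b")
backend = BackEnd(ring)
backend.service()
```

Because every request passes through `service`, that one loop is where fairness or priority rules across VMs would be enforced.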

  • Pros: eliminates emulation overhead; enables centralized device sharing policies (fairness, priorities)

  • Cons: requires paravirtualized guests (to install front-end drivers)

The split device driver model remains relevant even with hardware virtualization support, because it centralizes device access decisions and enables finer-grained sharing policies without relying on device-level support.

Hardware Virtualization Support

Starting ~2005, x86 hardware added virtualization-friendly features (AMD Pacifica, Intel VT):

CPU (VT-x):

  • Fixed the 17 non-virtualizable instructions to properly trap

  • Added root/non-root protection modes

  • Added VM Control Structure (VMCS) — per-VCPU state that the hardware can interpret, allowing selective trapping of operations

  • New instructions for mode transitions and VMCS manipulation

Memory:

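  • Extended page tables (EPT): the hardware walks a second level of page tables that maps guest physical addresses to machine addresses, so the hypervisor no longer has to maintain shadow tables in software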

  • Tagged TLBs — TLB entries tagged with VM IDs, so world switches (VM context switches) no longer require TLB flushes; MMU matches both virtual address and VM ID
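The effect of tagging can be shown with a toy lookup structure keyed by both VM ID and virtual page number. The `TaggedTLB` class is a hypothetical illustration and ignores capacity, eviction, and per-process tags.

```python
class TaggedTLB:
    def __init__(self):
        self.entries = {}  # (vm_id, virtual page number) -> machine frame

    def insert(self, vm_id, vpn, frame):
        self.entries[(vm_id, vpn)] = frame

    def lookup(self, vm_id, vpn):
        """Hit only if both the VM ID and the virtual page match."""
        return self.entries.get((vm_id, vpn))  # None models a TLB miss


tlb = TaggedTLB()
tlb.insert(vm_id=1, vpn=0x10, frame=7)
tlb.insert(vm_id=2, vpn=0x10, frame=9)  # same virtual page, different VM

# A world switch from VM 1 to VM 2 needs no flush: entries cannot collide.
frame_vm1 = tlb.lookup(1, 0x10)
frame_vm2 = tlb.lookup(2, 0x10)
```

Both VMs use virtual page `0x10`, yet each lookup returns that VM's own frame, which is why the entries can stay resident across switches.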

I/O (VT-d, IOV, VMDq):

  • Chipset-level support for SR-IOV (I/O virtualization), I/O routing, direct device access

  • Device-level DMA remapping to target correct VM memory

  • Multi-queue devices — multiple logical interfaces, each assignable to a different VM

  • Better interrupt routing — device interrupts delivered to the core running the target VM

Additional features address security guarantees between VMs and the hypervisor, and more efficient management interfaces for virtualized environments.