I/O Management

I/O Devices

I/O devices include keyboards, microphones, displays, speakers, mice, network interfaces, and disks.

  • Input only: keyboard, microphone

  • Output only: speaker, display

  • Both: hard disk, NIC, flash card

Device Features

Any device can be abstracted to have:

  • Control registers (accessed by CPU):

      - Command registers: CPU tells device what to do

      - Data registers: used to transfer data to and from the device

      - Status registers: CPU reads device state

  • Internal logic: microcontroller (device’s CPU), on-device memory, specialized hardware (e.g., analog-digital converters)
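
As a rough illustration of this abstraction, the sketch below models a device purely in software. The class ToyDevice, its command code, and its status values are hypothetical names invented for the example; no real device is being described.

```python
class ToyDevice:
    """Hypothetical device exposing command, data, and status registers."""

    STATUS_IDLE = 0x0
    STATUS_DONE = 0x1
    CMD_UPPERCASE = 0x1  # made-up command code for this example

    def __init__(self):
        self.command = 0                 # command register: CPU tells the device what to do
        self.data = 0                    # data register: carries the byte being transferred
        self.status = self.STATUS_IDLE   # status register: CPU reads device state

    def step(self):
        """Stand-in for the device's internal logic (its microcontroller)."""
        if self.command == self.CMD_UPPERCASE:
            self.data = ord(chr(self.data).upper())
            self.command = 0
            self.status = self.STATUS_DONE


dev = ToyDevice()
dev.data = ord("a")                      # CPU places data in the data register
dev.command = ToyDevice.CMD_UPPERCASE    # CPU issues a command
dev.step()                               # device works on the request
result = chr(dev.data) if dev.status == ToyDevice.STATUS_DONE else None
```

The CPU side only ever touches the three registers; everything inside step() is invisible to it, which is the point of the abstraction.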

CPU-Device Interconnect

  • Devices connect to CPU via controllers and interconnects (e.g., PCI, PCI-X, PCIe)

  • PCIe: higher bandwidth, lower latency, more devices than PCI/PCI-X

  • Other buses: SCSI (disks), peripheral bus (keyboards)

  • Bridge controllers handle differences between interconnect types

Device Drivers

  • Device-specific software components in the OS

  • Responsible for all device access, management, and control

  • Manufacturers provide drivers for each OS

  • OS provides a device driver framework with standardized interfaces

Benefits:

  • Device independence: OS not specialized per device

  • Device diversity: easily support new devices by adding drivers

Types of Devices

  • Block devices (e.g., disks): operate on fixed-size blocks; support direct/random access

  • Character devices (e.g., keyboards): serial stream; get/put character interface

  • Network devices: stream of variable-size data chunks (between block and character)

OS represents devices internally as files (on Unix: /dev/ directory, managed by tmpfs/devfs).

Pseudo devices: /dev/null (discards output), /dev/random (pseudo-random bytes).
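
Because devices appear as files, ordinary file operations work on them. The sketch below assumes a Unix-like system; it uses os.devnull (which is /dev/null there) and reads from /dev/urandom, the non-blocking sibling of /dev/random, falling back to os.urandom if that path is missing.

```python
import os

# Anything written to the null device is discarded.
with open(os.devnull, "wb") as sink:
    written = sink.write(b"this output is discarded")

# Reading from the null device returns end-of-file immediately.
with open(os.devnull, "rb") as source:
    read_back = source.read()

# Read a few random bytes through the file interface when it is available.
if os.path.exists("/dev/urandom"):
    with open("/dev/urandom", "rb") as rng:
        random_bytes = rng.read(8)
else:
    random_bytes = os.urandom(8)   # fallback for systems without /dev/urandom
```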

CPU-Device Interactions

Memory-Mapped I/O

  • Device registers mapped to physical memory addresses

  • CPU writes to those addresses → PCI controller routes to device

  • Portion of physical address space dedicated to device interaction

  • Configured via Base Address Registers (BAR) during boot (PCI configuration protocol)

I/O Port Model

  • Special CPU instructions (e.g., x86 in/out) target specific I/O ports

  • Each instruction specifies target device (port) and value (in register)

Interrupt vs. Polling

  • Interrupts: device signals CPU; overhead from handler execution, interrupt mask management, cache pollution; but immediate notification

  • Polling: CPU periodically reads device status register; can choose convenient times (less cache disruption); risk of delay or wasted cycles

  • Choice depends on device type, throughput/latency goals, interrupt handler complexity, and device data rate
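
The polling side of this trade-off can be sketched with a simulated status register. SlowDevice and poll_until_ready are hypothetical names for this example, and the "device" is just a counter; the sketch only shows how each poll costs one status read and how a bounded poll budget can run out.

```python
class SlowDevice:
    """Hypothetical device whose status register reports ready after a few reads."""

    def __init__(self, ready_after):
        self._remaining = ready_after

    def read_status(self):
        # Each call models one CPU access to the status register.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def poll_until_ready(device, max_polls):
    """Return the number of status reads taken, or None if the budget runs out."""
    for polls in range(1, max_polls + 1):
        if device.read_status():
            return polls
    return None


polls_used = poll_until_ready(SlowDevice(ready_after=3), max_polls=10)
gave_up = poll_until_ready(SlowDevice(ready_after=50), max_polls=10)
```

Every read that returns "not ready" is a wasted cycle, which is the cost an interrupt would avoid at the price of handler overhead.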

Programmed I/O (PIO)

  • CPU directly writes commands to device command registers and moves data through device data registers

  • No additional hardware required

  • Example: sending a 1500-byte packet on a NIC over an 8-byte bus = 1 command write + ⌈1500/8⌉ = 188 data writes = 189 CPU accesses

Direct Memory Access (DMA)

  • Relies on DMA controller hardware

  • CPU writes command to device + configures DMA controller (source address, size)

  • DMA controller moves data directly between device and memory without CPU involvement per byte

  • Example: same 1500-byte packet = 1 command write + 1 DMA configuration = 2 operations

  • DMA configuration is complex (more cycles than a single store) → PIO better for small transfers

  • Pinning: memory regions involved in DMA must be pinned (non-swappable) in physical memory

Trade-off: For a system where store costs 1 cycle and DMA config costs 5 cycles (8-byte bus):

  • Keyboard (small data): PIO is better

  • NIC: depends on packet size (fewer than 5 data stores, i.e., at most 32 bytes → PIO; larger → DMA)
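
The arithmetic behind this trade-off can be checked with a short calculation. The sketch uses the costs assumed in the text (1 cycle per store, 5 cycles per DMA configuration, 8-byte bus) and counts only the CPU cycles needed to issue the transfer; the function names are made up for the example.

```python
import math

BUS_WIDTH_BYTES = 8
STORE_CYCLES = 1        # cost of one store, as assumed in the text
DMA_CONFIG_CYCLES = 5   # cost of configuring the DMA controller, as assumed in the text


def pio_cycles(size_bytes):
    """One command write plus one store per bus-width chunk of data."""
    data_stores = math.ceil(size_bytes / BUS_WIDTH_BYTES)
    return STORE_CYCLES * (1 + data_stores)


def dma_cycles(size_bytes):
    """One command write plus one DMA configuration, regardless of size."""
    return STORE_CYCLES + DMA_CONFIG_CYCLES


def cheaper_method(size_bytes):
    pio, dma = pio_cycles(size_bytes), dma_cycles(size_bytes)
    if pio < dma:
        return "PIO"
    if dma < pio:
        return "DMA"
    return "tie"


packet_pio = pio_cycles(1500)        # 1 command write + 188 data stores
packet_dma = dma_cycles(1500)        # 1 command write + 5-cycle configuration
keystroke = cheaper_method(1)        # a single byte fits in one store
full_packet = cheaper_method(1500)
```

Under these assumptions the break-even point is 5 data stores (40 bytes): below it PIO costs fewer cycles, above it DMA does.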

Typical Device Access Flow

  1. Process issues system call (e.g., send, read)

  2. OS runs in-kernel stack (e.g., TCP/IP, file system)

  3. Device driver configures the device (via PIO or DMA)

  4. Device performs the operation

  5. Results/events traverse the chain in reverse (interrupt → driver → kernel → process)

OS Bypass

  • Device registers/memory mapped directly to user process address space

  • User-level driver (library) handles device-specific operations

  • OS involved only in setup and coarse-grain control (enable/disable, permissions)

  • Requirements: device must have sufficient separate registers for user operations vs. OS control

  • Device must support demultiplexing to route data to correct process (e.g., inspecting packet port numbers)

Synchronous vs. Asynchronous I/O

  • Synchronous: calling thread blocks until I/O completes (placed on device wait queue)

  • Asynchronous: thread continues after issuing I/O call; later polls for results or receives notification
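
The difference can be sketched from the caller's point of view. This example emulates asynchronous I/O with a worker thread, which is an assumption made for illustration; it is not how a kernel implements asynchronous I/O. The helper read_file is a hypothetical name.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")
    path = tmp.name

# Synchronous: the caller waits here until the read returns.
sync_data = read_file(path)

# Asynchronous (emulated with a worker thread): the caller gets a future back
# immediately, does other work, and collects the result later.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(read_file, path)
    other_work = sum(range(1000))     # work overlapped with the pending read
    async_data = future.result()      # wait for and collect the completion

os.remove(path)
```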

Block Device Stack

From top to bottom:

  1. User application: operates on files (logical storage units)

  2. POSIX API: open(), read(), write(), close()

  3. Virtual File System (VFS): abstraction layer hiding details of underlying file systems

  4. Specific file system (e.g., ext2, ext3, ext4): maps files to disk blocks

  5. Generic block layer: standard interface to all block device types; masks device-specific differences

  6. Device driver: speaks device-specific protocol
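
From the application's side, only the top two layers are visible. The sketch below exercises the POSIX calls through Python's os module, which wraps them thinly; the temporary file and its contents are arbitrary choices for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()            # creates and opens a file, returns a descriptor
try:
    os.write(fd, b"hello block layer")   # write(): passes through VFS and the file system
    os.fsync(fd)                         # ask the kernel to push cached data to the device
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)      # open(): returns a new file descriptor
    data = os.read(fd, 5)                # read(): first 5 bytes of the file
    os.close(fd)                         # close(): release the descriptor
finally:
    os.remove(path)
```

Nothing in this code names a file system, a block size, or a device; those details are resolved by the layers beneath the API.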

Virtual File System (VFS)

Hides from applications whether files span multiple devices, use different file system implementations, or reside on remote servers.

VFS Key Abstractions

File:

  • Represented by file descriptors (created on open)

  • Operations: read, write, lock, sendfile, close

Inode (index node):

  • Persistent data structure; one per file

  • Contains: list of data blocks, permissions, size, lock status

  • Files identified by inode number

  • Files need not be stored contiguously on disk

Dentry (directory entry):

  • Soft-state (in-memory only, not persisted to disk)

  • One dentry per path component (e.g., /users/ada → dentries for /, users, ada)

  • Dentry cache avoids re-traversing paths
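
A toy model shows why caching one dentry per path component pays off. ToyDentryCache, the nested-dictionary "disk", and the inode numbers are all hypothetical; a real VFS dentry cache is far more involved.

```python
class ToyDentryCache:
    """Hypothetical in-memory cache with one entry per resolved path prefix."""

    def __init__(self, directory_tree):
        # directory_tree: nested dicts standing in for on-disk directories.
        self.tree = directory_tree
        self.dentries = {}     # path prefix -> node, the soft-state cache
        self.disk_reads = 0    # counts simulated on-disk directory lookups

    def resolve(self, path):
        prefix = "/"
        node = self._lookup(prefix, lambda: self.tree)
        for component in filter(None, path.split("/")):
            parent = node
            prefix = prefix.rstrip("/") + "/" + component
            node = self._lookup(prefix, lambda: parent["children"][component])
        return node["ino"]

    def _lookup(self, prefix, read_from_disk):
        if prefix not in self.dentries:
            self.disk_reads += 1                      # cache miss: go to "disk"
            self.dentries[prefix] = read_from_disk()
        return self.dentries[prefix]


tree = {"ino": 2, "children": {
    "users": {"ino": 11, "children": {
        "ada": {"ino": 42, "children": {}},
        "bob": {"ino": 43, "children": {}},
    }},
}}

cache = ToyDentryCache(tree)
first = cache.resolve("/users/ada")      # misses on /, /users, /users/ada
reads_after_first = cache.disk_reads
second = cache.resolve("/users/bob")     # only /users/bob is a miss
```

The second lookup reuses the cached dentries for / and /users, so it costs one simulated disk read instead of three.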

Superblock:

  • Map of how file system is organized on storage device

  • Contains: number of inodes, number of blocks, start of free blocks

  • File-system-specific metadata

ext2 File System

Disk partition layout:

  • Block 0: boot block (not used by Linux)

  • Remaining partition divided into Block Groups, each containing:

      - Superblock: count of inodes, disk blocks, start of free blocks

      - Group descriptor: bitmap locations, free inode count, directory count

      - Bitmaps: quickly find free blocks/inodes

      - Inode table: each inode is 128 bytes, describes one file (owner, stats, data block locations)

      - Data blocks: actual file content

ext2 tries to balance allocation of directories and files across block groups.

Inodes and Indirect Pointers

Direct pointers only (simple approach):

  • 128-byte inode with 4-byte block pointers → at most 32 pointers (if the whole inode held pointers) → max file size = 32KB (with 1KB blocks)

  • Too restrictive

Indirect pointer scheme (used in practice):

  • Direct pointers: each points to one data block (1KB each)

  • Single indirect: points to a block of pointers (256 pointers × 1KB = 256KB)

  • Double indirect: pointer → block of pointers → blocks of pointers → data (256² × 1KB = 64MB)

  • Triple indirect: 256³ × 1KB = 16GB

With 12 direct + 1 single + 1 double + 1 triple indirect (1KB blocks, 4-byte pointers):

  • Max file size ≈ 16 GB

With 8KB blocks (2K pointers per block):

  • Max file size ≈ 64 TB

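Trade-off: deeper indirection = more disk accesses per file read. A block reached through a direct pointer takes up to 2 accesses (inode, data block); one reached through the double indirect pointer takes up to 4 (inode, two pointer blocks, data block).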
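
The size limits quoted above can be reproduced with a few lines of arithmetic. The function name max_file_size is made up for this sketch; it assumes 12 direct pointers plus one single, one double, and one triple indirect pointer, with 4-byte pointers.

```python
def max_file_size(block_size, pointer_size=4, direct=12):
    """Bytes addressable with `direct` direct pointers plus one single,
    one double, and one triple indirect pointer."""
    ptrs_per_block = block_size // pointer_size
    blocks = direct + ptrs_per_block + ptrs_per_block**2 + ptrs_per_block**3
    return blocks * block_size


KB, GB, TB = 2**10, 2**30, 2**40

direct_only = 32 * KB               # 32 direct pointers, 1 KB blocks
size_1k = max_file_size(1 * KB)     # 256 pointers per block
size_8k = max_file_size(8 * KB)     # 2048 pointers per block

size_1k_gb = round(size_1k / GB)
size_8k_tb = round(size_8k / TB)
```

The triple indirect term dominates in both cases, which is why the totals land so close to 256³ × 1KB and 2048³ × 8KB.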

Disk Access Optimizations

Buffer cache:

  • Cache file data in main memory; dirty data is flushed to disk periodically, or on demand when an application calls fsync()

  • Amortizes disk write cost over multiple in-memory writes
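
The amortization effect can be seen in a toy write-back cache. ToyBufferCache and its dictionaries are hypothetical stand-ins for the kernel's buffer cache and the disk; the sketch only counts how many writes actually reach the "disk".

```python
class ToyBufferCache:
    """Hypothetical write-back cache: writes land in memory, flush() hits the 'disk'."""

    def __init__(self):
        self.disk = {}         # block number -> bytes, stands in for the device
        self.dirty = {}        # cached blocks not yet written back
        self.disk_writes = 0

    def write(self, block, data):
        self.dirty[block] = data       # overwrite in memory; no disk I/O yet

    def read(self, block):
        # Serve from the cache when possible, otherwise fall back to the disk.
        return self.dirty.get(block, self.disk.get(block))

    def flush(self):
        for block, data in self.dirty.items():
            self.disk[block] = data
            self.disk_writes += 1
        self.dirty.clear()


cache = ToyBufferCache()
for version in range(100):
    cache.write(7, f"v{version}".encode())   # 100 in-memory updates to one block
latest_before_flush = cache.read(7)
cache.flush()                                # a single write reaches the disk
```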

I/O scheduling:

  • Reorders disk operations to maximize sequential access and minimize disk head movement

  • Example: reorder write(25), write(17) to write(17), write(25) if head is at position 15
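
The reordering example can be worked through with a simple elevator-style policy. This is one possible policy chosen for illustration, not a description of any particular kernel's scheduler; both function names are invented for the sketch.

```python
def elevator_order(requests, head):
    """Serve requests at or beyond the head in ascending order, then sweep back."""
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted((r for r in requests if r < head), reverse=True)
    return ahead + behind


def head_travel(order, head):
    """Total distance the head moves when serving requests in the given order."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


pending = [25, 17]
reordered = elevator_order(pending, head=15)
travel_as_issued = head_travel(pending, head=15)      # 15 → 25 → 17
travel_reordered = head_travel(reordered, head=15)    # 15 → 17 → 25
```

Serving the requests as issued moves the head 18 positions; the reordered sequence moves it 10.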

Prefetching:

  • Read ahead multiple blocks when one block is accessed (exploits locality)

  • Uses more disk bandwidth but improves cache hit rate

Journaling:

  • Write updates to a sequential log before applying to proper disk locations

  • Protects against data loss on crash (logged updates can be replayed during recovery); replaces immediate random writes with sequential log writes

  • Used by ext3, ext4, and many modern file systems

  • Journal must be periodically flushed to proper disk locations
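
A toy model of the log-then-apply idea follows. ToyJournaledDisk is a hypothetical class; real journaling file systems group updates into transactions with commit records, which this sketch omits.

```python
class ToyJournaledDisk:
    """Hypothetical disk with a sequential journal in front of its data blocks."""

    def __init__(self):
        self.blocks = {}     # final on-disk locations: block number -> data
        self.journal = []    # append-only log of (block, data) updates

    def write(self, block, data):
        self.journal.append((block, data))   # sequential append, no random write yet

    def checkpoint(self):
        """Apply logged updates in order to their proper locations, then empty the log."""
        for block, data in self.journal:
            self.blocks[block] = data
        self.journal.clear()

    def recover(self):
        """After a crash, replaying the surviving journal restores the logged updates."""
        self.checkpoint()


disk = ToyJournaledDisk()
disk.write(90, b"a")
disk.write(3, b"b")
disk.write(90, b"c")                  # later update to the same block

# Simulated crash before any checkpoint: data blocks are still untouched.
blocks_before_recovery = dict(disk.blocks)
disk.recover()
```

Replaying in log order means the last update to block 90 wins, and the journal is empty once its contents have reached their proper locations.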