Efficiency

Physical database design concerns how data is stored on disk and how access structures (indexes) speed up queries.

Storage Hierarchy

The storage hierarchy (registers → cache → main memory → disk → tape) has increasing capacity but decreasing speed at each level. Database performance is dominated by disk I/O — minimizing disk accesses is the primary optimization goal.

Disk Structure

  • Data is stored on platters with concentric tracks, divided into sectors.

  • Access time = seek time (move head to track) + rotational latency (wait for sector) + transfer time.

  • Sequential access is much faster than random access.

Records, Blocks, and Files

  • A record is the physical storage unit for a tuple.

  • Records are grouped into fixed-size blocks (pages) — the unit of disk I/O.

  • A file is a collection of blocks storing records of one or more relations.

File Organizations

Heap (Unsorted)

  • Records are inserted at the end of the file.

  • Insert: O(1) — fast (append to last block).

  • Search: O(n) — requires linear scan.

  • Delete: Search + mark/overwrite.

Sorted File

  • Records ordered by a sort key.

  • Search: O(log n) via binary search on blocks.

  • Insert: Expensive — requires shifting records to maintain order.

  • Range queries: Efficient once the start point is found.

Indexing

Primary Index

  • Built on the ordering key of a sorted file.

  • Sparse index: One index entry per block (points to the first record in each block).

  • Dense index: One index entry per record.

  • Sparse primary indexes are smaller and fit in fewer blocks.

Secondary Index

  • Built on a non-ordering field — always a dense index.

  • Supports point queries only (not efficient for range queries without additional structure).

Multi-Level Index

  • When a single-level index is too large to search efficiently, build an index on the index.

  • Each level reduces the search space by the blocking factor of the index.

  • The B+ tree is the standard multi-level index structure:

    • Balanced — all leaf nodes at the same depth

    • Internal nodes contain keys and pointers to children

    • Leaf nodes contain keys and pointers to data records, plus pointers to sibling leaves for range scans

    • Typical fan-out keeps tree height at 3–4 levels for very large tables

Hashing

Static hashing maps a key to a fixed number of buckets via a hash function.

  • Insert/Search/Delete: O(1) average case.

  • Overflow chains degrade performance when buckets fill up.

  • No support for range queries.

  • Extendible hashing and linear hashing are dynamic schemes that grow/shrink the hash table as data changes.

Indexing Decisions

When deciding whether to create an index, consider:

  • Table size: Small tables (fitting in a few blocks) gain little from indexing.

  • Multiple access paths: Each additional index adds write overhead (every insert/update/delete must maintain all indexes).

  • Read vs write ratio: Indexes help reads but slow writes. Favor indexes on read-heavy tables.

  • Existing indexes: Check whether a needed access path is already covered by an existing index.