Distributed File Systems¶
Modern operating systems hide storage diversity behind the virtual file system (VFS) interface. The VFS can also hide the fact that files reside on remote machines accessed over a network. When multiple machines collaborate to deliver a file system service, the result is a distributed file system (DFS).
DFS Models¶
A DFS can be organized in several ways:
Client-server — clients and file server on different machines. Simplest model; focus of this lesson.
Replicated — file server replicated across multiple machines. Every machine holds all files. Improves fault tolerance (replicas survive failures) and availability (load-balanced reads). Downside: write complexity increases because replicas must stay consistent.
Partitioned — files divided across machines. More scalable than replication (add machines to store more files) but a single machine failure loses that partition’s files.
Combined — files partitioned into groups, each group replicated independently. Used by large-scale systems (Google, Facebook).
Peer — all nodes both store and serve files; no client/server distinction.
Remote File Service Models¶
Upload/Download Model¶
Client downloads the entire file, operates locally, then uploads it back (similar to FTP/SVN). Fast local access once downloaded, but:
Entire file must be transferred even for small accesses
Server gives up control over access and sharing once the file has been downloaded
True Remote File Access¶
File stays on the server; every read/write goes over the network. Server retains full control and consistency is easy, but:
Every operation incurs network latency (even repeated reads of read-only files)
Server becomes a bottleneck, limiting scalability
Practical Compromise¶
Clients cache blocks of files locally (memory or local disk). This reduces latency and server load. However, it introduces the need for:
Client-to-server notifications of local modifications
Server-to-client notifications when cached data becomes stale
Consistency management — more complex server logic and relaxed sharing semantics compared to local file systems
Stateless vs Stateful Servers¶
Stateless server — maintains no per-client state. Every request is self-contained (file handle + absolute offset + data).
Pros: no server-side resource consumption for state, resilient to failures (restart and resume)
Cons: cannot support caching with consistency management (no state records which clients cache what), larger request messages because every request must carry its full context
Stateful server — tracks which clients access which files, read/write modes, cached blocks.
Pros: enables caching with consistency, locking, incremental operations (e.g., “read next 1 KB”)
Cons: state must be recovered on failure (checkpointing, rebuild), runtime overhead for consistency protocols
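The difference between the two request styles can be illustrated with a minimal sketch. The names `FILES`, `stateless_read`, and `StatefulServer` are hypothetical and do not come from any real DFS. The stateless call carries the file handle and an absolute offset, while the stateful server remembers each client's position so that a "read next" request is possible.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory store: file handle -> file contents.
FILES = {"fh-1": b"abcdefghij"}


def stateless_read(handle: str, offset: int, length: int) -> bytes:
    """Stateless: the request itself carries handle and absolute offset."""
    return FILES[handle][offset:offset + length]


@dataclass
class StatefulServer:
    """Stateful: the server tracks a read position per (client, handle)."""
    positions: dict = field(default_factory=dict)

    def read_next(self, client: str, handle: str, length: int) -> bytes:
        offset = self.positions.get((client, handle), 0)
        data = FILES[handle][offset:offset + length]
        # Advance the per-client position; this is the state that would
        # have to be recovered after a server failure.
        self.positions[(client, handle)] = offset + len(data)
        return data
```

In this sketch, constructing a fresh `StatefulServer` plays the role of a restart: the positions are gone and clients start again from offset 0, whereas `stateless_read` is unaffected.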
Caching in a DFS¶
Caching lets clients locally store file blocks and perform operations (open, read, write) without contacting the server. Keeping caches consistent requires coherence mechanisms analogous to write-invalidate / write-update in shared-memory multiprocessors, but adapted for higher network latencies.
Where files can be cached:
Client memory (buffer cache) — fastest
Client local storage (disk/SSD) — faster than network access
Server memory (buffer cache) — usefulness depends on request interleaving across many clients
Coherence trigger options:
On demand — when a client opens/accesses a file
Periodic — at configured intervals
Client-driven (pull) vs server-driven (push)
File Sharing Semantics¶
UNIX Semantics¶
On a single machine, a write by process A is immediately visible to process B (shared buffer cache). In a DFS, every update would have to reach the server and every read would have to consult it, so message latency makes these semantics impractical.
Session Semantics¶
On close: client flushes all changes to the server
On open: client checks with server for newer version
A session = period between open and close
Clients may read stale data mid-session; long-open files lead to long inconsistency windows
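A minimal sketch of session semantics, assuming whole-file caching and a version counter on the server. The `Server` and `Client` classes and their methods are hypothetical names chosen for illustration.

```python
class Server:
    def __init__(self):
        self.data = {}     # file name -> contents
        self.version = {}  # file name -> version counter

    def current_version(self, name):
        return self.version.get(name, 0)

    def fetch(self, name):
        return self.data.get(name, b""), self.current_version(name)

    def store(self, name, content):
        self.data[name] = content
        self.version[name] = self.current_version(name) + 1
        return self.version[name]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # file name -> (contents, version)
        self.dirty = set()

    def open(self, name):
        # On open: ask the server whether a newer version exists.
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.current_version(name):
            self.cache[name] = self.server.fetch(name)

    def read(self, name):
        return self.cache[name][0]  # served locally, possibly stale

    def write(self, name, content):
        self.cache[name] = (content, self.cache[name][1])
        self.dirty.add(name)

    def close(self, name):
        # On close: flush local changes to the server.
        if name in self.dirty:
            content = self.cache[name][0]
            self.cache[name] = (content, self.server.store(name, content))
            self.dirty.discard(name)
```

With two clients holding the same file open, a write by one becomes visible to the other only after the writer closes and the reader reopens, which is the inconsistency window described above.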
Periodic Updates¶
Client writes propagated to server at regular intervals; server invalidations sent periodically to clients. Establishes time bounds on inconsistency. Clients can also explicitly flush or sync.
Other Semantics¶
Immutable files — never modified, only created/deleted (e.g., photo sharing)
Transactions — atomic commit of a collection of operations
Per-File Server State (Session + Server-Driven)¶
For a server-driven, session-semantics DFS, per-file metadata includes:
Readers list
Current writers (multiple possible with overlapping sessions)
Version number
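The per-file metadata listed above could be represented roughly as follows. This is a sketch under assumptions: `FileState`, `MetadataServer`, and the rule that closing a modified file bumps the version and returns the clients to invalidate are illustrative choices, not a description of a specific system.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)  # several with overlapping sessions
    version: int = 0


class MetadataServer:
    def __init__(self):
        self.files = {}  # file name -> FileState

    def open(self, name, client, write=False):
        state = self.files.setdefault(name, FileState())
        (state.writers if write else state.readers).add(client)
        return state.version

    def close(self, name, client, modified=False):
        """Return the clients whose cached copies the server should invalidate."""
        state = self.files[name]
        state.readers.discard(client)
        state.writers.discard(client)
        if not modified:
            return set()
        state.version += 1
        # Server-driven: everyone who still has the file open is notified.
        return state.readers | state.writers
```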
File vs Directory Service¶
Regular files and directories have different access patterns (locality, lifetime, frequency). A DFS may apply different semantics or different update periods for each. Example: directories are shared more frequently but modified less often → less frequent write-backs suffice.
Replication and Partitioning¶
Replication¶
Every machine holds all files.
Pros: load balancing, high availability, fault tolerance
Cons: writes must propagate to all replicas (synchronous = slow writes; asynchronous = requires conflict resolution, e.g., voting)
Partitioning¶
Each machine holds a subset of files (by name, directory, etc.).
Pros: scalable file system size (add machines), writes localized to one machine
Cons: machine failure loses that partition’s files, potential hotspots if access is uneven
Example — 3 machines, 100 files each:
Replicated: 100 total files, 0% lost on single failure
Partitioned: 300 total files, 33% lost on single failure
Combined approach (partition + replicate each partition) provides the best balance of size and resiliency.
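The arithmetic in the example can be checked with a few helper functions. The function names are hypothetical, and the `combined` case assumes every file is stored on exactly `replicas` machines, which the example above does not specify.

```python
def replicated(machines: int, files_per_machine: int):
    """Every machine holds the same files; a single failure loses nothing
    (assuming at least two machines)."""
    return files_per_machine, 0.0


def partitioned(machines: int, files_per_machine: int):
    """Every machine holds distinct files; a single failure loses its share."""
    total = machines * files_per_machine
    return total, files_per_machine / total


def combined(machines: int, files_per_machine: int, replicas: int):
    """Assumption: each file is stored on exactly `replicas` machines."""
    total = machines * files_per_machine // replicas
    lost = 0.0 if replicas > 1 else 1 / machines
    return total, lost


for name, result in [
    ("replicated", replicated(3, 100)),
    ("partitioned", partitioned(3, 100)),
    ("combined (2 replicas)", combined(3, 100, 2)),
]:
    total, lost = result
    print(f"{name}: {total} files, {lost:.0%} lost on a single failure")
```

Under the two-replica assumption, the combined layout sits between the two extremes in capacity while still surviving a single machine failure.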
Network File System (NFS)¶
NFS is a widely deployed DFS originally from Sun Microsystems. Clients access remote files through the VFS interface using normal file descriptors.
Architecture¶
Client application uses VFS; VFS routes to local FS or NFS client
NFS client communicates with NFS server on the remote machine via RPC
NFS server translates requests into local file system operations
On open, server returns a file handle (byte sequence encoding server + file info); used in all subsequent operations
A file handle becomes stale when the remote file has been deleted. An outdated cached copy, an unresponsive server, or a file that has been open for a long time does not by itself make the handle stale.
NFS Versions¶
NFSv3 — stateless by protocol specification; implementations typically bolt on caching and locking modules
NFSv4 — stateful by design; natively supports caching and locking
Caching semantics:
Non-concurrent access → session semantics (flush on close, check on open)
With concurrent access, the additional periodic updates let clients observe changes mid-session, so the behavior is no longer pure session semantics
Default periods: 3 seconds for files, 30 seconds for directories
NFSv4 delegation: server delegates full file management rights to a client for a period, avoiding update checks
Locking: lease-based — lock valid for a time period; client must release or renew before expiry. On client failure, lease expires and server reassigns. NFSv4 supports share reservations (reader/writer locks) with upgrade/downgrade.
NFS cache consistency is neither purely session nor purely periodic — it is a hybrid that depends on configuration.
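The lease-based locking described in this section can be sketched as a lock with an expiry time. This is an illustration only: `LeaseLock` and its methods are hypothetical and do not mirror the NFSv4 protocol messages. Time is passed in explicitly so the behavior is easy to follow.

```python
class LeaseLock:
    def __init__(self, duration: float):
        self.duration = duration  # lease length in seconds
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client: str, now: float) -> bool:
        # Grant the lock if it is free or the previous lease has expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.duration
            return True
        return False

    def renew(self, client: str, now: float) -> bool:
        # Only the current holder may extend a lease that is still valid.
        if self.holder == client and now < self.expires_at:
            self.expires_at = now + self.duration
            return True
        return False

    def release(self, client: str) -> None:
        if self.holder == client:
            self.holder = None
```

A client that crashes simply stops renewing. Once `now` passes `expires_at`, the next `acquire` from another client succeeds without any explicit recovery step.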
Sprite Distributed File System¶
Sprite (UC Berkeley) is a research DFS whose design was driven by measured file access patterns.
Access Pattern Observations¶
33% of accesses are writes — caching helps reads, but write-through wastes the benefit for one-third of accesses
75% of files are open for less than 0.5 s and 90% for less than 10 s, so session semantics would still cause frequent server interactions
20–30% of new data deleted within 30 s; 50% within 5 min — write-back on close often unnecessary
File sharing is rare — no need to optimize for concurrent access
Design Decisions¶
Write-back policy: every 30 s, write back blocks not modified in the last 30 s (recently modified blocks are likely still being worked on)
No write-back on close — file can be opened, modified, closed multiple times before data reaches the server
When a new client opens a file currently being written, server contacts the writer to collect dirty blocks
Directories not cached on clients (every open goes to server)
Concurrent writes → caching disabled for that file; all accesses serialized at server. Re-enabled when concurrency ends.
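The write-back rule above can be expressed as a small selection function. This is a sketch, assuming the client keeps a last-modified timestamp per dirty block. `WRITE_BACK_AGE` and `blocks_to_write_back` are hypothetical names.

```python
WRITE_BACK_AGE = 30.0  # seconds, per the policy described above


def blocks_to_write_back(dirty_blocks: dict, now: float) -> list:
    """Pick dirty blocks that have not been modified in the last 30 s.

    dirty_blocks maps block number -> time of last modification (seconds).
    """
    return sorted(
        block
        for block, modified_at in dirty_blocks.items()
        if now - modified_at >= WRITE_BACK_AGE
    )


# Example: at t=60, blocks last touched at t=0 and t=25 are old enough;
# the block touched at t=50 is assumed to be still in use and is skipped.
print(blocks_to_write_back({0: 0.0, 1: 25.0, 2: 50.0}, now=60.0))
```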
File Access State¶
Client per-file state: cache status, cached blocks, dirty-block timestamps, version number.
Server per-file state: readers list, current writer, version number, cacheable flag.
Sequential sharing (writers take turns): caching allowed, sequential semantics.
Concurrent sharing (multiple simultaneous writers): caching disabled, all operations go through server. Sprite dynamically enables/disables caching based on detected sharing pattern.
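A minimal sketch of how a server might toggle the cacheable flag, following the definition used in these notes (caching is disabled while more than one writer has the file open). `SpriteFileState` is a hypothetical name, and tracking open writers as a set is an assumption made so that concurrency can be detected. The version number is omitted because it plays no part in this decision.

```python
class SpriteFileState:
    def __init__(self):
        self.readers = set()
        self.writers = set()   # assumption: all clients with the file open for writing
        self.cacheable = True

    def _refresh(self):
        # Sequential sharing (at most one writer at a time) keeps caching on;
        # concurrent writers turn it off until the overlap ends.
        self.cacheable = len(self.writers) <= 1

    def open(self, client, write=False):
        (self.writers if write else self.readers).add(client)
        self._refresh()
        return self.cacheable

    def close(self, client):
        self.readers.discard(client)
        self.writers.discard(client)
        self._refresh()
        return self.cacheable
```

The return value of `open` and `close` stands in for the notification a server would send to clients telling them whether they may cache the file.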