Middleware¶
Distributed Systems Architecture¶
This lesson focuses on architectures for heterogeneous distributed systems — multiple, potentially different computers collaborating to provide an application. A classic example is a client-server insurance system: web browsers on agent laptops → web server running business logic → database storing policy data.
Middleware is the collection of technologies that sit between client and server and address the non-functional constraints of a distributed system. It is conceptually similar to an architectural connector. The material here is drawn from the Emmerich paper.
Context¶
The Internet has driven explosive growth in distributed applications, bringing:
More customers, heavier loads, increasing resource demands → performance and resource concerns
Specialized hardware (ATMs, card readers, mobile phones, Square readers) → device heterogeneity
Increasingly powerful applications requiring integration of existing components
Characteristic Issues¶
Five major categories of issues in distributed systems:
Network communication
Coordination
Reliability
Scalability
Heterogeneity
Network Communication¶
Key concerns when applications span a network:
Error handling: synchronous errors (request-response) vs. asynchronous errors (spontaneous notifications)
Reliable delivery: strategies like retransmission risk duplicate processing
Data representation: heterogeneous machines may represent data differently
Transactions: concurrent readers/writers on shared databases risk inconsistency
Data Transportability¶
Also called serialization (Java), marshalling, or pickling (Python). Differences to resolve:
Bit order, byte order, character sets, alignment, word length (32-bit vs. 64-bit)
Organization of complex data structures; compaction
Self-describing data: data accompanied by its own schema
Standards: ITU-T X.680 (ASN.1), Google Protocol Buffers.
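The byte-order problem can be made concrete with a minimal sketch using Python's standard `struct` and `json` modules. The value 1025 and the `{"type", "value"}` layout are illustrative assumptions, not part of any standard named above.

```python
import json
import struct

value = 1025  # 0x00000401

big = struct.pack(">I", value)     # big-endian ("network") byte order
little = struct.pack("<I", value)  # little-endian byte order, as on x86

# Decoding with the wrong byte order fails silently: no error, just a wrong number.
misread = struct.unpack("<I", big)[0]
roundtrip = struct.unpack(">I", big)[0]

# Self-describing variant: a (hypothetical) schema travels with the value,
# so the receiver does not depend on a prior agreement about the layout.
message = json.dumps({"type": "uint32", "value": value})
decoded = json.loads(message)
```

The same four bytes decode to 1025 or to 17039360 depending on the byte order the receiver assumes, which is why middleware fixes a wire format or ships the schema along with the data.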
ACID Transactions¶
For reliable database access with multiple concurrent readers/writers:
Atomic — the transaction’s steps are treated as a single indivisible unit
Consistent — database integrity constraints hold before and after
Isolated — intermediate states invisible to other transactions
Durable — committed transactions are permanently persisted
Not always required — e.g., a voting app where approximate vote counts are acceptable may skip ACID for performance.
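Atomicity and consistency can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the balances are hypothetical examples chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # integrity constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer of 500 from `alice` would violate the `CHECK` constraint on the second statement; the rollback then also undoes the credit already applied to `bob`, so no partial transfer is ever visible.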
Coordination¶
Synchronization between distributed components:
Synchronous — sender blocks until response arrives; often clocked
Asynchronous — sender continues after sending; notified on response; more general but harder to reason about
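The two styles can be contrasted in a short sketch using `concurrent.futures` from the standard library. Here `remote_square` is a hypothetical stand-in for a remote call, with a sleep modelling network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Stand-in for a remote call; the sleep models network latency."""
    time.sleep(0.05)
    return x * x

# Synchronous: the caller blocks until the response arrives.
sync_result = remote_square(4)

# Asynchronous: submit() returns a Future at once, and a callback
# notifies the caller when the response is available.
notifications = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_square, 5)
    future.add_done_callback(lambda f: notifications.append(f.result()))
    local_work = sum(range(10))  # the sender keeps working in the meantime
# Leaving the with-block waits for outstanding calls to finish.
async_result = future.result()
```

The asynchronous version is harder to reason about because the callback runs at a time, and possibly on a thread, that the sender does not control.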
Design decisions:
Push vs. pull: server pushes updates to clients, or clients pull on demand
Robustness: handling component failures (timeouts, acknowledgements)
Availability: 24/7 vs. scheduled maintenance; handling load
Persistence: database vs. file system for server state
Concurrency: handling multiple simultaneous clients; transaction integrity
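The push versus pull decision can be sketched in a single process. `PriceServer` and its method names are hypothetical, and a real system would deliver the callbacks over the network.

```python
class PriceServer:
    """Toy server offering both a push and a pull interface."""

    def __init__(self):
        self._price = 0
        self._subscribers = []

    def subscribe(self, callback):
        """Push model: the server calls the client back on every update."""
        self._subscribers.append(callback)

    def get_price(self):
        """Pull model: the client asks whenever it wants the current value."""
        return self._price

    def update(self, price):
        self._price = price
        for callback in self._subscribers:
            callback(price)

server = PriceServer()
pushed = []
server.subscribe(pushed.append)
server.update(10)
server.update(12)
pulled = server.get_price()  # a pulling client sees only the latest value
```

The pushed client receives every update, while the pulling client that asks once at the end never sees the intermediate value 10. Push costs the server bookkeeping per subscriber; pull costs the client polling traffic and possibly missed updates.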
Reliability¶
Percentage of time the application provides expected services. Typical failure mode: undelivered/unacknowledged messages.
Strategies and their trade-offs:
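Best effort: send once and hope; no acknowledgement, no retry
At most once: never process a message twice, at the risk of losing it
At least once: retransmit until acknowledged; guarantees delivery but risks duplicates
Exactly once: ideal but most expensive, since it needs both retransmission and duplicate suppression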
Classic reliability-performance trade-off: replication improves reliability but costs time and resources.
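The interaction between retransmission and duplicates can be sketched as follows. The `Receiver` class, the `lost_acks` parameter, and the message identifiers are illustrative assumptions; the network is simulated by deterministically dropping the first few acknowledgements.

```python
class Receiver:
    """Processes messages; optionally suppresses duplicates by message id."""

    def __init__(self, deduplicate):
        self.deduplicate = deduplicate
        self.seen = set()
        self.processed = []

    def deliver(self, msg_id, payload):
        if self.deduplicate and msg_id in self.seen:
            return "ack"  # duplicate: acknowledge again, do not reprocess
        self.seen.add(msg_id)
        self.processed.append(payload)
        return "ack"

def send_at_least_once(receiver, msg_id, payload, lost_acks, max_attempts=5):
    """Retransmit until an ack gets through; the first lost_acks acks are dropped."""
    for attempt in range(1, max_attempts + 1):
        ack = receiver.deliver(msg_id, payload)
        if ack == "ack" and attempt > lost_acks:  # this ack survived the network
            return attempt
    raise TimeoutError("no acknowledgement received")

naive = Receiver(deduplicate=False)
attempts = send_at_least_once(naive, 1, "debit 10", lost_acks=2)

careful = Receiver(deduplicate=True)
send_at_least_once(careful, 1, "debit 10", lost_acks=2)
```

With two lost acknowledgements the naive receiver applies the debit three times, while the deduplicating receiver applies it once. Combining at-least-once delivery with receiver-side deduplication is one common way to approximate exactly-once processing, at the cost of remembering message identifiers.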
Scalability¶
How easily the application grows (more users, greater load). Scaling typically means adding hardware. The key question: to what extent does adding machines change the architecture?
Transparency types:
Access transparency — application doesn’t need to know if a resource is local or remote
Location transparency — physical location of resources is hidden
Migration transparency — resources can move between machines without affecting the system
Replication transparency — data replication is invisible to the application
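Location and migration transparency can be sketched with a directory that maps logical resource names to nodes. `Node`, `Directory`, and the resource name `policy-17` are hypothetical; a real system would place this mapping in a naming service.

```python
class Node:
    """A machine holding some resources."""

    def __init__(self, name):
        self.name = name
        self.files = {}

class Directory:
    """Maps logical resource names to the node that currently holds them."""

    def __init__(self):
        self._location = {}

    def place(self, resource, node):
        self._location[resource] = node

    def read(self, resource):
        # The caller names the resource, never the machine.
        return self._location[resource].files[resource]

    def migrate(self, resource, target):
        source = self._location[resource]
        target.files[resource] = source.files.pop(resource)
        self._location[resource] = target

node_a, node_b = Node("a"), Node("b")
node_a.files["policy-17"] = "policy data"
directory = Directory()
directory.place("policy-17", node_a)

before = directory.read("policy-17")
directory.migrate("policy-17", node_b)
after = directory.read("policy-17")
```

Client code calls `read("policy-17")` identically before and after the move, which is the sense in which adding or rearranging machines leaves the application architecture unchanged.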
Heterogeneity¶
Dimensions of heterogeneity: hardware (including embedded devices, phones), operating systems, programming languages, standards/protocols/APIs, browser families and versions.
Approaches to managing heterogeneity:
Standard APIs from W3C, OMG, ANSI, ISO — must address backward and forward compatibility
Normative architectures — e.g., OMG’s Model-Driven Architecture (MDA) separating machine-independent from machine-dependent parts
Vendor platforms — JEE (Oracle/Sun), .NET (Microsoft), WebSphere (IBM)
LAMP stack — Linux (OS), Apache (web server), MySQL (database), PHP (server-side scripting)
Kinds of Middleware¶
Four categories based on interaction mechanism:
Transactional Middleware¶
Handles distributed transactions with ACID guarantees. Uses protocols such as two-phase commit to keep all participants consistent when failures occur. Provides location transparency. Examples: CICS (IBM), Tuxedo (originally AT&T, later BEA, now Oracle), Encina (Transarc, later IBM).
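The idea behind two-phase commit can be sketched in a few lines. This is a simplified illustration that ignores coordinator failure, timeouts, and logging; the `Participant` class and its state names are assumptions.

```python
class Participant:
    """A resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes only if the local work can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: voting
    if all(votes):                               # phase 2: decision
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"

ok = two_phase_commit([Participant("db"), Participant("queue")])
failing = [Participant("db"), Participant("queue", can_commit=False)]
bad = two_phase_commit(failing)
```

A single "no" vote aborts the transaction on every participant, which is how the protocol extends atomicity across machines.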
Message-Oriented Middleware (MOM)¶
Based on asynchronous message passing with message queues. Provides fault tolerance, since queued messages survive component failures. Not particularly transparent: clients must implement their own coordination, embedded in the messages they exchange. Examples: IBM MQSeries, Sun Java Message Queue, Amazon queuing services such as SQS.
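The decoupling that a message queue provides can be sketched with the standard `queue` module. This in-process queue is only an analogy: unlike real message-oriented middleware it is not persisted, and the message fields `id` and `body` are hypothetical.

```python
import queue

orders = queue.Queue()

# The producer enqueues while the consumer is not running yet.
for i in range(3):
    orders.put({"id": i, "body": f"order-{i}"})

def drain(q):
    """Consumer: started later, it processes everything that was queued."""
    handled = []
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        handled.append(message["id"])
        q.task_done()
```

Calling `drain(orders)` after the producer has finished still yields all three messages in order. The producer never needed the consumer to be up, but any request-reply coordination would have to be encoded in the messages themselves.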
Procedural Middleware¶
Remote Procedure Calls (RPC) make a remote computation look like a local function call. Typically synchronous and OS-dependent. Available since the 1980s. Technologies: Sun RPC (which uses XDR for data representation) and DCE RPC (which uses NDR).
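How a stub makes a remote call look local can be sketched without a network. In this illustration the transport is a plain function call and JSON stands in for the wire format; `Stub`, `server_dispatch`, and `PROCEDURES` are hypothetical names, not the API of any real RPC system.

```python
import json

def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # procedures the server exports

def server_dispatch(request_bytes):
    """Server side: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    try:
        result = PROCEDURES[request["method"]](*request["params"])
        reply = {"result": result, "error": None}
    except Exception as exc:
        reply = {"result": None, "error": str(exc)}
    return json.dumps(reply).encode("utf-8")

class Stub:
    """Client side: turns attribute access into marshalled requests."""

    def __init__(self, transport):
        self._transport = transport  # a function from bytes to bytes

    def __getattr__(self, method):
        def call(*params):
            request = json.dumps({"method": method, "params": params})
            raw = self._transport(request.encode("utf-8"))  # blocks until reply
            reply = json.loads(raw.decode("utf-8"))
            if reply["error"] is not None:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return call

client = Stub(server_dispatch)
total = client.add(2, 3)  # reads like a local call
```

The caller writes `client.add(2, 3)` and blocks for the reply, which matches the synchronous style described above; failures that a local call could never produce surface as exceptions.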
Object/Component Middleware¶
Extension of RPC to remote objects — send messages to objects on remote machines. Issues include object identity (memory addresses not globally unique) and cross-machine inheritance/delegation. Provides synchronous and asynchronous messaging, marshalling, exception handling. Examples: CORBA, COM (Microsoft), Java RMI (Oracle).
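The object identity issue can be sketched as follows: since a memory address means nothing on another machine, the middleware hands out references that are unique across hosts. `ObjectAdapter`, the reference format, and `Counter` are assumptions made for illustration and do not reproduce CORBA, COM, or RMI.

```python
import uuid

class ObjectAdapter:
    """Exports local objects under references that are unique across hosts."""

    def __init__(self, host):
        self.host = host
        self._objects = {}

    def export(self, obj):
        reference = f"{self.host}/{uuid.uuid4()}"  # host name plus random id
        self._objects[reference] = obj
        return reference

    def invoke(self, reference, method, *args):
        """Dispatch a message to the object behind the reference."""
        return getattr(self._objects[reference], method)(*args)

class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value

adapter = ObjectAdapter("host-a")
first = adapter.export(Counter())
second = adapter.export(Counter())
```

Each reference addresses one object with its own state, so `invoke(first, "increment", 2)` and `invoke(second, "increment")` update different counters.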
Software Engineering Issues¶
Requirements: non-functional requirements dominate; must be elicited from customers who may be uncertain about QoS needs
Architecture: choosing connectors that map to available middleware solutions
Design:
Latency — network delays require timeouts and retransmission protocols
Statefulness — web applications are often stateless; persistent state via databases or cookies
Concurrency — synchronization to avoid deadlock and ensure liveness
Service discovery: naming/White Pages (URLs, IP addresses) vs. Yellow Pages (capability-based lookup, e.g., UDDI)
Reflection and meta-object protocols: self-describing programs and data
Data representations: relational databases vs. NoSQL for different application needs
Fat vs. thin clients: trade-off between client-side functionality and simplicity (AJAX enables selective page updates without full round-trips)
Device constraints: power, memory, processing limitations on mobile/embedded devices
Mobility: handling intermittent connectivity
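The statefulness point in the list above can be sketched with a handler that keeps no state between requests. The `handle` function, the `cookie` field, and the dict used as a store are hypothetical; a real deployment would use a database or cache for the store.

```python
def handle(request, store):
    """Stateless handler: all state lives in the cookie and the external store."""
    session_id = request.get("cookie") or f"s{len(store) + 1}"
    visits = store.get(session_id, 0) + 1
    store[session_id] = visits
    return {"set_cookie": session_id, "body": f"visit {visits}"}

store = {}  # stand-in for a database shared by all server instances
first = handle({}, store)
second = handle({"cookie": first["set_cookie"]}, store)
```

Because the handler itself remembers nothing, any server instance with access to the store can serve the second request, which simplifies scaling and failure recovery.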
Web Services¶
Software systems supporting machine-to-machine interaction over the web using agreed-upon APIs and standards:
Data protocols: XML, SOAP, RDF, OWL, JSON
Service description: WSDL (Web Services Description Language) — enables code generation
Discovery: UDDI (Universal Description, Discovery and Integration) — Yellow Pages for services
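Two of the data formats listed above can be compared in a minimal sketch using only the standard library. The `policy` record and its field names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

policy = {"id": "P-17", "holder": "Ada", "premium": "120.50"}

# JSON encoding of the record.
as_json = json.dumps(policy, sort_keys=True)

# XML encoding of the same record: one child element per field.
root = ET.Element("policy")
for field, text in policy.items():
    ET.SubElement(root, field).text = text
as_xml = ET.tostring(root, encoding="unicode")

# Either representation can be parsed back by a program in any language.
from_json = json.loads(as_json)
from_xml = {child.tag: child.text for child in ET.fromstring(as_xml)}
```

Both encodings are text based and platform neutral, which is what lets heterogeneous machines exchange them; they differ mainly in verbosity and in the tooling built around them.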
Example platform: J2EE — web browser (HTML/applets) ↔ web server + EJB containers (business logic) ↔ database services.
Service-Oriented Architecture (SOA)¶
An architectural style for creating and using self-contained, self-defined, modular services throughout their lifecycle.
Characteristics:
Each service is a meaningful vertical slice of functionality
Services are stateless (simpler code), flexible (composable), and middleware-transparent
Architect decomposes functionality into a suite of sub-services that compose into user-facing services
Services are published, located, and dynamically invoked
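The publish, locate, and invoke cycle can be sketched with an in-process registry. `ServiceRegistry`, the service names, and the pricing rules are all hypothetical; a real SOA would put the registry and the services on separate machines.

```python
class ServiceRegistry:
    """Toy registry where services are published, located, and invoked by name."""

    def __init__(self):
        self._services = {}

    def publish(self, name, func):
        self._services[name] = func

    def locate(self, name):
        return self._services[name]

    def invoke(self, name, request):
        return self.locate(name)(request)

registry = ServiceRegistry()

# Stateless sub-services: each result depends only on the request.
registry.publish("risk.score", lambda req: 2 if req["age"] < 25 else 1)
registry.publish("pricing.base", lambda req: 100 * req["vehicles"])

def quote(req):
    """User-facing service composed from the two sub-services."""
    base = registry.invoke("pricing.base", req)
    return base * registry.invoke("risk.score", req)

registry.publish("quote", quote)
```

Because `quote` finds its sub-services by name at call time, either one can be replaced by publishing a new implementation under the same name, without changing the composed service.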
Re-architecting risk: converting legacy mainframe applications to SOA involves switching from control-driven to reactive/event-driven models — a major and costly transformation with significant risk.
Summary¶
Middleware is a collection of technologies (APIs, protocols, tools, design patterns) for addressing non-functional constraints in heterogeneous distributed applications. As the Internet grows, standard middleware solutions become increasingly important.