Distributed Database Management Systems (DDBMS)

A Distributed Database Management System (DDBMS) is a software system that manages a collection of logically interrelated databases spread across multiple sites connected by a computer network. Its fundamental goal is to make this distribution transparent, providing users and applications with a unified view of the data as if it were a single, centralized database.

While various architectures exist, many modern distributed applications are built using client-server models, particularly the three-tier architecture.

Classifying DDBMS: Key Factors

The term DDBMS describes a wide range of systems. To understand their differences, we can classify them based on three key factors:

Autonomy: This refers to the independence of individual nodes within a distributed system. High autonomy allows each site to control its own data, security, and operations. Low autonomy means sites are tightly controlled by a central master.
- Design Autonomy: Each site can use different data models or software.
- Communication Autonomy: Each site decides when and how to share data.
- Execution Autonomy: Local operations can be executed without interference from other sites.
Distribution: This describes how data and software are spread across the sites.
Homogeneity vs. Heterogeneity:
- A DDBMS is called homogeneous if all servers (or local DBMSs) use the same software, and all users (clients) use identical interfaces (software).
- A heterogeneous DDBMS integrates different software systems, which is common in large organizations that need to connect pre-existing, diverse databases.
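The homogeneity rule above has two conditions: every server runs the same DBMS software, and every client uses the same interface software. The sketch below encodes that rule directly. The `Site` record, its field names, and the product labels in the usage lines are illustrative assumptions, not part of any real catalog.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    name: str
    dbms: str              # local DBMS software at this site (illustrative label)
    client_interface: str  # interface software the site's clients use (illustrative label)

def is_homogeneous(sites: list[Site]) -> bool:
    """True if all servers run the same DBMS software AND all clients share one interface."""
    dbms_kinds = {s.dbms for s in sites}
    interfaces = {s.client_interface for s in sites}
    return len(dbms_kinds) == 1 and len(interfaces) == 1

branch_network = [Site("A", "PostgreSQL", "web-app"), Site("B", "PostgreSQL", "web-app")]
merged_company = [Site("HQ", "Oracle", "web-app"), Site("Legacy", "IMS", "terminal")]
print(is_homogeneous(branch_network))  # True
print(is_homogeneous(merged_company))  # False: a heterogeneous DDBMS
```

A federation built from pre-existing systems, like `merged_company`, fails either condition and must be treated as heterogeneous.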

Local Autonomy
- If a site cannot operate independently (i.e., has no standalone DBMS functionality), it has no local autonomy.
- High degree of local autonomy: Direct access by local transactions to a server is permitted, and each server is an independent and autonomous centralized DBMS with its own local users, local transactions, and DBA.
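The difference between no local autonomy and a high degree of it comes down to whether a local transaction may go straight to the site's own server. The minimal sketch below illustrates this. An in-memory SQLite database stands in for the site's local DBMS. The `Site` class and its `standalone` flag are hypothetical names chosen for illustration.

```python
import sqlite3

class Site:
    """Hypothetical site wrapping its own local DBMS (SQLite stands in here)."""

    def __init__(self, name: str, standalone: bool):
        self.name = name
        # standalone=True: an independent centralized DBMS with its own local users and transactions
        self.standalone = standalone
        self.db = sqlite3.connect(":memory:")

    def run_local_transaction(self, statements):
        """Execute (sql, params) pairs directly at this site, as one local transaction."""
        if not self.standalone:
            raise PermissionError(f"{self.name}: no local autonomy; requests must go through the central master")
        with self.db:  # commits on success, rolls back if any statement raises
            for sql, params in statements:
                self.db.execute(sql, params)

branch = Site("branch-1", standalone=True)
branch.run_local_transaction([
    ("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)", ()),
    ("INSERT INTO account VALUES (?, ?)", (1, 100.0)),
])
print(branch.db.execute("SELECT balance FROM account WHERE id = 1").fetchone())  # (100.0,)
```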

Types of Distributed Database Architectures

Based on these classifications, several architectural models are prevalent:

1. Client-Server Architecture (Three-Tier)

This architecture is widely used for building distributed database applications, especially web applications. It introduces an intermediate layer between the client and the database server, promoting modularity and scalability.

Presentation Layer (Client): The user interface (e.g., a web browser or mobile app) that captures user input, renders web pages or forms, and displays results. Technologies such as HTML, XHTML, CSS, JavaScript, Java, and SVG are commonly used. When a web interface is used, this layer communicates with the application server, typically via HTTP.
Application Layer (Business Logic): This is the middle tier that runs application programs, embodying business rules and logic. It processes requests from the client, formulates database queries, and sends them to the database server. It then passes processed data back to clients. Crucially for distributed applications, this layer can manage the decomposition of global queries into local ones and handle data distribution details, providing distribution transparency to the client.
Database Server (Query and Transaction Processing Tier): This back-end tier manages the database and processes query and update requests from the application server, using SQL or stored procedures. Communication between the application server and the database server often uses standards such as ODBC, JDBC, or SQL/CLI, and results may be formatted in XML. This layer is responsible for managing the data actually stored in the database.

While the database server itself could be centralized, the application server's role in abstracting data distributed across multiple underlying systems or fragments makes this architecture crucial for interacting with distributed database applications.
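To make the application layer's role concrete, here is a minimal sketch of the three tiers. It assumes a CUSTOMER relation that is horizontally fragmented by region across two database servers, with in-memory SQLite databases standing in for those servers. The names `SITES`, `customers_in`, and `render` are hypothetical. The client asks for "customers" and never learns which site holds which rows.

```python
import sqlite3

# Tier 3, database servers: two sites, each holding one horizontal fragment of CUSTOMER.
def make_site(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    return conn

SITES = {
    "EU": make_site([(1, "Ada", "EU"), (2, "Bo", "EU")]),
    "US": make_site([(3, "Cy", "US")]),
}

# Tier 2, application server: decomposes a global query into local queries.
def customers_in(region=None):
    """Global query 'customers [in region]'; the fragmentation stays hidden from the client."""
    # Each fragment holds only its own region, so the region filter becomes site selection.
    targets = [SITES[region]] if region else SITES.values()
    result = []
    for conn in targets:
        result += conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
    return sorted(result)  # merge the partial results into one answer

# Tier 1, presentation: renders whatever the application server returns.
def render(rows):
    return "\n".join(f"{cid}: {name}" for cid, name in rows)

print(render(customers_in()))      # all three customers, gathered from both sites
print(render(customers_in("US")))  # only the US site is contacted
```

Putting the fragment-to-site mapping in the middle tier means the data can be redistributed later without changing any client code.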

2. Peer-to-Peer (Pure) DDBMS Architecture

In a pure DDBMS, every site is a full-fledged DBMS, running its own local database and transaction manager. These sites work cooperatively to manage distributed data.

Each site has its own local system catalog, query optimizer, and execution engine.
A global transaction manager coordinates operations that span multiple sites, ensuring consistency across the entire system.
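The text does not say how the global transaction manager keeps sites consistent. One widely used protocol is two-phase commit (2PC), which is swapped in here as an illustration. Every site first votes on whether it can commit. The transaction then commits everywhere only if all votes are "yes", and is aborted everywhere otherwise. This is a sketch only: `Participant` and `two_phase_commit` are hypothetical names, and logging, timeouts, and coordinator failure are all omitted.

```python
class Participant:
    """Hypothetical local transaction manager at one site."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1 vote; a real site would force its log to stable storage before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Global transaction manager: commit at every site or at none."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes from all sites
    if all(votes):
        for p in participants:
            p.commit()                           # phase 2: global commit
        return "committed"
    for p in participants:
        p.abort()                                # phase 2: global abort
    return "aborted"

sites = [Participant("Site 1"), Participant("Site 2", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])  # aborted ['aborted', 'aborted']
```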

A pure distributed database architecture involves multiple computer systems, or sites (nodes), each hosting its own database (DB), interconnected by a communications network. This architecture is designed for environments where data and software are geographically distributed.

Structure: The architecture consists of several independent "Sites" (e.g., Site 1, Site 2, Site 3). Each site contains a database, and all sites are linked via a communications network. In this model, data and software are designed to be spread across these separated machines.
Characteristics: A distributed database environment often features heterogeneity of hardware and operating systems at each node. The primary goal is to manage data across these diverse, separated machines.
DDBMS Role: A Distributed Database Management System (DDBMS) is the software system that manages this distributed database, ideally making the distribution transparent to the user. This transparency can cover data organization, replication, and fragmentation.
Advantages: This architecture supports increased availability (as failures are isolated to their site of origin and other sites continue to operate), easier expansion (scalability by adding more data or nodes), and improved ease and flexibility of application development due to data distribution transparency.
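The availability advantage and replication transparency work together: if a data item is stored at several sites, the failure of one site does not stop reads. The sketch below assumes that the item is fully replicated at every listed site, and it shows only reads; keeping replicas consistent on update is a separate problem. `Site`, `SiteDown`, and `transparent_read` are hypothetical names.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached."""

class Site:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]

def transparent_read(replicas, key):
    """Read from the first reachable replica; the caller never names a site."""
    for site in replicas:
        try:
            return site.read(key)
        except SiteDown:
            continue  # the failure stays isolated to that site; try the next replica
    raise RuntimeError(f"no replica of {key!r} is reachable")

replicas = [Site("Site 1", {"part-42": 17}, up=False), Site("Site 2", {"part-42": 17})]
print(transparent_read(replicas, "part-42"))  # 17, served by Site 2 while Site 1 is down
```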

3. Federated and Multidatabase Systems (FDBS)

A federated database system (FDBS), also known as a multidatabase system, is a specialized DDBMS designed to integrate multiple autonomous, pre-existing databases, often stored under heterogeneous DBMSs, into a single federated system. It acts as middleware that provides access to these disparate databases.

A core feature is that the participating DBMSs are loosely coupled: each maintains its own schema and retains local autonomy in design, communication, and execution.

An FDBS typically provides a global view, or federated schema, that integrates the shareable portions of the component schemas. This gives applications a consistent, unified view of the data and network transparency, so they can access data from different sources seamlessly.
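One way to picture a federated schema is as a set of mappings from each component schema onto a shared global relation. In the sketch below, two hypothetical component databases describe employees differently: `emp(emp_no, full_name)` in one and `staff(id, first, last)` in the other. Each keeps its own schema. The global relation `EMPLOYEE(id, name, source)` and all table and function names are illustrative assumptions, and in-memory SQLite databases stand in for the component DBMSs.

```python
import sqlite3

# Component database A (hypothetical HR system)
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE emp (emp_no INTEGER, full_name TEXT)")
hr.execute("INSERT INTO emp VALUES (101, 'Ada Lovelace')")

# Component database B (hypothetical payroll system)
payroll = sqlite3.connect(":memory:")
payroll.execute("CREATE TABLE staff (id INTEGER, first TEXT, last TEXT)")
payroll.execute("INSERT INTO staff VALUES (7, 'Alan', 'Turing')")

# Mappings from each component schema to the federated schema EMPLOYEE(id, name, source).
# Only the shareable columns are exported; the component schemas are left untouched.
MAPPINGS = [
    ("hr", hr, "SELECT emp_no, full_name FROM emp"),
    ("payroll", payroll, "SELECT id, first || ' ' || last FROM staff"),
]

def federated_employees():
    rows = []
    for source, conn, local_query in MAPPINGS:
        rows += [(eid, name, source) for eid, name in conn.execute(local_query)]
    return rows

print(federated_employees())
# [(101, 'Ada Lovelace', 'hr'), (7, 'Alan Turing', 'payroll')]
```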

Challenges in Federated Systems (Dealing with Heterogeneity)

Integrating diverse systems presents several major challenges:

Differences in Data Models: The federation may need to connect relational, object-oriented, and legacy (hierarchical, network) databases. This requires intelligent query-processing mechanisms that interpret metadata and relate data across models.
Differences in Constraints and Query Languages: Systems may use different versions of SQL or have varying support for constraints, requiring translation and reconciliation.
Semantic Heterogeneity: This is the most difficult challenge. It arises from differences in the meaning, interpretation, and intended use of data. For example, two databases might both have a Price column, but one stores it in USD and the other in EUR, or one includes tax while the other does not. Resolving these semantic conflicts is crucial for a meaningful integration.
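The Price example can be resolved by choosing one global meaning and converting each source's values into it. The sketch below picks "USD, excluding tax" as the global convention. It assumes that source A already uses that convention and that source B stores EUR including tax. The exchange rate and tax rate are hypothetical fixed values for illustration; a real federation would have to obtain them from somewhere authoritative. `Decimal` is used to avoid binary floating-point rounding on money.

```python
from decimal import Decimal

EUR_TO_USD = Decimal("1.10")  # hypothetical exchange rate, fixed for illustration
VAT_RATE = Decimal("0.20")    # hypothetical tax rate included in source B's prices

def price_from_a(usd_net: Decimal) -> Decimal:
    """Source A: Price is USD excluding tax, already the global convention."""
    return usd_net

def price_from_b(eur_gross: Decimal) -> Decimal:
    """Source B: Price is EUR including tax; remove the tax, then convert to USD."""
    eur_net = eur_gross / (1 + VAT_RATE)
    return (eur_net * EUR_TO_USD).quantize(Decimal("0.01"))

# After conversion, the two 'Price' columns carry the same meaning and can be compared.
print(price_from_a(Decimal("110.00")) == price_from_b(Decimal("120.00")))  # True
```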

DDBMS vs. Parallel DBMS: A Critical Distinction

It's important not to confuse distributed database systems with parallel database systems, even though both use multiple processors.

Parallel DBMS: The primary goal is high performance by executing operations in parallel. Processors are typically in close proximity and connected by a high-speed interconnect, not a general-purpose network. They are often homogeneous.
DDBMS: The primary goal is to manage geographically distributed data and provide transparent access. The system is designed to handle network latency and potential failures across heterogeneous sites.

Common parallel architectures like shared memory, shared disk, and shared nothing are considered parallel processing environments, not distributed database systems.
