Distributed Database Design Techniques

In a distributed database system (DDBMS), the design process revolves around three core techniques: data fragmentation, data replication, and data allocation. These strategies determine how data is logically divided, copied, and physically placed across the various nodes (sites) in a network.

The goal is to improve performance, increase availability, and ensure the system can scale effectively. All metadata about these design choices is stored in a global directory (or catalog), which the DDBMS uses to locate data and process queries.

These techniques serve the following purposes:

  • Breaking the database into manageable logical units called fragments.

  • Enhancing availability and reliability through data replication.

  • Strategically allocating data fragments or replicas to appropriate sites.

Data Fragmentation

Data fragmentation is the process of breaking down the database, typically a table (relation), into smaller logical units called fragments. This allows for more granular control over data placement at various nodes (processing sites) across a computer network.

A good fragmentation schema must satisfy two key properties:

  1. Completeness: All the data from the original table must be present in the fragments.

  2. Reconstructability: It must be possible to reconstruct the original table from its fragments using relational operators like UNION and JOIN.

1. Horizontal Fragmentation (Sharding)

This technique divides a table into subsets of rows (tuples). Each fragment, often called a shard, contains rows that satisfy a specific condition. This is the most common form of fragmentation.

  • Rows are grouped based on the values of one or more attributes, much like a WHERE clause in SQL. Each of these fragments can then be stored at different sites.

  • Completeness and Disjointness: A set of horizontal fragments is considered complete if their conditions cover all tuples in the original relation (every tuple satisfies at least one condition). It is disjoint if no tuple satisfies conditions for more than one fragment.

  • Reconstruction: The original table is reconstructed using the UNION operator on its horizontal fragments.

Example: Suppose we partition the EMPLOYEE relation by department number:

  • Fragment 1: Dno = 5 (stored at Site A)
  • Fragment 2: Dno = 4 (stored at Site B)
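The partitioning and UNION-based reconstruction above can be sketched with a toy in-memory EMPLOYEE relation (the rows and site names are illustrative, not from an actual DDBMS):

```python
# Hypothetical EMPLOYEE relation as a list of tuples (dicts).
EMPLOYEE = [
    {"Ssn": "111", "Name": "Alice", "Dno": 5},
    {"Ssn": "222", "Name": "Bob",   "Dno": 4},
    {"Ssn": "333", "Name": "Carol", "Dno": 5},
]

# Each horizontal fragment is a selection (like a SQL WHERE clause).
fragment_site_a = [t for t in EMPLOYEE if t["Dno"] == 5]  # stored at Site A
fragment_site_b = [t for t in EMPLOYEE if t["Dno"] == 4]  # stored at Site B

# Reconstruction: the UNION of the horizontal fragments.
reconstructed = fragment_site_a + fragment_site_b
assert sorted(reconstructed, key=lambda t: t["Ssn"]) == \
       sorted(EMPLOYEE,      key=lambda t: t["Ssn"])
```

Because every tuple satisfies exactly one condition (Dno = 5 or Dno = 4 here), the fragments are both complete and disjoint.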

Common Sharding Methods:

  • Range Partitioning: Rows are distributed based on a range of key values (e.g., customers with last names A-M go to Shard 1, N-Z to Shard 2).

  • Hash Partitioning: A hash function is applied to a shard key to determine which shard a row belongs to, ensuring a more random data distribution.

Purpose: Sharding distributes the load of accessing file records to multiple nodes, improving performance via load balancing and horizontal scalability.
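The two sharding methods can be sketched as simple routing functions (the key names and shard counts are made up for illustration):

```python
import hashlib

def hash_shard(key: str, num_shards: int) -> int:
    # Hash partitioning: a stable hash of the shard key picks the shard,
    # spreading rows roughly evenly across shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def range_shard(last_name: str) -> int:
    # Range partitioning: last names A-M go to shard 0, N-Z to shard 1.
    return 0 if last_name[0].upper() <= "M" else 1
```

A stable (non-random) hash matters here: every node must route the same key to the same shard, which is why a fixed digest is used rather than Python's built-in `hash()` (randomized per process).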

2. Vertical Fragmentation

Vertical fragmentation divides a relation "vertically" into subsets of columns (attributes); each vertical fragment retains only a subset of the attributes.

Requirement for Reconstruction: To preserve the ability to reconstruct the original table, every vertical fragment must include the primary key (e.g., Ssn) or another unique identifier.

The original table is reconstructed with a FULL OUTER JOIN of the vertical fragments on the primary key.

  • Purpose: This technique is useful when different sites only need specific attributes of a relation, reducing data transfer and storage at those sites.

Example: Fragmenting the EMPLOYEE relation:

  • Fragment A: (Ssn, Name, Bdate, Address) (Personal Info)

  • Fragment B: (Ssn, Salary, Dno) (Work Info)
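A minimal sketch of this vertical split and its key-based reconstruction, again on a toy in-memory relation (column values are illustrative):

```python
EMPLOYEE = [
    {"Ssn": "111", "Name": "Alice", "Bdate": "1990-01-01",
     "Address": "12 Oak St", "Salary": 50000, "Dno": 5},
]

# Both fragments keep the primary key Ssn so the table can be rebuilt.
personal_cols = ("Ssn", "Name", "Bdate", "Address")
work_cols     = ("Ssn", "Salary", "Dno")

fragment_a = [{c: t[c] for c in personal_cols} for t in EMPLOYEE]
fragment_b = [{c: t[c] for c in work_cols}     for t in EMPLOYEE]

# Reconstruction: join the fragments on the shared primary key.
work_by_ssn  = {t["Ssn"]: t for t in fragment_b}
reconstructed = [{**a, **work_by_ssn[a["Ssn"]]} for a in fragment_a]
assert reconstructed == EMPLOYEE
```

Dropping `Ssn` from either fragment would make the join impossible, which is exactly why the reconstruction requirement forces every vertical fragment to carry the key.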

3. Mixed (Hybrid) Fragmentation

A combination of horizontal and vertical fragmentation. A table can first be split into horizontal fragments (shards), and then one or more of those shards can be further divided into vertical fragments.

  • Allows more flexible partitioning depending on both row and column-based needs.

  • The original relation can be reconstructed by applying UNION and OUTER UNION (or OUTER JOIN) operations in the appropriate order.
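The two-step split and the JOIN-then-UNION reconstruction can be sketched as follows (a toy relation with illustrative values, not an actual DDBMS API):

```python
EMPLOYEE = [
    {"Ssn": "111", "Name": "Alice", "Salary": 50000, "Dno": 5},
    {"Ssn": "222", "Name": "Bob",   "Salary": 40000, "Dno": 4},
]

# Step 1: horizontal split by department.
shard5 = [t for t in EMPLOYEE if t["Dno"] == 5]
shard4 = [t for t in EMPLOYEE if t["Dno"] == 4]

# Step 2: vertically split shard5; the key Ssn stays in both pieces.
shard5_personal = [{k: t[k] for k in ("Ssn", "Name")}          for t in shard5]
shard5_work     = [{k: t[k] for k in ("Ssn", "Salary", "Dno")} for t in shard5]

# Reconstruction: JOIN the vertical pieces on Ssn, then UNION with shard4.
work = {t["Ssn"]: t for t in shard5_work}
rebuilt5      = [{**p, **work[p["Ssn"]]} for p in shard5_personal]
reconstructed = rebuilt5 + shard4
assert sorted(reconstructed, key=lambda t: t["Ssn"]) == \
       sorted(EMPLOYEE,      key=lambda t: t["Ssn"])
```

The order of operations matters: the vertical pieces of a shard must be joined back together before that shard can be unioned with the others.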


Fragmentation Schema

A fragmentation schema defines the complete set of fragments into which the database is divided. It must satisfy:

  • Completeness: Every tuple and attribute in the database appears in at least one fragment.

  • Reconstructability: The original database can be recreated from the fragments using OUTER JOINs, UNIONs, or OUTER UNIONs.

Once the database is fragmented, an allocation schema then describes how these fragments (or their replicas) are assigned to specific nodes (sites) in the distributed system.

Data Replication

Data replication is the process of creating and maintaining multiple copies (replicas) of data fragments on different nodes. The primary goals are to improve data availability (if one node fails, the data is still accessible from another) and read performance (queries can be processed by the nearest node).

Replication Strategies

  1. Full Replication: Every fragment is replicated at every site in the network. This maximizes availability for reads, since any query can be answered locally, but it makes updates very slow: every change must be propagated to every copy, which also complicates concurrency control and recovery.

  2. Partial Replication: This is the most common strategy. Only some fragments are replicated, often those that are in high demand. The number of replicas for each fragment can vary.

  3. No Replication: Each fragment is stored at exactly one site. This minimizes storage costs and makes updates fast, but it offers the lowest availability and resilience.

A replication schema outlines which fragments are replicated and at which sites.
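A replication schema can be pictured as a mapping from each fragment to the set of sites holding a copy (the fragment and site names below are purely illustrative):

```python
# Hypothetical replication schema: fragment name -> sites holding a replica.
replication_schema = {
    "EMP_DNO5": {"SiteA", "SiteC"},  # partial replication: hot fragment, 2 copies
    "EMP_DNO4": {"SiteB"},           # no replication: a single copy
}

def sites_holding(fragment: str) -> set:
    # Where can a read of this fragment be served from?
    return replication_schema.get(fragment, set())

def is_replicated(fragment: str) -> bool:
    # A fragment is replicated when more than one site holds it.
    return len(sites_holding(fragment)) > 1
```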

Data Allocation

Data allocation is the final step where each fragment and its replicas are assigned to a specific physical node in the distributed system. This is defined by an allocation schema.

The decision of where to allocate data is critical and depends on several factors:

  • Performance: Placing data closer to the users who access it most frequently to reduce query latency.

  • Availability: Distributing replicas across different geographic locations or failure domains to protect against outages.

  • Transaction Workload: Analyzing the frequency of reads and writes at different sites.

  • Network Costs: Minimizing data transfer across expensive or slow network links.

An allocation schema maps each fragment to one or more sites of the DBMS. When a fragment is stored at multiple sites, it is said to be replicated.
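As a toy illustration of the performance and workload factors above, one simple (and deliberately naive) heuristic assigns each fragment to the site that reads it most often; the access counts here are made-up example data, not a real workload trace:

```python
# Hypothetical read counts per fragment per site, e.g. from workload analysis.
read_frequency = {
    "EMP_DNO5": {"SiteA": 120, "SiteB": 10},
    "EMP_DNO4": {"SiteA": 5,   "SiteB": 90},
}

# Naive allocation: place each fragment at the site with the most reads.
allocation_schema = {
    frag: max(sites, key=sites.get) for frag, sites in read_frequency.items()
}
# → {"EMP_DNO5": "SiteA", "EMP_DNO4": "SiteB"}
```

A production allocator would also weigh write frequency, replica counts, network costs, and failure domains, but the principle of placing data near its heaviest readers is the same.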