Three-Schema Architecture
The three-schema architecture is a framework proposed by the ANSI/SPARC committee to promote data abstraction and data independence. Its main goal is to separate user applications from the physical storage of data, allowing for better modularity, flexibility, and system management.
This architecture defines three levels of database schema:
1. Internal Level (Physical Level)
Describes how data is physically stored on storage media. Defines the internal schema, which includes:
- File structures
- Indexes
- Access paths (e.g., hashing, B-trees)
- Storage locations
- Compression and encryption techniques (if any)
Uses a low-level physical data model. Intended for database administrators and system programmers.
2. Conceptual Level (Logical Level)
Describes the logical structure of the entire database for a community of users. Defines the conceptual schema, which includes:
- Entities and relationships
- Data types and attributes
- Integrity constraints
- User operations and business rules
The Conceptual schema hides physical storage details and focuses on what data is stored and how it is related. Typically implemented using a representational data model, such as the relational model.
3. External Level (View Level)
- Represents the individual user views of the database.
- Defines multiple external schemas, each tailored to the needs of a specific user group or application.
- An external schema includes only the relevant data for the user, hiding the rest of the database.
- Each external schema is often based on a conceptual design but implemented using the same data model as the conceptual schema (e.g., SQL in relational DBMSs like Oracle or SQL Server).
Schema Mappings and Transformation
The three schemas are descriptions of the data structure; actual data resides only at the internal level.
Each user interacts with the external schema. The DBMS must translate (or map) user requests:
- From the external schema → to the conceptual schema
- From the conceptual schema → to the internal schema
If the operation is a data retrieval, the DBMS must also reformat the result:
- From the internal format → to the conceptual format
- Then to the user’s external format
These translation processes are called schema mappings, and they are essential for maintaining abstraction between the different levels.
Data Independence
Data independence refers to the ability to modify the schema at one level of the database system without requiring changes to the schema at the next higher level. It allows for greater flexibility and maintainability of database systems.
There are two types of data independence:
1. Logical Data Independence
The capacity to change the conceptual schema without requiring changes to external schemas or application programs.
- Adding or removing an entity or attribute
- Modifying constraints
Only the mappings between the conceptual and external schemas need to be updated. Application programs that access the data through the external schema remain unchanged.
Logical data independence is difficult to achieve because application programs are often tightly coupled to the structure of the data.
2. Physical Data Independence
The capacity to change the internal schema without requiring changes to the conceptual schema.
- Changing the file organization
- Creating or removing indexes
- Moving data to a new storage device
Such changes are transparent to both the conceptual and external schemas.
Physical data independence is more common in practice and easier to achieve, as low-level storage details are naturally abstracted from users.
Importance of Schema Mapping and Catalog Management
In a multi-level DBMS, the system catalog (or metadata repository) must maintain detailed mapping information between each level:
- External ↔ Conceptual
- Conceptual ↔ Internal
These mappings ensure that requests and data can be translated and interpreted correctly across levels, enabling data independence since only the mapping between two levels changed so higher level schema need not change.