Data Models, Schemas, and Instances
Data Abstraction
Data abstraction generally refers to the suppression of details of data organization and storage, and the highlighting of essential features for an improved understanding of data.
One of the main characteristics of the database approach is to support data abstraction so that different users can perceive data at their preferred level of detail.
A data model—a collection of concepts used to describe the structure of a database—provides the necessary means to achieve this abstraction. By the structure of a database, we mean the data types, relationships, and constraints that apply to the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.
Generic operations to insert, delete, modify, or retrieve any kind of object are often included in the basic data model operations.
Some data models also support the dynamic aspect or behavior of a database application. This allows the database designer to specify a set of valid user-defined operations that are allowed on the database objects.
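As a rough illustration of the generic operations described above, the following sketch uses Python's built-in sqlite3 module. The STUDENT table, its columns, and the change_major function are assumptions made up for this example; change_major stands in for a user-defined operation and does not come from any particular DBMS.

```python
import sqlite3

# In-memory database; the STUDENT table and its columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    " Student_number INTEGER PRIMARY KEY, Name TEXT, Major TEXT)"
)

# Generic operations: insert, modify, delete, retrieve.
conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", (17, "Smith", "CS"))
conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", (8, "Brown", "CS"))
conn.execute("UPDATE STUDENT SET Major = ? WHERE Student_number = ?", ("Math", 17))
conn.execute("DELETE FROM STUDENT WHERE Student_number = ?", (8,))
rows = conn.execute("SELECT Student_number, Name, Major FROM STUDENT").fetchall()


def change_major(student_number, new_major):
    """Hypothetical user-defined operation built on the generic ones."""
    cur = conn.execute(
        "UPDATE STUDENT SET Major = ? WHERE Student_number = ?",
        (new_major, student_number),
    )
    return cur.rowcount == 1  # True if exactly one student was updated


print(rows)
print(change_major(17, "Physics"))
```

The generic operations work on any table, whereas change_major only makes sense for STUDENT objects; that is the distinction between basic data model operations and user-defined behavior.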
Categories of Data Models
Many data models have been proposed. They can be categorized according to the types of concepts they use to describe the database structure:
High-level (conceptual) data models: Provide concepts that are close to the way users perceive data. These models are used during the early phases of database design and include concepts like entities, attributes, and relationships.
Low-level (physical) data models: Describe the details of how data is stored on the computer storage media, typically magnetic disks. These models include information such as record layouts, storage structures, and access paths. They are intended for use by system programmers and database administrators.
Representational (implementation) data models: Fall between the two extremes above. They provide concepts that end users can readily understand but that are not too far removed from the way data is organized in storage. These models hide many storage details but can be implemented directly on a computer system. They are the models most commonly used in commercial DBMSs.
Conceptual data models use constructs such as:
Entity: Represents a real-world object or concept, such as an employee or project.
Attribute: Represents a property or characteristic of an entity, such as name or salary.
Relationship: Represents an association among two or more entities.
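The three constructs above can be sketched with Python dataclasses. This is only an informal picture of the concepts, not a design notation; the class names, the WorksOn relationship, and its hours attribute are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:        # entity type
    name: str          # attribute
    salary: float      # attribute


@dataclass(frozen=True)
class Project:         # entity type
    title: str         # attribute


@dataclass(frozen=True)
class WorksOn:         # relationship between an Employee and a Project
    employee: Employee
    project: Project
    hours: float       # attribute that belongs to the relationship itself


alice = Employee("Alice", 72000.0)
apollo = Project("Apollo")
assignment = WorksOn(alice, apollo, hours=20.0)
print(assignment.employee.name, "works on", assignment.project.title)
```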
Representational or implementation data models are used most frequently in traditional DBMSs. These include:
The relational model (widely used today)
Legacy models such as the network and hierarchical models
Physical data models describe how data is stored as files on disk. They include specifications for:
Record formats
Record ordering
Access paths (e.g., indexing, hashing) that improve the efficiency of data retrieval
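To see why an access path helps, the sketch below contrasts a linear scan with a hash-based lookup. It is a toy model held in memory, assuming a small list of records and a Python dict as the index; a real DBMS works with disk blocks and far more elaborate structures.

```python
# Records stored in arrival order (an unordered file in miniature).
records = [
    {"Student_number": 17, "Name": "Smith"},
    {"Student_number": 8, "Name": "Brown"},
    {"Student_number": 25, "Name": "Lee"},
]


def scan_lookup(student_number):
    """No access path: examine the records one by one."""
    for record in records:
        if record["Student_number"] == student_number:
            return record
    return None


# Hash-based access path: maps a key value to the position of its record.
hash_index = {rec["Student_number"]: pos for pos, rec in enumerate(records)}


def index_lookup(student_number):
    """With an access path: jump straight to the record's position."""
    pos = hash_index.get(student_number)
    return records[pos] if pos is not None else None


print(scan_lookup(8) == index_lookup(8))
```

Both functions return the same record; the difference is that the scan's cost grows with the number of records while the index lookup does not, at the price of maintaining the index whenever records change.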
Another category of data models is self-describing data models. In these models, data storage combines the data with its description. In traditional DBMSs, the schema is stored separately from the data. In contrast, self-describing data models embed metadata within the data itself. Examples include:
XML
Key-value stores
Various NoSQL systems developed to manage big data
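The difference between a separately stored schema and self-describing data can be sketched as follows. JSON is used here only as a convenient stand-in for a self-describing format, and the field names are illustrative.

```python
import json

# Traditional style: the description (schema) lives apart from the data.
schema = ("Student_number", "Name", "Major")
row = (17, "Smith", "CS")          # meaningless without the schema

# Self-describing style: each record carries its own field names.
document = json.dumps({"Student_number": 17, "Name": "Smith", "Major": "CS"})

# A reader needs no external schema to interpret the document.
decoded = json.loads(document)
print(decoded["Name"])
```

The self-describing form repeats the field names in every record, which costs space but lets records with different fields coexist in the same collection.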
Schemas, Instances, and Database State
In a data model, it is important to distinguish between the description of the database and the actual data stored in it.
A schema is the overall description of the database. It defines the structure of data and is specified during database design. Schemas change infrequently and act as a blueprint for how data is organized. Each object in the schema—such as STUDENT or COURSE—is called a schema construct.
An instance (or database state) is the actual data in the database at a particular moment in time. It reflects the current content of the database and can change frequently as data is inserted, deleted, or modified.
The DBMS is responsible for ensuring that each instance of the database conforms to the constraints and structure defined in the schema. This ensures that every state of the database is valid.
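A minimal sketch of this enforcement, using Python's built-in sqlite3 module: the table and the NOT NULL constraint on Name are assumptions for the example. An update that would lead to an invalid state is rejected, so the database stays in its previous valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    " Student_number INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL)"
)
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")

# This row violates the NOT NULL constraint declared in the schema.
try:
    conn.execute("INSERT INTO STUDENT VALUES (8, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(rejected, count)
```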
The DBMS stores the schema definitions and constraints—also known as metadata—in the DBMS catalog. The system refers to this catalog to interpret and enforce the structure of the database.
The schema is sometimes referred to as the intension of the database.
The current database state or instance is referred to as the extension of the schema.
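The distinction between schema and state can be observed directly in a small system. The sketch below assumes SQLite, whose catalog is exposed as the sqlite_master table; other DBMSs organize their catalogs differently. The COURSE table and its contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE COURSE (Course_number TEXT PRIMARY KEY, Course_name TEXT)"
)


def course_schema():
    """Read the stored definition of COURSE from SQLite's catalog."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'COURSE'"
    ).fetchone()
    return row[0]


def course_state():
    """Read the current instance (extension) of COURSE."""
    return conn.execute("SELECT * FROM COURSE").fetchall()


schema_before, state_before = course_schema(), course_state()
conn.execute("INSERT INTO COURSE VALUES ('CS1310', 'Intro to Computer Science')")
schema_after, state_after = course_schema(), course_state()

print(schema_before == schema_after)   # the intension is unchanged
print(state_before, state_after)       # the extension has changed
```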
Changes to the schema may occasionally be required. For example, we may decide to add a new attribute such as Date_of_birth to the STUDENT entity. This process is called schema evolution.
Most modern DBMSs support schema evolution, and these changes can often be applied while the database remains operational.
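The Date_of_birth example can be sketched with SQLite through Python's sqlite3 module. The starting table layout is an assumption; the point is that existing rows survive the change and simply have no value for the new attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")

# Schema evolution: add a new attribute to an existing schema construct.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Date_of_birth TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
rows = conn.execute("SELECT * FROM STUDENT").fetchall()

print(columns)
print(rows)
```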
Data Models
A data model is a collection of concepts that can be used to describe the structure of a database. It provides a formal framework for organizing, representing, and manipulating data. Data models play a central role in supporting data abstraction.
Components of a Data Model
Structural Component:
Defines how data is organized in the database. This includes:
Data types
Entities or objects
Attributes
Relationships among entities
Constraints (such as keys, cardinality, and data integrity rules)
Manipulative Component:
Specifies the operations that can be performed on the data. These include:
Basic operations such as insert, delete, update, and retrieve
Query languages (e.g., SQL for relational databases)
Advanced or user-defined operations depending on the model (e.g., traversals in graph databases)
Dynamic or Behavioral Component (in some models):
Defines the behavior of the database application, that is, how the data may change in response to events and operations. This includes:
Triggers
Rules
User-defined procedures or operations
Support for modeling workflows or application-specific behaviors
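As one possible illustration of the behavioral component, the sketch below defines a trigger in SQLite via Python's sqlite3 module. The EMPLOYEE and SALARY_LOG tables and the trigger name log_salary_change are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (Name TEXT PRIMARY KEY, Salary REAL);
    CREATE TABLE SALARY_LOG (Name TEXT, Old_salary REAL, New_salary REAL);

    -- Behavior attached to the data: every salary update is recorded.
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF Salary ON EMPLOYEE
    BEGIN
        INSERT INTO SALARY_LOG VALUES (OLD.Name, OLD.Salary, NEW.Salary);
    END;
    """
)

conn.execute("INSERT INTO EMPLOYEE VALUES ('Alice', 50000)")
conn.execute("UPDATE EMPLOYEE SET Salary = 55000 WHERE Name = 'Alice'")

log = conn.execute("SELECT * FROM SALARY_LOG").fetchall()
print(log)
```

The application only issued an ordinary update; the log entry was produced by the rule stored in the database itself.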
Different data models offer different levels of abstraction and are suited to different types of applications.
Examples of Common Data Models
Relational Model: Uses tables (relations), rows (tuples), and columns (attributes). The most widely used model in commercial DBMSs.
Entity-Relationship (ER) Model: A high-level conceptual model used in database design.
Object-Oriented Model: Represents data as objects, similar to object-oriented programming.
Document Model: Stores data in document formats (e.g., JSON, XML). Used in NoSQL systems.
Graph Model: Represents data as nodes and edges. Useful in applications like social networks and recommendation systems.
Key-Value Model: Simplest form of NoSQL database, where each item is stored as a key-value pair.
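To compare these models informally, the sketch below represents the same fact (student Smith is enrolled in course CS1310) in four styles using plain Python structures. The identifiers, key formats, and helper functions are assumptions for illustration only; real systems of each kind have their own storage formats and query languages.

```python
# Relational: flat rows in separate tables, linked by key values.
students = [(17, "Smith")]
enrollments = [(17, "CS1310")]

# Document: one nested record holding the student and the courses.
student_doc = {"Student_number": 17, "Name": "Smith", "Courses": ["CS1310"]}

# Graph: nodes plus labeled edges between them.
nodes = {"s17": {"Name": "Smith"}, "c1310": {"Course_number": "CS1310"}}
edges = [("s17", "ENROLLED_IN", "c1310")]

# Key-value: values looked up by key, with no structure known to the store.
kv = {"student:17:name": "Smith", "student:17:courses": "CS1310"}


def courses_relational(student_number):
    """Match rows on the shared key value, as a join would."""
    return [c for s, c in enrollments if s == student_number]


def courses_document(doc):
    """Read the nested list directly from the document."""
    return list(doc["Courses"])


def courses_graph(node_id):
    """Follow ENROLLED_IN edges out of the student node."""
    return [
        nodes[dst]["Course_number"]
        for src, label, dst in edges
        if src == node_id and label == "ENROLLED_IN"
    ]


def courses_key_value(student_number):
    """Build the key, fetch the value, and parse it in the application."""
    return kv[f"student:{student_number}:courses"].split(",")


print(courses_relational(17), courses_document(student_doc))
print(courses_graph("s17"), courses_key_value(17))
```

All four lookups return the same answer; what differs is where the structure lives (in the tables, in the document, in the edges, or in the application that interprets the keys).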