Skip to content

Database Design Process

The database design process is a structured approach to creating an efficient and reliable database system that meets user requirements.

The goal is to generate a set of relation schema's that allows information to be stored without unnecessary redundancy and facilitates easy retrieval.

Phases of the Database Design Process

  1. Requirements Collection & Analysis: Designers interview users to understand their data and functional requirements (what information to store, what operations to perform, retrievals and updates). Techniques like flow diagrams, sequence diagrams, and scenarios are used.

  2. Conceptual Design: The collected requirements are translated into a conceptual schema using a high-level model like ER. This schema describes entities, attributes, relationships, and constraints, while hiding physical storage details. This phase often results in an E-R diagram, which provides a graphic representation of the schema.

  3. Logical Design (Data Model Mapping): The conceptual schema is mapped to an implementation data model, typically the relational model. This is where tables, columns, and keys in SQL are designed.

  4. Physical Design: The final step involves defining the physical storage details, such as file organization, indexes, and access paths, to ensure optimal performance.

  5. Application Program Design / Transaction Implementation: In parallel with database design, application programs are designed and implemented as database transactions, corresponding to the high-level transaction specifications identified earlier.

			Miniworld
			   |
			   V
+-------------------------------+
| REQUIREMENTS COLLECTION &     | <--- Data Requirements &
|      ANALYSIS                 |     Functional Requirements
+-------------------------------+
			   |
			   V
+-------------------------------+
|       CONCEPTUAL DESIGN       | <--- High-Level Data Model
| (e.g., E-R Schema Diagram)    |    (Entity-Relationship Model)
+-------------------------------+
			   |
			   V
+-------------------------------+
|        LOGICAL DESIGN         | <--- Mapping to Implementation Data Model
| (Data Model Mapping,          |   (Relational Schema with FDs)
|  Normalization)               |
+-------------------------------+
			   |
			   V
+-------------------------------+
|       PHYSICAL DESIGN         | <--- Internal Schema 
+-------------------------------+     (File Org., Indexes)
			   |
			   V
+-------------------------------+
| APPLICATION PROGRAM DESIGN|
| (Transaction Implementation)  |
+-------------------------------+
			   |
			   V
Stored Database (Implemented)

The Conceptual Model

The first and most critical phase of designing a database is creating a conceptual model. This is a high-level blueprint that captures the data requirements of an organization in a clear, technology-independent way. It focuses on what data is needed, not how it will be stored.

The most widely used conceptual model is the Entity-Relationship (ER) Model.

Concepts of the Entity-Relationship (ER) Model

The ER model describes the world using three main concepts: entities, their attributes, and the relationships among them.

Entity

An entity is a real-world "thing" or object of interest, which can be:

  • Physical: A person, a car, an employee.

  • Conceptual: A company, a university course, a job.

Attribute

Entity has attributes which are properties or characteristic that describes an entity. An EMPLOYEE entity might have attributes like Name, Age, Address, and Salary.

Each entity possesses a value for each of its attributes, and these values form a major part of the data stored in the database.

Types of Attributes

The ER model recognizes several types of attributes, allowing for a flexible representation of complex real-world data:

1. Simple (Atomic) vs Composite Attributes

  • Simple (Atomic) Attributes: indivisible and cannot be broken down into smaller subparts. Values from a domain are atomic, meaning they are indivisible units. Example: ID, age, gender, salary, title, credits.

  • Composite Attributes: can be divided into smaller subparts, which are themselves other attributes with independent meanings. Composite attributes help to group together related attributes, making the modeling cleaner.

  • They can form a hierarchy, meaning a component attribute of a composite attribute can also be composite. Example: AddressStreet_address, City, State, Zip
    Street_addressNumber, Street, Apartment_number

Composite attributes allow referencing either the entire attribute or its individual components.


2. Single-Valued vs Multivalued Attributes

  • Single-Valued Attribute: Most attributes fall into this category, meaning they have a single value for a particular entity at any given time.
    Example: A person has only one Birth_Date, Student's ID.

  • Multivalued Attribute: Can have a set of values for a specific entity. Different entities can have different numbers of values for such attributes.
    Example: Colors of a car, College_degrees of a person

Multivalued attributes may also have lower and upper bounds to constrain the number of values allowed. A car may have 1 or 2 color values.


3. Stored vs Derived Attributes

  • Stored Attributes: A base attribute whose value is stored directly in the database. Example: Birth_date

  • Derived Attributes: Its value is not stored but computed or derived from the values of other related attributes or entities when needed.
    Example:

    • Age can be derived from their Birth_Date and the current date.
    • Number_of_employees derived from counting employees in a department.

4. Complex Attributes

A complex attribute is formed by nesting composite and multivalued attributes. They provide flexibility to represent hierarchical attribute structures.

A person's Address_phone attribute could include a multivalued Phone attribute and a composite Address attribute, both of which are themselves composite.

json
{ Address_phone(
	{ Phone(Area_code, Phone_number) },
	Address(Street_address(Number, Street), City, State)
  )
}

Entity Types and Sets

  • An Entity Type is the blueprint or template for a set of similar entities. It defines the collection of name and attributes that all entities of that type will share. (e.g., the concept of an EMPLOYEE).

  • An Entity Set is the collection of all actual entity instances of a particular type at a specific point in time. (e.g., all the current employees in the company's database).

Entity sets are usually referred to using the same name as the entity type, though they are conceptually distinct. EMPLOYEE refers to both the entity type and the current collection of all employee entities in the database.

ER Diagram Notation

  • Entity types are represented as rectangular boxes enclosing the entity type name.

  • Attribute names are enclosed in ovals and connected to their entity type with straight lines.

  • Composite attributes are connected to their component attributes by straight lines.

  • Multivalued attributes are shown in double ovals.


Key Attributes of an Entity Type

An important constraint is that entities within an entity set must be uniquely identifiable. Which is the key (uniqueness) constraint on attributes.

  • A key attribute has values that are distinct for each entity in the entity set. This value can be used to identify each entity uniquely.

  • A key can be a single attribute (e.g., StudentID) or a composite key made of multiple attributes (e.g., (State, License_Number) for a car).

  • An entity type may have multiple candidate keys. The selection of a primary key happens during the logical design phase. Superfluous attributes should not be included in a key.

  • In ER diagrams, key attributes are underlined inside the oval.


Value Sets (Domains) of Attributes

Each simple attribute of an entity type is associated with a value set (or domain), which specifies the set of permissible values for that attribute, similar to a data type in programming (e.g., integer, string).

  • They are not typically shown in ER diagrams.

  • At any given time, only a subset of the domain values exist in the database, this subset reflects the current state of the miniworld.


NULL Values

Each value in a tuple must be Atomic, from attribute's domain. Sometimes, an attribute value for an entity may not be known. The special value NULL is used to represent this.

NULL is not the same as zero (0) or an empty string (""). NULL can mean:

  1. Not Applicable (Value does not exist) : The attribute logically doesn’t apply to an entity. if an entity type has an attribute that is relevant only to a subset of its entities, other entities will have a NULL value for that attribute. (e.g., Apartment_number for a person living in a house is NULL indicating "Not Applicable").

  2. Unknown (Exists but not known) : situation arises when an attribute value genuinely exists for an entity but is not known or has not yet been recorded in the database. e.g., Height, Date_of_birth of a person is unknown.

  3. Unknown (May or may not exist) : It's unclear if the entity even possesses this attribute in the real world. e.g., Home_phone of a person is NULL, and it's unclear if it exists at all.

When an attribute is declared as NOT NULL, the insertion of a NULL value for that attribute will generate an error.

SQL prohibits NULL values in the primary key of a relation due to the entity integrity constraint, as primary keys are used to uniquely identify tuples.

The presence of NULL values can lead to unpredictable results in SELECT, JOIN, and aggregate operations. It is generally recommended to avoid NULL values as much as possible in database design.

Made with ❤️ for students, by a fellow learner.