Managing Data and Storage

The Core Problem: The Ephemeral Filesystem

Containers are designed to be ephemeral and stateless. When a container is created, Docker stacks a thin, writable layer on top of the read-only image layers. Any data written by the application (logs, database files, user uploads) is stored in this writable layer.

When the container is deleted (docker rm), this writable layer is destroyed with it. All data inside is permanently lost.
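The loss is easy to reproduce. The following is a minimal sketch, assuming the public alpine image is available; the function name, the container name scratch-demo, and the file /note.txt are hypothetical names chosen for illustration.

```shell
# Show that a file in the writable layer disappears together with its container.
# 'scratch-demo' and '/note.txt' are illustrative names, not part of any real setup.
demo_ephemeral_layer() {
  # Write a file into the container's writable layer; the container then exits.
  docker run --name scratch-demo alpine sh -c 'echo hello > /note.txt'

  # A stopped container still owns its writable layer, so the file can be copied out.
  docker cp scratch-demo:/note.txt - > /dev/null && echo "before rm: file still readable"

  # Removing the container destroys the writable layer.
  docker rm scratch-demo > /dev/null

  # The same copy now fails because the container and its layer no longer exist.
  docker cp scratch-demo:/note.txt - > /dev/null 2>&1 || echo "after rm: container and file are gone"
}
```

Calling demo_ephemeral_layer on a machine with Docker installed should print both messages: stopping a container keeps its data, while removing it does not.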

To run stateful applications (like databases) or to manage application data (like source code or configuration files), we must use one of Docker's persistent storage mechanisms. These mechanisms move data outside the container's writable layer and onto the host filesystem, where its lifecycle is decoupled from the container.

Docker provides three mechanisms for mounting data into a container. The first two persist data on the host; the third keeps data in memory only and is not persistent:

  1. Volumes (Preferred Method)

  2. Bind Mounts

  3. tmpfs Mounts

1. Volumes

Volumes are the officially recommended and preferred mechanism for persisting data in Docker.

A Volume is a directory stored on the Docker host's filesystem that is managed by Docker. When you create a volume, Docker creates a directory in a specific location on the host (by default, inside /var/lib/docker/volumes/) and manages its lifecycle.

Key Characteristics:

  • Managed by Docker: You don't need to know where on the host the data is stored. You refer to the volume by its name (e.g., my-db-data), and Docker handles the underlying path.

  • Decoupled Lifecycle: The volume's lifecycle is completely separate from the container's. You can create, delete, and inspect volumes independently. A volume is not deleted when a container using it is deleted.

  • Portability and Safety: Volumes are the safest and most portable way to manage data. You can back up, restore, or migrate volumes. Volume drivers also allow you to store volume data on remote hosts or cloud providers (like Amazon S3), which is not possible with other methods.

  • Performance: Volumes are high-performance and are the best choice for read/write-heavy applications like databases.

Use Case: Persisting database files (/var/lib/postgresql/data), storing user-uploaded content, sharing data safely between containers.
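The backup and restore mentioned above is commonly done with a short-lived helper container that mounts both the volume and a host directory. The sketch below assumes the public alpine image (for its tar command); the function names and the /data and /backup mount points are hypothetical.

```shell
# Archive the contents of a named volume into ./<archive> on the host.
# Function names and the /data, /backup paths are illustrative choices.
backup_volume() {
  volume="$1"
  archive="$2"
  docker run --rm \
    --mount "type=volume,source=${volume},target=/data,readonly" \
    --mount "type=bind,source=$(pwd),target=/backup" \
    alpine tar czf "/backup/${archive}" -C /data .
}

# Unpack ./<archive> from the host into a named volume.
restore_volume() {
  volume="$1"
  archive="$2"
  docker run --rm \
    --mount "type=volume,source=${volume},target=/data" \
    --mount "type=bind,source=$(pwd),target=/backup,readonly" \
    alpine tar xzf "/backup/${archive}" -C /data
}
```

For example, backup_volume my-db-data my-db-data.tar.gz writes the archive to the current directory. For a database volume it is usually safer to stop the database container first, so the files are copied in a consistent state.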

2. Bind Mounts

A Bind Mount is the older, lower-level method. It maps (or "binds") a specific file or directory from the host into the container. Unlike volumes, Docker does not manage this storage; it only creates the mapping.

Key Characteristics:

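  • Tied to Host Path: You specify the host path explicitly, typically as an absolute path on the docker run command line (e.g., /home/user/project/src:/app). Docker Compose also accepts paths relative to the Compose file (e.g., ./webapp:/app).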

  • Non-Portable: This creates a hard dependency on the host's filesystem structure. A container that relies on /home/user/project/src will not run on another machine that doesn't have that exact directory.

  • Security Risk: A container with a bind mount can modify the host filesystem, including creating, modifying, or deleting sensitive host files. A misconfigured container could (for example) be given access to / or /etc.

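  • High Performance: On a Linux host, a bind mount gives direct, native-speed access to the host's files. On Docker Desktop (macOS and Windows), file access crosses a virtual machine boundary and can be noticeably slower.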

Use Case: The primary use case is for local development. By bind-mounting your source code (e.g., ./webapp:/app), you can edit code on your host machine with your favorite IDE, and the changes are instantly reflected inside the running container, enabling live-reloading.
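One way to reduce the security risk described above is to mount the host directory read-only, so the container can see host edits but cannot write back. This is a minimal sketch, assuming an image that ships a shell (such as alpine); the function name and the /app/probe file are hypothetical.

```shell
# Run an image with a host directory bind-mounted read-only at /app.
# The 'readonly' option makes writes from inside the container fail.
# The function name and the probe file are illustrative.
run_with_readonly_source() {
  host_dir="$1"
  image="$2"
  docker run --rm \
    --mount "type=bind,source=${host_dir},target=/app,readonly" \
    "$image" sh -c 'touch /app/probe || echo "write blocked"'
}
```

For example, run_with_readonly_source "$(pwd)" alpine attempts to create a file in the mounted directory; with the readonly option the attempt should fail and nothing is created on the host.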

3. tmpfs Mounts

A tmpfs Mount is a non-persistent, in-memory storage option. Data is written to the host's RAM, not to its disk.

Key Characteristics:

  • In-Memory: Data is never written to the host filesystem (disk).

  • Extremely Fast: Writing and reading from RAM is significantly faster than from a disk.

  • Non-Persistent: As soon as the container is stopped, the tmpfs mount is destroyed, and all data within it is lost.

  • Security: Ideal for temporary, sensitive data that you do not want to be written to disk.

Use Case: Storing temporary files, caches, or secrets that an application needs during its runtime but should not be saved.
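Because a tmpfs mount consumes host RAM, it is often worth capping its size. The sketch below uses the tmpfs-size (bytes) and tmpfs-mode (octal permissions) options of --mount; the function name and the /app/cache path are hypothetical, and tmpfs mounts are assumed to be running on a Linux host.

```shell
# Run an image with a size-limited, owner-only tmpfs mounted at /app/cache.
# tmpfs-size is given in bytes; tmpfs-mode is an octal file mode.
# The function name and mount path are illustrative.
run_with_tmpfs_cache() {
  image="$1"
  size_bytes="$2"
  docker run --rm \
    --mount "type=tmpfs,destination=/app/cache,tmpfs-size=${size_bytes},tmpfs-mode=0700" \
    "$image" df -h /app/cache
}
```

For example, run_with_tmpfs_cache alpine 67108864 requests a 64 MiB tmpfs and prints the mount's reported size from inside the container.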

Comparison: Volume vs. Bind Mount vs. tmpfs

| Feature | Volumes | Bind Mounts | tmpfs Mounts |
| --- | --- | --- | --- |
| Host Location | Managed by Docker (e.g., /var/lib/docker/volumes/) | Specific path, set by user (e.g., /home/user/src) | Host's Memory (RAM) |
| Lifecycle | Independent of container | Tied to host filesystem | Tied to container lifecycle |
| Persistence | Persistent | Persistent | Non-Persistent |
| Portability | High. (Abstracted by name) | None. (Tied to host path) | Not applicable |
| Management | docker volume CLI | Manual (user manages host path) | N/A (automatic) |
| Best Use Case | Stateful data (e.g., databases) | Local development (source code) | Temporary/sensitive data |

Practical Usage: CLI Flags

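Two general-purpose flags mount storage: the compact -v (--volume) flag and the more explicit --mount flag. tmpfs mounts also have a dedicated --tmpfs shorthand. The --mount flag is more verbose but easier to read, and it is the recommended syntax.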

1. tmpfs Mount Syntax

# --tmpfs flag
docker run -d --name my-app \
  --tmpfs /app/cache \
  my-image

# --mount flag (more explicit)
docker run -d --name my-app \
  --mount type=tmpfs,destination=/app/cache \
  my-image

2. Bind Mount Syntax

# -v flag (simple syntax)
# Format: -v /path/on/host:/path/in/container
docker run -d --name my-app \
  -v /Users/me/project:/usr/src/app \
  my-image

# --mount flag (more explicit)
# Format: --mount type=bind,source=/path/on/host,target=/path/in/container
docker run -d --name my-app \
  --mount type=bind,source=/Users/me/project,target=/usr/src/app \
  my-image

3. Volume Syntax

# -v flag (simple syntax)
# If the first part is NOT a path, Docker assumes it's a named volume.
# This will create a volume named 'my-db-data' if it doesn't exist.
# POSTGRES_PASSWORD is a placeholder value; the official postgres image
# refuses to initialize a new database without a password.
docker run -d --name my-db \
  -e POSTGRES_PASSWORD=change-me \
  -v my-db-data:/var/lib/postgresql/data \
  postgres:14

# --mount flag (more explicit)
# Format: --mount type=volume,source=my-db-data,target=/path/in/container
docker run -d --name my-db \
  -e POSTGRES_PASSWORD=change-me \
  --mount type=volume,source=my-db-data,target=/var/lib/postgresql/data \
  postgres:14

Managing Volumes with the CLI

Docker provides a full set of commands to manage the lifecycle of your volumes.

  • docker volume create [my-volume-name]

    • Creates a new named volume. (This is often not needed, as docker run or docker-compose up will create it automatically if it doesn't exist).
  • docker volume ls

    • Lists all volumes on the Docker host.
  • docker volume inspect [my-volume-name]

    • Shows detailed metadata about a volume, including its actual location on the host (the Mountpoint).
  • docker volume rm [my-volume-name]

    • Removes one or more volumes. You cannot remove a volume if it is currently being used by a container.
  • docker volume prune

    • A cleanup command that removes unused volumes (volumes not referenced by any container). On Docker Engine 23.0 and later it removes only unused anonymous volumes by default; add --all (-a) to also remove unused named volumes.
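The commands above can be chained into a short walk-through of a volume's lifecycle. This is a minimal sketch; the function name is hypothetical, and the volume name is whatever you pass in.

```shell
# Create, list, inspect, and remove a named volume.
# The function name is illustrative; the volume name is supplied by the caller.
volume_lifecycle_demo() {
  name="$1"

  # Create the volume explicitly (docker run would also create it on demand).
  docker volume create "$name"

  # List only volumes whose name matches.
  docker volume ls --filter "name=${name}"

  # Print just the host path where Docker stores the volume's data.
  docker volume inspect --format '{{ .Mountpoint }}' "$name"

  # Remove the volume; this fails if a container is still using it.
  docker volume rm "$name"
}
```

For example, volume_lifecycle_demo demo-vol creates and then removes a throwaway volume, printing its Mountpoint along the way.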

Real-World Use Cases (in docker-compose.yml)

As seen in File 08, Docker Compose makes managing storage simple and declarative.

Use Case 1: Stateful Database (Volume)

This is the standard pattern for a database. The db-data volume persists all data, even if the db container is deleted and re-created.

services:
  db:
    image: postgres:14
    environment:
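      - "POSTGRES_USER=user"
      # Placeholder value; the official postgres image will not start without a password
      - "POSTGRES_PASSWORD=change-me"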
    volumes:
      # Maps the named volume 'db-data'
      - "db-data:/var/lib/postgresql/data"

# Top-level 'volumes' key declares the volume
volumes:
  db-data:
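To check that the data really survives re-creation of the db container, the stack can be torn down and brought back up. The sketch below assumes the Compose v2 plugin (the docker compose subcommand) and that it runs in the directory holding the file above; the function name is hypothetical.

```shell
# Re-create the Compose stack and confirm the named volume is still present.
# The function name is illustrative; run it next to the docker-compose.yml above.
verify_compose_persistence() {
  # Start the stack; Compose creates the db-data volume on first run.
  docker compose up -d

  # Remove the containers and network. Named volumes are kept
  # unless 'down' is given the --volumes flag.
  docker compose down

  # A fresh db container is created and reattached to the existing volume.
  docker compose up -d

  # Compose prefixes the volume with the project name, so filter by substring.
  docker volume ls --filter "name=db-data"
}
```

Running docker compose down --volumes instead would delete db-data along with the containers, so that flag is best reserved for deliberate resets.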

Use Case 2: Local Development (Bind Mount)

This pattern is for a web application, mounting the local source code for live-reloading.

services:
  web:
    build: .
    ports:
      - "8000:8000"
    volumes:
      # Bind mounts the current directory into /app
      - "./:/app"

# No top-level 'volumes' key is needed for a bind mount