Managing Data and Storage
The Core Problem: The Ephemeral Filesystem
Containers are designed to be ephemeral and stateless. When a container is created, Docker stacks a thin, writable layer on top of the read-only image layers. Any data written by the application (logs, database files, user uploads) is stored in this writable layer.
When the container is deleted (docker rm), this writable layer is destroyed with it. All data inside is permanently lost.
To run stateful applications (like databases) or to manage application data (like source code or configuration files), we must store that data outside the container's writable layer, on the host, where its lifecycle is decoupled from the container's.
Docker provides three mechanisms for mounting data into a container. The first two persist data on the host, and the third lives only in memory:
Volumes (Preferred Method)
Bind Mounts
tmpfs Mounts
1. Volumes
Volumes are the officially recommended and preferred mechanism for persisting data in Docker.
A Volume is a directory stored on the Docker host's filesystem that is managed by Docker. When you create a volume, Docker creates a directory in a specific location on the host (by default, inside /var/lib/docker/volumes/) and manages its lifecycle.
Key Characteristics:
Managed by Docker: You don't need to know where on the host the data is stored. You refer to the volume by its name (e.g., my-db-data), and Docker handles the underlying path.
Decoupled Lifecycle: The volume's lifecycle is completely separate from the container's. You can create, delete, and inspect volumes independently. A volume is not deleted when a container using it is deleted.
Portability and Safety: Volumes are the safest and most portable way to manage data. You can back up, restore, or migrate volumes. Volume drivers (including third-party plugins) also let you store volume data on remote hosts or with cloud providers (like Amazon S3), which is not possible with the other methods.
Performance: Volume data bypasses the container's copy-on-write storage layer. On Docker Desktop (macOS and Windows), volumes also perform much better than bind mounts. This makes them the best choice for read/write-heavy applications like databases.
Use Case: Persisting database files (/var/lib/postgresql/data), storing user-uploaded content, sharing data safely between containers.
2. Bind Mounts
A Bind Mount is the older, more basic mechanism, available since Docker's early days. It is a simple mapping (or "bind") of a specific file or directory from the host into the container. Unlike volumes, Docker does not manage this storage; it just creates the mapping.
Key Characteristics:
Tied to Host Path: You must specify the exact, absolute path on the host machine (e.g., /home/user/project/src:/app).
Non-Portable: This creates a hard dependency on the host's filesystem structure. A container that relies on /home/user/project/src will not run on another machine that doesn't have that exact directory.
Security Risk: A container with a bind mount can modify the host filesystem, including creating, modifying, or deleting sensitive host files. A misconfigured container could, for example, be given access to / or /etc.
High Performance: On Linux hosts, bind mounts provide direct, native-speed access to the host's files.
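A common way to reduce the security risk is to mount host paths read-only when the container only needs to read them. The sketch below assumes a local Linux daemon, because bind-mount sources are resolved on the daemon's host. It also uses the alpine image and a hypothetical ./config directory. The :ro suffix (or readonly with --mount) makes the write inside the container fail, and the host file stays unchanged.

```shell
# Assumes a local Docker daemon and the alpine image; ./config is hypothetical.
mkdir -p ./config
echo "debug=false" > ./config/app.conf

# ":ro" makes the bind mount read-only inside the container.
# --mount equivalent: --mount type=bind,source="$(pwd)/config",target=/config,readonly
if docker run --rm -v "$(pwd)/config":/config:ro alpine sh -c 'echo hacked > /config/app.conf'; then
  RESULT="write allowed"
else
  RESULT="write blocked"
fi
echo "$RESULT"          # write blocked
cat ./config/app.conf   # debug=false
```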
Use Case: The primary use case is for local development. By bind-mounting your source code (e.g., ./webapp:/app), you can edit code on your host machine with your favorite IDE, and the changes are instantly reflected inside the running container, enabling live-reloading.
3. tmpfs Mounts
A tmpfs Mount is a non-persistent, in-memory storage option, available only when Docker runs on Linux. Data is written to the host's RAM, not to its disk.
Key Characteristics:
In-Memory: Data is kept in memory and is not written to the host filesystem (although, like any tmpfs, its pages can be moved to swap if the host runs low on memory).
Extremely Fast: Writing and reading from RAM is significantly faster than from a disk.
Non-Persistent: As soon as the container is stopped, the tmpfs mount is destroyed, and all data within it is lost.
Security: Ideal for temporary, sensitive data that you do not want to be written to disk.
Use Case: Storing temporary files, caches, or secrets that an application needs during its runtime but should not be saved.
Comparison: Volume vs. Bind Mount vs. tmpfs
| Feature | Volumes | Bind Mounts | tmpfs Mounts |
|---|---|---|---|
| Host Location | Managed by Docker (e.g., /var/lib/docker/volumes/) | Specific path, set by user (e.g., /home/user/src) | Host's Memory (RAM) |
| Lifecycle | Independent of container | Tied to host filesystem | Tied to container lifecycle |
| Persistence | Persistent | Persistent | Non-Persistent |
| Portability | High. (Abstracted by name) | None. (Tied to host path) | Not applicable |
| Management | docker volume CLI | Manual (user manages host path) | N/A (automatic) |
| Best Use Case | Stateful data (e.g., databases) | Local development (source code) | Temporary/sensitive data |
Practical Usage: CLI Flags
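There are two syntaxes for mounting storage: the concise -v (--volume) flag and the more explicit --mount flag, which takes comma-separated key=value pairs. The --mount flag is the modern, recommended syntax. For tmpfs mounts, the short form is the dedicated --tmpfs flag rather than -v.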
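One practical difference between the two syntaxes shows up when a bind-mount source does not exist. -v silently creates it on the host as an empty directory, whereas --mount refuses to start the container. The sketch below assumes a local Linux daemon and the alpine image, and the /tmp paths are hypothetical.

```shell
# Assumes a local Linux Docker daemon and the alpine image; the /tmp paths are hypothetical.
rm -rf /tmp/mount-demo-v /tmp/mount-demo-m 2>/dev/null

# -v creates a missing bind-mount source on the host (always as a directory).
docker run --rm -v /tmp/mount-demo-v:/data alpine true
if [ -d /tmp/mount-demo-v ]; then V_RESULT="created"; else V_RESULT="not created"; fi

# --mount reports an error instead of creating the missing source.
if docker run --rm --mount type=bind,source=/tmp/mount-demo-m,target=/data alpine true; then
  M_RESULT="started"
else
  M_RESULT="failed"
fi
echo "-v: $V_RESULT, --mount: $M_RESULT"
```

The stricter --mount behavior catches typos in host paths early, which is one reason it is the recommended form.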
1. tmpfs Mount Syntax
```shell
# --tmpfs flag
docker run -d --name my-app \
  --tmpfs /app/cache \
  my-image

# --mount flag (more explicit)
docker run -d --name my-app \
  --mount type=tmpfs,destination=/app/cache \
  my-image
```
2. Bind Mount Syntax
```shell
# -v flag (simple syntax)
# Format: -v /path/on/host:/path/in/container
docker run -d --name my-app \
  -v /Users/me/project:/usr/src/app \
  my-image

# --mount flag (more explicit)
# Format: --mount type=bind,source=/path/on/host,target=/path/in/container
docker run -d --name my-app \
  --mount type=bind,source=/Users/me/project,target=/usr/src/app \
  my-image
```
3. Volume Syntax
```shell
# -v flag (simple syntax)
# If the first part is NOT a path, Docker assumes it's a named volume.
# This will create a volume named 'my-db-data' if it doesn't exist.
docker run -d --name my-db \
  -v my-db-data:/var/lib/postgresql/data \
  postgres:14

# --mount flag (more explicit)
# Format: --mount type=volume,source=my-db-data,target=/path/in/container
docker run -d --name my-db \
  --mount type=volume,source=my-db-data,target=/var/lib/postgresql/data \
  postgres:14
```
Managing Volumes with the CLI
Docker provides a full set of commands to manage the lifecycle of your volumes.
docker volume create [my-volume-name] - Creates a new named volume. (This is often not needed, as docker run or docker-compose up will create it automatically if it doesn't exist.)
docker volume ls - Lists all volumes on the Docker host.
docker volume inspect [my-volume-name] - Shows detailed metadata about a volume, including its actual location on the host (the Mountpoint).
docker volume rm [my-volume-name] - Removes one or more volumes. You cannot remove a volume that is still referenced by a container, even a stopped one.
docker volume prune - A very useful command for cleanup. It removes "dangling" volumes (volumes not referenced by any container). Since Docker Engine 23.0, it removes only unused anonymous volumes by default; add --all (-a) to remove unused named volumes as well.
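The sketch below walks through these commands end to end. It assumes a local daemon, the alpine image, and a hypothetical volume cli-demo with a hypothetical container cli-demo-user. It also shows that rm refuses a volume while even a stopped container still references it. prune is left commented out because it deletes volumes across the whole host.

```shell
# Assumes a local Docker daemon and the alpine image; names are hypothetical.
docker rm -f cli-demo-user 2>/dev/null || true
docker volume create cli-demo

# ls: -q prints names only; --filter narrows the list.
LISTED=$(docker volume ls -q --filter name=cli-demo)

# inspect: --format picks a single field (Driver is "local" by default).
DRIVER=$(docker volume inspect --format '{{ .Driver }}' cli-demo)

# rm is refused while a container (here a created-but-never-started one) references the volume.
docker create --name cli-demo-user -v cli-demo:/data alpine true
if docker volume rm cli-demo; then RM_IN_USE="removed"; else RM_IN_USE="refused"; fi

# Once the container is gone, rm succeeds.
docker rm cli-demo-user
docker volume rm cli-demo

# prune affects every unused volume on the host, so it is not run here:
#   docker volume prune          # unused anonymous volumes (Engine 23.0+)
#   docker volume prune --all    # also unused named volumes
```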
Real-World Use Cases (in docker-compose.yml)
As seen in File 08, Docker Compose makes managing storage simple and declarative.
Use Case 1: Stateful Database (Volume)
This is the standard pattern for a database. The db-data volume persists all data, even if the db container is deleted and re-created.
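```yaml
services:
  db:
    image: postgres:14
    environment:
      - "POSTGRES_USER=user"
      # The postgres image refuses to initialize without a superuser password
      # (placeholder value; use a proper secret in real deployments)
      - "POSTGRES_PASSWORD=example"
    volumes:
      # Maps the named volume 'db-data'
      - "db-data:/var/lib/postgresql/data"

# Top-level 'volumes' key declares the volume
volumes:
  db-data:
```
Use Case 2: Local Development (Bind Mount)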
This pattern is for a web application, mounting the local source code for live-reloading.
```yaml
services:
  web:
    build: .
    ports:
      - "8000:8000"
    volumes:
      # Bind mounts the current directory into /app
      - "./:/app"

# No top-level 'volumes' key is needed for a bind mount
```
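The sketch below shows how the named-volume pattern from Use Case 1 behaves when the stack is torn down. It assumes Docker Compose v2 (the docker compose plugin), a local daemon, and a hypothetical project name, storage-demo. The alpine image stands in for postgres so the example starts instantly. Compose prefixes declared volume names with the project name. A plain down keeps the volume, and only down --volumes deletes it.

```shell
# Assumes Docker Compose v2, a local daemon, and the alpine image (standing in for postgres).
# The project name "storage-demo" and the /tmp path are hypothetical.
mkdir -p /tmp/storage-demo
cat > /tmp/storage-demo/compose.yaml <<'EOF'
services:
  db:
    image: alpine
    command: sleep 300
    volumes:
      - "db-data:/data"
volumes:
  db-data:
EOF

# Small wrapper so every command targets the same project and file.
dc() { docker compose -p storage-demo -f /tmp/storage-demo/compose.yaml "$@"; }

# Compose names the volume <project>_<volume>, i.e. storage-demo_db-data.
dc up -d
dc down -t 1               # removes containers and network; the volume is kept
AFTER_DOWN=$(docker volume ls -q --filter name=storage-demo_db-data)

dc up -d
dc down -t 1 --volumes     # also removes named volumes declared in the file
AFTER_DOWN_V=$(docker volume ls -q --filter name=storage-demo_db-data)
echo "after down: '$AFTER_DOWN', after down --volumes: '$AFTER_DOWN_V'"
```

In practice, this means a routine docker compose down is safe for database data. Adding --volumes (-v) is the step that actually wipes it.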