Dockerfile

The Dockerfile is a text document that contains all the commands, in order, needed to build a container image. It is the blueprint for your application's environment. How you write this file has a significant impact on your image's size, build speed, and security.

The Build Context and .dockerignore

When you run docker build, the first thing the Docker client does is send the "build context" to the daemon. The build context is the set of files at the specified path (e.g., . for the current directory).

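  • With the legacy builder, this entire directory (minus anything excluded by .dockerignore) is sent to the daemon, not just the Dockerfile. BuildKit transfers files on demand, but everything in the context is still available to COPY and ADD.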

This is why the .dockerignore file is critical. It is a file in the root of your build context that lists files and directories to exclude from the build context, similar to a .gitignore file.

Why .dockerignore is essential:

  1. Build Speed: It prevents large, unnecessary files (like node_modules, build/, .git/, .venv/) from being sent to the daemon, speeding up the build.

  2. Cache Invalidation: It prevents changes to non-essential files (like a README.md or test logs) from breaking the build cache.

  3. Security: It prevents sensitive files (like .env, aws_credentials, .ssh/) from being accidentally copied into the image.

Example .dockerignore:

dockerfile
# Exclude node.js dependencies
node_modules

# Exclude python virtual env
.venv

# Exclude build artifacts
build/
dist/

# Exclude git and OS files
.git
.DS_Store

# Exclude local secrets and log files
.env
*.log

Core Instructions

FROM: Define the Base

The FROM instruction must be the first instruction in a Dockerfile; only comments, parser directives, and ARG may come before it. It specifies the parent image your image is "based on."

  • Best Practice: Always use a specific tag (e.g., FROM node:18-alpine instead of FROM node). Using latest can lead to unpredictable builds as the base image may change.

  • Best Practice: Use minimal base images (like alpine, slim, or distroless) for production. They are smaller and have a reduced attack surface.
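
As a minimal sketch of the tagging advice, the fragment below contrasts a pinned tag with the unpinned forms to avoid. The digest line is a placeholder, not a real value; it is shown only to illustrate the syntax for stricter pinning.

```dockerfile
# Pinned to a major version and variant: rebuilds stay on Node 18 on Alpine.
FROM node:18-alpine

# Unpinned forms to avoid; both resolve to whatever "latest" points at today:
#   FROM node
#   FROM node:latest

# For fully reproducible builds, a tag can be combined with an image digest.
# <digest> is a placeholder to fill in from your registry:
#   FROM node:18-alpine@sha256:<digest>
```

Note that a tag such as 18-alpine still moves when patch releases are published; only a digest refers to exactly one image.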

WORKDIR: Set a Working Directory

The WORKDIR instruction sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY, and ADD instructions.

  • Best Practice: Always use WORKDIR instead of RUN cd /app. WORKDIR automatically creates the directory if it doesn't exist and ensures all future commands execute in that context, making your Dockerfile cleaner and more reliable.
dockerfile
# Good
WORKDIR /app
COPY . .

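# Bad: "cd" only affects its own RUN step, so COPY still targets the
# previous working directory (/ by default), not /app
RUN mkdir /app
RUN cd /app
COPY . .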

COPY vs. ADD: Moving Files

Both instructions copy files from the build context into the image.

  • COPY: This instruction is simple and explicit. It copies files and directories from the context to the image. COPY <src> <dest>

  • ADD: This instruction has "magic" features:

    1. It can copy and auto-extract local tarballs (e.g., .tar.gz) into the destination directory.

    2. It can download files from a remote URL (downloaded archives are not extracted).

Best Practice: Always prefer COPY. Its behavior is explicit and transparent. The "magic" of ADD can be dangerous (e.g., downloading a malicious file, "zip bomb" extraction). Only use ADD if you specifically need to auto-extract a local tarball.
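
The difference can be sketched as follows. The paths config/ and vendor.tar.gz are hypothetical files assumed to exist in the build context, and alpine:3.19 is just a convenient small base.

```dockerfile
FROM alpine:3.19
WORKDIR /app

# COPY: a plain, predictable copy from the build context into the image.
# "config/" is a hypothetical directory in the context.
COPY config/ ./config/

# ADD with a local tarball: the archive is unpacked into the destination
# instead of being copied as a single file.
# "vendor.tar.gz" is a hypothetical archive in the context.
ADD vendor.tar.gz /opt/vendor/

# Using COPY on the same archive would leave it packed:
#   COPY vendor.tar.gz /opt/vendor/
```

For remote files, a common alternative to ADD with a URL is to download inside a RUN step, where the file can be checksum-verified and removed in the same layer.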

RUN: Executing Commands

The RUN instruction executes any command in a new layer on top of the current image. This is used for installing packages, compiling code, etc.

  • Best Practice (Layer Reduction): Chain related commands together using && and backslashes (\) to reduce the number of image layers. Each RUN instruction creates a new layer, and files deleted in a later layer still take up space in the earlier one, so cleanup only shrinks the image when it happens in the same RUN.
dockerfile
# Good: One layer
RUN apt-get update && apt-get install -y \
    curl \
    git \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Bad: Four layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN apt-get install -y vim

Notice the rm -rf /var/lib/apt/lists/* in the good example. It removes the package index files that apt-get update downloaded, in the same layer that created them, so they never add to the final image size.

EXPOSE: Documenting Ports

The EXPOSE instruction informs Docker that the container listens on the specified network ports at runtime.

  • What it does: This is purely metadata. It serves as documentation for the user and can be used by automation tools.

  • What it does NOT do: It does not actually publish the port. You still must use the -p or -P flag (docker run -p 8080:80 ...) to map the port from the host to the container.
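
A short sketch of how this plays out, assuming the public nginx:1.25-alpine image as an example of a server that listens on port 80:

```dockerfile
FROM nginx:1.25-alpine

# Metadata only: records that the process inside listens on TCP port 80.
EXPOSE 80

# Publishing still happens at run time:
#   docker run -p 8080:80 my-image   maps host port 8080 to container port 80
#   docker run -P my-image           maps every EXPOSEd port to a random host port
```

The image name my-image is a placeholder for whatever tag you build with.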

ARG vs. ENV: Setting Variables

  • ARG (Build-Time): ARG defines a variable that can be passed at build-time (e.g., docker build --build-arg VERSION=1.2). This variable exists only during the build; it is not available to the running container. Its value can still show up in the image history, so do not use ARG for secrets.

  • ENV (Runtime): ENV sets an environment variable that persists in the image. This variable is available both during the build (after it's defined) and when the container is running, where it can still be overridden with docker run -e.
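
The sketch below shows one way the two can work together: a build argument supplies a value, and ENV persists it for the running container. The variable name APP_VERSION and the alpine:3.19 base are assumptions for illustration.

```dockerfile
FROM alpine:3.19

# Build-time only. The default can be overridden with:
#   docker build --build-arg VERSION=2.0 .
ARG VERSION=1.2

# Persist the build-time value into the image as a runtime variable.
# APP_VERSION is a hypothetical name.
ENV APP_VERSION=${VERSION}

# During the build, both variables are visible.
RUN echo "building version ${VERSION} (APP_VERSION=${APP_VERSION})"

# At run time only APP_VERSION is set; VERSION is gone.
# A shell is invoked explicitly here so the variables get expanded.
CMD ["sh", "-c", "echo VERSION=${VERSION:-unset} APP_VERSION=${APP_VERSION}"]
```

Built with the default argument, running the container should print VERSION=unset APP_VERSION=1.2.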

CMD vs. ENTRYPOINT: Defining the Runtime

This is one of the most confusing but important distinctions.

  • CMD: Sets the default command and/or parameters to be executed when the container starts.

    • CMD can be easily overridden by the user at runtime (e.g., docker run my-image /bin/bash).

    • If a Dockerfile has multiple CMD instructions, only the last one takes effect.

  • ENTRYPOINT: Configures the container to run as a specific executable.

    • ENTRYPOINT is harder to override. Arguments passed to docker run are appended to an exec-form ENTRYPOINT rather than replacing it; replacing the executable itself requires the --entrypoint flag.

    • This is ideal for creating images that are "for" a specific application (e.g., an image that is the redis-server).
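
To make the difference concrete, here is a minimal sketch with two throwaway build targets in one Dockerfile. The stage names cmd-demo and entrypoint-demo, the matching image tags, and the alpine:3.19 base are assumptions chosen for illustration.

```dockerfile
# Build each target separately, for example:
#   docker build --target cmd-demo -t cmd-demo .
#   docker build --target entrypoint-demo -t entrypoint-demo .

# --- CMD only: the whole command is just a default ---
FROM alpine:3.19 AS cmd-demo
CMD ["echo", "hello from CMD"]
#   docker run cmd-demo              runs: echo "hello from CMD"
#   docker run cmd-demo ls /         runs: ls /   (the CMD is replaced entirely)

# --- ENTRYPOINT only: the executable is fixed ---
FROM alpine:3.19 AS entrypoint-demo
ENTRYPOINT ["echo"]
#   docker run entrypoint-demo hi                  runs: echo hi   (arguments appended)
#   docker run --entrypoint ls entrypoint-demo /   runs: ls /      (explicit override)
```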

Exec Form (Preferred) vs. Shell Form (Avoid)

Both CMD and ENTRYPOINT can be written in two forms:

  1. Exec Form (JSON Array): CMD ["executable", "param1", "param2"]

    • This is the preferred form.

    • It executes the command directly, without a shell.

    • Crucially, this allows process signals (like SIGTERM from docker stop) to be sent directly to your application, allowing for graceful shutdowns.

  2. Shell Form: CMD executable param1 param2

    • This form runs your command inside /bin/sh -c "...".

    • This can cause problems with signal handling and is less explicit. Your application becomes a child process of the shell, which may not forward signals correctly.
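
The two forms can be compared side by side in a small sketch. The file server.js is a hypothetical entry script assumed to exist in the build context.

```dockerfile
FROM node:18-alpine
WORKDIR /app

# server.js is a hypothetical entry script.
COPY server.js .

# Shell form (avoid): Docker runs /bin/sh -c "node server.js", so the shell
# sits between Docker and the application.
#   CMD node server.js

# Exec form (preferred): node is started directly, with no shell in between,
# so the SIGTERM sent by docker stop is delivered to node itself.
CMD ["node", "server.js"]
```

Two caveats apply. The application still needs to handle SIGTERM to shut down gracefully; the exec form only ensures the signal arrives. And because no shell is involved, the exec form performs no shell expansion: CMD ["echo", "$HOME"] prints the literal text $HOME.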

The Best-Practice Combination: Use ENTRYPOINT in exec form to set the main executable and CMD in exec form to set the default parameters.

dockerfile
ENTRYPOINT ["/usr/bin/my-app"]
CMD ["--mode", "production"]

  • docker run my-image -> Runs /usr/bin/my-app --mode production

  • docker run my-image --mode staging -> Runs /usr/bin/my-app --mode staging (The CMD is overridden)

  • docker run my-image -h -> Runs /usr/bin/my-app -h (The CMD is overridden)

Build Cache Optimization

Docker builds images in layers. If nothing has changed in a layer, Docker reuses it from the cache. This is the key to fast builds.

The Golden Rule: Order your Dockerfile instructions from least-frequently changed to most-frequently changed.

A change in one layer invalidates the cache for all subsequent layers.

Bad Example (Node.js):

dockerfile
FROM node:18-alpine
WORKDIR /app
# Any file change invalidates the cache from this step onward
COPY . .
# ...so npm install runs on EVERY build
RUN npm install
CMD ["node", "src/index.js"]

Good Example (Node.js):

dockerfile
FROM node:18-alpine
WORKDIR /app
# 1. Copy only package.json and package-lock.json
COPY package*.json ./
# 2. Install dependencies (rebuilt only when those manifests change)
RUN npm install
# 3. Copy source code (rebuilt on code changes; the npm install layer stays cached)
COPY . .
CMD ["node", "src/index.js"]

Annotated Example: A Multistage Node.js Build

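This example combines these practices with a multistage build: cache-friendly instruction ordering, WORKDIR, a non-root USER, and the exec form of CMD. A multistage build uses one stage to compile the app and a second, clean stage that receives only the build output.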

dockerfile
# --- Stage 1: The "Builder" ---
# This stage holds dev dependencies and build tooling; none of it ships in the final image
FROM node:18-alpine AS builder

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json first to leverage build cache
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application source code
COPY . .

# Run the build script (e.g., for a React, Vue, or TypeScript app)
RUN npm run build

# --- Stage 2: The "Production" Image ---
# Use a minimal base image for the final, lean image
FROM node:18-alpine

# Set the working directory
WORKDIR /app

# Create a non-root user and group for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Copy only the necessary build artifacts from the "builder" stage
# This copies the 'dist' folder from the 'builder' stage
COPY --from=builder /app/dist ./dist

# Runtime dependencies are not copied in this example. If the app needs
# node_modules at runtime, copy them from the builder stage or install
# production-only dependencies in this stage.

# Expose the port the app runs on (metadata)
EXPOSE 3000

# Set the default command to run the app
# Use 'exec' form for correct signal handling
CMD ["node", "dist/main.js"]