Advanced Container Security

This guide covers runtime hardening: how to prevent a compromised container from damaging the host or other containers, even if the attacker has "root" access inside the container.

We will focus on the Linux kernel primitives that enforce isolation: Seccomp, AppArmor/SELinux, and Capabilities.

1. Understanding "Container Breakouts"

A container breakout occurs when a process inside a container manages to access resources outside its isolated namespace.

Common Vectors:

  • Kernel Exploits: Exploiting a bug in the host kernel (shared by all containers).

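  • Misconfiguration: Mounting the Docker socket (/var/run/docker.sock) or the host root filesystem (/) into the container, which hands it control of the daemon or direct access to host files.

  • Excessive Capabilities: Granting privileges such as CAP_SYS_ADMIN lets a container perform administrative operations against the host kernel (for example, mounting filesystems).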
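The misconfiguration and capability vectors above can be checked mechanically. The following is a minimal sketch, assuming the input is one entry of the JSON array printed by `docker inspect`; the helper name `find_breakout_risks` and the two "risky" sets are illustrative choices, not an exhaustive or official list.

```python
# Capabilities and host paths treated as risky here are assumptions made for
# illustration; extend them to match your own threat model.
RISKY_CAPS = {"SYS_ADMIN", "SYS_MODULE", "SYS_PTRACE"}
RISKY_MOUNT_SOURCES = {"/", "/var/run/docker.sock", "/run/docker.sock"}


def find_breakout_risks(container: dict) -> list[str]:
    """Return human-readable findings for one `docker inspect` entry."""
    findings = []
    host_config = container.get("HostConfig") or {}

    # CapAdd is null when no capability was added, hence the `or []`.
    for cap in host_config.get("CapAdd") or []:
        name = cap.upper().removeprefix("CAP_")
        if name in RISKY_CAPS:
            findings.append(f"adds capability CAP_{name}")

    # Each mount entry records the host path in "Source".
    for mount in container.get("Mounts") or []:
        source = mount.get("Source", "")
        if source in RISKY_MOUNT_SOURCES:
            findings.append(f"mounts host path {source}")

    return findings


if __name__ == "__main__":
    # Hand-written sample, trimmed to the fields the function reads.
    sample = {
        "HostConfig": {"CapAdd": ["SYS_ADMIN"]},
        "Mounts": [{"Source": "/var/run/docker.sock",
                    "Destination": "/var/run/docker.sock"}],
    }
    for finding in find_breakout_risks(sample):
        print(finding)
```

Kernel exploits cannot be detected this way; a check like this only covers configuration mistakes that are visible in the container's settings.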

2. Limiting System Calls with Seccomp

The Linux kernel has hundreds of system calls (syscalls). A standard web server needs only a small fraction of them (e.g., read, write, socket, accept). Others (like mount, ptrace, reboot) are dangerous and unnecessary for such a workload.

Seccomp (Secure Computing Mode) acts as a firewall for syscalls. It allows you to whitelist exactly which calls a container can make.

The Default Profile

Docker applies a default Seccomp profile out of the box that blocks around 44 of the 300+ available syscalls. This stops many common exploits without breaking typical applications.
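One way to confirm that a Seccomp filter is actually in force is to read the `Seccomp:` field of `/proc/self/status` from inside the container. The sketch below assumes the kernel's documented values for that field (0 = disabled, 1 = strict, 2 = filter); the function name `seccomp_mode` is hypothetical.

```python
from pathlib import Path

# Values of the "Seccomp:" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract the seccomp mode from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return "unknown"


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only present on Linux
        print(seccomp_mode(status.read_text()))
```

Inside a container running under a Seccomp profile, this would be expected to report `filter`.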

Custom Profiles

For high-security environments, you can create a custom profile (JSON file) that whitelists only what your app needs.

Running with a custom profile:

docker run --security-opt seccomp=profile.json my-app
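A custom profile is a JSON document with a default action plus a list of exceptions. As a minimal sketch, the script below generates a deny-by-default profile in the format Docker accepts; the helper name `build_seccomp_profile` is hypothetical, and the four syscalls are only the examples named earlier. A real container needs many more (the runtime itself requires calls such as execve), so treat the list as a placeholder to be filled in from profiling your application.

```python
import json


def build_seccomp_profile(allowed_syscalls):
    """Build a deny-by-default seccomp profile that allows only the given syscalls."""
    return {
        # Any syscall not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        # Assumption: an x86-64 host; adjust for other architectures.
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder list: far too small to start a real container.
    profile = build_seccomp_profile(["read", "write", "socket", "accept"])
    print(json.dumps(profile, indent=2))
```

Redirecting the output to a file such as profile.json gives something that can be passed to `--security-opt seccomp=`.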

3. Mandatory Access Control (AppArmor / SELinux)

Standard Linux permissions (rwx) control access based on users, and root can override them. Mandatory Access Control (MAC) systems instead enforce a system-wide policy on what each program may access, and that policy applies even to root.

AppArmor (Ubuntu/Debian)

AppArmor restricts what specific programs can do.

  • Example: You can write a profile that says "Nginx can only read from /var/www/html and write to /var/log/nginx." Even if Nginx runs as root, it cannot touch /etc/shadow.

Applying a profile:

docker run --security-opt apparmor=my-nginx-profile nginx

SELinux (RHEL/CentOS/Fedora)

SELinux uses a labeling system. Every file and process has a label.

  • Container Isolation: With SELinux support enabled in the daemon, Docker automatically labels container processes (container_t) and container files (container_file_t).

  • Protection: This prevents a container process from reading host files (like /home/user/.ssh) because the labels don't match.

4. Capabilities: Breaking Down "Root"

In Linux, "root" is not a single permission; it is a collection of ~40 "Capabilities."

The Principle of Least Privilege: Even if your container runs as UID 0 (root), you should drop all capabilities it doesn't explicitly need.

The Ultimate Hardening Command:

# 1. Drop EVERYTHING. The container has 0 special powers.
# 2. Add back ONLY permission to bind ports < 1024.
# 3. Prevent processes from gaining new privileges (e.g., via setuid binaries such as sudo).
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  my-web-server
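To check what a process actually ended up with, the kernel exposes its effective capability set as a hexadecimal bitmask in the `CapEff:` line of `/proc/<pid>/status`. The sketch below decodes that mask; the bit positions follow the kernel's capability numbering (CAP_CHOWN = 0 through CAP_CHECKPOINT_RESTORE = 40), while the function names are hypothetical.

```python
# Index in this list == capability number in the kernel.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(mask_hex: str) -> list[str]:
    """Turn a hex capability mask (as in CapEff) into capability names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def effective_capabilities(status_text: str) -> list[str]:
    """Read the CapEff line from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_capabilities(line.split(":", 1)[1].strip())
    return []


if __name__ == "__main__":
    # Bit 10 only, i.e. CAP_NET_BIND_SERVICE.
    print(decode_capabilities("0000000000000400"))
```

A root process in a container started with the flags above would be expected to show only CAP_NET_BIND_SERVICE; anything more suggests a capability was added back somewhere.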

5. Rootless Docker

The ultimate defense against daemon exploits is to run the Docker daemon itself as a non-root user.

Rootless Mode:

  • The dockerd daemon runs as your standard user (e.g., alice).

  • It uses user namespaces to map your user to "root" inside the container.

  • Benefit: If the daemon is compromised, the attacker only gains the privileges of alice, not root on the host.
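The user-namespace mapping described above is visible in `/proc/<pid>/uid_map`, where each line has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below resolves a container UID to the UID it corresponds to outside the namespace. The sample mapping (alice as UID 1000 with a subordinate range starting at 100000) is a made-up illustration, and `map_to_host_uid` is a hypothetical helper.

```python
from typing import Optional


def map_to_host_uid(uid_map_text: str, container_uid: int) -> Optional[int]:
    """Translate a UID inside a user namespace to the UID outside it."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside_start, outside_start, length = (int(f) for f in fields)
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None  # this UID is not mapped in the namespace


if __name__ == "__main__":
    # Illustrative mapping: container root is alice (1000); UIDs 1..65536
    # come from a subordinate range starting at 100000.
    sample_map = "0 1000 1\n1 100000 65536\n"
    print(map_to_host_uid(sample_map, 0))
    print(map_to_host_uid(sample_map, 33))
```

With a mapping like this, "root" inside the container resolves to an unprivileged account on the host, which is why a compromise yields only alice's privileges.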

Installation (Linux):

curl -fsSL https://get.docker.com/rootless | sh

Note: Rootless mode has some limitations regarding networking drivers and cgroup resource limits.

6. Runtime Threat Detection (Falco)

Prevention is ideal, but detection is mandatory. How do you know if a container has already been compromised?

Falco is a CNCF project that monitors kernel syscalls in real time to detect anomalous behavior.

Falco can detect:

  • A shell being spawned in a running container (often the first step of an attack).

  • A process writing to a binary directory (like /bin).

  • Outbound connections to unexpected IPs (crypto-mining or C2 servers).

  • Sensitive files (like /etc/shadow) being read.

Example Falco Rule:

- rule: Terminal Shell in Container
  desc: A shell was spawned in a container with an attached terminal.
  condition: >
    container.id != host and
    evt.type = execve and
    proc.name in (bash, sh, zsh) and
    proc.tty != 0
  output: "Shell spawned in container (user=%user.name container=%container.name shell=%proc.name)"
  priority: NOTICE
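Alerts are only useful if something consumes them. As a rough sketch, the script below filters Falco alerts by severity, assuming Falco is configured with `json_output: true` so that each alert is one JSON object per line carrying `rule`, `priority`, and `output` fields. The sample lines are hand-written for illustration, and `alerts_at_or_above` is a hypothetical helper.

```python
import json

# Falco priorities from most to least severe.
PRIORITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def alerts_at_or_above(lines, minimum="notice"):
    """Return parsed alerts whose priority is at least `minimum`."""
    threshold = PRIORITIES.index(minimum.lower())
    selected = []
    for line in lines:
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON log lines
        if not isinstance(alert, dict):
            continue
        priority = str(alert.get("priority", "")).lower()
        if priority in PRIORITIES and PRIORITIES.index(priority) <= threshold:
            selected.append(alert)
    return selected


if __name__ == "__main__":
    # Hand-written sample lines, not real Falco output.
    sample_lines = [
        '{"rule": "Terminal Shell in Container", "priority": "Notice", '
        '"output": "Shell spawned in container (user=root container=web)"}',
        '{"rule": "Some Debug Rule", "priority": "Debug", "output": "noise"}',
        "plain text startup message",
    ]
    for alert in alerts_at_or_above(sample_lines):
        print(alert["rule"], "-", alert["output"])
```

In practice the lines would come from Falco's log file or standard output, and the selected alerts would be forwarded to whatever paging or ticketing system you use.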