Aytict Academy

1 The container threat model

The single most important fact about container security is that containers share the host kernel. Unlike a virtual machine, which runs its own kernel on top of a hypervisor, every container on a host makes system calls into the same Linux kernel. A container is not a security boundary in the way a VM is — it is a bundle of kernel isolation features (namespaces and cgroups) wrapped around a process.

This shapes the blast radius: if an attacker escapes a container by exploiting a kernel bug, a misconfiguration, or an over-privileged setting, they can reach the host and potentially every other container on it. Defence therefore aims to make escape hard, to contain damage if it happens, and to reduce what each container is allowed to do in the first place.

2 Build, ship, run: the security model

Container security is usually organised along the lifecycle: build, ship, and run. Each phase has its own controls and they reinforce one another.

Build — what goes into the image: minimal base images, multi-stage builds, no secrets baked in, dependency hygiene, and scanning before the image is ever pushed.
Ship — how the image travels: a trusted private registry, image signing so consumers can verify provenance, and admission rules that reject unsigned or unscanned images.
Run — how the container behaves in production: least-privilege settings (non-root, dropped capabilities, read-only filesystem), network segmentation, and runtime threat detection.

The principle throughout is defence in depth: no single control is trusted to be perfect, so you layer build-time, ship-time and run-time protections.

3 Minimal and distroless base images

Every package, shell and tool inside an image is potential attack surface. A full ubuntu or debian base ships a package manager, a shell, core utilities and dozens of libraries you may never use, each a possible vulnerability and a tool an attacker can abuse after a breach. Minimal base images cut this drastically.

Distroless images contain only your application and its runtime dependencies — no shell, no package manager, no busybox. This shrinks the vulnerability count and means an attacker who gains code execution cannot simply spawn /bin/sh or download tools. alpine is another small option, though it still includes a shell and uses musl libc. Smaller images also pull faster and have fewer CVEs to triage.

# Distroless: ships the app and runtime only, no shell or package manager
FROM gcr.io/distroless/static-debian12
COPY --chown=nonroot:nonroot app /app
USER nonroot
ENTRYPOINT ["/app"]

4 Multi-stage builds

A naive Dockerfile that compiles code in the final image drags along compilers, build tools, source code and caches — none of which the running application needs, and all of which expand attack surface and image size. A multi-stage build uses one stage to build and a second, clean stage to run.

The build stage has the full toolchain; the final stage starts from a minimal or distroless base and copies in only the compiled artefact. Build secrets, intermediate files and the compiler never appear in the shipped image. This is one of the highest-leverage hardening techniques because it simultaneously reduces size, removes tools an attacker could use, and keeps build-time secrets out of layers.

# Stage 1: build with the full toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Stage 2: minimal runtime, only the binary is copied in
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot
ENTRYPOINT ["/app"]

5 Running as non-root

By default a container process runs as root (UID 0) inside the container unless the image sets a USER. Without user namespaces, that UID 0 is the same UID 0 the host kernel sees, so container-root is uncomfortably close to host-root: if an attacker escapes, or if a host directory is mounted into the container, root can read and modify host files. Running as a non-root user reduces risk by limiting what the process can do even before any other control kicks in.

Set an explicit non-root USER in the Dockerfile, and enforce it in Kubernetes with runAsNonRoot: true and a numeric runAsUser. Pair this with a read-only root filesystem and allowPrivilegeEscalation: false so the process cannot gain new privileges via setuid binaries.

securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true

6 Linux capabilities and dropping them

Traditional Unix privilege is all-or-nothing: a process is either root or it is not. Linux capabilities break root’s power into discrete units, for example CAP_NET_BIND_SERVICE (bind ports below 1024), CAP_NET_RAW (raw sockets), or the dangerous CAP_SYS_ADMIN (a near-superuser grab-bag). A container does not need most of them.

The hardening rule is drop everything, then add back only what is required. Dropping ALL capabilities and granting just the one or two a service genuinely needs sharply limits what a compromised process can do to the kernel. Avoid CAP_SYS_ADMIN and --privileged (which grants all capabilities and access to host devices, and lifts seccomp and AppArmor confinement) except for tightly reviewed infrastructure workloads.

securityContext:
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]   # only if the app must bind a low port
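To check which capabilities a process actually holds, you can decode the CapEff bitmask that Linux exposes in /proc/<pid>/status. A minimal Python sketch, assuming the standard capability numbering from linux/capability.h; only the first 41 capabilities are named, and any higher bit is reported as cap_<n>.

```python
"""Decode the effective-capability bitmask from /proc/<pid>/status."""

# Capability names indexed by bit number, as defined in linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask: int) -> list[str]:
    """Names of all capabilities whose bit is set in mask, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def parse_cap_eff(status_text: str) -> int:
    """Extract the hexadecimal CapEff value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def current_caps() -> list[str]:
    """Effective capabilities of the calling process (Linux only)."""
    with open("/proc/self/status") as f:
        return decode_caps(parse_cap_eff(f.read()))
```

For example, `decode_caps(0xa80425fb)` returns 14 names including CAP_NET_RAW and CAP_MKNOD; that mask is what a root process in a default Docker container typically reports. With `drop: ["ALL"]` in effect, `current_caps()` should return an empty list.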

7 The Linux primitives: namespaces, cgroups, seccomp, LSMs

Container isolation is not magic; it is assembled from kernel features. Namespaces give a container its own view of resources — PID, network, mount, UTS, IPC and user namespaces partition what the process can see. cgroups (control groups) limit and account for resource usage: CPU, memory, PIDs — preventing one container from starving the host.

On top of these, seccomp filters which system calls a process may make, shrinking the kernel attack surface (the RuntimeDefault seccomp profile blocks dangerous syscalls). Mandatory Access Control via Linux Security Modules — AppArmor or SELinux — confines what files and operations a container can touch, regardless of Unix permissions. Together these are the bricks that build the container sandbox.

8 Rootless containers

Even with a non-root user inside the container, the container runtime daemon may run as host-root — classic Docker does. Rootless containers run the entire runtime as an unprivileged host user, leveraging the user namespace to map container UID 0 to an ordinary, unprivileged UID on the host.

The benefit is that a breakout from a rootless container lands the attacker as a normal host user, not root, dramatically shrinking the blast radius. Tools like Podman and rootless Docker support this mode. There are trade-offs — some features (binding privileged ports, certain storage drivers) need extra configuration — but for many workloads rootless is a strong, low-effort isolation upgrade.
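The UID translation that makes rootless mode safe is visible in /proc/<pid>/uid_map, where each line reads `inside-start outside-start count`. A minimal sketch of how a container UID resolves to a host UID; the sample mapping (container root to host UID 1000, then a 65536-wide subordinate range starting at 100000) is an assumed, typical rootless layout rather than anything a specific tool guarantees.

```python
"""Translate container UIDs to host UIDs using a user-namespace uid_map."""
from typing import Optional

UidRange = tuple[int, int, int]  # (inside_start, outside_start, count)


def parse_uid_map(text: str) -> list[UidRange]:
    """Parse lines of the form 'inside_start outside_start count'."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(x) for x in line.split())
        ranges.append((inside, outside, count))
    return ranges


def to_host_uid(container_uid: int, ranges: list[UidRange]) -> Optional[int]:
    """Host UID that container_uid maps to, or None if it is unmapped."""
    for inside, outside, count in ranges:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None  # the kernel shows unmapped IDs as the overflow UID (usually 65534)


# Hypothetical rootless layout: container root -> the host user running the
# runtime (UID 1000); container UIDs 1..65536 -> subordinate host UIDs 100000+.
SAMPLE_MAP = """\
         0       1000          1
         1     100000      65536
"""
```

With this mapping, container root becomes host UID 1000 and a container user 10001 becomes host UID 110000; both are ordinary, unprivileged accounts on the host.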

9 Dockerfile hardening pitfalls

Many vulnerabilities are introduced in the Dockerfile itself. Common pitfalls:

Baking secrets into layers — an ENV API_KEY=... or a copied .env stays in the image history forever, even if a later layer deletes it. Use build secrets or runtime injection instead.
Using latest tags — non-deterministic and unverifiable; pin to a digest or specific version.
Running as root — no explicit USER means UID 0.
Installing unnecessary packages — debug tools, curl, compilers that survive into production.
Broad ADD with remote URLs — prefer COPY, because ADD can fetch remote content and silently auto-extracts local tar archives.

A useful habit is to add a .dockerignore so local secrets, .git and credentials never enter the build context.
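Several of these pitfalls can be caught mechanically before review. A rough, line-based checker sketch: it ignores line continuations, ARG substitution and parser directives, and the secret-name regular expression is an assumed heuristic you would tune for your own naming conventions.

```python
"""Naive Dockerfile checker for a few common hardening pitfalls."""
import re

# Heuristic for variable names that usually hold credentials (an assumption).
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSW", re.IGNORECASE)


def lint_dockerfile(text: str) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings: list[str] = []
    stages: set[str] = set()  # names declared with "FROM ... AS <name>"
    user = None               # last USER seen in the current stage
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instr, _, args = line.partition(" ")
        instr, tokens = instr.upper(), args.split()
        if instr == "FROM":
            user = None  # every stage starts again as root
            refs = [t for t in tokens if not t.startswith("--")]  # drop --platform
            image = refs[0]
            if len(refs) >= 3 and refs[1].upper() == "AS":
                stages.add(refs[2].lower())
            if image.lower() in stages or image == "scratch":
                continue  # builds on an earlier stage or on the empty image
            last = image.rsplit("/", 1)[-1]  # a colon here is a tag, not a port
            if "@" not in image and (":" not in last or last.endswith(":latest")):
                findings.append(f"line {n}: {image} is not pinned to a version or digest")
        elif instr in ("ENV", "ARG"):
            names = [t.split("=", 1)[0] for t in tokens if "=" in t] or tokens[:1]
            for name in names:
                if SECRET_NAME.search(name):
                    findings.append(f"line {n}: {instr} {name} may bake a secret into the image")
        elif instr == "ADD" and any(t.startswith(("http://", "https://")) for t in tokens):
            findings.append(f"line {n}: ADD fetches a remote URL; prefer COPY")
        elif instr == "COPY" and any(t.rsplit("/", 1)[-1] == ".env" for t in tokens):
            findings.append(f"line {n}: COPY includes a .env file")
        elif instr == "USER" and tokens:
            user = tokens[0]
    if user is None or user.split(":")[0] in ("root", "0"):
        findings.append("final stage runs as root (no non-root USER)")
    return findings
```

Run against the multi-stage Dockerfile from section 4, this sketch flags only the distroless base: with no tag it resolves to latest, which is exactly the non-determinism the pitfalls list warns about.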

10 Image vulnerability scanning

Images are built from base layers and dependencies that accumulate known vulnerabilities (CVEs) over time. An image that was clean last month may be vulnerable today as new CVEs are disclosed. Vulnerability scanners such as Trivy, Grype and Clair inspect an image’s OS packages and language dependencies against CVE databases and report which known vulnerabilities affect them; whether each one is actually exploitable in your context is a separate question.

Where scanning runs matters. Scan in CI to fail builds before a vulnerable image is ever pushed; scan in the registry to catch newly disclosed CVEs in already-stored images; and use admission control to block deployment of images that exceed a severity threshold. Scanning is continuous, not one-off, because the vulnerability landscape keeps changing under a static image.

# Fail the CI build if HIGH or CRITICAL vulnerabilities are found
trivy image --severity HIGH,CRITICAL --exit-code 1 myregistry.example.com/app:1.4.2
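When a flat `--exit-code` is too blunt, for instance when your team has formally accepted a specific CVE, one option is to post-process Trivy's JSON report (`--format json`). This sketch assumes the report layout `Results[].Vulnerabilities[]` with `VulnerabilityID`, `PkgName` and `Severity` fields; verify those names against the Trivy version you run.

```python
"""Fail a CI step when a Trivy JSON report contains blocking vulnerabilities."""
import json
import sys

SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def _rank(severity: str) -> int:
    return SEVERITY_ORDER.index(severity) if severity in SEVERITY_ORDER else 0


def blocking_findings(report: dict, threshold: str = "HIGH",
                      accepted: frozenset[str] = frozenset()) -> list[str]:
    """'ID package severity' for findings at or above threshold and not accepted."""
    out = []
    for result in report.get("Results", []):
        for v in result.get("Vulnerabilities") or []:  # may be absent or null
            sev = v.get("Severity", "UNKNOWN")
            if _rank(sev) >= _rank(threshold) and v["VulnerabilityID"] not in accepted:
                out.append(f'{v["VulnerabilityID"]} {v.get("PkgName", "?")} {sev}')
    return out


def main(path: str) -> int:
    with open(path) as f:
        findings = blocking_findings(json.load(f))
    for line in findings:
        print(line)
    return 1 if findings else 0  # a non-zero exit code fails the CI job


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Keeping the accepted-CVE list in version control gives each exception a reviewable history instead of a silently raised threshold.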

11 Image signing and verification with cosign

Scanning tells you what is in an image, but not whether the image is the one your pipeline actually produced. An attacker who compromises a registry could swap in a malicious image under the same tag. Image signing solves provenance: the build pipeline cryptographically signs the image digest, and consumers verify the signature before running it.

Cosign (part of the Sigstore project) is the common tool. It signs the immutable image digest (not the mutable tag), and supports keyless signing tied to OIDC identities. In Kubernetes you then add an admission controller (such as a policy engine) that only admits signed images from trusted signers, rejecting anything unsigned or signed by an unknown key.

# Sign an image digest, then verify it before deploy
cosign sign myregistry.example.com/app@sha256:abc123...
cosign verify --certificate-identity=ci@example.com \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
  myregistry.example.com/app@sha256:abc123...
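Why signing the digest rather than the tag defeats a tag swap can be shown with a toy model. In this sketch an HMAC from Python's standard library stands in for cosign's real asymmetric or keyless signature, the registry is a plain dictionary, and the digest is taken over raw bytes, whereas real image digests are computed over the OCI manifest.

```python
"""Toy model: a signature over the content digest detects a repointed tag."""
import hashlib
import hmac

SIGNING_KEY = b"ci-only-secret"  # hypothetical; cosign uses asymmetric or keyless keys


def digest(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


def sign(dgst: str) -> str:
    return hmac.new(SIGNING_KEY, dgst.encode(), hashlib.sha256).hexdigest()


def verify(content: bytes, signature: str) -> bool:
    # Recompute the digest from the bytes actually pulled, then check the signature.
    return hmac.compare_digest(sign(digest(content)), signature)


registry = {"app:1.4.2": b"legit image bytes"}
sig = sign(digest(registry["app:1.4.2"]))        # CI signs what it built

registry["app:1.4.2"] = b"malicious image bytes"  # attacker repoints the tag
```

After the swap the tag still resolves, but `verify(registry["app:1.4.2"], sig)` is False because the recomputed digest no longer matches what was signed.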

12 Registry security and private registries

The registry is where images live between build and run, making it a high-value target. Pulling from arbitrary public registries means trusting unknown publishers; a typosquatted or backdoored public image can poison your whole estate. A private registry — with authentication, authorisation and TLS — lets you control exactly which images may enter your environment.

Good practices include requiring authenticated pushes, scoping pull/push permissions per team or repository, enabling registry-side vulnerability scanning, enforcing immutable tags so a tag cannot be silently overwritten, and proxying/mirroring approved public images through your own registry. Combined with signing, this gives a verifiable chain from build to deployment.

13 Supply chain: SBOM and provenance

Modern attacks increasingly target the software supply chain — compromising a dependency, a build server, or a base image rather than the running app directly. Two artefacts harden the chain. A Software Bill of Materials (SBOM) is a machine-readable inventory of every component and version in an image (in formats like SPDX or CycloneDX). When a new CVE lands, you can instantly query which images contain the affected component.

Provenance attestations record how and where an image was built — the source commit, the builder, the parameters — following frameworks like SLSA. Signed SBOM and provenance attestations let consumers verify not just that an image is signed, but that it was built from trusted source by a trusted builder. This is the foundation of a verifiable, tamper-evident pipeline.
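The "which images contain the affected component" query can be sketched over CycloneDX JSON, whose `components` array carries each component's `name` and `version` and may nest further `components`. This is a minimal sketch: the image names and versions are invented, and a production query should match on package URL (`purl`) rather than bare names.

```python
"""Find which images' CycloneDX SBOMs contain an affected component version."""
from collections.abc import Iterator


def walk(components: list[dict]) -> Iterator[dict]:
    """Yield every component, including nested sub-components."""
    for c in components:
        yield c
        yield from walk(c.get("components", []))


def affected_images(sboms: dict[str, dict], name: str, bad_versions: set[str]) -> list[str]:
    """Sorted image references whose SBOM lists name at one of bad_versions."""
    hits = []
    for image, bom in sboms.items():
        if any(c.get("name") == name and c.get("version") in bad_versions
               for c in walk(bom.get("components", []))):
            hits.append(image)
    return sorted(hits)
```

Given a mapping of image reference to parsed SBOM, `affected_images(sboms, "openssl", {"3.0.2"})` answers the question in one pass instead of rescanning every image.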

14 Kubernetes RBAC and least privilege

Kubernetes Role-Based Access Control (RBAC) governs who — users and service accounts — can perform which actions on which resources. The model has four objects: Roles and ClusterRoles define permissions (verbs like get, list, create on resources); RoleBindings and ClusterRoleBindings grant those to subjects.

The guiding rule is least privilege: grant the minimum verbs on the minimum resources in the minimum namespaces. Avoid binding users or service accounts to cluster-admin. Pods get a service account whose token they can use to call the API — if that account is over-permissioned, a compromised pod inherits its power. Disable token automounting where the workload does not call the API, and scope each service account tightly.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { namespace: web, name: pod-reader }
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
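RBAC is purely additive: a request is allowed if any bound rule matches its API group, resource and verb, and there are no deny rules. A minimal sketch of that matching for the pod-reader Role above, expressed as Python dicts; it ignores resourceNames, subresources and non-resource URLs, which the real authorizer also handles.

```python
"""Evaluate whether a set of RBAC PolicyRules permits a request (simplified)."""


def _matches(allowed: list[str], value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """Additive semantics: any matching rule grants access; nothing ever denies."""
    return any(
        _matches(r.get("apiGroups", []), api_group)
        and _matches(r.get("resources", []), resource)
        and _matches(r.get("verbs", []), verb)
        for r in rules
    )


# The pod-reader Role's rules; "" is the core API group.
POD_READER = [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}]
```

The wildcard branch is why cluster-admin is so dangerous: a single rule of `["*"]` for groups, resources and verbs matches every request.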

15 Pod Security Standards and admission control

You need a way to enforce that pods are not privileged, not running as root, and not mounting the host filesystem — before they are admitted. The legacy PodSecurityPolicy (PSP) did this but was confusing and was removed in Kubernetes 1.25. Its built-in replacement is Pod Security Admission (PSA), which enforces the three Pod Security Standards profiles: Privileged (unrestricted), Baseline (blocks known privilege escalations), and Restricted (heavily hardened best practice).

PSA is applied per namespace via labels and can run in enforce, audit or warn modes. For richer, custom policy many teams add a general admission controller such as a policy engine (e.g. Kyverno or OPA Gatekeeper) to express rules PSA cannot — like requiring signed images or specific labels.

apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
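To make the Restricted profile concrete, here is a sketch that checks a subset of its container-level rules on a pod spec given as a dict. It is illustrative only: it covers privileged mode, privilege escalation, non-root, capabilities and the seccomp profile, and omits the volume-type and host-namespace checks that Pod Security Admission also performs.

```python
"""Check a pod spec (as a dict) against a subset of the 'restricted' Pod Security Standard."""

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def _effective(key: str, container_sc: dict, pod_sc: dict):
    """Container-level securityContext fields override pod-level ones."""
    return container_sc.get(key, pod_sc.get(key))


def restricted_violations(pod_spec: dict) -> list[str]:
    pod_sc = pod_spec.get("securityContext", {})
    problems = []
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        sc, name = c.get("securityContext", {}), c.get("name", "?")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged containers are forbidden")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if _effective("runAsNonRoot", sc, pod_sc) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if _effective("runAsUser", sc, pod_sc) == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        caps = sc.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: capabilities must drop ALL")
        if set(caps.get("add", [])) - {"NET_BIND_SERVICE"}:
            problems.append(f"{name}: only NET_BIND_SERVICE may be added")
        seccomp = _effective("seccompProfile", sc, pod_sc) or {}
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```

A bare container with no securityContext fails four of these checks, which is why adopting `enforce: restricted` usually means adding explicit settings to every workload.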

16 Network policies and default-deny

By default, Kubernetes networking is flat and fully open: every pod can reach every other pod across all namespaces. That means one compromised pod can scan and attack the entire cluster laterally. NetworkPolicy objects restrict pod-to-pod (and pod-to-external) traffic at L3/L4, enforced by the CNI plugin; a CNI that does not implement NetworkPolicy silently ignores these objects, so confirm that yours does.

The cornerstone pattern is default-deny: apply a policy that selects all pods in a namespace but permits no ingress (and/or egress), so all unspecified traffic is blocked. You then add narrow allow policies for the connections each service legitimately needs — for example, only the API pods may reach the database on port 5432. This segmentation contains a breach to a small blast radius instead of the whole cluster.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny-ingress, namespace: payments }
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes: ["Ingress"]
  # no ingress rules listed => all ingress traffic is denied
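Default-deny semantics are easy to misread, so here is a simplified evaluator. Once any policy with an Ingress type selects a pod, only traffic matching some rule in one of those policies is allowed; pods that no policy selects stay open. Simplifying assumptions: selectors are flattened matchLabels dicts, peers are same-namespace podSelectors only (no namespaceSelector or ipBlock), and ports are plain integers.

```python
"""Simplified same-namespace NetworkPolicy ingress evaluation."""


def selects(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector label must match; {} selects all pods."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies: list[dict], src: dict, dst: dict, port: int) -> bool:
    # policyTypes defaults to include Ingress when omitted.
    relevant = [p for p in policies
                if "Ingress" in p.get("policyTypes", ["Ingress"])
                and selects(p.get("podSelector", {}), dst)]
    if not relevant:
        return True  # dst is not isolated: the Kubernetes default is allow-all
    for p in relevant:
        for rule in p.get("ingress", []):
            peers, ports = rule.get("from"), rule.get("ports")
            peer_ok = not peers or any(selects(peer.get("podSelector", {}), src) for peer in peers)
            port_ok = not ports or port in ports
            if peer_ok and port_ok:
                return True
    return False  # isolated and no rule matched
```

Feeding it the default-deny policy plus an allow rule for API pods on 5432 reproduces the database example: API pods get through on that port, every other source or port is refused.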

17 Secrets in Kubernetes and their limits

Kubernetes Secret objects hold credentials, tokens and keys, and can be mounted into pods as files or environment variables. A crucial caveat: by default Secret values are merely base64-encoded in the API and stored unencrypted in etcd, and base64 is encoding, not protection. Anyone with read access to the Secret or to etcd can trivially decode them.

Hardening Secrets means: enable encryption at rest for Secrets in etcd (via an EncryptionConfiguration, ideally backed by a KMS); lock down RBAC so few subjects can read them; prefer mounting as files over environment variables (env vars leak into logs and child processes more easily); and for stronger guarantees integrate an external secrets manager (such as Vault) so the sensitive material never sits in plain etcd at all.
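The base64 point takes three lines of standard-library Python to demonstrate. The Secret below is a made-up example in the shape the API returns (for instance from `kubectl get secret -o json`); anyone with read access can do exactly this.

```python
"""Decoding a Kubernetes Secret's data needs no key: base64 is not encryption."""
import base64

# Hypothetical Secret object as returned by the API server.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-creds", "namespace": "payments"},
    "data": {"username": "YXBw", "password": "czNjcjN0"},
}


def reveal(secret_obj: dict) -> dict[str, str]:
    """Return the plaintext of every entry in the Secret's data field."""
    return {k: base64.b64decode(v).decode() for k, v in secret_obj.get("data", {}).items()}
```

`reveal(secret)` yields the username and password in plaintext, which is why RBAC on `get secrets` and encryption at rest matter more than the encoding.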

18 Securing the control plane and etcd

The control plane is the crown jewels: whoever controls the API server and etcd controls the cluster. etcd holds all cluster state — including every Secret — so an attacker who reads the etcd data store can extract every credential in the cluster. Protecting it is non-negotiable.

Key measures: restrict network access to etcd to the API servers only and require mutual TLS; enable encryption at rest for etcd data; protect the cluster certificate authority keys (for example those kubeadm generates); lock down the API server with strong authentication, RBAC and the NodeRestriction admission plugin; disable anonymous and insecure-port access; and enable audit logging so API actions are recorded. Treat control-plane node access with the same care as domain-admin on a corporate network.

19 Runtime threat detection with Falco

Build- and ship-time controls reduce risk, but a determined attacker may still gain execution at runtime. Runtime threat detection watches running containers for suspicious behaviour and alerts in real time. Falco, a CNCF project, taps the kernel (via eBPF or a kernel module) to observe system calls and flags activity that violates its rules.

Falco ships rules for behaviour that should almost never happen in a well-built container: a shell being spawned inside a container, writes to sensitive paths like /etc, an unexpected outbound connection, or a process trying to read /etc/shadow. Because such behaviour is anomalous against the expected baseline, detecting it catches breaches that static scanning cannot. Pair detection with response — alerting, and optionally killing or isolating the offending pod.

20 CIS benchmarks and kube-bench

It is hard to know whether a cluster is configured securely without a reference. The CIS (Center for Internet Security) Kubernetes Benchmark is a community-agreed checklist of hardening recommendations — covering API server flags, kubelet configuration, etcd settings, RBAC defaults and file permissions on control-plane nodes.

kube-bench is a tool that automatically checks a running cluster against the CIS Benchmark and reports which items pass, fail or warrant manual review. Running it regularly turns a long PDF of recommendations into actionable findings, and re-running after changes guards against configuration drift. There are equivalent CIS benchmarks for Docker, which tools like Docker Bench for Security evaluate. Benchmarks give you a measurable, repeatable definition of “hardened.”

# Audit a cluster node against the CIS Kubernetes Benchmark
kube-bench run --targets master,node

21 Authoring seccomp profiles

The RuntimeDefault seccomp profile is a good baseline, but a tightly scoped application can do better with a custom seccomp profile that allows only the system calls it actually uses. seccomp (secure computing mode) filters syscalls in the kernel via a BPF program; a profile lists a defaultAction (typically SCMP_ACT_ERRNO to deny) and an allow-list of syscalls with SCMP_ACT_ALLOW.

The practical workflow is to record the syscalls a workload makes under realistic load — tools such as the Security Profiles Operator or strace help — then generate a least-privilege profile and load it. Drop the profile JSON into the kubelet’s seccomp directory (or distribute it via the operator) and reference it with a localhostProfile. The payoff is a dramatically smaller kernel attack surface: a syscall an attacker needs for an exploit simply returns an error.

securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/app-minimal.json
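# app-minimal.json (illustrative; a real allow-list must cover every syscall
# the process makes, e.g. execve, mmap and exit_group, or it will fail to start):
# { "defaultAction": "SCMP_ACT_ERRNO",
#   "syscalls": [ { "names": ["read","write","epoll_wait"],
#                  "action": "SCMP_ACT_ALLOW" } ] }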
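The record-then-generate workflow can be approximated by hand: capture a trace with `strace -f`, extract the syscall names, and emit an allow-list profile. A rough sketch: the regular expression assumes strace's default output format (no timestamps), the x86-64 architecture entry is an assumption about your nodes, and the file paths passed to `write_profile` are hypothetical. A short trace will miss syscalls made only on rare code paths, so test the result under realistic load before enforcing it.

```python
"""Build a least-privilege seccomp profile from strace output."""
import json
import re
from collections.abc import Iterable

# Matches "openat(...", "[pid  42] openat(..." and "42 openat(..." (strace -f -o file).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_trace(lines: Iterable[str]) -> list[str]:
    """Sorted, de-duplicated syscall names seen in the trace."""
    names = set()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # skips "<... resumed>", signal and exit lines
            names.add(m.group(1))
    return sorted(names)


def seccomp_profile(syscalls: list[str], arch: str = "SCMP_ARCH_X86_64") -> dict:
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # deny anything not listed
        "architectures": [arch],
        "syscalls": [{"names": syscalls, "action": "SCMP_ACT_ALLOW"}],
    }


def write_profile(trace_path: str, out_path: str) -> None:
    with open(trace_path) as src, open(out_path, "w") as dst:
        json.dump(seccomp_profile(syscalls_from_trace(src)), dst, indent=2)
```

The generated JSON is what you would place in the kubelet's seccomp directory and reference via localhostProfile.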

22 AppArmor and SELinux profiles for containers

Where seccomp filters syscalls, Mandatory Access Control (MAC) via a Linux Security Module confines what files, capabilities and operations a process may use — regardless of ordinary Unix permissions. The two common LSMs are AppArmor (path-based profiles, common on Debian/Ubuntu) and SELinux (label/type-based, common on RHEL/Fedora).

An AppArmor profile for a container might deny writes outside a few directories, forbid raw network access, and block mount operations. In Kubernetes you attach a profile through the pod securityContext (or, on older versions, an annotation). SELinux is configured via the seLinuxOptions field, assigning the container a type/level so the kernel enforces label-based separation between workloads. MAC adds a second wall: even if an attacker subverts file permissions, the LSM still blocks disallowed actions.

securityContext:
  appArmorProfile:
    type: Localhost
    localhostProfile: k8s-apparmor-restricted
  seLinuxOptions:
    level: "s0:c123,c456"

23 Read-only root filesystems and writable volumes

A container whose root filesystem is writable invites tampering: an attacker can drop a binary, modify a config, or persist a foothold. Setting readOnlyRootFilesystem: true makes the entire container filesystem immutable at runtime, so such writes simply fail. This both blocks a class of attacks and surfaces sloppy apps that secretly write to disk.

Most applications still need some writable space — a temp directory, a cache, a pid file. The pattern is to mount narrow, ephemeral writable volumes (an emptyDir, ideally medium: Memory for sensitive scratch data) only where needed, while everything else stays read-only. Combined with running as non-root and dropping capabilities, an immutable root filesystem turns the container into a hard-to-tamper, cattle-not-pets unit you can kill and recreate freely.

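spec:
  containers:
  - name: app
    image: myregistry.example.com/app:1.4.2
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:            # writable scratch space only where needed
    - name: tmp
      mountPath: /tmp
  volumes:                   # pod-level: a tmpfs-backed emptyDir
  - name: tmp
    emptyDir:
      medium: Memory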

24 Sandboxed runtimes: gVisor and Kata Containers

Standard runtimes share the host kernel directly, so a kernel exploit can mean a full escape. Sandboxed runtimes add a stronger boundary for untrusted or multi-tenant workloads. gVisor interposes a user-space kernel, started through its OCI runtime runsc, that implements the Linux syscall surface itself, so the container’s syscalls hit gVisor rather than the host kernel, shrinking the host kernel attack surface. Kata Containers takes a different route, running each pod inside a lightweight virtual machine with its own guest kernel, giving near-VM isolation.

Both plug in via the Kubernetes RuntimeClass mechanism: you define a RuntimeClass pointing at the sandboxed handler and reference it from the pod. The trade-off is some performance and compatibility overhead, so teams typically reserve sandboxing for higher-risk tenants — untrusted code, customer-submitted workloads — rather than every pod.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata: { name: gvisor }
handler: runsc
---
apiVersion: v1
kind: Pod
metadata: { name: untrusted }
spec:
  runtimeClassName: gvisor
  containers: [ { name: app, image: untrusted/app:1.0 } ]

25 eBPF-based runtime security

eBPF lets you run sandboxed programs inside the Linux kernel, attached to events like syscalls, network packets and function entry points — safely and without kernel modules. This makes it a powerful foundation for runtime security: an eBPF program can observe exactly what a process does at the kernel level with low overhead, feeding detection and enforcement.

Modern tools build on this. Falco can use an eBPF probe to capture syscalls; Tetragon and Cilium use eBPF for deep observability and in-kernel enforcement — not just alerting but blocking a disallowed action before it completes. Because eBPF sees behaviour the application cannot hide (it is below user space), it gives high-fidelity visibility for threat detection, network policy enforcement, and runtime profiling, all without recompiling the kernel or restarting workloads.

26 Writing Falco rules

Falco’s power comes from its rules, written in YAML. A rule has a condition (a boolean expression over event fields), an output string, a priority, and tags. Reusable building blocks help: macros name a condition fragment for reuse, and lists hold sets of values (e.g. allowed binaries). Fields come from the syscall event — proc.name, fd.name, container.id, evt.type and so on.

A good custom rule is specific enough to avoid noise but broad enough to catch the threat. For example, alert when a shell is spawned in a container that should never run one, or when a process opens a file under /etc for writing. Tune carefully: overly broad rules drown responders in false positives, so use macros and exception lists to carve out known-good behaviour rather than disabling the rule entirely.

- rule: Shell spawned in web container
  desc: Detect a shell starting inside the web app container
  condition: >
    spawned_process and container
    and proc.name in (sh, bash, zsh)
    and container.image.repository = "myorg/web"
  output: "Shell in web container (proc=%proc.name user=%user.name)"
  priority: WARNING
  tags: [container, shell, mitre_execution]

27 Admission policy with OPA Gatekeeper

OPA Gatekeeper is a validating admission controller that enforces policy written in Rego, the language of the Open Policy Agent. Its key abstraction is the ConstraintTemplate, which defines a parameterised policy (the Rego logic plus a schema), and the Constraint, an instance of that template that targets specific resource kinds and supplies parameters.

This separation lets platform teams ship a library of templates (require labels, block privileged pods, restrict allowed registries) and let application teams apply constrained instances. Gatekeeper also offers an audit mode that reports violations already present in the cluster, and dryrun and warn values for enforcementAction so you can roll out a policy in report-only or warn-only mode before switching it to deny. Because it sees every create/update request to the API server, it is a strong choke point for cluster-wide guardrails.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata: { name: only-trusted-registry }
spec:
  match:
    kinds: [ { apiGroups: [""], kinds: ["Pod"] } ]
  parameters:
    repos: ["myregistry.example.com/"]
  enforcementAction: deny
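The check behind K8sAllowedRepos is a prefix match on each container image. A Python sketch of that logic for clarity (Gatekeeper itself evaluates the Rego in the ConstraintTemplate, not code like this). Note the trailing slash in the `repos` parameter: without it, a lookalike host such as `myregistry.example.com.evil.io` would also pass.

```python
"""Mirror of an allowed-registries check: every image must start with an approved prefix."""

CONTAINER_KINDS = ("initContainers", "containers", "ephemeralContainers")


def disallowed_images(pod_spec: dict, repos: list[str]) -> list[str]:
    """Images in the pod that do not start with any allowed repository prefix."""
    images = [c["image"] for kind in CONTAINER_KINDS for c in pod_spec.get(kind, [])]
    return [img for img in images if not any(img.startswith(r) for r in repos)]
```

Checking init and ephemeral containers as well as regular ones matters; otherwise an attacker could smuggle an untrusted image in through a debug container.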

28 Admission policy with Kyverno

Kyverno is a Kubernetes-native policy engine that expresses policy as YAML resources rather than a separate language, lowering the barrier for teams already fluent in Kubernetes manifests. A Kyverno ClusterPolicy contains rules of three main kinds: validate (reject non-compliant resources), mutate (inject or default fields, e.g. add a securityContext), and generate (create companion resources like a default NetworkPolicy in every new namespace).

Validation rules use a validationFailureAction of Enforce or Audit, mirroring the warn-then-block rollout pattern. Kyverno also natively supports image verification — a verifyImages rule can require a valid cosign signature before admitting a pod. Its YAML-first model and mutate/generate capabilities make it popular for both guardrails and automated secure defaults.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata: { name: disallow-privileged }
spec:
  validationFailureAction: Enforce
  rules:
  - name: no-privileged
    match: { any: [ { resources: { kinds: ["Pod"] } } ] }
    validate:
      message: "Privileged containers are not allowed"
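      pattern:
        spec:
          # =( ) is a conditional anchor: the field is checked only if present,
          # so pods that omit securityContext are not rejected
          =(initContainers):
          - =(securityContext):
              =(privileged): "false"
          containers:
          - =(securityContext):
              =(privileged): "false"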

29 Verifying image signatures at admission

Signing an image is only half the story; the cluster must refuse to run unsigned or untrusted images. This is enforced at admission: when a pod is created, the controller resolves each image to its digest, fetches the associated signatures and attestations, and verifies them against a trust policy — the expected signer identity, the OIDC issuer, and any required attestations — before the pod is admitted.

Tools include the Sigstore Policy Controller, Kyverno’s verifyImages, and Connaisseur. A robust policy checks keyless identity (e.g. that the build came from your CI’s OIDC identity), verifies the signature covers the exact digest being deployed, and can also require a signed SBOM or provenance attestation. Crucially the verification happens on the digest, so swapping a tag cannot bypass it. This closes the loop from signing to enforcement, making the supply chain tamper-evident end to end.

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata: { name: require-ci-signature }
spec:
  images:
    - glob: "myregistry.example.com/**"
  authorities:
    - keyless:
        identities:
          - issuer: https://token.actions.githubusercontent.com
            subject: ci@example.com
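Because verification keys on the digest, it also helps to deploy by digest in the first place. A small sketch that splits an image reference and reports whether it is pinned; the parsing is simplified (it assumes sha256 digests and does not validate registry hostnames).

```python
"""Split an image reference and check whether it is pinned by sha256 digest."""
import re
from typing import Optional

DIGEST_RE = re.compile(r"@(sha256:[0-9a-f]{64})$")


def parse_ref(ref: str) -> tuple[str, Optional[str], Optional[str]]:
    """Return (repository, tag, digest); tag and digest may be None."""
    m = DIGEST_RE.search(ref)
    digest = m.group(1) if m else None
    name = ref[: m.start()] if m else ref
    head, _, last = name.rpartition("/")
    tag = None
    if ":" in last:  # a colon in the final path segment is a tag, not a registry port
        last, tag = last.split(":", 1)
    repo = f"{head}/{last}" if head else last
    return repo, tag, digest


def is_pinned(ref: str) -> bool:
    return parse_ref(ref)[2] is not None
```

A truncated digest such as `@sha256:abc123` deliberately does not count as pinned.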

30 SBOM consumption and VEX

An SBOM lists what is in an image, but a raw scan of that inventory often produces a flood of CVEs — many of which are not actually exploitable in your context because the vulnerable code path is never reached, the component is not loaded, or a compensating control exists. Drowning in non-exploitable findings is itself a security problem: real issues get lost.

VEX (Vulnerability Exploitability eXchange) addresses this. A VEX document is a machine-readable statement, per vulnerability, of its status: not affected, affected, fixed, or under investigation, often with a justification (e.g. “vulnerable code not in execute path”). Tooling consumes SBOM plus VEX so that triage focuses only on genuinely exploitable issues. Producing VEX alongside your SBOM — and attaching both as signed attestations — turns vulnerability management from noise into signal.
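Consuming VEX amounts to joining scanner findings with VEX statements on vulnerability and product, then suppressing those marked not_affected or fixed. A minimal sketch using a simplified, OpenVEX-like statement shape (`vulnerability`, `products`, `status`, `justification` as plain strings); real documents carry richer identifiers, timestamps and signatures, and here "latest" simply means later in the list, which is an assumption.

```python
"""Filter scanner findings using VEX statements (simplified OpenVEX-like shape)."""

SUPPRESS = {"not_affected", "fixed"}


def actionable(findings: list[dict], statements: list[dict]) -> list[dict]:
    """Keep findings whose latest VEX status does not suppress them."""
    status: dict[tuple[str, str], str] = {}
    for s in statements:  # later statements override earlier ones
        for product in s["products"]:
            status[(s["vulnerability"], product)] = s["status"]
    # No statement, "affected" and "under_investigation" all stay actionable.
    return [f for f in findings
            if status.get((f["vulnerability"], f["product"])) not in SUPPRESS]
```

Findings without any VEX statement stay visible by design: absence of a statement is not evidence of safety.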

31 Egress control with network policies

Most NetworkPolicy work focuses on ingress, but controlling egress — what a pod is allowed to connect out to — is a powerful defence against data exfiltration and command-and-control. A compromised pod with unrestricted egress can phone home, pull a second-stage payload, or scan internal services. A default-deny egress policy blocks all outbound traffic, after which you allow only the specific destinations each workload legitimately needs.

Common allow patterns include permitting DNS to kube-dns (UDP/TCP 53), allowing traffic only to a database pod selected by label, and restricting external calls to a known CIDR or through an egress gateway. Note that standard NetworkPolicy is L3/L4 (IP and port), so for DNS-name or HTTP-aware egress rules you typically need a CNI with extensions (such as Cilium) or a service mesh. Tightly scoped egress turns a breached pod into a dead end.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: api-egress, namespace: payments }
spec:
  podSelector: { matchLabels: { app: api } }
  policyTypes: ["Egress"]
  egress:
  - to: [ { podSelector: { matchLabels: { app: db } } } ]
    ports: [ { protocol: TCP, port: 5432 } ]
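  - to:
    - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } }
      podSelector: { matchLabels: { k8s-app: kube-dns } }
    ports: [ { protocol: UDP, port: 53 }, { protocol: TCP, port: 53 } ]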
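For the "known CIDR" pattern, NetworkPolicy uses an `ipBlock` with a `cidr` and optional `except` ranges. Its address-matching rule can be sketched with Python's `ipaddress` module; the partner range below is hypothetical (drawn from a documentation block), and the sketch covers only address matching, not ports or how your CNI handles traffic after NAT.

```python
"""Evaluate NetworkPolicy ipBlock matching: inside cidr and outside every except range."""
import ipaddress
from collections.abc import Iterable


def ipblock_matches(ip: str, cidr: str, excepts: Iterable[str] = ()) -> bool:
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(e) for e in excepts)


# Hypothetical: allow a partner API range but carve out its upper half.
PARTNER = {"cidr": "203.0.113.0/24", "except": ["203.0.113.128/25"]}
```

With PARTNER, 203.0.113.10 matches while 203.0.113.200 falls in the carved-out half and does not.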

32 Service mesh mTLS

Within a cluster, service-to-service traffic is often plain HTTP, trusting the network. A service mesh (Istio, Linkerd) injects a sidecar proxy alongside each workload and transparently wraps traffic in mutual TLS (mTLS) — both client and server present certificates, so each connection is encrypted and both ends are cryptographically authenticated. This delivers workload identity: a service is identified by its certificate (often a SPIFFE identity), not merely its IP, which can be spoofed.

Meshes typically run mTLS in a PERMISSIVE mode during migration (accepting both plaintext and mTLS) and then switch to STRICT to require it everywhere. On top of authenticated identity you can write authorization policies — allow service A to call service B’s /checkout but nothing else. The mesh thus provides encryption in transit, strong identity, and fine-grained access control without changing application code.

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata: { name: default, namespace: payments }
spec:
  mtls:
    mode: STRICT
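Mesh authorization policies match on the caller's authenticated identity rather than its IP. In Istio that identity is a SPIFFE ID of the form `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`. A sketch of the allow-list decision; the trust domain, service-account names and paths are hypothetical, and in a real mesh the proxy makes this decision only after validating the peer certificate.

```python
"""Toy authorization on mTLS peer identities (SPIFFE IDs) and request paths."""
from urllib.parse import urlparse

# Hypothetical policy: (caller namespace, caller service account) -> allowed path prefixes.
POLICY = {("payments", "frontend"): ["/checkout"]}


def parse_spiffe(spiffe_id: str) -> tuple[str, str, str]:
    """Return (trust_domain, namespace, service_account) from an Istio-style ID."""
    u = urlparse(spiffe_id)
    parts = u.path.strip("/").split("/")
    if u.scheme != "spiffe" or len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE ID: {spiffe_id}")
    return u.netloc, parts[1], parts[3]


def authorize(peer_id: str, path: str, trust_domain: str = "cluster.local") -> bool:
    try:
        domain, ns, sa = parse_spiffe(peer_id)
    except ValueError:
        return False  # unauthenticated or malformed identity: deny
    if domain != trust_domain:
        return False
    return any(path == p or path.startswith(p + "/") for p in POLICY.get((ns, sa), []))
```

Matching `/checkout` exactly or as a path prefix with a slash avoids accidentally granting `/checkoutX`.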

33 Secrets injection with CSI driver and Vault

Native Kubernetes Secrets sit in etcd and can leak through it; many teams prefer to keep secrets in a dedicated manager and inject them at runtime. The Secrets Store CSI Driver mounts secrets from an external store (Vault, AWS/GCP/Azure secret managers) into a pod as a volume at startup, so the value is delivered straight to the pod filesystem and need not be stored as a Kubernetes Secret at all.

An alternative is the Vault Agent sidecar (or injector), which authenticates to Vault using the pod’s Kubernetes service-account token, fetches secrets, writes them to a shared in-memory volume, and can renew or rotate them automatically. Both approaches favour short-lived, dynamically generated credentials over long-lived static ones, and keep the sensitive material out of plain etcd. The pod authenticates with its own identity, so access is auditable and scoped per workload.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata: { name: vault-db }
spec:
  provider: vault
  parameters:
    roleName: "payments-db"
    vaultAddress: "https://vault.example.com"
    objects: |
      - objectName: "db-password"
        secretPath: "secret/data/payments"
        secretKey: "password"
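A SecretProviderClass only describes what to fetch; a pod still has to mount it. A minimal sketch, assuming the Vault CSI provider is installed and that the hypothetical service account payments-api is bound to the payments-db role in Vault's Kubernetes auth method (the provider authenticates to Vault with the pod's service-account token):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api                      # hypothetical workload
spec:
  serviceAccountName: payments-api        # hypothetical; bound to the "payments-db" Vault role
  containers:
    - name: app
      image: registry.example.com/payments-api:1.0   # hypothetical image
      volumeMounts:
        - name: vault-secrets
          mountPath: /mnt/secrets         # db-password appears as /mnt/secrets/db-password
          readOnly: true
  volumes:
    - name: vault-secrets
      csi:
        driver: secrets-store.csi.k8s.io  # the Secrets Store CSI Driver
        readOnly: true
        volumeAttributes:
          secretProviderClass: vault-db   # refers to the SecretProviderClass above
```

The SecretProviderClass must live in the same namespace as the pod that references it.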

34 Node and kubelet hardening

Worker nodes run the kubelet, the agent that manages pods on the node, and a misconfigured kubelet is a direct path to running arbitrary containers or reading every secret on the node. Hardening starts with the kubelet API: disable anonymous authentication (--anonymous-auth=false), require authorization via --authorization-mode=Webhook (not AlwaysAllow), and disable the unauthenticated read-only port (10255) with --read-only-port=0. On the API server, enable the NodeRestriction admission plugin (alongside the Node authorizer) so a kubelet can only modify its own Node object and the pods bound to it.

Beyond the kubelet, harden the host: a minimal/immutable OS, no unnecessary packages or open ports, restricted SSH, kernel hardening, and prompt patching. Limit which workloads can touch the host — block hostNetwork, hostPID, hostPath mounts and privileged pods through admission policy. Since every container shares this node’s kernel, the node’s posture is the floor under all container isolation.

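# kubelet config hardening (KubeletConfiguration file passed to the kubelet via --config)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false            # reject unauthenticated requests to the kubelet API
  webhook:
    enabled: true             # validate bearer tokens against the API server
authorization:
  mode: Webhook               # ask the API server to authorize each kubelet API request
readOnlyPort: 0               # disable the unauthenticated read-only port (10255)
protectKernelDefaults: true   # error out instead of modifying kernel tunables to match kubelet defaults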
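For the admission side, one built-in option is Pod Security Admission. Labelling a namespace to enforce the Baseline level rejects pods that request hostNetwork, hostPID, hostIPC, hostPath volumes or privileged mode. The namespace name below is illustrative; the Restricted level is stricter still and adds requirements such as running as non-root.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                                 # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline # reject pods that violate Baseline
    pod-security.kubernetes.io/warn: restricted  # warn on anything short of Restricted
```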

35 Kubernetes audit logging

Detection and incident response depend on knowing who did what. The Kubernetes audit log records requests to the API server — the user or service account, the verb, the resource, the source IP, and the response — giving a forensic trail of cluster activity. Without it, you cannot reconstruct how an attacker moved or which credentials were abused.

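An audit policy controls what is recorded and at which level: None (skip), Metadata (request metadata only), Request (metadata plus request body), and RequestResponse (plus the response body). Rules are evaluated in order and the first match sets the level, so you tune levels per resource to balance signal against volume. Capture full bodies for security-relevant changes such as RBAC objects, but record Secrets at Metadata only, because Request or RequestResponse would copy the secret values themselves into the log; keep noisy read traffic at Metadata as well. Ship the logs off the control-plane node to a tamper-resistant store and alert on suspicious patterns such as mass secret reads or privilege-escalation attempts.

apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"]     # skip the extra event logged before processing starts
rules:
  # Secrets: metadata only, so secret values never reach the audit log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # RBAC changes: full request and response bodies for forensics
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Pods: metadata only, to keep volume manageable
  - level: Metadata
    resources:
      - group: ""
        resources: ["pods"]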
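A policy file has no effect until the API server is pointed at it. The excerpt below sketches the relevant kube-apiserver flags for the built-in log backend, assuming a kubeadm-style static Pod manifest. The file paths and retention values are illustrative, and both paths must also be mounted into the Pod through hostPath volumes (not shown).

```yaml
# Excerpt from a kube-apiserver static Pod manifest (paths and retention values are illustrative)
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        - --audit-log-maxage=30      # days to retain rotated log files
        - --audit-log-maxbackup=10   # number of rotated files to keep
        - --audit-log-maxsize=100    # megabytes before a file is rotated
        # ...existing flags unchanged...
```

Writing to a local file is only the first hop. A log shipper on the node, or the webhook backend (--audit-webhook-config-file), is what moves events to the off-node store.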

36 Multi-tenancy isolation

When several teams or customers share one cluster, the goal is tenant isolation: a noisy or compromised tenant must not affect the others. The common unit of soft isolation is the namespace, reinforced by per-namespace RBAC (scoped Roles), ResourceQuotas and LimitRanges (so one tenant cannot exhaust CPU/memory), a default-deny NetworkPolicy per namespace, and a Restricted Pod Security Standard.
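As a sketch of that per-namespace baseline, the manifests below assume a hypothetical tenant namespace tenant-a, and the quota figures are placeholders to size for your own tenants. The Namespace label enforces the Restricted Pod Security Standard, the ResourceQuota caps aggregate CPU, memory and pod count, and the empty podSelector makes the NetworkPolicy apply to every pod in the namespace.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a                                   # hypothetical tenant namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted # reject pods that violate Restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  hard:                        # placeholder totals for the whole namespace
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: tenant-a
spec:
  podSelector: {}                      # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]   # no rules listed, so all traffic is denied
```

Once a compute quota exists, pods that omit CPU or memory requests and limits are rejected, which is why a LimitRange supplying defaults usually ships alongside it. Default-deny egress also blocks DNS, so tenants normally need an explicit allow rule for the cluster DNS service, and the policy is only enforced if the CNI plugin supports NetworkPolicy.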

But namespaces are soft isolation — all tenants still share one API server, one set of nodes, and one kernel. For untrusted tenants you escalate to hard isolation: dedicated node pools per tenant (with taints/tolerations and affinity), sandboxed runtimes like gVisor or Kata, or even separate clusters (a cluster-per-tenant model). The right level depends on the threat model: cooperating internal teams may be fine with namespaces, while hostile or regulated tenants demand stronger, kernel-level separation.
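A sketch of the hard-isolation step, assuming the nodes in tenant-b's dedicated pool have been labelled tenant=tenant-b and tainted with `kubectl taint nodes <node> tenant=tenant-b:NoSchedule`, and that each node's container runtime has a gVisor handler named runsc configured. All names here are hypothetical.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                  # must match a runtime handler configured on the nodes
---
apiVersion: v1
kind: Pod
metadata:
  name: tenant-b-app            # hypothetical workload
  namespace: tenant-b
spec:
  runtimeClassName: gvisor      # run this pod inside the gVisor sandbox
  nodeSelector:
    tenant: tenant-b            # pin the pod to tenant-b's node pool
  tolerations:
    - key: tenant               # permit scheduling onto nodes tainted for tenant-b
      operator: Equal
      value: tenant-b
      effect: NoSchedule
  containers:
    - name: app
      image: registry.example.com/tenant-b/app:1.0   # hypothetical image
```

The taint and the selector do different jobs: the taint keeps other tenants' pods off the pool, while the nodeSelector (or a node affinity rule) keeps this tenant's pods on it. Either one alone leaves a gap.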

37 Image patching and lifecycle

An image is a frozen snapshot of software; the moment it is built, the clock starts ticking as new CVEs are disclosed against the packages it contains. Security therefore requires a patching lifecycle, not a one-time clean scan. The core practice is to rebuild regularly from updated bases — pull the latest patched base layer and dependencies and re-emit the image — rather than patching running containers in place, which violates immutability and causes drift.

Operationally this means pinning bases by digest for reproducibility while still scheduling routine rebuilds, automating dependency updates, and tracking image age so stale images are flagged. Because containers are immutable and treated as cattle rather than pets, you remediate by building a new image, rolling it out and discarding the old one, never by exec-ing into a running container to run a package update. A short, automated path from base update to redeployed image is what keeps a fleet patched.
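One way to automate the "pin by digest, but keep updating" loop, assuming images are built from Dockerfiles kept in a GitHub repository, is a Dependabot configuration that watches the FROM lines and opens pull requests when a newer base tag or digest is published; merging the pull request then triggers the normal build-and-deploy pipeline. Renovate offers a comparable workflow outside GitHub. The directory and schedule below are illustrative.

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "docker"   # scan Dockerfile FROM lines for base image updates
    directory: "/"                # location of the Dockerfile (illustrative)
    schedule:
      interval: "weekly"          # check for newer base image tags or digests once a week
```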

38 The 4 C’s of cloud-native security

A useful mental model for layering defences is the 4 C’s of Cloud Native Security: Cloud, Cluster, Container, Code. They are nested concentric layers, and each inner layer depends on the security of the layers outside it — you cannot secure a container if the cluster or cloud beneath it is compromised.

Cloud (or corporate datacentre) — the infrastructure: IAM, network controls, encryption, the security of the nodes and control plane provider.
Cluster — Kubernetes itself: RBAC, network policy, admission control, secrets, control-plane hardening.
Container — the image and runtime: minimal images, scanning, non-root, dropped capabilities, seccomp.
Code — the application: secure coding, dependency management, TLS, input validation, no hard-coded secrets.

The model reminds you that container hardening alone is insufficient — defence in depth means addressing all four layers together.

39 Threat detection and incident response

Prevention will eventually fail, so a mature program plans for detection and response. Detection draws on layered signals: runtime sensors (Falco, eBPF tools) flagging anomalous syscalls, the Kubernetes audit log surfacing suspicious API activity, network telemetry revealing unexpected egress, and image/admission events. Mapping detections to a framework like MITRE ATT&CK for Containers helps ensure coverage across tactics — initial access, execution, persistence, privilege escalation, exfiltration.
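To make the runtime-sensor signal concrete, here is a sketch of a Falco rule that flags a shell starting inside a container, a common post-exploitation step that maps to the ATT&CK Execution tactic. It assumes the spawned_process and container macros from Falco's default rules file are loaded; the rule name and shell list are illustrative.

```yaml
- rule: Shell Spawned in Container          # illustrative rule name
  desc: Detect a shell process starting inside a container
  condition: spawned_process and container and proc.name in (bash, sh, zsh, ash)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]
```

Falco's default ruleset already ships a similar rule (Terminal shell in container) that also checks for an attached terminal, so in practice you would often tune that rule rather than write a new one.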

Response in a container world has a distinctive advantage: workloads are ephemeral and reproducible. A standard playbook is to isolate a suspect pod (apply a quarantine NetworkPolicy and cordon the node), capture forensic evidence (process list, file changes, memory) before it is lost, then kill and replace the pod from a known-good image. Because you can recreate the workload cleanly, containment can be fast and aggressive — but only if detection, runbooks and evidence capture were prepared in advance.
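The quarantine step can be prepared in advance as a NetworkPolicy that selects pods by a label, here a hypothetical quarantine=true, and allows no traffic in either direction. During an incident, adding that label to the suspect pod (for example `kubectl label pod <pod> quarantine=true`) cuts it off without killing it, so evidence survives for capture.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: payments                  # illustrative; create one in each namespace you protect
spec:
  podSelector:
    matchLabels:
      quarantine: "true"               # hypothetical label applied by responders
  policyTypes: ["Ingress", "Egress"]   # no ingress or egress rules, so nothing is allowed
```

Two caveats apply. Kubernetes NetworkPolicies are additive allow-lists, so if another policy already selects the pod and allows some traffic, that traffic keeps flowing; a fully reliable quarantine may need a CNI that supports explicit deny rules. And, as with any NetworkPolicy, it only takes effect with a CNI plugin that enforces policies.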
