Aytict Academy

1 What counts as a secret?

A secret is any piece of data that grants access or proves identity and would cause harm if it leaked. Typical examples include passwords, API keys, database credentials, private TLS/SSH keys, OAuth tokens, signing keys and encryption keys.

The defining property is that a secret must be kept confidential: anyone who holds it can act as you. This is different from ordinary configuration such as a log level or a feature flag, which is not sensitive. The first job of secrets management is simply to recognise what is a secret in the first place, so it can be handled with care.

2 Why secrets must never be hardcoded or committed

Hardcoding a secret in source code — or committing it to a Git repository — is one of the most common and damaging mistakes in software. Once a secret is committed, it lives forever in the version-control history, even if you delete it in a later commit. Anyone with read access to the repo, now or in the future, can recover it.

Source code spreads widely: it is cloned to laptops, mirrored to backups, forked, and pushed to public hosts by accident. A single leaked key in a public repository is often scanned and abused by bots within minutes. Secrets must therefore be kept out of code entirely and supplied separately at runtime.

3 Secrets in env vars, images and logs

Keeping secrets out of source is necessary but not sufficient — secrets also leak through other channels. Baking a secret into a container image embeds it in an image layer that anyone who pulls the image can extract. Even if a later layer deletes the file, the earlier layer still contains it.

Environment variables are a common injection method but carry risks: they can appear in process listings, crash dumps, debug endpoints and child processes. Logs are a frequent accidental sink — printing a request header or config object can dump a token into log storage where it is widely readable. Always treat logs, images and process state as places secrets can escape, and scrub them accordingly.
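
One way to scrub logs is a filter that redacts credential-looking values before any handler writes them. The sketch below is a minimal illustration using Python's standard logging module; the two regular expressions and the names redact and RedactingFilter are assumptions made for this example, not a complete catalogue of secret formats.

```python
import logging
import re

# Hypothetical patterns; tune these to the token formats your systems actually use.
SECRET_PATTERNS = [
    re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"),
    re.compile(r"(?i)((?:api[_-]?key|password|token)\s*[=:]\s*)\S+"),
]


def redact(text: str) -> str:
    """Replace the value part of anything that looks like a credential."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(r"\1[REDACTED]", text)
    return text


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before a handler sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already formatted
        return True


logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())
```

A filter attached to a logger only sees records logged on that logger, so attaching the same filter to each handler usually gives broader coverage. Pattern-based redaction is a safety net; not logging the secret in the first place remains the primary control.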

4 Twelve-factor config: config in the environment

The twelve-factor app methodology says to store configuration that varies between deploys — including credentials — in the environment, strictly separated from code. The litmus test: could you open-source the codebase right now without exposing any credentials? If yes, config is properly externalised.

This separation lets the same build artifact run in dev, staging and production, with only the injected config differing. In practice the “environment” is often a secrets manager that populates env vars or mounts files at runtime, rather than literal shell variables. The key idea is that config is injected at runtime, never compiled in.

# Twelve-factor: code reads config from the environment, never hardcodes it
import os

DB_URL = os.environ["DATABASE_URL"]   # injected at runtime by the platform
API_KEY = os.environ["PAYMENTS_API_KEY"]
# The same image runs everywhere; only the injected values change.

5 Secrets managers and vaults

A secrets manager (or vault) is a dedicated service that stores secrets centrally, encrypts them, controls access with fine-grained policy, and records an audit log of every read. Instead of scattering credentials across files and env vars, applications fetch them from one trusted source at runtime.

HashiCorp Vault is a popular platform-agnostic option; cloud providers offer managed equivalents such as AWS Secrets Manager, Azure Key Vault and Google Secret Manager. Beyond storage, good vaults add dynamic secrets, automatic rotation, leasing and revocation. The win is centralised control and auditability: you know who accessed what, and you can revoke instantly.

6 Encryption at rest and in transit

Secrets must be protected both while stored and while moving. Encryption at rest means the stored data on disk (the vault’s backend, database or backups) is encrypted, so stealing the storage media yields only ciphertext. Encryption in transit means data is protected as it travels over the network, typically with TLS, so an attacker who taps the wire sees nothing useful.

Both are required: encryption at rest does not help if the secret is sent in cleartext over HTTP, and TLS does not help if the disk stores plaintext. A vault should always serve secrets over TLS and persist them encrypted. The strength of all this rests on protecting the encryption keys themselves.

7 Envelope encryption

Envelope encryption is a layered scheme used by virtually every cloud KMS. Data is encrypted with a per-object data encryption key (DEK); the DEK is then itself encrypted (“wrapped”) by a key encryption key (KEK) held in a hardened key-management service. The system stores the ciphertext alongside the wrapped DEK.

To decrypt, the service asks the KMS to unwrap the DEK, then uses the DEK to decrypt the data. The benefits: the master KEK never leaves the KMS, you can encrypt huge volumes with fast symmetric DEKs, and rotating the KEK only re-wraps the DEKs rather than re-encrypting all the underlying data.

# Envelope encryption (conceptual)
DEK = generate_random_key()             # data encryption key, per object
ciphertext = encrypt(plaintext, DEK)    # fast symmetric encryption
wrapped_dek = kms.wrap(DEK, key_id=KEK) # KEK never leaves the KMS
store(ciphertext, wrapped_dek)          # keep both together

# Decrypt: DEK = kms.unwrap(wrapped_dek); decrypt(ciphertext, DEK)

8 Dynamic, short-lived secrets vs long-lived

A long-lived secret — a static password or API key that stays valid for months — is a standing liability: if it leaks, the window of abuse is huge, and you may never notice. Dynamic secrets flip this around. A vault generates credentials on demand, scoped to a short lease (minutes or hours), and automatically revokes them when the lease expires.

For example, Vault can create a unique, temporary database username and password for each application instance, then delete it shortly after. Because each credential is short-lived and uniquely attributable, a leak has a tiny blast radius and an automatic expiry. The principle: prefer ephemeral credentials over standing ones wherever possible.

9 Secret rotation

Rotation is the practice of regularly replacing a secret with a new value and retiring the old one. Even secrets that never leak should be rotated periodically, because the longer a credential lives, the more places it may have been copied and the larger the risk it has been quietly compromised.

Good rotation is automated and ideally zero-downtime: the new secret is provisioned and distributed, consumers switch over, and only then is the old one revoked. Many secrets managers can rotate credentials automatically on a schedule (for example, rotating a database password every 30 days). Manual rotation tends to be skipped, so automation is what makes rotation actually happen.

10 Detecting leaked secrets

Despite best efforts, secrets do get committed. Secret scanning tools search code and history for things that look like credentials — high-entropy strings and known key formats. Popular tools include gitleaks and trufflehog.

The strongest defence is a pre-commit hook that scans staged changes and blocks the commit before a secret ever enters the repository — prevention beats cleanup. Because a leaked secret persists in history, you should also run history scanning across all past commits, not just the latest diff. Many platforms additionally offer server-side push protection that rejects commits containing detected secrets.

# Scan the whole git history for leaked secrets
gitleaks detect --source . --redact

# trufflehog can also scan a repo's full history
trufflehog git file://. --only-verified
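
To make the two detection ideas concrete (known key formats and high-entropy strings), here is a toy scanner in Python. It is only a sketch of the principle and not how gitleaks or trufflehog are implemented; the 4.0 bits-per-character threshold, the 20-character minimum and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Known formats: AWS access key IDs are "AKIA" plus 16 uppercase alphanumerics.
KNOWN_FORMATS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")  # long token-like runs
ENTROPY_THRESHOLD = 4.0  # assumed cut-off, in bits per character


def shannon_entropy(s: str) -> float:
    """Average information per character; random keys score high."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_line(line: str) -> list[str]:
    """Return the names of every detector that fires on this line."""
    findings = [name for name, rx in KNOWN_FORMATS.items() if rx.search(line)]
    for token in CANDIDATE.findall(line):
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            findings.append("high-entropy-string")
            break
    return findings
```

Entropy checks alone produce false positives (hashes, UUIDs, minified assets), which is one reason real scanners combine them with format rules and, in trufflehog's case, live verification.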

11 After a leak: rotate!

When a secret is exposed, the single most important action is to rotate it immediately — revoke the leaked credential and issue a new one. Deleting the offending commit is not enough: the value is already in history, in clones, in CI logs and possibly already harvested. The leaked secret must be treated as compromised the moment it is exposed.

A full response is: rotate (invalidate and replace the secret), then assess the blast radius and review access logs for any abuse, then optionally scrub the value from history. Scrubbing history is good hygiene but it is secondary — rotation is what actually stops the attacker, because only revocation makes the leaked value useless.

12 Least-privilege access to secrets

The principle of least privilege says each identity should have only the minimum access it needs, and nothing more. Applied to secrets: a service should be able to read only the specific secrets it uses, not the entire vault. A leaked or compromised identity then exposes a small blast radius rather than everything.

In practice this means scoping policies per application, using distinct identities per service, granting read-only where write is not needed, and preferring short-lived, scoped tokens over broad master credentials. Avoid shared, all-powerful accounts. Combined with auditing, least privilege turns a breach of one component into a contained incident instead of a full compromise.
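
The scoping described above can be pictured as a default-deny lookup: an identity may touch a path only if an explicit grant matches. The Python sketch below is a simplified model, not the policy format of any particular vault; the POLICIES table, the path names and is_allowed are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: identity -> list of (path pattern, capabilities).
POLICIES = {
    "payments-app": [("secret/payments/*", {"read"})],
    "reporting-job": [("secret/reporting/db", {"read"})],
}


def is_allowed(identity: str, path: str, capability: str) -> bool:
    """Default deny: access requires an explicit grant that matches."""
    for pattern, capabilities in POLICIES.get(identity, []):
        if fnmatchcase(path, pattern) and capability in capabilities:
            return True
    return False
```

Note that fnmatch's * also matches slashes, so secret/payments/* covers nested paths too; a real policy engine defines its own, usually stricter, wildcard rules.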

13 Sealed and encrypted secrets in Git

GitOps stores desired state in Git, but you must never commit plaintext secrets. The answer is to commit them encrypted, so only the cluster (or an authorised operator) can decrypt them. Two common tools:

SOPS (Secrets OPerationS) encrypts the values in a YAML/JSON file while leaving the keys readable, using a backing key from KMS, age or PGP — so a diff still shows which keys changed without revealing the values. Sealed Secrets uses a controller in the cluster: you encrypt a Secret into a SealedSecret with the controller’s public key, commit that, and only the in-cluster controller’s private key can decrypt it. Both let you safely keep secrets in version control.

# SOPS encrypts only the values; keys stay readable in the diff
apiVersion: v1
kind: Secret
data:
    api_key: ENC[AES256_GCM,data:9f3a...,iv:...,tag:...,type:str]
# Decrypted in-cluster or by an authorised operator, never committed in plaintext

14 What is Policy as Code?

Policy as Code (PaC) expresses organisational rules — security, compliance, operational standards — as machine-readable, version-controlled code that is evaluated automatically. Instead of a wiki page saying “don’t expose storage buckets publicly”, you write a policy that checks for it and fails the build when violated.

The guiding philosophy is guardrails, not gatekeepers: policies should let teams move fast within safe boundaries, giving instant automated feedback, rather than relying on slow manual review boards. Because policies are code, they are testable, reviewable, diffable and consistent across every environment — the same rule runs the same way everywhere.

15 Open Policy Agent and Rego basics

Open Policy Agent (OPA) is a general-purpose, CNCF-graduated policy engine. You feed it some input (JSON describing a request, a resource, or a config) and a set of policies, and it returns a decision — allow, deny, or a list of violations. OPA decouples policy from your services: the same engine can guard Kubernetes, CI pipelines, APIs and Terraform.

OPA policies are written in a declarative language called Rego. Rego rules describe what should be true; you query a rule and OPA evaluates it against the input. The tiny example below denies any admission request for a Pod that has a container running as root (runAsUser 0). Learning Rego basics (rules, input, and the default-deny pattern) unlocks most OPA use cases.

# OPA uses the Rego language. A tiny rule that denies root containers:
package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    some c
    input.request.object.spec.containers[c].securityContext.runAsUser == 0
    msg := "containers must not run as root (runAsUser 0)"
}

16 Admission control with policy

In Kubernetes, an admission controller intercepts requests to the API server before an object is persisted, and can validate (allow/deny) or mutate (modify) it. This is the natural place to enforce policy: a non-compliant resource is rejected at creation, never reaching the cluster.

Two leading policy-based admission tools are OPA Gatekeeper (which runs OPA/Rego policies as Kubernetes constraints) and Kyverno (a Kubernetes-native engine whose policies are written as Kubernetes YAML resources, no separate language to learn). Both let you enforce rules such as “all images must come from an approved registry” or “every Pod must set resource limits” automatically at admission time.
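
Under the hood, a validating admission webhook receives an AdmissionReview object and answers with one that echoes the request uid and sets allowed. The Python sketch below shows only that decision logic for the approved-registry rule, with no HTTP server or TLS; the registry.internal/ prefix and the function name review are assumptions for illustration.

```python
APPROVED_REGISTRY = "registry.internal/"  # hypothetical approved registry prefix


def review(admission_review: dict) -> dict:
    """Build the AdmissionReview response for an incoming Pod request."""
    request = admission_review["request"]
    containers = request["object"].get("spec", {}).get("containers", [])
    images = [c.get("image", "") for c in containers]
    rejected = [i for i in images if not i.startswith(APPROVED_REGISTRY)]

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not rejected}
    if rejected:
        response["status"] = {
            "message": f"images not from approved registry: {rejected}"
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Gatekeeper and Kyverno implement this webhook plumbing for you, so in practice you write only the rule, not the handler.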

17 IaC security scanning

Infrastructure-as-Code (IaC) scanning statically analyses Terraform, CloudFormation, Kubernetes manifests and similar files to catch misconfigurations before they are deployed. It is a shift-left control: the problem is found at author or pull-request time, not after a vulnerable resource is already live.

Common tools include tfsec, checkov and KICS. They flag issues such as a publicly readable storage bucket, an unencrypted volume, a security group open to 0.0.0.0/0, or missing logging. Because the scan runs against the declared config, it catches the mistake before any cloud resource exists — far cheaper than remediating a live exposure.

# Scan Terraform for misconfigurations before deploying
tfsec .
checkov -d .

# Example finding: S3 bucket is public, or a security group allows 0.0.0.0/0

18 Compliance as code and continuous compliance

Compliance as code expresses regulatory and standards requirements (PCI-DSS, HIPAA, SOC 2, CIS Benchmarks) as automated, executable checks rather than manual checklists and screenshots. Each control becomes a policy that can be evaluated against real configuration and produce evidence automatically.

Continuous compliance runs those checks constantly — in CI and against live infrastructure — so drift is detected the moment a resource falls out of policy, instead of once a year at audit time. The payoff is an always-current, auditable view of your posture, with machine-generated evidence, turning compliance from a periodic scramble into an ongoing, automated property of the system.

19 Policy in the CI/CD pipeline

Policy and security checks deliver the most value when wired directly into the CI/CD pipeline, where they act as automated quality gates. A typical secure pipeline runs, on every pull request: secret scanning (block committed credentials), IaC scanning (tfsec/checkov/KICS for misconfigurations), and policy evaluation (OPA/Rego or Kyverno checks). If any gate fails, the build is blocked.

This embodies shift-left: problems are caught early, automatically, with fast feedback to the developer who introduced them, instead of in a late manual review or in production. Policies live in the repo alongside the code, run identically for everyone, and form guardrails that make the secure path the default path.

# A policy gate in a CI pipeline (pseudo-YAML)
policy-gate:
  steps:
    - run: gitleaks detect --source . --redact   # block leaked secrets
    - run: tfsec .                                # IaC misconfig scan
    - run: opa eval --fail-defined -d policy/ -i plan.json 'data.main.deny[_]'   # exits non-zero if any deny rule fires
  # Any failing gate blocks the build before deployment

20 Vault authentication methods

Before a client can read secrets from HashiCorp Vault, it must authenticate and receive a token. Vault supports many pluggable auth methods, each suited to a different kind of caller. A human operator might log in with userpass, OIDC or LDAP; a machine or workload uses a method tied to its platform identity.

Common machine auth methods include Kubernetes (a Pod presents its service-account JWT), AWS (an instance proves its IAM identity), AppRole (a role id plus a secret id) and JWT/OIDC (a signed token from a trusted issuer). Each method maps the proven identity to a Vault policy that grants exactly the paths that identity may use, and returns a token with a finite lifetime.

# A Pod authenticates to Vault using its Kubernetes service-account token
vault write auth/kubernetes/login \
    role="payments-app" \
    jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
# Vault verifies the JWT, maps the role to a policy, and returns a token

21 Vault dynamic database credentials

Vault’s database secrets engine generates database credentials on demand instead of storing a static username and password. You configure the engine once with an admin connection and a role that defines the SQL to create a user (its grants) and a lease TTL. When an application asks for credentials, Vault runs that SQL, creates a brand-new user, and hands back a unique username and password.

Each application instance gets its own short-lived credential. When the lease expires, or when you explicitly revoke it, Vault runs the revocation SQL to drop the user. This means no shared static password to leak, automatic cleanup, and per-instance attribution in the database’s own audit log. A leaked credential stops working as soon as its lease expires or is revoked, which with short TTLs means minutes to hours rather than months.

# Define a role that mints a read-only Postgres user with a 1h lease
vault write database/roles/reporting \
    db_name=appdb \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl="1h" max_ttl="24h"

# App requests fresh credentials
vault read database/creds/reporting

22 The Vault transit engine: encryption as a service

The transit secrets engine turns Vault into encryption as a service. Applications send plaintext to Vault and get ciphertext back — but Vault does not store the data. It only holds the encryption keys and performs the cryptographic operations. This keeps the key material inside Vault while letting your services encrypt and decrypt freely.

The big win is that developers never handle raw keys, key rotation is centralised, and you get one audited place for all crypto. Transit also supports rewrap (re-encrypt old ciphertext under a newer key version without exposing plaintext), signing/verification, and HMAC. It is the cloud-agnostic equivalent of a KMS, giving you envelope-style protection without each app managing keys.

# Encrypt data without the app ever holding the key
vault write transit/encrypt/orders \
    plaintext="$(echo -n 'card-1234' | base64)"
# -> returns ciphertext like vault:v1:abc123...

# Decrypt later (Vault never stored the data, only the key); the plaintext comes back base64-encoded
vault write transit/decrypt/orders ciphertext="vault:v1:abc123..."

23 Leases, TTLs and revocation

Every dynamic secret Vault issues comes with a lease: a contract that the credential is valid only for a bounded time, the TTL (time to live). When the TTL elapses, Vault automatically revokes the secret — dropping the database user, deleting the cloud key, and so on. Clients that need the secret longer can renew the lease up to a max TTL, after which it must be reissued.

Leasing is what makes ephemeral secrets practical at scale: expiry is handled by the system, not by hopeful manual cleanup. Crucially, leases support explicit revocation — during an incident you can revoke a single lease, or every lease under a path, instantly invalidating compromised credentials across the fleet. Short TTLs plus revocation give a small, controllable blast radius.

# Renew a lease before it expires (up to its max TTL)
vault lease renew database/creds/reporting/abc123

# Incident response: revoke one lease, or every lease under a path
vault lease revoke database/creds/reporting/abc123
vault lease revoke -prefix database/creds/reporting
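
The lease lifecycle (issue, renew up to a max TTL, expire, revoke by prefix) can be modelled in a few lines of Python. This is a toy in-memory simulation to illustrate the rules, not Vault's implementation; the LeaseManager class and its methods are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    expires_at: float      # end of the current TTL window
    max_expires_at: float  # hard ceiling set by the max TTL
    revoked: bool = False


class LeaseManager:
    def __init__(self, ttl: float, max_ttl: float):
        self.ttl, self.max_ttl = ttl, max_ttl
        self.leases: dict[str, Lease] = {}
        self._ids = itertools.count(1)

    def issue(self, path: str, now: float) -> Lease:
        lease = Lease(f"{path}/{next(self._ids)}", now + self.ttl, now + self.max_ttl)
        self.leases[lease.lease_id] = lease
        return lease

    def is_valid(self, lease_id: str, now: float) -> bool:
        lease = self.leases[lease_id]
        return not lease.revoked and now < lease.expires_at

    def renew(self, lease_id: str, now: float) -> bool:
        if not self.is_valid(lease_id, now):
            return False  # expired or revoked leases cannot be renewed
        lease = self.leases[lease_id]
        # Extend by one TTL, but never beyond the max TTL ceiling.
        lease.expires_at = min(now + self.ttl, lease.max_expires_at)
        return True

    def revoke_prefix(self, prefix: str) -> int:
        """Incident response: revoke every live lease under a path."""
        hit = [l for l in self.leases.values()
               if l.lease_id.startswith(prefix) and not l.revoked]
        for lease in hit:
            lease.revoked = True
        return len(hit)
```

With ttl=60 and max_ttl=150, a lease issued at time 0 can be renewed at 50 (valid until 110) and again at 100, but the second renewal is capped at 150 rather than 160.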

24 The secret-zero bootstrapping problem

A vault solves “where do secrets live”, but creates a new question: how does a workload get the first credential needed to authenticate to the vault? This initial credential is the secret zero, and the chicken-and-egg challenge of delivering it securely is the secret-zero problem. If secret zero is itself a long-lived static token committed somewhere, you have merely moved the original problem.

The modern answer is to avoid a stored secret-zero entirely by using a platform-provided identity the workload already has: a Kubernetes service-account token, an AWS instance role, or a SPIFFE identity. The workload proves who it is using an attestable identity issued by the platform, and the vault trades that for a token. There is nothing static to leak because the bootstrap identity is short-lived and minted by the infrastructure.

25 Workload identity with SPIFFE and SPIRE

SPIFFE (Secure Production Identity Framework For Everyone) is a standard that gives every workload a verifiable, platform-neutral identity. The identity is a SPIFFE ID, a URI like spiffe://example.org/payments, delivered as a short-lived SVID (SPIFFE Verifiable Identity Document), typically an X.509 certificate or JWT. This lets services authenticate to each other on identity rather than on network location or a shared secret.

SPIRE is the reference implementation. A SPIRE agent on each node attests a workload — checking properties like its Kubernetes service account, process UID or container image — and only then issues the matching SVID. Because SVIDs are short-lived and automatically rotated, and issuance is tied to attested properties, SPIFFE/SPIRE directly addresses secret zero: the identity is minted by the infrastructure, with no static credential to distribute.
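
As a small illustration of authenticating on identity rather than network location, the Python sketch below parses a SPIFFE ID and checks it against an allow-list. It covers only the URI shape (spiffe scheme, trust domain, path) and assumes the SVID carrying the ID has already been cryptographically verified; parse_spiffe_id and is_authorized are hypothetical helper names.

```python
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust domain, workload path) or raise ValueError."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("scheme must be spiffe")
    if not parsed.netloc or "@" in parsed.netloc or ":" in parsed.netloc:
        raise ValueError("trust domain must be a bare host name")
    if parsed.query or parsed.fragment:
        raise ValueError("query and fragment are not allowed")
    return parsed.netloc, parsed.path


def is_authorized(peer_id: str, allowed_ids: set[str]) -> bool:
    """Allow a peer only if its ID is well-formed and explicitly listed."""
    try:
        parse_spiffe_id(peer_id)
    except ValueError:
        return False
    return peer_id in allowed_ids
```

In a real deployment the ID is read from the peer's verified X.509 or JWT SVID, never from a value the peer simply asserts.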

26 Short-lived certificates

Long-lived TLS certificates are a liability: if a private key leaks, the certificate stays trusted until its distant expiry, and revocation via CRLs and OCSP is notoriously unreliable. The modern alternative is short-lived certificates — certificates valid for hours rather than years, issued automatically and renewed continuously by an internal CA.

Because lifetimes are tiny, expiry effectively replaces revocation: a compromised key becomes useless almost immediately, with no dependence on clients checking a revocation list. Vault’s PKI engine, SPIRE’s X.509 SVIDs, and service meshes that auto-rotate mTLS certificates all use this pattern. The trade-off — needing reliable automated issuance — is exactly what infrastructure now provides by default.

# Issue a short-lived leaf certificate from Vault's PKI engine
vault write pki/issue/internal \
    common_name="payments.svc.internal" \
    ttl="1h"
# The cert auto-expires in an hour; renewal is automated, so expiry replaces revocation

27 Rego in depth: rules and functions

Beyond simple deny rules, Rego has a small but expressive structure worth understanding. A complete rule assigns a single value (default allow = false then allow { … }). A partial rule builds a set or object, like deny[msg] collecting every violation message. Inside a rule body, statements are joined by implicit AND, and all variables must unify for the rule to hold.

Rego also supports user-defined functions — is_public(net) { net == "0.0.0.0/0" } — plus comprehensions for building collections and built-ins like startswith or count. Helper rules and functions let you factor common logic out of many policies, keeping each rule readable. Mastering complete vs partial rules, unification, and functions is the core of writing non-trivial Rego.

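package example

# OPA releases before 1.0 (the Rego syntax used here) need these imports for `every ... in`
import future.keywords.every
import future.keywords.in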

default allow = false

# A reusable function
is_approved_registry(img) {
    startswith(img, "registry.internal/")
}

# Complete rule: allow only if every container uses an approved registry
allow {
    count(input.containers) > 0
    every c in input.containers {
        is_approved_registry(c.image)
    }
}

28 Testing Rego policies with opa test

Because policies are code, they deserve unit tests. OPA has a built-in test runner: any rule whose name starts with test_ in a file is executed by opa test, and the rule passing (evaluating to true) means the test passes. You write tests that feed crafted input documents into your policy using the with input as … construct and assert on the resulting decision.

Good policy test suites cover both the allow path (compliant input is permitted) and the deny path (each violation is correctly caught), plus edge cases. Running opa test in CI prevents a well-meaning policy edit from silently breaking enforcement — a real risk, since a policy that always evaluates to “allow” looks fine until something bad slips through. Tested policies are trustworthy guardrails.

package kubernetes.admission

# Test: a root container must be denied
test_denies_root {
    deny[_] with input as {
        "request": {
            "kind": {"kind": "Pod"},
            "object": {"spec": {"containers": [{"securityContext": {"runAsUser": 0}}]}}
        }
    }
}

# Run with:  opa test .

29 Gatekeeper constraint templates and constraints

OPA Gatekeeper splits a policy into two Kubernetes resources. A ConstraintTemplate defines the reusable logic: it carries the Rego that produces violations and declares a new constraint kind plus any parameters the policy accepts. A Constraint is then an instance of that kind which applies the template with specific parameters and a scope (which resource kinds and namespaces it targets).

This separation means a platform team can author one well-tested template — say K8sRequiredLabels — and many teams instantiate it with their own parameters, like requiring an owner label on Namespaces. Gatekeeper also offers an audit mode that continuously reports existing violations in the cluster, not just blocking new ones, giving visibility before you enforce.

# A Constraint instantiates a template with parameters and a target scope
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
    name: ns-must-have-owner
spec:
    match:
        kinds:
            - apiGroups: [""]
              kinds: ["Namespace"]
    parameters:
        labels: ["owner"]

30 Kyverno policy patterns

Kyverno is a Kubernetes-native policy engine whose policies are themselves Kubernetes YAML resources, so there is no separate language to learn. A Kyverno ClusterPolicy contains rules. The three most common rule types are validate (allow/deny a resource against a pattern), mutate (inject or modify fields, like adding default labels), and generate (create related resources, such as a default NetworkPolicy for every new Namespace); Kyverno also offers further types such as verifyImages for image signature checks.

Validation uses an intuitive pattern with overlays and operators (?, *, anchors like =()). The policy also sets validationFailureAction to Audit or Enforce (recent Kyverno releases additionally accept a per-rule failureAction under validate), letting you roll a policy out in report-only mode first and switch to blocking once clean. Because policies are plain YAML, they fit naturally into GitOps and are easy for Kubernetes users to read and review.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
    name: require-run-as-non-root
spec:
    validationFailureAction: Enforce
    rules:
        - name: check-non-root
          match:
              any:
                  - resources:
                        kinds: ["Pod"]
          validate:
              message: "Pods must run as non-root"
              pattern:
                  spec:
                      securityContext:
                          runAsNonRoot: true

31 Validating config files with conftest

conftest applies OPA/Rego policies to structured configuration files — YAML, JSON, HCL, Dockerfiles, INI and more — outside of Kubernetes admission. It parses a file into a document, evaluates your Rego against it, and reports deny (and optionally warn) results. This brings policy-as-code to anything that is configuration: CI pipeline definitions, Docker Compose files, Terraform plans, Helm output.

conftest looks for policies in a policy/ directory by default and uses the same deny[msg] convention as OPA. It fits naturally in CI as a gate: run conftest test deployment.yaml and fail the build on any violation. Because it works on plain files, you can enforce standards (no :latest image tags, required labels, banned settings) long before the config ever reaches a cluster.

# policy/deployment.rego
package main

deny[msg] {
    input.kind == "Deployment"
    some i
    image := input.spec.template.spec.containers[i].image
    endswith(image, ":latest")
    msg := sprintf("image %v must not use the :latest tag", [image])
}

# Run:  conftest test deployment.yaml

32 Mutating vs validating admission policies

Admission policies come in two flavours that run in a fixed order. A mutating policy runs first and can change the incoming object — injecting defaults such as resource limits, a sidecar, or a missing label. A validating policy runs after all mutations and can only accept or reject the (now-final) object; it must not modify it.

The ordering matters: validation sees the object exactly as it will be persisted, including anything mutation added, so the two compose cleanly — mutate to make the common case compliant by default, then validate to guarantee the result meets the rule. Prefer mutation for sensible defaults and validation for hard requirements. Mutating policies should be conservative and idempotent, since silently changing user objects can surprise people if overused.
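
The mutate-then-validate composition can be sketched in plain Python. The example models a Pod as a dictionary, fills in missing resource limits without overwriting user values, and then validates the final object; the default values and the function names are assumptions for illustration, not the behaviour of a specific admission engine.

```python
import copy

DEFAULT_LIMITS = {"cpu": "500m", "memory": "256Mi"}  # hypothetical defaults


def mutate(pod: dict) -> dict:
    """Fill in missing limits; never overwrite what the user set."""
    result = copy.deepcopy(pod)  # leave the caller's object untouched
    for container in result["spec"]["containers"]:
        limits = container.setdefault("resources", {}).setdefault("limits", {})
        for key, value in DEFAULT_LIMITS.items():
            limits.setdefault(key, value)
    return result


def validate(pod: dict) -> list[str]:
    """Return violations for the final object; an empty list means admit."""
    errors = []
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        for key in DEFAULT_LIMITS:
            if key not in limits:
                errors.append(f"{container['name']}: missing {key} limit")
    return errors
```

Because mutate only uses setdefault, applying it twice gives the same result as applying it once, which is the idempotence property recommended above.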

33 Mapping controls to compliance frameworks

Auditors think in controls — numbered requirements in a framework like SOC 2, ISO 27001, PCI-DSS or a CIS Benchmark. To make compliance-as-code credible, each automated policy should be explicitly mapped to the control(s) it satisfies. A policy that forbids public storage buckets, for instance, maps to specific access-control and data-protection controls, and that linkage should be recorded as metadata on the policy.

This mapping turns a passing policy run into evidence: instead of gathering screenshots once a year, you can produce a report showing “control X is enforced by policy Y, which passed on these resources at this time.” Tagging policies with their control identifiers also reveals coverage gaps — controls with no automated check — so you know where manual attestation is still required.

# Annotate a policy with the controls it satisfies, so a pass becomes evidence
package compliance.storage

# METADATA
# title: No public storage buckets
# custom:
#   controls: ["CIS-AWS-2.1.5", "SOC2-CC6.1", "PCI-DSS-1.2"]
deny[msg] {
    input.resource.acl == "public-read"
    msg := "storage bucket must not be publicly readable"
}
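
Once policies carry control identifiers, a short script can turn policy results into per-control evidence and list uncovered controls. The Python sketch below assumes the mapping has already been extracted into a dictionary; the policy names, the extra control SOC2-CC7.2 and both function names are hypothetical.

```python
# Hypothetical inventory: policy name -> control identifiers it claims to satisfy.
POLICY_CONTROLS = {
    "no-public-buckets": {"CIS-AWS-2.1.5", "SOC2-CC6.1", "PCI-DSS-1.2"},
    "require-mfa": {"SOC2-CC6.1"},
}
REQUIRED_CONTROLS = {"CIS-AWS-2.1.5", "SOC2-CC6.1", "PCI-DSS-1.2", "SOC2-CC7.2"}


def coverage_gaps(policy_controls: dict[str, set[str]], required: set[str]) -> set[str]:
    """Controls that no automated policy claims to cover."""
    covered: set[str] = set()
    for controls in policy_controls.values():
        covered |= controls
    return required - covered


def evidence(policy_controls: dict[str, set[str]], results: dict[str, bool]) -> dict:
    """Per control: which policies enforce it and whether all of them passed."""
    report: dict[str, dict] = {}
    for policy, controls in sorted(policy_controls.items()):
        for control in controls:
            entry = report.setdefault(control, {"policies": [], "passed": True})
            entry["policies"].append(policy)
            # A policy with no recorded result counts as not passed.
            entry["passed"] = entry["passed"] and results.get(policy, False)
    return report
```

A control counts as passed only when every policy mapped to it passed, and a missing result is treated as a failure so that gaps surface instead of hiding.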

34 Automated secret rotation pipelines

Reliable rotation is an automated pipeline, not a manual chore. A robust rotation avoids downtime by keeping the old and new credentials valid side by side during the switch. First, create a new credential alongside the old one so both are valid at once. Next, distribute the new value to all consumers and let them adopt it. Only after consumers have switched do you revoke the old credential.

Many secrets managers automate this with rotation functions triggered on a schedule: AWS Secrets Manager runs a Lambda implementing createSecret, setSecret, testSecret and finishSecret stages; Vault rotates via its database and other engines. The test step is critical — verify the new credential actually works before retiring the old one, so a broken rotation never takes the service down.
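
The staged flow can be sketched as a small Python simulation. It borrows the four stage names mentioned above as comments but is not the AWS Lambda rotation interface; FakeDatabase, the store dictionary with its pending/current/previous slots, and the separate retire_previous step are all assumptions made to show the overlap window.

```python
import secrets


class FakeDatabase:
    """Stand-in for the real service: accepts any password in its valid set."""

    def __init__(self, password: str):
        self.valid = {password}

    def login(self, password: str) -> bool:
        return password in self.valid


def rotate(store: dict, db: FakeDatabase) -> None:
    # createSecret: generate the new value and stage it as pending.
    store["pending"] = secrets.token_urlsafe(32)
    # setSecret: install it in the target service alongside the old one.
    db.valid.add(store["pending"])
    # testSecret: verify the new credential before anything is retired.
    if not db.login(store["pending"]):
        db.valid.discard(store["pending"])
        del store["pending"]
        raise RuntimeError("new credential failed verification; old one kept")
    # finishSecret: promote pending to current; the old value stays valid for now.
    store["previous"] = store["current"]
    store["current"] = store.pop("pending")


def retire_previous(store: dict, db: FakeDatabase) -> None:
    """Run only after consumers have switched to the current value."""
    db.valid.discard(store.pop("previous"))
```

Between rotate and retire_previous both passwords work, which is the overlap that lets consumers switch without an outage.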

35 Secrets in CI/CD via OIDC

Traditionally, a CI pipeline stored a long-lived cloud access key as a repository secret to deploy — a high-value, static credential that is easy to leak and hard to rotate. The modern pattern eliminates it using OIDC federation. The CI platform (GitHub Actions, GitLab) acts as an OIDC identity provider and issues a short-lived, signed JWT describing the running job (its repo, branch, workflow).

The cloud provider is configured to trust that issuer and exchange the JWT for temporary credentials, with a trust policy that restricts which repositories and branches may assume which roles. The result: no static cloud keys stored anywhere, automatic expiry, and fine-grained, per-workflow access. This is workload identity applied to pipelines — the job proves who it is instead of holding a secret.

# GitHub Actions exchanges its OIDC token for short-lived AWS credentials
permissions:
    id-token: write   # allow the job to request an OIDC token
    contents: read
steps:
    - uses: aws-actions/configure-aws-credentials@v4
      with:
          role-to-assume: arn:aws:iam::123456789012:role/deploy
          aws-region: eu-north-1
# No long-lived AWS access key is stored as a secret
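
The trust policy on the cloud side boils down to matching claims in the job's token against per-role conditions. The Python sketch below models only that matching step and assumes the JWT's signature, expiry and issuer keys have already been verified elsewhere; the TRUST_POLICY table, the example-org/shop repository and may_assume are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical trust policy: role -> required claim values (wildcards allowed).
TRUST_POLICY = {
    "deploy": {
        "iss": "https://token.actions.githubusercontent.com",
        "aud": "sts.amazonaws.com",
        "sub": "repo:example-org/shop:ref:refs/heads/main",
    },
}


def may_assume(role: str, claims: dict) -> bool:
    """Check already-verified token claims against the role's conditions."""
    conditions = TRUST_POLICY.get(role)
    if conditions is None:
        return False  # unknown roles are denied by default
    return all(
        fnmatchcase(str(claims.get(name, "")), pattern)
        for name, pattern in conditions.items()
    )
```

Keeping the sub condition exact, as here, is the safer default: a wildcard such as repo:example-org/* would let any repository in the organisation assume the role.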

36 IaC scanning with custom Rego rules

Off-the-shelf IaC scanners catch common misconfigurations, but every organisation has its own standards — an approved region list, a mandatory cost-centre tag, a banned instance type. You can encode these as custom rules. Tools like checkov support custom Python or YAML policies, and you can run conftest or OPA directly against a Terraform plan exported to JSON to enforce arbitrary Rego.

Scanning the JSON form of a saved plan (written with terraform plan -out and converted with terraform show -json) is powerful because it reflects the actual computed changes, including resolved variables and modules, not just the raw source. Custom rules let you fail a build when, say, a resource is created outside an allowed region or omits a required tag, turning organisation-specific governance into automated, shift-left guardrails rather than tribal knowledge.

# Custom Rego over a terraform plan: deny resources outside allowed regions
package main

allowed := {"eu-north-1", "eu-west-1"}

deny[msg] {
    rc := input.resource_changes[_]
    region := rc.change.after.region
    not allowed[region]
    msg := sprintf("%v uses disallowed region %v", [rc.address, region])
}

# terraform show -json plan.out > plan.json ; conftest test plan.json

37 Audit-then-enforce: a policy rollout strategy

Switching a new policy straight to blocking across a large estate is risky — you may discover dozens of pre-existing violations and break legitimate workflows on day one. The safe rollout is audit then enforce. First deploy the policy in audit (report-only) mode: it records every violation but blocks nothing, giving you a true picture of the blast radius.

With that data you remediate existing violations, refine the policy to remove false positives, and communicate the upcoming change. Only once the audit reports come back clean do you flip the same policy to enforce, where it begins rejecting non-compliant changes. Most engines support this directly (Gatekeeper’s audit together with a dryrun or warn enforcementAction on the constraint, Kyverno’s Audit vs Enforce action), making a gradual, low-drama rollout the default path.
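
The difference between the two modes, and the condition for flipping the switch, fits in a few lines of Python. This is a conceptual sketch independent of any engine; Mode, decide and ready_to_enforce are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"
    ENFORCE = "enforce"


def decide(violations: list[str], mode: Mode) -> tuple[bool, list[str]]:
    """Return (admitted, report). Audit mode reports violations but admits anyway."""
    admitted = not violations or mode is Mode.AUDIT
    return admitted, list(violations)


def ready_to_enforce(audit_reports: list[list[str]]) -> bool:
    """Flip to enforce only after at least one audit run, all of them clean."""
    return bool(audit_reports) and all(not report for report in audit_reports)
```

The same violations list drives both modes, which mirrors the point above: it is one policy throughout, and only the action taken on a violation changes.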

38 Detecting and responding to secret leaks at scale

Prevention is not perfect, so mature teams build a detection-and-response capability for leaks. Detection spans several layers: push protection and history scanning on repositories, monitoring of public sources (paste sites, public forks, package registries) for your key patterns, and provider-side alerts — many cloud and SaaS vendors notify you when a credential of theirs appears in a public commit.

Response should be a rehearsed runbook, not improvisation. The core steps: revoke/rotate the exposed secret immediately, scope the exposure (what could the credential access, and for how long was it valid), review audit logs for actual abuse, then scrub history and capture lessons learned. The order is deliberate — revocation neutralises the threat first; forensics and cleanup follow. Short-lived credentials make every step easier because the window is already small.
