🧪

Security Testing in CI/CD (Intermediate)

Automate security testing in the pipeline: SAST, DAST, SCA, IAST, secret scanning and fuzzing — with sensible gates.

38 lessons, 114 quiz questions

📚 Lessons & quizzes

Each lesson ends with its own short quiz. Answer them as you go — score 90% across all lessons to earn your certificate.

1 Why automate security testing

Manual security reviews and once-a-year penetration tests cannot keep pace with teams that ship many times a day. By the time a yearly audit finds a flaw, dozens of releases have shipped on top of it. Automating security testing in the pipeline shifts checks left — closer to the moment code is written — so defects are caught when they are cheapest to fix.

The goal is fast, repeatable, objective feedback on every change. Automation does not replace human expertise; it removes the repetitive scanning so humans can focus on design review, threat modelling and triaging the findings that matter.

2 The categories of security testing

Automated security testing is not one tool but a family of complementary techniques, each looking at the application from a different angle:

  • SAST — static analysis of source or bytecode, without running the app.
  • DAST — dynamic testing of the running application from the outside.
  • SCA — software composition analysis of third-party dependencies for known CVEs and licenses.
  • IAST — interactive analysis that instruments the running app during tests.
  • Secret scanning — detecting committed credentials.
  • Container/image scanning and fuzzing round out the set.

No single category is sufficient; they overlap and cover each other’s blind spots.

3 SAST: static application security testing

SAST (static application security testing) analyses an application’s source code, bytecode or binaries without executing it. By parsing the code into an abstract syntax tree and tracing how untrusted data flows from sources to sensitive sinks, it spots patterns such as SQL injection, cross-site scripting, path traversal and use of unsafe or deprecated APIs.

Because it works directly on the code, SAST can run very early — even in the editor or in a pre-commit hook — and it can point to the exact file and line. It does not need a deployed, running environment.

# SAST stage in a CI pipeline (GitHub Actions style)
sast:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run Semgrep static analysis
      run: semgrep ci --config auto   # analyses source, never runs the app

4 DAST: dynamic application security testing

DAST (dynamic application security testing) tests a running application from the outside, with no knowledge of its source code — a so-called black-box approach. It sends crafted HTTP requests and malicious payloads to the deployed app and observes the responses, much as an attacker would.

Because it exercises the real, running system, DAST finds runtime and configuration issues that static analysis cannot see: missing security headers, server misconfiguration, authentication flaws, and injection that actually fires end-to-end. It typically runs later in the pipeline, against a deployed staging environment.

# DAST stage running against a deployed staging URL
dast:
  runs-on: ubuntu-latest
  steps:
    - name: ZAP baseline scan against staging
      run: zap-baseline.py -t "$STAGING_URL"   # spiders the running app and scans responses passively (no active attacks)

5 SAST vs DAST: trade-offs

SAST and DAST are complementary because they look at the app from opposite ends. SAST has broad code coverage, runs early, and pinpoints the exact line — but it cannot see runtime context, so it tends to produce more false positives (flagging code paths that are never actually exploitable). DAST sees real runtime behaviour and produces fewer false positives, but it can only test code paths it manages to reach, needs a running environment, and cannot tell you which line of source is at fault.

A mature pipeline runs both: SAST early on every change, DAST later against staging.

6 SCA: software composition analysis

Modern applications are mostly third-party code: open-source libraries pulled in transitively through package managers. SCA (software composition analysis) inventories every direct and transitive dependency and checks each version against databases of known vulnerabilities (CVEs), telling you which packages are affected and what to upgrade to.

SCA also surfaces the license of each dependency, which matters for legal compliance. Tools such as Dependabot, Trivy, Grype and OWASP Dependency-Check read the lockfile or manifest, so SCA can run early and fast in CI.

# SCA: scan dependencies for known CVEs
sca:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Scan dependencies with Trivy
      run: trivy fs --scanners vuln,license .   # reads lockfiles, checks CVEs

7 IAST: interactive application security testing

IAST (interactive application security testing) is a hybrid that instruments the running application with an agent — sensors placed inside the code — and observes it while existing functional or integration tests exercise it. Because the agent sees both the code and the live data flowing through it, IAST can confirm whether a tainted input actually reaches a dangerous sink at runtime.

This combination gives IAST high accuracy and a very low false-positive rate: it reports a vulnerability only when it observes the vulnerable path being executed. The catch is that its coverage is limited to whatever your test suite actually drives.

8 Secret scanning

Hard-coded credentials — API keys, passwords, private keys, tokens — accidentally committed to a repository are one of the most common and damaging leaks. Secret scanning detects such committed credentials by matching known patterns (for example an AWS key format) and high-entropy strings against the code and its git history.

Tools such as gitleaks and TruffleHog run in pre-commit hooks and in CI. A secret that has been pushed must be treated as compromised: removing it from history is not enough — you must rotate (revoke and reissue) the credential.

# Secret scanning with gitleaks in CI
secrets:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with: { fetch-depth: 0 }   # full history
    - name: Run gitleaks
      run: gitleaks detect --source . --redact
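
The two detection signals described above (known patterns and high-entropy strings) can be sketched in a few lines of Python. This is a minimal illustration, not how gitleaks or TruffleHog are implemented: the token regex and the 4.0 bits-per-character threshold are assumptions chosen for the example, and real tools ship far larger rule sets.

```python
import math
import re
from collections import Counter

# AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Candidate tokens for the entropy check: long runs of base64-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, computed from character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_secrets(line: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return the reasons this line looks like it holds a secret (empty if none)."""
    reasons = []
    if AWS_KEY_ID.search(line):
        reasons.append("aws-access-key-id pattern")
    for token in TOKEN.findall(line):
        if shannon_entropy(token) >= entropy_threshold:
            reasons.append("high-entropy string")
            break
    return reasons
```

Running `find_secrets` over every added line of a diff gives the fast pre-commit check; running it over every blob in history gives the slower, thorough sweep.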

9 Container image scanning

When you ship a container image, you ship not just your app but a whole userland: a base OS layer, system packages and your application’s dependencies. Any of these can carry known vulnerabilities. Container image scanning (Trivy, Grype, Clair) inspects the layers of a built image and reports CVEs in OS packages and installed libraries.

It fits in the pipeline after the image is built and before it is pushed to a registry or deployed. Many teams also re-scan images already in the registry, because new CVEs are disclosed against packages that were clean when the image was first built.

# Scan the built image before pushing it
image-scan:
  steps:
    - run: docker build -t myapp:$GIT_SHA .
    - name: Scan image layers for CVEs
      run: trivy image --severity HIGH,CRITICAL myapp:$GIT_SHA

10 Fuzzing: an introduction

Fuzzing feeds a program a flood of automatically generated, malformed or unexpected inputs and watches for crashes, hangs, memory errors or assertion failures. The idea is that real bugs hide in the inputs a developer never thought to test. Coverage-guided fuzzers (such as libFuzzer and AFL) instrument the target and mutate inputs to reach new code paths, getting steadily deeper over time.

Fuzzing excels at finding memory-safety bugs and parsing flaws in code that handles untrusted data. Because runs can be long, fuzzing is often run continuously or nightly rather than blocking every pull request.
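
To make the idea concrete, here is a deliberately simple mutation-based fuzzer in Python. It is a sketch only: it is not coverage-guided (libFuzzer and AFL add instrumentation and a feedback loop on top of this), and `parse_record` is a toy target invented for the example.

```python
import random


def parse_record(data: bytes) -> str:
    """Toy target: decodes a record, but mishandles non-ASCII bytes."""
    return data.decode("ascii")  # raises UnicodeDecodeError on bytes >= 0x80


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 500, rng_seed: int = 0) -> list[bytes]:
    """Run the target on mutated inputs and collect the ones that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)  # keep the input as a reproducer
    return crashes
```

Calling `fuzz(parse_record, b"name=alice")` returns the inputs that raised, each of which is a ready-made regression test case.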

11 Where each tool sits in the pipeline

Each technique has a natural home in the delivery flow, ordered roughly from fastest/earliest to slowest/latest:

  • Pre-commit hooks (fast, on the developer’s machine): secret scanning, lightweight SAST/linting.
  • Pull-request checks: SAST and SCA on the changed code.
  • Build stage: container image scanning after the image is built.
  • Staging / deployed: DAST and IAST against a running environment.
  • Continuous / nightly: fuzzing and full re-scans.

Putting fast checks early gives developers near-instant feedback; slower checks that need a running system run later so they do not stall every commit.

12 Gates: fail-the-build vs warn-only

A scan’s findings only matter if the pipeline acts on them. A fail-the-build (blocking) gate stops the pipeline and prevents merge or release when a finding crosses a threshold — strong enforcement, but it can block delivery on noise. A warn-only (non-blocking) gate records and reports findings without stopping the build — gentler, but easy to ignore.

A common, pragmatic policy combines them by severity threshold: fail the build on Critical and High findings, warn on Medium and Low. New tools are often introduced in warn-only mode first, then tightened to blocking once the noise is tuned down.
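
The severity-threshold policy above can be expressed as a small decision function. The sketch below assumes findings have already been normalised to four severity labels; the `Finding` type and the `enforce` switch (modelling a tool still in warn-only mode) are illustrative names, not part of any scanner's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str  # "CRITICAL", "HIGH", "MEDIUM" or "LOW"


BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def evaluate_gate(findings: list[Finding], enforce: bool = True) -> str:
    """Return "fail", "warn" or "pass" for a set of findings.

    enforce=False models a newly introduced tool running in warn-only mode.
    """
    blocking = [f for f in findings if f.severity in BLOCKING_SEVERITIES]
    if blocking and enforce:
        return "fail"
    if findings:
        return "warn"
    return "pass"
```

A CI step would exit non-zero only when the result is "fail" and print the findings in both other cases.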

13 Severity thresholds and risk

Most scanners rank findings by severity, often using the CVSS score that maps a vulnerability onto a 0–10 scale and bands such as Low, Medium, High and Critical. Gating on severity lets you spend enforcement where the risk is greatest instead of treating every finding as equally urgent.

Severity alone is not the whole story: a High-severity flaw in an unreachable, internal-only code path may be lower risk than a Medium-severity one on an internet-facing login page. Good policies combine the scanner’s severity with context such as exploitability and exposure.
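
As an illustration, the CVSS v3.x qualitative bands map onto the 0-10 score as shown in `cvss_band` below. The `contextual_rank` function is a hypothetical policy invented for this sketch (drop one band if the code is unreachable, raise one if it is internet-facing); real organisations weigh context in their own ways.

```python
BANDS = ["None", "Low", "Medium", "High", "Critical"]


def cvss_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def contextual_rank(score: float, internet_facing: bool, reachable: bool) -> int:
    """Illustrative policy: adjust the band index using exposure and reachability."""
    rank = BANDS.index(cvss_band(score))
    if not reachable:
        rank -= 1  # vulnerable code that cannot be reached is less urgent
    if internet_facing:
        rank += 1  # exposure to the internet raises urgency
    return max(0, min(rank, len(BANDS) - 1))
```

Under this policy a 7.5 (High) in unreachable internal code ranks below a 5.0 (Medium) on an internet-facing page, matching the example in the text.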

14 Triaging findings and false positives

No scanner is perfect: results contain both true positives and false positives (flagged but not actually a vulnerability), and every scanner also has false negatives (real issues it misses, which by definition never appear in the report). Triage is the human process of reviewing each finding to decide: fix it, accept the risk, or mark it a false positive.

When you suppress a false positive, do it explicitly with a recorded justification — an inline ignore comment or a suppression entry that says why it is safe and who decided. Blanket-disabling rules or suppressing without a reason hides real bugs and erodes trust in the results.

# Suppress a false positive WITH a justification (Semgrep)
result = run_query(user_id)  # nosemgrep: sql-injection -- user_id is an int validated above

15 Baselining and security regression

Turning a scanner on for the first time against a large, legacy codebase floods you with hundreds of pre-existing findings. If the gate blocks on all of them, nobody can merge anything. A baseline is a snapshot of the currently known findings; the gate then ignores everything in the baseline and only fails the build on new findings introduced by a change.

This enforces the “don’t make it worse” principle and catches security regressions — new vulnerabilities a change adds — while the existing backlog is burned down separately over time.

# Generate a rule config once, tune it, then reuse it on every run
zap-baseline.py -t "$STAGING_URL" -g baseline.conf   # write a default rule config to baseline.conf
zap-baseline.py -t "$STAGING_URL" -c baseline.conf   # reuse it: alerts marked IGNORE there no longer fail the scan
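
Independent of any particular tool, the baseline idea reduces to a set difference over stable finding identities. The sketch below assumes each finding is a dict with `rule_id`, `path` and `snippet` keys (an assumed shape, not a scanner's real output format) and deliberately leaves the line number out of the fingerprint so that unrelated edits which shift code do not make old findings look new.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Stable identity for a finding: rule, file and offending snippet."""
    key = "|".join([finding["rule_id"], finding["path"], finding["snippet"].strip()])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_findings(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Return only the findings in `current` that are absent from `baseline`."""
    known = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in known]
```

The gate then fails only when `new_findings` is non-empty, while the baseline itself is burned down on a separate schedule.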

16 License compliance

Open-source code comes with a license that dictates how you may use, modify and redistribute it. Some licenses are permissive (MIT, Apache-2.0); others are copyleft (such as the GPL family) and can require you to release your own source under the same terms when you distribute. Pulling in an incompatible license can create real legal exposure.

SCA tools build a dependency inventory — effectively an SBOM (software bill of materials) — and check each license against an organisational allow/deny policy, failing or warning when a disallowed license appears. This is policy enforcement, distinct from finding CVEs.

17 Reporting and dashboards

Findings scattered across dozens of separate pipeline logs are hard to act on. A common practice is to emit results in a standard format — for example SARIF (Static Analysis Results Interchange Format) — and aggregate them into a central dashboard or vulnerability-management platform. This deduplicates findings across tools, tracks them over time, and shows trends and ownership.

Good reporting also supports SLAs for remediation (for example, fix Criticals within 7 days) and surfaces metrics such as mean time to remediate, so security work is visible and managed rather than lost in noise.

# Emit SARIF and upload it to the code-scanning dashboard
sast:
  steps:
    - run: semgrep ci --sarif --output results.sarif
    - uses: github/codeql-action/upload-sarif@v3
      with: { sarif_file: results.sarif }

18 Putting it together: a layered pipeline

An effective program layers the techniques rather than relying on any single one. A representative flow: pre-commit secret scanning and quick SAST → PR checks running SAST and SCA on the diff with a blocking gate on new High/Critical findings → build producing an image that is scanned before push → staging where DAST and IAST exercise the running app → nightly fuzzing and full re-scans, all feeding a central dashboard.

Each layer covers the others’ blind spots, gates are tuned by severity and baselined to catch regressions, and findings are triaged with justified suppressions. Defence in depth applies to the pipeline itself.

19 Tuning SAST rules and reducing noise

An out-of-the-box SAST configuration often fires on rules that do not fit your stack, drowning real findings in noise. Developers who see mostly false alarms learn to ignore the tool entirely — so tuning is essential. Start by disabling rule categories that do not apply (for example, mobile rules in a backend service), then refine the rest.

Useful levers include scoping rules to relevant paths, raising the confidence threshold so only high-confidence matches block the build, and excluding generated code, vendored libraries and test fixtures. The aim is a high signal-to-noise ratio: when the scanner speaks, people listen.

# .semgrep.yml: tune which rules run and where
rules:
  - id: insecure-deserialization
    paths:
      include: [ "src/" ]
      exclude: [ "**/tests/**", "**/vendor/**", "**/generated/**" ]
    severity: ERROR   # severity, not confidence; the gate blocks on ERROR-level findings

20 Taint analysis: source to sink

The engine behind much of SAST’s injection detection is taint analysis. A source is any point where untrusted data enters the program — an HTTP parameter, a request header, a file read. A sink is a sensitive operation that is dangerous when fed untrusted data — a SQL query, a shell command, an HTML render. The analyser tracks whether tainted data can flow from a source to a sink.

A sanitiser breaks the flow: validation, escaping or parameterisation that renders the data safe. If tainted data reaches a sink without passing through a sanitiser, the tool reports a vulnerability. Modelling sources, sinks and sanitisers correctly is what makes the results accurate.
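
A toy version of this source-to-sink search can be written as a breadth-first walk over a data-flow graph. The sketch below is a simplification under stated assumptions: the graph is handed in as a dict of "data flows from node to nodes", and the node names in the usage example are invented. Real SAST engines derive the graph from the parsed code and model far more detail.

```python
from collections import deque


def tainted_paths(flows: dict, sources: set, sinks: set, sanitisers: set) -> list[list[str]]:
    """Find flows from a source to a sink that never pass through a sanitiser.

    Reports one (shortest) path per reachable sink for each source.
    """
    findings = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                findings.append(path)  # tainted data reached a sink
                continue
            for nxt in flows.get(node, []):
                if nxt in sanitisers or nxt in seen:
                    continue  # a sanitiser breaks the taint flow
                seen.add(nxt)
                queue.append(path + [nxt])
    return findings
```

For example, with flows `http_param -> name -> sql_query` and `http_param -> user_id -> to_int -> sql_query_2`, marking `to_int` as a sanitiser leaves only the first path reported.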

21 Authenticated DAST scans

An unauthenticated DAST scan only sees the public surface of an application — the login page and little else. Most of the interesting attack surface lives behind authentication: account settings, admin panels, payment flows. An authenticated scan gives the scanner valid credentials or a session token so it can crawl and test the logged-in application.

The tricky parts are logging in reliably and staying logged in: the scanner must avoid clicking logout links and must detect session expiry and re-authenticate. Authenticated scans dramatically increase coverage but must run against a dedicated test environment with disposable accounts, never production.

# ZAP authenticated scan (illustrative keys, not literal ZAP syntax): form login, logout excluded
zap.conf:
  context:
    authentication: form-based
    loginUrl: "$STAGING_URL/login"
    excludeFromScan: [ ".*/logout.*" ]   # do not log the scanner out

22 API security testing from OpenAPI

Modern systems expose much of their surface as APIs rather than HTML pages, and a generic web crawler struggles to discover JSON endpoints it cannot see links to. The fix is to drive the scanner from the API’s own OpenAPI (Swagger) specification: the spec enumerates every path, method, parameter and schema, giving the scanner an exact map of what to test.

From the spec a tool can generate valid and deliberately invalid requests, probe each endpoint for injection and broken authorisation, and check that the responses match the declared schema. Spec-driven testing also catches contract drift: endpoints that behave differently from what the spec promises.

# Drive a DAST scan from the OpenAPI spec
api-scan:
  steps:
    - name: Import the OpenAPI definition and scan every endpoint
      run: zap-api-scan.py -t "$STAGING_URL/openapi.json" -f openapi
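
To show why the spec makes such a good map, the sketch below walks an OpenAPI document (already parsed into a Python dict) and lists every operation with its declared parameters. It is a simplified reading of the format: path-level parameters and `$ref` resolution are ignored, and the spec in the usage note is invented.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, list[str]]]:
    """Return (METHOD, path, parameter names) for every operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            params = [p.get("name", "<ref>") for p in operation.get("parameters", [])]
            operations.append((method.upper(), path, params))
    return operations
```

Given a spec with `GET /users/{id}` and `POST /login`, the function yields one tuple per operation, which a scanner could then expand into valid and deliberately invalid requests.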

23 GraphQL security testing

GraphQL APIs pose security challenges that REST scanners miss. A single endpoint accepts arbitrary client-shaped queries, so an attacker can request deeply nested or recursive structures that explode into enormous responses — a denial-of-service vector. Introspection, if left enabled in production, hands an attacker the full schema.

Security testing for GraphQL therefore checks for: introspection being disabled in production, query depth and complexity limits, rate limiting, and field-level authorisation (so a permitted query cannot reach a forbidden field). Tools can auto-generate abusive queries from the schema to verify these defences hold.
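
A depth limit is the easiest of these defences to illustrate. The sketch below approximates query depth by counting brace nesting while skipping string literals. That is an assumption-laden shortcut: a production check would walk the parsed query AST, and this version also counts input-object braces as depth.

```python
def query_depth(query: str) -> int:
    """Approximate the maximum selection-set nesting by counting braces."""
    depth = deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # braces inside string literals do not count
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


def enforce_depth_limit(query: str, max_depth: int = 5) -> int:
    """Reject queries nested deeper than max_depth; return the depth otherwise."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth
```

A security test can then assert that the server rejects a generated query whose depth is one more than the configured limit.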

24 SARIF and aggregating results (ASPM)

When many tools each emit findings in their own format, comparing and tracking them is painful. SARIF gives every tool a common output schema, which makes aggregation possible. An ASPM (Application Security Posture Management) platform ingests SARIF from SAST, DAST, SCA and more, then normalises, deduplicates and correlates the findings into a single view per application.

This unified posture lets a team see total risk, assign ownership, track findings against SLAs and measure trends — instead of stitching together a dozen tool dashboards. The standard format is the enabler: tools that speak SARIF plug into the platform without bespoke parsers.

# Each tool emits SARIF; the platform ingests them uniformly
for tool in sast dast sca; do
  run_scan "$tool" --format sarif --output "$tool.sarif"
done
aspm-cli upload *.sarif   # normalise, dedup and correlate centrally
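
On the ingesting side, the common schema means one small parser covers every tool. The sketch below reads a SARIF 2.1.0 document and flattens each result into a plain dict. It handles only the first location of each result and treats a missing `level` as "warning", which is a simplification of the specification's defaulting rules.

```python
import json


def load_sarif_findings(sarif_text: str) -> list[dict]:
    """Flatten the results of a SARIF document into simple dicts."""
    doc = json.loads(sarif_text)
    findings = []
    for run in doc.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "tool": tool,
                "rule_id": result.get("ruleId"),
                "level": result.get("level", "warning"),  # simplified default
                "message": result.get("message", {}).get("text", ""),
                "path": physical.get("artifactLocation", {}).get("uri"),
                "line": physical.get("region", {}).get("startLine"),
            })
    return findings
```

Because every tool's output passes through the same function, deduplication and correlation can work on one uniform record shape.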

25 Secret scanning across full history

Scanning only the latest commit misses secrets that were committed earlier and later “removed” in a follow-up commit — the secret still sits in git history and remains exploitable. Thorough secret scanning therefore walks the entire commit history, which is why CI checkouts for this purpose use full depth rather than a shallow clone.

History scanning is heavier, so teams often combine a fast incremental scan of the diff on every push with a periodic full-history sweep. Crucially, finding an old secret means it must be rotated; rewriting history to scrub it is secondary, because it may already have been cloned or leaked.

# Full-history scan, not just the latest commit
git clone --no-single-branch "$REPO_URL" myrepo   # full history, all branches (REPO_URL: your repository's clone URL)
gitleaks detect --source myrepo --log-opts="--all"   # walk every commit of the clone

26 Dependency reachability analysis

Plain SCA flags every dependency with a known CVE, but many of those vulnerable functions are never actually called by your code — the package is present but the affected code path is unreachable. This floods teams with findings they cannot meaningfully act on. Reachability analysis narrows the list by asking: does my application actually invoke the vulnerable function?

By building a call graph from your code into the dependency, the tool can mark a CVE as reachable (your code calls the affected path) or not reachable (the vulnerable code is present but never executed). Prioritising reachable findings focuses remediation on the CVEs that pose real risk.
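
The reachability question can be sketched as a graph search from the application's entry points. In the example below the call graph is supplied as a dict of caller to callees, and the advisory IDs and function names are hypothetical. Building an accurate call graph (dynamic dispatch, reflection) is the hard part that real tools invest in.

```python
from collections import deque


def reachable_functions(call_graph: dict, entry_points: list[str]) -> set[str]:
    """Return every function transitively callable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage_cves(call_graph: dict, entry_points: list[str], vulnerable: dict) -> dict:
    """Label each advisory by whether its vulnerable function is reachable.

    `vulnerable` maps an advisory ID to the affected function's name.
    """
    reachable = reachable_functions(call_graph, entry_points)
    return {
        advisory: "reachable" if fn in reachable else "not reachable"
        for advisory, fn in vulnerable.items()
    }
```

Findings labelled "reachable" go to the top of the remediation queue; the rest remain tracked but at lower priority.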

27 License-policy enforcement in CI

License compliance becomes real when it is enforced automatically rather than reviewed by hand. A license policy is typically expressed as an allow-list (licenses cleared by legal, such as MIT and Apache-2.0), a deny-list (licenses that are forbidden, such as strong copyleft for proprietary distribution) and a review bucket for anything ambiguous.

A CI step builds the dependency inventory, maps each package to its license, and fails the build when a denied license appears or warns when an unknown one needs review. Catching a forbidden license before merge is far cheaper than discovering it after it has shipped and spread through downstream releases.

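# Fail the build when any dependency's license is outside the allow-list
license-check:
  steps:
    - run: license-checker --onlyAllow "MIT;Apache-2.0;BSD-3-Clause"   # for a deny-list use --failOn "GPL-3.0" instead; the two flags cannot be combined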
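
The three-bucket policy described above (allow, deny, review) is simple enough to sketch directly. The license sets mirror the examples in this lesson; the function names and the package names in the usage note are hypothetical, and the input is assumed to be a mapping of package name to SPDX license identifier.

```python
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
DENIED = {"GPL-3.0"}


def check_licenses(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Sort packages into allowed, denied and needs-review buckets."""
    report = {"allowed": [], "denied": [], "review": []}
    for package, license_id in sorted(dependencies.items()):
        if license_id in DENIED:
            report["denied"].append(package)
        elif license_id in ALLOWED:
            report["allowed"].append(package)
        else:
            report["review"].append(package)  # unknown license: a human decides
    return report


def exit_code(report: dict[str, list[str]]) -> int:
    """Fail the build (1) on any denied license; otherwise pass (0)."""
    return 1 if report["denied"] else 0
```

For instance, `check_licenses({"left-pad-ish": "MIT", "copyleft-lib": "GPL-3.0", "odd-lib": "Custom"})` puts one package in each bucket, and `exit_code` on that report is 1.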

28 Coverage-guided fuzzing in CI

Running a fuzzer as a one-off finds shallow bugs but loses all its hard-won progress when the job ends. To make fuzzing effective in CI, you must persist the corpus — the collection of interesting inputs the fuzzer has discovered — so each run resumes where the last left off and keeps reaching deeper code. The corpus is cached between runs and seeded with known-good samples.

Because campaigns are time-bounded in CI, teams set a per-run time budget (say 15 minutes per target), run continuously in the background or nightly, and regression-test every previously found crash on each build so fixed bugs cannot silently return. New crashes are minimised to a small reproducer and filed automatically.

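# Persist the corpus so fuzzing resumes and deepens over time
fuzz:
  steps:
    - uses: actions/cache@v4
      with:
        path: corpus/
        key: fuzz-corpus-${{ env.TARGET }}-${{ github.run_id }}   # unique key, so the grown corpus is saved after each run
        restore-keys: fuzz-corpus-${{ env.TARGET }}-               # restore the most recent earlier corpus
    - run: ./fuzz_target corpus/ -max_total_time=900   # 15-minute budget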

29 Security unit and regression tests

Not all security testing needs a heavyweight scanner. Plain unit and integration tests can encode security requirements directly: a test that asserts an unauthenticated request to an admin route is rejected with 401 or 403, that a path-traversal payload is rejected, or that a known past vulnerability stays fixed. These run with your normal test suite, so they are fast, deterministic and owned by developers.

When a real vulnerability is fixed, adding a security regression test that reproduces the original exploit guards against the bug ever reappearing. Over time these tests become living documentation of the application’s security expectations, complementing the broad sweep of automated scanners.

# A security regression test: the fixed exploit must stay fixed
def test_path_traversal_blocked(client):   # client: your framework's test-client fixture
    resp = client.get("/files?name=../../etc/passwd")
    assert resp.status_code == 400   # CVE-2023-xyz must never regress

30 Diff scanning: only-new findings

Scanning the entire codebase on every pull request is slow and reports a backlog the author did not create. Diff (delta) scanning restricts attention to what the change actually touched: it compares findings on the branch against the base branch and surfaces only the net-new ones introduced by the diff.

This keeps PR feedback fast and relevant — the author sees exactly the issues their change adds — and it is the practical mechanism behind “don’t make it worse” gating. Pre-existing findings are still tracked and burned down, but on a separate cadence (a nightly full scan), not as a blocker on unrelated PRs.

# Compare branch findings to the base; report only new ones
semgrep ci --baseline-commit "$(git merge-base origin/main HEAD)"
# only findings absent from the base branch are reported

31 Break-glass overrides with audit

Occasionally a blocking gate must be bypassed under genuine pressure — a critical hotfix while a scanner is down, or a finding confirmed safe but not yet suppressed. A break-glass mechanism allows an authorised override, but deliberately makes it visible and accountable rather than a quiet back door.

A sound break-glass control requires explicit authorisation (not just any developer), records who overrode what and why in an immutable audit log, and often raises an alert or ticket so the bypass is reviewed afterwards. The friction is intentional: overrides should be rare, traceable and time-bounded, never the normal path.

# Break-glass override is logged and attributed, not silent
override-gate:
  if: "${{ inputs.break_glass == 'true' }}"
  steps:
    - run: |
        echo "OVERRIDE by $ACTOR reason=$REASON" >> audit.log
        notify-security --ticket   # raise a review ticket

32 Scan performance and caching

Security stages that take too long get skipped, disabled or routed around — so performance is a security concern. Several techniques keep scans fast enough to stay in the critical path. Caching reuses vulnerability databases, downloaded rules and dependency trees between runs instead of fetching them every time. Parallelism runs independent scanners as separate jobs at once rather than in sequence.

Pairing diff scanning on PRs (fast, only-new) with full scans nightly (thorough, off the critical path) gives both speed and depth. Pinning and pre-pulling scanner images and warming caches further trims minutes off each run. Fast scans are scans that actually keep running.

# Cache the vulnerability DB so scans do not re-download it
scan:
  steps:
    - uses: actions/cache@v4
      with:
        path: ~/.cache/trivy
        key: trivy-db-${{ env.DATE }}   # DATE must be set in the workflow env, e.g. today's date
    - run: trivy image --cache-dir ~/.cache/trivy myapp:$GIT_SHA

33 The triage workflow and SLAs

Findings without a process pile up and get ignored. A triage workflow turns raw output into managed work: each new finding is assigned an owner, given a severity-based priority, and tracked through states such as new → confirmed → in-progress → resolved (or accepted / false positive). Every state change is recorded.

Remediation SLAs attach a time budget to each severity — for example Critical in 24 hours, High in 7 days, Medium in 30 — and dashboards flag findings breaching their SLA. This makes security debt visible and measurable, with metrics such as mean time to remediate driving improvement.

34 Vulnerability correlation and dedup

Run SAST, DAST and IAST against the same application and the same SQL-injection flaw can appear three times, described three different ways. Presented as three findings it wastes triage effort and inflates the apparent risk count. Correlation and deduplication recognise that separate tool outputs point at one underlying vulnerability and merge them into a single tracked item.

Matching uses signals such as the file and line, the vulnerability class (CWE), the endpoint and the data flow. Correlation is also powerful evidence: a flaw flagged by SAST and confirmed by DAST hitting the running app is far more likely to be a true, exploitable positive — so the merged finding can be ranked higher with confidence.
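
A minimal form of this merge groups findings by vulnerability class and location, then raises confidence when more than one tool agrees. The sketch assumes findings have already been normalised so that a DAST endpoint and a SAST file location resolve to the same `location` string. That mapping is the genuinely hard part and is not shown here.

```python
from collections import defaultdict


def correlate(findings: list[dict]) -> list[dict]:
    """Merge findings that share a CWE and location into single tracked items."""
    groups = defaultdict(list)
    for finding in findings:
        groups[(finding["cwe"], finding["location"])].append(finding)

    merged = []
    for (cwe, location), members in groups.items():
        tools = sorted({m["tool"] for m in members})
        merged.append({
            "cwe": cwe,
            "location": location,
            "tools": tools,
            # agreement between independent tools is evidence of a true positive
            "confidence": "high" if len(tools) > 1 else "normal",
        })
    merged.sort(key=lambda item: len(item["tools"]), reverse=True)
    return merged
```

Three raw findings for one SQL-injection flaw collapse into a single item listing all three tools, ranked ahead of single-tool findings.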

35 IaC and container scanning integration

Infrastructure is now code too, and it carries its own risks. IaC scanning (with tools like Checkov or tfsec) statically analyses Terraform, CloudFormation and Kubernetes manifests for misconfigurations — a public S3 bucket, a security group open to the world, a container running as root — before any infrastructure is provisioned.

Combined with container scanning of the images those manifests deploy, you get coverage of the whole runtime stack at build time: the image contents and the configuration that exposes them. Both are static checks that run early and fast in CI, catching cloud and platform misconfiguration long before it reaches a live environment.

# Scan IaC and the image together before deploy
stack-scan:
  steps:
    - run: checkov -d infra/   # find IaC misconfigurations
    - run: trivy image --severity HIGH,CRITICAL myapp:$GIT_SHA

36 Pre-merge vs nightly scans

Different scans belong at different times. Pre-merge (PR) scans must be fast and focused: diff-scoped SAST and SCA that give the author feedback in minutes and block only on net-new High/Critical findings. Speed is the priority because every developer waits on them. Nightly (or scheduled) scans can be slow and exhaustive: full-codebase SAST, deep DAST crawls, long fuzzing campaigns and full-history secret sweeps.

Splitting the work this way keeps the inner loop fast without sacrificing depth. Nightly runs also re-scan unchanged code against freshly disclosed CVEs, so a dependency that was clean yesterday but vulnerable today is caught even when nobody touched it.

37 DAST vs IAST trade-offs in depth

Both DAST and IAST test the running application, but from different vantage points. DAST is fully external and language-agnostic: it needs no access to the code and works against anything that speaks HTTP, but it cannot see inside, so it reports symptoms without root cause and can only test what it reaches from outside. IAST instruments the app from within, so it sees the exact code path, the tainted data flow and the line responsible, which yields far richer detail and a very low false-positive rate.

The costs differ too: DAST is simple to point at any deployed target; IAST requires installing an agent and a supported runtime, and its coverage is bounded by the tests that drive it. Many teams run DAST broadly for black-box coverage and add IAST where deep, accurate, low-noise results justify the instrumentation.

38 Threat-modelling-driven test selection

Scanning everything uniformly spends the same effort on a static marketing page and a payment-processing service. Threat modelling — systematically reasoning about what could go wrong, often with a framework like STRIDE — identifies the assets, entry points and most likely attack paths, and that picture should drive which security tests you run and where.

If the model flags an externally exposed authentication flow as high-risk, you invest there: authenticated DAST, targeted fuzzing of the token parser, focused SAST rules. Low-risk internal components get lighter coverage. This risk-based selection concentrates finite testing effort where compromise would hurt most, instead of treating every component as equally important.
