Incident Response & Digital Forensics (Advanced)

Detect, contain and investigate breaches: the IR lifecycle, evidence handling, memory/disk forensics and lessons learned.

16 lessons 48 quiz questions

📚 Lessons & quizzes

Each lesson ends with its own short quiz. Answer them as you go — score 90% across all lessons to earn your certificate.

1 What incident response is & why a plan matters

An incident is any event that actually or potentially harms the confidentiality, integrity or availability of systems or data — a ransomware outbreak, a phishing compromise, data exfiltration or an insider abuse. Incident response (IR) is the organised set of activities used to detect, contain, investigate and recover from such events while limiting damage and cost.

A written IR plan matters because the middle of a crisis is the worst time to invent a process. A good plan defines roles, decision authority, communication paths (including legal and PR), and escalation thresholds before they are needed. It turns panic into procedure, reduces dwell time, and produces consistent, defensible evidence handling. Speed and discipline directly lower breach cost and reputational harm.

2 The IR lifecycle (NIST and SANS PICERL)

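Two widely used frameworks describe the same cycle. NIST SP 800-61 (Revision 2) uses four phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity (lessons learned). Revision 3, published in 2025, reorganises the guidance around the NIST Cybersecurity Framework 2.0 functions, but the four-phase model remains widely used. The SANS PICERL model expands this into six steps: Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned.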

The lifecycle is iterative, not strictly linear: analysis often loops back as new evidence appears, and lessons learned feed directly back into preparation. Mapping your runbooks to a recognised framework keeps responders aligned and makes audits and post-mortems easier.

NIST SP 800-61            SANS PICERL
-------------------       ----------------------
1. Preparation            1. Preparation
2. Detection & Analysis   2. Identification
3. Containment,           3. Containment
   Eradication &          4. Eradication
   Recovery               5. Recovery
4. Post-Incident          6. Lessons Learned
   Activity

3 Building the CSIRT & writing runbooks

A CSIRT (Computer Security Incident Response Team) is the group responsible for handling incidents. It combines an incident commander (who coordinates and owns decisions), technical analysts (forensics, network, endpoint), and supporting functions: legal, communications/PR, HR and executive sponsors. Clear role definitions prevent two people doing the same job while a critical task goes untouched.

Runbooks (playbooks) are step-by-step procedures for specific scenarios — phishing, ransomware, account compromise, data exfiltration. A good runbook lists triggers, required tools and access, decision points, and who to notify. Runbooks turn tribal knowledge into repeatable action and let less-experienced staff respond consistently under pressure.

4 Detection sources: alerts, logs, SIEM, EDR and users

Incidents are discovered through many channels. Security alerts come from IDS/IPS, firewalls and antivirus. Logs from servers, applications, authentication systems and cloud platforms hold the ground truth. A SIEM (Security Information and Event Management) aggregates and correlates logs from across the estate so analysts can spot patterns a single source would miss. EDR (Endpoint Detection and Response) watches process, file and network behaviour on hosts and can isolate them.

Crucially, many breaches are first reported by humans — a user who clicked a suspicious link, a partner who notices odd traffic, or an external party. No single source is enough; effective detection correlates automated signals with human reports.
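As a small illustration of turning raw logs into a signal, the sketch below counts failed SSH password attempts per source IP in OpenSSH-style auth log lines and reports sources at or above a threshold. The function name and the default threshold of 5 are illustrative assumptions; in practice a SIEM performs this kind of correlation across many sources at once.

```shell
# Count failed SSH logins per source IP and report noisy sources.
# Assumes OpenSSH-style lines such as:
#   "... sshd[123]: Failed password for root from 198.51.100.7 port 52311 ssh2"
# The default threshold (5) is a hypothetical tuning value.
failed_logins_by_ip() {
    local logfile="$1" threshold="${2:-5}"
    grep 'Failed password' "$logfile" |
        awk -v t="$threshold" '
            {
                # The source IP is the field right after the word "from"
                for (i = 1; i < NF; i++)
                    if ($i == "from") { count[$(i + 1)]++; break }
            }
            END {
                for (ip in count)
                    if (count[ip] >= t) print ip, count[ip]
            }' |
        sort -k2,2nr
}
```

For example, `failed_logins_by_ip /var/log/auth.log 10` lists every IP with ten or more failures, busiest first.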

5 Triage & severity classification

Not every alert is a major incident, and resources are finite. Triage is the rapid assessment that answers: is this real, what is affected, and how bad is it? Analysts validate the alert (rule out false positives), establish scope, and assign a severity level.

Severity is usually based on impact (data sensitivity, number of systems, business criticality, regulatory exposure) and urgency (is it spreading, is data actively leaving). A common scheme is Critical / High / Medium / Low, each tied to a defined response time and escalation path. Consistent classification ensures the right people are pulled in fast for serious events and prevents minor noise from consuming the team.
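A minimal sketch of such a scheme follows. It assumes impact and urgency are each scored 1 (low) to 3 (high) and combined by multiplication. The scales and cut-offs are hypothetical: each organisation defines its own matrix and ties each level to a response time.

```shell
# Map impact and urgency (each scored 1 = low .. 3 = high) to a severity.
# The scales, the multiplication and the cut-offs are illustrative assumptions.
severity() {
    local impact="$1" urgency="$2" score
    case "$impact:$urgency" in
        [1-3]:[1-3]) ;;
        *) echo "impact and urgency must be 1-3" >&2; return 1 ;;
    esac
    score=$((impact * urgency))
    if [ "$score" -ge 9 ]; then
        echo "Critical"
    elif [ "$score" -ge 6 ]; then
        echo "High"
    elif [ "$score" -ge 3 ]; then
        echo "Medium"
    else
        echo "Low"
    fi
}
```

Under this sketch, a spreading ransomware infection on a business-critical server (impact 3, urgency 3) comes out as Critical. A single adware pop-up on a kiosk (impact 1, urgency 1) comes out as Low.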

6 The order of volatility & evidence preservation

When collecting evidence you must capture the most fragile data first, because some evidence vanishes the moment a machine is touched or powered off. This priority is the order of volatility (RFC 3227). From most to least volatile: CPU registers and cache; the routing table, ARP cache, process list and memory (RAM); temporary file systems; disk; remote logging/monitoring data; and finally physical configuration and archival media.

RAM is highly volatile. It is lost on power-down, and it holds running processes, network connections, decryption keys and injected malware that may never touch disk. This is why responders capture memory before pulling the plug. Always preserve before you analyse, and record exactly what was collected and when.
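The script below sketches a live-response collection on a Linux host. Following the RFC 3227 ordering, it captures routing, ARP, connection and process state before memory, and it logs a UTC timestamp and SHA-256 hash for every artefact. It reads the kernel's /proc interfaces directly, so it needs no extra tools. The function names, the output layout and the AVML step (left commented out) are assumptions to adapt. Ideally the output directory sits on external, trusted media rather than the suspect disk.

```shell
# Run one collection command, save its output to OUTDIR/NAME and log a
# UTC timestamp plus SHA-256 of the artefact in OUTDIR/manifest.txt.
collect() {
    local outdir="$1" name="$2"
    shift 2
    "$@" > "$outdir/$name" 2>&1 || echo "command failed: $*" >> "$outdir/$name"
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" \
        "$(sha256sum "$outdir/$name" | cut -d' ' -f1)" >> "$outdir/manifest.txt"
}

# Live-response collection in order of volatility (Linux, reads /proc).
live_response() {
    local outdir="$1"
    mkdir -p "$outdir" || return 1
    # Most volatile first: routing table, ARP cache, connections, processes
    collect "$outdir" routes.txt cat /proc/net/route
    collect "$outdir" arp.txt cat /proc/net/arp
    collect "$outdir" tcp.txt cat /proc/net/tcp   # IPv4 TCP; tcp6 holds IPv6
    collect "$outdir" processes.txt sh -c \
        'for p in /proc/[0-9]*; do echo "${p#/proc/} $(cat "$p/comm" 2>/dev/null)"; done'
    # Then full memory, e.g. with AVML (uncomment on a real system):
    # avml "$outdir/memory.lime"
}
```

Running `live_response /mnt/usb/case-001` leaves one file per artefact plus a manifest whose line order records the collection order.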

7 Chain of custody & legal admissibility

Chain of custody is the documented, unbroken record of who handled a piece of evidence, when, why and how — from collection through analysis to storage or court. Each transfer is logged with timestamps, handler identities and the evidence state.

It matters because in legal or disciplinary proceedings, evidence is only useful if it is admissible. If the defence can show the evidence could have been altered, lost or mishandled at any point, it may be thrown out. A solid chain of custody — combined with hashing to prove integrity, tamper-evident storage and minimal handling — demonstrates the evidence presented is the same as what was collected and was not tampered with.
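A digital chain-of-custody record can be kept as an append-only log. The sketch below assumes a simple tab-separated file; its column layout and function names are hypothetical. It appends one line per handling event (UTC time, handler, action, evidence path, SHA-256), and any later handler can confirm that the item still hashes to the value recorded at collection.

```shell
# Append-only chain-of-custody log, one tab-separated line per event:
#   utc_time  handler  action  evidence_path  sha256
# The column layout is an illustrative assumption, not a standard format.
custody_log() {
    local log="$1" handler="$2" action="$3" item="$4" digest
    [ -f "$item" ] || { echo "no such evidence file: $item" >&2; return 1; }
    digest=$(sha256sum "$item" | cut -d' ' -f1)
    printf '%s\t%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$handler" "$action" "$item" "$digest" >> "$log"
}

# Check that the item still matches the first hash recorded for it.
custody_verify() {
    local log="$1" item="$2" recorded current
    recorded=$(awk -F'\t' -v p="$item" '$4 == p { print $5; exit }' "$log")
    current=$(sha256sum "$item" | cut -d' ' -f1)
    if [ -n "$recorded" ] && [ "$recorded" = "$current" ]; then
        echo "INTACT"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

A typical sequence is `custody_log custody.tsv alice collected disk.img` at acquisition, then `custody_log custody.tsv bob received disk.img` and `custody_verify custody.tsv disk.img` at each transfer.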

8 Forensic imaging: bit-for-bit copies, write blockers & hashing

Forensics is performed on a forensic image, not the original. An image is a bit-for-bit copy that captures every sector — including deleted files, slack space and unallocated areas — not just the visible files. Tools like dd, dcfldd or FTK Imager create these images.

To prevent any accidental change to the source while copying, investigators use a write blocker (hardware or software) that allows reads but blocks all writes. To prove the copy is faithful and unaltered, you compute a cryptographic hash (e.g. SHA-256) of the source and of the image: if the hashes match, the image is verified identical; re-hashing later proves it has not changed since.

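# Create a bit-for-bit image (source attached through a write blocker)
# Avoid conv=noerror,sync with a large bs: sync zero-pads every short or failed
# read up to bs (4M), so the image can differ from, and hash unlike, the source
sudo dd if=/dev/sdb of=/evidence/disk.img bs=4M status=progress
# For media with read errors, use a recovery-aware imager such as GNU ddrescue

# Hash the SOURCE device and the IMAGE; the values must match
sudo sha256sum /dev/sdb
sha256sum /evidence/disk.img

# Re-hash later to prove integrity is unchanged
sha256sum /evidence/disk.img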
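The hash comparison in the commands above can be scripted so that a mismatch is reported explicitly rather than compared by eye. In this minimal sketch (the function name is hypothetical), a source and its image are hashed, both values are printed, and a verdict follows. Reading a raw device still needs root and, for real evidence, a write blocker.

```shell
# Hash a source (device or file) and its image, then report MATCH or MISMATCH.
# Exit status: 0 = match, 1 = mismatch, 2 = an input could not be read.
verify_image() {
    local src="$1" img="$2" src_hash img_hash
    if [ ! -r "$src" ] || [ ! -r "$img" ]; then
        echo "cannot read $src or $img" >&2
        return 2
    fi
    src_hash=$(sha256sum < "$src" | cut -d' ' -f1)
    img_hash=$(sha256sum < "$img" | cut -d' ' -f1)
    echo "source: $src_hash"
    echo "image:  $img_hash"
    if [ "$src_hash" = "$img_hash" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Run as root, `verify_image /dev/sdb /evidence/disk.img` prints both digests, which can be copied straight into the case notes.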

9 Memory forensics: what lives in RAM

Memory forensics analyses a capture of a system’s RAM. RAM is a goldmine because it contains data that may never reach disk: running and hidden processes, open network connections, loaded modules and DLLs, command-line arguments, decryption keys and passwords, clipboard contents, and fileless or in-memory-only malware.

You first acquire memory with a tool such as WinPmem, LiME or AVML, then analyse the image with frameworks like Volatility. Because malware increasingly runs without writing files to disk, memory analysis is often the only way to see what a host was actually doing at the moment of compromise.

# Acquire memory (Linux example with AVML; output is in LiME format)
sudo avml /evidence/memory.lime
sha256sum /evidence/memory.lime

# Analyse with Volatility 3, using Linux plugins to match the Linux capture
# (Volatility needs a symbol table for the exact kernel version)
vol -f /evidence/memory.lime linux.pslist     # running processes
vol -f /evidence/memory.lime linux.sockstat   # network connections/sockets
vol -f /evidence/memory.lime linux.malfind    # injected/hidden code

10 Disk forensics: file systems, deleted files, slack & timelines

Disk forensics examines the imaged storage. When a file is "deleted", the file system (NTFS, ext4, APFS) usually just marks its space as available and removes or flags the directory entry. The actual data often remains until it is overwritten, so deleted files can frequently be recovered. Recovery works either from surviving metadata or, when that is gone, by file carving (scanning unallocated space for known file headers and footers). Slack space is the leftover area between the end of a file's data and the end of its allocated cluster; it can hold fragments of previously deleted data.
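The amount of slack follows directly from the file size and the cluster size. A file occupies a whole number of clusters, and whatever is left over in the last one is slack. The minimal sketch below uses shell arithmetic, assuming a 4096-byte cluster for the example; that is a common NTFS and ext4 default, though volumes can be formatted differently.

```shell
# Bytes of file slack in the last allocated cluster.
# clusters = ceil(size / cluster); slack = clusters * cluster - size
slack_bytes() {
    local size="$1" cluster="$2"
    local clusters=$(( (size + cluster - 1) / cluster ))
    echo $(( clusters * cluster - size ))
}
```

So a 10,000-byte file on a volume with 4,096-byte clusters occupies three clusters (12,288 bytes) and leaves 2,288 bytes of slack that may still hold older data. Very small NTFS files can be stored resident inside their MFT record and then occupy no cluster at all.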

File system metadata records MAC times: Modified, Accessed and Changed/Created timestamps. On NTFS the C is the creation time, while on ext4 and other Unix file systems ctime records the last metadata change. Building a timeline from these (with tools like Autopsy/Sleuth Kit, fls and mactime, or Plaso/log2timeline) reconstructs the sequence of attacker actions: when files appeared, ran or were touched.

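# Find partition start sectors; fls needs -o <start> for a full-disk image
mmls /evidence/disk.img

# List files (including deleted) and build a timeline with Sleuth Kit
# (2048 is only an example offset; use the start sector mmls reports)
fls -r -o 2048 -m / /evidence/disk.img > bodyfile.txt
mactime -b bodyfile.txt -d > timeline.csv

# Recover deleted files by carving (interactive; /d sets the output directory)
photorec /d /evidence/recovered/ /evidence/disk.img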

11 Log & network forensics: pcap and flow data

Network evidence shows how an attacker entered, moved and exfiltrated data. Full packet capture (pcap) records the complete contents of traffic and is the most detailed source — analysed with tcpdump, Wireshark or Zeek — but it is large and not always available. Flow data (NetFlow/IPFIX) is a lighter summary: who talked to whom, when, on which ports and how many bytes, without payload. Flow is excellent for spotting beaconing, lateral movement and large outbound transfers.

Logs (firewall, DNS, proxy, authentication, application) tie network activity to identities and actions. Correlating logs, flow and pcap across time — and ensuring clocks are synchronised — is how responders reconstruct the full attack path.

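# Read a capture and filter for suspicious outbound traffic
# (4444 is the default Metasploit listener port); options go before the filter
tcpdump -nn -r capture.pcap 'dst port 4444'

# Summarise conversations with Zeek (writes conn.log etc. to the current directory)
zeek -r capture.pcap
zeek-cut id.orig_h id.resp_h id.resp_p orig_bytes resp_bytes < conn.log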
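Beaconing shows up in flow or connection data as a host contacting the same destination at near-constant intervals. The sketch below assumes a hypothetical whitespace-separated input with one connection per line (`epoch_seconds src dst`). It groups connections by source/destination pair, computes the mean and standard deviation of the gaps for each pair, and flags pairs whose gaps barely vary. The function name, minimum count and jitter threshold are illustrative assumptions. Real implants often add random jitter, so this catches only the simplest cases.

```shell
# Flag possible beaconing: repeated connections from one source to one
# destination at near-constant intervals.
# Input: one line per connection, "<epoch_seconds> <src_ip> <dst_ip>".
# Defaults (at least 5 connections, gap std-dev <= 2 s) are assumptions.
find_beacons() {
    local file="$1" min_count="${2:-5}" max_sd="${3:-2}"
    sort -k2,2 -k3,3 -k1,1n "$file" |
        awk -v minc="$min_count" -v maxsd="$max_sd" '
            # Print the finished pair if its gaps are regular enough
            function report(    mean, sd) {
                if (n > 0 && n + 1 >= minc) {
                    mean = sum / n
                    sd = sumsq / n - mean * mean
                    sd = (sd > 0) ? sqrt(sd) : 0
                    if (sd <= maxsd)
                        printf "%s -> %s conns=%d mean_gap=%.1fs sd=%.1fs\n",
                            src, dst, n + 1, mean, sd
                }
            }
            $2 != src || $3 != dst {
                # New source/destination pair: flush the previous one
                report()
                src = $2; dst = $3; last = $1
                n = 0; sum = 0; sumsq = 0
                next
            }
            {
                gap = $1 - last; last = $1
                n++; sum += gap; sumsq += gap * gap
            }
            END { report() }'
}
```

With Zeek output, the input can be produced with `zeek-cut ts id.orig_h id.resp_h < conn.log > conns.txt` and then checked with `find_beacons conns.txt`.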

12 Malware triage & sandboxing

When a suspicious file is found, malware triage answers what it is and what it does — without infecting production. Static analysis examines the file without running it: hashes, strings, file type, embedded URLs, imports and packing. Dynamic analysis runs the sample in a controlled, isolated environment — a sandbox — and observes its behaviour: files created, registry changes, processes spawned and network callbacks.

A sandbox must be isolated (snapshot-based VM or dedicated network) so the malware cannot reach real systems or the internet inadvertently. Triage produces actionable findings — IOCs, capabilities and a severity judgement — quickly, so deep reverse-engineering can be reserved for the cases that warrant it.
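A first static pass can be scripted with standard tools, and it never executes the sample. The sketch records four things: the SHA-256 (for IOC lookups), the size, the leading magic bytes and any embedded http/https URLs. The function name and the URL pattern are illustrative assumptions. Dedicated tools go much further, covering imports, packing detection and YARA rules.

```shell
# Static triage of a suspicious file without running it.
static_triage() {
    local sample="$1"
    [ -f "$sample" ] || { echo "no such file: $sample" >&2; return 1; }
    echo "sha256: $(sha256sum "$sample" | cut -d' ' -f1)"
    echo "size:   $(wc -c < "$sample" | tr -d ' ') bytes"
    # First 4 bytes in hex: 4d 5a ("MZ") marks a Windows PE executable,
    # 7f 45 4c 46 (0x7f "ELF") a Linux ELF binary
    echo "magic:  $(od -An -tx1 -N4 "$sample" | tr -s ' ' | sed 's/^ //')"
    # Embedded URLs (-a treats binary data as text)
    grep -aoE 'https?://[A-Za-z0-9./?=_%:-]+' "$sample" | sort -u | sed 's/^/url:    /'
}
```

The hash and URLs it prints are candidate IOCs for the next lesson. The magic bytes show whether the file extension is lying about the file type.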

13 Indicators of compromise (IOCs) & threat intelligence

Indicators of compromise (IOCs) are observable artefacts that suggest a system was attacked: malicious file hashes, IP addresses and domains, URLs, registry keys, mutexes, filenames and email subjects. During an incident, IOCs let you hunt for the same attacker across the rest of the estate (scoping) and feed detection rules to catch reinfection.

IOCs are most powerful when shared as threat intelligence. STIX (a format for describing threat intelligence) and TAXII (a protocol for exchanging it), together with frameworks like MITRE ATT&CK (which maps attacker tactics and techniques), let teams exchange and contextualise indicators. Note that simple IOCs (an IP, a hash) are easy for attackers to change, while behaviour-based indicators (TTPs) are more durable; this is the idea behind the "pyramid of pain."
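During scoping, known-bad file hashes can be swept across a directory tree. The minimal sketch below assumes a plain-text IOC file with one SHA-256 per line (case-insensitive), where lines starting with # are comments; the function name is hypothetical. At scale this job belongs to EDR or a dedicated scanner. As the pyramid of pain suggests, a hash miss proves little, because the attacker may simply have recompiled the file.

```shell
# Report files under DIR whose SHA-256 appears in an IOC list.
# ioc_file: one hex SHA-256 per line (any case); '#' lines are comments.
hunt_hashes() {
    local ioc_file="$1" dir="$2"
    find "$dir" -type f -exec sha256sum {} + 2>/dev/null |
        awk 'NR == FNR {
                 # First input: load the IOC hashes into a lookup table
                 if ($1 !~ /^#/ && $1 != "") bad[tolower($1)] = 1
                 next
             }
             ($1 in bad) {
                 # sha256sum prints "<64 hex chars>  <path>"; path starts at column 67
                 print "MATCH", substr($0, 67)
             }' "$ioc_file" -
}
```

For example, `hunt_hashes iocs.txt /home` prints one `MATCH <path>` line per file whose hash is on the list.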

14 Containment strategies: short-term vs long-term

Containment stops the incident from spreading or causing more damage while you investigate. Short-term containment is immediate and reversible: isolate an infected host from the network, block a malicious IP, disable a compromised account, or pull a cable. The goal is to halt the bleeding fast.

Long-term containment involves more durable measures applied while you prepare for full eradication — for example rebuilding a clean system to take over, applying temporary firewall rules, or patching while keeping systems running. A key tension is isolate vs observe: isolating fast limits damage but may tip off the attacker and destroy volatile evidence; sometimes you monitor briefly to understand scope first. Crucially, preserve evidence (image memory/disk) before wiping anything.
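Because short-term containment should be quick and reversible, it helps to prepare both the action and its undo before running anything. The sketch below only prints the commands, as a dry run. It covers blocking a malicious IP with iptables and locking a Linux account with usermod. Those tools and flags are real, but the function names are hypothetical, and the right controls depend on the environment (EDR host isolation, cloud security groups, directory-level account disablement).

```shell
# Print (do not run) containment commands together with their rollback.
# Review the output, then execute deliberately and record it in the case notes.
contain_ip() {
    local ip="$1"
    echo "# contain: block all traffic to and from $ip"
    echo "iptables -I INPUT -s $ip -j DROP"
    echo "iptables -I OUTPUT -d $ip -j DROP"
    echo "# rollback:"
    echo "iptables -D INPUT -s $ip -j DROP"
    echo "iptables -D OUTPUT -d $ip -j DROP"
}

contain_account() {
    local user="$1"
    # -L locks the password; -e 1 expires the account so key-based logins fail too.
    # Existing sessions keep running and must be terminated separately.
    echo "# contain: lock and expire account $user"
    echo "usermod -L -e 1 $user"
    echo "# rollback:"
    echo "usermod -U -e '' $user"
}
```

Keeping the rollback next to the action makes the "reversible" property explicit and gives the incident commander a clear record of what was changed.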

15 Eradication & recovery: rebuild vs clean, then validate

Eradication removes the attacker and their artefacts: deleting malware, closing the exploited vulnerability, resetting compromised credentials, and removing persistence mechanisms (scheduled tasks, services, backdoors). For deeply compromised hosts, a clean rebuild from known-good media is far safer than trying to "clean" in place, because attackers hide persistence in ways that are easy to miss — when in doubt, rebuild.

Recovery restores systems to normal, validated operation: restore data from trusted backups, return systems to production in a controlled way, and monitor closely for signs the attacker returns. Before declaring done, validate: confirm the vulnerability is fixed, credentials are rotated, and no IOCs remain. Restore only from a backup that predates the earliest known malicious activity. A backup taken after the initial compromise may already contain the attacker's malware or persistence and simply reinfects you.
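Once the investigation has established when malicious activity first appeared, picking the restore point is a simple comparison. The sketch below returns the newest backup strictly older than that time. It assumes backup names begin with a UTC timestamp in `YYYY-MM-DDTHH:MM:SSZ` form, so that plain string comparison orders them correctly; this naming scheme and the function name are hypothetical. Because the earliest compromise time is itself an estimate, restored systems still need patching, credential rotation and monitoring.

```shell
# Newest backup taken strictly before the earliest known compromise time.
# Backup list: one name per line, each beginning with a UTC timestamp in
# YYYY-MM-DDTHH:MM:SSZ form (20 characters), so string order is time order.
last_clean_backup() {
    local backups="$1" compromised_at="$2" pick
    pick=$(awk -v limit="$compromised_at" 'substr($0, 1, 20) < limit' "$backups" |
        sort | tail -n 1)
    if [ -z "$pick" ]; then
        echo "no backup predates $compromised_at" >&2
        return 1
    fi
    echo "$pick"
}
```

If the earliest malicious activity was seen at `2024-03-10T09:30:00Z`, a weekly backup from `2024-03-08` is selected and the `2024-03-15` one is skipped.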

16 Post-incident report & metrics (MTTD/MTTR)

The incident is not over until you learn from it. A post-incident review (lessons learned / blameless post-mortem) gathers the team to reconstruct the timeline, what worked, what failed and what to improve. It produces a report: an executive summary, the timeline, root cause, impact, actions taken, and concrete remediation items with owners and deadlines. A blameless culture encourages honesty so the real causes surface.

Two key metrics quantify performance. MTTD (Mean Time To Detect) measures how long from compromise to discovery; MTTR (Mean Time To Respond/Recover) measures how long from detection to containment or full recovery. Lower values mean less dwell time and damage. Tracking them over incidents shows whether your IR program is improving, and the lessons feed straight back into Preparation — closing the lifecycle loop.
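Both metrics are simple averages over past incidents. The sketch below reads a CSV with one incident per line (`id,compromised,detected,recovered`, with UTC times as `YYYY-MM-DD HH:MM:SS`); the file layout and function names are hypothetical. It uses GNU `date -d` to convert timestamps to epoch seconds. MTTR is measured here from detection to full recovery, one of the two variants described above.

```shell
# Convert a UTC "YYYY-MM-DD HH:MM:SS" timestamp to epoch seconds (GNU date).
to_epoch() { date -u -d "$1" +%s; }

# MTTD (compromise -> detection) and MTTR (detection -> recovery) in hours
# from a CSV: id,compromised,detected,recovered (optional header row).
ir_metrics() {
    local id comp det rec n=0 ttd=0 ttr=0
    while IFS=, read -r id comp det rec; do
        [ "$id" = "id" ] && continue  # skip header row
        ttd=$((ttd + $(to_epoch "$det") - $(to_epoch "$comp")))
        ttr=$((ttr + $(to_epoch "$rec") - $(to_epoch "$det")))
        n=$((n + 1))
    done < "$1"
    [ "$n" -gt 0 ] || return 1
    awk -v d="$ttd" -v r="$ttr" -v n="$n" 'BEGIN {
        printf "incidents=%d MTTD_hours=%.1f MTTR_hours=%.1f\n", n, d / n / 3600, r / n / 3600
    }'
}
```

For two incidents detected after 48 and 24 hours and recovered 12 and 6 hours after detection, this prints `incidents=2 MTTD_hours=36.0 MTTR_hours=9.0`.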
