
Security Operations (SOC & SIEM) Intermediate

Inside the SOC: monitoring, logging, SIEM correlation, detection engineering, alert triage and threat hunting.

16 lessons · 48 quiz questions

📚 Lessons & quizzes

Each lesson ends with its own short quiz. Answer them as you go — score 90% across all lessons to earn your certificate.

1 What a SOC actually does

A Security Operations Centre (SOC) is the team and facility responsible for continuously monitoring an organisation’s systems, detecting malicious or suspicious activity, and coordinating the response when something goes wrong. It is the defensive nerve centre: people, processes and technology working together so that an attack is spotted and contained before it becomes a breach.

A SOC is not a single product you buy. It blends human analysts, documented procedures (playbooks, escalation paths) and tooling (log collection, a SIEM, endpoint agents). Many SOCs run 24×7 because attackers do not keep office hours. Some organisations run their own in-house SOC; others outsource to a Managed Security Service Provider (MSSP) or run a hybrid model.

The SOC’s job is fundamentally about visibility and speed: see what is happening across the estate, and shorten the time between an attacker acting and a defender reacting.

2 SOC tiers and roles

A traditional SOC is organised into tiers that reflect escalating depth of investigation:

Tier 1 — Triage analyst. The front line. Monitors alert queues, performs initial triage, weeds out obvious false positives, and escalates real or unclear cases. High volume, fast decisions.
Tier 2 — Incident responder / investigator. Takes escalated alerts, digs into context, correlates across data sources, scopes the activity and drives containment and remediation.
Tier 3 — Threat hunter / subject-matter expert. The most experienced analysts. Proactively hunt for threats that evaded detection, perform deep forensics and malware analysis, and build new detections.

Supporting roles include the SOC manager (runs the team and reporting), the detection engineer (writes and tunes detection logic), and the security architect. Tiering is a model, not a law: many modern SOCs flatten these roles, but the escalation idea — simple cases handled fast, hard cases handed to deeper expertise — remains.

3 The monitor → detect → respond loop

SOC work runs as a continuous loop with three core phases:

Monitor. Continuously collect telemetry from across the estate — logs, network flows, endpoint events — and keep eyes (human and automated) on it.
Detect. Apply rules, analytics and human judgement to that telemetry to surface activity worth investigating, turning raw data into alerts.
Respond. Triage, investigate, contain, eradicate and recover — then feed lessons learned back into monitoring and detection.

The loop never stops, and crucially it feeds itself: every incident teaches you what to log better and what to detect next time. This is why the widely used incident-response lifecycle (Prepare, Detect & Analyse, Contain, Eradicate, Recover, Lessons Learned) loops from lessons learned back to preparation. A SOC that only reacts and never improves slowly falls behind its adversaries.

4 Logging fundamentals: what and where to log

Detection is impossible without data, and the data is mostly logs. A log is a timestamped record of an event: a login, a process start, a firewall deny, a DNS query. Good logging answers who did what, where, when, and from where.
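As a rough illustration of the "who did what, where, when, and from where" test, the sketch below builds one structured log record and serialises it as a JSON line. The field names (actor, action, target, source_ip, outcome) and the helper make_record are assumptions chosen for this example, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class LogRecord:
    timestamp: str   # when (ISO 8601, UTC)
    actor: str       # who
    action: str      # did what
    target: str      # where (host or resource acted upon)
    source_ip: str   # from where
    outcome: str     # "success" or "failure"


def make_record(actor, action, target, source_ip, outcome, when=None):
    """Build a record; `when` defaults to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    return LogRecord(when.isoformat(), actor, action, target, source_ip, outcome)


record = make_record(
    "alice", "login", "vpn-gateway", "203.0.113.7", "failure",
    when=datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc),
)
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

A record missing any of these fields tends to be much harder to use in an investigation, which is one practical way to review whether a source is worth collecting.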

Useful log sources include: authentication systems (logons, failures, MFA events), endpoints (process creation, file changes), servers and applications, network devices (firewall, proxy, VPN, DNS), and cloud and identity providers. You rarely want to log everything — too much noise and cost — so you prioritise security-relevant events.

Two recurring pitfalls: gaps (a source you forgot to collect, leaving a blind spot) and integrity (logs an attacker can quietly delete). That is why logs are shipped off the originating host to central, tamper-resistant storage as soon as they are created — an attacker who owns a machine should not be able to erase the evidence of how they got in.

5 The SIEM: aggregate, normalise, correlate

A SIEM (Security Information and Event Management) is the platform that turns scattered logs into actionable security signal. It performs four jobs:

Aggregation. Collect logs from many disparate sources into one place.
Normalisation (parsing). Reshape each vendor’s odd format into a common schema, so a "username" field means the same thing whether it came from a firewall or a domain controller.
Correlation. Connect related events across sources and time — the SIEM’s defining power. A single failed login is nothing; a failed login at the firewall followed by a successful one and then mass file access tells a story.
Presentation. Dashboards, searches and alerts that let analysts see and query the whole picture.

Because the SIEM sees data from all sources together, it can spot patterns no single device could. That cross-source correlation is what distinguishes a SIEM from just a big pile of log files.
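A minimal sketch of normalisation followed by a very simple correlation step is shown below. Two differently shaped raw events are mapped onto one common schema, then pulled into a single per-user timeline. Every raw field name here (ts, usr, src, result, TimeCreated, TargetUserName, IpAddress) is a hypothetical stand-in for whatever a real parser would handle, and the functions are illustrative rather than any product's API.

```python
# All raw field names below are hypothetical; real products differ.
DC_EVENT_ACTIONS = {4624: "login_success", 4625: "login_failed"}


def normalise_firewall(raw):
    """Map a firewall-style record onto the common schema."""
    return {
        "timestamp": raw["ts"],
        "username": raw["usr"].lower(),
        "source_ip": raw["src"],
        "action": "login_failed" if raw["result"] == "DENY" else "login_success",
        "source": "firewall",
    }


def normalise_domain_controller(raw):
    """Map a domain-controller-style record onto the common schema."""
    return {
        "timestamp": raw["TimeCreated"],
        "username": raw["TargetUserName"].lower(),
        "source_ip": raw["IpAddress"],
        "action": DC_EVENT_ACTIONS.get(raw["EventID"], "other"),
        "source": "domain_controller",
    }


def timeline_for(events, username):
    """Correlate: every normalised event for one user, in time order."""
    mine = [e for e in events if e["username"] == username]
    return sorted(mine, key=lambda e: e["timestamp"])


events = [
    normalise_domain_controller({
        "TimeCreated": "2024-01-01T09:01:00Z", "TargetUserName": "Alice",
        "IpAddress": "203.0.113.7", "EventID": 4624,
    }),
    normalise_firewall({
        "ts": "2024-01-01T09:00:00Z", "usr": "alice",
        "src": "203.0.113.7", "result": "DENY",
    }),
]
timeline = timeline_for(events, "alice")
for e in timeline:
    print(e["timestamp"], e["source"], e["action"])
```

Because both sources now share the same "username" and "timestamp" fields, the failed firewall login and the later successful domain logon line up as one story for the same account.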

6 Writing detection rules and correlation logic

A detection rule encodes the question "does this activity look malicious?" as logic the SIEM evaluates against incoming data. The simplest are signature or threshold rules — for example, "more than 10 failed logins for one account within 5 minutes". More powerful are correlation rules that join multiple conditions: "many failed logins then a success then access to sensitive data".

Detection engineering balances two opposing errors. A rule that is too broad produces false positives (benign activity flagged as bad), drowning analysts. A rule that is too narrow produces false negatives (real attacks missed). Good detections are specific, well-documented (what it catches, why, how to respond), and continuously tuned against real-world results.

Detection-as-code (storing rules in version control, testing them, peer-reviewing them) brings software-engineering discipline to this work and is now common practice. The pseudo-logic below illustrates, conceptually, a threshold condition chained to a follow-on success.

# Conceptual detection-rule pseudo-logic (not real product syntax)
# Brute-force candidate: many failures then a success for one user from one source

RULE "possible_brute_force_success":
  WHEN event.type == "login_failed"
  GROUP BY user, source_ip
  COUNT >= 10 WITHIN 5m
  FOLLOWED_BY event.type == "login_success" FOR SAME user, source_ip WITHIN 2m
  THEN raise_alert(severity = "high", user = $user, ip = $source_ip)
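The pseudo-rule above can be approximated in ordinary Python, purely as a sketch. It assumes events arrive in time order as dictionaries with hypothetical keys time, type, user and source_ip. A real SIEM would evaluate this in its own rule language over a stream, not in a loop like this.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 10
FAILURE_WINDOW = timedelta(minutes=5)
SUCCESS_WINDOW = timedelta(minutes=2)


def detect_brute_force_success(events):
    """Return alerts for >= 10 failures in 5 min followed by a success within 2 min."""
    failures = defaultdict(deque)  # (user, ip) -> failure times inside the window
    armed_at = {}                  # (user, ip) -> time the threshold was last met
    alerts = []
    for ev in events:
        key = (ev["user"], ev["source_ip"])
        now = ev["time"]
        if ev["type"] == "login_failed":
            window = failures[key]
            window.append(now)
            # Drop failures that have aged out of the 5-minute window.
            while window and now - window[0] > FAILURE_WINDOW:
                window.popleft()
            if len(window) >= FAILURE_THRESHOLD:
                armed_at[key] = now
        elif ev["type"] == "login_success":
            armed = armed_at.pop(key, None)
            if armed is not None and now - armed <= SUCCESS_WINDOW:
                alerts.append({"severity": "high", "user": key[0], "ip": key[1]})
            failures.pop(key, None)  # a success resets the failure count
    return alerts


base = datetime(2024, 1, 1, 12, 0, 0)
attack = [
    {"time": base + timedelta(seconds=10 * i), "type": "login_failed",
     "user": "alice", "source_ip": "198.51.100.9"}
    for i in range(10)
]
attack.append({"time": base + timedelta(seconds=120), "type": "login_success",
               "user": "alice", "source_ip": "198.51.100.9"})
alerts = detect_brute_force_success(attack)
print(alerts)
```

Grouping by both user and source address keeps the rule specific, at the cost of missing attempts spread across many sources. That trade-off is exactly the false-positive versus false-negative balance described above.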

7 Events, alerts and incidents

These three words are often muddled, but the distinction is the backbone of SOC workflow:

An event is any observable occurrence: a login, a packet, a file write. The vast majority are completely benign. There are billions of them.
An alert is an event (or pattern of events) that a detection rule flagged as worth a human look. Far fewer than events — but still many.
An incident is a confirmed (or strongly suspected) security event that has impact and requires a coordinated response. Few in number, high in importance.

The pipeline narrows like a funnel: events → alerts → incidents. Triage is the act of moving an alert either up to "incident" or down to "false positive / benign". Confusing an alert with an incident leads to panic; confusing an incident with a mere alert leads to disaster. Clear definitions keep severity and escalation consistent.

8 Alert triage and beating alert fatigue

Triage is the analyst’s core skill: rapidly deciding whether an alert is a genuine threat, a benign activity, or needs escalation. A good triage asks: What fired this? What is the context (asset value, user, time)? Is there corroborating evidence? What is the likely impact?

The enemy of triage is alert fatigue. When a SOC generates thousands of low-quality alerts — most of them false positives — analysts become numb, slow down, and may dismiss the one alert that mattered. Fatigue is a real security risk, not just a morale problem.

Defences against fatigue: tune noisy rules and suppress known-benign patterns; enrich alerts with context automatically so analysts decide faster; prioritise by risk so the worst alerts surface first; and aggregate related alerts into a single case instead of a hundred separate ones. The goal is fewer, higher-fidelity alerts — quality over quantity.
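Two of these defences, aggregating related alerts into one case and prioritising by risk, can be sketched as follows. The severity scores, asset weights and alert fields are invented for illustration. A real SOC would derive them from its own asset inventory and risk model.

```python
from collections import defaultdict

# Hypothetical scoring tables; tune these to the environment.
SEVERITY_SCORE = {"low": 1, "medium": 3, "high": 7}
ASSET_WEIGHT = {"workstation": 1, "server": 2, "domain_controller": 4}


def build_cases(alerts):
    """Group alerts by (host, user) and rank the resulting cases by risk."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[(alert["host"], alert["user"])].append(alert)
    cases = []
    for (host, user), members in grouped.items():
        risk = max(
            SEVERITY_SCORE[a["severity"]] * ASSET_WEIGHT[a["asset_type"]]
            for a in members
        )
        cases.append({"host": host, "user": user,
                      "alert_count": len(members), "risk": risk})
    return sorted(cases, key=lambda c: c["risk"], reverse=True)


alerts = [
    {"host": "ws-17", "user": "bob", "severity": "low", "asset_type": "workstation"},
    {"host": "ws-17", "user": "bob", "severity": "low", "asset_type": "workstation"},
    {"host": "ws-17", "user": "bob", "severity": "low", "asset_type": "workstation"},
    {"host": "dc-01", "user": "svc-backup", "severity": "medium",
     "asset_type": "domain_controller"},
]
cases = build_cases(alerts)
for case in cases:
    print(case)
```

Four alerts become two cases, and the single alert on the domain controller outranks the three noisy workstation alerts, so the analyst sees it first.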

9 SOAR: automation and playbooks

SOAR (Security Orchestration, Automation and Response) sits alongside the SIEM and automates repetitive response work. Where the SIEM detects, SOAR acts — connecting (orchestrating) many tools and running predefined workflows called playbooks.

A playbook is a documented, often automated sequence of steps for handling a particular situation. For a phishing report, a playbook might: extract the URLs and attachments, look them up against threat intelligence, check who else received the message, quarantine those copies, and open a ticket — much of it without a human typing each step.

The aim is to free analysts from rote work so their attention goes to judgement-heavy tasks, and to make response faster and more consistent (the same correct steps every time). A vital caution: destructive automated actions (isolating a host, disabling an account) carry risk, so high-impact steps often keep a human in the loop for approval. Automate the toil, gate the dangerous.
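The phishing playbook and the human-in-the-loop gate can be sketched together as below. This is not any SOAR product's API: the report fields, the recipients_by_message lookup, the KNOWN_BAD_DOMAINS set (standing in for a threat-intelligence query) and the approve callback are all hypothetical.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
KNOWN_BAD_DOMAINS = {"login-example.invalid"}  # stand-in for a threat-intel lookup


def phishing_playbook(report, recipients_by_message, approve):
    """Run the automated steps; ask `approve` before any high-impact action."""
    actions = []
    urls = URL_PATTERN.findall(report["body"])
    bad_urls = [u for u in urls if urlparse(u).hostname in KNOWN_BAD_DOMAINS]
    actions.append(f"checked {len(urls)} URL(s), {len(bad_urls)} known bad")
    if bad_urls:
        recipients = recipients_by_message.get(report["message_id"], [])
        actions.append(f"quarantined {len(recipients)} copies")
        # Host isolation is disruptive, so it is gated behind an approval.
        for host in report.get("clicked_hosts", []):
            if approve(f"Isolate {host} from the network?"):
                actions.append(f"isolated {host}")
            else:
                actions.append(f"isolation of {host} pending approval")
    actions.append(f"opened ticket for {report['message_id']}")
    return actions


report = {
    "message_id": "msg-001",
    "body": "Your mailbox is full. Reset it at https://login-example.invalid/reset today.",
    "clicked_hosts": ["ws-17"],
}
recipients_by_message = {"msg-001": ["bob@example.org", "carol@example.org"]}
actions = phishing_playbook(report, recipients_by_message, approve=lambda prompt: False)
for step in actions:
    print(step)
```

The lookup, quarantine and ticket steps run unattended, while isolation waits for an analyst's decision: automate the toil, gate the dangerous.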

10 EDR and XDR: endpoint telemetry

Endpoints — laptops, servers, workstations — are where much malicious activity actually executes, so deep visibility there is gold. EDR (Endpoint Detection and Response) places a lightweight agent on each host that records rich telemetry: process creation, command lines, file and registry changes, network connections, and parent-child process relationships. It can detect suspicious behaviour locally and, crucially, respond — isolating a host from the network or killing a malicious process.

XDR (Extended Detection and Response) broadens this idea beyond the endpoint, correlating endpoint signals with network, email, identity and cloud telemetry in one platform, so the analyst sees a cross-domain story rather than isolated dots.

EDR telemetry is also a treasure trove for threat hunting and forensics: because it records the chain of process execution, an analyst can reconstruct exactly how an attacker moved on a machine, well after the fact.
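As a small sketch of that reconstruction, the function below walks parent-child links in recorded process-creation events to rebuild the chain that led to a given process. The event fields (pid, ppid, image) are assumptions for the example. Real EDR data would also need to handle PID reuse, for instance by using a unique process identifier, which this sketch ignores.

```python
def process_chain(events, pid):
    """Return image names from the oldest known ancestor down to `pid`."""
    by_pid = {e["pid"]: e for e in events}
    chain = []
    seen = set()  # guards against loops in malformed data
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        event = by_pid[pid]
        chain.append(event["image"])
        pid = event["ppid"]
    return list(reversed(chain))


events = [
    {"pid": 100, "ppid": 1, "image": "explorer.exe"},
    {"pid": 200, "ppid": 100, "image": "winword.exe"},
    {"pid": 300, "ppid": 200, "image": "powershell.exe"},
]
chain = process_chain(events, 300)
print(" -> ".join(chain))
```

Read left to right, the chain shows a user session launching a document editor, which in turn launched a script interpreter, the kind of lineage an investigator would want to explain.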

11 MITRE ATT&CK: mapping detections

MITRE ATT&CK is a public, curated knowledge base of real-world adversary behaviour, organised into tactics (the attacker’s goals — e.g. Initial Access, Persistence, Privilege Escalation, Lateral Movement, Exfiltration) and techniques (the specific ways they achieve each goal, each with an ID such as T1059 "Command and Scripting Interpreter"). Tactics are the "why"; techniques are the "how".

For a SOC, ATT&CK is a shared language and a coverage map. By tagging each detection rule with the technique it catches, the team can see which adversary behaviours they would notice and which are blind spots. This drives prioritisation: "we have no coverage for credential dumping — let’s fix that."

ATT&CK also aids communication: instead of vague phrases, analysts describe an incident in precise, comparable terms ("the actor used T1566 phishing, then T1059 scripting"). It underpins detection engineering, threat hunting and purple teaming alike.
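The coverage-map idea can be sketched with two plain dictionaries. The rule names and the short priority list are hypothetical. The technique IDs are real ATT&CK identifiers (T1110 Brute Force, T1003 OS Credential Dumping, T1566 Phishing, T1059 Command and Scripting Interpreter).

```python
# Hypothetical detection rules, each tagged with the techniques it catches.
RULES = {
    "possible_brute_force_success": ["T1110"],
    "office_spawns_script_interpreter": ["T1566", "T1059"],
}

# Techniques this (imaginary) team has decided matter most.
PRIORITY_TECHNIQUES = ["T1566", "T1059", "T1110", "T1003"]


def coverage(rules, techniques):
    """Map each technique to True if at least one rule is tagged with it."""
    covered = {t for tags in rules.values() for t in tags}
    return {t: t in covered for t in techniques}


def blind_spots(rules, techniques):
    """List the techniques that no rule claims to detect."""
    cov = coverage(rules, techniques)
    return [t for t in techniques if not cov[t]]


gaps = blind_spots(RULES, PRIORITY_TECHNIQUES)
print("No coverage for:", gaps)
```

Here the only gap is T1003, which matches the credential-dumping example above. Note that a tag only records a claim of coverage. Whether the rule actually fires is a separate question that testing has to answer.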

12 Threat intelligence and IOCs

Threat intelligence is evidence-based knowledge about adversaries — who they are, what they target, and how they operate — used to make better defensive decisions. It ranges from strategic (big-picture trends for leadership) to tactical/operational (concrete data analysts can act on now).

The most operational form is the Indicator of Compromise (IOC): an artefact that suggests a system may be compromised — a malicious file hash, a known-bad IP or domain, a suspicious URL, a registry key. SOCs ingest threat-intel feeds of IOCs and have the SIEM/EDR watch for matches.

Two caveats keep this honest. IOCs are perishable: attackers rotate IPs and domains quickly, so an old indicator may be useless or even cause false positives. And simple IOCs such as hashes, IPs and domains sit on the lower rungs of the "Pyramid of Pain": blocking a hash barely inconveniences an attacker, whereas detecting their behaviour (TTPs, at the top of the pyramid) is far more durable. Use IOCs, but do not rely on them alone.
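One way to respect the perishability of indicators is to expire them by age before matching, as in the sketch below. The maximum ages per indicator type are arbitrary values picked for illustration, and the IOC and event field names are assumptions.

```python
from datetime import date, timedelta

# Hypothetical retention per indicator type; shorter for fast-rotating ones.
MAX_AGE = {
    "ip": timedelta(days=30),
    "domain": timedelta(days=90),
    "sha256": timedelta(days=365),
}


def active_iocs(iocs, today):
    """Keep only indicators that are still within their maximum age."""
    return {
        (i["type"], i["value"])
        for i in iocs
        if today - i["first_seen"] <= MAX_AGE[i["type"]]
    }


def match_event(event, active):
    """Return the active indicators observed in one event."""
    observed = {
        ("ip", event.get("dest_ip")),
        ("domain", event.get("dest_domain")),
        ("sha256", event.get("file_sha256")),
    }
    return sorted(observed & active)


iocs = [
    {"type": "ip", "value": "203.0.113.50", "first_seen": date(2024, 1, 1)},
    {"type": "domain", "value": "bad.example", "first_seen": date(2024, 2, 1)},
]
active = active_iocs(iocs, today=date(2024, 3, 1))
event = {"dest_ip": "203.0.113.50", "dest_domain": "bad.example"}
matches = match_event(event, active)
print(matches)
```

The IP indicator is 60 days old and has aged out, so only the domain matches. That avoids alerting on an address that may since have been reassigned to someone harmless.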

13 Threat hunting: going on the offensive (defensively)

Threat hunting is the proactive, human-led search for threats that have already evaded automated detection. Rather than waiting for an alert, hunters assume a breach may exist and go looking for it. It complements — never replaces — alerting.

Good hunting is hypothesis-driven. The hunter forms a testable idea — often inspired by ATT&CK or fresh threat intel — such as "if an attacker were living off the land, we would see PowerShell spawning from Office applications." They then query the telemetry to confirm or refute it, investigate any hits, and either find evil or gain confidence that it is absent.

The payoff is twofold: hunts sometimes catch real intrusions, and even when they find nothing, they reveal visibility gaps and produce new detections — turning a manual hunt into an automated rule. Hunting matures a SOC from purely reactive to genuinely proactive.
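The example hypothesis above (PowerShell spawning from Office applications) translates into a simple query over process-creation telemetry. The sketch below filters a list of events in memory. The field names and the lists of parent and child images are assumptions, and a real hunt would run the equivalent query in the SIEM or EDR search language.

```python
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "pwsh.exe"}


def hunt_office_spawning_powershell(process_events):
    """Return events where an Office application started PowerShell."""
    hits = []
    for event in process_events:
        parent = event["parent_image"].lower()
        child = event["image"].lower()
        if parent in OFFICE_PARENTS and child in SCRIPT_CHILDREN:
            hits.append(event)
    return hits


process_events = [
    {"host": "ws-17", "parent_image": "WINWORD.EXE", "image": "powershell.exe"},
    {"host": "ws-22", "parent_image": "explorer.exe", "image": "powershell.exe"},
    {"host": "ws-31", "parent_image": "excel.exe", "image": "notepad.exe"},
]
hits = hunt_office_spawning_powershell(process_events)
for hit in hits:
    print(hit["host"], hit["parent_image"], "->", hit["image"])
```

If the query proves reliable and quiet enough in practice, the same condition is a natural candidate for promotion into an automated detection rule.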

14 Detection use cases: brute force, impossible travel, exfiltration

Concrete use cases ground the theory. Three classics, all conceptual:

Brute-force / password spraying. Many authentication failures against one account (brute force) or one password against many accounts (spraying), often from one source. Detected by counting failures over a time window (per account for brute force, per source across accounts for spraying) and watching for an eventual success.
Impossible travel. The same user account authenticates from two locations so far apart that travelling between them in the elapsed time is physically impossible (e.g. London then Tokyo twenty minutes later). A strong sign of stolen credentials — though VPNs can cause benign false positives, so it is enriched with other signals.
Data exfiltration. Unusually large or unusual outbound data transfers — a user suddenly uploading gigabytes to an unfamiliar external destination, or DNS/HTTPS traffic carrying far more data than normal. Detected by baselining normal volumes and flagging anomalies.

Each combines a clear hypothesis with the right log source, and each must be tuned to its environment to keep false positives manageable.
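The impossible-travel check reduces to a distance-over-time calculation. The sketch below uses the haversine great-circle formula and flags a pair of logins when the implied speed exceeds a threshold. The 1,000 km/h limit (roughly airliner speed) and the login field names are assumptions, and the coordinates would in practice come from IP geolocation, which is itself approximate.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: roughly the speed of an airliner


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_impossible_travel(first, second):
    """True if moving between the two logins would exceed the speed limit."""
    distance = haversine_km(first["lat"], first["lon"], second["lat"], second["lon"])
    hours = abs((second["time"] - first["time"]).total_seconds()) / 3600
    if hours == 0:
        return distance > 0
    return distance / hours > MAX_PLAUSIBLE_KMH


london = {"time": datetime(2024, 1, 1, 9, 0), "lat": 51.5074, "lon": -0.1278}
tokyo = {"time": datetime(2024, 1, 1, 9, 20), "lat": 35.6762, "lon": 139.6503}
print(is_impossible_travel(london, tokyo))
```

London to Tokyo is well over 9,000 km, so twenty minutes between logins is flagged, whereas the same pair fourteen hours apart would not be. A VPN exit node in another country produces the same signal, which is why the result is usually treated as one input among several.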

15 KPIs and metrics: MTTD, MTTR, dwell time

You cannot improve what you do not measure. SOCs track a handful of key metrics:

MTTD — Mean Time To Detect. Average time from when malicious activity begins to when the SOC detects it. Lower is better.
MTTR — Mean Time To Respond (or Remediate). Average time from detection to containment/resolution. Lower is better.
Dwell time. How long an attacker remains in the environment before being discovered and evicted — essentially the window of exposure. Reducing dwell time is a central SOC goal because the longer an attacker dwells, the more damage they do.

Other useful measures include alert volume, false-positive rate, and the proportion of alerts handled by automation. A health warning: metrics can be gamed (closing tickets fast to flatter MTTR) or can reward the wrong behaviour. Use them to drive genuine improvement and analyst wellbeing, not as a stick.
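As a worked example, MTTD and MTTR can be computed from per-incident timestamps like this. The incident records and their field names (started, detected, contained) are invented for illustration. In practice the hardest part is usually establishing when the malicious activity really began.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_delta(incidents, start_key, end_key):
    """Average elapsed time between two timestamps across incidents."""
    seconds = [(i[end_key] - i[start_key]).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


incidents = [
    {"started": datetime(2024, 1, 1, 8, 0),
     "detected": datetime(2024, 1, 1, 12, 0),
     "contained": datetime(2024, 1, 1, 14, 0)},
    {"started": datetime(2024, 1, 2, 8, 0),
     "detected": datetime(2024, 1, 2, 10, 0),
     "contained": datetime(2024, 1, 2, 14, 0)},
]

mttd = mean_delta(incidents, "started", "detected")    # mean time to detect
mttr = mean_delta(incidents, "detected", "contained")  # mean time to respond
print("MTTD:", mttd)
print("MTTR:", mttr)
```

Detection took 4 hours and 2 hours, and response took 2 hours and 4 hours, so both means come out at 3 hours. Means hide outliers, so a single very long-lived intrusion can be masked by many quick ones. Looking at the distribution as well tends to be more honest.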

16 Continuous improvement and purple teaming

A SOC is only as good as its last update. Continuous improvement means treating every incident, hunt and false positive as feedback: tune the noisy rule, fill the logging gap, write the missing detection, refine the playbook. The lessons-learned step is not paperwork — it is where the SOC gets stronger.

A powerful improvement engine is purple teaming: the red team (offence, emulating real attacker techniques) and the blue team (the SOC, defence) work together rather than in secret opposition. The red team executes a known technique; the blue team checks whether their telemetry, alerts and analysts actually caught it. Gaps found are fixed immediately, then re-tested.

This is far more constructive than a one-off "did you catch us?" penetration test, because it is collaborative and iterative: emulate, detect, measure, improve, repeat. Mapped against MITRE ATT&CK, purple-team exercises systematically close detection blind spots and validate that the whole monitor → detect → respond loop genuinely works.
