Investigation Runbooks
This chapter introduces the investigation runbooks that follow. Each runbook is a step-by-step walkthrough that takes an analyst from "an alert has appeared" to "a triage decision has been recorded", using the Hunt Page, the Dashboard, and the detection-family reference in Detection Overview as the working surfaces.
This chapter is written for Tier 1 Security Operations Center (SOC) analysts who perform first-response triage, Tier 2 and Tier 3 analysts who escalate and resolve incidents, threat hunters running proactive campaigns, Managed Security Service Provider (MSSP) analysts working across customer tenants, and SOC leads maintaining detection quality. It assumes familiarity with the Hunt page workspace and the six detection families.
First-use acronym expansions in this chapter: SOC (Security Operations Center), MSSP (Managed Security Service Provider), MVP (Minimum Viable Product), IOC (Indicator of Compromise), C2 (command-and-control), PCAP (packet capture), TLS (Transport Layer Security), SNI (Server Name Indication), DNS (Domain Name System), IP (Internet Protocol), RCF (Random Cut Forest), AV (antivirus), PCR (Producer-Consumer Ratio), TIDB (Threat Intelligence Database), SLA (Service-Level Agreement).
Purpose of the Runbooks
The runbooks exist to make triage repeatable. A runbook captures, for a specific trigger, the sequence of observations an experienced analyst makes before deciding what to do, the pivots and filters those observations depend on, the decision criteria that distinguish escalation from monitoring from closure, and the false-positive patterns that recur across deployments.
Runbooks serve three audiences at once. Junior analysts follow them linearly as a checklist. Senior analysts skim them as a reminder of the pivots available on the Hunt page and the enrichment fields worth checking. SOC leads cite them in peer reviews and in incident post-mortems to explain why a given triage decision was or was not defensible.
Runbooks do not replace judgment. They enumerate the evidence an analyst should gather and the questions each piece of evidence answers; the ultimate decision depends on the local environment, the threat landscape, and the context of the alerting host. Runbooks also do not define response-time Service-Level Agreements (SLAs) — those belong to the SOC's operating plan, not to product documentation. What a runbook does define is the order of operations that makes the decision well-informed.
How to Use a Runbook
Every runbook in this chapter follows a fixed shape. Reading it in order takes the analyst from zero context to a documented triage outcome.
- Read the trigger scenario. The opening paragraph describes what the analyst sees on a Dashboard widget or Hunt tab that brings the runbook into play. If the observed alert does not match the scenario, a different runbook applies; the Critical Alert Triage runbook is the generic fallback when no family-specific runbook fits.
- Check prerequisites. Every runbook lists what the analyst needs to have ready — an open Hunt page session with an appropriate role, access to the detection family's sub-tab, any enrichment configuration the runbook depends on. Missing a prerequisite is a hard stop; the investigation cannot proceed until the gap is closed.
- Execute the investigation steps in order. Steps are numbered and each step is an action (open a row, apply a filter, pivot to another tab, examine a sidebar section) paired with what to look for in the result. Skipping steps is tempting when the answer looks obvious, but runbooks are written so that every step adds a distinct piece of evidence.
- Apply the decision tree. Every runbook ends with a branching decision: escalate, monitor, close as benign, or tune the rule via policy. Each branch lists the minimum artifacts the analyst must record before moving on — a ticket identifier, a Hunt tab reference, a policy change note.
- Review false-positive patterns. The "Common False Positives" section describes the benign behaviors each detection family recurrently matches so analysts do not classify a known-benign match as a new threat.
- Follow the references. Every runbook cross-links to the corresponding detection chapter for background, to related runbooks for adjacent families, and to glossary entries for unfamiliar terms.
Analysts document which runbook was followed in the incident ticket. When the runbook proves insufficient — evidence was ambiguous, the decision tree did not terminate cleanly, a false-positive pattern is missing — the analyst files a note against the runbook so the next revision captures the gap.
The Investigation Mindset
Every runbook shares a stance. It is worth naming the stance explicitly because it is the single most important thing a new analyst carries from runbook to runbook.
- Collect evidence before acting. An alert is a hypothesis, not a conclusion. Before an analyst isolates a host, blocks an indicator, or closes a ticket, the runbook demands that the analyst examine the supporting fields on the row, the enrichment sidebar, and at least one correlating view (the same source IP, the same destination, the same flow, the same file hash). The time cost is low; the cost of acting on a misread alert is high.
- Correlate across detection families. A single-signal alert rarely justifies a major response. Multi-signal convergence — for example, a behavioral beaconing detection that coincides with a C2 enrichment match and an OPSWAT InSights Threat Intelligence Database (TIDB) hit on the same destination — is a far stronger basis for escalation than any of the three signals alone. Every runbook's decision tree weights confirmed correlation over isolated signals.
- Never act on a single signal. This is the operational expression of the previous two points. If the only evidence is one row on one tab with no enrichments, no correlated events, and no prior history, the correct disposition is monitor — not close, not escalate. The runbook explicitly directs the analyst to widen the search window, pivot to related surfaces, and wait for a second signal before taking an action whose consequences are expensive to reverse.
- Record what was observed, not what was assumed. When an analyst closes a ticket as benign, the ticket note captures the specific fields that led to that conclusion — for example, "destination is 52.84.0.0/15, Amazon Web Services (AWS) CloudFront range, confirmed by Autonomous System Number (ASN) enrichment; user agent matches internal software update client; closing as benign". Later readers reopen the conclusion only if one of those fields changes, and the team's rate of re-alerting on known-benign patterns drops.
- Tune sparingly and deliberately. Closing an alert as benign is a triage outcome; tuning the rule that produced it is a detection-engineering outcome, and it removes future visibility into the same pattern. Runbooks separate the two explicitly. Tuning is appropriate when the same benign pattern has been confirmed multiple times, when the source of the noise is understood, and when a policy mechanism exists to exclude the pattern without silencing true positives. The rest of the time the disposition is close as benign and the rule stays live.
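The correlation stance above can be sketched as a small rule. This is a minimal illustration, not the product's logic: the `Signal` type, the family labels, the disposition strings, and the threshold of two converging families are all assumptions made for the example. Each runbook's decision tree remains the authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    family: str       # hypothetical family label, e.g. "behavioral_beaconing"
    destination: str  # entity the signal points at (IP or domain)


def suggest_disposition(signals):
    """Return a starting disposition for the evidence gathered so far."""
    families_by_destination = {}
    for signal in signals:
        families_by_destination.setdefault(signal.destination, set()).add(signal.family)
    # Largest number of distinct families converging on one destination.
    strongest = max((len(f) for f in families_by_destination.values()), default=0)
    # Assumption: two or more converging families justify an escalation review;
    # anything weaker stays at "monitor" until a second signal arrives.
    return "review_for_escalation" if strongest >= 2 else "monitor"


converging = [
    Signal("behavioral_beaconing", "203.0.113.7"),
    Signal("c2_threat_intelligence", "203.0.113.7"),
    Signal("tidb_hit", "203.0.113.7"),
]
print(suggest_disposition(converging))
```

Two signals that point at different destinations do not count as convergence in this sketch, which mirrors the rule that isolated signals lead to monitoring rather than action.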
Pivot Patterns Used Across Runbooks
Every runbook leans on the same pivots through the Hunt page. The pivots are described in detail in the Hunt Page chapter and in the Alert, Flow, and PCAP Pivoting meta-runbook at the end of this chapter; the overview here is a reminder of the motion each pivot produces.
- Alert → flow. From an alert row, the analyst right-clicks the `community_id` column (or the matching field in the sidebar) and selects Show all events with this community id. A new All Events tab opens with every protocol transaction, flow record, file extraction, and enrichment that belongs to the same connection. This is the single most frequent pivot in triage.
- Alert → session. From an alert row, the analyst right-clicks the `community_id` and selects Show related events, then narrows the resulting tab to a protocol-specific sub-tab (DNS, HTTP, TLS, Secure Shell (SSH), Server Message Block (SMB), Remote Desktop Protocol (RDP), Simple Mail Transfer Protocol (SMTP), File Transfer Protocol (FTP), QUIC, FileInfo) to inspect the payload metadata for that connection.
- Alert → hunt by value. Right-clicking any IP, domain, hash, or `community_id` value in the row or sidebar exposes Hunt all events from this IP, Search file hash across all events, or Show related files. Each opens a new tab pre-filtered to the pivoted value, preserving the current time range so history is retained.
- Flow → session → file. A long-duration or high-volume flow is pivoted to its session records, and any session carrying file metadata is pivoted to the Files bucket to retrieve the extracted file and its MetaDefender Core scan result if present.
- Session → PCAP. When session metadata alone does not resolve the question, the analyst requests a packet capture (PCAP) for the flow's time window. PCAP availability is configuration-dependent and selective — not every flow has a PCAP retained. The runbook calls out when PCAP is the right pivot and when the analyst should stop at session metadata.
- Value → hunt. The simplest pivot. Right-clicking any IP, domain, or hash value and selecting the matching Hunt action produces a value-scoped tab. Analysts use it to check whether an indicator has fired before, to count occurrences over a longer window, and to find other hosts interacting with the same destination.
Pivots always open new tabs; the originating tab is preserved so the analyst can return to the lead after following the pivot. Tab persistence means the investigation state survives across sign-out and sign-in, so runbooks can direct the analyst to close a loop later without losing prior context.
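Conceptually, the alert → flow pivot is a filter on a shared connection identifier. The sketch below illustrates that idea on an in-memory list of events; it does not call any product API. Only `community_id` comes from this chapter. The `timestamp` and `event_type` field names and the identifier values are made up for the example.

```python
def pivot_alert_to_flow(events, community_id):
    """Return every event sharing the alert's community_id, oldest first."""
    related = [e for e in events if e.get("community_id") == community_id]
    return sorted(related, key=lambda e: e["timestamp"])


# Hypothetical events; real records carry many more fields.
events = [
    {"timestamp": 3, "event_type": "fileinfo", "community_id": "1:example-A"},
    {"timestamp": 1, "event_type": "flow", "community_id": "1:example-A"},
    {"timestamp": 2, "event_type": "tls", "community_id": "1:example-B"},
    {"timestamp": 2, "event_type": "alert", "community_id": "1:example-A"},
]

timeline = pivot_alert_to_flow(events, "1:example-A")
print([e["event_type"] for e in timeline])
```

The function returns a new list and leaves `events` untouched, which parallels the way a pivot opens a new tab while the originating tab is preserved.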
Runbook Catalog
Each entry below links to the corresponding runbook and names the detection family it covers. The time budget is a typical target for first-response triage on a single alert — more complex investigations routinely take longer and branch into incident response.
| Runbook | Trigger | Detection Family | Time Budget |
|---|---|---|---|
| Critical Alert Triage | Any Critical-severity alert on the Recent Alerts Dashboard widget or the All Alerts Hunt tab | Any | 10–15 min |
| C2 Beacon Investigation | Beaconing Detection or C2 Infrastructure Alert on a Hunt All Alerts tab | Behavioral beaconing, C2 threat intelligence | 15–25 min |
| Data Exfiltration Investigation | Data Exfiltration Alert, or an unusually high upload ratio surfaced on the Producer-Consumer Ratio (PCR) Dashboard widget | Behavioral data exfiltration | 20–30 min |
| Malicious File Investigation | MetaDefender Core High, Medium, or Low antivirus (AV) Detection alert | MetaDefender Core file scanning | 20–30 min |
| ML Anomaly Investigation | ML Random Cut Forest (RCF) Anomalous Activity Alert on a Hunt All Alerts tab | ML anomaly detection | 15–25 min |
| Tunneling Investigation | DNS Tunneling Detection alert (per-query suspicion) or the DNS Tunneling Hourly aggregation alert | Behavioral tunneling | 15–20 min |
| Alert, Flow, and PCAP Pivoting | Referenced from every other runbook as the pivot-mechanics meta-runbook | Cross-family | As needed |
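The catalog can be read as a lookup from trigger to runbook, with Critical Alert Triage as the fallback when no family-specific runbook fits. The sketch below is illustrative only: the alert-name keys are paraphrased from the Trigger column and are assumptions, since the exact strings shown in the product may differ.

```python
FALLBACK_RUNBOOK = "Critical Alert Triage"

# Keys are hypothetical alert labels paraphrased from the catalog table.
RUNBOOK_BY_ALERT = {
    "Beaconing Detection": "C2 Beacon Investigation",
    "C2 Infrastructure Alert": "C2 Beacon Investigation",
    "Data Exfiltration Alert": "Data Exfiltration Investigation",
    "MetaDefender Core AV Detection": "Malicious File Investigation",
    "ML RCF Anomalous Activity Alert": "ML Anomaly Investigation",
    "DNS Tunneling Detection": "Tunneling Investigation",
    "DNS Tunneling Hourly": "Tunneling Investigation",
}


def select_runbook(alert_name):
    """Pick the family-specific runbook, or the generic fallback if none fits."""
    return RUNBOOK_BY_ALERT.get(alert_name.strip(), FALLBACK_RUNBOOK)


print(select_runbook("DNS Tunneling Hourly"))
print(select_runbook("Unrecognized alert"))
```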
See Also
- Hunt Page — the investigation surface every runbook uses.
- Dashboard — the entry point for most triage runbooks via the Recent Alerts widget.
- Detection Overview — background on the six detection families that produce runbook triggers.
- Severity and Confidence — the severity label definitions that runbook decision trees rely on.