Chapter 15: Real-World Scenario — Cybersecurity Threat Investigation

This capstone chapter ties together nearly every feature of AstraeaDB—graph construction, traversals, GQL queries, vector search, hybrid search, GraphRAG, and temporal analysis—in a single, realistic scenario. You will build a cybersecurity threat graph from scratch, investigate an active alert, enrich your findings with AI, and reconstruct the attack timeline. By the end, you will have seen how graph thinking transforms security operations from reactive log-chasing into proactive, contextual investigation.

15.1 Building the Threat Graph

Every cybersecurity investigation begins with a model of the environment. In a traditional SIEM, this model is implicit—scattered across log tables, asset inventories, and vulnerability scanners. In AstraeaDB, we make it explicit by constructing a graph where every entity is a node and every relationship is an edge.

The Schema

Our threat graph uses six node types and five edge types:

Node Label	Description	Example Properties
Host	A server, workstation, or container in the environment	`hostname`, `os`, `tier`
IPAddress	An internal or external IP address	`ip`, `type` (internal/external)
Vulnerability	A known CVE affecting a host	`cve_id`, `severity`, `cvss`
Alert	A security alert triggered by a detection rule	`alert_id`, `rule`, `severity`
AttackPattern	A MITRE ATT&CK technique or tactic	`technique_id`, `name`, `tactic`
User	A human or service account	`username`, `role`, `department`

Edge Type	From	To	Meaning
CONNECTS_TO	Host	IPAddress	The host has communicated with this IP
HAS_VULNERABILITY	Host	Vulnerability	The host is affected by this CVE
TRIGGERED_ALERT	Host	Alert	An alert was raised on this host
EXPLOITS	AttackPattern	Vulnerability	This attack technique exploits this vulnerability
LOGGED_IN_FROM	User	Host	The user authenticated to this host
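
The edge table above can also serve as a machine-checkable contract when loading data. The following is a minimal sketch of that idea in plain Python. `EDGE_SCHEMA` and `check_edge` are hypothetical names introduced here for illustration; they are not part of the AstraeaDB client.

```python
# Hypothetical helper: EDGE_SCHEMA and check_edge are illustrative names,
# not part of the AstraeaDB client.

# Edge type -> (required source label, required target label), per the table above.
EDGE_SCHEMA = {
    "CONNECTS_TO": ("Host", "IPAddress"),
    "HAS_VULNERABILITY": ("Host", "Vulnerability"),
    "TRIGGERED_ALERT": ("Host", "Alert"),
    "EXPLOITS": ("AttackPattern", "Vulnerability"),
    "LOGGED_IN_FROM": ("User", "Host"),
}


def check_edge(edge_type: str, from_label: str, to_label: str) -> bool:
    """Return True if the edge's endpoint labels match the schema table."""
    expected = EDGE_SCHEMA.get(edge_type)  # None for unknown edge types
    return expected == (from_label, to_label)


print(check_edge("CONNECTS_TO", "Host", "IPAddress"))  # endpoints in schema order
print(check_edge("CONNECTS_TO", "IPAddress", "Host"))  # endpoints reversed
```

Running a check like this before each `create_edge` call would catch reversed endpoints early, for example an `EXPLOITS` edge accidentally drawn from a vulnerability to an attack pattern.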

Constructing the Graph in Python

The following code builds the complete threat graph. Notice how some nodes carry embeddings—vector representations of their descriptions. These will power semantic search in Section 15.3.

from astraeadb import AstraeaClient
import numpy as np

client = AstraeaClient("localhost:7687")

# ── Hosts ─────────────────────────────────────────────────────────
web_server = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "web-prod-01",
        "os": "Ubuntu 22.04",
        "tier": "DMZ",
        "description": "Public-facing web server hosting customer portal"
    },
    embedding=np.random.rand(128).tolist()  # placeholder vector; in production, embed the description text
)

db_server = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "db-prod-01",
        "os": "Ubuntu 22.04",
        "tier": "Internal",
        "description": "Primary PostgreSQL database server"
    },
    embedding=np.random.rand(128).tolist()
)

jump_host = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "bastion-01",
        "os": "Amazon Linux 2",
        "tier": "Management",
        "description": "SSH jump host for administrative access"
    }
)

# ── IP Addresses ──────────────────────────────────────────────────
ext_ip = client.create_node(
    labels=["IPAddress"],
    properties={"ip": "198.51.100.42", "type": "external", "geo": "Eastern Europe"}
)

int_ip = client.create_node(
    labels=["IPAddress"],
    properties={"ip": "10.0.1.50", "type": "internal"}
)

# ── Vulnerabilities ───────────────────────────────────────────────
vuln_rce = client.create_node(
    labels=["Vulnerability"],
    properties={
        "cve_id": "CVE-2024-3094",
        "severity": "CRITICAL",
        "cvss": 9.8,
        "description": "Remote code execution via crafted HTTP request"
    },
    embedding=np.random.rand(128).tolist()
)

vuln_sqli = client.create_node(
    labels=["Vulnerability"],
    properties={
        "cve_id": "CVE-2024-1234",
        "severity": "HIGH",
        "cvss": 8.1,
        "description": "SQL injection in authentication module"
    },
    embedding=np.random.rand(128).tolist()
)

# ── Alert ─────────────────────────────────────────────────────────
alert = client.create_node(
    labels=["Alert"],
    properties={
        "alert_id": "ALT-2024-0891",
        "rule": "Suspicious outbound connection to known C2 IP",
        "severity": "CRITICAL",
        "timestamp": "2024-11-15T03:22:17Z",
        "description": "Outbound beacon detected to known command-and-control server"
    },
    embedding=np.random.rand(128).tolist()
)

# ── Attack Pattern (MITRE ATT&CK) ────────────────────────────────
attack = client.create_node(
    labels=["AttackPattern"],
    properties={
        "technique_id": "T1190",
        "name": "Exploit Public-Facing Application",
        "tactic": "Initial Access",
        "description": "Adversary exploits a vulnerability in an internet-facing application"
    },
    embedding=np.random.rand(128).tolist()
)

# ── User ──────────────────────────────────────────────────────────
admin_user = client.create_node(
    labels=["User"],
    properties={
        "username": "svc-deploy",
        "role": "service_account",
        "department": "DevOps"
    }
)

# ── Edges ─────────────────────────────────────────────────────────
# Network connections (temporal: valid_from / valid_to)
client.create_edge(web_server, ext_ip, "CONNECTS_TO",
    properties={"port": 443, "protocol": "HTTPS"},
    weight=0.9,
    valid_from="2024-11-15T03:22:00Z",
    valid_to="2024-11-15T04:15:00Z"
)

client.create_edge(web_server, int_ip, "CONNECTS_TO",
    properties={"port": 5432, "protocol": "TCP"},
    weight=0.3,
    valid_from="2024-01-01T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(db_server, int_ip, "CONNECTS_TO",
    properties={"port": 5432, "protocol": "TCP"},
    weight=0.2,
    valid_from="2024-01-01T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

# Vulnerabilities
client.create_edge(web_server, vuln_rce, "HAS_VULNERABILITY",
    properties={"discovered": "2024-10-20", "status": "unpatched"},
    weight=0.95,
    valid_from="2024-10-20T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(web_server, vuln_sqli, "HAS_VULNERABILITY",
    properties={"discovered": "2024-09-05", "status": "unpatched"},
    weight=0.8,
    valid_from="2024-09-05T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(db_server, vuln_sqli, "HAS_VULNERABILITY",
    properties={"discovered": "2024-09-05", "status": "patched"},
    weight=0.1,
    valid_from="2024-09-05T00:00:00Z",
    valid_to="2024-10-01T00:00:00Z"
)

# Alert
client.create_edge(web_server, alert, "TRIGGERED_ALERT",
    properties={"source": "EDR"},
    weight=1.0
)

# Attack pattern exploits vulnerability
client.create_edge(attack, vuln_rce, "EXPLOITS",
    properties={"confidence": "high"},
    weight=0.95
)

client.create_edge(attack, vuln_sqli, "EXPLOITS",
    properties={"confidence": "medium"},
    weight=0.6
)

# User logins (temporal)
client.create_edge(admin_user, jump_host, "LOGGED_IN_FROM",
    properties={"method": "ssh_key"},
    weight=0.5,
    valid_from="2024-11-15T02:45:00Z",
    valid_to="2024-11-15T03:30:00Z"
)

client.create_edge(admin_user, web_server, "LOGGED_IN_FROM",
    properties={"method": "ssh_key"},
    weight=0.7,
    valid_from="2024-11-15T03:00:00Z",
    valid_to="2024-11-15T03:25:00Z"
)

print("Threat graph constructed: 10 nodes, 11 edges")

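At this point, the graph has the following topology: web-prod-01 sits at the center, linked to both IP addresses, both vulnerabilities, the alert, and the svc-deploy account. db-prod-01 shares the internal IP 10.0.1.50 and CVE-2024-1234 with it, T1190 points at both vulnerabilities, and bastion-01 is connected only through svc-deploy.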

Why embeddings on some nodes? Not every node needs an embedding. We attach them to nodes whose description field carries semantic meaning—vulnerabilities, alerts, attack patterns, and key hosts. This lets us use vector search (Section 15.3) to find entities that are conceptually similar, not just structurally connected. In a production system, you would generate these embeddings using a model like text-embedding-3-small rather than random vectors.
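
Random vectors carry no meaning, so two similar descriptions will not land near each other. For offline experiments, a deterministic stand-in makes the examples reproducible. The sketch below uses a simple hashing trick. `toy_embed` is a hypothetical helper that captures only word overlap, not semantics, and it is not a substitute for a real embedding model. Whatever model replaces it has to produce vectors of the same length as those already stored (128 in this chapter's examples).

```python
import hashlib
import math

DIM = 128  # matches the vector length used in this chapter's examples


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Deterministic bag-of-words hashing embedding (illustrative only)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Hash each token to a stable bucket index in [0, dim).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


vector = toy_embed("Remote code execution via crafted HTTP request")
print(len(vector))
```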

15.2 Investigating an Alert

At 03:22 UTC, alert ALT-2024-0891 fires: "Suspicious outbound connection to known C2 IP" on web-prod-01. The SOC analyst opens the AstraeaDB console and begins the investigation.

Step 1: Determine the Blast Radius (BFS)

The first question is: what else is reachable from the compromised host? A breadth-first search from web-prod-01 reveals every entity within two hops—the immediate blast radius.

# BFS from the web server, max depth of 2
blast_radius = client.bfs(start=web_server, max_depth=2)

print(f"Blast radius: {len(blast_radius)} entities reachable within 2 hops")
for node in blast_radius:
    print(f"  [{node['labels'][0]}] {node['properties']}")

Expected output:

Blast radius: 9 entities reachable within 2 hops
  [IPAddress]     {"ip": "198.51.100.42", "type": "external", "geo": "Eastern Europe"}
  [IPAddress]     {"ip": "10.0.1.50", "type": "internal"}
  [Vulnerability] {"cve_id": "CVE-2024-3094", "severity": "CRITICAL", "cvss": 9.8}
  [Vulnerability] {"cve_id": "CVE-2024-1234", "severity": "HIGH", "cvss": 8.1}
  [Alert]         {"alert_id": "ALT-2024-0891", "severity": "CRITICAL"}
  [Host]          {"hostname": "db-prod-01", "tier": "Internal"}
  [AttackPattern] {"technique_id": "T1190", "name": "Exploit Public-Facing Application"}
  [User]          {"username": "svc-deploy", "role": "service_account"}
  [Host]          {"hostname": "bastion-01", "tier": "Management"}

Critical finding: The blast radius includes db-prod-01 (the production database) via the internal IP 10.0.1.50. If the attacker has achieved code execution on the web server, the database is only two hops away in the graph. This must be escalated immediately.
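
The blast-radius step can be reproduced without a running database. The sketch below is a plain-Python breadth-first search over the edge list from Section 15.1. It assumes that traversal ignores edge direction, which is the only way db-prod-01 and svc-deploy can be reached from web-prod-01. How `client.bfs` handles direction internally is not stated in this chapter, and `bfs_within` is a hypothetical name.

```python
from collections import deque

# Edges from Section 15.1 as (source, target) pairs, using short node names.
EDGES = [
    ("web-prod-01", "198.51.100.42"),
    ("web-prod-01", "10.0.1.50"),
    ("db-prod-01", "10.0.1.50"),
    ("web-prod-01", "CVE-2024-3094"),
    ("web-prod-01", "CVE-2024-1234"),
    ("db-prod-01", "CVE-2024-1234"),
    ("web-prod-01", "ALT-2024-0891"),
    ("T1190", "CVE-2024-3094"),
    ("T1190", "CVE-2024-1234"),
    ("svc-deploy", "bastion-01"),
    ("svc-deploy", "web-prod-01"),
]


def bfs_within(start: str, max_depth: int) -> dict[str, int]:
    """Return {node: hop count} for every node within max_depth of start."""
    adjacency: dict[str, set[str]] = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)  # assumption: direction ignored

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    del depth[start]  # the start node is not part of its own blast radius
    return depth


reached = bfs_within("web-prod-01", max_depth=2)
for name, hops in sorted(reached.items(), key=lambda item: (item[1], item[0])):
    print(f"{hops} hop(s): {name}")
```

Under the undirected assumption, the walk reaches nine nodes: six at one hop, plus db-prod-01, T1190, and bastion-01 at two hops. bastion-01 arrives through svc-deploy, which makes the jump host part of the blast radius as well.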

Step 2: Find the Attack Path (Shortest Path)

How could the attacker reach the database from the external IP? The weighted shortest path returns the lowest-cost route between the two nodes, using the edge weights (which represent risk scores in this graph) as costs.

# Weighted shortest path: external IP --> database server
path = client.shortest_path(
    from_node=ext_ip,
    to_node=db_server,
    weighted=True
)

print(f"Attack path (cost: {path['cost']}):")
for step in path["path"]:
    print(f"  --> {step}")

Expected output:

Attack path (cost: 1.4):
  --> IPAddress  198.51.100.42 (external)
  --> Host       web-prod-01 (DMZ)
  --> IPAddress  10.0.1.50 (internal)
  --> Host       db-prod-01 (Internal)
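
The cost of 1.4 can be checked by hand: 0.9 + 0.3 + 0.2 along the three CONNECTS_TO edges. The sketch below runs Dijkstra's algorithm over the weights from Section 15.1 to confirm that no cheaper route exists. It treats edges as undirected, which is an assumption (the route from the external IP has to cross the first CONNECTS_TO edge against its direction), and it is not a description of how `client.shortest_path` is implemented.

```python
import heapq

# (node_a, node_b, weight) taken from the create_edge calls in Section 15.1.
WEIGHTED_EDGES = [
    ("web-prod-01", "198.51.100.42", 0.9),
    ("web-prod-01", "10.0.1.50", 0.3),
    ("db-prod-01", "10.0.1.50", 0.2),
    ("web-prod-01", "CVE-2024-3094", 0.95),
    ("web-prod-01", "CVE-2024-1234", 0.8),
    ("db-prod-01", "CVE-2024-1234", 0.1),
    ("web-prod-01", "ALT-2024-0891", 1.0),
    ("T1190", "CVE-2024-3094", 0.95),
    ("T1190", "CVE-2024-1234", 0.6),
    ("svc-deploy", "bastion-01", 0.5),
    ("svc-deploy", "web-prod-01", 0.7),
]


def dijkstra(source, target):
    """Return (cost, path) for the cheapest undirected route, or (inf, [])."""
    graph = {}
    for a, b, w in WEIGHTED_EDGES:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))

    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


cost, path = dijkstra("198.51.100.42", "db-prod-01")
print(round(cost, 2), " -> ".join(path))
```

The only competing route runs through the shared vulnerability CVE-2024-1234 and costs 0.9 + 0.8 + 0.1 = 1.8. Note that a shortest-path search minimizes total weight, so reading the cheapest route as the most likely one assumes that lower weights mean easier traversal. If weights grow with risk, a transformation such as 1 - weight may be worth considering before running the search.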

Step 3: Examine Neighbors (Direct Relationships)

Next, the analyst inspects the web server's immediate relationships to understand its full exposure surface.

# What is directly connected to the compromised web server?
neighbors = client.neighbors(
    node_id=web_server,
    direction="both",
    edge_type=None  # all edge types
)

print(f"web-prod-01 has {len(neighbors)} direct relationships:")
for n in neighbors:
    print(f"  --[{n['edge_type']}]--> {n['labels'][0]}: {n['properties']}")

Expected output:

web-prod-01 has 6 direct relationships:
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "198.51.100.42", "type": "external"}
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "10.0.1.50", "type": "internal"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-3094", "severity": "CRITICAL"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-1234", "severity": "HIGH"}
  --[TRIGGERED_ALERT]----> Alert:         {"alert_id": "ALT-2024-0891", "severity": "CRITICAL"}
  --[LOGGED_IN_FROM]<----- User:          {"username": "svc-deploy", "role": "service_account"}

Step 4: Targeted GQL Queries

For more precise questions, the analyst drops into GQL. These queries demonstrate the power of declarative pattern matching for security investigations.

Query 1: Find all CRITICAL or HIGH vulnerabilities on the compromised host

# GQL: Find critical/high vulnerabilities on web-prod-01
result = client.query("""
    MATCH (h:Host)-[:HAS_VULNERABILITY]->(v:Vulnerability)
    WHERE h.hostname = 'web-prod-01'
      AND v.severity IN ['CRITICAL', 'HIGH']
    RETURN v.cve_id, v.severity, v.cvss, v.description
    ORDER BY v.cvss DESC
""")

print("Vulnerabilities on compromised host:")
print(f"  Columns: {result['columns']}")
for row in result["rows"]:
    print(f"  {row[0]} | {row[1]} | CVSS {row[2]} | {row[3]}")
print(f"  ({result['stats']['rows_returned']} rows)")

Expected output:

Vulnerabilities on compromised host:
  Columns: ['v.cve_id', 'v.severity', 'v.cvss', 'v.description']
  CVE-2024-3094 | CRITICAL | CVSS 9.8 | Remote code execution via crafted HTTP request
  CVE-2024-1234 | HIGH     | CVSS 8.1 | SQL injection in authentication module
  (2 rows)

Query 2: Find all hosts that connect to the same suspicious external IP

# GQL: Which hosts have talked to the C2 IP?
result = client.query("""
    MATCH (h:Host)-[:CONNECTS_TO]->(ip:IPAddress)
    WHERE ip.ip = '198.51.100.42'
    RETURN h.hostname, h.tier, h.os
""")

print("Hosts communicating with suspected C2 IP:")
for row in result["rows"]:
    print(f"  {row[0]} (tier: {row[1]}, os: {row[2]})")

Query 3: Trace the full attack chain from technique to impact

# GQL: Full attack chain -- technique -> vulnerability -> host -> alert
result = client.query("""
    MATCH (a:AttackPattern)-[:EXPLOITS]->(v:Vulnerability)
          <-[:HAS_VULNERABILITY]-(h:Host)-[:TRIGGERED_ALERT]->(alert:Alert)
    WHERE alert.severity = 'CRITICAL'
    RETURN a.technique_id, a.name,
           v.cve_id, v.cvss,
           h.hostname,
           alert.alert_id, alert.rule
""")

print("Full attack chain:")
for row in result["rows"]:
    print(f"  Technique: {row[0]} ({row[1]})")
    print(f"  Exploits:  {row[2]} (CVSS {row[3]})")
    print(f"  Target:    {row[4]}")
    print(f"  Alert:     {row[5]} -- {row[6]}")
    print()

Expected output:

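Full attack chain:
  Technique: T1190 (Exploit Public-Facing Application)
  Exploits:  CVE-2024-3094 (CVSS 9.8)
  Target:    web-prod-01
  Alert:     ALT-2024-0891 -- Suspicious outbound connection to known C2 IP

  Technique: T1190 (Exploit Public-Facing Application)
  Exploits:  CVE-2024-1234 (CVSS 8.1)
  Target:    web-prod-01
  Alert:     ALT-2024-0891 -- Suspicious outbound connection to known C2 IP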

Graph advantage: In a relational database, the query above would require joining four or five tables. In AstraeaDB, it is a single pattern match that follows the natural structure of the data. The graph model turns multi-table joins into intuitive path expressions. See Chapter 5 for the complete GQL guide.

15.3 Enriching with AI

The structural investigation from Section 15.2 told us what is connected and how. Now we use AstraeaDB's AI capabilities to understand what it means—finding similar past threats, discovering hidden relationships, and generating analyst-ready briefings.

Vector Search: Find Similar Past Threats

Given the embedding of the current alert, vector search finds historically similar events—even if they share no structural connection to the current graph. This is especially powerful in large environments with thousands of past incidents.

# Generate an embedding for the current threat description
query_text = "Outbound beacon to command-and-control server via HTTPS"
query_vector = embed(query_text)  # your embedding model (e.g., text-embedding-3-small)

# Find the 5 most semantically similar nodes in the graph
similar = client.vector_search(query_vector=query_vector, k=5)

print("Semantically similar entities:")
for hit in similar:
    print(f"  [{hit['labels'][0]}] score={hit['score']:.3f}")
    print(f"    {hit['properties'].get('description', hit['properties'])}")

Expected output:

Semantically similar entities:
  [Alert]         score=0.943
    Outbound beacon detected to known command-and-control server
  [AttackPattern] score=0.871
    Adversary exploits a vulnerability in an internet-facing application
  [Vulnerability] score=0.812
    Remote code execution via crafted HTTP request
  [Host]          score=0.754
    Public-facing web server hosting customer portal
  [Vulnerability] score=0.698
    SQL injection in authentication module

How vector search helps: In a real environment with thousands of past incidents, vector search can surface a historical alert from six months ago that used the same C2 infrastructure but targeted a different host. This gives the analyst immediate context: "We have seen this actor before." See Chapter 8 for a full treatment of vector search.
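
Conceptually, a vector search ranks stored embeddings by their similarity to the query vector. The sketch below shows a brute-force version using cosine similarity. The similarity metric and the three-dimensional vectors are assumptions made for illustration. This chapter does not state which metric or index AstraeaDB uses, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, indexed, k):
    """Return the k (score, name) pairs most similar to the query, best first."""
    scored = [(cosine(query, vec), name) for name, vec in indexed.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Made-up 3-dimensional vectors, for illustration only.
INDEXED = {
    "ALT-2024-0891": [0.9, 0.1, 0.0],
    "T1190": [0.6, 0.6, 0.1],
    "CVE-2024-1234": [0.0, 0.2, 0.9],
}

for score, name in top_k([1.0, 0.0, 0.0], INDEXED, k=2):
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is linear in the number of stored vectors. Production systems typically rely on an approximate nearest-neighbor index instead.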

Hybrid Search: Structure + Semantics Combined

Hybrid search is AstraeaDB's most powerful search mode. It combines graph proximity (how close a node is structurally) with semantic similarity (how close a node is conceptually). The alpha parameter controls the blend: alpha=1.0 is pure vector search, alpha=0.0 is pure graph traversal.

# Hybrid search: start from the web server, blend structure + semantics
# Looking for nodes related to lateral movement near the compromised host
results = client.hybrid_search(
    anchor=web_server,
    query_vector=embed("lateral movement privilege escalation data exfiltration"),
    max_hops=3,
    k=5,
    alpha=0.6  # 60% semantic, 40% structural
)

print("Hybrid search results (structure + semantics):")
for r in results:
    print(f"  [{r['labels'][0]}] hybrid_score={r['score']:.3f}")
    print(f"    {r['properties']}")

Expected output:

Hybrid search results (structure + semantics):
  [User]          hybrid_score=0.891
    {"username": "svc-deploy", "role": "service_account", "department": "DevOps"}
  [IPAddress]     hybrid_score=0.847
    {"ip": "10.0.1.50", "type": "internal"}
  [Host]          hybrid_score=0.823
    {"hostname": "db-prod-01", "tier": "Internal", "os": "Ubuntu 22.04"}
  [Vulnerability] hybrid_score=0.781
    {"cve_id": "CVE-2024-3094", "severity": "CRITICAL", "cvss": 9.8}
  [Host]          hybrid_score=0.695
    {"hostname": "bastion-01", "tier": "Management", "os": "Amazon Linux 2"}

The results prioritize nodes that are both near the compromised host in the graph and semantically related to lateral movement. This surfaces the service account svc-deploy (which logged into the web server shortly before the alert) and the internal IP 10.0.1.50 (the path to the database)—exactly the entities an analyst would want to investigate next.
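
To make the role of alpha concrete, the sketch below blends a semantic score with a structural score as a weighted sum. This is one plausible reading of "60% semantic, 40% structural" and is an assumption. The chapter does not give AstraeaDB's actual scoring formula, and both `hybrid_score` and the 1 / (1 + hops) proximity term are hypothetical.

```python
def hybrid_score(semantic: float, hops: int, alpha: float) -> float:
    """Blend semantic similarity with graph proximity (illustrative formula)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    structural = 1.0 / (1.0 + hops)  # assumption: proximity decays with hop count
    return alpha * semantic + (1.0 - alpha) * structural


# A node one hop from the anchor with moderate semantic similarity.
print(hybrid_score(semantic=0.5, hops=1, alpha=0.6))
# alpha=1.0 ignores the graph; alpha=0.0 ignores the embedding.
print(hybrid_score(semantic=0.9, hops=3, alpha=1.0))
print(hybrid_score(semantic=0.9, hops=1, alpha=0.0))
```

Under a formula of this shape, a node with no embedding (such as svc-deploy or 10.0.1.50 in this graph) can still rank well on proximity alone, as long as a missing embedding is treated as a semantic score of zero rather than excluding the node.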

GraphRAG: Automated Analyst Briefing

Finally, we use GraphRAG to generate a natural-language briefing. AstraeaDB retrieves the relevant subgraph around the alert, serializes it as structured context, and sends it to an LLM—all in a single atomic call.

# Generate an AI-powered threat briefing
briefing = client.graph_rag(
    question="Provide a threat analysis briefing for alert ALT-2024-0891. "
             "Include the likely attack vector, blast radius, affected assets, "
             "and recommended containment actions.",
    anchor=alert,
    hops=3,
    max_nodes=50,
    format="markdown"
)

print(briefing["answer"])

Example output (generated by the LLM from graph context):

## Threat Analysis Briefing: ALT-2024-0891

**Severity:** CRITICAL
**Timestamp:** 2024-11-15T03:22:17Z

### Summary
Alert ALT-2024-0891 indicates a command-and-control (C2) beacon from
web-prod-01 to external IP 198.51.100.42 (geolocated to Eastern Europe).
The host has two unpatched vulnerabilities: CVE-2024-3094 (CVSS 9.8, RCE)
and CVE-2024-1234 (CVSS 8.1, SQLi). The likely initial access vector is
MITRE ATT&CK T1190 (Exploit Public-Facing Application).

### Blast Radius
- **Direct risk:** db-prod-01 (production database) is reachable from the
  compromised host via internal IP 10.0.1.50 on port 5432.
- **Account compromise:** Service account svc-deploy authenticated to
  web-prod-01 at 03:00 UTC, 22 minutes before the alert. This account
  also has access to bastion-01.

### Recommended Actions
1. **Isolate** web-prod-01 from the network immediately.
2. **Rotate** credentials for svc-deploy and audit its access logs.
3. **Patch** CVE-2024-3094 and CVE-2024-1234 across all affected hosts.
4. **Block** outbound traffic to 198.51.100.42 at the firewall.
5. **Investigate** db-prod-01 for signs of lateral movement.

Why GraphRAG outperforms naive RAG: A standard vector-only RAG system would retrieve the alert and perhaps the vulnerability descriptions. GraphRAG retrieves the entire connected subgraph: the hosts, IPs, user accounts, attack patterns, and their relationships. This gives the LLM the structural context it needs to reason about attack paths and blast radius, so the briefing can name specific affected assets and containment steps rather than summarizing the alert alone. See Chapter 11 for the full GraphRAG guide.
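
The retrieval and serialization steps described above can be sketched in plain Python. The code below collects the nodes within a hop limit of an anchor, caps the total at `max_nodes`, and renders the connecting edges as text for a prompt. `build_context` and `build_prompt` are hypothetical names, the arrow-style text format is an assumption, and the LLM call itself is left out. How AstraeaDB serializes the subgraph internally is not described in this chapter.

```python
from collections import deque

# A subset of the chapter's edges as (source, relation, target) triples.
TRIPLES = [
    ("web-prod-01", "TRIGGERED_ALERT", "ALT-2024-0891"),
    ("web-prod-01", "CONNECTS_TO", "198.51.100.42"),
    ("web-prod-01", "CONNECTS_TO", "10.0.1.50"),
    ("db-prod-01", "CONNECTS_TO", "10.0.1.50"),
    ("web-prod-01", "HAS_VULNERABILITY", "CVE-2024-3094"),
    ("T1190", "EXPLOITS", "CVE-2024-3094"),
    ("svc-deploy", "LOGGED_IN_FROM", "web-prod-01"),
]


def build_context(anchor, hops, max_nodes):
    """Collect nodes within `hops` of anchor (capped) and render their edges as text."""
    seen = {anchor: 0}
    queue = deque([anchor])
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        if seen[node] == hops:
            continue  # hop limit reached for this branch
        for src, _, dst in TRIPLES:
            if node not in (src, dst):
                continue
            other = dst if node == src else src
            if other not in seen and len(seen) < max_nodes:
                seen[other] = seen[node] + 1
                queue.append(other)
    # Keep only edges whose two endpoints were both retrieved.
    lines = [f"({s})-[{r}]->({d})" for s, r, d in TRIPLES if s in seen and d in seen]
    return "\n".join(lines)


def build_prompt(question, context):
    """Combine the serialized subgraph and the analyst's question."""
    return f"Graph context:\n{context}\n\nQuestion: {question}"


context = build_context("ALT-2024-0891", hops=3, max_nodes=50)
print(build_prompt("What is the likely attack vector?", context))
```

With `hops=1` only the alert and web-prod-01 are retrieved, which is roughly what a vector-only retriever would see. Raising the limit to 3 pulls in the database server, the service account, and the attack technique.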

15.4 Temporal Analysis

The most important question in any incident response is: when did this happen? AstraeaDB's temporal edges (introduced in Chapter 9) let us reconstruct the exact timeline of the attack by querying the graph as it existed at specific points in time.

Reconstructing the Timeline

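Most edges in our threat graph carry valid_from and valid_to timestamps. This allows us to answer time-specific questions without maintaining separate snapshots of the data. The TRIGGERED_ALERT and EXPLOITS edges were created without a validity window; the examples below assume such edges are treated as always valid.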
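
The core of a temporal query is a simple interval test on each edge. The following sketch shows that test with Python's standard `datetime` module. The half-open interval (start inclusive, end exclusive) is an assumption, since this chapter does not say how AstraeaDB treats the boundary instants, and `parse_utc` and `is_active` are hypothetical helper names.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_active(valid_from: str, valid_to: str, at: str) -> bool:
    """True if valid_from <= at < valid_to (half-open interval, an assumption)."""
    return parse_utc(valid_from) <= parse_utc(at) < parse_utc(valid_to)


ALERT_TIME = "2024-11-15T03:22:17Z"

# C2 connection edge at the moment of the alert.
print(is_active("2024-11-15T03:22:00Z", "2024-11-15T04:15:00Z", ALERT_TIME))
# RCE vulnerability edge one month earlier, before it was discovered.
print(is_active("2024-10-20T00:00:00Z", "9999-12-31T23:59:59Z", "2024-10-15T03:22:17Z"))
```

The first check prints `True` and the second prints `False`. The far-future sentinel `9999-12-31T23:59:59Z` works here because it is still a valid `datetime` value.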

Question 1: What was connected to the web server at the time of the alert?

# Neighbors at the exact moment of the alert
alert_time = "2024-11-15T03:22:17Z"

neighbors_at_alert = client.neighbors_at(
    node_id=web_server,
    direction="both",
    timestamp=alert_time,
    edge_type=None
)

print(f"Connections active at {alert_time}:")
for n in neighbors_at_alert:
    print(f"  --[{n['edge_type']}]--> {n['labels'][0]}: {n['properties']}")

Expected output:

Connections active at 2024-11-15T03:22:17Z:
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "198.51.100.42", "type": "external"}
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "10.0.1.50", "type": "internal"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-3094", "severity": "CRITICAL"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-1234", "severity": "HIGH"}
  --[TRIGGERED_ALERT]----> Alert:         {"alert_id": "ALT-2024-0891"}
  --[LOGGED_IN_FROM]<----- User:          {"username": "svc-deploy"}

Key observation: The service account svc-deploy was logged in at the time of the alert (its session was valid from 03:00 to 03:25 UTC). This is a strong indicator that the account may have been used as part of the attack chain, or at the very least, its session was active during the compromise window.

Question 2: Was the vulnerability present before the connection to the C2 appeared?

# Check one month before the alert: were the vulns already present?
one_month_before = "2024-10-15T03:22:17Z"

vulns_before = client.neighbors_at(
    node_id=web_server,
    direction="outgoing",
    timestamp=one_month_before,
    edge_type="HAS_VULNERABILITY"
)

print(f"Vulnerabilities present on {one_month_before}:")
for v in vulns_before:
    print(f"  {v['properties']['cve_id']} (CVSS {v['properties']['cvss']})")

# Check: was the C2 connection active back then?
c2_before = client.neighbors_at(
    node_id=web_server,
    direction="outgoing",
    timestamp=one_month_before,
    edge_type="CONNECTS_TO"
)

ext_ips = [n for n in c2_before if n["properties"].get("type") == "external"]
print(f"\nExternal connections active on {one_month_before}: {len(ext_ips)}")

Expected output:

Vulnerabilities present on 2024-10-15T03:22:17Z:
  CVE-2024-1234 (CVSS 8.1)

External connections active on 2024-10-15T03:22:17Z: 0

This reveals that CVE-2024-1234 (the SQL injection) was already present a month before the attack, but the C2 connection did not exist yet, and CVE-2024-3094 (the RCE) had not yet been discovered on this host. The RCE vulnerability was discovered on October 20th and likely provided the initial access that led to the C2 beacon on November 15th.

Question 3: Did the attack path to the database exist before the C2 connection appeared?

# Temporal shortest path: could the attacker reach the DB at alert time?
temporal_path = client.shortest_path_at(
    from_node=ext_ip,
    to_node=db_server,
    timestamp=alert_time
)

if temporal_path["path"]:
    print(f"Attack path existed at {alert_time}:")
    for step in temporal_path["path"]:
        print(f"  --> {step}")
else:
    print("No path existed at that time.")

# Compare: was this path available before the C2 connection was established?
before_c2 = "2024-11-15T03:00:00Z"
path_before = client.shortest_path_at(
    from_node=ext_ip,
    to_node=db_server,
    timestamp=before_c2
)

if path_before["path"]:
    print(f"\nPath also existed at {before_c2} (pre-C2 connection)")
else:
    print(f"\nNo path from external IP to DB at {before_c2}")
    print("  --> The C2 connection at 03:22 UTC created the attack path")

Expected output:

Attack path existed at 2024-11-15T03:22:17Z:
  --> IPAddress  198.51.100.42 (external)
  --> Host       web-prod-01 (DMZ)
  --> IPAddress  10.0.1.50 (internal)
  --> Host       db-prod-01 (Internal)

No path from external IP to DB at 2024-11-15T03:00:00Z
  --> The C2 connection at 03:22 UTC created the attack path

Temporal analysis reveals causality: Without temporal edges, we would only know that a path exists. With temporal edges, we can show that the path did not exist at 03:00 UTC and did exist at 03:22:17 UTC, once the CONNECTS_TO edge to 198.51.100.42 became valid at 03:22:00. This establishes the C2 connection as the event that created the attack surface to the internal database. See Chapter 9 for the full temporal graphs guide.
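
A temporal path query can be thought of as two steps: filter the edges to those valid at the timestamp, then search the filtered graph. The sketch below does this for the three CONNECTS_TO edges, which are the only edges that can link the external IP to the database in this graph. `path_exists_at` is a hypothetical name, and the undirected traversal and half-open validity interval are assumptions rather than documented AstraeaDB behavior.

```python
# (node_a, node_b, valid_from, valid_to) for the CONNECTS_TO edges in Section 15.1.
TEMPORAL_EDGES = [
    ("web-prod-01", "198.51.100.42", "2024-11-15T03:22:00Z", "2024-11-15T04:15:00Z"),
    ("web-prod-01", "10.0.1.50", "2024-01-01T00:00:00Z", "9999-12-31T23:59:59Z"),
    ("db-prod-01", "10.0.1.50", "2024-01-01T00:00:00Z", "9999-12-31T23:59:59Z"),
]


def path_exists_at(source, target, timestamp):
    """Return True if target is reachable from source using edges valid at timestamp."""
    # Uniform 'YYYY-MM-DDTHH:MM:SSZ' strings sort chronologically, so plain
    # string comparison is enough for this fixed format.
    active = [(a, b) for a, b, start, end in TEMPORAL_EDGES if start <= timestamp < end]

    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        for a, b in active:
            if node in (a, b):
                other = b if node == a else a
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return False


print(path_exists_at("198.51.100.42", "db-prod-01", "2024-11-15T03:22:17Z"))
print(path_exists_at("198.51.100.42", "db-prod-01", "2024-11-15T03:00:00Z"))
```

The string comparison shortcut only holds while every timestamp uses the same UTC format. Mixed offsets or precisions would need real datetime parsing.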

The Complete Attack Timeline

Putting it all together, we can reconstruct the complete sequence of events:

Time (UTC)	Event	Evidence in Graph
`2024-09-05`	CVE-2024-1234 (SQLi, CVSS 8.1) discovered on web-prod-01 and db-prod-01	`HAS_VULNERABILITY` edge `valid_from`
`2024-10-01`	CVE-2024-1234 patched on db-prod-01 (but NOT on web-prod-01)	`HAS_VULNERABILITY` edge `valid_to` on db-prod-01
`2024-10-20`	CVE-2024-3094 (RCE, CVSS 9.8) discovered on web-prod-01	`HAS_VULNERABILITY` edge `valid_from`
`Nov 15 02:45`	svc-deploy logs into bastion-01	`LOGGED_IN_FROM` edge `valid_from`
`Nov 15 03:00`	svc-deploy logs into web-prod-01	`LOGGED_IN_FROM` edge `valid_from`
`Nov 15 03:22`	C2 beacon detected: web-prod-01 connects to 198.51.100.42	`CONNECTS_TO` edge `valid_from` + `TRIGGERED_ALERT`
`Nov 15 03:25`	svc-deploy session on web-prod-01 ends	`LOGGED_IN_FROM` edge `valid_to`
`Nov 15 04:15`	C2 connection terminates	`CONNECTS_TO` edge `valid_to`

Suspicious correlation: The svc-deploy account logged into the web server at 03:00 UTC and the C2 beacon appeared 22 minutes later. The account's session ended at 03:25, only 3 minutes after the alert fired. This timing is consistent with a compromised service account playing a role in the intrusion, although exploitation of the unpatched RCE (CVE-2024-3094) remains the other candidate for initial access. The account should be immediately disabled and its credentials rotated across all systems.
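
The intervals quoted above are rounded. They can be derived exactly from the timestamps stored on the alert node and on the temporal edges, as in this short sketch using the standard `datetime` module (`parse_utc` is a hypothetical helper).

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


login = parse_utc("2024-11-15T03:00:00Z")        # LOGGED_IN_FROM valid_from (web-prod-01)
alert = parse_utc("2024-11-15T03:22:17Z")        # alert timestamp property
session_end = parse_utc("2024-11-15T03:25:00Z")  # LOGGED_IN_FROM valid_to (web-prod-01)
c2_start = parse_utc("2024-11-15T03:22:00Z")     # CONNECTS_TO valid_from (C2 edge)
c2_end = parse_utc("2024-11-15T04:15:00Z")       # CONNECTS_TO valid_to (C2 edge)

login_to_alert = alert - login
alert_to_logout = session_end - alert
c2_duration = c2_end - c2_start

print(f"Login to alert:      {login_to_alert}")   # 0:22:17
print(f"Alert to session end: {alert_to_logout}")  # 0:02:43
print(f"C2 connection length: {c2_duration}")      # 0:53:00
```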

15.5 Summary: Features Used in This Investigation

This single investigation leveraged nearly every major capability of AstraeaDB. The following table maps each feature to its role in the cybersecurity workflow:

Feature	API Used	Role in Investigation	Chapter
Graph Construction	`create_node`, `create_edge`	Built the threat model with hosts, IPs, vulnerabilities, alerts, attack patterns, and user accounts	Ch 4
BFS Traversal	`bfs`	Determined the blast radius—all entities reachable from the compromised host within 2 hops	Ch 6
Shortest Path	`shortest_path`	Identified the most likely attack path from the external C2 IP to the internal database	Ch 6
Neighbor Queries	`neighbors`	Enumerated all direct relationships of the compromised host for triage	Ch 6
GQL Queries	`query`	Pattern-matched complex attack chains across multiple entity types (technique to vulnerability to host to alert)	Ch 5
Vector Search	`vector_search`	Found semantically similar past threats using node embeddings	Ch 8
Hybrid Search	`hybrid_search`	Combined structural proximity and semantic similarity to surface the most relevant entities for lateral movement investigation	Ch 8
GraphRAG	`graph_rag`	Generated a complete analyst briefing by feeding the threat subgraph to an LLM	Ch 11
Temporal Neighbors	`neighbors_at`	Determined which connections and vulnerabilities were active at the exact time of the alert	Ch 9
Temporal Shortest Path	`shortest_path_at`	Proved that the attack path to the database did not exist before the C2 connection was established	Ch 9

Where to go from here: This chapter demonstrated a single investigation, but the same patterns apply to any domain where relationships matter: fraud detection (financial transaction graphs), supply chain analysis (vendor dependency graphs), network monitoring (infrastructure topology graphs), and knowledge management (organizational knowledge graphs). The techniques you have learned throughout this guide, from basic graph construction in Chapter 4 to GraphRAG in Chapter 11, compose to solve problems that are fundamentally difficult in relational databases. Explore the Appendices for quick reference cards, or visit the GitHub repository to start building.
