Chapter 15: Real-World Scenario — Cybersecurity Threat Investigation

This capstone chapter ties together nearly every feature of AstraeaDB—graph construction, traversals, GQL queries, vector search, hybrid search, GraphRAG, and temporal analysis—in a single, realistic scenario. You will build a cybersecurity threat graph from scratch, investigate an active alert, enrich your findings with AI, and reconstruct the attack timeline. By the end, you will have seen how graph thinking transforms security operations from reactive log-chasing into proactive, contextual investigation.

15.1 Building the Threat Graph

Every cybersecurity investigation begins with a model of the environment. In a traditional SIEM, this model is implicit—scattered across log tables, asset inventories, and vulnerability scanners. In AstraeaDB, we make it explicit by constructing a graph where every entity is a node and every relationship is an edge.

The Schema

Our threat graph uses six node types and five edge types:

| Node Label | Description | Example Properties |
|---|---|---|
| Host | A server, workstation, or container in the environment | hostname, os, tier |
| IPAddress | An internal or external IP address | ip, type (internal/external) |
| Vulnerability | A known CVE affecting a host | cve_id, severity, cvss |
| Alert | A security alert triggered by a detection rule | alert_id, rule, severity |
| AttackPattern | A MITRE ATT&CK technique or tactic | technique_id, name, tactic |
| User | A human or service account | username, role, department |

| Edge Type | From | To | Meaning |
|---|---|---|---|
| CONNECTS_TO | Host | IPAddress | The host has communicated with this IP |
| HAS_VULNERABILITY | Host | Vulnerability | The host is affected by this CVE |
| TRIGGERED_ALERT | Host | Alert | An alert was raised on this host |
| EXPLOITS | AttackPattern | Vulnerability | This attack technique exploits this vulnerability |
| LOGGED_IN_FROM | User | Host | The user authenticated to this host |
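
The edge table can also serve as a validation rule when loading data. The following is a minimal sketch in plain Python, independent of the AstraeaDB client. The names `EDGE_SCHEMA` and `is_valid_edge` are hypothetical and introduced only for illustration; the chapter does not say whether AstraeaDB enforces a schema itself.

```python
# Allowed (source label, target label) pair for each edge type in the schema.
# EDGE_SCHEMA and is_valid_edge are illustrative names, not part of AstraeaDB.
EDGE_SCHEMA = {
    "CONNECTS_TO": ("Host", "IPAddress"),
    "HAS_VULNERABILITY": ("Host", "Vulnerability"),
    "TRIGGERED_ALERT": ("Host", "Alert"),
    "EXPLOITS": ("AttackPattern", "Vulnerability"),
    "LOGGED_IN_FROM": ("User", "Host"),
}


def is_valid_edge(edge_type: str, from_label: str, to_label: str) -> bool:
    """Return True if the edge type exists and connects the expected labels."""
    return EDGE_SCHEMA.get(edge_type) == (from_label, to_label)


print(is_valid_edge("CONNECTS_TO", "Host", "IPAddress"))   # True
print(is_valid_edge("EXPLOITS", "Host", "Vulnerability"))  # False
```

A check like this, run before each create_edge call, would catch reversed or mislabeled edges early.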

Constructing the Graph in Python

The following code builds the complete threat graph. Notice how some nodes carry embeddings—vector representations of their descriptions. These will power semantic search in Section 15.3.

from astraeadb import AstraeaClient
import numpy as np

client = AstraeaClient("localhost:7687")

# ── Hosts ─────────────────────────────────────────────────────────
web_server = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "web-prod-01",
        "os": "Ubuntu 22.04",
        "tier": "DMZ",
        "description": "Public-facing web server hosting customer portal"
    },
    # Placeholder: a random 128-dimensional vector stands in for a real
    # embedding of the host description.
    embedding=np.random.rand(128).tolist()
)

db_server = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "db-prod-01",
        "os": "Ubuntu 22.04",
        "tier": "Internal",
        "description": "Primary PostgreSQL database server"
    },
    embedding=np.random.rand(128).tolist()
)

jump_host = client.create_node(
    labels=["Host"],
    properties={
        "hostname": "bastion-01",
        "os": "Amazon Linux 2",
        "tier": "Management",
        "description": "SSH jump host for administrative access"
    }
)

# ── IP Addresses ──────────────────────────────────────────────────
ext_ip = client.create_node(
    labels=["IPAddress"],
    properties={"ip": "198.51.100.42", "type": "external", "geo": "Eastern Europe"}
)

int_ip = client.create_node(
    labels=["IPAddress"],
    properties={"ip": "10.0.1.50", "type": "internal"}
)

# ── Vulnerabilities ───────────────────────────────────────────────
vuln_rce = client.create_node(
    labels=["Vulnerability"],
    properties={
        "cve_id": "CVE-2024-3094",
        "severity": "CRITICAL",
        "cvss": 9.8,
        "description": "Remote code execution via crafted HTTP request"
    },
    embedding=np.random.rand(128).tolist()
)

vuln_sqli = client.create_node(
    labels=["Vulnerability"],
    properties={
        "cve_id": "CVE-2024-1234",
        "severity": "HIGH",
        "cvss": 8.1,
        "description": "SQL injection in authentication module"
    },
    embedding=np.random.rand(128).tolist()
)

# ── Alert ─────────────────────────────────────────────────────────
alert = client.create_node(
    labels=["Alert"],
    properties={
        "alert_id": "ALT-2024-0891",
        "rule": "Suspicious outbound connection to known C2 IP",
        "severity": "CRITICAL",
        "timestamp": "2024-11-15T03:22:17Z",
        "description": "Outbound beacon detected to known command-and-control server"
    },
    embedding=np.random.rand(128).tolist()
)

# ── Attack Pattern (MITRE ATT&CK) ────────────────────────────────
attack = client.create_node(
    labels=["AttackPattern"],
    properties={
        "technique_id": "T1190",
        "name": "Exploit Public-Facing Application",
        "tactic": "Initial Access",
        "description": "Adversary exploits a vulnerability in an internet-facing application"
    },
    embedding=np.random.rand(128).tolist()
)

# ── User ──────────────────────────────────────────────────────────
admin_user = client.create_node(
    labels=["User"],
    properties={
        "username": "svc-deploy",
        "role": "service_account",
        "department": "DevOps"
    }
)

# ── Edges ─────────────────────────────────────────────────────────
# Network connections (temporal: valid_from / valid_to)
client.create_edge(web_server, ext_ip, "CONNECTS_TO",
    properties={"port": 443, "protocol": "HTTPS"},
    weight=0.9,
    valid_from="2024-11-15T03:22:00Z",
    valid_to="2024-11-15T04:15:00Z"
)

client.create_edge(web_server, int_ip, "CONNECTS_TO",
    properties={"port": 5432, "protocol": "TCP"},
    weight=0.3,
    valid_from="2024-01-01T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(db_server, int_ip, "CONNECTS_TO",
    properties={"port": 5432, "protocol": "TCP"},
    weight=0.2,
    valid_from="2024-01-01T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

# Vulnerabilities
client.create_edge(web_server, vuln_rce, "HAS_VULNERABILITY",
    properties={"discovered": "2024-10-20", "status": "unpatched"},
    weight=0.95,
    valid_from="2024-10-20T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(web_server, vuln_sqli, "HAS_VULNERABILITY",
    properties={"discovered": "2024-09-05", "status": "unpatched"},
    weight=0.8,
    valid_from="2024-09-05T00:00:00Z",
    valid_to="9999-12-31T23:59:59Z"
)

client.create_edge(db_server, vuln_sqli, "HAS_VULNERABILITY",
    properties={"discovered": "2024-09-05", "status": "patched"},
    weight=0.1,
    valid_from="2024-09-05T00:00:00Z",
    valid_to="2024-10-01T00:00:00Z"
)

# Alert
client.create_edge(web_server, alert, "TRIGGERED_ALERT",
    properties={"source": "EDR"},
    weight=1.0
)

# Attack pattern exploits vulnerability
client.create_edge(attack, vuln_rce, "EXPLOITS",
    properties={"confidence": "high"},
    weight=0.95
)

client.create_edge(attack, vuln_sqli, "EXPLOITS",
    properties={"confidence": "medium"},
    weight=0.6
)

# User logins (temporal)
client.create_edge(admin_user, jump_host, "LOGGED_IN_FROM",
    properties={"method": "ssh_key"},
    weight=0.5,
    valid_from="2024-11-15T02:45:00Z",
    valid_to="2024-11-15T03:30:00Z"
)

client.create_edge(admin_user, web_server, "LOGGED_IN_FROM",
    properties={"method": "ssh_key"},
    weight=0.7,
    valid_from="2024-11-15T03:00:00Z",
    valid_to="2024-11-15T03:25:00Z"
)

print("Threat graph constructed: 10 nodes, 11 edges")

At this point, the graph contains the following topology:

(svc-deploy:User)
    --LOGGED_IN_FROM [02:45-03:30]--> (bastion-01:Host)
    --LOGGED_IN_FROM [03:00-03:25]--> (web-prod-01:Host)

(web-prod-01:Host)
    --CONNECTS_TO [03:22-04:15]--> (198.51.100.42:IPAddress)
    --CONNECTS_TO [permanent]----> (10.0.1.50:IPAddress)
    --HAS_VULN-------------------> (CVE-2024-3094:Vuln)
    --HAS_VULN-------------------> (CVE-2024-1234:Vuln)
    --TRIGGERED_ALERT------------> (ALT-2024-0891:Alert)

(db-prod-01:Host)
    --CONNECTS_TO [permanent]----> (10.0.1.50:IPAddress)

(T1190:AttackPattern)
    --EXPLOITS--> (CVE-2024-3094:Vuln)
    --EXPLOITS--> (CVE-2024-1234:Vuln)

Why embeddings on some nodes? Not every node needs an embedding. We attach them to nodes whose description field carries semantic meaning—vulnerabilities, alerts, attack patterns, and key hosts. This lets us use vector search (Section 15.3) to find entities that are conceptually similar, not just structurally connected. In a production system, you would generate these embeddings using a model like text-embedding-3-small rather than random vectors.
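
Later listings in this chapter call an `embed` function without defining it. The sketch below is a hypothetical stand-in so that those listings can run locally: it derives a deterministic pseudo-random unit vector from the text. It is not a real embedding model and carries no semantic meaning. The function name, the hashing approach, and the 128-dimension default (chosen to match the random vectors above) are all assumptions made for illustration.

```python
import hashlib
import math
import random
from typing import List

EMBEDDING_DIM = 128  # matches the 128-dimensional placeholder vectors above


def embed(text: str, dim: int = EMBEDDING_DIM) -> List[float]:
    """Hypothetical stand-in for an embedding model (no semantic meaning).

    Seeds a random generator from a hash of the text, so the same text
    always maps to the same unit-length vector.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    vector = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


vector = embed("Public-facing web server hosting customer portal")
print(len(vector))
```

Whatever model replaces this stub, query vectors and stored node embeddings need the same dimensionality.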

15.2 Investigating an Alert

At 03:22 UTC, alert ALT-2024-0891 fires: "Suspicious outbound connection to known C2 IP" on web-prod-01. The SOC analyst opens the AstraeaDB console and begins the investigation.

Step 1: Determine the Blast Radius (BFS)

The first question is: what else is reachable from the compromised host? A breadth-first search from web-prod-01 reveals every entity within two hops—the immediate blast radius.

# BFS from the web server, max depth of 2
blast_radius = client.bfs(start=web_server, max_depth=2)

print(f"Blast radius: {len(blast_radius)} entities reachable within 2 hops")
for node in blast_radius:
    print(f"  [{node['labels'][0]}] {node['properties']}")

Expected output:

Blast radius: 9 entities reachable within 2 hops
  [IPAddress]     {"ip": "198.51.100.42", "type": "external", "geo": "Eastern Europe"}
  [IPAddress]     {"ip": "10.0.1.50", "type": "internal"}
  [Vulnerability] {"cve_id": "CVE-2024-3094", "severity": "CRITICAL", "cvss": 9.8}
  [Vulnerability] {"cve_id": "CVE-2024-1234", "severity": "HIGH", "cvss": 8.1}
  [Alert]         {"alert_id": "ALT-2024-0891", "severity": "CRITICAL"}
  [Host]          {"hostname": "db-prod-01", "tier": "Internal"}
  [AttackPattern] {"technique_id": "T1190", "name": "Exploit Public-Facing Application"}
  [User]          {"username": "svc-deploy", "role": "service_account"}
  [Host]          {"hostname": "bastion-01", "tier": "Management"}

Critical finding: The blast radius includes db-prod-01 (the production database) via the internal IP 10.0.1.50. If the attacker has achieved code execution on the web server, the database is only two hops away. This must be escalated immediately.
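
To make the traversal concrete, here is a self-contained sketch of a depth-limited breadth-first search over the edges from Section 15.1. It assumes that edges are followed in both directions, which is what the presence of db-prod-01 and svc-deploy in the result implies. How the `bfs` call behaves internally is not documented in this chapter, so treat `bfs_depths` as an illustration rather than the library's algorithm.

```python
from collections import deque

# Edges from Section 15.1 as (source, target) pairs.
EDGES = [
    ("web-prod-01", "198.51.100.42"),
    ("web-prod-01", "10.0.1.50"),
    ("db-prod-01", "10.0.1.50"),
    ("web-prod-01", "CVE-2024-3094"),
    ("web-prod-01", "CVE-2024-1234"),
    ("db-prod-01", "CVE-2024-1234"),
    ("web-prod-01", "ALT-2024-0891"),
    ("T1190", "CVE-2024-3094"),
    ("T1190", "CVE-2024-1234"),
    ("svc-deploy", "bastion-01"),
    ("svc-deploy", "web-prod-01"),
]


def bfs_depths(edges, start, max_depth):
    """Return {node: hop count} for nodes within max_depth, ignoring direction."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    del depths[start]  # the start node is not part of its own blast radius
    return depths


radius = bfs_depths(EDGES, "web-prod-01", max_depth=2)
for name, depth in sorted(radius.items(), key=lambda item: (item[1], item[0])):
    print(f"{depth} hop(s): {name}")
```

Under this assumption the traversal also reaches bastion-01 at two hops, through the svc-deploy account.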

Step 2: Find the Attack Path (Shortest Path)

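How could the attacker reach the database from the external IP? A weighted shortest-path query returns the route with the lowest total edge weight, using the risk-score weights assigned to each edge in Section 15.1. For the route below, the cost is the sum of three CONNECTS_TO weights: 0.9 + 0.3 + 0.2 = 1.4.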

# Weighted shortest path: external IP --> database server
path = client.shortest_path(
    from_node=ext_ip,
    to_node=db_server,
    weighted=True
)

print(f"Attack path (cost: {path['cost']}):")
for step in path["path"]:
    print(f"  --> {step}")

Expected output:

Attack path (cost: 1.4):
  --> IPAddress  198.51.100.42 (external)
  --> Host       web-prod-01 (DMZ)
  --> IPAddress  10.0.1.50 (internal)
  --> Host       db-prod-01 (Internal)
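
The reported cost can be reproduced with a small Dijkstra implementation over the weighted edges from Section 15.1. This is a sketch under two assumptions: edges may be traversed in either direction, and the path cost is the plain sum of the edge weights. The chapter does not state which algorithm `shortest_path` uses.

```python
import heapq

# (source, target, weight) for the edges that can link the two endpoints.
WEIGHTED_EDGES = [
    ("web-prod-01", "198.51.100.42", 0.9),
    ("web-prod-01", "10.0.1.50", 0.3),
    ("db-prod-01", "10.0.1.50", 0.2),
    ("web-prod-01", "CVE-2024-1234", 0.8),
    ("db-prod-01", "CVE-2024-1234", 0.1),
]


def dijkstra(edges, source, target):
    """Return (total cost, path) of the lowest-cost route, ignoring direction."""
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, []).append((dst, weight))
        adjacency.setdefault(dst, []).append((src, weight))

    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


cost, path = dijkstra(WEIGHTED_EDGES, "198.51.100.42", "db-prod-01")
print(round(cost, 2), " -> ".join(path))
```

The only alternative route in this graph runs through the shared vulnerability CVE-2024-1234 and costs 0.9 + 0.8 + 0.1 = 1.8, so the route through 10.0.1.50 is the cheaper one.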

Step 3: Examine Neighbors (Direct Relationships)

Next, the analyst inspects the web server's immediate relationships to understand its full exposure surface.

# What is directly connected to the compromised web server?
neighbors = client.neighbors(
    node_id=web_server,
    direction="both",
    edge_type=None  # all edge types
)

print(f"web-prod-01 has {len(neighbors)} direct relationships:")
for n in neighbors:
    print(f"  --[{n['edge_type']}]--> {n['labels'][0]}: {n['properties']}")

Expected output:

web-prod-01 has 6 direct relationships:
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "198.51.100.42", "type": "external"}
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "10.0.1.50", "type": "internal"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-3094", "severity": "CRITICAL"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-1234", "severity": "HIGH"}
  --[TRIGGERED_ALERT]----> Alert:         {"alert_id": "ALT-2024-0891", "severity": "CRITICAL"}
  --[LOGGED_IN_FROM]<----- User:          {"username": "svc-deploy", "role": "service_account"}

Step 4: Targeted GQL Queries

For more precise questions, the analyst drops into GQL. These queries demonstrate the power of declarative pattern matching for security investigations.

Query 1: Find all CRITICAL or HIGH vulnerabilities on the compromised host

# GQL: Find critical/high vulnerabilities on web-prod-01
result = client.query("""
    MATCH (h:Host)-[:HAS_VULNERABILITY]->(v:Vulnerability)
    WHERE h.hostname = 'web-prod-01'
      AND v.severity IN ['CRITICAL', 'HIGH']
    RETURN v.cve_id, v.severity, v.cvss, v.description
    ORDER BY v.cvss DESC
""")

print("Vulnerabilities on compromised host:")
print(f"  Columns: {result['columns']}")
for row in result["rows"]:
    print(f"  {row[0]} | {row[1]} | CVSS {row[2]} | {row[3]}")
print(f"  ({result['stats']['rows_returned']} rows)")

Expected output:

Vulnerabilities on compromised host:
  Columns: ['v.cve_id', 'v.severity', 'v.cvss', 'v.description']
  CVE-2024-3094 | CRITICAL | CVSS 9.8 | Remote code execution via crafted HTTP request
  CVE-2024-1234 | HIGH     | CVSS 8.1 | SQL injection in authentication module
  (2 rows)

Query 2: Find all hosts that connect to the same suspicious external IP

# GQL: Which hosts have talked to the C2 IP?
result = client.query("""
    MATCH (h:Host)-[:CONNECTS_TO]->(ip:IPAddress)
    WHERE ip.ip = '198.51.100.42'
    RETURN h.hostname, h.tier, h.os
""")

print("Hosts communicating with suspected C2 IP:")
for row in result["rows"]:
    print(f"  {row[0]} (tier: {row[1]}, os: {row[2]})")

Query 3: Trace the full attack chain from technique to impact

# GQL: Full attack chain -- technique -> vulnerability -> host -> alert
result = client.query("""
    MATCH (a:AttackPattern)-[:EXPLOITS]->(v:Vulnerability)
          <-[:HAS_VULNERABILITY]-(h:Host)-[:TRIGGERED_ALERT]->(alert:Alert)
    WHERE alert.severity = 'CRITICAL'
    RETURN a.technique_id, a.name,
           v.cve_id, v.cvss,
           h.hostname,
           alert.alert_id, alert.rule
""")

print("Full attack chain:")
for row in result["rows"]:
    print(f"  Technique: {row[0]} ({row[1]})")
    print(f"  Exploits:  {row[2]} (CVSS {row[3]})")
    print(f"  Target:    {row[4]}")
    print(f"  Alert:     {row[5]} -- {row[6]}")
    print()

Expected output:

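Full attack chain:
  Technique: T1190 (Exploit Public-Facing Application)
  Exploits:  CVE-2024-3094 (CVSS 9.8)
  Target:    web-prod-01
  Alert:     ALT-2024-0891 -- Suspicious outbound connection to known C2 IP

  Technique: T1190 (Exploit Public-Facing Application)
  Exploits:  CVE-2024-1234 (CVSS 8.1)
  Target:    web-prod-01
  Alert:     ALT-2024-0891 -- Suspicious outbound connection to known C2 IP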

Graph advantage: In a relational database, the query above would require joining several tables (one per entity type, plus link tables for the relationships). In AstraeaDB, it is a single pattern match that follows the natural structure of the data. The graph model turns multi-table joins into path expressions. See Chapter 5 for the complete GQL guide.
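
For comparison, here is roughly what the same attack-chain question looks like in SQL, run against an in-memory SQLite database. The table layout (one table per node type, one link table per edge type) and all table and column names are assumptions made for this sketch; a real relational schema could be organized differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attack_pattern (id INTEGER PRIMARY KEY, technique_id TEXT);
    CREATE TABLE vulnerability  (id INTEGER PRIMARY KEY, cve_id TEXT);
    CREATE TABLE host           (id INTEGER PRIMARY KEY, hostname TEXT);
    CREATE TABLE alert          (id INTEGER PRIMARY KEY, alert_id TEXT, severity TEXT);
    CREATE TABLE exploits          (attack_id INTEGER, vuln_id INTEGER);
    CREATE TABLE has_vulnerability (host_id INTEGER, vuln_id INTEGER);
    CREATE TABLE triggered_alert   (host_id INTEGER, alert_id INTEGER);

    INSERT INTO attack_pattern VALUES (1, 'T1190');
    INSERT INTO vulnerability  VALUES (1, 'CVE-2024-3094'), (2, 'CVE-2024-1234');
    INSERT INTO host           VALUES (1, 'web-prod-01'), (2, 'db-prod-01');
    INSERT INTO alert          VALUES (1, 'ALT-2024-0891', 'CRITICAL');
    INSERT INTO exploits          VALUES (1, 1), (1, 2);
    INSERT INTO has_vulnerability VALUES (1, 1), (1, 2), (2, 2);
    INSERT INTO triggered_alert   VALUES (1, 1);
""")

# Six joins are needed to walk technique -> vulnerability -> host -> alert.
rows = conn.execute("""
    SELECT a.technique_id, v.cve_id, h.hostname, al.alert_id
    FROM attack_pattern a
    JOIN exploits e           ON e.attack_id = a.id
    JOIN vulnerability v      ON v.id = e.vuln_id
    JOIN has_vulnerability hv ON hv.vuln_id = v.id
    JOIN host h               ON h.id = hv.host_id
    JOIN triggered_alert ta   ON ta.host_id = h.id
    JOIN alert al             ON al.id = ta.alert_id
    WHERE al.severity = 'CRITICAL'
    ORDER BY v.cve_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Each hop in the graph pattern becomes a pair of joins here, one onto the link table and one onto the entity table. The join count therefore grows with the length of the chain.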

15.3 Enriching with AI

The structural investigation from Section 15.2 told us what is connected and how. Now we use AstraeaDB's AI capabilities to understand what it means—finding similar past threats, discovering hidden relationships, and generating analyst-ready briefings.

Vector Search: Find Similar Past Threats

Given the embedding of the current alert, vector search finds historically similar events—even if they share no structural connection to the current graph. This is especially powerful in large environments with thousands of past incidents.

# Generate an embedding for the current threat description
query_text = "Outbound beacon to command-and-control server via HTTPS"
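# embed() stands in for your embedding model (e.g., text-embedding-3-small).
# Its output must have the same dimensionality as the stored node embeddings.
query_vector = embed(query_text)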

# Find the 5 most semantically similar nodes in the graph
similar = client.vector_search(query_vector=query_vector, k=5)

print("Semantically similar entities:")
for hit in similar:
    print(f"  [{hit['labels'][0]}] score={hit['score']:.3f}")
    print(f"    {hit['properties'].get('description', hit['properties'])}")

Expected output:

Semantically similar entities:
  [Alert]         score=0.943
    Outbound beacon detected to known command-and-control server
  [AttackPattern] score=0.871
    Adversary exploits a vulnerability in an internet-facing application
  [Vulnerability] score=0.812
    Remote code execution via crafted HTTP request
  [Host]          score=0.754
    Public-facing web server hosting customer portal
  [Vulnerability] score=0.698
    SQL injection in authentication module

How vector search helps: In a real environment with thousands of past incidents, vector search can surface a historical alert from six months ago that used the same C2 infrastructure but targeted a different host. This gives the analyst immediate context: "We have seen this actor before." See Chapter 8 for a full treatment of vector search.
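
Conceptually, a vector search ranks stored embeddings by their similarity to the query vector. The sketch below does this by brute force with cosine similarity in plain Python. Both the choice of cosine similarity and the exhaustive scan are assumptions for illustration; the chapter does not specify which metric or index structure `vector_search` uses. The three-dimensional vectors are invented toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors, invented for illustration only.
candidates = {
    "alert": [0.9, 0.1, 0.0],
    "vulnerability": [0.5, 0.5, 0.0],
    "user": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.0, 0.0], candidates, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

An exhaustive scan like this is linear in the number of stored vectors, which is why production systems typically rely on an approximate nearest-neighbor index instead.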

Hybrid Search: Structure + Semantics Combined

Hybrid search is AstraeaDB's most powerful search mode. It combines graph proximity (how close a node is structurally) with semantic similarity (how close a node is conceptually). The alpha parameter controls the blend: alpha=1.0 is pure vector search, alpha=0.0 is pure graph traversal.

# Hybrid search: start from the web server, blend structure + semantics
# Looking for nodes related to lateral movement near the compromised host
results = client.hybrid_search(
    anchor=web_server,
    query_vector=embed("lateral movement privilege escalation data exfiltration"),
    max_hops=3,
    k=5,
    alpha=0.6  # 60% semantic, 40% structural
)

print("Hybrid search results (structure + semantics):")
for r in results:
    print(f"  [{r['labels'][0]}] hybrid_score={r['score']:.3f}")
    print(f"    {r['properties']}")

Expected output:

Hybrid search results (structure + semantics):
  [User]          hybrid_score=0.891
    {"username": "svc-deploy", "role": "service_account", "department": "DevOps"}
  [IPAddress]     hybrid_score=0.847
    {"ip": "10.0.1.50", "type": "internal"}
  [Host]          hybrid_score=0.823
    {"hostname": "db-prod-01", "tier": "Internal", "os": "Ubuntu 22.04"}
  [Vulnerability] hybrid_score=0.781
    {"cve_id": "CVE-2024-3094", "severity": "CRITICAL", "cvss": 9.8}
  [Host]          hybrid_score=0.695
    {"hostname": "bastion-01", "tier": "Management", "os": "Amazon Linux 2"}

The results prioritize nodes that are both near the compromised host in the graph and semantically related to lateral movement. This surfaces the service account svc-deploy (which logged into the web server shortly before the alert) and the internal IP 10.0.1.50 (the path to the database)—exactly the entities an analyst would want to investigate next.
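
One way to picture the effect of alpha is as a linear blend of a semantic score and a structural score. The formula below is an assumption made purely for illustration: it treats structural proximity as 1 / (1 + hops) and mixes it with semantic similarity using alpha. AstraeaDB's actual scoring function is not given in this chapter and may differ.

```python
def hybrid_score(semantic: float, hops: int, alpha: float) -> float:
    """Blend semantic similarity with a hop-based proximity score.

    Assumption: proximity decays as 1 / (1 + hops). This is an illustrative
    formula, not AstraeaDB's documented scoring function.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    proximity = 1.0 / (1.0 + hops)
    return alpha * semantic + (1.0 - alpha) * proximity


# Hypothetical candidates: (name, semantic similarity, hops from the anchor).
candidates = [
    ("near-but-unrelated", 0.20, 1),
    ("far-but-relevant", 0.90, 3),
]

for alpha in (0.0, 0.6, 1.0):
    ranked = sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2], alpha),
        reverse=True,
    )
    print(alpha, [name for name, _, _ in ranked])
```

With alpha at 0.0 the nearby node ranks first. At 0.6 and 1.0 the semantically relevant node overtakes it, which matches the behavior described for the two extremes of the parameter.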

GraphRAG: Automated Analyst Briefing

Finally, we use GraphRAG to generate a natural-language briefing. AstraeaDB retrieves the relevant subgraph around the alert, serializes it as structured context, and sends it to an LLM, all in a single call.

# Generate an AI-powered threat briefing
briefing = client.graph_rag(
    question="Provide a threat analysis briefing for alert ALT-2024-0891. "
             "Include the likely attack vector, blast radius, affected assets, "
             "and recommended containment actions.",
    anchor=alert,
    hops=3,
    max_nodes=50,
    format="markdown"
)

print(briefing["answer"])

Example output (generated by the LLM from graph context):

## Threat Analysis Briefing: ALT-2024-0891

**Severity:** CRITICAL
**Timestamp:** 2024-11-15T03:22:17Z

### Summary
Alert ALT-2024-0891 indicates a command-and-control (C2) beacon from
web-prod-01 to external IP 198.51.100.42 (geolocated to Eastern Europe).
The host has two unpatched vulnerabilities: CVE-2024-3094 (CVSS 9.8, RCE)
and CVE-2024-1234 (CVSS 8.1, SQLi). The likely initial access vector is
MITRE ATT&CK T1190 (Exploit Public-Facing Application).

### Blast Radius
- **Direct risk:** db-prod-01 (production database) is reachable from the
  compromised host via internal IP 10.0.1.50 on port 5432.
- **Account compromise:** Service account svc-deploy authenticated to
  web-prod-01 at 03:00 UTC, 22 minutes before the alert. This account
  also has access to bastion-01.

### Recommended Actions
1. **Isolate** web-prod-01 from the network immediately.
2. **Rotate** credentials for svc-deploy and audit its access logs.
3. **Patch** CVE-2024-3094 and CVE-2024-1234 across all affected hosts.
4. **Block** outbound traffic to 198.51.100.42 at the firewall.
5. **Investigate** db-prod-01 for signs of lateral movement.

Why GraphRAG outperforms naive RAG: A standard vector-only RAG system would retrieve the alert and perhaps the vulnerability descriptions. GraphRAG retrieves the entire connected subgraph: the hosts, IPs, user accounts, attack patterns, and their relationships. This gives the LLM the structural context it needs to reason about attack paths and blast radius, so the briefing can name specific affected assets and containment steps. See Chapter 11 for the full GraphRAG guide.
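
The retrieve-and-serialize step can be sketched in a few lines. The function below collects the edges within a given number of hops of an anchor node and renders them as text triples that could be placed in an LLM prompt. The triple format, the function name `serialize_subgraph`, and the prompt wording are assumptions for illustration. How `graph_rag` serializes context internally is not described here, and the LLM call itself is omitted.

```python
def serialize_subgraph(edges, anchor, hops):
    """Collect edges within `hops` of `anchor` and render them as text triples."""
    frontier = {anchor}
    seen_nodes = {anchor}
    selected = []
    for _ in range(hops):
        next_frontier = set()
        for src, edge_type, dst in edges:
            touches_frontier = src in frontier or dst in frontier
            if touches_frontier and (src, edge_type, dst) not in selected:
                selected.append((src, edge_type, dst))
                for node in (src, dst):
                    if node not in seen_nodes:
                        seen_nodes.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return "\n".join(f"({s}) -[{t}]-> ({d})" for s, t, d in selected)


# A subset of the edges from Section 15.1.
EDGES = [
    ("web-prod-01", "TRIGGERED_ALERT", "ALT-2024-0891"),
    ("web-prod-01", "CONNECTS_TO", "198.51.100.42"),
    ("web-prod-01", "HAS_VULNERABILITY", "CVE-2024-3094"),
    ("T1190", "EXPLOITS", "CVE-2024-3094"),
]

context = serialize_subgraph(EDGES, anchor="ALT-2024-0891", hops=2)
prompt = f"Graph context:\n{context}\n\nQuestion: What is the likely attack vector?"
print(prompt)
```

With two hops the context stops at the vulnerability. The EXPLOITS edge from T1190 only appears at three hops, which is one reason the listing above passes hops=3.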

15.4 Temporal Analysis

The most important question in any incident response is: when did this happen? AstraeaDB's temporal edges (introduced in Chapter 9) let us reconstruct the exact timeline of the attack by querying the graph as it existed at specific points in time.

Reconstructing the Timeline

Each edge in our threat graph carries valid_from and valid_to timestamps. This allows us to answer time-specific questions without maintaining separate snapshots of the data.

Question 1: What was connected to the web server at the time of the alert?

# Neighbors at the exact moment of the alert
alert_time = "2024-11-15T03:22:17Z"

neighbors_at_alert = client.neighbors_at(
    node_id=web_server,
    direction="both",
    timestamp=alert_time,
    edge_type=None
)

print(f"Connections active at {alert_time}:")
for n in neighbors_at_alert:
    print(f"  --[{n['edge_type']}]--> {n['labels'][0]}: {n['properties']}")

Expected output:

Connections active at 2024-11-15T03:22:17Z:
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "198.51.100.42", "type": "external"}
  --[CONNECTS_TO]--------> IPAddress:     {"ip": "10.0.1.50", "type": "internal"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-3094", "severity": "CRITICAL"}
  --[HAS_VULNERABILITY]--> Vulnerability: {"cve_id": "CVE-2024-1234", "severity": "HIGH"}
  --[TRIGGERED_ALERT]----> Alert:         {"alert_id": "ALT-2024-0891"}
  --[LOGGED_IN_FROM]<----- User:          {"username": "svc-deploy"}

Key observation: The service account svc-deploy was logged in at the time of the alert (its session was valid from 03:00 to 03:25 UTC). This is a strong indicator that the account may have been used as part of the attack chain, or at the very least, its session was active during the compromise window.
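
The core of a point-in-time query is an interval check on each edge. The sketch below shows that check in plain Python. It assumes the validity interval is closed at both ends and that an edge created without valid_from or valid_to (such as TRIGGERED_ALERT above) counts as always valid. The chapter does not state how `neighbors_at` treats either case.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def active_at(edge: dict, timestamp: str) -> bool:
    """True if the edge's validity interval contains the timestamp.

    Assumptions: the interval is closed at both ends, and a missing
    valid_from / valid_to means the edge is unbounded on that side.
    """
    moment = parse_utc(timestamp)
    starts = edge.get("valid_from")
    ends = edge.get("valid_to")
    if starts is not None and moment < parse_utc(starts):
        return False
    if ends is not None and moment > parse_utc(ends):
        return False
    return True


c2_edge = {"valid_from": "2024-11-15T03:22:00Z", "valid_to": "2024-11-15T04:15:00Z"}
login_edge = {"valid_from": "2024-11-15T03:00:00Z", "valid_to": "2024-11-15T03:25:00Z"}
alert_edge = {}  # created without a validity interval

for name, edge in [("c2", c2_edge), ("login", login_edge), ("alert", alert_edge)]:
    at_alert = active_at(edge, "2024-11-15T03:22:17Z")
    at_login = active_at(edge, "2024-11-15T03:00:00Z")
    print(name, at_alert, at_login)
```

At the alert time all three edges are active. At 03:00:00 the C2 edge is not yet valid, while the login edge is active only because the interval is treated as inclusive of its start.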

Question 2: Was the vulnerability present before the connection to the C2 appeared?

# Check one month before the alert: were the vulns already present?
one_month_before = "2024-10-15T03:22:17Z"

vulns_before = client.neighbors_at(
    node_id=web_server,
    direction="outgoing",
    timestamp=one_month_before,
    edge_type="HAS_VULNERABILITY"
)

print(f"Vulnerabilities present on {one_month_before}:")
for v in vulns_before:
    print(f"  {v['properties']['cve_id']} (CVSS {v['properties']['cvss']})")

# Check: was the C2 connection active back then?
c2_before = client.neighbors_at(
    node_id=web_server,
    direction="outgoing",
    timestamp=one_month_before,
    edge_type="CONNECTS_TO"
)

ext_ips = [n for n in c2_before if n["properties"].get("type") == "external"]
print(f"\nExternal connections active on {one_month_before}: {len(ext_ips)}")

Expected output:

Vulnerabilities present on 2024-10-15T03:22:17Z:
  CVE-2024-1234 (CVSS 8.1)

External connections active on 2024-10-15T03:22:17Z: 0

This reveals that CVE-2024-1234 (the SQL injection) was already present a month before the attack, but the C2 connection did not exist yet, and CVE-2024-3094 (the RCE) had not yet been discovered on this host. The RCE vulnerability was discovered on October 20th and likely provided the initial access that led to the C2 beacon on November 15th.

Question 3: Did the attack path to the database exist before the C2 connection appeared?

# Temporal shortest path: could the attacker reach the DB at alert time?
temporal_path = client.shortest_path_at(
    from_node=ext_ip,
    to_node=db_server,
    timestamp=alert_time
)

if temporal_path["path"]:
    print(f"Attack path existed at {alert_time}:")
    for step in temporal_path["path"]:
        print(f"  --> {step}")
else:
    print("No path existed at that time.")

# Compare: was this path available before the C2 connection was established?
before_c2 = "2024-11-15T03:00:00Z"
path_before = client.shortest_path_at(
    from_node=ext_ip,
    to_node=db_server,
    timestamp=before_c2
)

if path_before["path"]:
    print(f"\nPath also existed at {before_c2} (pre-C2 connection)")
else:
    print(f"\nNo path from external IP to DB at {before_c2}")
    print("  --> The C2 connection at 03:22 UTC created the attack path")

Expected output:

Attack path existed at 2024-11-15T03:22:17Z:
  --> IPAddress  198.51.100.42 (external)
  --> Host       web-prod-01 (DMZ)
  --> IPAddress  10.0.1.50 (internal)
  --> Host       db-prod-01 (Internal)

No path from external IP to DB at 2024-11-15T03:00:00Z
  --> The C2 connection at 03:22 UTC created the attack path

Temporal analysis reveals causality: Without temporal edges, we would only know that a path exists. With temporal edges, we can prove that the path did not exist before 03:22 UTC and did exist after. This establishes the C2 connection as the event that created the attack surface to the internal database. See Chapter 9 for the full temporal graphs guide.
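
The before-and-after comparison amounts to a reachability check restricted to edges that are valid at the chosen instant. The following sketch reproduces it for the three CONNECTS_TO edges from Section 15.1. It ignores edge direction and weights, both simplifying assumptions, and `reachable_at` is a hypothetical helper rather than the algorithm behind `shortest_path_at`.

```python
from collections import deque
from datetime import datetime

FOREVER = "9999-12-31T23:59:59Z"

# (source, target, valid_from, valid_to) for the CONNECTS_TO edges.
EDGES = [
    ("web-prod-01", "198.51.100.42", "2024-11-15T03:22:00Z", "2024-11-15T04:15:00Z"),
    ("web-prod-01", "10.0.1.50", "2024-01-01T00:00:00Z", FOREVER),
    ("db-prod-01", "10.0.1.50", "2024-01-01T00:00:00Z", FOREVER),
]


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def reachable_at(edges, source, target, timestamp):
    """True if target is reachable from source using only edges valid at timestamp."""
    moment = parse_utc(timestamp)
    adjacency = {}
    for src, dst, start, end in edges:
        if parse_utc(start) <= moment <= parse_utc(end):
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set()).add(src)  # direction ignored (assumption)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


print(reachable_at(EDGES, "198.51.100.42", "db-prod-01", "2024-11-15T03:22:17Z"))
print(reachable_at(EDGES, "198.51.100.42", "db-prod-01", "2024-11-15T03:00:00Z"))
```

The first check succeeds and the second fails, because the only edge touching 198.51.100.42 becomes valid at 03:22:00.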

The Complete Attack Timeline

Putting it all together, we can reconstruct the complete sequence of events:

| Time (UTC) | Event | Evidence in Graph |
|---|---|---|
| 2024-09-05 | CVE-2024-1234 (SQLi, CVSS 8.1) discovered on web-prod-01 and db-prod-01 | HAS_VULNERABILITY edge valid_from |
| 2024-10-01 | CVE-2024-1234 patched on db-prod-01 (but NOT on web-prod-01) | HAS_VULNERABILITY edge valid_to on db-prod-01 |
| 2024-10-20 | CVE-2024-3094 (RCE, CVSS 9.8) discovered on web-prod-01 | HAS_VULNERABILITY edge valid_from |
| 2024-11-15 02:45 | svc-deploy logs into bastion-01 | LOGGED_IN_FROM edge valid_from |
| 2024-11-15 03:00 | svc-deploy logs into web-prod-01 | LOGGED_IN_FROM edge valid_from |
| 2024-11-15 03:22 | C2 beacon detected: web-prod-01 connects to 198.51.100.42 | CONNECTS_TO edge valid_from + TRIGGERED_ALERT |
| 2024-11-15 03:25 | svc-deploy session on web-prod-01 ends | LOGGED_IN_FROM edge valid_to |
| 2024-11-15 04:15 | C2 connection terminates | CONNECTS_TO edge valid_to |

Suspicious correlation: The svc-deploy account logged into the web server at 03:00 UTC and the C2 beacon appeared 22 minutes later. The account's session ended at 03:25, only 3 minutes after the C2 connection began. This pattern is consistent with a compromised service account being used during the intrusion. The account should be immediately disabled and its credentials rotated across all systems.
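
A timeline like the one above can be derived mechanically by turning each edge's valid_from and valid_to into an event and sorting the result. The sketch below does this for the three short-lived edges from Section 15.1. The event labels and the `INTERVALS` structure are illustrative choices, not output from AstraeaDB.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# (description, valid_from, valid_to) for the short-lived edges.
INTERVALS = [
    ("svc-deploy session on bastion-01", "2024-11-15T02:45:00Z", "2024-11-15T03:30:00Z"),
    ("svc-deploy session on web-prod-01", "2024-11-15T03:00:00Z", "2024-11-15T03:25:00Z"),
    ("web-prod-01 connection to 198.51.100.42", "2024-11-15T03:22:00Z", "2024-11-15T04:15:00Z"),
]

events = []
for description, start, end in INTERVALS:
    events.append((parse_utc(start), f"{description} begins"))
    events.append((parse_utc(end), f"{description} ends"))
events.sort()  # chronological order

for moment, label in events:
    print(moment.strftime("%Y-%m-%d %H:%M"), label)

# Gap between the web-prod-01 login and the start of the C2 connection.
gap = parse_utc("2024-11-15T03:22:00Z") - parse_utc("2024-11-15T03:00:00Z")
gap_minutes = gap.total_seconds() / 60
print(f"Login to C2 connection: {gap_minutes:.0f} minutes")
```

The sorted events also include the end of the bastion-01 session at 03:30, which follows from that edge's valid_to in Section 15.1.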

15.5 Summary: Features Used in This Investigation

This single investigation leveraged nearly every major capability of AstraeaDB. The following table maps each feature to its role in the cybersecurity workflow:

| Feature | API Used | Role in Investigation | Chapter |
|---|---|---|---|
| Graph Construction | create_node, create_edge | Built the threat model with hosts, IPs, vulnerabilities, alerts, attack patterns, and user accounts | Ch 4 |
| BFS Traversal | bfs | Determined the blast radius: all entities reachable from the compromised host within 2 hops | Ch 6 |
| Shortest Path | shortest_path | Identified the most likely attack path from the external C2 IP to the internal database | Ch 6 |
| Neighbor Queries | neighbors | Enumerated all direct relationships of the compromised host for triage | Ch 6 |
| GQL Queries | query | Pattern-matched complex attack chains across multiple entity types (technique to vulnerability to host to alert) | Ch 5 |
| Vector Search | vector_search | Found semantically similar past threats using node embeddings | Ch 8 |
| Hybrid Search | hybrid_search | Combined structural proximity and semantic similarity to surface the most relevant entities for lateral movement investigation | Ch 8 |
| GraphRAG | graph_rag | Generated a complete analyst briefing by feeding the threat subgraph to an LLM | Ch 11 |
| Temporal Neighbors | neighbors_at | Determined which connections and vulnerabilities were active at the exact time of the alert | Ch 9 |
| Temporal Shortest Path | shortest_path_at | Proved that the attack path to the database did not exist before the C2 connection was established | Ch 9 |

Where to go from here: This chapter demonstrated a single investigation, but the same patterns apply to any domain where relationships matter: fraud detection (financial transaction graphs), supply chain analysis (vendor dependency graphs), network monitoring (infrastructure topology graphs), and knowledge management (organizational knowledge graphs). The techniques you have learned throughout this guide, from basic graph construction in Chapter 4 to GraphRAG in Chapter 11, compose together to solve problems that are fundamentally difficult in relational databases. Explore the Appendices for quick reference cards, or visit the GitHub repository to start building.