Introduction

AstraeaDB is a cloud-native, AI-first graph database written in Rust. It combines a Vector-Property Graph model with an HNSW (Hierarchical Navigable Small World) vector index, enabling both structural graph traversals and semantic similarity search in a single system.

Key Differentiators

Project Stats

Metric           | Value
-----------------|------
Rust Crates      | 14
Rust Tests       | 441
Python Tests     | 23
Go Tests         | 30
Java Tests       | 113
Client Libraries | Python, R, Go, Java, Rust (embedded)
Rust Edition     | 2024
License          | MIT

Getting Started

Build

cargo build --workspace

Run Tests

cargo test --workspace

Start the Server

cargo run -p astraea-cli -- serve

This starts the TCP server on port 7687, gRPC on 7688, and Arrow Flight on 7689.

Connect with the Shell

cargo run -p astraea-cli -- shell

The interactive shell supports both GQL queries and raw JSON requests:

# GQL queries
astraea> CREATE (a:Person {name: "Alice", age: 30})
Nodes created: 1

astraea> MATCH (a:Person) WHERE a.age > 25 RETURN a.name, a.age
+-------+------+
| a.name| a.age|
+-------+------+
| Alice | 30   |
+-------+------+

# Dot-commands
astraea> .status
astraea> .help
astraea> .quit

Check Server Status

cargo run -p astraea-cli -- status

Architecture

+---------------------------------------------------------------+
|                          astraea-cli                          |
|                serve | shell | import | export                |
+---------------------------------------------------------------+
| astraea-server     | astraea-flight     | Client Libraries    |
| JSON-TCP (7687)    | Arrow Flight       | Python, R, Go,      |
| gRPC (7688)        | do_get / do_put    | Java                |
| Auth, Metrics      |                    | JSON + gRPC         |
| Connection Mgmt    |                    | + Arrow Flight      |
+---------------------------------------------------------------+
| astraea-rag   | astraea-query | astraea-gnn   | astraea-      |
| Subgraph      | GQL Parser    | Tensor,       | algorithms    |
| LLM, RAG      | + Executor    | MsgPassing,   | PageRank,     |
|               |               | Training      | Louvain, etc. |
+---------------------------------------------------------------+
| astraea-graph                 | astraea-vector                |
| CRUD, BFS, DFS                | HNSW Index                    |
| Hybrid Search                 | ANN Search                    |
| Temporal Queries              | Persistence                   |
+---------------------------------------------------------------+
|                        astraea-storage                        |
|             Pages → Buffer Pool → Pointer Swizzle             |
|                MVCC, WAL, PageIO, Cold Storage                |
+---------------------------------------------------------------+
|                         astraea-core                          |
|                     Types, Traits, Errors                     |
+---------------------------------------------------------------+
| astraea-crypto     | astraea-gpu        | astraea-cluster     |
| Encrypted          | CSR Matrix,        | Partitioning,       |
| Labels, FHE        | CPU Backend        | Sharding,           |
|                    | PageRank/BFS       | Coordination        |
+---------------------------------------------------------------+

Crate Overview

Crate              | Purpose                                                                                                                            | Tests
-------------------|------------------------------------------------------------------------------------------------------------------------------------|------
astraea-core       | Types (Node, Edge, NodeId), traits (StorageEngine, GraphOps, VectorIndex, TransactionalEngine), errors                             | 4
astraea-storage    | 8 KiB pages, LRU buffer pool, pointer swizzling, MVCC, WAL, label index, cold storage (JSON/Parquet/S3), PageIO (memmap2/io_uring) | 75
astraea-graph      | CRUD, BFS, DFS, Dijkstra, temporal queries, hybrid search, semantic traversal                                                      | 55
astraea-query      | GQL lexer, recursive-descent parser, full query executor                                                                           | 56
astraea-vector     | HNSW index, cosine/Euclidean/dot-product, binary persistence                                                                       | 33
astraea-rag        | Subgraph extraction, linearization (4 formats), LLM providers, GraphRAG pipeline                                                   | 27
astraea-gnn        | Differentiable tensors, message passing, node classification training                                                              | 26
astraea-server     | TCP/gRPC server, auth (RBAC + mTLS), metrics (Prometheus), connection management                                                   | 68
astraea-flight     | Arrow Flight: do_get (query → Arrow), do_put (Arrow → import)                                                                      | 11
astraea-algorithms | PageRank, connected/strongly-connected components, centrality, Louvain                                                             | 20
astraea-crypto     | Encrypted labels/values/nodes, server-side encrypted label matching                                                                | 31
astraea-gpu        | CSR matrix, GpuBackend trait, CpuBackend (PageRank, BFS, SSSP)                                                                     | 16
astraea-cluster    | Hash/range partitioning, shard management, cluster coordinator                                                                     | 19
astraea-cli        | serve, shell, status, import, export                                                                                               |

Data Model

AstraeaDB uses a Vector-Property Graph model that unifies property graphs with vector embeddings.

Node

A node has an ID, a set of labels, arbitrary JSON properties, and an optional embedding vector.

pub struct Node {
    pub id: NodeId,
    pub labels: Vec<String>,
    pub properties: serde_json::Value,
    pub embedding: Option<Vec<f32>>, // optional dense vector
}

Edge

An edge connects two nodes with a type, JSON properties, a learnable weight, and a temporal validity interval.

pub struct Edge {
    pub id: EdgeId,
    pub source: NodeId,
    pub target: NodeId,
    pub edge_type: String,
    pub properties: serde_json::Value,
    pub weight: f64,                   // learnable weight for GNN
    pub validity: ValidityInterval,    // temporal bounds
}

ValidityInterval

Represents when an edge is valid. Uses epoch milliseconds with inclusive start and exclusive end.

pub struct ValidityInterval {
    pub valid_from: Option<i64>,  // inclusive, None = unbounded
    pub valid_to: Option<i64>,    // exclusive, None = still valid
}

// Check if an edge is valid at a given time
let valid = edge.validity.contains(1704067200000); // 2024-01-01
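
The inclusive-start / exclusive-end semantics above can be sketched as a contains implementation. The struct mirrors the definition above; the method body is illustrative rather than the crate's actual code.

```rust
// Illustrative sketch of ValidityInterval::contains, following the
// semantics stated above: inclusive start, exclusive end, None = unbounded.
pub struct ValidityInterval {
    pub valid_from: Option<i64>, // inclusive, None = unbounded
    pub valid_to: Option<i64>,   // exclusive, None = still valid
}

impl ValidityInterval {
    pub fn contains(&self, ts: i64) -> bool {
        let after_start = self.valid_from.map_or(true, |from| ts >= from);
        let before_end = self.valid_to.map_or(true, |to| ts < to);
        after_start && before_end
    }
}

fn main() {
    let lease = ValidityInterval { valid_from: Some(100), valid_to: Some(200) };
    assert!(lease.contains(100));  // start is inclusive
    assert!(!lease.contains(200)); // end is exclusive
    let open = ValidityInterval { valid_from: None, valid_to: None };
    assert!(open.contains(i64::MIN)); // fully unbounded interval
}
```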

GraphPath

Represents a path through the graph as a start node followed by (edge, node) steps.

pub struct GraphPath {
    pub start: NodeId,
    pub steps: Vec<(EdgeId, NodeId)>,
}

ID Types

Type               | Description
-------------------|------------
NodeId(u64)        | Unique node identifier
EdgeId(u64)        | Unique edge identifier
PageId(u64)        | Storage page identifier
TransactionId(u64) | MVCC transaction identifier
Lsn(u64)           | Write-ahead log sequence number

Storage Engine

AstraeaDB uses a three-tier storage architecture optimized for graph workloads.

Tier 1: Cold Storage

Data at rest on disk or object storage. The ColdStorage trait provides a pluggable backend interface with three implementations:

Backend                | Description                               | Use Case
-----------------------|-------------------------------------------|---------
JsonFileColdStorage    | Human-readable JSON files on local disk   | Development, debugging, small datasets
ParquetColdStorage     | Columnar Apache Parquet with Arrow schema | Analytics, large datasets, efficient compression
ObjectStoreColdStorage | S3, GCS, Azure Blob, or local filesystem  | Cloud-native deployments, data lake integration

Parquet Schema

Nodes and edges are stored with full Arrow schema mapping:

// Node schema
id: UInt64, labels: List<Utf8>, properties: Utf8, embedding: List<Float32>

// Edge schema
id: UInt64, source: UInt64, target: UInt64, edge_type: Utf8,
properties: Utf8, weight: Float64, valid_from: Int64, valid_to: Int64

Object Store Usage

use astraea_storage::ObjectStoreColdStorage;

// Local filesystem
let storage = ObjectStoreColdStorage::local("/data/cold")?;

// Amazon S3
let storage = ObjectStoreColdStorage::s3("my-bucket", "astraea/")?;

// Google Cloud Storage
let storage = ObjectStoreColdStorage::gcs("my-bucket", "astraea/")?;

// Azure Blob Storage
let storage = ObjectStoreColdStorage::azure("my-container", "astraea/")?;

Tier 2: Warm (Buffer Pool)

An LRU buffer pool caches frequently accessed 8 KiB pages in memory with pin/unpin semantics. The PageIO trait abstracts disk I/O with two backends:

Backend     | Platform      | Description
------------|---------------|------------
FileManager | All platforms | Cross-platform memmap2-based I/O (default)
UringPageIO | Linux only    | High-performance io_uring async I/O (feature-gated)

Enabling io_uring (Linux)

# Cargo.toml
[dependencies]
astraea-storage = { version = "0.1", features = ["io-uring"] }

# Build command
cargo build --features io-uring

Tier 3: Hot (Pointer Swizzling)

Frequently accessed pages are promoted to permanently pinned status, preventing eviction and enabling zero-copy access. When a page's access count exceeds a configurable threshold, it is "swizzled" into the hot tier.
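
A minimal sketch of that promotion rule, assuming a simple per-page access counter; the names (HotTracker, record_access) are hypothetical, not the crate's API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of hot-tier promotion: once a page's access count
// crosses the threshold, it is pinned permanently and never evicted.
struct HotTracker {
    threshold: u64,
    access_counts: HashMap<u64, u64>, // page_id -> access count
    hot_pages: Vec<u64>,              // permanently pinned ("swizzled")
}

impl HotTracker {
    fn record_access(&mut self, page_id: u64) {
        let count = self.access_counts.entry(page_id).or_insert(0);
        *count += 1;
        if *count == self.threshold {
            self.hot_pages.push(page_id); // promote exactly once
        }
    }
}

fn main() {
    let mut t = HotTracker {
        threshold: 3,
        access_counts: HashMap::new(),
        hot_pages: Vec::new(),
    };
    for _ in 0..3 {
        t.record_access(42);
    }
    assert_eq!(t.hot_pages, vec![42]); // promoted on the third access
}
```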

Page Format

+----------------------------------+
| PageHeader (17 bytes)            |
|   page_id, type, record_count,  |
|   free_space_offset, checksum   |
+----------------------------------+
| Record 0: NodeRecordHeader       |
|   node_id, data_len, adj_offset |
|   + serialized properties       |
+----------------------------------+
| Record 1: ...                    |
+----------------------------------+
|         (free space)             |
+----------------------------------+
         8192 bytes total

MVCC Transactions

MVCC (Multi-Version Concurrency Control) allows multiple transactions to read and write data concurrently without blocking each other. AstraeaDB uses snapshot isolation with first-writer-wins conflict detection; transactional access goes through the TransactionalEngine trait.
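
Snapshot isolation reduces to a version-visibility rule that can be sketched in a few lines. The Version layout below is illustrative, not astraea-storage's internal representation.

```rust
// Sketch of MVCC version visibility under snapshot isolation.
struct Version {
    begin_tx: u64,       // transaction that wrote this version
    end_tx: Option<u64>, // transaction that superseded it, if any
    value: &'static str,
}

// A snapshot taken at `snap` sees a version if it was written at or
// before the snapshot and not yet superseded at the snapshot.
fn visible(v: &Version, snap: u64) -> bool {
    v.begin_tx <= snap && v.end_tx.map_or(true, |end| end > snap)
}

fn main() {
    let v1 = Version { begin_tx: 1, end_tx: Some(5), value: "old" };
    let v2 = Version { begin_tx: 5, end_tx: None, value: "new" };
    // A reader whose snapshot is at tx 3 still sees the old version.
    assert!(visible(&v1, 3) && !visible(&v2, 3));
    println!("reader@3 sees {}", v1.value);
    // A reader at tx 7 sees only the new version.
    assert!(!visible(&v1, 7) && visible(&v2, 7));
    println!("reader@7 sees {}", v2.value);
}
```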

Write-Ahead Log (WAL)

The WAL (Write-Ahead Log) ensures durability: every mutation is logged to disk before being applied to the data files. If the database crashes, it can recover by replaying the log. Records use a [length][type][JSON payload][CRC32] frame format. Supports BeginTransaction, CommitTransaction, and AbortTransaction records for crash recovery.
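
The frame layout can be sketched as follows. The CRC32 here is a self-contained IEEE implementation for illustration, and the field order simply mirrors the format described above.

```rust
// Bitwise IEEE CRC32, included so the sketch is self-contained;
// the real crate presumably uses a CRC library.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Build a [length][type][JSON payload][CRC32] frame.
fn encode_frame(record_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::new();
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes()); // [length]
    frame.push(record_type);                                        // [type]
    frame.extend_from_slice(payload);                               // [JSON payload]
    frame.extend_from_slice(&crc32(payload).to_le_bytes());         // [CRC32]
    frame
}

fn main() {
    let frame = encode_frame(1, br#"{"op":"BeginTransaction"}"#);
    // length(4) + type(1) + payload(25) + crc(4) = 34 bytes
    assert_eq!(frame.len(), 34);
}
```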

Label Index

A hash-based index (HashMap<String, HashSet<NodeId>>) providing O(1) label-based lookups, automatically maintained when nodes are created or deleted.
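
A sketch of how such an index stays consistent with node lifecycle events; names here are illustrative, not the crate's API.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative label index: updated on create/delete so label lookups
// never scan all nodes.
struct LabelIndex {
    map: HashMap<String, HashSet<u64>>, // label -> node IDs
}

impl LabelIndex {
    fn on_create(&mut self, node_id: u64, labels: &[String]) {
        for l in labels {
            self.map.entry(l.clone()).or_default().insert(node_id);
        }
    }
    fn on_delete(&mut self, node_id: u64, labels: &[String]) {
        for l in labels {
            if let Some(set) = self.map.get_mut(l) {
                set.remove(&node_id);
            }
        }
    }
    fn find(&self, label: &str) -> Vec<u64> {
        self.map
            .get(label)
            .map_or(Vec::new(), |s| s.iter().copied().collect())
    }
}

fn main() {
    let mut idx = LabelIndex { map: HashMap::new() };
    idx.on_create(1, &["Person".to_string()]);
    assert_eq!(idx.find("Person"), vec![1]);
    idx.on_delete(1, &["Person".to_string()]);
    assert!(idx.find("Person").is_empty());
}
```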

Graph Operations

The GraphOps trait defines all graph-level operations. It is implemented by the Graph struct on top of any StorageEngine.

CRUD Operations

use astraea_graph::Graph;
use astraea_core::traits::GraphOps;

// Create nodes
let alice = graph.create_node(
    vec!["Person".into()],
    json!({"name": "Alice", "age": 30}),
    None,  // no embedding
)?;

// Create an edge (always valid)
graph.create_edge(alice, bob, "KNOWS".into(), json!({}), 1.0, None, None)?;

// Read
let node = graph.get_node(alice)?;
let edge = graph.get_edge(edge_id)?;

// Update (merge semantics)
graph.update_node(alice, json!({"title": "Engineer"}))?;

// Delete (node deletion cascades to edges)
graph.delete_node(alice)?;

Traversals

Graph traversal algorithms explore nodes by following edges:

Method                           | Algorithm                  | Description
---------------------------------|----------------------------|------------
bfs(start, max_depth)            | BFS (Breadth-First Search) | Explores neighbors level by level. Returns Vec<(NodeId, depth)>
dfs(start, max_depth)            | DFS (Depth-First Search)   | Explores as far as possible before backtracking. Returns Vec<NodeId>
shortest_path(from, to)          | BFS                        | Unweighted shortest path (fewest hops)
shortest_path_weighted(from, to) | Dijkstra's algorithm       | Weighted shortest path using edge weights

Neighbor Queries

use astraea_core::types::Direction;

// All outgoing neighbors
let neighbors = graph.neighbors(alice, Direction::Outgoing)?;

// Filtered by edge type
let friends = graph.neighbors_filtered(alice, Direction::Both, "KNOWS")?;

// Find nodes by label (O(1) via label index)
let people = graph.find_by_label("Person")?;

Temporal Queries

Edges have a ValidityInterval that defines when they exist. Temporal query methods filter edges by a given timestamp, allowing you to query the graph as it existed at any point in time.

Creating Temporal Edges

// DHCP lease valid from 08:00 to 10:00 UTC on Jan 15, 2025
graph.create_edge(
    ip_node, laptop_node,
    "DHCP_LEASE".into(),
    json!({"dhcp_server": "10.0.0.1"}),
    1.0,
    Some(1736928000000),  // valid_from
    Some(1736935200000),  // valid_to
)?;

Temporal Traversal Methods

Method                                         | Description
-----------------------------------------------|------------
neighbors_at(node, direction, timestamp)       | Neighbors via edges valid at the timestamp
bfs_at(start, max_depth, timestamp)            | BFS traversal following only valid edges
shortest_path_at(from, to, timestamp)          | Unweighted shortest path at a point in time
shortest_path_weighted_at(from, to, timestamp) | Dijkstra with temporal filtering

Server Requests

// Neighbors at a specific time
{"type": "NeighborsAt", "id": 42, "direction": "outgoing", "timestamp": 1736929800000}

// BFS at a specific time
{"type": "BfsAt", "start": 42, "max_depth": 3, "timestamp": 1736929800000}

// Shortest path at a specific time
{"type": "ShortestPathAt", "from": 1, "to": 5, "timestamp": 1736929800000, "weighted": false}

HNSW Vector Index

AstraeaDB includes a full implementation of the HNSW (Hierarchical Navigable Small World) algorithm for ANN (Approximate Nearest-Neighbor) search. HNSW builds a multi-layer graph where each layer is a "small world" network — most nodes are not directly connected, but any two nodes can be reached through a small number of hops. This enables finding similar vectors in logarithmic time rather than scanning all vectors.

What are embeddings? Embeddings are numeric vectors (arrays of floating-point numbers) that represent the semantic meaning of data. Similar concepts have similar embeddings — two sentences about "dogs" will have embeddings that are mathematically close together, even if they use different words.

Configuration

Parameter       | Default | Description
----------------|---------|------------
M               | 16      | Maximum connections per node per layer (higher = more accurate but slower)
ef_construction | 200     | Beam width during index building (higher = better quality index)
ef_search       | 50      | Beam width during search (trade-off: higher = more accurate, lower = faster)

Distance Metrics

Distance metrics measure how "far apart" two vectors are. Lower distance means more similar:

Metric     | Description                                                        | Best For
-----------|--------------------------------------------------------------------|---------
Cosine     | Measures the angle between vectors (1 - cos θ). Ignores magnitude. | Text embeddings, normalized vectors
Euclidean  | Straight-line distance (L2 norm). Considers magnitude.             | Spatial data, image features
DotProduct | Negative dot product. Higher dot product = more similar.           | Recommendation systems, MIPS
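
The three metrics can be sketched directly from their definitions; note how the cosine variant ignores magnitude while Euclidean does not. These are illustrative functions, not the crate's internals.

```rust
// Cosine distance: 1 - cos(theta). Parallel vectors -> 0 regardless of length.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

// Euclidean (L2) distance: considers magnitude.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

// Negated dot product, so that lower = more similar, like the others.
fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

fn main() {
    // Parallel vectors: zero cosine distance despite different magnitudes.
    assert!(cosine_distance(&[1.0, 0.0], &[3.0, 0.0]).abs() < 1e-6);
    // Euclidean distance does see the magnitude difference.
    assert!((euclidean_distance(&[1.0, 0.0], &[3.0, 0.0]) - 2.0).abs() < 1e-6);
    // Larger dot product -> smaller (more negative) distance.
    assert!(dot_product_distance(&[1.0, 1.0], &[2.0, 2.0]) < dot_product_distance(&[1.0, 1.0], &[1.0, 0.0]));
}
```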

Usage

use astraea_vector::HnswVectorIndex;
use astraea_core::types::DistanceMetric;
use astraea_core::traits::VectorIndex;

// Create a 128-dimensional index with cosine distance
let index = HnswVectorIndex::new(128, DistanceMetric::Cosine);

// Insert embeddings
index.insert(node_id, &embedding_vec)?;

// Search for k nearest neighbors
let results = index.search(&query_vec, 10)?;
for r in &results {
    println!("Node {:?}, distance: {}", r.node_id, r.distance);
}

Persistence

The index can be saved to and loaded from disk using a versioned binary format (magic bytes + bincode):

// Save to disk
index.save("index.hnsw")?;

// Load from disk (no rebuild needed)
let index = HnswVectorIndex::load("index.hnsw")?;

Auto-Indexing

When a VectorIndex is attached to a Graph, embeddings are automatically indexed on create_node() and removed on delete_node().

Hybrid & Semantic Search

Hybrid Search

Combines graph proximity with vector similarity using a configurable alpha blending parameter:

final_score = alpha × vector_score + (1 - alpha) × graph_score

graph.hybrid_search(
    anchor_node,       // starting node for BFS
    &query_embedding,  // semantic target
    3,                 // max_hops (BFS radius)
    10,                // top-k results
    0.5,               // alpha: 0.0 = pure graph, 1.0 = pure vector
)?;

Semantic Neighbors

Rank a node's neighbors by their embedding similarity to a concept vector:

// "Find the neighbor of Alice most similar to the concept of Risk"
let ranked = graph.semantic_neighbors(
    alice_id,
    &risk_embedding,
    Direction::Outgoing,
    5,  // top-k
)?;

Semantic Walk

A greedy multi-hop walk that at each step moves to the unvisited neighbor most similar to the concept:

// Walk through the graph toward the concept of "Fraud"
let path = graph.semantic_walk(
    start_node,
    &fraud_embedding,
    4,  // max hops
)?;
// path: Vec<(NodeId, f32)> where f32 is distance to concept

GQL Query Language

AstraeaDB includes a hand-written recursive-descent parser and full query executor for a subset of ISO GQL / Cypher.

MATCH Queries

MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 30 AND b.name = "Bob"
RETURN a.name AS person, b.name AS friend
ORDER BY a.age DESC
LIMIT 10

CREATE

CREATE (a:Person {name: "Alice", age: 30})

CREATE (a:Person {name: "Alice"})-[:KNOWS {since: 2020}]->(b:Person {name: "Bob"})

DELETE

MATCH (a:Person) WHERE a.name = "Alice" DELETE a

Expression Support

Category        | Operators / Functions
----------------|----------------------
Arithmetic      | +, -, *, /, %
Comparison      | =, <>, <, <=, >, >=
Boolean         | AND, OR, NOT
Null checks     | IS NULL, IS NOT NULL
Functions       | count(), id(), labels(), type(), toString(), toInteger()
Edge directions | -[:TYPE]-> (out), <-[:TYPE]- (in), -[:TYPE]- (both)

Programmatic Usage

use astraea_query::parse;
use astraea_query::executor::Executor;

let ast = parse("MATCH (a:Person) WHERE a.age > 30 RETURN a.name")?;
let executor = Executor::new(graph.clone());
let result = executor.execute(ast)?;
// result.columns: ["a.name"]
// result.rows: [["Alice"], ...]

GraphRAG Engine

Retrieval-Augmented Generation backed by graph context. The pipeline performs: vector search → subgraph extraction → linearization → LLM completion.

Tutorial: See the GraphRAG with Claude Tutorial for a complete walkthrough of using AstraeaDB's GraphRAG engine with Anthropic's Claude.

Subgraph Extraction

use astraea_rag::{extract_subgraph, linearize_subgraph, TextFormat};

// BFS 2 hops from a node, max 50 nodes
let subgraph = extract_subgraph(&*graph, node_id, 2, 50)?;

Linearization Formats

Format     | Description                            | Best For
-----------|----------------------------------------|---------
Structured | Indented tree with arrows (-[KNOWS]->) | General LLM context
Prose      | Natural language paragraphs            | Conversational AI
Triples    | (subject, predicate, object)           | Knowledge extraction
Json       | Compact JSON                           | Structured prompts

let text = linearize_subgraph(&subgraph, TextFormat::Structured);
let tokens = estimate_tokens(&text); // ~4 chars per token

LLM Providers

The LlmProvider trait supports multiple backends. Providers use injectable HTTP callbacks (no HTTP dependencies in the crate).

Provider          | Description
------------------|------------
MockProvider      | Returns canned responses (for testing)
OpenAiProvider    | OpenAI API compatible endpoints
AnthropicProvider | Anthropic Messages API
OllamaProvider    | Local Ollama instance (default: localhost:11434)

Full Pipeline

use astraea_rag::{GraphRagConfig, graph_rag_query_anchored, MockProvider};

let config = GraphRagConfig {
    hops: 2,
    max_context_nodes: 50,
    text_format: TextFormat::Structured,
    token_budget: 4000,
    ..Default::default()
};

let result = graph_rag_query_anchored(
    &*graph, &llm, "Who knows Alice?", node_id, &config
)?;
println!("Answer: {}", result.answer);
println!("Context: {} tokens, {} nodes",
    result.estimated_tokens, result.nodes_in_context);

GNN Training

GNN (Graph Neural Network) is a type of neural network designed for graph-structured data. Unlike traditional neural networks that work on fixed-size inputs (like images), GNNs can learn from the structure of a graph — incorporating information from a node's neighbors to make predictions. AstraeaDB implements GNN training in pure Rust with no external ML framework.

Tutorial: See the Graph Neural Networks Tutorial for a complete guide to tensors, message passing, training loops, and node classification examples.

How GNNs work: Each node starts with features (its embedding). In each layer, nodes "send messages" to their neighbors and "aggregate" messages from neighbors. After several layers, a node's representation captures information from nodes many hops away.

Components

Example

use astraea_gnn::{TrainingConfig, TrainingData, MessagePassingConfig, Aggregation, Activation};
use astraea_gnn::training::train_node_classification;

let config = TrainingConfig {
    layers: 2,
    learning_rate: 0.01,
    epochs: 50,
    message_passing: MessagePassingConfig {
        aggregation: Aggregation::Mean,
        activation: Activation::ReLU,
        normalize: true,
    },
};

let result = train_node_classification(&*graph, &training_data, &config)?;
println!("Accuracy: {:.1}%", result.accuracy * 100.0);

Graph Algorithms

The astraea-algorithms crate provides classical graph algorithms for analyzing graph structure.

Algorithm              | Function                                    | Description
-----------------------|---------------------------------------------|------------
PageRank               | pagerank(graph, nodes, config)              | Ranks nodes by importance based on incoming links (like Google's original algorithm). Returns importance scores for each node.
Connected Components   | connected_components(graph, nodes)          | Groups nodes into clusters where every node can reach every other node (ignoring edge direction).
Strongly Connected     | strongly_connected_components(graph, nodes) | Like connected components, but respects edge direction (for directed graphs).
Degree Centrality      | degree_centrality(graph, nodes, direction)  | Measures importance by counting connections. More connections = more central.
Betweenness Centrality | betweenness_centrality(graph, nodes)        | Measures how often a node lies on shortest paths between other nodes. High betweenness = important bridge.
Community Detection    | louvain(graph, nodes)                       | Finds densely-connected groups (communities) using the Louvain algorithm. Returns which community each node belongs to.

PageRank Configuration

let config = PageRankConfig {
    damping: 0.85,           // damping factor
    max_iterations: 100,     // convergence limit
    tolerance: 1e-6,         // L1 norm convergence threshold
};
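
To make the roles of damping, max_iterations, and the L1 tolerance concrete, here is a self-contained power-iteration sketch; it is not the crate's implementation, and it uses plain adjacency lists rather than the Graph type.

```rust
// Power-iteration PageRank over adjacency lists. `out_links[i]` holds the
// targets of node i's outgoing edges.
fn pagerank(out_links: &[Vec<usize>], damping: f64, max_iter: usize, tol: f64) -> Vec<f64> {
    let n = out_links.len();
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iter {
        // Teleport term: (1 - d) / n goes to every node each round.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (src, targets) in out_links.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank uniformly.
                for v in next.iter_mut() {
                    *v += damping * rank[src] / n as f64;
                }
            } else {
                let share = damping * rank[src] / targets.len() as f64;
                for &t in targets {
                    next[t] += share;
                }
            }
        }
        // L1-norm change between iterations decides convergence.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            break;
        }
    }
    rank
}

fn main() {
    // 0 -> 1 -> 2 -> 0: a symmetric cycle converges to uniform rank.
    let ranks = pagerank(&[vec![1], vec![2], vec![0]], 0.85, 100, 1e-6);
    assert!(ranks.iter().all(|r| (r - 1.0 / 3.0).abs() < 1e-3));
}
```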

Network Server

AstraeaDB provides three transport layers for different use cases.

Transport Layers

Transport       | Port | Format                 | Best For
----------------|------|------------------------|---------
JSON-over-TCP   | 7687 | Newline-delimited JSON | Debugging, scripting, netcat
gRPC / Protobuf | 7688 | Protocol Buffers       | Production clients, type safety
Arrow Flight    | 7689 | Apache Arrow           | Python/Pandas, bulk operations

Supported Request Types

Request                 | Description
------------------------|------------
CreateNode              | Create a node with labels, properties, optional embedding
CreateEdge              | Create an edge between two nodes
GetNode / GetEdge       | Retrieve by ID
UpdateNode / UpdateEdge | Merge properties
DeleteNode / DeleteEdge | Delete (node deletion cascades to edges)
Neighbors               | Get neighbors with direction and edge-type filtering
Bfs                     | Breadth-first traversal with depth limit
ShortestPath            | Unweighted or weighted (Dijkstra)
VectorSearch            | k-nearest-neighbor via HNSW index
HybridSearch            | Graph + vector blended search
SemanticNeighbors       | Rank neighbors by concept similarity
SemanticWalk            | Greedy walk toward a concept
NeighborsAt             | Temporal neighbors at a timestamp
BfsAt                   | Temporal BFS at a timestamp
ShortestPathAt          | Temporal shortest path at a timestamp
ExtractSubgraph         | Extract and linearize a local subgraph
GraphRag                | GraphRAG pipeline (search → subgraph → LLM)
Query                   | Execute a GQL query string
Ping                    | Health check

JSON Protocol Examples

// Create a node with an embedding
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice"},"embedding":[0.1,0.2]}

// Response
{"status":"ok","data":{"node_id":1}}

// Execute a GQL query
{"type":"Query","gql":"MATCH (a:Person) RETURN a.name"}

// Ping
{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}
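
A minimal stdlib-only client sketch for this newline-delimited protocol. The helper names are hypothetical, and the network call naturally requires a running server, so only the framing is exercised here.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

// Each request is one JSON object terminated by '\n'.
fn ping_line() -> String {
    "{\"type\":\"Ping\"}\n".to_string()
}

// Send a Ping and read one newline-terminated JSON response.
// Requires an AstraeaDB server listening at `addr`.
fn ping(addr: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(ping_line().as_bytes())?;
    let mut reply = String::new();
    BufReader::new(&stream).read_line(&mut reply)?;
    Ok(reply)
}

fn main() {
    // Framing check only; ping("127.0.0.1:7687") needs a live server.
    let _ = ping;
    assert!(ping_line().ends_with('\n'));
    assert!(ping_line().contains("\"type\":\"Ping\""));
}
```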

Authentication & RBAC

AstraeaDB supports API key authentication and mTLS (mutual TLS) with role-based access control.

Roles

Role   | Permissions
-------|------------
Admin  | Full access to all operations
Writer | Read + write (CRUD, traversals, queries)
Reader | Read-only (get, query, search, traverse, ping)

API Key Authentication

Include an auth_token field in JSON requests:

{"type":"CreateNode","labels":["Person"],"properties":{},"auth_token":"my-api-key"}

If auth is enabled and the token is missing or invalid:

{"status":"error","message":"authentication required: provide auth_token"}

mTLS (Mutual TLS)

For production deployments, AstraeaDB supports TLS encryption with optional client certificate verification. The server uses rustls for modern, safe TLS.

TLS Configuration

use astraea_server::tls::{TlsConfig};

// Server-only TLS (encrypt traffic)
let tls = TlsConfig::new(
    "server.crt",  // Server certificate
    "server.key",  // Server private key
);

// mTLS (verify client certificates)
let tls = TlsConfig::with_mtls(
    "server.crt",
    "server.key",
    "ca.crt",      // CA cert for client verification
);

Client Certificate Role Mapping

When mTLS is enabled, the client certificate's Common Name (CN) is automatically mapped to an RBAC role:

Certificate CN               | Role
-----------------------------|-----
Contains "admin"             | Admin
Contains "writer" or "write" | Writer
All others                   | Reader
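
Assuming a case-insensitive substring check (the table above does not specify case handling), the mapping can be sketched as:

```rust
// Illustrative sketch of CN-to-role mapping; the real cn_to_role in
// astraea-server may differ in details such as case handling.
#[derive(Debug, PartialEq)]
enum Role {
    Admin,
    Writer,
    Reader,
}

fn cn_to_role(cn: &str) -> Role {
    let cn = cn.to_ascii_lowercase();
    if cn.contains("admin") {
        Role::Admin
    } else if cn.contains("writer") || cn.contains("write") {
        Role::Writer
    } else {
        Role::Reader // default: least privilege
    }
}

fn main() {
    assert_eq!(cn_to_role("admin-service"), Role::Admin);
    assert_eq!(cn_to_role("batch-writer"), Role::Writer);
    assert_eq!(cn_to_role("dashboard"), Role::Reader);
}
```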

TLS Helper Functions

use astraea_server::tls::*;

// Load certificates and keys
let certs = load_certs("server.crt")?;
let key = load_private_key("server.key")?;

// Extract info from client certificates
let cn = extract_client_cn(&client_certs);  // e.g., "admin-service"
let sans = extract_sans(&cert);             // Subject Alternative Names

// Map CN to role
let role = cn_to_role(&cn);                 // "admin-service" -> Admin

Read-Only Operations (accessible by Reader role)

GetNode, GetEdge, Neighbors, NeighborsAt, Bfs, BfsAt, ShortestPath, ShortestPathAt, VectorSearch, HybridSearch, SemanticNeighbors, SemanticWalk, Query, ExtractSubgraph, GraphRag, Ping

Audit Logging

All authenticated requests are logged with timestamp, truncated API key prefix (first 8 chars), role, operation, and whether it was allowed. The audit log is a bounded circular buffer (max 10,000 entries).

Key Management

let auth = AuthManager::new(vec![
    ApiKeyEntry { key: "admin-key-xxx".into(), role: Role::Admin,
                       description: "Admin key".into(), active: true },
    ApiKeyEntry { key: "reader-key-xxx".into(), role: Role::Reader,
                       description: "CI key".into(), active: true },
]);

// Runtime key management
auth.add_key(new_entry);
auth.revoke_key("compromised-key");

Observability

Prometheus Metrics

The server exposes metrics in Prometheus text exposition format.

Metric                                                          | Type    | Description
----------------------------------------------------------------|---------|------------
astraea_requests_total{type="..."}                              | counter | Total requests by operation type
astraea_errors_total{type="..."}                                | counter | Total errors by operation type
astraea_request_duration_us{type="...",quantile="0.5|0.9|0.99"} | summary | Request duration percentiles (microseconds)
astraea_active_connections                                      | gauge   | Currently active TCP connections
astraea_connections_total                                       | counter | Total connections since startup
astraea_uptime_seconds                                          | gauge   | Server uptime in seconds

Health Check

The health() method returns a JSON object:

{
  "status": "healthy",
  "uptime_seconds": 3600,
  "active_connections": 12,
  "total_connections": 1543,
  "start_time": 1704067200
}

Connection Management

Configuration

Parameter               | Default | Description
------------------------|---------|------------
max_connections         | 1024    | Maximum concurrent TCP connections. New connections beyond this are rejected.
max_concurrent_requests | 256     | Request-level backpressure via semaphore.
idle_timeout            | 300 s   | Close connections idle for longer than this.
request_timeout         | 30 s    | Abort requests that take longer than this.
drain_timeout           | 10 s    | Max time to wait for in-flight requests during shutdown.

Graceful Shutdown

  1. Stop accepting new connections
  2. Wait for in-flight requests to complete (up to drain_timeout)
  3. Close all connections

When the connection limit is reached, new connections receive:

{"status":"error","message":"server connection limit reached"}

Encryption

The astraea-crypto crate provides a foundation for encrypted queries, allowing clients to query the graph without the server seeing unencrypted data. This is essential for privacy-sensitive applications in banking, healthcare, and other regulated industries.

How it works: Labels are encrypted with deterministic tags (same label always produces the same tag), so the server can check equality without decrypting. Property values use randomized encryption for stronger security.
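
Determinism is the key property: the same (key, label) pair always yields the same tag, so equality checks need no decryption. The sketch below illustrates the idea with a non-cryptographic hash; a real implementation would use a keyed cryptographic PRF, and the names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// ILLUSTRATION ONLY: DefaultHasher is not cryptographically secure.
// It stands in for a keyed PRF to show why deterministic tags allow
// server-side equality matching without decryption.
fn label_tag(secret: u64, label: &str) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    label.hash(&mut h);
    h.finish()
}

fn main() {
    let secret = 0xDEAD_BEEF_u64;
    // Same label -> same tag: the server can match without the key.
    assert_eq!(label_tag(secret, "Person"), label_tag(secret, "Person"));
    // Different labels produce different tags (with high probability).
    assert_ne!(label_tag(secret, "Person"), label_tag(secret, "Account"));
}
```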

Key Management

use astraea_crypto::{KeyPair, EncryptedNode, EncryptedQueryEngine};

// Client generates a key pair
let keys = KeyPair::generate();

// Encrypt a node
let encrypted = EncryptedNode::from_node(&node, &keys.secret);

Server-Side Label Matching

Labels are encrypted with deterministic tags, allowing the server to compare encrypted labels without decryption:

// Server side: search by encrypted label
let encrypted_label = EncryptedLabel::encrypt("Person", &keys.secret);
let results = engine.find_by_encrypted_label(&encrypted_label);

// Client side: decrypt results
for enc_node in results {
    let node = enc_node.to_node(&keys.secret);
}

Encryption Types

Type           | Description
---------------|------------
EncryptedValue | Randomized encryption (same plaintext → different ciphertexts)
EncryptedLabel | Deterministic tag for matching + randomized value for confidentiality
EncryptedNode  | Encrypted labels (individually) + encrypted properties (as JSON blob). Node ID stays plaintext.

GPU Acceleration

The astraea-gpu crate provides a framework for GPU-accelerated graph analytics. Graph algorithms like PageRank are fundamentally matrix operations, which GPUs can execute much faster than CPUs due to their parallel architecture.

CSR Matrix

Graphs are converted to CSR (Compressed Sparse Row) format for efficient matrix operations. CSR is a compact way to represent sparse matrices (matrices with mostly zeros, like adjacency matrices) that enables fast row access:

use astraea_gpu::{CsrMatrix, CpuBackend, GpuBackend};

let nodes = vec![n1, n2, n3, n4];
let csr = CsrMatrix::from_graph(&graph, &nodes)?;
// csr.spmv(&x) -- sparse matrix-vector multiply (the core of PageRank)
// csr.transpose() -- efficient transpose operation
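
A minimal CSR layout and spmv, using the conventional row_offsets / col_indices / values fields (an assumption; not necessarily astraea-gpu's exact layout):

```rust
// Compressed Sparse Row: row i's nonzeros live at
// col_indices[row_offsets[i]..row_offsets[i + 1]].
struct Csr {
    row_offsets: Vec<usize>,
    col_indices: Vec<usize>,
    values: Vec<f64>,
}

impl Csr {
    // Sparse matrix-vector multiply: y = A x, the core of PageRank.
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        let n = self.row_offsets.len() - 1;
        let mut y = vec![0.0; n];
        for i in 0..n {
            for k in self.row_offsets[i]..self.row_offsets[i + 1] {
                y[i] += self.values[k] * x[self.col_indices[k]];
            }
        }
        y
    }
}

fn main() {
    // The 2x2 matrix [[0, 1], [2, 0]] in CSR form.
    let m = Csr {
        row_offsets: vec![0, 1, 2],
        col_indices: vec![1, 0],
        values: vec![1.0, 2.0],
    };
    assert_eq!(m.spmv(&[3.0, 4.0]), vec![4.0, 6.0]);
}
```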

GpuBackend Trait

Method                | Returns              | Description
----------------------|----------------------|------------
pagerank(csr, config) | HashMap<NodeId, f64> | PageRank importance scores
bfs(csr, source)      | HashMap<NodeId, i32> | BFS levels (distance from source, -1 = unreachable)
sssp(csr, source)     | HashMap<NodeId, f64> | SSSP (Single-Source Shortest Path): shortest distances from one node to all others

CPU Fallback

The CpuBackend implements all algorithms in pure Rust. It is always available and serves as the fallback when no GPU is present. The SSSP implementation uses the Bellman-Ford algorithm, which handles negative edge weights (unlike Dijkstra).

Clustering & Sharding

The astraea-cluster crate provides foundations for distributed graph processing.

Partitioning Strategies

Strategy         | Description
-----------------|------------
HashPartitioner  | Assigns nodes to shards via hash(node_id) % num_shards. Deterministic and evenly distributed.
RangePartitioner | Assigns nodes based on ID ranges with configurable boundaries. Can be uniform or custom.

Shard Management

use astraea_cluster::{ShardMap, ShardInfo, HashPartitioner};

let mut shard_map = ShardMap::new(Box::new(HashPartitioner::new(3)));
shard_map.register_shard(info);
let shard = shard_map.shard_for_node(node_id);

Cluster Coordinator

The ClusterCoordinator trait defines the contract for distributed operations. LocalCoordinator is the single-node implementation that routes everything locally.

CLI Reference

Commands

# Start the server
astraeadb serve [--config config.toml] [--bind 0.0.0.0] [--port 7687]

# Interactive shell (REPL)
astraeadb shell [--address 127.0.0.1:7687]

# Check server status
astraeadb status [--address 127.0.0.1:7687]

# Import data from JSON
astraeadb import --file data.json --format json --data-dir ./data

# Export data to JSON
astraeadb export --file dump.json --format json --data-dir ./data

Configuration File (config.toml)

[server]
bind_address = "127.0.0.1"
port = 7687

[storage]
data_dir = "data"
buffer_pool_size = 1024
wal_dir = "data/wal"

Shell Features

Python Client

Installation

# Basic (JSON only, zero dependencies)
pip install ./python

# With Arrow Flight support (quoted so the shell does not expand the brackets)
pip install "./python[arrow]"

Client Types

Client        | Transport                    | Dependencies
--------------|------------------------------|-------------
JsonClient    | TCP / newline-delimited JSON | None (stdlib only)
ArrowClient   | Apache Arrow Flight          | pyarrow >= 14.0
AstraeaClient | Auto-selects best transport  | Optional pyarrow

Usage

from astraeadb import AstraeaClient

# Connect with optional authentication
with AstraeaClient(host="127.0.0.1", port=7687, auth_token="my-api-key") as client:
    # Create nodes (embeddings auto-indexed)
    alice = client.create_node(["Person"], {"name": "Alice", "age": 30},
                                embedding=[0.1] * 128)
    bob = client.create_node(["Person"], {"name": "Bob"})

    # Create a temporal edge
    client.create_edge(alice, bob, "KNOWS", {"since": 2020}, weight=0.9,
                       valid_from=1609459200000)  # Jan 1, 2021 (ms)

    # Traversals
    neighbors = client.neighbors(alice, direction="outgoing")
    path = client.shortest_path(alice, bob, weighted=True)
    reachable = client.bfs(alice, max_depth=2)

    # Temporal queries (time-travel)
    old_neighbors = client.neighbors_at(alice, "outgoing", 1577836800000)  # Jan 1, 2020
    historical_path = client.shortest_path_at(alice, bob, 1577836800000)

    # Vector search
    results = client.vector_search([0.15] * 128, k=5)

    # Hybrid search
    results = client.hybrid_search(anchor=alice, query_vector=[0.15] * 128,
                                    max_hops=3, k=10, alpha=0.5)

    # GraphRAG - extract subgraph context
    context = client.extract_subgraph(alice, hops=2, max_nodes=50, format="prose")

    # GraphRAG - full pipeline with LLM
    answer = client.graph_rag("Who does Alice know?", anchor=alice)

    # GQL query
    result = client.query("MATCH (a:Person) WHERE a.age > 25 RETURN a.name")

    # Batch operations
    node_ids = client.create_nodes([
        {"labels": ["Person"], "properties": {"name": "Charlie"}},
        {"labels": ["Person"], "properties": {"name": "Diana"}}
    ])

    # Health check
    status = client.ping()
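The hybrid_search call above blends structural proximity with vector similarity via alpha. The exact scoring formula is server-side and not documented here; the following is only one plausible reading of the parameters (weighting cosine similarity against hop distance is an assumption, not AstraeaDB's documented scoring):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query_vec, node_vec, hops, alpha=0.5):
    """Hypothetical blend: alpha weighs semantic similarity,
    (1 - alpha) weighs graph closeness (fewer hops = closer)."""
    semantic = cosine(query_vec, node_vec)
    structural = 1.0 / (1.0 + hops)
    return alpha * semantic + (1.0 - alpha) * structural

# A near-identical vector one hop away outranks an opposite vector
# sitting directly on the anchor.
near = hybrid_score([0.15] * 4, [0.15] * 4, hops=1, alpha=0.5)
far = hybrid_score([0.15] * 4, [-0.15] * 4, hops=0, alpha=0.5)
print(near > far)  # → True
```

With alpha=1.0 this degenerates to pure vector search and with alpha=0.0 to pure graph proximity, which matches the intuition behind a single blending knob.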

DataFrame Support (Optional)

from astraeadb import AstraeaClient
from astraeadb.dataframe import import_nodes_df, export_nodes_df, export_bfs_df
import pandas as pd

# Import nodes from a DataFrame
df = pd.DataFrame([
    {"label": "Person", "name": "Alice", "age": 30},
    {"label": "Person", "name": "Bob", "age": 25}
])

with AstraeaClient() as client:
    node_ids = import_nodes_df(client, df, label_col="label")

    # Export nodes back to DataFrame
    result_df = export_nodes_df(client, node_ids)

    # Export BFS results with node details
    bfs_df = export_bfs_df(client, start=node_ids[0], max_depth=2)

Arrow Flight Client (Bulk Operations)

from astraeadb import ArrowClient
import pyarrow as pa

arrow = ArrowClient(host="127.0.0.1", flight_port=7689)

# Query results as Arrow Table (zero-copy to Pandas)
table = arrow.query("MATCH (a:Person) RETURN a.name, a.age")
df = table.to_pandas()

# Bulk import
nodes_table = pa.table({"id": [1, 2], "labels": ["Person", "Person"],
                         "properties": ['{"name":"Alice"}', '{"name":"Bob"}']})
arrow.bulk_insert_nodes(nodes_table)

Python API Reference

Health
  ping() - Health check, returns server version
Node CRUD
  create_node(labels, properties?, embedding?) - Create a node, returns node ID
  get_node(id) - Get node by ID
  update_node(id, properties) - Merge properties into a node
  delete_node(id) - Delete node and all connected edges
Edge CRUD
  create_edge(source, target, type, props?, weight?, valid_from?, valid_to?) - Create edge with optional temporal validity
  get_edge(id) - Get edge by ID
  update_edge(id, properties) - Update edge properties (merge)
  delete_edge(id) - Delete an edge
Traversal
  neighbors(id, direction?, edge_type?) - Get neighbors
  bfs(start, max_depth?) - Breadth-first traversal
  shortest_path(from, to, weighted?) - Shortest path (BFS or Dijkstra)
Temporal
  neighbors_at(id, direction, timestamp, edge_type?) - Neighbors at point in time
  bfs_at(start, max_depth, timestamp) - BFS at point in time
  shortest_path_at(from, to, timestamp, weighted?) - Path at point in time
Vector/Semantic
  vector_search(embedding, k?) - k-nearest-neighbor search
  hybrid_search(anchor, query_vector, max_hops?, k?, alpha?) - Blended graph + vector search
  semantic_neighbors(node, embedding, direction?, k?) - Rank neighbors by concept
  semantic_walk(start, embedding, max_hops?) - Greedy semantic walk
GraphRAG
  extract_subgraph(center, hops?, max_nodes?, format?) - Extract + linearize subgraph
  graph_rag(question, anchor?, question_embedding?, hops?, max_nodes?, format?) - Full RAG pipeline with LLM
GQL
  query(gql_string) - Execute a GQL query
Batch Ops
  create_nodes(nodes_list) - Create multiple nodes
  create_edges(edges_list) - Create multiple edges
  delete_nodes(node_ids) - Delete multiple nodes
  delete_edges(edge_ids) - Delete multiple edges

DataFrame Module (astraeadb.dataframe)

Requires pandas: pip install pandas

import_nodes_df(client, df, label_col, embedding_cols?) - Import nodes from DataFrame
import_edges_df(client, df, source_col, target_col, type_col, ...) - Import edges from DataFrame
export_nodes_df(client, node_ids) - Export nodes to DataFrame
export_edges_df(client, edge_ids) - Export edges to DataFrame
export_bfs_df(client, start, max_depth?) - Export BFS results with node details
export_bfs_at_df(client, start, max_depth, timestamp) - Export temporal BFS to DataFrame

R Client

The R client provides full feature parity with the Python client, supporting all AstraeaDB operations via JSON/TCP, with optional Apache Arrow Flight support for high-performance queries.

Prerequisites

install.packages("jsonlite")  # Required
install.packages("arrow")     # Optional, for Arrow Flight

Client Classes

Class           Transport       Description
AstraeaClient   JSON/TCP        Standard client, always available
ArrowClient     Arrow Flight    High-performance queries (requires arrow package)
UnifiedClient   Auto-select     Uses Arrow when available, falls back to JSON

Basic Usage

source("examples/r_client.R")

# Connect (with optional auth token)
client <- AstraeaClient$new(host = "127.0.0.1", port = 7687L, auth_token = "my-key")
client$connect()

# Create nodes with embeddings
id <- client$create_node(
  list("Person"),
  list(name = "Alice", age = 30),
  embedding = c(0.9, 0.1, 0.3)
)

# Create temporal edges
eid <- client$create_edge(
  id1, id2, "KNOWS",
  properties = list(since = 2024),
  weight = 0.9,
  valid_from = 1704067200000,  # Jan 1, 2024 (ms)
  valid_to = NULL               # Still active
)

client$close()

Full API Reference

Node CRUD
  create_node(labels, properties, embedding=NULL) - Create node, returns ID
  get_node(node_id) - Get node by ID
  update_node(node_id, properties) - Update properties (merge)
  delete_node(node_id) - Delete node + edges
Edge CRUD
  create_edge(src, tgt, type, props, weight, valid_from, valid_to) - Create temporal edge
  get_edge(edge_id) - Get edge by ID
  update_edge(edge_id, properties) - Update properties (merge)
  delete_edge(edge_id) - Delete edge
Traversal
  neighbors(node_id, direction, edge_type) - Get neighbors
  bfs(start, max_depth) - Breadth-first search
  shortest_path(from, to, weighted) - Find shortest path
Temporal
  neighbors_at(node_id, direction, timestamp, edge_type) - Neighbors at point in time
  bfs_at(start, max_depth, timestamp) - BFS at point in time
  shortest_path_at(from, to, timestamp, weighted) - Path at point in time
GQL
  query(gql) - Execute GQL query
Vector/Semantic
  vector_search(query_vector, k) - k-NN vector search
  hybrid_search(anchor, query_vector, max_hops, k, alpha) - Graph + vector combined
  semantic_neighbors(node_id, concept, direction, k) - Neighbors by similarity
  semantic_walk(start, concept, max_hops) - Greedy semantic traversal
GraphRAG
  extract_subgraph(center, hops, max_nodes, format) - Extract + linearize
  graph_rag(question, anchor, embedding, hops, max_nodes, format) - Full RAG pipeline
Batch Ops
  create_nodes(nodes_list) - Create multiple nodes
  create_edges(edges_list) - Create multiple edges
  delete_nodes(node_ids) - Delete multiple nodes
  delete_edges(edge_ids) - Delete multiple edges
Data Frame I/O
  import_nodes_df(df, label_col, embedding_cols) - Import nodes from data.frame
  import_edges_df(df, source_col, target_col, ...) - Import edges from data.frame
  export_nodes_df(node_ids) - Export nodes to data.frame
  export_bfs_df(start, max_depth) - BFS results as data.frame
Utility
  results_to_dataframe(results) - Convert results to data.frame
  nodes_to_dataframe(node_ids) - Fetch nodes as data.frame

Vector Search Example

# Find nodes similar to a "tech interest" vector
tech_vector <- c(1.0, 0.0, 0.0)
results <- client$vector_search(tech_vector, k = 5L)
for (r in results) {
  node <- client$get_node(r$node_id)
  cat(sprintf("  %s (similarity=%.3f)\n", node$properties$name, r$similarity))
}

Temporal Query Example

# See who Alice knew in 2020 vs 2024
t_2020 <- 1577836800000  # Jan 1, 2020
t_2024 <- 1704067200000  # Jan 1, 2024

neighbors_2020 <- client$neighbors_at(alice, "outgoing", t_2020)
neighbors_2024 <- client$neighbors_at(alice, "outgoing", t_2024)

cat(sprintf("2020: %d connections\n", length(neighbors_2020)))
cat(sprintf("2024: %d connections\n", length(neighbors_2024)))

GraphRAG Example

# Extract subgraph and get LLM answer
subgraph <- client$extract_subgraph(alice_id, hops = 2L, format = "structured")
cat(subgraph$text)  # Linearized graph context

# Full RAG pipeline (requires server LLM config)
result <- client$graph_rag(
  question = "Who does Alice work with?",
  anchor = alice_id,
  hops = 2L
)
cat(result$answer)

Batch Operations Example

# Create multiple nodes at once
nodes <- list(
  list(labels = list("Person"), properties = list(name = "Alice"), embedding = c(0.9, 0.1)),
  list(labels = list("Person"), properties = list(name = "Bob"), embedding = c(0.1, 0.9)),
  list(labels = list("Person"), properties = list(name = "Charlie"))
)
node_ids <- client$create_nodes(nodes)

# Create multiple edges at once
edges <- list(
  list(source = node_ids[1], target = node_ids[2], edge_type = "KNOWS"),
  list(source = node_ids[2], target = node_ids[3], edge_type = "KNOWS", weight = 0.5)
)
edge_ids <- client$create_edges(edges)

Data Frame Import/Export

# Import nodes from a data.frame
people_df <- data.frame(
  label = "Person",
  name = c("Alice", "Bob", "Charlie"),
  age = c(30, 25, 35)
)
node_ids <- client$import_nodes_df(people_df, label_col = "label")

# Export BFS results as a data.frame
bfs_df <- client$export_bfs_df(node_ids[1], max_depth = 2L)
print(bfs_df)
#   node_id depth  labels   name age
# 1       1     0  Person  Alice  30
# 2       2     1  Person    Bob  25
# 3       3     2  Person Charlie  35

Arrow Flight (High-Performance)

# Option 1: Use ArrowClient directly
arrow_client <- ArrowClient$new("grpc://localhost:7689")
arrow_client$connect()
result <- arrow_client$query("MATCH (p:Person) RETURN p.name, p.age")  # Returns Arrow Table
df <- arrow_client$query_df("MATCH (p:Person) RETURN p.name")          # Returns data.frame
arrow_client$close()

# Option 2: Use UnifiedClient (auto-selects best transport)
client <- UnifiedClient$new(host = "127.0.0.1", port = 7687L)
client$connect()
cat("Arrow enabled:", client$is_arrow_enabled(), "\n")
result <- client$query_df("MATCH (n) RETURN n")  # Uses Arrow if available
client$close()

Running the Demo

# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
Rscript examples/r_client.R

Go Client

A full-featured Go client library is provided in the go/astraeadb directory, published as github.com/AstraeaDB/AstraeaDB-Official. It supports three transport layers with idiomatic Go patterns including functional options, context.Context on every operation, and thread-safe connections.

Client Types

Installation

go get github.com/AstraeaDB/AstraeaDB-Official

Quick Start

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/AstraeaDB/AstraeaDB-Official"
)

func main() {
    ctx := context.Background()

    // Unified client: auto-selects gRPC when available
    client := astraeadb.NewClient(
        astraeadb.WithAddress("127.0.0.1", 7687),
        astraeadb.WithAuthToken("my-api-key"),
    )
    if err := client.Connect(ctx); err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Create nodes
    alice, _ := client.CreateNode(ctx,
        []string{"Person"},
        map[string]any{"name": "Alice", "age": 30},
        []float32{0.1, 0.2, 0.3},
    )
    bob, _ := client.CreateNode(ctx,
        []string{"Person"},
        map[string]any{"name": "Bob", "age": 25},
        nil,
    )

    // Create a temporal edge with options
    client.CreateEdge(ctx, alice, bob, "KNOWS",
        astraeadb.WithWeight(0.9),
        astraeadb.WithProperties(map[string]any{"since": 2020}),
        astraeadb.WithValidFrom(1609459200000),
    )

    // Traverse, search, and query
    neighbors, _ := client.Neighbors(ctx, alice,
        astraeadb.WithDirection("outgoing"))
    results, _ := client.VectorSearch(ctx, []float32{0.15, 0.25, 0.35}, 5)
    result, _ := client.Query(ctx, "MATCH (n:Person) RETURN n.name")
    rag, _ := client.GraphRAG(ctx, "Who does Alice know?",
        astraeadb.WithAnchor(alice))

    fmt.Println(neighbors, results, result, rag)
}

Configuration Options

The client uses the functional options pattern for configuration:

// All available options
client := astraeadb.NewClient(
    astraeadb.WithAddress("db.example.com", 7687),
    astraeadb.WithGRPCPort(7688),
    astraeadb.WithFlightPort(7689),
    astraeadb.WithAuthToken("my-api-key"),
    astraeadb.WithTimeout(30 * time.Second),
    astraeadb.WithDialTimeout(5 * time.Second),
    astraeadb.WithTLS("ca.pem"),                       // Server TLS
    astraeadb.WithMTLS("client.pem", "client.key", "ca.pem"), // Mutual TLS
    astraeadb.WithMaxRetries(5),
    astraeadb.WithReconnect(true),
)

Error Handling

The Go client provides sentinel errors for programmatic error handling with errors.Is():

import "errors"

_, err := client.GetNode(ctx, 999)
if errors.Is(err, astraeadb.ErrNodeNotFound) {
    // Handle missing node
}

// Available sentinel errors:
// ErrNotConnected, ErrNodeNotFound, ErrEdgeNotFound,
// ErrNoVectorIndex, ErrAccessDenied, ErrInvalidCreds, ErrAuthRequired

Batch Operations

nodes := []astraeadb.NodeInput{
    {Labels: []string{"Person"}, Properties: map[string]any{"name": "Charlie"}},
    {Labels: []string{"Person"}, Properties: map[string]any{"name": "Diana"}},
}
ids, err := client.CreateNodes(ctx, nodes)

edges := []astraeadb.EdgeInput{
    {Source: ids[0], Target: ids[1], EdgeType: "KNOWS", Weight: 0.8},
}
edgeIDs, err := client.CreateEdges(ctx, edges)

API Reference

Health
  Ping(ctx) - Health check, returns server version
Node CRUD
  CreateNode(ctx, labels, properties, embedding) - Create a node, returns node ID
  GetNode(ctx, id) - Retrieve node by ID
  UpdateNode(ctx, id, props) - Merge properties into node
  DeleteNode(ctx, id) - Delete node and connected edges
Edge CRUD
  CreateEdge(ctx, src, tgt, type, opts...) - Create edge with WithWeight, WithProperties, WithValidFrom, WithValidTo
  GetEdge(ctx, id) / UpdateEdge / DeleteEdge - Get, update, or delete an edge
Traversal
  Neighbors(ctx, id, opts...) - Get neighbors with WithDirection, WithEdgeType
  BFS(ctx, start, maxDepth) - Breadth-first traversal
  ShortestPath(ctx, from, to, weighted) - Shortest path (BFS or Dijkstra)
Temporal
  NeighborsAt(ctx, id, direction, timestamp) - Neighbors at point in time
  BFSAt(ctx, start, maxDepth, timestamp) - BFS at point in time
  ShortestPathAt(ctx, from, to, timestamp, weighted) - Path at point in time
Vector
  VectorSearch(ctx, embedding, k) - k-nearest-neighbor search
  HybridSearch(ctx, anchor, embedding, opts...) - Blended graph + vector search
  SemanticNeighbors(ctx, id, concept, opts...) - Rank neighbors by concept similarity
  SemanticWalk(ctx, start, concept, maxHops) - Greedy semantic walk
GraphRAG
  ExtractSubgraph(ctx, center, opts...) - Extract + linearize subgraph
  GraphRAG(ctx, question, opts...) - Full RAG pipeline with LLM
GQL
  Query(ctx, gql) - Execute a GQL query
Batch
  CreateNodes(ctx, nodes) / CreateEdges(ctx, edges) - Batch create
  DeleteNodes(ctx, ids) / DeleteEdges(ctx, ids) - Batch delete

Transport Selection

The unified Client automatically selects the best available transport:

// Check transport availability at runtime
fmt.Println("gRPC available:", client.IsGRPCAvailable())
fmt.Println("Arrow available:", client.IsArrowAvailable())

Running the Tests

# From go/astraeadb/
go test -v -race ./...

# Run with the Makefile
make test

Java Client

A full-featured Java client is provided in the java/astraeadb directory as a Gradle multi-module project. It supports three transport layers with idiomatic Java patterns including records (Java 17+), the builder pattern, try-with-resources lifecycle, and thread-safe connections.

Client Types

Gradle Dependency

dependencies {
    implementation "com.astraeadb:astraeadb-unified:0.1.0"  // All transports
    // Or individual: astraeadb-json, astraeadb-grpc, astraeadb-flight
}

Usage Example

import com.astraeadb.unified.UnifiedClient;
import com.astraeadb.model.*;
import com.astraeadb.options.*;

import java.util.List;
import java.util.Map;

try (var client = UnifiedClient.builder()
        .host("127.0.0.1")
        .authToken("my-api-key")
        .build()) {

    client.connect();

    // Create nodes with embeddings
    long alice = client.createNode(
        List.of("Person"),
        Map.of("name", "Alice", "age", 30),
        new float[]{0.1f, 0.2f, 0.3f});
    // `bob` must exist before the edge below; a null embedding is
    // assumed to mean "no embedding", matching the other clients.
    long bob = client.createNode(
        List.of("Person"),
        Map.of("name", "Bob", "age", 25),
        null);

    // Create a temporal edge
    client.createEdge(alice, bob, "KNOWS",
        EdgeOptions.builder()
            .weight(0.9)
            .validFrom(1609459200000L)
            .build());

    // Traverse, search, query
    List<NeighborEntry> neighbors = client.neighbors(alice,
        NeighborOptions.builder().direction("outgoing").build());
    List<SearchResult> results = client.vectorSearch(
        new float[]{0.15f, 0.25f, 0.35f}, 5);
    QueryResult result = client.query(
        "MATCH (n:Person) RETURN n.name");
    RagResult rag = client.graphRag("Who does Alice know?",
        RagOptions.builder().anchor(alice).hops(2).build());
}

Exception Handling

The Java client uses a checked exception hierarchy rooted at AstraeaException:

try {
    Node node = client.getNode(999);
} catch (NodeNotFoundException e) {
    // Specific exception for not-found
} catch (AccessDeniedException e) {
    // Permission error
} catch (AstraeaException e) {
    // Base exception for all errors
}

Java API Reference

Health
  ping() - Health check, returns version
Node CRUD
  createNode(labels, props, embedding) - Create node, returns ID
  getNode(id) / updateNode(id, props) / deleteNode(id) - Read/update/delete
Edge CRUD
  createEdge(src, tgt, type, options) - Create edge with EdgeOptions
  getEdge(id) / updateEdge(id, props) / deleteEdge(id) - Read/update/delete
Traversal
  neighbors(id, options) - Get neighbors with NeighborOptions
  bfs(start, maxDepth) - Breadth-first traversal
  shortestPath(from, to, weighted) - Shortest path
Temporal
  neighborsAt(id, dir, timestamp) - Neighbors at time T
  bfsAt(start, depth, timestamp) - BFS at time T
  shortestPathAt(from, to, ts, weighted) - Path at time T
Vector
  vectorSearch(embedding, k) - k-NN search
  hybridSearch(anchor, embedding, options) - Graph + vector
  semanticNeighbors(id, concept, options) - Neighbors by similarity
  semanticWalk(start, concept, maxHops) - Semantic walk
GraphRAG
  extractSubgraph(center, options) - Extract + linearize
  graphRag(question, options) - Full RAG pipeline
GQL
  query(gql) - Execute a GQL query
Batch
  createNodes(nodes) / createEdges(edges) - Batch create
  deleteNodes(ids) / deleteEdges(ids) - Batch delete

Cybersecurity Demo

This example demonstrates how AstraeaDB enables security analysts to investigate network alerts by tracing connections through a graph.

The Problem

When a firewall alerts on suspicious traffic from 10.0.1.50, the analyst must manually search DHCP logs, asset management records, and other sources to trace the IP to a user. With AstraeaDB, these datasets are loaded as a graph and the investigation becomes a series of traversals.

Graph Model

User <--[ASSIGNED_TO]-- Laptop <--[DHCP_LEASE]-- IPAddress
                                                    |
                                              [TRAFFIC]  [TRIGGERED]
                                                    |         |
                                              IPAddress  FirewallAlert --[TARGETS]--> ExternalHost

The Scenario

Three employees — Alice (Engineering), Bob (Finance), and Eve (Marketing) — each have laptops with DHCP-assigned IPs. Eve's attack chain:

  1. Downloads a password cracker from darktools.example.com (port 443)
  2. Firewall logs the connection (alert FW-2025-0042, severity: critical)
  3. Attempts RDP to Bob's machine at 10.0.1.20:3389 (blocked)
  4. Attempts SSH to Alice's machine at 10.0.1.10:22 (blocked)

Investigation with AstraeaDB

# Step 1: Who triggered alert FW-2025-0042?
sources = client.neighbors(alert_id, "incoming", edge_type="TRIGGERED")
# → Source IP: 10.0.1.50

# Step 2: Trace IP → Laptop via DHCP lease
leases = client.neighbors(source_ip_id, "outgoing", edge_type="DHCP_LEASE")
# → Laptop: EVE-LAT01

# Step 3: Trace Laptop → User
users = client.neighbors(laptop_id, "outgoing", edge_type="ASSIGNED_TO")
# → User: Eve (Marketing, Analyst)

# Step 4: What else has Eve's IP been doing?
traffic = client.neighbors(source_ip_id, "outgoing", edge_type="TRAFFIC")
# → darktools.example.com:443, 10.0.1.20:3389 (RDP), 10.0.1.10:22 (SSH)

# Step 5: BFS blast radius
blast_radius = client.bfs(source_ip_id, max_depth=2)
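The same pivot chain can be reproduced offline on a toy adjacency list, which makes the traversal logic concrete. This is plain Python with no server; the node names and edge types mirror the demo's graph model, and the neighbors() helper is a stand-in for the client call:

```python
# Directed edges (source, edge_type, target) mirroring the demo's model.
EDGES = [
    ("10.0.1.50", "TRIGGERED", "FW-2025-0042"),
    ("10.0.1.50", "DHCP_LEASE", "EVE-LAT01"),
    ("EVE-LAT01", "ASSIGNED_TO", "Eve"),
    ("10.0.1.50", "TRAFFIC", "darktools.example.com:443"),
    ("10.0.1.50", "TRAFFIC", "10.0.1.20:3389"),
    ("10.0.1.50", "TRAFFIC", "10.0.1.10:22"),
]

def neighbors(node, direction="outgoing", edge_type=None):
    """Directional neighbor lookup, like the client's neighbors()."""
    out = []
    for src, etype, tgt in EDGES:
        if edge_type and etype != edge_type:
            continue
        if direction == "outgoing" and src == node:
            out.append(tgt)
        elif direction == "incoming" and tgt == node:
            out.append(src)
    return out

# Step 1: who triggered the alert? (follow TRIGGERED backwards)
source_ip = neighbors("FW-2025-0042", "incoming", "TRIGGERED")[0]
# Steps 2-3: IP -> laptop -> user
laptop = neighbors(source_ip, "outgoing", "DHCP_LEASE")[0]
user = neighbors(laptop, "outgoing", "ASSIGNED_TO")[0]
print(source_ip, laptop, user)  # → 10.0.1.50 EVE-LAT01 Eve
```

Each investigative question becomes one constant-time neighbor lookup instead of a grep across separate log sources, which is the point of loading the datasets as a graph.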

Running the Demo

# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
python3 examples/cybersecurity_demo.py

13 Rust tests cover this scenario in the astraea-graph crate:

cargo test --package astraea-graph cybersecurity

API Reference

Complete JSON request/response format for all request types. All requests are newline-delimited JSON sent over TCP (port 7687).
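Because the wire format is one JSON object per line, a round trip can be framed with nothing but the standard library. The sketch below builds and parses the frames without opening a socket (connecting to a live server is left out; the helper names are illustrative):

```python
import json

def encode_request(req: dict) -> bytes:
    """Serialize a request as one newline-terminated JSON line."""
    return (json.dumps(req, separators=(",", ":")) + "\n").encode("utf-8")

def decode_response(line: bytes) -> dict:
    """Parse one newline-delimited JSON response line."""
    return json.loads(line.decode("utf-8"))

wire = encode_request({"type": "Ping"})
print(wire)  # → b'{"type":"Ping"}\n'

# A response shaped like the documented Ping reply:
reply = decode_response(b'{"status":"ok","data":{"pong":true,"version":"0.1.0"}}\n')
print(reply["data"]["version"])  # → 0.1.0
```

In a real client these bytes would go over a TCP connection to port 7687 (e.g. via `socket.create_connection` and a buffered `readline`), one request and one response per line.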

Node Operations

CreateNode

// Request
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice","age":30},"embedding":[0.1,0.2]}
// Response
{"status":"ok","data":{"node_id":1}}

GetNode

{"type":"GetNode","id":1}
// Response includes id, labels, properties, embedding

UpdateNode

{"type":"UpdateNode","id":1,"properties":{"title":"Engineer"}}

DeleteNode

{"type":"DeleteNode","id":1}
// Cascades: all connected edges are also deleted

Edge Operations

CreateEdge

{"type":"CreateEdge","source":1,"target":2,"edge_type":"KNOWS",
 "properties":{"since":2020},"weight":0.9,
 "valid_from":1704067200000,"valid_to":null}

GetEdge / UpdateEdge / DeleteEdge

{"type":"GetEdge","id":1}
{"type":"UpdateEdge","id":1,"properties":{"note":"updated"}}
{"type":"DeleteEdge","id":1}

Traversal Operations

Neighbors

{"type":"Neighbors","id":1,"direction":"outgoing","edge_type":"KNOWS"}

Bfs

{"type":"Bfs","start":1,"max_depth":3}

ShortestPath

{"type":"ShortestPath","from":1,"to":5,"weighted":false}

Vector & Semantic Operations

VectorSearch

{"type":"VectorSearch","query":[0.1,0.2,0.3],"k":10}

HybridSearch

{"type":"HybridSearch","anchor":1,"query":[0.1,0.2],"max_hops":3,"k":10,"alpha":0.5}

SemanticNeighbors

{"type":"SemanticNeighbors","id":1,"concept":[0.1,0.2],"direction":"outgoing","k":5}

SemanticWalk

{"type":"SemanticWalk","start":1,"concept":[0.1,0.2],"max_hops":4}

Temporal Operations

NeighborsAt

{"type":"NeighborsAt","id":1,"direction":"outgoing","timestamp":1736929800000}

BfsAt

{"type":"BfsAt","start":1,"max_depth":3,"timestamp":1736929800000}

ShortestPathAt

{"type":"ShortestPathAt","from":1,"to":5,"timestamp":1736929800000,"weighted":false}

RAG Operations

ExtractSubgraph

{"type":"ExtractSubgraph","center":1,"hops":2,"max_nodes":50,"format":"structured"}

GraphRag

{"type":"GraphRag","question":"Who compromised the server?",
 "anchor":1,"hops":2,"max_nodes":50,"format":"structured"}

Query & Health

Query

{"type":"Query","gql":"MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name"}

Ping

{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}

Glossary

Quick reference for technical terms used throughout this documentation.

ANN Approximate Nearest Neighbor — A search algorithm that finds vectors close to a query vector, trading perfect accuracy for speed. Returns results that are "good enough" rather than guaranteed optimal.
Arrow Flight A high-performance protocol for streaming columnar data using Apache Arrow's in-memory format. Enables zero-copy data transfer between database and client.
BFS Breadth-First Search — A graph traversal algorithm that explores all neighbors at the current depth before moving deeper. Visits nodes level-by-level.
CSR Compressed Sparse Row — A memory-efficient matrix format that stores only non-zero values. Used to represent sparse graphs as adjacency matrices for fast GPU operations.
DFS Depth-First Search — A graph traversal algorithm that explores as far as possible along each branch before backtracking. Follows one path to its end before trying alternatives.
Embedding A fixed-size numeric vector (array of floats) that captures the semantic meaning of data. Similar concepts have similar embeddings, enabling similarity search.
FHE Fully Homomorphic Encryption — Encryption that allows computation on encrypted data without decrypting it first. The server never sees plaintext.
GNN Graph Neural Network — A neural network designed for graph-structured data. Learns by passing messages between connected nodes to capture both features and structure.
GQL Graph Query Language — The ISO standard (2024) for querying graph databases. Combines the best features of Cypher and SQL with pattern-matching syntax.
GraphRAG Graph-enhanced Retrieval-Augmented Generation — A technique that extracts relevant subgraphs, converts them to text, and feeds them to an LLM to answer questions with graph context.
gRPC Google Remote Procedure Call — A high-performance RPC framework using Protocol Buffers for serialization. More efficient than JSON for structured data.
HNSW Hierarchical Navigable Small World — A graph-based algorithm for approximate nearest neighbor search. Builds a multi-layer navigation structure for fast similarity queries with O(log n) complexity.
io_uring A Linux kernel interface for high-performance asynchronous I/O. Uses shared ring buffers between user space and kernel to minimize syscall overhead.
LLM Large Language Model — An AI model trained on vast text data that can understand and generate human language (e.g., GPT-4, Claude).
LSN Log Sequence Number — A monotonically increasing identifier for each entry in the write-ahead log. Used for recovery and replication.
mTLS Mutual TLS — Two-way TLS authentication where both client and server present certificates. Provides strong identity verification for both parties.
MVCC Multi-Version Concurrency Control — A database technique that maintains multiple versions of data to allow concurrent reads and writes without blocking. Each transaction sees a consistent snapshot.
NVMe Non-Volatile Memory Express — A high-speed storage interface protocol designed for SSDs. Provides much lower latency than SATA or SAS.
Parquet A columnar file format optimized for analytics. Stores data by column rather than row, enabling efficient compression and fast analytical queries.
Pointer Swizzling A technique that converts disk-based identifiers (64-bit IDs) into direct memory pointers when data is loaded into RAM, enabling nanosecond-level access.
RBAC Role-Based Access Control — A security model where permissions are assigned to roles (Admin, Writer, Reader) rather than individual users. Users are granted roles.
SCC Strongly Connected Components — Maximal subgraphs where every node can reach every other node following directed edges. Used to find tightly-knit groups.
SEAL Microsoft Simple Encrypted Arithmetic Library — An open-source library for homomorphic encryption that enables computation on encrypted data.
SSSP Single-Source Shortest Path — An algorithm that computes the shortest distance from one source node to all other nodes in the graph (e.g., Dijkstra's algorithm).
TLS Transport Layer Security — A cryptographic protocol that provides secure communication over networks. Successor to SSL.
WAL Write-Ahead Log — A durability mechanism that logs all mutations to disk before applying them. Enables crash recovery by replaying the log.

AstraeaDB — Cloud-Native, AI-First Graph Database — MIT License

441 Rust tests • 23 Python tests • 30 Go tests • 113 Java tests • 14 crates • Edition 2024