Introduction

AstraeaDB is a cloud-native, AI-first graph database written in Rust. It combines a Vector-Property Graph model with an HNSW (Hierarchical Navigable Small World) vector index, enabling both structural graph traversals and semantic similarity search in a single system.

Key Differentiators

Project Stats

Metric           | Value
-----------------|------
Rust Crates      | 14
Rust Tests       | 441
Python Tests     | 23
Go Tests         | 30
Java Tests       | 113
Client Libraries | Python, R, Go, Java, Rust (embedded)
Rust Edition     | 2024
License          | MIT

Getting Started

Build

cargo build --workspace

Run Tests

cargo test --workspace

Start the Server

cargo run -p astraea-cli -- serve

This starts the TCP server on port 7687, gRPC on 7688, and Arrow Flight on 7689.

Connect with the Shell

cargo run -p astraea-cli -- shell

The interactive shell supports both GQL queries and raw JSON requests:

# GQL queries
astraea> CREATE (a:Person {name: "Alice", age: 30})
Nodes created: 1

astraea> MATCH (a:Person) WHERE a.age > 25 RETURN a.name, a.age
+-------+------+
| a.name| a.age|
+-------+------+
| Alice | 30   |
+-------+------+

# Dot-commands
astraea> .status
astraea> .help
astraea> .quit

Check Server Status

cargo run -p astraea-cli -- status

Architecture

+---------------------------------------------------------------+
|                          astraea-cli                          |
|                serve | shell | import | export                |
+---------------------------------------------------------------+
| astraea-server     | astraea-flight     | Client Libraries    |
| JSON-TCP (7687)    | Arrow Flight       | Python, R, Go,      |
| gRPC (7688)        | do_get / do_put    | Java                |
| Auth, Metrics      |                    | JSON + gRPC         |
| Connection Mgmt    |                    | + Arrow Flight      |
+---------------------------------------------------------------+
| astraea-rag   | astraea-query | astraea-gnn   | astraea-      |
| Subgraph      | GQL Parser    | Tensor,       | algorithms    |
| LLM, RAG      | + Executor    | MsgPassing,   | PageRank,     |
|               |               | Training      | Louvain, etc. |
+---------------------------------------------------------------+
| astraea-graph                 | astraea-vector                |
| CRUD, BFS, DFS                | HNSW Index                    |
| Hybrid Search                 | ANN Search                    |
| Temporal Queries              | Persistence                   |
+---------------------------------------------------------------+
|                        astraea-storage                        |
|             Pages → Buffer Pool → Pointer Swizzle             |
|                MVCC, WAL, PageIO, Cold Storage                |
+---------------------------------------------------------------+
|                         astraea-core                          |
|                     Types, Traits, Errors                     |
+---------------------------------------------------------------+
| astraea-crypto     | astraea-gpu        | astraea-cluster     |
| Encrypted          | CSR Matrix,        | Partitioning,       |
| Labels, FHE        | CPU Backend        | Sharding,           |
|                    | PageRank/BFS       | Coordination        |
+---------------------------------------------------------------+

Crate Overview

Crate              | Purpose                                                                                                                            | Tests
-------------------|------------------------------------------------------------------------------------------------------------------------------------|------
astraea-core       | Types (Node, Edge, NodeId), traits (StorageEngine, GraphOps, VectorIndex, TransactionalEngine), errors                             | 4
astraea-storage    | 8 KiB pages, LRU buffer pool, pointer swizzling, MVCC, WAL, label index, cold storage (JSON/Parquet/S3), PageIO (memmap2/io_uring) | 75
astraea-graph      | CRUD, BFS, DFS, Dijkstra, temporal queries, hybrid search, semantic traversal                                                      | 55
astraea-query      | GQL lexer, recursive-descent parser, full query executor                                                                           | 56
astraea-vector     | HNSW index, cosine/Euclidean/dot-product, binary persistence                                                                       | 33
astraea-rag        | Subgraph extraction, linearization (4 formats), LLM providers, GraphRAG pipeline                                                   | 27
astraea-gnn        | Differentiable tensors, message passing, node classification training                                                              | 26
astraea-server     | TCP/gRPC server, auth (RBAC + mTLS), metrics (Prometheus), connection management                                                   | 68
astraea-flight     | Arrow Flight: do_get (query → Arrow), do_put (Arrow → import)                                                                      | 11
astraea-algorithms | PageRank, connected/strongly-connected components, centrality, Louvain                                                             | 20
astraea-crypto     | Encrypted labels/values/nodes, server-side encrypted label matching                                                                | 31
astraea-gpu        | CSR matrix, GpuBackend trait, CpuBackend (PageRank, BFS, SSSP)                                                                     | 16
astraea-cluster    | Hash/range partitioning, shard management, cluster coordinator                                                                     | 19
astraea-cli        | serve, shell, status, import, export                                                                                               |

Data Model

AstraeaDB uses a Vector-Property Graph model that unifies property graphs with vector embeddings.

Node

A node has an ID, a set of labels, arbitrary JSON properties, and an optional embedding vector.

pub struct Node {
    pub id: NodeId,
    pub labels: Vec<String>,
    pub properties: serde_json::Value,
    pub embedding: Option<Vec<f32>>, // optional dense vector
}

Edge

An edge connects two nodes with a type, JSON properties, a learnable weight, and a temporal validity interval.

pub struct Edge {
    pub id: EdgeId,
    pub source: NodeId,
    pub target: NodeId,
    pub edge_type: String,
    pub properties: serde_json::Value,
    pub weight: f64,                   // learnable weight for GNN
    pub validity: ValidityInterval,    // temporal bounds
}

ValidityInterval

Represents when an edge is valid. Uses epoch milliseconds with inclusive start and exclusive end.

pub struct ValidityInterval {
    pub valid_from: Option<i64>,  // inclusive, None = unbounded
    pub valid_to: Option<i64>,    // exclusive, None = still valid
}

// Check if an edge is valid at a given time
let valid = edge.validity.contains(1704067200000); // 2024-01-01
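
The inclusive-start / exclusive-end semantics above can be sketched as a contains implementation. The struct mirrors the definition above; the method body is illustrative rather than the crate's actual code.

```rust
// Illustrative sketch of ValidityInterval::contains, following the
// semantics stated above: inclusive start, exclusive end, None = unbounded.
pub struct ValidityInterval {
    pub valid_from: Option<i64>, // inclusive, None = unbounded
    pub valid_to: Option<i64>,   // exclusive, None = still valid
}

impl ValidityInterval {
    pub fn contains(&self, ts: i64) -> bool {
        let after_start = self.valid_from.map_or(true, |from| ts >= from);
        let before_end = self.valid_to.map_or(true, |to| ts < to);
        after_start && before_end
    }
}

fn main() {
    let lease = ValidityInterval { valid_from: Some(100), valid_to: Some(200) };
    assert!(lease.contains(100));  // start is inclusive
    assert!(!lease.contains(200)); // end is exclusive
    let open = ValidityInterval { valid_from: None, valid_to: None };
    assert!(open.contains(i64::MIN)); // fully unbounded interval
}
```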

GraphPath

Represents a path through the graph as a start node followed by (edge, node) steps.

pub struct GraphPath {
    pub start: NodeId,
    pub steps: Vec<(EdgeId, NodeId)>,
}

ID Types

Type               | Description
-------------------|------------
NodeId(u64)        | Unique node identifier
EdgeId(u64)        | Unique edge identifier
PageId(u64)        | Storage page identifier
TransactionId(u64) | MVCC transaction identifier
Lsn(u64)           | Write-ahead log sequence number

Storage Engine

AstraeaDB uses a three-tier storage architecture optimized for graph workloads.

Tier 1: Cold Storage

Data at rest on disk or object storage. The ColdStorage trait provides a pluggable backend interface with three implementations:

Backend                | Description                               | Use Case
-----------------------|-------------------------------------------|---------
JsonFileColdStorage    | Human-readable JSON files on local disk   | Development, debugging, small datasets
ParquetColdStorage     | Columnar Apache Parquet with Arrow schema | Analytics, large datasets, efficient compression
ObjectStoreColdStorage | S3, GCS, Azure Blob, or local filesystem  | Cloud-native deployments, data lake integration

Parquet Schema

Nodes and edges are stored with full Arrow schema mapping:

// Node schema
id: UInt64, labels: List<Utf8>, properties: Utf8, embedding: List<Float32>

// Edge schema
id: UInt64, source: UInt64, target: UInt64, edge_type: Utf8,
properties: Utf8, weight: Float64, valid_from: Int64, valid_to: Int64

Object Store Usage

use astraea_storage::ObjectStoreColdStorage;

// Local filesystem
let storage = ObjectStoreColdStorage::local("/data/cold")?;

// Amazon S3
let storage = ObjectStoreColdStorage::s3("my-bucket", "astraea/")?;

// Google Cloud Storage
let storage = ObjectStoreColdStorage::gcs("my-bucket", "astraea/")?;

// Azure Blob Storage
let storage = ObjectStoreColdStorage::azure("my-container", "astraea/")?;

Tier 2: Warm (Buffer Pool)

An LRU buffer pool caches frequently accessed 8 KiB pages in memory with pin/unpin semantics. The PageIO trait abstracts disk I/O with two backends:

Backend     | Platform      | Description
------------|---------------|------------
FileManager | All platforms | Cross-platform memmap2-based I/O (default)
UringPageIO | Linux only    | High-performance io_uring async I/O (feature-gated)

Enabling io_uring (Linux)

# Cargo.toml
[dependencies]
astraea-storage = { version = "0.1", features = ["io-uring"] }

# Build command
cargo build --features io-uring

Tier 3: Hot (Pointer Swizzling)

Frequently accessed pages are promoted to permanently pinned status, preventing eviction and enabling zero-copy access. When a page's access count exceeds a configurable threshold, it is "swizzled" into the hot tier.
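
A minimal sketch of that promotion rule, assuming a simple per-page access counter; the names (HotTracker, record_access) are hypothetical, not the crate's API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of hot-tier promotion: once a page's access count
// crosses the threshold, it is pinned permanently and never evicted.
struct HotTracker {
    threshold: u64,
    access_counts: HashMap<u64, u64>, // page_id -> access count
    hot_pages: Vec<u64>,              // permanently pinned ("swizzled")
}

impl HotTracker {
    fn record_access(&mut self, page_id: u64) {
        let count = self.access_counts.entry(page_id).or_insert(0);
        *count += 1;
        if *count == self.threshold {
            self.hot_pages.push(page_id); // promote exactly once
        }
    }
}

fn main() {
    let mut t = HotTracker {
        threshold: 3,
        access_counts: HashMap::new(),
        hot_pages: Vec::new(),
    };
    for _ in 0..3 {
        t.record_access(42);
    }
    assert_eq!(t.hot_pages, vec![42]); // promoted on the third access
}
```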

Page Format

+----------------------------------+
| PageHeader (17 bytes)            |
|   page_id, type, record_count,  |
|   free_space_offset, checksum   |
+----------------------------------+
| Record 0: NodeRecordHeader       |
|   node_id, data_len, adj_offset |
|   + serialized properties       |
+----------------------------------+
| Record 1: ...                    |
+----------------------------------+
|         (free space)             |
+----------------------------------+
         8192 bytes total

MVCC Transactions

MVCC (Multi-Version Concurrency Control) allows multiple transactions to read and write data concurrently without blocking each other. AstraeaDB uses snapshot isolation with first-writer-wins conflict detection; transactional access goes through the TransactionalEngine trait.
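
Snapshot isolation reduces to a version-visibility rule that can be sketched in a few lines. The Version layout below is illustrative, not astraea-storage's internal representation.

```rust
// Sketch of MVCC version visibility under snapshot isolation.
struct Version {
    begin_tx: u64,       // transaction that wrote this version
    end_tx: Option<u64>, // transaction that superseded it, if any
    value: &'static str,
}

// A snapshot taken at `snap` sees a version if it was written at or
// before the snapshot and not yet superseded at the snapshot.
fn visible(v: &Version, snap: u64) -> bool {
    v.begin_tx <= snap && v.end_tx.map_or(true, |end| end > snap)
}

fn main() {
    let v1 = Version { begin_tx: 1, end_tx: Some(5), value: "old" };
    let v2 = Version { begin_tx: 5, end_tx: None, value: "new" };
    // A reader whose snapshot is at tx 3 still sees the old version.
    assert!(visible(&v1, 3) && !visible(&v2, 3));
    println!("reader@3 sees {}", v1.value);
    // A reader at tx 7 sees only the new version.
    assert!(!visible(&v1, 7) && visible(&v2, 7));
    println!("reader@7 sees {}", v2.value);
}
```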

Write-Ahead Log (WAL)

The WAL (Write-Ahead Log) ensures durability: every mutation is logged to disk before being applied to the data files. If the database crashes, it can recover by replaying the log. Records use a [length][type][JSON payload][CRC32] frame format. Supports BeginTransaction, CommitTransaction, and AbortTransaction records for crash recovery.
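
The frame layout can be sketched as follows. The CRC32 here is a self-contained IEEE implementation for illustration, and the field order simply mirrors the format described above.

```rust
// Bitwise IEEE CRC32, included so the sketch is self-contained;
// the real crate presumably uses a CRC library.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Build a [length][type][JSON payload][CRC32] frame.
fn encode_frame(record_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::new();
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes()); // [length]
    frame.push(record_type);                                        // [type]
    frame.extend_from_slice(payload);                               // [JSON payload]
    frame.extend_from_slice(&crc32(payload).to_le_bytes());         // [CRC32]
    frame
}

fn main() {
    let frame = encode_frame(1, br#"{"op":"BeginTransaction"}"#);
    // length(4) + type(1) + payload(25) + crc(4) = 34 bytes
    assert_eq!(frame.len(), 34);
}
```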

Label Index

A hash-based index (HashMap<String, HashSet<NodeId>>) providing O(1) label-based lookups, automatically maintained when nodes are created or deleted.
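
A sketch of how such an index stays consistent with node lifecycle events; names here are illustrative, not the crate's API.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative label index: updated on create/delete so label lookups
// never scan all nodes.
struct LabelIndex {
    map: HashMap<String, HashSet<u64>>, // label -> node IDs
}

impl LabelIndex {
    fn on_create(&mut self, node_id: u64, labels: &[String]) {
        for l in labels {
            self.map.entry(l.clone()).or_default().insert(node_id);
        }
    }
    fn on_delete(&mut self, node_id: u64, labels: &[String]) {
        for l in labels {
            if let Some(set) = self.map.get_mut(l) {
                set.remove(&node_id);
            }
        }
    }
    fn find(&self, label: &str) -> Vec<u64> {
        self.map
            .get(label)
            .map_or(Vec::new(), |s| s.iter().copied().collect())
    }
}

fn main() {
    let mut idx = LabelIndex { map: HashMap::new() };
    idx.on_create(1, &["Person".to_string()]);
    assert_eq!(idx.find("Person"), vec![1]);
    idx.on_delete(1, &["Person".to_string()]);
    assert!(idx.find("Person").is_empty());
}
```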

Graph Operations

The GraphOps trait defines all graph-level operations. It is implemented by the Graph struct on top of any StorageEngine.

CRUD Operations

use astraea_graph::Graph;
use astraea_core::traits::GraphOps;

// Create nodes
let alice = graph.create_node(
    vec!["Person".into()],
    json!({"name": "Alice", "age": 30}),
    None,  // no embedding
)?;

// Create an edge (always valid)
graph.create_edge(alice, bob, "KNOWS".into(), json!({}), 1.0, None, None)?;

// Read
let node = graph.get_node(alice)?;
let edge = graph.get_edge(edge_id)?;

// Update (merge semantics)
graph.update_node(alice, json!({"title": "Engineer"}))?;

// Delete (node deletion cascades to edges)
graph.delete_node(alice)?;

Traversals

Graph traversal algorithms explore nodes by following edges:

Method                           | Algorithm                  | Description
---------------------------------|----------------------------|------------
bfs(start, max_depth)            | BFS (Breadth-First Search) | Explores neighbors level by level. Returns Vec<(NodeId, depth)>
dfs(start, max_depth)            | DFS (Depth-First Search)   | Explores as far as possible before backtracking. Returns Vec<NodeId>
shortest_path(from, to)          | BFS                        | Unweighted shortest path (fewest hops)
shortest_path_weighted(from, to) | Dijkstra's algorithm       | Weighted shortest path using edge weights

Neighbor Queries

use astraea_core::types::Direction;

// All outgoing neighbors
let neighbors = graph.neighbors(alice, Direction::Outgoing)?;

// Filtered by edge type
let friends = graph.neighbors_filtered(alice, Direction::Both, "KNOWS")?;

// Find nodes by label (O(1) via label index)
let people = graph.find_by_label("Person")?;

Temporal Queries

Edges have a ValidityInterval that defines when they exist. Temporal query methods filter edges by a given timestamp, allowing you to query the graph as it existed at any point in time.

Creating Temporal Edges

// DHCP lease valid from 08:00 to 10:00 UTC on Jan 15, 2025
graph.create_edge(
    ip_node, laptop_node,
    "DHCP_LEASE".into(),
    json!({"dhcp_server": "10.0.0.1"}),
    1.0,
    Some(1736928000000),  // valid_from
    Some(1736935200000),  // valid_to
)?;

Temporal Traversal Methods

Method                                         | Description
-----------------------------------------------|------------
neighbors_at(node, direction, timestamp)       | Neighbors via edges valid at the timestamp
bfs_at(start, max_depth, timestamp)            | BFS traversal following only valid edges
shortest_path_at(from, to, timestamp)          | Unweighted shortest path at a point in time
shortest_path_weighted_at(from, to, timestamp) | Dijkstra with temporal filtering

Server Requests

// Neighbors at a specific time
{"type": "NeighborsAt", "id": 42, "direction": "outgoing", "timestamp": 1736929800000}

// BFS at a specific time
{"type": "BfsAt", "start": 42, "max_depth": 3, "timestamp": 1736929800000}

// Shortest path at a specific time
{"type": "ShortestPathAt", "from": 1, "to": 5, "timestamp": 1736929800000, "weighted": false}

HNSW Vector Index

AstraeaDB includes a full implementation of the HNSW (Hierarchical Navigable Small World) algorithm for ANN (Approximate Nearest-Neighbor) search. HNSW builds a multi-layer graph where each layer is a "small world" network — most nodes are not directly connected, but any two nodes can be reached through a small number of hops. This enables finding similar vectors in logarithmic time rather than scanning all vectors.

What are embeddings? Embeddings are numeric vectors (arrays of floating-point numbers) that represent the semantic meaning of data. Similar concepts have similar embeddings — two sentences about "dogs" will have embeddings that are mathematically close together, even if they use different words.

Configuration

Parameter       | Default | Description
----------------|---------|------------
M               | 16      | Maximum connections per node per layer (higher = more accurate but slower)
ef_construction | 200     | Beam width during index building (higher = better quality index)
ef_search       | 50      | Beam width during search (trade-off: higher = more accurate, lower = faster)

Distance Metrics

Distance metrics measure how "far apart" two vectors are. Lower distance means more similar:

Metric     | Description                                                        | Best For
-----------|--------------------------------------------------------------------|---------
Cosine     | Measures the angle between vectors (1 - cos θ). Ignores magnitude. | Text embeddings, normalized vectors
Euclidean  | Straight-line distance (L2 norm). Considers magnitude.             | Spatial data, image features
DotProduct | Negative dot product. Higher dot product = more similar.           | Recommendation systems, MIPS
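
The three metrics can be sketched directly from their definitions; note how the cosine variant ignores magnitude while Euclidean does not. These are illustrative functions, not the crate's internals.

```rust
// Cosine distance: 1 - cos(theta). Parallel vectors -> 0 regardless of length.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

// Euclidean (L2) distance: considers magnitude.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

// Negated dot product, so that lower = more similar, like the others.
fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

fn main() {
    // Parallel vectors: zero cosine distance despite different magnitudes.
    assert!(cosine_distance(&[1.0, 0.0], &[3.0, 0.0]).abs() < 1e-6);
    // Euclidean distance does see the magnitude difference.
    assert!((euclidean_distance(&[1.0, 0.0], &[3.0, 0.0]) - 2.0).abs() < 1e-6);
    // Larger dot product -> smaller (more negative) distance.
    assert!(dot_product_distance(&[1.0, 1.0], &[2.0, 2.0]) < dot_product_distance(&[1.0, 1.0], &[1.0, 0.0]));
}
```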

Usage

use astraea_vector::HnswVectorIndex;
use astraea_core::types::DistanceMetric;
use astraea_core::traits::VectorIndex;

// Create a 128-dimensional index with cosine distance
let index = HnswVectorIndex::new(128, DistanceMetric::Cosine);

// Insert embeddings
index.insert(node_id, &embedding_vec)?;

// Search for k nearest neighbors
let results = index.search(&query_vec, 10)?;
for r in &results {
    println!("Node {:?}, distance: {}", r.node_id, r.distance);
}

Persistence

The index can be saved to and loaded from disk using a versioned binary format (magic bytes + bincode):

// Save to disk
index.save("index.hnsw")?;

// Load from disk (no rebuild needed)
let index = HnswVectorIndex::load("index.hnsw")?;

Auto-Indexing

When a VectorIndex is attached to a Graph, embeddings are automatically indexed on create_node() and removed on delete_node().

Hybrid & Semantic Search

Hybrid Search

Combines graph proximity with vector similarity using a configurable alpha blending parameter:

final_score = alpha × vector_score + (1 - alpha) × graph_score

graph.hybrid_search(
    anchor_node,       // starting node for BFS
    &query_embedding,  // semantic target
    3,                 // max_hops (BFS radius)
    10,                // top-k results
    0.5,               // alpha: 0.0 = pure graph, 1.0 = pure vector
)?;

Semantic Neighbors

Rank a node's neighbors by their embedding similarity to a concept vector:

// "Find the neighbor of Alice most similar to the concept of Risk"
let ranked = graph.semantic_neighbors(
    alice_id,
    &risk_embedding,
    Direction::Outgoing,
    5,  // top-k
)?;

Semantic Walk

A greedy multi-hop walk that at each step moves to the unvisited neighbor most similar to the concept:

// Walk through the graph toward the concept of "Fraud"
let path = graph.semantic_walk(
    start_node,
    &fraud_embedding,
    4,  // max hops
)?;
// path: Vec<(NodeId, f32)> where f32 is distance to concept

GQL Query Language

AstraeaDB includes a hand-written recursive-descent parser and full query executor for a subset of ISO GQL / Cypher.

MATCH Queries

MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 30 AND b.name = "Bob"
RETURN a.name AS person, b.name AS friend
ORDER BY a.age DESC
LIMIT 10

CREATE

CREATE (a:Person {name: "Alice", age: 30})

CREATE (a:Person {name: "Alice"})-[:KNOWS {since: 2020}]->(b:Person {name: "Bob"})

DELETE

MATCH (a:Person) WHERE a.name = "Alice" DELETE a

Expression Support

Category        | Operators / Functions
----------------|----------------------
Arithmetic      | +, -, *, /, %
Comparison      | =, <>, <, <=, >, >=
Boolean         | AND, OR, NOT
Null checks     | IS NULL, IS NOT NULL
Functions       | count(), id(), labels(), type(), toString(), toInteger()
Edge directions | -[:TYPE]-> (out), <-[:TYPE]- (in), -[:TYPE]- (both)

Programmatic Usage

use astraea_query::parse;
use astraea_query::executor::Executor;

let ast = parse("MATCH (a:Person) WHERE a.age > 30 RETURN a.name")?;
let executor = Executor::new(graph.clone());
let result = executor.execute(ast)?;
// result.columns: ["a.name"]
// result.rows: [["Alice"], ...]

GraphRAG Engine

Retrieval-Augmented Generation backed by graph context. The pipeline performs: vector search → subgraph extraction → linearization → LLM completion.

Tutorial: See the GraphRAG with Claude Tutorial for a complete walkthrough of using AstraeaDB's GraphRAG engine with Anthropic's Claude.

Subgraph Extraction

use astraea_rag::{extract_subgraph, linearize_subgraph, TextFormat};

// BFS 2 hops from a node, max 50 nodes
let subgraph = extract_subgraph(&*graph, node_id, 2, 50)?;

Linearization Formats

Format     | Description                            | Best For
-----------|----------------------------------------|---------
Structured | Indented tree with arrows (-[KNOWS]->) | General LLM context
Prose      | Natural language paragraphs            | Conversational AI
Triples    | (subject, predicate, object)           | Knowledge extraction
Json       | Compact JSON                           | Structured prompts

let text = linearize_subgraph(&subgraph, TextFormat::Structured);
let tokens = estimate_tokens(&text); // ~4 chars per token

LLM Providers

The LlmProvider trait supports multiple backends. Providers use injectable HTTP callbacks (no HTTP dependencies in the crate).

Provider          | Description
------------------|------------
MockProvider      | Returns canned responses (for testing)
OpenAiProvider    | OpenAI API compatible endpoints
AnthropicProvider | Anthropic Messages API
OllamaProvider    | Local Ollama instance (default: localhost:11434)

Full Pipeline

use astraea_rag::{GraphRagConfig, graph_rag_query_anchored, MockProvider};

let config = GraphRagConfig {
    hops: 2,
    max_context_nodes: 50,
    text_format: TextFormat::Structured,
    token_budget: 4000,
    ..Default::default()
};

let result = graph_rag_query_anchored(
    &*graph, &llm, "Who knows Alice?", node_id, &config
)?;
println!("Answer: {}", result.answer);
println!("Context: {} tokens, {} nodes",
    result.estimated_tokens, result.nodes_in_context);

GNN Training

GNN (Graph Neural Network) is a type of neural network designed for graph-structured data. Unlike traditional neural networks that work on fixed-size inputs (like images), GNNs can learn from the structure of a graph — incorporating information from a node's neighbors to make predictions. AstraeaDB implements GNN training in pure Rust with no external ML framework.

Tutorial: See the Graph Neural Networks Tutorial for a complete guide to tensors, message passing, training loops, and node classification examples.

How GNNs work: Each node starts with features (its embedding). In each layer, nodes "send messages" to their neighbors and "aggregate" messages from neighbors. After several layers, a node's representation captures information from nodes many hops away.

Components

Example

use astraea_gnn::{TrainingConfig, TrainingData, MessagePassingConfig, Aggregation, Activation};
use astraea_gnn::training::train_node_classification;

let config = TrainingConfig {
    layers: 2,
    learning_rate: 0.01,
    epochs: 50,
    message_passing: MessagePassingConfig {
        aggregation: Aggregation::Mean,
        activation: Activation::ReLU,
        normalize: true,
    },
};

let result = train_node_classification(&*graph, &training_data, &config)?;
println!("Accuracy: {:.1}%", result.accuracy * 100.0);

Graph Algorithms

The astraea-algorithms crate provides classical graph algorithms for analyzing graph structure.

Algorithm              | Function                                    | Description
-----------------------|---------------------------------------------|------------
PageRank               | pagerank(graph, nodes, config)              | Ranks nodes by importance based on incoming links (like Google's original algorithm). Returns importance scores for each node.
Connected Components   | connected_components(graph, nodes)          | Groups nodes into clusters where every node can reach every other node (ignoring edge direction).
Strongly Connected     | strongly_connected_components(graph, nodes) | Like connected components, but respects edge direction (for directed graphs).
Degree Centrality      | degree_centrality(graph, nodes, direction)  | Measures importance by counting connections. More connections = more central.
Betweenness Centrality | betweenness_centrality(graph, nodes)        | Measures how often a node lies on shortest paths between other nodes. High betweenness = important bridge.
Community Detection    | louvain(graph, nodes)                       | Finds densely-connected groups (communities) using the Louvain algorithm. Returns which community each node belongs to.

PageRank Configuration

let config = PageRankConfig {
    damping: 0.85,           // damping factor
    max_iterations: 100,     // convergence limit
    tolerance: 1e-6,         // L1 norm convergence threshold
};
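
To make the roles of damping, max_iterations, and the L1 tolerance concrete, here is a self-contained power-iteration sketch; it is not the crate's implementation, and it uses plain adjacency lists rather than the Graph type.

```rust
// Power-iteration PageRank over adjacency lists. `out_links[i]` holds the
// targets of node i's outgoing edges.
fn pagerank(out_links: &[Vec<usize>], damping: f64, max_iter: usize, tol: f64) -> Vec<f64> {
    let n = out_links.len();
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..max_iter {
        // Teleport term: (1 - d) / n goes to every node each round.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (src, targets) in out_links.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank uniformly.
                for v in next.iter_mut() {
                    *v += damping * rank[src] / n as f64;
                }
            } else {
                let share = damping * rank[src] / targets.len() as f64;
                for &t in targets {
                    next[t] += share;
                }
            }
        }
        // L1-norm change between iterations decides convergence.
        let delta: f64 = rank.iter().zip(&next).map(|(a, b)| (a - b).abs()).sum();
        rank = next;
        if delta < tol {
            break;
        }
    }
    rank
}

fn main() {
    // 0 -> 1 -> 2 -> 0: a symmetric cycle converges to uniform rank.
    let ranks = pagerank(&[vec![1], vec![2], vec![0]], 0.85, 100, 1e-6);
    assert!(ranks.iter().all(|r| (r - 1.0 / 3.0).abs() < 1e-3));
}
```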

Network Server

AstraeaDB provides three transport layers for different use cases.

Transport Layers

Transport       | Port | Format                 | Best For
----------------|------|------------------------|---------
JSON-over-TCP   | 7687 | Newline-delimited JSON | Debugging, scripting, netcat
gRPC / Protobuf | 7688 | Protocol Buffers       | Production clients, type safety
Arrow Flight    | 7689 | Apache Arrow           | Python/Pandas, bulk operations

Supported Request Types

Request                 | Description
------------------------|------------
CreateNode              | Create a node with labels, properties, optional embedding
CreateEdge              | Create an edge between two nodes
GetNode / GetEdge       | Retrieve by ID
UpdateNode / UpdateEdge | Merge properties
DeleteNode / DeleteEdge | Delete (node deletion cascades to edges)
Neighbors               | Get neighbors with direction and edge-type filtering
Bfs                     | Breadth-first traversal with depth limit
ShortestPath            | Unweighted or weighted (Dijkstra)
VectorSearch            | k-nearest-neighbor via HNSW index
HybridSearch            | Graph + vector blended search
SemanticNeighbors       | Rank neighbors by concept similarity
SemanticWalk            | Greedy walk toward a concept
NeighborsAt             | Temporal neighbors at a timestamp
BfsAt                   | Temporal BFS at a timestamp
ShortestPathAt          | Temporal shortest path at a timestamp
ExtractSubgraph         | Extract and linearize a local subgraph
GraphRag                | GraphRAG pipeline (search → subgraph → LLM)
Query                   | Execute a GQL query string
Ping                    | Health check

JSON Protocol Examples

// Create a node with an embedding
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice"},"embedding":[0.1,0.2]}

// Response
{"status":"ok","data":{"node_id":1}}

// Execute a GQL query
{"type":"Query","gql":"MATCH (a:Person) RETURN a.name"}

// Ping
{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}
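
A minimal stdlib-only client sketch for this newline-delimited protocol. The helper names are hypothetical, and the network call naturally requires a running server, so only the framing is exercised here.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

// Each request is one JSON object terminated by '\n'.
fn ping_line() -> String {
    "{\"type\":\"Ping\"}\n".to_string()
}

// Send a Ping and read one newline-terminated JSON response.
// Requires an AstraeaDB server listening at `addr`.
fn ping(addr: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(ping_line().as_bytes())?;
    let mut reply = String::new();
    BufReader::new(&stream).read_line(&mut reply)?;
    Ok(reply)
}

fn main() {
    // Framing check only; ping("127.0.0.1:7687") needs a live server.
    let _ = ping;
    assert!(ping_line().ends_with('\n'));
    assert!(ping_line().contains("\"type\":\"Ping\""));
}
```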

Authentication & RBAC

AstraeaDB supports API key authentication and mTLS (mutual TLS) with role-based access control.

Roles

Role   | Permissions
-------|------------
Admin  | Full access to all operations
Writer | Read + write (CRUD, traversals, queries)
Reader | Read-only (get, query, search, traverse, ping)

API Key Authentication

Include an auth_token field in JSON requests:

{"type":"CreateNode","labels":["Person"],"properties":{},"auth_token":"my-api-key"}

If auth is enabled and the token is missing or invalid:

{"status":"error","message":"authentication required: provide auth_token"}

mTLS (Mutual TLS)

For production deployments, AstraeaDB supports TLS encryption with optional client certificate verification. The server uses rustls for modern, safe TLS.

TLS Configuration

use astraea_server::tls::{TlsConfig};

// Server-only TLS (encrypt traffic)
let tls = TlsConfig::new(
    "server.crt",  // Server certificate
    "server.key",  // Server private key
);

// mTLS (verify client certificates)
let tls = TlsConfig::with_mtls(
    "server.crt",
    "server.key",
    "ca.crt",      // CA cert for client verification
);

Client Certificate Role Mapping

When mTLS is enabled, the client certificate's Common Name (CN) is automatically mapped to an RBAC role:

Certificate CN               | Role
-----------------------------|-----
Contains "admin"             | Admin
Contains "writer" or "write" | Writer
All others                   | Reader
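
Assuming a case-insensitive substring check (the table above does not specify case handling), the mapping can be sketched as:

```rust
// Illustrative sketch of CN-to-role mapping; the real cn_to_role in
// astraea-server may differ in details such as case handling.
#[derive(Debug, PartialEq)]
enum Role {
    Admin,
    Writer,
    Reader,
}

fn cn_to_role(cn: &str) -> Role {
    let cn = cn.to_ascii_lowercase();
    if cn.contains("admin") {
        Role::Admin
    } else if cn.contains("writer") || cn.contains("write") {
        Role::Writer
    } else {
        Role::Reader // default: least privilege
    }
}

fn main() {
    assert_eq!(cn_to_role("admin-service"), Role::Admin);
    assert_eq!(cn_to_role("batch-writer"), Role::Writer);
    assert_eq!(cn_to_role("dashboard"), Role::Reader);
}
```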

TLS Helper Functions

use astraea_server::tls::*;

// Load certificates and keys
let certs = load_certs("server.crt")?;
let key = load_private_key("server.key")?;

// Extract info from client certificates
let cn = extract_client_cn(&client_certs);  // e.g., "admin-service"
let sans = extract_sans(&cert);             // Subject Alternative Names

// Map CN to role
let role = cn_to_role(&cn);                 // "admin-service" -> Admin

Read-Only Operations (accessible by Reader role)

GetNode, GetEdge, Neighbors, NeighborsAt, Bfs, BfsAt, ShortestPath, ShortestPathAt, VectorSearch, HybridSearch, SemanticNeighbors, SemanticWalk, Query, ExtractSubgraph, GraphRag, Ping

Audit Logging

All authenticated requests are logged with timestamp, truncated API key prefix (first 8 chars), role, operation, and whether it was allowed. The audit log is a bounded circular buffer (max 10,000 entries).

Key Management

let auth = AuthManager::new(vec![
    ApiKeyEntry { key: "admin-key-xxx".into(), role: Role::Admin,
                       description: "Admin key".into(), active: true },
    ApiKeyEntry { key: "reader-key-xxx".into(), role: Role::Reader,
                       description: "CI key".into(), active: true },
]);

// Runtime key management
auth.add_key(new_entry);
auth.revoke_key("compromised-key");

Observability

Prometheus Metrics

The server exposes metrics in Prometheus text exposition format.

Metric                                                          | Type    | Description
----------------------------------------------------------------|---------|------------
astraea_requests_total{type="..."}                              | counter | Total requests by operation type
astraea_errors_total{type="..."}                                | counter | Total errors by operation type
astraea_request_duration_us{type="...",quantile="0.5|0.9|0.99"} | summary | Request duration percentiles (microseconds)
astraea_active_connections                                      | gauge   | Currently active TCP connections
astraea_connections_total                                       | counter | Total connections since startup
astraea_uptime_seconds                                          | gauge   | Server uptime in seconds

Health Check

The health() method returns a JSON object:

{
  "status": "healthy",
  "uptime_seconds": 3600,
  "active_connections": 12,
  "total_connections": 1543,
  "start_time": 1704067200
}

Connection Management

Configuration

Parameter               | Default | Description
------------------------|---------|------------
max_connections         | 1024    | Maximum concurrent TCP connections. New connections beyond this are rejected.
max_concurrent_requests | 256     | Request-level backpressure via semaphore.
idle_timeout            | 300 s   | Close connections idle for longer than this.
request_timeout         | 30 s    | Abort requests that take longer than this.
drain_timeout           | 10 s    | Max time to wait for in-flight requests during shutdown.

Graceful Shutdown

  1. Stop accepting new connections
  2. Wait for in-flight requests to complete (up to drain_timeout)
  3. Close all connections

When the connection limit is reached, new connections receive:

{"status":"error","message":"server connection limit reached"}

Encryption

The astraea-crypto crate provides a foundation for encrypted queries, allowing clients to query the graph without the server seeing unencrypted data. This is essential for privacy-sensitive applications in banking, healthcare, and other regulated industries.

How it works: Labels are encrypted with deterministic tags (same label always produces the same tag), so the server can check equality without decrypting. Property values use randomized encryption for stronger security.
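
Determinism is the key property: the same (key, label) pair always yields the same tag, so equality checks need no decryption. The sketch below illustrates the idea with a non-cryptographic hash; a real implementation would use a keyed cryptographic PRF, and the names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// ILLUSTRATION ONLY: DefaultHasher is not cryptographically secure.
// It stands in for a keyed PRF to show why deterministic tags allow
// server-side equality matching without decryption.
fn label_tag(secret: u64, label: &str) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    label.hash(&mut h);
    h.finish()
}

fn main() {
    let secret = 0xDEAD_BEEF_u64;
    // Same label -> same tag: the server can match without the key.
    assert_eq!(label_tag(secret, "Person"), label_tag(secret, "Person"));
    // Different labels produce different tags (with high probability).
    assert_ne!(label_tag(secret, "Person"), label_tag(secret, "Account"));
}
```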

Key Management

use astraea_crypto::{KeyPair, EncryptedNode, EncryptedQueryEngine};

// Client generates a key pair
let keys = KeyPair::generate();

// Encrypt a node
let encrypted = EncryptedNode::from_node(&node, &keys.secret);

Server-Side Label Matching

Labels are encrypted with deterministic tags, allowing the server to compare encrypted labels without decryption:

// Server side: search by encrypted label
let encrypted_label = EncryptedLabel::encrypt("Person", &keys.secret);
let results = engine.find_by_encrypted_label(&encrypted_label);

// Client side: decrypt results
for enc_node in results {
    let node = enc_node.to_node(&keys.secret);
}

Encryption Types

Type           | Description
---------------|------------
EncryptedValue | Randomized encryption (same plaintext → different ciphertexts)
EncryptedLabel | Deterministic tag for matching + randomized value for confidentiality
EncryptedNode  | Encrypted labels (individually) + encrypted properties (as JSON blob). Node ID stays plaintext.

GPU Acceleration

The astraea-gpu crate provides a framework for GPU-accelerated graph analytics. Graph algorithms like PageRank are fundamentally matrix operations, which GPUs can execute much faster than CPUs due to their parallel architecture.

CSR Matrix

Graphs are converted to CSR (Compressed Sparse Row) format for efficient matrix operations. CSR is a compact way to represent sparse matrices (matrices with mostly zeros, like adjacency matrices) that enables fast row access:

use astraea_gpu::{CsrMatrix, CpuBackend, GpuBackend};

let nodes = vec![n1, n2, n3, n4];
let csr = CsrMatrix::from_graph(&graph, &nodes)?;
// csr.spmv(&x) -- sparse matrix-vector multiply (the core of PageRank)
// csr.transpose() -- efficient transpose operation
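
A minimal CSR layout and spmv, using the conventional row_offsets / col_indices / values fields (an assumption; not necessarily astraea-gpu's exact layout):

```rust
// Compressed Sparse Row: row i's nonzeros live at
// col_indices[row_offsets[i]..row_offsets[i + 1]].
struct Csr {
    row_offsets: Vec<usize>,
    col_indices: Vec<usize>,
    values: Vec<f64>,
}

impl Csr {
    // Sparse matrix-vector multiply: y = A x, the core of PageRank.
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        let n = self.row_offsets.len() - 1;
        let mut y = vec![0.0; n];
        for i in 0..n {
            for k in self.row_offsets[i]..self.row_offsets[i + 1] {
                y[i] += self.values[k] * x[self.col_indices[k]];
            }
        }
        y
    }
}

fn main() {
    // The 2x2 matrix [[0, 1], [2, 0]] in CSR form.
    let m = Csr {
        row_offsets: vec![0, 1, 2],
        col_indices: vec![1, 0],
        values: vec![1.0, 2.0],
    };
    assert_eq!(m.spmv(&[3.0, 4.0]), vec![4.0, 6.0]);
}
```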

GpuBackend Trait

Method                | Returns              | Description
----------------------|----------------------|------------
pagerank(csr, config) | HashMap<NodeId, f64> | PageRank importance scores
bfs(csr, source)      | HashMap<NodeId, i32> | BFS levels (distance from source, -1 = unreachable)
sssp(csr, source)     | HashMap<NodeId, f64> | SSSP (Single-Source Shortest Path): shortest distances from one node to all others

CPU Fallback

The CpuBackend implements all algorithms in pure Rust. It is always available and serves as the fallback when no GPU is present. The SSSP implementation uses the Bellman-Ford algorithm, which handles negative edge weights (unlike Dijkstra).

Clustering & Sharding

The astraea-cluster crate provides foundations for distributed graph processing.

Partitioning Strategies

Strategy         | Description
-----------------|------------
HashPartitioner  | Assigns nodes to shards via hash(node_id) % num_shards. Deterministic and evenly distributed.
RangePartitioner | Assigns nodes based on ID ranges with configurable boundaries. Can be uniform or custom.

Shard Management

use astraea_cluster::{ShardMap, ShardInfo, HashPartitioner};

let mut shard_map = ShardMap::new(Box::new(HashPartitioner::new(3)));
shard_map.register_shard(info);
let shard = shard_map.shard_for_node(node_id);

Cluster Coordinator

The ClusterCoordinator trait defines the contract for distributed operations. LocalCoordinator is the single-node implementation that routes everything locally.

CLI Reference

Commands

# Start the server
astraeadb serve [--config config.toml] [--bind 0.0.0.0] [--port 7687]

# Interactive shell (REPL)
astraeadb shell [--address 127.0.0.1:7687]

# Check server status
astraeadb status [--address 127.0.0.1:7687]

# Import data from JSON
astraeadb import --file data.json --format json --data-dir ./data

# Export data to JSON
astraeadb export --file dump.json --format json --data-dir ./data

Configuration File (config.toml)

[server]
bind_address = "127.0.0.1"
port = 7687

[storage]
data_dir = "data"
buffer_pool_size = 1024
wal_dir = "data/wal"

Shell Features

Python Client

Installation

# Basic (JSON only, zero dependencies)
pip install ./python

# With Arrow Flight support (quoted so the shell does not expand the brackets)
pip install "./python[arrow]"

Client Types

Client        | Transport                    | Dependencies
--------------|------------------------------|-------------
JsonClient    | TCP / newline-delimited JSON | None (stdlib only)
ArrowClient   | Apache Arrow Flight          | pyarrow >= 14.0
AstraeaClient | Auto-selects best transport  | Optional pyarrow

Usage

from astraeadb import AstraeaClient

# Connect with optional authentication
with AstraeaClient(host="127.0.0.1", port=7687, auth_token="my-api-key") as client:
    # Create nodes (embeddings auto-indexed)
    alice = client.create_node(["Person"], {"name": "Alice", "age": 30},
                                embedding=[0.1] * 128)
    bob = client.create_node(["Person"], {"name": "Bob"})

    # Create a temporal edge
    client.create_edge(alice, bob, "KNOWS", {"since": 2020}, weight=0.9,
                       valid_from=1609459200000)  # Jan 1, 2021 (ms)

    # Traversals
    neighbors = client.neighbors(alice, direction="outgoing")
    path = client.shortest_path(alice, bob, weighted=True)
    reachable = client.bfs(alice, max_depth=2)

    # Temporal queries (time-travel)
    old_neighbors = client.neighbors_at(alice, "outgoing", 1577836800000)  # Jan 1, 2020
    historical_path = client.shortest_path_at(alice, bob, 1577836800000)

    # Vector search
    results = client.vector_search([0.15] * 128, k=5)

    # Hybrid search
    results = client.hybrid_search(anchor=alice, query_vector=[0.15] * 128,
                                    max_hops=3, k=10, alpha=0.5)

    # GraphRAG - extract subgraph context
    context = client.extract_subgraph(alice, hops=2, max_nodes=50, format="prose")

    # GraphRAG - full pipeline with LLM
    answer = client.graph_rag("Who does Alice know?", anchor=alice)

    # GQL query
    result = client.query("MATCH (a:Person) WHERE a.age > 25 RETURN a.name")

    # Batch operations
    node_ids = client.create_nodes([
        {"labels": ["Person"], "properties": {"name": "Charlie"}},
        {"labels": ["Person"], "properties": {"name": "Diana"}}
    ])

    # Health check
    status = client.ping()
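The hybrid_search call above blends structural proximity with vector similarity via alpha. The exact scoring formula is server-side and not documented here; the following is only one plausible reading of the parameters (weighting cosine similarity against hop distance is an assumption, not AstraeaDB's documented scoring):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query_vec, node_vec, hops, alpha=0.5):
    """Hypothetical blend: alpha weighs semantic similarity,
    (1 - alpha) weighs graph closeness (fewer hops = closer)."""
    semantic = cosine(query_vec, node_vec)
    structural = 1.0 / (1.0 + hops)
    return alpha * semantic + (1.0 - alpha) * structural

# A near-identical vector one hop away outranks an opposite vector
# sitting directly on the anchor.
near = hybrid_score([0.15] * 4, [0.15] * 4, hops=1, alpha=0.5)
far = hybrid_score([0.15] * 4, [-0.15] * 4, hops=0, alpha=0.5)
print(near > far)  # → True
```

With alpha=1.0 this degenerates to pure vector search and with alpha=0.0 to pure graph proximity, which matches the intuition behind a single blending knob.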

DataFrame Support (Optional)

from astraeadb import AstraeaClient
from astraeadb.dataframe import import_nodes_df, export_nodes_df, export_bfs_df
import pandas as pd

# Import nodes from a DataFrame
df = pd.DataFrame([
    {"label": "Person", "name": "Alice", "age": 30},
    {"label": "Person", "name": "Bob", "age": 25}
])

with AstraeaClient() as client:
    node_ids = import_nodes_df(client, df, label_col="label")

    # Export nodes back to DataFrame
    result_df = export_nodes_df(client, node_ids)

    # Export BFS results with node details
    bfs_df = export_bfs_df(client, start=node_ids[0], max_depth=2)

Arrow Flight Client (Bulk Operations)

from astraeadb import ArrowClient
import pyarrow as pa

arrow = ArrowClient(host="127.0.0.1", flight_port=7689)

# Query results as Arrow Table (zero-copy to Pandas)
table = arrow.query("MATCH (a:Person) RETURN a.name, a.age")
df = table.to_pandas()

# Bulk import
nodes_table = pa.table({"id": [1, 2], "labels": ["Person", "Person"],
                         "properties": ['{"name":"Alice"}', '{"name":"Bob"}']})
arrow.bulk_insert_nodes(nodes_table)

Python API Reference

Health
  ping() - Health check, returns server version
Node CRUD
  create_node(labels, properties?, embedding?) - Create a node, returns node ID
  get_node(id) - Get node by ID
  update_node(id, properties) - Merge properties into a node
  delete_node(id) - Delete node and all connected edges
Edge CRUD
  create_edge(source, target, type, props?, weight?, valid_from?, valid_to?) - Create edge with optional temporal validity
  get_edge(id) - Get edge by ID
  update_edge(id, properties) - Update edge properties (merge)
  delete_edge(id) - Delete an edge
Traversal
  neighbors(id, direction?, edge_type?) - Get neighbors
  bfs(start, max_depth?) - Breadth-first traversal
  shortest_path(from, to, weighted?) - Shortest path (BFS or Dijkstra)
Temporal
  neighbors_at(id, direction, timestamp, edge_type?) - Neighbors at point in time
  bfs_at(start, max_depth, timestamp) - BFS at point in time
  shortest_path_at(from, to, timestamp, weighted?) - Path at point in time
Vector/Semantic
  vector_search(embedding, k?) - k-nearest-neighbor search
  hybrid_search(anchor, query_vector, max_hops?, k?, alpha?) - Blended graph + vector search
  semantic_neighbors(node, embedding, direction?, k?) - Rank neighbors by concept
  semantic_walk(start, embedding, max_hops?) - Greedy semantic walk
GraphRAG
  extract_subgraph(center, hops?, max_nodes?, format?) - Extract + linearize subgraph
  graph_rag(question, anchor?, question_embedding?, hops?, max_nodes?, format?) - Full RAG pipeline with LLM
GQL
  query(gql_string) - Execute a GQL query
Batch Ops
  create_nodes(nodes_list) - Create multiple nodes
  create_edges(edges_list) - Create multiple edges
  delete_nodes(node_ids) - Delete multiple nodes
  delete_edges(edge_ids) - Delete multiple edges

DataFrame Module (astraeadb.dataframe)

Requires pandas: pip install pandas

import_nodes_df(client, df, label_col, embedding_cols?) - Import nodes from DataFrame
import_edges_df(client, df, source_col, target_col, type_col, ...) - Import edges from DataFrame
export_nodes_df(client, node_ids) - Export nodes to DataFrame
export_edges_df(client, edge_ids) - Export edges to DataFrame
export_bfs_df(client, start, max_depth?) - Export BFS results with node details
export_bfs_at_df(client, start, max_depth, timestamp) - Export temporal BFS to DataFrame

R Client

The R client provides full feature parity with the Python client, supporting all AstraeaDB operations via JSON/TCP, with optional Apache Arrow Flight support for high-performance queries.

Prerequisites

install.packages("jsonlite")  # Required
install.packages("arrow")     # Optional, for Arrow Flight

Client Classes

Class           Transport       Description
AstraeaClient   JSON/TCP        Standard client, always available
ArrowClient     Arrow Flight    High-performance queries (requires arrow package)
UnifiedClient   Auto-select     Uses Arrow when available, falls back to JSON

Basic Usage

source("examples/r_client.R")

# Connect (with optional auth token)
client <- AstraeaClient$new(host = "127.0.0.1", port = 7687L, auth_token = "my-key")
client$connect()

# Create nodes with embeddings
id <- client$create_node(
  list("Person"),
  list(name = "Alice", age = 30),
  embedding = c(0.9, 0.1, 0.3)
)

# Create temporal edges
eid <- client$create_edge(
  id1, id2, "KNOWS",
  properties = list(since = 2024),
  weight = 0.9,
  valid_from = 1704067200000,  # Jan 1, 2024 (ms)
  valid_to = NULL               # Still active
)

client$close()

Full API Reference

Node CRUD
  create_node(labels, properties, embedding=NULL) - Create node, returns ID
  get_node(node_id) - Get node by ID
  update_node(node_id, properties) - Update properties (merge)
  delete_node(node_id) - Delete node + edges
Edge CRUD
  create_edge(src, tgt, type, props, weight, valid_from, valid_to) - Create temporal edge
  get_edge(edge_id) - Get edge by ID
  update_edge(edge_id, properties) - Update properties (merge)
  delete_edge(edge_id) - Delete edge
Traversal
  neighbors(node_id, direction, edge_type) - Get neighbors
  bfs(start, max_depth) - Breadth-first search
  shortest_path(from, to, weighted) - Find shortest path
Temporal
  neighbors_at(node_id, direction, timestamp, edge_type) - Neighbors at point in time
  bfs_at(start, max_depth, timestamp) - BFS at point in time
  shortest_path_at(from, to, timestamp, weighted) - Path at point in time
GQL
  query(gql) - Execute GQL query
Vector/Semantic
  vector_search(query_vector, k) - k-NN vector search
  hybrid_search(anchor, query_vector, max_hops, k, alpha) - Graph + vector combined
  semantic_neighbors(node_id, concept, direction, k) - Neighbors by similarity
  semantic_walk(start, concept, max_hops) - Greedy semantic traversal
GraphRAG
  extract_subgraph(center, hops, max_nodes, format) - Extract + linearize
  graph_rag(question, anchor, embedding, hops, max_nodes, format) - Full RAG pipeline
Batch Ops
  create_nodes(nodes_list) - Create multiple nodes
  create_edges(edges_list) - Create multiple edges
  delete_nodes(node_ids) - Delete multiple nodes
  delete_edges(edge_ids) - Delete multiple edges
Data Frame I/O
  import_nodes_df(df, label_col, embedding_cols) - Import nodes from data.frame
  import_edges_df(df, source_col, target_col, ...) - Import edges from data.frame
  export_nodes_df(node_ids) - Export nodes to data.frame
  export_bfs_df(start, max_depth) - BFS results as data.frame
Utility
  results_to_dataframe(results) - Convert results to data.frame
  nodes_to_dataframe(node_ids) - Fetch nodes as data.frame

Vector Search Example

# Find nodes similar to a "tech interest" vector
tech_vector <- c(1.0, 0.0, 0.0)
results <- client$vector_search(tech_vector, k = 5L)
for (r in results) {
  node <- client$get_node(r$node_id)
  cat(sprintf("  %s (similarity=%.3f)\n", node$properties$name, r$similarity))
}

Temporal Query Example

# See who Alice knew in 2020 vs 2024
t_2020 <- 1577836800000  # Jan 1, 2020
t_2024 <- 1704067200000  # Jan 1, 2024

neighbors_2020 <- client$neighbors_at(alice, "outgoing", t_2020)
neighbors_2024 <- client$neighbors_at(alice, "outgoing", t_2024)

cat(sprintf("2020: %d connections\n", length(neighbors_2020)))
cat(sprintf("2024: %d connections\n", length(neighbors_2024)))

GraphRAG Example

# Extract subgraph and get LLM answer
subgraph <- client$extract_subgraph(alice_id, hops = 2L, format = "structured")
cat(subgraph$text)  # Linearized graph context

# Full RAG pipeline (requires server LLM config)
result <- client$graph_rag(
  question = "Who does Alice work with?",
  anchor = alice_id,
  hops = 2L
)
cat(result$answer)

Batch Operations Example

# Create multiple nodes at once
nodes <- list(
  list(labels = list("Person"), properties = list(name = "Alice"), embedding = c(0.9, 0.1)),
  list(labels = list("Person"), properties = list(name = "Bob"), embedding = c(0.1, 0.9)),
  list(labels = list("Person"), properties = list(name = "Charlie"))
)
node_ids <- client$create_nodes(nodes)

# Create multiple edges at once
edges <- list(
  list(source = node_ids[1], target = node_ids[2], edge_type = "KNOWS"),
  list(source = node_ids[2], target = node_ids[3], edge_type = "KNOWS", weight = 0.5)
)
edge_ids <- client$create_edges(edges)

Data Frame Import/Export

# Import nodes from a data.frame
people_df <- data.frame(
  label = "Person",
  name = c("Alice", "Bob", "Charlie"),
  age = c(30, 25, 35)
)
node_ids <- client$import_nodes_df(people_df, label_col = "label")

# Export BFS results as a data.frame
bfs_df <- client$export_bfs_df(node_ids[1], max_depth = 2L)
print(bfs_df)
#   node_id depth  labels   name age
# 1       1     0  Person  Alice  30
# 2       2     1  Person    Bob  25
# 3       3     2  Person Charlie  35

Arrow Flight (High-Performance)

# Option 1: Use ArrowClient directly
arrow_client <- ArrowClient$new("grpc://localhost:7689")
arrow_client$connect()
result <- arrow_client$query("MATCH (p:Person) RETURN p.name, p.age")  # Returns Arrow Table
df <- arrow_client$query_df("MATCH (p:Person) RETURN p.name")          # Returns data.frame
arrow_client$close()

# Option 2: Use UnifiedClient (auto-selects best transport)
client <- UnifiedClient$new(host = "127.0.0.1", port = 7687L)
client$connect()
cat("Arrow enabled:", client$is_arrow_enabled(), "\n")
result <- client$query_df("MATCH (n) RETURN n")  # Uses Arrow if available
client$close()

Running the Demo

# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
Rscript examples/r_client.R

Go Client

A full-featured Go client library is provided in the go/astraeadb directory, published as github.com/AstraeaDB/AstraeaDB-Official. It supports three transport layers with idiomatic Go patterns including functional options, context.Context on every operation, and thread-safe connections.

Client Types

Installation

go get github.com/AstraeaDB/AstraeaDB-Official

Quick Start

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/AstraeaDB/AstraeaDB-Official"
)

func main() {
    ctx := context.Background()

    // Unified client: auto-selects gRPC when available
    client := astraeadb.NewClient(
        astraeadb.WithAddress("127.0.0.1", 7687),
        astraeadb.WithAuthToken("my-api-key"),
    )
    if err := client.Connect(ctx); err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Create nodes
    alice, _ := client.CreateNode(ctx,
        []string{"Person"},
        map[string]any{"name": "Alice", "age": 30},
        []float32{0.1, 0.2, 0.3},
    )
    bob, _ := client.CreateNode(ctx,
        []string{"Person"},
        map[string]any{"name": "Bob", "age": 25},
        nil,
    )

    // Create a temporal edge with options
    client.CreateEdge(ctx, alice, bob, "KNOWS",
        astraeadb.WithWeight(0.9),
        astraeadb.WithProperties(map[string]any{"since": 2020}),
        astraeadb.WithValidFrom(1609459200000),
    )

    // Traverse, search, and query
    neighbors, _ := client.Neighbors(ctx, alice,
        astraeadb.WithDirection("outgoing"))
    results, _ := client.VectorSearch(ctx, []float32{0.15, 0.25, 0.35}, 5)
    result, _ := client.Query(ctx, "MATCH (n:Person) RETURN n.name")
    rag, _ := client.GraphRAG(ctx, "Who does Alice know?",
        astraeadb.WithAnchor(alice))

    fmt.Println(neighbors, results, result, rag)
}

Configuration Options

The client uses the functional options pattern for configuration:

// All available options
client := astraeadb.NewClient(
    astraeadb.WithAddress("db.example.com", 7687),
    astraeadb.WithGRPCPort(7688),
    astraeadb.WithFlightPort(7689),
    astraeadb.WithAuthToken("my-api-key"),
    astraeadb.WithTimeout(30 * time.Second),
    astraeadb.WithDialTimeout(5 * time.Second),
    astraeadb.WithTLS("ca.pem"),                       // Server TLS
    astraeadb.WithMTLS("client.pem", "client.key", "ca.pem"), // Mutual TLS
    astraeadb.WithMaxRetries(5),
    astraeadb.WithReconnect(true),
)

Error Handling

The Go client provides sentinel errors for programmatic error handling with errors.Is():

import "errors"

_, err := client.GetNode(ctx, 999)
if errors.Is(err, astraeadb.ErrNodeNotFound) {
    // Handle missing node
}

// Available sentinel errors:
// ErrNotConnected, ErrNodeNotFound, ErrEdgeNotFound,
// ErrNoVectorIndex, ErrAccessDenied, ErrInvalidCreds, ErrAuthRequired

Batch Operations

nodes := []astraeadb.NodeInput{
    {Labels: []string{"Person"}, Properties: map[string]any{"name": "Charlie"}},
    {Labels: []string{"Person"}, Properties: map[string]any{"name": "Diana"}},
}
ids, err := client.CreateNodes(ctx, nodes)

edges := []astraeadb.EdgeInput{
    {Source: ids[0], Target: ids[1], EdgeType: "KNOWS", Weight: 0.8},
}
edgeIDs, err := client.CreateEdges(ctx, edges)

API Reference

Health
  Ping(ctx) - Health check, returns server version
Node CRUD
  CreateNode(ctx, labels, properties, embedding) - Create a node, returns node ID
  GetNode(ctx, id) - Retrieve node by ID
  UpdateNode(ctx, id, props) - Merge properties into node
  DeleteNode(ctx, id) - Delete node and connected edges
Edge CRUD
  CreateEdge(ctx, src, tgt, type, opts...) - Create edge with WithWeight, WithProperties, WithValidFrom, WithValidTo
  GetEdge(ctx, id) / UpdateEdge / DeleteEdge - Get, update, or delete an edge
Traversal
  Neighbors(ctx, id, opts...) - Get neighbors with WithDirection, WithEdgeType
  BFS(ctx, start, maxDepth) - Breadth-first traversal
  ShortestPath(ctx, from, to, weighted) - Shortest path (BFS or Dijkstra)
Temporal
  NeighborsAt(ctx, id, direction, timestamp) - Neighbors at point in time
  BFSAt(ctx, start, maxDepth, timestamp) - BFS at point in time
  ShortestPathAt(ctx, from, to, timestamp, weighted) - Path at point in time
Vector
  VectorSearch(ctx, embedding, k) - k-nearest-neighbor search
  HybridSearch(ctx, anchor, embedding, opts...) - Blended graph + vector search
  SemanticNeighbors(ctx, id, concept, opts...) - Rank neighbors by concept similarity
  SemanticWalk(ctx, start, concept, maxHops) - Greedy semantic walk
GraphRAG
  ExtractSubgraph(ctx, center, opts...) - Extract + linearize subgraph
  GraphRAG(ctx, question, opts...) - Full RAG pipeline with LLM
GQL
  Query(ctx, gql) - Execute a GQL query
Batch
  CreateNodes(ctx, nodes) / CreateEdges(ctx, edges) - Batch create
  DeleteNodes(ctx, ids) / DeleteEdges(ctx, ids) - Batch delete

Transport Selection

The unified Client automatically selects the best available transport:

// Check transport availability at runtime
fmt.Println("gRPC available:", client.IsGRPCAvailable())
fmt.Println("Arrow available:", client.IsArrowAvailable())

Running the Tests

# From go/astraeadb/
go test -v -race ./...

# Run with the Makefile
make test

Java Client

A full-featured Java client is provided in the java/astraeadb directory as a Gradle multi-module project. It supports three transport layers with idiomatic Java patterns including records (Java 17+), the builder pattern, try-with-resources lifecycle, and thread-safe connections.

Client Types

Gradle Dependency

dependencies {
    implementation "com.astraeadb:astraeadb-unified:0.1.0"  // All transports
    // Or individual: astraeadb-json, astraeadb-grpc, astraeadb-flight
}

Usage Example

import com.astraeadb.unified.UnifiedClient;
import com.astraeadb.model.*;
import com.astraeadb.options.*;

import java.util.List;
import java.util.Map;

try (var client = UnifiedClient.builder()
        .host("127.0.0.1")
        .authToken("my-api-key")
        .build()) {

    client.connect();

    // Create nodes with embeddings
    long alice = client.createNode(
        List.of("Person"),
        Map.of("name", "Alice", "age", 30),
        new float[]{0.1f, 0.2f, 0.3f});
    // `bob` must exist before the edge below; a null embedding is
    // assumed to mean "no embedding", matching the other clients.
    long bob = client.createNode(
        List.of("Person"),
        Map.of("name", "Bob", "age", 25),
        null);

    // Create a temporal edge
    client.createEdge(alice, bob, "KNOWS",
        EdgeOptions.builder()
            .weight(0.9)
            .validFrom(1609459200000L)
            .build());

    // Traverse, search, query
    List<NeighborEntry> neighbors = client.neighbors(alice,
        NeighborOptions.builder().direction("outgoing").build());
    List<SearchResult> results = client.vectorSearch(
        new float[]{0.15f, 0.25f, 0.35f}, 5);
    QueryResult result = client.query(
        "MATCH (n:Person) RETURN n.name");
    RagResult rag = client.graphRag("Who does Alice know?",
        RagOptions.builder().anchor(alice).hops(2).build());
}

Exception Handling

The Java client uses a checked exception hierarchy rooted at AstraeaException:

try {
    Node node = client.getNode(999);
} catch (NodeNotFoundException e) {
    // Specific exception for not-found
} catch (AccessDeniedException e) {
    // Permission error
} catch (AstraeaException e) {
    // Base exception for all errors
}

Java API Reference

Health
  ping() - Health check, returns version
Node CRUD
  createNode(labels, props, embedding) - Create node, returns ID
  getNode(id) / updateNode(id, props) / deleteNode(id) - Read/update/delete
Edge CRUD
  createEdge(src, tgt, type, options) - Create edge with EdgeOptions
  getEdge(id) / updateEdge(id, props) / deleteEdge(id) - Read/update/delete
Traversal
  neighbors(id, options) - Get neighbors with NeighborOptions
  bfs(start, maxDepth) - Breadth-first traversal
  shortestPath(from, to, weighted) - Shortest path
Temporal
  neighborsAt(id, dir, timestamp) - Neighbors at time T
  bfsAt(start, depth, timestamp) - BFS at time T
  shortestPathAt(from, to, ts, weighted) - Path at time T
Vector
  vectorSearch(embedding, k) - k-NN search
  hybridSearch(anchor, embedding, options) - Graph + vector
  semanticNeighbors(id, concept, options) - Neighbors by similarity
  semanticWalk(start, concept, maxHops) - Semantic walk
GraphRAG
  extractSubgraph(center, options) - Extract + linearize
  graphRag(question, options) - Full RAG pipeline
GQL
  query(gql) - Execute a GQL query
Batch
  createNodes(nodes) / createEdges(edges) - Batch create
  deleteNodes(ids) / deleteEdges(ids) - Batch delete

Cybersecurity Demo

This example demonstrates how AstraeaDB enables security analysts to investigate network alerts by tracing connections through a graph.

The Problem

When a firewall alerts on suspicious traffic from 10.0.1.50, the analyst must manually search DHCP logs, asset management records, and other sources to trace the IP to a user. With AstraeaDB, these datasets are loaded as a graph and the investigation becomes a series of traversals.

Graph Model

User <--[ASSIGNED_TO]-- Laptop <--[DHCP_LEASE]-- IPAddress
                                                    |
                                              [TRAFFIC]  [TRIGGERED]
                                                    |         |
                                              IPAddress  FirewallAlert --[TARGETS]--> ExternalHost

The Scenario

Three employees — Alice (Engineering), Bob (Finance), and Eve (Marketing) — each have laptops with DHCP-assigned IPs. Eve's attack chain:

  1. Downloads a password cracker from darktools.example.com (port 443)
  2. Firewall logs the connection (alert FW-2025-0042, severity: critical)
  3. Attempts RDP to Bob's machine at 10.0.1.20:3389 (blocked)
  4. Attempts SSH to Alice's machine at 10.0.1.10:22 (blocked)

Investigation with AstraeaDB

# Step 1: Who triggered alert FW-2025-0042?
sources = client.neighbors(alert_id, "incoming", edge_type="TRIGGERED")
# → Source IP: 10.0.1.50

# Step 2: Trace IP → Laptop via DHCP lease
leases = client.neighbors(source_ip_id, "outgoing", edge_type="DHCP_LEASE")
# → Laptop: EVE-LAT01

# Step 3: Trace Laptop → User
users = client.neighbors(laptop_id, "outgoing", edge_type="ASSIGNED_TO")
# → User: Eve (Marketing, Analyst)

# Step 4: What else has Eve's IP been doing?
traffic = client.neighbors(source_ip_id, "outgoing", edge_type="TRAFFIC")
# → darktools.example.com:443, 10.0.1.20:3389 (RDP), 10.0.1.10:22 (SSH)

# Step 5: BFS blast radius
blast_radius = client.bfs(source_ip_id, max_depth=2)
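The same pivot chain can be reproduced offline on a toy adjacency list, which makes the traversal logic concrete. This is plain Python with no server; the node names and edge types mirror the demo's graph model, and the neighbors() helper is a stand-in for the client call:

```python
# Directed edges (source, edge_type, target) mirroring the demo's model.
EDGES = [
    ("10.0.1.50", "TRIGGERED", "FW-2025-0042"),
    ("10.0.1.50", "DHCP_LEASE", "EVE-LAT01"),
    ("EVE-LAT01", "ASSIGNED_TO", "Eve"),
    ("10.0.1.50", "TRAFFIC", "darktools.example.com:443"),
    ("10.0.1.50", "TRAFFIC", "10.0.1.20:3389"),
    ("10.0.1.50", "TRAFFIC", "10.0.1.10:22"),
]

def neighbors(node, direction="outgoing", edge_type=None):
    """Directional neighbor lookup, like the client's neighbors()."""
    out = []
    for src, etype, tgt in EDGES:
        if edge_type and etype != edge_type:
            continue
        if direction == "outgoing" and src == node:
            out.append(tgt)
        elif direction == "incoming" and tgt == node:
            out.append(src)
    return out

# Step 1: who triggered the alert? (follow TRIGGERED backwards)
source_ip = neighbors("FW-2025-0042", "incoming", "TRIGGERED")[0]
# Steps 2-3: IP -> laptop -> user
laptop = neighbors(source_ip, "outgoing", "DHCP_LEASE")[0]
user = neighbors(laptop, "outgoing", "ASSIGNED_TO")[0]
print(source_ip, laptop, user)  # → 10.0.1.50 EVE-LAT01 Eve
```

Each investigative question becomes one constant-time neighbor lookup instead of a grep across separate log sources, which is the point of loading the datasets as a graph.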

Running the Demo

# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
python3 examples/cybersecurity_demo.py

13 Rust tests cover this scenario in the astraea-graph crate:

cargo test --package astraea-graph cybersecurity

API Reference

Complete JSON request/response format for all request types. All requests are newline-delimited JSON sent over TCP (port 7687).
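Because the wire format is one JSON object per line, a round trip can be framed with nothing but the standard library. The sketch below builds and parses the frames without opening a socket (connecting to a live server is left out; the helper names are illustrative):

```python
import json

def encode_request(req: dict) -> bytes:
    """Serialize a request as one newline-terminated JSON line."""
    return (json.dumps(req, separators=(",", ":")) + "\n").encode("utf-8")

def decode_response(line: bytes) -> dict:
    """Parse one newline-delimited JSON response line."""
    return json.loads(line.decode("utf-8"))

wire = encode_request({"type": "Ping"})
print(wire)  # → b'{"type":"Ping"}\n'

# A response shaped like the documented Ping reply:
reply = decode_response(b'{"status":"ok","data":{"pong":true,"version":"0.1.0"}}\n')
print(reply["data"]["version"])  # → 0.1.0
```

In a real client these bytes would go over a TCP connection to port 7687 (e.g. via `socket.create_connection` and a buffered `readline`), one request and one response per line.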

Node Operations

CreateNode

// Request
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice","age":30},"embedding":[0.1,0.2]}
// Response
{"status":"ok","data":{"node_id":1}}

GetNode

{"type":"GetNode","id":1}
// Response includes id, labels, properties, embedding

UpdateNode

{"type":"UpdateNode","id":1,"properties":{"title":"Engineer"}}

DeleteNode

{"type":"DeleteNode","id":1}
// Cascades: all connected edges are also deleted

Edge Operations

CreateEdge

{"type":"CreateEdge","source":1,"target":2,"edge_type":"KNOWS",
 "properties":{"since":2020},"weight":0.9,
 "valid_from":1704067200000,"valid_to":null}

GetEdge / UpdateEdge / DeleteEdge

{"type":"GetEdge","id":1}
{"type":"UpdateEdge","id":1,"properties":{"note":"updated"}}
{"type":"DeleteEdge","id":1}

Traversal Operations

Neighbors

{"type":"Neighbors","id":1,"direction":"outgoing","edge_type":"KNOWS"}

Bfs

{"type":"Bfs","start":1,"max_depth":3}

ShortestPath

{"type":"ShortestPath","from":1,"to":5,"weighted":false}

Vector & Semantic Operations

VectorSearch

{"type":"VectorSearch","query":[0.1,0.2,0.3],"k":10}

HybridSearch

{"type":"HybridSearch","anchor":1,"query":[0.1,0.2],"max_hops":3,"k":10,"alpha":0.5}

SemanticNeighbors

{"type":"SemanticNeighbors","id":1,"concept":[0.1,0.2],"direction":"outgoing","k":5}

SemanticWalk

{"type":"SemanticWalk","start":1,"concept":[0.1,0.2],"max_hops":4}

Temporal Operations

NeighborsAt

{"type":"NeighborsAt","id":1,"direction":"outgoing","timestamp":1736929800000}

BfsAt

{"type":"BfsAt","start":1,"max_depth":3,"timestamp":1736929800000}

ShortestPathAt

{"type":"ShortestPathAt","from":1,"to":5,"timestamp":1736929800000,"weighted":false}

RAG Operations

ExtractSubgraph

{"type":"ExtractSubgraph","center":1,"hops":2,"max_nodes":50,"format":"structured"}

GraphRag

{"type":"GraphRag","question":"Who compromised the server?",
 "anchor":1,"hops":2,"max_nodes":50,"format":"structured"}

Query & Health

Query

{"type":"Query","gql":"MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name"}

Ping

{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}

Glossary

Quick reference for technical terms used throughout this documentation.

ANN Approximate Nearest Neighbor — A search algorithm that finds vectors close to a query vector, trading perfect accuracy for speed. Returns results that are "good enough" rather than guaranteed optimal.
Arrow Flight A high-performance protocol for streaming columnar data using Apache Arrow's in-memory format. Enables zero-copy data transfer between database and client.
BFS Breadth-First Search — A graph traversal algorithm that explores all neighbors at the current depth before moving deeper. Visits nodes level-by-level.
CSR Compressed Sparse Row — A memory-efficient matrix format that stores only non-zero values. Used to represent sparse graphs as adjacency matrices for fast GPU operations.
DFS Depth-First Search — A graph traversal algorithm that explores as far as possible along each branch before backtracking. Follows one path to its end before trying alternatives.
Embedding A fixed-size numeric vector (array of floats) that captures the semantic meaning of data. Similar concepts have similar embeddings, enabling similarity search.
FHE Fully Homomorphic Encryption — Encryption that allows computation on encrypted data without decrypting it first. The server never sees plaintext.
GNN Graph Neural Network — A neural network designed for graph-structured data. Learns by passing messages between connected nodes to capture both features and structure.
GQL Graph Query Language — The ISO standard (2024) for querying graph databases. Combines the best features of Cypher and SQL with pattern-matching syntax.
GraphRAG Graph-enhanced Retrieval-Augmented Generation — A technique that extracts relevant subgraphs, converts them to text, and feeds them to an LLM to answer questions with graph context.
gRPC Google Remote Procedure Call — A high-performance RPC framework using Protocol Buffers for serialization. More efficient than JSON for structured data.
HNSW Hierarchical Navigable Small World — A graph-based algorithm for approximate nearest neighbor search. Builds a multi-layer navigation structure for fast similarity queries with O(log n) complexity.
io_uring A Linux kernel interface for high-performance asynchronous I/O. Uses shared ring buffers between user space and kernel to minimize syscall overhead.
LLM Large Language Model — An AI model trained on vast text data that can understand and generate human language (e.g., GPT-4, Claude).
LSN Log Sequence Number — A monotonically increasing identifier for each entry in the write-ahead log. Used for recovery and replication.
mTLS Mutual TLS — Two-way TLS authentication where both client and server present certificates. Provides strong identity verification for both parties.
MVCC Multi-Version Concurrency Control — A database technique that maintains multiple versions of data to allow concurrent reads and writes without blocking. Each transaction sees a consistent snapshot.
NVMe Non-Volatile Memory Express — A high-speed storage interface protocol designed for SSDs. Provides much lower latency than SATA or SAS.
Parquet A columnar file format optimized for analytics. Stores data by column rather than row, enabling efficient compression and fast analytical queries.
Pointer Swizzling A technique that converts disk-based identifiers (64-bit IDs) into direct memory pointers when data is loaded into RAM, enabling nanosecond-level access.
RBAC Role-Based Access Control — A security model where permissions are assigned to roles (Admin, Writer, Reader) rather than individual users. Users are granted roles.
SCC Strongly Connected Components — Maximal subgraphs where every node can reach every other node following directed edges. Used to find tightly-knit groups.
SEAL Microsoft Simple Encrypted Arithmetic Library — An open-source library for homomorphic encryption that enables computation on encrypted data.
SSSP Single-Source Shortest Path — An algorithm that computes the shortest distance from one source node to all other nodes in the graph (e.g., Dijkstra's algorithm).
TLS Transport Layer Security — A cryptographic protocol that provides secure communication over networks. Successor to SSL.
WAL Write-Ahead Log — A durability mechanism that logs all mutations to disk before applying them. Enables crash recovery by replaying the log.

AstraeaDB — Cloud-Native, AI-First Graph Database — MIT License

441 Rust tests • 23 Python tests • 30 Go tests • 113 Java tests • 14 crates • Edition 2024