Introduction
AstraeaDB is a cloud-native, AI-first graph database written in Rust. It combines a Vector-Property Graph model with an HNSW (Hierarchical Navigable Small World) vector index, enabling both structural graph traversals and semantic similarity search in a single system.
Key Differentiators
- Unified Data Model — Nodes carry labels, JSON properties, and optional float32 embeddings (numeric vectors that capture semantic meaning). Edges carry types, weights, and temporal validity intervals.
- Hybrid Search — Blend graph proximity (how many hops away) and vector similarity (how semantically close) with a configurable alpha parameter.
- Semantic Traversal — Navigate the graph by meaning: find neighbors most similar to an abstract concept represented as a vector.
- Temporal Queries — Travel through time: query the graph as it existed at any point in time using edge validity intervals.
- GraphRAG — Graph-enhanced Retrieval-Augmented Generation: extract subgraphs, convert to text, and feed to an LLM (Large Language Model) in one atomic operation.
- GQL Support — Hand-written ISO GQL (Graph Query Language) / Cypher parser and executor for declarative graph queries.
- Three Transport Layers — JSON-over-TCP, gRPC (Google's high-performance RPC framework), and Apache Arrow Flight (columnar data streaming).
- Five Client Libraries — Python, R, Go, Java, and embedded Rust. Each supports all 22 server operations with idiomatic language patterns.
- Pure Rust — Zero garbage-collection pauses, memory safety, 441 tests across 14 crates.
- Cloud-Native Storage — Cold storage backends for S3, GCS, Azure, and Parquet (columnar file format).
- Production Security — API key RBAC (Role-Based Access Control) and mTLS (mutual TLS) with client certificate authentication.
Project Stats
| Metric | Value |
|---|---|
| Rust Crates | 14 |
| Rust Tests | 441 |
| Python Tests | 23 |
| Go Tests | 30 |
| Java Tests | 113 |
| Client Libraries | Python, R, Go, Java, Rust (embedded) |
| Rust Edition | 2024 |
| License | MIT |
Getting Started
Build
```sh
cargo build --workspace
```
Run Tests
```sh
cargo test --workspace
```
Start the Server
```sh
cargo run -p astraea-cli -- serve
```
This starts the TCP server on port 7687, gRPC on 7688, and Arrow Flight on 7689.
Connect with the Shell
```sh
cargo run -p astraea-cli -- shell
```
The interactive shell supports both GQL queries and raw JSON requests:
```
# GQL queries
astraea> CREATE (a:Person {name: "Alice", age: 30})
Nodes created: 1

astraea> MATCH (a:Person) WHERE a.age > 25 RETURN a.name, a.age
+--------+-------+
| a.name | a.age |
+--------+-------+
| Alice  | 30    |
+--------+-------+

# Dot-commands
astraea> .status
astraea> .help
astraea> .quit
```
Check Server Status
```sh
cargo run -p astraea-cli -- status
```
Architecture
Crate Overview
| Crate | Purpose | Tests |
|---|---|---|
| astraea-core | Types (Node, Edge, NodeId), traits (StorageEngine, GraphOps, VectorIndex, TransactionalEngine), errors | 4 |
| astraea-storage | 8 KiB pages, LRU buffer pool, pointer swizzling, MVCC, WAL, label index, cold storage (JSON/Parquet/S3), PageIO (memmap2/io_uring) | 75 |
| astraea-graph | CRUD, BFS, DFS, Dijkstra, temporal queries, hybrid search, semantic traversal | 55 |
| astraea-query | GQL lexer, recursive-descent parser, full query executor | 56 |
| astraea-vector | HNSW index, cosine/Euclidean/dot-product, binary persistence | 33 |
| astraea-rag | Subgraph extraction, linearization (4 formats), LLM providers, GraphRAG pipeline | 27 |
| astraea-gnn | Differentiable tensors, message passing, node classification training | 26 |
| astraea-server | TCP/gRPC server, auth (RBAC + mTLS), metrics (Prometheus), connection management | 68 |
| astraea-flight | Arrow Flight: do_get (query → Arrow), do_put (Arrow → import) | 11 |
| astraea-algorithms | PageRank, connected/strongly-connected components, centrality, Louvain | 20 |
| astraea-crypto | Encrypted labels/values/nodes, server-side encrypted label matching | 31 |
| astraea-gpu | CSR matrix, GpuBackend trait, CpuBackend (PageRank, BFS, SSSP) | 16 |
| astraea-cluster | Hash/range partitioning, shard management, cluster coordinator | 19 |
| astraea-cli | serve, shell, status, import, export | — |
Data Model
AstraeaDB uses a Vector-Property Graph model that unifies property graphs with vector embeddings.
Node
A node has an ID, a set of labels, arbitrary JSON properties, and an optional embedding vector.
```rust
pub struct Node {
    pub id: NodeId,
    pub labels: Vec<String>,
    pub properties: serde_json::Value,
    pub embedding: Option<Vec<f32>>, // optional dense vector
}
```
Edge
An edge connects two nodes with a type, JSON properties, a learnable weight, and a temporal validity interval.
```rust
pub struct Edge {
    pub id: EdgeId,
    pub source: NodeId,
    pub target: NodeId,
    pub edge_type: String,
    pub properties: serde_json::Value,
    pub weight: f64,                // learnable weight for GNN
    pub validity: ValidityInterval, // temporal bounds
}
```
ValidityInterval
Represents when an edge is valid. Uses epoch milliseconds with inclusive start and exclusive end.
```rust
pub struct ValidityInterval {
    pub valid_from: Option<i64>, // inclusive, None = unbounded
    pub valid_to: Option<i64>,   // exclusive, None = still valid
}

// Check if an edge is valid at a given time
let valid = edge.validity.contains(1704067200000); // 2024-01-01
```
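The half-open semantics (inclusive start, exclusive end, `None` meaning unbounded) can be sketched as a standalone check. This is a self-contained illustration of the rule described above, not the crate's actual implementation:

```rust
// Sketch of the half-open validity check: [valid_from, valid_to).
pub struct ValidityInterval {
    pub valid_from: Option<i64>, // inclusive, None = unbounded
    pub valid_to: Option<i64>,   // exclusive, None = still valid
}

impl ValidityInterval {
    pub fn contains(&self, ts: i64) -> bool {
        let after_start = self.valid_from.map_or(true, |from| ts >= from);
        let before_end = self.valid_to.map_or(true, |to| ts < to);
        after_start && before_end
    }
}

fn main() {
    let lease = ValidityInterval { valid_from: Some(1_000), valid_to: Some(2_000) };
    assert!(lease.contains(1_000));  // start is inclusive
    assert!(!lease.contains(2_000)); // end is exclusive
    let always = ValidityInterval { valid_from: None, valid_to: None };
    assert!(always.contains(i64::MIN)); // unbounded on both sides
}
```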
GraphPath
Represents a path through the graph as a start node followed by (edge, node) steps.
```rust
pub struct GraphPath {
    pub start: NodeId,
    pub steps: Vec<(EdgeId, NodeId)>,
}
```
ID Types
| Type | Description |
|---|---|
| NodeId(u64) | Unique node identifier |
| EdgeId(u64) | Unique edge identifier |
| PageId(u64) | Storage page identifier |
| TransactionId(u64) | MVCC transaction identifier |
| Lsn(u64) | Write-ahead log sequence number |
Storage Engine
AstraeaDB uses a three-tier storage architecture optimized for graph workloads.
Tier 1: Cold Storage
Data at rest on disk or object storage. The ColdStorage trait provides a pluggable backend interface with three implementations:
| Backend | Description | Use Case |
|---|---|---|
| JsonFileColdStorage | Human-readable JSON files on local disk | Development, debugging, small datasets |
| ParquetColdStorage | Columnar Apache Parquet with Arrow schema | Analytics, large datasets, efficient compression |
| ObjectStoreColdStorage | S3, GCS, Azure Blob, or local filesystem | Cloud-native deployments, data lake integration |
Parquet Schema
Nodes and edges are stored with full Arrow schema mapping:
```
// Node schema
id: UInt64, labels: List<Utf8>, properties: Utf8, embedding: List<Float32>

// Edge schema
id: UInt64, source: UInt64, target: UInt64, edge_type: Utf8,
properties: Utf8, weight: Float64, valid_from: Int64, valid_to: Int64
```
Object Store Usage
```rust
use astraea_storage::ObjectStoreColdStorage;

// Local filesystem
let storage = ObjectStoreColdStorage::local("/data/cold")?;

// Amazon S3
let storage = ObjectStoreColdStorage::s3("my-bucket", "astraea/")?;

// Google Cloud Storage
let storage = ObjectStoreColdStorage::gcs("my-bucket", "astraea/")?;

// Azure Blob Storage
let storage = ObjectStoreColdStorage::azure("my-container", "astraea/")?;
```
Tier 2: Warm (Buffer Pool)
An LRU buffer pool caches frequently accessed 8 KiB pages in memory with pin/unpin semantics. The PageIO trait abstracts disk I/O with two backends:
| Backend | Platform | Description |
|---|---|---|
| FileManager | All platforms | Cross-platform memmap2-based I/O (default) |
| UringPageIO | Linux only | High-performance io_uring async I/O (feature-gated) |
Enabling io_uring (Linux)
```toml
# Cargo.toml
[dependencies]
astraea-storage = { version = "0.1", features = ["io-uring"] }
```

```sh
# Build command
cargo build --features io-uring
```
Tier 3: Hot (Pointer Swizzling)
Frequently-accessed pages are promoted to permanently-pinned status, preventing eviction and enabling zero-copy access. When a page's access count exceeds a configurable threshold, it is "swizzled" into the hot tier.
Page Format
```
+----------------------------------+
| PageHeader (17 bytes)            |
|   page_id, type, record_count,   |
|   free_space_offset, checksum    |
+----------------------------------+
| Record 0: NodeRecordHeader       |
|   node_id, data_len, adj_offset  |
|   + serialized properties        |
+----------------------------------+
| Record 1: ...                    |
+----------------------------------+
| (free space)                     |
+----------------------------------+
            8192 bytes total
```
MVCC Transactions
MVCC (Multi-Version Concurrency Control) allows multiple transactions to read and write data concurrently without blocking each other. AstraeaDB uses snapshot isolation with first-writer-wins conflict detection. The TransactionalEngine trait provides transactional access:
- `begin_transaction()` — start a new transaction with a snapshot LSN (Log Sequence Number, a unique identifier for each log entry)
- `put_node_tx(node, txn_id)` / `put_edge_tx(edge, txn_id)` — buffer writes until commit
- `commit_transaction(txn_id)` — atomically apply all buffered writes
- `abort_transaction(txn_id)` — discard all buffered writes
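Snapshot isolation means a transaction only sees versions committed at or before its snapshot LSN. A hypothetical visibility rule, written here only to illustrate the idea (the `Version` fields are assumptions, not AstraeaDB's actual record layout):

```rust
// Hypothetical sketch of snapshot-isolation visibility.
struct Version {
    begin_lsn: u64,       // LSN at which this version was committed
    end_lsn: Option<u64>, // LSN at which it was superseded or deleted
}

// A version is visible to a snapshot if it was committed at or before
// the snapshot LSN and not yet superseded at that LSN.
fn visible(v: &Version, snapshot_lsn: u64) -> bool {
    v.begin_lsn <= snapshot_lsn && v.end_lsn.map_or(true, |end| end > snapshot_lsn)
}

fn main() {
    let v = Version { begin_lsn: 10, end_lsn: Some(20) };
    assert!(visible(&v, 15));  // committed before snapshot, superseded after
    assert!(!visible(&v, 5));  // not yet committed at snapshot time
    assert!(!visible(&v, 20)); // already superseded
}
```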
Write-Ahead Log (WAL)
The WAL (Write-Ahead Log) ensures durability: every mutation is logged to disk before being applied to the data files, so after a crash the database recovers by replaying the log. Records use a [length][type][JSON payload][CRC32] frame format, and BeginTransaction, CommitTransaction, and AbortTransaction records support crash recovery.
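The [length][type][JSON payload][CRC32] framing can be sketched with a bitwise CRC-32 (IEEE polynomial). The exact field widths and byte order below are assumptions for illustration, not the crate's actual wire format:

```rust
// Assumed frame layout: [u32 length][u8 type][payload][u32 crc32], little-endian.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

fn encode_frame(record_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::new();
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.push(record_type);
    frame.extend_from_slice(payload);
    frame.extend_from_slice(&crc32(payload).to_le_bytes());
    frame
}

fn main() {
    // Standard CRC-32 check value for "123456789"
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);
    let payload = br#"{"op":"CommitTransaction"}"#;
    // 4 bytes length + 1 byte type + payload + 4 bytes CRC
    assert_eq!(encode_frame(1, payload).len(), payload.len() + 9);
}
```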
Label Index
A hash-based index (`HashMap<String, HashSet<NodeId>>`) providing O(1) label-based lookups, automatically maintained when nodes are created or deleted.
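The maintenance logic can be sketched with std collections alone. A minimal version of the idea, using `u64` in place of `NodeId`:

```rust
use std::collections::{HashMap, HashSet};

// Minimal label index: label -> set of node IDs, updated on create/delete.
#[derive(Default)]
struct LabelIndex {
    by_label: HashMap<String, HashSet<u64>>,
}

impl LabelIndex {
    fn on_create(&mut self, node_id: u64, labels: &[String]) {
        for label in labels {
            self.by_label.entry(label.clone()).or_default().insert(node_id);
        }
    }
    fn on_delete(&mut self, node_id: u64, labels: &[String]) {
        for label in labels {
            if let Some(set) = self.by_label.get_mut(label) {
                set.remove(&node_id);
            }
        }
    }
    fn find(&self, label: &str) -> Vec<u64> {
        self.by_label
            .get(label)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut idx = LabelIndex::default();
    idx.on_create(1, &["Person".to_string()]);
    idx.on_delete(1, &["Person".to_string()]);
    assert!(idx.find("Person").is_empty());
}
```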
Graph Operations
The GraphOps trait defines all graph-level operations. It is implemented by the Graph struct on top of any StorageEngine.
CRUD Operations
```rust
use astraea_graph::Graph;
use astraea_core::traits::GraphOps;

// Create nodes
let alice = graph.create_node(
    vec!["Person".into()],
    json!({"name": "Alice", "age": 30}),
    None, // no embedding
)?;

// Create an edge (always valid)
graph.create_edge(alice, bob, "KNOWS".into(), json!({}), 1.0, None, None)?;

// Read
let node = graph.get_node(alice)?;
let edge = graph.get_edge(edge_id)?;

// Update (merge semantics)
graph.update_node(alice, json!({"title": "Engineer"}))?;

// Delete (node deletion cascades to edges)
graph.delete_node(alice)?;
```
Traversals
Graph traversal algorithms explore nodes by following edges:
| Method | Algorithm | Description |
|---|---|---|
| bfs(start, max_depth) | BFS (Breadth-First Search) | Explores neighbors level by level. Returns Vec<(NodeId, depth)> |
| dfs(start, max_depth) | DFS (Depth-First Search) | Explores as far as possible before backtracking. Returns Vec<NodeId> |
| shortest_path(from, to) | BFS | Unweighted shortest path (fewest hops) |
| shortest_path_weighted(from, to) | Dijkstra's algorithm | Weighted shortest path using edge weights |
Neighbor Queries
```rust
use astraea_core::types::Direction;

// All outgoing neighbors
let neighbors = graph.neighbors(alice, Direction::Outgoing)?;

// Filtered by edge type
let friends = graph.neighbors_filtered(alice, Direction::Both, "KNOWS")?;

// Find nodes by label (O(1) via label index)
let people = graph.find_by_label("Person")?;
```
Temporal Queries
Edges have a ValidityInterval that defines when they exist. Temporal query methods filter edges by a given timestamp, allowing you to query the graph as it existed at any point in time.
Creating Temporal Edges
```rust
// DHCP lease valid from 08:00 to 10:00 UTC on Jan 15, 2025
graph.create_edge(
    ip_node,
    laptop_node,
    "DHCP_LEASE".into(),
    json!({"dhcp_server": "10.0.0.1"}),
    1.0,
    Some(1736928000000), // valid_from
    Some(1736935200000), // valid_to
)?;
```
Temporal Traversal Methods
| Method | Description |
|---|---|
| neighbors_at(node, direction, timestamp) | Neighbors via edges valid at the timestamp |
| bfs_at(start, max_depth, timestamp) | BFS traversal only following valid edges |
| shortest_path_at(from, to, timestamp) | Unweighted shortest path at a point in time |
| shortest_path_weighted_at(from, to, timestamp) | Dijkstra with temporal filtering |
Server Requests
```
// Neighbors at a specific time
{"type": "NeighborsAt", "id": 42, "direction": "outgoing", "timestamp": 1736929800000}

// BFS at a specific time
{"type": "BfsAt", "start": 42, "max_depth": 3, "timestamp": 1736929800000}

// Shortest path at a specific time
{"type": "ShortestPathAt", "from": 1, "to": 5, "timestamp": 1736929800000, "weighted": false}
```
Vector Search (HNSW)
AstraeaDB includes a full implementation of the HNSW (Hierarchical Navigable Small World) algorithm for ANN (Approximate Nearest-Neighbor) search. HNSW builds a multi-layer graph where each layer is a "small world" network — most nodes are not directly connected, but any two nodes can be reached through a small number of hops. This enables finding similar vectors in logarithmic time rather than scanning all vectors.
Configuration
| Parameter | Default | Description |
|---|---|---|
| M | 16 | Maximum connections per node per layer (higher = more accurate but slower) |
| ef_construction | 200 | Beam width during index building (higher = better quality index) |
| ef_search | 50 | Beam width during search (trade-off: higher = more accurate, lower = faster) |
Distance Metrics
Distance metrics measure how "far apart" two vectors are. Lower distance means more similar:
| Metric | Description | Best For |
|---|---|---|
| Cosine | Measures the angle between vectors (1 - cosθ). Ignores magnitude. | Text embeddings, normalized vectors |
| Euclidean | Straight-line distance (L2 norm). Considers magnitude. | Spatial data, image features |
| DotProduct | Negative dot product. Higher dot product = more similar. | Recommendation systems, MIPS |
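The three metrics are simple to state as code. A sketch of each as a distance (lower = more similar), matching the definitions in the table:

```rust
// 1 - cos(theta): ignores vector magnitude.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

// L2 norm of the difference: considers magnitude.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

// Negated so that a larger dot product yields a smaller distance.
fn dot_product_distance(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

fn main() {
    let a = [1.0, 0.0];
    let b = [0.0, 1.0];
    assert!(cosine_distance(&a, &a).abs() < 1e-6);           // same direction -> 0
    assert!((cosine_distance(&a, &b) - 1.0).abs() < 1e-6);   // orthogonal -> 1
    assert!((euclidean_distance(&a, &b) - 2f32.sqrt()).abs() < 1e-6);
}
```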
Usage
```rust
use astraea_vector::HnswVectorIndex;
use astraea_core::types::DistanceMetric;
use astraea_core::traits::VectorIndex;

// Create a 128-dimensional index with cosine distance
let index = HnswVectorIndex::new(128, DistanceMetric::Cosine);

// Insert embeddings
index.insert(node_id, &embedding_vec)?;

// Search for k nearest neighbors
let results = index.search(&query_vec, 10)?;
for r in &results {
    println!("Node {:?}, distance: {}", r.node_id, r.distance);
}
```
Persistence
The index can be saved to and loaded from disk using a versioned binary format (magic bytes + bincode):
```rust
// Save to disk
index.save("index.hnsw")?;

// Load from disk (no rebuild needed)
let index = HnswVectorIndex::load("index.hnsw")?;
```
Auto-Indexing
When a VectorIndex is attached to a Graph, embeddings are automatically indexed on create_node() and removed on delete_node().
Hybrid & Semantic Search
Hybrid Search
Combines graph proximity with vector similarity using a configurable alpha blending parameter:
```
final_score = alpha × vector_score + (1 - alpha) × graph_score
```
```rust
graph.hybrid_search(
    anchor_node,      // starting node for BFS
    &query_embedding, // semantic target
    3,                // max_hops (BFS radius)
    10,               // top-k results
    0.5,              // alpha: 0.0 = pure graph, 1.0 = pure vector
)?;
```
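The blending formula itself is one line. A sketch that assumes both scores are already normalized to [0, 1] (how AstraeaDB normalizes hop counts and vector distances internally is not shown here):

```rust
// alpha = 0.0 -> pure graph score, alpha = 1.0 -> pure vector score.
fn hybrid_score(alpha: f64, vector_score: f64, graph_score: f64) -> f64 {
    alpha * vector_score + (1.0 - alpha) * graph_score
}

fn main() {
    assert_eq!(hybrid_score(1.0, 0.9, 0.2), 0.9); // pure vector
    assert_eq!(hybrid_score(0.0, 0.9, 0.2), 0.2); // pure graph
    assert!((hybrid_score(0.5, 0.8, 0.4) - 0.6).abs() < 1e-12); // even blend
}
```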
Semantic Neighbors
Rank a node's neighbors by their embedding similarity to a concept vector:
```rust
// "Find the neighbor of Alice most similar to the concept of Risk"
let ranked = graph.semantic_neighbors(
    alice_id,
    &risk_embedding,
    Direction::Outgoing,
    5, // top-k
)?;
```
Semantic Walk
A greedy multi-hop walk that at each step moves to the unvisited neighbor most similar to the concept:
```rust
// Walk through the graph toward the concept of "Fraud"
let path = graph.semantic_walk(
    start_node,
    &fraud_embedding,
    4, // max hops
)?;
// path: Vec<(NodeId, f32)> where f32 is distance to concept
```
GQL Query Language
AstraeaDB includes a hand-written recursive-descent parser and full query executor for a subset of ISO GQL / Cypher.
MATCH Queries
```
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 30 AND b.name = "Bob"
RETURN a.name AS person, b.name AS friend
ORDER BY a.age DESC
LIMIT 10
```
CREATE
```
CREATE (a:Person {name: "Alice", age: 30})

CREATE (a:Person {name: "Alice"})-[:KNOWS {since: 2020}]->(b:Person {name: "Bob"})
```
DELETE
```
MATCH (a:Person) WHERE a.name = "Alice" DELETE a
```
Expression Support
| Category | Operators / Functions |
|---|---|
| Arithmetic | +, -, *, /, % |
| Comparison | =, <>, <, <=, >, >= |
| Boolean | AND, OR, NOT |
| Null checks | IS NULL, IS NOT NULL |
| Functions | count(), id(), labels(), type(), toString(), toInteger() |
| Edge directions | -[:TYPE]-> (out), <-[:TYPE]- (in), -[:TYPE]- (both) |
Programmatic Usage
```rust
use astraea_query::parse;
use astraea_query::executor::Executor;

let ast = parse("MATCH (a:Person) WHERE a.age > 30 RETURN a.name")?;
let executor = Executor::new(graph.clone());
let result = executor.execute(ast)?;
// result.columns: ["a.name"]
// result.rows: [["Alice"], ...]
```
GraphRAG Engine
Retrieval-Augmented Generation backed by graph context. The pipeline performs: vector search → subgraph extraction → linearization → LLM completion.
Subgraph Extraction
```rust
use astraea_rag::{extract_subgraph, linearize_subgraph, TextFormat};

// BFS 2 hops from a node, max 50 nodes
let subgraph = extract_subgraph(&*graph, node_id, 2, 50)?;
```
Linearization Formats
| Format | Description | Best For |
|---|---|---|
| Structured | Indented tree with arrows (-[KNOWS]->) | General LLM context |
| Prose | Natural language paragraphs | Conversational AI |
| Triples | (subject, predicate, object) | Knowledge extraction |
| Json | Compact JSON | Structured prompts |
```rust
let text = linearize_subgraph(&subgraph, TextFormat::Structured);
let tokens = estimate_tokens(&text); // ~4 chars per token
```
LLM Providers
The LlmProvider trait supports multiple backends. Providers use injectable HTTP callbacks (no HTTP dependencies in the crate).
| Provider | Description |
|---|---|
| MockProvider | Returns canned responses (for testing) |
| OpenAiProvider | OpenAI API compatible endpoints |
| AnthropicProvider | Anthropic Messages API |
| OllamaProvider | Local Ollama instance (default: localhost:11434) |
Full Pipeline
```rust
use astraea_rag::{GraphRagConfig, graph_rag_query_anchored, MockProvider};

let config = GraphRagConfig {
    hops: 2,
    max_context_nodes: 50,
    text_format: TextFormat::Structured,
    token_budget: 4000,
    ..Default::default()
};

let result = graph_rag_query_anchored(&*graph, &llm, "Who knows Alice?", node_id, &config)?;
println!("Answer: {}", result.answer);
println!("Context: {} tokens, {} nodes", result.estimated_tokens, result.nodes_in_context);
```
GNN Training
GNN (Graph Neural Network) is a type of neural network designed for graph-structured data. Unlike traditional neural networks that work on fixed-size inputs (like images), GNNs can learn from the structure of a graph — incorporating information from a node's neighbors to make predictions. AstraeaDB implements GNN training in pure Rust with no external ML framework.
Components
- Tensor — Basic multi-dimensional array with element-wise operations (add, mul, scale), activations (ReLU, sigmoid), and gradient tracking for backpropagation.
- Message Passing — The core GNN operation: aggregate neighbor features weighted by edge weights. Supports Sum, Mean, or Max aggregation.
- Training Loop — Forward pass (run N message passing layers) → compute loss against known labels → estimate gradients numerically → update edge weights via gradient descent.
Example
```rust
use astraea_gnn::{TrainingConfig, TrainingData, MessagePassingConfig, Aggregation, Activation};
use astraea_gnn::training::train_node_classification;

let config = TrainingConfig {
    layers: 2,
    learning_rate: 0.01,
    epochs: 50,
    message_passing: MessagePassingConfig {
        aggregation: Aggregation::Mean,
        activation: Activation::ReLU,
        normalize: true,
    },
};

let result = train_node_classification(&*graph, &training_data, &config)?;
println!("Accuracy: {:.1}%", result.accuracy * 100.0);
```
Graph Algorithms
The astraea-algorithms crate provides classical graph algorithms for analyzing graph structure.
| Algorithm | Function | Description |
|---|---|---|
| PageRank | pagerank(graph, nodes, config) | Ranks nodes by importance based on incoming links (like Google's original algorithm). Returns importance scores for each node. |
| Connected Components | connected_components(graph, nodes) | Groups nodes into clusters where every node can reach every other node (ignoring edge direction). |
| Strongly Connected | strongly_connected_components(graph, nodes) | Like connected components, but respects edge direction (for directed graphs). |
| Degree Centrality | degree_centrality(graph, nodes, direction) | Measures importance by counting connections. More connections = more central. |
| Betweenness Centrality | betweenness_centrality(graph, nodes) | Measures how often a node lies on shortest paths between other nodes. High betweenness = important bridge. |
| Community Detection | louvain(graph, nodes) | Finds densely-connected groups (communities) using the Louvain algorithm. Returns which community each node belongs to. |
PageRank Configuration
```rust
let config = PageRankConfig {
    damping: 0.85,       // damping factor
    max_iterations: 100, // convergence limit
    tolerance: 1e-6,     // L1 norm convergence threshold
};
```
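These three parameters drive a standard power iteration. A self-contained sketch over an adjacency list, independent of the astraea-algorithms implementation (dangling nodes are not handled here):

```rust
use std::collections::HashMap;

// Power-iteration PageRank: damping, iteration cap, L1-norm tolerance.
fn pagerank(
    out_edges: &HashMap<u64, Vec<u64>>,
    nodes: &[u64],
    damping: f64,
    max_iterations: usize,
    tolerance: f64,
) -> HashMap<u64, f64> {
    let n = nodes.len() as f64;
    let mut rank: HashMap<u64, f64> = nodes.iter().map(|&v| (v, 1.0 / n)).collect();
    for _ in 0..max_iterations {
        // Each node starts with the teleport term, then receives shares
        // of its in-neighbors' current rank.
        let mut next: HashMap<u64, f64> =
            nodes.iter().map(|&v| (v, (1.0 - damping) / n)).collect();
        for (&src, targets) in out_edges {
            let share = damping * rank[&src] / targets.len() as f64;
            for t in targets {
                *next.get_mut(t).unwrap() += share;
            }
        }
        // L1-norm convergence check
        let delta: f64 = nodes.iter().map(|v| (next[v] - rank[v]).abs()).sum();
        rank = next;
        if delta < tolerance {
            break;
        }
    }
    rank
}

fn main() {
    // Symmetric two-cycle 1 <-> 2: both nodes converge to rank 0.5.
    let edges: HashMap<u64, Vec<u64>> = [(1u64, vec![2u64]), (2, vec![1])].into();
    let r = pagerank(&edges, &[1, 2], 0.85, 100, 1e-6);
    assert!((r[&1] - 0.5).abs() < 1e-3);
}
```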
Network Server
AstraeaDB provides three transport layers for different use cases.
Transport Layers
| Transport | Port | Format | Best For |
|---|---|---|---|
| JSON-over-TCP | 7687 | Newline-delimited JSON | Debugging, scripting, netcat |
| gRPC / Protobuf | 7688 | Protocol Buffers | Production clients, type safety |
| Arrow Flight | 7689 | Apache Arrow | Python/Pandas, bulk operations |
Supported Request Types
| Request | Description |
|---|---|
| CreateNode | Create a node with labels, properties, optional embedding |
| CreateEdge | Create an edge between two nodes |
| GetNode / GetEdge | Retrieve by ID |
| UpdateNode / UpdateEdge | Merge properties |
| DeleteNode / DeleteEdge | Delete (node deletion cascades edges) |
| Neighbors | Get neighbors with direction and edge-type filtering |
| Bfs | Breadth-first traversal with depth limit |
| ShortestPath | Unweighted or weighted (Dijkstra) |
| VectorSearch | k-nearest-neighbor via HNSW index |
| HybridSearch | Graph + vector blended search |
| SemanticNeighbors | Rank neighbors by concept similarity |
| SemanticWalk | Greedy walk toward a concept |
| NeighborsAt | Temporal neighbors at a timestamp |
| BfsAt | Temporal BFS at a timestamp |
| ShortestPathAt | Temporal shortest path at a timestamp |
| ExtractSubgraph | Extract and linearize a local subgraph |
| GraphRag | GraphRAG pipeline (search → subgraph → LLM) |
| Query | Execute a GQL query string |
| Ping | Health check |
JSON Protocol Examples
```
// Create a node with an embedding
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice"},"embedding":[0.1,0.2]}

// Response
{"status":"ok","data":{"node_id":1}}

// Execute a GQL query
{"type":"Query","gql":"MATCH (a:Person) RETURN a.name"}

// Ping
{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}
```
Authentication & RBAC
AstraeaDB supports API key authentication and mTLS (mutual TLS) with role-based access control.
Roles
| Role | Permissions |
|---|---|
| Admin | Full access to all operations |
| Writer | Read + write (CRUD, traversals, queries) |
| Reader | Read-only (get, query, search, traverse, ping) |
API Key Authentication
Include an auth_token field in JSON requests:
```json
{"type":"CreateNode","labels":["Person"],"properties":{},"auth_token":"my-api-key"}
```
If auth is enabled and the token is missing or invalid:
```json
{"status":"error","message":"authentication required: provide auth_token"}
```
mTLS (Mutual TLS)
For production deployments, AstraeaDB supports TLS encryption with optional client certificate verification. The server uses rustls for modern, safe TLS.
TLS Configuration
```rust
use astraea_server::tls::TlsConfig;

// Server-only TLS (encrypt traffic)
let tls = TlsConfig::new(
    "server.crt", // Server certificate
    "server.key", // Server private key
);

// mTLS (verify client certificates)
let tls = TlsConfig::with_mtls(
    "server.crt",
    "server.key",
    "ca.crt", // CA cert for client verification
);
```
Client Certificate Role Mapping
When mTLS is enabled, the client certificate's Common Name (CN) is automatically mapped to an RBAC role:
| Certificate CN | Role |
|---|---|
| Contains "admin" | Admin |
| Contains "writer" or "write" | Writer |
| All others | Reader |
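The mapping rules in the table reduce to a few substring checks. A sketch of the idea (case-insensitive matching here is an assumption, not documented behavior):

```rust
// Sketch of the CN -> role mapping rules described above.
#[derive(Debug, PartialEq)]
enum Role {
    Admin,
    Writer,
    Reader,
}

fn cn_to_role(cn: &str) -> Role {
    let cn = cn.to_lowercase(); // assumed: matching ignores case
    if cn.contains("admin") {
        Role::Admin
    } else if cn.contains("writer") || cn.contains("write") {
        Role::Writer
    } else {
        Role::Reader
    }
}

fn main() {
    assert_eq!(cn_to_role("admin-service"), Role::Admin);
    assert_eq!(cn_to_role("batch-writer"), Role::Writer);
    assert_eq!(cn_to_role("dashboard"), Role::Reader);
}
```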
TLS Helper Functions
```rust
use astraea_server::tls::*;

// Load certificates and keys
let certs = load_certs("server.crt")?;
let key = load_private_key("server.key")?;

// Extract info from client certificates
let cn = extract_client_cn(&client_certs); // e.g., "admin-service"
let sans = extract_sans(&cert);            // Subject Alternative Names

// Map CN to role
let role = cn_to_role(&cn); // "admin-service" -> Admin
```
Read-Only Operations (accessible by Reader role)
GetNode, GetEdge, Neighbors, NeighborsAt, Bfs, BfsAt, ShortestPath, ShortestPathAt, VectorSearch, HybridSearch, SemanticNeighbors, SemanticWalk, Query, ExtractSubgraph, GraphRag, Ping
Audit Logging
All authenticated requests are logged with timestamp, truncated API key prefix (first 8 chars), role, operation, and whether it was allowed. The audit log is a bounded circular buffer (max 10,000 entries).
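A bounded circular buffer like the one described can be sketched with `VecDeque`: once capacity is reached, the oldest entry is dropped to make room for the newest. The entry type and capacity here are placeholders:

```rust
use std::collections::VecDeque;

// Sketch of a bounded audit log (the real one caps at 10,000 entries).
struct AuditLog {
    entries: VecDeque<String>,
    capacity: usize,
}

impl AuditLog {
    fn new(capacity: usize) -> Self {
        AuditLog { entries: VecDeque::with_capacity(capacity), capacity }
    }

    fn record(&mut self, entry: String) {
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // evict the oldest entry
        }
        self.entries.push_back(entry);
    }
}

fn main() {
    let mut log = AuditLog::new(3);
    for i in 0..5 {
        log.record(format!("request {i}"));
    }
    // Only the 3 most recent entries survive.
    assert_eq!(log.entries.len(), 3);
    assert_eq!(log.entries.front().unwrap(), "request 2");
}
```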
Key Management
```rust
let auth = AuthManager::new(vec![
    ApiKeyEntry {
        key: "admin-key-xxx".into(),
        role: Role::Admin,
        description: "Admin key".into(),
        active: true,
    },
    ApiKeyEntry {
        key: "reader-key-xxx".into(),
        role: Role::Reader,
        description: "CI key".into(),
        active: true,
    },
]);

// Runtime key management
auth.add_key(new_entry);
auth.revoke_key("compromised-key");
```
Observability
Prometheus Metrics
The server exposes metrics in Prometheus text exposition format.
| Metric | Type | Description |
|---|---|---|
| astraea_requests_total{type="..."} | counter | Total requests by operation type |
| astraea_errors_total{type="..."} | counter | Total errors by operation type |
| astraea_request_duration_us{type="...",quantile="0.5\|0.9\|0.99"} | summary | Request duration percentiles (microseconds) |
| astraea_active_connections | gauge | Currently active TCP connections |
| astraea_connections_total | counter | Total connections since startup |
| astraea_uptime_seconds | gauge | Server uptime in seconds |
Health Check
The health() method returns a JSON object:
```json
{
  "status": "healthy",
  "uptime_seconds": 3600,
  "active_connections": 12,
  "total_connections": 1543,
  "start_time": 1704067200
}
```
Connection Management
Configuration
| Parameter | Default | Description |
|---|---|---|
| max_connections | 1024 | Maximum concurrent TCP connections. New connections beyond this are rejected. |
| max_concurrent_requests | 256 | Request-level backpressure via semaphore. |
| idle_timeout | 300s | Close connections idle for longer than this. |
| request_timeout | 30s | Abort requests that take longer than this. |
| drain_timeout | 10s | Max time to wait for in-flight requests during shutdown. |
Graceful Shutdown
- Stop accepting new connections
- Wait for in-flight requests to complete (up to `drain_timeout`)
- Close all connections
When the connection limit is reached, new connections receive:
```json
{"status":"error","message":"server connection limit reached"}
```
Encryption
The astraea-crypto crate provides a foundation for encrypted queries, allowing clients to query the graph without the server seeing unencrypted data. This is essential for privacy-sensitive applications in banking, healthcare, and other regulated industries.
Key Management
```rust
use astraea_crypto::{KeyPair, EncryptedNode, EncryptedQueryEngine};

// Client generates a key pair
let keys = KeyPair::generate();

// Encrypt a node
let encrypted = EncryptedNode::from_node(&node, &keys.secret);
```
Server-Side Label Matching
Labels are encrypted with deterministic tags, allowing the server to compare encrypted labels without decryption:
```rust
// Server side: search by encrypted label
let encrypted_label = EncryptedLabel::encrypt("Person", &keys.secret);
let results = engine.find_by_encrypted_label(&encrypted_label);

// Client side: decrypt results
for enc_node in results {
    let node = enc_node.to_node(&keys.secret);
}
```
Encryption Types
| Type | Description |
|---|---|
| EncryptedValue | Randomized encryption (same plaintext → different ciphertexts) |
| EncryptedLabel | Deterministic tag for matching + randomized value for confidentiality |
| EncryptedNode | Encrypted labels (individually) + encrypted properties (as JSON blob). Node ID stays plaintext. |
GPU Acceleration
The astraea-gpu crate provides a framework for GPU-accelerated graph analytics. Graph algorithms like PageRank are fundamentally matrix operations, which GPUs can execute much faster than CPUs due to their parallel architecture.
CSR Matrix
Graphs are converted to CSR (Compressed Sparse Row) format for efficient matrix operations. CSR is a compact way to represent sparse matrices (matrices with mostly zeros, like adjacency matrices) that enables fast row access:
```rust
use astraea_gpu::{CsrMatrix, CpuBackend, GpuBackend};

let nodes = vec![n1, n2, n3, n4];
let csr = CsrMatrix::from_graph(&graph, &nodes)?;

// csr.spmv(&x)    -- sparse matrix-vector multiply (the core of PageRank)
// csr.transpose() -- efficient transpose operation
```
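CSR and the SpMV kernel it enables are compact enough to sketch in full. A minimal version of the format, independent of astraea-gpu: each row i spans `col_indices[row_offsets[i]..row_offsets[i+1]]`, so iterating a row is a contiguous slice scan.

```rust
// Minimal CSR representation and sparse matrix-vector multiply (SpMV).
struct Csr {
    row_offsets: Vec<usize>, // length = rows + 1
    col_indices: Vec<usize>, // column of each nonzero
    values: Vec<f64>,        // value of each nonzero
}

impl Csr {
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        let n = self.row_offsets.len() - 1;
        let mut y = vec![0.0; n];
        for i in 0..n {
            // Accumulate row i's nonzeros against x.
            for k in self.row_offsets[i]..self.row_offsets[i + 1] {
                y[i] += self.values[k] * x[self.col_indices[k]];
            }
        }
        y
    }
}

fn main() {
    // 2x2 matrix [[0, 1], [2, 0]] in CSR form
    let m = Csr {
        row_offsets: vec![0, 1, 2],
        col_indices: vec![1, 0],
        values: vec![1.0, 2.0],
    };
    assert_eq!(m.spmv(&[3.0, 4.0]), vec![4.0, 6.0]);
}
```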
GpuBackend Trait
| Method | Returns | Description |
|---|---|---|
| pagerank(csr, config) | HashMap<NodeId, f64> | PageRank importance scores |
| bfs(csr, source) | HashMap<NodeId, i32> | BFS levels (distance from source, -1 = unreachable) |
| sssp(csr, source) | HashMap<NodeId, f64> | SSSP (Single-Source Shortest Path): shortest distances from one node to all others |
CPU Fallback
The CpuBackend implements all algorithms in pure Rust. It is always available and serves as the fallback when no GPU is present. The SSSP implementation uses the Bellman-Ford algorithm, which handles negative edge weights (unlike Dijkstra).
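Bellman-Ford is short enough to sketch: relax every edge up to |V|-1 times, which tolerates negative weights (though not negative cycles). A self-contained illustration over a flat edge list, not the CpuBackend's actual code:

```rust
// Bellman-Ford SSSP over (source, target, weight) edges.
fn bellman_ford(num_nodes: usize, edges: &[(usize, usize, f64)], source: usize) -> Vec<f64> {
    let mut dist = vec![f64::INFINITY; num_nodes];
    dist[source] = 0.0;
    // Any shortest path uses at most |V|-1 edges, so |V|-1 relaxation
    // rounds suffice.
    for _ in 0..num_nodes.saturating_sub(1) {
        for &(u, v, w) in edges {
            if dist[u] + w < dist[v] {
                dist[v] = dist[u] + w;
            }
        }
    }
    dist
}

fn main() {
    // 0 -> 1 (4), 0 -> 2 (5), 2 -> 1 (-3): best path to 1 is 0 -> 2 -> 1 = 2
    let edges = [(0, 1, 4.0), (0, 2, 5.0), (2, 1, -3.0)];
    let dist = bellman_ford(3, &edges, 0);
    assert_eq!(dist[1], 2.0);
    assert_eq!(dist[2], 5.0);
}
```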
Clustering & Sharding
The astraea-cluster crate provides foundations for distributed graph processing.
Partitioning Strategies
| Strategy | Description |
|---|---|
| HashPartitioner | Assigns nodes to shards via hash(node_id) % num_shards. Deterministic and evenly distributed. |
| RangePartitioner | Assigns nodes based on ID ranges with configurable boundaries. Can be uniform or custom. |
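The hash strategy is one expression. A sketch using the standard library's hasher (the real crate may use a different hash function; only the hash-then-modulo shape matches the table):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// hash(node_id) % num_shards: deterministic, so every router maps the
// same node to the same shard.
fn shard_for_node(node_id: u64, num_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    node_id.hash(&mut h);
    h.finish() % num_shards
}

fn main() {
    // Same ID always lands on the same shard, and the result is in range.
    assert_eq!(shard_for_node(42, 3), shard_for_node(42, 3));
    assert!(shard_for_node(42, 3) < 3);
}
```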
Shard Management
```rust
use astraea_cluster::{ShardMap, ShardInfo, HashPartitioner};

let mut shard_map = ShardMap::new(Box::new(HashPartitioner::new(3)));
shard_map.register_shard(info);
let shard = shard_map.shard_for_node(node_id);
```
Cluster Coordinator
The ClusterCoordinator trait defines the contract for distributed operations. LocalCoordinator is the single-node implementation that routes everything locally.
CLI Reference
Commands
```sh
# Start the server
astraeadb serve [--config config.toml] [--bind 0.0.0.0] [--port 7687]

# Interactive shell (REPL)
astraeadb shell [--address 127.0.0.1:7687]

# Check server status
astraeadb status [--address 127.0.0.1:7687]

# Import data from JSON
astraeadb import --file data.json --format json --data-dir ./data

# Export data to JSON
astraeadb export --file dump.json --format json --data-dir ./data
```
Configuration File (config.toml)
```toml
[server]
bind_address = "127.0.0.1"
port = 7687

[storage]
data_dir = "data"
buffer_pool_size = 1024
wal_dir = "data/wal"
```
Shell Features
- Readline support via `rustyline` (history, line editing)
- Auto-detects GQL queries vs raw JSON requests
- Table-formatted output for query results
- Dot-commands: `.help`, `.quit`, `.status`
Python Client
Installation
```sh
# Basic (JSON only, zero dependencies)
pip install ./python

# With Arrow Flight support (quoted so the shell doesn't expand the brackets)
pip install "./python[arrow]"
```
Client Types
| Client | Transport | Dependencies |
|---|---|---|
| JsonClient | TCP / newline-delimited JSON | None (stdlib only) |
| ArrowClient | Apache Arrow Flight | pyarrow >= 14.0 |
| AstraeaClient | Auto-selects best transport | Optional pyarrow |
Usage
```python
from astraeadb import AstraeaClient

# Connect with optional authentication
with AstraeaClient(host="127.0.0.1", port=7687, auth_token="my-api-key") as client:
    # Create nodes (embeddings auto-indexed)
    alice = client.create_node(["Person"], {"name": "Alice", "age": 30}, embedding=[0.1] * 128)
    bob = client.create_node(["Person"], {"name": "Bob"})

    # Create a temporal edge
    client.create_edge(alice, bob, "KNOWS", {"since": 2020}, weight=0.9,
                       valid_from=1609459200000)  # Jan 1, 2021 (ms)

    # Traversals
    neighbors = client.neighbors(alice, direction="outgoing")
    path = client.shortest_path(alice, bob, weighted=True)
    reachable = client.bfs(alice, max_depth=2)

    # Temporal queries (time-travel)
    old_neighbors = client.neighbors_at(alice, "outgoing", 1577836800000)  # Jan 1, 2020
    historical_path = client.shortest_path_at(alice, bob, 1577836800000)

    # Vector search
    results = client.vector_search([0.15] * 128, k=5)

    # Hybrid search
    results = client.hybrid_search(anchor=alice, query_vector=[0.15] * 128,
                                   max_hops=3, k=10, alpha=0.5)

    # GraphRAG - extract subgraph context
    context = client.extract_subgraph(alice, hops=2, max_nodes=50, format="prose")

    # GraphRAG - full pipeline with LLM
    answer = client.graph_rag("Who does Alice know?", anchor=alice)

    # GQL query
    result = client.query("MATCH (a:Person) WHERE a.age > 25 RETURN a.name")

    # Batch operations
    node_ids = client.create_nodes([
        {"labels": ["Person"], "properties": {"name": "Charlie"}},
        {"labels": ["Person"], "properties": {"name": "Diana"}},
    ])

    # Health check
    status = client.ping()
```
DataFrame Support (Optional)
```python
from astraeadb import AstraeaClient
from astraeadb.dataframe import import_nodes_df, export_nodes_df, export_bfs_df
import pandas as pd

# Import nodes from a DataFrame
df = pd.DataFrame([
    {"label": "Person", "name": "Alice", "age": 30},
    {"label": "Person", "name": "Bob", "age": 25},
])

with AstraeaClient() as client:
    node_ids = import_nodes_df(client, df, label_col="label")

    # Export nodes back to DataFrame
    result_df = export_nodes_df(client, node_ids)

    # Export BFS results with node details
    bfs_df = export_bfs_df(client, start=node_ids[0], max_depth=2)
```
Arrow Flight Client (Bulk Operations)
```python
from astraeadb import ArrowClient
import pyarrow as pa

arrow = ArrowClient(host="127.0.0.1", flight_port=7689)

# Query results as Arrow Table (zero-copy to Pandas)
table = arrow.query("MATCH (a:Person) RETURN a.name, a.age")
df = table.to_pandas()

# Bulk import
nodes_table = pa.table({
    "id": [1, 2],
    "labels": ["Person", "Person"],
    "properties": ['{"name":"Alice"}', '{"name":"Bob"}'],
})
arrow.bulk_insert_nodes(nodes_table)
```
Python API Reference
| Category | Method | Description |
|---|---|---|
| Health | ping() | Health check, returns server version |
| Node CRUD | create_node(labels, properties?, embedding?) | Create a node, returns node ID |
| | get_node(id) | Get node by ID |
| | update_node(id, properties) | Merge properties into a node |
| | delete_node(id) | Delete node and all connected edges |
| Edge CRUD | create_edge(source, target, type, props?, weight?, valid_from?, valid_to?) | Create edge with optional temporal validity |
| | get_edge(id) | Get edge by ID |
| | update_edge(id, properties) | Update edge properties (merge) |
| | delete_edge(id) | Delete an edge |
| Traversal | neighbors(id, direction?, edge_type?) | Get neighbors |
| | bfs(start, max_depth?) | Breadth-first traversal |
| | shortest_path(from, to, weighted?) | Shortest path (BFS or Dijkstra) |
| Temporal | neighbors_at(id, direction, timestamp, edge_type?) | Neighbors at point in time |
| | bfs_at(start, max_depth, timestamp) | BFS at point in time |
| | shortest_path_at(from, to, timestamp, weighted?) | Path at point in time |
| Vector/Semantic | vector_search(embedding, k?) | k-nearest-neighbor search |
| | hybrid_search(anchor, query_vector, max_hops?, k?, alpha?) | Blended graph + vector search |
| | semantic_neighbors(node, embedding, direction?, k?) | Rank neighbors by concept |
| | semantic_walk(start, embedding, max_hops?) | Greedy semantic walk |
| GraphRAG | extract_subgraph(center, hops?, max_nodes?, format?) | Extract + linearize subgraph |
| | graph_rag(question, anchor?, question_embedding?, hops?, max_nodes?, format?) | Full RAG pipeline with LLM |
| GQL | query(gql_string) | Execute a GQL query |
| Batch Ops | create_nodes(nodes_list) | Create multiple nodes |
| | create_edges(edges_list) | Create multiple edges |
| | delete_nodes(node_ids) | Delete multiple nodes |
| | delete_edges(edge_ids) | Delete multiple edges |
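The `alpha` parameter of `hybrid_search` controls the blend between vector similarity and graph proximity. The server computes this score internally; the exact formula is not part of the wire API, but a typical convex blend — shown below purely as an illustration, where the hop-decay choice and the names `vector_sim` and `graph_score` are assumptions, not AstraeaDB internals — behaves like this:

```python
def hybrid_score(alpha: float, vector_sim: float, hops: int, max_hops: int) -> float:
    """Illustrative convex blend of vector similarity and graph proximity.

    alpha=1.0 weighs pure vector similarity; alpha=0.0 weighs pure graph
    proximity (fewer hops from the anchor scores higher). This is a sketch
    of the idea, not AstraeaDB's internal scoring formula.
    """
    graph_score = 1.0 - hops / (max_hops + 1)  # 0 hops -> highest proximity
    return alpha * vector_sim + (1.0 - alpha) * graph_score

# alpha=0.5 (the example value used above) weighs both signals equally
print(hybrid_score(0.5, vector_sim=0.9, hops=1, max_hops=3))
```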
DataFrame Module (astraeadb.dataframe)
Requires pandas: `pip install pandas`
| Function | Description |
|---|---|
| import_nodes_df(client, df, label_col, embedding_cols?) | Import nodes from DataFrame |
| import_edges_df(client, df, source_col, target_col, type_col, ...) | Import edges from DataFrame |
| export_nodes_df(client, node_ids) | Export nodes to DataFrame |
| export_edges_df(client, edge_ids) | Export edges to DataFrame |
| export_bfs_df(client, start, max_depth?) | Export BFS results with node details |
| export_bfs_at_df(client, start, max_depth, timestamp) | Export temporal BFS to DataFrame |
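Conceptually, `import_nodes_df` maps each DataFrame row to one entry of the `create_nodes` batch payload: the label column becomes `labels`, and the remaining columns become `properties`. A standalone sketch of that mapping — the shipped helper may handle embeddings, NaNs, and multiple labels differently, and `df_to_node_payload` is a hypothetical name, not a library function:

```python
import pandas as pd

def df_to_node_payload(df: pd.DataFrame, label_col: str) -> list[dict]:
    """Turn DataFrame rows into {"labels", "properties"} dicts, the shape
    expected by create_nodes(). Illustration only; not the library code."""
    payload = []
    for row in df.to_dict(orient="records"):
        label = row.pop(label_col)          # label column -> node labels
        payload.append({"labels": [label], "properties": row})
    return payload

df = pd.DataFrame([
    {"label": "Person", "name": "Alice", "age": 30},
    {"label": "Person", "name": "Bob", "age": 25},
])
print(df_to_node_payload(df, "label"))
```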
R Client
The R client provides full feature parity with the Python client, supporting all AstraeaDB operations via JSON/TCP, with optional Apache Arrow Flight support for high-performance queries.
Prerequisites
```r
install.packages("jsonlite")  # Required
install.packages("arrow")     # Optional, for Arrow Flight
```
Client Classes
| Class | Transport | Description |
|---|---|---|
| AstraeaClient | JSON/TCP | Standard client, always available |
| ArrowClient | Arrow Flight | High-performance queries (requires arrow package) |
| UnifiedClient | Auto-select | Uses Arrow when available, falls back to JSON |
Basic Usage
```r
source("examples/r_client.R")

# Connect (with optional auth token)
client <- AstraeaClient$new(host = "127.0.0.1", port = 7687L, auth_token = "my-key")
client$connect()

# Create nodes with embeddings
id <- client$create_node(
  list("Person"),
  list(name = "Alice", age = 30),
  embedding = c(0.9, 0.1, 0.3)
)

# Create temporal edges
eid <- client$create_edge(
  id1, id2, "KNOWS",
  properties = list(since = 2024),
  weight = 0.9,
  valid_from = 1704067200000,  # Jan 1, 2024 (ms)
  valid_to = NULL              # Still active
)

client$close()
```
Full API Reference
| Category | Method | Description |
|---|---|---|
| Node CRUD | create_node(labels, properties, embedding=NULL) | Create node, returns ID |
| | get_node(node_id) | Get node by ID |
| | update_node(node_id, properties) | Update properties (merge) |
| | delete_node(node_id) | Delete node + edges |
| Edge CRUD | create_edge(src, tgt, type, props, weight, valid_from, valid_to) | Create temporal edge |
| | get_edge(edge_id) | Get edge by ID |
| | update_edge(edge_id, properties) | Update properties (merge) |
| | delete_edge(edge_id) | Delete edge |
| Traversal | neighbors(node_id, direction, edge_type) | Get neighbors |
| | bfs(start, max_depth) | Breadth-first search |
| | shortest_path(from, to, weighted) | Find shortest path |
| Temporal | neighbors_at(node_id, direction, timestamp, edge_type) | Neighbors at point in time |
| | bfs_at(start, max_depth, timestamp) | BFS at point in time |
| | shortest_path_at(from, to, timestamp, weighted) | Path at point in time |
| GQL | query(gql) | Execute GQL query |
| Vector/Semantic | vector_search(query_vector, k) | k-NN vector search |
| | hybrid_search(anchor, query_vector, max_hops, k, alpha) | Graph + vector combined |
| | semantic_neighbors(node_id, concept, direction, k) | Neighbors by similarity |
| | semantic_walk(start, concept, max_hops) | Greedy semantic traversal |
| GraphRAG | extract_subgraph(center, hops, max_nodes, format) | Extract + linearize |
| | graph_rag(question, anchor, embedding, hops, max_nodes, format) | Full RAG pipeline |
| Batch Ops | create_nodes(nodes_list) | Create multiple nodes |
| | create_edges(edges_list) | Create multiple edges |
| | delete_nodes(node_ids) | Delete multiple nodes |
| | delete_edges(edge_ids) | Delete multiple edges |
| Data Frame I/O | import_nodes_df(df, label_col, embedding_cols) | Import nodes from data.frame |
| | import_edges_df(df, source_col, target_col, ...) | Import edges from data.frame |
| | export_nodes_df(node_ids) | Export nodes to data.frame |
| | export_bfs_df(start, max_depth) | BFS results as data.frame |
| Utility | results_to_dataframe(results) | Convert results to data.frame |
| | nodes_to_dataframe(node_ids) | Fetch nodes as data.frame |
Vector Search Example
```r
# Find nodes similar to a "tech interest" vector
tech_vector <- c(1.0, 0.0, 0.0)
results <- client$vector_search(tech_vector, k = 5L)

for (r in results) {
  node <- client$get_node(r$node_id)
  cat(sprintf("  %s (similarity=%.3f)\n", node$properties$name, r$similarity))
}
```
Temporal Query Example
```r
# See who Alice knew in 2020 vs 2024
t_2020 <- 1577836800000  # Jan 1, 2020
t_2024 <- 1704067200000  # Jan 1, 2024

neighbors_2020 <- client$neighbors_at(alice, "outgoing", t_2020)
neighbors_2024 <- client$neighbors_at(alice, "outgoing", t_2024)

cat(sprintf("2020: %d connections\n", length(neighbors_2020)))
cat(sprintf("2024: %d connections\n", length(neighbors_2024)))
```
GraphRAG Example
```r
# Extract subgraph and get LLM answer
subgraph <- client$extract_subgraph(alice_id, hops = 2L, format = "structured")
cat(subgraph$text)  # Linearized graph context

# Full RAG pipeline (requires server LLM config)
result <- client$graph_rag(
  question = "Who does Alice work with?",
  anchor = alice_id,
  hops = 2L
)
cat(result$answer)
```
Batch Operations Example
```r
# Create multiple nodes at once
nodes <- list(
  list(labels = list("Person"), properties = list(name = "Alice"), embedding = c(0.9, 0.1)),
  list(labels = list("Person"), properties = list(name = "Bob"), embedding = c(0.1, 0.9)),
  list(labels = list("Person"), properties = list(name = "Charlie"))
)
node_ids <- client$create_nodes(nodes)

# Create multiple edges at once
edges <- list(
  list(source = node_ids[1], target = node_ids[2], edge_type = "KNOWS"),
  list(source = node_ids[2], target = node_ids[3], edge_type = "KNOWS", weight = 0.5)
)
edge_ids <- client$create_edges(edges)
```
Data Frame Import/Export
```r
# Import nodes from a data.frame
people_df <- data.frame(
  label = "Person",
  name = c("Alice", "Bob", "Charlie"),
  age = c(30, 25, 35)
)
node_ids <- client$import_nodes_df(people_df, label_col = "label")

# Export BFS results as a data.frame
bfs_df <- client$export_bfs_df(node_ids[1], max_depth = 2L)
print(bfs_df)
#   node_id depth labels    name age
# 1       1     0 Person   Alice  30
# 2       2     1 Person     Bob  25
# 3       3     2 Person Charlie  35
```
Arrow Flight (High-Performance)
```r
# Option 1: Use ArrowClient directly
arrow_client <- ArrowClient$new("grpc://localhost:7689")
arrow_client$connect()

result <- arrow_client$query("MATCH (p:Person) RETURN p.name, p.age")  # Returns Arrow Table
df <- arrow_client$query_df("MATCH (p:Person) RETURN p.name")          # Returns data.frame
arrow_client$close()

# Option 2: Use UnifiedClient (auto-selects best transport)
client <- UnifiedClient$new(host = "127.0.0.1", port = 7687L)
client$connect()
cat("Arrow enabled:", client$is_arrow_enabled(), "\n")
result <- client$query_df("MATCH (n) RETURN n")  # Uses Arrow if available
client$close()
```
Running the Demo
```shell
# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
Rscript examples/r_client.R
```
Go Client
A full-featured Go client library is provided in the go/astraeadb directory, published as github.com/AstraeaDB/AstraeaDB-Official. It supports three transport layers with idiomatic Go patterns including functional options, context.Context on every operation, and thread-safe connections.
Client Types
- `JSONClient` — JSON/TCP transport (port 7687). Zero external dependencies beyond the Go standard library. Supports all 22 server operations.
- `GRPCClient` — gRPC transport (port 7688) with Protocol Buffers. Supports 14 RPCs for type-safe, high-performance access.
- `Client` (unified) — Auto-selects gRPC when available, falls back to JSON/TCP. Arrow Flight is stubbed for future implementation.
Installation
```shell
go get github.com/AstraeaDB/AstraeaDB-Official
```
Quick Start
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/AstraeaDB/AstraeaDB-Official"
)

func main() {
	ctx := context.Background()

	// Unified client: auto-selects gRPC when available
	client := astraeadb.NewClient(
		astraeadb.WithAddress("127.0.0.1", 7687),
		astraeadb.WithAuthToken("my-api-key"),
	)
	if err := client.Connect(ctx); err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Create nodes
	alice, _ := client.CreateNode(ctx, []string{"Person"},
		map[string]any{"name": "Alice", "age": 30},
		[]float32{0.1, 0.2, 0.3},
	)
	bob, _ := client.CreateNode(ctx, []string{"Person"},
		map[string]any{"name": "Bob", "age": 25},
		nil,
	)

	// Create a temporal edge with options
	client.CreateEdge(ctx, alice, bob, "KNOWS",
		astraeadb.WithWeight(0.9),
		astraeadb.WithProperties(map[string]any{"since": 2020}),
		astraeadb.WithValidFrom(1609459200000),
	)

	// Traverse, search, and query
	neighbors, _ := client.Neighbors(ctx, alice, astraeadb.WithDirection("outgoing"))
	results, _ := client.VectorSearch(ctx, []float32{0.15, 0.25, 0.35}, 5)
	result, _ := client.Query(ctx, "MATCH (n:Person) RETURN n.name")
	rag, _ := client.GraphRAG(ctx, "Who does Alice know?", astraeadb.WithAnchor(alice))

	fmt.Println(neighbors, results, result, rag)
}
```
Configuration Options
The client uses the functional options pattern for configuration:
```go
// All available options
client := astraeadb.NewClient(
	astraeadb.WithAddress("db.example.com", 7687),
	astraeadb.WithGRPCPort(7688),
	astraeadb.WithFlightPort(7689),
	astraeadb.WithAuthToken("my-api-key"),
	astraeadb.WithTimeout(30*time.Second),
	astraeadb.WithDialTimeout(5*time.Second),
	astraeadb.WithTLS("ca.pem"),                              // Server TLS
	astraeadb.WithMTLS("client.pem", "client.key", "ca.pem"), // Mutual TLS
	astraeadb.WithMaxRetries(5),
	astraeadb.WithReconnect(true),
)
```
Error Handling
The Go client provides sentinel errors for programmatic error handling with errors.Is():
```go
import "errors"

_, err := client.GetNode(ctx, 999)
if errors.Is(err, astraeadb.ErrNodeNotFound) {
	// Handle missing node
}

// Available sentinel errors:
// ErrNotConnected, ErrNodeNotFound, ErrEdgeNotFound,
// ErrNoVectorIndex, ErrAccessDenied, ErrInvalidCreds, ErrAuthRequired
```
Batch Operations
```go
nodes := []astraeadb.NodeInput{
	{Labels: []string{"Person"}, Properties: map[string]any{"name": "Charlie"}},
	{Labels: []string{"Person"}, Properties: map[string]any{"name": "Diana"}},
}
ids, err := client.CreateNodes(ctx, nodes)

edges := []astraeadb.EdgeInput{
	{Source: ids[0], Target: ids[1], EdgeType: "KNOWS", Weight: 0.8},
}
edgeIDs, err := client.CreateEdges(ctx, edges)
```
API Reference
| Category | Method | Description |
|---|---|---|
| Health | Ping(ctx) | Health check, returns server version |
| Node CRUD | CreateNode(ctx, labels, properties, embedding) | Create a node, returns node ID |
| | GetNode(ctx, id) | Retrieve node by ID |
| | UpdateNode(ctx, id, props) | Merge properties into node |
| | DeleteNode(ctx, id) | Delete node and connected edges |
| Edge CRUD | CreateEdge(ctx, src, tgt, type, opts...) | Create edge with WithWeight, WithProperties, WithValidFrom, WithValidTo |
| | GetEdge(ctx, id) / UpdateEdge / DeleteEdge | Get, update, or delete an edge |
| Traversal | Neighbors(ctx, id, opts...) | Get neighbors with WithDirection, WithEdgeType |
| | BFS(ctx, start, maxDepth) | Breadth-first traversal |
| | ShortestPath(ctx, from, to, weighted) | Shortest path (BFS or Dijkstra) |
| Temporal | NeighborsAt(ctx, id, direction, timestamp) | Neighbors at point in time |
| | BFSAt(ctx, start, maxDepth, timestamp) | BFS at point in time |
| | ShortestPathAt(ctx, from, to, timestamp, weighted) | Path at point in time |
| Vector | VectorSearch(ctx, embedding, k) | k-nearest-neighbor search |
| | HybridSearch(ctx, anchor, embedding, opts...) | Blended graph + vector search |
| | SemanticNeighbors(ctx, id, concept, opts...) | Rank neighbors by concept similarity |
| | SemanticWalk(ctx, start, concept, maxHops) | Greedy semantic walk |
| GraphRAG | ExtractSubgraph(ctx, center, opts...) | Extract + linearize subgraph |
| | GraphRAG(ctx, question, opts...) | Full RAG pipeline with LLM |
| GQL | Query(ctx, gql) | Execute a GQL query |
| Batch | CreateNodes(ctx, nodes) / CreateEdges(ctx, edges) | Batch create |
| | DeleteNodes(ctx, ids) / DeleteEdges(ctx, ids) | Batch delete |
Transport Selection
The unified Client automatically selects the best available transport:
- gRPC-supported operations (CRUD, traversal, vector search, GQL query) → gRPC when available, JSON/TCP fallback
- Temporal queries (`NeighborsAt`, `BFSAt`, `ShortestPathAt`) → always JSON/TCP (not in gRPC proto)
- Semantic operations (`HybridSearch`, `SemanticNeighbors`, `SemanticWalk`) → always JSON/TCP
- GraphRAG (`ExtractSubgraph`, `GraphRAG`) → always JSON/TCP
```go
// Check transport availability at runtime
fmt.Println("gRPC available:", client.IsGRPCAvailable())
fmt.Println("Arrow available:", client.IsArrowAvailable())
```
Running the Tests
```shell
# From go/astraeadb/
go test -v -race ./...

# Run with the Makefile
make test
```
Java Client
A full-featured Java client is provided in the java/astraeadb directory as a Gradle multi-module project. It supports three transport layers with idiomatic Java patterns including records (Java 17+), the builder pattern, try-with-resources lifecycle, and thread-safe connections.
Client Types
- `JsonClient` — JSON/TCP transport (port 7687). Supports all 22 server operations with Jackson JSON serialization.
- `GrpcClient` — gRPC transport (port 7688) with Protocol Buffers. Supports 14 RPCs for type-safe, high-performance access.
- `FlightAstraeaClient` — Arrow Flight transport (port 7689). Supports queries and bulk import via zero-copy Arrow tables.
- `UnifiedClient` — Auto-selects the best transport per operation: gRPC for CRUD/traversal, Arrow Flight for queries/bulk, JSON/TCP for temporal/semantic/GraphRAG.
Gradle Dependency
```groovy
dependencies {
    implementation "com.astraeadb:astraeadb-unified:0.1.0" // All transports
    // Or individual: astraeadb-json, astraeadb-grpc, astraeadb-flight
}
```
Usage Example
```java
import com.astraeadb.unified.UnifiedClient;
import com.astraeadb.model.*;
import com.astraeadb.options.*;

try (var client = UnifiedClient.builder()
        .host("127.0.0.1")
        .authToken("my-api-key")
        .build()) {
    client.connect();

    // Create nodes with embeddings
    long alice = client.createNode(
        List.of("Person"),
        Map.of("name", "Alice", "age", 30),
        new float[]{0.1f, 0.2f, 0.3f});

    // Create a temporal edge
    client.createEdge(alice, bob, "KNOWS",
        EdgeOptions.builder()
            .weight(0.9)
            .validFrom(1609459200000L)
            .build());

    // Traverse, search, query
    List<NeighborEntry> neighbors = client.neighbors(alice,
        NeighborOptions.builder().direction("outgoing").build());
    List<SearchResult> results = client.vectorSearch(
        new float[]{0.15f, 0.25f, 0.35f}, 5);
    QueryResult result = client.query(
        "MATCH (n:Person) RETURN n.name");
    RagResult rag = client.graphRag("Who does Alice know?",
        RagOptions.builder().anchor(alice).hops(2).build());
}
```
Exception Handling
The Java client uses a checked exception hierarchy rooted at AstraeaException:
```java
try {
    Node node = client.getNode(999);
} catch (NodeNotFoundException e) {
    // Specific exception for not-found
} catch (AccessDeniedException e) {
    // Permission error
} catch (AstraeaException e) {
    // Base exception for all errors
}
```
Java API Reference
| Category | Method | Description |
|---|---|---|
| Health | ping() | Health check, returns version |
| Node CRUD | createNode(labels, props, embedding) | Create node, returns ID |
| | getNode(id) / updateNode(id, props) / deleteNode(id) | Read/update/delete |
| Edge CRUD | createEdge(src, tgt, type, options) | Create edge with EdgeOptions |
| | getEdge(id) / updateEdge(id, props) / deleteEdge(id) | Read/update/delete |
| Traversal | neighbors(id, options) | Get neighbors with NeighborOptions |
| | bfs(start, maxDepth) | Breadth-first traversal |
| | shortestPath(from, to, weighted) | Shortest path |
| Temporal | neighborsAt(id, dir, timestamp) | Neighbors at time T |
| | bfsAt(start, depth, timestamp) | BFS at time T |
| | shortestPathAt(from, to, ts, weighted) | Path at time T |
| Vector | vectorSearch(embedding, k) | k-NN search |
| | hybridSearch(anchor, embedding, options) | Graph + vector |
| | semanticNeighbors(id, concept, options) | Neighbors by similarity |
| | semanticWalk(start, concept, maxHops) | Semantic walk |
| GraphRAG | extractSubgraph(center, options) | Extract + linearize |
| | graphRag(question, options) | Full RAG pipeline |
| GQL | query(gql) | Execute a GQL query |
| Batch | createNodes(nodes) / createEdges(edges) | Batch create |
| | deleteNodes(ids) / deleteEdges(ids) | Batch delete |
Cybersecurity Demo
This example demonstrates how AstraeaDB enables security analysts to investigate network alerts by tracing connections through a graph.
The Problem
When a firewall alerts on suspicious traffic from 10.0.1.50, the analyst must manually search DHCP logs, asset management records, and other sources to trace the IP to a user. With AstraeaDB, these datasets are loaded as a graph and the investigation becomes a series of traversals.
Graph Model
```
User <--[ASSIGNED_TO]-- Laptop <--[DHCP_LEASE]-- IPAddress
                                                 |       |
                                         [TRAFFIC]       [TRIGGERED]
                                                 |       |
                                         IPAddress       FirewallAlert --[TARGETS]--> ExternalHost
```
The Scenario
Three employees — Alice (Engineering), Bob (Finance), and Eve (Marketing) — each have laptops with DHCP-assigned IPs. Eve's attack chain:
1. Downloads a password cracker from `darktools.example.com` (port 443)
2. Firewall logs the connection (alert `FW-2025-0042`, severity: critical)
3. Attempts RDP to Bob's machine at `10.0.1.20:3389` — blocked
4. Attempts SSH to Alice's machine at `10.0.1.10:22` — blocked
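Conceptually, each step of the investigation that follows is a single typed-edge lookup. The chain can be sketched with a plain adjacency map, no database required (the node names and edges below are illustrative, mirroring the scenario data):

```python
# Typed adjacency map mirroring the demo graph (illustrative data only)
edges = {
    ("FW-2025-0042", "TRIGGERED", "incoming"): ["10.0.1.50"],
    ("10.0.1.50", "DHCP_LEASE", "outgoing"): ["EVE-LAT01"],
    ("EVE-LAT01", "ASSIGNED_TO", "outgoing"): ["Eve"],
}

def neighbors(node: str, edge_type: str, direction: str) -> list[str]:
    """Look up typed neighbors, the same primitive the client exposes."""
    return edges.get((node, edge_type, direction), [])

# Alert -> source IP -> laptop -> user: the traversals the client performs
ip = neighbors("FW-2025-0042", "TRIGGERED", "incoming")[0]
laptop = neighbors(ip, "DHCP_LEASE", "outgoing")[0]
user = neighbors(laptop, "ASSIGNED_TO", "outgoing")[0]
print(user)  # Eve
```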
Investigation with AstraeaDB
```python
# Step 1: Who triggered alert FW-2025-0042?
sources = client.neighbors(alert_id, "incoming", edge_type="TRIGGERED")
# → Source IP: 10.0.1.50

# Step 2: Trace IP → Laptop via DHCP lease
leases = client.neighbors(source_ip_id, "outgoing", edge_type="DHCP_LEASE")
# → Laptop: EVE-LAT01

# Step 3: Trace Laptop → User
users = client.neighbors(laptop_id, "outgoing", edge_type="ASSIGNED_TO")
# → User: Eve (Marketing, Analyst)

# Step 4: What else has Eve's IP been doing?
traffic = client.neighbors(source_ip_id, "outgoing", edge_type="TRAFFIC")
# → darktools.example.com:443, 10.0.1.20:3389 (RDP), 10.0.1.10:22 (SSH)

# Step 5: BFS blast radius
blast_radius = client.bfs(source_ip_id, max_depth=2)
```
Running the Demo
```shell
# Terminal 1
cargo run -p astraea-cli -- serve

# Terminal 2
python3 examples/cybersecurity_demo.py
```
13 Rust tests cover this scenario in the astraea-graph crate:
```shell
cargo test --package astraea-graph cybersecurity
```
API Reference
Complete JSON request/response format for all request types. All requests are newline-delimited JSON sent over TCP (port 7687).
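Because the protocol is plain newline-delimited JSON, any language's standard library can speak it. A minimal sketch of framing a request line and parsing a response using Python's `json` module — the socket I/O itself is omitted, and the assumption that errors arrive under an `"error"` key is mine, not taken from the spec below:

```python
import json

def frame_request(req: dict) -> bytes:
    """Serialize a request as one newline-terminated JSON line."""
    return (json.dumps(req, separators=(",", ":")) + "\n").encode("utf-8")

def parse_response(line: bytes) -> dict:
    """Unwrap a {"status": "ok", "data": ...} response envelope."""
    resp = json.loads(line)
    if resp.get("status") != "ok":
        # Error payload key is an assumption for this sketch
        raise RuntimeError(resp.get("error", "unknown error"))
    return resp["data"]

wire = frame_request({"type": "Ping"})
print(wire)  # b'{"type":"Ping"}\n'

data = parse_response(b'{"status":"ok","data":{"pong":true,"version":"0.1.0"}}')
print(data["version"])  # 0.1.0
```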
Node Operations
CreateNode
```
// Request
{"type":"CreateNode","labels":["Person"],"properties":{"name":"Alice","age":30},"embedding":[0.1,0.2]}

// Response
{"status":"ok","data":{"node_id":1}}
```
GetNode
```
{"type":"GetNode","id":1}
// Response includes id, labels, properties, embedding
```
UpdateNode
```
{"type":"UpdateNode","id":1,"properties":{"title":"Engineer"}}
```
DeleteNode
```
{"type":"DeleteNode","id":1}
// Cascades: all connected edges are also deleted
```
Edge Operations
CreateEdge
```
{"type":"CreateEdge","source":1,"target":2,"edge_type":"KNOWS",
 "properties":{"since":2020},"weight":0.9,
 "valid_from":1704067200000,"valid_to":null}
```
GetEdge / UpdateEdge / DeleteEdge
```
{"type":"GetEdge","id":1}
{"type":"UpdateEdge","id":1,"properties":{"note":"updated"}}
{"type":"DeleteEdge","id":1}
```
Traversal Operations
Neighbors
```
{"type":"Neighbors","id":1,"direction":"outgoing","edge_type":"KNOWS"}
```
Bfs
```
{"type":"Bfs","start":1,"max_depth":3}
```
ShortestPath
```
{"type":"ShortestPath","from":1,"to":5,"weighted":false}
```
Vector & Semantic Operations
VectorSearch
```
{"type":"VectorSearch","query":[0.1,0.2,0.3],"k":10}
```
HybridSearch
```
{"type":"HybridSearch","anchor":1,"query":[0.1,0.2],"max_hops":3,"k":10,"alpha":0.5}
```
SemanticNeighbors
```
{"type":"SemanticNeighbors","id":1,"concept":[0.1,0.2],"direction":"outgoing","k":5}
```
SemanticWalk
```
{"type":"SemanticWalk","start":1,"concept":[0.1,0.2],"max_hops":4}
```
Temporal Operations
NeighborsAt
```
{"type":"NeighborsAt","id":1,"direction":"outgoing","timestamp":1736929800000}
```
BfsAt
```
{"type":"BfsAt","start":1,"max_depth":3,"timestamp":1736929800000}
```
ShortestPathAt
```
{"type":"ShortestPathAt","from":1,"to":5,"timestamp":1736929800000,"weighted":false}
```
RAG Operations
ExtractSubgraph
```
{"type":"ExtractSubgraph","center":1,"hops":2,"max_nodes":50,"format":"structured"}
```
GraphRag
```
{"type":"GraphRag","question":"Who compromised the server?",
 "anchor":1,"hops":2,"max_nodes":50,"format":"structured"}
```
Query & Health
Query
```
{"type":"Query","gql":"MATCH (a:Person)-[:KNOWS]->(b) RETURN a.name, b.name"}
```
Ping
```
{"type":"Ping"}
{"status":"ok","data":{"pong":true,"version":"0.1.0"}}
```
Glossary
Quick reference for technical terms used throughout this documentation.
| Term | Definition |
|---|---|
| ANN | Approximate Nearest Neighbor — A search algorithm that finds vectors close to a query vector, trading perfect accuracy for speed. Returns results that are "good enough" rather than guaranteed optimal. |
| Arrow Flight | A high-performance protocol for streaming columnar data using Apache Arrow's in-memory format. Enables zero-copy data transfer between database and client. |
| BFS | Breadth-First Search — A graph traversal algorithm that explores all neighbors at the current depth before moving deeper. Visits nodes level-by-level. |
| CSR | Compressed Sparse Row — A memory-efficient matrix format that stores only non-zero values. Used to represent sparse graphs as adjacency matrices for fast GPU operations. |
| DFS | Depth-First Search — A graph traversal algorithm that explores as far as possible along each branch before backtracking. Follows one path to its end before trying alternatives. |
| Embedding | A fixed-size numeric vector (array of floats) that captures the semantic meaning of data. Similar concepts have similar embeddings, enabling similarity search. |
| FHE | Fully Homomorphic Encryption — Encryption that allows computation on encrypted data without decrypting it first. The server never sees plaintext. |
| GNN | Graph Neural Network — A neural network designed for graph-structured data. Learns by passing messages between connected nodes to capture both features and structure. |
| GQL | Graph Query Language — The ISO standard (2024) for querying graph databases. Combines the best features of Cypher and SQL with pattern-matching syntax. |
| GraphRAG | Graph-enhanced Retrieval-Augmented Generation — A technique that extracts relevant subgraphs, converts them to text, and feeds them to an LLM to answer questions with graph context. |
| gRPC | Google Remote Procedure Call — A high-performance RPC framework using Protocol Buffers for serialization. More efficient than JSON for structured data. |
| HNSW | Hierarchical Navigable Small World — A graph-based algorithm for approximate nearest neighbor search. Builds a multi-layer navigation structure for fast similarity queries with O(log n) complexity. |
| io_uring | A Linux kernel interface for high-performance asynchronous I/O. Uses shared ring buffers between user space and kernel to minimize syscall overhead. |
| LLM | Large Language Model — An AI model trained on vast text data that can understand and generate human language (e.g., GPT-4, Claude). |
| LSN | Log Sequence Number — A monotonically increasing identifier for each entry in the write-ahead log. Used for recovery and replication. |
| mTLS | Mutual TLS — Two-way TLS authentication where both client and server present certificates. Provides strong identity verification for both parties. |
| MVCC | Multi-Version Concurrency Control — A database technique that maintains multiple versions of data to allow concurrent reads and writes without blocking. Each transaction sees a consistent snapshot. |
| NVMe | Non-Volatile Memory Express — A high-speed storage interface protocol designed for SSDs. Provides much lower latency than SATA or SAS. |
| Parquet | A columnar file format optimized for analytics. Stores data by column rather than row, enabling efficient compression and fast analytical queries. |
| Pointer Swizzling | A technique that converts disk-based identifiers (64-bit IDs) into direct memory pointers when data is loaded into RAM, enabling nanosecond-level access. |
| RBAC | Role-Based Access Control — A security model where permissions are assigned to roles (Admin, Writer, Reader) rather than individual users. Users are granted roles. |
| SCC | Strongly Connected Components — Maximal subgraphs where every node can reach every other node following directed edges. Used to find tightly-knit groups. |
| SEAL | Microsoft Simple Encrypted Arithmetic Library — An open-source library for homomorphic encryption that enables computation on encrypted data. |
| SSSP | Single-Source Shortest Path — An algorithm that computes the shortest distance from one source node to all other nodes in the graph (e.g., Dijkstra's algorithm). |
| TLS | Transport Layer Security — A cryptographic protocol that provides secure communication over networks. Successor to SSL. |
| WAL | Write-Ahead Log — A durability mechanism that logs all mutations to disk before applying them. Enables crash recovery by replaying the log. |
AstraeaDB — Cloud-Native, AI-First Graph Database — MIT License
441 Rust tests • 23 Python tests • 113 Java tests • 14 crates • Edition 2024