AstraeaDB

A cloud-native, AI-first graph database written in Rust that unifies property graphs, vector search, and graph neural networks in a single system.

14 Rust Crates
441+ Tests
3 Transport Protocols
5 Client Libraries

Best-in-Class Features, Unified

AstraeaDB synthesizes the strengths of leading graph databases into a single, cohesive system built from the ground up in Rust.

🧠

Vector-Property Graph

Nodes carry labels, JSON properties, and float32 embedding vectors. The navigation links of the HNSW vector index map onto graph edges, enabling semantic traversal: find the neighbors most similar to a concept, not just those that are structurally connected.

Native Graph Storage

Index-free adjacency via pointer swizzling. References into hot pages are promoted from disk IDs to direct memory pointers for nanosecond-level traversal: O(k) neighbor lookups, where k is the number of neighbors, instead of O(log N) index scans.
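The swizzling idea can be sketched in a few lines. This is a toy illustration, not AstraeaDB's storage engine: `PageStore`, `Node`, and `neighbors` are hypothetical names, and a Python dict stands in for disk pages. Adjacency entries start as integer disk IDs and are replaced by direct object references the first time they are followed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    node_id: int
    # Each entry is either an int (unswizzled disk ID) or a Node (swizzled reference).
    adjacency: list = field(default_factory=list)


class PageStore:
    """Toy stand-in for disk: maps node IDs to raw adjacency lists."""

    def __init__(self, raw):
        self.raw = raw        # {node_id: [neighbor_id, ...]}
        self.resident = {}    # nodes currently held in memory
        self.loads = 0        # number of simulated disk reads

    def load(self, node_id):
        node = self.resident.get(node_id)
        if node is None:
            self.loads += 1
            node = Node(node_id, list(self.raw[node_id]))
            self.resident[node_id] = node
        return node


def neighbors(store, node):
    """Return neighbor nodes, swizzling disk IDs into references on first use."""
    for i, ref in enumerate(node.adjacency):
        if isinstance(ref, int):
            node.adjacency[i] = store.load(ref)
    return node.adjacency
```

After the first call to `neighbors`, later traversals of the same node follow in-memory references directly and never consult the ID lookup again.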

🔍

Hybrid & Semantic Search

Blend graph proximity with vector similarity using a configurable alpha. Semantic walks greedily traverse the graph toward a concept embedding, combining structural and semantic intelligence.
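A minimal sketch of both ideas, under stated assumptions: the scoring formula below (alpha weights cosine similarity, 1 - alpha weights an inverse hop count) is one plausible way to blend the two signals, not necessarily the formula AstraeaDB uses. The function signatures and the dict-based graph are hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(adj, emb, start, query, max_hops=3, k=10, alpha=0.5):
    """Rank nodes within max_hops of start by alpha*similarity + (1-alpha)*proximity."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue
        for v in adj.get(u, []):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    scored = []
    for node, h in hops.items():
        if node == start:
            continue
        proximity = 1.0 / h  # assumption: closer nodes score higher
        similarity = cosine(emb[node], query)
        scored.append((alpha * similarity + (1 - alpha) * proximity, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:k]]


def semantic_walk(adj, emb, start, concept, max_steps=10):
    """Greedy walk: repeatedly step to the unvisited neighbor most similar to concept."""
    path, visited, current = [start], {start}, start
    for _ in range(max_steps):
        candidates = [v for v in adj.get(current, []) if v not in visited]
        if not candidates:
            break
        current = max(candidates, key=lambda v: cosine(emb[v], concept))
        visited.add(current)
        path.append(current)
    return path
```

With alpha = 1.0 the ranking is purely semantic, and with alpha = 0.0 it is purely structural.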

💬

GraphRAG Engine

Built-in Retrieval-Augmented Generation: vector search finds the anchor, BFS extracts a subgraph, linearization converts it to text, and the result is fed to an LLM—all in one atomic operation.
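The pipeline stages can be illustrated as follows. This is a sketch only: the helper names are hypothetical, the anchor lookup is a brute-force dot product standing in for an HNSW query, the linearization format is an assumption, and the final LLM call is left out.

```python
from collections import deque


def nearest_node(emb, query):
    """Pick the anchor by brute-force dot product (stand-in for an HNSW lookup)."""
    return max(emb, key=lambda n: sum(x * y for x, y in zip(emb[n], query)))


def bfs_subgraph(edges, anchor, max_hops):
    """Collect the edges reachable from anchor within max_hops, in BFS order."""
    depth = {anchor: 0}
    queue, kept = deque([anchor]), []
    while queue:
        u = queue.popleft()
        if depth[u] == max_hops:
            continue
        for src, rel, dst in edges:
            if src == u:
                kept.append((src, rel, dst))
                if dst not in depth:
                    depth[dst] = depth[u] + 1
                    queue.append(dst)
    return kept


def linearize(subgraph):
    """Turn (source, relation, target) triples into one line of text each."""
    return "\n".join(f"{s} -[{r}]-> {d}" for s, r, d in subgraph)


def build_prompt(question, edges, emb, query_vec, max_hops=2):
    """Vector search -> BFS subgraph -> linearized context -> prompt for an LLM."""
    anchor = nearest_node(emb, query_vec)
    context = linearize(bfs_subgraph(edges, anchor, max_hops))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be handed to the language model in the final step.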

🧬

GNN Training

Differentiable tensors and message passing layers built in. Run a training loop for node classification directly inside the database—forward pass, loss computation, and backpropagation through edge weights.
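To make the training loop concrete, here is a deliberately tiny sketch in plain Python, assuming scalar node features, one learnable weight per edge, a sigmoid output, and binary cross-entropy loss. None of these names or choices come from AstraeaDB's GNN crate; they only illustrate forward pass, loss, and gradient descent on edge weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(edges, weights, x):
    """One message-passing step: each node sums weighted features from its in-neighbors."""
    z = {n: 0.0 for n in x}
    for (src, dst), w in zip(edges, weights):
        z[dst] += w * x[src]
    return {n: sigmoid(v) for n, v in z.items()}


def bce_loss(pred, labels):
    """Mean binary cross-entropy over the labeled nodes."""
    total = 0.0
    for n, y in labels.items():
        total -= y * math.log(pred[n]) + (1 - y) * math.log(1 - pred[n])
    return total / len(labels)


def train(edges, x, labels, lr=0.5, epochs=200):
    """Gradient descent on the edge weights."""
    weights = [0.0] * len(edges)
    for _ in range(epochs):
        pred = forward(edges, weights, x)
        # For sigmoid + cross-entropy, d(loss)/d(w_e) = (pred[dst] - label[dst]) * x[src] / N.
        grads = [
            (pred[dst] - labels[dst]) * x[src] / len(labels) if dst in labels else 0.0
            for src, dst in edges
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights
```

A real GNN would use vector features, shared weight matrices, and automatic differentiation; the closed-form gradient here only works because the model has a single layer.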

🕒

Temporal Graphs

Edges carry validity intervals. Query the graph as it existed at any point in time: neighbors_at(), bfs_at(), and shortest_path_at() with full temporal filtering.
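A small sketch of temporal filtering, assuming each edge is a tuple `(src, dst, valid_from, valid_to)` with a half-open interval and `None` meaning "still valid". The function names echo the ones above, but the signatures and the edge representation here are hypothetical.

```python
from collections import deque


def is_valid(edge, t):
    """True if the edge exists at time t (half-open interval [valid_from, valid_to))."""
    _, _, valid_from, valid_to = edge
    return valid_from <= t and (valid_to is None or t < valid_to)


def neighbors_at(edges, node, t):
    """Out-neighbors of node, considering only edges valid at time t."""
    return [e[1] for e in edges if e[0] == node and is_valid(e, t)]


def bfs_at(edges, start, t):
    """Breadth-first order over the graph as it existed at time t."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in neighbors_at(edges, u, t):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order
```

The same validity check can be applied inside any traversal, which is how a time-aware shortest path would reuse it.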

📝

GQL Query Language

Hand-written recursive-descent parser for ISO GQL / Cypher. Full execution pipeline: MATCH with pattern matching, WHERE filtering, CREATE, DELETE, ORDER BY, LIMIT, aggregation functions, and more.

🔒

Security & Encryption

RBAC authentication, mutual TLS, and a homomorphic encryption engine that allows server-side label matching on encrypted data—the server never sees unencrypted node labels.

🚀

Zero-Copy Data Exchange

Apache Arrow Flight server for zero-copy data transfer. Stream GQL results directly into Pandas or Polars DataFrames without serialization overhead. JSON-TCP and gRPC transports also available.

Tiered Storage Architecture

A three-tier "hydrated" architecture that addresses the cloud-native memory wall problem, written entirely in Rust for memory safety and no garbage-collection pauses.

Tier 3 — Hot (RAM)

Pointer swizzling promotes active subgraphs into RAM. 64-bit disk IDs are converted to direct memory pointers for nanosecond-level traversal. HNSW index lives here.

Tier 2 — Warm (NVMe SSD)

LRU buffer pool caches 8 KiB pages with pin/unpin semantics. Pluggable I/O backends: memmap2 (cross-platform) and io_uring (Linux async I/O).
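The pin/unpin contract can be sketched as follows. This is an illustrative model only: `BufferPool` and its `read_page` callback are hypothetical, and a real pool would also track dirty pages and write them back before eviction.

```python
from collections import OrderedDict

PAGE_SIZE = 8 * 1024  # 8 KiB pages, as described above


class BufferPool:
    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # callable: page_id -> bytes (stand-in for disk I/O)
        self.frames = OrderedDict()  # page_id -> [data, pin_count], least recent first

    def pin(self, page_id):
        """Return the page, loading it if needed, and protect it from eviction."""
        frame = self.frames.get(page_id)
        if frame is None:
            if len(self.frames) >= self.capacity:
                self._evict()
            frame = [self.read_page(page_id), 0]
            self.frames[page_id] = frame
        frame[1] += 1
        self.frames.move_to_end(page_id)  # mark as most recently used
        return frame[0]

    def unpin(self, page_id):
        """Release one pin; the page becomes evictable when its count reaches zero."""
        frame = self.frames[page_id]
        if frame[1] == 0:
            raise ValueError("page is not pinned")
        frame[1] -= 1

    def _evict(self):
        """Drop the least recently used page that nobody has pinned."""
        for page_id, (_, pins) in self.frames.items():
            if pins == 0:
                del self.frames[page_id]
                return
        raise RuntimeError("all pages are pinned")
```

Callers pair every `pin` with an `unpin`, so a page that is in active use is never evicted from under its reader.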

Tier 1 — Cold (Object Storage)

Data persists in JSON, Apache Parquet, or cloud object stores (S3, GCS, Azure). Open formats for interoperability and long-term archival.

System Architecture

                    +----------------------------------+
                    |           astraea-cli            |
                    |  serve | shell | import | export |
                    +--------------+-------------------+
                                   |
          +------------------------+------------------------+
          |                        |                        |
+---------v-----------+  +---------v-----------+  +---------v-----------+
|  astraea-server     |  |  astraea-flight     |  |  Python / R / Go /  |
|  JSON-TCP (7687)    |  |  Arrow Flight       |  |  Java - Client Libs |
|  gRPC (7688)        |  |  do_get / do_put    |  |  JSON + gRPC        |
|  Auth, Metrics      |  |                     |  |  + Arrow Flight     |
+--------+------------+  +--------+------------+  +---------------------+
         |                        |
         +-----------+------------+
                     |
    +----------------+------------------+----------------+
    |                |                  |                |
+---v--------+  +----v-----------+ +----v--------+  +----v--------------+
| astraea-   |  | astraea-       | | astraea-    |  | astraea-          |
| rag        |  | query          | | gnn         |  | algorithms        |
| Subgraph   |  | GQL Parser     | | Tensor,     |  | PageRank, Louvain |
| Linearize  |  | + Executor     | | MsgPassing  |  | Centrality,       |
| LLM, RAG   |  |                | | Training    |  | Components        |
+---+--------+  +----+-----------+ +----+--------+  +----+--------------+
    |                |                  |                |
    +----------------+---------+--------+----------------+
                               |
                  +------------+------------+
                  |                         |
          +-------v----------+  +-----------v--------+
          | astraea-graph    |  | astraea-vector     |
          | CRUD, BFS, DFS   |  | HNSW Index         |
          | Hybrid Search    |  | ANN Search         |
          | Semantic Walk    |  | Persistence        |
          | Temporal Queries |  |                    |
          +--------+---------+  +--------------------+
                   |
          +--------v-----------------------------+
          |           astraea-storage            |
          | Pages | Buffer Pool | Pointer Swizzle|
          |   MVCC, WAL, PageIO, Cold Storage    |
          +--------+-----------------------------+
                   |
     +-------------+-------------+
     |             |             |
+----v-------+ +---v--------+ +--v--------------+
| astraea-   | | astraea-   | | astraea-        |
| crypto     | | gpu        | | cluster         |
| Encrypted  | | CSR Matrix | | Partitioning    |
| Labels,    | | CPU/GPU    | | Sharding        |
| FHE Engine | | Backends   | | Coordination    |
+------------+ +------------+ +-----------------+

Quick Start

Get up and running in minutes. Build from source, start the server, and connect with the interactive shell or your language of choice.

# Build the entire workspace
cargo build --workspace

# Run all 441+ tests
cargo test --workspace

# Start the server (JSON-TCP :7687, gRPC :7688, Arrow Flight :7689)
cargo run -p astraea-cli -- serve

# Connect with the interactive shell
cargo run -p astraea-cli -- shell

# GQL queries in the shell
astraea> CREATE (a:Person {name: "Alice", age: 30})
Nodes created: 1

astraea> MATCH (a:Person) WHERE a.age > 25 RETURN a.name, a.age ORDER BY a.age DESC
+-------+------+
| a.name| a.age|
+-------+------+
| Alice | 30   |
+-------+------+

# Python client (zero dependencies for JSON, optional pyarrow for Arrow)
# Install first, from the repository root (shell): pip install ./python
from astraeadb import AstraeaClient

# Example query embedding, 128 dimensions to match the node embedding below
query_vec = [0.1] * 128

with AstraeaClient() as client:
    alice = client.create_node(["Person"], {"name": "Alice"}, embedding=[0.1] * 128)
    results = client.hybrid_search(alice, query_vec, max_hops=3, k=10, alpha=0.5)
    answer = client.graph_rag("Who does Alice know?", anchor=alice)

// Go client (auto-selects gRPC when available, falls back to JSON/TCP)
import (
    "context"

    astraeadb "github.com/AstraeaDB/AstraeaDB-Official"
)

ctx := context.Background()

client := astraeadb.NewClient(
    astraeadb.WithAddress("127.0.0.1", 7687),
)
client.Connect(ctx)
defer client.Close()

alice, _ := client.CreateNode(ctx, []string{"Person"},
    map[string]any{"name": "Alice"}, nil)
result, _ := client.Query(ctx, "MATCH (n:Person) RETURN n.name")

// Java client (JSON/TCP + gRPC + Arrow Flight, auto-selects best transport)
import java.util.List;
import java.util.Map;

import com.astraeadb.unified.UnifiedClient;

try (var client = UnifiedClient.builder()
        .host("127.0.0.1").build()) {
    client.connect();
    long alice = client.createNode(
        List.of("Person"), Map.of("name", "Alice"), null);
    QueryResult result = client.query(
        "MATCH (n:Person) RETURN n.name");
}

How AstraeaDB Compares

Synthesizing the best features from across the graph database ecosystem into one unified system.

Capability                    | Current Leader   | AstraeaDB
Native Graph Storage          | Neo4j            | Index-free adjacency with pointer swizzling
Massively Parallel Processing | TigerGraph       | Hash/range partitioning, shard coordination
Multi-Model Flexibility       | ArangoDB         | Vector-Property Graph (JSON + embeddings)
In-Memory Speed               | Memgraph         | Pointer swizzling + HNSW in hot tier
Vector / AI Integration       | Weaviate / Neo4j | Built-in HNSW, GNN training, GraphRAG
Query Standard                | ISO GQL (2024)   | GQL / Cypher parser + full executor
Privacy / Encryption          |                  | Homomorphic encryption for encrypted label matching

Built-in Graph Algorithms

Production-ready implementations of essential graph analytics, with optional GPU acceleration.

PageRank

Power iteration with dangling node handling for node importance ranking.
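A compact sketch of power iteration with dangling-node handling, written in plain Python for illustration rather than taken from the astraea-algorithms crate. It assumes every node appears as a key of `adj`, and the parameter names and defaults are assumptions.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """adj maps every node to its list of out-neighbors (empty list = dangling node)."""
    nodes = list(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed uniformly over all nodes.
        dangling = sum(rank[u] for u in nodes if not adj[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u in nodes:
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Without the dangling term, rank would leak out of nodes that have no outgoing edges and the scores would no longer sum to 1.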

Community Detection

Louvain algorithm for discovering densely connected clusters.

Centrality

Degree and betweenness centrality (Brandes' algorithm) for identifying key nodes.
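For reference, Brandes' algorithm on an unweighted graph fits in one short function. This sketch is illustrative and assumes every node is a key of `adj`; it returns unnormalized scores that count ordered source/target pairs, so for an undirected graph stored with both edge directions each value is twice the usual figure.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm (unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

Degree centrality needs no traversal at all: it is simply `len(adj[v])` for each node.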

Components

Connected and strongly-connected components via Tarjan's algorithm.
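Tarjan's algorithm for strongly connected components can be sketched as below. This is a plain recursive illustration, not the crate's implementation; very deep graphs would need an iterative version to avoid Python's recursion limit.

```python
def strongly_connected_components(adj):
    """Return the SCCs of a directed graph given as {node: [out-neighbors]}."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in adj.get(u, []):
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        # u is the root of a component if nothing below it reaches further back.
        if low[u] == index[u]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == u:
                    break
            components.append(component)

    for u in adj:
        if u not in index:
            visit(u)
    return components
```

Each node and edge is processed once, so the running time is linear in the size of the graph.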

Client Libraries

Connect from your language of choice over three transport protocols.

Python

JsonClient (zero deps), ArrowClient (pyarrow), and AstraeaClient (unified). DataFrame integration with Pandas and Polars. 23 tests.

R

AstraeaClient, ArrowClient, and UnifiedClient. Full feature parity with the Python client, including data.frame import/export.

Go

JSONClient (zero deps), GRPCClient (protobuf), and unified Client that auto-selects gRPC when available. Functional options, context.Context support, and batch operations. 30 tests.

Java

JsonClient (all 22 ops), GrpcClient (protobuf, 14 RPCs), FlightAstraeaClient (Arrow), and UnifiedClient (auto-transport). Java 17+ records, builder pattern, try-with-resources. 113 tests.

Rust (Embedded)

Use AstraeaDB as a library with no network overhead. Direct access to Graph, StorageEngine, and VectorIndex traits.

Ready to Get Started?

Explore the full documentation, browse the source code, or jump straight into building.

Gentle Introduction
Read the Wiki
View on GitHub
GNN Tutorial
GNN: Bitcoin AML Vignette
GraphRAG Tutorial