A Gentle Introduction to AstraeaDB

A progressive learning path that takes you from "What is a graph?" all the way to running Graph Neural Networks, building GraphRAG pipelines, and securing production deployments, one chapter at a time.

How to use this guide

Each chapter builds on the previous one. If you are new to graph databases, start with Part I. If you already have experience, feel free to jump directly to the topic you need. Every chapter includes working code examples that you can run against a live AstraeaDB instance.

Part I: Foundations

Understand why graph databases exist, how they differ from relational tables, and where AstraeaDB fits in the broader landscape.

Chapter 1: Why Graphs?

Discover the limits of relational tables when modeling connected data. Learn the fundamentals of nodes, edges, and properties, and see why graph traversals outperform SQL JOINs for relationship-heavy queries.

Chapter 2: The Graph Database Landscape

Survey the major graph data models (Property Graph vs. RDF), query languages (Cypher, Gremlin, SPARQL, GQL), and see how AstraeaDB's Vector-Property Graph and Rust foundation set it apart from Neo4j, TigerGraph, and others.

Part II: Getting Started

Get AstraeaDB running on your machine and build your first graph in minutes.

Chapter 3: Installation and Setup

Clone the repository, build from source with Cargo, start the server, and connect using the interactive shell. Covers all four transport protocols: JSON-TCP, gRPC, Arrow Flight, and MCP for LLM integration.

Chapter 4: Your First Graph

Create nodes and edges, attach properties and labels, query with MATCH/RETURN, and visualize results. A hands-on walkthrough using a small social network as a running example.

Part III: Intermediate

Master the query language, explore traversal algorithms, and understand the network protocols that power AstraeaDB clients.

Chapter 5: The GQL Query Language

Deep dive into AstraeaDB's GQL implementation: pattern matching with MATCH, filtering with WHERE, creating and deleting data, aggregation functions, ORDER BY, LIMIT, and more.

Chapter 6: Graph Traversals

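Understand BFS and DFS traversals, shortest-path algorithms, and neighbor expansion, and see how index-free adjacency keeps the cost of each hop proportional to the local neighborhood rather than the size of the whole graph, so multi-hop queries stay efficient as the dataset grows.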
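To make the index-free adjacency idea concrete, here is a minimal, database-agnostic sketch in plain Python. It is not AstraeaDB's API; the toy graph and the `bfs_depths` helper are hypothetical. Each node maps directly to its own neighbor list, so a breadth-first expansion only touches the edges of nodes it actually visits instead of consulting a global join index.

```python
from collections import deque

# Hypothetical toy graph: each node maps directly to its neighbor list,
# so expanding a node is one lookup plus a scan of that node's own edges.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

def bfs_depths(adj, start, max_depth):
    """Return {node: hop_count} for every node within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for nbr in adj.get(node, []):
            if nbr not in depth:  # first visit is the shortest hop count in BFS
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return depth

print(bfs_depths(graph, "alice", 2))
# {'alice': 0, 'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

Raising `max_depth` to 3 reaches `frank`; the work done grows with the number of nodes and edges inside the explored neighborhood, not with the total size of `graph`.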

Chapter 7: Transport Protocols

Compare JSON-TCP (simple, zero-dependency), gRPC (strongly typed, streaming), Apache Arrow Flight (zero-copy, DataFrame-native), and MCP (LLM tool integration). Learn when to use each and how to configure clients.

Part IV: Advanced

Unlock the AI-first capabilities that make AstraeaDB unique: vector search, temporal queries, graph algorithms, RAG pipelines, and neural network training.

Chapter 8: Vector Search and Semantic Queries

Store embeddings on nodes, run approximate nearest-neighbor search with HNSW, blend vector similarity with graph proximity using hybrid search, and perform semantic walks.

Chapter 9: Temporal Graphs

Model time-varying relationships with validity intervals. Query the graph as it existed at any point in time using neighbors_at(), bfs_at(), and shortest_path_at().
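A validity interval is simply a start time and an end time attached to an edge. The plain-Python sketch below illustrates the idea only and is not AstraeaDB's implementation: the `TemporalEdge` class, the half-open `[valid_from, valid_to)` convention, the use of years as timestamps, and the `neighbors_valid_at` helper are all assumptions chosen to mirror what neighbors_at() is described as doing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalEdge:
    source: str
    target: str
    valid_from: int          # inclusive start of validity (years here, for readability)
    valid_to: Optional[int]  # exclusive end; None means "still valid"

    def is_valid_at(self, t: int) -> bool:
        return self.valid_from <= t and (self.valid_to is None or t < self.valid_to)

# Hypothetical employment history: alice moved from acme to globex in 2020.
edges = [
    TemporalEdge("alice", "acme", valid_from=2015, valid_to=2020),
    TemporalEdge("alice", "globex", valid_from=2020, valid_to=None),
    TemporalEdge("bob", "acme", valid_from=2018, valid_to=None),
]

def neighbors_valid_at(edges, node, t):
    """Outgoing neighbors of `node` whose edge was valid at time t."""
    return [e.target for e in edges if e.source == node and e.is_valid_at(t)]

print(neighbors_valid_at(edges, "alice", 2017))  # ['acme']
print(neighbors_valid_at(edges, "alice", 2023))  # ['globex']
```

The half-open convention means that at the boundary year 2020 exactly one of alice's two edges is valid, so consecutive intervals never overlap.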

Chapter 10: Graph Algorithms

Run PageRank, Louvain community detection, betweenness centrality, and connected components. Understand the CSR matrix representation and optional GPU acceleration.
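As a rough illustration of why a CSR (compressed sparse row) layout suits algorithms like PageRank, here is a textbook power-iteration PageRank over a hand-built CSR graph in plain Python. It is a sketch of the general technique, not AstraeaDB's internal representation or its GPU path; the arrays `row_ptr` and `col_idx`, the damping factor of 0.85, and the fixed iteration count are assumptions.

```python
# Hypothetical 4-node graph with edges 0->1, 0->2, 1->2, 2->0, 3->2.
# In CSR form, col_idx[row_ptr[u]:row_ptr[u + 1]] lists the out-neighbors of node u.
row_ptr = [0, 2, 3, 4, 5]
col_idx = [1, 2, 2, 0, 2]

def pagerank_csr(row_ptr, col_idx, damping=0.85, iterations=50):
    n = len(row_ptr) - 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Every node receives the "teleport" share (1 - d) / n.
        new_rank = [(1.0 - damping) / n] * n
        dangling = 0.0
        for u in range(n):
            start, end = row_ptr[u], row_ptr[u + 1]
            out_degree = end - start
            if out_degree == 0:
                dangling += rank[u]  # rank held by nodes with no out-edges
                continue
            share = damping * rank[u] / out_degree
            for v in col_idx[start:end]:  # contiguous slice: cache-friendly scan
                new_rank[v] += share
        # Spread dangling rank evenly so the ranks keep summing to 1.
        for v in range(n):
            new_rank[v] += damping * dangling / n
        rank = new_rank
    return rank

ranks = pagerank_csr(row_ptr, col_idx)
print([round(r, 3) for r in ranks])
```

Because each node's out-edges sit in one contiguous slice of `col_idx`, every iteration is a sequential scan over two flat arrays, which is the property that makes CSR attractive for both CPU caches and GPU kernels.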

Chapter 11: GraphRAG

Build Retrieval-Augmented Generation pipelines: anchor on a node via vector search, extract a subgraph with BFS, linearize it to text, and feed it to an LLM—all in one atomic operation.
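The GraphRAG stages described above can be illustrated in plain Python. This is a conceptual sketch only, not AstraeaDB's API or its single atomic server-side operation: the toy nodes, the 3-dimensional embeddings, the helper names `anchor`, `extract_subgraph`, and `linearize`, and the brute-force cosine search (standing in for HNSW) are all assumptions, and the final LLM call is deliberately left out.

```python
import math
from collections import deque

# Hypothetical knowledge graph: node -> (text description, toy 3-d embedding).
nodes = {
    "rust": ("Rust is a systems programming language.", [0.9, 0.1, 0.0]),
    "cargo": ("Cargo is Rust's build tool and package manager.", [0.8, 0.2, 0.1]),
    "crates_io": ("crates.io is the Rust package registry.", [0.7, 0.1, 0.3]),
    "python": ("Python is a dynamic programming language.", [0.1, 0.9, 0.0]),
}
edges = {
    "rust": [("BUILT_WITH", "cargo")],
    "cargo": [("PUBLISHES_TO", "crates_io")],
    "crates_io": [],
    "python": [],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def anchor(query_vec):
    """Step 1: pick the most similar node (brute-force scan, not HNSW)."""
    return max(nodes, key=lambda n: cosine(nodes[n][1], query_vec))

def extract_subgraph(start, hops):
    """Step 2: BFS out to `hops` hops, collecting (source, relation, target) triples."""
    seen, triples = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for rel, nbr in edges[node]:
            triples.append((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return seen, triples

def linearize(seen, triples):
    """Step 3: turn the subgraph into plain text an LLM can read."""
    facts = [nodes[n][0] for n in sorted(seen)]
    rels = [f"{s} -{r}-> {t}" for s, r, t in triples]
    return "\n".join(facts + rels)

query_vec = [0.9, 0.12, 0.0]  # stand-in for an embedded user question
start = anchor(query_vec)
seen, triples = extract_subgraph(start, hops=2)
context = linearize(seen, triples)
# Step 4 (not shown): send `prompt` to an LLM of your choice.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How are Rust packages published?"
print(prompt)
```

Note how the unrelated `python` node never reaches the prompt: the graph walk, not the vector search alone, decides which facts the model sees.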

Chapter 12: Graph Neural Networks

Train GNNs inside the database. Create differentiable tensors, define message-passing layers, run forward passes, compute loss, and backpropagate—without exporting data to an external framework.

Part V: Production

Harden your deployment with authentication, encryption, performance tuning, and real-world scenario planning.

Chapter 13: Security

Configure RBAC authentication, enable mutual TLS, and explore homomorphic encryption for server-side label matching on encrypted data—the server never sees plaintext node labels.

Chapter 14: Performance and Scaling

Tune the buffer pool, configure pointer swizzling thresholds, understand MVCC and WAL, set up hash/range partitioning, and monitor with built-in metrics.

Chapter 15: Cybersecurity Scenario

A complete end-to-end walkthrough: model a corporate network as a graph, ingest firewall logs, detect lateral movement with traversals, and identify attack paths using graph algorithms.

Appendices

Quick references and cheat sheets for everyday use.