Chapter 2: The Graph Database Landscape

Not all graph databases are alike. They differ in their data models, query languages, storage architectures, and intended workloads. This chapter surveys the landscape so you can make informed decisions—and understand exactly where AstraeaDB fits.

2.1 Property Graphs vs. RDF

The graph database world is divided into two major data model families. Understanding the difference is important because it shapes the query language, the API, and the kinds of problems each model handles naturally.

The Property Graph model

In a Property Graph, nodes carry labels and key-value properties, and edges are directed, typed, and can also carry properties. This is the model used by Neo4j, Memgraph, TigerGraph, and AstraeaDB. It is the dominant model in the industry for transactional and analytical graph workloads.

A Property Graph representation of "Alice knows Bob since 2019":

// Two nodes with labels and properties
CREATE (alice:Person {name: "Alice", age: 30}),
       (bob:Person   {name: "Bob",   age: 32}),
// One directed edge with a type and properties
       (alice)-[:KNOWS {since: 2019}]->(bob)

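Key characteristics: nodes and edges both carry key-value properties natively; edges are directed and typed; a schema is optional; and queries are written in Cypher, GQL, or Gremlin.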
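To make the model concrete, here is a minimal Python sketch of the same "Alice knows Bob" data. The `Node` and `Edge` classes are hypothetical illustrations of the Property Graph idea, not AstraeaDB's internal types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: int
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int   # id of the start node
    target: int   # id of the end node
    type: str     # edge type, e.g. "KNOWS"
    properties: dict[str, Any] = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice", "age": 30})
bob = Node(2, {"Person"}, {"name": "Bob", "age": 32})

# The edge is a first-class record, so metadata attaches to it directly.
knows = Edge(alice.id, bob.id, "KNOWS", {"since": 2019})

print(knows.type, knows.properties["since"])
```

The point of the sketch is that `since` lives on the edge itself; no extra node is needed to hold it.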

The RDF model

RDF (Resource Description Framework) represents everything as subject-predicate-object triples. Resources are identified by URIs, and the object of a triple may also be a literal value such as a string or number. In standard RDF, relationships do not carry properties directly; you must use reification (creating a node to represent the relationship itself) to attach metadata.

The same "Alice knows Bob since 2019" in RDF (Turtle syntax):

# Prefix declarations
@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Basic triple: Alice knows Bob
ex:alice foaf:knows ex:bob .

# Node properties require separate triples
ex:alice foaf:name "Alice" .
ex:alice ex:age "30"^^xsd:integer .
ex:bob   foaf:name "Bob" .
ex:bob   ex:age "32"^^xsd:integer .

# Edge properties require reification (a new node for the relationship)
ex:friendship1 a rdf:Statement ;
    rdf:subject   ex:alice ;
    rdf:predicate foaf:knows ;
    rdf:object    ex:bob ;
    ex:since      "2019"^^xsd:integer .

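Key characteristics: every statement is a subject-predicate-object triple; resources are identified by globally unique URIs; schemas are expressed as RDFS/OWL ontologies, which also enable inference; and queries are written in SPARQL.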
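The cost of reification is easier to see when the triples are written out as plain data. The following Python sketch stores the reified statement from the Turtle example as tuples and looks up the `ex:since` value. The helper `statement_metadata` is a hypothetical name, and the sketch assumes each subject has at most one value per predicate.

```python
# Triples as plain (subject, predicate, object) tuples.
triples = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    # Reification: five extra triples to say "since 2019" about the one above.
    ("ex:friendship1", "rdf:type", "rdf:Statement"),
    ("ex:friendship1", "rdf:subject", "ex:alice"),
    ("ex:friendship1", "rdf:predicate", "foaf:knows"),
    ("ex:friendship1", "rdf:object", "ex:bob"),
    ("ex:friendship1", "ex:since", 2019),
}


def statement_metadata(triples, s, p, o, key):
    """Return values of `key` on reified statements that describe (s, p, o)."""
    by_subject = {}
    for subj, pred, obj in triples:
        by_subject.setdefault(subj, {})[pred] = obj
    return [
        props[key]
        for props in by_subject.values()
        if props.get("rdf:type") == "rdf:Statement"
        and props.get("rdf:subject") == s
        and props.get("rdf:predicate") == p
        and props.get("rdf:object") == o
        and key in props
    ]


since = statement_metadata(triples, "ex:alice", "foaf:knows", "ex:bob", "ex:since")
print(since)
```

Compare this with the Property Graph version, where the same fact is a single property on a single edge.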

Comparison

| Aspect | Property Graph | RDF |
|---|---|---|
| Node identity | Internal ID + labels | URI (globally unique) |
| Edge properties | Native key-value pairs | Requires reification |
| Schema | Optional (schema-free or enforced) | RDFS/OWL ontologies |
| Query language | Cypher, GQL, Gremlin | SPARQL |
| Reasoning | Not built-in | OWL inference, entailment |
| Developer ergonomics | Intuitive for app developers | Steeper learning curve |
| Data interchange | Vendor-specific formats | Universal (URIs, standards) |
| Primary audience | Application development, analytics | Linked data, semantic web, research |

AstraeaDB's choice

AstraeaDB uses the Property Graph model. This decision reflects its primary design goals: developer ergonomics, performance for deep traversals, and seamless integration with AI/ML workflows. The Property Graph model maps naturally to JSON documents, making it straightforward to attach rich metadata to both nodes and edges without reification overhead.
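As a rough illustration of the JSON mapping, the sketch below serializes one node and one edge and reads the edge metadata back. The key names (`labels`, `properties`, `from`, `to`) are assumptions chosen for the example; they are not AstraeaDB's actual storage or wire format.

```python
import json

# Hypothetical document layout for one node and one edge.
node = {"labels": ["Person"], "properties": {"name": "Alice", "age": 30}}
edge = {"type": "KNOWS", "from": "alice", "to": "bob",
        "properties": {"since": 2019}}

encoded = json.dumps({"nodes": {"alice": node}, "edges": [edge]}, sort_keys=True)
decoded = json.loads(encoded)

# Edge metadata survives the round trip as ordinary nested JSON.
print(decoded["edges"][0]["properties"]["since"])
```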

2.2 Query Languages

The query language you use determines how you express graph patterns, traversals, and mutations. Here are the four major graph query languages and how they compare:

Cypher (Neo4j)

Cypher pioneered ASCII art pattern matching: you draw the graph pattern you want to find using parentheses for nodes and arrows for edges. It is declarative—you describe what you want, not how to get it.

// Find Alice's friends who are older than 25
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
WHERE friend.age > 25
RETURN friend.name, friend.age
ORDER BY friend.age DESC

Cypher is the most widely adopted graph query language. Its pattern syntax is intuitive and readable, even for developers new to graph databases. However, it originated at Neo4j and, apart from the community-driven openCypher project, had no formal standard behind it until GQL was published in 2024.
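To see what the declarative query asks for, it can help to spell out the equivalent filtering by hand. The Python sketch below evaluates the same question over a tiny in-memory graph. The data structures and the extra person "Carol" are hypothetical sample data, and a real engine would plan and index this rather than scan every edge.

```python
# Sample data; "carol" is added only so the age filter has something to exclude.
people = {
    "alice": {"name": "Alice", "age": 30},
    "bob": {"name": "Bob", "age": 32},
    "carol": {"name": "Carol", "age": 24},
}
# Directed KNOWS edges as (source, target) pairs.
knows = [("alice", "bob"), ("alice", "carol")]


def friends_older_than(start_name, min_age):
    """MATCH the KNOWS pattern, apply the WHERE filter, then ORDER BY age DESC."""
    rows = [
        (people[dst]["name"], people[dst]["age"])
        for src, dst in knows
        if people[src]["name"] == start_name and people[dst]["age"] > min_age
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


result = friends_older_than("Alice", 25)
print(result)
```

The Cypher version states only the pattern and the filter; how the matching rows are found is left to the database.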

Gremlin (Apache TinkerPop)

Gremlin takes an imperative, step-based approach. You compose a traversal by chaining steps that describe how to walk the graph. It is the traversal language of the Apache TinkerPop framework; the reference implementation runs on the JVM, with language variants available for other platforms.

// Same query in Gremlin
g.V().has('Person', 'name', 'Alice')
 .out('KNOWS')
 .hasLabel('Person')
 .has('age', gt(25))
 .order().by('age', desc)
 .valueMap('name', 'age')

Gremlin's imperative style gives fine-grained control over traversal execution and is well-suited for procedural graph algorithms. However, complex patterns are harder to read than Cypher's visual syntax, and performance depends on step ordering.
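The step-chaining idea can be sketched in a few lines of Python. The `Traversal` class below is a hypothetical toy that mimics the shape of a Gremlin pipeline; it is not the TinkerPop API, and its method names and signatures are assumptions made for the example.

```python
class Traversal:
    """A toy step pipeline: each step returns a new Traversal over node ids."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)

    def has(self, key, predicate):
        nodes = self.graph["nodes"]
        keep = [v for v in self.current if predicate(nodes[v].get(key))]
        return Traversal(self.graph, keep)

    def out(self, edge_type):
        nxt = [dst for v in self.current
               for (src, etype, dst) in self.graph["edges"]
               if src == v and etype == edge_type]
        return Traversal(self.graph, nxt)

    def order_by(self, key, descending=False):
        nodes = self.graph["nodes"]
        ordered = sorted(self.current, key=lambda v: nodes[v][key],
                         reverse=descending)
        return Traversal(self.graph, ordered)

    def values(self, *keys):
        nodes = self.graph["nodes"]
        return [tuple(nodes[v][k] for k in keys) for v in self.current]


graph = {
    "nodes": {
        "alice": {"name": "Alice", "age": 30},
        "bob": {"name": "Bob", "age": 32},
        "carol": {"name": "Carol", "age": 24},  # hypothetical extra person
    },
    "edges": [("alice", "KNOWS", "bob"), ("alice", "KNOWS", "carol")],
}

rows = (Traversal(graph, graph["nodes"])
        .has("name", lambda n: n == "Alice")
        .out("KNOWS")
        .has("age", lambda a: a > 25)
        .order_by("age", descending=True)
        .values("name", "age"))
print(rows)
```

Because each step runs in the order written, placing the selective `has` step first keeps the intermediate lists small, which is the step-ordering sensitivity mentioned above.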

SPARQL (W3C)

SPARQL is the W3C standard for querying RDF data. It uses triple pattern matching with a SQL-like SELECT syntax.

# Same query in SPARQL (RDF world)
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friendName ?friendAge
WHERE {
    ex:alice foaf:knows ?friend .
    ?friend  foaf:name  ?friendName .
    ?friend  ex:age     ?friendAge .
    FILTER(?friendAge > 25)
}
ORDER BY DESC(?friendAge)

SPARQL excels in federated queries across distributed RDF endpoints and in environments with formal ontologies. It is the standard in academic, governmental, and linked open data communities, but it is rarely used for application-level graph databases.
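Triple pattern matching amounts to finding variable bindings that satisfy every pattern at once. The Python sketch below is a naive matcher for the query above; the function names `match` and `solve` are hypothetical, the sample triples (including `ex:carol`) are invented for illustration, and real SPARQL engines use indexes and join planning rather than nested scans.

```python
def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples):
    """Join all patterns by extending the candidate bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:age", 32),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:carol", "ex:age", 24),
]
patterns = [
    ("ex:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?friendName"),
    ("?friend", "ex:age", "?friendAge"),
]

rows = [b for b in solve(patterns, triples) if b["?friendAge"] > 25]  # FILTER
rows.sort(key=lambda b: b["?friendAge"], reverse=True)                # ORDER BY DESC
names = [(b["?friendName"], b["?friendAge"]) for b in rows]
print(names)
```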

GQL (ISO/IEC 39075: The New Standard)

GQL (Graph Query Language) is the new ISO standard for property graph queries, published in 2024 as ISO/IEC 39075. It combines Cypher-style pattern matching with SQL-style clause structure in a vendor-neutral specification.

// GQL syntax (AstraeaDB implements this)
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
WHERE friend.age > 25
RETURN friend.name, friend.age
ORDER BY friend.age DESC

If you know Cypher, GQL will look immediately familiar. The core pattern matching syntax is compatible, while the standard adds formal grammar rules, composability features, and a path toward SQL integration.

Language comparison at a glance

| Language | Paradigm | Data Model | Standardized | Primary Ecosystem |
|---|---|---|---|---|
| Cypher | Declarative, pattern-based | Property Graph | openCypher (community) | Neo4j, Memgraph, RedisGraph |
| Gremlin | Imperative, step-based | Property Graph | Apache TinkerPop | JanusGraph, Amazon Neptune, CosmosDB |
| SPARQL | Declarative, triple-pattern | RDF | W3C Standard | Blazegraph, GraphDB, Stardog |
| GQL | Declarative, pattern-based | Property Graph | ISO Standard (2024) | AstraeaDB, emerging adoption |

AstraeaDB's query language

AstraeaDB implements a GQL/Cypher-compatible syntax through a hand-written recursive-descent parser. The full execution pipeline supports: MATCH with node and edge pattern matching, WHERE filtering with boolean expressions, CREATE and DELETE for mutations, RETURN with expressions, ORDER BY, LIMIT, and aggregation functions (count(), sum(), avg(), min(), max()). Chapter 5 covers the query language in detail.
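For readers unfamiliar with the technique, the following Python sketch shows what a recursive-descent parser for a small WHERE-style boolean expression might look like. It is an illustration only: the grammar, the `Parser` class, and `compile_where` are hypothetical and much smaller than a real GQL grammar, and AstraeaDB's own parser is not shown in this chapter.

```python
import operator
import re

TOKEN = re.compile(r"\s*(>=|<=|<>|[()=<>]|[A-Za-z_][\w.]*|\d+)")
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq, "<>": operator.ne}


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def either(l, r):
    return lambda row: l(row) or r(row)


def both(l, r):
    return lambda row: l(row) and r(row)


class Parser:
    """One method per grammar rule; each returns a predicate over a row dict."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_or(self):        # or_expr := and_expr ("OR" and_expr)*
        left = self.parse_and()
        while self.peek() == "OR":
            self.take()
            left = either(left, self.parse_and())
        return left

    def parse_and(self):       # and_expr := not_expr ("AND" not_expr)*
        left = self.parse_not()
        while self.peek() == "AND":
            self.take()
            left = both(left, self.parse_not())
        return left

    def parse_not(self):       # not_expr := "NOT" not_expr | "(" or_expr ")" | comparison
        if self.peek() == "NOT":
            self.take()
            inner = self.parse_not()
            return lambda row: not inner(row)
        if self.peek() == "(":
            self.take()
            inner = self.parse_or()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        return self.parse_comparison()

    def parse_comparison(self):  # comparison := NAME op INTEGER
        name, op, number = self.take(), self.take(), self.take()
        if op not in OPS or not number.isdigit():
            raise SyntaxError(f"bad comparison: {name} {op} {number}")
        compare, value = OPS[op], int(number)
        return lambda row: compare(row[name], value)


def compile_where(text):
    parser = Parser(tokenize(text))
    predicate = parser.parse_or()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return predicate


pred = compile_where("friend.age > 25 AND NOT (friend.age >= 40 OR friend.age = 32)")
print(pred({"friend.age": 30}), pred({"friend.age": 32}))
```

Operator precedence falls out of the call structure: `parse_or` calls `parse_and`, which calls `parse_not`, so NOT binds tighter than AND, and AND tighter than OR.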

2.3 What Makes AstraeaDB Different

AstraeaDB is not another Neo4j clone. It was designed from scratch to address the shortcomings of existing graph databases, especially in the areas of AI integration, cloud-native storage, and performance. Here are the key differentiators:

The Vector-Property Graph

Most graph databases treat vector search as a bolt-on feature, an afterthought added to an existing architecture. In AstraeaDB, embeddings are first-class citizens. Every node can carry a float32 embedding vector alongside its labels and JSON properties. The HNSW (Hierarchical Navigable Small World) vector index is integrated directly into the graph structure: the navigation links in the vector index are graph edges. This unified architecture enables operations that combine graph traversal with vector similarity, such as semantic traversal.
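One way to picture combining traversal with similarity is to rank a node's graph neighbors by how close their embeddings are to a query vector. The Python sketch below does this by brute force with cosine similarity; it is not an HNSW index, the embedding values are made up, and `most_similar_neighbor` is a hypothetical helper rather than an AstraeaDB API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up 2-dimensional embeddings; real embeddings have many more dimensions.
embeddings = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "carol": [0.0, 1.0]}
neighbors = {"alice": ["bob", "carol"]}


def most_similar_neighbor(node, query):
    """Follow graph edges from `node`, then pick the neighbor closest to `query`."""
    return max(neighbors[node], key=lambda n: cosine(embeddings[n], query))


best = most_similar_neighbor("alice", [1.0, 0.0])
print(best)
```

An approximate index such as HNSW avoids scoring every candidate, which matters once the search is over millions of vectors instead of a handful of neighbors.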

AI-First Architecture

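AstraeaDB is built for the AI era. Beyond vector search, it includes a built-in GraphRAG engine and in-database GNN training based on built-in tensors and message passing.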

Rust Performance

AstraeaDB is written entirely in Rust, so it runs without a garbage collector and avoids GC pauses.

Three-Tier Storage

AstraeaDB addresses the "Memory Wall" problem (the tension between needing random-access speed for traversals and wanting cloud-native separation of compute and storage) with cold, warm, and hot storage tiers and pointer swizzling.
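Pointer swizzling means replacing a stored identifier with a direct in-memory reference once the target has been loaded, so later traversals skip the lookup. The Python sketch below illustrates the idea with only two tiers (a cold dictionary standing in for remote storage and a hot in-memory map); the warm tier is omitted for brevity, and the `TieredStore` class is a hypothetical illustration, not AstraeaDB's storage engine.

```python
class Node:
    def __init__(self, node_id, neighbor_ids):
        self.id = node_id
        # Starts as a list of ids; entries are swizzled to Node references on use.
        self.neighbors = list(neighbor_ids)


class TieredStore:
    def __init__(self, cold):
        self.cold = cold   # id -> list of neighbor ids (stands in for slow storage)
        self.hot = {}      # id -> Node object resident in memory

    def load(self, node_id):
        """Return the in-memory Node, promoting it from cold storage if needed."""
        if node_id not in self.hot:
            self.hot[node_id] = Node(node_id, self.cold[node_id])
        return self.hot[node_id]

    def neighbors(self, node):
        resolved = []
        for i, ref in enumerate(node.neighbors):
            if not isinstance(ref, Node):
                # Still an id: load the target and swizzle the slot in place.
                ref = self.load(ref)
                node.neighbors[i] = ref
            resolved.append(ref)
        return resolved


store = TieredStore({1: [2], 2: [1]})
a = store.load(1)
first_hop = store.neighbors(a)
print(first_hop[0].id, isinstance(a.neighbors[0], Node))
```

After the first call, `a.neighbors[0]` is a direct object reference, so a second traversal over the same edge needs no dictionary lookup at all.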

Three Transport Protocols

No single protocol fits every use case. AstraeaDB offers three: JSON-TCP, gRPC, and Arrow Flight.

Full comparison: AstraeaDB vs. the field

| Capability | AstraeaDB | Neo4j | TigerGraph | ArangoDB | Memgraph |
|---|---|---|---|---|---|
| Implementation language | Rust | Java | C++ | C++ | C++ |
| Data model | Vector-Property Graph | Property Graph | Property Graph | Multi-model (Doc+Graph) | Property Graph |
| Query language | GQL / Cypher | Cypher | GSQL | AQL | Cypher |
| Vector search | Built-in HNSW (first-class) | Vector index (added 2023) | No | No | No |
| GraphRAG | Built-in engine | Plugin (LangChain) | No | No | No |
| GNN training | Built-in tensors + message passing | Export to PyTorch Geometric | Export to DGL | No | No |
| Temporal graphs | Native validity intervals | Manual (property-based) | Manual | Manual | Manual |
| Storage tiers | Cold / Warm / Hot with pointer swizzling | Page cache | Distributed in-memory | RocksDB | In-memory only |
| Transport protocols | JSON-TCP + gRPC + Arrow Flight | Bolt | REST | HTTP / VelocyStream | Bolt |
| Encryption | Homomorphic (FHE on labels) | TLS + at-rest | TLS + at-rest | TLS + at-rest | TLS |
| GC pauses | None (Rust) | JVM GC pauses | None (C++) | None (C++) | None (C++) |
| Graph algorithms | PageRank, Louvain, Centrality, Components | GDS library (paid) | Built-in (extensive) | Pregel-based | MAGE library |
| License | MIT (open source) | GPL / Commercial | Commercial | Apache 2.0 | BSL / Commercial |

The unified advantage

The defining characteristic of AstraeaDB is unification. Rather than bolting vector search onto a graph engine, or exporting graph data to a separate ML framework, AstraeaDB treats graph traversal, vector similarity, temporal queries, and neural network training as facets of a single system. This eliminates data movement, reduces operational complexity, and enables novel operations like semantic traversal and in-database GNN training that are impossible when these capabilities live in separate systems.
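Of the facets listed, temporal queries are perhaps the easiest to sketch. The Python example below models edges with validity intervals and answers "who did Alice know at time t?". The `TemporalEdge` class, the half-open interval convention, and the sample years for Carol are assumptions made for illustration; the chapter does not specify how AstraeaDB represents validity intervals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    source: str
    target: str
    type: str
    valid_from: int
    valid_to: Optional[int] = None   # None means the edge is still valid

    def valid_at(self, t):
        # Half-open interval [valid_from, valid_to): an assumed convention.
        return self.valid_from <= t and (self.valid_to is None or t < self.valid_to)


edges = [
    TemporalEdge("alice", "bob", "KNOWS", 2019),
    TemporalEdge("alice", "carol", "KNOWS", 2015, 2018),  # hypothetical sample edge
]


def neighbors_at(node, t):
    """Traverse only the edges whose validity interval contains time t."""
    return [e.target for e in edges if e.source == node and e.valid_at(t)]


print(neighbors_at("alice", 2016), neighbors_at("alice", 2020))
```

With property-based workarounds, every query has to repeat this interval check by hand; native support moves it into the traversal itself.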

When to choose AstraeaDB

AstraeaDB is the strongest fit when your workload requires two or more of the following: deep graph traversal, vector similarity search, temporal queries, and in-database GNN training.

If your workload is purely document-oriented (no meaningful relationships), a document database like MongoDB is a better fit. If your focus is RDF and ontology reasoning, a triple store like Stardog or GraphDB is more appropriate. AstraeaDB excels specifically where connections, semantics, and computation converge.
