Chapter 14: Performance and Scaling

AstraeaDB is engineered for performance at every layer—from the storage engine that eliminates garbage collection pauses to the GPU acceleration path for analytical workloads. This chapter explains the architecture that makes it fast, the knobs you can turn to make it faster, and the monitoring tools that tell you when something is wrong.

14.1 The Three-Tier Storage Architecture

The fundamental challenge of a cloud-native graph database is that graph traversals require random access at nanosecond speeds, while cloud storage (S3, GCS) offers sequential throughput at millisecond latencies. AstraeaDB solves this with a three-tier storage architecture that automatically moves data between cost-optimized cold storage and speed-optimized hot memory.

                  AstraeaDB Storage Engine

  Tier 3: HOT (RAM)
  +-------------------------------------------------------+
  | Pointer-Swizzled Subgraphs                            |
  | 64-bit disk page IDs --> direct memory pointers       |
  | O(1) neighbor access | Nanosecond traversal           |
  +-------------------------------------------------------+
      ^ promote (frequent access)    | demote (LRU eviction)
      |                              v
  Tier 2: WARM (NVMe SSD)
  +-------------------------------------------------------+
  | LRU Buffer Pool                                       |
  | 8 KiB pages with pin/unpin reference counting         |
  | I/O backend: memmap2 (cross-platform)                 |
  |              io_uring (Linux async, kernel 5.10+)     |
  +-------------------------------------------------------+
      ^ cache (on first read)        | evict (capacity pressure)
      |                              v
  Tier 1: COLD (Object Storage)
  +-------------------------------------------------------+
  | S3 / GCS / Azure Blob / Local Filesystem              |
  | Apache Parquet format (columnar, open, compressible)  |
  | Cheapest storage tier | Data at rest                  |
  +-------------------------------------------------------+

Tier 1: Cold Storage (Object Storage)

Data at rest lives in Apache Parquet files on object storage (Amazon S3, Google Cloud Storage, Azure Blob Storage) or a local filesystem. Parquet is a columnar format that compresses well and is readable by virtually every data tool in the ecosystem (Spark, DuckDB, Polars, Pandas). This means your graph data is never locked into a proprietary format.

Tier 2: Warm Cache (NVMe SSD)

When data is first accessed, it is loaded from cold storage into the LRU (Least Recently Used) buffer pool on the local NVMe SSD. The buffer pool manages data in 8 KiB pages with reference counting (pin/unpin) to prevent eviction of pages that are actively being read.
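The pin/unpin mechanics can be sketched as a small LRU cache that refuses to evict pages a reader is still holding. This is an illustrative model, not AstraeaDB's actual API; the class and method names are hypothetical.

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU buffer pool with pin/unpin reference counting."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page bytes, in LRU order
        self.pins = {}               # page_id -> count of active readers

    def get(self, page_id, load_from_cold):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark most recently used
        else:
            self._evict_if_full()
            self.pages[page_id] = load_from_cold(page_id)
        self.pins[page_id] = self.pins.get(page_id, 0) + 1  # pin on access
        return self.pages[page_id]

    def unpin(self, page_id):
        self.pins[page_id] -= 1

    def _evict_if_full(self):
        if len(self.pages) < self.capacity:
            return
        # Evict the least recently used page that is not pinned.
        for pid in self.pages:
            if self.pins.get(pid, 0) == 0:
                del self.pages[pid]
                return
        raise RuntimeError("all pages pinned; cannot evict")
```

A pinned page survives eviction pressure until every reader has called unpin, which is what prevents a page from vanishing under an in-progress traversal.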

AstraeaDB supports two I/O backends for the warm tier:

| Backend | Platform | Characteristics |
| --- | --- | --- |
| memmap2 | All platforms (Linux, macOS, WSL2) | Memory-mapped files. The OS kernel manages page caching. Simple, reliable, good default. |
| io_uring | Linux (kernel 5.10+) | Asynchronous I/O with zero system-call overhead for batched operations. Higher throughput under heavy concurrent load. |

Tier 3: Hot Memory (Pointer Swizzling)

The most frequently accessed subgraphs are promoted to the hot tier, where AstraeaDB performs pointer swizzling: 64-bit disk page IDs are replaced with direct memory pointers. This eliminates all indirection—traversing from one node to its neighbor is a single pointer dereference, achieving nanosecond-level traversal for active working sets.
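The effect of swizzling can be shown in a few lines: before swizzling, each hop resolves a page ID through a lookup table; after, a neighbor is a plain reference. The Node class and function names here are hypothetical illustrations, not AstraeaDB internals.

```python
class Node:
    def __init__(self, node_id, neighbor_page_ids):
        self.node_id = node_id
        self.neighbor_page_ids = neighbor_page_ids  # 64-bit on-disk page IDs
        self.neighbors = None                       # direct refs once swizzled

def swizzle(node, page_table):
    """Replace on-disk page IDs with direct in-memory references.

    page_table maps page ID -> the already-resident Node object. After
    swizzling, traversal skips the page-table lookup entirely.
    """
    node.neighbors = [page_table[pid] for pid in node.neighbor_page_ids]

# Before swizzling: one table lookup per hop.
def neighbor_unswizzled(node, page_table, i):
    return page_table[node.neighbor_page_ids[i]]

# After swizzling: a single pointer dereference.
def neighbor_swizzled(node, i):
    return node.neighbors[i]
```

In the real engine the "pointers" are raw memory addresses rather than Python references, but the shape of the optimization is the same: indirection is paid once, at promotion time, instead of on every hop.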

How data moves between tiers

Promotion and demotion are driven by access frequency. When a page is accessed, it moves up: cold to warm on first read, warm to hot after repeated access. When memory pressure increases, the LRU policy evicts the least recently used pages back down. This happens transparently—the query engine does not need to know which tier holds the data.

MVCC and the Write-Ahead Log

AstraeaDB uses MVCC (Multi-Version Concurrency Control) to allow concurrent readers and writers without locking. Each transaction sees a consistent snapshot of the graph, even while other transactions are writing.

Durability is guaranteed by the Write-Ahead Log (WAL). Every mutation is written to the WAL before being applied to the data pages. If the server crashes, it replays the WAL on startup to recover any committed transactions that had not yet been flushed to the data files.
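The WAL discipline ("log first, then apply") can be sketched in a few lines. This is a minimal model with an in-memory list standing in for the fsync'd on-disk log; the record format and function names are illustrative, not AstraeaDB's.

```python
class WriteAheadLog:
    """Minimal WAL sketch: append-only record list standing in for the
    durable on-disk log."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)   # must be durable *before* apply

def apply(store, record):
    """Apply one logged mutation to the data pages (here, a dict)."""
    op, key, value = record
    if op == "put":
        store[key] = value
    elif op == "delete":
        store.pop(key, None)

def commit(wal, store, record):
    wal.append(record)    # 1. write the log entry first
    apply(store, record)  # 2. only then mutate the data pages

def recover(wal):
    """After a crash, rebuild committed state by replaying the log."""
    store = {}
    for record in wal.records:
        apply(store, record)
    return store
```

Because every committed mutation reaches the log before the data pages, replaying the log from the beginning always reconstructs the committed state, regardless of which data pages had been flushed at crash time.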

# astraeadb.toml - Storage Configuration

[storage]
data_dir = "/var/lib/astraeadb/data"
wal_dir  = "/var/lib/astraeadb/wal"

[storage.tiering]
hot_cache_mb  = 4096       # RAM budget for pointer-swizzled subgraphs
warm_cache_mb = 16384      # NVMe buffer pool size
cold_backend  = "s3"       # "local", "s3", or "gcs"
io_backend    = "io_uring"  # "memmap2" or "io_uring" (Linux only)

[storage.s3]
bucket = "my-astraeadb-data"
region = "us-east-1"
prefix = "production/v1"

14.2 Connection Management

Under heavy load, connection management becomes critical. AstraeaDB provides several mechanisms to prevent resource exhaustion and maintain responsiveness.

Configuration Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| max_connections | 1024 | Maximum number of concurrent TCP connections. New connections beyond this limit are rejected with a clear error message. |
| max_concurrent_requests | 256 | Maximum number of requests processed simultaneously. Provides backpressure so a burst of queries cannot overwhelm the server. |
| idle_timeout_secs | 300 (5 min) | Connections that send no requests for this duration are closed automatically, freeing resources for active clients. |
| request_timeout_secs | 30 | Maximum time a single request may run before being cancelled. Prevents runaway queries from consuming resources indefinitely. |

# astraeadb.toml - Connection Management

[server]
max_connections          = 2048
max_concurrent_requests  = 512
idle_timeout_secs        = 300
request_timeout_secs     = 60
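The backpressure behind max_concurrent_requests is semaphore-shaped: requests past the limit wait for a slot instead of piling onto the executor. A hypothetical sketch of that pattern (this is not AstraeaDB's server loop; all names are illustrative):

```python
import asyncio

async def handle_with_backpressure(limiter, handler, request):
    """Bound in-flight work: acquire one of N slots, run, release."""
    async with limiter:
        return await handler(request)

async def main():
    limiter = asyncio.Semaphore(2)   # stand-in for max_concurrent_requests=2
    in_flight = 0
    peak = 0

    async def handler(request):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)    # simulated query work
        in_flight -= 1
        return request

    # Ten requests arrive at once, but at most two run concurrently.
    await asyncio.gather(
        *(handle_with_backpressure(limiter, handler, i) for i in range(10)))
    return peak

print(asyncio.run(main()))   # peak concurrency, capped by the semaphore
```

Queued requests still count against request_timeout_secs in a real server, so a sustained overload surfaces as timeouts rather than memory exhaustion.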

ConnectionGuard (RAII Pattern)

Internally, AstraeaDB uses a Rust RAII (Resource Acquisition Is Initialization) pattern called ConnectionGuard. When a connection is accepted, a guard object is created that increments the active connection counter. When the guard is dropped (whether due to normal disconnection, timeout, or error), the counter is automatically decremented and all associated resources are cleaned up. This makes resource leaks structurally impossible—a guarantee provided by Rust's ownership model.
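Python's closest analogue to that RAII guard is a context manager whose exit path always runs, even when the handler raises. The following is an illustrative sketch, not AstraeaDB source:

```python
import threading

class ConnectionCounter:
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.active = 0
        self.lock = threading.Lock()

class ConnectionGuard:
    """Increment the active-connection count on entry; *always*
    decrement on exit, whether the handler returned or raised."""

    def __init__(self, counter):
        self.counter = counter

    def __enter__(self):
        with self.counter.lock:
            if self.counter.active >= self.counter.max_connections:
                raise ConnectionError("max_connections reached")
            self.counter.active += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self.counter.lock:
            self.counter.active -= 1
        return False   # never swallow the handler's exception

counter = ConnectionCounter(max_connections=2)
with ConnectionGuard(counter):
    pass   # handle the connection; the count drops to 0 on any exit path
```

The difference is that Python enforces this only if callers remember the with statement, whereas Rust's ownership model makes the cleanup unskippable.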

Tuning for your workload

For applications with many short-lived connections (e.g., serverless functions), increase max_connections and decrease idle_timeout_secs. For applications with fewer long-lived connections running complex queries (e.g., analytics dashboards), increase request_timeout_secs and max_concurrent_requests.

14.3 Monitoring with Prometheus

AstraeaDB exposes a Prometheus-compatible metrics endpoint that allows you to monitor the health and performance of your database in real time.

Key Metrics

| Metric | Type | Description |
| --- | --- | --- |
| astraea_requests_total | Counter | Total requests processed, labeled by operation type (create_node, query, bfs, etc.). |
| astraea_errors_total | Counter | Total errors, labeled by error type (auth_failed, timeout, internal). |
| astraea_request_duration_seconds | Histogram | Request latency distribution. Exposes percentiles (p50, p90, p99). Essential for SLA monitoring. |
| astraea_connections_active | Gauge | Current number of active connections. Alert if this approaches max_connections. |
| astraea_buffer_pool_hit_ratio | Gauge | Percentage of page reads served from the warm buffer pool (vs. cold storage). Should stay above 95% for good performance. |
| astraea_hot_tier_nodes | Gauge | Number of nodes currently in the hot tier (pointer-swizzled in RAM). |

Enabling the Metrics Endpoint

# astraeadb.toml - Prometheus Metrics

[monitoring]
prometheus_enabled = true
metrics_port       = 9090      # Separate port for metrics scraping
health_check_path  = "/health" # Returns 200 OK if the server is healthy

Scraping with Prometheus

Add AstraeaDB as a target in your Prometheus configuration:

# prometheus.yml
scrape_configs:
  - job_name: "astraeadb"
    static_configs:
      - targets: ["astraeadb-host:9090"]
    scrape_interval: 15s

Visualizing in Grafana

Once Prometheus is scraping AstraeaDB, connect Grafana to your Prometheus data source and build dashboards. Recommended panels include:

- Request and error rates (from astraea_requests_total and astraea_errors_total)
- Latency percentiles p50/p90/p99 (from astraea_request_duration_seconds)
- Active connections versus the max_connections limit (astraea_connections_active)
- Buffer pool hit ratio, with an alert when it drops below 95% (astraea_buffer_pool_hit_ratio)
- Hot-tier population (astraea_hot_tier_nodes)

Health Check for Load Balancers

The /health endpoint returns a 200 OK response when the server is accepting connections and the storage engine is operational. Use it as the health-check target for load balancers (ALB, Nginx, HAProxy) so unhealthy instances are removed from the pool automatically.

14.4 GPU Acceleration

Many graph algorithms—PageRank, BFS, single-source shortest path—are fundamentally matrix operations disguised as traversals. The adjacency matrix of a graph can be represented as a sparse matrix, and operations like "visit all neighbors" become sparse matrix-vector multiplications (SpMV). This makes them natural candidates for GPU acceleration.

CSR Matrix Representation

AstraeaDB represents the graph's adjacency structure in CSR (Compressed Sparse Row) format for algorithmic workloads. CSR stores only the non-zero entries of the adjacency matrix, using three arrays:

- row_ptr: for each row (source node) i, the offset in col_idx/values where row i's entries begin; length V + 1, so row i occupies the half-open slice [row_ptr[i], row_ptr[i+1])
- col_idx: the column (destination node) of each non-zero entry
- values: the weight of each entry

Example: 4-node graph

    Node 0 --> Node 1  (weight 0.5)
    Node 0 --> Node 2  (weight 0.3)
    Node 1 --> Node 3  (weight 0.8)
    Node 2 --> Node 3  (weight 0.2)

CSR representation:

    row_ptr = [0, 2, 3, 4, 4]
    col_idx = [1, 2, 3, 3]
    values  = [0.5, 0.3, 0.8, 0.2]
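Given those three arrays, the core SpMV kernel is a two-level loop: for each row, accumulate values[k] * x[col_idx[k]] over that row's slice. A plain-Python sketch (the production backends vectorize this with SIMD or GPU kernels):

```python
def spmv(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A @ x over a CSR matrix.

    Row i's non-zeros occupy the half-open slice
    [row_ptr[i], row_ptr[i+1]) of col_idx and values.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# The 4-node example above: multiplying by the all-ones vector
# sums each node's outgoing edge weights.
row_ptr = [0, 2, 3, 4, 4]
col_idx = [1, 2, 3, 3]
values  = [0.5, 0.3, 0.8, 0.2]
print(spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0, 1.0]))
```

Note that node 3's empty row costs nothing: row_ptr[3] == row_ptr[4], so its inner loop never executes. That is the storage saving CSR buys on sparse graphs.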

Compute Backends

| Backend | Status | Description |
| --- | --- | --- |
| CpuBackend | Production-ready | SIMD-optimized sparse matrix-vector multiplication using CPU vector instructions (SSE4.2, AVX2). The default backend on all platforms. |
| GpuBackend | Framework ready | Architecture in place for CUDA/cuGraph integration: transfers the CSR matrix to GPU memory and executes kernels for massive parallelism. Targeted at million-plus-node analytical workloads. |

Supported Algorithms

| Algorithm | Description | Complexity |
| --- | --- | --- |
| PageRank | Iterative importance ranking. Converges via repeated SpMV until scores stabilize within a tolerance threshold. | O(V + E) per iteration |
| BFS | Breadth-first search from a source node. Level-synchronous: each frontier expansion is a SpMV operation. | O(V + E) |
| SSSP (Bellman-Ford) | Single-source shortest path supporting negative edge weights. Iterates V-1 times over all edges. | O(V * E) |

When to use GPU acceleration

GPU acceleration shines for analytical, batch workloads on large graphs: computing PageRank over a million-node social network, running BFS from hundreds of source nodes simultaneously, or training Graph Neural Networks (Chapter 12). For transactional workloads (single-node lookups, small traversals), the CPU backend is faster due to lower data transfer overhead.
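To make the "repeated SpMV until scores stabilize" description concrete, here is a minimal PageRank power iteration over a CSR graph. This is an illustrative sketch, not AstraeaDB's implementation: it ignores edge weights (uniform split across out-edges) and spreads dangling-node mass evenly.

```python
def pagerank(row_ptr, col_idx, n, damping=0.85, tol=1e-9, max_iter=100):
    """Power-iteration PageRank over an unweighted CSR adjacency list."""
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        dangling = 0.0
        for i in range(n):
            start, end = row_ptr[i], row_ptr[i + 1]
            deg = end - start
            if deg == 0:
                dangling += rank[i]       # no out-edges: hold mass aside
                continue
            share = damping * rank[i] / deg
            for k in range(start, end):   # push rank along out-edges
                new[col_idx[k]] += share
        extra = damping * dangling / n    # redistribute dangling mass
        new = [v + extra for v in new]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new                    # scores stabilized
        rank = new
    return rank
```

On the 4-node example graph, node 3 ends up with the highest score: it receives the full rank of nodes 1 and 2, while every other node gets at most a split share.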

14.5 Sharding and Distributed Processing

When your graph outgrows a single machine—either in storage capacity or query throughput—AstraeaDB supports partitioning the graph across multiple shards.

Partitioning Strategies

| Strategy | How It Works | Best For |
| --- | --- | --- |
| Hash partitioning | Applies a consistent hash function to node IDs to distribute them evenly across shards. Adding or removing a shard moves only a fraction of the data, because consistent hashing minimizes reshuffling. | Uniform access patterns, general-purpose workloads |
| Range partitioning | Assigns contiguous ranges of node IDs to specific shards, so nodes with adjacent IDs (often created together) are co-located. | Temporal data ingestion, locality-sensitive workloads |
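The simplest form of hash placement is a stable hash of the node ID modulo the shard count. The sketch below shows that form only; a full consistent-hashing ring (virtual nodes, so resizing moves only ~1/N of the keys) is deliberately omitted, and the function name is hypothetical.

```python
import hashlib

def shard_for(node_id, shard_count):
    """Pick a shard by hashing the node ID.

    sha256 gives a stable, well-mixed hash across processes and runs
    (unlike Python's built-in hash(), which is salted per process).
    """
    digest = hashlib.sha256(str(node_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

The modulo step is what consistent hashing improves on: with plain modulo, changing shard_count remaps almost every key, whereas a hash ring confines the churn to the keys adjacent to the added or removed shard.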

Architecture Components

ShardMap

The ShardMap is a routing table that maps node ID ranges to shard locations. When a query arrives, the query engine consults the ShardMap to determine which shard(s) hold the relevant data, then routes the request accordingly.
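For range partitioning, that routing lookup is a binary search over the ranges' lower bounds. A sketch using the four shard ranges from this section (the class shape is illustrative, not AstraeaDB's ShardMap API):

```python
import bisect

class ShardMap:
    """Routing table for range partitioning: shard i owns node IDs in
    [starts[i], starts[i+1])."""

    def __init__(self, starts, shards):
        self.starts = starts   # sorted lower bound of each range
        self.shards = shards   # shard address for each range

    def route(self, node_id):
        # Find the rightmost range whose lower bound is <= node_id.
        i = bisect.bisect_right(self.starts, node_id) - 1
        return self.shards[i]

shard_map = ShardMap(
    starts=[0, 250, 500, 750],
    shards=["shard-0", "shard-1", "shard-2", "shard-3"])
```

A multi-node query (e.g., a traversal crossing a range boundary) routes each fragment separately and fans the request out to every shard the ShardMap returns.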

ClusterCoordinator

The ClusterCoordinator manages shard assignment, health monitoring, and rebalancing. It tracks which shards are online, handles failover when a shard becomes unavailable, and coordinates data migration when shards are added or removed.

         +-------------------+
         | ClusterCoordinator|
         | (Shard Management)|
         +---------+---------+
                   |
         +---------+---------+
         |      ShardMap     |
         |  (Routing Table)  |
         +---------+---------+
                   |
    +---------+----+----+---------+
    |         |         |         |
    v         v         v         v
 Shard 0   Shard 1   Shard 2   Shard 3
 [0-249]  [250-499] [500-749] [750-999]

# astraeadb.toml - Sharding Configuration

[cluster]
enabled            = true
partition_strategy = "hash"   # "hash" or "range"
shard_count        = 4

[cluster.shards]
# Shard assignments (for range partitioning)
0 = { host = "shard-0.astraea.internal", port = 7687 }
1 = { host = "shard-1.astraea.internal", port = 7687 }
2 = { host = "shard-2.astraea.internal", port = 7687 }
3 = { host = "shard-3.astraea.internal", port = 7687 }

Current Status

AstraeaDB's sharding layer includes a fully implemented local ClusterCoordinator and ShardMap for single-machine multi-shard configurations. Distributed coordination across multiple physical machines (using a consensus protocol such as Raft) is on the roadmap for a future release. For production deployments that require horizontal scaling today, use the local coordinator with separate AstraeaDB instances behind a routing proxy.

Performance Summary

The following table summarizes AstraeaDB's performance characteristics across different deployment configurations:

| Configuration | Graph Size | Traversal Latency | Analytical Throughput |
| --- | --- | --- | --- |
| Single node, hot tier | Up to ~100M nodes | Nanoseconds (pointer swizzling) | CPU-bound SpMV |
| Single node, warm tier | Up to ~1B nodes | Microseconds (NVMe page reads) | I/O-bound; benefits from io_uring |
| Single node + GPU | Up to ~1B nodes | Microseconds (traversal); GPU-accelerated analytics | 10-100x SpMV speedup for large matrices |
| Multi-shard cluster | Multi-billion nodes | Milliseconds (network hop between shards) | Linear scale-out with shard count |