Appendix C: Configuration Reference

Complete reference for configuring the AstraeaDB server, including the TOML configuration file, command-line flags, environment variables, and the JSON import format.

C.1 Server Configuration (TOML)

AstraeaDB reads its configuration from a TOML file. By default, the server looks for astraeadb.toml in the current working directory. You can specify a different path with the --config flag or the ASTRAEA_CONFIG environment variable.

Below is a complete configuration file with all available options and their default values:

# =============================================================================
# AstraeaDB Server Configuration
# =============================================================================

# ---------------------------------------------------------------------------
# [server] — Network binding and protocol ports
# ---------------------------------------------------------------------------
[server]
bind = "0.0.0.0"           # IP address to listen on (0.0.0.0 = all interfaces)
port = 7687                 # JSON-TCP protocol port
grpc_port = 7688            # gRPC protocol port
flight_port = 7689          # Apache Arrow Flight protocol port

# ---------------------------------------------------------------------------
# [storage] — Data persistence and buffer management
# ---------------------------------------------------------------------------
[storage]
data_dir = "/var/lib/astraeadb/data"   # Directory for graph data files
wal_dir = "/var/lib/astraeadb/wal"     # Directory for write-ahead log
page_size = 8192                       # Page size in bytes (4096, 8192, or 16384)
buffer_pool_pages = 10000              # Number of pages in the buffer pool

# ---------------------------------------------------------------------------
# [tls] — Transport Layer Security
# ---------------------------------------------------------------------------
[tls]
enabled = false                            # Enable TLS for all protocols
cert_path = "/etc/astraeadb/server.crt"    # Path to server certificate (PEM)
key_path = "/etc/astraeadb/server.key"     # Path to server private key (PEM)
ca_cert_path = "/etc/astraeadb/ca.crt"    # Path to CA certificate for mutual TLS

# ---------------------------------------------------------------------------
# [auth] — Authentication
# ---------------------------------------------------------------------------
[auth]
enabled = false             # Enable token-based authentication
# Tokens are configured via the admin API at runtime.
# When enabled, all client requests must include a valid
# Bearer token in the connection handshake.

# ---------------------------------------------------------------------------
# [vector] — HNSW vector index parameters
# ---------------------------------------------------------------------------
[vector]
dimensions = 128           # Dimensionality of embedding vectors
m = 16                      # HNSW: max connections per layer
ef_construction = 200      # HNSW: beam width during index build
ef_search = 50             # HNSW: beam width during search
distance_metric = "cosine" # Distance metric: cosine, euclidean, dot_product

# ---------------------------------------------------------------------------
# [connection] — Connection pool and timeout settings
# ---------------------------------------------------------------------------
[connection]
max_connections = 1024             # Maximum simultaneous client connections
max_concurrent_requests = 256     # Maximum in-flight requests across all connections
idle_timeout_seconds = 300        # Close idle connections after this many seconds
request_timeout_seconds = 30     # Maximum time for a single request to complete

Section-by-section explanation

[server]

Controls which network interfaces and ports the server binds to. AstraeaDB exposes three ports, one for each supported transport protocol. Set bind to "127.0.0.1" if you only want to accept local connections, or "0.0.0.0" to listen on all interfaces.

| Option | Type | Default | Description |
|---|---|---|---|
| bind | String | "0.0.0.0" | IP address to listen on. Use "127.0.0.1" for local-only access. |
| port | Integer | 7687 | Port for the JSON-TCP protocol (the default client protocol). |
| grpc_port | Integer | 7688 | Port for gRPC connections (Go and Java clients). |
| flight_port | Integer | 7689 | Port for Apache Arrow Flight connections (Python, R, Java clients). |
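To check that all three ports are reachable after changing [server], a generic TCP probe is often enough. The sketch below uses only the Python standard library and does not speak any AstraeaDB protocol. The `port_is_open` helper and the `PORTS` mapping are illustrative names, and the host is assumed to be the local machine.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from the [server] table.
PORTS = {"JSON-TCP": 7687, "gRPC": 7688, "Arrow Flight": 7689}

if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{name:<12} {port}  {state}")
```

An open port only shows that something is listening. The `status` subcommand described in Section C.3 reports actual server health.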

[storage]

Configures where data is persisted and how the buffer pool manages in-memory pages. The write-ahead log (WAL) ensures durability: every mutation is written to the WAL before being applied to the data files.

| Option | Type | Default | Description |
|---|---|---|---|
| data_dir | String | "/var/lib/astraeadb/data" | Directory for graph data files (nodes, edges, indexes). |
| wal_dir | String | "/var/lib/astraeadb/wal" | Directory for write-ahead log segments. For best performance, place on a separate disk from data_dir. |
| page_size | Integer | 8192 | Size of each page in bytes. Valid values: 4096, 8192, 16384. Larger pages reduce I/O operations but increase memory usage per page. |
| buffer_pool_pages | Integer | 10000 | Number of pages in the buffer pool. Total buffer memory = page_size * buffer_pool_pages. For the default values, this is 81,920,000 bytes (about 78 MiB). |
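The buffer memory formula can be checked, or inverted to size the pool for a memory budget, with a few lines of Python. The function names here are illustrative only, and the calculation covers page memory alone; any per-page bookkeeping overhead in the server is not documented here and is ignored.

```python
VALID_PAGE_SIZES = (4096, 8192, 16384)


def buffer_pool_bytes(page_size: int, buffer_pool_pages: int) -> int:
    """Total buffer memory in bytes: page_size * buffer_pool_pages."""
    if page_size not in VALID_PAGE_SIZES:
        raise ValueError(f"page_size must be one of {VALID_PAGE_SIZES}")
    return page_size * buffer_pool_pages


def pages_for_budget(page_size: int, budget_bytes: int) -> int:
    """Largest buffer_pool_pages value that fits in budget_bytes."""
    if page_size not in VALID_PAGE_SIZES:
        raise ValueError(f"page_size must be one of {VALID_PAGE_SIZES}")
    return budget_bytes // page_size


default_bytes = buffer_pool_bytes(8192, 10000)
print(default_bytes, round(default_bytes / 2**20, 1))  # 81920000 78.1
print(pages_for_budget(8192, 2**30))                   # 131072 pages for 1 GiB
```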

[tls]

Enables encrypted communication between clients and the server. When enabled, all three protocol ports (JSON-TCP, gRPC, Arrow Flight) require TLS. If ca_cert_path is set, mutual TLS (mTLS) is enforced: clients must present a certificate signed by the specified CA.

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Set to true to require TLS on all connections. |
| cert_path | String | "" | Path to the PEM-encoded server certificate. |
| key_path | String | "" | Path to the PEM-encoded server private key. |
| ca_cert_path | String | "" | Path to a CA certificate for verifying client certificates (enables mTLS). |
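On the client side, a TLS or mTLS connection needs a matching context. The sketch below builds one with Python's standard `ssl` module. It is a generic example and not the AstraeaDB client API: the `make_client_tls_context` helper and its parameter names are hypothetical, and the CA used to verify the server may differ from the one the server uses to verify clients.

```python
import ssl
from typing import Optional


def make_client_tls_context(
    ca_cert: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context.

    ca_cert: PEM file used to verify the server certificate
             (None falls back to the system trust store).
    client_cert / client_key: PEM files presented to the server;
             needed only when the server enforces mutual TLS.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_cert)
    if client_cert and client_key:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
# To use it, wrap a TCP socket before speaking the protocol, for example:
#   tls_sock = ctx.wrap_socket(sock, server_hostname="db.example.internal")
```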

[auth]

Controls token-based authentication. When enabled, clients must present a valid Bearer token during the connection handshake. Tokens are managed through the admin API (not in the config file) and can be created, revoked, and rotated at runtime.

| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Set to true to require authentication on all client connections. |

[vector]

Configures the HNSW (Hierarchical Navigable Small World) index used for vector search. These parameters directly affect the trade-off between index build time, search accuracy, and memory usage.

| Option | Type | Default | Description |
|---|---|---|---|
| dimensions | Integer | 128 | Dimensionality of stored embedding vectors. Must match the output dimension of your embedding model (e.g., 384 for MiniLM, 1536 for OpenAI text-embedding-3-small). |
| m | Integer | 16 | Maximum number of connections per node per HNSW layer. Higher values improve recall but increase memory usage and build time. Typical range: 8–64. |
| ef_construction | Integer | 200 | Beam width during index construction. Higher values produce a more accurate index but take longer to build. Must be ≥ m. |
| ef_search | Integer | 50 | Beam width during search. Higher values improve recall at the cost of latency. Can be tuned per query in the client API. |
| distance_metric | String | "cosine" | Distance function for comparing vectors. Options: "cosine" (angular similarity), "euclidean" (L2 distance), "dot_product" (inner product). |

Tuning HNSW parameters: For most use cases, the defaults work well. If you need higher recall (fewer missed results), increase ef_search and ef_construction. If you need to reduce memory usage, decrease m. Always benchmark with your actual data to find the right balance.
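For reference, the three distance_metric options correspond to the following textbook definitions, written here in plain Python. This is a sketch of the mathematics only. How AstraeaDB computes and reports these values internally (for example, whether cosine is returned as a similarity or a distance) is not specified in this appendix, and the function names are illustrative.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot_product(a, b) / norm


print(dot_product([1, 2], [3, 4]))      # 11
print(euclidean([0, 0], [3, 4]))        # 5.0
print(cosine_distance([1, 0], [0, 1]))  # 1.0
```

Cosine ignores vector length, so it is a common choice when embeddings are not normalized. For unit-length vectors, cosine and dot product rank results identically.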

[connection]

Controls connection pooling, concurrency limits, and timeouts. These settings protect the server from being overwhelmed by too many simultaneous clients or long-running queries.

| Option | Type | Default | Description |
|---|---|---|---|
| max_connections | Integer | 1024 | Maximum number of simultaneous client connections. Additional connections are rejected with an error. |
| max_concurrent_requests | Integer | 256 | Maximum number of requests being processed at any given time across all connections. |
| idle_timeout_seconds | Integer | 300 | Seconds of inactivity before a client connection is automatically closed. |
| request_timeout_seconds | Integer | 30 | Maximum seconds allowed for a single request to complete before it is cancelled. |
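Conceptually, max_concurrent_requests and request_timeout_seconds behave like a semaphore combined with a per-request deadline. The asyncio sketch below illustrates that idea only and does not describe how the server is actually implemented. `RequestGate` and the handler functions are hypothetical, and whether excess requests queue or are rejected is an assumption (this sketch queues them).

```python
import asyncio


class RequestGate:
    """Caps in-flight requests and cancels any that exceed a deadline."""

    def __init__(self, max_concurrent: int, timeout_seconds: float):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout_seconds

    async def run(self, handler):
        # Wait for a free slot, then run the handler under the deadline.
        async with self._slots:
            try:
                return await asyncio.wait_for(handler(), timeout=self._timeout)
            except asyncio.TimeoutError:
                return "cancelled: request timeout"


async def demo():
    # Tiny limits so the example finishes quickly; the documented defaults are 256 and 30.
    gate = RequestGate(max_concurrent=2, timeout_seconds=0.05)

    async def fast():
        return "ok"

    async def slow():
        await asyncio.sleep(1)
        return "late"

    return await asyncio.gather(gate.run(fast), gate.run(slow))


results = asyncio.run(demo())
print(results)  # ['ok', 'cancelled: request timeout']
```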

C.2 Command-Line Flags

Command-line flags override values from the configuration file. They are useful for quick testing or container deployments where you want to set a single option without modifying a file.

| Flag | Description | Default |
|---|---|---|
| --bind <address> | Override the bind address from the config file | 0.0.0.0 |
| --port <number> | Override the JSON-TCP port from the config file | 7687 |
| --config <path> | Path to a TOML configuration file | None (uses astraeadb.toml in the current directory if present) |
| --data-dir <path> | Override the data directory | Value from config file |
| --log-level <level> | Set logging verbosity: error, warn, info, debug, trace | info |

C.3 CLI Commands

The astraeadb binary supports several subcommands for managing the server and data.

| Command | Description | Example |
|---|---|---|
| serve | Start the AstraeaDB server. This is the primary command for running the database. | astraeadb serve --config /etc/astraeadb.toml |
| shell | Launch an interactive GQL REPL connected to a running server. Useful for ad-hoc queries and exploration. | astraeadb shell --port 7687 |
| import <file> | Bulk import nodes and edges from a JSON file (see Section C.6 for the expected format). | astraeadb import data.json |
| export <file> | Export the entire graph to a JSON file. The output format matches the import format. | astraeadb export backup.json |
| status | Check the health of a running server. Returns node/edge counts, uptime, and memory usage. | astraeadb status --port 7687 |

C.4 Environment Variables

Environment variables provide a third layer of configuration, with the highest precedence. The priority order is: environment variable > CLI flag > config file > built-in default.

| Variable | Description | Example |
|---|---|---|
| ASTRAEA_CONFIG | Path to the TOML configuration file. Equivalent to --config. | ASTRAEA_CONFIG=/etc/astraeadb.toml |
| ASTRAEA_BIND | Override the server bind address. Equivalent to --bind. | ASTRAEA_BIND=0.0.0.0 |
| ASTRAEA_PORT | Override the JSON-TCP port. Equivalent to --port. | ASTRAEA_PORT=7687 |
| ASTRAEA_DATA_DIR | Override the data directory path. | ASTRAEA_DATA_DIR=/mnt/ssd/astraeadb |
| ASTRAEA_LOG_LEVEL | Override the log level (error, warn, info, debug, trace). | ASTRAEA_LOG_LEVEL=debug |
| OPENAI_API_KEY | API key for OpenAI. Required when using GraphRAG with OpenAI models for embedding generation and LLM responses. | OPENAI_API_KEY=sk-... |
| ANTHROPIC_API_KEY | API key for Anthropic. Required when using GraphRAG with Claude models for LLM responses. | ANTHROPIC_API_KEY=sk-ant-... |

Security note: Never hard-code API keys in configuration files or command-line arguments. Use environment variables or a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) to inject credentials at runtime. On Linux, a running process's environment can be read from /proc/<pid>/environ by the same user and by root, so prefer injecting secrets through container orchestration secrets rather than exporting them in shell profiles.

C.5 Configuration Precedence

When the same setting is specified in multiple places, AstraeaDB resolves it using the following priority order (highest to lowest):

| Priority | Source | Example |
|---|---|---|
| 1 (highest) | Environment variable | ASTRAEA_PORT=9000 |
| 2 | Command-line flag | --port 8000 |
| 3 | TOML config file | port = 7687 in [server] |
| 4 (lowest) | Built-in default | 7687 |
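The precedence rules can be expressed as a short lookup chain. The Python sketch below mirrors the table using the same example values. `resolve_setting` is a hypothetical helper for illustration and not a function exposed by AstraeaDB.

```python
import os


def resolve_setting(env_name, cli_value, file_value, default, environ=None, cast=str):
    """Return the effective value: environment > CLI flag > config file > default."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return cast(environ[env_name])
    if cli_value is not None:
        return cast(cli_value)
    if file_value is not None:
        return cast(file_value)
    return default


# All three sources set: the environment variable wins.
port = resolve_setting(
    "ASTRAEA_PORT", cli_value=8000, file_value=7687, default=7687,
    environ={"ASTRAEA_PORT": "9000"}, cast=int,
)
print(port)  # 9000

# No environment variable: the command-line flag wins.
print(resolve_setting("ASTRAEA_PORT", 8000, 7687, 7687, environ={}, cast=int))  # 8000
```

Passing `environ` explicitly keeps the function easy to test. Leaving it out makes the function read the real process environment.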

C.6 JSON Import Format

The astraeadb import command expects a JSON file with the following structure. Node references in the edges array use zero-based indices into the nodes array.

{
  "nodes": [
    {
      "labels": ["Person"],
      "properties": {
        "name": "Alice",
        "age": 30
      }
    },
    {
      "labels": ["Person"],
      "properties": {
        "name": "Bob",
        "age": 25
      }
    },
    {
      "labels": ["Company"],
      "properties": {
        "name": "Acme Corp",
        "industry": "Technology"
      }
    }
  ],
  "edges": [
    {
      "source": 0,
      "target": 1,
      "edge_type": "KNOWS",
      "properties": {
        "since": 2020
      }
    },
    {
      "source": 0,
      "target": 2,
      "edge_type": "WORKS_AT",
      "properties": {
        "role": "Engineer",
        "start_date": "2021-03-15"
      }
    }
  ]
}

Field reference

Node fields

| Field | Type | Required | Description |
|---|---|---|---|
| labels | Array of strings | Yes | One or more labels classifying the node (e.g., ["Person"], ["Person", "Employee"]). |
| properties | Object | No | Key-value pairs of node properties. Values can be strings, numbers, booleans, arrays, or nested objects. |
| embedding | Array of floats | No | A float32 vector for vector search. Length must match the dimensions setting in [vector]. |

Edge fields

| Field | Type | Required | Description |
|---|---|---|---|
| source | Integer | Yes | Zero-based index into the nodes array identifying the source node. |
| target | Integer | Yes | Zero-based index into the nodes array identifying the target node. |
| edge_type | String | Yes | The relationship type (e.g., "KNOWS", "WORKS_AT"). |
| properties | Object | No | Key-value pairs of edge properties. |
| weight | Float | No | Numeric weight for weighted graph algorithms. Defaults to 1.0. |
| valid_from | String (ISO 8601) | No | Start of the edge validity interval for temporal graphs. |
| valid_to | String (ISO 8601) | No | End of the edge validity interval (exclusive) for temporal graphs. |
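Before running astraeadb import, it can help to check a file against the field rules above. The following Python sketch is a pre-flight check written for illustration. It is not the importer's own validation, and `validate_import` is a hypothetical helper. It checks only the required fields, the index ranges, and (optionally) the embedding length.

```python
def validate_import(doc: dict, dimensions=None) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    errors = []
    nodes = doc.get("nodes", [])
    for i, node in enumerate(nodes):
        labels = node.get("labels")
        if not isinstance(labels, list) or not labels:
            errors.append(f"nodes[{i}]: 'labels' must be a non-empty array")
        embedding = node.get("embedding")
        if embedding is not None and dimensions is not None and len(embedding) != dimensions:
            errors.append(f"nodes[{i}]: embedding has {len(embedding)} values, expected {dimensions}")
    for j, edge in enumerate(doc.get("edges", [])):
        for field in ("source", "target"):
            idx = edge.get(field)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(idx, int) or isinstance(idx, bool) or not 0 <= idx < len(nodes):
                errors.append(f"edges[{j}]: '{field}' must be a zero-based index into nodes")
        if not isinstance(edge.get("edge_type"), str):
            errors.append(f"edges[{j}]: 'edge_type' must be a string")
    return errors


doc = {
    "nodes": [{"labels": ["Person"]}, {"labels": ["Company"]}],
    "edges": [
        {"source": 0, "target": 1, "edge_type": "WORKS_AT"},
        {"source": 0, "target": 5, "edge_type": "KNOWS"},  # target is out of range
    ],
}
problems = validate_import(doc)
print(problems)
```

In practice the document would come from `json.load` on the import file. Passing `dimensions` from the [vector] section enables the embedding length check.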

Large imports: For datasets larger than a few hundred thousand nodes, consider using the batch client API (create_nodes / create_edges) over Arrow Flight for better performance. The JSON import command is best suited for initial data loading and smaller datasets.
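When switching to the batch client API, large inputs are usually sent in fixed-size chunks. The sketch below shows only the chunking logic. The signatures of create_nodes and create_edges are not given in this appendix, so the `send_batch` callable stands in for whichever client call you use, and the batch size of 10,000 is an arbitrary assumption to benchmark against.

```python
from typing import Callable, Iterator, Sequence


def chunks(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(items: Sequence, send_batch: Callable[[Sequence], None],
                    batch_size: int = 10_000) -> int:
    """Call send_batch once per chunk and return the number of batches sent."""
    sent = 0
    for batch in chunks(items, batch_size):
        send_batch(batch)  # placeholder for a real client call such as create_nodes
        sent += 1
    return sent


received = []
count = send_in_batches(list(range(5)), received.append, batch_size=2)
print(count, received)  # 3 [[0, 1], [2, 3], [4]]
```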