Appendix C: Configuration Reference
Complete reference for configuring the AstraeaDB server, including the TOML configuration file, command-line flags, environment variables, and the JSON import format.
C.1 Server Configuration (TOML)
AstraeaDB reads its configuration from a TOML file. By default, the server looks for astraeadb.toml in the current working directory. You can specify a different path with the --config flag or the ASTRAEA_CONFIG environment variable.
Below is a complete configuration file with all available options and their default values:
# =============================================================================
# AstraeaDB Server Configuration
# =============================================================================

# ---------------------------------------------------------------------------
# [server] - Network binding and protocol ports
# ---------------------------------------------------------------------------
[server]
bind = "0.0.0.0"        # IP address to listen on (0.0.0.0 = all interfaces)
port = 7687             # JSON-TCP protocol port
grpc_port = 7688        # gRPC protocol port
flight_port = 7689      # Apache Arrow Flight protocol port

# ---------------------------------------------------------------------------
# [storage] - Data persistence and buffer management
# ---------------------------------------------------------------------------
[storage]
data_dir = "/var/lib/astraeadb/data"    # Directory for graph data files
wal_dir = "/var/lib/astraeadb/wal"      # Directory for write-ahead log
page_size = 8192                        # Page size in bytes (4096, 8192, or 16384)
buffer_pool_pages = 10000               # Number of pages in the buffer pool

# ---------------------------------------------------------------------------
# [tls] - Transport Layer Security
# ---------------------------------------------------------------------------
[tls]
enabled = false                             # Enable TLS for all protocols
cert_path = "/etc/astraeadb/server.crt"     # Path to server certificate (PEM)
key_path = "/etc/astraeadb/server.key"      # Path to server private key (PEM)
ca_cert_path = "/etc/astraeadb/ca.crt"      # Path to CA certificate for mutual TLS

# ---------------------------------------------------------------------------
# [auth] - Authentication
# ---------------------------------------------------------------------------
[auth]
enabled = false         # Enable token-based authentication
# Tokens are configured via the admin API at runtime.
# When enabled, all client requests must include a valid
# Bearer token in the connection handshake.

# ---------------------------------------------------------------------------
# [vector] - HNSW vector index parameters
# ---------------------------------------------------------------------------
[vector]
dimensions = 128                # Dimensionality of embedding vectors
m = 16                          # HNSW: max connections per layer
ef_construction = 200           # HNSW: beam width during index build
ef_search = 50                  # HNSW: beam width during search
distance_metric = "cosine"      # Distance metric: cosine, euclidean, dot_product

# ---------------------------------------------------------------------------
# [connection] - Connection pool and timeout settings
# ---------------------------------------------------------------------------
[connection]
max_connections = 1024          # Maximum simultaneous client connections
max_concurrent_requests = 256   # Maximum in-flight requests across all connections
idle_timeout_seconds = 300      # Close idle connections after this many seconds
request_timeout_seconds = 30    # Maximum time for a single request to complete
Section-by-section explanation
[server]
Controls which network interfaces and ports the server binds to. AstraeaDB exposes three ports, one for each supported transport protocol. Set bind to "127.0.0.1" if you only want to accept local connections, or "0.0.0.0" to listen on all interfaces.
| Option | Type | Default | Description |
|---|---|---|---|
| bind | String | "0.0.0.0" | IP address to listen on. Use "127.0.0.1" for local-only access. |
| port | Integer | 7687 | Port for the JSON-TCP protocol (the default client protocol). |
| grpc_port | Integer | 7688 | Port for gRPC connections (Go and Java clients). |
| flight_port | Integer | 7689 | Port for Apache Arrow Flight connections (Python, R, Java clients). |
[storage]
Configures where data is persisted and how the buffer pool manages in-memory pages. The write-ahead log (WAL) ensures durability: every mutation is written to the WAL before being applied to the data files.
| Option | Type | Default | Description |
|---|---|---|---|
| data_dir | String | "/var/lib/astraeadb/data" | Directory for graph data files (nodes, edges, indexes). |
| wal_dir | String | "/var/lib/astraeadb/wal" | Directory for write-ahead log segments. For best performance, place on a separate disk from data_dir. |
| page_size | Integer | 8192 | Size of each page in bytes. Valid values: 4096, 8192, 16384. Larger pages reduce I/O operations but increase memory usage per page. |
| buffer_pool_pages | Integer | 10000 | Number of pages in the buffer pool. Total buffer memory = page_size * buffer_pool_pages. For the default values, this is 81,920,000 bytes (about 78 MiB). |
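The buffer-memory formula in the table can be checked with a few lines of Python. This is a minimal sketch: the helper names `buffer_pool_bytes` and `pages_for_budget` are hypothetical and not part of AstraeaDB, and the requirement that `buffer_pool_pages` be positive is an assumption.

```python
VALID_PAGE_SIZES = (4096, 8192, 16384)  # values listed for page_size in the table


def buffer_pool_bytes(page_size: int = 8192, buffer_pool_pages: int = 10000) -> int:
    """Return total buffer memory in bytes: page_size * buffer_pool_pages."""
    if page_size not in VALID_PAGE_SIZES:
        raise ValueError(f"page_size must be one of {VALID_PAGE_SIZES}, got {page_size}")
    if buffer_pool_pages <= 0:  # assumption: a pool needs at least one page
        raise ValueError("buffer_pool_pages must be positive")
    return page_size * buffer_pool_pages


def pages_for_budget(budget_bytes: int, page_size: int = 8192) -> int:
    """Return the largest buffer_pool_pages value that fits in budget_bytes."""
    if page_size not in VALID_PAGE_SIZES:
        raise ValueError(f"page_size must be one of {VALID_PAGE_SIZES}, got {page_size}")
    return budget_bytes // page_size


if __name__ == "__main__":
    total = buffer_pool_bytes()  # documented defaults
    print(f"{total} bytes = {total / 2**20:.1f} MiB")  # 81920000 bytes = 78.1 MiB
    print(pages_for_budget(4 * 2**30))  # pages that fit in 4 GiB: 524288
```

Working backwards from a memory budget with `pages_for_budget` is often the more practical direction when sizing the pool for a specific host.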
[tls]
Enables encrypted communication between clients and the server. When enabled, all three protocol ports (JSON-TCP, gRPC, Arrow Flight) require TLS. If ca_cert_path is set, mutual TLS (mTLS) is enforced: clients must present a certificate signed by the specified CA.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Set to true to require TLS on all connections. |
| cert_path | String | "" | Path to the PEM-encoded server certificate. |
| key_path | String | "" | Path to the PEM-encoded server private key. |
| ca_cert_path | String | "" | Path to a CA certificate for verifying client certificates (enables mTLS). |
[auth]
Controls token-based authentication. When enabled, clients must present a valid Bearer token during the connection handshake. Tokens are managed through the admin API (not in the config file) and can be created, revoked, and rotated at runtime.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Set to true to require authentication on all client connections. |
[vector]
Configures the HNSW (Hierarchical Navigable Small World) index used for vector search. These parameters directly affect the trade-off between index build time, search accuracy, and memory usage.
| Option | Type | Default | Description |
|---|---|---|---|
| dimensions | Integer | 128 | Dimensionality of stored embedding vectors. Must match the output dimension of your embedding model (e.g., 384 for MiniLM, 1536 for OpenAI text-embedding-3-small). |
| m | Integer | 16 | Maximum number of connections per node per HNSW layer. Higher values improve recall but increase memory usage and build time. Typical range: 8–64. |
| ef_construction | Integer | 200 | Beam width during index construction. Higher values produce a more accurate index but take longer to build. Must be ≥ m. |
| ef_search | Integer | 50 | Beam width during search. Higher values improve recall at the cost of latency. Can be tuned per query in the client API. |
| distance_metric | String | "cosine" | Distance function for comparing vectors. Options: "cosine" (angular similarity), "euclidean" (L2 distance), "dot_product" (inner product). |
To improve recall, increase ef_search and ef_construction. If you need to reduce memory usage, decrease m. Always benchmark with your actual data to find the right balance.
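The constraints stated for the [vector] options (ef_construction ≥ m, and the three allowed distance metrics) lend themselves to a pre-flight check before starting the server. The following is a minimal sketch; `check_vector_settings` is a hypothetical helper, not an AstraeaDB API, and the rule that every numeric option must be a positive integer is an assumption.

```python
ALLOWED_METRICS = {"cosine", "euclidean", "dot_product"}

# Defaults copied from the [vector] reference table.
VECTOR_DEFAULTS = {
    "dimensions": 128,
    "m": 16,
    "ef_construction": 200,
    "ef_search": 50,
    "distance_metric": "cosine",
}


def check_vector_settings(settings: dict) -> list:
    """Return a list of problems in a [vector] table (empty list if none)."""
    merged = {**VECTOR_DEFAULTS, **settings}
    problems = []
    invalid = set()
    for key in ("dimensions", "m", "ef_construction", "ef_search"):
        value = merged[key]
        # Assumption: these options must be positive integers (bool is excluded).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer, got {value!r}")
            invalid.add(key)
    # Documented rule: ef_construction must be >= m.
    if not {"m", "ef_construction"} & invalid and merged["ef_construction"] < merged["m"]:
        problems.append("ef_construction must be >= m")
    if merged["distance_metric"] not in ALLOWED_METRICS:
        problems.append(f"distance_metric must be one of {sorted(ALLOWED_METRICS)}")
    return problems


if __name__ == "__main__":
    print(check_vector_settings({"dimensions": 384}))              # no problems
    print(check_vector_settings({"m": 64, "ef_construction": 32}))  # one problem
```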
[connection]
Controls connection pooling, concurrency limits, and timeouts. These settings protect the server from being overwhelmed by too many simultaneous clients or long-running queries.
| Option | Type | Default | Description |
|---|---|---|---|
| max_connections | Integer | 1024 | Maximum number of simultaneous client connections. Additional connections are rejected with an error. |
| max_concurrent_requests | Integer | 256 | Maximum number of requests being processed at any given time across all connections. |
| idle_timeout_seconds | Integer | 300 | Seconds of inactivity before a client connection is automatically closed. |
| request_timeout_seconds | Integer | 30 | Maximum seconds allowed for a single request to complete before it is cancelled. |
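To build intuition for how max_concurrent_requests and request_timeout_seconds interact, the sketch below models both limits with Python's asyncio. It is an illustration of the general technique (a semaphore plus a per-request timeout), not AstraeaDB's server implementation. The function `run_with_limits` is hypothetical, and the choice to queue excess requests rather than reject them is an assumption.

```python
import asyncio


async def run_with_limits(jobs, max_concurrent_requests=256, request_timeout_seconds=30):
    """Run zero-argument coroutine functions under a concurrency cap and a timeout.

    Returns (results, peak), where peak is the highest number of jobs that
    were in flight at the same time.
    """
    slots = asyncio.Semaphore(max_concurrent_requests)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with slots:  # wait here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await asyncio.wait_for(job(), timeout=request_timeout_seconds)
            except asyncio.TimeoutError:
                return "cancelled"  # the request exceeded its time budget
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def quick_query():
    await asyncio.sleep(0.01)
    return "ok"


async def slow_query():
    await asyncio.sleep(1)
    return "ok"


if __name__ == "__main__":
    jobs = [quick_query, quick_query, slow_query, quick_query]
    print(asyncio.run(run_with_limits(jobs, max_concurrent_requests=2,
                                      request_timeout_seconds=0.1)))
```

With a cap of 2, the third and fourth jobs only start once a slot frees up, and the slow job is cancelled when it exceeds the timeout.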
C.2 Command-Line Flags
Command-line flags override values from the configuration file. They are useful for quick testing or container deployments where you want to set a single option without modifying a file.
| Flag | Description | Default |
|---|---|---|
| --bind <address> | Override the bind address from the config file | 127.0.0.1 |
| --port <number> | Override the JSON-TCP port from the config file | 7687 |
| --config <path> | Path to a TOML configuration file | None (uses astraeadb.toml in the current directory if present) |
| --data-dir <path> | Override the data directory | Value from config file |
| --log-level <level> | Set logging verbosity: error, warn, info, debug, trace | info |
C.3 CLI Commands
The astraeadb binary supports several subcommands for managing the server and data.
| Command | Description | Example |
|---|---|---|
| serve | Start the AstraeaDB server. This is the primary command for running the database. | astraeadb serve --config /etc/astraeadb.toml |
| shell | Launch an interactive GQL REPL connected to a running server. Useful for ad-hoc queries and exploration. | astraeadb shell --port 7687 |
| import <file> | Bulk import nodes and edges from a JSON file (see Section C.6 for the expected format). | astraeadb import data.json |
| export <file> | Export the entire graph to a JSON file. The output format matches the import format. | astraeadb export backup.json |
| status | Check the health of a running server. Returns node/edge counts, uptime, and memory usage. | astraeadb status --port 7687 |
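When the server is launched from a script or a container entrypoint, the serve command and the flags from Section C.2 can be assembled programmatically. The sketch below is a hypothetical helper (`build_serve_command` is not shipped with AstraeaDB); it assumes that flags follow the subcommand, as in the examples in the table above.

```python
import shlex

LOG_LEVELS = ("error", "warn", "info", "debug", "trace")  # from Section C.2


def build_serve_command(config=None, bind=None, port=None, data_dir=None, log_level=None):
    """Build an argument list for `astraeadb serve` from optional overrides."""
    if log_level is not None and log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {LOG_LEVELS}, got {log_level!r}")
    args = ["astraeadb", "serve"]
    overrides = (
        ("--config", config),
        ("--bind", bind),
        ("--port", port),
        ("--data-dir", data_dir),
        ("--log-level", log_level),
    )
    for flag, value in overrides:
        if value is not None:  # only emit flags the caller actually set
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    cmd = build_serve_command(config="/etc/astraeadb.toml", port=9000, log_level="debug")
    print(shlex.join(cmd))
    # To launch the server: subprocess.run(cmd, check=True)
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with paths that contain spaces.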
C.4 Environment Variables
Environment variables provide a third layer of configuration, with the highest precedence. The priority order is: environment variable > CLI flag > config file > built-in default.
| Variable | Description | Example |
|---|---|---|
| ASTRAEA_CONFIG | Path to the TOML configuration file. Equivalent to --config. | ASTRAEA_CONFIG=/etc/astraeadb.toml |
| ASTRAEA_BIND | Override the server bind address. Equivalent to --bind. | ASTRAEA_BIND=0.0.0.0 |
| ASTRAEA_PORT | Override the JSON-TCP port. Equivalent to --port. | ASTRAEA_PORT=7687 |
| ASTRAEA_DATA_DIR | Override the data directory path. | ASTRAEA_DATA_DIR=/mnt/ssd/astraeadb |
| ASTRAEA_LOG_LEVEL | Override the log level (error, warn, info, debug, trace). | ASTRAEA_LOG_LEVEL=debug |
| OPENAI_API_KEY | API key for OpenAI. Required when using GraphRAG with OpenAI models for embedding generation and LLM responses. | OPENAI_API_KEY=sk-... |
| ANTHROPIC_API_KEY | API key for Anthropic. Required when using GraphRAG with Claude models for LLM responses. | ANTHROPIC_API_KEY=sk-ant-... |
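API keys set as environment variables can be read by other processes running as the same user (or as root) through /proc on Linux; prefer injecting them via container orchestration secrets.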
C.5 Configuration Precedence
When the same setting is specified in multiple places, AstraeaDB resolves it using the following priority order (highest to lowest):
| Priority | Source | Example |
|---|---|---|
| 1 (highest) | Environment variable | ASTRAEA_PORT=9000 |
| 2 | Command-line flag | --port 8000 |
| 3 | TOML config file | port = 7687 in [server] |
| 4 (lowest) | Built-in default | 7687 |
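The priority order in the table can be expressed as a small lookup that tries each source in turn. This is a minimal sketch of the resolution rule as documented, not the server's actual code; `resolve_setting` and its parameters are hypothetical names chosen for illustration.

```python
import os


def resolve_setting(name, env_var, cli_args, config, default, cast=str, environ=None):
    """Resolve one setting: environment variable > CLI flag > config file > default.

    Returns (value, source). cli_args and config are plain dicts standing in
    for parsed flags and the relevant TOML table.
    """
    env = os.environ if environ is None else environ
    candidates = (
        (env.get(env_var), "environment variable"),
        (cli_args.get(name), "command-line flag"),
        (config.get(name), "config file"),
        (default, "built-in default"),
    )
    for value, source in candidates:
        if value is not None:  # first source that supplies a value wins
            return cast(value), source
    return None, "unset"


if __name__ == "__main__":
    # Mirrors the examples in the table: all three sources set the port.
    value, source = resolve_setting(
        "port", "ASTRAEA_PORT",
        cli_args={"port": 8000},
        config={"port": 7687},
        default=7687,
        cast=int,
        environ={"ASTRAEA_PORT": "9000"},
    )
    print(value, source)  # 9000 environment variable
```

Returning the source alongside the value makes it easier to debug a deployment where a stray environment variable silently overrides a flag.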
C.6 JSON Import Format
The astraeadb import command expects a JSON file with the following structure. Node references in the edges array use zero-based indices into the nodes array.
{
"nodes": [
{
"labels": ["Person"],
"properties": {
"name": "Alice",
"age": 30
}
},
{
"labels": ["Person"],
"properties": {
"name": "Bob",
"age": 25
}
},
{
"labels": ["Company"],
"properties": {
"name": "Acme Corp",
"industry": "Technology"
}
}
],
"edges": [
{
"source": 0,
"target": 1,
"edge_type": "KNOWS",
"properties": {
"since": 2020
}
},
{
"source": 0,
"target": 2,
"edge_type": "WORKS_AT",
"properties": {
"role": "Engineer",
"start_date": "2021-03-15"
}
}
]
}
Field reference
Node fields
| Field | Type | Required | Description |
|---|---|---|---|
| labels | Array of strings | Yes | One or more labels classifying the node (e.g., ["Person"], ["Person", "Employee"]). |
| properties | Object | No | Key-value pairs of node properties. Values can be strings, numbers, booleans, arrays, or nested objects. |
| embedding | Array of floats | No | A float32 vector for vector search. Length must match the dimensions setting in [vector]. |
Edge fields
| Field | Type | Required | Description |
|---|---|---|---|
| source | Integer | Yes | Zero-based index into the nodes array identifying the source node. |
| target | Integer | Yes | Zero-based index into the nodes array identifying the target node. |
| edge_type | String | Yes | The relationship type (e.g., "KNOWS", "WORKS_AT"). |
| properties | Object | No | Key-value pairs of edge properties. |
| weight | Float | No | Numeric weight for weighted graph algorithms. Defaults to 1.0. |
| valid_from | String (ISO 8601) | No | Start of the edge validity interval for temporal graphs. |
| valid_to | String (ISO 8601) | No | End of the edge validity interval (exclusive) for temporal graphs. |
For large datasets, consider using the client batch operations (create_nodes / create_edges) over Arrow Flight for better performance. The JSON import command is best suited for initial data loading and smaller datasets.
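Because edges refer to nodes by position, an off-by-one index is an easy mistake to make when generating import files. The sketch below checks a document against the required fields listed in the field reference before it is handed to astraeadb import. `validate_import` is a hypothetical helper, not part of the AstraeaDB tooling, and it only covers the rules stated in this section.

```python
import json


def validate_import(doc, dimensions=None):
    """Return a list of problems found in an import document (empty if none).

    dimensions, if given, is the [vector] dimensions setting used to check
    the length of any node embedding.
    """
    errors = []
    nodes = doc.get("nodes", [])
    for i, node in enumerate(nodes):
        labels = node.get("labels")
        if not (isinstance(labels, list) and labels
                and all(isinstance(label, str) for label in labels)):
            errors.append(f"nodes[{i}]: labels must be a non-empty array of strings")
        embedding = node.get("embedding")
        if embedding is not None and dimensions is not None and len(embedding) != dimensions:
            errors.append(
                f"nodes[{i}]: embedding has {len(embedding)} values, expected {dimensions}"
            )
    for i, edge in enumerate(doc.get("edges", [])):
        for end in ("source", "target"):
            index = edge.get(end)
            in_range = (isinstance(index, int) and not isinstance(index, bool)
                        and 0 <= index < len(nodes))
            if not in_range:
                errors.append(f"edges[{i}]: {end} must be a zero-based index into nodes")
        if not isinstance(edge.get("edge_type"), str):
            errors.append(f"edges[{i}]: edge_type must be a string")
    return errors


if __name__ == "__main__":
    doc = {
        "nodes": [
            {"labels": ["Person"], "properties": {"name": "Alice", "age": 30}},
            {"labels": ["Person"], "properties": {"name": "Bob", "age": 25}},
        ],
        "edges": [
            {"source": 0, "target": 1, "edge_type": "KNOWS", "properties": {"since": 2020}},
        ],
    }
    problems = validate_import(doc)
    if problems:
        print("\n".join(problems))
    else:
        print(json.dumps(doc, indent=2))  # safe to write out as data.json
```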