Skip to content

Backends

MemoryBackend defines the durable memory seam:

  • store: persist durable memory.
  • find: retrieve relevant memory.
  • hybrid_rrf: fuse keyword and vector retrieval channels with reciprocal rank fusion.
  • get_node: drill down by deterministic node_id or memory id.

VectorStore is lower level. It owns only collection/index creation, upsert, search, get, and capability reporting. Embedding, chunk-on-store, small-to-big retrieval, L0-L3 memory tiering, payload schema, and RRF all live in VectorMemoryBackend<V: VectorStore> — so every vector engine inherits them for free.

Because chunking and small-to-big are done once in VectorMemoryBackend, any vector backend automatically gets bounded, coherent retrieval: store splits content into ~400-token chunks with parent linkage, and find matches on the precise chunk but returns the surrounding parent-section window (same-parent hits collapsed into one), bounded by an adaptive budget — with the full document reachable by node_id drill-down. See memory.md §3.5 for the chunking + small-to-big algorithm and the adaptive budget.

Semantic query cache

CachingMemoryBackend<B, V> wraps any MemoryBackend with a SemanticCache: a repeated or paraphrased query whose embedding is within a cosine threshold of a recent query is served from cache instead of re-running search — cutting embedding + ANN work in agent loops that recall around the same task. The cache is bounded (LRU) with an optional TTL; only plain text queries are cached (a node_id or tenancy filter bypasses it), and every store clears the cache so a new write is never hidden by a stale read. SemanticCache is embedder-agnostic via the QueryVectorizer seam; under feature vector the pinned TextEmbedder is adapted to it via EmbedderVectorizer, and VectorMemoryBackend::into_cached wraps a backend reusing its own embedder. Enable it in artesian.toml (vector backends only):

[memory.semantic_cache]
enabled = true
capacity = 256          # max cached queries (LRU)
min_similarity = 0.95   # cosine threshold for a hit
ttl_seconds = 300       # optional entry expiry

The CLI and MCP wrap the opened vector backend automatically when this block is enabled.

Hybrid And RRF

Artesian uses reciprocal rank fusion with rank_constant = 60.0 by default. A document at rank r contributes 1 / (rank_constant + r) to its fused score. Duplicate node_id results across channels merge into one hit, preserving deterministic drill-down.

Vector engines may advertise supports_server_side_hybrid. When they do, VectorMemoryBackend delegates hybrid search to the engine. When they do not, Artesian runs keyword and vector searches separately and fuses them with the same RRF implementation.

FilesBackend

FilesBackend stores OKF markdown records under .artesian/memory/YYYY-MM-DD/<id>.md. It writes YAML --- frontmatter with required type: memory, recommended tags/timestamp, and Artesian extensions such as node_id, tier, and optional tenancy fields. It still reads legacy TOML +++ records.

Hybrid behavior:

  • Keyword search is local text matching over content, tags, and metadata.
  • Vector search is not available.
  • hybrid_rrf uses the default MemoryBackend implementation, so both channels are keyword searches unless a caller supplies different query text.

SqliteVecBackend

SqliteVecBackend is VectorMemoryBackend<SqliteVecVectorStore>.

Storage:

  • rusqlite owns the local database file.
  • sqlite-vec provides the vec0 vector table.
  • SQLite FTS5 provides keyword/BM25 search.
  • Payload JSON is stored beside the vector rows for deterministic get_node and idempotent backfill.
  • Connections use WAL and busy_timeout; writers are serialized inside the process.

Hybrid behavior:

  • supports_server_side_hybrid = false.
  • Artesian runs FTS5 BM25 keyword search and sqlite-vec vector search separately.
  • Results are fused by Artesian RRF.

Storage quantization: set quantization: "int8" on a VectorCollection to store vectors as signed 8-bit integers (1 byte/dim) instead of float32 (4 bytes/dim). This is honest int8 scalar quantization — a 4× storage reduction. LEANN's 97% figure requires pruned-graph recomputation and is not claimed here. Float32 remains the default; existing collections are not affected.

Default CLI config stores the SQLite file at .artesian/memory.sqlite3 when backend = "sqlite-vec".

QdrantBackend

QdrantBackend is VectorMemoryBackend<QdrantVectorStore>.

Storage:

  • Qdrant owns the collection and vector index.
  • Artesian stores the normalized memory payload in Qdrant point payload.
  • Upserts use wait=true for read-after-write behavior.
  • New collections are created with a collection-level HNSW config (m=16, ef_construct=100) — a balanced recall/build trade-off for the 10³–10⁵-point collections Artesian targets.
  • quantization: "int8" is applied at the collection level as Qdrant scalar quantization (quantile=0.99, always_ram=true): 4×-smaller vectors stay in RAM for fast scoring while the full-precision originals on disk back rescoring. Float32 stays the default.
  • Payload indexes are typed per field: full-text on content, datetime on created_at (so recency/time-range filters use an index instead of a full scan), integer on token counts, and keyword on node_id and the tenancy fields.
  • The first shared embedding default is pinned to intfloat/multilingual-e5-small with 384 dimensions.

Hybrid behavior:

  • supports_server_side_hybrid = false today because Artesian does not yet configure a sparse vector channel.
  • Artesian runs vector search through Qdrant and keyword fallback over Qdrant payload scroll, then fuses with RRF.
  • Future sparse support can flip capabilities without changing MemoryBackend callers.

Run a local Qdrant for development:

docker compose -f deploy/qdrant/compose.yml up -d
QDRANT_URL=http://127.0.0.1:6333 \
  cargo test -p aquifer --features qdrant --test qdrant -- --ignored

Do not hardcode Qdrant hosts in code. On default ports, Artesian accepts one QDRANT_URL / qdrant_url: :6333 is treated as REST and derives gRPC :6334; :6334 derives REST :6333. Use QDRANT_REST_URL / qdrant_rest_url only when the REST API is not the default sibling of the configured gRPC endpoint. CLI setup/import preflights both endpoints before writing memory.

PgVectorBackend

PgVectorStore (feature pgvector) adapts PostgreSQL + pgvector to the VectorStore trait, so a team already running Postgres can use it as the shared memory store with no extra service. It is exercised by a gated integration test (#[ignore] unless the database URL is set).

Adding a vector backend (the VectorStore adapter pattern)

A new vector engine is a thin adapter, not a fork. Implement the six VectorStore methods and the generic VectorMemoryBackend<V> gives you embedding, chunk-on-store, RRF hybrid, reranking, L0-L3 tiering, payload tenancy, and node_id drill-down for free — no core change.

Worked example: crates/aquifer/src/pgvector.rs (feature pgvector).

  1. Feature + deps — add a Cargo feature and optional client deps; gate the module with #[cfg(feature = "<name>")].
  2. Store typestruct YourVectorStore holding the connection/config, with connect(config).
  3. impl VectorStore:
  4. ensure_collection — create the collection/table with the right vector dimension and distance;
  5. ensure_payload_index — index the tenancy/keyword payload fields;
  6. upsert — write points { id, vector, payload };
  7. search — vector ANN and/or keyword, honoring the normalized Filter (eq / in / range / exists, with must / should / must_not);
  8. get — fetch a point by id (used for dedup and drill-down);
  9. capabilities — advertise e.g. supports_server_side_hybrid; return false and Artesian runs RRF itself. Optionally impl VectorCollectionAdmin for snapshot / migrate support.
  10. Aliaspub type YourBackend = VectorMemoryBackend<YourVectorStore>;.
  11. Gated test — a #[ignore] integration test proving store → find → hybrid against a live instance; read the host from an env var, never hardcode it.

Keep the trait minimal: do not push embedding, RRF, or chunking into the adapter — those stay in VectorMemoryBackend so every engine behaves identically. Never log credentials.

Composing With Strong External Retrieval Engines

Artesian does not need to compete with specialized recall engines on raw retrieval quality. Treat a strong external engine — for example an info-theoretic, hybrid, or domain-specific retriever — as a VectorStore adapter. The adapter receives Artesian's normalized points { id, vector, payload }, honors Filter for tenancy and scoped recall, and returns VectorSearchHit values from its own ANN, hybrid, sparse, or reranked search pipeline.

Because the seam stays at VectorStore, the external engine inherits Artesian behavior above the substrate:

  • chunk-on-store and parent node_id drill-down;
  • keyword/vector hybrid RRF when the engine does not advertise server-side hybrid;
  • optional reranking, entity signals, relation expansion, and small-to-big context windows;
  • tenancy fields (scope, agent_id, session_id, task_id, user_id) through normalized payload filters.

A minimal stub shape is:

// SPDX-License-Identifier: Apache-2.0
use futures_util::{future::BoxFuture, FutureExt};
use aquifer::{
    MemoryResult, PayloadIndex, VectorCollection, VectorMemoryBackend, VectorPoint,
    VectorSearch, VectorSearchHit, VectorStore, VectorStoreCapabilities,
};

pub struct ExternalRecallStore {
    // client/config for the external engine
}

pub type ExternalRecallBackend = VectorMemoryBackend<ExternalRecallStore>;

impl VectorStore for ExternalRecallStore {
    fn ensure_collection(&self, _: VectorCollection) -> BoxFuture<'_, MemoryResult<()>> {
        async { Ok(()) }.boxed()
    }

    fn ensure_payload_index(&self, _: &str, _: PayloadIndex) -> BoxFuture<'_, MemoryResult<()>> {
        async { Ok(()) }.boxed()
    }

    fn upsert(&self, _: &str, _: Vec<VectorPoint>) -> BoxFuture<'_, MemoryResult<()>> {
        async { Ok(()) }.boxed()
    }

    fn search(&self, _: &str, _: VectorSearch) -> BoxFuture<'_, MemoryResult<Vec<VectorSearchHit>>> {
        async { Ok(Vec::new()) }.boxed()
    }

    fn get(&self, _: &str, _: &str) -> BoxFuture<'_, MemoryResult<Option<VectorPoint>>> {
        async { Ok(None) }.boxed()
    }

    fn capabilities(&self) -> VectorStoreCapabilities {
        VectorStoreCapabilities {
            supports_server_side_hybrid: true,
            supports_sparse: true,
        }
    }
}

Keep the adapter responsible only for storage/search translation. Leave chunking, embedding model compatibility checks, relation expansion, and memory policy in VectorMemoryBackend.

Reserved backends

TencentDBBackend remains a reserved backend name behind the same MemoryBackend trait.