Why Artesian — and how it compares¶
The problem: three gaps the field has named but no OSS system closes together¶
Gap 1 — Recall ≠ use. Systems that saturate recall benchmarks still fail when memory must
guide action. The question is not "can you recall attempt 12?" but "given attempts 1–46, what
do you do on 47?" (MemoryArena, arXiv:2602.16313). Recall is necessary; it is not sufficient.
Artesian is the first OSS system that benchmarks both recall quality (LoCoMo / LongMemEval) and
memory-guides-action (the agentic task eval in gauge).
Gap 2 — Shared state without corruption. Naive file coordination silently corrupts under concurrent writes; the moment you add locking, atomic writes, indexing, and metadata discipline "you are no longer just using files — you are rebuilding a database" (Oracle, File Systems vs Databases for Agent Memory). Cursor's flat file-lock model degraded "20 agents to the throughput of 2–3" before they moved to optimistic concurrency. The owned, self-hosted version of this does not exist in OSS. Artesian provides it: a transactional, multi-writer substrate with per-scope isolation enforced by the system, not by convention.
Gap 3 — Context survival across auto-compaction and disconnect. Everyone calls memory "the
durable spine" of a loop, then implements it as "write to a markdown file." No system does real
self-repair: detect the compaction / reconnect boundary → re-anchor + targeted recall before the
next action. Artesian's headgate self-repair hook does this deterministically — see
docs/self-repair.md.
The one-line positioning: Artesian is the memory control plane for agent loops — the qualify-gate, bounded committed state, and drift/footprint measurement that turn a retrieval store into durable, action-guiding, owner-controlled memory.
Memory CONTROL: the wedge¶
Most agent memory systems solve retrieval: surface relevant records so the agent has more context. That is necessary, but not sufficient. Retrieval alone does not answer three questions that matter at scale:
- What is the agent's committed view of the world right now? Retrieval gives a ranked list; it does not define a bounded, authoritative state.
- What qualifies to enter that view? Without a gate, every write is trusted equally — drift, hallucination, and footprint inflate unchecked.
- How do you measure and control that drift? Without a bench, memory quality degrades silently.
Bousetouane (arXiv:2601.11653, "AI Agents Need Memory Control Over More Context", Jan 2026) formalises this as the Agent Cognitive Compressor (ACC): a control loop that separates the recall channel (read from any retrieval store) from the commit channel (what is written into a bounded, schema-governed Committed Context State / CCS). A qualify-gate sits between them: only information that passes the gate — verified, relevant, non-redundant — enters the CCS.
Artesian is a ground-up implementation of this model, layered as a control plane over any
retrieval store (Aquifer/OKF, sqlite-vec, Qdrant, mem0, Anthropic, or any MemoryBackend
adapter):
┌─────────────────────────────────────────────────────────┐
│ Artesian control plane │
│ │
recall │ memory.find ──► qualify-gate ──► CCS (bounded) │ commit
───────┼──────────────────────────────────────────────────────────┼────────►
│ any VectorStore/ headgate (Step 4) │ memory.store
│ FilesBackend │
└─────────────────────────────────────────────────────────┘
The qualify-gate is today approximated by the judge role in orchestrate mode (verifiers +
accept/reject loop); headgate — the dedicated CCS controller with schema-state tracking and
drift/hallucination/footprint metrics — is the Step 4 build. No OSS system implements ACC fully
today; Artesian is the first-mover.
Corrective-RAG and Graph-RAG¶
Corrective-RAG proposes verifying retrieved context before use and discarding low-quality retrievals. Artesian's ACC qualify-gate already implements this: every recall candidate is scored for relevance, novelty, and drift before it enters the bounded committed state. Artesian is a deployed Corrective-RAG — not a future direction.
Graph-RAG (relational memory via an entity-relation graph) is a complementary future direction.
The CCS schema has a relational_map slot reserved for it, but a graph store is not yet wired.
Graph-RAG is optional for agent loops that require structured relational queries; most loops do not
need it and are better served by the bounded CCS approach.
Why this matters beyond RAG¶
| Pure RAG | Artesian ACC | |
|---|---|---|
| State model | stateless query → ranked list | bounded CCS — authoritative committed state |
| Write path | append-only, all equal | qualify-gate: only verified, non-redundant, non-drifted entries enter CCS |
| Drift control | none | judge-eval of drift / hallucination / footprint per cycle |
| Recall cost | ~6–10 k tokens/query (mem0 benchmark) | ~1 k tokens/query (chunked OKF + small-to-big + adaptive budget) |
| Composes with | your retrieval store | any retrieval store (including mem0, Anthropic memory, existing Qdrant) |
What Artesian is — and is not¶
Artesian is, first, durable, semantic memory your agents own: the decisions, facts, and context they accumulate across sessions, kept in portable Open Knowledge Format markdown you can read, commit, and carry anywhere. That is the flagship — use only memory and nothing else about how you run your agent changes. Optionally, the same store is also an orchestration and agent-team layer (composable components you opt into, never required).
It is not:
- a cloud memory service you rent — Artesian runs locally; writes are free (no per-write LLM call) and your data never leaves your machine;
- a code-structure index like Codebase-Memory — that is a parsed graph of what your code is; Artesian stores what your agent learns, and the two compose;
- just a conversation log — it is consolidated, retrievable, tiered knowledge with a qualify gate.
How it compares¶
Against TencentDB Agent Memory specifically, the key differences:
| TencentDB Agent Memory | Artesian | |
|---|---|---|
| Scope | Memory only (capture → extract → recall) | Memory + ACC control plane + task tracking + master/worker/judge orchestration + sandbox |
| Integration | OpenClaw plugin / Hermes provider (framework-coupled) | MCP-first, agent-agnostic (Claude Code, Codex, Zed, opencode, …) + pluggable Agent adapters |
| Runtime | Node ≥22.16 + TypeScript | Rust — single static binary, no runtime |
| Vector store | SQLite + sqlite-vec (local-first; remote on roadmap) | Pluggable VectorStore: Files(OKF) / sqlite-vec / Qdrant (+ TencentDB-style adapter possible) |
| On-disk format | bespoke markdown/JSONL layout | Open Knowledge Format (OKF) — vendor-neutral, portable, interop with the OKF ecosystem |
| Concurrency | single-user, local-first | multi-project + multi-user + parallel (collection-per-project + payload tenancy) — see concurrency.md |
| Cross-tool memory | within its host framework | neutral shared store both Claude Code and Codex read (their native memories are siloed) |
| Upgrades | — | upgrade-survivable: OKF = source of truth, Qdrant = rebuildable index, migrate + version metadata (upgrades.md) |
| Orchestration safety | n/a (not an orchestrator) | verifiers-as-trust-boundary, judge-sole-committer, task DAG, worker workspace isolation |
What Artesian reuses from TencentDB (with credit): the L0–L3 tiering, hybrid+RRF retrieval, the
markdown white-box principle, node_id drill-down, and the benchmark-rigor mindset. A
TencentDB-style symbolic Mermaid "task canvas" for short-term memory is a natural future addition
on top of Artesian's WorkingMemory + session anchor.
One-line positioning: Artesian is a memory controller — an ACC control plane that layers
bounded committed context and a qualify gate over any retrieval store — with local-first Rust
storage, MCP-first integration, and optional master/worker/judge orchestration sharing the same
store. Use as little (just memory mode, ~1 k tokens/query retrieval) or as much (full, with
ACC qualify-gate and orchestration) as you want.
Direct competitors (general agent memory)¶
These solve the same core problem — durable memory for agents — and are the honest comparison set:
- mem0 (Apache-2.0) — the most prominent. An LLM extracts facts on every write, stored with entity linking; hybrid semantic + BM25 + temporal retrieval; strong published LoCoMo / LongMemEval results and a large token saving vs. full-context (see arXiv:2504.19413 for the exact figures), broad vector-DB and LLM support, and a hosted cloud. Artesian's wedge: writes are free and local (no per-write LLM call), memory is white-box OKF markdown you own (not an opaque or rented store), it runs zero-infra, and it is MCP-first / integrate-anything. Additionally: Artesian composes with mem0 as a retrieval backend under the ACC control plane — they are not mutually exclusive. We aim to match mem0's retrieval quality (opt-in LLM consolidation, entity/temporal signals are on the roadmap) while keeping the zero-cost, own-your-data default.
- Zep / Graphiti — temporal knowledge-graph memory with strong LongMemEval / DMR numbers; graph-centric and service-oriented. Artesian stays files-first and vendor-neutral (graph relations are a roadmap addition, not a required backend).
- Letta / MemGPT — an agent "memory OS" with a server and its own agent runtime. Artesian is lighter and non-intrusive: memory you add to your agent over MCP, not a runtime you adopt.
Honest take: these are well-funded and benchmark-strong. Artesian does not try to out-platform
them — it wins on ownership, simplicity, zero-cost local writes, freedom to integrate, and
on being the first to implement the ACC control-plane model: a qualify gate, bounded committed
state, and drift/hallucination/footprint measurement as first-class features. Published benchmarks — LoCoMo ≈ 0.475 (vector + reranking), LongMemEval-oracle ≈ 0.70 — are in
benchmarks/README.md. An agentic task score (memory-guides-action, not
just recall) ships in the gauge eval harness.
Adjacent / related projects¶
- open-engram — a brain-inspired memory library (TypeScript): sensory → working → episodic → semantic stores, a multi-stage consolidation pipeline, and RFR-scored demotion. Memory-only, framework-SDK (Mastra/LangChain.js), no MCP, no orchestration. Strongly validates the consolidation direction Artesian is taking; Artesian differs by being Rust + MCP-first + pluggable backends + orchestration, not a single-runtime library.
- openrelay — a model quota aggregator / router with a web dashboard that bridges credentials and routes requests from any tool to any provider. This is a different layer (model access), not memory or orchestration; it is complementary — Artesian could sit above an openrelay-style router. Its clean "connect any agent" UX is a good presentation model to learn from.
- h5i — an AI-aware Git sidecar (Rust): per-commit agent
context/reasoning in dedicated
refs/h5i/*, Agent Radio typed inter-agent messaging with union-merge, output token-reduction (collapse tool output, keep recoverable raw), and progressive sandbox isolation (workspace → process → supervised → container). A different layer from Artesian — provenance, comms, and confinement over Git, not semantic retrieval — and complementary: they could compose. We borrow ideas, with credit: its typed agent-handoff protocol informs Artesian's orchestration handoffs, its isolation tiers informsandbox, and its "collapse but never discard, recover by id" mirrors Artesian's L0–L3 +node_iddrill-down. - Codebase-Memory (MIT) — a Tree-Sitter structural code graph (who-calls-what, routes, impact) over MCP, single C binary, zero-infra. A different kind of memory — what your code is, parsed from source — versus Artesian's what your agent learns. Explicitly complementary: an agent can use Codebase-Memory for repo structure and Artesian for durable knowledge. Its local-first, deterministic, single-binary, commit-the-artifact philosophy mirrors and validates Artesian's own.
Converging evidence shaping the roadmap¶
Karpathy's LLM-wiki,
TencentDB's L0–L3, and open-engram's episodic→semantic consolidation all point the same way:
curated, consolidated memory (atomic facts + entity/concept/scenario pages + an index.md
read-first catalog) beats flat record dumps — for both retrieval precision and token cost. Below
~50–100 k tokens a curated wiki/index-first context can even beat vector RAG; vector retrieval
wins at larger scale. Artesian's plan is to do both: index-first + targeted memory.find,
with consolidation populating the tiers — see the memory roadmap in memory.md.
The ACC model (arXiv:2601.11653) provides the formal frame for why control over that consolidation — not just retrieval — is the right level of abstraction. Bounded committed state does not replace vector search; it governs what enters it.
Acknowledgements¶
The memory-control framing in this document builds on Bousetouane's analysis in arXiv:2601.11653 ("AI Agents Need Memory Control Over More Context", Jan 2026). We are grateful for the clear formalisation of the ACC/CCS model; Artesian aims to be a concrete, open-source implementation of those ideas.