Loop Engineering — autonomous, memory-first agent loops¶
You stop typing the next prompt and instead design the loop that prompts the agent — a system that finds work, does it, checks it, and writes down what happened, until a goal holds. The hard part of a long loop is not intelligence; it is memory: a loop fails when the agent forgets. Artesian is the memory layer that keeps such a loop on track.
This is the concept and a mini-guide for running autonomous, multi-agent loops on top of Artesian. It composes primitives Artesian already ships; it is not a separate runtime.
The loop, in one picture¶
Every iteration runs the same memory-first cycle (after mem0's six-stage loop and the Claude Agent SDK loop):
┌──────────────────────────────────────────────────────────┐
│ recall ─► assemble ─► decide ─► act ─► observe ─► commit │
│ (find)    (CCS)       (model)  (tools) (verify)   (gate) │
└─────────────────────────────▲────────────────────────────┘
                              └── repeat until the goal holds
- recall: pull only the high-signal slice for this step (`memory.find`), not the whole history.
- assemble: the bounded Committed Context State (CCS) is what the agent actually reads.
- decide / act: the model calls tools.
- observe / verify: a separate judge checks the result (the maker never grades itself).
- commit: the qualify-gate decides what durable learning enters memory.
The agent forgets between turns; the repository does not. State that must survive a turn lives outside the context window — in Artesian's memory and the self-repair anchor, so a loop survives compaction and disconnects.
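The cycle above can be sketched in a few lines. This is a minimal illustration, not Artesian's implementation: `Memory`, `run_turn`, the word-overlap recall, and the `limit` bound are hypothetical stand-ins, and only the stage names come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    """Hypothetical in-process stand-in for a durable memory store."""
    atoms: list[str] = field(default_factory=list)

    def find(self, query: str, limit: int = 3) -> list[str]:
        # recall: rank stored atoms by words shared with the query, keep the top few
        words = set(query.lower().split())
        scored = [(len(words & set(a.lower().split())), a) for a in self.atoms]
        hits = [a for score, a in sorted(scored, key=lambda p: -p[0]) if score > 0]
        return hits[:limit]

    def commit(self, atom: str) -> None:
        self.atoms.append(atom)


def run_turn(
    goal: str,
    memory: Memory,
    decide_and_act: Callable[[dict], str],
    verify: Callable[[str], bool],
    qualifies: Callable[[str], bool],
) -> bool:
    recalled = memory.find(goal)                     # recall
    context = {"goal": goal, "recalled": recalled}   # assemble (bounded state)
    result = decide_and_act(context)                 # decide / act
    ok = verify(result)                              # observe / verify (separate judge)
    lesson = f"{'ok' if ok else 'failed'}: {result}"
    if qualifies(lesson):                            # commit (qualify-gate)
        memory.commit(lesson)
    return ok
```

The worker (`decide_and_act`) and the judge (`verify`) are separate callables on purpose, so the maker never grades its own output, and the only state that survives the turn is what passes `qualifies`.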
Three modules of orchestration¶
A multi-agent loop is built from three decisions (after Skill-MAS, arXiv:2606.18837):
- Task decomposition (the what) — break the goal into evaluable sub-tasks with success criteria. Lands on the headrace task board.
- Agent engineering (the who) — instantiate specialized teammates (lead / workers / judge), each a role + tools, possibly different models. This is a flume.
- Workflow orchestration (the how) — choose a topology: sequential, hierarchical, or loop, with a verifier gate at each step.
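One way to picture the three decisions together is as plain data plus a small runner. The sketch below is illustrative only: `SubTask`, `Role`, and `run_sequential` are hypothetical names, the model label is a placeholder, and it covers just the sequential topology with a verifier gate after each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:                        # task decomposition (the what)
    name: str
    accepts: Callable[[str], bool]    # evaluable success criterion


@dataclass
class Role:                           # agent engineering (the who)
    name: str                         # e.g. "worker" or "judge"
    model: str                        # placeholder model label
    run: Callable[[str], str]


def run_sequential(
    tasks: list[SubTask],
    worker: Role,
    judge: Callable[[SubTask, str], bool],
) -> list[str]:
    """Workflow orchestration (the how): sequential, gated at each step."""
    done = []
    for task in tasks:
        output = worker.run(task.name)
        if not judge(task, output):   # verifier gate: stop at the first rejection
            break
        done.append(task.name)
    return done
```

Here the judge simply applies each sub-task's own success criterion; a hierarchical or loop topology would change the runner, not the data.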
The five harness building blocks → Artesian¶
Loop engineering sits on a reliable harness (after Learn Harness Engineering). Each block maps to an Artesian crate:
| Harness block | What it does | Artesian |
|---|---|---|
| Loop | the run-until-done control loop | basin orchestration + /goal-style stop condition |
| Memory | durable state across turns/sessions | aquifer + headgate (CCS) + the self-repair anchor |
| Verification | catch premature "done" | the judge role (qualify-gate / a second model) |
| Isolation | clean state per teammate | sandbox (optional Docker) + per-scope memory |
| Tools | observable actions | MCP tools served by artesian-mcp |
Autonomy controls¶
Autonomous does not mean unbounded. A loop is governed by:
- a stop condition: run until a verifiable goal holds (tests pass, a check returns true), not forever;
- budget caps: max turns / max spend, so an open-ended prompt cannot run away;
- the verifier gate: accepted outcomes pass the judge before they count as done;
- periodic fresh starts: reset the working context to the anchor + targeted recall to fight drift on very long runs (the loop's memory, not its prompt, is reset);
- per-scope memory: user/agent/run scopes keep a fleet from cross-contaminating while still sharing a coordination memory (after mem0's memory scopes).
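The stop condition and the budget caps compose into a small outer loop. The following is a sketch under stated assumptions, not Artesian's code: `run_until`, `LoopResult`, and the stop-reason strings are hypothetical, and the goal check and the worker are passed in as plain callables.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    done: bool
    turns: int
    reason: str


def run_until(
    goal_holds: Callable[[], bool],
    work: Callable[[int], None],
    max_turns: int,
    max_wall_secs: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> LoopResult:
    start = clock()
    for turn in range(1, max_turns + 1):
        # budget cap: stop before starting a turn once the wall-clock budget is spent
        if max_wall_secs is not None and clock() - start >= max_wall_secs:
            return LoopResult(False, turn - 1, "wall-clock budget exhausted")
        work(turn)
        # stop condition: a verifiable check, evaluated after every turn
        if goal_holds():
            return LoopResult(True, turn, "goal holds")
    return LoopResult(False, max_turns, "turn budget exhausted")
```

A goal check backed by a command could be written as `lambda: subprocess.run(["cargo", "test"]).returncode == 0` (with `import subprocess`), which matches the "goal command exits 0" idea without tying the loop to any one verifier.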
Mini-guide: run a loop with different agents and models¶
Today, the loop is driven over MCP by a lead agent (e.g. Claude Code, Codex) using Artesian's tools. The shape:
- Bind roles to agents/models. `artesian init` detects installed agent CLIs; map lead / worker / judge to any of Claude / Codex / Gemini / opencode / a local model. See modes.
- Start a flume. Over MCP: `agents.list` → `team.create` → `team.spawn` the teammates.
- Decompose + dispatch. `team.task.add` the sub-tasks; workers `team.task.claim` and execute; coordinate via `team.message`.
- Verify before done. The judge reviews; only judge-accepted work is marked complete.
- Recall + commit each turn. Workers `memory.find` before acting and `memory.commit` durable learnings after, so run N reads what runs 1..N-1 learned.
- Resume anything. On compaction/disconnect, `memory.anchor.recover` restores the plan and next step; export/import the working state as an OCF bundle to move the loop to another runtime.
For a single bounded subtask you do not need a flume: `orchestrate.delegate(worker)` runs one worker under the judge gate.
`artesian loop` (available now). A convenience command drives this cycle directly. It repeats the worker action until the goal command exits 0 (the verifier gate), bounded by `--max-turns` and optionally `--max-wall-secs`:

    artesian loop --goal "cargo test" --worker-cmd "codex exec 'fix the failing tests'" --max-turns 10 --max-wall-secs 3600

Each turn is memory-first end to end:
- recall → goal packet: the loop assembles a bounded, goal-scoped packet in `ARTESIAN_PACKET` containing the goal, the invariants that must hold (memories tagged `invariant`, always injected regardless of relevance), the last failed verifier check (carried from the previous turn), and the most relevant memory. This is "hand the agent just the goal, invariants, and last failed check", not a flat wiki dump. The raw recall is also passed as `ARTESIAN_RECALL` (alongside `ARTESIAN_GOAL`, `ARTESIAN_RUN_ID`, `ARTESIAN_TURN`) for back-compat. Store invariants once with `artesian memory store "…" --tag invariant`; preview a packet with `artesian memory context --goal "…"`.
- anchor: a resume anchor is written so a crash or compaction mid-loop is recoverable.
- verify + commit: after the goal check, the turn's outcome is committed as a concise atom scoped to the run (`session` scope, `session_id = <run id>`, tagged loop/turn-N). Run scoping keeps the working trail out of your durable memory and lets a later sweep reclaim it by run id, so loops never clog the store.
- brakes + observability: before each turn the loop checks `~/.artesian/STOP` and exits non-zero if it exists. Override that path with `ARTESIAN_STOP_FILE`. Each run writes JSONL to `~/.artesian/runs/<run id>.jsonl` (override the directory with `ARTESIAN_RUNS_DIR`): one line per turn plus a final summary with the outcome, elapsed time, and stop reason.
- verified skill + spec: on success, the worker approach is stored as a durable, verified skill (tagged `skill`) and a sharpened verifier-backed spec (tagged `spec`). A later run of the same or a similar goal surfaces them in the packet's Known approach (verified) and Sharper specs (verified) sections. If a failed check is later corrected, the loop stores a short de-duplicated auto-invariant (tagged `invariant`) so future packets carry the learned constraint. The goal verifier still gates each turn, so stale learning falls back to a fresh attempt. Use `--no-learn` to disable these durable learning writes for a run.

The worker is any shell command: a script or an agent CLI (`codex exec`, `claude -p`, …), so you can drive a different model per loop. `--config` selects the project's memory backend for recall/commit (it falls back to a local files backend under `--root`); `--poll` re-checks the goal each turn without a worker.
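Because the worker is just a command, a small script can consume the packet directly. The sketch below assumes that `ARTESIAN_PACKET` and `ARTESIAN_RECALL` carry their content as text in the environment (the description above does not say whether they hold content or a path) and that `ARTESIAN_TURN` parses as an integer. `TurnContext` and `read_turn_context` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TurnContext:
    goal: str
    run_id: str
    turn: int
    packet: str
    recall: str


def read_turn_context(env: Mapping[str, str] = os.environ) -> TurnContext:
    """Collect the per-turn variables the loop exports to the worker."""
    return TurnContext(
        goal=env.get("ARTESIAN_GOAL", ""),
        run_id=env.get("ARTESIAN_RUN_ID", ""),
        turn=int(env.get("ARTESIAN_TURN", "0")),   # assumption: integer turn counter
        packet=env.get("ARTESIAN_PACKET", ""),     # assumption: packet text, not a path
        recall=env.get("ARTESIAN_RECALL", ""),
    )


if __name__ == "__main__":
    ctx = read_turn_context()
    # Prefer the goal-scoped packet; fall back to the raw recall if it is empty.
    prompt = ctx.packet or ctx.recall
    print(f"[run {ctx.run_id} turn {ctx.turn}] goal: {ctx.goal}")
    print(prompt)
```

Saved under a hypothetical name such as `worker.py`, the script could be passed as `--worker-cmd "python worker.py"`; a real worker would hand `prompt` to a model or tool instead of printing it.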
Why memory-first¶
Long loops fail in documented ways — context rot (coherence decays after ~20–30 turns), goal drift, re-ingesting one's own early mistakes as truth, repeating finished work. Every one is a memory failure. A loop with durable, curated, semantic memory turns the circle into a spiral: each pass writes something the next pass builds on. That memory layer is exactly what Artesian provides.
References (prior art this builds on)¶
- Claude Agent SDK — the agent loop — https://code.claude.com/docs/en/agent-sdk/agent-loop (receive → evaluate → execute → repeat; compaction boundary; subagents; resume).
- mem0 — Loop Engineering for AI Agents (memory-first design) — https://mem0.ai/blog/loop-engineering-for-ai-agents-memory-first-design (the six-stage loop; user / agent / global memory scopes).
- Addy Osmani — Loop Engineering — https://addyosmani.com/blog/loop-engineering/ (the six loop primitives; "the agent forgets, the repo doesn't").
- Learn Harness Engineering — https://walkinglabs.github.io/learn-harness-engineering/en/ (Loop / Memory / Verification / Isolation / Tools; repository as the system of record).
- Skill-MAS — Evolving Meta-Skill for Automatic Multi-Agent Systems — arXiv:2606.18837 (orchestration as decompose / agent-engineer / orchestrate, evolved by reflection).
- ACC / Committed Context State — arXiv:2601.11653 (the bounded committed state the loop reads).
Related Artesian docs: modes · teams (flume) · self-repair · task-tracking (headrace) · orchestration (basin).