
Loop Engineering — autonomous, memory-first agent loops

You stop typing the next prompt and instead design the loop that prompts the agent — a system that finds work, does it, checks it, and writes down what happened, until a goal holds. The hard part of a long loop is not intelligence; it is memory: a loop fails when the agent forgets. Artesian is the memory layer that keeps such a loop on track.

This is the concept and a mini-guide for running autonomous, multi-agent loops on top of Artesian. It composes primitives Artesian already ships; it is not a separate runtime.

The loop, in one picture

Every iteration runs the same memory-first cycle (after mem0's six-stage loop and the Claude Agent SDK loop):

            ┌──────────────────────────────────────────────────────────┐
            │  recall ─► assemble ─► decide ─► act ─► observe ─► commit │
            │   (find)    (CCS)     (model)  (tools)  (verify)  (gate)  │
            └─────────────────────────────▲────────────────────────────┘
                                          └── repeat until the goal holds
  • recall — pull only the high-signal slice for this step (memory.find), not the whole history.
  • assemble — the bounded Committed Context State (CCS) is what the agent actually reads.
  • decide / act — the model calls tools.
  • observe / verify — a separate judge checks the result (the maker never grades itself).
  • commit — the qualify-gate decides what durable learning enters memory.
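
The cycle above can be sketched in a few lines. This is a toy illustration, not Artesian's implementation: the `Memory` class, `run_loop`, and the callables passed to it are hypothetical stand-ins for memory.find, the model, the tools, the judge, and the commit gate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Memory:
    """Toy in-process store standing in for a durable memory layer (hypothetical)."""
    atoms: List[str] = field(default_factory=list)

    def find(self, query: str, limit: int = 3) -> List[str]:
        # Naive relevance: keep atoms that share at least one word with the query.
        words = set(query.lower().split())
        hits = [a for a in self.atoms if words & set(a.lower().split())]
        return hits[-limit:]  # only a bounded slice, never the whole history

    def commit(self, atom: str) -> None:
        if atom not in self.atoms:  # skip exact duplicates
            self.atoms.append(atom)


def run_loop(goal: str, memory: Memory,
             decide: Callable[[dict], str],
             act: Callable[[str], object],
             verify: Callable[[object], bool],
             max_turns: int = 5) -> Optional[int]:
    """Return the turn on which the goal held, or None if the budget ran out."""
    for turn in range(1, max_turns + 1):
        recalled = memory.find(goal)                    # recall
        context = {"goal": goal, "recalled": recalled}  # assemble
        action = decide(context)                        # decide
        result = act(action)                            # act
        ok = verify(result)                             # observe / verify (separate from decide)
        outcome = "pass" if ok else "fail"
        memory.commit(f"turn {turn}: {action} -> {outcome}")  # commit
        if ok:
            return turn
    return None


if __name__ == "__main__":
    mem = Memory()
    # The "agent" has no state of its own: it infers what to try next from recall.
    turns = run_loop(
        "guess the number", mem,
        decide=lambda ctx: f"guess {len(ctx['recalled']) + 1}",
        act=lambda action: int(action.split()[1]),
        verify=lambda value: value == 3,
    )
    print(turns, mem.atoms)
```

The point of the toy is that `decide` is stateless: progress between turns comes only from what was committed and then recalled.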

The agent forgets between turns; the repository does not. State that must survive a turn lives outside the context window — in Artesian's memory and the self-repair anchor, so a loop survives compaction and disconnects.

Three modules of orchestration

A multi-agent loop is built from three decisions (after Skill-MAS, arXiv:2606.18837):

  1. Task decomposition (the what) — break the goal into evaluable sub-tasks with success criteria. Lands on the headrace task board.
  2. Agent engineering (the who) — instantiate specialized teammates (lead / workers / judge), each a role + tools, possibly different models. This is a flume.
  3. Workflow orchestration (the how) — choose a topology: sequential, hierarchical, or loop, with a verifier gate at each step.
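
As a rough illustration of the three decisions, the sketch below models sub-tasks with success criteria, workers that claim them, and a judge gate in front of "done". The `SubTask` and `TaskBoard` types and the `judge` function are hypothetical; they are not the headrace or flume APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SubTask:
    name: str
    success: Callable[[str], bool]   # the evaluable success criterion (the "what")
    claimed_by: Optional[str] = None
    done: bool = False


class TaskBoard:
    """Hypothetical in-memory board; a real one would be durable and shared."""

    def __init__(self) -> None:
        self.tasks: Dict[str, SubTask] = {}

    def add(self, task: SubTask) -> None:
        self.tasks[task.name] = task

    def claim(self, worker: str) -> Optional[SubTask]:
        # Hand out the first open, unclaimed task (the "who").
        for task in self.tasks.values():
            if task.claimed_by is None and not task.done:
                task.claimed_by = worker
                return task
        return None

    def submit(self, task: SubTask, output: str,
               judge: Callable[[SubTask, str], bool]) -> bool:
        # Verifier gate: only judge-accepted work is marked done (the "how").
        accepted = judge(task, output)
        if accepted:
            task.done = True
        else:
            task.claimed_by = None  # release the task for another attempt
        return accepted


def judge(task: SubTask, output: str) -> bool:
    """The judge applies the task's criterion; the worker never grades itself."""
    return task.success(output)


if __name__ == "__main__":
    board = TaskBoard()
    board.add(SubTask("parse", success=lambda out: out == "ok"))
    task = board.claim("worker-1")
    print(board.submit(task, "bad", judge))  # rejected, task is released
    task = board.claim("worker-2")
    print(board.submit(task, "ok", judge))   # accepted, task is done
```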

The five harness building blocks → Artesian

Loop engineering sits on a reliable harness (after [Learn Harness Engineering]). Each block maps to an Artesian crate:

Harness block   What it does                          Artesian
-------------   -----------------------------------   ------------------------------------------------
Loop            the run-until-done control loop       basin orchestration + /goal-style stop condition
Memory          durable state across turns/sessions   aquifer + headgate (CCS) + the self-repair anchor
Verification    catch premature "done"                the judge role (qualify-gate / a second model)
Isolation       clean state per teammate              sandbox (optional Docker) + per-scope memory
Tools           observable actions                    MCP tools served by artesian-mcp

Autonomy controls

Autonomous does not mean unbounded. A loop is governed by:

  • a stop condition — run until a verifiable goal holds (tests pass, a check returns true), not forever;
  • budget caps — max turns / max spend, so an open-ended prompt cannot run away;
  • the verifier gate — accepted outcomes pass the judge before they count as done;
  • periodic fresh starts — reset the working context to the anchor + targeted recall to fight drift on very long runs (the loop's memory, not its prompt, is reset);
  • per-scope memory: user / agent / run scopes keep a fleet from cross-contaminating while still sharing a coordination memory (after mem0's memory scopes).
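
A minimal sketch of the first two controls, a verifiable stop condition plus budget caps, is shown below. The `Budget` type, `run_bounded`, and the stop-reason strings are hypothetical names chosen for illustration, and the per-turn cost is simply whatever the `step` callable reports.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_turns: int
    max_spend: float       # in whatever unit `step` reports (tokens, dollars, ...)
    max_wall_secs: float


def run_bounded(step: Callable[[int], float],
                goal_holds: Callable[[], bool],
                budget: Budget,
                clock: Callable[[], float] = time.monotonic) -> Tuple[str, int]:
    """Run `step` until the goal holds or a cap trips; return (stop_reason, turns_run)."""
    start = clock()
    spent = 0.0
    for turn in range(1, budget.max_turns + 1):
        if clock() - start > budget.max_wall_secs:
            return "wall-clock", turn - 1
        spent += step(turn)            # do one unit of work and record its cost
        if goal_holds():               # verifiable stop condition, checked every turn
            return "goal", turn
        if spent >= budget.max_spend:
            return "spend", turn
    return "max-turns", budget.max_turns


if __name__ == "__main__":
    progress = {"fixed": 0}

    def step(turn: int) -> float:
        progress["fixed"] += 1
        return 0.5

    budget = Budget(max_turns=10, max_spend=100.0, max_wall_secs=60.0)
    print(run_bounded(step, lambda: progress["fixed"] >= 3, budget))
```

Returning an explicit stop reason keeps "the goal held" distinguishable from "the budget ran out", which matters when a later step decides whether to retry.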

Mini-guide: run a loop with different agents and models

Today, the loop is driven over MCP by a lead agent (e.g. Claude Code, Codex) using Artesian's tools. The shape:

  1. Bind roles to agents/models. artesian init detects installed agent CLIs; map lead / worker / judge to any of Claude / Codex / Gemini / opencode / a local model. See modes.
  2. Start a flume. Over MCP: agents.list → team.create → team.spawn the teammates.
  3. Decompose + dispatch. team.task.add the sub-tasks; workers team.task.claim and execute; coordinate via team.message.
  4. Verify before done. The judge reviews; only judge-accepted work is marked complete.
  5. Recall + commit each turn. Workers memory.find before acting and memory.commit durable learnings after — so run N reads what runs 1..N-1 learned.
  6. Resume anything. On compaction/disconnect, memory.anchor.recover restores the plan and next step; export/import the working state as an OCF bundle to move the loop to another runtime.

For a single bounded subtask you do not need a flume — orchestrate.delegate(worker) runs one worker under the judge gate.

artesian loop (available now). A convenience command drives this cycle directly — it repeats the worker action until the goal command exits 0 (the verifier gate), bounded by --max-turns and optionally --max-wall-secs:

artesian loop --goal "cargo test" --worker-cmd "codex exec 'fix the failing tests'" --max-turns 10 --max-wall-secs 3600

Each turn is memory-first end to end:

  1. recall → goal packet — the loop assembles a bounded, goal-scoped packet in ARTESIAN_PACKET: the goal, the invariants that must hold (memories tagged invariant, always injected regardless of relevance), the last failed verifier check (carried from the previous turn), and the most relevant memory. This is "hand the agent just the goal, invariants, and last failed check" — not a flat wiki dump. The raw recall is also passed as ARTESIAN_RECALL (alongside ARTESIAN_GOAL, ARTESIAN_RUN_ID, ARTESIAN_TURN) for back-compat. Store invariants once with artesian memory store "…" --tag invariant; preview a packet with artesian memory context --goal "…".
  2. anchor — a resume anchor is written so a crash or compaction mid-loop is recoverable.
  3. verify + commit — after the goal check, the turn's outcome is committed as a concise atom scoped to the run (session scope, session_id = <run id>, tagged loop/turn-N). Run scoping keeps the working trail out of your durable memory and lets a later sweep reclaim it by run id, so loops never clog the store.
  4. brakes + observability — before each turn the loop checks ~/.artesian/STOP and exits non-zero if it exists. Override that path with ARTESIAN_STOP_FILE. Each run writes JSONL to ~/.artesian/runs/<run id>.jsonl (override the directory with ARTESIAN_RUNS_DIR): one line per turn plus a final summary with the outcome, elapsed time, and stop reason.
  5. verified skill + spec — on success, the worker approach is stored as a durable, verified skill (tagged skill) and a sharpened verifier-backed spec (tagged spec). A later run of the same or a similar goal surfaces them in the packet's Known approach (verified) and Sharper specs (verified) sections. If a failed check is later corrected, the loop stores a short de-duplicated auto-invariant (tagged invariant) so future packets carry the learned constraint. The goal verifier still gates each turn, so stale learning falls back to a fresh attempt. Use --no-learn to disable these durable learning writes for a run.
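
The brake file and the run log described above can be handled with a few lines of standard-library Python. This is a sketch under assumptions: the helper names are hypothetical, the record fields are not assumed (records are returned as plain dicts), and the last line is treated as the summary only because the text says a finished run ends with one.

```python
import json
import os
from pathlib import Path
from typing import List, Mapping, Optional, Tuple


def stop_file(env: Mapping[str, str] = os.environ) -> Path:
    """Path of the brake file: ARTESIAN_STOP_FILE if set, else ~/.artesian/STOP."""
    return Path(env.get("ARTESIAN_STOP_FILE") or Path.home() / ".artesian" / "STOP")


def run_log(run_id: str, env: Mapping[str, str] = os.environ) -> Path:
    """Path of a run's JSONL log: ARTESIAN_RUNS_DIR if set, else ~/.artesian/runs."""
    base = env.get("ARTESIAN_RUNS_DIR")
    root = Path(base) if base else Path.home() / ".artesian" / "runs"
    return root / f"{run_id}.jsonl"


def request_stop(env: Mapping[str, str] = os.environ) -> Path:
    """Create the brake file so the loop exits before its next turn."""
    path = stop_file(env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path


def read_run(path: Path) -> Tuple[List[dict], Optional[dict]]:
    """Split a finished run's log into (per-turn records, final summary)."""
    records = [json.loads(line)
               for line in path.read_text(encoding="utf-8").splitlines()
               if line.strip()]
    if not records:
        return [], None
    return records[:-1], records[-1]
```

For a run that is still in progress the last line is a turn record rather than a summary, so a caller that tails a live log should not rely on the split.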

The worker is any shell command — a script or an agent CLI (codex exec, claude -p, …), so you can drive a different model per loop. --config selects the project's memory backend for recall/commit (it falls back to a local files backend under --root); --poll re-checks the goal each turn without a worker.
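
Because the worker is just a command, a small script can turn the per-turn variables into a prompt for whichever model it wraps. The sketch below assumes that ARTESIAN_PACKET and ARTESIAN_RECALL carry their content directly as environment-variable text (the source does not say whether they hold text or a path); `read_turn_inputs` and `build_prompt` are hypothetical helper names.

```python
import os
import sys
from typing import Dict, Mapping


def read_turn_inputs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the per-turn variables named in the text; missing ones become ''."""
    packet = env.get("ARTESIAN_PACKET", "")
    return {
        "goal": env.get("ARTESIAN_GOAL", ""),
        "run_id": env.get("ARTESIAN_RUN_ID", ""),
        "turn": env.get("ARTESIAN_TURN", ""),
        # Prefer the bounded goal packet; fall back to the raw recall.
        "context": packet or env.get("ARTESIAN_RECALL", ""),
    }


def build_prompt(inputs: Mapping[str, str]) -> str:
    """Build a small prompt from the goal and the memory context, if any."""
    lines = [f"Goal: {inputs['goal']}"]
    if inputs["context"]:
        lines += ["", "Context from memory:", inputs["context"]]
    return "\n".join(lines)


if __name__ == "__main__":
    # A real worker would pass this prompt to an agent CLI or model API;
    # here it is only printed.
    sys.stdout.write(build_prompt(read_turn_inputs()) + "\n")
```

Saved as, say, worker.py (a hypothetical file name), such a script could be passed to --worker-cmd like any other command.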

Why memory-first

Long loops fail in documented ways — context rot (coherence decays after ~20–30 turns), goal drift, re-ingesting one's own early mistakes as truth, repeating finished work. Every one is a memory failure. A loop with durable, curated, semantic memory turns the circle into a spiral: each pass writes something the next pass builds on. That memory layer is exactly what Artesian provides.

References (prior art this builds on)

Related Artesian docs: modes · teams (flume) · self-repair · task-tracking (headrace) · orchestration (basin).