| name | description |
|---|---|
| harness | This skill should be used for multi-session autonomous agent work requiring progress checkpointing, failure recovery, and task dependency management. Triggers on '/harness' command, or when a task involves many subtasks needing progress persistence, sleep/resume cycles across context windows, recovery from mid-task failures with partial state, or distributed work across multiple agent sessions. Synthesized from Anthropic and OpenAI engineering practices for long-running agents. |
Harness — Long-Running Agent Framework
Executable protocol enabling any agent task to run continuously across multiple sessions with automatic progress recovery, task dependency resolution, failure rollback, and standardized error handling.
Design Principles
- Design for the agent, not the human — Test output, docs, and task structure are the agent's primary interface
- Progress files ARE the context — When context window resets, progress files + git history = full recovery
- Premature completion is the #1 failure mode — Structured task lists with explicit completion criteria prevent declaring victory early
- Standardize everything grep-able — ERROR on same line, structured timestamps, consistent prefixes
- Fast feedback loops — Pre-compute stats, run smoke tests before full validation
- Idempotent everything — Init scripts, task execution, environment setup must all be safe to re-run
- Fail safe, not fail silent — Every failure must have an explicit recovery strategy
Commands
/harness init <project-path> # Initialize harness files in project
/harness run # Start/resume the infinite loop
/harness status # Show current progress and stats
/harness add "task description" # Add a task to the list
Progress Persistence (Dual-File System)
Maintain two files in the project working directory:
harness-progress.txt (Append-Only Log)
Free-text log of all agent actions across sessions. Never truncate.
[2025-07-01T10:00:00Z] [SESSION-1] INIT Harness initialized for project /path/to/project
[2025-07-01T10:00:05Z] [SESSION-1] INIT Environment health check: PASS
[2025-07-01T10:00:10Z] [SESSION-1] LOCK acquired (pid=12345)
[2025-07-01T10:00:11Z] [SESSION-1] Starting [task-001] Implement user authentication (base=def5678)
[2025-07-01T10:05:00Z] [SESSION-1] CHECKPOINT [task-001] step=2/4 "auth routes created, tests pending"
[2025-07-01T10:15:30Z] [SESSION-1] Completed [task-001] (commit abc1234)
[2025-07-01T10:15:31Z] [SESSION-1] Starting [task-002] Add rate limiting (base=abc1234)
[2025-07-01T10:20:00Z] [SESSION-1] ERROR [task-002] [TASK_EXEC] Redis connection refused
[2025-07-01T10:20:01Z] [SESSION-1] ROLLBACK [task-002] git reset --hard abc1234
[2025-07-01T10:20:02Z] [SESSION-1] STATS tasks_total=5 completed=1 failed=1 pending=3 blocked=0 attempts_total=2 checkpoints=1
harness-tasks.json (Structured State)
{
"version": 2,
"created": "2025-07-01T10:00:00Z",
"session_config": {
"max_tasks_per_session": 20,
"max_sessions": 50
},
"tasks": [
{
"id": "task-001",
"title": "Implement user authentication",
"status": "completed",
"priority": "P0",
"depends_on": [],
"attempts": 1,
"max_attempts": 3,
"started_at_commit": "def5678",
"validation": {
"command": "npm test -- --testPathPattern=auth",
"timeout_seconds": 300
},
"on_failure": {
"cleanup": null
},
"error_log": [],
"checkpoints": [],
"completed_at": "2025-07-01T10:15:30Z"
},
{
"id": "task-002",
"title": "Add rate limiting",
"status": "failed",
"priority": "P1",
"depends_on": [],
"attempts": 1,
"max_attempts": 3,
"started_at_commit": "abc1234",
"validation": {
"command": "npm test -- --testPathPattern=rate-limit",
"timeout_seconds": 120
},
"on_failure": {
"cleanup": "docker compose down redis"
},
"error_log": ["[TASK_EXEC] Redis connection refused"],
"checkpoints": [],
"completed_at": null
},
{
"id": "task-003",
"title": "Add OAuth providers",
"status": "pending",
"priority": "P1",
"depends_on": ["task-001"],
"attempts": 0,
"max_attempts": 3,
"started_at_commit": null,
"validation": {
"command": "npm test -- --testPathPattern=oauth",
"timeout_seconds": 180
},
"on_failure": {
"cleanup": null
},
"error_log": [],
"checkpoints": [],
"completed_at": null
}
],
"session_count": 1,
"last_session": "2025-07-01T10:20:02Z"
}
Task statuses: pending → in_progress (transient, set only during active execution) → completed or failed. A task found as in_progress at session start means the previous session was interrupted — handle via Context Window Recovery Protocol.
Session boundary: A session starts when the agent begins executing the Session Start protocol and ends when a Stopping Condition is met or the context window resets. Each session gets a unique SESSION-N identifier (N = session_count after increment).
Concurrency Control
Before modifying harness-tasks.json, acquire an exclusive lock using portable mkdir (atomic on all POSIX systems, works on both macOS and Linux):
# Acquire lock (fail fast if another agent is running)
# Hash the project path; the braces make the sha256sum fallback read the same stdin,
# and cut applies to whichever tool produced the hash.
LOCK_HASH=$(printf '%s' "$(pwd)" | { shasum -a 256 2>/dev/null || sha256sum; } | cut -c1-8)
LOCKDIR="/tmp/harness-$LOCK_HASH.lock"
if ! mkdir "$LOCKDIR" 2>/dev/null; then
  # Check if lock holder is still alive
  LOCK_PID=$(cat "$LOCKDIR/pid" 2>/dev/null)
  if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
    echo "ERROR: Another harness session is active (pid=$LOCK_PID)"; exit 1
  fi
  # Stale lock: atomically reclaim via mv to avoid TOCTOU race
  STALE="$LOCKDIR.stale.$$"
  if mv "$LOCKDIR" "$STALE" 2>/dev/null; then
    rm -rf "$STALE"
    mkdir "$LOCKDIR" || { echo "ERROR: Lock contention"; exit 1; }
    echo "WARN: Removed stale lock${LOCK_PID:+ from pid=$LOCK_PID}"
  else
    echo "ERROR: Another agent reclaimed the lock"; exit 1
  fi
fi
echo "$$" > "$LOCKDIR/pid"
trap 'rm -rf "$LOCKDIR"' EXIT
Log lock acquisition: [timestamp] [SESSION-N] LOCK acquired (pid=<PID>)
Log lock release: [timestamp] [SESSION-N] LOCK released
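The lock is held for the entire session. The trap EXIT handler releases it automatically on normal exit, errors, or catchable signals such as SIGINT and SIGTERM. SIGKILL or a machine crash skips the handler; the stale-lock check above reclaims the lock in that case. Never release the lock between tasks within a session.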
Infinite Loop Protocol
Session Start (Execute Every Time)
- Read state: Read the last 200 lines of harness-progress.txt and the full harness-tasks.json. If the JSON is unparseable, see JSON corruption recovery in Error Handling.
- Read git: Run git log --oneline -20 and git diff --stat to detect uncommitted work
- Acquire lock: Fail if another session is active
- Recover interrupted tasks (see Context Window Recovery below)
- Health check: Run harness-init.sh if it exists
- Track session: Increment session_count in JSON. Check session_count against max_sessions; if reached, log STATS and STOP. Initialize per-session task counter to 0.
- Pick next task using Task Selection Algorithm below
Task Selection Algorithm
Before selecting, run dependency validation:
- Cycle detection: For each non-completed task, walk depends_on transitively. If any task appears in its own chain, mark it failed with [DEPENDENCY] Circular dependency detected: task-A -> task-B -> task-A. Self-references (depends_on includes own id) are also cycles.
- Blocked propagation: If a task's depends_on includes a task that is failed and will never be retried (either attempts >= max_attempts OR its error_log contains a [DEPENDENCY] entry), mark the blocked task as failed with [DEPENDENCY] Blocked by failed task-XXX. Repeat until no more tasks can be propagated.
Then pick the next task in this priority order:
- Tasks with status: "pending" where ALL depends_on tasks are completed, sorted by priority (P0 > P1 > P2), then by id (lowest first)
- Tasks with status: "failed" where attempts < max_attempts and ALL depends_on are completed, sorted by priority, then oldest failure first
- If no eligible tasks remain → log final STATS → STOP
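The selection order above can be sketched with jq. This is an assumption: the harness itself lists only Bash, file read/write, and git as requirements, so jq is an extra dependency. The function name pick_next_task is hypothetical, dependency validation is assumed to have run already, and because the task schema stores no failure timestamp, retries fall back to id order instead of oldest failure first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the id of the next runnable task, or nothing.
# Assumes jq is installed and dependency validation has already run.
pick_next_task() {
  local file="${1:-harness-tasks.json}"
  jq -r '
    # ids of completed tasks, used to test whether dependencies are satisfied
    (.tasks | map(select(.status == "completed") | .id)) as $done
    | [ .tasks[]
        | select((.depends_on - $done) | length == 0)
        | select(.status == "pending"
                 or (.status == "failed" and .attempts < .max_attempts))
        # pass 1 = pending tasks, pass 2 = retryable failures
        | . + {pass: (if .status == "pending" then 1 else 2 end)} ]
    | sort_by(.pass, .priority, .id)
    | .[0].id // empty
  ' "$file"
}
```

An empty result means no eligible task remains, which corresponds to the log final STATS and STOP branch.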
Task Execution Cycle
For each task, execute this exact sequence:
- Claim: Record started_at_commit = current HEAD hash. Set status to in_progress, log Starting [<task-id>] <title> (base=<hash>)
- Execute with checkpoints: Perform the work. After each significant step, log:
  [timestamp] [SESSION-N] CHECKPOINT [task-id] step=M/N "description of what was done"
  Also append to the task's checkpoints array: { "step": M, "total": N, "description": "...", "timestamp": "ISO" }
- Validate: Run the task's validation.command wrapped with timeout: timeout <timeout_seconds> <command>. If no validation command, skip. Before running, verify the command exists (e.g., command -v <binary>); if missing, treat as ENV_SETUP error.
  - Command exits 0 → PASS
  - Command exits non-zero → FAIL
  - Command exceeds timeout → TIMEOUT
- Record outcome:
  - Success: status=completed, set completed_at, log Completed [<task-id>] (commit <hash>), git commit
  - Failure: increment attempts, append error to error_log. Verify started_at_commit exists via git cat-file -t <hash>; if missing, mark failed at max_attempts. Otherwise execute git reset --hard <started_at_commit> and git clean -fd to rollback ALL commits and remove untracked files. Execute on_failure.cleanup if defined. Log ERROR [<task-id>] [<category>] <message>. Set status=failed (Task Selection Algorithm pass 2 handles retries when attempts < max_attempts)
- Track: Increment per-session task counter. If max_tasks_per_session reached, log STATS and STOP.
- Continue: Immediately pick next task (zero idle time)
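The Validate step can be sketched as a small classifier. The function name run_validation is hypothetical, and the sketch assumes the GNU coreutils timeout command, which exits with status 124 when the limit is hit and may be absent on some systems. Because 124 is also non-zero, it has to be checked before the generic FAIL case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one validation run.
# Usage: run_validation <timeout_seconds> "<command string>"
# Prints PASS, FAIL, TIMEOUT, or ENV_SETUP.
run_validation() {
  local limit="$1" cmd="$2"
  local binary="${cmd%% *}"   # first word of the command string
  # A missing binary is an environment problem, not a task failure.
  if ! command -v "$binary" >/dev/null 2>&1; then
    echo "ENV_SETUP"
    return 0
  fi
  local rc=0
  timeout "$limit" bash -c "$cmd" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo "PASS" ;;
    124) echo "TIMEOUT" ;;   # GNU timeout exit status when the limit is exceeded
    *)   echo "FAIL" ;;
  esac
}
```

For example, run_validation 300 "npm test -- --testPathPattern=auth" would classify the first task's validation. The first-word check is a simplification: a command string that starts with an environment assignment such as FOO=1 would be misreported as ENV_SETUP.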
Stopping Conditions
- All tasks completed
- All remaining tasks failed at max_attempts or blocked by failed dependencies
- session_config.max_tasks_per_session reached for this session
- session_config.max_sessions reached across all sessions
- User interrupts
Context Window Recovery Protocol
When a new session starts and finds a task with status: "in_progress":
- Check git state:
  git diff --stat # Uncommitted changes?
  git log --oneline -5 # Recent commits since task started?
  git stash list # Any stashed work?
- Check checkpoints: Read the task's checkpoints array to determine last completed step
- Decision matrix (verify recent commits belong to this task by checking commit messages for the task-id):
| Uncommitted? | Recent task commits? | Checkpoints? | Action |
|---|---|---|---|
| No | No | None | Mark failed with [SESSION_TIMEOUT] No progress detected, increment attempts |
| No | No | Some | Verify file state matches checkpoint claims. If files reflect checkpoint progress, resume from last step. If not, mark failed — work was lost |
| No | Yes | Any | Run validation.command. If passes → mark completed. If fails → git reset --hard <started_at_commit>, mark failed |
| Yes | No | Any | Run validation WITH uncommitted changes present. If passes → commit, mark completed. If fails → git reset --hard <started_at_commit> + git clean -fd, mark failed |
| Yes | Yes | Any | Commit uncommitted changes, run validation.command. If passes → mark completed. If fails → git reset --hard <started_at_commit> + git clean -fd, mark failed |
- Log recovery:
[timestamp] [SESSION-N] RECOVERY [task-id] action="<action taken>" reason="<reason>"
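The decision matrix can be read as a pure function of three observations. The sketch below is one possible encoding; the function name recovery_action and the action labels it prints are hypothetical shorthand for the table rows, not identifiers defined by the harness.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the three recovery observations to a matrix row.
# Usage: recovery_action <uncommitted: yes|no> <task_commits: yes|no> <checkpoint_count>
recovery_action() {
  local uncommitted="$1" task_commits="$2" checkpoints="$3"
  case "$uncommitted/$task_commits" in
    no/no)
      if [ "$checkpoints" -eq 0 ]; then
        echo "mark_failed_session_timeout"       # no progress detected
      else
        echo "verify_files_then_resume_or_fail"  # checkpoints claim progress
      fi ;;
    no/yes)  echo "validate_then_complete_or_reset" ;;
    yes/no)  echo "validate_uncommitted_then_commit_or_reset" ;;
    yes/yes) echo "commit_then_validate_or_reset" ;;
    *)       echo "invalid_input"; return 2 ;;
  esac
}
```

Keeping the decision separate from the git commands that gather the inputs makes each row easy to check against the table.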
Error Handling & Recovery Strategies
Each error category has a default recovery strategy:
| Category | Default Recovery | Agent Action |
|---|---|---|
| ENV_SETUP | Re-run init, then STOP if still failing | Run harness-init.sh again immediately. If it fails twice, log and stop: the environment is broken |
| TASK_EXEC | Rollback via git reset --hard <started_at_commit>, retry | Verify started_at_commit exists (git cat-file -t <hash>). If missing, mark failed at max_attempts. Otherwise reset, run on_failure.cleanup if defined, retry if attempts < max_attempts |
| TEST_FAIL | Rollback via git reset --hard <started_at_commit>, retry | Reset to started_at_commit, analyze test output to identify fix, retry with targeted changes |
| TIMEOUT | Kill process, execute cleanup, retry | Wrap validation with timeout <seconds> <command>. On timeout, run on_failure.cleanup, retry (consider splitting task if repeated) |
| DEPENDENCY | Skip task, mark blocked | Log which dependency failed, mark task as failed with dependency reason |
| SESSION_TIMEOUT | Use Context Window Recovery Protocol | New session assesses partial progress via Recovery Protocol; may result in completion or failure depending on validation |
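The rollback used by the TASK_EXEC and TEST_FAIL rows might look like the sketch below. The function name rollback_task is hypothetical; the git commands are the ones named in the table. It must be run from the project root, and it refuses to reset when the recorded base commit cannot be found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: roll the working tree back to a task's base commit.
# Usage: rollback_task <started_at_commit> [cleanup command string]
rollback_task() {
  local base="$1" cleanup="${2:-}"
  # Refuse to reset if the recorded base commit no longer exists.
  if [ "$(git cat-file -t "$base" 2>/dev/null)" != "commit" ]; then
    echo "ERROR: started_at_commit $base not found" >&2
    return 1
  fi
  git reset -q --hard "$base"   # drop all commits and edits made since the base
  git clean -q -fd              # remove untracked (non-ignored) files and directories
  if [ -n "$cleanup" ]; then
    bash -c "$cleanup" || echo "WARN: cleanup command failed" >&2
  fi
}
```

A non-zero return is the caller's signal to mark the task failed at max_attempts instead of retrying.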
JSON corruption: If harness-tasks.json cannot be parsed, check for harness-tasks.json.bak (written before each modification). If backup exists and is valid, restore from it. If no valid backup, log ERROR [ENV_SETUP] harness-tasks.json corrupted and unrecoverable and STOP — task metadata (validation commands, dependencies, cleanup) cannot be reconstructed from logs alone.
Backup protocol: Before every write to harness-tasks.json, copy the current file to harness-tasks.json.bak.
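A minimal sketch of the backup protocol follows. The function name write_tasks_json is hypothetical, and the temp-file-plus-rename step is an added assumption intended to keep a half-written file from ever replacing the current one; the harness text only requires the copy to harness-tasks.json.bak.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the task file with content read from stdin,
# keeping the previous version as <file>.bak.
# Usage: new_json_producer | write_tasks_json [path]
write_tasks_json() {
  local target="${1:-harness-tasks.json}"
  local tmp="$target.tmp.$$"
  cat > "$tmp" || return 1        # stage the new content first
  if [ -f "$target" ]; then
    cp "$target" "$target.bak"    # backup of the version being replaced
  fi
  mv "$tmp" "$target"             # rename within one directory replaces the file in one step
}
```

Because the new content is staged before the backup is taken, an interrupted write leaves both the current file and its backup untouched.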
Environment Initialization
If harness-init.sh exists in the project root, run it at every session start. The script must be idempotent.
Example harness-init.sh:
#!/bin/bash
set -e
npm install 2>/dev/null || pip install -r requirements.txt 2>/dev/null || true
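# TCP probe: port 5432 speaks the PostgreSQL wire protocol, not HTTP, so curl would always fail here
(exec 3<>/dev/tcp/localhost/5432) 2>/dev/null || echo "WARN: DB not reachable"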
npm test -- --bail --silent 2>/dev/null || echo "WARN: Smoke test failed"
echo "Environment health check complete"
Standardized Log Format
All log entries use grep-friendly format on a single line:
[ISO-timestamp] [SESSION-N] <TYPE> [task-id]? [category]? message
[task-id] and [category] are included when applicable (task-scoped entries). Session-level entries (INIT, LOCK, STATS) omit them.
Types: INIT, Starting, Completed, ERROR, CHECKPOINT, ROLLBACK, RECOVERY, STATS, LOCK, WARN
Error categories: ENV_SETUP, TASK_EXEC, TEST_FAIL, TIMEOUT, DEPENDENCY, SESSION_TIMEOUT
Filtering:
grep "ERROR" harness-progress.txt # All errors
grep "ERROR" harness-progress.txt | grep "TASK_EXEC" # Execution errors only
grep "SESSION-3" harness-progress.txt # All session 3 activity
grep "STATS" harness-progress.txt # All session summaries
grep "CHECKPOINT" harness-progress.txt # All checkpoints
grep "RECOVERY" harness-progress.txt # All recovery actions
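Entries in this format could be appended with a small helper such as the one below. The function name log_entry and the HARNESS_LOG override variable are hypothetical; the timestamp uses date -u with an explicit format string so that the output matches the ISO-style timestamps shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one grep-friendly line to the progress log.
# Usage: log_entry <session-number> <TYPE> <rest of message...>
log_entry() {
  local session="$1" type="$2"
  shift 2
  # Append only; the progress log is never truncated.
  printf '[%s] [SESSION-%s] %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$session" "$type" "$*" \
    >> "${HARNESS_LOG:-harness-progress.txt}"
}
```

For a task-scoped entry, the task id and category go at the front of the message, for example log_entry 1 ERROR "[task-002] [TASK_EXEC] Redis connection refused".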
Session Statistics
At session end, update harness-tasks.json: set last_session to the current timestamp (session_count was already incremented at session start). Then append:
[timestamp] [SESSION-N] STATS tasks_total=10 completed=7 failed=1 pending=2 blocked=0 attempts_total=12 checkpoints=23
blocked is computed at stats time: count of pending tasks whose depends_on includes a permanently failed task. It is not a stored status value.
Init Command (/harness init)
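- Create harness-progress.txt with initialization entry
- Create harness-tasks.json with empty task list and default session_config
- Optionally create harness-init.sh template (chmod +x)
- Ask user: add harness files to .gitignore? Harness files that are untracked and not ignored are deleted by the git clean -fd step of a rollback, and tracked copies are reverted by git reset --hard.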
Status Command (/harness status)
Read harness-tasks.json and harness-progress.txt, then display:
- Task summary: count by status (completed, failed, pending, blocked). blocked = pending tasks whose depends_on includes a permanently failed task (computed, not a stored status).
- Per-task one-liner: [status] task-id: title (attempts/max_attempts)
- Last 5 lines from harness-progress.txt
- Session count and last session timestamp
Does NOT acquire the lock (read-only operation).
Add Command (/harness add)
Append a new task to harness-tasks.json with auto-incremented id (task-NNN), status pending, default max_attempts: 3, empty depends_on, and no validation command. Prompt user for optional fields: priority, depends_on, validation.command, timeout_seconds. Requires lock acquisition (modifies JSON).
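The auto-incremented id can be derived from the highest existing id. The sketch below is one way to do that with grep alone; the function name next_task_id is hypothetical, and it assumes every task id follows the task-NNN pattern shown in the example file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the next free task id (task-NNN).
# Usage: next_task_id [path to harness-tasks.json]
next_task_id() {
  local file="${1:-harness-tasks.json}" max
  # Match only "id" fields so that entries inside depends_on are ignored.
  max=$(grep -o '"id": *"task-[0-9][0-9]*"' "$file" \
        | grep -o '[0-9][0-9]*' | sort -n | tail -n 1)
  # 10# forces base 10 so zero-padded numbers are not read as octal.
  printf 'task-%03d\n' $(( 10#${max:-0} + 1 ))
}
```

Taking the maximum instead of counting tasks avoids reusing an id if a task was ever removed from the list.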
Tool Dependencies
Requires: Bash, file read/write, git. All harness operations must be executed from the project root directory. Does NOT require: specific MCP servers, programming languages, or test frameworks.