Files
Claude-Code-Workflow/.claude/commands/workflow/test-cycle-execute.md
catlog22 97b2247896 feat: enhance test-cycle-execute with intelligent iteration strategies
Optimizations:
- Reduced file size: 791 → 540 lines (-32%)
- Merged redundant sections (Overview + Philosophy + Responsibility Matrix)
- Simplified JSON examples and execution flow descriptions

Enhancements:
1. Intelligent Strategy Engine
   - 4 strategies: conservative/aggressive/exploratory/surgical
   - Auto-selection based on pass rate, stuck tests, regression

2. Progressive Testing
   - Run affected tests during iterations (70-90% faster)
   - Full suite on final validation

3. Parallel Strategy Exploration
   - Trigger on stuck tests (3+ consecutive failures)
   - Git worktree-based parallel execution
   - Select best result by pass rate

Changes:
- Max iterations: 5 → 10
- CLI tools: Gemini/Qwen → Gemini/Qwen/Codex (fallback chain)
- Add strategy metadata to iteration-state.json
- Add test_execution config to fix task JSON

Impact:
- Faster iterations via progressive testing
- Better stuck situation handling via parallel exploration
- Adaptive strategy selection for complex scenarios
2025-11-24 23:02:46 +08:00

18 KiB

name, description, argument-hint, allowed-tools
name description argument-hint allowed-tools
test-cycle-execute Execute test-fix workflow with dynamic task generation and iterative fix cycles until test pass rate >= 95% or max iterations reached. Uses @cli-planning-agent for failure analysis and task generation. [--resume-session="session-id"] [--max-iterations=N] SlashCommand(*), TodoWrite(*), Read(*), Bash(*), Task(*)

Workflow Test-Cycle-Execute Command

Architecture & Philosophy

Core Concept: Dynamic test-fix orchestrator with adaptive task generation based on runtime analysis.

Quality Gate: Iterates until test pass rate >= 95% (with criticality assessment) or 100% (full pass).

Key Differences from Standard Execute:

  • Standard: Pre-defined task queue → Sequential execution → Complete
  • Test-Cycle: Initial tasks → Test → Analyze → Generate fix tasks → Execute → Re-test → Repeat

Orchestrator Boundary (CRITICAL):

  • This command is the ONLY place where test failures are handled
  • All failure analysis delegated to @cli-planning-agent (Gemini/Qwen/Codex)
  • All fixes applied by @test-fix-agent
  • Orchestrator manages: pass rate calculation, criticality assessment, iteration loop, strategy selection

Agent Coordination:

  • Orchestrator: Loop control, strategy selection, pass rate calculation, threshold decisions
  • @cli-planning-agent: CLI analysis (Gemini/Qwen/Codex), root cause extraction, fix task generation
  • @test-fix-agent: Test execution, code fixes, result reporting

Resume Mode: --resume-session flag skips discovery and continues from interruption point.

Enhanced Features

1. Intelligent Strategy Engine

Strategy Types:

  • Conservative (default): Single fix per iteration, full validation
  • Aggressive: Batch fix similar failures (when pass rate >80% + high pattern similarity)
  • Exploratory: Parallel multi-strategy attempt (when stuck on same failures 3+ times)
  • Surgical: Minimal changes, rollback on regression

Strategy Selection Logic:

// In orchestrator, before invoking @cli-planning-agent
function selectStrategy(context) {
  const { passRate, iteration, stuckTests, regressionDetected } = context;

  if (iteration <= 2) return "conservative";
  if (passRate > 80 && context.failurePattern.similarity > 0.7) return "aggressive";
  if (stuckTests > 0 && iteration >= 3) return "exploratory";
  if (regressionDetected) return "surgical";
  return "conservative";
}

Integration: Orchestrator passes selected strategy to @cli-planning-agent in prompt.

2. Progressive Testing

Concept: Run affected tests during iterations, full suite on final validation.

Affected Test Detection:

  • @cli-planning-agent analyzes modified files from fix strategy
  • Maps to test files via imports and integration patterns
  • Returns affected_tests[] in fix task JSON

Execution:

  • Iteration N: npm test -- ${affected_tests.join(' ')} (fast feedback)
  • Final validation: npm test (comprehensive check)

Benefits: 70-90% iteration speed improvement.

Task JSON Integration:

{
  "context": {
    "fix_strategy": {
      "test_execution": {
        "mode": "affected_only",
        "affected_tests": ["tests/auth/client.test.ts"],
        "fallback_to_full": true
      }
    }
  }
}

3. Parallel Strategy Exploration

Trigger: Iteration >= 3 AND stuck tests detected (same test failed 3+ consecutive times)

Workflow:

Stuck State Detected → Switch to "exploratory" strategy:
├── @cli-planning-agent generates 3 alternative fix approaches
├── Orchestrator creates 3 task JSONs: IMPL-fix-Na, IMPL-fix-Nb, IMPL-fix-Nc
├── Execute in parallel via git worktree:
│   ├── worktree-a/ → Apply fix-Na → Test → Score: 92%
│   ├── worktree-b/ → Apply fix-Nb → Test → Score: 88%
│   └── worktree-c/ → Apply fix-Nc → Test → Score: 95% ✓
└── Select best result, apply to main branch

Implementation:

# Orchestrator creates worktrees
git worktree add ../worktree-a
git worktree add ../worktree-b
git worktree add ../worktree-c

# Launch 3 @test-fix-agent instances in parallel
Task(subagent_type="test-fix-agent", prompt="...", model="haiku") x3

# Collect results, select winner by pass_rate

Task JSON:

{
  "id": "IMPL-fix-3-parallel",
  "meta": {
    "type": "parallel-exploration",
    "strategies": ["IMPL-fix-3a", "IMPL-fix-3b", "IMPL-fix-3c"],
    "selection_criteria": "highest_pass_rate"
  }
}

Core Responsibilities

Orchestrator:

  • Session discovery and task queue management
  • Pass rate calculation: (passed / total) * 100 from test-results.json
  • Criticality assessment from test-results.json
  • Strategy selection (conservative/aggressive/exploratory/surgical)
  • Stuck test detection (same test failed 3+ times)
  • Regression detection (pass rate dropped >10%)
  • Iteration control (max 10 iterations, default)
  • CLI tool selection (Gemini → Qwen → Codex fallback chain)
  • Failure analysis delegation to @cli-planning-agent
  • TodoWrite progress tracking
  • Session auto-complete when pass rate >= 95%

@cli-planning-agent:

  • Execute CLI analysis (Gemini/Qwen/Codex) with bug diagnosis template
  • Parse CLI output for root causes and fix strategy
  • Generate IMPL-fix-N.json with structured task definition
  • Detect affected tests for progressive testing
  • Save analysis reports and CLI output

@test-fix-agent:

  • Execute tests and save results to test-results.json
  • Apply fixes from task JSON fix_strategy
  • Assign criticality to test failures
  • Update task status

Execution Flow

Phase 1: Discovery & Initialization

  1. Detect test-fix session from workflow_type: "test_session"
  2. Load workflow-session.json, IMPL_PLAN.md, .task/*.json
  3. Initialize TodoWrite with initial tasks
  4. Setup iteration context (current=0, max=10, strategy="conservative")

Resume Mode: Load iteration-state.json, continue from next_action.

Phase 2: Task Execution Loop

For each task in queue:
  1. Load task JSON and context
  2. Determine task type (test-gen, test-fix, fix-iteration, parallel-exploration)
  3. Execute via appropriate agent
  4. Collect results from test-results.json
  5. Calculate pass rate: (passed / total) * 100
  6. Assess criticality from test-results.json
  7. Detect regression: if passRate < previousPassRate - 10%  REGRESSION
  8. Detect stuck tests: if same test failed 3+ consecutive iterations  STUCK

  9. Make threshold decision:
     IF passRate === 100%:
        SUCCESS: Mark complete, continue to next task

     ELSE IF passRate >= 95%:
        CHECK criticality:
          - All "low"  PARTIAL SUCCESS (auto-approve with note)
          - Any "high"/"medium"  Enter fix loop (step 10)

     ELSE IF passRate < 95%:
        FAILED: Enter fix loop (step 10)

  10. Fix Loop (if needed):
      a. Select strategy: conservative/aggressive/exploratory/surgical
      b. If strategy === "exploratory":
         - Invoke @cli-planning-agent for 3 alternative approaches
         - Create parallel tasks (IMPL-fix-Na, Nb, Nc)
         - Execute in parallel via git worktree
         - Select best result (highest pass_rate)
         - Apply to main branch
      c. Else:
         - Invoke @cli-planning-agent with selected strategy
         - Generate single IMPL-fix-N.json
         - Insert at queue front
      d. Continue loop

  11. Check max iterations (abort if exceeded)

Phase 3: Iteration Cycle (Per Fix Task)

Structure:

Iteration N:
├── 1. Test Execution (via @test-fix-agent)
│   ├── Mode: affected_only or full_suite (from task JSON)
│   ├── Save results: test-results.json, test-output.log
│   └── Report to orchestrator
├── 2. Pass Rate Validation (orchestrator)
│   ├── Calculate: (passed / total) * 100
│   ├── Assess criticality
│   ├── Detect regression/stuck
│   └── Select strategy
├── 3. Failure Analysis (via @cli-planning-agent)
│   ├── Input: failure context + selected strategy
│   ├── CLI tool: Gemini (fallback: Qwen → Codex)
│   ├── Output: IMPL-fix-N.json, iteration-N-analysis.md, cli-output.txt
│   └── Return task ID to orchestrator
└── 4. Fix Execution (via @test-fix-agent)
    ├── Load fix_strategy from task JSON
    ├── Apply changes (surgical fixes)
    └── Return to step 1 for re-test

Phase 4: Completion

Success Conditions:

  • All tasks completed
  • Pass rate === 100% (full success)

Partial Success Conditions:

  • All tasks completed
  • Pass rate >= 95% and < 100%
  • All failures are "low" criticality
  • Action: Auto-approve with review note

Completion Steps:

  1. Final validation: Run full test suite
  2. Calculate final pass rate
  3. Assess status (full/partial/failure)
  4. Update session state
  5. Generate summary with pass rate metrics
  6. Call /workflow:session:complete

Failure Conditions:

  • Max iterations (10) reached without 95% pass rate
  • Pass rate < 95% after max iterations
  • Unrecoverable errors

Failure Handling:

  1. Document final state with pass rate
  2. Generate failure report (remaining failures, iteration history, recommendations)
  3. Mark tasks as blocked
  4. Return control to user

Agent Coordination Details

@cli-planning-agent Invocation

CLI Tool Selection (fallback chain):

  1. Gemini (primary): gemini-2.5-pro
  2. Qwen (fallback): coder-model
  3. Codex (fallback): gpt-5.1-codex

Template: ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt

Timeout: 40min (2400000ms)

Prompt Structure:

Task(
  subagent_type="cli-planning-agent",
  description=`Analyze test failures (iteration ${N}) - ${strategy} strategy`,
  prompt=`
    ## Task Objective
    Analyze test failures and generate fix task JSON for iteration ${N}

    ## Strategy
    ${selectedStrategy} - ${strategyDescription}

    ## MANDATORY FIRST STEPS
    1. Read test results: ${session.test_results_path}
    2. Read test output: ${session.test_output_path}
    3. Read iteration state: ${session.iteration_state_path}

    ## Session Paths
    - Workflow Dir: ${session.workflow_dir}
    - Test Results: ${session.test_results_path}
    - Task Output Dir: ${session.task_dir}
    - Analysis Output: ${session.process_dir}/iteration-${N}-analysis.md

    ## Context Metadata
    - Session ID: ${sessionId}
    - Current Iteration: ${N}
    - Max Iterations: 10
    - Current Pass Rate: ${passRate}%
    - Selected Strategy: ${selectedStrategy}
    - Stuck Tests: ${stuckTests}

    ## CLI Configuration
    - Tool Priority: gemini → qwen → codex
    - Template: 01-diagnose-bug-root-cause.txt
    - Timeout: 2400000ms

    ## Expected Deliverables
    1. Task JSON: ${session.task_dir}/IMPL-fix-${N}.json
       - Must include: fix_strategy.test_execution.affected_tests[]
       - Must include: fix_strategy.confidence_score
    2. Analysis report: ${session.process_dir}/iteration-${N}-analysis.md
    3. CLI output: ${session.process_dir}/iteration-${N}-cli-output.txt
    4. Return task ID to orchestrator

    ## Strategy-Specific Requirements
    ${strategyRequirements[selectedStrategy]}

    ## Success Criteria
    - Valid IMPL-fix-${N}.json with all required fields
    - Concrete fix strategy with modification points (file:function:lines)
    - Affected tests list for progressive testing
    - Root cause analysis (not just symptoms)
  `
)

Strategy-Specific Requirements:

const strategyRequirements = {
  conservative: "Single targeted fix, high confidence required",
  aggressive: "Batch fix similar failures, pattern-based approach",
  exploratory: "Generate 3 alternative approaches with different root cause hypotheses",
  surgical: "Minimal changes, focus on rollback safety"
};

@test-fix-agent Context Loading

File Loading Sequence:

  1. Task JSON (task_json_path)
  2. Iteration state (iteration_state_path)
  3. Test results (test_results_path)
  4. Test output (test_output_path)
  5. Analysis report (analysis_path)
  6. Fix history (fix_history_path)

Paths Provided by Orchestrator:

{
  "task_json_path": ".workflow/active/WFS-test-{session}/.task/IMPL-fix-N.json",
  "iteration_state_path": ".workflow/active/WFS-test-{session}/.process/iteration-state.json",
  "test_results_path": ".workflow/active/WFS-test-{session}/.process/test-results.json",
  "test_output_path": ".workflow/active/WFS-test-{session}/.process/test-output.log",
  "analysis_path": ".workflow/active/WFS-test-{session}/.process/iteration-N-analysis.md",
  "workflow_dir": ".workflow/active/WFS-test-{session}/",
  "session_id": "WFS-test-{session}",
  "current_iteration": N,
  "max_iterations": 10
}

Data Structures

Session File Structure

.workflow/active/WFS-test-{session}/
├── workflow-session.json
├── IMPL_PLAN.md, TODO_LIST.md
├── .task/
│   ├── IMPL-{001,002}.json              # Initial tasks
│   ├── IMPL-fix-{N}.json                # Generated fix tasks
│   └── IMPL-fix-{N}-parallel.json       # Parallel exploration tasks
├── .process/
│   ├── iteration-state.json             # Current iteration + strategy
│   ├── test-results.json                # Latest test results
│   ├── test-output.log                  # Full test output
│   ├── fix-history.json                 # All fix attempts
│   ├── iteration-{N}-analysis.md        # CLI analysis
│   └── iteration-{N}-cli-output.txt     # Raw CLI output
└── .summaries/iteration-summaries/

Iteration State JSON

{
  "session_id": "WFS-test-user-auth",
  "current_task": "IMPL-002",
  "current_iteration": 3,
  "max_iterations": 10,
  "selected_strategy": "aggressive",
  "iterations": [
    {
      "iteration": 1,
      "pass_rate": 70,
      "strategy": "conservative",
      "stuck_tests": [],
      "regression_detected": false
    },
    {
      "iteration": 2,
      "pass_rate": 82,
      "strategy": "conservative",
      "stuck_tests": [],
      "regression_detected": false
    },
    {
      "iteration": 3,
      "pass_rate": 89,
      "strategy": "aggressive",
      "stuck_tests": ["test_auth_edge_case"],
      "regression_detected": false
    }
  ],
  "stuck_tests": ["test_auth_edge_case"],
  "next_action": "execute_fix_task"
}

Task JSON Template (Fix Iteration)

{
  "id": "IMPL-fix-3",
  "title": "Fix test failures - Iteration 3 (aggressive)",
  "status": "pending",
  "meta": {
    "type": "test-fix-iteration",
    "agent": "@test-fix-agent",
    "iteration": 3,
    "strategy": "aggressive",
    "max_iterations": 10
  },
  "context": {
    "requirements": ["Fix identified test failures", "Address root causes"],
    "failure_context": {
      "failed_tests": ["test1", "test2"],
      "error_messages": ["error1", "error2"],
      "pass_rate": 89,
      "stuck_tests": ["test_auth_edge_case"]
    },
    "fix_strategy": {
      "approach": "Batch fix authentication validation across multiple endpoints",
      "modification_points": ["src/auth/middleware.ts:validateToken:45-60"],
      "confidence_score": 0.85,
      "test_execution": {
        "mode": "affected_only",
        "affected_tests": ["tests/auth/*.test.ts"],
        "fallback_to_full": true
      }
    }
  }
}

Error Handling

Scenario Action
Test execution error Log error, retry with error context
CLI analysis failure Fallback: Gemini → Qwen → Codex → manual
Agent execution error Save state, retry with simplified context
Max iterations reached Generate failure report, mark blocked
Regression detected Rollback last fix, switch to surgical strategy
Stuck tests detected Switch to exploratory strategy (parallel)

CLI Fallback Chain:

  1. Try Gemini (primary)
  2. If HTTP 429 or error: Try Qwen
  3. If Qwen fails: Try Codex
  4. If all fail: Mark as degraded, use basic pattern matching

TodoWrite Coordination

Structure:

TodoWrite({
  todos: [
    {
      content: "Execute IMPL-001: Generate tests [code-developer]",
      status: "completed",
      activeForm: "Executing test generation"
    },
    {
      content: "Execute IMPL-002: Test & Fix Cycle [ITERATION]",
      status: "in_progress",
      activeForm: "Running test-fix iteration cycle"
    },
    {
      content: "  → Iteration 1: Initial test (pass: 70%, conservative)",
      status: "completed",
      activeForm: "Running initial tests"
    },
    {
      content: "  → Iteration 2: Fix validation (pass: 82%, conservative)",
      status: "completed",
      activeForm: "Fixing validation issues"
    },
    {
      content: "  → Iteration 3: Batch fix auth (pass: 89%, aggressive)",
      status: "in_progress",
      activeForm: "Fixing authentication issues"
    }
  ]
});

Update Rules:

  1. Add iteration item with strategy and pass rate
  2. Mark completed after each iteration
  3. Update parent task when all complete

Usage

# Execute test-fix workflow (discovers active session)
/workflow:test-cycle-execute

# Resume interrupted session
/workflow:test-cycle-execute --resume-session="WFS-test-user-auth"

# Custom iteration limit
/workflow:test-cycle-execute --max-iterations=15

Best Practices

  1. Iteration Limits: Default 10, adjust based on complexity
  2. Commit Between Iterations: Enables easier rollback
  3. Monitor Iteration Logs: Review CLI analysis quality
  4. Trust Strategy Engine: Let orchestrator select optimal approach
  5. Leverage Progressive Testing: 70-90% faster iterations
  6. Watch for Stuck Tests: Parallel exploration auto-triggers at iteration 3