mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-13 02:41:50 +08:00

catlog22 97b2247896 feat: enhance test-cycle-execute with intelligent iteration strategies

Optimizations:
- Reduced file size: 791 → 540 lines (-32%)
- Merged redundant sections (Overview + Philosophy + Responsibility Matrix)
- Simplified JSON examples and execution flow descriptions

Enhancements:
1. Intelligent Strategy Engine
   - 4 strategies: conservative/aggressive/exploratory/surgical
   - Auto-selection based on pass rate, stuck tests, regression

2. Progressive Testing
   - Run affected tests during iterations (70-90% faster)
   - Full suite on final validation

3. Parallel Strategy Exploration
   - Trigger on stuck tests (3+ consecutive failures)
   - Git worktree-based parallel execution
   - Select best result by pass rate

Changes:
- Max iterations: 5 → 10
- CLI tools: Gemini/Qwen → Gemini/Qwen/Codex (fallback chain)
- Add strategy metadata to iteration-state.json
- Add test_execution config to fix task JSON

Impact:
- Faster iterations via progressive testing
- Better stuck situation handling via parallel exploration
- Adaptive strategy selection for complex scenarios

2025-11-24 23:02:46 +08:00

name	description	argument-hint	allowed-tools
test-cycle-execute	Execute test-fix workflow with dynamic task generation and iterative fix cycles until test pass rate >= 95% or max iterations reached. Uses @cli-planning-agent for failure analysis and task generation.	[--resume-session="session-id"] [--max-iterations=N]	SlashCommand(), TodoWrite(), Read(), Bash(), Task(*)

Workflow Test-Cycle-Execute Command

Architecture & Philosophy

Core Concept: Dynamic test-fix orchestrator with adaptive task generation based on runtime analysis.

Quality Gate: Iterates until the test pass rate reaches 100% (full pass), or >= 95% with every remaining failure assessed as "low" criticality.

Key Differences from Standard Execute:

Standard: Pre-defined task queue → Sequential execution → Complete
Test-Cycle: Initial tasks → Test → Analyze → Generate fix tasks → Execute → Re-test → Repeat

Orchestrator Boundary (CRITICAL):

This command is the ONLY place where test failures are handled
All failure analysis delegated to @cli-planning-agent (Gemini/Qwen/Codex)
All fixes applied by @test-fix-agent
Orchestrator manages: pass rate calculation, criticality assessment, iteration loop, strategy selection

Agent Coordination:

Orchestrator: Loop control, strategy selection, pass rate calculation, threshold decisions
@cli-planning-agent: CLI analysis (Gemini/Qwen/Codex), root cause extraction, fix task generation
@test-fix-agent: Test execution, code fixes, result reporting

Resume Mode: The --resume-session flag skips discovery and continues from the interruption point.

Enhanced Features

1. Intelligent Strategy Engine

Strategy Types:

Conservative (default): Single fix per iteration, full validation
Aggressive: Batch fix similar failures (when pass rate >80% + high pattern similarity)
Exploratory: Parallel multi-strategy attempt (when stuck on same failures 3+ times)
Surgical: Minimal changes, rollback on regression

Strategy Selection Logic:

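// In orchestrator, before invoking @cli-planning-agent
function selectStrategy(context) {
  const { passRate, iteration, stuckTests, regressionDetected } = context;
  // failurePattern may be absent before the first analysis, so default to 0
  const similarity = context.failurePattern?.similarity ?? 0;
  // Accept either a count or a list of test names (iteration-state.json stores a list)
  const stuckCount = Array.isArray(stuckTests) ? stuckTests.length : stuckTests;

  // Checked first so a regression always gets the rollback-safe strategy
  if (regressionDetected) return "surgical";
  if (iteration <= 2) return "conservative";
  if (passRate > 80 && similarity > 0.7) return "aggressive";
  if (stuckCount > 0 && iteration >= 3) return "exploratory";
  return "conservative";
}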

Integration: Orchestrator passes selected strategy to @cli-planning-agent in prompt.

2. Progressive Testing

Concept: Run affected tests during iterations, full suite on final validation.

Affected Test Detection:

@cli-planning-agent analyzes modified files from fix strategy
Maps to test files via imports and integration patterns
Returns affected_tests[] in fix task JSON

Execution:

Iteration N: npm test -- ${affected_tests.join(' ')} (fast feedback)
Final validation: npm test (comprehensive check)

Benefits: Iterations run an estimated 70-90% faster, since only the affected tests execute.

Task JSON Integration:

{
  "context": {
    "fix_strategy": {
      "test_execution": {
        "mode": "affected_only",
        "affected_tests": ["tests/auth/client.test.ts"],
        "fallback_to_full": true
      }
    }
  }
}
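As a minimal sketch of how the test_execution block above could drive the test command: buildTestCommand is a hypothetical helper, not part of the command itself. It assumes that fallback_to_full means "run the full suite when no affected tests are listed", which the source does not spell out.

```javascript
// Hypothetical helper: turns a fix task's test_execution config into a command line.
// Assumption: fallback_to_full permits a full-suite run when affected_tests is empty.
function buildTestCommand(testExecution = {}) {
  const tests = testExecution.affected_tests ?? [];

  if (testExecution.mode === "affected_only") {
    if (tests.length > 0) {
      // Fast feedback: run only the tests mapped to the modified files.
      return `npm test -- ${tests.join(" ")}`;
    }
    if (!testExecution.fallback_to_full) {
      throw new Error("No affected tests listed and fallback_to_full is disabled");
    }
  }
  // Comprehensive check: full suite.
  return "npm test";
}

const cmd = buildTestCommand({
  mode: "affected_only",
  affected_tests: ["tests/auth/client.test.ts"],
  fallback_to_full: true,
});
console.log(cmd); // npm test -- tests/auth/client.test.ts
```

Final validation would call the same helper with a full-suite mode (or no config) and get plain npm test.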

3. Parallel Strategy Exploration

Trigger: Iteration >= 3 AND stuck tests detected (same test failed 3+ consecutive times)

Workflow:

Stuck State Detected → Switch to "exploratory" strategy:
├── @cli-planning-agent generates 3 alternative fix approaches
├── Orchestrator creates 3 task JSONs: IMPL-fix-Na, IMPL-fix-Nb, IMPL-fix-Nc
├── Execute in parallel via git worktree:
│   ├── worktree-a/ → Apply fix-Na → Test → Score: 92%
│   ├── worktree-b/ → Apply fix-Nb → Test → Score: 88%
│   └── worktree-c/ → Apply fix-Nc → Test → Score: 95% ✓
└── Select best result, apply to main branch

Implementation:

# Orchestrator creates worktrees
git worktree add ../worktree-a
git worktree add ../worktree-b
git worktree add ../worktree-c

# Launch 3 @test-fix-agent instances in parallel
Task(subagent_type="test-fix-agent", prompt="...", model="haiku") x3

# Collect results, select winner by pass_rate

Task JSON:

{
  "id": "IMPL-fix-3-parallel",
  "meta": {
    "type": "parallel-exploration",
    "strategies": ["IMPL-fix-3a", "IMPL-fix-3b", "IMPL-fix-3c"],
    "selection_criteria": "highest_pass_rate"
  }
}
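The "highest_pass_rate" selection criterion could be implemented along the following lines. This is a sketch only: the result shape ({ taskId, passRate }) and the selectWinner name are assumptions for illustration, and ties are resolved in favor of the earliest result.

```javascript
// Hypothetical result shape: one { taskId, passRate } entry per worktree run.
function selectWinner(results) {
  if (results.length === 0) {
    throw new Error("No exploration results to choose from");
  }
  // Strict ">" keeps the earlier result on ties, so selection is deterministic.
  return results.reduce((best, r) => (r.passRate > best.passRate ? r : best));
}

const winner = selectWinner([
  { taskId: "IMPL-fix-3a", passRate: 92 },
  { taskId: "IMPL-fix-3b", passRate: 88 },
  { taskId: "IMPL-fix-3c", passRate: 95 },
]);
console.log(winner.taskId); // IMPL-fix-3c
```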

Core Responsibilities

Orchestrator:

Session discovery and task queue management
Pass rate calculation: (passed / total) * 100 from test-results.json
Criticality assessment from test-results.json
Strategy selection (conservative/aggressive/exploratory/surgical)
Stuck test detection (same test failed 3+ times)
Regression detection (pass rate dropped by more than 10 percentage points)
Iteration control (default maximum: 10 iterations)
CLI tool selection (Gemini → Qwen → Codex fallback chain)
Failure analysis delegation to @cli-planning-agent
TodoWrite progress tracking
Session auto-complete when pass rate >= 95%
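
The three numeric checks in the list above (pass rate, regression, stuck tests) can be sketched as small pure functions. The input shapes are assumptions: test-results.json is taken to expose passed and total counts, and failedByIteration is a hypothetical list of failed test names per iteration, oldest first.

```javascript
// Pass rate as a percentage: (passed / total) * 100.
function passRate({ passed, total }) {
  return total === 0 ? 0 : (passed / total) * 100;
}

// Regression: the pass rate fell by more than 10 percentage points.
function isRegression(current, previous) {
  return previous - current > 10;
}

// Stuck: a test that failed in each of the last `threshold` iterations.
function stuckTests(failedByIteration, threshold = 3) {
  if (failedByIteration.length < threshold) return [];
  const recent = failedByIteration.slice(-threshold);
  return recent[0].filter((name) =>
    recent.every((failed) => failed.includes(name))
  );
}

console.log(passRate({ passed: 3, total: 4 })); // 75
console.log(isRegression(70, 82)); // true
console.log(stuckTests([["a", "b"], ["a"], ["a", "c"]])); // [ 'a' ]
```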

@cli-planning-agent:

Execute CLI analysis (Gemini/Qwen/Codex) with bug diagnosis template
Parse CLI output for root causes and fix strategy
Generate IMPL-fix-N.json with structured task definition
Detect affected tests for progressive testing
Save analysis reports and CLI output

@test-fix-agent:

Execute tests and save results to test-results.json
Apply fixes from task JSON fix_strategy
Assign criticality to test failures
Update task status

Execution Flow

Phase 1: Discovery & Initialization

Detect test-fix session from workflow_type: "test_session"
Load workflow-session.json, IMPL_PLAN.md, .task/*.json
Initialize TodoWrite with initial tasks
Setup iteration context (current=0, max=10, strategy="conservative")

Resume Mode: Load iteration-state.json, continue from next_action.

Phase 2: Task Execution Loop

For each task in queue:
  1. Load task JSON and context
  2. Determine task type (test-gen, test-fix, fix-iteration, parallel-exploration)
  3. Execute via appropriate agent
  4. Collect results from test-results.json
  5. Calculate pass rate: (passed / total) * 100
  6. Assess criticality from test-results.json
  7. Detect regression: if passRate < previousPassRate - 10 (percentage points) → REGRESSION
  8. Detect stuck tests: if same test failed 3+ consecutive iterations → STUCK

  9. Make threshold decision:
     IF passRate === 100%:
       → SUCCESS: Mark complete, continue to next task

     ELSE IF passRate >= 95%:
       → CHECK criticality:
          - All "low" → PARTIAL SUCCESS (auto-approve with note)
          - Any "high"/"medium" → Enter fix loop (step 10)

     ELSE IF passRate < 95%:
       → FAILED: Enter fix loop (step 10)

  10. Fix Loop (if needed):
      a. Select strategy: conservative/aggressive/exploratory/surgical
      b. If strategy === "exploratory":
         - Invoke @cli-planning-agent for 3 alternative approaches
         - Create parallel tasks (IMPL-fix-Na, Nb, Nc)
         - Execute in parallel via git worktree
         - Select best result (highest pass_rate)
         - Apply to main branch
      c. Else:
         - Invoke @cli-planning-agent with selected strategy
         - Generate single IMPL-fix-N.json
         - Insert at queue front
      d. Continue loop

  11. Check max iterations (abort if exceeded)
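
The threshold decision in step 9 reduces to a small function. The sketch below assumes a hypothetical failures array of { test, criticality } objects, with criticality being "high", "medium", or "low"; the return labels are illustrative names, not values the command defines.

```javascript
// Maps a pass rate plus remaining failures to the next orchestrator action.
function decideOutcome(passRate, failures) {
  if (passRate === 100) return "success";

  const allLow = failures.every((f) => f.criticality === "low");
  if (passRate >= 95 && allLow) {
    // Auto-approve with a review note.
    return "partial_success";
  }
  // Below threshold, or a high/medium failure remains: enter the fix loop.
  return "fix_loop";
}

console.log(decideOutcome(100, [])); // success
console.log(decideOutcome(96, [{ test: "t1", criticality: "low" }])); // partial_success
console.log(decideOutcome(96, [{ test: "t1", criticality: "high" }])); // fix_loop
console.log(decideOutcome(89, [{ test: "t1", criticality: "low" }])); // fix_loop
```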

Phase 3: Iteration Cycle (Per Fix Task)

Structure:

Iteration N:
├── 1. Test Execution (via @test-fix-agent)
│   ├── Mode: affected_only or full_suite (from task JSON)
│   ├── Save results: test-results.json, test-output.log
│   └── Report to orchestrator
├── 2. Pass Rate Validation (orchestrator)
│   ├── Calculate: (passed / total) * 100
│   ├── Assess criticality
│   ├── Detect regression/stuck
│   └── Select strategy
├── 3. Failure Analysis (via @cli-planning-agent)
│   ├── Input: failure context + selected strategy
│   ├── CLI tool: Gemini (fallback: Qwen → Codex)
│   ├── Output: IMPL-fix-N.json, iteration-N-analysis.md, cli-output.txt
│   └── Return task ID to orchestrator
└── 4. Fix Execution (via @test-fix-agent)
    ├── Load fix_strategy from task JSON
    ├── Apply changes (targeted fixes)
    └── Return to step 1 for re-test

Phase 4: Completion

Success Conditions:

All tasks completed
Pass rate === 100% (full success)

Partial Success Conditions:

All tasks completed
Pass rate >= 95% and < 100%
All failures are "low" criticality
Action: Auto-approve with review note

Completion Steps:

Final validation: Run full test suite
Calculate final pass rate
Assess status (full/partial/failure)
Update session state
Generate summary with pass rate metrics
Call /workflow:session:complete

Failure Conditions:

Max iterations (10) reached with pass rate still < 95%
Unrecoverable errors

Failure Handling:

Document final state with pass rate
Generate failure report (remaining failures, iteration history, recommendations)
Mark tasks as blocked
Return control to user

Agent Coordination Details

@cli-planning-agent Invocation

CLI Tool Selection (fallback chain):

Gemini (primary): gemini-2.5-pro
Qwen (fallback): coder-model
Codex (fallback): gpt-5.1-codex

Template: ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt

Timeout: 40min (2400000ms)

Prompt Structure:

Task(
  subagent_type="cli-planning-agent",
  description=`Analyze test failures (iteration ${N}) - ${selectedStrategy} strategy`,
  prompt=`
    ## Task Objective
    Analyze test failures and generate fix task JSON for iteration ${N}

    ## Strategy
    ${selectedStrategy} - ${strategyDescription}

    ## MANDATORY FIRST STEPS
    1. Read test results: ${session.test_results_path}
    2. Read test output: ${session.test_output_path}
    3. Read iteration state: ${session.iteration_state_path}

    ## Session Paths
    - Workflow Dir: ${session.workflow_dir}
    - Test Results: ${session.test_results_path}
    - Task Output Dir: ${session.task_dir}
    - Analysis Output: ${session.process_dir}/iteration-${N}-analysis.md

    ## Context Metadata
    - Session ID: ${sessionId}
    - Current Iteration: ${N}
    - Max Iterations: 10
    - Current Pass Rate: ${passRate}%
    - Selected Strategy: ${selectedStrategy}
    - Stuck Tests: ${stuckTests}

    ## CLI Configuration
    - Tool Priority: gemini → qwen → codex
    - Template: 01-diagnose-bug-root-cause.txt
    - Timeout: 2400000ms

    ## Expected Deliverables
    1. Task JSON: ${session.task_dir}/IMPL-fix-${N}.json
       - Must include: fix_strategy.test_execution.affected_tests[]
       - Must include: fix_strategy.confidence_score
    2. Analysis report: ${session.process_dir}/iteration-${N}-analysis.md
    3. CLI output: ${session.process_dir}/iteration-${N}-cli-output.txt
    4. Return task ID to orchestrator

    ## Strategy-Specific Requirements
    ${strategyRequirements[selectedStrategy]}

    ## Success Criteria
    - Valid IMPL-fix-${N}.json with all required fields
    - Concrete fix strategy with modification points (file:function:lines)
    - Affected tests list for progressive testing
    - Root cause analysis (not just symptoms)
  `
)

Strategy-Specific Requirements:

const strategyRequirements = {
  conservative: "Single targeted fix, high confidence required",
  aggressive: "Batch fix similar failures, pattern-based approach",
  exploratory: "Generate 3 alternative approaches with different root cause hypotheses",
  surgical: "Minimal changes, focus on rollback safety"
};

@test-fix-agent Context Loading

File Loading Sequence:

Task JSON (task_json_path)
Iteration state (iteration_state_path)
Test results (test_results_path)
Test output (test_output_path)
Analysis report (analysis_path)
Fix history (fix_history_path)

Paths Provided by Orchestrator:

{
  "task_json_path": ".workflow/active/WFS-test-{session}/.task/IMPL-fix-N.json",
  "iteration_state_path": ".workflow/active/WFS-test-{session}/.process/iteration-state.json",
  "test_results_path": ".workflow/active/WFS-test-{session}/.process/test-results.json",
  "test_output_path": ".workflow/active/WFS-test-{session}/.process/test-output.log",
  "analysis_path": ".workflow/active/WFS-test-{session}/.process/iteration-N-analysis.md",
  "fix_history_path": ".workflow/active/WFS-test-{session}/.process/fix-history.json",
  "workflow_dir": ".workflow/active/WFS-test-{session}/",
  "session_id": "WFS-test-{session}",
  "current_iteration": N,
  "max_iterations": 10
}
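
Since every path follows the same session-directory layout, the orchestrator could derive the object from a session name and iteration number. The sketch below is one way to do that; sessionPaths is a hypothetical helper, and only the path patterns themselves come from this document.

```javascript
// Builds the path object handed to @test-fix-agent for one fix iteration.
// `session` is the part after "WFS-test-", e.g. "user-auth".
function sessionPaths(session, iteration, maxIterations = 10) {
  const workflowDir = `.workflow/active/WFS-test-${session}/`;
  const processDir = `${workflowDir}.process/`;
  return {
    task_json_path: `${workflowDir}.task/IMPL-fix-${iteration}.json`,
    iteration_state_path: `${processDir}iteration-state.json`,
    test_results_path: `${processDir}test-results.json`,
    test_output_path: `${processDir}test-output.log`,
    analysis_path: `${processDir}iteration-${iteration}-analysis.md`,
    workflow_dir: workflowDir,
    session_id: `WFS-test-${session}`,
    current_iteration: iteration,
    max_iterations: maxIterations,
  };
}

const paths = sessionPaths("user-auth", 3);
console.log(paths.task_json_path);
// .workflow/active/WFS-test-user-auth/.task/IMPL-fix-3.json
```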

Data Structures

Session File Structure

.workflow/active/WFS-test-{session}/
├── workflow-session.json
├── IMPL_PLAN.md, TODO_LIST.md
├── .task/
│   ├── IMPL-{001,002}.json              # Initial tasks
│   ├── IMPL-fix-{N}.json                # Generated fix tasks
│   └── IMPL-fix-{N}-parallel.json       # Parallel exploration tasks
├── .process/
│   ├── iteration-state.json             # Current iteration + strategy
│   ├── test-results.json                # Latest test results
│   ├── test-output.log                  # Full test output
│   ├── fix-history.json                 # All fix attempts
│   ├── iteration-{N}-analysis.md        # CLI analysis
│   └── iteration-{N}-cli-output.txt     # Raw CLI output
└── .summaries/iteration-summaries/

Iteration State JSON

{
  "session_id": "WFS-test-user-auth",
  "current_task": "IMPL-002",
  "current_iteration": 3,
  "max_iterations": 10,
  "selected_strategy": "aggressive",
  "iterations": [
    {
      "iteration": 1,
      "pass_rate": 70,
      "strategy": "conservative",
      "stuck_tests": [],
      "regression_detected": false
    },
    {
      "iteration": 2,
      "pass_rate": 82,
      "strategy": "conservative",
      "stuck_tests": [],
      "regression_detected": false
    },
    {
      "iteration": 3,
      "pass_rate": 89,
      "strategy": "aggressive",
      "stuck_tests": ["test_auth_edge_case"],
      "regression_detected": false
    }
  ],
  "stuck_tests": ["test_auth_edge_case"],
  "next_action": "execute_fix_task"
}
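
One way to keep this file consistent after each iteration is to append a history entry and refresh the top-level fields in a single step. The following is a sketch under that assumption; recordIteration is a hypothetical helper, and the source does not say whether the orchestrator updates the state this way.

```javascript
// Returns a new state object with `entry` appended and top-level fields synced.
// The input state is left untouched so a failed write cannot corrupt it.
function recordIteration(state, entry) {
  return {
    ...state,
    current_iteration: entry.iteration,
    selected_strategy: entry.strategy,
    iterations: [...state.iterations, entry],
    stuck_tests: entry.stuck_tests,
  };
}

const before = {
  session_id: "WFS-test-user-auth",
  current_iteration: 2,
  max_iterations: 10,
  selected_strategy: "conservative",
  iterations: [],
  stuck_tests: [],
  next_action: "execute_fix_task",
};

const after = recordIteration(before, {
  iteration: 3,
  pass_rate: 89,
  strategy: "aggressive",
  stuck_tests: ["test_auth_edge_case"],
  regression_detected: false,
});
console.log(after.current_iteration, after.selected_strategy); // 3 aggressive
```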

Task JSON Template (Fix Iteration)

{
  "id": "IMPL-fix-3",
  "title": "Fix test failures - Iteration 3 (aggressive)",
  "status": "pending",
  "meta": {
    "type": "test-fix-iteration",
    "agent": "@test-fix-agent",
    "iteration": 3,
    "strategy": "aggressive",
    "max_iterations": 10
  },
  "context": {
    "requirements": ["Fix identified test failures", "Address root causes"],
    "failure_context": {
      "failed_tests": ["test1", "test2"],
      "error_messages": ["error1", "error2"],
      "pass_rate": 89,
      "stuck_tests": ["test_auth_edge_case"]
    },
    "fix_strategy": {
      "approach": "Batch fix authentication validation across multiple endpoints",
      "modification_points": ["src/auth/middleware.ts:validateToken:45-60"],
      "confidence_score": 0.85,
      "test_execution": {
        "mode": "affected_only",
        "affected_tests": ["tests/auth/*.test.ts"],
        "fallback_to_full": true
      }
    }
  }
}

Error Handling

Scenario	Action
Test execution error	Log error, retry with error context
CLI analysis failure	Fallback: Gemini → Qwen → Codex → manual
Agent execution error	Save state, retry with simplified context
Max iterations reached	Generate failure report, mark blocked
Regression detected	Rollback last fix, switch to surgical strategy
Stuck tests detected	Switch to exploratory strategy (parallel)

CLI Fallback Chain:

Try Gemini (primary)
If HTTP 429 or error: Try Qwen
If Qwen fails: Try Codex
If all fail: Mark as degraded, use basic pattern matching
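
The chain above can be sketched as a loop that tries each tool in order and reports a degraded result if none succeeds. Here runTool is a hypothetical async callback that invokes one CLI tool and throws on any failure (including HTTP 429); the returned object shape is likewise an assumption.

```javascript
// Tries each CLI tool in priority order; returns the first successful analysis.
async function analyzeWithFallback(runTool, tools = ["gemini", "qwen", "codex"]) {
  const errors = [];
  for (const tool of tools) {
    try {
      const output = await runTool(tool);
      return { tool, degraded: false, output, errors };
    } catch (err) {
      // Record the failure and move on to the next tool in the chain.
      errors.push(`${tool}: ${err.message}`);
    }
  }
  // All tools failed: the caller falls back to basic pattern matching.
  return { tool: null, degraded: true, output: null, errors };
}

// Example: Gemini is rate limited, so Qwen handles the analysis.
const fakeRun = async (tool) => {
  if (tool === "gemini") throw new Error("HTTP 429");
  return `analysis from ${tool}`;
};

analyzeWithFallback(fakeRun).then((result) => {
  console.log(result.tool); // qwen
});
```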

TodoWrite Coordination