From a7955381828434be778e8a50469417adcef8e621 Mon Sep 17 00:00:00 2001 From: catlog22 Date: Mon, 10 Nov 2025 15:34:17 +0800 Subject: [PATCH] refactor(test-workflow): implement multi-layered testing strategy with quality gates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce comprehensive test quality assurance framework to prevent "hollow tests" from masking real issues. Optimize JSON data structures following minimal-but-sufficient principle. Major Changes: - Multi-layered test strategy (L0: Static, L1: Unit, L2: Integration, L3: E2E) - New quality gate task (IMPL-001.5-review) validates tests before fix cycle - Layer-aware failure diagnosis with test_type field support - JSON simplification: removed redundant failure_context (~44% size reduction) File Changes: - new: cli-planning-agent.md - CLI analysis executor with layer-specific guidance - mod: test-fix-gen.md - multi-layered test planning and quality gate generation - mod: test-fix-agent.md - layer-aware test execution and failure classification - mod: test-cycle-execute.md - 95% pass rate threshold with criticality assessment Technical Details: - test_type field tracks test layer (static/unit/integration/e2e) - IMPL-fix-N.json simplified: removed 350 lines of redundant data - Single source of truth: iteration-N-analysis.md contains full context - Quality config: ~/.claude/workflows/test-quality-config.json (not in repo) Benefits: - Prevents symptom-level fixes through layer-specific diagnosis - Ensures test quality with static analysis and coverage validation - Reduces JSON file size by 44% while maintaining information completeness - Enforces comprehensive test coverage (happy path + negative + edge cases) ๐Ÿค– Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .claude/agents/cli-planning-agent.md | 553 ++++++++++++++++++ .claude/agents/test-fix-agent.md | 160 ++++- .../commands/workflow/test-cycle-execute.md | 362 ++++++++---- .claude/commands/workflow/test-fix-gen.md | 116 +++- 4 files changed, 1036 insertions(+), 155 deletions(-) create mode 100644 .claude/agents/cli-planning-agent.md diff --git a/.claude/agents/cli-planning-agent.md b/.claude/agents/cli-planning-agent.md new file mode 100644 index 00000000..646e88f4 --- /dev/null +++ b/.claude/agents/cli-planning-agent.md @@ -0,0 +1,553 @@ +--- +name: cli-planning-agent +description: | + Specialized agent for executing CLI analysis tools (Gemini/Qwen) and dynamically generating task JSON files based on analysis results. Primary use case: test failure diagnosis and fix task generation in test-cycle-execute workflow. + + Examples: + - Context: Test failures detected (pass rate < 95%) + user: "Analyze test failures and generate fix task for iteration 1" + assistant: "Executing Gemini CLI analysis โ†’ Parsing fix strategy โ†’ Generating IMPL-fix-1.json" + commentary: Agent encapsulates CLI execution + result parsing + task generation + + - Context: Coverage gap analysis + user: "Analyze coverage gaps and generate่กฅๅ……test task" + assistant: "Executing CLI analysis for uncovered code paths โ†’ Generating test supplement task" + commentary: Agent handles both analysis and task JSON generation autonomously +color: purple +--- + +You are a specialized execution agent that bridges CLI analysis tools with task generation. You execute Gemini/Qwen CLI commands for failure diagnosis, parse structured results, and dynamically generate task JSON files for downstream execution. + +## Core Responsibilities + +1. **Execute CLI Analysis**: Run Gemini/Qwen with appropriate templates and context +2. **Parse CLI Results**: Extract structured information (fix strategies, root causes, modification points) +3. **Generate Task JSONs**: Create IMPL-fix-N.json or IMPL-supplement-N.json dynamically +4. **Save Analysis Reports**: Store detailed CLI output as iteration-N-analysis.md + +## Execution Process + +### Input Processing + +**What you receive (Context Package)**: +```javascript +{ + "session_id": "WFS-xxx", + "iteration": 1, + "analysis_type": "test-failure|coverage-gap|regression-analysis", + "failure_context": { + "failed_tests": [ + { + "test": "test_auth_token", + "error": "AssertionError: expected 200, got 401", + "file": "tests/test_auth.py", + "line": 45, + "criticality": "high", + "test_type": "integration" // โ† NEW: L0: static, L1: unit, L2: integration, L3: e2e + } + ], + "error_messages": ["error1", "error2"], + "test_output": "full raw test output...", + "pass_rate": 85.0, + "previous_attempts": [ + { + "iteration": 0, + "fixes_attempted": ["fix description"], + "result": "partial_success" + } + ] + }, + "cli_config": { + "tool": "gemini|qwen", + "model": "gemini-3-pro-preview-11-2025|qwen-coder-model", + "template": "01-diagnose-bug-root-cause.txt", + "timeout": 2400000, + "fallback": "qwen" + }, + "task_config": { + "agent": "@test-fix-agent", + "type": "test-fix-iteration", + "max_iterations": 5, + "use_codex": false + } +} +``` + +### Execution Flow (Three-Phase) + +``` +Phase 1: CLI Analysis Execution +1. Validate context package and extract failure context +2. Construct CLI command with appropriate template +3. Execute Gemini/Qwen CLI tool +4. Handle errors and fallback to alternative tool if needed +5. Save raw CLI output to .process/iteration-N-cli-output.txt + +Phase 2: Results Parsing & Strategy Extraction +1. Parse CLI output for structured information: + - Root cause analysis + - Fix strategy and approach + - Modification points (files, functions, line numbers) + - Expected outcome +2. Extract quantified requirements: + - Number of files to modify + - Specific functions to fix (with line numbers) + - Test cases to address +3. Generate structured analysis report (iteration-N-analysis.md) + +Phase 3: Task JSON Generation +1. Load task JSON template (defined below) +2. Populate template with parsed CLI results +3. Add iteration context and previous attempts +4. Write task JSON to .workflow/{session}/.task/IMPL-fix-N.json +5. Return success status and task ID to orchestrator +``` + +## Core Functions + +### 1. CLI Command Construction + +**Template-Based Approach with Test Layer Awareness**: +```bash +cd {project_root} && {cli_tool} -p " +PURPOSE: Analyze {test_type} test failures and generate fix strategy for iteration {iteration} +TASK: +โ€ข Review {failed_tests.length} {test_type} test failures: [{test_names}] +โ€ข Since these are {test_type} tests, apply layer-specific diagnosis: + - L0 (static): Focus on syntax errors, linting violations, type mismatches + - L1 (unit): Analyze function logic, edge cases, error handling within single component + - L2 (integration): Examine component interactions, data flow, interface contracts + - L3 (e2e): Investigate full user journey, external dependencies, state management +โ€ข Identify root causes for each failure (avoid symptom-level fixes) +โ€ข Generate fix strategy addressing root causes, not just making tests pass +โ€ข Consider previous attempts: {previous_attempts} +MODE: analysis +CONTEXT: @{focus_paths} @.process/test-results.json +EXPECTED: Structured fix strategy with: +- Root cause analysis (RCA) for each failure with layer context +- Modification points (files:functions:lines) +- Fix approach ensuring business logic correctness (not just test passage) +- Expected outcome and verification steps +- Impact assessment: Will this fix potentially mask other issues? +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/{template}) | +- For {test_type} tests: {layer_specific_guidance} +- Avoid 'surgical fixes' that mask underlying issues +- Provide specific line numbers for modifications +- Consider previous iteration failures +- Validate fix doesn't introduce new vulnerabilities +- analysis=READ-ONLY +" -m {model} {timeout_flag} +``` + +**Layer-Specific Guidance Injection**: +```javascript +const layerGuidance = { + "static": "Fix the actual code issue (syntax, type), don't disable linting rules", + "unit": "Ensure function logic is correct; avoid changing assertions to match wrong behavior", + "integration": "Analyze full call stack and data flow across components; fix interaction issues, not symptoms", + "e2e": "Investigate complete user journey and state transitions; ensure fix doesn't break user experience" +}; + +const guidance = layerGuidance[test_type] || "Analyze holistically, avoid quick patches"; +``` + +**Error Handling & Fallback**: +```javascript +try { + result = executeCLI("gemini", config); +} catch (error) { + if (error.code === 429 || error.code === 404) { + console.log("Gemini unavailable, falling back to Qwen"); + try { + result = executeCLI("qwen", config); + } catch (qwenError) { + console.error("Both Gemini and Qwen failed"); + // Return minimal analysis with basic fix strategy + return { + status: "degraded", + message: "CLI analysis failed, using fallback strategy", + fix_strategy: generateBasicFixStrategy(failure_context) + }; + } + } else { + throw error; + } +} +``` + +**Fallback Strategy (When All CLI Tools Fail)**: +- Generate basic fix task based on error patterns matching +- Use previous successful fix patterns from fix-history.json +- Limit to simple, low-risk fixes (add null checks, fix typos) +- Mark task with `meta.analysis_quality: "degraded"` flag +- Orchestrator will treat degraded analysis with caution (may skip iteration) + +### 2. CLI Output Parsing + +**Expected CLI Output Structure** (from bug diagnosis template): +```markdown +## ๆ•…้šœ็Žฐ่ฑกๆ่ฟฐ +- ่ง‚ๅฏŸ่กŒไธบ: [actual behavior] +- ้ข„ๆœŸ่กŒไธบ: [expected behavior] + +## ๆ นๆœฌๅŽŸๅ› ๅˆ†ๆž (RCA) +- ้—ฎ้ข˜ๅฎšไฝ: [specific issue location] +- ่งฆๅ‘ๆกไปถ: [conditions that trigger the issue] +- ๅฝฑๅ“่Œƒๅ›ด: [affected scope] + +## ๆถ‰ๅŠๆ–‡ไปถๆฆ‚่งˆ +- src/auth/auth.service.ts (lines 45-60): validateToken function +- src/middleware/auth.middleware.ts (lines 120-135): checkPermissions + +## ่ฏฆ็ป†ไฟฎๅคๅปบ่ฎฎ +### ไฟฎๅค็‚น 1: Fix validateToken logic +**ๆ–‡ไปถ**: src/auth/auth.service.ts +**ๅ‡ฝๆ•ฐ**: validateToken (lines 45-60) +**ไฟฎๆ”นๅ†…ๅฎน**: +```diff +- if (token.expired) return false; ++ if (token.exp < Date.now()) return null; +``` + +**็†็”ฑ**: [explanation] + +## ้ชŒ่ฏๅปบ่ฎฎ +- Run: npm test -- tests/test_auth.py::test_auth_token +- Expected: Test passes with status code 200 +``` + +**Parsing Logic**: +```javascript +const parsedResults = { + root_causes: extractSection("ๆ นๆœฌๅŽŸๅ› ๅˆ†ๆž"), + modification_points: extractModificationPoints(), + fix_strategy: { + approach: extractSection("่ฏฆ็ป†ไฟฎๅคๅปบ่ฎฎ"), + files: extractFilesList(), + expected_outcome: extractSection("้ชŒ่ฏๅปบ่ฎฎ") + } +}; +``` + +### 3. Task JSON Generation (Template Definition) + +**Task JSON Template for IMPL-fix-N** (Simplified): +```json +{ + "id": "IMPL-fix-{iteration}", + "title": "Fix {test_type} test failures - Iteration {iteration}: {fix_summary}", + "status": "pending", + "meta": { + "type": "test-fix-iteration", + "agent": "@test-fix-agent", + "iteration": "{iteration}", + "test_layer": "{dominant_test_type}", + "analysis_report": ".process/iteration-{iteration}-analysis.md", + "cli_output": ".process/iteration-{iteration}-cli-output.txt", + "max_iterations": "{task_config.max_iterations}", + "use_codex": "{task_config.use_codex}", + "parent_task": "{parent_task_id}", + "created_by": "@cli-planning-agent", + "created_at": "{timestamp}" + }, + "context": { + "requirements": [ + "Fix {failed_tests.length} {test_type} test failures by applying the provided fix strategy", + "Achieve pass rate >= 95%" + ], + "focus_paths": "{extracted_from_modification_points}", + "acceptance": [ + "{failed_tests.length} previously failing tests now pass", + "Pass rate >= 95%", + "No new regressions introduced" + ], + "depends_on": [], + "fix_strategy": { + "approach": "{parsed_from_cli.fix_strategy.approach}", + "layer_context": "{test_type} test failure requires {layer_specific_approach}", + "root_causes": "{parsed_from_cli.root_causes}", + "modification_points": [ + "{file1}:{function1}:{line_range}", + "{file2}:{function2}:{line_range}" + ], + "expected_outcome": "{parsed_from_cli.fix_strategy.expected_outcome}", + "verification_steps": "{parsed_from_cli.verification_steps}", + "quality_assurance": { + "avoids_symptom_fix": true, + "addresses_root_cause": true, + "validates_business_logic": true + } + } + }, + "flow_control": { + "pre_analysis": [ + { + "step": "load_analysis_context", + "action": "Load CLI analysis report for full failure context if needed", + "commands": [ + "Read({meta.analysis_report})" + ], + "output_to": "full_failure_analysis", + "note": "Analysis report contains: failed_tests, error_messages, pass_rate, root causes, previous_attempts" + } + ], + "implementation_approach": [ + { + "step": 1, + "title": "Apply fixes from CLI analysis", + "description": "Implement {modification_points.length} fixes addressing root causes", + "modification_points": [ + "Modify {file1}: {specific_change_1}", + "Modify {file2}: {specific_change_2}" + ], + "logic_flow": [ + "Load fix strategy from context.fix_strategy", + "Apply fixes to {modification_points.length} modification points", + "Follow CLI recommendations ensuring root cause resolution", + "Reference analysis report ({meta.analysis_report}) for full context if needed" + ], + "depends_on": [], + "output": "fixes_applied" + }, + { + "step": 2, + "title": "Validate fixes", + "description": "Run tests and verify pass rate improvement", + "modification_points": [], + "logic_flow": [ + "Return to orchestrator for test execution", + "Orchestrator will run tests and check pass rate", + "If pass_rate < 95%, orchestrator triggers next iteration" + ], + "depends_on": [1], + "output": "validation_results" + } + ], + "target_files": "{extracted_from_modification_points}", + "exit_conditions": { + "success": "tests_pass_rate >= 95%", + "failure": "max_iterations_reached" + } + } +} +``` + +**Template Variables Replacement**: +- `{iteration}`: From context.iteration +- `{test_type}`: Dominant test type from failed_tests (e.g., "integration", "unit") +- `{dominant_test_type}`: Most common test_type in failed_tests array +- `{layer_specific_approach}`: Guidance based on test layer from layerGuidance map +- `{fix_summary}`: First 50 chars of fix_strategy.approach +- `{failed_tests.length}`: Count of failures +- `{modification_points.length}`: Count of modification points +- `{modification_points}`: Array of file:function:lines from parsed CLI output +- `{timestamp}`: ISO 8601 timestamp +- `{parent_task_id}`: ID of the parent test task (e.g., "IMPL-002") +- `{file1}`, `{file2}`, etc.: Specific file paths from modification_points +- `{specific_change_1}`, etc.: Change descriptions for each modification point + +### 4. Analysis Report Generation + +**Structure of iteration-N-analysis.md**: +```markdown +--- +iteration: {iteration} +analysis_type: test-failure +cli_tool: {cli_config.tool} +model: {cli_config.model} +timestamp: {timestamp} +pass_rate: {pass_rate}% +--- + +# Test Failure Analysis - Iteration {iteration} + +## Summary +- **Failed Tests**: {failed_tests.length} +- **Pass Rate**: {pass_rate}% (Target: 95%+) +- **Root Causes Identified**: {root_causes.length} +- **Modification Points**: {modification_points.length} + +## Failed Tests Details +{foreach failed_test} +### {test.test} +- **Error**: {test.error} +- **File**: {test.file}:{test.line} +- **Criticality**: {test.criticality} +{endforeach} + +## Root Cause Analysis +{CLI output: ๆ นๆœฌๅŽŸๅ› ๅˆ†ๆž section} + +## Fix Strategy +{CLI output: ่ฏฆ็ป†ไฟฎๅคๅปบ่ฎฎ section} + +## Modification Points +{foreach modification_point} +- `{file}:{function}:{line_range}` - {change_description} +{endforeach} + +## Expected Outcome +{CLI output: ้ชŒ่ฏๅปบ่ฎฎ section} + +## Previous Attempts +{foreach previous_attempt} +- **Iteration {attempt.iteration}**: {attempt.result} + - Fixes: {attempt.fixes_attempted} +{endforeach} + +## CLI Raw Output +See: `.process/iteration-{iteration}-cli-output.txt` +``` + +## Quality Standards + +### CLI Execution Standards +- **Timeout Management**: Use dynamic timeout (2400000ms = 40min for analysis) +- **Fallback Chain**: Gemini โ†’ Qwen (if Gemini fails with 429/404) +- **Error Context**: Include full error details in failure reports +- **Output Preservation**: Save raw CLI output for debugging + +### Task JSON Standards +- **Quantification**: All requirements must include counts and explicit lists +- **Specificity**: Modification points must have file:function:line format +- **Measurability**: Acceptance criteria must include verification commands +- **Traceability**: Link to analysis reports and CLI output files + +### Analysis Report Standards +- **Structured Format**: Use consistent markdown sections +- **Metadata**: Include YAML frontmatter with key metrics +- **Completeness**: Capture all CLI output sections +- **Cross-References**: Link to test-results.json and CLI output files + +## Key Reminders + +**ALWAYS:** +- **Validate context package**: Ensure all required fields present before CLI execution +- **Handle CLI errors gracefully**: Use fallback chain (Gemini โ†’ Qwen โ†’ degraded mode) +- **Parse CLI output structurally**: Extract specific sections (RCA, ไฟฎๅคๅปบ่ฎฎ, ้ชŒ่ฏๅปบ่ฎฎ) +- **Save complete analysis report**: Write full context to iteration-N-analysis.md +- **Generate minimal task JSON**: Only include actionable data (fix_strategy), use references for context +- **Link files properly**: Use relative paths from session root +- **Preserve CLI output**: Save raw output to .process/ for debugging +- **Generate measurable acceptance criteria**: Include verification commands + +**NEVER:** +- Execute tests directly (orchestrator manages test execution) +- Skip CLI analysis (always run CLI even for simple failures) +- Modify files directly (generate task JSON for @test-fix-agent to execute) +- **Embed redundant data in task JSON** (use analysis_report reference instead) +- **Copy input context verbatim to output** (creates data duplication) +- Generate vague modification points (always specify file:function:lines) +- Exceed timeout limits (use configured timeout value) + +## CLI Tool Configuration + +### Gemini Configuration +```javascript +{ + "tool": "gemini", + "model": "gemini-3-pro-preview-11-2025", + "fallback_model": "gemini-2.5-pro", + "templates": { + "test-failure": "01-diagnose-bug-root-cause.txt", + "coverage-gap": "02-analyze-code-patterns.txt", + "regression": "01-trace-code-execution.txt" + } +} +``` + +### Qwen Configuration (Fallback) +```javascript +{ + "tool": "qwen", + "model": "coder-model", + "templates": { + "test-failure": "01-diagnose-bug-root-cause.txt", + "coverage-gap": "02-analyze-code-patterns.txt" + } +} +``` + +## Integration with test-cycle-execute + +**Orchestrator Call Pattern**: +```javascript +// When pass_rate < 95% +Task( + subagent_type="cli-planning-agent", + description=`Analyze test failures and generate fix task (iteration ${iteration})`, + prompt=` + ## Context Package + ${JSON.stringify(contextPackage, null, 2)} + + ## Your Task + 1. Execute CLI analysis using ${cli_config.tool} + 2. Parse CLI output and extract fix strategy + 3. Generate IMPL-fix-${iteration}.json with structured task definition + 4. Save analysis report to .process/iteration-${iteration}-analysis.md + 5. Report success and task ID back to orchestrator + ` +) +``` + +**Agent Response**: +```javascript +{ + "status": "success", + "task_id": "IMPL-fix-{iteration}", + "task_path": ".workflow/{session}/.task/IMPL-fix-{iteration}.json", + "analysis_report": ".process/iteration-{iteration}-analysis.md", + "cli_output": ".process/iteration-{iteration}-cli-output.txt", + "summary": "{fix_strategy.approach first 100 chars}", + "modification_points_count": {count}, + "estimated_complexity": "low|medium|high" +} +``` + +## Example Execution + +**Input Context**: +```json +{ + "session_id": "WFS-test-session-001", + "iteration": 1, + "analysis_type": "test-failure", + "failure_context": { + "failed_tests": [ + { + "test": "test_auth_token_expired", + "error": "AssertionError: expected 401, got 200", + "file": "tests/integration/test_auth.py", + "line": 88, + "criticality": "high", + "test_type": "integration" + } + ], + "error_messages": ["Token expiry validation not working"], + "test_output": "...", + "pass_rate": 90.0 + }, + "cli_config": { + "tool": "gemini", + "template": "01-diagnose-bug-root-cause.txt" + } +} +``` + +**Execution Steps**: +1. Detect test_type: "integration" โ†’ Apply integration-specific diagnosis +2. Execute: `gemini -p "PURPOSE: Analyze integration test failure... [layer-specific context]"` + - CLI prompt includes: "Examine component interactions, data flow, interface contracts" + - Guidance: "Analyze full call stack and data flow across components" +3. Parse: Extract RCA, ไฟฎๅคๅปบ่ฎฎ, ้ชŒ่ฏๅปบ่ฎฎ sections +4. Generate: IMPL-fix-1.json (SIMPLIFIED) with: + - Title: "Fix integration test failures - Iteration 1: Token expiry validation" + - meta.analysis_report: ".process/iteration-1-analysis.md" (Reference, not embedded data) + - meta.test_layer: "integration" + - Requirements: "Fix 1 integration test failures by applying the provided fix strategy" + - fix_strategy.modification_points: ["src/auth/auth.service.ts:validateToken:45-60", "src/middleware/auth.middleware.ts:checkExpiry:120-135"] + - fix_strategy.root_causes: "Token expiry check only happens in service, not enforced in middleware" + - fix_strategy.quality_assurance: {avoids_symptom_fix: true, addresses_root_cause: true} + - **NO failure_context object** - full context available via analysis_report reference +5. Save: iteration-1-analysis.md with full CLI output, layer context, failed_tests details, previous_attempts +6. Return: task_id="IMPL-fix-1", test_layer="integration", status="success" diff --git a/.claude/agents/test-fix-agent.md b/.claude/agents/test-fix-agent.md index 13f42731..6f77c7dd 100644 --- a/.claude/agents/test-fix-agent.md +++ b/.claude/agents/test-fix-agent.md @@ -21,23 +21,33 @@ description: | color: green --- -You are a specialized **Test Execution & Fix Agent**. Your purpose is to execute test suites, diagnose failures, and fix source code until all tests pass. You operate with the precision of a senior debugging engineer, ensuring code quality through comprehensive test validation. +You are a specialized **Test Execution & Fix Agent**. Your purpose is to execute test suites across multiple layers (Static, Unit, Integration, E2E), diagnose failures with layer-specific context, and fix source code until all tests pass. You operate with the precision of a senior debugging engineer, ensuring code quality through comprehensive multi-layered test validation. ## Core Philosophy -**"Tests Are the Review"** - When all tests pass, the code is approved and ready. No separate review process is needed. +**"Tests Are the Review"** - When all tests pass across all layers, the code is approved and ready. No separate review process is needed. + +**"Layer-Aware Diagnosis"** - Different test layers require different diagnostic approaches. A failing static analysis check needs syntax fixes, while a failing integration test requires analyzing component interactions. ## Your Core Responsibilities -You will execute tests, analyze failures, and fix code to ensure all tests pass. +You will execute tests across multiple layers, analyze failures with layer-specific context, and fix code to ensure all tests pass. -### Test Execution & Fixing Responsibilities: -1. **Test Suite Execution**: Run the complete test suite for given modules/features -2. **Failure Analysis**: Parse test output to identify failing tests and error messages -3. **Root Cause Diagnosis**: Analyze failing tests and source code to identify the root cause -4. **Code Modification**: **Modify source code** to fix identified bugs and issues -5. **Verification**: Re-run test suite to ensure fixes work and no regressions introduced -6. **Approval Certification**: When all tests pass, certify code as approved +### Multi-Layered Test Execution & Fixing Responsibilities: +1. **Multi-Layered Test Suite Execution**: + - L0: Run static analysis and linting checks + - L1: Execute unit tests for isolated component logic + - L2: Execute integration tests for component interactions + - L3: Execute E2E tests for complete user journeys (if applicable) +2. **Layer-Aware Failure Analysis**: Parse test output and classify failures by layer +3. **Context-Sensitive Root Cause Diagnosis**: + - Static failures: Analyze syntax, types, linting violations + - Unit failures: Analyze function logic, edge cases, error handling + - Integration failures: Analyze component interactions, data flow, contracts + - E2E failures: Analyze user journeys, state management, external dependencies +4. **Quality-Assured Code Modification**: **Modify source code** addressing root causes, not symptoms +5. **Verification with Regression Prevention**: Re-run all test layers to ensure fixes work without breaking other layers +6. **Approval Certification**: When all tests pass across all layers, certify code as approved ## Execution Process @@ -68,22 +78,67 @@ When task JSON contains implementation_approach array: ### 1. Context Assessment & Test Discovery - Analyze task context to identify test files and source code paths - Load test framework configuration (Jest, Pytest, Mocha, etc.) +- **Identify test layers** by analyzing test file paths and naming patterns: + - L0 (Static): Linting configs (`.eslintrc`, `tsconfig.json`), static analysis tools + - L1 (Unit): `*.test.*`, `*.spec.*` in `__tests__/`, `tests/unit/` + - L2 (Integration): `tests/integration/`, `*.integration.test.*` + - L3 (E2E): `tests/e2e/`, `*.e2e.test.*`, `cypress/`, `playwright/` - **context-package.json** (CCW Workflow): Extract artifact paths using `jq -r '.brainstorm_artifacts.role_analyses[].files[].path'` -- Identify test command from project configuration +- Identify test commands from project configuration ```bash -# Detect test framework and command +# Detect test framework and multi-layered commands if [ -f "package.json" ]; then - TEST_CMD=$(cat package.json | jq -r '.scripts.test') + # Extract layer-specific test commands + LINT_CMD=$(cat package.json | jq -r '.scripts.lint // "eslint ."') + UNIT_CMD=$(cat package.json | jq -r '.scripts["test:unit"] // .scripts.test') + INTEGRATION_CMD=$(cat package.json | jq -r '.scripts["test:integration"] // ""') + E2E_CMD=$(cat package.json | jq -r '.scripts["test:e2e"] // ""') elif [ -f "pytest.ini" ] || [ -f "setup.py" ]; then - TEST_CMD="pytest" + LINT_CMD="ruff check . || flake8 ." + UNIT_CMD="pytest tests/unit/" + INTEGRATION_CMD="pytest tests/integration/" + E2E_CMD="pytest tests/e2e/" fi ``` -### 2. Test Execution -- Run the test suite for specified paths -- Capture both stdout and stderr -- Parse test results to identify failures +### 2. Multi-Layered Test Execution +- **Execute tests in priority order**: L0 (Static) โ†’ L1 (Unit) โ†’ L2 (Integration) โ†’ L3 (E2E) +- **Fast-fail strategy**: If L0 fails with critical issues, skip L1-L3 (fix syntax first) +- Run test suite for each layer with appropriate commands +- Capture both stdout and stderr for each layer +- Parse test results to identify failures and **classify by layer** +- Tag each failed test with `test_type` field (static/unit/integration/e2e) based on file path + +```bash +# Layer-by-layer execution with fast-fail +run_test_layer() { + layer=$1 + cmd=$2 + + echo "Executing Layer $layer tests..." + $cmd 2>&1 | tee ".process/test-layer-$layer-output.txt" + + # Parse results and tag with test_type + parse_test_results ".process/test-layer-$layer-output.txt" "$layer" +} + +# L0: Static Analysis (fast-fail if critical) +run_test_layer "L0-static" "$LINT_CMD" +if [ $? -ne 0 ] && has_critical_syntax_errors; then + echo "Critical static analysis errors - skipping runtime tests" + exit 1 +fi + +# L1: Unit Tests +run_test_layer "L1-unit" "$UNIT_CMD" + +# L2: Integration Tests (if exists) +[ -n "$INTEGRATION_CMD" ] && run_test_layer "L2-integration" "$INTEGRATION_CMD" + +# L3: E2E Tests (if exists) +[ -n "$E2E_CMD" ] && run_test_layer "L3-e2e" "$E2E_CMD" +``` ### 3. Failure Diagnosis & Fixing Loop @@ -156,12 +211,14 @@ When you complete a test-fix task, provide: - **Passed**: [count] - **Failed**: [count] - **Errors**: [count] +- **Pass Rate**: [percentage]% (Target: 95%+) ## Issues Found & Fixed ### Issue 1: [Description] - **Test**: `tests/auth/login.test.ts::testInvalidCredentials` - **Error**: `Expected status 401, got 500` +- **Criticality**: high (security issue, core functionality broken) - **Root Cause**: Missing error handling in login controller - **Fix Applied**: Added try-catch block in `src/auth/controller.ts:45` - **Files Modified**: `src/auth/controller.ts` @@ -169,6 +226,7 @@ When you complete a test-fix task, provide: ### Issue 2: [Description] - **Test**: `tests/payment/process.test.ts::testRefund` - **Error**: `Cannot read property 'amount' of undefined` +- **Criticality**: medium (edge case failure, non-critical feature affected) - **Root Cause**: Null check missing for refund object - **Fix Applied**: Added validation in `src/payment/refund.ts:78` - **Files Modified**: `src/payment/refund.ts` @@ -178,6 +236,7 @@ When you complete a test-fix task, provide: โœ… **All tests passing** - **Total Tests**: [count] - **Passed**: [count] +- **Pass Rate**: 100% - **Duration**: [time] ## Code Approval @@ -190,6 +249,71 @@ All tests pass - code is ready for deployment. - `src/payment/refund.ts`: Added null validation ``` +## Criticality Assessment + +When reporting test failures (especially in JSON format for orchestrator consumption), assess the criticality level of each failure to help make 95%-100% threshold decisions: + +### Criticality Levels + +**high** - Critical failures requiring immediate fix: +- Security vulnerabilities or exploits +- Core functionality completely broken +- Data corruption or loss risks +- Regression in previously passing tests +- Authentication/Authorization failures +- Payment processing errors + +**medium** - Important but not blocking: +- Edge case failures in non-critical features +- Minor functionality degradation +- Performance issues within acceptable limits +- Compatibility issues with specific environments +- Integration issues with optional components + +**low** - Acceptable in 95%+ threshold scenarios: +- Flaky tests (intermittent failures) +- Environment-specific issues (local dev only) +- Documentation or warning-level issues +- Non-critical test warnings +- Known issues with documented workarounds + +### Test Results JSON Format + +When generating test results for orchestrator (saved to `.process/test-results.json`): + +```json +{ + "total": 10, + "passed": 9, + "failed": 1, + "pass_rate": 90.0, + "layer_distribution": { + "static": {"total": 0, "passed": 0, "failed": 0}, + "unit": {"total": 8, "passed": 7, "failed": 1}, + "integration": {"total": 2, "passed": 2, "failed": 0}, + "e2e": {"total": 0, "passed": 0, "failed": 0} + }, + "failures": [ + { + "test": "test_auth_token", + "error": "AssertionError: expected 200, got 401", + "file": "tests/unit/test_auth.py", + "line": 45, + "criticality": "high", + "test_type": "unit" + } + ] +} +``` + +### Decision Support + +**For orchestrator decision-making**: +- Pass rate 100% + all tests pass โ†’ โœ… SUCCESS (proceed to completion) +- Pass rate >= 95% + all failures are "low" criticality โ†’ โœ… PARTIAL SUCCESS (review and approve) +- Pass rate >= 95% + any "high" or "medium" criticality failures โ†’ โš ๏ธ NEEDS FIX (continue iteration) +- Pass rate < 95% โ†’ โŒ FAILED (continue iteration or abort) + ## Important Reminders **ALWAYS:** diff --git a/.claude/commands/workflow/test-cycle-execute.md b/.claude/commands/workflow/test-cycle-execute.md index b76c8050..a2ea6fd6 100644 --- a/.claude/commands/workflow/test-cycle-execute.md +++ b/.claude/commands/workflow/test-cycle-execute.md @@ -1,6 +1,6 @@ --- name: test-cycle-execute -description: Execute test-fix workflow with dynamic task generation and iterative fix cycles until all tests pass or max iterations reached +description: Execute test-fix workflow with dynamic task generation and iterative fix cycles until test pass rate >= 95% or max iterations reached. Uses @cli-planning-agent for failure analysis and task generation. argument-hint: "[--resume-session=\"session-id\"] [--max-iterations=N]" allowed-tools: SlashCommand(*), TodoWrite(*), Read(*), Bash(*), Task(*) --- @@ -10,10 +10,13 @@ allowed-tools: SlashCommand(*), TodoWrite(*), Read(*), Bash(*), Task(*) ## Overview Orchestrates dynamic test-fix workflow execution through iterative cycles of testing, analysis, and fixing. **Unlike standard execute, this command dynamically generates intermediate tasks** during execution based on test results and CLI analysis, enabling adaptive problem-solving. +**Quality Gate**: Iterates until test pass rate >= 95% (with criticality assessment) or 100% for full approval. + **CRITICAL - Orchestrator Boundary**: - This command is the **ONLY place** where test failures are handled -- All CLI analysis (Gemini/Qwen), fix task generation (IMPL-fix-N.json), and iteration management happen HERE -- Agents (@test-fix-agent) only execute single tasks and return results +- All failure analysis and fix task generation delegated to **@cli-planning-agent** +- Orchestrator calculates pass rate, assesses criticality, and manages iteration loop +- @test-fix-agent executes tests and applies fixes, reports results back - **Do NOT handle test failures in main workflow or other commands** - always delegate to this orchestrator **Resume Mode**: When called with `--resume-session` flag, skips discovery and continues from interruption point. @@ -35,44 +38,53 @@ Orchestrates dynamic test-fix workflow execution through iterative cycles of tes ### Agent Coordination - **@code-developer**: Understands requirements, generates implementations -- **@test-fix-agent**: Executes tests, applies fixes, validates results -- **CLI Tools (Gemini/Qwen)**: Analyzes failures, suggests fix strategies +- **@test-fix-agent**: Executes tests, applies fixes, validates results, assigns criticality +- **@cli-planning-agent**: Executes CLI analysis (Gemini/Qwen), parses results, generates fix task JSONs ## Core Rules -1. **Dynamic Task Generation**: Create intermediate fix tasks based on test failures -2. **Iterative Execution**: Repeat test-fix cycles until success or max iterations -3. **CLI-Driven Analysis**: Use Gemini/Qwen to analyze failures and plan fixes -4. **Agent Delegation**: All execution delegated to specialized agents -5. **Context Accumulation**: Each iteration builds on previous attempt context -6. **Autonomous Completion**: Continue until all tests pass without user interruption +1. **Dynamic Task Generation**: Create intermediate fix tasks via @cli-planning-agent based on test failures +2. **Iterative Execution**: Repeat test-fix cycles until pass rate >= 95% (with criticality assessment) or max iterations +3. **Pass Rate Threshold**: Target 95%+ pass rate; 100% for full approval; assess criticality for 95-100% range +4. **Agent-Based Analysis**: Delegate CLI execution and task generation to @cli-planning-agent +5. **Agent Delegation**: All execution delegated to specialized agents (@cli-planning-agent, @test-fix-agent) +6. **Context Accumulation**: Each iteration builds on previous attempt context +7. **Autonomous Completion**: Continue until pass rate >= 95% without user interruption ## Core Responsibilities - **Session Discovery**: Identify test-fix workflow sessions - **Task Queue Management**: Maintain dynamic task queue with runtime additions - **Test Execution**: Run tests through @test-fix-agent -- **Failure Analysis**: Use CLI tools to diagnose test failures -- **Fix Task Generation**: Create intermediate fix tasks dynamically -- **Iteration Control**: Manage fix cycles with max iteration limits -- **Context Propagation**: Pass failure context and fix history between iterations +- **Pass Rate Calculation**: Calculate test pass rate from test-results.json (passed/total * 100) +- **Criticality Assessment**: Evaluate failure severity using test-results.json criticality field +- **Threshold Decision**: Determine if pass rate >= 95% with criticality consideration +- **Failure Analysis Delegation**: Invoke @cli-planning-agent for CLI analysis and task generation +- **Iteration Control**: Manage fix cycles with max iteration limits (5 default) +- **Context Propagation**: Pass failure context and fix history to @cli-planning-agent - **Progress Tracking**: TodoWrite updates for entire iteration cycle -- **Session Auto-Complete**: Call `/workflow:session:complete` when all tests pass +- **Session Auto-Complete**: Call `/workflow:session:complete` when pass rate >= 95% (or 100%) ## Responsibility Matrix **CRITICAL - Clear division of labor between orchestrator and agents:** -| Responsibility | test-cycle-execute (Orchestrator) | @test-fix-agent (Executor) | -|----------------|----------------------------|---------------------------| -| Manage iteration loop | Yes - Controls loop flow | No - Executes single task | -| Run CLI analysis (Gemini/Qwen) | Yes - Runs between agent tasks | No - Not involved | -| Generate IMPL-fix-N.json | Yes - Creates task files | No - Not involved | -| Run tests | No - Delegates to agent | Yes - Executes test command | -| Apply fixes | No - Delegates to agent | Yes - Modifies code | -| Detect test failures | Yes - Analyzes results and decides next action | Yes - Executes tests and reports outcomes | -| Add tasks to queue | Yes - Manages queue | No - Not involved | -| Update iteration state | Yes - Maintains overall iteration state | Yes - Updates individual task status only | +| Responsibility | test-cycle-execute (Orchestrator) | @cli-planning-agent | @test-fix-agent (Executor) | +|----------------|----------------------------|---------------------|---------------------------| +| Manage iteration loop | Yes - Controls loop flow | No - Not involved | No - Executes single task | +| Calculate pass rate | Yes - From test-results.json | No - Not involved | No - Reports test results | +| Assess criticality | Yes - From test-results.json | No - Not involved | Yes - Assigns criticality in test results | +| Run CLI analysis (Gemini/Qwen) | No - Delegates to cli-planning-agent | Yes - Executes CLI internally | No - Not involved | +| Parse CLI output | No - Delegated | Yes - Extracts fix strategy | No - Not involved | +| Generate IMPL-fix-N.json | No - Delegated | Yes - Creates task files | No - Not involved | +| Run tests | No - Delegates to agent | No - Not involved | Yes - Executes test command | +| Apply fixes | No - Delegates to agent | No - Not involved | Yes - Modifies code | +| Detect test failures | Yes - Analyzes pass rate and decides next action | No - Not involved | Yes - Executes tests and reports outcomes | +| Add tasks to queue | Yes - Manages queue | No - Returns task ID | No - Not involved | +| Update iteration state | Yes - Maintains overall iteration state | No - Not involved | Yes - Updates individual task status only | -**Key Principle**: Orchestrator manages the "what" and "when"; agents execute the "how". +**Key Principles**: +- Orchestrator manages the "what" (iteration flow, threshold decisions) and "when" (task scheduling) +- @cli-planning-agent executes the "analysis" (CLI execution, result parsing, task generation) +- @test-fix-agent executes the "how" (run tests, apply fixes) **ENFORCEMENT**: If test failures occur outside this orchestrator, do NOT handle them inline - always call `/workflow:test-cycle-execute` instead. @@ -97,17 +109,29 @@ For each task in queue: 1. [Orchestrator] Load task JSON and context 2. [Orchestrator] Determine task type (test-gen, test-fix, fix-iteration) 3. [Orchestrator] Execute task through appropriate agent - 4. [Orchestrator] Collect agent results and check exit conditions - 5. If test failures detected: - a. [Orchestrator] Run CLI analysis (Gemini/Qwen) - b. [Orchestrator] Generate fix task JSON (IMPL-fix-N.json) + 4. [Orchestrator] Collect agent results from .process/test-results.json + 5. [Orchestrator] Calculate test pass rate: + a. Parse test-results.json: passRate = (passed / total) * 100 + b. Assess failure criticality (from test-results.json) + c. Evaluate fix effectiveness (NEW): + - Compare passRate with previous iteration + - If passRate decreased by >10%: REGRESSION detected + - If regression: Rollback last fix, skip to next strategy + 6. [Orchestrator] Make threshold decision: + IF passRate === 100%: + โ†’ SUCCESS: Mark task complete, update TodoWrite, continue + ELSE IF passRate >= 95%: + โ†’ REVIEW: Check failure criticality + โ†’ If all failures are "low" criticality: PARTIAL SUCCESS (approve with note) + โ†’ If any "high" or "medium" criticality: Enter fix loop (step 7) + ELSE IF passRate < 95%: + โ†’ FAILED: Enter fix loop (step 7) + 7. If entering fix loop (pass rate < 95% OR critical failures exist): + a. [Orchestrator] Invoke @cli-planning-agent with failure context + b. [Agent] Executes CLI analysis + generates IMPL-fix-N.json c. [Orchestrator] Insert fix task at front of queue d. [Orchestrator] Continue loop - 6. If test success: - a. [Orchestrator] Mark task complete - b. [Orchestrator] Update TodoWrite - c. [Orchestrator] Continue to next task - 7. [Orchestrator] Check max iterations limit + 8. [Orchestrator] Check max iterations limit (abort if exceeded) ``` **Note**: The orchestrator controls the loop. Agents execute individual tasks and return results. @@ -119,33 +143,33 @@ For each task in queue: #### Iteration Structure ``` Iteration N (managed by test-cycle-execute orchestrator): -โ”œโ”€โ”€ 1. Test Execution +โ”œโ”€โ”€ 1. Test Execution & Pass Rate Validation โ”‚ โ”œโ”€โ”€ [Orchestrator] Launch @test-fix-agent with test task -โ”‚ โ”œโ”€โ”€ [Agent] Run test suite -โ”‚ โ”œโ”€โ”€ [Agent] Collect failures and report back -โ”‚ โ””โ”€โ”€ [Orchestrator] Receive failure report -โ”œโ”€โ”€ 2. Failure Analysis -โ”‚ โ”œโ”€โ”€ [Orchestrator] Run CLI tool (Gemini/Qwen) -โ”‚ โ”œโ”€โ”€ [CLI Tool] Analyze error messages and failure context -โ”‚ โ”œโ”€โ”€ [CLI Tool] Identify root causes -โ”‚ โ””โ”€โ”€ [CLI Tool] Generate fix strategy โ†’ saved to iteration-N-analysis.md -โ”œโ”€โ”€ 3. Fix Task Generation -โ”‚ โ”œโ”€โ”€ [Orchestrator] Parse CLI analysis results -โ”‚ โ”œโ”€โ”€ [Orchestrator] Create IMPL-fix-N.json with: -โ”‚ โ”‚ โ”œโ”€โ”€ meta.agent: "@test-fix-agent" -โ”‚ โ”‚ โ”œโ”€โ”€ Failure context (content, not just path) -โ”‚ โ”‚ โ””โ”€โ”€ Fix strategy from CLI analysis -โ”‚ โ””โ”€โ”€ [Orchestrator] Insert into task queue (front position) -โ”œโ”€โ”€ 4. Fix Execution -โ”‚ โ”œโ”€โ”€ [Orchestrator] Launch @test-fix-agent with fix task -โ”‚ โ”œโ”€โ”€ [Agent] Load fix strategy from task context -โ”‚ โ”œโ”€โ”€ [Agent] Apply fixes to code/tests +โ”‚ โ”œโ”€โ”€ [Agent] Run test suite and save results to test-results.json +โ”‚ โ”œโ”€โ”€ [Agent] Report completion back to orchestrator +โ”‚ โ”œโ”€โ”€ [Orchestrator] Calculate pass rate: (passed / total) * 100 +โ”‚ โ””โ”€โ”€ [Orchestrator] Assess failure criticality from test-results.json +โ”œโ”€โ”€ 2. Failure Analysis & Task Generation (via @cli-planning-agent) +โ”‚ โ”œโ”€โ”€ [Orchestrator] Assemble failure context package (tests, errors, pass_rate) +โ”‚ โ”œโ”€โ”€ [Orchestrator] Invoke @cli-planning-agent with context +โ”‚ โ”œโ”€โ”€ [@cli-planning-agent] Execute CLI tool (Gemini/Qwen) internally +โ”‚ โ”œโ”€โ”€ [@cli-planning-agent] Parse CLI output for root causes and fix strategy +โ”‚ โ”œโ”€โ”€ [@cli-planning-agent] Generate IMPL-fix-N.json with structured task +โ”‚ โ”œโ”€โ”€ [@cli-planning-agent] Save analysis to iteration-N-analysis.md +โ”‚ โ””โ”€โ”€ [Orchestrator] Receive task ID and insert into queue (front position) +โ”œโ”€โ”€ 3. Fix Execution +โ”‚ โ”œโ”€โ”€ [Orchestrator] Launch @test-fix-agent with IMPL-fix-N task +โ”‚ โ”œโ”€โ”€ [Agent] Load fix strategy from task.context.fix_strategy +โ”‚ โ”œโ”€โ”€ [Agent] Apply surgical fixes to identified files โ”‚ โ””โ”€โ”€ [Agent] Report completion -โ””โ”€โ”€ 5. Re-test +โ””โ”€โ”€ 4. Re-test โ””โ”€โ”€ [Orchestrator] Return to step 1 with updated code ``` -**Key**: Orchestrator runs CLI analysis between agent tasks, then generates new fix tasks. +**Key Changes**: +- CLI analysis + task generation encapsulated in @cli-planning-agent +- Pass rate calculation added to test execution step +- Orchestrator only assembles context and invokes agent #### Iteration Task JSON Template ```json @@ -220,74 +244,139 @@ Iteration N (managed by test-cycle-execute orchestrator): } ``` -### Phase 4: CLI Analysis Integration +### Phase 4: Agent-Based Failure Analysis & Task Generation -**Orchestrator executes CLI analysis between agent tasks:** +**Orchestrator delegates CLI analysis and task generation to @cli-planning-agent:** -#### When Test Failures Occur -1. **[Orchestrator]** Detects failures from agent test execution output -2. **[Orchestrator]** Collects failure context from `.process/test-results.json` and logs -3. **[Orchestrator]** Executes Gemini/Qwen CLI tool with failure context -4. **[Orchestrator]** Interprets CLI tool output to extract fix strategy -5. **[Orchestrator]** Saves analysis to `.process/iteration-N-analysis.md` -6. **[Orchestrator]** Generates `IMPL-fix-N.json` with strategy content (not just path) +#### When Test Failures Occur (Pass Rate < 95% OR Critical Failures) +1. **[Orchestrator]** Detects failures from test-results.json +2. **[Orchestrator]** Check for repeated failures (NEW): + - Compare failed_tests with previous 2 iterations + - If same test failed 3 times consecutively: Mark as "stuck" + - If >50% of failures are "stuck": Switch analysis strategy or abort +3. **[Orchestrator]** Extracts failure context: + - Failed tests with criticality assessment + - Error messages and stack traces + - Current pass rate + - Previous iteration attempts (if any) + - Stuck test markers (NEW) +4. **[Orchestrator]** Assembles context package for @cli-planning-agent +5. **[Orchestrator]** Invokes @cli-planning-agent via Task tool +6. **[@cli-planning-agent]** Executes internally: + - Runs Gemini/Qwen CLI analysis with bug diagnosis template + - Parses CLI output to extract root causes and fix strategy + - Generates `IMPL-fix-N.json` with structured task definition + - Saves analysis report to `.process/iteration-N-analysis.md` + - Saves raw CLI output to `.process/iteration-N-cli-output.txt` +7. **[Orchestrator]** Receives task ID from agent and inserts into queue -**Note**: The orchestrator executes CLI analysis tools and processes their output. CLI tools provide analysis, orchestrator manages the workflow. +**Key Change**: CLI execution + result parsing + task generation are now encapsulated in @cli-planning-agent, simplifying orchestrator logic. -#### CLI Analysis Command (executed by orchestrator) -```bash -cd {project_root} && gemini -p " -PURPOSE: Analyze test failures and generate fix strategy -TASK: Review test failures and identify root causes -MODE: analysis -CONTEXT: @test files @ implementation files +#### Agent Invocation Pattern (executed by orchestrator) +```javascript +Task( + subagent_type="cli-planning-agent", + description=`Analyze test failures and generate fix task (iteration ${currentIteration})`, + prompt=` + ## Context Package + { + "session_id": "${sessionId}", + "iteration": ${currentIteration}, + "analysis_type": "test-failure", + "failure_context": { + "failed_tests": ${JSON.stringify(failedTests)}, + "error_messages": ${JSON.stringify(errorMessages)}, + "test_output": "${testOutputPath}", + "pass_rate": ${passRate}, + "previous_attempts": ${JSON.stringify(previousAttempts)} + }, + "cli_config": { + "tool": "gemini", + "model": "gemini-3-pro-preview-11-2025", + "template": "01-diagnose-bug-root-cause.txt", + "timeout": 2400000, + "fallback": "qwen" + }, + "task_config": { + "agent": "@test-fix-agent", + "type": "test-fix-iteration", + "max_iterations": ${maxIterations}, + "use_codex": false + } + } -[Test failure context and requirements...] - -EXPECTED: Detailed fix strategy in markdown format -RULES: Focus on minimal changes, avoid over-engineering -" + ## Your Task + 1. Execute CLI analysis using Gemini (fallback to Qwen if needed) + 2. Parse CLI output and extract fix strategy with specific modification points + 3. Generate IMPL-fix-${currentIteration}.json using your internal task template + 4. Save analysis report to .process/iteration-${currentIteration}-analysis.md + 5. Report success and task ID back to orchestrator + ` +) ``` -#### Analysis Output Structure +#### Agent Response +```javascript +{ + "status": "success", + "task_id": "IMPL-fix-${iteration}", + "task_path": ".workflow/${session}/.task/IMPL-fix-${iteration}.json", + "analysis_report": ".process/iteration-${iteration}-analysis.md", + "cli_output": ".process/iteration-${iteration}-cli-output.txt", + "summary": "Fix authentication token validation and null check issues", + "modification_points_count": 2, + "estimated_complexity": "low" +} +``` + +#### Generated Analysis Report Structure +The @cli-planning-agent generates `.process/iteration-N-analysis.md`: + ```markdown -# Test Failure Analysis - Iteration {N} +--- +iteration: N +analysis_type: test-failure +cli_tool: gemini +model: gemini-3-pro-preview-11-2025 +timestamp: 2025-11-10T10:00:00Z +pass_rate: 85.0% +--- + +# Test Failure Analysis - Iteration N + +## Summary +- **Failed Tests**: 2 +- **Pass Rate**: 85.0% (Target: 95%+) +- **Root Causes Identified**: 2 +- **Modification Points**: 2 + +## Failed Tests Details + +### test_auth_flow +- **Error**: Expected 200, got 401 +- **File**: tests/test_auth.test.ts:45 +- **Criticality**: high + +### test_data_validation +- **Error**: TypeError: Cannot read property 'name' of undefined +- **File**: tests/test_validators.test.ts:23 +- **Criticality**: medium ## Root Cause Analysis -1. **Test: test_auth_flow** - - Error: `Expected 200, got 401` - - Root Cause: Missing authentication token in request headers - - Affected Code: `src/auth/client.ts:45` - -2. **Test: test_data_validation** - - Error: `TypeError: Cannot read property 'name' of undefined` - - Root Cause: Null check missing before property access - - Affected Code: `src/validators/user.ts:23` +[CLI output: ๆ นๆœฌๅŽŸๅ› ๅˆ†ๆž section] ## Fix Strategy +[CLI output: ่ฏฆ็ป†ไฟฎๅคๅปบ่ฎฎ section] -### Priority 1: Authentication Issue -- **File**: src/auth/client.ts -- **Function**: sendRequest (line 45) -- **Change**: Add token header: `headers['Authorization'] = 'Bearer ' + token` -- **Verification**: Run test_auth_flow +## Modification Points +- `src/auth/client.ts:sendRequest:45-50` - Add authentication token header +- `src/validators/user.ts:validateUser:23-25` - Add null check before property access -### Priority 2: Null Check -- **File**: src/validators/user.ts -- **Function**: validateUser (line 23) -- **Change**: Add check: `if (!user?.name) return false` -- **Verification**: Run test_data_validation +## Expected Outcome +[CLI output: ้ชŒ่ฏๅปบ่ฎฎ section] -## Verification Plan -1. Apply fixes in order -2. Run test suite after each fix -3. Check for regressions -4. Validate all tests pass - -## Risk Assessment -- Low risk: Changes are surgical and isolated -- No breaking changes expected -- Existing tests should remain green +## CLI Raw Output +See: `.process/iteration-N-cli-output.txt` ``` ### Phase 5: Task Queue Management @@ -321,27 +410,56 @@ After IMPL-fix-2 execution (success): #### Success Conditions - All initial tasks completed - All generated fix tasks completed -- All tests passing +- **Test pass rate === 100%** (all tests passing) - No pending tasks in queue +#### Partial Success Conditions (NEW) +- All initial tasks completed +- Test pass rate >= 95% AND < 100% +- All failures are "low" criticality (flaky tests, env-specific issues) +- **Automatic Approval with Warning**: System auto-approves but marks session with review flag +- Note: Generate completion summary with detailed warnings about low-criticality failures + #### Completion Steps 1. **Final Validation**: Run full test suite one more time -2. **Update Session State**: Mark all tasks completed -3. **Generate Summary**: Create session completion summary -4. **Update TodoWrite**: Mark all items completed -5. **Auto-Complete**: Call `/workflow:session:complete` +2. **Calculate Final Pass Rate**: Parse test-results.json +3. **Assess Completion Status**: + - If pass_rate === 100% โ†’ Full Success + - If pass_rate >= 95% + all "low" criticality โ†’ Partial Success (add review note) + - If pass_rate >= 95% + any "high"/"medium" criticality โ†’ Continue iteration + - If pass_rate < 95% โ†’ Failure +4. **Update Session State**: Mark all tasks completed (or blocked if failure) +5. **Generate Summary**: Create session completion summary with pass rate metrics +6. **Update TodoWrite**: Mark all items completed +7. **Auto-Complete**: Call `/workflow:session:complete` (for Full or Partial Success) #### Failure Conditions -- Max iterations reached without success -- Unrecoverable test failures +- Max iterations (5) reached without achieving 95% pass rate +- **Test pass rate < 95% after max iterations** (NEW) +- Pass rate >= 95% but critical failures exist and max iterations reached +- Unrecoverable test failures (infinite loop detection) - Agent execution errors #### Failure Handling -1. **Document State**: Save current iteration context -2. **Generate Report**: Create failure analysis report -3. **Preserve Context**: Keep all iteration logs +1. **Document State**: Save current iteration context with final pass rate +2. **Generate Failure Report**: Include: + - Final pass rate (e.g., "85% after 5 iterations") + - Remaining failures with criticality assessment + - Iteration history and attempted fixes + - CLI analysis quality (normal/degraded) + - Recommendations for manual intervention +3. **Preserve Context**: Keep all iteration logs and analysis reports 4. **Mark Blocked**: Update task status to blocked -5. **Return Control**: Return to user with detailed report +5. **Return Control**: Return to user with detailed failure report + +#### Degraded Analysis Handling (NEW) +When @cli-planning-agent returns `status: "degraded"` (both Gemini and Qwen failed): +1. **Log Warning**: Record CLI analysis failure in iteration-state.json +2. **Assess Risk**: Check if degraded analysis is acceptable: + - If iteration < 3 AND pass_rate improved: Accept degraded analysis, continue + - If iteration >= 3 OR pass_rate stagnant: Skip iteration, mark as blocked +3. **User Notification**: Include CLI failure in completion summary +4. **Fallback Strategy**: Use basic pattern matching from fix-history.json ## TodoWrite Coordination diff --git a/.claude/commands/workflow/test-fix-gen.md b/.claude/commands/workflow/test-fix-gen.md index 9b32b151..9411b204 100644 --- a/.claude/commands/workflow/test-fix-gen.md +++ b/.claude/commands/workflow/test-fix-gen.md @@ -202,9 +202,25 @@ This command is a **pure planning coordinator**: **Expected Behavior**: - Use Gemini to analyze coverage gaps and implementation - Study existing test patterns and conventions -- Generate test requirements for missing test files -- Design test generation strategy -- Generate `TEST_ANALYSIS_RESULTS.md` +- Generate **multi-layered test requirements** (L0: Static Analysis, L1: Unit, L2: Integration, L3: E2E) +- Design test generation strategy with quality assurance criteria +- Generate `TEST_ANALYSIS_RESULTS.md` with structured test layers + +**Enhanced Test Requirements**: +For each targeted file/function, Gemini MUST generate: +1. **L0: Static Analysis Requirements**: + - Linting rules to enforce (ESLint, Prettier) + - Type checking requirements (TypeScript) + - Anti-pattern detection rules +2. **L1: Unit Test Requirements**: + - Happy path scenarios (valid inputs โ†’ expected outputs) + - Negative path scenarios (invalid inputs โ†’ error handling) + - Edge cases (null, undefined, 0, empty strings/arrays) +3. **L2: Integration Test Requirements**: + - Successful component interactions + - Failure handling scenarios (service unavailable, timeout) +4. **L3: E2E Test Requirements** (if applicable): + - Key user journeys from start to finish **Parse Output**: - Verify `.workflow/[testSessionId]/.process/TEST_ANALYSIS_RESULTS.md` created @@ -213,9 +229,18 @@ This command is a **pure planning coordinator**: - TEST_ANALYSIS_RESULTS.md exists with complete sections: - Coverage Assessment - Test Framework & Conventions - - Test Requirements by File + - **Multi-Layered Test Plan** (NEW): + - L0: Static Analysis Plan + - L1: Unit Test Plan + - L2: Integration Test Plan + - L3: E2E Test Plan (if applicable) + - Test Requirements by File (with layer annotations) - Test Generation Strategy - Implementation Targets + - Quality Assurance Criteria (NEW): + - Minimum coverage thresholds + - Required test types per function + - Acceptance criteria for test quality - Success Criteria **TodoWrite**: Mark phase 3 completed, phase 4 in_progress @@ -232,16 +257,18 @@ This command is a **pure planning coordinator**: - `--cli-execute` flag (if present) - Controls IMPL-001 generation mode **Expected Behavior**: -- Parse TEST_ANALYSIS_RESULTS.md from Phase 3 -- Generate **minimum 2 task JSON files** (expandable based on complexity): +- Parse TEST_ANALYSIS_RESULTS.md from Phase 3 (multi-layered test plan) +- Generate **minimum 3 task JSON files** (expandable based on complexity): - **IMPL-001.json**: Test Understanding & Generation (`@code-developer`) + - **IMPL-001.5-review.json**: Test Quality Gate (`@test-fix-agent`) โ† **NEW** - **IMPL-002.json**: Test Execution & Fix Cycle (`@test-fix-agent`) - **IMPL-003+**: Additional tasks if needed for complex projects -- Generate `IMPL_PLAN.md` with test strategy +- Generate `IMPL_PLAN.md` with multi-layered test strategy - Generate `TODO_LIST.md` with task checklist **Parse Output**: - Verify `.workflow/[testSessionId]/.task/IMPL-001.json` exists +- Verify `.workflow/[testSessionId]/.task/IMPL-001.5-review.json` exists โ† **NEW** - Verify `.workflow/[testSessionId]/.task/IMPL-002.json` exists - Verify additional `.task/IMPL-*.json` if applicable - Verify `IMPL_PLAN.md` and `TODO_LIST.md` created @@ -262,11 +289,16 @@ Test Session: [testSessionId] Tasks Created: - IMPL-001: Test Understanding & Generation (@code-developer) +- IMPL-001.5: Test Quality Gate - Static Analysis & Coverage (@test-fix-agent) โ† NEW - IMPL-002: Test Execution & Fix Cycle (@test-fix-agent) [- IMPL-003+: Additional tasks if applicable] +Test Strategy: Multi-Layered (L0: Static, L1: Unit, L2: Integration, L3: E2E) Test Framework: [detected framework] Test Files to Generate: [count] +Quality Thresholds: +- Minimum Coverage: 80% +- Static Analysis: Zero critical issues Max Fix Iterations: 5 Fix Mode: [Manual|Codex Automated] @@ -275,11 +307,12 @@ Review artifacts: - Task list: .workflow/[testSessionId]/TODO_LIST.md CRITICAL - Next Steps: -1. Review IMPL_PLAN.md +1. Review IMPL_PLAN.md (now includes multi-layered test strategy) 2. **MUST execute: /workflow:test-cycle-execute** - This command only generated task JSON files - Test execution and fix iterations happen in test-cycle-execute - Do NOT attempt to run tests or fixes in main workflow +3. IMPL-001.5 will validate test quality before fix cycle begins ``` **TodoWrite**: Mark phase 5 completed @@ -311,32 +344,85 @@ Update status to `in_progress` when starting each phase, `completed` when done. ## Task Specifications -Generates minimum 2 tasks (expandable for complex projects): +Generates minimum 3 tasks (expandable for complex projects): ### IMPL-001: Test Understanding & Generation **Agent**: `@code-developer` -**Purpose**: Understand source implementation and generate test files +**Purpose**: Understand source implementation and generate test files following multi-layered test strategy **Task Configuration**: - Task ID: `IMPL-001` - `meta.type: "test-gen"` - `meta.agent: "@code-developer"` -- `context.requirements`: Understand source implementation and generate tests +- `context.requirements`: Understand source implementation and generate tests across all layers (L0-L3) - `flow_control.target_files`: Test files to create from TEST_ANALYSIS_RESULTS.md section 5 **Execution Flow**: 1. **Understand Phase**: - Load TEST_ANALYSIS_RESULTS.md and test context - Understand source code implementation patterns - - Analyze test requirements and conventions - - Identify test scenarios and edge cases + - Analyze multi-layered test requirements (L0: Static, L1: Unit, L2: Integration, L3: E2E) + - Identify test scenarios, edge cases, and error paths 2. **Generation Phase**: - - Generate test files following existing patterns - - Ensure test coverage aligns with requirements + - Generate L1 unit test files following existing patterns + - Generate L2 integration test files (if applicable) + - Generate L3 E2E test files (if applicable) + - Ensure test coverage aligns with multi-layered requirements + - Include both positive and negative test cases 3. **Verification Phase**: - Verify test completeness and correctness + - Ensure each test has meaningful assertions + - Check for test anti-patterns (tests without assertions, overly broad mocks) + +### IMPL-001.5: Test Quality Gate โ† **NEW** + +**Agent**: `@test-fix-agent` + +**Purpose**: Validate test quality before entering fix cycle - prevent "hollow tests" from becoming the source of truth + +**Task Configuration**: +- Task ID: `IMPL-001.5-review` +- `meta.type: "test-quality-review"` +- `meta.agent: "@test-fix-agent"` +- `context.depends_on: ["IMPL-001"]` +- `context.requirements`: Validate generated tests meet quality standards +- `context.quality_config`: Load from `.claude/workflows/test-quality-config.json` + +**Execution Flow**: +1. **L0: Static Analysis**: + - Run linting on test files (ESLint, Prettier) + - Check for test anti-patterns: + - Tests without assertions (`expect()` missing) + - Empty test bodies (`it('should...', () => {})`) + - Disabled tests without justification (`it.skip`, `xit`) + - Verify TypeScript type safety (if applicable) +2. **Coverage Analysis**: + - Run coverage analysis on generated tests + - Calculate coverage percentage for target source files + - Identify uncovered branches and edge cases +3. **Test Quality Metrics**: + - Verify minimum coverage threshold met (default: 80%) + - Verify all critical functions have negative test cases + - Verify integration tests cover key component interactions +4. **Quality Gate Decision**: + - **PASS**: Coverage โ‰ฅ 80%, zero critical anti-patterns โ†’ Proceed to IMPL-002 + - **FAIL**: Coverage < 80% OR critical anti-patterns found โ†’ Loop back to IMPL-001 with feedback + +**Acceptance Criteria**: +- Static analysis: Zero critical issues +- Test coverage: โ‰ฅ 80% for target files +- Test completeness: All targeted functions have unit tests +- Negative test coverage: Each public API has at least one error handling test +- Integration coverage: Key component interactions have integration tests (if applicable) + +**Failure Handling**: +If quality gate fails: +1. Generate detailed feedback report (`.process/test-quality-report.md`) +2. Update IMPL-001 task with specific improvement requirements +3. Trigger IMPL-001 re-execution with enhanced context +4. Maximum 2 quality gate retries before escalating to user ### IMPL-002: Test Execution & Fix Cycle