feat: Implement dynamic test-fix execution phase with adaptive task generation

- Added Phase 2: Test-Cycle Execution documentation outlining the process for dynamic test-fix execution, including agent roles, core responsibilities, intelligent strategy engine, and progressive testing.
- Introduced new PowerShell scripts for analyzing TypeScript errors, focusing on error categorization and reporting.
- Created end-to-end tests for the Help Page, ensuring content visibility, documentation navigation, internationalization support, and accessibility compliance.
catlog22
2026-02-07 17:01:30 +08:00
parent 4ce4419ea6
commit ba5f4eba84
70 changed files with 7288 additions and 488 deletions


@@ -0,0 +1,774 @@
## Auto Mode
When `--yes` or `-y`: Skip user questions, use defaults (no materials, Agent executor).
# Phase 2: TDD Task Generation
## Overview
Generates TDD task JSON files and IMPL_PLAN.md autonomously using action-planning-agent in two phases: discovery, then document generation. Each generated task contains a complete Red-Green-Refactor cycle.
## Core Philosophy
- **Agent-Driven**: Delegate execution to action-planning-agent for autonomous operation
- **Two-Phase Flow**: Discovery (context gathering) → Output (document generation)
- **Memory-First**: Reuse loaded documents from conversation memory
- **MCP-Enhanced**: Use MCP tools for advanced code analysis and research
- **Semantic CLI Selection**: CLI tool usage determined from user's task description, not flags
- **Agent Simplicity**: Agent generates content with semantic CLI detection
- **Path Clarity**: All `focus_paths` prefer absolute paths (e.g., `D:\\project\\src\\module`), or clear relative paths from project root (e.g., `./src/module`)
- **TDD-First**: Every feature starts with a failing test (Red phase)
- **Feature-Complete Tasks**: Each task contains complete Red-Green-Refactor cycle
- **Quantification-Enforced**: All test cases, coverage requirements, and implementation scope MUST include explicit counts and enumerations
- **Explicit Lifecycle**: Always close_agent after wait completes to free resources
## Task Strategy & Philosophy
### Optimized Task Structure (Current)
- **1 feature = 1 task** containing complete TDD cycle internally
- Each task executes Red-Green-Refactor phases sequentially
- Task count = Feature count (typically 5 features = 5 tasks)
**Previous Approach** (Deprecated):
- 1 feature = 3 separate tasks (TEST-N.M, IMPL-N.M, REFACTOR-N.M)
- 5 features = 15 tasks with complex dependency chains
- High context switching cost between phases
### When to Use Subtasks
- Feature complexity >2500 lines or >6 files per TDD cycle
- Multiple independent sub-features needing parallel execution
- Strong technical dependency blocking (e.g., API before UI)
- Different tech stacks or domains within feature
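A minimal sketch of how the subtask criteria above could be checked. The function name and the feature fields (`estimatedLines`, `fileCount`, `independentSubFeatures`, `hasBlockingDependency`, `techStacks`) are hypothetical and not part of the workflow schema; only the thresholds come from the list above.

```javascript
// Hypothetical helper: decides whether a feature should be split into IMPL-N.M subtasks.
// All field names on `feature` are illustrative assumptions.
function needsSubtasks(feature) {
  const reasons = [];
  if (feature.estimatedLines > 2500) reasons.push("more than 2500 lines per TDD cycle");
  if (feature.fileCount > 6) reasons.push("more than 6 files per TDD cycle");
  if (feature.independentSubFeatures > 1) reasons.push("independent sub-features can run in parallel");
  if (feature.hasBlockingDependency) reasons.push("blocking technical dependency (e.g., API before UI)");
  if ((feature.techStacks ?? []).length > 1) reasons.push("multiple tech stacks or domains");
  return { split: reasons.length > 0, reasons };
}

const smallFeature = {
  estimatedLines: 400,
  fileCount: 3,
  independentSubFeatures: 1,
  hasBlockingDependency: false,
  techStacks: ["typescript"],
};
console.log(needsSubtasks(smallFeature).split); // false: stays a single IMPL-N task
```

Returning the reasons alongside the decision makes it possible to record why a container task was created.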
### Task Limits
- **Maximum 18 tasks** (hard limit for TDD workflows)
- **Feature-based**: Complete functional units with internal TDD cycles
- **Hierarchy**: Flat (<=5 simple features) | Two-level (6-18 tasks for complex features with sub-features)
- **Re-scope**: If >18 tasks needed, break project into multiple TDD workflow sessions
### TDD Cycle Mapping
- **Old approach**: 1 feature = 3 tasks (TEST-N.M, IMPL-N.M, REFACTOR-N.M)
- **Current approach**: 1 feature = 1 task (IMPL-N with internal Red-Green-Refactor phases)
- **Complex features**: 1 container (IMPL-N) + subtasks (IMPL-N.M) when necessary
## Execution Process
```
Input Parsing:
├─ Parse flags: --session
└─ Validation: session_id REQUIRED
Phase 1: Discovery & Context Loading (Memory-First)
├─ Load session context (if not in memory)
├─ Load context package (if not in memory)
├─ Load test context package (if not in memory)
├─ Extract & load role analyses from context package
├─ Load conflict resolution (if exists)
└─ Optional: MCP external research
Phase 2: Agent Execution (Document Generation)
├─ Pre-agent template selection (semantic CLI detection)
├─ Invoke action-planning-agent via spawn_agent
├─ Generate TDD Task JSON Files (.task/IMPL-*.json)
│ └─ Each task: complete Red-Green-Refactor cycle internally
├─ Create IMPL_PLAN.md (TDD variant)
└─ Generate TODO_LIST.md with TDD phase indicators
```
## Execution Lifecycle
### Phase 0: User Configuration (Interactive)
**Purpose**: Collect user preferences before TDD task generation to ensure generated tasks match execution expectations and provide necessary supplementary context.
**User Questions**:
```javascript
AskUserQuestion({
questions: [
{
question: "Do you have supplementary materials or guidelines to include?",
header: "Materials",
multiSelect: false,
options: [
{ label: "No additional materials", description: "Use existing context only" },
{ label: "Provide file paths", description: "I'll specify paths to include" },
{ label: "Provide inline content", description: "I'll paste content directly" }
]
},
{
question: "Select execution method for generated TDD tasks:",
header: "Execution",
multiSelect: false,
options: [
{ label: "Agent (Recommended)", description: "Agent executes Red-Green-Refactor cycles directly" },
{ label: "Hybrid", description: "Agent orchestrates, calls CLI for complex steps (Red/Green phases)" },
{ label: "CLI Only", description: "All TDD cycles via CLI tools (codex/gemini/qwen)" }
]
},
{
question: "If using CLI, which tool do you prefer?",
header: "CLI Tool",
multiSelect: false,
options: [
{ label: "Codex (Recommended)", description: "Best for TDD Red-Green-Refactor cycles" },
{ label: "Gemini", description: "Best for analysis and large context" },
{ label: "Qwen", description: "Alternative analysis tool" },
{ label: "Auto", description: "Let agent decide per-task" }
]
}
]
})
```
**Handle Materials Response**:
```javascript
if (userConfig.materials === "Provide file paths") {
// Follow-up question for file paths
const pathsResponse = AskUserQuestion({
questions: [{
question: "Enter file paths to include (comma-separated or one per line):",
header: "Paths",
multiSelect: false,
options: [
{ label: "Enter paths", description: "Provide paths in text input" }
]
}]
})
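// Store parsed paths in the same shape as userConfig.supplementaryMaterials below
userConfig.supplementaryMaterials = { type: "paths", content: parseUserPaths(pathsResponse) }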
}
```
**Build userConfig**:
```javascript
const userConfig = {
supplementaryMaterials: {
type: "none|paths|inline",
content: [...], // Parsed paths or inline content
},
executionMethod: "agent|hybrid|cli",
preferredCliTool: "codex|gemini|qwen|auto",
enableResume: true // Always enable resume for CLI executions
}
```
**Pass to Agent**: Include `userConfig` in agent prompt for Phase 2.
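One way the selected option labels could be mapped onto `userConfig`, including the Auto Mode defaults (no materials, Agent executor). This is a sketch under assumptions: `buildUserConfig` and the shape of `answers` (a plain object keyed by question header) are hypothetical, the real `AskUserQuestion` return value may differ, and the `"auto"` fallback for the CLI tool is an assumed default.

```javascript
// Label-to-value tables mirror the option labels in the Phase 0 questions.
const MATERIAL_TYPES = {
  "No additional materials": "none",
  "Provide file paths": "paths",
  "Provide inline content": "inline",
};
const EXECUTION_METHODS = {
  "Agent (Recommended)": "agent",
  "Hybrid": "hybrid",
  "CLI Only": "cli",
};
const CLI_TOOLS = {
  "Codex (Recommended)": "codex",
  "Gemini": "gemini",
  "Qwen": "qwen",
  "Auto": "auto",
};

// `answers` is a hypothetical object keyed by question header.
// Missing answers (e.g., with --yes) fall back to defaults.
function buildUserConfig(answers = {}, materialContent = []) {
  return {
    supplementaryMaterials: {
      type: MATERIAL_TYPES[answers["Materials"]] ?? "none",
      content: materialContent,
    },
    executionMethod: EXECUTION_METHODS[answers["Execution"]] ?? "agent",
    preferredCliTool: CLI_TOOLS[answers["CLI Tool"]] ?? "auto", // assumed default
    enableResume: true, // always enabled for CLI executions
  };
}

console.log(buildUserConfig()); // Auto Mode: no materials, agent executor
```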
---
### Phase 1: Context Preparation & Discovery
**Command Responsibility**: Command prepares session paths and metadata, provides to agent for autonomous context loading.
**Memory-First Rule**: Skip file loading if documents already in conversation memory
**Progressive Loading Strategy**: Load context incrementally due to large analysis.md file sizes:
- **Core**: session metadata + context-package.json (always load)
- **Selective**: synthesis_output OR (guidance + relevant role analyses) - NOT all role analyses
- **On-Demand**: conflict resolution (if conflict_risk >= medium), test context
**Path Clarity Requirement**: All `focus_paths` prefer absolute paths (e.g., `D:\\project\\src\\module`), or clear relative paths from project root (e.g., `./src/module`)
**Session Path Structure** (Provided by Command to Agent):
```
.workflow/active/WFS-{session-id}/
├── workflow-session.json # Session metadata
├── .process/
│ ├── context-package.json # Context package with artifact catalog
│ ├── test-context-package.json # Test coverage analysis
│ └── conflict-resolution.json # Conflict resolution (if exists)
├── .task/ # Output: Task JSON files
│ ├── IMPL-1.json
│ ├── IMPL-2.json
│ └── ...
├── IMPL_PLAN.md # Output: TDD implementation plan
└── TODO_LIST.md # Output: TODO list with TDD phases
```
**Command Preparation**:
1. **Assemble Session Paths** for agent prompt:
- `session_metadata_path`: `.workflow/active/{session-id}/workflow-session.json`
- `context_package_path`: `.workflow/active/{session-id}/.process/context-package.json`
- `test_context_package_path`: `.workflow/active/{session-id}/.process/test-context-package.json`
- Output directory paths
2. **Provide Metadata** (simple values):
- `session_id`: WFS-{session-id}
- `workflow_type`: "tdd"
- `mcp_capabilities`: {exa_code, exa_web, code_index}
3. **Pass userConfig** from Phase 0
**Agent Context Package** (Agent loads autonomously):
```javascript
{
"session_id": "WFS-[session-id]",
"workflow_type": "tdd",
// Core (ALWAYS load)
"session_metadata": {
// If in memory: use cached content
// Else: Load from workflow-session.json
},
"context_package": {
// If in memory: use cached content
// Else: Load from context-package.json
},
// Selective (load based on progressive strategy)
"brainstorm_artifacts": {
// Loaded from context-package.json → brainstorm_artifacts section
"synthesis_output": {"path": "...", "exists": true}, // Load if exists (highest priority)
"guidance_specification": {"path": "...", "exists": true}, // Load if no synthesis
"role_analyses": [ // Load SELECTIVELY based on task relevance
{
"role": "system-architect",
"files": [{"path": "...", "type": "primary|supplementary"}]
}
]
},
// On-Demand (load if exists)
"test_context_package": {
// Load from test-context-package.json
// Contains existing test patterns and coverage analysis
},
"conflict_resolution": {
// Load from conflict-resolution.json if conflict_risk >= medium
// Check context-package.conflict_detection.resolution_file
},
// Capabilities
"mcp_capabilities": {
"exa_code": true,
"exa_web": true,
"code_index": true
},
// User configuration from Phase 0
"user_config": {
// From Phase 0 AskUserQuestion
}
}
```
**Discovery Actions**:
1. **Load Session Context** (if not in memory)
```javascript
if (!memory.has("workflow-session.json")) {
Read(".workflow/active/{session-id}/workflow-session.json")
}
```
2. **Load Context Package** (if not in memory)
```javascript
if (!memory.has("context-package.json")) {
Read(".workflow/active/{session-id}/.process/context-package.json")
}
```
3. **Load Test Context Package** (if not in memory)
```javascript
if (!memory.has("test-context-package.json")) {
Read(".workflow/active/{session-id}/.process/test-context-package.json")
}
```
4. **Extract & Load Role Analyses** (from context-package.json)
```javascript
// Extract role analysis paths from context package
const roleAnalysisPaths = contextPackage.brainstorm_artifacts.role_analyses
.flatMap(role => role.files.map(f => f.path));
// Load each role analysis file (per the progressive loading strategy, filter to task-relevant roles first)
roleAnalysisPaths.forEach(path => Read(path));
```
5. **Load Conflict Resolution** (from conflict-resolution.json, if exists)
```javascript
// Check for new conflict-resolution.json format
if (contextPackage.conflict_detection?.resolution_file) {
Read(contextPackage.conflict_detection.resolution_file) // .process/conflict-resolution.json
}
// Fallback: legacy brainstorm_artifacts path
else if (contextPackage.brainstorm_artifacts?.conflict_resolution?.exists) {
Read(contextPackage.brainstorm_artifacts.conflict_resolution.path)
}
```
6. **Code Analysis with Native Tools** (optional - enhance understanding)
```bash
# Find relevant test files and patterns
find . -name "*test*" -type f
rg "describe|it\(|test\(" -g "*.ts"
```
7. **MCP External Research** (optional - gather TDD best practices)
```javascript
// Get external TDD examples and patterns
mcp__exa__get_code_context_exa({
query: "TypeScript TDD best practices Red-Green-Refactor",
tokensNum: "dynamic"
})
```
### Phase 2: Agent Execution (TDD Document Generation)
**Purpose**: Generate TDD planning documents (IMPL_PLAN.md, task JSONs, TODO_LIST.md) - planning only, NOT code implementation.
**Agent Invocation**:
```javascript
// Spawn action-planning-agent
const agentId = spawn_agent({
message: `
## TASK ASSIGNMENT
### MANDATORY FIRST STEPS (Agent Execute)
1. **Read role definition**: ~/.codex/agents/action-planning-agent.md (MUST read first)
2. Read: .workflow/project-tech.json
3. Read: .workflow/project-guidelines.json
---
## TASK OBJECTIVE
Generate TDD implementation planning documents (IMPL_PLAN.md, task JSONs, TODO_LIST.md) for workflow session
IMPORTANT: This is PLANNING ONLY - you are generating planning documents, NOT implementing code.
CRITICAL: Follow the progressive loading strategy (load analysis.md files incrementally due to file size):
- **Core**: session metadata + context-package.json (always)
- **Selective**: synthesis_output OR (guidance + relevant role analyses) - NOT all
- **On-Demand**: conflict resolution (if conflict_risk >= medium), test context
## SESSION PATHS
Input:
- Session Metadata: .workflow/active/{session-id}/workflow-session.json
- Context Package: .workflow/active/{session-id}/.process/context-package.json
- Test Context: .workflow/active/{session-id}/.process/test-context-package.json
Output:
- Task Dir: .workflow/active/{session-id}/.task/
- IMPL_PLAN: .workflow/active/{session-id}/IMPL_PLAN.md
- TODO_LIST: .workflow/active/{session-id}/TODO_LIST.md
## CONTEXT METADATA
Session ID: {session-id}
Workflow Type: TDD
MCP Capabilities: {exa_code, exa_web, code_index}
## USER CONFIGURATION (from Phase 0)
Execution Method: ${userConfig.executionMethod} // agent|hybrid|cli
Preferred CLI Tool: ${userConfig.preferredCliTool} // codex|gemini|qwen|auto
Supplementary Materials: ${userConfig.supplementaryMaterials}
## EXECUTION METHOD MAPPING
Based on userConfig.executionMethod, set task-level meta.execution_config:
"agent" →
meta.execution_config = { method: "agent", cli_tool: null, enable_resume: false }
Agent executes Red-Green-Refactor phases directly
"cli" →
meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
Agent executes pre_analysis, then hands off full context to CLI via buildCliHandoffPrompt()
"hybrid" →
Per-task decision: Analyze TDD cycle complexity, set method to "agent" OR "cli" per task
- Simple cycles (<=5 test cases, <=3 files) → method: "agent"
- Complex cycles (>5 test cases, >3 files, integration tests) → method: "cli"
CLI tool: userConfig.preferredCliTool, enable_resume: true
IMPORTANT: Do NOT add command field to implementation_approach steps. Execution routing is controlled by task-level meta.execution_config.method only.
## EXPLORATION CONTEXT (from context-package.exploration_results)
- Load exploration_results from context-package.json
- Use aggregated_insights.critical_files for focus_paths generation
- Apply aggregated_insights.constraints to acceptance criteria
- Reference aggregated_insights.all_patterns for implementation approach
- Use aggregated_insights.all_integration_points for precise modification locations
- Use conflict_indicators for risk-aware task sequencing
## CONFLICT RESOLUTION CONTEXT (if exists)
- Check context-package.conflict_detection.resolution_file for conflict-resolution.json path
- If exists, load .process/conflict-resolution.json:
- Apply planning_constraints as task constraints (for brainstorm-less workflows)
- Reference resolved_conflicts for implementation approach alignment
- Handle custom_conflicts with explicit task notes
## TEST CONTEXT INTEGRATION
- Load test-context-package.json for existing test patterns and coverage analysis
- Extract test framework configuration (Jest/Pytest/etc.)
- Identify existing test conventions and patterns
- Map coverage gaps to TDD Red phase test targets
## TDD DOCUMENT GENERATION TASK
**Agent Configuration Reference**: All TDD task generation rules, quantification requirements, Red-Green-Refactor cycle structure, quality standards, and execution details are defined in action-planning-agent.
### TDD-Specific Requirements Summary
#### Task Structure Philosophy
- **1 feature = 1 task** containing complete TDD cycle internally
- Each task executes Red-Green-Refactor phases sequentially
- Task count = Feature count (typically 5 features = 5 tasks)
- Subtasks only when complexity >2500 lines or >6 files per cycle
- **Maximum 18 tasks** (hard limit for TDD workflows)
#### TDD Cycle Mapping
- **Simple features**: IMPL-N with internal Red-Green-Refactor phases
- **Complex features**: IMPL-N (container) + IMPL-N.M (subtasks)
- Each cycle includes: test_count, test_cases array, implementation_scope, expected_coverage
#### Required Outputs Summary
##### 1. TDD Task JSON Files (.task/IMPL-*.json)
- **Location**: \`.workflow/active/{session-id}/.task/\`
- **Schema**: 7-field structure with TDD-specific metadata
- \`id, title, status, context_package_path, meta, context, flow_control\`
- \`meta.tdd_workflow\`: true (REQUIRED)
- \`meta.max_iterations\`: 3 (Green phase test-fix cycle limit)
- \`meta.cli_execution_id\`: Unique CLI execution ID (format: \`{session_id}-{task_id}\`)
- \`meta.cli_execution\`: Strategy object (new|resume|fork|merge_fork)
- \`context.tdd_cycles\`: Array with quantified test cases and coverage
- \`context.focus_paths\`: Absolute or clear relative paths (enhanced with exploration critical_files)
- \`flow_control.implementation_approach\`: Exactly 3 steps with \`tdd_phase\` field
1. Red Phase (\`tdd_phase: "red"\`): Write failing tests
2. Green Phase (\`tdd_phase: "green"\`): Implement to pass tests
3. Refactor Phase (\`tdd_phase: "refactor"\`): Improve code quality
- \`flow_control.pre_analysis\`: Include exploration integration_points analysis
- **meta.execution_config**: Set per userConfig.executionMethod (agent/cli/hybrid)
- **Details**: See action-planning-agent.md § TDD Task JSON Generation
##### 2. IMPL_PLAN.md (TDD Variant)
- **Location**: \`.workflow/active/{session-id}/IMPL_PLAN.md\`
- **Template**: \`~/.claude/workflows/cli-templates/prompts/workflow/impl-plan-template.txt\`
- **TDD-Specific Frontmatter**: workflow_type="tdd", tdd_workflow=true, feature_count, task_breakdown
- **TDD Implementation Tasks Section**: Feature-by-feature with internal Red-Green-Refactor cycles
- **Context Analysis**: Artifact references and exploration insights
- **Details**: See action-planning-agent.md § TDD Implementation Plan Creation
##### 3. TODO_LIST.md
- **Location**: \`.workflow/active/{session-id}/TODO_LIST.md\`
- **Format**: Hierarchical task list with internal TDD phase indicators (Red → Green → Refactor)
- **Status**: ▸ (container), [ ] (pending), [x] (completed)
- **Links**: Task JSON references and summaries
- **Details**: See action-planning-agent.md § TODO List Generation
### CLI EXECUTION ID REQUIREMENTS (MANDATORY)
Each task JSON MUST include:
- **meta.cli_execution_id**: Unique ID for CLI execution (format: \`{session_id}-{task_id}\`)
- **meta.cli_execution**: Strategy object based on depends_on:
- No deps → \`{ "strategy": "new" }\`
- 1 dep (single child) → \`{ "strategy": "resume", "resume_from": "parent-cli-id" }\`
- 1 dep (multiple children) → \`{ "strategy": "fork", "resume_from": "parent-cli-id" }\`
- N deps → \`{ "strategy": "merge_fork", "resume_from": ["id1", "id2", ...] }\`
- **Type**: \`resume_from: string | string[]\` (string for resume/fork, array for merge_fork)
**CLI Execution Strategy Rules**:
1. **new**: Task has no dependencies - starts fresh CLI conversation
2. **resume**: Task has 1 parent AND that parent has only this child - continues same conversation
3. **fork**: Task has 1 parent BUT parent has multiple children - creates new branch with parent context
4. **merge_fork**: Task has multiple parents - merges all parent contexts into new conversation
**Execution Command Patterns**:
- new: \`ccw cli -p "[prompt]" --tool [tool] --mode write --id [cli_execution_id]\`
- resume: \`ccw cli -p "[prompt]" --resume [resume_from] --tool [tool] --mode write\`
- fork: \`ccw cli -p "[prompt]" --resume [resume_from] --id [cli_execution_id] --tool [tool] --mode write\`
- merge_fork: \`ccw cli -p "[prompt]" --resume [resume_from.join(',')] --id [cli_execution_id] --tool [tool] --mode write\` (resume_from is array)
### Quantification Requirements (MANDATORY)
**Core Rules**:
1. **Explicit Test Case Counts**: Red phase specifies exact number with enumerated list
2. **Quantified Coverage**: Acceptance includes measurable percentage (e.g., ">=85%")
3. **Detailed Implementation Scope**: Green phase enumerates files, functions, line counts
4. **Enumerated Refactoring Targets**: Refactor phase lists specific improvements with counts
**TDD Phase Formats**:
- **Red Phase**: "Write N test cases: [test1, test2, ...]"
- **Green Phase**: "Implement N functions in file lines X-Y: [func1() X1-Y1, func2() X2-Y2, ...]"
- **Refactor Phase**: "Apply N refactorings: [improvement1 (details), improvement2 (details), ...]"
- **Acceptance**: "All N tests pass with >=X% coverage: verify by [test command]"
**Validation Checklist**:
- [ ] Every Red phase specifies exact test case count with enumerated list
- [ ] Every Green phase enumerates files, functions, and estimated line counts
- [ ] Every Refactor phase lists specific improvements with counts
- [ ] Every acceptance criterion includes measurable coverage percentage
- [ ] tdd_cycles array contains test_count and test_cases for each cycle
- [ ] No vague language ("comprehensive", "complete", "thorough")
- [ ] cli_execution_id and cli_execution strategy assigned to each task
### Agent Execution Summary
**Key Steps** (Detailed instructions in action-planning-agent.md):
1. Load task JSON template from provided path
2. Extract and decompose features with TDD cycles
3. Generate TDD task JSON files enforcing quantification requirements
4. Create IMPL_PLAN.md using TDD template variant
5. Generate TODO_LIST.md with TDD phase indicators
6. Update session state with TDD metadata
**Quality Gates** (Full checklist in action-planning-agent.md):
- Task count <=18 (hard limit)
- Each task has meta.tdd_workflow: true
- Each task has exactly 3 implementation steps with tdd_phase field ("red", "green", "refactor")
- Each task has meta.cli_execution_id and meta.cli_execution strategy
- Green phase includes test-fix cycle logic with max_iterations
- focus_paths are absolute or clear relative paths (from exploration critical_files)
- Artifact references mapped correctly from context package
- Exploration context integrated (critical_files, constraints, patterns, integration_points)
- Conflict resolution context applied (if conflict_risk >= medium)
- Test context integrated (existing test patterns and coverage analysis)
- Documents follow TDD template structure
- CLI tool selection based on userConfig.executionMethod
- Quantification requirements enforced (explicit counts, measurable acceptance, exact targets)
## SUCCESS CRITERIA
- All planning documents generated successfully:
- Task JSONs valid and saved to .task/ directory with cli_execution_id
- IMPL_PLAN.md created with complete TDD structure
- TODO_LIST.md generated matching task JSONs
- CLI execution strategies assigned based on task dependencies
- Return completion status with document count and task breakdown summary
## OUTPUT SUMMARY
Generate all three documents and report:
- TDD task JSON files created: N files (IMPL-*.json) with cli_execution_id assigned
- TDD cycles configured: N cycles with quantified test cases
- CLI execution strategies: new/resume/fork/merge_fork assigned per dependency graph
- Artifacts integrated: synthesis-spec/guidance-specification, relevant role analyses
- Exploration context: critical_files, constraints, patterns, integration_points
- Test context integrated: existing patterns and coverage
- Conflict resolution: applied (if conflict_risk >= medium)
- Session ready for TDD execution
`
});
// Wait for agent completion
const result = wait({
ids: [agentId],
timeout_ms: 600000 // 10 minutes
});
// Handle timeout
if (result.timed_out) {
console.warn("TDD task generation timed out, prompting completion...");
send_input({
id: agentId,
message: "Please finalize document generation and report completion status."
});
const retryResult = wait({ ids: [agentId], timeout_ms: 300000 });
}
// Clean up agent resources (IMPORTANT: must always call)
close_agent({ id: agentId });
```
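The CLI execution strategy rules in the prompt above (new, resume, fork, merge_fork) can be derived from the dependency graph. The following is a sketch, not the agent's actual implementation: `assignCliExecution` is a hypothetical name, and it assumes each task carries a top-level `id` and an optional `depends_on` array.

```javascript
// Assumption: tasks look like { id: "IMPL-2", depends_on: ["IMPL-1"], meta: {...} }.
function assignCliExecution(tasks, sessionId) {
  const cliId = (taskId) => `${sessionId}-${taskId}`;

  // Count how many tasks depend on each parent.
  const childCount = new Map();
  for (const task of tasks) {
    for (const dep of task.depends_on ?? []) {
      childCount.set(dep, (childCount.get(dep) ?? 0) + 1);
    }
  }

  return tasks.map((task) => {
    const deps = task.depends_on ?? [];
    let cli_execution;
    if (deps.length === 0) {
      // No dependencies: start a fresh CLI conversation.
      cli_execution = { strategy: "new" };
    } else if (deps.length === 1) {
      // Single parent: resume if this is its only child, otherwise fork.
      const strategy = childCount.get(deps[0]) === 1 ? "resume" : "fork";
      cli_execution = { strategy, resume_from: cliId(deps[0]) };
    } else {
      // Multiple parents: merge all parent contexts.
      cli_execution = { strategy: "merge_fork", resume_from: deps.map(cliId) };
    }
    return {
      ...task,
      meta: { ...task.meta, cli_execution_id: cliId(task.id), cli_execution },
    };
  });
}

const planned = assignCliExecution(
  [
    { id: "IMPL-1" },
    { id: "IMPL-2", depends_on: ["IMPL-1"] },
    { id: "IMPL-3", depends_on: ["IMPL-1"] },
    { id: "IMPL-4", depends_on: ["IMPL-2", "IMPL-3"] },
  ],
  "WFS-demo"
);
console.log(planned.map((t) => `${t.id}: ${t.meta.cli_execution.strategy}`));
```

In this example IMPL-1 has two children, so both IMPL-2 and IMPL-3 fork from it, and IMPL-4 merges the two branches.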
### Agent Context Passing
**Context Delegation Model**: Command provides paths and metadata, agent loads context autonomously using progressive loading strategy.
**Command Provides** (in agent prompt):
```javascript
// Command assembles these simple values and paths for agent
const commandProvides = {
// Session paths
session_metadata_path: ".workflow/active/WFS-{id}/workflow-session.json",
context_package_path: ".workflow/active/WFS-{id}/.process/context-package.json",
test_context_package_path: ".workflow/active/WFS-{id}/.process/test-context-package.json",
output_task_dir: ".workflow/active/WFS-{id}/.task/",
output_impl_plan: ".workflow/active/WFS-{id}/IMPL_PLAN.md",
output_todo_list: ".workflow/active/WFS-{id}/TODO_LIST.md",
// Simple metadata
session_id: "WFS-{id}",
workflow_type: "tdd",
mcp_capabilities: { exa_code: true, exa_web: true, code_index: true },
// User configuration from Phase 0
user_config: {
supplementaryMaterials: { type: "...", content: [...] },
executionMethod: "agent|hybrid|cli",
preferredCliTool: "codex|gemini|qwen|auto",
enableResume: true
}
}
```
**Agent Loads Autonomously** (progressive loading):
```javascript
// Agent executes progressive loading based on memory state
const agentLoads = {
// Core (ALWAYS load if not in memory)
session_metadata: loadIfNotInMemory(session_metadata_path),
context_package: loadIfNotInMemory(context_package_path),
// Selective (based on progressive strategy)
// Priority: synthesis_output > guidance + relevant_role_analyses
brainstorm_content: loadSelectiveBrainstormArtifacts(context_package),
// On-Demand (load if exists and relevant)
test_context: loadIfExists(test_context_package_path),
conflict_resolution: loadConflictResolution(context_package),
// Optional (if MCP available)
exploration_results: extractExplorationResults(context_package),
external_research: executeMcpResearch() // If needed
}
```
**Progressive Loading Implementation** (agent responsibility):
1. **Check memory first** - skip if already loaded
2. **Load core files** - session metadata + context-package.json
3. **Smart selective loading** - synthesis_output OR (guidance + task-relevant role analyses)
4. **On-demand loading** - test context, conflict resolution (if conflict_risk >= medium)
5. **Extract references** - exploration results, artifact paths from context package
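The selective and on-demand steps above can be sketched as a function that returns the list of files to load. This is an illustration under assumptions: `planLoads` is a hypothetical name, the caller supplies the task-relevant role names, and the risk level is assumed to live at `conflict_detection.conflict_risk` with values `none`, `low`, `medium`, `high`.

```javascript
// Assumed ordering of conflict risk levels.
const RISK_ORDER = ["none", "low", "medium", "high"];

function planLoads(contextPackage, relevantRoles = []) {
  const artifacts = contextPackage.brainstorm_artifacts ?? {};
  const loads = [];

  if (artifacts.synthesis_output?.exists) {
    // Highest priority: synthesis output replaces guidance + role analyses.
    loads.push(artifacts.synthesis_output.path);
  } else {
    if (artifacts.guidance_specification?.exists) {
      loads.push(artifacts.guidance_specification.path);
    }
    // Only task-relevant role analyses, never all of them.
    for (const analysis of artifacts.role_analyses ?? []) {
      if (relevantRoles.includes(analysis.role)) {
        loads.push(...analysis.files.map((file) => file.path));
      }
    }
  }

  // On-demand: conflict resolution only when risk is medium or higher.
  const detection = contextPackage.conflict_detection ?? {};
  const risk = RISK_ORDER.indexOf(detection.conflict_risk ?? "none");
  if (risk >= RISK_ORDER.indexOf("medium") && detection.resolution_file) {
    loads.push(detection.resolution_file);
  }
  return loads;
}

console.log(
  planLoads({
    brainstorm_artifacts: { synthesis_output: { path: "synthesis.md", exists: true } },
    conflict_detection: { conflict_risk: "low" },
  })
);
```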
## TDD Task Structure Reference
This section provides quick reference for TDD task JSON structure. For complete implementation details, see the agent invocation prompt in Phase 2 above.
**Quick Reference**:
- Each TDD task contains complete Red-Green-Refactor cycle
- Task ID format: `IMPL-N` (simple) or `IMPL-N.M` (complex subtasks)
- Required metadata:
- `meta.tdd_workflow: true`
- `meta.max_iterations: 3`
- `meta.cli_execution_id: "{session_id}-{task_id}"`
- `meta.cli_execution: { "strategy": "new|resume|fork|merge_fork", ... }`
- Context: `tdd_cycles` array with quantified test cases and coverage:
```javascript
tdd_cycles: [
{
test_count: 5, // Number of test cases to write
test_cases: ["case1", "case2", "case3", "case4", "case5"], // Enumerated test scenarios (one entry per counted test)
implementation_scope: "...", // Files and functions to implement
expected_coverage: ">=85%" // Coverage target
}
]
```
- Context: `focus_paths` use absolute or clear relative paths
- Flow control: Exactly 3 steps with `tdd_phase` field ("red", "green", "refactor")
- Flow control: `pre_analysis` includes exploration integration_points analysis
- **meta.execution_config**: Set per `userConfig.executionMethod` (agent/cli/hybrid)
- See Phase 2 agent prompt for full schema and requirements
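A possible check for the quantification rules on a single `tdd_cycles` entry. This is a sketch: `checkTddCycle` is a hypothetical helper, and requiring `test_cases.length` to equal `test_count` is an interpretation of "exact number with enumerated list", not a rule quoted from action-planning-agent.md.

```javascript
// Words the validation checklist flags as vague language.
const VAGUE_WORDS = ["comprehensive", "complete", "thorough"];

function checkTddCycle(cycle) {
  const problems = [];
  if (!Number.isInteger(cycle.test_count) || cycle.test_count <= 0) {
    problems.push("test_count must be a positive integer");
  }
  if (!Array.isArray(cycle.test_cases) || cycle.test_cases.length !== cycle.test_count) {
    problems.push("test_cases must enumerate exactly test_count entries");
  }
  // Coverage must be measurable, e.g. ">=85%".
  if (!/^>=\d+(\.\d+)?%$/.test(cycle.expected_coverage ?? "")) {
    problems.push("expected_coverage must look like \">=85%\"");
  }
  const scope = String(cycle.implementation_scope ?? "");
  for (const word of VAGUE_WORDS) {
    if (new RegExp(`\\b${word}\\b`, "i").test(scope)) {
      problems.push(`implementation_scope uses vague wording: "${word}"`);
    }
  }
  return problems;
}

console.log(
  checkTddCycle({
    test_count: 2,
    test_cases: ["rejects empty password", "accepts valid credentials"],
    implementation_scope: "Implement 2 functions in src/auth.ts: login(), validate()",
    expected_coverage: ">=85%",
  })
); // []
```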
## Output Files Structure
```
.workflow/active/{session-id}/
├── IMPL_PLAN.md # Unified plan with TDD Implementation Tasks section
├── TODO_LIST.md # Progress tracking with internal TDD phase indicators
├── .task/
│ ├── IMPL-1.json # Complete TDD task (Red-Green-Refactor internally)
│ ├── IMPL-2.json # Complete TDD task
│ ├── IMPL-3.json # Complex feature container (if needed)
│ ├── IMPL-3.1.json # Complex feature subtask (if needed)
│ ├── IMPL-3.2.json # Complex feature subtask (if needed)
│ └── ...
└── .process/
├── conflict-resolution.json # Conflict resolution results (if conflict_risk >= medium)
├── test-context-package.json # Test coverage analysis
├── context-package.json # Input from context-gather
├── context_package_path # Path to smart context package
└── green-fix-iteration-*.md # Fix logs from Green phase test-fix cycles
```
**File Count**:
- **Old approach**: 5 features = 15 task JSON files (TEST/IMPL/REFACTOR x 5)
- **New approach**: 5 features = 5 task JSON files (IMPL-N x 5)
- **Complex feature**: 1 feature = 1 container + M subtasks (IMPL-N + IMPL-N.M)
## Validation Rules
### Task Completeness
- Every IMPL-N must contain complete TDD workflow in `flow_control.implementation_approach`
- Each task must have 3 steps with `tdd_phase`: "red", "green", "refactor"
- Every task must have `meta.tdd_workflow: true`
### Dependency Enforcement
- Sequential features: IMPL-N depends_on ["IMPL-(N-1)"] if needed
- Complex feature subtasks: IMPL-N.M depends_on ["IMPL-N.(M-1)"] or parent dependencies
- No circular dependencies allowed
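Circular dependencies can be detected with a depth-first search over `depends_on`. A minimal sketch, assuming each task exposes `id` and an optional `depends_on` array at the top level (the real task JSON may nest these differently); `findDependencyCycle` is a hypothetical name.

```javascript
// Returns one cycle as an array of task IDs (first ID repeated at the end),
// or null when the dependency graph is acyclic.
function findDependencyCycle(tasks) {
  const deps = new Map(tasks.map((task) => [task.id, task.depends_on ?? []]));
  const state = new Map(); // id -> "visiting" | "done"
  const stack = [];

  function visit(id) {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      // Found a back edge: the cycle is the stack from the first occurrence of id.
      return [...stack.slice(stack.indexOf(id)), id];
    }
    state.set(id, "visiting");
    stack.push(id);
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(id, "done");
    return null;
  }

  for (const id of deps.keys()) {
    const cycle = visit(id);
    if (cycle) return cycle;
  }
  return null;
}

console.log(
  findDependencyCycle([
    { id: "IMPL-1" },
    { id: "IMPL-2", depends_on: ["IMPL-1"] },
    { id: "IMPL-3", depends_on: ["IMPL-2"] },
  ])
); // null
```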
### Task Limits
- Maximum 18 total tasks (simple + subtasks) - hard limit for TDD workflows
- Flat hierarchy (<=5 tasks) or two-level (6-18 tasks with containers)
- Re-scope requirements if >18 tasks needed
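The limits above reduce to a small classification on the total task count (simple tasks plus subtasks). `classifyHierarchy` is a hypothetical helper name used for illustration only.

```javascript
const MAX_TASKS = 18; // hard limit for TDD workflows

function classifyHierarchy(totalTasks) {
  if (totalTasks > MAX_TASKS) return "re-scope"; // split into multiple sessions
  return totalTasks <= 5 ? "flat" : "two-level";
}

for (const count of [5, 6, 18, 19]) {
  console.log(count, classifyHierarchy(count));
}
```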
### TDD Workflow Validation
- `meta.tdd_workflow` must be true
- `flow_control.implementation_approach` must have exactly 3 steps
- Each step must have `tdd_phase` field ("red", "green", or "refactor")
- Green phase step must include test-fix cycle logic
- `meta.max_iterations` must be present (default: 3)
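These structural rules could be checked per task as sketched below. `validateTddTask` is a hypothetical name, and enforcing the red, green, refactor order follows the step numbering in the Phase 2 prompt. The requirement that the Green step include test-fix cycle logic is not checked here, because its concrete shape is defined in action-planning-agent.md.

```javascript
const TDD_PHASES = ["red", "green", "refactor"];

// Returns a list of human-readable validation errors (empty when valid).
function validateTddTask(task) {
  const errors = [];
  if (task.meta?.tdd_workflow !== true) {
    errors.push("meta.tdd_workflow must be true");
  }
  if (!Number.isInteger(task.meta?.max_iterations)) {
    errors.push("meta.max_iterations must be present (default: 3)");
  }
  const steps = task.flow_control?.implementation_approach ?? [];
  if (steps.length !== 3) {
    errors.push(`implementation_approach must have exactly 3 steps, found ${steps.length}`);
  } else {
    TDD_PHASES.forEach((phase, index) => {
      if (steps[index].tdd_phase !== phase) {
        errors.push(`step ${index + 1} must have tdd_phase "${phase}"`);
      }
    });
  }
  return errors;
}

console.log(
  validateTddTask({
    id: "IMPL-1",
    meta: { tdd_workflow: true, max_iterations: 3 },
    flow_control: {
      implementation_approach: [
        { tdd_phase: "red" },
        { tdd_phase: "green" },
        { tdd_phase: "refactor" },
      ],
    },
  })
); // []
```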
## Error Handling
### Input Validation Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Session not found | Invalid session ID | Verify session exists |
| Context missing | Incomplete planning | Run context-gather first |
### TDD Generation Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Task count exceeds 18 | Too many features or subtasks | Re-scope requirements, merge related features, or split the project into multiple TDD sessions |
| Missing test framework | No test config | Configure testing first |
| Invalid TDD workflow | Missing tdd_phase or incomplete flow_control | Fix TDD structure in ANALYSIS_RESULTS.md |
| Missing tdd_workflow flag | Task doesn't have meta.tdd_workflow: true | Add TDD workflow metadata |
| Agent timeout | Large context or complex planning | Retry with send_input, or spawn new agent |
## Integration
**Called By**: SKILL.md (Phase 5: TDD Task Generation)
**Invokes**: `action-planning-agent` via spawn_agent for autonomous task generation
**Followed By**: Phase 6 (TDD Structure Validation in SKILL.md), then workflow:execute (external)
**CLI Tool Selection**: Determined semantically from user's task description. Include "use Codex/Gemini/Qwen" in your request for CLI execution.
**Output**:
- TDD task JSON files in `.task/` directory (IMPL-N.json format)
- IMPL_PLAN.md with TDD Implementation Tasks section
- TODO_LIST.md with internal TDD phase indicators
- Session state updated with task count and TDD metadata
- MCP enhancements integrated (if available)
## Test Coverage Analysis Integration
The TDD workflow includes test coverage analysis (via phases/01-test-context-gather.md) to:
- Detect existing test patterns and conventions
- Identify current test coverage gaps
- Discover test framework and configuration
- Enable integration with existing tests
This makes the TDD workflow context-aware rather than assuming a greenfield project.
## Iterative Green Phase with Test-Fix Cycle
The Green phase of each IMPL task includes an automatic test-fix cycle:
**Process Flow**:
1. **Initial Implementation**: Write minimal code to pass tests
2. **Test Execution**: Run test suite
3. **Success Path**: Tests pass → Complete task
4. **Failure Path**: Tests fail → Enter iterative fix cycle:
- **Gemini Diagnosis**: Analyze failures with bug-fix template
- **Fix Application**: Agent executes fixes directly
- **Retest**: Verify fix resolves failures
- **Repeat**: Up to max_iterations (default: 3)
5. **Safety Net**: Auto-revert all changes if max iterations reached
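The process flow above can be sketched as a small control loop. This is a minimal illustration, not the agent's actual implementation: the callables `implement`, `run_tests`, `diagnose_and_fix`, and `revert` are hypothetical stand-ins for the agent's actions, and the sketch assumes `max_iterations` counts fix attempts made after the initial implementation.

```python
from typing import Callable


def run_green_phase(
    implement: Callable[[], None],
    run_tests: Callable[[], bool],
    diagnose_and_fix: Callable[[int], None],
    revert: Callable[[], None],
    max_iterations: int = 3,
) -> str:
    """Return "passed" or "reverted" (all callables are hypothetical hooks)."""
    implement()                        # 1. initial minimal implementation
    if run_tests():                    # 2-3. test execution, success path
        return "passed"
    for attempt in range(1, max_iterations + 1):
        diagnose_and_fix(attempt)      # 4. diagnosis + fix application
        if run_tests():                #    retest after each fix
            return "passed"
    revert()                           # 5. safety net: undo all changes
    return "reverted"
```

Returning a status string instead of raising keeps the caller free to decide how a reverted task is reported.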
## Configuration Options
- **meta.max_iterations**: Number of fix attempts in Green phase (default: 3)
- **meta.execution_config.method**: Execution routing (agent/cli) determined from userConfig.executionMethod
---
## Post-Phase Update
After Phase 2 (TDD Task Generation) completes:
- **Output Created**: IMPL_PLAN.md, TODO_LIST.md, IMPL-*.json task files in `.task/` directory
- **TDD Structure**: Each task contains complete Red-Green-Refactor cycle internally
- **CLI Execution IDs**: All tasks assigned unique cli_execution_id for resume support
- **Next Action**: Phase 6 (TDD Structure Validation) in SKILL.md
- **TodoWrite**: Collapse Phase 5 sub-tasks to "Phase 5: TDD Task Generation: completed"

@@ -0,0 +1,575 @@
# Phase 3: TDD Verify
## Goal
Verify TDD workflow execution quality by validating Red-Green-Refactor cycle compliance, test coverage completeness, and task chain structure integrity. This phase orchestrates multiple analysis steps and generates a comprehensive compliance report with quality gate recommendation.
**Output**: A structured Markdown report saved to `.workflow/active/WFS-{session}/TDD_COMPLIANCE_REPORT.md` containing:
- Executive summary with compliance score and quality gate recommendation
- Task chain validation (TEST → IMPL → REFACTOR structure)
- Test coverage metrics (line, branch, function)
- Red-Green-Refactor cycle verification
- Best practices adherence assessment
- Actionable improvement recommendations
## Operating Constraints
**ORCHESTRATOR MODE**:
- This phase coordinates coverage analysis (`phases/04-tdd-coverage-analysis.md`) and internal validation
- MAY write output files: TDD_COMPLIANCE_REPORT.md (primary report), .process/*.json (intermediate artifacts)
- MUST NOT modify source task files or implementation code
- MUST NOT create or delete tasks in the workflow
**Quality Gate Authority**: The compliance report provides a binding recommendation (BLOCK_MERGE / REQUIRE_FIXES / PROCEED_WITH_CAVEATS / APPROVED) based on objective compliance criteria.
## Core Responsibilities
- Verify TDD task chain structure (TEST → IMPL → REFACTOR)
- Analyze test coverage metrics
- Validate TDD cycle execution quality
- Generate compliance report with quality gate recommendation
## Execution Process
```
Input Parsing:
   └─ Decision (session argument):
      ├─ --session provided → Use provided session
      └─ No session → Auto-detect active session

Phase 1: Session Discovery & Validation
   ├─ Detect or validate session directory
   ├─ Check required artifacts exist (.task/*.json, .summaries/*)
   └─ ERROR if invalid or incomplete

Phase 2: Task Chain Structure Validation
   ├─ Load all task JSONs from .task/
   ├─ Validate TDD structure: TEST-N.M → IMPL-N.M → REFACTOR-N.M
   ├─ Verify dependencies (depends_on)
   ├─ Validate meta fields (tdd_phase, agent)
   └─ Extract chain validation data

Phase 3: Coverage & Cycle Analysis
   ├─ Read and execute: phases/04-tdd-coverage-analysis.md
   ├─ Parse: test-results.json, coverage-report.json, tdd-cycle-report.md
   └─ Extract coverage metrics and TDD cycle verification

Phase 4: Compliance Report Generation
   ├─ Aggregate findings from Phases 1-3
   ├─ Calculate compliance score (0-100)
   ├─ Determine quality gate recommendation
   ├─ Generate TDD_COMPLIANCE_REPORT.md
   └─ Display summary to user
```
## 4-Phase Execution
### Phase 1: Session Discovery & Validation
**Step 1.1: Detect Session**
```bash
IF --session parameter provided:
    session_id = provided session
ELSE:
    # Auto-detect active session (top-level WFS-* directories only)
    active_sessions = bash(find .workflow/active/ -maxdepth 1 -name "WFS-*" -type d 2>/dev/null)
    IF active_sessions is empty:
        ERROR: "No active workflow session found. Use --session <session-id>"
        EXIT
    ELSE IF active_sessions has multiple entries:
        # Use most recently modified session
        session_id = bash(ls -td .workflow/active/WFS-*/ 2>/dev/null | head -1 | xargs basename)
    ELSE:
        session_id = basename(active_sessions[0])

# Derive paths
# Note: an auto-detected session_id is the directory name and already includes the "WFS-" prefix
session_dir = .workflow/active/{session_id}
task_dir = session_dir/.task
summaries_dir = session_dir/.summaries
process_dir = session_dir/.process
**Step 1.2: Validate Required Artifacts**
```bash
# Check task files exist
task_files = Glob(task_dir/*.json)
IF task_files.count == 0:
    ERROR: "No task JSON files found. Run TDD planning (SKILL.md) first"
    EXIT

# Check summaries exist (optional but recommended for full analysis)
summaries_exist = EXISTS(summaries_dir)
IF NOT summaries_exist:
    WARNING: "No .summaries/ directory found. Some analysis may be limited."
```
**Output**: session_id, session_dir, task_files list
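A minimal Python sketch of the session detection logic in Step 1.1, for readers who prefer executable code over pseudocode. The function name `detect_session` and the `active_root` parameter are illustrative assumptions; the sketch returns the directory name (including its `WFS-` prefix) and picks the most recently modified directory whenever more than one exists.

```python
from pathlib import Path
from typing import Optional


def detect_session(active_root: Path, session: Optional[str] = None) -> str:
    """Return the session id: the explicit one, else the newest WFS-* directory name."""
    if session:
        return session
    # Only direct children of the active root count as sessions.
    candidates = [p for p in active_root.glob("WFS-*") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(
            "No active workflow session found. Use --session <session-id>"
        )
    # With several sessions, prefer the most recently modified one.
    return max(candidates, key=lambda p: p.stat().st_mtime).name
```

For example, `detect_session(Path(".workflow/active"))` would return a name such as `WFS-my-feature` if that directory was modified last.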
---
### Phase 2: Task Chain Structure Validation
**Step 2.1: Load and Parse Task JSONs**
```bash
# Single-pass JSON extraction using jq
validation_data = bash("""
# Load all tasks and extract structured data
cd '{session_dir}/.task'
# Extract all task IDs
task_ids=$(jq -r '.id' *.json 2>/dev/null | sort)
# Extract dependencies for IMPL tasks
impl_deps=$(jq -r 'select(.id | startswith("IMPL")) | .id + ":" + (.context.depends_on[]? // "none")' *.json 2>/dev/null)
# Extract dependencies for REFACTOR tasks
refactor_deps=$(jq -r 'select(.id | startswith("REFACTOR")) | .id + ":" + (.context.depends_on[]? // "none")' *.json 2>/dev/null)
# Extract meta fields
meta_tdd=$(jq -r '.id + ":" + (.meta.tdd_phase // "missing")' *.json 2>/dev/null)
meta_agent=$(jq -r '.id + ":" + (.meta.agent // "missing")' *.json 2>/dev/null)
# Output as JSON
jq -n --arg ids "$task_ids" \
--arg impl "$impl_deps" \
--arg refactor "$refactor_deps" \
--arg tdd "$meta_tdd" \
--arg agent "$meta_agent" \
'{ids: $ids, impl_deps: $impl, refactor_deps: $refactor, tdd: $tdd, agent: $agent}'
""")
```
**Step 2.2: Validate TDD Chain Structure**
```
Parse validation_data JSON and validate:

For each feature N (extracted from task IDs):
  1. TEST-N.M exists?
  2. IMPL-N.M exists?
  3. REFACTOR-N.M exists? (optional but recommended)
  4. IMPL-N.M.context.depends_on contains TEST-N.M?
  5. REFACTOR-N.M.context.depends_on contains IMPL-N.M?
  6. TEST-N.M.meta.tdd_phase == "red"?
  7. TEST-N.M.meta.agent == "@code-review-test-agent"?
  8. IMPL-N.M.meta.tdd_phase == "green"?
  9. IMPL-N.M.meta.agent == "@code-developer"?
  10. REFACTOR-N.M.meta.tdd_phase == "refactor"?

Calculate:
  - chain_completeness_score = (complete_chains / total_chains) * 100
  - dependency_accuracy = (correct_deps / total_deps) * 100
  - meta_field_accuracy = (correct_meta / total_meta) * 100
```
**Output**: chain_validation_report (JSON structure with validation results)
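The structural checks in Step 2.2 (existence and dependency checks 1-5) can be sketched in Python as follows. This is an illustration under assumptions: task dicts are expected to carry `id` and `context.depends_on` as in the jq queries above, the function name `validate_chains` is hypothetical, and a chain is counted as complete only when TEST, IMPL, and REFACTOR are all present. Meta-field checks (6-10) are omitted for brevity.

```python
import re
from collections import defaultdict

TASK_ID = re.compile(r"^(TEST|IMPL|REFACTOR)-(\d+\.\d+)$")


def depends_on(task: dict) -> list:
    """Read context.depends_on, tolerating missing keys."""
    return (task.get("context") or {}).get("depends_on") or []


def validate_chains(tasks: list) -> dict:
    # Group tasks by their N.M key, e.g. "1.1" -> {"TEST": ..., "IMPL": ...}
    chains = defaultdict(dict)
    for task in tasks:
        match = TASK_ID.match(str(task.get("id", "")))
        if match:
            kind, key = match.groups()
            chains[key][kind] = task

    issues = {}
    for key, chain in chains.items():
        found = [f"missing {kind}-{key}"
                 for kind in ("TEST", "IMPL", "REFACTOR") if kind not in chain]
        if "TEST" in chain and "IMPL" in chain \
                and f"TEST-{key}" not in depends_on(chain["IMPL"]):
            found.append(f"IMPL-{key} does not depend on TEST-{key}")
        if "IMPL" in chain and "REFACTOR" in chain \
                and f"IMPL-{key}" not in depends_on(chain["REFACTOR"]):
            found.append(f"REFACTOR-{key} does not depend on IMPL-{key}")
        issues[key] = found

    complete = sum(1 for chain in chains.values() if len(chain) == 3)
    score = 100.0 * complete / len(chains) if chains else 0.0
    return {"issues": issues, "chain_completeness_score": score}
```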
---
### Phase 3: Coverage & Cycle Analysis
**Step 3.1: Call Coverage Analysis Phase**
Read and execute the coverage analysis phase:
- **Phase file**: `phases/04-tdd-coverage-analysis.md`
- **Args**: `--session {session_id}`
**Step 3.2: Parse Output Files**
```bash
# Check required outputs exist
IF NOT EXISTS(process_dir/test-results.json):
    WARNING: "test-results.json not found. Coverage analysis incomplete."
    coverage_data = null
ELSE:
    coverage_data = Read(process_dir/test-results.json)

IF NOT EXISTS(process_dir/coverage-report.json):
    WARNING: "coverage-report.json not found. Coverage metrics incomplete."
    metrics = null
ELSE:
    metrics = Read(process_dir/coverage-report.json)

IF NOT EXISTS(process_dir/tdd-cycle-report.md):
    WARNING: "tdd-cycle-report.md not found. Cycle validation incomplete."
    cycle_data = null
ELSE:
    cycle_data = Read(process_dir/tdd-cycle-report.md)
```
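The "warn and continue with null" pattern in Step 3.2 might look like this in Python. The helper names `load_artifact` and `load_analysis_outputs` are hypothetical; the sketch assumes `.json` files hold valid JSON and returns other files as plain text.

```python
import json
import sys
from pathlib import Path
from typing import Any, Optional


def load_artifact(path: Path) -> Optional[Any]:
    """Return parsed JSON (or raw text for non-JSON files); None if the file is missing."""
    if not path.is_file():
        print(f"WARNING: {path.name} not found. Analysis may be incomplete.",
              file=sys.stderr)
        return None
    text = path.read_text(encoding="utf-8")
    return json.loads(text) if path.suffix == ".json" else text


def load_analysis_outputs(process_dir: Path) -> dict:
    """Load the three outputs of the coverage analysis phase, keyed by file name."""
    names = ("test-results.json", "coverage-report.json", "tdd-cycle-report.md")
    return {name: load_artifact(process_dir / name) for name in names}
```

Keeping missing artifacts as `None` lets report generation degrade gracefully instead of aborting.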
**Step 3.3: Extract Coverage Metrics**
```
If metrics exists (parsed from coverage-report.json):
  - line_coverage_percent
  - branch_coverage_percent
  - function_coverage_percent
  - uncovered_files (list)
  - uncovered_lines (map: file -> line ranges)

If cycle_data exists:
  - red_phase_compliance (tests failed initially?)
  - green_phase_compliance (tests pass after impl?)
  - refactor_phase_compliance (tests stay green during refactor?)
  - minimal_implementation_score (was impl minimal?)
**Output**: coverage_analysis, cycle_analysis
---
### Phase 4: Compliance Report Generation
**Step 4.1: Calculate Compliance Score**
```
Base Score: 100 points

Deductions:
  Chain Structure:
    - Missing TEST task: -30 points per feature
    - Missing IMPL task: -30 points per feature
    - Missing REFACTOR task: -10 points per feature
    - Wrong dependency: -15 points per error
    - Wrong agent: -5 points per error
    - Wrong tdd_phase: -5 points per error

  TDD Cycle Compliance:
    - Test didn't fail initially: -10 points per feature
    - Tests didn't pass after IMPL: -20 points per feature
    - Tests broke during REFACTOR: -15 points per feature
    - Over-engineered IMPL: -10 points per feature

  Coverage Quality:
    - Line coverage < 80%: -5 points
    - Branch coverage < 70%: -5 points
    - Function coverage < 80%: -5 points
    - Critical paths uncovered: -10 points

Final Score: Max(0, Base Score - Total Deductions)
```
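As a worked example of the deduction table, the following Python sketch computes a score from violation counts. The point values come from the table above; the dictionary key names, the function name `compliance_score`, and the choice to skip coverage metrics that are absent (rather than penalize them) are assumptions made for illustration.

```python
# Points deducted per occurrence (values from the table above; key names are hypothetical).
DEDUCTIONS = {
    "missing_test": 30,
    "missing_impl": 30,
    "missing_refactor": 10,
    "wrong_dependency": 15,
    "wrong_agent": 5,
    "wrong_tdd_phase": 5,
    "test_did_not_fail": 10,
    "tests_not_passing_after_impl": 20,
    "tests_broke_during_refactor": 15,
    "over_engineered_impl": 10,
    "critical_paths_uncovered": 10,
}

# Coverage targets in percent; falling below a target costs 5 points.
COVERAGE_TARGETS = {"line": 80, "branch": 70, "function": 80}


def compliance_score(violations: dict, coverage: dict) -> int:
    """violations maps a DEDUCTIONS key to its count; coverage maps metric to percent."""
    total = sum(DEDUCTIONS[name] * count for name, count in violations.items())
    total += sum(
        5
        for metric, target in COVERAGE_TARGETS.items()
        if metric in coverage and coverage[metric] < target  # absent metrics are skipped
    )
    return max(0, 100 - total)
```

Two missing REFACTOR tasks, one over-engineered IMPL, and branch coverage of 65% would deduct 20 + 10 + 5 points, giving a score of 65.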
**Step 4.2: Determine Quality Gate**
```
IF score >= 90 AND critical_violations == 0:
    recommendation = "APPROVED"
ELSE IF score >= 70 AND critical_violations == 0:
    recommendation = "PROCEED_WITH_CAVEATS"
ELSE IF score >= 50:
    recommendation = "REQUIRE_FIXES"
ELSE:
    recommendation = "BLOCK_MERGE"
```
**Step 4.3: Generate Report**
```bash
report_content = Generate markdown report (see structure below)
report_path = "{session_dir}/TDD_COMPLIANCE_REPORT.md"
Write(report_path, report_content)
```
**Step 4.4: Display Summary to User**
```bash
echo "=== TDD Verification Complete ==="
echo "Session: {session_id}"
echo "Report: {report_path}"
echo ""
echo "Quality Gate: {recommendation}"
echo "Compliance Score: {score}/100"
echo ""
echo "Chain Validation: {chain_completeness_score}%"
echo "Line Coverage: {line_coverage}%"
echo "Branch Coverage: {branch_coverage}%"
echo ""
echo "Next: Review full report for detailed findings"
```
## TodoWrite Pattern (Optional)
**Note**: As an orchestrator phase, TodoWrite tracking is optional and primarily useful for long-running verification processes. For most cases, the 4-phase execution is fast enough that progress tracking adds noise without value.
```javascript
// Only use TodoWrite for complex multi-session verification
// Skip for single-session verification
```
## Validation Logic
### Chain Validation Algorithm
```
1. Load all task JSONs from .workflow/active/{sessionId}/.task/
2. Extract task IDs and group by feature number
3. For each feature:
   - Check TEST-N.M exists
   - Check IMPL-N.M exists
   - Check REFACTOR-N.M exists (optional but recommended)
   - Verify IMPL-N.M depends_on TEST-N.M
   - Verify REFACTOR-N.M depends_on IMPL-N.M
   - Verify meta.tdd_phase values
   - Verify meta.agent assignments
4. Calculate chain completeness score
5. Report incomplete or invalid chains
```
### Quality Gate Criteria
| Recommendation | Score Range | Critical Violations | Action |
|----------------|-------------|---------------------|--------|
| **APPROVED** | ≥90 | 0 | Safe to merge |
| **PROCEED_WITH_CAVEATS** | ≥70 | 0 | Can proceed, address minor issues |
| **REQUIRE_FIXES** | ≥50 | Any | Must fix before merge |
| **BLOCK_MERGE** | <50 | Any | Block merge until resolved |
**Critical Violations**:
- Missing TEST or IMPL task for any feature
- Tests didn't fail initially (Red phase violation)
- Tests didn't pass after IMPL (Green phase violation)
- Tests broke during REFACTOR (Refactor phase violation)
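The gate criteria and the critical-violation list combine as in the sketch below. The finding labels and the function name `quality_gate` are hypothetical; the thresholds follow the table above, so any critical violation caps the result at REQUIRE_FIXES even when the score is 90 or higher.

```python
# Finding kinds treated as critical, mirroring the list above (labels are hypothetical).
CRITICAL_KINDS = {
    "missing_test",
    "missing_impl",
    "test_did_not_fail",
    "tests_not_passing_after_impl",
    "tests_broke_during_refactor",
}


def quality_gate(score: int, findings: list) -> str:
    """Map a compliance score and a list of finding kinds to a recommendation."""
    critical = sum(1 for kind in findings if kind in CRITICAL_KINDS)
    if score >= 90 and critical == 0:
        return "APPROVED"
    if score >= 70 and critical == 0:
        return "PROCEED_WITH_CAVEATS"
    if score >= 50:
        return "REQUIRE_FIXES"
    return "BLOCK_MERGE"
```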
## Output Files
```
.workflow/active/WFS-{session-id}/
├── TDD_COMPLIANCE_REPORT.md     # Comprehensive compliance report
└── .process/
    ├── test-results.json        # From phases/04-tdd-coverage-analysis.md
    ├── coverage-report.json     # From phases/04-tdd-coverage-analysis.md
    └── tdd-cycle-report.md      # From phases/04-tdd-coverage-analysis.md
```
## Error Handling
### Session Discovery Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| No active session | No WFS-* directories | Provide --session explicitly |
| Multiple active sessions | Multiple WFS-* directories | Most recently modified session is used; provide --session to override |
| Session not found | Invalid session-id | Check available sessions |
### Validation Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Task files missing | Incomplete planning | Run TDD planning (SKILL.md) first |
| Invalid JSON | Corrupted task files | Regenerate tasks |
| Missing summaries | Tasks not executed | Execute tasks before verify |
### Analysis Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Coverage tool missing | No test framework | Configure testing first |
| Tests fail to run | Code errors | Fix errors before verify |
| Coverage analysis fails | phases/04-tdd-coverage-analysis.md error | Check analysis output |
## Integration
### Phase Chain
- **Called After**: Task execution completes (all TDD tasks done)
- **Calls**: `phases/04-tdd-coverage-analysis.md`
- **Related Skills**: SKILL.md (orchestrator), `workflow-plan-execute/` (session management)
### When to Use
- After completing all TDD tasks in a workflow
- Before merging TDD workflow branch
- For TDD process quality assessment
- To identify missing TDD steps
## TDD Compliance Report Structure
```markdown
# TDD Compliance Report - {Session ID}
**Generated**: {timestamp}
**Session**: WFS-{sessionId}
**Workflow Type**: TDD
---
## Executive Summary
### Quality Gate Decision
| Metric | Value | Status |
|--------|-------|--------|
| Compliance Score | {score}/100 | {status_emoji} |
| Chain Completeness | {percentage}% | {status} |
| Line Coverage | {percentage}% | {status} |
| Branch Coverage | {percentage}% | {status} |
| Function Coverage | {percentage}% | {status} |
### Recommendation
**{RECOMMENDATION}**
**Decision Rationale**:
{brief explanation based on score and violations}
**Quality Gate Criteria**:
- **APPROVED**: Score ≥90, no critical violations
- **PROCEED_WITH_CAVEATS**: Score ≥70, no critical violations
- **REQUIRE_FIXES**: Score ≥50 with critical violations, or score 50-69
- **BLOCK_MERGE**: Score <50
---
## Chain Analysis
### Feature 1: {Feature Name}
**Status**: Complete
**Chain**: TEST-1.1 → IMPL-1.1 → REFACTOR-1.1
| Phase | Task | Status | Details |
|-------|------|--------|---------|
| Red | TEST-1.1 | Pass | Test created and failed with clear message |
| Green | IMPL-1.1 | Pass | Minimal implementation made test pass |
| Refactor | REFACTOR-1.1 | Pass | Code improved, tests remained green |
### Feature 2: {Feature Name}
**Status**: Incomplete
**Chain**: TEST-2.1 → IMPL-2.1 (Missing REFACTOR-2.1)
| Phase | Task | Status | Details |
|-------|------|--------|---------|
| Red | TEST-2.1 | Pass | Test created and failed |
| Green | IMPL-2.1 | Warning | Implementation seems over-engineered |
| Refactor | REFACTOR-2.1 | Missing | Task not completed |
**Issues**:
- REFACTOR-2.1 task not completed (-10 points)
- IMPL-2.1 implementation exceeded minimal scope (-10 points)
### Chain Validation Summary
| Metric | Value |
|--------|-------|
| Total Features | {count} |
| Complete Chains | {count} ({percent}%) |
| Incomplete Chains | {count} |
| Missing TEST | {count} |
| Missing IMPL | {count} |
| Missing REFACTOR | {count} |
| Dependency Errors | {count} |
| Meta Field Errors | {count} |
---
## Test Coverage Analysis
### Coverage Metrics
| Metric | Coverage | Target | Status |
|--------|----------|--------|--------|
| Line Coverage | {percentage}% | ≥80% | {status} |
| Branch Coverage | {percentage}% | ≥70% | {status} |
| Function Coverage | {percentage}% | ≥80% | {status} |
### Coverage Gaps
| File | Lines | Issue | Priority |
|------|-------|-------|----------|
| src/auth/service.ts | 45-52 | Uncovered error handling | HIGH |
| src/utils/parser.ts | 78-85 | Uncovered edge case | MEDIUM |
---
## TDD Cycle Validation
### Red Phase (Write Failing Test)
- {N}/{total} features had failing tests initially ({percent}%)
- Compliant features: {list}
- Non-compliant features: {list}
**Violations**:
- Feature 3: No evidence of initial test failure (-10 points)
### Green Phase (Make Test Pass)
- {N}/{total} implementations made tests pass ({percent}%)
- Compliant features: {list}
- Non-compliant features: {list}
**Violations**:
- Feature 2: Implementation over-engineered (-10 points)
### Refactor Phase (Improve Quality)
- {N}/{total} features completed refactoring ({percent}%)
- Compliant features: {list}
- Non-compliant features: {list}
**Violations**:
- Feature 2, 4: Refactoring step skipped (-20 points total)
---
## Best Practices Assessment
### Strengths
- Clear test descriptions
- Good test coverage
- Consistent naming conventions
- Well-structured code
### Areas for Improvement
- Some implementations over-engineered in Green phase
- Missing refactoring steps
- Test failure messages could be more descriptive
---
## Detailed Findings by Severity
### Critical Issues ({count})
{List of critical issues with impact and remediation}
### High Priority Issues ({count})
{List of high priority issues with impact and remediation}
### Medium Priority Issues ({count})
{List of medium priority issues with impact and remediation}
### Low Priority Issues ({count})
{List of low priority issues with impact and remediation}
---
## Recommendations
### Required Fixes (Before Merge)
1. Complete missing REFACTOR tasks (Features 2, 4)
2. Verify initial test failures for Feature 3
3. Fix tests that broke during refactoring
### Recommended Improvements
1. Simplify over-engineered implementations
2. Add edge case tests for Features 1, 3
3. Improve test failure message clarity
4. Increase branch coverage to >85%
### Optional Enhancements
1. Add more descriptive test names
2. Consider parameterized tests for similar scenarios
3. Document TDD process learnings
---
## Metrics Summary
| Metric | Value |
|--------|-------|
| Total Features | {count} |
| Complete Chains | {count} ({percent}%) |
| Compliance Score | {score}/100 |
| Critical Issues | {count} |
| High Issues | {count} |
| Medium Issues | {count} |
| Low Issues | {count} |
| Line Coverage | {percent}% |
| Branch Coverage | {percent}% |
| Function Coverage | {percent}% |
---
**Report End**
```
---
## Post-Phase Update
After TDD Verify completes:
- **Output Created**: `TDD_COMPLIANCE_REPORT.md` in session directory
- **Data Produced**: Compliance score, quality gate recommendation, chain validation, coverage metrics
- **Next Action**: Based on the quality gate: APPROVED (merge), PROCEED_WITH_CAVEATS (proceed, then address minor issues), REQUIRE_FIXES (iterate), BLOCK_MERGE (rework)
- **TodoWrite**: Mark "TDD Verify: completed" with quality gate result

@@ -0,0 +1,287 @@
# Phase 4: TDD Coverage Analysis
## Overview
Analyze test coverage and verify Red-Green-Refactor cycle execution for TDD workflow validation.
## Core Responsibilities
- Extract test files from TEST tasks
- Run test suite with coverage
- Parse coverage metrics
- Verify TDD cycle execution (Red -> Green -> Refactor)
- Generate coverage and cycle reports
## Execution Process
```
Input Parsing:
   ├─ Parse flags: --session
   └─ Validation: session_id REQUIRED

Phase 1: Extract Test Tasks
   └─ Find TEST-*.json files and extract focus_paths

Phase 2: Run Test Suite
   └─ Decision (test framework):
      ├─ Node.js → npm test -- --coverage --json
      ├─ Python → pytest --cov --json-report
      └─ Other → [test_command] --coverage --json

Phase 3: Parse Coverage Data
   ├─ Extract line coverage percentage
   ├─ Extract branch coverage percentage
   ├─ Extract function coverage percentage
   └─ Identify uncovered lines/branches

Phase 4: Verify TDD Cycle
   └─ FOR each TDD chain (TEST-N.M → IMPL-N.M → REFACTOR-N.M):
      ├─ Red Phase: Verify tests created and failed initially
      ├─ Green Phase: Verify tests now pass
      └─ Refactor Phase: Verify code quality improved

Phase 5: Generate Analysis Report
   └─ Create tdd-cycle-report.md with coverage metrics and cycle verification
```
## Execution Lifecycle
### Phase 1: Extract Test Tasks
```bash
find .workflow/active/{session_id}/.task/ -name 'TEST-*.json' -exec jq -r '.context.focus_paths[]' {} \;
```
**Output**: List of test directories/files from all TEST tasks
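An equivalent of the `find ... -exec jq` command in Python, for environments without jq. This is a sketch: the function name `collect_test_paths` is hypothetical, and it additionally de-duplicates paths while preserving order, which the shell command does not do.

```python
import json
from pathlib import Path


def collect_test_paths(task_dir: Path) -> list:
    """Gather context.focus_paths from every TEST-*.json file, without duplicates."""
    paths = []
    for task_file in sorted(task_dir.glob("TEST-*.json")):
        task = json.loads(task_file.read_text(encoding="utf-8"))
        for path in (task.get("context") or {}).get("focus_paths") or []:
            if path not in paths:
                paths.append(path)
    return paths
```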
### Phase 2: Run Test Suite
```bash
# Node.js/JavaScript
npm test -- --coverage --json > .workflow/active/{session_id}/.process/test-results.json
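# Python (assumes the pytest-cov and pytest-json-report plugins are installed;
# --json-report writes to a file, so set the path instead of redirecting stdout)
pytest --cov --json-report --json-report-file=.workflow/active/{session_id}/.process/test-results.json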
# Other frameworks (detect from project)
[test_command] --coverage --json-output .workflow/active/{session_id}/.process/test-results.json
```
**Output**: test-results.json with coverage data
### Phase 3: Parse Coverage Data
```bash
jq '.coverage' .workflow/active/{session_id}/.process/test-results.json > .workflow/active/{session_id}/.process/coverage-report.json
```
**Extract**:
- Line coverage percentage
- Branch coverage percentage
- Function coverage percentage
- Uncovered lines/branches
### Phase 4: Verify TDD Cycle
For each TDD chain (TEST-N.M -> IMPL-N.M -> REFACTOR-N.M):
**1. Red Phase Verification**
```bash
# Check TEST task summary
cat .workflow/active/{session_id}/.summaries/TEST-N.M-summary.md
```
Verify:
- Tests were created
- Tests failed initially
- Failure messages were clear
**2. Green Phase Verification**
```bash
# Check IMPL task summary
cat .workflow/active/{session_id}/.summaries/IMPL-N.M-summary.md
```
Verify:
- Implementation was completed
- Tests now pass
- Implementation was minimal
**3. Refactor Phase Verification**
```bash
# Check REFACTOR task summary
cat .workflow/active/{session_id}/.summaries/REFACTOR-N.M-summary.md
```
Verify:
- Refactoring was completed
- Tests still pass
- Code quality improved
### Phase 5: Generate Analysis Report
Create `.workflow/active/{session_id}/.process/tdd-cycle-report.md`:
```markdown
# TDD Cycle Analysis - {Session ID}
## Coverage Metrics
- **Line Coverage**: {percentage}%
- **Branch Coverage**: {percentage}%
- **Function Coverage**: {percentage}%
## Coverage Details
### Covered
- {covered_lines} lines
- {covered_branches} branches
- {covered_functions} functions
### Uncovered
- Lines: {uncovered_line_numbers}
- Branches: {uncovered_branch_locations}
## TDD Cycle Verification
### Feature 1: {Feature Name}
**Chain**: TEST-1.1 -> IMPL-1.1 -> REFACTOR-1.1
- [PASS] **Red Phase**: Tests created and failed initially
- [PASS] **Green Phase**: Implementation made tests pass
- [PASS] **Refactor Phase**: Refactoring maintained green tests
### Feature 2: {Feature Name}
**Chain**: TEST-2.1 -> IMPL-2.1 -> REFACTOR-2.1
- [PASS] **Red Phase**: Tests created and failed initially
- [WARN] **Green Phase**: Tests pass but implementation seems over-engineered
- [PASS] **Refactor Phase**: Refactoring maintained green tests
[Repeat for all features]
## TDD Compliance Summary
- **Total Chains**: {N}
- **Complete Cycles**: {N}
- **Incomplete Cycles**: {N}
- **Compliance Score**: {score}/100
## Gaps Identified
- Feature 3: Missing initial test failure verification
- Feature 5: No refactoring step completed
## Recommendations
- Complete missing refactoring steps
- Add edge case tests for Feature 2
- Verify test failure messages are descriptive
```
## Output Files
```
.workflow/active/{session-id}/
└── .process/
    ├── test-results.json        # Raw test execution results
    ├── coverage-report.json     # Parsed coverage data
    └── tdd-cycle-report.md      # TDD cycle analysis
```
## Test Framework Detection
Auto-detect test framework from project:
```bash
# Check for test frameworks
if [ -f "package.json" ] && grep -q "jest\|mocha\|vitest" package.json; then
  TEST_CMD="npm test -- --coverage --json"
elif [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
  TEST_CMD="pytest --cov --json-report"
elif [ -f "Cargo.toml" ]; then
  TEST_CMD="cargo test -- --test-threads=1 --nocapture"
elif [ -f "go.mod" ]; then
  TEST_CMD="go test -coverprofile=coverage.out -json ./..."
else
  TEST_CMD="echo 'No supported test framework found'"
fi
```
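The same detection order can be expressed in Python when the caller needs to distinguish "no framework" from a runnable command. This sketch mirrors the marker files and commands of the shell snippet above; the function name `detect_test_command` is hypothetical, and returning `None` instead of an `echo` command is a design choice of the sketch.

```python
import re
from pathlib import Path
from typing import Optional


def detect_test_command(project: Path) -> Optional[str]:
    """Return the test command for the first framework marker found, else None."""
    package_json = project / "package.json"
    if package_json.is_file() and re.search(
        r"jest|mocha|vitest", package_json.read_text(encoding="utf-8")
    ):
        return "npm test -- --coverage --json"
    if (project / "pytest.ini").is_file() or (project / "setup.py").is_file():
        return "pytest --cov --json-report"
    if (project / "Cargo.toml").is_file():
        return "cargo test -- --test-threads=1 --nocapture"
    if (project / "go.mod").is_file():
        return "go test -coverprofile=coverage.out -json ./..."
    return None
```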
## TDD Cycle Verification Algorithm
```
For each feature N:
  1. Load TEST-N.M-summary.md
     IF summary missing:
       Mark: "Red phase incomplete"
       SKIP to next feature

     CHECK: Contains "test" AND "fail"
     IF NOT found:
       Mark: "Red phase verification failed"
     ELSE:
       Mark: "Red phase [PASS]"

  2. Load IMPL-N.M-summary.md
     IF summary missing:
       Mark: "Green phase incomplete"
       SKIP to next feature

     CHECK: Contains "pass" OR "green"
     IF NOT found:
       Mark: "Green phase verification failed"
     ELSE:
       Mark: "Green phase [PASS]"

  3. Load REFACTOR-N.M-summary.md
     IF summary missing:
       Mark: "Refactor phase incomplete"
       CONTINUE (refactor is optional)

     CHECK: Contains "refactor" AND "pass"
     IF NOT found:
       Mark: "Refactor phase verification failed"
     ELSE:
       Mark: "Refactor phase [PASS]"

  4. Calculate chain score:
     - Red + Green + Refactor all [PASS] = 100%
     - Red + Green [PASS], Refactor missing = 80%
     - Red [PASS], Green missing = 40%
     - All missing = 0%
```
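A minimal Python sketch of the keyword-based verification above. Several details are assumptions rather than part of the algorithm as written: summaries are passed as strings (`None` when the file is missing), keyword matching is case-insensitive, a missing TEST summary scores 0, and combinations the algorithm does not list (for example a failed Red check) yield a score of `None`.

```python
from typing import Optional


def check(summary: Optional[str], all_of=(), any_of=()) -> str:
    """Return "incomplete" if the summary is missing, else "pass" or "failed"."""
    if summary is None:
        return "incomplete"
    text = summary.lower()
    ok = all(word in text for word in all_of)
    if any_of:
        ok = ok and any(word in text for word in any_of)
    return "pass" if ok else "failed"


def chain_score(phases: dict) -> Optional[int]:
    red, green, refactor = phases["red"], phases["green"], phases["refactor"]
    if red == green == refactor == "pass":
        return 100
    if red == green == "pass" and refactor == "incomplete":
        return 80
    if red == "pass" and green == "incomplete":
        return 40
    if red == "incomplete":
        return 0
    return None  # combination not covered by the scoring rules


def verify_chain(test_summary, impl_summary, refactor_summary) -> dict:
    phases = {"red": "not checked", "green": "not checked", "refactor": "not checked"}
    phases["red"] = check(test_summary, all_of=("test", "fail"))
    if phases["red"] != "incomplete":          # missing TEST summary: skip the rest
        phases["green"] = check(impl_summary, any_of=("pass", "green"))
        if phases["green"] != "incomplete":    # missing IMPL summary: skip refactor
            phases["refactor"] = check(refactor_summary, all_of=("refactor", "pass"))
    return {**phases, "score": chain_score(phases)}
```

Because the check is plain substring matching, a summary that says "no tests failed" would still satisfy the Red check; treat the result as a heuristic signal, not proof.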
## Coverage Metrics Calculation
```bash
# Parse coverage from test-results.json
line_coverage=$(jq '.coverage.lineCoverage' test-results.json)
branch_coverage=$(jq '.coverage.branchCoverage' test-results.json)
function_coverage=$(jq '.coverage.functionCoverage' test-results.json)
# Calculate overall score (scale=2 keeps two decimals; without it bc truncates to an integer)
overall_score=$(echo "scale=2; ($line_coverage + $branch_coverage + $function_coverage) / 3" | bc)
```
## Error Handling
### Test Execution Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Test framework not found | No test config | Configure test framework first |
| Tests fail to run | Syntax errors | Fix code before analysis |
| Coverage not available | Missing coverage tool | Install coverage plugin |
### Cycle Verification Errors
| Error | Cause | Resolution |
|-------|-------|------------|
| Summary missing | Task not executed | Execute tasks before analysis |
| Invalid summary format | Corrupted file | Re-run task to regenerate |
| No test evidence | Tests not committed | Ensure tests are committed |
## Integration
### Phase Chain
- **Called By**: `phases/03-tdd-verify.md` (Coverage & Cycle Analysis step)
- **Calls**: Test framework commands (npm test, pytest, etc.)
- **Followed By**: Compliance report generation in `phases/03-tdd-verify.md`
---
## Post-Phase Update
After TDD Coverage Analysis completes:
- **Output Created**: `test-results.json`, `coverage-report.json`, `tdd-cycle-report.md` in `.process/`
- **Data Produced**: Coverage metrics (line/branch/function), TDD cycle verification results per feature
- **Next Action**: Return data to `phases/03-tdd-verify.md` for compliance report aggregation
- **TodoWrite**: Mark "Coverage & Cycle Analysis: completed"