feat: Implement dynamic test-fix execution phase with adaptive task generation

- Added Phase 2: Test-Cycle Execution documentation outlining the process for dynamic test-fix execution, including agent roles, core responsibilities, intelligent strategy engine, and progressive testing.
- Introduced new PowerShell scripts for analyzing TypeScript errors, focusing on error categorization and reporting.
- Created end-to-end tests for the Help Page, ensuring content visibility, documentation navigation, internationalization support, and accessibility compliance.
2026-02-13 02:41:50 +08:00 · 2026-02-07 17:01:30 +08:00
parent 4ce4419ea6
commit ba5f4eba84
70 changed files with 7288 additions and 488 deletions
--- a/.codex/skills/workflow-tdd-plan/phases/02-task-generate-tdd.md
+++ b/.codex/skills/workflow-tdd-plan/phases/02-task-generate-tdd.md
@@ -0,0 +1,774 @@
+## Auto Mode
+
+When `--yes` or `-y`: Skip user questions, use defaults (no materials, Agent executor).
+
+# Phase 2: TDD Task Generation
+
+## Overview
+Autonomous TDD task JSON and IMPL_PLAN.md generation using action-planning-agent with two-phase execution: discovery and document generation. Generates complete Red-Green-Refactor cycles contained within each task.
+
+## Core Philosophy
+- **Agent-Driven**: Delegate execution to action-planning-agent for autonomous operation
+- **Two-Phase Flow**: Discovery (context gathering) → Output (document generation)
+- **Memory-First**: Reuse loaded documents from conversation memory
+- **MCP-Enhanced**: Use MCP tools for advanced code analysis and research
+- **Semantic CLI Selection**: CLI tool usage determined from user's task description, not flags
+- **Agent Simplicity**: Agent generates content with semantic CLI detection
+- **Path Clarity**: All `focus_paths` prefer absolute paths (e.g., `D:\\project\\src\\module`), or clear relative paths from project root (e.g., `./src/module`)
+- **TDD-First**: Every feature starts with a failing test (Red phase)
+- **Feature-Complete Tasks**: Each task contains complete Red-Green-Refactor cycle
+- **Quantification-Enforced**: All test cases, coverage requirements, and implementation scope MUST include explicit counts and enumerations
+- **Explicit Lifecycle**: Always close_agent after wait completes to free resources
+
+## Task Strategy & Philosophy
+
+### Optimized Task Structure (Current)
+- **1 feature = 1 task** containing complete TDD cycle internally
+- Each task executes Red-Green-Refactor phases sequentially
+- Task count = Feature count (typically 5 features = 5 tasks)
+
+**Previous Approach** (Deprecated):
+- 1 feature = 3 separate tasks (TEST-N.M, IMPL-N.M, REFACTOR-N.M)
+- 5 features = 15 tasks with complex dependency chains
+- High context switching cost between phases
+
+### When to Use Subtasks
+- Feature complexity >2500 lines or >6 files per TDD cycle
+- Multiple independent sub-features needing parallel execution
+- Strong technical dependency blocking (e.g., API before UI)
+- Different tech stacks or domains within feature
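The subtask criteria above can be sketched as a small predicate. This is a minimal illustration, not the agent's actual logic: the `feature` object and its field names (`estimatedLines`, `fileCount`, `independentSubFeatures`, `hasBlockingDependency`, `techStacks`) are hypothetical, and only the 2500-line and 6-file thresholds come from the list.

```javascript
// Thresholds taken from the "When to Use Subtasks" list.
const MAX_LINES_PER_CYCLE = 2500;
const MAX_FILES_PER_CYCLE = 6;

// `feature` is a hypothetical estimate object; its field names are illustrative only.
function needsSubtasks(feature) {
  const reasons = [];
  if (feature.estimatedLines > MAX_LINES_PER_CYCLE) reasons.push("more than 2500 lines");
  if (feature.fileCount > MAX_FILES_PER_CYCLE) reasons.push("more than 6 files");
  if (feature.independentSubFeatures > 1) reasons.push("parallel sub-features");
  if (feature.hasBlockingDependency) reasons.push("blocking technical dependency");
  if (feature.techStacks.length > 1) reasons.push("multiple tech stacks or domains");
  // Any single criterion is enough to justify a container task with subtasks.
  return { split: reasons.length > 0, reasons };
}
```

Returning the list of reasons, rather than a bare boolean, would let the planner record why a feature was split into `IMPL-N.M` subtasks.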
+
+### Task Limits
+- **Maximum 18 tasks** (hard limit for TDD workflows)
+- **Feature-based**: Complete functional units with internal TDD cycles
+- **Hierarchy**: Flat (<=5 simple features) | Two-level (6-18 tasks for complex features with sub-features)
+- **Re-scope**: If >18 tasks needed, break project into multiple TDD workflow sessions
+
+### TDD Cycle Mapping
+- **Old approach**: 1 feature = 3 tasks (TEST-N.M, IMPL-N.M, REFACTOR-N.M)
+- **Current approach**: 1 feature = 1 task (IMPL-N with internal Red-Green-Refactor phases)
+- **Complex features**: 1 container (IMPL-N) + subtasks (IMPL-N.M) when necessary
+
+## Execution Process
+
+```
+Input Parsing:
+   ├─ Parse flags: --session
+   └─ Validation: session_id REQUIRED
+
+Phase 1: Discovery & Context Loading (Memory-First)
+   ├─ Load session context (if not in memory)
+   ├─ Load context package (if not in memory)
+   ├─ Load test context package (if not in memory)
+   ├─ Extract & load role analyses from context package
+   ├─ Load conflict resolution (if exists)
+   └─ Optional: MCP external research
+
+Phase 2: Agent Execution (Document Generation)
+   ├─ Pre-agent template selection (semantic CLI detection)
+   ├─ Invoke action-planning-agent via spawn_agent
+   ├─ Generate TDD Task JSON Files (.task/IMPL-*.json)
+   │  └─ Each task: complete Red-Green-Refactor cycle internally
+   ├─ Create IMPL_PLAN.md (TDD variant)
+   └─ Generate TODO_LIST.md with TDD phase indicators
+```
+
+## Execution Lifecycle
+
+### Phase 0: User Configuration (Interactive)
+
+**Purpose**: Collect user preferences before TDD task generation to ensure generated tasks match execution expectations and provide necessary supplementary context.
+
+**User Questions**:
+```javascript
+AskUserQuestion({
+  questions: [
+    {
+      question: "Do you have supplementary materials or guidelines to include?",
+      header: "Materials",
+      multiSelect: false,
+      options: [
+        { label: "No additional materials", description: "Use existing context only" },
+        { label: "Provide file paths", description: "I'll specify paths to include" },
+        { label: "Provide inline content", description: "I'll paste content directly" }
+      ]
+    },
+    {
+      question: "Select execution method for generated TDD tasks:",
+      header: "Execution",
+      multiSelect: false,
+      options: [
+        { label: "Agent (Recommended)", description: "Agent executes Red-Green-Refactor cycles directly" },
+        { label: "Hybrid", description: "Agent orchestrates, calls CLI for complex steps (Red/Green phases)" },
+        { label: "CLI Only", description: "All TDD cycles via CLI tools (codex/gemini/qwen)" }
+      ]
+    },
+    {
+      question: "If using CLI, which tool do you prefer?",
+      header: "CLI Tool",
+      multiSelect: false,
+      options: [
+        { label: "Codex (Recommended)", description: "Best for TDD Red-Green-Refactor cycles" },
+        { label: "Gemini", description: "Best for analysis and large context" },
+        { label: "Qwen", description: "Alternative analysis tool" },
+        { label: "Auto", description: "Let agent decide per-task" }
+      ]
+    }
+  ]
+})
+```
+
+**Handle Materials Response**:
+```javascript
+if (userConfig.materials === "Provide file paths") {
+  // Follow-up question for file paths
+  const pathsResponse = AskUserQuestion({
+    questions: [{
+      question: "Enter file paths to include (comma-separated or one per line):",
+      header: "Paths",
+      multiSelect: false,
+      options: [
+        { label: "Enter paths", description: "Provide paths in text input" }
+      ]
+    }]
+  })
+  userConfig.supplementaryPaths = parseUserPaths(pathsResponse)
+}
+```
+
+**Build userConfig**:
+```javascript
+const userConfig = {
+  supplementaryMaterials: {
+    type: "none|paths|inline",
+    content: [...],  // Parsed paths or inline content
+  },
+  executionMethod: "agent|hybrid|cli",
+  preferredCliTool: "codex|gemini|qwen|auto",
+  enableResume: true  // Always enable resume for CLI executions
+}
+```
+
+**Pass to Agent**: Include `userConfig` in agent prompt for Phase 2.
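One way the Phase 0 answers might be turned into `userConfig` is sketched below. The `answers` shape (`materials`, `execution`, `cliTool`, `content`) and the lookup tables are assumptions made for illustration; the option labels and the auto-mode defaults (no materials, Agent executor) come from this document, while defaulting the CLI tool to `"auto"` is a guess.

```javascript
// Hypothetical label-to-value tables; the labels match the Phase 0 options.
const MATERIAL_TYPES = {
  "No additional materials": "none",
  "Provide file paths": "paths",
  "Provide inline content": "inline",
};
const EXECUTION_METHODS = {
  "Agent (Recommended)": "agent",
  "Hybrid": "hybrid",
  "CLI Only": "cli",
};
const CLI_TOOLS = {
  "Codex (Recommended)": "codex",
  "Gemini": "gemini",
  "Qwen": "qwen",
  "Auto": "auto",
};

// `answers` is an assumed shape: { materials, execution, cliTool, content }.
// With autoMode (--yes / -y) the questions are skipped and defaults apply.
function buildUserConfig(answers, autoMode) {
  const a = autoMode ? {} : answers;
  return {
    supplementaryMaterials: {
      type: MATERIAL_TYPES[a.materials] ?? "none",
      content: a.content ?? [],
    },
    executionMethod: EXECUTION_METHODS[a.execution] ?? "agent",
    preferredCliTool: CLI_TOOLS[a.cliTool] ?? "auto", // assumed default
    enableResume: true,
  };
}
```

Unknown or missing labels fall back to the same defaults as auto mode, so a partially answered questionnaire still yields a complete config.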
+
+---
+
+### Phase 1: Context Preparation & Discovery
+
+**Command Responsibility**: The command prepares session paths and metadata and provides them to the agent for autonomous context loading.
+
+**Memory-First Rule**: Skip file loading if the documents are already in conversation memory.
+
+**Progressive Loading Strategy**: Load context incrementally due to large analysis.md file sizes:
+- **Core**: session metadata + context-package.json (always load)
+- **Selective**: synthesis_output OR (guidance + relevant role analyses) - NOT all role analyses
+- **On-Demand**: conflict resolution (if conflict_risk >= medium), test context
+
+**Path Clarity Requirement**: All `focus_paths` prefer absolute paths (e.g., `D:\\project\\src\\module`), or clear relative paths from project root (e.g., `./src/module`)
+
+**Session Path Structure** (Provided by Command to Agent):
+```
+.workflow/active/WFS-{session-id}/
+├── workflow-session.json          # Session metadata
+├── .process/
+│   ├── context-package.json       # Context package with artifact catalog
+│   ├── test-context-package.json  # Test coverage analysis
+│   └── conflict-resolution.json   # Conflict resolution (if exists)
+├── .task/                         # Output: Task JSON files
+│   ├── IMPL-1.json
+│   ├── IMPL-2.json
+│   └── ...
+├── IMPL_PLAN.md                   # Output: TDD implementation plan
+└── TODO_LIST.md                   # Output: TODO list with TDD phases
+```
+
+**Command Preparation**:
+1. **Assemble Session Paths** for agent prompt:
+   - `session_metadata_path`: `.workflow/active/{session-id}/workflow-session.json`
+   - `context_package_path`: `.workflow/active/{session-id}/.process/context-package.json`
+   - `test_context_package_path`: `.workflow/active/{session-id}/.process/test-context-package.json`
+   - Output directory paths
+
+2. **Provide Metadata** (simple values):
+   - `session_id`: WFS-{session-id}
+   - `workflow_type`: "tdd"
+   - `mcp_capabilities`: {exa_code, exa_web, code_index}
+
+3. **Pass userConfig** from Phase 0
+
+**Agent Context Package** (Agent loads autonomously):
+```javascript
+{
+  "session_id": "WFS-[session-id]",
+  "workflow_type": "tdd",
+
+  // Core (ALWAYS load)
+  "session_metadata": {
+    // If in memory: use cached content
+    // Else: Load from workflow-session.json
+  },
+  "context_package": {
+    // If in memory: use cached content
+    // Else: Load from context-package.json
+  },
+
+  // Selective (load based on progressive strategy)
+  "brainstorm_artifacts": {
+    // Loaded from context-package.json → brainstorm_artifacts section
+    "synthesis_output": {"path": "...", "exists": true},  // Load if exists (highest priority)
+    "guidance_specification": {"path": "...", "exists": true},  // Load if no synthesis
+    "role_analyses": [  // Load SELECTIVELY based on task relevance
+      {
+        "role": "system-architect",
+        "files": [{"path": "...", "type": "primary|supplementary"}]
+      }
+    ]
+  },
+
+  // On-Demand (load if exists)
+  "test_context_package": {
+    // Load from test-context-package.json
+    // Contains existing test patterns and coverage analysis
+  },
+  "conflict_resolution": {
+    // Load from conflict-resolution.json if conflict_risk >= medium
+    // Check context-package.conflict_detection.resolution_file
+  },
+
+  // Capabilities
+  "mcp_capabilities": {
+    "exa_code": true,
+    "exa_web": true,
+    "code_index": true
+  },
+
+  // User configuration from Phase 0
+  "user_config": {
+    // From Phase 0 AskUserQuestion
+  }
+}
+```
+
+**Discovery Actions**:
+1. **Load Session Context** (if not in memory)
+   ```javascript
+   if (!memory.has("workflow-session.json")) {
+     Read(".workflow/active/{session-id}/workflow-session.json")
+   }
+   ```
+
+2. **Load Context Package** (if not in memory)
+   ```javascript
+   if (!memory.has("context-package.json")) {
+     Read(".workflow/active/{session-id}/.process/context-package.json")
+   }
+   ```
+
+3. **Load Test Context Package** (if not in memory)
+   ```javascript
+   if (!memory.has("test-context-package.json")) {
+     Read(".workflow/active/{session-id}/.process/test-context-package.json")
+   }
+   ```
+
+4. **Extract & Load Role Analyses** (from context-package.json)
+   ```javascript
+   // Extract role analysis paths from context package
+   const roleAnalysisPaths = contextPackage.brainstorm_artifacts.role_analyses
+     .flatMap(role => role.files.map(f => f.path));
+
+   // Load each role analysis file
+   roleAnalysisPaths.forEach(path => Read(path));
+   ```
+
+5. **Load Conflict Resolution** (from conflict-resolution.json, if exists)
+   ```javascript
+   // Check for new conflict-resolution.json format
+   if (contextPackage.conflict_detection?.resolution_file) {
+     Read(contextPackage.conflict_detection.resolution_file)  // .process/conflict-resolution.json
+   }
+   // Fallback: legacy brainstorm_artifacts path
+   else if (contextPackage.brainstorm_artifacts?.conflict_resolution?.exists) {
+     Read(contextPackage.brainstorm_artifacts.conflict_resolution.path)
+   }
+   ```
+
+6. **Code Analysis with Native Tools** (optional - enhance understanding)
+   ```bash
+   # Find relevant test files and patterns
+   find . -name "*test*" -type f
+   rg "describe|it\(|test\(" -g "*.ts"
+   ```
+
+7. **MCP External Research** (optional - gather TDD best practices)
+   ```javascript
+   // Get external TDD examples and patterns
+   mcp__exa__get_code_context_exa({
+     query: "TypeScript TDD best practices Red-Green-Refactor",
+     tokensNum: "dynamic"
+   })
+   ```
+
+### Phase 2: Agent Execution (TDD Document Generation)
+
+**Purpose**: Generate TDD planning documents (IMPL_PLAN.md, task JSONs, TODO_LIST.md) - planning only, NOT code implementation.
+
+**Agent Invocation**:
+```javascript
+// Spawn action-planning-agent
+const agentId = spawn_agent({
+  message: `
+## TASK ASSIGNMENT
+
+### MANDATORY FIRST STEPS (Agent Execute)
+1. **Read role definition**: ~/.codex/agents/action-planning-agent.md (MUST read first)
+2. Read: .workflow/project-tech.json
+3. Read: .workflow/project-guidelines.json
+
+---
+
+## TASK OBJECTIVE
+Generate TDD implementation planning documents (IMPL_PLAN.md, task JSONs, TODO_LIST.md) for workflow session
+
+IMPORTANT: This is PLANNING ONLY - you are generating planning documents, NOT implementing code.
+
+CRITICAL: Follow the progressive loading strategy (load analysis.md files incrementally due to file size):
+- **Core**: session metadata + context-package.json (always)
+- **Selective**: synthesis_output OR (guidance + relevant role analyses) - NOT all
+- **On-Demand**: conflict resolution (if conflict_risk >= medium), test context
+
+## SESSION PATHS
+Input:
+  - Session Metadata: .workflow/active/{session-id}/workflow-session.json
+  - Context Package: .workflow/active/{session-id}/.process/context-package.json
+  - Test Context: .workflow/active/{session-id}/.process/test-context-package.json
+
+Output:
+  - Task Dir: .workflow/active/{session-id}/.task/
+  - IMPL_PLAN: .workflow/active/{session-id}/IMPL_PLAN.md
+  - TODO_LIST: .workflow/active/{session-id}/TODO_LIST.md
+
+## CONTEXT METADATA
+Session ID: {session-id}
+Workflow Type: TDD
+MCP Capabilities: {exa_code, exa_web, code_index}
+
+## USER CONFIGURATION (from Phase 0)
+Execution Method: ${userConfig.executionMethod}  // agent|hybrid|cli
+Preferred CLI Tool: ${userConfig.preferredCliTool}  // codex|gemini|qwen|auto
+Supplementary Materials: ${JSON.stringify(userConfig.supplementaryMaterials)}
+
+## EXECUTION METHOD MAPPING
+Based on userConfig.executionMethod, set task-level meta.execution_config:
+
+"agent" →
+  meta.execution_config = { method: "agent", cli_tool: null, enable_resume: false }
+  Agent executes Red-Green-Refactor phases directly
+
+"cli" →
+  meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
+  Agent executes pre_analysis, then hands off full context to CLI via buildCliHandoffPrompt()
+
+"hybrid" →
+  Per-task decision: Analyze TDD cycle complexity, set method to "agent" OR "cli" per task
+  - Simple cycles (<=5 test cases, <=3 files) → method: "agent"
+  - Complex cycles (>5 test cases, >3 files, integration tests) → method: "cli"
+  CLI tool: userConfig.preferredCliTool, enable_resume: true
+
+IMPORTANT: Do NOT add command field to implementation_approach steps. Execution routing is controlled by task-level meta.execution_config.method only.
+
+## EXPLORATION CONTEXT (from context-package.exploration_results)
+- Load exploration_results from context-package.json
+- Use aggregated_insights.critical_files for focus_paths generation
+- Apply aggregated_insights.constraints to acceptance criteria
+- Reference aggregated_insights.all_patterns for implementation approach
+- Use aggregated_insights.all_integration_points for precise modification locations
+- Use conflict_indicators for risk-aware task sequencing
+
+## CONFLICT RESOLUTION CONTEXT (if exists)
+- Check context-package.conflict_detection.resolution_file for conflict-resolution.json path
+- If exists, load .process/conflict-resolution.json:
+  - Apply planning_constraints as task constraints (for brainstorm-less workflows)
+  - Reference resolved_conflicts for implementation approach alignment
+  - Handle custom_conflicts with explicit task notes
+
+## TEST CONTEXT INTEGRATION
+- Load test-context-package.json for existing test patterns and coverage analysis
+- Extract test framework configuration (Jest/Pytest/etc.)
+- Identify existing test conventions and patterns
+- Map coverage gaps to TDD Red phase test targets
+
+## TDD DOCUMENT GENERATION TASK
+
+**Agent Configuration Reference**: All TDD task generation rules, quantification requirements, Red-Green-Refactor cycle structure, quality standards, and execution details are defined in action-planning-agent.
+
+### TDD-Specific Requirements Summary
+
+#### Task Structure Philosophy
+- **1 feature = 1 task** containing complete TDD cycle internally
+- Each task executes Red-Green-Refactor phases sequentially
+- Task count = Feature count (typically 5 features = 5 tasks)
+- Subtasks only when complexity >2500 lines or >6 files per cycle
+- **Maximum 18 tasks** (hard limit for TDD workflows)
+
+#### TDD Cycle Mapping
+- **Simple features**: IMPL-N with internal Red-Green-Refactor phases
+- **Complex features**: IMPL-N (container) + IMPL-N.M (subtasks)
+- Each cycle includes: test_count, test_cases array, implementation_scope, expected_coverage
+
+#### Required Outputs Summary
+
+##### 1. TDD Task JSON Files (.task/IMPL-*.json)
+- **Location**: \`.workflow/active/{session-id}/.task/\`
+- **Schema**: 7-field structure with TDD-specific metadata
+  - \`id, title, status, context_package_path, meta, context, flow_control\`
+  - \`meta.tdd_workflow\`: true (REQUIRED)
+  - \`meta.max_iterations\`: 3 (Green phase test-fix cycle limit)
+  - \`meta.cli_execution_id\`: Unique CLI execution ID (format: \`{session_id}-{task_id}\`)
+  - \`meta.cli_execution\`: Strategy object (new|resume|fork|merge_fork)
+  - \`context.tdd_cycles\`: Array with quantified test cases and coverage
+  - \`context.focus_paths\`: Absolute or clear relative paths (enhanced with exploration critical_files)
+  - \`flow_control.implementation_approach\`: Exactly 3 steps with \`tdd_phase\` field
+    1. Red Phase (\`tdd_phase: "red"\`): Write failing tests
+    2. Green Phase (\`tdd_phase: "green"\`): Implement to pass tests
+    3. Refactor Phase (\`tdd_phase: "refactor"\`): Improve code quality
+  - \`flow_control.pre_analysis\`: Include exploration integration_points analysis
+  - **meta.execution_config**: Set per userConfig.executionMethod (agent/cli/hybrid)
+- **Details**: See action-planning-agent.md § TDD Task JSON Generation
+
+##### 2. IMPL_PLAN.md (TDD Variant)
+- **Location**: \`.workflow/active/{session-id}/IMPL_PLAN.md\`
+- **Template**: \`~/.claude/workflows/cli-templates/prompts/workflow/impl-plan-template.txt\`
+- **TDD-Specific Frontmatter**: workflow_type="tdd", tdd_workflow=true, feature_count, task_breakdown
+- **TDD Implementation Tasks Section**: Feature-by-feature with internal Red-Green-Refactor cycles
+- **Context Analysis**: Artifact references and exploration insights
+- **Details**: See action-planning-agent.md § TDD Implementation Plan Creation
+
+##### 3. TODO_LIST.md
+- **Location**: \`.workflow/active/{session-id}/TODO_LIST.md\`
+- **Format**: Hierarchical task list with internal TDD phase indicators (Red → Green → Refactor)
+- **Status**: ▸ (container), [ ] (pending), [x] (completed)
+- **Links**: Task JSON references and summaries
+- **Details**: See action-planning-agent.md § TODO List Generation
+
+### CLI EXECUTION ID REQUIREMENTS (MANDATORY)
+
+Each task JSON MUST include:
+- **meta.cli_execution_id**: Unique ID for CLI execution (format: \`{session_id}-{task_id}\`)
+- **meta.cli_execution**: Strategy object based on depends_on:
+  - No deps → \`{ "strategy": "new" }\`
+  - 1 dep (single child) → \`{ "strategy": "resume", "resume_from": "parent-cli-id" }\`
+  - 1 dep (multiple children) → \`{ "strategy": "fork", "resume_from": "parent-cli-id" }\`
+  - N deps → \`{ "strategy": "merge_fork", "resume_from": ["id1", "id2", ...] }\`
+  - **Type**: \`resume_from: string | string[]\` (string for resume/fork, array for merge_fork)
+
+**CLI Execution Strategy Rules**:
+1. **new**: Task has no dependencies - starts fresh CLI conversation
+2. **resume**: Task has 1 parent AND that parent has only this child - continues same conversation
+3. **fork**: Task has 1 parent BUT parent has multiple children - creates new branch with parent context
+4. **merge_fork**: Task has multiple parents - merges all parent contexts into new conversation
+
+**Execution Command Patterns**:
+- new: \`ccw cli -p "[prompt]" --tool [tool] --mode write --id [cli_execution_id]\`
+- resume: \`ccw cli -p "[prompt]" --resume [resume_from] --tool [tool] --mode write\`
+- fork: \`ccw cli -p "[prompt]" --resume [resume_from] --id [cli_execution_id] --tool [tool] --mode write\`
+- merge_fork: \`ccw cli -p "[prompt]" --resume [resume_from.join(',')] --id [cli_execution_id] --tool [tool] --mode write\` (resume_from is array)
+
+### Quantification Requirements (MANDATORY)
+
+**Core Rules**:
+1. **Explicit Test Case Counts**: Red phase specifies exact number with enumerated list
+2. **Quantified Coverage**: Acceptance includes measurable percentage (e.g., ">=85%")
+3. **Detailed Implementation Scope**: Green phase enumerates files, functions, line counts
+4. **Enumerated Refactoring Targets**: Refactor phase lists specific improvements with counts
+
+**TDD Phase Formats**:
+- **Red Phase**: "Write N test cases: [test1, test2, ...]"
+- **Green Phase**: "Implement N functions in file lines X-Y: [func1() X1-Y1, func2() X2-Y2, ...]"
+- **Refactor Phase**: "Apply N refactorings: [improvement1 (details), improvement2 (details), ...]"
+- **Acceptance**: "All N tests pass with >=X% coverage: verify by [test command]"
+
+**Validation Checklist**:
+- [ ] Every Red phase specifies exact test case count with enumerated list
+- [ ] Every Green phase enumerates files, functions, and estimated line counts
+- [ ] Every Refactor phase lists specific improvements with counts
+- [ ] Every acceptance criterion includes measurable coverage percentage
+- [ ] tdd_cycles array contains test_count and test_cases for each cycle
+- [ ] No vague language ("comprehensive", "complete", "thorough")
+- [ ] cli_execution_id and cli_execution strategy assigned to each task
+
+### Agent Execution Summary
+
+**Key Steps** (Detailed instructions in action-planning-agent.md):
+1. Load task JSON template from provided path
+2. Extract and decompose features with TDD cycles
+3. Generate TDD task JSON files enforcing quantification requirements
+4. Create IMPL_PLAN.md using TDD template variant
+5. Generate TODO_LIST.md with TDD phase indicators
+6. Update session state with TDD metadata
+
+**Quality Gates** (Full checklist in action-planning-agent.md):
+- Task count <=18 (hard limit)
+- Each task has meta.tdd_workflow: true
+- Each task has exactly 3 implementation steps with tdd_phase field ("red", "green", "refactor")
+- Each task has meta.cli_execution_id and meta.cli_execution strategy
+- Green phase includes test-fix cycle logic with max_iterations
+- focus_paths are absolute or clear relative paths (from exploration critical_files)
+- Artifact references mapped correctly from context package
+- Exploration context integrated (critical_files, constraints, patterns, integration_points)
+- Conflict resolution context applied (if conflict_risk >= medium)
+- Test context integrated (existing test patterns and coverage analysis)
+- Documents follow TDD template structure
+- CLI tool selection based on userConfig.executionMethod
+- Quantification requirements enforced (explicit counts, measurable acceptance, exact targets)
+
+## SUCCESS CRITERIA
+- All planning documents generated successfully:
+  - Task JSONs valid and saved to .task/ directory with cli_execution_id
+  - IMPL_PLAN.md created with complete TDD structure
+  - TODO_LIST.md generated matching task JSONs
+- CLI execution strategies assigned based on task dependencies
+- Return completion status with document count and task breakdown summary
+
+## OUTPUT SUMMARY
+Generate all three documents and report:
+- TDD task JSON files created: N files (IMPL-*.json) with cli_execution_id assigned
+- TDD cycles configured: N cycles with quantified test cases
+- CLI execution strategies: new/resume/fork/merge_fork assigned per dependency graph
+- Artifacts integrated: synthesis-spec/guidance-specification, relevant role analyses
+- Exploration context: critical_files, constraints, patterns, integration_points
+- Test context integrated: existing patterns and coverage
+- Conflict resolution: applied (if conflict_risk >= medium)
+- Session ready for TDD execution
+`
+});
+
+// Wait for agent completion; close_agent runs in finally so resources are always freed
+let result;
+try {
+  result = wait({
+    ids: [agentId],
+    timeout_ms: 600000  // 10 minutes
+  });
+
+  // Handle timeout: prompt the agent to finish, then wait once more
+  if (result.timed_out) {
+    console.warn("TDD task generation timed out, prompting completion...");
+    send_input({
+      id: agentId,
+      message: "Please finalize document generation and report completion status."
+    });
+    result = wait({ ids: [agentId], timeout_ms: 300000 });  // 5 more minutes
+  }
+} finally {
+  // Clean up agent resources (IMPORTANT: must always call)
+  close_agent({ id: agentId });
+}
+```
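The EXECUTION METHOD MAPPING section of the prompt could be expressed roughly as the function below. It is a sketch under assumptions: the `cycle` summary object (`testCases`, `files`, `hasIntegrationTests`) is hypothetical, any cycle that is not "simple" is treated as "complex", and in hybrid mode the CLI tool and resume flag are set on every task, which is a literal reading of the prompt.

```javascript
// `cycle` is a hypothetical summary: { testCases, files, hasIntegrationTests }.
function resolveExecutionConfig(userConfig, cycle) {
  switch (userConfig.executionMethod) {
    case "agent":
      return { method: "agent", cli_tool: null, enable_resume: false };
    case "cli":
      return { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true };
    case "hybrid": {
      // Simple cycles: <=5 test cases, <=3 files, no integration tests.
      const simple = cycle.testCases <= 5 && cycle.files <= 3 && !cycle.hasIntegrationTests;
      return {
        method: simple ? "agent" : "cli",
        cli_tool: userConfig.preferredCliTool,
        enable_resume: true,
      };
    }
    default:
      throw new Error(`Unknown execution method: ${userConfig.executionMethod}`);
  }
}
```

The result would be written to each task's `meta.execution_config`; no `command` field is added to the implementation steps.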
+
+### Agent Context Passing
+
+**Context Delegation Model**: The command provides paths and metadata; the agent loads context autonomously using the progressive loading strategy.
+
+**Command Provides** (in agent prompt):
+```javascript
+// Command assembles these simple values and paths for agent
+const commandProvides = {
+  // Session paths
+  session_metadata_path: ".workflow/active/WFS-{id}/workflow-session.json",
+  context_package_path: ".workflow/active/WFS-{id}/.process/context-package.json",
+  test_context_package_path: ".workflow/active/WFS-{id}/.process/test-context-package.json",
+  output_task_dir: ".workflow/active/WFS-{id}/.task/",
+  output_impl_plan: ".workflow/active/WFS-{id}/IMPL_PLAN.md",
+  output_todo_list: ".workflow/active/WFS-{id}/TODO_LIST.md",
+
+  // Simple metadata
+  session_id: "WFS-{id}",
+  workflow_type: "tdd",
+  mcp_capabilities: { exa_code: true, exa_web: true, code_index: true },
+
+  // User configuration from Phase 0
+  user_config: {
+    supplementaryMaterials: { type: "...", content: [...] },
+    executionMethod: "agent|hybrid|cli",
+    preferredCliTool: "codex|gemini|qwen|auto",
+    enableResume: true
+  }
+}
+```
+
+**Agent Loads Autonomously** (progressive loading):
+```javascript
+// Agent executes progressive loading based on memory state
+const agentLoads = {
+  // Core (ALWAYS load if not in memory)
+  session_metadata: loadIfNotInMemory(session_metadata_path),
+  context_package: loadIfNotInMemory(context_package_path),
+
+  // Selective (based on progressive strategy)
+  // Priority: synthesis_output > guidance + relevant_role_analyses
+  brainstorm_content: loadSelectiveBrainstormArtifacts(context_package),
+
+  // On-Demand (load if exists and relevant)
+  test_context: loadIfExists(test_context_package_path),
+  conflict_resolution: loadConflictResolution(context_package),
+
+  // Optional (if MCP available)
+  exploration_results: extractExplorationResults(context_package),
+  external_research: executeMcpResearch()  // If needed
+}
+```
+
+**Progressive Loading Implementation** (agent responsibility):
+1. **Check memory first** - skip if already loaded
+2. **Load core files** - session metadata + context-package.json
+3. **Smart selective loading** - synthesis_output OR (guidance + task-relevant role analyses)
+4. **On-demand loading** - test context, conflict resolution (if conflict_risk >= medium)
+5. **Extract references** - exploration results, artifact paths from context package
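Step 3 (smart selective loading) might look like the sketch below, which only decides which paths to read. The function name and the `relevantRoles` argument are hypothetical; the `brainstorm_artifacts` field names follow the Agent Context Package shown in Phase 1.

```javascript
// Returns the file paths to load, following the priority
// synthesis_output > guidance_specification + task-relevant role analyses.
// `relevantRoles` is a hypothetical input: role names judged relevant to the task.
function selectBrainstormPaths(contextPackage, relevantRoles) {
  const artifacts = contextPackage.brainstorm_artifacts ?? {};
  if (artifacts.synthesis_output?.exists) {
    return [artifacts.synthesis_output.path];
  }
  const paths = [];
  if (artifacts.guidance_specification?.exists) {
    paths.push(artifacts.guidance_specification.path);
  }
  for (const analysis of artifacts.role_analyses ?? []) {
    if (relevantRoles.includes(analysis.role)) {
      paths.push(...analysis.files.map((file) => file.path));
    }
  }
  return paths;
}
```

Keeping path selection separate from the actual reads makes it easy to skip anything already in conversation memory before loading.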
+
+## TDD Task Structure Reference
+
+This section provides a quick reference for the TDD task JSON structure. For complete implementation details, see the agent invocation prompt in Phase 2 above.
+
+**Quick Reference**:
+- Each TDD task contains complete Red-Green-Refactor cycle
+- Task ID format: `IMPL-N` (simple) or `IMPL-N.M` (complex subtasks)
+- Required metadata:
+  - `meta.tdd_workflow: true`
+  - `meta.max_iterations: 3`
+  - `meta.cli_execution_id: "{session_id}-{task_id}"`
+  - `meta.cli_execution: { "strategy": "new|resume|fork|merge_fork", ... }`
+- Context: `tdd_cycles` array with quantified test cases and coverage:
+  ```javascript
+  tdd_cycles: [
+    {
+      test_count: 5,                    // Number of test cases to write
+      test_cases: ["case1", "case2"],   // Enumerated test scenarios
+      implementation_scope: "...",      // Files and functions to implement
+      expected_coverage: ">=85%"        // Coverage target
+    }
+  ]
+  ```
+- Context: `focus_paths` use absolute or clear relative paths
+- Flow control: Exactly 3 steps with `tdd_phase` field ("red", "green", "refactor")
+- Flow control: `pre_analysis` includes exploration integration_points analysis
+- **meta.execution_config**: Set per `userConfig.executionMethod` (agent/cli/hybrid)
+- See Phase 2 agent prompt for full schema and requirements
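The `meta.cli_execution` strategy rules from the Phase 2 prompt (new, resume, fork, merge_fork) can be derived from the dependency graph. The sketch below assumes each task exposes `id` and a top-level `depends_on` array and that `sessionId` is passed in; where `depends_on` actually lives in the task JSON is not specified here, so treat that placement as an assumption.

```javascript
// `tasks` is an array of { id, depends_on } objects (assumed shape).
function assignCliExecution(tasks, sessionId) {
  const cliId = (taskId) => `${sessionId}-${taskId}`;

  // Count how many tasks depend on each parent.
  const childCount = new Map();
  for (const task of tasks) {
    for (const dep of task.depends_on ?? []) {
      childCount.set(dep, (childCount.get(dep) ?? 0) + 1);
    }
  }

  return tasks.map((task) => {
    const deps = task.depends_on ?? [];
    let cli_execution;
    if (deps.length === 0) {
      cli_execution = { strategy: "new" };
    } else if (deps.length === 1) {
      // Only child continues the parent's conversation; siblings branch from it.
      const strategy = childCount.get(deps[0]) === 1 ? "resume" : "fork";
      cli_execution = { strategy, resume_from: cliId(deps[0]) };
    } else {
      cli_execution = { strategy: "merge_fork", resume_from: deps.map(cliId) };
    }
    return { ...task, meta: { ...task.meta, cli_execution_id: cliId(task.id), cli_execution } };
  });
}
```

Note that `resume_from` is a string for `resume` and `fork` and an array for `merge_fork`, matching the type rule in the prompt.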
+
+## Output Files Structure
+```
+.workflow/active/{session-id}/
+├── IMPL_PLAN.md                     # Unified plan with TDD Implementation Tasks section
+├── TODO_LIST.md                     # Progress tracking with internal TDD phase indicators
+├── .task/
+│   ├── IMPL-1.json                  # Complete TDD task (Red-Green-Refactor internally)
+│   ├── IMPL-2.json                  # Complete TDD task
+│   ├── IMPL-3.json                  # Complex feature container (if needed)
+│   ├── IMPL-3.1.json                # Complex feature subtask (if needed)
+│   ├── IMPL-3.2.json                # Complex feature subtask (if needed)
+│   └── ...
+└── .process/
+    ├── conflict-resolution.json     # Conflict resolution results (if conflict_risk >= medium)
+    ├── test-context-package.json    # Test coverage analysis
+    ├── context-package.json         # Input from context-gather
+    ├── context_package_path         # Path to smart context package
+    └── green-fix-iteration-*.md     # Fix logs from Green phase test-fix cycles
+```
+
+**File Count**:
+- **Old approach**: 5 features = 15 task JSON files (TEST/IMPL/REFACTOR x 5)
+- **New approach**: 5 features = 5 task JSON files (IMPL-N x 5)
+- **Complex feature**: 1 feature = 1 container + M subtasks (IMPL-N + IMPL-N.M)
+
+## Validation Rules
+
+### Task Completeness
+- Every IMPL-N must contain complete TDD workflow in `flow_control.implementation_approach`
+- Each task must have 3 steps with `tdd_phase`: "red", "green", "refactor"
+- Every task must have `meta.tdd_workflow: true`
+
+### Dependency Enforcement
+- Sequential features: IMPL-N depends_on ["IMPL-(N-1)"] if needed
+- Complex feature subtasks: IMPL-N.M depends_on ["IMPL-N.(M-1)"] or parent dependencies
+- No circular dependencies allowed
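The "no circular dependencies" rule can be checked with a depth-first search over `depends_on`. This is a generic cycle check offered as a sketch, not the validator used by the workflow; tasks are simplified to `{ id, depends_on }` objects.

```javascript
// Returns true when the depends_on graph contains a cycle (depth-first search).
// Tasks are assumed to be { id, depends_on } objects, as a simplification.
function hasCircularDependency(tasks) {
  const deps = new Map(tasks.map((task) => [task.id, task.depends_on ?? []]));
  const state = new Map(); // id -> "visiting" | "done"

  const visit = (id) => {
    if (state.get(id) === "done") return false;
    if (state.get(id) === "visiting") return true; // back edge: cycle found
    state.set(id, "visiting");
    for (const dep of deps.get(id) ?? []) {
      if (visit(dep)) return true;
    }
    state.set(id, "done");
    return false;
  };

  return tasks.some((task) => visit(task.id));
}
```

Dependencies that point at unknown task IDs are treated as leaves here; a stricter validator might report them as errors instead.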
+
+### Task Limits
+- Maximum 18 total tasks (simple + subtasks) - hard limit for TDD workflows
+- Flat hierarchy (<=5 tasks) or two-level (6-18 tasks with containers)
+- Re-scope requirements if >18 tasks needed
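As a small illustration of these limits, the total task count maps to a hierarchy decision as follows. The function name and return labels are made up for this sketch.

```javascript
const MAX_TDD_TASKS = 18; // hard limit for TDD workflows

// taskCount includes simple tasks and subtasks.
function classifyTaskPlan(taskCount) {
  if (taskCount > MAX_TDD_TASKS) return "re-scope";
  return taskCount <= 5 ? "flat" : "two-level";
}
```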
+
+### TDD Workflow Validation
+- `meta.tdd_workflow` must be true
+- `flow_control.implementation_approach` must have exactly 3 steps
+- Each step must have `tdd_phase` field ("red", "green", or "refactor")
+- Green phase step must include test-fix cycle logic
+- `meta.max_iterations` must be present (default: 3)
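A minimal validator for the structural rules above might look like this. It is a sketch: it assumes the three steps appear in red, green, refactor order, and it does not attempt to verify that the Green step actually contains test-fix cycle logic, since that is a content check rather than a structural one.

```javascript
const TDD_PHASES = ["red", "green", "refactor"];

// Returns a list of human-readable violations; an empty list means the task passes.
function validateTddTask(task) {
  const errors = [];
  if (task.meta?.tdd_workflow !== true) errors.push("meta.tdd_workflow must be true");
  if (!Number.isInteger(task.meta?.max_iterations)) errors.push("meta.max_iterations must be present");

  const steps = task.flow_control?.implementation_approach ?? [];
  if (steps.length !== 3) {
    errors.push(`implementation_approach must have exactly 3 steps, found ${steps.length}`);
  }
  const phases = steps.map((step) => step.tdd_phase);
  if (phases.join(",") !== TDD_PHASES.join(",")) {
    errors.push("steps must carry tdd_phase red, green, refactor in that order");
  }
  return errors;
}
```

Collecting every violation instead of failing on the first one gives the planner a complete list to fix in a single pass.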
+
+## Error Handling
+
+### Input Validation Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Session not found | Invalid session ID | Verify session exists |
+| Context missing | Incomplete planning | Run context-gather first |
+
+### TDD Generation Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Task count exceeds 18 | Too many features or subtasks | Re-scope requirements or split features across multiple TDD sessions |
+| Missing test framework | No test config | Configure testing first |
+| Invalid TDD workflow | Missing tdd_phase or incomplete flow_control | Fix TDD structure in ANALYSIS_RESULTS.md |
+| Missing tdd_workflow flag | Task doesn't have meta.tdd_workflow: true | Add TDD workflow metadata |
+| Agent timeout | Large context or complex planning | Retry with send_input, or spawn new agent |
+
+## Integration
+
+**Called By**: SKILL.md (Phase 5: TDD Task Generation)
+**Invokes**: `action-planning-agent` via spawn_agent for autonomous task generation
+**Followed By**: Phase 6 (TDD Structure Validation in SKILL.md), then workflow:execute (external)
+
+**CLI Tool Selection**: Determined semantically from user's task description. Include "use Codex/Gemini/Qwen" in your request for CLI execution.
+
+**Output**:
+- TDD task JSON files in `.task/` directory (IMPL-N.json format)
+- IMPL_PLAN.md with TDD Implementation Tasks section
+- TODO_LIST.md with internal TDD phase indicators
+- Session state updated with task count and TDD metadata
+- MCP enhancements integrated (if available)
+
+## Test Coverage Analysis Integration
+
+The TDD workflow includes test coverage analysis (via phases/01-test-context-gather.md) to:
+- Detect existing test patterns and conventions
+- Identify current test coverage gaps
+- Discover test framework and configuration
+- Enable integration with existing tests
+
+This makes the TDD workflow context-aware rather than assuming a greenfield project.
+
+## Iterative Green Phase with Test-Fix Cycle
+
+IMPL (Green phase) tasks include an automatic test-fix cycle:
+
+**Process Flow**:
+1. **Initial Implementation**: Write minimal code to pass tests
+2. **Test Execution**: Run test suite
+3. **Success Path**: Tests pass → Complete task
+4. **Failure Path**: Tests fail → Enter iterative fix cycle:
+   - **Gemini Diagnosis**: Analyze failures with bug-fix template
+   - **Fix Application**: Agent executes fixes directly
+   - **Retest**: Verify fix resolves failures
+   - **Repeat**: Up to max_iterations (default: 3)
+5. **Safety Net**: Auto-revert all changes if max iterations reached
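
The process flow above amounts to a bounded retry loop with a revert at the end. Below is a minimal sketch of that control flow only; the callables `run_tests`, `apply_fix`, and `revert_all` are hypothetical stand-ins for the test runner, the diagnosis-plus-fix step, and the auto-revert, none of which are defined in this document.

```python
from typing import Callable


def run_green_phase(
    run_tests: Callable[[], bool],
    apply_fix: Callable[[int], None],
    revert_all: Callable[[], None],
    max_iterations: int = 3,
) -> str:
    """Run tests, retry with fixes up to max_iterations, revert if still failing."""
    if run_tests():  # success path: initial implementation already passes
        return "completed"
    for iteration in range(1, max_iterations + 1):
        apply_fix(iteration)  # diagnose failures and apply a fix
        if run_tests():  # retest after the fix
            return "completed"
    revert_all()  # safety net: max iterations reached
    return "reverted"
```

Each iteration corresponds to one fix attempt, which is presumably also the granularity at which a `green-fix-iteration-*.md` log would be written.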
+
+## Configuration Options
+- **meta.max_iterations**: Number of fix attempts in Green phase (default: 3)
+- **meta.execution_config.method**: Execution routing (agent/cli) determined from userConfig.executionMethod
+
+---
+
+## Post-Phase Update
+
+After Phase 2 (TDD Task Generation) completes:
+- **Output Created**: IMPL_PLAN.md, TODO_LIST.md, IMPL-*.json task files in `.task/` directory
+- **TDD Structure**: Each task contains complete Red-Green-Refactor cycle internally
+- **CLI Execution IDs**: All tasks assigned unique cli_execution_id for resume support
+- **Next Action**: Phase 6 (TDD Structure Validation) in SKILL.md
+- **TodoWrite**: Collapse Phase 5 sub-tasks to "Phase 5: TDD Task Generation: completed"
--- a/.codex/skills/workflow-tdd-plan/phases/03-tdd-verify.md
+++ b/.codex/skills/workflow-tdd-plan/phases/03-tdd-verify.md
@@ -0,0 +1,575 @@
+# Phase 3: TDD Verify
+
+## Goal
+
+Verify TDD workflow execution quality by validating Red-Green-Refactor cycle compliance, test coverage completeness, and task chain structure integrity. This phase orchestrates multiple analysis steps and generates a comprehensive compliance report with a quality gate recommendation.
+
+**Output**: A structured Markdown report saved to `.workflow/active/WFS-{session}/TDD_COMPLIANCE_REPORT.md` containing:
+- Executive summary with compliance score and quality gate recommendation
+- Task chain validation (TEST → IMPL → REFACTOR structure)
+- Test coverage metrics (line, branch, function)
+- Red-Green-Refactor cycle verification
+- Best practices adherence assessment
+- Actionable improvement recommendations
+
+## Operating Constraints
+
+**ORCHESTRATOR MODE**:
+- This phase coordinates coverage analysis (`phases/04-tdd-coverage-analysis.md`) and internal validation
+- MAY write output files: TDD_COMPLIANCE_REPORT.md (primary report), .process/*.json (intermediate artifacts)
+- MUST NOT modify source task files or implementation code
+- MUST NOT create or delete tasks in the workflow
+
+**Quality Gate Authority**: The compliance report provides a binding recommendation (BLOCK_MERGE / REQUIRE_FIXES / PROCEED_WITH_CAVEATS / APPROVED) based on objective compliance criteria.
+
+## Core Responsibilities
+- Verify TDD task chain structure (TEST → IMPL → REFACTOR)
+- Analyze test coverage metrics
+- Validate TDD cycle execution quality
+- Generate compliance report with quality gate recommendation
+
+## Execution Process
+
+```
+Input Parsing:
+   └─ Decision (session argument):
+      ├─ --session provided → Use provided session
+      └─ No session → Auto-detect active session
+
+Phase 1: Session Discovery & Validation
+   ├─ Detect or validate session directory
+   ├─ Check required artifacts exist (.task/*.json, .summaries/*)
+   └─ ERROR if invalid or incomplete
+
+Phase 2: Task Chain Structure Validation
+   ├─ Load all task JSONs from .task/
+   ├─ Validate TDD structure: TEST-N.M → IMPL-N.M → REFACTOR-N.M
+   ├─ Verify dependencies (depends_on)
+   ├─ Validate meta fields (tdd_phase, agent)
+   └─ Extract chain validation data
+
+Phase 3: Coverage & Cycle Analysis
+   ├─ Read and execute: phases/04-tdd-coverage-analysis.md
+   ├─ Parse: test-results.json, coverage-report.json, tdd-cycle-report.md
+   └─ Extract coverage metrics and TDD cycle verification
+
+Phase 4: Compliance Report Generation
+   ├─ Aggregate findings from Phases 1-3
+   ├─ Calculate compliance score (0-100)
+   ├─ Determine quality gate recommendation
+   ├─ Generate TDD_COMPLIANCE_REPORT.md
+   └─ Display summary to user
+```
+
+## 4-Phase Execution
+
+### Phase 1: Session Discovery & Validation
+
+**Step 1.1: Detect Session**
+```bash
+IF --session parameter provided:
+    session_id = provided session
+ELSE:
+    # Auto-detect active session
+    active_sessions = bash(find .workflow/active/ -maxdepth 1 -name "WFS-*" -type d 2>/dev/null)
+    IF active_sessions is empty:
+        ERROR: "No active workflow session found. Use --session <session-id>"
+        EXIT
+    ELSE IF active_sessions has multiple entries:
+        # Use most recently modified session
+        session_id = bash(ls -td .workflow/active/WFS-*/ 2>/dev/null | head -1 | xargs basename)
+    ELSE:
+        session_id = basename(active_sessions[0])
+
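+# Derive paths (auto-detected names already start with "WFS-"; add the prefix only if it is missing)
+session_name = session_id IF session_id starts with "WFS-" ELSE "WFS-" + session_id
+session_dir = .workflow/active/{session_name}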
+task_dir = session_dir/.task
+summaries_dir = session_dir/.summaries
+process_dir = session_dir/.process
+```
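
The detection logic in Step 1.1 could look roughly like the following in Python. This is an illustrative sketch, not the skill's implementation: `detect_session` is a hypothetical name, and accepting `--session` values both with and without the `WFS-` prefix is an assumption.

```python
from pathlib import Path
from typing import Optional


def detect_session(active_root: Path, requested: Optional[str] = None) -> str:
    """Return the session directory name (e.g. 'WFS-auth'), preferring --session."""
    if requested:
        # Assumption: tolerate IDs given with or without the WFS- prefix.
        name = requested if requested.startswith("WFS-") else f"WFS-{requested}"
        if not (active_root / name).is_dir():
            raise FileNotFoundError(f"Session not found: {name}")
        return name

    sessions = [p for p in active_root.glob("WFS-*") if p.is_dir()]
    if not sessions:
        raise FileNotFoundError(
            "No active workflow session found. Use --session <session-id>"
        )
    # With several candidates, pick the most recently modified one.
    return max(sessions, key=lambda p: p.stat().st_mtime).name
```

The remaining paths then follow by joining `.task`, `.summaries`, and `.process` onto `active_root / name`.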
+
+**Step 1.2: Validate Required Artifacts**
+```bash
+# Check task files exist
+task_files = Glob(task_dir/*.json)
+IF task_files.count == 0:
+    ERROR: "No task JSON files found. Run TDD planning (SKILL.md) first"
+    EXIT
+
+# Check summaries exist (optional but recommended for full analysis)
+summaries_exist = EXISTS(summaries_dir)
+IF NOT summaries_exist:
+    WARNING: "No .summaries/ directory found. Some analysis may be limited."
+```
+
+**Output**: session_id, session_dir, task_files list
+
+---
+
+### Phase 2: Task Chain Structure Validation
+
+**Step 2.1: Load and Parse Task JSONs**
+```bash
+# Single-pass JSON extraction using jq
+validation_data = bash("""
+# Load all tasks and extract structured data
+cd '{session_dir}/.task'
+
+# Extract all task IDs
+task_ids=$(jq -r '.id' *.json 2>/dev/null | sort)
+
+# Extract dependencies for IMPL tasks
+impl_deps=$(jq -r 'select(.id | startswith("IMPL")) | .id + ":" + (.context.depends_on[]? // "none")' *.json 2>/dev/null)
+
+# Extract dependencies for REFACTOR tasks
+refactor_deps=$(jq -r 'select(.id | startswith("REFACTOR")) | .id + ":" + (.context.depends_on[]? // "none")' *.json 2>/dev/null)
+
+# Extract meta fields
+meta_tdd=$(jq -r '.id + ":" + (.meta.tdd_phase // "missing")' *.json 2>/dev/null)
+meta_agent=$(jq -r '.id + ":" + (.meta.agent // "missing")' *.json 2>/dev/null)
+
+# Output as JSON
+jq -n --arg ids "$task_ids" \
+       --arg impl "$impl_deps" \
+       --arg refactor "$refactor_deps" \
+       --arg tdd "$meta_tdd" \
+       --arg agent "$meta_agent" \
+       '{ids: $ids, impl_deps: $impl, refactor_deps: $refactor, tdd: $tdd, agent: $agent}'
+""")
+```
+
+**Step 2.2: Validate TDD Chain Structure**
+```
+Parse validation_data JSON and validate:
+
+For each feature N (extracted from task IDs):
+   1. TEST-N.M exists?
+   2. IMPL-N.M exists?
+   3. REFACTOR-N.M exists? (optional but recommended)
+   4. IMPL-N.M.context.depends_on contains TEST-N.M?
+   5. REFACTOR-N.M.context.depends_on contains IMPL-N.M?
+   6. TEST-N.M.meta.tdd_phase == "red"?
+   7. TEST-N.M.meta.agent == "@code-review-test-agent"?
+   8. IMPL-N.M.meta.tdd_phase == "green"?
+   9. IMPL-N.M.meta.agent == "@code-developer"?
+   10. REFACTOR-N.M.meta.tdd_phase == "refactor"?
+
+Calculate:
+- chain_completeness_score = (complete_chains / total_chains) * 100
+- dependency_accuracy = (correct_deps / total_deps) * 100
+- meta_field_accuracy = (correct_meta / total_meta) * 100
+```
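
As a sketch of how `chain_completeness_score` might be computed, the snippet below groups task IDs by their `N.M` suffix and counts chains that have both TEST and IMPL with the expected dependency. Treating REFACTOR as optional follows the checklist above; the exact definition of a "complete" chain and the function name `chain_completeness` are assumptions.

```python
import re
from collections import defaultdict

TASK_ID = re.compile(r"^(TEST|IMPL|REFACTOR)-(\d+\.\d+)$")


def chain_completeness(tasks: list[dict]) -> float:
    """Percentage of chains with TEST and IMPL present and IMPL depending on TEST."""
    chains: dict[str, dict[str, dict]] = defaultdict(dict)
    for task in tasks:
        match = TASK_ID.match(task["id"])
        if match:
            kind, key = match.groups()
            chains[key][kind] = task
    if not chains:
        return 0.0

    complete = 0
    for key, chain in chains.items():
        impl = chain.get("IMPL")
        if "TEST" in chain and impl is not None:
            deps = impl.get("context", {}).get("depends_on", [])
            if f"TEST-{key}" in deps:
                complete += 1
    return complete / len(chains) * 100
```

`dependency_accuracy` and `meta_field_accuracy` would follow the same pattern: count the checks that pass and divide by the number of checks attempted.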
+
+**Output**: chain_validation_report (JSON structure with validation results)
+
+---
+
+### Phase 3: Coverage & Cycle Analysis
+
+**Step 3.1: Call Coverage Analysis Phase**
+
+Read and execute the coverage analysis phase:
+- **Phase file**: `phases/04-tdd-coverage-analysis.md`
+- **Args**: `--session {session_id}`
+
+**Step 3.2: Parse Output Files**
+```bash
+# Check required outputs exist
+IF NOT EXISTS(process_dir/test-results.json):
+    WARNING: "test-results.json not found. Coverage analysis incomplete."
+    coverage_data = null
+ELSE:
+    coverage_data = Read(process_dir/test-results.json)
+
+IF NOT EXISTS(process_dir/coverage-report.json):
+    WARNING: "coverage-report.json not found. Coverage metrics incomplete."
+    metrics = null
+ELSE:
+    metrics = Read(process_dir/coverage-report.json)
+
+IF NOT EXISTS(process_dir/tdd-cycle-report.md):
+    WARNING: "tdd-cycle-report.md not found. Cycle validation incomplete."
+    cycle_data = null
+ELSE:
+    cycle_data = Read(process_dir/tdd-cycle-report.md)
+```
+
+**Step 3.3: Extract Coverage Metrics**
+```
+If coverage_data exists:
+   - line_coverage_percent
+   - branch_coverage_percent
+   - function_coverage_percent
+   - uncovered_files (list)
+   - uncovered_lines (map: file -> line ranges)
+
+If cycle_data exists:
+   - red_phase_compliance (tests failed initially?)
+   - green_phase_compliance (tests pass after impl?)
+   - refactor_phase_compliance (tests stay green during refactor?)
+   - minimal_implementation_score (was impl minimal?)
+```
+
+**Output**: coverage_analysis, cycle_analysis
+
+---
+
+### Phase 4: Compliance Report Generation
+
+**Step 4.1: Calculate Compliance Score**
+```
+Base Score: 100 points
+
+Deductions:
+Chain Structure:
+   - Missing TEST task: -30 points per feature
+   - Missing IMPL task: -30 points per feature
+   - Missing REFACTOR task: -10 points per feature
+   - Wrong dependency: -15 points per error
+   - Wrong agent: -5 points per error
+   - Wrong tdd_phase: -5 points per error
+
+TDD Cycle Compliance:
+   - Test didn't fail initially: -10 points per feature
+   - Tests didn't pass after IMPL: -20 points per feature
+   - Tests broke during REFACTOR: -15 points per feature
+   - Over-engineered IMPL: -10 points per feature
+
+Coverage Quality:
+   - Line coverage < 80%: -5 points
+   - Branch coverage < 70%: -5 points
+   - Function coverage < 80%: -5 points
+   - Critical paths uncovered: -10 points
+
+Final Score: Max(0, Base Score - Total Deductions)
+```
+
+**Step 4.2: Determine Quality Gate**
+```
+IF score >= 90 AND critical_violations == 0:
+    recommendation = "APPROVED"
+ELSE IF score >= 70 AND critical_violations == 0:
+    recommendation = "PROCEED_WITH_CAVEATS"
+ELSE IF score >= 50:
+    recommendation = "REQUIRE_FIXES"
+ELSE:
+    recommendation = "BLOCK_MERGE"
+```
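
Steps 4.1 and 4.2 combine into a short calculation. The sketch below uses the point values and thresholds stated above; the violation key names (for example `missing_refactor`) and both function names are hypothetical labels chosen for this example.

```python
# Points deducted per occurrence, taken from the deduction list in Step 4.1.
DEDUCTIONS = {
    "missing_test": 30, "missing_impl": 30, "missing_refactor": 10,
    "wrong_dependency": 15, "wrong_agent": 5, "wrong_tdd_phase": 5,
    "test_did_not_fail": 10, "tests_not_passing": 20,
    "tests_broke_in_refactor": 15, "over_engineered": 10,
    "low_line_coverage": 5, "low_branch_coverage": 5,
    "low_function_coverage": 5, "critical_path_uncovered": 10,
}


def compliance_score(violations: dict[str, int]) -> int:
    """Start at 100, subtract points per violation count, and floor at 0."""
    total = sum(DEDUCTIONS[name] * count for name, count in violations.items())
    return max(0, 100 - total)


def quality_gate(score: int, critical_violations: int) -> str:
    """Map a score and critical-violation count to a gate recommendation."""
    if score >= 90 and critical_violations == 0:
        return "APPROVED"
    if score >= 70 and critical_violations == 0:
        return "PROCEED_WITH_CAVEATS"
    if score >= 50:
        return "REQUIRE_FIXES"
    return "BLOCK_MERGE"
```

For instance, one missing REFACTOR task plus one over-engineered implementation yields a score of 80, which maps to PROCEED_WITH_CAVEATS because neither is a critical violation.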
+
+**Step 4.3: Generate Report**
+```bash
+report_content = Generate markdown report (see structure below)
+report_path = "{session_dir}/TDD_COMPLIANCE_REPORT.md"
+Write(report_path, report_content)
+```
+
+**Step 4.4: Display Summary to User**
+```bash
+echo "=== TDD Verification Complete ==="
+echo "Session: {session_id}"
+echo "Report: {report_path}"
+echo ""
+echo "Quality Gate: {recommendation}"
+echo "Compliance Score: {score}/100"
+echo ""
+echo "Chain Validation: {chain_completeness_score}%"
+echo "Line Coverage: {line_coverage}%"
+echo "Branch Coverage: {branch_coverage}%"
+echo ""
+echo "Next: Review full report for detailed findings"
+```
+
+## TodoWrite Pattern (Optional)
+
+**Note**: Because this is an orchestrator phase, TodoWrite tracking is optional and mainly useful for long-running verification. In most cases the 4-phase execution is fast enough that progress tracking adds noise without value.
+
+```javascript
+// Only use TodoWrite for complex multi-session verification
+// Skip for single-session verification
+```
+
+## Validation Logic
+
+### Chain Validation Algorithm
+```
+1. Load all task JSONs from .workflow/active/{sessionId}/.task/
+2. Extract task IDs and group by feature number
+3. For each feature:
+   - Check TEST-N.M exists
+   - Check IMPL-N.M exists
+   - Check REFACTOR-N.M exists (optional but recommended)
+   - Verify IMPL-N.M depends_on TEST-N.M
+   - Verify REFACTOR-N.M depends_on IMPL-N.M
+   - Verify meta.tdd_phase values
+   - Verify meta.agent assignments
+4. Calculate chain completeness score
+5. Report incomplete or invalid chains
+```
+
+### Quality Gate Criteria
+
+| Recommendation | Score Range | Critical Violations | Action |
+|----------------|-------------|---------------------|--------|
+| **APPROVED** | ≥90 | 0 | Safe to merge |
+| **PROCEED_WITH_CAVEATS** | ≥70 | 0 | Can proceed, address minor issues |
+| **REQUIRE_FIXES** | ≥50 | Any | Must fix before merge |
+| **BLOCK_MERGE** | <50 | Any | Block merge until resolved |
+
+**Critical Violations**:
+- Missing TEST or IMPL task for any feature
+- Tests didn't fail initially (Red phase violation)
+- Tests didn't pass after IMPL (Green phase violation)
+- Tests broke during REFACTOR (Refactor phase violation)
+
+## Output Files
+```
+.workflow/active/WFS-{session-id}/
+├── TDD_COMPLIANCE_REPORT.md     # Comprehensive compliance report
+└── .process/
+    ├── test-results.json         # From phases/04-tdd-coverage-analysis.md
+    ├── coverage-report.json      # From phases/04-tdd-coverage-analysis.md
+    └── tdd-cycle-report.md       # From phases/04-tdd-coverage-analysis.md
+```
+
+## Error Handling
+
+### Session Discovery Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| No active session | No WFS-* directories | Provide --session explicitly |
+| Multiple active sessions | Multiple WFS-* directories | Most recently modified session is used; provide --session to override |
+| Session not found | Invalid session-id | Check available sessions |
+
+### Validation Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Task files missing | Incomplete planning | Run TDD planning (SKILL.md) first |
+| Invalid JSON | Corrupted task files | Regenerate tasks |
+| Missing summaries | Tasks not executed | Execute tasks before verify |
+
+### Analysis Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Coverage tool missing | No test framework | Configure testing first |
+| Tests fail to run | Code errors | Fix errors before verify |
+| Coverage analysis fails | phases/04-tdd-coverage-analysis.md error | Check analysis output |
+
+## Integration
+
+### Phase Chain
+- **Called After**: Task execution completes (all TDD tasks done)
+- **Calls**: `phases/04-tdd-coverage-analysis.md`
+- **Related Skills**: SKILL.md (orchestrator), `workflow-plan-execute/` (session management)
+
+### When to Use
+- After completing all TDD tasks in a workflow
+- Before merging TDD workflow branch
+- For TDD process quality assessment
+- To identify missing TDD steps
+
+## TDD Compliance Report Structure
+
+```markdown
+# TDD Compliance Report - {Session ID}
+
+**Generated**: {timestamp}
+**Session**: WFS-{sessionId}
+**Workflow Type**: TDD
+
+---
+
+## Executive Summary
+
+### Quality Gate Decision
+
+| Metric | Value | Status |
+|--------|-------|--------|
+| Compliance Score | {score}/100 | {status_emoji} |
+| Chain Completeness | {percentage}% | {status} |
+| Line Coverage | {percentage}% | {status} |
+| Branch Coverage | {percentage}% | {status} |
+| Function Coverage | {percentage}% | {status} |
+
+### Recommendation
+
+**{RECOMMENDATION}**
+
+**Decision Rationale**:
+{brief explanation based on score and violations}
+
+**Quality Gate Criteria**:
+- **APPROVED**: Score ≥90, no critical violations
+- **PROCEED_WITH_CAVEATS**: Score ≥70, no critical violations
+- **REQUIRE_FIXES**: Score ≥50 that does not qualify for a higher gate (score <70 or critical violations exist)
+- **BLOCK_MERGE**: Score <50
+
+---
+
+## Chain Analysis
+
+### Feature 1: {Feature Name}
+**Status**: Complete
+**Chain**: TEST-1.1 → IMPL-1.1 → REFACTOR-1.1
+
+| Phase | Task | Status | Details |
+|-------|------|--------|---------|
+| Red | TEST-1.1 | Pass | Test created and failed with clear message |
+| Green | IMPL-1.1 | Pass | Minimal implementation made test pass |
+| Refactor | REFACTOR-1.1 | Pass | Code improved, tests remained green |
+
+### Feature 2: {Feature Name}
+**Status**: Incomplete
+**Chain**: TEST-2.1 → IMPL-2.1 (Missing REFACTOR-2.1)
+
+| Phase | Task | Status | Details |
+|-------|------|--------|---------|
+| Red | TEST-2.1 | Pass | Test created and failed |
+| Green | IMPL-2.1 | Warning | Implementation seems over-engineered |
+| Refactor | REFACTOR-2.1 | Missing | Task not completed |
+
+**Issues**:
+- REFACTOR-2.1 task not completed (-10 points)
+- IMPL-2.1 implementation exceeded minimal scope (-10 points)
+
+### Chain Validation Summary
+
+| Metric | Value |
+|--------|-------|
+| Total Features | {count} |
+| Complete Chains | {count} ({percent}%) |
+| Incomplete Chains | {count} |
+| Missing TEST | {count} |
+| Missing IMPL | {count} |
+| Missing REFACTOR | {count} |
+| Dependency Errors | {count} |
+| Meta Field Errors | {count} |
+
+---
+
+## Test Coverage Analysis
+
+### Coverage Metrics
+
+| Metric | Coverage | Target | Status |
+|--------|----------|--------|--------|
+| Line Coverage | {percentage}% | ≥80% | {status} |
+| Branch Coverage | {percentage}% | ≥70% | {status} |
+| Function Coverage | {percentage}% | ≥80% | {status} |
+
+### Coverage Gaps
+
+| File | Lines | Issue | Priority |
+|------|-------|-------|----------|
+| src/auth/service.ts | 45-52 | Uncovered error handling | HIGH |
+| src/utils/parser.ts | 78-85 | Uncovered edge case | MEDIUM |
+
+---
+
+## TDD Cycle Validation
+
+### Red Phase (Write Failing Test)
+- {N}/{total} features had failing tests initially ({percent}%)
+- Compliant features: {list}
+- Non-compliant features: {list}
+
+**Violations**:
+- Feature 3: No evidence of initial test failure (-10 points)
+
+### Green Phase (Make Test Pass)
+- {N}/{total} implementations made tests pass ({percent}%)
+- Compliant features: {list}
+- Non-compliant features: {list}
+
+**Violations**:
+- Feature 2: Implementation over-engineered (-10 points)
+
+### Refactor Phase (Improve Quality)
+- {N}/{total} features completed refactoring ({percent}%)
+- Compliant features: {list}
+- Non-compliant features: {list}
+
+**Violations**:
+- Features 2, 4: Refactoring step skipped (-20 points total)
+
+---
+
+## Best Practices Assessment
+
+### Strengths
+- Clear test descriptions
+- Good test coverage
+- Consistent naming conventions
+- Well-structured code
+
+### Areas for Improvement
+- Some implementations over-engineered in Green phase
+- Missing refactoring steps
+- Test failure messages could be more descriptive
+
+---
+
+## Detailed Findings by Severity
+
+### Critical Issues ({count})
+{List of critical issues with impact and remediation}
+
+### High Priority Issues ({count})
+{List of high priority issues with impact and remediation}
+
+### Medium Priority Issues ({count})
+{List of medium priority issues with impact and remediation}
+
+### Low Priority Issues ({count})
+{List of low priority issues with impact and remediation}
+
+---
+
+## Recommendations
+
+### Required Fixes (Before Merge)
+1. Complete missing REFACTOR tasks (Features 2, 4)
+2. Verify initial test failures for Feature 3
+3. Fix tests that broke during refactoring
+
+### Recommended Improvements
+1. Simplify over-engineered implementations
+2. Add edge case tests for Features 1, 3
+3. Improve test failure message clarity
+4. Increase branch coverage to >85%
+
+### Optional Enhancements
+1. Add more descriptive test names
+2. Consider parameterized tests for similar scenarios
+3. Document TDD process learnings
+
+---
+
+## Metrics Summary
+
+| Metric | Value |
+|--------|-------|
+| Total Features | {count} |
+| Complete Chains | {count} ({percent}%) |
+| Compliance Score | {score}/100 |
+| Critical Issues | {count} |
+| High Issues | {count} |
+| Medium Issues | {count} |
+| Low Issues | {count} |
+| Line Coverage | {percent}% |
+| Branch Coverage | {percent}% |
+| Function Coverage | {percent}% |
+
+---
+
+**Report End**
+```
+
+---
+
+## Post-Phase Update
+
+After TDD Verify completes:
+- **Output Created**: `TDD_COMPLIANCE_REPORT.md` in session directory
+- **Data Produced**: Compliance score, quality gate recommendation, chain validation, coverage metrics
+- **Next Action**: Based on quality gate - APPROVED (merge), PROCEED_WITH_CAVEATS (proceed, address minor issues), REQUIRE_FIXES (iterate), BLOCK_MERGE (rework)
+- **TodoWrite**: Mark "TDD Verify: completed" with quality gate result
--- a/.codex/skills/workflow-tdd-plan/phases/04-tdd-coverage-analysis.md
+++ b/.codex/skills/workflow-tdd-plan/phases/04-tdd-coverage-analysis.md
@@ -0,0 +1,287 @@
+# Phase 4: TDD Coverage Analysis
+
+## Overview
+Analyze test coverage and verify Red-Green-Refactor cycle execution for TDD workflow validation.
+
+## Core Responsibilities
+- Extract test files from TEST tasks
+- Run test suite with coverage
+- Parse coverage metrics
+- Verify TDD cycle execution (Red -> Green -> Refactor)
+- Generate coverage and cycle reports
+
+## Execution Process
+
+```
+Input Parsing:
+   ├─ Parse flags: --session
+   └─ Validation: session_id REQUIRED
+
+Phase 1: Extract Test Tasks
+   └─ Find TEST-*.json files and extract focus_paths
+
+Phase 2: Run Test Suite
+   └─ Decision (test framework):
+      ├─ Node.js → npm test -- --coverage --json
+      ├─ Python → pytest --cov --json-report
+      └─ Other → [test_command] --coverage --json
+
+Phase 3: Parse Coverage Data
+   ├─ Extract line coverage percentage
+   ├─ Extract branch coverage percentage
+   ├─ Extract function coverage percentage
+   └─ Identify uncovered lines/branches
+
+Phase 4: Verify TDD Cycle
+   └─ FOR each TDD chain (TEST-N.M → IMPL-N.M → REFACTOR-N.M):
+      ├─ Red Phase: Verify tests created and failed initially
+      ├─ Green Phase: Verify tests now pass
+      └─ Refactor Phase: Verify code quality improved
+
+Phase 5: Generate Analysis Report
+   └─ Create tdd-cycle-report.md with coverage metrics and cycle verification
+```
+
+## Execution Lifecycle
+
+### Phase 1: Extract Test Tasks
+```bash
+find .workflow/active/{session_id}/.task/ -name 'TEST-*.json' -exec jq -r '.context.focus_paths[]' {} \;
+```
+
+**Output**: List of test directories/files from all TEST tasks
+
+### Phase 2: Run Test Suite
+```bash
+# Node.js/JavaScript
+npm test -- --coverage --json > .workflow/active/{session_id}/.process/test-results.json
+
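+# Python (requires pytest-cov and pytest-json-report; the JSON report is written to a file, not stdout)
+pytest --cov --json-report --json-report-file=.workflow/active/{session_id}/.process/test-results.json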
+
+# Other frameworks (detect from project)
+[test_command] --coverage --json-output .workflow/active/{session_id}/.process/test-results.json
+```
+
+**Output**: test-results.json with coverage data
+
+### Phase 3: Parse Coverage Data
+```bash
+jq '.coverage' .workflow/active/{session_id}/.process/test-results.json > .workflow/active/{session_id}/.process/coverage-report.json
+```
+
+**Extract**:
+- Line coverage percentage
+- Branch coverage percentage
+- Function coverage percentage
+- Uncovered lines/branches
+
+### Phase 4: Verify TDD Cycle
+
+For each TDD chain (TEST-N.M -> IMPL-N.M -> REFACTOR-N.M):
+
+**1. Red Phase Verification**
+```bash
+# Check TEST task summary
+cat .workflow/active/{session_id}/.summaries/TEST-N.M-summary.md
+```
+
+Verify:
+- Tests were created
+- Tests failed initially
+- Failure messages were clear
+
+**2. Green Phase Verification**
+```bash
+# Check IMPL task summary
+cat .workflow/active/{session_id}/.summaries/IMPL-N.M-summary.md
+```
+
+Verify:
+- Implementation was completed
+- Tests now pass
+- Implementation was minimal
+
+**3. Refactor Phase Verification**
+```bash
+# Check REFACTOR task summary
+cat .workflow/active/{session_id}/.summaries/REFACTOR-N.M-summary.md
+```
+
+Verify:
+- Refactoring was completed
+- Tests still pass
+- Code quality improved
+
+### Phase 5: Generate Analysis Report
+
+Create `.workflow/active/{session_id}/.process/tdd-cycle-report.md`:
+
+```markdown
+# TDD Cycle Analysis - {Session ID}
+
+## Coverage Metrics
+- **Line Coverage**: {percentage}%
+- **Branch Coverage**: {percentage}%
+- **Function Coverage**: {percentage}%
+
+## Coverage Details
+### Covered
+- {covered_lines} lines
+- {covered_branches} branches
+- {covered_functions} functions
+
+### Uncovered
+- Lines: {uncovered_line_numbers}
+- Branches: {uncovered_branch_locations}
+
+## TDD Cycle Verification
+
+### Feature 1: {Feature Name}
+**Chain**: TEST-1.1 -> IMPL-1.1 -> REFACTOR-1.1
+
+- [PASS] **Red Phase**: Tests created and failed initially
+- [PASS] **Green Phase**: Implementation made tests pass
+- [PASS] **Refactor Phase**: Refactoring maintained green tests
+
+### Feature 2: {Feature Name}
+**Chain**: TEST-2.1 -> IMPL-2.1 -> REFACTOR-2.1
+
+- [PASS] **Red Phase**: Tests created and failed initially
+- [WARN] **Green Phase**: Tests pass but implementation seems over-engineered
+- [PASS] **Refactor Phase**: Refactoring maintained green tests
+
+[Repeat for all features]
+
+## TDD Compliance Summary
+- **Total Chains**: {N}
+- **Complete Cycles**: {N}
+- **Incomplete Cycles**: {N}
+- **Compliance Score**: {score}/100
+
+## Gaps Identified
+- Feature 3: Missing initial test failure verification
+- Feature 5: No refactoring step completed
+
+## Recommendations
+- Complete missing refactoring steps
+- Add edge case tests for Feature 2
+- Verify test failure messages are descriptive
+```
+
+## Output Files
+```
+.workflow/active/{session-id}/
+└── .process/
+    ├── test-results.json         # Raw test execution results
+    ├── coverage-report.json      # Parsed coverage data
+    └── tdd-cycle-report.md       # TDD cycle analysis
+```
+
+## Test Framework Detection
+
+Auto-detect test framework from project:
+
+```bash
+# Check for test frameworks
+if [ -f "package.json" ] && grep -q "jest\|mocha\|vitest" package.json; then
+    TEST_CMD="npm test -- --coverage --json"
+elif [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
+    TEST_CMD="pytest --cov --json-report"
+elif [ -f "Cargo.toml" ]; then
+    TEST_CMD="cargo test -- --test-threads=1 --nocapture"
+elif [ -f "go.mod" ]; then
+    TEST_CMD="go test -coverprofile=coverage.out -json ./..."
+else
+    TEST_CMD="echo 'No supported test framework found'"
+fi
+```
+
+## TDD Cycle Verification Algorithm
+
+```
+For each feature N:
+  1. Load TEST-N.M-summary.md
+     IF summary missing:
+       Mark: "Red phase incomplete"
+       SKIP to next feature
+
+     CHECK: Contains "test" AND "fail"
+     IF NOT found:
+       Mark: "Red phase verification failed"
+     ELSE:
+       Mark: "Red phase [PASS]"
+
+  2. Load IMPL-N.M-summary.md
+     IF summary missing:
+       Mark: "Green phase incomplete"
+       SKIP to next feature
+
+     CHECK: Contains "pass" OR "green"
+     IF NOT found:
+       Mark: "Green phase verification failed"
+     ELSE:
+       Mark: "Green phase [PASS]"
+
+  3. Load REFACTOR-N.M-summary.md
+     IF summary missing:
+       Mark: "Refactor phase incomplete"
+       CONTINUE (refactor is optional)
+
+     CHECK: Contains "refactor" AND "pass"
+     IF NOT found:
+       Mark: "Refactor phase verification failed"
+     ELSE:
+       Mark: "Refactor phase [PASS]"
+
+  4. Calculate chain score:
+     - Red + Green + Refactor all [PASS] = 100%
+     - Red + Green [PASS], Refactor missing = 80%
+     - Red [PASS], Green missing = 40%
+     - All missing = 0%
+```
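
The keyword checks and chain scoring above can be sketched as two small functions. The four score values come from the algorithm; how to score combinations it does not list (for example a refactor summary that exists but fails its check) is an assumption here, and both function names are hypothetical.

```python
from typing import Optional


def phase_passed(
    summary: Optional[str], keywords: list[str], any_of: bool = False
) -> Optional[bool]:
    """Return None if the summary is missing, else whether the keywords appear."""
    if summary is None:
        return None
    text = summary.lower()
    hits = [word in text for word in keywords]
    return any(hits) if any_of else all(hits)


def chain_score(
    red: Optional[bool], green: Optional[bool], refactor: Optional[bool]
) -> int:
    """Score one chain from the three phase results (None means missing)."""
    if red and green and refactor:
        return 100
    if red and green:
        return 80  # refactor missing (assumption: also used when its check fails)
    if red:
        return 40  # green missing (assumption: also used when its check fails)
    return 0
```

The Green check would use `any_of=True` because the algorithm accepts either "pass" or "green", while the Red and Refactor checks require both of their keywords.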
+
+## Coverage Metrics Calculation
+
+```bash
+# Parse coverage from test-results.json
+line_coverage=$(jq '.coverage.lineCoverage' test-results.json)
+branch_coverage=$(jq '.coverage.branchCoverage' test-results.json)
+function_coverage=$(jq '.coverage.functionCoverage' test-results.json)
+
+# Calculate overall score (scale=2 keeps two decimals; bc truncates to an integer by default)
+overall_score=$(echo "scale=2; ($line_coverage + $branch_coverage + $function_coverage) / 3" | bc)
+```
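
The same averaging can be done without shell arithmetic. This sketch assumes `test-results.json` exposes a top-level `coverage` object with `lineCoverage`, `branchCoverage`, and `functionCoverage` keys, as the `jq` paths above imply; real test runners may use a different layout, so the shape should be confirmed before relying on it.

```python
import json


def overall_coverage(results_json: str) -> float:
    """Average line, branch, and function coverage, rounded to two decimals."""
    coverage = json.loads(results_json)["coverage"]
    values = [
        coverage["lineCoverage"],
        coverage["branchCoverage"],
        coverage["functionCoverage"],
    ]
    return round(sum(values) / len(values), 2)
```

A missing key raises `KeyError`, which makes an unexpected report layout fail loudly instead of silently producing a score.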
+
+## Error Handling
+
+### Test Execution Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Test framework not found | No test config | Configure test framework first |
+| Tests fail to run | Syntax errors | Fix code before analysis |
+| Coverage not available | Missing coverage tool | Install coverage plugin |
+
+### Cycle Verification Errors
+| Error | Cause | Resolution |
+|-------|-------|------------|
+| Summary missing | Task not executed | Execute tasks before analysis |
+| Invalid summary format | Corrupted file | Re-run task to regenerate |
+| No test evidence | Tests not committed | Ensure tests are committed |
+
+## Integration
+
+### Phase Chain
+- **Called By**: `phases/03-tdd-verify.md` (Coverage & Cycle Analysis step)
+- **Calls**: Test framework commands (npm test, pytest, etc.)
+- **Followed By**: Compliance report generation in `phases/03-tdd-verify.md`
+
+---
+
+## Post-Phase Update
+
+After TDD Coverage Analysis completes:
+- **Output Created**: `test-results.json`, `coverage-report.json`, `tdd-cycle-report.md` in `.process/`
+- **Data Produced**: Coverage metrics (line/branch/function), TDD cycle verification results per feature
+- **Next Action**: Return data to `phases/03-tdd-verify.md` for compliance report aggregation
+- **TodoWrite**: Mark "Coverage & Cycle Analysis: completed"