feat: Add comprehensive tests for CCW Loop System flow state

- Implemented loop control tasks in JSON format for testing.
- Created comprehensive test scripts for loop flow and standalone tests.
- Developed a shell script to automate the testing of the entire loop system flow, including mock endpoints and state transitions.
- Added error handling and execution history tests to ensure robustness.
- Established variable substitution and success condition evaluations in tests.
- Set up cleanup and workspace management for test environments.
2026-02-13 02:41:50 +08:00 · 2026-01-22 10:13:00 +08:00
parent d9f1d14d5e
commit 60eab98782
37 changed files with 12347 additions and 917 deletions
--- a/.claude/skills/ccw-loop/templates/progress-template.md
+++ b/.claude/skills/ccw-loop/templates/progress-template.md
@@ -0,0 +1,175 @@
+# Progress Document Template
+
+Standard template for the development progress document.
+
+## Template Structure
+
+```markdown
+# Development Progress
+
+**Session ID**: {{session_id}}
+**Task**: {{task_description}}
+**Started**: {{started_at}}
+**Estimated Complexity**: {{complexity}}
+
+---
+
+## Task List
+
+{{#each tasks}}
+{{@index}}. [{{#if completed}}x{{else}} {{/if}}] {{description}}
+{{/each}}
+
+## Key Files
+
+{{#each key_files}}
+- `{{this}}`
+{{/each}}
+
+---
+
+## Progress Timeline
+
+{{#each iterations}}
+### Iteration {{@index}} - {{task_name}} ({{timestamp}})
+
+#### Task Details
+
+- **ID**: {{task_id}}
+- **Tool**: {{tool}}
+- **Mode**: {{mode}}
+
+#### Implementation Summary
+
+{{summary}}
+
+#### Files Changed
+
+{{#each files_changed}}
+- `{{this}}`
+{{/each}}
+
+#### Status: {{status}}
+
+---
+{{/each}}
+
+## Current Statistics
+
+| Metric | Value |
+|--------|-------|
+| Total Tasks | {{total_tasks}} |
+| Completed | {{completed_tasks}} |
+| In Progress | {{in_progress_tasks}} |
+| Pending | {{pending_tasks}} |
+| Progress | {{progress_percentage}}% |
+
+---
+
+## Next Steps
+
+{{#each next_steps}}
+- [ ] {{this}}
+{{/each}}
+```
+
+## Template Variables
+
+| Variable | Type | Source | Description |
+|----------|------|--------|-------------|
+| `session_id` | string | state.session_id | Session ID |
+| `task_description` | string | state.task_description | Task description |
+| `started_at` | string | state.created_at | Start time |
+| `complexity` | string | state.context.estimated_complexity | Estimated complexity |
+| `tasks` | array | state.develop.tasks | Task list |
+| `key_files` | array | state.context.key_files | Key files |
+| `iterations` | array | Parsed from file | Iteration history |
+| `total_tasks` | number | state.develop.total_count | Total number of tasks |
+| `completed_tasks` | number | state.develop.completed_count | Number of completed tasks |
+
+## Usage Example
+
+```javascript
+const progressTemplate = Read('.claude/skills/ccw-loop/templates/progress-template.md')
+
+function renderProgress(state) {
+  let content = progressTemplate
+
+  // Replace simple variables
+  content = content.replace('{{session_id}}', state.session_id)
+  content = content.replace('{{task_description}}', state.task_description)
+  content = content.replace('{{started_at}}', state.created_at)
+  content = content.replace('{{complexity}}', state.context?.estimated_complexity || 'unknown')
+
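+  // Replace the task list
+  const taskList = state.develop.tasks.map((t, i) => {
+    const checkbox = t.status === 'completed' ? 'x' : ' '
+    return `${i + 1}. [${checkbox}] ${t.description}`
+  }).join('\n')
+  // Match the whole multi-line {{#each tasks}} block; the literal string
+  // '{{#each tasks}}...{{/each}}' never occurs in the template.
+  // A function replacer keeps any `$` in task descriptions literal.
+  content = content.replace(/\{\{#each tasks\}\}[\s\S]*?\{\{\/each\}\}/, () => taskList)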
+
+  // Replace statistics
+  content = content.replace('{{total_tasks}}', state.develop.total_count)
+  content = content.replace('{{completed_tasks}}', state.develop.completed_count)
+  // ...
+
+  return content
+}
+```
+
+## Section Templates
+
+### Task Entry
+
+```markdown
+### Iteration {{N}} - {{task_name}} ({{timestamp}})
+
+#### Task Details
+
+- **ID**: {{task_id}}
+- **Tool**: {{tool}}
+- **Mode**: {{mode}}
+
+#### Implementation Summary
+
+{{summary}}
+
+#### Files Changed
+
+{{#each files}}
+- `{{this}}`
+{{/each}}
+
+#### Status: COMPLETED
+
+---
+```
+
+### Statistics Table
+
+```markdown
+## Current Statistics
+
+| Metric | Value |
+|--------|-------|
+| Total Tasks | {{total}} |
+| Completed | {{completed}} |
+| In Progress | {{in_progress}} |
+| Pending | {{pending}} |
+| Progress | {{percentage}}% |
+```
+
+### Next Steps
+
+```markdown
+## Next Steps
+
+{{#if all_completed}}
+- [ ] Run validation tests
+- [ ] Code review
+- [ ] Update documentation
+{{else}}
+- [ ] Complete remaining {{pending}} tasks
+- [ ] Review completed work
+{{/if}}
+```
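The statistics table and the Next Steps section in progress-template.md rely on values (`in_progress_tasks`, `pending_tasks`, `progress_percentage`, `all_completed`) whose sources the variable table does not list. A minimal sketch of one way to derive them from the task list follows. The helper name `computeProgressStats` is hypothetical, and the status names `'in_progress'` and `'pending'` are assumptions; only `'completed'` appears in the usage example.

```javascript
// Hypothetical helper: derives the statistics-table values from a task list.
// Only the 'completed' status is confirmed by the usage example;
// 'in_progress' and 'pending' are assumed status names.
function computeProgressStats(tasks) {
  const count = (status) => tasks.filter((t) => t.status === status).length
  const total = tasks.length
  const completed = count('completed')
  const inProgress = count('in_progress')
  return {
    total_tasks: total,
    completed_tasks: completed,
    in_progress_tasks: inProgress,
    // Everything that is neither completed nor in progress counts as pending
    pending_tasks: total - completed - inProgress,
    // Guard against division by zero for an empty task list
    progress_percentage: total === 0 ? 0 : Math.round((completed / total) * 100),
    all_completed: total > 0 && completed === total,
  }
}

const stats = computeProgressStats([
  { description: 'Write parser', status: 'completed' },
  { description: 'Add tests', status: 'in_progress' },
  { description: 'Update docs', status: 'pending' },
  { description: 'Release', status: 'pending' },
])
console.log(stats)
```

Deriving the counts from the task list rather than storing them separately avoids the table drifting out of sync with the checklist, at the cost of recomputing on every render.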
--- a/.claude/skills/ccw-loop/templates/understanding-template.md
+++ b/.claude/skills/ccw-loop/templates/understanding-template.md
@@ -0,0 +1,303 @@
+# Understanding Document Template
+
+Standard template for the document that tracks how the understanding of a bug evolves during debugging.
+
+## Template Structure
+
+```markdown
+# Understanding Document
+
+**Session ID**: {{session_id}}
+**Bug Description**: {{bug_description}}
+**Started**: {{started_at}}
+
+---
+
+## Exploration Timeline
+
+{{#each iterations}}
+### Iteration {{number}} - {{title}} ({{timestamp}})
+
+{{#if is_exploration}}
+#### Current Understanding
+
+Based on bug description and initial code search:
+
+- Error pattern: {{error_pattern}}
+- Affected areas: {{affected_areas}}
+- Initial hypothesis: {{initial_thoughts}}
+
+#### Evidence from Code Search
+
+{{#each search_results}}
+**Keyword: "{{keyword}}"**
+- Found in: {{files}}
+- Key findings: {{insights}}
+{{/each}}
+{{/if}}
+
+{{#if has_hypotheses}}
+#### Hypotheses Generated (Gemini-Assisted)
+
+{{#each hypotheses}}
+**{{id}}** (Likelihood: {{likelihood}}): {{description}}
+- Logging at: {{logging_point}}
+- Testing: {{testable_condition}}
+- Evidence to confirm: {{confirm_criteria}}
+- Evidence to reject: {{reject_criteria}}
+{{/each}}
+
+**Gemini Insights**: {{gemini_insights}}
+{{/if}}
+
+{{#if is_analysis}}
+#### Log Analysis Results
+
+{{#each results}}
+**{{id}}**: {{verdict}}
+- Evidence: {{evidence}}
+- Reasoning: {{reason}}
+{{/each}}
+
+#### Corrected Understanding
+
+Previous misunderstandings identified and corrected:
+
+{{#each corrections}}
+- ~~{{wrong}}~~ → {{corrected}}
+  - Why wrong: {{reason}}
+  - Evidence: {{evidence}}
+{{/each}}
+
+#### New Insights
+
+{{#each insights}}
+- {{this}}
+{{/each}}
+
+#### Gemini Analysis
+
+{{gemini_analysis}}
+{{/if}}
+
+{{#if root_cause_found}}
+#### Root Cause Identified
+
+**{{hypothesis_id}}**: {{description}}
+
+Evidence supporting this conclusion:
+{{supporting_evidence}}
+{{else}}
+#### Next Steps
+
+{{next_steps}}
+{{/if}}
+
+---
+{{/each}}
+
+## Current Consolidated Understanding
+
+### What We Know
+
+{{#each valid_understandings}}
+- {{this}}
+{{/each}}
+
+### What Was Disproven
+
+{{#each disproven}}
+- ~~{{assumption}}~~ (Evidence: {{evidence}})
+{{/each}}
+
+### Current Investigation Focus
+
+{{current_focus}}
+
+### Remaining Questions
+
+{{#each questions}}
+- {{this}}
+{{/each}}
+```
+
+## Template Variables
+
+| Variable | Type | Source | Description |
+|----------|------|--------|-------------|
+| `session_id` | string | state.session_id | Session ID |
+| `bug_description` | string | state.debug.current_bug | Bug description |
+| `iterations` | array | Parsed from file | Iteration history |
+| `hypotheses` | array | state.debug.hypotheses | Hypothesis list |
+| `valid_understandings` | array | From Gemini analysis | Valid understandings |
+| `disproven` | array | From hypothesis status | Disproven hypotheses |
+
+## Section Templates
+
+### Exploration Section
+
+```markdown
+### Iteration {{N}} - Initial Exploration ({{timestamp}})
+
+#### Current Understanding
+
+Based on bug description and initial code search:
+
+- Error pattern: {{pattern}}
+- Affected areas: {{areas}}
+- Initial hypothesis: {{thoughts}}
+
+#### Evidence from Code Search
+
+{{#each search_results}}
+**Keyword: "{{keyword}}"**
+- Found in: {{files}}
+- Key findings: {{insights}}
+{{/each}}
+
+#### Next Steps
+
+- Generate testable hypotheses
+- Add instrumentation
+- Await reproduction
+```
+
+### Hypothesis Section
+
+```markdown
+#### Hypotheses Generated (Gemini-Assisted)
+
+| ID | Description | Likelihood | Status |
+|----|-------------|------------|--------|
+{{#each hypotheses}}
+| {{id}} | {{description}} | {{likelihood}} | {{status}} |
+{{/each}}
+
+**Details:**
+
+{{#each hypotheses}}
+**{{id}}**: {{description}}
+- Logging at: `{{logging_point}}`
+- Testing: {{testable_condition}}
+- Confirm: {{evidence_criteria.confirm}}
+- Reject: {{evidence_criteria.reject}}
+{{/each}}
+```
+
+### Analysis Section
+
+```markdown
+### Iteration {{N}} - Evidence Analysis ({{timestamp}})
+
+#### Log Analysis Results
+
+{{#each results}}
+**{{id}}**: **{{verdict}}**
+- Evidence: \`{{evidence}}\`
+- Reasoning: {{reason}}
+{{/each}}
+
+#### Corrected Understanding
+
+| Previous Assumption | Corrected To | Reason |
+|---------------------|--------------|--------|
+{{#each corrections}}
+| ~~{{wrong}}~~ | {{corrected}} | {{reason}} |
+{{/each}}
+
+#### Gemini Analysis
+
+{{gemini_analysis}}
+```
+
+### Consolidated Understanding Section
+
+```markdown
+## Current Consolidated Understanding
+
+### What We Know
+
+{{#each valid}}
+- {{this}}
+{{/each}}
+
+### What Was Disproven
+
+{{#each disproven}}
+- ~~{{this.assumption}}~~ (Evidence: {{this.evidence}})
+{{/each}}
+
+### Current Investigation Focus
+
+{{focus}}
+
+### Remaining Questions
+
+{{#each questions}}
+- {{this}}
+{{/each}}
+```
+
+### Resolution Section
+
+```markdown
+### Resolution ({{timestamp}})
+
+#### Fix Applied
+
+- Modified files: {{files}}
+- Fix description: {{description}}
+- Root cause addressed: {{root_cause}}
+
+#### Verification Results
+
+{{verification}}
+
+#### Lessons Learned
+
+{{#each lessons}}
+{{@index}}. {{this}}
+{{/each}}
+
+#### Key Insights for Future
+
+{{#each insights}}
+- {{this}}
+{{/each}}
+```
+
+## Consolidation Rules
+
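+Follow these rules when updating "Current Consolidated Understanding":
+
+1. **Simplify disproven items**: Move them to "What Was Disproven" and keep only a one-line summary
+2. **Keep valid insights**: Promote confirmed findings to "What We Know"
+3. **Avoid duplication**: Do not repeat timeline details in the consolidated section
+4. **Focus on the current state**: Describe what is known now, not the process
+5. **Keep key corrections**: Preserve important wrong→right transitions for learning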
+
+## Anti-Patterns
+
+**Incorrect example (redundant)**:
+```markdown
+## Current Consolidated Understanding
+
+In iteration 1 we thought X, but in iteration 2 we found Y, then in iteration 3...
+Also we checked A and found B, and then we checked C...
+```
+
+**Correct example (concise)**:
+```markdown
+## Current Consolidated Understanding
+
+### What We Know
+- Error occurs during runtime update, not initialization
+- Config value is None (not missing key)
+
+### What Was Disproven
+- ~~Initialization error~~ (Timing evidence)
+- ~~Missing key hypothesis~~ (Key exists)
+
+### Current Investigation Focus
+Why is config value None during update?
+```
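Consolidation rules 1 and 2 in understanding-template.md can be sketched as a small transformation from the hypothesis list to the `valid_understandings` and `disproven` arrays. This is an illustration under stated assumptions, not the skill's actual implementation: the function name `consolidate`, the status values `'confirmed'` and `'rejected'`, and the `evidence` field on a hypothesis are hypothetical; `id`, `description`, and `status` come from the hypothesis table in the template.

```javascript
// Hypothetical sketch of consolidation rules 1 and 2.
// Assumed: status values 'confirmed' / 'rejected' and an `evidence` field.
function consolidate(hypotheses) {
  // Keep only the first line so each entry stays a one-line summary
  const firstLine = (text) => String(text).split('\n')[0].trim()
  return {
    // Rule 2: confirmed findings are promoted to "What We Know"
    valid_understandings: hypotheses
      .filter((h) => h.status === 'confirmed')
      .map((h) => firstLine(h.description)),
    // Rule 1: rejected items move to "What Was Disproven" as one-liners
    disproven: hypotheses
      .filter((h) => h.status === 'rejected')
      .map((h) => ({
        assumption: firstLine(h.description),
        evidence: firstLine(h.evidence),
      })),
  }
}

const consolidated = consolidate([
  { id: 'H1', description: 'Initialization error\nSeen in iteration 1', status: 'rejected', evidence: 'Timing evidence' },
  { id: 'H2', description: 'Config value is None (not missing key)', status: 'confirmed', evidence: 'Log output' },
  { id: 'H3', description: 'Race in update handler', status: 'pending', evidence: '' },
])
console.log(consolidated)
```

Hypotheses that are neither confirmed nor rejected are left out of both arrays, which matches rule 4: the consolidated section describes what is known now, while open items belong under the investigation focus or remaining questions.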
--- a/.claude/skills/ccw-loop/templates/validation-template.md
+++ b/.claude/skills/ccw-loop/templates/validation-template.md
@@ -0,0 +1,258 @@
+# Validation Report Template
+
+Standard template for the validation report.
+
+## Template Structure
+
+```markdown
+# Validation Report
+
+**Session ID**: {{session_id}}
+**Task**: {{task_description}}
+**Validated**: {{timestamp}}
+
+---
+
+## Iteration {{iteration}} - Validation Run
+
+### Test Execution Summary
+
+| Metric | Value |
+|--------|-------|
+| Total Tests | {{total_tests}} |
+| Passed | {{passed_tests}} |
+| Failed | {{failed_tests}} |
+| Skipped | {{skipped_tests}} |
+| Duration | {{duration}}ms |
+| **Pass Rate** | **{{pass_rate}}%** |
+
+### Coverage Report
+
+{{#if has_coverage}}
+| File | Statements | Branches | Functions | Lines |
+|------|------------|----------|-----------|-------|
+{{#each coverage_files}}
+| {{path}} | {{statements}}% | {{branches}}% | {{functions}}% | {{lines}}% |
+{{/each}}
+
+**Overall Coverage**: {{overall_coverage}}%
+{{else}}
+_No coverage data available_
+{{/if}}
+
+### Failed Tests
+
+{{#if has_failures}}
+{{#each failures}}
+#### {{test_name}}
+
+- **Suite**: {{suite}}
+- **Error**: {{error_message}}
+- **Stack**:
+\`\`\`
+{{stack_trace}}
+\`\`\`
+{{/each}}
+{{else}}
+_All tests passed_
+{{/if}}
+
+### Gemini Quality Analysis
+
+{{gemini_analysis}}
+
+### Recommendations
+
+{{#each recommendations}}
+- {{this}}
+{{/each}}
+
+---
+
+## Validation Decision
+
+**Result**: {{#if passed}}✅ PASS{{else}}❌ FAIL{{/if}}
+
+**Rationale**: {{rationale}}
+
+{{#if passed}}
+### Next Actions
+
+1. Consider code review
+2. Prepare for deployment
+3. Update documentation
+{{else}}
+### Next Actions
+
+1. Review failed tests
+2. Debug failures using action-debug-with-file
+3. Fix issues and re-run validation
+{{/if}}
+```
+
+## Template Variables
+
+| Variable | Type | Source | Description |
+|----------|------|--------|-------------|
+| `session_id` | string | state.session_id | Session ID |
+| `task_description` | string | state.task_description | Task description |
+| `timestamp` | string | Current time | Validation time |
+| `iteration` | number | Computed from file | Validation iteration count |
+| `total_tests` | number | Test output | Total number of tests |
+| `passed_tests` | number | Test output | Number of passed tests |
+| `failed_tests` | number | Test output | Number of failed tests |
+| `pass_rate` | number | Computed | Pass rate |
+| `coverage_files` | array | Coverage report | Per-file coverage |
+| `failures` | array | Test output | Failed test details |
+| `gemini_analysis` | string | Gemini CLI | Quality analysis |
+| `recommendations` | array | Gemini CLI | List of recommendations |
+
+## Section Templates
+
+### Test Summary
+
+```markdown
+### Test Execution Summary
+
+| Metric | Value |
+|--------|-------|
+| Total Tests | {{total}} |
+| Passed | {{passed}} |
+| Failed | {{failed}} |
+| Skipped | {{skipped}} |
+| Duration | {{duration}}ms |
+| **Pass Rate** | **{{rate}}%** |
+```
+
+### Coverage Table
+
+```markdown
+### Coverage Report
+
+| File | Statements | Branches | Functions | Lines |
+|------|------------|----------|-----------|-------|
+{{#each files}}
+| `{{path}}` | {{statements}}% | {{branches}}% | {{functions}}% | {{lines}}% |
+{{/each}}
+
+**Overall Coverage**: {{overall}}%
+
+**Coverage Thresholds**:
+- ✅ Good: ≥ 80%
+- ⚠️ Warning: 60-79%
+- ❌ Poor: < 60%
+```
+
+### Failed Test Details
+
+```markdown
+### Failed Tests
+
+{{#each failures}}
+#### ❌ {{test_name}}
+
+| Field | Value |
+|-------|-------|
+| Suite | {{suite}} |
+| Error | {{error_message}} |
+| Duration | {{duration}}ms |
+
+**Stack Trace**:
+\`\`\`
+{{stack_trace}}
+\`\`\`
+
+**Possible Causes**:
+{{#each possible_causes}}
+- {{this}}
+{{/each}}
+
+---
+{{/each}}
+```
+
+### Quality Analysis
+
+```markdown
+### Gemini Quality Analysis
+
+#### Code Quality Assessment
+
+| Dimension | Score | Status |
+|-----------|-------|--------|
+| Correctness | {{correctness}}/10 | {{correctness_status}} |
+| Completeness | {{completeness}}/10 | {{completeness_status}} |
+| Reliability | {{reliability}}/10 | {{reliability_status}} |
+| Maintainability | {{maintainability}}/10 | {{maintainability_status}} |
+
+#### Key Findings
+
+{{#each findings}}
+- **{{severity}}**: {{description}}
+{{/each}}
+
+#### Recommendations
+
+{{#each recommendations}}
+{{@index}}. {{this}}
+{{/each}}
+```
+
+### Decision Section
+
+```markdown
+## Validation Decision
+
+**Result**: {{#if passed}}✅ PASS{{else}}❌ FAIL{{/if}}
+
+**Rationale**:
+{{rationale}}
+
+**Confidence Level**: {{confidence}}
+
+### Decision Matrix
+
+| Criteria | Status | Weight | Score |
+|----------|--------|--------|-------|
+| All tests pass | {{tests_pass}} | 40% | {{tests_score}} |
+| Coverage ≥ 80% | {{coverage_pass}} | 30% | {{coverage_score}} |
+| No critical issues | {{no_critical}} | 20% | {{critical_score}} |
+| Quality analysis pass | {{quality_pass}} | 10% | {{quality_score}} |
+| **Total** | | 100% | **{{total_score}}** |
+
+**Threshold**: 70% to pass
+
+### Next Actions
+
+{{#if passed}}
+1. ✅ Code review (recommended)
+2. ✅ Update documentation
+3. ✅ Prepare for deployment
+{{else}}
+1. ❌ Review failed tests
+2. ❌ Debug failures
+3. ❌ Fix issues and re-run
+{{/if}}
+```
+
+## Historical Comparison
+
+```markdown
+## Validation History
+
+| Iteration | Date | Pass Rate | Coverage | Status |
+|-----------|------|-----------|----------|--------|
+{{#each history}}
+| {{iteration}} | {{date}} | {{pass_rate}}% | {{coverage}}% | {{status}} |
+{{/each}}
+
+### Trend Analysis
+
+{{#if improving}}
+📈 **Improving**: Pass rate increased from {{previous_rate}}% to {{current_rate}}%
+{{else if declining}}
+📉 **Declining**: Pass rate decreased from {{previous_rate}}% to {{current_rate}}%
+{{else}}
+➡️ **Stable**: Pass rate remains at {{current_rate}}%
+{{/if}}
+```
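The Decision Matrix in validation-template.md gives weights (40/30/20/10) and a 70% pass threshold but does not say how the per-criterion scores are computed. A minimal sketch follows, assuming all-or-nothing scoring where a criterion earns its full weight when met and zero otherwise. The names `WEIGHTS`, `PASS_THRESHOLD`, and `decide` are hypothetical; the criterion keys mirror the template placeholders.

```javascript
// Weights and the 70% threshold are taken from the Decision Matrix.
// The all-or-nothing scoring per criterion is an assumption.
const WEIGHTS = { tests_pass: 40, coverage_pass: 30, no_critical: 20, quality_pass: 10 }
const PASS_THRESHOLD = 70

function decide(criteria) {
  let total = 0
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    // A criterion contributes its full weight only when it is met
    if (criteria[name]) total += weight
  }
  return { total_score: total, passed: total >= PASS_THRESHOLD }
}

// Coverage below 80% but everything else met: 40 + 20 + 10 = 70
const lowCoverage = decide({ tests_pass: true, coverage_pass: false, no_critical: true, quality_pass: true })
// Failing tests with everything else met: 30 + 20 + 10 = 60
const failingTests = decide({ tests_pass: false, coverage_pass: true, no_critical: true, quality_pass: true })
console.log(lowCoverage, failingTests)
```

Under this scoring, a run with failing tests can reach at most 60 and so never passes, while missing the coverage target alone still lands exactly on the threshold.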