feat: Add comprehensive tests for CCW Loop System flow state

- Implemented loop control tasks in JSON format for testing.
- Created comprehensive test scripts for loop flow and standalone tests.
- Developed a shell script to automate the testing of the entire loop system flow, including mock endpoints and state transitions.
- Added error handling and execution history tests to ensure robustness.
- Established variable substitution and success condition evaluations in tests.
- Set up cleanup and workspace management for test environments.
catlog22
2026-01-22 10:13:00 +08:00
parent d9f1d14d5e
commit 60eab98782
37 changed files with 12347 additions and 917 deletions


@@ -0,0 +1,175 @@
# Progress Document Template
Standard template for the development progress document.
## Template Structure
```markdown
# Development Progress
**Session ID**: {{session_id}}
**Task**: {{task_description}}
**Started**: {{started_at}}
**Estimated Complexity**: {{complexity}}
---
## Task List
{{#each tasks}}
{{@index}}. [{{#if completed}}x{{else}} {{/if}}] {{description}}
{{/each}}
## Key Files
{{#each key_files}}
- `{{this}}`
{{/each}}
---
## Progress Timeline
{{#each iterations}}
### Iteration {{@index}} - {{task_name}} ({{timestamp}})
#### Task Details
- **ID**: {{task_id}}
- **Tool**: {{tool}}
- **Mode**: {{mode}}
#### Implementation Summary
{{summary}}
#### Files Changed
{{#each files_changed}}
- `{{this}}`
{{/each}}
#### Status: {{status}}
---
{{/each}}
## Current Statistics
| Metric | Value |
|--------|-------|
| Total Tasks | {{total_tasks}} |
| Completed | {{completed_tasks}} |
| In Progress | {{in_progress_tasks}} |
| Pending | {{pending_tasks}} |
| Progress | {{progress_percentage}}% |
---
## Next Steps
{{#each next_steps}}
- [ ] {{this}}
{{/each}}
```
## Template Variables
| Variable | Type | Source | Description |
|----------|------|--------|-------------|
| `session_id` | string | state.session_id | Session ID |
| `task_description` | string | state.task_description | Task description |
| `started_at` | string | state.created_at | Start time |
| `complexity` | string | state.context.estimated_complexity | Estimated complexity |
| `tasks` | array | state.develop.tasks | Task list |
| `key_files` | array | state.context.key_files | Key files |
| `iterations` | array | Parsed from file | Iteration history |
| `total_tasks` | number | state.develop.total_count | Total number of tasks |
| `completed_tasks` | number | state.develop.completed_count | Number of completed tasks |
## Usage Example
```javascript
// Read is the file-reading tool call of the skill runtime (assumed to return the file content as a string).
const progressTemplate = Read('.claude/skills/ccw-loop/templates/progress-template.md')

function renderProgress(state) {
  let content = progressTemplate
  // A replacer function keeps any `$` sequences in the value from being
  // interpreted as replacement patterns by String.prototype.replace.
  const fill = (name, value) => {
    content = content.replace(`{{${name}}}`, () => String(value))
  }
  // Replace simple variables
  fill('session_id', state.session_id)
  fill('task_description', state.task_description)
  fill('started_at', state.created_at)
  fill('complexity', state.context?.estimated_complexity || 'unknown')
  // Replace the task list: match the whole {{#each tasks}} ... {{/each}} block
  const taskList = state.develop.tasks.map((t, i) => {
    const checkbox = t.status === 'completed' ? 'x' : ' '
    return `${i + 1}. [${checkbox}] ${t.description}`
  }).join('\n')
  content = content.replace(/\{\{#each tasks\}\}[\s\S]*?\{\{\/each\}\}/, () => taskList)
  // Replace statistics
  fill('total_tasks', state.develop.total_count)
  fill('completed_tasks', state.develop.completed_count)
  // ...
  return content
}
```
## Section Templates
### Task Entry
```markdown
### Iteration {{N}} - {{task_name}} ({{timestamp}})
#### Task Details
- **ID**: {{task_id}}
- **Tool**: {{tool}}
- **Mode**: {{mode}}
#### Implementation Summary
{{summary}}
#### Files Changed
{{#each files}}
- `{{this}}`
{{/each}}
#### Status: COMPLETED
---
```
### Statistics Table
```markdown
## Current Statistics
| Metric | Value |
|--------|-------|
| Total Tasks | {{total}} |
| Completed | {{completed}} |
| In Progress | {{in_progress}} |
| Pending | {{pending}} |
| Progress | {{percentage}}% |
```
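The values in the statistics table can be derived from the task list. The following is a minimal sketch, not the project's implementation: the helper name `computeStats` is hypothetical, and it assumes each task carries a `status` of `'completed'`, `'in_progress'`, or `'pending'` (only `'completed'` appears in the usage example; the other two values are assumptions).

```javascript
// Hypothetical helper: derive the Statistics Table values from a task list.
// Assumes task.status is one of 'completed' | 'in_progress' | 'pending'.
function computeStats(tasks) {
  const total = tasks.length
  const completed = tasks.filter(t => t.status === 'completed').length
  const inProgress = tasks.filter(t => t.status === 'in_progress').length
  // Anything that is neither completed nor in progress counts as pending.
  const pending = total - completed - inProgress
  // Guard against division by zero when the task list is empty.
  const percentage = total === 0 ? 0 : Math.round((completed / total) * 100)
  return { total, completed, in_progress: inProgress, pending, percentage }
}
```

The returned keys match the placeholders used in the Statistics Table section template (`total`, `completed`, `in_progress`, `pending`, `percentage`).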
### Next Steps
```markdown
## Next Steps
{{#if all_completed}}
- [ ] Run validation tests
- [ ] Code review
- [ ] Update documentation
{{else}}
- [ ] Complete remaining {{pending}} tasks
- [ ] Review completed work
{{/if}}
```
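The conditional in the Next Steps template can also be produced without a template engine. This is a sketch under stated assumptions: `renderNextSteps` is a hypothetical name, and `all_completed` is taken to mean that the completed count equals the total count.

```javascript
// Hypothetical helper: build the "Next Steps" checklist from task counts.
// Assumes all_completed means completed === total, and pending = total - completed.
function renderNextSteps({ total, completed }) {
  const allCompleted = total > 0 && completed === total
  const steps = allCompleted
    ? ['Run validation tests', 'Code review', 'Update documentation']
    : [`Complete remaining ${total - completed} tasks`, 'Review completed work']
  // Each step becomes an unchecked Markdown task-list item.
  return steps.map(s => `- [ ] ${s}`).join('\n')
}
```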


@@ -0,0 +1,303 @@
# Understanding Document Template
Standard template for the document that tracks how the understanding of a bug evolves during debugging.
## Template Structure
```markdown
# Understanding Document
**Session ID**: {{session_id}}
**Bug Description**: {{bug_description}}
**Started**: {{started_at}}
---
## Exploration Timeline
{{#each iterations}}
### Iteration {{number}} - {{title}} ({{timestamp}})
{{#if is_exploration}}
#### Current Understanding
Based on bug description and initial code search:
- Error pattern: {{error_pattern}}
- Affected areas: {{affected_areas}}
- Initial hypothesis: {{initial_thoughts}}
#### Evidence from Code Search
{{#each search_results}}
**Keyword: "{{keyword}}"**
- Found in: {{files}}
- Key findings: {{insights}}
{{/each}}
{{/if}}
{{#if has_hypotheses}}
#### Hypotheses Generated (Gemini-Assisted)
{{#each hypotheses}}
**{{id}}** (Likelihood: {{likelihood}}): {{description}}
- Logging at: {{logging_point}}
- Testing: {{testable_condition}}
- Evidence to confirm: {{confirm_criteria}}
- Evidence to reject: {{reject_criteria}}
{{/each}}
**Gemini Insights**: {{gemini_insights}}
{{/if}}
{{#if is_analysis}}
#### Log Analysis Results
{{#each results}}
**{{id}}**: {{verdict}}
- Evidence: {{evidence}}
- Reasoning: {{reason}}
{{/each}}
#### Corrected Understanding
Previous misunderstandings identified and corrected:
{{#each corrections}}
- ~~{{wrong}}~~ → {{corrected}}
- Why wrong: {{reason}}
- Evidence: {{evidence}}
{{/each}}
#### New Insights
{{#each insights}}
- {{this}}
{{/each}}
#### Gemini Analysis
{{gemini_analysis}}
{{/if}}
{{#if root_cause_found}}
#### Root Cause Identified
**{{hypothesis_id}}**: {{description}}
Evidence supporting this conclusion:
{{supporting_evidence}}
{{else}}
#### Next Steps
{{next_steps}}
{{/if}}
---
{{/each}}
## Current Consolidated Understanding
### What We Know
{{#each valid_understandings}}
- {{this}}
{{/each}}
### What Was Disproven
{{#each disproven}}
- ~~{{assumption}}~~ (Evidence: {{evidence}})
{{/each}}
### Current Investigation Focus
{{current_focus}}
### Remaining Questions
{{#each questions}}
- {{this}}
{{/each}}
```
## Template Variables
| Variable | Type | Source | Description |
|----------|------|--------|-------------|
| `session_id` | string | state.session_id | Session ID |
| `bug_description` | string | state.debug.current_bug | Bug description |
| `iterations` | array | Parsed from file | Iteration history |
| `hypotheses` | array | state.debug.hypotheses | Hypothesis list |
| `valid_understandings` | array | From Gemini analysis | Valid understandings |
| `disproven` | array | From hypothesis status | Disproven hypotheses |
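The `disproven` array is derived from hypothesis status. One way this could look is sketched below. The status value `'rejected'`, the `evidence` field on a hypothesis, and the name `collectDisproven` are all assumptions; the output shape (`assumption`, `evidence`) follows the placeholders used in the "What Was Disproven" block.

```javascript
// Hypothetical helper: build the `disproven` array from the hypothesis list.
// Assumes a rejected hypothesis has status === 'rejected' and an optional `evidence` string.
function collectDisproven(hypotheses) {
  return hypotheses
    .filter(h => h.status === 'rejected')
    .map(h => ({
      assumption: h.description,
      // Fall back to a placeholder when no evidence was recorded.
      evidence: h.evidence || 'n/a',
    }))
}
```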
## Section Templates
### Exploration Section
```markdown
### Iteration {{N}} - Initial Exploration ({{timestamp}})
#### Current Understanding
Based on bug description and initial code search:
- Error pattern: {{pattern}}
- Affected areas: {{areas}}
- Initial hypothesis: {{thoughts}}
#### Evidence from Code Search
{{#each search_results}}
**Keyword: "{{keyword}}"**
- Found in: {{files}}
- Key findings: {{insights}}
{{/each}}
#### Next Steps
- Generate testable hypotheses
- Add instrumentation
- Await reproduction
```
### Hypothesis Section
```markdown
#### Hypotheses Generated (Gemini-Assisted)
| ID | Description | Likelihood | Status |
|----|-------------|------------|--------|
{{#each hypotheses}}
| {{id}} | {{description}} | {{likelihood}} | {{status}} |
{{/each}}
**Details:**
{{#each hypotheses}}
**{{id}}**: {{description}}
- Logging at: `{{logging_point}}`
- Testing: {{testable_condition}}
- Confirm: {{evidence_criteria.confirm}}
- Reject: {{evidence_criteria.reject}}
{{/each}}
```
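The hypothesis table rows can be generated directly from the hypothesis list. The sketch below uses the hypothetical name `renderHypothesisTable` and assumes each hypothesis has the `id`, `description`, `likelihood`, and `status` fields shown in the template. It escapes `|` characters so that a description containing a pipe does not break the table.

```javascript
// Hypothetical helper: render the hypothesis summary table as Markdown.
function renderHypothesisTable(hypotheses) {
  // A literal pipe inside a cell would be read as a column separator, so escape it.
  const escapeCell = v => String(v).replace(/\|/g, '\\|')
  const header = [
    '| ID | Description | Likelihood | Status |',
    '|----|-------------|------------|--------|',
  ]
  const rows = hypotheses.map(h =>
    `| ${escapeCell(h.id)} | ${escapeCell(h.description)} | ${escapeCell(h.likelihood)} | ${escapeCell(h.status)} |`
  )
  return header.concat(rows).join('\n')
}
```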
### Analysis Section
```markdown
### Iteration {{N}} - Evidence Analysis ({{timestamp}})
#### Log Analysis Results
{{#each results}}
**{{id}}**: **{{verdict}}**
- Evidence: `{{evidence}}`
- Reasoning: {{reason}}
{{/each}}
#### Corrected Understanding
| Previous Assumption | Corrected To | Reason |
|---------------------|--------------|--------|
{{#each corrections}}
| ~~{{wrong}}~~ | {{corrected}} | {{reason}} |
{{/each}}
#### Gemini Analysis
{{gemini_analysis}}
```
### Consolidated Understanding Section
```markdown
## Current Consolidated Understanding
### What We Know
{{#each valid}}
- {{this}}
{{/each}}
### What Was Disproven
{{#each disproven}}
- ~~{{this.assumption}}~~ (Evidence: {{this.evidence}})
{{/each}}
### Current Investigation Focus
{{focus}}
### Remaining Questions
{{#each questions}}
- {{this}}
{{/each}}
```
### Resolution Section
```markdown
### Resolution ({{timestamp}})
#### Fix Applied
- Modified files: {{files}}
- Fix description: {{description}}
- Root cause addressed: {{root_cause}}
#### Verification Results
{{verification}}
#### Lessons Learned
{{#each lessons}}
{{@index}}. {{this}}
{{/each}}
#### Key Insights for Future
{{#each insights}}
- {{this}}
{{/each}}
```
## Consolidation Rules
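When updating "Current Consolidated Understanding", follow these rules:
1. **Simplify disproven items**: Move them to "What Was Disproven" and keep only a one-line summary
2. **Keep valid insights**: Promote confirmed findings to "What We Know"
3. **Avoid duplication**: Do not repeat timeline details in the consolidated section
4. **Focus on the current state**: Describe what is known now, not the process
5. **Keep key corrections**: Retain important wrong→right transitions for learning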
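Rules 1 to 3 can be applied mechanically when an iteration finishes. The sketch below is one possible reading, not the project's implementation: the function name `consolidate` and the data shapes (`known`, `disproven`, `confirmed`, `rejected`) are assumptions made for illustration.

```javascript
// Hypothetical shapes:
//   current:   { known: string[], disproven: { assumption, evidence }[] }
//   iteration: { confirmed: string[], rejected: { assumption, evidence }[] }
function consolidate(current, iteration) {
  // Rule 1: keep only a one-line summary for each disproven item.
  const firstLine = s => String(s).split('\n')[0].trim()
  const newlyDisproven = iteration.rejected.map(r => ({
    assumption: firstLine(r.assumption),
    evidence: firstLine(r.evidence),
  }))
  // Rules 2 and 3: promote confirmed findings, skipping ones already recorded.
  const known = Array.from(new Set(current.known.concat(iteration.confirmed)))
  // An item that has just been rejected must not remain under "What We Know".
  const rejectedNames = new Set(newlyDisproven.map(d => d.assumption))
  return {
    known: known.filter(k => !rejectedNames.has(k)),
    disproven: current.disproven.concat(newlyDisproven),
  }
}
```

Rules 4 and 5 concern wording and editorial judgment, so they are left to whoever writes the section.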
## Anti-Patterns
**Bad example (redundant)**:
```markdown
## Current Consolidated Understanding
In iteration 1 we thought X, but in iteration 2 we found Y, then in iteration 3...
Also we checked A and found B, and then we checked C...
```
**Good example (concise)**:
```markdown
## Current Consolidated Understanding
### What We Know
- Error occurs during runtime update, not initialization
- Config value is None (not missing key)
### What Was Disproven
- ~~Initialization error~~ (Timing evidence)
- ~~Missing key hypothesis~~ (Key exists)
### Current Investigation Focus
Why is config value None during update?
```


@@ -0,0 +1,258 @@
# Validation Report Template
Standard template for the validation report.
## Template Structure
```markdown
# Validation Report
**Session ID**: {{session_id}}
**Task**: {{task_description}}
**Validated**: {{timestamp}}
---
## Iteration {{iteration}} - Validation Run
### Test Execution Summary
| Metric | Value |
|--------|-------|
| Total Tests | {{total_tests}} |
| Passed | {{passed_tests}} |
| Failed | {{failed_tests}} |
| Skipped | {{skipped_tests}} |
| Duration | {{duration}}ms |
| **Pass Rate** | **{{pass_rate}}%** |
### Coverage Report
{{#if has_coverage}}
| File | Statements | Branches | Functions | Lines |
|------|------------|----------|-----------|-------|
{{#each coverage_files}}
| {{path}} | {{statements}}% | {{branches}}% | {{functions}}% | {{lines}}% |
{{/each}}
**Overall Coverage**: {{overall_coverage}}%
{{else}}
_No coverage data available_
{{/if}}
### Failed Tests
{{#if has_failures}}
{{#each failures}}
#### {{test_name}}
- **Suite**: {{suite}}
- **Error**: {{error_message}}
- **Stack**:
\`\`\`
{{stack_trace}}
\`\`\`
{{/each}}
{{else}}
_All tests passed_
{{/if}}
### Gemini Quality Analysis
{{gemini_analysis}}
### Recommendations
{{#each recommendations}}
- {{this}}
{{/each}}
---
## Validation Decision
**Result**: {{#if passed}}✅ PASS{{else}}❌ FAIL{{/if}}
**Rationale**: {{rationale}}
{{#if passed}}
### Next Actions
1. Consider code review
2. Prepare for deployment
3. Update documentation
{{else}}
### Next Actions
1. Review failed tests
2. Debug failures using action-debug-with-file
3. Fix issues and re-run validation
{{/if}}
```
## Template Variables
| Variable | Type | Source | Description |
|----------|------|--------|-------------|
| `session_id` | string | state.session_id | Session ID |
| `task_description` | string | state.task_description | Task description |
| `timestamp` | string | Current time | Validation time |
| `iteration` | number | Computed from file | Validation iteration count |
| `total_tests` | number | Test output | Total number of tests |
| `passed_tests` | number | Test output | Number of tests passed |
| `failed_tests` | number | Test output | Number of tests failed |
| `pass_rate` | number | Computed | Pass rate |
| `coverage_files` | array | Coverage report | Per-file coverage |
| `failures` | array | Test output | Failed test details |
| `gemini_analysis` | string | Gemini CLI | Quality analysis |
| `recommendations` | array | Gemini CLI | List of recommendations |
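`pass_rate` is listed as a computed value. A minimal sketch of that computation follows, assuming the rate is the share of passed tests among all tests (whether skipped tests should be excluded from the denominator is not stated, so this is an assumption). The name `computePassRate` is hypothetical.

```javascript
// Hypothetical helper: pass rate as a percentage with one decimal place.
// Assumes the denominator is total_tests, including skipped tests.
function computePassRate(passedTests, totalTests) {
  // Avoid dividing by zero when no tests were collected.
  if (totalTests === 0) return 0
  // Scale before dividing so the rounding happens on tenths of a percent.
  return Math.round((passedTests * 1000) / totalTests) / 10
}
```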
## Section Templates
### Test Summary
```markdown
### Test Execution Summary
| Metric | Value |
|--------|-------|
| Total Tests | {{total}} |
| Passed | {{passed}} |
| Failed | {{failed}} |
| Skipped | {{skipped}} |
| Duration | {{duration}}ms |
| **Pass Rate** | **{{rate}}%** |
```
### Coverage Table
```markdown
### Coverage Report
| File | Statements | Branches | Functions | Lines |
|------|------------|----------|-----------|-------|
{{#each files}}
| `{{path}}` | {{statements}}% | {{branches}}% | {{functions}}% | {{lines}}% |
{{/each}}
**Overall Coverage**: {{overall}}%
**Coverage Thresholds**:
- ✅ Good: ≥ 80%
- ⚠️ Warning: 60-79%
- ❌ Poor: < 60%
```
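The coverage thresholds translate into a small classifier. This sketch uses the hypothetical name `coverageStatus` and treats any value from 60 up to, but not including, 80 as a warning, which covers fractional percentages that the "60-79%" wording leaves open.

```javascript
// Hypothetical helper: map a coverage percentage to the threshold labels above.
function coverageStatus(percent) {
  if (percent >= 80) return 'good'
  if (percent >= 60) return 'warning'
  return 'poor'
}
```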
### Failed Test Details
```markdown
### Failed Tests
{{#each failures}}
#### ❌ {{test_name}}
| Field | Value |
|-------|-------|
| Suite | {{suite}} |
| Error | {{error_message}} |
| Duration | {{duration}}ms |
**Stack Trace**:
\`\`\`
{{stack_trace}}
\`\`\`
**Possible Causes**:
{{#each possible_causes}}
- {{this}}
{{/each}}
---
{{/each}}
```
### Quality Analysis
```markdown
### Gemini Quality Analysis
#### Code Quality Assessment
| Dimension | Score | Status |
|-----------|-------|--------|
| Correctness | {{correctness}}/10 | {{correctness_status}} |
| Completeness | {{completeness}}/10 | {{completeness_status}} |
| Reliability | {{reliability}}/10 | {{reliability_status}} |
| Maintainability | {{maintainability}}/10 | {{maintainability_status}} |
#### Key Findings
{{#each findings}}
- **{{severity}}**: {{description}}
{{/each}}
#### Recommendations
{{#each recommendations}}
{{@index}}. {{this}}
{{/each}}
```
### Decision Section
```markdown
## Validation Decision
**Result**: {{#if passed}}✅ PASS{{else}}❌ FAIL{{/if}}
**Rationale**:
{{rationale}}
**Confidence Level**: {{confidence}}
### Decision Matrix
| Criteria | Status | Weight | Score |
|----------|--------|--------|-------|
| All tests pass | {{tests_pass}} | 40% | {{tests_score}} |
| Coverage ≥ 80% | {{coverage_pass}} | 30% | {{coverage_score}} |
| No critical issues | {{no_critical}} | 20% | {{critical_score}} |
| Quality analysis pass | {{quality_pass}} | 10% | {{quality_score}} |
| **Total** | | 100% | **{{total_score}}** |
**Threshold**: 70% to pass
### Next Actions
{{#if passed}}
1. ✅ Code review (recommended)
2. ✅ Update documentation
3. ✅ Prepare for deployment
{{else}}
1. ❌ Review failed tests
2. ❌ Debug failures
3. ❌ Fix issues and re-run
{{/if}}
```
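The decision matrix can be evaluated as a weighted sum. The sketch below assumes each criterion is all-or-nothing: it contributes its full weight when met and nothing otherwise. The template does not say whether partial scores are allowed, so this is only one reading, and the names `WEIGHTS`, `PASS_THRESHOLD`, and `scoreDecision` are hypothetical.

```javascript
// Weights and threshold taken from the Decision Matrix table.
const WEIGHTS = { tests_pass: 40, coverage_pass: 30, no_critical: 20, quality_pass: 10 }
const PASS_THRESHOLD = 70

// Hypothetical helper: sum the weights of the criteria that are met.
// Assumes each criterion is a boolean (all-or-nothing scoring).
function scoreDecision(criteria) {
  let total = 0
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (criteria[name]) total += weight
  }
  return { total_score: total, passed: total >= PASS_THRESHOLD }
}
```

Under this all-or-nothing reading, a run cannot reach the 70% threshold unless all tests pass, because the remaining three criteria add up to only 60%.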
## Historical Comparison
```markdown
## Validation History
| Iteration | Date | Pass Rate | Coverage | Status |
|-----------|------|-----------|----------|--------|
{{#each history}}
| {{iteration}} | {{date}} | {{pass_rate}}% | {{coverage}}% | {{status}} |
{{/each}}
### Trend Analysis
{{#if improving}}
📈 **Improving**: Pass rate increased from {{previous_rate}}% to {{current_rate}}%
{{else if declining}}
📉 **Declining**: Pass rate decreased from {{previous_rate}}% to {{current_rate}}%
{{else}}
➡️ **Stable**: Pass rate remains at {{current_rate}}%
{{/if}}
```
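The `improving` and `declining` flags used in the trend analysis can be computed from the last two history entries. This is a sketch, assuming `history` is ordered oldest to newest and each entry has a numeric `pass_rate`; the name `passRateTrend` is hypothetical.

```javascript
// Hypothetical helper: compare the two most recent validation runs.
// Assumes history is ordered oldest to newest.
function passRateTrend(history) {
  const current = history[history.length - 1]
  const previous = history[history.length - 2]
  // With fewer than two runs there is nothing to compare, so report a stable trend.
  if (!current || !previous) {
    return { improving: false, declining: false, current_rate: current ? current.pass_rate : null }
  }
  return {
    improving: current.pass_rate > previous.pass_rate,
    declining: current.pass_rate < previous.pass_rate,
    previous_rate: previous.pass_rate,
    current_rate: current.pass_rate,
  }
}
```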