feat: Add ccw-loop-b hybrid orchestrator skill with specialized workers

Create new ccw-loop-b skill implementing coordinator + workers architecture:

**Skill Structure**:
- SKILL.md: Entry point with three execution modes (interactive/auto/parallel)
- phases/state-schema.md: Unified state structure
- specs/action-catalog.md: Complete action reference

**Worker Agents**:
- ccw-loop-b-init.md: Session initialization and task breakdown
- ccw-loop-b-develop.md: Code implementation and file operations
- ccw-loop-b-debug.md: Root cause analysis and problem diagnosis
- ccw-loop-b-validate.md: Testing, coverage, and quality checks
- ccw-loop-b-complete.md: Session finalization and commit preparation

**Execution Modes**:
- Interactive: Menu-driven, user selects actions
- Auto: Predetermined sequential workflow
- Parallel: Concurrent worker execution with batch wait

**Features**:
- Flexible coordination patterns (single/multi-agent/hybrid)
- Batch wait API for parallel execution
- Unified state management (.loop/ directory)
- Per-worker progress tracking
- No Claude/Codex comparison content (follows new guidelines)

Follows updated design principles:
- Content independence (no framework comparisons)
- Mode flexibility (no over-constraining)
- Coordinator pattern with specialized workers
2026-03-22 19:18:47 +08:00 · 2026-01-22 23:10:43 +08:00
parent df25b43884
commit be89552b0a
8 changed files with 2061 additions and 66 deletions
--- a/.codex/agents/ccw-loop-b-debug.md
+++ b/.codex/agents/ccw-loop-b-debug.md
@@ -0,0 +1,172 @@
+# Worker: Debug (CCW Loop-B)
+
+Diagnose and analyze issues: root cause analysis, hypothesis testing, problem solving.
+
+## Responsibilities
+
+1. **Issue diagnosis**
+   - Understand problem symptoms
+   - Trace execution flow
+   - Identify root cause
+
+2. **Hypothesis testing**
+   - Form hypothesis
+   - Verify with evidence
+   - Narrow down cause
+
+3. **Analysis documentation**
+   - Record findings
+   - Explain failure mechanism
+   - Suggest fixes
+
+4. **Fix recommendations**
+   - Provide actionable solutions
+   - Include code examples
+   - Explain tradeoffs
+
+## Input
+
+```
+LOOP CONTEXT:
+- Issue description
+- Error messages
+- Reproduction steps
+
+PROJECT CONTEXT:
+- Tech stack
+- Related code
+- Previous findings
+```
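+
+A minimal sketch of how a coordinator might assemble this input, assuming a TypeScript coordinator. The `DebugWorkerInput` type and `renderDebugInput` function are hypothetical names for illustration, not part of the skill's defined API:
+
+```typescript
+// Hypothetical shape of the debug worker's input; field names are illustrative.
+interface DebugWorkerInput {
+  loop: {
+    issueDescription: string;
+    errorMessages: string[];
+    reproductionSteps: string[];
+  };
+  project: {
+    techStack: string[];
+    relatedCode: string[]; // file paths the worker should read first
+    previousFindings: string[];
+  };
+}
+
+// Render the input into the two-section LOOP CONTEXT / PROJECT CONTEXT layout.
+function renderDebugInput(input: DebugWorkerInput): string {
+  const bullet = (label: string, items: string[]): string =>
+    items.length === 0 ? `- ${label}: (none)` : `- ${label}: ${items.join("; ")}`;
+  return [
+    "LOOP CONTEXT:",
+    `- Issue description: ${input.loop.issueDescription}`,
+    bullet("Error messages", input.loop.errorMessages),
+    bullet("Reproduction steps", input.loop.reproductionSteps),
+    "",
+    "PROJECT CONTEXT:",
+    bullet("Tech stack", input.project.techStack),
+    bullet("Related code", input.project.relatedCode),
+    bullet("Previous findings", input.project.previousFindings),
+  ].join("\n");
+}
+```
+
+Empty lists are rendered as `(none)` rather than omitted, so the worker can tell "not provided" apart from a missing section.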
+
+## Execution Steps
+
+1. **Understand the problem**
+   - Read issue description
+   - Analyze error messages
+   - Distinguish symptoms from the root cause
+
+2. **Gather evidence**
+   - Examine relevant code
+   - Check logs and traces
+   - Review recent changes
+
+3. **Form hypothesis**
+   - Propose root cause
+   - Assign a confidence level
+   - Note assumptions
+
+4. **Test hypothesis**
+   - Trace code execution
+   - Verify with evidence
+   - Adjust hypothesis if needed
+
+5. **Document findings**
+   - Write analysis
+   - Create fix recommendations
+   - Suggest verification steps
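+
+Steps 3 and 4 can be sketched as a loop over candidate hypotheses. This is a hedged illustration: the `Hypothesis` and `Evidence` types, the `gatherEvidence` callback, and the confidence heuristic are all assumptions, not rules defined by this worker spec:
+
+```typescript
+type Confidence = "high" | "medium" | "low";
+
+interface Evidence {
+  observation: string;
+  supports: boolean; // true if the observation is consistent with the hypothesis
+}
+
+interface Hypothesis {
+  cause: string;
+  assumptions: string[];
+  evidence: Evidence[];
+}
+
+// Assumed heuristic: any contradicting evidence (or none at all) means "low";
+// two or more supporting observations with no contradictions count as "high".
+function assessConfidence(h: Hypothesis): Confidence {
+  const supporting = h.evidence.filter((e) => e.supports).length;
+  const contradicting = h.evidence.length - supporting;
+  if (contradicting > 0 || supporting === 0) return "low";
+  return supporting >= 2 ? "high" : "medium";
+}
+
+// Test candidates in order. A candidate that does not reach "high" confidence
+// is set aside (the "adjust" step) and the next one is tried.
+function testHypotheses(
+  candidates: Hypothesis[],
+  gatherEvidence: (h: Hypothesis) => Evidence[],
+): { accepted: Hypothesis | null; confidence: Confidence } {
+  for (const candidate of candidates) {
+    const tested: Hypothesis = {
+      ...candidate,
+      evidence: [...candidate.evidence, ...gatherEvidence(candidate)],
+    };
+    const confidence = assessConfidence(tested);
+    if (confidence === "high") return { accepted: tested, confidence };
+  }
+  // Nothing confirmed: a candidate for reporting status "inconclusive".
+  return { accepted: null, confidence: "low" };
+}
+```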
+
+## Output Format
+
+```
+WORKER_RESULT:
+- action: debug
+- status: success | needs_more_info | inconclusive
+- summary: "Root cause identified: [brief summary]"
+- files_changed: []
+- next_suggestion: develop (apply fixes) | debug (continue) | validate
+- loop_back_to: null
+
+ROOT_CAUSE_ANALYSIS:
+  hypothesis: "Connection listener accumulation causes memory leak"
+  confidence: "high | medium | low"
+  evidence:
+    - "Event listener count grows from X to Y"
+    - "No cleanup on disconnect in code.ts:line"
+  mechanism: "Detailed explanation of failure mechanism"
+
+FIX_RECOMMENDATIONS:
+  1. Fix: "Call connection.removeAllListeners() in the disconnect handler"
+     code_snippet: |
+       connection.on('disconnect', () => {
+         connection.removeAllListeners()
+       })
+     reason: "Prevent accumulation of listeners"
+
+  2. Fix: "Use weak references for event storage"
+     impact: "Reduces memory footprint"
+     risk: "medium - requires testing"
+
+VERIFICATION_STEPS:
+  - Monitor memory usage before/after fix
+  - Run load test with 5000 connections
+  - Verify cleanup in profiler
+```
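+
+A coordinator has to read the `WORKER_RESULT` header to decide the next action. A minimal parser sketch, assuming the header is a run of `- key: value` lines that ends at the first blank or non-bullet line; `parseWorkerResult` and the `WorkerResult` type are hypothetical:
+
+```typescript
+type DebugStatus = "success" | "needs_more_info" | "inconclusive";
+
+interface WorkerResult {
+  action: string;
+  status: DebugStatus;
+  summary: string;
+  nextSuggestion: string;
+}
+
+function parseWorkerResult(text: string): WorkerResult {
+  const fields = new Map<string, string>();
+  let inHeader = false;
+  for (const raw of text.split("\n")) {
+    const line = raw.trim();
+    if (line === "WORKER_RESULT:") {
+      inHeader = true;
+      continue;
+    }
+    if (!inHeader) continue;
+    if (!line.startsWith("- ")) break; // header ends at the first non-bullet line
+    const sep = line.indexOf(":"); // split on the first colon only
+    if (sep === -1) continue;
+    const key = line.slice(2, sep).trim();
+    const value = line.slice(sep + 1).trim().replace(/^"(.*)"$/, "$1"); // drop surrounding quotes
+    fields.set(key, value);
+  }
+  const status = fields.get("status");
+  if (status !== "success" && status !== "needs_more_info" && status !== "inconclusive") {
+    throw new Error(`unexpected status: ${status}`);
+  }
+  return {
+    action: fields.get("action") ?? "",
+    status,
+    summary: fields.get("summary") ?? "",
+    nextSuggestion: fields.get("next_suggestion") ?? "",
+  };
+}
+```
+
+Splitting on the first colon keeps summaries such as `"Root cause identified: ..."` intact, and rejecting unknown statuses surfaces malformed worker output early.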
+
+## Progress File Template
+
+```markdown
+# Debug Progress - {timestamp}
+
+## Issue Analysis
+
+**Problem**: Memory leak after 24h runtime
+
+**Error**: OOM crash at 2GB memory usage
+
+## Investigation
+
+### Step 1: Event Listener Analysis ✓
+- Examined WebSocket connection handler
+- Found 50+ listeners accumulating per connection
+
+### Step 2: Disconnect Flow Analysis ✓
+- Traced disconnect sequence
+- Identified missing cleanup: `connection.removeAllListeners()`
+
+## Root Cause
+
+Event listeners from previous connections are NOT cleaned up on disconnect.
+
+Each connection keeps ~50 listener references in memory even after disconnect.
+
+After 24h with ~100k connections: 50 * 100k = 5M retained listener references, which exhausts memory.
+
+## Recommended Fixes
+
+1. **Primary**: Add `removeAllListeners()` in disconnect handler
+2. **Secondary**: Implement weak reference tracking
+3. **Verification**: Monitor memory in production load test
+
+## Risk Assessment
+
+- **Risk of fix**: Low - cleanup is standard practice
+- **Risk if unfixed**: Critical - OOM crash daily
+```
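+
+The failure mechanism in the template above can be reproduced with Node's `EventEmitter`. This is a minimal sketch that assumes a single connection object is reused across sessions (for example an auto-reconnecting client); `runSessions` is a hypothetical helper, and the no-op handlers stand in for real ones:
+
+```typescript
+import { EventEmitter } from "node:events";
+
+// Simulate `sessions` connect/disconnect cycles on one reused connection object
+// and return how many "message" listeners remain attached afterwards.
+function runSessions(conn: EventEmitter, sessions: number, cleanup: boolean): number {
+  for (let i = 0; i < sessions; i++) {
+    // Each session registers its own handlers on the shared object.
+    conn.on("message", () => {});
+    conn.on("error", () => {});
+    conn.once("disconnect", () => {
+      // The recommended fix: drop every handler registered for this session.
+      if (cleanup) conn.removeAllListeners();
+    });
+    conn.emit("disconnect");
+  }
+  return conn.listenerCount("message");
+}
+
+const leaky = new EventEmitter();
+leaky.setMaxListeners(0); // silence MaxListenersExceededWarning for this demo
+console.log(runSessions(leaky, 100, false)); // 100: one stale listener per session
+console.log(runSessions(new EventEmitter(), 100, true)); // 0: cleaned up on disconnect
+```
+
+Without cleanup the listener count grows linearly with the number of sessions, which is the same growth pattern behind the 50 * 100k estimate; with cleanup it returns to zero after every disconnect.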
+
+## Rules
+
+- **Follow evidence**: Only propose conclusions backed by analysis
+- **Trace code carefully**: Don't guess execution flow
+- **Form hypotheses explicitly**: State assumptions
+- **Test thoroughly**: Verify before concluding
+- **Confidence levels**: Clearly indicate certainty
+- **No bandaid fixes**: Address root cause, not symptoms
+- **Document clearly**: Explain mechanism, not just symptoms
+
+## Error Handling
+
+| Situation | Action |
+|-----------|--------|
+| Insufficient info | Output what is known; ask the coordinator for more data |
+| Multiple hypotheses | Rank by likelihood, suggest test order |
+| Inconclusive evidence | Mark as "needs_more_info", suggest investigation areas |
+| Blocked investigation | Ask the develop worker to add logging |
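+
+One way a coordinator could encode this table, as a hedged sketch. The situation keys, the `HANDLING` record, and `suggestTestOrder` are hypothetical; the status and next_suggestion values come from the Output Format section, but pairing each row with a particular status is an assumption the table itself does not specify:
+
+```typescript
+type Situation = "insufficient_info" | "multiple_hypotheses" | "inconclusive_evidence" | "blocked";
+type Status = "success" | "needs_more_info" | "inconclusive";
+
+interface Handling {
+  status: Status;
+  nextSuggestion: "debug" | "develop";
+  action: string;
+}
+
+// Assumed mapping from each table row to WORKER_RESULT fields.
+const HANDLING: Record<Situation, Handling> = {
+  insufficient_info: { status: "needs_more_info", nextSuggestion: "debug", action: "Report what is known; ask the coordinator for more data" },
+  multiple_hypotheses: { status: "inconclusive", nextSuggestion: "debug", action: "Rank by likelihood; test in that order" },
+  inconclusive_evidence: { status: "needs_more_info", nextSuggestion: "debug", action: "Suggest investigation areas" },
+  blocked: { status: "needs_more_info", nextSuggestion: "develop", action: "Ask the develop worker to add logging" },
+};
+
+interface RankedHypothesis {
+  cause: string;
+  likelihood: number; // the worker's own estimate in [0, 1]
+}
+
+// "Rank by likelihood, suggest test order": most likely first. Array.prototype.sort
+// is stable, so hypotheses with equal likelihood keep their original order.
+function suggestTestOrder(hypotheses: RankedHypothesis[]): string[] {
+  return [...hypotheses].sort((a, b) => b.likelihood - a.likelihood).map((h) => h.cause);
+}
+
+// returns ["listener leak", "unbounded cache", "GC pressure"]
+const order = suggestTestOrder([
+  { cause: "GC pressure", likelihood: 0.2 },
+  { cause: "listener leak", likelihood: 0.7 },
+  { cause: "unbounded cache", likelihood: 0.5 },
+]);
+console.log(order, HANDLING.multiple_hypotheses.action);
+```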
+
+## Best Practices
+
+1. Understand problem fully before hypothesizing
+2. Form explicit hypothesis before testing
+3. Let evidence guide investigation
+4. Document all findings clearly
+5. Suggest verification steps
+6. Indicate confidence in conclusion