feat: add multi-mode workflow planning skill with session management and task generation

catlog22
2026-03-02 15:25:56 +08:00
parent 2c2b9d6e29
commit 121e834459
28 changed files with 6478 additions and 533 deletions

@@ -0,0 +1,85 @@
---
prefix: BENCH
inner_loop: false
message_types:
  success: bench_complete
  error: error
  fix: fix_required
---
# Performance Benchmarker
Run benchmarks comparing before/after optimization metrics. Validate that improvements meet plan success criteria and detect any regressions.
## Phase 2: Environment & Baseline Loading
| Input | Source | Required |
|-------|--------|----------|
| Baseline metrics | `<session>/artifacts/baseline-metrics.json` | Yes |
| Optimization plan | `<session>/artifacts/optimization-plan.md` | Yes |
| shared-memory.json | `<session>/wisdom/shared-memory.json` | Yes |
1. Extract session path from task description
2. Read baseline metrics -- extract pre-optimization performance numbers
3. Read optimization plan -- extract success criteria and target thresholds
4. Load shared-memory.json for project type and optimization scope
5. Detect available benchmark tools from project:

   | Signal | Benchmark Tool | Method |
   |--------|---------------|--------|
   | package.json + vitest/jest | Test runner benchmarks | Run existing perf tests |
   | package.json + webpack/vite | Bundle analysis | Compare build output sizes |
   | Cargo.toml + criterion | Rust benchmarks | cargo bench |
   | go.mod | Go benchmarks | go test -bench |
   | Makefile with bench target | Custom benchmarks | make bench |
   | No tooling detected | Manual measurement | Timed execution via Bash |

6. Get changed files scope from shared-memory (optimizer namespace)
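
The detection table in step 5 could be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: the function name `detect_benchmark_tools`, the returned labels, and the simple "is the name in the manifest" heuristics are all assumptions made for the example.

```python
import json
from pathlib import Path


def detect_benchmark_tools(project_root: str) -> list[str]:
    """Return hypothetical labels for every benchmark signal found under project_root."""
    root = Path(project_root)
    tools: list[str] = []

    package_json = root / "package.json"
    if package_json.is_file():
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
        deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
        if "vitest" in deps or "jest" in deps:
            tools.append("test-runner")        # run existing perf tests
        if "webpack" in deps or "vite" in deps:
            tools.append("bundle-analysis")    # compare build output sizes

    cargo = root / "Cargo.toml"
    if cargo.is_file() and "criterion" in cargo.read_text(encoding="utf-8"):
        tools.append("cargo-bench")            # cargo bench

    if (root / "go.mod").is_file():
        tools.append("go-test-bench")          # go test -bench

    makefile = root / "Makefile"
    if makefile.is_file() and any(
        line.startswith("bench:")
        for line in makefile.read_text(encoding="utf-8").splitlines()
    ):
        tools.append("make-bench")             # make bench

    # Fall back to manual timing when no tooling signal is present.
    return tools or ["manual"]
```

Returning a list rather than a single label reflects that one project can match several rows of the table at once (for example vitest together with vite).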
## Phase 3: Benchmark Execution
Run benchmarks matching detected project type:
**Frontend benchmarks**:
- Compare bundle size before/after (build output analysis)
- Measure render performance for affected components
- Check for dependency weight changes
**Backend benchmarks**:
- Measure endpoint response times for affected routes
- Profile memory usage under simulated load
- Verify database query performance improvements
**CLI / Library benchmarks**:
- Measure execution time for representative workloads
- Compare memory peak usage
- Test throughput under sustained load
**All project types**:
- Run existing test suite to verify no regressions
- Collect post-optimization metrics matching baseline format
- Calculate improvement percentages per metric
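
As a small worked example of the last step, the per-metric improvement percentage might be computed like this. The helper name `improvement_pct` and the `lower_is_better` flag are assumptions; the spec does not say how metric direction is recorded.

```python
def improvement_pct(baseline: float, current: float, lower_is_better: bool = True) -> float:
    """Percentage improvement of current over baseline; negative means a regression."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage")
    if lower_is_better:
        # Latency, memory, bundle size: a smaller value is an improvement.
        change = (baseline - current) / baseline
    else:
        # Throughput, requests per second: a larger value is an improvement.
        change = (current - baseline) / baseline
    return round(change * 100, 2)


# A response time dropping from 200 ms to 150 ms is a 25% improvement.
print(improvement_pct(200.0, 150.0))                           # 25.0
# Throughput rising from 1000 to 1200 ops/s is a 20% improvement.
print(improvement_pct(1000.0, 1200.0, lower_is_better=False))  # 20.0
```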
## Phase 4: Result Analysis
Compare against baseline and plan criteria:
| Metric | Threshold | Verdict |
|--------|-----------|---------|
| Target improvement vs baseline | Meets plan success criteria | PASS |
| No regression in unrelated metrics | < 5% degradation allowed | PASS |
| All plan success criteria met | Every criterion satisfied | PASS |
| Improvement below target | > 50% of target achieved | WARN |
| Regression detected | Any unrelated metric degrades > 5% | FAIL -> fix_required |
| Plan criteria not met | Any criterion not satisfied | FAIL -> fix_required |
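
One possible reading of the verdict table, sketched in Python. The table leaves some cases open (a degradation of exactly 5%, and whether a partially met criterion is WARN or FAIL), so this sketch assumes: a targeted metric that reaches more than 50% of its target is WARN, one that reaches 50% or less is FAIL, and an untargeted metric fails only when it degrades by more than 5%. `MetricResult` and `overall_verdict` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

REGRESSION_LIMIT_PCT = 5.0  # from the table: unrelated metrics may degrade by less than 5%


@dataclass
class MetricResult:
    name: str
    improvement_pct: float               # positive = better, negative = worse
    target_pct: Optional[float] = None   # None marks a metric the plan did not target


def overall_verdict(results: list[MetricResult]) -> str:
    """Collapse per-metric results into PASS / WARN / FAIL (assumed interpretation)."""
    verdict = "PASS"
    for r in results:
        if r.target_pct is None:
            # Unrelated metric: only a regression beyond the limit matters.
            if -r.improvement_pct > REGRESSION_LIMIT_PCT:
                return "FAIL"
        elif r.improvement_pct < r.target_pct:
            if r.improvement_pct > 0.5 * r.target_pct:
                verdict = "WARN"   # below target, but more than half achieved
            else:
                return "FAIL"      # criterion clearly not met
    return verdict
```

A FAIL result would then be reported with the `fix_required` message type so that a FIX task can be created.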
1. Write benchmark results to `<session>/artifacts/benchmark-results.json`:
   - Per-metric: name, baseline value, current value, improvement %, verdict
   - Overall verdict: PASS / WARN / FAIL
   - Regression details (if any)
2. Update `<session>/wisdom/shared-memory.json` under `benchmarker` namespace:
   - Read existing -> merge `{ "benchmarker": { verdict, improvements, regressions } }` -> write back
3. If verdict is FAIL, include detailed feedback in message for FIX task creation:
   - Which metrics failed, by how much, suggested investigation areas
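
The "read existing -> merge -> write back" update in step 2 could look like the sketch below. It assumes a shallow merge within the role's namespace and a missing file being treated as empty; neither detail is stated in the spec, and `merge_namespace` is a hypothetical helper name.

```python
import json
from pathlib import Path


def merge_namespace(memory_path: str, namespace: str, payload: dict) -> dict:
    """Merge payload into one namespace of shared-memory.json, leaving other namespaces intact."""
    path = Path(memory_path)
    # Read existing content; start from an empty object if the file is absent.
    memory = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    # Shallow merge: new keys overwrite old ones inside this namespace only.
    memory[namespace] = {**memory.get(namespace, {}), **payload}
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return memory
```

For the benchmarker, a call might be `merge_namespace(path, "benchmarker", {"verdict": "PASS", "improvements": [...], "regressions": []})`, where `path` points at the session's shared-memory.json. Keeping each role inside its own namespace means a write by one role does not clobber what the profiler or strategist recorded earlier.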

@@ -0,0 +1,76 @@
---
prefix: IMPL
inner_loop: true
additional_prefixes: [FIX]
subagents: [explore]
message_types:
  success: impl_complete
  error: error
  fix: fix_required
---
# Code Optimizer
Implement optimization changes following the strategy plan. For FIX tasks, apply targeted corrections based on review/benchmark feedback.
## Modes
| Mode | Task Prefix | Trigger | Focus |
|------|-------------|---------|-------|
| Implement | IMPL | Strategy plan ready | Apply optimizations per plan priority |
| Fix | FIX | Review/bench feedback | Targeted fixes for identified issues |
## Phase 2: Plan & Context Loading
| Input | Source | Required |
|-------|--------|----------|
| Optimization plan | `<session>/artifacts/optimization-plan.md` | Yes (IMPL) |
| Review/bench feedback | From task description | Yes (FIX) |
| shared-memory.json | `<session>/wisdom/shared-memory.json` | Yes |
| Wisdom files | `<session>/wisdom/patterns.md` | No |
| Context accumulator | From prior IMPL/FIX tasks | Yes (inner loop) |
1. Extract session path and task mode (IMPL or FIX) from task description
2. For IMPL: read optimization plan -- extract priority-ordered changes and success criteria
3. For FIX: parse review/benchmark feedback for specific issues to address
4. Use `explore` subagent to load implementation context for target files
5. For inner loop: load context_accumulator from prior IMPL/FIX tasks to avoid re-reading context that earlier iterations already gathered
## Phase 3: Code Implementation
Implementation backend selection:
| Backend | Condition | Method |
|---------|-----------|--------|
| CLI | Multi-file optimization with clear plan | ccw cli --tool gemini --mode write |
| Direct | Single-file changes or targeted fixes | Inline Edit/Write tools |
For IMPL tasks:
- Apply optimizations in plan priority order (P0 first, then P1, etc.)
- Follow implementation guidance from plan (target files, patterns)
- Preserve existing behavior -- optimization must not break functionality
For FIX tasks:
- Read specific issues from review/benchmark feedback
- Apply targeted corrections to flagged code locations
- Verify the fix addresses the exact concern raised
General rules:
- Make minimal, focused changes per optimization
- Add comments only where optimization logic is non-obvious
- Preserve existing code style and conventions
## Phase 4: Self-Validation
| Check | Method | Pass Criteria |
|-------|--------|---------------|
| Syntax | IDE diagnostics or build check | No new errors |
| File integrity | Verify all planned files exist and are modified | All present |
| Acceptance | Match optimization plan success criteria | All target metrics addressed |
| No regression | Run existing tests if available | No new failures |
If validation fails, attempt auto-fix (max 2 attempts) before reporting error.
Append to context_accumulator for next IMPL/FIX task:
- Files modified, optimizations applied, validation results
- Any discovered patterns or caveats for subsequent iterations
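
The retry rule in this phase (at most two auto-fix attempts before an error is reported) can be sketched as a small loop. The callables `validate` and `auto_fix` are placeholders for whatever checks and fixes the optimizer actually runs; their shapes are assumptions made for illustration.

```python
from typing import Callable

MAX_AUTO_FIX_ATTEMPTS = 2  # "max 2 attempts" from the rule above


def validate_with_auto_fix(
    validate: Callable[[], list[str]],
    auto_fix: Callable[[list[str]], None],
) -> list[str]:
    """Run validation, retrying after auto-fix; return the failures that remain."""
    failures = validate()
    attempts = 0
    while failures and attempts < MAX_AUTO_FIX_ATTEMPTS:
        auto_fix(failures)      # try to repair the reported problems
        attempts += 1
        failures = validate()   # re-check after each fix attempt
    return failures
```

An empty return value means validation passed. A non-empty one is what would be reported with the `error` message type, and in either case the outcome is a natural thing to append to the context accumulator.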

@@ -0,0 +1,73 @@
---
prefix: PROFILE
inner_loop: false
subagents: [explore]
message_types:
  success: profile_complete
  error: error
---
# Performance Profiler
Profile application performance to identify CPU, memory, I/O, network, and rendering bottlenecks. Produce quantified baseline metrics and a ranked bottleneck report.
## Phase 2: Context & Environment Detection
| Input | Source | Required |
|-------|--------|----------|
| Task description | From task subject/description | Yes |
| Session path | Extracted from task description | Yes |
| shared-memory.json | `<session>/wisdom/shared-memory.json` | No |
1. Extract session path and target scope from task description
2. Detect project type by scanning for framework markers:

   | Signal File | Project Type | Profiling Focus |
   |-------------|-------------|-----------------|
   | package.json + React/Vue/Angular | Frontend | Render time, bundle size, FCP/LCP/CLS |
   | package.json + Express/Fastify/NestJS | Backend Node | CPU hotspots, memory, DB queries |
   | Cargo.toml / go.mod / pom.xml | Native/JVM Backend | CPU, memory, GC tuning |
   | Mixed framework markers | Full-stack | Split into FE + BE profiling passes |
   | CLI entry / bin/ directory | CLI Tool | Startup time, throughput, memory peak |
   | No detection | Generic | All profiling dimensions |

3. Use `explore` subagent to map performance-critical code paths within target scope
4. Detect available profiling tools (test runners, benchmark harnesses, linting tools)
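
A rough sketch of the project-type detection in step 2. The npm package names used as markers (`react`, `vue`, `@angular/core`, `express`, `fastify`, `@nestjs/core`), the precedence order, and the returned labels are assumptions for illustration. "Mixed framework markers" is simplified here to "both frontend and backend npm dependencies present".

```python
import json
from pathlib import Path

FRONTEND_DEPS = {"react", "vue", "@angular/core"}
BACKEND_DEPS = {"express", "fastify", "@nestjs/core"}


def detect_project_type(project_root: str) -> str:
    """Map marker files under project_root to a hypothetical project-type label."""
    root = Path(project_root)
    deps: set[str] = set()

    package_json = root / "package.json"
    if package_json.is_file():
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
        deps = set(manifest.get("dependencies", {})) | set(manifest.get("devDependencies", {}))

    has_frontend = bool(deps & FRONTEND_DEPS)
    has_backend = bool(deps & BACKEND_DEPS)
    if has_frontend and has_backend:
        return "full-stack"          # split into FE + BE profiling passes
    if has_frontend:
        return "frontend"
    if has_backend:
        return "backend-node"
    if any((root / name).is_file() for name in ("Cargo.toml", "go.mod", "pom.xml")):
        return "native-jvm-backend"
    if (root / "bin").is_dir():
        return "cli"
    return "generic"                 # no detection: profile all dimensions
```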
## Phase 3: Performance Profiling
Execute profiling based on detected project type:
**Frontend profiling**:
- Analyze bundle size and dependency weight via build output
- Identify render-blocking resources and heavy components
- Check for unnecessary re-renders, large DOM trees, unoptimized assets
**Backend profiling**:
- Trace hot code paths via execution analysis or instrumented runs
- Identify slow database queries, N+1 patterns, missing indexes
- Check memory allocation patterns and potential leaks
**CLI / Library profiling**:
- Measure startup time and critical path latency
- Profile throughput under representative workloads
- Identify memory peaks and allocation churn
**All project types**:
- Collect quantified baseline metrics (timing, memory, throughput)
- Rank top 3-5 bottlenecks by severity (Critical / High / Medium)
- Record evidence: file paths, line numbers, measured values
## Phase 4: Report Generation
1. Write baseline metrics to `<session>/artifacts/baseline-metrics.json`:
   - Key metric names, measured values, units, measurement method
   - Timestamp and environment details
2. Write bottleneck report to `<session>/artifacts/bottleneck-report.md`:
   - Ranked list of bottlenecks with severity, location (file:line), measured impact
   - Evidence summary per bottleneck
   - Detected project type and profiling methods used
3. Update `<session>/wisdom/shared-memory.json` under `profiler` namespace:
   - Read existing -> merge `{ "profiler": { project_type, bottleneck_count, top_bottleneck, scope } }` -> write back
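
To make step 1 concrete, here is one shape the baseline file could take. The spec lists only the kinds of information to record (metric names, values, units, measurement method, timestamp, environment), so every key name below is an assumption, and the sample metric is a made-up placeholder rather than a real measurement.

```python
import json
import platform
from datetime import datetime, timezone


def build_baseline(metrics: list[dict]) -> dict:
    """Wrap measured metrics with a timestamp and basic environment details."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "environment": {
            "os": platform.system(),
            "python": platform.python_version(),
        },
        "metrics": metrics,
    }


# Placeholder values for illustration only.
baseline = build_baseline([
    {
        "name": "startup_time",
        "value": 412.0,
        "unit": "ms",
        "method": "timed execution, median of 5 runs",
    },
])
print(json.dumps(baseline, indent=2))
```

Whatever shape is chosen, the benchmarker later collects post-optimization metrics "matching baseline format", so keeping names and units stable between the two files is what makes the comparison possible.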

@@ -0,0 +1,69 @@
---
prefix: REVIEW
inner_loop: false
additional_prefixes: [QUALITY]
discuss_rounds: [DISCUSS-REVIEW]
subagents: [discuss]
message_types:
  success: review_complete
  error: error
  fix: fix_required
---
# Optimization Reviewer
Review optimization code changes for correctness, side effects, regression risks, and adherence to best practices. Provide structured verdicts with actionable feedback.
## Phase 2: Context Loading
| Input | Source | Required |
|-------|--------|----------|
| Optimization code changes | From IMPL task artifacts / git diff | Yes |
| Optimization plan | `<session>/artifacts/optimization-plan.md` | Yes |
| Benchmark results | `<session>/artifacts/benchmark-results.json` | No |
| shared-memory.json | `<session>/wisdom/shared-memory.json` | Yes |
1. Extract session path from task description
2. Read optimization plan -- understand intended changes and success criteria
3. Load shared-memory.json for optimizer namespace (files modified, patterns applied)
4. Identify changed files from optimizer context -- read each modified file
5. If benchmark results are available, read them to cross-reference measured impact with the code review findings
## Phase 3: Multi-Dimension Review
Analyze optimization changes across five dimensions:
| Dimension | Focus | Severity |
|-----------|-------|----------|
| Correctness | Logic errors, off-by-one, race conditions, null safety | Critical |
| Side effects | Unintended behavior changes, API contract breaks, data loss | Critical |
| Maintainability | Code clarity, complexity increase, naming, documentation | High |
| Regression risk | Impact on unrelated code paths, implicit dependencies | High |
| Best practices | Idiomatic patterns, framework conventions, optimization anti-patterns | Medium |
Per-dimension review process:
- Scan modified files for patterns matching each dimension
- Record findings with severity (Critical / High / Medium / Low)
- Include specific file:line references and suggested fixes
If any Critical findings are detected, invoke the `discuss` subagent (DISCUSS-REVIEW round) to validate the assessment before issuing a verdict.
## Phase 4: Verdict & Feedback
Classify overall verdict based on findings:
| Verdict | Condition | Action |
|---------|-----------|--------|
| APPROVE | No Critical or High findings | Send review_complete |
| REVISE | Has High findings, no Critical | Send fix_required with detailed feedback |
| REJECT | Has Critical findings or fundamental approach flaw | Send fix_required + flag for strategist escalation |
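
The verdict table maps directly onto a few conditionals. In this sketch, findings are represented only by their severity strings, and `fundamental_flaw` stands in for the reviewer's judgment that the approach itself is wrong; both representations are assumptions.

```python
def review_verdict(severities: list[str], fundamental_flaw: bool = False) -> str:
    """Classify the overall review verdict from the severities of all findings."""
    if fundamental_flaw or "Critical" in severities:
        return "REJECT"    # fix_required + flag for strategist escalation
    if "High" in severities:
        return "REVISE"    # fix_required with detailed feedback
    return "APPROVE"       # review_complete; Medium/Low findings do not block


print(review_verdict(["Medium", "Low"]))     # APPROVE
print(review_verdict(["High", "Medium"]))    # REVISE
print(review_verdict(["Critical", "High"]))  # REJECT
```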
1. Write review report to `<session>/artifacts/review-report.md`:
   - Per-dimension findings with severity, file:line, description
   - Overall verdict with rationale
   - Specific fix instructions for REVISE/REJECT verdicts
2. Update `<session>/wisdom/shared-memory.json` under `reviewer` namespace:
   - Read existing -> merge `{ "reviewer": { verdict, finding_count, critical_count, dimensions_reviewed } }` -> write back
3. If DISCUSS-REVIEW was triggered, record discussion summary in `<session>/discussions/DISCUSS-REVIEW.md`

@@ -0,0 +1,73 @@
---
prefix: STRATEGY
inner_loop: false
discuss_rounds: [DISCUSS-OPT]
subagents: [discuss]
message_types:
  success: strategy_complete
  error: error
---
# Optimization Strategist
Analyze bottleneck reports and baseline metrics to design a prioritized optimization plan with concrete strategies, expected improvements, and risk assessments.
## Phase 2: Analysis Loading
| Input | Source | Required |
|-------|--------|----------|
| Bottleneck report | `<session>/artifacts/bottleneck-report.md` | Yes |
| Baseline metrics | `<session>/artifacts/baseline-metrics.json` | Yes |
| shared-memory.json | `<session>/wisdom/shared-memory.json` | Yes |
| Wisdom files | `<session>/wisdom/patterns.md` | No |
1. Extract session path from task description
2. Read bottleneck report -- extract ranked bottleneck list with severities
3. Read baseline metrics -- extract current performance numbers
4. Load shared-memory.json for profiler findings (project_type, scope)
5. Assess overall optimization complexity:

   | Bottleneck Count | Severity Mix | Complexity |
   |-----------------|-------------|------------|
   | 1-2 | All Medium | Low |
   | 2-3 | Mix of High/Medium | Medium |
   | 3+ or any Critical | Any Critical present | High |
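
The count ranges in the complexity table overlap (2 appears in two rows, 3 in two rows), so any implementation has to pick a tie-breaking rule. The sketch below shows one such reading, which is an assumption rather than the author's stated rule: any Critical or more than three bottlenecks is High, any High severity or exactly three bottlenecks is Medium, and everything else is Low.

```python
def assess_complexity(severities: list[str]) -> str:
    """Estimate optimization complexity from the severities of the ranked bottlenecks."""
    count = len(severities)
    if "Critical" in severities or count > 3:
        return "High"
    if "High" in severities or count == 3:
        return "Medium"
    return "Low"   # one or two bottlenecks, all Medium


print(assess_complexity(["Medium", "Medium"]))        # Low
print(assess_complexity(["High", "Medium"]))          # Medium
print(assess_complexity(["Critical"]))                # High
```

Under this reading, a High result is what triggers the DISCUSS-OPT round in the next phase.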
## Phase 3: Strategy Formulation
For each bottleneck, select optimization approach by type:
| Bottleneck Type | Strategies | Risk Level |
|----------------|-----------|------------|
| CPU hotspot | Algorithm optimization, memoization, caching, worker threads | Medium |
| Memory leak/bloat | Pool reuse, lazy initialization, WeakRef, scope cleanup | High |
| I/O bound | Batching, async pipelines, streaming, connection pooling | Medium |
| Network latency | Request coalescing, compression, CDN, prefetching | Low |
| Rendering | Virtualization, memoization, CSS containment, code splitting | Medium |
| Database | Index optimization, query rewriting, caching layer, denormalization | High |
Prioritize optimizations by impact/effort ratio:
| Priority | Criteria |
|----------|----------|
| P0 (Critical) | High impact + Low effort -- quick wins |
| P1 (High) | High impact + Medium effort |
| P2 (Medium) | Medium impact + Low effort |
| P3 (Low) | Low impact or High effort -- defer |
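
The priority table can be expressed as a lookup with a default. Combinations the table does not list (for example Medium impact with Medium effort) fall through to P3 in this sketch; that default is an assumption, as are the function name and the use of plain strings for impact and effort.

```python
# Explicit rows from the priority table: (impact, effort) -> priority.
PRIORITY_TABLE = {
    ("High", "Low"): "P0",      # quick wins
    ("High", "Medium"): "P1",
    ("Medium", "Low"): "P2",
}


def priority(impact: str, effort: str) -> str:
    """Return the priority bucket for an optimization given its impact and effort."""
    if impact == "Low" or effort == "High":
        return "P3"             # low impact or high effort: defer
    # Unlisted combinations also default to P3 (assumed).
    return PRIORITY_TABLE.get((impact, effort), "P3")


print(priority("High", "Low"))     # P0
print(priority("High", "High"))    # P3
print(priority("Medium", "Low"))   # P2
```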
If complexity is High, invoke `discuss` subagent (DISCUSS-OPT round) to evaluate trade-offs between competing strategies before finalizing the plan.
Define measurable success criteria per optimization (target metric value or improvement %).
## Phase 4: Plan Output
1. Write optimization plan to `<session>/artifacts/optimization-plan.md`:
   - Priority-ordered list of optimizations
   - Per optimization: target bottleneck, strategy, expected improvement %, risk level
   - Success criteria: specific metric thresholds to verify
   - Implementation guidance: files to modify, patterns to apply
2. Update `<session>/wisdom/shared-memory.json` under `strategist` namespace:
   - Read existing -> merge `{ "strategist": { complexity, optimization_count, priorities, discuss_used } }` -> write back
3. If DISCUSS-OPT was triggered, record discussion summary in `<session>/discussions/DISCUSS-OPT.md`