feat: migrate all codex team skills from spawn_agents_on_csv to spawn_agent + wait_agent architecture

- Delete 21 old team skill directories using CSV-wave pipeline pattern (~100+ files) - Delete old team-lifecycle (v3) and team-planex-v2 - Create generic team-worker.toml and team-supervisor.toml (replacing tlv4-specific TOMLs) - Convert 19 team skills from Claude Code format (Agent/SendMessage/TaskCreate) to Codex format (spawn_agent/wait_agent/tasks.json/request_user_input) - Update team-lifecycle-v4 to use generic agent types (team_worker/team_supervisor) - Convert all coordinator role files: dispatch.md, monitor.md, role.md - Convert all worker role files: remove run_in_background, fix Bash syntax - Convert all specs/pipelines.md references - Final state: 20 team skills, 217 .md files, zero Claude Code API residuals Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 19:56:37 +08:00 · 2026-03-24 16:54:48 +08:00
parent 54283e5dbb
commit 1e560ab8e8
334 changed files with 28996 additions and 35516 deletions
--- a/.codex/skills/team-perf-opt/roles/benchmarker/role.md
+++ b/.codex/skills/team-perf-opt/roles/benchmarker/role.md
@@ -0,0 +1,89 @@
+---
+role: benchmarker
+prefix: BENCH
+inner_loop: false
+message_types:
+  success: bench_complete
+  error: error
+  fix: fix_required
+---
+
+# Performance Benchmarker
+
+Run benchmarks comparing before/after optimization metrics. Validate that improvements meet plan success criteria and detect any regressions.
+
+## Phase 2: Environment & Baseline Loading
+
+| Input | Source | Required |
+|-------|--------|----------|
+| Baseline metrics | <session>/artifacts/baseline-metrics.json (shared) | Yes |
+| Optimization plan / detail | Varies by mode (see below) | Yes |
+| .msg/meta.json | <session>/.msg/meta.json | Yes |
+
+1. Extract session path from task description
+2. **Detect branch/pipeline context** from task description:
+
+| Task Description Field | Value | Context |
+|----------------------|-------|---------|
+| `BranchId: B{NN}` | Present | Fan-out branch -- benchmark only this branch's metrics |
+| `PipelineId: {P}` | Present | Independent pipeline -- use pipeline-scoped baseline |
+| Neither present | - | Single mode -- full benchmark |
+
+3. **Load baseline metrics**:
+   - Single / Fan-out: Read `<session>/artifacts/baseline-metrics.json` (shared baseline)
+   - Independent: Read `<session>/artifacts/pipelines/{P}/baseline-metrics.json`
+
+4. **Load optimization context**:
+   - Single: Read `<session>/artifacts/optimization-plan.md`
+   - Fan-out branch: Read `<session>/artifacts/branches/B{NN}/optimization-detail.md`
+   - Independent: Read `<session>/artifacts/pipelines/{P}/optimization-plan.md`
+
+5. Load .msg/meta.json for project type and optimization scope
+6. Detect available benchmark tools from project:
+
+| Signal | Benchmark Tool | Method |
+|--------|---------------|--------|
+| package.json + vitest/jest | Test runner benchmarks | Run existing perf tests |
+| package.json + webpack/vite | Bundle analysis | Compare build output sizes |
+| Cargo.toml + criterion | Rust benchmarks | cargo bench |
+| go.mod | Go benchmarks | go test -bench |
+| Makefile with bench target | Custom benchmarks | make bench |
+| No tooling detected | Manual measurement | Timed execution via Bash |
+
+7. Get changed files scope from shared-memory (optimizer namespace, scoped by branch/pipeline)
+
+## Phase 3: Benchmark Execution
+
+Run benchmarks matching detected project type:
+
+**Frontend benchmarks**: Compare bundle size, render performance, dependency weight changes.
+
+**Backend benchmarks**: Measure endpoint response times, memory usage under load, database query improvements.
+
+**CLI / Library benchmarks**: Execution time, memory peak, throughput under sustained load.
+
+**All project types**:
+- Run existing test suite to verify no regressions
+- Collect post-optimization metrics matching baseline format
+- Calculate improvement percentages per metric
+
+**Branch-scoped benchmarking** (fan-out mode):
+- Only benchmark metrics relevant to this branch's optimization
+- Still check for regressions across all metrics
+
+## Phase 4: Result Analysis
+
+Compare against baseline and plan criteria:
+
+| Metric | Threshold | Verdict |
+|--------|-----------|---------|
+| Target improvement vs baseline | Meets plan success criteria | PASS |
+| No regression in unrelated metrics | < 5% degradation allowed | PASS |
+| All plan success criteria met | Every criterion satisfied | PASS |
+| Improvement below target | > 50% of target achieved | WARN |
+| Regression detected | Any unrelated metric degrades > 5% | FAIL -> fix_required |
+| Plan criteria not met | Any criterion not satisfied | FAIL -> fix_required |
+
+1. Write benchmark results to output path (scoped by branch/pipeline/single)
+2. Update `<session>/.msg/meta.json` under scoped namespace
+3. If verdict is FAIL, include detailed feedback in message for FIX task creation