Claude-Code-Workflow/.codex/skills/team-perf-opt/roles/benchmarker/role.md

| role | prefix | inner_loop | message_types |
| --- | --- | --- | --- |
| benchmarker | BENCH | false | bench_complete, error, fix_required |

# Performance Benchmarker

Run benchmarks comparing before/after optimization metrics. Validate that improvements meet plan success criteria and detect any regressions.

## Phase 2: Environment & Baseline Loading

| Input | Source | Required |
| --- | --- | --- |
| Baseline metrics | `/artifacts/baseline-metrics.json` (shared) | Yes |
| Optimization plan / detail | Varies by mode (see below) | Yes |
| `.msg/meta.json` | `/.msg/meta.json` | Yes |
1. Extract the session path from the task description.
2. Detect branch/pipeline context from the task description:

   | Task Description Field | Value | Context |
   | --- | --- | --- |
   | `BranchId: B{NN}` | Present | Fan-out branch: benchmark only this branch's metrics |
   | `PipelineId: {P}` | Present | Independent pipeline: use the pipeline-scoped baseline |
   | Neither | - | Single mode: full benchmark |
3. Load baseline metrics:
   - Single / fan-out: read `<session>/artifacts/baseline-metrics.json` (shared baseline)
   - Independent: read `<session>/artifacts/pipelines/{P}/baseline-metrics.json`
4. Load optimization context:
   - Single: read `<session>/artifacts/optimization-plan.md`
   - Fan-out branch: read `<session>/artifacts/branches/B{NN}/optimization-detail.md`
   - Independent: read `<session>/artifacts/pipelines/{P}/optimization-plan.md`
5. Load `.msg/meta.json` for the project type and optimization scope.

6. Detect available benchmark tools from the project:

   | Signal | Benchmark Tool | Method |
   | --- | --- | --- |
   | `package.json` + vitest/jest | Test runner benchmarks | Run existing perf tests |
   | `package.json` + webpack/vite | Bundle analysis | Compare build output sizes |
   | `Cargo.toml` + criterion | Rust benchmarks | `cargo bench` |
   | `go.mod` | Go benchmarks | `go test -bench` |
   | `Makefile` with `bench` target | Custom benchmarks | `make bench` |
   | No tooling detected | Manual measurement | Timed execution via Bash |
7. Get the changed-files scope from shared-memory (optimizer namespace, scoped by branch/pipeline).
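The context-detection and baseline-loading steps above can be sketched in Bash. This is a minimal sketch, not part of the documented contract: `TASK_DESC`, `SESSION`, and the `grep`/`sed` parsing approach are illustrative assumptions.

```shell
# Hypothetical sketch: resolve mode and input paths from the task description.
# TASK_DESC and SESSION are assumed inputs with illustrative names/values.
TASK_DESC="BranchId: B03"
SESSION="/tmp/session"

if printf '%s' "$TASK_DESC" | grep -q 'BranchId: B[0-9]'; then
  MODE="fanout"
  BRANCH=$(printf '%s' "$TASK_DESC" | sed -n 's/.*BranchId: \(B[0-9]*\).*/\1/p')
  BASELINE="$SESSION/artifacts/baseline-metrics.json"   # shared baseline
  PLAN="$SESSION/artifacts/branches/$BRANCH/optimization-detail.md"
elif printf '%s' "$TASK_DESC" | grep -q 'PipelineId:'; then
  MODE="independent"
  PIPELINE=$(printf '%s' "$TASK_DESC" | sed -n 's/.*PipelineId: \([^ ]*\).*/\1/p')
  BASELINE="$SESSION/artifacts/pipelines/$PIPELINE/baseline-metrics.json"
  PLAN="$SESSION/artifacts/pipelines/$PIPELINE/optimization-plan.md"
else
  MODE="single"
  BASELINE="$SESSION/artifacts/baseline-metrics.json"
  PLAN="$SESSION/artifacts/optimization-plan.md"
fi

echo "$MODE $BASELINE"
```

The point of the sketch is that fan-out branches share one baseline file while independent pipelines each get their own.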

## Phase 3: Benchmark Execution

Run benchmarks matching detected project type:

**Frontend benchmarks:** compare bundle size, render performance, and dependency weight changes.

**Backend benchmarks:** measure endpoint response times, memory usage under load, and database query improvements.

**CLI / library benchmarks:** measure execution time, peak memory, and throughput under sustained load.
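For the "no tooling detected" fallback (timed execution via Bash), a minimal manual-measurement sketch; the workload command and run count are placeholders:

```shell
# Manual-measurement fallback: average wall-clock time over N runs, in ms.
# `sleep 0.01` is a stand-in for the real benchmark command; N is arbitrary.
N=5
total=0
for _ in $(seq "$N"); do
  start=$(date +%s%N)                            # nanoseconds (GNU date)
  sleep 0.01                                     # placeholder workload
  end=$(date +%s%N)
  total=$(( total + (end - start) / 1000000 ))   # ms for this run
done
echo "avg_ms=$(( total / N ))"
```

Averaging several runs smooths out scheduler noise, which matters when the baseline and post-optimization numbers differ by only a few percent.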

**All project types:**

- Run the existing test suite to verify no regressions
- Collect post-optimization metrics matching the baseline format
- Calculate improvement percentages per metric
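The per-metric improvement calculation can be sketched as below; the helper name and sample values are illustrative, and it assumes a "lower is better" metric such as bundle size or response time:

```shell
# Hypothetical helper: percent improvement for a "lower is better" metric.
improvement_pct() {
  # $1 = baseline value, $2 = post-optimization value
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f", (b - c) / b * 100 }'
}

improvement_pct 200 150   # e.g. a 200 KB bundle shrunk to 150 KB -> 25.0
```

A negative result from the same formula indicates a regression, so one helper can serve both the improvement and regression checks.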

**Branch-scoped benchmarking (fan-out mode):**

- Benchmark only the metrics relevant to this branch's optimization
- Still check for regressions across all metrics

## Phase 4: Result Analysis

Compare against baseline and plan criteria:

| Condition | Threshold | Verdict |
| --- | --- | --- |
| Target improvement vs baseline | Meets plan success criteria | PASS |
| No regression in unrelated metrics | < 5% degradation allowed | PASS |
| All plan success criteria met | Every criterion satisfied | PASS |
| Improvement below target | > 50% of target achieved | WARN |
| Regression detected | Any unrelated metric degrades > 5% | FAIL -> fix_required |
| Plan criteria not met | Any criterion not satisfied | FAIL -> fix_required |
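The verdict rules can be sketched as a small Bash helper. The function name and argument layout are illustrative; only the 5% regression ceiling and the 50%-of-target WARN band come from the document:

```shell
# Hypothetical verdict helper.
# $1 = achieved improvement %, $2 = target improvement %, $3 = worst
# regression % across unrelated metrics. awk handles the float comparisons.
verdict() {
  if awk -v r="$3" 'BEGIN { exit !(r > 5) }'; then
    echo "FAIL"   # unrelated metric degraded > 5% -> fix_required
  elif awk -v i="$1" -v t="$2" 'BEGIN { exit !(i >= t) }'; then
    echo "PASS"   # meets the plan's improvement target
  elif awk -v i="$1" -v t="$2" 'BEGIN { exit !(i > t / 2) }'; then
    echo "WARN"   # below target, but > 50% of target achieved
  else
    echo "FAIL"   # plan criteria not met -> fix_required
  fi
}

verdict 30 25 0   # PASS
verdict 15 25 0   # WARN
verdict 30 25 8   # FAIL (regression dominates even when the target is hit)
```

Note the regression check runs first: per the table, any > 5% degradation forces FAIL regardless of how well the target metric improved.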
1. Write benchmark results to the output path (scoped by branch/pipeline/single).
2. Update `<session>/.msg/meta.json` under the scoped namespace.
3. If the verdict is FAIL, include detailed feedback in the message for FIX task creation.
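Step 1 might look like the following; the results filename and JSON fields are assumptions, not a documented schema (the only stated constraint is that metrics match the baseline format):

```shell
# Hypothetical result-writing sketch. In fan-out/independent modes, OUT
# would live under artifacts/branches/{B} or artifacts/pipelines/{P}.
SESSION="/tmp/session"
VERDICT="PASS"
OUT="$SESSION/artifacts/benchmark-results.json"
mkdir -p "$(dirname "$OUT")"
cat > "$OUT" <<EOF
{
  "verdict": "$VERDICT",
  "improvements": { "bundle_size_pct": 25.0 },
  "regressions": []
}
EOF
echo "wrote $OUT"
```

Keeping the results file machine-readable lets the coordinator create FIX tasks from the `regressions` array without re-parsing prose.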