Claude-Code-Workflow/.codex/skills/team-perf-opt/roles/benchmarker/role.md

| role | prefix | inner_loop | message_types |
| --- | --- | --- | --- |
| benchmarker | BENCH | false | bench_complete, error, fix_required |

# Performance Benchmarker

Run benchmarks comparing before/after optimization metrics. Validate that improvements meet plan success criteria and detect any regressions.

## Phase 2: Environment & Baseline Loading

| Input | Source | Required |
| --- | --- | --- |
| Baseline metrics | `/artifacts/baseline-metrics.json` (shared) | Yes |
| Optimization plan / detail | Varies by mode (see below) | Yes |
| `.msg/meta.json` | `/.msg/meta.json` | Yes |
1. Extract the session path from the task description.
2. Detect branch/pipeline context from the task description:

   | Task Description Field | Value | Context |
   | --- | --- | --- |
   | `BranchId: B{NN}` | Present | Fan-out branch: benchmark only this branch's metrics |
   | `PipelineId: {P}` | Present | Independent pipeline: use the pipeline-scoped baseline |
   | Neither | - | Single mode: full benchmark |
3. Load baseline metrics:
   - Single / fan-out: read `<session>/artifacts/baseline-metrics.json` (shared baseline)
   - Independent: read `<session>/artifacts/pipelines/{P}/baseline-metrics.json`
4. Load optimization context:
   - Single: read `<session>/artifacts/optimization-plan.md`
   - Fan-out branch: read `<session>/artifacts/branches/B{NN}/optimization-detail.md`
   - Independent: read `<session>/artifacts/pipelines/{P}/optimization-plan.md`
5. Load `.msg/meta.json` for the project type and optimization scope.

6. Detect available benchmark tools from the project:

   | Signal | Benchmark Tool | Method |
   | --- | --- | --- |
   | `package.json` + vitest/jest | Test runner benchmarks | Run existing perf tests |
   | `package.json` + webpack/vite | Bundle analysis | Compare build output sizes |
   | `Cargo.toml` + criterion | Rust benchmarks | `cargo bench` |
   | `go.mod` | Go benchmarks | `go test -bench` |
   | `Makefile` with `bench` target | Custom benchmarks | `make bench` |
   | No tooling detected | Manual measurement | Timed execution via Bash |
7. Get the changed-files scope from shared-memory (optimizer namespace, scoped by branch/pipeline).
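The context-detection and baseline-loading steps above can be sketched in Bash. This is a minimal sketch, not part of the documented contract: `TASK_DESC`, `SESSION`, and the `grep`/`sed` parsing approach are illustrative assumptions.

```shell
# Hypothetical sketch: resolve mode and input paths from the task description.
# TASK_DESC and SESSION are assumed inputs with illustrative names/values.
TASK_DESC="BranchId: B03"
SESSION="/tmp/session"

if printf '%s' "$TASK_DESC" | grep -q 'BranchId: B[0-9]'; then
  MODE="fanout"
  BRANCH=$(printf '%s' "$TASK_DESC" | sed -n 's/.*BranchId: \(B[0-9]*\).*/\1/p')
  BASELINE="$SESSION/artifacts/baseline-metrics.json"   # shared baseline
  PLAN="$SESSION/artifacts/branches/$BRANCH/optimization-detail.md"
elif printf '%s' "$TASK_DESC" | grep -q 'PipelineId:'; then
  MODE="independent"
  PIPELINE=$(printf '%s' "$TASK_DESC" | sed -n 's/.*PipelineId: \([^ ]*\).*/\1/p')
  BASELINE="$SESSION/artifacts/pipelines/$PIPELINE/baseline-metrics.json"
  PLAN="$SESSION/artifacts/pipelines/$PIPELINE/optimization-plan.md"
else
  MODE="single"
  BASELINE="$SESSION/artifacts/baseline-metrics.json"
  PLAN="$SESSION/artifacts/optimization-plan.md"
fi

echo "$MODE $BASELINE"
```

The point of the sketch is that fan-out branches share one baseline file while independent pipelines each get their own.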

## Phase 3: Benchmark Execution

Run benchmarks matching detected project type:

**Frontend benchmarks:** compare bundle size, render performance, and dependency weight changes.

**Backend benchmarks:** measure endpoint response times, memory usage under load, and database query improvements.

**CLI / library benchmarks:** measure execution time, peak memory, and throughput under sustained load.
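For the "no tooling detected" fallback (timed execution via Bash), a minimal manual-measurement sketch; the workload command and run count are placeholders:

```shell
# Manual-measurement fallback: average wall-clock time over N runs, in ms.
# `sleep 0.01` is a stand-in for the real benchmark command; N is arbitrary.
N=5
total=0
for _ in $(seq "$N"); do
  start=$(date +%s%N)                            # nanoseconds (GNU date)
  sleep 0.01                                     # placeholder workload
  end=$(date +%s%N)
  total=$(( total + (end - start) / 1000000 ))   # ms for this run
done
echo "avg_ms=$(( total / N ))"
```

Averaging several runs smooths out scheduler noise, which matters when the baseline and post-optimization numbers differ by only a few percent.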

**All project types:**

- Run the existing test suite to verify no regressions
- Collect post-optimization metrics matching the baseline format
- Calculate improvement percentages per metric
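The per-metric improvement calculation can be sketched as below; the helper name and sample values are illustrative, and it assumes a "lower is better" metric such as bundle size or response time:

```shell
# Hypothetical helper: percent improvement for a "lower is better" metric.
improvement_pct() {
  # $1 = baseline value, $2 = post-optimization value
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f", (b - c) / b * 100 }'
}

improvement_pct 200 150   # e.g. a 200 KB bundle shrunk to 150 KB -> 25.0
```

A negative result from the same formula indicates a regression, so one helper can serve both the improvement and regression checks.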

**Branch-scoped benchmarking (fan-out mode):**

- Benchmark only the metrics relevant to this branch's optimization
- Still check for regressions across all metrics

## Phase 4: Result Analysis

Compare against baseline and plan criteria:

| Condition | Threshold | Verdict |
| --- | --- | --- |
| Target improvement vs baseline | Meets plan success criteria | PASS |
| No regression in unrelated metrics | < 5% degradation allowed | PASS |
| All plan success criteria met | Every criterion satisfied | PASS |
| Improvement below target | > 50% of target achieved | WARN |
| Regression detected | Any unrelated metric degrades > 5% | FAIL -> fix_required |
| Plan criteria not met | Any criterion not satisfied | FAIL -> fix_required |
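The verdict rules can be sketched as a small Bash helper. The function name and argument layout are illustrative; only the 5% regression ceiling and the 50%-of-target WARN band come from the document:

```shell
# Hypothetical verdict helper.
# $1 = achieved improvement %, $2 = target improvement %, $3 = worst
# regression % across unrelated metrics. awk handles the float comparisons.
verdict() {
  if awk -v r="$3" 'BEGIN { exit !(r > 5) }'; then
    echo "FAIL"   # unrelated metric degraded > 5% -> fix_required
  elif awk -v i="$1" -v t="$2" 'BEGIN { exit !(i >= t) }'; then
    echo "PASS"   # meets the plan's improvement target
  elif awk -v i="$1" -v t="$2" 'BEGIN { exit !(i > t / 2) }'; then
    echo "WARN"   # below target, but > 50% of target achieved
  else
    echo "FAIL"   # plan criteria not met -> fix_required
  fi
}

verdict 30 25 0   # PASS
verdict 15 25 0   # WARN
verdict 30 25 8   # FAIL (regression dominates even when the target is hit)
```

Note the regression check runs first: per the table, any > 5% degradation forces FAIL regardless of how well the target metric improved.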
1. Write benchmark results to the output path (scoped by branch/pipeline/single).
2. Update `<session>/.msg/meta.json` under the scoped namespace.
3. If the verdict is FAIL, include detailed feedback in the message for FIX task creation.
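Step 1 might look like the following; the results filename and JSON fields are assumptions, not a documented schema (the only stated constraint is that metrics match the baseline format):

```shell
# Hypothetical result-writing sketch. In fan-out/independent modes, OUT
# would live under artifacts/branches/{B} or artifacts/pipelines/{P}.
SESSION="/tmp/session"
VERDICT="PASS"
OUT="$SESSION/artifacts/benchmark-results.json"
mkdir -p "$(dirname "$OUT")"
cat > "$OUT" <<EOF
{
  "verdict": "$VERDICT",
  "improvements": { "bundle_size_pct": 25.0 },
  "regressions": []
}
EOF
echo "wrote $OUT"
```

Keeping the results file machine-readable lets the coordinator create FIX tasks from the `regressions` array without re-parsing prose.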