fix: resolve team worker task discovery failures and clean up legacy role-specs

- Remove owner name exact-match filter from team-worker.md Phase 1 task discovery (system appends numeric suffixes making match unreliable)
- Fix role_spec paths in team-config.json for perf-opt, arch-opt, ux-improve (role-specs/<role>.md → roles/<role>/role.md)
- Fix stale role-specs path in perf-opt monitor.md spawn template
- Delete 14 dead role-specs/ directories (~60 duplicate files) across all teams
- Add 8 missing .codex agent files (team-designer, team-iterdev, team-lifecycle-v4, team-uidesign)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 19:08:17 +08:00 · 2026-03-20 12:11:51 +08:00
parent b6c763fd1b
commit 26a7371a20
72 changed files with 1452 additions and 5263 deletions
--- a/.claude/skills/team-perf-opt/role-specs/benchmarker.md
+++ b/.claude/skills/team-perf-opt/role-specs/benchmarker.md
@@ -1,110 +0,0 @@
----
-prefix: BENCH
-inner_loop: false
-message_types:
-  success: bench_complete
-  error: error
-  fix: fix_required
----
-
-# Performance Benchmarker
-
-Run benchmarks comparing before/after optimization metrics. Validate that improvements meet plan success criteria and detect any regressions.
-
-## Phase 2: Environment & Baseline Loading
-
-| Input | Source | Required |
-|-------|--------|----------|
-| Baseline metrics | <session>/artifacts/baseline-metrics.json (shared) | Yes |
-| Optimization plan / detail | Varies by mode (see below) | Yes |
-| .msg/meta.json | <session>/.msg/meta.json | Yes |
-
-1. Extract session path from task description
-2. **Detect branch/pipeline context** from task description:
-
-| Task Description Field | Value | Context |
-|----------------------|-------|---------|
-| `BranchId: B{NN}` | Present | Fan-out branch -- benchmark only this branch's metrics |
-| `PipelineId: {P}` | Present | Independent pipeline -- use pipeline-scoped baseline |
-| Neither present | - | Single mode -- full benchmark |
-
-3. **Load baseline metrics**:
-   - Single / Fan-out: Read `<session>/artifacts/baseline-metrics.json` (shared baseline)
-   - Independent: Read `<session>/artifacts/pipelines/{P}/baseline-metrics.json`
-
-4. **Load optimization context**:
-   - Single: Read `<session>/artifacts/optimization-plan.md` -- all success criteria
-   - Fan-out branch: Read `<session>/artifacts/branches/B{NN}/optimization-detail.md` -- only this branch's criteria
-   - Independent: Read `<session>/artifacts/pipelines/{P}/optimization-plan.md`
-
-5. Load .msg/meta.json for project type and optimization scope
-6. Detect available benchmark tools from project:
-
-| Signal | Benchmark Tool | Method |
-|--------|---------------|--------|
-| package.json + vitest/jest | Test runner benchmarks | Run existing perf tests |
-| package.json + webpack/vite | Bundle analysis | Compare build output sizes |
-| Cargo.toml + criterion | Rust benchmarks | cargo bench |
-| go.mod | Go benchmarks | go test -bench |
-| Makefile with bench target | Custom benchmarks | make bench |
-| No tooling detected | Manual measurement | Timed execution via Bash |
-
-7. Get changed files scope from shared-memory:
-   - Single: `optimizer` namespace
-   - Fan-out: `optimizer.B{NN}` namespace
-   - Independent: `optimizer.{P}` namespace
-
-## Phase 3: Benchmark Execution
-
-Run benchmarks matching detected project type:
-
-**Frontend benchmarks**:
- Compare bundle size before/after (build output analysis)
- Measure render performance for affected components
- Check for dependency weight changes
-
-**Backend benchmarks**:
- Measure endpoint response times for affected routes
- Profile memory usage under simulated load
- Verify database query performance improvements
-
-**CLI / Library benchmarks**:
- Measure execution time for representative workloads
- Compare memory peak usage
- Test throughput under sustained load
-
-**All project types**:
- Run existing test suite to verify no regressions
- Collect post-optimization metrics matching baseline format
- Calculate improvement percentages per metric
-
-**Branch-scoped benchmarking** (fan-out mode):
- Only benchmark metrics relevant to this branch's optimization (from optimization-detail.md)
- Still check for regressions across all metrics (not just branch-specific ones)
-
-## Phase 4: Result Analysis
-
-Compare against baseline and plan criteria:
-
-| Metric | Threshold | Verdict |
-|--------|-----------|---------|
-| Target improvement vs baseline | Meets plan success criteria | PASS |
-| No regression in unrelated metrics | < 5% degradation allowed | PASS |
-| All plan success criteria met | Every criterion satisfied | PASS |
-| Improvement below target | > 50% of target achieved | WARN |
-| Regression detected | Any unrelated metric degrades > 5% | FAIL -> fix_required |
-| Plan criteria not met | Any criterion not satisfied | FAIL -> fix_required |
-
-1. Write benchmark results to output path:
-   - Single: `<session>/artifacts/benchmark-results.json`
-   - Fan-out: `<session>/artifacts/branches/B{NN}/benchmark-results.json`
-   - Independent: `<session>/artifacts/pipelines/{P}/benchmark-results.json`
-   - Content: Per-metric: name, baseline value, current value, improvement %, verdict; Overall verdict: PASS / WARN / FAIL; Regression details (if any)
-
-2. Update `<session>/.msg/meta.json` under scoped namespace:
-   - Single: merge `{ "benchmarker": { verdict, improvements, regressions } }`
-   - Fan-out: merge `{ "benchmarker.B{NN}": { verdict, improvements, regressions } }`
-   - Independent: merge `{ "benchmarker.{P}": { verdict, improvements, regressions } }`
-
-3. If verdict is FAIL, include detailed feedback in message for FIX task creation:
-   - Which metrics failed, by how much, suggested investigation areas
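Reviewer note: the Phase 4 verdict table in the deleted benchmarker spec can be read as a small decision function. The sketch below is one possible interpretation, not code from the repository. The names `MetricResult`, `metric_verdict`, and `overall_verdict` are hypothetical, and two points are assumptions because the table leaves them open: a targeted metric that reaches 50% or less of its target is treated as FAIL, and a degradation of exactly 5% is treated as allowed.

```python
from dataclasses import dataclass
from typing import List, Optional

REGRESSION_LIMIT_PCT = 5.0  # unrelated metrics may degrade up to this much
WARN_FRACTION = 0.5         # more than 50% of target achieved -> WARN


@dataclass
class MetricResult:
    name: str
    improvement_pct: float               # positive means better than baseline
    target_pct: Optional[float] = None   # None: metric is not targeted by the plan


def metric_verdict(m: MetricResult) -> str:
    """Classify one metric as PASS, WARN, or FAIL."""
    if m.target_pct is None:
        # Unrelated metric: only a regression beyond the limit matters.
        return "FAIL" if -m.improvement_pct > REGRESSION_LIMIT_PCT else "PASS"
    if m.improvement_pct >= m.target_pct:
        return "PASS"
    if m.improvement_pct > WARN_FRACTION * m.target_pct:
        return "WARN"
    return "FAIL"  # assumption: at or below half the target counts as not met


def overall_verdict(results: List[MetricResult]) -> str:
    """The worst per-metric verdict wins: FAIL over WARN over PASS."""
    verdicts = {metric_verdict(m) for m in results}
    if "FAIL" in verdicts:
        return "FAIL"
    if "WARN" in verdicts:
        return "WARN"
    return "PASS"
```

Under this reading, a FAIL on any metric is what would trigger the `fix_required` message described in the spec.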
--- a/.claude/skills/team-perf-opt/role-specs/optimizer.md
+++ b/.claude/skills/team-perf-opt/role-specs/optimizer.md
@@ -1,102 +0,0 @@
----
-prefix: IMPL
-inner_loop: true
-additional_prefixes: [FIX]
-delegates_to: []
-message_types:
-  success: impl_complete
-  error: error
-  fix: fix_required
----
-
-# Code Optimizer
-
-Implement optimization changes following the strategy plan. For FIX tasks, apply targeted corrections based on review/benchmark feedback.
-
-## Modes
-
-| Mode | Task Prefix | Trigger | Focus |
-|------|-------------|---------|-------|
-| Implement | IMPL | Strategy plan ready | Apply optimizations per plan priority |
-| Fix | FIX | Review/bench feedback | Targeted fixes for identified issues |
-
-## Phase 2: Plan & Context Loading
-
-| Input | Source | Required |
-|-------|--------|----------|
-| Optimization plan | <session>/artifacts/optimization-plan.md | Yes (IMPL, no branch) |
-| Branch optimization detail | <session>/artifacts/branches/B{NN}/optimization-detail.md | Yes (IMPL with branch) |
-| Pipeline optimization plan | <session>/artifacts/pipelines/{P}/optimization-plan.md | Yes (IMPL with pipeline) |
-| Review/bench feedback | From task description | Yes (FIX) |
-| .msg/meta.json | <session>/.msg/meta.json | Yes |
-| Wisdom files | <session>/wisdom/patterns.md | No |
-| Context accumulator | From prior IMPL/FIX tasks | Yes (inner loop) |
-
-1. Extract session path and task mode (IMPL or FIX) from task description
-2. **Detect branch/pipeline context** from task description:
-
-| Task Description Field | Value | Context |
-|----------------------|-------|---------|
-| `BranchId: B{NN}` | Present | Fan-out branch -- load single optimization detail |
-| `PipelineId: {P}` | Present | Independent pipeline -- load pipeline-scoped plan |
-| Neither present | - | Single mode -- load full optimization plan |
-
-3. **Load optimization context by mode**:
-   - **Single mode (no branch)**: Read `<session>/artifacts/optimization-plan.md` -- extract ALL priority-ordered changes
-   - **Fan-out branch**: Read `<session>/artifacts/branches/B{NN}/optimization-detail.md` -- extract ONLY this branch's optimization (single OPT-ID)
-   - **Independent pipeline**: Read `<session>/artifacts/pipelines/{P}/optimization-plan.md` -- extract this pipeline's plan
-
-4. For FIX: parse review/benchmark feedback for specific issues to address
-5. Use ACE search or CLI tools to load implementation context for target files
-6. For inner loop (single mode only): load context_accumulator from prior IMPL/FIX tasks
-
-**Shared-memory namespace**:
- Single: write to `optimizer` namespace
- Fan-out: write to `optimizer.B{NN}` namespace
- Independent: write to `optimizer.{P}` namespace
-
-## Phase 3: Code Implementation
-
-Implementation backend selection:
-
-| Backend | Condition | Method |
-|---------|-----------|--------|
-| CLI | Multi-file optimization with clear plan | ccw cli --tool gemini --mode write |
-| Direct | Single-file changes or targeted fixes | Inline Edit/Write tools |
-
-For IMPL tasks:
- **Single mode**: Apply optimizations in plan priority order (P0 first, then P1, etc.)
- **Fan-out branch**: Apply ONLY this branch's single optimization (from optimization-detail.md)
- **Independent pipeline**: Apply this pipeline's optimizations in priority order
- Follow implementation guidance from plan (target files, patterns)
- Preserve existing behavior -- optimization must not break functionality
-
-For FIX tasks:
- Read specific issues from review/benchmark feedback
- Apply targeted corrections to flagged code locations
- Verify the fix addresses the exact concern raised
-
-General rules:
- Make minimal, focused changes per optimization
- Add comments only where optimization logic is non-obvious
- Preserve existing code style and conventions
-
-## Phase 4: Self-Validation
-
-| Check | Method | Pass Criteria |
-|-------|--------|---------------|
-| Syntax | IDE diagnostics or build check | No new errors |
-| File integrity | Verify all planned files exist and are modified | All present |
-| Acceptance | Match optimization plan success criteria | All target metrics addressed |
-| No regression | Run existing tests if available | No new failures |
-
-If validation fails, attempt auto-fix (max 2 attempts) before reporting error.
-
-Append to context_accumulator for next IMPL/FIX task (single/inner-loop mode only):
- Files modified, optimizations applied, validation results
- Any discovered patterns or caveats for subsequent iterations
-
-**Branch output paths**:
- Single: write artifacts to `<session>/artifacts/`
- Fan-out: write artifacts to `<session>/artifacts/branches/B{NN}/`
- Independent: write artifacts to `<session>/artifacts/pipelines/{P}/`
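Reviewer note: the branch/pipeline detection and the scoped paths and namespaces repeated across these specs follow a single pattern. The sketch below illustrates it under stated assumptions: `resolve_scope` and `Scope` are hypothetical names, `B{NN}` is taken to mean `B` plus two digits, `{P}` is taken to be any non-whitespace token, and `BranchId` wins if both fields appear (the spec does not say which takes precedence).

```python
import re
from dataclasses import dataclass

# Assumed field formats; the spec only shows the placeholders B{NN} and {P}.
BRANCH_RE = re.compile(r"BranchId:\s*(B\d{2})\b")
PIPELINE_RE = re.compile(r"PipelineId:\s*(\S+)")


@dataclass(frozen=True)
class Scope:
    mode: str          # "single", "fan-out", or "independent"
    artifact_dir: str  # where this task reads and writes artifacts
    namespace: str     # key used in .msg/meta.json


def resolve_scope(task_description: str, session: str, role: str = "optimizer") -> Scope:
    """Map a task description to its artifact directory and meta.json namespace."""
    branch = BRANCH_RE.search(task_description)
    if branch:
        b = branch.group(1)
        return Scope("fan-out", f"{session}/artifacts/branches/{b}", f"{role}.{b}")
    pipeline = PIPELINE_RE.search(task_description)
    if pipeline:
        p = pipeline.group(1)
        return Scope("independent", f"{session}/artifacts/pipelines/{p}", f"{role}.{p}")
    return Scope("single", f"{session}/artifacts", role)
```

For example, a description containing `BranchId: B03` resolves to the `optimizer.B03` namespace and the `branches/B03` artifact directory.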
--- a/.claude/skills/team-perf-opt/role-specs/profiler.md
+++ b/.claude/skills/team-perf-opt/role-specs/profiler.md
@@ -1,73 +0,0 @@
----
-prefix: PROFILE
-inner_loop: false
-delegates_to: []
-message_types:
-  success: profile_complete
-  error: error
----
-
-# Performance Profiler
-
-Profile application performance to identify CPU, memory, I/O, network, and rendering bottlenecks. Produce quantified baseline metrics and a ranked bottleneck report.
-
-## Phase 2: Context & Environment Detection
-
-| Input | Source | Required |
-|-------|--------|----------|
-| Task description | From task subject/description | Yes |
-| Session path | Extracted from task description | Yes |
-| .msg/meta.json | <session>/.msg/meta.json | No |
-
-1. Extract session path and target scope from task description
-2. Detect project type by scanning for framework markers:
-
-| Signal File | Project Type | Profiling Focus |
-|-------------|-------------|-----------------|
-| package.json + React/Vue/Angular | Frontend | Render time, bundle size, FCP/LCP/CLS |
-| package.json + Express/Fastify/NestJS | Backend Node | CPU hotspots, memory, DB queries |
-| Cargo.toml / go.mod / pom.xml | Native/JVM Backend | CPU, memory, GC tuning |
-| Mixed framework markers | Full-stack | Split into FE + BE profiling passes |
-| CLI entry / bin/ directory | CLI Tool | Startup time, throughput, memory peak |
-| No detection | Generic | All profiling dimensions |
-
-3. Use ACE search or CLI tools to map performance-critical code paths within target scope
-4. Detect available profiling tools (test runners, benchmark harnesses, linting tools)
-
-## Phase 3: Performance Profiling
-
-Execute profiling based on detected project type:
-
-**Frontend profiling**:
- Analyze bundle size and dependency weight via build output
- Identify render-blocking resources and heavy components
- Check for unnecessary re-renders, large DOM trees, unoptimized assets
-
-**Backend profiling**:
- Trace hot code paths via execution analysis or instrumented runs
- Identify slow database queries, N+1 patterns, missing indexes
- Check memory allocation patterns and potential leaks
-
-**CLI / Library profiling**:
- Measure startup time and critical path latency
- Profile throughput under representative workloads
- Identify memory peaks and allocation churn
-
-**All project types**:
- Collect quantified baseline metrics (timing, memory, throughput)
- Rank top 3-5 bottlenecks by severity (Critical / High / Medium)
- Record evidence: file paths, line numbers, measured values
-
-## Phase 4: Report Generation
-
-1. Write baseline metrics to `<session>/artifacts/baseline-metrics.json`:
-   - Key metric names, measured values, units, measurement method
-   - Timestamp and environment details
-
-2. Write bottleneck report to `<session>/artifacts/bottleneck-report.md`:
-   - Ranked list of bottlenecks with severity, location (file:line), measured impact
-   - Evidence summary per bottleneck
-   - Detected project type and profiling methods used
-
-3. Update `<session>/.msg/meta.json` under `profiler` namespace:
-   - Read existing -> merge `{ "profiler": { project_type, bottleneck_count, top_bottleneck, scope } }` -> write back
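Reviewer note: every role spec ends with the same "read existing, merge, write back" update of `.msg/meta.json`. A minimal sketch of that step follows. The function names are hypothetical, and the shallow merge within a namespace is an assumption; the spec does not state whether nested keys should be merged deeply.

```python
import json
from pathlib import Path


def merge_namespace(meta: dict, namespace: str, payload: dict) -> dict:
    """Return a copy of meta with payload merged into one namespace.

    Other namespaces are left untouched. Within the namespace, new keys
    overwrite old ones (shallow merge, an assumption).
    """
    merged = dict(meta)
    current = merged.get(namespace)
    base = current if isinstance(current, dict) else {}
    merged[namespace] = {**base, **payload}
    return merged


def update_meta_file(meta_path: str, namespace: str, payload: dict) -> dict:
    """Read meta.json if present, merge one namespace, and write it back."""
    path = Path(meta_path)
    meta = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    meta = merge_namespace(meta, namespace, payload)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta
```

Keeping the merge separate from the file I/O makes the namespace behavior easy to check without touching disk.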
--- a/.claude/skills/team-perf-opt/role-specs/reviewer.md
+++ b/.claude/skills/team-perf-opt/role-specs/reviewer.md
@@ -1,91 +0,0 @@
----
-prefix: REVIEW
-inner_loop: false
-additional_prefixes: [QUALITY]
-discuss_rounds: [DISCUSS-REVIEW]
-delegates_to: []
-message_types:
-  success: review_complete
-  error: error
-  fix: fix_required
----
-
-# Optimization Reviewer
-
-Review optimization code changes for correctness, side effects, regression risks, and adherence to best practices. Provide structured verdicts with actionable feedback.
-
-## Phase 2: Context Loading
-
-| Input | Source | Required |
-|-------|--------|----------|
-| Optimization code changes | From IMPL task artifacts / git diff | Yes |
-| Optimization plan / detail | Varies by mode (see below) | Yes |
-| Benchmark results | Varies by mode (see below) | No |
-| .msg/meta.json | <session>/.msg/meta.json | Yes |
-
-1. Extract session path from task description
-2. **Detect branch/pipeline context** from task description:
-
-| Task Description Field | Value | Context |
-|----------------------|-------|---------|
-| `BranchId: B{NN}` | Present | Fan-out branch -- review only this branch's changes |
-| `PipelineId: {P}` | Present | Independent pipeline -- review pipeline-scoped changes |
-| Neither present | - | Single mode -- review all optimization changes |
-
-3. **Load optimization context by mode**:
-   - Single: Read `<session>/artifacts/optimization-plan.md`
-   - Fan-out branch: Read `<session>/artifacts/branches/B{NN}/optimization-detail.md`
-   - Independent: Read `<session>/artifacts/pipelines/{P}/optimization-plan.md`
-
-4. Load .msg/meta.json for scoped optimizer namespace:
-   - Single: `optimizer` namespace
-   - Fan-out: `optimizer.B{NN}` namespace
-   - Independent: `optimizer.{P}` namespace
-
-5. Identify changed files from optimizer context -- read ONLY files modified by this branch/pipeline
-6. If benchmark results available, read from scoped path:
-   - Single: `<session>/artifacts/benchmark-results.json`
-   - Fan-out: `<session>/artifacts/branches/B{NN}/benchmark-results.json`
-   - Independent: `<session>/artifacts/pipelines/{P}/benchmark-results.json`
-
-## Phase 3: Multi-Dimension Review
-
-Analyze optimization changes across five dimensions:
-
-| Dimension | Focus | Severity |
-|-----------|-------|----------|
-| Correctness | Logic errors, off-by-one, race conditions, null safety | Critical |
-| Side effects | Unintended behavior changes, API contract breaks, data loss | Critical |
-| Maintainability | Code clarity, complexity increase, naming, documentation | High |
-| Regression risk | Impact on unrelated code paths, implicit dependencies | High |
-| Best practices | Idiomatic patterns, framework conventions, optimization anti-patterns | Medium |
-
-Per-dimension review process:
- Scan modified files for patterns matching each dimension
- Record findings with severity (Critical / High / Medium / Low)
- Include specific file:line references and suggested fixes
-
-If any Critical findings detected, use CLI tools for multi-perspective validation (DISCUSS-REVIEW round) to validate the assessment before issuing verdict.
-
-## Phase 4: Verdict & Feedback
-
-Classify overall verdict based on findings:
-
-| Verdict | Condition | Action |
-|---------|-----------|--------|
-| APPROVE | No Critical or High findings | Send review_complete |
-| REVISE | Has High findings, no Critical | Send fix_required with detailed feedback |
-| REJECT | Has Critical findings or fundamental approach flaw | Send fix_required + flag for strategist escalation |
-
-1. Write review report to scoped output path:
-   - Single: `<session>/artifacts/review-report.md`
-   - Fan-out: `<session>/artifacts/branches/B{NN}/review-report.md`
-   - Independent: `<session>/artifacts/pipelines/{P}/review-report.md`
-   - Content: Per-dimension findings with severity, file:line, description; Overall verdict with rationale; Specific fix instructions for REVISE/REJECT verdicts
-
-2. Update `<session>/.msg/meta.json` under scoped namespace:
-   - Single: merge `{ "reviewer": { verdict, finding_count, critical_count, dimensions_reviewed } }`
-   - Fan-out: merge `{ "reviewer.B{NN}": { verdict, finding_count, critical_count, dimensions_reviewed } }`
-   - Independent: merge `{ "reviewer.{P}": { verdict, finding_count, critical_count, dimensions_reviewed } }`
-
-3. If DISCUSS-REVIEW was triggered, record discussion summary in `<session>/discussions/DISCUSS-REVIEW.md` (or `DISCUSS-REVIEW-B{NN}.md` for branch-scoped discussions)
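Reviewer note: the reviewer's verdict table reduces to a check on the highest severity found. The sketch below is illustrative only. `review_verdict` and the shape of its return value are hypothetical, and `fundamental_flaw` stands in for the "fundamental approach flaw" condition, which the spec does not define further.

```python
from typing import Dict, Iterable


def review_verdict(severities: Iterable[str], fundamental_flaw: bool = False) -> Dict[str, object]:
    """Map finding severities to a verdict, message type, and escalation flag."""
    found = set(severities)
    if fundamental_flaw or "Critical" in found:
        # REJECT also flags the result for strategist escalation.
        return {"verdict": "REJECT", "message": "fix_required", "escalate": True}
    if "High" in found:
        return {"verdict": "REVISE", "message": "fix_required", "escalate": False}
    # Only Medium/Low findings (or none) remain.
    return {"verdict": "APPROVE", "message": "review_complete", "escalate": False}
```

Medium and Low findings alone still yield APPROVE here, which matches the table's "No Critical or High findings" condition.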
--- a/.claude/skills/team-perf-opt/role-specs/strategist.md
+++ b/.claude/skills/team-perf-opt/role-specs/strategist.md
@@ -1,114 +0,0 @@
----
-prefix: STRATEGY
-inner_loop: false
-discuss_rounds: [DISCUSS-OPT]
-delegates_to: []
-message_types:
-  success: strategy_complete
-  error: error
----
-
-# Optimization Strategist
-
-Analyze bottleneck reports and baseline metrics to design a prioritized optimization plan with concrete strategies, expected improvements, and risk assessments.
-
-## Phase 2: Analysis Loading
-
-| Input | Source | Required |
-|-------|--------|----------|
-| Bottleneck report | <session>/artifacts/bottleneck-report.md | Yes |
-| Baseline metrics | <session>/artifacts/baseline-metrics.json | Yes |
-| .msg/meta.json | <session>/.msg/meta.json | Yes |
-| Wisdom files | <session>/wisdom/patterns.md | No |
-
-1. Extract session path from task description
-2. Read bottleneck report -- extract ranked bottleneck list with severities
-3. Read baseline metrics -- extract current performance numbers
-4. Load .msg/meta.json for profiler findings (project_type, scope)
-5. Assess overall optimization complexity:
-
-| Bottleneck Count | Severity Mix | Complexity |
-|-----------------|-------------|------------|
-| 1-2 | All Medium | Low |
-| 2-3 | Mix of High/Medium | Medium |
-| 3+ or any Critical | Any Critical present | High |
-
-## Phase 3: Strategy Formulation
-
-For each bottleneck, select optimization approach by type:
-
-| Bottleneck Type | Strategies | Risk Level |
-|----------------|-----------|------------|
-| CPU hotspot | Algorithm optimization, memoization, caching, worker threads | Medium |
-| Memory leak/bloat | Pool reuse, lazy initialization, WeakRef, scope cleanup | High |
-| I/O bound | Batching, async pipelines, streaming, connection pooling | Medium |
-| Network latency | Request coalescing, compression, CDN, prefetching | Low |
-| Rendering | Virtualization, memoization, CSS containment, code splitting | Medium |
-| Database | Index optimization, query rewriting, caching layer, denormalization | High |
-
-Prioritize optimizations by impact/effort ratio:
-
-| Priority | Criteria |
-|----------|----------|
-| P0 (Critical) | High impact + Low effort -- quick wins |
-| P1 (High) | High impact + Medium effort |
-| P2 (Medium) | Medium impact + Low effort |
-| P3 (Low) | Low impact or High effort -- defer |
-
-If complexity is High, use CLI tools for multi-perspective analysis (DISCUSS-OPT round) to evaluate trade-offs between competing strategies before finalizing the plan.
-
-Define measurable success criteria per optimization (target metric value or improvement %).
-
-## Phase 4: Plan Output
-
-1. Write optimization plan to `<session>/artifacts/optimization-plan.md`:
-
-   Each optimization MUST have a unique OPT-ID and self-contained detail block:
-
-   ```markdown
-   ### OPT-001: <title>
-   - Priority: P0
-   - Target bottleneck: <bottleneck from report>
-   - Target files: <file-list>
-   - Strategy: <selected approach>
-   - Expected improvement: <metric> by <X%>
-   - Risk level: <Low/Medium/High>
-   - Success criteria: <specific threshold to verify>
-   - Implementation guidance:
-     1. <step 1>
-     2. <step 2>
-     3. <step 3>
-
-   ### OPT-002: <title>
-   ...
-   ```
-
-   Requirements:
-   - Each OPT-ID is sequentially numbered (OPT-001, OPT-002, ...)
-   - Each optimization must be **non-overlapping** in target files (no two OPT-IDs modify the same file unless explicitly noted with conflict resolution)
-   - Implementation guidance must be self-contained -- a branch optimizer should be able to work from a single OPT block without reading others
-
-2. Update `<session>/.msg/meta.json` under `strategist` namespace:
-   - Read existing -> merge -> write back:
-   ```json
-   {
-     "strategist": {
-       "complexity": "<Low|Medium|High>",
-       "optimization_count": 4,
-       "priorities": ["P0", "P0", "P1", "P2"],
-       "discuss_used": false,
-       "optimizations": [
-         {
-           "id": "OPT-001",
-           "title": "<title>",
-           "priority": "P0",
-           "target_files": ["src/a.ts", "src/b.ts"],
-           "expected_improvement": "<metric> by <X%>",
-           "success_criteria": "<threshold>"
-         }
-       ]
-     }
-   }
-   ```
-
-3. If DISCUSS-OPT was triggered, record discussion summary in `<session>/discussions/DISCUSS-OPT.md`
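Reviewer note: the strategist's priority table can be expressed as a lookup on impact and effort. The sketch below is a hedged reading of that table. The function names are hypothetical, and combinations the table does not list (for example Medium impact with Medium effort) fall through to P3 by assumption.

```python
from typing import Dict, List


def priority(impact: str, effort: str) -> str:
    """Assign P0-P3 from impact and effort, following the priority table."""
    if impact == "High" and effort == "Low":
        return "P0"  # quick wins
    if impact == "High" and effort == "Medium":
        return "P1"
    if impact == "Medium" and effort == "Low":
        return "P2"
    # Low impact or High effort; unlisted combinations also land here (assumption).
    return "P3"


def order_optimizations(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach a priority to each item and sort P0 first, keeping input order on ties."""
    ranked = [{**item, "priority": priority(item["impact"], item["effort"])} for item in items]
    return sorted(ranked, key=lambda item: item["priority"])
```

Sorting on the priority string works because `"P0" < "P1" < "P2" < "P3"` lexicographically, and Python's sort is stable, so OPT-IDs keep their original order within a priority.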
--- a/.claude/skills/team-perf-opt/roles/coordinator/commands/monitor.md
+++ b/.claude/skills/team-perf-opt/roles/coordinator/commands/monitor.md
@@ -73,7 +73,7 @@ Agent({
  run_in_background: true,
  prompt: `## Role Assignment
 role: <role>
-role_spec: ~  or <project>/.claude/skills/team-perf-opt/role-specs/<role>.md
+role_spec: ~  or <project>/.claude/skills/team-perf-opt/roles/<role>/role.md
 session: <session-folder>
 session_id: <session-id>
 team_name: perf-opt
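Reviewer note: the path change made by this commit (`role-specs/<role>.md` to `roles/<role>/role.md`) is mechanical, so remaining stale references could be found or rewritten with a small script. The sketch below is not part of the commit. `migrate_role_spec` and `migrate_config` are hypothetical helpers, and the only structural assumption is that the path lives in string values under a `role_spec` key, as in the team-config.json entries this commit touches.

```python
import re

# Matches the legacy layout only, e.g. "role-specs/profiler.md".
LEGACY_ROLE_SPEC = re.compile(r"^role-specs/([^/]+)\.md$")


def migrate_role_spec(path: str) -> str:
    """Rewrite a legacy role-spec path; leave any other path unchanged."""
    return LEGACY_ROLE_SPEC.sub(r"roles/\1/role.md", path)


def migrate_config(node):
    """Recursively rewrite every string value stored under a "role_spec" key."""
    if isinstance(node, dict):
        return {
            key: migrate_role_spec(value)
            if key == "role_spec" and isinstance(value, str)
            else migrate_config(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [migrate_config(item) for item in node]
    return node
```

Because already-migrated paths do not match the pattern, running the rewrite twice leaves the config unchanged.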
--- a/.claude/skills/team-perf-opt/specs/team-config.json
+++ b/.claude/skills/team-perf-opt/specs/team-config.json
@@ -24,7 +24,7 @@
      "name": "profiler",
      "type": "orchestration",
      "description": "Profiles application performance, identifies CPU/memory/IO/network/rendering bottlenecks",
-      "role_spec": "role-specs/profiler.md",
+      "role_spec": "roles/profiler/role.md",
      "inner_loop": false,
      "frontmatter": {
        "prefix": "PROFILE",
@@ -44,7 +44,7 @@
      "name": "strategist",
      "type": "orchestration",
      "description": "Analyzes bottleneck reports, designs prioritized optimization plans with concrete strategies",
-      "role_spec": "role-specs/strategist.md",
+      "role_spec": "roles/strategist/role.md",
      "inner_loop": false,
      "frontmatter": {
        "prefix": "STRATEGY",
@@ -64,7 +64,7 @@
      "name": "optimizer",
      "type": "code_generation",
      "description": "Implements optimization changes following the strategy plan",
-      "role_spec": "role-specs/optimizer.md",
+      "role_spec": "roles/optimizer/role.md",
      "inner_loop": true,
      "frontmatter": {
        "prefix": "IMPL",
@@ -85,7 +85,7 @@
      "name": "benchmarker",
      "type": "validation",
      "description": "Runs benchmarks, compares before/after metrics, validates performance improvements",
-      "role_spec": "role-specs/benchmarker.md",
+      "role_spec": "roles/benchmarker/role.md",
      "inner_loop": false,
      "frontmatter": {
        "prefix": "BENCH",
@@ -106,7 +106,7 @@
      "name": "reviewer",
      "type": "read_only_analysis",
      "description": "Reviews optimization code for correctness, side effects, and regression risks",
-      "role_spec": "role-specs/reviewer.md",
+      "role_spec": "roles/reviewer/role.md",
      "inner_loop": false,
      "frontmatter": {
        "prefix": "REVIEW",