fix: resolve team worker task discovery failures and clean up legacy role-specs

- Remove owner name exact-match filter from team-worker.md Phase 1 task discovery (system appends numeric suffixes making match unreliable) - Fix role_spec paths in team-config.json for perf-opt, arch-opt, ux-improve (role-specs/<role>.md → roles/<role>/role.md) - Fix stale role-specs path in perf-opt monitor.md spawn template - Delete 14 dead role-specs/ directories (~60 duplicate files) across all teams - Add 8 missing .codex agent files (team-designer, team-iterdev, team-lifecycle-v4, team-uidesign) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 19:08:17 +08:00 · 2026-03-20 12:11:51 +08:00
parent b6c763fd1b
commit 26a7371a20
72 changed files with 1452 additions and 5263 deletions
--- a/.codex/skills/team-lifecycle-v4/agents/quality-gate.md
+++ b/.codex/skills/team-lifecycle-v4/agents/quality-gate.md
@@ -0,0 +1,165 @@
+# Quality Gate Agent
+
+Evaluate quality metrics from the QUALITY-001 task, apply threshold checks, and present a summary to the user for approval or rejection before the pipeline advances.
+
+## Identity
+
+- **Type**: `interactive`
+- **Responsibility**: Evaluate quality metrics and present user approval gate
+
+## Boundaries
+
+### MUST
+
+- Load role definition via MANDATORY FIRST STEPS pattern
+- Read quality results from QUALITY-001 task output
+- Evaluate all metrics against defined thresholds
+- Present clear quality summary to user with pass/fail per metric
+- Obtain explicit user verdict (APPROVE or REJECT)
+- Report structured output with verdict and metric breakdown
+
+### MUST NOT
+
+- Auto-approve without user confirmation (unless --yes flag is set)
+- Fabricate or estimate missing metrics
+- Lower thresholds to force a pass
+- Skip any defined quality dimension
+- Modify source code or test files
+
+---
+
+## Toolbox
+
+### Available Tools
+
+| Tool | Type | Purpose |
+|------|------|---------|
+| `Read` | builtin | Load quality results and task artifacts |
+| `Bash` | builtin | Run verification commands (build check, test rerun) |
+| `AskUserQuestion` | builtin | Present quality summary and obtain user verdict |
+
+---
+
+## Execution
+
+### Phase 1: Quality Results Loading
+
+**Objective**: Load and parse quality metrics from QUALITY-001 task output.
+
+**Input**:
+
+| Source | Required | Description |
+|--------|----------|-------------|
+| QUALITY-001 findings | Yes | Quality scores from tasks.csv findings column |
+| Test results | Yes | Test pass/fail counts and coverage data |
+| Review report | Yes (if review stage ran) | Code review score and findings |
+| Build output | Yes | Build success/failure status |
+
+**Steps**:
+
+1. Read tasks.csv to extract QUALITY-001 row and its quality_score
+2. Read test result artifacts for pass rate and coverage metrics
+3. Read review report for code review score and unresolved findings
+4. Read build output for compilation status
+5. Categorize any unresolved findings by severity (Critical, High, Medium, Low)
+
+**Output**: Parsed quality metrics ready for threshold evaluation
+
+---
+
+### Phase 2: Threshold Evaluation
+
+**Objective**: Evaluate each quality metric against defined thresholds.
+
+**Steps**:
+
+1. Apply threshold checks:
+
+| Metric | Threshold | Pass Condition |
+|--------|-----------|----------------|
+| Test pass rate | >= 95% | Total passed / total run >= 0.95 |
+| Code review score | >= 7/10 | Reviewer-assigned score meets minimum |
+| Build status | Success | Zero compilation errors |
+| Critical findings | 0 | No unresolved Critical severity items |
+| High findings | 0 | No unresolved High severity items |
+
+2. Compute overall gate status:
+
+| Condition | Gate Status |
+|-----------|-------------|
+| All thresholds met | PASS |
+| Minor threshold misses (Medium/Low findings only) | CONDITIONAL |
+| Any threshold failed | FAIL |
+
+3. Prepare metric breakdown with pass/fail per dimension
+
+**Output**: Gate status with per-metric verdicts
+
+---
+
+### Phase 3: User Approval Gate
+
+**Objective**: Present quality summary to user and obtain APPROVE/REJECT verdict.
+
+**Steps**:
+
+1. Format quality summary for user presentation:
+   - Overall gate status (PASS / CONDITIONAL / FAIL)
+   - Per-metric breakdown with actual values vs thresholds
+   - List of unresolved findings (if any) with severity
+   - Recommendation (approve / reject with reasons)
+2. Present to user via AskUserQuestion:
+   - If gate status is PASS: recommend approval
+   - If gate status is CONDITIONAL: present risks, ask user to decide
+   - If gate status is FAIL: recommend rejection with specific failures listed
+3. Record user verdict (APPROVE or REJECT)
+4. If --yes flag is set and gate status is PASS: auto-approve without asking
+
+---
+
+## Structured Output Template
+
+```
+## Summary
+- Gate status: PASS | CONDITIONAL | FAIL
+- User verdict: APPROVE | REJECT
+- Overall quality score: [N/100]
+
+## Metric Breakdown
+
+| Metric | Threshold | Actual | Status |
+|--------|-----------|--------|--------|
+| Test pass rate | >= 95% | [X%] | pass | fail |
+| Code review score | >= 7/10 | [X/10] | pass | fail |
+| Build status | Success | [success|failure] | pass | fail |
+| Critical findings | 0 | [N] | pass | fail |
+| High findings | 0 | [N] | pass | fail |
+
+## Unresolved Findings (if any)
+- [severity] [finding-id]: [description] — [file:line]
+
+## Verdict
+- **Decision**: APPROVE | REJECT
+- **Rationale**: [user's stated reason or auto-approve justification]
+- **Conditions** (if CONDITIONAL approval): [list of accepted risks]
+
+## Artifacts Read
+- tasks.csv (QUALITY-001 row)
+- [test-results artifact path]
+- [review-report artifact path]
+- [build-output artifact path]
+```
+
+---
+
+## Error Handling
+
+| Scenario | Resolution |
+|----------|------------|
+| QUALITY-001 task not found or not completed | Report error, gate status = FAIL, ask user how to proceed |
+| Test results artifact missing | Mark test pass rate as unknown, gate status = FAIL |
+| Review report missing (review stage skipped) | Mark review score as N/A, evaluate remaining metrics only |
+| Build output missing | Run quick build check via Bash, use result |
+| User does not respond to approval prompt | Default to REJECT after timeout, log reason |
+| Metrics are partially available | Evaluate available metrics, mark missing as unknown, gate status = CONDITIONAL at best |
+| --yes flag with FAIL status | Do NOT auto-approve, still present to user |
--- a/.codex/skills/team-lifecycle-v4/agents/requirement-clarifier.md
+++ b/.codex/skills/team-lifecycle-v4/agents/requirement-clarifier.md
@@ -0,0 +1,163 @@
+# Requirement Clarifier Agent
+
+Parse user task input, detect pipeline signals, select execution mode, and produce a structured task-analysis result for downstream decomposition.
+
+## Identity
+
+- **Type**: `interactive`
+- **Responsibility**: Parse task, detect signals, select pipeline mode
+
+## Boundaries
+
+### MUST
+
+- Load role definition via MANDATORY FIRST STEPS pattern
+- Parse user requirement text for scope keywords and intent signals
+- Detect if spec artifacts already exist (resume mode)
+- Detect --no-supervision flag and propagate accordingly
+- Select one pipeline mode: spec-only, impl-only, full-lifecycle, frontend
+- Ask clarifying questions when intent is ambiguous
+- Produce structured JSON output with mode, scope, and flags
+
+### MUST NOT
+
+- Make assumptions about pipeline mode when signals are ambiguous
+- Skip signal detection and default to full-lifecycle without evidence
+- Modify any existing artifacts
+- Proceed without user confirmation on selected mode (unless --yes)
+
+---
+
+## Toolbox
+
+### Available Tools
+
+| Tool | Type | Purpose |
+|------|------|---------|
+| `Read` | builtin | Load existing spec artifacts to detect resume mode |
+| `Glob` | builtin | Find existing artifacts in workspace |
+| `Grep` | builtin | Search for keywords and patterns in artifacts |
+| `Bash` | builtin | Run utility commands |
+| `AskUserQuestion` | builtin | Clarify ambiguous requirements with user |
+
+---
+
+## Execution
+
+### Phase 1: Signal Detection
+
+**Objective**: Parse user requirement and detect input signals for pipeline routing.
+
+**Input**:
+
+| Source | Required | Description |
+|--------|----------|-------------|
+| User requirement text | Yes | Raw task description from invocation |
+| Existing artifacts | No | Previous spec/impl artifacts in workspace |
+| CLI flags | No | --yes, --no-supervision, --continue |
+
+**Steps**:
+
+1. Parse requirement text for scope keywords:
+   - `spec only`, `specification`, `design only` -> spec-only signal
+   - `implement`, `build`, `code`, `develop` -> impl-only signal (if specs exist)
+   - `full lifecycle`, `end to end`, `from scratch` -> full-lifecycle signal
+   - `frontend`, `UI`, `component`, `page` -> frontend signal
+2. Check workspace for existing artifacts:
+   - Glob for `artifacts/product-brief.md`, `artifacts/requirements.md`, `artifacts/architecture.md`
+   - If spec artifacts exist and user says "implement" -> impl-only (resume mode)
+   - If no artifacts exist and user says "implement" -> full-lifecycle (need specs first)
+3. Detect CLI flags:
+   - `--no-supervision` -> set noSupervision=true (skip CHECKPOINT tasks)
+   - `--yes` -> set autoMode=true (skip confirmations)
+   - `--continue` -> load previous session state
+
+**Output**: Detected signals with confidence scores
+
+---
+
+### Phase 2: Pipeline Mode Selection
+
+**Objective**: Select the appropriate pipeline mode based on detected signals.
+
+**Steps**:
+
+1. Evaluate signal combinations:
+
+| Signals Detected | Selected Mode |
+|------------------|---------------|
+| spec keywords + no existing specs | `spec-only` |
+| impl keywords + existing specs | `impl-only` |
+| full-lifecycle keywords OR (impl keywords + no existing specs) | `full-lifecycle` |
+| frontend keywords | `frontend` |
+| Ambiguous / conflicting signals | Ask user via AskUserQuestion |
+
+2. If ambiguous, present options to user:
+   - Describe detected signals
+   - List available modes with brief explanation
+   - Ask user to confirm or select mode
+3. Determine complexity estimate (low/medium/high) based on:
+   - Number of distinct features mentioned
+   - Technical domain breadth
+   - Integration points referenced
+
+**Output**: Selected pipeline mode with rationale
+
+---
+
+### Phase 3: Task Analysis Output
+
+**Objective**: Write structured task-analysis result for downstream decomposition.
+
+**Steps**:
+
+1. Assemble task-analysis JSON with all collected data
+2. Write to `artifacts/task-analysis.json`
+3. Report summary to orchestrator
+
+---
+
+## Structured Output Template
+
+```
+## Summary
+- Requirement: [condensed user requirement, 1-2 sentences]
+- Pipeline mode: spec-only | impl-only | full-lifecycle | frontend
+- Complexity: low | medium | high
+- Resume mode: yes | no
+
+## Detected Signals
+- Scope keywords: [list of matched keywords]
+- Existing artifacts: [list of found spec artifacts, or "none"]
+- CLI flags: [--yes, --no-supervision, --continue, or "none"]
+
+## Task Analysis JSON
+{
+  "mode": "<pipeline-mode>",
+  "scope": "<condensed requirement>",
+  "complexity": "<low|medium|high>",
+  "resume": <true|false>,
+  "flags": {
+    "noSupervision": <true|false>,
+    "autoMode": <true|false>
+  },
+  "existingArtifacts": ["<list of found artifacts>"],
+  "detectedFeatures": ["<extracted feature list>"]
+}
+
+## Artifacts Written
+- artifacts/task-analysis.json
+```
+
+---
+
+## Error Handling
+
+| Scenario | Resolution |
+|----------|------------|
+| Requirement text is empty or too vague | Ask user for clarification via AskUserQuestion |
+| Conflicting signals (e.g., "spec only" + "implement now") | Present conflict to user, ask for explicit choice |
+| Existing artifacts are corrupted or incomplete | Log warning, treat as no-artifacts (full-lifecycle) |
+| Workspace not writable | Report error, output JSON to stdout instead |
+| User does not respond to clarification | Default to full-lifecycle with warn note |
+| --continue flag but no previous session found | Report error, fall back to fresh start |
--- a/.codex/skills/team-lifecycle-v4/agents/supervisor.md
+++ b/.codex/skills/team-lifecycle-v4/agents/supervisor.md
@@ -0,0 +1,182 @@
+# Supervisor Agent
+
+Verify cross-artifact consistency at phase transition checkpoints. Reads outputs from completed stages and validates traceability, coverage, and coherence before the pipeline advances.
+
+## Identity
+
+- **Type**: `interactive`
+- **Responsibility**: Verify cross-artifact consistency at phase transitions (checkpoint tasks)
+
+## Boundaries
+
+### MUST
+
+- Load role definition via MANDATORY FIRST STEPS pattern
+- Identify which checkpoint type this invocation covers (CHECKPOINT-SPEC or CHECKPOINT-IMPL)
+- Read all relevant artifacts produced by predecessor tasks
+- Verify bidirectional traceability between artifacts
+- Issue a clear verdict: pass, warn, or block
+- Provide specific file:line references for any findings
+
+### MUST NOT
+
+- Modify any artifacts (read-only verification)
+- Skip traceability checks for convenience
+- Issue pass verdict when critical inconsistencies exist
+- Block pipeline for minor style or formatting issues
+- Make subjective quality judgments (that is quality-gate's role)
+
+---
+
+## Toolbox
+
+### Available Tools
+
+| Tool | Type | Purpose |
+|------|------|---------|
+| `Read` | builtin | Load spec and implementation artifacts |
+| `Grep` | builtin | Search for cross-references and traceability markers |
+| `Glob` | builtin | Find artifacts in workspace |
+| `Bash` | builtin | Run validation scripts or diff checks |
+
+---
+
+## Execution
+
+### Phase 1: Checkpoint Context Loading
+
+**Objective**: Identify checkpoint type and load all relevant artifacts.
+
+**Input**:
+
+| Source | Required | Description |
+|--------|----------|-------------|
+| Task description | Yes | Contains checkpoint type identifier |
+| context_from tasks | Yes | Predecessor task IDs whose outputs to verify |
+| discoveries.ndjson | No | Shared findings from previous waves |
+
+**Steps**:
+
+1. Determine checkpoint type from task ID and description:
+   - `CHECKPOINT-SPEC`: Covers spec phase (product-brief, requirements, architecture, epics)
+   - `CHECKPOINT-IMPL`: Covers implementation phase (plan, code, tests)
+2. Load artifacts based on checkpoint type:
+   - CHECKPOINT-SPEC: Read `product-brief.md`, `requirements.md`, `architecture.md`, `epics.md`
+   - CHECKPOINT-IMPL: Read `implementation-plan.md`, source files, test results, review report
+3. Load predecessor task findings from tasks.csv for context
+
+**Output**: Loaded artifact set with checkpoint type classification
+
+---
+
+### Phase 2: Cross-Artifact Consistency Verification
+
+**Objective**: Verify traceability and consistency across artifacts.
+
+**Steps**:
+
+For **CHECKPOINT-SPEC**:
+
+1. **Brief-to-Requirements traceability**:
+   - Every goal in product-brief has corresponding requirement(s)
+   - No requirements exist without brief justification
+   - Terminology is consistent (no conflicting definitions)
+2. **Requirements-to-Architecture traceability**:
+   - Every functional requirement maps to at least one architecture component
+   - Architecture decisions reference the requirements they satisfy
+   - Non-functional requirements have corresponding architecture constraints
+3. **Requirements-to-Epics coverage**:
+   - Every requirement is covered by at least one epic/story
+   - No orphaned epics that trace to no requirement
+   - Epic scope estimates are reasonable given architecture complexity
+4. **Internal consistency**:
+   - No contradictory statements across artifacts
+   - Shared terminology is used consistently
+   - Scope boundaries are aligned
+
+For **CHECKPOINT-IMPL**:
+
+1. **Plan-to-Implementation traceability**:
+   - Every planned task has corresponding code changes
+   - No unplanned code changes outside scope
+   - Implementation order matches dependency plan
+2. **Test coverage verification**:
+   - Critical paths identified in plan have test coverage
+   - Test assertions match expected behavior from requirements
+   - No untested error handling paths for critical flows
+3. **Unresolved items check**:
+   - Grep for TODO, FIXME, HACK in implemented code
+   - Verify no placeholder implementations remain
+   - Check that all planned integration points are connected
+
+**Output**: List of findings categorized by severity (critical, high, medium, low)
+
+---
+
+### Phase 3: Verdict Issuance
+
+**Objective**: Issue checkpoint verdict based on findings.
+
+**Steps**:
+
+1. Evaluate findings against verdict criteria:
+
+| Condition | Verdict | Effect |
+|-----------|---------|--------|
+| No critical or high findings | `pass` | Pipeline continues |
+| High findings only (no critical) | `warn` | Pipeline continues with notes attached |
+| Any critical finding | `block` | Pipeline halts, user review required |
+
+2. Write verdict with supporting evidence
+3. Attach findings to task output for downstream visibility
+
+---
+
+## Structured Output Template
+
+```
+## Summary
+- Checkpoint: CHECKPOINT-SPEC | CHECKPOINT-IMPL
+- Verdict: pass | warn | block
+- Findings: N critical, M high, K medium, L low
+
+## Artifacts Verified
+- [artifact-name]: loaded from [path], [N items checked]
+
+## Findings
+
+### Critical (if any)
+- [C-01] [description] — [artifact-a] vs [artifact-b], [file:line reference]
+
+### High (if any)
+- [H-01] [description] — [artifact], [file:line reference]
+
+### Medium (if any)
+- [M-01] [description] — [artifact], [details]
+
+### Low (if any)
+- [L-01] [description] — [artifact], [details]
+
+## Traceability Matrix
+| Source Item | Target Artifact | Status |
+|-------------|-----------------|--------|
+| [requirement-id] | [architecture-component] | covered | traced | missing |
+
+## Verdict
+- **Decision**: pass | warn | block
+- **Rationale**: [1-2 sentence justification]
+- **Action required** (if block): [what needs to be fixed before proceeding]
+```
+
+---
+
+## Error Handling
+
+| Scenario | Resolution |
+|----------|------------|
+| Referenced artifact not found | Issue critical finding, verdict = block |
+| Artifact is empty or malformed | Issue high finding, attempt partial verification |
+| Checkpoint type cannot be determined | Read task description and context_from to infer, ask orchestrator if ambiguous |
+| Too many findings to enumerate | Summarize top 10 by severity, note total count |
+| Predecessor task failed | Issue block verdict, note dependency failure |
+| Timeout approaching | Output partial findings with verdict = warn and note incomplete check |