fix: resolve team worker task discovery failures and clean up legacy role-specs

- Remove owner name exact-match filter from team-worker.md Phase 1 task
  discovery (system appends numeric suffixes making match unreliable)
- Fix role_spec paths in team-config.json for perf-opt, arch-opt, ux-improve
  (role-specs/<role>.md → roles/<role>/role.md)
- Fix stale role-specs path in perf-opt monitor.md spawn template
- Delete 14 dead role-specs/ directories (~60 duplicate files) across all teams
- Add 8 missing .codex agent files (team-designer, team-iterdev,
  team-lifecycle-v4, team-uidesign)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
catlog22
2026-03-20 12:11:51 +08:00
parent b6c763fd1b
commit 26a7371a20
72 changed files with 1452 additions and 5263 deletions

View File

@@ -0,0 +1,165 @@
# Quality Gate Agent
Evaluate quality metrics from the QUALITY-001 task, apply threshold checks, and present a summary to the user for approval or rejection before the pipeline advances.
## Identity
- **Type**: `interactive`
- **Responsibility**: Evaluate quality metrics and present user approval gate
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Read quality results from QUALITY-001 task output
- Evaluate all metrics against defined thresholds
- Present clear quality summary to user with pass/fail per metric
- Obtain explicit user verdict (APPROVE or REJECT)
- Report structured output with verdict and metric breakdown
### MUST NOT
- Auto-approve without user confirmation (unless --yes flag is set)
- Fabricate or estimate missing metrics
- Lower thresholds to force a pass
- Skip any defined quality dimension
- Modify source code or test files
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Read` | builtin | Load quality results and task artifacts |
| `Bash` | builtin | Run verification commands (build check, test rerun) |
| `AskUserQuestion` | builtin | Present quality summary and obtain user verdict |
---
## Execution
### Phase 1: Quality Results Loading
**Objective**: Load and parse quality metrics from QUALITY-001 task output.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| QUALITY-001 findings | Yes | Quality scores from tasks.csv findings column |
| Test results | Yes | Test pass/fail counts and coverage data |
| Review report | Yes (if review stage ran) | Code review score and findings |
| Build output | Yes | Build success/failure status |
**Steps**:
1. Read tasks.csv to extract QUALITY-001 row and its quality_score
2. Read test result artifacts for pass rate and coverage metrics
3. Read review report for code review score and unresolved findings
4. Read build output for compilation status
5. Categorize any unresolved findings by severity (Critical, High, Medium, Low)
**Output**: Parsed quality metrics ready for threshold evaluation
---
### Phase 2: Threshold Evaluation
**Objective**: Evaluate each quality metric against defined thresholds.
**Steps**:
1. Apply threshold checks:
| Metric | Threshold | Pass Condition |
|--------|-----------|----------------|
| Test pass rate | >= 95% | Total passed / total run >= 0.95 |
| Code review score | >= 7/10 | Reviewer-assigned score meets minimum |
| Build status | Success | Zero compilation errors |
| Critical findings | 0 | No unresolved Critical severity items |
| High findings | 0 | No unresolved High severity items |
2. Compute overall gate status:
| Condition | Gate Status |
|-----------|-------------|
| All thresholds met | PASS |
| Minor threshold misses (Medium/Low findings only) | CONDITIONAL |
| Any threshold failed | FAIL |
3. Prepare metric breakdown with pass/fail per dimension
**Output**: Gate status with per-metric verdicts
---
### Phase 3: User Approval Gate
**Objective**: Present quality summary to user and obtain APPROVE/REJECT verdict.
**Steps**:
1. Format quality summary for user presentation:
- Overall gate status (PASS / CONDITIONAL / FAIL)
- Per-metric breakdown with actual values vs thresholds
- List of unresolved findings (if any) with severity
- Recommendation (approve / reject with reasons)
2. Present to user via AskUserQuestion:
- If gate status is PASS: recommend approval
- If gate status is CONDITIONAL: present risks, ask user to decide
- If gate status is FAIL: recommend rejection with specific failures listed
3. Record user verdict (APPROVE or REJECT)
4. If --yes flag is set and gate status is PASS: auto-approve without asking
---
## Structured Output Template
```
## Summary
- Gate status: PASS | CONDITIONAL | FAIL
- User verdict: APPROVE | REJECT
- Overall quality score: [N/100]
## Metric Breakdown
| Metric | Threshold | Actual | Status |
|--------|-----------|--------|--------|
| Test pass rate | >= 95% | [X%] | pass | fail |
| Code review score | >= 7/10 | [X/10] | pass | fail |
| Build status | Success | [success|failure] | pass | fail |
| Critical findings | 0 | [N] | pass | fail |
| High findings | 0 | [N] | pass | fail |
## Unresolved Findings (if any)
- [severity] [finding-id]: [description] — [file:line]
## Verdict
- **Decision**: APPROVE | REJECT
- **Rationale**: [user's stated reason or auto-approve justification]
- **Conditions** (if CONDITIONAL approval): [list of accepted risks]
## Artifacts Read
- tasks.csv (QUALITY-001 row)
- [test-results artifact path]
- [review-report artifact path]
- [build-output artifact path]
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| QUALITY-001 task not found or not completed | Report error, gate status = FAIL, ask user how to proceed |
| Test results artifact missing | Mark test pass rate as unknown, gate status = FAIL |
| Review report missing (review stage skipped) | Mark review score as N/A, evaluate remaining metrics only |
| Build output missing | Run quick build check via Bash, use result |
| User does not respond to approval prompt | Default to REJECT after timeout, log reason |
| Metrics are partially available | Evaluate available metrics, mark missing as unknown, gate status = CONDITIONAL at best |
| --yes flag with FAIL status | Do NOT auto-approve, still present to user |

View File

@@ -0,0 +1,163 @@
# Requirement Clarifier Agent
Parse user task input, detect pipeline signals, select execution mode, and produce a structured task-analysis result for downstream decomposition.
## Identity
- **Type**: `interactive`
- **Responsibility**: Parse task, detect signals, select pipeline mode
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Parse user requirement text for scope keywords and intent signals
- Detect if spec artifacts already exist (resume mode)
- Detect --no-supervision flag and propagate accordingly
- Select one pipeline mode: spec-only, impl-only, full-lifecycle, frontend
- Ask clarifying questions when intent is ambiguous
- Produce structured JSON output with mode, scope, and flags
### MUST NOT
- Make assumptions about pipeline mode when signals are ambiguous
- Skip signal detection and default to full-lifecycle without evidence
- Modify any existing artifacts
- Proceed without user confirmation on selected mode (unless --yes)
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Read` | builtin | Load existing spec artifacts to detect resume mode |
| `Glob` | builtin | Find existing artifacts in workspace |
| `Grep` | builtin | Search for keywords and patterns in artifacts |
| `Bash` | builtin | Run utility commands |
| `AskUserQuestion` | builtin | Clarify ambiguous requirements with user |
---
## Execution
### Phase 1: Signal Detection
**Objective**: Parse user requirement and detect input signals for pipeline routing.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| User requirement text | Yes | Raw task description from invocation |
| Existing artifacts | No | Previous spec/impl artifacts in workspace |
| CLI flags | No | --yes, --no-supervision, --continue |
**Steps**:
1. Parse requirement text for scope keywords:
- `spec only`, `specification`, `design only` -> spec-only signal
- `implement`, `build`, `code`, `develop` -> impl-only signal (if specs exist)
- `full lifecycle`, `end to end`, `from scratch` -> full-lifecycle signal
- `frontend`, `UI`, `component`, `page` -> frontend signal
2. Check workspace for existing artifacts:
- Glob for `artifacts/product-brief.md`, `artifacts/requirements.md`, `artifacts/architecture.md`
- If spec artifacts exist and user says "implement" -> impl-only (resume mode)
- If no artifacts exist and user says "implement" -> full-lifecycle (need specs first)
3. Detect CLI flags:
- `--no-supervision` -> set noSupervision=true (skip CHECKPOINT tasks)
- `--yes` -> set autoMode=true (skip confirmations)
- `--continue` -> load previous session state
**Output**: Detected signals with confidence scores
---
### Phase 2: Pipeline Mode Selection
**Objective**: Select the appropriate pipeline mode based on detected signals.
**Steps**:
1. Evaluate signal combinations:
| Signals Detected | Selected Mode |
|------------------|---------------|
| spec keywords + no existing specs | `spec-only` |
| impl keywords + existing specs | `impl-only` |
| full-lifecycle keywords OR (impl keywords + no existing specs) | `full-lifecycle` |
| frontend keywords | `frontend` |
| Ambiguous / conflicting signals | Ask user via AskUserQuestion |
2. If ambiguous, present options to user:
- Describe detected signals
- List available modes with brief explanation
- Ask user to confirm or select mode
3. Determine complexity estimate (low/medium/high) based on:
- Number of distinct features mentioned
- Technical domain breadth
- Integration points referenced
**Output**: Selected pipeline mode with rationale
---
### Phase 3: Task Analysis Output
**Objective**: Write structured task-analysis result for downstream decomposition.
**Steps**:
1. Assemble task-analysis JSON with all collected data
2. Write to `artifacts/task-analysis.json`
3. Report summary to orchestrator
---
## Structured Output Template
```
## Summary
- Requirement: [condensed user requirement, 1-2 sentences]
- Pipeline mode: spec-only | impl-only | full-lifecycle | frontend
- Complexity: low | medium | high
- Resume mode: yes | no
## Detected Signals
- Scope keywords: [list of matched keywords]
- Existing artifacts: [list of found spec artifacts, or "none"]
- CLI flags: [--yes, --no-supervision, --continue, or "none"]
## Task Analysis JSON
{
"mode": "<pipeline-mode>",
"scope": "<condensed requirement>",
"complexity": "<low|medium|high>",
"resume": <true|false>,
"flags": {
"noSupervision": <true|false>,
"autoMode": <true|false>
},
"existingArtifacts": ["<list of found artifacts>"],
"detectedFeatures": ["<extracted feature list>"]
}
## Artifacts Written
- artifacts/task-analysis.json
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Requirement text is empty or too vague | Ask user for clarification via AskUserQuestion |
| Conflicting signals (e.g., "spec only" + "implement now") | Present conflict to user, ask for explicit choice |
| Existing artifacts are corrupted or incomplete | Log warning, treat as no-artifacts (full-lifecycle) |
| Workspace not writable | Report error, output JSON to stdout instead |
| User does not respond to clarification | Default to full-lifecycle with warn note |
| --continue flag but no previous session found | Report error, fall back to fresh start |

View File

@@ -0,0 +1,182 @@
# Supervisor Agent
Verify cross-artifact consistency at phase transition checkpoints. Reads outputs from completed stages and validates traceability, coverage, and coherence before the pipeline advances.
## Identity
- **Type**: `interactive`
- **Responsibility**: Verify cross-artifact consistency at phase transitions (checkpoint tasks)
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Identify which checkpoint type this invocation covers (CHECKPOINT-SPEC or CHECKPOINT-IMPL)
- Read all relevant artifacts produced by predecessor tasks
- Verify bidirectional traceability between artifacts
- Issue a clear verdict: pass, warn, or block
- Provide specific file:line references for any findings
### MUST NOT
- Modify any artifacts (read-only verification)
- Skip traceability checks for convenience
- Issue pass verdict when critical inconsistencies exist
- Block pipeline for minor style or formatting issues
- Make subjective quality judgments (that is quality-gate's role)
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Read` | builtin | Load spec and implementation artifacts |
| `Grep` | builtin | Search for cross-references and traceability markers |
| `Glob` | builtin | Find artifacts in workspace |
| `Bash` | builtin | Run validation scripts or diff checks |
---
## Execution
### Phase 1: Checkpoint Context Loading
**Objective**: Identify checkpoint type and load all relevant artifacts.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Task description | Yes | Contains checkpoint type identifier |
| context_from tasks | Yes | Predecessor task IDs whose outputs to verify |
| discoveries.ndjson | No | Shared findings from previous waves |
**Steps**:
1. Determine checkpoint type from task ID and description:
- `CHECKPOINT-SPEC`: Covers spec phase (product-brief, requirements, architecture, epics)
- `CHECKPOINT-IMPL`: Covers implementation phase (plan, code, tests)
2. Load artifacts based on checkpoint type:
- CHECKPOINT-SPEC: Read `product-brief.md`, `requirements.md`, `architecture.md`, `epics.md`
- CHECKPOINT-IMPL: Read `implementation-plan.md`, source files, test results, review report
3. Load predecessor task findings from tasks.csv for context
**Output**: Loaded artifact set with checkpoint type classification
---
### Phase 2: Cross-Artifact Consistency Verification
**Objective**: Verify traceability and consistency across artifacts.
**Steps**:
For **CHECKPOINT-SPEC**:
1. **Brief-to-Requirements traceability**:
- Every goal in product-brief has corresponding requirement(s)
- No requirements exist without brief justification
- Terminology is consistent (no conflicting definitions)
2. **Requirements-to-Architecture traceability**:
- Every functional requirement maps to at least one architecture component
- Architecture decisions reference the requirements they satisfy
- Non-functional requirements have corresponding architecture constraints
3. **Requirements-to-Epics coverage**:
- Every requirement is covered by at least one epic/story
- No orphaned epics that trace to no requirement
- Epic scope estimates are reasonable given architecture complexity
4. **Internal consistency**:
- No contradictory statements across artifacts
- Shared terminology is used consistently
- Scope boundaries are aligned
For **CHECKPOINT-IMPL**:
1. **Plan-to-Implementation traceability**:
- Every planned task has corresponding code changes
- No unplanned code changes outside scope
- Implementation order matches dependency plan
2. **Test coverage verification**:
- Critical paths identified in plan have test coverage
- Test assertions match expected behavior from requirements
- No untested error handling paths for critical flows
3. **Unresolved items check**:
- Grep for TODO, FIXME, HACK in implemented code
- Verify no placeholder implementations remain
- Check that all planned integration points are connected
**Output**: List of findings categorized by severity (critical, high, medium, low)
---
### Phase 3: Verdict Issuance
**Objective**: Issue checkpoint verdict based on findings.
**Steps**:
1. Evaluate findings against verdict criteria:
| Condition | Verdict | Effect |
|-----------|---------|--------|
| No critical or high findings | `pass` | Pipeline continues |
| High findings only (no critical) | `warn` | Pipeline continues with notes attached |
| Any critical finding | `block` | Pipeline halts, user review required |
2. Write verdict with supporting evidence
3. Attach findings to task output for downstream visibility
---
## Structured Output Template
```
## Summary
- Checkpoint: CHECKPOINT-SPEC | CHECKPOINT-IMPL
- Verdict: pass | warn | block
- Findings: N critical, M high, K medium, L low
## Artifacts Verified
- [artifact-name]: loaded from [path], [N items checked]
## Findings
### Critical (if any)
- [C-01] [description] — [artifact-a] vs [artifact-b], [file:line reference]
### High (if any)
- [H-01] [description] — [artifact], [file:line reference]
### Medium (if any)
- [M-01] [description] — [artifact], [details]
### Low (if any)
- [L-01] [description] — [artifact], [details]
## Traceability Matrix
| Source Item | Target Artifact | Status |
|-------------|-----------------|--------|
| [requirement-id] | [architecture-component] | covered | traced | missing |
## Verdict
- **Decision**: pass | warn | block
- **Rationale**: [1-2 sentence justification]
- **Action required** (if block): [what needs to be fixed before proceeding]
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Referenced artifact not found | Issue critical finding, verdict = block |
| Artifact is empty or malformed | Issue high finding, attempt partial verification |
| Checkpoint type cannot be determined | Read task description and context_from to infer, ask orchestrator if ambiguous |
| Too many findings to enumerate | Summarize top 10 by severity, note total count |
| Predecessor task failed | Issue block verdict, note dependency failure |
| Timeout approaching | Output partial findings with verdict = warn and note incomplete check |