mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-28 09:23:08 +08:00
# Quality Standards

Quality criteria and validation gates for generated Codex skills.

## Purpose

| Phase | Usage |
|-------|-------|
| Phase 3 | Reference during generation |
| Phase 4 | Apply during validation |

---

## 1. Quality Dimensions

### 1.1 Structural Completeness (30%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Orchestrator exists | 5 | File present at expected path |
| Frontmatter valid | 3 | Contains name, description |
| Architecture diagram | 3 | ASCII flow showing spawn/wait/close |
| Agent Registry | 4 | Table with all agents, role paths, responsibilities |
| Phase Execution blocks | 5 | Code blocks for each phase with spawn/wait/close |
| Lifecycle Management | 5 | Timeout handling + cleanup protocol |
| Agent files complete | 5 | All new agent roles have complete role files |

**Scoring**: Each check earns its full weight if it passes and 0 if it fails. Dimension score = earned weight / maximum weight (30).

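
The pass/fail scoring rule above can be sketched as a small function. This is a minimal illustration, not the generator's actual implementation: the `{ name, weight, passed }` record shape, the `dimensionScore` name, and the sample pass/fail values are assumptions made for this example.

```javascript
// Hypothetical check records for the Structural Completeness table.
// The { name, weight, passed } shape is an assumption for this sketch.
const structuralChecks = [
  { name: "Orchestrator exists", weight: 5, passed: true },
  { name: "Frontmatter valid", weight: 3, passed: true },
  { name: "Architecture diagram", weight: 3, passed: false },
  { name: "Agent Registry", weight: 4, passed: true },
  { name: "Phase Execution blocks", weight: 5, passed: true },
  { name: "Lifecycle Management", weight: 5, passed: true },
  { name: "Agent files complete", weight: 5, passed: true },
]

// Returns the dimension score as a fraction in [0, 1]:
// weight earned by passing checks divided by the maximum possible weight.
function dimensionScore(checks) {
  const max = checks.reduce((sum, c) => sum + c.weight, 0)
  if (max === 0) return 0 // no checks defined, nothing earned
  const earned = checks.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0)
  return earned / max
}

console.log(dimensionScore(structuralChecks)) // 27 / 30 = 0.9
```

Because a check is all-or-nothing, a single missing architecture diagram costs exactly its weight (3 of 30 points) however close the orchestrator came to having one.
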
### 1.2 Pattern Compliance (40%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Lifecycle balanced | 6 | Every `spawn_agent` has a matching `close_agent` |
| Role loading correct | 6 | MANDATORY FIRST STEPS pattern used (not inline content) |
| Wait for results | 5 | `wait()` used for results (not `close_agent`) |
| Batch wait for parallel | 5 | Parallel agents use `wait({ ids: [...] })` |
| Timeout specified | 4 | All `wait()` calls have `timeout_ms` |
| Timeout handled | 4 | `timed_out` checked after every `wait()` |
| Structured output | 5 | Agents produce Summary/Findings/Changes/Tests/Questions |
| No Claude patterns | 5 | No `Task()`, `TaskOutput()`, or `resume:` remaining |

**Scoring**: Each check earns its full weight if it passes and 0 if it fails. Dimension score = earned weight / maximum weight (40).

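
Two of the checks in this table lend themselves to simple text heuristics. The sketch below assumes the orchestrator content is available as a string. The function bodies, the returned record shape, and the call shapes inside `sample` are illustrative assumptions, not the project's actual validators or a statement of the Codex API.

```javascript
// Counts non-overlapping occurrences of a literal substring.
function countOccurrences(text, needle) {
  return text.split(needle).length - 1
}

// Lifecycle balanced: as many close_agent calls as spawn_agent calls.
// Counting is a heuristic; it does not prove that each agent id is closed.
function checkBalancedLifecycle(content) {
  const spawns = countOccurrences(content, "spawn_agent(")
  const closes = countOccurrences(content, "close_agent(")
  return { name: "Lifecycle balanced", weight: 6, passed: spawns === closes, spawns, closes }
}

// Timeout specified: every wait(...) call mentions timeout_ms.
// The regex stops at the first ")" and so assumes no nested parentheses.
function checkTimeoutSpecified(content) {
  const calls = content.match(/\bwait\s*\([^)]*\)/g) || []
  const missing = calls.filter(call => !call.includes("timeout_ms"))
  return { name: "Timeout specified", weight: 4, passed: missing.length === 0, missing }
}

// Hypothetical orchestrator excerpt; the argument shapes are assumed.
const sample = [
  'const a = spawn_agent({ message: "..." })',
  'const r = wait({ ids: [a], timeout_ms: 300000 })',
  'close_agent({ id: a })',
].join("\n")

console.log(checkBalancedLifecycle(sample).passed) // true
console.log(checkTimeoutSpecified(sample).passed)  // true
```

Text matching of this kind cannot see control flow, so a `close_agent` that sits on an unreachable branch would still count as a match.
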
### 1.3 Content Quality (30%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Orchestrator substantive | 4 | Content > 500 chars, not boilerplate |
| Code blocks present | 3 | >= 4 code blocks with executable patterns |
| Error handling | 3 | Timeout + recovery + partial results handling |
| No placeholders | 4 | No `{{...}}` or `TODO` remaining in output |
| Agent roles substantive | 4 | Each agent role > 300 chars with actionable steps |
| Output format defined | 3 | Structured output template in each agent |
| Goals/scope clear | 4 | Every `spawn_agent` has Goal + Scope + Deliverables |
| Conversion faithful | 5 | Source content preserved (if converting) |

**Scoring**: Each check earns its full weight if it passes and 0 if it fails. Dimension score = earned weight / maximum weight (30).

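
The placeholder and length criteria above are mechanical enough to sketch. This is one possible reading, under stated assumptions: the regular expression, the default threshold parameter, and the returned record shape are choices made for this example.

```javascript
// No placeholders: no {{...}} template markers and no TODO left in the output.
function checkNoPlaceholders(content) {
  const found = content.match(/\{\{[^}]*\}\}|\bTODO\b/g) || []
  return { name: "No placeholders", weight: 4, passed: found.length === 0, found }
}

// Orchestrator substantive: more than 500 characters after trimming.
// The "not boilerplate" half of the criterion needs human judgment
// and is not attempted here.
function checkSubstantiveContent(content, minChars = 500) {
  const length = content.trim().length
  return { name: "Orchestrator substantive", weight: 4, passed: length > minChars, length }
}

console.log(checkNoPlaceholders("Goal: {{goal}}").found)       // [ '{{goal}}' ]
console.log(checkSubstantiveContent("x".repeat(501)).passed)   // true
```

Returning the matched placeholders alongside the verdict lets a report name the exact text that still needs manual completion.
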
## 2. Quality Gates

| Verdict | Score | Action |
|---------|-------|--------|
| **PASS** | >= 80% | Deliver to target location |
| **REVIEW** | >= 60% and < 80% | Report issues, user decides |
| **FAIL** | < 60% | Block delivery, list critical issues |

### 2.1 Critical Failures (Auto-FAIL)

These issues force FAIL regardless of overall score:

1. **No orchestrator file**: skill has no entry point
2. **Task() calls in output**: runtime incompatible with Codex
3. **No agent registry**: agents cannot be identified
4. **Missing close_agent**: resource leak risk
5. **Inline role content**: violates Codex pattern (message bloat)

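
Four of the five critical failures listed above can be approximated with text checks. The sketch below is a hedged illustration: the `generatedFiles.orchestrator` field follows the validation code elsewhere in this document, but the detection heuristics (for example, treating a Markdown heading named "Agent Registry" as proof that a registry exists) are assumptions. Inline role content is left out because detecting it reliably needs more than pattern matching.

```javascript
// Returns a list of critical failure labels; an empty list means none found.
function checkCriticalFailures(content, generatedFiles) {
  const criticals = []

  // 1. No orchestrator file: the skill has no entry point.
  if (!generatedFiles.orchestrator) criticals.push("No orchestrator file")

  // 2. Task() calls in output: Claude pattern, incompatible with Codex.
  if (/\bTask\s*\(/.test(content)) criticals.push("Task() calls in output")

  // 3. No agent registry: assumed to appear as a Markdown heading.
  if (!/^#+\s*Agent Registry\b/m.test(content)) criticals.push("No agent registry")

  // 4. Missing close_agent: fewer close calls than spawn calls.
  const spawns = (content.match(/\bspawn_agent\s*\(/g) || []).length
  const closes = (content.match(/\bclose_agent\s*\(/g) || []).length
  if (closes < spawns) criticals.push("Missing close_agent")

  return criticals
}

const bad = "Task({})\nspawn_agent({})"
console.log(checkCriticalFailures(bad, {}).length) // 4
```

Any non-empty result forces a FAIL verdict before the weighted score is consulted.
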
### 2.2 Warnings (Non-blocking)

1. **Missing timeout handling**: degraded reliability
2. **No error handling section**: reduced robustness
3. **Placeholder text remaining**: needs manual completion
4. **Phase files missing**: acceptable for simple skills

## 3. Validation Process

### 3.1 Automated Checks

```javascript
// Outline of the validation pass. The check* helpers, Read(), and
// calculateWeightedScore() are not defined in this document.
function validateSkill(generatedFiles, codexSkillConfig) {
  const checks = []

  // Structural
  checks.push(checkFileExists(generatedFiles.orchestrator))
  checks.push(checkFrontmatter(generatedFiles.orchestrator))
  checks.push(checkSection(generatedFiles.orchestrator, "Architecture"))
  checks.push(checkSection(generatedFiles.orchestrator, "Agent Registry"))
  // ...

  // Pattern compliance
  const content = Read(generatedFiles.orchestrator)
  checks.push(checkBalancedLifecycle(content))
  checks.push(checkRoleLoading(content))
  checks.push(checkWaitPattern(content))
  // ...

  // Content quality
  checks.push(checkNoPlaceholders(content))
  checks.push(checkSubstantiveContent(content))
  // ...

  // Critical failures force FAIL regardless of score (section 2.1)
  const criticals = checkCriticalFailures(content, generatedFiles)
  if (criticals.length > 0) return { verdict: "FAIL", criticals }

  // Score is a percentage (0-100), compared against the section 2 gates
  const score = calculateWeightedScore(checks)
  const verdict = score >= 80 ? "PASS" : score >= 60 ? "REVIEW" : "FAIL"

  return { score, verdict, checks, issues: checks.filter(c => !c.passed) }
}
```

### 3.2 Manual Review Points

For a REVIEW verdict, highlight these points for the user:

1. Agent role completeness: are all capabilities covered?
2. Interaction model appropriateness: is it the right pattern for the use case?
3. Timeout values: are they appropriate for the expected task duration?
4. Scope definitions: are the boundaries clear for each agent?
5. Output format: is it suitable for downstream consumers?

## 4. Scoring Formula

```
Overall = Structural × 0.30 + PatternCompliance × 0.40 + ContentQuality × 0.30
```

Each dimension score is expressed as a percentage, so Overall is a percentage that can be compared directly against the quality gates. Pattern compliance is weighted highest because Codex runtime correctness is critical.

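
As a worked example of the formula, the sketch below combines three dimension percentages and maps the result to a verdict. The function names and sample scores are made up for illustration. The weights are applied as integers (30/40/30, divided by 100), which is arithmetically the same as the 0.30/0.40/0.30 form but avoids floating-point rounding at the gate thresholds.

```javascript
// Combines three dimension scores (each 0-100) into the overall percentage.
function overallScore({ structural, patternCompliance, contentQuality }) {
  return (structural * 30 + patternCompliance * 40 + contentQuality * 30) / 100
}

// Maps an overall percentage to the quality gate verdict.
function verdictFor(score) {
  return score >= 80 ? "PASS" : score >= 60 ? "REVIEW" : "FAIL"
}

// Hypothetical dimension scores.
const strong = { structural: 90, patternCompliance: 75, contentQuality: 80 }
const weakPatterns = { structural: 100, patternCompliance: 50, contentQuality: 70 }

console.log(overallScore(strong), verdictFor(overallScore(strong)))
// 81 PASS   (27 + 30 + 24)
console.log(overallScore(weakPatterns), verdictFor(overallScore(weakPatterns)))
// 71 REVIEW (30 + 20 + 21)
```

The second case shows the effect of the weighting: a skill with a perfect structural score still lands in REVIEW when half of its pattern compliance weight is lost.
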
## 5. Quality Improvement Guidance

### Low Structural Score

- Add missing sections to orchestrator
- Create missing agent role files
- Add frontmatter to all files

### Low Pattern Score

- Add MANDATORY FIRST STEPS to all `spawn_agent` messages
- Replace inline role content with path references
- Add `close_agent` for every `spawn_agent`
- Add `timeout_ms` and `timed_out` handling to all `wait` calls
- Remove any remaining Claude patterns

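
The fixes listed above can be pictured together in one phase function. The sketch below is only an illustration of the shape. The `spawn_agent`, `wait`, and `close_agent` functions defined here are local stand-in stubs so the example runs on its own, and their argument and return shapes are assumptions, not the Codex runtime's documented signatures. The role path, goal text, and timeout value are likewise hypothetical.

```javascript
// Stand-in stubs (assumed shapes) so the sketch is self-contained.
const openAgents = new Set()
let nextId = 1
function spawn_agent({ message }) {
  const id = `agent-${nextId++}`
  openAgents.add(id)
  return id
}
function wait({ ids, timeout_ms }) {
  // The stub pretends every agent finished within the timeout.
  return { timed_out: false, ids }
}
function close_agent({ id }) {
  openAgents.delete(id)
}

function runPhase(rolePath, goal) {
  // Role loading by path reference instead of inline role content.
  const message = [
    "## MANDATORY FIRST STEPS",
    `1. Read the role definition: ${rolePath}`,
    "",
    `Goal: ${goal}`,
  ].join("\n")

  const agentId = spawn_agent({ message })
  try {
    // Every wait carries timeout_ms, and timed_out is checked afterwards.
    const result = wait({ ids: [agentId], timeout_ms: 300000 })
    if (result.timed_out) return { ok: false, reason: "timeout" }
    return { ok: true, result }
  } finally {
    // close_agent runs on success, timeout, and thrown errors alike.
    close_agent({ id: agentId })
  }
}

console.log(runPhase("roles/example-agent.md", "Summarize findings").ok) // true
console.log(openAgents.size) // 0
```

Placing `close_agent` in a `finally` block is one way to keep the lifecycle balanced on every exit path, including the timeout branch.
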
### Low Content Score

- Expand agent role definitions with more specific steps
- Add concrete Goal/Scope/Deliverables to spawn messages
- Replace placeholders with actual content
- Add error handling for each phase