Claude-Code-Workflow/.claude/skills/codex-skill-designer/specs/quality-standards.md
catlog22 a4fff6a591 feat: Add orchestrator template and roles for executor and planner
- Created a new orchestrator template for Codex skill design, detailing structure and execution phases.
- Introduced the executor role with responsibilities for task execution, including routing to backends and handling implementation.
- Added the planner role for requirement breakdown, issue creation, and task dispatching, ensuring a structured planning process.
2026-02-16 00:17:15 +08:00


Quality Standards

Quality criteria and validation gates for generated Codex skills.

Purpose

| Phase | Usage |
|-------|-------|
| Phase 3 | Reference during generation |
| Phase 4 | Apply during validation |

1. Quality Dimensions

1.1 Structural Completeness (30%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Orchestrator exists | 5 | File present at expected path |
| Frontmatter valid | 3 | Contains name, description |
| Architecture diagram | 3 | ASCII flow showing spawn/wait/close |
| Agent Registry | 4 | Table with all agents, role paths, responsibilities |
| Phase Execution blocks | 5 | Code blocks for each phase with spawn/wait/close |
| Lifecycle Management | 5 | Timeout handling + cleanup protocol |
| Agent files complete | 5 | All new agent roles have complete role files |

Scoring: Each check either passes (full weight) or fails (0). Dimension score = sum of earned weights / sum of all weights (maximum 30).
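The pass/fail weighting can be sketched as follows. This is a minimal illustration, not the skill designer's actual implementation: the `scoreDimension` helper and the `{ name, weight, passed }` record shape are assumptions, and the pass/fail values are made up for the example.

```javascript
// Hypothetical check records for the Structural Completeness dimension.
// Weights come from the table above; the `passed` values are invented.
const structuralChecks = [
  { name: "Orchestrator exists", weight: 5, passed: true },
  { name: "Frontmatter valid", weight: 3, passed: true },
  { name: "Architecture diagram", weight: 3, passed: false },
  { name: "Agent Registry", weight: 4, passed: true },
  { name: "Phase Execution blocks", weight: 5, passed: true },
  { name: "Lifecycle Management", weight: 5, passed: false },
  { name: "Agent files complete", weight: 5, passed: true },
]

// Dimension score = earned weight / maximum weight, as a fraction in [0, 1].
function scoreDimension(checks) {
  const max = checks.reduce((sum, c) => sum + c.weight, 0)
  const earned = checks.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0)
  return max === 0 ? 0 : earned / max
}

console.log(scoreDimension(structuralChecks).toFixed(3)) // prints "0.733" (22 of 30 points)
```

There is no partial credit: a failed check contributes nothing, so two failed checks worth 3 and 5 points cost 8 of the 30 available points.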

1.2 Pattern Compliance (40%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Lifecycle balanced | 6 | Every spawn_agent has matching close_agent |
| Role loading correct | 6 | MANDATORY FIRST STEPS pattern used (not inline content) |
| Wait for results | 5 | wait() used for results (not close_agent) |
| Batch wait for parallel | 5 | Parallel agents use wait({ ids: [...] }) |
| Timeout specified | 4 | All wait() calls have timeout_ms |
| Timeout handled | 4 | timed_out checked after every wait() |
| Structured output | 5 | Agents produce Summary/Findings/Changes/Tests/Questions |
| No Claude patterns | 5 | No Task(), TaskOutput(), resume: remaining |

Scoring: Each check either passes (full weight) or fails (0). Dimension score = sum of earned weights / sum of all weights (maximum 40).
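As one illustration of a pattern check, the "Lifecycle balanced" criterion could be approximated by counting call sites in the generated orchestrator text. The sketch below is an assumption about how such a check might look; `countCalls` is a hypothetical helper, and comparing counts is only a heuristic (it cannot prove that each spawned agent is the one being closed).

```javascript
// Count textual call sites such as "spawn_agent(" in the orchestrator content.
function countCalls(content, fnName) {
  const pattern = new RegExp(`\\b${fnName}\\s*\\(`, "g")
  return (content.match(pattern) || []).length
}

// Heuristic: the lifecycle is treated as balanced when the number of
// spawn_agent call sites equals the number of close_agent call sites.
function checkBalancedLifecycle(content) {
  const spawned = countCalls(content, "spawn_agent")
  const closed = countCalls(content, "close_agent")
  return { name: "Lifecycle balanced", weight: 6, passed: spawned === closed, spawned, closed }
}

// Sample text with call arguments elided: two spawns but only one close.
const unbalanced = [
  "const a = spawn_agent(/* ... */)",
  "const b = spawn_agent(/* ... */)",
  "close_agent(/* ... */)",
].join("\n")

console.log(checkBalancedLifecycle(unbalanced).passed) // prints false
```

A count mismatch is a cheap signal that an agent may leak; agents spawned or closed inside loops would need a more careful analysis than this sketch performs.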

1.3 Content Quality (30%)

| Check | Weight | Criteria |
|-------|--------|----------|
| Orchestrator substantive | 4 | Content > 500 chars, not boilerplate |
| Code blocks present | 3 | >= 4 code blocks with executable patterns |
| Error handling | 3 | Timeout + recovery + partial results handling |
| No placeholders | 4 | No {{...}} or TODO remaining in output |
| Agent roles substantive | 4 | Each agent role > 300 chars with actionable steps |
| Output format defined | 3 | Structured output template in each agent |
| Goals/scope clear | 4 | Every spawn_agent has Goal + Scope + Deliverables |
| Conversion faithful | 5 | Source content preserved (if converting) |

Scoring: Each check either passes (full weight) or fails (0). Dimension score = sum of earned weights / sum of all weights (maximum 30).
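The "No placeholders" criterion lends itself to a simple text scan. The following is a sketch under the assumption that placeholders look like `{{name}}` template slots or the bare word TODO; `findPlaceholders` is a hypothetical helper, not part of the original spec.

```javascript
// Return every leftover template slot ({{...}}) or TODO marker in the text.
function findPlaceholders(content) {
  return content.match(/\{\{[^{}]*\}\}|\bTODO\b/g) || []
}

// Pass only when nothing was found; keep the matches for the issue report.
function checkNoPlaceholders(content) {
  const found = findPlaceholders(content)
  return { name: "No placeholders", weight: 4, passed: found.length === 0, found }
}

const draft = "Goal: {{goal}}\n// TODO: describe scope\nDeliverables: report"
console.log(checkNoPlaceholders(draft).found) // prints [ '{{goal}}', 'TODO' ]
```

Returning the matched strings alongside the verdict makes it easier to list exactly what still needs manual completion.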

2. Quality Gates

| Verdict | Score | Action |
|---------|-------|--------|
| PASS | >= 80% | Deliver to target location |
| REVIEW | >= 60% and < 80% | Report issues, user decides |
| FAIL | < 60% | Block delivery, list critical issues |

2.1 Critical Failures (Auto-FAIL)

These issues force FAIL regardless of overall score:

  1. No orchestrator file — skill has no entry point
  2. Task() calls in output — runtime incompatible with Codex
  3. No agent registry — agents cannot be identified
  4. Missing close_agent — resource leak risk
  5. Inline role content — violates Codex pattern (message bloat)
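The interaction between the score thresholds and the auto-FAIL rule can be sketched as a single decision function. `decideVerdict` is a hypothetical name, and the sketch assumes the score is a percentage from 0 to 100 and that critical failures arrive as a list of descriptions.

```javascript
// Critical failures override the score; otherwise the gate thresholds apply.
function decideVerdict(scorePercent, criticalFailures) {
  if (criticalFailures.length > 0) return "FAIL"
  if (scorePercent >= 80) return "PASS"
  if (scorePercent >= 60) return "REVIEW"
  return "FAIL"
}

console.log(decideVerdict(95, ["Missing close_agent"])) // prints FAIL
console.log(decideVerdict(79.5, []))                    // prints REVIEW
```

Checking criticals first means a skill scoring 95% is still blocked if, for example, an agent is never closed.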

2.2 Warnings (Non-blocking)

  1. Missing timeout handling — degraded reliability
  2. No error handling section — reduced robustness
  3. Placeholder text remaining — needs manual completion
  4. Phase files missing — acceptable for simple skills

3. Validation Process

3.1 Automated Checks

function validateSkill(generatedFiles, codexSkillConfig) {
  const checks = []

  // Structural
  checks.push(checkFileExists(generatedFiles.orchestrator))
  checks.push(checkFrontmatter(generatedFiles.orchestrator))
  checks.push(checkSection(generatedFiles.orchestrator, "Architecture"))
  checks.push(checkSection(generatedFiles.orchestrator, "Agent Registry"))
  // ...

  // Pattern compliance
  const content = Read(generatedFiles.orchestrator)
  checks.push(checkBalancedLifecycle(content))
  checks.push(checkRoleLoading(content))
  checks.push(checkWaitPattern(content))
  // ...

  // Content quality
  checks.push(checkNoPlaceholders(content))
  checks.push(checkSubstantiveContent(content))
  // ...

  // Critical failures
  const criticals = checkCriticalFailures(content, generatedFiles)
  if (criticals.length > 0) return { verdict: "FAIL", criticals }

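  // Score: calculateWeightedScore is expected to return a percentage (0-100),
  // so the thresholds below line up with the quality gates in section 2
  const score = calculateWeightedScore(checks)
  const verdict = score >= 80 ? "PASS" : score >= 60 ? "REVIEW" : "FAIL"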

  return { score, verdict, checks, issues: checks.filter(c => !c.passed) }
}

3.2 Manual Review Points

For REVIEW verdict, highlight these for user attention:

  1. Agent role completeness — are all capabilities covered?
  2. Interaction model appropriateness — right pattern for use case?
  3. Timeout values — appropriate for expected task duration?
  4. Scope definitions — clear boundaries for each agent?
  5. Output format — suitable for downstream consumers?

4. Scoring Formula

Overall = Structural × 0.30 + PatternCompliance × 0.40 + ContentQuality × 0.30

Pattern compliance is weighted highest because correctness on the Codex runtime is critical.
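A worked example of the formula, assuming each dimension score is the fraction of earned weight from section 1. The `overallScore` helper and the sample fractions are illustrative assumptions, not values from a real run.

```javascript
// Dimension weights from the formula above.
const DIMENSION_WEIGHTS = { structural: 0.30, pattern: 0.40, content: 0.30 }

// Combine per-dimension fractions (0 to 1) into a percentage with one decimal.
function overallScore(dimensions) {
  const fraction =
    dimensions.structural * DIMENSION_WEIGHTS.structural +
    dimensions.pattern * DIMENSION_WEIGHTS.pattern +
    dimensions.content * DIMENSION_WEIGHTS.content
  return Math.round(fraction * 1000) / 10
}

// Hypothetical result: 22/30 structural, 34/40 pattern, 30/30 content points.
const example = { structural: 22 / 30, pattern: 34 / 40, content: 30 / 30 }
console.log(overallScore(example)) // prints 86
```

Because the check weights in each dimension sum to 30, 40, and 30, the overall percentage equals the total points earned out of 100 (here 22 + 34 + 30 = 86), which clears the 80% PASS gate.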

5. Quality Improvement Guidance

Low Structural Score

  • Add missing sections to orchestrator
  • Create missing agent role files
  • Add frontmatter to all files

Low Pattern Score

  • Add MANDATORY FIRST STEPS to all spawn_agent messages
  • Replace inline role content with path references
  • Add close_agent for every spawn_agent
  • Add timeout_ms and timed_out handling to all wait calls
  • Remove any remaining Claude patterns
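To find leftover Claude patterns before removing them, a text scan along these lines may help. It is a sketch only: `findClaudePatterns` is a hypothetical helper, and the regular expressions are assumptions about how Task(), TaskOutput(), and resume: typically appear in generated output.

```javascript
// Patterns named in the "No Claude patterns" check, matched as plain text.
const CLAUDE_PATTERNS = [
  { label: "Task()", regex: /\bTask\s*\(/ },
  { label: "TaskOutput()", regex: /\bTaskOutput\s*\(/ },
  { label: "resume:", regex: /\bresume\s*:/ },
]

// Return the labels of every Claude-specific pattern still present.
function findClaudePatterns(content) {
  return CLAUDE_PATTERNS.filter(p => p.regex.test(content)).map(p => p.label)
}

console.log(findClaudePatterns("const r = Task({ resume: id })")) // prints [ 'Task()', 'resume:' ]
```

An empty result means the check passes; any returned label also corresponds to a critical failure when it is a Task() call.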

Low Content Score

  • Expand agent role definitions with more specific steps
  • Add concrete Goal/Scope/Deliverables to spawn messages
  • Replace placeholders with actual content
  • Add error handling for each phase