mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-28 09:23:08 +08:00

catlog22 a4fff6a591 feat: Add orchestrator template and roles for executor and planner

- Created a new orchestrator template for Codex skill design, detailing structure and execution phases.
- Introduced the executor role with responsibilities for task execution, including routing to backends and handling implementation.
- Added the planner role for requirement breakdown, issue creation, and task dispatching, ensuring a structured planning process.

2026-02-16 00:17:15 +08:00

Quality Standards

Quality criteria and validation gates for generated Codex skills.

Purpose

Phase	Usage
Phase 3	Reference during generation
Phase 4	Apply during validation

1. Quality Dimensions

1.1 Structural Completeness (30%)

Check	Weight	Criteria
Orchestrator exists	5	File present at expected path
Frontmatter valid	3	Contains name, description
Architecture diagram	3	ASCII flow showing spawn/wait/close
Agent Registry	4	Table with all agents, role paths, responsibilities
Phase Execution blocks	5	Code blocks for each phase with spawn/wait/close
Lifecycle Management	5	Timeout handling + cleanup protocol
Agent files complete	5	All new agent roles have complete role files

Scoring: Each check either passes (earning its full weight) or fails (earning 0). Dimension score = sum of earned weights / sum of all weights (30 for this dimension).
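The pass/fail scoring rule can be sketched as below. This is a minimal illustration, not the project's implementation: the `{ name, weight, passed }` result shape and the `dimensionScore` helper are assumptions, and the pass/fail values in the sample are made up.

```javascript
// Hypothetical check-result shape: { name, weight, passed }.
// Returns the dimension score as a ratio between 0 and 1.
function dimensionScore(checks) {
  const max = checks.reduce((sum, c) => sum + c.weight, 0)
  if (max === 0) return 0 // no checks: avoid dividing by zero
  const earned = checks.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0)
  return earned / max
}

// Weights copied from the Structural Completeness table; results are invented.
const structural = [
  { name: "Orchestrator exists", weight: 5, passed: true },
  { name: "Frontmatter valid", weight: 3, passed: true },
  { name: "Architecture diagram", weight: 3, passed: false },
  { name: "Agent Registry", weight: 4, passed: true },
  { name: "Phase Execution blocks", weight: 5, passed: true },
  { name: "Lifecycle Management", weight: 5, passed: false },
  { name: "Agent files complete", weight: 5, passed: true },
]

console.log(dimensionScore(structural)) // 22 earned out of 30
```

Because a failed check earns nothing, the two failures here cost 8 of the 30 available points.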

1.2 Pattern Compliance (40%)

Check	Weight	Criteria
Lifecycle balanced	6	Every spawn_agent has matching close_agent
Role loading correct	6	MANDATORY FIRST STEPS pattern used (not inline content)
Wait for results	5	wait() used for results (not close_agent)
Batch wait for parallel	5	Parallel agents use wait({ ids: [...] })
Timeout specified	4	All wait() calls have timeout_ms
Timeout handled	4	timed_out checked after every wait()
Structured output	5	Agents produce Summary/Findings/Changes/Tests/Questions
No Claude patterns	5	No Task(), TaskOutput(), resume: remaining

Scoring: Each check either passes (earning its full weight) or fails (earning 0). Dimension score = sum of earned weights / sum of all weights (40 for this dimension).
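Several of these checks could be approximated by scanning the generated orchestrator text. The sketch below is a text-level heuristic under stated assumptions: the function bodies are hypothetical (the source names `checkBalancedLifecycle` but does not show it), the regexes do not parse code, and the sample snippets are only strings to scan, not a statement of the real Codex call signatures.

```javascript
// Count matches of a global regex in a string.
function countMatches(text, regex) {
  return (text.match(regex) || []).length
}

// Heuristic: at least as many close_agent( calls as spawn_agent( calls.
function checkBalancedLifecycle(content) {
  const spawns = countMatches(content, /\bspawn_agent\s*\(/g)
  const closes = countMatches(content, /\bclose_agent\s*\(/g)
  return { name: "Lifecycle balanced", weight: 6, passed: closes >= spawns }
}

// Heuristic: every wait(...) call mentions timeout_ms in its arguments.
// Assumes the arguments contain no nested parentheses.
function checkTimeoutSpecified(content) {
  const calls = [...content.matchAll(/\bwait\s*\(([^)]*)\)/g)]
  const passed = calls.every(m => m[1].includes("timeout_ms"))
  return { name: "Timeout specified", weight: 4, passed }
}

// Flags the Claude-specific patterns listed in the table.
function checkNoClaudePatterns(content) {
  const forbidden = [/\bTask\s*\(/, /\bTaskOutput\s*\(/, /\bresume\s*:/]
  const passed = !forbidden.some(re => re.test(content))
  return { name: "No Claude patterns", weight: 5, passed }
}

// Illustrative inputs only; argument shapes are assumptions.
const compliant = [
  'const a = spawn_agent({ message: "..." })',
  "const r = wait({ ids: [a], timeout_ms: 60000 })",
  "if (r.timed_out) { /* recover */ }",
  "close_agent({ id: a })",
].join("\n")

const nonCompliant = [
  'const a = spawn_agent({ message: "..." })',
  "const r = wait({ ids: [a] })",
  'Task({ prompt: "..." })',
].join("\n")

console.log(checkBalancedLifecycle(compliant).passed)    // true
console.log(checkBalancedLifecycle(nonCompliant).passed) // false
```

Counting calls is deliberately crude: a spawn inside a loop paired with a single close would be misjudged, so a result like this is best treated as a prompt for review rather than proof.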

1.3 Content Quality (30%)

Check	Weight	Criteria
Orchestrator substantive	4	Content > 500 chars, not boilerplate
Code blocks present	3	>= 4 code blocks with executable patterns
Error handling	3	Timeout + recovery + partial results handling
No placeholders	4	No `{{...}}` or `TODO` remaining in output
Agent roles substantive	4	Each agent role > 300 chars with actionable steps
Output format defined	3	Structured output template in each agent
Goals/scope clear	4	Every spawn_agent has Goal + Scope + Deliverables
Conversion faithful	5	Source content preserved (if converting)

Scoring: Each check either passes (earning its full weight) or fails (earning 0). Dimension score = sum of earned weights / sum of all weights (30 for this dimension).
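The measurable parts of these criteria (length thresholds, placeholder detection, code block counts) lend themselves to simple text checks. The following is a rough sketch with hypothetical function bodies; it assumes the generated files are Markdown with triple-backtick fences, and it cannot judge qualities such as "not boilerplate" or "actionable".

```javascript
const FENCE = "`".repeat(3) // triple backtick, built here to keep this example fence-safe

// Fails if any {{...}} template variable or TODO marker remains.
function checkNoPlaceholders(content) {
  const hasTemplateVar = /\{\{[\s\S]*?\}\}/.test(content)
  const hasTodo = /\bTODO\b/.test(content)
  return { name: "No placeholders", weight: 4, passed: !hasTemplateVar && !hasTodo }
}

// Length threshold only: 500 chars for the orchestrator, 300 for agent roles.
function checkSubstantive(name, content, minChars) {
  return { name, weight: 4, passed: content.trim().length > minChars }
}

// Fence lines come in open/close pairs, so blocks = fence lines / 2.
function checkCodeBlocks(content) {
  const fenceLines = content
    .split("\n")
    .filter(line => line.trimStart().startsWith(FENCE)).length
  return { name: "Code blocks present", weight: 3, passed: Math.floor(fenceLines / 2) >= 4 }
}

const draft = "Goal: {{goal}}\nTODO: describe scope"
console.log(checkNoPlaceholders(draft).passed) // false
```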

2. Quality Gates

Verdict	Score	Action
PASS	>= 80%	Deliver to target location
REVIEW	60% to < 80%	Report issues; the user decides
FAIL	< 60%	Block delivery, list critical issues

2.1 Critical Failures (Auto-FAIL)

These issues force FAIL regardless of overall score:

No orchestrator file — skill has no entry point
Task() calls in output — runtime incompatible with Codex
No agent registry — agents cannot be identified
Missing close_agent — resource leak risk
Inline role content — violates Codex pattern (message bloat)
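How the auto-FAIL rule interacts with the score thresholds can be sketched as follows. The function names and the string-based detection are assumptions for illustration; inline role content is left out because it is hard to detect reliably from text alone.

```javascript
// orchestratorContent: file text, or null when the file is missing (assumed convention).
function findCriticalFailures(orchestratorContent) {
  if (orchestratorContent == null) return ["No orchestrator file"]
  const count = re => (orchestratorContent.match(re) || []).length
  const criticals = []
  if (/\bTask\s*\(/.test(orchestratorContent)) criticals.push("Task() calls in output")
  if (!/agent registry/i.test(orchestratorContent)) criticals.push("No agent registry")
  if (count(/\bspawn_agent\s*\(/g) > count(/\bclose_agent\s*\(/g)) {
    criticals.push("Missing close_agent")
  }
  return criticals
}

// Any critical failure forces FAIL; otherwise the score thresholds apply.
function decideVerdict(scorePercent, criticals) {
  if (criticals.length > 0) return "FAIL"
  if (scorePercent >= 80) return "PASS"
  if (scorePercent >= 60) return "REVIEW"
  return "FAIL"
}

console.log(decideVerdict(95, ["Missing close_agent"])) // FAIL
console.log(decideVerdict(85, []))                      // PASS
```

Checking critical failures before the score keeps a high-scoring but runtime-incompatible skill from being delivered.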

2.2 Warnings (Non-blocking)

Missing timeout handling — degraded reliability
No error handling section — reduced robustness
Placeholder text remaining — needs manual completion
Phase files missing — acceptable for simple skills

3. Validation Process

3.1 Automated Checks

function validateSkill(generatedFiles, codexSkillConfig) {
  const checks = []

  // Structural
  checks.push(checkFileExists(generatedFiles.orchestrator))
  checks.push(checkFrontmatter(generatedFiles.orchestrator))
  checks.push(checkSection(generatedFiles.orchestrator, "Architecture"))
  checks.push(checkSection(generatedFiles.orchestrator, "Agent Registry"))
  // ...

  // Pattern compliance
  const content = Read(generatedFiles.orchestrator)
  checks.push(checkBalancedLifecycle(content))
  checks.push(checkRoleLoading(content))
  checks.push(checkWaitPattern(content))
  // ...

  // Content quality
  checks.push(checkNoPlaceholders(content))
  checks.push(checkSubstantiveContent(content))
  // ...

  // Critical failures force FAIL regardless of the score (see 2.1)
  const criticals = checkCriticalFailures(content, generatedFiles)
  if (criticals.length > 0) return { verdict: "FAIL", criticals }

  // Score is a percentage (0-100); the thresholds mirror the gates in section 2
  const score = calculateWeightedScore(checks)
  const verdict = score >= 80 ? "PASS" : score >= 60 ? "REVIEW" : "FAIL"

  return { score, verdict, checks, issues: checks.filter(c => !c.passed) }
}
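The structural helpers called above are not shown in the source. One possible shape is sketched below, assuming the orchestrator is a Markdown file with `---`-delimited frontmatter and `#`-style headings. Unlike the calls in `validateSkill`, these hypothetical versions take the file content rather than a path, plus an explicit weight.

```javascript
// Passes when frontmatter exists and has non-empty name and description keys.
function checkFrontmatter(content) {
  const match = content.match(/^---\r?\n([\s\S]*?)\r?\n---/)
  const body = match ? match[1] : ""
  const hasName = /^name\s*:\s*\S/m.test(body)
  const hasDescription = /^description\s*:\s*\S/m.test(body)
  return { name: "Frontmatter valid", weight: 3, passed: hasName && hasDescription }
}

// Passes when some heading line contains the given title (case-insensitive).
function checkSection(content, title, weight) {
  const escaped = title.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  const heading = new RegExp("^#{1,6}\\s+.*" + escaped, "mi")
  return { name: title, weight, passed: heading.test(content) }
}

const orchestrator = [
  "---",
  "name: example-skill",
  "description: Illustrative orchestrator",
  "---",
  "# Example Skill",
  "## Architecture",
  "## Agent Registry",
].join("\n")

console.log(checkFrontmatter(orchestrator).passed)                  // true
console.log(checkSection(orchestrator, "Agent Registry", 4).passed) // true
console.log(checkSection(orchestrator, "Lifecycle", 5).passed)      // false
```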

3.2 Manual Review Points

For a REVIEW verdict, highlight the following points for the user's attention:

Agent role completeness — are all capabilities covered?
Interaction model appropriateness — right pattern for use case?
Timeout values — appropriate for expected task duration?
Scope definitions — clear boundaries for each agent?
Output format — suitable for downstream consumers?

4. Scoring Formula

Overall = Structural × 0.30 + PatternCompliance × 0.40 + ContentQuality × 0.30

Each term is the corresponding dimension score from section 1 (sum / max). Pattern compliance is weighted highest because correct behavior in the Codex runtime is critical.
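As a worked example of the formula, the sketch below combines three dimension ratios into an overall percentage. The `overallScore` helper and the sample ratios are illustrative assumptions; only the 0.30 / 0.40 / 0.30 weights come from the formula above.

```javascript
const DIMENSION_WEIGHTS = { structural: 0.30, pattern: 0.40, content: 0.30 }

// Each input is a dimension ratio between 0 and 1; the result is a percentage.
function overallScore(dimensions) {
  const weighted =
    dimensions.structural * DIMENSION_WEIGHTS.structural +
    dimensions.pattern * DIMENSION_WEIGHTS.pattern +
    dimensions.content * DIMENSION_WEIGHTS.content
  return weighted * 100
}

// Invented results: 22 of 30, 34 of 40, and 26 of 30 points earned.
const example = { structural: 22 / 30, pattern: 34 / 40, content: 26 / 30 }

console.log(overallScore(example).toFixed(1)) // prints 82.0
```

Since the check weights in each table sum to 30, 40, and 30, the overall percentage equals the total points earned across all three tables (22 + 34 + 26 = 82 here), which lands in the PASS range.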

5. Quality Improvement Guidance

Low Structural Score

Add missing sections to orchestrator
Create missing agent role files
Add frontmatter to all files

Low Pattern Score

Add MANDATORY FIRST STEPS to all spawn_agent messages
Replace inline role content with path references
Add close_agent for every spawn_agent
Add timeout_ms and timed_out handling to all wait calls
Remove any remaining Claude patterns

Low Content Score

Expand agent role definitions with more specific steps
Add concrete Goal/Scope/Deliverables to spawn messages
Replace placeholders with actual content
Add error handling for each phase
