Files
Claude-Code-Workflow/.claude/skills/skill-tuning/specs/quality-gates.md
catlog22 633d918da1 Add quality gates and tuning strategies documentation
- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
2026-01-14 12:59:13 +08:00

6.7 KiB

Quality Gates

Quality thresholds and verification criteria for skill tuning.

When to Use

Phase Usage Section
action-generate-report Calculate quality score Scoring
action-verify Check quality gates Gate Definitions
action-complete Final assessment Pass Criteria

Quality Dimensions

1. Issue Severity Distribution (40%)

Measures the severity profile of identified issues.

Metric Weight Calculation
Critical Issues -25 each High penalty
High Issues -15 each Significant penalty
Medium Issues -5 each Moderate penalty
Low Issues -1 each Minor penalty

Score Calculation:

function calculateSeverityScore(issues) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = issues.reduce((sum, issue) =>
    sum + (weights[issue.severity] || 0), 0);
  return Math.max(0, 100 - deductions);
}

2. Fix Effectiveness (30%)

Measures success rate of applied fixes.

Metric Weight Threshold
Fixes Verified Pass +30 > 80% pass rate
Fixes Verified Fail -20 < 50% triggers review
Issues Resolved +10 Per resolved issue

Score Calculation:

function calculateFixScore(appliedFixes) {
  const total = appliedFixes.length;
  if (total === 0) return 100;  // No fixes needed = good

  const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
  return Math.round((passed / total) * 100);
}

3. Coverage Completeness (20%)

Measures diagnosis coverage across all areas.

Metric Weight Threshold
All 4 diagnosis complete +20 Full coverage
3 diagnosis complete +15 Good coverage
2 diagnosis complete +10 Partial coverage
< 2 diagnosis complete +0 Insufficient

4. Iteration Efficiency (10%)

Measures how quickly issues are resolved.

Metric Weight Threshold
Resolved in 1 iteration +10 Excellent
Resolved in 2 iterations +7 Good
Resolved in 3 iterations +4 Acceptable
> 3 iterations +0 Needs improvement

Gate Definitions

Gate: PASS

Threshold: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2

Meaning: Skill is production-ready with minor issues.

Actions:

  • Complete tuning session
  • Generate summary report
  • No further fixes required

Gate: REVIEW

Threshold: Quality Score 60-79 OR High Issues 3-5

Meaning: Skill has issues requiring attention.

Actions:

  • Review remaining issues
  • Apply additional fixes if possible
  • May require manual intervention

Gate: FAIL

Threshold: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5

Meaning: Skill has serious issues blocking deployment.

Actions:

  • Must fix critical issues
  • Re-run diagnosis after fixes
  • Consider architectural review

Quality Score Calculation

function calculateQualityScore(state) {
  // Dimension 1: Severity (40%)
  const severityScore = calculateSeverityScore(state.issues);

  // Dimension 2: Fix Effectiveness (30%)
  const fixScore = calculateFixScore(state.applied_fixes);

  // Dimension 3: Coverage (20%)
  const diagnosisCount = Object.values(state.diagnosis)
    .filter(d => d !== null).length;
  const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;

  // Dimension 4: Efficiency (10%)
  const efficiencyScore = state.iteration_count <= 1 ? 10 :
                          state.iteration_count <= 2 ? 7 :
                          state.iteration_count <= 3 ? 4 : 0;

  // Weighted total
  const total = (severityScore * 0.4) +
                (fixScore * 0.3) +
                (coverageScore * 1.0) +  // Already scaled to 20
                (efficiencyScore * 1.0);  // Already scaled to 10

  return Math.round(total);
}

function determineQualityGate(state) {
  const score = calculateQualityScore(state);
  const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
  const highCount = state.issues.filter(i => i.severity === 'high').length;

  if (criticalCount > 0) return 'fail';
  if (highCount > 5) return 'fail';
  if (score < 60) return 'fail';

  if (highCount > 2) return 'review';
  if (score < 80) return 'review';

  return 'pass';
}

Verification Criteria

For Each Issue Type

Context Explosion Issues

  • Token count does not grow unbounded
  • History limited to reasonable size
  • No full content in prompts (paths used instead)
  • Agent returns are compact

Long-tail Forgetting Issues

  • Constraints visible in all phase prompts
  • State schema includes requirements field
  • Checkpoints exist at key milestones
  • Output matches original constraints

Data Flow Issues

  • Single state.json after execution
  • No orphan state files
  • Schema validation active
  • Consistent field naming

Agent Coordination Issues

  • All Task calls have error handling
  • Agent results validated before use
  • No nested agent calls
  • Tool declarations match usage

Iteration Control

Max Iterations

Default: 5 iterations

Rationale:

  • Each iteration may introduce new issues
  • Diminishing returns after 3-4 iterations
  • Prevents infinite loops

Iteration Exit Criteria

function shouldContinueIteration(state) {
  // Exit if quality gate passed
  if (state.quality_gate === 'pass') return false;

  // Exit if max iterations reached
  if (state.iteration_count >= state.max_iterations) return false;

  // Exit if no improvement in last 2 iterations
  if (state.iteration_count >= 2) {
    const recentHistory = state.action_history.slice(-10);
    const issuesResolvedRecently = recentHistory.filter(a =>
      a.action === 'action-verify' && a.result === 'success'
    ).length;

    if (issuesResolvedRecently === 0) {
      console.log('No progress in recent iterations, stopping.');
      return false;
    }
  }

  // Continue if critical/high issues remain
  const hasUrgentIssues = state.issues.some(i =>
    i.severity === 'critical' || i.severity === 'high'
  );

  return hasUrgentIssues;
}

Reporting Format

Quality Summary Table

Dimension Score Weight Weighted
Severity Distribution {score}/100 40% {weighted}
Fix Effectiveness {score}/100 30% {weighted}
Coverage Completeness {score}/20 20% {score}
Iteration Efficiency {score}/10 10% {score}
Total {total}/100

Gate Status

Quality Gate: {PASS|REVIEW|FAIL}

Criteria:
- Quality Score: {score} (threshold: 60)
- Critical Issues: {count} (threshold: 0)
- High Issues: {count} (threshold: 5)