mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-05 01:50:27 +08:00
- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions. - Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination. - Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
6.7 KiB
6.7 KiB
Quality Gates
Quality thresholds and verification criteria for skill tuning.
When to Use
| Phase | Usage | Section |
|---|---|---|
| action-generate-report | Calculate quality score | Scoring |
| action-verify | Check quality gates | Gate Definitions |
| action-complete | Final assessment | Pass Criteria |
Quality Dimensions
1. Issue Severity Distribution (40%)
Measures the severity profile of identified issues.
| Metric | Weight | Calculation |
|---|---|---|
| Critical Issues | -25 each | High penalty |
| High Issues | -15 each | Significant penalty |
| Medium Issues | -5 each | Moderate penalty |
| Low Issues | -1 each | Minor penalty |
Score Calculation:
function calculateSeverityScore(issues) {
const weights = { critical: 25, high: 15, medium: 5, low: 1 };
const deductions = issues.reduce((sum, issue) =>
sum + (weights[issue.severity] || 0), 0);
return Math.max(0, 100 - deductions);
}
2. Fix Effectiveness (30%)
Measures success rate of applied fixes.
| Metric | Weight | Threshold |
|---|---|---|
| Fixes Verified Pass | +30 | > 80% pass rate |
| Fixes Verified Fail | -20 | < 50% triggers review |
| Issues Resolved | +10 | Per resolved issue |
Score Calculation:
function calculateFixScore(appliedFixes) {
const total = appliedFixes.length;
if (total === 0) return 100; // No fixes needed = good
const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
return Math.round((passed / total) * 100);
}
3. Coverage Completeness (20%)
Measures diagnosis coverage across all areas.
| Metric | Weight | Threshold |
|---|---|---|
| All 4 diagnosis complete | +20 | Full coverage |
| 3 diagnosis complete | +15 | Good coverage |
| 2 diagnosis complete | +10 | Partial coverage |
| < 2 diagnosis complete | +0 | Insufficient |
4. Iteration Efficiency (10%)
Measures how quickly issues are resolved.
| Metric | Weight | Threshold |
|---|---|---|
| Resolved in 1 iteration | +10 | Excellent |
| Resolved in 2 iterations | +7 | Good |
| Resolved in 3 iterations | +4 | Acceptable |
| > 3 iterations | +0 | Needs improvement |
Gate Definitions
Gate: PASS
Threshold: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2
Meaning: Skill is production-ready with minor issues.
Actions:
- Complete tuning session
- Generate summary report
- No further fixes required
Gate: REVIEW
Threshold: Quality Score 60-79 OR High Issues 3-5
Meaning: Skill has issues requiring attention.
Actions:
- Review remaining issues
- Apply additional fixes if possible
- May require manual intervention
Gate: FAIL
Threshold: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5
Meaning: Skill has serious issues blocking deployment.
Actions:
- Must fix critical issues
- Re-run diagnosis after fixes
- Consider architectural review
Quality Score Calculation
function calculateQualityScore(state) {
// Dimension 1: Severity (40%)
const severityScore = calculateSeverityScore(state.issues);
// Dimension 2: Fix Effectiveness (30%)
const fixScore = calculateFixScore(state.applied_fixes);
// Dimension 3: Coverage (20%)
const diagnosisCount = Object.values(state.diagnosis)
.filter(d => d !== null).length;
const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;
// Dimension 4: Efficiency (10%)
const efficiencyScore = state.iteration_count <= 1 ? 10 :
state.iteration_count <= 2 ? 7 :
state.iteration_count <= 3 ? 4 : 0;
// Weighted total
const total = (severityScore * 0.4) +
(fixScore * 0.3) +
(coverageScore * 1.0) + // Already scaled to 20
(efficiencyScore * 1.0); // Already scaled to 10
return Math.round(total);
}
function determineQualityGate(state) {
const score = calculateQualityScore(state);
const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
const highCount = state.issues.filter(i => i.severity === 'high').length;
if (criticalCount > 0) return 'fail';
if (highCount > 5) return 'fail';
if (score < 60) return 'fail';
if (highCount > 2) return 'review';
if (score < 80) return 'review';
return 'pass';
}
Verification Criteria
For Each Issue Type
Context Explosion Issues
- Token count does not grow unbounded
- History limited to reasonable size
- No full content in prompts (paths used instead)
- Agent returns are compact
Long-tail Forgetting Issues
- Constraints visible in all phase prompts
- State schema includes requirements field
- Checkpoints exist at key milestones
- Output matches original constraints
Data Flow Issues
- Single state.json after execution
- No orphan state files
- Schema validation active
- Consistent field naming
Agent Coordination Issues
- All Task calls have error handling
- Agent results validated before use
- No nested agent calls
- Tool declarations match usage
Iteration Control
Max Iterations
Default: 5 iterations
Rationale:
- Each iteration may introduce new issues
- Diminishing returns after 3-4 iterations
- Prevents infinite loops
Iteration Exit Criteria
function shouldContinueIteration(state) {
// Exit if quality gate passed
if (state.quality_gate === 'pass') return false;
// Exit if max iterations reached
if (state.iteration_count >= state.max_iterations) return false;
// Exit if no improvement in last 2 iterations
if (state.iteration_count >= 2) {
const recentHistory = state.action_history.slice(-10);
const issuesResolvedRecently = recentHistory.filter(a =>
a.action === 'action-verify' && a.result === 'success'
).length;
if (issuesResolvedRecently === 0) {
console.log('No progress in recent iterations, stopping.');
return false;
}
}
// Continue if critical/high issues remain
const hasUrgentIssues = state.issues.some(i =>
i.severity === 'critical' || i.severity === 'high'
);
return hasUrgentIssues;
}
Reporting Format
Quality Summary Table
| Dimension | Score | Weight | Weighted |
|---|---|---|---|
| Severity Distribution | {score}/100 | 40% | {weighted} |
| Fix Effectiveness | {score}/100 | 30% | {weighted} |
| Coverage Completeness | {score}/20 | 20% | {score} |
| Iteration Efficiency | {score}/10 | 10% | {score} |
| Total | {total}/100 |
Gate Status
Quality Gate: {PASS|REVIEW|FAIL}
Criteria:
- Quality Score: {score} (threshold: 60)
- Critical Issues: {count} (threshold: 0)
- High Issues: {count} (threshold: 5)