Claude-Code-Workflow/.claude/skills/skill-tuning/specs/quality-gates.md
catlog22 633d918da1 Add quality gates and tuning strategies documentation
- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
2026-01-14 12:59:13 +08:00

# Quality Gates
Quality thresholds and verification criteria for skill tuning.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| action-generate-report | Calculate quality score | Scoring |
| action-verify | Check quality gates | Gate Definitions |
| action-complete | Final assessment | Pass Criteria |
---
## Quality Dimensions
### 1. Issue Severity Distribution (40%)
Measures the severity profile of identified issues.
| Metric | Deduction | Impact |
|--------|--------|-------------|
| Critical Issues | -25 each | High penalty |
| High Issues | -15 each | Significant penalty |
| Medium Issues | -5 each | Moderate penalty |
| Low Issues | -1 each | Minor penalty |
**Score Calculation**:
```javascript
function calculateSeverityScore(issues) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = issues.reduce((sum, issue) =>
    sum + (weights[issue.severity] || 0), 0);
  return Math.max(0, 100 - deductions);
}
```
### 2. Fix Effectiveness (30%)
Measures success rate of applied fixes.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| Fixes Verified Pass | +30 | > 80% pass rate |
| Fixes Verified Fail | -20 | < 50% triggers review |
| Issues Resolved | +10 | Per resolved issue |
**Score Calculation**:
```javascript
function calculateFixScore(appliedFixes) {
  const total = appliedFixes.length;
  if (total === 0) return 100; // No fixes needed = good
  const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
  return Math.round((passed / total) * 100);
}
```
### 3. Coverage Completeness (20%)
Measures diagnosis coverage across all areas.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| All 4 diagnoses complete | +20 | Full coverage |
| 3 diagnoses complete | +15 | Good coverage |
| 2 diagnoses complete | +10 | Partial coverage |
| < 2 diagnoses complete | +0 | Insufficient |
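
The coverage table above can be sketched as a small helper. This is a minimal sketch, assuming the diagnosis results live in an object with one entry per diagnosis area, set to `null` (or left undefined) until that diagnosis has run. The function name `calculateCoverageScore` and the key names in the usage example are hypothetical.

```javascript
// Sketch: map the number of completed diagnoses to coverage points.
// Assumption: `diagnosis` values stay null/undefined until a diagnosis completes.
function calculateCoverageScore(diagnosis) {
  const completed = Object.values(diagnosis || {})
    .filter(d => d !== null && d !== undefined).length;
  // Index = number of completed diagnoses; fewer than 2 earns nothing.
  const points = [0, 0, 10, 15, 20];
  // Counts outside the table (more than 4) fall back to 0.
  return points[completed] ?? 0;
}

// Hypothetical key names, for illustration only.
const coverage = calculateCoverageScore({
  context: { issues: [] },
  memory: { issues: [] },
  dataflow: null,
  agent: { issues: [] },
});
console.log(coverage);
```

With three of the four areas complete, this example earns the "Good coverage" tier.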
### 4. Iteration Efficiency (10%)
Measures how quickly issues are resolved.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| Resolved in 1 iteration | +10 | Excellent |
| Resolved in 2 iterations | +7 | Good |
| Resolved in 3 iterations | +4 | Acceptable |
| > 3 iterations | +0 | Needs improvement |
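
The efficiency tiers above can be expressed as a simple threshold ladder. This is a minimal sketch; the function name `calculateEfficiencyScore` is hypothetical, and treating a count of 0 the same as 1 is an assumption.

```javascript
// Sketch: map the number of iterations used to efficiency points.
function calculateEfficiencyScore(iterationCount) {
  if (iterationCount <= 1) return 10; // Excellent
  if (iterationCount <= 2) return 7;  // Good
  if (iterationCount <= 3) return 4;  // Acceptable
  return 0;                           // Needs improvement
}

console.log([1, 2, 3, 4].map(calculateEfficiencyScore));
```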
---
## Gate Definitions
### Gate: PASS
**Threshold**: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2
**Meaning**: Skill is production-ready with minor issues.
**Actions**:
- Complete tuning session
- Generate summary report
- No further fixes required
### Gate: REVIEW
**Threshold**: No FAIL condition met AND (Quality Score 60-79 OR High Issues 3-5)
**Meaning**: Skill has issues requiring attention.
**Actions**:
- Review remaining issues
- Apply additional fixes if possible
- May require manual intervention
### Gate: FAIL
**Threshold**: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5
**Meaning**: Skill has serious issues blocking deployment.
**Actions**:
- Must fix critical issues
- Re-run diagnosis after fixes
- Consider architectural review
---
## Quality Score Calculation
```javascript
function calculateQualityScore(state) {
  // Dimension 1: Severity (40%)
  const severityScore = calculateSeverityScore(state.issues);

  // Dimension 2: Fix Effectiveness (30%)
  const fixScore = calculateFixScore(state.applied_fixes);

  // Dimension 3: Coverage (20%)
  const diagnosisCount = Object.values(state.diagnosis)
    .filter(d => d !== null).length;
  const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;

  // Dimension 4: Efficiency (10%)
  const efficiencyScore = state.iteration_count <= 1 ? 10 :
                          state.iteration_count <= 2 ? 7 :
                          state.iteration_count <= 3 ? 4 : 0;

  // Weighted total
  const total = (severityScore * 0.4) +
                (fixScore * 0.3) +
                (coverageScore * 1.0) +   // Already scaled to 20
                (efficiencyScore * 1.0);  // Already scaled to 10
  return Math.round(total);
}

function determineQualityGate(state) {
  const score = calculateQualityScore(state);
  const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
  const highCount = state.issues.filter(i => i.severity === 'high').length;

  if (criticalCount > 0) return 'fail';
  if (highCount > 5) return 'fail';
  if (score < 60) return 'fail';
  if (highCount > 2) return 'review';
  if (score < 80) return 'review';
  return 'pass';
}
```
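
As a worked example of the arithmetic above, consider a hypothetical session with one high and one medium issue, two of three fixes verified, all four diagnoses complete, and resolution in two iterations. All inputs are made up for illustration and do not come from a real run.

```javascript
// Worked example with made-up inputs; the numbers are illustrative only.
const severityScore = Math.max(0, 100 - (15 + 5)); // one high + one medium
const fixScore = Math.round((2 / 3) * 100);        // 2 of 3 fixes passed
const coverageScore = 20;                          // all 4 diagnoses complete
const efficiencyScore = 7;                         // resolved in 2 iterations

const total = Math.round(
  severityScore * 0.4 + fixScore * 0.3 + coverageScore + efficiencyScore
);

// Gate checks applied in the same order as determineQualityGate.
const criticalCount = 0;
const highCount = 1;
const gate =
  criticalCount > 0 || highCount > 5 || total < 60 ? 'fail'
  : highCount > 2 || total < 80 ? 'review'
  : 'pass';

console.log({ severityScore, fixScore, total, gate });
```

The weighted sum is 32 + 20.1 + 20 + 7 = 79.1, which rounds to 79, so this session lands in REVIEW even though it has only one high issue.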
---
## Verification Criteria
### For Each Issue Type
#### Context Explosion Issues
- [ ] Token count does not grow unbounded
- [ ] History limited to reasonable size
- [ ] No full content in prompts (paths used instead)
- [ ] Agent returns are compact
#### Long-tail Forgetting Issues
- [ ] Constraints visible in all phase prompts
- [ ] State schema includes requirements field
- [ ] Checkpoints exist at key milestones
- [ ] Output matches original constraints
#### Data Flow Issues
- [ ] Single state.json after execution
- [ ] No orphan state files
- [ ] Schema validation active
- [ ] Consistent field naming
#### Agent Coordination Issues
- [ ] All Task calls have error handling
- [ ] Agent results validated before use
- [ ] No nested agent calls
- [ ] Tool declarations match usage
---
## Iteration Control
### Max Iterations
Default: 5 iterations
**Rationale**:
- Each iteration may introduce new issues
- Diminishing returns after 3-4 iterations
- Prevents infinite loops
### Iteration Exit Criteria
```javascript
function shouldContinueIteration(state) {
  // Exit if quality gate passed
  if (state.quality_gate === 'pass') return false;

  // Exit if max iterations reached
  if (state.iteration_count >= state.max_iterations) return false;

  // Exit if none of the last 10 actions was a successful verification
  // (used as a proxy for "no improvement in recent iterations")
  if (state.iteration_count >= 2) {
    const recentHistory = state.action_history.slice(-10);
    const issuesResolvedRecently = recentHistory.filter(a =>
      a.action === 'action-verify' && a.result === 'success'
    ).length;
    if (issuesResolvedRecently === 0) {
      console.log('No progress in recent iterations, stopping.');
      return false;
    }
  }

  // Continue only while critical/high issues remain
  const hasUrgentIssues = state.issues.some(i =>
    i.severity === 'critical' || i.severity === 'high'
  );
  return hasUrgentIssues;
}
```
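
One way an exit check like `shouldContinueIteration` might be wired into a driver loop is sketched below. The names `runTuningLoop`, `runIteration`, and `shouldContinue` are hypothetical, and incrementing `iteration_count` in the loop itself (rather than inside an action) is an assumption.

```javascript
// Sketch of a driver loop around an exit check.
// `runIteration` and `shouldContinue` are caller-supplied, hypothetical callbacks.
function runTuningLoop(state, runIteration, shouldContinue) {
  // Hard cap as a safety net in case the exit check never returns false.
  const cap = state.max_iterations ?? 5;
  while (state.iteration_count < cap && shouldContinue(state)) {
    runIteration(state);        // diagnose, fix, verify (details omitted)
    state.iteration_count += 1; // assumption: the loop owns the counter
  }
  return state;
}

// Usage with stand-in callbacks: stop once no work remains.
const demo = { iteration_count: 0, max_iterations: 5, remaining: 2 };
runTuningLoop(demo, s => { s.remaining -= 1; }, s => s.remaining > 0);
console.log(demo.iteration_count);
```

Keeping the cap in the loop as well as in the exit check means a faulty check cannot cause an infinite loop.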
---
## Reporting Format
### Quality Summary Table
| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Severity Distribution | {score}/100 | 40% | {weighted} |
| Fix Effectiveness | {score}/100 | 30% | {weighted} |
| Coverage Completeness | {score}/20 | 20% | {score} |
| Iteration Efficiency | {score}/10 | 10% | {score} |
| **Total** | | | **{total}/100** |
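
The table template above could be filled in with a helper along these lines. This is a minimal sketch: `renderQualitySummary` is a hypothetical name, and it assumes severity and fix scores are on a 0-100 scale while coverage (0-20) and efficiency (0-10) are already expressed in weighted points.

```javascript
// Sketch: fill in the Quality Summary table from per-dimension scores.
function renderQualitySummary({ severity, fix, coverage, efficiency }) {
  const severityWeighted = severity * 0.4;
  const fixWeighted = fix * 0.3;
  const total = Math.round(severityWeighted + fixWeighted + coverage + efficiency);
  const fmt = n => Number(n.toFixed(1)); // trim floating-point noise
  return [
    '| Dimension | Score | Weight | Weighted |',
    '|-----------|-------|--------|----------|',
    `| Severity Distribution | ${severity}/100 | 40% | ${fmt(severityWeighted)} |`,
    `| Fix Effectiveness | ${fix}/100 | 30% | ${fmt(fixWeighted)} |`,
    `| Coverage Completeness | ${coverage}/20 | 20% | ${coverage} |`,
    `| Iteration Efficiency | ${efficiency}/10 | 10% | ${efficiency} |`,
    `| **Total** | | | **${total}/100** |`,
  ].join('\n');
}

console.log(renderQualitySummary({ severity: 100, fix: 100, coverage: 20, efficiency: 10 }));
```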
### Gate Status
```
Quality Gate: {PASS|REVIEW|FAIL}
Criteria:
- Quality Score: {score} (pass: >= 80, fail: < 60)
- Critical Issues: {count} (pass: 0, fail: > 0)
- High Issues: {count} (pass: <= 2, fail: > 5)
```