Add quality gates and tuning strategies documentation

- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
2026-02-14 02:42:04 +08:00 · 2026-01-14 12:59:13 +08:00
parent 6b4b9b0775
commit 633d918da1
20 changed files with 5755 additions and 0 deletions
--- a/.claude/skills/skill-tuning/specs/quality-gates.md
+++ b/.claude/skills/skill-tuning/specs/quality-gates.md
@@ -0,0 +1,263 @@
+# Quality Gates
+
+Quality thresholds and verification criteria for skill tuning.
+
+## When to Use
+
+| Phase | Usage | Section |
+|-------|-------|---------|
+| action-generate-report | Calculate quality score | Scoring |
+| action-verify | Check quality gates | Gate Definitions |
+| action-complete | Final assessment | Pass Criteria |
+
+---
+
+## Quality Dimensions
+
+### 1. Issue Severity Distribution (40%)
+
+Measures the severity profile of identified issues.
+
+| Metric | Points | Impact |
+|--------|--------|-------------|
+| Critical Issues | -25 each | High penalty |
+| High Issues | -15 each | Significant penalty |
+| Medium Issues | -5 each | Moderate penalty |
+| Low Issues | -1 each | Minor penalty |
+
+**Score Calculation**:
+```javascript
+function calculateSeverityScore(issues) {
+  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
+  const deductions = issues.reduce((sum, issue) =>
+    sum + (weights[issue.severity] || 0), 0);
+  return Math.max(0, 100 - deductions);
+}
+```
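As a worked example of the deduction arithmetic above, the following sketch tallies the points lost per severity level. The `severityBreakdown` helper and the sample issue list are illustrative assumptions, not part of the committed spec.

```javascript
// Illustrative helper (hypothetical name): uses the same weights as
// calculateSeverityScore, but also reports the points lost per severity level.
function severityBreakdown(issues) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = { critical: 0, high: 0, medium: 0, low: 0 };

  for (const { severity } of issues) {
    // Unknown severities are ignored, mirroring the `|| 0` fallback above
    if (Object.prototype.hasOwnProperty.call(weights, severity)) {
      deductions[severity] += weights[severity];
    }
  }

  const total = Object.values(deductions).reduce((a, b) => a + b, 0);
  return { deductions, total, score: Math.max(0, 100 - total) };
}

// Sample data (made up for illustration): 1 high, 2 medium, 3 low
const sampleIssues = [
  { severity: 'high' },
  { severity: 'medium' }, { severity: 'medium' },
  { severity: 'low' }, { severity: 'low' }, { severity: 'low' },
];
const sampleSeverity = severityBreakdown(sampleIssues);
console.log(sampleSeverity);
```

For this sample the deductions are 15 + 10 + 3 = 28, so the severity score is 72. Five critical issues would deduct 125 points, which the `Math.max` floor clamps to a score of 0.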
+
+### 2. Fix Effectiveness (30%)
+
+Measures success rate of applied fixes.
+
+| Metric | Weight | Threshold |
+|--------|--------|-----------|
+| Fixes Verified Pass | +30 | > 80% pass rate |
+| Fixes Verified Fail | -20 | < 50% triggers review |
+| Issues Resolved | +10 | Per resolved issue |
+
+**Score Calculation** (pass rate of applied fixes, 0-100):
+```javascript
+function calculateFixScore(appliedFixes) {
+  const total = appliedFixes.length;
+  if (total === 0) return 100;  // No fixes needed = good
+
+  const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
+  return Math.round((passed / total) * 100);
+}
+```
+
+### 3. Coverage Completeness (20%)
+
+Measures diagnosis coverage across all areas.
+
+| Metric | Weight | Threshold |
+|--------|--------|-----------|
+| All 4 diagnoses complete | +20 | Full coverage |
+| 3 diagnoses complete | +15 | Good coverage |
+| 2 diagnoses complete | +10 | Partial coverage |
+| < 2 diagnoses complete | +0 | Insufficient |
+
+### 4. Iteration Efficiency (10%)
+
+Measures how quickly issues are resolved.
+
+| Metric | Weight | Threshold |
+|--------|--------|-----------|
+| Resolved in 1 iteration | +10 | Excellent |
+| Resolved in 2 iterations | +7 | Good |
+| Resolved in 3 iterations | +4 | Acceptable |
+| > 3 iterations | +0 | Needs improvement |
+
+---
+
+## Gate Definitions
+
+### Gate: PASS
+
+**Threshold**: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2
+
+**Meaning**: Skill is production-ready with minor issues.
+
+**Actions**:
+- Complete tuning session
+- Generate summary report
+- No further fixes required
+
+### Gate: REVIEW
+
+**Threshold**: Quality Score 60-79 OR High Issues 3-5
+
+**Meaning**: Skill has issues requiring attention.
+
+**Actions**:
+- Review remaining issues
+- Apply additional fixes if possible
+- May require manual intervention
+
+### Gate: FAIL
+
+**Threshold**: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5
+
+**Meaning**: Skill has serious issues blocking deployment.
+
+**Actions**:
+- Must fix critical issues
+- Re-run diagnosis after fixes
+- Consider architectural review
+
+---
+
+## Quality Score Calculation
+
+```javascript
+function calculateQualityScore(state) {
+  // Dimension 1: Severity (40%)
+  const severityScore = calculateSeverityScore(state.issues);
+
+  // Dimension 2: Fix Effectiveness (30%)
+  const fixScore = calculateFixScore(state.applied_fixes);
+
+  // Dimension 3: Coverage (20%)
+  const diagnosisCount = Object.values(state.diagnosis)
+    .filter(d => d != null).length;  // loose check also skips undefined entries
+  const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;
+
+  // Dimension 4: Efficiency (10%)
+  const efficiencyScore = state.iteration_count <= 1 ? 10 :
+                          state.iteration_count <= 2 ? 7 :
+                          state.iteration_count <= 3 ? 4 : 0;
+
+  // Weighted total
+  const total = (severityScore * 0.4) +
+                (fixScore * 0.3) +
+                (coverageScore * 1.0) +  // Already scaled to 20
+                (efficiencyScore * 1.0);  // Already scaled to 10
+
+  return Math.round(total);
+}
+
+function determineQualityGate(state) {
+  const score = calculateQualityScore(state);
+  const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
+  const highCount = state.issues.filter(i => i.severity === 'high').length;
+
+  if (criticalCount > 0) return 'fail';
+  if (highCount > 5) return 'fail';
+  if (score < 60) return 'fail';
+
+  if (highCount > 2) return 'review';
+  if (score < 80) return 'review';
+
+  return 'pass';
+}
+```
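To make the weighting concrete, here is a self-contained sketch that reproduces the calculation inline and returns each dimension's value next to the total. The `scoreBreakdown` name, the sample state, and the diagnosis keys (`context`, `memory`, `dataflow`, `agent`) are assumptions made for illustration; only the arithmetic follows the functions above.

```javascript
// Sketch: per-dimension values for one hypothetical session state.
function scoreBreakdown(state) {
  // Dimension 1: severity, 0-100 before weighting
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = state.issues.reduce(
    (sum, issue) => sum + (weights[issue.severity] || 0), 0);
  const severity = Math.max(0, 100 - deductions);

  // Dimension 2: fix pass rate, 0-100 before weighting
  const fixes = state.applied_fixes;
  const passed = fixes.filter(f => f.verification_result === 'pass').length;
  const fix = fixes.length === 0 ? 100 : Math.round((passed / fixes.length) * 100);

  // Dimension 3: coverage, already on a 0-20 scale
  const done = Object.values(state.diagnosis).filter(d => d != null).length;
  const coverage = [0, 0, 10, 15, 20][done] || 0;

  // Dimension 4: efficiency, already on a 0-10 scale
  const n = state.iteration_count;
  const efficiency = n <= 1 ? 10 : n <= 2 ? 7 : n <= 3 ? 4 : 0;

  const total = Math.round(severity * 0.4 + fix * 0.3 + coverage + efficiency);
  return { severity, fix, coverage, efficiency, total };
}

// Made-up state: 1 high + 2 medium issues, 4 of 5 fixes verified,
// 3 of 4 diagnoses complete, resolved in 2 iterations.
const exampleState = {
  issues: [{ severity: 'high' }, { severity: 'medium' }, { severity: 'medium' }],
  applied_fixes: [
    { verification_result: 'pass' }, { verification_result: 'pass' },
    { verification_result: 'pass' }, { verification_result: 'pass' },
    { verification_result: 'fail' },
  ],
  diagnosis: { context: {}, memory: {}, dataflow: {}, agent: null },
  iteration_count: 2,
};
const exampleBreakdown = scoreBreakdown(exampleState);
console.log(exampleBreakdown);
```

For this state the contributions are 75 × 0.4 = 30, 80 × 0.3 = 24, 15, and 7, for a total of 76. With no critical issues and a single high issue, a score of 76 falls in the REVIEW band under the gate logic above.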
+
+---
+
+## Verification Criteria
+
+### For Each Issue Type
+
+#### Context Explosion Issues
+- [ ] Token count does not grow unbounded
+- [ ] History limited to reasonable size
+- [ ] No full content in prompts (paths used instead)
+- [ ] Agent returns are compact
+
+#### Long-tail Forgetting Issues
+- [ ] Constraints visible in all phase prompts
+- [ ] State schema includes requirements field
+- [ ] Checkpoints exist at key milestones
+- [ ] Output matches original constraints
+
+#### Data Flow Issues
+- [ ] Single state.json after execution
+- [ ] No orphan state files
+- [ ] Schema validation active
+- [ ] Consistent field naming
+
+#### Agent Coordination Issues
+- [ ] All Task calls have error handling
+- [ ] Agent results validated before use
+- [ ] No nested agent calls
+- [ ] Tool declarations match usage
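One possible way to make these checklists machine-checkable is to hold them as data and report which items are still open. This is only a sketch: the `verificationCriteria` map, its keys, and the `unmetCriteria` helper are hypothetical names, and the `results` object is assumed to map each criterion string to `true` once it has been verified.

```javascript
// Hypothetical encoding of the four checklists above as plain data.
const verificationCriteria = {
  context_explosion: [
    'Token count does not grow unbounded',
    'History limited to reasonable size',
    'No full content in prompts (paths used instead)',
    'Agent returns are compact',
  ],
  long_tail_forgetting: [
    'Constraints visible in all phase prompts',
    'State schema includes requirements field',
    'Checkpoints exist at key milestones',
    'Output matches original constraints',
  ],
  data_flow: [
    'Single state.json after execution',
    'No orphan state files',
    'Schema validation active',
    'Consistent field naming',
  ],
  agent_coordination: [
    'All Task calls have error handling',
    'Agent results validated before use',
    'No nested agent calls',
    'Tool declarations match usage',
  ],
};

// Returns the criteria for `issueType` that are not marked true in `results`.
// An unknown issue type yields an empty list.
function unmetCriteria(issueType, results) {
  const criteria = verificationCriteria[issueType] || [];
  return criteria.filter(criterion => results[criterion] !== true);
}

// Example: two of the four data flow checks have been confirmed so far.
const openDataFlowChecks = unmetCriteria('data_flow', {
  'Single state.json after execution': true,
  'No orphan state files': true,
});
console.log(openDataFlowChecks);
```

Keeping the criteria as strings identical to the checklist text avoids a second naming scheme, at the cost of brittle keys if the wording changes.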
+
+---
+
+## Iteration Control
+
+### Max Iterations
+
+Default: 5 iterations
+
+**Rationale**:
+- Each iteration may introduce new issues
+- Diminishing returns after 3-4 iterations
+- Prevents infinite loops
+
+### Iteration Exit Criteria
+
+```javascript
+function shouldContinueIteration(state) {
+  // Exit if quality gate passed
+  if (state.quality_gate === 'pass') return false;
+
+  // Exit if max iterations reached
+  if (state.iteration_count >= state.max_iterations) return false;
+
+  // Exit if none of the last 10 recorded actions was a successful verification
+  if (state.iteration_count >= 2) {
+    const recentHistory = state.action_history.slice(-10);
+    const issuesResolvedRecently = recentHistory.filter(a =>
+      a.action === 'action-verify' && a.result === 'success'
+    ).length;
+
+    if (issuesResolvedRecently === 0) {
+      console.log('No progress in recent iterations, stopping.');
+      return false;
+    }
+  }
+
+  // Otherwise continue only while critical/high issues remain
+  const hasUrgentIssues = state.issues.some(i =>
+    i.severity === 'critical' || i.severity === 'high'
+  );
+
+  return hasUrgentIssues;
+}
+```
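The progress check above looks at the last 10 actions rather than at iterations directly. If the state also recorded the number of open critical/high issues after each iteration, a stall could be detected per iteration instead. The sketch below assumes such a record exists as a plain array (`urgentHistory` is a hypothetical field, not part of the state schema shown here).

```javascript
// Sketch: detect a stall by comparing urgent-issue counts across iterations.
// `urgentHistory` is a hypothetical array such as [4, 3, 3, 3]: the number of
// open critical/high issues recorded after each completed iteration.
function hasStalled(urgentHistory, window = 2) {
  // Need the count from before the window plus `window` later counts
  if (urgentHistory.length <= window) return false;

  const recent = urgentHistory.slice(-(window + 1));

  // Stalled when no iteration in the window lowered the count
  return recent.slice(1).every((count, i) => count >= recent[i]);
}

console.log(hasStalled([4, 3, 3, 3])); // count flat for two iterations
console.log(hasStalled([5, 4, 3]));    // count still dropping
```

With the default window of 2, `[4, 3, 3, 3]` counts as stalled because the last two iterations left the count at 3, while `[5, 4, 3]` does not because each iteration lowered it.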
+
+---
+
+## Reporting Format
+
+### Quality Summary Table
+
+| Dimension | Score | Weight | Weighted |
+|-----------|-------|--------|----------|
+| Severity Distribution | {score}/100 | 40% | {weighted} |
+| Fix Effectiveness | {score}/100 | 30% | {weighted} |
+| Coverage Completeness | {score}/20 | 20% | {score} |
+| Iteration Efficiency | {score}/10 | 10% | {score} |
+| **Total** | | | **{total}/100** |
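A minimal sketch of filling this template, assuming the four dimension scores are already available as numbers. The `renderQualitySummary` name and the shape of its `scores` argument are illustrative assumptions.

```javascript
// Sketch: fill the summary table template from per-dimension scores.
// `scores` is a hypothetical object: severity and fix are on a 0-100 scale,
// coverage is 0-20, efficiency is 0-10 (the latter two are already weighted).
function renderQualitySummary(scores) {
  const severityWeighted = scores.severity * 0.4;
  const fixWeighted = scores.fix * 0.3;
  const total = Math.round(
    severityWeighted + fixWeighted + scores.coverage + scores.efficiency);

  // One decimal place is enough here and hides floating-point noise
  const fmt = n => Number(n.toFixed(1));

  return [
    '| Dimension | Score | Weight | Weighted |',
    '|-----------|-------|--------|----------|',
    `| Severity Distribution | ${scores.severity}/100 | 40% | ${fmt(severityWeighted)} |`,
    `| Fix Effectiveness | ${scores.fix}/100 | 30% | ${fmt(fixWeighted)} |`,
    `| Coverage Completeness | ${scores.coverage}/20 | 20% | ${scores.coverage} |`,
    `| Iteration Efficiency | ${scores.efficiency}/10 | 10% | ${scores.efficiency} |`,
    `| **Total** | | | **${total}/100** |`,
  ].join('\n');
}

const summaryTable = renderQualitySummary(
  { severity: 75, fix: 80, coverage: 15, efficiency: 7 });
console.log(summaryTable);
```

The coverage and efficiency rows repeat their raw score in the Weighted column because those two dimensions are scored directly on their weighted scale.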
+
+### Gate Status
+
+```
+Quality Gate: {PASS|REVIEW|FAIL}
+
+Criteria:
+- Quality Score: {score} (pass: >= 80, fail: < 60)
+- Critical Issues: {count} (pass: 0, fail: > 0)
+- High Issues: {count} (pass: <= 2, fail: > 5)
+```