Add quality gates and tuning strategies documentation

- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
catlog22
2026-01-14 12:59:13 +08:00
parent 6b4b9b0775
commit 633d918da1
20 changed files with 5755 additions and 0 deletions

View File

@@ -0,0 +1,342 @@
---
name: skill-tuning
description: Universal skill diagnosis and optimization tool. Detect and fix skill execution issues including context explosion, long-tail forgetting, data flow disruption, and agent coordination failures. Supports Gemini CLI for deep analysis. Triggers on "skill tuning", "tune skill", "skill diagnosis", "optimize skill", "skill debug".
allowed-tools: Task, AskUserQuestion, Read, Write, Bash, Glob, Grep, mcp__ace-tool__search_context
---
# Skill Tuning
Universal skill diagnosis and optimization tool that identifies and resolves skill execution problems through iterative multi-agent analysis.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Skill Tuning Architecture (Autonomous Mode + Gemini CLI) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ⚠️ Phase 0: Specification Study → read specs + skill structure (mandatory)  │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │                  Orchestrator (state-driven decisions)                  │ │
│ │ read state → pick next action → execute → update state → loop until done│ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┬───────────┼───────────┬────────────┬────────────┐ │
│ ↓ ↓ ↓ ↓ ↓ ↓ │
│ ┌──────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌─────────┐ │
│ │ Init │ │Diagnose │ │Diagnose│ │Diagnose│ │Diagnose│ │ Gemini │ │
│ │ │ │ Context │ │ Memory │ │DataFlow│ │ Agent │ │Analysis │ │
│ └──────┘ └─────────┘ └────────┘ └────────┘ └────────┘ └─────────┘ │
│ │ │ │ │ │ │ │
│ └───────────┴───────────┴───────────┴────────────┴────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Apply Fixes + │ │
│ │ Verify Results │ │
│ └──────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Gemini CLI Integration │ │
│ │ Invoke the Gemini CLI on demand for deep analysis driven by user needs: │ │
│ │ • Complex problem analysis (prompt engineering, architecture review)    │ │
│ │ • Code pattern recognition (pattern matching, anti-pattern detection)   │ │
│ │ • Fix strategy generation (fix generation, refactoring suggestions)     │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Problem Domain
Based on comprehensive analysis, skill-tuning addresses **core skill issues** and **general optimization areas**:
### Core Skill Issues (auto-detected)
| Priority | Problem | Root Cause | Solution Strategy |
|----------|---------|------------|-------------------|
| **P0** | Data Flow Disruption | Scattered state, inconsistent formats | Centralized session store, transactional updates |
| **P1** | Agent Coordination | Fragile call chains, merge complexity | Dedicated orchestrator, enforced data contracts |
| **P2** | Context Explosion | Token accumulation, multi-turn bloat | Context summarization, sliding window, structured state |
| **P3** | Long-tail Forgetting | Early constraint loss | Constraint injection, checkpointing, goal alignment |
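The P2/P3 strategies above reduce to small helpers. Below is a minimal sketch of a sliding-window history compactor and per-phase constraint injection; `summarize` is a hypothetical helper, and `original_constraints` mirrors the state field suggested by the memory diagnosis later in this commit.
```javascript
// Sketch only: sliding window (P2) and constraint injection (P3).
// `summarize` is a hypothetical helper, not defined by this skill.
function compactHistory(history, { keepLast = 5, maxChars = 2000 } = {}) {
  if (history.length <= keepLast) return history;
  const summary = summarize(history.slice(0, -keepLast)).slice(0, maxChars);
  // One summary entry replaces all older turns; recent turns stay verbatim.
  return [
    { role: 'system', content: `Summary of earlier turns: ${summary}` },
    ...history.slice(-keepLast)
  ];
}

function injectConstraints(prompt, state) {
  // Re-state the original constraints in every phase prompt (P3 mitigation).
  const constraints = (state.original_constraints || []).map(c => `- ${c}`).join('\n');
  return `${prompt}\n\n[CONSTRAINTS]\n${constraints}`;
}
```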
### General Optimization Areas (on-demand analysis via Gemini CLI)
| Category | Issues | Gemini Analysis Scope |
|----------|--------|----------------------|
| **Prompt Engineering** | Ambiguous instructions, inconsistent output formats, hallucination risk | Prompt optimization, structured output design |
| **Architecture** | Poor phase decomposition, tangled dependencies, limited extensibility | Architecture review, modularization advice |
| **Performance** | Slow execution, high token consumption, redundant computation | Performance analysis, caching strategies |
| **Error Handling** | Poor error recovery, no degradation strategy, insufficient logging | Fault-tolerance design, observability improvements |
| **Output Quality** | Unstable output, format drift, quality fluctuation | Quality gating, validation mechanisms |
| **User Experience** | Clunky interaction, unclear feedback, invisible progress | UX optimization, progress tracking |
## Key Design Principles
1. **Problem-First Diagnosis**: Systematic identification before any fix attempt
2. **Data-Driven Analysis**: Record execution traces, token counts, state snapshots
3. **Iterative Refinement**: Multiple tuning rounds until quality gates pass
4. **Non-Destructive**: All changes are reversible with backup checkpoints
5. **Agent Coordination**: Use specialized sub-agents for each diagnosis type
6. **Gemini CLI On-Demand**: Deep analysis via CLI for complex/custom issues
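Principles 3 and 4 reduce to a simple loop condition. A minimal sketch, assuming the state fields defined in the State Schema below:
```javascript
// Iterate until the quality gate passes or a budget is exhausted.
function shouldContinueTuning(state) {
  return state.quality_gate !== 'pass' &&
         state.iteration_count < state.max_iterations &&
         state.error_count < state.max_errors;
}
```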
---
## Gemini CLI Integration
Dynamically invoke the Gemini CLI for deep analysis based on user needs.
### Trigger Conditions
| Condition | Action | CLI Mode |
|-----------|--------|----------|
| User describes a complex problem | Invoke Gemini to analyze the root cause | `analysis` |
| Auto-diagnosis finds a critical issue | Request deep analysis for confirmation | `analysis` |
| User requests an architecture review | Run architecture analysis | `analysis` |
| Fix code needs to be generated | Generate a fix proposal | `write` |
| Standard strategies do not apply | Request a customized strategy | `analysis` |
### CLI Command Template
```bash
ccw cli -p "
PURPOSE: ${purpose}
TASK: ${task_steps}
MODE: ${mode}
CONTEXT: @${skill_path}/**/*
EXPECTED: ${expected_output}
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/${mode}-protocol.md) | ${constraints}
" --tool gemini --mode ${mode} --cd ${skill_path}
```
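For illustration, the template can be filled and executed from skill code as follows. This is a hedged sketch: it assumes `Bash()` returns the command's stdout and that the CLI prints the EXPECTED payload as JSON; neither is guaranteed by this document.
```javascript
// Hypothetical wrapper around the CLI template above (analysis mode).
function runGeminiAnalysis({ purpose, taskSteps, skillPath, expected, constraints }) {
  const prompt = [
    `PURPOSE: ${purpose}`,
    `TASK: ${taskSteps}`,
    `MODE: analysis`,
    `CONTEXT: @${skillPath}/**/*`,
    `EXPECTED: ${expected}`,
    `RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | ${constraints}`
  ].join('\n');
  const stdout = Bash(`ccw cli -p "${prompt}" --tool gemini --mode analysis --cd ${skillPath}`);
  try {
    return JSON.parse(stdout);  // the analysis calls above declare JSON EXPECTED
  } catch {
    return { raw: stdout };     // fall back to raw text if output is not JSON
  }
}
```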
### Analysis Types
#### 1. Problem Root Cause Analysis
```bash
ccw cli -p "
PURPOSE: Identify root cause of skill execution issue: ${user_issue_description}
TASK: • Analyze skill structure and phase flow • Identify anti-patterns • Trace data flow issues
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with { root_causes: [], patterns_found: [], recommendations: [] }
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on execution flow
" --tool gemini --mode analysis
```
#### 2. Architecture Review
```bash
ccw cli -p "
PURPOSE: Review skill architecture for scalability and maintainability
TASK: • Evaluate phase decomposition • Check state management patterns • Assess agent coordination
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Architecture assessment with improvement recommendations
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on modularity
" --tool gemini --mode analysis
```
#### 3. Fix Strategy Generation
```bash
ccw cli -p "
PURPOSE: Generate fix strategy for issue: ${issue_id} - ${issue_description}
TASK: • Analyze issue context • Design fix approach • Generate implementation plan
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with { strategy: string, changes: [], verification_steps: [] }
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Minimal invasive changes
" --tool gemini --mode analysis
```
---
## Mandatory Prerequisites
> **CRITICAL**: Read these documents before executing any action.
### Core Specs (Required)
| Document | Purpose | Priority |
|----------|---------|----------|
| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification and detection patterns | **P0** |
| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies for each problem type | **P0** |
| [specs/quality-gates.md](specs/quality-gates.md) | Quality thresholds and verification criteria | P1 |
### Templates (Reference)
| Document | Purpose |
|----------|---------|
| [templates/diagnosis-report.md](templates/diagnosis-report.md) | Diagnosis report structure |
| [templates/fix-proposal.md](templates/fix-proposal.md) | Fix proposal format |
---
## Execution Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Phase 0: Specification Study (mandatory prerequisite - do not skip)         │
│   → Read: specs/problem-taxonomy.md (problem classification)                │
│   → Read: specs/tuning-strategies.md (tuning strategies)                    │
│   → Read: Target skill's SKILL.md and phases/*.md                           │
│   → Output: specs internalized, target skill structure understood           │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-init: Initialize Tuning Session │
│ → Create work directory: .workflow/.scratchpad/skill-tuning-{timestamp} │
│ → Initialize state.json with target skill info │
│ → Create backup of target skill files │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-context: Context Explosion Analysis │
│ → Scan for token accumulation patterns │
│ → Detect multi-turn dialogue growth │
│ → Output: context-diagnosis.json │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-memory: Long-tail Forgetting Analysis │
│ → Trace constraint propagation through phases │
│ → Detect early instruction loss │
│ → Output: memory-diagnosis.json │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-dataflow: Data Flow Analysis │
│ → Map state transitions between phases │
│ → Detect format inconsistencies │
│ → Output: dataflow-diagnosis.json │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-agent: Agent Coordination Analysis │
│ → Analyze agent call patterns │
│ → Detect result passing issues │
│ → Output: agent-diagnosis.json │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-generate-report: Consolidated Report │
│ → Merge all diagnosis results │
│ → Prioritize issues by severity │
│ → Output: tuning-report.md │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-propose-fixes: Fix Proposal Generation │
│ → Generate fix strategies for each issue │
│ → Create implementation plan │
│ → Output: fix-proposals.json │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-apply-fix: Apply Selected Fix │
│ → User selects fix to apply │
│ → Execute fix with backup │
│ → Update state with fix result │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-verify: Verification │
│ → Re-run affected diagnosis │
│ → Check quality gates │
│ → Update iteration count │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-complete: Finalization │
│ → Generate final report │
│ → Cleanup temporary files │
│ → Output: tuning-summary.md │
└─────────────────────────────────────────────────────────────────────────────┘
```
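The loop driving this flow can be sketched as follows. `selectNextAction` and the `actions` map stand in for the real decision logic in phases/orchestrator.md, and a production version would deep-merge dotted state keys such as `'diagnosis.agent'` rather than `Object.assign` them.
```javascript
// Illustrative orchestrator loop; see phases/orchestrator.md for the real logic.
async function orchestrate(state, workDir, actions) {
  while (state.status === 'running') {
    const next = selectNextAction(state);       // hypothetical decision function
    if (!next) break;                           // nothing left to do
    state.current_action = next;
    const result = await actions[next](state, workDir);
    Object.assign(state, result.stateUpdates);  // NOTE: dotted keys need a deep merge
    state.completed_actions.push(next);
    state.current_action = null;
    Write(`${workDir}/state.json`, JSON.stringify(state, null, 2));
  }
}
```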
## Directory Setup
```javascript
const timestamp = new Date().toISOString().slice(0,19).replace(/[-:T]/g, '');
const workDir = `.workflow/.scratchpad/skill-tuning-${timestamp}`;
Bash(`mkdir -p "${workDir}/diagnosis"`);
Bash(`mkdir -p "${workDir}/backups"`);
Bash(`mkdir -p "${workDir}/fixes"`);
```
## Output Structure
```
.workflow/.scratchpad/skill-tuning-{timestamp}/
├── state.json # Session state (orchestrator-managed)
├── diagnosis/
│ ├── context-diagnosis.json # Context explosion analysis
│ ├── memory-diagnosis.json # Long-tail forgetting analysis
│ ├── dataflow-diagnosis.json # Data flow analysis
│ └── agent-diagnosis.json # Agent coordination analysis
├── backups/
│ └── {skill-name}-backup/ # Original skill files backup
├── fixes/
│ ├── fix-proposals.json # Proposed fixes
│ └── applied-fixes.json # Applied fix history
├── tuning-report.md # Consolidated diagnosis report
└── tuning-summary.md # Final summary
```
## State Schema
```typescript
interface TuningState {
  status: 'pending' | 'running' | 'completed' | 'failed';
  target_skill: {
    name: string;
    path: string;
    execution_mode: 'sequential' | 'autonomous';
  };
  user_issue_description: string;
  focus_areas: string[];                 // empty = run all diagnoses
  diagnosis: {
    context: DiagnosisResult | null;
    memory: DiagnosisResult | null;
    dataflow: DiagnosisResult | null;
    agent: DiagnosisResult | null;
  };
  issues: Issue[];
  issues_by_severity: { critical: number; high: number; medium: number; low: number };
  proposed_fixes: Fix[];
  pending_fixes: string[];               // fix ids selected but not yet applied
  applied_fixes: AppliedFix[];
  backup_dir: string;
  iteration_count: number;
  max_iterations: number;
  quality_score: number;
  quality_gate: 'pass' | 'fail' | 'pending';
  started_at: string;
  completed_at: string | null;
  completed_actions: string[];
  current_action: string | null;
  errors: Error[];
  error_count: number;
  max_errors: number;
}
interface DiagnosisResult {
status: 'completed' | 'skipped';
issues_found: number;
severity: 'critical' | 'high' | 'medium' | 'low' | 'none';
details: any;
}
interface Issue {
  id: string;
  type: 'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure';
  severity: 'critical' | 'high' | 'medium' | 'low';
  location: { file: string; phase?: string };
  description: string;
  evidence: string[];
  root_cause: string;
  impact: string;
  suggested_fix: string;
}
interface Fix {
  id: string;
  issue_id: string;
  strategy: string;
  description: string;
  changes: FileChange[];
  risk: 'low' | 'medium' | 'high';
}
interface FileChange {
  file: string;                          // relative path, may contain wildcards
  action: 'modify' | 'create';
  diff?: string;                         // for 'modify'
  new_content?: string;                  // for 'create'
}
interface AppliedFix {
  fix_id: string;
  applied_at: string;
  success: boolean;
  backup_path: string;
  verification_result: 'pending' | 'pass' | 'fail' | 'rolled_back';
  rollback_available: boolean;
  changes_made: { file: string; action: string; backup: string | null }[];
}
```
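For reference, the initial state written by action-init might look like the following; the concrete values (names, paths, budgets) are illustrative, not prescribed.
```javascript
// Illustrative initial state.json conforming to TuningState (example values).
const initialState = {
  status: 'running',
  target_skill: { name: 'my-skill', path: 'skills/my-skill', execution_mode: 'autonomous' },
  user_issue_description: 'Later phases forget early constraints',
  focus_areas: [],                       // empty = run all diagnoses
  diagnosis: { context: null, memory: null, dataflow: null, agent: null },
  issues: [],
  issues_by_severity: { critical: 0, high: 0, medium: 0, low: 0 },
  proposed_fixes: [],
  pending_fixes: [],
  applied_fixes: [],
  backup_dir: `${workDir}/backups`,
  iteration_count: 0,
  max_iterations: 3,                     // example budget
  quality_score: 0,
  quality_gate: 'pending',
  started_at: new Date().toISOString(),
  completed_at: null,
  completed_actions: [],
  current_action: null,
  errors: [],
  error_count: 0,
  max_errors: 5                          // example budget
};
Write(`${workDir}/state.json`, JSON.stringify(initialState, null, 2));
```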
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/orchestrator.md](phases/orchestrator.md) | Orchestrator decision logic |
| [phases/state-schema.md](phases/state-schema.md) | State structure definition |
| [phases/actions/action-init.md](phases/actions/action-init.md) | Initialize tuning session |
| [phases/actions/action-diagnose-context.md](phases/actions/action-diagnose-context.md) | Context explosion diagnosis |
| [phases/actions/action-diagnose-memory.md](phases/actions/action-diagnose-memory.md) | Long-tail forgetting diagnosis |
| [phases/actions/action-diagnose-dataflow.md](phases/actions/action-diagnose-dataflow.md) | Data flow diagnosis |
| [phases/actions/action-diagnose-agent.md](phases/actions/action-diagnose-agent.md) | Agent coordination diagnosis |
| [phases/actions/action-generate-report.md](phases/actions/action-generate-report.md) | Report generation |
| [phases/actions/action-propose-fixes.md](phases/actions/action-propose-fixes.md) | Fix proposal |
| [phases/actions/action-apply-fix.md](phases/actions/action-apply-fix.md) | Fix application |
| [phases/actions/action-verify.md](phases/actions/action-verify.md) | Verification |
| [phases/actions/action-complete.md](phases/actions/action-complete.md) | Finalization |
| [phases/actions/action-abort.md](phases/actions/action-abort.md) | Abort on unrecoverable errors |
| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification |
| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies |
| [specs/quality-gates.md](specs/quality-gates.md) | Quality criteria |

View File

@@ -0,0 +1,164 @@
# Action: Abort
Abort the tuning session due to unrecoverable errors.
## Purpose
- Safely terminate on critical failures
- Preserve diagnostic information for debugging
- Ensure backup remains available
- Notify user of failure reason
## Preconditions
- [ ] state.error_count >= state.max_errors
- [ ] OR critical failure detected
## Execution
```javascript
async function execute(state, workDir) {
console.log('Aborting skill tuning session...');
const errors = state.errors;
const targetSkill = state.target_skill;
// Generate abort report
const abortReport = `# Skill Tuning Aborted
**Target Skill**: ${targetSkill?.name || 'Unknown'}
**Aborted At**: ${new Date().toISOString()}
**Reason**: Too many errors or critical failure
---
## Error Log
${errors.length === 0 ? '_No errors recorded_' :
errors.map((err, i) => `
### Error ${i + 1}
- **Action**: ${err.action}
- **Message**: ${err.message}
- **Time**: ${err.timestamp}
- **Recoverable**: ${err.recoverable ? 'Yes' : 'No'}
`).join('\n')}
---
## Session State at Abort
- **Status**: ${state.status}
- **Iteration Count**: ${state.iteration_count}
- **Completed Actions**: ${state.completed_actions.length}
- **Issues Found**: ${state.issues.length}
- **Fixes Applied**: ${state.applied_fixes.length}
---
## Recovery Options
### Option 1: Restore Original Skill
If any changes were made, restore from backup:
\`\`\`bash
cp -r "${state.backup_dir}/${targetSkill?.name || 'backup'}-backup"/* "${targetSkill?.path || 'target'}/"
\`\`\`
### Option 2: Resume from Last State
The session state is preserved at:
\`${workDir}/state.json\`
To resume:
1. Fix the underlying issue
2. Reset error_count in state.json
3. Re-run skill-tuning with --resume flag
### Option 3: Manual Investigation
Review the following files:
- Diagnosis results: \`${workDir}/diagnosis/*.json\`
- Error log: \`${workDir}/errors.json\`
- State snapshot: \`${workDir}/state.json\`
---
## Diagnostic Information
### Last Successful Action
${state.completed_actions.length > 0 ? state.completed_actions[state.completed_actions.length - 1] : 'None'}
### Current Action When Failed
${state.current_action || 'Unknown'}
### Partial Diagnosis Results
- Context: ${state.diagnosis.context ? 'Completed' : 'Not completed'}
- Memory: ${state.diagnosis.memory ? 'Completed' : 'Not completed'}
- Data Flow: ${state.diagnosis.dataflow ? 'Completed' : 'Not completed'}
- Agent: ${state.diagnosis.agent ? 'Completed' : 'Not completed'}
---
*Skill tuning aborted - please review errors and retry*
`;
// Write abort report
Write(`${workDir}/abort-report.md`, abortReport);
// Save error log
Write(`${workDir}/errors.json`, JSON.stringify(errors, null, 2));
  // Ask the user whether to restore the original skill
  try {
    const response = await AskUserQuestion({
      questions: [{
        question: `Skill tuning aborted due to ${errors.length} errors. Would you like to restore the original skill?`,
        header: 'Restore',
        multiSelect: false,
        options: [
          { label: 'Yes, restore', description: 'Restore original skill from backup' },
          { label: 'No, keep changes', description: 'Keep any partial changes made' }
        ]
      }]
    });
    if (response['Restore'] === 'Yes, restore' && state.backup_dir && targetSkill?.path) {
      // Restore from backup
      Bash(`cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/"`);
      console.log('Original skill restored from backup.');
    }
  } catch {
    // User cancelled: keep partial changes, do not restore
  }
return {
stateUpdates: {
status: 'failed',
completed_at: new Date().toISOString()
},
outputFiles: [`${workDir}/abort-report.md`, `${workDir}/errors.json`],
summary: `Tuning aborted: ${errors.length} errors. Check abort-report.md for details.`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
status: 'failed',
completed_at: '<timestamp>'
}
};
```
## Output
- **File**: `abort-report.md`
- **Location**: `${workDir}/abort-report.md`
## Error Handling
This action should not fail - it's the final error handler.
## Next Actions
- None (terminal state)

View File

@@ -0,0 +1,206 @@
# Action: Apply Fix
Apply a selected fix to the target skill with backup and rollback capability.
## Purpose
- Apply fix changes to target skill files
- Create backup before modifications
- Track applied fixes for verification
- Support rollback if needed
## Preconditions
- [ ] state.status === 'running'
- [ ] state.pending_fixes.length > 0
- [ ] state.proposed_fixes contains the fix to apply
## Execution
```javascript
async function execute(state, workDir) {
const pendingFixes = state.pending_fixes;
const proposedFixes = state.proposed_fixes;
const targetPath = state.target_skill.path;
const backupDir = state.backup_dir;
if (pendingFixes.length === 0) {
return {
stateUpdates: {},
outputFiles: [],
summary: 'No pending fixes to apply'
};
}
// Get next fix to apply
const fixId = pendingFixes[0];
const fix = proposedFixes.find(f => f.id === fixId);
if (!fix) {
return {
stateUpdates: {
pending_fixes: pendingFixes.slice(1),
errors: [...state.errors, {
action: 'action-apply-fix',
message: `Fix ${fixId} not found in proposals`,
timestamp: new Date().toISOString(),
recoverable: true
}]
},
outputFiles: [],
summary: `Fix ${fixId} not found, skipping`
};
}
console.log(`Applying fix ${fix.id}: ${fix.description}`);
// Create fix-specific backup
const fixBackupDir = `${backupDir}/before-${fix.id}`;
Bash(`mkdir -p "${fixBackupDir}"`);
const appliedChanges = [];
let success = true;
for (const change of fix.changes) {
try {
// Resolve file path (handle wildcards)
let targetFiles = [];
if (change.file.includes('*')) {
targetFiles = Glob(`${targetPath}/${change.file}`);
} else {
targetFiles = [`${targetPath}/${change.file}`];
}
for (const targetFile of targetFiles) {
// Backup original
const relativePath = targetFile.replace(targetPath + '/', '');
const backupPath = `${fixBackupDir}/${relativePath}`;
if (Glob(targetFile).length > 0) {
const originalContent = Read(targetFile);
Bash(`mkdir -p "$(dirname "${backupPath}")"`);
Write(backupPath, originalContent);
}
// Apply change based on action type
if (change.action === 'modify' && change.diff) {
// For now, append the diff as a comment/note
// Real implementation would parse and apply the diff
const existingContent = Read(targetFile);
// Simple diff application: look for context and apply
// This is a simplified version - real implementation would be more sophisticated
const newContent = existingContent + `\n\n<!-- Applied fix ${fix.id}: ${fix.description} -->\n`;
Write(targetFile, newContent);
appliedChanges.push({
file: relativePath,
action: 'modified',
backup: backupPath
});
} else if (change.action === 'create') {
Write(targetFile, change.new_content || '');
appliedChanges.push({
file: relativePath,
action: 'created',
backup: null
});
}
}
} catch (error) {
console.log(`Error applying change to ${change.file}: ${error.message}`);
success = false;
}
}
// Record applied fix
const appliedFix = {
fix_id: fix.id,
applied_at: new Date().toISOString(),
success: success,
backup_path: fixBackupDir,
verification_result: 'pending',
rollback_available: true,
changes_made: appliedChanges
};
// Update applied fixes log
const appliedFixesPath = `${workDir}/fixes/applied-fixes.json`;
let existingApplied = [];
try {
existingApplied = JSON.parse(Read(appliedFixesPath));
} catch (e) {
existingApplied = [];
}
existingApplied.push(appliedFix);
Write(appliedFixesPath, JSON.stringify(existingApplied, null, 2));
return {
stateUpdates: {
applied_fixes: [...state.applied_fixes, appliedFix],
pending_fixes: pendingFixes.slice(1) // Remove applied fix from pending
},
outputFiles: [appliedFixesPath],
summary: `Applied fix ${fix.id}: ${success ? 'success' : 'partial'}, ${appliedChanges.length} files modified`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
applied_fixes: [...existingApplied, newAppliedFix],
pending_fixes: remainingPendingFixes
}
};
```
## Rollback Function
```javascript
async function rollbackFix(fixId, state, workDir) {
const appliedFix = state.applied_fixes.find(f => f.fix_id === fixId);
if (!appliedFix || !appliedFix.rollback_available) {
throw new Error(`Cannot rollback fix ${fixId}`);
}
const backupDir = appliedFix.backup_path;
const targetPath = state.target_skill.path;
// Restore from backup
const backupFiles = Glob(`${backupDir}/**/*`);
for (const backupFile of backupFiles) {
const relativePath = backupFile.replace(backupDir + '/', '');
const targetFile = `${targetPath}/${relativePath}`;
const content = Read(backupFile);
Write(targetFile, content);
}
return {
stateUpdates: {
applied_fixes: state.applied_fixes.map(f =>
f.fix_id === fixId
? { ...f, rollback_available: false, verification_result: 'rolled_back' }
: f
)
}
};
}
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| File not found | Skip file, log warning |
| Write permission error | Retry with sudo or report |
| Backup creation failed | Abort fix, don't modify |
## Next Actions
- If pending_fixes.length > 0: action-apply-fix (continue)
- If all fixes applied: action-verify

View File

@@ -0,0 +1,195 @@
# Action: Complete
Finalize the tuning session with summary report and cleanup.
## Purpose
- Generate final summary report
- Record tuning statistics
- Clean up temporary files (optional)
- Provide recommendations for future maintenance
## Preconditions
- [ ] state.status === 'running'
- [ ] quality_gate === 'pass' OR max_iterations reached
## Execution
```javascript
async function execute(state, workDir) {
console.log('Finalizing skill tuning session...');
const targetSkill = state.target_skill;
const startTime = new Date(state.started_at);
const endTime = new Date();
const duration = Math.round((endTime - startTime) / 1000);
// Generate final summary
const summary = `# Skill Tuning Summary
**Target Skill**: ${targetSkill.name}
**Path**: ${targetSkill.path}
**Session Duration**: ${duration} seconds
**Completed**: ${endTime.toISOString()}
---
## Final Status
| Metric | Value |
|--------|-------|
| Final Health Score | ${state.quality_score}/100 |
| Quality Gate | ${state.quality_gate.toUpperCase()} |
| Total Iterations | ${state.iteration_count} |
| Issues Found | ${state.issues.length + state.applied_fixes.flatMap(f => f.issues_resolved || []).length} |
| Issues Resolved | ${state.applied_fixes.flatMap(f => f.issues_resolved || []).length} |
| Fixes Applied | ${state.applied_fixes.length} |
| Fixes Verified | ${state.applied_fixes.filter(f => f.verification_result === 'pass').length} |
---
## Diagnosis Summary
| Area | Issues Found | Severity |
|------|--------------|----------|
| Context Explosion | ${state.diagnosis.context?.issues_found || 'N/A'} | ${state.diagnosis.context?.severity || 'N/A'} |
| Long-tail Forgetting | ${state.diagnosis.memory?.issues_found || 'N/A'} | ${state.diagnosis.memory?.severity || 'N/A'} |
| Data Flow | ${state.diagnosis.dataflow?.issues_found || 'N/A'} | ${state.diagnosis.dataflow?.severity || 'N/A'} |
| Agent Coordination | ${state.diagnosis.agent?.issues_found || 'N/A'} | ${state.diagnosis.agent?.severity || 'N/A'} |
---
## Applied Fixes
${state.applied_fixes.length === 0 ? '_No fixes applied_' :
state.applied_fixes.map((fix, i) => `
### ${i + 1}. ${fix.fix_id}
- **Applied At**: ${fix.applied_at}
- **Success**: ${fix.success ? 'Yes' : 'No'}
- **Verification**: ${fix.verification_result}
- **Rollback Available**: ${fix.rollback_available ? 'Yes' : 'No'}
`).join('\n')}
---
## Remaining Issues
${state.issues.length === 0 ? '✅ All issues resolved!' :
`${state.issues.length} issues remain:\n\n` +
state.issues.map(issue =>
`- **[${issue.severity.toUpperCase()}]** ${issue.description} (${issue.id})`
).join('\n')}
---
## Recommendations
${generateRecommendations(state)}
---
## Backup Information
Original skill files backed up to:
\`${state.backup_dir}\`
To restore original skill:
\`\`\`bash
cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/"
\`\`\`
---
## Session Files
| File | Description |
|------|-------------|
| ${workDir}/tuning-report.md | Full diagnostic report |
| ${workDir}/diagnosis/*.json | Individual diagnosis results |
| ${workDir}/fixes/fix-proposals.json | Proposed fixes |
| ${workDir}/fixes/applied-fixes.json | Applied fix history |
| ${workDir}/tuning-summary.md | This summary |
---
*Skill tuning completed by skill-tuning*
`;
Write(`${workDir}/tuning-summary.md`, summary);
// Update final state
return {
stateUpdates: {
status: 'completed',
completed_at: endTime.toISOString()
},
outputFiles: [`${workDir}/tuning-summary.md`],
summary: `Tuning complete: ${state.quality_gate} with ${state.quality_score}/100 health score`
};
}
function generateRecommendations(state) {
const recommendations = [];
// Based on remaining issues
if (state.issues.some(i => i.type === 'context_explosion')) {
recommendations.push('- **Context Management**: Consider implementing a context summarization agent to prevent token growth');
}
if (state.issues.some(i => i.type === 'memory_loss')) {
recommendations.push('- **Constraint Tracking**: Add explicit constraint injection to each phase prompt');
}
if (state.issues.some(i => i.type === 'dataflow_break')) {
recommendations.push('- **State Centralization**: Migrate to single state.json with schema validation');
}
if (state.issues.some(i => i.type === 'agent_failure')) {
recommendations.push('- **Error Handling**: Wrap all Task calls in try-catch blocks');
}
// General recommendations
if (state.iteration_count >= state.max_iterations) {
recommendations.push('- **Deep Refactoring**: Consider architectural review if issues persist after multiple iterations');
}
if (state.quality_score < 80) {
recommendations.push('- **Regular Tuning**: Schedule periodic skill-tuning runs to catch issues early');
}
if (recommendations.length === 0) {
recommendations.push('- Skill is in good health! Monitor for regressions during future development.');
}
return recommendations.join('\n');
}
```
## State Updates
```javascript
return {
stateUpdates: {
status: 'completed',
completed_at: '<timestamp>'
}
};
```
## Output
- **File**: `tuning-summary.md`
- **Location**: `${workDir}/tuning-summary.md`
- **Format**: Markdown
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Summary write failed | Write to alternative location |
## Next Actions
- None (terminal state)

View File

@@ -0,0 +1,317 @@
# Action: Diagnose Agent Coordination
Analyze target skill for agent coordination failures - call chain fragility and result passing issues.
## Purpose
- Detect fragile agent call patterns
- Identify result passing issues
- Find missing error handling in agent calls
- Analyze agent return format consistency
## Preconditions
- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'agent' in state.focus_areas OR state.focus_areas is empty
## Detection Patterns
### Pattern 1: Unhandled Agent Failures
```regex
# Task calls without try-catch or error handling
/Task\s*\(\s*\{[^}]*\}\s*\)(?![^;]*catch)/
```
### Pattern 2: Missing Return Validation
```regex
# Agent result used directly without validation
/const\s+\w+\s*=\s*(?:await\s+)?Task\([^)]+\);\s*(?!.*(?:if|try|JSON\.parse))/
```
### Pattern 3: Inconsistent Agent Configuration
```regex
# Different agent configurations in same skill
/subagent_type:\s*['"](\w+)['"]/g
```
### Pattern 4: Deeply Nested Agent Calls
```regex
# Agent calling another agent (nested)
/Task\s*\([^)]*prompt:[^)]*Task\s*\(/
```
## Execution
```javascript
async function execute(state, workDir) {
const skillPath = state.target_skill.path;
const startTime = Date.now();
const issues = [];
const evidence = [];
console.log(`Diagnosing agent coordination in ${skillPath}...`);
// 1. Find all Task/agent calls
const allFiles = Glob(`${skillPath}/**/*.md`);
const agentCalls = [];
const agentTypes = new Set();
for (const file of allFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
// Find Task calls
const taskMatches = content.matchAll(/Task\s*\(\s*\{([^}]+)\}/g);
for (const match of taskMatches) {
const config = match[1];
// Extract agent type
const typeMatch = config.match(/subagent_type:\s*['"]([^'"]+)['"]/);
const agentType = typeMatch ? typeMatch[1] : 'unknown';
agentTypes.add(agentType);
// Check for error handling context
const hasErrorHandling = /try\s*\{.*Task|\.catch\(|await\s+Task.*\.then/s.test(
content.slice(Math.max(0, match.index - 100), match.index + match[0].length + 100)
);
// Check for result validation
const hasResultValidation = /JSON\.parse|if\s*\(\s*result|result\s*\?\./s.test(
content.slice(match.index, match.index + match[0].length + 200)
);
// Check for background execution
const runsInBackground = /run_in_background:\s*true/.test(config);
agentCalls.push({
file: relativePath,
agentType,
hasErrorHandling,
hasResultValidation,
runsInBackground,
config: config.slice(0, 200)
});
}
}
// 2. Analyze agent call patterns
const totalCalls = agentCalls.length;
const callsWithoutErrorHandling = agentCalls.filter(c => !c.hasErrorHandling);
const callsWithoutValidation = agentCalls.filter(c => !c.hasResultValidation);
// Issue: Missing error handling
if (callsWithoutErrorHandling.length > 0) {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: callsWithoutErrorHandling.length > 2 ? 'high' : 'medium',
location: { file: 'multiple' },
description: `${callsWithoutErrorHandling.length}/${totalCalls} agent calls lack error handling`,
evidence: callsWithoutErrorHandling.slice(0, 3).map(c =>
`${c.file}: ${c.agentType}`
),
root_cause: 'Agent failures not caught, may crash workflow',
impact: 'Unhandled agent errors cause cascading failures',
suggested_fix: 'Wrap Task calls in try-catch with graceful fallback'
});
evidence.push({
file: 'multiple',
pattern: 'missing_error_handling',
context: `${callsWithoutErrorHandling.length} calls affected`,
severity: 'high'
});
}
// Issue: Missing result validation
if (callsWithoutValidation.length > 0) {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: 'medium',
location: { file: 'multiple' },
description: `${callsWithoutValidation.length}/${totalCalls} agent calls lack result validation`,
evidence: callsWithoutValidation.slice(0, 3).map(c =>
`${c.file}: ${c.agentType} result not validated`
),
root_cause: 'Agent results used directly without type checking',
impact: 'Invalid agent output may corrupt state',
suggested_fix: 'Add JSON.parse with try-catch and schema validation'
});
}
// 3. Check for inconsistent agent types usage
if (agentTypes.size > 3 && state.target_skill.execution_mode === 'autonomous') {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: 'low',
location: { file: 'multiple' },
description: `Using ${agentTypes.size} different agent types`,
evidence: [...agentTypes].slice(0, 5),
root_cause: 'Multiple agent types increase coordination complexity',
impact: 'Different agent behaviors may cause inconsistency',
suggested_fix: 'Standardize on fewer agent types with clear roles'
});
}
// 4. Check for nested agent calls
for (const file of allFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
// Detect nested Task calls
const hasNestedTask = /Task\s*\([^)]*prompt:[^)]*Task\s*\(/s.test(content);
if (hasNestedTask) {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: 'high',
location: { file: relativePath },
description: 'Nested agent calls detected',
evidence: ['Agent prompt contains another Task call'],
root_cause: 'Agent calls another agent, creating deep nesting',
impact: 'Context explosion, hard to debug, unpredictable behavior',
suggested_fix: 'Flatten agent calls, use orchestrator to coordinate'
});
}
}
// 5. Check SKILL.md for agent configuration consistency
const skillMd = Read(`${skillPath}/SKILL.md`);
// Check if allowed-tools includes Task
const allowedTools = skillMd.match(/allowed-tools:\s*([^\n]+)/i);
if (allowedTools && !allowedTools[1].includes('Task') && totalCalls > 0) {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: 'medium',
location: { file: 'SKILL.md' },
description: 'Task tool used but not declared in allowed-tools',
evidence: [`${totalCalls} Task calls found, but Task not in allowed-tools`],
root_cause: 'Tool declaration mismatch',
impact: 'May cause runtime permission issues',
suggested_fix: 'Add Task to allowed-tools in SKILL.md front matter'
});
}
// 6. Check for agent result format consistency
const returnFormats = new Set();
for (const file of allFiles) {
const content = Read(file);
// Look for return format definitions
const returnMatch = content.match(/\[RETURN\][^[]*|return\s*\{[^}]+\}/gi);
if (returnMatch) {
returnMatch.forEach(r => {
const format = r.includes('JSON') ? 'json' :
r.includes('summary') ? 'summary' :
r.includes('file') ? 'file_path' : 'other';
returnFormats.add(format);
});
}
}
if (returnFormats.size > 2) {
issues.push({
id: `AGT-${issues.length + 1}`,
type: 'agent_failure',
severity: 'medium',
location: { file: 'multiple' },
description: 'Inconsistent agent return formats',
evidence: [...returnFormats],
root_cause: 'Different agents return data in different formats',
impact: 'Orchestrator must handle multiple format types',
suggested_fix: 'Standardize return format: {status, output_file, summary}'
});
}
// 7. Calculate severity
const criticalCount = issues.filter(i => i.severity === 'critical').length;
const highCount = issues.filter(i => i.severity === 'high').length;
const severity = criticalCount > 0 ? 'critical' :
highCount > 1 ? 'high' :
highCount > 0 ? 'medium' :
issues.length > 0 ? 'low' : 'none';
// 8. Write diagnosis result
const diagnosisResult = {
status: 'completed',
issues_found: issues.length,
severity: severity,
execution_time_ms: Date.now() - startTime,
details: {
patterns_checked: [
'error_handling',
'result_validation',
'agent_type_consistency',
'nested_calls',
'return_format_consistency'
],
patterns_matched: evidence.map(e => e.pattern),
evidence: evidence,
agent_analysis: {
total_agent_calls: totalCalls,
unique_agent_types: agentTypes.size,
calls_without_error_handling: callsWithoutErrorHandling.length,
calls_without_validation: callsWithoutValidation.length,
agent_types_used: [...agentTypes]
},
recommendations: [
callsWithoutErrorHandling.length > 0
? 'Add try-catch to all Task calls' : null,
callsWithoutValidation.length > 0
? 'Add result validation with JSON.parse and schema check' : null,
agentTypes.size > 3
? 'Consolidate agent types for consistency' : null
].filter(Boolean)
}
};
Write(`${workDir}/diagnosis/agent-diagnosis.json`,
JSON.stringify(diagnosisResult, null, 2));
return {
stateUpdates: {
'diagnosis.agent': diagnosisResult,
issues: [...state.issues, ...issues]
},
outputFiles: [`${workDir}/diagnosis/agent-diagnosis.json`],
summary: `Agent diagnosis: ${issues.length} issues found (severity: ${severity})`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
'diagnosis.agent': {
status: 'completed',
issues_found: <count>,
severity: '<critical|high|medium|low|none>',
// ... full diagnosis result
},
issues: [...existingIssues, ...newIssues]
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Regex match error | Use simpler patterns |
| File access error | Skip and continue |
## Next Actions
- Success: action-generate-report
- Skipped: If 'agent' not in focus_areas

View File

@@ -0,0 +1,243 @@
# Action: Diagnose Context Explosion
Analyze target skill for context explosion issues - token accumulation and multi-turn dialogue bloat.
## Purpose
- Detect patterns that cause context growth
- Identify multi-turn accumulation points
- Find missing context compression mechanisms
- Measure potential token waste
## Preconditions
- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'context' in state.focus_areas OR state.focus_areas is empty
## Detection Patterns
### Pattern 1: Unbounded History Accumulation
```regex
# Patterns that suggest history accumulation
/\bhistory\b.*\.push\b/
/\bmessages\b.*\.concat\b/
/\bconversation\b.*\+=/
/\bappend.*context\b/i
```
### Pattern 2: Full Content Passing
```regex
# Patterns that pass full content instead of references
/Read\([^)]+\).*\+.*Read\(/
/JSON\.stringify\(.*state\)/ # Full state serialization
/\$\{.*content\}/ # Template literal with full content
```
### Pattern 3: Missing Summarization
```regex
# Absence of compression/summarization mechanisms.
# Flag long files where this probe (used in the execution below) finds no match:
/summariz|compress|truncat|slice.*context/i
```
### Pattern 4: Agent Return Bloat
```regex
# Agent returning full content instead of path + summary
/return\s*\{[^}]*content:/
/return.*JSON\.stringify/
```
## Execution
```javascript
async function execute(state, workDir) {
const skillPath = state.target_skill.path;
const startTime = Date.now();
const issues = [];
const evidence = [];
console.log(`Diagnosing context explosion in ${skillPath}...`);
// 1. Scan all phase files
const phaseFiles = Glob(`${skillPath}/phases/**/*.md`);
for (const file of phaseFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
// Check Pattern 1: History accumulation
const historyPatterns = [
/history\s*[.=].*(?:push|concat|append)/gi,
/messages\s*=\s*\[.*\.\.\..*messages/gi,
/conversation.*\+=/gi
];
for (const pattern of historyPatterns) {
const matches = content.match(pattern);
if (matches) {
issues.push({
id: `CTX-${issues.length + 1}`,
type: 'context_explosion',
severity: 'high',
location: { file: relativePath },
description: 'Unbounded history accumulation detected',
evidence: matches.slice(0, 3),
root_cause: 'History/messages array grows without bounds',
impact: 'Token count increases linearly with iterations',
suggested_fix: 'Implement sliding window or summarization'
});
evidence.push({
file: relativePath,
pattern: 'history_accumulation',
context: matches[0],
severity: 'high'
});
}
}
// Check Pattern 2: Full content passing
const contentPatterns = [
/Read\s*\([^)]+\)\s*[\+,]/g,
/JSON\.stringify\s*\(\s*state\s*\)/g,
/\$\{[^}]*content[^}]*\}/g
];
for (const pattern of contentPatterns) {
const matches = content.match(pattern);
if (matches) {
issues.push({
id: `CTX-${issues.length + 1}`,
type: 'context_explosion',
severity: 'medium',
location: { file: relativePath },
description: 'Full content passed instead of reference',
evidence: matches.slice(0, 3),
root_cause: 'Entire file/state content included in prompts',
impact: 'Unnecessary token consumption',
suggested_fix: 'Pass file paths and summaries instead of full content'
});
evidence.push({
file: relativePath,
pattern: 'full_content_passing',
context: matches[0],
severity: 'medium'
});
}
}
// Check Pattern 3: Missing summarization
const hasSummarization = /summariz|compress|truncat|slice.*context/i.test(content);
const hasLongPrompts = content.length > 5000;
if (hasLongPrompts && !hasSummarization) {
issues.push({
id: `CTX-${issues.length + 1}`,
type: 'context_explosion',
severity: 'medium',
location: { file: relativePath },
description: 'Long phase file without summarization mechanism',
evidence: [`File length: ${content.length} chars`],
root_cause: 'No context compression for large content',
impact: 'Potential token overflow in long sessions',
suggested_fix: 'Add context summarization before passing to agents'
});
}
// Check Pattern 4: Agent return bloat
const returnPatterns = /return\s*\{[^}]*(?:content|full_output|complete_result):/g;
const returnMatches = content.match(returnPatterns);
if (returnMatches) {
issues.push({
id: `CTX-${issues.length + 1}`,
type: 'context_explosion',
severity: 'high',
location: { file: relativePath },
description: 'Agent returns full content instead of path+summary',
evidence: returnMatches.slice(0, 3),
root_cause: 'Agent output includes complete content',
impact: 'Context bloat when orchestrator receives full output',
suggested_fix: 'Return {output_file, summary} instead of {content}'
});
}
}
// 2. Calculate severity
const criticalCount = issues.filter(i => i.severity === 'critical').length;
const highCount = issues.filter(i => i.severity === 'high').length;
const severity = criticalCount > 0 ? 'critical' :
highCount > 2 ? 'high' :
highCount > 0 ? 'medium' :
issues.length > 0 ? 'low' : 'none';
// 3. Write diagnosis result
const diagnosisResult = {
status: 'completed',
issues_found: issues.length,
severity: severity,
execution_time_ms: Date.now() - startTime,
details: {
patterns_checked: [
'history_accumulation',
'full_content_passing',
'missing_summarization',
'agent_return_bloat'
],
patterns_matched: evidence.map(e => e.pattern),
evidence: evidence,
recommendations: [
issues.length > 0 ? 'Implement context summarization agent' : null,
highCount > 0 ? 'Add sliding window for conversation history' : null,
evidence.some(e => e.pattern === 'full_content_passing')
? 'Refactor to pass file paths instead of content' : null
].filter(Boolean)
}
};
Write(`${workDir}/diagnosis/context-diagnosis.json`,
JSON.stringify(diagnosisResult, null, 2));
return {
stateUpdates: {
'diagnosis.context': diagnosisResult,
issues: [...state.issues, ...issues],
'issues_by_severity.critical': state.issues_by_severity.critical + criticalCount,
'issues_by_severity.high': state.issues_by_severity.high + highCount
},
outputFiles: [`${workDir}/diagnosis/context-diagnosis.json`],
summary: `Context diagnosis: ${issues.length} issues found (severity: ${severity})`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
'diagnosis.context': {
status: 'completed',
issues_found: <count>,
severity: '<critical|high|medium|low|none>',
// ... full diagnosis result
},
issues: [...existingIssues, ...newIssues]
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| File read error | Skip file, log warning |
| Pattern matching error | Use fallback patterns |
| Write error | Retry to alternative path |
## Next Actions
- Success: action-diagnose-memory (or next in focus_areas)
- Skipped: If 'context' not in focus_areas

View File

@@ -0,0 +1,318 @@
# Action: Diagnose Data Flow Issues
Analyze target skill for data flow disruption - state inconsistencies and format variations.
## Purpose
- Detect inconsistent data formats between phases
- Identify scattered state storage
- Find missing data contracts
- Measure state transition integrity
## Preconditions
- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'dataflow' in state.focus_areas OR state.focus_areas is empty
## Detection Patterns
### Pattern 1: Multiple Storage Locations
```regex
# Data written to multiple paths without centralization
/Write\s*\(\s*[`'"][^`'"]+[`'"]/g
```
### Pattern 2: Inconsistent Field Names
```regex
# Same concept with different names, probed in pairs (see fieldNamePatterns below):
# /\.name\b/ vs /\.title\b/, /\.id\b/ vs /\.identifier\b/, /\.status\b/ vs /\.state\b/
```
### Pattern 3: Missing Schema Validation
```regex
# Absence of validation before state write.
# Flag files where JSON.parse appears but this probe finds no match:
/validat|schema|type.*check/i
```
### Pattern 4: Format Transformation Without Normalization
```regex
# Direct JSON.parse without error handling or normalization
/JSON\.parse\([^)]+\)(?!\s*\|\|)/
```
## Execution
```javascript
async function execute(state, workDir) {
const skillPath = state.target_skill.path;
const startTime = Date.now();
const issues = [];
const evidence = [];
console.log(`Diagnosing data flow in ${skillPath}...`);
// 1. Collect all Write operations to map data storage
const allFiles = Glob(`${skillPath}/**/*.md`);
const writeLocations = [];
const readLocations = [];
for (const file of allFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
// Find Write operations
const writeMatches = content.matchAll(/Write\s*\(\s*[`'"]([^`'"]+)[`'"]/g);
for (const match of writeMatches) {
writeLocations.push({
file: relativePath,
target: match[1],
isStateFile: match[1].includes('state.json') || match[1].includes('config.json')
});
}
// Find Read operations
const readMatches = content.matchAll(/Read\s*\(\s*[`'"]([^`'"]+)[`'"]/g);
for (const match of readMatches) {
readLocations.push({
file: relativePath,
source: match[1]
});
}
}
// 2. Check for scattered state storage
const stateTargets = writeLocations
.filter(w => w.isStateFile)
.map(w => w.target);
const uniqueStateFiles = [...new Set(stateTargets)];
if (uniqueStateFiles.length > 2) {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'high',
location: { file: 'multiple' },
description: `State stored in ${uniqueStateFiles.length} different locations`,
evidence: uniqueStateFiles.slice(0, 5),
root_cause: 'No centralized state management',
impact: 'State inconsistency between phases',
suggested_fix: 'Centralize state to single state.json with state manager'
});
evidence.push({
file: 'multiple',
pattern: 'scattered_state',
context: uniqueStateFiles.join(', '),
severity: 'high'
});
}
// 3. Check for inconsistent field naming
const fieldNamePatterns = {
'name_vs_title': [/\.name\b/, /\.title\b/],
'id_vs_identifier': [/\.id\b/, /\.identifier\b/],
'status_vs_state': [/\.status\b/, /\.state\b/],
'error_vs_errors': [/\.error\b/, /\.errors\b/]
};
const fieldUsage = {};
for (const file of allFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
for (const [patternName, patterns] of Object.entries(fieldNamePatterns)) {
for (const pattern of patterns) {
if (pattern.test(content)) {
if (!fieldUsage[patternName]) fieldUsage[patternName] = [];
fieldUsage[patternName].push({
file: relativePath,
pattern: pattern.toString()
});
}
}
}
}
for (const [patternName, usages] of Object.entries(fieldUsage)) {
const uniquePatterns = [...new Set(usages.map(u => u.pattern))];
if (uniquePatterns.length > 1) {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'medium',
location: { file: 'multiple' },
description: `Inconsistent field naming: ${patternName.replace('_vs_', ' vs ')}`,
evidence: usages.slice(0, 3).map(u => `${u.file}: ${u.pattern}`),
root_cause: 'Same concept referred to with different field names',
impact: 'Data may be lost during field access',
suggested_fix: `Standardize to single field name, add normalization function`
});
}
}
// 4. Check for missing schema validation
for (const file of allFiles) {
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
// Find JSON.parse without validation
const unsafeParses = content.match(/JSON\.parse\s*\([^)]+\)(?!\s*\?\?|\s*\|\|)/g);
const hasValidation = /validat|schema|type.*check/i.test(content);
if (unsafeParses && unsafeParses.length > 0 && !hasValidation) {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'medium',
location: { file: relativePath },
description: 'JSON parsing without validation',
evidence: unsafeParses.slice(0, 2),
root_cause: 'No schema validation after parsing',
impact: 'Invalid data may propagate through phases',
suggested_fix: 'Add schema validation after JSON.parse'
});
}
}
// 5. Check state schema if exists
const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0];
if (stateSchemaFile) {
const schemaContent = Read(stateSchemaFile);
// Check for type definitions
const hasTypeScript = /interface\s+\w+|type\s+\w+\s*=/i.test(schemaContent);
const hasValidationFunction = /function\s+validate|validateState/i.test(schemaContent);
if (hasTypeScript && !hasValidationFunction) {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'low',
location: { file: 'phases/state-schema.md' },
description: 'Type definitions without runtime validation',
evidence: ['TypeScript interfaces defined but no validation function'],
root_cause: 'Types are compile-time only, not enforced at runtime',
impact: 'Schema violations may occur at runtime',
suggested_fix: 'Add validateState() function using Zod or manual checks'
});
}
} else if (state.target_skill.execution_mode === 'autonomous') {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'high',
location: { file: 'phases/' },
description: 'Autonomous skill missing state-schema.md',
evidence: ['No state schema definition found'],
root_cause: 'State structure undefined for orchestrator',
impact: 'Inconsistent state handling across actions',
suggested_fix: 'Create phases/state-schema.md with explicit type definitions'
});
}
// 6. Check read-write alignment
const writtenFiles = new Set(writeLocations.map(w => w.target));
const readFiles = new Set(readLocations.map(r => r.source));
const writtenButNotRead = [...writtenFiles].filter(f =>
!readFiles.has(f) && !f.includes('output') && !f.includes('report')
);
if (writtenButNotRead.length > 0) {
issues.push({
id: `DF-${issues.length + 1}`,
type: 'dataflow_break',
severity: 'low',
location: { file: 'multiple' },
description: 'Files written but never read',
evidence: writtenButNotRead.slice(0, 3),
root_cause: 'Orphaned output files',
impact: 'Wasted storage and potential confusion',
suggested_fix: 'Remove unused writes or add reads where needed'
});
}
// 7. Calculate severity
const criticalCount = issues.filter(i => i.severity === 'critical').length;
const highCount = issues.filter(i => i.severity === 'high').length;
const severity = criticalCount > 0 ? 'critical' :
highCount > 1 ? 'high' :
highCount > 0 ? 'medium' :
issues.length > 0 ? 'low' : 'none';
// 8. Write diagnosis result
const diagnosisResult = {
status: 'completed',
issues_found: issues.length,
severity: severity,
execution_time_ms: Date.now() - startTime,
details: {
patterns_checked: [
'scattered_state',
'inconsistent_naming',
'missing_validation',
'read_write_alignment'
],
patterns_matched: evidence.map(e => e.pattern),
evidence: evidence,
data_flow_map: {
write_locations: writeLocations.length,
read_locations: readLocations.length,
unique_state_files: uniqueStateFiles.length
},
recommendations: [
uniqueStateFiles.length > 2 ? 'Implement centralized state manager' : null,
issues.some(i => i.description.includes('naming'))
? 'Create normalization layer for field names' : null,
issues.some(i => i.description.includes('validation'))
? 'Add Zod or JSON Schema validation' : null
].filter(Boolean)
}
};
Write(`${workDir}/diagnosis/dataflow-diagnosis.json`,
JSON.stringify(diagnosisResult, null, 2));
return {
stateUpdates: {
'diagnosis.dataflow': diagnosisResult,
issues: [...state.issues, ...issues]
},
outputFiles: [`${workDir}/diagnosis/dataflow-diagnosis.json`],
summary: `Data flow diagnosis: ${issues.length} issues found (severity: ${severity})`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
'diagnosis.dataflow': {
status: 'completed',
issues_found: <count>,
severity: '<critical|high|medium|low|none>',
// ... full diagnosis result
},
issues: [...existingIssues, ...newIssues]
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Glob pattern error | Use fallback patterns |
| File read error | Skip and continue |
## Next Actions
- Success: action-diagnose-agent (or next in focus_areas)
- Skipped: If 'dataflow' not in focus_areas

View File

@@ -0,0 +1,269 @@
# Action: Diagnose Long-tail Forgetting
Analyze target skill for long-tail effect and constraint forgetting issues.
## Purpose
- Detect loss of early instructions in long execution chains
- Identify missing constraint propagation mechanisms
- Find weak goal alignment between phases
- Measure instruction retention across phases
## Preconditions
- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'memory' in state.focus_areas OR state.focus_areas is empty
## Detection Patterns
### Pattern 1: Missing Constraint References
```regex
# Phases that don't reference original requirements.
# Flag later phases where this probe finds no match:
/requirements?|constraints?|original|initial|user_request/i
```
### Pattern 2: Goal Drift
```regex
# Later phases that define a task without restating global constraints.
# Flag files where /\[TASK\]/i matches but this probe does not:
/\[CONSTRAINTS?\]|\[REQUIREMENTS?\]|\[RULES?\]/i
```
### Pattern 3: No Checkpoint Mechanism
```regex
# Absence of state preservation at key points.
# Flag key phases where this probe finds no match:
/checkpoint|snapshot|preserve|savepoint/i
```
### Pattern 4: Implicit State Passing
```regex
# State passed implicitly through conversation rather than explicitly
/(?<!state\.)context\./
```
## Execution
```javascript
async function execute(state, workDir) {
const skillPath = state.target_skill.path;
const startTime = Date.now();
const issues = [];
const evidence = [];
console.log(`Diagnosing long-tail forgetting in ${skillPath}...`);
// 1. Analyze phase chain for constraint propagation
const phaseFiles = Glob(`${skillPath}/phases/*.md`)
.filter(f => !f.includes('orchestrator') && !f.includes('state-schema'))
.sort();
// Extract phase order (for sequential) or action dependencies (for autonomous)
const isAutonomous = state.target_skill.execution_mode === 'autonomous';
// 2. Check each phase for constraint awareness
let firstPhaseConstraints = [];
for (let i = 0; i < phaseFiles.length; i++) {
const file = phaseFiles[i];
const content = Read(file);
const relativePath = file.replace(skillPath + '/', '');
const phaseNum = i + 1;
// Extract constraints from first phase
if (i === 0) {
const constraintMatch = content.match(/\[CONSTRAINTS?\]([^[]*)/i);
if (constraintMatch) {
firstPhaseConstraints = constraintMatch[1]
.split('\n')
.filter(l => l.trim().startsWith('-'))
.map(l => l.trim().replace(/^-\s*/, ''));
}
}
// Check if later phases reference original constraints
if (i > 0 && firstPhaseConstraints.length > 0) {
const mentionsConstraints = firstPhaseConstraints.some(c =>
content.toLowerCase().includes(c.toLowerCase().slice(0, 20))
);
if (!mentionsConstraints) {
issues.push({
id: `MEM-${issues.length + 1}`,
type: 'memory_loss',
severity: 'high',
location: { file: relativePath, phase: `Phase ${phaseNum}` },
description: `Phase ${phaseNum} does not reference original constraints`,
evidence: [`Original constraints: ${firstPhaseConstraints.slice(0, 3).join(', ')}`],
root_cause: 'Constraint information not propagated to later phases',
impact: 'May produce output violating original requirements',
suggested_fix: 'Add explicit constraint injection or reference to state.original_constraints'
});
evidence.push({
file: relativePath,
pattern: 'missing_constraint_reference',
context: `Phase ${phaseNum} of ${phaseFiles.length}`,
severity: 'high'
});
}
}
// Check for goal drift - task without constraints
const hasTask = /\[TASK\]/i.test(content);
const hasConstraints = /\[CONSTRAINTS?\]|\[REQUIREMENTS?\]|\[RULES?\]/i.test(content);
if (hasTask && !hasConstraints && i > 1) {
issues.push({
id: `MEM-${issues.length + 1}`,
type: 'memory_loss',
severity: 'medium',
location: { file: relativePath },
description: 'Phase has TASK but no CONSTRAINTS/RULES section',
evidence: ['Task defined without boundary constraints'],
root_cause: 'Agent may not adhere to global constraints',
impact: 'Potential goal drift from original intent',
suggested_fix: 'Add [CONSTRAINTS] section referencing global rules'
});
}
// Check for checkpoint mechanism
const hasCheckpoint = /checkpoint|snapshot|preserve|savepoint/i.test(content);
const isKeyPhase = i === Math.floor(phaseFiles.length / 2) || i === phaseFiles.length - 1;
if (isKeyPhase && !hasCheckpoint && phaseFiles.length > 3) {
issues.push({
id: `MEM-${issues.length + 1}`,
type: 'memory_loss',
severity: 'low',
location: { file: relativePath },
description: 'Key phase without checkpoint mechanism',
evidence: [`Phase ${phaseNum} is a key milestone but has no state preservation`],
root_cause: 'Cannot recover from failures or verify constraint adherence',
impact: 'No rollback capability if constraints violated',
suggested_fix: 'Add checkpoint before major state changes'
});
}
}
// 3. Check for explicit state schema with constraints field
const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0];
if (stateSchemaFile) {
const schemaContent = Read(stateSchemaFile);
const hasConstraintsField = /constraints|requirements|original_request/i.test(schemaContent);
if (!hasConstraintsField) {
issues.push({
id: `MEM-${issues.length + 1}`,
type: 'memory_loss',
severity: 'medium',
location: { file: 'phases/state-schema.md' },
description: 'State schema lacks constraints/requirements field',
evidence: ['No dedicated field for preserving original requirements'],
root_cause: 'State structure does not support constraint persistence',
impact: 'Constraints may be lost during state transitions',
suggested_fix: 'Add original_requirements field to state schema'
});
}
}
// 4. Check SKILL.md for constraint enforcement in execution flow
const skillMd = Read(`${skillPath}/SKILL.md`);
const hasConstraintVerification = /constraint.*verif|verif.*constraint|quality.*gate/i.test(skillMd);
if (!hasConstraintVerification && phaseFiles.length > 3) {
issues.push({
id: `MEM-${issues.length + 1}`,
type: 'memory_loss',
severity: 'medium',
location: { file: 'SKILL.md' },
description: 'No constraint verification step in execution flow',
evidence: ['Execution flow lacks quality gate or constraint check'],
root_cause: 'No mechanism to verify output matches original intent',
impact: 'Constraint violations may go undetected',
suggested_fix: 'Add verification phase comparing output to original requirements'
});
}
// 5. Calculate severity
const criticalCount = issues.filter(i => i.severity === 'critical').length;
const highCount = issues.filter(i => i.severity === 'high').length;
const severity = criticalCount > 0 ? 'critical' :
highCount > 2 ? 'high' :
highCount > 0 ? 'medium' :
issues.length > 0 ? 'low' : 'none';
// 6. Write diagnosis result
const diagnosisResult = {
status: 'completed',
issues_found: issues.length,
severity: severity,
execution_time_ms: Date.now() - startTime,
details: {
patterns_checked: [
'constraint_propagation',
'goal_drift',
'checkpoint_mechanism',
'state_schema_constraints'
],
patterns_matched: evidence.map(e => e.pattern),
evidence: evidence,
phase_analysis: {
total_phases: phaseFiles.length,
first_phase_constraints: firstPhaseConstraints.length,
phases_with_constraint_ref: phaseFiles.length - issues.filter(i =>
i.description.includes('does not reference')).length
},
recommendations: [
highCount > 0 ? 'Implement constraint injection at each phase' : null,
issues.some(i => i.description.includes('checkpoint'))
? 'Add checkpoint/restore mechanism' : null,
issues.some(i => i.description.includes('State schema'))
? 'Add original_requirements to state schema' : null
].filter(Boolean)
}
};
Write(`${workDir}/diagnosis/memory-diagnosis.json`,
JSON.stringify(diagnosisResult, null, 2));
return {
stateUpdates: {
'diagnosis.memory': diagnosisResult,
issues: [...state.issues, ...issues]
},
outputFiles: [`${workDir}/diagnosis/memory-diagnosis.json`],
summary: `Memory diagnosis: ${issues.length} issues found (severity: ${severity})`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
'diagnosis.memory': {
status: 'completed',
issues_found: <count>,
severity: '<critical|high|medium|low|none>',
// ... full diagnosis result
},
issues: [...existingIssues, ...newIssues]
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Phase file read error | Skip file, continue analysis |
| No phases found | Report as structure issue |
## Next Actions
- Success: action-diagnose-dataflow (or next in focus_areas)
- Skipped: If 'memory' not in focus_areas

View File

@@ -0,0 +1,322 @@
# Action: Gemini Analysis
Dynamically invoke the Gemini CLI for deep analysis, choosing the analysis type from the user's request or from diagnosis results.
## Role
- Receive the user-specified analysis request, or infer it from diagnosis results
- Build the appropriate CLI command
- Execute the analysis and parse the results
- Update state for downstream actions
## Preconditions
- `state.status === 'running'`
- At least one of the following holds:
  - `state.gemini_analysis_requested === true` (user request)
  - `state.issues.some(i => i.severity === 'critical')` (critical issue found)
  - `state.analysis_type !== null` (analysis type already specified)
## Analysis Types
### 1. root_cause - Root Cause Analysis
Deep analysis of the issue described by the user.
```javascript
const analysisPrompt = `
PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
TASK:
• Analyze skill structure at: ${state.target_skill.path}
• Identify anti-patterns in phase files
• Trace data flow through state management
• Check agent coordination patterns
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with structure:
{
"root_causes": [
{ "id": "RC-001", "description": "...", "severity": "high", "evidence": ["file:line"] }
],
"patterns_found": [
{ "pattern": "...", "type": "anti-pattern|best-practice", "locations": [] }
],
"recommendations": [
{ "priority": 1, "action": "...", "rationale": "..." }
]
}
RULES: Focus on execution flow, state management, agent coordination
`;
```
### 2. architecture - Architecture Review
Evaluate the skill's overall architecture design.
```javascript
const analysisPrompt = `
PURPOSE: Review skill architecture for: ${state.target_skill.name}
TASK:
• Evaluate phase decomposition and responsibility separation
• Check state schema design and data flow
• Assess agent coordination and error handling
• Review scalability and maintainability
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Markdown report with sections:
- Executive Summary
- Phase Architecture Assessment
- State Management Evaluation
- Agent Coordination Analysis
- Improvement Recommendations (prioritized)
RULES: Focus on modularity, extensibility, maintainability
`;
```
### 3. prompt_optimization - Prompt Optimization
Analyze and optimize the prompts in phase files.
```javascript
const analysisPrompt = `
PURPOSE: Optimize prompts in skill phases for better output quality
TASK:
• Analyze existing prompts for clarity and specificity
• Identify ambiguous instructions
• Check output format specifications
• Evaluate constraint communication
MODE: analysis
CONTEXT: @phases/**/*.md
EXPECTED: JSON with structure:
{
"prompt_issues": [
{ "file": "...", "issue": "...", "severity": "...", "suggestion": "..." }
],
"optimized_prompts": [
{ "file": "...", "original": "...", "optimized": "...", "rationale": "..." }
]
}
RULES: Preserve intent, improve clarity, add structured output requirements
`;
```
### 4. performance - Performance Analysis
Analyze token consumption and execution efficiency.
```javascript
const analysisPrompt = `
PURPOSE: Analyze performance bottlenecks in skill execution
TASK:
• Estimate token consumption per phase
• Identify redundant data passing
• Check for unnecessary full-content transfers
• Evaluate caching opportunities
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with structure:
{
"token_estimates": [
{ "phase": "...", "estimated_tokens": 1000, "breakdown": {} }
],
"bottlenecks": [
{ "type": "...", "location": "...", "impact": "high|medium|low", "fix": "..." }
],
"optimization_suggestions": []
}
RULES: Focus on token efficiency, reduce redundancy
`;
```
### 5. custom - Custom Analysis
Run a custom analysis specified by the user.
```javascript
const analysisPrompt = `
PURPOSE: ${state.custom_analysis_purpose}
TASK: ${state.custom_analysis_tasks}
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: ${state.custom_analysis_expected}
RULES: ${state.custom_analysis_rules || 'Follow best practices'}
`;
```
## Execution
```javascript
async function executeGeminiAnalysis(state, workDir) {
  // 1. Determine the analysis type
  const analysisType = state.analysis_type || determineAnalysisType(state);
  // 2. Build the prompt
  const prompt = buildAnalysisPrompt(analysisType, state);
  // 3. Build the CLI command
  const cliCommand = `ccw cli -p "${escapeForShell(prompt)}" --tool gemini --mode analysis --cd "${state.target_skill.path}"`;
  console.log(`Executing Gemini analysis: ${analysisType}`);
  console.log(`Command: ${cliCommand}`);
  // 4. Execute the CLI (in the background)
  const result = Bash({
    command: cliCommand,
    run_in_background: true,
    timeout: 300000 // 5 minutes
  });
  // 5. Await results
  // Note: per CLAUDE.md guidance, stop polling once the CLI is running in the background;
  // results are written to state after the CLI completes.
  return {
    stateUpdates: {
      gemini_analysis: {
        type: analysisType,
        status: 'running',
        started_at: new Date().toISOString(),
        task_id: result.task_id
      }
    },
    outputFiles: [],
    summary: `Gemini ${analysisType} analysis started in background`
  };
}
function determineAnalysisType(state) {
  // Infer the analysis type from state
if (state.user_issue_description && state.user_issue_description.length > 100) {
return 'root_cause';
}
if (state.issues.some(i => i.severity === 'critical')) {
return 'root_cause';
}
if (state.focus_areas.includes('architecture')) {
return 'architecture';
}
if (state.focus_areas.includes('prompt')) {
return 'prompt_optimization';
}
if (state.focus_areas.includes('performance')) {
return 'performance';
}
  return 'root_cause'; // default
}
function buildAnalysisPrompt(type, state) {
const templates = {
root_cause: () => `
PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
TASK: • Analyze skill structure • Identify anti-patterns • Trace data flow issues • Check agent coordination
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON { root_causes: [], patterns_found: [], recommendations: [] }
RULES: Focus on execution flow, be specific about file:line locations
`,
architecture: () => `
PURPOSE: Review skill architecture for ${state.target_skill.name}
TASK: • Evaluate phase decomposition • Check state design • Assess agent coordination • Review extensibility
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Markdown architecture assessment report
RULES: Focus on modularity and maintainability
`,
prompt_optimization: () => `
PURPOSE: Optimize prompts in skill for better output quality
TASK: • Analyze prompt clarity • Check output specifications • Evaluate constraint handling
MODE: analysis
CONTEXT: @phases/**/*.md
EXPECTED: JSON { prompt_issues: [], optimized_prompts: [] }
RULES: Preserve intent, improve clarity
`,
performance: () => `
PURPOSE: Analyze performance bottlenecks in skill
TASK: • Estimate token consumption • Identify redundancy • Check data transfer efficiency
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON { token_estimates: [], bottlenecks: [], optimization_suggestions: [] }
RULES: Focus on token efficiency
`,
custom: () => `
PURPOSE: ${state.custom_analysis_purpose}
TASK: ${state.custom_analysis_tasks}
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: ${state.custom_analysis_expected}
RULES: ${state.custom_analysis_rules || 'Best practices'}
`
};
return templates[type]();
}
function escapeForShell(str) {
  // Escape shell special characters (backslashes first, so later escapes are not doubled)
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\$/g, '\\$')
    .replace(/`/g, '\\`');
}
```
## Output
### State Updates
```javascript
{
gemini_analysis: {
type: 'root_cause' | 'architecture' | 'prompt_optimization' | 'performance' | 'custom',
status: 'running' | 'completed' | 'failed',
started_at: '2024-01-01T00:00:00Z',
completed_at: '2024-01-01T00:05:00Z',
task_id: 'xxx',
    result: { /* analysis result */ },
error: null
},
  // Merge analysis findings into issues
issues: [
...state.issues,
...newIssuesFromAnalysis
]
}
```
### Output Files
- `${workDir}/diagnosis/gemini-analysis-${type}.json` - Raw analysis result
- `${workDir}/diagnosis/gemini-analysis-${type}.md` - Formatted report
## Post-Execution
After the analysis completes:
1. Parse the CLI output into structured data (see the sketch below)
2. Extract newly discovered issues and merge them into `state.issues`
3. Update recommendations in state
4. Trigger the next action (typically action-generate-report or action-propose-fixes)
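A minimal sketch of this parsing-and-merge step, assuming the CLI writes its JSON result into the diagnosis directory (the file path, the `GEM-` ID prefix, and the issue field mapping here are illustrative assumptions, not a confirmed CLI contract):
```javascript
function mergeGeminiResult(state, workDir, analysisType) {
  // Read the raw CLI output (path assumed; adjust to the actual CLI output location)
  const raw = Read(`${workDir}/diagnosis/gemini-analysis-${analysisType}.json`);
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Parse failure: keep the raw output for manual handling, per Error Handling below
    Write(`${workDir}/diagnosis/gemini-analysis-${analysisType}.raw.txt`, raw);
    return {
      stateUpdates: {
        gemini_analysis: { ...state.gemini_analysis, status: 'failed', error: e.message }
      }
    };
  }
  // Map root causes into the shared Issue shape used by the diagnosis actions
  const newIssues = (parsed.root_causes || []).map((rc, i) => ({
    id: `GEM-${i + 1}`,
    type: 'agent_failure', // assumption: refine per analysis type
    severity: rc.severity || 'medium',
    location: { file: (rc.evidence && rc.evidence[0]) || 'unknown' },
    description: rc.description,
    evidence: rc.evidence || [],
    root_cause: rc.description,
    impact: 'See Gemini analysis report',
    suggested_fix: (parsed.recommendations && parsed.recommendations[0]?.action) || 'See recommendations'
  }));
  return {
    stateUpdates: {
      gemini_analysis: {
        ...state.gemini_analysis,
        status: 'completed',
        completed_at: new Date().toISOString(),
        result: parsed
      },
      issues: [...state.issues, ...newIssues]
    }
  };
}
```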
## Error Handling
| Error | Recovery |
|-------|----------|
| CLI timeout | Retry once; if it still fails, skip Gemini analysis |
| Parse failure | Save raw output for manual handling |
| No result | Mark as skipped and continue the workflow |
## User Interaction
If `state.analysis_type === null` and the type cannot be inferred automatically, ask the user:
```javascript
AskUserQuestion({
  questions: [{
    question: 'Select the Gemini analysis type',
    header: 'Analysis Type',
    options: [
      { label: 'Root cause analysis', description: 'Deep analysis of the reported issue' },
      { label: 'Architecture review', description: 'Evaluate the overall architecture design' },
      { label: 'Prompt optimization', description: 'Analyze and optimize phase prompts' },
      { label: 'Performance analysis', description: 'Analyze token consumption and execution efficiency' }
],
multiSelect: false
}]
});
```

View File

@@ -0,0 +1,228 @@
# Action: Generate Consolidated Report
Generate a comprehensive tuning report merging all diagnosis results with prioritized recommendations.
## Purpose
- Merge all diagnosis results into unified report
- Prioritize issues by severity and impact
- Generate actionable recommendations
- Create human-readable markdown report
## Preconditions
- [ ] state.status === 'running'
- [ ] All diagnoses in focus_areas are completed
- [ ] `state.issues.length > 0` (if empty, still generate a no-issue summary report)
## Execution
```javascript
async function execute(state, workDir) {
console.log('Generating consolidated tuning report...');
const targetSkill = state.target_skill;
const issues = state.issues;
// 1. Group issues by type
const issuesByType = {
context_explosion: issues.filter(i => i.type === 'context_explosion'),
memory_loss: issues.filter(i => i.type === 'memory_loss'),
dataflow_break: issues.filter(i => i.type === 'dataflow_break'),
agent_failure: issues.filter(i => i.type === 'agent_failure')
};
// 2. Group issues by severity
const issuesBySeverity = {
critical: issues.filter(i => i.severity === 'critical'),
high: issues.filter(i => i.severity === 'high'),
medium: issues.filter(i => i.severity === 'medium'),
low: issues.filter(i => i.severity === 'low')
};
// 3. Calculate overall health score
const weights = { critical: 25, high: 15, medium: 5, low: 1 };
const deductions = Object.entries(issuesBySeverity)
.reduce((sum, [sev, arr]) => sum + arr.length * weights[sev], 0);
const healthScore = Math.max(0, 100 - deductions);
// 4. Generate report content
const report = `# Skill Tuning Report
**Target Skill**: ${targetSkill.name}
**Path**: ${targetSkill.path}
**Execution Mode**: ${targetSkill.execution_mode}
**Generated**: ${new Date().toISOString()}
---
## Executive Summary
| Metric | Value |
|--------|-------|
| Health Score | ${healthScore}/100 |
| Total Issues | ${issues.length} |
| Critical | ${issuesBySeverity.critical.length} |
| High | ${issuesBySeverity.high.length} |
| Medium | ${issuesBySeverity.medium.length} |
| Low | ${issuesBySeverity.low.length} |
### User Reported Issue
> ${state.user_issue_description}
### Overall Assessment
${healthScore >= 80 ? '✅ Skill is in good health with minor issues.' :
healthScore >= 60 ? '⚠️ Skill has significant issues requiring attention.' :
healthScore >= 40 ? '🔶 Skill has serious issues affecting reliability.' :
'❌ Skill has critical issues requiring immediate fixes.'}
---
## Diagnosis Results
### Context Explosion Analysis
${state.diagnosis.context ?
`- **Status**: ${state.diagnosis.context.status}
- **Severity**: ${state.diagnosis.context.severity}
- **Issues Found**: ${state.diagnosis.context.issues_found}
- **Key Findings**: ${state.diagnosis.context.details.recommendations.join('; ') || 'None'}` :
'_Not analyzed_'}
### Long-tail Memory Analysis
${state.diagnosis.memory ?
`- **Status**: ${state.diagnosis.memory.status}
- **Severity**: ${state.diagnosis.memory.severity}
- **Issues Found**: ${state.diagnosis.memory.issues_found}
- **Key Findings**: ${state.diagnosis.memory.details.recommendations.join('; ') || 'None'}` :
'_Not analyzed_'}
### Data Flow Analysis
${state.diagnosis.dataflow ?
`- **Status**: ${state.diagnosis.dataflow.status}
- **Severity**: ${state.diagnosis.dataflow.severity}
- **Issues Found**: ${state.diagnosis.dataflow.issues_found}
- **Key Findings**: ${state.diagnosis.dataflow.details.recommendations.join('; ') || 'None'}` :
'_Not analyzed_'}
### Agent Coordination Analysis
${state.diagnosis.agent ?
`- **Status**: ${state.diagnosis.agent.status}
- **Severity**: ${state.diagnosis.agent.severity}
- **Issues Found**: ${state.diagnosis.agent.issues_found}
- **Key Findings**: ${state.diagnosis.agent.details.recommendations.join('; ') || 'None'}` :
'_Not analyzed_'}
---
## Critical & High Priority Issues
${issuesBySeverity.critical.length + issuesBySeverity.high.length === 0 ?
'_No critical or high priority issues found._' :
[...issuesBySeverity.critical, ...issuesBySeverity.high].map((issue, i) => `
### ${i + 1}. [${issue.severity.toUpperCase()}] ${issue.description}
- **ID**: ${issue.id}
- **Type**: ${issue.type}
- **Location**: ${typeof issue.location === 'object' ? issue.location.file : issue.location}
- **Root Cause**: ${issue.root_cause}
- **Impact**: ${issue.impact}
- **Suggested Fix**: ${issue.suggested_fix}
**Evidence**:
${issue.evidence.map(e => `- \`${e}\``).join('\n')}
`).join('\n')}
---
## Medium & Low Priority Issues
${issuesBySeverity.medium.length + issuesBySeverity.low.length === 0 ?
'_No medium or low priority issues found._' :
[...issuesBySeverity.medium, ...issuesBySeverity.low].map((issue, i) => `
### ${i + 1}. [${issue.severity.toUpperCase()}] ${issue.description}
- **ID**: ${issue.id}
- **Type**: ${issue.type}
- **Suggested Fix**: ${issue.suggested_fix}
`).join('\n')}
---
## Recommended Fix Order
Based on severity and dependencies, apply fixes in this order:
${[...issuesBySeverity.critical, ...issuesBySeverity.high, ...issuesBySeverity.medium]
.slice(0, 10)
.map((issue, i) => `${i + 1}. **${issue.id}**: ${issue.suggested_fix}`)
.join('\n')}
---
## Quality Gates
| Gate | Threshold | Current | Status |
|------|-----------|---------|--------|
| Critical Issues | 0 | ${issuesBySeverity.critical.length} | ${issuesBySeverity.critical.length === 0 ? '✅ PASS' : '❌ FAIL'} |
| High Issues | ≤ 2 | ${issuesBySeverity.high.length} | ${issuesBySeverity.high.length <= 2 ? '✅ PASS' : '❌ FAIL'} |
| Health Score | ≥ 60 | ${healthScore} | ${healthScore >= 60 ? '✅ PASS' : '❌ FAIL'} |
**Overall Quality Gate**: ${
issuesBySeverity.critical.length === 0 &&
issuesBySeverity.high.length <= 2 &&
healthScore >= 60 ? '✅ PASS' : '❌ FAIL'}
---
*Report generated by skill-tuning*
`;
// 5. Write report
Write(`${workDir}/tuning-report.md`, report);
// 6. Calculate quality gate
const qualityGate = issuesBySeverity.critical.length === 0 &&
issuesBySeverity.high.length <= 2 &&
healthScore >= 60 ? 'pass' :
healthScore >= 40 ? 'review' : 'fail';
return {
stateUpdates: {
quality_score: healthScore,
quality_gate: qualityGate,
issues_by_severity: {
critical: issuesBySeverity.critical.length,
high: issuesBySeverity.high.length,
medium: issuesBySeverity.medium.length,
low: issuesBySeverity.low.length
}
},
outputFiles: [`${workDir}/tuning-report.md`],
summary: `Report generated: ${issues.length} issues, health score ${healthScore}/100, gate: ${qualityGate}`
};
}
```
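For instance, under the weights above, a skill with 1 critical, 2 high, and 3 medium issues scores 100 − (1×25 + 2×15 + 3×5) = 30 and fails the quality gate; the same skill with only the 3 medium issues scores 85 and passes.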
## State Updates
```javascript
return {
stateUpdates: {
quality_score: <0-100>,
quality_gate: '<pass|review|fail>',
issues_by_severity: { critical: N, high: N, medium: N, low: N }
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Write error | Retry to alternative path |
| Empty issues | Generate summary with no issues |
## Next Actions
- If issues.length > 0: action-propose-fixes
- If issues.length === 0: action-complete

View File

@@ -0,0 +1,149 @@
# Action: Initialize Tuning Session
Initialize the skill-tuning session by collecting target skill information, creating work directories, and setting up initial state.
## Purpose
- Identify target skill to tune
- Collect user's problem description
- Create work directory structure
- Backup original skill files
- Initialize state for orchestrator
## Preconditions
- [ ] state.status === 'pending'
## Execution
```javascript
async function execute(state, workDir) {
// 1. Ask user for target skill
const skillInput = await AskUserQuestion({
questions: [{
question: "Which skill do you want to tune?",
header: "Target Skill",
multiSelect: false,
options: [
{ label: "Specify path", description: "Enter skill directory path" }
]
}]
});
const skillPath = skillInput["Target Skill"];
// 2. Validate skill exists and read structure
const skillMdPath = `${skillPath}/SKILL.md`;
if (!Glob(`${skillPath}/SKILL.md`).length) {
throw new Error(`Invalid skill path: ${skillPath} - SKILL.md not found`);
}
// 3. Read skill metadata
const skillMd = Read(skillMdPath);
const frontMatterMatch = skillMd.match(/^---\n([\s\S]*?)\n---/);
const skillName = frontMatterMatch
? frontMatterMatch[1].match(/name:\s*(.+)/)?.[1]?.trim()
: skillPath.split('/').pop();
// 4. Detect execution mode
const hasOrchestrator = Glob(`${skillPath}/phases/orchestrator.md`).length > 0;
const executionMode = hasOrchestrator ? 'autonomous' : 'sequential';
// 5. Scan skill structure
const phases = Glob(`${skillPath}/phases/**/*.md`).map(f => f.replace(skillPath + '/', ''));
const specs = Glob(`${skillPath}/specs/**/*.md`).map(f => f.replace(skillPath + '/', ''));
// 6. Ask for problem description
const issueInput = await AskUserQuestion({
questions: [{
question: "Describe the issue or what you want to optimize:",
header: "Issue",
multiSelect: false,
options: [
{ label: "Context grows too large", description: "Token explosion over multiple turns" },
{ label: "Instructions forgotten", description: "Early constraints lost in long execution" },
{ label: "Data inconsistency", description: "State format changes between phases" },
{ label: "Agent failures", description: "Sub-agent calls fail or return unexpected results" }
]
}]
});
// 7. Ask for focus areas
const focusInput = await AskUserQuestion({
questions: [{
question: "Which areas should be diagnosed? (Select all that apply)",
header: "Focus",
multiSelect: true,
options: [
{ label: "context", description: "Context explosion analysis" },
{ label: "memory", description: "Long-tail forgetting analysis" },
{ label: "dataflow", description: "Data flow analysis" },
{ label: "agent", description: "Agent coordination analysis" }
]
}]
});
const focusAreas = focusInput["Focus"] || ['context', 'memory', 'dataflow', 'agent'];
// 8. Create backup
const backupDir = `${workDir}/backups/${skillName}-backup`;
Bash(`mkdir -p "${backupDir}"`);
Bash(`cp -r "${skillPath}"/* "${backupDir}/"`);
// 9. Return state updates
return {
stateUpdates: {
status: 'running',
started_at: new Date().toISOString(),
target_skill: {
name: skillName,
path: skillPath,
execution_mode: executionMode,
phases: phases,
specs: specs
},
user_issue_description: issueInput["Issue"],
focus_areas: Array.isArray(focusAreas) ? focusAreas : [focusAreas],
work_dir: workDir,
backup_dir: backupDir
},
outputFiles: [],
summary: `Initialized tuning for "${skillName}" (${executionMode} mode), focus: ${focusAreas.join(', ')}`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
status: 'running',
started_at: '<timestamp>',
target_skill: {
name: '<skill-name>',
path: '<skill-path>',
execution_mode: '<sequential|autonomous>',
phases: ['...'],
specs: ['...']
},
user_issue_description: '<user description>',
focus_areas: ['context', 'memory', ...],
work_dir: '<work-dir>',
backup_dir: '<backup-dir>'
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Skill path not found | Ask user to re-enter valid path |
| SKILL.md missing | Suggest path correction |
| Backup creation failed | Retry with alternative location |
## Next Actions
- Success: Continue to first diagnosis action based on focus_areas
- Failure: action-abort

View File

@@ -0,0 +1,317 @@
# Action: Propose Fixes
Generate fix proposals for identified issues with implementation strategies.
## Purpose
- Create fix strategies for each issue
- Generate implementation plans
- Estimate risk levels
- Allow user to select fixes to apply
## Preconditions
- [ ] state.status === 'running'
- [ ] state.issues.length > 0
- [ ] action-generate-report completed
## Fix Strategy Catalog
### Context Explosion Fixes
| Strategy | Description | Risk |
|----------|-------------|------|
| `context_summarization` | Add summarizer agent between phases | low |
| `sliding_window` | Keep only last N turns in context | low |
| `structured_state` | Replace text context with JSON state | medium |
| `path_reference` | Pass file paths instead of content | low |
### Memory Loss Fixes
| Strategy | Description | Risk |
|----------|-------------|------|
| `constraint_injection` | Add constraints to each phase prompt | low |
| `checkpoint_restore` | Save state at milestones | low |
| `goal_embedding` | Track goal similarity throughout | medium |
| `state_constraints_field` | Add constraints field to state schema | low |
### Data Flow Fixes
| Strategy | Description | Risk |
|----------|-------------|------|
| `state_centralization` | Single state.json for all data | medium |
| `schema_enforcement` | Add Zod validation | low |
| `field_normalization` | Normalize field names | low |
| `transactional_updates` | Atomic state updates | medium |
### Agent Coordination Fixes
| Strategy | Description | Risk |
|----------|-------------|------|
| `error_wrapping` | Add try-catch to all Task calls | low |
| `result_validation` | Validate agent returns | low |
| `orchestrator_refactor` | Centralize agent coordination | high |
| `flatten_nesting` | Remove nested agent calls | medium |
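As a concrete illustration of one low-risk strategy above, here is a minimal sliding-window sketch. It assumes conversation turns accumulate in a `state.history` array (as in the diffs generated below); the window size and the `summary`/`action` fields on each turn are illustrative assumptions:
```javascript
// Keep only the most recent turns; older turns are compressed into a single
// summary line rather than silently dropped, so early context is not lost entirely.
const MAX_HISTORY = 5; // window size is an assumption; tune per skill

function applySlidingWindow(state) {
  if (state.history.length <= MAX_HISTORY) return state;
  const dropped = state.history.slice(0, -MAX_HISTORY);
  const summaryLine = `[summary] ${dropped.length} earlier turns: ` +
    dropped.map(t => t.summary || t.action || 'turn').join('; ').slice(0, 500);
  return {
    ...state,
    history: [{ role: 'system', summary: summaryLine }, ...state.history.slice(-MAX_HISTORY)]
  };
}
```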
## Execution
```javascript
async function execute(state, workDir) {
console.log('Generating fix proposals...');
const issues = state.issues;
const fixes = [];
// Group issues by type for batch fixes
const issuesByType = {
context_explosion: issues.filter(i => i.type === 'context_explosion'),
memory_loss: issues.filter(i => i.type === 'memory_loss'),
dataflow_break: issues.filter(i => i.type === 'dataflow_break'),
agent_failure: issues.filter(i => i.type === 'agent_failure')
};
// Generate fixes for context explosion
if (issuesByType.context_explosion.length > 0) {
const ctxIssues = issuesByType.context_explosion;
if (ctxIssues.some(i => i.description.includes('history accumulation'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: ctxIssues.filter(i => i.description.includes('history')).map(i => i.id),
strategy: 'sliding_window',
description: 'Implement sliding window for conversation history',
rationale: 'Prevents unbounded context growth by keeping only recent turns',
changes: [{
file: 'phases/orchestrator.md',
action: 'modify',
diff: `+ const MAX_HISTORY = 5;
+ state.history = state.history.slice(-MAX_HISTORY);`
}],
risk: 'low',
estimated_impact: 'Reduces token usage by ~50%',
verification_steps: ['Run skill with 10+ iterations', 'Verify context size stable']
});
}
if (ctxIssues.some(i => i.description.includes('full content'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: ctxIssues.filter(i => i.description.includes('content')).map(i => i.id),
strategy: 'path_reference',
description: 'Pass file paths instead of full content',
rationale: 'Agents can read files when needed, reducing prompt size',
changes: [{
file: 'phases/*.md',
action: 'modify',
diff: `- prompt: \${content}
+ prompt: Read file at: \${filePath}`
}],
risk: 'low',
estimated_impact: 'Significant token reduction',
verification_steps: ['Verify agents can still access needed content']
});
}
}
// Generate fixes for memory loss
if (issuesByType.memory_loss.length > 0) {
const memIssues = issuesByType.memory_loss;
if (memIssues.some(i => i.description.includes('constraint'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: memIssues.filter(i => i.description.includes('constraint')).map(i => i.id),
strategy: 'constraint_injection',
description: 'Add constraint injection to all phases',
rationale: 'Ensures original requirements are visible in every phase',
changes: [{
file: 'phases/*.md',
action: 'modify',
diff: `+ [CONSTRAINTS]
+ Original requirements from state.original_requirements:
+ \${JSON.stringify(state.original_requirements)}`
}],
risk: 'low',
estimated_impact: 'Improves constraint adherence',
verification_steps: ['Run skill with specific constraints', 'Verify output matches']
});
}
if (memIssues.some(i => i.description.includes('State schema'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: memIssues.filter(i => i.description.includes('schema')).map(i => i.id),
strategy: 'state_constraints_field',
description: 'Add original_requirements field to state schema',
rationale: 'Preserves original intent throughout execution',
changes: [{
file: 'phases/state-schema.md',
action: 'modify',
diff: `+ original_requirements: string[]; // User's original constraints
+ goal_summary: string; // One-line goal statement`
}],
risk: 'low',
estimated_impact: 'Enables constraint tracking',
verification_steps: ['Verify state includes requirements after init']
});
}
}
// Generate fixes for data flow
if (issuesByType.dataflow_break.length > 0) {
const dfIssues = issuesByType.dataflow_break;
if (dfIssues.some(i => i.description.includes('multiple locations'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: dfIssues.filter(i => i.description.includes('location')).map(i => i.id),
strategy: 'state_centralization',
description: 'Centralize all state to single state.json',
rationale: 'Single source of truth prevents inconsistencies',
changes: [{
file: 'phases/*.md',
action: 'modify',
diff: `- Write(\`\${workDir}/config.json\`, ...)
+ updateState({ config: ... }) // Use state manager`
}],
risk: 'medium',
estimated_impact: 'Eliminates state fragmentation',
verification_steps: ['Verify all reads come from state.json', 'Test state persistence']
});
}
if (dfIssues.some(i => i.description.includes('validation'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: dfIssues.filter(i => i.description.includes('validation')).map(i => i.id),
strategy: 'schema_enforcement',
description: 'Add Zod schema validation',
rationale: 'Runtime validation catches schema violations',
changes: [{
file: 'phases/state-schema.md',
action: 'modify',
diff: `+ import { z } from 'zod';
+ const StateSchema = z.object({...});
+ function validateState(s) { return StateSchema.parse(s); }`
}],
risk: 'low',
estimated_impact: 'Catches invalid state early',
verification_steps: ['Test with invalid state input', 'Verify error thrown']
});
}
}
// Generate fixes for agent coordination
if (issuesByType.agent_failure.length > 0) {
const agentIssues = issuesByType.agent_failure;
if (agentIssues.some(i => i.description.includes('error handling'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: agentIssues.filter(i => i.description.includes('error')).map(i => i.id),
strategy: 'error_wrapping',
description: 'Wrap all Task calls in try-catch',
rationale: 'Prevents cascading failures from agent errors',
changes: [{
file: 'phases/*.md',
action: 'modify',
diff: `+ try {
const result = await Task({...});
+ if (!result) throw new Error('Empty result');
+ } catch (e) {
+ updateState({ errors: [...errors, e.message], error_count: error_count + 1 });
+ }`
}],
risk: 'low',
estimated_impact: 'Improves error resilience',
verification_steps: ['Simulate agent failure', 'Verify graceful handling']
});
}
if (agentIssues.some(i => i.description.includes('nested'))) {
fixes.push({
id: `FIX-${fixes.length + 1}`,
issue_ids: agentIssues.filter(i => i.description.includes('nested')).map(i => i.id),
strategy: 'flatten_nesting',
description: 'Flatten nested agent calls',
rationale: 'Reduces complexity and context explosion',
changes: [{
file: 'phases/orchestrator.md',
action: 'modify',
diff: `// Instead of agent calling agent:
// Agent A returns {needs_agent_b: true}
// Orchestrator sees this and calls Agent B next`
}],
risk: 'medium',
estimated_impact: 'Reduces nesting depth',
verification_steps: ['Verify no nested Task calls', 'Test agent chaining via orchestrator']
});
}
}
// Write fix proposals
Write(`${workDir}/fixes/fix-proposals.json`, JSON.stringify(fixes, null, 2));
// Ask user to select fixes to apply
const fixOptions = fixes.slice(0, 4).map(f => ({
label: f.id,
description: `[${f.risk.toUpperCase()} risk] ${f.description}`
}));
if (fixOptions.length > 0) {
const selection = await AskUserQuestion({
questions: [{
question: 'Which fixes would you like to apply?',
header: 'Fixes',
multiSelect: true,
options: fixOptions
}]
});
const selectedFixIds = Array.isArray(selection['Fixes'])
? selection['Fixes']
: [selection['Fixes']];
return {
stateUpdates: {
proposed_fixes: fixes,
pending_fixes: selectedFixIds.filter(id => id && fixes.some(f => f.id === id))
},
outputFiles: [`${workDir}/fixes/fix-proposals.json`],
summary: `Generated ${fixes.length} fix proposals, ${selectedFixIds.length} selected for application`
};
}
return {
stateUpdates: {
proposed_fixes: fixes,
pending_fixes: []
},
outputFiles: [`${workDir}/fixes/fix-proposals.json`],
summary: `Generated ${fixes.length} fix proposals (none selected)`
};
}
```
## State Updates
```javascript
return {
stateUpdates: {
proposed_fixes: [...fixes],
pending_fixes: [...selectedFixIds]
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| No issues to fix | Skip to action-complete |
| User cancels selection | Set pending_fixes to empty |
## Next Actions
- If pending_fixes.length > 0: action-apply-fix
- If pending_fixes.length === 0: action-complete

View File

@@ -0,0 +1,222 @@
# Action: Verify Applied Fixes
Verify that applied fixes resolved the targeted issues.
## Purpose
- Re-run relevant diagnostics
- Compare before/after issue counts
- Update verification status
- Determine if more iterations needed
## Preconditions
- [ ] state.status === 'running'
- [ ] state.applied_fixes.length > 0
- [ ] Some applied_fixes have verification_result === 'pending'
## Execution
```javascript
async function execute(state, workDir) {
console.log('Verifying applied fixes...');
const appliedFixes = state.applied_fixes.filter(f => f.verification_result === 'pending');
if (appliedFixes.length === 0) {
return {
stateUpdates: {},
outputFiles: [],
summary: 'No fixes pending verification'
};
}
const verificationResults = [];
for (const fix of appliedFixes) {
const proposedFix = state.proposed_fixes.find(f => f.id === fix.fix_id);
if (!proposedFix) {
verificationResults.push({
fix_id: fix.fix_id,
result: 'fail',
reason: 'Fix definition not found'
});
continue;
}
// Determine which diagnosis to re-run based on fix strategy
const strategyToDiagnosis = {
'context_summarization': 'context',
'sliding_window': 'context',
'structured_state': 'context',
'path_reference': 'context',
'constraint_injection': 'memory',
'checkpoint_restore': 'memory',
'goal_embedding': 'memory',
'state_constraints_field': 'memory',
'state_centralization': 'dataflow',
'schema_enforcement': 'dataflow',
'field_normalization': 'dataflow',
'transactional_updates': 'dataflow',
'error_wrapping': 'agent',
'result_validation': 'agent',
'orchestrator_refactor': 'agent',
'flatten_nesting': 'agent'
};
const diagnosisType = strategyToDiagnosis[proposedFix.strategy];
    // For now, do a lightweight verification.
    // A full implementation would re-run the specific diagnosis (see the sketch below).
// Check if the fix was actually applied (look for markers)
const targetPath = state.target_skill.path;
const fixMarker = `Applied fix ${fix.fix_id}`;
let fixFound = false;
const allFiles = Glob(`${targetPath}/**/*.md`);
for (const file of allFiles) {
const content = Read(file);
if (content.includes(fixMarker)) {
fixFound = true;
break;
}
}
if (fixFound) {
// Verify by checking if original issues still exist
const relatedIssues = proposedFix.issue_ids;
const originalIssueCount = relatedIssues.length;
// Simplified verification: assume fix worked if marker present
// Real implementation would re-run diagnosis patterns
verificationResults.push({
fix_id: fix.fix_id,
result: 'pass',
reason: `Fix applied successfully, addressing ${originalIssueCount} issues`,
issues_resolved: relatedIssues
});
} else {
verificationResults.push({
fix_id: fix.fix_id,
result: 'fail',
reason: 'Fix marker not found in target files'
});
}
}
// Update applied fixes with verification results
const updatedAppliedFixes = state.applied_fixes.map(fix => {
const result = verificationResults.find(v => v.fix_id === fix.fix_id);
if (result) {
return {
...fix,
verification_result: result.result
};
}
return fix;
});
// Calculate new quality score
const passedFixes = verificationResults.filter(v => v.result === 'pass').length;
const totalFixes = verificationResults.length;
const verificationRate = totalFixes > 0 ? (passedFixes / totalFixes) * 100 : 100;
// Recalculate issues (remove resolved ones)
const resolvedIssueIds = verificationResults
.filter(v => v.result === 'pass')
.flatMap(v => v.issues_resolved || []);
const remainingIssues = state.issues.filter(i => !resolvedIssueIds.includes(i.id));
// Recalculate quality score
const weights = { critical: 25, high: 15, medium: 5, low: 1 };
const deductions = remainingIssues.reduce((sum, issue) =>
sum + (weights[issue.severity] || 0), 0);
const newHealthScore = Math.max(0, 100 - deductions);
// Determine new quality gate
const remainingCritical = remainingIssues.filter(i => i.severity === 'critical').length;
const remainingHigh = remainingIssues.filter(i => i.severity === 'high').length;
const newQualityGate = remainingCritical === 0 && remainingHigh <= 2 && newHealthScore >= 60
? 'pass'
: newHealthScore >= 40 ? 'review' : 'fail';
// Increment iteration count
const newIterationCount = state.iteration_count + 1;
// Ask user if they want to continue
let continueIteration = false;
if (newQualityGate !== 'pass' && newIterationCount < state.max_iterations) {
const continueResponse = await AskUserQuestion({
questions: [{
question: `Verification complete. Quality gate: ${newQualityGate}. Continue with another iteration?`,
header: 'Continue',
multiSelect: false,
options: [
{ label: 'Yes', description: `Run iteration ${newIterationCount + 1}` },
{ label: 'No', description: 'Finish with current state' }
]
}]
});
continueIteration = continueResponse['Continue'] === 'Yes';
}
// If continuing, reset diagnosis for re-evaluation
const diagnosisReset = continueIteration ? {
'diagnosis.context': null,
'diagnosis.memory': null,
'diagnosis.dataflow': null,
'diagnosis.agent': null
} : {};
return {
stateUpdates: {
applied_fixes: updatedAppliedFixes,
issues: remainingIssues,
quality_score: newHealthScore,
quality_gate: newQualityGate,
iteration_count: newIterationCount,
...diagnosisReset,
issues_by_severity: {
critical: remainingIssues.filter(i => i.severity === 'critical').length,
high: remainingIssues.filter(i => i.severity === 'high').length,
medium: remainingIssues.filter(i => i.severity === 'medium').length,
low: remainingIssues.filter(i => i.severity === 'low').length
}
},
outputFiles: [],
summary: `Verified ${totalFixes} fixes: ${passedFixes} passed. Score: ${newHealthScore}, Gate: ${newQualityGate}, Iteration: ${newIterationCount}`
};
}
```
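The marker check above is intentionally lightweight. A fuller verification would re-run the diagnosis mapped by `strategyToDiagnosis` and compare issue counts before and after. A sketch of that idea, assuming diagnosis actions can be dispatched via sub-agent prompts as the orchestrator does (the prompt wiring here is illustrative):
```javascript
async function verifyByRediagnosis(state, workDir, fix, diagnosisType) {
  // Issue count for this category before the fix was applied
  const before = state.diagnosis[diagnosisType]?.issues_found ?? 0;
  // Re-run only the diagnosis relevant to this fix's strategy
  const rerun = await Task({
    subagent_type: 'universal-executor',
    prompt: Read(`phases/actions/action-diagnose-${diagnosisType}.md`) +
            `\n[STATE]\n${JSON.stringify(state, null, 2)}`
  });
  const after = JSON.parse(rerun).stateUpdates[`diagnosis.${diagnosisType}`].issues_found;
  // Pass if the fix reduced the issue count for its category
  return {
    fix_id: fix.fix_id,
    result: after < before ? 'pass' : 'fail',
    reason: `Issues in '${diagnosisType}' went from ${before} to ${after}`
  };
}
```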
## State Updates
```javascript
return {
stateUpdates: {
applied_fixes: [...updatedWithVerificationResults],
issues: [...remainingIssues],
quality_score: newScore,
quality_gate: newGate,
iteration_count: iteration + 1
}
};
```
## Error Handling
| Error Type | Recovery |
|------------|----------|
| Re-diagnosis fails | Mark as 'inconclusive' |
| File access error | Skip file verification |
## Next Actions
- If quality_gate === 'pass': action-complete
- If user chose to continue: restart diagnosis cycle
- If max_iterations reached: action-complete

View File

@@ -0,0 +1,335 @@
# Orchestrator
Autonomous orchestrator for skill-tuning workflow. Reads current state and selects the next action based on diagnosis progress and quality gates.
## Role
Drive the tuning workflow by:
1. Reading current session state
2. Selecting the appropriate next action
3. Executing the action via sub-agent
4. Updating state with results
5. Repeating until termination conditions met
## State Management
### Read State
```javascript
const state = JSON.parse(Read(`${workDir}/state.json`));
```
### Update State
```javascript
function updateState(updates) {
const state = JSON.parse(Read(`${workDir}/state.json`));
const newState = {
...state,
...updates,
updated_at: new Date().toISOString()
};
Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2));
return newState;
}
```
## Decision Logic
```javascript
function selectNextAction(state) {
// === Termination Checks ===
// User exit
if (state.status === 'user_exit') return null;
// Completed
if (state.status === 'completed') return null;
// Error limit exceeded
if (state.error_count >= state.max_errors) {
return 'action-abort';
}
// Max iterations exceeded
if (state.iteration_count >= state.max_iterations) {
return 'action-complete';
}
// === Action Selection ===
// 1. Not initialized yet
if (state.status === 'pending') {
return 'action-init';
}
// 2. Check if Gemini analysis is requested or needed
if (shouldTriggerGeminiAnalysis(state)) {
return 'action-gemini-analysis';
}
// 3. Check if Gemini analysis is running
if (state.gemini_analysis?.status === 'running') {
// Wait for Gemini analysis to complete
return null; // Orchestrator will be re-triggered when CLI completes
}
// 4. Run diagnosis in order (only if not completed)
const diagnosisOrder = ['context', 'memory', 'dataflow', 'agent'];
for (const diagType of diagnosisOrder) {
if (state.diagnosis[diagType] === null) {
// Check if user wants to skip this diagnosis
if (!state.focus_areas.length || state.focus_areas.includes(diagType)) {
return `action-diagnose-${diagType}`;
}
}
}
// 5. All diagnosis complete, generate report if not done
const allDiagnosisComplete = diagnosisOrder.every(
d => state.diagnosis[d] !== null || !state.focus_areas.includes(d)
);
if (allDiagnosisComplete && !state.completed_actions.includes('action-generate-report')) {
return 'action-generate-report';
}
// 6. Report generated, propose fixes if not done
if (state.completed_actions.includes('action-generate-report') &&
state.proposed_fixes.length === 0 &&
state.issues.length > 0) {
return 'action-propose-fixes';
}
// 7. Fixes proposed, check if user wants to apply
if (state.proposed_fixes.length > 0 && state.pending_fixes.length > 0) {
return 'action-apply-fix';
}
// 8. Fixes applied, verify
if (state.applied_fixes.length > 0 &&
state.applied_fixes.some(f => f.verification_result === 'pending')) {
return 'action-verify';
}
// 9. Quality gate check
if (state.quality_gate === 'pass') {
return 'action-complete';
}
// 10. More iterations needed
if (state.iteration_count < state.max_iterations &&
state.quality_gate !== 'pass' &&
state.issues.some(i => i.severity === 'critical' || i.severity === 'high')) {
// Reset diagnosis for re-evaluation
return 'action-diagnose-context'; // Start new iteration
}
// 11. Default: complete
return 'action-complete';
}
/**
 * Decide whether a Gemini CLI analysis should be triggered.
 */
function shouldTriggerGeminiAnalysis(state) {
  // Gemini analysis already completed; do not trigger again
if (state.gemini_analysis?.status === 'completed') {
return false;
}
  // Explicit user request
if (state.gemini_analysis_requested === true) {
return true;
}
  // Critical issues found and deep analysis not yet performed
if (state.issues.some(i => i.severity === 'critical') &&
!state.completed_actions.includes('action-gemini-analysis')) {
return true;
}
  // User specified focus_areas that require Gemini analysis
const geminiAreas = ['architecture', 'prompt', 'performance', 'custom'];
if (state.focus_areas.some(area => geminiAreas.includes(area))) {
return true;
}
  // Standard diagnoses complete but issues remain unresolved; deep analysis needed
const diagnosisComplete = ['context', 'memory', 'dataflow', 'agent'].every(
d => state.diagnosis[d] !== null
);
if (diagnosisComplete &&
state.issues.length > 0 &&
state.iteration_count > 0 &&
!state.completed_actions.includes('action-gemini-analysis')) {
    // If issues persist into a second iteration, trigger Gemini analysis
return true;
}
return false;
}
```
## Execution Loop
```javascript
async function runOrchestrator(workDir) {
console.log('=== Skill Tuning Orchestrator Started ===');
let iteration = 0;
const MAX_LOOP_ITERATIONS = 50; // Safety limit
while (iteration < MAX_LOOP_ITERATIONS) {
iteration++;
// 1. Read current state
const state = JSON.parse(Read(`${workDir}/state.json`));
console.log(`[Loop ${iteration}] Status: ${state.status}, Action: ${state.current_action}`);
// 2. Select next action
const actionId = selectNextAction(state);
if (!actionId) {
console.log('No action selected, terminating orchestrator.');
break;
}
console.log(`[Loop ${iteration}] Executing: ${actionId}`);
    // 3. Update state: record the current action. Keep the returned state so the
    // history entry appended here is visible when it is marked complete below.
    const startedState = updateState({
      current_action: actionId,
      action_history: [...state.action_history, {
        action: actionId,
        started_at: new Date().toISOString(),
        completed_at: null,
        result: null,
        output_files: []
      }]
    });
// 4. Execute action
try {
const actionPrompt = Read(`phases/actions/${actionId}.md`);
const stateJson = JSON.stringify(state, null, 2);
const result = await Task({
subagent_type: 'universal-executor',
run_in_background: false,
prompt: `
[CONTEXT]
You are executing action "${actionId}" for skill-tuning workflow.
Work directory: ${workDir}
[STATE]
${stateJson}
[ACTION INSTRUCTIONS]
${actionPrompt}
[OUTPUT REQUIREMENT]
After completing the action:
1. Write any output files to the work directory
2. Return a JSON object with:
- stateUpdates: object with state fields to update
- outputFiles: array of files created
- summary: brief description of what was done
`
});
// 5. Parse result and update state
let actionResult;
try {
actionResult = JSON.parse(result);
} catch (e) {
actionResult = {
stateUpdates: {},
outputFiles: [],
summary: result
};
}
      // 6. Update state: mark this action's history entry complete
      // (use startedState so we update the entry appended in step 3, not a stale copy)
      const updatedHistory = [...startedState.action_history];
      updatedHistory[updatedHistory.length - 1] = {
        ...updatedHistory[updatedHistory.length - 1],
        completed_at: new Date().toISOString(),
        result: 'success',
        output_files: actionResult.outputFiles || []
      };
updateState({
current_action: null,
completed_actions: [...state.completed_actions, actionId],
action_history: updatedHistory,
...actionResult.stateUpdates
});
console.log(`[Loop ${iteration}] Completed: ${actionId}`);
} catch (error) {
console.log(`[Loop ${iteration}] Error in ${actionId}: ${error.message}`);
// Error handling
updateState({
current_action: null,
errors: [...state.errors, {
action: actionId,
message: error.message,
timestamp: new Date().toISOString(),
recoverable: true
}],
error_count: state.error_count + 1
});
}
}
console.log('=== Skill Tuning Orchestrator Finished ===');
}
```
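A session is then driven end to end by pointing the loop at the session's work directory (the path shown is illustrative):
```javascript
// Assumes state.json was seeded from the Initial State Template in state-schema.md
await runOrchestrator('.workspace/skill-tuning/session-001');
```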
## Action Catalog
| Action | Purpose | Preconditions | Effects |
|--------|---------|---------------|---------|
| [action-init](actions/action-init.md) | Initialize tuning session | status === 'pending' | Creates work dirs, backup, sets status='running' |
| [action-diagnose-context](actions/action-diagnose-context.md) | Analyze context explosion | status === 'running' | Sets diagnosis.context |
| [action-diagnose-memory](actions/action-diagnose-memory.md) | Analyze long-tail forgetting | status === 'running' | Sets diagnosis.memory |
| [action-diagnose-dataflow](actions/action-diagnose-dataflow.md) | Analyze data flow issues | status === 'running' | Sets diagnosis.dataflow |
| [action-diagnose-agent](actions/action-diagnose-agent.md) | Analyze agent coordination | status === 'running' | Sets diagnosis.agent |
| [action-gemini-analysis](actions/action-gemini-analysis.md) | Deep analysis via Gemini CLI | User request OR critical issues | Sets gemini_analysis, adds issues |
| [action-generate-report](actions/action-generate-report.md) | Generate consolidated report | All diagnoses complete | Creates tuning-report.md |
| [action-propose-fixes](actions/action-propose-fixes.md) | Generate fix proposals | Report generated, issues > 0 | Sets proposed_fixes |
| [action-apply-fix](actions/action-apply-fix.md) | Apply selected fix | pending_fixes > 0 | Updates applied_fixes |
| [action-verify](actions/action-verify.md) | Verify applied fixes | applied_fixes with pending verification | Updates verification_result |
| [action-complete](actions/action-complete.md) | Finalize session | quality_gate='pass' OR max_iterations | Sets status='completed' |
| [action-abort](actions/action-abort.md) | Abort on errors | error_count >= max_errors | Sets status='failed' |
## Termination Conditions
- `status === 'completed'`: Normal completion
- `status === 'user_exit'`: User requested exit
- `status === 'failed'`: Unrecoverable error
- `error_count >= max_errors`: Too many errors (default: 3)
- `iteration_count >= max_iterations`: Max iterations reached (default: 5)
- `quality_gate === 'pass'`: All quality criteria met
## Error Recovery
| Error Type | Recovery Strategy |
|------------|-------------------|
| Action execution failed | Retry up to 3 times, then skip |
| State parse error | Restore from backup |
| File write error | Retry with alternative path |
| User abort | Save state and exit gracefully |
## User Interaction Points
The orchestrator pauses for user input at these points:
1. **action-init**: Confirm target skill and describe issue
2. **action-propose-fixes**: Select which fixes to apply
3. **action-verify**: Review verification results, decide to continue or stop
4. **action-complete**: Review final summary

View File

@@ -0,0 +1,282 @@
# State Schema
Defines the state structure for skill-tuning orchestrator.
## State Structure
```typescript
interface TuningState {
// === Core Status ===
status: 'pending' | 'running' | 'completed' | 'failed';
started_at: string; // ISO timestamp
updated_at: string; // ISO timestamp
// === Target Skill Info ===
target_skill: {
name: string; // e.g., "software-manual"
path: string; // e.g., ".claude/skills/software-manual"
execution_mode: 'sequential' | 'autonomous';
phases: string[]; // List of phase files
specs: string[]; // List of spec files
};
// === User Input ===
user_issue_description: string; // User's problem description
focus_areas: string[]; // User-specified focus (optional)
// === Diagnosis Results ===
diagnosis: {
context: DiagnosisResult | null;
memory: DiagnosisResult | null;
dataflow: DiagnosisResult | null;
agent: DiagnosisResult | null;
};
// === Issues Found ===
issues: Issue[];
issues_by_severity: {
critical: number;
high: number;
medium: number;
low: number;
};
// === Fix Management ===
proposed_fixes: Fix[];
applied_fixes: AppliedFix[];
pending_fixes: string[]; // Fix IDs pending application
// === Iteration Control ===
iteration_count: number;
max_iterations: number; // Default: 5
// === Quality Metrics ===
quality_score: number; // 0-100
quality_gate: 'pass' | 'review' | 'fail';
// === Orchestrator State ===
completed_actions: string[];
current_action: string | null;
action_history: ActionHistoryEntry[];
// === Error Handling ===
errors: ErrorEntry[];
error_count: number;
max_errors: number; // Default: 3
// === Output Paths ===
work_dir: string;
backup_dir: string;
}
interface DiagnosisResult {
status: 'completed' | 'skipped' | 'failed';
issues_found: number;
severity: 'critical' | 'high' | 'medium' | 'low' | 'none';
execution_time_ms: number;
details: {
patterns_checked: string[];
patterns_matched: string[];
evidence: Evidence[];
recommendations: string[];
};
}
interface Evidence {
file: string;
line?: number;
pattern: string;
context: string;
severity: string;
}
interface Issue {
id: string; // e.g., "ISS-001"
type: 'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure';
severity: 'critical' | 'high' | 'medium' | 'low';
priority: number; // 1 = highest
location: {
file: string;
line_start?: number;
line_end?: number;
phase?: string;
};
description: string;
evidence: string[];
root_cause: string;
impact: string;
suggested_fix: string;
related_issues: string[]; // Issue IDs
}
interface Fix {
id: string; // e.g., "FIX-001"
issue_ids: string[]; // Issues this fix addresses
strategy: FixStrategy;
description: string;
rationale: string;
changes: FileChange[];
risk: 'low' | 'medium' | 'high';
estimated_impact: string;
verification_steps: string[];
}
type FixStrategy =
  | 'context_summarization'   // Add context compression
  | 'sliding_window'          // Implement sliding context window
  | 'structured_state'        // Convert to structured state passing
  | 'path_reference'          // Pass file paths instead of content
  | 'constraint_injection'    // Add constraint propagation
  | 'checkpoint_restore'      // Add checkpointing mechanism
  | 'goal_embedding'          // Track goal similarity throughout
  | 'state_constraints_field' // Add constraints field to state schema
  | 'state_centralization'    // Centralize state management
  | 'schema_enforcement'      // Add data contract validation
  | 'field_normalization'     // Normalize field names
  | 'transactional_updates'   // Atomic state updates
  | 'error_wrapping'          // Add try-catch to Task calls
  | 'result_validation'       // Validate agent returns
  | 'orchestrator_refactor'   // Refactor agent coordination
  | 'flatten_nesting'         // Remove nested agent calls
  | 'custom';                 // Custom fix
interface FileChange {
file: string;
action: 'create' | 'modify' | 'delete';
old_content?: string;
new_content?: string;
diff?: string;
}
interface AppliedFix {
fix_id: string;
applied_at: string;
success: boolean;
backup_path: string;
verification_result: 'pass' | 'fail' | 'pending';
rollback_available: boolean;
}
interface ActionHistoryEntry {
action: string;
started_at: string;
completed_at: string;
result: 'success' | 'failure' | 'skipped';
output_files: string[];
}
interface ErrorEntry {
action: string;
message: string;
timestamp: string;
recoverable: boolean;
}
```
## Initial State Template
```json
{
"status": "pending",
"started_at": null,
"updated_at": null,
"target_skill": {
"name": null,
"path": null,
"execution_mode": null,
"phases": [],
"specs": []
},
"user_issue_description": "",
"focus_areas": [],
"diagnosis": {
"context": null,
"memory": null,
"dataflow": null,
"agent": null
},
"issues": [],
"issues_by_severity": {
"critical": 0,
"high": 0,
"medium": 0,
"low": 0
},
"proposed_fixes": [],
"applied_fixes": [],
"pending_fixes": [],
"iteration_count": 0,
"max_iterations": 5,
"quality_score": 0,
"quality_gate": "fail",
"completed_actions": [],
"current_action": null,
"action_history": [],
"errors": [],
"error_count": 0,
"max_errors": 3,
"work_dir": null,
"backup_dir": null
}
```
## State Transition Diagram
```
┌─────────────┐
│ pending │
└──────┬──────┘
│ action-init
┌─────────────┐
┌──────────│ running │──────────┐
│ └──────┬──────┘ │
│ │ │
diagnosis │ ┌────────────┼────────────┐ │ error_count >= 3
actions │ │ │ │ │
│ ↓ ↓ ↓ │
│ context memory dataflow │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ │ │
│ ↓ │
│ action-verify │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ↓ ↓ ↓ │
│ quality iterate apply │
│ gate=pass (< max) fix │
│ │ │ │ │
│ │ └───────────┘ │
│ ↓ ↓
│ ┌─────────────┐ ┌─────────────┐
└→│ completed │ │ failed │
└─────────────┘ └─────────────┘
```
## State Update Rules
### Atomicity
All state updates must be atomic - read current state, apply changes, write entire state.
### Immutability
Never mutate state in place. Always create new state object with changes.
### Validation
Before writing state, validate against schema to prevent corruption.
### Timestamps
Always update `updated_at` on every state change.
```javascript
function updateState(workDir, updates) {
const currentState = JSON.parse(Read(`${workDir}/state.json`));
const newState = {
...currentState,
...updates,
updated_at: new Date().toISOString()
};
// Validate before write
if (!validateState(newState)) {
throw new Error('Invalid state update');
}
Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2));
return newState;
}
```
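The `validateState` helper referenced above is not defined in this schema. A minimal sketch that checks a few structural invariants from the `TuningState` interface (no external validation library assumed) could be:
```javascript
function validateState(state) {
  // Structural invariants derived from the TuningState interface above
  const validStatus = ['pending', 'running', 'completed', 'failed'].includes(state.status);
  const validGate = ['pass', 'review', 'fail'].includes(state.quality_gate);
  const validScore = typeof state.quality_score === 'number' &&
    state.quality_score >= 0 && state.quality_score <= 100;
  const hasCollections = Array.isArray(state.issues) &&
    Array.isArray(state.proposed_fixes) && Array.isArray(state.action_history);
  return validStatus && validGate && validScore && hasCollections;
}
```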

View File

@@ -0,0 +1,210 @@
# Problem Taxonomy
Classification of skill execution issues with detection patterns and severity criteria.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| All Diagnosis Actions | Issue classification | All sections |
| action-propose-fixes | Strategy selection | Fix Mapping |
| action-generate-report | Severity assessment | Severity Criteria |
---
## Problem Categories
### 1. Context Explosion (P2)
**Definition**: Excessive token accumulation causing prompt size to grow unbounded.
**Root Causes**:
- Unbounded conversation history
- Full content passing instead of references
- Missing summarization mechanisms
- Agent returning full output instead of path+summary
**Detection Patterns**:
| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| CTX-001 | `/history\s*[.=].*(?:push\|concat)/` | History array growth |
| CTX-002 | `/JSON\.stringify\s*\(\s*state\s*\)/` | Full state serialization |
| CTX-003 | `/Read\([^)]+\)\s*[\+,]/` | Multiple file content concatenation |
| CTX-004 | `/return\s*\{[^}]*content:/` | Agent returning full content |
| CTX-005 | File length > 5000 chars without summarize | Long prompt without compression |
**Impact Levels**:
- **Critical**: Context exceeds model limit (128K tokens)
- **High**: Context > 50K tokens per iteration
- **Medium**: Context grows 10%+ per iteration
- **Low**: Potential for growth but currently manageable
---
### 2. Long-tail Forgetting (P3)
**Definition**: Loss of early instructions, constraints, or goals in long execution chains.
**Root Causes**:
- No explicit constraint propagation
- Reliance on implicit context
- Missing checkpoint/restore mechanisms
- State schema without requirements field
**Detection Patterns**:
| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| MEM-001 | Later phases missing constraint reference | Constraint not carried forward |
| MEM-002 | `/\[TASK\](?![\s\S]*\[CONSTRAINTS\])/` | Task without constraints section |
| MEM-003 | Key phases without checkpoint | Missing state preservation |
| MEM-004 | State schema lacks `original_requirements` | No constraint persistence |
| MEM-005 | No verification phase | Output not checked against intent |
**Impact Levels**:
- **Critical**: Original goal completely lost
- **High**: Key constraints ignored in output
- **Medium**: Some requirements missing
- **Low**: Minor goal drift
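For MEM-003, a checkpoint can be as small as persisting a state snapshot at each phase boundary. A sketch, assuming the `Write` tool used in this document's other examples and an existing `checkpoints/` directory:

```javascript
// Persist a state snapshot at a key milestone (addresses MEM-003).
function checkpoint(workDir, state, phaseName) {
  const file = `${workDir}/checkpoints/${phaseName}.json`;
  Write(file, JSON.stringify(state, null, 2));
  return file; // record this path in state so the phase can be restored later
}
```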
---
### 3. Data Flow Disruption (P0)
**Definition**: Inconsistent state management causing data loss or corruption.
**Root Causes**:
- Multiple state storage locations
- Inconsistent field naming
- Missing schema validation
- Format transformation without normalization
**Detection Patterns**:
| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| DF-001 | Multiple state file writes | Scattered state storage |
| DF-002 | Same concept, different names | Field naming inconsistency |
| DF-003 | JSON.parse without validation | Missing schema validation |
| DF-004 | Files written but never read | Orphaned outputs |
| DF-005 | Autonomous skill without state-schema | Undefined state structure |
**Impact Levels**:
- **Critical**: Data loss or corruption
- **High**: State inconsistency between phases
- **Medium**: Potential for inconsistency
- **Low**: Minor naming inconsistencies
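For DF-003 specifically, a guarded parse can replace bare `JSON.parse`. A sketch, reusing the `validateState` helper sketched earlier in this document:

```javascript
// Guarded state parse (addresses DF-003): fail loudly instead of
// propagating a corrupt or unvalidated state object.
function parseStateSafely(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    throw new Error(`State file is not valid JSON: ${e.message}`);
  }
  if (!validateState(parsed)) {
    throw new Error('State file failed schema validation');
  }
  return parsed;
}
```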
---
### 4. Agent Coordination Failure (P1)
**Definition**: Fragile agent call patterns causing cascading failures.
**Root Causes**:
- Missing error handling in Task calls
- No result validation
- Inconsistent agent configurations
- Deeply nested agent calls
**Detection Patterns**:
| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| AGT-001 | Task without try-catch | Missing error handling |
| AGT-002 | Result used without validation | No return value check |
| AGT-003 | > 3 different agent types | Agent type proliferation |
| AGT-004 | Nested Task in prompt | Agent calling agent |
| AGT-005 | Task used but not in allowed-tools | Tool declaration mismatch |
| AGT-006 | Multiple return formats | Inconsistent agent output |
**Impact Levels**:
- **Critical**: Workflow crash on agent failure
- **High**: Unpredictable agent behavior
- **Medium**: Occasional coordination issues
- **Low**: Minor inconsistencies
---
## Severity Criteria
### Global Severity Matrix
| Severity | Definition | Action Required |
|----------|------------|-----------------|
| **Critical** | Blocks execution or causes data loss | Immediate fix required |
| **High** | Significantly impacts reliability | Should fix before deployment |
| **Medium** | Affects quality or maintainability | Fix in next iteration |
| **Low** | Minor improvement opportunity | Optional fix |
### Severity Calculation
```javascript
function calculateIssueSeverity(issue) {
const weights = {
impact_on_execution: 40, // Does it block workflow?
data_integrity_risk: 30, // Can it cause data loss?
frequency: 20, // How often does it occur?
complexity_to_fix: 10 // How hard to fix?
};
let score = 0;
// Impact on execution
if (issue.blocks_execution) score += weights.impact_on_execution;
else if (issue.degrades_execution) score += weights.impact_on_execution * 0.5;
// Data integrity
if (issue.causes_data_loss) score += weights.data_integrity_risk;
else if (issue.causes_inconsistency) score += weights.data_integrity_risk * 0.5;
// Frequency
if (issue.occurs_every_run) score += weights.frequency;
else if (issue.occurs_sometimes) score += weights.frequency * 0.5;
// Complexity (inverse - easier to fix = higher priority)
if (issue.fix_complexity === 'low') score += weights.complexity_to_fix;
else if (issue.fix_complexity === 'medium') score += weights.complexity_to_fix * 0.5;
// Map score to severity
if (score >= 70) return 'critical';
if (score >= 50) return 'high';
if (score >= 30) return 'medium';
return 'low';
}
```
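As a worked example, an issue that degrades (but does not block) execution, causes state inconsistency, occurs on every run, and is cheap to fix scores as follows:

```javascript
// Hypothetical issue, scored with the weights above:
//   degrades_execution      → 40 * 0.5 = 20
//   causes_inconsistency    → 30 * 0.5 = 15
//   occurs_every_run        → 20
//   fix_complexity: 'low'   → 10
// score = 20 + 15 + 20 + 10 = 65  →  65 >= 50  →  'high'
```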
---
## Fix Mapping
| Problem Type | Recommended Strategies | Priority Order |
|--------------|----------------------|----------------|
| Context Explosion | sliding_window, path_reference, context_summarization | 1, 2, 3 |
| Long-tail Forgetting | constraint_injection, state_constraints_field, checkpoint | 1, 2, 3 |
| Data Flow Disruption | state_centralization, schema_enforcement, field_normalization | 1, 2, 3 |
| Agent Coordination | error_wrapping, result_validation, flatten_nesting | 1, 2, 3 |
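This mapping translates directly into a lookup table, as sketched below; the `issue.type` key values are assumptions and must match whatever the diagnosis actions actually emit.

```javascript
// Fix-strategy lookup mirroring the table above, in priority order.
// The type keys are illustrative — align them with the issue objects
// produced by the diagnosis actions.
const FIX_STRATEGIES = {
  context_explosion:   ['sliding_window', 'path_reference', 'context_summarization'],
  longtail_forgetting: ['constraint_injection', 'state_constraints_field', 'checkpoint'],
  dataflow_disruption: ['state_centralization', 'schema_enforcement', 'field_normalization'],
  agent_coordination:  ['error_wrapping', 'result_validation', 'flatten_nesting']
};

function strategiesFor(issue) {
  return FIX_STRATEGIES[issue.type] || [];
}
```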
---
## Cross-Category Dependencies
Some issues may trigger others:
```
Context Explosion ──→ Long-tail Forgetting
(Large context causes important info to be pushed out)
Data Flow Disruption ──→ Agent Coordination Failure
(Inconsistent data causes agents to fail)
Agent Coordination Failure ──→ Context Explosion
(Failed retries add to context)
```
When fixing, address in this order:
1. **P0 Data Flow** - Foundation for other fixes
2. **P1 Agent Coordination** - Stability
3. **P2 Context Explosion** - Efficiency
4. **P3 Long-tail Forgetting** - Quality


@@ -0,0 +1,263 @@
# Quality Gates
Quality thresholds and verification criteria for skill tuning.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| action-generate-report | Calculate quality score | Scoring |
| action-verify | Check quality gates | Gate Definitions |
| action-complete | Final assessment | Pass Criteria |
---
## Quality Dimensions
### 1. Issue Severity Distribution (40%)
Measures the severity profile of identified issues.
| Metric | Weight | Calculation |
|--------|--------|-------------|
| Critical Issues | -25 each | High penalty |
| High Issues | -15 each | Significant penalty |
| Medium Issues | -5 each | Moderate penalty |
| Low Issues | -1 each | Minor penalty |
**Score Calculation**:
```javascript
function calculateSeverityScore(issues) {
const weights = { critical: 25, high: 15, medium: 5, low: 1 };
const deductions = issues.reduce((sum, issue) =>
sum + (weights[issue.severity] || 0), 0);
return Math.max(0, 100 - deductions);
}
```
### 2. Fix Effectiveness (30%)
Measures success rate of applied fixes.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| Fixes Verified Pass | +30 | > 80% pass rate |
| Fixes Verified Fail | -20 | < 50% triggers review |
| Issues Resolved | +10 | Per resolved issue |
**Score Calculation**:
```javascript
function calculateFixScore(appliedFixes) {
const total = appliedFixes.length;
if (total === 0) return 100; // No fixes needed = good
const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
return Math.round((passed / total) * 100);
}
```
### 3. Coverage Completeness (20%)
Measures diagnosis coverage across all areas.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| All 4 diagnoses complete | +20 | Full coverage |
| 3 diagnoses complete | +15 | Good coverage |
| 2 diagnoses complete | +10 | Partial coverage |
| < 2 diagnoses complete | +0 | Insufficient |
### 4. Iteration Efficiency (10%)
Measures how quickly issues are resolved.
| Metric | Weight | Threshold |
|--------|--------|-----------|
| Resolved in 1 iteration | +10 | Excellent |
| Resolved in 2 iterations | +7 | Good |
| Resolved in 3 iterations | +4 | Acceptable |
| > 3 iterations | +0 | Needs improvement |
---
## Gate Definitions
### Gate: PASS
**Threshold**: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2
**Meaning**: Skill is production-ready with minor issues.
**Actions**:
- Complete tuning session
- Generate summary report
- No further fixes required
### Gate: REVIEW
**Threshold**: Quality Score 60-79 OR High Issues 3-5
**Meaning**: Skill has issues requiring attention.
**Actions**:
- Review remaining issues
- Apply additional fixes if possible
- May require manual intervention
### Gate: FAIL
**Threshold**: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5
**Meaning**: Skill has serious issues blocking deployment.
**Actions**:
- Must fix critical issues
- Re-run diagnosis after fixes
- Consider architectural review
---
## Quality Score Calculation
```javascript
function calculateQualityScore(state) {
// Dimension 1: Severity (40%)
const severityScore = calculateSeverityScore(state.issues);
// Dimension 2: Fix Effectiveness (30%)
const fixScore = calculateFixScore(state.applied_fixes);
// Dimension 3: Coverage (20%)
const diagnosisCount = Object.values(state.diagnosis)
.filter(d => d !== null).length;
const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;
// Dimension 4: Efficiency (10%)
const efficiencyScore = state.iteration_count <= 1 ? 10 :
state.iteration_count <= 2 ? 7 :
state.iteration_count <= 3 ? 4 : 0;
// Weighted total
const total = (severityScore * 0.4) +
(fixScore * 0.3) +
(coverageScore * 1.0) + // Already scaled to 20
(efficiencyScore * 1.0); // Already scaled to 10
return Math.round(total);
}
function determineQualityGate(state) {
const score = calculateQualityScore(state);
const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
const highCount = state.issues.filter(i => i.severity === 'high').length;
if (criticalCount > 0) return 'fail';
if (highCount > 5) return 'fail';
if (score < 60) return 'fail';
if (highCount > 2) return 'review';
if (score < 80) return 'review';
return 'pass';
}
```
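For example, a session with one high and two medium open issues, three of three fixes verified, all four diagnoses complete, and two iterations works out as follows:

```javascript
// Hypothetical session, scored with the functions above:
//   issues: 1 high + 2 medium     → severityScore = 100 - 15 - 5 - 5 = 75
//   applied_fixes: 3 of 3 'pass'  → fixScore = 100
//   diagnosis: all 4 complete     → coverageScore = 20
//   iteration_count: 2            → efficiencyScore = 7
// total = 75*0.4 + 100*0.3 + 20 + 7 = 30 + 30 + 27 = 87
// gate: critical = 0, high = 1 (<= 2), score >= 80 → 'pass'
```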
---
## Verification Criteria
### For Each Issue Type
#### Context Explosion Issues
- [ ] Token count does not grow unbounded
- [ ] History limited to reasonable size
- [ ] No full content in prompts (paths used instead)
- [ ] Agent returns are compact
#### Long-tail Forgetting Issues
- [ ] Constraints visible in all phase prompts
- [ ] State schema includes requirements field
- [ ] Checkpoints exist at key milestones
- [ ] Output matches original constraints
#### Data Flow Issues
- [ ] Single state.json after execution
- [ ] No orphan state files
- [ ] Schema validation active
- [ ] Consistent field naming
#### Agent Coordination Issues
- [ ] All Task calls have error handling
- [ ] Agent results validated before use
- [ ] No nested agent calls
- [ ] Tool declarations match usage
---
## Iteration Control
### Max Iterations
Default: 5 iterations
**Rationale**:
- Each iteration may introduce new issues
- Diminishing returns after 3-4 iterations
- Prevents infinite loops
### Iteration Exit Criteria
```javascript
function shouldContinueIteration(state) {
// Exit if quality gate passed
if (state.quality_gate === 'pass') return false;
// Exit if max iterations reached
if (state.iteration_count >= state.max_iterations) return false;
// Exit if no improvement in last 2 iterations
if (state.iteration_count >= 2) {
const recentHistory = state.action_history.slice(-10);
const issuesResolvedRecently = recentHistory.filter(a =>
a.action === 'action-verify' && a.result === 'success'
).length;
if (issuesResolvedRecently === 0) {
console.log('No progress in recent iterations, stopping.');
return false;
}
}
// Continue if critical/high issues remain
const hasUrgentIssues = state.issues.some(i =>
i.severity === 'critical' || i.severity === 'high'
);
return hasUrgentIssues;
}
```
---
## Reporting Format
### Quality Summary Table
| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Severity Distribution | {score}/100 | 40% | {weighted} |
| Fix Effectiveness | {score}/100 | 30% | {weighted} |
| Coverage Completeness | {score}/20 | 20% | {score} |
| Iteration Efficiency | {score}/10 | 10% | {score} |
| **Total** | | | **{total}/100** |
### Gate Status
```
Quality Gate: {PASS|REVIEW|FAIL}
Criteria:
- Quality Score: {score} (threshold: 60)
- Critical Issues: {count} (threshold: 0)
- High Issues: {count} (threshold: 5)
```

File diff suppressed because it is too large


@@ -0,0 +1,153 @@
# Diagnosis Report Template
Template for individual diagnosis action reports.
## Template
```markdown
# {{diagnosis_type}} Diagnosis Report
**Target Skill**: {{skill_name}}
**Diagnosis Type**: {{diagnosis_type}}
**Executed At**: {{timestamp}}
**Duration**: {{duration_ms}}ms
---
## Summary
| Metric | Value |
|--------|-------|
| Issues Found | {{issues_found}} |
| Severity | {{severity}} |
| Patterns Checked | {{patterns_checked_count}} |
| Patterns Matched | {{patterns_matched_count}} |
---
## Patterns Analyzed
{{#each patterns_checked}}
### {{pattern_name}}
- **Status**: {{status}}
- **Matches**: {{match_count}}
- **Files Affected**: {{affected_files}}
{{/each}}
---
## Issues Identified
{{#if issues.length}}
{{#each issues}}
### {{id}}: {{description}}
| Field | Value |
|-------|-------|
| Type | {{type}} |
| Severity | {{severity}} |
| Location | {{location}} |
| Root Cause | {{root_cause}} |
| Impact | {{impact}} |
**Evidence**:
{{#each evidence}}
- `{{this}}`
{{/each}}
**Suggested Fix**: {{suggested_fix}}
---
{{/each}}
{{else}}
_No issues found in this diagnosis area._
{{/if}}
---
## Recommendations
{{#if recommendations.length}}
{{#each recommendations}}
{{@index}}. {{this}}
{{/each}}
{{else}}
No specific recommendations - area appears healthy.
{{/if}}
---
## Raw Data
Full diagnosis data available at:
`{{output_file}}`
```
## Variable Reference
| Variable | Type | Source |
|----------|------|--------|
| `diagnosis_type` | string | 'context' \| 'memory' \| 'dataflow' \| 'agent' |
| `skill_name` | string | state.target_skill.name |
| `timestamp` | string | ISO timestamp |
| `duration_ms` | number | Execution time |
| `issues_found` | number | issues.length |
| `severity` | string | Calculated severity |
| `patterns_checked` | array | Patterns analyzed |
| `patterns_matched` | array | Patterns with matches |
| `issues` | array | Issue objects |
| `recommendations` | array | String recommendations |
| `output_file` | string | Path to JSON file |
## Usage
```javascript
function renderDiagnosisReport(diagnosis, diagnosisType, skillName, outputFile) {
return `# ${diagnosisType} Diagnosis Report
**Target Skill**: ${skillName}
**Diagnosis Type**: ${diagnosisType}
**Executed At**: ${new Date().toISOString()}
**Duration**: ${diagnosis.execution_time_ms}ms
---
## Summary
| Metric | Value |
|--------|-------|
| Issues Found | ${diagnosis.issues_found} |
| Severity | ${diagnosis.severity} |
| Patterns Checked | ${diagnosis.details.patterns_checked.length} |
| Patterns Matched | ${diagnosis.details.patterns_matched.length} |
---
## Issues Identified
${diagnosis.details.evidence.map((e, i) => `
### Issue ${i + 1}
- **File**: ${e.file}
- **Pattern**: ${e.pattern}
- **Severity**: ${e.severity}
- **Context**: \`${e.context}\`
`).join('\n')}
---
## Recommendations
${diagnosis.details.recommendations.map((r, i) => `${i + 1}. ${r}`).join('\n')}
---
## Raw Data
Full diagnosis data available at:
\`${outputFile}\`
`;
}
```


@@ -0,0 +1,204 @@
# Fix Proposal Template
Template for fix proposal documentation.
## Template
```markdown
# Fix Proposal: {{fix_id}}
**Strategy**: {{strategy}}
**Risk Level**: {{risk}}
**Issues Addressed**: {{issue_ids}}
---
## Description
{{description}}
## Rationale
{{rationale}}
---
## Affected Files
{{#each changes}}
### {{file}}
**Action**: {{action}}
```diff
{{diff}}
```
{{/each}}
---
## Implementation Steps
{{#each implementation_steps}}
{{@index}}. {{this}}
{{/each}}
---
## Risk Assessment
| Factor | Assessment |
|--------|------------|
| Complexity | {{complexity}} |
| Reversibility | {{#if reversible}}Yes{{else}}No{{/if}} |
| Breaking Changes | {{breaking_changes}} |
| Test Coverage | {{test_coverage}} |
**Overall Risk**: {{risk}}
---
## Verification Steps
{{#each verification_steps}}
- [ ] {{this}}
{{/each}}
---
## Rollback Plan
{{#if rollback_available}}
To rollback this fix:
```bash
{{rollback_command}}
```
{{else}}
_Rollback not available for this fix type._
{{/if}}
---
## Estimated Impact
{{estimated_impact}}
```
## Variable Reference
| Variable | Type | Source |
|----------|------|--------|
| `fix_id` | string | Generated ID (FIX-001) |
| `strategy` | string | Fix strategy name |
| `risk` | string | 'low' \| 'medium' \| 'high' |
| `issue_ids` | array | Related issue IDs |
| `description` | string | Human-readable description |
| `rationale` | string | Why this fix works |
| `changes` | array | File change objects |
| `implementation_steps` | array | Step-by-step guide |
| `verification_steps` | array | How to verify fix worked |
| `estimated_impact` | string | Expected improvement |
## Usage
```javascript
function renderFixProposal(fix) {
return `# Fix Proposal: ${fix.id}
**Strategy**: ${fix.strategy}
**Risk Level**: ${fix.risk}
**Issues Addressed**: ${fix.issue_ids.join(', ')}
---
## Description
${fix.description}
## Rationale
${fix.rationale}
---
## Affected Files
${fix.changes.map(change => `
### ${change.file}
**Action**: ${change.action}
\`\`\`diff
${change.diff || change.new_content?.slice(0, 200) || 'N/A'}
\`\`\`
`).join('\n')}
---
## Verification Steps
${fix.verification_steps.map(step => `- [ ] ${step}`).join('\n')}
---
## Estimated Impact
${fix.estimated_impact}
`;
}
```
## Fix Strategy Templates
### sliding_window
```markdown
## Description
Implement sliding window for conversation history to prevent unbounded growth.
## Changes
- Add MAX_HISTORY constant
- Modify history update logic to slice array
- Update state schema documentation
## Verification
- [ ] Run skill for 10+ iterations
- [ ] Verify history.length <= MAX_HISTORY
- [ ] Check no data loss for recent items
```
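A minimal sketch of the history update this strategy prescribes; the cap value is illustrative:

```javascript
// Sliding window over conversation history (sketch).
const MAX_HISTORY = 20; // illustrative cap — tune per skill

function appendToHistory(state, entry) {
  const history = [...(state.history || []), entry];
  // Keep only the most recent MAX_HISTORY entries, without mutating state
  return { ...state, history: history.slice(-MAX_HISTORY) };
}
```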
### constraint_injection
```markdown
## Description
Add explicit constraint section to each phase prompt.
## Changes
- Add [CONSTRAINTS] section template
- Reference state.original_requirements
- Add reminder before output section
## Verification
- [ ] Check constraints visible in all phases
- [ ] Test with specific constraint
- [ ] Verify output respects constraint
```
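A sketch of the prompt assembly this strategy describes, assuming constraints are persisted in `state.original_requirements` (per MEM-004):

```javascript
// Inject persisted constraints into every phase prompt (sketch).
function buildPhasePrompt(phaseBody, state) {
  const constraints = (state.original_requirements || [])
    .map(r => `- ${r}`)
    .join('\n');
  return [
    '[TASK]',
    phaseBody,
    '',
    '[CONSTRAINTS]',
    constraints,
    '',
    'Reminder: the output must satisfy every constraint listed above.'
  ].join('\n');
}
```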
### error_wrapping
```markdown
## Description
Wrap all Task calls in try-catch with retry logic.
## Changes
- Create safeTask wrapper function
- Replace direct Task calls
- Add error logging to state
## Verification
- [ ] Simulate agent failure
- [ ] Verify graceful error handling
- [ ] Check retry logic
```