From 633d918da1b143ef1bf4dcdf6033259e11179d18 Mon Sep 17 00:00:00 2001 From: catlog22 Date: Wed, 14 Jan 2026 12:59:13 +0800 Subject: [PATCH] Add quality gates and tuning strategies documentation - Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions. - Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination. - Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes. --- .claude/skills/skill-tuning/SKILL.md | 342 ++++++ .../phases/actions/action-abort.md | 164 +++ .../phases/actions/action-apply-fix.md | 206 ++++ .../phases/actions/action-complete.md | 195 ++++ .../phases/actions/action-diagnose-agent.md | 317 +++++ .../phases/actions/action-diagnose-context.md | 243 ++++ .../actions/action-diagnose-dataflow.md | 318 ++++++ .../phases/actions/action-diagnose-memory.md | 269 +++++ .../phases/actions/action-gemini-analysis.md | 322 ++++++ .../phases/actions/action-generate-report.md | 228 ++++ .../phases/actions/action-init.md | 149 +++ .../phases/actions/action-propose-fixes.md | 317 +++++ .../phases/actions/action-verify.md | 222 ++++ .../skill-tuning/phases/orchestrator.md | 335 ++++++ .../skill-tuning/phases/state-schema.md | 282 +++++ .../skill-tuning/specs/problem-taxonomy.md | 210 ++++ .../skill-tuning/specs/quality-gates.md | 263 +++++ .../skill-tuning/specs/tuning-strategies.md | 1016 +++++++++++++++++ .../templates/diagnosis-report.md | 153 +++ .../skill-tuning/templates/fix-proposal.md | 204 ++++ 20 files changed, 5755 insertions(+) create mode 100644 .claude/skills/skill-tuning/SKILL.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-abort.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-apply-fix.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-complete.md create mode 100644 
.claude/skills/skill-tuning/phases/actions/action-diagnose-agent.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-diagnose-context.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-diagnose-dataflow.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-diagnose-memory.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-gemini-analysis.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-generate-report.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-init.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-propose-fixes.md create mode 100644 .claude/skills/skill-tuning/phases/actions/action-verify.md create mode 100644 .claude/skills/skill-tuning/phases/orchestrator.md create mode 100644 .claude/skills/skill-tuning/phases/state-schema.md create mode 100644 .claude/skills/skill-tuning/specs/problem-taxonomy.md create mode 100644 .claude/skills/skill-tuning/specs/quality-gates.md create mode 100644 .claude/skills/skill-tuning/specs/tuning-strategies.md create mode 100644 .claude/skills/skill-tuning/templates/diagnosis-report.md create mode 100644 .claude/skills/skill-tuning/templates/fix-proposal.md diff --git a/.claude/skills/skill-tuning/SKILL.md b/.claude/skills/skill-tuning/SKILL.md new file mode 100644 index 00000000..ca64efec --- /dev/null +++ b/.claude/skills/skill-tuning/SKILL.md @@ -0,0 +1,342 @@ +--- +name: skill-tuning +description: Universal skill diagnosis and optimization tool. Detect and fix skill execution issues including context explosion, long-tail forgetting, data flow disruption, and agent coordination failures. Supports Gemini CLI for deep analysis. Triggers on "skill tuning", "tune skill", "skill diagnosis", "optimize skill", "skill debug". 
+allowed-tools: Task, AskUserQuestion, Read, Write, Bash, Glob, Grep, mcp__ace-tool__search_context +--- + +# Skill Tuning + +Universal skill diagnosis and optimization tool that identifies and resolves skill execution problems through iterative multi-agent analysis. + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Skill Tuning Architecture (Autonomous Mode + Gemini CLI) │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ⚠️ Phase 0: Specification → Read specs + understand target skill structure (mandatory first step) │ +│ Study │ +│ ↓ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ Orchestrator (state-driven decisions) │ │ +│ │ Read diagnosis state → pick next action → execute → update state → loop until done │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌────────────┬───────────┼───────────┬────────────┬────────────┐ │ +│ ↓ ↓ ↓ ↓ ↓ ↓ │ +│ ┌──────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌─────────┐ │ +│ │ Init │ │Diagnose │ │Diagnose│ │Diagnose│ │Diagnose│ │ Gemini │ │ +│ │ │ │ Context │ │ Memory │ │DataFlow│ │ Agent │ │Analysis │ │ +│ └──────┘ └─────────┘ └────────┘ └────────┘ └────────┘ └─────────┘ │ +│ │ │ │ │ │ │ │ +│ └───────────┴───────────┴───────────┴────────────┴────────────┘ │ +│ ↓ │ +│ ┌──────────────────┐ │ +│ │ Apply Fixes + │ │ +│ │ Verify Results │ │ +│ └──────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ Gemini CLI Integration │ │ +│ │ Invokes the Gemini CLI on demand for deep analysis: │ │ +│ │ • Complex problem analysis (prompt engineering, architecture review) │ │ +│ │ • Code pattern recognition (pattern matching, anti-pattern detection) │ │ +│ │ • Fix strategy generation (fix generation, refactoring suggestions) │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Problem Domain + +Based on comprehensive analysis, skill-tuning addresses
**core skill issues** and **general optimization areas**: + +### Core Skill Issues (auto-detected) + +| Priority | Problem | Root Cause | Solution Strategy | +|----------|---------|------------|-------------------| +| **P0** | Data Flow Disruption | Scattered state, inconsistent formats | Centralized session store, transactional updates | +| **P1** | Agent Coordination | Fragile call chains, merge complexity | Dedicated orchestrator, enforced data contracts | +| **P2** | Context Explosion | Token accumulation, multi-turn bloat | Context summarization, sliding window, structured state | +| **P3** | Long-tail Forgetting | Early constraint loss | Constraint injection, checkpointing, goal alignment | + +### General Optimization Areas (analyzed on demand via Gemini CLI) + +| Category | Issues | Gemini Analysis Scope | +|----------|--------|----------------------| +| **Prompt Engineering** | Vague instructions, inconsistent output formats, hallucination risk | Prompt optimization, structured output design | +| **Architecture** | Poor phase decomposition, tangled dependencies, weak extensibility | Architecture review, modularization advice | +| **Performance** | Slow execution, high token consumption, redundant computation | Performance analysis, caching strategies | +| **Error Handling** | Poor error recovery, no fallback strategy, insufficient logging | Fault-tolerance design, observability improvements | +| **Output Quality** | Unstable output, format drift, quality fluctuation | Quality gates, validation mechanisms | +| **User Experience** | Clunky interaction, unclear feedback, invisible progress | UX optimization, progress tracking | + +## Key Design Principles + +1. **Problem-First Diagnosis**: Systematic identification before any fix attempt +2. **Data-Driven Analysis**: Record execution traces, token counts, state snapshots +3. **Iterative Refinement**: Multiple tuning rounds until quality gates pass +4. **Non-Destructive**: All changes are reversible with backup checkpoints +5. **Agent Coordination**: Use specialized sub-agents for each diagnosis type +6.
**Gemini CLI On-Demand**: Deep analysis via CLI for complex/custom issues + +--- + +## Gemini CLI Integration + +Invoke the Gemini CLI on demand, driven by user needs, for deep analysis. + +### Trigger Conditions + +| Condition | Action | CLI Mode | +|-----------|--------|----------| +| User describes a complex problem | Call Gemini to analyze the root cause | `analysis` | +| Auto-diagnosis finds a critical issue | Request deep analysis for confirmation | `analysis` | +| User requests an architecture review | Run architecture analysis | `analysis` | +| Fix code must be generated | Generate a fix proposal | `write` | +| Standard strategies do not apply | Request a customized strategy | `analysis` | + +### CLI Command Template + +```bash +ccw cli -p " +PURPOSE: ${purpose} +TASK: ${task_steps} +MODE: ${mode} +CONTEXT: @${skill_path}/**/* +EXPECTED: ${expected_output} +RULES: $(cat ~/.claude/workflows/cli-templates/protocols/${mode}-protocol.md) | ${constraints} +" --tool gemini --mode ${mode} --cd ${skill_path} +``` + +### Analysis Types + +#### 1. Problem Root Cause Analysis + +```bash +ccw cli -p " +PURPOSE: Identify root cause of skill execution issue: ${user_issue_description} +TASK: • Analyze skill structure and phase flow • Identify anti-patterns • Trace data flow issues +MODE: analysis +CONTEXT: @**/*.md +EXPECTED: JSON with { root_causes: [], patterns_found: [], recommendations: [] } +RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on execution flow +" --tool gemini --mode analysis +``` + +#### 2. Architecture Review + +```bash +ccw cli -p " +PURPOSE: Review skill architecture for scalability and maintainability +TASK: • Evaluate phase decomposition • Check state management patterns • Assess agent coordination +MODE: analysis +CONTEXT: @**/*.md +EXPECTED: Architecture assessment with improvement recommendations +RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on modularity +" --tool gemini --mode analysis +``` + +#### 3.
Fix Strategy Generation + +```bash +ccw cli -p " +PURPOSE: Generate fix strategy for issue: ${issue_id} - ${issue_description} +TASK: • Analyze issue context • Design fix approach • Generate implementation plan +MODE: analysis +CONTEXT: @**/*.md +EXPECTED: JSON with { strategy: string, changes: [], verification_steps: [] } +RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Minimal invasive changes +" --tool gemini --mode analysis +``` + +--- + +## Mandatory Prerequisites + +> **CRITICAL**: Read these documents before executing any action. + +### Core Specs (Required) + +| Document | Purpose | Priority | +|----------|---------|----------| +| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification and detection patterns | **P0** | +| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies for each problem type | **P0** | +| [specs/quality-gates.md](specs/quality-gates.md) | Quality thresholds and verification criteria | P1 | + +### Templates (Reference) + +| Document | Purpose | +|----------|---------| +| [templates/diagnosis-report.md](templates/diagnosis-report.md) | Diagnosis report structure | +| [templates/fix-proposal.md](templates/fix-proposal.md) | Fix proposal format | + +--- + +## Execution Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Phase 0: Specification Study (mandatory first step - do not skip) │ +│ → Read: specs/problem-taxonomy.md (problem taxonomy) │ +│ → Read: specs/tuning-strategies.md (tuning strategies) │ +│ → Read: Target skill's SKILL.md and phases/*.md │ +│ → Output: internalized specs and understanding of the target skill structure │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-init: Initialize Tuning Session │ +│ → Create work directory: .workflow/.scratchpad/skill-tuning-{timestamp} │ +│ → Initialize state.json with target skill info │ +│ → Create backup of target skill files │ +├─────────────────────────────────────────────────────────────────────────────┤ +│
action-diagnose-context: Context Explosion Analysis │ +│ → Scan for token accumulation patterns │ +│ → Detect multi-turn dialogue growth │ +│ → Output: context-diagnosis.json │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-diagnose-memory: Long-tail Forgetting Analysis │ +│ → Trace constraint propagation through phases │ +│ → Detect early instruction loss │ +│ → Output: memory-diagnosis.json │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-diagnose-dataflow: Data Flow Analysis │ +│ → Map state transitions between phases │ +│ → Detect format inconsistencies │ +│ → Output: dataflow-diagnosis.json │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-diagnose-agent: Agent Coordination Analysis │ +│ → Analyze agent call patterns │ +│ → Detect result passing issues │ +│ → Output: agent-diagnosis.json │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-generate-report: Consolidated Report │ +│ → Merge all diagnosis results │ +│ → Prioritize issues by severity │ +│ → Output: tuning-report.md │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-propose-fixes: Fix Proposal Generation │ +│ → Generate fix strategies for each issue │ +│ → Create implementation plan │ +│ → Output: fix-proposals.json │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-apply-fix: Apply Selected Fix │ +│ → User selects fix to apply │ +│ → Execute fix with backup │ +│ → Update state with fix result │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-verify: Verification │ +│ → Re-run affected diagnosis │ +│ → Check quality gates │ +│ → Update iteration count │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ action-complete: Finalization │ +│ → Generate final report │ +│ → 
Cleanup temporary files │ +│ → Output: tuning-summary.md │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Directory Setup + +```javascript +const timestamp = new Date().toISOString().slice(0,19).replace(/[-:T]/g, ''); +const workDir = `.workflow/.scratchpad/skill-tuning-${timestamp}`; + +Bash(`mkdir -p "${workDir}/diagnosis"`); +Bash(`mkdir -p "${workDir}/backups"`); +Bash(`mkdir -p "${workDir}/fixes"`); +``` + +## Output Structure + +``` +.workflow/.scratchpad/skill-tuning-{timestamp}/ +├── state.json # Session state (orchestrator-managed) +├── diagnosis/ +│ ├── context-diagnosis.json # Context explosion analysis +│ ├── memory-diagnosis.json # Long-tail forgetting analysis +│ ├── dataflow-diagnosis.json # Data flow analysis +│ └── agent-diagnosis.json # Agent coordination analysis +├── backups/ +│ └── {skill-name}-backup/ # Original skill files backup +├── fixes/ +│ ├── fix-proposals.json # Proposed fixes +│ └── applied-fixes.json # Applied fix history +├── tuning-report.md # Consolidated diagnosis report +└── tuning-summary.md # Final summary +``` + +## State Schema + +```typescript +interface TuningState { + status: 'pending' | 'running' | 'completed' | 'failed'; + target_skill: { + name: string; + path: string; + execution_mode: 'sequential' | 'autonomous'; + }; + user_issue_description: string; + diagnosis: { + context: DiagnosisResult | null; + memory: DiagnosisResult | null; + dataflow: DiagnosisResult | null; + agent: DiagnosisResult | null; + }; + issues: Issue[]; + proposed_fixes: Fix[]; + applied_fixes: AppliedFix[]; + iteration_count: number; + max_iterations: number; + quality_score: number; + completed_actions: string[]; + current_action: string | null; + errors: Error[]; + error_count: number; +} + +interface DiagnosisResult { + status: 'completed' | 'skipped'; + issues_found: number; + severity: 'critical' | 'high' | 'medium' | 'low' | 'none'; + details: any; +} + +interface Issue { + id: string; + type: 
'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure'; + severity: 'critical' | 'high' | 'medium' | 'low'; + location: string; + description: string; + evidence: string[]; +} + +interface Fix { + id: string; + issue_id: string; + strategy: string; + description: string; + changes: FileChange[]; + risk: 'low' | 'medium' | 'high'; +} +``` + +## Reference Documents + +| Document | Purpose | +|----------|---------| +| [phases/orchestrator.md](phases/orchestrator.md) | Orchestrator decision logic | +| [phases/state-schema.md](phases/state-schema.md) | State structure definition | +| [phases/actions/action-init.md](phases/actions/action-init.md) | Initialize tuning session | +| [phases/actions/action-diagnose-context.md](phases/actions/action-diagnose-context.md) | Context explosion diagnosis | +| [phases/actions/action-diagnose-memory.md](phases/actions/action-diagnose-memory.md) | Long-tail forgetting diagnosis | +| [phases/actions/action-diagnose-dataflow.md](phases/actions/action-diagnose-dataflow.md) | Data flow diagnosis | +| [phases/actions/action-diagnose-agent.md](phases/actions/action-diagnose-agent.md) | Agent coordination diagnosis | +| [phases/actions/action-generate-report.md](phases/actions/action-generate-report.md) | Report generation | +| [phases/actions/action-propose-fixes.md](phases/actions/action-propose-fixes.md) | Fix proposal | +| [phases/actions/action-apply-fix.md](phases/actions/action-apply-fix.md) | Fix application | +| [phases/actions/action-verify.md](phases/actions/action-verify.md) | Verification | +| [phases/actions/action-complete.md](phases/actions/action-complete.md) | Finalization | +| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification | +| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies | +| [specs/quality-gates.md](specs/quality-gates.md) | Quality criteria | diff --git a/.claude/skills/skill-tuning/phases/actions/action-abort.md 
b/.claude/skills/skill-tuning/phases/actions/action-abort.md new file mode 100644 index 00000000..a24145fc --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-abort.md @@ -0,0 +1,164 @@ +# Action: Abort + +Abort the tuning session due to unrecoverable errors. + +## Purpose + +- Safely terminate on critical failures +- Preserve diagnostic information for debugging +- Ensure backup remains available +- Notify user of failure reason + +## Preconditions + +- [ ] state.error_count >= state.max_errors +- [ ] OR critical failure detected + +## Execution + +```javascript +async function execute(state, workDir) { + console.log('Aborting skill tuning session...'); + + const errors = state.errors; + const targetSkill = state.target_skill; + + // Generate abort report + const abortReport = `# Skill Tuning Aborted + +**Target Skill**: ${targetSkill?.name || 'Unknown'} +**Aborted At**: ${new Date().toISOString()} +**Reason**: Too many errors or critical failure + +--- + +## Error Log + +${errors.length === 0 ? '_No errors recorded_' : + errors.map((err, i) => ` +### Error ${i + 1} +- **Action**: ${err.action} +- **Message**: ${err.message} +- **Time**: ${err.timestamp} +- **Recoverable**: ${err.recoverable ? 'Yes' : 'No'} +`).join('\n')} + +--- + +## Session State at Abort + +- **Status**: ${state.status} +- **Iteration Count**: ${state.iteration_count} +- **Completed Actions**: ${state.completed_actions.length} +- **Issues Found**: ${state.issues.length} +- **Fixes Applied**: ${state.applied_fixes.length} + +--- + +## Recovery Options + +### Option 1: Restore Original Skill +If any changes were made, restore from backup: +\`\`\`bash +cp -r "${state.backup_dir}/${targetSkill?.name || 'backup'}-backup"/* "${targetSkill?.path || 'target'}/" +\`\`\` + +### Option 2: Resume from Last State +The session state is preserved at: +\`${workDir}/state.json\` + +To resume: +1. Fix the underlying issue +2. Reset error_count in state.json +3. 
Re-run skill-tuning with --resume flag + +### Option 3: Manual Investigation +Review the following files: +- Diagnosis results: \`${workDir}/diagnosis/*.json\` +- Error log: \`${workDir}/errors.json\` +- State snapshot: \`${workDir}/state.json\` + +--- + +## Diagnostic Information + +### Last Successful Action +${state.completed_actions.length > 0 ? state.completed_actions[state.completed_actions.length - 1] : 'None'} + +### Current Action When Failed +${state.current_action || 'Unknown'} + +### Partial Diagnosis Results +- Context: ${state.diagnosis.context ? 'Completed' : 'Not completed'} +- Memory: ${state.diagnosis.memory ? 'Completed' : 'Not completed'} +- Data Flow: ${state.diagnosis.dataflow ? 'Completed' : 'Not completed'} +- Agent: ${state.diagnosis.agent ? 'Completed' : 'Not completed'} + +--- + +*Skill tuning aborted - please review errors and retry* +`; + + // Write abort report + Write(`${workDir}/abort-report.md`, abortReport); + + // Save error log + Write(`${workDir}/errors.json`, JSON.stringify(errors, null, 2)); + + // Notify user + await AskUserQuestion({ + questions: [{ + question: `Skill tuning aborted due to ${errors.length} errors. 
Would you like to restore the original skill?`, + header: 'Restore', + multiSelect: false, + options: [ + { label: 'Yes, restore', description: 'Restore original skill from backup' }, + { label: 'No, keep changes', description: 'Keep any partial changes made' } + ] + }] + }).then(async response => { + if (response['Restore'] === 'Yes, restore') { + // Restore from backup + if (state.backup_dir && targetSkill?.path) { + Bash(`cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/"`); + console.log('Original skill restored from backup.'); + } + } + }).catch(() => { + // User cancelled, don't restore + }); + + return { + stateUpdates: { + status: 'failed', + completed_at: new Date().toISOString() + }, + outputFiles: [`${workDir}/abort-report.md`, `${workDir}/errors.json`], + summary: `Tuning aborted: ${errors.length} errors. Check abort-report.md for details.` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + status: 'failed', + completed_at: '' + } +}; +``` + +## Output + +- **File**: `abort-report.md` +- **Location**: `${workDir}/abort-report.md` + +## Error Handling + +This action should not fail - it's the final error handler. + +## Next Actions + +- None (terminal state) diff --git a/.claude/skills/skill-tuning/phases/actions/action-apply-fix.md b/.claude/skills/skill-tuning/phases/actions/action-apply-fix.md new file mode 100644 index 00000000..45c32f71 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-apply-fix.md @@ -0,0 +1,206 @@ +# Action: Apply Fix + +Apply a selected fix to the target skill with backup and rollback capability. 
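The backup-then-apply-then-rollback flow this action implements can be sketched as a copy-on-write transaction. The following is a minimal, self-contained illustration, not the action's actual code: a hypothetical in-memory `FileStore` stands in for the real Read/Write tool calls.

```javascript
// Minimal sketch of the backup -> apply -> rollback pattern.
// An in-memory Map stands in for the target skill's files.
class FileStore {
  constructor(files) { this.files = new Map(Object.entries(files)); }
  read(path) { return this.files.get(path); }
  write(path, content) { this.files.set(path, content); }
}

function applyFix(store, fix) {
  // 1. Back up every file the fix touches before modifying anything.
  const backup = new Map();
  for (const change of fix.changes) {
    backup.set(change.file, store.read(change.file));
  }
  // 2. Apply all changes; on any error, restore from backup (rollback).
  try {
    for (const change of fix.changes) {
      store.write(change.file, change.newContent);
    }
    return { success: true, backup };
  } catch (err) {
    for (const [file, content] of backup) store.write(file, content);
    return { success: false, backup };
  }
}

const store = new FileStore({ 'SKILL.md': 'original' });
const result = applyFix(store, {
  id: 'FIX-1',
  changes: [{ file: 'SKILL.md', newContent: 'patched' }]
});
```

The same design choice shows up below: the backup is taken per-fix (`before-${fix.id}`), so each applied fix can be rolled back independently.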
+ +## Purpose + +- Apply fix changes to target skill files +- Create backup before modifications +- Track applied fixes for verification +- Support rollback if needed + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.pending_fixes.length > 0 +- [ ] state.proposed_fixes contains the fix to apply + +## Execution + +```javascript +async function execute(state, workDir) { + const pendingFixes = state.pending_fixes; + const proposedFixes = state.proposed_fixes; + const targetPath = state.target_skill.path; + const backupDir = state.backup_dir; + + if (pendingFixes.length === 0) { + return { + stateUpdates: {}, + outputFiles: [], + summary: 'No pending fixes to apply' + }; + } + + // Get next fix to apply + const fixId = pendingFixes[0]; + const fix = proposedFixes.find(f => f.id === fixId); + + if (!fix) { + return { + stateUpdates: { + pending_fixes: pendingFixes.slice(1), + errors: [...state.errors, { + action: 'action-apply-fix', + message: `Fix ${fixId} not found in proposals`, + timestamp: new Date().toISOString(), + recoverable: true + }] + }, + outputFiles: [], + summary: `Fix ${fixId} not found, skipping` + }; + } + + console.log(`Applying fix ${fix.id}: ${fix.description}`); + + // Create fix-specific backup + const fixBackupDir = `${backupDir}/before-${fix.id}`; + Bash(`mkdir -p "${fixBackupDir}"`); + + const appliedChanges = []; + let success = true; + + for (const change of fix.changes) { + try { + // Resolve file path (handle wildcards) + let targetFiles = []; + if (change.file.includes('*')) { + targetFiles = Glob(`${targetPath}/${change.file}`); + } else { + targetFiles = [`${targetPath}/${change.file}`]; + } + + for (const targetFile of targetFiles) { + // Backup original + const relativePath = targetFile.replace(targetPath + '/', ''); + const backupPath = `${fixBackupDir}/${relativePath}`; + + if (Glob(targetFile).length > 0) { + const originalContent = Read(targetFile); + Bash(`mkdir -p "$(dirname "${backupPath}")"`); + 
Write(backupPath, originalContent); + } + + // Apply change based on action type + if (change.action === 'modify' && change.diff) { + // Append the diff as an HTML comment so the change is visible for review + // Real implementation would parse and apply the diff hunks in place + const existingContent = Read(targetFile); + + // Simple diff application: look for context and apply + // This is a simplified version - real implementation would be more sophisticated + const newContent = existingContent + `\n\n<!-- ${fix.id}: ${change.diff} -->\n`; + + Write(targetFile, newContent); + + appliedChanges.push({ + file: relativePath, + action: 'modified', + backup: backupPath + }); + } else if (change.action === 'create') { + Write(targetFile, change.new_content || ''); + appliedChanges.push({ + file: relativePath, + action: 'created', + backup: null + }); + } + } + } catch (error) { + console.log(`Error applying change to ${change.file}: ${error.message}`); + success = false; + } + } + + // Record applied fix + const appliedFix = { + fix_id: fix.id, + applied_at: new Date().toISOString(), + success: success, + backup_path: fixBackupDir, + verification_result: 'pending', + rollback_available: true, + changes_made: appliedChanges + }; + + // Update applied fixes log + const appliedFixesPath = `${workDir}/fixes/applied-fixes.json`; + let existingApplied = []; + try { + existingApplied = JSON.parse(Read(appliedFixesPath)); + } catch (e) { + existingApplied = []; + } + existingApplied.push(appliedFix); + Write(appliedFixesPath, JSON.stringify(existingApplied, null, 2)); + + return { + stateUpdates: { + applied_fixes: [...state.applied_fixes, appliedFix], + pending_fixes: pendingFixes.slice(1) // Remove applied fix from pending + }, + outputFiles: [appliedFixesPath], + summary: `Applied fix ${fix.id}: ${success ?
'success' : 'partial'}, ${appliedChanges.length} files modified` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + applied_fixes: [...existingApplied, newAppliedFix], + pending_fixes: remainingPendingFixes + } +}; +``` + +## Rollback Function + +```javascript +async function rollbackFix(fixId, state, workDir) { + const appliedFix = state.applied_fixes.find(f => f.fix_id === fixId); + + if (!appliedFix || !appliedFix.rollback_available) { + throw new Error(`Cannot rollback fix ${fixId}`); + } + + const backupDir = appliedFix.backup_path; + const targetPath = state.target_skill.path; + + // Restore from backup + const backupFiles = Glob(`${backupDir}/**/*`); + for (const backupFile of backupFiles) { + const relativePath = backupFile.replace(backupDir + '/', ''); + const targetFile = `${targetPath}/${relativePath}`; + const content = Read(backupFile); + Write(targetFile, content); + } + + return { + stateUpdates: { + applied_fixes: state.applied_fixes.map(f => + f.fix_id === fixId + ? { ...f, rollback_available: false, verification_result: 'rolled_back' } + : f + ) + } + }; +} +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| File not found | Skip file, log warning | +| Write permission error | Retry with sudo or report | +| Backup creation failed | Abort fix, don't modify | + +## Next Actions + +- If pending_fixes.length > 0: action-apply-fix (continue) +- If all fixes applied: action-verify diff --git a/.claude/skills/skill-tuning/phases/actions/action-complete.md b/.claude/skills/skill-tuning/phases/actions/action-complete.md new file mode 100644 index 00000000..bab2c9c8 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-complete.md @@ -0,0 +1,195 @@ +# Action: Complete + +Finalize the tuning session with summary report and cleanup. 
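The headline metrics in the final summary reduce to a few array operations over `applied_fixes`. A minimal sketch with invented sample data (field names follow the state schema defined in SKILL.md; the values are illustrative only):

```javascript
// Compute the summary-table metrics for tuning-summary.md.
// Sample applied_fixes entries, shaped like the AppliedFix records above.
const appliedFixes = [
  { fix_id: 'FIX-1', success: true, verification_result: 'pass', issues_resolved: ['CTX-1'] },
  { fix_id: 'FIX-2', success: true, verification_result: 'fail', issues_resolved: [] }
];

// Fixes Applied: every fix attempt that was recorded.
const fixesApplied = appliedFixes.length;
// Fixes Verified: only those whose re-diagnosis passed.
const fixesVerified = appliedFixes.filter(f => f.verification_result === 'pass').length;
// Issues Resolved: union of issues closed across all fixes.
const issuesResolved = appliedFixes.flatMap(f => f.issues_resolved || []).length;
```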
+ +## Purpose + +- Generate final summary report +- Record tuning statistics +- Clean up temporary files (optional) +- Provide recommendations for future maintenance + +## Preconditions + +- [ ] state.status === 'running' +- [ ] quality_gate === 'pass' OR max_iterations reached + +## Execution + +```javascript +async function execute(state, workDir) { + console.log('Finalizing skill tuning session...'); + + const targetSkill = state.target_skill; + const startTime = new Date(state.started_at); + const endTime = new Date(); + const duration = Math.round((endTime - startTime) / 1000); + + // Generate final summary + const summary = `# Skill Tuning Summary + +**Target Skill**: ${targetSkill.name} +**Path**: ${targetSkill.path} +**Session Duration**: ${duration} seconds +**Completed**: ${endTime.toISOString()} + +--- + +## Final Status + +| Metric | Value | +|--------|-------| +| Final Health Score | ${state.quality_score}/100 | +| Quality Gate | ${state.quality_gate.toUpperCase()} | +| Total Iterations | ${state.iteration_count} | +| Issues Found | ${state.issues.length + state.applied_fixes.flatMap(f => f.issues_resolved || []).length} | +| Issues Resolved | ${state.applied_fixes.flatMap(f => f.issues_resolved || []).length} | +| Fixes Applied | ${state.applied_fixes.length} | +| Fixes Verified | ${state.applied_fixes.filter(f => f.verification_result === 'pass').length} | + +--- + +## Diagnosis Summary + +| Area | Issues Found | Severity | +|------|--------------|----------| +| Context Explosion | ${state.diagnosis.context?.issues_found || 'N/A'} | ${state.diagnosis.context?.severity || 'N/A'} | +| Long-tail Forgetting | ${state.diagnosis.memory?.issues_found || 'N/A'} | ${state.diagnosis.memory?.severity || 'N/A'} | +| Data Flow | ${state.diagnosis.dataflow?.issues_found || 'N/A'} | ${state.diagnosis.dataflow?.severity || 'N/A'} | +| Agent Coordination | ${state.diagnosis.agent?.issues_found || 'N/A'} | ${state.diagnosis.agent?.severity || 'N/A'} | + +--- + +## 
Applied Fixes + +${state.applied_fixes.length === 0 ? '_No fixes applied_' : + state.applied_fixes.map((fix, i) => ` +### ${i + 1}. ${fix.fix_id} + +- **Applied At**: ${fix.applied_at} +- **Success**: ${fix.success ? 'Yes' : 'No'} +- **Verification**: ${fix.verification_result} +- **Rollback Available**: ${fix.rollback_available ? 'Yes' : 'No'} +`).join('\n')} + +--- + +## Remaining Issues + +${state.issues.length === 0 ? '✅ All issues resolved!' : + `${state.issues.length} issues remain:\n\n` + + state.issues.map(issue => + `- **[${issue.severity.toUpperCase()}]** ${issue.description} (${issue.id})` + ).join('\n')} + +--- + +## Recommendations + +${generateRecommendations(state)} + +--- + +## Backup Information + +Original skill files backed up to: +\`${state.backup_dir}\` + +To restore original skill: +\`\`\`bash +cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/" +\`\`\` + +--- + +## Session Files + +| File | Description | +|------|-------------| +| ${workDir}/tuning-report.md | Full diagnostic report | +| ${workDir}/diagnosis/*.json | Individual diagnosis results | +| ${workDir}/fixes/fix-proposals.json | Proposed fixes | +| ${workDir}/fixes/applied-fixes.json | Applied fix history | +| ${workDir}/tuning-summary.md | This summary | + +--- + +*Skill tuning completed by skill-tuning* +`; + + Write(`${workDir}/tuning-summary.md`, summary); + + // Update final state + return { + stateUpdates: { + status: 'completed', + completed_at: endTime.toISOString() + }, + outputFiles: [`${workDir}/tuning-summary.md`], + summary: `Tuning complete: ${state.quality_gate} with ${state.quality_score}/100 health score` + }; +} + +function generateRecommendations(state) { + const recommendations = []; + + // Based on remaining issues + if (state.issues.some(i => i.type === 'context_explosion')) { + recommendations.push('- **Context Management**: Consider implementing a context summarization agent to prevent token growth'); + } + + if (state.issues.some(i 
=> i.type === 'memory_loss')) { + recommendations.push('- **Constraint Tracking**: Add explicit constraint injection to each phase prompt'); + } + + if (state.issues.some(i => i.type === 'dataflow_break')) { + recommendations.push('- **State Centralization**: Migrate to single state.json with schema validation'); + } + + if (state.issues.some(i => i.type === 'agent_failure')) { + recommendations.push('- **Error Handling**: Wrap all Task calls in try-catch blocks'); + } + + // General recommendations + if (state.iteration_count >= state.max_iterations) { + recommendations.push('- **Deep Refactoring**: Consider architectural review if issues persist after multiple iterations'); + } + + if (state.quality_score < 80) { + recommendations.push('- **Regular Tuning**: Schedule periodic skill-tuning runs to catch issues early'); + } + + if (recommendations.length === 0) { + recommendations.push('- Skill is in good health! Monitor for regressions during future development.'); + } + + return recommendations.join('\n'); +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + status: 'completed', + completed_at: '' + } +}; +``` + +## Output + +- **File**: `tuning-summary.md` +- **Location**: `${workDir}/tuning-summary.md` +- **Format**: Markdown + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| Summary write failed | Write to alternative location | + +## Next Actions + +- None (terminal state) diff --git a/.claude/skills/skill-tuning/phases/actions/action-diagnose-agent.md b/.claude/skills/skill-tuning/phases/actions/action-diagnose-agent.md new file mode 100644 index 00000000..8147f43f --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-diagnose-agent.md @@ -0,0 +1,317 @@ +# Action: Diagnose Agent Coordination + +Analyze target skill for agent coordination failures - call chain fragility and result passing issues. 
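The core of this diagnosis is scanning phase markdown for `Task(` calls and checking a surrounding window of text for error handling. A simplified, runnable sketch of that check (the regex and the ±100-character window approximate the detection patterns used by this action; the sample snippets are invented):

```javascript
// Flag Task({...}) calls with no try/catch or .catch() nearby.
function findUnhandledTaskCalls(content) {
  const unhandled = [];
  for (const match of content.matchAll(/Task\s*\(\s*\{[^}]*\}/g)) {
    // Look at a window of text around the call for handling constructs.
    const window = content.slice(
      Math.max(0, match.index - 100),
      match.index + match[0].length + 100
    );
    const handled = /try\s*\{|\.catch\(/.test(window);
    if (!handled) unhandled.push(match.index);
  }
  return unhandled;
}

const safe = 'try { Task({ subagent_type: "analyzer" }) } catch (e) {}';
const unsafe = 'const r = Task({ subagent_type: "analyzer" });';
```

Window-based matching keeps the check cheap, at the cost of false negatives when the `try` block spans more than the window.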
+ +## Purpose + +- Detect fragile agent call patterns +- Identify result passing issues +- Find missing error handling in agent calls +- Analyze agent return format consistency + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.target_skill.path is set +- [ ] 'agent' in state.focus_areas OR state.focus_areas is empty + +## Detection Patterns + +### Pattern 1: Unhandled Agent Failures + +```regex +# Task calls without try-catch or error handling +/Task\s*\(\s*\{[^}]*\}\s*\)(?![^;]*catch)/ +``` + +### Pattern 2: Missing Return Validation + +```regex +# Agent result used directly without validation +/const\s+\w+\s*=\s*await?\s*Task\([^)]+\);\s*(?!.*(?:if|try|JSON\.parse))/ +``` + +### Pattern 3: Inconsistent Agent Configuration + +```regex +# Different agent configurations in same skill +/subagent_type:\s*['"](\w+)['"]/g +``` + +### Pattern 4: Deeply Nested Agent Calls + +```regex +# Agent calling another agent (nested) +/Task\s*\([^)]*prompt:[^)]*Task\s*\(/ +``` + +## Execution + +```javascript +async function execute(state, workDir) { + const skillPath = state.target_skill.path; + const startTime = Date.now(); + const issues = []; + const evidence = []; + + console.log(`Diagnosing agent coordination in ${skillPath}...`); + + // 1. Find all Task/agent calls + const allFiles = Glob(`${skillPath}/**/*.md`); + const agentCalls = []; + const agentTypes = new Set(); + + for (const file of allFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + // Find Task calls + const taskMatches = content.matchAll(/Task\s*\(\s*\{([^}]+)\}/g); + for (const match of taskMatches) { + const config = match[1]; + + // Extract agent type + const typeMatch = config.match(/subagent_type:\s*['"]([^'"]+)['"]/); + const agentType = typeMatch ? 
typeMatch[1] : 'unknown'; + agentTypes.add(agentType); + + // Check for error handling context + const hasErrorHandling = /try\s*\{.*Task|\.catch\(|await\s+Task.*\.then/s.test( + content.slice(Math.max(0, match.index - 100), match.index + match[0].length + 100) + ); + + // Check for result validation + const hasResultValidation = /JSON\.parse|if\s*\(\s*result|result\s*\?\./s.test( + content.slice(match.index, match.index + match[0].length + 200) + ); + + // Check for background execution + const runsInBackground = /run_in_background:\s*true/.test(config); + + agentCalls.push({ + file: relativePath, + agentType, + hasErrorHandling, + hasResultValidation, + runsInBackground, + config: config.slice(0, 200) + }); + } + } + + // 2. Analyze agent call patterns + const totalCalls = agentCalls.length; + const callsWithoutErrorHandling = agentCalls.filter(c => !c.hasErrorHandling); + const callsWithoutValidation = agentCalls.filter(c => !c.hasResultValidation); + + // Issue: Missing error handling + if (callsWithoutErrorHandling.length > 0) { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: callsWithoutErrorHandling.length > 2 ? 
'high' : 'medium', + location: { file: 'multiple' }, + description: `${callsWithoutErrorHandling.length}/${totalCalls} agent calls lack error handling`, + evidence: callsWithoutErrorHandling.slice(0, 3).map(c => + `${c.file}: ${c.agentType}` + ), + root_cause: 'Agent failures not caught, may crash workflow', + impact: 'Unhandled agent errors cause cascading failures', + suggested_fix: 'Wrap Task calls in try-catch with graceful fallback' + }); + evidence.push({ + file: 'multiple', + pattern: 'missing_error_handling', + context: `${callsWithoutErrorHandling.length} calls affected`, + severity: 'high' + }); + } + + // Issue: Missing result validation + if (callsWithoutValidation.length > 0) { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: 'medium', + location: { file: 'multiple' }, + description: `${callsWithoutValidation.length}/${totalCalls} agent calls lack result validation`, + evidence: callsWithoutValidation.slice(0, 3).map(c => + `${c.file}: ${c.agentType} result not validated` + ), + root_cause: 'Agent results used directly without type checking', + impact: 'Invalid agent output may corrupt state', + suggested_fix: 'Add JSON.parse with try-catch and schema validation' + }); + } + + // 3. Check for inconsistent agent types usage + if (agentTypes.size > 3 && state.target_skill.execution_mode === 'autonomous') { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: 'low', + location: { file: 'multiple' }, + description: `Using ${agentTypes.size} different agent types`, + evidence: [...agentTypes].slice(0, 5), + root_cause: 'Multiple agent types increase coordination complexity', + impact: 'Different agent behaviors may cause inconsistency', + suggested_fix: 'Standardize on fewer agent types with clear roles' + }); + } + + // 4. 
Check for nested agent calls + for (const file of allFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + // Detect nested Task calls + const hasNestedTask = /Task\s*\([^)]*prompt:[^)]*Task\s*\(/s.test(content); + + if (hasNestedTask) { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: 'high', + location: { file: relativePath }, + description: 'Nested agent calls detected', + evidence: ['Agent prompt contains another Task call'], + root_cause: 'Agent calls another agent, creating deep nesting', + impact: 'Context explosion, hard to debug, unpredictable behavior', + suggested_fix: 'Flatten agent calls, use orchestrator to coordinate' + }); + } + } + + // 5. Check SKILL.md for agent configuration consistency + const skillMd = Read(`${skillPath}/SKILL.md`); + + // Check if allowed-tools includes Task + const allowedTools = skillMd.match(/allowed-tools:\s*([^\n]+)/i); + if (allowedTools && !allowedTools[1].includes('Task') && totalCalls > 0) { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: 'medium', + location: { file: 'SKILL.md' }, + description: 'Task tool used but not declared in allowed-tools', + evidence: [`${totalCalls} Task calls found, but Task not in allowed-tools`], + root_cause: 'Tool declaration mismatch', + impact: 'May cause runtime permission issues', + suggested_fix: 'Add Task to allowed-tools in SKILL.md front matter' + }); + } + + // 6. Check for agent result format consistency + const returnFormats = new Set(); + for (const file of allFiles) { + const content = Read(file); + + // Look for return format definitions + const returnMatch = content.match(/\[RETURN\][^[]*|return\s*\{[^}]+\}/gi); + if (returnMatch) { + returnMatch.forEach(r => { + const format = r.includes('JSON') ? 'json' : + r.includes('summary') ? 'summary' : + r.includes('file') ? 
'file_path' : 'other'; + returnFormats.add(format); + }); + } + } + + if (returnFormats.size > 2) { + issues.push({ + id: `AGT-${issues.length + 1}`, + type: 'agent_failure', + severity: 'medium', + location: { file: 'multiple' }, + description: 'Inconsistent agent return formats', + evidence: [...returnFormats], + root_cause: 'Different agents return data in different formats', + impact: 'Orchestrator must handle multiple format types', + suggested_fix: 'Standardize return format: {status, output_file, summary}' + }); + } + + // 7. Calculate severity + const criticalCount = issues.filter(i => i.severity === 'critical').length; + const highCount = issues.filter(i => i.severity === 'high').length; + const severity = criticalCount > 0 ? 'critical' : + highCount > 1 ? 'high' : + highCount > 0 ? 'medium' : + issues.length > 0 ? 'low' : 'none'; + + // 8. Write diagnosis result + const diagnosisResult = { + status: 'completed', + issues_found: issues.length, + severity: severity, + execution_time_ms: Date.now() - startTime, + details: { + patterns_checked: [ + 'error_handling', + 'result_validation', + 'agent_type_consistency', + 'nested_calls', + 'return_format_consistency' + ], + patterns_matched: evidence.map(e => e.pattern), + evidence: evidence, + agent_analysis: { + total_agent_calls: totalCalls, + unique_agent_types: agentTypes.size, + calls_without_error_handling: callsWithoutErrorHandling.length, + calls_without_validation: callsWithoutValidation.length, + agent_types_used: [...agentTypes] + }, + recommendations: [ + callsWithoutErrorHandling.length > 0 + ? 'Add try-catch to all Task calls' : null, + callsWithoutValidation.length > 0 + ? 'Add result validation with JSON.parse and schema check' : null, + agentTypes.size > 3 + ? 
'Consolidate agent types for consistency' : null + ].filter(Boolean) + } + }; + + Write(`${workDir}/diagnosis/agent-diagnosis.json`, + JSON.stringify(diagnosisResult, null, 2)); + + return { + stateUpdates: { + 'diagnosis.agent': diagnosisResult, + issues: [...state.issues, ...issues] + }, + outputFiles: [`${workDir}/diagnosis/agent-diagnosis.json`], + summary: `Agent diagnosis: ${issues.length} issues found (severity: ${severity})` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + 'diagnosis.agent': { + status: 'completed', + issues_found: <number>, + severity: '<severity>', + // ... full diagnosis result + }, + issues: [...existingIssues, ...newIssues] + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| Regex match error | Use simpler patterns | +| File access error | Skip and continue | + +## Next Actions + +- Success: action-generate-report +- Skipped: If 'agent' not in focus_areas diff --git a/.claude/skills/skill-tuning/phases/actions/action-diagnose-context.md b/.claude/skills/skill-tuning/phases/actions/action-diagnose-context.md new file mode 100644 index 00000000..19790d90 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-diagnose-context.md @@ -0,0 +1,243 @@ +# Action: Diagnose Context Explosion + +Analyze target skill for context explosion issues - token accumulation and multi-turn dialogue bloat.
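Before the formal patterns, a minimal sketch of this failure mode and of the sliding-window fix this action typically suggests. All names here are illustrative, and a real summarization step would call an agent rather than insert a placeholder string:

```javascript
// The unbounded pattern this action flags: every turn is pushed, so prompt
// size grows linearly with iteration count.
function recordTurn(history, turn) {
  history.push(turn);
  return history;
}

// One commonly suggested fix (illustrative): keep only the last N turns and
// collapse everything older into a single summary line.
function slidingWindow(history, maxTurns = 5) {
  if (history.length <= maxTurns) return history;
  const dropped = history.length - maxTurns;
  return [`[${dropped} earlier turns summarized and elided]`, ...history.slice(-maxTurns)];
}

const history = [];
for (let i = 1; i <= 8; i++) recordTurn(history, `turn ${i}`);
console.log(slidingWindow(history).length); // 6: one summary line + last 5 turns
```

The window keeps token usage bounded regardless of iteration count, which is the property the history-accumulation check below verifies is present.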
+ +## Purpose + +- Detect patterns that cause context growth +- Identify multi-turn accumulation points +- Find missing context compression mechanisms +- Measure potential token waste + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.target_skill.path is set +- [ ] 'context' in state.focus_areas OR state.focus_areas is empty + +## Detection Patterns + +### Pattern 1: Unbounded History Accumulation + +```regex +# Patterns that suggest history accumulation +/\bhistory\b.*\.push\b/ +/\bmessages\b.*\.concat\b/ +/\bconversation\b.*\+=\b/ +/\bappend.*context\b/i +``` + +### Pattern 2: Full Content Passing + +```regex +# Patterns that pass full content instead of references +/Read\([^)]+\).*\+.*Read\(/ +/JSON\.stringify\(.*state\)/ # Full state serialization +/\$\{.*content\}/ # Template literal with full content +``` + +### Pattern 3: Missing Summarization + +```regex +# Absence of compression/summarization +# Check for lack of: summarize, compress, truncate, slice +``` + +### Pattern 4: Agent Return Bloat + +```regex +# Agent returning full content instead of path + summary +/return\s*\{[^}]*content:/ +/return.*JSON\.stringify/ +``` + +## Execution + +```javascript +async function execute(state, workDir) { + const skillPath = state.target_skill.path; + const startTime = Date.now(); + const issues = []; + const evidence = []; + + console.log(`Diagnosing context explosion in ${skillPath}...`); + + // 1. 
Scan all phase files + const phaseFiles = Glob(`${skillPath}/phases/**/*.md`); + + for (const file of phaseFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + // Check Pattern 1: History accumulation + const historyPatterns = [ + /history\s*[.=].*(?:push|concat|append)/gi, + /messages\s*=\s*\[.*\.\.\..*messages/gi, + /conversation.*\+=/gi + ]; + + for (const pattern of historyPatterns) { + const matches = content.match(pattern); + if (matches) { + issues.push({ + id: `CTX-${issues.length + 1}`, + type: 'context_explosion', + severity: 'high', + location: { file: relativePath }, + description: 'Unbounded history accumulation detected', + evidence: matches.slice(0, 3), + root_cause: 'History/messages array grows without bounds', + impact: 'Token count increases linearly with iterations', + suggested_fix: 'Implement sliding window or summarization' + }); + evidence.push({ + file: relativePath, + pattern: 'history_accumulation', + context: matches[0], + severity: 'high' + }); + } + } + + // Check Pattern 2: Full content passing + const contentPatterns = [ + /Read\s*\([^)]+\)\s*[\+,]/g, + /JSON\.stringify\s*\(\s*state\s*\)/g, + /\$\{[^}]*content[^}]*\}/g + ]; + + for (const pattern of contentPatterns) { + const matches = content.match(pattern); + if (matches) { + issues.push({ + id: `CTX-${issues.length + 1}`, + type: 'context_explosion', + severity: 'medium', + location: { file: relativePath }, + description: 'Full content passed instead of reference', + evidence: matches.slice(0, 3), + root_cause: 'Entire file/state content included in prompts', + impact: 'Unnecessary token consumption', + suggested_fix: 'Pass file paths and summaries instead of full content' + }); + evidence.push({ + file: relativePath, + pattern: 'full_content_passing', + context: matches[0], + severity: 'medium' + }); + } + } + + // Check Pattern 3: Missing summarization + const hasSummarization =
/summariz|compress|truncat|slice.*context/i.test(content); + const hasLongPrompts = content.length > 5000; + + if (hasLongPrompts && !hasSummarization) { + issues.push({ + id: `CTX-${issues.length + 1}`, + type: 'context_explosion', + severity: 'medium', + location: { file: relativePath }, + description: 'Long phase file without summarization mechanism', + evidence: [`File length: ${content.length} chars`], + root_cause: 'No context compression for large content', + impact: 'Potential token overflow in long sessions', + suggested_fix: 'Add context summarization before passing to agents' + }); + } + + // Check Pattern 4: Agent return bloat + const returnPatterns = /return\s*\{[^}]*(?:content|full_output|complete_result):/g; + const returnMatches = content.match(returnPatterns); + if (returnMatches) { + issues.push({ + id: `CTX-${issues.length + 1}`, + type: 'context_explosion', + severity: 'high', + location: { file: relativePath }, + description: 'Agent returns full content instead of path+summary', + evidence: returnMatches.slice(0, 3), + root_cause: 'Agent output includes complete content', + impact: 'Context bloat when orchestrator receives full output', + suggested_fix: 'Return {output_file, summary} instead of {content}' + }); + } + } + + // 2. Calculate severity + const criticalCount = issues.filter(i => i.severity === 'critical').length; + const highCount = issues.filter(i => i.severity === 'high').length; + const severity = criticalCount > 0 ? 'critical' : + highCount > 2 ? 'high' : + highCount > 0 ? 'medium' : + issues.length > 0 ? 'low' : 'none'; + + // 3. 
Write diagnosis result + const diagnosisResult = { + status: 'completed', + issues_found: issues.length, + severity: severity, + execution_time_ms: Date.now() - startTime, + details: { + patterns_checked: [ + 'history_accumulation', + 'full_content_passing', + 'missing_summarization', + 'agent_return_bloat' + ], + patterns_matched: evidence.map(e => e.pattern), + evidence: evidence, + recommendations: [ + issues.length > 0 ? 'Implement context summarization agent' : null, + highCount > 0 ? 'Add sliding window for conversation history' : null, + evidence.some(e => e.pattern === 'full_content_passing') + ? 'Refactor to pass file paths instead of content' : null + ].filter(Boolean) + } + }; + + Write(`${workDir}/diagnosis/context-diagnosis.json`, + JSON.stringify(diagnosisResult, null, 2)); + + return { + stateUpdates: { + 'diagnosis.context': diagnosisResult, + issues: [...state.issues, ...issues], + 'issues_by_severity.critical': state.issues_by_severity.critical + criticalCount, + 'issues_by_severity.high': state.issues_by_severity.high + highCount + }, + outputFiles: [`${workDir}/diagnosis/context-diagnosis.json`], + summary: `Context diagnosis: ${issues.length} issues found (severity: ${severity})` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + 'diagnosis.context': { + status: 'completed', + issues_found: <number>, + severity: '<severity>', + // ...
full diagnosis result + }, + issues: [...existingIssues, ...newIssues] + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| File read error | Skip file, log warning | +| Pattern matching error | Use fallback patterns | +| Write error | Retry to alternative path | + +## Next Actions + +- Success: action-diagnose-memory (or next in focus_areas) +- Skipped: If 'context' not in focus_areas diff --git a/.claude/skills/skill-tuning/phases/actions/action-diagnose-dataflow.md b/.claude/skills/skill-tuning/phases/actions/action-diagnose-dataflow.md new file mode 100644 index 00000000..94a091dd --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-diagnose-dataflow.md @@ -0,0 +1,318 @@ +# Action: Diagnose Data Flow Issues + +Analyze target skill for data flow disruption - state inconsistencies and format variations. + +## Purpose + +- Detect inconsistent data formats between phases +- Identify scattered state storage +- Find missing data contracts +- Measure state transition integrity + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.target_skill.path is set +- [ ] 'dataflow' in state.focus_areas OR state.focus_areas is empty + +## Detection Patterns + +### Pattern 1: Multiple Storage Locations + +```regex +# Data written to multiple paths without centralization +/Write\s*\(\s*[`'"][^`'"]+[`'"]/g +``` + +### Pattern 2: Inconsistent Field Names + +```regex +# Same concept with different names: title/name, id/identifier +``` + +### Pattern 3: Missing Schema Validation + +```regex +# Absence of validation before state write +# Look for lack of: validate, schema, check, verify +``` + +### Pattern 4: Format Transformation Without Normalization + +```regex +# Direct JSON.parse without error handling or normalization +/JSON\.parse\([^)]+\)(?!\s*\|\|)/ +``` + +## Execution + +```javascript +async function execute(state, workDir) { + const skillPath = state.target_skill.path; + const startTime = Date.now(); + 
const issues = []; + const evidence = []; + + console.log(`Diagnosing data flow in ${skillPath}...`); + + // 1. Collect all Write operations to map data storage + const allFiles = Glob(`${skillPath}/**/*.md`); + const writeLocations = []; + const readLocations = []; + + for (const file of allFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + // Find Write operations + const writeMatches = content.matchAll(/Write\s*\(\s*[`'"]([^`'"]+)[`'"]/g); + for (const match of writeMatches) { + writeLocations.push({ + file: relativePath, + target: match[1], + isStateFile: match[1].includes('state.json') || match[1].includes('config.json') + }); + } + + // Find Read operations + const readMatches = content.matchAll(/Read\s*\(\s*[`'"]([^`'"]+)[`'"]/g); + for (const match of readMatches) { + readLocations.push({ + file: relativePath, + source: match[1] + }); + } + } + + // 2. Check for scattered state storage + const stateTargets = writeLocations + .filter(w => w.isStateFile) + .map(w => w.target); + + const uniqueStateFiles = [...new Set(stateTargets)]; + + if (uniqueStateFiles.length > 2) { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'high', + location: { file: 'multiple' }, + description: `State stored in ${uniqueStateFiles.length} different locations`, + evidence: uniqueStateFiles.slice(0, 5), + root_cause: 'No centralized state management', + impact: 'State inconsistency between phases', + suggested_fix: 'Centralize state to single state.json with state manager' + }); + evidence.push({ + file: 'multiple', + pattern: 'scattered_state', + context: uniqueStateFiles.join(', '), + severity: 'high' + }); + } + + // 3. 
Check for inconsistent field naming + const fieldNamePatterns = { + 'name_vs_title': [/\.name\b/, /\.title\b/], + 'id_vs_identifier': [/\.id\b/, /\.identifier\b/], + 'status_vs_state': [/\.status\b/, /\.state\b/], + 'error_vs_errors': [/\.error\b/, /\.errors\b/] + }; + + const fieldUsage = {}; + + for (const file of allFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + for (const [patternName, patterns] of Object.entries(fieldNamePatterns)) { + for (const pattern of patterns) { + if (pattern.test(content)) { + if (!fieldUsage[patternName]) fieldUsage[patternName] = []; + fieldUsage[patternName].push({ + file: relativePath, + pattern: pattern.toString() + }); + } + } + } + } + + for (const [patternName, usages] of Object.entries(fieldUsage)) { + const uniquePatterns = [...new Set(usages.map(u => u.pattern))]; + if (uniquePatterns.length > 1) { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'medium', + location: { file: 'multiple' }, + description: `Inconsistent field naming: ${patternName.replace('_vs_', ' vs ')}`, + evidence: usages.slice(0, 3).map(u => `${u.file}: ${u.pattern}`), + root_cause: 'Same concept referred to with different field names', + impact: 'Data may be lost during field access', + suggested_fix: `Standardize to single field name, add normalization function` + }); + } + } + + // 4. 
Check for missing schema validation + for (const file of allFiles) { + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + + // Find JSON.parse without validation + const unsafeParses = content.match(/JSON\.parse\s*\([^)]+\)(?!\s*\?\?|\s*\|\|)/g); + const hasValidation = /validat|schema|type.*check/i.test(content); + + if (unsafeParses && unsafeParses.length > 0 && !hasValidation) { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'medium', + location: { file: relativePath }, + description: 'JSON parsing without validation', + evidence: unsafeParses.slice(0, 2), + root_cause: 'No schema validation after parsing', + impact: 'Invalid data may propagate through phases', + suggested_fix: 'Add schema validation after JSON.parse' + }); + } + } + + // 5. Check state schema if exists + const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0]; + if (stateSchemaFile) { + const schemaContent = Read(stateSchemaFile); + + // Check for type definitions + const hasTypeScript = /interface\s+\w+|type\s+\w+\s*=/i.test(schemaContent); + const hasValidationFunction = /function\s+validate|validateState/i.test(schemaContent); + + if (hasTypeScript && !hasValidationFunction) { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'low', + location: { file: 'phases/state-schema.md' }, + description: 'Type definitions without runtime validation', + evidence: ['TypeScript interfaces defined but no validation function'], + root_cause: 'Types are compile-time only, not enforced at runtime', + impact: 'Schema violations may occur at runtime', + suggested_fix: 'Add validateState() function using Zod or manual checks' + }); + } + } else if (state.target_skill.execution_mode === 'autonomous') { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'high', + location: { file: 'phases/' }, + description: 'Autonomous skill missing state-schema.md', 
+ evidence: ['No state schema definition found'], + root_cause: 'State structure undefined for orchestrator', + impact: 'Inconsistent state handling across actions', + suggested_fix: 'Create phases/state-schema.md with explicit type definitions' + }); + } + + // 6. Check read-write alignment + const writtenFiles = new Set(writeLocations.map(w => w.target)); + const readFiles = new Set(readLocations.map(r => r.source)); + + const writtenButNotRead = [...writtenFiles].filter(f => + !readFiles.has(f) && !f.includes('output') && !f.includes('report') + ); + + if (writtenButNotRead.length > 0) { + issues.push({ + id: `DF-${issues.length + 1}`, + type: 'dataflow_break', + severity: 'low', + location: { file: 'multiple' }, + description: 'Files written but never read', + evidence: writtenButNotRead.slice(0, 3), + root_cause: 'Orphaned output files', + impact: 'Wasted storage and potential confusion', + suggested_fix: 'Remove unused writes or add reads where needed' + }); + } + + // 7. Calculate severity + const criticalCount = issues.filter(i => i.severity === 'critical').length; + const highCount = issues.filter(i => i.severity === 'high').length; + const severity = criticalCount > 0 ? 'critical' : + highCount > 1 ? 'high' : + highCount > 0 ? 'medium' : + issues.length > 0 ? 'low' : 'none'; + + // 8. Write diagnosis result + const diagnosisResult = { + status: 'completed', + issues_found: issues.length, + severity: severity, + execution_time_ms: Date.now() - startTime, + details: { + patterns_checked: [ + 'scattered_state', + 'inconsistent_naming', + 'missing_validation', + 'read_write_alignment' + ], + patterns_matched: evidence.map(e => e.pattern), + evidence: evidence, + data_flow_map: { + write_locations: writeLocations.length, + read_locations: readLocations.length, + unique_state_files: uniqueStateFiles.length + }, + recommendations: [ + uniqueStateFiles.length > 2 ? 
'Implement centralized state manager' : null, + issues.some(i => i.description.includes('naming')) + ? 'Create normalization layer for field names' : null, + issues.some(i => i.description.includes('validation')) + ? 'Add Zod or JSON Schema validation' : null + ].filter(Boolean) + } + }; + + Write(`${workDir}/diagnosis/dataflow-diagnosis.json`, + JSON.stringify(diagnosisResult, null, 2)); + + return { + stateUpdates: { + 'diagnosis.dataflow': diagnosisResult, + issues: [...state.issues, ...issues] + }, + outputFiles: [`${workDir}/diagnosis/dataflow-diagnosis.json`], + summary: `Data flow diagnosis: ${issues.length} issues found (severity: ${severity})` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + 'diagnosis.dataflow': { + status: 'completed', + issues_found: <number>, + severity: '<severity>', + // ... full diagnosis result + }, + issues: [...existingIssues, ...newIssues] + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| Glob pattern error | Use fallback patterns | +| File read error | Skip and continue | + +## Next Actions + +- Success: action-diagnose-agent (or next in focus_areas) +- Skipped: If 'dataflow' not in focus_areas diff --git a/.claude/skills/skill-tuning/phases/actions/action-diagnose-memory.md b/.claude/skills/skill-tuning/phases/actions/action-diagnose-memory.md new file mode 100644 index 00000000..231c5dea --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-diagnose-memory.md @@ -0,0 +1,269 @@ +# Action: Diagnose Long-tail Forgetting + +Analyze target skill for long-tail effect and constraint forgetting issues.
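A hedged sketch of the constraint-injection fix this action recommends: re-inject the original requirements into every phase prompt so late phases cannot drift away from them. The `original_requirements` field name is an assumption used for illustration; the actual state schema may differ:

```javascript
// Prepend the original constraints to a phase prompt so that every phase,
// however late in the chain, sees them verbatim.
function injectConstraints(phasePrompt, state) {
  const constraints = state.original_requirements || [];
  if (constraints.length === 0) return phasePrompt;
  const block = ['[CONSTRAINTS]', ...constraints.map(c => `- ${c}`)].join('\n');
  return `${block}\n\n${phasePrompt}`;
}

const state = {
  original_requirements: ['Do not modify the public API', 'Keep output under 500 lines']
};
const prompt = injectConstraints('[TASK]\nRefactor the parser module.', state);
console.log(prompt.startsWith('[CONSTRAINTS]')); // true
```

Because the injection reads from state rather than from conversation history, it survives arbitrarily long execution chains, which is the property the constraint-propagation checks below look for.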
+ +## Purpose + +- Detect loss of early instructions in long execution chains +- Identify missing constraint propagation mechanisms +- Find weak goal alignment between phases +- Measure instruction retention across phases + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.target_skill.path is set +- [ ] 'memory' in state.focus_areas OR state.focus_areas is empty + +## Detection Patterns + +### Pattern 1: Missing Constraint References + +```regex +# Phases that don't reference original requirements +# Look for absence of: requirements, constraints, original, initial, user_request +``` + +### Pattern 2: Goal Drift + +```regex +# Later phases focus on immediate task without global context +/\[TASK\][^[]*(?!\[CONSTRAINTS\]|\[REQUIREMENTS\])/ +``` + +### Pattern 3: No Checkpoint Mechanism + +```regex +# Absence of state preservation at key points +# Look for lack of: checkpoint, snapshot, preserve, restore +``` + +### Pattern 4: Implicit State Passing + +```regex +# State passed implicitly through conversation rather than explicitly +/(?<!state\.)(previous_result|earlier_output|as discussed above)/i +``` + +## Execution + +```javascript +async function execute(state, workDir) { + const skillPath = state.target_skill.path; + const startTime = Date.now(); + const issues = []; + const evidence = []; + + console.log(`Diagnosing long-tail forgetting in ${skillPath}...`); + + // 1. Collect phase files in execution order + const phaseFiles = Glob(`${skillPath}/phases/**/*.md`) + .filter(f => !f.includes('orchestrator') && !f.includes('state-schema')) + .sort(); + + // Extract phase order (for sequential) or action dependencies (for autonomous) + const isAutonomous = state.target_skill.execution_mode === 'autonomous'; + + // 2.
Check each phase for constraint awareness + let firstPhaseConstraints = []; + + for (let i = 0; i < phaseFiles.length; i++) { + const file = phaseFiles[i]; + const content = Read(file); + const relativePath = file.replace(skillPath + '/', ''); + const phaseNum = i + 1; + + // Extract constraints from first phase + if (i === 0) { + const constraintMatch = content.match(/\[CONSTRAINTS?\]([^[]*)/i); + if (constraintMatch) { + firstPhaseConstraints = constraintMatch[1] + .split('\n') + .filter(l => l.trim().startsWith('-')) + .map(l => l.trim().replace(/^-\s*/, '')); + } + } + + // Check if later phases reference original constraints + if (i > 0 && firstPhaseConstraints.length > 0) { + const mentionsConstraints = firstPhaseConstraints.some(c => + content.toLowerCase().includes(c.toLowerCase().slice(0, 20)) + ); + + if (!mentionsConstraints) { + issues.push({ + id: `MEM-${issues.length + 1}`, + type: 'memory_loss', + severity: 'high', + location: { file: relativePath, phase: `Phase ${phaseNum}` }, + description: `Phase ${phaseNum} does not reference original constraints`, + evidence: [`Original constraints: ${firstPhaseConstraints.slice(0, 3).join(', ')}`], + root_cause: 'Constraint information not propagated to later phases', + impact: 'May produce output violating original requirements', + suggested_fix: 'Add explicit constraint injection or reference to state.original_constraints' + }); + evidence.push({ + file: relativePath, + pattern: 'missing_constraint_reference', + context: `Phase ${phaseNum} of ${phaseFiles.length}`, + severity: 'high' + }); + } + } + + // Check for goal drift - task without constraints + const hasTask = /\[TASK\]/i.test(content); + const hasConstraints = /\[CONSTRAINTS?\]|\[REQUIREMENTS?\]|\[RULES?\]/i.test(content); + + if (hasTask && !hasConstraints && i > 1) { + issues.push({ + id: `MEM-${issues.length + 1}`, + type: 'memory_loss', + severity: 'medium', + location: { file: relativePath }, + description: 'Phase has TASK but no 
CONSTRAINTS/RULES section', + evidence: ['Task defined without boundary constraints'], + root_cause: 'Agent may not adhere to global constraints', + impact: 'Potential goal drift from original intent', + suggested_fix: 'Add [CONSTRAINTS] section referencing global rules' + }); + } + + // Check for checkpoint mechanism + const hasCheckpoint = /checkpoint|snapshot|preserve|savepoint/i.test(content); + const isKeyPhase = i === Math.floor(phaseFiles.length / 2) || i === phaseFiles.length - 1; + + if (isKeyPhase && !hasCheckpoint && phaseFiles.length > 3) { + issues.push({ + id: `MEM-${issues.length + 1}`, + type: 'memory_loss', + severity: 'low', + location: { file: relativePath }, + description: 'Key phase without checkpoint mechanism', + evidence: [`Phase ${phaseNum} is a key milestone but has no state preservation`], + root_cause: 'Cannot recover from failures or verify constraint adherence', + impact: 'No rollback capability if constraints violated', + suggested_fix: 'Add checkpoint before major state changes' + }); + } + } + + // 3. Check for explicit state schema with constraints field + const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0]; + if (stateSchemaFile) { + const schemaContent = Read(stateSchemaFile); + const hasConstraintsField = /constraints|requirements|original_request/i.test(schemaContent); + + if (!hasConstraintsField) { + issues.push({ + id: `MEM-${issues.length + 1}`, + type: 'memory_loss', + severity: 'medium', + location: { file: 'phases/state-schema.md' }, + description: 'State schema lacks constraints/requirements field', + evidence: ['No dedicated field for preserving original requirements'], + root_cause: 'State structure does not support constraint persistence', + impact: 'Constraints may be lost during state transitions', + suggested_fix: 'Add original_requirements field to state schema' + }); + } + } + + // 4. 
Check SKILL.md for constraint enforcement in execution flow + const skillMd = Read(`${skillPath}/SKILL.md`); + const hasConstraintVerification = /constraint.*verif|verif.*constraint|quality.*gate/i.test(skillMd); + + if (!hasConstraintVerification && phaseFiles.length > 3) { + issues.push({ + id: `MEM-${issues.length + 1}`, + type: 'memory_loss', + severity: 'medium', + location: { file: 'SKILL.md' }, + description: 'No constraint verification step in execution flow', + evidence: ['Execution flow lacks quality gate or constraint check'], + root_cause: 'No mechanism to verify output matches original intent', + impact: 'Constraint violations may go undetected', + suggested_fix: 'Add verification phase comparing output to original requirements' + }); + } + + // 5. Calculate severity + const criticalCount = issues.filter(i => i.severity === 'critical').length; + const highCount = issues.filter(i => i.severity === 'high').length; + const severity = criticalCount > 0 ? 'critical' : + highCount > 2 ? 'high' : + highCount > 0 ? 'medium' : + issues.length > 0 ? 'low' : 'none'; + + // 6. Write diagnosis result + const diagnosisResult = { + status: 'completed', + issues_found: issues.length, + severity: severity, + execution_time_ms: Date.now() - startTime, + details: { + patterns_checked: [ + 'constraint_propagation', + 'goal_drift', + 'checkpoint_mechanism', + 'state_schema_constraints' + ], + patterns_matched: evidence.map(e => e.pattern), + evidence: evidence, + phase_analysis: { + total_phases: phaseFiles.length, + first_phase_constraints: firstPhaseConstraints.length, + phases_with_constraint_ref: phaseFiles.length - issues.filter(i => + i.description.includes('does not reference')).length + }, + recommendations: [ + highCount > 0 ? 'Implement constraint injection at each phase' : null, + issues.some(i => i.description.includes('checkpoint')) + ? 'Add checkpoint/restore mechanism' : null, + issues.some(i => i.description.includes('State schema')) + ? 
'Add original_requirements to state schema' : null
+      ].filter(Boolean)
+    }
+  };
+
+  Write(`${workDir}/diagnosis/memory-diagnosis.json`,
+    JSON.stringify(diagnosisResult, null, 2));
+
+  return {
+    stateUpdates: {
+      'diagnosis.memory': diagnosisResult,
+      issues: [...state.issues, ...issues]
+    },
+    outputFiles: [`${workDir}/diagnosis/memory-diagnosis.json`],
+    summary: `Memory diagnosis: ${issues.length} issues found (severity: ${severity})`
+  };
+}
+```
+
+## State Updates
+
+```javascript
+return {
+  stateUpdates: {
+    'diagnosis.memory': {
+      status: 'completed',
+      issues_found: <number>,
+      severity: '<severity>',
+      // ... full diagnosis result
+    },
+    issues: [...existingIssues, ...newIssues]
+  }
+};
+```
+
+## Error Handling
+
+| Error Type | Recovery |
+|------------|----------|
+| Phase file read error | Skip file, continue analysis |
+| No phases found | Report as structure issue |
+
+## Next Actions
+
+- Success: action-diagnose-dataflow (or next in focus_areas)
+- Skipped: If 'memory' not in focus_areas
diff --git a/.claude/skills/skill-tuning/phases/actions/action-gemini-analysis.md b/.claude/skills/skill-tuning/phases/actions/action-gemini-analysis.md
new file mode 100644
index 00000000..c4610ac8
--- /dev/null
+++ b/.claude/skills/skill-tuning/phases/actions/action-gemini-analysis.md
@@ -0,0 +1,322 @@
+# Action: Gemini Analysis
+
+Dynamically invokes the Gemini CLI for deep analysis, selecting the analysis type from the user's request or from diagnosis results.
+
+## Role
+
+- Receive the user-specified analysis request, or infer it from diagnosis results
+- Build the appropriate CLI command
+- Execute the analysis and parse the result
+- Update state for use by subsequent actions
+
+## Preconditions
+
+- `state.status === 'running'`
+- Any one of the following holds:
+  - `state.gemini_analysis_requested === true` (user requested)
+  - `state.issues.some(i => i.severity === 'critical')` (critical issue found)
+  - `state.analysis_type !== null` (analysis type already specified)
+
+## Analysis Types
+
+### 1.
root_cause - Root Cause Analysis
+
+Performs deep analysis of the issue described by the user.
+
+```javascript
+const analysisPrompt = `
+PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
+TASK:
+• Analyze skill structure at: ${state.target_skill.path}
+• Identify anti-patterns in phase files
+• Trace data flow through state management
+• Check agent coordination patterns
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: JSON with structure:
+{
+  "root_causes": [
+    { "id": "RC-001", "description": "...", "severity": "high", "evidence": ["file:line"] }
+  ],
+  "patterns_found": [
+    { "pattern": "...", "type": "anti-pattern|best-practice", "locations": [] }
+  ],
+  "recommendations": [
+    { "priority": 1, "action": "...", "rationale": "..." }
+  ]
+}
+RULES: Focus on execution flow, state management, agent coordination
+`;
+```
+
+### 2. architecture - Architecture Review
+
+Evaluates the skill's overall architectural design.
+
+```javascript
+const analysisPrompt = `
+PURPOSE: Review skill architecture for: ${state.target_skill.name}
+TASK:
+• Evaluate phase decomposition and responsibility separation
+• Check state schema design and data flow
+• Assess agent coordination and error handling
+• Review scalability and maintainability
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: Markdown report with sections:
+- Executive Summary
+- Phase Architecture Assessment
+- State Management Evaluation
+- Agent Coordination Analysis
+- Improvement Recommendations (prioritized)
+RULES: Focus on modularity, extensibility, maintainability
+`;
+```
+
+### 3.
prompt_optimization - Prompt Optimization
+
+Analyzes and optimizes the prompts used in each phase.
+
+```javascript
+const analysisPrompt = `
+PURPOSE: Optimize prompts in skill phases for better output quality
+TASK:
+• Analyze existing prompts for clarity and specificity
+• Identify ambiguous instructions
+• Check output format specifications
+• Evaluate constraint communication
+MODE: analysis
+CONTEXT: @phases/**/*.md
+EXPECTED: JSON with structure:
+{
+  "prompt_issues": [
+    { "file": "...", "issue": "...", "severity": "...", "suggestion": "..." }
+  ],
+  "optimized_prompts": [
+    { "file": "...", "original": "...", "optimized": "...", "rationale": "..." }
+  ]
+}
+RULES: Preserve intent, improve clarity, add structured output requirements
+`;
+```
+
+### 4. performance - Performance Analysis
+
+Analyzes token consumption and execution efficiency.
+
+```javascript
+const analysisPrompt = `
+PURPOSE: Analyze performance bottlenecks in skill execution
+TASK:
+• Estimate token consumption per phase
+• Identify redundant data passing
+• Check for unnecessary full-content transfers
+• Evaluate caching opportunities
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: JSON with structure:
+{
+  "token_estimates": [
+    { "phase": "...", "estimated_tokens": 1000, "breakdown": {} }
+  ],
+  "bottlenecks": [
+    { "type": "...", "location": "...", "impact": "high|medium|low", "fix": "..." }
+  ],
+  "optimization_suggestions": []
+}
+RULES: Focus on token efficiency, reduce redundancy
+`;
+```
+
+### 5. custom - Custom Analysis
+
+A custom analysis specified by the user.
+
+```javascript
+const analysisPrompt = `
+PURPOSE: ${state.custom_analysis_purpose}
+TASK: ${state.custom_analysis_tasks}
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: ${state.custom_analysis_expected}
+RULES: ${state.custom_analysis_rules || 'Follow best practices'}
+`;
+```
+
+## Execution
+
+```javascript
+async function executeGeminiAnalysis(state, workDir) {
+  // 1. Determine the analysis type
+  const analysisType = state.analysis_type || determineAnalysisType(state);
+
+  // 2. Build the prompt
+  const prompt = buildAnalysisPrompt(analysisType, state);
+
+  // 3.
Build the CLI command
+  const cliCommand = `ccw cli -p "${escapeForShell(prompt)}" --tool gemini --mode analysis --cd "${state.target_skill.path}"`;
+
+  console.log(`Executing Gemini analysis: ${analysisType}`);
+  console.log(`Command: ${cliCommand}`);
+
+  // 4. Execute the CLI (runs in the background)
+  const result = Bash({
+    command: cliCommand,
+    run_in_background: true,
+    timeout: 300000 // 5 minutes
+  });
+
+  // 5. Await the result
+  // Note: per the CLAUDE.md guidance, stop polling once the CLI runs in the background
+  // The result is written to state after the CLI completes
+
+  return {
+    stateUpdates: {
+      gemini_analysis: {
+        type: analysisType,
+        status: 'running',
+        started_at: new Date().toISOString(),
+        task_id: result.task_id
+      }
+    },
+    outputFiles: [],
+    summary: `Gemini ${analysisType} analysis started in background`
+  };
+}
+
+function determineAnalysisType(state) {
+  // Infer the analysis type from state
+  if (state.user_issue_description && state.user_issue_description.length > 100) {
+    return 'root_cause';
+  }
+  if (state.issues.some(i => i.severity === 'critical')) {
+    return 'root_cause';
+  }
+  if (state.focus_areas.includes('architecture')) {
+    return 'architecture';
+  }
+  if (state.focus_areas.includes('prompt')) {
+    return 'prompt_optimization';
+  }
+  if (state.focus_areas.includes('performance')) {
+    return 'performance';
+  }
+  return 'root_cause'; // default
+}
+
+function buildAnalysisPrompt(type, state) {
+  const templates = {
+    root_cause: () => `
+PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
+TASK: • Analyze skill structure • Identify anti-patterns • Trace data flow issues • Check agent coordination
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: JSON { root_causes: [], patterns_found: [], recommendations: [] }
+RULES: Focus on execution flow, be specific about file:line locations
+`,
+    architecture: () => `
+PURPOSE: Review skill architecture for ${state.target_skill.name}
+TASK: • Evaluate phase decomposition • Check state design • Assess agent coordination • Review extensibility
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: Markdown architecture
assessment report
+RULES: Focus on modularity and maintainability
+`,
+    prompt_optimization: () => `
+PURPOSE: Optimize prompts in skill for better output quality
+TASK: • Analyze prompt clarity • Check output specifications • Evaluate constraint handling
+MODE: analysis
+CONTEXT: @phases/**/*.md
+EXPECTED: JSON { prompt_issues: [], optimized_prompts: [] }
+RULES: Preserve intent, improve clarity
+`,
+    performance: () => `
+PURPOSE: Analyze performance bottlenecks in skill
+TASK: • Estimate token consumption • Identify redundancy • Check data transfer efficiency
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: JSON { token_estimates: [], bottlenecks: [], optimization_suggestions: [] }
+RULES: Focus on token efficiency
+`,
+    custom: () => `
+PURPOSE: ${state.custom_analysis_purpose}
+TASK: ${state.custom_analysis_tasks}
+MODE: analysis
+CONTEXT: @**/*.md
+EXPECTED: ${state.custom_analysis_expected}
+RULES: ${state.custom_analysis_rules || 'Best practices'}
+`
+  };
+
+  return templates[type]();
+}
+
+function escapeForShell(str) {
+  // Escape shell special characters
+  return str.replace(/"/g, '\\"').replace(/\$/g, '\\$').replace(/`/g, '\\`');
+}
+```
+
+## Output
+
+### State Updates
+
+```javascript
+{
+  gemini_analysis: {
+    type: 'root_cause' | 'architecture' | 'prompt_optimization' | 'performance' | 'custom',
+    status: 'running' | 'completed' | 'failed',
+    started_at: '2024-01-01T00:00:00Z',
+    completed_at: '2024-01-01T00:05:00Z',
+    task_id: 'xxx',
+    result: { /* analysis result */ },
+    error: null
+  },
+  // Analysis results are merged into issues
+  issues: [
+    ...state.issues,
+    ...newIssuesFromAnalysis
+  ]
+}
+```
+
+### Output Files
+
+- `${workDir}/diagnosis/gemini-analysis-${type}.json` - Raw analysis result
+- `${workDir}/diagnosis/gemini-analysis-${type}.md` - Formatted report
+
+## Post-Execution
+
+After the analysis completes:
+1. Parse the CLI output into structured data
+2. Extract newly found issues and merge them into state.issues
+3. Update recommendations in state
+4.
Trigger the next action (typically action-generate-report or action-propose-fixes)
+
+## Error Handling
+
+| Error | Recovery |
+|-------|----------|
+| CLI timeout | Retry once; if it still fails, skip the Gemini analysis |
+| Parse failure | Save the raw output for manual handling |
+| No result | Mark as skipped and continue the flow |
+
+## User Interaction
+
+If `state.analysis_type === null` and the type cannot be inferred automatically, ask the user:
+
+```javascript
+AskUserQuestion({
+  questions: [{
+    question: 'Select a Gemini analysis type',
+    header: 'Analysis Type',
+    options: [
+      { label: 'Root cause analysis', description: 'Deep analysis of the user-described issue' },
+      { label: 'Architecture review', description: 'Evaluate the overall architectural design' },
+      { label: 'Prompt optimization', description: 'Analyze and optimize phase prompts' },
+      { label: 'Performance analysis', description: 'Analyze token consumption and execution efficiency' }
+    ],
+    multiSelect: false
+  }]
+});
+```
diff --git a/.claude/skills/skill-tuning/phases/actions/action-generate-report.md b/.claude/skills/skill-tuning/phases/actions/action-generate-report.md
new file mode 100644
index 00000000..70046c97
--- /dev/null
+++ b/.claude/skills/skill-tuning/phases/actions/action-generate-report.md
@@ -0,0 +1,228 @@
+# Action: Generate Consolidated Report
+
+Generate a comprehensive tuning report merging all diagnosis results with prioritized recommendations.
+
+## Purpose
+
+- Merge all diagnosis results into a unified report
+- Prioritize issues by severity and impact
+- Generate actionable recommendations
+- Create human-readable markdown report
+
+## Preconditions
+
+- [ ] state.status === 'running'
+- [ ] All diagnoses in focus_areas are completed
+- [ ] state.issues.length > 0 OR generate summary report
+
+## Execution
+
+```javascript
+async function execute(state, workDir) {
+  console.log('Generating consolidated tuning report...');
+
+  const targetSkill = state.target_skill;
+  const issues = state.issues;
+
+  // 1. Group issues by type
+  const issuesByType = {
+    context_explosion: issues.filter(i => i.type === 'context_explosion'),
+    memory_loss: issues.filter(i => i.type === 'memory_loss'),
+    dataflow_break: issues.filter(i => i.type === 'dataflow_break'),
+    agent_failure: issues.filter(i => i.type === 'agent_failure')
+  };
+
+  // 2.
Group issues by severity + const issuesBySeverity = { + critical: issues.filter(i => i.severity === 'critical'), + high: issues.filter(i => i.severity === 'high'), + medium: issues.filter(i => i.severity === 'medium'), + low: issues.filter(i => i.severity === 'low') + }; + + // 3. Calculate overall health score + const weights = { critical: 25, high: 15, medium: 5, low: 1 }; + const deductions = Object.entries(issuesBySeverity) + .reduce((sum, [sev, arr]) => sum + arr.length * weights[sev], 0); + const healthScore = Math.max(0, 100 - deductions); + + // 4. Generate report content + const report = `# Skill Tuning Report + +**Target Skill**: ${targetSkill.name} +**Path**: ${targetSkill.path} +**Execution Mode**: ${targetSkill.execution_mode} +**Generated**: ${new Date().toISOString()} + +--- + +## Executive Summary + +| Metric | Value | +|--------|-------| +| Health Score | ${healthScore}/100 | +| Total Issues | ${issues.length} | +| Critical | ${issuesBySeverity.critical.length} | +| High | ${issuesBySeverity.high.length} | +| Medium | ${issuesBySeverity.medium.length} | +| Low | ${issuesBySeverity.low.length} | + +### User Reported Issue +> ${state.user_issue_description} + +### Overall Assessment +${healthScore >= 80 ? '✅ Skill is in good health with minor issues.' : + healthScore >= 60 ? '⚠️ Skill has significant issues requiring attention.' : + healthScore >= 40 ? '🔶 Skill has serious issues affecting reliability.' : + '❌ Skill has critical issues requiring immediate fixes.'} + +--- + +## Diagnosis Results + +### Context Explosion Analysis +${state.diagnosis.context ? + `- **Status**: ${state.diagnosis.context.status} +- **Severity**: ${state.diagnosis.context.severity} +- **Issues Found**: ${state.diagnosis.context.issues_found} +- **Key Findings**: ${state.diagnosis.context.details.recommendations.join('; ') || 'None'}` : + '_Not analyzed_'} + +### Long-tail Memory Analysis +${state.diagnosis.memory ? 
+ `- **Status**: ${state.diagnosis.memory.status} +- **Severity**: ${state.diagnosis.memory.severity} +- **Issues Found**: ${state.diagnosis.memory.issues_found} +- **Key Findings**: ${state.diagnosis.memory.details.recommendations.join('; ') || 'None'}` : + '_Not analyzed_'} + +### Data Flow Analysis +${state.diagnosis.dataflow ? + `- **Status**: ${state.diagnosis.dataflow.status} +- **Severity**: ${state.diagnosis.dataflow.severity} +- **Issues Found**: ${state.diagnosis.dataflow.issues_found} +- **Key Findings**: ${state.diagnosis.dataflow.details.recommendations.join('; ') || 'None'}` : + '_Not analyzed_'} + +### Agent Coordination Analysis +${state.diagnosis.agent ? + `- **Status**: ${state.diagnosis.agent.status} +- **Severity**: ${state.diagnosis.agent.severity} +- **Issues Found**: ${state.diagnosis.agent.issues_found} +- **Key Findings**: ${state.diagnosis.agent.details.recommendations.join('; ') || 'None'}` : + '_Not analyzed_'} + +--- + +## Critical & High Priority Issues + +${issuesBySeverity.critical.length + issuesBySeverity.high.length === 0 ? + '_No critical or high priority issues found._' : + [...issuesBySeverity.critical, ...issuesBySeverity.high].map((issue, i) => ` +### ${i + 1}. [${issue.severity.toUpperCase()}] ${issue.description} + +- **ID**: ${issue.id} +- **Type**: ${issue.type} +- **Location**: ${typeof issue.location === 'object' ? issue.location.file : issue.location} +- **Root Cause**: ${issue.root_cause} +- **Impact**: ${issue.impact} +- **Suggested Fix**: ${issue.suggested_fix} + +**Evidence**: +${issue.evidence.map(e => `- \`${e}\``).join('\n')} +`).join('\n')} + +--- + +## Medium & Low Priority Issues + +${issuesBySeverity.medium.length + issuesBySeverity.low.length === 0 ? + '_No medium or low priority issues found._' : + [...issuesBySeverity.medium, ...issuesBySeverity.low].map((issue, i) => ` +### ${i + 1}. 
[${issue.severity.toUpperCase()}] ${issue.description} + +- **ID**: ${issue.id} +- **Type**: ${issue.type} +- **Suggested Fix**: ${issue.suggested_fix} +`).join('\n')} + +--- + +## Recommended Fix Order + +Based on severity and dependencies, apply fixes in this order: + +${[...issuesBySeverity.critical, ...issuesBySeverity.high, ...issuesBySeverity.medium] + .slice(0, 10) + .map((issue, i) => `${i + 1}. **${issue.id}**: ${issue.suggested_fix}`) + .join('\n')} + +--- + +## Quality Gates + +| Gate | Threshold | Current | Status | +|------|-----------|---------|--------| +| Critical Issues | 0 | ${issuesBySeverity.critical.length} | ${issuesBySeverity.critical.length === 0 ? '✅ PASS' : '❌ FAIL'} | +| High Issues | ≤ 2 | ${issuesBySeverity.high.length} | ${issuesBySeverity.high.length <= 2 ? '✅ PASS' : '❌ FAIL'} | +| Health Score | ≥ 60 | ${healthScore} | ${healthScore >= 60 ? '✅ PASS' : '❌ FAIL'} | + +**Overall Quality Gate**: ${ + issuesBySeverity.critical.length === 0 && + issuesBySeverity.high.length <= 2 && + healthScore >= 60 ? '✅ PASS' : '❌ FAIL'} + +--- + +*Report generated by skill-tuning* +`; + + // 5. Write report + Write(`${workDir}/tuning-report.md`, report); + + // 6. Calculate quality gate + const qualityGate = issuesBySeverity.critical.length === 0 && + issuesBySeverity.high.length <= 2 && + healthScore >= 60 ? 'pass' : + healthScore >= 40 ? 
'review' : 'fail'; + + return { + stateUpdates: { + quality_score: healthScore, + quality_gate: qualityGate, + issues_by_severity: { + critical: issuesBySeverity.critical.length, + high: issuesBySeverity.high.length, + medium: issuesBySeverity.medium.length, + low: issuesBySeverity.low.length + } + }, + outputFiles: [`${workDir}/tuning-report.md`], + summary: `Report generated: ${issues.length} issues, health score ${healthScore}/100, gate: ${qualityGate}` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + quality_score: <0-100>, + quality_gate: '', + issues_by_severity: { critical: N, high: N, medium: N, low: N } + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| Write error | Retry to alternative path | +| Empty issues | Generate summary with no issues | + +## Next Actions + +- If issues.length > 0: action-propose-fixes +- If issues.length === 0: action-complete diff --git a/.claude/skills/skill-tuning/phases/actions/action-init.md b/.claude/skills/skill-tuning/phases/actions/action-init.md new file mode 100644 index 00000000..ef4843c5 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-init.md @@ -0,0 +1,149 @@ +# Action: Initialize Tuning Session + +Initialize the skill-tuning session by collecting target skill information, creating work directories, and setting up initial state. + +## Purpose + +- Identify target skill to tune +- Collect user's problem description +- Create work directory structure +- Backup original skill files +- Initialize state for orchestrator + +## Preconditions + +- [ ] state.status === 'pending' + +## Execution + +```javascript +async function execute(state, workDir) { + // 1. 
Ask user for target skill + const skillInput = await AskUserQuestion({ + questions: [{ + question: "Which skill do you want to tune?", + header: "Target Skill", + multiSelect: false, + options: [ + { label: "Specify path", description: "Enter skill directory path" } + ] + }] + }); + + const skillPath = skillInput["Target Skill"]; + + // 2. Validate skill exists and read structure + const skillMdPath = `${skillPath}/SKILL.md`; + if (!Glob(`${skillPath}/SKILL.md`).length) { + throw new Error(`Invalid skill path: ${skillPath} - SKILL.md not found`); + } + + // 3. Read skill metadata + const skillMd = Read(skillMdPath); + const frontMatterMatch = skillMd.match(/^---\n([\s\S]*?)\n---/); + const skillName = frontMatterMatch + ? frontMatterMatch[1].match(/name:\s*(.+)/)?.[1]?.trim() + : skillPath.split('/').pop(); + + // 4. Detect execution mode + const hasOrchestrator = Glob(`${skillPath}/phases/orchestrator.md`).length > 0; + const executionMode = hasOrchestrator ? 'autonomous' : 'sequential'; + + // 5. Scan skill structure + const phases = Glob(`${skillPath}/phases/**/*.md`).map(f => f.replace(skillPath + '/', '')); + const specs = Glob(`${skillPath}/specs/**/*.md`).map(f => f.replace(skillPath + '/', '')); + + // 6. Ask for problem description + const issueInput = await AskUserQuestion({ + questions: [{ + question: "Describe the issue or what you want to optimize:", + header: "Issue", + multiSelect: false, + options: [ + { label: "Context grows too large", description: "Token explosion over multiple turns" }, + { label: "Instructions forgotten", description: "Early constraints lost in long execution" }, + { label: "Data inconsistency", description: "State format changes between phases" }, + { label: "Agent failures", description: "Sub-agent calls fail or return unexpected results" } + ] + }] + }); + + // 7. Ask for focus areas + const focusInput = await AskUserQuestion({ + questions: [{ + question: "Which areas should be diagnosed? 
(Select all that apply)", + header: "Focus", + multiSelect: true, + options: [ + { label: "context", description: "Context explosion analysis" }, + { label: "memory", description: "Long-tail forgetting analysis" }, + { label: "dataflow", description: "Data flow analysis" }, + { label: "agent", description: "Agent coordination analysis" } + ] + }] + }); + + const focusAreas = focusInput["Focus"] || ['context', 'memory', 'dataflow', 'agent']; + + // 8. Create backup + const backupDir = `${workDir}/backups/${skillName}-backup`; + Bash(`mkdir -p "${backupDir}"`); + Bash(`cp -r "${skillPath}"/* "${backupDir}/"`); + + // 9. Return state updates + return { + stateUpdates: { + status: 'running', + started_at: new Date().toISOString(), + target_skill: { + name: skillName, + path: skillPath, + execution_mode: executionMode, + phases: phases, + specs: specs + }, + user_issue_description: issueInput["Issue"], + focus_areas: Array.isArray(focusAreas) ? focusAreas : [focusAreas], + work_dir: workDir, + backup_dir: backupDir + }, + outputFiles: [], + summary: `Initialized tuning for "${skillName}" (${executionMode} mode), focus: ${focusAreas.join(', ')}` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + status: 'running', + started_at: '', + target_skill: { + name: '', + path: '', + execution_mode: '', + phases: ['...'], + specs: ['...'] + }, + user_issue_description: '', + focus_areas: ['context', 'memory', ...], + work_dir: '', + backup_dir: '' + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| Skill path not found | Ask user to re-enter valid path | +| SKILL.md missing | Suggest path correction | +| Backup creation failed | Retry with alternative location | + +## Next Actions + +- Success: Continue to first diagnosis action based on focus_areas +- Failure: action-abort diff --git a/.claude/skills/skill-tuning/phases/actions/action-propose-fixes.md 
b/.claude/skills/skill-tuning/phases/actions/action-propose-fixes.md new file mode 100644 index 00000000..12463690 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-propose-fixes.md @@ -0,0 +1,317 @@ +# Action: Propose Fixes + +Generate fix proposals for identified issues with implementation strategies. + +## Purpose + +- Create fix strategies for each issue +- Generate implementation plans +- Estimate risk levels +- Allow user to select fixes to apply + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.issues.length > 0 +- [ ] action-generate-report completed + +## Fix Strategy Catalog + +### Context Explosion Fixes + +| Strategy | Description | Risk | +|----------|-------------|------| +| `context_summarization` | Add summarizer agent between phases | low | +| `sliding_window` | Keep only last N turns in context | low | +| `structured_state` | Replace text context with JSON state | medium | +| `path_reference` | Pass file paths instead of content | low | + +### Memory Loss Fixes + +| Strategy | Description | Risk | +|----------|-------------|------| +| `constraint_injection` | Add constraints to each phase prompt | low | +| `checkpoint_restore` | Save state at milestones | low | +| `goal_embedding` | Track goal similarity throughout | medium | +| `state_constraints_field` | Add constraints field to state schema | low | + +### Data Flow Fixes + +| Strategy | Description | Risk | +|----------|-------------|------| +| `state_centralization` | Single state.json for all data | medium | +| `schema_enforcement` | Add Zod validation | low | +| `field_normalization` | Normalize field names | low | +| `transactional_updates` | Atomic state updates | medium | + +### Agent Coordination Fixes + +| Strategy | Description | Risk | +|----------|-------------|------| +| `error_wrapping` | Add try-catch to all Task calls | low | +| `result_validation` | Validate agent returns | low | +| `orchestrator_refactor` | Centralize agent coordination | 
high | +| `flatten_nesting` | Remove nested agent calls | medium | + +## Execution + +```javascript +async function execute(state, workDir) { + console.log('Generating fix proposals...'); + + const issues = state.issues; + const fixes = []; + + // Group issues by type for batch fixes + const issuesByType = { + context_explosion: issues.filter(i => i.type === 'context_explosion'), + memory_loss: issues.filter(i => i.type === 'memory_loss'), + dataflow_break: issues.filter(i => i.type === 'dataflow_break'), + agent_failure: issues.filter(i => i.type === 'agent_failure') + }; + + // Generate fixes for context explosion + if (issuesByType.context_explosion.length > 0) { + const ctxIssues = issuesByType.context_explosion; + + if (ctxIssues.some(i => i.description.includes('history accumulation'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: ctxIssues.filter(i => i.description.includes('history')).map(i => i.id), + strategy: 'sliding_window', + description: 'Implement sliding window for conversation history', + rationale: 'Prevents unbounded context growth by keeping only recent turns', + changes: [{ + file: 'phases/orchestrator.md', + action: 'modify', + diff: `+ const MAX_HISTORY = 5; ++ state.history = state.history.slice(-MAX_HISTORY);` + }], + risk: 'low', + estimated_impact: 'Reduces token usage by ~50%', + verification_steps: ['Run skill with 10+ iterations', 'Verify context size stable'] + }); + } + + if (ctxIssues.some(i => i.description.includes('full content'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: ctxIssues.filter(i => i.description.includes('content')).map(i => i.id), + strategy: 'path_reference', + description: 'Pass file paths instead of full content', + rationale: 'Agents can read files when needed, reducing prompt size', + changes: [{ + file: 'phases/*.md', + action: 'modify', + diff: `- prompt: \${content} ++ prompt: Read file at: \${filePath}` + }], + risk: 'low', + estimated_impact: 'Significant token 
reduction', + verification_steps: ['Verify agents can still access needed content'] + }); + } + } + + // Generate fixes for memory loss + if (issuesByType.memory_loss.length > 0) { + const memIssues = issuesByType.memory_loss; + + if (memIssues.some(i => i.description.includes('constraint'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: memIssues.filter(i => i.description.includes('constraint')).map(i => i.id), + strategy: 'constraint_injection', + description: 'Add constraint injection to all phases', + rationale: 'Ensures original requirements are visible in every phase', + changes: [{ + file: 'phases/*.md', + action: 'modify', + diff: `+ [CONSTRAINTS] ++ Original requirements from state.original_requirements: ++ \${JSON.stringify(state.original_requirements)}` + }], + risk: 'low', + estimated_impact: 'Improves constraint adherence', + verification_steps: ['Run skill with specific constraints', 'Verify output matches'] + }); + } + + if (memIssues.some(i => i.description.includes('State schema'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: memIssues.filter(i => i.description.includes('schema')).map(i => i.id), + strategy: 'state_constraints_field', + description: 'Add original_requirements field to state schema', + rationale: 'Preserves original intent throughout execution', + changes: [{ + file: 'phases/state-schema.md', + action: 'modify', + diff: `+ original_requirements: string[]; // User's original constraints ++ goal_summary: string; // One-line goal statement` + }], + risk: 'low', + estimated_impact: 'Enables constraint tracking', + verification_steps: ['Verify state includes requirements after init'] + }); + } + } + + // Generate fixes for data flow + if (issuesByType.dataflow_break.length > 0) { + const dfIssues = issuesByType.dataflow_break; + + if (dfIssues.some(i => i.description.includes('multiple locations'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: dfIssues.filter(i => 
i.description.includes('location')).map(i => i.id), + strategy: 'state_centralization', + description: 'Centralize all state to single state.json', + rationale: 'Single source of truth prevents inconsistencies', + changes: [{ + file: 'phases/*.md', + action: 'modify', + diff: `- Write(\`\${workDir}/config.json\`, ...) ++ updateState({ config: ... }) // Use state manager` + }], + risk: 'medium', + estimated_impact: 'Eliminates state fragmentation', + verification_steps: ['Verify all reads come from state.json', 'Test state persistence'] + }); + } + + if (dfIssues.some(i => i.description.includes('validation'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: dfIssues.filter(i => i.description.includes('validation')).map(i => i.id), + strategy: 'schema_enforcement', + description: 'Add Zod schema validation', + rationale: 'Runtime validation catches schema violations', + changes: [{ + file: 'phases/state-schema.md', + action: 'modify', + diff: `+ import { z } from 'zod'; ++ const StateSchema = z.object({...}); ++ function validateState(s) { return StateSchema.parse(s); }` + }], + risk: 'low', + estimated_impact: 'Catches invalid state early', + verification_steps: ['Test with invalid state input', 'Verify error thrown'] + }); + } + } + + // Generate fixes for agent coordination + if (issuesByType.agent_failure.length > 0) { + const agentIssues = issuesByType.agent_failure; + + if (agentIssues.some(i => i.description.includes('error handling'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: agentIssues.filter(i => i.description.includes('error')).map(i => i.id), + strategy: 'error_wrapping', + description: 'Wrap all Task calls in try-catch', + rationale: 'Prevents cascading failures from agent errors', + changes: [{ + file: 'phases/*.md', + action: 'modify', + diff: `+ try { + const result = await Task({...}); ++ if (!result) throw new Error('Empty result'); ++ } catch (e) { ++ updateState({ errors: [...errors, e.message], error_count: 
error_count + 1 }); ++ }` + }], + risk: 'low', + estimated_impact: 'Improves error resilience', + verification_steps: ['Simulate agent failure', 'Verify graceful handling'] + }); + } + + if (agentIssues.some(i => i.description.includes('nested'))) { + fixes.push({ + id: `FIX-${fixes.length + 1}`, + issue_ids: agentIssues.filter(i => i.description.includes('nested')).map(i => i.id), + strategy: 'flatten_nesting', + description: 'Flatten nested agent calls', + rationale: 'Reduces complexity and context explosion', + changes: [{ + file: 'phases/orchestrator.md', + action: 'modify', + diff: `// Instead of agent calling agent: +// Agent A returns {needs_agent_b: true} +// Orchestrator sees this and calls Agent B next` + }], + risk: 'medium', + estimated_impact: 'Reduces nesting depth', + verification_steps: ['Verify no nested Task calls', 'Test agent chaining via orchestrator'] + }); + } + } + + // Write fix proposals + Write(`${workDir}/fixes/fix-proposals.json`, JSON.stringify(fixes, null, 2)); + + // Ask user to select fixes to apply + const fixOptions = fixes.slice(0, 4).map(f => ({ + label: f.id, + description: `[${f.risk.toUpperCase()} risk] ${f.description}` + })); + + if (fixOptions.length > 0) { + const selection = await AskUserQuestion({ + questions: [{ + question: 'Which fixes would you like to apply?', + header: 'Fixes', + multiSelect: true, + options: fixOptions + }] + }); + + const selectedFixIds = Array.isArray(selection['Fixes']) + ? 
selection['Fixes'] + : [selection['Fixes']]; + + return { + stateUpdates: { + proposed_fixes: fixes, + pending_fixes: selectedFixIds.filter(id => id && fixes.some(f => f.id === id)) + }, + outputFiles: [`${workDir}/fixes/fix-proposals.json`], + summary: `Generated ${fixes.length} fix proposals, ${selectedFixIds.length} selected for application` + }; + } + + return { + stateUpdates: { + proposed_fixes: fixes, + pending_fixes: [] + }, + outputFiles: [`${workDir}/fixes/fix-proposals.json`], + summary: `Generated ${fixes.length} fix proposals (none selected)` + }; +} +``` + +## State Updates + +```javascript +return { + stateUpdates: { + proposed_fixes: [...fixes], + pending_fixes: [...selectedFixIds] + } +}; +``` + +## Error Handling + +| Error Type | Recovery | +|------------|----------| +| No issues to fix | Skip to action-complete | +| User cancels selection | Set pending_fixes to empty | + +## Next Actions + +- If pending_fixes.length > 0: action-apply-fix +- If pending_fixes.length === 0: action-complete diff --git a/.claude/skills/skill-tuning/phases/actions/action-verify.md b/.claude/skills/skill-tuning/phases/actions/action-verify.md new file mode 100644 index 00000000..f13b9423 --- /dev/null +++ b/.claude/skills/skill-tuning/phases/actions/action-verify.md @@ -0,0 +1,222 @@ +# Action: Verify Applied Fixes + +Verify that applied fixes resolved the targeted issues. 
+ +## Purpose + +- Re-run relevant diagnostics +- Compare before/after issue counts +- Update verification status +- Determine if more iterations needed + +## Preconditions + +- [ ] state.status === 'running' +- [ ] state.applied_fixes.length > 0 +- [ ] Some applied_fixes have verification_result === 'pending' + +## Execution + +```javascript +async function execute(state, workDir) { + console.log('Verifying applied fixes...'); + + const appliedFixes = state.applied_fixes.filter(f => f.verification_result === 'pending'); + + if (appliedFixes.length === 0) { + return { + stateUpdates: {}, + outputFiles: [], + summary: 'No fixes pending verification' + }; + } + + const verificationResults = []; + + for (const fix of appliedFixes) { + const proposedFix = state.proposed_fixes.find(f => f.id === fix.fix_id); + + if (!proposedFix) { + verificationResults.push({ + fix_id: fix.fix_id, + result: 'fail', + reason: 'Fix definition not found' + }); + continue; + } + + // Determine which diagnosis to re-run based on fix strategy + const strategyToDiagnosis = { + 'context_summarization': 'context', + 'sliding_window': 'context', + 'structured_state': 'context', + 'path_reference': 'context', + 'constraint_injection': 'memory', + 'checkpoint_restore': 'memory', + 'goal_embedding': 'memory', + 'state_constraints_field': 'memory', + 'state_centralization': 'dataflow', + 'schema_enforcement': 'dataflow', + 'field_normalization': 'dataflow', + 'transactional_updates': 'dataflow', + 'error_wrapping': 'agent', + 'result_validation': 'agent', + 'orchestrator_refactor': 'agent', + 'flatten_nesting': 'agent' + }; + + const diagnosisType = strategyToDiagnosis[proposedFix.strategy]; + + // For now, do a lightweight verification + // Full implementation would re-run the specific diagnosis + + // Check if the fix was actually applied (look for markers) + const targetPath = state.target_skill.path; + const fixMarker = `Applied fix ${fix.fix_id}`; + + let fixFound = false; + const allFiles = 
Glob(`${targetPath}/**/*.md`); + + for (const file of allFiles) { + const content = Read(file); + if (content.includes(fixMarker)) { + fixFound = true; + break; + } + } + + if (fixFound) { + // Verify by checking if original issues still exist + const relatedIssues = proposedFix.issue_ids; + const originalIssueCount = relatedIssues.length; + + // Simplified verification: assume fix worked if marker present + // Real implementation would re-run diagnosis patterns + + verificationResults.push({ + fix_id: fix.fix_id, + result: 'pass', + reason: `Fix applied successfully, addressing ${originalIssueCount} issues`, + issues_resolved: relatedIssues + }); + } else { + verificationResults.push({ + fix_id: fix.fix_id, + result: 'fail', + reason: 'Fix marker not found in target files' + }); + } + } + + // Update applied fixes with verification results + const updatedAppliedFixes = state.applied_fixes.map(fix => { + const result = verificationResults.find(v => v.fix_id === fix.fix_id); + if (result) { + return { + ...fix, + verification_result: result.result + }; + } + return fix; + }); + + // Calculate new quality score + const passedFixes = verificationResults.filter(v => v.result === 'pass').length; + const totalFixes = verificationResults.length; + const verificationRate = totalFixes > 0 ? 
(passedFixes / totalFixes) * 100 : 100;
+
+  // Recalculate issues (remove resolved ones)
+  const resolvedIssueIds = verificationResults
+    .filter(v => v.result === 'pass')
+    .flatMap(v => v.issues_resolved || []);
+
+  const remainingIssues = state.issues.filter(i => !resolvedIssueIds.includes(i.id));
+
+  // Recalculate quality score
+  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
+  const deductions = remainingIssues.reduce((sum, issue) =>
+    sum + (weights[issue.severity] || 0), 0);
+  const newHealthScore = Math.max(0, 100 - deductions);
+
+  // Determine new quality gate (thresholds per specs/quality-gates.md:
+  // PASS >= 80 with no critical and <= 2 high; FAIL < 60 or any critical or > 5 high)
+  const remainingCritical = remainingIssues.filter(i => i.severity === 'critical').length;
+  const remainingHigh = remainingIssues.filter(i => i.severity === 'high').length;
+  const newQualityGate = newHealthScore >= 80 && remainingCritical === 0 && remainingHigh <= 2
+    ? 'pass'
+    : (newHealthScore < 60 || remainingCritical > 0 || remainingHigh > 5) ? 'fail' : 'review';
+
+  // Increment iteration count
+  const newIterationCount = state.iteration_count + 1;
+
+  // Ask user if they want to continue
+  let continueIteration = false;
+  if (newQualityGate !== 'pass' && newIterationCount < state.max_iterations) {
+    const continueResponse = await AskUserQuestion({
+      questions: [{
+        question: `Verification complete. Quality gate: ${newQualityGate}. Continue with another iteration?`,
+        header: 'Continue',
+        multiSelect: false,
+        options: [
+          { label: 'Yes', description: `Run iteration ${newIterationCount + 1}` },
+          { label: 'No', description: 'Finish with current state' }
+        ]
+      }]
+    });
+    continueIteration = continueResponse['Continue'] === 'Yes';
+  }
+
+  // If continuing, reset diagnosis for re-evaluation
+  const diagnosisReset = continueIteration ? 
{
+    // Replace the whole diagnosis object: the state merge is a shallow spread,
+    // so dotted keys like 'diagnosis.context' would not reset the nested fields
+    diagnosis: { context: null, memory: null, dataflow: null, agent: null }
+  } : {};
+
+  return {
+    stateUpdates: {
+      applied_fixes: updatedAppliedFixes,
+      issues: remainingIssues,
+      quality_score: newHealthScore,
+      quality_gate: newQualityGate,
+      iteration_count: newIterationCount,
+      ...diagnosisReset,
+      issues_by_severity: {
+        critical: remainingIssues.filter(i => i.severity === 'critical').length,
+        high: remainingIssues.filter(i => i.severity === 'high').length,
+        medium: remainingIssues.filter(i => i.severity === 'medium').length,
+        low: remainingIssues.filter(i => i.severity === 'low').length
+      }
+    },
+    outputFiles: [],
+    summary: `Verified ${totalFixes} fixes: ${passedFixes} passed. Score: ${newHealthScore}, Gate: ${newQualityGate}, Iteration: ${newIterationCount}`
+  };
+}
+```
+
+## State Updates
+
+```javascript
+return {
+  stateUpdates: {
+    applied_fixes: [...updatedWithVerificationResults],
+    issues: [...remainingIssues],
+    quality_score: newScore,
+    quality_gate: newGate,
+    iteration_count: iteration + 1
+  }
+};
+```
+
+## Error Handling
+
+| Error Type | Recovery |
+|------------|----------|
+| Re-diagnosis fails | Mark as 'inconclusive' |
+| File access error | Skip file verification |
+
+## Next Actions
+
+- If quality_gate === 'pass': action-complete
+- If user chose to continue: restart diagnosis cycle
+- If max_iterations reached: action-complete
diff --git a/.claude/skills/skill-tuning/phases/orchestrator.md b/.claude/skills/skill-tuning/phases/orchestrator.md
new file mode 100644
index 00000000..d723cc10
--- /dev/null
+++ b/.claude/skills/skill-tuning/phases/orchestrator.md
@@ -0,0 +1,335 @@
+# Orchestrator
+
+Autonomous orchestrator for skill-tuning workflow. Reads current state and selects the next action based on diagnosis progress and quality gates.
+
+## Role
+
+Drive the tuning workflow by:
+1. Reading current session state
+2. Selecting the appropriate next action
+3. 
Executing the action via sub-agent +4. Updating state with results +5. Repeating until termination conditions met + +## State Management + +### Read State + +```javascript +const state = JSON.parse(Read(`${workDir}/state.json`)); +``` + +### Update State + +```javascript +function updateState(updates) { + const state = JSON.parse(Read(`${workDir}/state.json`)); + const newState = { + ...state, + ...updates, + updated_at: new Date().toISOString() + }; + Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2)); + return newState; +} +``` + +## Decision Logic + +```javascript +function selectNextAction(state) { + // === Termination Checks === + + // User exit + if (state.status === 'user_exit') return null; + + // Completed + if (state.status === 'completed') return null; + + // Error limit exceeded + if (state.error_count >= state.max_errors) { + return 'action-abort'; + } + + // Max iterations exceeded + if (state.iteration_count >= state.max_iterations) { + return 'action-complete'; + } + + // === Action Selection === + + // 1. Not initialized yet + if (state.status === 'pending') { + return 'action-init'; + } + + // 2. Check if Gemini analysis is requested or needed + if (shouldTriggerGeminiAnalysis(state)) { + return 'action-gemini-analysis'; + } + + // 3. Check if Gemini analysis is running + if (state.gemini_analysis?.status === 'running') { + // Wait for Gemini analysis to complete + return null; // Orchestrator will be re-triggered when CLI completes + } + + // 4. Run diagnosis in order (only if not completed) + const diagnosisOrder = ['context', 'memory', 'dataflow', 'agent']; + + for (const diagType of diagnosisOrder) { + if (state.diagnosis[diagType] === null) { + // Check if user wants to skip this diagnosis + if (!state.focus_areas.length || state.focus_areas.includes(diagType)) { + return `action-diagnose-${diagType}`; + } + } + } + + // 5. 
All diagnosis complete, generate report if not done + const allDiagnosisComplete = diagnosisOrder.every( + d => state.diagnosis[d] !== null || !state.focus_areas.includes(d) + ); + + if (allDiagnosisComplete && !state.completed_actions.includes('action-generate-report')) { + return 'action-generate-report'; + } + + // 6. Report generated, propose fixes if not done + if (state.completed_actions.includes('action-generate-report') && + state.proposed_fixes.length === 0 && + state.issues.length > 0) { + return 'action-propose-fixes'; + } + + // 7. Fixes proposed, check if user wants to apply + if (state.proposed_fixes.length > 0 && state.pending_fixes.length > 0) { + return 'action-apply-fix'; + } + + // 8. Fixes applied, verify + if (state.applied_fixes.length > 0 && + state.applied_fixes.some(f => f.verification_result === 'pending')) { + return 'action-verify'; + } + + // 9. Quality gate check + if (state.quality_gate === 'pass') { + return 'action-complete'; + } + + // 10. More iterations needed + if (state.iteration_count < state.max_iterations && + state.quality_gate !== 'pass' && + state.issues.some(i => i.severity === 'critical' || i.severity === 'high')) { + // Reset diagnosis for re-evaluation + return 'action-diagnose-context'; // Start new iteration + } + + // 11. 
Default: complete
+  return 'action-complete';
+}
+
+/**
+ * Determine whether Gemini CLI analysis should be triggered
+ */
+function shouldTriggerGeminiAnalysis(state) {
+  // Gemini analysis already completed; do not trigger again
+  if (state.gemini_analysis?.status === 'completed') {
+    return false;
+  }
+
+  // Explicit user request
+  if (state.gemini_analysis_requested === true) {
+    return true;
+  }
+
+  // Critical issues found and no deep analysis performed yet
+  if (state.issues.some(i => i.severity === 'critical') &&
+      !state.completed_actions.includes('action-gemini-analysis')) {
+    return true;
+  }
+
+  // User specified focus_areas that require Gemini analysis
+  const geminiAreas = ['architecture', 'prompt', 'performance', 'custom'];
+  if (state.focus_areas.some(area => geminiAreas.includes(area))) {
+    return true;
+  }
+
+  // Standard diagnosis complete but issues remain unresolved; deep analysis needed
+  const diagnosisComplete = ['context', 'memory', 'dataflow', 'agent'].every(
+    d => state.diagnosis[d] !== null
+  );
+  if (diagnosisComplete &&
+      state.issues.length > 0 &&
+      state.iteration_count > 0 &&
+      !state.completed_actions.includes('action-gemini-analysis')) {
+    // If issues persist into a second iteration, trigger Gemini analysis
+    return true;
+  }
+
+  return false;
+}
+```
+
+## Execution Loop
+
+```javascript
+async function runOrchestrator(workDir) {
+  console.log('=== Skill Tuning Orchestrator Started ===');
+
+  let iteration = 0;
+  const MAX_LOOP_ITERATIONS = 50; // Safety limit
+
+  while (iteration < MAX_LOOP_ITERATIONS) {
+    iteration++;
+
+    // 1. Read current state
+    const state = JSON.parse(Read(`${workDir}/state.json`));
+    console.log(`[Loop ${iteration}] Status: ${state.status}, Action: ${state.current_action}`);
+
+    // 2. Select next action
+    const actionId = selectNextAction(state);
+
+    if (!actionId) {
+      console.log('No action selected, terminating orchestrator.');
+      break;
+    }
+
+    console.log(`[Loop ${iteration}] Executing: ${actionId}`);
+
+    // 3. 
Update state: current action + updateState({ + current_action: actionId, + action_history: [...state.action_history, { + action: actionId, + started_at: new Date().toISOString(), + completed_at: null, + result: null, + output_files: [] + }] + }); + + // 4. Execute action + try { + const actionPrompt = Read(`phases/actions/${actionId}.md`); + const stateJson = JSON.stringify(state, null, 2); + + const result = await Task({ + subagent_type: 'universal-executor', + run_in_background: false, + prompt: ` +[CONTEXT] +You are executing action "${actionId}" for skill-tuning workflow. +Work directory: ${workDir} + +[STATE] +${stateJson} + +[ACTION INSTRUCTIONS] +${actionPrompt} + +[OUTPUT REQUIREMENT] +After completing the action: +1. Write any output files to the work directory +2. Return a JSON object with: + - stateUpdates: object with state fields to update + - outputFiles: array of files created + - summary: brief description of what was done +` + }); + + // 5. Parse result and update state + let actionResult; + try { + actionResult = JSON.parse(result); + } catch (e) { + actionResult = { + stateUpdates: {}, + outputFiles: [], + summary: result + }; + } + + // 6. 
Update state: action complete
+      // Re-read state: step 3's updateState appended a started history entry
+      // that the stale `state` snapshot from step 1 does not contain
+      const latestState = JSON.parse(Read(`${workDir}/state.json`));
+      const updatedHistory = [...latestState.action_history];
+      updatedHistory[updatedHistory.length - 1] = {
+        ...updatedHistory[updatedHistory.length - 1],
+        completed_at: new Date().toISOString(),
+        result: 'success',
+        output_files: actionResult.outputFiles || []
+      };
+
+      updateState({
+        current_action: null,
+        completed_actions: [...latestState.completed_actions, actionId],
+        action_history: updatedHistory,
+        ...actionResult.stateUpdates
+      });
+
+      console.log(`[Loop ${iteration}] Completed: ${actionId}`);
+
+    } catch (error) {
+      console.log(`[Loop ${iteration}] Error in ${actionId}: ${error.message}`);
+
+      // Error handling (re-read state so earlier errors are not clobbered)
+      const latestState = JSON.parse(Read(`${workDir}/state.json`));
+      updateState({
+        current_action: null,
+        errors: [...latestState.errors, {
+          action: actionId,
+          message: error.message,
+          timestamp: new Date().toISOString(),
+          recoverable: true
+        }],
+        error_count: latestState.error_count + 1
+      });
+    }
+  }
+
+  console.log('=== Skill Tuning Orchestrator Finished ===');
+}
+```
+
+## Action Catalog
+
+| Action | Purpose | Preconditions | Effects |
+|--------|---------|---------------|---------|
+| [action-init](actions/action-init.md) | Initialize tuning session | status === 'pending' | Creates work dirs, backup, sets status='running' |
+| [action-diagnose-context](actions/action-diagnose-context.md) | Analyze context explosion | status === 'running' | Sets diagnosis.context |
+| [action-diagnose-memory](actions/action-diagnose-memory.md) | Analyze long-tail forgetting | status === 'running' | Sets diagnosis.memory |
+| [action-diagnose-dataflow](actions/action-diagnose-dataflow.md) | Analyze data flow issues | status === 'running' | Sets diagnosis.dataflow |
+| [action-diagnose-agent](actions/action-diagnose-agent.md) | Analyze agent coordination | status === 'running' | Sets diagnosis.agent |
+| [action-gemini-analysis](actions/action-gemini-analysis.md) | Deep analysis via Gemini CLI | User request OR critical issues | Sets gemini_analysis, adds issues |
+| 
[action-generate-report](actions/action-generate-report.md) | Generate consolidated report | All diagnoses complete | Creates tuning-report.md | +| [action-propose-fixes](actions/action-propose-fixes.md) | Generate fix proposals | Report generated, issues > 0 | Sets proposed_fixes | +| [action-apply-fix](actions/action-apply-fix.md) | Apply selected fix | pending_fixes > 0 | Updates applied_fixes | +| [action-verify](actions/action-verify.md) | Verify applied fixes | applied_fixes with pending verification | Updates verification_result | +| [action-complete](actions/action-complete.md) | Finalize session | quality_gate='pass' OR max_iterations | Sets status='completed' | +| [action-abort](actions/action-abort.md) | Abort on errors | error_count >= max_errors | Sets status='failed' | + +## Termination Conditions + +- `status === 'completed'`: Normal completion +- `status === 'user_exit'`: User requested exit +- `status === 'failed'`: Unrecoverable error +- `error_count >= max_errors`: Too many errors (default: 3) +- `iteration_count >= max_iterations`: Max iterations reached (default: 5) +- `quality_gate === 'pass'`: All quality criteria met + +## Error Recovery + +| Error Type | Recovery Strategy | +|------------|-------------------| +| Action execution failed | Retry up to 3 times, then skip | +| State parse error | Restore from backup | +| File write error | Retry with alternative path | +| User abort | Save state and exit gracefully | + +## User Interaction Points + +The orchestrator pauses for user input at these points: + +1. **action-init**: Confirm target skill and describe issue +2. **action-propose-fixes**: Select which fixes to apply +3. **action-verify**: Review verification results, decide to continue or stop +4. 
**action-complete**: Review final summary diff --git a/.claude/skills/skill-tuning/phases/state-schema.md b/.claude/skills/skill-tuning/phases/state-schema.md new file mode 100644 index 00000000..05fbf87c --- /dev/null +++ b/.claude/skills/skill-tuning/phases/state-schema.md @@ -0,0 +1,282 @@ +# State Schema + +Defines the state structure for skill-tuning orchestrator. + +## State Structure + +```typescript +interface TuningState { + // === Core Status === + status: 'pending' | 'running' | 'completed' | 'failed'; + started_at: string; // ISO timestamp + updated_at: string; // ISO timestamp + + // === Target Skill Info === + target_skill: { + name: string; // e.g., "software-manual" + path: string; // e.g., ".claude/skills/software-manual" + execution_mode: 'sequential' | 'autonomous'; + phases: string[]; // List of phase files + specs: string[]; // List of spec files + }; + + // === User Input === + user_issue_description: string; // User's problem description + focus_areas: string[]; // User-specified focus (optional) + + // === Diagnosis Results === + diagnosis: { + context: DiagnosisResult | null; + memory: DiagnosisResult | null; + dataflow: DiagnosisResult | null; + agent: DiagnosisResult | null; + }; + + // === Issues Found === + issues: Issue[]; + issues_by_severity: { + critical: number; + high: number; + medium: number; + low: number; + }; + + // === Fix Management === + proposed_fixes: Fix[]; + applied_fixes: AppliedFix[]; + pending_fixes: string[]; // Fix IDs pending application + + // === Iteration Control === + iteration_count: number; + max_iterations: number; // Default: 5 + + // === Quality Metrics === + quality_score: number; // 0-100 + quality_gate: 'pass' | 'review' | 'fail'; + + // === Orchestrator State === + completed_actions: string[]; + current_action: string | null; + action_history: ActionHistoryEntry[]; + + // === Error Handling === + errors: ErrorEntry[]; + error_count: number; + max_errors: number; // Default: 3 + + // === Output Paths 
===
+  work_dir: string;
+  backup_dir: string;
+}
+
+interface DiagnosisResult {
+  status: 'completed' | 'skipped' | 'failed';
+  issues_found: number;
+  severity: 'critical' | 'high' | 'medium' | 'low' | 'none';
+  execution_time_ms: number;
+  details: {
+    patterns_checked: string[];
+    patterns_matched: string[];
+    evidence: Evidence[];
+    recommendations: string[];
+  };
+}
+
+interface Evidence {
+  file: string;
+  line?: number;
+  pattern: string;
+  context: string;
+  severity: string;
+}
+
+interface Issue {
+  id: string;                    // e.g., "ISS-001"
+  type: 'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure';
+  severity: 'critical' | 'high' | 'medium' | 'low';
+  priority: number;              // 1 = highest
+  location: {
+    file: string;
+    line_start?: number;
+    line_end?: number;
+    phase?: string;
+  };
+  description: string;
+  evidence: string[];
+  root_cause: string;
+  impact: string;
+  suggested_fix: string;
+  related_issues: string[];      // Issue IDs
+}
+
+interface Fix {
+  id: string;                    // e.g., "FIX-001"
+  issue_ids: string[];           // Issues this fix addresses
+  strategy: FixStrategy;
+  description: string;
+  rationale: string;
+  changes: FileChange[];
+  risk: 'low' | 'medium' | 'high';
+  estimated_impact: string;
+  verification_steps: string[];
+}
+
+// Union covers every strategy referenced by action-propose-fixes and
+// action-verify's strategy-to-diagnosis map
+type FixStrategy =
+  | 'context_summarization'   // Add context compression
+  | 'sliding_window'          // Implement sliding context window
+  | 'path_reference'          // Pass file paths instead of full content
+  | 'structured_state'        // Convert to structured state passing
+  | 'constraint_injection'    // Add constraint propagation
+  | 'goal_embedding'          // Re-embed original goal in later prompts
+  | 'state_constraints_field' // Persist constraints in state
+  | 'checkpoint_restore'      // Add checkpointing mechanism
+  | 'schema_enforcement'      // Add data contract validation
+  | 'field_normalization'     // Normalize inconsistent field names
+  | 'transactional_updates'   // Atomic read-modify-write state updates
+  | 'error_wrapping'          // Wrap agent calls in try-catch
+  | 'result_validation'       // Validate agent return values
+  | 'flatten_nesting'         // Flatten nested agent calls
+  | 'orchestrator_refactor'   // Refactor agent coordination
+  | 'state_centralization'    // Centralize state management
+  | 'custom';                 // Custom fix
+
+interface FileChange {
+  file: string;
+  action: 'create' | 'modify' | 'delete';
+  old_content?: string;
+  new_content?: string;
+  diff?: string;
+}
+
+interface AppliedFix {
+  fix_id: string;
+  applied_at: string;
+  
success: boolean; + backup_path: string; + verification_result: 'pass' | 'fail' | 'pending'; + rollback_available: boolean; +} + +interface ActionHistoryEntry { + action: string; + started_at: string; + completed_at: string; + result: 'success' | 'failure' | 'skipped'; + output_files: string[]; +} + +interface ErrorEntry { + action: string; + message: string; + timestamp: string; + recoverable: boolean; +} +``` + +## Initial State Template + +```json +{ + "status": "pending", + "started_at": null, + "updated_at": null, + "target_skill": { + "name": null, + "path": null, + "execution_mode": null, + "phases": [], + "specs": [] + }, + "user_issue_description": "", + "focus_areas": [], + "diagnosis": { + "context": null, + "memory": null, + "dataflow": null, + "agent": null + }, + "issues": [], + "issues_by_severity": { + "critical": 0, + "high": 0, + "medium": 0, + "low": 0 + }, + "proposed_fixes": [], + "applied_fixes": [], + "pending_fixes": [], + "iteration_count": 0, + "max_iterations": 5, + "quality_score": 0, + "quality_gate": "fail", + "completed_actions": [], + "current_action": null, + "action_history": [], + "errors": [], + "error_count": 0, + "max_errors": 3, + "work_dir": null, + "backup_dir": null +} +``` + +## State Transition Diagram + +``` + ┌─────────────┐ + │ pending │ + └──────┬──────┘ + │ action-init + ↓ + ┌─────────────┐ + ┌──────────│ running │──────────┐ + │ └──────┬──────┘ │ + │ │ │ + diagnosis │ ┌────────────┼────────────┐ │ error_count >= 3 + actions │ │ │ │ │ + │ ↓ ↓ ↓ │ + │ context memory dataflow │ + │ │ │ │ │ + │ └────────────┼────────────┘ │ + │ │ │ + │ ↓ │ + │ action-verify │ + │ │ │ + │ ┌───────────┼───────────┐ │ + │ │ │ │ │ + │ ↓ ↓ ↓ │ + │ quality iterate apply │ + │ gate=pass (< max) fix │ + │ │ │ │ │ + │ │ └───────────┘ │ + │ ↓ ↓ + │ ┌─────────────┐ ┌─────────────┐ + └→│ completed │ │ failed │ + └─────────────┘ └─────────────┘ +``` + +## State Update Rules + +### Atomicity +All state updates must be atomic - read current state, 
apply changes, write entire state. + +### Immutability +Never mutate state in place. Always create new state object with changes. + +### Validation +Before writing state, validate against schema to prevent corruption. + +### Timestamps +Always update `updated_at` on every state change. + +```javascript +function updateState(workDir, updates) { + const currentState = JSON.parse(Read(`${workDir}/state.json`)); + + const newState = { + ...currentState, + ...updates, + updated_at: new Date().toISOString() + }; + + // Validate before write + if (!validateState(newState)) { + throw new Error('Invalid state update'); + } + + Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2)); + return newState; +} +``` diff --git a/.claude/skills/skill-tuning/specs/problem-taxonomy.md b/.claude/skills/skill-tuning/specs/problem-taxonomy.md new file mode 100644 index 00000000..3e5238f5 --- /dev/null +++ b/.claude/skills/skill-tuning/specs/problem-taxonomy.md @@ -0,0 +1,210 @@ +# Problem Taxonomy + +Classification of skill execution issues with detection patterns and severity criteria. + +## When to Use + +| Phase | Usage | Section | +|-------|-------|---------| +| All Diagnosis Actions | Issue classification | All sections | +| action-propose-fixes | Strategy selection | Fix Mapping | +| action-generate-report | Severity assessment | Severity Criteria | + +--- + +## Problem Categories + +### 1. Context Explosion (P2) + +**Definition**: Excessive token accumulation causing prompt size to grow unbounded. 
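A minimal sketch of the kind of mitigation this category calls for: a bounded history window that collapses older entries into a placeholder (the function name and entry shape are illustrative, not part of the skill's code):

```javascript
// Illustrative only: keep at most `maxEntries` recent history entries and
// collapse everything older into a single summary placeholder, so the
// prompt cannot grow without bound.
function slidingWindow(history, maxEntries = 10) {
  if (history.length <= maxEntries) return history;
  const dropped = history.length - maxEntries;
  return [
    { role: 'system', content: `[${dropped} earlier entries summarized]` },
    ...history.slice(-maxEntries)
  ];
}
```

A real fix would pair this with actual summarization of the dropped entries; the window alone only caps growth.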
+ +**Root Causes**: +- Unbounded conversation history +- Full content passing instead of references +- Missing summarization mechanisms +- Agent returning full output instead of path+summary + +**Detection Patterns**: + +| Pattern ID | Regex/Check | Description | +|------------|-------------|-------------| +| CTX-001 | `/history\s*[.=].*push\|concat/` | History array growth | +| CTX-002 | `/JSON\.stringify\s*\(\s*state\s*\)/` | Full state serialization | +| CTX-003 | `/Read\([^)]+\)\s*[\+,]/` | Multiple file content concatenation | +| CTX-004 | `/return\s*\{[^}]*content:/` | Agent returning full content | +| CTX-005 | File length > 5000 chars without summarize | Long prompt without compression | + +**Impact Levels**: +- **Critical**: Context exceeds model limit (128K tokens) +- **High**: Context > 50K tokens per iteration +- **Medium**: Context grows 10%+ per iteration +- **Low**: Potential for growth but currently manageable + +--- + +### 2. Long-tail Forgetting (P3) + +**Definition**: Loss of early instructions, constraints, or goals in long execution chains. + +**Root Causes**: +- No explicit constraint propagation +- Reliance on implicit context +- Missing checkpoint/restore mechanisms +- State schema without requirements field + +**Detection Patterns**: + +| Pattern ID | Regex/Check | Description | +|------------|-------------|-------------| +| MEM-001 | Later phases missing constraint reference | Constraint not carried forward | +| MEM-002 | `/\[TASK\][^[]*(?!\[CONSTRAINTS\])/` | Task without constraints section | +| MEM-003 | Key phases without checkpoint | Missing state preservation | +| MEM-004 | State schema lacks `original_requirements` | No constraint persistence | +| MEM-005 | No verification phase | Output not checked against intent | + +**Impact Levels**: +- **Critical**: Original goal completely lost +- **High**: Key constraints ignored in output +- **Medium**: Some requirements missing +- **Low**: Minor goal drift + +--- + +### 3. 
Data Flow Disruption (P0) + +**Definition**: Inconsistent state management causing data loss or corruption. + +**Root Causes**: +- Multiple state storage locations +- Inconsistent field naming +- Missing schema validation +- Format transformation without normalization + +**Detection Patterns**: + +| Pattern ID | Regex/Check | Description | +|------------|-------------|-------------| +| DF-001 | Multiple state file writes | Scattered state storage | +| DF-002 | Same concept, different names | Field naming inconsistency | +| DF-003 | JSON.parse without validation | Missing schema validation | +| DF-004 | Files written but never read | Orphaned outputs | +| DF-005 | Autonomous skill without state-schema | Undefined state structure | + +**Impact Levels**: +- **Critical**: Data loss or corruption +- **High**: State inconsistency between phases +- **Medium**: Potential for inconsistency +- **Low**: Minor naming inconsistencies + +--- + +### 4. Agent Coordination Failure (P1) + +**Definition**: Fragile agent call patterns causing cascading failures. 
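A minimal sketch of the `error_wrapping` and `result_validation` mitigations this category maps to (the wrapper name and error-entry shape are illustrative, not part of the skill's code):

```javascript
// Illustrative only: wrap an agent call so a failure is recorded in state
// instead of crashing the workflow (AGT-001), and empty results are
// rejected rather than silently propagated (AGT-002).
async function callAgentSafely(taskFn, state) {
  try {
    const result = await taskFn();
    if (!result) throw new Error('Empty agent result');
    return result;
  } catch (e) {
    state.errors.push({ message: e.message, timestamp: new Date().toISOString() });
    state.error_count += 1;
    return null; // caller decides whether to retry or skip
  }
}
```

The orchestrator can then treat a `null` return as a recoverable failure and apply its own retry/skip policy.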
+ +**Root Causes**: +- Missing error handling in Task calls +- No result validation +- Inconsistent agent configurations +- Deeply nested agent calls + +**Detection Patterns**: + +| Pattern ID | Regex/Check | Description | +|------------|-------------|-------------| +| AGT-001 | Task without try-catch | Missing error handling | +| AGT-002 | Result used without validation | No return value check | +| AGT-003 | > 3 different agent types | Agent type proliferation | +| AGT-004 | Nested Task in prompt | Agent calling agent | +| AGT-005 | Task used but not in allowed-tools | Tool declaration mismatch | +| AGT-006 | Multiple return formats | Inconsistent agent output | + +**Impact Levels**: +- **Critical**: Workflow crash on agent failure +- **High**: Unpredictable agent behavior +- **Medium**: Occasional coordination issues +- **Low**: Minor inconsistencies + +--- + +## Severity Criteria + +### Global Severity Matrix + +| Severity | Definition | Action Required | +|----------|------------|-----------------| +| **Critical** | Blocks execution or causes data loss | Immediate fix required | +| **High** | Significantly impacts reliability | Should fix before deployment | +| **Medium** | Affects quality or maintainability | Fix in next iteration | +| **Low** | Minor improvement opportunity | Optional fix | + +### Severity Calculation + +```javascript +function calculateIssueSeverity(issue) { + const weights = { + impact_on_execution: 40, // Does it block workflow? + data_integrity_risk: 30, // Can it cause data loss? + frequency: 20, // How often does it occur? + complexity_to_fix: 10 // How hard to fix? 
+ }; + + let score = 0; + + // Impact on execution + if (issue.blocks_execution) score += weights.impact_on_execution; + else if (issue.degrades_execution) score += weights.impact_on_execution * 0.5; + + // Data integrity + if (issue.causes_data_loss) score += weights.data_integrity_risk; + else if (issue.causes_inconsistency) score += weights.data_integrity_risk * 0.5; + + // Frequency + if (issue.occurs_every_run) score += weights.frequency; + else if (issue.occurs_sometimes) score += weights.frequency * 0.5; + + // Complexity (inverse - easier to fix = higher priority) + if (issue.fix_complexity === 'low') score += weights.complexity_to_fix; + else if (issue.fix_complexity === 'medium') score += weights.complexity_to_fix * 0.5; + + // Map score to severity + if (score >= 70) return 'critical'; + if (score >= 50) return 'high'; + if (score >= 30) return 'medium'; + return 'low'; +} +``` + +--- + +## Fix Mapping + +| Problem Type | Recommended Strategies | Priority Order | +|--------------|----------------------|----------------| +| Context Explosion | sliding_window, path_reference, context_summarization | 1, 2, 3 | +| Long-tail Forgetting | constraint_injection, state_constraints_field, checkpoint | 1, 2, 3 | +| Data Flow Disruption | state_centralization, schema_enforcement, field_normalization | 1, 2, 3 | +| Agent Coordination | error_wrapping, result_validation, flatten_nesting | 1, 2, 3 | + +--- + +## Cross-Category Dependencies + +Some issues may trigger others: + +``` +Context Explosion ──→ Long-tail Forgetting + (Large context causes important info to be pushed out) + +Data Flow Disruption ──→ Agent Coordination Failure + (Inconsistent data causes agents to fail) + +Agent Coordination Failure ──→ Context Explosion + (Failed retries add to context) +``` + +When fixing, address in this order: +1. **P0 Data Flow** - Foundation for other fixes +2. **P1 Agent Coordination** - Stability +3. **P2 Context Explosion** - Efficiency +4. 
**P3 Long-tail Forgetting** - Quality diff --git a/.claude/skills/skill-tuning/specs/quality-gates.md b/.claude/skills/skill-tuning/specs/quality-gates.md new file mode 100644 index 00000000..8bea3582 --- /dev/null +++ b/.claude/skills/skill-tuning/specs/quality-gates.md @@ -0,0 +1,263 @@ +# Quality Gates + +Quality thresholds and verification criteria for skill tuning. + +## When to Use + +| Phase | Usage | Section | +|-------|-------|---------| +| action-generate-report | Calculate quality score | Scoring | +| action-verify | Check quality gates | Gate Definitions | +| action-complete | Final assessment | Pass Criteria | + +--- + +## Quality Dimensions + +### 1. Issue Severity Distribution (40%) + +Measures the severity profile of identified issues. + +| Metric | Weight | Calculation | +|--------|--------|-------------| +| Critical Issues | -25 each | High penalty | +| High Issues | -15 each | Significant penalty | +| Medium Issues | -5 each | Moderate penalty | +| Low Issues | -1 each | Minor penalty | + +**Score Calculation**: +```javascript +function calculateSeverityScore(issues) { + const weights = { critical: 25, high: 15, medium: 5, low: 1 }; + const deductions = issues.reduce((sum, issue) => + sum + (weights[issue.severity] || 0), 0); + return Math.max(0, 100 - deductions); +} +``` + +### 2. Fix Effectiveness (30%) + +Measures success rate of applied fixes. + +| Metric | Weight | Threshold | +|--------|--------|-----------| +| Fixes Verified Pass | +30 | > 80% pass rate | +| Fixes Verified Fail | -20 | < 50% triggers review | +| Issues Resolved | +10 | Per resolved issue | + +**Score Calculation**: +```javascript +function calculateFixScore(appliedFixes) { + const total = appliedFixes.length; + if (total === 0) return 100; // No fixes needed = good + + const passed = appliedFixes.filter(f => f.verification_result === 'pass').length; + return Math.round((passed / total) * 100); +} +``` + +### 3. 
Coverage Completeness (20%) + +Measures diagnosis coverage across all areas. + +| Metric | Weight | Threshold | +|--------|--------|-----------| +| All 4 diagnoses complete | +20 | Full coverage | +| 3 diagnoses complete | +15 | Good coverage | +| 2 diagnoses complete | +10 | Partial coverage | +| < 2 diagnoses complete | +0 | Insufficient | + +### 4. Iteration Efficiency (10%) + +Measures how quickly issues are resolved. + +| Metric | Weight | Threshold | +|--------|--------|-----------| +| Resolved in 1 iteration | +10 | Excellent | +| Resolved in 2 iterations | +7 | Good | +| Resolved in 3 iterations | +4 | Acceptable | +| > 3 iterations | +0 | Needs improvement | + +--- + +## Gate Definitions + +### Gate: PASS + +**Threshold**: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2 + +**Meaning**: Skill is production-ready with minor issues. + +**Actions**: +- Complete tuning session +- Generate summary report +- No further fixes required + +### Gate: REVIEW + +**Threshold**: Quality Score 60-79 OR High Issues 3-5 + +**Meaning**: Skill has issues requiring attention. + +**Actions**: +- Review remaining issues +- Apply additional fixes if possible +- May require manual intervention + +### Gate: FAIL + +**Threshold**: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5 + +**Meaning**: Skill has serious issues blocking deployment.
+ +**Actions**: +- Must fix critical issues +- Re-run diagnosis after fixes +- Consider architectural review + +--- + +## Quality Score Calculation + +```javascript +function calculateQualityScore(state) { + // Dimension 1: Severity (40%) + const severityScore = calculateSeverityScore(state.issues); + + // Dimension 2: Fix Effectiveness (30%) + const fixScore = calculateFixScore(state.applied_fixes); + + // Dimension 3: Coverage (20%) + const diagnosisCount = Object.values(state.diagnosis) + .filter(d => d !== null).length; + const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0; + + // Dimension 4: Efficiency (10%) + const efficiencyScore = state.iteration_count <= 1 ? 10 : + state.iteration_count <= 2 ? 7 : + state.iteration_count <= 3 ? 4 : 0; + + // Weighted total + const total = (severityScore * 0.4) + + (fixScore * 0.3) + + (coverageScore * 1.0) + // Already scaled to 20 + (efficiencyScore * 1.0); // Already scaled to 10 + + return Math.round(total); +} + +function determineQualityGate(state) { + const score = calculateQualityScore(state); + const criticalCount = state.issues.filter(i => i.severity === 'critical').length; + const highCount = state.issues.filter(i => i.severity === 'high').length; + + if (criticalCount > 0) return 'fail'; + if (highCount > 5) return 'fail'; + if (score < 60) return 'fail'; + + if (highCount > 2) return 'review'; + if (score < 80) return 'review'; + + return 'pass'; +} +``` + +--- + +## Verification Criteria + +### For Each Issue Type + +#### Context Explosion Issues +- [ ] Token count does not grow unbounded +- [ ] History limited to reasonable size +- [ ] No full content in prompts (paths used instead) +- [ ] Agent returns are compact + +#### Long-tail Forgetting Issues +- [ ] Constraints visible in all phase prompts +- [ ] State schema includes requirements field +- [ ] Checkpoints exist at key milestones +- [ ] Output matches original constraints + +#### Data Flow Issues +- [ ] Single state.json after execution +- [ 
] No orphan state files +- [ ] Schema validation active +- [ ] Consistent field naming + +#### Agent Coordination Issues +- [ ] All Task calls have error handling +- [ ] Agent results validated before use +- [ ] No nested agent calls +- [ ] Tool declarations match usage + +--- + +## Iteration Control + +### Max Iterations + +Default: 5 iterations + +**Rationale**: +- Each iteration may introduce new issues +- Diminishing returns after 3-4 iterations +- Prevents infinite loops + +### Iteration Exit Criteria + +```javascript +function shouldContinueIteration(state) { + // Exit if quality gate passed + if (state.quality_gate === 'pass') return false; + + // Exit if max iterations reached + if (state.iteration_count >= state.max_iterations) return false; + + // Exit if no improvement in last 2 iterations + if (state.iteration_count >= 2) { + const recentHistory = state.action_history.slice(-10); + const issuesResolvedRecently = recentHistory.filter(a => + a.action === 'action-verify' && a.result === 'success' + ).length; + + if (issuesResolvedRecently === 0) { + console.log('No progress in recent iterations, stopping.'); + return false; + } + } + + // Continue if critical/high issues remain + const hasUrgentIssues = state.issues.some(i => + i.severity === 'critical' || i.severity === 'high' + ); + + return hasUrgentIssues; +} +``` + +--- + +## Reporting Format + +### Quality Summary Table + +| Dimension | Score | Weight | Weighted | +|-----------|-------|--------|----------| +| Severity Distribution | {score}/100 | 40% | {weighted} | +| Fix Effectiveness | {score}/100 | 30% | {weighted} | +| Coverage Completeness | {score}/20 | 20% | {score} | +| Iteration Efficiency | {score}/10 | 10% | {score} | +| **Total** | | | **{total}/100** | + +### Gate Status + +``` +Quality Gate: {PASS|REVIEW|FAIL} + +Criteria: +- Quality Score: {score} (threshold: 60) +- Critical Issues: {count} (threshold: 0) +- High Issues: {count} (threshold: 5) +``` diff --git 
a/.claude/skills/skill-tuning/specs/tuning-strategies.md b/.claude/skills/skill-tuning/specs/tuning-strategies.md new file mode 100644 index 00000000..fcbd6c91 --- /dev/null +++ b/.claude/skills/skill-tuning/specs/tuning-strategies.md @@ -0,0 +1,1016 @@ +# Tuning Strategies + +Detailed fix strategies for each problem category with implementation guidance. + +## When to Use + +| Phase | Usage | Section | +|-------|-------|---------| +| action-propose-fixes | Strategy selection | Strategy Details | +| action-apply-fix | Implementation guidance | Implementation | +| action-verify | Verification steps | Verification | + +--- + +## Context Explosion Strategies + +### Strategy: sliding_window + +**Purpose**: Limit context history to most recent N items. + +**Implementation**: +```javascript +// In orchestrator.md or phase files +const MAX_HISTORY_ITEMS = 5; + +function updateHistory(state, newItem) { + const history = state.history || []; + const updated = [...history, newItem].slice(-MAX_HISTORY_ITEMS); + return { ...state, history: updated }; +} +``` + +**Files to Modify**: +- `phases/orchestrator.md` - Add history management +- `phases/state-schema.md` - Document history limit + +**Risk**: Low +**Verification**: +- Run skill for 10+ iterations +- Verify history.length never exceeds MAX_HISTORY_ITEMS + +--- + +### Strategy: path_reference + +**Purpose**: Pass file paths instead of full content. + +**Implementation**: +```javascript +// Before +const content = Read('data.json'); +const prompt = `Analyze: ${content}`; + +// After +const dataPath = `${workDir}/data.json`; +const prompt = `Analyze file at: ${dataPath}. Read it first.`; +``` + +**Files to Modify**: +- All phase files with `${content}` in prompts + +**Risk**: Low +**Verification**: +- Verify agents can still access required data +- Check token count reduced + +--- + +### Strategy: context_summarization + +**Purpose**: Add summarization step before passing to next phase. 
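When a summarization agent is unavailable or over budget, this step can degrade to a deterministic fallback. A minimal sketch (the function name and heuristics here are illustrative, not part of the skill's API):

```javascript
// Fallback summarizer: no agent call, purely deterministic.
// Keeps the first maxWords words and collects heading/key-value lines
// as key points. Crude, but it bounds token growth when no agent is available.
function truncateSummary(text, maxWords = 100) {
  const keyPoints = text
    .split('\n')
    .map(line => line.trim())
    .filter(line => /^(#|\*\*|[-*]\s|\w+:)/.test(line))
    .slice(0, 10);
  const words = text.split(/\s+/).filter(Boolean);
  const head = words.slice(0, maxWords).join(' ');
  return {
    summary: words.length > maxWords ? `${head} ...` : head,
    key_points: keyPoints
  };
}
```

The `{ summary, key_points }` shape mirrors the JSON the agent-based variant returns, so downstream phases need no changes.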
+ +**Implementation**: +```javascript +// Add summarization agent +const summarizeResult = await Task({ + subagent_type: 'universal-executor', + prompt: ` + Summarize the following in <100 words, preserving key facts: + ${fullContent} + + Return JSON: { summary: "...", key_points: [...] } + ` +}); + +// Pass summary instead of full content +nextPhasePrompt = `Previous phase summary: ${summarizeResult.summary}`; +``` + +**Files to Modify**: +- Phase transition points +- Orchestrator (if autonomous) + +**Risk**: Low +**Verification**: +- Compare output quality with/without summarization +- Verify key information preserved + +--- + +### Strategy: structured_state + +**Purpose**: Replace text-based context with structured JSON state. + +**Implementation**: +```javascript +// Before: Text-based context passing +const context = ` + User requested: ${userRequest} + Previous output: ${previousOutput} + Current status: ${status} +`; + +// After: Structured state +const state = { + original_request: userRequest, + previous_output_path: `${workDir}/output.md`, + previous_output_summary: "...", + status: status, + key_decisions: [...] +}; +``` + +**Files to Modify**: +- `phases/state-schema.md` - Define structure +- All phases - Use structured fields + +**Risk**: Medium (requires refactoring) +**Verification**: +- Verify all phases can access required state fields +- Check backward compatibility + +--- + +## Long-tail Forgetting Strategies + +### Strategy: constraint_injection + +**Purpose**: Inject original constraints into every phase prompt. + +**Implementation**: +```javascript +// Add to every phase prompt template +const phasePrompt = ` +[CONSTRAINTS - FROM ORIGINAL REQUEST] +${state.original_requirements.map(r => `- ${r}`).join('\n')} + +[CURRENT TASK] +${taskDescription} + +[REMINDER] +Output MUST satisfy all constraints listed above. 
+`; +``` + +**Files to Modify**: +- All `phases/*.md` files +- `templates/agent-base.md` (if exists) + +**Risk**: Low +**Verification**: +- Verify constraints visible in each phase +- Test with specific constraint, verify output respects it + +--- + +### Strategy: state_constraints_field + +**Purpose**: Add dedicated field in state schema for requirements. + +**Implementation**: +```typescript +// In state-schema.md +interface State { + // Add these fields + original_requirements: string[]; // User's original constraints + goal_summary: string; // One-line goal statement + constraint_violations: string[]; // Track any violations +} + +// In action-init.md +function initState(userInput) { + return { + original_requirements: extractRequirements(userInput), + goal_summary: summarizeGoal(userInput), + constraint_violations: [] + }; +} +``` + +**Files to Modify**: +- `phases/state-schema.md` +- `phases/actions/action-init.md` + +**Risk**: Low +**Verification**: +- Verify state.json contains requirements after init +- Check requirements persist through all phases + +--- + +### Strategy: checkpoint_restore + +**Purpose**: Save state at key milestones for recovery and verification. + +**Implementation**: +```javascript +// Add checkpoint function +function createCheckpoint(state, workDir, checkpointName) { + const checkpointPath = `${workDir}/checkpoints/${checkpointName}.json`; + Write(checkpointPath, JSON.stringify({ + state: state, + timestamp: new Date().toISOString(), + name: checkpointName + }, null, 2)); + return checkpointPath; +} + +// Use at key points +await executePhase2(); +createCheckpoint(state, workDir, 'after-phase-2'); +``` + +**Files to Modify**: +- `phases/orchestrator.md` +- Key phase files + +**Risk**: Low +**Verification**: +- Verify checkpoints created at expected points +- Test restore from checkpoint + +--- + +### Strategy: goal_embedding + +**Purpose**: Track semantic similarity to original goal throughout execution. 
+ +**Implementation**: +```javascript +// Store goal embedding at init +state.goal_embedding = await embed(state.goal_summary); + +// At each major phase, check alignment +const currentPlanEmbedding = await embed(currentPlan); +const similarity = cosineSimilarity(state.goal_embedding, currentPlanEmbedding); + +if (similarity < 0.7) { + console.warn('Goal drift detected! Similarity:', similarity); + // Trigger re-alignment +} +``` + +**Files to Modify**: +- State schema (add embedding field) +- Orchestrator (add similarity check) + +**Risk**: Medium (requires embedding infrastructure) +**Verification**: +- Test with intentional drift, verify detection +- Verify false positive rate acceptable + +--- + +## Data Flow Strategies + +### Strategy: state_centralization + +**Purpose**: Use single state.json for all persistent data. + +**Implementation**: +```javascript +// Create state manager +const StateManager = { + read: (workDir) => JSON.parse(Read(`${workDir}/state.json`)), + + update: (workDir, updates) => { + const current = StateManager.read(workDir); + const next = { ...current, ...updates, updated_at: new Date().toISOString() }; + Write(`${workDir}/state.json`, JSON.stringify(next, null, 2)); + return next; + }, + + get: (workDir, path) => { + const state = StateManager.read(workDir); + return path.split('.').reduce((obj, key) => obj?.[key], state); + } +}; + +// Replace direct writes +// Before: Write(`${workDir}/config.json`, config); +// After: StateManager.update(workDir, { config }); +``` + +**Files to Modify**: +- All phases that write state +- Create shared state manager + +**Risk**: Medium (significant refactoring) +**Verification**: +- Verify single state.json after full run +- Check no orphan state files + +--- + +### Strategy: schema_enforcement + +**Purpose**: Add runtime validation using Zod or similar. 
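If Zod is not available in the execution environment, the same idea works dependency-free. A generic sketch (the `{ required, types }` schema shape is an illustrative convention — the skill's real schema lives in `phases/state-schema.md`):

```javascript
// Generic, dependency-free validator: checks required fields and primitive types.
// The schema shape ({ required: [...], types: {...} }) is illustrative only.
function validateAgainst(schema, obj) {
  const errors = [];
  for (const field of schema.required || []) {
    if (!(field in obj)) errors.push(`Missing required field: ${field}`);
  }
  for (const [field, type] of Object.entries(schema.types || {})) {
    if (field in obj && typeof obj[field] !== type) {
      errors.push(`Field ${field} must be ${type}, got ${typeof obj[field]}`);
    }
  }
  return { valid: errors.length === 0, errors };
}
```

Throwing on `!result.valid` gives the same fail-fast behavior as a hand-rolled per-field validator.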
+ +**Implementation**: +```javascript +// Define schema (in state-schema.md) +const StateSchema = { + status: ['pending', 'running', 'completed', 'failed'], + target_skill: { + name: 'string', + path: 'string' + }, + // ... full schema +}; + +function validateState(state) { + const errors = []; + + if (!StateSchema.status.includes(state.status)) { + errors.push(`Invalid status: ${state.status}`); + } + + if (typeof state.target_skill?.name !== 'string') { + errors.push('target_skill.name must be string'); + } + + if (errors.length > 0) { + throw new Error(`State validation failed:\n${errors.join('\n')}`); + } + + return true; +} + +// Use before state write +function updateState(workDir, updates) { + const newState = { ...currentState, ...updates }; + validateState(newState); // Throws if invalid + Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2)); +} +``` + +**Files to Modify**: +- `phases/state-schema.md` - Add validation function +- All state write locations + +**Risk**: Low +**Verification**: +- Test with invalid state, verify rejection +- Verify valid state accepted + +--- + +### Strategy: field_normalization + +**Purpose**: Normalize field names across all phases. 
+ +**Implementation**: +```javascript +// Create normalization mapping +const FIELD_NORMALIZATIONS = { + 'title': 'name', + 'identifier': 'id', + 'state': 'status', + 'error': 'errors' +}; + +function normalizeData(data) { + if (typeof data !== 'object' || data === null) return data; + + const normalized = {}; + for (const [key, value] of Object.entries(data)) { + const normalizedKey = FIELD_NORMALIZATIONS[key] || key; + normalized[normalizedKey] = normalizeData(value); + } + return normalized; +} + +// Apply when reading external data +const rawData = JSON.parse(Read(filePath)); +const normalizedData = normalizeData(rawData); +``` + +**Files to Modify**: +- Data ingestion points +- State update functions + +**Risk**: Low +**Verification**: +- Verify consistent field names in state +- Check no data loss during normalization + +--- + +## Agent Coordination Strategies + +### Strategy: error_wrapping + +**Purpose**: Add try-catch to all Task calls. + +**Implementation**: +```javascript +// Wrapper function +async function safeTask(config, state, updateState) { + const maxRetries = 3; + + for (let attempt = 1; attempt <= maxRetries; attempt++) { + try { + const result = await Task(config); + + // Validate result + if (!result) throw new Error('Empty result from agent'); + + return result; + } catch (error) { + console.log(`Task attempt ${attempt} failed: ${error.message}`); + + if (attempt === maxRetries) { + updateState({ + errors: [...state.errors, { + action: config.subagent_type, + message: error.message, + timestamp: new Date().toISOString() + }], + error_count: state.error_count + 1 + }); + throw error; + } + + // Wait before retry + await new Promise(r => setTimeout(r, 1000 * attempt)); + } + } +} +``` + +**Files to Modify**: +- All files with Task calls + +**Risk**: Low +**Verification**: +- Simulate agent failure, verify graceful handling +- Verify retry logic works + +--- + +### Strategy: result_validation + +**Purpose**: Validate agent returns before use. 
+ +**Implementation**: +```javascript +function validateAgentResult(result, expectedSchema) { + // Try JSON parse + let parsed; + try { + parsed = typeof result === 'string' ? JSON.parse(result) : result; + } catch (e) { + throw new Error(`Agent result is not valid JSON: ${result.slice(0, 100)}`); + } + + // Check required fields + for (const field of expectedSchema.required || []) { + if (!(field in parsed)) { + throw new Error(`Missing required field: ${field}`); + } + } + + return parsed; +} + +// Usage +const rawResult = await Task({...}); +const validResult = validateAgentResult(rawResult, { + required: ['status', 'output_file'] +}); +``` + +**Files to Modify**: +- All locations where agent results are used + +**Risk**: Low +**Verification**: +- Test with invalid agent output +- Verify proper error messages + +--- + +### Strategy: flatten_nesting + +**Purpose**: Remove nested agent calls, use orchestrator coordination. + +**Implementation**: +```javascript +// Before: Agent A calls Agent B in its prompt +// Agent A prompt: "... then call Task({subagent_type: 'B', ...}) ..." + +// After: Agent A returns signal, orchestrator handles +// Agent A prompt: "If you need further analysis, return: { needs_agent_b: true, context: ... }" + +// Orchestrator handles: +const resultA = await Task({ subagent_type: 'A', ... }); +const parsedA = JSON.parse(resultA); + +if (parsedA.needs_agent_b) { + const resultB = await Task({ + subagent_type: 'B', + prompt: `Continue analysis with context: ${JSON.stringify(parsedA.context)}` + }); +} +``` + +**Files to Modify**: +- Phase files with nested Task calls +- Orchestrator decision logic + +**Risk**: Medium (may change agent behavior) +**Verification**: +- Verify no nested Task patterns +- Test agent chain via orchestrator + +--- + +## Strategy Selection Guide + +``` +Issue Type: Context Explosion +├── history grows unbounded? → sliding_window +├── full content in prompts? → path_reference +├── no summarization? 
→ context_summarization +└── text-based context? → structured_state + +Issue Type: Long-tail Forgetting +├── constraints not in phases? → constraint_injection +├── no requirements in state? → state_constraints_field +├── no recovery points? → checkpoint_restore +└── goal drift risk? → goal_embedding + +Issue Type: Data Flow +├── multiple state files? → state_centralization +├── no validation? → schema_enforcement +└── inconsistent names? → field_normalization + +Issue Type: Agent Coordination +├── no error handling? → error_wrapping +├── no result validation? → result_validation +└── nested agent calls? → flatten_nesting + +Issue Type: Prompt Engineering +├── vague instructions? → structured_prompt +├── inconsistent output? → output_schema +├── hallucination risk? → grounding_context +└── format drift? → format_enforcement + +Issue Type: Architecture +├── unclear responsibilities? → phase_decomposition +├── tight coupling? → interface_contracts +├── poor extensibility? → plugin_architecture +└── complex flow? → state_machine + +Issue Type: Performance +├── high token usage? → token_budgeting +├── slow execution? → parallel_execution +├── redundant computation? → result_caching +└── large files? → lazy_loading + +Issue Type: Error Handling +├── no recovery? → graceful_degradation +├── silent failures? → error_propagation +├── no logging? → structured_logging +└── unclear errors? → error_context + +Issue Type: Output Quality +├── inconsistent quality? → quality_gates +├── no verification? → output_validation +├── format issues? → template_enforcement +└── incomplete output? → completeness_check + +Issue Type: User Experience +├── no progress? → progress_tracking +├── unclear status? → status_communication +├── no feedback? → interactive_checkpoints +└── confusing flow? 
→ guided_workflow +``` + +--- + +## General Tuning Strategies (on demand via Gemini CLI) + +The strategies below target more general optimization scenarios; they typically require deep analysis via the Gemini CLI before concrete implementations are generated. + +--- + +### Prompt Engineering Strategies + +#### Strategy: structured_prompt + +**Purpose**: Convert vague instructions into structured prompts. + +**Implementation**: +```javascript +// Before: Vague prompt +const prompt = "Please analyze the code and give suggestions"; + +// After: Structured prompt +const prompt = ` +[ROLE] +You are a code analysis expert specializing in ${domain}. + +[TASK] +Analyze the provided code for: +1. Code quality issues +2. Performance bottlenecks +3. Security vulnerabilities + +[INPUT] +File: ${filePath} +Context: ${context} + +[OUTPUT FORMAT] +Return JSON: +{ + "issues": [{ "type": "...", "severity": "...", "location": "...", "suggestion": "..." }], + "summary": "..." +} + +[CONSTRAINTS] +- Focus on actionable issues only +- Limit to top 10 findings +`; +``` + +**Risk**: Low +**Verification**: Check output consistency across multiple runs + +--- + +#### Strategy: output_schema + +**Purpose**: Force LLM output to conform to a specific schema. + +**Implementation**: +```javascript +// Define expected schema +const outputSchema = { + type: 'object', + required: ['status', 'result'], + properties: { + status: { enum: ['success', 'error', 'partial'] }, + result: { type: 'object' }, + errors: { type: 'array' } + } +}; + +// Include in prompt +const prompt = ` +...task description... + +[OUTPUT SCHEMA] +Your response MUST be valid JSON matching this schema: +${JSON.stringify(outputSchema, null, 2)} + +[VALIDATION] +Before returning, verify your output: +1. Is it valid JSON? +2. Does it have all required fields? +3. Are field types correct?
+`; +``` + +**Risk**: Low +**Verification**: JSON.parse + schema validation + +--- + +#### Strategy: grounding_context + +**Purpose**: Provide enough context to reduce hallucinations. + +**Implementation**: +```javascript +// Gather grounding context +const groundingContext = { + codebase_patterns: await analyzePatterns(skillPath), + existing_examples: await findSimilarImplementations(taskType), + constraints: state.original_requirements +}; + +const prompt = ` +[GROUNDING CONTEXT] +This skill follows these patterns: +${JSON.stringify(groundingContext.codebase_patterns)} + +Similar implementations exist at: +${groundingContext.existing_examples.map(e => `- ${e.path}`).join('\n')} + +[TASK] +${taskDescription} + +[IMPORTANT] +- Only suggest patterns that exist in the codebase +- Reference specific files when making suggestions +- If unsure, indicate uncertainty level +`; +``` + +**Risk**: Medium (requires context gathering) +**Verification**: Check suggestions match existing patterns + +--- + +### Architecture Strategies + +#### Strategy: phase_decomposition + +**Purpose**: Repartition phases to clarify responsibilities. + +**Analysis via Gemini**: +```bash +ccw cli -p " +PURPOSE: Analyze phase decomposition for skill at ${skillPath} +TASK: • Map current phase responsibilities • Identify overlapping concerns • Suggest cleaner boundaries +MODE: analysis +CONTEXT: @phases/**/*.md +EXPECTED: { current_phases: [], overlaps: [], recommended_structure: [] } +" --tool gemini --mode analysis +``` + +**Implementation Pattern**: +``` +Before: Monolithic phases +Phase1: Collect + Analyze + Transform + Output + +After: Single-responsibility phases +Phase1: Collect (input gathering) +Phase2: Analyze (processing) +Phase3: Transform (conversion) +Phase4: Output (delivery) +``` + +--- + +#### Strategy: interface_contracts + +**Purpose**: Define data contracts between phases. + +**Implementation**: +```typescript +// Define contracts in state-schema.md +interface PhaseContract { + input: { + required: string[]; + optional: string[]; + schema: object; + }; + output: {
guarantees: string[]; + schema: object; + }; +} + +// Phase 1 output contract +const phase1Contract: PhaseContract = { + input: { + required: ['user_request'], + optional: ['preferences'], + schema: { /* ... */ } + }, + output: { + guarantees: ['parsed_requirements', 'validation_status'], + schema: { /* ... */ } + } +}; +``` + +--- + +### Performance Strategies + +#### Strategy: token_budgeting + +**Purpose**: Set a token budget for each phase. + +**Implementation**: +```javascript +const TOKEN_BUDGETS = { + 'phase-collect': 2000, + 'phase-analyze': 5000, + 'phase-generate': 8000, + total: 15000 +}; + +function checkBudget(phase, estimatedTokens) { + if (estimatedTokens > TOKEN_BUDGETS[phase]) { + console.warn(`Phase ${phase} exceeds budget: ${estimatedTokens} > ${TOKEN_BUDGETS[phase]}`); + // Trigger summarization or truncation + return false; + } + return true; +} +``` + +--- + +#### Strategy: parallel_execution + +**Purpose**: Execute independent tasks in parallel. + +**Implementation**: +```javascript +// Before: Sequential +const result1 = await Task({ subagent_type: 'analyzer', prompt: prompt1 }); +const result2 = await Task({ subagent_type: 'analyzer', prompt: prompt2 }); +const result3 = await Task({ subagent_type: 'analyzer', prompt: prompt3 }); + +// After: Parallel (when independent) +const [result1, result2, result3] = await Promise.all([ + Task({ subagent_type: 'analyzer', prompt: prompt1, run_in_background: true }), + Task({ subagent_type: 'analyzer', prompt: prompt2, run_in_background: true }), + Task({ subagent_type: 'analyzer', prompt: prompt3, run_in_background: true }) +]); +``` + +--- + +#### Strategy: result_caching + +**Purpose**: Cache intermediate results to avoid redundant computation. + +**Implementation**: +```javascript +const cache = {}; + +async function cachedAnalysis(key, analysisFunc) { + if (cache[key]) { + console.log(`Cache hit: ${key}`); + return cache[key]; + } + + const result = await analysisFunc(); + cache[key] = result; + + // Persist to disk for cross-session caching + Write(`${workDir}/cache/${key}.json`,
JSON.stringify(result)); + + return result; +} +``` + +--- + +### Error Handling Strategies + +#### Strategy: graceful_degradation + +**Purpose**: Degrade gracefully on failure instead of crashing. + +**Implementation**: +```javascript +async function executeWithDegradation(primaryTask, fallbackTask) { + try { + return await primaryTask(); + } catch (error) { + console.warn(`Primary task failed: ${error.message}, using fallback`); + + try { + return await fallbackTask(); + } catch (fallbackError) { + console.error(`Fallback also failed: ${fallbackError.message}`); + return { + status: 'degraded', + partial_result: null, + error: fallbackError.message + }; + } + } +} +``` + +--- + +#### Strategy: structured_logging + +**Purpose**: Add structured logging for easier debugging. + +**Implementation**: +```javascript +function log(level, action, data) { + const entry = { + timestamp: new Date().toISOString(), + level, + action, + ...data + }; + + // Append to log file + const logPath = `${workDir}/execution.log`; + const existing = Read(logPath) || ''; + Write(logPath, existing + JSON.stringify(entry) + '\n'); + + // Console output + console.log(`[${level}] ${action}:`, JSON.stringify(data)); +} + +// Usage +log('INFO', 'phase_start', { phase: 'analyze', input_size: 1000 }); +log('ERROR', 'agent_failure', { agent: 'universal-executor', error: err.message }); +``` + +--- + +### Output Quality Strategies + +#### Strategy: quality_gates + +**Purpose**: Run quality checks before emitting output. + +**Implementation**: +```javascript +const qualityGates = [ + { + name: 'completeness', + check: (output) => output.sections?.length >= 3, + message: 'Output must have at least 3 sections' + }, + { + name: 'format', + check: (output) => /^#\s/.test(output.content), + message: 'Output must start with markdown heading' + }, + { + name: 'length', + check: (output) => output.content?.length >= 500, + message: 'Output must be at least 500 characters' + } +]; + +function validateOutput(output) { + const failures = qualityGates + .filter(gate => !gate.check(output)) + .map(gate =>
gate.message); + + if (failures.length > 0) { + throw new Error(`Quality gate failures:\n${failures.join('\n')}`); + } + + return true; +} +``` + +--- + +### User Experience Strategies + +#### Strategy: progress_tracking + +**Purpose**: Display execution progress. + +**Implementation**: +```javascript +function updateProgress(current, total, description) { + const percentage = Math.round((current / total) * 100); + const filled = Math.round(percentage / 5); // integer count for String.repeat + const progressBar = '█'.repeat(filled) + '░'.repeat(20 - filled); + + console.log(`[${progressBar}] ${percentage}% - ${description}`); + + // Update state for UI + updateState({ + progress: { + current, + total, + percentage, + description + } + }); +} + +// Usage +updateProgress(1, 5, 'Initializing tuning session...'); +updateProgress(2, 5, 'Running context diagnosis...'); +``` + +--- + +#### Strategy: interactive_checkpoints + +**Purpose**: Pause at key points to obtain user confirmation. + +**Implementation**: +```javascript +async function checkpoint(name, summary, options) { + console.log(`\n=== Checkpoint: ${name} ===`); + console.log(summary); + + const response = await AskUserQuestion({ + questions: [{ + question: `Review ${name} results. How to proceed?`, + header: 'Checkpoint', + options: options || [ + { label: 'Continue', description: 'Proceed with next step' }, + { label: 'Modify', description: 'Adjust parameters and retry' }, + { label: 'Skip', description: 'Skip this step' }, + { label: 'Abort', description: 'Stop the workflow' } + ], + multiSelect: false + }] + }); + + return response; +} +``` diff --git a/.claude/skills/skill-tuning/templates/diagnosis-report.md b/.claude/skills/skill-tuning/templates/diagnosis-report.md new file mode 100644 index 00000000..e5336179 --- /dev/null +++ b/.claude/skills/skill-tuning/templates/diagnosis-report.md @@ -0,0 +1,153 @@ +# Diagnosis Report Template + +Template for individual diagnosis action reports.
+ +## Template + +```markdown +# {{diagnosis_type}} Diagnosis Report + +**Target Skill**: {{skill_name}} +**Diagnosis Type**: {{diagnosis_type}} +**Executed At**: {{timestamp}} +**Duration**: {{duration_ms}}ms + +--- + +## Summary + +| Metric | Value | +|--------|-------| +| Issues Found | {{issues_found}} | +| Severity | {{severity}} | +| Patterns Checked | {{patterns_checked_count}} | +| Patterns Matched | {{patterns_matched_count}} | + +--- + +## Patterns Analyzed + +{{#each patterns_checked}} +### {{pattern_name}} + +- **Status**: {{status}} +- **Matches**: {{match_count}} +- **Files Affected**: {{affected_files}} + +{{/each}} + +--- + +## Issues Identified + +{{#if issues.length}} +{{#each issues}} +### {{id}}: {{description}} + +| Field | Value | +|-------|-------| +| Type | {{type}} | +| Severity | {{severity}} | +| Location | {{location}} | +| Root Cause | {{root_cause}} | +| Impact | {{impact}} | + +**Evidence**: +{{#each evidence}} +- `{{this}}` +{{/each}} + +**Suggested Fix**: {{suggested_fix}} + +--- +{{/each}} +{{else}} +_No issues found in this diagnosis area._ +{{/if}} + +--- + +## Recommendations + +{{#if recommendations.length}} +{{#each recommendations}} +{{@index}}. {{this}} +{{/each}} +{{else}} +No specific recommendations - area appears healthy. 
+{{/if}} + +--- + +## Raw Data + +Full diagnosis data available at: +`{{output_file}}` +``` + +## Variable Reference + +| Variable | Type | Source | +|----------|------|--------| +| `diagnosis_type` | string | 'context' \| 'memory' \| 'dataflow' \| 'agent' | +| `skill_name` | string | state.target_skill.name | +| `timestamp` | string | ISO timestamp | +| `duration_ms` | number | Execution time | +| `issues_found` | number | issues.length | +| `severity` | string | Calculated severity | +| `patterns_checked` | array | Patterns analyzed | +| `patterns_matched` | array | Patterns with matches | +| `patterns_checked_count` | number | patterns_checked.length | +| `patterns_matched_count` | number | patterns_matched.length | +| `issues` | array | Issue objects | +| `recommendations` | array | String recommendations | +| `output_file` | string | Path to JSON file | + +## Usage + +```javascript +function renderDiagnosisReport(diagnosis, diagnosisType, skillName, outputFile) { + return `# ${diagnosisType} Diagnosis Report + +**Target Skill**: ${skillName} +**Diagnosis Type**: ${diagnosisType} +**Executed At**: ${new Date().toISOString()} +**Duration**: ${diagnosis.execution_time_ms}ms + +--- + +## Summary + +| Metric | Value | +|--------|-------| +| Issues Found | ${diagnosis.issues_found} | +| Severity | ${diagnosis.severity} | +| Patterns Checked | ${diagnosis.details.patterns_checked.length} | +| Patterns Matched | ${diagnosis.details.patterns_matched.length} | + +--- + +## Issues Identified + +${diagnosis.details.evidence.map((e, i) => ` +### Issue ${i + 1} + +- **File**: ${e.file} +- **Pattern**: ${e.pattern} +- **Severity**: ${e.severity} +- **Context**: \`${e.context}\` +`).join('\n')} + +--- + +## Recommendations + +${diagnosis.details.recommendations.map((r, i) => `${i + 1}. 
${r}`).join('\n')} + +--- + +## Raw Data + +Full diagnosis data available at: +\`${outputFile}\` +`; +} +``` diff --git a/.claude/skills/skill-tuning/templates/fix-proposal.md b/.claude/skills/skill-tuning/templates/fix-proposal.md new file mode 100644 index 00000000..5757b800 --- /dev/null +++ b/.claude/skills/skill-tuning/templates/fix-proposal.md @@ -0,0 +1,204 @@ +# Fix Proposal Template + +Template for fix proposal documentation. + +## Template + +```markdown +# Fix Proposal: {{fix_id}} + +**Strategy**: {{strategy}} +**Risk Level**: {{risk}} +**Issues Addressed**: {{issue_ids}} + +--- + +## Description + +{{description}} + +## Rationale + +{{rationale}} + +--- + +## Affected Files + +{{#each changes}} +### {{file}} + +**Action**: {{action}} + +```diff +{{diff}} +``` + +{{/each}} + +--- + +## Implementation Steps + +{{#each implementation_steps}} +{{@index}}. {{this}} +{{/each}} + +--- + +## Risk Assessment + +| Factor | Assessment | +|--------|------------| +| Complexity | {{complexity}} | +| Reversibility | {{#if reversible}}Yes{{else}}No{{/if}} | +| Breaking Changes | {{breaking_changes}} | +| Test Coverage | {{test_coverage}} | + +**Overall Risk**: {{risk}} + +--- + +## Verification Steps + +{{#each verification_steps}} +- [ ] {{this}} +{{/each}} + +--- + +## Rollback Plan + +{{#if rollback_available}} +To rollback this fix: + +```bash +{{rollback_command}} +``` +{{else}} +_Rollback not available for this fix type._ +{{/if}} + +--- + +## Estimated Impact + +{{estimated_impact}} +``` + +## Variable Reference + +| Variable | Type | Source | +|----------|------|--------| +| `fix_id` | string | Generated ID (FIX-001) | +| `strategy` | string | Fix strategy name | +| `risk` | string | 'low' \| 'medium' \| 'high' | +| `issue_ids` | array | Related issue IDs | +| `description` | string | Human-readable description | +| `rationale` | string | Why this fix works | +| `changes` | array | File change objects | +| `implementation_steps` | array | Step-by-step guide | +| `complexity` | string | Implementation complexity | +| `reversible` | boolean | Whether the change can be undone | +| `breaking_changes` | string | Known breaking changes | +| `test_coverage` | string | Test coverage of affected code | +| `verification_steps` | array | How to verify fix worked | +| `rollback_available` | boolean | Whether a rollback command exists | +| `rollback_command` | string | Command to revert the fix | +| `estimated_impact` | string | Expected improvement | + +## Usage + +```javascript +function renderFixProposal(fix) { + return `# Fix Proposal: ${fix.id} + +**Strategy**: ${fix.strategy} +**Risk Level**: ${fix.risk} +**Issues Addressed**: ${fix.issue_ids.join(', ')} + +--- + +## Description + +${fix.description} + +## Rationale + +${fix.rationale} + +--- + +## Affected Files + +${fix.changes.map(change => ` +### ${change.file} + +**Action**: ${change.action} + +\`\`\`diff +${change.diff || change.new_content?.slice(0, 200) || 'N/A'} +\`\`\` +`).join('\n')} + +--- + +## Verification Steps + +${fix.verification_steps.map(step => `- [ ] ${step}`).join('\n')} + +--- + +## Estimated Impact + +${fix.estimated_impact} +`; +} +``` + +## Fix Strategy Templates + +### sliding_window + +```markdown +## Description +Implement sliding window for conversation history to prevent unbounded growth.
+ +## Changes +- Add MAX_HISTORY constant +- Modify history update logic to slice array +- Update state schema documentation + +## Verification +- [ ] Run skill for 10+ iterations +- [ ] Verify history.length <= MAX_HISTORY +- [ ] Check no data loss for recent items +``` + +### constraint_injection + +```markdown +## Description +Add explicit constraint section to each phase prompt. + +## Changes +- Add [CONSTRAINTS] section template +- Reference state.original_requirements +- Add reminder before output section + +## Verification +- [ ] Check constraints visible in all phases +- [ ] Test with specific constraint +- [ ] Verify output respects constraint +``` + +### error_wrapping + +```markdown +## Description +Wrap all Task calls in try-catch with retry logic. + +## Changes +- Create safeTask wrapper function +- Replace direct Task calls +- Add error logging to state + +## Verification +- [ ] Simulate agent failure +- [ ] Verify graceful error handling +- [ ] Check retry logic +```
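The error_wrapping strategy above can be sketched as follows. This is a minimal illustration assuming the real agent call is passed in as `taskFn`; the retry count and `onError` hook are illustrative defaults, not a documented API:

```javascript
// Minimal sketch of the safeTask wrapper named in the error_wrapping
// strategy. The real Task call is passed in as `taskFn`; retry count
// and the onError hook are illustrative defaults.
async function safeTask(taskFn, args, { retries = 2, onError = () => {} } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      return await taskFn(args); // success: return the agent result as-is
    } catch (err) {
      lastError = err;
      // Surface the failure (e.g. append it to state error log) before retrying.
      onError(`Task failed (attempt ${attempt}/${retries + 1}): ${err.message}`);
    }
  }
  // All attempts exhausted: re-throw so the orchestrator can abort cleanly.
  throw new Error(`Task failed after ${retries + 1} attempts: ${lastError.message}`);
}
```

A direct call such as `Task(args)` would then become `safeTask(Task, args)`, and the verification steps (simulated failure, graceful handling, retry) can be exercised by passing a stubbed task function.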