Mirror of https://github.com/catlog22/Claude-Code-Workflow.git (synced 2026-02-05 01:50:27 +08:00)
Add quality gates and tuning strategies documentation
- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.
342
.claude/skills/skill-tuning/SKILL.md
Normal file
@@ -0,0 +1,342 @@
---
name: skill-tuning
description: Universal skill diagnosis and optimization tool. Detect and fix skill execution issues including context explosion, long-tail forgetting, data flow disruption, and agent coordination failures. Supports Gemini CLI for deep analysis. Triggers on "skill tuning", "tune skill", "skill diagnosis", "optimize skill", "skill debug".
allowed-tools: Task, AskUserQuestion, Read, Write, Bash, Glob, Grep, mcp__ace-tool__search_context
---

# Skill Tuning

Universal skill diagnosis and optimization tool that identifies and resolves skill execution problems through iterative multi-agent analysis.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│          Skill Tuning Architecture (Autonomous Mode + Gemini CLI)           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ⚠️ Phase 0: Specification → Read specs + understand the target skill's     │
│     Study                     structure (mandatory prerequisite)            │
│                  ↓                                                          │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                Orchestrator (state-driven decisions)                  │  │
│  │  Read diagnosis state → choose next action → execute → update state   │  │
│  │  → loop until complete                                                │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                              │                                              │
│     ┌────────────┬───────────┼───────────┬────────────┬────────────┐        │
│     ↓            ↓           ↓           ↓            ↓            ↓        │
│  ┌──────┐  ┌─────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌─────────┐     │
│  │ Init │  │Diagnose │  │Diagnose│  │Diagnose│  │Diagnose│  │ Gemini  │     │
│  │      │  │ Context │  │ Memory │  │DataFlow│  │ Agent  │  │Analysis │     │
│  └──────┘  └─────────┘  └────────┘  └────────┘  └────────┘  └─────────┘     │
│     │           │           │           │            │            │         │
│     └───────────┴───────────┴───────────┴────────────┴────────────┘         │
│                              ↓                                              │
│                    ┌──────────────────┐                                     │
│                    │  Apply Fixes +   │                                     │
│                    │  Verify Results  │                                     │
│                    └──────────────────┘                                     │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                      Gemini CLI Integration                           │  │
│  │  Dynamically invokes the Gemini CLI for deep analysis on demand:      │  │
│  │  • Complex problem analysis (prompt engineering, architecture review) │  │
│  │  • Code pattern recognition (pattern matching, anti-pattern detection)│  │
│  │  • Fix strategy generation (fix generation, refactoring suggestions)  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Problem Domain

Based on comprehensive analysis, skill-tuning addresses **core skill issues** and **general optimization areas**:

### Core Skill Issues (auto-detected)

| Priority | Problem | Root Cause | Solution Strategy |
|----------|---------|------------|-------------------|
| **P0** | Data Flow Disruption | Scattered state, inconsistent formats | Centralized session store, transactional updates |
| **P1** | Agent Coordination | Fragile call chains, merge complexity | Dedicated orchestrator, enforced data contracts |
| **P2** | Context Explosion | Token accumulation, multi-turn bloat | Context summarization, sliding window, structured state |
| **P3** | Long-tail Forgetting | Early constraint loss | Constraint injection, checkpointing, goal alignment |

### General Optimization Areas (analyzed on demand via Gemini CLI)

| Category | Issues | Gemini Analysis Scope |
|----------|--------|----------------------|
| **Prompt Engineering** | Vague instructions, inconsistent output formats, hallucination risk | Prompt optimization, structured output design |
| **Architecture** | Poor phase decomposition, tangled dependencies, limited extensibility | Architecture review, modularization advice |
| **Performance** | Slow execution, high token consumption, redundant computation | Performance analysis, caching strategies |
| **Error Handling** | Poor error recovery, no degradation strategy, insufficient logging | Fault-tolerance design, observability improvements |
| **Output Quality** | Unstable output, format drift, quality fluctuation | Quality gates, validation mechanisms |
| **User Experience** | Clunky interaction, unclear feedback, invisible progress | UX optimization, progress tracking |

## Key Design Principles

1. **Problem-First Diagnosis**: Systematic identification before any fix attempt
2. **Data-Driven Analysis**: Record execution traces, token counts, state snapshots
3. **Iterative Refinement**: Multiple tuning rounds until quality gates pass
4. **Non-Destructive**: All changes are reversible with backup checkpoints
5. **Agent Coordination**: Use specialized sub-agents for each diagnosis type
6. **Gemini CLI On-Demand**: Deep analysis via CLI for complex/custom issues

---

## Gemini CLI Integration

Dynamically invokes the Gemini CLI for deep analysis based on user needs.

### Trigger Conditions

| Condition | Action | CLI Mode |
|-----------|--------|----------|
| User describes a complex problem | Invoke Gemini to analyze the root cause | `analysis` |
| Automated diagnosis finds a critical issue | Request deep analysis for confirmation | `analysis` |
| User requests an architecture review | Run architecture analysis | `analysis` |
| Fix code needs to be generated | Generate a fix proposal | `write` |
| Standard strategies do not apply | Request a customized strategy | `analysis` |

### CLI Command Template

```bash
ccw cli -p "
PURPOSE: ${purpose}
TASK: ${task_steps}
MODE: ${mode}
CONTEXT: @${skill_path}/**/*
EXPECTED: ${expected_output}
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/${mode}-protocol.md) | ${constraints}
" --tool gemini --mode ${mode} --cd ${skill_path}
```

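Filled in programmatically, the template variables could be assembled by a small helper like the following sketch. The helper name and its inputs are illustrative, not part of the skill's API; only the template fields themselves come from the command above.

```javascript
// Hypothetical helper that fills the CLI prompt template above.
// Field names (PURPOSE, TASK, MODE, ...) mirror the documented template.
function buildCliPrompt({ purpose, taskSteps, mode, skillPath, expected, constraints }) {
  return [
    `PURPOSE: ${purpose}`,
    `TASK: ${taskSteps.map(s => `• ${s}`).join(' ')}`,
    `MODE: ${mode}`,
    `CONTEXT: @${skillPath}/**/*`,
    `EXPECTED: ${expected}`,
    `RULES: $(cat ~/.claude/workflows/cli-templates/protocols/${mode}-protocol.md) | ${constraints}`,
  ].join('\n');
}

const prompt = buildCliPrompt({
  purpose: 'Identify root cause of context explosion',
  taskSteps: ['Analyze phase flow', 'Trace token growth'],
  mode: 'analysis',
  skillPath: '.claude/skills/my-skill',   // hypothetical target skill
  expected: 'JSON with { root_causes: [] }',
  constraints: 'Focus on execution flow',
});
console.log(prompt.includes('MODE: analysis')); // true
```

Keeping the prompt assembly in one place makes it easy to guarantee that every invocation carries the same protocol rules.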
### Analysis Types

#### 1. Problem Root Cause Analysis

```bash
ccw cli -p "
PURPOSE: Identify root cause of skill execution issue: ${user_issue_description}
TASK: • Analyze skill structure and phase flow • Identify anti-patterns • Trace data flow issues
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with { root_causes: [], patterns_found: [], recommendations: [] }
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on execution flow
" --tool gemini --mode analysis
```

#### 2. Architecture Review

```bash
ccw cli -p "
PURPOSE: Review skill architecture for scalability and maintainability
TASK: • Evaluate phase decomposition • Check state management patterns • Assess agent coordination
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Architecture assessment with improvement recommendations
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Focus on modularity
" --tool gemini --mode analysis
```

#### 3. Fix Strategy Generation

```bash
ccw cli -p "
PURPOSE: Generate fix strategy for issue: ${issue_id} - ${issue_description}
TASK: • Analyze issue context • Design fix approach • Generate implementation plan
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with { strategy: string, changes: [], verification_steps: [] }
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) | Minimal invasive changes
" --tool gemini --mode analysis
```

---

## Mandatory Prerequisites

> **CRITICAL**: Read these documents before executing any action.

### Core Specs (Required)

| Document | Purpose | Priority |
|----------|---------|----------|
| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification and detection patterns | **P0** |
| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies for each problem type | **P0** |
| [specs/quality-gates.md](specs/quality-gates.md) | Quality thresholds and verification criteria | P1 |

### Templates (Reference)

| Document | Purpose |
|----------|---------|
| [templates/diagnosis-report.md](templates/diagnosis-report.md) | Diagnosis report structure |
| [templates/fix-proposal.md](templates/fix-proposal.md) | Fix proposal format |

---

## Execution Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Phase 0: Specification Study (mandatory prerequisite - do not skip)         │
│   → Read: specs/problem-taxonomy.md (problem classification)                │
│   → Read: specs/tuning-strategies.md (tuning strategies)                    │
│   → Read: Target skill's SKILL.md and phases/*.md                           │
│   → Output: internalized specs, understanding of the target skill structure │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-init: Initialize Tuning Session                                      │
│   → Create work directory: .workflow/.scratchpad/skill-tuning-{timestamp}   │
│   → Initialize state.json with target skill info                            │
│   → Create backup of target skill files                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-context: Context Explosion Analysis                         │
│   → Scan for token accumulation patterns                                    │
│   → Detect multi-turn dialogue growth                                       │
│   → Output: context-diagnosis.json                                          │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-memory: Long-tail Forgetting Analysis                       │
│   → Trace constraint propagation through phases                             │
│   → Detect early instruction loss                                           │
│   → Output: memory-diagnosis.json                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-dataflow: Data Flow Analysis                                │
│   → Map state transitions between phases                                    │
│   → Detect format inconsistencies                                           │
│   → Output: dataflow-diagnosis.json                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-diagnose-agent: Agent Coordination Analysis                          │
│   → Analyze agent call patterns                                             │
│   → Detect result passing issues                                            │
│   → Output: agent-diagnosis.json                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-generate-report: Consolidated Report                                 │
│   → Merge all diagnosis results                                             │
│   → Prioritize issues by severity                                           │
│   → Output: tuning-report.md                                                │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-propose-fixes: Fix Proposal Generation                               │
│   → Generate fix strategies for each issue                                  │
│   → Create implementation plan                                              │
│   → Output: fix-proposals.json                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-apply-fix: Apply Selected Fix                                        │
│   → User selects fix to apply                                               │
│   → Execute fix with backup                                                 │
│   → Update state with fix result                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-verify: Verification                                                 │
│   → Re-run affected diagnosis                                               │
│   → Check quality gates                                                     │
│   → Update iteration count                                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│ action-complete: Finalization                                               │
│   → Generate final report                                                   │
│   → Cleanup temporary files                                                 │
│   → Output: tuning-summary.md                                               │
└─────────────────────────────────────────────────────────────────────────────┘
```

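The state-driven dispatch behind this flow can be sketched as a pure function from state to next action. This is an illustrative sketch only: the real decision logic lives in phases/orchestrator.md, and the thresholds and ordering below are assumptions.

```javascript
// Illustrative sketch of the orchestrator's "read state → choose next action" step.
// Action names mirror the flow above; the quality threshold (80) is a hypothetical value.
function chooseNextAction(state) {
  if (!state.completed_actions.includes('action-init')) return 'action-init';

  // Run each diagnosis that has not produced a result yet
  const areas = ['context', 'memory', 'dataflow', 'agent'];
  const pending = areas.find(a => state.diagnosis[a] === null);
  if (pending) return `action-diagnose-${pending}`;

  if (state.issues.length > 0 && state.proposed_fixes.length === 0) return 'action-propose-fixes';
  if (state.proposed_fixes.length > state.applied_fixes.length) return 'action-apply-fix';
  if (state.iteration_count < state.max_iterations && state.quality_score < 80) return 'action-verify';
  return 'action-complete';
}

const state = {
  completed_actions: ['action-init'],
  diagnosis: { context: null, memory: null, dataflow: null, agent: null },
  issues: [], proposed_fixes: [], applied_fixes: [],
  iteration_count: 0, max_iterations: 3, quality_score: 0,
};
console.log(chooseNextAction(state)); // action-diagnose-context
```

Because the function only reads the state object, the loop can be re-entered after a crash by reloading state.json and calling it again.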
## Directory Setup

```javascript
const timestamp = new Date().toISOString().slice(0, 19).replace(/[-:T]/g, '');
const workDir = `.workflow/.scratchpad/skill-tuning-${timestamp}`;

Bash(`mkdir -p "${workDir}/diagnosis"`);
Bash(`mkdir -p "${workDir}/backups"`);
Bash(`mkdir -p "${workDir}/fixes"`);
```

## Output Structure

```
.workflow/.scratchpad/skill-tuning-{timestamp}/
├── state.json                      # Session state (orchestrator-managed)
├── diagnosis/
│   ├── context-diagnosis.json      # Context explosion analysis
│   ├── memory-diagnosis.json       # Long-tail forgetting analysis
│   ├── dataflow-diagnosis.json     # Data flow analysis
│   └── agent-diagnosis.json        # Agent coordination analysis
├── backups/
│   └── {skill-name}-backup/        # Original skill files backup
├── fixes/
│   ├── fix-proposals.json          # Proposed fixes
│   └── applied-fixes.json          # Applied fix history
├── tuning-report.md                # Consolidated diagnosis report
└── tuning-summary.md               # Final summary
```

## State Schema

```typescript
interface TuningState {
  status: 'pending' | 'running' | 'completed' | 'failed';
  target_skill: {
    name: string;
    path: string;
    execution_mode: 'sequential' | 'autonomous';
  };
  user_issue_description: string;
  diagnosis: {
    context: DiagnosisResult | null;
    memory: DiagnosisResult | null;
    dataflow: DiagnosisResult | null;
    agent: DiagnosisResult | null;
  };
  issues: Issue[];
  proposed_fixes: Fix[];
  applied_fixes: AppliedFix[];
  iteration_count: number;
  max_iterations: number;
  quality_score: number;
  completed_actions: string[];
  current_action: string | null;
  errors: Error[];
  error_count: number;
}

interface DiagnosisResult {
  status: 'completed' | 'skipped';
  issues_found: number;
  severity: 'critical' | 'high' | 'medium' | 'low' | 'none';
  details: any;
}

interface Issue {
  id: string;
  type: 'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure';
  severity: 'critical' | 'high' | 'medium' | 'low';
  location: string;
  description: string;
  evidence: string[];
}

interface Fix {
  id: string;
  issue_id: string;
  strategy: string;
  description: string;
  changes: FileChange[];
  risk: 'low' | 'medium' | 'high';
}
```

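A freshly initialized state.json conforming to this schema might look like the sketch below. All field values are illustrative (the skill name, path, and issue description are placeholders, not part of the spec).

```javascript
// Illustrative initial state.json payload matching the TuningState schema above.
const initialState = {
  status: 'pending',
  target_skill: {
    name: 'my-skill',                       // hypothetical target skill
    path: '.claude/skills/my-skill',
    execution_mode: 'autonomous',
  },
  user_issue_description: 'Phase 3 forgets constraints set in Phase 1',
  diagnosis: { context: null, memory: null, dataflow: null, agent: null },
  issues: [],
  proposed_fixes: [],
  applied_fixes: [],
  iteration_count: 0,
  max_iterations: 3,
  quality_score: 0,
  completed_actions: [],
  current_action: null,
  errors: [],
  error_count: 0,
};

console.log(initialState.status); // pending
```

Each diagnosis slot starting as `null` is what lets the orchestrator tell "not yet run" apart from "ran and found nothing".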
## Reference Documents

| Document | Purpose |
|----------|---------|
| [phases/orchestrator.md](phases/orchestrator.md) | Orchestrator decision logic |
| [phases/state-schema.md](phases/state-schema.md) | State structure definition |
| [phases/actions/action-init.md](phases/actions/action-init.md) | Initialize tuning session |
| [phases/actions/action-diagnose-context.md](phases/actions/action-diagnose-context.md) | Context explosion diagnosis |
| [phases/actions/action-diagnose-memory.md](phases/actions/action-diagnose-memory.md) | Long-tail forgetting diagnosis |
| [phases/actions/action-diagnose-dataflow.md](phases/actions/action-diagnose-dataflow.md) | Data flow diagnosis |
| [phases/actions/action-diagnose-agent.md](phases/actions/action-diagnose-agent.md) | Agent coordination diagnosis |
| [phases/actions/action-generate-report.md](phases/actions/action-generate-report.md) | Report generation |
| [phases/actions/action-propose-fixes.md](phases/actions/action-propose-fixes.md) | Fix proposal |
| [phases/actions/action-apply-fix.md](phases/actions/action-apply-fix.md) | Fix application |
| [phases/actions/action-verify.md](phases/actions/action-verify.md) | Verification |
| [phases/actions/action-complete.md](phases/actions/action-complete.md) | Finalization |
| [specs/problem-taxonomy.md](specs/problem-taxonomy.md) | Problem classification |
| [specs/tuning-strategies.md](specs/tuning-strategies.md) | Fix strategies |
| [specs/quality-gates.md](specs/quality-gates.md) | Quality criteria |

164
.claude/skills/skill-tuning/phases/actions/action-abort.md
Normal file
@@ -0,0 +1,164 @@
# Action: Abort

Abort the tuning session due to unrecoverable errors.

## Purpose

- Safely terminate on critical failures
- Preserve diagnostic information for debugging
- Ensure backup remains available
- Notify user of failure reason

## Preconditions

- [ ] state.error_count >= state.max_errors
- [ ] OR critical failure detected

## Execution

```javascript
async function execute(state, workDir) {
  console.log('Aborting skill tuning session...');

  const errors = state.errors;
  const targetSkill = state.target_skill;

  // Generate abort report
  const abortReport = `# Skill Tuning Aborted

**Target Skill**: ${targetSkill?.name || 'Unknown'}
**Aborted At**: ${new Date().toISOString()}
**Reason**: Too many errors or critical failure

---

## Error Log

${errors.length === 0 ? '_No errors recorded_' :
  errors.map((err, i) => `
### Error ${i + 1}
- **Action**: ${err.action}
- **Message**: ${err.message}
- **Time**: ${err.timestamp}
- **Recoverable**: ${err.recoverable ? 'Yes' : 'No'}
`).join('\n')}

---

## Session State at Abort

- **Status**: ${state.status}
- **Iteration Count**: ${state.iteration_count}
- **Completed Actions**: ${state.completed_actions.length}
- **Issues Found**: ${state.issues.length}
- **Fixes Applied**: ${state.applied_fixes.length}

---

## Recovery Options

### Option 1: Restore Original Skill
If any changes were made, restore from backup:
\`\`\`bash
cp -r "${state.backup_dir}/${targetSkill?.name || 'backup'}-backup"/* "${targetSkill?.path || 'target'}/"
\`\`\`

### Option 2: Resume from Last State
The session state is preserved at:
\`${workDir}/state.json\`

To resume:
1. Fix the underlying issue
2. Reset error_count in state.json
3. Re-run skill-tuning with --resume flag

### Option 3: Manual Investigation
Review the following files:
- Diagnosis results: \`${workDir}/diagnosis/*.json\`
- Error log: \`${workDir}/errors.json\`
- State snapshot: \`${workDir}/state.json\`

---

## Diagnostic Information

### Last Successful Action
${state.completed_actions.length > 0 ? state.completed_actions[state.completed_actions.length - 1] : 'None'}

### Current Action When Failed
${state.current_action || 'Unknown'}

### Partial Diagnosis Results
- Context: ${state.diagnosis.context ? 'Completed' : 'Not completed'}
- Memory: ${state.diagnosis.memory ? 'Completed' : 'Not completed'}
- Data Flow: ${state.diagnosis.dataflow ? 'Completed' : 'Not completed'}
- Agent: ${state.diagnosis.agent ? 'Completed' : 'Not completed'}

---

*Skill tuning aborted - please review errors and retry*
`;

  // Write abort report and error log
  Write(`${workDir}/abort-report.md`, abortReport);
  Write(`${workDir}/errors.json`, JSON.stringify(errors, null, 2));

  // Ask whether to restore the original skill from backup
  try {
    const response = await AskUserQuestion({
      questions: [{
        question: `Skill tuning aborted due to ${errors.length} errors. Would you like to restore the original skill?`,
        header: 'Restore',
        multiSelect: false,
        options: [
          { label: 'Yes, restore', description: 'Restore original skill from backup' },
          { label: 'No, keep changes', description: 'Keep any partial changes made' }
        ]
      }]
    });
    if (response['Restore'] === 'Yes, restore' && state.backup_dir && targetSkill?.path) {
      Bash(`cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/"`);
      console.log('Original skill restored from backup.');
    }
  } catch (e) {
    // User cancelled; don't restore
  }

  return {
    stateUpdates: {
      status: 'failed',
      completed_at: new Date().toISOString()
    },
    outputFiles: [`${workDir}/abort-report.md`, `${workDir}/errors.json`],
    summary: `Tuning aborted: ${errors.length} errors. Check abort-report.md for details.`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    status: 'failed',
    completed_at: '<timestamp>'
  }
};
```

## Output

- **File**: `abort-report.md`
- **Location**: `${workDir}/abort-report.md`

## Error Handling

This action should not fail; it is the final error handler.

## Next Actions

- None (terminal state)

206
.claude/skills/skill-tuning/phases/actions/action-apply-fix.md
Normal file
@@ -0,0 +1,206 @@
# Action: Apply Fix

Apply a selected fix to the target skill with backup and rollback capability.

## Purpose

- Apply fix changes to target skill files
- Create backup before modifications
- Track applied fixes for verification
- Support rollback if needed

## Preconditions

- [ ] state.status === 'running'
- [ ] state.pending_fixes.length > 0
- [ ] state.proposed_fixes contains the fix to apply

## Execution

```javascript
async function execute(state, workDir) {
  const pendingFixes = state.pending_fixes;
  const proposedFixes = state.proposed_fixes;
  const targetPath = state.target_skill.path;
  const backupDir = state.backup_dir;

  if (pendingFixes.length === 0) {
    return {
      stateUpdates: {},
      outputFiles: [],
      summary: 'No pending fixes to apply'
    };
  }

  // Get next fix to apply
  const fixId = pendingFixes[0];
  const fix = proposedFixes.find(f => f.id === fixId);

  if (!fix) {
    return {
      stateUpdates: {
        pending_fixes: pendingFixes.slice(1),
        errors: [...state.errors, {
          action: 'action-apply-fix',
          message: `Fix ${fixId} not found in proposals`,
          timestamp: new Date().toISOString(),
          recoverable: true
        }]
      },
      outputFiles: [],
      summary: `Fix ${fixId} not found, skipping`
    };
  }

  console.log(`Applying fix ${fix.id}: ${fix.description}`);

  // Create fix-specific backup
  const fixBackupDir = `${backupDir}/before-${fix.id}`;
  Bash(`mkdir -p "${fixBackupDir}"`);

  const appliedChanges = [];
  let success = true;

  for (const change of fix.changes) {
    try {
      // Resolve file path (handle wildcards)
      let targetFiles = [];
      if (change.file.includes('*')) {
        targetFiles = Glob(`${targetPath}/${change.file}`);
      } else {
        targetFiles = [`${targetPath}/${change.file}`];
      }

      for (const targetFile of targetFiles) {
        // Backup original
        const relativePath = targetFile.replace(targetPath + '/', '');
        const backupPath = `${fixBackupDir}/${relativePath}`;

        if (Glob(targetFile).length > 0) {
          const originalContent = Read(targetFile);
          Bash(`mkdir -p "$(dirname "${backupPath}")"`);
          Write(backupPath, originalContent);
        }

        // Apply change based on action type
        if (change.action === 'modify' && change.diff) {
          // Simplified diff application: append a marker note.
          // A real implementation would parse the diff and apply its hunks.
          const existingContent = Read(targetFile);
          const newContent = existingContent + `\n\n<!-- Applied fix ${fix.id}: ${fix.description} -->\n`;
          Write(targetFile, newContent);

          appliedChanges.push({
            file: relativePath,
            action: 'modified',
            backup: backupPath
          });
        } else if (change.action === 'create') {
          Write(targetFile, change.new_content || '');
          appliedChanges.push({
            file: relativePath,
            action: 'created',
            backup: null
          });
        }
      }
    } catch (error) {
      console.log(`Error applying change to ${change.file}: ${error.message}`);
      success = false;
    }
  }

  // Record applied fix
  const appliedFix = {
    fix_id: fix.id,
    applied_at: new Date().toISOString(),
    success: success,
    backup_path: fixBackupDir,
    verification_result: 'pending',
    rollback_available: true,
    changes_made: appliedChanges
  };

  // Update applied fixes log
  const appliedFixesPath = `${workDir}/fixes/applied-fixes.json`;
  let existingApplied = [];
  try {
    existingApplied = JSON.parse(Read(appliedFixesPath));
  } catch (e) {
    existingApplied = [];
  }
  existingApplied.push(appliedFix);
  Write(appliedFixesPath, JSON.stringify(existingApplied, null, 2));

  return {
    stateUpdates: {
      applied_fixes: [...state.applied_fixes, appliedFix],
      pending_fixes: pendingFixes.slice(1) // Remove applied fix from pending
    },
    outputFiles: [appliedFixesPath],
    summary: `Applied fix ${fix.id}: ${success ? 'success' : 'partial'}, ${appliedChanges.length} files modified`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    applied_fixes: [...existingApplied, newAppliedFix],
    pending_fixes: remainingPendingFixes
  }
};
```

## Rollback Function

```javascript
async function rollbackFix(fixId, state, workDir) {
  const appliedFix = state.applied_fixes.find(f => f.fix_id === fixId);

  if (!appliedFix || !appliedFix.rollback_available) {
    throw new Error(`Cannot rollback fix ${fixId}`);
  }

  const backupDir = appliedFix.backup_path;
  const targetPath = state.target_skill.path;

  // Restore from backup
  const backupFiles = Glob(`${backupDir}/**/*`);
  for (const backupFile of backupFiles) {
    const relativePath = backupFile.replace(backupDir + '/', '');
    const targetFile = `${targetPath}/${relativePath}`;
    const content = Read(backupFile);
    Write(targetFile, content);
  }

  return {
    stateUpdates: {
      applied_fixes: state.applied_fixes.map(f =>
        f.fix_id === fixId
          ? { ...f, rollback_available: false, verification_result: 'rolled_back' }
          : f
      )
    }
  };
}
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| File not found | Skip file, log warning |
| Write permission error | Retry with elevated permissions or report |
| Backup creation failed | Abort fix, don't modify |

## Next Actions

- If pending_fixes.length > 0: action-apply-fix (continue)
- If all fixes applied: action-verify

195
.claude/skills/skill-tuning/phases/actions/action-complete.md
Normal file
@@ -0,0 +1,195 @@
# Action: Complete

Finalize the tuning session with summary report and cleanup.

## Purpose

- Generate final summary report
- Record tuning statistics
- Clean up temporary files (optional)
- Provide recommendations for future maintenance

## Preconditions

- [ ] state.status === 'running'
- [ ] quality_gate === 'pass' OR max_iterations reached

## Execution

```javascript
async function execute(state, workDir) {
  console.log('Finalizing skill tuning session...');

  const targetSkill = state.target_skill;
  const startTime = new Date(state.started_at);
  const endTime = new Date();
  const duration = Math.round((endTime - startTime) / 1000);

  // Generate final summary
  const summary = `# Skill Tuning Summary

**Target Skill**: ${targetSkill.name}
**Path**: ${targetSkill.path}
**Session Duration**: ${duration} seconds
**Completed**: ${endTime.toISOString()}

---

## Final Status

| Metric | Value |
|--------|-------|
| Final Health Score | ${state.quality_score}/100 |
| Quality Gate | ${state.quality_gate.toUpperCase()} |
| Total Iterations | ${state.iteration_count} |
| Issues Found | ${state.issues.length + state.applied_fixes.flatMap(f => f.issues_resolved || []).length} |
| Issues Resolved | ${state.applied_fixes.flatMap(f => f.issues_resolved || []).length} |
| Fixes Applied | ${state.applied_fixes.length} |
| Fixes Verified | ${state.applied_fixes.filter(f => f.verification_result === 'pass').length} |

---

## Diagnosis Summary

| Area | Issues Found | Severity |
|------|--------------|----------|
| Context Explosion | ${state.diagnosis.context?.issues_found || 'N/A'} | ${state.diagnosis.context?.severity || 'N/A'} |
| Long-tail Forgetting | ${state.diagnosis.memory?.issues_found || 'N/A'} | ${state.diagnosis.memory?.severity || 'N/A'} |
| Data Flow | ${state.diagnosis.dataflow?.issues_found || 'N/A'} | ${state.diagnosis.dataflow?.severity || 'N/A'} |
| Agent Coordination | ${state.diagnosis.agent?.issues_found || 'N/A'} | ${state.diagnosis.agent?.severity || 'N/A'} |

---

## Applied Fixes

${state.applied_fixes.length === 0 ? '_No fixes applied_' :
  state.applied_fixes.map((fix, i) => `
### ${i + 1}. ${fix.fix_id}

- **Applied At**: ${fix.applied_at}
- **Success**: ${fix.success ? 'Yes' : 'No'}
- **Verification**: ${fix.verification_result}
- **Rollback Available**: ${fix.rollback_available ? 'Yes' : 'No'}
`).join('\n')}

---

## Remaining Issues

${state.issues.length === 0 ? '✅ All issues resolved!' :
  `${state.issues.length} issues remain:\n\n` +
  state.issues.map(issue =>
    `- **[${issue.severity.toUpperCase()}]** ${issue.description} (${issue.id})`
  ).join('\n')}

---

## Recommendations

${generateRecommendations(state)}

---

## Backup Information

Original skill files backed up to:
\`${state.backup_dir}\`

To restore original skill:
\`\`\`bash
cp -r "${state.backup_dir}/${targetSkill.name}-backup"/* "${targetSkill.path}/"
\`\`\`

---

## Session Files

| File | Description |
|------|-------------|
| ${workDir}/tuning-report.md | Full diagnostic report |
| ${workDir}/diagnosis/*.json | Individual diagnosis results |
| ${workDir}/fixes/fix-proposals.json | Proposed fixes |
| ${workDir}/fixes/applied-fixes.json | Applied fix history |
| ${workDir}/tuning-summary.md | This summary |

---

*Skill tuning completed by skill-tuning*
`;

  Write(`${workDir}/tuning-summary.md`, summary);

  // Update final state
  return {
    stateUpdates: {
      status: 'completed',
      completed_at: endTime.toISOString()
    },
    outputFiles: [`${workDir}/tuning-summary.md`],
    summary: `Tuning complete: ${state.quality_gate} with ${state.quality_score}/100 health score`
  };
}

function generateRecommendations(state) {
  const recommendations = [];

  // Based on remaining issues
  if (state.issues.some(i => i.type === 'context_explosion')) {
    recommendations.push('- **Context Management**: Consider implementing a context summarization agent to prevent token growth');
  }

  if (state.issues.some(i => i.type === 'memory_loss')) {
    recommendations.push('- **Constraint Tracking**: Add explicit constraint injection to each phase prompt');
  }

  if (state.issues.some(i => i.type === 'dataflow_break')) {
    recommendations.push('- **State Centralization**: Migrate to single state.json with schema validation');
  }

  if (state.issues.some(i => i.type === 'agent_failure')) {
    recommendations.push('- **Error Handling**: Wrap all Task calls in try-catch blocks');
  }

  // General recommendations
  if (state.iteration_count >= state.max_iterations) {
    recommendations.push('- **Deep Refactoring**: Consider architectural review if issues persist after multiple iterations');
}
|
||||
|
||||
if (state.quality_score < 80) {
|
||||
recommendations.push('- **Regular Tuning**: Schedule periodic skill-tuning runs to catch issues early');
|
||||
}
|
||||
|
||||
if (recommendations.length === 0) {
|
||||
recommendations.push('- Skill is in good health! Monitor for regressions during future development.');
|
||||
}
|
||||
|
||||
return recommendations.join('\n');
|
||||
}
|
||||
```
|
||||
|
||||
## State Updates
|
||||
|
||||
```javascript
|
||||
return {
|
||||
stateUpdates: {
|
||||
status: 'completed',
|
||||
completed_at: '<timestamp>'
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
- **File**: `tuning-summary.md`
|
||||
- **Location**: `${workDir}/tuning-summary.md`
|
||||
- **Format**: Markdown
|
||||
|
||||
## Error Handling
|
||||
|
||||
| Error Type | Recovery |
|
||||
|------------|----------|
|
||||
| Summary write failed | Write to alternative location |
|
||||
|
||||
## Next Actions
|
||||
|
||||
- None (terminal state)
|
||||
@@ -0,0 +1,317 @@
# Action: Diagnose Agent Coordination

Analyze the target skill for agent coordination failures: fragile call chains and result-passing issues.

## Purpose

- Detect fragile agent call patterns
- Identify result-passing issues
- Find missing error handling in agent calls
- Analyze agent return format consistency

## Preconditions

- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'agent' in state.focus_areas OR state.focus_areas is empty

## Detection Patterns

### Pattern 1: Unhandled Agent Failures

```regex
# Task calls without try-catch or error handling
/Task\s*\(\s*\{[^}]*\}\s*\)(?![^;]*catch)/
```
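The fix this pattern points at can be sketched as a small guard wrapper. This is a hedged sketch, not part of the skill's API: `invoke` stands in for whatever performs the Task call, and `fallback` is an assumed caller-supplied default.

```javascript
// Hedged sketch: wrap an agent invocation so a failure degrades gracefully
// instead of crashing the workflow.
async function safeAgentCall(invoke, fallback) {
  try {
    const result = await invoke();
    // Treat empty results as failures too, so callers never see undefined
    return result !== undefined && result !== null ? result : fallback;
  } catch (err) {
    console.warn(`Agent call failed, using fallback: ${err.message}`);
    return fallback;
  }
}
```

A caller might use it as `await safeAgentCall(() => Task({...}), { status: 'failed' })`, so the orchestrator always receives a well-formed value.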
### Pattern 2: Missing Return Validation

```regex
# Agent result used directly without validation
/const\s+\w+\s*=\s*await?\s*Task\([^)]+\);\s*(?!.*(?:if|try|JSON\.parse))/
```

### Pattern 3: Inconsistent Agent Configuration

```regex
# Different agent configurations in the same skill
/subagent_type:\s*['"](\w+)['"]/g
```

### Pattern 4: Deeply Nested Agent Calls

```regex
# Agent calling another agent (nested)
/Task\s*\([^)]*prompt:[^)]*Task\s*\(/
```

## Execution

```javascript
async function execute(state, workDir) {
  const skillPath = state.target_skill.path;
  const startTime = Date.now();
  const issues = [];
  const evidence = [];

  console.log(`Diagnosing agent coordination in ${skillPath}...`);

  // 1. Find all Task/agent calls
  const allFiles = Glob(`${skillPath}/**/*.md`);
  const agentCalls = [];
  const agentTypes = new Set();

  for (const file of allFiles) {
    const content = Read(file);
    const relativePath = file.replace(skillPath + '/', '');

    // Find Task calls
    const taskMatches = content.matchAll(/Task\s*\(\s*\{([^}]+)\}/g);
    for (const match of taskMatches) {
      const config = match[1];

      // Extract agent type
      const typeMatch = config.match(/subagent_type:\s*['"]([^'"]+)['"]/);
      const agentType = typeMatch ? typeMatch[1] : 'unknown';
      agentTypes.add(agentType);

      // Check for error handling context
      const hasErrorHandling = /try\s*\{.*Task|\.catch\(|await\s+Task.*\.then/s.test(
        content.slice(Math.max(0, match.index - 100), match.index + match[0].length + 100)
      );

      // Check for result validation
      const hasResultValidation = /JSON\.parse|if\s*\(\s*result|result\s*\?\./s.test(
        content.slice(match.index, match.index + match[0].length + 200)
      );

      // Check for background execution
      const runsInBackground = /run_in_background:\s*true/.test(config);

      agentCalls.push({
        file: relativePath,
        agentType,
        hasErrorHandling,
        hasResultValidation,
        runsInBackground,
        config: config.slice(0, 200)
      });
    }
  }

  // 2. Analyze agent call patterns
  const totalCalls = agentCalls.length;
  const callsWithoutErrorHandling = agentCalls.filter(c => !c.hasErrorHandling);
  const callsWithoutValidation = agentCalls.filter(c => !c.hasResultValidation);

  // Issue: Missing error handling
  if (callsWithoutErrorHandling.length > 0) {
    issues.push({
      id: `AGT-${issues.length + 1}`,
      type: 'agent_failure',
      severity: callsWithoutErrorHandling.length > 2 ? 'high' : 'medium',
      location: { file: 'multiple' },
      description: `${callsWithoutErrorHandling.length}/${totalCalls} agent calls lack error handling`,
      evidence: callsWithoutErrorHandling.slice(0, 3).map(c =>
        `${c.file}: ${c.agentType}`
      ),
      root_cause: 'Agent failures not caught, may crash workflow',
      impact: 'Unhandled agent errors cause cascading failures',
      suggested_fix: 'Wrap Task calls in try-catch with graceful fallback'
    });
    evidence.push({
      file: 'multiple',
      pattern: 'missing_error_handling',
      context: `${callsWithoutErrorHandling.length} calls affected`,
      severity: 'high'
    });
  }

  // Issue: Missing result validation
  if (callsWithoutValidation.length > 0) {
    issues.push({
      id: `AGT-${issues.length + 1}`,
      type: 'agent_failure',
      severity: 'medium',
      location: { file: 'multiple' },
      description: `${callsWithoutValidation.length}/${totalCalls} agent calls lack result validation`,
      evidence: callsWithoutValidation.slice(0, 3).map(c =>
        `${c.file}: ${c.agentType} result not validated`
      ),
      root_cause: 'Agent results used directly without type checking',
      impact: 'Invalid agent output may corrupt state',
      suggested_fix: 'Add JSON.parse with try-catch and schema validation'
    });
  }

  // 3. Check for inconsistent agent type usage
  if (agentTypes.size > 3 && state.target_skill.execution_mode === 'autonomous') {
    issues.push({
      id: `AGT-${issues.length + 1}`,
      type: 'agent_failure',
      severity: 'low',
      location: { file: 'multiple' },
      description: `Using ${agentTypes.size} different agent types`,
      evidence: [...agentTypes].slice(0, 5),
      root_cause: 'Multiple agent types increase coordination complexity',
      impact: 'Different agent behaviors may cause inconsistency',
      suggested_fix: 'Standardize on fewer agent types with clear roles'
    });
  }

  // 4. Check for nested agent calls
  for (const file of allFiles) {
    const content = Read(file);
    const relativePath = file.replace(skillPath + '/', '');

    // Detect nested Task calls
    const hasNestedTask = /Task\s*\([^)]*prompt:[^)]*Task\s*\(/s.test(content);

    if (hasNestedTask) {
      issues.push({
        id: `AGT-${issues.length + 1}`,
        type: 'agent_failure',
        severity: 'high',
        location: { file: relativePath },
        description: 'Nested agent calls detected',
        evidence: ['Agent prompt contains another Task call'],
        root_cause: 'Agent calls another agent, creating deep nesting',
        impact: 'Context explosion, hard to debug, unpredictable behavior',
        suggested_fix: 'Flatten agent calls, use orchestrator to coordinate'
      });
    }
  }

  // 5. Check SKILL.md for agent configuration consistency
  const skillMd = Read(`${skillPath}/SKILL.md`);

  // Check if allowed-tools includes Task
  const allowedTools = skillMd.match(/allowed-tools:\s*([^\n]+)/i);
  if (allowedTools && !allowedTools[1].includes('Task') && totalCalls > 0) {
    issues.push({
      id: `AGT-${issues.length + 1}`,
      type: 'agent_failure',
      severity: 'medium',
      location: { file: 'SKILL.md' },
      description: 'Task tool used but not declared in allowed-tools',
      evidence: [`${totalCalls} Task calls found, but Task not in allowed-tools`],
      root_cause: 'Tool declaration mismatch',
      impact: 'May cause runtime permission issues',
      suggested_fix: 'Add Task to allowed-tools in SKILL.md front matter'
    });
  }

  // 6. Check for agent result format consistency
  const returnFormats = new Set();
  for (const file of allFiles) {
    const content = Read(file);

    // Look for return format definitions
    const returnMatch = content.match(/\[RETURN\][^[]*|return\s*\{[^}]+\}/gi);
    if (returnMatch) {
      returnMatch.forEach(r => {
        const format = r.includes('JSON') ? 'json' :
                       r.includes('summary') ? 'summary' :
                       r.includes('file') ? 'file_path' : 'other';
        returnFormats.add(format);
      });
    }
  }

  if (returnFormats.size > 2) {
    issues.push({
      id: `AGT-${issues.length + 1}`,
      type: 'agent_failure',
      severity: 'medium',
      location: { file: 'multiple' },
      description: 'Inconsistent agent return formats',
      evidence: [...returnFormats],
      root_cause: 'Different agents return data in different formats',
      impact: 'Orchestrator must handle multiple format types',
      suggested_fix: 'Standardize return format: {status, output_file, summary}'
    });
  }

  // 7. Calculate severity
  const criticalCount = issues.filter(i => i.severity === 'critical').length;
  const highCount = issues.filter(i => i.severity === 'high').length;
  const severity = criticalCount > 0 ? 'critical' :
                   highCount > 1 ? 'high' :
                   highCount > 0 ? 'medium' :
                   issues.length > 0 ? 'low' : 'none';

  // 8. Write diagnosis result
  const diagnosisResult = {
    status: 'completed',
    issues_found: issues.length,
    severity: severity,
    execution_time_ms: Date.now() - startTime,
    details: {
      patterns_checked: [
        'error_handling',
        'result_validation',
        'agent_type_consistency',
        'nested_calls',
        'return_format_consistency'
      ],
      patterns_matched: evidence.map(e => e.pattern),
      evidence: evidence,
      agent_analysis: {
        total_agent_calls: totalCalls,
        unique_agent_types: agentTypes.size,
        calls_without_error_handling: callsWithoutErrorHandling.length,
        calls_without_validation: callsWithoutValidation.length,
        agent_types_used: [...agentTypes]
      },
      recommendations: [
        callsWithoutErrorHandling.length > 0
          ? 'Add try-catch to all Task calls' : null,
        callsWithoutValidation.length > 0
          ? 'Add result validation with JSON.parse and schema check' : null,
        agentTypes.size > 3
          ? 'Consolidate agent types for consistency' : null
      ].filter(Boolean)
    }
  };

  Write(`${workDir}/diagnosis/agent-diagnosis.json`,
    JSON.stringify(diagnosisResult, null, 2));

  return {
    stateUpdates: {
      'diagnosis.agent': diagnosisResult,
      issues: [...state.issues, ...issues]
    },
    outputFiles: [`${workDir}/diagnosis/agent-diagnosis.json`],
    summary: `Agent diagnosis: ${issues.length} issues found (severity: ${severity})`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    'diagnosis.agent': {
      status: 'completed',
      issues_found: <count>,
      severity: '<critical|high|medium|low|none>',
      // ... full diagnosis result
    },
    issues: [...existingIssues, ...newIssues]
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Regex match error | Use simpler patterns |
| File access error | Skip and continue |

## Next Actions

- Success: action-generate-report
- Skipped: If 'agent' not in focus_areas
@@ -0,0 +1,243 @@
# Action: Diagnose Context Explosion

Analyze the target skill for context explosion issues: token accumulation and multi-turn dialogue bloat.

## Purpose

- Detect patterns that cause context growth
- Identify multi-turn accumulation points
- Find missing context compression mechanisms
- Measure potential token waste

## Preconditions

- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'context' in state.focus_areas OR state.focus_areas is empty

## Detection Patterns

### Pattern 1: Unbounded History Accumulation

```regex
# Patterns that suggest history accumulation
/\bhistory\b.*\.push\b/
/\bmessages\b.*\.concat\b/
/\bconversation\b.*\+=\b/
/\bappend.*context\b/i
```

### Pattern 2: Full Content Passing

```regex
# Patterns that pass full content instead of references
/Read\([^)]+\).*\+.*Read\(/
/JSON\.stringify\(.*state\)/   # Full state serialization
/\$\{.*content\}/              # Template literal with full content
```

### Pattern 3: Missing Summarization

```regex
# Absence of compression/summarization
# Check for lack of: summarize, compress, truncate, slice
```
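A minimal sketch of the mitigation this pattern looks for: keep the first entry (where system constraints usually live) plus a sliding window of recent turns, replacing the dropped middle with a marker. The `{role, content}` shape of the history entries is an assumption for illustration, not the skill's actual format.

```javascript
// Hedged sketch: bound a conversation history to `maxRecent` recent entries
// while always preserving the first entry.
function trimHistory(history, maxRecent) {
  // Nothing to trim: first entry + window already fits
  if (history.length <= maxRecent + 1) return history.slice();
  const dropped = history.length - maxRecent - 1;
  return [
    history[0],
    { role: 'system', content: `[${dropped} earlier turns elided]` },
    ...history.slice(-maxRecent)
  ];
}
```

In a real skill the elision marker would be replaced by an actual summary produced by a summarization agent; the windowing logic stays the same.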
### Pattern 4: Agent Return Bloat

```regex
# Agent returning full content instead of path + summary
/return\s*\{[^}]*content:/
/return.*JSON\.stringify/
```

## Execution

```javascript
async function execute(state, workDir) {
  const skillPath = state.target_skill.path;
  const startTime = Date.now();
  const issues = [];
  const evidence = [];

  console.log(`Diagnosing context explosion in ${skillPath}...`);

  // 1. Scan all phase files
  const phaseFiles = Glob(`${skillPath}/phases/**/*.md`);

  for (const file of phaseFiles) {
    const content = Read(file);
    const relativePath = file.replace(skillPath + '/', '');

    // Check Pattern 1: History accumulation
    const historyPatterns = [
      /history\s*[.=].*push|concat|append/gi,
      /messages\s*=\s*\[.*\.\.\..*messages/gi,
      /conversation.*\+=/gi
    ];

    for (const pattern of historyPatterns) {
      const matches = content.match(pattern);
      if (matches) {
        issues.push({
          id: `CTX-${issues.length + 1}`,
          type: 'context_explosion',
          severity: 'high',
          location: { file: relativePath },
          description: 'Unbounded history accumulation detected',
          evidence: matches.slice(0, 3),
          root_cause: 'History/messages array grows without bounds',
          impact: 'Token count increases linearly with iterations',
          suggested_fix: 'Implement sliding window or summarization'
        });
        evidence.push({
          file: relativePath,
          pattern: 'history_accumulation',
          context: matches[0],
          severity: 'high'
        });
      }
    }

    // Check Pattern 2: Full content passing
    const contentPatterns = [
      /Read\s*\([^)]+\)\s*[\+,]/g,
      /JSON\.stringify\s*\(\s*state\s*\)/g,
      /\$\{[^}]*content[^}]*\}/g
    ];

    for (const pattern of contentPatterns) {
      const matches = content.match(pattern);
      if (matches) {
        issues.push({
          id: `CTX-${issues.length + 1}`,
          type: 'context_explosion',
          severity: 'medium',
          location: { file: relativePath },
          description: 'Full content passed instead of reference',
          evidence: matches.slice(0, 3),
          root_cause: 'Entire file/state content included in prompts',
          impact: 'Unnecessary token consumption',
          suggested_fix: 'Pass file paths and summaries instead of full content'
        });
        evidence.push({
          file: relativePath,
          pattern: 'full_content_passing',
          context: matches[0],
          severity: 'medium'
        });
      }
    }

    // Check Pattern 3: Missing summarization
    const hasSummarization = /summariz|compress|truncat|slice.*context/i.test(content);
    const hasLongPrompts = content.length > 5000;

    if (hasLongPrompts && !hasSummarization) {
      issues.push({
        id: `CTX-${issues.length + 1}`,
        type: 'context_explosion',
        severity: 'medium',
        location: { file: relativePath },
        description: 'Long phase file without summarization mechanism',
        evidence: [`File length: ${content.length} chars`],
        root_cause: 'No context compression for large content',
        impact: 'Potential token overflow in long sessions',
        suggested_fix: 'Add context summarization before passing to agents'
      });
    }

    // Check Pattern 4: Agent return bloat
    const returnPatterns = /return\s*\{[^}]*(?:content|full_output|complete_result):/g;
    const returnMatches = content.match(returnPatterns);
    if (returnMatches) {
      issues.push({
        id: `CTX-${issues.length + 1}`,
        type: 'context_explosion',
        severity: 'high',
        location: { file: relativePath },
        description: 'Agent returns full content instead of path+summary',
        evidence: returnMatches.slice(0, 3),
        root_cause: 'Agent output includes complete content',
        impact: 'Context bloat when orchestrator receives full output',
        suggested_fix: 'Return {output_file, summary} instead of {content}'
      });
    }
  }

  // 2. Calculate severity
  const criticalCount = issues.filter(i => i.severity === 'critical').length;
  const highCount = issues.filter(i => i.severity === 'high').length;
  const severity = criticalCount > 0 ? 'critical' :
                   highCount > 2 ? 'high' :
                   highCount > 0 ? 'medium' :
                   issues.length > 0 ? 'low' : 'none';

  // 3. Write diagnosis result
  const diagnosisResult = {
    status: 'completed',
    issues_found: issues.length,
    severity: severity,
    execution_time_ms: Date.now() - startTime,
    details: {
      patterns_checked: [
        'history_accumulation',
        'full_content_passing',
        'missing_summarization',
        'agent_return_bloat'
      ],
      patterns_matched: evidence.map(e => e.pattern),
      evidence: evidence,
      recommendations: [
        issues.length > 0 ? 'Implement context summarization agent' : null,
        highCount > 0 ? 'Add sliding window for conversation history' : null,
        evidence.some(e => e.pattern === 'full_content_passing')
          ? 'Refactor to pass file paths instead of content' : null
      ].filter(Boolean)
    }
  };

  Write(`${workDir}/diagnosis/context-diagnosis.json`,
    JSON.stringify(diagnosisResult, null, 2));

  return {
    stateUpdates: {
      'diagnosis.context': diagnosisResult,
      issues: [...state.issues, ...issues],
      'issues_by_severity.critical': state.issues_by_severity.critical + criticalCount,
      'issues_by_severity.high': state.issues_by_severity.high + highCount
    },
    outputFiles: [`${workDir}/diagnosis/context-diagnosis.json`],
    summary: `Context diagnosis: ${issues.length} issues found (severity: ${severity})`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    'diagnosis.context': {
      status: 'completed',
      issues_found: <count>,
      severity: '<critical|high|medium|low|none>',
      // ... full diagnosis result
    },
    issues: [...existingIssues, ...newIssues]
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| File read error | Skip file, log warning |
| Pattern matching error | Use fallback patterns |
| Write error | Retry to alternative path |

## Next Actions

- Success: action-diagnose-memory (or next in focus_areas)
- Skipped: If 'context' not in focus_areas
@@ -0,0 +1,318 @@
# Action: Diagnose Data Flow Issues

Analyze the target skill for data flow disruption: state inconsistencies and format variations.

## Purpose

- Detect inconsistent data formats between phases
- Identify scattered state storage
- Find missing data contracts
- Measure state transition integrity

## Preconditions

- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'dataflow' in state.focus_areas OR state.focus_areas is empty

## Detection Patterns

### Pattern 1: Multiple Storage Locations

```regex
# Data written to multiple paths without centralization
/Write\s*\(\s*[`'"][^`'"]+[`'"]/g
```

### Pattern 2: Inconsistent Field Names

```regex
# Same concept with different names: title/name, id/identifier
```
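A normalization layer for such aliases might look like the following sketch. The alias map and its canonical directions are illustrative assumptions matching the pairs this diagnostic checks, not a definition from the skill itself.

```javascript
// Hedged sketch: map known field-name aliases onto one canonical spelling
// so downstream phases can rely on a single name per concept.
const FIELD_ALIASES = { title: 'name', identifier: 'id' };  // assumed aliases

function normalizeFields(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    // Rename aliased keys; pass everything else through unchanged
    out[FIELD_ALIASES[key] || key] = value;
  }
  return out;
}
```

Running every record through one such function at phase boundaries turns a naming mismatch from silent data loss into a non-issue.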
### Pattern 3: Missing Schema Validation

```regex
# Absence of validation before state write
# Look for lack of: validate, schema, check, verify
```
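One way to satisfy this check without a library is a manual guard run after every JSON.parse. The required fields below (`status`, `issues`) are assumptions chosen for illustration; a real skill would validate against its own state-schema.md.

```javascript
// Hedged sketch: validate parsed state before it is used or written back.
// Returns the object on success, throws a descriptive error otherwise.
function validateState(obj) {
  if (typeof obj !== 'object' || obj === null) {
    throw new Error('state must be an object');
  }
  for (const field of ['status', 'issues']) {  // assumed required fields
    if (!(field in obj)) throw new Error(`state missing required field: ${field}`);
  }
  if (!Array.isArray(obj.issues)) throw new Error('state.issues must be an array');
  return obj;
}
```

Wrapping reads as `validateState(JSON.parse(raw))` makes schema violations fail loudly at the boundary instead of propagating through later phases.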
|
||||
|
||||
### Pattern 4: Format Transformation Without Normalization
|
||||
|
||||
```regex
|
||||
# Direct JSON.parse without error handling or normalization
|
||||
/JSON\.parse\([^)]+\)(?!\s*\|\|)/
|
||||
```
|
||||
|
||||
## Execution
|
||||
|
||||
```javascript
|
||||
async function execute(state, workDir) {
|
||||
const skillPath = state.target_skill.path;
|
||||
const startTime = Date.now();
|
||||
const issues = [];
|
||||
const evidence = [];
|
||||
|
||||
console.log(`Diagnosing data flow in ${skillPath}...`);
|
||||
|
||||
// 1. Collect all Write operations to map data storage
|
||||
const allFiles = Glob(`${skillPath}/**/*.md`);
|
||||
const writeLocations = [];
|
||||
const readLocations = [];
|
||||
|
||||
for (const file of allFiles) {
|
||||
const content = Read(file);
|
||||
const relativePath = file.replace(skillPath + '/', '');
|
||||
|
||||
// Find Write operations
|
||||
const writeMatches = content.matchAll(/Write\s*\(\s*[`'"]([^`'"]+)[`'"]/g);
|
||||
for (const match of writeMatches) {
|
||||
writeLocations.push({
|
||||
file: relativePath,
|
||||
target: match[1],
|
||||
isStateFile: match[1].includes('state.json') || match[1].includes('config.json')
|
||||
});
|
||||
}
|
||||
|
||||
// Find Read operations
|
||||
const readMatches = content.matchAll(/Read\s*\(\s*[`'"]([^`'"]+)[`'"]/g);
|
||||
for (const match of readMatches) {
|
||||
readLocations.push({
|
||||
file: relativePath,
|
||||
source: match[1]
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// 2. Check for scattered state storage
|
||||
const stateTargets = writeLocations
|
||||
.filter(w => w.isStateFile)
|
||||
.map(w => w.target);
|
||||
|
||||
const uniqueStateFiles = [...new Set(stateTargets)];
|
||||
|
||||
if (uniqueStateFiles.length > 2) {
|
||||
issues.push({
|
||||
id: `DF-${issues.length + 1}`,
|
||||
type: 'dataflow_break',
|
||||
severity: 'high',
|
||||
location: { file: 'multiple' },
|
||||
description: `State stored in ${uniqueStateFiles.length} different locations`,
|
||||
evidence: uniqueStateFiles.slice(0, 5),
|
||||
root_cause: 'No centralized state management',
|
||||
impact: 'State inconsistency between phases',
|
||||
suggested_fix: 'Centralize state to single state.json with state manager'
|
||||
});
|
||||
evidence.push({
|
||||
file: 'multiple',
|
||||
pattern: 'scattered_state',
|
||||
context: uniqueStateFiles.join(', '),
|
||||
severity: 'high'
|
||||
});
|
||||
}
|
||||
|
||||
// 3. Check for inconsistent field naming
|
||||
const fieldNamePatterns = {
|
||||
'name_vs_title': [/\.name\b/, /\.title\b/],
|
||||
'id_vs_identifier': [/\.id\b/, /\.identifier\b/],
|
||||
'status_vs_state': [/\.status\b/, /\.state\b/],
|
||||
'error_vs_errors': [/\.error\b/, /\.errors\b/]
|
||||
};
|
||||
|
||||
const fieldUsage = {};
|
||||
|
||||
for (const file of allFiles) {
|
||||
const content = Read(file);
|
||||
const relativePath = file.replace(skillPath + '/', '');
|
||||
|
||||
for (const [patternName, patterns] of Object.entries(fieldNamePatterns)) {
|
||||
for (const pattern of patterns) {
|
||||
if (pattern.test(content)) {
|
||||
if (!fieldUsage[patternName]) fieldUsage[patternName] = [];
|
||||
fieldUsage[patternName].push({
|
||||
file: relativePath,
|
||||
pattern: pattern.toString()
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (const [patternName, usages] of Object.entries(fieldUsage)) {
|
||||
const uniquePatterns = [...new Set(usages.map(u => u.pattern))];
|
||||
if (uniquePatterns.length > 1) {
|
||||
issues.push({
|
||||
id: `DF-${issues.length + 1}`,
|
||||
type: 'dataflow_break',
|
||||
severity: 'medium',
|
||||
location: { file: 'multiple' },
|
||||
description: `Inconsistent field naming: ${patternName.replace('_vs_', ' vs ')}`,
|
||||
evidence: usages.slice(0, 3).map(u => `${u.file}: ${u.pattern}`),
|
||||
root_cause: 'Same concept referred to with different field names',
|
||||
impact: 'Data may be lost during field access',
|
||||
suggested_fix: `Standardize to single field name, add normalization function`
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// 4. Check for missing schema validation
|
||||
for (const file of allFiles) {
|
||||
const content = Read(file);
|
||||
const relativePath = file.replace(skillPath + '/', '');
|
||||
|
||||
// Find JSON.parse without validation
|
||||
const unsafeParses = content.match(/JSON\.parse\s*\([^)]+\)(?!\s*\?\?|\s*\|\|)/g);
|
||||
const hasValidation = /validat|schema|type.*check/i.test(content);
|
||||
|
||||
if (unsafeParses && unsafeParses.length > 0 && !hasValidation) {
|
||||
issues.push({
|
||||
id: `DF-${issues.length + 1}`,
|
||||
type: 'dataflow_break',
|
||||
severity: 'medium',
|
||||
location: { file: relativePath },
|
||||
description: 'JSON parsing without validation',
|
||||
evidence: unsafeParses.slice(0, 2),
|
||||
root_cause: 'No schema validation after parsing',
|
||||
impact: 'Invalid data may propagate through phases',
|
||||
suggested_fix: 'Add schema validation after JSON.parse'
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// 5. Check state schema if exists
|
||||
const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0];
|
||||
if (stateSchemaFile) {
|
||||
const schemaContent = Read(stateSchemaFile);
|
||||
|
||||
// Check for type definitions
|
||||
const hasTypeScript = /interface\s+\w+|type\s+\w+\s*=/i.test(schemaContent);
|
||||
const hasValidationFunction = /function\s+validate|validateState/i.test(schemaContent);
|
||||
|
||||
if (hasTypeScript && !hasValidationFunction) {
|
||||
issues.push({
|
||||
id: `DF-${issues.length + 1}`,
|
||||
type: 'dataflow_break',
|
||||
severity: 'low',
|
||||
location: { file: 'phases/state-schema.md' },
|
||||
description: 'Type definitions without runtime validation',
|
||||
evidence: ['TypeScript interfaces defined but no validation function'],
|
||||
root_cause: 'Types are compile-time only, not enforced at runtime',
|
||||
impact: 'Schema violations may occur at runtime',
|
||||
suggested_fix: 'Add validateState() function using Zod or manual checks'
|
||||
});
|
||||
}
|
||||
} else if (state.target_skill.execution_mode === 'autonomous') {
|
||||
issues.push({
|
||||
id: `DF-${issues.length + 1}`,
|
||||
type: 'dataflow_break',
|
||||
severity: 'high',
|
||||
location: { file: 'phases/' },
|
||||
      description: 'Autonomous skill missing state-schema.md',
      evidence: ['No state schema definition found'],
      root_cause: 'State structure undefined for orchestrator',
      impact: 'Inconsistent state handling across actions',
      suggested_fix: 'Create phases/state-schema.md with explicit type definitions'
    });
  }

  // 6. Check read-write alignment
  const writtenFiles = new Set(writeLocations.map(w => w.target));
  const readFiles = new Set(readLocations.map(r => r.source));

  const writtenButNotRead = [...writtenFiles].filter(f =>
    !readFiles.has(f) && !f.includes('output') && !f.includes('report')
  );

  if (writtenButNotRead.length > 0) {
    issues.push({
      id: `DF-${issues.length + 1}`,
      type: 'dataflow_break',
      severity: 'low',
      location: { file: 'multiple' },
      description: 'Files written but never read',
      evidence: writtenButNotRead.slice(0, 3),
      root_cause: 'Orphaned output files',
      impact: 'Wasted storage and potential confusion',
      suggested_fix: 'Remove unused writes or add reads where needed'
    });
  }

  // 7. Calculate severity
  const criticalCount = issues.filter(i => i.severity === 'critical').length;
  const highCount = issues.filter(i => i.severity === 'high').length;
  const severity = criticalCount > 0 ? 'critical' :
                   highCount > 1 ? 'high' :
                   highCount > 0 ? 'medium' :
                   issues.length > 0 ? 'low' : 'none';

  // 8. Write diagnosis result
  const diagnosisResult = {
    status: 'completed',
    issues_found: issues.length,
    severity: severity,
    execution_time_ms: Date.now() - startTime,
    details: {
      patterns_checked: [
        'scattered_state',
        'inconsistent_naming',
        'missing_validation',
        'read_write_alignment'
      ],
      patterns_matched: evidence.map(e => e.pattern),
      evidence: evidence,
      data_flow_map: {
        write_locations: writeLocations.length,
        read_locations: readLocations.length,
        unique_state_files: uniqueStateFiles.length
      },
      recommendations: [
        uniqueStateFiles.length > 2 ? 'Implement centralized state manager' : null,
        issues.some(i => i.description.includes('naming'))
          ? 'Create normalization layer for field names' : null,
        issues.some(i => i.description.includes('validation'))
          ? 'Add Zod or JSON Schema validation' : null
      ].filter(Boolean)
    }
  };

  Write(`${workDir}/diagnosis/dataflow-diagnosis.json`,
    JSON.stringify(diagnosisResult, null, 2));

  return {
    stateUpdates: {
      'diagnosis.dataflow': diagnosisResult,
      issues: [...state.issues, ...issues]
    },
    outputFiles: [`${workDir}/diagnosis/dataflow-diagnosis.json`],
    summary: `Data flow diagnosis: ${issues.length} issues found (severity: ${severity})`
  };
}
```
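The write/read bookkeeping in step 6 reduces to a set difference over file paths; a minimal standalone sketch of the same check (the sample paths are illustrative):

```javascript
// Which files does a skill write but never read back?
// Terminal artifacts (paths containing 'output' or 'report') are not orphans.
function writtenButNotRead(writeLocations, readLocations) {
  const written = new Set(writeLocations.map(w => w.target));
  const read = new Set(readLocations.map(r => r.source));
  return [...written].filter(f =>
    !read.has(f) && !f.includes('output') && !f.includes('report')
  );
}

const writes = [
  { target: 'state/session.json' },
  { target: 'state/scratch.json' },
  { target: 'output/report.md' }
];
const reads = [{ source: 'state/session.json' }];

console.log(writtenButNotRead(writes, reads)); // → [ 'state/scratch.json' ]
```

Such orphans usually point at a phase that was refactored away while its output write survived.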

## State Updates

```javascript
return {
  stateUpdates: {
    'diagnosis.dataflow': {
      status: 'completed',
      issues_found: <count>,
      severity: '<critical|high|medium|low|none>',
      // ... full diagnosis result
    },
    issues: [...existingIssues, ...newIssues]
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Glob pattern error | Use fallback patterns |
| File read error | Skip and continue |

## Next Actions

- Success: action-diagnose-agent (or next in focus_areas)
- Skipped: If 'dataflow' not in focus_areas

@@ -0,0 +1,269 @@

# Action: Diagnose Long-tail Forgetting

Analyze the target skill for long-tail effects and constraint-forgetting issues.

## Purpose

- Detect loss of early instructions in long execution chains
- Identify missing constraint propagation mechanisms
- Find weak goal alignment between phases
- Measure instruction retention across phases

## Preconditions

- [ ] state.status === 'running'
- [ ] state.target_skill.path is set
- [ ] 'memory' in state.focus_areas OR state.focus_areas is empty

## Detection Patterns

### Pattern 1: Missing Constraint References

```regex
# Phases that don't reference original requirements
# Look for absence of: requirements, constraints, original, initial, user_request
```

### Pattern 2: Goal Drift

```regex
# Later phases focus on immediate task without global context
/\[TASK\][^[]*(?!\[CONSTRAINTS\]|\[REQUIREMENTS\])/
```

### Pattern 3: No Checkpoint Mechanism

```regex
# Absence of state preservation at key points
# Look for lack of: checkpoint, snapshot, preserve, restore
```

### Pattern 4: Implicit State Passing

```regex
# State passed implicitly through conversation rather than explicitly
/(?<!state\.)context\./
```
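Pattern 4's lookbehind can be verified in isolation; a quick sketch (the sample lines are illustrative):

```javascript
// Flags `context.` accesses that are not routed through `state.context.`
const implicitContext = /(?<!state\.)context\./;

console.log(implicitContext.test('const x = context.userGoal;'));       // true: implicit
console.log(implicitContext.test('const y = state.context.userGoal;')); // false: explicit
```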

## Execution

```javascript
async function execute(state, workDir) {
  const skillPath = state.target_skill.path;
  const startTime = Date.now();
  const issues = [];
  const evidence = [];

  console.log(`Diagnosing long-tail forgetting in ${skillPath}...`);

  // 1. Analyze phase chain for constraint propagation
  const phaseFiles = Glob(`${skillPath}/phases/*.md`)
    .filter(f => !f.includes('orchestrator') && !f.includes('state-schema'))
    .sort();

  // Extract phase order (for sequential) or action dependencies (for autonomous)
  const isAutonomous = state.target_skill.execution_mode === 'autonomous';

  // 2. Check each phase for constraint awareness
  let firstPhaseConstraints = [];

  for (let i = 0; i < phaseFiles.length; i++) {
    const file = phaseFiles[i];
    const content = Read(file);
    const relativePath = file.replace(skillPath + '/', '');
    const phaseNum = i + 1;

    // Extract constraints from first phase
    if (i === 0) {
      const constraintMatch = content.match(/\[CONSTRAINTS?\]([^[]*)/i);
      if (constraintMatch) {
        firstPhaseConstraints = constraintMatch[1]
          .split('\n')
          .filter(l => l.trim().startsWith('-'))
          .map(l => l.trim().replace(/^-\s*/, ''));
      }
    }

    // Check if later phases reference original constraints
    if (i > 0 && firstPhaseConstraints.length > 0) {
      const mentionsConstraints = firstPhaseConstraints.some(c =>
        content.toLowerCase().includes(c.toLowerCase().slice(0, 20))
      );

      if (!mentionsConstraints) {
        issues.push({
          id: `MEM-${issues.length + 1}`,
          type: 'memory_loss',
          severity: 'high',
          location: { file: relativePath, phase: `Phase ${phaseNum}` },
          description: `Phase ${phaseNum} does not reference original constraints`,
          evidence: [`Original constraints: ${firstPhaseConstraints.slice(0, 3).join(', ')}`],
          root_cause: 'Constraint information not propagated to later phases',
          impact: 'May produce output violating original requirements',
          suggested_fix: 'Add explicit constraint injection or reference to state.original_constraints'
        });
        evidence.push({
          file: relativePath,
          pattern: 'missing_constraint_reference',
          context: `Phase ${phaseNum} of ${phaseFiles.length}`,
          severity: 'high'
        });
      }
    }

    // Check for goal drift - task without constraints
    const hasTask = /\[TASK\]/i.test(content);
    const hasConstraints = /\[CONSTRAINTS?\]|\[REQUIREMENTS?\]|\[RULES?\]/i.test(content);

    if (hasTask && !hasConstraints && i > 1) {
      issues.push({
        id: `MEM-${issues.length + 1}`,
        type: 'memory_loss',
        severity: 'medium',
        location: { file: relativePath },
        description: 'Phase has TASK but no CONSTRAINTS/RULES section',
        evidence: ['Task defined without boundary constraints'],
        root_cause: 'Agent may not adhere to global constraints',
        impact: 'Potential goal drift from original intent',
        suggested_fix: 'Add [CONSTRAINTS] section referencing global rules'
      });
    }

    // Check for checkpoint mechanism
    const hasCheckpoint = /checkpoint|snapshot|preserve|savepoint/i.test(content);
    const isKeyPhase = i === Math.floor(phaseFiles.length / 2) || i === phaseFiles.length - 1;

    if (isKeyPhase && !hasCheckpoint && phaseFiles.length > 3) {
      issues.push({
        id: `MEM-${issues.length + 1}`,
        type: 'memory_loss',
        severity: 'low',
        location: { file: relativePath },
        description: 'Key phase without checkpoint mechanism',
        evidence: [`Phase ${phaseNum} is a key milestone but has no state preservation`],
        root_cause: 'Cannot recover from failures or verify constraint adherence',
        impact: 'No rollback capability if constraints violated',
        suggested_fix: 'Add checkpoint before major state changes'
      });
    }
  }

  // 3. Check for explicit state schema with constraints field
  const stateSchemaFile = Glob(`${skillPath}/phases/state-schema.md`)[0];
  if (stateSchemaFile) {
    const schemaContent = Read(stateSchemaFile);
    const hasConstraintsField = /constraints|requirements|original_request/i.test(schemaContent);

    if (!hasConstraintsField) {
      issues.push({
        id: `MEM-${issues.length + 1}`,
        type: 'memory_loss',
        severity: 'medium',
        location: { file: 'phases/state-schema.md' },
        description: 'State schema lacks constraints/requirements field',
        evidence: ['No dedicated field for preserving original requirements'],
        root_cause: 'State structure does not support constraint persistence',
        impact: 'Constraints may be lost during state transitions',
        suggested_fix: 'Add original_requirements field to state schema'
      });
    }
  }

  // 4. Check SKILL.md for constraint enforcement in execution flow
  const skillMd = Read(`${skillPath}/SKILL.md`);
  const hasConstraintVerification = /constraint.*verif|verif.*constraint|quality.*gate/i.test(skillMd);

  if (!hasConstraintVerification && phaseFiles.length > 3) {
    issues.push({
      id: `MEM-${issues.length + 1}`,
      type: 'memory_loss',
      severity: 'medium',
      location: { file: 'SKILL.md' },
      description: 'No constraint verification step in execution flow',
      evidence: ['Execution flow lacks quality gate or constraint check'],
      root_cause: 'No mechanism to verify output matches original intent',
      impact: 'Constraint violations may go undetected',
      suggested_fix: 'Add verification phase comparing output to original requirements'
    });
  }

  // 5. Calculate severity
  const criticalCount = issues.filter(i => i.severity === 'critical').length;
  const highCount = issues.filter(i => i.severity === 'high').length;
  const severity = criticalCount > 0 ? 'critical' :
                   highCount > 2 ? 'high' :
                   highCount > 0 ? 'medium' :
                   issues.length > 0 ? 'low' : 'none';

  // 6. Write diagnosis result
  const diagnosisResult = {
    status: 'completed',
    issues_found: issues.length,
    severity: severity,
    execution_time_ms: Date.now() - startTime,
    details: {
      patterns_checked: [
        'constraint_propagation',
        'goal_drift',
        'checkpoint_mechanism',
        'state_schema_constraints'
      ],
      patterns_matched: evidence.map(e => e.pattern),
      evidence: evidence,
      phase_analysis: {
        total_phases: phaseFiles.length,
        first_phase_constraints: firstPhaseConstraints.length,
        phases_with_constraint_ref: phaseFiles.length - issues.filter(i =>
          i.description.includes('does not reference')).length
      },
      recommendations: [
        highCount > 0 ? 'Implement constraint injection at each phase' : null,
        issues.some(i => i.description.includes('checkpoint'))
          ? 'Add checkpoint/restore mechanism' : null,
        issues.some(i => i.description.includes('State schema'))
          ? 'Add original_requirements to state schema' : null
      ].filter(Boolean)
    }
  };

  Write(`${workDir}/diagnosis/memory-diagnosis.json`,
    JSON.stringify(diagnosisResult, null, 2));

  return {
    stateUpdates: {
      'diagnosis.memory': diagnosisResult,
      issues: [...state.issues, ...issues]
    },
    outputFiles: [`${workDir}/diagnosis/memory-diagnosis.json`],
    summary: `Memory diagnosis: ${issues.length} issues found (severity: ${severity})`
  };
}
```
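The constraint extraction in step 2 above can be exercised on a toy phase body (the sample text is illustrative):

```javascript
// Pull bulleted constraints out of a [CONSTRAINTS] section,
// stopping at the next bracketed section header.
function extractConstraints(content) {
  const m = content.match(/\[CONSTRAINTS?\]([^[]*)/i);
  if (!m) return [];
  return m[1]
    .split('\n')
    .filter(l => l.trim().startsWith('-'))
    .map(l => l.trim().replace(/^-\s*/, ''));
}

const phase = `
[TASK]
Generate the API layer.

[CONSTRAINTS]
- Do not modify existing schemas
- Keep responses under 2KB

[OUTPUT]
JSON only.
`;

console.log(extractConstraints(phase));
// → [ 'Do not modify existing schemas', 'Keep responses under 2KB' ]
```

Because `[^[]*` stops at the first `[`, any constraint text that itself contains brackets would be truncated; the diagnosis treats this as a heuristic, not a parser.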

## State Updates

```javascript
return {
  stateUpdates: {
    'diagnosis.memory': {
      status: 'completed',
      issues_found: <count>,
      severity: '<critical|high|medium|low|none>',
      // ... full diagnosis result
    },
    issues: [...existingIssues, ...newIssues]
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Phase file read error | Skip file, continue analysis |
| No phases found | Report as structure issue |

## Next Actions

- Success: action-diagnose-dataflow (or next in focus_areas)
- Skipped: If 'memory' not in focus_areas

@@ -0,0 +1,322 @@

# Action: Gemini Analysis

Invoke the Gemini CLI on demand for deep analysis, selecting the analysis type from the user's request or from prior diagnosis results.

## Role

- Accept an analysis request from the user, or infer one from diagnosis results
- Build the appropriate CLI command
- Execute the analysis and parse the result
- Update state for downstream actions

## Preconditions

- `state.status === 'running'`
- Any one of the following holds:
  - `state.gemini_analysis_requested === true` (user request)
  - `state.issues.some(i => i.severity === 'critical')` (critical issue found)
  - `state.analysis_type !== null` (analysis type already specified)

## Analysis Types

### 1. root_cause - Root Cause Analysis

Deep analysis of the issue described by the user.

```javascript
const analysisPrompt = `
PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
TASK:
• Analyze skill structure at: ${state.target_skill.path}
• Identify anti-patterns in phase files
• Trace data flow through state management
• Check agent coordination patterns
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with structure:
{
  "root_causes": [
    { "id": "RC-001", "description": "...", "severity": "high", "evidence": ["file:line"] }
  ],
  "patterns_found": [
    { "pattern": "...", "type": "anti-pattern|best-practice", "locations": [] }
  ],
  "recommendations": [
    { "priority": 1, "action": "...", "rationale": "..." }
  ]
}
RULES: Focus on execution flow, state management, agent coordination
`;
```

### 2. architecture - Architecture Review

Evaluate the skill's overall architecture.

```javascript
const analysisPrompt = `
PURPOSE: Review skill architecture for: ${state.target_skill.name}
TASK:
• Evaluate phase decomposition and responsibility separation
• Check state schema design and data flow
• Assess agent coordination and error handling
• Review scalability and maintainability
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Markdown report with sections:
- Executive Summary
- Phase Architecture Assessment
- State Management Evaluation
- Agent Coordination Analysis
- Improvement Recommendations (prioritized)
RULES: Focus on modularity, extensibility, maintainability
`;
```

### 3. prompt_optimization - Prompt Optimization

Analyze and optimize the prompts used in phases.

```javascript
const analysisPrompt = `
PURPOSE: Optimize prompts in skill phases for better output quality
TASK:
• Analyze existing prompts for clarity and specificity
• Identify ambiguous instructions
• Check output format specifications
• Evaluate constraint communication
MODE: analysis
CONTEXT: @phases/**/*.md
EXPECTED: JSON with structure:
{
  "prompt_issues": [
    { "file": "...", "issue": "...", "severity": "...", "suggestion": "..." }
  ],
  "optimized_prompts": [
    { "file": "...", "original": "...", "optimized": "...", "rationale": "..." }
  ]
}
RULES: Preserve intent, improve clarity, add structured output requirements
`;
```

### 4. performance - Performance Analysis

Analyze token consumption and execution efficiency.

```javascript
const analysisPrompt = `
PURPOSE: Analyze performance bottlenecks in skill execution
TASK:
• Estimate token consumption per phase
• Identify redundant data passing
• Check for unnecessary full-content transfers
• Evaluate caching opportunities
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON with structure:
{
  "token_estimates": [
    { "phase": "...", "estimated_tokens": 1000, "breakdown": {} }
  ],
  "bottlenecks": [
    { "type": "...", "location": "...", "impact": "high|medium|low", "fix": "..." }
  ],
  "optimization_suggestions": []
}
RULES: Focus on token efficiency, reduce redundancy
`;
```

### 5. custom - Custom Analysis

A user-specified custom analysis request.

```javascript
const analysisPrompt = `
PURPOSE: ${state.custom_analysis_purpose}
TASK: ${state.custom_analysis_tasks}
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: ${state.custom_analysis_expected}
RULES: ${state.custom_analysis_rules || 'Follow best practices'}
`;
```

## Execution

```javascript
async function executeGeminiAnalysis(state, workDir) {
  // 1. Determine analysis type
  const analysisType = state.analysis_type || determineAnalysisType(state);

  // 2. Build prompt
  const prompt = buildAnalysisPrompt(analysisType, state);

  // 3. Build CLI command
  const cliCommand = `ccw cli -p "${escapeForShell(prompt)}" --tool gemini --mode analysis --cd "${state.target_skill.path}"`;

  console.log(`Executing Gemini analysis: ${analysisType}`);
  console.log(`Command: ${cliCommand}`);

  // 4. Execute CLI (in background)
  const result = Bash({
    command: cliCommand,
    run_in_background: true,
    timeout: 300000 // 5 minutes
  });

  // 5. Await results
  // Note: per CLAUDE.md guidance, stop polling once the CLI is running in the
  // background; the result is written to state when the CLI completes.

  return {
    stateUpdates: {
      gemini_analysis: {
        type: analysisType,
        status: 'running',
        started_at: new Date().toISOString(),
        task_id: result.task_id
      }
    },
    outputFiles: [],
    summary: `Gemini ${analysisType} analysis started in background`
  };
}

function determineAnalysisType(state) {
  // Infer analysis type from state
  if (state.user_issue_description && state.user_issue_description.length > 100) {
    return 'root_cause';
  }
  if (state.issues.some(i => i.severity === 'critical')) {
    return 'root_cause';
  }
  if (state.focus_areas.includes('architecture')) {
    return 'architecture';
  }
  if (state.focus_areas.includes('prompt')) {
    return 'prompt_optimization';
  }
  if (state.focus_areas.includes('performance')) {
    return 'performance';
  }
  return 'root_cause'; // default
}

function buildAnalysisPrompt(type, state) {
  const templates = {
    root_cause: () => `
PURPOSE: Identify root cause of skill execution issue: ${state.user_issue_description}
TASK: • Analyze skill structure • Identify anti-patterns • Trace data flow issues • Check agent coordination
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON { root_causes: [], patterns_found: [], recommendations: [] }
RULES: Focus on execution flow, be specific about file:line locations
`,
    architecture: () => `
PURPOSE: Review skill architecture for ${state.target_skill.name}
TASK: • Evaluate phase decomposition • Check state design • Assess agent coordination • Review extensibility
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: Markdown architecture assessment report
RULES: Focus on modularity and maintainability
`,
    prompt_optimization: () => `
PURPOSE: Optimize prompts in skill for better output quality
TASK: • Analyze prompt clarity • Check output specifications • Evaluate constraint handling
MODE: analysis
CONTEXT: @phases/**/*.md
EXPECTED: JSON { prompt_issues: [], optimized_prompts: [] }
RULES: Preserve intent, improve clarity
`,
    performance: () => `
PURPOSE: Analyze performance bottlenecks in skill
TASK: • Estimate token consumption • Identify redundancy • Check data transfer efficiency
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: JSON { token_estimates: [], bottlenecks: [], optimization_suggestions: [] }
RULES: Focus on token efficiency
`,
    custom: () => `
PURPOSE: ${state.custom_analysis_purpose}
TASK: ${state.custom_analysis_tasks}
MODE: analysis
CONTEXT: @**/*.md
EXPECTED: ${state.custom_analysis_expected}
RULES: ${state.custom_analysis_rules || 'Best practices'}
`
  };

  return templates[type]();
}

function escapeForShell(str) {
  // Escape shell metacharacters for a double-quoted argument
  return str.replace(/"/g, '\\"').replace(/\$/g, '\\$').replace(/`/g, '\\`');
}
```
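A quick check of the escaping behavior (the input string is illustrative):

```javascript
// Mirrors escapeForShell above: escapes the characters that a POSIX shell
// still interprets inside double quotes.
function escapeForShell(str) {
  return str.replace(/"/g, '\\"').replace(/\$/g, '\\$').replace(/`/g, '\\`');
}

console.log(escapeForShell('echo "cost: $HOME" `whoami`'));
// → echo \"cost: \$HOME\" \`whoami\`
```

Note that backslashes themselves are not escaped, so prompts containing literal `\` would need additional handling before being embedded in a double-quoted shell argument.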

## Output

### State Updates

```javascript
{
  gemini_analysis: {
    type: 'root_cause' | 'architecture' | 'prompt_optimization' | 'performance' | 'custom',
    status: 'running' | 'completed' | 'failed',
    started_at: '2024-01-01T00:00:00Z',
    completed_at: '2024-01-01T00:05:00Z',
    task_id: 'xxx',
    result: { /* analysis result */ },
    error: null
  },
  // analysis findings merged into issues
  issues: [
    ...state.issues,
    ...newIssuesFromAnalysis
  ]
}
```

### Output Files

- `${workDir}/diagnosis/gemini-analysis-${type}.json` - raw analysis result
- `${workDir}/diagnosis/gemini-analysis-${type}.md` - formatted report

## Post-Execution

After the analysis completes:
1. Parse the CLI output into structured data
2. Extract newly found issues and merge them into state.issues
3. Update recommendations in state
4. Trigger the next action (usually action-generate-report or action-propose-fixes)

## Error Handling

| Error | Recovery |
|-------|----------|
| CLI timeout | Retry once; if it still fails, skip Gemini analysis |
| Parse failure | Save raw output for manual handling |
| No result | Mark as skipped and continue the flow |

## User Interaction

If `state.analysis_type === null` and the type cannot be inferred, ask the user:

```javascript
AskUserQuestion({
  questions: [{
    question: 'Select a Gemini analysis type',
    header: 'Analysis Type',
    options: [
      { label: 'Root cause analysis', description: 'Deep analysis of the user-reported issue' },
      { label: 'Architecture review', description: 'Evaluate the overall architecture design' },
      { label: 'Prompt optimization', description: 'Analyze and optimize phase prompts' },
      { label: 'Performance analysis', description: 'Analyze token consumption and execution efficiency' }
    ],
    multiSelect: false
  }]
});
```

@@ -0,0 +1,228 @@

# Action: Generate Consolidated Report

Generate a comprehensive tuning report merging all diagnosis results with prioritized recommendations.

## Purpose

- Merge all diagnosis results into a unified report
- Prioritize issues by severity and impact
- Generate actionable recommendations
- Create a human-readable markdown report

## Preconditions

- [ ] state.status === 'running'
- [ ] All diagnoses in focus_areas are completed
- [ ] state.issues.length > 0 OR generate summary report

## Execution

```javascript
async function execute(state, workDir) {
  console.log('Generating consolidated tuning report...');

  const targetSkill = state.target_skill;
  const issues = state.issues;

  // 1. Group issues by type
  const issuesByType = {
    context_explosion: issues.filter(i => i.type === 'context_explosion'),
    memory_loss: issues.filter(i => i.type === 'memory_loss'),
    dataflow_break: issues.filter(i => i.type === 'dataflow_break'),
    agent_failure: issues.filter(i => i.type === 'agent_failure')
  };

  // 2. Group issues by severity
  const issuesBySeverity = {
    critical: issues.filter(i => i.severity === 'critical'),
    high: issues.filter(i => i.severity === 'high'),
    medium: issues.filter(i => i.severity === 'medium'),
    low: issues.filter(i => i.severity === 'low')
  };

  // 3. Calculate overall health score
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = Object.entries(issuesBySeverity)
    .reduce((sum, [sev, arr]) => sum + arr.length * weights[sev], 0);
  const healthScore = Math.max(0, 100 - deductions);

  // 4. Generate report content
  const report = `# Skill Tuning Report

**Target Skill**: ${targetSkill.name}
**Path**: ${targetSkill.path}
**Execution Mode**: ${targetSkill.execution_mode}
**Generated**: ${new Date().toISOString()}

---

## Executive Summary

| Metric | Value |
|--------|-------|
| Health Score | ${healthScore}/100 |
| Total Issues | ${issues.length} |
| Critical | ${issuesBySeverity.critical.length} |
| High | ${issuesBySeverity.high.length} |
| Medium | ${issuesBySeverity.medium.length} |
| Low | ${issuesBySeverity.low.length} |

### User Reported Issue
> ${state.user_issue_description}

### Overall Assessment
${healthScore >= 80 ? '✅ Skill is in good health with minor issues.' :
  healthScore >= 60 ? '⚠️ Skill has significant issues requiring attention.' :
  healthScore >= 40 ? '🔶 Skill has serious issues affecting reliability.' :
  '❌ Skill has critical issues requiring immediate fixes.'}

---

## Diagnosis Results

### Context Explosion Analysis
${state.diagnosis.context ?
  `- **Status**: ${state.diagnosis.context.status}
- **Severity**: ${state.diagnosis.context.severity}
- **Issues Found**: ${state.diagnosis.context.issues_found}
- **Key Findings**: ${state.diagnosis.context.details.recommendations.join('; ') || 'None'}` :
  '_Not analyzed_'}

### Long-tail Memory Analysis
${state.diagnosis.memory ?
  `- **Status**: ${state.diagnosis.memory.status}
- **Severity**: ${state.diagnosis.memory.severity}
- **Issues Found**: ${state.diagnosis.memory.issues_found}
- **Key Findings**: ${state.diagnosis.memory.details.recommendations.join('; ') || 'None'}` :
  '_Not analyzed_'}

### Data Flow Analysis
${state.diagnosis.dataflow ?
  `- **Status**: ${state.diagnosis.dataflow.status}
- **Severity**: ${state.diagnosis.dataflow.severity}
- **Issues Found**: ${state.diagnosis.dataflow.issues_found}
- **Key Findings**: ${state.diagnosis.dataflow.details.recommendations.join('; ') || 'None'}` :
  '_Not analyzed_'}

### Agent Coordination Analysis
${state.diagnosis.agent ?
  `- **Status**: ${state.diagnosis.agent.status}
- **Severity**: ${state.diagnosis.agent.severity}
- **Issues Found**: ${state.diagnosis.agent.issues_found}
- **Key Findings**: ${state.diagnosis.agent.details.recommendations.join('; ') || 'None'}` :
  '_Not analyzed_'}

---

## Critical & High Priority Issues

${issuesBySeverity.critical.length + issuesBySeverity.high.length === 0 ?
  '_No critical or high priority issues found._' :
  [...issuesBySeverity.critical, ...issuesBySeverity.high].map((issue, i) => `
### ${i + 1}. [${issue.severity.toUpperCase()}] ${issue.description}

- **ID**: ${issue.id}
- **Type**: ${issue.type}
- **Location**: ${typeof issue.location === 'object' ? issue.location.file : issue.location}
- **Root Cause**: ${issue.root_cause}
- **Impact**: ${issue.impact}
- **Suggested Fix**: ${issue.suggested_fix}

**Evidence**:
${issue.evidence.map(e => `- \`${e}\``).join('\n')}
`).join('\n')}

---

## Medium & Low Priority Issues

${issuesBySeverity.medium.length + issuesBySeverity.low.length === 0 ?
  '_No medium or low priority issues found._' :
  [...issuesBySeverity.medium, ...issuesBySeverity.low].map((issue, i) => `
### ${i + 1}. [${issue.severity.toUpperCase()}] ${issue.description}

- **ID**: ${issue.id}
- **Type**: ${issue.type}
- **Suggested Fix**: ${issue.suggested_fix}
`).join('\n')}

---

## Recommended Fix Order

Based on severity and dependencies, apply fixes in this order:

${[...issuesBySeverity.critical, ...issuesBySeverity.high, ...issuesBySeverity.medium]
  .slice(0, 10)
  .map((issue, i) => `${i + 1}. **${issue.id}**: ${issue.suggested_fix}`)
  .join('\n')}

---

## Quality Gates

| Gate | Threshold | Current | Status |
|------|-----------|---------|--------|
| Critical Issues | 0 | ${issuesBySeverity.critical.length} | ${issuesBySeverity.critical.length === 0 ? '✅ PASS' : '❌ FAIL'} |
| High Issues | ≤ 2 | ${issuesBySeverity.high.length} | ${issuesBySeverity.high.length <= 2 ? '✅ PASS' : '❌ FAIL'} |
| Health Score | ≥ 60 | ${healthScore} | ${healthScore >= 60 ? '✅ PASS' : '❌ FAIL'} |

**Overall Quality Gate**: ${
  issuesBySeverity.critical.length === 0 &&
  issuesBySeverity.high.length <= 2 &&
  healthScore >= 60 ? '✅ PASS' : '❌ FAIL'}

---

*Report generated by skill-tuning*
`;

  // 5. Write report
  Write(`${workDir}/tuning-report.md`, report);

  // 6. Calculate quality gate
  const qualityGate = issuesBySeverity.critical.length === 0 &&
    issuesBySeverity.high.length <= 2 &&
    healthScore >= 60 ? 'pass' :
    healthScore >= 40 ? 'review' : 'fail';

  return {
    stateUpdates: {
      quality_score: healthScore,
      quality_gate: qualityGate,
      issues_by_severity: {
        critical: issuesBySeverity.critical.length,
        high: issuesBySeverity.high.length,
        medium: issuesBySeverity.medium.length,
        low: issuesBySeverity.low.length
      }
    },
    outputFiles: [`${workDir}/tuning-report.md`],
    summary: `Report generated: ${issues.length} issues, health score ${healthScore}/100, gate: ${qualityGate}`
  };
}
```
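The health-score weighting in step 3 can be exercised on its own. For example, 1 critical, 1 high, and 2 medium issues deduct 25 + 15 + 2×5 = 50 points. A sketch that takes severity counts rather than issue arrays, for brevity:

```javascript
// 100 minus weighted deductions per severity, floored at 0.
function healthScore(counts) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = Object.entries(counts)
    .reduce((sum, [sev, n]) => sum + n * weights[sev], 0);
  return Math.max(0, 100 - deductions);
}

console.log(healthScore({ critical: 1, high: 1, medium: 2, low: 0 })); // → 50
console.log(healthScore({ critical: 4, high: 2, medium: 0, low: 5 })); // → 0 (clamped)
```

The clamp matters: a badly broken skill cannot score below 0, so the quality-gate thresholds (60 and 40) stay meaningful.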

## State Updates

```javascript
return {
  stateUpdates: {
    quality_score: <0-100>,
    quality_gate: '<pass|review|fail>',
    issues_by_severity: { critical: N, high: N, medium: N, low: N }
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Write error | Retry to alternative path |
| Empty issues | Generate summary with no issues |

## Next Actions

- If issues.length > 0: action-propose-fixes
- If issues.length === 0: action-complete

.claude/skills/skill-tuning/phases/actions/action-init.md (149 lines, new file)
@@ -0,0 +1,149 @@
|
||||
# Action: Initialize Tuning Session

Initialize the skill-tuning session by collecting target skill information, creating work directories, and setting up initial state.

## Purpose

- Identify the target skill to tune
- Collect the user's problem description
- Create the work directory structure
- Back up the original skill files
- Initialize state for the orchestrator

## Preconditions

- [ ] state.status === 'pending'

## Execution

```javascript
async function execute(state, workDir) {
  // 1. Ask user for target skill
  const skillInput = await AskUserQuestion({
    questions: [{
      question: "Which skill do you want to tune?",
      header: "Target Skill",
      multiSelect: false,
      options: [
        { label: "Specify path", description: "Enter skill directory path" }
      ]
    }]
  });

  const skillPath = skillInput["Target Skill"];

  // 2. Validate skill exists and read structure
  const skillMdPath = `${skillPath}/SKILL.md`;
  if (!Glob(skillMdPath).length) {
    throw new Error(`Invalid skill path: ${skillPath} - SKILL.md not found`);
  }

  // 3. Read skill metadata
  const skillMd = Read(skillMdPath);
  const frontMatterMatch = skillMd.match(/^---\n([\s\S]*?)\n---/);
  const skillName = frontMatterMatch
    ? frontMatterMatch[1].match(/name:\s*(.+)/)?.[1]?.trim()
    : skillPath.split('/').pop();

  // 4. Detect execution mode
  const hasOrchestrator = Glob(`${skillPath}/phases/orchestrator.md`).length > 0;
  const executionMode = hasOrchestrator ? 'autonomous' : 'sequential';

  // 5. Scan skill structure
  const phases = Glob(`${skillPath}/phases/**/*.md`).map(f => f.replace(skillPath + '/', ''));
  const specs = Glob(`${skillPath}/specs/**/*.md`).map(f => f.replace(skillPath + '/', ''));

  // 6. Ask for problem description
  const issueInput = await AskUserQuestion({
    questions: [{
      question: "Describe the issue or what you want to optimize:",
      header: "Issue",
      multiSelect: false,
      options: [
        { label: "Context grows too large", description: "Token explosion over multiple turns" },
        { label: "Instructions forgotten", description: "Early constraints lost in long execution" },
        { label: "Data inconsistency", description: "State format changes between phases" },
        { label: "Agent failures", description: "Sub-agent calls fail or return unexpected results" }
      ]
    }]
  });

  // 7. Ask for focus areas
  const focusInput = await AskUserQuestion({
    questions: [{
      question: "Which areas should be diagnosed? (Select all that apply)",
      header: "Focus",
      multiSelect: true,
      options: [
        { label: "context", description: "Context explosion analysis" },
        { label: "memory", description: "Long-tail forgetting analysis" },
        { label: "dataflow", description: "Data flow analysis" },
        { label: "agent", description: "Agent coordination analysis" }
      ]
    }]
  });

  const focusAreas = focusInput["Focus"] || ['context', 'memory', 'dataflow', 'agent'];

  // 8. Create backup
  const backupDir = `${workDir}/backups/${skillName}-backup`;
  Bash(`mkdir -p "${backupDir}"`);
  Bash(`cp -r "${skillPath}"/* "${backupDir}/"`);

  // 9. Return state updates
  return {
    stateUpdates: {
      status: 'running',
      started_at: new Date().toISOString(),
      target_skill: {
        name: skillName,
        path: skillPath,
        execution_mode: executionMode,
        phases: phases,
        specs: specs
      },
      user_issue_description: issueInput["Issue"],
      focus_areas: Array.isArray(focusAreas) ? focusAreas : [focusAreas],
      work_dir: workDir,
      backup_dir: backupDir
    },
    outputFiles: [],
    summary: `Initialized tuning for "${skillName}" (${executionMode} mode), focus: ${focusAreas.join(', ')}`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    status: 'running',
    started_at: '<timestamp>',
    target_skill: {
      name: '<skill-name>',
      path: '<skill-path>',
      execution_mode: '<sequential|autonomous>',
      phases: ['...'],
      specs: ['...']
    },
    user_issue_description: '<user description>',
    focus_areas: ['context', 'memory', ...],
    work_dir: '<work-dir>',
    backup_dir: '<backup-dir>'
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Skill path not found | Ask user to re-enter valid path |
| SKILL.md missing | Suggest path correction |
| Backup creation failed | Retry with alternative location |

## Next Actions

- Success: Continue to first diagnosis action based on focus_areas
- Failure: action-abort
@@ -0,0 +1,317 @@
# Action: Propose Fixes

Generate fix proposals for identified issues with implementation strategies.

## Purpose

- Create fix strategies for each issue
- Generate implementation plans
- Estimate risk levels
- Allow the user to select fixes to apply

## Preconditions

- [ ] state.status === 'running'
- [ ] state.issues.length > 0
- [ ] action-generate-report completed

## Fix Strategy Catalog

### Context Explosion Fixes

| Strategy | Description | Risk |
|----------|-------------|------|
| `context_summarization` | Add summarizer agent between phases | low |
| `sliding_window` | Keep only last N turns in context | low |
| `structured_state` | Replace text context with JSON state | medium |
| `path_reference` | Pass file paths instead of content | low |
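
The `sliding_window` strategy can be sketched in a few lines. This is an illustrative example only; `MAX_HISTORY` and `trimHistory` are hypothetical names, not part of the skill's API.

```javascript
// Illustrative sketch of the sliding_window strategy: keep only the
// most recent turns so context size stays bounded across iterations.
const MAX_HISTORY = 5; // hypothetical limit; tune per skill

function trimHistory(history, max = MAX_HISTORY) {
  // slice(-max) keeps the last `max` entries; shorter arrays pass through unchanged
  return history.slice(-max);
}

// Example: seven turns collapse to the last five
const turns = ['t1', 't2', 't3', 't4', 't5', 't6', 't7'];
console.log(trimHistory(turns)); // [ 't3', 't4', 't5', 't6', 't7' ]
```
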

### Memory Loss Fixes

| Strategy | Description | Risk |
|----------|-------------|------|
| `constraint_injection` | Add constraints to each phase prompt | low |
| `checkpoint_restore` | Save state at milestones | low |
| `goal_embedding` | Track goal similarity throughout | medium |
| `state_constraints_field` | Add constraints field to state schema | low |
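
As an illustration of `constraint_injection`, a minimal sketch that prepends the original requirements to every phase prompt (the `injectConstraints` helper is hypothetical, assuming a `state.original_requirements` array as proposed below):

```javascript
// Illustrative sketch of constraint_injection: prepend the original
// requirements to every phase prompt so they are never out of view.
function injectConstraints(phasePrompt, state) {
  const constraints = state.original_requirements || [];
  if (constraints.length === 0) return phasePrompt;
  const block = [
    '[CONSTRAINTS]',
    ...constraints.map(c => `- ${c}`),
    ''
  ].join('\n');
  return block + phasePrompt;
}

const prompt = injectConstraints('Do the analysis.', {
  original_requirements: ['Output must be JSON', 'No network calls']
});
// prompt now starts with the [CONSTRAINTS] block, followed by the phase prompt
```
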

### Data Flow Fixes

| Strategy | Description | Risk |
|----------|-------------|------|
| `state_centralization` | Single state.json for all data | medium |
| `schema_enforcement` | Add Zod validation | low |
| `field_normalization` | Normalize field names | low |
| `transactional_updates` | Atomic state updates | medium |
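
The idea behind `schema_enforcement` can be shown without any library. The fix proposals below use Zod; this dependency-free sketch (with an illustrative `validateState` checking a few assumed fields) demonstrates the same fail-fast principle:

```javascript
// Dependency-free sketch of schema_enforcement: reject malformed state
// early instead of letting it propagate between phases.
function validateState(s) {
  const errors = [];
  if (typeof s.status !== 'string') errors.push('status must be a string');
  if (!Array.isArray(s.issues)) errors.push('issues must be an array');
  if (typeof s.iteration_count !== 'number') errors.push('iteration_count must be a number');
  if (errors.length) throw new Error('Invalid state: ' + errors.join('; '));
  return s;
}

validateState({ status: 'running', issues: [], iteration_count: 0 }); // passes
// validateState({ status: 'running' }) would throw: issues / iteration_count missing
```
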

### Agent Coordination Fixes

| Strategy | Description | Risk |
|----------|-------------|------|
| `error_wrapping` | Add try-catch to all Task calls | low |
| `result_validation` | Validate agent returns | low |
| `orchestrator_refactor` | Centralize agent coordination | high |
| `flatten_nesting` | Remove nested agent calls | medium |
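
A sketch of what `error_wrapping` plus `result_validation` amounts to in practice. `safeTask` is a hypothetical helper and `task` is a stand-in for the real Task tool; the error-count bookkeeping mirrors the state fields used elsewhere in this skill:

```javascript
// Illustrative sketch: isolate every agent call so one failure is
// recorded in state instead of aborting the whole run.
async function safeTask(task, args, state) {
  try {
    const result = await task(args);
    if (!result) throw new Error('Empty result'); // result_validation
    return result;
  } catch (e) {
    state.errors.push(e.message);   // error_wrapping bookkeeping
    state.error_count += 1;
    return null; // caller decides how to proceed
  }
}

// Example with a stand-in agent that fails:
const demoState = { errors: [], error_count: 0 };
safeTask(async () => { throw new Error('agent crashed'); }, {}, demoState)
  .then(r => console.log(r, demoState.error_count)); // null 1
```
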

## Execution

```javascript
async function execute(state, workDir) {
  console.log('Generating fix proposals...');

  const issues = state.issues;
  const fixes = [];

  // Group issues by type for batch fixes
  const issuesByType = {
    context_explosion: issues.filter(i => i.type === 'context_explosion'),
    memory_loss: issues.filter(i => i.type === 'memory_loss'),
    dataflow_break: issues.filter(i => i.type === 'dataflow_break'),
    agent_failure: issues.filter(i => i.type === 'agent_failure')
  };

  // Generate fixes for context explosion
  if (issuesByType.context_explosion.length > 0) {
    const ctxIssues = issuesByType.context_explosion;

    if (ctxIssues.some(i => i.description.includes('history accumulation'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: ctxIssues.filter(i => i.description.includes('history')).map(i => i.id),
        strategy: 'sliding_window',
        description: 'Implement sliding window for conversation history',
        rationale: 'Prevents unbounded context growth by keeping only recent turns',
        changes: [{
          file: 'phases/orchestrator.md',
          action: 'modify',
          diff: `+ const MAX_HISTORY = 5;
+ state.history = state.history.slice(-MAX_HISTORY);`
        }],
        risk: 'low',
        estimated_impact: 'Reduces token usage by ~50%',
        verification_steps: ['Run skill with 10+ iterations', 'Verify context size stable']
      });
    }

    if (ctxIssues.some(i => i.description.includes('full content'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: ctxIssues.filter(i => i.description.includes('content')).map(i => i.id),
        strategy: 'path_reference',
        description: 'Pass file paths instead of full content',
        rationale: 'Agents can read files when needed, reducing prompt size',
        changes: [{
          file: 'phases/*.md',
          action: 'modify',
          diff: `- prompt: \${content}
+ prompt: Read file at: \${filePath}`
        }],
        risk: 'low',
        estimated_impact: 'Significant token reduction',
        verification_steps: ['Verify agents can still access needed content']
      });
    }
  }

  // Generate fixes for memory loss
  if (issuesByType.memory_loss.length > 0) {
    const memIssues = issuesByType.memory_loss;

    if (memIssues.some(i => i.description.includes('constraint'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: memIssues.filter(i => i.description.includes('constraint')).map(i => i.id),
        strategy: 'constraint_injection',
        description: 'Add constraint injection to all phases',
        rationale: 'Ensures original requirements are visible in every phase',
        changes: [{
          file: 'phases/*.md',
          action: 'modify',
          diff: `+ [CONSTRAINTS]
+ Original requirements from state.original_requirements:
+ \${JSON.stringify(state.original_requirements)}`
        }],
        risk: 'low',
        estimated_impact: 'Improves constraint adherence',
        verification_steps: ['Run skill with specific constraints', 'Verify output matches']
      });
    }

    if (memIssues.some(i => i.description.includes('State schema'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: memIssues.filter(i => i.description.includes('schema')).map(i => i.id),
        strategy: 'state_constraints_field',
        description: 'Add original_requirements field to state schema',
        rationale: 'Preserves original intent throughout execution',
        changes: [{
          file: 'phases/state-schema.md',
          action: 'modify',
          diff: `+ original_requirements: string[]; // User's original constraints
+ goal_summary: string; // One-line goal statement`
        }],
        risk: 'low',
        estimated_impact: 'Enables constraint tracking',
        verification_steps: ['Verify state includes requirements after init']
      });
    }
  }

  // Generate fixes for data flow
  if (issuesByType.dataflow_break.length > 0) {
    const dfIssues = issuesByType.dataflow_break;

    if (dfIssues.some(i => i.description.includes('multiple locations'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: dfIssues.filter(i => i.description.includes('location')).map(i => i.id),
        strategy: 'state_centralization',
        description: 'Centralize all state to single state.json',
        rationale: 'Single source of truth prevents inconsistencies',
        changes: [{
          file: 'phases/*.md',
          action: 'modify',
          diff: `- Write(\`\${workDir}/config.json\`, ...)
+ updateState({ config: ... }) // Use state manager`
        }],
        risk: 'medium',
        estimated_impact: 'Eliminates state fragmentation',
        verification_steps: ['Verify all reads come from state.json', 'Test state persistence']
      });
    }

    if (dfIssues.some(i => i.description.includes('validation'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: dfIssues.filter(i => i.description.includes('validation')).map(i => i.id),
        strategy: 'schema_enforcement',
        description: 'Add Zod schema validation',
        rationale: 'Runtime validation catches schema violations',
        changes: [{
          file: 'phases/state-schema.md',
          action: 'modify',
          diff: `+ import { z } from 'zod';
+ const StateSchema = z.object({...});
+ function validateState(s) { return StateSchema.parse(s); }`
        }],
        risk: 'low',
        estimated_impact: 'Catches invalid state early',
        verification_steps: ['Test with invalid state input', 'Verify error thrown']
      });
    }
  }

  // Generate fixes for agent coordination
  if (issuesByType.agent_failure.length > 0) {
    const agentIssues = issuesByType.agent_failure;

    if (agentIssues.some(i => i.description.includes('error handling'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: agentIssues.filter(i => i.description.includes('error')).map(i => i.id),
        strategy: 'error_wrapping',
        description: 'Wrap all Task calls in try-catch',
        rationale: 'Prevents cascading failures from agent errors',
        changes: [{
          file: 'phases/*.md',
          action: 'modify',
          diff: `+ try {
    const result = await Task({...});
+   if (!result) throw new Error('Empty result');
+ } catch (e) {
+   updateState({ errors: [...errors, e.message], error_count: error_count + 1 });
+ }`
        }],
        risk: 'low',
        estimated_impact: 'Improves error resilience',
        verification_steps: ['Simulate agent failure', 'Verify graceful handling']
      });
    }

    if (agentIssues.some(i => i.description.includes('nested'))) {
      fixes.push({
        id: `FIX-${fixes.length + 1}`,
        issue_ids: agentIssues.filter(i => i.description.includes('nested')).map(i => i.id),
        strategy: 'flatten_nesting',
        description: 'Flatten nested agent calls',
        rationale: 'Reduces complexity and context explosion',
        changes: [{
          file: 'phases/orchestrator.md',
          action: 'modify',
          diff: `// Instead of agent calling agent:
// Agent A returns {needs_agent_b: true}
// Orchestrator sees this and calls Agent B next`
        }],
        risk: 'medium',
        estimated_impact: 'Reduces nesting depth',
        verification_steps: ['Verify no nested Task calls', 'Test agent chaining via orchestrator']
      });
    }
  }

  // Write fix proposals
  Write(`${workDir}/fixes/fix-proposals.json`, JSON.stringify(fixes, null, 2));

  // Ask user to select fixes to apply
  const fixOptions = fixes.slice(0, 4).map(f => ({
    label: f.id,
    description: `[${f.risk.toUpperCase()} risk] ${f.description}`
  }));

  if (fixOptions.length > 0) {
    const selection = await AskUserQuestion({
      questions: [{
        question: 'Which fixes would you like to apply?',
        header: 'Fixes',
        multiSelect: true,
        options: fixOptions
      }]
    });

    const selectedFixIds = Array.isArray(selection['Fixes'])
      ? selection['Fixes']
      : [selection['Fixes']];

    return {
      stateUpdates: {
        proposed_fixes: fixes,
        pending_fixes: selectedFixIds.filter(id => id && fixes.some(f => f.id === id))
      },
      outputFiles: [`${workDir}/fixes/fix-proposals.json`],
      summary: `Generated ${fixes.length} fix proposals, ${selectedFixIds.length} selected for application`
    };
  }

  return {
    stateUpdates: {
      proposed_fixes: fixes,
      pending_fixes: []
    },
    outputFiles: [`${workDir}/fixes/fix-proposals.json`],
    summary: `Generated ${fixes.length} fix proposals (none selected)`
  };
}
```

## State Updates

```javascript
return {
  stateUpdates: {
    proposed_fixes: [...fixes],
    pending_fixes: [...selectedFixIds]
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| No issues to fix | Skip to action-complete |
| User cancels selection | Set pending_fixes to empty |

## Next Actions

- If pending_fixes.length > 0: action-apply-fix
- If pending_fixes.length === 0: action-complete
222
.claude/skills/skill-tuning/phases/actions/action-verify.md
Normal file
@@ -0,0 +1,222 @@
# Action: Verify Applied Fixes

Verify that applied fixes resolved the targeted issues.

## Purpose

- Re-run relevant diagnostics
- Compare before/after issue counts
- Update verification status
- Determine if more iterations are needed

## Preconditions

- [ ] state.status === 'running'
- [ ] state.applied_fixes.length > 0
- [ ] Some applied_fixes have verification_result === 'pending'

## Execution

```javascript
async function execute(state, workDir) {
  console.log('Verifying applied fixes...');

  const appliedFixes = state.applied_fixes.filter(f => f.verification_result === 'pending');

  if (appliedFixes.length === 0) {
    return {
      stateUpdates: {},
      outputFiles: [],
      summary: 'No fixes pending verification'
    };
  }

  const verificationResults = [];

  for (const fix of appliedFixes) {
    const proposedFix = state.proposed_fixes.find(f => f.id === fix.fix_id);

    if (!proposedFix) {
      verificationResults.push({
        fix_id: fix.fix_id,
        result: 'fail',
        reason: 'Fix definition not found'
      });
      continue;
    }

    // Determine which diagnosis to re-run based on fix strategy
    const strategyToDiagnosis = {
      'context_summarization': 'context',
      'sliding_window': 'context',
      'structured_state': 'context',
      'path_reference': 'context',
      'constraint_injection': 'memory',
      'checkpoint_restore': 'memory',
      'goal_embedding': 'memory',
      'state_constraints_field': 'memory',
      'state_centralization': 'dataflow',
      'schema_enforcement': 'dataflow',
      'field_normalization': 'dataflow',
      'transactional_updates': 'dataflow',
      'error_wrapping': 'agent',
      'result_validation': 'agent',
      'orchestrator_refactor': 'agent',
      'flatten_nesting': 'agent'
    };

    const diagnosisType = strategyToDiagnosis[proposedFix.strategy];

    // Lightweight verification for now; a full implementation would
    // re-run the specific diagnosis identified by diagnosisType.

    // Check if the fix was actually applied (look for markers)
    const targetPath = state.target_skill.path;
    const fixMarker = `Applied fix ${fix.fix_id}`;

    let fixFound = false;
    const allFiles = Glob(`${targetPath}/**/*.md`);

    for (const file of allFiles) {
      const content = Read(file);
      if (content.includes(fixMarker)) {
        fixFound = true;
        break;
      }
    }

    if (fixFound) {
      // Verify by checking if original issues still exist
      const relatedIssues = proposedFix.issue_ids;
      const originalIssueCount = relatedIssues.length;

      // Simplified verification: assume the fix worked if the marker is present.
      // A real implementation would re-run the diagnosis patterns.

      verificationResults.push({
        fix_id: fix.fix_id,
        result: 'pass',
        reason: `Fix applied successfully, addressing ${originalIssueCount} issues`,
        issues_resolved: relatedIssues
      });
    } else {
      verificationResults.push({
        fix_id: fix.fix_id,
        result: 'fail',
        reason: 'Fix marker not found in target files'
      });
    }
  }

  // Update applied fixes with verification results
  const updatedAppliedFixes = state.applied_fixes.map(fix => {
    const result = verificationResults.find(v => v.fix_id === fix.fix_id);
    if (result) {
      return {
        ...fix,
        verification_result: result.result
      };
    }
    return fix;
  });

  // Calculate verification rate
  const passedFixes = verificationResults.filter(v => v.result === 'pass').length;
  const totalFixes = verificationResults.length;
  const verificationRate = totalFixes > 0 ? (passedFixes / totalFixes) * 100 : 100;

  // Recalculate issues (remove resolved ones)
  const resolvedIssueIds = verificationResults
    .filter(v => v.result === 'pass')
    .flatMap(v => v.issues_resolved || []);

  const remainingIssues = state.issues.filter(i => !resolvedIssueIds.includes(i.id));

  // Recalculate quality score
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = remainingIssues.reduce((sum, issue) =>
    sum + (weights[issue.severity] || 0), 0);
  const newHealthScore = Math.max(0, 100 - deductions);

  // Determine new quality gate
  const remainingCritical = remainingIssues.filter(i => i.severity === 'critical').length;
  const remainingHigh = remainingIssues.filter(i => i.severity === 'high').length;
  const newQualityGate = remainingCritical === 0 && remainingHigh <= 2 && newHealthScore >= 60
    ? 'pass'
    : newHealthScore >= 40 ? 'review' : 'fail';

  // Increment iteration count
  const newIterationCount = state.iteration_count + 1;

  // Ask user if they want to continue
  let continueIteration = false;
  if (newQualityGate !== 'pass' && newIterationCount < state.max_iterations) {
    const continueResponse = await AskUserQuestion({
      questions: [{
        question: `Verification complete. Quality gate: ${newQualityGate}. Continue with another iteration?`,
        header: 'Continue',
        multiSelect: false,
        options: [
          { label: 'Yes', description: `Run iteration ${newIterationCount + 1}` },
          { label: 'No', description: 'Finish with current state' }
        ]
      }]
    });
    continueIteration = continueResponse['Continue'] === 'Yes';
  }

  // If continuing, reset diagnosis for re-evaluation
  const diagnosisReset = continueIteration ? {
    'diagnosis.context': null,
    'diagnosis.memory': null,
    'diagnosis.dataflow': null,
    'diagnosis.agent': null
  } : {};

  return {
    stateUpdates: {
      applied_fixes: updatedAppliedFixes,
      issues: remainingIssues,
      quality_score: newHealthScore,
      quality_gate: newQualityGate,
      iteration_count: newIterationCount,
      ...diagnosisReset,
      issues_by_severity: {
        critical: remainingIssues.filter(i => i.severity === 'critical').length,
        high: remainingIssues.filter(i => i.severity === 'high').length,
        medium: remainingIssues.filter(i => i.severity === 'medium').length,
        low: remainingIssues.filter(i => i.severity === 'low').length
      }
    },
    outputFiles: [],
    summary: `Verified ${totalFixes} fixes: ${passedFixes} passed. Score: ${newHealthScore}, Gate: ${newQualityGate}, Iteration: ${newIterationCount}`
  };
}
```
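
The severity-weighted scoring and gate thresholds used in this action can be exercised in isolation. A standalone sketch (the `scoreAndGate` helper is illustrative; the weights and thresholds match the code above):

```javascript
// Standalone version of the scoring above: severity-weighted deductions
// from 100, then the gate thresholds (pass / review / fail).
function scoreAndGate(issues) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = issues.reduce((sum, i) => sum + (weights[i.severity] || 0), 0);
  const score = Math.max(0, 100 - deductions);
  const critical = issues.filter(i => i.severity === 'critical').length;
  const high = issues.filter(i => i.severity === 'high').length;
  const gate = critical === 0 && high <= 2 && score >= 60
    ? 'pass'
    : score >= 40 ? 'review' : 'fail';
  return { score, gate };
}

// Two high + three low issues: 100 - (15*2 + 1*3) = 67, no criticals -> pass
console.log(scoreAndGate([
  { severity: 'high' }, { severity: 'high' },
  { severity: 'low' }, { severity: 'low' }, { severity: 'low' }
])); // { score: 67, gate: 'pass' }
```
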

## State Updates

```javascript
return {
  stateUpdates: {
    applied_fixes: [...updatedWithVerificationResults],
    issues: [...remainingIssues],
    quality_score: newScore,
    quality_gate: newGate,
    iteration_count: iteration + 1
  }
};
```

## Error Handling

| Error Type | Recovery |
|------------|----------|
| Re-diagnosis fails | Mark as 'inconclusive' |
| File access error | Skip file verification |

## Next Actions

- If quality_gate === 'pass': action-complete
- If user chose to continue: restart diagnosis cycle
- If max_iterations reached: action-complete
335
.claude/skills/skill-tuning/phases/orchestrator.md
Normal file
@@ -0,0 +1,335 @@
|
||||
# Orchestrator
|
||||
|
||||
Autonomous orchestrator for skill-tuning workflow. Reads current state and selects the next action based on diagnosis progress and quality gates.
|
||||
|
||||
## Role
|
||||
|
||||
Drive the tuning workflow by:
|
||||
1. Reading current session state
|
||||
2. Selecting the appropriate next action
|
||||
3. Executing the action via sub-agent
|
||||
4. Updating state with results
|
||||
5. Repeating until termination conditions met
|
||||
|
||||
## State Management
|
||||
|
||||
### Read State
|
||||
|
||||
```javascript
|
||||
const state = JSON.parse(Read(`${workDir}/state.json`));
|
||||
```
|
||||
|
||||
### Update State
|
||||
|
||||
```javascript
|
||||
function updateState(updates) {
|
||||
const state = JSON.parse(Read(`${workDir}/state.json`));
|
||||
const newState = {
|
||||
...state,
|
||||
...updates,
|
||||
updated_at: new Date().toISOString()
|
||||
};
|
||||
Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2));
|
||||
return newState;
|
||||
}
|
||||
```
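
The merge semantics of `updateState` (shallow merge, untouched fields preserved, timestamp refreshed) can be checked in isolation. In this sketch, `Read` and `Write` are in-memory stand-ins for the real file tools, and `store` is a hypothetical backing object:

```javascript
// Self-contained check of the updateState merge semantics, with an
// in-memory store standing in for Read/Write on state.json.
const store = { state: JSON.stringify({ status: 'pending', iteration_count: 0 }) };
const Read = () => store.state;
const Write = (_path, content) => { store.state = content; };

function updateState(updates) {
  const state = JSON.parse(Read());
  const newState = { ...state, ...updates, updated_at: new Date().toISOString() };
  Write('state.json', JSON.stringify(newState, null, 2));
  return newState;
}

const s = updateState({ status: 'running' });
console.log(s.status, s.iteration_count); // running 0  (untouched fields survive)
```
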

## Decision Logic

```javascript
function selectNextAction(state) {
  // === Termination Checks ===

  // User exit
  if (state.status === 'user_exit') return null;

  // Completed
  if (state.status === 'completed') return null;

  // Error limit exceeded
  if (state.error_count >= state.max_errors) {
    return 'action-abort';
  }

  // Max iterations exceeded
  if (state.iteration_count >= state.max_iterations) {
    return 'action-complete';
  }

  // === Action Selection ===

  // 1. Not initialized yet
  if (state.status === 'pending') {
    return 'action-init';
  }

  // 2. Check if Gemini analysis is requested or needed
  if (shouldTriggerGeminiAnalysis(state)) {
    return 'action-gemini-analysis';
  }

  // 3. Check if Gemini analysis is running
  if (state.gemini_analysis?.status === 'running') {
    // Wait for Gemini analysis to complete
    return null; // Orchestrator will be re-triggered when CLI completes
  }

  // 4. Run diagnosis in order (only if not completed)
  const diagnosisOrder = ['context', 'memory', 'dataflow', 'agent'];

  for (const diagType of diagnosisOrder) {
    if (state.diagnosis[diagType] === null) {
      // Check if user wants to skip this diagnosis
      if (!state.focus_areas.length || state.focus_areas.includes(diagType)) {
        return `action-diagnose-${diagType}`;
      }
    }
  }

  // 5. All diagnosis complete, generate report if not done
  const allDiagnosisComplete = diagnosisOrder.every(
    d => state.diagnosis[d] !== null || !state.focus_areas.includes(d)
  );

  if (allDiagnosisComplete && !state.completed_actions.includes('action-generate-report')) {
    return 'action-generate-report';
  }

  // 6. Report generated, propose fixes if not done
  if (state.completed_actions.includes('action-generate-report') &&
      state.proposed_fixes.length === 0 &&
      state.issues.length > 0) {
    return 'action-propose-fixes';
  }

  // 7. Fixes proposed, check if user wants to apply
  if (state.proposed_fixes.length > 0 && state.pending_fixes.length > 0) {
    return 'action-apply-fix';
  }

  // 8. Fixes applied, verify
  if (state.applied_fixes.length > 0 &&
      state.applied_fixes.some(f => f.verification_result === 'pending')) {
    return 'action-verify';
  }

  // 9. Quality gate check
  if (state.quality_gate === 'pass') {
    return 'action-complete';
  }

  // 10. More iterations needed
  if (state.iteration_count < state.max_iterations &&
      state.quality_gate !== 'pass' &&
      state.issues.some(i => i.severity === 'critical' || i.severity === 'high')) {
    // Reset diagnosis for re-evaluation
    return 'action-diagnose-context'; // Start new iteration
  }

  // 11. Default: complete
  return 'action-complete';
}

/**
 * Decide whether a Gemini CLI deep analysis should be triggered.
 */
function shouldTriggerGeminiAnalysis(state) {
  // Gemini analysis already completed; do not trigger again
  if (state.gemini_analysis?.status === 'completed') {
    return false;
  }

  // Explicit user request
  if (state.gemini_analysis_requested === true) {
    return true;
  }

  // Critical issues found and no deep analysis has run yet
  if (state.issues.some(i => i.severity === 'critical') &&
      !state.completed_actions.includes('action-gemini-analysis')) {
    return true;
  }

  // User specified focus_areas that require Gemini analysis
  const geminiAreas = ['architecture', 'prompt', 'performance', 'custom'];
  if (state.focus_areas.some(area => geminiAreas.includes(area))) {
    return true;
  }

  // Standard diagnosis finished but issues remain unresolved: deep analysis needed
  const diagnosisComplete = ['context', 'memory', 'dataflow', 'agent'].every(
    d => state.diagnosis[d] !== null
  );
  if (diagnosisComplete &&
      state.issues.length > 0 &&
      state.iteration_count > 0 &&
      !state.completed_actions.includes('action-gemini-analysis')) {
    // If issues persist into the second iteration, trigger Gemini analysis
    return true;
  }

  return false;
}
```
## Execution Loop
|
||||
|
||||
```javascript
|
||||
async function runOrchestrator(workDir) {
|
||||
console.log('=== Skill Tuning Orchestrator Started ===');
|
||||
|
||||
let iteration = 0;
|
||||
const MAX_LOOP_ITERATIONS = 50; // Safety limit
|
||||
|
||||
while (iteration < MAX_LOOP_ITERATIONS) {
|
||||
iteration++;
|
||||
|
||||
// 1. Read current state
|
||||
const state = JSON.parse(Read(`${workDir}/state.json`));
|
||||
console.log(`[Loop ${iteration}] Status: ${state.status}, Action: ${state.current_action}`);
|
||||
|
||||
// 2. Select next action
|
||||
const actionId = selectNextAction(state);
|
||||
|
||||
if (!actionId) {
|
||||
console.log('No action selected, terminating orchestrator.');
|
||||
break;
|
||||
    }

    console.log(`[Loop ${iteration}] Executing: ${actionId}`);

    // 3. Update state: current action
    updateState({
      current_action: actionId,
      action_history: [...state.action_history, {
        action: actionId,
        started_at: new Date().toISOString(),
        completed_at: null,
        result: null,
        output_files: []
      }]
    });

    // 4. Execute action
    try {
      const actionPrompt = Read(`phases/actions/${actionId}.md`);
      const stateJson = JSON.stringify(state, null, 2);

      const result = await Task({
        subagent_type: 'universal-executor',
        run_in_background: false,
        prompt: `
[CONTEXT]
You are executing action "${actionId}" for skill-tuning workflow.
Work directory: ${workDir}

[STATE]
${stateJson}

[ACTION INSTRUCTIONS]
${actionPrompt}

[OUTPUT REQUIREMENT]
After completing the action:
1. Write any output files to the work directory
2. Return a JSON object with:
   - stateUpdates: object with state fields to update
   - outputFiles: array of files created
   - summary: brief description of what was done
`
      });

      // 5. Parse result and update state
      let actionResult;
      try {
        actionResult = JSON.parse(result);
      } catch (e) {
        actionResult = {
          stateUpdates: {},
          outputFiles: [],
          summary: result
        };
      }

      // 6. Update state: action complete
      const updatedHistory = [...state.action_history];
      updatedHistory[updatedHistory.length - 1] = {
        ...updatedHistory[updatedHistory.length - 1],
        completed_at: new Date().toISOString(),
        result: 'success',
        output_files: actionResult.outputFiles || []
      };

      updateState({
        current_action: null,
        completed_actions: [...state.completed_actions, actionId],
        action_history: updatedHistory,
        ...actionResult.stateUpdates
      });

      console.log(`[Loop ${iteration}] Completed: ${actionId}`);

    } catch (error) {
      console.log(`[Loop ${iteration}] Error in ${actionId}: ${error.message}`);

      // Error handling
      updateState({
        current_action: null,
        errors: [...state.errors, {
          action: actionId,
          message: error.message,
          timestamp: new Date().toISOString(),
          recoverable: true
        }],
        error_count: state.error_count + 1
      });
    }
  }

  console.log('=== Skill Tuning Orchestrator Finished ===');
}
```

## Action Catalog

| Action | Purpose | Preconditions | Effects |
|--------|---------|---------------|---------|
| [action-init](actions/action-init.md) | Initialize tuning session | status === 'pending' | Creates work dirs, backup, sets status='running' |
| [action-diagnose-context](actions/action-diagnose-context.md) | Analyze context explosion | status === 'running' | Sets diagnosis.context |
| [action-diagnose-memory](actions/action-diagnose-memory.md) | Analyze long-tail forgetting | status === 'running' | Sets diagnosis.memory |
| [action-diagnose-dataflow](actions/action-diagnose-dataflow.md) | Analyze data flow issues | status === 'running' | Sets diagnosis.dataflow |
| [action-diagnose-agent](actions/action-diagnose-agent.md) | Analyze agent coordination | status === 'running' | Sets diagnosis.agent |
| [action-gemini-analysis](actions/action-gemini-analysis.md) | Deep analysis via Gemini CLI | User request OR critical issues | Sets gemini_analysis, adds issues |
| [action-generate-report](actions/action-generate-report.md) | Generate consolidated report | All diagnoses complete | Creates tuning-report.md |
| [action-propose-fixes](actions/action-propose-fixes.md) | Generate fix proposals | Report generated, issues > 0 | Sets proposed_fixes |
| [action-apply-fix](actions/action-apply-fix.md) | Apply selected fix | pending_fixes > 0 | Updates applied_fixes |
| [action-verify](actions/action-verify.md) | Verify applied fixes | applied_fixes with pending verification | Updates verification_result |
| [action-complete](actions/action-complete.md) | Finalize session | quality_gate='pass' OR max_iterations | Sets status='completed' |
| [action-abort](actions/action-abort.md) | Abort on errors | error_count >= max_errors | Sets status='failed' |
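The catalog above implies a precondition-driven selection rule. A minimal sketch of how the orchestrator might pick the next action (the function name and the exact priority ordering are illustrative assumptions, not part of the spec):

```javascript
// Hypothetical next-action selector: walks the catalog in priority order
// and returns the first action whose preconditions hold for the state.
function selectNextAction(state) {
  if (state.status === 'pending') return 'action-init';
  if (state.error_count >= state.max_errors) return 'action-abort';

  // Run any diagnosis area that has not produced a result yet.
  const pendingDiagnosis = ['context', 'memory', 'dataflow', 'agent']
    .find(area => state.diagnosis[area] === null);
  if (pendingDiagnosis) return `action-diagnose-${pendingDiagnosis}`;

  if (!state.completed_actions.includes('action-generate-report')) {
    return 'action-generate-report';
  }
  if (state.issues.length > 0 && state.proposed_fixes.length === 0) {
    return 'action-propose-fixes';
  }
  if (state.pending_fixes.length > 0) return 'action-apply-fix';

  const unverified = state.applied_fixes.some(
    f => f.verification_result === 'pending'
  );
  if (unverified) return 'action-verify';

  return 'action-complete';
}
```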
## Termination Conditions

- `status === 'completed'`: Normal completion
- `status === 'user_exit'`: User requested exit
- `status === 'failed'`: Unrecoverable error
- `error_count >= max_errors`: Too many errors (default: 3)
- `iteration_count >= max_iterations`: Max iterations reached (default: 5)
- `quality_gate === 'pass'`: All quality criteria met
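The conditions above can be collapsed into a single predicate for the loop; a compact sketch (the function name is illustrative):

```javascript
// Returns true when the orchestrator loop should stop.
function shouldTerminate(state, iteration) {
  const terminalStatuses = ['completed', 'user_exit', 'failed'];
  return (
    terminalStatuses.includes(state.status) ||
    state.error_count >= state.max_errors ||
    iteration >= state.max_iterations ||
    state.quality_gate === 'pass'
  );
}
```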
## Error Recovery

| Error Type | Recovery Strategy |
|------------|-------------------|
| Action execution failed | Retry up to 3 times, then skip |
| State parse error | Restore from backup |
| File write error | Retry with alternative path |
| User abort | Save state and exit gracefully |
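The "retry up to 3 times, then skip" strategy can be sketched as a small wrapper around an action executor (names are illustrative; in the real workflow the executor is the Task call shown earlier):

```javascript
// Runs an async action, retrying on failure; resolves to null when all
// attempts fail so the orchestrator can skip the action and record the error.
async function runWithRetry(executeAction, actionId, maxAttempts = 3) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await executeAction(actionId);
    } catch (err) {
      lastError = err;
    }
  }
  console.log(`Skipping ${actionId} after ${maxAttempts} failed attempts: ${lastError.message}`);
  return null;
}
```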
## User Interaction Points

The orchestrator pauses for user input at these points:

1. **action-init**: Confirm target skill and describe issue
2. **action-propose-fixes**: Select which fixes to apply
3. **action-verify**: Review verification results, decide to continue or stop
4. **action-complete**: Review final summary
282
.claude/skills/skill-tuning/phases/state-schema.md
Normal file
@@ -0,0 +1,282 @@
# State Schema

Defines the state structure for the skill-tuning orchestrator.

## State Structure

```typescript
interface TuningState {
  // === Core Status ===
  status: 'pending' | 'running' | 'completed' | 'failed';
  started_at: string;  // ISO timestamp
  updated_at: string;  // ISO timestamp

  // === Target Skill Info ===
  target_skill: {
    name: string;              // e.g., "software-manual"
    path: string;              // e.g., ".claude/skills/software-manual"
    execution_mode: 'sequential' | 'autonomous';
    phases: string[];          // List of phase files
    specs: string[];           // List of spec files
  };

  // === User Input ===
  user_issue_description: string;  // User's problem description
  focus_areas: string[];           // User-specified focus (optional)

  // === Diagnosis Results ===
  diagnosis: {
    context: DiagnosisResult | null;
    memory: DiagnosisResult | null;
    dataflow: DiagnosisResult | null;
    agent: DiagnosisResult | null;
  };

  // === Issues Found ===
  issues: Issue[];
  issues_by_severity: {
    critical: number;
    high: number;
    medium: number;
    low: number;
  };

  // === Fix Management ===
  proposed_fixes: Fix[];
  applied_fixes: AppliedFix[];
  pending_fixes: string[];  // Fix IDs pending application

  // === Iteration Control ===
  iteration_count: number;
  max_iterations: number;   // Default: 5

  // === Quality Metrics ===
  quality_score: number;    // 0-100
  quality_gate: 'pass' | 'review' | 'fail';

  // === Orchestrator State ===
  completed_actions: string[];
  current_action: string | null;
  action_history: ActionHistoryEntry[];

  // === Error Handling ===
  errors: ErrorEntry[];
  error_count: number;
  max_errors: number;       // Default: 3

  // === Output Paths ===
  work_dir: string;
  backup_dir: string;
}

interface DiagnosisResult {
  status: 'completed' | 'skipped' | 'failed';
  issues_found: number;
  severity: 'critical' | 'high' | 'medium' | 'low' | 'none';
  execution_time_ms: number;
  details: {
    patterns_checked: string[];
    patterns_matched: string[];
    evidence: Evidence[];
    recommendations: string[];
  };
}

interface Evidence {
  file: string;
  line?: number;
  pattern: string;
  context: string;
  severity: string;
}

interface Issue {
  id: string;  // e.g., "ISS-001"
  type: 'context_explosion' | 'memory_loss' | 'dataflow_break' | 'agent_failure';
  severity: 'critical' | 'high' | 'medium' | 'low';
  priority: number;  // 1 = highest
  location: {
    file: string;
    line_start?: number;
    line_end?: number;
    phase?: string;
  };
  description: string;
  evidence: string[];
  root_cause: string;
  impact: string;
  suggested_fix: string;
  related_issues: string[];  // Issue IDs
}

interface Fix {
  id: string;           // e.g., "FIX-001"
  issue_ids: string[];  // Issues this fix addresses
  strategy: FixStrategy;
  description: string;
  rationale: string;
  changes: FileChange[];
  risk: 'low' | 'medium' | 'high';
  estimated_impact: string;
  verification_steps: string[];
}

type FixStrategy =
  | 'context_summarization'  // Add context compression
  | 'sliding_window'         // Implement sliding context window
  | 'structured_state'       // Convert to structured state passing
  | 'constraint_injection'   // Add constraint propagation
  | 'checkpoint_restore'     // Add checkpointing mechanism
  | 'schema_enforcement'     // Add data contract validation
  | 'orchestrator_refactor'  // Refactor agent coordination
  | 'state_centralization'   // Centralize state management
  | 'custom';                // Custom fix

interface FileChange {
  file: string;
  action: 'create' | 'modify' | 'delete';
  old_content?: string;
  new_content?: string;
  diff?: string;
}

interface AppliedFix {
  fix_id: string;
  applied_at: string;
  success: boolean;
  backup_path: string;
  verification_result: 'pass' | 'fail' | 'pending';
  rollback_available: boolean;
}

interface ActionHistoryEntry {
  action: string;
  started_at: string;
  completed_at: string;
  result: 'success' | 'failure' | 'skipped';
  output_files: string[];
}

interface ErrorEntry {
  action: string;
  message: string;
  timestamp: string;
  recoverable: boolean;
}
```
## Initial State Template

```json
{
  "status": "pending",
  "started_at": null,
  "updated_at": null,
  "target_skill": {
    "name": null,
    "path": null,
    "execution_mode": null,
    "phases": [],
    "specs": []
  },
  "user_issue_description": "",
  "focus_areas": [],
  "diagnosis": {
    "context": null,
    "memory": null,
    "dataflow": null,
    "agent": null
  },
  "issues": [],
  "issues_by_severity": {
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0
  },
  "proposed_fixes": [],
  "applied_fixes": [],
  "pending_fixes": [],
  "iteration_count": 0,
  "max_iterations": 5,
  "quality_score": 0,
  "quality_gate": "fail",
  "completed_actions": [],
  "current_action": null,
  "action_history": [],
  "errors": [],
  "error_count": 0,
  "max_errors": 3,
  "work_dir": null,
  "backup_dir": null
}
```
## State Transition Diagram

```
                      ┌─────────────┐
                      │   pending   │
                      └──────┬──────┘
                             │ action-init
                             ↓
                      ┌─────────────┐
           ┌──────────│   running   │──────────┐
           │          └──────┬──────┘          │
           │                 │                 │
 diagnosis │    ┌────────────┼────────────┐    │ error_count >= 3
 actions   │    │            │            │    │
           │    ↓            ↓            ↓    │
           │ context      memory      dataflow │
           │    │            │            │    │
           │    └────────────┼────────────┘    │
           │                 │                 │
           │                 ↓                 │
           │          action-verify            │
           │                 │                 │
           │     ┌───────────┼───────────┐     │
           │     │           │           │     │
           │     ↓           ↓           ↓     │
           │  quality     iterate      apply   │
           │  gate=pass   (< max)      fix     │
           │     │           │           │     │
           │     │           └───────────┘     │
           │     ↓                             ↓
           │ ┌─────────────┐           ┌─────────────┐
           └→│  completed  │           │   failed    │
             └─────────────┘           └─────────────┘
```
## State Update Rules

### Atomicity
All state updates must be atomic - read current state, apply changes, write entire state.

### Immutability
Never mutate state in place. Always create new state object with changes.

### Validation
Before writing state, validate against schema to prevent corruption.

### Timestamps
Always update `updated_at` on every state change.

```javascript
function updateState(workDir, updates) {
  const currentState = JSON.parse(Read(`${workDir}/state.json`));

  const newState = {
    ...currentState,
    ...updates,
    updated_at: new Date().toISOString()
  };

  // Validate before write
  if (!validateState(newState)) {
    throw new Error('Invalid state update');
  }

  Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2));
  return newState;
}
```
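`validateState` is referenced above but not defined in this file. A minimal sketch, under the assumption that checking the core fields is enough to catch corruption (a real implementation might use a JSON Schema validator instead):

```javascript
// Minimal structural check against the TuningState schema; verifies only
// the fields the orchestrator depends on for control flow.
function validateState(state) {
  const validStatuses = ['pending', 'running', 'completed', 'failed'];
  return (
    state !== null &&
    typeof state === 'object' &&
    validStatuses.includes(state.status) &&
    Array.isArray(state.issues) &&
    Array.isArray(state.completed_actions) &&
    typeof state.error_count === 'number' &&
    state.error_count <= state.max_errors
  );
}
```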
210
.claude/skills/skill-tuning/specs/problem-taxonomy.md
Normal file
@@ -0,0 +1,210 @@
# Problem Taxonomy

Classification of skill execution issues with detection patterns and severity criteria.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| All Diagnosis Actions | Issue classification | All sections |
| action-propose-fixes | Strategy selection | Fix Mapping |
| action-generate-report | Severity assessment | Severity Criteria |

---

## Problem Categories

### 1. Context Explosion (P2)

**Definition**: Excessive token accumulation causing prompt size to grow unbounded.

**Root Causes**:
- Unbounded conversation history
- Full content passing instead of references
- Missing summarization mechanisms
- Agent returning full output instead of path + summary

**Detection Patterns**:

| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| CTX-001 | `/history\s*[.=].*push\|concat/` | History array growth |
| CTX-002 | `/JSON\.stringify\s*\(\s*state\s*\)/` | Full state serialization |
| CTX-003 | `/Read\([^)]+\)\s*[\+,]/` | Multiple file content concatenation |
| CTX-004 | `/return\s*\{[^}]*content:/` | Agent returning full content |
| CTX-005 | File length > 5000 chars without summarization | Long prompt without compression |

**Impact Levels**:
- **Critical**: Context exceeds model limit (128K tokens)
- **High**: Context > 50K tokens per iteration
- **Medium**: Context grows 10%+ per iteration
- **Low**: Potential for growth but currently manageable
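The regex-style checks in the table can be applied mechanically to a file's text. A sketch of scanning for the CTX patterns (the pattern set is abbreviated; IDs are from the table above, and the summarization keyword check is an assumption):

```javascript
// Scans source text for context-explosion indicators; returns matched
// pattern IDs from the CTX table.
function scanContextPatterns(text) {
  const patterns = {
    'CTX-001': /history\s*[.=].*(push|concat)/,
    'CTX-002': /JSON\.stringify\s*\(\s*state\s*\)/
  };
  const matched = [];
  for (const [id, re] of Object.entries(patterns)) {
    if (re.test(text)) matched.push(id);
  }
  // CTX-005 is a length check rather than a regex.
  if (text.length > 5000 && !/summariz/i.test(text)) matched.push('CTX-005');
  return matched;
}
```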
---

### 2. Long-tail Forgetting (P3)

**Definition**: Loss of early instructions, constraints, or goals in long execution chains.

**Root Causes**:
- No explicit constraint propagation
- Reliance on implicit context
- Missing checkpoint/restore mechanisms
- State schema without a requirements field

**Detection Patterns**:

| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| MEM-001 | Later phases missing constraint reference | Constraint not carried forward |
| MEM-002 | `/\[TASK\][^[]*(?!\[CONSTRAINTS\])/` | Task without constraints section |
| MEM-003 | Key phases without checkpoint | Missing state preservation |
| MEM-004 | State schema lacks `original_requirements` | No constraint persistence |
| MEM-005 | No verification phase | Output not checked against intent |

**Impact Levels**:
- **Critical**: Original goal completely lost
- **High**: Key constraints ignored in output
- **Medium**: Some requirements missing
- **Low**: Minor goal drift
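The `constraint_injection` fix strategy addresses MEM-001/MEM-002 by restating persisted requirements in every phase prompt. A hypothetical sketch, assuming an `original_requirements` field has been added to the state (the MEM-004 fix); the function name and prompt layout are illustrative:

```javascript
// Builds a phase prompt that always restates the original constraints,
// so later phases cannot silently drop them (MEM-001/MEM-002).
function buildPhasePrompt(state, taskInstructions) {
  const constraints = state.original_requirements || [];
  const constraintBlock = constraints.length
    ? `[CONSTRAINTS]\n${constraints.map(c => `- ${c}`).join('\n')}\n\n`
    : '';
  return `${constraintBlock}[TASK]\n${taskInstructions}`;
}
```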
---

### 3. Data Flow Disruption (P0)

**Definition**: Inconsistent state management causing data loss or corruption.

**Root Causes**:
- Multiple state storage locations
- Inconsistent field naming
- Missing schema validation
- Format transformation without normalization

**Detection Patterns**:

| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| DF-001 | Multiple state file writes | Scattered state storage |
| DF-002 | Same concept, different names | Field naming inconsistency |
| DF-003 | JSON.parse without validation | Missing schema validation |
| DF-004 | Files written but never read | Orphaned outputs |
| DF-005 | Autonomous skill without state-schema | Undefined state structure |

**Impact Levels**:
- **Critical**: Data loss or corruption
- **High**: State inconsistency between phases
- **Medium**: Potential for inconsistency
- **Low**: Minor naming inconsistencies

---

### 4. Agent Coordination Failure (P1)

**Definition**: Fragile agent call patterns causing cascading failures.

**Root Causes**:
- Missing error handling in Task calls
- No result validation
- Inconsistent agent configurations
- Deeply nested agent calls

**Detection Patterns**:

| Pattern ID | Regex/Check | Description |
|------------|-------------|-------------|
| AGT-001 | Task without try-catch | Missing error handling |
| AGT-002 | Result used without validation | No return value check |
| AGT-003 | > 3 different agent types | Agent type proliferation |
| AGT-004 | Nested Task in prompt | Agent calling agent |
| AGT-005 | Task used but not in allowed-tools | Tool declaration mismatch |
| AGT-006 | Multiple return formats | Inconsistent agent output |

**Impact Levels**:
- **Critical**: Workflow crash on agent failure
- **High**: Unpredictable agent behavior
- **Medium**: Occasional coordination issues
- **Low**: Minor inconsistencies
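AGT-001 and AGT-002 are typically fixed together: wrap the call with error handling and validate the result before use. A sketch with a stubbed `callAgent` standing in for the real Task tool (the uniform result shape is an illustrative convention, not part of the spec):

```javascript
// Wraps an agent call with error handling (AGT-001) and result
// validation (AGT-002); returns a uniform { ok, value, error } shape.
async function safeAgentCall(callAgent, prompt) {
  try {
    const raw = await callAgent(prompt);
    const value = JSON.parse(raw);
    if (typeof value !== 'object' || value === null) {
      return { ok: false, value: null, error: 'non-object result' };
    }
    return { ok: true, value, error: null };
  } catch (err) {
    return { ok: false, value: null, error: err.message };
  }
}
```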
---

## Severity Criteria

### Global Severity Matrix

| Severity | Definition | Action Required |
|----------|------------|-----------------|
| **Critical** | Blocks execution or causes data loss | Immediate fix required |
| **High** | Significantly impacts reliability | Should fix before deployment |
| **Medium** | Affects quality or maintainability | Fix in next iteration |
| **Low** | Minor improvement opportunity | Optional fix |

### Severity Calculation

```javascript
function calculateIssueSeverity(issue) {
  const weights = {
    impact_on_execution: 40,  // Does it block the workflow?
    data_integrity_risk: 30,  // Can it cause data loss?
    frequency: 20,            // How often does it occur?
    complexity_to_fix: 10     // How hard is it to fix?
  };

  let score = 0;

  // Impact on execution
  if (issue.blocks_execution) score += weights.impact_on_execution;
  else if (issue.degrades_execution) score += weights.impact_on_execution * 0.5;

  // Data integrity
  if (issue.causes_data_loss) score += weights.data_integrity_risk;
  else if (issue.causes_inconsistency) score += weights.data_integrity_risk * 0.5;

  // Frequency
  if (issue.occurs_every_run) score += weights.frequency;
  else if (issue.occurs_sometimes) score += weights.frequency * 0.5;

  // Complexity (inverse - easier to fix = higher priority)
  if (issue.fix_complexity === 'low') score += weights.complexity_to_fix;
  else if (issue.fix_complexity === 'medium') score += weights.complexity_to_fix * 0.5;

  // Map score to severity
  if (score >= 70) return 'critical';
  if (score >= 50) return 'high';
  if (score >= 30) return 'medium';
  return 'low';
}
```

---

## Fix Mapping

| Problem Type | Recommended Strategies | Priority Order |
|--------------|------------------------|----------------|
| Context Explosion | sliding_window, path_reference, context_summarization | 1, 2, 3 |
| Long-tail Forgetting | constraint_injection, state_constraints_field, checkpoint | 1, 2, 3 |
| Data Flow Disruption | state_centralization, schema_enforcement, field_normalization | 1, 2, 3 |
| Agent Coordination | error_wrapping, result_validation, flatten_nesting | 1, 2, 3 |

---

## Cross-Category Dependencies

Some issues may trigger others:

```
Context Explosion ──→ Long-tail Forgetting
  (A large context pushes important information out)

Data Flow Disruption ──→ Agent Coordination Failure
  (Inconsistent data causes agents to fail)

Agent Coordination Failure ──→ Context Explosion
  (Failed retries add to the context)
```

When fixing, address categories in this order:
1. **P0 Data Flow** - Foundation for other fixes
2. **P1 Agent Coordination** - Stability
3. **P2 Context Explosion** - Efficiency
4. **P3 Long-tail Forgetting** - Quality
263
.claude/skills/skill-tuning/specs/quality-gates.md
Normal file
@@ -0,0 +1,263 @@
# Quality Gates

Quality thresholds and verification criteria for skill tuning.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| action-generate-report | Calculate quality score | Scoring |
| action-verify | Check quality gates | Gate Definitions |
| action-complete | Final assessment | Pass Criteria |

---

## Quality Dimensions

### 1. Issue Severity Distribution (40%)

Measures the severity profile of identified issues.

| Metric | Weight | Calculation |
|--------|--------|-------------|
| Critical Issues | -25 each | High penalty |
| High Issues | -15 each | Significant penalty |
| Medium Issues | -5 each | Moderate penalty |
| Low Issues | -1 each | Minor penalty |

**Score Calculation**:
```javascript
function calculateSeverityScore(issues) {
  const weights = { critical: 25, high: 15, medium: 5, low: 1 };
  const deductions = issues.reduce((sum, issue) =>
    sum + (weights[issue.severity] || 0), 0);
  return Math.max(0, 100 - deductions);
}
```

### 2. Fix Effectiveness (30%)

Measures the success rate of applied fixes.

| Metric | Weight | Threshold |
|--------|--------|-----------|
| Fixes Verified Pass | +30 | > 80% pass rate |
| Fixes Verified Fail | -20 | < 50% triggers review |
| Issues Resolved | +10 | Per resolved issue |

**Score Calculation**:
```javascript
function calculateFixScore(appliedFixes) {
  const total = appliedFixes.length;
  if (total === 0) return 100; // No fixes needed = good

  const passed = appliedFixes.filter(f => f.verification_result === 'pass').length;
  return Math.round((passed / total) * 100);
}
```

### 3. Coverage Completeness (20%)

Measures diagnosis coverage across all areas.

| Metric | Weight | Threshold |
|--------|--------|-----------|
| All 4 diagnoses complete | +20 | Full coverage |
| 3 diagnoses complete | +15 | Good coverage |
| 2 diagnoses complete | +10 | Partial coverage |
| < 2 diagnoses complete | +0 | Insufficient |

### 4. Iteration Efficiency (10%)

Measures how quickly issues are resolved.

| Metric | Weight | Threshold |
|--------|--------|-----------|
| Resolved in 1 iteration | +10 | Excellent |
| Resolved in 2 iterations | +7 | Good |
| Resolved in 3 iterations | +4 | Acceptable |
| > 3 iterations | +0 | Needs improvement |

---

## Gate Definitions

### Gate: PASS

**Threshold**: Quality Score >= 80 AND Critical Issues = 0 AND High Issues <= 2

**Meaning**: Skill is production-ready with minor issues.

**Actions**:
- Complete tuning session
- Generate summary report
- No further fixes required

### Gate: REVIEW

**Threshold**: Quality Score 60-79 OR High Issues 3-5

**Meaning**: Skill has issues requiring attention.

**Actions**:
- Review remaining issues
- Apply additional fixes if possible
- May require manual intervention

### Gate: FAIL

**Threshold**: Quality Score < 60 OR Critical Issues > 0 OR High Issues > 5

**Meaning**: Skill has serious issues blocking deployment.

**Actions**:
- Must fix critical issues
- Re-run diagnosis after fixes
- Consider an architectural review

---

## Quality Score Calculation

```javascript
function calculateQualityScore(state) {
  // Dimension 1: Severity (40%)
  const severityScore = calculateSeverityScore(state.issues);

  // Dimension 2: Fix Effectiveness (30%)
  const fixScore = calculateFixScore(state.applied_fixes);

  // Dimension 3: Coverage (20%)
  const diagnosisCount = Object.values(state.diagnosis)
    .filter(d => d !== null).length;
  const coverageScore = [0, 0, 10, 15, 20][diagnosisCount] || 0;

  // Dimension 4: Efficiency (10%)
  const efficiencyScore = state.iteration_count <= 1 ? 10 :
                          state.iteration_count <= 2 ? 7 :
                          state.iteration_count <= 3 ? 4 : 0;

  // Weighted total
  const total = (severityScore * 0.4) +
                (fixScore * 0.3) +
                (coverageScore * 1.0) +  // Already scaled to 20
                (efficiencyScore * 1.0); // Already scaled to 10

  return Math.round(total);
}

function determineQualityGate(state) {
  const score = calculateQualityScore(state);
  const criticalCount = state.issues.filter(i => i.severity === 'critical').length;
  const highCount = state.issues.filter(i => i.severity === 'high').length;

  if (criticalCount > 0) return 'fail';
  if (highCount > 5) return 'fail';
  if (score < 60) return 'fail';

  if (highCount > 2) return 'review';
  if (score < 80) return 'review';

  return 'pass';
}
```
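A worked example helps check the arithmetic. This sketch restates the severity and fix scoring so it runs standalone; the sample session values are invented:

```javascript
// Restated scoring functions for a standalone worked example.
function severityScore(issues) {
  const w = { critical: 25, high: 15, medium: 5, low: 1 };
  return Math.max(0, 100 - issues.reduce((s, i) => s + (w[i.severity] || 0), 0));
}
function fixScore(fixes) {
  if (fixes.length === 0) return 100;
  const passed = fixes.filter(f => f.verification_result === 'pass').length;
  return Math.round((passed / fixes.length) * 100);
}

// Sample session: one high + two medium issues, 2 of 2 fixes verified,
// all 4 diagnoses complete, finished in 2 iterations.
const sampleIssues = [{ severity: 'high' }, { severity: 'medium' }, { severity: 'medium' }];
const sev = severityScore(sampleIssues);     // 100 - 15 - 5 - 5 = 75
const fix = fixScore([
  { verification_result: 'pass' },
  { verification_result: 'pass' }
]);                                          // 2/2 pass = 100
const coverage = 20;                         // 4 diagnoses complete
const efficiency = 7;                        // 2 iterations
const total = Math.round(sev * 0.4 + fix * 0.3 + coverage + efficiency);
// 30 + 30 + 20 + 7 = 87; with no critical and only one high issue,
// determineQualityGate would return 'pass' (score >= 80, high <= 2).
```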

---

## Verification Criteria

### For Each Issue Type

#### Context Explosion Issues
- [ ] Token count does not grow unbounded
- [ ] History limited to a reasonable size
- [ ] No full content in prompts (paths used instead)
- [ ] Agent returns are compact

#### Long-tail Forgetting Issues
- [ ] Constraints visible in all phase prompts
- [ ] State schema includes a requirements field
- [ ] Checkpoints exist at key milestones
- [ ] Output matches original constraints

#### Data Flow Issues
- [ ] Single state.json after execution
- [ ] No orphan state files
- [ ] Schema validation active
- [ ] Consistent field naming

#### Agent Coordination Issues
- [ ] All Task calls have error handling
- [ ] Agent results validated before use
- [ ] No nested agent calls
- [ ] Tool declarations match usage

---

## Iteration Control

### Max Iterations

Default: 5 iterations

**Rationale**:
- Each iteration may introduce new issues
- Diminishing returns after 3-4 iterations
- Prevents infinite loops

### Iteration Exit Criteria

```javascript
function shouldContinueIteration(state) {
  // Exit if quality gate passed
  if (state.quality_gate === 'pass') return false;

  // Exit if max iterations reached
  if (state.iteration_count >= state.max_iterations) return false;

  // Exit if no improvement in the last 2 iterations
  if (state.iteration_count >= 2) {
    const recentHistory = state.action_history.slice(-10);
    const issuesResolvedRecently = recentHistory.filter(a =>
      a.action === 'action-verify' && a.result === 'success'
    ).length;

    if (issuesResolvedRecently === 0) {
      console.log('No progress in recent iterations, stopping.');
      return false;
    }
  }

  // Continue if critical/high issues remain
  const hasUrgentIssues = state.issues.some(i =>
    i.severity === 'critical' || i.severity === 'high'
  );

  return hasUrgentIssues;
}
```

---

## Reporting Format

### Quality Summary Table

| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Severity Distribution | {score}/100 | 40% | {weighted} |
| Fix Effectiveness | {score}/100 | 30% | {weighted} |
| Coverage Completeness | {score}/20 | 20% | {score} |
| Iteration Efficiency | {score}/10 | 10% | {score} |
| **Total** | | | **{total}/100** |

### Gate Status

```
Quality Gate: {PASS|REVIEW|FAIL}

Criteria:
- Quality Score: {score} (threshold: 60)
- Critical Issues: {count} (threshold: 0)
- High Issues: {count} (threshold: 5)
```
1016
.claude/skills/skill-tuning/specs/tuning-strategies.md
Normal file
File diff suppressed because it is too large
153
.claude/skills/skill-tuning/templates/diagnosis-report.md
Normal file
@@ -0,0 +1,153 @@
# Diagnosis Report Template

Template for individual diagnosis action reports.

## Template

```markdown
# {{diagnosis_type}} Diagnosis Report

**Target Skill**: {{skill_name}}
**Diagnosis Type**: {{diagnosis_type}}
**Executed At**: {{timestamp}}
**Duration**: {{duration_ms}}ms

---

## Summary

| Metric | Value |
|--------|-------|
| Issues Found | {{issues_found}} |
| Severity | {{severity}} |
| Patterns Checked | {{patterns_checked_count}} |
| Patterns Matched | {{patterns_matched_count}} |

---

## Patterns Analyzed

{{#each patterns_checked}}
### {{pattern_name}}

- **Status**: {{status}}
- **Matches**: {{match_count}}
- **Files Affected**: {{affected_files}}

{{/each}}

---

## Issues Identified

{{#if issues.length}}
{{#each issues}}
### {{id}}: {{description}}

| Field | Value |
|-------|-------|
| Type | {{type}} |
| Severity | {{severity}} |
| Location | {{location}} |
| Root Cause | {{root_cause}} |
| Impact | {{impact}} |

**Evidence**:
{{#each evidence}}
- `{{this}}`
{{/each}}

**Suggested Fix**: {{suggested_fix}}

---
{{/each}}
{{else}}
_No issues found in this diagnosis area._
{{/if}}

---

## Recommendations

{{#if recommendations.length}}
{{#each recommendations}}
{{@index}}. {{this}}
{{/each}}
{{else}}
No specific recommendations - area appears healthy.
{{/if}}

---

## Raw Data

Full diagnosis data available at:
`{{output_file}}`
```

## Variable Reference

| Variable | Type | Source |
|----------|------|--------|
| `diagnosis_type` | string | 'context' \| 'memory' \| 'dataflow' \| 'agent' |
| `skill_name` | string | state.target_skill.name |
| `timestamp` | string | ISO timestamp |
| `duration_ms` | number | Execution time |
| `issues_found` | number | issues.length |
| `severity` | string | Calculated severity |
| `patterns_checked` | array | Patterns analyzed |
| `patterns_matched` | array | Patterns with matches |
| `issues` | array | Issue objects |
| `recommendations` | array | String recommendations |
| `output_file` | string | Path to JSON file |

## Usage

```javascript
function renderDiagnosisReport(diagnosis, diagnosisType, skillName, outputFile) {
  return `# ${diagnosisType} Diagnosis Report

**Target Skill**: ${skillName}
**Diagnosis Type**: ${diagnosisType}
**Executed At**: ${new Date().toISOString()}
**Duration**: ${diagnosis.execution_time_ms}ms

---

## Summary

| Metric | Value |
|--------|-------|
| Issues Found | ${diagnosis.issues_found} |
| Severity | ${diagnosis.severity} |
| Patterns Checked | ${diagnosis.details.patterns_checked.length} |
| Patterns Matched | ${diagnosis.details.patterns_matched.length} |

---

## Issues Identified

${diagnosis.details.evidence.map((e, i) => `
### Issue ${i + 1}

- **File**: ${e.file}
- **Pattern**: ${e.pattern}
- **Severity**: ${e.severity}
- **Context**: \`${e.context}\`
`).join('\n')}

---

## Recommendations

${diagnosis.details.recommendations.map((r, i) => `${i + 1}. ${r}`).join('\n')}

---

## Raw Data

Full diagnosis data available at:
\`${outputFile}\`
`;
}
```
204
.claude/skills/skill-tuning/templates/fix-proposal.md
Normal file
@@ -0,0 +1,204 @@
# Fix Proposal Template

Template for fix proposal documentation.

## Template

```markdown
# Fix Proposal: {{fix_id}}

**Strategy**: {{strategy}}
**Risk Level**: {{risk}}
**Issues Addressed**: {{issue_ids}}

---

## Description

{{description}}

## Rationale

{{rationale}}

---

## Affected Files

{{#each changes}}
### {{file}}

**Action**: {{action}}

```diff
{{diff}}
```

{{/each}}

---

## Implementation Steps

{{#each implementation_steps}}
{{@index}}. {{this}}
{{/each}}

---

## Risk Assessment

| Factor | Assessment |
|--------|------------|
| Complexity | {{complexity}} |
| Reversibility | {{#if reversible}}Yes{{else}}No{{/if}} |
| Breaking Changes | {{breaking_changes}} |
| Test Coverage | {{test_coverage}} |

**Overall Risk**: {{risk}}

---

## Verification Steps

{{#each verification_steps}}
- [ ] {{this}}
{{/each}}

---

## Rollback Plan

{{#if rollback_available}}
To rollback this fix:

```bash
{{rollback_command}}
```
{{else}}
_Rollback not available for this fix type._
{{/if}}

---

## Estimated Impact

{{estimated_impact}}
```

## Variable Reference

| Variable | Type | Source |
|----------|------|--------|
| `fix_id` | string | Generated ID (FIX-001) |
| `strategy` | string | Fix strategy name |
| `risk` | string | 'low' \| 'medium' \| 'high' |
| `issue_ids` | array | Related issue IDs |
| `description` | string | Human-readable description |
| `rationale` | string | Why this fix works |
| `changes` | array | File change objects |
| `implementation_steps` | array | Step-by-step guide |
| `verification_steps` | array | How to verify fix worked |
| `estimated_impact` | string | Expected improvement |
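The `risk` field takes 'low' | 'medium' | 'high', but the table does not say how it is derived from the Risk Assessment factors. One possible scoring rule (the weights and thresholds here are assumptions, not the skill's documented behavior) might look like:

```javascript
// Score each risk factor, then map the total to a coarse risk level.
// complexity: 'low' | 'medium' | 'high'; reversible: boolean;
// breakingChanges: count of breaking changes in the proposal.
function assessRisk({ complexity, reversible, breakingChanges }) {
  let score = 0;
  if (complexity === 'high') score += 2;
  else if (complexity === 'medium') score += 1;
  if (!reversible) score += 2;
  if (breakingChanges > 0) score += 2;
  if (score >= 4) return 'high';
  if (score >= 2) return 'medium';
  return 'low';
}
```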
## Usage

```javascript
function renderFixProposal(fix) {
  return `# Fix Proposal: ${fix.id}

**Strategy**: ${fix.strategy}
**Risk Level**: ${fix.risk}
**Issues Addressed**: ${fix.issue_ids.join(', ')}

---

## Description

${fix.description}

## Rationale

${fix.rationale}

---

## Affected Files

${fix.changes.map(change => `
### ${change.file}

**Action**: ${change.action}

\`\`\`diff
${change.diff || change.new_content?.slice(0, 200) || 'N/A'}
\`\`\`
`).join('\n')}

---

## Verification Steps

${fix.verification_steps.map(step => `- [ ] ${step}`).join('\n')}

---

## Estimated Impact

${fix.estimated_impact}
`;
}
```

## Fix Strategy Templates

### sliding_window

```markdown
## Description
Implement sliding window for conversation history to prevent unbounded growth.

## Changes
- Add MAX_HISTORY constant
- Modify history update logic to slice array
- Update state schema documentation

## Verification
- [ ] Run skill for 10+ iterations
- [ ] Verify history.length <= MAX_HISTORY
- [ ] Check no data loss for recent items
```
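The sliding_window changes can be sketched in a few lines; `MAX_HISTORY` and `appendToHistory` are assumed names for illustration, not existing code in the target skill:

```javascript
// Cap conversation history at a fixed size so state growth stays bounded.
const MAX_HISTORY = 10;

// Append an entry, then keep only the most recent MAX_HISTORY entries.
function appendToHistory(history, entry) {
  return [...history, entry].slice(-MAX_HISTORY);
}
```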

### constraint_injection

```markdown
## Description
Add explicit constraint section to each phase prompt.

## Changes
- Add [CONSTRAINTS] section template
- Reference state.original_requirements
- Add reminder before output section

## Verification
- [ ] Check constraints visible in all phases
- [ ] Test with specific constraint
- [ ] Verify output respects constraint
```
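A minimal sketch of the constraint_injection change, assuming `state.original_requirements` holds an array of requirement strings (the function name is hypothetical):

```javascript
// Prepend an explicit [CONSTRAINTS] section, built from
// state.original_requirements, to a phase prompt.
function injectConstraints(prompt, state) {
  const constraints = (state.original_requirements || [])
    .map(r => `- ${r}`)
    .join('\n');
  return `[CONSTRAINTS]\n${constraints}\n\n${prompt}`;
}
```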

### error_wrapping

```markdown
## Description
Wrap all Task calls in try-catch with retry logic.

## Changes
- Create safeTask wrapper function
- Replace direct Task calls
- Add error logging to state

## Verification
- [ ] Simulate agent failure
- [ ] Verify graceful error handling
- [ ] Check retry logic
```
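The error_wrapping changes can be sketched as follows; `safeTask` and `taskFn` are assumed names, with `taskFn` standing in for the real Task tool call:

```javascript
// Retry a task call up to maxRetries additional times before giving up.
// A real implementation would also record each failure in skill state.
async function safeTask(taskFn, args, maxRetries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await taskFn(args);
    } catch (err) {
      lastError = err; // swallow and retry
    }
  }
  throw lastError;
}
```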