mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-12 17:21:19 +08:00
Implement phases for skill iteration tuning: Evaluation, Improvement, and Reporting
- Added Phase 3: Evaluate Quality with steps for preparing context, constructing evaluation prompts, executing evaluation via CLI, parsing scores, and checking termination conditions.
- Introduced Phase 4: Apply Improvements to implement targeted changes based on evaluation suggestions, including agent execution and change documentation.
- Created Phase 5: Final Report to generate a comprehensive report of the iteration process, including score progression and remaining weaknesses.
- Established evaluation criteria in a new document to guide the evaluation process.
- Developed templates for evaluation and execution prompts to standardize input for the evaluation and execution phases.
144
.claude/skills/skill-iter-tune/phases/01-setup.md
Normal file
@@ -0,0 +1,144 @@
# Phase 1: Setup

Initialize the workspace, back up skills, and parse inputs.

## Objective

- Parse skill path(s) and test scenario from user input
- Validate all skill paths exist and contain SKILL.md
- Create isolated workspace directory structure
- Back up original skill files
- Initialize iteration-state.json

## Execution

### Step 1.1: Parse Input

Parse `$ARGUMENTS` to extract skill paths and test scenario.

```javascript
// Parse skill paths (first argument or comma-separated)
const args = $ARGUMENTS.trim();
const pathMatch = args.match(/^([^\s]+)/);
const rawPaths = pathMatch ? pathMatch[1].split(',') : [];

// Parse test scenario
const scenarioMatch = args.match(/(?:--scenario|--test)\s+"([^"]+)"/);
// Declared with `let` so the interactive fallback below can fill it in
let scenarioText = scenarioMatch ? scenarioMatch[1] : args.replace(rawPaths.join(','), '').trim();

// Record chain order (preserves input order for chain mode)
const chainOrder = rawPaths.map(p => p.startsWith('.claude/') ? p.split('/').pop() : p);

// If no scenario, ask user
if (!scenarioText) {
  const response = AskUserQuestion({
    questions: [{
      question: "Please describe the test scenario for evaluating this skill:",
      header: "Test Scenario",
      multiSelect: false,
      options: [
        { label: "General quality test", description: "Evaluate overall skill quality with a generic task" },
        { label: "Specific scenario", description: "I'll describe a specific test case" }
      ]
    }]
  });
  // Use response to fill in scenarioText (the test scenario description)
}
```

### Step 1.2: Validate Skill Paths

```javascript
const targetSkills = [];
for (const rawPath of rawPaths) {
  const skillPath = rawPath.startsWith('.claude/') ? rawPath : `.claude/skills/${rawPath}`;

  // Validate SKILL.md exists
  const skillFiles = Glob(`${skillPath}/SKILL.md`);
  if (skillFiles.length === 0) {
    throw new Error(`Skill not found at: ${skillPath} -- SKILL.md missing`);
  }

  // Collect all skill files
  const allFiles = Glob(`${skillPath}/**/*.md`);
  targetSkills.push({
    name: skillPath.split('/').pop(),
    path: skillPath,
    files: allFiles.map(f => f.replace(skillPath + '/', '')),
    primary_file: 'SKILL.md'
  });
}
```

### Step 1.3: Create Workspace

```javascript
const ts = Date.now();
const workDir = `.workflow/.scratchpad/skill-iter-tune-${ts}`;

Bash(`mkdir -p "${workDir}/backups" "${workDir}/iterations"`);
```

### Step 1.4: Backup Original Skills

```javascript
for (const skill of targetSkills) {
  Bash(`cp -r "${skill.path}" "${workDir}/backups/${skill.name}"`);
}
```

### Step 1.5: Initialize State

Write `iteration-state.json` with initial state:

```javascript
const initialState = {
  status: 'running',
  started_at: new Date().toISOString(),
  updated_at: new Date().toISOString(),
  target_skills: targetSkills,
  test_scenario: {
    description: scenarioText,
    // Parse --requirements and --input-args from $ARGUMENTS if provided
    // e.g., --requirements "clear output,no errors" --input-args "my-skill --scenario test"
    requirements: parseListArg(args, '--requirements') || [],
    input_args: parseStringArg(args, '--input-args') || '',
    success_criteria: parseStringArg(args, '--success-criteria') || 'Produces correct, high-quality output'
  },
  execution_mode: workflowPreferences.executionMode || 'single',
  chain_order: workflowPreferences.executionMode === 'chain'
    ? targetSkills.map(s => s.name)
    : [],
  current_iteration: 0,
  max_iterations: workflowPreferences.maxIterations,
  quality_threshold: workflowPreferences.qualityThreshold,
  latest_score: 0,
  score_trend: [],
  converged: false,
  iterations: [],
  errors: [],
  error_count: 0,
  max_errors: 3,
  work_dir: workDir,
  backup_dir: `${workDir}/backups`
};

Write(`${workDir}/iteration-state.json`, JSON.stringify(initialState, null, 2));

// Chain mode: create per-skill tracking tasks
if (initialState.execution_mode === 'chain') {
  for (const skill of targetSkills) {
    TaskCreate({
      subject: `Chain: ${skill.name}`,
      activeForm: `Tracking ${skill.name}`,
      description: `Skill chain member: ${skill.path} | Position: ${targetSkills.indexOf(skill) + 1}/${targetSkills.length}`
    });
  }
}
```
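
Step 1.5 calls `parseListArg` and `parseStringArg`, but this diff does not define them. The following is a minimal sketch of what such helpers might look like, assuming flag values are always wrapped in double quotes (as in the `--requirements "clear output,no errors"` example). The function bodies and `sampleArgs` are illustrative assumptions, not the skill's actual implementation.

```javascript
// Hypothetical helpers: the diff references these names but does not define them.

// Return the double-quoted value that follows `flag`, or null when the flag is absent.
function parseStringArg(args, flag) {
  // Escape regex metacharacters so the flag is matched literally
  const escapedFlag = flag.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const match = args.match(new RegExp(`${escapedFlag}\\s+"([^"]*)"`));
  return match ? match[1] : null;
}

// Split a comma-separated flag value into trimmed, non-empty items, or null when absent.
function parseListArg(args, flag) {
  const value = parseStringArg(args, flag);
  if (value === null) return null;
  return value.split(',').map(item => item.trim()).filter(Boolean);
}

// Illustrative input only
const sampleArgs = 'my-skill --scenario "smoke test" --requirements "clear output, no errors"';
console.log(parseListArg(sampleArgs, '--requirements'));
console.log(parseStringArg(sampleArgs, '--input-args'));
```

Returning `null` for a missing flag keeps the `|| []` and `|| ''` fallbacks in the state initializer meaningful.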

## Output

- **Variables**: `workDir`, `targetSkills[]`, `testScenario`, `chainOrder` (chain mode)
- **Files**: `iteration-state.json`, `backups/` directory with skill copies
- **TodoWrite**: Mark Phase 1 completed, start Iteration Loop. Chain mode: per-skill tracking tasks created
292
.claude/skills/skill-iter-tune/phases/02-execute.md
Normal file
@@ -0,0 +1,292 @@
# Phase 2: Execute Skill

> **COMPACT SENTINEL [Phase 2: Execute]**
> This phase contains 4 execution steps (Step 2.1 -- 2.4).
> If you can read this sentinel but cannot find the full Step protocol below, context has been compressed.
> Recovery: `Read("phases/02-execute.md")`

Execute the target skill against the test scenario using `ccw cli --tool claude --mode write`. Claude receives the full skill definition and simulates producing its expected output artifacts.

## Objective

- Snapshot current skill version before execution
- Construct execution prompt with full skill content + test scenario
- Execute via ccw cli Claude
- Collect output artifacts

## Execution

### Step 2.1: Snapshot Current Skill

```javascript
const N = state.current_iteration;
const iterDir = `${state.work_dir}/iterations/iteration-${N}`;
Bash(`mkdir -p "${iterDir}/skill-snapshot" "${iterDir}/artifacts"`);

// Chain mode: create per-skill artifact directories
if (state.execution_mode === 'chain') {
  for (const skillName of state.chain_order) {
    Bash(`mkdir -p "${iterDir}/artifacts/${skillName}"`);
  }
}

// Snapshot current skill state (so we can compare/rollback)
for (const skill of state.target_skills) {
  Bash(`cp -r "${skill.path}" "${iterDir}/skill-snapshot/${skill.name}"`);
}
```
|
||||
|
||||
### Step 2.2: Construct Execution Prompt (Single Mode)
|
||||
|
||||
Read the execute-prompt template and substitute variables.
|
||||
|
||||
> Skip to Step 2.2b if `state.execution_mode === 'chain'`.
|
||||
|
||||
```javascript
|
||||
// Ref: templates/execute-prompt.md
|
||||
|
||||
// Build skillContent by reading only executable skill files (SKILL.md, phases/, specs/)
|
||||
// Exclude README.md, docs/, and other non-executable files to save tokens
|
||||
const skillContent = state.target_skills.map(skill => {
|
||||
const skillMd = Read(`${skill.path}/SKILL.md`);
|
||||
const phaseFiles = Glob(`${skill.path}/phases/*.md`).sort().map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
const specFiles = Glob(`${skill.path}/specs/*.md`).map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
|
||||
return `### File: SKILL.md\n${skillMd}\n\n` +
|
||||
phaseFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') +
|
||||
(specFiles.length > 0 ? '\n\n' + specFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') : '');
|
||||
}).join('\n\n---\n\n');
|
||||
|
||||
// Construct full prompt using template
|
||||
const executePrompt = `PURPOSE: Simulate executing the following workflow skill against a test scenario. Produce all expected output artifacts as if the skill were invoked with the given input.
|
||||
|
||||
SKILL CONTENT:
|
||||
${skillContent}
|
||||
|
||||
TEST SCENARIO:
|
||||
Description: ${state.test_scenario.description}
|
||||
Input Arguments: ${state.test_scenario.input_args}
|
||||
Requirements: ${state.test_scenario.requirements.join('; ')}
|
||||
Success Criteria: ${state.test_scenario.success_criteria}
|
||||
|
||||
TASK:
|
||||
1. Study the complete skill structure (SKILL.md + all phase files)
|
||||
2. Follow the skill execution flow sequentially
|
||||
3. For each phase, produce the artifacts that phase would generate
|
||||
4. Write all output artifacts to the current working directory
|
||||
5. Create a manifest.json listing all produced artifacts
|
||||
|
||||
MODE: write
|
||||
CONTEXT: @**/*
|
||||
EXPECTED: All artifacts written to disk + manifest.json
|
||||
CONSTRAINTS: Follow skill flow exactly, produce realistic output, not placeholders`;
|
||||
```

### Step 2.3: Execute via ccw cli

> **CHECKPOINT**: Before executing CLI, verify:
> 1. This phase is TodoWrite `in_progress`
> 2. `iterDir/artifacts/` directory exists
> 3. Prompt is properly escaped

```javascript
function escapeForShell(str) {
  // Escape backslashes first; otherwise a prompt containing \" would end the quoted string early
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\$/g, '\\$')
    .replace(/`/g, '\\`');
}

const cliCommand = `ccw cli -p "${escapeForShell(executePrompt)}" --tool claude --mode write --cd "${iterDir}/artifacts"`;

// Execute in background, wait for hook callback
Bash({
  command: cliCommand,
  run_in_background: true,
  timeout: 600000 // 10 minutes max
});

// STOP HERE -- wait for hook callback to resume
// After callback, verify artifacts were produced
```
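
Because the prompt embeds whole skill files, it can contain any character the shell treats specially. The sketch below shows two quoting strategies for a POSIX shell: escaping for a double-quoted argument, and wrapping in single quotes. The names `escapeForDoubleQuotes` and `singleQuote` and the sample strings are assumptions made for illustration; they are not part of the skill.

```javascript
// Sketch only: quoting helpers for a POSIX shell (sh/bash); names are hypothetical.

// Escape for use inside double quotes. Backslashes are handled first so the
// backslashes added by the later replacements are not doubled again.
function escapeForDoubleQuotes(str) {
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\$/g, '\\$')
    .replace(/`/g, '\\`');
}

// Alternative: wrap in single quotes, where only the single quote itself needs handling.
// Each ' becomes '\'' (close quote, escaped quote, reopen quote).
function singleQuote(str) {
  return "'" + str.replace(/'/g, "'\\''") + "'";
}

// Illustrative inputs only
const samplePrompt = 'Say "hi" to $USER and keep C:\\tmp';
console.log(escapeForDoubleQuotes(samplePrompt));
console.log(singleQuote("it's done"));
```

Single quoting leaves `$`, backticks, and backslashes untouched, which may make it the simpler choice when the prompt is large and arbitrary.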
|
||||
|
||||
### Step 2.2b: Chain Execution Path
|
||||
|
||||
> Skip this step if `state.execution_mode === 'single'`.
|
||||
|
||||
In chain mode, execute each skill sequentially. Each skill receives the previous skill's artifacts as input context.
|
||||
|
||||
```javascript
|
||||
// Chain execution: iterate through chain_order
|
||||
let previousArtifacts = ''; // Accumulates upstream output
|
||||
|
||||
for (let i = 0; i < state.chain_order.length; i++) {
|
||||
const skillName = state.chain_order[i];
|
||||
const skill = state.target_skills.find(s => s.name === skillName);
|
||||
const skillArtifactDir = `${iterDir}/artifacts/${skillName}`;
|
||||
|
||||
// Build this skill's content
|
||||
const skillMd = Read(`${skill.path}/SKILL.md`);
|
||||
const phaseFiles = Glob(`${skill.path}/phases/*.md`).sort().map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
const specFiles = Glob(`${skill.path}/specs/*.md`).map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
|
||||
const singleSkillContent = `### File: SKILL.md\n${skillMd}\n\n` +
|
||||
phaseFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') +
|
||||
(specFiles.length > 0 ? '\n\n' + specFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') : '');
|
||||
|
||||
// Build chain context from previous skill's artifacts
|
||||
const chainInputContext = previousArtifacts
|
||||
? `\nPREVIOUS CHAIN OUTPUT (from upstream skill "${state.chain_order[i - 1]}"):\n${previousArtifacts}\n\nIMPORTANT: Use the above output as input context for this skill's execution.\n`
|
||||
: '';
|
||||
|
||||
// Construct per-skill execution prompt
|
||||
// Ref: templates/execute-prompt.md
|
||||
const chainPrompt = `PURPOSE: Simulate executing the following workflow skill against a test scenario. Produce all expected output artifacts.
|
||||
|
||||
SKILL CONTENT (${skillName} — chain position ${i + 1}/${state.chain_order.length}):
|
||||
${singleSkillContent}
|
||||
${chainInputContext}
|
||||
TEST SCENARIO:
|
||||
Description: ${state.test_scenario.description}
|
||||
Input Arguments: ${state.test_scenario.input_args}
|
||||
Requirements: ${state.test_scenario.requirements.join('; ')}
|
||||
Success Criteria: ${state.test_scenario.success_criteria}
|
||||
|
||||
TASK:
|
||||
1. Study the complete skill structure
|
||||
2. Follow the skill execution flow sequentially
|
||||
3. Produce all expected artifacts
|
||||
4. Write output to the current working directory
|
||||
5. Create manifest.json listing all produced artifacts
|
||||
|
||||
MODE: write
|
||||
CONTEXT: @**/*
|
||||
CONSTRAINTS: Follow skill flow exactly, produce realistic output`;
|
||||
|
||||
function escapeForShell(str) {
|
||||
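// Escape backslashes first; otherwise a prompt containing \" would end the quoted string early
return str.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\$/g, '\\$').replace(/`/g, '\\`');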
|
||||
}
|
||||
|
||||
const cliCommand = `ccw cli -p "${escapeForShell(chainPrompt)}" --tool claude --mode write --cd "${skillArtifactDir}"`;
|
||||
|
||||
// Execute in background
|
||||
Bash({
|
||||
command: cliCommand,
|
||||
run_in_background: true,
|
||||
timeout: 600000
|
||||
});
|
||||
|
||||
// STOP -- wait for hook callback
|
||||
|
||||
// After callback: collect artifacts for next skill in chain
|
||||
const artifacts = Glob(`${skillArtifactDir}/**/*`);
|
||||
const skillSuccess = artifacts.length > 0;
|
||||
|
||||
if (skillSuccess) {
|
||||
previousArtifacts = artifacts.slice(0, 10).map(f => {
|
||||
const relPath = f.replace(skillArtifactDir + '/', '');
|
||||
const content = Read(f, { limit: 100 });
|
||||
return `--- ${relPath} ---\n${content}`;
|
||||
}).join('\n\n');
|
||||
} else {
|
||||
// Mid-chain failure: keep previous artifacts for downstream skills
|
||||
// Log warning but continue chain — downstream skills receive last successful output
|
||||
state.errors.push({
|
||||
phase: 'execute',
|
||||
message: `Chain skill "${skillName}" (position ${i + 1}) produced no artifacts. Downstream skills will receive upstream output from "${state.chain_order[i - 1] || 'none'}" instead.`,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
state.error_count++;
|
||||
// previousArtifacts remains from last successful skill (or empty if first)
|
||||
}
|
||||
|
||||
// Update per-skill TodoWrite
|
||||
// TaskUpdate chain skill task with execution status
|
||||
|
||||
// Record per-skill execution
|
||||
if (!state.iterations[N - 1].execution.chain_executions) {
|
||||
state.iterations[N - 1].execution.chain_executions = [];
|
||||
}
|
||||
state.iterations[N - 1].execution.chain_executions.push({
|
||||
skill_name: skillName,
|
||||
cli_command: cliCommand,
|
||||
artifacts_dir: skillArtifactDir,
|
||||
success: skillSuccess
|
||||
});
|
||||
|
||||
// Check error budget: abort chain once the cumulative error count reaches the configured limit
if (state.error_count >= state.max_errors) {
|
||||
state.errors.push({
|
||||
phase: 'execute',
|
||||
message: `Chain execution aborted at skill "${skillName}" — error limit reached (${state.error_count} errors).`,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
break;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2.4: Collect Artifacts
|
||||
|
||||
After CLI completes (hook callback received):
|
||||
|
||||
```javascript
|
||||
// List produced artifacts
|
||||
const artifactFiles = Glob(`${iterDir}/artifacts/**/*`);
|
||||
|
||||
// Chain mode: check per-skill artifacts
|
||||
if (state.execution_mode === 'chain') {
|
||||
const chainSuccess = state.iterations[N - 1].execution.chain_executions?.every(e => e.success) ?? false;
|
||||
state.iterations[N - 1].execution.success = chainSuccess;
|
||||
state.iterations[N - 1].execution.artifacts_dir = `${iterDir}/artifacts`;
|
||||
} else {
|
||||
|
||||
if (artifactFiles.length === 0) {
|
||||
// Execution produced nothing -- record error
|
||||
state.iterations[N - 1].execution = {
|
||||
cli_command: cliCommand,
|
||||
started_at: new Date().toISOString(),
|
||||
completed_at: new Date().toISOString(),
|
||||
artifacts_dir: `${iterDir}/artifacts`,
|
||||
success: false
|
||||
};
|
||||
state.error_count++;
|
||||
// Continue to Phase 3 anyway -- Gemini can evaluate the skill even without artifacts
|
||||
} else {
|
||||
state.iterations[N - 1].execution = {
|
||||
cli_command: cliCommand,
|
||||
started_at: new Date().toISOString(),
|
||||
completed_at: new Date().toISOString(),
|
||||
artifacts_dir: `${iterDir}/artifacts`,
|
||||
success: true
|
||||
};
|
||||
}
|
||||
|
||||
} // end single mode branch
|
||||
|
||||
// Update state
|
||||
Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
|
||||
```

## Error Handling

| Error | Recovery |
|-------|----------|
| CLI timeout (10min) | Record failure, continue to Phase 3 without artifacts |
| CLI crash | Retry once with simplified prompt (SKILL.md only, no phase files) |
| No artifacts produced | Continue to Phase 3, evaluation focuses on skill definition quality |

## Output

- **Files**: `iteration-{N}/skill-snapshot/`, `iteration-{N}/artifacts/`
- **State**: `iterations[N-1].execution` updated
- **Next**: Phase 3 (Evaluate)
312
.claude/skills/skill-iter-tune/phases/03-evaluate.md
Normal file
@@ -0,0 +1,312 @@
# Phase 3: Evaluate Quality

> **COMPACT SENTINEL [Phase 3: Evaluate]**
> This phase contains 5 execution steps (Step 3.1 -- 3.5).
> If you can read this sentinel but cannot find the full Step protocol below, context has been compressed.
> Recovery: `Read("phases/03-evaluate.md")`

Evaluate skill quality using `ccw cli --tool gemini --mode analysis`. Gemini scores the skill across 5 dimensions and provides improvement suggestions.

## Objective

- Construct evaluation prompt with skill + artifacts + criteria
- Execute via ccw cli Gemini
- Parse multi-dimensional score
- Write iteration-{N}-eval.md
- Check termination conditions

## Execution

### Step 3.1: Prepare Evaluation Context

```javascript
|
||||
const N = state.current_iteration;
|
||||
const iterDir = `${state.work_dir}/iterations/iteration-${N}`;
|
||||
|
||||
// Read evaluation criteria
|
||||
// Ref: specs/evaluation-criteria.md
|
||||
const evaluationCriteria = Read('.claude/skills/skill-iter-tune/specs/evaluation-criteria.md');
|
||||
|
||||
// Build skillContent (same pattern as Phase 02 — only executable files)
|
||||
const skillContent = state.target_skills.map(skill => {
|
||||
const skillMd = Read(`${skill.path}/SKILL.md`);
|
||||
const phaseFiles = Glob(`${skill.path}/phases/*.md`).sort().map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
const specFiles = Glob(`${skill.path}/specs/*.md`).map(f => ({
|
||||
relativePath: f.replace(skill.path + '/', ''),
|
||||
content: Read(f)
|
||||
}));
|
||||
return `### File: SKILL.md\n${skillMd}\n\n` +
|
||||
phaseFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') +
|
||||
(specFiles.length > 0 ? '\n\n' + specFiles.map(f => `### File: ${f.relativePath}\n${f.content}`).join('\n\n') : '');
|
||||
}).join('\n\n---\n\n');
|
||||
|
||||
// Build artifacts summary
|
||||
let artifactsSummary = 'No artifacts produced (execution may have failed)';
|
||||
|
||||
if (state.execution_mode === 'chain') {
|
||||
// Chain mode: group artifacts by skill
|
||||
const chainSummaries = state.chain_order.map(skillName => {
|
||||
const skillArtifactDir = `${iterDir}/artifacts/${skillName}`;
|
||||
const files = Glob(`${skillArtifactDir}/**/*`);
|
||||
if (files.length === 0) return `### ${skillName} (no artifacts)`;
|
||||
const filesSummary = files.map(f => {
|
||||
const relPath = f.replace(`${skillArtifactDir}/`, '');
|
||||
const content = Read(f, { limit: 200 });
|
||||
return `--- ${relPath} ---\n${content}`;
|
||||
}).join('\n\n');
|
||||
return `### ${skillName} (chain position ${state.chain_order.indexOf(skillName) + 1})\n${filesSummary}`;
|
||||
});
|
||||
artifactsSummary = chainSummaries.join('\n\n---\n\n');
|
||||
} else {
|
||||
// Single mode (existing)
|
||||
const artifactFiles = Glob(`${iterDir}/artifacts/**/*`);
|
||||
if (artifactFiles.length > 0) {
|
||||
artifactsSummary = artifactFiles.map(f => {
|
||||
const relPath = f.replace(`${iterDir}/artifacts/`, '');
|
||||
const content = Read(f, { limit: 200 });
|
||||
return `--- ${relPath} ---\n${content}`;
|
||||
}).join('\n\n');
|
||||
}
|
||||
}
|
||||
|
||||
// Build previous evaluation context
|
||||
const previousEvalContext = state.iterations.filter(i => i.evaluation).length > 0
|
||||
? `PREVIOUS ITERATIONS:\n` + state.iterations.filter(i => i.evaluation).map(iter =>
|
||||
`Iteration ${iter.round}: Score ${iter.evaluation.score}\n` +
|
||||
` Applied: ${iter.improvement?.changes_applied?.map(c => c.summary).join('; ') || 'none'}\n` +
|
||||
` Weaknesses: ${iter.evaluation.weaknesses?.slice(0, 3).join('; ') || 'none'}`
|
||||
).join('\n') + '\nIMPORTANT: Focus on NEW issues not yet addressed.'
|
||||
: '';
|
||||
```
|
||||
|
||||
### Step 3.2: Construct Evaluation Prompt
|
||||
|
||||
```javascript
|
||||
// Ref: templates/eval-prompt.md
|
||||
const evalPrompt = `PURPOSE: Evaluate the quality of a workflow skill by examining its definition and produced artifacts.
|
||||
|
||||
SKILL DEFINITION:
|
||||
${skillContent}
|
||||
|
||||
TEST SCENARIO:
|
||||
${state.test_scenario.description}
|
||||
Requirements: ${state.test_scenario.requirements.join('; ')}
|
||||
Success Criteria: ${state.test_scenario.success_criteria}
|
||||
|
||||
ARTIFACTS PRODUCED:
|
||||
${artifactsSummary}
|
||||
|
||||
EVALUATION CRITERIA:
|
||||
${evaluationCriteria}
|
||||
|
||||
${previousEvalContext}
|
||||
|
||||
${state.execution_mode === 'chain' ? `
|
||||
CHAIN CONTEXT:
|
||||
This skill chain contains ${state.chain_order.length} skills executed in order:
|
||||
${state.chain_order.map((s, i) => `${i+1}. ${s}`).join('\n')}
|
||||
Current evaluation covers the entire chain output.
|
||||
Please provide per-skill quality scores in an additional "chain_scores" field: { "${state.chain_order[0]}": <score>, ... }
|
||||
` : ''}
|
||||
|
||||
TASK:
|
||||
1. Score each dimension (Clarity 0.20, Completeness 0.25, Correctness 0.25, Effectiveness 0.20, Efficiency 0.10) on 0-100
|
||||
2. Calculate weighted composite score
|
||||
3. List top 3 strengths
|
||||
4. List top 3-5 weaknesses with file:section references
|
||||
5. Provide 3-5 prioritized improvement suggestions with concrete changes
|
||||
|
||||
EXPECTED OUTPUT (strict JSON, no markdown):
|
||||
{
|
||||
"composite_score": <0-100>,
|
||||
"dimensions": [
|
||||
{"name":"Clarity","id":"clarity","score":<0-100>,"weight":0.20,"feedback":"..."},
|
||||
{"name":"Completeness","id":"completeness","score":<0-100>,"weight":0.25,"feedback":"..."},
|
||||
{"name":"Correctness","id":"correctness","score":<0-100>,"weight":0.25,"feedback":"..."},
|
||||
{"name":"Effectiveness","id":"effectiveness","score":<0-100>,"weight":0.20,"feedback":"..."},
|
||||
{"name":"Efficiency","id":"efficiency","score":<0-100>,"weight":0.10,"feedback":"..."}
|
||||
],
|
||||
"strengths": ["...", "...", "..."],
|
||||
"weaknesses": ["...with file:section ref...", "..."],
|
||||
"suggestions": [
|
||||
{"priority":"high|medium|low","target_file":"...","description":"...","rationale":"...","code_snippet":"..."}
|
||||
]
|
||||
}
|
||||
|
||||
CONSTRAINTS: Be rigorous, reference exact files, focus on highest-impact changes, output ONLY JSON`;
|
||||
```
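
The prompt asks the evaluator to compute the weighted composite itself, so it can be useful to recompute it locally as a sanity check. The sketch below assumes the `dimensions` array has the shape requested in the prompt; `weightedComposite` and the sample scores are hypothetical. For the sample, the sum is 16 + 17.5 + 22.5 + 12 + 5 = 73.

```javascript
// Sketch: recompute the weighted composite from per-dimension scores.
// Dividing by the total weight keeps the result sensible if a dimension is missing.
function weightedComposite(dimensions) {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (totalWeight === 0) return null; // nothing to average
  const weightedSum = dimensions.reduce((sum, d) => sum + d.score * d.weight, 0);
  return Math.round((weightedSum / totalWeight) * 10) / 10; // one decimal place
}

// Illustrative scores only; weights are the ones listed in the TASK section of the prompt
const sampleDimensions = [
  { id: 'clarity', score: 80, weight: 0.20 },
  { id: 'completeness', score: 70, weight: 0.25 },
  { id: 'correctness', score: 90, weight: 0.25 },
  { id: 'effectiveness', score: 60, weight: 0.20 },
  { id: 'efficiency', score: 50, weight: 0.10 }
];
console.log(weightedComposite(sampleDimensions));
```

A large gap between this value and the reported `composite_score` may indicate that the evaluator did not apply the weights as instructed.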

### Step 3.3: Execute via ccw cli Gemini

> **CHECKPOINT**: Verify evaluation prompt is properly constructed before CLI execution.

```javascript
// Shell escape utility (same as Phase 02)
function escapeForShell(str) {
  // Escape backslashes first; otherwise a prompt containing \" would end the quoted string early
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\$/g, '\\$')
    .replace(/`/g, '\\`');
}

const skillPath = state.target_skills[0].path; // Primary skill for --cd

const cliCommand = `ccw cli -p "${escapeForShell(evalPrompt)}" --tool gemini --mode analysis --cd "${skillPath}"`;

// Execute in background
Bash({
  command: cliCommand,
  run_in_background: true,
  timeout: 300000 // 5 minutes
});

// STOP -- wait for hook callback
```
|
||||
|
||||
### Step 3.4: Parse Score and Write Eval File
|
||||
|
||||
After CLI completes:
|
||||
|
||||
```javascript
|
||||
// Parse JSON from Gemini output
|
||||
// The output may contain markdown wrapping -- extract JSON
|
||||
const rawOutput = /* CLI output from callback */;
|
||||
const jsonMatch = rawOutput.match(/\{[\s\S]*\}/);
|
||||
let evaluation;
|
||||
|
||||
if (jsonMatch) {
|
||||
try {
|
||||
evaluation = JSON.parse(jsonMatch[0]);
|
||||
// chain_scores (if present) is stored with the rest of the evaluation in the state update below.
// Writing to state.iterations[N - 1].evaluation here would throw if that object is not assigned yet,
// and the catch block would then discard an otherwise valid parse.
|
||||
} catch (e) {
|
||||
// Fallback: try to extract score heuristically
|
||||
// Accept decimal scores as well as integers
const scoreMatch = rawOutput.match(/"composite_score"\s*:\s*(\d+(?:\.\d+)?)/);
evaluation = {
composite_score: scoreMatch ? parseFloat(scoreMatch[1]) : 50,
|
||||
dimensions: [],
|
||||
strengths: [],
|
||||
weaknesses: ['Evaluation output parsing failed -- raw output saved'],
|
||||
suggestions: []
|
||||
};
|
||||
}
|
||||
} else {
|
||||
evaluation = {
|
||||
composite_score: 50,
|
||||
dimensions: [],
|
||||
strengths: [],
|
||||
weaknesses: ['No structured evaluation output -- defaulting to 50'],
|
||||
suggestions: []
|
||||
};
|
||||
}
|
||||
|
||||
// Write iteration-N-eval.md
|
||||
const evalReport = `# Iteration ${N} Evaluation
|
||||
|
||||
**Composite Score**: ${evaluation.composite_score}/100
|
||||
**Date**: ${new Date().toISOString()}
|
||||
|
||||
## Dimension Scores
|
||||
|
||||
| Dimension | Score | Weight | Feedback |
|
||||
|-----------|-------|--------|----------|
|
||||
${(evaluation.dimensions || []).map(d =>
|
||||
`| ${d.name} | ${d.score} | ${d.weight} | ${d.feedback} |`
|
||||
).join('\n')}
|
||||
|
||||
${(state.execution_mode === 'chain' && evaluation.chain_scores) ? `
|
||||
## Chain Scores
|
||||
|
||||
| Skill | Score | Chain Position |
|
||||
|-------|-------|----------------|
|
||||
${state.chain_order.map((s, i) => `| ${s} | ${evaluation.chain_scores[s] ?? '-'} | ${i + 1} |`).join('\n')}
|
||||
` : ''}
|
||||
|
||||
## Strengths
|
||||
${(evaluation.strengths || []).map(s => `- ${s}`).join('\n')}
|
||||
|
||||
## Weaknesses
|
||||
${(evaluation.weaknesses || []).map(w => `- ${w}`).join('\n')}
|
||||
|
||||
## Improvement Suggestions
|
||||
${(evaluation.suggestions || []).map((s, i) =>
|
||||
`### ${i + 1}. [${s.priority}] ${s.description}\n- **Target**: ${s.target_file}\n- **Rationale**: ${s.rationale}\n${s.code_snippet ? `- **Suggested**:\n\`\`\`\n${s.code_snippet}\n\`\`\`` : ''}`
|
||||
).join('\n\n')}
|
||||
`;
|
||||
|
||||
Write(`${iterDir}/iteration-${N}-eval.md`, evalReport);
|
||||
|
||||
// Update state
|
||||
state.iterations[N - 1].evaluation = {
|
||||
score: evaluation.composite_score,
|
||||
dimensions: evaluation.dimensions || [],
|
||||
strengths: evaluation.strengths || [],
|
||||
weaknesses: evaluation.weaknesses || [],
|
||||
suggestions: evaluation.suggestions || [],
|
||||
chain_scores: evaluation.chain_scores || null,
|
||||
eval_file: `${iterDir}/iteration-${N}-eval.md`
|
||||
};
|
||||
state.latest_score = evaluation.composite_score;
|
||||
state.score_trend.push(evaluation.composite_score);
|
||||
|
||||
Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
|
||||
```
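
The greedy pattern `/\{[\s\S]*\}/` spans from the first `{` to the last `}` in the output, so any trailing prose that contains braces makes `JSON.parse` fail. One possible alternative is a small brace-matching scan, sketched below; `extractJsonObject` and the sample string are hypothetical and not part of the skill.

```javascript
// Sketch: return the first parseable JSON object found in free-form output, or null.
function extractJsonObject(rawOutput) {
  for (let start = rawOutput.indexOf('{'); start !== -1; start = rawOutput.indexOf('{', start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < rawOutput.length; i++) {
      const ch = rawOutput[i];
      if (inString) {
        // Braces inside string values must not affect nesting depth
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '{') depth++;
      else if (ch === '}' && --depth === 0) {
        try {
          return JSON.parse(rawOutput.slice(start, i + 1));
        } catch {
          break; // balanced but not valid JSON; try the next '{'
        }
      }
    }
  }
  return null;
}

// Illustrative output with a brace inside a string and braces in trailing prose
const sampleOutput = 'Result: {"composite_score": 72, "note": "has } inside"} (see {details})';
console.log(extractJsonObject(sampleOutput));
```

The heuristic `composite_score` fallback and the default of 50 would still apply when this returns `null`.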

### Step 3.5: Check Termination

```javascript
function shouldTerminate(state) {
  // 1. Quality threshold met
  if (state.latest_score >= state.quality_threshold) {
    return { terminate: true, reason: 'quality_threshold_met' };
  }

  // 2. Max iterations reached
  if (state.current_iteration >= state.max_iterations) {
    return { terminate: true, reason: 'max_iterations_reached' };
  }

  // 3. Convergence: net gain of at most 2 points across the last 3 scores
  if (state.score_trend.length >= 3) {
    const last3 = state.score_trend.slice(-3);
    const improvement = last3[2] - last3[0];
    if (improvement <= 2) {
      state.converged = true;
      return { terminate: true, reason: 'convergence_detected' };
    }
  }

  // 4. Error limit
  if (state.error_count >= state.max_errors) {
    return { terminate: true, reason: 'error_limit_reached' };
  }

  return { terminate: false };
}

const termination = shouldTerminate(state);
if (termination.terminate) {
  state.termination_reason = termination.reason;
  Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
  // Skip Phase 4, go directly to Phase 5 (Report)
} else {
  // Continue to Phase 4 (Improve)
}
```
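
The convergence rule compares only the first and last scores of a three-score window, so the middle score has no effect. The sketch below isolates that rule and applies it to a few sample trends; `hasConverged` and its `maxGain` parameter are hypothetical names introduced for illustration.

```javascript
// Sketch: the convergence rule in isolation.
// Converged when the newest score exceeds the score two evaluations earlier by at most `maxGain`.
function hasConverged(scoreTrend, maxGain = 2) {
  if (scoreTrend.length < 3) return false; // not enough history yet
  const [oldest, , newest] = scoreTrend.slice(-3);
  return newest - oldest <= maxGain;
}

// Illustrative trends only
console.log(hasConverged([60, 70]));         // fewer than 3 scores
console.log(hasConverged([60, 70, 78]));     // gain of 18 across the window
console.log(hasConverged([70, 71, 72]));     // gain of exactly 2
console.log(hasConverged([80, 60, 82]));     // dip and recovery, net gain of 2
```

The last case shows a possible surprise: a score that dips and then recovers strongly still counts as converged, because only the endpoints of the window are compared.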

## Error Handling

| Error | Recovery |
|-------|----------|
| CLI timeout | Retry once; if it still fails, use score 50 with a warning |
| JSON parse failure | Extract score heuristically, save raw output |
| No output | Default score 50, note in weaknesses |

## Output

- **Files**: `iteration-{N}-eval.md`
- **State**: `iterations[N-1].evaluation`, `latest_score`, `score_trend` updated
- **Decision**: terminate -> Phase 5, continue -> Phase 4
- **TodoWrite**: Update current iteration score display
186
.claude/skills/skill-iter-tune/phases/04-improve.md
Normal file
@@ -0,0 +1,186 @@
# Phase 4: Apply Improvements

> **COMPACT SENTINEL [Phase 4: Improve]**
> This phase contains 4 execution steps (Step 4.1 -- 4.4).
> If you can read this sentinel but cannot find the full Step protocol below, context has been compressed.
> Recovery: `Read("phases/04-improve.md")`

Apply targeted improvements to skill files based on evaluation suggestions. Uses a general-purpose Agent to make changes, ensuring only suggested modifications are applied.

## Objective

- Read evaluation suggestions from current iteration
- Launch Agent to apply improvements in priority order
- Document all changes made
- Update iteration state

## Execution

### Step 4.1: Prepare Improvement Context

```javascript
const N = state.current_iteration;
const iterDir = `${state.work_dir}/iterations/iteration-${N}`;
const evaluation = state.iterations[N - 1].evaluation;

// Verify we have suggestions to apply
if (!evaluation.suggestions || evaluation.suggestions.length === 0) {
  // No suggestions -- skip improvement, mark iteration complete
  state.iterations[N - 1].improvement = {
    changes_applied: [],
    changes_file: null,
    improvement_rationale: 'No suggestions provided by evaluation'
  };
  state.iterations[N - 1].status = 'completed';
  Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
  // -> Return to orchestrator for next iteration
  return;
}

// Build file inventory for agent context
const skillFileInventory = state.target_skills.map(skill => {
  return `Skill: ${skill.name} (${skill.path})\nFiles:\n` +
    skill.files.map(f => ` - ${f}`).join('\n');
}).join('\n\n');

// Chain mode: add chain relationship context
// `??` (not `||`) keeps a legitimate score of 0 from being treated as missing
const chainContext = state.execution_mode === 'chain'
  ? `\nChain Order: ${state.chain_order.join(' -> ')}\n` +
    `Chain Scores: ${state.chain_order.map(s =>
      `${s}: ${state.iterations[N-1].evaluation?.chain_scores?.[s] ?? 'N/A'}`
    ).join(', ')}\n` +
    `Weakest Link: ${state.chain_order.reduce((min, s) => {
      const score = state.iterations[N-1].evaluation?.chain_scores?.[s] ?? 100;
      return score < (state.iterations[N-1].evaluation?.chain_scores?.[min] ?? 100) ? s : min;
    }, state.chain_order[0])}`
  : '';
```
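The weakest-link selection in Step 4.1 can be easier to check when pulled out of the template string. The sketch below is one possible standalone version, not the skill's own code: `findWeakestLink` is a hypothetical name, and unlike the inline reduce (which treats a missing score as 100) it skips unscored skills and returns `null` when nothing has been scored.

```javascript
// Hypothetical helper: name of the lowest-scoring skill in a chain.
// chainOrder:  array of skill names, in execution order
// chainScores: map of skill name -> numeric score (may be incomplete)
function findWeakestLink(chainOrder, chainScores = {}) {
  let weakest = null;
  for (const name of chainOrder) {
    const score = chainScores[name];
    // Assumption: a skill without a numeric score is not a candidate.
    if (typeof score !== 'number') continue;
    // Strict `<` keeps the earliest skill in the chain on ties.
    if (weakest === null || score < chainScores[weakest]) {
      weakest = name;
    }
  }
  return weakest; // null when no skill has a numeric score
}
```

Comparing with `typeof score !== 'number'` rather than a truthiness check means a score of 0 is handled as a real (and lowest) score.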

### Step 4.2: Launch Improvement Agent

> **CHECKPOINT**: Before launching agent, verify:
> 1. evaluation.suggestions is non-empty
> 2. All target_file paths in suggestions are valid

```javascript
const suggestionsText = evaluation.suggestions.map((s, i) =>
  `${i + 1}. [${s.priority.toUpperCase()}] ${s.description}\n` +
  ` Target: ${s.target_file}\n` +
  ` Rationale: ${s.rationale}\n` +
  (s.code_snippet ? ` Suggested change:\n ${s.code_snippet}\n` : '')
).join('\n');

Agent({
  subagent_type: 'general-purpose',
  run_in_background: false,
  description: `Apply skill improvements iteration ${N}`,
  prompt: `## Task: Apply Targeted Improvements to Skill Files

You are improving a workflow skill based on evaluation feedback. Apply ONLY the suggested changes -- do not refactor, add features, or "improve" beyond what is explicitly suggested.

## Current Score: ${evaluation.score}/100
Dimension breakdown:
${evaluation.dimensions.map(d => `- ${d.name}: ${d.score}/100`).join('\n')}

## Skill File Inventory
${skillFileInventory}

${chainContext ? `## Chain Context\n${chainContext}\n\nPrioritize improvements on the weakest skill in the chain. Also consider interface compatibility between adjacent skills in the chain.\n` : ''}

## Improvement Suggestions (apply in priority order)
${suggestionsText}

## Rules
1. Read each target file BEFORE modifying it
2. Apply ONLY the suggested changes -- no unsolicited modifications
3. If a suggestion's target_file doesn't exist, skip it and note in summary
4. If a suggestion conflicts with existing patterns, adapt it to fit (note adaptation)
5. Preserve existing code style, naming conventions, and structure
6. After all changes, write a change summary to: ${iterDir}/iteration-${N}-changes.md

## Changes Summary Format (write to ${iterDir}/iteration-${N}-changes.md)

# Iteration ${N} Changes

## Applied Suggestions
- [high] description: what was changed in which file
- [medium] description: what was changed in which file

## Files Modified
- path/to/file.md: brief description of changes

## Skipped Suggestions (if any)
- description: reason for skipping

## Notes
- Any adaptations or considerations

## Success Criteria
- All high-priority suggestions applied
- Medium-priority suggestions applied if feasible
- Low-priority suggestions applied if trivial
- Changes summary written to ${iterDir}/iteration-${N}-changes.md
`
});
```

### Step 4.3: Verify Changes

After the agent completes:

```javascript
// Verify changes summary was written
const changesFile = `${iterDir}/iteration-${N}-changes.md`;
const changesExist = Glob(changesFile).length > 0;

if (!changesExist) {
  // Agent didn't write summary -- create a minimal one
  Write(changesFile, `# Iteration ${N} Changes\n\n## Notes\nAgent completed but did not produce changes summary.\n`);
}

// Read changes summary to extract applied changes
const changesContent = Read(changesFile);

// Parse applied changes (heuristic: lines starting with "- [", e.g. "- [high] ...")
// Match the whole line so the summary keeps the description, not just the priority tag
const appliedMatches = changesContent.match(/^- \[.+?\].*$/gm) || [];
const changes_applied = appliedMatches.map(m => ({
  summary: m.replace(/^- /, ''),
  file: '' // Not populated by this heuristic
}));
```
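The heuristic above leaves `file` empty. If the agent follows the summary format requested in Step 4.2, the file list could be recovered from the `## Files Modified` section. The sketch below assumes that format is respected; `parseChangesSummary` and its return shape are hypothetical and not part of the skill.

```javascript
// Hypothetical parser for iteration-{N}-changes.md, assuming the agent followed
// the "Changes Summary Format" requested in the Step 4.2 prompt.
function parseChangesSummary(markdown) {
  // Group bullet items under their level-2 ("## ") heading.
  const sections = {};
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
      continue;
    }
    if (current && line.startsWith('- ')) {
      sections[current].push(line.slice(2).trim());
    }
  }

  // "- [high] description: ..." -> { priority: 'high', summary: 'description: ...' }
  const applied = (sections['Applied Suggestions'] || []).map(item => {
    const m = item.match(/^\[(\w+)\]\s*(.*)$/);
    return m
      ? { priority: m[1].toLowerCase(), summary: m[2] }
      : { priority: null, summary: item };
  });

  // "- path/to/file.md: brief description" -> 'path/to/file.md'
  const files = (sections['Files Modified'] || []).map(item => item.split(':')[0].trim());

  return { applied, files };
}
```

Because the agent's output is free-form, a caller would likely still want the minimal-summary fallback from Step 4.3 when both lists come back empty.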

### Step 4.4: Update State

```javascript
state.iterations[N - 1].improvement = {
  changes_applied: changes_applied,
  changes_file: changesFile,
  improvement_rationale: `Applied ${changes_applied.length} improvements based on evaluation score ${evaluation.score}`
};
state.iterations[N - 1].status = 'completed';
state.updated_at = new Date().toISOString();

// Also update the skill files list in case new files were created
for (const skill of state.target_skills) {
  skill.files = Glob(`${skill.path}/**/*.md`).map(f => f.replace(skill.path + '/', ''));
}

Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));

// -> Return to orchestrator for next iteration (Phase 2) or termination check
```

## Error Handling

| Error | Recovery |
|-------|----------|
| Agent fails to complete | Rollback from skill-snapshot: `cp -r "${iterDir}/skill-snapshot/${skill.name}"/* "${skill.path}/"` (the `/*` glob must sit outside the quotes so the shell expands it) |
| Agent corrupts files | Same rollback from snapshot |
| Changes summary missing | Create minimal summary, continue |
| target_file not found | Agent skips suggestion, notes in summary |
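The snapshot rollback in the table could also be done from Node rather than the shell. The sketch below is a minimal illustration, assuming Node.js 16.7 or later (where `fs.cpSync` is available) and the `skill-snapshot/<skill name>` layout implied by the table; `snapshotPathFor` and `rollbackSkill` are hypothetical helper names.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: where the snapshot for one skill is assumed to live.
function snapshotPathFor(iterDir, skillName) {
  return path.join(iterDir, 'skill-snapshot', skillName);
}

// Hypothetical helper: copy the snapshot back over the skill directory.
// Like `cp -r`, this overwrites files that exist in the snapshot but does NOT
// delete files the agent newly created in skillPath.
function rollbackSkill(snapshotDir, skillPath) {
  if (!fs.existsSync(snapshotDir)) {
    throw new Error(`Snapshot not found: ${snapshotDir}`);
  }
  fs.cpSync(snapshotDir, skillPath, { recursive: true, force: true });
}
```

A caller might loop over `state.target_skills` and call `rollbackSkill(snapshotPathFor(iterDir, skill.name), skill.path)` for each one. Whether stray agent-created files should also be removed is a design decision the source does not address.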

## Output

- **Files**: `iteration-{N}-changes.md`, modified skill files
- **State**: `iterations[N-1].improvement` and `.status` updated
- **Next**: Return to orchestrator, begin next iteration (Phase 2) or terminate
166
.claude/skills/skill-iter-tune/phases/05-report.md
Normal file
@@ -0,0 +1,166 @@
# Phase 5: Final Report

> **COMPACT SENTINEL [Phase 5: Report]**
> This phase contains 4 execution steps (Step 5.1 -- 5.4).
> If you can read this sentinel but cannot find the full Step protocol below, context has been compressed.
> Recovery: `Read("phases/05-report.md")`

Generate a comprehensive iteration-history report and display the results to the user.

## Objective

- Read complete iteration state
- Generate formatted final report with score progression
- Write final-report.md
- Display summary to user

## Execution

### Step 5.1: Read Complete State

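```javascript
// Refresh the in-memory `state` (already in scope from the orchestrator) from disk.
// Re-declaring it with `const state = ...` here would read `state.work_dir` before
// initialization and throw a ReferenceError.
Object.assign(state, JSON.parse(Read(`${state.work_dir}/iteration-state.json`)));
state.status = 'completed';
state.updated_at = new Date().toISOString();
```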

### Step 5.2: Generate Report

```javascript
// Determine outcome
const outcomeMap = {
  quality_threshold_met: 'PASSED -- Quality threshold reached',
  max_iterations_reached: 'MAX ITERATIONS -- Threshold not reached',
  convergence_detected: 'CONVERGED -- Score stopped improving',
  error_limit_reached: 'FAILED -- Too many errors'
};
const outcome = outcomeMap[state.termination_reason] || 'COMPLETED';

// Build score progression table
const scoreTable = state.iterations
  .filter(i => i.evaluation)
  .map(i => {
    const dims = i.evaluation.dimensions || [];
    const dimScores = ['clarity', 'completeness', 'correctness', 'effectiveness', 'efficiency']
      .map(id => {
        const dim = dims.find(d => d.id === id);
        return dim ? dim.score : '-';
      });
    return `| ${i.round} | ${i.evaluation.score} | ${dimScores.join(' | ')} |`;
  }).join('\n');

// Build iteration details
const iterationDetails = state.iterations.map(iter => {
  const evalSection = iter.evaluation
    ? `**Score**: ${iter.evaluation.score}/100\n` +
      `**Strengths**: ${iter.evaluation.strengths?.join(', ') || 'N/A'}\n` +
      `**Weaknesses**: ${iter.evaluation.weaknesses?.slice(0, 3).join(', ') || 'N/A'}`
    : '**Evaluation**: Skipped or failed';

  const changesSection = iter.improvement
    ? `**Changes Applied**: ${iter.improvement.changes_applied?.length || 0}\n` +
      (iter.improvement.changes_applied?.map(c => ` - ${c.summary}`).join('\n') || ' None')
    : '**Improvements**: None';

  return `### Iteration ${iter.round}\n${evalSection}\n${changesSection}`;
}).join('\n\n');

// The optional Chain Order row is appended inline so that non-chain runs do not
// leave a blank line that would split the Summary table. `??` keeps scores of 0.
const report = `# Skill Iter Tune -- Final Report

## Summary

| Field | Value |
|-------|-------|
| **Target Skills** | ${state.target_skills.map(s => s.name).join(', ')} |
| **Execution Mode** | ${state.execution_mode} |${state.execution_mode === 'chain' ? `\n| **Chain Order** | ${state.chain_order.join(' -> ')} |` : ''}
| **Test Scenario** | ${state.test_scenario.description} |
| **Iterations** | ${state.iterations.length} |
| **Initial Score** | ${state.score_trend[0] ?? 'N/A'} |
| **Final Score** | ${state.latest_score}/100 |
| **Quality Threshold** | ${state.quality_threshold} |
| **Outcome** | ${outcome} |
| **Started** | ${state.started_at} |
| **Completed** | ${state.updated_at} |

## Score Progression

| Iter | Composite | Clarity | Completeness | Correctness | Effectiveness | Efficiency |
|------|-----------|---------|--------------|-------------|---------------|------------|
${scoreTable}

**Trend**: ${state.score_trend.join(' -> ')}

${state.execution_mode === 'chain' ? `
## Chain Score Progression

| Iter | ${state.chain_order.join(' | ')} |
|------|${state.chain_order.map(() => '------').join('|')}|
${state.iterations.filter(i => i.evaluation?.chain_scores).map(i => {
  const scores = state.chain_order.map(s => i.evaluation.chain_scores[s] ?? '-');
  return `| ${i.round} | ${scores.join(' | ')} |`;
}).join('\n')}
` : ''}

## Iteration Details

${iterationDetails}

## Remaining Weaknesses

${state.iterations.length > 0 && state.iterations[state.iterations.length - 1].evaluation
  ? state.iterations[state.iterations.length - 1].evaluation.weaknesses?.map(w => `- ${w}`).join('\n') || 'None identified'
  : 'No evaluation data available'}

## Artifact Locations

| Path | Description |
|------|-------------|
| \`${state.work_dir}/iteration-state.json\` | Complete state history |
| \`${state.work_dir}/iterations/iteration-{N}/iteration-{N}-eval.md\` | Per-iteration evaluations |
| \`${state.work_dir}/iterations/iteration-{N}/iteration-{N}-changes.md\` | Per-iteration change logs |
| \`${state.work_dir}/final-report.md\` | This report |
| \`${state.backup_dir}/\` | Original skill backups |

## Restore Original

To revert all changes and restore the original skill files:

\`\`\`bash
${state.target_skills.map(s => `cp -r "${state.backup_dir}/${s.name}"/* "${s.path}/"`).join('\n')}
\`\`\`
`;
```

### Step 5.3: Write Report and Update State

```javascript
Write(`${state.work_dir}/final-report.md`, report);

state.status = 'completed';
Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
```

### Step 5.4: Display Summary to User

Output to user:

```
Skill Iter Tune Complete!

Target: {skill names}
Iterations: {count}
Score: {initial} -> {final} ({outcome})
Threshold: {threshold}

Score trend: {score1} -> {score2} -> ... -> {scoreN}

Full report: {workDir}/final-report.md
Backups: {backupDir}/
```
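The placeholders in the template above map onto fields of the iteration state used in Step 5.2. A minimal sketch of that mapping follows; `formatSummary` is a hypothetical helper, and `outcome` is assumed to be the string derived from `outcomeMap`.

```javascript
// Hypothetical helper: fill the Step 5.4 summary template from the iteration state.
function formatSummary(state, outcome) {
  const trend = state.score_trend ?? [];
  // `??` and the length check keep a legitimate score of 0 from showing as N/A.
  const initial = trend.length > 0 ? trend[0] : 'N/A';
  const final = state.latest_score ?? 'N/A';

  return [
    'Skill Iter Tune Complete!',
    '',
    `Target: ${state.target_skills.map(s => s.name).join(', ')}`,
    `Iterations: ${state.iterations.length}`,
    `Score: ${initial} -> ${final} (${outcome})`,
    `Threshold: ${state.quality_threshold}`,
    '',
    `Score trend: ${trend.join(' -> ')}`,
    '',
    `Full report: ${state.work_dir}/final-report.md`,
    `Backups: ${state.backup_dir}/`
  ].join('\n');
}
```

Building the text as an array of lines keeps the blank separator lines explicit, which is easier to keep in sync with the template than one long template string.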

## Output

- **Files**: `final-report.md`
- **State**: `status = completed`
- **Next**: Workflow complete. Return control to user.