mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-12 17:21:19 +08:00
Implement phases for skill iteration tuning: Evaluation, Improvement, and Reporting
- Added Phase 3: Evaluate Quality with steps for preparing context, constructing evaluation prompts, executing evaluation via CLI, parsing scores, and checking termination conditions. - Introduced Phase 4: Apply Improvements to implement targeted changes based on evaluation suggestions, including agent execution and change documentation. - Created Phase 5: Final Report to generate a comprehensive report of the iteration process, including score progression and remaining weaknesses. - Established evaluation criteria in a new document to guide the evaluation process. - Developed templates for evaluation and execution prompts to standardize input for the evaluation and execution phases.
This commit is contained in:
166
.claude/skills/skill-iter-tune/phases/05-report.md
Normal file
166
.claude/skills/skill-iter-tune/phases/05-report.md
Normal file
@@ -0,0 +1,166 @@
|
||||
# Phase 5: Final Report
|
||||
|
||||
> **COMPACT SENTINEL [Phase 5: Report]**
|
||||
> This phase contains 4 execution steps (Step 5.1 -- 5.4).
|
||||
> If you can read this sentinel but cannot find the full Step protocol below, context has been compressed.
|
||||
> Recovery: `Read("phases/05-report.md")`
|
||||
|
||||
Generate comprehensive iteration history report and display results to user.
|
||||
|
||||
## Objective
|
||||
|
||||
- Read complete iteration state
|
||||
- Generate formatted final report with score progression
|
||||
- Write final-report.md
|
||||
- Display summary to user
|
||||
|
||||
## Execution
|
||||
|
||||
### Step 5.1: Read Complete State
|
||||
|
||||
```javascript
|
||||
const state = JSON.parse(Read(`${state.work_dir}/iteration-state.json`));
|
||||
state.status = 'completed';
|
||||
state.updated_at = new Date().toISOString();
|
||||
```
|
||||
|
||||
### Step 5.2: Generate Report
|
||||
|
||||
```javascript
|
||||
// Determine outcome
|
||||
const outcomeMap = {
|
||||
quality_threshold_met: 'PASSED -- Quality threshold reached',
|
||||
max_iterations_reached: 'MAX ITERATIONS -- Threshold not reached',
|
||||
convergence_detected: 'CONVERGED -- Score stopped improving',
|
||||
error_limit_reached: 'FAILED -- Too many errors'
|
||||
};
|
||||
const outcome = outcomeMap[state.termination_reason] || 'COMPLETED';
|
||||
|
||||
// Build score progression table
|
||||
const scoreTable = state.iterations
|
||||
.filter(i => i.evaluation)
|
||||
.map(i => {
|
||||
const dims = i.evaluation.dimensions || [];
|
||||
const dimScores = ['clarity', 'completeness', 'correctness', 'effectiveness', 'efficiency']
|
||||
.map(id => {
|
||||
const dim = dims.find(d => d.id === id);
|
||||
return dim ? dim.score : '-';
|
||||
});
|
||||
return `| ${i.round} | ${i.evaluation.score} | ${dimScores.join(' | ')} |`;
|
||||
}).join('\n');
|
||||
|
||||
// Build iteration details
|
||||
const iterationDetails = state.iterations.map(iter => {
|
||||
const evalSection = iter.evaluation
|
||||
? `**Score**: ${iter.evaluation.score}/100\n` +
|
||||
`**Strengths**: ${iter.evaluation.strengths?.join(', ') || 'N/A'}\n` +
|
||||
`**Weaknesses**: ${iter.evaluation.weaknesses?.slice(0, 3).join(', ') || 'N/A'}`
|
||||
: '**Evaluation**: Skipped or failed';
|
||||
|
||||
const changesSection = iter.improvement
|
||||
? `**Changes Applied**: ${iter.improvement.changes_applied?.length || 0}\n` +
|
||||
(iter.improvement.changes_applied?.map(c => ` - ${c.summary}`).join('\n') || ' None')
|
||||
: '**Improvements**: None';
|
||||
|
||||
return `### Iteration ${iter.round}\n${evalSection}\n${changesSection}`;
|
||||
}).join('\n\n');
|
||||
|
||||
const report = `# Skill Iter Tune -- Final Report
|
||||
|
||||
## Summary
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| **Target Skills** | ${state.target_skills.map(s => s.name).join(', ')} |
|
||||
| **Execution Mode** | ${state.execution_mode} |
|
||||
${state.execution_mode === 'chain' ? `| **Chain Order** | ${state.chain_order.join(' -> ')} |` : ''}
|
||||
| **Test Scenario** | ${state.test_scenario.description} |
|
||||
| **Iterations** | ${state.iterations.length} |
|
||||
| **Initial Score** | ${state.score_trend[0] || 'N/A'} |
|
||||
| **Final Score** | ${state.latest_score}/100 |
|
||||
| **Quality Threshold** | ${state.quality_threshold} |
|
||||
| **Outcome** | ${outcome} |
|
||||
| **Started** | ${state.started_at} |
|
||||
| **Completed** | ${state.updated_at} |
|
||||
|
||||
## Score Progression
|
||||
|
||||
| Iter | Composite | Clarity | Completeness | Correctness | Effectiveness | Efficiency |
|
||||
|------|-----------|---------|--------------|-------------|---------------|------------|
|
||||
${scoreTable}
|
||||
|
||||
**Trend**: ${state.score_trend.join(' -> ')}
|
||||
|
||||
${state.execution_mode === 'chain' ? `
|
||||
## Chain Score Progression
|
||||
|
||||
| Iter | ${state.chain_order.join(' | ')} |
|
||||
|------|${state.chain_order.map(() => '------').join('|')}|
|
||||
${state.iterations.filter(i => i.evaluation?.chain_scores).map(i => {
|
||||
const scores = state.chain_order.map(s => i.evaluation.chain_scores[s] || '-');
|
||||
return `| ${i.round} | ${scores.join(' | ')} |`;
|
||||
}).join('\n')}
|
||||
` : ''}
|
||||
|
||||
## Iteration Details
|
||||
|
||||
${iterationDetails}
|
||||
|
||||
## Remaining Weaknesses
|
||||
|
||||
${state.iterations.length > 0 && state.iterations[state.iterations.length - 1].evaluation
|
||||
? state.iterations[state.iterations.length - 1].evaluation.weaknesses?.map(w => `- ${w}`).join('\n') || 'None identified'
|
||||
: 'No evaluation data available'}
|
||||
|
||||
## Artifact Locations
|
||||
|
||||
| Path | Description |
|
||||
|------|-------------|
|
||||
| \`${state.work_dir}/iteration-state.json\` | Complete state history |
|
||||
| \`${state.work_dir}/iterations/iteration-{N}/iteration-{N}-eval.md\` | Per-iteration evaluations |
|
||||
| \`${state.work_dir}/iterations/iteration-{N}/iteration-{N}-changes.md\` | Per-iteration change logs |
|
||||
| \`${state.work_dir}/final-report.md\` | This report |
|
||||
| \`${state.backup_dir}/\` | Original skill backups |
|
||||
|
||||
## Restore Original
|
||||
|
||||
To revert all changes and restore the original skill files:
|
||||
|
||||
\`\`\`bash
|
||||
${state.target_skills.map(s => `cp -r "${state.backup_dir}/${s.name}"/* "${s.path}/"`).join('\n')}
|
||||
\`\`\`
|
||||
`;
|
||||
```
|
||||
|
||||
### Step 5.3: Write Report and Update State
|
||||
|
||||
```javascript
|
||||
Write(`${state.work_dir}/final-report.md`, report);
|
||||
|
||||
state.status = 'completed';
|
||||
Write(`${state.work_dir}/iteration-state.json`, JSON.stringify(state, null, 2));
|
||||
```
|
||||
|
||||
### Step 5.4: Display Summary to User
|
||||
|
||||
Output to user:
|
||||
|
||||
```
|
||||
Skill Iter Tune Complete!
|
||||
|
||||
Target: {skill names}
|
||||
Iterations: {count}
|
||||
Score: {initial} -> {final} ({outcome})
|
||||
Threshold: {threshold}
|
||||
|
||||
Score trend: {score1} -> {score2} -> ... -> {scoreN}
|
||||
|
||||
Full report: {workDir}/final-report.md
|
||||
Backups: {backupDir}/
|
||||
```
|
||||
|
||||
## Output
|
||||
|
||||
- **Files**: `final-report.md`
|
||||
- **State**: `status = completed`
|
||||
- **Next**: Workflow complete. Return control to user.
|
||||
Reference in New Issue
Block a user