mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-11 17:21:03 +08:00
Implement phases for skill iteration tuning: Evaluation, Improvement, and Reporting
- Added Phase 3: Evaluate Quality with steps for preparing context, constructing evaluation prompts, executing evaluation via CLI, parsing scores, and checking termination conditions.
- Introduced Phase 4: Apply Improvements to implement targeted changes based on evaluation suggestions, including agent execution and change documentation.
- Created Phase 5: Final Report to generate a comprehensive report of the iteration process, including score progression and remaining weaknesses.
- Established evaluation criteria in a new document to guide the evaluation process.
- Developed templates for evaluation and execution prompts to standardize input for the evaluation and execution phases.
63
.claude/skills/skill-iter-tune/specs/evaluation-criteria.md
Normal file
@@ -0,0 +1,63 @@
# Evaluation Criteria

Quality evaluation criteria for skills, referenced by Phase 03 (Evaluate). Gemini uses these criteria to score a skill's output artifacts across multiple dimensions.

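## Dimensions

| Dimension | Weight | ID | Description |
|-----------|--------|----|-------------|
| Clarity | 0.20 | clarity | Instructions are clear and unambiguous, well structured, and easy to follow. Phase files have explicit Step divisions and input/output descriptions |
| Completeness | 0.25 | completeness | Covers all necessary phases, edge cases, and error handling. No critical execution path is missing |
| Correctness | 0.25 | correctness | Logic is correct, data flow is consistent, and phases do not contradict one another. The State schema matches its actual usage |
| Effectiveness | 0.20 | effectiveness | Produces high-quality output for the given test scenarios. Artifacts meet user needs and success criteria |
| Efficiency | 0.10 | efficiency | No redundant content; context is used sensibly and tokens are not wasted. Phase responsibilities are clear and do not overlap |
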
## Scoring Guide

| Range | Level | Description |
|-------|-------|-------------|
| 90-100 | Excellent | Production grade, with almost no room for improvement |
| 80-89 | Good | Ready for use; only minor tuning needed |
| 70-79 | Adequate | Functionally usable, with clear areas for improvement |
| 60-69 | Needs Work | Significant issues that affect output quality |
| 0-59 | Poor | Fundamental structural or logical problems |

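The Scoring Guide bands can be expressed as a small lookup. The following is a minimal sketch, not part of the skill itself: `SCORE_BANDS` and `score_level` are hypothetical names, and the handling of fractional scores (for example, 89.5 falling into "Good") is an assumption, since the table lists integer ranges only.

```python
# Score bands copied from the Scoring Guide table as (lower bound, level).
# Assumption: a fractional score belongs to the band whose lower bound it reaches.
SCORE_BANDS = [
    (90, "Excellent"),
    (80, "Good"),
    (70, "Adequate"),
    (60, "Needs Work"),
    (0, "Poor"),
]


def score_level(score: float) -> str:
    """Return the Scoring Guide level for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Bands are ordered from highest to lowest, so the first match wins.
    for lower_bound, level in SCORE_BANDS:
        if score >= lower_bound:
            return level
    raise AssertionError("unreachable: the last band starts at 0")
```

For instance, `score_level(85)` returns `"Good"` and `score_level(59)` returns `"Poor"`.
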
## Composite Score Calculation

```
composite = sum(dimension.score * dimension.weight)
```

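As a worked illustration of the formula, the sketch below applies the weights from the Dimensions table to the sample scores used in the Output JSON Schema example. `WEIGHTS` and `composite_score` are hypothetical names introduced here, and the check that the weights sum to 1.0 is an added safeguard rather than something the source specifies.

```python
import math

# Weights copied from the Dimensions table, keyed by dimension ID.
WEIGHTS = {
    "clarity": 0.20,
    "completeness": 0.25,
    "correctness": 0.25,
    "effectiveness": 0.20,
    "efficiency": 0.10,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores (hypothetical helper)."""
    # Added safeguard: the weights should form a convex combination.
    if not math.isclose(sum(WEIGHTS.values()), 1.0):
        raise ValueError("dimension weights must sum to 1.0")
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[dim_id] * weight for dim_id, weight in WEIGHTS.items())


# Sample per-dimension scores from the Output JSON Schema example.
example = {
    "clarity": 80,
    "completeness": 70,
    "correctness": 78,
    "effectiveness": 72,
    "efficiency": 85,
}
print(round(composite_score(example), 1))  # 16.0 + 17.5 + 19.5 + 14.4 + 8.5 = 75.9
```

The sample JSON lists `composite_score` as 75 for these same scores. The source does not say whether the composite is rounded or truncated, so the sample value may be purely illustrative.
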
## Output JSON Schema

```json
{
  "composite_score": 75,
  "dimensions": [
    { "name": "Clarity", "id": "clarity", "score": 80, "weight": 0.20, "feedback": "..." },
    { "name": "Completeness", "id": "completeness", "score": 70, "weight": 0.25, "feedback": "..." },
    { "name": "Correctness", "id": "correctness", "score": 78, "weight": 0.25, "feedback": "..." },
    { "name": "Effectiveness", "id": "effectiveness", "score": 72, "weight": 0.20, "feedback": "..." },
    { "name": "Efficiency", "id": "efficiency", "score": 85, "weight": 0.10, "feedback": "..." }
  ],
  "strengths": ["...", "...", "..."],
  "weaknesses": ["...", "...", "..."],
  "suggestions": [
    {
      "priority": "high",
      "target_file": "phases/02-execute.md",
      "description": "Add explicit error handling for CLI timeout",
      "rationale": "Current phase has no recovery path when CLI execution exceeds timeout",
      "code_snippet": "optional suggested replacement code"
    }
  ]
}
```

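A consumer of this output might run a few structural checks before trusting the scores. The sketch below is one possible approach under stated assumptions: `parse_evaluation`, `EXPECTED_IDS`, and `REQUIRED_KEYS` are hypothetical names, and treating every top-level key in the sample as required is an assumption, since the source only shows an example and marks `code_snippet` alone as optional.

```python
import json

# Dimension IDs taken from the Dimensions table.
EXPECTED_IDS = {"clarity", "completeness", "correctness", "effectiveness", "efficiency"}
# Assumption: every top-level key shown in the sample output is required.
REQUIRED_KEYS = ("composite_score", "dimensions", "strengths", "weaknesses", "suggestions")


def parse_evaluation(raw: str) -> dict:
    """Parse evaluator output and run basic structural checks (hypothetical helper)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("evaluation output must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    # Every dimension from the table must appear exactly as named there.
    seen_ids = {dim["id"] for dim in data["dimensions"]}
    if seen_ids != EXPECTED_IDS:
        raise ValueError(f"unexpected dimension ids: {sorted(seen_ids ^ EXPECTED_IDS)}")
    for dim in data["dimensions"]:
        if not 0 <= dim["score"] <= 100:
            raise ValueError(f"score out of range for {dim['id']}: {dim['score']}")
    return data
```

Raising on the first problem keeps the sketch short. A real parser might instead collect all problems and report them together.
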
## Evaluation Focus by Iteration

| Iteration | Primary Focus |
|-----------|--------------|
| 1 | Comprehensive evaluation to establish a baseline |
| 2-3 | Focus on whether the previous round's weaknesses have improved; avoid re-raising issues that are already resolved |
| 4+ | Fine-grained refinement, focusing on Effectiveness and Efficiency |

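The iteration-to-focus mapping above could be selected programmatically when building the evaluation prompt. This is a minimal sketch only: `evaluation_focus` is a hypothetical function, and the returned strings paraphrase the table rather than quote any prompt template from the skill.

```python
def evaluation_focus(iteration: int) -> str:
    """Return the primary evaluation focus for a 1-based iteration number."""
    if iteration < 1:
        raise ValueError("iteration numbers start at 1")
    if iteration == 1:
        return "Comprehensive evaluation to establish a baseline"
    if iteration <= 3:
        return "Check whether the previous round's weaknesses have improved"
    return "Fine-grained refinement of Effectiveness and Efficiency"
```
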