mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-05 01:50:27 +08:00
Introduces /workflow:lite-fix - a lightweight bug fixing workflow optimized for rapid diagnosis, targeted fixes, and streamlined verification.

Command Design:
- Three severity modes: Regular (2-4h), Critical (30-60min), Hotfix (15-30min)
- Six-phase execution: Diagnosis → Impact → Planning → Verification → Confirmation → Execution
- Intelligent code search: cli-explore-agent (regular) → direct search (critical) → minimal (hotfix)
- Risk-aware verification: Full test suite → Focused tests → Smoke tests

Key Features:
- Structured root cause analysis (file:line, reproduction steps, blame info)
- Quantitative impact assessment (risk score 0-10, user/business impact)
- Multi-strategy fix planning (immediate patch vs comprehensive refactor)
- Adaptive branch strategy (feature branch vs hotfix branch from production tag)
- Automatic follow-up task generation for hotfixes (tech debt management)
- Real-time deployment monitoring with auto-rollback triggers

Integration:
- Complements /workflow:lite-plan (fix vs feature development)
- Reuses /workflow:lite-execute for execution layer
- Integrates with /cli:mode:bug-diagnosis for preliminary analysis
- Escalation path to /workflow:plan for complex refactors

Design Documents:
- .claude/commands/workflow/lite-fix.md - Complete command specification
- LITE_FIX_DESIGN.md - Architecture design and decision records

Addresses: PLANNING_GAP_ANALYSIS.md Scenario #8 (Emergency Fix)

Expected Impact:
- Reduce bug fix time by 50-70%
- Improve diagnosis accuracy to 85%+
- Reduce production hotfix risks
- Systematize technical debt from quick fixes
550 lines
14 KiB
Markdown
# Lite-Fix Command Design Document

**Date**: 2025-11-20
**Version**: 1.0.0
**Status**: Design Proposal
**Related**: PLANNING_GAP_ANALYSIS.md (Scenario #8: Emergency Fix)

---
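
## Design Overview

`/workflow:lite-fix` is a lightweight bug diagnosis and fix workflow command that fills the current planning system's gap in emergency-fix scenarios. The design follows the proven pattern of `/workflow:lite-plan` and is optimized for bug fixing.

### Core Design Principles

1. **Fast response** - supports fix cycles from 15 minutes to 4 hours
2. **Risk awareness** - process complexity scales with severity
3. **Progressive verification** - a flexible strategy ranging from smoke tests to the full test suite
4. **Automated follow-up** - Hotfix mode automatically generates tasks to complete the fix properly

---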
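
## Design Comparison: lite-fix vs lite-plan

| Dimension | lite-plan | lite-fix | Rationale |
|-----------|-----------|----------|-----------|
| **Target scenario** | New feature development | Bug fixing | Different development intent |
| **Time budget** | 1-6 hours | 15 minutes - 4 hours | Bug fixes are more urgent |
| **Exploration phase** | Optional (`-e` flag) | Required, but can be simplified | Bugs need diagnosis |
| **Output type** | Implementation plan | Diagnosis + fix plan | Bugs need root cause analysis |
| **Verification strategy** | Full test suite | Tiered verification (Smoke/Focused/Full) | Risk vs. speed trade-off |
| **Branch strategy** | Feature branch | Feature/Hotfix branch | Production needs special handling |
| **Follow-up mechanism** | None | Hotfix auto-generates follow-up tasks | Technical debt management |

---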

## Three Severity Modes

### Mode 1: Regular (default)

**When to use**:
- Non-blocking bugs
- Affects <20% of users
- Ample time available (2-4 hours)

**Process characteristics**:
```
Full diagnosis → Multi-strategy evaluation → Full test suite → Standard branch
```

**Example**:
```bash
/workflow:lite-fix "User avatar upload fails with a 413 error"
```

### Mode 2: Critical (`--critical`)

**When to use**:
- Affects core functionality
- Affects 20-50% of users
- Must be fixed within 1 hour

**Simplified process**:
```
Focused diagnosis → Single best strategy → Key-scenario tests → Fast branch
```

**Example**:
```bash
/workflow:lite-fix --critical "Items randomly disappear from the cart at checkout"
```

### Mode 3: Hotfix (`--hotfix`)

**When to use**:
- Complete production outage
- Affects 100% of users or interrupts the business
- Must be fixed within 15-30 minutes

**Minimal process**:
```
Minimal diagnosis → Surgical fix → Smoke tests → Hotfix branch → Automatic follow-up tasks
```

**Example**:
```bash
/workflow:lite-fix --hotfix --incident INC-2024-1015 "Payment gateway returns 5xx errors"
```
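
The three modes run the same kinds of stages at different depths, so they can be pictured as a lookup table. The sketch below is only an illustration: `MODE_PROFILES` and `resolveMode` are hypothetical names that do not appear in the command specification, and the stage labels and time budgets are copied from the mode descriptions above.

```javascript
// Hypothetical lookup table: the names are illustrative, the values come
// from the three mode descriptions in this section.
const MODE_PROFILES = {
  regular: {
    flag: null,
    timeBudget: "2-4 hours",
    stages: ["full diagnosis", "multi-strategy evaluation", "full test suite", "standard branch"],
  },
  critical: {
    flag: "--critical",
    timeBudget: "within 1 hour",
    stages: ["focused diagnosis", "single best strategy", "key-scenario tests", "fast branch"],
  },
  hotfix: {
    flag: "--hotfix",
    timeBudget: "15-30 minutes",
    stages: ["minimal diagnosis", "surgical fix", "smoke tests", "hotfix branch", "automatic follow-up tasks"],
  },
};

// Resolve the mode name from the severity flag; no flag means Regular.
function resolveMode(severityFlag) {
  if (severityFlag === null || severityFlag === undefined) return "regular";
  const entry = Object.entries(MODE_PROFILES).find(([, profile]) => profile.flag === severityFlag);
  if (!entry) throw new Error(`Unknown severity flag: ${severityFlag}`);
  return entry[0];
}

console.log(resolveMode("--hotfix"), MODE_PROFILES.hotfix.stages.join(" → "));
```

Keeping the per-mode differences in data rather than in branching code would make it easier to see at a glance what each mode skips.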

---

## Six-Phase Execution Flow

### Phase 1: Diagnosis & Root Cause

**Design highlights**:
- **Smart search strategy**: Regular uses cli-explore-agent, Critical uses direct search, Hotfix relies on already-known information
- **Structured output**: the root_cause object contains file, line_range, issue, and introduced_by
- **Reproducibility check**: outputs reproduction_steps for verification

**Implementation**:
```javascript
if (mode === "regular") {
  // Deep exploration with cli-explore-agent
  Task(subagent_type="cli-explore-agent", ...)
} else if (mode === "critical") {
  // Direct grep + git blame
  Bash(`grep -r '${error}' src/ | head -10`)
} else {
  // Assume the problem is known; skip exploration
  Read(suspected_file)
}
```
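
To make the structured output concrete, here is a minimal sketch of a check that a Phase 1 result carries every field listed above. The function name `validateDiagnosis`, the choice of an array for `reproduction_steps`, and all sample values are assumptions made for illustration; the design only names the fields.

```javascript
// Hypothetical validator for the Phase 1 output. Field names follow the
// design highlights; everything else is an assumption of this sketch.
const ROOT_CAUSE_FIELDS = ["file", "line_range", "issue", "introduced_by"];

function validateDiagnosis(diagnosis) {
  const problems = [];
  const rootCause = diagnosis.root_cause ?? {};
  for (const field of ROOT_CAUSE_FIELDS) {
    if (typeof rootCause[field] !== "string" || rootCause[field].trim() === "") {
      problems.push(`root_cause.${field} is missing`);
    }
  }
  // Assumption: reproduction steps are stored as a non-empty array of strings.
  const steps = diagnosis.reproduction_steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    problems.push("reproduction_steps must list at least one step");
  }
  return { valid: problems.length === 0, problems };
}

// Made-up sample values, for illustration only.
const sampleDiagnosis = {
  root_cause: {
    file: "src/upload/avatar.js",
    line_range: "42-57",
    issue: "Request body limit is lower than the allowed avatar size",
    introduced_by: "commit abc1234",
  },
  reproduction_steps: ["Upload an avatar larger than the configured body limit"],
};

console.log(validateDiagnosis(sampleDiagnosis));
```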

### Phase 2: Impact Assessment

**Design highlights**:
- **Quantified risk score**: `risk_score = user_impact×0.4 + system_risk×0.3 + business_impact×0.3`
- **Automatic severity suggestion**: recommends `--critical` or `--hotfix` based on the score
- **Business impact analysis**: covers revenue, reputation, and SLA breach

**Example output**:
```markdown
## Impact Assessment
**Risk Level**: HIGH (7.1/10)
**Affected Users**: ~5000 (100%)
**Business Impact**: Revenue (Medium), Reputation (High), SLA Breached
**Recommended Severity**: --critical flag suggested
```
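
The weighted formula is simple enough to sketch directly. The weights below are the ones given in the formula; the 0-10 input scale, the sample inputs, and the cut-offs used by `suggestSeverity` are assumptions, since the design does not state where the boundaries lie.

```javascript
// Weights come from the risk_score formula; the 0-10 input scale is assumed.
const WEIGHTS = { user_impact: 0.4, system_risk: 0.3, business_impact: 0.3 };

function riskScore({ user_impact, system_risk, business_impact }) {
  const raw =
    user_impact * WEIGHTS.user_impact +
    system_risk * WEIGHTS.system_risk +
    business_impact * WEIGHTS.business_impact;
  return Math.round(raw * 10) / 10; // one decimal place, as in "7.1/10"
}

// Hypothetical cut-offs: the design does not specify them.
function suggestSeverity(score, { hotfixAt = 9, criticalAt = 7 } = {}) {
  if (score >= hotfixAt) return "--hotfix";
  if (score >= criticalAt) return "--critical";
  return null; // Regular mode, no flag
}

const sampleScore = riskScore({ user_impact: 8, system_risk: 6, business_impact: 7 });
console.log(sampleScore, suggestSeverity(sampleScore)); // prints: 7.1 --critical
```

The result is only a suggestion. The user still picks one of the three discrete modes, which matches the reasoning in ADR-002.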

### Phase 3: Fix Planning

**Design highlights**:
- **Multi-strategy generation** (Regular): immediate_patch vs comprehensive_refactor
- **Single best strategy** (Critical/Hotfix): surgical_fix
- **Complexity-adaptive planning**: low complexity is planned directly by Claude; medium complexity uses cli-lite-planning-agent

**Escalation path**:
```javascript
if (complexity > threshold) {
  suggest("/workflow:plan --mode bugfix")
}
```
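
Putting the strategy rules and the escalation check together might look like the following sketch. `candidateStrategies` and `nextStep` are hypothetical helpers; the strategy names come from the highlights above, while the meaning and scale of `complexity` and `threshold` are left to the caller because the design does not define them.

```javascript
// Strategy names come from the design highlights; the helper is hypothetical.
function candidateStrategies(mode) {
  return mode === "regular"
    ? ["immediate_patch", "comprehensive_refactor"]
    : ["surgical_fix"];
}

// `complexity` and `threshold` are not defined by the design; callers supply both.
function nextStep(mode, complexity, threshold) {
  if (complexity > threshold) {
    return { action: "escalate", command: "/workflow:plan --mode bugfix" };
  }
  return { action: "plan", strategies: candidateStrategies(mode) };
}

console.log(nextStep("regular", 2, 5));
console.log(nextStep("critical", 8, 5));
```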

### Phase 4: Verification Strategy

**Three-tier verification**:

| Level | Test Scope | Duration | Pass Criteria |
|-------|------------|----------|---------------|
| Smoke | Core paths | 2-5 minutes | No regression in core functionality |
| Focused | Affected modules | 5-10 minutes | Key scenarios pass |
| Comprehensive | Full suite | 10-20 minutes | All tests pass |

**Branch strategy differences**:
```javascript
if (mode === "hotfix") {
  branch = {
    type: "hotfix_branch",
    base: "production_tag_v2.3.1",        // ⚠️ created from the production tag
    merge_target: ["main", "production"]  // merged into both branches
  }
}
```
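
The snippet above covers only the hotfix case. A fuller sketch that pairs each mode with its default verification level and branch strategy follows. The level-per-mode mapping is read from the mode flows earlier in the document; the non-hotfix branch values (`feature_branch`, base and target `main`) and the function names are assumptions.

```javascript
// Default verification level per mode, following the mode flows and the table above.
const DEFAULT_VERIFICATION = { regular: "Comprehensive", critical: "Focused", hotfix: "Smoke" };

function branchStrategy(mode, productionTag) {
  if (mode === "hotfix") {
    return {
      type: "hotfix_branch",
      base: productionTag, // branch from what is actually deployed
      merge_target: ["main", "production"],
    };
  }
  // Assumption: non-hotfix fixes branch from and merge back into "main".
  return { type: "feature_branch", base: "main", merge_target: ["main"] };
}

function executionPlan(mode, productionTag) {
  const verification = DEFAULT_VERIFICATION[mode];
  if (!verification) throw new Error(`Unknown mode: ${mode}`);
  return { verification, branch: branchStrategy(mode, productionTag) };
}

console.log(executionPlan("hotfix", "production_tag_v2.3.1"));
```

Branching a hotfix from the production tag rather than from `main` keeps unreleased work out of the emergency deployment, which is why the fix then has to be merged into both targets.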

### Phase 5: User Confirmation

**Multi-dimensional confirmation**:

**Regular**: 4 dimensions
1. Fix strategy (Proceed/Modify/Escalate)
2. Execution method (Agent/CLI/Manual)
3. Verification level (Full/Focused/Smoke)
4. Code review (Gemini/Skip)

**Critical**: 3 dimensions (code review skipped)

**Hotfix**: 2 dimensions (minimal confirmation)
1. Deploy confirmation (Deploy/Stage First/Abort)
2. Post-deployment monitoring (Real-time/Passive)
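
As a sketch, the confirmation dimensions can be expressed as data, with the Critical set derived from the Regular one. The dimension names and options are taken from the lists above. Reading "3 dimensions (code review skipped)" as "Regular minus code review" is an assumption, as is the function name `confirmationDimensions`.

```javascript
const REGULAR_DIMENSIONS = [
  { name: "Fix strategy", options: ["Proceed", "Modify", "Escalate"] },
  { name: "Execution method", options: ["Agent", "CLI", "Manual"] },
  { name: "Verification level", options: ["Full", "Focused", "Smoke"] },
  { name: "Code review", options: ["Gemini", "Skip"] },
];

const HOTFIX_DIMENSIONS = [
  { name: "Deploy confirmation", options: ["Deploy", "Stage First", "Abort"] },
  { name: "Post-deployment monitoring", options: ["Real-time", "Passive"] },
];

function confirmationDimensions(mode) {
  switch (mode) {
    case "regular":
      return REGULAR_DIMENSIONS;
    case "critical":
      // Assumed reading: the Regular dimensions without the code review step.
      return REGULAR_DIMENSIONS.filter((d) => d.name !== "Code review");
    case "hotfix":
      return HOTFIX_DIMENSIONS;
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

console.log(confirmationDimensions("critical").map((d) => d.name));
```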

### Phase 6: Execution & Follow-up

**Design highlights**:
- **Unified execution interface**: runs through `/workflow:lite-execute --in-memory --mode bugfix`
- **Automatic follow-up tasks** (Hotfix only):
  ```json
  [
    {"id": "FOLLOWUP-comprehensive", "title": "Complete the fix", "due": "3 days"},
    {"id": "FOLLOWUP-postmortem", "title": "Postmortem analysis", "due": "1 week"}
  ]
  ```
- **Real-time monitoring** (optional): 15-minute monitoring window with automatic rollback triggers
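
Follow-up generation for a hotfix could be sketched as below. The task IDs and the 3-day and 1-week offsets come from the JSON above. The function name, the `parent` field, and the choice to store concrete ISO dates instead of relative strings are assumptions of this sketch.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical generator: turns the relative deadlines into concrete dates.
function generateFollowupTasks(bugfixId, now = new Date()) {
  const dueIn = (days) => new Date(now.getTime() + days * DAY_MS).toISOString().slice(0, 10);
  return [
    { id: "FOLLOWUP-comprehensive", title: "Complete the fix", parent: bugfixId, due: dueIn(3) },
    { id: "FOLLOWUP-postmortem", title: "Postmortem analysis", parent: bugfixId, due: dueIn(7) },
  ];
}

const followups = generateFollowupTasks("BUGFIX-001", new Date("2025-11-20T00:00:00Z"));
console.log(followups.map((t) => `${t.id} due ${t.due}`));
```

Concrete dates make the completion-rate targets under Success Metrics measurable, since "completed before the deadline" needs an actual deadline to compare against.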

---

## Integration with the Existing System

### 1. Command Integration Path

```mermaid
graph TD
    A[Bug found] --> B{Severity?}
    B -->|Unsure| C["/cli:mode:bug-diagnosis"]
    C --> D["/workflow:lite-fix"]

    B -->|Normal| E["/workflow:lite-fix"]
    B -->|Urgent| F["/workflow:lite-fix --critical"]
    B -->|Production outage| G["/workflow:lite-fix --hotfix"]

    E --> H{Complexity?}
    F --> H
    G --> I[lite-execute]

    H -->|Simple| I
    H -->|Complex| J["Suggest escalating to /workflow:plan"]

    I --> K[Fix complete]
    K --> L["/workflow:review --type quality"]
```

### 2. Data Flow

**Input**:
```javascript
{
  bug_description: string,
  severity_flags: "--critical" | "--hotfix" | null,
  incident_id: string | null
}
```
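
For illustration, a command line such as the examples in the severity-mode section could be turned into this input object roughly as follows. `tokenize` and `parseLiteFixArgs` are hypothetical helpers, and treating `--critical` and `--hotfix` as mutually exclusive is an assumption; the design does not say how the real command parses its arguments.

```javascript
// Splits on whitespace but keeps double-quoted text together.
function tokenize(commandLine) {
  const tokens = [];
  const pattern = /"([^"]*)"|(\S+)/g;
  let match;
  while ((match = pattern.exec(commandLine)) !== null) {
    tokens.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return tokens;
}

// Hypothetical parser producing the input shape shown above.
function parseLiteFixArgs(commandLine) {
  const tokens = tokenize(commandLine);
  if (tokens[0] !== "/workflow:lite-fix") throw new Error("Not a lite-fix invocation");

  const input = { bug_description: "", severity_flags: null, incident_id: null };
  const description = [];
  for (let i = 1; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "--critical" || token === "--hotfix") {
      // Assumption: the two severity flags cannot be combined.
      if (input.severity_flags) throw new Error("Choose only one of --critical or --hotfix");
      input.severity_flags = token;
    } else if (token === "--incident") {
      if (i + 1 >= tokens.length) throw new Error("--incident needs an ID");
      input.incident_id = tokens[++i];
    } else {
      description.push(token);
    }
  }
  input.bug_description = description.join(" ");
  if (!input.bug_description) throw new Error("A bug description is required");
  return input;
}

console.log(parseLiteFixArgs('/workflow:lite-fix --hotfix --incident INC-2024-1015 "Payment gateway 5xx errors"'));
```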

**Internal data structures**:
- `diagnosisContext`: root cause analysis result
- `impactContext`: impact assessment result
- `fixPlan`: fix plan object
- `executionContext`: context passed to lite-execute

**Output**:
```javascript
{
  // In-memory (passed to lite-execute)
  executionContext: {...},

  // Optional: Persistent JSON
  `.workflow/lite-fixes/BUGFIX-${timestamp}.json`,

  // Hotfix only: Follow-up tasks
  `.workflow/lite-fixes/BUGFIX-${timestamp}-followup.json`
}
```

### 3. Working with lite-execute

**lite-fix responsibilities**:
- ✅ Diagnosis and planning
- ✅ Impact assessment
- ✅ Strategy selection
- ✅ User confirmation

**lite-execute responsibilities**:
- ✅ Code implementation
- ✅ Test execution
- ✅ Branch operations
- ✅ Deployment monitoring

**Handoff mechanism**:
```javascript
// lite-fix prepares the executionContext
executionContext = {
  mode: "bugfix",
  severity: "hotfix",
  planObject: {...},
  diagnosisContext: {...},
  verificationStrategy: {...},
  branchStrategy: {...}
}

// Invoke lite-execute
SlashCommand("/workflow:lite-execute --in-memory --mode bugfix")
```

---

## Architecture Extension Points

### 1. Enhanced Task JSON Schema Extension

**New scenario type**:
```json
{
  "scenario_type": "bugfix",
  "scenario_config": {
    "severity": "regular|critical|hotfix",
    "root_cause": {
      "file": "...",
      "line_range": "...",
      "issue": "..."
    },
    "impact_assessment": {
      "risk_score": 7.1,
      "affected_users_count": 5000
    }
  }
}
```

### 2. workflow-session.json Extension (when session mode is used)

```json
{
  "session_id": "WFS-bugfix-payment",
  "type": "bugfix",
  "severity": "critical",
  "incident_id": "INC-2024-1015",
  "bugfixes": [
    {
      "id": "BUGFIX-001",
      "status": "completed",
      "follow_up_tasks": ["FOLLOWUP-001", "FOLLOWUP-002"]
    }
  ]
}
```

### 3. Diagnosis Cache (advanced feature)

**Purpose**: speed up diagnosis of similar bugs

```javascript
cache_key = hash(bug_keywords + recent_changes_hash)
cache_path = `.workflow/lite-fixes/diagnosis-cache/${cache_key}.json`

if (cache_exists && cache_age < 1_week) {
  diagnosis = load_from_cache()
  console.log("Using cached diagnosis (similar issue found)")
}
```
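
The pseudocode above can be made runnable under a few assumptions. In this sketch an FNV-1a style hash over the string's UTF-16 code units stands in for whatever hash the real implementation would use, an in-memory `Map` stands in for the `.workflow/lite-fixes/diagnosis-cache/` directory, and keyword normalization (lowercasing and sorting) is an added design choice rather than something the document specifies. All function names are hypothetical.

```javascript
const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// FNV-1a style 32-bit hash over UTF-16 code units; a stand-in for the real hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function cacheKey(bugKeywords, recentChangesHash) {
  // Lowercase and sort so keyword order and case do not change the key.
  const normalized = [...bugKeywords].map((k) => k.toLowerCase()).sort().join(" ");
  return fnv1a(`${normalized}|${recentChangesHash}`);
}

// `cache` is a Map of key -> { savedAt, diagnosis }, standing in for files on disk.
function lookupDiagnosis(cache, key, now = Date.now()) {
  const entry = cache.get(key);
  if (!entry || now - entry.savedAt >= ONE_WEEK_MS) return null; // missing or stale
  return entry.diagnosis;
}

const cache = new Map();
const key = cacheKey(["upload", "413"], "abc1234");
cache.set(key, { savedAt: Date.now(), diagnosis: { issue: "body limit too low" } });
console.log(lookupDiagnosis(cache, key));
```

Including the recent-changes hash in the key means a cached diagnosis is dropped as soon as the relevant code changes, which limits the chance of reusing a root cause that has since been fixed.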
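
---

## Quality Assurance

### 1. Quality Gates

**Pre-execution checks**:
- [ ] Root cause identified with >80% confidence
- [ ] Impact scope clearly defined
- [ ] Fix strategy reviewed and approved
- [ ] Verification plan matches the risk
- [ ] Branch strategy matches the severity

**Hotfix-only gates**:
- [ ] Incident ticket created and linked
- [ ] Approved by the incident commander
- [ ] Rollback plan documented
- [ ] Follow-up tasks generated
- [ ] Post-deployment monitoring configured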
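
A gate check along these lines could be sketched as a list of labelled predicates evaluated before handing off to lite-execute. The gate labels mirror the two checklists above; the context field names (`rootCauseConfidence`, `commanderApproved`, and so on) and the function `checkGates` are hypothetical.

```javascript
// Each gate is [label, predicate]. The context field names are hypothetical.
const COMMON_GATES = [
  ["Root cause confidence > 80%", (c) => c.rootCauseConfidence > 0.8],
  ["Impact scope defined", (c) => c.impactScopeDefined === true],
  ["Fix strategy approved", (c) => c.strategyApproved === true],
  ["Verification plan matches risk", (c) => c.verificationMatchesRisk === true],
  ["Branch strategy matches severity", (c) => c.branchMatchesSeverity === true],
];

const HOTFIX_GATES = [
  ["Incident ticket linked", (c) => Boolean(c.incidentId)],
  ["Incident commander approval", (c) => c.commanderApproved === true],
  ["Rollback plan documented", (c) => c.rollbackPlanDocumented === true],
  ["Follow-up tasks generated", (c) => Array.isArray(c.followupTasks) && c.followupTasks.length > 0],
  ["Post-deployment monitoring configured", (c) => c.monitoringConfigured === true],
];

// Returns which gates failed so the caller can show them to the user.
function checkGates(mode, context) {
  const gates = mode === "hotfix" ? [...COMMON_GATES, ...HOTFIX_GATES] : COMMON_GATES;
  const failed = gates.filter(([, passes]) => !passes(context)).map(([label]) => label);
  return { passed: failed.length === 0, failed };
}

const commonOnly = {
  rootCauseConfidence: 0.9,
  impactScopeDefined: true,
  strategyApproved: true,
  verificationMatchesRisk: true,
  branchMatchesSeverity: true,
};
console.log(checkGates("regular", commonOnly));
console.log(checkGates("hotfix", commonOnly).failed);
```

Returning the list of failed gates, rather than a bare boolean, lets the workflow tell the user exactly what is still missing before a hotfix can proceed.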
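
### 2. Error Handling Strategy

| Error | Cause | Resolution |
|------|------|---------|
| Root cause not found | Search scope too narrow | Widen the search or escalate to /workflow:plan |
| Reproduction fails | Stale bug or environment issue | Verify the environment and request updated reproduction steps |
| Multiple potential causes | Complex interactions | Use /cli:discuss-plan for multi-model analysis |
| Fix too complex | Refactoring required | Suggest /workflow:plan --mode refactor |
| Hotfix verification fails | Smoke tests insufficient | Add key tests or downgrade to critical mode |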

---

## Implementation Priorities and Roadmap

### Phase 1: Core Functionality (Sprint 1)

**Required**:
- [x] Command documentation complete (this document)
- [ ] Six-phase flow implementation
- [ ] Support for the three severity modes
- [ ] Basic diagnosis logic
- [ ] Integration with lite-execute

**Estimated effort**: 5-8 days

### Phase 2: Advanced Features (Sprint 2)

**Enhancements**:
- [ ] Diagnosis cache
- [ ] Automatic severity detection
- [ ] Hotfix branch management
- [ ] Real-time monitoring integration
- [ ] Automatic follow-up task generation

**Estimated effort**: 3-5 days

### Phase 3: Optimization and Polish (Sprint 3)

**Optimizations**:
- [ ] Performance optimization (diagnosis speed)
- [ ] More complete error handling
- [ ] Additional documentation and examples
- [ ] Iteration on user feedback

**Estimated effort**: 2-3 days
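
---

## Success Metrics

### 1. Usage Metrics

- **Adoption rate**: number of lite-fix uses / total number of bug fixes
- **Target**: >60% of regular bugs fixed with lite-fix

### 2. Efficiency Metrics

- **Regular mode**: average fix time <3 hours (vs 4-6 hours manually)
- **Critical mode**: average fix time <1 hour (vs 2-3 hours manually)
- **Hotfix mode**: average fix time <30 minutes (vs 1-2 hours manually)

### 3. Quality Metrics

- **Diagnosis accuracy**: root cause identified correctly in >85% of cases
- **Fix success rate**: >90% of fixes succeed on the first attempt
- **Regression rate**: <5% of fixes introduce new bugs

### 4. Follow-up Completion Rate (Hotfix)

- **Follow-up task completion rate**: >80% of follow-up tasks completed before their deadline
- **Technical debt repayment**: >70% of hotfixes receive their comprehensive fix within 3 days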
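
---

## Risks and Mitigations

### Risk 1: Hotfix misuse causes production problems

**Mitigations**:
1. Strict quality gates (incident commander approval required)
2. Mandatory documented rollback plan
3. Automatic rollback triggers from real-time monitoring
4. Mandatory follow-up task generation

### Risk 2: Insufficient diagnosis accuracy

**Mitigations**:
1. Diagnosis confidence score (<80% suggests escalating to /workflow:plan)
2. Multi-hypothesis diagnosis when uncertain
3. Integration with /cli:discuss-plan for complex cases

### Risk 3: Users skip necessary verification

**Mitigations**:
1. Enforce a minimum verification level based on severity
2. Show risk warnings (the consequences of skipping tests)
3. Hotfix mode does not allow skipping smoke tests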
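
---

## Mapping to PLANNING_GAP_ANALYSIS

This design is a complete implementation proposal for **Scenario 8: Emergency Fix** in `PLANNING_GAP_ANALYSIS.md`.

**Gap coverage**:

| Gap item | Coverage | Implemented by |
|-------|---------|---------|
| Process simplification | ✅ 100% | Three severity modes |
| Fast verification | ✅ 100% | Tiered verification strategy |
| Hotfix branch management | ✅ 100% | branchStrategy configuration |
| Complete fix after the fact | ✅ 100% | Automatic follow-up task generation |

**Additional enhancements**:
- ✅ Diagnosis cache (not mentioned in the original gap)
- ✅ Real-time monitoring integration (beyond the original requirements)
- ✅ Automatic severity suggestion (smart enhancement)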
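
---

## Design Decision Records

### ADR-001: Why not reuse /workflow:plan?

**Decision**: Create a standalone lite-fix command

**Rationale**:
1. Bug fixing centers on "diagnosis", whereas plan centers on "design"
2. Bug fixing needs risk tiers (Regular/Critical/Hotfix); plan does not
3. Bug fixing needs hotfix branch management; plan uses standard feature branches
4. Bug fixing needs a follow-up task mechanism; plan assumes the work is completed in one pass

**Rejected alternative**: `/workflow:plan --mode bugfix`
- Problem: the plan flow is too heavy for a 15-minute hotfix scenario

### ADR-002: Why three severity levels instead of a continuous score?

**Decision**: Use discrete Regular/Critical/Hotfix modes

**Rationale**:
1. Clear decision points (easy for users to choose)
2. Each mode has well-defined process differences
3. Avoids ambiguity such as "which process should a score of 8.5 use?"

**Rejected alternative**: a continuous 0-10 score that selects the process automatically
- Problem: score boundaries are fuzzy, and users struggle to understand why a given score maps to a given process

### ADR-003: Why is diagnosis required rather than optional?

**Decision**: Phase 1 diagnosis is required in every mode (though its depth is adjustable)

**Rationale**:
1. Even for a known bug, the reproduction path still needs to be verified
2. The diagnosis output is critical to the quality of the subsequent fix
3. Skipping diagnosis easily leads to fixing the wrong problem (fixing A while B remains broken)

**Rejected alternative**: Hotfix mode skips diagnosis entirely
- Problem: increases the risk of a wrong fix and makes the postmortem hard to write afterwards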
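
---

## Summary

`/workflow:lite-fix` is an important addition to the current planning system, focused specifically on bug fixing. The design draws heavily on what worked in lite-plan while optimizing for the particular needs of bug fixing:

**Core strengths**:
1. ⚡ **Fast response**: fix cycles from 15 minutes to 4 hours
2. 🎯 **Risk awareness**: three severity modes for different levels of urgency
3. 🔍 **Smart diagnosis**: structured root cause analysis
4. 🛡️ **Quality assurance**: tiered verification plus mandatory gates
5. 📋 **Technical debt management**: Hotfix automatically generates tasks to complete the fix

**Working with the existing system**:
- Pairs with `/workflow:lite-plan` as the "fix vs. develop" twin commands
- Reuses the `/workflow:lite-execute` execution layer
- Integrates the diagnosis capability of `/cli:mode:bug-diagnosis`
- Supports escalation to `/workflow:plan` for complex scenarios

**Expected impact**:
- Reduce bug fix time by 50-70%
- Raise diagnosis accuracy to 85%+
- Reduce the risk of production hotfixes
- Manage technical debt systematically

---

**Document version**: 1.0.0
**Author**: Claude (Sonnet 4.5)
**Review status**: Pending review
**Implementation status**: Design stage (not yet implemented)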