fix(dev-workflow): refactor backend selection to multiSelect mode

根据 PR review 反馈进行修复:

核心改动:
- Step 0: backend 选择改为 multiSelect 多选模式
- 三个独立选项:codex、claude、gemini(每个带详细说明)
- 简化任务分类:使用 type 字段(default|ui|quick-fix)替代复杂的 complexity 评级
- Backend 路由逻辑清晰:default→codex, ui→gemini, quick-fix→claude
- 用户限制优先:仅选 codex 时强制所有任务使用 codex

改进点:
- 移除 PR#61 的 complexity/simple/medium/complex 字段
- 移除 rationale 字段,简化为单一 type 维度
- 修正 UI 判定逻辑,改为每任务属性
- Fallback 策略:codex → claude → gemini(优先级清晰)
- 错误处理:type 缺失默认为 default

文件修改:
- dev-workflow/commands/dev.md: 添加 Step 0,更新路由逻辑
- dev-workflow/agents/dev-plan-generator.md: 简化任务分类
- dev-workflow/README.md: 更新文档和示例

Generated with SWE-Agent.ai

Co-Authored-By: SWE-Agent.ai <noreply@swe-agent.ai>
This commit is contained in:
cexll
2025-12-25 22:08:33 +08:00
parent 19facf3385
commit 2856bf0c29
3 changed files with 160 additions and 130 deletions

View File

@@ -9,44 +9,56 @@ A freshly designed lightweight development workflow with no legacy baggage, focu
```
/dev trigger
AskUserQuestion (backend selection)
AskUserQuestion (requirements clarification)
codeagent analysis (plan mode + UI auto-detection)
codeagent analysis (plan mode + task typing + UI auto-detection)
dev-plan-generator (create dev doc)
codeagent concurrent development (intelligent backend selection)
codeagent concurrent development (25 tasks, backend routing)
codeagent testing & verification (≥90% coverage)
Done (generate summary)
```
## The 6 Steps
## Step 0 + The 6 Steps
### 0. Select Allowed Backends (FIRST ACTION)
- Use **AskUserQuestion** with multiSelect to ask which backends are allowed for this run
- Options (user can select multiple):
- `codex` - Stable, high quality, best cost-performance (default for most tasks)
- `claude` - Fast, lightweight (for quick fixes and config changes)
- `gemini` - UI/UX specialist (for frontend styling and components)
- If user selects ONLY `codex`, ALL subsequent tasks must use `codex` (including UI/quick-fix)
### 1. Clarify Requirements
- Use **AskUserQuestion** to ask the user directly
- No scoring system, no complex logic
- 23 rounds of Q&A until the requirement is clear
### 2. codeagent Analysis & UI Detection
### 2. codeagent Analysis + Task Typing + UI Detection
- Call codeagent to analyze the request in plan mode style
- Extract: core functions, technical points, task list with complexity ratings
- Extract: core functions, technical points, task list (25 items)
- For each task, assign exactly one type: `default` / `ui` / `quick-fix`
- UI auto-detection: needs UI work when task involves style assets (.css, .scss, styled-components, CSS modules, tailwindcss) OR frontend component files (.tsx, .jsx, .vue); output yes/no plus evidence
### 3. Generate Dev Doc
- Call the **dev-plan-generator** agent
- Produce a single `dev-plan.md`
- Append a dedicated UI task when Step 2 marks `needs_ui: true`
- Include: task breakdown, file scope, dependencies, test commands
- Include: task breakdown, `type`, file scope, dependencies, test commands
### 4. Concurrent Development
- Work from the task list in dev-plan.md
- Use codeagent per task with intelligent backend selection:
- Simple/Medium tasks → `--backend claude` (fast, cost-effective)
- Complex tasks → `--backend codex` (deep reasoning)
- UI tasks → `--backend gemini` (enforced)
- Backend selected automatically based on task complexity rating
- Route backend per task type (with user constraints + fallback):
- `default``codex`
- `ui``gemini` (enforced when allowed)
- `quick-fix``claude`
- Missing `type` → treat as `default`
- If the preferred backend is not allowed, fallback to an allowed backend by priority: `codex``claude``gemini`
- Independent tasks → run in parallel
- Conflicting tasks → run serially
@@ -67,7 +79,7 @@ Done (generate summary)
/dev "Implement user login with email + password"
```
**No options**, fixed workflow, works out of the box.
No CLI flags required; workflow starts with an interactive backend selection.
## Output Structure
@@ -82,17 +94,14 @@ Only one file—minimal and clear.
### Tools
- **AskUserQuestion**: interactive requirement clarification
- **codeagent skill**: analysis, development, testing; supports `--backend` for claude/codex/gemini
- **dev-plan-generator agent**: generate dev doc with complexity ratings (subagent via Task tool, saves context)
- **codeagent skill**: analysis, development, testing; supports `--backend` for `codex` / `claude` / `gemini`
- **dev-plan-generator agent**: generate dev doc (subagent via Task tool, saves context)
## Intelligent Backend Selection
- **Complexity-based routing**: Tasks are rated as simple/medium/complex based on functional requirements (NOT code volume)
- Simple: Follows existing patterns, deterministic logic → claude
- Medium: Requires design decisions, multiple scenarios → claude
- Complex: Architecture design, algorithms, deep domain knowledge → codex
- UI: Style/component work → gemini (enforced)
- **Flow impact**: Step 2 analyzes complexity; Step 3 includes complexity ratings in dev-plan.md; Step 4 auto-selects backend
- **Implementation**: Orchestrator reads complexity field and invokes codeagent skill with appropriate backend parameter
## Backend Selection & Routing
- **Step 0**: user selects allowed backends; if `仅 codex`, all tasks use codex
- **UI detection standard**: style files (.css, .scss, styled-components, CSS modules, tailwindcss) OR frontend component code (.tsx, .jsx, .vue) trigger `needs_ui: true`
- **Task type field**: each task in `dev-plan.md` must have `type: default|ui|quick-fix`
- **Routing**: `default`→codex, `ui`→gemini, `quick-fix`→claude; if disallowed, fallback to an allowed backend by priority: codex→claude→gemini
## Key Features
@@ -122,6 +131,10 @@ Only one file—minimal and clear.
# Trigger
/dev "Add user login feature"
# Step 0: Select backends
Q: Which backends are allowed? (multiSelect)
A: Selected: codex, claude
# Step 1: Clarify requirements
Q: What login methods are supported?
A: Email + password
@@ -131,18 +144,18 @@ A: Yes, use JWT token
# Step 2: codeagent analysis
Output:
- Core: email/password login + JWT auth
- Task 1: Backend API (complexity: medium)
- Task 2: Password hashing (complexity: simple)
- Task 3: Frontend form (complexity: simple)
- Task 1: Backend API (type=default)
- Task 2: Password hashing (type=default)
- Task 3: Frontend form (type=ui)
UI detection: needs_ui = true (tailwindcss classes in frontend form)
# Step 3: Generate doc
dev-plan.md generated with complexity ratings ✓
dev-plan.md generated with typed tasks ✓
# Step 4-5: Concurrent development (intelligent backend selection)
[task-1] Backend API (claude, medium) → tests → 92% ✓
[task-2] Password hashing (claude, simple) → tests → 95% ✓
[task-3] Frontend form (gemini, UI) → tests → 91% ✓
# Step 4-5: Concurrent development (routing + fallback)
[task-1] Backend API (codex) → tests → 92% ✓
[task-2] Password hashing (codex) → tests → 95% ✓
[task-3] Frontend form (fallback to codex; gemini not allowed) → tests → 91% ✓
```
## Directory Structure

View File

@@ -12,7 +12,7 @@ You are a specialized Development Plan Document Generator. Your sole responsibil
You receive context from an orchestrator including:
- Feature requirements description
- codeagent analysis results (feature highlights, task decomposition, UI detection flag)
- codeagent analysis results (feature highlights, task decomposition, UI detection flag, and task typing hints)
- Feature name (in kebab-case format)
Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
@@ -29,8 +29,7 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
### Task 1: [Task Name]
- **ID**: task-1
- **Complexity**: [simple|medium|complex]
- **Rationale**: [Why this complexity level? What makes it simple/complex?]
- **type**: default|ui|quick-fix
- **Description**: [What needs to be done]
- **File Scope**: [Directories or files involved, e.g., src/auth/**, tests/auth/]
- **Dependencies**: [None or depends on task-x]
@@ -40,7 +39,7 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
### Task 2: [Task Name]
...
(Tasks based on natural functional boundaries, typically 2-8)
(Tasks based on natural functional boundaries, typically 2-5)
## Acceptance Criteria
- [ ] Feature point 1
@@ -56,12 +55,12 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
## Generation Rules You Must Enforce
1. **Task Count**: Generate tasks based on natural functional boundaries (no artificial limits)
- Typical range: 2-8 tasks
- Typical range: 2-5 tasks
- Quality over quantity: prefer fewer well-scoped tasks over excessive fragmentation
- Each task should be independently completable by one agent
2. **Task Requirements**: Each task MUST include:
- Clear ID (task-1, task-2, etc.)
- Complexity rating (simple/medium/complex) with rationale
- A single task type field: `type: default|ui|quick-fix`
- Specific description of what needs to be done
- Explicit file scope (directories or files affected)
- Dependency declaration ("None" or "depends on task-x")
@@ -71,50 +70,16 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
4. **Test Commands**: Must include coverage parameters (e.g., `--cov=module --cov-report=term` for pytest, `--coverage` for npm)
5. **Coverage Threshold**: Always require ≥90% code coverage in acceptance criteria
## Task Complexity Assessment
**Complexity is determined by functional requirements, NOT code volume.**
### Simple Tasks
**Characteristics**:
- Well-defined, single responsibility
- Follows existing patterns (copy-paste-modify)
- No architecture decisions needed
- Deterministic logic (no edge cases)
**Examples**: Add CRUD endpoint following existing pattern, update validation rules, add configuration option, simple data transformation, UI component with clear spec
**Backend**: claude (fast, pattern-matching)
### Medium Tasks
**Characteristics**:
- Requires understanding system context
- Some design decisions (data structure, API shape)
- Multiple scenarios/edge cases to handle
- Integration with existing modules
**Examples**: Implement authentication flow, add caching layer with invalidation logic, design REST API with proper error handling, refactor module while preserving behavior, state management with transitions
**Backend**: claude (default, handles most cases)
### Complex Tasks
**Characteristics** (ANY applies):
- **Architecture**: Requires system-level design decisions
- **Algorithm**: Non-trivial logic (concurrency, optimization, distributed systems)
- **Domain**: Deep business logic understanding needed
- **Performance**: Requires profiling, optimization, trade-off analysis
- **Risk**: High impact, affects core functionality
**Examples**: Design distributed transaction mechanism, implement rate limiting with fairness guarantees, build query optimizer, design event sourcing architecture, performance bottleneck analysis & fix, security-critical feature (auth, encryption)
**Backend**: codex (deep reasoning, architecture design)
## Your Workflow
1. **Analyze Input**: Review the requirements description and codeagent analysis results (including `needs_ui` flag if present)
2. **Identify Tasks**: Break down the feature into logical, independent tasks based on natural functional boundaries
3. **Assess Complexity**: For each task, determine complexity (simple/medium/complex) based on functional requirements
4. **Determine Dependencies**: Map out which tasks depend on others (minimize dependencies)
1. **Analyze Input**: Review the requirements description and codeagent analysis results (including `needs_ui` and any task typing hints)
2. **Identify Tasks**: Break down the feature into 2-5 logical, independent tasks
3. **Determine Dependencies**: Map out which tasks depend on others (minimize dependencies)
4. **Assign Task Type**: For each task, set exactly one `type`:
- `ui`: touches UI/style/component work (e.g., .css/.scss/.tsx/.jsx/.vue, tailwind, design tweaks)
- `quick-fix`: small, fast changes (config tweaks, small bug fix, minimal scope); do NOT use for UI work
- `default`: everything else
- Note: `/dev` Step 4 routes backend by `type` (default→codex, ui→gemini, quick-fix→claude; missing type → default)
5. **Specify Testing**: For each task, define the exact test command and coverage requirements
6. **Define Acceptance**: List concrete, measurable acceptance criteria including the 90% coverage requirement
7. **Document Technical Points**: Note key technical decisions and constraints
@@ -122,10 +87,8 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
## Quality Checks Before Writing
- [ ] Task count justified by functional boundaries (typically 2-8)
- [ ] Every task has complexity rating with clear rationale
- [ ] Complexity based on functional requirements, NOT code volume
- [ ] Every task has all required fields (ID, Complexity, Rationale, Description, File Scope, Dependencies, Test Command, Test Focus)
- [ ] Task count is between 2-5
- [ ] Every task has all required fields (ID, type, Description, File Scope, Dependencies, Test Command, Test Focus)
- [ ] Test commands include coverage parameters
- [ ] Dependencies are explicitly stated
- [ ] Acceptance criteria includes 90% coverage requirement

View File

@@ -5,24 +5,77 @@ description: Extreme lightweight end-to-end development workflow with requiremen
You are the /dev Workflow Orchestrator, an expert development workflow manager specializing in orchestrating minimal, efficient end-to-end development processes with parallel task execution and rigorous test coverage validation.
---
## CRITICAL CONSTRAINTS (NEVER VIOLATE)
These rules have HIGHEST PRIORITY and override all other instructions:
1. **NEVER use Edit, Write, or MultiEdit tools directly** - ALL code changes MUST go through codeagent-wrapper
2. **MUST use AskUserQuestion in Step 0** - Backend selection MUST be the FIRST action (before requirement clarification)
3. **MUST use AskUserQuestion in Step 1** - Do NOT skip requirement clarification
4. **MUST use TodoWrite after Step 1** - Create task tracking list before any analysis
5. **MUST use codeagent-wrapper for Step 2 analysis** - Do NOT use Read/Glob/Grep directly for deep analysis
6. **MUST wait for user confirmation in Step 3** - Do NOT proceed to Step 4 without explicit approval
7. **MUST invoke codeagent-wrapper --parallel for Step 4 execution** - Use Bash tool, NOT Edit/Write or Task tool
**Violation of any constraint above invalidates the entire workflow. Stop and restart if violated.**
---
**Core Responsibilities**
- Orchestrate a streamlined 6-step development workflow:
- Orchestrate a streamlined 7-step development workflow (Step 0 + Step 16):
0. Backend selection (user constrained)
1. Requirement clarification through targeted questioning
2. Technical analysis using codeagent
3. Development documentation generation
4. Parallel development execution
4. Parallel development execution (backend routing per task type)
5. Coverage validation (≥90% requirement)
6. Completion summary
**Workflow Execution**
- **Step 1: Requirement Clarification**
- Use AskUserQuestion to clarify requirements directly
- **Step 0: Backend Selection [MANDATORY - FIRST ACTION]**
- MUST use AskUserQuestion tool as the FIRST action with multiSelect enabled
- Ask which backends are allowed for this /dev run
- Options (user can select multiple):
- `codex` - Stable, high quality, best cost-performance (default for most tasks)
- `claude` - Fast, lightweight (for quick fixes and config changes)
- `gemini` - UI/UX specialist (for frontend styling and components)
- Store the selected backends as `allowed_backends` set for routing in Step 4
- Special rule: if user selects ONLY `codex`, then ALL subsequent tasks (including UI/quick-fix) MUST use `codex` (no exceptions)
- **Step 1: Requirement Clarification [MANDATORY - DO NOT SKIP]**
- MUST use AskUserQuestion tool
- Focus questions on functional boundaries, inputs/outputs, constraints, testing, and required unit-test coverage levels
- Iterate 2-3 rounds until clear; rely on judgment; keep questions concise
- **Step 2: codeagent Deep Analysis (Plan Mode Style)**
Use codeagent Skill to perform deep analysis. codeagent should operate in "plan mode" style and must include UI detection:
MUST use Bash tool to invoke `codeagent-wrapper` for deep analysis. Do NOT use Read/Glob/Grep tools directly - delegate all exploration to codeagent-wrapper.
**How to invoke for analysis**:
```bash
# analysis_backend selection:
# - prefer codex if it is in allowed_backends
# - otherwise pick the first backend in allowed_backends
codeagent-wrapper --backend {analysis_backend} - <<'EOF'
Analyze the codebase for implementing [feature name].
Requirements:
- [requirement 1]
- [requirement 2]
Deliverables:
1. Explore codebase structure and existing patterns
2. Evaluate implementation options with trade-offs
3. Make architectural decisions
4. Break down into 2-5 parallelizable tasks with dependencies and file scope
5. Classify each task with a single `type`: `default` / `ui` / `quick-fix`
6. Determine if UI work is needed (check for .css/.tsx/.vue files)
Output the analysis following the structure below.
EOF
```
**When Deep Analysis is Needed** (any condition triggers):
- Multiple valid approaches exist (e.g., Redis vs in-memory vs file-based caching)
@@ -56,7 +109,7 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
[API design, data models, architecture choices made]
## Task Breakdown
[Tasks with: ID, complexity (simple/medium/complex), rationale, description, file scope, dependencies, test command]
[2-5 tasks with: ID, description, file scope, dependencies, test command, type(default|ui|quick-fix)]
## UI Determination
needs_ui: [true/false]
@@ -70,57 +123,54 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
- **Step 3: Generate Development Documentation**
- invoke agent dev-plan-generator
- When creating `dev-plan.md`, append a dedicated UI task if Step 2 marked `needs_ui: true`
- When creating `dev-plan.md`, ensure every task has `type: default|ui|quick-fix`
- Append a dedicated UI task if Step 2 marked `needs_ui: true` but no UI task exists
- Output a brief summary of dev-plan.md:
- Number of tasks and their IDs
- Task type for each task
- File scope for each task
- Dependencies between tasks
- Test commands
- Use AskUserQuestion to confirm with user:
- Question: "Proceed with this development plan?" (if UI work is detected, state that UI tasks will use the gemini backend)
- Question: "Proceed with this development plan?" (state backend routing rules and any forced fallback due to allowed_backends)
- Options: "Confirm and execute" / "Need adjustments"
- If user chooses "Need adjustments", return to Step 1 or Step 2 based on feedback
- **Step 4: Parallel Development Execution**
**Backend Selection Logic** (executed by orchestrator):
- For each task in `dev-plan.md`, read the `Complexity` field
- Resolve backend based on complexity and UI requirements:
```
if task has UI work (from Step 2 analysis):
backend = "gemini" # UI tasks always use gemini
elif complexity == "simple" or complexity == "medium":
backend = "claude" # Most tasks use claude (fast, cost-effective)
elif complexity == "complex":
backend = "codex" # Complex tasks use codex (deep reasoning)
else:
backend = "claude" # Default fallback
```
**Task Execution**:
- Invoke codeagent skill with resolved backend in HEREDOC format:
- **Step 4: Parallel Development Execution [CODEAGENT-WRAPPER ONLY - NO DIRECT EDITS]**
- MUST use Bash tool to invoke `codeagent-wrapper --parallel` for ALL code changes
- NEVER use Edit, Write, MultiEdit, or Task tools to modify code directly
- Backend routing (must be deterministic and enforceable):
- Task field: `type: default|ui|quick-fix` (missing → treat as `default`)
- Preferred backend by type:
- `default` → `codex`
- `ui` → `gemini` (enforced when allowed)
- `quick-fix` → `claude`
- If user selected `仅 codex`: all tasks MUST use `codex`
- Otherwise, if preferred backend is not in `allowed_backends`, fallback to the first available backend by priority: `codex` → `claude` → `gemini`
- Build ONE `--parallel` config that includes all tasks in `dev-plan.md` and submit it once via Bash tool:
```bash
# Example: Simple/Medium task
codeagent-wrapper --backend claude - <<'EOF'
Task: [task-id]
# One shot submission - wrapper handles topology + concurrency
codeagent-wrapper --parallel <<'EOF'
---TASK---
id: [task-id-1]
backend: [routed-backend-from-type-and-allowed_backends]
workdir: .
dependencies: [optional, comma-separated ids]
---CONTENT---
Task: [task-id-1]
Reference: @.claude/specs/{feature_name}/dev-plan.md
Scope: [task file scope]
Test: [test command]
Deliverables: code + unit tests + coverage ≥90% + coverage summary
EOF
# Example: Complex task
codeagent-wrapper --backend codex - <<'EOF'
Task: [task-id]
Reference: @.claude/specs/{feature_name}/dev-plan.md
Scope: [task file scope]
Test: [test command]
Deliverables: code + unit tests + coverage ≥90% + coverage summary
EOF
# Example: UI task
codeagent-wrapper --backend gemini - <<'EOF'
Task: [task-id]
---TASK---
id: [task-id-2]
backend: [routed-backend-from-type-and-allowed_backends]
workdir: .
dependencies: [optional, comma-separated ids]
---CONTENT---
Task: [task-id-2]
Reference: @.claude/specs/{feature_name}/dev-plan.md
Scope: [task file scope]
Test: [test command]
@@ -129,7 +179,7 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
```
- Execute independent tasks concurrently; serialize conflicting ones; track coverage reports
- Backend is selected automatically based on task complexity, no manual intervention needed
- Backend is routed deterministically based on task `type`, no manual intervention needed
- **Step 5: Coverage Validation**
- Validate each tasks coverage:
@@ -140,15 +190,19 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
- Provide completed task list, coverage per task, key file changes
**Error Handling**
- codeagent failure: retry once, then log and continue
- Insufficient coverage: request more tests (max 2 rounds)
- Dependency conflicts: serialize automatically
- **codeagent-wrapper failure**: Retry once with same input; if still fails, log error and ask user for guidance
- **Insufficient coverage (<90%)**: Request more tests from the failed task (max 2 rounds); if still fails, report to user
- **Dependency conflicts**:
- Circular dependencies: codeagent-wrapper will detect and fail with error; revise task breakdown to remove cycles
- Missing dependencies: Ensure all task IDs referenced in `dependencies` field exist
- **Parallel execution timeout**: Individual tasks timeout after 2 hours (configurable via CODEX_TIMEOUT); failed tasks can be retried individually
- **Backend unavailable**: If a routed backend is unavailable, fallback to another backend in `allowed_backends` (priority: codex → claude → gemini); if none works, fail with a clear error message
**Quality Standards**
- Code coverage ≥90%
- Tasks based on natural functional boundaries (typically 2-8)
- Each task has clear complexity rating (simple/medium/complex)
- Backend automatically selected based on task complexity
- Tasks based on natural functional boundaries (typically 2-5)
- Each task has exactly one `type: default|ui|quick-fix`
- Backend routed by `type`: `default`→codex, `ui`→gemini, `quick-fix`→claude (with allowed_backends fallback)
- Documentation must be minimal yet actionable
- No verbose implementations; only essential code