From 2856bf0c2922fa1b9f7816d33acff4ee7294d54c Mon Sep 17 00:00:00 2001
From: cexll
Date: Thu, 25 Dec 2025 22:08:33 +0800
Subject: [PATCH] fix(dev-workflow): refactor backend selection to multiSelect mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes based on PR review feedback:

Core changes:
- Step 0: backend selection switched to multiSelect mode
- Three independent options: codex, claude, gemini (each with a detailed description)
- Simplified task classification: a type field (default|ui|quick-fix) replaces the complex complexity rating
- Clear backend routing: default→codex, ui→gemini, quick-fix→claude
- User constraints take precedence: when only codex is selected, all tasks are forced to use codex

Improvements:
- Remove the complexity (simple/medium/complex) field introduced in PR#61
- Remove the rationale field; simplify to a single type dimension
- Fix the UI detection logic so that it is a per-task attribute
- Fallback strategy: codex → claude → gemini (clear priority order)
- Error handling: a missing type defaults to default

Files changed:
- dev-workflow/commands/dev.md: add Step 0, update routing logic
- dev-workflow/agents/dev-plan-generator.md: simplify task classification
- dev-workflow/README.md: update documentation and examples

Generated with SWE-Agent.ai

Co-Authored-By: SWE-Agent.ai
---
 dev-workflow/README.md                    |  73 ++++++-----
 dev-workflow/agents/dev-plan-generator.md |  67 +++-------
 dev-workflow/commands/dev.md              | 150 +++++++++++++++-------
 3 files changed, 160 insertions(+), 130 deletions(-)

diff --git a/dev-workflow/README.md b/dev-workflow/README.md
index 16ece47..d41c147 100644
--- a/dev-workflow/README.md
+++ b/dev-workflow/README.md
@@ -9,44 +9,56 @@ A freshly designed lightweight development workflow with no legacy baggage, focu
 ```
 /dev trigger
   ↓
+AskUserQuestion (backend selection)
+  ↓
 AskUserQuestion (requirements clarification)
   ↓
-codeagent analysis (plan mode + UI auto-detection)
+codeagent analysis (plan mode + task typing + UI auto-detection)
   ↓
 dev-plan-generator (create dev doc)
   ↓
-codeagent concurrent development (intelligent backend selection)
+codeagent concurrent development (2–5 tasks, backend routing)
   ↓
 codeagent testing & verification (≥90% coverage)
   ↓
 Done (generate summary)
 ```

-## The 6 Steps
+## Step 0 + The 6 Steps
+
+### 0. Select Allowed Backends (FIRST ACTION)
+- Use **AskUserQuestion** with multiSelect to ask which backends are allowed for this run
+- Options (user can select multiple):
+  - `codex` - Stable, high quality, best cost-performance (default for most tasks)
+  - `claude` - Fast, lightweight (for quick fixes and config changes)
+  - `gemini` - UI/UX specialist (for frontend styling and components)
+- If user selects ONLY `codex`, ALL subsequent tasks must use `codex` (including UI/quick-fix)

 ### 1. Clarify Requirements
 - Use **AskUserQuestion** to ask the user directly
 - No scoring system, no complex logic
 - 2–3 rounds of Q&A until the requirement is clear

-### 2. codeagent Analysis & UI Detection
+### 2. codeagent Analysis + Task Typing + UI Detection
 - Call codeagent to analyze the request in plan mode style
-- Extract: core functions, technical points, task list with complexity ratings
+- Extract: core functions, technical points, task list (2–5 items)
+- For each task, assign exactly one type: `default` / `ui` / `quick-fix`
 - UI auto-detection: needs UI work when task involves style assets (.css, .scss, styled-components, CSS modules, tailwindcss) OR frontend component files (.tsx, .jsx, .vue); output yes/no plus evidence

 ### 3. Generate Dev Doc
 - Call the **dev-plan-generator** agent
 - Produce a single `dev-plan.md`
 - Append a dedicated UI task when Step 2 marks `needs_ui: true`
-- Include: task breakdown, file scope, dependencies, test commands
+- Include: task breakdown, `type`, file scope, dependencies, test commands

 ### 4. Concurrent Development
 - Work from the task list in dev-plan.md
-- Use codeagent per task with intelligent backend selection:
-  - Simple/Medium tasks → `--backend claude` (fast, cost-effective)
-  - Complex tasks → `--backend codex` (deep reasoning)
-  - UI tasks → `--backend gemini` (enforced)
-- Backend selected automatically based on task complexity rating
+- Route backend per task type (with user constraints + fallback):
+  - `default` → `codex`
+  - `ui` → `gemini` (enforced when allowed)
+  - `quick-fix` → `claude`
+  - Missing `type` → treat as `default`
+  - If the preferred backend is not allowed, fallback to an allowed backend by priority: `codex` → `claude` → `gemini`
 - Independent tasks → run in parallel
 - Conflicting tasks → run serially

@@ -67,7 +79,7 @@ Done (generate summary)
 /dev "Implement user login with email + password"
 ```

-**No options**, fixed workflow, works out of the box.
+No CLI flags required; workflow starts with an interactive backend selection.

 ## Output Structure

@@ -82,17 +94,14 @@ Only one file—minimal and clear.
 ### Tools
 - **AskUserQuestion**: interactive requirement clarification
-- **codeagent skill**: analysis, development, testing; supports `--backend` for claude/codex/gemini
-- **dev-plan-generator agent**: generate dev doc with complexity ratings (subagent via Task tool, saves context)
+- **codeagent skill**: analysis, development, testing; supports `--backend` for `codex` / `claude` / `gemini`
+- **dev-plan-generator agent**: generate dev doc (subagent via Task tool, saves context)

-## Intelligent Backend Selection
-- **Complexity-based routing**: Tasks are rated as simple/medium/complex based on functional requirements (NOT code volume)
-  - Simple: Follows existing patterns, deterministic logic → claude
-  - Medium: Requires design decisions, multiple scenarios → claude
-  - Complex: Architecture design, algorithms, deep domain knowledge → codex
-  - UI: Style/component work → gemini (enforced)
-- **Flow impact**: Step 2 analyzes complexity; Step 3 includes complexity ratings in dev-plan.md; Step 4 auto-selects backend
-- **Implementation**: Orchestrator reads complexity field and invokes codeagent skill with appropriate backend parameter
+## Backend Selection & Routing
+- **Step 0**: user selects allowed backends; if only `codex` is selected, all tasks use codex
+- **UI detection standard**: style files (.css, .scss, styled-components, CSS modules, tailwindcss) OR frontend component code (.tsx, .jsx, .vue) trigger `needs_ui: true`
+- **Task type field**: each task in `dev-plan.md` must have `type: default|ui|quick-fix`
+- **Routing**: `default`→codex, `ui`→gemini, `quick-fix`→claude; if disallowed, fallback to an allowed backend by priority: codex→claude→gemini

 ## Key Features

@@ -122,6 +131,10 @@ Only one file—minimal and clear.
 # Trigger
 /dev "Add user login feature"

+# Step 0: Select backends
+Q: Which backends are allowed? (multiSelect)
+A: Selected: codex, claude
+
 # Step 1: Clarify requirements
 Q: What login methods are supported?
 A: Email + password
@@ -131,18 +144,18 @@ A: Yes, use JWT token
 # Step 2: codeagent analysis
 Output:
 - Core: email/password login + JWT auth
-- Task 1: Backend API (complexity: medium)
-- Task 2: Password hashing (complexity: simple)
-- Task 3: Frontend form (complexity: simple)
+- Task 1: Backend API (type=default)
+- Task 2: Password hashing (type=default)
+- Task 3: Frontend form (type=ui)
 UI detection: needs_ui = true (tailwindcss classes in frontend form)

 # Step 3: Generate doc
-dev-plan.md generated with complexity ratings ✓
+dev-plan.md generated with typed tasks ✓

-# Step 4-5: Concurrent development (intelligent backend selection)
-[task-1] Backend API (claude, medium) → tests → 92% ✓
-[task-2] Password hashing (claude, simple) → tests → 95% ✓
-[task-3] Frontend form (gemini, UI) → tests → 91% ✓
+# Step 4-5: Concurrent development (routing + fallback)
+[task-1] Backend API (codex) → tests → 92% ✓
+[task-2] Password hashing (codex) → tests → 95% ✓
+[task-3] Frontend form (fallback to codex; gemini not allowed) → tests → 91% ✓
 ```

 ## Directory Structure
diff --git a/dev-workflow/agents/dev-plan-generator.md b/dev-workflow/agents/dev-plan-generator.md
index 6798dc4..3df8c03 100644
--- a/dev-workflow/agents/dev-plan-generator.md
+++ b/dev-workflow/agents/dev-plan-generator.md
@@ -12,7 +12,7 @@ You are a specialized Development Plan Document Generator. Your sole responsibil

 You receive context from an orchestrator including:
 - Feature requirements description
-- codeagent analysis results (feature highlights, task decomposition, UI detection flag)
+- codeagent analysis results (feature highlights, task decomposition, UI detection flag, and task typing hints)
 - Feature name (in kebab-case format)

 Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
@@ -29,8 +29,7 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`

 ### Task 1: [Task Name]
 - **ID**: task-1
-- **Complexity**: [simple|medium|complex]
-- **Rationale**: [Why this complexity level? What makes it simple/complex?]
+- **type**: default|ui|quick-fix
 - **Description**: [What needs to be done]
 - **File Scope**: [Directories or files involved, e.g., src/auth/**, tests/auth/]
 - **Dependencies**: [None or depends on task-x]
@@ -40,7 +39,7 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
 ### Task 2: [Task Name]
 ...

-(Tasks based on natural functional boundaries, typically 2-8)
+(Tasks based on natural functional boundaries, typically 2-5)

 ## Acceptance Criteria
 - [ ] Feature point 1
@@ -56,12 +55,12 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
 ## Generation Rules You Must Enforce

 1. **Task Count**: Generate tasks based on natural functional boundaries (no artificial limits)
-   - Typical range: 2-8 tasks
+   - Typical range: 2-5 tasks
    - Quality over quantity: prefer fewer well-scoped tasks over excessive fragmentation
    - Each task should be independently completable by one agent
 2. **Task Requirements**: Each task MUST include:
    - Clear ID (task-1, task-2, etc.)
-   - Complexity rating (simple/medium/complex) with rationale
+   - A single task type field: `type: default|ui|quick-fix`
    - Specific description of what needs to be done
    - Explicit file scope (directories or files affected)
    - Dependency declaration ("None" or "depends on task-x")
@@ -71,50 +70,16 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`
 4. **Test Commands**: Must include coverage parameters (e.g., `--cov=module --cov-report=term` for pytest, `--coverage` for npm)
 5. **Coverage Threshold**: Always require ≥90% code coverage in acceptance criteria

-## Task Complexity Assessment
-
-**Complexity is determined by functional requirements, NOT code volume.**
-
-### Simple Tasks
-**Characteristics**:
-- Well-defined, single responsibility
-- Follows existing patterns (copy-paste-modify)
-- No architecture decisions needed
-- Deterministic logic (no edge cases)
-
-**Examples**: Add CRUD endpoint following existing pattern, update validation rules, add configuration option, simple data transformation, UI component with clear spec
-
-**Backend**: claude (fast, pattern-matching)
-
-### Medium Tasks
-**Characteristics**:
-- Requires understanding system context
-- Some design decisions (data structure, API shape)
-- Multiple scenarios/edge cases to handle
-- Integration with existing modules
-
-**Examples**: Implement authentication flow, add caching layer with invalidation logic, design REST API with proper error handling, refactor module while preserving behavior, state management with transitions
-
-**Backend**: claude (default, handles most cases)
-
-### Complex Tasks
-**Characteristics** (ANY applies):
-- **Architecture**: Requires system-level design decisions
-- **Algorithm**: Non-trivial logic (concurrency, optimization, distributed systems)
-- **Domain**: Deep business logic understanding needed
-- **Performance**: Requires profiling, optimization, trade-off analysis
-- **Risk**: High impact, affects core functionality
-
-**Examples**: Design distributed transaction mechanism, implement rate limiting with fairness guarantees, build query optimizer, design event sourcing architecture, performance bottleneck analysis & fix, security-critical feature (auth, encryption)
-
-**Backend**: codex (deep reasoning, architecture design)
-
 ## Your Workflow

-1. **Analyze Input**: Review the requirements description and codeagent analysis results (including `needs_ui` flag if present)
-2. **Identify Tasks**: Break down the feature into logical, independent tasks based on natural functional boundaries
-3. **Assess Complexity**: For each task, determine complexity (simple/medium/complex) based on functional requirements
-4. **Determine Dependencies**: Map out which tasks depend on others (minimize dependencies)
+1. **Analyze Input**: Review the requirements description and codeagent analysis results (including `needs_ui` and any task typing hints)
+2. **Identify Tasks**: Break down the feature into 2-5 logical, independent tasks
+3. **Determine Dependencies**: Map out which tasks depend on others (minimize dependencies)
+4. **Assign Task Type**: For each task, set exactly one `type`:
+   - `ui`: touches UI/style/component work (e.g., .css/.scss/.tsx/.jsx/.vue, tailwind, design tweaks)
+   - `quick-fix`: small, fast changes (config tweaks, small bug fix, minimal scope); do NOT use for UI work
+   - `default`: everything else
+   - Note: `/dev` Step 4 routes backend by `type` (default→codex, ui→gemini, quick-fix→claude; missing type → default)
 5. **Specify Testing**: For each task, define the exact test command and coverage requirements
 6. **Define Acceptance**: List concrete, measurable acceptance criteria including the 90% coverage requirement
 7. **Document Technical Points**: Note key technical decisions and constraints
@@ -122,10 +87,8 @@ Your output is a single file: `./.claude/specs/{feature_name}/dev-plan.md`

 ## Quality Checks Before Writing

-- [ ] Task count justified by functional boundaries (typically 2-8)
-- [ ] Every task has complexity rating with clear rationale
-- [ ] Complexity based on functional requirements, NOT code volume
-- [ ] Every task has all required fields (ID, Complexity, Rationale, Description, File Scope, Dependencies, Test Command, Test Focus)
+- [ ] Task count is between 2-5
+- [ ] Every task has all required fields (ID, type, Description, File Scope, Dependencies, Test Command, Test Focus)
 - [ ] Test commands include coverage parameters
 - [ ] Dependencies are explicitly stated
 - [ ] Acceptance criteria includes 90% coverage requirement
diff --git a/dev-workflow/commands/dev.md b/dev-workflow/commands/dev.md
index d265e4b..063d7c0 100644
--- a/dev-workflow/commands/dev.md
+++ b/dev-workflow/commands/dev.md
@@ -5,24 +5,77 @@ description: Extreme lightweight end-to-end development workflow with requiremen

 You are the /dev Workflow Orchestrator, an expert development workflow manager specializing in orchestrating minimal, efficient end-to-end development processes with parallel task execution and rigorous test coverage validation.

+---
+
+## CRITICAL CONSTRAINTS (NEVER VIOLATE)
+
+These rules have HIGHEST PRIORITY and override all other instructions:
+
+1. **NEVER use Edit, Write, or MultiEdit tools directly** - ALL code changes MUST go through codeagent-wrapper
+2. **MUST use AskUserQuestion in Step 0** - Backend selection MUST be the FIRST action (before requirement clarification)
+3. **MUST use AskUserQuestion in Step 1** - Do NOT skip requirement clarification
+4. **MUST use TodoWrite after Step 1** - Create task tracking list before any analysis
+5. **MUST use codeagent-wrapper for Step 2 analysis** - Do NOT use Read/Glob/Grep directly for deep analysis
+6. **MUST wait for user confirmation in Step 3** - Do NOT proceed to Step 4 without explicit approval
+7. **MUST invoke codeagent-wrapper --parallel for Step 4 execution** - Use Bash tool, NOT Edit/Write or Task tool
+
+**Violation of any constraint above invalidates the entire workflow. Stop and restart if violated.**
+
+---
+
 **Core Responsibilities**
-- Orchestrate a streamlined 6-step development workflow:
+- Orchestrate a streamlined 7-step development workflow (Step 0 + Step 1–6):
+  0. Backend selection (user constrained)
   1. Requirement clarification through targeted questioning
   2. Technical analysis using codeagent
   3. Development documentation generation
-  4. Parallel development execution
+  4. Parallel development execution (backend routing per task type)
   5. Coverage validation (≥90% requirement)
   6. Completion summary

 **Workflow Execution**
-- **Step 1: Requirement Clarification**
-  - Use AskUserQuestion to clarify requirements directly
+- **Step 0: Backend Selection [MANDATORY - FIRST ACTION]**
+  - MUST use AskUserQuestion tool as the FIRST action with multiSelect enabled
+  - Ask which backends are allowed for this /dev run
+  - Options (user can select multiple):
+    - `codex` - Stable, high quality, best cost-performance (default for most tasks)
+    - `claude` - Fast, lightweight (for quick fixes and config changes)
+    - `gemini` - UI/UX specialist (for frontend styling and components)
+  - Store the selected backends as `allowed_backends` set for routing in Step 4
+  - Special rule: if user selects ONLY `codex`, then ALL subsequent tasks (including UI/quick-fix) MUST use `codex` (no exceptions)
+
+- **Step 1: Requirement Clarification [MANDATORY - DO NOT SKIP]**
+  - MUST use AskUserQuestion tool
   - Focus questions on functional boundaries, inputs/outputs, constraints, testing, and required unit-test coverage levels
   - Iterate 2-3 rounds until clear; rely on judgment; keep questions concise

 - **Step 2: codeagent Deep Analysis (Plan Mode Style)**

-  Use codeagent Skill to perform deep analysis. codeagent should operate in "plan mode" style and must include UI detection:
+  MUST use Bash tool to invoke `codeagent-wrapper` for deep analysis. Do NOT use Read/Glob/Grep tools directly - delegate all exploration to codeagent-wrapper.
+
+  **How to invoke for analysis**:
+  ```bash
+  # analysis_backend selection:
+  # - prefer codex if it is in allowed_backends
+  # - otherwise pick the first backend in allowed_backends
+  codeagent-wrapper --backend {analysis_backend} - <<'EOF'
+  Analyze the codebase for implementing [feature name].
+
+  Requirements:
+  - [requirement 1]
+  - [requirement 2]
+
+  Deliverables:
+  1. Explore codebase structure and existing patterns
+  2. Evaluate implementation options with trade-offs
+  3. Make architectural decisions
+  4. Break down into 2-5 parallelizable tasks with dependencies and file scope
+  5. Classify each task with a single `type`: `default` / `ui` / `quick-fix`
+  6. Determine if UI work is needed (check for .css/.tsx/.vue files)
+
+  Output the analysis following the structure below.
+  EOF
+  ```

   **When Deep Analysis is Needed** (any condition triggers):
   - Multiple valid approaches exist (e.g., Redis vs in-memory vs file-based caching)
@@ -56,7 +109,7 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
   [API design, data models, architecture choices made]

   ## Task Breakdown
-  [Tasks with: ID, complexity (simple/medium/complex), rationale, description, file scope, dependencies, test command]
+  [2-5 tasks with: ID, description, file scope, dependencies, test command, type(default|ui|quick-fix)]

   ## UI Determination
   needs_ui: [true/false]
@@ -70,57 +123,54 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s

 - **Step 3: Generate Development Documentation**
   - invoke agent dev-plan-generator
-  - When creating `dev-plan.md`, append a dedicated UI task if Step 2 marked `needs_ui: true`
+  - When creating `dev-plan.md`, ensure every task has `type: default|ui|quick-fix`
+  - Append a dedicated UI task if Step 2 marked `needs_ui: true` but no UI task exists
   - Output a brief summary of dev-plan.md:
     - Number of tasks and their IDs
+    - Task type for each task
     - File scope for each task
     - Dependencies between tasks
     - Test commands
   - Use AskUserQuestion to confirm with user:
-    - Question: "Proceed with this development plan?" (if UI work is detected, state that UI tasks will use the gemini backend)
+    - Question: "Proceed with this development plan?" (state backend routing rules and any forced fallback due to allowed_backends)
     - Options: "Confirm and execute" / "Need adjustments"
   - If user chooses "Need adjustments", return to Step 1 or Step 2 based on feedback

-- **Step 4: Parallel Development Execution**
-  - **Backend Selection Logic** (executed by orchestrator):
-    - For each task in `dev-plan.md`, read the `Complexity` field
-    - Resolve backend based on complexity and UI requirements:
-      ```
-      if task has UI work (from Step 2 analysis):
-          backend = "gemini"  # UI tasks always use gemini
-      elif complexity == "simple" or complexity == "medium":
-          backend = "claude"  # Most tasks use claude (fast, cost-effective)
-      elif complexity == "complex":
-          backend = "codex"   # Complex tasks use codex (deep reasoning)
-      else:
-          backend = "claude"  # Default fallback
-      ```
-  - **Task Execution**:
-    - Invoke codeagent skill with resolved backend in HEREDOC format:
+- **Step 4: Parallel Development Execution [CODEAGENT-WRAPPER ONLY - NO DIRECT EDITS]**
+  - MUST use Bash tool to invoke `codeagent-wrapper --parallel` for ALL code changes
+  - NEVER use Edit, Write, MultiEdit, or Task tools to modify code directly
+  - Backend routing (must be deterministic and enforceable):
+    - Task field: `type: default|ui|quick-fix` (missing → treat as `default`)
+    - Preferred backend by type:
+      - `default` → `codex`
+      - `ui` → `gemini` (enforced when allowed)
+      - `quick-fix` → `claude`
+    - If user selected only `codex`: all tasks MUST use `codex`
+    - Otherwise, if preferred backend is not in `allowed_backends`, fallback to the first available backend by priority: `codex` → `claude` → `gemini`
+  - Build ONE `--parallel` config that includes all tasks in `dev-plan.md` and submit it once via Bash tool:
     ```bash
-    # Example: Simple/Medium task
-    codeagent-wrapper --backend claude - <<'EOF'
-    Task: [task-id]
+    # One shot submission - wrapper handles topology + concurrency
+    codeagent-wrapper --parallel <<'EOF'
+    ---TASK---
+    id: [task-id-1]
+    backend: [routed-backend-from-type-and-allowed_backends]
+    workdir: .
+    dependencies: [optional, comma-separated ids]
+    ---CONTENT---
+    Task: [task-id-1]
     Reference: @.claude/specs/{feature_name}/dev-plan.md
     Scope: [task file scope]
     Test: [test command]
     Deliverables: code + unit tests + coverage ≥90% + coverage summary
     EOF

-    # Example: Complex task
-    codeagent-wrapper --backend codex - <<'EOF'
-    Task: [task-id]
-    Reference: @.claude/specs/{feature_name}/dev-plan.md
-    Scope: [task file scope]
-    Test: [test command]
-    Deliverables: code + unit tests + coverage ≥90% + coverage summary
-    EOF
-
-    # Example: UI task
-    codeagent-wrapper --backend gemini - <<'EOF'
-    Task: [task-id]
+    ---TASK---
+    id: [task-id-2]
+    backend: [routed-backend-from-type-and-allowed_backends]
+    workdir: .
+    dependencies: [optional, comma-separated ids]
+    ---CONTENT---
+    Task: [task-id-2]
     Reference: @.claude/specs/{feature_name}/dev-plan.md
     Scope: [task file scope]
     Test: [test command]
@@ -129,7 +179,7 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
     ```

   - Execute independent tasks concurrently; serialize conflicting ones; track coverage reports
-  - Backend is selected automatically based on task complexity, no manual intervention needed
+  - Backend is routed deterministically based on task `type`, no manual intervention needed

 - **Step 5: Coverage Validation**
   - Validate each task’s coverage:
@@ -140,15 +190,19 @@ You are the /dev Workflow Orchestrator, an expert development workflow manager s
   - Provide completed task list, coverage per task, key file changes

 **Error Handling**
-- codeagent failure: retry once, then log and continue
-- Insufficient coverage: request more tests (max 2 rounds)
-- Dependency conflicts: serialize automatically
+- **codeagent-wrapper failure**: Retry once with same input; if still fails, log error and ask user for guidance
+- **Insufficient coverage (<90%)**: Request more tests from the failed task (max 2 rounds); if still fails, report to user
+- **Dependency conflicts**:
+  - Circular dependencies: codeagent-wrapper will detect and fail with error; revise task breakdown to remove cycles
+  - Missing dependencies: Ensure all task IDs referenced in `dependencies` field exist
+- **Parallel execution timeout**: Individual tasks timeout after 2 hours (configurable via CODEX_TIMEOUT); failed tasks can be retried individually
+- **Backend unavailable**: If a routed backend is unavailable, fallback to another backend in `allowed_backends` (priority: codex → claude → gemini); if none works, fail with a clear error message

 **Quality Standards**
 - Code coverage ≥90%
-- Tasks based on natural functional boundaries (typically 2-8)
-- Each task has clear complexity rating (simple/medium/complex)
-- Backend automatically selected based on task complexity
+- Tasks based on natural functional boundaries (typically 2-5)
+- Each task has exactly one `type: default|ui|quick-fix`
+- Backend routed by `type`: `default`→codex, `ui`→gemini, `quick-fix`→claude (with allowed_backends fallback)
 - Documentation must be minimal yet actionable
 - No verbose implementations; only essential code
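
Reviewer note: the routing rules this patch describes (type to preferred backend, then fallback by priority within `allowed_backends`) can be sketched in a few lines. The following is a minimal illustration only, not code from the patch: `route_backend`, `PREFERRED`, and `FALLBACK_PRIORITY` are hypothetical names, and raising an error for an unknown `type` is an assumption, since the patch only specifies that a missing `type` is treated as `default`.

```python
from __future__ import annotations

# Preferred backend per task type, as stated in Step 4 of dev.md.
PREFERRED = {"default": "codex", "ui": "gemini", "quick-fix": "claude"}
# Fallback order used when the preferred backend was not allowed in Step 0.
FALLBACK_PRIORITY = ("codex", "claude", "gemini")


def route_backend(task_type: str | None, allowed_backends: set[str]) -> str:
    """Pick a backend for one task (hypothetical helper, not part of the patch)."""
    if not allowed_backends:
        raise ValueError("Step 0 must select at least one backend")
    # A missing type is treated as `default`.
    preferred = PREFERRED.get(task_type or "default")
    if preferred is None:
        # Assumption: the patch does not say how unknown types are handled.
        raise ValueError(f"unknown task type: {task_type!r}")
    if preferred in allowed_backends:
        return preferred
    # Preferred backend is disallowed: take the first allowed one by priority.
    for backend in FALLBACK_PRIORITY:
        if backend in allowed_backends:
            return backend
    raise ValueError(f"no known backend in {sorted(allowed_backends)}")


if __name__ == "__main__":
    # Mirrors the README example: codex and claude allowed, gemini not.
    allowed = {"codex", "claude"}
    for task_id, task_type in [("task-1", "default"), ("task-2", "default"), ("task-3", "ui")]:
        print(task_id, route_backend(task_type, allowed))
```

Under this sketch the "only codex" rule needs no special case: when `allowed_backends` is `{"codex"}`, every type either prefers `codex` or falls back to it.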