Add tests for CLI command generation and model alias resolution

- Implement `test-cli-command-gen.js` to verify the logic of `buildCliCommand` function. - Create `test-e2e-model-alias.js` for end-to-end testing of model alias resolution in `ccw cli`. - Add `test-model-alias.js` to test model alias resolution for different models. - Introduce `test-model-alias.txt` for prompt testing with model alias. - Develop `test-update-claude-command.js` to test command generation for `update_module_claude`. - Create a test file in `test-update-claude/src` for future tests.
2026-02-10 02:24:35 +08:00 · 2026-02-05 20:17:10 +08:00
parent 6576886457
commit 01459a34a5
193 changed files with 4796 additions and 9405 deletions
--- a/.codex/agents/action-planning-agent.md
+++ b/.codex/agents/action-planning-agent.md
@@ -55,6 +55,17 @@ color: yellow
 **Step-by-step execution**:

 ```
+0. Load planning notes → Extract phase-level constraints (NEW)
+   Commands: Read('.workflow/active/{session-id}/planning-notes.md')
+   Output: Consolidated constraints from all workflow phases
+   Structure:
+     - User Intent: Original GOAL, KEY_CONSTRAINTS
+     - Context Findings: Critical files, architecture notes, constraints
+     - Conflict Decisions: Resolved conflicts, modified artifacts
+     - Consolidated Constraints: Numbered list of ALL constraints (Phase 1-3)
+
+   USAGE: This is the PRIMARY source of constraints. All task generation MUST respect these constraints.
+
 1. Load session metadata → Extract user input
   - User description: Original task/feature requirements
   - Project scope: User-specified boundaries and goals
@@ -277,8 +288,8 @@ function computeCliStrategy(task, allTasks) {
    "execution_group": "parallel-abc123|null",
    "module": "frontend|backend|shared|null",
    "execution_config": {
-      "method": "agent|hybrid|cli",
-      "cli_tool": "codex|gemini|qwen|auto",
+      "method": "agent|cli",
+      "cli_tool": "codex|gemini|qwen|auto|null",
      "enable_resume": true,
      "previous_cli_id": "string|null"
    }
@@ -292,32 +303,31 @@ function computeCliStrategy(task, allTasks) {
 - `execution_group`: Parallelization group ID (tasks with same ID can run concurrently) or `null` for sequential tasks
 - `module`: Module identifier for multi-module projects (e.g., `frontend`, `backend`, `shared`) or `null` for single-module
 - `execution_config`: CLI execution settings (MUST align with userConfig from task-generate-agent)
-  - `method`: Execution method - `agent` (direct), `hybrid` (agent + CLI), `cli` (CLI only)
+  - `method`: Execution method - `agent` (direct) or `cli` (CLI only). Only two values in final task JSON.
  - `cli_tool`: Preferred CLI tool - `codex`, `gemini`, `qwen`, `auto`, or `null` (for agent-only)
  - `enable_resume`: Whether to use `--resume` for CLI continuity (default: true)
  - `previous_cli_id`: Previous task's CLI execution ID for resume (populated at runtime)

 **execution_config Alignment Rules** (MANDATORY):
 ```
-userConfig.executionMethod → meta.execution_config + implementation_approach
+userConfig.executionMethod → meta.execution_config

 "agent" →
  meta.execution_config = { method: "agent", cli_tool: null, enable_resume: false }
-  implementation_approach steps: NO command field (agent direct execution)
-
-"hybrid" →
-  meta.execution_config = { method: "hybrid", cli_tool: userConfig.preferredCliTool }
-  implementation_approach steps: command field ONLY on complex steps
+  Execution: Agent executes pre_analysis, then directly implements implementation_approach

 "cli" →
-  meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool }
-  implementation_approach steps: command field on ALL steps
+  meta.execution_config = { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
+  Execution: Agent executes pre_analysis, then hands off full context to CLI via buildCliHandoffPrompt()
+
+"hybrid" →
+  Per-task decision: set method to "agent" OR "cli" per task based on complexity
+  - Simple tasks (≤3 files, straightforward logic) → { method: "agent", cli_tool: null, enable_resume: false }
+  - Complex tasks (>3 files, complex logic, refactoring) → { method: "cli", cli_tool: userConfig.preferredCliTool, enable_resume: true }
+  Final task JSON always has method = "agent" or "cli", never "hybrid"
 ```

-**Consistency Check**: `meta.execution_config.method` MUST match presence of `command` fields:
- `method: "agent"` → 0 steps have command field
- `method: "hybrid"` → some steps have command field
- `method: "cli"` → all steps have command field
+**IMPORTANT**: implementation_approach steps do NOT contain `command` fields. Execution routing is controlled by task-level `meta.execution_config.method` only.

 **Test Task Extensions** (for type="test-gen" or type="test-fix"):

@@ -336,7 +346,7 @@ userConfig.executionMethod → meta.execution_config + implementation_approach
 - `test_framework`: Existing test framework from project (required for test tasks)
 - `coverage_target`: Target code coverage percentage (optional)

-**Note**: CLI tool usage for test-fix tasks is now controlled via `flow_control.implementation_approach` steps with `command` fields, not via `meta.use_codex`.
+**Note**: CLI tool usage for test-fix tasks is now controlled via task-level `meta.execution_config.method`, not via `meta.use_codex`.

 #### Context Object

@@ -547,59 +557,45 @@ The examples above demonstrate **patterns**, not fixed requirements. Agent MUST:

 ##### Implementation Approach

-**Execution Modes**:
+**Execution Control**:

-The `implementation_approach` supports **two execution modes** based on the presence of the `command` field:
+The `implementation_approach` defines sequential implementation steps. Execution routing is controlled by **task-level `meta.execution_config.method`**, NOT by step-level `command` fields.

-1. **Default Mode (Agent Execution)** - `command` field **omitted**:
+**Two Execution Modes**:
+
+1. **Agent Mode** (`meta.execution_config.method = "agent"`):
   - Agent interprets `modification_points` and `logic_flow` autonomously
   - Direct agent execution with full context awareness
   - No external tool overhead
   - **Use for**: Standard implementation tasks where agent capability is sufficient
-   - **Required fields**: `step`, `title`, `description`, `modification_points`, `logic_flow`, `depends_on`, `output`

-2. **CLI Mode (Command Execution)** - `command` field **included**:
-   - Specified command executes the step directly
-   - Leverages specialized CLI tools (codex/gemini/qwen) for complex reasoning
-   - **Use for**: Large-scale features, complex refactoring, or when user explicitly requests CLI tool usage
-   - **Required fields**: Same as default mode **PLUS** `command`, `resume_from` (optional)
-   - **Command patterns** (with resume support):
-     - `ccw cli -p '[prompt]' --tool codex --mode write --cd [path]`
-     - `ccw cli -p '[prompt]' --resume ${previousCliId} --tool codex --mode write` (resume from previous)
-     - `ccw cli -p '[prompt]' --tool gemini --mode write --cd [path]` (write mode)
-   - **Resume mechanism**: When step depends on previous CLI execution, include `--resume` with previous execution ID
+2. **CLI Mode** (`meta.execution_config.method = "cli"`):
+   - Agent executes `pre_analysis`, then hands off full context to CLI via `buildCliHandoffPrompt()`
+   - CLI tool specified in `meta.execution_config.cli_tool` (codex/gemini/qwen)
+   - Leverages specialized CLI tools for complex reasoning
+   - **Use for**: Large-scale features, complex refactoring, or when userConfig.executionMethod = "cli"

-**Semantic CLI Tool Selection**:
+**Step Schema** (same for both modes):
+```json
+{
+  "step": 1,
+  "title": "Step title",
+  "description": "What to implement (may use [variable] placeholders from pre_analysis)",
+  "modification_points": ["Quantified changes: [list with counts]"],
+  "logic_flow": ["Implementation sequence"],
+  "depends_on": [0],
+  "output": "variable_name"
+}
+```

-Agent determines CLI tool usage per-step based on user semantics and task nature.
+**Required fields**: `step`, `title`, `description`, `modification_points`, `logic_flow`, `depends_on`, `output`

-**Source**: Scan `metadata.task_description` from context-package.json for CLI tool preferences.
+**IMPORTANT**: Do NOT add `command` field to implementation_approach steps. Execution routing is determined by task-level `meta.execution_config.method` only.

-**User Semantic Triggers** (patterns to detect in task_description):
- "use Codex/codex" → Add `command` field with Codex CLI
- "use Gemini/gemini" → Add `command` field with Gemini CLI
- "use Qwen/qwen" → Add `command` field with Qwen CLI
- "CLI execution" / "automated" → Infer appropriate CLI tool
-
-**Task-Based Selection** (when no explicit user preference):
- **Implementation/coding**: Codex preferred for autonomous development
- **Analysis/exploration**: Gemini preferred for large context analysis
- **Documentation**: Gemini/Qwen with write mode (`--mode write`)
- **Testing**: Depends on complexity - simple=agent, complex=Codex
-
-**Default Behavior**: Agent always executes the workflow. CLI commands are embedded in `implementation_approach` steps:
- Agent orchestrates task execution
- When step has `command` field, agent executes it via CCW CLI
- When step has no `command` field, agent implements directly
- This maintains agent control while leveraging CLI tool power
-
-**Key Principle**: The `command` field is **optional**. Agent decides based on user semantics and task complexity.
-
-**Examples**:
+**Example**:

 ```json
 [
-  // === DEFAULT MODE: Agent Execution (no command field) ===
  {
    "step": 1,
    "title": "Load and analyze role analyses",
@@ -636,33 +632,6 @@ Agent determines CLI tool usage per-step based on user semantics and task nature
    ],
    "depends_on": [1],
    "output": "implementation"
-  },
-
-  // === CLI MODE: Command Execution (optional command field) ===
-  {
-    "step": 3,
-    "title": "Execute implementation using CLI tool",
-    "description": "Use Codex/Gemini for complex autonomous execution",
-    "command": "ccw cli -p '[prompt]' --tool codex --mode write --cd [path]",
-    "modification_points": ["[Same as default mode]"],
-    "logic_flow": ["[Same as default mode]"],
-    "depends_on": [1, 2],
-    "output": "cli_implementation",
-    "cli_output_id": "step3_cli_id"  // Store execution ID for resume
-  },
-
-  // === CLI MODE with Resume: Continue from previous CLI execution ===
-  {
-    "step": 4,
-    "title": "Continue implementation with context",
-    "description": "Resume from previous step with accumulated context",
-    "command": "ccw cli -p '[continuation prompt]' --resume ${step3_cli_id} --tool codex --mode write",
-    "resume_from": "step3_cli_id",  // Reference previous step's CLI ID
-    "modification_points": ["[Continue from step 3]"],
-    "logic_flow": ["[Build on previous output]"],
-    "depends_on": [3],
-    "output": "continued_implementation",
-    "cli_output_id": "step4_cli_id"
  }
 ]
 ```
@@ -785,13 +754,13 @@ Generate at `.workflow/active/{session_id}/TODO_LIST.md`:
 Use `analysis_results.complexity` or task count to determine structure:

 **Single Module Mode**:
- **Simple Tasks** (≤5 tasks): Flat structure
- **Medium Tasks** (6-12 tasks): Flat structure
- **Complex Tasks** (>12 tasks): Re-scope required (maximum 12 tasks hard limit)
+- **Simple Tasks** (≤4 tasks): Flat structure
+- **Medium Tasks** (5-8 tasks): Flat structure
+- **Complex Tasks** (>8 tasks): Re-scope required (maximum 8 tasks hard limit)

 **Multi-Module Mode** (N+1 parallel planning):
- **Per-module limit**: ≤9 tasks per module
- **Total limit**: Sum of all module tasks ≤27 (3 modules × 9 tasks)
+- **Per-module limit**: ≤6 tasks per module
+- **Total limit**: No total limit (each module independently capped at 6 tasks)
 - **Task ID format**: `IMPL-{prefix}{seq}` (e.g., IMPL-A1, IMPL-B1)
 - **Structure**: Hierarchical by module in IMPL_PLAN.md and TODO_LIST.md

@@ -852,9 +821,35 @@ Use `analysis_results.complexity` or task count to determine structure:
 - Proper linking between documents
 - Consistent navigation and references

-### 3.3 Guidelines Checklist
+### 3.3 N+1 Context Recording
+
+**Purpose**: Record decisions and deferred items for N+1 planning continuity.
+
+**When**: After task generation, update `## N+1 Context` in planning-notes.md.
+
+**What to Record**:
+- **Decisions**: Architecture/technology choices with rationale (mark `Revisit?` if may change)
+- **Deferred**: Items explicitly moved to N+1 with reason
+
+**Example**:
+```markdown
+## N+1 Context
+### Decisions
+| Decision | Rationale | Revisit? |
+|----------|-----------|----------|
+| JWT over Session | Stateless scaling | No |
+| CROSS::B::api → IMPL-B1 | B1 defines base | Yes |
+
+### Deferred
+- [ ] Rate limiting - Requires Redis (N+1)
+- [ ] API versioning - Low priority
+```
+
+### 3.4 Guidelines Checklist

 **ALWAYS:**
+- **Load planning-notes.md FIRST**: Read planning-notes.md before context-package.json. Use its Consolidated Constraints as primary constraint source for all task generation
+- **Record N+1 Context**: Update `## N+1 Context` section with key decisions and deferred items
 - **Search Tool Priority**: ACE (`mcp__ace-tool__search_context`) → CCW (`mcp__ccw-tools__smart_search`) / Built-in (`Grep`, `Glob`, `Read`)
 - Apply Quantification Requirements to all requirements, acceptance criteria, and modification points
 - Load IMPL_PLAN template: `Read(~/.claude/workflows/cli-templates/prompts/workflow/impl-plan-template.txt)` before generating IMPL_PLAN.md
@@ -865,7 +860,7 @@ Use `analysis_results.complexity` or task count to determine structure:
 - **Compute CLI execution strategy**: Based on `depends_on`, set `cli_execution.strategy` (new/resume/fork/merge_fork)
 - Map artifacts: Use artifacts_inventory to populate task.context.artifacts array
 - Add MCP integration: Include MCP tool steps in flow_control.pre_analysis when capabilities available
- Validate task count: Maximum 12 tasks hard limit, request re-scope if exceeded
+- Validate task count: Maximum 8 tasks (single module) or 6 tasks per module (multi-module), request re-scope if exceeded
 - Use session paths: Construct all paths using provided session_id
 - Link documents properly: Use correct linking format (📋 for JSON, ✅ for summaries)
 - Run validation checklist: Verify all quantification requirements before finalizing task JSONs
@@ -879,7 +874,7 @@ Use `analysis_results.complexity` or task count to determine structure:
 - Load files directly (use provided context package instead)
 - Assume default locations (always use session_id in paths)
 - Create circular dependencies in task.depends_on
- Exceed 12 tasks without re-scoping
+- Exceed 8 tasks (single module) or 6 tasks per module (multi-module) without re-scoping
 - Skip artifact integration when artifacts_inventory is provided
 - Ignore MCP capabilities when available
 - Use fixed pre-analysis steps without task-specific adaptation