feat: add APIEmbedder for remote embedding with multi-endpoint support

- Introduced APIEmbedder class to handle embeddings via a remote HTTP API. - Implemented token packing to optimize batch sizes based on token limits. - Added support for multiple API endpoints with round-robin dispatching. - Included retry logic for API calls with exponential backoff on failures. - Enhanced indexing pipeline with file exclusion checks and smart chunking strategies. - Updated tests to cover new APIEmbedder functionality and ensure robustness.
2026-03-18 18:48:48 +08:00 · 2026-03-17 17:17:24 +08:00
parent 34749d2fad
commit f37189dc64
18 changed files with 1633 additions and 476 deletions
--- a/.claude/skills/workflow-lite-execute/SKILL.md
+++ b/.claude/skills/workflow-lite-execute/SKILL.md
@@ -412,17 +412,44 @@ if (hasUnresolvedIssues(reviewResult)) {

 **Artifact Substitution**: Replace `@{plan.json}` → `@${executionContext.session.artifacts.plan}`, `[@{exploration.json}]` → exploration files from artifacts (if exists).

-### Step 5: Auto-Sync Project State
+### Step 5: Chain to Test Review & Post-Completion

-**Trigger**: After all executions complete (regardless of code review)
+> **Note**: Spec sync (session:sync) is handled by lite-test-review's TR-Phase 5, not here. This avoids duplicate sync and ensures test fix changes are also captured.

-**Operation**: `/workflow:session:sync -y "{summary}"`
+**Map review tool**: Convert lite-execute's `codeReviewTool` to test-review tool name.

-Summary priority: `originalUserInput` → `planObject.summary` → git log auto-infer.
+```javascript
+function mapReviewTool(codeReviewTool) {
+  if (!codeReviewTool || codeReviewTool === 'Skip') return 'agent'
+  if (/gemini/i.test(codeReviewTool)) return 'gemini'
+  if (/codex/i.test(codeReviewTool)) return 'codex'
+  return 'agent'
+}
+```

-### Step 6: Post-Completion Expansion
+**Build testReviewContext and handoff**:

-Ask user whether to expand into issues (test/enhance/refactor/doc). Selected items call `/issue:new "{summary} - {dimension}"`.
+```javascript
+testReviewContext = {
+  planObject: planObject,
+  taskFiles: executionContext?.taskFiles
+    || getTasks(planObject).map(t => ({ id: t.id, path: `${executionContext?.session?.folder}/.task/${t.id}.json` })),
+  reviewTool: mapReviewTool(executionContext?.codeReviewTool),
+  executionResults: previousExecutionResults,
+  originalUserInput: originalUserInput,
+  session: executionContext?.session || {
+    id: 'standalone',
+    folder: executionContext?.session?.folder || '.',
+    artifacts: { plan: null, task_dir: null }
+  }
+}
+
+// Chain to lite-test-review (Mode 1: In-Memory)
+Skill("lite-test-review")
+// testReviewContext passed as global variable
+```
+
+**After test-review returns**: Ask user whether to expand into issues (enhance/refactor/doc). Selected items call `/issue:new "{summary} - {dimension}"`.

 ## Error Handling

--- a/.claude/skills/workflow-lite-plan/SKILL.md
+++ b/.claude/skills/workflow-lite-plan/SKILL.md
@@ -554,12 +554,25 @@ Skill("lite-execute")
 ├── explorations-manifest.json        # Exploration index
 ├── planning-context.md               # Evidence paths + understanding
 ├── plan.json                         # Plan overview (task_ids[])
+├── test-checklist.json               # Generated by lite-test-review
+├── test-review.md                    # Generated by lite-test-review
 └── .task/
    ├── TASK-001.json
    ├── TASK-002.json
    └── ...
 ```

+## Chain: lite-plan → lite-execute → lite-test-review
+
+```
+lite-plan (LP-Phase 1-5)
+  └─ Skill("lite-execute")     ← executionContext (global)
+       ├─ Step 1-4: Execute + Review
+       └─ Step 5: Skill("lite-test-review")  ← testReviewContext (global)
+            ├─ TR-Phase 1-4: Test + Fix
+            └─ TR-Phase 5: Report + Sync specs
+```
+
 ## 12. Error Handling

 | Error | Resolution |
--- a/.claude/skills/workflow-lite-test-review/SKILL.md
+++ b/.claude/skills/workflow-lite-test-review/SKILL.md
@@ -0,0 +1,442 @@
+---
+name: workflow-lite-test-review
+description: Post-execution test review and fix - chain from lite-execute or standalone. Reviews implementation against plan, runs tests, auto-fixes failures.
+allowed-tools: Skill, Agent, AskUserQuestion, TodoWrite, Read, Write, Edit, Bash, Glob, Grep
+---
+
+# Workflow-Lite-Test-Review
+
+Test review and fix engine for lite-execute chain or standalone invocation.
+
+---
+
+## Usage
+
+```
+<session-path|--last>      Session path or auto-detect last session (required for standalone)
+```
+
+| Flag | Description |
+|------|-------------|
+| `--in-memory` | Mode 1: Chain from lite-execute via `testReviewContext` global variable |
+| `--skip-fix` | Review only, do not auto-fix failures |
+
+## Input Modes
+
+### Mode 1: In-Memory Chain (from lite-execute)
+
+**Trigger**: `--in-memory` flag or `testReviewContext` global variable available
+
+**Input Source**: `testReviewContext` global variable set by lite-execute Step 6
+
+**Behavior**: Skip session discovery (already resolved), inherit review tool from execution chain, proceed directly to TR-Phase 1.
+
+> **Note**: lite-execute Step 6 is the chain gate. Mode 1 invocation means execution is complete — proceed with test review.
+
+### Mode 2: Standalone
+
+**Trigger**: User calls with session path or `--last`
+
+**Behavior**: Discover session → load plan + tasks → detect test tool → proceed to TR-Phase 1.
+
+```javascript
+// Session discovery
+let sessionPath, plan, taskFiles, reviewTool
+
+if (testReviewContext) {
+  // Mode 1: from lite-execute chain
+  sessionPath = testReviewContext.session.folder
+  plan = testReviewContext.planObject
+  taskFiles = testReviewContext.taskFiles.map(tf => JSON.parse(Read(tf.path)))
+  reviewTool = testReviewContext.reviewTool || 'agent'
+} else {
+  // Mode 2: standalone
+  const args = $ARGUMENTS
+  if (args.includes('--last') || !args.trim() || args.trim() === '--skip-fix') {
+    const sessions = Glob('.workflow/.lite-plan/*/plan.json')
+    if (sessions.length === 0) {
+      console.error('No lite-plan sessions found.')
+      return
+    }
+    sessionPath = sessions[sessions.length - 1].replace(/[/\\]plan\.json$/, '')
+  } else {
+    sessionPath = args.replace(/--skip-fix|--last/g, '').trim()
+  }
+  plan = JSON.parse(Read(`${sessionPath}/plan.json`))
+  taskFiles = plan.task_ids.map(id => JSON.parse(Read(`${sessionPath}/.task/${id}.json`)))
+  reviewTool = 'agent'  // default for standalone
+}
+
+const skipFix = $ARGUMENTS?.includes('--skip-fix') || false
+```
+
+## Phase Summary
+
+| Phase | Core Action | Output |
+|-------|-------------|--------|
+| TR-Phase 1 | Detect test framework + gather changes | testConfig |
+| TR-Phase 2 | Review implementation against convergence criteria | reviewResults[] |
+| TR-Phase 3 | Run tests + generate checklist | test-checklist.json |
+| TR-Phase 4 | Auto-fix failures (iterative, max 3 rounds) | Fixed code + updated checklist |
+| TR-Phase 5 | Output report + chain to session:sync | test-review.md |
+
+## TR-Phase 0: Initialize
+
+```javascript
+const sessionId = sessionPath.split('/').pop()
+
+TodoWrite({ todos: [
+  { content: "TR-Phase 1: Detect & Gather", status: "in_progress", activeForm: "Detecting test framework" },
+  { content: "TR-Phase 2: Review Convergence", status: "pending" },
+  { content: "TR-Phase 3: Run Tests", status: "pending" },
+  { content: "TR-Phase 4: Auto-Fix", status: "pending" },
+  { content: "TR-Phase 5: Report & Sync", status: "pending" }
+]})
+```
+
+## TR-Phase 1: Detect Test Framework & Gather Changes
+
+```javascript
+// Detect test framework
+const hasPackageJson = Glob('package.json').length > 0
+const hasPyproject = Glob('pyproject.toml').length > 0
+const hasCargo = Glob('Cargo.toml').length > 0
+const hasGoMod = Glob('go.mod').length > 0
+
+let testConfig = { command: null, framework: null, type: null }
+
+if (hasPackageJson) {
+  const pkg = JSON.parse(Read('package.json'))
+  const scripts = pkg.scripts || {}
+  if (scripts.test) { testConfig = { command: 'npm test', framework: 'jest/vitest', type: 'node' } }
+  else if (scripts['test:unit']) { testConfig = { command: 'npm run test:unit', framework: 'jest/vitest', type: 'node' } }
+} else if (hasPyproject) {
+  testConfig = { command: 'python -m pytest -v --tb=short', framework: 'pytest', type: 'python' }
+} else if (hasCargo) {
+  testConfig = { command: 'cargo test', framework: 'cargo-test', type: 'rust' }
+} else if (hasGoMod) {
+  testConfig = { command: 'go test ./...', framework: 'go-test', type: 'go' }
+}
+
+// Gather git changes
+const changedFiles = Bash('git diff --name-only HEAD~5..HEAD 2>/dev/null || git diff --name-only HEAD')
+  .split('\n').filter(Boolean)
+const gitDiffStat = Bash('git diff --stat HEAD~5..HEAD 2>/dev/null || git diff --stat HEAD')
+
+console.log(`Test Framework: ${testConfig.framework || 'unknown'} | Command: ${testConfig.command || 'none'}`)
+console.log(`Changed Files: ${changedFiles.length}`)
+```
+
+// TodoWrite: Phase 1 → completed, Phase 2 → in_progress
+
+## TR-Phase 2: Review Implementation Against Plan
+
+For each task, verify convergence criteria using agent or CLI review tool.
+
+**Agent Review** (reviewTool === 'agent', default):
+
+```javascript
+const reviewResults = []
+
+for (const task of taskFiles) {
+  const criteria = task.convergence?.criteria || []
+  const testReqs = task.test || {}
+
+  // Find actual changed files matching task scope
+  const taskTargetFiles = (task.files || [])
+    .map(f => f.path)
+    .filter(p => changedFiles.some(c => c.includes(p) || p.includes(c)))
+
+  // Read implementation to verify criteria
+  const fileContents = taskTargetFiles.map(p => {
+    try { return { path: p, content: Read(p) } }
+    catch { return { path: p, content: null } }
+  }).filter(f => f.content)
+
+  const review = {
+    taskId: task.id,
+    title: task.title,
+    criteria_met: [],
+    criteria_unmet: [],
+    test_gaps: [],
+    files_reviewed: taskTargetFiles
+  }
+
+  // Agent evaluates each criterion against file contents
+  for (const criterion of criteria) {
+    // Check: does implementation satisfy this criterion?
+    // Analyze file contents, look for expected patterns/functions/logic
+    const met = /* agent evaluation based on fileContents */ true_or_false
+    if (met) review.criteria_met.push(criterion)
+    else review.criteria_unmet.push(criterion)
+  }
+
+  // Check test coverage gaps
+  const hasTestFiles = changedFiles.some(f =>
+    /test[_\-.]|spec[_\-.]|\/__tests__\/|\/tests\//.test(f)
+  )
+  if (testReqs.unit?.length > 0 && !hasTestFiles) {
+    testReqs.unit.forEach(u => review.test_gaps.push({ type: 'unit', desc: u }))
+  }
+  if (testReqs.integration?.length > 0) {
+    testReqs.integration.forEach(i => review.test_gaps.push({ type: 'integration', desc: i }))
+  }
+
+  reviewResults.push(review)
+}
+```
+
+**CLI Review** (reviewTool === 'gemini' or 'codex'):
+
+```javascript
+if (reviewTool !== 'agent') {
+  const reviewId = `${sessionId}-tr-review`
+  Bash(`ccw cli -p "PURPOSE: Post-execution test review — verify convergence criteria met and identify test gaps
+TASK: • Read plan.json and .task/*.json convergence criteria • For each criterion, check implementation in changed files • Identify missing unit/integration tests • List unmet criteria with file:line evidence
+MODE: analysis
+CONTEXT: @${sessionPath}/plan.json @${sessionPath}/.task/*.json @**/* | Memory: lite-execute completed, reviewing convergence
+EXPECTED: Per-task verdict table (PASS/PARTIAL/FAIL) + unmet criteria list + test gap list
+CONSTRAINTS: Read-only | Focus on convergence verification" --tool ${reviewTool} --mode analysis --id ${reviewId}`, { run_in_background: true })
+  // STOP - wait for hook callback, then parse CLI output into reviewResults format
+}
+```
+
+// TodoWrite: Phase 2 → completed, Phase 3 → in_progress
+
+## TR-Phase 3: Run Tests & Generate Checklist
+
+```javascript
+// Build checklist structure
+const testChecklist = {
+  session: sessionId,
+  plan_summary: plan.summary,
+  generated_at: new Date().toISOString(),
+  test_config: testConfig,
+  tasks: reviewResults.map(review => {
+    const task = taskFiles.find(t => t.id === review.taskId)
+    const testReqs = task.test || {}
+    return {
+      task_id: review.taskId,
+      title: review.title,
+      status: review.criteria_unmet.length === 0 ? 'PASS'
+        : review.criteria_met.length > 0 ? 'PARTIAL' : 'FAIL',
+      convergence: { met: review.criteria_met, unmet: review.criteria_unmet },
+      test_items: [
+        ...(testReqs.unit || []).map(u => ({ type: 'unit', desc: u, status: 'pending' })),
+        ...(testReqs.integration || []).map(i => ({ type: 'integration', desc: i, status: 'pending' })),
+        ...(testReqs.success_metrics || []).map(m => ({ type: 'metric', desc: m, status: 'pending' })),
+        ...review.test_gaps.map(g => ({ type: g.type, desc: g.desc, status: 'missing' }))
+      ]
+    }
+  }),
+  execution: null
+}
+
+// Run tests if framework detected
+if (testConfig.command) {
+  console.log(`Running: ${testConfig.command}`)
+  const testResult = Bash(testConfig.command, { timeout: 300000 })
+
+  const passed = /(\d+) passed/.test(testResult) || /PASSED/.test(testResult) || /ok \d+/.test(testResult)
+  const failMatch = testResult.match(/(\d+) failed/)
+  const hasFail = failMatch || /FAILED/.test(testResult) || /FAIL/.test(testResult)
+
+  testChecklist.execution = {
+    command: testConfig.command,
+    timestamp: new Date().toISOString(),
+    raw_output: testResult.slice(-3000),  // keep tail for error context
+    overall: hasFail ? 'FAIL' : (passed ? 'PASS' : 'UNKNOWN'),
+    fail_count: failMatch ? parseInt(failMatch[1]) : (hasFail ? -1 : 0)
+  }
+
+  console.log(`Result: ${testChecklist.execution.overall}`)
+} else {
+  console.log('No test command detected. Skipping test execution.')
+}
+
+Write(`${sessionPath}/test-checklist.json`, JSON.stringify(testChecklist, null, 2))
+```
+
+// TodoWrite: Phase 3 → completed, Phase 4 → in_progress
+
+## TR-Phase 4: Auto-Fix Failures (Iterative)
+
+**Skip if**: `skipFix === true` OR `testChecklist.execution?.overall !== 'FAIL'`
+
+**Max iterations**: 3
+
+```javascript
+if (skipFix || !testChecklist.execution || testChecklist.execution.overall !== 'FAIL') {
+  console.log(testChecklist.execution?.overall === 'PASS'
+    ? 'All tests passed. Skipping fix phase.'
+    : 'Skipping auto-fix (--skip-fix or no test execution).')
+  // TodoWrite: Phase 4 → completed (skipped)
+} else {
+  let iteration = 0
+  const MAX_ITERATIONS = 3
+
+  while (iteration < MAX_ITERATIONS && testChecklist.execution.overall === 'FAIL') {
+    iteration++
+    console.log(`\n--- Fix Iteration ${iteration}/${MAX_ITERATIONS} ---`)
+
+    // Use test-fix-agent for fixing
+    Agent({
+      subagent_type: "test-fix-agent",
+      run_in_background: false,
+      description: `Fix tests (iter ${iteration})`,
+      prompt: `## Test Fix Iteration ${iteration}/${MAX_ITERATIONS}
+
+**Test Command**: ${testConfig.command}
+**Framework**: ${testConfig.framework}
+**Session**: ${sessionPath}
+
+### Failing Output (last 3000 chars)
+\`\`\`
+${testChecklist.execution.raw_output}
+\`\`\`
+
+### Plan Context
+**Summary**: ${plan.summary}
+**Tasks**: ${taskFiles.map(t => `${t.id}: ${t.title}`).join(' | ')}
+
+### Instructions
+1. Analyze test failure output to identify root cause
+2. Fix the SOURCE CODE (not tests) unless tests themselves are wrong
+3. Run \`${testConfig.command}\` to verify fix
+4. If fix introduces new failures, revert and try alternative approach
+5. Return: what was fixed, which files changed, test result after fix`
+    })
+
+    // Re-run tests after fix
+    const retestResult = Bash(testConfig.command, { timeout: 300000 })
+    const hasFail = /failed|FAIL/.test(retestResult)
+
+    testChecklist.execution = {
+      command: testConfig.command,
+      timestamp: new Date().toISOString(),
+      raw_output: retestResult.slice(-3000),
+      overall: hasFail ? 'FAIL' : 'PASS',
+      fix_iteration: iteration
+    }
+
+    Write(`${sessionPath}/test-checklist.json`, JSON.stringify(testChecklist, null, 2))
+
+    if (!hasFail) {
+      console.log(`Tests passed after iteration ${iteration}.`)
+      break
+    }
+  }
+
+  if (testChecklist.execution.overall === 'FAIL') {
+    console.log(`Tests still failing after ${MAX_ITERATIONS} iterations. Manual investigation needed.`)
+  }
+}
+```
+
+// TodoWrite: Phase 4 → completed, Phase 5 → in_progress
+
+## TR-Phase 5: Report & Sync
+
+> **CHECKPOINT**: This step is MANDATORY. Always generate report and trigger sync.
+
+```javascript
+// Generate markdown report
+const report = `# Test Review Report
+
+**Session**: ${sessionId}
+**Summary**: ${plan.summary}
+**Generated**: ${new Date().toISOString()}
+**Test Framework**: ${testConfig.framework || 'unknown'}
+
+## Task Verdicts
+
+| Task | Status | Convergence | Test Items | Gaps |
+|------|--------|-------------|------------|------|
+${testChecklist.tasks.map(t =>
+  `| ${t.task_id} | ${t.status} | ${t.convergence.met.length}/${t.convergence.met.length + t.convergence.unmet.length} | ${t.test_items.length} | ${t.test_items.filter(i => i.status === 'missing').length} |`
+).join('\n')}
+
+## Unmet Criteria
+
+${testChecklist.tasks.filter(t => t.convergence.unmet.length > 0).map(t =>
+  `### ${t.task_id}: ${t.title}\n${t.convergence.unmet.map(u => \`- [ ] \${u}\`).join('\n')}`
+).join('\n\n') || 'All criteria met.'}
+
+## Test Gaps
+
+${testChecklist.tasks.flatMap(t => t.test_items.filter(i => i.status === 'missing')).map(i =>
+  \`- [ ] (\${i.type}) \${i.desc}\`
+).join('\n') || 'No gaps detected.'}
+
+${testChecklist.execution ? `## Test Execution
+
+**Command**: \\\`${testChecklist.execution.command}\\\`
+**Result**: ${testChecklist.execution.overall}
+${testChecklist.execution.fix_iteration ? `**Fixed in iteration**: ${testChecklist.execution.fix_iteration}` : ''}
+` : '## Test Execution\n\nNo test framework detected.'}
+`
+
+Write(`${sessionPath}/test-review.md`, report)
+console.log(`Report: ${sessionPath}/test-review.md`)
+console.log(`Checklist: ${sessionPath}/test-checklist.json`)
+
+// Chain to session:sync
+Skill({ skill: "workflow:session:sync", args: `-y "Test review: ${testChecklist.execution?.overall || 'no-test'} — ${plan.summary}"` })
+```
+
+// TodoWrite: Phase 5 → completed
+
+**Display summary**:
+```javascript
+console.log(`
+── Test Review Complete ──
+${testChecklist.tasks.map(t => {
+  const icon = t.status === 'PASS' ? '[PASS]' : t.status === 'PARTIAL' ? '[PARTIAL]' : '[FAIL]'
+  return `${icon} ${t.task_id}: ${t.title} (${t.convergence.met.length}/${t.convergence.met.length + t.convergence.unmet.length})`
+}).join('\n')}
+Test: ${testChecklist.execution?.overall || 'skipped'}${testChecklist.execution?.fix_iteration ? ` (fixed iter ${testChecklist.execution.fix_iteration})` : ''}
+`)
+```
+
+## Data Structures
+
+### testReviewContext (Input - Mode 1, set by lite-execute)
+
+```javascript
+{
+  planObject: { /* same as executionContext.planObject */ },
+  taskFiles: [{ id: string, path: string }],
+  reviewTool: "agent" | "gemini" | "codex",   // inherited from lite-execute codeReviewTool
+  executionResults: [...],                      // previousExecutionResults from lite-execute
+  originalUserInput: string,
+  session: {
+    id: string,
+    folder: string,
+    artifacts: { plan: string, task_dir: string }
+  }
+}
+```
+
+## Session Folder Structure (after test-review)
+
+```
+.workflow/.lite-plan/{session-id}/
+├── exploration-*.json
+├── explorations-manifest.json
+├── planning-context.md
+├── plan.json
+├── .task/TASK-*.json
+├── test-checklist.json          # NEW: structured test results
+└── test-review.md               # NEW: human-readable report
+```
+
+## Error Handling
+
+| Error | Resolution |
+|-------|------------|
+| No session found | "No lite-plan sessions found. Run lite-plan first." |
+| Missing plan.json | "Invalid session: missing plan.json at {path}" |
+| No test framework | Skip TR-Phase 3 execution, still generate review report |
+| Test timeout | Capture partial output, report as FAIL |
+| Fix agent fails | Log iteration, continue to next or stop at max |
+| Sync fails | Log warning, do not block report generation |