feat: Enhance search functionality with quality tiers and scoped indexing

- Updated `search_code` function to include a `quality` parameter for search quality tiers: "fast", "balanced", "thorough", and "auto".
- Introduced `search_scope` function to limit search results to a specific directory scope.
- Added `index_scope` function for indexing a specific directory without re-indexing the entire project.
- Refactored `SearchPipeline` to support quality-based routing in the `search` method.
- Implemented `Shard` and `ShardManager` classes to manage multiple index shards with LRU eviction and efficient file routing.
- Added debounce functionality in `IncrementalIndexer` to batch file events and reduce redundant processing.
- Enhanced `FileWatcher` to integrate with `IncrementalIndexer` for improved event handling.
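The debounce behavior described above can be sketched roughly as follows. The commit does not show `IncrementalIndexer`'s actual API, so the class and method names below are illustrative assumptions, not the shipped implementation:

```javascript
// Hypothetical sketch of timer-based event batching, as in the
// IncrementalIndexer debounce described above. Names are assumptions.
class DebouncedBatcher {
  constructor(flush, delayMs = 200) {
    this.flush = flush;        // callback that receives the batched paths
    this.delayMs = delayMs;
    this.pending = new Set();  // Set dedupes repeated events for one path
    this.timer = null;
  }
  add(path) {
    this.pending.add(path);
    if (this.timer) clearTimeout(this.timer); // reset window on each event
    this.timer = setTimeout(() => {
      const batch = [...this.pending];
      this.pending.clear();
      this.timer = null;
      this.flush(batch);       // one indexing pass for the whole burst
    }, this.delayMs);
  }
}
```

A burst of saves to the same file then triggers a single re-index pass instead of one per event.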
Author: catlog22
Date: 2026-03-19 17:47:53 +08:00
Parent: 54071473fc
Commit: 18aff260a0
46 changed files with 1537 additions and 658 deletions

View File

@@ -40,7 +40,7 @@ Parse the following fields from your prompt:
 | `role_spec` | Yes | Path to supervisor role.md |
 | `session` | Yes | Session folder path |
 | `session_id` | Yes | Session ID for message bus operations |
-| `team_name` | Yes | Team name for SendMessage |
+| `team_name` | Yes | Team name (used by Agent spawn for message routing; NOT used directly in SendMessage calls) |
 | `requirement` | Yes | Original task/requirement description |
 | `recovery` | No | `true` if respawned after crash — triggers recovery protocol |
@@ -94,14 +94,13 @@ team_msg(operation="get_state", session_id=<session_id>) // all roles
 ```
 - Record which roles have completed, their key_findings, decisions
 - Read `<session>/wisdom/*.md` — absorb accumulated team knowledge
-- Read `<session>/team-session.json` — understand pipeline mode, stages
+- Read `<session>/session.json` — understand pipeline mode, stages
 ### Step 3: Report Ready
 ```javascript
 SendMessage({
-  type: "message",
-  recipient: "coordinator",
-  content: "[supervisor] Resident supervisor ready. Baseline loaded for session <session_id>. Awaiting checkpoint assignments.",
+  to: "coordinator",
+  message: "[supervisor] Resident supervisor ready. Baseline loaded for session <session_id>. Awaiting checkpoint assignments.",
   summary: "[supervisor] Ready, awaiting checkpoints"
 })
 ```
@@ -194,9 +193,8 @@ context_accumulator.append({
 ### Step 9: Report to Coordinator
 ```javascript
 SendMessage({
-  type: "message",
-  recipient: "coordinator",
-  content: "[supervisor] CHECKPOINT-NNN complete.\nVerdict: <verdict> (score: <score>)\nFindings: <top-3>\nRisks: <count> logged\nQuality trend: <trend>\nArtifact: <path>",
+  to: "coordinator",
+  message: "[supervisor] CHECKPOINT-NNN complete.\nVerdict: <verdict> (score: <score>)\nFindings: <top-3>\nRisks: <count> logged\nQuality trend: <trend>\nArtifact: <path>",
   summary: "[supervisor] CHECKPOINT-NNN: <verdict>"
 })
 ```
@@ -220,17 +218,23 @@ If spawned with `recovery: true` in prompt:
 ## Shutdown Protocol
-When receiving a `shutdown_request` message:
+When a new conversation turn delivers a message containing `type: "shutdown_request"`:
+1. Extract `requestId` from the received message JSON (system injects this field at delivery time)
+2. Respond via SendMessage:
 ```
 SendMessage({
-  type: "shutdown_response",
-  request_id: "<from message>",
-  approve: true
+  to: "coordinator",
+  message: {
+    type: "shutdown_response",
+    request_id: "<extracted request_id>",
+    approve: true
+  }
 })
 ```
-Agent terminates.
+Agent terminates after sending response.
 ---
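Read as plain JavaScript, the new protocol amounts to echoing the injected `requestId` back inside a structured reply. This standalone sketch is an illustration only; the `handleShutdown` helper and anything beyond the message shapes shown in the diff are assumptions:

```javascript
// Hypothetical handler: given the delivered message and a send function,
// build the shutdown_response required by the protocol above.
function handleShutdown(incoming, send) {
  if (incoming.type !== "shutdown_request") return false;
  send({
    to: "coordinator",
    message: {
      type: "shutdown_response",
      request_id: incoming.requestId, // echoed from the injected field
      approve: true
    }
  });
  return true; // caller terminates after a true result
}
```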

View File

@@ -2,8 +2,8 @@
 name: team-worker
 description: |
   Unified worker agent for team-lifecycle. Contains all shared team behavior
-  (Phase 1 Task Discovery, Phase 5 Report + Fast-Advance, Message Bus, Consensus
-  Handling, Inner Loop lifecycle). Loads role-specific Phase 2-4 logic from a
+  (Phase 1 Task Discovery, Phase 5 Report + Pipeline Notification, Message Bus,
+  Consensus Handling, Inner Loop lifecycle). Loads role-specific Phase 2-4 logic from a
   role_spec markdown file passed in the prompt.
   Examples:
@@ -21,7 +21,7 @@ color: green
 You are a **team-lifecycle worker agent**. You execute a specific role within a team pipeline. Your behavior is split into:
-- **Built-in phases** (Phase 1, Phase 5): Task discovery, reporting, fast-advance, inner loop — defined below.
+- **Built-in phases** (Phase 1, Phase 5): Task discovery, reporting, pipeline notification, inner loop — defined below.
 - **Role-specific phases** (Phase 2-4): Loaded from a role_spec markdown file.
 ---
@@ -36,7 +36,7 @@ Parse the following fields from your prompt:
 | `role_spec` | Yes | Path to role-spec .md file containing Phase 2-4 instructions |
 | `session` | Yes | Session folder path (e.g., `.workflow/.team/TLS-xxx-2026-02-27`) |
 | `session_id` | Yes | Session ID (folder name, e.g., `TLS-xxx-2026-02-27`). Used directly as `session_id` param for all message bus operations |
-| `team_name` | Yes | Team name for SendMessage |
+| `team_name` | Yes | Team name (used by Agent spawn for message routing; NOT used directly in SendMessage calls) |
 | `requirement` | Yes | Original task/requirement description |
 | `inner_loop` | Yes | `true` or `false` — whether to loop through same-prefix tasks |
@@ -82,7 +82,7 @@ Entry:
 | team_msg state_update | YES | YES |
 | Accumulate summary | YES | - |
 | SendMessage to coordinator | NO | YES (all tasks) |
-| Fast-Advance check | - | YES |
+| Pipeline status check | - | YES |
 **Interrupt conditions** (break inner loop immediately):
 - consensus_blocked HIGH → SendMessage → STOP
@@ -99,6 +99,7 @@ Execute on every loop iteration:
 - Subject starts with this role's `prefix` + `-` (e.g., `DRAFT-`, `IMPL-`)
 - Status is `pending`
 - `blockedBy` list is empty (all dependencies resolved)
+- **Owner matches** `agent_name` from prompt (e.g., task owner "explorer-1" matches agent_name "explorer-1"). This prevents parallel workers from claiming each other's tasks.
 - If role has `additional_prefixes` (e.g., reviewer handles REVIEW-* + QUALITY-* + IMPROVE-*), check all prefixes
 3. **No matching tasks?**
 - If first iteration → report idle, SendMessage "No tasks found for [role]", STOP
@@ -153,7 +154,7 @@ mcp__ccw-tools__team_msg({
   summary: "Request exploration agent for X",
   data: { reason: "...", scope: "..." }
 })
-SendMessage({ recipient: "coordinator", content: "..." })
+SendMessage({ to: "coordinator", message: "...", summary: "Request agent delegation" })
 ```
 ### Consensus Handling
@@ -180,7 +181,7 @@ Discussion: <session-folder>/discussions/<round-id>-discussion.md
 ---
-## Phase 5: Report + Fast-Advance (Built-in)
+## Phase 5: Report + Pipeline Notification (Built-in)
 After Phase 4 completes, determine Phase 5 variant (see Execution Flow for decision table).
@@ -228,62 +229,29 @@ After Phase 4 completes, determine Phase 5 variant (see Execution Flow for decis
 1. **TaskUpdate**: Mark current task `completed`
 2. **Message Bus**: Log state_update (same call as Phase 5-L step 2)
-3. **Compile final report** and **SendMessage** to coordinator:
+3. **Compile final report + pipeline status**, then send **one single SendMessage** to coordinator:
+   First, call `TaskList()` to check pipeline status. Then compose and send:
 ```javascript
 SendMessage({
-  type: "message",
-  recipient: "coordinator",
-  content: "[<role>] <final-report>",
+  to: "coordinator",
+  message: "[<role>] Final report:\n<report-body>\n\nPipeline status: <status-line>",
   summary: "[<role>] Final report delivered"
 })
 ```
-Report contents: tasks completed (count + list), artifacts produced (paths), files modified (with evidence), discuss results (verdicts + ratings), key decisions (from context_accumulator), verification summary, warnings/issues.
-4. **Fast-Advance Check**: Call `TaskList()`, find pending tasks whose blockedBy are ALL completed, apply rules:
-| Condition | Action |
-|-----------|--------|
-| Same-prefix successor (inner loop role) | Do NOT spawn — main agent handles via inner loop |
-| 1 ready task, simple linear successor, different prefix | Spawn directly via `Agent(run_in_background: true)` + log `fast_advance` |
-| Multiple ready tasks (parallel window) | SendMessage to coordinator (needs orchestration) |
-| No ready tasks + others running | SendMessage to coordinator (status update) |
-| No ready tasks + nothing running | SendMessage to coordinator (pipeline may be complete) |
-| Checkpoint task (e.g., spec->impl transition) | SendMessage to coordinator (needs user confirmation) |
-### Fast-Advance Spawn
-When fast-advancing to a different-prefix successor:
-```
-Agent({
-  subagent_type: "team-worker",
-  description: "Spawn <successor-role> worker",
-  team_name: <team_name>,
-  name: "<successor-role>",
-  run_in_background: true,
-  prompt: `## Role Assignment
-role: <successor-role>
-role_spec: <derive from SKILL path>/role-specs/<successor-role>.md
-session: <session>
-session_id: <session_id>
-team_name: <team_name>
-requirement: <requirement>
-inner_loop: <true|false based on successor role>`
-})
-```
-After spawning, MUST log to message bus (passive log, NOT a SendMessage):
-```
-mcp__ccw-tools__team_msg(
-  operation="log",
-  session_id=<session_id>,
-  from=<role>,
-  type="fast_advance",
-  summary="[<role>] fast-advanced <completed-task-id> → spawned <successor-role> for <successor-task-id>"
-)
-```
-Coordinator reads this on next callback to reconcile `active_workers`.
+**Report body** includes: tasks completed (count + list), artifacts produced (paths), files modified (with evidence), discuss results (verdicts + ratings), key decisions (from context_accumulator), verification summary, warnings/issues.
+**Status line** (append to same message based on TaskList scan):
+| Condition | Status line |
+|-----------|-------------|
+| 1+ ready tasks (unblocked) | `"Tasks unblocked: <task-list>. Ready for next stage."` |
+| No ready tasks + others running | `"All my tasks done. Other tasks still running."` |
+| No ready tasks + nothing running | `"All my tasks done. Pipeline may be complete."` |
+**IMPORTANT**: Send exactly ONE SendMessage per Phase 5-F. Multiple SendMessage calls in one turn have undefined delivery behavior. Do NOT spawn agents — coordinator handles all spawning.
 ---
@@ -306,7 +274,7 @@ The worker MUST load available cross-role context before executing role-spec Pha
 After Phase 4 verification, the worker MUST publish its contributions:
-1. **Artifact**: Write deliverable to `<session>/artifacts/<prefix>-<task-id>-<name>.md`
+1. **Artifact**: Write deliverable to the path specified by role_spec Phase 4. If role_spec does not specify a path, use default: `<session>/artifacts/<prefix>-<task-id>-<name>.md`
 2. **State data**: Prepare payload for Phase 5 `state_update` message (see Phase 5-L step 2 for schema)
 3. **Wisdom**: Append new patterns to `learnings.md`, decisions to `decisions.md`, issues to `issues.md`
 4. **Context accumulator** (inner_loop only): Append summary (see Phase 5-L step 3 for schema). Maintain full accumulator for context continuity across iterations.
@@ -324,9 +292,18 @@ Load in Phase 2 to inform execution. Contribute in Phase 4/5 with discoveries.
 ---
-## Message Bus Protocol
-Always use `mcp__ccw-tools__team_msg` for team communication.
+## Communication Protocols
+### Addressing Convention
+- **SendMessage**: For triggering coordinator turns (auto-delivered). Always use `to: "coordinator"` — the main conversation context (team lead) is always addressable as `"coordinator"` regardless of team name.
+- **mcp__ccw-tools__team_msg**: For persistent state logging and cross-role queries (manual). Uses `session_id`, not team_name.
+SendMessage triggers coordinator action; team_msg persists state for other roles to query. Always do **both** in Phase 5: team_msg first (state), then SendMessage (notification).
+### Message Bus Protocol
+Always use `mcp__ccw-tools__team_msg` for state persistence and cross-role queries.
 ### log (with state_update) — Primary for Phase 5
@@ -380,11 +357,33 @@ ccw team log --session-id <session_id> --from <role> --type <type> --json
 | Process own prefix tasks | Process other role's prefix tasks |
 | SendMessage to coordinator | Directly communicate with other workers |
 | Use CLI tools for analysis/exploration | Create tasks for other roles |
-| Fast-advance simple successors | Spawn parallel worker batches |
+| Notify coordinator of unblocked tasks | Spawn agents (workers cannot call Agent) |
 | Write to own artifacts + wisdom | Modify resources outside own scope |
 ---
+## Shutdown Handling
+When a new conversation turn delivers a message containing `type: "shutdown_request"`:
+1. Extract `requestId` from the received message JSON (system injects this field at delivery time)
+2. Respond via SendMessage:
+```javascript
+SendMessage({
+  to: "coordinator",
+  message: {
+    type: "shutdown_response",
+    request_id: "<extracted request_id>",
+    approve: true
+  }
+})
+```
+Agent terminates after sending response. Note: messages are only delivered between turns, so you are always idle when receiving this — no in-progress work to worry about. For ephemeral workers (inner_loop=false) that already reached STOP, SendMessage from coordinator is silently ignored — this handler is a safety net for inner_loop=true workers or workers in idle states.
+---
 ## Error Handling
 | Scenario | Resolution |

View File

@@ -103,7 +103,7 @@ TEXT-LEVEL ONLY. No source code reading.
 Delegate to @commands/dispatch.md:
 1. Read dependency graph and parallel mode from session.json
 2. Topological sort tasks
-3. Create tasks via TaskCreate with blockedBy
+3. Create tasks via TaskCreate, then set dependencies via TaskUpdate({ addBlockedBy })
 4. Update session.json with task count
 ## Phase 4: Spawn-and-Stop

View File

@@ -99,7 +99,7 @@ TEXT-LEVEL ONLY. No source code reading.
 Delegate to @commands/dispatch.md:
 1. Read pipeline mode and angles from session.json
-2. Create tasks for selected pipeline with correct blockedBy
+2. Create tasks for selected pipeline, then set dependencies via TaskUpdate({ addBlockedBy })
 3. Update session.json with task count
 ## Phase 4: Spawn-and-Stop

View File

@@ -241,7 +241,7 @@ Coordinator supports `resume` / `continue` for interrupted sessions:
 3. Audit TaskList -> reconcile session state <-> task status
 4. Reset in_progress -> pending (interrupted tasks)
 5. Rebuild team and spawn needed workers only
-6. Create missing tasks with correct blockedBy
+6. Create missing tasks, set dependencies via TaskUpdate({ addBlockedBy })
 7. Kick first executable task -> Phase 4 coordination loop
 ---

View File

@@ -144,7 +144,7 @@ For callback/check/resume/adapt/complete: load `@commands/monitor.md` and execut
 4. Detect fast-advance orphans (in_progress without recent activity) -> reset to pending
 5. Determine remaining pipeline from reconciled state
 6. Rebuild team if disbanded (TeamCreate + spawn needed workers only)
-7. Create missing tasks with correct blockedBy dependencies
+7. Create missing tasks, set dependencies via TaskUpdate({ addBlockedBy })
 8. Verify dependency chain integrity
 9. Update session file with reconciled state
 10. Kick first executable task's worker -> Phase 4
@@ -278,7 +278,7 @@ mcp__ccw-tools__team_msg({
 Delegate to `@commands/dispatch.md` which creates the full task chain:
 1. Reads dependency_graph from task-analysis.json
 2. Topological sorts tasks
-3. Creates tasks via TaskCreate with correct blockedBy
+3. Creates tasks via TaskCreate, then sets dependencies via TaskUpdate({ addBlockedBy })
 4. Assigns owner based on role mapping from task-analysis.json
 5. Includes `Session: <session-folder>` in every task description
 6. Sets InnerLoop flag for multi-task roles

View File

@@ -77,7 +77,7 @@ If `.workflow/.team/${teamConfig.sessionPrefix}-*/team-session.json` exists:
 ## Phase 3: Dispatch
 - Execute `commands/dispatch.md`
-- Creates TaskCreate calls with blockedBy dependencies
+- Creates TaskCreate calls, then sets dependencies via TaskUpdate({ addBlockedBy })
 ## Phase 4: Spawn & Monitor
@@ -144,7 +144,7 @@ Write `task-analysis.json` to session directory:
 Template — includes:
 - Topological sort from dependency graph
-- TaskCreate with blockedBy
+- TaskCreate + TaskUpdate({ addBlockedBy }) for dependencies
 - Task description template (PURPOSE/TASK/CONTEXT/EXPECTED/CONSTRAINTS)
 ### coordinator/commands/monitor.md

View File

@@ -44,10 +44,10 @@ Analyzer needs more evidence. Create supplemental reproduction task.
 1. Parse Analyzer's evidence request (dimensions, specific actions)
 2. Create REPRODUCE-002 task:
    - TaskCreate with description from Analyzer's request
-   - blockedBy: [] (can start immediately)
+   - TaskUpdate to set owner (no blockedBy — can start immediately)
 3. Create ANALYZE-002 task:
-   - blockedBy: [REPRODUCE-002]
-   - Update FIX-001 blockedBy to include ANALYZE-002
+   - TaskCreate + TaskUpdate with addBlockedBy: [REPRODUCE-002]
+   - TaskUpdate FIX-001 with addBlockedBy to include ANALYZE-002
 4. Update team-session.json with new tasks
 5. -> handleSpawnNext

View File

@@ -98,7 +98,7 @@ Delegate to @commands/dispatch.md:
 1. Read dependency graph from task-analysis.json
 2. Read specs/pipelines.md for debug-pipeline task registry
 3. Topological sort tasks
-4. Create tasks via TaskCreate with blockedBy
+4. Create tasks via TaskCreate, then set blockedBy via TaskUpdate
 5. Update team-session.json
 ## Phase 4: Spawn-and-Stop

View File

@@ -115,7 +115,7 @@ mcp__ccw-tools__team_msg({
 Delegate to @commands/dispatch.md:
 1. Read specs/pipelines.md for selected pipeline task registry
-2. Create tasks via TaskCreate with blockedBy
+2. Create tasks via TaskCreate, then set blockedBy via TaskUpdate
 3. Update session.json
 ## Phase 4: Spawn-and-Stop

View File

@@ -227,7 +227,7 @@ Verify task chain integrity:
 | Check | Method | Expected |
 |-------|--------|----------|
 | Task count correct | TaskList count | patch: 2, sprint: 4, multi: 5+ |
-| Dependencies correct | Trace blockedBy graph | Acyclic, correct ordering |
+| Dependencies correct | Trace addBlockedBy graph | Acyclic, correct ordering |
 | No circular dependencies | Trace full graph | Acyclic |
 | Structured descriptions | Each has PURPOSE/TASK/CONTEXT/EXPECTED | All present |

View File

@@ -111,13 +111,13 @@ mcp__ccw-tools__team_msg({
 Delegate to @commands/dispatch.md:
 1. Read specs/pipelines.md for selected pipeline task registry
-2. Create tasks via TaskCreate with blockedBy
+2. Create tasks via TaskCreate, then TaskUpdate with addBlockedBy
 3. Update task-ledger.json
 ## Phase 4: Spawn-and-Stop
 Delegate to @commands/monitor.md#handleSpawnNext:
-1. Find ready tasks (pending + blockedBy resolved)
+1. Find ready tasks (pending + all addBlockedBy dependencies resolved)
 2. Spawn team-worker agents (see SKILL.md Spawn Template)
 3. Output status summary
 4. STOP

View File

@@ -105,7 +105,7 @@ Pipeline done. Generate report and completion action.
 1. Shutdown resident supervisor (if active):
 ```
-SendMessage({ type: "shutdown_request", recipient: "supervisor", content: "Pipeline complete" })
+SendMessage({ to: "supervisor", message: { type: "shutdown_request", reason: "Pipeline complete" } })
 ```
 2. Generate summary (deliverables, stats, discussions)
 3. Read session.completion_action:

View File

@@ -27,7 +27,6 @@ Every task description uses structured format for clarity:
 ```
 TaskCreate({
   subject: "<TASK-ID>",
-  owner: "<role>",
   description: "PURPOSE: <what this task achieves> | Success: <measurable completion criteria>
 TASK:
 - <step 1: specific action>
@@ -44,9 +43,9 @@ CONSTRAINTS: <scope limits, focus areas>
 ---
 InnerLoop: <true|false>
 BranchId: <B01|A|none>",
-  blockedBy: [<dependency-list>],
   status: "pending"
 })
+TaskUpdate({ taskId: "<TASK-ID>", addBlockedBy: [<dependency-list>], owner: "<role>" })
 ```
 ### Mode Router
@@ -106,9 +105,9 @@ EXPECTED: <session>/artifacts/optimization-plan.md | Priority-ordered with impro
 CONSTRAINTS: Focus on highest-impact optimizations | Risk assessment required | Non-overlapping file targets per OPT-ID
 ---
 InnerLoop: false",
-  blockedBy: ["PROFILE-001"],
   status: "pending"
 })
+TaskUpdate({ taskId: "STRATEGY-001", addBlockedBy: ["PROFILE-001"] })
 ```
 **IMPL-001** (optimizer, Stage 3):
@@ -130,9 +129,9 @@ EXPECTED: Modified source files + validation passing | Optimizations applied wit
 CONSTRAINTS: Preserve existing behavior | Minimal changes per optimization | Follow code conventions
 ---
 InnerLoop: true",
-  blockedBy: ["STRATEGY-001"],
   status: "pending"
 })
+TaskUpdate({ taskId: "IMPL-001", addBlockedBy: ["STRATEGY-001"] })
 ```
 **BENCH-001** (benchmarker, Stage 4 - parallel):
@@ -154,9 +153,9 @@ EXPECTED: <session>/artifacts/benchmark-results.json | Per-metric comparison wit
 CONSTRAINTS: Must compare against baseline | Flag any regressions
 ---
 InnerLoop: false",
-  blockedBy: ["IMPL-001"],
   status: "pending"
 })
+TaskUpdate({ taskId: "BENCH-001", addBlockedBy: ["IMPL-001"] })
 ```
 **REVIEW-001** (reviewer, Stage 4 - parallel):
@@ -178,9 +177,9 @@ EXPECTED: <session>/artifacts/review-report.md | Per-dimension findings with sev
 CONSTRAINTS: Focus on optimization changes only | Provide specific file:line references
 ---
 InnerLoop: false",
-  blockedBy: ["IMPL-001"],
   status: "pending"
 })
+TaskUpdate({ taskId: "REVIEW-001", addBlockedBy: ["IMPL-001"] })
 ```
 ---
@@ -207,11 +206,16 @@ For each target index `i` (0-based), with prefix char `P = pipeline_prefix_chars
 // Create session subdirectory for this pipeline
 Bash("mkdir -p <session>/artifacts/pipelines/<P>")
-TaskCreate({ subject: "PROFILE-<P>01", ... })  // blockedBy: []
-TaskCreate({ subject: "STRATEGY-<P>01", ... }) // blockedBy: ["PROFILE-<P>01"]
-TaskCreate({ subject: "IMPL-<P>01", ... })     // blockedBy: ["STRATEGY-<P>01"]
-TaskCreate({ subject: "BENCH-<P>01", ... })    // blockedBy: ["IMPL-<P>01"]
-TaskCreate({ subject: "REVIEW-<P>01", ... })   // blockedBy: ["IMPL-<P>01"]
+TaskCreate({ subject: "PROFILE-<P>01", ... })
+TaskCreate({ subject: "STRATEGY-<P>01", ... })
+TaskCreate({ subject: "IMPL-<P>01", ... })
+TaskCreate({ subject: "BENCH-<P>01", ... })
+TaskCreate({ subject: "REVIEW-<P>01", ... })
+// Then set dependencies via TaskUpdate:
+TaskUpdate({ taskId: "STRATEGY-<P>01", addBlockedBy: ["PROFILE-<P>01"] })
+TaskUpdate({ taskId: "IMPL-<P>01", addBlockedBy: ["STRATEGY-<P>01"] })
+TaskUpdate({ taskId: "BENCH-<P>01", addBlockedBy: ["IMPL-<P>01"] })
+TaskUpdate({ taskId: "REVIEW-<P>01", addBlockedBy: ["IMPL-<P>01"] })
 ```
 Task descriptions follow same template as single mode, with additions:
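The create-then-link sequence above generalizes to any dependency graph: create every task first, then wire dependencies, so an `addBlockedBy` reference never points at a task that does not exist yet. The `wirePipeline` helper below is a hypothetical sketch of that two-phase pattern; `TaskCreate` and `TaskUpdate` stand in for the tool calls documented here:

```javascript
// Hypothetical two-phase wiring: phase 1 creates all tasks, phase 2 adds
// dependencies, so blockedBy edges always reference existing tasks.
function wirePipeline(createTask, updateTask, edges) {
  // edges maps taskId -> array of dependency taskIds
  for (const taskId of Object.keys(edges)) {
    createTask({ subject: taskId, status: "pending" });
  }
  for (const [taskId, deps] of Object.entries(edges)) {
    if (deps.length > 0) updateTask({ taskId, addBlockedBy: deps });
  }
}
```

For example, `wirePipeline(TaskCreate, TaskUpdate, { "PROFILE-A01": [], "STRATEGY-A01": ["PROFILE-A01"] })` would issue two creates and one dependency update.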
@@ -295,9 +299,9 @@ CONSTRAINTS: Only implement this branch's optimization | Do not touch files outs
 ---
 InnerLoop: false
 BranchId: B{NN}",
-  blockedBy: ["STRATEGY-001"],
   status: "pending"
 })
+TaskUpdate({ taskId: "IMPL-B{NN}", addBlockedBy: ["STRATEGY-001"] })
 TaskCreate({
   subject: "BENCH-B{NN}",
@@ -316,9 +320,9 @@ CONSTRAINTS: Only benchmark this branch's metrics
 ---
 InnerLoop: false
 BranchId: B{NN}",
-  blockedBy: ["IMPL-B{NN}"],
   status: "pending"
 })
+TaskUpdate({ taskId: "BENCH-B{NN}", addBlockedBy: ["IMPL-B{NN}"] })
 TaskCreate({
   subject: "REVIEW-B{NN}",
@@ -337,9 +341,9 @@ CONSTRAINTS: Only review this branch's changes
 ---
 InnerLoop: false
 BranchId: B{NN}",
-  blockedBy: ["IMPL-B{NN}"],
   status: "pending"
 })
+TaskUpdate({ taskId: "REVIEW-B{NN}", addBlockedBy: ["IMPL-B{NN}"] })
 ```
 7. Update session.json:
@@ -355,7 +359,7 @@ Verify task chain integrity:
 | Check | Method | Expected |
 |-------|--------|----------|
 | Task count correct | TaskList count | single: 5, auto/fan-out: 2 (pre-CP-2.5), independent: 5*M |
-| Dependencies correct | Trace dependency graph | Acyclic, correct blockedBy |
+| Dependencies correct | Trace dependency graph | Acyclic, correct addBlockedBy |
 | No circular dependencies | Trace dependency graph | Acyclic |
 | Task IDs use correct prefixes | Pattern check | Match naming rules per mode |
 | Structured descriptions complete | Each has PURPOSE/TASK/CONTEXT/EXPECTED/CONSTRAINTS | All present |

View File

@@ -172,7 +172,6 @@ CONSTRAINTS: Targeted fixes only | Do not touch other branches
 ---
 InnerLoop: false
 BranchId: B{NN}",
-  blockedBy: [],
   status: "pending"
 })
 ```
@@ -186,7 +185,6 @@ Create new BENCH and REVIEW with retry suffix:
 TaskCreate({
   subject: "FIX-{P}01-{cycle}",
   ...same pattern with pipeline prefix...
-  blockedBy: [],
   status: "pending"
 })
 ```
@@ -310,7 +308,7 @@ Triggered by user "revise <TASK-ID> [feedback]" command.
1. Parse target task ID and optional feedback 1. Parse target task ID and optional feedback
2. Detect branch/pipeline from task ID pattern 2. Detect branch/pipeline from task ID pattern
3. Create revision task with same role but updated requirements, scoped to branch 3. Create revision task with same role but updated requirements, scoped to branch
4. Set blockedBy to empty (immediate execution) 4. Skip addBlockedBy (no dependencies, immediate execution)
5. Cascade: create new downstream tasks within same branch only 5. Cascade: create new downstream tasks within same branch only
6. Proceed to handleSpawnNext 6. Proceed to handleSpawnNext


@@ -32,7 +32,7 @@ Execution method: <agent|codex|gemini>
 ## Instructions
 1. Parse input to get issue list
 2. For each issue: call issue-plan-agent → write solution artifact
-3. After each solution: create EXEC-* task (owner: executor) with solution_file path
+3. After each solution: create EXEC-* task with solution_file path, then TaskUpdate to set owner: executor
 4. After all issues: send all_planned signal
 InnerLoop: true`,


@@ -46,7 +46,7 @@ For callback/check/resume: load `@commands/monitor.md` and execute the appropria
 1. Parse new input (Issue IDs / `--text` / `--plan`)
 2. Get current max PLAN-* sequence from `TaskList`
-3. `TaskCreate` new PLAN-00N task (owner: planner)
+3. `TaskCreate` new PLAN-00N task, then `TaskUpdate` to set owner: planner
 4. If planner already sent `all_planned` (check team_msg) -> `SendMessage` to planner to re-enter loop
 5. STOP

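A minimal sketch of steps 2–3 above, deriving the next PLAN-* sequence from the task list; the helper name and the task shape are assumptions, not the command's actual code:

```javascript
// Hypothetical helper: scan existing task IDs, find the highest PLAN-*
// sequence, and format the next zero-padded ID.
function nextPlanId(tasks) {
  const seqs = tasks
    .map((t) => /^PLAN-(\d+)$/.exec(t.id))
    .filter(Boolean)
    .map((m) => parseInt(m[1], 10));
  const next = (seqs.length ? Math.max(...seqs) : 0) + 1;
  return `PLAN-${String(next).padStart(3, "0")}`;
}

const demo = nextPlanId([{ id: "PLAN-001" }, { id: "PLAN-002" }]); // → "PLAN-003"
```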

@@ -1,16 +1,16 @@
 # Dispatch Tasks
-Create task chains from dependency graph with proper blockedBy relationships.
+Create task chains from dependency graph with proper addBlockedBy relationships.
 ## Workflow
 1. Read task-analysis.json -> extract pipeline_mode and dependency_graph
 2. Read specs/pipelines.md -> get task registry for selected pipeline
-3. Topological sort tasks (respect blockedBy)
+3. Topological sort tasks (respect addBlockedBy)
 4. Validate all owners exist in role registry (SKILL.md)
 5. For each task (in order):
 - TaskCreate with structured description (see template below)
-- TaskUpdate with blockedBy + owner assignment
+- TaskUpdate with addBlockedBy + owner assignment
 6. Update session.json with pipeline.tasks_total
 7. Validate chain (no orphans, no cycles, all refs valid)
@@ -38,51 +38,51 @@ RoleSpec: ~ or <project>/.claude/skills/team-quality-assurance/roles/<role>/rol
 ### Discovery Mode
 ```
 SCOUT-001 (scout): Multi-perspective issue scanning
-blockedBy: []
+addBlockedBy: []
 QASTRAT-001 (strategist): Test strategy formulation
-blockedBy: [SCOUT-001]
+addBlockedBy: [SCOUT-001]
 QAGEN-001 (generator): L1 unit test generation
-blockedBy: [QASTRAT-001], meta: layer=L1
+addBlockedBy: [QASTRAT-001], meta: layer=L1
 QARUN-001 (executor): L1 test execution + fix cycles
-blockedBy: [QAGEN-001], inner_loop: true, meta: layer=L1
+addBlockedBy: [QAGEN-001], inner_loop: true, meta: layer=L1
 QAANA-001 (analyst): Quality analysis report
-blockedBy: [QARUN-001]
+addBlockedBy: [QARUN-001]
 ```
 ### Testing Mode
 ```
 QASTRAT-001 (strategist): Test strategy formulation
-blockedBy: []
+addBlockedBy: []
 QAGEN-L1-001 (generator): L1 unit test generation
-blockedBy: [QASTRAT-001], meta: layer=L1
+addBlockedBy: [QASTRAT-001], meta: layer=L1
 QARUN-L1-001 (executor): L1 test execution + fix cycles
-blockedBy: [QAGEN-L1-001], inner_loop: true, meta: layer=L1
+addBlockedBy: [QAGEN-L1-001], inner_loop: true, meta: layer=L1
 QAGEN-L2-001 (generator): L2 integration test generation
-blockedBy: [QARUN-L1-001], meta: layer=L2
+addBlockedBy: [QARUN-L1-001], meta: layer=L2
 QARUN-L2-001 (executor): L2 test execution + fix cycles
-blockedBy: [QAGEN-L2-001], inner_loop: true, meta: layer=L2
+addBlockedBy: [QAGEN-L2-001], inner_loop: true, meta: layer=L2
 QAANA-001 (analyst): Quality analysis report
-blockedBy: [QARUN-L2-001]
+addBlockedBy: [QARUN-L2-001]
 ```
 ### Full Mode
 ```
 SCOUT-001 (scout): Multi-perspective issue scanning
-blockedBy: []
+addBlockedBy: []
 QASTRAT-001 (strategist): Test strategy formulation
-blockedBy: [SCOUT-001]
+addBlockedBy: [SCOUT-001]
 QAGEN-L1-001 (generator-1): L1 unit test generation
-blockedBy: [QASTRAT-001], meta: layer=L1
+addBlockedBy: [QASTRAT-001], meta: layer=L1
 QAGEN-L2-001 (generator-2): L2 integration test generation
-blockedBy: [QASTRAT-001], meta: layer=L2
+addBlockedBy: [QASTRAT-001], meta: layer=L2
 QARUN-L1-001 (executor-1): L1 test execution + fix cycles
-blockedBy: [QAGEN-L1-001], inner_loop: true, meta: layer=L1
+addBlockedBy: [QAGEN-L1-001], inner_loop: true, meta: layer=L1
 QARUN-L2-001 (executor-2): L2 test execution + fix cycles
-blockedBy: [QAGEN-L2-001], inner_loop: true, meta: layer=L2
+addBlockedBy: [QAGEN-L2-001], inner_loop: true, meta: layer=L2
 QAANA-001 (analyst): Quality analysis report
-blockedBy: [QARUN-L1-001, QARUN-L2-001]
+addBlockedBy: [QARUN-L1-001, QARUN-L2-001]
 SCOUT-002 (scout): Regression scan after fixes
-blockedBy: [QAANA-001]
+addBlockedBy: [QAANA-001]
 ```
 ## InnerLoop Flag Rules

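Steps 3 and 7 of the dispatch workflow (topological sort plus chain validation) admit a compact sketch. The task shape and function name here are assumptions for illustration, not the skill's actual code:

```javascript
// Depth-first topological sort over addBlockedBy edges, rejecting both
// dangling references and cycles (the "no orphans, no cycles" validation).
function topoSort(tasks) {
  const byId = new Map(tasks.map((t) => [t.id, t]));
  for (const t of tasks)
    for (const dep of t.addBlockedBy)
      if (!byId.has(dep)) throw new Error(`dangling ref: ${t.id} -> ${dep}`);
  const order = [];
  const state = new Map(); // undefined=unvisited, 1=visiting, 2=done
  function visit(id) {
    if (state.get(id) === 2) return;
    if (state.get(id) === 1) throw new Error(`cycle at ${id}`);
    state.set(id, 1);
    for (const dep of byId.get(id).addBlockedBy) visit(dep);
    state.set(id, 2);
    order.push(id); // dependencies are pushed first
  }
  for (const t of tasks) visit(t.id);
  return order;
}

// Discovery Mode chain from above, deliberately out of order:
const order = topoSort([
  { id: "QAANA-001", addBlockedBy: ["QARUN-001"] },
  { id: "SCOUT-001", addBlockedBy: [] },
  { id: "QASTRAT-001", addBlockedBy: ["SCOUT-001"] },
  { id: "QAGEN-001", addBlockedBy: ["QASTRAT-001"] },
  { id: "QARUN-001", addBlockedBy: ["QAGEN-001"] },
]);
```

Creating tasks in this order guarantees every `TaskUpdate` with `addBlockedBy` refers only to tasks that already exist.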

@@ -72,9 +72,9 @@ EXPECTED: <session>/results/run-<layer>-gc-<round>.json
 CONSTRAINTS: Read-only execution
 ---
 InnerLoop: false
-RoleSpec: ~ or <project>/.claude/skills/team-quality-assurance/roles/executor/role.md",
-blockedBy: ["QAGEN-fix-<round>"]
+RoleSpec: ~ or <project>/.claude/skills/team-quality-assurance/roles/executor/role.md"
 })
+TaskUpdate({ taskId: "QARUN-gc-<round>", addBlockedBy: ["QAGEN-fix-<round>"] })
 ```
 6. -> handleSpawnNext


@@ -111,13 +111,13 @@ Delegate to @commands/dispatch.md:
 1. Read dependency graph from task-analysis.json
 2. Read specs/pipelines.md for selected pipeline's task registry
 3. Topological sort tasks
-4. Create tasks via TaskCreate with blockedBy
+4. Create tasks via TaskCreate, then TaskUpdate with addBlockedBy
 5. Update session.json
 ## Phase 4: Spawn-and-Stop
 Delegate to @commands/monitor.md#handleSpawnNext:
-1. Find ready tasks (pending + blockedBy resolved)
+1. Find ready tasks (pending + all addBlockedBy dependencies resolved)
 2. Spawn team-worker agents (see SKILL.md Spawn Template)
 3. Output status summary
 4. STOP

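The readiness check behind handleSpawnNext's step 1 can be sketched as follows; the task shape and helper name are assumed for illustration:

```javascript
// A task is ready when it is pending and every dependency recorded via
// addBlockedBy has reached status "completed".
function readyTasks(tasks) {
  const done = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id)
  );
  return tasks.filter(
    (t) => t.status === "pending" && t.blockedBy.every((dep) => done.has(dep))
  );
}

const ready = readyTasks([
  { id: "QASTRAT-001", status: "completed", blockedBy: [] },
  { id: "QAGEN-L1-001", status: "pending", blockedBy: ["QASTRAT-001"] },
  { id: "QARUN-L1-001", status: "pending", blockedBy: ["QAGEN-L1-001"] },
]);
```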

@@ -150,7 +150,7 @@ mcp__ccw-tools__team_msg({
 operation: "log", session_id: sessionId,
 from: "coordinator", to: "all",
 type: "phase_started",
-ref: `${sessionFolder}/roadmap.md`
+data: { ref: `${sessionFolder}/roadmap.md` }
 })
 ```


@@ -58,7 +58,7 @@ mcp__ccw-tools__team_msg({
 operation: "log", session_id: sessionId,
 from: "coordinator", to: "all",
 type: "phase_paused",
-ref: `${sessionFolder}/state.md`
+data: { ref: `${sessionFolder}/state.md` }
 })
 ```


@@ -70,7 +70,7 @@ mcp__ccw-tools__team_msg({
 operation: "log", session_id: sessionId,
 from: "coordinator", to: "all",
 type: "phase_started",
-ref: `${sessionFolder}/state.md`
+data: { ref: `${sessionFolder}/state.md` }
 })
 ```


@@ -139,7 +139,7 @@ mcp__ccw-tools__team_msg({
 from: "coordinator",
 to: <target-role>,
 type: <message-type>,
-ref: <artifact-path>
+data: { ref: <artifact-path> }
 })
 ```

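The envelope change repeated in these four hunks (a top-level `ref` moving into `data.ref`) can be expressed as a small normalizer. The function name and the lifting behavior are hypothetical, shown only to make the new shape concrete:

```javascript
// Accept either the legacy shape ({ ..., ref }) or the new shape
// ({ ..., data: { ref } }) and always emit the new shape.
function normalizeTeamMsg(msg) {
  const { operation, session_id, from, to, type, ref, data = {} } = msg;
  if (ref !== undefined) {
    // Legacy shape: lift the top-level ref into the data payload.
    return { operation, session_id, from, to, type, data: { ...data, ref } };
  }
  return { operation, session_id, from, to, type, data };
}

const fixed = normalizeTeamMsg({
  operation: "log", session_id: "s1",
  from: "coordinator", to: "all",
  type: "phase_started", ref: "sessions/s1/roadmap.md",
});
```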

@@ -70,7 +70,8 @@ Worker completed. Process and advance.
 Fix-Verify Task Creation:
 ```
 TaskCreate({ subject: "TDFIX-fix-<round>", description: "PURPOSE: Fix regressions | Session: <session>" })
-TaskCreate({ subject: "TDVAL-recheck-<round>", description: "...", blockedBy: ["TDFIX-fix-<round>"] })
+TaskCreate({ subject: "TDVAL-recheck-<round>", description: "..." })
+TaskUpdate({ taskId: "TDVAL-recheck-<round>", addBlockedBy: ["TDFIX-fix-<round>"] })
 ```
 7. -> handleSpawnNext


@@ -1,89 +0,0 @@
---
prefix: ANALYZE
inner_loop: false
additional_prefixes: [ANALYZE-fix]
message_types:
success: analysis_ready
error: error
---
# Deep Analyst
Perform deep multi-perspective analysis on exploration results via CLI tools. Generate structured insights, discussion points, and recommendations with confidence levels.
## Phase 2: Context Loading
| Input | Source | Required |
|-------|--------|----------|
| Task description | From task subject/description | Yes |
| Session path | Extracted from task description | Yes |
| Exploration results | `<session>/explorations/*.json` | Yes |
1. Extract session path, topic, perspective, dimensions from task description
2. Detect direction-fix mode: `type:\s*direction-fix` with `adjusted_focus:\s*(.+)`
3. Load corresponding exploration results:
| Condition | Source |
|-----------|--------|
| Direction fix | Read ALL exploration files, merge context |
| Normal ANALYZE-N | Read exploration matching number N |
| Fallback | Read first available exploration file |
4. Select CLI tool by perspective:
| Perspective | CLI Tool | Rule Template |
|-------------|----------|---------------|
| technical | gemini | analysis-analyze-code-patterns |
| architectural | claude | analysis-review-architecture |
| business | codex | analysis-analyze-code-patterns |
| domain_expert | gemini | analysis-analyze-code-patterns |
| direction-fix (any) | gemini | analysis-diagnose-bug-root-cause |
## Phase 3: Deep Analysis via CLI
Build analysis prompt with exploration context:
```
PURPOSE: <Normal: "Deep analysis of '<topic>' from <perspective> perspective">
<Fix: "Supplementary analysis with adjusted focus on '<adjusted_focus>'">
Success: Actionable insights with confidence levels and evidence references
PRIOR EXPLORATION CONTEXT:
- Key files: <top 5-8 files from exploration>
- Patterns found: <top 3-5 patterns>
- Key findings: <top 3-5 findings>
TASK:
- <perspective-specific analysis tasks>
- Generate structured findings with confidence levels (high/medium/low)
- Identify discussion points requiring user input
- List open questions needing further exploration
MODE: analysis
CONTEXT: @**/* | Topic: <topic>
EXPECTED: Structured analysis with: key_insights, key_findings, discussion_points, open_questions, recommendations
CONSTRAINTS: Focus on <perspective> perspective | <dimensions>
```
Execute: `ccw cli -p "<prompt>" --tool <cli-tool> --mode analysis --rule <rule>`
## Phase 4: Result Aggregation
Write analysis output to `<session>/analyses/analysis-<num>.json`:
```json
{
"perspective": "<perspective>",
"dimensions": ["<dim1>", "<dim2>"],
"is_direction_fix": false,
"key_insights": [{"insight": "...", "confidence": "high", "evidence": "file:line"}],
"key_findings": [{"finding": "...", "file_ref": "...", "impact": "..."}],
"discussion_points": ["..."],
"open_questions": ["..."],
"recommendations": [{"action": "...", "rationale": "...", "priority": "high"}],
"_metadata": {"cli_tool": "...", "cli_rule": "...", "perspective": "...", "timestamp": "..."}
}
```
Update `<session>/wisdom/.msg/meta.json` under `analyst` namespace:
- Read existing -> merge `{ "analyst": { perspective, insight_count, finding_count, is_direction_fix } }` -> write back


@@ -1,106 +0,0 @@
---
prefix: DISCUSS
inner_loop: false
message_types:
success: discussion_processed
error: error
---
# Discussant
Process analysis results and user feedback. Execute direction adjustments, deep-dive explorations, or targeted Q&A based on discussion type. Update discussion timeline.
## Phase 2: Context Loading
| Input | Source | Required |
|-------|--------|----------|
| Task description | From task subject/description | Yes |
| Session path | Extracted from task description | Yes |
| Analysis results | `<session>/analyses/*.json` | Yes |
| Exploration results | `<session>/explorations/*.json` | No |
1. Extract session path, topic, round, discussion type, user feedback:
| Field | Pattern | Default |
|-------|---------|---------|
| sessionFolder | `session:\s*(.+)` | required |
| topic | `topic:\s*(.+)` | required |
| round | `round:\s*(\d+)` | 1 |
| discussType | `type:\s*(.+)` | "initial" |
| userFeedback | `user_feedback:\s*(.+)` | empty |
2. Read all analysis and exploration results
3. Aggregate current findings, insights, open questions
## Phase 3: Discussion Processing
Select strategy by discussion type:
| Type | Mode | Description |
|------|------|-------------|
| initial | inline | Aggregate all analyses: convergent themes, conflicts, top discussion points |
| deepen | cli | Use CLI tool to investigate open questions deeper |
| direction-adjusted | cli | Re-analyze via `ccw cli` from adjusted perspective |
| specific-questions | cli | Targeted exploration answering user questions |
**initial**: Cross-perspective summary -- identify convergent themes, conflicting views, top 5 discussion points and open questions from all analyses.
**deepen**: Use CLI tool for deep investigation:
```javascript
Bash({
command: `ccw cli -p "PURPOSE: Investigate open questions and uncertain insights; success = evidence-based findings
TASK: • Focus on open questions: <questions> • Find supporting evidence • Validate uncertain insights • Document findings
MODE: analysis
CONTEXT: @**/* | Memory: Session <session-folder>, previous analyses
EXPECTED: JSON output with investigation results | Write to <session>/discussions/deepen-<num>.json
CONSTRAINTS: Evidence-based analysis only
" --tool gemini --mode analysis --rule analysis-trace-code-execution`,
run_in_background: false
})
```
**direction-adjusted**: CLI re-analysis from adjusted focus:
```javascript
Bash({
command: `ccw cli -p "Re-analyze '<topic>' with adjusted focus on '<userFeedback>'" --tool gemini --mode analysis`,
run_in_background: false
})
```
**specific-questions**: Use CLI tool for targeted Q&A:
```javascript
Bash({
command: `ccw cli -p "PURPOSE: Answer specific user questions about <topic>; success = clear, evidence-based answers
TASK: • Answer: <userFeedback> • Provide code references • Explain context
MODE: analysis
CONTEXT: @**/* | Memory: Session <session-folder>
EXPECTED: JSON output with answers and evidence | Write to <session>/discussions/questions-<num>.json
CONSTRAINTS: Direct answers with code references
" --tool gemini --mode analysis`,
run_in_background: false
})
```
## Phase 4: Update Discussion Timeline
1. Write round content to `<session>/discussions/discussion-round-<num>.json`:
```json
{
"round": 1, "type": "initial", "user_feedback": "...",
"updated_understanding": { "confirmed": [], "corrected": [], "new_insights": [] },
"new_findings": [], "new_questions": [], "timestamp": "..."
}
```
2. Append round section to `<session>/discussion.md`:
```markdown
### Round <N> - Discussion (<timestamp>)
#### Type: <discussType>
#### User Input: <userFeedback or "(Initial discussion round)">
#### Updated Understanding
**Confirmed**: <list> | **Corrected**: <list> | **New Insights**: <list>
#### New Findings / Open Questions
```
Update `<session>/wisdom/.msg/meta.json` under `discussant` namespace:
- Read existing -> merge `{ "discussant": { round, type, new_insight_count, corrected_count } }` -> write back


@@ -1,73 +0,0 @@
---
prefix: EXPLORE
inner_loop: false
message_types:
success: exploration_ready
error: error
---
# Codebase Explorer
Explore codebase structure through cli-explore-agent, collecting structured context (files, patterns, findings) for downstream analysis. One explorer per analysis perspective.
## Phase 2: Context & Scope Assessment
| Input | Source | Required |
|-------|--------|----------|
| Task description | From task subject/description | Yes |
| Session path | Extracted from task description | Yes |
1. Extract session path, topic, perspective, dimensions from task description:
| Field | Pattern | Default |
|-------|---------|---------|
| sessionFolder | `session:\s*(.+)` | required |
| topic | `topic:\s*(.+)` | required |
| perspective | `perspective:\s*(.+)` | "general" |
| dimensions | `dimensions:\s*(.+)` | "general" |
2. Determine exploration number from task subject (EXPLORE-N)
3. Build exploration strategy by perspective:
| Perspective | Focus | Search Depth |
|-------------|-------|-------------|
| general | Overall codebase structure and patterns | broad |
| technical | Implementation details, code patterns, feasibility | medium |
| architectural | System design, module boundaries, interactions | broad |
| business | Business logic, domain models, value flows | medium |
| domain_expert | Domain patterns, standards, best practices | deep |
## Phase 3: Codebase Exploration
Use CLI tool for codebase exploration:
```javascript
Bash({
command: `ccw cli -p "PURPOSE: Explore codebase for <topic> from <perspective> perspective; success = structured findings with relevant files and patterns
TASK: • Run module depth analysis • Search for topic-related patterns • Identify key files and their relationships • Extract architectural insights
MODE: analysis
CONTEXT: @**/* | Memory: Session <session-folder>, perspective <perspective>
EXPECTED: JSON output with: relevant_files (path, relevance, summary), patterns, key_findings, module_map, questions_for_analysis, _metadata (perspective, search_queries, timestamp)
CONSTRAINTS: Focus on <perspective> angle - <strategy.focus> | Write to <session>/explorations/exploration-<num>.json
" --tool gemini --mode analysis --rule analysis-analyze-code-patterns`,
run_in_background: false
})
```
**ACE fallback** (when CLI produces no output):
```javascript
mcp__ace-tool__search_context({ project_root_path: ".", query: "<topic> <perspective>" })
```
## Phase 4: Result Validation
| Check | Method | Action on Failure |
|-------|--------|-------------------|
| Output file exists | Read output path | Create empty result, run ACE fallback |
| Has relevant_files | Array length > 0 | Trigger ACE supplementary search |
| Has key_findings | Array length > 0 | Note partial results, proceed |
Write validated exploration to `<session>/explorations/exploration-<num>.json`.
Update `<session>/wisdom/.msg/meta.json` under `explorer` namespace:
- Read existing -> merge `{ "explorer": { perspective, file_count, finding_count } }` -> write back


@@ -1,77 +0,0 @@
---
prefix: SYNTH
inner_loop: false
message_types:
success: synthesis_ready
error: error
---
# Synthesizer
Integrate all explorations, analyses, and discussions into final conclusions. Cross-perspective theme extraction, conflict resolution, evidence consolidation, and recommendation prioritization. Pure integration role -- no external tools or CLI calls.
## Phase 2: Context Loading
| Input | Source | Required |
|-------|--------|----------|
| Task description | From task subject/description | Yes |
| Session path | Extracted from task description | Yes |
| All artifacts | `<session>/explorations/*.json`, `analyses/*.json`, `discussions/*.json` | Yes |
| Decision trail | From wisdom/.msg/meta.json | No |
1. Extract session path and topic from task description
2. Read all exploration, analysis, and discussion round files
3. Load decision trail and current understanding from meta.json
4. Select synthesis strategy:
| Condition | Strategy |
|-----------|----------|
| Single analysis, no discussions | simple (Quick mode summary) |
| Multiple analyses, >2 discussion rounds | deep (track evolution) |
| Default | standard (cross-perspective integration) |
## Phase 3: Cross-Perspective Synthesis
Execute synthesis across four dimensions:
**1. Theme Extraction**: Identify convergent themes across all analysis perspectives. Cluster insights by similarity, rank by cross-perspective confirmation count.
**2. Conflict Resolution**: Identify contradictions between perspectives. Present both sides with trade-off analysis when irreconcilable.
**3. Evidence Consolidation**: Deduplicate findings, aggregate by file reference. Map evidence to conclusions with confidence levels:
| Level | Criteria |
|-------|----------|
| High | Multiple sources confirm, strong evidence |
| Medium | Single source or partial evidence |
| Low | Speculative, needs verification |
**4. Recommendation Prioritization**: Sort all recommendations by priority (high > medium > low), deduplicate, cap at 10.
Integrate decision trail from discussion rounds into final narrative.
## Phase 4: Write Conclusions
1. Write `<session>/conclusions.json`:
```json
{
"session_id": "...", "topic": "...", "completed": "ISO-8601",
"summary": "Executive summary...",
"key_conclusions": [{"point": "...", "evidence": "...", "confidence": "high"}],
"recommendations": [{"action": "...", "rationale": "...", "priority": "high"}],
"open_questions": ["..."],
"decision_trail": [{"round": 1, "decision": "...", "context": "..."}],
"cross_perspective_synthesis": { "convergent_themes": [], "conflicts_resolved": [], "unique_contributions": [] },
"_metadata": { "explorations": 3, "analyses": 3, "discussions": 2, "strategy": "standard" }
}
```
2. Append conclusions section to `<session>/discussion.md`:
```markdown
## Conclusions
### Summary / Key Conclusions / Recommendations / Remaining Questions
## Decision Trail / Current Understanding (Final) / Session Statistics
```
Update `<session>/wisdom/.msg/meta.json` under `synthesizer` namespace:
- Read existing -> merge `{ "synthesizer": { conclusion_count, recommendation_count, open_question_count } }` -> write back


@@ -40,18 +40,28 @@ MAX_ROUNDS = pipeline_mode === 'deep' ? 5
 Triggered when a worker sends completion message (via SendMessage callback).
-1. Parse message to identify role and task ID:
-| Message Pattern | Role Detection |
-|----------------|---------------|
-| `[explorer]` or task ID `EXPLORE-*` | explorer |
-| `[analyst]` or task ID `ANALYZE-*` | analyst |
-| `[discussant]` or task ID `DISCUSS-*` | discussant |
-| `[synthesizer]` or task ID `SYNTH-*` | synthesizer |
-2. Mark task as completed:
+1. Parse message to identify role, then resolve completed tasks:
+**Role detection** (from message tag at start of body):
+| Message starts with | Role | Handler |
+|---------------------|------|---------|
+| `[explorer]` | explorer | handleCallback |
+| `[analyst]` | analyst | handleCallback |
+| `[discussant]` | discussant | handleCallback |
+| `[synthesizer]` | synthesizer | handleCallback |
+| `[supervisor]` | supervisor | Log checkpoint result, verify CHECKPOINT task completed, proceed to handleSpawnNext |
+**Task ID resolution** (do NOT parse from message — use TaskList):
+- Call `TaskList()` and find tasks matching the detected role's prefix
+- Tasks with status `completed` that were not previously tracked = newly completed tasks
+- This is reliable even when a worker reports multiple tasks (inner_loop) or when message format varies
+2. Verify task completion (worker already marks completed in Phase 5):
 ```
+TaskGet({ taskId: "<task-id>" })
+// If still "in_progress" (worker failed to mark) → fallback:
 TaskUpdate({ taskId: "<task-id>", status: "completed" })
 ```
@@ -112,7 +122,7 @@ ELSE:
 |----------|--------|
 | "Continue deeper" | Create new DISCUSS-`<N+1>` task (pending, no blockedBy). Record decision in discussion.md. Proceed to handleSpawnNext |
 | "Adjust direction" | AskUserQuestion for new focus. Create ANALYZE-fix-`<N>` task (pending). Create DISCUSS-`<N+1>` task (pending, blockedBy ANALYZE-fix-`<N>`). Record direction change in discussion.md. Proceed to handleSpawnNext |
-| "Done" | Create SYNTH-001 task (pending, blockedBy last DISCUSS). Record decision in discussion.md. Proceed to handleSpawnNext |
+| "Done" | Check if SYNTH-001 already exists (from dispatch): if yes, ensure blockedBy is updated to reference last DISCUSS task; if no, create SYNTH-001 (pending, blockedBy last DISCUSS). Record decision in discussion.md. Proceed to handleSpawnNext |
 **Dynamic task creation templates**:
@@ -160,8 +170,11 @@ InnerLoop: false"
 TaskUpdate({ taskId: "ANALYZE-fix-<N>", owner: "analyst" })
 ```
-SYNTH-001 (created dynamically in deep mode):
+SYNTH-001 (created dynamically — check existence first):
 ```
+// Guard: only create if SYNTH-001 doesn't exist yet (dispatch may have pre-created it)
+const existingSynth = TaskList().find(t => t.subject === 'SYNTH-001')
+if (!existingSynth) {
 TaskCreate({
 subject: "SYNTH-001",
 description: "PURPOSE: Integrate all analysis into final conclusions | Success: Executive summary with recommendations
@@ -179,6 +192,8 @@ CONSTRAINTS: Pure integration, no new exploration
 ---
 InnerLoop: false"
 })
+}
+// Always update blockedBy to reference the last DISCUSS task (whether pre-existing or newly created)
 TaskUpdate({ taskId: "SYNTH-001", addBlockedBy: ["<last-DISCUSS-task-id>"], owner: "synthesizer" })
 ```
@@ -211,10 +226,10 @@ Find and spawn the next ready tasks.
 | Task Prefix | Role | Role Spec |
 |-------------|------|-----------|
-| `EXPLORE-*` | explorer | `~ or <project>/.claude/skills/team-ultra-analyze/role-specs/explorer.md` |
-| `ANALYZE-*` | analyst | `~ or <project>/.claude/skills/team-ultra-analyze/role-specs/analyst.md` |
-| `DISCUSS-*` | discussant | `~ or <project>/.claude/skills/team-ultra-analyze/role-specs/discussant.md` |
-| `SYNTH-*` | synthesizer | `~ or <project>/.claude/skills/team-ultra-analyze/role-specs/synthesizer.md` |
+| `EXPLORE-*` | explorer | `<skill_root>/roles/explorer/role.md` |
+| `ANALYZE-*` | analyst | `<skill_root>/roles/analyst/role.md` |
+| `DISCUSS-*` | discussant | `<skill_root>/roles/discussant/role.md` |
+| `SYNTH-*` | synthesizer | `<skill_root>/roles/synthesizer/role.md` |
 3. Spawn team-worker for each ready task:
@@ -227,7 +242,7 @@ Agent({
 run_in_background: true,
 prompt: `## Role Assignment
 role: <role>
-role_spec: ~ or <project>/.claude/skills/team-ultra-analyze/role-specs/<role>.md
+role_spec: <skill_root>/roles/<role>/role.md
 session: <session-folder>
 session_id: <session-id>
 team_name: ultra-analyze
@@ -298,11 +313,11 @@ Triggered when all pipeline tasks are completed.
 | deep | All EXPLORE + ANALYZE + all DISCUSS-N + SYNTH-001 completed |
 1. Verify all tasks completed. If any not completed, return to handleSpawnNext
-2. If all completed, transition to coordinator Phase 5
+2. If all completed, **inline-execute coordinator Phase 5** (shutdown workers → report → completion action). Do NOT STOP here — continue directly into Phase 5 within the same turn.
 ## Phase 4: State Persistence
-After every handler execution:
+After every handler execution **except handleComplete**:
 1. Update session.json with current state:
 - `discussion_round`: current round count
@@ -311,6 +326,8 @@ After every handler execution:
 2. Verify task list consistency (no orphan tasks, no broken dependencies)
 3. **STOP** and wait for next event
+> **handleComplete exception**: handleComplete does NOT STOP — it transitions directly to coordinator Phase 5.
 ## Error Handling
 | Scenario | Resolution |

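The role-tag routing rules in this monitor diff can be sketched as a small classifier; the function and return names are illustrative, not the skill's actual code:

```javascript
// Route an incoming message by its leading role tag. Untagged strings
// (e.g. idle notifications like "Agent X is now idle") are ignored;
// non-string payloads are structured protocol messages handled elsewhere.
function routeMessage(msg) {
  if (typeof msg !== "string") return "unknown";
  if (/^\[(explorer|analyst|discussant|synthesizer)\]/.test(msg)) {
    return "handleCallback";
  }
  if (msg.startsWith("[supervisor]")) return "handleSupervisorReport";
  return "ignore";
}
```

Checking the tag only at the start of the message body keeps idle notifications that merely mention a role name from being misrouted.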

@@ -44,13 +44,21 @@ When coordinator is invoked, detect invocation type:
| Detection | Condition | Handler | | Detection | Condition | Handler |
|-----------|-----------|---------| |-----------|-----------|---------|
| Worker callback | Message contains role tag [explorer], [analyst], [discussant], [synthesizer] | -> handleCallback (monitor.md) | | Worker callback | Message content starts with `[explorer]`, `[analyst]`, `[discussant]`, or `[synthesizer]` (role tag at beginning of message body) | -> handleCallback (monitor.md) |
| Supervisor callback | Message content starts with `[supervisor]` | -> handleSupervisorReport (log checkpoint result, proceed to handleSpawnNext if tasks unblocked) |
| Idle notification | System notification that a teammate went idle (does NOT start with a role tag — typically says "Agent X is now idle") | -> **IGNORE** (do not handleCallback; idle is normal after every turn) |
| Shutdown response | Message content is a JSON object containing `shutdown_response` (parse as structured data, not string) | -> handleShutdownResponse (see Phase 5) |
| Status check | Arguments contain "check" or "status" | -> handleCheck (monitor.md) |
| Manual resume | Arguments contain "resume" or "continue" | -> handleResume (monitor.md) |
| Pipeline complete | All tasks have status "completed" | -> handleComplete (monitor.md) |
| Interrupted session | Active/paused session exists | -> Phase 0 |
| New session | None of above | -> Phase 1 |
**Message format discrimination**:
- **String messages starting with `[<role>]`**: Worker/supervisor completion reports → route to handleCallback or handleSupervisorReport
- **JSON object messages** (contain `type:` field): Structured protocol messages (shutdown_response) → route by `type` field
- **Other strings without role tags**: System idle notifications → IGNORE
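The discrimination rules above can be condensed into a small routing sketch (illustrative Python, not part of the coordinator; handler names mirror the table above):

```python
import json

def route_message(raw: str) -> str:
    """Classify an incoming coordinator message per the rules above."""
    # JSON object messages carry a type field (e.g. shutdown_response).
    try:
        obj = json.loads(raw)
        if isinstance(obj, dict) and obj.get("type") == "shutdown_response":
            return "handleShutdownResponse"
    except (ValueError, TypeError):
        pass
    # String messages starting with a role tag are completion reports.
    if raw.startswith("[supervisor]"):
        return "handleSupervisorReport"
    if any(raw.startswith(f"[{r}]") for r in
           ("explorer", "analyst", "discussant", "synthesizer")):
        return "handleCallback"
    # Anything else (e.g. "Agent X is now idle") is a system notification.
    return "IGNORE"
```

The order matters: JSON parsing is attempted first so a structured shutdown_response is never mistaken for an idle notification.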
For callback/check/resume/complete: load `@commands/monitor.md` and execute matched handler, then STOP.
### Router Implementation
@@ -167,7 +175,31 @@ All subsequent coordination is handled by `commands/monitor.md` handlers trigger
---
## Phase 5: Shutdown Workers + Report + Completion Action
### Shutdown All Workers
Before reporting, gracefully shut down all active teammates. This is a **multi-turn** process:
1. Read team config: `~/.claude/teams/ultra-analyze/config.json`
2. Build shutdown tracking list: `pending_shutdown = [<all member names except coordinator>]`
3. For each member in pending_shutdown, send shutdown request:
```javascript
SendMessage({
to: "<member-name>",
message: { type: "shutdown_request", reason: "Pipeline complete" }
})
```
4. **STOP** — wait for responses. Each `shutdown_response` triggers a new coordinator turn.
5. On each subsequent turn (shutdown_response received):
- Remove responder from `pending_shutdown`
- If `pending_shutdown` is empty → proceed to **Report** section below
- If not empty → **STOP** again, wait for remaining responses
6. If a member is unresponsive after 2 follow-ups, remove from tracking and proceed
**Note**: Workers that completed Phase 5-F and reached STOP may have already terminated. SendMessage to a terminated agent is silently ignored — this is safe. Only resident agents (e.g., supervisor) require explicit shutdown.
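The bookkeeping in steps 4-6 reduces to removing each responder from a tracking list and checking emptiness; a minimal sketch (the helper is hypothetical, not actual coordinator code):

```python
def on_shutdown_response(pending_shutdown: list[str], responder: str) -> str:
    """One coordinator turn: remove the responder, decide the next action."""
    if responder in pending_shutdown:
        pending_shutdown.remove(responder)
    # Empty list means every teammate has acknowledged shutdown.
    return "proceed_to_report" if not pending_shutdown else "stop_and_wait"
```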
### Report
1. Load session state -> count completed tasks, calculate duration
2. List deliverables:


@@ -157,8 +157,7 @@ team_msg(operation="log", session_id=<session-id>, from="tester",
If pass rate < 95%, send fix_required message:
```
SendMessage({
to: "coordinator",
message: "[tester] Test validation incomplete. Pass rate: <percentage>%. Manual review needed."
})
```


@@ -30,7 +30,6 @@ Every task description uses structured format for clarity:
```
TaskCreate({
subject: "<TASK-ID>",
description: "PURPOSE: <what this task achieves> | Success: <measurable completion criteria>
TASK: TASK:
- <step 1: specific action> - <step 1: specific action>
@@ -46,9 +45,9 @@ EXPECTED: <deliverable path> + <quality criteria>
CONSTRAINTS: <scope limits, focus areas> CONSTRAINTS: <scope limits, focus areas>
--- ---
InnerLoop: <true|false>
<additional-metadata-fields>"
})
TaskUpdate({ taskId: "<TASK-ID>", addBlockedBy: [<dependency-list>], owner: "<role>" })
```
### Standard Pipeline Tasks
@@ -57,7 +56,6 @@ InnerLoop: <true|false>
```
TaskCreate({
subject: "SCAN-001",
description: "PURPOSE: Scan UI components to identify interaction issues (unresponsive buttons, missing feedback, state not refreshing) | Success: Complete issue report with file:line references and severity classification
TASK: TASK:
- Detect framework (React/Vue) from project structure - Detect framework (React/Vue) from project structure
@@ -73,17 +71,15 @@ CONTEXT:
EXPECTED: artifacts/scan-report.md with structured issue list (severity: High/Medium/Low, file:line, description, category)
CONSTRAINTS: Focus on interaction issues only, exclude styling/layout problems
---
InnerLoop: false"
})
TaskUpdate({ taskId: "SCAN-001", owner: "scanner" })
```
**DIAG-001: Root Cause Diagnosis**
```
TaskCreate({
subject: "DIAG-001",
description: "PURPOSE: Diagnose root causes of identified UI issues | Success: Complete diagnosis report with fix recommendations for each issue
TASK: TASK:
- Load scan report from artifacts/scan-report.md - Load scan report from artifacts/scan-report.md
@@ -100,17 +96,15 @@ CONTEXT:
EXPECTED: artifacts/diagnosis.md with root cause analysis (issue ID, root cause, pattern type, fix recommendation)
CONSTRAINTS: Focus on actionable root causes, provide specific fix strategies
---
InnerLoop: false"
})
TaskUpdate({ taskId: "DIAG-001", addBlockedBy: ["SCAN-001"], owner: "diagnoser" })
```
**DESIGN-001: Solution Design**
```
TaskCreate({
subject: "DESIGN-001",
description: "PURPOSE: Design feedback mechanisms and state management solutions for identified issues | Success: Complete implementation guide with code patterns and examples
TASK: TASK:
- Load diagnosis report from artifacts/diagnosis.md - Load diagnosis report from artifacts/diagnosis.md
@@ -128,17 +122,15 @@ CONTEXT:
EXPECTED: artifacts/design-guide.md with implementation guide (issue ID, solution design, code patterns, state management examples, UI binding templates)
CONSTRAINTS: Solutions must be framework-appropriate, provide complete working examples
---
InnerLoop: false"
})
TaskUpdate({ taskId: "DESIGN-001", addBlockedBy: ["DIAG-001"], owner: "designer" })
```
**IMPL-001: Code Implementation**
```
TaskCreate({
subject: "IMPL-001",
description: "PURPOSE: Generate fix code with proper state management, event handling, and UI feedback bindings | Success: All fixes implemented and validated
TASK: TASK:
- Load design guide from artifacts/design-guide.md - Load design guide from artifacts/design-guide.md
@@ -158,17 +150,15 @@ CONTEXT:
EXPECTED: artifacts/fixes/ directory with all fix files, implementation summary in artifacts/fixes/README.md
CONSTRAINTS: Maintain existing code style, ensure backward compatibility, validate all changes
---
InnerLoop: true"
})
TaskUpdate({ taskId: "IMPL-001", addBlockedBy: ["DESIGN-001"], owner: "implementer" })
```
**TEST-001: Test Validation**
```
TaskCreate({
subject: "TEST-001",
description: "PURPOSE: Generate and run tests to verify fixes (loading states, error handling, state updates) | Success: Pass rate >= 95%, all critical fixes validated
TASK: TASK:
- Detect test framework (Jest/Vitest) from project - Detect test framework (Jest/Vitest) from project
@@ -187,10 +177,9 @@ CONTEXT:
EXPECTED: artifacts/test-report.md with test results (pass/fail counts, coverage metrics, fix iterations, remaining issues)
CONSTRAINTS: Pass rate threshold: 95%, max fix iterations: 5
---
InnerLoop: false"
})
TaskUpdate({ taskId: "TEST-001", addBlockedBy: ["IMPL-001"], owner: "tester" })
```
---


@@ -124,6 +124,19 @@ def create_config_from_env(db_path: str | Path, **overrides: object) -> "Config"
kwargs["hnsw_ef"] = int(os.environ["CODEXLENS_HNSW_EF"])
if os.environ.get("CODEXLENS_HNSW_M"):
kwargs["hnsw_M"] = int(os.environ["CODEXLENS_HNSW_M"])
# Tier config from env
if os.environ.get("CODEXLENS_TIER_HOT_HOURS"):
kwargs["tier_hot_hours"] = int(os.environ["CODEXLENS_TIER_HOT_HOURS"])
if os.environ.get("CODEXLENS_TIER_COLD_HOURS"):
kwargs["tier_cold_hours"] = int(os.environ["CODEXLENS_TIER_COLD_HOURS"])
# Search quality tier from env
if os.environ.get("CODEXLENS_SEARCH_QUALITY"):
kwargs["default_search_quality"] = os.environ["CODEXLENS_SEARCH_QUALITY"]
# Shard config from env
if os.environ.get("CODEXLENS_NUM_SHARDS"):
kwargs["num_shards"] = int(os.environ["CODEXLENS_NUM_SHARDS"])
if os.environ.get("CODEXLENS_MAX_LOADED_SHARDS"):
kwargs["max_loaded_shards"] = int(os.environ["CODEXLENS_MAX_LOADED_SHARDS"])
resolved = Path(db_path).resolve()
kwargs["metadata_db_path"] = str(resolved / "metadata.db")
return Config(**kwargs)
@@ -143,28 +156,8 @@ def _create_config(args: argparse.Namespace) -> "Config":
return create_config_from_env(args.db_path, **overrides)
def _create_embedder(config: "Config"):
"""Create embedder based on config, auto-detecting embed_dim from API."""
if config.embed_api_url:
from codexlens_search.embed.api import APIEmbedder
embedder = APIEmbedder(config)
@@ -179,13 +172,11 @@ def create_pipeline(
else:
from codexlens_search.embed.local import FastEmbedEmbedder
embedder = FastEmbedEmbedder(config)
return embedder
def _create_reranker(config: "Config"):
"""Create reranker based on config."""
if config.reranker_api_url:
from codexlens_search.rerank.api import APIReranker
reranker = APIReranker(config)
@@ -193,6 +184,60 @@ def create_pipeline(
else:
from codexlens_search.rerank.local import FastEmbedReranker
reranker = FastEmbedReranker(config)
return reranker
def create_pipeline(
db_path: str | Path,
config: "Config | None" = None,
) -> tuple:
"""Construct pipeline components from db_path and config.
Returns (indexing_pipeline, search_pipeline, config).
Used by both CLI bridge and MCP server.
When config.num_shards > 1, returns a ShardManager-backed pipeline
where indexing and search are delegated to the ShardManager.
The returned tuple is (shard_manager, shard_manager, config) so that
callers can use shard_manager.sync() and shard_manager.search().
"""
from codexlens_search.config import Config
if config is None:
config = create_config_from_env(db_path)
resolved = Path(db_path).resolve()
resolved.mkdir(parents=True, exist_ok=True)
embedder = _create_embedder(config)
reranker = _create_reranker(config)
# Sharded mode: delegate to ShardManager
if config.num_shards > 1:
from codexlens_search.core.shard_manager import ShardManager
manager = ShardManager(
num_shards=config.num_shards,
db_path=resolved,
config=config,
embedder=embedder,
reranker=reranker,
)
log.info(
"Using ShardManager with %d shards (max_loaded=%d)",
config.num_shards, config.max_loaded_shards,
)
return manager, manager, config
# Single-shard mode: original behavior, no ShardManager overhead
from codexlens_search.core.factory import create_ann_index, create_binary_index
from codexlens_search.indexing.metadata import MetadataStore
from codexlens_search.indexing.pipeline import IndexingPipeline
from codexlens_search.search.fts import FTSEngine
from codexlens_search.search.pipeline import SearchPipeline
binary_store = create_binary_index(resolved, config.embed_dim, config)
ann_index = create_ann_index(resolved, config.embed_dim, config)
fts = FTSEngine(resolved / "fts.db")
metadata = MetadataStore(resolved / "metadata.db")
indexing = IndexingPipeline(
embedder=embedder,


@@ -47,7 +47,7 @@ class Config:
# Backend selection: 'auto', 'faiss', 'hnswlib'
ann_backend: str = "auto"
binary_backend: str = "faiss"
# Indexing pipeline
index_workers: int = 2  # number of parallel indexing workers
@@ -77,6 +77,17 @@ class Config:
# Metadata store
metadata_db_path: str = ""  # empty = no metadata tracking
# Data tiering (hot/warm/cold)
tier_hot_hours: int = 24 # files accessed within this window are 'hot'
tier_cold_hours: int = 168 # files not accessed for this long are 'cold'
# Search quality tier: 'fast', 'balanced', 'thorough', 'auto'
default_search_quality: str = "auto"
# Shard partitioning
num_shards: int = 1 # 1 = single partition (no sharding), >1 = hash-based sharding
max_loaded_shards: int = 4 # LRU limit for loaded shards in ShardManager
# FTS
fts_top_k: int = 50


@@ -15,6 +15,13 @@ logger = logging.getLogger(__name__)
class BinaryStore(BaseBinaryIndex):
"""Persistent binary vector store using numpy memmap.
.. deprecated::
Prefer ``FAISSBinaryIndex`` for binary coarse search. This class is
retained as a numpy-only fallback for environments where FAISS is not
available. New code should use ``create_binary_index()`` from
``codexlens_search.core.factory`` which selects the best backend
automatically.
Stores binary-quantized float32 vectors as packed uint8 arrays on disk.
Supports fast coarse search via XOR + popcount Hamming distance.
"""


@@ -1,6 +1,7 @@
from __future__ import annotations
import logging
import warnings
from pathlib import Path
from codexlens_search.config import Config
@@ -97,14 +98,29 @@ def create_binary_index(
backend = config.binary_backend
if backend == "faiss":
if _FAISS_AVAILABLE:
from codexlens_search.core.faiss_index import FAISSBinaryIndex
return FAISSBinaryIndex(path, dim, config)
# FAISS explicitly requested but not installed: fall back with warning
from codexlens_search.core.binary import BinaryStore
warnings.warn(
"binary_backend='faiss' but FAISS is not installed. "
"Falling back to deprecated numpy BinaryStore. "
"Install faiss-cpu or faiss-gpu for the recommended binary backend.",
DeprecationWarning,
stacklevel=2,
)
logger.warning(
"binary_backend='faiss' but FAISS not available, "
"falling back to deprecated numpy BinaryStore."
)
return BinaryStore(path, dim, config)
if backend == "hnswlib":
from codexlens_search.core.binary import BinaryStore
return BinaryStore(path, dim, config)
# auto: try faiss first, then numpy-based BinaryStore (deprecated fallback)
if _FAISS_AVAILABLE:
from codexlens_search.core.faiss_index import FAISSBinaryIndex
logger.info("Auto-selected FAISS binary backend")
@@ -112,5 +128,14 @@ def create_binary_index(
# numpy BinaryStore is always available (no extra deps)
from codexlens_search.core.binary import BinaryStore
warnings.warn(
"Falling back to numpy BinaryStore because FAISS is not installed. "
"BinaryStore is deprecated; install faiss-cpu or faiss-gpu for better performance.",
DeprecationWarning,
stacklevel=2,
)
logger.warning(
"FAISS not available, falling back to deprecated numpy BinaryStore. "
"Install faiss-cpu or faiss-gpu for the recommended binary backend."
)
return BinaryStore(path, dim, config)
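The degraded path is observable with the stdlib warnings machinery; below is a self-contained sketch of the same warn-on-fallback pattern (pick_backend is illustrative, not the real factory function):

```python
import warnings

def pick_backend(faiss_available: bool) -> str:
    """Mimic the auto-selection: warn when degrading to the numpy store."""
    if faiss_available:
        return "faiss"
    warnings.warn(
        "Falling back to numpy BinaryStore because FAISS is not installed.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "numpy"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    backend = pick_backend(faiss_available=False)
```

Emitting DeprecationWarning (catchable, filterable) alongside logger.warning means both test suites and log pipelines see the degradation.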


@@ -71,10 +71,23 @@ class FAISSANNIndex(BaseANNIndex):
self.load()
def load(self) -> None:
"""Load index from disk or initialize a fresh one.
Uses IO_FLAG_MMAP for zero-copy memory-mapped loading when available,
falling back to regular read_index() on older faiss versions.
"""
with self._lock:
if self._index_path.exists():
try:
idx = faiss.read_index(
str(self._index_path), faiss.IO_FLAG_MMAP
)
except Exception as exc:  # older faiss: IO_FLAG_MMAP missing or unsupported
logger.debug(
"MMAP load failed, falling back to regular read: %s",
exc,
)
idx = faiss.read_index(str(self._index_path))
logger.debug(
"Loaded FAISS ANN index from %s (%d items)",
self._index_path, idx.ntotal,
@@ -201,10 +214,23 @@ class FAISSBinaryIndex(BaseBinaryIndex):
return np.packbits(binary).reshape(1, -1)
def load(self) -> None:
"""Load binary index from disk or initialize a fresh one.
Uses IO_FLAG_MMAP for zero-copy memory-mapped loading when available,
falling back to regular read_index_binary() on older faiss versions.
"""
with self._lock:
if self._index_path.exists():
try:
idx = faiss.read_index_binary(
str(self._index_path), faiss.IO_FLAG_MMAP
)
except Exception as exc:  # older faiss: IO_FLAG_MMAP missing or unsupported
logger.debug(
"MMAP load failed, falling back to regular read: %s",
exc,
)
idx = faiss.read_index_binary(str(self._index_path))
logger.debug(
"Loaded FAISS binary index from %s (%d items)",
self._index_path, idx.ntotal,


@@ -0,0 +1,178 @@
"""Single index partition (shard) that owns FTS, binary, ANN, and metadata stores."""
from __future__ import annotations
import logging
from pathlib import Path
from codexlens_search.config import Config
from codexlens_search.core.base import BaseANNIndex, BaseBinaryIndex
from codexlens_search.embed.base import BaseEmbedder
from codexlens_search.indexing.metadata import MetadataStore
from codexlens_search.indexing.pipeline import IndexingPipeline, IndexStats
from codexlens_search.rerank import BaseReranker
from codexlens_search.search.fts import FTSEngine
from codexlens_search.search.pipeline import SearchPipeline, SearchResult
logger = logging.getLogger(__name__)
class Shard:
"""A complete index partition with its own FTS, binary, ANN, and metadata stores.
Components are lazy-loaded on first access and can be explicitly unloaded
to release memory. The embedder and reranker are shared across shards
(passed in from ShardManager) since they are expensive to instantiate.
"""
def __init__(
self,
shard_id: int,
db_path: str | Path,
config: Config,
) -> None:
self._shard_id = shard_id
self._shard_dir = Path(db_path).resolve() / f"shard_{shard_id}"
self._config = config
# Lazy-loaded components (created on _ensure_loaded)
self._fts: FTSEngine | None = None
self._binary_store: BaseBinaryIndex | None = None
self._ann_index: BaseANNIndex | None = None
self._metadata: MetadataStore | None = None
self._indexing: IndexingPipeline | None = None
self._search: SearchPipeline | None = None
self._loaded = False
@property
def shard_id(self) -> int:
return self._shard_id
@property
def is_loaded(self) -> bool:
return self._loaded
def _ensure_loaded(
self,
embedder: BaseEmbedder,
reranker: BaseReranker,
) -> None:
"""Lazy-create all per-shard components if not yet loaded."""
if self._loaded:
return
from codexlens_search.core.factory import create_ann_index, create_binary_index
self._shard_dir.mkdir(parents=True, exist_ok=True)
self._fts = FTSEngine(self._shard_dir / "fts.db")
self._binary_store = create_binary_index(
self._shard_dir, self._config.embed_dim, self._config
)
self._ann_index = create_ann_index(
self._shard_dir, self._config.embed_dim, self._config
)
self._metadata = MetadataStore(self._shard_dir / "metadata.db")
self._indexing = IndexingPipeline(
embedder=embedder,
binary_store=self._binary_store,
ann_index=self._ann_index,
fts=self._fts,
config=self._config,
metadata=self._metadata,
)
self._search = SearchPipeline(
embedder=embedder,
binary_store=self._binary_store,
ann_index=self._ann_index,
reranker=reranker,
fts=self._fts,
config=self._config,
metadata_store=self._metadata,
)
self._loaded = True
logger.debug("Shard %d loaded from %s", self._shard_id, self._shard_dir)
def unload(self) -> None:
"""Release memory by closing connections and dropping references."""
if not self._loaded:
return
if self._metadata is not None:
self._metadata.close()
self._fts = None
self._binary_store = None
self._ann_index = None
self._metadata = None
self._indexing = None
self._search = None
self._loaded = False
logger.debug("Shard %d unloaded", self._shard_id)
def load(
self,
embedder: BaseEmbedder,
reranker: BaseReranker,
) -> None:
"""Explicitly load shard components."""
self._ensure_loaded(embedder, reranker)
def save(self) -> None:
"""Persist binary and ANN indexes to disk."""
if not self._loaded:
return
if self._binary_store is not None:
self._binary_store.save()
if self._ann_index is not None:
self._ann_index.save()
def search(
self,
query: str,
embedder: BaseEmbedder,
reranker: BaseReranker,
quality: str | None = None,
top_k: int | None = None,
) -> list[SearchResult]:
"""Search this shard's index.
Args:
query: Search query string.
embedder: Shared embedder instance.
reranker: Shared reranker instance.
quality: Search quality tier.
top_k: Maximum results to return.
Returns:
List of SearchResult from this shard.
"""
self._ensure_loaded(embedder, reranker)
assert self._search is not None
return self._search.search(query, top_k=top_k, quality=quality)
def sync(
self,
files: list[Path],
root: Path | None,
embedder: BaseEmbedder,
reranker: BaseReranker,
**kwargs: object,
) -> IndexStats:
"""Sync this shard's index with the given files.
Args:
files: Files that belong to this shard.
root: Root directory for relative paths.
embedder: Shared embedder instance.
reranker: Shared reranker instance.
**kwargs: Forwarded to IndexingPipeline.sync().
Returns:
IndexStats for this shard's sync operation.
"""
self._ensure_loaded(embedder, reranker)
assert self._indexing is not None
return self._indexing.sync(files, root=root, **kwargs)


@@ -0,0 +1,250 @@
"""ShardManager: manages multiple Shard instances with LRU eviction."""
from __future__ import annotations
import logging
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from codexlens_search.config import Config
from codexlens_search.core.shard import Shard
from codexlens_search.embed.base import BaseEmbedder
from codexlens_search.indexing.pipeline import IndexStats
from codexlens_search.rerank import BaseReranker
from codexlens_search.search.fusion import reciprocal_rank_fusion
from codexlens_search.search.pipeline import SearchResult
logger = logging.getLogger(__name__)
class ShardManager:
"""Manages multiple Shard instances with hash-based file routing and LRU eviction.
Files are deterministically routed to shards via hash(path) % num_shards.
Search queries all shards in parallel and merges results via RRF fusion.
At most max_loaded_shards are kept in memory; least-recently-used shards
are unloaded when the limit is exceeded.
"""
def __init__(
self,
num_shards: int,
db_path: str | Path,
config: Config,
embedder: BaseEmbedder,
reranker: BaseReranker,
) -> None:
if num_shards < 1:
raise ValueError("num_shards must be >= 1")
self._num_shards = num_shards
self._db_path = Path(db_path).resolve()
self._config = config
self._embedder = embedder
self._reranker = reranker
self._max_loaded = config.max_loaded_shards
# Create all Shard objects (lazy-loaded, no I/O yet)
self._shards: dict[int, Shard] = {
i: Shard(i, self._db_path, config)
for i in range(num_shards)
}
# LRU tracking: keys are shard_ids, most-recently-used at end
self._loaded_order: OrderedDict[int, None] = OrderedDict()
self._lru_lock = threading.Lock()
@property
def num_shards(self) -> int:
return self._num_shards
def route_file(self, path: str) -> int:
"""Deterministically route a file path to a shard ID.
Uses a process-stable FNV-1a digest modulo num_shards. Built-in
hash() is avoided because string hashing is salted per interpreter
(PYTHONHASHSEED), which would silently re-route files between runs.
"""
h = 0xCBF29CE484222325
for byte in path.encode("utf-8"):
h = ((h ^ byte) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
return h % self._num_shards
def get_shard(self, shard_id: int) -> Shard:
"""Return the Shard instance for a given shard_id."""
if shard_id not in self._shards:
raise ValueError(
f"Invalid shard_id {shard_id}, valid range: 0-{self._num_shards - 1}"
)
return self._shards[shard_id]
def _ensure_loaded(self, shard_id: int) -> Shard:
"""Load a shard if needed, applying LRU eviction policy.
Thread-safe: protects OrderedDict mutations with a lock.
Returns the loaded Shard.
"""
shard = self._shards[shard_id]
with self._lru_lock:
# Mark as most-recently-used
if shard_id in self._loaded_order:
self._loaded_order.move_to_end(shard_id)
else:
self._loaded_order[shard_id] = None
# Load if not already loaded
if not shard.is_loaded:
shard.load(self._embedder, self._reranker)
# Evict LRU shards if over limit
while len(self._loaded_order) > self._max_loaded:
evict_id, _ = self._loaded_order.popitem(last=False)
evict_shard = self._shards[evict_id]
if evict_shard.is_loaded:
logger.info("LRU evicting shard %d", evict_id)
evict_shard.unload()
return shard
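The OrderedDict bookkeeping in `_ensure_loaded` is a standard LRU: touching a key moves it to the back; eviction pops from the front. Isolated as a standalone sketch of just the ordering logic:

```python
from collections import OrderedDict

def touch_and_evict(order: OrderedDict, key: int, limit: int) -> list[int]:
    """Mark key as most-recently-used; return the ids evicted over limit."""
    if key in order:
        order.move_to_end(key)  # recently used -> back of the dict
    else:
        order[key] = None
    evicted = []
    while len(order) > limit:
        evicted.append(order.popitem(last=False)[0])  # pop least-recent
    return evicted
```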
def sync(
self,
files: list[Path],
root: Path | None = None,
**kwargs: object,
) -> IndexStats:
"""Sync index with files, routing each file to its shard.
Groups files by shard via route_file(), then syncs each shard
with its subset of files.
Args:
files: Current list of files to index.
root: Root directory for relative paths.
**kwargs: Forwarded to Shard.sync().
Returns:
Aggregated IndexStats across all shards.
"""
# Group files by shard
shard_files: dict[int, list[Path]] = {i: [] for i in range(self._num_shards)}
for fpath in files:
rel = str(fpath.relative_to(root)) if root else str(fpath)
shard_id = self.route_file(rel)
shard_files[shard_id].append(fpath)
total_files = 0
total_chunks = 0
total_duration = 0.0
for shard_id, shard_file_list in shard_files.items():
if not shard_file_list:
continue
self._ensure_loaded(shard_id)
shard = self._shards[shard_id]
stats = shard.sync(
shard_file_list,
root=root,
embedder=self._embedder,
reranker=self._reranker,
**kwargs,
)
total_files += stats.files_processed
total_chunks += stats.chunks_created
total_duration += stats.duration_seconds
return IndexStats(
files_processed=total_files,
chunks_created=total_chunks,
duration_seconds=round(total_duration, 2),
)
def search(
self,
query: str,
quality: str | None = None,
top_k: int | None = None,
) -> list[SearchResult]:
"""Search all shards in parallel, merge results via RRF fusion.
Each shard returns its own ranked results. Cross-shard merging
uses reciprocal_rank_fusion with equal weights across shards.
Per-shard top_k is increased to compensate for cross-shard dilution.
Args:
query: Search query string.
quality: Search quality tier.
top_k: Maximum final results to return.
Returns:
Merged list of SearchResult ordered by relevance.
"""
cfg = self._config
final_top_k = top_k if top_k is not None else cfg.reranker_top_k
# Increase per-shard top_k to get enough candidates for cross-shard RRF
per_shard_top_k = final_top_k * 2
# Load all shards for search
for shard_id in range(self._num_shards):
self._ensure_loaded(shard_id)
# Parallel search across shards
shard_results: dict[int, list[SearchResult]] = {}
def _search_shard(sid: int) -> tuple[int, list[SearchResult]]:
shard = self._shards[sid]
results = shard.search(
query,
embedder=self._embedder,
reranker=self._reranker,
quality=quality,
top_k=per_shard_top_k,
)
return sid, results
with ThreadPoolExecutor(max_workers=min(self._num_shards, 4)) as pool:
futures = [pool.submit(_search_shard, sid) for sid in range(self._num_shards)]
for future in futures:
try:
sid, results = future.result()
shard_results[sid] = results
except Exception:
logger.warning("Shard search failed", exc_info=True)
# If only one shard returned results, no merging needed
non_empty = {k: v for k, v in shard_results.items() if v}
if not non_empty:
return []
if len(non_empty) == 1:
results = list(non_empty.values())[0]
return results[:final_top_k]
# Cross-shard RRF merge
# Build ranked lists keyed by shard name, with (doc_id, score) tuples
# Use a global result map to look up SearchResult by a unique key
# Since doc_ids are shard-local, we need a composite key
rrf_input: dict[str, list[tuple[int, float]]] = {}
global_results: dict[int, SearchResult] = {}
global_id = 0
for sid, results in non_empty.items():
ranked: list[tuple[int, float]] = []
for r in results:
global_results[global_id] = r
ranked.append((global_id, r.score))
global_id += 1
rrf_input[f"shard_{sid}"] = ranked
fused = reciprocal_rank_fusion(rrf_input, k=cfg.fusion_k)
merged: list[SearchResult] = []
for gid, fused_score in fused[:final_top_k]:
result = global_results[gid]
merged.append(SearchResult(
id=result.id,
path=result.path,
score=fused_score,
snippet=result.snippet,
line=result.line,
end_line=result.end_line,
content=result.content,
))
return merged
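The cross-shard merge above remaps shard-local results to globally unique IDs, then fuses the per-shard rankings. A minimal sketch — `reciprocal_rank_fusion` here is a simplified, unweighted stand-in for the project's implementation:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    # RRF: each doc scores sum(1 / (k + rank)) over the lists it appears in;
    # the input scores themselves are ignored, only ranks matter.
    scores = {}
    for ranked in ranked_lists.values():
        for rank, (doc_id, _score) in enumerate(ranked, 1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Shard-local results already remapped to globally unique IDs, as in search()
rrf_input = {
    "shard_0": [(0, 0.9), (1, 0.4)],  # global ids 0, 1
    "shard_1": [(2, 0.8)],            # global id 2
}
fused = reciprocal_rank_fusion(rrf_input, k=60)
```

Both rank-1 documents (ids 0 and 2) tie at 1/61, while id 1 at rank 2 scores 1/62 and sorts last.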

View File

@@ -2,6 +2,7 @@
from __future__ import annotations
import sqlite3
import time
from pathlib import Path
@@ -9,7 +10,8 @@ class MetadataStore:
"""Tracks file-to-chunk mappings and deleted chunk IDs (tombstones). """Tracks file-to-chunk mappings and deleted chunk IDs (tombstones).
Tables: Tables:
files - file_path (PK), content_hash, last_modified files - file_path (PK), content_hash, last_modified, file_size,
tier ('hot'/'warm'/'cold'), last_accessed (epoch float)
chunks - chunk_id (PK), file_path (FK CASCADE), chunk_hash chunks - chunk_id (PK), file_path (FK CASCADE), chunk_hash
deleted_chunks - chunk_id (PK) for tombstone tracking deleted_chunks - chunk_id (PK) for tombstone tracking
""" """
@@ -19,13 +21,18 @@ class MetadataStore:
self._conn.execute("PRAGMA foreign_keys = ON") self._conn.execute("PRAGMA foreign_keys = ON")
self._conn.execute("PRAGMA journal_mode = WAL") self._conn.execute("PRAGMA journal_mode = WAL")
self._create_tables() self._create_tables()
self._migrate_size_column()
self._migrate_tier_columns()
def _create_tables(self) -> None: def _create_tables(self) -> None:
self._conn.executescript(""" self._conn.executescript("""
CREATE TABLE IF NOT EXISTS files ( CREATE TABLE IF NOT EXISTS files (
file_path TEXT PRIMARY KEY, file_path TEXT PRIMARY KEY,
content_hash TEXT NOT NULL, content_hash TEXT NOT NULL,
last_modified REAL NOT NULL last_modified REAL NOT NULL,
file_size INTEGER NOT NULL DEFAULT 0,
tier TEXT NOT NULL DEFAULT 'warm',
last_accessed REAL
); );
CREATE TABLE IF NOT EXISTS chunks ( CREATE TABLE IF NOT EXISTS chunks (
@@ -41,14 +48,48 @@ class MetadataStore:
""") """)
self._conn.commit() self._conn.commit()
def _migrate_size_column(self) -> None:
"""Add file_size column if missing (for pre-existing DBs)."""
cols = {
row[1]
for row in self._conn.execute("PRAGMA table_info(files)").fetchall()
}
if "file_size" not in cols:
self._conn.execute(
"ALTER TABLE files ADD COLUMN file_size INTEGER NOT NULL DEFAULT 0"
)
self._conn.commit()
def _migrate_tier_columns(self) -> None:
"""Add tier and last_accessed columns if missing (for pre-existing DBs)."""
cols = {
row[1]
for row in self._conn.execute("PRAGMA table_info(files)").fetchall()
}
if "tier" not in cols:
self._conn.execute(
"ALTER TABLE files ADD COLUMN tier TEXT NOT NULL DEFAULT 'warm'"
)
if "last_accessed" not in cols:
self._conn.execute(
"ALTER TABLE files ADD COLUMN last_accessed REAL"
)
if "tier" not in cols or "last_accessed" not in cols:
self._conn.commit()
def register_file(
self,
file_path: str,
content_hash: str,
mtime: float,
file_size: int = 0,
) -> None:
"""Insert or update a file record."""
self._conn.execute(
"INSERT OR REPLACE INTO files "
"(file_path, content_hash, last_modified, file_size) "
"VALUES (?, ?, ?, ?)",
(file_path, content_hash, mtime, file_size),
)
self._conn.commit()
@@ -121,6 +162,24 @@ class MetadataStore:
return True # New file
return stored != content_hash
def file_needs_update_fast(
self, file_path: str, mtime: float, size: int
) -> bool:
"""Fast pre-check using mtime and file size (no content read needed).
Returns True if the file appears changed or is not yet tracked.
When mtime and size both match stored values, the file is assumed
unchanged (~1000x faster than content-hash comparison).
"""
row = self._conn.execute(
"SELECT last_modified, file_size FROM files WHERE file_path = ?",
(file_path,),
).fetchone()
if row is None:
return True # New file
stored_mtime, stored_size = row
return stored_mtime != mtime or stored_size != size
def compact_deleted(self) -> set[int]:
"""Return deleted IDs and clear the deleted_chunks table.
@@ -161,5 +220,81 @@ class MetadataStore:
).fetchone()
return row[0] if row[0] is not None else -1
# ------------------------------------------------------------------
# Tier management
# ------------------------------------------------------------------
def record_access(self, file_path: str) -> None:
"""Update last_accessed timestamp for a file."""
self._conn.execute(
"UPDATE files SET last_accessed = ? WHERE file_path = ?",
(time.time(), file_path),
)
self._conn.commit()
def record_access_batch(self, file_paths: list[str]) -> None:
"""Batch-update last_accessed timestamps for multiple files."""
if not file_paths:
return
now = time.time()
self._conn.executemany(
"UPDATE files SET last_accessed = ? WHERE file_path = ?",
[(now, fp) for fp in file_paths],
)
self._conn.commit()
def classify_tiers(
self, hot_threshold_hours: int = 24, cold_threshold_hours: int = 168
) -> None:
"""Reclassify all files into hot/warm/cold tiers based on last_accessed.
- hot: last_accessed within hot_threshold_hours
- cold: last_accessed older than cold_threshold_hours (or never accessed)
- warm: everything in between
"""
now = time.time()
hot_cutoff = now - hot_threshold_hours * 3600
cold_cutoff = now - cold_threshold_hours * 3600
# Hot: recently accessed
self._conn.execute(
"UPDATE files SET tier = 'hot' "
"WHERE last_accessed IS NOT NULL AND last_accessed >= ?",
(hot_cutoff,),
)
# Cold: not accessed since the cold cutoff, or never accessed
self._conn.execute(
"UPDATE files SET tier = 'cold' "
"WHERE last_accessed IS NULL OR last_accessed < ?",
(cold_cutoff,),
)
# Warm: between hot and cold cutoffs
self._conn.execute(
"UPDATE files SET tier = 'warm' "
"WHERE last_accessed IS NOT NULL "
"AND last_accessed >= ? AND last_accessed < ?",
(cold_cutoff, hot_cutoff),
)
self._conn.commit()
def get_files_by_tier(self, tier: str) -> list[str]:
"""Return file paths in the specified tier ('hot', 'warm', or 'cold')."""
rows = self._conn.execute(
"SELECT file_path FROM files WHERE tier = ?", (tier,)
).fetchall()
return [r[0] for r in rows]
def get_cold_files(self) -> list[str]:
"""Return file paths in the 'cold' tier."""
return self.get_files_by_tier("cold")
def get_file_tier(self, file_path: str) -> str | None:
"""Return the tier for a specific file, or None if not tracked."""
row = self._conn.execute(
"SELECT tier FROM files WHERE file_path = ?", (file_path,)
).fetchone()
return row[0] if row else None
def close(self) -> None:
self._conn.close()

View File

@@ -17,8 +17,7 @@ from pathlib import Path
import numpy as np
from codexlens_search.config import Config
from codexlens_search.core.base import BaseANNIndex, BaseBinaryIndex
from codexlens_search.embed.base import BaseEmbedder
from codexlens_search.indexing.metadata import MetadataStore
from codexlens_search.search.fts import FTSEngine
@@ -100,8 +99,8 @@ class IndexingPipeline:
def __init__(
self,
embedder: BaseEmbedder,
binary_store: BaseBinaryIndex,
ann_index: BaseANNIndex,
fts: FTSEngine,
config: Config,
metadata: MetadataStore | None = None,
@@ -463,6 +462,94 @@ class IndexingPipeline:
meta = self._require_metadata()
return meta.max_chunk_id() + 1
def index_files_fts_only(
self,
files: list[Path],
*,
root: Path | None = None,
max_chunk_chars: int = _DEFAULT_MAX_CHUNK_CHARS,
chunk_overlap: int = _DEFAULT_CHUNK_OVERLAP,
) -> IndexStats:
"""Index files into FTS5 only, without embedding or vector indexing.
Chunks files using the same logic as the full pipeline, then inserts
directly into FTS. No embedding computation, no binary/ANN store writes.
Args:
files: List of file paths to index.
root: Optional root for computing relative paths.
max_chunk_chars: Maximum characters per chunk.
chunk_overlap: Character overlap between consecutive chunks.
Returns:
IndexStats with counts and timing.
"""
if not files:
return IndexStats()
meta = self._require_metadata()
t0 = time.monotonic()
chunk_id = self._next_chunk_id()
files_processed = 0
chunks_created = 0
for fpath in files:
exclude_reason = is_file_excluded(fpath, self._config)
if exclude_reason:
logger.debug("Skipping %s: %s", fpath, exclude_reason)
continue
try:
text = fpath.read_text(encoding="utf-8", errors="replace")
except Exception as exc:
logger.debug("Skipping %s: %s", fpath, exc)
continue
rel_path = str(fpath.relative_to(root)) if root else str(fpath)
content_hash = self._content_hash(text)
# Skip unchanged files
if not meta.file_needs_update(rel_path, content_hash):
continue
# Remove old FTS data if file was previously indexed
if meta.get_file_hash(rel_path) is not None:
meta.mark_file_deleted(rel_path)
self._fts.delete_by_path(rel_path)
file_chunks = self._smart_chunk(text, rel_path, max_chunk_chars, chunk_overlap)
if not file_chunks:
st = fpath.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
continue
files_processed += 1
fts_docs = []
chunk_id_hashes = []
for chunk_text, path, sl, el in file_chunks:
fts_docs.append((chunk_id, path, chunk_text, sl, el))
chunk_id_hashes.append((chunk_id, self._content_hash(chunk_text)))
chunk_id += 1
self._fts.add_documents(fts_docs)
chunks_created += len(fts_docs)
# Register metadata
st = fpath.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
meta.register_chunks(rel_path, chunk_id_hashes)
duration = time.monotonic() - t0
stats = IndexStats(
files_processed=files_processed,
chunks_created=chunks_created,
duration_seconds=round(duration, 2),
)
logger.info(
"FTS-only indexing complete: %d files, %d chunks in %.1fs",
stats.files_processed, stats.chunks_created, stats.duration_seconds,
)
return stats
def index_file(
self,
file_path: Path,
@@ -522,7 +609,8 @@ class IndexingPipeline:
file_chunks = self._smart_chunk(text, rel_path, max_chunk_chars, chunk_overlap)
if not file_chunks:
# Register file with no chunks
st = file_path.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
return IndexStats(
files_processed=1,
duration_seconds=round(time.monotonic() - t0, 2),
@@ -556,7 +644,8 @@ class IndexingPipeline:
self._fts.add_documents(fts_docs)
# Register in metadata
st = file_path.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
chunk_id_hashes = [
(batch_ids[i], self._content_hash(batch_texts[i]))
for i in range(len(batch_ids))
@@ -605,6 +694,7 @@ class IndexingPipeline:
chunk_overlap: int = _DEFAULT_CHUNK_OVERLAP,
max_file_size: int = 50_000,
progress_callback: callable | None = None,
tier: str = "full",
) -> IndexStats:
"""Reconcile index state against a current file list.
@@ -617,6 +707,9 @@ class IndexingPipeline:
max_chunk_chars: Maximum characters per chunk.
chunk_overlap: Character overlap between consecutive chunks.
max_file_size: Skip files larger than this (bytes).
tier: Indexing tier - 'full' (default) runs the full pipeline
with embedding, 'fts_only' runs FTS-only indexing without
embedding or vector stores.
Returns:
Aggregated IndexStats for all operations.
@@ -638,33 +731,72 @@ class IndexingPipeline:
for rel in removed:
self.remove_file(rel)
# Collect files needing update using 4-level detection:
# Level 1: set diff (removed files) - handled above
# Level 2: mtime + size fast pre-check via stat()
# Level 3: content hash only when mtime/size mismatch
files_to_index: list[Path] = []
for rel, fpath in current_rel_paths.items():
# Level 2: stat-based fast check
try:
st = fpath.stat()
except OSError:
continue
if not meta.file_needs_update_fast(rel, st.st_mtime, st.st_size):
# mtime + size match stored values -> skip (no read needed)
continue
# Level 3: mtime/size changed -> verify with content hash
try:
text = fpath.read_text(encoding="utf-8", errors="replace")
except Exception:
continue
content_hash = self._content_hash(text)
if not meta.file_needs_update(rel, content_hash):
# Content unchanged despite mtime/size change -> update metadata only
meta.register_file(rel, content_hash, st.st_mtime, st.st_size)
continue
# File genuinely changed -> remove old data and queue for re-index
if meta.get_file_hash(rel) is not None:
meta.mark_file_deleted(rel)
self._fts.delete_by_path(rel)
files_to_index.append(fpath)
# Sort files by data tier priority: hot first, then warm, then cold
if files_to_index:
_tier_priority = {"hot": 0, "warm": 1, "cold": 2}
def _tier_sort_key(fp: Path) -> int:
rel = str(fp.relative_to(root)) if root else str(fp)
t = meta.get_file_tier(rel)
return _tier_priority.get(t or "warm", 1)
files_to_index.sort(key=_tier_sort_key)
# Reclassify data tiers after sync detection
meta.classify_tiers(
self._config.tier_hot_hours, self._config.tier_cold_hours
)
# Batch index via parallel pipeline or FTS-only
if files_to_index:
if tier == "fts_only":
batch_stats = self.index_files_fts_only(
files_to_index,
root=root,
max_chunk_chars=max_chunk_chars,
chunk_overlap=chunk_overlap,
)
else:
# Full pipeline with embedding
start_id = self._next_chunk_id()
batch_stats = self._index_files_with_metadata(
files_to_index,
root=root,
max_chunk_chars=max_chunk_chars,
chunk_overlap=chunk_overlap,
start_chunk_id=start_id,
progress_callback=progress_callback,
)
total_files = batch_stats.files_processed
total_chunks = batch_stats.chunks_created
else:
@@ -781,7 +913,8 @@ class IndexingPipeline:
file_chunks = self._smart_chunk(text, rel_path, max_chunk_chars, chunk_overlap)
if not file_chunks:
st = fpath.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
continue
files_processed += 1
@@ -806,7 +939,8 @@ class IndexingPipeline:
chunks_created += len(file_chunk_ids)
# Register metadata per file
st = fpath.stat()
meta.register_file(rel_path, content_hash, st.st_mtime, st.st_size)
chunk_id_hashes = [
(cid, self._content_hash(ct)) for cid, ct in file_chunk_ids
]

View File

@@ -102,13 +102,20 @@ def _get_pipelines(project_path: str) -> tuple:
# ---------------------------------------------------------------------------
@mcp.tool()
def search_code(
project_path: str, query: str, top_k: int = 10, quality: str = "auto"
) -> str:
"""Semantic code search with hybrid fusion (vector + FTS + reranking).
Args:
project_path: Absolute path to the project root directory.
query: Natural language or code search query.
top_k: Maximum number of results to return (default 10).
quality: Search quality tier (default "auto"):
- "fast": FTS-only + rerank (no embedding needed, fastest)
- "balanced": FTS + binary coarse search + rerank
- "thorough": Full 2-stage vector + FTS + reranking (best quality)
- "auto": Uses "thorough" if vector index exists, else "fast"
Returns:
Search results as formatted text with file paths, line numbers, scores, and code snippets.
@@ -121,15 +128,75 @@ def search_code(project_path: str, query: str, top_k: int = 10) -> str:
if not (db_path / "metadata.db").exists(): if not (db_path / "metadata.db").exists():
return f"Error: no index found at {db_path}. Run index_project first." return f"Error: no index found at {db_path}. Run index_project first."
valid_qualities = ("fast", "balanced", "thorough", "auto")
if quality not in valid_qualities:
return f"Error: invalid quality '{quality}'. Must be one of: {', '.join(valid_qualities)}"
_, search, _ = _get_pipelines(project_path)
results = search.search(query, top_k=top_k, quality=quality)
if not results:
return "No results found."
lines = []
for i, r in enumerate(results, 1):
lines.append(f"## Result {i} -- {r.path} (L{r.line}-{r.end_line}, score: {r.score:.4f})")
lines.append(f"```\n{r.content}\n```")
lines.append("")
return "\n".join(lines)
@mcp.tool()
def search_scope(
project_path: str,
query: str,
scope_path: str,
top_k: int = 10,
quality: str = "auto",
) -> str:
"""Search within a specific directory scope of a project.
Runs a normal search then filters results to only include files
under the specified scope path.
Args:
project_path: Absolute path to the project root directory.
query: Natural language or code search query.
scope_path: Relative directory path to limit search scope (e.g. "src/auth").
top_k: Maximum number of scoped results to return (default 10).
quality: Search quality tier ("fast", "balanced", "thorough", "auto").
Returns:
Search results filtered to the scope path.
"""
root = Path(project_path).resolve()
if not root.is_dir():
return f"Error: project path not found: {root}"
db_path = _db_path_for_project(project_path)
if not (db_path / "metadata.db").exists():
return f"Error: no index found at {db_path}. Run index_project first."
# Normalize scope path for prefix matching
scope = scope_path.replace("\\", "/").strip("/")
_, search, _ = _get_pipelines(project_path)
# Fetch more results than top_k to account for filtering
all_results = search.search(query, top_k=top_k * 5, quality=quality)
# Filter by scope path prefix
scoped = [
r for r in all_results
if r.path.replace("\\", "/").startswith(scope + "/")
or r.path.replace("\\", "/") == scope
]
if not scoped:
return f"No results found in scope '{scope_path}'."
lines = []
for i, r in enumerate(scoped[:top_k], 1):
lines.append(f"## Result {i} -- {r.path} (L{r.line}-{r.end_line}, score: {r.score:.4f})")
lines.append(f"```\n{r.content}\n```") lines.append(f"```\n{r.content}\n```")
lines.append("") lines.append("")
return "\n".join(lines) return "\n".join(lines)
@@ -275,6 +342,59 @@ async def index_update(
)
@mcp.tool()
def index_scope(
project_path: str,
scope_path: str,
glob_pattern: str = "**/*",
tier: str = "full",
) -> str:
"""Index a specific directory scope within a project.
Useful for quickly indexing a subdirectory (e.g. after editing files
in a specific module) without re-indexing the entire project.
Args:
project_path: Absolute path to the project root directory.
scope_path: Relative directory path to index (e.g. "src/auth").
glob_pattern: Glob pattern for files within scope (default "**/*").
tier: Indexing tier - "full" (default) runs full pipeline with
embedding, "fts_only" indexes text only (faster, no vectors).
Returns:
Indexing summary for the scoped directory.
"""
root = Path(project_path).resolve()
if not root.is_dir():
return f"Error: project path not found: {root}"
scope_dir = root / scope_path
if not scope_dir.is_dir():
return f"Error: scope directory not found: {scope_dir}"
valid_tiers = ("full", "fts_only")
if tier not in valid_tiers:
return f"Error: invalid tier '{tier}'. Must be one of: {', '.join(valid_tiers)}"
indexing, _, _ = _get_pipelines(project_path)
file_paths = [
p for p in scope_dir.glob(glob_pattern)
if p.is_file() and not should_exclude(p.relative_to(root), DEFAULT_EXCLUDES)
]
if not file_paths:
return f"No files found in {scope_path} matching '{glob_pattern}'."
stats = indexing.sync(file_paths, root=root, tier=tier)
tier_label = "FTS-only" if tier == "fts_only" else "full"
return (
f"Indexed {stats.files_processed} files ({tier_label}), "
f"{stats.chunks_created} chunks in {stats.duration_seconds:.1f}s. "
f"Scope: {scope_path}"
)
# ---------------------------------------------------------------------------
# File discovery
# ---------------------------------------------------------------------------
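The "auto" tier documented in `search_code` resolves to a concrete tier at query time. A sketch of that decision; `resolve_quality` is a hypothetical helper (the real routing lives inside `SearchPipeline.search` and `_has_vector_index`):

```python
def resolve_quality(quality: str, has_vector_index: bool) -> str:
    # "auto" picks "thorough" when a vector index exists, otherwise
    # falls back to the FTS-only "fast" tier.
    valid = ("fast", "balanced", "thorough", "auto")
    if quality not in valid:
        raise ValueError(f"invalid quality {quality!r}; must be one of {valid}")
    if quality == "auto":
        return "thorough" if has_vector_index else "fast"
    return quality

chosen = resolve_quality("auto", has_vector_index=False)
```

Explicit tiers pass through unchanged, so callers can always force a specific pipeline.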

View File

@@ -7,7 +7,7 @@ from dataclasses import dataclass
import numpy as np
from ..config import Config
from ..core.base import BaseANNIndex, BaseBinaryIndex
from ..embed import BaseEmbedder
from ..indexing.metadata import MetadataStore
from ..rerank import BaseReranker
@@ -21,6 +21,8 @@ from .fusion import (
_log = logging.getLogger(__name__)
_VALID_QUALITIES = ("fast", "balanced", "thorough", "auto")
@dataclass
class SearchResult:
@@ -37,8 +39,8 @@ class SearchPipeline:
def __init__(
self,
embedder: BaseEmbedder,
binary_store: BaseBinaryIndex,
ann_index: BaseANNIndex,
reranker: BaseReranker,
fts: FTSEngine,
config: Config,
@@ -52,6 +54,15 @@ class SearchPipeline:
self._config = config
self._metadata_store = metadata_store
# -- Helper: check if vector index has data ----------------------------
def _has_vector_index(self) -> bool:
"""Check if the binary store has any indexed entries."""
try:
return len(self._binary_store) > 0
except Exception:
return False
# -- Helper: vector search (binary coarse + ANN fine) -----------------
def _vector_search(
@@ -84,6 +95,21 @@ class SearchPipeline:
]
return vector_results
# -- Helper: binary coarse search only --------------------------------
def _binary_coarse_search(
self, query_vec: np.ndarray
) -> list[tuple[int, float]]:
"""Run binary coarse search only (no ANN fine search)."""
cfg = self._config
candidate_ids, distances = self._binary_store.coarse_search(
query_vec, top_k=cfg.binary_top_k
)
return [
(int(doc_id), float(dist))
for doc_id, dist in zip(candidate_ids, distances)
]
# -- Helper: FTS search (exact + fuzzy) ------------------------------
def _fts_search(
@@ -95,55 +121,12 @@ class SearchPipeline:
fuzzy_results = self._fts.fuzzy_search(query, top_k=cfg.fts_top_k)
return exact_results, fuzzy_results
# -- Helper: filter deleted IDs ---------------------------------------
def _filter_deleted(
self, fused: list[tuple[int, float]]
) -> list[tuple[int, float]]:
"""Remove tombstoned chunk IDs from results."""
if self._metadata_store is not None:
deleted_ids = self._metadata_store.get_deleted_ids()
if deleted_ids:
@@ -152,16 +135,30 @@ class SearchPipeline:
for doc_id, score in fused
if doc_id not in deleted_ids
]
return fused
# -- Helper: rerank and build results ---------------------------------
def _rerank_and_build(
self,
query: str,
fused: list[tuple[int, float]],
final_top_k: int,
use_reranker: bool = True,
) -> list[SearchResult]:
"""Rerank candidates (optionally) and build SearchResult list."""
if not fused:
return []
if use_reranker:
rerank_ids = [doc_id for doc_id, _ in fused[:50]]
contents = [self._fts.get_content(doc_id) for doc_id in rerank_ids]
rerank_scores = self._reranker.score_pairs(query, contents)
ranked = sorted(
zip(rerank_ids, rerank_scores), key=lambda x: x[1], reverse=True
)
else:
ranked = fused
results: list[SearchResult] = []
for doc_id, score in ranked[:final_top_k]:
@@ -179,3 +176,178 @@ class SearchPipeline:
)
)
return results
# -- Helper: record access for tier tracking --------------------------
def _record_access(self, results: list[SearchResult]) -> None:
"""Record file access for data tier tracking."""
if results and self._metadata_store is not None:
unique_paths = list({r.path for r in results})
try:
self._metadata_store.record_access_batch(unique_paths)
except Exception:
_log.debug("Failed to record access for tier tracking", exc_info=True)
# -- Quality-routed search methods ------------------------------------
def _search_fast(
self, query: str, final_top_k: int
) -> list[SearchResult]:
"""FTS-only search with reranking. No embedding needed."""
exact_results, fuzzy_results = self._fts_search(query)
fusion_input: dict[str, list[tuple[int, float]]] = {}
if exact_results:
fusion_input["exact"] = exact_results
if fuzzy_results:
fusion_input["fuzzy"] = fuzzy_results
if not fusion_input:
return []
fused = reciprocal_rank_fusion(
fusion_input, weights={"exact": 0.7, "fuzzy": 0.3},
k=self._config.fusion_k,
)
fused = self._filter_deleted(fused)
return self._rerank_and_build(query, fused, final_top_k, use_reranker=True)
    def _search_balanced(
        self, query: str, final_top_k: int
    ) -> list[SearchResult]:
        """FTS + binary coarse search with RRF fusion and reranking.

        Embeds the query for binary coarse search but skips ANN fine search.
        """
        intent = detect_query_intent(query)
        weights = get_adaptive_weights(intent, self._config.fusion_weights)
        query_vec = self._embedder.embed_single(query)

        # Parallel: binary coarse + FTS
        coarse_results: list[tuple[int, float]] = []
        exact_results: list[tuple[int, float]] = []
        fuzzy_results: list[tuple[int, float]] = []
        with ThreadPoolExecutor(max_workers=2) as pool:
            coarse_future = pool.submit(self._binary_coarse_search, query_vec)
            fts_future = pool.submit(self._fts_search, query)
            try:
                coarse_results = coarse_future.result()
            except Exception:
                _log.warning("Binary coarse search failed", exc_info=True)
            try:
                exact_results, fuzzy_results = fts_future.result()
            except Exception:
                _log.warning("FTS search failed", exc_info=True)

        fusion_input: dict[str, list[tuple[int, float]]] = {}
        if coarse_results:
            fusion_input["vector"] = coarse_results
        if exact_results:
            fusion_input["exact"] = exact_results
        if fuzzy_results:
            fusion_input["fuzzy"] = fuzzy_results
        if not fusion_input:
            return []

        fused = reciprocal_rank_fusion(fusion_input, weights=weights, k=self._config.fusion_k)
        fused = self._filter_deleted(fused)
        return self._rerank_and_build(query, fused, final_top_k, use_reranker=True)
    def _search_thorough(
        self, query: str, final_top_k: int
    ) -> list[SearchResult]:
        """Full 2-stage vector + FTS + reranking pipeline (original behavior)."""
        cfg = self._config
        intent = detect_query_intent(query)
        weights = get_adaptive_weights(intent, cfg.fusion_weights)
        query_vec = self._embedder.embed_single(query)

        # Parallel vector + FTS search
        vector_results: list[tuple[int, float]] = []
        exact_results: list[tuple[int, float]] = []
        fuzzy_results: list[tuple[int, float]] = []
        with ThreadPoolExecutor(max_workers=2) as pool:
            vec_future = pool.submit(self._vector_search, query_vec)
            fts_future = pool.submit(self._fts_search, query)
            try:
                vector_results = vec_future.result()
            except Exception:
                _log.warning("Vector search failed, using empty results", exc_info=True)
            try:
                exact_results, fuzzy_results = fts_future.result()
            except Exception:
                _log.warning("FTS search failed, using empty results", exc_info=True)

        fusion_input: dict[str, list[tuple[int, float]]] = {}
        if vector_results:
            fusion_input["vector"] = vector_results
        if exact_results:
            fusion_input["exact"] = exact_results
        if fuzzy_results:
            fusion_input["fuzzy"] = fuzzy_results
        if not fusion_input:
            return []

        fused = reciprocal_rank_fusion(fusion_input, weights=weights, k=cfg.fusion_k)
        fused = self._filter_deleted(fused)
        return self._rerank_and_build(query, fused, final_top_k, use_reranker=True)
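All three tiers share the same concurrency pattern: run two retrieval legs in parallel and degrade gracefully when one fails, rather than failing the whole query. Stripped of the pipeline specifics, the pattern looks like this (the two callables are stand-ins for the vector and FTS legs):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def run_two_legs(
    leg_a: Callable[[], List],
    leg_b: Callable[[], List],
) -> tuple[List, List]:
    """Run two retrieval callables concurrently; a failed leg yields []."""
    results_a: List = []
    results_b: List = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(leg_a)
        future_b = pool.submit(leg_b)
        # Each future is resolved independently, so one leg's exception
        # never discards the other leg's results.
        try:
            results_a = future_a.result()
        except Exception:
            pass  # in the pipeline this is logged with exc_info=True
        try:
            results_b = future_b.result()
        except Exception:
            pass
    return results_a, results_b
```

Because fusion later skips empty inputs, a failed leg simply drops out of the RRF weighting instead of producing an error.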
    # -- Main search entry point -----------------------------------------

    def search(
        self,
        query: str,
        top_k: int | None = None,
        quality: str | None = None,
    ) -> list[SearchResult]:
        """Search with quality-based routing.

        Args:
            query: Search query string.
            top_k: Maximum results to return.
            quality: Search quality tier:
                - 'fast': FTS-only + rerank (no embedding, no vector search)
                - 'balanced': FTS + binary coarse + rerank (no ANN fine search)
                - 'thorough': Full 2-stage vector + FTS + reranking
                - 'auto': Selects 'thorough' if vectors exist, else 'fast'
                - None: Uses config.default_search_quality

        Returns:
            List of SearchResult ordered by relevance.
        """
        cfg = self._config
        final_top_k = top_k if top_k is not None else cfg.reranker_top_k

        # Resolve quality tier
        effective_quality = quality or cfg.default_search_quality
        if effective_quality not in _VALID_QUALITIES:
            _log.warning(
                "Invalid search quality '%s', falling back to 'auto'",
                effective_quality,
            )
            effective_quality = "auto"

        # Auto-detect: use thorough if vector index has data, else fast
        if effective_quality == "auto":
            effective_quality = "thorough" if self._has_vector_index() else "fast"

        if effective_quality == "fast":
            results = self._search_fast(query, final_top_k)
        elif effective_quality == "balanced":
            results = self._search_balanced(query, final_top_k)
        else:
            results = self._search_thorough(query, final_top_k)

        self._record_access(results)
        return results


@@ -20,6 +20,7 @@ from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

from .events import ChangeType, FileEvent, WatcherConfig
from .incremental_indexer import IncrementalIndexer

logger = logging.getLogger(__name__)

@@ -261,3 +262,24 @@ class FileWatcher:
        if output:
            sys.stdout.write(output + "\n")
            sys.stdout.flush()
    @classmethod
    def create_with_indexer(
        cls,
        root_path: Path,
        config: WatcherConfig,
        indexer: IncrementalIndexer,
    ) -> "FileWatcher":
        """Create a FileWatcher wired to an IncrementalIndexer's async path.

        Uses ``indexer.process_events_async()`` as the callback so that
        events are debounced and batched within the indexer before
        processing, preventing redundant per-file pipeline startups.

        Example::

            indexer = IncrementalIndexer(pipeline, root=root)
            watcher = FileWatcher.create_with_indexer(root, config, indexer)
            watcher.start()
        """
        return cls(root_path, config, indexer.process_events_async)


@@ -4,10 +4,13 @@ Ported from codex-lens v1 with simplifications:
- Uses IndexingPipeline.index_file() / remove_file() directly
- No v1-specific Config, ParserFactory, DirIndexStore dependencies
- Per-file error isolation: one failure does not stop batch processing
- Debounce batching: process_events_async() buffers events and flushes
  after a configurable window to prevent redundant per-file pipeline startups
"""

from __future__ import annotations

import logging
import threading
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

@@ -60,6 +63,7 @@ class IncrementalIndexer:
        pipeline: IndexingPipeline,
        *,
        root: Optional[Path] = None,
        debounce_window_ms: int = 500,
    ) -> None:
        """Initialize the incremental indexer.

@@ -67,9 +71,15 @@ class IncrementalIndexer:
            pipeline: The indexing pipeline with metadata store configured.
            root: Optional project root for computing relative paths.
                If None, absolute paths are used as identifiers.
            debounce_window_ms: Milliseconds to buffer events before flushing
                in process_events_async(). Default 500ms.
        """
        self._pipeline = pipeline
        self._root = root
        self._debounce_window_ms = debounce_window_ms
        self._event_buffer: List[FileEvent] = []
        self._buffer_lock = threading.Lock()
        self._flush_timer: Optional[threading.Timer] = None

    def process_events(self, events: List[FileEvent]) -> BatchResult:
        """Process a batch of file events with per-file error isolation.

@@ -107,6 +117,52 @@ class IncrementalIndexer:
        return result
    def process_events_async(self, events: List[FileEvent]) -> None:
        """Buffer events and flush after the debounce window expires.

        Non-blocking: events are accumulated in an internal buffer.
        When no new events arrive within *debounce_window_ms*, the buffer
        is flushed and all accumulated events are processed as a single
        batch via process_events().

        Args:
            events: List of file events to buffer.
        """
        with self._buffer_lock:
            self._event_buffer.extend(events)
            # Cancel previous timer and start a new one (true debounce)
            if self._flush_timer is not None:
                self._flush_timer.cancel()
            self._flush_timer = threading.Timer(
                self._debounce_window_ms / 1000.0,
                self._flush_buffer,
            )
            self._flush_timer.daemon = True
            self._flush_timer.start()

    def _flush_buffer(self) -> None:
        """Flush the event buffer and process all accumulated events."""
        with self._buffer_lock:
            if not self._event_buffer:
                return
            events = list(self._event_buffer)
            self._event_buffer.clear()
            self._flush_timer = None
        # Deduplicate: keep the last event per path
        seen: dict[Path, FileEvent] = {}
        for event in events:
            seen[event.path] = event
        deduped = list(seen.values())
        logger.debug(
            "Flushing debounce buffer: %d events (%d after dedup)",
            len(events), len(deduped),
        )
        self.process_events(deduped)
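The timer-reset debounce above can be exercised on its own. A self-contained sketch of the same pattern, independent of the indexer (the `Debouncer` name and the flush callback are stand-ins for `IncrementalIndexer` and `process_events`):

```python
import threading
from typing import Callable, List, Optional

class Debouncer:
    """Buffer items; flush them as one batch after a quiet window."""

    def __init__(self, window_s: float, on_flush: Callable[[List], None]) -> None:
        self._window_s = window_s
        self._on_flush = on_flush
        self._buffer: List = []
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def submit(self, items: List) -> None:
        with self._lock:
            self._buffer.extend(items)
            if self._timer is not None:
                self._timer.cancel()  # restart the quiet window
            self._timer = threading.Timer(self._window_s, self._flush)
            self._timer.daemon = True
            self._timer.start()

    def _flush(self) -> None:
        with self._lock:
            batch, self._buffer = self._buffer, []
            self._timer = None
        if batch:
            self._on_flush(batch)  # called outside the lock, like process_events
```

As in the indexer, invoking the callback outside the lock matters: a long-running flush must not block new `submit()` calls arriving from the watcher thread.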
    def _handle_index(self, event: FileEvent, result: BatchResult) -> None:
        """Index a created or modified file."""
        stats = self._pipeline.index_file(