Add comprehensive tests for vector/semantic search functionality

- Implement full coverage tests for Embedder model loading and embedding generation - Add CRUD operations and caching tests for VectorStore - Include cosine similarity computation tests - Validate semantic search accuracy and relevance through various queries - Establish performance benchmarks for embedding and search operations - Ensure edge cases and error handling are covered - Test thread safety and concurrent access scenarios - Verify availability of semantic search dependencies
2026-02-13 02:41:50 +08:00 · 2025-12-14 17:17:09 +08:00
parent 8d542b8e45
commit 79a2953862
47 changed files with 11208 additions and 4336 deletions
--- a/.claude/workflows/tool-strategy.md
+++ b/.claude/workflows/tool-strategy.md
@@ -1,88 +1,216 @@
-# Tool Strategy
+# Tool Strategy - When to Use What

-## ⚡ Exa Triggering Mechanisms
+> **Focus**: Decision triggers and selection logic, NOT syntax (already registered with Claude)

-**Auto-Trigger**:
- User mentions "exa-code" or code-related queries → `mcp__exa__get_code_context_exa`
- Need current web information → `mcp__exa__web_search_exa`
-
-**Manual Trigger**:
- Complex API research → Exa Code Context
- Real-time information needs → Exa Web Search
-
-## ⚡ CCW MCP Tools
-
-
-### edit_file
-
-**When to Use**: Edit tool fails 1+ times on same file
+## Quick Decision Tree

 ```
-mcp__ccw-tools__edit_file(path="file.py", oldText="old", newText="new")
-mcp__ccw-tools__edit_file(path="file.py", oldText="old", newText="new", dryRun=true)
-mcp__ccw-tools__edit_file(path="file.py", oldText="old", newText="new", replaceAll=true)
-mcp__ccw-tools__edit_file(path="file.py", mode="line", operation="insert_after", line=10, text="new line")
+Need context?
+├─ Exa available? → Use Exa (fastest, most comprehensive)
+├─ Large codebase (>500 files)? → codex_lens
+├─ Known files (<5)? → Read tool
+└─ Unknown files? → smart_search → Read tool
+
+Need to modify files?
+├─ Built-in Edit fails? → mcp__ccw-tools__edit_file
+└─ Still fails? → mcp__ccw-tools__write_file
+
+Need to search?
+├─ Semantic/concept search? → smart_search (mode=semantic)
+├─ Exact pattern match? → Grep tool
+└─ Multiple search modes needed? → smart_search (mode=auto)
 ```

-**Options**: `dryRun` (preview diff), `replaceAll`, `mode` (update|line), `operation`, `line`, `text`
+---

-### write_file
+## 1. Context Gathering Tools

-**When to Use**: Create new files or overwrite existing content
+### Exa (`mcp__exa__get_code_context_exa`)
+
+**Use When**:
+- ✅ Researching external APIs, libraries, frameworks
+- ✅ Need recent documentation (post-cutoff knowledge)
+- ✅ Looking for implementation examples in public repos
+- ✅ Comparing architectural patterns across projects
+
+**Don't Use When**:
+- ❌ Searching internal codebase (use smart_search/codex_lens)
+- ❌ Files already in working directory (use Read)
+
+**Trigger Indicators**:
+- User mentions specific library/framework names
+- Questions about "best practices", "how does X work"
+- Need to verify current API signatures
+
+---
+
+### read_file (`mcp__ccw-tools__read_file`)
+
+**Use When**:
+- ✅ Reading multiple related files at once (batch reading)
+- ✅ Need directory traversal with pattern matching
+- ✅ Searching file content with regex (`contentPattern`)
+- ✅ Want to limit depth/file count for large directories
+
+**Don't Use When**:
+- ❌ Single file read → Use built-in Read tool (faster)
+- ❌ Unknown file locations → Use smart_search first
+- ❌ Need semantic search → Use smart_search or codex_lens
+
+**Trigger Indicators**:
+- Need to read "all TypeScript files in src/"
+- Need to find "files containing TODO comments"
+- Want to read "up to 20 config files"
+
+**Advantages over Built-in Read**:
+- Batch operation (multiple files in one call)
+- Pattern-based filtering (glob + content regex)
+- Directory traversal with depth control
+
+---
+
+### codex_lens (`mcp__ccw-tools__codex_lens`)
+
+**Use When**:
+- ✅ Large codebase (>500 files) requiring repeated searches
+- ✅ Need semantic understanding of code relationships
+- ✅ Working across multiple sessions (persistent index)
+- ✅ Symbol-level navigation needed
+
+**Don't Use When**:
+- ❌ Small project (<100 files) → Use smart_search (no indexing overhead)
+- ❌ One-time search → Use smart_search or Grep
+- ❌ Files change frequently → Indexing overhead not worth it
+
+**Trigger Indicators**:
+- "Find all implementations of interface X"
+- "What calls this function across the codebase?"
+- Multi-session workflow on same codebase
+
+**Action Selection**:
+- `init`: First time in new codebase
+- `search`: Find code patterns
+- `search_files`: Find files by path/name pattern
+- `symbol`: Get symbols in specific file
+- `status`: Check if index exists/is stale
+- `clean`: Remove stale index
+
+---
+
+### smart_search (`mcp__ccw-tools__smart_search`)
+
+**Use When**:
+- ✅ Don't know exact file locations
+- ✅ Need concept/semantic search ("authentication logic")
+- ✅ Medium-sized codebase (100-500 files)
+- ✅ One-time or infrequent searches
+
+**Don't Use When**:
+- ❌ Known exact file path → Use Read directly
+- ❌ Large codebase + repeated searches → Use codex_lens
+- ❌ Exact pattern match → Use Grep (faster)
+
+**Mode Selection**:
+- `auto`: Let tool decide (default, safest)
+- `exact`: Know exact pattern, need fast results
+- `fuzzy`: Typo-tolerant file/symbol names
+- `semantic`: Concept-based ("error handling", "data validation")
+- `graph`: Dependency/relationship analysis
+
+**Trigger Indicators**:
+- "Find files related to user authentication"
+- "Where is the payment processing logic?"
+- "Locate database connection setup"
+
+---
+
+## 2. File Modification Tools
+
+### edit_file (`mcp__ccw-tools__edit_file`)
+
+**Use When**:
+- ✅ Built-in Edit tool failed 1+ times
+- ✅ Need dry-run preview before applying
+- ✅ Need line-based operations (insert_after, insert_before)
+- ✅ Need to replace all occurrences
+
+**Don't Use When**:
+- ❌ Built-in Edit hasn't failed yet → Try built-in first
+- ❌ Need to create new file → Use write_file
+
+**Trigger Indicators**:
+- Built-in Edit returns "old_string not found"
+- Built-in Edit fails due to whitespace/formatting
+- Need to verify changes before applying (dryRun=true)
+
+**Mode Selection**:
+- `mode=update`: Replace text (similar to built-in Edit)
+- `mode=line`: Line-based operations (insert_after, insert_before, delete)
+
+---
+
+### write_file (`mcp__ccw-tools__write_file`)
+
+**Use When**:
+- ✅ Creating brand new files
+- ✅ MCP edit_file still fails (last resort)
+- ✅ Need to completely replace file content
+- ✅ Need backup before overwriting
+
+**Don't Use When**:
+- ❌ File exists + small change → Use Edit tools
+- ❌ Built-in Edit hasn't been tried → Try built-in Edit first
+
+**Trigger Indicators**:
+- All Edit attempts failed
+- Need to create new file with specific content
+- User explicitly asks to "recreate file"
+
+---
+
+## 3. Decision Logic
+
+### File Reading Priority

 ```
-mcp__ccw-tools__write_file(path="file.txt", content="Hello")
-mcp__ccw-tools__write_file(path="file.txt", content="code with `backticks` and ${vars}", backup=true)
+1. Known single file? → Built-in Read
+2. Multiple files OR pattern matching? → mcp__ccw-tools__read_file
+3. Unknown location? → smart_search, then Read
+4. Large codebase + repeated access? → codex_lens
 ```

-**Options**: `backup`, `createDirectories`, `encoding`
-
-### read_file
-
-**When to Use**: Read multiple files, directory traversal, content search
+### File Editing Priority

 ```
-mcp__ccw-tools__read_file(paths="file.ts")                         # Single file
-mcp__ccw-tools__read_file(paths=["a.ts", "b.ts"])                  # Multiple files
-mcp__ccw-tools__read_file(paths="src/", pattern="*.ts")            # Directory + glob
-mcp__ccw-tools__read_file(paths="src/", contentPattern="TODO")     # Regex search
+1. Always try built-in Edit first
+2. Fails 1+ times? → mcp__ccw-tools__edit_file
+3. Still fails? → mcp__ccw-tools__write_file (last resort)
 ```

-**Options**: `pattern`, `contentPattern`, `maxDepth` (3), `includeContent` (true), `maxFiles` (50)
-
-### codex_lens
-
-**When to Use**: Code indexing, semantic search, cache management
+### Search Tool Priority

 ```
-mcp__ccw-tools__codex_lens(action="init", path=".")
-mcp__ccw-tools__codex_lens(action="search", query="function main", path=".")
-mcp__ccw-tools__codex_lens(action="search_files", query="pattern", limit=20)
-mcp__ccw-tools__codex_lens(action="symbol", file="src/main.py")
-mcp__ccw-tools__codex_lens(action="status")
-mcp__ccw-tools__codex_lens(action="config_show")
-mcp__ccw-tools__codex_lens(action="config_set", key="index_dir", value="/path")
-mcp__ccw-tools__codex_lens(action="config_migrate", newPath="/new/path")
-mcp__ccw-tools__codex_lens(action="clean", path=".")
-mcp__ccw-tools__codex_lens(action="clean", all=true)
+1. External knowledge? → Exa
+2. Exact pattern in small codebase? → Built-in Grep
+3. Semantic/unknown location? → smart_search
+4. Large codebase + repeated searches? → codex_lens
 ```

-**Actions**: `init`, `search`, `search_files`, `symbol`, `status`, `config_show`, `config_set`, `config_migrate`, `clean`
+---

-### smart_search
+## 4. Anti-Patterns

-**When to Use**: Quick search without indexing, natural language queries
+**Don't**:
+- Use codex_lens for one-time searches in small projects
+- Use smart_search when file path is already known
+- Use write_file before trying Edit tools
+- Use Exa for internal codebase searches
+- Use read_file for single file when Read tool works

-```
-mcp__ccw-tools__smart_search(query="function main", path=".")
-mcp__ccw-tools__smart_search(query="def init", mode="exact")
-mcp__ccw-tools__smart_search(query="authentication logic", mode="semantic")
-```
-
-**Modes**: `auto` (default), `exact`, `fuzzy`, `semantic`, `graph`
-
-### Fallback Strategy
-
-1. **Edit fails 1+ times** → `mcp__ccw-tools__edit_file`
-2. **Still fails** → `mcp__ccw-tools__write_file`
+**Do**:
+- Start with simplest tool (Read, Edit, Grep)
+- Escalate to MCP tools when built-ins fail
+- Use semantic search (smart_search) for exploratory tasks
+- Use indexed search (codex_lens) for large, stable codebases
+- Use Exa for external/public knowledge