Add comprehensive workflows for CLI tools usage, coding philosophy, context requirements, file modification, and CodexLens auto hybrid mode

- Introduced a detailed guide for intelligent tools selection strategy, including quick reference, tool specifications, prompt templates, and best practices for CLI execution.
- Established a coding philosophy document outlining core beliefs, simplicity principles, and guidelines for effective coding practices.
- Created context requirements documentation emphasizing the importance of understanding existing patterns and dependencies before implementation.
- Developed a file modification workflow detailing the use of edit_file and write_file MCP tools, along with priority logic for file reading and editing.
- Implemented CodexLens auto hybrid mode, enhancing the CLI with automatic vector embedding generation and default hybrid search mode based on embedding availability.
This commit is contained in:
catlog22
2025-12-17 09:53:30 +08:00
parent 154a9283b5
commit d06a3ca12e
18 changed files with 619 additions and 313 deletions

View File

@@ -1,6 +1,6 @@
# Claude Instructions
- **CLI Tools Usage**: @~/.claude/rules/cli-tools-usage.md
- **Coding Philosophy**: @~/.claude/rules/coding-philosophy.md
- **Context Requirements**: @~/.claude/rules/context-requirements.md
- **File Modification**: @~/.claude/rules/file-modification.md
- **CLI Tools Usage**: @~/.claude/workflows/cli-tools-usage.md
- **Coding Philosophy**: @~/.claude/workflows/coding-philosophy.md
- **Context Requirements**: @~/.claude/workflows/context-requirements.md
- **File Modification**: @~/.claude/workflows/file-modification.md

View File

@@ -542,8 +542,8 @@ The `implementation_approach` supports **two execution modes** based on the pres
- **Use for**: Large-scale features, complex refactoring, or when user explicitly requests CLI tool usage
- **Required fields**: Same as default mode **PLUS** `command`, `resume_from` (optional)
- **Command patterns** (with resume support):
- `ccw cli exec '[prompt]' --tool codex --mode auto --cd [path]`
- `ccw cli exec '[prompt]' --resume ${previousCliId} --tool codex --mode auto` (resume from previous)
- `ccw cli exec '[prompt]' --tool codex --mode write --cd [path]`
- `ccw cli exec '[prompt]' --resume ${previousCliId} --tool codex --mode write` (resume from previous)
- `ccw cli exec '[prompt]' --tool gemini --mode write --cd [path]` (write mode)
- **Resume mechanism**: When step depends on previous CLI execution, include `--resume` with previous execution ID
@@ -621,7 +621,7 @@ Agent determines CLI tool usage per-step based on user semantics and task nature
"step": 3,
"title": "Execute implementation using CLI tool",
"description": "Use Codex/Gemini for complex autonomous execution",
"command": "ccw cli exec '[prompt]' --tool codex --mode auto --cd [path]",
"command": "ccw cli exec '[prompt]' --tool codex --mode write --cd [path]",
"modification_points": ["[Same as default mode]"],
"logic_flow": ["[Same as default mode]"],
"depends_on": [1, 2],
@@ -634,7 +634,7 @@ Agent determines CLI tool usage per-step based on user semantics and task nature
"step": 4,
"title": "Continue implementation with context",
"description": "Resume from previous step with accumulated context",
"command": "ccw cli exec '[continuation prompt]' --resume ${step3_cli_id} --tool codex --mode auto",
"command": "ccw cli exec '[continuation prompt]' --resume ${step3_cli_id} --tool codex --mode write",
"resume_from": "step3_cli_id", // Reference previous step's CLI ID
"modification_points": ["[Continue from step 3]"],
"logic_flow": ["[Build on previous output]"],

View File

@@ -134,7 +134,7 @@ RULES: $(cat {selected_template}) | {constraints}
```
analyze|plan → gemini (qwen fallback) + mode=analysis
execute (simple|medium) → gemini (qwen fallback) + mode=write
execute (complex) → codex + mode=auto
execute (complex) → codex + mode=write
discuss → multi (gemini + codex parallel)
```
@@ -165,9 +165,9 @@ RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/pattern.txt)
ccw cli exec "..." --tool gemini --mode write --cd {dir}
```
**Codex (Auto)**:
**Codex (Write)**:
```bash
ccw cli exec "..." --tool codex --mode auto --cd {dir}
ccw cli exec "..." --tool codex --mode write --cd {dir}
```
**Cross-Directory** (Gemini/Qwen):

View File

@@ -123,7 +123,7 @@ When task JSON contains `flow_control.implementation_approach` array:
**CLI Command Execution (CLI Execute Mode)**:
When step contains `command` field with Codex CLI, execute via CCW CLI. For Codex resume:
- First task (`depends_on: []`): `ccw cli exec "..." --tool codex --mode auto --cd [path]`
- First task (`depends_on: []`): `ccw cli exec "..." --tool codex --mode write --cd [path]`
- Subsequent tasks (has `depends_on`): Use CCW CLI with resume context to maintain session
**Test-Driven Development**:

View File

@@ -241,7 +241,7 @@ api_id=$((group_count + 3))
**Command Patterns**:
- Gemini/Qwen: `ccw cli exec "..." --tool gemini --cd dir`
- CLI Mode: `ccw cli exec "..." --tool gemini --mode write --cd dir`
- Codex: `ccw cli exec "..." --tool codex --mode auto --cd dir`
- Codex: `ccw cli exec "..." --tool codex --mode write --cd dir`
**Generation Process**:
1. Read configuration values (tool, cli_execute, mode) from workflow-session.json

View File

@@ -473,7 +473,7 @@ Detailed plan: ${executionContext.session.artifacts.plan}`)
return prompt
}
ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode auto
ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode write
```
**Execution with fixed IDs** (predictable ID pattern):
@@ -496,8 +496,8 @@ const previousCliId = batch.resumeFromCliId || null
// Build command with fixed ID (and optional resume for continuation)
const cli_command = previousCliId
? `ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode auto --id ${fixedExecutionId} --resume ${previousCliId}`
: `ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode auto --id ${fixedExecutionId}`
? `ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode write --id ${fixedExecutionId} --resume ${previousCliId}`
: `ccw cli exec "${buildCLIPrompt(batch)}" --tool codex --mode write --id ${fixedExecutionId}`
bash_result = Bash(
command=cli_command,
@@ -519,7 +519,7 @@ if (bash_result.status === 'failed' || bash_result.status === 'timeout') {
⚠️ Execution incomplete. Resume available:
Fixed ID: ${fixedExecutionId}
Lookup: ccw cli detail ${fixedExecutionId}
Resume: ccw cli exec "Continue tasks" --resume ${fixedExecutionId} --tool codex --mode auto --id ${fixedExecutionId}-retry
Resume: ccw cli exec "Continue tasks" --resume ${fixedExecutionId} --tool codex --mode write --id ${fixedExecutionId}-retry
`)
// Store for potential retry in same session
@@ -582,7 +582,7 @@ ccw cli exec "[Shared Prompt Template with artifacts]" --tool qwen --mode analys
# Same prompt as Gemini, different execution engine
# Method 4: Codex Review (autonomous)
ccw cli exec "[Verify plan acceptance criteria at ${plan.json}]" --tool codex --mode auto
ccw cli exec "[Verify plan acceptance criteria at ${plan.json}]" --tool codex --mode write
```
**Multi-Round Review with Fixed IDs**:
@@ -744,5 +744,5 @@ Appended to `previousExecutionResults` array for context continuity in multi-exe
ccw cli detail ${fixedCliId}
# Resume with new fixed ID for retry
ccw cli exec "Continue from where we left off" --resume ${fixedCliId} --tool codex --mode auto --id ${fixedCliId}-retry
ccw cli exec "Continue from where we left off" --resume ${fixedCliId} --tool codex --mode write --id ${fixedCliId}-retry
```

View File

@@ -255,7 +255,7 @@ Based on userConfig.executionMethod:
CLI Resume Support (MANDATORY for all CLI commands):
- Use --resume parameter to continue from previous task execution
- Read previous task's cliExecutionId from session state
- Format: ccw cli exec "[prompt]" --resume ${previousCliId} --tool ${tool} --mode auto
- Format: ccw cli exec "[prompt]" --resume ${previousCliId} --tool ${tool} --mode write
## EXPLORATION CONTEXT (from context-package.exploration_results)
- Load exploration_results from context-package.json

View File

@@ -1,47 +0,0 @@
# Context Requirements
Before implementation, always:
- Identify 3+ existing similar patterns before implementation
- Map dependencies and integration points
- Understand testing framework and coding conventions
## Context Gathering
### Use Exa
- Researching external APIs, libraries, frameworks
- Need recent documentation beyond knowledge cutoff
- Looking for implementation examples in public repos
- User mentions specific library/framework names
- Questions about "best practices" or "how does X work"
### Use read_file (MCP)
- Reading multiple related files at once
- Directory traversal with pattern matching
- Searching file content with regex
- Need to limit depth/file count for large directories
- Batch operations on multiple files
- Pattern-based filtering (glob + content regex)
### Use codex_lens
- Large codebase (>500 files) requiring repeated searches
- Need semantic understanding of code relationships
- Working across multiple sessions
- Symbol-level navigation needed
- Finding all implementations of interface/class
- Tracking function calls across codebase
### Use smart_search
- Unknown file locations
- Concept/semantic search ("authentication logic", "payment processing")
- Medium-sized codebase (100-500 files)
- One-time or infrequent searches
- Natural language queries about code structure
**Mode Selection**:
- `auto`: Let tool decide (default)
- `exact`: Known exact pattern
- `fuzzy`: Typo-tolerant search
- `semantic`: Concept-based search
- `graph`: Dependency analysis

View File

@@ -17,11 +17,11 @@
```
┌─ Task Analysis/Documentation?
│ └─→ Use Gemini (Fallback: Codex,Qwen)
│ └─→ MODE: analysis (default, read-only)
│ └─→ MODE: analysis (read-only)
└─ Task Implementation/Bug Fix?
└─→ Use Codex (Fallback: Gemini,Qwen)
└─→ MODE: auto (full operations) or write (file operations)
└─→ Use Codex (Fallback: Gemini,Qwen)
└─→ MODE: write (file operations)
```
@@ -30,10 +30,10 @@
```
PURPOSE: [what] + [why] + [success criteria] + [constraints/scope]
TASK: • [step 1: specific action] • [step 2: specific action] • [step 3: specific action]
MODE: [analysis|write|auto]
MODE: [analysis|write]
CONTEXT: @[file patterns] | Memory: [session/tech/module context]
EXPECTED: [deliverable format] + [quality criteria] + [structure requirements]
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints] | MODE=[permission]
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/[mode]-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints]
```
### Intent Capture Checklist (Before CLI Execution)
@@ -54,20 +54,15 @@ RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt
- MODE: `analysis`
- When to Use: Code review, architecture analysis, pattern discovery, exploration
- **Write/Create**
- Tool: Gemini/Qwen
- **Write/Implement**
- Tool: Codex (Fallback: Gemini/Qwen)
- MODE: `write`
- When to Use: Documentation generation, file creation (non-code)
- **Implement/Fix**
- Tool: Codex
- MODE: `auto`
- When to Use: Feature implementation, bug fixes, test creation, refactoring
- When to Use: Feature implementation, bug fixes, test creation, refactoring, documentation generation, file creation
## Essential Command Structure
```bash
ccw cli exec "<PROMPT>" --tool <gemini|qwen|codex> --mode <analysis|write|auto>
ccw cli exec "<PROMPT>" --tool <gemini|qwen|codex> --mode <analysis|write>
```
**⚠️ CRITICAL**: `--mode` parameter is **MANDATORY** for all CLI executions. No defaults are assumed.
@@ -76,9 +71,9 @@ ccw cli exec "<PROMPT>" --tool <gemini|qwen|codex> --mode <analysis|write|auto>
- **Use tools early and often** - Tools are faster and more thorough
- **Unified CLI** - Always use `ccw cli exec` for consistent parameter handling
- **Mode is MANDATORY** - ALWAYS explicitly specify `--mode analysis|write|auto` (no implicit defaults)
- **Mode is MANDATORY** - ALWAYS explicitly specify `--mode analysis|write` (no implicit defaults)
- **One template required** - ALWAYS reference exactly ONE template in RULES (use universal fallback if no specific match)
- **Write protection** - Require EXPLICIT `--mode write` or `--mode auto`
- **Write protection** - Require EXPLICIT `--mode write` for file operations
- **No escape characters** - NEVER use `\$`, `\"`, `\'` in CLI commands
---
@@ -88,41 +83,52 @@ ccw cli exec "<PROMPT>" --tool <gemini|qwen|codex> --mode <analysis|write|auto>
### MODE Options
- **`analysis`**
- Permission: Read-only (default)
- Use For: Code review, architecture analysis, pattern discovery
- Specification: Auto for Gemini/Qwen
- Permission: Read-only
- Use For: Code review, architecture analysis, pattern discovery, exploration
- Specification: Safe for all tools (Gemini/Qwen/Codex)
- **`write`**
- Permission: Create/Modify/Delete
- Use For: Documentation, code creation, file modifications
- Specification: Requires `--mode write`
- **`auto`**
- Permission: Full operations
- Use For: Feature implementation, bug fixes, autonomous development
- Specification: Codex only, requires `--mode auto`
- Use For: Feature implementation, bug fixes, documentation, code creation, file modifications
- Specification: Requires explicit `--mode write`
### Mode Protocol References (MANDATORY)
**⚠️ REQUIRED**: Every CLI execution MUST include the corresponding mode protocol in RULES:
- **`analysis`**
- Protocol (REQUIRED): `$(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md)`
#### Mode Rule= Templates
- **`write/auto`**
- Protocol (REQUIRED): `$(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md)`
**Purpose**: Mode protocols define permission boundaries and operational constraints for each execution mode.
**Protocol Mapping**:
- **`analysis`** mode
- Protocol: `$(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md)`
- Permission: Read-only operations
- Enforces: No file creation/modification/deletion
- **`write`** mode
- Protocol: `$(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md)`
- Permission: Create/Modify/Delete files
- Enforces: Explicit write authorization and full workflow execution capability
**RULES Format** (protocol MUST be included):
```bash
# Analysis mode - MUST include analysis-protocol.md
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/...) | constraints
# Write/Auto mode - MUST include write-protocol.md
# Write mode - MUST include write-protocol.md
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/development/...) | constraints
```
**Validation**: CLI execution without mode protocol reference is INVALID
**Why Mode Rules Are Required**:
- Ensures consistent permission enforcement across all tools (Gemini/Qwen/Codex)
- Prevents accidental file modifications during analysis tasks
- Provides explicit authorization trail for write operations
- Enables safe automation with clear boundaries
### Gemini & Qwen
**Via CCW**: `ccw cli exec "<prompt>" --tool gemini --mode analysis` or `--tool qwen --mode analysis`
@@ -141,12 +147,12 @@ RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(ca
### Codex
**Via CCW**: `ccw cli exec "<prompt>" --tool codex --mode auto`
**Via CCW**: `ccw cli exec "<prompt>" --tool codex --mode write`
**Characteristics**:
- Autonomous development, mathematical reasoning
- Best for: Implementation, testing, automation
- No default MODE - must explicitly specify `--mode write` or `--mode auto`
- Best for: Implementation, testing, automation, bug fixes
- No default MODE - must explicitly specify `--mode analysis` or `--mode write`
**Models**: `gpt-5.2`
@@ -156,7 +162,7 @@ RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(ca
```bash
ccw cli exec "Continue analyzing" --tool gemini --mode analysis --resume # Resume last session
ccw cli exec "Fix issues found" --tool codex --mode auto --resume <id> # Resume specific session
ccw cli exec "Fix issues found" --tool codex --mode write --resume <id> # Resume specific session
```
- **`--resume` (empty)**: Resume most recent session
@@ -212,10 +218,10 @@ Every command MUST include these fields:
- Good Example: "Markdown report with: severity levels (Critical/High/Medium/Low), file:line references, remediation code snippets, priority ranking"
- **RULES**
- Purpose: Template + constraints
- Components: $(cat template) + domain rules
- Purpose: Protocol + template + constraints
- Components: $(cat protocol) + $(cat template) + domain rules
- Bad Example: (missing)
- Good Example: "$(cat ~/.claude/.../security.txt) \| Focus on authentication \| Ignore test files \| analysis=READ-ONLY"
- Good Example: "$(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) \| Focus on authentication \| Ignore test files"
### CONTEXT Configuration
@@ -334,7 +340,7 @@ RULES: $(cat ~/.claude/workflows/cli-templates/prompts/universal/00-universal-ri
- Default: gemini
- **`--mode <mode>`**
- Description: **REQUIRED**: analysis, write, auto
- Description: **REQUIRED**: analysis, write
- Default: **NONE** (must specify)
- **`--model <model>`**
@@ -400,14 +406,14 @@ CCW automatically maps to tool-specific syntax:
- Gemini/Qwen: `--include-directories`
- Codex: `--add-dir` (per dir)
- **`--mode analysis`**
- Gemini/Qwen: (default read-only)
- Codex: (default read-only)
- **`--mode write`**
- Gemini/Qwen: `--approval-mode yolo`
- Codex: `-s danger-full-access`
- **`--mode auto`**
- Gemini/Qwen: N/A
- Codex: `-s danger-full-access`
### Command Examples
#### Task-Type Specific Templates
@@ -420,7 +426,7 @@ TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication
MODE: analysis
CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions
EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files | analysis=READ-ONLY
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files
" --tool gemini --cd src/auth --timeout 600000
```
@@ -429,11 +435,11 @@ RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-securi
ccw cli exec "
PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients
TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services
MODE: auto
MODE: write
CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts
EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes | auto=FULL
" --tool codex --mode auto --timeout 1800000
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes
" --tool codex --mode write --timeout 1800000
```
**Bug Fix Task**:
@@ -444,7 +450,7 @@ TASK: • Trace connection lifecycle from open to close • Identify event liste
MODE: analysis
CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production
EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup | analysis=READ-ONLY
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup
" --tool gemini --cd src --timeout 900000
```
@@ -456,7 +462,7 @@ TASK: • Extract gateway interface from current implementation • Create strat
MODE: write
CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways
EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass | write=CREATE/MODIFY/DELETE
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass
" --tool gemini --mode write --timeout 1200000
```
---
@@ -483,7 +489,7 @@ RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-c
```bash
ccw cli exec "<prompt>" --tool gemini --mode analysis --timeout 600000 # 10 min
ccw cli exec "<prompt>" --tool codex --mode auto --timeout 1800000 # 30 min
ccw cli exec "<prompt>" --tool codex --mode write --timeout 1800000 # 30 min
```
### Permission Framework
@@ -492,8 +498,7 @@ ccw cli exec "<prompt>" --tool codex --mode auto --timeout 1800000 # 30 min
**Mode Hierarchy**:
- `analysis`: Read-only, safe for auto-execution
- `write`: Create/Modify/Delete files - requires explicit `--mode write`
- `auto`: Full operations - requires explicit `--mode auto`
- `write`: Create/Modify/Delete files, full operations - requires explicit `--mode write`
- **Exception**: User provides clear instructions like "modify", "create", "implement"
---
@@ -512,15 +517,15 @@ ccw cli exec "<prompt>" --tool codex --mode auto --timeout 1800000 # 30 min
### Workflow Integration
- **Understanding**: `ccw cli exec "<prompt>" --tool gemini`
- **Architecture**: `ccw cli exec "<prompt>" --tool gemini`
- **Implementation**: `ccw cli exec "<prompt>" --tool codex --mode auto`
- **Understanding**: `ccw cli exec "<prompt>" --tool gemini --mode analysis`
- **Architecture**: `ccw cli exec "<prompt>" --tool gemini --mode analysis`
- **Implementation**: `ccw cli exec "<prompt>" --tool codex --mode write`
- **Quality**: `ccw cli exec "<prompt>" --tool codex --mode write`
### Planning Checklist
- [ ] **Purpose defined** - Clear goal and intent
- [ ] **Mode selected** - `--mode analysis|write|auto`
- [ ] **Mode selected** - `--mode analysis|write`
- [ ] **Context gathered** - File references + memory (default `@**/*`)
- [ ] **Directory navigation** - `--cd` and/or `--includeDirs`
- [ ] **Tool selected** - `--tool gemini|qwen|codex`

View File

@@ -0,0 +1,44 @@
# Context Requirements
Before implementation, always:
- Identify 3+ existing similar patterns
- Map dependencies and integration points
- Understand testing framework and coding conventions
## Context Gathering
**MANDATORY**: Use `codex_lens` (MCP tool) for all code search and analysis.
### codex_lens (REQUIRED)
**MCP Actions**: `init`, `search`, `search_files` (Advanced ops via CLI: `codexlens --help`)
**Initialize**:
```
codex_lens(action="init", path=".")
```
- Auto-generates embeddings if `fastembed` installed
- Skip with `--no-embeddings` flag
**Search** (Auto hybrid mode):
```
codex_lens(action="search", query="authentication")
```
**Search Files**:
```
codex_lens(action="search_files", query="payment")
```
### read_file (MCP)
- Read files found by codex_lens
- Directory traversal with patterns
- Batch operations
### smart_search
- Fallback when codex_lens unavailable
- Small projects (<100 files)
### Exa
- External APIs, libraries, frameworks
- Recent documentation beyond knowledge cutoff
- Public implementation examples

View File

@@ -1,56 +0,0 @@
# Context Search Strategy
## ⚡ Execution Environment
**CRITICAL**: All commands execute in **Bash environment** (Git Bash on Windows)
**❌ Forbidden**: Windows commands (`findstr`, `dir`, `where`) - Use Bash (`grep`, `find`, `cat`)
## ⚡ Core Search Tools
**Skill()**: FASTEST way to get context - use FIRST if SKILL exists. Three types: (1) `workflow-progress` for WFS sessions (2) tech SKILLs for stack docs (3) `{project-name}` for project docs
**codebase-retrieval**: Semantic file discovery via Gemini CLI with all files analysis
**rg (ripgrep)**: Fast content search with regex support
**find**: File/directory location by name patterns
**grep**: Built-in pattern matching (fallback when rg unavailable)
**get_modules_by_depth**: Program architecture analysis (MANDATORY before planning)
## 🔧 Quick Command Reference
```bash
# Semantic File Discovery (codebase-retrieval via CCW)
ccw cli exec "
PURPOSE: Discover files relevant to task/feature
TASK: • List all files related to [task/feature description]
MODE: analysis
CONTEXT: @**/*
EXPECTED: Relevant file paths with relevance explanation
RULES: Focus on direct relevance to task requirements | analysis=READ-ONLY
" --tool gemini --cd [directory]
# Program Architecture (MANDATORY before planning)
ccw tool exec get_modules_by_depth '{}'
# Content Search (rg preferred)
rg "pattern" --type js -n # Search JS files with line numbers
rg -i "case-insensitive" # Ignore case
rg -C 3 "context" # Show 3 lines before/after
# File Search
find . -name "*.ts" -type f # Find TypeScript files
find . -path "*/node_modules" -prune -o -name "*.js" -print
# Workflow Examples
rg "IMPL-\d+" .workflow/ --type json # Find task IDs
find .workflow/ -name "*.json" -path "*/.task/*" # Locate task files
rg "status.*pending" .workflow/.task/ # Find pending tasks
```
## ⚡ Performance Tips
- **rg > grep** for content search
- **Use --type filters** to limit file types
- **Exclude dirs**: `--glob '!node_modules'`
- **Use -F** for literal strings (no regex)

326
CODEX_LENS_AUTO_HYBRID.md Normal file
View File

@@ -0,0 +1,326 @@
# CodexLens Auto Hybrid Mode - Implementation Summary
## 概述
实现了两个主要功能:
1. **自动向量嵌入生成**`init` 命令在检测到语义搜索依赖后自动生成向量嵌入
2. **默认混合搜索模式**`search` 命令在检测到嵌入存在时自动使用 hybrid 模式
## 修改文件
### 1. codex-lens CLI (`codex-lens/src/codexlens/cli/commands.py`)
#### 1.1 `init` 命令增强
**新增参数**
- `--no-embeddings`: 跳过自动嵌入生成
- `--embedding-model`: 指定嵌入模型 (默认: "code")
**自动嵌入生成逻辑**
```python
# 在 init 成功后
if not no_embeddings:
from codexlens.semantic import SEMANTIC_AVAILABLE
if SEMANTIC_AVAILABLE:
# 自动调用 generate_embeddings()
# 使用指定的 embedding_model
```
**行为**
- 检测 `fastembed``numpy` 是否安装
- 如果可用,自动生成嵌入(可用 `--no-embeddings` 跳过)
- 默认使用 "code" 模型 (jinaai/jina-embeddings-v2-base-code)
- 在输出中显示嵌入生成进度和统计
#### 1.2 `search` 命令增强
**模式变更**
- 默认模式从 `"exact"` 改为 `"auto"`
- 新增 `"auto"` 模式到有效模式列表
**自动模式检测逻辑**
```python
if mode == "auto":
# 检查项目是否有嵌入
project_record = registry.find_by_source_path(str(search_path))
if project_record:
embed_status = check_embeddings_status(index_path)
if has_embeddings:
actual_mode = "hybrid" # 使用混合模式
else:
actual_mode = "exact" # 降级到精确模式
```
**行为**
- 默认使用 `auto` 模式
- 自动检测索引是否有嵌入
- 有嵌入 → 使用 `hybrid` 模式(精确 + 模糊 + 向量融合)
- 无嵌入 → 使用 `exact` 模式(仅全文搜索)
- 用户仍可手动指定模式覆盖自动检测
### 2. MCP 工具简化 (`ccw/src/tools/codex-lens.ts`)
#### 2.1 简化 action 枚举
**仅暴露核心操作**
- `init`: 初始化索引(自动生成嵌入)
- `search`: 搜索代码(自动混合模式)
- `search_files`: 搜索文件路径
**移除的高级操作**(仍可通过 CLI 使用):
- ~~`symbol`~~: 符号提取 → 使用 `codexlens symbol`
- ~~`status`~~: 状态检查 → 使用 `codexlens status`
- ~~`config_show/set/migrate`~~: 配置管理 → 使用 `codexlens config`
- ~~`clean`~~: 清理索引 → 使用 `codexlens clean`
- ~~`bootstrap/check`~~: 安装管理 → 自动处理
**简化的 ParamsSchema**
```typescript
const ParamsSchema = z.object({
action: z.enum(['init', 'search', 'search_files']),
path: z.string().optional(),
query: z.string().optional(),
mode: z.enum(['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector']).default('auto'),
languages: z.array(z.string()).optional(),
limit: z.number().default(20),
});
```
#### 2.2 扩展 mode 枚举并设置默认值
**模式支持**
```typescript
mode: z.enum(['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector']).default('auto')
```
**模式映射**MCP → CLI
```typescript
const modeMap: Record<string, string> = {
'text': 'exact',
'semantic': 'pure-vector',
'auto': 'auto', // 默认:自动检测
'exact': 'exact',
'fuzzy': 'fuzzy',
'hybrid': 'hybrid',
'vector': 'vector',
'pure-vector': 'pure-vector',
};
```
#### 2.3 传递 mode 参数到 CLI
```typescript
const args = ['search', query, '--limit', limit.toString(), '--mode', cliMode, '--json'];
```
### 3. 文档更新 (`.claude/rules/context-requirements.md`)
#### 3.1 更新 init 说明
强调自动嵌入生成功能:
```markdown
**NEW**: `init` automatically generates vector embeddings if semantic dependencies are installed (fastembed).
- Auto-detects if `numpy` and `fastembed` are available
- Uses "code" model by default (jinaai/jina-embeddings-v2-base-code)
- Skip with `--no-embeddings` flag if needed
```
#### 3.2 更新 search 说明
强调自动混合模式:
```markdown
**Search Code** (Auto Hybrid Mode - DEFAULT):
# Simple call - auto-detects mode (hybrid if embeddings exist, exact otherwise):
codex_lens(action="search", query="authentication", path=".", limit=20)
```
#### 3.3 详细模式说明
添加完整的模式列表和默认行为说明:
- `auto`: **DEFAULT** - Uses hybrid if embeddings exist, exact otherwise
- `hybrid`: Exact + Fuzzy + Vector fusion (best results, auto-selected if embeddings exist)
- 其他模式...
## 使用示例
### 场景 1首次使用已安装 fastembed
```bash
# 初始化索引(自动生成嵌入)
codexlens init .
# 输出:
# OK Indexed 150 files in 12 directories
#
# Generating embeddings...
# Model: code
# ✓ Generated 1234 embeddings in 45.2s
# 搜索(自动使用 hybrid 模式)
codexlens search "authentication"
# Mode: hybrid | Searched 12 directories in 15.2ms
```
### 场景 2首次使用未安装 fastembed
```bash
# 初始化索引(跳过嵌入)
codexlens init .
# 输出:
# OK Indexed 150 files in 12 directories
# (无嵌入生成提示)
# 搜索(降级到 exact 模式)
codexlens search "authentication"
# Mode: exact | Searched 12 directories in 8.5ms
```
### 场景 3手动控制
```bash
# 跳过嵌入生成
codexlens init . --no-embeddings
# 强制使用特定模式
codexlens search "auth" --mode exact
codexlens search "how to authenticate" --mode hybrid
```
### 场景 4MCP 工具使用(简化版)
```python
# 初始化(自动生成嵌入)
codex_lens(action="init", path=".")
# 搜索(默认 auto 模式:有嵌入用 hybrid无嵌入用 exact
codex_lens(action="search", query="authentication")
# 强制混合模式
codex_lens(action="search", query="authentication", mode="hybrid")
# 强制精确模式
codex_lens(action="search", query="authenticate_user", mode="exact")
# 仅返回文件路径
codex_lens(action="search_files", query="payment processing")
```
**高级操作使用 CLI**
```bash
# 检查状态
codexlens status
# 提取符号
codexlens symbol src/auth/login.js
# 配置管理
codexlens config show
codexlens config set index_dir /custom/path
# 清理索引
codexlens clean .
```
## 技术细节
### 嵌入检测逻辑
1. 查找项目在 registry 中的记录
2. 获取索引路径 `index_root/_index.db`
3. 调用 `check_embeddings_status()` 检查:
- 是否存在 `chunks`
- `chunks_count > 0`
4. 根据检测结果选择模式
### 混合搜索权重
默认 RRF 权重:
- Exact FTS: 0.4
- Fuzzy FTS: 0.3
- Vector: 0.3
可通过 `--weights` 参数自定义:
```bash
codexlens search "query" --mode hybrid --weights 0.5,0.3,0.2
```
### 模型选项
| 模型 | 模型名称 | 维度 | 大小 | 推荐场景 |
|------|---------|------|------|---------|
| fast | BAAI/bge-small-en-v1.5 | 384 | ~80MB | 快速原型 |
| code | jinaai/jina-embeddings-v2-base-code | 768 | ~150MB | **推荐** 代码搜索 |
| multilingual | intfloat/multilingual-e5-large | 1024 | ~1GB | 多语言项目 |
| balanced | mixedbread-ai/mxbai-embed-large-v1 | 1024 | ~600MB | 平衡性能 |
## 兼容性
### 向后兼容
- 所有现有命令仍然工作
- 手动指定 `--mode` 会覆盖自动检测
- 使用 `--no-embeddings` 可恢复旧行为
### 依赖要求
**核心功能**(无需额外依赖):
- FTS 索引exact, fuzzy
- 符号提取
**语义搜索功能**(需要安装):
```bash
pip install codexlens[semantic]
# 或
pip install numpy fastembed
```
## 性能影响
### 初始化时间
- FTS 索引:~2-5 秒100 文件)
- 嵌入生成:+30-60 秒(首次下载模型)
- 后续嵌入:+10-20 秒
### 搜索性能
| 模式 | 延迟 | 召回率 | 推荐场景 |
|------|------|--------|---------|
| exact | 5ms | 中 | 精确代码标识符 |
| fuzzy | 7ms | 中 | 容错搜索 |
| hybrid | 15ms | **最高** | **通用搜索(推荐)** |
| vector | 12ms | 高 | 语义查询 |
| pure-vector | 10ms | 中 | 自然语言 |
## 最小化修改原则
所有修改都遵循最小化原则:
1. **保持向后兼容**:不破坏现有功能
2. **默认智能**:自动检测最佳模式
3. **用户可控**:可通过参数覆盖自动行为
4. **渐进增强**:未安装 fastembed 时优雅降级
## 总结
**init 命令自动生成嵌入**(可用 `--no-embeddings` 跳过)
**search 命令默认使用混合模式**(有嵌入时自动启用)
**MCP 工具简化为核心操作**init, search, search_files
**所有搜索模式支持**auto, exact, fuzzy, hybrid, vector, pure-vector
✅ **文档已更新**反映新的默认行为
**保持向后兼容性**
**优雅降级**(无 fastembed 时使用 exact 模式)
### MCP vs CLI 功能对比
| 功能 | MCP 工具 | CLI |
|------|---------|-----|
| 初始化索引 | ✅ `codex_lens(action="init")` | ✅ `codexlens init` |
| 搜索代码 | ✅ `codex_lens(action="search")` | ✅ `codexlens search` |
| 搜索文件 | ✅ `codex_lens(action="search_files")` | ✅ `codexlens search --files-only` |
| 检查状态 | ❌ 使用 CLI | ✅ `codexlens status` |
| 提取符号 | ❌ 使用 CLI | ✅ `codexlens symbol` |
| 配置管理 | ❌ 使用 CLI | ✅ `codexlens config` |
| 清理索引 | ❌ 使用 CLI | ✅ `codexlens clean` |
**设计理念**MCP 工具专注于高频核心操作(索引、搜索),高级管理操作通过 CLI 执行。

View File

@@ -137,7 +137,7 @@ var helpI18n = {
// Helper function to get help translation
function ht(key, replacements) {
var lang = typeof currentLanguage !== 'undefined' ? currentLanguage : 'en';
var lang = typeof currentLang !== 'undefined' ? currentLang : 'en';
var translations = helpI18n[lang] || helpI18n.en;
var text = translations[key] || helpI18n.en[key] || key;
@@ -150,3 +150,9 @@ function ht(key, replacements) {
return text;
}
// Expose ht function globally
window.ht = ht;
// Debug log to verify loading
console.log('[Help i18n] ht function loaded and exposed to window:', typeof window.ht);

View File

@@ -18,6 +18,9 @@ var activeWorkflowDiagram = 'tdd';
// ========== Main Render Function ==========
async function renderHelpView() {
// Debug: Check if ht function is available
console.log('[Help View] ht function available:', typeof ht, typeof window.ht);
hideStatsAndCarousel();
var container = document.getElementById('mainContent');

View File

@@ -39,26 +39,12 @@ const ParamsSchema = z.object({
'init',
'search',
'search_files',
'symbol',
'status',
'config_show',
'config_set',
'config_migrate',
'clean',
'bootstrap',
'check',
]),
path: z.string().optional(),
query: z.string().optional(),
mode: z.enum(['text', 'semantic']).default('text'),
file: z.string().optional(),
key: z.string().optional(), // For config_set action
value: z.string().optional(), // For config_set action
newPath: z.string().optional(), // For config_migrate action
all: z.boolean().optional(), // For clean action
mode: z.enum(['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector']).default('auto'),
languages: z.array(z.string()).optional(),
limit: z.number().default(20),
format: z.enum(['json', 'table', 'plain']).default('json'),
});
type Params = z.infer<typeof ParamsSchema>;
@@ -447,13 +433,26 @@ async function initIndex(params: Params): Promise<ExecuteResult> {
* @returns Execution result
*/
async function searchCode(params: Params): Promise<ExecuteResult> {
const { query, path = '.', limit = 20 } = params;
const { query, path = '.', limit = 20, mode = 'auto' } = params;
if (!query) {
return { success: false, error: 'Query is required for search action' };
}
const args = ['search', query, '--limit', limit.toString(), '--json'];
// Map MCP mode names to CLI mode names
const modeMap: Record<string, string> = {
'text': 'exact',
'semantic': 'pure-vector',
'auto': 'auto',
'exact': 'exact',
'fuzzy': 'fuzzy',
'hybrid': 'hybrid',
'vector': 'vector',
'pure-vector': 'pure-vector',
};
const cliMode = modeMap[mode] || 'auto';
const args = ['search', query, '--limit', limit.toString(), '--mode', cliMode, '--json'];
const result = await executeCodexLens(args, { cwd: path });
@@ -475,13 +474,26 @@ async function searchCode(params: Params): Promise<ExecuteResult> {
* @returns Execution result
*/
async function searchFiles(params: Params): Promise<ExecuteResult> {
const { query, path = '.', limit = 20 } = params;
const { query, path = '.', limit = 20, mode = 'auto' } = params;
if (!query) {
return { success: false, error: 'Query is required for search_files action' };
}
const args = ['search', query, '--files-only', '--limit', limit.toString(), '--json'];
// Map MCP mode names to CLI mode names
const modeMap: Record<string, string> = {
'text': 'exact',
'semantic': 'pure-vector',
'auto': 'auto',
'exact': 'exact',
'fuzzy': 'fuzzy',
'hybrid': 'hybrid',
'vector': 'vector',
'pure-vector': 'pure-vector',
};
const cliMode = modeMap[mode] || 'auto';
const args = ['search', query, '--files-only', '--limit', limit.toString(), '--mode', cliMode, '--json'];
const result = await executeCodexLens(args, { cwd: path });
@@ -661,20 +673,23 @@ async function cleanIndexes(params: Params): Promise<ExecuteResult> {
// Tool schema for MCP
export const schema: ToolSchema = {
name: 'codex_lens',
description: `CodexLens - Code indexing and search.
description: `CodexLens - Code indexing and semantic search.
Usage:
codex_lens(action="init", path=".") # Index directory
codex_lens(action="search", query="func", path=".") # Search code
codex_lens(action="init", path=".") # Index directory (auto-generates embeddings if available)
codex_lens(action="search", query="func") # Search code (auto: hybrid if embeddings exist, else exact)
codex_lens(action="search", query="func", mode="hybrid") # Force hybrid search
codex_lens(action="search_files", query="x") # Search, return paths only
codex_lens(action="symbol", file="f.py") # Extract symbols
codex_lens(action="status") # Index status
codex_lens(action="config_show") # Show configuration
codex_lens(action="config_set", key="index_dir", value="/path/to/indexes") # Set config
codex_lens(action="config_migrate", newPath="/new/path") # Migrate indexes
codex_lens(action="clean") # Show clean status
codex_lens(action="clean", path=".") # Clean specific project
codex_lens(action="clean", all=true) # Clean all indexes`,
Search Modes:
- auto: Auto-detect (hybrid if embeddings exist, exact otherwise) [default]
- exact/text: Exact FTS for code identifiers
- hybrid: Exact + Fuzzy + Vector fusion (best results, requires embeddings)
- fuzzy: Typo-tolerant search
- vector: Semantic + keyword
- pure-vector/semantic: Pure semantic search
Note: For advanced operations (config, status, clean), use CLI directly: codexlens --help`,
inputSchema: {
type: 'object',
properties: {
@@ -684,68 +699,33 @@ Usage:
'init',
'search',
'search_files',
'symbol',
'status',
'config_show',
'config_set',
'config_migrate',
'clean',
'bootstrap',
'check',
],
description: 'Action to perform',
description: 'Action to perform: init (index directory), search (search code), search_files (search files only)',
},
path: {
type: 'string',
description: 'Target path (for init, search, search_files, status, clean)',
description: 'Target directory path (for init, search, search_files). Defaults to current directory.',
},
query: {
type: 'string',
description: 'Search query (for search and search_files actions)',
description: 'Search query (required for search and search_files actions)',
},
mode: {
type: 'string',
enum: ['text', 'semantic'],
description: 'Search mode (default: text)',
default: 'text',
},
file: {
type: 'string',
description: 'File path (for symbol action)',
},
key: {
type: 'string',
description: 'Config key (for config_set action, e.g., "index_dir")',
},
value: {
type: 'string',
description: 'Config value (for config_set action)',
},
newPath: {
type: 'string',
description: 'New index path (for config_migrate action)',
},
all: {
type: 'boolean',
description: 'Clean all indexes (for clean action)',
default: false,
enum: ['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector'],
description: 'Search mode: auto (default, hybrid if embeddings exist), text/exact (FTS), hybrid (best), fuzzy, vector, semantic/pure-vector',
default: 'auto',
},
languages: {
type: 'array',
items: { type: 'string' },
description: 'Languages to index (for init action)',
description: 'Languages to index (for init action). Example: ["javascript", "typescript", "python"]',
},
limit: {
type: 'number',
description: 'Maximum results (for search and search_files actions)',
description: 'Maximum number of search results (for search and search_files actions)',
default: 20,
},
format: {
type: 'string',
enum: ['json', 'table', 'plain'],
description: 'Output format',
default: 'json',
},
},
required: ['action'],
},
@@ -776,55 +756,9 @@ export async function handler(params: Record<string, unknown>): Promise<ToolResu
result = await searchFiles(parsed.data);
break;
case 'symbol':
result = await extractSymbols(parsed.data);
break;
case 'status':
result = await getStatus(parsed.data);
break;
case 'config_show':
result = await configShow();
break;
case 'config_set':
result = await configSet(parsed.data);
break;
case 'config_migrate':
result = await configMigrate(parsed.data);
break;
case 'clean':
result = await cleanIndexes(parsed.data);
break;
case 'bootstrap': {
// Force re-bootstrap
bootstrapChecked = false;
bootstrapReady = false;
const bootstrapResult = await bootstrapVenv();
result = bootstrapResult.success
? { success: true, message: 'CodexLens bootstrapped successfully' }
: { success: false, error: bootstrapResult.error };
break;
}
case 'check': {
const checkResult = await checkVenvStatus();
result = {
success: checkResult.ready,
ready: checkResult.ready,
error: checkResult.error,
version: checkResult.version,
};
break;
}
default:
throw new Error(
`Unknown action: ${action}. Valid actions: init, search, search_files, symbol, status, config_show, config_set, config_migrate, clean, bootstrap, check`
`Unknown action: ${action}. Valid actions: init, search, search_files`
);
}

View File

@@ -79,6 +79,8 @@ def init(
),
workers: int = typer.Option(4, "--workers", "-w", min=1, max=16, help="Parallel worker processes."),
force: bool = typer.Option(False, "--force", "-f", help="Force full reindex (skip incremental mode)."),
no_embeddings: bool = typer.Option(False, "--no-embeddings", help="Skip automatic embedding generation (if semantic deps installed)."),
embedding_model: str = typer.Option("code", "--embedding-model", help="Embedding model profile: fast, code, multilingual, balanced."),
json_mode: bool = typer.Option(False, "--json", help="Output JSON response."),
verbose: bool = typer.Option(False, "--verbose", "-v", help="Enable debug logging."),
) -> None:
@@ -89,6 +91,9 @@ def init(
By default, uses incremental indexing (skip unchanged files).
Use --force to rebuild all files regardless of modification time.
If semantic search dependencies are installed, automatically generates embeddings
after indexing completes. Use --no-embeddings to skip this step.
"""
_configure_logging(verbose)
config = Config()
@@ -133,6 +138,59 @@ def init(
if build_result.errors:
console.print(f" [yellow]Warnings:[/yellow] {len(build_result.errors)} errors")
# Auto-generate embeddings if semantic search is available
if not no_embeddings:
try:
from codexlens.semantic import SEMANTIC_AVAILABLE
from codexlens.cli.embedding_manager import generate_embeddings
if SEMANTIC_AVAILABLE:
# Find the index file
index_path = Path(build_result.index_root) / "_index.db"
if not json_mode:
console.print("\n[bold]Generating embeddings...[/bold]")
console.print(f"Model: [cyan]{embedding_model}[/cyan]")
# Progress callback for non-json mode
def progress_update(msg: str):
if not json_mode and verbose:
console.print(f" {msg}")
embed_result = generate_embeddings(
index_path,
model_profile=embedding_model,
force=False, # Don't force regenerate during init
chunk_size=2000,
progress_callback=progress_update if not json_mode else None,
)
if embed_result["success"]:
embed_data = embed_result["result"]
result["embeddings_generated"] = True
result["embeddings_count"] = embed_data["chunks_embedded"]
if not json_mode:
console.print(f"[green]✓[/green] Generated [bold]{embed_data['chunks_embedded']}[/bold] embeddings in {embed_data['elapsed_time']:.1f}s")
else:
if not json_mode:
console.print(f"[yellow]Warning:[/yellow] Embedding generation failed: {embed_result.get('error', 'Unknown error')}")
result["embeddings_generated"] = False
result["embeddings_error"] = embed_result.get("error")
else:
if not json_mode and verbose:
console.print("[dim]Semantic search not available. Skipping embeddings.[/dim]")
result["embeddings_generated"] = False
result["embeddings_error"] = "Semantic dependencies not installed"
except Exception as e:
if not json_mode and verbose:
console.print(f"[yellow]Warning:[/yellow] Could not generate embeddings: {e}")
result["embeddings_generated"] = False
result["embeddings_error"] = str(e)
else:
result["embeddings_generated"] = False
result["embeddings_error"] = "Skipped (--no-embeddings)"
except StorageError as exc:
if json_mode:
print_json(success=False, error=f"Storage error: {exc}")
@@ -181,7 +239,7 @@ def search(
limit: int = typer.Option(20, "--limit", "-n", min=1, max=500, help="Max results."),
depth: int = typer.Option(-1, "--depth", "-d", help="Search depth (-1 = unlimited, 0 = current only)."),
files_only: bool = typer.Option(False, "--files-only", "-f", help="Return only file paths without content snippets."),
mode: str = typer.Option("exact", "--mode", "-m", help="Search mode: exact, fuzzy, hybrid, vector, pure-vector."),
mode: str = typer.Option("auto", "--mode", "-m", help="Search mode: auto, exact, fuzzy, hybrid, vector, pure-vector."),
weights: Optional[str] = typer.Option(None, "--weights", help="Custom RRF weights as 'exact,fuzzy,vector' (e.g., '0.5,0.3,0.2')."),
json_mode: bool = typer.Option(False, "--json", help="Output JSON response."),
verbose: bool = typer.Option(False, "--verbose", "-v", help="Enable debug logging."),
@@ -192,7 +250,8 @@ def search(
Use --depth to limit search recursion (0 = current dir only).
Search Modes:
- exact: Exact FTS using unicode61 tokenizer (default) - for code identifiers
- auto: Auto-detect (hybrid if embeddings exist, exact otherwise) [default]
- exact: Exact FTS using unicode61 tokenizer - for code identifiers
- fuzzy: Fuzzy FTS using trigram tokenizer - for typo-tolerant search
- hybrid: RRF fusion of exact + fuzzy + vector (recommended) - best recall
- vector: Vector search with exact FTS fallback - semantic + keyword
@@ -207,20 +266,23 @@ def search(
Use --weights to customize (e.g., --weights 0.5,0.3,0.2)
Examples:
# Exact code search
# Auto-detect mode (uses hybrid if embeddings available)
codexlens search "authentication"
# Explicit exact code search
codexlens search "authenticate_user" --mode exact
# Semantic search (requires embeddings)
codexlens search "how to verify user credentials" --mode pure-vector
# Best of both worlds
# Force hybrid mode
codexlens search "authentication" --mode hybrid
"""
_configure_logging(verbose)
search_path = path.expanduser().resolve()
# Validate mode
valid_modes = ["exact", "fuzzy", "hybrid", "vector", "pure-vector"]
valid_modes = ["auto", "exact", "fuzzy", "hybrid", "vector", "pure-vector"]
if mode not in valid_modes:
if json_mode:
print_json(success=False, error=f"Invalid mode: {mode}. Must be one of: {', '.join(valid_modes)}")
@@ -258,19 +320,48 @@ def search(
engine = ChainSearchEngine(registry, mapper)
# Auto-detect mode if set to "auto"
actual_mode = mode
if mode == "auto":
# Check if embeddings are available by looking for project in registry
project_record = registry.find_by_source_path(str(search_path))
has_embeddings = False
if project_record:
# Check if index has embeddings
index_path = Path(project_record["index_root"]) / "_index.db"
try:
from codexlens.cli.embedding_manager import check_embeddings_status
embed_status = check_embeddings_status(index_path)
if embed_status["success"]:
embed_data = embed_status["result"]
has_embeddings = embed_data["has_embeddings"] and embed_data["chunks_count"] > 0
except Exception:
pass
# Choose mode based on embedding availability
if has_embeddings:
actual_mode = "hybrid"
if not json_mode and verbose:
console.print("[dim]Auto-detected mode: hybrid (embeddings available)[/dim]")
else:
actual_mode = "exact"
if not json_mode and verbose:
console.print("[dim]Auto-detected mode: exact (no embeddings)[/dim]")
# Map mode to options
if mode == "exact":
if actual_mode == "exact":
hybrid_mode, enable_fuzzy, enable_vector, pure_vector = False, False, False, False
elif mode == "fuzzy":
elif actual_mode == "fuzzy":
hybrid_mode, enable_fuzzy, enable_vector, pure_vector = False, True, False, False
elif mode == "vector":
elif actual_mode == "vector":
hybrid_mode, enable_fuzzy, enable_vector, pure_vector = True, False, True, False # Vector + exact fallback
elif mode == "pure-vector":
elif actual_mode == "pure-vector":
hybrid_mode, enable_fuzzy, enable_vector, pure_vector = True, False, True, True # Pure vector only
elif mode == "hybrid":
elif actual_mode == "hybrid":
hybrid_mode, enable_fuzzy, enable_vector, pure_vector = True, True, True, False
else:
raise ValueError(f"Invalid mode: {mode}")
raise ValueError(f"Invalid mode: {actual_mode}")
options = SearchOptions(
depth=depth,
@@ -295,7 +386,7 @@ def search(
result = engine.search(query, search_path, options)
payload = {
"query": query,
"mode": mode,
"mode": actual_mode,
"count": len(result.results),
"results": [
{
@@ -316,7 +407,7 @@ def search(
print_json(success=True, result=payload)
else:
render_search_results(result.results, verbose=verbose)
console.print(f"[dim]Mode: {mode} | Searched {result.stats.dirs_searched} directories in {result.stats.time_ms:.1f}ms[/dim]")
console.print(f"[dim]Mode: {actual_mode} | Searched {result.stats.dirs_searched} directories in {result.stats.time_ms:.1f}ms[/dim]")
except SearchError as exc:
if json_mode: