Files
Claude-Code-Workflow/codex-lens/docs/T6-CLI-Integration-Summary.md
catlog22 3da0ef2adb Add comprehensive tests for query parsing and Reciprocal Rank Fusion
- Implemented tests for the QueryParser class, covering various identifier splitting methods (CamelCase, snake_case, kebab-case), OR expansion, and FTS5 operator preservation.
- Added parameterized tests to validate expected token outputs for different query formats.
- Created edge case tests to ensure robustness against unusual input scenarios.
- Developed tests for the Reciprocal Rank Fusion (RRF) algorithm, including score computation, weight handling, and result ranking across multiple sources.
- Included tests for normalization of BM25 scores and tagging search results with source metadata.
2025-12-16 10:20:19 +08:00

249 lines
6.2 KiB
Markdown

# T6: CLI Integration for Hybrid Search - Implementation Summary
## Overview
Successfully integrated hybrid search capabilities into the CodexLens CLI with user-configurable options, migration support, and enhanced status reporting.
## Changes Made
### 1. Search Command Enhancement (`commands.py`)
**New `--mode` Parameter:**
- Replaced `--hybrid` and `--exact-only` flags with unified `--mode` parameter
- Supported modes: `exact`, `fuzzy`, `hybrid`, `vector`
- Default: `exact` (backward compatible)
**Mode Validation:**
```python
valid_modes = ["exact", "fuzzy", "hybrid", "vector"]
if mode not in valid_modes:
# Error with helpful message
```
**Weights Configuration:**
- Accepts custom RRF weights via `--weights exact,fuzzy,vector`
- Example: `--weights 0.5,0.3,0.2`
- Automatic normalization if weights don't sum to 1.0
- Validation for 3-value format
**Mode Mapping to SearchOptions:**
```python
hybrid_mode = mode == "hybrid"
enable_fuzzy = mode in ["fuzzy", "hybrid"]
options = SearchOptions(
hybrid_mode=hybrid_mode,
enable_fuzzy=enable_fuzzy,
hybrid_weights=hybrid_weights,
)
```
**Enhanced Output:**
- Shows search mode in status line
- Includes search source tags in verbose mode
- JSON output includes mode and source information
### 2. Migrate Command (`commands.py`)
**New Command for Dual-FTS Upgrade:**
```bash
codex-lens migrate [path]
```
**Features:**
- Upgrades all `_index.db` files to schema version 4
- Shows progress bar with percentage complete
- Tracks: migrated, already up-to-date, errors
- Safe operation preserving all data
- Verbose mode shows per-database migration details
**Progress Tracking:**
- Uses Rich progress bar with spinner
- Shows percentage and count (N/Total)
- Time elapsed indicator
### 3. Status Command Enhancement (`commands.py`)
**New Backend Status Display:**
```
Search Backends:
Exact FTS: ✓ (unicode61)
Fuzzy FTS: ✓ (trigram)
Hybrid Search: ✓ (RRF fusion)
Vector Search: ✗ (future)
```
**Schema Version Detection:**
- Checks first available `_index.db`
- Reports schema version
- Detects dual FTS table presence
**Feature Flags in JSON:**
```json
{
"features": {
"exact_fts": true,
"fuzzy_fts": true,
"hybrid_search": true,
"vector_search": false
}
}
```
### 4. Output Rendering (`output.py`)
**Verbose Mode Support:**
```python
render_search_results(results, verbose=True)
```
**Search Source Tags:**
- `[E]` - Exact FTS result
- `[F]` - Fuzzy FTS result
- `[V]` - Vector search result
- `[RRF]` - Fusion result
**Enhanced Table:**
- New "Source" column in verbose mode
- Shows result origin for debugging
- Fusion scores visible
## Usage Examples
### 1. Search with Different Modes
```bash
# Exact search (default)
codex-lens search "authentication"
# Fuzzy search only
codex-lens search "authentication" --mode fuzzy
# Hybrid search with RRF fusion
codex-lens search "authentication" --mode hybrid
# Hybrid with custom weights
codex-lens search "authentication" --mode hybrid --weights 0.5,0.3,0.2
# Verbose mode shows source tags
codex-lens search "authentication" --mode hybrid -v
```
### 2. Migration
```bash
# Migrate current project
codex-lens migrate
# Migrate specific project with verbose output
codex-lens migrate /path/to/project -v
# JSON output for automation
codex-lens migrate --json
```
### 3. Status Checking
```bash
# Check backend availability
codex-lens status
# JSON output with feature flags
codex-lens status --json
```
## Testing
**Test Coverage:**
- ✅ Mode parameter validation (exact, fuzzy, hybrid, vector)
- ✅ Weights parsing and normalization
- ✅ Help text shows all modes
- ✅ Migrate command exists and accessible
- ✅ Status command shows backends
- ✅ Mode mapping to SearchOptions
**Test Results:**
```
11 passed in 2.27s
```
## Integration Points
### With Phase 1 (Dual-FTS):
- Uses `search_fts_exact()` for exact mode
- Uses `search_fts_fuzzy()` for fuzzy mode
- Schema migration via `_apply_migrations()`
### With Phase 2 (Hybrid Search):
- Calls `HybridSearchEngine` for hybrid mode
- Passes custom weights to RRF algorithm
- Displays fusion scores and source tags
### With Existing CLI:
- Backward compatible (default mode=exact)
- Follows existing error handling patterns
- Uses Rich for progress and formatting
- Supports JSON output mode
## Done Criteria Verification
**CLI search --mode exact uses only exact FTS table**
- Mode validation ensures correct backend selection
- `hybrid_mode=False, enable_fuzzy=False` for exact mode
**--mode fuzzy uses only fuzzy table**
- `hybrid_mode=False, enable_fuzzy=True` for fuzzy mode
- Single backend execution
**--mode hybrid fuses both**
- `hybrid_mode=True, enable_fuzzy=True` activates RRF fusion
- HybridSearchEngine coordinates parallel search
**Custom weights via --weights 0.5,0.3,0.2**
- Parses 3-value comma-separated format
- Validates and normalizes to sum=1.0
- Passes to RRF algorithm
**Migration command completes Dual-FTS upgrade**
- Shows progress bar with percentage
- Tracks migration status per database
- Safe operation with error handling
**Search output shows [E], [F], [V] tags and fusion scores**
- Verbose mode displays Source column
- Tags extracted from `search_source` attribute
- Fusion scores shown in Score column
## Files Modified
1. `codex-lens/src/codexlens/cli/commands.py`
- Updated `search()` command with `--mode` parameter
- Added `migrate()` command
- Enhanced `status()` command
- Added DirIndexStore import
2. `codex-lens/src/codexlens/cli/output.py`
- Updated `render_search_results()` with verbose mode
- Added source tag display logic
3. `codex-lens/tests/test_cli_hybrid_search.py` (new)
- Comprehensive CLI integration tests
- Mode validation tests
- Weights parsing tests
- Command availability tests
## Performance Impact
- **Exact mode**: Same as before (no overhead)
- **Fuzzy mode**: Single FTS query (minimal overhead)
- **Hybrid mode**: Parallel execution (2x I/O, no sequential penalty)
- **Migration**: One-time operation, safe for large projects
## Next Steps
Users can now:
1. Run `codex-lens migrate` to upgrade existing indexes
2. Use `codex-lens search "query" --mode hybrid` for best results
3. Check `codex-lens status` to verify enabled features
4. Tune fusion weights for their use case via `--weights`