# Semantic Search Integration

## Overview

The `ChainSearchEngine` now supports semantic keyword search in addition to FTS5 full-text search.

## Usage

### Enable Semantic Search

```python
from pathlib import Path
from codexlens.search.chain_search import ChainSearchEngine, SearchOptions
from codexlens.storage.registry import RegistryStore
from codexlens.storage.path_mapper import PathMapper

# Initialize
registry = RegistryStore()
registry.initialize()
mapper = PathMapper()
engine = ChainSearchEngine(registry, mapper)

# Create options with semantic search enabled
options = SearchOptions(
    include_semantic=True,  # Enable semantic keyword search
    total_limit=50
)

# Execute search
result = engine.search("authentication", Path("./src"), options)

# Results include both FTS and semantic matches
for r in result.results:
    print(f"{r.path}: {r.score:.2f} - {r.excerpt}")
```

### How It Works

1. **FTS Search**: Traditional full-text search using SQLite FTS5
2. **Semantic Search**: Searches the `semantic_metadata.keywords` field
3. **Result Merging**: Semantic results are added with 0.8x weight
   - FTS results: BM25 score from SQLite
   - Semantic results: Base score of 10.0 * 0.8 = 8.0
4. **Deduplication**: `_merge_and_rank()` deduplicates by path, keeping the highest score

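The merging and deduplication steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `_merge_and_rank()` implementation: the `Hit` dataclass, the `merge_and_rank()` function, and the sample paths and scores are hypothetical stand-ins, and only the 10.0 base score, the 0.8 weight, and the "highest score per path" rule come from the description above.

```python
from dataclasses import dataclass

SEMANTIC_BASE_SCORE = 10.0  # base score stated in the list above
SEMANTIC_WEIGHT = 0.8       # weight stated in the list above


@dataclass
class Hit:
    """Hypothetical stand-in for the library's SearchResult."""
    path: str
    score: float
    excerpt: str


def merge_and_rank(fts_hits, semantic_hits, limit=50):
    """Keep the highest-scoring hit per path, then sort by descending score."""
    best = {}
    for hit in list(fts_hits) + list(semantic_hits):
        current = best.get(hit.path)
        if current is None or hit.score > current.score:
            best[hit.path] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:limit]


# Illustrative data only: paths, scores, and excerpts are made up.
semantic_score = SEMANTIC_BASE_SCORE * SEMANTIC_WEIGHT
fts = [
    Hit("src/auth.py", 12.5, "def authenticate(user): ..."),
    Hit("src/util.py", 3.1, "# authentication helpers"),
]
semantic = [
    Hit("src/auth.py", semantic_score, "Keywords: auth, login"),
    Hit("src/session.py", semantic_score, "Keywords: session, token"),
]

merged = merge_and_rank(fts, semantic)
for hit in merged:
    print(f"{hit.path}: {hit.score:.2f} - {hit.excerpt}")
```

In this sketch `src/auth.py` appears once, with its FTS score, because that score is higher than the fixed semantic score for the same path. The low-scoring FTS hit for `src/util.py` ends up below the semantic-only hit.
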
### Result Format

- **FTS results**: Regular excerpt from matched content
- **Semantic results**: `Keywords: keyword1, keyword2, keyword3, ...`

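On the consuming side, the two excerpt formats can be told apart by the `Keywords: ` prefix. The helper below is a hypothetical sketch, not part of the library. It assumes keywords are separated by commas, and it treats a trailing `...` as a truncation marker rather than a keyword. An FTS excerpt that happens to start with `Keywords: ` would be misclassified, so treat this as a heuristic.

```python
SEMANTIC_PREFIX = "Keywords: "


def parse_keyword_excerpt(excerpt):
    """Return the keyword list for a semantic excerpt, or None for an FTS excerpt.

    Hypothetical helper: relies only on the excerpt format documented above.
    """
    if not excerpt.startswith(SEMANTIC_PREFIX):
        return None
    body = excerpt[len(SEMANTIC_PREFIX):]
    keywords = [kw.strip() for kw in body.split(",")]
    # Drop empty entries and a trailing "..." truncation marker, if present.
    return [kw for kw in keywords if kw and kw != "..."]


print(parse_keyword_excerpt("Keywords: auth, login, token, ..."))
print(parse_keyword_excerpt("def authenticate(user): ..."))
```
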
### Prerequisites

Files must have semantic metadata generated via:

```bash
codex-lens enhance . --tool gemini
```

This uses the CCW CLI to generate summaries, keywords, and purpose descriptions.

## Implementation Details

### Changes Made

1. **`SearchOptions`**: Added `include_semantic: bool = False` parameter
2. **`_search_parallel()`**: Passes `include_semantic` to worker threads
3. **`_search_single_index()`**:
   - Accepts `include_semantic` parameter
   - Calls `DirIndexStore.search_semantic_keywords()` when enabled
   - Converts semantic matches to `SearchResult` objects
   - Applies 0.8x weight to semantic scores

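The conversion step in `_search_single_index()` might look roughly like the sketch below. The return shape of `DirIndexStore.search_semantic_keywords()` is not documented here, so the `(path, keywords)` tuples are an assumption. The `SearchResult` dataclass is a hypothetical stand-in with only the three fields used in this document, and `semantic_matches_to_results()` is an illustrative name, not a function from the codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class SearchResult:
    """Hypothetical stand-in exposing only path, score, and excerpt."""
    path: str
    score: float
    excerpt: str


def semantic_matches_to_results(
    matches: Iterable[Tuple[str, Sequence[str]]],
    base_score: float = 10.0,  # base score described in this document
    weight: float = 0.8,       # semantic weight described in this document
) -> List[SearchResult]:
    """Convert assumed (path, keywords) pairs into weighted SearchResult objects."""
    weighted_score = base_score * weight
    return [
        SearchResult(
            path=path,
            score=weighted_score,
            excerpt="Keywords: " + ", ".join(keywords),
        )
        for path, keywords in matches
    ]


# Illustrative input only; real matches come from the index store.
results = semantic_matches_to_results([("src/auth.py", ["auth", "login"])])
print(results[0])
```
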
### Score Weighting

```python
# FTS result (from BM25)
SearchResult(path="...", score=12.5, excerpt="...")

# Semantic result (fixed weighted score)
SearchResult(path="...", score=8.0, excerpt="Keywords: ...")
```

The 0.8x weight is intended to rank semantic matches slightly below direct FTS matches while still surfacing them in relevant results. Because the semantic score is fixed at 8.0, this ordering holds only for FTS matches whose score exceeds 8.0.