mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-04 01:40:45 +08:00
Semantic Search Integration
Overview
The ChainSearchEngine now supports semantic keyword search in addition to FTS5 full-text search.
Usage
Enable Semantic Search
from pathlib import Path
from codexlens.search.chain_search import ChainSearchEngine, SearchOptions
from codexlens.storage.registry import RegistryStore
from codexlens.storage.path_mapper import PathMapper
# Initialize
registry = RegistryStore()
registry.initialize()
mapper = PathMapper()
engine = ChainSearchEngine(registry, mapper)
# Create options with semantic search enabled
options = SearchOptions(
    include_semantic=True,  # Enable semantic keyword search
    total_limit=50,
)
# Execute search
result = engine.search("authentication", Path("./src"), options)
# Results include both FTS and semantic matches
for r in result.results:
    print(f"{r.path}: {r.score:.2f} - {r.excerpt}")
How It Works
- FTS Search: Traditional full-text search using SQLite FTS5
- Semantic Search: Searches the semantic_metadata.keywords field
- Result Merging: Semantic results are added with 0.8x weight
  - FTS results: BM25 score from SQLite
  - Semantic results: Base score of 10.0 * 0.8 = 8.0
- Deduplication: _merge_and_rank() deduplicates by path, keeping the highest score
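The merging and deduplication steps above can be sketched roughly as follows. This is a minimal illustration, not the actual _merge_and_rank() implementation: the Hit dataclass, the semantic_hit and merge_and_rank functions, the constant names, and the sample paths are all hypothetical stand-ins. Only the 10.0 base score, the 0.8x weight, and the keep-highest-score-per-path rule come from the description above.

```python
from dataclasses import dataclass

SEMANTIC_BASE_SCORE = 10.0  # base score stated in the description above
SEMANTIC_WEIGHT = 0.8       # weight stated in the description above


@dataclass
class Hit:
    """Hypothetical stand-in for SearchResult (not the real class)."""
    path: str
    score: float
    excerpt: str


def semantic_hit(path, keywords):
    """Build a semantic hit with the fixed weighted score (10.0 * 0.8)."""
    return Hit(path, SEMANTIC_BASE_SCORE * SEMANTIC_WEIGHT,
               "Keywords: " + ", ".join(keywords))


def merge_and_rank(hits, limit=50):
    """Deduplicate by path, keep the highest score, sort by score descending."""
    best = {}
    for hit in hits:
        current = best.get(hit.path)
        if current is None or hit.score > current.score:
            best[hit.path] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:limit]


# Sample data (made up for illustration)
fts_hits = [Hit("src/auth.py", 12.5, "def authenticate(user): ..."),
            Hit("src/session.py", 3.1, "session token ...")]
semantic_hits = [semantic_hit("src/auth.py", ["auth", "login"]),
                 semantic_hit("src/session.py", ["session", "token"])]
merged = merge_and_rank(fts_hits + semantic_hits)
```

In this sample, src/auth.py keeps its FTS entry because 12.5 exceeds the fixed semantic score, while src/session.py is represented by its semantic entry because its FTS score is lower.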
Result Format
- FTS results: Regular excerpt from matched content
- Semantic results: Keywords: keyword1, keyword2, keyword3, ...
Prerequisites
Files must have semantic metadata generated via:
codex-lens enhance . --tool gemini
This uses CCW CLI to generate summaries, keywords, and purpose descriptions.
Implementation Details
Changes Made
- SearchOptions: Added include_semantic: bool = False parameter
- _search_parallel(): Passes include_semantic to worker threads
- _search_single_index():
  - Accepts include_semantic parameter
  - Calls DirIndexStore.search_semantic_keywords() when enabled
  - Converts semantic matches to SearchResult objects
  - Applies 0.8x weight to semantic scores
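One way the flag could travel from the options object to per-index workers is sketched below. This is an illustration under assumptions, not the project's code: Options, search_single_index, search_parallel, and the in-memory INDEXES data are hypothetical, the 12.5 value is a placeholder for a BM25 score, and the real _search_parallel() may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical stand-in for SearchOptions."""
    include_semantic: bool = False  # off by default, as in the change list
    total_limit: int = 50


# Hypothetical in-memory "indexes": index name -> {path: (content, keywords)}
INDEXES = {
    "a": {"src/auth.py": ("def login(): ...", ["authentication", "login"])},
    "b": {"src/db.py": ("authentication table schema", ["database"])},
}


def search_single_index(name, query, include_semantic):
    """Return (path, score, excerpt) tuples for one index."""
    results = []
    for path, (content, keywords) in INDEXES[name].items():
        if query in content:
            # Placeholder score standing in for a BM25 value
            results.append((path, 12.5, content))
        if include_semantic and query in keywords:
            # Fixed weighted score for keyword matches: 10.0 * 0.8
            results.append((path, 10.0 * 0.8,
                            "Keywords: " + ", ".join(keywords)))
    return results


def search_parallel(query, options):
    """Fan out over all indexes, forwarding include_semantic to each worker."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(search_single_index, name, query,
                               options.include_semantic)
                   for name in INDEXES]
        # Collect in submission order so the output is deterministic
        results = [r for f in futures for r in f.result()]
    return results[:options.total_limit]


fts_only = search_parallel("authentication", Options())
with_semantic = search_parallel("authentication",
                                Options(include_semantic=True))
```

With the default options only the content match in src/db.py is returned. Enabling include_semantic additionally surfaces src/auth.py, which matches only through its keywords.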
Score Weighting
# FTS result (from BM25)
SearchResult(path="...", score=12.5, excerpt="...")
# Semantic result (fixed weighted score)
SearchResult(path="...", score=8.0, excerpt="Keywords: ...")
The 0.8x weight gives every semantic match a fixed score of 8.0, so it ranks below any FTS match whose BM25 score is higher (such as the 12.5 above) while still appearing in the results.