- Updated the Chunker class to adjust the window movement logic, ensuring proper handling of overlap lines.
- Introduced a new smart search tool with features including intent classification, CodexLens integration, multi-backend search routing, and index status checking.
- Implemented various search modes (auto, hybrid, exact, ripgrep, priority) with detailed metadata and error handling.
- Added support for progress tracking during index initialization and enhanced output transformation based on user-defined modes.
- Included comprehensive documentation for usage and parameters in the smart search tool.
- Updated the dashboard template to hide the Code Graph Explorer feature.
- Enhanced the `executeCodexLens` function to use `exec` for better cross-platform compatibility and improved command execution.
- Changed the default `maxResults` and `limit` parameters in the smart search tool to 10 for better performance.
- Introduced a new `priority` search mode in the smart search tool. It replaces the previous `parallel` mode and tries backends in a fallback order: hybrid -> exact -> ripgrep.
- Optimized the embedding generation process in the embedding manager by batching operations and using a cached embedder instance to reduce model loading overhead.
- Implemented a thread-safe singleton pattern for the embedder to improve performance across multiple searches.
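
A minimal sketch of the cached-embedder idea from the last two items, using double-checked locking. The names `DummyEmbedder`, `get_embedder`, and `embed_in_batches` are hypothetical and stand in for the real model loader. This is not the actual embedding manager code.

```python
import threading

_lock = threading.Lock()
_embedder = None  # cached instance shared by all callers


class DummyEmbedder:
    """Hypothetical stand-in for an expensive-to-load embedding model."""

    def embed_batch(self, texts):
        # Placeholder vectors: one single-element vector per input text.
        return [[float(len(t))] for t in texts]


def get_embedder(factory=DummyEmbedder):
    """Return the shared embedder, creating it at most once."""
    global _embedder
    if _embedder is None:          # fast path once initialized
        with _lock:
            if _embedder is None:  # re-check while holding the lock
                _embedder = factory()
    return _embedder


def embed_in_batches(texts, batch_size=2):
    """Embed texts in fixed-size batches using the cached embedder."""
    embedder = get_embedder()
    vectors = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(embedder.embed_batch(texts[start:start + batch_size]))
    return vectors
```

Repeated searches then reuse one loaded model instead of paying the load cost per call.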
Previously, embeddings were only generated for root directory files (1.6% coverage, 5/303 files).
This fix implements recursive processing across all subdirectory indexes, achieving 100% coverage
with 2,042 semantic chunks across all 303 files in 26 index databases.
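
A rough sketch of the recursive discovery step, assuming each indexed directory holds a file named `_index.db`. The function names `find_index_dbs` and `coverage_percent` are illustrative only and may not match the signatures in embedding_manager.py.

```python
from pathlib import Path


def find_index_dbs(root):
    """Return every _index.db under root (root included), sorted for stable order."""
    return sorted(Path(root).rglob("_index.db"))


def coverage_percent(files_with_embeddings, total_files):
    """Share of indexed files that have embeddings, as a percentage."""
    if total_files == 0:
        return 0.0  # avoid division by zero on an empty index
    return 100.0 * files_with_embeddings / total_files
```

Walking the whole tree, rather than reading only the root database, is what lets subdirectory indexes count toward coverage.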
Key improvements:
1. **Recursive embeddings generation** (embedding_manager.py):
   - Add generate_embeddings_recursive() to process all _index.db files in the directory tree
   - Add get_embeddings_status() for comprehensive coverage statistics
   - Add discover_all_index_dbs() helper for recursive file discovery
2. **Enhanced CLI commands** (commands.py):
   - embeddings-generate: Add --recursive flag for full project coverage
   - init: Use recursive generation by default for complete indexing
   - status: Display embeddings coverage statistics with 50% threshold
3. **Smart search routing improvements** (smart-search.ts):
   - Add 50% embeddings coverage threshold for hybrid mode routing
   - Auto-fallback to exact mode when coverage is insufficient
   - Strip ANSI color codes from JSON output for correct parsing
   - Add embeddings_coverage_percent to IndexStatus and SearchMetadata
   - Provide clear warnings with actionable suggestions
4. **Documentation and analysis**:
   - Add SMART_SEARCH_ANALYSIS.md with initial investigation
   - Add SMART_SEARCH_CORRECTED_ANALYSIS.md revealing true extent of issue
   - Add EMBEDDINGS_FIX_SUMMARY.md with complete fix summary
   - Add check_embeddings.py script for coverage verification
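
The routing rule and ANSI stripping from item 3 could look roughly like this. It is written in Python for illustration, although the real logic lives in smart-search.ts. The helper names, the strict `<` comparison, and the warning text are assumptions.

```python
import json
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # SGR color sequences such as \x1b[32m
COVERAGE_THRESHOLD = 50.0                # percent, per the threshold above


def strip_ansi(text):
    """Remove ANSI color codes so the remaining text is plain."""
    return ANSI_RE.sub("", text)


def parse_cli_json(raw):
    """Parse JSON from CLI output that may contain color codes."""
    return json.loads(strip_ansi(raw))


def choose_mode(requested, coverage_percent):
    """Return (mode, warning); hybrid falls back to exact on low coverage."""
    if requested == "hybrid" and coverage_percent < COVERAGE_THRESHOLD:
        warning = (
            f"embeddings coverage {coverage_percent:.1f}% is below "
            f"{COVERAGE_THRESHOLD:.0f}%; using exact mode. "
            "Consider running embeddings-generate --recursive."
        )
        return "exact", warning
    return requested, None
```

Returning the warning alongside the mode keeps the fallback visible to the caller instead of silently changing behavior.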
Results:
- Coverage improved from 1.6% (5/303 files) to 100% (303/303 files) - roughly a 60x increase
- Semantic chunks increased from 10 to 2,042 - 204x increase
- All 26 subdirectory indexes now have embeddings, versus just 1 before
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Implement tests for migration 005 to verify removal of deprecated fields in the database schema.
- Ensure that new databases are created with a clean schema.
- Validate that keywords are correctly extracted from the normalized file_keywords table.
- Test symbol insertion without deprecated fields and subdir operations without direct_files.
- Create a detailed search comparison test to evaluate vector search vs hybrid search performance.
- Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality.
- Include a test script to check TreeSitter parser availability and relationship extraction from sample files.