- Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors.
- Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking.
- Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval.
- Introduced utility functions for embedding conversion and distance computation.
chore: Migration 010 - Add multi-vector storage support
- Added 'chunks' table to support multi-vector embeddings for cascade retrieval.
- Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage.
- Implemented upgrade and downgrade functions to manage schema changes and data migration.
- Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing.
- Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points.
- Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database.
- Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval.
- Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.
- Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors.
- Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring.
- Created migrations to establish necessary database tables for relationships and graph neighbors.
- Developed tests for graph expansion functionality, ensuring related results are populated correctly.
- Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead.
- Updated schema cleanup tests to reflect changes in versioning and deprecated fields.
- Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.
- Added integration tests for adaptive RRF weights in hybrid search.
- Enhanced query intent detection with new classifications: keyword, semantic, and mixed.
- Introduced symbol boosting in search results based on explicit symbol matches.
- Implemented embedding-based reranking with configurable options.
- Added global symbol index for efficient symbol lookups across projects.
- Improved file deletion handling on Windows to avoid permission errors.
- Updated chunk configuration to increase overlap for better context.
- Modified package.json test script to target specific test files.
- Created comprehensive writing style guidelines for documentation.
- Added TypeScript tests for query intent detection and adaptive weights.
- Established performance benchmarks for global symbol indexing.
- Added search limit, content length, and extra files input fields in the CodexLens manager UI.
- Updated API request parameters to include new fields: max_content_length and extra_files_count.
- Refactored smart-search.ts to support new parameters with default values.
- Implemented result splitting logic to return both full content and additional file paths.
- Updated CLI commands to remove worker limits and allow dynamic scaling based on endpoint count.
- Introduced EmbeddingPoolConfig for improved embedding management and auto-discovery of providers.
- Enhanced search engines to utilize new parameters for fuzzy and exact searches.
- Added support for embedding single texts in the LiteLLM embedder.
- Added functions to get and update CodexLens embedding rotation configuration.
- Introduced functionality to retrieve enabled embedding providers for rotation.
- Created endpoints for managing rotation configuration via API.
- Enhanced dashboard UI to support multi-provider rotation configuration.
- Updated internationalization strings for new rotation features.
- Adjusted CLI commands and embedding manager to support increased concurrency limits.
- Modified hybrid search weights for improved ranking behavior.
- Updated embedding_manager.py to include backend parameter in model configuration.
- Modified model_manager.py to utilize cache_name for ONNX models.
- Refactored hybrid_search.py to improve embedder initialization based on backend type.
- Added backend column to vector_store.py for better model configuration management.
- Implemented migration for existing database to include backend information.
- Enhanced API settings implementation with comprehensive provider and endpoint management.
- Introduced LiteLLM integration guide detailing configuration and usage.
- Added examples for LiteLLM usage in TypeScript.
- Fix model installation detection using fastembed ONNX cache names
- Add embeddings_config table for model metadata tracking
- Fix hybrid search segfault by using single-threaded GPU mode
- Suppress INFO logs in JSON mode to prevent error display
- Add model dropdown filtering to show only installed models
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated the dashboard template to hide the Code Graph Explorer feature.
- Enhanced the `executeCodexLens` function to use `exec` for better cross-platform compatibility and improved command execution.
- Changed the default `maxResults` and `limit` parameters in the smart search tool to 10 for better performance.
- Introduced a new `priority` search mode in the smart search tool, replacing the previous `parallel` mode, which now follows a fallback strategy: hybrid -> exact -> ripgrep.
- Optimized the embedding generation process in the embedding manager by batching operations and using a cached embedder instance to reduce model loading overhead.
- Implemented a thread-safe singleton pattern for the embedder to improve performance across multiple searches.
- Added ANNIndex class for approximate nearest neighbor search using HNSW.
- Integrated ANN index with VectorStore for enhanced search capabilities.
- Updated test suite for ANN index, including tests for adding, searching, saving, and loading vectors.
- Modified existing tests to accommodate changes in search performance expectations.
- Improved error handling for file operations in tests to ensure compatibility with Windows file locks.
- Adjusted hybrid search performance assertions for increased stability in CI environments.
- Implement tests for migration 005 to verify removal of deprecated fields in the database schema.
- Ensure that new databases are created with a clean schema.
- Validate that keywords are correctly extracted from the normalized file_keywords table.
- Test symbol insertion without deprecated fields and subdir operations without direct_files.
- Create a detailed search comparison test to evaluate vector search vs hybrid search performance.
- Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality.
- Include a test script to check TreeSitter parser availability and relationship extraction from sample files.
- Implemented tests for the QueryParser class, covering various identifier splitting methods (CamelCase, snake_case, kebab-case), OR expansion, and FTS5 operator preservation.
- Added parameterized tests to validate expected token outputs for different query formats.
- Created edge case tests to ensure robustness against unusual input scenarios.
- Developed tests for the Reciprocal Rank Fusion (RRF) algorithm, including score computation, weight handling, and result ranking across multiple sources.
- Included tests for normalization of BM25 scores and tagging search results with source metadata.