feat: Enhance embedding generation and search capabilities

mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-03-01 13:13:50 +08:00

- Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance.
- Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving.
- Introduced SPLADE sparse index generation with improved handling and metadata storage.
- Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index.
- Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance.
- Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference.
- Improved `SpladeIndex` with cache size adjustments for better query performance.
- Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval.
- Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.
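
To illustrate the binary-vector items above, here is a minimal sketch of Hamming-distance search over a memory-mapped file. It is not the project's `BinarySearcher` implementation: the function names (`binarize`, `save_mmap`, `load_mmap`, `hamming_top_k`), the sign-threshold binarization, and the brute-force scan are all assumptions made for illustration. Only NumPy is required.

```python
import os
import tempfile

import numpy as np

# Lookup table: number of set bits for every possible byte value.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def binarize(dense: np.ndarray) -> np.ndarray:
    """Sign-binarize float vectors (assumed scheme) and pack 8 bits per byte."""
    bits = (dense > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def save_mmap(packed: np.ndarray, path: str) -> None:
    """Write packed binary vectors to a memory-mapped file."""
    mm = np.memmap(path, dtype=np.uint8, mode="w+", shape=packed.shape)
    mm[:] = packed
    mm.flush()
    del mm  # release the mapping


def load_mmap(path: str, count: int, bytes_per_vec: int) -> np.memmap:
    """Open the file read-only; the shape must come from stored metadata."""
    return np.memmap(path, dtype=np.uint8, mode="r", shape=(count, bytes_per_vec))


def hamming_top_k(query: np.ndarray, vectors: np.ndarray, k: int):
    """Return the k nearest (index, distance) pairs by Hamming distance."""
    xor = np.asarray(np.bitwise_xor(vectors, query))  # broadcasts (N, B) ^ (B,)
    dists = _POPCOUNT[xor].sum(axis=1, dtype=np.int64)
    order = np.argsort(dists, kind="stable")[: min(k, len(dists))]
    return [(int(i), int(dists[i])) for i in order]


if __name__ == "__main__":
    dense = np.array(
        [[1.0] * 16, [-1.0] * 16, [1.0] * 8 + [-1.0] * 8], dtype=np.float32
    )
    packed = binarize(dense)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "_binary_vectors.mmap")
        save_mmap(packed, path)
        vectors = load_mmap(path, count=3, bytes_per_vec=2)
        print(hamming_top_k(packed[0], vectors, k=2))
        del vectors  # close the mapping before the directory is removed
```

Because a raw memory-mapped file carries no shape information, the vector count and bytes per vector have to be stored separately, which is presumably why the commit saves metadata alongside the binary vectors.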

catlog22

2026-01-02 23:57:55 +08:00

parent 96b44e1482

commit 54fd94547c

12 changed files with 945 additions and 167 deletions

									
codex-lens/src/codexlens/config.py
									
@@ -25,6 +25,7 @@ SPLADE_DB_NAME = "_splade.db"
# Dense vector storage names (centralized storage)
VECTORS_HNSW_NAME = "_vectors.hnsw"
VECTORS_META_DB_NAME = "_vectors_meta.db"
BINARY_VECTORS_MMAP_NAME = "_binary_vectors.mmap"

log = logging.getLogger(__name__)
