Commit Graph

5 Commits

Author SHA1 Message Date
catlog22
71faaf43a8 refactor: 移除 SPLADE 和 hybrid_cascade,精简搜索架构
删除 SPLADE 稀疏神经搜索后端和 hybrid_cascade 策略,
将搜索架构从 6 种后端简化为 4 种(FTS Exact/Fuzzy, Binary Vector, Dense Vector, LSP)。

主要变更:
- 删除 splade_encoder.py, splade_index.py, migration_009 等 4 个文件
- 移除 config.py 中 SPLADE 相关配置(enable_splade, splade_model 等)
- DEFAULT_WEIGHTS 改为 FTS 权重 {exact:0.25, fuzzy:0.1, vector:0.5, lsp:0.15}
- 删除 hybrid_cascade_search(),所有 cascade fallback 改为 self.search()
- API fusion_strategy='hybrid' 向后兼容映射到 binary_rerank
- 删除 CLI index_splade/splade_status 命令和 --method splade
- 更新测试、基准测试和文档
2026-02-08 12:07:41 +08:00
catlog22
54fb7afdb2 Enhance semantic search capabilities and configuration
- Added category support for programming and documentation languages in Config.
- Implemented category-based filtering in HybridSearchEngine to improve search relevance based on query intent.
- Introduced functions for filtering results by category and determining file categories based on extensions.
- Updated VectorStore to include a category column in the database schema and modified chunk addition methods to support category tagging.
- Enhanced the WatcherConfig to ignore additional common directories and files.
- Created a benchmark script to compare performance between Binary Cascade, SPLADE, and Vector semantic search methods, including detailed result analysis and overlap comparison.
2026-01-02 15:01:20 +08:00
catlog22
56c03c847a feat: Add method to retrieve all semantic chunks from the vector store
- Implemented `get_all_chunks` method in `VectorStore` class to fetch all semantic chunks from the database.
- Added a new benchmark script `analyze_methods.py` for analyzing hybrid search methods and storage architecture.
- Included detailed analysis of method contributions, storage conflicts, and FTS + Rerank fusion experiments.
- Updated results JSON structure to reflect new analysis outputs and method performance metrics.
2026-01-02 12:32:43 +08:00
catlog22
9129c981a4 feat: Enhance BinaryANNIndex with vectorized search and performance benchmarking 2026-01-02 11:49:54 +08:00
catlog22
da68ba0b82 feat: Implement cascade indexing command and benchmark script for performance evaluation 2026-01-02 11:24:06 +08:00