Claude-Code-Workflow

mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-05 01:50:27 +08:00

Author	SHA1	Message	Date
catlog22	54fd94547c	feat: Enhance embedding generation and search capabilities - Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance. - Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving. - Introduced SPLADE sparse index generation with improved handling and metadata storage. - Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index. - Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance. - Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference. - Improved `SpladeIndex` with cache size adjustments for better query performance. - Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval. - Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.	2026-01-02 23:57:55 +08:00
catlog22	96b44e1482	feat: Add type validation for RRF weights and implement caching for embedder instances	2026-01-02 19:50:51 +08:00
catlog22	c268b531aa	feat: Enhance embedding generation to track current index path and improve metadata retrieval	2026-01-02 19:18:26 +08:00
catlog22	0b6e9db8e4	feat: Add centralized vector storage and metadata management for embeddings	2026-01-02 17:18:23 +08:00
catlog22	9157c5c78b	feat: Implement centralized storage for SPLADE and vector embeddings - Added centralized SPLADE database and vector storage configuration in config.py. - Updated embedding_manager.py to support centralized SPLADE database path. - Enhanced generate_embeddings and generate_embeddings_recursive functions for centralized storage. - Introduced centralized ANN index creation in ann_index.py. - Modified hybrid_search.py to utilize centralized vector index for searches. - Implemented methods to discover and manage centralized SPLADE and HNSW files.	2026-01-02 16:53:39 +08:00
catlog22	54fb7afdb2	Enhance semantic search capabilities and configuration - Added category support for programming and documentation languages in Config. - Implemented category-based filtering in HybridSearchEngine to improve search relevance based on query intent. - Introduced functions for filtering results by category and determining file categories based on extensions. - Updated VectorStore to include a category column in the database schema and modified chunk addition methods to support category tagging. - Enhanced the WatcherConfig to ignore additional common directories and files. - Created a benchmark script to compare performance between Binary Cascade, SPLADE, and Vector semantic search methods, including detailed result analysis and overlap comparison.	2026-01-02 15:01:20 +08:00
catlog22	92ed2524b7	feat: Enhance SPLADE indexing command to support multiple index databases and add chunk ID management	2026-01-02 13:25:23 +08:00
catlog22	56c03c847a	feat: Add method to retrieve all semantic chunks from the vector store - Implemented `get_all_chunks` method in `VectorStore` class to fetch all semantic chunks from the database. - Added a new benchmark script `analyze_methods.py` for analyzing hybrid search methods and storage architecture. - Included detailed analysis of method contributions, storage conflicts, and FTS + Rerank fusion experiments. - Updated results JSON structure to reflect new analysis outputs and method performance metrics.	2026-01-02 12:32:43 +08:00
catlog22	9129c981a4	feat: Enhance BinaryANNIndex with vectorized search and performance benchmarking	2026-01-02 11:49:54 +08:00
catlog22	da68ba0b82	feat: Implement cascade indexing command and benchmark script for performance evaluation	2026-01-02 11:24:06 +08:00
catlog22	e21d801523	feat: Add multi-type embedding backends for cascade retrieval - Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors. - Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking. - Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval. - Introduced utility functions for embedding conversion and distance computation. chore: Migration 010 - Add multi-vector storage support - Added 'chunks' table to support multi-vector embeddings for cascade retrieval. - Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage. - Implemented upgrade and downgrade functions to manage schema changes and data migration.	2026-01-02 10:52:43 +08:00
catlog22	195438d26a	feat(splade): add cache directory support for ONNX models and improve thread-local database connection handling	2026-01-01 22:40:00 +08:00
catlog22	5bb01755bc	Implement SPLADE sparse encoder and associated database migrations - Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing. - Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points. - Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database. - Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval. - Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.	2026-01-01 17:41:22 +08:00
catlog22	520f2d26f2	feat(codex-lens): add unified reranker architecture and file watcher Unified Reranker Architecture: - Add BaseReranker ABC with factory pattern - Implement 4 backends: ONNX (default), API, LiteLLM, Legacy - Add .env configuration parsing for API credentials - Migrate from sentence-transformers to optimum+onnxruntime File Watcher Module: - Add real-time file system monitoring with watchdog - Implement IncrementalIndexer for single-file updates - Add WatcherManager with signal handling and graceful shutdown - Add 'codexlens watch' CLI command - Event filtering, debouncing, and deduplication - Thread-safe design with proper resource cleanup Tests: 16 watcher tests + 5 reranker test files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 13:23:52 +08:00
catlog22	31a45f1f30	Add graph expansion and cross-encoder reranking features - Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors. - Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring. - Created migrations to establish necessary database tables for relationships and graph neighbors. - Developed tests for graph expansion functionality, ensuring related results are populated correctly. - Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead. - Updated schema cleanup tests to reflect changes in versioning and deprecated fields. - Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.	2025-12-31 16:58:59 +08:00
catlog22	70f8b14eaa	refactor(vector_store): use safer SQL query construction pattern Replaces f-string interpolation with safer string formatting. Adds documentation on SQL injection prevention. No functional changes - parameterized queries still used. Fixes: ISS-1766921318981-9 Solution-ID: SOL-1735386000-9 Issue-ID: ISS-1766921318981-9 Task-ID: T1	2025-12-29 20:09:49 +08:00
catlog22	0c8b2f2ec9	fix(vector_store): add bounds checking for chunk ID generation Prevents potential integer overflow when start_id is near sys.maxsize. Adds validation before range() calculation in batch insert methods. Fixes: ISS-1766921318981-6 Solution-ID: SOL-1735386000-6 Issue-ID: ISS-1766921318981-6 Task-ID: T1	2025-12-29 20:02:19 +08:00
catlog22	c56104c082	fix(vector_store): add null check for ANN search results before filtering Prevents errors when HNSW search returns null/empty results due to race conditions. Adds validation for ids and distances before zip operation. Fixes: ISS-1766921318981-5 Solution-ID: SOL-1735386000-5 Issue-ID: ISS-1766921318981-5 Task-ID: T1	2025-12-29 19:53:32 +08:00
catlog22	7f4433e449	fix(vector_store): add parameter validation for min_score range Validates min_score is within [0.0, 1.0] for cosine similarity. Raises ValueError for out-of-range values to prevent unexpected filtering. Fixes: ISS-1766921318981-14 Solution-ID: SOL-1735386000-14 Issue-ID: ISS-1766921318981-14 Task-ID: T1	2025-12-29 19:46:26 +08:00
catlog22	60fbb4177c	fix(config): add specific exception handling for path operations Replaces generic Exception handling with specific PermissionError and OSError handling in __post_init__ and ensure_runtime_dirs(). Provides clear diagnostic messages to distinguish permission issues from other filesystem errors. Solution-ID: SOL-1735385400008 Issue-ID: ISS-1766921318981-8 Task-ID: T1	2025-12-29 19:34:27 +08:00
catlog22	5914b1c5fc	fix(vector-store): protect bulk insert mode transitions with lock Ensure begin_bulk_insert() and end_bulk_insert() are fully lock-protected to prevent TOCTOU race conditions. Solution-ID: SOL-1735392000003 Issue-ID: ISS-1766921318981-12 Task-ID: T2	2025-12-29 19:20:02 +08:00
catlog22	d8be23fa83	fix(vector-store): add lock protection for bulk insert mode flag Protect _bulk_insert_mode flag and accumulation lists with _ann_write_lock to prevent corruption during concurrent access. Solution-ID: SOL-1735392000003 Issue-ID: ISS-1766921318981-12 Task-ID: T1	2025-12-29 19:16:30 +08:00
catlog22	3fdd52742b	fix(storage): handle rollback failures in batch operations Adds nested exception handling in add_files() and _migrate_fts_to_external() to catch and log rollback failures. Uses exception chaining to preserve both transaction and rollback errors, preventing silent database inconsistency. Solution-ID: SOL-1735385400010 Issue-ID: ISS-1766921318981-10 Task-ID: T1	2025-12-29 19:08:49 +08:00
catlog22	76ab4d67fe	test(entities): add zero vector validation tests Add comprehensive test coverage for zero and near-zero vector detection in SemanticChunk embedding validation. Solution-ID: SOL-20251228113612 Issue-ID: ISS-1766921318981-7 Task-ID: T2	2025-12-29 19:03:20 +08:00
catlog22	c859af1abf	fix(entities): validate embeddings are non-zero vectors Add L2 norm check to SemanticChunk.validate_embedding to reject zero vectors. Prevents division by zero in cosine similarity calculations downstream in vector search. Solution-ID: SOL-20251228113612 Issue-ID: ISS-1766921318981-7 Task-ID: T1	2025-12-29 19:01:27 +08:00
catlog22	6a73d3c379	fix(search): handle path operation failures in symbol filtering Adds robust exception handling for os.path.commonpath() in search_symbols() to prevent crashes on malformed paths and Windows cross-drive scenarios. Invalid symbols are skipped with debug logging, search continues. Solution-ID: SOL-1735385400004 Issue-ID: ISS-1766921318981-4 Task-ID: T1	2025-12-29 18:59:10 +08:00
catlog22	5d5652c2c5	fix(sqlite-store): improve thread tracking in connection cleanup Add fallback validation to detect dead threads missed by threading.enumerate(), ensuring all stale connections are cleaned. Solution-ID: SOL-1735392000002 Issue-ID: ISS-1766921318981-3 Task-ID: T2	2025-12-29 18:50:22 +08:00
catlog22	b958a1ea96	fix(sqlite-store): add periodic cleanup timer for connection pool Implement background timer to proactively clean stale connections every 5 minutes, preventing indefinite accumulation. Solution-ID: SOL-1735392000002 Issue-ID: ISS-1766921318981-3 Task-ID: T1	2025-12-29 18:43:55 +08:00
catlog22	9a45732a39	test(codex-lens): add connection pool stress tests Solution-ID: SOL-1735410004 Issue-ID: ISS-1766921318981-24 Task-ID: T3	2025-12-29 18:16:03 +08:00
catlog22	015b46e58b	test(codex-lens): add concurrent write operation tests Solution-ID: SOL-1735410004 Issue-ID: ISS-1766921318981-24 Task-ID: T2	2025-12-29 18:12:09 +08:00
catlog22	042a99dbe3	test(codex-lens): add concurrent read operation tests Solution-ID: SOL-1735410004 Issue-ID: ISS-1766921318981-24 Task-ID: T1	2025-12-29 17:59:08 +08:00
catlog22	1396010437	fix(embedder): add lock protection for cache read operations Protect fast path cache read in get_embedder() to prevent KeyError during concurrent access and cache clearing operations. Solution-ID: SOL-1735392000001 Issue-ID: ISS-1766921318981-2 Task-ID: T1	2025-12-29 12:33:23 +08:00
catlog22	84d06f4273	fix(registry): normalize path case for comparison on Windows Adds case normalization for path comparison on Windows to handle case-insensitive filesystem behavior. Preserves case-sensitivity on Unix. Fixes: ISS-1766921318981-13 Solution-ID: SOL-1735386000-13 Issue-ID: ISS-1766921318981-13 Task-ID: T1	2025-12-28 21:51:23 +08:00
catlog22	18cc536f65	refactor(vector-store): use consistent EPSILON constant Define module-level EPSILON constant and use it in both _cosine_similarity and _refresh_cache for consistent floating point precision handling. Solution-ID: SOL-20251228113619 Issue-ID: ISS-1766921318981-11 Task-ID: T3	2025-12-28 21:40:46 +08:00
catlog22	af2ff54cb7	test(vector-store): add epsilon tolerance edge case tests Add comprehensive test coverage for near-zero norms, product underflow, and floating point precision edge cases in _cosine_similarity function. Solution-ID: SOL-20251228113619 Issue-ID: ISS-1766921318981-11 Task-ID: T2	2025-12-28 21:37:59 +08:00
catlog22	6486c56850	fix(vector-store): add epsilon tolerance for norm checks Replace exact zero comparison with epsilon-based check (< 1e-10) in _cosine_similarity to handle floating point precision issues. Also check for product underflow to prevent inf/nan from division by very small numbers. Solution-ID: SOL-20251228113619 Issue-ID: ISS-1766921318981-11 Task-ID: T1	2025-12-28 21:11:30 +08:00
catlog22	93dcdd2293	fix(config): log configuration loading errors instead of silently ignoring Replaces bare exception handler in load_settings() with logging.warning() to help users debug configuration file issues (syntax errors, permissions). Maintains backward compatibility - errors do not break initialization. Solution-ID: SOL-1735385400001 Issue-ID: ISS-1766921318981-1 Task-ID: T1	2025-12-28 21:06:23 +08:00
catlog22	58caccb250	test(ranking): add edge case tests for normalize_weights Add comprehensive test coverage for NaN, infinity, and all-None edge cases in weight normalization to prevent regression. Solution-ID: SOL-20251228113631 Issue-ID: ISS-1766921318981-0 Task-ID: T2	2025-12-28 20:59:08 +08:00
catlog22	598eed92cb	fix(ranking): add explicit NaN check in normalize_weights Add math.isnan() check before math.isfinite() to properly catch NaN values in weight totals. Prevents division by NaN which could produce unexpected results in RRF fusion calculations. Solution-ID: SOL-20251228113631 Issue-ID: ISS-1766921318981-0 Task-ID: T1	2025-12-28 20:55:03 +08:00
catlog22	a2c88ba885	feat: Add project guidelines support and enhance project overview rendering	2025-12-28 14:50:50 +08:00
catlog22	4061ae48c4	feat: Implement adaptive RRF weights and query intent detection - Added integration tests for adaptive RRF weights in hybrid search. - Enhanced query intent detection with new classifications: keyword, semantic, and mixed. - Introduced symbol boosting in search results based on explicit symbol matches. - Implemented embedding-based reranking with configurable options. - Added global symbol index for efficient symbol lookups across projects. - Improved file deletion handling on Windows to avoid permission errors. - Updated chunk configuration to increase overlap for better context. - Modified package.json test script to target specific test files. - Created comprehensive writing style guidelines for documentation. - Added TypeScript tests for query intent detection and adaptive weights. - Established performance benchmarks for global symbol indexing.	2025-12-26 15:08:47 +08:00
catlog22	3b842ed290	feat(cli-executor): add streaming option and enhance output handling - Introduced a `stream` parameter to control output streaming vs. caching. - Enhanced status determination logic to prioritize valid output over exit codes. - Updated output structure to include full stdout and stderr when not streaming. feat(cli-history-store): extend conversation turn schema and migration - Added `cached`, `stdout_full`, and `stderr_full` fields to the conversation turn schema. - Implemented database migration to add new columns if they do not exist. - Updated upsert logic to handle new fields. feat(codex-lens): implement global symbol index for fast lookups - Created `GlobalSymbolIndex` class to manage project-wide symbol indexing. - Added methods for adding, updating, and deleting symbols in the global index. - Integrated global index updates into directory indexing processes. feat(codex-lens): optimize search functionality with global index - Enhanced `ChainSearchEngine` to utilize the global symbol index for faster searches. - Added configuration option to enable/disable global symbol indexing. - Updated tests to validate global index functionality and performance.	2025-12-25 22:22:31 +08:00
catlog22	203100431b	feat: Add Code Index MCP provider support and update related APIs and configuration	2025-12-25 19:58:42 +08:00
catlog22	ebcbb11cb2	feat: Enhance CodexLens search functionality with new parameters and result handling - Added search limit, content length, and extra files input fields in the CodexLens manager UI. - Updated API request parameters to include new fields: max_content_length and extra_files_count. - Refactored smart-search.ts to support new parameters with default values. - Implemented result splitting logic to return both full content and additional file paths. - Updated CLI commands to remove worker limits and allow dynamic scaling based on endpoint count. - Introduced EmbeddingPoolConfig for improved embedding management and auto-discovery of providers. - Enhanced search engines to utilize new parameters for fuzzy and exact searches. - Added support for embedding single texts in the LiteLLM embedder.	2025-12-25 16:16:44 +08:00
catlog22	a1413dd1b3	feat: Unified Embedding Pool with auto-discovery Architecture refactoring for multi-provider rotation: Backend: - Add EmbeddingPoolConfig type with autoDiscover support - Implement discoverProvidersForModel() for auto-aggregation - Add GET/PUT /api/litellm-api/embedding-pool endpoints - Add GET /api/litellm-api/embedding-pool/discover/:model preview - Convert ccw-litellm status check to async with 5-min cache - Maintain backward compatibility with legacy rotation config Frontend: - Add "Embedding Pool" tab in API Settings - Auto-discover providers when target model selected - Show provider/key count with include/exclude controls - Increase sidebar width (280px → 320px) - Add sync result feedback on save Other: - Remove worker count limits (was max=32) - Add i18n translations (EN/CN) - Update .gitignore for .mcp.json 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-25 16:06:49 +08:00
catlog22	8e744597d1	feat: Implement CodexLens multi-provider embedding rotation management - Added functions to get and update CodexLens embedding rotation configuration. - Introduced functionality to retrieve enabled embedding providers for rotation. - Created endpoints for managing rotation configuration via API. - Enhanced dashboard UI to support multi-provider rotation configuration. - Updated internationalization strings for new rotation features. - Adjusted CLI commands and embedding manager to support increased concurrency limits. - Modified hybrid search weights for improved ranking behavior.	2025-12-25 14:13:27 +08:00
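catlog22	501d9a05d4	fix: Fix Ollama connection errors caused by a ModelScope API routing bug - Added a _sanitize_text() method to handle text that starts with 'import'. - The ModelScope backend incorrectly routed such text to the local Ollama endpoint. - Bypassed the routing detection by prepending a space to the text, which does not affect embedding quality. - Enhanced retry logic and error handling in embedding_manager.py. - Invoked global model locking in commands.py after successful generation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-25 12:52:43 +08:00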
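catlog22	229d51cd18	feat: Add global model locking to prevent mixing different models and improve the stability of embedding generation	2025-12-25 11:20:05 +08:00
catlog22	40e61b30d6	feat: Add multi-endpoint support and load balancing to enhance LiteLLM embedding management	2025-12-25 11:01:08 +08:00
catlog22	3c3ce55842	feat: Add support for the LiteLLM embedding backend and improve concurrent API call capability	2025-12-24 22:20:13 +08:00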

105 Commits
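
To illustrate commit 6486c56850 (epsilon tolerance for norm checks), a minimal sketch of an epsilon-guarded cosine similarity might look like the following. Only the `1e-10` threshold and the `EPSILON` name come from the commit messages above; the function body, the pure-Python arithmetic, and the choice to return `0.0` for degenerate inputs are assumptions and not the repository's actual code.

```python
import math

# Threshold taken from the commit message for 6486c56850; the name EPSILON
# follows commit 18cc536f65. Everything else here is a hypothetical sketch.
EPSILON = 1e-10


def cosine_similarity(a, b):
    """Cosine similarity that avoids dividing by zero or near-zero norms.

    Returning 0.0 for degenerate vectors is an assumption made for this
    sketch; the real _cosine_similarity may handle the case differently.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Epsilon-based check instead of an exact comparison with zero.
    if norm_a < EPSILON or norm_b < EPSILON:
        return 0.0

    # Guard against product underflow: two small norms can multiply to a
    # denominator that is too small to divide by safely.
    denom = norm_a * norm_b
    if denom < EPSILON:
        return 0.0

    return dot / denom
```

Calling `cosine_similarity([0.0, 0.0], [1.0, 2.0])` takes the first guard and returns `0.0` rather than raising `ZeroDivisionError`, which is the failure mode the commit message describes.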