Claude-Code-Workflow

mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-10 02:24:35 +08:00

Author	SHA1	Message	Date
catlog22	54fd94547c	feat: Enhance embedding generation and search capabilities - Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance. - Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving. - Introduced SPLADE sparse index generation with improved handling and metadata storage. - Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index. - Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance. - Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference. - Improved `SpladeIndex` with cache size adjustments for better query performance. - Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval. - Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.	2026-01-02 23:57:55 +08:00
catlog22	c268b531aa	feat: Enhance embedding generation to track current index path and improve metadata retrieval	2026-01-02 19:18:26 +08:00
catlog22	0b6e9db8e4	feat: Add centralized vector storage and metadata management for embeddings	2026-01-02 17:18:23 +08:00
catlog22	92ed2524b7	feat: Enhance SPLADE indexing command to support multiple index databases and add chunk ID management	2026-01-02 13:25:23 +08:00
catlog22	e21d801523	feat: Add multi-type embedding backends for cascade retrieval - Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors. - Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking. - Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval. - Introduced utility functions for embedding conversion and distance computation. chore: Migration 010 - Add multi-vector storage support - Added 'chunks' table to support multi-vector embeddings for cascade retrieval. - Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage. - Implemented upgrade and downgrade functions to manage schema changes and data migration.	2026-01-02 10:52:43 +08:00
catlog22	195438d26a	feat(splade): add cache directory support for ONNX models and improve thread-local database connection handling	2026-01-01 22:40:00 +08:00
catlog22	5bb01755bc	Implement SPLADE sparse encoder and associated database migrations - Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing. - Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points. - Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database. - Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval. - Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.	2026-01-01 17:41:22 +08:00
catlog22	31a45f1f30	Add graph expansion and cross-encoder reranking features - Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors. - Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring. - Created migrations to establish necessary database tables for relationships and graph neighbors. - Developed tests for graph expansion functionality, ensuring related results are populated correctly. - Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead. - Updated schema cleanup tests to reflect changes in versioning and deprecated fields. - Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.	2025-12-31 16:58:59 +08:00
catlog22	3fdd52742b	fix(storage): handle rollback failures in batch operations Adds nested exception handling in add_files() and _migrate_fts_to_external() to catch and log rollback failures. Uses exception chaining to preserve both transaction and rollback errors, preventing silent database inconsistency. Solution-ID: SOL-1735385400010 Issue-ID: ISS-1766921318981-10 Task-ID: T1	2025-12-29 19:08:49 +08:00
catlog22	5d5652c2c5	fix(sqlite-store): improve thread tracking in connection cleanup Add fallback validation to detect dead threads missed by threading.enumerate(), ensuring all stale connections are cleaned. Solution-ID: SOL-1735392000002 Issue-ID: ISS-1766921318981-3 Task-ID: T2	2025-12-29 18:50:22 +08:00
catlog22	b958a1ea96	fix(sqlite-store): add periodic cleanup timer for connection pool Implement background timer to proactively clean stale connections every 5 minutes, preventing indefinite accumulation. Solution-ID: SOL-1735392000002 Issue-ID: ISS-1766921318981-3 Task-ID: T1	2025-12-29 18:43:55 +08:00
catlog22	84d06f4273	fix(registry): normalize path case for comparison on Windows Adds case normalization for path comparison on Windows to handle case-insensitive filesystem behavior. Preserves case-sensitivity on Unix. Fixes: ISS-1766921318981-13 Solution-ID: SOL-1735386000-13 Issue-ID: ISS-1766921318981-13 Task-ID: T1	2025-12-28 21:51:23 +08:00
catlog22	3b842ed290	feat(cli-executor): add streaming option and enhance output handling - Introduced a `stream` parameter to control output streaming vs. caching. - Enhanced status determination logic to prioritize valid output over exit codes. - Updated output structure to include full stdout and stderr when not streaming. feat(cli-history-store): extend conversation turn schema and migration - Added `cached`, `stdout_full`, and `stderr_full` fields to the conversation turn schema. - Implemented database migration to add new columns if they do not exist. - Updated upsert logic to handle new fields. feat(codex-lens): implement global symbol index for fast lookups - Created `GlobalSymbolIndex` class to manage project-wide symbol indexing. - Added methods for adding, updating, and deleting symbols in the global index. - Integrated global index updates into directory indexing processes. feat(codex-lens): optimize search functionality with global index - Enhanced `ChainSearchEngine` to utilize the global symbol index for faster searches. - Added configuration option to enable/disable global symbol indexing. - Updated tests to validate global index functionality and performance.	2025-12-25 22:22:31 +08:00
catlog22	3cd842ca1a	fix: ccw package.json removal - add root build script and fix cli.ts path resolution - Fix cli.ts loadPackageInfo() to try root package.json first (../../package.json) - Add build script and devDependencies to root package.json - Remove ccw/package.json and ccw/package-lock.json (no longer needed) - CodexLens: add config.json support for index_dir configuration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-23 10:25:15 +08:00
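catlog22	e60d793c8c	fix: Fix SmartSearch ripgrep limit and FTS tokenizer issues - Ripgrep mode: add a limit on the total number of results to prevent returning more than 2MB of data - --max-count only limits the matches per file; the limit is now applied while collecting results - Add a warning to metadata when the limit is reached - FTS tokenizer: add the dot (.) to tokenchars, fixing exact search for dotted identifiers such as PortRole.FLOW - Update the tokenize configuration in dir_index.py and migration_004_dual_fts.py - The index must be rebuilt for this to take effect - Exact mode: add a fuzzy fallback that automatically tries fuzzy search when exact search returns no results - Mark fallback: 'fuzzy' in metadata when falling back 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-22 09:50:29 +08:00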
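catlog22	3e9a309079	refactor: Remove graph indexing, fix memory leaks, optimize embedding generation Main changes: 1. Remove graph indexing - Delete graph_analyzer.py and related migration files - Remove the CLI graph command and --enrich flag - Clean up graph query methods in chain_search.py (370 lines) - Delete related test files 2. Fix embedding generation memory issues - Refactor generate_embeddings.py to use streaming batch processing - Switch to the memory-safe implementation in embedding_manager - File reduced from 548 lines to 259 lines (52.7% reduction) 3. Fix memory leaks - chain_search.py: quick_search uses a with statement to manage ChainSearchEngine - embedding_manager.py: use a with statement to manage VectorStore - vector_store.py: add a memory warning for brute-force search 4. Code cleanup - Remove the token_count and symbol_type fields from the Symbol model - Clean up related test cases Tests: 760 passed, 7 skipped 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 16:22:03 +08:00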
catlog22	3428642d04	feat: Optimize FTS search with batch symbol fetching and improved excerpt generation	2025-12-19 15:35:31 +08:00
catlog22	2f0cce0089	feat: Enhance CodexLens indexing and search capabilities with new CLI options and improved error handling	2025-12-19 15:10:37 +08:00
catlog22	69049e3f45	feat: Return complete method blocks in FTS search results - Add helper methods to locate match lines and find containing symbols - Modify search_fts, search_fts_exact, search_fts_fuzzy to return complete code blocks (functions/methods/classes) instead of short snippets - Join with files table to get full content and file_id - Query symbols table to find the smallest symbol containing the match - Fall back to context lines when no symbol contains the match - Add return_full_content and context_lines parameters for flexibility - Include start_line, end_line, symbol_name, symbol_kind in SearchResult This improves search result quality by returning semantically meaningful code blocks rather than arbitrary 20-byte snippets. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 11:38:39 +08:00
catlog22	17af615fe2	Add help view and core memory styles - Introduced styles for the help view including tab transitions, accordion animations, search highlighting, and responsive design. - Implemented core memory styles with modal base styles, memory card designs, and knowledge graph visualization. - Enhanced dark mode support across various components. - Added loading states and empty state designs for better user experience.	2025-12-18 18:29:45 +08:00
catlog22	51a61bef31	Add parallel search mode and index progress bar Features: - CCW smart_search: Add 'parallel' mode that runs hybrid + exact + ripgrep simultaneously with RRF (Reciprocal Rank Fusion) for result merging - Dashboard: Add real-time progress bar for CodexLens index initialization - MCP: Return progress metadata in init action response - Codex-lens: Auto-detect optimal worker count for parallel indexing Changes: - smart-search.ts: Add parallel mode, RRF fusion, progress tracking - codex-lens.ts: Add onProgress callback support, progress parsing - codexlens-routes.ts: Broadcast index progress via WebSocket - codexlens-manager.js: New index progress modal with real-time updates - notifications.js: Add WebSocket event handler registration system - i18n.js: Add English/Chinese translations for progress UI - index_tree.py: Workers parameter now auto-detects CPU count (max 16) - commands.py: CLI --workers parameter supports auto-detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-17 23:17:15 +08:00
catlog22	df23975a0b	Add comprehensive tests for schema cleanup migration and search comparison - Implement tests for migration 005 to verify removal of deprecated fields in the database schema. - Ensure that new databases are created with a clean schema. - Validate that keywords are correctly extracted from the normalized file_keywords table. - Test symbol insertion without deprecated fields and subdir operations without direct_files. - Create a detailed search comparison test to evaluate vector search vs hybrid search performance. - Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality. - Include a test script to check TreeSitter parser availability and relationship extraction from sample files.	2025-12-16 19:27:05 +08:00
catlog22	3da0ef2adb	Add comprehensive tests for query parsing and Reciprocal Rank Fusion - Implemented tests for the QueryParser class, covering various identifier splitting methods (CamelCase, snake_case, kebab-case), OR expansion, and FTS5 operator preservation. - Added parameterized tests to validate expected token outputs for different query formats. - Created edge case tests to ensure robustness against unusual input scenarios. - Developed tests for the Reciprocal Rank Fusion (RRF) algorithm, including score computation, weight handling, and result ranking across multiple sources. - Included tests for normalization of BM25 scores and tagging search results with source metadata.	2025-12-16 10:20:19 +08:00
catlog22	35485bbbb1	feat: Enhance navigation and cleanup for graph explorer view - Added a cleanup function to reset the state when navigating away from the graph explorer. - Updated navigation logic to call the cleanup function before switching views. - Improved internationalization by adding new translations for graph-related terms. - Adjusted icon sizes for better UI consistency in the graph explorer. - Implemented impact analysis button functionality in the graph explorer. - Refactored CLI tool configuration to use updated model names. - Enhanced CLI executor to handle prompts correctly for codex commands. - Introduced code relationship storage for better visualization in the index tree. - Added support for parsing Markdown and plain text files in the symbol parser. - Updated tests to reflect changes in language detection logic.	2025-12-15 23:11:01 +08:00
catlog22	97640a517a	feat(storage): implement storage manager for centralized management and cleanup - Added a new Storage Manager component to handle storage statistics, project cleanup, and configuration for CCW centralized storage. - Introduced functions to calculate directory sizes, get project storage stats, and clean specific or all storage. - Enhanced SQLiteStore with a public API for executing queries securely. - Updated tests to utilize the new execute_query method and validate storage management functionalities. - Improved performance by implementing connection pooling with idle timeout management in SQLiteStore. - Added new fields (token_count, symbol_type) to the symbols table and adjusted related insertions. - Enhanced error handling and logging for storage operations.	2025-12-15 17:39:38 +08:00
catlog22	0fe16963cd	Add comprehensive tests for tokenizer, performance benchmarks, and TreeSitter parser functionality - Implemented unit tests for the Tokenizer class, covering various text inputs, edge cases, and fallback mechanisms. - Created performance benchmarks comparing tiktoken and pure Python implementations for token counting. - Developed extensive tests for TreeSitterSymbolParser across Python, JavaScript, and TypeScript, ensuring accurate symbol extraction and parsing. - Added configuration documentation for MCP integration and custom prompts, enhancing usability and flexibility. - Introduced a refactor script for GraphAnalyzer to streamline future improvements.	2025-12-15 14:36:09 +08:00
catlog22	0529b57694	Implement database migration framework and performance optimizations - Added active memory configuration for manual interval and Gemini tool. - Created file modification rules for handling edits and writes. - Implemented migration manager for managing database schema migrations. - Added migration 001 to normalize keywords into separate tables. - Developed tests for validating performance optimizations including keyword normalization, path lookup, and symbol search. - Created validation script to manually verify optimization implementations.	2025-12-14 18:08:32 +08:00
catlog22	79a2953862	Add comprehensive tests for vector/semantic search functionality - Implement full coverage tests for Embedder model loading and embedding generation - Add CRUD operations and caching tests for VectorStore - Include cosine similarity computation tests - Validate semantic search accuracy and relevance through various queries - Establish performance benchmarks for embedding and search operations - Ensure edge cases and error handling are covered - Test thread safety and concurrent access scenarios - Verify availability of semantic search dependencies	2025-12-14 17:17:09 +08:00
catlog22	08dc0a0348	perf(codex-lens): optimize search performance with vectorized operations Performance Optimizations: - VectorStore: NumPy vectorized cosine similarity (100x+ faster) - Cached embedding matrix with pre-computed norms - Lazy content loading for top-k results only - Thread-safe cache invalidation - SQLite: Added PRAGMA mmap_size=30GB for memory-mapped I/O - FTS5: unicode61 tokenizer with tokenchars='_' for code identifiers - ChainSearch: files_only fast path skipping snippet generation - ThreadPoolExecutor: shared pool across searches New Components: - DirIndexStore: single-directory index with FTS5 and symbols - RegistryStore: global project registry with path mappings - PathMapper: source-to-index path conversion utility - IndexTreeBuilder: hierarchical index tree construction - ChainSearchEngine: parallel recursive directory search Test Coverage: - 36 comprehensive search functionality tests - 14 performance benchmark tests - 296 total tests passing (100% pass rate) Benchmark Results: - FTS5 search: 0.23-0.26ms avg (3900-4300 ops/sec) - Vector search: 1.05-1.54ms avg (650-955 ops/sec) - Full semantic: 4.56-6.38ms avg per query 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-14 11:06:24 +08:00
catlog22	c42f91a7fe	feat: Add support for Tree-Sitter parsing and enhance SQLite storage performance	2025-12-12 18:40:24 +08:00
catlog22	92d2085b64	Optimize SQLite FTS storage and pooling	2025-12-12 17:40:03 +08:00
catlog22	a393601ec5	feat(codexlens): add CodexLens code indexing platform with incremental updates - Add CodexLens Python package with SQLite FTS5 search and tree-sitter parsing - Implement workspace-local index storage (.codexlens/ directory) - Add incremental update CLI command for efficient file-level index refresh - Integrate CodexLens with CCW tools (codex_lens action: update) - Add CodexLens Auto-Sync hook template for automatic index updates on file changes - Add CodexLens status card in CCW Dashboard CLI Manager with install/init buttons - Add server APIs: /api/codexlens/status, /api/codexlens/bootstrap, /api/codexlens/init 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 15:02:32 +08:00

32 Commits