Commit Graph

33 Commits

Author SHA1 Message Date
catlog22
b958a1ea96 fix(sqlite-store): add periodic cleanup timer for connection pool
Implement background timer to proactively clean stale connections
every 5 minutes, preventing indefinite accumulation.

Solution-ID: SOL-1735392000002
Issue-ID: ISS-1766921318981-3
Task-ID: T1
2025-12-29 18:43:55 +08:00
catlog22
9a45732a39 test(codex-lens): add connection pool stress tests
Solution-ID: SOL-1735410004
Issue-ID: ISS-1766921318981-24
Task-ID: T3
2025-12-29 18:16:03 +08:00
catlog22
015b46e58b test(codex-lens): add concurrent write operation tests
Solution-ID: SOL-1735410004
Issue-ID: ISS-1766921318981-24
Task-ID: T2
2025-12-29 18:12:09 +08:00
catlog22
042a99dbe3 test(codex-lens): add concurrent read operation tests
Solution-ID: SOL-1735410004

Issue-ID: ISS-1766921318981-24

Task-ID: T1
2025-12-29 17:59:08 +08:00
catlog22
1396010437 fix(embedder): add lock protection for cache read operations
Protect fast path cache read in get_embedder() to prevent KeyError

during concurrent access and cache clearing operations.

Solution-ID: SOL-1735392000001

Issue-ID: ISS-1766921318981-2

Task-ID: T1
2025-12-29 12:33:23 +08:00
catlog22
84d06f4273 fix(registry): normalize path case for comparison on Windows
Adds case normalization for path comparison on Windows to handle
case-insensitive filesystem behavior. Preserves case-sensitivity on Unix.

Fixes: ISS-1766921318981-13

Solution-ID: SOL-1735386000-13
Issue-ID: ISS-1766921318981-13
Task-ID: T1
2025-12-28 21:51:23 +08:00
catlog22
af2ff54cb7 test(vector-store): add epsilon tolerance edge case tests
Add comprehensive test coverage for near-zero norms, product
underflow, and floating point precision edge cases in
_cosine_similarity function.

Solution-ID: SOL-20251228113619
Issue-ID: ISS-1766921318981-11
Task-ID: T2
2025-12-28 21:37:59 +08:00
catlog22
93dcdd2293 fix(config): log configuration loading errors instead of silently ignoring
Replaces bare exception handler in load_settings() with logging.warning()
to help users debug configuration file issues (syntax errors, permissions).
Maintains backward compatibility - errors do not break initialization.

Solution-ID: SOL-1735385400001
Issue-ID: ISS-1766921318981-1
Task-ID: T1
2025-12-28 21:06:23 +08:00
catlog22
58caccb250 test(ranking): add edge case tests for normalize_weights
Add comprehensive test coverage for NaN, infinity, and all-None
edge cases in weight normalization to prevent regression.

Solution-ID: SOL-20251228113631
Issue-ID: ISS-1766921318981-0
Task-ID: T2
2025-12-28 20:59:08 +08:00
catlog22
4061ae48c4 feat: Implement adaptive RRF weights and query intent detection
- Added integration tests for adaptive RRF weights in hybrid search.
- Enhanced query intent detection with new classifications: keyword, semantic, and mixed.
- Introduced symbol boosting in search results based on explicit symbol matches.
- Implemented embedding-based reranking with configurable options.
- Added global symbol index for efficient symbol lookups across projects.
- Improved file deletion handling on Windows to avoid permission errors.
- Updated chunk configuration to increase overlap for better context.
- Modified package.json test script to target specific test files.
- Created comprehensive writing style guidelines for documentation.
- Added TypeScript tests for query intent detection and adaptive weights.
- Established performance benchmarks for global symbol indexing.
2025-12-26 15:08:47 +08:00
catlog22
3b842ed290 feat(cli-executor): add streaming option and enhance output handling
- Introduced a `stream` parameter to control output streaming vs. caching.
- Enhanced status determination logic to prioritize valid output over exit codes.
- Updated output structure to include full stdout and stderr when not streaming.

feat(cli-history-store): extend conversation turn schema and migration

- Added `cached`, `stdout_full`, and `stderr_full` fields to the conversation turn schema.
- Implemented database migration to add new columns if they do not exist.
- Updated upsert logic to handle new fields.

feat(codex-lens): implement global symbol index for fast lookups

- Created `GlobalSymbolIndex` class to manage project-wide symbol indexing.
- Added methods for adding, updating, and deleting symbols in the global index.
- Integrated global index updates into directory indexing processes.

feat(codex-lens): optimize search functionality with global index

- Enhanced `ChainSearchEngine` to utilize the global symbol index for faster searches.
- Added configuration option to enable/disable global symbol indexing.
- Updated tests to validate global index functionality and performance.
2025-12-25 22:22:31 +08:00
catlog22
8e744597d1 feat: Implement CodexLens multi-provider embedding rotation management
- Added functions to get and update CodexLens embedding rotation configuration.
- Introduced functionality to retrieve enabled embedding providers for rotation.
- Created endpoints for managing rotation configuration via API.
- Enhanced dashboard UI to support multi-provider rotation configuration.
- Updated internationalization strings for new rotation features.
- Adjusted CLI commands and embedding manager to support increased concurrency limits.
- Modified hybrid search weights for improved ranking behavior.
2025-12-25 14:13:27 +08:00
catlog22
3c3ce55842 feat: 添加对 LiteLLM 嵌入后端的支持,增强并发 API 调用能力 2025-12-24 22:20:13 +08:00
catlog22
e671b45948 feat: Enhance configuration management and embedding capabilities
- Added JSON-based settings management in Config class for embedding and LLM configurations.
- Introduced methods to save and load settings from a JSON file.
- Updated BaseEmbedder and its subclasses to include max_tokens property for better token management.
- Enhanced chunking strategy to support recursive splitting of large symbols with improved overlap handling.
- Implemented comprehensive tests for recursive splitting and chunking behavior.
- Added CLI tools configuration management for better integration with external tools.
- Introduced a new command for compacting session memory into structured text for recovery.
2025-12-24 16:32:27 +08:00
catlog22
3e9a309079 refactor: 移除图索引功能,修复内存泄露,优化嵌入生成
主要更改:

1. 移除图索引功能 (graph indexing)
   - 删除 graph_analyzer.py 及相关迁移文件
   - 移除 CLI 的 graph 命令和 --enrich 标志
   - 清理 chain_search.py 中的图查询方法 (370行)
   - 删除相关测试文件

2. 修复嵌入生成内存问题
   - 重构 generate_embeddings.py 使用流式批处理
   - 改用 embedding_manager 的内存安全实现
   - 文件从 548 行精简到 259 行 (52.7% 减少)

3. 修复内存泄露
   - chain_search.py: quick_search 使用 with 语句管理 ChainSearchEngine
   - embedding_manager.py: 使用 with 语句管理 VectorStore
   - vector_store.py: 添加暴力搜索内存警告

4. 代码清理
   - 移除 Symbol 模型的 token_count 和 symbol_type 字段
   - 清理相关测试用例

测试: 760 passed, 7 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 16:22:03 +08:00
catlog22
b27d8a9570 feat: Add CLAUDE.md freshness tracking and update reminders
- Add SQLite table and CRUD methods for tracking update history
- Create freshness calculation service based on git file changes
- Add API endpoints for freshness data, marking updates, and history
- Display freshness badges in file tree (green/yellow/red indicators)
- Show freshness gauge and details in metadata panel
- Auto-mark files as updated after CLI sync
- Add English and Chinese i18n translations

Freshness algorithm: 100 - min((changedFilesCount / 20) * 100, 100)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 16:14:46 +08:00
catlog22
7adde91e9f feat: Add search result grouping by similarity score
Add functionality to group search results with similar content and scores
into a single representative result with additional locations.

Changes:
- Add AdditionalLocation entity model for storing grouped result locations
- Add additional_locations field to SearchResult for backward compatibility
- Implement group_similar_results() function in ranking.py with:
  - Content-based grouping (by excerpt or content field)
  - Score-based sub-grouping with configurable threshold
  - Metadata preservation with grouped_count tracking
- Add group_results and grouping_threshold options to SearchOptions
- Integrate grouping into ChainSearchEngine.search() after RRF fusion

Test coverage:
- 36 multi-level tests covering unit, boundary, integration, and performance
- Real-world scenario tests for RRF scores and duplicate code detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 16:33:44 +08:00
catlog22
2f0cce0089 feat: Enhance CodexLens indexing and search capabilities with new CLI options and improved error handling 2025-12-19 15:10:37 +08:00
catlog22
5e91ba6c60 Implement ANN index using HNSW algorithm and update related tests
- Added ANNIndex class for approximate nearest neighbor search using HNSW.
- Integrated ANN index with VectorStore for enhanced search capabilities.
- Updated test suite for ANN index, including tests for adding, searching, saving, and loading vectors.
- Modified existing tests to accommodate changes in search performance expectations.
- Improved error handling for file operations in tests to ensure compatibility with Windows file locks.
- Adjusted hybrid search performance assertions for increased stability in CI environments.
2025-12-19 10:35:29 +08:00
catlog22
17af615fe2 Add help view and core memory styles
- Introduced styles for the help view including tab transitions, accordion animations, search highlighting, and responsive design.
- Implemented core memory styles with modal base styles, memory card designs, and knowledge graph visualization.
- Enhanced dark mode support across various components.
- Added loading states and empty state designs for better user experience.
2025-12-18 18:29:45 +08:00
catlog22
b702791c2c Remove LLM enhancement features and related components as per user request. This includes the deletion of source code files, CLI commands, front-end components, tests, scripts, and documentation associated with LLM functionality. Simplified dependencies and reduced complexity while retaining core vector search capabilities. Validation of changes confirmed successful removal and functionality. 2025-12-16 21:38:27 +08:00
catlog22
d21066c282 Add scripts for inspecting LLM summaries and testing misleading comments
- Implement `inspect_llm_summaries.py` to display LLM-generated summaries from the semantic_chunks table in the database.
- Create `show_llm_analysis.py` to demonstrate LLM analysis of misleading code examples, highlighting discrepancies between comments and actual functionality.
- Develop `test_misleading_comments.py` to compare pure vector search with LLM-enhanced search, focusing on the impact of misleading or missing comments on search results.
- Introduce `test_llm_enhanced_search.py` to provide a test suite for evaluating the effectiveness of LLM-enhanced vector search against pure vector search.
- Ensure all new scripts are integrated with the existing codebase and follow the established coding standards.
2025-12-16 20:29:28 +08:00
catlog22
df23975a0b Add comprehensive tests for schema cleanup migration and search comparison
- Implement tests for migration 005 to verify removal of deprecated fields in the database schema.
- Ensure that new databases are created with a clean schema.
- Validate that keywords are correctly extracted from the normalized file_keywords table.
- Test symbol insertion without deprecated fields and subdir operations without direct_files.
- Create a detailed search comparison test to evaluate vector search vs hybrid search performance.
- Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality.
- Include a test script to check TreeSitter parser availability and relationship extraction from sample files.
2025-12-16 19:27:05 +08:00
catlog22
3da0ef2adb Add comprehensive tests for query parsing and Reciprocal Rank Fusion
- Implemented tests for the QueryParser class, covering various identifier splitting methods (CamelCase, snake_case, kebab-case), OR expansion, and FTS5 operator preservation.
- Added parameterized tests to validate expected token outputs for different query formats.
- Created edge case tests to ensure robustness against unusual input scenarios.
- Developed tests for the Reciprocal Rank Fusion (RRF) algorithm, including score computation, weight handling, and result ranking across multiple sources.
- Included tests for normalization of BM25 scores and tagging search results with source metadata.
2025-12-16 10:20:19 +08:00
catlog22
35485bbbb1 feat: Enhance navigation and cleanup for graph explorer view
- Added a cleanup function to reset the state when navigating away from the graph explorer.
- Updated navigation logic to call the cleanup function before switching views.
- Improved internationalization by adding new translations for graph-related terms.
- Adjusted icon sizes for better UI consistency in the graph explorer.
- Implemented impact analysis button functionality in the graph explorer.
- Refactored CLI tool configuration to use updated model names.
- Enhanced CLI executor to handle prompts correctly for codex commands.
- Introduced code relationship storage for better visualization in the index tree.
- Added support for parsing Markdown and plain text files in the symbol parser.
- Updated tests to reflect changes in language detection logic.
2025-12-15 23:11:01 +08:00
catlog22
97640a517a feat(storage): implement storage manager for centralized management and cleanup
- Added a new Storage Manager component to handle storage statistics, project cleanup, and configuration for CCW centralized storage.
- Introduced functions to calculate directory sizes, get project storage stats, and clean specific or all storage.
- Enhanced SQLiteStore with a public API for executing queries securely.
- Updated tests to utilize the new execute_query method and validate storage management functionalities.
- Improved performance by implementing connection pooling with idle timeout management in SQLiteStore.
- Added new fields (token_count, symbol_type) to the symbols table and adjusted related insertions.
- Enhanced error handling and logging for storage operations.
2025-12-15 17:39:38 +08:00
catlog22
0fe16963cd Add comprehensive tests for tokenizer, performance benchmarks, and TreeSitter parser functionality
- Implemented unit tests for the Tokenizer class, covering various text inputs, edge cases, and fallback mechanisms.
- Created performance benchmarks comparing tiktoken and pure Python implementations for token counting.
- Developed extensive tests for TreeSitterSymbolParser across Python, JavaScript, and TypeScript, ensuring accurate symbol extraction and parsing.
- Added configuration documentation for MCP integration and custom prompts, enhancing usability and flexibility.
- Introduced a refactor script for GraphAnalyzer to streamline future improvements.
2025-12-15 14:36:09 +08:00
catlog22
0529b57694 Implement database migration framework and performance optimizations
- Added active memory configuration for manual interval and Gemini tool.
- Created file modification rules for handling edits and writes.
- Implemented migration manager for managing database schema migrations.
- Added migration 001 to normalize keywords into separate tables.
- Developed tests for validating performance optimizations including keyword normalization, path lookup, and symbol search.
- Created validation script to manually verify optimization implementations.
2025-12-14 18:08:32 +08:00
catlog22
79a2953862 Add comprehensive tests for vector/semantic search functionality
- Implement full coverage tests for Embedder model loading and embedding generation
- Add CRUD operations and caching tests for VectorStore
- Include cosine similarity computation tests
- Validate semantic search accuracy and relevance through various queries
- Establish performance benchmarks for embedding and search operations
- Ensure edge cases and error handling are covered
- Test thread safety and concurrent access scenarios
- Verify availability of semantic search dependencies
2025-12-14 17:17:09 +08:00
catlog22
08dc0a0348 perf(codex-lens): optimize search performance with vectorized operations
Performance Optimizations:
- VectorStore: NumPy vectorized cosine similarity (100x+ faster)
  - Cached embedding matrix with pre-computed norms
  - Lazy content loading for top-k results only
  - Thread-safe cache invalidation
- SQLite: Added PRAGMA mmap_size=30GB for memory-mapped I/O
- FTS5: unicode61 tokenizer with tokenchars='_' for code identifiers
- ChainSearch: files_only fast path skipping snippet generation
- ThreadPoolExecutor: shared pool across searches

New Components:
- DirIndexStore: single-directory index with FTS5 and symbols
- RegistryStore: global project registry with path mappings
- PathMapper: source-to-index path conversion utility
- IndexTreeBuilder: hierarchical index tree construction
- ChainSearchEngine: parallel recursive directory search

Test Coverage:
- 36 comprehensive search functionality tests
- 14 performance benchmark tests
- 296 total tests passing (100% pass rate)

Benchmark Results:
- FTS5 search: 0.23-0.26ms avg (3900-4300 ops/sec)
- Vector search: 1.05-1.54ms avg (650-955 ops/sec)
- Full semantic: 4.56-6.38ms avg per query

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 11:06:24 +08:00
catlog22
4faa5f1c95 Add comprehensive tests for semantic chunking and search functionality
- Implemented tests for the ChunkConfig and Chunker classes, covering default and custom configurations.
- Added tests for symbol-based chunking, including single and multiple symbols, handling of empty symbols, and preservation of line numbers.
- Developed tests for sliding window chunking, ensuring correct chunking behavior with various content sizes and configurations.
- Created integration tests for semantic search, validating embedding generation, vector storage, and search accuracy across a complex codebase.
- Included performance tests for embedding generation and search operations.
- Established tests for chunking strategies, comparing symbol-based and sliding window approaches.
- Enhanced test coverage for edge cases, including handling of unicode characters and out-of-bounds symbol ranges.
2025-12-12 19:55:35 +08:00
catlog22
c42f91a7fe feat: Add support for Tree-Sitter parsing and enhance SQLite storage performance 2025-12-12 18:40:24 +08:00
catlog22
92d2085b64 Optimize SQLite FTS storage and pooling 2025-12-12 17:40:03 +08:00