Commit Graph

126 Commits

Author SHA1 Message Date
catlog22
f14418603a feat(cli): 添加 --rule 选项支持模板自动发现
重构 ccw cli 模板系统:

- 新增 template-discovery.ts 模块,支持扁平化模板自动发现
- 添加 --rule <template> 选项,自动加载 protocol 和 template
- 模板目录从嵌套结构 (prompts/category/file.txt) 迁移到扁平结构 (prompts/category-function.txt)
- 更新所有 agent/command 文件,使用 $PROTO $TMPL 环境变量替代 $(cat ...) 模式
- 支持模糊匹配:--rule 02-review-architecture 可匹配 analysis-review-architecture.txt

其他更新:
- Dashboard: 添加 Claude Manager 和 Issue Manager 页面
- Codex-lens: 增强 chain_search 和 clustering 模块

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 19:20:24 +08:00
catlog22
c308e429f8 feat: 添加增量更新命令以支持单文件索引更新 2026-01-15 18:14:51 +08:00
catlog22
8a15e08944 feat: 增强索引树构建逻辑,支持递归检查子目录中的可索引文件 2026-01-13 11:08:48 +08:00
catlog22
8c2d39d517 feat: 添加配置选项以调整重排序模型的权重和测试文件惩罚,增强语义搜索功能 2026-01-13 10:44:26 +08:00
catlog22
57173c9b02 feat: 优化动态批量大小计算,确保使用所有解析规则的最大字符限制,并调整利用率因子的安全范围 2026-01-12 17:47:19 +08:00
catlog22
90a1321aac feat: 添加动态批量大小计算,优化嵌入管理和配置系统 2026-01-12 17:34:37 +08:00
catlog22
b77672dda4 feat: 增强模型下载功能,支持 HuggingFace Hub 直接下载 ONNX 格式模型 2026-01-11 18:22:36 +08:00
catlog22
1e91fa9f9e feat: Add custom model download functionality and enhance model management
- Implemented `model-download-custom` command to download HuggingFace models.
- Added support for discovering manually placed models in the cache.
- Enhanced the model list view to display recommended and discovered models separately.
- Introduced JSON editor for direct configuration mode in API settings.
- Added validation and formatting features for JSON input.
- Updated translations for new API settings and common actions.
- Improved user interface for model management, including action buttons and tooltips.
2026-01-11 15:13:11 +08:00
catlog22
fae2f7e279 feat: 始终注册队列变更回调以支持标准输出(TypeScript 后端) 2026-01-07 22:21:11 +08:00
catlog22
05514631f2 feat: Enhance JSON streaming parsing and UI updates
- Added a function to parse JSON streaming content in core-memory.js, extracting readable text from messages.
- Updated memory detail view to utilize the new parsing function for content and summary.
- Introduced an enableReview option in rules-manager.js, allowing users to toggle review functionality in rule creation.
- Simplified skill creation modal in skills-manager.js by removing generation type selection UI.
- Improved CLI executor to handle tool calls for file writing, ensuring proper output parsing.
- Adjusted CLI command tests to set timeout to 0 for immediate execution.
- Updated file watcher to implement a true debounce mechanism and added a pending queue status for UI updates.
- Enhanced watcher manager to handle queue changes and provide JSON output for better integration with TypeScript backend.
- Established TypeScript naming conventions documentation to standardize code style across the project.
2026-01-07 21:51:26 +08:00
catlog22
42fbc1936d feat: 更新执行命令的参数提示,支持指定现有工作树路径,增强工作树管理功能 2026-01-07 16:54:23 +08:00
catlog22
87d38a3374 feat: 添加重排序模型配置,支持最大输入令牌数,优化 API 批处理能力 2026-01-07 15:50:22 +08:00
catlog22
6aa79c6dc9 feat: 添加工作空间索引状态接口,增强 CodexLens 状态检查功能,支持前端显示索引信息 2026-01-07 11:36:06 +08:00
catlog22
1bd3d9c9bf feat: 移除文档语言配置,优化代码语言分类 2026-01-07 10:10:25 +08:00
catlog22
86d3e36722 feat: 增强解决方案管理功能,支持按解决方案 ID 过滤和简要输出,优化嵌入模型配置读取 2026-01-07 09:31:52 +08:00
catlog22
1298fdd20f feat: 增加搜索功能的代码过滤选项,支持排除特定文件扩展名和仅返回代码文件 2026-01-06 23:19:47 +08:00
catlog22
ef770ff29b Add comprehensive code review specifications and templates
- Introduced best practices requirements specification covering code quality, performance, maintainability, error handling, and documentation standards.
- Established quality standards with overall quality metrics and mandatory checks for security, code quality, performance, and maintainability.
- Created security requirements specification aligned with OWASP Top 10 and CWE Top 25, detailing checks and patterns for common vulnerabilities.
- Developed templates for documenting best practice findings, security findings, and generating reports, including structured markdown and JSON formats.
- Updated dependencies in the project, ensuring compatibility and stability.
- Added test files and README documentation for vector indexing tests.
2026-01-06 23:11:15 +08:00
catlog22
1451594ae6 feat: Add user action prompt after issue discovery and enhance environment variable support for embedding and reranker configurations 2026-01-05 23:58:23 +08:00
catlog22
2e90230097 feat: Update import path for TextCrossEncoder to support fastembed versioning and add fallback for older versions 2026-01-05 23:13:52 +08:00
catlog22
f90c6b9fab feat: Enhance CodexLens uninstallation process with improved error handling and process termination for locked files 2026-01-05 22:40:26 +08:00
catlog22
853977c676 feat: Add reranker model management commands and UI integration
- Implemented CLI commands for listing, downloading, deleting, and retrieving information about reranker models.
- Enhanced the dashboard UI to support embedding and reranker configurations with internationalization.
- Updated environment variable management for embedding and reranker settings.
- Added functionality to dynamically update model options based on selected backend.
- Improved user experience with status indicators and action buttons for model management.
- Integrated new reranker models with detailed metadata and recommendations.
2026-01-05 21:23:09 +08:00
catlog22
f4585c8dea feat: enhance reranker and embedding configuration management with settings.json support 2026-01-05 17:21:34 +08:00
catlog22
504ccfebbc feat: add reranker models to ProviderCredential and improve FastEmbedReranker scoring
- Added `rerankerModels` property to the `ProviderCredential` interface in `litellm-api-config.ts` to support additional reranker configurations.
- Introduced a numerically stable sigmoid function in `FastEmbedReranker` for score normalization.
- Updated the scoring logic in `FastEmbedReranker` to use raw float scores from the encoder and normalize them using the new sigmoid function.
- Adjusted the result mapping to maintain original document order while applying normalization.
2026-01-03 22:20:06 +08:00
catlog22
0af84be775 feat(model-lock): implement model lock management with localStorage support 2026-01-03 19:48:07 +08:00
catlog22
be498acf59 feat: Add code analysis and LLM action templates with detailed configurations and examples
- Introduced a comprehensive code analysis action template for integrating code exploration and analysis capabilities.
- Added LLM action template for seamless integration of LLM calls with customizable prompts and tools.
- Implemented a benchmark search script to compare multiple search methods across various dimensions including speed, result quality, ranking stability, and coverage.
- Provided preset configurations for common analysis tasks and LLM actions, enhancing usability and flexibility.
2026-01-03 17:37:49 +08:00
catlog22
9922d455da feat: Add templates for autonomous actions, orchestrators, sequential phases, and skill documentation
- Introduced a comprehensive template for autonomous actions, detailing structure, execution, and error handling.
- Added an orchestrator template to manage state and decision logic for autonomous actions.
- Created a sequential phase template to outline execution steps and objectives for structured workflows.
- Developed a skill documentation template to standardize the generation of skill entry files.
- Implemented a Python script to compare search results between hybrid and cascade methods, analyzing ranking changes.
2026-01-03 15:58:31 +08:00
catlog22
bab5625123 feat: 添加全局环境变量加载功能并更新配置说明 2026-01-03 15:14:45 +08:00
catlog22
713894090d feat(codexlens): Improve search defaults and add explicit SPLADE mode
Config changes:
- Disable SPLADE by default (slow ~360ms), use FTS instead
- Enable use_fts_fallback by default for faster sparse search

CLI improvements:
- Fix duplicate index_app typer definition
- Add cascade_search dispatch for cascade method
- Rename 'mode' to 'method' in search output
- Mark embeddings-status, splade-status as deprecated
- Add enable_splade and enable_cascade to search options

Hybrid search:
- Add enable_splade parameter for explicit SPLADE mode
- Add fallback handling when SPLADE is requested but unavailable
2026-01-03 11:49:58 +08:00
catlog22
740bd1b61e fix(codexlens): Fix constructor and path handling issues
1. GlobalSymbolIndex constructor: Add project_id parameter lookup
   - Get project_id from registry using source_root
   - Pass project_id to GlobalSymbolIndex constructor

2. Binary cascade search path handling:
   - Add VectorMetadataStore import for centralized search
   - Fix _build_results_from_candidates to handle centralized mode
   - Use VectorMetadataStore for metadata, source_index_db for embeddings
   - Properly distinguish between index_root and index_path

3. Dense reranking for centralized search:
   - Get chunk metadata from _vectors_meta.db
   - Group chunks by source_index_db
   - Retrieve dense embeddings from respective _index.db files
2026-01-03 11:47:07 +08:00
catlog22
54fd94547c feat: Enhance embedding generation and search capabilities
- Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance.
- Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving.
- Introduced SPLADE sparse index generation with improved handling and metadata storage.
- Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index.
- Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance.
- Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference.
- Improved `SpladeIndex` with cache size adjustments for better query performance.
- Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval.
- Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.
2026-01-02 23:57:55 +08:00
catlog22
96b44e1482 feat: Add type validation for RRF weights and implement caching for embedder instances 2026-01-02 19:50:51 +08:00
catlog22
c268b531aa feat: Enhance embedding generation to track current index path and improve metadata retrieval 2026-01-02 19:18:26 +08:00
catlog22
0b6e9db8e4 feat: Add centralized vector storage and metadata management for embeddings 2026-01-02 17:18:23 +08:00
catlog22
9157c5c78b feat: Implement centralized storage for SPLADE and vector embeddings
- Added centralized SPLADE database and vector storage configuration in config.py.
- Updated embedding_manager.py to support centralized SPLADE database path.
- Enhanced generate_embeddings and generate_embeddings_recursive functions for centralized storage.
- Introduced centralized ANN index creation in ann_index.py.
- Modified hybrid_search.py to utilize centralized vector index for searches.
- Implemented methods to discover and manage centralized SPLADE and HNSW files.
2026-01-02 16:53:39 +08:00
catlog22
54fb7afdb2 Enhance semantic search capabilities and configuration
- Added category support for programming and documentation languages in Config.
- Implemented category-based filtering in HybridSearchEngine to improve search relevance based on query intent.
- Introduced functions for filtering results by category and determining file categories based on extensions.
- Updated VectorStore to include a category column in the database schema and modified chunk addition methods to support category tagging.
- Enhanced the WatcherConfig to ignore additional common directories and files.
- Created a benchmark script to compare performance between Binary Cascade, SPLADE, and Vector semantic search methods, including detailed result analysis and overlap comparison.
2026-01-02 15:01:20 +08:00
catlog22
92ed2524b7 feat: Enhance SPLADE indexing command to support multiple index databases and add chunk ID management 2026-01-02 13:25:23 +08:00
catlog22
56c03c847a feat: Add method to retrieve all semantic chunks from the vector store
- Implemented `get_all_chunks` method in `VectorStore` class to fetch all semantic chunks from the database.
- Added a new benchmark script `analyze_methods.py` for analyzing hybrid search methods and storage architecture.
- Included detailed analysis of method contributions, storage conflicts, and FTS + Rerank fusion experiments.
- Updated results JSON structure to reflect new analysis outputs and method performance metrics.
2026-01-02 12:32:43 +08:00
catlog22
9129c981a4 feat: Enhance BinaryANNIndex with vectorized search and performance benchmarking 2026-01-02 11:49:54 +08:00
catlog22
da68ba0b82 feat: Implement cascade indexing command and benchmark script for performance evaluation 2026-01-02 11:24:06 +08:00
catlog22
e21d801523 feat: Add multi-type embedding backends for cascade retrieval
- Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors.
- Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking.
- Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval.
- Introduced utility functions for embedding conversion and distance computation.

chore: Migration 010 - Add multi-vector storage support

- Added 'chunks' table to support multi-vector embeddings for cascade retrieval.
- Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage.
- Implemented upgrade and downgrade functions to manage schema changes and data migration.
2026-01-02 10:52:43 +08:00
catlog22
195438d26a feat(splade): add cache directory support for ONNX models and improve thread-local database connection handling 2026-01-01 22:40:00 +08:00
catlog22
5bb01755bc Implement SPLADE sparse encoder and associated database migrations
- Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing.
- Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points.
- Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database.
- Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval.
- Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.
2026-01-01 17:41:22 +08:00
catlog22
520f2d26f2 feat(codex-lens): add unified reranker architecture and file watcher
Unified Reranker Architecture:
- Add BaseReranker ABC with factory pattern
- Implement 4 backends: ONNX (default), API, LiteLLM, Legacy
- Add .env configuration parsing for API credentials
- Migrate from sentence-transformers to optimum+onnxruntime

File Watcher Module:
- Add real-time file system monitoring with watchdog
- Implement IncrementalIndexer for single-file updates
- Add WatcherManager with signal handling and graceful shutdown
- Add 'codexlens watch' CLI command
- Event filtering, debouncing, and deduplication
- Thread-safe design with proper resource cleanup

Tests: 16 watcher tests + 5 reranker test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 13:23:52 +08:00
catlog22
31a45f1f30 Add graph expansion and cross-encoder reranking features
- Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors.
- Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring.
- Created migrations to establish necessary database tables for relationships and graph neighbors.
- Developed tests for graph expansion functionality, ensuring related results are populated correctly.
- Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead.
- Updated schema cleanup tests to reflect changes in versioning and deprecated fields.
- Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.
2025-12-31 16:58:59 +08:00
catlog22
70f8b14eaa refactor(vector_store): use safer SQL query construction pattern
Replaces f-string interpolation with safer string formatting.
Adds documentation on SQL injection prevention.
No functional changes - parameterized queries still used.

Fixes: ISS-1766921318981-9

Solution-ID: SOL-1735386000-9
Issue-ID: ISS-1766921318981-9
Task-ID: T1
2025-12-29 20:09:49 +08:00
catlog22
0c8b2f2ec9 fix(vector_store): add bounds checking for chunk ID generation
Prevents potential integer overflow when start_id is near sys.maxsize.
Adds validation before range() calculation in batch insert methods.

Fixes: ISS-1766921318981-6

Solution-ID: SOL-1735386000-6
Issue-ID: ISS-1766921318981-6
Task-ID: T1
2025-12-29 20:02:19 +08:00
catlog22
c56104c082 fix(vector_store): add null check for ANN search results before filtering
Prevents errors when HNSW search returns null/empty results due to race conditions.
Adds validation for ids and distances before zip operation.

Fixes: ISS-1766921318981-5

Solution-ID: SOL-1735386000-5
Issue-ID: ISS-1766921318981-5
Task-ID: T1
2025-12-29 19:53:32 +08:00
catlog22
7f4433e449 fix(vector_store): add parameter validation for min_score range
Validates min_score is within [0.0, 1.0] for cosine similarity.
Raises ValueError for out-of-range values to prevent unexpected filtering.

Fixes: ISS-1766921318981-14

Solution-ID: SOL-1735386000-14
Issue-ID: ISS-1766921318981-14
Task-ID: T1
2025-12-29 19:46:26 +08:00
catlog22
60fbb4177c fix(config): add specific exception handling for path operations
Replaces generic Exception handling with specific PermissionError and OSError
handling in __post_init__ and ensure_runtime_dirs(). Provides clear diagnostic
messages to distinguish permission issues from other filesystem errors.

Solution-ID: SOL-1735385400008
Issue-ID: ISS-1766921318981-8
Task-ID: T1
2025-12-29 19:34:27 +08:00
catlog22
5914b1c5fc fix(vector-store): protect bulk insert mode transitions with lock
Ensure begin_bulk_insert() and end_bulk_insert() are fully
lock-protected to prevent TOCTOU race conditions.

Solution-ID: SOL-1735392000003
Issue-ID: ISS-1766921318981-12
Task-ID: T2
2025-12-29 19:20:02 +08:00