Commit Graph

80 Commits

Author SHA1 Message Date
catlog22
fb4f6e718e feat: Implement DeepWiki documentation generation tools
- Added `__init__.py` in `codexlens/tools` for documentation generation.
- Created `deepwiki_generator.py` to handle symbol extraction and markdown generation.
- Introduced `MockMarkdownGenerator` for testing purposes.
- Implemented `DeepWikiGenerator` class for managing documentation generation and file processing.
- Added unit tests for `DeepWikiStore` to ensure proper functionality and error handling.
- Created tests for DeepWiki TypeScript types matching.
2026-03-05 18:30:56 +08:00
catlog22
60218f6bf3 codex-lens: harden CLI embedding errors and flags 2026-03-05 14:27:36 +08:00
catlog22
61e313a0c1 chore: move ccw-skill-hub to standalone repository
Migrated ccw-skill-hub to D:/ccw-skill-hub as independent git project.
Removed nested git repos (ccw/frontend/ccw-skill-hub, skill-hub-repo, skill-hub-temp).
2026-02-24 11:57:26 +08:00
catlog22
8938c47f88 feat: add experimental support for AST parsing and static graph indexing
- Introduced CLI options for using AST grep parsers and enabling static graph relationships during indexing.
- Updated configuration management to load new settings for AST parsing and static graph types.
- Enhanced AST grep processor to handle imports with aliases and improve relationship tracking.
- Modified TreeSitter parsers to support synthetic module scopes for better static graph persistence.
- Implemented global relationship updates in the incremental indexer for static graph expansion.
- Added new ArtifactTag and FloatingFileBrowser components to the frontend for improved terminal dashboard functionality.
- Created utility functions for detecting CCW artifacts in terminal output with associated tests.
2026-02-15 23:12:06 +08:00
catlog22
4344e79e68 Add benchmark results for fast3 and fast4, implement KeepAliveLspBridge, and add tests for staged strategies
- Added new benchmark result files: compare_2026-02-09_score_fast3.json and compare_2026-02-09_score_fast4.json.
- Implemented KeepAliveLspBridge to maintain a persistent LSP connection across multiple queries, improving performance.
- Created unit tests for staged clustering strategies in test_staged_stage3_fast_strategies.py, ensuring correct behavior of score and dir_rr strategies.
2026-02-09 20:45:29 +08:00
catlog22
964292ebdb feat: Add comprehensive tests for contentPattern and glob pattern matching
- Implemented final verification tests for contentPattern to validate behavior with empty strings, dangerous patterns, and normal patterns.
- Created glob pattern matching tests to verify regex conversion and matching functionality.
- Developed infinite loop risk tests using Worker threads to isolate potential blocking operations.
- Introduced optimized contentPattern tests to validate improvements in the findMatches function.
- Added verification tests to assess the effectiveness of contentPattern optimizations.
- Conducted safety tests for contentPattern to identify edge cases and potential vulnerabilities.
- Implemented unrestricted loop tests to analyze infinite loop risks without match limits.
- Developed tests for zero-width pattern detection logic to ensure proper handling of dangerous regex patterns.
2026-02-09 11:13:01 +08:00
catlog22
b9b2932f50 Add tests and implement functionality for staged cascade search and LSP expansion
- Introduced a new JSON file for verbose output of the Codex Lens search results.
- Added unit tests for binary search functionality in `test_stage1_binary_search_uses_chunk_lines.py`.
- Implemented regression tests for staged cascade Stage 2 expansion depth in `test_staged_cascade_lsp_depth.py`.
- Created unit tests for staged cascade Stage 2 realtime LSP graph expansion in `test_staged_cascade_realtime_lsp.py`.
- Enhanced the ChainSearchEngine to respect configuration settings for staged LSP depth and improve search accuracy.
2026-02-08 21:54:42 +08:00
catlog22
71faaf43a8 refactor: 移除 SPLADE 和 hybrid_cascade,精简搜索架构
删除 SPLADE 稀疏神经搜索后端和 hybrid_cascade 策略,
将搜索架构从 6 种后端简化为 4 种(FTS Exact/Fuzzy, Binary Vector, Dense Vector, LSP)。

主要变更:
- 删除 splade_encoder.py, splade_index.py, migration_009 等 4 个文件
- 移除 config.py 中 SPLADE 相关配置(enable_splade, splade_model 等)
- DEFAULT_WEIGHTS 改为 FTS 权重 {exact:0.25, fuzzy:0.1, vector:0.5, lsp:0.15}
- 删除 hybrid_cascade_search(),所有 cascade fallback 改为 self.search()
- API fusion_strategy='hybrid' 向后兼容映射到 binary_rerank
- 删除 CLI index_splade/splade_status 命令和 --method splade
- 更新测试、基准测试和文档
2026-02-08 12:07:41 +08:00
catlog22
c308e429f8 feat: 添加增量更新命令以支持单文件索引更新 2026-01-15 18:14:51 +08:00
catlog22
8a15e08944 feat: 增强索引树构建逻辑,支持递归检查子目录中的可索引文件 2026-01-13 11:08:48 +08:00
catlog22
57173c9b02 feat: 优化动态批量大小计算,确保使用所有解析规则的最大字符限制,并调整利用率因子的安全范围 2026-01-12 17:47:19 +08:00
catlog22
90a1321aac feat: 添加动态批量大小计算,优化嵌入管理和配置系统 2026-01-12 17:34:37 +08:00
catlog22
b77672dda4 feat: 增强模型下载功能,支持 HuggingFace Hub 直接下载 ONNX 格式模型 2026-01-11 18:22:36 +08:00
catlog22
1e91fa9f9e feat: Add custom model download functionality and enhance model management
- Implemented `model-download-custom` command to download HuggingFace models.
- Added support for discovering manually placed models in the cache.
- Enhanced the model list view to display recommended and discovered models separately.
- Introduced JSON editor for direct configuration mode in API settings.
- Added validation and formatting features for JSON input.
- Updated translations for new API settings and common actions.
- Improved user interface for model management, including action buttons and tooltips.
2026-01-11 15:13:11 +08:00
catlog22
6aa79c6dc9 feat: 添加工作空间索引状态接口,增强 CodexLens 状态检查功能,支持前端显示索引信息 2026-01-07 11:36:06 +08:00
catlog22
86d3e36722 feat: 增强解决方案管理功能,支持按解决方案 ID 过滤和简要输出,优化嵌入模型配置读取 2026-01-07 09:31:52 +08:00
catlog22
1298fdd20f feat: 增加搜索功能的代码过滤选项,支持排除特定文件扩展名和仅返回代码文件 2026-01-06 23:19:47 +08:00
catlog22
ef770ff29b Add comprehensive code review specifications and templates
- Introduced best practices requirements specification covering code quality, performance, maintainability, error handling, and documentation standards.
- Established quality standards with overall quality metrics and mandatory checks for security, code quality, performance, and maintainability.
- Created security requirements specification aligned with OWASP Top 10 and CWE Top 25, detailing checks and patterns for common vulnerabilities.
- Developed templates for documenting best practice findings, security findings, and generating reports, including structured markdown and JSON formats.
- Updated dependencies in the project, ensuring compatibility and stability.
- Added test files and README documentation for vector indexing tests.
2026-01-06 23:11:15 +08:00
catlog22
1451594ae6 feat: Add user action prompt after issue discovery and enhance environment variable support for embedding and reranker configurations 2026-01-05 23:58:23 +08:00
catlog22
2e90230097 feat: Update import path for TextCrossEncoder to support fastembed versioning and add fallback for older versions 2026-01-05 23:13:52 +08:00
catlog22
853977c676 feat: Add reranker model management commands and UI integration
- Implemented CLI commands for listing, downloading, deleting, and retrieving information about reranker models.
- Enhanced the dashboard UI to support embedding and reranker configurations with internationalization.
- Updated environment variable management for embedding and reranker settings.
- Added functionality to dynamically update model options based on selected backend.
- Improved user experience with status indicators and action buttons for model management.
- Integrated new reranker models with detailed metadata and recommendations.
2026-01-05 21:23:09 +08:00
catlog22
f4585c8dea feat: enhance reranker and embedding configuration management with settings.json support 2026-01-05 17:21:34 +08:00
catlog22
be498acf59 feat: Add code analysis and LLM action templates with detailed configurations and examples
- Introduced a comprehensive code analysis action template for integrating code exploration and analysis capabilities.
- Added LLM action template for seamless integration of LLM calls with customizable prompts and tools.
- Implemented a benchmark search script to compare multiple search methods across various dimensions including speed, result quality, ranking stability, and coverage.
- Provided preset configurations for common analysis tasks and LLM actions, enhancing usability and flexibility.
2026-01-03 17:37:49 +08:00
catlog22
9922d455da feat: Add templates for autonomous actions, orchestrators, sequential phases, and skill documentation
- Introduced a comprehensive template for autonomous actions, detailing structure, execution, and error handling.
- Added an orchestrator template to manage state and decision logic for autonomous actions.
- Created a sequential phase template to outline execution steps and objectives for structured workflows.
- Developed a skill documentation template to standardize the generation of skill entry files.
- Implemented a Python script to compare search results between hybrid and cascade methods, analyzing ranking changes.
2026-01-03 15:58:31 +08:00
catlog22
bab5625123 feat: 添加全局环境变量加载功能并更新配置说明 2026-01-03 15:14:45 +08:00
catlog22
713894090d feat(codexlens): Improve search defaults and add explicit SPLADE mode
Config changes:
- Disable SPLADE by default (slow ~360ms), use FTS instead
- Enable use_fts_fallback by default for faster sparse search

CLI improvements:
- Fix duplicate index_app typer definition
- Add cascade_search dispatch for cascade method
- Rename 'mode' to 'method' in search output
- Mark embeddings-status, splade-status as deprecated
- Add enable_splade and enable_cascade to search options

Hybrid search:
- Add enable_splade parameter for explicit SPLADE mode
- Add fallback handling when SPLADE is requested but unavailable
2026-01-03 11:49:58 +08:00
catlog22
54fd94547c feat: Enhance embedding generation and search capabilities
- Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance.
- Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving.
- Introduced SPLADE sparse index generation with improved handling and metadata storage.
- Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index.
- Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance.
- Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference.
- Improved `SpladeIndex` with cache size adjustments for better query performance.
- Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval.
- Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.
2026-01-02 23:57:55 +08:00
catlog22
c268b531aa feat: Enhance embedding generation to track current index path and improve metadata retrieval 2026-01-02 19:18:26 +08:00
catlog22
0b6e9db8e4 feat: Add centralized vector storage and metadata management for embeddings 2026-01-02 17:18:23 +08:00
catlog22
9157c5c78b feat: Implement centralized storage for SPLADE and vector embeddings
- Added centralized SPLADE database and vector storage configuration in config.py.
- Updated embedding_manager.py to support centralized SPLADE database path.
- Enhanced generate_embeddings and generate_embeddings_recursive functions for centralized storage.
- Introduced centralized ANN index creation in ann_index.py.
- Modified hybrid_search.py to utilize centralized vector index for searches.
- Implemented methods to discover and manage centralized SPLADE and HNSW files.
2026-01-02 16:53:39 +08:00
catlog22
54fb7afdb2 Enhance semantic search capabilities and configuration
- Added category support for programming and documentation languages in Config.
- Implemented category-based filtering in HybridSearchEngine to improve search relevance based on query intent.
- Introduced functions for filtering results by category and determining file categories based on extensions.
- Updated VectorStore to include a category column in the database schema and modified chunk addition methods to support category tagging.
- Enhanced the WatcherConfig to ignore additional common directories and files.
- Created a benchmark script to compare performance between Binary Cascade, SPLADE, and Vector semantic search methods, including detailed result analysis and overlap comparison.
2026-01-02 15:01:20 +08:00
catlog22
92ed2524b7 feat: Enhance SPLADE indexing command to support multiple index databases and add chunk ID management 2026-01-02 13:25:23 +08:00
catlog22
da68ba0b82 feat: Implement cascade indexing command and benchmark script for performance evaluation 2026-01-02 11:24:06 +08:00
catlog22
5bb01755bc Implement SPLADE sparse encoder and associated database migrations
- Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing.
- Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points.
- Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database.
- Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval.
- Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.
2026-01-01 17:41:22 +08:00
catlog22
520f2d26f2 feat(codex-lens): add unified reranker architecture and file watcher
Unified Reranker Architecture:
- Add BaseReranker ABC with factory pattern
- Implement 4 backends: ONNX (default), API, LiteLLM, Legacy
- Add .env configuration parsing for API credentials
- Migrate from sentence-transformers to optimum+onnxruntime

File Watcher Module:
- Add real-time file system monitoring with watchdog
- Implement IncrementalIndexer for single-file updates
- Add WatcherManager with signal handling and graceful shutdown
- Add 'codexlens watch' CLI command
- Event filtering, debouncing, and deduplication
- Thread-safe design with proper resource cleanup

Tests: 16 watcher tests + 5 reranker test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 13:23:52 +08:00
catlog22
31a45f1f30 Add graph expansion and cross-encoder reranking features
- Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors.
- Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring.
- Created migrations to establish necessary database tables for relationships and graph neighbors.
- Developed tests for graph expansion functionality, ensuring related results are populated correctly.
- Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead.
- Updated schema cleanup tests to reflect changes in versioning and deprecated fields.
- Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.
2025-12-31 16:58:59 +08:00
catlog22
a1413dd1b3 feat: Unified Embedding Pool with auto-discovery
Architecture refactoring for multi-provider rotation:

Backend:
- Add EmbeddingPoolConfig type with autoDiscover support
- Implement discoverProvidersForModel() for auto-aggregation
- Add GET/PUT /api/litellm-api/embedding-pool endpoints
- Add GET /api/litellm-api/embedding-pool/discover/:model preview
- Convert ccw-litellm status check to async with 5-min cache
- Maintain backward compatibility with legacy rotation config

Frontend:
- Add "Embedding Pool" tab in API Settings
- Auto-discover providers when target model selected
- Show provider/key count with include/exclude controls
- Increase sidebar width (280px → 320px)
- Add sync result feedback on save

Other:
- Remove worker count limits (was max=32)
- Add i18n translations (EN/CN)
- Update .gitignore for .mcp.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 16:06:49 +08:00
catlog22
8e744597d1 feat: Implement CodexLens multi-provider embedding rotation management
- Added functions to get and update CodexLens embedding rotation configuration.
- Introduced functionality to retrieve enabled embedding providers for rotation.
- Created endpoints for managing rotation configuration via API.
- Enhanced dashboard UI to support multi-provider rotation configuration.
- Updated internationalization strings for new rotation features.
- Adjusted CLI commands and embedding manager to support increased concurrency limits.
- Modified hybrid search weights for improved ranking behavior.
2025-12-25 14:13:27 +08:00
catlog22
501d9a05d4 fix: 修复 ModelScope API 路由 bug 导致的 Ollama 连接错误
- 添加 _sanitize_text() 方法处理以 'import' 开头的文本
- ModelScope 后端错误地将此类文本路由到本地 Ollama 端点
- 通过在文本前添加空格绕过路由检测,不影响嵌入质量
- 增强 embedding_manager.py 的重试逻辑和错误处理
- 在 commands.py 中成功生成后调用全局模型锁定

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 12:52:43 +08:00
catlog22
229d51cd18 feat: 添加全局模型锁定功能,防止不同模型混合使用,增强嵌入生成的稳定性 2025-12-25 11:20:05 +08:00
catlog22
40e61b30d6 feat: 添加多端点支持和负载均衡功能,增强 LiteLLM 嵌入管理 2025-12-25 11:01:08 +08:00
catlog22
3c3ce55842 feat: 添加对 LiteLLM 嵌入后端的支持,增强并发 API 调用能力 2025-12-24 22:20:13 +08:00
catlog22
e3e61bcae9 feat: Enhance LiteLLM integration and CLI management
- Added token estimation and batching functionality in LiteLLMEmbedder to handle large text inputs efficiently.
- Updated embed method to support max_tokens_per_batch parameter for better API call management.
- Introduced new API routes for managing custom CLI endpoints, including GET, POST, PUT, and DELETE methods.
- Enhanced CLI history component to support source directory context for native session content.
- Improved error handling and logging in various components for better debugging and user feedback.
- Added internationalization support for new API endpoint features in the i18n module.
- Updated CodexLens CLI commands to allow for concurrent API calls with a max_workers option.
- Enhanced embedding manager to track model information and handle embeddings generation more robustly.
- Added entry points for CLI commands in the package configuration.
2025-12-24 18:01:26 +08:00
catlog22
e671b45948 feat: Enhance configuration management and embedding capabilities
- Added JSON-based settings management in Config class for embedding and LLM configurations.
- Introduced methods to save and load settings from a JSON file.
- Updated BaseEmbedder and its subclasses to include max_tokens property for better token management.
- Enhanced chunking strategy to support recursive splitting of large symbols with improved overlap handling.
- Implemented comprehensive tests for recursive splitting and chunking behavior.
- Added CLI tools configuration management for better integration with external tools.
- Introduced a new command for compacting session memory into structured text for recovery.
2025-12-24 16:32:27 +08:00
catlog22
b00113d212 feat: Enhance embedding management and model configuration
- Updated embedding_manager.py to include backend parameter in model configuration.
- Modified model_manager.py to utilize cache_name for ONNX models.
- Refactored hybrid_search.py to improve embedder initialization based on backend type.
- Added backend column to vector_store.py for better model configuration management.
- Implemented migration for existing database to include backend information.
- Enhanced API settings implementation with comprehensive provider and endpoint management.
- Introduced LiteLLM integration guide detailing configuration and usage.
- Added examples for LiteLLM usage in TypeScript.
2025-12-24 14:03:59 +08:00
catlog22
46ac591fe8 Merge branch 'main' of https://github.com/catlog22/Claude-Code-Workflow 2025-12-23 20:46:01 +08:00
catlog22
bf66b095c7 feat: Add unified LiteLLM API management with dashboard UI and CLI integration
- Create ccw-litellm Python package with AbstractEmbedder and AbstractLLMClient interfaces
- Add BaseEmbedder abstraction and factory pattern to codex-lens for pluggable backends
- Implement API Settings dashboard page for provider credentials and custom endpoints
- Add REST API routes for CRUD operations on providers and endpoints
- Extend CLI with --model parameter for custom endpoint routing
- Integrate existing context-cache for @pattern file resolution
- Add provider model registry with predefined models per provider type
- Include i18n translations (en/zh) for all new UI elements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 20:36:32 +08:00
catlog22
39056292b7 feat: Add CodexLens Manager to dashboard and enhance GPU management
- Introduced a new CodexLens Manager item in the dashboard for easier access.
- Implemented GPU management commands in the CLI, including listing available GPUs, selecting a specific GPU, and resetting to automatic detection.
- Enhanced the embedding generation process to utilize GPU resources more effectively, including batch size optimization for better performance.
- Updated the embedder to support device ID options for GPU selection, ensuring compatibility with DirectML and CUDA.
- Added detailed logging and error handling for GPU detection and selection processes.
- Updated package version to 6.2.9 and added comprehensive documentation for Codex Agent Execution Protocol.
2025-12-23 18:35:30 +08:00
rhyme
1998f3ae8a fix(codexlens): correct fastembed 0.7.4 cache path and download trigger
- Update cache path to ~/.cache/huggingface (HuggingFace Hub default)
- Fix model path format: models--{org}--{model}
- Add .embed() call to trigger actual download in download_model()
- Ensure cross-platform compatibility (Linux/Windows)
2025-12-23 14:51:08 +08:00
catlog22
3cd842ca1a fix: ccw package.json removal - add root build script and fix cli.ts path resolution
- Fix cli.ts loadPackageInfo() to try root package.json first (../../package.json)
- Add build script and devDependencies to root package.json
- Remove ccw/package.json and ccw/package-lock.json (no longer needed)
- CodexLens: add config.json support for index_dir configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 10:25:15 +08:00