Commit Graph

69 Commits

Author SHA1 Message Date
catlog22
93dcdd2293 fix(config): log configuration loading errors instead of silently ignoring
Replaces bare exception handler in load_settings() with logging.warning()
to help users debug configuration file issues (syntax errors, permissions).
Maintains backward compatibility - errors do not break initialization.

Solution-ID: SOL-1735385400001
Issue-ID: ISS-1766921318981-1
Task-ID: T1
2025-12-28 21:06:23 +08:00
catlog22
58caccb250 test(ranking): add edge case tests for normalize_weights
Add comprehensive test coverage for NaN, infinity, and all-None
edge cases in weight normalization to prevent regression.

Solution-ID: SOL-20251228113631
Issue-ID: ISS-1766921318981-0
Task-ID: T2
2025-12-28 20:59:08 +08:00
catlog22
598eed92cb fix(ranking): add explicit NaN check in normalize_weights
Add math.isnan() check before math.isfinite() to properly catch
NaN values in weight totals. Prevents division by NaN which could
produce unexpected results in RRF fusion calculations.

Solution-ID: SOL-20251228113631
Issue-ID: ISS-1766921318981-0
Task-ID: T1
2025-12-28 20:55:03 +08:00
catlog22
a2c88ba885 feat: Add project guidelines support and enhance project overview rendering 2025-12-28 14:50:50 +08:00
catlog22
4061ae48c4 feat: Implement adaptive RRF weights and query intent detection
- Added integration tests for adaptive RRF weights in hybrid search.
- Enhanced query intent detection with new classifications: keyword, semantic, and mixed.
- Introduced symbol boosting in search results based on explicit symbol matches.
- Implemented embedding-based reranking with configurable options.
- Added global symbol index for efficient symbol lookups across projects.
- Improved file deletion handling on Windows to avoid permission errors.
- Updated chunk configuration to increase overlap for better context.
- Modified package.json test script to target specific test files.
- Created comprehensive writing style guidelines for documentation.
- Added TypeScript tests for query intent detection and adaptive weights.
- Established performance benchmarks for global symbol indexing.
2025-12-26 15:08:47 +08:00
catlog22
3b842ed290 feat(cli-executor): add streaming option and enhance output handling
- Introduced a `stream` parameter to control output streaming vs. caching.
- Enhanced status determination logic to prioritize valid output over exit codes.
- Updated output structure to include full stdout and stderr when not streaming.

feat(cli-history-store): extend conversation turn schema and migration

- Added `cached`, `stdout_full`, and `stderr_full` fields to the conversation turn schema.
- Implemented database migration to add new columns if they do not exist.
- Updated upsert logic to handle new fields.

feat(codex-lens): implement global symbol index for fast lookups

- Created `GlobalSymbolIndex` class to manage project-wide symbol indexing.
- Added methods for adding, updating, and deleting symbols in the global index.
- Integrated global index updates into directory indexing processes.

feat(codex-lens): optimize search functionality with global index

- Enhanced `ChainSearchEngine` to utilize the global symbol index for faster searches.
- Added configuration option to enable/disable global symbol indexing.
- Updated tests to validate global index functionality and performance.
2025-12-25 22:22:31 +08:00
catlog22
203100431b feat: 添加 Code Index MCP 提供者支持,更新相关 API 和配置 2025-12-25 19:58:42 +08:00
catlog22
ebcbb11cb2 feat: Enhance CodexLens search functionality with new parameters and result handling
- Added search limit, content length, and extra files input fields in the CodexLens manager UI.
- Updated API request parameters to include new fields: max_content_length and extra_files_count.
- Refactored smart-search.ts to support new parameters with default values.
- Implemented result splitting logic to return both full content and additional file paths.
- Updated CLI commands to remove worker limits and allow dynamic scaling based on endpoint count.
- Introduced EmbeddingPoolConfig for improved embedding management and auto-discovery of providers.
- Enhanced search engines to utilize new parameters for fuzzy and exact searches.
- Added support for embedding single texts in the LiteLLM embedder.
2025-12-25 16:16:44 +08:00
catlog22
a1413dd1b3 feat: Unified Embedding Pool with auto-discovery
Architecture refactoring for multi-provider rotation:

Backend:
- Add EmbeddingPoolConfig type with autoDiscover support
- Implement discoverProvidersForModel() for auto-aggregation
- Add GET/PUT /api/litellm-api/embedding-pool endpoints
- Add GET /api/litellm-api/embedding-pool/discover/:model preview
- Convert ccw-litellm status check to async with 5-min cache
- Maintain backward compatibility with legacy rotation config

Frontend:
- Add "Embedding Pool" tab in API Settings
- Auto-discover providers when target model selected
- Show provider/key count with include/exclude controls
- Increase sidebar width (280px → 320px)
- Add sync result feedback on save

Other:
- Remove worker count limits (was max=32)
- Add i18n translations (EN/CN)
- Update .gitignore for .mcp.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 16:06:49 +08:00
catlog22
8e744597d1 feat: Implement CodexLens multi-provider embedding rotation management
- Added functions to get and update CodexLens embedding rotation configuration.
- Introduced functionality to retrieve enabled embedding providers for rotation.
- Created endpoints for managing rotation configuration via API.
- Enhanced dashboard UI to support multi-provider rotation configuration.
- Updated internationalization strings for new rotation features.
- Adjusted CLI commands and embedding manager to support increased concurrency limits.
- Modified hybrid search weights for improved ranking behavior.
2025-12-25 14:13:27 +08:00
catlog22
501d9a05d4 fix: 修复 ModelScope API 路由 bug 导致的 Ollama 连接错误
- 添加 _sanitize_text() 方法处理以 'import' 开头的文本
- ModelScope 后端错误地将此类文本路由到本地 Ollama 端点
- 通过在文本前添加空格绕过路由检测,不影响嵌入质量
- 增强 embedding_manager.py 的重试逻辑和错误处理
- 在 commands.py 中成功生成后调用全局模型锁定

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 12:52:43 +08:00
catlog22
229d51cd18 feat: 添加全局模型锁定功能,防止不同模型混合使用,增强嵌入生成的稳定性 2025-12-25 11:20:05 +08:00
catlog22
40e61b30d6 feat: 添加多端点支持和负载均衡功能,增强 LiteLLM 嵌入管理 2025-12-25 11:01:08 +08:00
catlog22
3c3ce55842 feat: 添加对 LiteLLM 嵌入后端的支持,增强并发 API 调用能力 2025-12-24 22:20:13 +08:00
catlog22
e3e61bcae9 feat: Enhance LiteLLM integration and CLI management
- Added token estimation and batching functionality in LiteLLMEmbedder to handle large text inputs efficiently.
- Updated embed method to support max_tokens_per_batch parameter for better API call management.
- Introduced new API routes for managing custom CLI endpoints, including GET, POST, PUT, and DELETE methods.
- Enhanced CLI history component to support source directory context for native session content.
- Improved error handling and logging in various components for better debugging and user feedback.
- Added internationalization support for new API endpoint features in the i18n module.
- Updated CodexLens CLI commands to allow for concurrent API calls with a max_workers option.
- Enhanced embedding manager to track model information and handle embeddings generation more robustly.
- Added entry points for CLI commands in the package configuration.
2025-12-24 18:01:26 +08:00
catlog22
e671b45948 feat: Enhance configuration management and embedding capabilities
- Added JSON-based settings management in Config class for embedding and LLM configurations.
- Introduced methods to save and load settings from a JSON file.
- Updated BaseEmbedder and its subclasses to include max_tokens property for better token management.
- Enhanced chunking strategy to support recursive splitting of large symbols with improved overlap handling.
- Implemented comprehensive tests for recursive splitting and chunking behavior.
- Added CLI tools configuration management for better integration with external tools.
- Introduced a new command for compacting session memory into structured text for recovery.
2025-12-24 16:32:27 +08:00
catlog22
b00113d212 feat: Enhance embedding management and model configuration
- Updated embedding_manager.py to include backend parameter in model configuration.
- Modified model_manager.py to utilize cache_name for ONNX models.
- Refactored hybrid_search.py to improve embedder initialization based on backend type.
- Added backend column to vector_store.py for better model configuration management.
- Implemented migration for existing database to include backend information.
- Enhanced API settings implementation with comprehensive provider and endpoint management.
- Introduced LiteLLM integration guide detailing configuration and usage.
- Added examples for LiteLLM usage in TypeScript.
2025-12-24 14:03:59 +08:00
catlog22
46ac591fe8 Merge branch 'main' of https://github.com/catlog22/Claude-Code-Workflow 2025-12-23 20:46:01 +08:00
catlog22
bf66b095c7 feat: Add unified LiteLLM API management with dashboard UI and CLI integration
- Create ccw-litellm Python package with AbstractEmbedder and AbstractLLMClient interfaces
- Add BaseEmbedder abstraction and factory pattern to codex-lens for pluggable backends
- Implement API Settings dashboard page for provider credentials and custom endpoints
- Add REST API routes for CRUD operations on providers and endpoints
- Extend CLI with --model parameter for custom endpoint routing
- Integrate existing context-cache for @pattern file resolution
- Add provider model registry with predefined models per provider type
- Include i18n translations (en/zh) for all new UI elements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 20:36:32 +08:00
catlog22
39056292b7 feat: Add CodexLens Manager to dashboard and enhance GPU management
- Introduced a new CodexLens Manager item in the dashboard for easier access.
- Implemented GPU management commands in the CLI, including listing available GPUs, selecting a specific GPU, and resetting to automatic detection.
- Enhanced the embedding generation process to utilize GPU resources more effectively, including batch size optimization for better performance.
- Updated the embedder to support device ID options for GPU selection, ensuring compatibility with DirectML and CUDA.
- Added detailed logging and error handling for GPU detection and selection processes.
- Updated package version to 6.2.9 and added comprehensive documentation for Codex Agent Execution Protocol.
2025-12-23 18:35:30 +08:00
rhyme
1998f3ae8a fix(codexlens): correct fastembed 0.7.4 cache path and download trigger
- Update cache path to ~/.cache/huggingface (HuggingFace Hub default)
- Fix model path format: models--{org}--{model}
- Add .embed() call to trigger actual download in download_model()
- Ensure cross-platform compatibility (Linux/Windows)
2025-12-23 14:51:08 +08:00
catlog22
3cd842ca1a fix: ccw package.json removal - add root build script and fix cli.ts path resolution
- Fix cli.ts loadPackageInfo() to try root package.json first (../../package.json)
- Add build script and devDependencies to root package.json
- Remove ccw/package.json and ccw/package-lock.json (no longer needed)
- CodexLens: add config.json support for index_dir configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-23 10:25:15 +08:00
catlog22
8203d690cb fix: CodexLens model detection, hybrid search stability, and JSON logging
- Fix model installation detection using fastembed ONNX cache names
- Add embeddings_config table for model metadata tracking
- Fix hybrid search segfault by using single-threaded GPU mode
- Suppress INFO logs in JSON mode to prevent error display
- Add model dropdown filtering to show only installed models

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 21:49:10 +08:00
catlog22
cf58dc0dd3 bump version to 6.2.6 in package.json 2025-12-22 20:17:38 +08:00
catlog22
6a69af3bf1 feat: 更新嵌入批处理大小至 256,以优化性能并提高 GPU 加速效率 2025-12-22 17:55:05 +08:00
catlog22
acdfbb4644 feat: Enhance CodexLens with GPU support and semantic status improvements
- Added accelerator and providers fields to SemanticStatus interface.
- Updated checkSemanticStatus function to retrieve ONNX providers and accelerator type.
- Introduced detectGpuSupport function to identify available GPU modes (CUDA, DirectML).
- Modified installSemantic function to support GPU acceleration modes and clean up ONNX Runtime installations.
- Updated package requirements in PKG-INFO for semantic-gpu and semantic-directml extras.
- Added new source files for GPU support and enrichment functionalities.
- Updated tests to cover new features and ensure comprehensive testing.
2025-12-22 17:42:26 +08:00
catlog22
72f24bf535 feat: 更新版本号至 6.2.4,添加 GPU 加速支持和相关依赖 2025-12-22 14:15:36 +08:00
catlog22
e60d793c8c fix: 修复 SmartSearch 的 ripgrep limit 和 FTS 分词器问题
- Ripgrep 模式: 添加总结果数量限制,防止返回超过 2MB 数据
  - --max-count 只限制每个文件的匹配数,现在在收集结果时应用 limit
  - 达到限制时在 metadata 中添加 warning 提示

- FTS 分词器: 将点号(.)添加到 tokenchars,修复 PortRole.FLOW 等带点号标识符的精确搜索
  - 更新 dir_index.py 和 migration_004_dual_fts.py 中的 tokenize 配置
  - 需要重建索引才能生效

- Exact 模式: 添加 fuzzy 回退,当精确搜索无结果时自动尝试模糊搜索
  - 回退时在 metadata 中标注 fallback: 'fuzzy'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-22 09:50:29 +08:00
catlog22
fc4a9af0cb feat: 引入流式生成器以优化内存使用,改进嵌入生成过程 2025-12-21 23:47:29 +08:00
catlog22
fa64e11a77 refactor: 优化嵌入生成过程,调整批处理大小和内存管理策略 2025-12-21 23:37:34 +08:00
catlog22
210f0f1012 feat: 添加钩子命令,简化 Claude Code 钩子操作接口,支持会话上下文加载和通知功能 2025-12-21 23:28:19 +08:00
catlog22
2871950ab8 fix: 修复向量索引进度显示过早完成的问题
问题:FTS 索引完成后立即显示 100%,但嵌入生成仍在后台运行

修复:
- codex-lens.ts: 将 "Indexed X files" 阶段从 complete 改为 fts_complete (60%)
- codex-lens.ts: 添加嵌入批次和 Finalizing index 阶段解析
- embedding_manager.py: 使用 bulk_insert() 模式延迟 ANN 索引构建
- embedding_manager.py: 添加 "Finalizing index" 进度回调

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 20:55:45 +08:00
catlog22
5849f751bc fix: 修复嵌入生成内存泄漏,优化性能
- HNSW 索引:预分配从 100 万降至 5 万,添加动态扩容和可控保存
- Embedder:添加 embed_to_numpy() 避免 .tolist() 转换,增强缓存清理
- embedding_manager:每 10 批次重建 embedder 实例,显式 gc.collect()
- VectorStore:添加 bulk_insert() 上下文管理器,支持 numpy 批量写入
- Chunker:添加 skip_token_count 轻量模式,使用 char/4 估算(~9x 加速)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 19:15:47 +08:00
catlog22
f492f4839a refactor: 移除 CLI 中过宽的异常捕获
- 移除所有 16 个 except Exception 块
- 只保留对特定异常的捕获 (StorageError, ConfigError, SearchError 等)
- 允许未知异常自然传播,便于调试
- 保留嵌入功能的可选异常处理

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 17:19:54 +08:00
catlog22
fa81793bea refactor: 优化异常处理,使用 cached_property 替代 property,增强代码可读性;添加 RelationshipType 枚举以规范化关系类型 2025-12-21 17:01:49 +08:00
catlog22
6eebdb8898 fix: 修复额外的内存泄露问题
1. hybrid_search.py: 修复 _search_vector 方法中的 SQLite 连接泄露
   - 使用 with 语句包装数据库连接
   - 添加异常处理确保连接正确关闭

2. symbol_extractor.py: 添加上下文管理器支持
   - 实现 __enter__ 和 __exit__ 方法
   - 支持 with 语句自动管理资源

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 16:39:38 +08:00
catlog22
3e9a309079 refactor: 移除图索引功能,修复内存泄露,优化嵌入生成
主要更改:

1. 移除图索引功能 (graph indexing)
   - 删除 graph_analyzer.py 及相关迁移文件
   - 移除 CLI 的 graph 命令和 --enrich 标志
   - 清理 chain_search.py 中的图查询方法 (370行)
   - 删除相关测试文件

2. 修复嵌入生成内存问题
   - 重构 generate_embeddings.py 使用流式批处理
   - 改用 embedding_manager 的内存安全实现
   - 文件从 548 行精简到 259 行 (52.7% 减少)

3. 修复内存泄露
   - chain_search.py: quick_search 使用 with 语句管理 ChainSearchEngine
   - embedding_manager.py: 使用 with 语句管理 VectorStore
   - vector_store.py: 添加暴力搜索内存警告

4. 代码清理
   - 移除 Symbol 模型的 token_count 和 symbol_type 字段
   - 清理相关测试用例

测试: 760 passed, 7 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 16:22:03 +08:00
catlog22
fd4a15c84e fix: improve chunking logic in Chunker class and enhance smart search tool with comprehensive features
- Updated the Chunker class to adjust the window movement logic, ensuring proper handling of overlap lines.
- Introduced a new smart search tool with features including intent classification, CodexLens integration, multi-backend search routing, and index status checking.
- Implemented various search modes (auto, hybrid, exact, ripgrep, priority) with detailed metadata and error handling.
- Added support for progress tracking during index initialization and enhanced output transformation based on user-defined modes.
- Included comprehensive documentation for usage and parameters in the smart search tool.
2025-12-20 21:44:15 +08:00
catlog22
b27d8a9570 feat: Add CLAUDE.md freshness tracking and update reminders
- Add SQLite table and CRUD methods for tracking update history
- Create freshness calculation service based on git file changes
- Add API endpoints for freshness data, marking updates, and history
- Display freshness badges in file tree (green/yellow/red indicators)
- Show freshness gauge and details in metadata panel
- Auto-mark files as updated after CLI sync
- Add English and Chinese i18n translations

Freshness algorithm: 100 - min((changedFilesCount / 20) * 100, 100)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 16:14:46 +08:00
catlog22
e1cac5dd50 Refactor search modes and optimize embedding generation
- Updated the dashboard template to hide the Code Graph Explorer feature.
- Enhanced the `executeCodexLens` function to use `exec` for better cross-platform compatibility and improved command execution.
- Changed the default `maxResults` and `limit` parameters in the smart search tool to 10 for better performance.
- Introduced a new `priority` search mode in the smart search tool, replacing the previous `parallel` mode, which now follows a fallback strategy: hybrid -> exact -> ripgrep.
- Optimized the embedding generation process in the embedding manager by batching operations and using a cached embedder instance to reduce model loading overhead.
- Implemented a thread-safe singleton pattern for the embedder to improve performance across multiple searches.
2025-12-20 11:08:34 +08:00
catlog22
7adde91e9f feat: Add search result grouping by similarity score
Add functionality to group search results with similar content and scores
into a single representative result with additional locations.

Changes:
- Add AdditionalLocation entity model for storing grouped result locations
- Add additional_locations field to SearchResult for backward compatibility
- Implement group_similar_results() function in ranking.py with:
  - Content-based grouping (by excerpt or content field)
  - Score-based sub-grouping with configurable threshold
  - Metadata preservation with grouped_count tracking
- Add group_results and grouping_threshold options to SearchOptions
- Integrate grouping into ChainSearchEngine.search() after RRF fusion

Test coverage:
- 36 multi-level tests covering unit, boundary, integration, and performance
- Real-world scenario tests for RRF scores and duplicate code detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 16:33:44 +08:00
catlog22
3428642d04 feat: Optimize FTS search with batch symbol fetching and improved excerpt generation 2025-12-19 15:35:31 +08:00
catlog22
2f0cce0089 feat: Enhance CodexLens indexing and search capabilities with new CLI options and improved error handling 2025-12-19 15:10:37 +08:00
catlog22
69049e3f45 feat: Return complete method blocks in FTS search results
- Add helper methods to locate match lines and find containing symbols
- Modify search_fts, search_fts_exact, search_fts_fuzzy to return complete
  code blocks (functions/methods/classes) instead of short snippets
- Join with files table to get full content and file_id
- Query symbols table to find the smallest symbol containing the match
- Fall back to context lines when no symbol contains the match
- Add return_full_content and context_lines parameters for flexibility
- Include start_line, end_line, symbol_name, symbol_kind in SearchResult

This improves search result quality by returning semantically meaningful
code blocks rather than arbitrary 20-byte snippets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 11:38:39 +08:00
catlog22
5e91ba6c60 Implement ANN index using HNSW algorithm and update related tests
- Added ANNIndex class for approximate nearest neighbor search using HNSW.
- Integrated ANN index with VectorStore for enhanced search capabilities.
- Updated test suite for ANN index, including tests for adding, searching, saving, and loading vectors.
- Modified existing tests to accommodate changes in search performance expectations.
- Improved error handling for file operations in tests to ensure compatibility with Windows file locks.
- Adjusted hybrid search performance assertions for increased stability in CI environments.
2025-12-19 10:35:29 +08:00
catlog22
17af615fe2 Add help view and core memory styles
- Introduced styles for the help view including tab transitions, accordion animations, search highlighting, and responsive design.
- Implemented core memory styles with modal base styles, memory card designs, and knowledge graph visualization.
- Enhanced dark mode support across various components.
- Added loading states and empty state designs for better user experience.
2025-12-18 18:29:45 +08:00
catlog22
8dd4a513c8 Refactor CLI command usage from ccw cli exec to ccw cli -p for improved prompt handling
- Updated command patterns across documentation and templates to reflect the new CLI syntax.
- Enhanced CLI tool implementation to support reading prompts from files and multi-line inputs.
- Modified core components and views to ensure compatibility with the new command structure.
- Adjusted help messages and internationalization strings to align with the updated command format.
- Improved error handling and user notifications in the CLI execution flow.
2025-12-18 14:12:45 +08:00
catlog22
51a61bef31 Add parallel search mode and index progress bar
Features:
- CCW smart_search: Add 'parallel' mode that runs hybrid + exact + ripgrep
  simultaneously with RRF (Reciprocal Rank Fusion) for result merging
- Dashboard: Add real-time progress bar for CodexLens index initialization
- MCP: Return progress metadata in init action response
- Codex-lens: Auto-detect optimal worker count for parallel indexing

Changes:
- smart-search.ts: Add parallel mode, RRF fusion, progress tracking
- codex-lens.ts: Add onProgress callback support, progress parsing
- codexlens-routes.ts: Broadcast index progress via WebSocket
- codexlens-manager.js: New index progress modal with real-time updates
- notifications.js: Add WebSocket event handler registration system
- i18n.js: Add English/Chinese translations for progress UI
- index_tree.py: Workers parameter now auto-detects CPU count (max 16)
- commands.py: CLI --workers parameter supports auto-detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 23:17:15 +08:00
catlog22
74a830694c Fix CodexLens embeddings generation to achieve 100% coverage
Previously, embeddings were only generated for root directory files (1.6% coverage, 5/303 files).
This fix implements recursive processing across all subdirectory indexes, achieving 100% coverage
with 2,042 semantic chunks across all 303 files in 26 index databases.

Key improvements:

1. **Recursive embeddings generation** (embedding_manager.py):
   - Add generate_embeddings_recursive() to process all _index.db files in directory tree
   - Add get_embeddings_status() for comprehensive coverage statistics
   - Add discover_all_index_dbs() helper for recursive file discovery

2. **Enhanced CLI commands** (commands.py):
   - embeddings-generate: Add --recursive flag for full project coverage
   - init: Use recursive generation by default for complete indexing
   - status: Display embeddings coverage statistics with 50% threshold

3. **Smart search routing improvements** (smart-search.ts):
   - Add 50% embeddings coverage threshold for hybrid mode routing
   - Auto-fallback to exact mode when coverage insufficient
   - Strip ANSI color codes from JSON output for correct parsing
   - Add embeddings_coverage_percent to IndexStatus and SearchMetadata
   - Provide clear warnings with actionable suggestions

4. **Documentation and analysis**:
   - Add SMART_SEARCH_ANALYSIS.md with initial investigation
   - Add SMART_SEARCH_CORRECTED_ANALYSIS.md revealing true extent of issue
   - Add EMBEDDINGS_FIX_SUMMARY.md with complete fix summary
   - Add check_embeddings.py script for coverage verification

Results:
- Coverage improved from 1.6% (5/303 files) to 100% (303/303 files) - 62.5x increase
- Semantic chunks increased from 10 to 2,042 - 204x increase
- All 26 subdirectory indexes now have embeddings vs just 1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 17:54:33 +08:00
catlog22
d06a3ca12e Add comprehensive workflows for CLI tools usage, coding philosophy, context requirements, file modification, and CodexLens auto hybrid mode
- Introduced a detailed guide for intelligent tools selection strategy, including quick reference, tool specifications, prompt templates, and best practices for CLI execution.
- Established a coding philosophy document outlining core beliefs, simplicity principles, and guidelines for effective coding practices.
- Created context requirements documentation emphasizing the importance of understanding existing patterns and dependencies before implementation.
- Developed a file modification workflow detailing the use of edit_file and write_file MCP tools, along with priority logic for file reading and editing.
- Implemented CodexLens auto hybrid mode, enhancing the CLI with automatic vector embedding generation and default hybrid search mode based on embedding availability.
2025-12-17 09:53:30 +08:00