Commit Graph

133 Commits

Author SHA1 Message Date
catlog22
8a15e08944 feat: 增强索引树构建逻辑,支持递归检查子目录中的可索引文件 2026-01-13 11:08:48 +08:00
catlog22
8c2d39d517 feat: 添加配置选项以调整重排序模型的权重和测试文件惩罚,增强语义搜索功能 2026-01-13 10:44:26 +08:00
catlog22
57173c9b02 feat: 优化动态批量大小计算,确保使用所有解析规则的最大字符限制,并调整利用率因子的安全范围 2026-01-12 17:47:19 +08:00
catlog22
90a1321aac feat: 添加动态批量大小计算,优化嵌入管理和配置系统 2026-01-12 17:34:37 +08:00
catlog22
b77672dda4 feat: 增强模型下载功能,支持 HuggingFace Hub 直接下载 ONNX 格式模型 2026-01-11 18:22:36 +08:00
catlog22
1e91fa9f9e feat: Add custom model download functionality and enhance model management
- Implemented `model-download-custom` command to download HuggingFace models.
- Added support for discovering manually placed models in the cache.
- Enhanced the model list view to display recommended and discovered models separately.
- Introduced JSON editor for direct configuration mode in API settings.
- Added validation and formatting features for JSON input.
- Updated translations for new API settings and common actions.
- Improved user interface for model management, including action buttons and tooltips.
2026-01-11 15:13:11 +08:00
catlog22
09d99abee6 Issue Queue: issue-exec-20260106-160325 (#52)
* feat(security): Secure dashboard server by default

## Solution Summary
- Solution-ID: SOL-DSC-002-1
- Issue-ID: DSC-002

## Tasks Completed
- [T1] JWT token manager (24h expiry, persisted secret/token)
- [T2] API auth middleware + localhost token endpoint
- [T3] Default bind 127.0.0.1, add --host with warning
- [T4] Localhost-only CORS with credentials + Vary
- [T5] SECURITY.md documentation + README link

## Verification
- npm run build
- npm test -- ccw/tests/token-manager.test.ts ccw/tests/middleware.test.ts ccw/tests/server-auth.integration.test.ts ccw/tests/server.test.ts ccw/tests/cors.test.ts

* fix(security): Prevent command injection in Windows spawn()

## Solution Summary
- **Solution-ID**: SOL-DSC-001-1
- **Issue-ID**: DSC-001
- **Risk/Impact/Complexity**: high/high/medium

## Tasks Completed
- [T1] Create Windows shell escape utility
- [T2] Escape cli-executor spawn() args on Windows
- [T3] Add command injection regression tests

## Files Modified
- ccw/src/utils/shell-escape.ts
- ccw/src/tools/cli-executor.ts
- ccw/tests/shell-escape.test.ts
- ccw/tests/security/command-injection.test.ts

## Verification
- npm run build
- npm test -- ccw/tests/shell-escape.test.ts ccw/tests/security/command-injection.test.ts

* fix(security): Harden path validation (DSC-005)

## Solution Summary
- Solution-ID: SOL-DSC-005-1
- Issue-ID: DSC-005

## Tasks Completed
- T1: Refactor path validation to pre-resolution checking
- T2: Implement allowlist-based path validation
- T3: Add path validation to API routes
- T4: Add path security regression tests

## Files Modified
- ccw/src/utils/path-resolver.ts
- ccw/src/utils/path-validator.ts
- ccw/src/core/routes/graph-routes.ts
- ccw/src/core/routes/files-routes.ts
- ccw/src/core/routes/skills-routes.ts
- ccw/tests/path-resolver.test.ts
- ccw/tests/graph-routes.test.ts
- ccw/tests/files-routes.test.ts
- ccw/tests/skills-routes.test.ts
- ccw/tests/security/path-traversal.test.ts

## Verification
- npm run build
- npm test -- path-resolver.test.ts
- npm test -- path-validator.test.ts
- npm test -- graph-routes.test.ts
- npm test -- files-routes.test.ts
- npm test -- skills-routes.test.ts
- npm test -- ccw/tests/security/path-traversal.test.ts

* fix(security): Prevent credential leakage (DSC-004)

## Solution Summary
- Solution-ID: SOL-DSC-004-1
- Issue-ID: DSC-004

## Tasks Completed
- T1: Create credential handling security tests
- T2: Add log sanitization tests
- T3: Add env var leakage prevention tests
- T4: Add secure storage tests

## Files Modified
- ccw/src/config/litellm-api-config-manager.ts
- ccw/src/core/routes/litellm-api-routes.ts
- ccw/tests/security/credential-handling.test.ts

## Verification
- npm run build
- node --experimental-strip-types --test ccw/tests/security/credential-handling.test.ts

* test(ranking): expand normalize_weights edge case coverage (ISS-1766920108814-0)

## Solution Summary
- Solution-ID: SOL-20251228113607
- Issue-ID: ISS-1766920108814-0

## Tasks Completed
- T1: Fix NaN and invalid total handling in normalize_weights
- T2: Add unit tests for NaN edge cases in normalize_weights

## Files Modified
- codex-lens/tests/test_rrf_fusion.py

## Verification
- python -m pytest codex-lens/tests/test_rrf_fusion.py::TestNormalizeBM25Score -v
- python -m pytest codex-lens/tests/test_rrf_fusion.py -v -k normalize
- python -m pytest codex-lens/tests/test_rrf_fusion.py::TestReciprocalRankFusion::test_weight_normalization codex-lens/tests/test_cli_hybrid_search.py::TestCLIHybridSearch::test_weights_normalization -v

* feat(security): Add CSRF protection and tighten CORS (DSC-006)

## Solution Summary
- Solution-ID: SOL-DSC-006-1
- Issue-ID: DSC-006
- Risk/Impact/Complexity: high/high/medium

## Tasks Completed
- T1: Create CSRF token generation system
- T2: Add CSRF token endpoints
- T3: Implement CSRF validation middleware
- T4: Restrict CORS to trusted origins
- T5: Add CSRF security tests

## Files Modified
- ccw/src/core/auth/csrf-manager.ts
- ccw/src/core/auth/csrf-middleware.ts
- ccw/src/core/routes/auth-routes.ts
- ccw/src/core/server.ts
- ccw/tests/csrf-manager.test.ts
- ccw/tests/auth-routes.test.ts
- ccw/tests/csrf-middleware.test.ts
- ccw/tests/security/csrf.test.ts

## Verification
- npm run build
- node --experimental-strip-types --test ccw/tests/csrf-manager.test.ts
- node --experimental-strip-types --test ccw/tests/auth-routes.test.ts
- node --experimental-strip-types --test ccw/tests/csrf-middleware.test.ts
- node --experimental-strip-types --test ccw/tests/cors.test.ts
- node --experimental-strip-types --test ccw/tests/security/csrf.test.ts

* fix(cli-executor): prevent stale SIGKILL timeouts

## Solution Summary

- Solution-ID: SOL-DSC-007-1

- Issue-ID: DSC-007

- Risk/Impact/Complexity: low/low/low

## Tasks Completed

- [T1] Store timeout handle in killCurrentCliProcess

## Files Modified

- ccw/src/tools/cli-executor.ts

- ccw/tests/cli-executor-kill.test.ts

## Verification

- node --experimental-strip-types --test ccw/tests/cli-executor-kill.test.ts

* fix(cli-executor): enhance merge validation guards

## Solution Summary

- Solution-ID: SOL-DSC-008-1

- Issue-ID: DSC-008

- Risk/Impact/Complexity: low/low/low

## Tasks Completed

- [T1] Enhance sourceConversations array validation

## Files Modified

- ccw/src/tools/cli-executor.ts

- ccw/tests/cli-executor-merge-validation.test.ts

## Verification

- node --experimental-strip-types --test ccw/tests/cli-executor-merge-validation.test.ts

* refactor(core): remove @ts-nocheck from core routes

## Solution Summary
- Solution-ID: SOL-DSC-003-1
- Issue-ID: DSC-003
- Queue-ID: QUE-20260106-164500
- Item-ID: S-9

## Tasks Completed
- T1: Create shared RouteContext type definition
- T2: Remove @ts-nocheck from small route files
- T3: Remove @ts-nocheck from medium route files
- T4: Remove @ts-nocheck from large route files
- T5: Remove @ts-nocheck from remaining core files

## Files Modified
- ccw/src/core/dashboard-generator-patch.ts
- ccw/src/core/dashboard-generator.ts
- ccw/src/core/routes/ccw-routes.ts
- ccw/src/core/routes/claude-routes.ts
- ccw/src/core/routes/cli-routes.ts
- ccw/src/core/routes/codexlens-routes.ts
- ccw/src/core/routes/discovery-routes.ts
- ccw/src/core/routes/files-routes.ts
- ccw/src/core/routes/graph-routes.ts
- ccw/src/core/routes/help-routes.ts
- ccw/src/core/routes/hooks-routes.ts
- ccw/src/core/routes/issue-routes.ts
- ccw/src/core/routes/litellm-api-routes.ts
- ccw/src/core/routes/litellm-routes.ts
- ccw/src/core/routes/mcp-routes.ts
- ccw/src/core/routes/mcp-routes.ts.backup
- ccw/src/core/routes/mcp-templates-db.ts
- ccw/src/core/routes/nav-status-routes.ts
- ccw/src/core/routes/rules-routes.ts
- ccw/src/core/routes/session-routes.ts
- ccw/src/core/routes/skills-routes.ts
- ccw/src/core/routes/status-routes.ts
- ccw/src/core/routes/system-routes.ts
- ccw/src/core/routes/types.ts
- ccw/src/core/server.ts
- ccw/src/core/websocket.ts

## Verification
- npm run build
- npm test

* refactor: split cli-executor and codexlens routes into modules

## Solution Summary
- Solution-ID: SOL-DSC-012-1
- Issue-ID: DSC-012
- Risk/Impact/Complexity: medium/medium/high

## Tasks Completed
- [T1] Extract execution orchestration from cli-executor.ts (Refactor ccw/src/tools)
- [T2] Extract route handlers from codexlens-routes.ts (Refactor ccw/src/core/routes)
- [T3] Extract prompt concatenation logic from cli-executor (Refactor ccw/src/tools)
- [T4] Document refactored module architecture (Docs)

## Files Modified
- ccw/src/tools/cli-executor.ts
- ccw/src/tools/cli-executor-core.ts
- ccw/src/tools/cli-executor-utils.ts
- ccw/src/tools/cli-executor-state.ts
- ccw/src/tools/cli-prompt-builder.ts
- ccw/src/tools/README.md
- ccw/src/core/routes/codexlens-routes.ts
- ccw/src/core/routes/codexlens/config-handlers.ts
- ccw/src/core/routes/codexlens/index-handlers.ts
- ccw/src/core/routes/codexlens/semantic-handlers.ts
- ccw/src/core/routes/codexlens/watcher-handlers.ts
- ccw/src/core/routes/codexlens/utils.ts
- ccw/src/core/routes/codexlens/README.md

## Verification
- npm run build
- npm test

* test(issue): Add comprehensive issue command tests

## Solution Summary
- **Solution-ID**: SOL-DSC-009-1
- **Issue-ID**: DSC-009
- **Risk/Impact/Complexity**: low/high/medium

## Tasks Completed
- [T1] Create issue command test file structure: Create isolated test harness
- [T2] Add JSONL read/write operation tests: Verify JSONL correctness and errors
- [T3] Add issue lifecycle tests: Verify status transitions and timestamps
- [T4] Add solution binding tests: Verify binding flows and error cases
- [T5] Add queue formation tests: Verify queue creation, IDs, and DAG behavior
- [T6] Add queue execution tests: Verify next/done/retry and status sync

## Files Modified
- ccw/src/commands/issue.ts
- ccw/tests/issue-command.test.ts

## Verification
- node --experimental-strip-types --test ccw/tests/issue-command.test.ts

* test(routes): Add integration tests for route modules

## Solution Summary
- Solution-ID: SOL-DSC-010-1
- Issue-ID: DSC-010
- Queue-ID: QUE-20260106-164500

## Tasks Completed
- [T1] Add tests for ccw-routes.ts
- [T2] Add tests for files-routes.ts
- [T3] Add tests for claude-routes.ts (includes Windows path fix for create)
- [T4] Add tests for issue-routes.ts
- [T5] Add tests for help-routes.ts (avoid hanging watchers)
- [T6] Add tests for nav-status-routes.ts
- [T7] Add tests for hooks/graph/rules/skills/litellm-api routes

## Files Modified
- ccw/src/core/routes/claude-routes.ts
- ccw/src/core/routes/help-routes.ts
- ccw/tests/integration/ccw-routes.test.ts
- ccw/tests/integration/claude-routes.test.ts
- ccw/tests/integration/files-routes.test.ts
- ccw/tests/integration/issue-routes.test.ts
- ccw/tests/integration/help-routes.test.ts
- ccw/tests/integration/nav-status-routes.test.ts
- ccw/tests/integration/hooks-routes.test.ts
- ccw/tests/integration/graph-routes.test.ts
- ccw/tests/integration/rules-routes.test.ts
- ccw/tests/integration/skills-routes.test.ts
- ccw/tests/integration/litellm-api-routes.test.ts

## Verification
- node --experimental-strip-types --test ccw/tests/integration/ccw-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/files-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/claude-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/issue-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/help-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/nav-status-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/hooks-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/graph-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/rules-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/skills-routes.test.ts
- node --experimental-strip-types --test ccw/tests/integration/litellm-api-routes.test.ts

* refactor(core): Switch cache and lite scanning to async fs

## Solution Summary
- Solution-ID: SOL-DSC-013-1
- Issue-ID: DSC-013
- Queue-ID: QUE-20260106-164500

## Tasks Completed
- [T1] Convert cache-manager.ts to async file operations
- [T2] Convert lite-scanner.ts to async file operations
- [T3] Update cache-manager call sites to await async API
- [T4] Update lite-scanner call sites to await async API

## Files Modified
- ccw/src/core/cache-manager.ts
- ccw/src/core/lite-scanner.ts
- ccw/src/core/data-aggregator.ts

## Verification
- npm run build
- npm test

* fix(exec): Add timeout protection for execSync

## Solution Summary
- Solution-ID: SOL-DSC-014-1
- Issue-ID: DSC-014
- Queue-ID: QUE-20260106-164500

## Tasks Completed
- [T1] Add timeout to execSync calls in python-utils.ts
- [T2] Add timeout to execSync calls in detect-changed-modules.ts
- [T3] Add timeout to execSync calls in claude-freshness.ts
- [T4] Add timeout to execSync calls in issue.ts
- [T5] Consolidate execSync timeout constants and audit coverage

## Files Modified
- ccw/src/utils/exec-constants.ts
- ccw/src/utils/python-utils.ts
- ccw/src/tools/detect-changed-modules.ts
- ccw/src/core/claude-freshness.ts
- ccw/src/commands/issue.ts
- ccw/src/tools/smart-search.ts
- ccw/src/tools/codex-lens.ts
- ccw/src/core/routes/codexlens/config-handlers.ts

## Verification
- npm run build
- npm test
- node --experimental-strip-types --test ccw/tests/issue-command.test.ts

* feat(cli): Add progress spinner with elapsed time for long-running operations

## Solution Summary
- Solution-ID: SOL-DSC-015-1
- Issue-ID: DSC-015
- Queue-Item: S-15
- Risk/Impact/Complexity: low/medium/low

## Tasks Completed
- [T1] Add progress spinner to CLI execution: Update ccw/src/commands/cli.ts

## Files Modified
- ccw/src/commands/cli.ts
- ccw/tests/cli-command.test.ts

## Verification
- node --experimental-strip-types --test ccw/tests/cli-command.test.ts
- node --experimental-strip-types --test ccw/tests/cli-executor-kill.test.ts
- node --experimental-strip-types --test ccw/tests/cli-executor-merge-validation.test.ts

* fix(cli): Move full output hint immediately after truncation notice

## Solution Summary
- Solution-ID: SOL-DSC-016-1
- Issue-ID: DSC-016
- Queue-Item: S-16
- Risk/Impact/Complexity: low/high/low

## Tasks Completed
- [T1] Relocate output hint after truncation: Update ccw/src/commands/cli.ts

## Files Modified
- ccw/src/commands/cli.ts
- ccw/tests/cli-command.test.ts

## Verification
- npm run build
- node --experimental-strip-types --test ccw/tests/cli-command.test.ts

* feat(cli): Add confirmation prompts for destructive operations

## Solution Summary
- Solution-ID: SOL-DSC-017-1
- Issue-ID: DSC-017
- Queue-Item: S-17
- Risk/Impact/Complexity: low/high/low

## Tasks Completed
- [T1] Add confirmation to storage clean operations: Update ccw/src/commands/cli.ts
- [T2] Add confirmation to issue queue delete: Update ccw/src/commands/issue.ts

## Files Modified
- ccw/src/commands/cli.ts
- ccw/src/commands/issue.ts
- ccw/tests/cli-command.test.ts
- ccw/tests/issue-command.test.ts

## Verification
- npm run build
- node --experimental-strip-types --test ccw/tests/cli-command.test.ts
- node --experimental-strip-types --test ccw/tests/issue-command.test.ts

* feat(cli): Improve multi-line prompt guidance

## Solution Summary
- Solution-ID: SOL-DSC-018-1
- Issue-ID: DSC-018
- Queue-Item: S-18
- Risk/Impact/Complexity: low/medium/low

## Tasks Completed
- [T1] Update CLI help to emphasize --file option: Update ccw/src/commands/cli.ts
- [T2] Add inline hint for multi-line detection: Update ccw/src/commands/cli.ts

## Files Modified
- ccw/src/commands/cli.ts
- ccw/tests/cli-command.test.ts

## Verification
- npm run build
- node --experimental-strip-types --test ccw/tests/cli-command.test.ts

---------

Co-authored-by: catlog22 <catlog22@github.com>
2026-01-07 22:35:46 +08:00
catlog22
fae2f7e279 feat: 始终注册队列变更回调以支持标准输出(TypeScript 后端) 2026-01-07 22:21:11 +08:00
catlog22
05514631f2 feat: Enhance JSON streaming parsing and UI updates
- Added a function to parse JSON streaming content in core-memory.js, extracting readable text from messages.
- Updated memory detail view to utilize the new parsing function for content and summary.
- Introduced an enableReview option in rules-manager.js, allowing users to toggle review functionality in rule creation.
- Simplified skill creation modal in skills-manager.js by removing generation type selection UI.
- Improved CLI executor to handle tool calls for file writing, ensuring proper output parsing.
- Adjusted CLI command tests to set timeout to 0 for immediate execution.
- Updated file watcher to implement a true debounce mechanism and added a pending queue status for UI updates.
- Enhanced watcher manager to handle queue changes and provide JSON output for better integration with TypeScript backend.
- Established TypeScript naming conventions documentation to standardize code style across the project.
2026-01-07 21:51:26 +08:00
catlog22
42fbc1936d feat: 更新执行命令的参数提示,支持指定现有工作树路径,增强工作树管理功能 2026-01-07 16:54:23 +08:00
catlog22
87d38a3374 feat: 添加重排序模型配置,支持最大输入令牌数,优化 API 批处理能力 2026-01-07 15:50:22 +08:00
catlog22
6aa79c6dc9 feat: 添加工作空间索引状态接口,增强 CodexLens 状态检查功能,支持前端显示索引信息 2026-01-07 11:36:06 +08:00
catlog22
1bd3d9c9bf feat: 移除文档语言配置,优化代码语言分类 2026-01-07 10:10:25 +08:00
catlog22
86d3e36722 feat: 增强解决方案管理功能,支持按解决方案 ID 过滤和简要输出,优化嵌入模型配置读取 2026-01-07 09:31:52 +08:00
catlog22
1298fdd20f feat: 增加搜索功能的代码过滤选项,支持排除特定文件扩展名和仅返回代码文件 2026-01-06 23:19:47 +08:00
catlog22
ef770ff29b Add comprehensive code review specifications and templates
- Introduced best practices requirements specification covering code quality, performance, maintainability, error handling, and documentation standards.
- Established quality standards with overall quality metrics and mandatory checks for security, code quality, performance, and maintainability.
- Created security requirements specification aligned with OWASP Top 10 and CWE Top 25, detailing checks and patterns for common vulnerabilities.
- Developed templates for documenting best practice findings, security findings, and generating reports, including structured markdown and JSON formats.
- Updated dependencies in the project, ensuring compatibility and stability.
- Added test files and README documentation for vector indexing tests.
2026-01-06 23:11:15 +08:00
catlog22
1451594ae6 feat: Add user action prompt after issue discovery and enhance environment variable support for embedding and reranker configurations 2026-01-05 23:58:23 +08:00
catlog22
2e90230097 feat: Update import path for TextCrossEncoder to support fastembed versioning and add fallback for older versions 2026-01-05 23:13:52 +08:00
catlog22
f90c6b9fab feat: Enhance CodexLens uninstallation process with improved error handling and process termination for locked files 2026-01-05 22:40:26 +08:00
catlog22
853977c676 feat: Add reranker model management commands and UI integration
- Implemented CLI commands for listing, downloading, deleting, and retrieving information about reranker models.
- Enhanced the dashboard UI to support embedding and reranker configurations with internationalization.
- Updated environment variable management for embedding and reranker settings.
- Added functionality to dynamically update model options based on selected backend.
- Improved user experience with status indicators and action buttons for model management.
- Integrated new reranker models with detailed metadata and recommendations.
2026-01-05 21:23:09 +08:00
catlog22
f4585c8dea feat: enhance reranker and embedding configuration management with settings.json support 2026-01-05 17:21:34 +08:00
catlog22
504ccfebbc feat: add reranker models to ProviderCredential and improve FastEmbedReranker scoring
- Added `rerankerModels` property to the `ProviderCredential` interface in `litellm-api-config.ts` to support additional reranker configurations.
- Introduced a numerically stable sigmoid function in `FastEmbedReranker` for score normalization.
- Updated the scoring logic in `FastEmbedReranker` to use raw float scores from the encoder and normalize them using the new sigmoid function.
- Adjusted the result mapping to maintain original document order while applying normalization.
2026-01-03 22:20:06 +08:00
catlog22
0af84be775 feat(model-lock): implement model lock management with localStorage support 2026-01-03 19:48:07 +08:00
catlog22
be498acf59 feat: Add code analysis and LLM action templates with detailed configurations and examples
- Introduced a comprehensive code analysis action template for integrating code exploration and analysis capabilities.
- Added LLM action template for seamless integration of LLM calls with customizable prompts and tools.
- Implemented a benchmark search script to compare multiple search methods across various dimensions including speed, result quality, ranking stability, and coverage.
- Provided preset configurations for common analysis tasks and LLM actions, enhancing usability and flexibility.
2026-01-03 17:37:49 +08:00
catlog22
9922d455da feat: Add templates for autonomous actions, orchestrators, sequential phases, and skill documentation
- Introduced a comprehensive template for autonomous actions, detailing structure, execution, and error handling.
- Added an orchestrator template to manage state and decision logic for autonomous actions.
- Created a sequential phase template to outline execution steps and objectives for structured workflows.
- Developed a skill documentation template to standardize the generation of skill entry files.
- Implemented a Python script to compare search results between hybrid and cascade methods, analyzing ranking changes.
2026-01-03 15:58:31 +08:00
catlog22
bab5625123 feat: 添加全局环境变量加载功能并更新配置说明 2026-01-03 15:14:45 +08:00
catlog22
713894090d feat(codexlens): Improve search defaults and add explicit SPLADE mode
Config changes:
- Disable SPLADE by default (slow ~360ms), use FTS instead
- Enable use_fts_fallback by default for faster sparse search

CLI improvements:
- Fix duplicate index_app typer definition
- Add cascade_search dispatch for cascade method
- Rename 'mode' to 'method' in search output
- Mark embeddings-status, splade-status as deprecated
- Add enable_splade and enable_cascade to search options

Hybrid search:
- Add enable_splade parameter for explicit SPLADE mode
- Add fallback handling when SPLADE is requested but unavailable
2026-01-03 11:49:58 +08:00
catlog22
740bd1b61e fix(codexlens): Fix constructor and path handling issues
1. GlobalSymbolIndex constructor: Add project_id parameter lookup
   - Get project_id from registry using source_root
   - Pass project_id to GlobalSymbolIndex constructor

2. Binary cascade search path handling:
   - Add VectorMetadataStore import for centralized search
   - Fix _build_results_from_candidates to handle centralized mode
   - Use VectorMetadataStore for metadata, source_index_db for embeddings
   - Properly distinguish between index_root and index_path

3. Dense reranking for centralized search:
   - Get chunk metadata from _vectors_meta.db
   - Group chunks by source_index_db
   - Retrieve dense embeddings from respective _index.db files
2026-01-03 11:47:07 +08:00
catlog22
54fd94547c feat: Enhance embedding generation and search capabilities
- Added pre-calculation of estimated chunk count for HNSW capacity in `generate_dense_embeddings_centralized` to optimize indexing performance.
- Implemented binary vector generation with memory-mapped storage for efficient cascade search, including metadata saving.
- Introduced SPLADE sparse index generation with improved handling and metadata storage.
- Updated `ChainSearchEngine` to prefer centralized binary searcher for improved performance and added fallback to legacy binary index.
- Deprecated `BinaryANNIndex` in favor of `BinarySearcher` for better memory management and performance.
- Enhanced `SpladeEncoder` with warmup functionality to reduce latency spikes during first-time inference.
- Improved `SpladeIndex` with cache size adjustments for better query performance.
- Added methods for managing binary vectors in `VectorMetadataStore`, including batch insertion and retrieval.
- Created a new `BinarySearcher` class for efficient binary vector search using Hamming distance, supporting both memory-mapped and database loading modes.
2026-01-02 23:57:55 +08:00
catlog22
96b44e1482 feat: Add type validation for RRF weights and implement caching for embedder instances 2026-01-02 19:50:51 +08:00
catlog22
c268b531aa feat: Enhance embedding generation to track current index path and improve metadata retrieval 2026-01-02 19:18:26 +08:00
catlog22
0b6e9db8e4 feat: Add centralized vector storage and metadata management for embeddings 2026-01-02 17:18:23 +08:00
catlog22
9157c5c78b feat: Implement centralized storage for SPLADE and vector embeddings
- Added centralized SPLADE database and vector storage configuration in config.py.
- Updated embedding_manager.py to support centralized SPLADE database path.
- Enhanced generate_embeddings and generate_embeddings_recursive functions for centralized storage.
- Introduced centralized ANN index creation in ann_index.py.
- Modified hybrid_search.py to utilize centralized vector index for searches.
- Implemented methods to discover and manage centralized SPLADE and HNSW files.
2026-01-02 16:53:39 +08:00
catlog22
54fb7afdb2 Enhance semantic search capabilities and configuration
- Added category support for programming and documentation languages in Config.
- Implemented category-based filtering in HybridSearchEngine to improve search relevance based on query intent.
- Introduced functions for filtering results by category and determining file categories based on extensions.
- Updated VectorStore to include a category column in the database schema and modified chunk addition methods to support category tagging.
- Enhanced the WatcherConfig to ignore additional common directories and files.
- Created a benchmark script to compare performance between Binary Cascade, SPLADE, and Vector semantic search methods, including detailed result analysis and overlap comparison.
2026-01-02 15:01:20 +08:00
catlog22
92ed2524b7 feat: Enhance SPLADE indexing command to support multiple index databases and add chunk ID management 2026-01-02 13:25:23 +08:00
catlog22
56c03c847a feat: Add method to retrieve all semantic chunks from the vector store
- Implemented `get_all_chunks` method in `VectorStore` class to fetch all semantic chunks from the database.
- Added a new benchmark script `analyze_methods.py` for analyzing hybrid search methods and storage architecture.
- Included detailed analysis of method contributions, storage conflicts, and FTS + Rerank fusion experiments.
- Updated results JSON structure to reflect new analysis outputs and method performance metrics.
2026-01-02 12:32:43 +08:00
catlog22
9129c981a4 feat: Enhance BinaryANNIndex with vectorized search and performance benchmarking 2026-01-02 11:49:54 +08:00
catlog22
da68ba0b82 feat: Implement cascade indexing command and benchmark script for performance evaluation 2026-01-02 11:24:06 +08:00
catlog22
e21d801523 feat: Add multi-type embedding backends for cascade retrieval
- Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors.
- Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking.
- Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval.
- Introduced utility functions for embedding conversion and distance computation.

chore: Migration 010 - Add multi-vector storage support

- Added 'chunks' table to support multi-vector embeddings for cascade retrieval.
- Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage.
- Implemented upgrade and downgrade functions to manage schema changes and data migration.
2026-01-02 10:52:43 +08:00
catlog22
195438d26a feat(splade): add cache directory support for ONNX models and improve thread-local database connection handling 2026-01-01 22:40:00 +08:00
catlog22
5bb01755bc Implement SPLADE sparse encoder and associated database migrations
- Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing.
- Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points.
- Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database.
- Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval.
- Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.
2026-01-01 17:41:22 +08:00
catlog22
520f2d26f2 feat(codex-lens): add unified reranker architecture and file watcher
Unified Reranker Architecture:
- Add BaseReranker ABC with factory pattern
- Implement 4 backends: ONNX (default), API, LiteLLM, Legacy
- Add .env configuration parsing for API credentials
- Migrate from sentence-transformers to optimum+onnxruntime

File Watcher Module:
- Add real-time file system monitoring with watchdog
- Implement IncrementalIndexer for single-file updates
- Add WatcherManager with signal handling and graceful shutdown
- Add 'codexlens watch' CLI command
- Event filtering, debouncing, and deduplication
- Thread-safe design with proper resource cleanup

Tests: 16 watcher tests + 5 reranker test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 13:23:52 +08:00
catlog22
31a45f1f30 Add graph expansion and cross-encoder reranking features
- Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors.
- Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring.
- Created migrations to establish necessary database tables for relationships and graph neighbors.
- Developed tests for graph expansion functionality, ensuring related results are populated correctly.
- Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead.
- Updated schema cleanup tests to reflect changes in versioning and deprecated fields.
- Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.
2025-12-31 16:58:59 +08:00
catlog22
70f8b14eaa refactor(vector_store): use safer SQL query construction pattern
Replaces f-string interpolation with safer string formatting.
Adds documentation on SQL injection prevention.
No functional changes - parameterized queries still used.

Fixes: ISS-1766921318981-9

Solution-ID: SOL-1735386000-9
Issue-ID: ISS-1766921318981-9
Task-ID: T1
2025-12-29 20:09:49 +08:00
catlog22
0c8b2f2ec9 fix(vector_store): add bounds checking for chunk ID generation
Prevents potential integer overflow when start_id is near sys.maxsize.
Adds validation before range() calculation in batch insert methods.

Fixes: ISS-1766921318981-6

Solution-ID: SOL-1735386000-6
Issue-ID: ISS-1766921318981-6
Task-ID: T1
2025-12-29 20:02:19 +08:00
catlog22
c56104c082 fix(vector_store): add null check for ANN search results before filtering
Prevents errors when HNSW search returns null/empty results due to race conditions.
Adds validation for ids and distances before zip operation.

Fixes: ISS-1766921318981-5

Solution-ID: SOL-1735386000-5
Issue-ID: ISS-1766921318981-5
Task-ID: T1
2025-12-29 19:53:32 +08:00
catlog22
7f4433e449 fix(vector_store): add parameter validation for min_score range
Validates min_score is within [0.0, 1.0] for cosine similarity.
Raises ValueError for out-of-range values to prevent unexpected filtering.

Fixes: ISS-1766921318981-14

Solution-ID: SOL-1735386000-14
Issue-ID: ISS-1766921318981-14
Task-ID: T1
2025-12-29 19:46:26 +08:00
catlog22
60fbb4177c fix(config): add specific exception handling for path operations
Replaces generic Exception handling with specific PermissionError and OSError
handling in __post_init__ and ensure_runtime_dirs(). Provides clear diagnostic
messages to distinguish permission issues from other filesystem errors.

Solution-ID: SOL-1735385400008
Issue-ID: ISS-1766921318981-8
Task-ID: T1
2025-12-29 19:34:27 +08:00
catlog22
5914b1c5fc fix(vector-store): protect bulk insert mode transitions with lock
Ensure begin_bulk_insert() and end_bulk_insert() are fully
lock-protected to prevent TOCTOU race conditions.

Solution-ID: SOL-1735392000003
Issue-ID: ISS-1766921318981-12
Task-ID: T2
2025-12-29 19:20:02 +08:00
catlog22
d8be23fa83 fix(vector-store): add lock protection for bulk insert mode flag
Protect _bulk_insert_mode flag and accumulation lists with
_ann_write_lock to prevent corruption during concurrent access.

Solution-ID: SOL-1735392000003
Issue-ID: ISS-1766921318981-12
Task-ID: T1
2025-12-29 19:16:30 +08:00