Commit Graph

15 Commits

Author SHA1 Message Date
catlog22
e21d801523 feat: Add multi-type embedding backends for cascade retrieval
- Implemented BinaryEmbeddingBackend for fast coarse filtering using 256-dimensional binary vectors.
- Developed DenseEmbeddingBackend for high-precision dense vectors (2048 dimensions) for reranking.
- Created CascadeEmbeddingBackend to combine binary and dense embeddings for two-stage retrieval.
- Introduced utility functions for embedding conversion and distance computation.

chore: Migration 010 - Add multi-vector storage support

- Added 'chunks' table to support multi-vector embeddings for cascade retrieval.
- Included new columns: embedding_binary (256-dim) and embedding_dense (2048-dim) for efficient storage.
- Implemented upgrade and downgrade functions to manage schema changes and data migration.
2026-01-02 10:52:43 +08:00
catlog22
5bb01755bc Implement SPLADE sparse encoder and associated database migrations
- Added `splade_encoder.py` for ONNX-optimized SPLADE encoding, including methods for encoding text and batch processing.
- Created `SPLADE_IMPLEMENTATION.md` to document the SPLADE encoder's functionality, design patterns, and integration points.
- Introduced migration script `migration_009_add_splade.py` to add SPLADE metadata and posting list tables to the database.
- Developed `splade_index.py` for managing the SPLADE inverted index, supporting efficient sparse vector retrieval.
- Added verification script `verify_watcher.py` to test FileWatcher event filtering and debouncing functionality.
2026-01-01 17:41:22 +08:00
catlog22
520f2d26f2 feat(codex-lens): add unified reranker architecture and file watcher
Unified Reranker Architecture:
- Add BaseReranker ABC with factory pattern
- Implement 4 backends: ONNX (default), API, LiteLLM, Legacy
- Add .env configuration parsing for API credentials
- Migrate from sentence-transformers to optimum+onnxruntime

File Watcher Module:
- Add real-time file system monitoring with watchdog
- Implement IncrementalIndexer for single-file updates
- Add WatcherManager with signal handling and graceful shutdown
- Add 'codexlens watch' CLI command
- Event filtering, debouncing, and deduplication
- Thread-safe design with proper resource cleanup

Tests: 16 watcher tests + 5 reranker test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 13:23:52 +08:00
catlog22
31a45f1f30 Add graph expansion and cross-encoder reranking features
- Implemented GraphExpander to enhance search results with related symbols using precomputed neighbors.
- Added CrossEncoderReranker for second-stage search ranking, allowing for improved result scoring.
- Created migrations to establish necessary database tables for relationships and graph neighbors.
- Developed tests for graph expansion functionality, ensuring related results are populated correctly.
- Enhanced performance benchmarks for cross-encoder reranking latency and graph expansion overhead.
- Updated schema cleanup tests to reflect changes in versioning and deprecated fields.
- Added new test cases for Treesitter parser to validate relationship extraction with alias resolution.
2025-12-31 16:58:59 +08:00
catlog22
60fbb4177c fix(config): add specific exception handling for path operations
Replaces generic Exception handling with specific PermissionError and OSError
handling in __post_init__ and ensure_runtime_dirs(). Provides clear diagnostic
messages to distinguish permission issues from other filesystem errors.

Solution-ID: SOL-1735385400008
Issue-ID: ISS-1766921318981-8
Task-ID: T1
2025-12-29 19:34:27 +08:00
catlog22
93dcdd2293 fix(config): log configuration loading errors instead of silently ignoring
Replaces bare exception handler in load_settings() with logging.warning()
to help users debug configuration file issues (syntax errors, permissions).
Maintains backward compatibility - errors do not break initialization.

Solution-ID: SOL-1735385400001
Issue-ID: ISS-1766921318981-1
Task-ID: T1
2025-12-28 21:06:23 +08:00
catlog22
4061ae48c4 feat: Implement adaptive RRF weights and query intent detection
- Added integration tests for adaptive RRF weights in hybrid search.
- Enhanced query intent detection with new classifications: keyword, semantic, and mixed.
- Introduced symbol boosting in search results based on explicit symbol matches.
- Implemented embedding-based reranking with configurable options.
- Added global symbol index for efficient symbol lookups across projects.
- Improved file deletion handling on Windows to avoid permission errors.
- Updated chunk configuration to increase overlap for better context.
- Modified package.json test script to target specific test files.
- Created comprehensive writing style guidelines for documentation.
- Added TypeScript tests for query intent detection and adaptive weights.
- Established performance benchmarks for global symbol indexing.
2025-12-26 15:08:47 +08:00
catlog22
3b842ed290 feat(cli-executor): add streaming option and enhance output handling
- Introduced a `stream` parameter to control output streaming vs. caching.
- Enhanced status determination logic to prioritize valid output over exit codes.
- Updated output structure to include full stdout and stderr when not streaming.

feat(cli-history-store): extend conversation turn schema and migration

- Added `cached`, `stdout_full`, and `stderr_full` fields to the conversation turn schema.
- Implemented database migration to add new columns if they do not exist.
- Updated upsert logic to handle new fields.

feat(codex-lens): implement global symbol index for fast lookups

- Created `GlobalSymbolIndex` class to manage project-wide symbol indexing.
- Added methods for adding, updating, and deleting symbols in the global index.
- Integrated global index updates into directory indexing processes.

feat(codex-lens): optimize search functionality with global index

- Enhanced `ChainSearchEngine` to utilize the global symbol index for faster searches.
- Added configuration option to enable/disable global symbol indexing.
- Updated tests to validate global index functionality and performance.
2025-12-25 22:22:31 +08:00
catlog22
40e61b30d6 feat: 添加多端点支持和负载均衡功能,增强 LiteLLM 嵌入管理 2025-12-25 11:01:08 +08:00
catlog22
e671b45948 feat: Enhance configuration management and embedding capabilities
- Added JSON-based settings management in Config class for embedding and LLM configurations.
- Introduced methods to save and load settings from a JSON file.
- Updated BaseEmbedder and its subclasses to include max_tokens property for better token management.
- Enhanced chunking strategy to support recursive splitting of large symbols with improved overlap handling.
- Implemented comprehensive tests for recursive splitting and chunking behavior.
- Added CLI tools configuration management for better integration with external tools.
- Introduced a new command for compacting session memory into structured text for recovery.
2025-12-24 16:32:27 +08:00
catlog22
fa81793bea refactor: 优化异常处理,使用 cached_property 替代 property,增强代码可读性;添加 RelationshipType 枚举以规范化关系类型 2025-12-21 17:01:49 +08:00
catlog22
35485bbbb1 feat: Enhance navigation and cleanup for graph explorer view
- Added a cleanup function to reset the state when navigating away from the graph explorer.
- Updated navigation logic to call the cleanup function before switching views.
- Improved internationalization by adding new translations for graph-related terms.
- Adjusted icon sizes for better UI consistency in the graph explorer.
- Implemented impact analysis button functionality in the graph explorer.
- Refactored CLI tool configuration to use updated model names.
- Enhanced CLI executor to handle prompts correctly for codex commands.
- Introduced code relationship storage for better visualization in the index tree.
- Added support for parsing Markdown and plain text files in the symbol parser.
- Updated tests to reflect changes in language detection logic.
2025-12-15 23:11:01 +08:00
catlog22
0fe16963cd Add comprehensive tests for tokenizer, performance benchmarks, and TreeSitter parser functionality
- Implemented unit tests for the Tokenizer class, covering various text inputs, edge cases, and fallback mechanisms.
- Created performance benchmarks comparing tiktoken and pure Python implementations for token counting.
- Developed extensive tests for TreeSitterSymbolParser across Python, JavaScript, and TypeScript, ensuring accurate symbol extraction and parsing.
- Added configuration documentation for MCP integration and custom prompts, enhancing usability and flexibility.
- Introduced a refactor script for GraphAnalyzer to streamline future improvements.
2025-12-15 14:36:09 +08:00
catlog22
79a2953862 Add comprehensive tests for vector/semantic search functionality
- Implement full coverage tests for Embedder model loading and embedding generation
- Add CRUD operations and caching tests for VectorStore
- Include cosine similarity computation tests
- Validate semantic search accuracy and relevance through various queries
- Establish performance benchmarks for embedding and search operations
- Ensure edge cases and error handling are covered
- Test thread safety and concurrent access scenarios
- Verify availability of semantic search dependencies
2025-12-14 17:17:09 +08:00
catlog22
a393601ec5 feat(codexlens): add CodexLens code indexing platform with incremental updates
- Add CodexLens Python package with SQLite FTS5 search and tree-sitter parsing
- Implement workspace-local index storage (.codexlens/ directory)
- Add incremental update CLI command for efficient file-level index refresh
- Integrate CodexLens with CCW tools (codex_lens action: update)
- Add CodexLens Auto-Sync hook template for automatic index updates on file changes
- Add CodexLens status card in CCW Dashboard CLI Manager with install/init buttons
- Add server APIs: /api/codexlens/status, /api/codexlens/bootstrap, /api/codexlens/init

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 15:02:32 +08:00