Files
Claude-Code-Workflow/ccw/check_embeddings.py
catlog22 74a830694c Fix CodexLens embeddings generation to achieve 100% coverage
Previously, embeddings were only generated for root directory files (1.6% coverage, 5/303 files).
This fix implements recursive processing across all subdirectory indexes, achieving 100% coverage
with 2,042 semantic chunks across all 303 files in 26 index databases.

Key improvements:

1. **Recursive embeddings generation** (embedding_manager.py):
   - Add generate_embeddings_recursive() to process all _index.db files in directory tree
   - Add get_embeddings_status() for comprehensive coverage statistics
   - Add discover_all_index_dbs() helper for recursive file discovery

2. **Enhanced CLI commands** (commands.py):
   - embeddings-generate: Add --recursive flag for full project coverage
   - init: Use recursive generation by default for complete indexing
   - status: Display embeddings coverage statistics with 50% threshold

3. **Smart search routing improvements** (smart-search.ts):
   - Add 50% embeddings coverage threshold for hybrid mode routing
   - Auto-fallback to exact mode when coverage insufficient
   - Strip ANSI color codes from JSON output for correct parsing
   - Add embeddings_coverage_percent to IndexStatus and SearchMetadata
   - Provide clear warnings with actionable suggestions

4. **Documentation and analysis**:
   - Add SMART_SEARCH_ANALYSIS.md with initial investigation
   - Add SMART_SEARCH_CORRECTED_ANALYSIS.md revealing true extent of issue
   - Add EMBEDDINGS_FIX_SUMMARY.md with complete fix summary
   - Add check_embeddings.py script for coverage verification

Results:
- Coverage improved from 1.6% (5/303 files) to 100% (303/303 files) - 62.5x increase
- Semantic chunks increased from 10 to 2,042 - 204x increase
- All 26 subdirectory indexes now have embeddings vs just 1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 17:54:33 +08:00

1.6 KiB