CLI Integration Summary - Embedding Management
Date: 2025-12-16
Version: v0.5.1
Status: ✅ Complete
Overview
Embedding management commands are now integrated into the CodexLens CLI, making vector search functionality more accessible and user-friendly. Users no longer need to run standalone scripts: all embedding operations are available through simple CLI commands.
What Changed
1. New CLI Commands
codexlens embeddings-generate
Purpose: Generate semantic embeddings for code search
Features:
- Accepts project directory or direct _index.db path
- Auto-finds index for project paths using registry
- Supports 4 model profiles (fast, code, multilingual, balanced)
- Force regeneration with --force flag
- Configurable chunk size
- Verbose mode with progress updates
- JSON output mode for scripting
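The path handling described above (a direct _index.db path versus a project directory looked up in a registry) could be sketched roughly as follows. This is only an illustration: `resolve_index_path` and the dictionary-based `registry` are hypothetical names, not the actual CodexLens registry API.

```python
from pathlib import Path
from typing import Dict, Optional

INDEX_DB_NAME = "_index.db"  # file name taken from the examples in this document


def resolve_index_path(target: Path, registry: Dict[Path, Path]) -> Optional[Path]:
    """Return the index database for `target`, or None if it cannot be found.

    `registry` is a hypothetical stand-in for the project registry: it maps a
    project directory to the directory that holds its index.
    """
    target = target.expanduser()
    # Case 1: the caller passed the index database itself.
    if target.name == INDEX_DB_NAME:
        return target if target.is_file() else None
    # Case 2: the caller passed a project directory; look it up in the registry.
    index_dir = registry.get(target.resolve())
    if index_dir is None:
        return None
    candidate = index_dir / INDEX_DB_NAME
    return candidate if candidate.is_file() else None
```

Returning None rather than raising leaves it to the caller to decide how to report a missing index, for example by printing a hint to index the project first.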
Examples:
# Generate embeddings for a project
codexlens embeddings-generate ~/projects/my-app
# Use specific model
codexlens embeddings-generate ~/projects/my-app --model fast
# Force regeneration
codexlens embeddings-generate ~/projects/my-app --force
# Verbose output
codexlens embeddings-generate ~/projects/my-app -v
Output:
Generating embeddings
Index: ~/.codexlens/indexes/my-app/_index.db
Model: code
✓ Embeddings generated successfully!
Model: jinaai/jina-embeddings-v2-base-code
Chunks created: 1,234
Files processed: 89
Time: 45.2s
Use vector search with:
codexlens search 'your query' --mode pure-vector
codexlens embeddings-status
Purpose: Check embedding status for indexes
Features:
- Check all indexes (no arguments)
- Check specific project or index
- Summary table view
- File coverage statistics
- Missing files detection
- JSON output mode
Examples:
# Check all indexes
codexlens embeddings-status
# Check specific project
codexlens embeddings-status ~/projects/my-app
# Check specific index
codexlens embeddings-status ~/.codexlens/indexes/my-app/_index.db
Output (all indexes):
Embedding Status Summary
Index root: ~/.codexlens/indexes
Total indexes: 5
Indexes with embeddings: 3/5
Total chunks: 4,567
Project     Files   Chunks   Coverage   Status
my-app         89    1,234     100.0%   ✓
other-app     145    2,456      95.5%   ✓
test-proj      23      877     100.0%   ✓
no-emb         67        0       0.0%   —
legacy         45        0       0.0%   —
Output (specific project):
Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db
✓ Embeddings available
Total chunks: 1,234
Total files: 89
Files with embeddings: 89/89
Coverage: 100.0%
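The coverage and missing-file figures in the status output can be thought of as a simple set comparison between indexed files and files that have at least one embedding chunk. A minimal sketch, assuming both sets are already available; the function name and the keys of the returned dictionary are illustrative and not taken from the CodexLens source:

```python
from typing import Dict, List, Set


def embedding_coverage(indexed_files: Set[str], embedded_files: Set[str]) -> Dict[str, object]:
    """Summarise how many indexed files have at least one embedding chunk."""
    covered = indexed_files & embedded_files
    missing: List[str] = sorted(indexed_files - embedded_files)
    total = len(indexed_files)
    # Guard against an empty index to avoid dividing by zero.
    percent = 100.0 * len(covered) / total if total else 0.0
    return {
        "total_files": total,
        "files_with_embeddings": len(covered),
        "coverage_percent": round(percent, 1),
        "missing_files": missing,
    }
```

With 89 indexed files that all have embeddings, this yields a coverage of 100.0, matching the sample output above.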
2. Improved Error Messages
Enhanced error messages throughout the search pipeline to guide users to the new CLI commands:
Before:
DEBUG: No semantic_chunks table found
DEBUG: Vector store is empty
After:
INFO: No embeddings found in index. Generate embeddings with: codexlens embeddings-generate ~/projects/my-app
WARNING: Pure vector search returned no results. This usually means embeddings haven't been generated. Run: codexlens embeddings-generate ~/projects/my-app
Locations Updated:
- src/codexlens/search/hybrid_search.py - Added helpful info messages
- src/codexlens/cli/commands.py - Improved error hints in CLI output
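One way such a hint might be produced is to check whether the semantic_chunks table exists and holds any rows before attempting vector search. The sketch below uses only the standard sqlite3 module. The function name `embeddings_hint` is hypothetical, and the real check in hybrid_search.py may differ.

```python
import sqlite3
from typing import Optional


def embeddings_hint(conn: sqlite3.Connection, project_path: str) -> Optional[str]:
    """Return a user-facing hint if the index has no usable embeddings."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'semantic_chunks'"
    ).fetchone()
    if row is not None:
        count = conn.execute("SELECT COUNT(*) FROM semantic_chunks").fetchone()[0]
        if count > 0:
            return None  # embeddings exist, nothing to report
    # Either the table is missing or it is empty: point the user at the CLI.
    return (
        "No embeddings found in index. Generate embeddings with: "
        f"codexlens embeddings-generate {project_path}"
    )
```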
3. Backend Infrastructure
Created src/codexlens/cli/embedding_manager.py with reusable functions:
Functions:
- check_index_embeddings(index_path) - Check embedding status
- generate_embeddings(index_path, ...) - Generate embeddings
- find_all_indexes(scan_dir) - Find all indexes in directory
- get_embedding_stats_summary(index_root) - Aggregate stats for all indexes
Architecture:
- Follows same pattern as model_manager.py for consistency
- Returns standardized result dictionaries {"success": bool, "result": dict}
- Supports progress callbacks for UI updates
- Handles all error cases gracefully
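The result-dictionary and progress-callback conventions listed above might look roughly like this in practice. This is a sketch, not the contents of embedding_manager.py: `run_embedding_job`, the `embed_file` parameter, the keys inside "result", and the "error" key on failure are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

ProgressCallback = Callable[[str], None]


def run_embedding_job(
    files: Iterable[str],
    embed_file: Callable[[str], int],
    progress: Optional[ProgressCallback] = None,
) -> Dict[str, object]:
    """Process files and report the outcome as a standardized dictionary."""
    chunks = 0
    processed = 0
    try:
        for path in files:
            chunks += embed_file(path)  # hypothetical: returns chunks created
            processed += 1
            if progress is not None:
                progress(f"Processed {path}")
    except Exception as exc:  # report failures instead of raising to the CLI
        return {"success": False, "error": str(exc)}
    return {
        "success": True,
        "result": {"chunks_created": chunks, "files_processed": processed},
    }
```

A CLI layer could pass `print` (or a progress-bar update function) as `progress` in verbose mode and serialize the returned dictionary directly for JSON output.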
4. Documentation Updates
Updated user-facing documentation to reference new CLI commands:
Files Updated:
- docs/PURE_VECTOR_SEARCH_GUIDE.md
  - Changed all references from python scripts/generate_embeddings.py to codexlens embeddings-generate
  - Updated troubleshooting section
  - Added new embeddings-status examples
- docs/IMPLEMENTATION_SUMMARY.md
  - Marked P1 priorities as complete
  - Added CLI integration to checklist
  - Updated feature list
- src/codexlens/cli/commands.py
  - Updated search command help text to reference new commands
Files Created
| File | Purpose | Lines |
|---|---|---|
| src/codexlens/cli/embedding_manager.py | Backend logic for embedding operations | ~290 |
| docs/CLI_INTEGRATION_SUMMARY.md | This document | ~400 |
Files Modified
| File | Changes |
|---|---|
| src/codexlens/cli/commands.py | Added 2 new commands (~270 lines) |
| src/codexlens/search/hybrid_search.py | Improved error messages (~20 lines) |
| docs/PURE_VECTOR_SEARCH_GUIDE.md | Updated CLI references (~10 changes) |
| docs/IMPLEMENTATION_SUMMARY.md | Marked P1 complete (~10 lines) |
Testing Workflow
Manual Testing Checklist
- codexlens embeddings-status with no indexes
- codexlens embeddings-status with multiple indexes
- codexlens embeddings-status ~/projects/my-app (project path)
- codexlens embeddings-status ~/.codexlens/indexes/my-app/_index.db (direct path)
- codexlens embeddings-generate ~/projects/my-app (first time)
- codexlens embeddings-generate ~/projects/my-app (already exists, should error)
- codexlens embeddings-generate ~/projects/my-app --force (regenerate)
- codexlens embeddings-generate ~/projects/my-app --model fast
- codexlens embeddings-generate ~/projects/my-app -v (verbose output)
- codexlens search "query" --mode pure-vector (with embeddings)
- codexlens search "query" --mode pure-vector (without embeddings, check error message)
- codexlens embeddings-status --json (JSON output)
- codexlens embeddings-generate ~/projects/my-app --json (JSON output)
Expected Test Results
Without embeddings:
$ codexlens embeddings-status ~/projects/my-app
Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db
— No embeddings found
Total files indexed: 89
Generate embeddings with:
codexlens embeddings-generate ~/projects/my-app
After generating embeddings:
$ codexlens embeddings-generate ~/projects/my-app
Generating embeddings
Index: ~/.codexlens/indexes/my-app/_index.db
Model: code
✓ Embeddings generated successfully!
Model: jinaai/jina-embeddings-v2-base-code
Chunks created: 1,234
Files processed: 89
Time: 45.2s
Status after generation:
$ codexlens embeddings-status ~/projects/my-app
Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db
✓ Embeddings available
Total chunks: 1,234
Total files: 89
Files with embeddings: 89/89
Coverage: 100.0%
Pure vector search:
$ codexlens search "how to authenticate users" --mode pure-vector
Found 5 results in 12.3ms:
auth/authentication.py:42 [0.876]
def authenticate_user(username: str, password: str) -> bool:
'''Verify user credentials against database.'''
return check_password(username, password)
...
User Experience Improvements
| Before | After |
|---|---|
| Run separate Python script | Single CLI command |
| Manual path resolution | Auto-finds project index |
| No status check | embeddings-status command |
| Generic error messages | Helpful hints with commands |
| Script-level documentation | Integrated --help text |
Backward Compatibility
- ✅ Standalone script scripts/generate_embeddings.py still works
- ✅ All existing search modes unchanged
- ✅ Pure vector implementation backward compatible
- ✅ No breaking changes to APIs
Next Steps (Optional)
Future enhancements users might want:
- Batch operations:
  codexlens embeddings-generate --all  # Generate for all indexes
- Incremental updates:
  codexlens embeddings-update ~/projects/my-app  # Only changed files
- Embedding cleanup:
  codexlens embeddings-delete ~/projects/my-app  # Remove embeddings
- Model management integration:
  codexlens embeddings-generate ~/projects/my-app --download-model
Summary
- ✅ Completed: Full CLI integration for embedding management
- ✅ User Experience: Simplified from multi-step script to single command
- ✅ Error Handling: Helpful messages guide users to correct commands
- ✅ Documentation: All references updated to new CLI commands
- ✅ Testing: Manual testing checklist prepared
Impact: Users can now manage embeddings with intuitive CLI commands instead of running scripts, making vector search more accessible and easier to use.
Command Summary:
codexlens embeddings-status [path] # Check status
codexlens embeddings-generate <path> [--model] [--force] # Generate
codexlens search "query" --mode pure-vector # Use vector search
The integration is complete and ready for testing.