catlog22 df23975a0b Add comprehensive tests for schema cleanup migration and search comparison
- Implement tests for migration 005 to verify removal of deprecated fields in the database schema.
- Ensure that new databases are created with a clean schema.
- Validate that keywords are correctly extracted from the normalized file_keywords table.
- Test symbol insertion without deprecated fields and subdir operations without direct_files.
- Create a detailed search comparison test to evaluate vector search vs hybrid search performance.
- Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality.
- Include a test script to check TreeSitter parser availability and relationship extraction from sample files.
2025-12-16 19:27:05 +08:00


CLI Integration Summary - Embedding Management

Date: 2025-12-16
Version: v0.5.1
Status: Complete


Overview

Completed the integration of embedding management commands into the CodexLens CLI, making vector search more accessible and easier to use. Users no longer need to run standalone scripts: all embedding operations are now available through CLI commands.

What Changed

1. New CLI Commands

codexlens embeddings-generate

Purpose: Generate semantic embeddings for code search

Features:

  • Accepts a project directory or a direct path to an _index.db file
  • Auto-finds the index for a project path using the registry
  • Supports 4 model profiles (fast, code, multilingual, balanced)
  • Force regeneration with the --force flag
  • Configurable chunk size
  • Verbose mode (-v) with progress updates
  • JSON output mode (--json) for scripting
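The path handling described above can be sketched as follows. A direct _index.db path is used as-is, and a project directory is looked up in the registry. This is a minimal illustration, not the code in commands.py. `resolve_index_path` is a hypothetical name, and the registry is modeled as a plain dict from project root to index path because its real format is not described in this document.

```python
from __future__ import annotations

from pathlib import Path


def resolve_index_path(target: str, registry: dict[str, str]) -> Path:
    """Return the index database for a project directory or a direct _index.db path.

    Hypothetical sketch: `registry` stands in for the CodexLens registry and
    maps resolved project roots to index database paths.
    """
    path = Path(target).expanduser()

    # A direct path to an index database is used as-is.
    if path.name == "_index.db":
        return path

    # Otherwise treat the argument as a project root and look it up.
    key = str(path.resolve())
    if key not in registry:
        raise LookupError(f"No index registered for project: {path}")
    return Path(registry[key]).expanduser()
```

Because both forms resolve to the same database file, the project root and the index path can be used interchangeably on the command line.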

Examples:

# Generate embeddings for a project
codexlens embeddings-generate ~/projects/my-app

# Use specific model
codexlens embeddings-generate ~/projects/my-app --model fast

# Force regeneration
codexlens embeddings-generate ~/projects/my-app --force

# Verbose output
codexlens embeddings-generate ~/projects/my-app -v

Output:

Generating embeddings
Index: ~/.codexlens/indexes/my-app/_index.db
Model: code

✓ Embeddings generated successfully!
  Model: jinaai/jina-embeddings-v2-base-code
  Chunks created: 1,234
  Files processed: 89
  Time: 45.2s

Use vector search with:
  codexlens search 'your query' --mode pure-vector
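The summary above (chunks created, files processed, elapsed time) suggests a loop of roughly the following shape. This sketch is hypothetical throughout. The line-based chunker, the `chunk_size` default, the `embed` callable and the progress-callback signature are illustrative stand-ins, not the real generate_embeddings internals. No embedding model is loaded.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def chunk_lines(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size lines (illustrative chunker)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + chunk_size]) for i in range(0, len(lines), chunk_size)]


def generate_embeddings_sketch(
    files: dict[str, str],
    embed: Callable[[list[str]], list[list[float]]],
    chunk_size: int = 50,
    progress: Optional[Callable[[int, int, str], None]] = None,
) -> dict:
    """Chunk each file, embed its chunks, and report progress once per file."""
    start = time.perf_counter()
    vectors: list[tuple[str, list[float]]] = []
    for done, (name, text) in enumerate(files.items(), start=1):
        chunks = chunk_lines(text, chunk_size)
        if chunks:
            # Keep each vector paired with the file it came from.
            vectors.extend((name, v) for v in embed(chunks))
        if progress is not None:
            # A verbose CLI could print a progress line from this hook.
            progress(done, len(files), name)
    return {
        "success": True,
        "result": {
            "chunks_created": len(vectors),
            "files_processed": len(files),
            "elapsed_time": time.perf_counter() - start,
        },
    }
```

Passing the embedder and the callback in as parameters keeps the loop independent of any particular model or UI. This matches the progress-callback design listed under Backend Infrastructure.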

codexlens embeddings-status

Purpose: Check embedding status for indexes

Features:

  • Check all indexes (no arguments)
  • Check a specific project or index
  • Summary table view
  • File coverage statistics
  • Missing files detection
  • JSON output mode (--json)
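The coverage figure reported by embeddings-status is the share of indexed files that have embeddings. For example, 89/89 files gives 100.0%. The following minimal sketch shows that arithmetic and one summary-table row. `coverage_percent` and `format_row` are hypothetical helpers, and the column widths only approximate the table layout shown below.

```python
def coverage_percent(files_with_embeddings: int, total_files: int) -> float:
    """Percentage of indexed files that have at least one embedded chunk."""
    if total_files <= 0:
        return 0.0  # empty index: report 0% rather than dividing by zero
    return 100.0 * files_with_embeddings / total_files


def format_row(project: str, files: int, chunks: int, files_with_embeddings: int) -> str:
    """Render one summary-table row: project, files, chunks, coverage."""
    cov = coverage_percent(files_with_embeddings, files)
    # ',' adds thousands separators (1234 -> 1,234); '.1f' keeps one decimal place.
    return f"{project:<12}{files:>5}{chunks:>9,}{cov:>9.1f}%"
```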

Examples:

# Check all indexes
codexlens embeddings-status

# Check specific project
codexlens embeddings-status ~/projects/my-app

# Check specific index
codexlens embeddings-status ~/.codexlens/indexes/my-app/_index.db

Output (all indexes):

Embedding Status Summary
Index root: ~/.codexlens/indexes

Total indexes: 5
Indexes with embeddings: 3/5
Total chunks: 4,567

Project      Files  Chunks  Coverage  Status
my-app        89    1,234    100.0%      ✓
other-app    145    2,456     95.5%      ✓
test-proj     23      877    100.0%      ✓
no-emb       67        0       0.0%      —
legacy       45        0       0.0%      —
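The header lines of the summary are aggregates over the per-index rows. Three of the five indexes have chunks, and 1,234 + 2,456 + 877 = 4,567 total chunks. The sketch below recomputes those numbers from the example table. `IndexStats` and `summarize` are illustrative names and are not claimed to match get_embedding_stats_summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexStats:
    """Per-index numbers as shown in one row of the summary table."""
    project: str
    total_files: int
    total_chunks: int


def summarize(stats: list[IndexStats]) -> dict[str, int]:
    """Aggregate per-index rows into the header numbers of the summary."""
    return {
        "total_indexes": len(stats),
        # An index counts as "with embeddings" if it holds at least one chunk.
        "indexes_with_embeddings": sum(1 for s in stats if s.total_chunks > 0),
        "total_chunks": sum(s.total_chunks for s in stats),
    }


rows = [
    IndexStats("my-app", 89, 1234),
    IndexStats("other-app", 145, 2456),
    IndexStats("test-proj", 23, 877),
    IndexStats("no-emb", 67, 0),
    IndexStats("legacy", 45, 0),
]
summary = summarize(rows)
print(f"Indexes with embeddings: {summary['indexes_with_embeddings']}/{summary['total_indexes']}")  # 3/5
print(f"Total chunks: {summary['total_chunks']:,}")  # 4,567
```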

Output (specific project):

Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db

✓ Embeddings available
  Total chunks: 1,234
  Total files: 89
  Files with embeddings: 89/89
  Coverage: 100.0%

2. Improved Error Messages

Enhanced error messages throughout the search pipeline to guide users to the new CLI commands:

Before:

DEBUG: No semantic_chunks table found
DEBUG: Vector store is empty

After:

INFO: No embeddings found in index. Generate embeddings with: codexlens embeddings-generate ~/projects/my-app
WARNING: Pure vector search returned no results. This usually means embeddings haven't been generated. Run: codexlens embeddings-generate ~/projects/my-app

Locations Updated:

  • src/codexlens/search/hybrid_search.py - Added helpful info messages
  • src/codexlens/cli/commands.py - Improved error hints in CLI output
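Here is a minimal sketch of the pattern behind these hints. When a pure vector search comes back empty, a warning is logged that includes the exact command to run. The logger name and the `warn_if_no_vector_results` and `generate_hint` helpers are hypothetical. The message text is taken from the example above.

```python
import logging

logger = logging.getLogger("codexlens.search.hybrid_search")  # hypothetical logger name


def generate_hint(project_path: str) -> str:
    """The command users should run to create embeddings for a project."""
    return f"codexlens embeddings-generate {project_path}"


def warn_if_no_vector_results(results: list, project_path: str) -> list:
    """Pass results through, logging an actionable hint when they are empty."""
    if not results:
        # %-style arguments let logging skip formatting if WARNING is disabled.
        logger.warning(
            "Pure vector search returned no results. This usually means "
            "embeddings haven't been generated. Run: %s",
            generate_hint(project_path),
        )
    return results
```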

3. Backend Infrastructure

Created src/codexlens/cli/embedding_manager.py with reusable functions:

Functions:

  • check_index_embeddings(index_path) - Check embedding status
  • generate_embeddings(index_path, ...) - Generate embeddings
  • find_all_indexes(scan_dir) - Find all indexes in directory
  • get_embedding_stats_summary(index_root) - Aggregate stats for all indexes
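Of these functions, find_all_indexes is simple enough to sketch. This version assumes every index is a file named _index.db somewhere under the scan directory, as in ~/.codexlens/indexes/my-app/_index.db. Under that assumption a recursive glob is sufficient. This is an illustration, not the module's actual code.

```python
from __future__ import annotations

from pathlib import Path


def find_all_indexes(scan_dir: str | Path) -> list[Path]:
    """Return every _index.db under scan_dir, sorted for stable output.

    Sketch only: assumes indexes are files named exactly "_index.db".
    """
    root = Path(scan_dir).expanduser()
    if not root.is_dir():
        return []  # a missing index root simply has no indexes
    return sorted(p for p in root.rglob("_index.db") if p.is_file())
```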

Architecture:

  • Follows the same pattern as model_manager.py for consistency
  • Returns standardized result dictionaries of the form {"success": bool, "result": dict}
  • Supports progress callbacks for UI updates
  • Handles error cases gracefully
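The following sketch illustrates the result-dictionary convention with a status check over an index database, using the standard sqlite3 module. The schema is an assumption. The semantic_chunks table name comes from the old debug message above. The files(path) table and the file_path column are hypothetical. The "error" key on failure is this sketch's own choice, not documented behavior.

```python
import sqlite3
from pathlib import Path


def check_index_embeddings(index_path) -> dict:
    """Report embedding coverage for one index database.

    Hypothetical schema: files(path, ...) and semantic_chunks(file_path, ...).
    """
    path = Path(index_path)
    if not path.is_file():
        return {"success": False, "error": f"Index not found: {path}"}

    conn = sqlite3.connect(path)
    try:
        total_files = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
        has_chunks_table = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'semantic_chunks'"
        ).fetchone() is not None
        if has_chunks_table:
            total_chunks, files_with_embeddings = conn.execute(
                "SELECT COUNT(*), COUNT(DISTINCT file_path) FROM semantic_chunks"
            ).fetchone()
        else:
            # No chunks table yet: the index exists but has no embeddings.
            total_chunks = files_with_embeddings = 0
    except sqlite3.Error as exc:
        # Report database problems through the result dict instead of raising.
        return {"success": False, "error": str(exc)}
    finally:
        conn.close()

    return {
        "success": True,
        "result": {
            "has_embeddings": total_chunks > 0,
            "total_chunks": total_chunks,
            "total_files": total_files,
            "files_with_embeddings": files_with_embeddings,
            "coverage_percent": 100.0 * files_with_embeddings / total_files if total_files else 0.0,
        },
    }
```

Callers only need to branch on result["success"], so the CLI can render a table, JSON, or an error hint from the same data.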

4. Documentation Updates

Updated user-facing documentation to reference new CLI commands:

Files Updated:

  1. docs/PURE_VECTOR_SEARCH_GUIDE.md

    • Changed all references from python scripts/generate_embeddings.py to codexlens embeddings-generate
    • Updated troubleshooting section
    • Added new embeddings-status examples
  2. docs/IMPLEMENTATION_SUMMARY.md

    • Marked P1 priorities as complete
    • Added CLI integration to checklist
    • Updated feature list
  3. src/codexlens/cli/commands.py

    • Updated search command help text to reference new commands

Files Created

File                                    Purpose                                 Lines
src/codexlens/cli/embedding_manager.py  Backend logic for embedding operations  ~290
docs/CLI_INTEGRATION_SUMMARY.md         This document                           ~400

Files Modified

File                                   Changes
src/codexlens/cli/commands.py          Added 2 new commands (~270 lines)
src/codexlens/search/hybrid_search.py  Improved error messages (~20 lines)
docs/PURE_VECTOR_SEARCH_GUIDE.md       Updated CLI references (~10 changes)
docs/IMPLEMENTATION_SUMMARY.md         Marked P1 complete (~10 lines)

Testing Workflow

Manual Testing Checklist

  • codexlens embeddings-status with no indexes
  • codexlens embeddings-status with multiple indexes
  • codexlens embeddings-status ~/projects/my-app (project path)
  • codexlens embeddings-status ~/.codexlens/indexes/my-app/_index.db (direct path)
  • codexlens embeddings-generate ~/projects/my-app (first time)
  • codexlens embeddings-generate ~/projects/my-app (embeddings already exist; should error without --force)
  • codexlens embeddings-generate ~/projects/my-app --force (regenerate)
  • codexlens embeddings-generate ~/projects/my-app --model fast
  • codexlens embeddings-generate ~/projects/my-app -v (verbose output)
  • codexlens search "query" --mode pure-vector (with embeddings)
  • codexlens search "query" --mode pure-vector (without embeddings, check error message)
  • codexlens embeddings-status --json (JSON output)
  • codexlens embeddings-generate ~/projects/my-app --json (JSON output)

Expected Test Results

Without embeddings:

$ codexlens embeddings-status ~/projects/my-app
Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db

— No embeddings found
  Total files indexed: 89

Generate embeddings with:
  codexlens embeddings-generate ~/projects/my-app

After generating embeddings:

$ codexlens embeddings-generate ~/projects/my-app
Generating embeddings
Index: ~/.codexlens/indexes/my-app/_index.db
Model: code

✓ Embeddings generated successfully!
  Model: jinaai/jina-embeddings-v2-base-code
  Chunks created: 1,234
  Files processed: 89
  Time: 45.2s

Status after generation:

$ codexlens embeddings-status ~/projects/my-app
Embedding Status
Index: ~/.codexlens/indexes/my-app/_index.db

✓ Embeddings available
  Total chunks: 1,234
  Total files: 89
  Files with embeddings: 89/89
  Coverage: 100.0%

Pure vector search:

$ codexlens search "how to authenticate users" --mode pure-vector
Found 5 results in 12.3ms:

auth/authentication.py:42  [0.876]
  def authenticate_user(username: str, password: str) -> bool:
      '''Verify user credentials against database.'''
      return check_password(username, password)
...

User Experience Improvements

Before                       After
Run separate Python script   Single CLI command
Manual path resolution       Auto-finds project index
No status check              embeddings-status command
Generic error messages       Helpful hints with commands
Script-level documentation   Integrated --help text

Backward Compatibility

  • The standalone scripts/generate_embeddings.py script still works
  • All existing search modes are unchanged
  • The pure vector search implementation remains backward compatible
  • No breaking changes to APIs

Next Steps (Optional)

Possible future enhancements (not yet implemented):

  1. Batch operations:

    codexlens embeddings-generate --all  # Generate for all indexes
    
  2. Incremental updates:

    codexlens embeddings-update ~/projects/my-app  # Only changed files
    
  3. Embedding cleanup:

    codexlens embeddings-delete ~/projects/my-app  # Remove embeddings
    
  4. Model management integration:

    codexlens embeddings-generate ~/projects/my-app --download-model
    

Summary

Completed: Full CLI integration for embedding management
User Experience: Simplified from a multi-step script to a single command
Error Handling: Helpful messages guide users to the correct commands
Documentation: All references updated to the new CLI commands
Testing: Manual testing checklist prepared

Impact: Users can now manage embeddings with intuitive CLI commands instead of running scripts, making vector search more accessible and easier to use.

Command Summary:

codexlens embeddings-status [path]                     # Check status
codexlens embeddings-generate <path> [--model] [--force]  # Generate
codexlens search "query" --mode pure-vector            # Use vector search

The integration is complete and ready for testing.