# Memory Embedder Bridge CCW to CodexLens semantic search by generating and searching embeddings for memory chunks. ## Features - **Generate embeddings** for memory chunks using CodexLens's jina-embeddings-v2-base-code (768 dim) - **Semantic search** across all memory types (core_memory, workflow, cli_history) - **Status tracking** to monitor embedding progress - **Batch processing** for efficient embedding generation - **Restore commands** included in search results ## Requirements ```bash pip install numpy codexlens[semantic] ``` ## Usage ### 1. Check Status ```bash python scripts/memory_embedder.py status ``` Example output: ```json { "total_chunks": 150, "embedded_chunks": 100, "pending_chunks": 50, "by_type": { "core_memory": {"total": 80, "embedded": 60, "pending": 20}, "workflow": {"total": 50, "embedded": 30, "pending": 20}, "cli_history": {"total": 20, "embedded": 10, "pending": 10} } } ``` ### 2. Generate Embeddings Embed all unembedded chunks: ```bash python scripts/memory_embedder.py embed ``` Embed specific source: ```bash python scripts/memory_embedder.py embed --source-id CMEM-20250101-120000 ``` Re-embed all chunks (force): ```bash python scripts/memory_embedder.py embed --force ``` Adjust batch size (default 8): ```bash python scripts/memory_embedder.py embed --batch-size 16 ``` Example output: ```json { "success": true, "chunks_processed": 50, "chunks_failed": 0, "elapsed_time": 12.34 } ``` ### 3. Semantic Search Basic search: ```bash python scripts/memory_embedder.py search "authentication flow" ``` Advanced search: ```bash python scripts/memory_embedder.py search "rate limiting" \ --top-k 5 \ --min-score 0.5 \ --type workflow ``` Example output: ```json { "success": true, "matches": [ { "source_id": "WFS-20250101-auth", "source_type": "workflow", "chunk_index": 2, "content": "Implemented JWT-based authentication...", "score": 0.8542, "restore_command": "ccw session resume WFS-20250101-auth" } ] } ``` ## Database Path The database is located in CCW's storage directory: - **Windows**: `%USERPROFILE%\.ccw\projects\\core-memory\core_memory.db` - **Linux/Mac**: `~/.ccw/projects//core-memory/core_memory.db` Find your project's database: ```bash ccw memory list # Shows project path # Then look in: ~/.ccw/projects//core-memory/core_memory.db ``` ## Integration with CCW This script is designed to be called from CCW's TypeScript code: ```typescript import { execSync } from 'child_process'; // Embed chunks const result = execSync( `python scripts/memory_embedder.py embed ${dbPath}`, { encoding: 'utf-8' } ); const { success, chunks_processed } = JSON.parse(result); // Search const searchResult = execSync( `python scripts/memory_embedder.py search ${dbPath} "${query}" --top-k 10`, { encoding: 'utf-8' } ); const { matches } = JSON.parse(searchResult); ``` ## Performance - **Embedding speed**: ~8 chunks/second (batch size 8) - **Search speed**: ~0.1-0.5 seconds for 1000 chunks - **Model loading**: ~0.8 seconds (cached after first use) ## Source Types - `core_memory`: Strategic architectural context - `workflow`: Session-based development history - `cli_history`: Command execution logs ## Restore Commands Search results include restore commands: - **core_memory/cli_history**: `ccw memory export ` - **workflow**: `ccw session resume `