Claude-Code-Workflow/ccw/scripts/README-memory-embedder.md
catlog22 1dfabf6bda fix: resolve CodexLens installation issues by correcting package name and improving local path detection
- Updated package name from `codexlens` to `codex-lens` in all relevant files to ensure consistency with `pyproject.toml`.
- Enhanced `findLocalPackagePath()` to always search for local paths, even when running from `node_modules`.
- Removed fallback logic for PyPI installation in several functions, providing clearer error messages for local installation failures.
- Added detailed documentation on installation steps and error handling for local development packages.
- Introduced a new summary document outlining the issues and fixes related to CodexLens installation.
2026-01-21 15:32:41 +08:00

Memory Embedder

Bridge CCW to CodexLens semantic search by generating and searching embeddings for memory chunks.

Features

  • Generate embeddings for memory chunks using CodexLens's jina-embeddings-v2-base-code model (768-dimensional vectors)
  • Semantic search across all memory types (core_memory, workflow, cli_history)
  • Status tracking to monitor embedding progress
  • Batch processing for efficient embedding generation
  • Restore commands included in search results

Requirements

pip install numpy "codex-lens[semantic]"

Usage

1. Check Status

python scripts/memory_embedder.py status <db_path>

Example output:

{
  "total_chunks": 150,
  "embedded_chunks": 100,
  "pending_chunks": 50,
  "by_type": {
    "core_memory": {"total": 80, "embedded": 60, "pending": 20},
    "workflow": {"total": 50, "embedded": 30, "pending": 20},
    "cli_history": {"total": 20, "embedded": 10, "pending": 10}
  }
}
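
As a rough illustration of consuming this output, the Python sketch below turns the status JSON into completion percentages. The helper name `embedding_progress` is hypothetical and not part of the script; only the field names are taken from the example output above.

```python
import json

def embedding_progress(status: dict) -> dict:
    """Return the percentage of chunks embedded, overall and per source type."""
    def pct(embedded: int, total: int) -> float:
        # Treat an empty set as fully embedded to avoid dividing by zero.
        return 100.0 if total == 0 else round(100.0 * embedded / total, 1)

    progress = {"overall": pct(status["embedded_chunks"], status["total_chunks"])}
    for source_type, counts in status.get("by_type", {}).items():
        progress[source_type] = pct(counts["embedded"], counts["total"])
    return progress

# The example status output from above, as a JSON string.
raw = """
{
  "total_chunks": 150,
  "embedded_chunks": 100,
  "pending_chunks": 50,
  "by_type": {
    "core_memory": {"total": 80, "embedded": 60, "pending": 20},
    "workflow": {"total": 50, "embedded": 30, "pending": 20},
    "cli_history": {"total": 20, "embedded": 10, "pending": 10}
  }
}
"""
print(embedding_progress(json.loads(raw)))
```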

2. Generate Embeddings

Embed all unembedded chunks:

python scripts/memory_embedder.py embed <db_path>

Embed a specific source:

python scripts/memory_embedder.py embed <db_path> --source-id CMEM-20250101-120000

Re-embed all chunks (force):

python scripts/memory_embedder.py embed <db_path> --force

Adjust batch size (default 8):

python scripts/memory_embedder.py embed <db_path> --batch-size 16

Example output:

{
  "success": true,
  "chunks_processed": 50,
  "chunks_failed": 0,
  "elapsed_time": 12.34
}
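
The embed options above can be combined in a single call. The following Python sketch builds the argument list and parses the JSON result. The helper names `build_embed_args` and `run_embed` are hypothetical, and the sketch assumes the script prints only the JSON document to stdout and that it is run from the directory containing `scripts/`.

```python
import json
import subprocess
from typing import List, Optional

def build_embed_args(db_path: str, source_id: Optional[str] = None,
                     force: bool = False,
                     batch_size: Optional[int] = None) -> List[str]:
    """Build the argv for the embed command using the flags documented above."""
    args = ["python", "scripts/memory_embedder.py", "embed", db_path]
    if source_id is not None:
        args += ["--source-id", source_id]
    if force:
        args.append("--force")
    if batch_size is not None:
        args += ["--batch-size", str(batch_size)]
    return args

def run_embed(db_path: str, **options) -> dict:
    """Run the embed command and parse its output (assumes JSON-only stdout)."""
    completed = subprocess.run(build_embed_args(db_path, **options),
                               capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)

# Show the command that would be run for a forced re-embed with larger batches.
print(build_embed_args("core_memory.db", force=True, batch_size=16))
```

Passing the arguments as a list avoids shell quoting problems when the database path contains spaces.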

3. Semantic Search

Basic search:

python scripts/memory_embedder.py search <db_path> "authentication flow"

Advanced search:

python scripts/memory_embedder.py search <db_path> "rate limiting" \
  --top-k 5 \
  --min-score 0.5 \
  --type workflow

Example output:

{
  "success": true,
  "matches": [
    {
      "source_id": "WFS-20250101-auth",
      "source_type": "workflow",
      "chunk_index": 2,
      "content": "Implemented JWT-based authentication...",
      "score": 0.8542,
      "restore_command": "ccw session resume WFS-20250101-auth"
    }
  ]
}
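
A caller will usually want the restore commands of the best matches. The Python sketch below extracts them from the search output, ordered by score. The function name `restore_commands` is hypothetical, and treating a missing or false `success` field as a failed search is an assumption, since the failure format is not documented here.

```python
import json
from typing import List

def restore_commands(search_output: str, min_score: float = 0.0) -> List[str]:
    """Return restore commands for matches at or above min_score, best first."""
    result = json.loads(search_output)
    # Assumption: a failed search reports "success": false (or omits the field).
    if not result.get("success"):
        return []
    ranked = sorted(result.get("matches", []),
                    key=lambda match: match["score"], reverse=True)
    return [m["restore_command"] for m in ranked if m["score"] >= min_score]

# The example search output from above, as a JSON string.
example = """
{
  "success": true,
  "matches": [
    {
      "source_id": "WFS-20250101-auth",
      "source_type": "workflow",
      "chunk_index": 2,
      "content": "Implemented JWT-based authentication...",
      "score": 0.8542,
      "restore_command": "ccw session resume WFS-20250101-auth"
    }
  ]
}
"""
print(restore_commands(example, min_score=0.5))
```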

Database Path

The database is located in CCW's storage directory:

  • Windows: %USERPROFILE%\.ccw\projects\<project-id>\core-memory\core_memory.db
  • Linux/Mac: ~/.ccw/projects/<project-id>/core-memory/core_memory.db

Find your project's database:

ccw memory list  # Shows project path
# Then look in: ~/.ccw/projects/<hashed-path>/core-memory/core_memory.db
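
For scripting, the path layout above can be assembled with `pathlib`. This is a minimal Python sketch: the function name `core_memory_db_path` is hypothetical, and the project identifier still has to be looked up as described above.

```python
from pathlib import Path
from typing import Optional

def core_memory_db_path(project_id: str, home: Optional[Path] = None) -> Path:
    """Build the documented database path for a given project identifier."""
    # Path.home() resolves to %USERPROFILE% on Windows and ~ on Linux/Mac.
    base = home if home is not None else Path.home()
    return base / ".ccw" / "projects" / project_id / "core-memory" / "core_memory.db"

# "<project-id>" is a placeholder; substitute the identifier of your project.
print(core_memory_db_path("<project-id>"))
```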

Integration with CCW

This script is designed to be called from CCW's TypeScript code:

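import { execFileSync } from 'child_process';

// Placeholder inputs for illustration; substitute your own path and query.
const dbPath = '/path/to/core_memory.db';
const query = 'authentication flow';

// Embed chunks. Arguments are passed as an array so that paths with spaces
// and queries containing quotes are not interpreted by a shell.
const result = execFileSync(
  'python',
  ['scripts/memory_embedder.py', 'embed', dbPath],
  { encoding: 'utf-8' }
);
const { success, chunks_processed } = JSON.parse(result);

// Search
const searchResult = execFileSync(
  'python',
  ['scripts/memory_embedder.py', 'search', dbPath, query, '--top-k', '10'],
  { encoding: 'utf-8' }
);
const { matches } = JSON.parse(searchResult);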

Performance

  • Embedding speed: ~8 chunks/second (batch size 8)
  • Search speed: ~0.1-0.5 seconds for 1000 chunks
  • Model loading: ~0.8 seconds (cached after first use)
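
These figures allow a back-of-the-envelope estimate of how long an embed run might take. The Python sketch below uses the approximate numbers listed above as defaults. The function name is hypothetical, and real throughput will vary with hardware and chunk length.

```python
def estimate_embed_seconds(pending_chunks: int,
                           chunks_per_second: float = 8.0,
                           model_load_seconds: float = 0.8) -> float:
    """Rough run-time estimate: model load plus chunks divided by throughput."""
    if pending_chunks < 0:
        raise ValueError("pending_chunks must be non-negative")
    return model_load_seconds + pending_chunks / chunks_per_second

# Estimate for 50 pending chunks at the approximate rates listed above.
print(f"{estimate_embed_seconds(50):.2f} seconds")
```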

Source Types

  • core_memory: Strategic architectural context
  • workflow: Session-based development history
  • cli_history: Command execution logs

Restore Commands

Each search result includes a restore command that depends on the source type:

  • core_memory/cli_history: ccw memory export <source_id>
  • workflow: ccw session resume <source_id>
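
The mapping above can be expressed as a small lookup. This Python sketch mirrors the two documented command forms. The function name `restore_command` is hypothetical and is not taken from the script itself.

```python
def restore_command(source_type: str, source_id: str) -> str:
    """Return the restore command documented for the given source type."""
    if source_type == "workflow":
        return f"ccw session resume {source_id}"
    if source_type in ("core_memory", "cli_history"):
        return f"ccw memory export {source_id}"
    raise ValueError(f"unknown source type: {source_type}")

print(restore_command("workflow", "WFS-20250101-auth"))
print(restore_command("core_memory", "CMEM-20250101-120000"))
```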