mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-06 01:54:11 +08:00
- Updated package name from `codexlens` to `codex-lens` in all relevant files to ensure consistency with `pyproject.toml`. - Enhanced `findLocalPackagePath()` to always search for local paths, even when running from `node_modules`. - Removed fallback logic for PyPI installation in several functions, providing clearer error messages for local installation failures. - Added detailed documentation on installation steps and error handling for local development packages. - Introduced a new summary document outlining the issues and fixes related to CodexLens installation.
158 lines
3.5 KiB
Markdown
158 lines
3.5 KiB
Markdown
# Memory Embedder
|
|
|
|
Bridge CCW to CodexLens semantic search by generating and searching embeddings for memory chunks.
|
|
|
|
## Features
|
|
|
|
- **Generate embeddings** for memory chunks using CodexLens's jina-embeddings-v2-base-code (768 dim)
|
|
- **Semantic search** across all memory types (core_memory, workflow, cli_history)
|
|
- **Status tracking** to monitor embedding progress
|
|
- **Batch processing** for efficient embedding generation
|
|
- **Restore commands** included in search results
|
|
|
|
## Requirements
|
|
|
|
```bash
|
|
pip install numpy codex-lens[semantic]
|
|
```
|
|
|
|
## Usage
|
|
|
|
### 1. Check Status
|
|
|
|
```bash
|
|
python scripts/memory_embedder.py status <db_path>
|
|
```
|
|
|
|
Example output:
|
|
```json
|
|
{
|
|
"total_chunks": 150,
|
|
"embedded_chunks": 100,
|
|
"pending_chunks": 50,
|
|
"by_type": {
|
|
"core_memory": {"total": 80, "embedded": 60, "pending": 20},
|
|
"workflow": {"total": 50, "embedded": 30, "pending": 20},
|
|
"cli_history": {"total": 20, "embedded": 10, "pending": 10}
|
|
}
|
|
}
|
|
```
|
|
|
|
### 2. Generate Embeddings
|
|
|
|
Embed all unembedded chunks:
|
|
```bash
|
|
python scripts/memory_embedder.py embed <db_path>
|
|
```
|
|
|
|
Embed specific source:
|
|
```bash
|
|
python scripts/memory_embedder.py embed <db_path> --source-id CMEM-20250101-120000
|
|
```
|
|
|
|
Re-embed all chunks (force):
|
|
```bash
|
|
python scripts/memory_embedder.py embed <db_path> --force
|
|
```
|
|
|
|
Adjust batch size (default 8):
|
|
```bash
|
|
python scripts/memory_embedder.py embed <db_path> --batch-size 16
|
|
```
|
|
|
|
Example output:
|
|
```json
|
|
{
|
|
"success": true,
|
|
"chunks_processed": 50,
|
|
"chunks_failed": 0,
|
|
"elapsed_time": 12.34
|
|
}
|
|
```
|
|
|
|
### 3. Semantic Search
|
|
|
|
Basic search:
|
|
```bash
|
|
python scripts/memory_embedder.py search <db_path> "authentication flow"
|
|
```
|
|
|
|
Advanced search:
|
|
```bash
|
|
python scripts/memory_embedder.py search <db_path> "rate limiting" \
|
|
--top-k 5 \
|
|
--min-score 0.5 \
|
|
--type workflow
|
|
```
|
|
|
|
Example output:
|
|
```json
|
|
{
|
|
"success": true,
|
|
"matches": [
|
|
{
|
|
"source_id": "WFS-20250101-auth",
|
|
"source_type": "workflow",
|
|
"chunk_index": 2,
|
|
"content": "Implemented JWT-based authentication...",
|
|
"score": 0.8542,
|
|
"restore_command": "ccw session resume WFS-20250101-auth"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
## Database Path
|
|
|
|
The database is located in CCW's storage directory:
|
|
|
|
- **Windows**: `%USERPROFILE%\.ccw\projects\<project-id>\core-memory\core_memory.db`
|
|
- **Linux/Mac**: `~/.ccw/projects/<project-id>/core-memory/core_memory.db`
|
|
|
|
Find your project's database:
|
|
```bash
|
|
ccw memory list # Shows project path
|
|
# Then look in: ~/.ccw/projects/<hashed-path>/core-memory/core_memory.db
|
|
```
|
|
|
|
## Integration with CCW
|
|
|
|
This script is designed to be called from CCW's TypeScript code:
|
|
|
|
```typescript
|
|
import { execSync } from 'child_process';
|
|
|
|
// Embed chunks
|
|
const result = execSync(
|
|
`python scripts/memory_embedder.py embed ${dbPath}`,
|
|
{ encoding: 'utf-8' }
|
|
);
|
|
const { success, chunks_processed } = JSON.parse(result);
|
|
|
|
// Search
|
|
const searchResult = execSync(
|
|
`python scripts/memory_embedder.py search ${dbPath} "${query}" --top-k 10`,
|
|
{ encoding: 'utf-8' }
|
|
);
|
|
const { matches } = JSON.parse(searchResult);
|
|
```
|
|
|
|
## Performance
|
|
|
|
- **Embedding speed**: ~8 chunks/second (batch size 8)
|
|
- **Search speed**: ~0.1-0.5 seconds for 1000 chunks
|
|
- **Model loading**: ~0.8 seconds (cached after first use)
|
|
|
|
## Source Types
|
|
|
|
- `core_memory`: Strategic architectural context
|
|
- `workflow`: Session-based development history
|
|
- `cli_history`: Command execution logs
|
|
|
|
## Restore Commands
|
|
|
|
Search results include restore commands:
|
|
|
|
- **core_memory/cli_history**: `ccw memory export <source_id>`
|
|
- **workflow**: `ccw session resume <source_id>`
|