Files
Claude-Code-Workflow/ccw/scripts/README-memory-embedder.md
catlog22 4458af83d8 feat: Upgrade to version 6.2.0 with major enhancements
- Updated COMMAND_SPEC.md to reflect new version and features including native CodexLens and CLI refactor.
- Revised GETTING_STARTED.md and GETTING_STARTED_CN.md for improved onboarding experience with new features.
- Enhanced INSTALL_CN.md to highlight the new CodexLens and Dashboard capabilities.
- Updated README.md and README_CN.md to showcase version 6.2.0 features and breaking changes.
- Introduced memory embedder scripts with comprehensive documentation and quick reference.
- Added test suite for memory embedder functionality to ensure reliability and correctness.
- Implemented TypeScript integration examples for memory embedder usage.
2025-12-20 13:16:09 +08:00

158 lines
3.5 KiB
Markdown

# Memory Embedder
Bridge CCW to CodexLens semantic search by generating and searching embeddings for memory chunks.
## Features
- **Generate embeddings** for memory chunks using CodexLens's jina-embeddings-v2-base-code (768 dim)
- **Semantic search** across all memory types (core_memory, workflow, cli_history)
- **Status tracking** to monitor embedding progress
- **Batch processing** for efficient embedding generation
- **Restore commands** included in search results
## Requirements
```bash
pip install numpy codexlens[semantic]
```
## Usage
### 1. Check Status
```bash
python scripts/memory_embedder.py status <db_path>
```
Example output:
```json
{
"total_chunks": 150,
"embedded_chunks": 100,
"pending_chunks": 50,
"by_type": {
"core_memory": {"total": 80, "embedded": 60, "pending": 20},
"workflow": {"total": 50, "embedded": 30, "pending": 20},
"cli_history": {"total": 20, "embedded": 10, "pending": 10}
}
}
```
### 2. Generate Embeddings
Embed all unembedded chunks:
```bash
python scripts/memory_embedder.py embed <db_path>
```
Embed specific source:
```bash
python scripts/memory_embedder.py embed <db_path> --source-id CMEM-20250101-120000
```
Re-embed all chunks (force):
```bash
python scripts/memory_embedder.py embed <db_path> --force
```
Adjust batch size (default 8):
```bash
python scripts/memory_embedder.py embed <db_path> --batch-size 16
```
Example output:
```json
{
"success": true,
"chunks_processed": 50,
"chunks_failed": 0,
"elapsed_time": 12.34
}
```
### 3. Semantic Search
Basic search:
```bash
python scripts/memory_embedder.py search <db_path> "authentication flow"
```
Advanced search:
```bash
python scripts/memory_embedder.py search <db_path> "rate limiting" \
--top-k 5 \
--min-score 0.5 \
--type workflow
```
Example output:
```json
{
"success": true,
"matches": [
{
"source_id": "WFS-20250101-auth",
"source_type": "workflow",
"chunk_index": 2,
"content": "Implemented JWT-based authentication...",
"score": 0.8542,
"restore_command": "ccw session resume WFS-20250101-auth"
}
]
}
```
## Database Path
The database is located in CCW's storage directory:
- **Windows**: `%USERPROFILE%\.ccw\projects\<project-id>\core-memory\core_memory.db`
- **Linux/Mac**: `~/.ccw/projects/<project-id>/core-memory/core_memory.db`
Find your project's database:
```bash
ccw memory list # Shows project path
# Then look in: ~/.ccw/projects/<hashed-path>/core-memory/core_memory.db
```
## Integration with CCW
This script is designed to be called from CCW's TypeScript code:
```typescript
import { execSync } from 'child_process';
// Embed chunks
const result = execSync(
`python scripts/memory_embedder.py embed ${dbPath}`,
{ encoding: 'utf-8' }
);
const { success, chunks_processed } = JSON.parse(result);
// Search
const searchResult = execSync(
`python scripts/memory_embedder.py search ${dbPath} "${query}" --top-k 10`,
{ encoding: 'utf-8' }
);
const { matches } = JSON.parse(searchResult);
```
## Performance
- **Embedding speed**: ~8 chunks/second (batch size 8)
- **Search speed**: ~0.1-0.5 seconds for 1000 chunks
- **Model loading**: ~0.8 seconds (cached after first use)
## Source Types
- `core_memory`: Strategic architectural context
- `workflow`: Session-based development history
- `cli_history`: Command execution logs
## Restore Commands
Search results include restore commands:
- **core_memory/cli_history**: `ccw memory export <source_id>`
- **workflow**: `ccw session resume <source_id>`