# Memory Embedder
Bridge CCW to CodexLens semantic search by generating and searching embeddings for memory chunks.
## Features
- Generate embeddings for memory chunks using CodexLens's `jina-embeddings-v2-base-code` model (768 dimensions)
- Semantic search across all memory types (`core_memory`, `workflow`, `cli_history`)
- Status tracking to monitor embedding progress
- Batch processing for efficient embedding generation
- Restore commands included in search results
## Requirements

```bash
pip install numpy codexlens[semantic]
```
## Usage

### 1. Check Status

```bash
python scripts/memory_embedder.py status <db_path>
```
Example output:
```json
{
  "total_chunks": 150,
  "embedded_chunks": 100,
  "pending_chunks": 50,
  "by_type": {
    "core_memory": {"total": 80, "embedded": 60, "pending": 20},
    "workflow": {"total": 50, "embedded": 30, "pending": 20},
    "cli_history": {"total": 20, "embedded": 10, "pending": 10}
  }
}
```
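Because the `status` output is plain JSON, a caller can gate embedding on it. A minimal TypeScript sketch, assuming `python` is on `PATH` and the script is invoked from the repo root; the `StatusReport` interface and `embedIfPending` helper are illustrative, mirroring the JSON shape above:

```typescript
import { execFileSync } from 'child_process';

// Shape of the JSON printed by `memory_embedder.py status` (see example above).
interface StatusReport {
  total_chunks: number;
  embedded_chunks: number;
  pending_chunks: number;
  by_type: Record<string, { total: number; embedded: number; pending: number }>;
}

// Run `status` and only trigger embedding when there is pending work.
function embedIfPending(dbPath: string): void {
  const raw = execFileSync(
    'python',
    ['scripts/memory_embedder.py', 'status', dbPath],
    { encoding: 'utf-8' },
  );
  const status: StatusReport = JSON.parse(raw);
  if (status.pending_chunks > 0) {
    execFileSync('python', ['scripts/memory_embedder.py', 'embed', dbPath], {
      stdio: 'inherit',
    });
  }
}
```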
### 2. Generate Embeddings

Embed all unembedded chunks:

```bash
python scripts/memory_embedder.py embed <db_path>
```

Embed a specific source:

```bash
python scripts/memory_embedder.py embed <db_path> --source-id CMEM-20250101-120000
```

Re-embed all chunks (force):

```bash
python scripts/memory_embedder.py embed <db_path> --force
```

Adjust the batch size (default 8):

```bash
python scripts/memory_embedder.py embed <db_path> --batch-size 16
```
Example output:
```json
{
  "success": true,
  "chunks_processed": 50,
  "chunks_failed": 0,
  "elapsed_time": 12.34
}
```
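The embed result is equally easy to consume programmatically. A hedged sketch that surfaces partial failures to the caller (field names taken from the JSON above; the `embedAll` wrapper itself is illustrative, not part of the script):

```typescript
import { execFileSync } from 'child_process';

// Result shape printed by `memory_embedder.py embed` (see example above).
interface EmbedResult {
  success: boolean;
  chunks_processed: number;
  chunks_failed: number;
  elapsed_time: number;
}

// Embed with a larger batch and throw if any chunk failed.
function embedAll(dbPath: string, batchSize = 16): EmbedResult {
  const raw = execFileSync(
    'python',
    ['scripts/memory_embedder.py', 'embed', dbPath, '--batch-size', String(batchSize)],
    { encoding: 'utf-8' },
  );
  const result: EmbedResult = JSON.parse(raw);
  if (!result.success || result.chunks_failed > 0) {
    throw new Error(`embedding incomplete: ${result.chunks_failed} chunk(s) failed`);
  }
  return result;
}
```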
### 3. Semantic Search

Basic search:

```bash
python scripts/memory_embedder.py search <db_path> "authentication flow"
```

Advanced search:

```bash
python scripts/memory_embedder.py search <db_path> "rate limiting" \
  --top-k 5 \
  --min-score 0.5 \
  --type workflow
```
Example output:
```json
{
  "success": true,
  "matches": [
    {
      "source_id": "WFS-20250101-auth",
      "source_type": "workflow",
      "chunk_index": 2,
      "content": "Implemented JWT-based authentication...",
      "score": 0.8542,
      "restore_command": "ccw session resume WFS-20250101-auth"
    }
  ]
}
```
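The `matches` array can be post-processed client-side. A hypothetical helper that drops low-confidence hits and groups the rest by source type (the `Match` interface mirrors the fields shown above; the 0.5 cutoff echoes the `--min-score` example):

```typescript
// One entry in the `matches` array returned by the `search` command.
interface Match {
  source_id: string;
  source_type: string;
  chunk_index: number;
  content: string;
  score: number;
  restore_command: string;
}

// Keep only confident hits and bucket them by source type.
function groupByType(matches: Match[], minScore = 0.5): Map<string, Match[]> {
  const groups = new Map<string, Match[]>();
  for (const m of matches) {
    if (m.score < minScore) continue; // mirrors the --min-score CLI filter
    const bucket = groups.get(m.source_type) ?? [];
    bucket.push(m);
    groups.set(m.source_type, bucket);
  }
  return groups;
}
```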
## Database Path

The database is located in CCW's storage directory:

- Windows: `%USERPROFILE%\.ccw\projects\<project-id>\core-memory\core_memory.db`
- Linux/Mac: `~/.ccw/projects/<project-id>/core-memory/core_memory.db`
Find your project's database:

```bash
ccw memory list  # Shows project path
# Then look in: ~/.ccw/projects/<hashed-path>/core-memory/core_memory.db
```
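To build this path in code, something like the following sketch works. The project id (the hashed project path) must be supplied by the caller; this does not reproduce CCW's hashing scheme:

```typescript
import * as os from 'os';
import * as path from 'path';

// Assemble the per-project database path described above.
function coreMemoryDbPath(projectId: string): string {
  return path.join(
    os.homedir(), // resolves to %USERPROFILE% on Windows, ~ on Linux/Mac
    '.ccw',
    'projects',
    projectId,
    'core-memory',
    'core_memory.db',
  );
}
```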
## Integration with CCW
This script is designed to be called from CCW's TypeScript code:
```typescript
import { execSync } from 'child_process';

// Embed chunks
const result = execSync(
  `python scripts/memory_embedder.py embed ${dbPath}`,
  { encoding: 'utf-8' }
);
const { success, chunks_processed } = JSON.parse(result);

// Search
const searchResult = execSync(
  `python scripts/memory_embedder.py search ${dbPath} "${query}" --top-k 10`,
  { encoding: 'utf-8' }
);
const { matches } = JSON.parse(searchResult);
```
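Note that interpolating `query` into a shell string breaks if the query contains quotes or shell metacharacters. A safer variant, sketched here with `execFileSync` (not part of the script's documented API), passes arguments as an array so nothing is shell-interpreted:

```typescript
import { execFileSync } from 'child_process';

// Same search call as above, but with argv passed as an array so the query
// cannot break out of the command line.
function search(dbPath: string, query: string): unknown[] {
  const raw = execFileSync(
    'python',
    ['scripts/memory_embedder.py', 'search', dbPath, query, '--top-k', '10'],
    { encoding: 'utf-8' },
  );
  return JSON.parse(raw).matches;
}
```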
## Performance
- Embedding speed: ~8 chunks/second (batch size 8)
- Search speed: ~0.1-0.5 seconds for 1000 chunks
- Model loading: ~0.8 seconds (cached after first use)
## Source Types

- `core_memory`: Strategic architectural context
- `workflow`: Session-based development history
- `cli_history`: Command execution logs
## Restore Commands

Search results include restore commands:

- `core_memory` / `cli_history`: `ccw memory export <source_id>`
- `workflow`: `ccw session resume <source_id>`
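If only a hit's `source_type` and `source_id` are at hand, the mapping above can be rebuilt with a small hypothetical helper; in practice each match already carries a ready-made `restore_command` string:

```typescript
// Rebuild the restore command for a hit, mirroring the mapping above.
// (Search results already include this string as `restore_command`.)
function restoreCommand(sourceType: string, sourceId: string): string {
  return sourceType === 'workflow'
    ? `ccw session resume ${sourceId}`
    : `ccw memory export ${sourceId}`;
}
```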