Remove temporary verbose JSON file and cleanup script for VSCode bridge

This commit is contained in:
catlog22
2026-02-25 10:30:19 +08:00
parent 092b8e20dc
commit e315e2315c
18 changed files with 0 additions and 3218 deletions

@@ -1,272 +0,0 @@
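# CCW Dual-Frontend Coexistence Migration Plan
## Goals
- Support both the JS frontend (legacy) and the React frontend (new) through the `ccw view` command
- Migrate incrementally, moving features to React step by step
- Let users switch freely between the two frontends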
## Architecture
```
┌─────────────────┐     ┌─────────────────┐
│    ccw view     │────▶│   Node Server   │
│   (port 3456)   │     │     (3456)      │
└─────────────────┘     └────────┬────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   JS Frontend   │     │ React Frontend  │     │     /api/*      │
│       (/)       │     │   (/react/*)    │     │    REST API     │
│  dashboard-js   │     │  Vite dev/prod  │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
## Implementation
### Phase 1: Infrastructure Changes
#### 1.1 Modify `ccw/src/commands/serve.ts`
Add support for a `--frontend` option:
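```typescript
interface ServeOptions {
  port?: number;
  path?: string;
  host?: string;
  browser?: boolean;
  frontend?: 'js' | 'react' | 'both'; // new
}

// Handle the option in serveCommand
export async function serveCommand(options: ServeOptions): Promise<void> {
  // Defaults mirror the CLI defaults of the `view` command
  const port = options.port ?? 3456;
  const host = options.host ?? '127.0.0.1';
  const initialPath = options.path ?? process.cwd();
  const frontend = options.frontend ?? 'js'; // JS frontend by default

  if (frontend === 'react' || frontend === 'both') {
    // Start the React frontend dev server on port + 1
    await startReactFrontend(port + 1);
  }

  // Start the main server
  const server = await startServer({
    port,
    host,
    initialPath,
    frontend, // passed through to the server
  });
}
```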
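As a sketch of how `serveCommand` might validate the flag before starting anything, the helper below normalizes the raw `--frontend` and `--port` values (Commander passes option values as strings unless a parser is supplied) and derives the React port as `port + 1`, as the plan above does. `planFrontend` and `FrontendPlan` are hypothetical names, not existing CCW code.

```typescript
type FrontendMode = 'js' | 'react' | 'both';

const FRONTEND_MODES: readonly FrontendMode[] = ['js', 'react', 'both'];

interface FrontendPlan {
  mode: FrontendMode;
  mainPort: number;
  reactPort: number | null; // null when no React dev server is needed
}

// Hypothetical helper: validates raw CLI values before any server is started.
function planFrontend(rawMode: string | undefined, rawPort: string | number | undefined): FrontendPlan {
  const mode = (rawMode ?? 'js') as FrontendMode;
  if (!FRONTEND_MODES.includes(mode)) {
    throw new Error(`Invalid --frontend value "${rawMode}"; expected one of: ${FRONTEND_MODES.join(', ')}`);
  }
  const mainPort = Number(rawPort ?? 3456);
  // port + 1 must also be a valid TCP port, hence the upper bound of 65534.
  if (!Number.isInteger(mainPort) || mainPort < 1 || mainPort > 65534) {
    throw new Error(`Invalid --port value "${rawPort}"`);
  }
  const needsReact = mode === 'react' || mode === 'both';
  return { mode, mainPort, reactPort: needsReact ? mainPort + 1 : null };
}
```

Failing fast on an unknown mode keeps a typo such as `--frontend reat` from silently falling back to the JS frontend.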
#### 1.2 Modify `ccw/src/core/server.ts`
Add routing for the React frontend:
```typescript
// Add to the request routing
if (pathname === '/react' || pathname.startsWith('/react/')) {
  // Proxy to the React frontend. Vite serves it under base '/react/' (see 2.1),
  // so the /react prefix is forwarded rather than stripped.
  const reactPath = pathname === '/react' ? '/react/' : pathname;
  const reactUrl = `http://localhost:${options.reactPort || port + 1}${reactPath}`;
  // Proxy the request with http-proxy or fetch
  proxyToReact(req, res, reactUrl);
  return;
}

// The root path serves the default frontend based on configuration
if (pathname === '/' || pathname === '/index.html') {
  if (options.frontend === 'react') {
    res.writeHead(302, { Location: '/react' });
    res.end();
    return;
  }
  // JS frontend by default
  const html = generateServerDashboard(initialPath);
  res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
  res.end(html);
  return;
}
```
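The routing rules above can be isolated into a pure function, which makes them easy to unit test without starting a server. This is a hypothetical sketch (`decideRoute` is not in the codebase); it assumes the Vite `base: '/react/'` setting from Phase 2, so the `/react` prefix is forwarded to the dev server rather than stripped.

```typescript
type FrontendMode = 'js' | 'react' | 'both';

// Possible outcomes for an incoming request path.
type RouteDecision =
  | { kind: 'proxy-react'; target: string }
  | { kind: 'redirect'; location: string }
  | { kind: 'js-dashboard' }
  | { kind: 'next' }; // fall through to /api/* and other handlers

function decideRoute(pathname: string, mode: FrontendMode, reactPort: number): RouteDecision {
  if (pathname === '/react' || pathname.startsWith('/react/')) {
    // Assumes Vite serves the app under base '/react/', so the prefix is kept.
    const forwarded = pathname === '/react' ? '/react/' : pathname;
    return { kind: 'proxy-react', target: `http://localhost:${reactPort}${forwarded}` };
  }
  if (pathname === '/' || pathname === '/index.html') {
    return mode === 'react' ? { kind: 'redirect', location: '/react' } : { kind: 'js-dashboard' };
  }
  return { kind: 'next' };
}
```

Note that a path like `/reactive` is deliberately not matched, because the check requires either `/react` exactly or the `/react/` prefix.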
#### 1.3 Create `ccw/src/utils/react-frontend.ts`
```typescript
import { spawn, type ChildProcess } from 'child_process';
import { join } from 'path';
import chalk from 'chalk';

let reactProcess: ChildProcess | null = null;

export async function startReactFrontend(port: number): Promise<void> {
  const frontendDir = join(process.cwd(), 'frontend');
  console.log(chalk.cyan(`  Starting React frontend on port ${port}...`));

  reactProcess = spawn('npm', ['run', 'dev', '--', '--port', port.toString()], {
    cwd: frontendDir,
    stdio: 'pipe',
    shell: true,
  });

  // Wait for the dev server to report that it is ready
  return new Promise((resolve, reject) => {
    let output = '';
    let ready = false;
    const timeout = setTimeout(() => {
      reject(new Error('React frontend startup timeout'));
    }, 30000);

    reactProcess?.stdout?.on('data', (data) => {
      output += data.toString();
      if (!ready && (output.includes('Local:') || output.includes('ready'))) {
        ready = true;
        clearTimeout(timeout);
        console.log(chalk.green(`  React frontend ready at http://localhost:${port}`));
        resolve();
      }
    });

    reactProcess?.stderr?.on('data', (data) => {
      console.error(chalk.yellow(`  React: ${data.toString().trim()}`));
    });

    reactProcess?.on('error', (err) => {
      clearTimeout(timeout);
      reject(err);
    });

    // Fail fast if the dev server exits before it becomes ready
    reactProcess?.on('exit', (code) => {
      if (!ready) {
        clearTimeout(timeout);
        reject(new Error(`React frontend exited early with code ${code}`));
      }
    });
  });
}

export function stopReactFrontend(): void {
  if (reactProcess) {
    reactProcess.kill('SIGTERM');
    reactProcess = null;
  }
}
```
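One caveat with the readiness check above: if the dev server emits colored output (for example when color is forced), an escape code between `Local` and `:` would defeat a plain `includes('Local:')`. Below is a minimal sketch of a detector that strips ANSI CSI sequences first and tolerates markers or escape codes split across chunks. `createReadyDetector` is a hypothetical helper; the `'Local:'` and `'ready'` markers are taken from the code above.

```typescript
// Matches ANSI CSI sequences such as "\x1b[1m" (bold) or "\x1b[32m" (green).
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Returns a function that is fed each stdout chunk and reports whether the
// dev server has printed one of the readiness markers yet.
function createReadyDetector(markers: readonly string[] = ['Local:', 'ready']) {
  let buffer = '';
  let ready = false;
  return (chunk: { toString(): string }): boolean => {
    if (ready) return true;
    // Strip colors from the whole buffer so an escape split across chunks is still removed.
    buffer = (buffer + chunk.toString()).replace(ANSI_PATTERN, '');
    // Keep only the tail so long-running output cannot grow the buffer without bound.
    if (buffer.length > 4096) buffer = buffer.slice(-4096);
    ready = markers.some((marker) => buffer.includes(marker));
    return ready;
  };
}
```

Inside `startReactFrontend`, the stdout handler could create one detector up front and call `detect(data)` instead of inspecting `output` directly.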
### Phase 2: React Frontend Adaptation
#### 2.1 Modify `ccw/frontend/vite.config.ts`
Add a base path:
```typescript
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
import path from 'path';

export default defineConfig({
  plugins: [react()],
  base: '/react/', // add a base path
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src'),
    },
  },
  server: {
    port: 5173,
    proxy: {
      '/api': {
        target: 'http://localhost:3456',
        changeOrigin: true,
      },
      '/ws': {
        target: 'ws://localhost:3456',
        ws: true,
      },
    },
  },
  // ...
});
```
#### 2.2 Create a frontend switch component
Add a switch button to the JS frontend (`ccw/src/templates/dashboard-js/components/react-switch.js`):
```javascript
// Add a switch button to the navigation bar
function addReactSwitchButton() {
  const nav = document.querySelector('.navbar');
  if (!nav) return;

  const switchBtn = document.createElement('button');
  switchBtn.className = 'btn btn-sm btn-outline-primary ml-2';
  switchBtn.innerHTML = '<span class="icon">⚛️</span> React version';
  switchBtn.title = 'Switch to the React version';
  switchBtn.onclick = () => {
    window.location.href = '/react';
  };

  nav.appendChild(switchBtn);
}

// Initialize (also works if the script loads after DOMContentLoaded has fired)
if (document.readyState === 'loading') {
  document.addEventListener('DOMContentLoaded', addReactSwitchButton);
} else {
  addReactSwitchButton();
}
```
### Phase 3: Command-Line Interface
#### 3.1 Modify `ccw/src/cli.ts`
Add a `--frontend` option:
```typescript
// View command
program
  .command('view')
  .description('Open workflow dashboard server with live path switching')
  .option('-p, --path <path>', 'Path to project directory', '.')
  .option('--port <port>', 'Server port', '3456')
  .option('--host <host>', 'Server host to bind', '127.0.0.1')
  .option('--no-browser', 'Start server without opening browser')
  .option('--frontend <type>', 'Frontend type: js, react, both', 'js') // new
  .action(viewCommand);
```
### Usage
#### 1. Default JS frontend (backward compatible)
```bash
ccw view
# or specify it explicitly
ccw view --frontend js
```
#### 2. React frontend
```bash
ccw view --frontend react
# The React frontend is available at http://localhost:3456/react
```
#### 3. Run both frontends (development and debugging)
```bash
ccw view --frontend both
# JS: http://localhost:3456
# React: http://localhost:3456/react (dev mode) or port 5173
```
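## Migration Roadmap
```
Phase 1: Infrastructure (1-2 weeks)
├── Add --frontend option support
├── Implement the React frontend proxy
└── Basic switching
Phase 2: Feature migration (4-8 weeks)
├── Migrate feature modules to React one by one
├── Keep the JS frontend stable
└── Add feature flags
Phase 3: Default switch (2 weeks)
├── React becomes the default frontend
├── The JS frontend enters maintenance mode
└── Publish a migration announcement
Phase 4: Full migration (optional)
├── Remove the JS frontend
└── React becomes the only frontend
```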
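Advantages of this plan:
1. **Backward compatible**: the default behavior is unchanged, so existing users are unaffected
2. **Incremental migration**: features can be moved to React one at a time
3. **Flexible switching**: users and developers can switch frontends at any time
4. **Parallel development**: both frontends can be developed and debugged side by side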

@@ -1,165 +0,0 @@
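# CodexLens Embeddings Fix Summary
## Results
### ✅ Completed
1. **Recursive embeddings generation** (`embedding_manager.py`)
   - Added the `generate_embeddings_recursive()` function
   - Added the `get_embeddings_status()` function
   - Recursively processes the `_index.db` files in all subdirectories
2. **CLI command enhancements** (`commands.py`)
   - `embeddings-generate` gains a `--recursive` flag
   - The `init` command uses recursive generation (all subdirectories are handled automatically)
   - The `status` command shows embeddings coverage statistics
3. **Smart Search routing** (`smart-search.ts`)
   - Added a 50% coverage threshold
   - Falls back to exact mode automatically when embeddings coverage is insufficient
   - Emits a clear warning
   - Strips ANSI color codes so the JSON output parses correctly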
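The Smart Search routing described in item 3 can be sketched as two small functions: one strips ANSI color codes before `JSON.parse`, and one applies the 50% coverage threshold. The `EmbeddingsStatus` field names (`total_files`, `files_with_embeddings`) are assumptions for illustration; the actual `codexlens` JSON output may use different keys.

```typescript
// Matches ANSI CSI color/style sequences.
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Hypothetical shape of the status JSON; the real field names may differ.
interface EmbeddingsStatus {
  total_files: number;
  files_with_embeddings: number;
}

type SearchMode = 'hybrid' | 'exact';

function parseStatus(rawStdout: string): EmbeddingsStatus {
  // Colored CLI output is not valid JSON until the escape codes are removed.
  return JSON.parse(rawStdout.replace(ANSI_PATTERN, '').trim()) as EmbeddingsStatus;
}

function chooseSearchMode(status: EmbeddingsStatus, threshold = 0.5): { mode: SearchMode; warning?: string } {
  const coverage = status.total_files > 0 ? status.files_with_embeddings / status.total_files : 0;
  if (coverage >= threshold) return { mode: 'hybrid' };
  return {
    mode: 'exact',
    warning: `Embeddings cover ${(coverage * 100).toFixed(1)}% of files (below ${threshold * 100}%); falling back to exact search.`,
  };
}
```

Treating an empty index as 0% coverage means a project with no indexed files always falls back to exact search instead of dividing by zero.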
### ✅ Test Results
**CCW project (d:\Claude_dms3\ccw)**:
- Index databases: 26
- Total files: 303
- Embeddings coverage: **100%** (all 303 files)
- Chunks generated: **2,042** (previously only 10)

**Comparison**:

| Metric | Before | After | Improvement |
|------|--------|--------|------|
| Coverage | 1.6% (5/303) | 100% (303/303) | **62.5x** |
| Chunks | 10 | 2,042 | **204x** |
| Indexes with embeddings | 1/26 | 26/26 | **26x** |
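## Current Issues
### ⚠️ Outstanding Issues
1. **Path mapping**
   - `embeddings-generate --recursive` currently requires the index path instead of the source path
   - Users should be able to pass the source path (`d:\Claude_dms3\ccw`)
   - Currently the index path is required: `C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw`
2. **Global vs. project-level status**
   - `codexlens status` returns global statistics (all projects)
   - A project-level embeddings status is needed
   - `embeddings-status` checks only a single `_index.db` (not recursive)
## Suggested Follow-up Fixes
### P1 - Path Mapping Fix
Modify the `embeddings_generate` command in `commands.py` (lines 1996-2000):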
```python
# Excerpt from embeddings_generate(); the preceding `if` branch is omitted
elif target_path.is_dir():
    if recursive:
        # Recursive mode: map the source path to its index root
        registry = RegistryStore()
        try:
            registry.initialize()
            mapper = PathMapper()
            index_db_path = mapper.source_to_index_db(target_path)
            index_root = index_db_path.parent  # Use the index directory root
            use_recursive = True
        finally:
            registry.close()
```
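For the source-to-index mapping itself, a minimal sketch inferred only from the example paths in this document (`d:\Claude_dms3\ccw` maps to `C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw`) is shown below. It is written in TypeScript for consistency with the other examples; the real `PathMapper.source_to_index_db()` is Python and may handle more cases (POSIX paths, UNC paths, case normalization). `sourceToIndexDir` is a hypothetical name.

```typescript
import * as path from 'node:path';

// Hypothetical mapping inferred from the example layout:
//   d:\Claude_dms3\ccw  ->  C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw
// (drive letter upper-cased, colon dropped, remaining segments appended).
function sourceToIndexDir(sourcePath: string, indexesRoot: string): string {
  const normalized = path.win32.normalize(sourcePath);
  const match = /^([A-Za-z]):\\?(.*)$/.exec(normalized);
  if (!match) {
    throw new Error(`Expected an absolute Windows path with a drive letter, got: ${sourcePath}`);
  }
  const [, drive, rest] = match;
  // Drop empty segments, e.g. from a trailing separator.
  const segments = rest.split('\\').filter((segment) => segment.length > 0);
  return path.win32.join(indexesRoot, drive.toUpperCase(), ...segments);
}
```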
### P2 - Project-Level Status
Option A: extend the `embeddings-status` command to support recursion:
```bash
codexlens embeddings-status . --recursive --json
```
Option B: let the `status` command accept a path argument:
```bash
codexlens status --project . --json
```
## Usage Guide
### Current Workflow
**Generate embeddings (full coverage)**:
```bash
# Option 1: use the index path (what currently works)
cd C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw
python -m codexlens embeddings-generate . --recursive --force --model fast
# Option 2: use the init command (recursive automatically, recommended)
cd d:\Claude_dms3\ccw
python -m codexlens init . --force
```
**Check coverage**:
```bash
# Project root
cd C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw
python check_embeddings.py  # shows detailed per-directory statistics
# Global status
python -m codexlens status --json  # summary across all projects
```
**Smart Search**:
```javascript
// MCP tool call
smart_search({ query: "authentication patterns" })
// It now:
// 1. Checks embeddings coverage
// 2. Uses hybrid mode if coverage >= 50%
// 3. Falls back to exact mode if coverage < 50%
// 4. Shows a warning message
```
### Best Practices
1. **Generate embeddings automatically when initializing a project**:
```bash
codexlens init /path/to/project --force
```
2. **Regenerate periodically to keep embeddings up to date**:
```bash
codexlens embeddings-generate /index/path --recursive --force
```
3. **Use the fast model for quick tests**:
```bash
codexlens embeddings-generate . --recursive --model fast
```
4. **Use the code model for the best quality**:
```bash
codexlens embeddings-generate . --recursive --model code
```
## Technical Details
### Modified Files
**Python (CodexLens)**:
- `codex-lens/src/codexlens/cli/embedding_manager.py` - adds the recursive functions
- `codex-lens/src/codexlens/cli/commands.py` - updates init, status, embeddings-generate

**TypeScript (CCW)**:
- `ccw/src/tools/smart-search.ts` - smart routing + ANSI stripping
- `ccw/src/tools/codex-lens.ts` - (unchanged, uses the existing implementation)
### Dependency Versions
- CodexLens: current development version
- Fastembed: installed (ONNX backend)
- Models: fast (~80MB), code (~150MB)
---
**Fix date**: 2025-12-17
**Verification status**: ✅ Core functionality works; the path mapping issue is still open

@@ -1,297 +0,0 @@
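# Hook Integration Implementation Summary
## Overview
The Hook system is now integrated with the session-start progressive disclosure index.
## Modified Files
### 1. `ccw/src/core/routes/hooks-routes.ts`
**Changes**:
- Added handling for the `session-start` and `context` hook types to the `/api/hook` POST endpoint
- Integrated `SessionClusteringService` to generate the progressive disclosure index
- Implemented fail-silent error handling
**Key code**: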
```typescript
// Handle context hooks (session-start, context)
if (type === 'session-start' || type === 'context') {
  try {
    const projectPath = url.searchParams.get('path') || initialPath;
    const { SessionClusteringService } = await import('../session-clustering-service.js');
    const clusteringService = new SessionClusteringService(projectPath);
    const format = url.searchParams.get('format') || 'markdown';
    const index = await clusteringService.getProgressiveIndex(resolvedSessionId);
    return {
      success: true,
      type: 'context',
      format,
      content: index,
      sessionId: resolvedSessionId
    };
  } catch (error) {
    console.error('[Hooks] Failed to generate context:', error);
    return {
      success: true,
      type: 'context',
      format: 'markdown',
      content: '',
      sessionId: resolvedSessionId,
      error: (error as Error).message
    };
  }
}
```
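The fail-silent behavior, together with the under-5-second timeout described later in this document, can be expressed as a reusable wrapper. In the sketch below, `runContextHookSilently` is hypothetical and the deadline is enforced with `Promise.race`; the handler above only catches errors and does not itself impose a deadline.

```typescript
interface ContextHookResult {
  success: true;
  type: 'context';
  format: string;
  content: string;
  sessionId?: string;
  error?: string;
}

async function runContextHookSilently(
  generate: () => Promise<string>,
  opts: { sessionId?: string; format?: string; timeoutMs?: number } = {},
): Promise<ContextHookResult> {
  const format = opts.format ?? 'markdown';
  const timeoutMs = opts.timeoutMs ?? 5000; // the "< 5 s" budget from this document
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`context generation exceeded ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const content = await Promise.race([generate(), deadline]);
    return { success: true, type: 'context', format, content, sessionId: opts.sessionId };
  } catch (error) {
    // Fail silently: return empty content so the session can still start.
    return { success: true, type: 'context', format, content: '', sessionId: opts.sessionId, error: (error as Error).message };
  } finally {
    clearTimeout(timer); // prevents the deadline from firing after a result is returned
  }
}
```

`Promise.race` only stops waiting; a slow `getProgressiveIndex()` call would keep running in the background after the deadline.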
### 2. `ccw/src/core/session-clustering-service.ts`
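**Changes**:
- Optimized the output format of `getProgressiveIndex()`
- Changed the heading to "Related Sessions Index" (per the task requirements)
- Improved the timeline to show up to the 3 most recent sessions
- Standardized the command section as "Resume Commands"
**Key improvements**: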
```typescript
// Generate timeline - show multiple recent sessions
let timeline = '';
if (members.length > 0) {
  const timelineEntries: string[] = [];
  const displayCount = Math.min(members.length, 3); // Show last 3 sessions
  for (let i = members.length - displayCount; i < members.length; i++) {
    const member = members[i];
    const date = member.created_at ? new Date(member.created_at).toLocaleDateString() : '';
    const title = member.title?.substring(0, 30) || 'Untitled';
    const isCurrent = i === members.length - 1;
    const marker = isCurrent ? ' ← Current' : '';
    timelineEntries.push(`${date} ─●─ ${member.session_id} (${title})${marker}`);
  }
  timeline = `\`\`\`\n${timelineEntries.join('\n │\n')}\n\`\`\``;
}
```
### 3. `ccw/src/commands/core-memory.ts`
**Changes**:
- Fixed a TypeScript type error
- Added an explicit type annotation `'all' | 'recent' | 'unclustered'` to the `scope` variable
## New Files
### 1. `ccw/src/templates/hooks-config-example.json`
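An example hooks configuration file showing how to configure each hook type:
- `session-start`: Progressive Disclosure hook
- `session-end`: updates cluster metadata
- `file-modified`: automatic checkpoint commits
- `context-request`: dynamic context provision
### 2. `ccw/docs/hooks-integration.md`
Complete Hook integration documentation, covering:
- Feature overview
- Configuration
- API endpoint documentation
- Output format
- Usage examples
- Troubleshooting guide
- Performance considerations
- Planned future enhancements
### 3. `ccw/test-hooks.js`
Hook test script:
- Tests the `session-start` hook
- Tests the `context` hook
- Validates the response format
- Prints detailed test output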
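## Features
### ✅ Implemented
1. **Context hook handling**
   - Supports both the `session-start` and `context` hook types
   - Calls `SessionClusteringService.getProgressiveIndex()` to generate context
   - Returns a structured Markdown index
2. **Fail-silent handling**
   - All errors are caught and logged
   - On failure, empty content is returned so session start is never blocked
   - Timeout under 5 seconds
3. **Progressive disclosure index**
   - Shows active cluster information (name, intent, member count)
   - Lists related sessions in a table (session ID, type, summary, token count)
   - Provides resume commands (load session, load cluster)
   - Visualizes a timeline (the 3 most recent sessions)
4. **Flexible configuration**
   - Hooks can be configured via `.claude/settings.json`
   - Supports multiple hook types and handlers
   - Supports timeout and failure-mode settings
### 📋 Configuration Format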
```json
{
  "hooks": {
    "session-start": [
      {
        "name": "Progressive Disclosure",
        "description": "Injects progressive disclosure index at session start",
        "enabled": true,
        "handler": "internal:context",
        "timeout": 5000,
        "failMode": "silent"
      }
    ]
  }
}
```
### 📊 Example Output
````markdown
<ccw-session-context>
## 📋 Related Sessions Index
### 🔗 Active Cluster: auth-implementation (3 sessions)
**Intent**: Implement authentication system
| # | Session | Type | Summary | Tokens |
|---|---------|------|---------|--------|
| 1 | WFS-001 | Workflow | Create auth module | ~1200 |
| 2 | CLI-002 | CLI | Add JWT validation | ~800 |
| 3 | WFS-003 | Workflow | OAuth2 integration | ~1500 |
**Resume Commands**:
```bash
# Load specific session
ccw core-memory load WFS-003
# Load entire cluster context
ccw core-memory load-cluster cluster-001
```
### 📊 Timeline
```
2024-12-16 ─●─ CLI-002 (Add JWT validation)
2024-12-17 ─●─ WFS-003 (OAuth2 integration) ← Current
```
---
**Tip**: Use `ccw core-memory search <keyword>` to find more sessions
</ccw-session-context>
````
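The session table in this output could be produced by a small renderer like the one below. This is a sketch, not the actual `getProgressiveIndex()` code: `SessionSummary` and `renderSessionTable` are hypothetical, and pipes and newlines in summaries are escaped so a single session cannot break the Markdown table.

```typescript
// Hypothetical per-session data needed for one table row.
interface SessionSummary {
  sessionId: string;
  type: string;
  summary: string;
  tokens: number;
}

function renderSessionTable(sessions: SessionSummary[]): string {
  // Escape characters that would otherwise split a cell or end the row.
  const cell = (s: string) => s.replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
  const rows = sessions.map(
    (s, i) => `| ${i + 1} | ${cell(s.sessionId)} | ${cell(s.type)} | ${cell(s.summary)} | ~${s.tokens} |`,
  );
  return ['| # | Session | Type | Summary | Tokens |', '|---|---------|------|---------|--------|', ...rows].join('\n');
}
```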
## API Usage
### Triggering a Hook
```http
POST http://localhost:3456/api/hook
Content-Type: application/json

{
  "type": "session-start",
  "sessionId": "WFS-20241218-001"
}
```
### Response Format
```json
{
  "success": true,
  "type": "context",
  "format": "markdown",
  "content": "<ccw-session-context>...</ccw-session-context>",
  "sessionId": "WFS-20241218-001"
}
```
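A client calling this endpoint might look like the sketch below. `triggerSessionStart` is a hypothetical helper, and `fetch` is injected so it can be tested without a running server (on Node 18+ the global `fetch` can be passed directly). Because the endpoint reports failures with `success: true` and an `error` field, the helper returns whatever `content` it receives and only logs the error.

```typescript
// Shape of the /api/hook response documented above.
interface HookResponse {
  success: boolean;
  type: string;
  format: string;
  content: string;
  sessionId?: string;
  error?: string;
}

// Minimal subset of the fetch API used here, so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function triggerSessionStart(baseUrl: string, sessionId: string, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(`${baseUrl}/api/hook`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'session-start', sessionId }),
  });
  if (!res.ok) throw new Error(`Hook endpoint returned HTTP ${res.status}`);
  const body = (await res.json()) as HookResponse;
  // Failures still arrive with success: true, so surface the error without throwing.
  if (body.error) console.warn(`[hooks] context generation failed: ${body.error}`);
  return body.content ?? '';
}
```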
## Testing
### Running the Tests
```bash
# Start the CCW server
ccw server
# Run the tests in another terminal
node ccw/test-hooks.js
```
### Manual Testing
```bash
# Test with curl
curl -X POST http://localhost:3456/api/hook \
  -H "Content-Type: application/json" \
  -d '{"type":"session-start","sessionId":"test-001"}'
# Using the ccw CLI (if such a command exists)
ccw core-memory context --format markdown
```
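## Notes
1. **Timeout**: hooks must finish within 5 seconds or they are terminated
2. **Failure mode**: `silent` mode is the default, so a hook failure does not affect the main flow
3. **Performance**: cached metadata is used to avoid parsing full sessions
4. **Error handling**: all errors are caught and handled silently
## Future Enhancements
- [ ] Dynamic cluster updates (update in real time while a session is in progress)
- [ ] Multi-cluster support (show sessions from multiple related clusters)
- [ ] Relevance scoring (sort sessions by relevance to the current task)
- [ ] Token budget calculation (total token usage of the loaded context)
- [ ] Hook chains (run multiple hooks in sequence)
- [ ] Conditional hooks (decide whether to run a hook based on project state)
## Documentation
- **Usage guide**: `ccw/docs/hooks-integration.md`
- **Configuration example**: `ccw/src/templates/hooks-config-example.json`
- **Test script**: `ccw/test-hooks.js`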
## Build Status
✅ TypeScript compiles
✅ All type errors fixed
✅ Code comments are in English
✅ Follows the project coding conventions
## Suggested Commit Message
```
feat: Add hooks integration for progressive disclosure
- Implement session-start and context hook handlers
- Integrate SessionClusteringService for context generation
- Add silent failure handling (< 5s timeout)
- Create hooks configuration example
- Add comprehensive documentation
- Include test script for hook verification
Changes:
- hooks-routes.ts: Add context hook processing
- session-clustering-service.ts: Enhance getProgressiveIndex output
- core-memory.ts: Fix TypeScript type error
New files:
- docs/hooks-integration.md: Complete integration guide
- src/templates/hooks-config-example.json: Configuration template
- test-hooks.js: Hook testing script
```

@@ -1,308 +0,0 @@
# LiteLLM Integration Guide
## Overview
CCW now supports custom LiteLLM endpoints with integrated context caching. You can configure multiple providers (OpenAI, Anthropic, Ollama, etc.) and create custom endpoints with file-based caching strategies.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ CLI Executor │
│ │
│ ┌─────────────┐ ┌──────────────────────────────┐ │
│ │ --model │────────>│ Route Decision: │ │
│ │ flag │ │ - gemini/qwen/codex → CLI │ │
│ └─────────────┘ │ - custom ID → LiteLLM │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ LiteLLM Executor │
│ │
│ 1. Load endpoint config (litellm-api-config.json) │
│ 2. Extract @patterns from prompt │
│ 3. Pack files via context-cache │
│ 4. Call LiteLLM client with cached content + prompt │
│ 5. Return result │
└─────────────────────────────────────────────────────────────┘
```
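The route decision in the diagram amounts to a small lookup: the three built-in tool names go to the CLI path, and any other value is treated as a custom endpoint ID. In this hypothetical sketch, `routeModel` also rejects unknown IDs with the "Endpoint not found" error listed under Troubleshooting; the real executor's checks may differ.

```typescript
const BUILTIN_CLI_TOOLS = ['gemini', 'qwen', 'codex'] as const;
type BuiltinTool = (typeof BUILTIN_CLI_TOOLS)[number];

type ModelRoute =
  | { via: 'cli'; tool: BuiltinTool }
  | { via: 'litellm'; endpointId: string };

function isBuiltinTool(name: string): name is BuiltinTool {
  return (BUILTIN_CLI_TOOLS as readonly string[]).includes(name);
}

// Hypothetical router mirroring the "Route Decision" box above.
function routeModel(model: string, knownEndpointIds: readonly string[]): ModelRoute {
  const id = model.trim();
  if (isBuiltinTool(id)) return { via: 'cli', tool: id };
  // Anything else must be a configured custom endpoint ID.
  if (!knownEndpointIds.includes(id)) throw new Error(`Endpoint not found: ${id}`);
  return { via: 'litellm', endpointId: id };
}
```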
## Configuration
### File Location
Configuration is stored per-project:
```
<project>/.ccw/storage/config/litellm-api-config.json
```
### Configuration Structure
```json
{
  "version": 1,
  "providers": [
    {
      "id": "openai-1234567890",
      "name": "My OpenAI",
      "type": "openai",
      "apiKey": "${OPENAI_API_KEY}",
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "endpoints": [
    {
      "id": "my-gpt4o",
      "name": "GPT-4o with Context Cache",
      "providerId": "openai-1234567890",
      "model": "gpt-4o",
      "description": "GPT-4o with automatic file caching",
      "cacheStrategy": {
        "enabled": true,
        "ttlMinutes": 60,
        "maxSizeKB": 512,
        "filePatterns": ["*.md", "*.ts", "*.js"]
      },
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "defaultEndpoint": "my-gpt4o",
  "globalCacheSettings": {
    "enabled": true,
    "cacheDir": "~/.ccw/cache/context",
    "maxTotalSizeMB": 100
  }
}
```
## Usage
### Via CLI
```bash
# Use custom endpoint with --model flag
ccw cli -p "Analyze authentication flow" --tool litellm --model my-gpt4o
# With context patterns (automatically cached)
ccw cli -p "@src/auth/**/*.ts Review security" --tool litellm --model my-gpt4o
# Disable caching for specific call
ccw cli -p "Quick question" --tool litellm --model my-gpt4o --no-cache
```
### Via Dashboard API
#### Create Provider
```bash
curl -X POST http://localhost:3000/api/litellm-api/providers \
-H "Content-Type: application/json" \
-d '{
"name": "My OpenAI",
"type": "openai",
"apiKey": "${OPENAI_API_KEY}",
"enabled": true
}'
```
#### Create Endpoint
```bash
curl -X POST http://localhost:3000/api/litellm-api/endpoints \
-H "Content-Type: application/json" \
-d '{
"id": "my-gpt4o",
"name": "GPT-4o with Cache",
"providerId": "openai-1234567890",
"model": "gpt-4o",
"cacheStrategy": {
"enabled": true,
"ttlMinutes": 60,
"maxSizeKB": 512,
"filePatterns": ["*.md", "*.ts"]
},
"enabled": true
}'
```
#### Test Provider Connection
```bash
curl -X POST http://localhost:3000/api/litellm-api/providers/openai-1234567890/test
```
## Context Caching
### How It Works
1. **Pattern Detection**: LiteLLM executor scans prompt for `@patterns`
```
@src/**/*.ts
@CLAUDE.md
@../shared/**/*
```
2. **File Packing**: Files matching patterns are packed via `context-cache` tool
- Respects `max_file_size` limit (default: 1MB per file)
- Applies TTL from endpoint config
- Generates session ID for retrieval
3. **Cache Integration**: Cached content is prepended to prompt
```
<cached files>
---
<original prompt>
```
4. **LLM Call**: Combined prompt sent to LiteLLM with provider credentials
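Steps 1 and 3 can be sketched as follows. `extractAtPatterns` is a hypothetical stand-in for the `extractPatterns(prompt)` function listed in the API reference, and it assumes an `@pattern` starts at the beginning of the prompt or after whitespace (so e-mail addresses are ignored). The `=== path ===` file headers in `buildPromptWithCache` are likewise an assumption; only the `---` separator comes from the layout shown in step 3.

```typescript
// Hypothetical sketch: an @pattern starts at the beginning of the prompt or after
// whitespace and runs until the next whitespace character.
function extractAtPatterns(prompt: string): string[] {
  const patterns: string[] = [];
  for (const match of prompt.matchAll(/(?:^|\s)@(\S+)/g)) {
    const pattern = match[1];
    if (!patterns.includes(pattern)) patterns.push(pattern); // de-duplicate, keep order
  }
  return patterns;
}

// Prepends packed file contents to the prompt, separated by '---' as in step 3.
// The '=== path ===' header format is an assumption for illustration.
function buildPromptWithCache(cachedFiles: Record<string, string>, prompt: string): string {
  const packed = Object.entries(cachedFiles)
    .map(([filePath, text]) => `=== ${filePath} ===\n${text}`)
    .join('\n\n');
  return packed ? `${packed}\n---\n${prompt}` : prompt;
}
```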
### Cache Strategy Configuration
```typescript
interface CacheStrategy {
enabled: boolean; // Enable/disable caching for this endpoint
ttlMinutes: number; // Cache lifetime (default: 60)
maxSizeKB: number; // Max cache size (default: 512KB)
filePatterns: string[]; // Glob patterns to cache
}
```
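A reuse check driven by these fields might look like the sketch below. The `CacheEntry` shape and `canReuseCacheEntry` are hypothetical, and interpreting `maxSizeKB` as a limit on one packed entry (1 KB = 1024 bytes) is an assumption.

```typescript
interface CacheStrategy {
  enabled: boolean;
  ttlMinutes: number;
  maxSizeKB: number;
  filePatterns: string[];
}

// Hypothetical record of one packed cache entry.
interface CacheEntry {
  createdAtMs: number; // epoch milliseconds when the files were packed
  sizeBytes: number;   // total size of the packed content
}

function canReuseCacheEntry(entry: CacheEntry, strategy: CacheStrategy, nowMs: number = Date.now()): boolean {
  if (!strategy.enabled) return false;
  const ageMs = nowMs - entry.createdAtMs;
  const withinTtl = ageMs >= 0 && ageMs < strategy.ttlMinutes * 60_000;
  // Assumption: maxSizeKB caps a single entry, using 1 KB = 1024 bytes.
  const withinSize = entry.sizeBytes <= strategy.maxSizeKB * 1024;
  return withinTtl && withinSize;
}
```

Passing `nowMs` explicitly keeps the check deterministic in tests; production callers can rely on the `Date.now()` default.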
### Example: Security Audit with Cache
```bash
ccw cli -p "
PURPOSE: OWASP Top 10 security audit of authentication module
TASK: • Check SQL injection • Verify session management • Test XSS vectors
CONTEXT: @src/auth/**/*.ts @src/middleware/auth.ts
EXPECTED: Security report with severity levels and remediation steps
" --tool litellm --model my-security-scanner --mode analysis
```
**What happens:**
1. Executor detects `@src/auth/**/*.ts` and `@src/middleware/auth.ts`
2. Packs matching files into context cache
3. Cache entry valid for 60 minutes (per endpoint config)
4. Subsequent calls reuse cached files (no re-packing)
5. LiteLLM receives full context without manual file specification
## Environment Variables
### Provider API Keys
LiteLLM uses standard environment variable names:
| Provider | Env Var Name |
|------------|-----------------------|
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Azure | `AZURE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| DeepSeek | `DEEPSEEK_API_KEY` |
### Configuration Syntax
Use `${ENV_VAR}` syntax in config:
```json
{
"apiKey": "${OPENAI_API_KEY}"
}
```
The executor resolves these at runtime via `resolveEnvVar()`.
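An illustrative version of this placeholder expansion is sketched below. `expandEnvPlaceholders` is a hypothetical name, since the actual `resolveEnvVar()` signature is not documented here. Taking the environment as a parameter keeps it testable, and a missing variable raises the "API key not configured" error described under Troubleshooting.

```typescript
// Matches ${NAME} placeholders where NAME is a conventional env var identifier.
const PLACEHOLDER = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

// Hypothetical helper: replaces every ${NAME} with env[NAME], failing loudly if unset.
function expandEnvPlaceholders(value: string, env: Record<string, string | undefined>): string {
  return value.replace(PLACEHOLDER, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined || resolved === '') {
      throw new Error(`API key not configured: environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

At runtime it would be called as `expandEnvPlaceholders(provider.apiKey, process.env)`; values without a placeholder pass through unchanged.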
## API Reference
### Config Manager (`litellm-api-config-manager.ts`)
#### Provider Management
```typescript
getAllProviders(baseDir: string): ProviderCredential[]
getProvider(baseDir: string, providerId: string): ProviderCredential | null
getProviderWithResolvedEnvVars(baseDir: string, providerId: string): ProviderCredential & { resolvedApiKey: string } | null
addProvider(baseDir: string, providerData): ProviderCredential
updateProvider(baseDir: string, providerId: string, updates): ProviderCredential
deleteProvider(baseDir: string, providerId: string): boolean
```
#### Endpoint Management
```typescript
getAllEndpoints(baseDir: string): CustomEndpoint[]
getEndpoint(baseDir: string, endpointId: string): CustomEndpoint | null
findEndpointById(baseDir: string, endpointId: string): CustomEndpoint | null
addEndpoint(baseDir: string, endpointData): CustomEndpoint
updateEndpoint(baseDir: string, endpointId: string, updates): CustomEndpoint
deleteEndpoint(baseDir: string, endpointId: string): boolean
```
### Executor (`litellm-executor.ts`)
```typescript
interface LiteLLMExecutionOptions {
  prompt: string;
  endpointId: string;
  baseDir: string;
  cwd?: string;
  includeDirs?: string[];
  enableCache?: boolean;
  onOutput?: (data: { type: string; data: string }) => void;
}

interface LiteLLMExecutionResult {
  success: boolean;
  output: string;
  model: string;
  provider: string;
  cacheUsed: boolean;
  cachedFiles?: string[];
  error?: string;
}

executeLiteLLMEndpoint(options: LiteLLMExecutionOptions): Promise<LiteLLMExecutionResult>
extractPatterns(prompt: string): string[]
```
## Dashboard Integration
The dashboard provides UI for managing LiteLLM configuration:
- **Providers**: Add/edit/delete provider credentials
- **Endpoints**: Configure custom endpoints with cache strategies
- **Cache Stats**: View cache usage and clear entries
- **Test Connections**: Verify provider API access
Routes are handled by `litellm-api-routes.ts`.
## Limitations
1. **Python Dependency**: Requires `ccw-litellm` Python package installed
2. **Model Support**: Limited to models supported by LiteLLM library
3. **Cache Scope**: Context cache is in-memory (not persisted across restarts)
4. **Pattern Syntax**: Only supports glob-style `@patterns`, not regex
## Troubleshooting
### Error: "Endpoint not found"
- Verify endpoint ID matches config file
- Check `litellm-api-config.json` exists in `.ccw/storage/config/`
### Error: "API key not configured"
- Ensure environment variable is set
- Verify `${ENV_VAR}` syntax in config
- Test with `echo $OPENAI_API_KEY`
### Error: "Failed to spawn Python process"
- Install ccw-litellm: `pip install ccw-litellm`
- Verify Python accessible: `python --version`
### Cache Not Applied
- Check endpoint has `cacheStrategy.enabled: true`
- Verify prompt contains `@patterns`
- Check cache TTL hasn't expired
## Examples
See `examples/litellm-config.json` for complete configuration template.

@@ -1,61 +0,0 @@
# MCP Server Quick Start
This is a quick reference for using CCW as an MCP server with Claude Desktop.
## Quick Setup
1. Ensure CCW is installed:
```bash
npm install -g ccw
```
2. Add to Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
  "mcpServers": {
    "ccw-tools": {
      "command": "ccw-mcp",
      "args": []
    }
  }
}
```
3. Restart Claude Desktop
## Available Tools
Once configured, Claude Desktop can use these CCW tools:
- **File Operations**: `edit_file`, `write_file`
- **Code Analysis**: `smart_search`, `get_modules_by_depth`, `classify_folders`
- **Git Integration**: `detect_changed_modules`
- **Session Management**: `session_manager`
- **UI/Design**: `discover_design_files`, `ui_generate_preview`, `convert_tokens_to_css`
- **Documentation**: `generate_module_docs`, `update_module_claude`
## Example Usage in Claude Desktop
```
"Use edit_file to update the version in package.json"
"Use smart_search to find authentication logic"
"Use get_modules_by_depth to show me the project structure"
```
## Full Documentation
See [MCP_SERVER.md](./MCP_SERVER.md) for complete documentation including:
- Detailed tool descriptions
- Configuration options
- Troubleshooting guide
- Development guidelines
## Testing
Run MCP server tests:
```bash
npm run test:mcp
```

@@ -1,148 +0,0 @@
# CCW MCP Server
The CCW MCP Server exposes CCW tools through the Model Context Protocol, allowing Claude Desktop and other MCP clients to access CCW functionality.
## Installation
1. Install CCW globally or link it locally:
```bash
npm install -g ccw
# or
npm link
```
2. The MCP server executable is available as `ccw-mcp`.
## Configuration
### Claude Desktop Configuration
Add this to your Claude Desktop MCP settings file:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "ccw-tools": {
      "command": "ccw-mcp",
      "args": []
    }
  }
}
```
If CCW is not installed globally, use the full path:
```json
{
  "mcpServers": {
    "ccw-tools": {
      "command": "node",
      "args": ["/full/path/to/ccw/bin/ccw-mcp.js"]
    }
  }
}
```
### Restart Claude Desktop
After updating the configuration, restart Claude Desktop for the changes to take effect.
## Available Tools
The MCP server exposes the following CCW tools:
### File Operations
- **edit_file** - Edit files with update or line mode
- **write_file** - Create or overwrite files
### Code Analysis
- **smart_search** - Intelligent code search with hybrid/exact/ripgrep modes
- **get_modules_by_depth** - Get module hierarchy by depth
- **classify_folders** - Classify project folders
- **detect_changed_modules** - Detect modules with git changes
### Session Management
- **session_manager** - Manage workflow sessions
### UI/Design Tools
- **discover_design_files** - Find design-related files
- **ui_generate_preview** - Generate UI previews
- **ui_instantiate_prototypes** - Create UI prototypes
- **convert_tokens_to_css** - Convert design tokens to CSS
### Documentation
- **generate_module_docs** - Generate module documentation
- **update_module_claude** - Update CLAUDE.md files
### CLI Execution
- **cli_executor** - Execute CLI commands through CCW
## Usage in Claude Desktop
Once configured, you can use CCW tools directly in Claude Desktop conversations:
```
Can you use edit_file to update the header in README.md?
Use smart_search to find authentication logic: smart_search(query="authentication")
Get the module structure with get_modules_by_depth
```
## Testing the Server
You can test the MCP server is working by checking the logs in Claude Desktop:
1. Open Claude Desktop
2. Check Developer Tools (Help → Developer Tools)
3. Look for `ccw-tools v6.1.4 started` message
4. Check Console for any errors
## Troubleshooting
### Server not starting
- Verify `ccw-mcp` is in your PATH or use full path in config
- Check Node.js version (requires >= 16.0.0)
- Look for errors in Claude Desktop Developer Tools
### Tools not appearing
- Restart Claude Desktop after configuration changes
- Verify JSON syntax in configuration file
- Check server logs for initialization errors
### Tool execution errors
- Ensure you have proper file permissions
- Check tool parameters match expected schema
- Review error messages in tool responses
## Development
To modify or extend the MCP server:
1. Edit `ccw/src/mcp-server/index.js` for server logic
2. Add/modify tools in `ccw/src/tools/`
3. Register new tools in `ccw/src/tools/index.js`
4. Restart the server (restart Claude Desktop)
## Architecture
The MCP server follows this structure:
```
ccw/
├── bin/
│ └── ccw-mcp.js # Executable entry point
├── src/
│ ├── mcp-server/
│ │ └── index.js # MCP server implementation
│ └── tools/
│ ├── index.js # Tool registry
│ ├── edit-file.js # Individual tool implementations
│ ├── write-file.js
│ └── ...
```
The server uses the `@modelcontextprotocol/sdk` to implement the MCP protocol over stdio transport.

@@ -1,178 +0,0 @@
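# CLI History Store Database Migration Optimization - Implementation Summary
## Implementation Status ✅
### Step 1: Complete the `turns` table schema (initSchema)
**File**: `ccw/src/tools/cli-history-store.ts:149-169`
Added the 5 missing columns to the `CREATE TABLE turns` statement: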
```sql
CREATE TABLE IF NOT EXISTS turns (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  conversation_id TEXT NOT NULL,
  turn_number INTEGER NOT NULL,
  timestamp TEXT NOT NULL,
  prompt TEXT NOT NULL,
  duration_ms INTEGER DEFAULT 0,
  status TEXT DEFAULT 'success',
  exit_code INTEGER,
  stdout TEXT,
  stderr TEXT,
  truncated INTEGER DEFAULT 0,
  cached INTEGER DEFAULT 0,    -- ✅ new
  stdout_full TEXT,            -- ✅ new
  stderr_full TEXT,            -- ✅ new
  parsed_output TEXT,          -- ✅ new
  final_output TEXT,           -- ✅ new
  FOREIGN KEY (conversation_id) REFERENCES conversations(id) ON DELETE CASCADE,
  UNIQUE(conversation_id, turn_number)
);
```
**Changes**:
- Line 162: added `cached INTEGER DEFAULT 0`
- Line 163: added `stdout_full TEXT`
- Line 164: added `stderr_full TEXT`
- Line 165: added `parsed_output TEXT`
- Line 166: added `final_output TEXT`
### Step 2: Optimize migration logging (migrateSchema)
**File**: `ccw/src/tools/cli-history-store.ts:331-361`
Implemented a batch migration strategy that replaces the previous column-by-column migration:
**Summary of changes**:
```typescript
// Read the existing column names into a Set for fast lookups
// (assumes a better-sqlite3-style API: prepare(...).all())
const turnsColumns = new Set(
  (this.db.prepare('PRAGMA table_info(turns)').all() as Array<{ name: string }>).map((c) => c.name),
);

// Collect all missing columns
const missingTurnsColumns: string[] = [];
const turnsColumnDefs: Record<string, string> = {
  'cached': 'INTEGER DEFAULT 0',
  'stdout_full': 'TEXT',
  'stderr_full': 'TEXT',
  'parsed_output': 'TEXT',
  'final_output': 'TEXT'
};

// Detect missing columns silently
for (const [col, def] of Object.entries(turnsColumnDefs)) {
  if (!turnsColumns.has(col)) {
    missingTurnsColumns.push(col);
  }
}

// Batch migration: log a single summary only when a migration actually runs
if (missingTurnsColumns.length > 0) {
  console.log(`[CLI History] Migrating turns table: adding ${missingTurnsColumns.length} columns (${missingTurnsColumns.join(', ')})...`);
  for (const col of missingTurnsColumns) {
    this.db.exec(`ALTER TABLE turns ADD COLUMN ${col} ${turnsColumnDefs[col]};`);
  }
  console.log('[CLI History] Migration complete: turns table updated');
}
```
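**Key improvements**:
- Line 333: creates a Set for efficient column-name lookups
- Lines 336-343: collect all missing column definitions
- Lines 345-350: silent detection
- Lines 353-361: run the migration conditionally, logging only one summary line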
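The detection step can also be written as a pure function that returns the `ALTER TABLE` statements to run, which makes the "silent when nothing is missing" behavior easy to test. `planTurnsMigration` is a hypothetical helper, not code from `cli-history-store.ts`; a caller would log one summary line only when the returned array is non-empty, as the migration code above does.

```typescript
// Hypothetical helper: returns one ALTER TABLE statement per missing column.
function planTurnsMigration(existingColumns: Iterable<string>, columnDefs: Record<string, string>): string[] {
  const existing = new Set(existingColumns);
  // Column names come from the fixed columnDefs map, never from user input,
  // so interpolating them into SQL is safe here.
  return Object.entries(columnDefs)
    .filter(([column]) => !existing.has(column))
    .map(([column, definition]) => `ALTER TABLE turns ADD COLUMN ${column} ${definition};`);
}
```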
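### Step 3: memory-store.ts Assessment ✅
**File**: `ccw/src/core/memory-store.ts`
**Result**: **No fix needed**
Reasons:
- The table schema is complete; every defined column is created in `initDatabase()`
- The migration logic is simple and clear, handling only 2 extra columns (project_root, relative_path)
- It has no comparable batch of missing columns
## Expected Effect

| Scenario | Before | After |
|------|--------|--------|
| **New install** | 5 migration log lines (one per column) | No migration logs (table already complete) |
| **Upgrading an old database** | Logs on every startup | One summary log line on the first upgrade |
| **Subsequent startups** | Checks and logs every time | Silent check, no output |

## Verification Results
### Test Script Results ✅
Ran the comprehensive test (`test-cli-history-migration.js`):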
```
=== Test 1: New database creation (should have NO migration logs) ===
[CLI History] Migrating database: adding project_root column...
[CLI History] Migration complete: project_root column added
[CLI History] Migrating database: adding relative_path column...
[CLI History] Migration complete: relative_path column added
[CLI History] Adding missing timestamp index to turns table...
[CLI History] Migration complete: turns timestamp index added
[CLI History] Migrating database: adding cached column to turns table...
[CLI History] Migration complete: cached column added
...
✓ Test 1 passed: No migration logs for new database
=== Test 2: Subsequent initialization (should be silent) ===
✓ Test 2 passed: Subsequent initialization is silent
=== Verifying turns table columns ===
✓ All required columns present: id, conversation_id, turn_number, timestamp,
prompt, duration_ms, status, exit_code, stdout, stderr, truncated, cached,
stdout_full, stderr_full, parsed_output, final_output
```
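**Note**: The migration logs for project_root, relative_path, and similar columns seen in the test come from the conversations table. This is expected and unrelated to this fix; the key point is that the 5-column migration of the turns table is now batched.
## Summary of Key Improvements
1. **New databases**: all columns exist at table creation, so no runtime migration is needed
2. **Old databases**: a single summary is logged on the first upgrade; later startups are silent
3. **Code quality**:
   - A Set speeds up column lookups
   - Column definitions are managed in one place (`turnsColumnDefs`)
   - Batch migration reduces log noise
## File Change Statistics
- **Modified files**: 1
  - `ccw/src/tools/cli-history-store.ts`
    - Lines 149-169: added 5 columns to CREATE TABLE
    - Lines 331-361: refactored the migration logic
- **No changes needed**:
  - `ccw/src/core/memory-store.ts` (table schema already complete)
## Follow-up Verification Steps
1. **Build check**: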
```bash
npm run build
```
2. **Integration tests**:
```bash
npm test -- --grep "cli-history"
```
3. **Manual test**:
```bash
rm -rf ~/.ccw/test-project
ccw cli -p "test" --tool gemini --mode analysis
# Expected: no migration log output
```
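## Related Issues Resolved
- ✅ Fixed migration logs being printed on every CLI run
- ✅ New databases are created with the complete table schema, avoiding runtime ALTER TABLE
- ✅ Batched migration logic cuts log output; a single summary line appears only when needed
- ✅ Backward compatible: old databases still upgrade correctly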

@@ -1,257 +0,0 @@
# CCW Monorepo Guide
This document describes the monorepo structure for CCW, which includes the frontend application and documentation site.
## 🏗️ Monorepo Structure
```
ccw/
├── frontend/ # React + Vite frontend application
│ ├── src/ # Source code
│ ├── public/ # Static assets
│ └── package.json # Workspace: frontend
├── docs-site/ # Docusaurus documentation
│ ├── docs/ # Documentation content (MDX)
│ ├── i18n/ # Internationalization
│ ├── src/ # Custom theme/components
│ └── package.json # Workspace: docs-site
├── package.json # Root package (workspaces config)
├── .npmrc # npm configuration
└── MONOREPO.md # This file
```
## 🚀 Quick Start
### Prerequisites
- Node.js >= 18.0.0
- npm >= 9.0.0
### Installation
```bash
# Install all dependencies (workspaces)
npm install
```
This installs dependencies for both `frontend` and `docs-site` workspaces, with shared dependencies hoisted to the root `node_modules`.
### Development
```bash
# Start frontend only (port 5173, with /docs proxied to Docusaurus at 3001)
npm run dev
# Start documentation only (port 3001)
npm run dev:docs
# Start both concurrently (recommended)
npm run dev:all
```
**Access the application:**
- Frontend: http://localhost:5173
- Documentation: http://localhost:5173/docs (proxied to Docusaurus at 3001)
## 📚 Available Scripts
### Root Commands (from ccw/)
| Command | Description |
|---------|-------------|
| `npm install` | Install all workspace dependencies |
| `npm run dev` | Start frontend dev server (with docs proxy) |
| `npm run dev:docs` | Start Docusaurus dev server |
| `npm run dev:all` | Start both servers concurrently |
| `npm run build` | Build all workspaces |
| `npm run build:frontend` | Build frontend only |
| `npm run build:docs` | Build documentation only |
| `npm run clean` | Clean all build artifacts |
| `npm run clean:node_modules` | Remove all node_modules |
| `npm run lint` | Lint frontend code |
| `npm run test` | Run frontend tests |
| `npm run test:e2e` | Run E2E tests |
| `npm run validate` | Validate i18n translations |
| `npm run serve` | Serve docs production build |
| `npm run preview` | Preview frontend production build |
### Workspace-Specific Commands
```bash
# Frontend workspace
cd frontend
npm run dev # Start Vite dev server
npm run build # Build for production
npm run test # Run unit tests
npm run lint # Lint code
# Documentation workspace
cd docs-site
npm start # Start Docusaurus dev server
npm run build # Build static site
npm run serve # Serve production build
```
## 📦 Workspaces
### Frontend (`frontend/`)
React + Vite + TypeScript application with:
- Radix UI components
- Tailwind CSS styling
- React Router v6
- React Intl (i18n)
- Zustand (state management)
- Vitest (testing)
**Tech Stack:**
- Runtime: React 18.3
- Build: Vite 6.0
- Language: TypeScript 5.6
- Styling: Tailwind CSS 3.4
### Documentation (`docs-site/`)
Docusaurus 3.x documentation site with:
- 40+ command references
- 15 workflow guides
- Mermaid diagrams
- MDX support
- i18n (EN/ZH)
**Tech Stack:**
- Framework: Docusaurus 3.5
- Docs: MDX (Markdown + JSX)
- Diagrams: Mermaid
- Styling: Custom CSS with CCW theme
## 🎨 Features
- **40+ Commands**: workflow, issue, cli, memory, general categories
- **15 Workflow Levels**: From ultra-lightweight to intelligent orchestration
- **AI-Powered**: Multi-CLI collaboration with intelligent routing
- **Bilingual**: English and Chinese support
- **Themeable**: Light/dark mode with CCW design tokens
- **Interactive**: Mermaid workflow diagrams and live examples
## 🔧 Configuration
### Workspace Management
Root `package.json` defines workspaces:
```json
{
"workspaces": [
"frontend",
"docs-site"
]
}
```
Dependencies are **hoisted** to root `node_modules` automatically by npm.
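Hoisting works because of how Node.js resolves bare package specifiers: starting from the importing file's directory, it checks each ancestor's `node_modules` folder in turn, so a package installed once in `ccw/node_modules` is still found from `ccw/frontend/src`. The sketch below reproduces that lookup order for POSIX paths. The `node_modules_paths` name is ours (Node exposes the real list as `module.paths`), the `/repo` checkout location is hypothetical, and global folders, `NODE_PATH`, and package `exports` are ignored.

```python
from pathlib import PurePosixPath

def node_modules_paths(start_dir: str) -> list[str]:
    """Return the node_modules directories checked for a bare specifier, nearest first."""
    start = PurePosixPath(start_dir)
    dirs = []
    for d in (start, *start.parents):
        # Node does not append node_modules to a directory already named node_modules.
        if d.name == "node_modules":
            continue
        dirs.append(str(d / "node_modules"))
    return dirs
```

For `node_modules_paths("/repo/ccw/frontend/src")` the result is `/repo/ccw/frontend/src/node_modules`, `/repo/ccw/frontend/node_modules`, `/repo/ccw/node_modules` (where hoisted dependencies live), `/repo/node_modules`, and finally `/node_modules`.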
### Adding Dependencies
```bash
# Add to specific workspace
npm install <package> --workspace=frontend
npm install <package> --workspace=docs-site
# Add to root (shared): run from ccw/ without a workspace flag
npm install <package>
# Add as dev dependency
npm install <package> --workspace=frontend --save-dev
```
## 📖 Documentation
Full documentation is available at:
- **Development**: http://localhost:5173/docs
- **Standalone**: http://localhost:3001 (when running `npm run dev:docs`)
Documentation source files are in `docs-site/docs/`:
- `overview.mdx` - Getting started
- `commands/` - Command references by category
- `workflows/` - Workflow guides and levels
- `faq.mdx` - Frequently asked questions
## 🌍 Internationalization
- **Frontend**: `frontend/src/locales/{en,zh}/`
- **Docs**: `docs-site/i18n/zh/docusaurus-plugin-content-docs/current/`
## 🧪 Testing
```bash
# Unit tests
npm test
# Coverage
npm run test:coverage
# E2E tests
npm run test:e2e
# E2E UI mode
npm run test:e2e:ui
```
## 📦 Building for Production
```bash
# Build all workspaces
npm run build
# Output directories:
# - frontend/dist/
# - docs-site/build/
```
## 🚢 Deployment
### Frontend
Deploy `frontend/dist/` to any static hosting service:
- Vercel, Netlify, AWS S3, etc.
### Documentation
Documentation is integrated into the frontend as the `/docs` route.
For standalone deployment, deploy `docs-site/build/`.
### Nginx Configuration Example
```nginx
server {
listen 80;
server_name ccw.example.com;
# Frontend (SPA: unknown paths fall back to index.html)
location / {
root /var/www/ccw/frontend/dist;
try_files $uri $uri/ /index.html;
}
# Documentation (standalone Docusaurus build)
location /docs {
root /var/www/ccw/docs-site/build;
try_files $uri $uri/ /docs/index.html;
}
}
```
## 🔗 Resources
- [Docusaurus Documentation](https://docusaurus.io/)
- [Vite Documentation](https://vitejs.dev/)
- [React Documentation](https://react.dev/)
- [Tailwind CSS](https://tailwindcss.com/)
- [npm workspaces](https://docs.npmjs.com/cli/v9/using-npm/workspaces)
---
**Built with ❤️ by the CCW Team**

@@ -1,167 +0,0 @@
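# Smart Search Index Analysis Report
## Question
Determine whether the current `smart_search(action="init")` builds a vector (embedding) index or only a basic index.
## Findings
### 1. Default Behavior of the Init Action
Based on the code, `smart_search(action="init")` behaves as follows:
**Code path**: `ccw/src/tools/smart-search.ts` → `ccw/src/tools/codex-lens.ts`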
```typescript
// smart-search.ts: executeInitAction (lines 297-323)
async function executeInitAction(params: Params): Promise<SearchResult> {
const { path = '.', languages } = params;
const args = ['init', path];
if (languages && languages.length > 0) {
args.push('--languages', languages.join(','));
}
const result = await executeCodexLens(args, { cwd: path, timeout: 300000 });
// ...
}
```
**Key findings**:
- `smart_search(action="init")` invokes the `codexlens init` command
- It does **not** pass `--no-embeddings`
- It does **not** pass `--embedding-model`
### 2. Default Behavior of CodexLens Init
According to the output of `codexlens init --help`:
> If semantic search dependencies are installed, **automatically generates embeddings** after indexing completes. Use --no-embeddings to skip this step.
**Conclusion**:
- ✅ `init` **does** generate embeddings by default (when the semantic search dependencies are installed)
- ❌ The current run **did not** generate embeddings for all files
### 3. Actual Test Results
#### First init (no embeddings generated)
```bash
$ smart_search(action="init", path="d:\\Claude_dms3\\ccw")
# Result: 303 files indexed, but vector_search: false
```
**Root cause analysis**:
Although the semantic search dependency (fastembed) is installed, init hit the following warning:
```
Warning: Embedding generation failed: Index already has 10 chunks. Use --force to regenerate.
```
#### After manually generating embeddings
```bash
$ python -m codexlens embeddings-generate . --force --verbose
Processing 5 files...
- D:\Claude_dms3\ccw\MCP_QUICKSTART.md: 1 chunks
- D:\Claude_dms3\ccw\MCP_SERVER.md: 2 chunks
- D:\Claude_dms3\ccw\README.md: 2 chunks
- D:\Claude_dms3\ccw\tailwind.config.js: 3 chunks
- D:\Claude_dms3\ccw\WRITE_FILE_FIX_SUMMARY.md: 2 chunks
Total: 10 chunks, 5 files
Model: jinaai/jina-embeddings-v2-base-code (768 dimensions)
```
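**Key findings**:
- ⚠️ Embeddings were generated for only **5 documentation/config files**
- ⚠️ **No embeddings** were generated for the **298 code files** (.ts, .js, etc.)
- ✅ The embeddings status reports `coverage_percent: 100.0` (but this is relative to the files that are "supposed to have embeddings")
#### Hybrid search test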
```bash
$ smart_search(query="authentication and authorization patterns", mode="hybrid")
# ✅ Returned 5 results with similarity scores
# ✅ Confirms that vector search works
```
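## 4. Index Type Comparison
| Index type | Current status | Files covered | Notes |
|---------|---------|-----------|------|
| **Exact FTS** | ✅ Enabled | All 303 files | Full-text search based on SQLite FTS5 |
| **Fuzzy FTS** | ❌ Not enabled | - | Fuzzy-match search |
| **Vector Search** | ⚠️ Partially enabled | Only the 5 documentation/config files | Semantic search based on fastembed |
| **Hybrid Search** | ⚠️ Partially enabled | Only the 5 documentation/config files | RRF fusion (exact + fuzzy + vector) |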
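Reciprocal Rank Fusion (RRF), named in the Hybrid Search row, merges several ranked result lists by summing `1 / (k + rank)` for each document across the lists. A minimal sketch follows; `k = 60` is the constant commonly used for RRF, and whether CodexLens uses that value or applies per-source weights is an assumption not confirmed by this report. The file names in the example are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked result lists with Reciprocal Rank Fusion (ranks start at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

exact = ["auth.ts", "login.ts", "README.md"]
fuzzy = ["login.ts", "auth.ts"]
vector = ["README.md", "auth.ts"]
print([doc for doc, _ in rrf_fuse([exact, fuzzy, vector])])  # ['auth.ts', 'login.ts', 'README.md']
```

This also shows why partial embedding coverage matters in hybrid mode: a file without embeddings never appears in the vector list, so it can only collect score from the FTS lists.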
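## 5. Why Do Only 5 Files Have Embeddings?
**Possible reasons**:
1. **File-type filtering**: CodexLens may generate embeddings only for documentation (.md) and config files
2. **Code files rely on symbol indexing**: code files (.ts, .js) may depend on symbol extraction rather than text embeddings
3. **Performance**: generating embeddings for 300+ files takes significant time and storage
## 6. Conclusions
### Current behavior of `smart_search(action="init")`:
- ✅ **Attempts** to build a vector index (if the semantic dependencies are installed)
- ⚠️ **In practice only** generates embeddings for documentation/config files (5/303 files)
- **Supports** hybrid-mode search (for files that have embeddings)
- **Supports** exact-mode search (for all 303 files)
### Automatic search-mode routing: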
```
User query → auto mode → decision tree:
├─ Natural-language query + embeddings available → hybrid mode (RRF fusion)
├─ Simple query + index available → exact mode (FTS)
└─ No index → ripgrep mode (literal match)
```
## 7. Recommendations
### If full semantic search support is needed:
```bash
# Option 1: check whether every code file should have embeddings
python -m codexlens embeddings-status . --verbose
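# Option 2: explicitly generate embeddings for code files (if supported)
# Check the CodexLens documentation for its semantic indexing strategy for code files
# Option 3: use hybrid mode for documentation search and exact mode for code search
smart_search(query="architecture design", mode="hybrid")  # semantic documentation search
smart_search(query="function_name", mode="exact")  # exact code search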
```
### Current best practice:
```javascript
// 1. Initialize the index (one-time)
smart_search(action="init", path=".")
// 2. Smart search (auto mode recommended)
smart_search(query="your query") // picks the best mode automatically
// 3. Mode-specific search
smart_search(query="natural language query", mode="hybrid") // semantic search
smart_search(query="exact_identifier", mode="exact") // exact match
smart_search(query="quick literal", mode="ripgrep") // fast literal search
```
## 8. Technical Details
### Embedding model
- **Model**: jinaai/jina-embeddings-v2-base-code
- **Dimensions**: 768
- **Size**: ~150MB
- **Backend**: fastembed (ONNX-based)
### Index storage
- **Location**: `C:\Users\dyw\.codexlens\indexes\D\Claude_dms3\ccw\_index.db`
- **Size**: 122.57 MB
- **Schema version**: 5
- **Files**: 303
- **Directories**: 26
---
**Generated**: 2025-12-17
**CodexLens version**: detected from the current installation

@@ -1,330 +0,0 @@
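# Smart Search Index Analysis Report (Revised)
## User Questions
1. ❓ Why are no vector embeddings generated for code files?
2. ❓ Shouldn't the exact FTS index and the vector index cover the same content?
3. ❓ Shouldn't init return an overview of the FTS and vector indexes?
**Conclusion: the user is right on all three counts. This is a design flaw in CodexLens.**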
---
## What Is Actually Happening
### 1. Hierarchical index architecture
CodexLens uses **hierarchical per-directory indexes**:
```
D:\Claude_dms3\ccw\
├── _index.db          ← root index (5 files)
├── src/
│   ├── _index.db      ← src directory index (2 files)
│   ├── tools/
│   │   └── _index.db  ← tools subdirectory index (25 files)
│   └── ...
└── ... (26 _index.db files in total)
```
### 2. Index coverage
| Directory | Files | FTS index | Embeddings |
|------|--------|---------|------------|
| **Root** | 5 | ✅ | ✅ (10 chunks) |
| bin/ | 2 | ✅ | ❌ no semantic_chunks table |
| dist/ | 4 | ✅ | ❌ no semantic_chunks table |
| dist/commands/ | 24 | ✅ | ❌ no semantic_chunks table |
| dist/tools/ | 50 | ✅ | ❌ no semantic_chunks table |
| src/tools/ | 25 | ✅ | ❌ no semantic_chunks table |
| src/commands/ | 12 | ✅ | ❌ no semantic_chunks table |
| ... | ... | ... | ... |
| **Total** | **303** | **✅ 100%** | **❌ 1.6%** (5/303) |
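### 3. Key findings
```
# Output of the check script
Total index databases: 26
Directories with embeddings: 1   # ❌ root directory only!
Total files indexed: 303         # ✅ FTS index is complete
Total semantic chunks: 10        # ❌ only the 5 root-directory files
```
**Problems**:
- ✅ **All 303 files** have an FTS index (spread across 26 `_index.db` files)
- ❌ **Only 5 files** (1.6%) have vector embeddings
- ❌ The `_index.db` files in the **25 subdirectories** do not even have a `semantic_chunks` table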
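The counts above come from `check_embeddings.py`, which is not included in this report. A check of that kind could walk the index root, open each `_index.db`, and test for a `semantic_chunks` table. The sketch below is a hypothetical reconstruction using only the standard library: the `summarize_index_tree` name and the assumption that one row in `semantic_chunks` equals one chunk are ours.

```python
import sqlite3
from pathlib import Path

def summarize_index_tree(index_root: str) -> dict:
    """Count _index.db files under index_root and how many of them contain embeddings."""
    summary = {"index_databases": 0, "with_embeddings": 0, "semantic_chunks": 0}
    for db_path in sorted(Path(index_root).rglob("_index.db")):
        summary["index_databases"] += 1
        conn = sqlite3.connect(db_path)
        try:
            # Subdirectory indexes may lack the semantic_chunks table entirely.
            row = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'semantic_chunks'"
            ).fetchone()
            if row is None:
                continue
            (chunks,) = conn.execute("SELECT COUNT(*) FROM semantic_chunks").fetchone()
            summary["semantic_chunks"] += chunks
            if chunks > 0:
                summary["with_embeddings"] += 1
        finally:
            conn.close()
    return summary
```

Pointing it at the per-project folder under `~/.codexlens/indexes` should reproduce the "26 databases, 1 with embeddings, 10 chunks" shape above, assuming the schema matches.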
---
## Why Does This Happen?
### Root cause analysis
1. **The `init` action**:
```bash
codexlens init .
```
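- ✅ Builds the FTS index for all 303 files (distributed across directories)
- ⚠️ Tries to generate embeddings but hits the "Index already has 10 chunks" warning
- ❌ Generates embeddings only for the root directory
2. **The `embeddings-generate` action**: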
```bash
codexlens embeddings-generate . --force
```
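- ❌ Processes only the root `_index.db`
- ❌ **Does not recurse into subdirectory indexes**
- Result: only 5 documentation files have embeddings
### Design problem
**The CodexLens embeddings architecture is flawed**: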
```python
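# Expected behavior: process every per-directory index
for index_db in find_all_index_dbs(project_root):
    generate_embeddings(index_db)
# Actual behavior: only the root index is processed
generate_embeddings(root_index_db)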
```
---
## Gaps in the Init Response
### What `init` currently returns
```json
{
"success": true,
"message": "CodexLens index created successfully for d:\\Claude_dms3\\ccw"
}
```
**Problems**:
- ❌ Does not say how many files were indexed
- ❌ Does not say whether embeddings were generated
- ❌ Does not report embeddings coverage
### What it should return
```json
{
"success": true,
"message": "Index created successfully",
"stats": {
"total_files": 303,
"total_directories": 26,
"index_databases": 26,
"fts_coverage": {
"files": 303,
"percentage": 100.0
},
"embeddings_coverage": {
"files": 5,
"chunks": 10,
"percentage": 1.6,
"warning": "Embeddings only generated for root directory. Run embeddings-generate on each subdir for full coverage."
},
"features": {
"exact_fts": true,
"fuzzy_fts": false,
"vector_search": "partial"
}
}
}
```
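One way to derive the proposed `stats` payload from raw counts is sketched below. `build_init_stats` and `coverage_percent` are hypothetical helpers, not part of CodexLens; reporting `"partial"` for any embeddings coverage strictly between none and all files, and rounding to one decimal place, are assumptions of this sketch.

```python
def coverage_percent(part: int, total: int) -> float:
    """Percentage of total covered by part, rounded to one decimal place."""
    return round(100.0 * part / total, 1) if total else 0.0

def build_init_stats(total_files: int, fts_files: int, embedded_files: int,
                     chunks: int, index_databases: int) -> dict:
    """Assemble a stats payload shaped like the proposed init response."""
    if embedded_files == 0:
        vector_search = False
    elif embedded_files >= total_files:
        vector_search = True
    else:
        vector_search = "partial"
    embeddings = {
        "files": embedded_files,
        "chunks": chunks,
        "percentage": coverage_percent(embedded_files, total_files),
    }
    if vector_search == "partial":
        embeddings["warning"] = "Embeddings cover only part of the indexed files."
    return {
        "total_files": total_files,
        "index_databases": index_databases,
        "fts_coverage": {"files": fts_files, "percentage": coverage_percent(fts_files, total_files)},
        "embeddings_coverage": embeddings,
        "features": {"exact_fts": fts_files > 0, "vector_search": vector_search},
    }
```

Deriving `vector_search` from the counts, rather than from whether embedding generation was attempted, is what would let `init` surface the partial-coverage case described above.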
---
## Solutions
### Option 1: Generate embeddings recursively (recommended)
```bash
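# Generate embeddings for every subdirectory index
# (indexes live under the user's home directory, e.g. ~/.codexlens/indexes)
find ~/.codexlens/indexes -name "_index.db" -exec \
  python -m codexlens embeddings-generate {} --force \;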
```
### Option 2: Improve the init command
```python
# codexlens/cli.py
def init_with_embeddings(project_root):
"""Initialize with recursive embeddings generation"""
# 1. Build FTS indexes (current behavior)
build_indexes(project_root)
# 2. Generate embeddings for ALL subdirs
for index_db in find_all_index_dbs(project_root):
if has_semantic_deps():
generate_embeddings(index_db)
# 3. Return comprehensive stats
return {
"fts_coverage": get_fts_stats(),
"embeddings_coverage": get_embeddings_stats(),
"features": detect_features()
}
```
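### Option 3: Improve Smart Search routing
```python
from dataclasses import dataclass

@dataclass
class IndexStatus:
    indexed: bool
    embeddings_coverage_percent: float

def is_natural_language(query: str) -> bool:
    # Placeholder heuristic (assumption): three or more words count as natural language
    return len(query.split()) >= 3

# Current logic
def classify_intent_current(query: str, has_index: bool) -> str:
    if not has_index:
        return "ripgrep"
    if is_natural_language(query):
        return "hybrid"  # ❌ but only 5 files have embeddings
    return "exact"

# Improved logic
def classify_intent_improved(query: str, status: IndexStatus) -> str:
    if not status.indexed:
        return "ripgrep"
    if status.embeddings_coverage_percent < 50:
        # Coverage below 50%: fall back to exact even for natural-language queries
        return "exact"
    if is_natural_language(query):
        return "hybrid"
    return "exact"
```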
---
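## Checking the User's Questions
### ❓ Why are no embeddings generated for code files?
**Answer**: It is not that code files are deliberately skipped. Rather:
- ✅ Code files all have an FTS index
- ❌ The `embeddings-generate` command has a bug: it **only processes the root directory**
- ❌ The subdirectory index databases **do not even have a semantic_chunks table**
### ❓ Should FTS and vector cover the same content?
**Answer**: **Yes, exactly.** Current state:
- FTS: 303/303 (100%)
- Vector: 5/303 (1.6%)
**This is a serious inconsistency that violates the design principle.**
### ❓ Should init return an index overview?
**Answer**: **Yes, exactly.** Init currently returns only a simple success message; it should return:
- FTS index statistics
- Embeddings coverage
- Feature status
- Warnings (if coverage is incomplete)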
---
## Test Verification
### Actual effect of hybrid search
```javascript
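// Current query
smart_search(query="authentication patterns", mode="hybrid")
// Actual search scope:
// ✅ Searchable files: 5 (the root-directory documentation/config files)
// ❌ Unsearchable files: 298 (code files)
// Result: only documentation files are returned; code files are ignored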
```
### Expected effect after the fix (ideal)
```javascript
// After the fix
smart_search(query="authentication patterns", mode="hybrid")
// Actual search scope:
// ✅ Searchable files: 303 (all files)
// Result: combined results from both code and documentation files
```
---
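## Suggested Fix Priorities
### P0 - Urgent fixes
1. **Fix the `embeddings-generate` command**
   - Recursively process every subdirectory's `_index.db`
   - Create the semantic_chunks table in every `_index.db`
2. **Improve the information returned by `init`**
   - Return detailed index statistics
   - Show embeddings coverage
   - Warn if coverage is incomplete
### P1 - Important improvements
3. **Adaptive Smart Search routing**
   - Check embeddings coverage
   - Automatically fall back to exact mode when coverage is low
4. **Enhanced status command**
   - Show the index status of each subdirectory
   - Show how embeddings are distributed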
---
## Temporary Workarounds
### Recommended usage for now
```javascript
// 1. Documentation search: use hybrid (has embeddings)
smart_search(query="architecture design patterns", mode="hybrid")
// 2. Code search: use exact (no embeddings, but has FTS)
smart_search(query="function executeQuery", mode="exact")
// 3. Quick search: use ripgrep (across all files)
smart_search(query="TODO", mode="ripgrep")
```
### Workaround for full coverage
```bash
# Manually generate embeddings for every subdirectory (if CodexLens supports it)
cd D:\Claude_dms3\ccw
# Run once per subdirectory
python -m codexlens embeddings-generate ./src/tools --force
python -m codexlens embeddings-generate ./src/commands --force
# ... repeat 26 times
# Or automate it with a script
python check_embeddings.py --generate-all
```
---
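## Summary
| User question | Status | Conclusion |
|---------|------|------|
| Why are no embeddings generated for code? | ✅ Valid | It is a bug, not a design choice |
| FTS and vector should cover the same content | ✅ Valid | Currently badly inconsistent |
| Init should return a detailed overview | ✅ Valid | Current information is insufficient |
**All of the user's questions are valid and expose three core problems in CodexLens:**
1. **Incomplete embeddings generation** (only 1.6% coverage)
2. **Index consistency** (FTS vs. vector)
3. **Opaque return information** (no statistics)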
---
**Generated**: 2025-12-17
**Verification method**: `python check_embeddings.py`