mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-05 01:50:27 +08:00

Files

catlog22 d06a3ca12e Add comprehensive workflows for CLI tools usage, coding philosophy, context requirements, file modification, and CodexLens auto hybrid mode

- Introduced a detailed guide for intelligent tools selection strategy, including quick reference, tool specifications, prompt templates, and best practices for CLI execution.
- Established a coding philosophy document outlining core beliefs, simplicity principles, and guidelines for effective coding practices.
- Created context requirements documentation emphasizing the importance of understanding existing patterns and dependencies before implementation.
- Developed a file modification workflow detailing the use of edit_file and write_file MCP tools, along with priority logic for file reading and editing.
- Implemented CodexLens auto hybrid mode, enhancing the CLI with automatic vector embedding generation and default hybrid search mode based on embedding availability.

2025-12-17 09:53:30 +08:00

9.2 KiB

Raw Blame History

CodexLens Auto Hybrid Mode - Implementation Summary

概述

实现了两个主要功能：

自动向量嵌入生成：init 命令在检测到语义搜索依赖后自动生成向量嵌入
默认混合搜索模式：search 命令在检测到嵌入存在时自动使用 hybrid 模式

修改文件

1. codex-lens CLI (`codex-lens/src/codexlens/cli/commands.py`)

1.1 `init` 命令增强

新增参数：

--no-embeddings: 跳过自动嵌入生成
--embedding-model: 指定嵌入模型 (默认: "code")

自动嵌入生成逻辑：

# 在 init 成功后
if not no_embeddings:
    from codexlens.semantic import SEMANTIC_AVAILABLE
    if SEMANTIC_AVAILABLE:
        # 自动调用 generate_embeddings()
        # 使用指定的 embedding_model

行为：

检测 fastembed 和 numpy 是否安装
如果可用，自动生成嵌入（可用 --no-embeddings 跳过）
默认使用 "code" 模型 (jinaai/jina-embeddings-v2-base-code)
在输出中显示嵌入生成进度和统计

1.2 `search` 命令增强

模式变更：

默认模式从 "exact" 改为 "auto"
新增 "auto" 模式到有效模式列表

自动模式检测逻辑：

if mode == "auto":
    # 检查项目是否有嵌入
    project_record = registry.find_by_source_path(str(search_path))
    if project_record:
        embed_status = check_embeddings_status(index_path)
        if has_embeddings:
            actual_mode = "hybrid"  # 使用混合模式
        else:
            actual_mode = "exact"   # 降级到精确模式

行为：

默认使用 auto 模式
自动检测索引是否有嵌入
有嵌入 → 使用 hybrid 模式（精确 + 模糊 + 向量融合）
无嵌入 → 使用 exact 模式（仅全文搜索）
用户仍可手动指定模式覆盖自动检测

2. MCP 工具简化 (`ccw/src/tools/codex-lens.ts`)

2.1 简化 action 枚举

仅暴露核心操作：

init: 初始化索引（自动生成嵌入）
search: 搜索代码（自动混合模式）
search_files: 搜索文件路径

移除的高级操作（仍可通过 CLI 使用）：

~~symbol~~: 符号提取 → 使用 codexlens symbol
~~status~~: 状态检查 → 使用 codexlens status
~~config_show/set/migrate~~: 配置管理 → 使用 codexlens config
~~clean~~: 清理索引 → 使用 codexlens clean
~~bootstrap/check~~: 安装管理 → 自动处理

简化的 ParamsSchema：

const ParamsSchema = z.object({
  action: z.enum(['init', 'search', 'search_files']),
  path: z.string().optional(),
  query: z.string().optional(),
  mode: z.enum(['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector']).default('auto'),
  languages: z.array(z.string()).optional(),
  limit: z.number().default(20),
});

2.2 扩展 mode 枚举并设置默认值

模式支持：

mode: z.enum(['auto', 'text', 'semantic', 'exact', 'fuzzy', 'hybrid', 'vector', 'pure-vector']).default('auto')

模式映射（MCP → CLI）：

const modeMap: Record<string, string> = {
  'text': 'exact',
  'semantic': 'pure-vector',
  'auto': 'auto',         // 默认：自动检测
  'exact': 'exact',
  'fuzzy': 'fuzzy',
  'hybrid': 'hybrid',
  'vector': 'vector',
  'pure-vector': 'pure-vector',
};

2.3 传递 mode 参数到 CLI

const args = ['search', query, '--limit', limit.toString(), '--mode', cliMode, '--json'];

3. 文档更新 (`.claude/rules/context-requirements.md`)

3.1 更新 init 说明

强调自动嵌入生成功能：

**NEW**: `init` automatically generates vector embeddings if semantic dependencies are installed (fastembed).
- Auto-detects if `numpy` and `fastembed` are available
- Uses "code" model by default (jinaai/jina-embeddings-v2-base-code)
- Skip with `--no-embeddings` flag if needed

3.2 更新 search 说明

强调自动混合模式：

**Search Code** (Auto Hybrid Mode - DEFAULT):
# Simple call - auto-detects mode (hybrid if embeddings exist, exact otherwise):
codex_lens(action="search", query="authentication", path=".", limit=20)

3.3 详细模式说明

添加完整的模式列表和默认行为说明：

auto: DEFAULT - Uses hybrid if embeddings exist, exact otherwise
hybrid: Exact + Fuzzy + Vector fusion (best results, auto-selected if embeddings exist)
其他模式...

使用示例

场景 1：首次使用（已安装 fastembed）

# 初始化索引（自动生成嵌入）
codexlens init .

# 输出：
# OK Indexed 150 files in 12 directories
#
# Generating embeddings...
# Model: code
# ✓ Generated 1234 embeddings in 45.2s

# 搜索（自动使用 hybrid 模式）
codexlens search "authentication"
# Mode: hybrid | Searched 12 directories in 15.2ms

场景 2：首次使用（未安装 fastembed）

# 初始化索引（跳过嵌入）
codexlens init .

# 输出：
# OK Indexed 150 files in 12 directories
# (无嵌入生成提示)

# 搜索（降级到 exact 模式）
codexlens search "authentication"
# Mode: exact | Searched 12 directories in 8.5ms

场景 3：手动控制

# 跳过嵌入生成
codexlens init . --no-embeddings

# 强制使用特定模式
codexlens search "auth" --mode exact
codexlens search "how to authenticate" --mode hybrid

场景 4：MCP 工具使用（简化版）

# 初始化（自动生成嵌入）
codex_lens(action="init", path=".")

# 搜索（默认 auto 模式：有嵌入用 hybrid，无嵌入用 exact）
codex_lens(action="search", query="authentication")

# 强制混合模式
codex_lens(action="search", query="authentication", mode="hybrid")

# 强制精确模式
codex_lens(action="search", query="authenticate_user", mode="exact")

# 仅返回文件路径
codex_lens(action="search_files", query="payment processing")

高级操作使用 CLI：

# 检查状态
codexlens status

# 提取符号
codexlens symbol src/auth/login.js

# 配置管理
codexlens config show
codexlens config set index_dir /custom/path

# 清理索引
codexlens clean .

技术细节

嵌入检测逻辑

查找项目在 registry 中的记录
获取索引路径 index_root/_index.db
调用 check_embeddings_status() 检查：
- 是否存在 chunks 表
- chunks_count > 0
根据检测结果选择模式

混合搜索权重

默认 RRF 权重：

Exact FTS: 0.4
Fuzzy FTS: 0.3
Vector: 0.3

可通过 --weights 参数自定义：

codexlens search "query" --mode hybrid --weights 0.5,0.3,0.2

模型选项

模型	模型名称	维度	大小	推荐场景
fast	BAAI/bge-small-en-v1.5	384	~80MB	快速原型
code	jinaai/jina-embeddings-v2-base-code	768	~150MB	推荐代码搜索
multilingual	intfloat/multilingual-e5-large	1024	~1GB	多语言项目
balanced	mixedbread-ai/mxbai-embed-large-v1	1024	~600MB	平衡性能

兼容性

向后兼容

所有现有命令仍然工作
手动指定 --mode 会覆盖自动检测
使用 --no-embeddings 可恢复旧行为

依赖要求

核心功能（无需额外依赖）：

FTS 索引（exact, fuzzy）
符号提取

语义搜索功能（需要安装）：

pip install codexlens[semantic]
# 或
pip install numpy fastembed

性能影响

初始化时间

FTS 索引：~2-5 秒（100 文件）
嵌入生成：+30-60 秒（首次下载模型）
后续嵌入：+10-20 秒

搜索性能

模式	延迟	召回率	推荐场景
exact	5ms	中	精确代码标识符
fuzzy	7ms	中	容错搜索
hybrid	15ms	最高	通用搜索（推荐）
vector	12ms	高	语义查询
pure-vector	10ms	中	自然语言

最小化修改原则

所有修改都遵循最小化原则：

保持向后兼容：不破坏现有功能
默认智能：自动检测最佳模式
用户可控：可通过参数覆盖自动行为
渐进增强：未安装 fastembed 时优雅降级

总结

✅ init 命令自动生成嵌入（可用 --no-embeddings 跳过） ✅ search 命令默认使用混合模式（有嵌入时自动启用） ✅ MCP 工具简化为核心操作（init, search, search_files） ✅ 所有搜索模式支持（auto, exact, fuzzy, hybrid, vector, pure-vector） ✅ 文档已更新反映新的默认行为 ✅ 保持向后兼容性 ✅ 优雅降级（无 fastembed 时使用 exact 模式）

MCP vs CLI 功能对比

功能	MCP 工具	CLI
初始化索引	✅ `codex_lens(action="init")`	✅ `codexlens init`
搜索代码	✅ `codex_lens(action="search")`	✅ `codexlens search`
搜索文件	✅ `codex_lens(action="search_files")`	✅ `codexlens search --files-only`
检查状态	❌ 使用 CLI	✅ `codexlens status`
提取符号	❌ 使用 CLI	✅ `codexlens symbol`
配置管理	❌ 使用 CLI	✅ `codexlens config`
清理索引	❌ 使用 CLI	✅ `codexlens clean`

设计理念：MCP 工具专注于高频核心操作（索引、搜索），高级管理操作通过 CLI 执行。

9.2 KiB Raw Blame History Unescape Escape