mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-05 01:50:27 +08:00
feat: add global environment variable loading and update the configuration documentation
@@ -1,10 +1,15 @@
 # CodexLens Environment Configuration
-# Copy this file to .codexlens/.env and fill in your values
-#
-# Priority order:
-# 1. Environment variables (already set in shell)
-# 2. .codexlens/.env (workspace-local, this file)
 #
+# Configuration locations (copy to one of these):
+# - ~/.codexlens/.env (global, applies to all projects)
+# - project/.codexlens/.env (workspace-local)
+# - project/.env (project root)
+#
+# Priority order (later overrides earlier):
+# 1. Environment variables (already set in shell) - highest
+# 2. .codexlens/.env (workspace-local)
+# 3. .env (project root)
+# 4. ~/.codexlens/.env (global) - lowest
 
 # ============================================
 # RERANKER Configuration

codex-lens/docs/CONFIGURATION.md (new file, 265 lines added)
@@ -0,0 +1,265 @@
# CodexLens Configuration Guide

## Directory Structure

```
~/.codexlens/                  # Global data directory
├── .env                       # Global API configuration (new)
├── settings.json              # Runtime settings
├── embedding_lock.json        # Model lock file
├── registry.db                # Project registry
├── indexes/                   # Centralized index storage
└── venv/                      # Python virtual environment

project/
├── .codexlens/                # Workspace-local directory
│   ├── .env                   # Workspace API configuration (overrides global)
│   ├── index.db               # Project index database
│   ├── cache/                 # Cache directory
│   └── .gitignore             # Excludes sensitive files
└── .env                       # Project root configuration
```

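## Configuration Priority

Configuration sources are loaded in the order below, and each later source overrides the earlier ones:

| Priority | Location | Description |
|----------|----------|-------------|
| 1 (lowest) | `~/.codexlens/.env` | Global defaults |
| 2 | `project/.env` | Project root configuration |
| 3 | `project/.codexlens/.env` | Workspace-local configuration |
| 4 (highest) | Environment variables | Variables already set in the shell |
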
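The priority table can be read as a sequence of dictionary updates applied from lowest to highest priority. The sketch below only illustrates that rule; `merge_env_layers` and the sample values are hypothetical and are not part of CodexLens.

```python
from typing import Dict, Mapping


def merge_env_layers(
    global_env: Mapping[str, str],
    project_env: Mapping[str, str],
    workspace_env: Mapping[str, str],
    shell_env: Mapping[str, str],
) -> Dict[str, str]:
    """Merge the four layers; a later layer overrides an earlier one."""
    merged: Dict[str, str] = {}
    # Order matches the table: global, project root, workspace, shell.
    for layer in (global_env, project_env, workspace_env, shell_env):
        merged.update(layer)
    return merged


# Hypothetical values, chosen so that every layer wins at least once.
resolved = merge_env_layers(
    {"EMBEDDING_MODEL": "text-embedding-3-small", "RERANKER_PROVIDER": "siliconflow"},
    {"EMBEDDING_MODEL": "Qwen/Qwen3-Embedding-8B"},
    {"RERANKER_PROVIDER": "cohere"},
    {"EMBEDDING_API_KEY": "from-shell"},
)
print(resolved)
```

A key that only the global file defines survives unchanged, so `~/.codexlens/.env` works as a set of defaults.
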
## Environment Variables

### Embedding Configuration

Settings for the embedding service used by the `litellm` backend:

```bash
# API key
EMBEDDING_API_KEY=your-api-key

# API base URL
EMBEDDING_API_BASE=https://api.example.com/v1

# Embedding model name
EMBEDDING_MODEL=text-embedding-3-small
```

**Example providers**:

| Provider | API Base | Example model |
|----------|----------|---------------|
| OpenAI | `https://api.openai.com/v1` | `text-embedding-3-small` |
| ModelScope | `https://api-inference.modelscope.cn/v1` | `Qwen/Qwen3-Embedding-8B` |
| Azure | `https://your-resource.openai.azure.com` | `text-embedding-ada-002` |

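A caller could read the three variables and fail early when one is missing. This is a minimal sketch: `EmbeddingEnv` and `read_embedding_env` are hypothetical names, and treating all three variables as required is an assumption rather than documented CodexLens behavior.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("EMBEDDING_API_KEY", "EMBEDDING_API_BASE", "EMBEDDING_MODEL")


@dataclass(frozen=True)
class EmbeddingEnv:
    api_key: str
    api_base: str
    model: str


def read_embedding_env(env: Mapping[str, str]) -> EmbeddingEnv:
    """Build the settings object, or raise ValueError naming what is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing embedding settings: " + ", ".join(missing))
    return EmbeddingEnv(
        api_key=env["EMBEDDING_API_KEY"],
        # Drop a trailing slash so that later path joins stay predictable.
        api_base=env["EMBEDDING_API_BASE"].rstrip("/"),
        model=env["EMBEDDING_MODEL"],
    )
```

Passing `os.environ` as `env` reads the live shell environment. Passing a plain dictionary keeps the function easy to test.
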
### LiteLLM Configuration

Settings for LLM features (reranking, semantic analysis, and so on):

```bash
# API key
LITELLM_API_KEY=your-api-key

# API base URL
LITELLM_API_BASE=https://api.example.com/v1

# Model name
LITELLM_MODEL=gpt-4o-mini
```

### Reranker Configuration

Settings for reranking search results (optional):

```bash
# API key
RERANKER_API_KEY=your-api-key

# API base URL
RERANKER_API_BASE=https://api.siliconflow.cn

# Provider: siliconflow, cohere, jina
RERANKER_PROVIDER=siliconflow

# Reranker model
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
```

### General Configuration

```bash
# Custom data directory (default: ~/.codexlens)
CODEXLENS_DATA_DIR=~/.codexlens

# Enable debug mode
CODEXLENS_DEBUG=false
```

## settings.json

Runtime settings are stored in `~/.codexlens/settings.json`:

```json
{
  "embedding": {
    "backend": "litellm",
    "model": "Qwen/Qwen3-Embedding-8B",
    "use_gpu": false,
    "endpoints": [
      {
        "model": "Qwen/Qwen3-Embedding-8B",
        "api_key": "${EMBEDDING_API_KEY}",
        "api_base": "${EMBEDDING_API_BASE}",
        "weight": 1.0
      }
    ],
    "strategy": "latency_aware",
    "cooldown": 60.0
  },
  "llm": {
    "enabled": true,
    "tool": "gemini",
    "timeout_ms": 300000,
    "batch_size": 5
  }
}
```

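The `${EMBEDDING_API_KEY}` and `${EMBEDDING_API_BASE}` values in the endpoint entry look like placeholders for environment variables. How CodexLens resolves them is not described here, so the following is only a sketch of one plausible approach. `expand_placeholders` is a hypothetical helper, and leaving unknown names untouched is an assumption.

```python
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} inside strings, lists, and dicts."""
    if isinstance(value, str):
        # Unknown names are left as-is so a missing variable stays visible.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, list):
        return [expand_placeholders(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_placeholders(item, env) for key, item in value.items()}
    return value


endpoint = {
    "model": "Qwen/Qwen3-Embedding-8B",
    "api_key": "${EMBEDDING_API_KEY}",
    "api_base": "${EMBEDDING_API_BASE}",
    "weight": 1.0,
}
expanded = expand_placeholders(endpoint, {"EMBEDDING_API_KEY": "sk-demo"})
print(expanded)
```

Keeping the key out of `settings.json` this way means the file can be shared while the secret stays in a `.env` file or in the shell.
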
### Embedding Settings

| Field | Type | Description |
|-------|------|-------------|
| `backend` | string | `fastembed` (local) or `litellm` (API) |
| `model` | string | Model name or profile |
| `use_gpu` | bool | GPU acceleration (`fastembed` only) |
| `endpoints` | array | Multi-endpoint configuration (`litellm` only) |
| `strategy` | string | Load-balancing strategy |
| `cooldown` | float | Rate-limit cooldown, in seconds |

**Embedding backend comparison**:

| Feature | fastembed | litellm |
|---------|-----------|---------|
| Execution | Local ONNX | API calls |
| Requires | Local model files | API key |
| Speed | Fast (local) | Depends on the network |
| Model choice | Predefined profiles | Any model the API offers |
| GPU support | Yes | N/A |

**Load-balancing strategies**:

| Strategy | Description |
|----------|-------------|
| `round_robin` | Round-robin assignment |
| `latency_aware` | Latency-aware (recommended) |
| `weighted_random` | Weighted random selection |

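To make the three strategy names concrete, here is a rough sketch of what each could mean for a list of endpoints. It is not the CodexLens implementation. The `Endpoint` class, the `avg_latency_ms` field, and the rule "pick the lowest average latency" are assumptions made for illustration.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Endpoint:
    name: str
    weight: float = 1.0
    avg_latency_ms: float = 0.0  # hypothetical running average


def round_robin(endpoints: List[Endpoint]) -> Iterator[Endpoint]:
    """Yield endpoints in order, starting over after the last one."""
    return itertools.cycle(endpoints)


def weighted_random(endpoints: List[Endpoint], rng: Optional[random.Random] = None) -> Endpoint:
    """Pick one endpoint with probability proportional to its weight."""
    rng = rng or random.Random()
    return rng.choices(endpoints, weights=[e.weight for e in endpoints], k=1)[0]


def latency_aware(endpoints: List[Endpoint]) -> Endpoint:
    """Pick the endpoint with the lowest observed average latency."""
    return min(endpoints, key=lambda e: e.avg_latency_ms)
```

In this reading, the `weight` field of an endpoint only matters for `weighted_random`, while `latency_aware` depends on measurements collected at run time.
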
### LLM Settings

| Field | Type | Description |
|-------|------|-------------|
| `enabled` | bool | Enables LLM features |
| `tool` | string | LLM tool (`gemini`, `codex`) |
| `timeout_ms` | int | Timeout, in milliseconds |
| `batch_size` | int | Batch size |

## FastEmbed Model Profiles

Predefined models available with the `fastembed` backend:

| Profile | Model | Dimensions | Size |
|---------|-------|------------|------|
| `fast` | BAAI/bge-small-en-v1.5 | 384 | 80 MB |
| `base` | BAAI/bge-base-en-v1.5 | 768 | 220 MB |
| `code` | jinaai/jina-embeddings-v2-base-code | 768 | 150 MB |
| `minilm` | sentence-transformers/all-MiniLM-L6-v2 | 384 | 90 MB |
| `multilingual` | intfloat/multilingual-e5-large | 1024 | 1000 MB |
| `balanced` | mixedbread-ai/mxbai-embed-large-v1 | 1024 | 600 MB |

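The profile table maps a short name to a model and its vector dimension. The sketch below encodes the table as a lookup and adds a check for whether switching models calls for a rebuilt index. `PROFILES`, `resolve_model`, and `needs_reindex` are hypothetical names; the real lookup lives in CodexLens itself and may differ.

```python
from typing import Dict, NamedTuple


class Profile(NamedTuple):
    model: str
    dimensions: int


# Values copied from the profile table above.
PROFILES: Dict[str, Profile] = {
    "fast": Profile("BAAI/bge-small-en-v1.5", 384),
    "base": Profile("BAAI/bge-base-en-v1.5", 768),
    "code": Profile("jinaai/jina-embeddings-v2-base-code", 768),
    "minilm": Profile("sentence-transformers/all-MiniLM-L6-v2", 384),
    "multilingual": Profile("intfloat/multilingual-e5-large", 1024),
    "balanced": Profile("mixedbread-ai/mxbai-embed-large-v1", 1024),
}


def resolve_model(name: str) -> str:
    """Return the model behind a profile, or the input if it is not a profile."""
    profile = PROFILES.get(name)
    return profile.model if profile else name


def needs_reindex(old: str, new: str) -> bool:
    """A different embedding model means the stored vectors no longer match."""
    return resolve_model(old) != resolve_model(new)
```

For example, `needs_reindex("fast", "base")` is true, while passing the profile `fast` and its full model name yields false because both resolve to the same model.
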
## Quick Start

### 1. Use the global configuration

Create `~/.codexlens/.env`:

```bash
# Copy the example configuration
cp codex-lens/.env.example ~/.codexlens/.env

# Edit the configuration
nano ~/.codexlens/.env
```

### 2. Use local embeddings (fastembed)

```bash
# Initialize the index with the code profile
codexlens init --backend fastembed --model code

# Or use the multilingual model
codexlens init --backend fastembed --model multilingual
```

### 3. Use API embeddings (litellm)

```bash
# Set the environment variables
export EMBEDDING_API_KEY=your-key
export EMBEDDING_API_BASE=https://api.example.com/v1
export EMBEDDING_MODEL=text-embedding-3-small

# Initialize the index
codexlens init --backend litellm --model text-embedding-3-small
```

### 4. Verify the configuration

```bash
# Check which configuration was loaded
codexlens config show

# Test embedding
codexlens test-embedding "Hello World"
```

## Troubleshooting

### Configuration not loaded

Check the file path and permissions:

```bash
ls -la ~/.codexlens/.env
cat ~/.codexlens/.env
```

### API errors

1. Verify that the API key is valid.
2. Check that the API base URL is correct.
3. Confirm that the model name matches a model the provider supports.

### Incompatible model

After switching to a different embedding model, rebuild the index:

```bash
# Delete the old index
# (this also removes project/.codexlens/.env, so back it up first if you use one)
rm -rf project/.codexlens/

# Re-initialize
codexlens init --backend litellm --model new-model
```

## Related Files

| File | Description |
|------|-------------|
| `src/codexlens/config.py` | Configuration class definitions |
| `src/codexlens/env_config.py` | Environment variable loading |
| `src/codexlens/cli/model_manager.py` | FastEmbed model management |
| `src/codexlens/semantic/factory.py` | Embedder factory |

@@ -527,8 +527,8 @@ def search(
         console.print("[dim]Use --method with: fts, vector, splade, hybrid, cascade[/dim]")
         raise typer.Exit(code=1)

-    # Configure search
-    config = Config()
+    # Configure search (load settings from file)
+    config = Config.load()

     # Validate method
     valid_methods = ["fts", "vector", "splade", "hybrid", "cascade"]

@@ -265,6 +265,12 @@ class Config:
                 "timeout_ms": self.llm_timeout_ms,
                 "batch_size": self.llm_batch_size,
             },
+            "reranker": {
+                "enabled": self.enable_cross_encoder_rerank,
+                "backend": self.reranker_backend,
+                "model": self.reranker_model,
+                "top_k": self.reranker_top_k,
+            },
         }
         with open(self.settings_path, "w", encoding="utf-8") as f:
             json.dump(settings, f, indent=2)
@@ -313,6 +319,25 @@ class Config:
                 self.llm_timeout_ms = llm["timeout_ms"]
             if "batch_size" in llm:
                 self.llm_batch_size = llm["batch_size"]
+
+            # Load reranker settings
+            reranker = settings.get("reranker", {})
+            if "enabled" in reranker:
+                self.enable_cross_encoder_rerank = reranker["enabled"]
+            if "backend" in reranker:
+                backend = reranker["backend"]
+                if backend in {"onnx", "api", "litellm", "legacy"}:
+                    self.reranker_backend = backend
+                else:
+                    log.warning(
+                        "Invalid reranker backend in %s: %r (expected 'onnx', 'api', 'litellm', or 'legacy')",
+                        self.settings_path,
+                        backend,
+                    )
+            if "model" in reranker:
+                self.reranker_model = reranker["model"]
+            if "top_k" in reranker:
+                self.reranker_top_k = reranker["top_k"]
         except Exception as exc:
             log.warning(
                 "Failed to load settings from %s (%s): %s",

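The backend check in the hunk above keeps the current value when `settings.json` names an unknown backend, and logs a warning instead of failing. A standalone sketch of the same rule follows. `pick_reranker_backend` is a hypothetical helper and not a function from `config.py`; only the four backend names come from the hunk.

```python
import logging
from typing import Any, Dict

log = logging.getLogger(__name__)

# The four names accepted by the hunk above.
VALID_RERANKER_BACKENDS = {"onnx", "api", "litellm", "legacy"}


def pick_reranker_backend(settings: Dict[str, Any], current: str) -> str:
    """Return the configured backend if it is valid, otherwise keep `current`."""
    reranker = settings.get("reranker", {})
    if "backend" not in reranker:
        return current
    backend = reranker["backend"]
    # The isinstance check guards against unhashable values such as lists.
    if isinstance(backend, str) and backend in VALID_RERANKER_BACKENDS:
        return backend
    log.warning("Invalid reranker backend: %r (keeping %r)", backend, current)
    return current
```

Falling back rather than raising means a typo in one field does not discard the rest of the settings file.
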
@@ -95,39 +95,68 @@ def load_env_file(env_path: Path) -> Dict[str, str]:
     return env_vars


+def _get_global_data_dir() -> Path:
+    """Get global CodexLens data directory."""
+    env_override = os.environ.get("CODEXLENS_DATA_DIR")
+    if env_override:
+        return Path(env_override).expanduser().resolve()
+    return (Path.home() / ".codexlens").resolve()
+
+
+def load_global_env() -> Dict[str, str]:
+    """Load environment variables from global ~/.codexlens/.env file.
+
+    Returns:
+        Dictionary of environment variables from global config
+    """
+    global_env_path = _get_global_data_dir() / ".env"
+    if global_env_path.is_file():
+        env_vars = load_env_file(global_env_path)
+        log.debug("Loaded %d vars from global %s", len(env_vars), global_env_path)
+        return env_vars
+    return {}
+
+
 def load_workspace_env(workspace_root: Path | None = None) -> Dict[str, str]:
     """Load environment variables from workspace .env files.

     Priority (later overrides earlier):
-    1. Project root .env
-    2. .codexlens/.env
+    1. Global ~/.codexlens/.env (lowest priority)
+    2. Project root .env
+    3. .codexlens/.env (highest priority)

     Args:
         workspace_root: Workspace root directory. If None, uses current directory.

     Returns:
         Merged dictionary of environment variables
     """
     if workspace_root is None:
         workspace_root = Path.cwd()

     workspace_root = Path(workspace_root).resolve()

     env_vars: Dict[str, str] = {}

-    # Load from project root .env (lowest priority)
+    # Load from global ~/.codexlens/.env (lowest priority)
+    global_vars = load_global_env()
+    if global_vars:
+        env_vars.update(global_vars)
+
+    # Load from project root .env (medium priority)
     root_env = workspace_root / ".env"
     if root_env.is_file():
-        env_vars.update(load_env_file(root_env))
-        log.debug("Loaded %d vars from %s", len(env_vars), root_env)
+        loaded = load_env_file(root_env)
+        env_vars.update(loaded)
+        log.debug("Loaded %d vars from %s", len(loaded), root_env)

-    # Load from .codexlens/.env (higher priority)
+    # Load from .codexlens/.env (highest priority)
     codexlens_env = workspace_root / ".codexlens" / ".env"
     if codexlens_env.is_file():
         loaded = load_env_file(codexlens_env)
         env_vars.update(loaded)
         log.debug("Loaded %d vars from %s", len(loaded), codexlens_env)

     return env_vars

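To try the loading order outside the project, the hunk above can be approximated with a few lines of standalone code. This is a sketch under stated assumptions: `parse_env_file` handles only plain `KEY=VALUE` lines and stands in for the real `load_env_file`, whose body is not shown in this diff, and `resolve_env` is a hypothetical wrapper that also applies the shell-wins rule from the documentation.

```python
from pathlib import Path
from typing import Dict, Mapping


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    result: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip("\"'")
    return result


def resolve_env(workspace_root: Path, shell_env: Mapping[str, str]) -> Dict[str, str]:
    """Merge global, project-root, and workspace files, then let the shell win."""
    # CODEXLENS_DATA_DIR relocates the global directory, as in the hunk above.
    override = shell_env.get("CODEXLENS_DATA_DIR")
    global_dir = Path(override).expanduser() if override else Path.home() / ".codexlens"
    merged: Dict[str, str] = {}
    for candidate in (
        global_dir / ".env",                    # lowest priority
        workspace_root / ".env",                # medium priority
        workspace_root / ".codexlens" / ".env"  # highest file priority
    ):
        if candidate.is_file():
            merged.update(parse_env_file(candidate))
    # A variable already set in the shell overrides every file.
    return {key: shell_env.get(key, value) for key, value in merged.items()}
```

Calling `resolve_env(Path.cwd(), os.environ)` from a project directory returns only the keys defined in the `.env` files, each with the value the priority rules select.
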
compare_reranker.py (new file, 77 lines added)
@@ -0,0 +1,77 @@
#!/usr/bin/env python
"""Compare search results with and without reranker."""
import json
import subprocess
import sys
import os

os.chdir(r"D:\dongdiankaifa9\hydro_generator_module")

query = "热网络计算"  # "thermal network calculation"

def run_search(method: str) -> dict:
    """Run search and return parsed JSON result."""
    cmd = [sys.executable, "-m", "codexlens", "search", query, "--method", method, "--limit", "10", "--json"]
    result = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8")
    # Find JSON in output (skip debug lines)
    for line in result.stdout.split("\n"):
        if line.strip().startswith("{"):
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                pass
    # Fall back to scanning stdout and stderr together for a JSON object
    output = result.stdout + result.stderr
    start = output.find('{"success"')
    if start >= 0:
        # Find matching closing brace
        depth = 0
        for i, c in enumerate(output[start:]):
            if c == '{':
                depth += 1
            elif c == '}':
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(output[start:start+i+1])
                    except json.JSONDecodeError:
                        pass
                    break
    return {"success": False, "error": "Failed to parse JSON"}

print("=" * 60)
print("Search comparison: with vs. without the reranker")
print("Query:", query)
print("=" * 60)

# Run hybrid search (no reranker)
print("\n[1] Hybrid search (no reranker)")
print("-" * 40)
hybrid_result = run_search("hybrid")
if hybrid_result.get("success"):
    results = hybrid_result.get("result", {}).get("results", [])[:10]
    for i, r in enumerate(results, 1):
        path = r.get("path", "").split("\\")[-1]
        score = r.get("score", 0)
        print(f"{i:2}. {path[:45]:<45} score={score:.4f}")
else:
    print("Search failed:", hybrid_result.get("error"))

# Run cascade search (with reranker)
print("\n[2] Cascade search (with reranker)")
print("-" * 40)
cascade_result = run_search("cascade")
if cascade_result.get("success"):
    results = cascade_result.get("result", {}).get("results", [])[:10]
    for i, r in enumerate(results, 1):
        path = r.get("path", "").split("\\")[-1]
        score = r.get("score", 0)
        print(f"{i:2}. {path[:45]:<45} score={score:.4f}")
else:
    print("Search failed:", cascade_result.get("error"))

print("\n" + "=" * 60)
print("Notes on the comparison:")
print("- Hybrid: FTS + vector fusion, no second-stage reranking")
print("- Cascade: coarse vector retrieval + fine ranking via the reranker API")
print("=" * 60)

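The brace-counting fallback in `run_search` stops too early when a string value contains `}`. One alternative, shown here as a sketch and not as a change to the script, is the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. `extract_json_object` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(output: str, marker: str = '{"success"') -> Optional[Dict[str, Any]]:
    """Return the first JSON object that starts at `marker`, or None."""
    decoder = json.JSONDecoder()
    start = output.find(marker)
    while start >= 0:
        try:
            # raw_decode parses one value at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON here, so look for the next occurrence of the marker.
        start = output.find(marker, start + 1)
    return None


noisy = (
    "DEBUG loading index\n"
    '{"success": true, "result": {"note": "brace } inside a string"}}\n'
    "trailing log line"
)
parsed = extract_json_object(noisy)
print(parsed)
```

Because the decoder tracks string boundaries itself, the `}` inside the `note` value does not end the object prematurely.
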