feat: Enhance embedding management and model configuration

- Updated embedding_manager.py to include a backend parameter in the model configuration.
- Modified model_manager.py to use cache_name for ONNX models.
- Refactored hybrid_search.py to improve embedder initialization based on backend type.
- Added a backend column to vector_store.py for better model configuration management.
- Implemented a migration to add backend information to existing databases.
- Enhanced the API settings implementation with comprehensive provider and endpoint management.
- Introduced a LiteLLM integration guide detailing configuration and usage.
- Added TypeScript examples for LiteLLM usage.
Author: catlog22
Date: 2025-12-24 14:03:59 +08:00
Parent: 9b926d1a1e
Commit: b00113d212
22 changed files with 5507 additions and 706 deletions

View File

@@ -49,17 +49,6 @@ RULES: [templates | additional constraints]
- Break backward compatibility
- Exceed 3 failed attempts without stopping
## Multi-Task Execution (Resume)
**First subtask**: Standard execution flow
**Subsequent subtasks** (via `resume`):
- Recall context from previous subtasks
- Build on previous work
- Maintain consistency
- Test integration
- Report context for next subtask
## Error Handling
**Three-Attempt Rule**: On the 3rd failure, stop and report what was attempted, what failed, and the root cause
@@ -80,7 +69,7 @@ RULES: [templates | additional constraints]
**If template has no format** → Use default format below
### Single Task Implementation
### Task Implementation
```markdown
# Implementation: [TASK Title]
@@ -112,48 +101,6 @@ RULES: [templates | additional constraints]
[Recommendations if any]
```
### Multi-Task (First Subtask)
```markdown
# Subtask 1/N: [TASK Title]
## Changes
[List of file changes]
## Implementation
[Details with code references]
## Testing
✅ Tests: X passing
## Context for Next Subtask
- Key decisions: [established patterns]
- Files created: [paths and purposes]
- Integration points: [where next subtask should connect]
```
### Multi-Task (Subsequent Subtasks)
```markdown
# Subtask N/M: [TASK Title]
## Changes
[List of file changes]
## Integration Notes
✅ Compatible with previous subtask
✅ Maintains established patterns
## Implementation
[Details with code references]
## Testing
✅ Tests: X passing
## Context for Next Subtask
[If not final, provide context]
```
### Partial Completion
```markdown

View File

@@ -362,10 +362,6 @@ ccw cli -p "RULES: \$(cat ~/.claude/workflows/cli-templates/protocols/analysis-p
- Description: Additional directories (comma-separated)
- Default: none
- **`--timeout <ms>`**
- Description: Timeout in milliseconds
- Default: 300000
- **`--resume [id]`**
- Description: Resume previous session
- Default: -
@@ -423,73 +419,80 @@ CCW automatically maps to tool-specific syntax:
**Analysis Task** (Security Audit):
```bash
ccw cli -p "
timeout 600 ccw cli -p "
PURPOSE: Identify OWASP Top 10 vulnerabilities in authentication module to pass security audit; success = all critical/high issues documented with remediation
TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication bypass vectors • Evaluate session management • Assess sensitive data exposure
MODE: analysis
CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions
EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files
" --tool gemini --cd src/auth --timeout 600000
" --tool gemini --mode analysis --cd src/auth
```
**Implementation Task** (New Feature):
```bash
ccw cli -p "
timeout 1800 ccw cli -p "
PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients
TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services
MODE: write
CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts
EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes
" --tool codex --mode write --timeout 1800000
" --tool codex --mode write
```
**Bug Fix Task**:
```bash
ccw cli -p "
timeout 900 ccw cli -p "
PURPOSE: Fix memory leak in WebSocket connection handler causing server OOM after 24h; root cause must be identified before any fix
TASK: • Trace connection lifecycle from open to close • Identify event listener accumulation • Check cleanup on disconnect • Verify garbage collection eligibility
MODE: analysis
CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production
EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/analysis-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup
" --tool gemini --cd src --timeout 900000
" --tool gemini --mode analysis --cd src
```
**Refactoring Task**:
```bash
ccw cli -p "
timeout 1200 ccw cli -p "
PURPOSE: Refactor payment processing to use strategy pattern for multi-gateway support; no functional changes; all existing tests must pass
TASK: • Extract gateway interface from current implementation • Create strategy classes for Stripe, PayPal • Implement factory for gateway selection • Migrate existing code to use strategies
MODE: write
CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways
EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist
RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass
" --tool gemini --mode write --timeout 1200000
" --tool gemini --mode write
```
---
## Configuration
### Timeout Allocation
### Timeout Allocation (Bash)
**Minimum**: 5 minutes (300000ms)
CLI internal timeout is disabled; controlled by external bash `timeout` command:
- **Simple**: 5-10min (300000-600000ms)
- Examples: Analysis, search
```bash
# Syntax: timeout <seconds> ccw cli ...
timeout 600 ccw cli -p "..." --tool gemini --mode analysis # 10 minutes
timeout 1800 ccw cli -p "..." --tool codex --mode write # 30 minutes
```
- **Medium**: 10-20min (600000-1200000ms)
- Examples: Refactoring, documentation
**Recommended Time Allocation**:
- **Complex**: 20-60min (1200000-3600000ms)
- Examples: Implementation, migration
- **Simple** (5-10min): Analysis, search
- `timeout 300` ~ `timeout 600`
- **Heavy**: 60-120min (3600000-7200000ms)
- Examples: Large codebase, multi-file
- **Medium** (10-20min): Refactoring, documentation
- `timeout 600` ~ `timeout 1200`
**Codex Multiplier**: 3x allocated time (minimum 15min / 900000ms)
- **Complex** (20-60min): Implementation, migration
- `timeout 1200` ~ `timeout 3600`
- **Heavy** (60-120min): Large codebase, multi-file
- `timeout 3600` ~ `timeout 7200`
**Codex Multiplier**: 3x allocated time (minimum 15min / 900s)
### Permission Framework
@@ -523,4 +526,3 @@ RULES: $(cat ~/.claude/workflows/cli-templates/protocols/write-protocol.md) $(ca
- [ ] **Tool selected** - `--tool gemini|qwen|codex`
- [ ] **Template applied (REQUIRED)** - Use specific or universal fallback template
- [ ] **Constraints specified** - Scope, requirements
- [ ] **Timeout configured** - Based on complexity

View File

@@ -21,8 +21,11 @@
- Graceful degradation
- Don't expose sensitive info
## Core Principles
**Incremental Progress**:
- Small, testable changes
- Commit working code frequently
@@ -43,11 +46,58 @@
- Maintain established patterns
- Test integration between subtasks
## System Optimization
**Direct Binary Calls**: Always call binaries directly in `functions.shell`, set `workdir`, avoid shell wrappers (`bash -lc`, `cmd /c`, etc.)
**Text Editing Priority**:
1. Use `apply_patch` tool for all routine text edits
2. Fall back to `sed` for single-line substitutions if unavailable
3. Avoid Python editing scripts unless both fail
**apply_patch invocation**:
```json
{
"command": ["apply_patch", "*** Begin Patch\n*** Update File: path/to/file\n@@\n- old\n+ new\n*** End Patch\n"],
"workdir": "<workdir>",
"justification": "Brief reason"
}
```
**Windows UTF-8 Encoding** (before commands):
```powershell
[Console]::InputEncoding = [Text.UTF8Encoding]::new($false)
[Console]::OutputEncoding = [Text.UTF8Encoding]::new($false)
chcp 65001 > $null
```
## Context Acquisition (MCP Tools Priority)
**For task context gathering and analysis, ALWAYS prefer MCP tools**:
1. **smart_search** - First choice for code discovery
- Use `smart_search(query="...")` for semantic/keyword search
- Use `smart_search(action="find_files", pattern="*.ts")` for file discovery
- Supports modes: `auto`, `hybrid`, `exact`, `ripgrep`
2. **read_file** - Batch file reading
- Read multiple files in parallel: `read_file(path="file1.ts")`, `read_file(path="file2.ts")`
- Supports glob patterns: `read_file(path="src/**/*.config.ts")`
**Priority Order**:
```
smart_search (discovery) → read_file (batch read) → shell commands (fallback)
```
**NEVER** use shell commands (`cat`, `find`, `grep`) when MCP tools are available.
## Execution Checklist
**Before**:
- [ ] Understand PURPOSE and TASK clearly
- [ ] Review CONTEXT files, find 3+ patterns
- [ ] Use smart_search to discover relevant files
- [ ] Use read_file to batch read context files, find 3+ patterns
- [ ] Check RULES templates and constraints
**During**:

View File

@@ -1,25 +1,62 @@
# Gemini Code Guidelines
## Code Quality Standards
### Code Quality
- Follow project's existing patterns
- Match import style and naming conventions
- Single responsibility per function/class
- DRY (Don't Repeat Yourself)
- YAGNI (You Aren't Gonna Need It)
### Testing
- Test all public functions
- Test edge cases and error conditions
- Mock external dependencies
- Target 80%+ coverage
### Error Handling
- Proper try-catch blocks
- Clear error messages
- Graceful degradation
- Don't expose sensitive info
## Core Principles
**Thoroughness**:
- Analyze ALL CONTEXT files completely
- Check cross-file patterns and dependencies
- Identify edge cases and quantify metrics
**Incremental Progress**:
- Small, testable changes
- Commit working code frequently
- Build on previous work (subtasks)
**Evidence-Based**:
- Quote relevant code with `file:line` references
- Link related patterns across files
- Support all claims with concrete examples
- Study 3+ similar patterns before implementing
- Match project style exactly
- Verify with existing code
**Actionable**:
- Clear, specific recommendations (not vague)
- Prioritized by impact
- Incremental changes over big rewrites
**Pragmatic**:
- Boring solutions over clever code
- Simple over complex
- Adapt to project reality
**Philosophy**:
- **Simple over complex** - Avoid over-engineering
- **Clear over clever** - Prefer obvious solutions
- **Learn from existing** - Reference project patterns
- **Pragmatic over dogmatic** - Adapt to project reality
- **Incremental progress** - Small, testable changes
**Context Continuity** (Multi-Task):
- Leverage resume for consistency
- Maintain established patterns
- Test integration between subtasks
## Execution Checklist
**Before**:
- [ ] Understand PURPOSE and TASK clearly
- [ ] Review CONTEXT files, find 3+ patterns
- [ ] Check RULES templates and constraints
**During**:
- [ ] Follow existing patterns exactly
- [ ] Write tests alongside code
- [ ] Run tests after every change
- [ ] Commit working code incrementally
**After**:
- [ ] All tests pass
- [ ] Coverage meets target
- [ ] Build succeeds
- [ ] All EXPECTED deliverables met

View File

@@ -0,0 +1,196 @@
# API Settings Page Implementation Complete
## Files Created
### 1. JavaScript File
**Location**: `ccw/src/templates/dashboard-js/views/api-settings.js` (28KB)
**Main features**:
- ✅ Provider Management
- Add/edit/delete providers
- Supports OpenAI, Anthropic, Google, Ollama, Azure, Mistral, DeepSeek, Custom
- API key management (environment variables supported)
- Connection testing
- ✅ Endpoint Management
- Create custom endpoints
- Link providers and models
- Cache strategy configuration
- Show CLI usage examples
- ✅ Cache Management
- Global cache toggle
- Cache statistics display
- Clear cache
### 2. CSS Stylesheet
**Location**: `ccw/src/templates/dashboard-css/31-api-settings.css` (6.8KB)
**Styles include**:
- Card-based layout
- Form styles
- Progress bars
- Responsive design
- Empty-state display
### 3. Internationalization Support
**Location**: `ccw/src/templates/dashboard-js/i18n.js`
**Translations added**:
- English: 54 translation keys
- Chinese: 54 translation keys
- Covers all UI text, hints, and error messages
### 4. Configuration Updates
#### dashboard-generator.ts
- ✅ Added `31-api-settings.css` to the CSS module list
- ✅ Added `views/api-settings.js` to the JS module list
#### navigation.js
- ✅ Added `api-settings` route handling
- ✅ Added title update logic
#### dashboard.html
- ✅ Added navigation menu item (Settings icon)
## Backend API Endpoints Used
The page uses the following backend APIs (already existing):
### Provider APIs
- `GET /api/litellm-api/providers` - Get all providers
- `POST /api/litellm-api/providers` - Create a provider
- `PUT /api/litellm-api/providers/:id` - Update a provider
- `DELETE /api/litellm-api/providers/:id` - Delete a provider
- `POST /api/litellm-api/providers/:id/test` - Test connection
### Endpoint APIs
- `GET /api/litellm-api/endpoints` - Get all endpoints
- `POST /api/litellm-api/endpoints` - Create an endpoint
- `PUT /api/litellm-api/endpoints/:id` - Update an endpoint
- `DELETE /api/litellm-api/endpoints/:id` - Delete an endpoint
### Model Discovery
- `GET /api/litellm-api/models/:providerType` - Get the models supported by a provider
### Cache APIs
- `GET /api/litellm-api/cache/stats` - Get cache statistics
- `POST /api/litellm-api/cache/clear` - Clear the cache
### Config APIs
- `GET /api/litellm-api/config` - Get the full configuration
- `PUT /api/litellm-api/config/cache` - Update global cache settings
## Page Features
### Provider Management
```
+-- Provider Card ------------------------+
| OpenAI Production [Edit] [Del] |
| Type: openai |
| Key: sk-...abc |
| URL: https://api.openai.com/v1 |
| Status: ✓ Enabled |
+-----------------------------------------+
```
### Endpoint Management
```
+-- Endpoint Card ------------------------+
| GPT-4o Code Review [Edit] [Del]|
| ID: my-gpt4o |
| Provider: OpenAI Production |
| Model: gpt-4-turbo |
| Cache: Enabled (60 min) |
| Usage: ccw cli -p "..." --model my-gpt4o|
+-----------------------------------------+
```
### Form Features
- **Provider Form**:
- Type selection (8 provider types)
- API key input (show/hide supported)
- Environment variable support
- Custom base URL
- Enable/disable toggle
- **Endpoint Form**:
- Endpoint ID (used by the CLI)
- Display name
- Provider selection (dynamically loaded)
- Model selection (loaded dynamically per provider)
- Cache strategy configuration
- TTL (minutes)
- Max size (KB)
- Auto-cache file patterns
## Usage Flow
### 1. Add a Provider
1. Click "Add Provider"
2. Select a provider type (e.g., OpenAI)
3. Enter a display name
4. Enter an API key (or use an environment variable)
5. Optional: enter a custom API base URL
6. Save
### 2. Create a Custom Endpoint
1. Click "Add Endpoint"
2. Enter an endpoint ID (used by the CLI)
3. Enter a display name
4. Select a provider
5. Select a model (the provider's supported models load automatically)
6. Optional: configure a cache strategy
7. Save
### 3. Use the Endpoint
```bash
ccw cli -p "Analyze this code..." --model my-gpt4o
```
## Code Quality
- ✅ Follows existing code style
- ✅ Uses i18n functions for internationalization
- ✅ Responsive design (mobile friendly)
- ✅ Complete form validation
- ✅ User-friendly error messages
- ✅ Uses Lucide icons
- ✅ Modals reuse existing styles
- ✅ Fully integrated with the backend API
## Testing Suggestions
1. **Basic functionality**:
- Add/edit/delete providers
- Add/edit/delete endpoints
- Clear cache
2. **Form validation**:
- Required-field validation
- API key show/hide
- Environment variable toggle
3. **Data loading**:
- Dynamic model list loading
- Cache statistics display
- Empty-state display
4. **Internationalization**:
- Switch language (English/Chinese)
- Verify all text displays correctly
## Next Steps
The page is complete and integrated into the project. After starting the CCW Dashboard:
1. The navigation bar shows an "API Settings" menu item (Settings icon)
2. Click it to access all features
3. All operations sync to the configuration file in real time
## Notes
- The page uses the existing LiteLLM API routes (`litellm-api-routes.ts`)
- Configuration is saved in the project's LiteLLM config file
- Environment variable references are supported: `${VARIABLE_NAME}`
- API keys are automatically masked when displayed (showing the first 4 and last 4 characters)
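
As a rough illustration of that masking rule, a helper along these lines would produce the behavior described above (a hypothetical sketch, not the page's actual code; the function name is assumed):

```typescript
// Hypothetical helper: mask an API key for display, keeping only the
// first four and last four characters, as described in the note above.
function maskApiKey(key: string): string {
  if (key.length <= 8) {
    return '*'.repeat(key.length); // too short to reveal anything useful
  }
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

// maskApiKey('sk-1234567890abcdef') -> 'sk-1...cdef'
```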

ccw/LITELLM_INTEGRATION.md (new file, 308 lines)
View File

@@ -0,0 +1,308 @@
# LiteLLM Integration Guide
## Overview
CCW now supports custom LiteLLM endpoints with integrated context caching. You can configure multiple providers (OpenAI, Anthropic, Ollama, etc.) and create custom endpoints with file-based caching strategies.
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ CLI Executor │
│ │
│ ┌─────────────┐ ┌──────────────────────────────┐ │
│ │ --model │────────>│ Route Decision: │ │
│ │ flag │ │ - gemini/qwen/codex → CLI │ │
│ └─────────────┘ │ - custom ID → LiteLLM │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ LiteLLM Executor │
│ │
│ 1. Load endpoint config (litellm-api-config.json) │
│ 2. Extract @patterns from prompt │
│ 3. Pack files via context-cache │
│ 4. Call LiteLLM client with cached content + prompt │
│ 5. Return result │
└─────────────────────────────────────────────────────────────┘
```
## Configuration
### File Location
Configuration is stored per-project:
```
<project>/.ccw/storage/config/litellm-api-config.json
```
### Configuration Structure
```json
{
"version": 1,
"providers": [
{
"id": "openai-1234567890",
"name": "My OpenAI",
"type": "openai",
"apiKey": "${OPENAI_API_KEY}",
"enabled": true,
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}
],
"endpoints": [
{
"id": "my-gpt4o",
"name": "GPT-4o with Context Cache",
"providerId": "openai-1234567890",
"model": "gpt-4o",
"description": "GPT-4o with automatic file caching",
"cacheStrategy": {
"enabled": true,
"ttlMinutes": 60,
"maxSizeKB": 512,
"filePatterns": ["*.md", "*.ts", "*.js"]
},
"enabled": true,
"createdAt": "2025-01-01T00:00:00.000Z",
"updatedAt": "2025-01-01T00:00:00.000Z"
}
],
"defaultEndpoint": "my-gpt4o",
"globalCacheSettings": {
"enabled": true,
"cacheDir": "~/.ccw/cache/context",
"maxTotalSizeMB": 100
}
}
```
## Usage
### Via CLI
```bash
# Use custom endpoint with --model flag
ccw cli -p "Analyze authentication flow" --tool litellm --model my-gpt4o
# With context patterns (automatically cached)
ccw cli -p "@src/auth/**/*.ts Review security" --tool litellm --model my-gpt4o
# Disable caching for specific call
ccw cli -p "Quick question" --tool litellm --model my-gpt4o --no-cache
```
### Via Dashboard API
#### Create Provider
```bash
curl -X POST http://localhost:3000/api/litellm-api/providers \
-H "Content-Type: application/json" \
-d '{
"name": "My OpenAI",
"type": "openai",
"apiKey": "${OPENAI_API_KEY}",
"enabled": true
}'
```
#### Create Endpoint
```bash
curl -X POST http://localhost:3000/api/litellm-api/endpoints \
-H "Content-Type: application/json" \
-d '{
"id": "my-gpt4o",
"name": "GPT-4o with Cache",
"providerId": "openai-1234567890",
"model": "gpt-4o",
"cacheStrategy": {
"enabled": true,
"ttlMinutes": 60,
"maxSizeKB": 512,
"filePatterns": ["*.md", "*.ts"]
},
"enabled": true
}'
```
#### Test Provider Connection
```bash
curl -X POST http://localhost:3000/api/litellm-api/providers/openai-1234567890/test
```
## Context Caching
### How It Works
1. **Pattern Detection**: The LiteLLM executor scans the prompt for `@patterns` (a sketch of this scan appears after this list)
```
@src/**/*.ts
@CLAUDE.md
@../shared/**/*
```
2. **File Packing**: Files matching the patterns are packed via the `context-cache` tool
- Respects `max_file_size` limit (default: 1MB per file)
- Applies TTL from endpoint config
- Generates session ID for retrieval
3. **Cache Integration**: Cached content is prepended to prompt
```
<cached files>
---
<original prompt>
```
4. **LLM Call**: Combined prompt sent to LiteLLM with provider credentials
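
As a minimal sketch of the pattern scan from step 1, assuming `@patterns` are whitespace-delimited tokens (illustrative only; the real `extractPatterns` in `litellm-executor.ts` may differ):

```typescript
// Sketch: pull @patterns (e.g. @src/**/*.ts, @CLAUDE.md) out of a prompt.
// Assumes each pattern is a whitespace-delimited token starting with '@'.
function extractAtPatterns(prompt: string): string[] {
  const matches = prompt.match(/@[^\s]+/g) ?? [];
  return matches.map((m) => m.slice(1)); // drop the leading '@'
}

// extractAtPatterns('@src/auth/**/*.ts Review security')
//   -> ['src/auth/**/*.ts']
```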
### Cache Strategy Configuration
```typescript
interface CacheStrategy {
enabled: boolean; // Enable/disable caching for this endpoint
ttlMinutes: number; // Cache lifetime (default: 60)
maxSizeKB: number; // Max cache size (default: 512KB)
filePatterns: string[]; // Glob patterns to cache
}
```
### Example: Security Audit with Cache
```bash
ccw cli -p "
PURPOSE: OWASP Top 10 security audit of authentication module
TASK: • Check SQL injection • Verify session management • Test XSS vectors
CONTEXT: @src/auth/**/*.ts @src/middleware/auth.ts
EXPECTED: Security report with severity levels and remediation steps
" --tool litellm --model my-security-scanner --mode analysis
```
**What happens:**
1. Executor detects `@src/auth/**/*.ts` and `@src/middleware/auth.ts`
2. Packs matching files into context cache
3. Cache entry valid for 60 minutes (per endpoint config)
4. Subsequent calls reuse cached files (no re-packing)
5. LiteLLM receives full context without manual file specification
## Environment Variables
### Provider API Keys
LiteLLM uses standard environment variable names:
| Provider | Env Var Name |
|------------|-----------------------|
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Azure | `AZURE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| DeepSeek | `DEEPSEEK_API_KEY` |
### Configuration Syntax
Use `${ENV_VAR}` syntax in config:
```json
{
"apiKey": "${OPENAI_API_KEY}"
}
```
The executor resolves these at runtime via `resolveEnvVar()`.
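
A minimal sketch of what that resolution could look like (an assumption; the real `resolveEnvVar()` may handle more formats and error cases):

```typescript
// Sketch: resolve a "${VAR_NAME}" reference against process.env.
// Plain values without the ${...} wrapper are returned unchanged.
function resolveEnvVarSketch(value: string): string {
  const match = value.match(/^\$\{([A-Za-z0-9_]+)\}$/);
  if (!match) return value;
  return process.env[match[1]] ?? '';
}

// resolveEnvVarSketch('${OPENAI_API_KEY}') -> value of OPENAI_API_KEY, or ''
```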
## API Reference
### Config Manager (`litellm-api-config-manager.ts`)
#### Provider Management
```typescript
getAllProviders(baseDir: string): ProviderCredential[]
getProvider(baseDir: string, providerId: string): ProviderCredential | null
getProviderWithResolvedEnvVars(baseDir: string, providerId: string): ProviderCredential & { resolvedApiKey: string } | null
addProvider(baseDir: string, providerData): ProviderCredential
updateProvider(baseDir: string, providerId: string, updates): ProviderCredential
deleteProvider(baseDir: string, providerId: string): boolean
```
#### Endpoint Management
```typescript
getAllEndpoints(baseDir: string): CustomEndpoint[]
getEndpoint(baseDir: string, endpointId: string): CustomEndpoint | null
findEndpointById(baseDir: string, endpointId: string): CustomEndpoint | null
addEndpoint(baseDir: string, endpointData): CustomEndpoint
updateEndpoint(baseDir: string, endpointId: string, updates): CustomEndpoint
deleteEndpoint(baseDir: string, endpointId: string): boolean
```
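
Taken together, a caller might register a provider and then an endpoint roughly as follows (a sketch against the signatures above; the exact shape of `providerData`/`endpointData` is assumed from the configuration example earlier in this guide):

```typescript
import { addProvider, addEndpoint } from './litellm-api-config-manager';

// Assumes baseDir is the project root containing .ccw/storage/config/.
const baseDir = process.cwd();

const provider = addProvider(baseDir, {
  name: 'My OpenAI',
  type: 'openai',
  apiKey: '${OPENAI_API_KEY}',
  enabled: true,
});

const endpoint = addEndpoint(baseDir, {
  id: 'my-gpt4o',
  name: 'GPT-4o with Context Cache',
  providerId: provider.id,
  model: 'gpt-4o',
  cacheStrategy: { enabled: true, ttlMinutes: 60, maxSizeKB: 512, filePatterns: ['*.ts'] },
  enabled: true,
});

console.log(`Created endpoint ${endpoint.id} for provider ${provider.name}`);
```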
### Executor (`litellm-executor.ts`)
```typescript
interface LiteLLMExecutionOptions {
prompt: string;
endpointId: string;
baseDir: string;
cwd?: string;
includeDirs?: string[];
enableCache?: boolean;
onOutput?: (data: { type: string; data: string }) => void;
}
interface LiteLLMExecutionResult {
success: boolean;
output: string;
model: string;
provider: string;
cacheUsed: boolean;
cachedFiles?: string[];
error?: string;
}
executeLiteLLMEndpoint(options: LiteLLMExecutionOptions): Promise<LiteLLMExecutionResult>
extractPatterns(prompt: string): string[]
```
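
For orientation, calling the executor might look roughly like this (a sketch against the interfaces above; paths and endpoint IDs are placeholders):

```typescript
import { executeLiteLLMEndpoint } from './litellm-executor';

// Sketch: run a cached prompt against a configured endpoint.
async function runReview(): Promise<void> {
  const result = await executeLiteLLMEndpoint({
    prompt: '@src/auth/**/*.ts Review session handling',
    endpointId: 'my-gpt4o',
    baseDir: process.cwd(),
    enableCache: true,
    onOutput: ({ type, data }) => process.stdout.write(`[${type}] ${data}`),
  });

  if (result.success) {
    console.log(result.output);                                   // model response
    console.log('cache used:', result.cacheUsed, result.cachedFiles ?? []);
  } else {
    console.error('execution failed:', result.error);
  }
}

runReview().catch(console.error);
```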
## Dashboard Integration
The dashboard provides UI for managing LiteLLM configuration:
- **Providers**: Add/edit/delete provider credentials
- **Endpoints**: Configure custom endpoints with cache strategies
- **Cache Stats**: View cache usage and clear entries
- **Test Connections**: Verify provider API access
Routes are handled by `litellm-api-routes.ts`.
## Limitations
1. **Python Dependency**: Requires the `ccw-litellm` Python package to be installed
2. **Model Support**: Limited to models supported by LiteLLM library
3. **Cache Scope**: Context cache is in-memory (not persisted across restarts)
4. **Pattern Syntax**: Only supports glob-style `@patterns`, not regex
## Troubleshooting
### Error: "Endpoint not found"
- Verify endpoint ID matches config file
- Check `litellm-api-config.json` exists in `.ccw/storage/config/`
### Error: "API key not configured"
- Ensure environment variable is set
- Verify `${ENV_VAR}` syntax in config
- Test with `echo $OPENAI_API_KEY`
### Error: "Failed to spawn Python process"
- Install ccw-litellm: `pip install ccw-litellm`
- Verify Python is accessible: `python --version`
### Cache Not Applied
- Check endpoint has `cacheStrategy.enabled: true`
- Verify prompt contains `@patterns`
- Check cache TTL hasn't expired
## Examples
See `examples/litellm-config.json` for complete configuration template.

View File

@@ -0,0 +1,77 @@
/**
* LiteLLM Usage Examples
* Demonstrates how to use the LiteLLM TypeScript client
*/
import { getLiteLLMClient, getLiteLLMStatus } from '../src/tools/litellm-client';
async function main() {
console.log('=== LiteLLM TypeScript Bridge Examples ===\n');
// Example 1: Check availability
console.log('1. Checking LiteLLM availability...');
const status = await getLiteLLMStatus();
console.log(' Status:', status);
console.log('');
if (!status.available) {
console.log('❌ LiteLLM is not available. Please install ccw-litellm:');
console.log(' pip install ccw-litellm');
return;
}
const client = getLiteLLMClient();
// Example 2: Get configuration
console.log('2. Getting configuration...');
try {
const config = await client.getConfig();
console.log(' Config:', config);
} catch (error) {
console.log(' Error:', error.message);
}
console.log('');
// Example 3: Generate embeddings
console.log('3. Generating embeddings...');
try {
const texts = ['Hello world', 'Machine learning is amazing'];
const embedResult = await client.embed(texts, 'default');
console.log(' Dimensions:', embedResult.dimensions);
console.log(' Vectors count:', embedResult.vectors.length);
console.log(' First vector (first 5 dims):', embedResult.vectors[0]?.slice(0, 5));
} catch (error) {
console.log(' Error:', error.message);
}
console.log('');
// Example 4: Single message chat
console.log('4. Single message chat...');
try {
const response = await client.chat('What is 2+2?', 'default');
console.log(' Response:', response);
} catch (error) {
console.log(' Error:', error.message);
}
console.log('');
// Example 5: Multi-turn chat
console.log('5. Multi-turn chat...');
try {
const chatResponse = await client.chatMessages([
{ role: 'system', content: 'You are a helpful math tutor.' },
{ role: 'user', content: 'What is the Pythagorean theorem?' }
], 'default');
console.log(' Content:', chatResponse.content);
console.log(' Model:', chatResponse.model);
console.log(' Usage:', chatResponse.usage);
} catch (error) {
console.log(' Error:', error.message);
}
console.log('');
console.log('=== Examples completed ===');
}
// Run examples
main().catch(console.error);

View File

@@ -855,7 +855,7 @@ export async function cliCommand(
console.log(chalk.gray(' --model <model> Model override'));
console.log(chalk.gray(' --cd <path> Working directory'));
console.log(chalk.gray(' --includeDirs <dirs> Additional directories'));
console.log(chalk.gray(' --timeout <ms> Timeout (default: 300000)'));
console.log(chalk.gray(' --timeout <ms> Timeout (default: 0=disabled)'));
console.log(chalk.gray(' --resume [id] Resume previous session'));
console.log(chalk.gray(' --cache <items> Cache: comma-separated @patterns and text'));
console.log(chalk.gray(' --inject-mode <m> Inject mode: none, full, progressive'));

View File

@@ -6,7 +6,7 @@
import chalk from 'chalk';
import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
import { join, dirname } from 'path';
import { tmpdir } from 'os';
import { homedir } from 'os';
interface HookOptions {
stdin?: boolean;
@@ -53,9 +53,10 @@ async function readStdin(): Promise<string> {
/**
* Get session state file path
* Uses ~/.claude/.ccw-sessions/ for reliable persistence across sessions
*/
function getSessionStateFile(sessionId: string): string {
const stateDir = join(tmpdir(), '.ccw-sessions');
const stateDir = join(homedir(), '.claude', '.ccw-sessions');
if (!existsSync(stateDir)) {
mkdirSync(stateDir, { recursive: true });
}

View File

@@ -0,0 +1,441 @@
/**
* LiteLLM API Config Manager
* Manages provider credentials, endpoint configurations, and model discovery
*/
import { join } from 'path';
import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'fs';
import { homedir } from 'os';
// ===========================
// Type Definitions
// ===========================
export type ProviderType =
| 'openai'
| 'anthropic'
| 'google'
| 'cohere'
| 'azure'
| 'bedrock'
| 'vertexai'
| 'huggingface'
| 'ollama'
| 'custom';
export interface ProviderCredential {
id: string;
name: string;
type: ProviderType;
apiKey?: string;
baseUrl?: string;
apiVersion?: string;
region?: string;
projectId?: string;
organizationId?: string;
enabled: boolean;
metadata?: Record<string, any>;
createdAt: string;
updatedAt: string;
}
export interface EndpointConfig {
id: string;
name: string;
providerId: string;
model: string;
alias?: string;
temperature?: number;
maxTokens?: number;
topP?: number;
enabled: boolean;
metadata?: Record<string, any>;
createdAt: string;
updatedAt: string;
}
export interface ModelInfo {
id: string;
name: string;
provider: ProviderType;
contextWindow: number;
supportsFunctions: boolean;
supportsStreaming: boolean;
inputCostPer1k?: number;
outputCostPer1k?: number;
}
export interface LiteLLMApiConfig {
version: string;
providers: ProviderCredential[];
endpoints: EndpointConfig[];
}
// ===========================
// Model Definitions
// ===========================
export const PROVIDER_MODELS: Record<ProviderType, ModelInfo[]> = {
openai: [
{
id: 'gpt-4-turbo',
name: 'GPT-4 Turbo',
provider: 'openai',
contextWindow: 128000,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.01,
outputCostPer1k: 0.03,
},
{
id: 'gpt-4',
name: 'GPT-4',
provider: 'openai',
contextWindow: 8192,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.03,
outputCostPer1k: 0.06,
},
{
id: 'gpt-3.5-turbo',
name: 'GPT-3.5 Turbo',
provider: 'openai',
contextWindow: 16385,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.0005,
outputCostPer1k: 0.0015,
},
],
anthropic: [
{
id: 'claude-3-opus-20240229',
name: 'Claude 3 Opus',
provider: 'anthropic',
contextWindow: 200000,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.015,
outputCostPer1k: 0.075,
},
{
id: 'claude-3-sonnet-20240229',
name: 'Claude 3 Sonnet',
provider: 'anthropic',
contextWindow: 200000,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.003,
outputCostPer1k: 0.015,
},
{
id: 'claude-3-haiku-20240307',
name: 'Claude 3 Haiku',
provider: 'anthropic',
contextWindow: 200000,
supportsFunctions: true,
supportsStreaming: true,
inputCostPer1k: 0.00025,
outputCostPer1k: 0.00125,
},
],
google: [
{
id: 'gemini-pro',
name: 'Gemini Pro',
provider: 'google',
contextWindow: 32768,
supportsFunctions: true,
supportsStreaming: true,
},
{
id: 'gemini-pro-vision',
name: 'Gemini Pro Vision',
provider: 'google',
contextWindow: 16384,
supportsFunctions: false,
supportsStreaming: true,
},
],
cohere: [
{
id: 'command',
name: 'Command',
provider: 'cohere',
contextWindow: 4096,
supportsFunctions: false,
supportsStreaming: true,
},
{
id: 'command-light',
name: 'Command Light',
provider: 'cohere',
contextWindow: 4096,
supportsFunctions: false,
supportsStreaming: true,
},
],
azure: [],
bedrock: [],
vertexai: [],
huggingface: [],
ollama: [],
custom: [],
};
// ===========================
// Config File Management
// ===========================
const CONFIG_DIR = join(homedir(), '.claude', 'litellm');
const CONFIG_FILE = join(CONFIG_DIR, 'config.json');
function ensureConfigDir(): void {
if (!existsSync(CONFIG_DIR)) {
mkdirSync(CONFIG_DIR, { recursive: true });
}
}
function loadConfig(): LiteLLMApiConfig {
ensureConfigDir();
if (!existsSync(CONFIG_FILE)) {
const defaultConfig: LiteLLMApiConfig = {
version: '1.0.0',
providers: [],
endpoints: [],
};
saveConfig(defaultConfig);
return defaultConfig;
}
try {
const content = readFileSync(CONFIG_FILE, 'utf-8');
return JSON.parse(content);
} catch (err) {
throw new Error(`Failed to load config: ${(err as Error).message}`);
}
}
function saveConfig(config: LiteLLMApiConfig): void {
ensureConfigDir();
try {
writeFileSync(CONFIG_FILE, JSON.stringify(config, null, 2), 'utf-8');
} catch (err) {
throw new Error(`Failed to save config: ${(err as Error).message}`);
}
}
// ===========================
// Provider Management
// ===========================
export function getAllProviders(): ProviderCredential[] {
const config = loadConfig();
return config.providers;
}
export function getProvider(id: string): ProviderCredential | null {
const config = loadConfig();
return config.providers.find((p) => p.id === id) || null;
}
export function createProvider(
data: Omit<ProviderCredential, 'id' | 'createdAt' | 'updatedAt'>
): ProviderCredential {
const config = loadConfig();
const now = new Date().toISOString();
const provider: ProviderCredential = {
...data,
id: `provider-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
createdAt: now,
updatedAt: now,
};
config.providers.push(provider);
saveConfig(config);
return provider;
}
export function updateProvider(
id: string,
updates: Partial<ProviderCredential>
): ProviderCredential | null {
const config = loadConfig();
const index = config.providers.findIndex((p) => p.id === id);
if (index === -1) {
return null;
}
const updated: ProviderCredential = {
...config.providers[index],
...updates,
id,
updatedAt: new Date().toISOString(),
};
config.providers[index] = updated;
saveConfig(config);
return updated;
}
export function deleteProvider(id: string): { success: boolean } {
const config = loadConfig();
const index = config.providers.findIndex((p) => p.id === id);
if (index === -1) {
return { success: false };
}
config.providers.splice(index, 1);
// Also delete endpoints using this provider
config.endpoints = config.endpoints.filter((e) => e.providerId !== id);
saveConfig(config);
return { success: true };
}
export async function testProviderConnection(
providerId: string
): Promise<{ success: boolean; error?: string }> {
const provider = getProvider(providerId);
if (!provider) {
return { success: false, error: 'Provider not found' };
}
if (!provider.enabled) {
return { success: false, error: 'Provider is disabled' };
}
// Basic validation
if (!provider.apiKey && provider.type !== 'ollama' && provider.type !== 'custom') {
return { success: false, error: 'API key is required for this provider type' };
}
// TODO: Implement actual provider connection testing using litellm-client
// For now, just validate the configuration
return { success: true };
}
// ===========================
// Endpoint Management
// ===========================
export function getAllEndpoints(): EndpointConfig[] {
const config = loadConfig();
return config.endpoints;
}
export function getEndpoint(id: string): EndpointConfig | null {
const config = loadConfig();
return config.endpoints.find((e) => e.id === id) || null;
}
export function createEndpoint(
data: Omit<EndpointConfig, 'id' | 'createdAt' | 'updatedAt'>
): EndpointConfig {
const config = loadConfig();
// Validate provider exists
const provider = config.providers.find((p) => p.id === data.providerId);
if (!provider) {
throw new Error('Provider not found');
}
const now = new Date().toISOString();
const endpoint: EndpointConfig = {
...data,
id: `endpoint-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
createdAt: now,
updatedAt: now,
};
config.endpoints.push(endpoint);
saveConfig(config);
return endpoint;
}
export function updateEndpoint(
id: string,
updates: Partial<EndpointConfig>
): EndpointConfig | null {
const config = loadConfig();
const index = config.endpoints.findIndex((e) => e.id === id);
if (index === -1) {
return null;
}
// Validate provider if being updated
if (updates.providerId) {
const provider = config.providers.find((p) => p.id === updates.providerId);
if (!provider) {
throw new Error('Provider not found');
}
}
const updated: EndpointConfig = {
...config.endpoints[index],
...updates,
id,
updatedAt: new Date().toISOString(),
};
config.endpoints[index] = updated;
saveConfig(config);
return updated;
}
export function deleteEndpoint(id: string): { success: boolean } {
const config = loadConfig();
const index = config.endpoints.findIndex((e) => e.id === id);
if (index === -1) {
return { success: false };
}
config.endpoints.splice(index, 1);
saveConfig(config);
return { success: true };
}
// ===========================
// Model Discovery
// ===========================
export function getModelsForProviderType(providerType: ProviderType): ModelInfo[] | null {
return PROVIDER_MODELS[providerType] || null;
}
export function getAllModels(): Record<ProviderType, ModelInfo[]> {
return PROVIDER_MODELS;
}
// ===========================
// Config Access
// ===========================
export function getFullConfig(): LiteLLMApiConfig {
return loadConfig();
}
export function resetConfig(): void {
const defaultConfig: LiteLLMApiConfig = {
version: '1.0.0',
providers: [],
endpoints: [],
};
saveConfig(defaultConfig);
}

View File

@@ -25,10 +25,33 @@ export interface ModelInfo {
}
/**
* Predefined models for each provider
* Embedding model information metadata
*/
export interface EmbeddingModelInfo {
/** Model identifier (used in API calls) */
id: string;
/** Human-readable display name */
name: string;
/** Embedding dimensions */
dimensions: number;
/** Maximum input tokens */
maxTokens: number;
/** Provider identifier */
provider: string;
}
/**
* Predefined models for each API format
* Used for UI selection and validation
* Note: Most providers use OpenAI-compatible format
*/
export const PROVIDER_MODELS: Record<ProviderType, ModelInfo[]> = {
// OpenAI-compatible format (used by OpenAI, DeepSeek, Ollama, etc.)
openai: [
{
id: 'gpt-4o',
@@ -49,19 +72,32 @@ export const PROVIDER_MODELS: Record<ProviderType, ModelInfo[]> = {
supportsCaching: true
},
{
id: 'o1-mini',
name: 'O1 Mini',
contextWindow: 128000,
supportsCaching: true
id: 'deepseek-chat',
name: 'DeepSeek Chat',
contextWindow: 64000,
supportsCaching: false
},
{
id: 'gpt-4-turbo',
name: 'GPT-4 Turbo',
id: 'deepseek-coder',
name: 'DeepSeek Coder',
contextWindow: 64000,
supportsCaching: false
},
{
id: 'llama3.2',
name: 'Llama 3.2',
contextWindow: 128000,
supportsCaching: false
},
{
id: 'qwen2.5-coder',
name: 'Qwen 2.5 Coder',
contextWindow: 32000,
supportsCaching: false
}
],
// Anthropic format
anthropic: [
{
id: 'claude-sonnet-4-20250514',
@@ -89,135 +125,7 @@ export const PROVIDER_MODELS: Record<ProviderType, ModelInfo[]> = {
}
],
ollama: [
{
id: 'llama3.2',
name: 'Llama 3.2',
contextWindow: 128000,
supportsCaching: false
},
{
id: 'llama3.1',
name: 'Llama 3.1',
contextWindow: 128000,
supportsCaching: false
},
{
id: 'qwen2.5-coder',
name: 'Qwen 2.5 Coder',
contextWindow: 32000,
supportsCaching: false
},
{
id: 'codellama',
name: 'Code Llama',
contextWindow: 16000,
supportsCaching: false
},
{
id: 'mistral',
name: 'Mistral',
contextWindow: 32000,
supportsCaching: false
}
],
azure: [
{
id: 'gpt-4o',
name: 'GPT-4o (Azure)',
contextWindow: 128000,
supportsCaching: true
},
{
id: 'gpt-4o-mini',
name: 'GPT-4o Mini (Azure)',
contextWindow: 128000,
supportsCaching: true
},
{
id: 'gpt-4-turbo',
name: 'GPT-4 Turbo (Azure)',
contextWindow: 128000,
supportsCaching: false
},
{
id: 'gpt-35-turbo',
name: 'GPT-3.5 Turbo (Azure)',
contextWindow: 16000,
supportsCaching: false
}
],
google: [
{
id: 'gemini-2.0-flash-exp',
name: 'Gemini 2.0 Flash Experimental',
contextWindow: 1048576,
supportsCaching: true
},
{
id: 'gemini-1.5-pro',
name: 'Gemini 1.5 Pro',
contextWindow: 2097152,
supportsCaching: true
},
{
id: 'gemini-1.5-flash',
name: 'Gemini 1.5 Flash',
contextWindow: 1048576,
supportsCaching: true
},
{
id: 'gemini-1.0-pro',
name: 'Gemini 1.0 Pro',
contextWindow: 32000,
supportsCaching: false
}
],
mistral: [
{
id: 'mistral-large-latest',
name: 'Mistral Large',
contextWindow: 128000,
supportsCaching: false
},
{
id: 'mistral-medium-latest',
name: 'Mistral Medium',
contextWindow: 32000,
supportsCaching: false
},
{
id: 'mistral-small-latest',
name: 'Mistral Small',
contextWindow: 32000,
supportsCaching: false
},
{
id: 'codestral-latest',
name: 'Codestral',
contextWindow: 32000,
supportsCaching: false
}
],
deepseek: [
{
id: 'deepseek-chat',
name: 'DeepSeek Chat',
contextWindow: 64000,
supportsCaching: false
},
{
id: 'deepseek-coder',
name: 'DeepSeek Coder',
contextWindow: 64000,
supportsCaching: false
}
],
// Custom format
custom: [
{
id: 'custom-model',
@@ -237,6 +145,61 @@ export function getModelsForProvider(providerType: ProviderType): ModelInfo[] {
return PROVIDER_MODELS[providerType] || [];
}
/**
* Predefined embedding models for each API format
* Used for UI selection and validation
*/
export const EMBEDDING_MODELS: Record<ProviderType, EmbeddingModelInfo[]> = {
// OpenAI embedding models
openai: [
{
id: 'text-embedding-3-small',
name: 'Text Embedding 3 Small',
dimensions: 1536,
maxTokens: 8191,
provider: 'openai'
},
{
id: 'text-embedding-3-large',
name: 'Text Embedding 3 Large',
dimensions: 3072,
maxTokens: 8191,
provider: 'openai'
},
{
id: 'text-embedding-ada-002',
name: 'Ada 002',
dimensions: 1536,
maxTokens: 8191,
provider: 'openai'
}
],
// Anthropic doesn't have embedding models
anthropic: [],
// Custom embedding models
custom: [
{
id: 'custom-embedding',
name: 'Custom Embedding',
dimensions: 1536,
maxTokens: 8192,
provider: 'custom'
}
]
};
/**
* Get embedding models for a specific provider
* @param providerType - Provider type to get embedding models for
* @returns Array of embedding model information
*/
export function getEmbeddingModelsForProvider(providerType: ProviderType): EmbeddingModelInfo[] {
return EMBEDDING_MODELS[providerType] || [];
}
/**
* Get model information by ID within a provider
* @param providerType - Provider type

View File

@@ -181,29 +181,13 @@ function deleteHookFromSettings(projectPath, scope, event, hookIndex) {
}
// ========================================
// Session State Tracking (for progressive disclosure)
// Session State Tracking
// ========================================
// Track sessions that have received startup context
// Key: sessionId, Value: timestamp of first context load
const sessionContextState = new Map<string, {
firstLoad: string;
loadCount: number;
lastPrompt?: string;
}>();
// Cleanup old sessions (older than 24 hours)
function cleanupOldSessions() {
const cutoff = Date.now() - 24 * 60 * 60 * 1000;
for (const [sessionId, state] of sessionContextState.entries()) {
if (new Date(state.firstLoad).getTime() < cutoff) {
sessionContextState.delete(sessionId);
}
}
}
// Run cleanup every hour
setInterval(cleanupOldSessions, 60 * 60 * 1000);
// NOTE: Session state is managed by the CLI command (src/commands/hook.ts)
// using file-based persistence (~/.claude/.ccw-sessions/).
// This ensures consistent state tracking across all invocation methods.
// The /api/hook endpoint delegates to SessionClusteringService without
// managing its own state, as the authoritative state lives in the CLI layer.
// ========================================
// Route Handler
@@ -286,7 +270,8 @@ export async function handleHooksRoutes(ctx: RouteContext): Promise<boolean> {
}
// API: Unified Session Context endpoint (Progressive Disclosure)
// Automatically detects first prompt vs subsequent prompts
// DEPRECATED: Use CLI command `ccw hook session-context --stdin` instead.
// This endpoint now uses file-based state (shared with CLI) for consistency.
// - First prompt: returns cluster-based session overview
// - Subsequent prompts: returns intent-matched sessions based on prompt
if (pathname === '/api/hook/session-context' && req.method === 'POST') {
@@ -306,21 +291,30 @@ export async function handleHooksRoutes(ctx: RouteContext): Promise<boolean> {
const { SessionClusteringService } = await import('../session-clustering-service.js');
const clusteringService = new SessionClusteringService(projectPath);
// Check if this is the first prompt for this session
const existingState = sessionContextState.get(sessionId);
// Use file-based session state (shared with CLI hook.ts)
const sessionStateDir = join(homedir(), '.claude', '.ccw-sessions');
const sessionStateFile = join(sessionStateDir, `session-${sessionId}.json`);
let existingState: { firstLoad: string; loadCount: number; lastPrompt?: string } | null = null;
if (existsSync(sessionStateFile)) {
try {
existingState = JSON.parse(readFileSync(sessionStateFile, 'utf-8'));
} catch {
existingState = null;
}
}
const isFirstPrompt = !existingState;
// Update session state
if (isFirstPrompt) {
sessionContextState.set(sessionId, {
firstLoad: new Date().toISOString(),
loadCount: 1,
lastPrompt: prompt
});
} else {
existingState.loadCount++;
existingState.lastPrompt = prompt;
// Update session state (file-based)
const newState = isFirstPrompt
? { firstLoad: new Date().toISOString(), loadCount: 1, lastPrompt: prompt }
: { ...existingState!, loadCount: existingState!.loadCount + 1, lastPrompt: prompt };
if (!existsSync(sessionStateDir)) {
mkdirSync(sessionStateDir, { recursive: true });
}
writeFileSync(sessionStateFile, JSON.stringify(newState, null, 2));
// Determine which type of context to return
let contextType: 'session-start' | 'context';
@@ -351,7 +345,7 @@ export async function handleHooksRoutes(ctx: RouteContext): Promise<boolean> {
success: true,
type: contextType,
isFirstPrompt,
loadCount: sessionContextState.get(sessionId)?.loadCount || 1,
loadCount: newState.loadCount,
content,
sessionId
};

File diff suppressed because it is too large

View File

@@ -23,6 +23,8 @@ const i18n = {
'common.loading': 'Loading...',
'common.error': 'Error',
'common.success': 'Success',
'common.deleteSuccess': 'Deleted successfully',
'common.deleteFailed': 'Delete failed',
'common.retry': 'Retry',
'common.refresh': 'Refresh',
'common.minutes': 'minutes',
@@ -1345,17 +1347,64 @@ const i18n = {
'apiSettings.editEndpoint': 'Edit Endpoint',
'apiSettings.deleteEndpoint': 'Delete Endpoint',
'apiSettings.providerType': 'Provider Type',
'apiSettings.apiFormat': 'API Format',
'apiSettings.compatible': 'Compatible',
'apiSettings.customFormat': 'Custom Format',
'apiSettings.apiFormatHint': 'Most providers (DeepSeek, Ollama, etc.) use OpenAI-compatible format',
'apiSettings.displayName': 'Display Name',
'apiSettings.apiKey': 'API Key',
'apiSettings.apiBaseUrl': 'API Base URL',
'apiSettings.useEnvVar': 'Use environment variable',
'apiSettings.enableProvider': 'Enable provider',
'apiSettings.advancedSettings': 'Advanced Settings',
'apiSettings.basicInfo': 'Basic Info',
'apiSettings.endpointSettings': 'Endpoint Settings',
'apiSettings.timeout': 'Timeout (seconds)',
'apiSettings.seconds': 'seconds',
'apiSettings.timeoutHint': 'Request timeout in seconds (default: 300)',
'apiSettings.maxRetries': 'Max Retries',
'apiSettings.maxRetriesHint': 'Maximum retry attempts on failure',
'apiSettings.organization': 'Organization ID',
'apiSettings.organizationHint': 'OpenAI organization ID (org-...)',
'apiSettings.apiVersion': 'API Version',
'apiSettings.apiVersionHint': 'Azure API version (e.g., 2024-02-01)',
'apiSettings.rpm': 'RPM Limit',
'apiSettings.tpm': 'TPM Limit',
'apiSettings.unlimited': 'Unlimited',
'apiSettings.proxy': 'Proxy Server',
'apiSettings.proxyHint': 'HTTP proxy server URL',
'apiSettings.customHeaders': 'Custom Headers',
'apiSettings.customHeadersHint': 'JSON object with custom HTTP headers',
'apiSettings.invalidJsonHeaders': 'Invalid JSON in custom headers',
'apiSettings.searchProviders': 'Search providers...',
'apiSettings.selectProvider': 'Select a Provider',
'apiSettings.selectProviderHint': 'Select a provider from the list to view and manage its settings',
'apiSettings.noProvidersFound': 'No providers found',
'apiSettings.llmModels': 'LLM Models',
'apiSettings.embeddingModels': 'Embedding Models',
'apiSettings.manageModels': 'Manage',
'apiSettings.addModel': 'Add Model',
'apiSettings.multiKeySettings': 'Multi-Key Settings',
'apiSettings.noModels': 'No models configured',
'apiSettings.previewModel': 'Preview',
'apiSettings.modelSettings': 'Model Settings',
'apiSettings.deleteModel': 'Delete Model',
'apiSettings.providerUpdated': 'Provider updated',
'apiSettings.preview': 'Preview',
'apiSettings.used': 'used',
'apiSettings.total': 'total',
'apiSettings.testConnection': 'Test Connection',
'apiSettings.endpointId': 'Endpoint ID',
'apiSettings.endpointIdHint': 'Usage: ccw cli -p "..." --model <endpoint-id>',
'apiSettings.endpoints': 'Endpoints',
'apiSettings.addEndpointHint': 'Create custom endpoint aliases for CLI usage',
'apiSettings.endpointModel': 'Model',
'apiSettings.selectEndpoint': 'Select an endpoint',
'apiSettings.selectEndpointHint': 'Choose an endpoint from the list to view or edit its settings',
'apiSettings.provider': 'Provider',
'apiSettings.model': 'Model',
'apiSettings.selectModel': 'Select model',
'apiSettings.noModelsConfigured': 'No models configured for this provider',
'apiSettings.cacheStrategy': 'Cache Strategy',
'apiSettings.enableContextCaching': 'Enable Context Caching',
'apiSettings.cacheTTL': 'TTL (minutes)',
@@ -1386,6 +1435,82 @@ const i18n = {
'apiSettings.addProviderFirst': 'Please add a provider first',
'apiSettings.failedToLoad': 'Failed to load API settings',
'apiSettings.toggleVisibility': 'Toggle visibility',
'apiSettings.noProvidersHint': 'Add an API provider to get started',
'apiSettings.noEndpointsHint': 'Create custom endpoints for quick access to models',
'apiSettings.cache': 'Cache',
'apiSettings.off': 'Off',
'apiSettings.used': 'used',
'apiSettings.total': 'total',
'apiSettings.cacheUsage': 'Usage',
'apiSettings.cacheSize': 'Size',
'apiSettings.endpointsDescription': 'Manage custom API endpoints for quick model access',
'apiSettings.totalEndpoints': 'Total Endpoints',
'apiSettings.cachedEndpoints': 'Cached Endpoints',
'apiSettings.cacheTabHint': 'Configure global cache settings and view statistics in the main panel',
'apiSettings.cacheDescription': 'Manage response caching to improve performance and reduce costs',
'apiSettings.cachedEntries': 'Cached Entries',
'apiSettings.storageUsed': 'Storage Used',
'apiSettings.cacheActions': 'Cache Actions',
'apiSettings.cacheStatistics': 'Cache Statistics',
'apiSettings.globalCache': 'Global Cache',
// Multi-key management
'apiSettings.apiKeys': 'API Keys',
'apiSettings.addKey': 'Add Key',
'apiSettings.keyLabel': 'Label',
'apiSettings.keyValue': 'API Key',
'apiSettings.keyWeight': 'Weight',
'apiSettings.removeKey': 'Remove',
'apiSettings.noKeys': 'No API keys configured',
'apiSettings.primaryKey': 'Primary Key',
// Routing strategy
'apiSettings.routingStrategy': 'Routing Strategy',
'apiSettings.simpleShuffleRouting': 'Simple Shuffle (Random)',
'apiSettings.weightedRouting': 'Weighted Distribution',
'apiSettings.latencyRouting': 'Latency-Based',
'apiSettings.costRouting': 'Cost-Based',
'apiSettings.leastBusyRouting': 'Least Busy',
'apiSettings.routingHint': 'How to distribute requests across multiple API keys',
// Health check
'apiSettings.healthCheck': 'Health Check',
'apiSettings.enableHealthCheck': 'Enable Health Check',
'apiSettings.healthInterval': 'Check Interval (seconds)',
'apiSettings.healthCooldown': 'Cooldown (seconds)',
'apiSettings.failureThreshold': 'Failure Threshold',
'apiSettings.healthStatus': 'Status',
'apiSettings.healthy': 'Healthy',
'apiSettings.unhealthy': 'Unhealthy',
'apiSettings.unknown': 'Unknown',
'apiSettings.lastCheck': 'Last Check',
'apiSettings.testKey': 'Test Key',
'apiSettings.testingKey': 'Testing...',
'apiSettings.keyValid': 'Key is valid',
'apiSettings.keyInvalid': 'Key is invalid',
// Embedding models
'apiSettings.embeddingDimensions': 'Dimensions',
'apiSettings.embeddingMaxTokens': 'Max Tokens',
'apiSettings.selectEmbeddingModel': 'Select Embedding Model',
// Model modal
'apiSettings.addLlmModel': 'Add LLM Model',
'apiSettings.addEmbeddingModel': 'Add Embedding Model',
'apiSettings.modelId': 'Model ID',
'apiSettings.modelName': 'Display Name',
'apiSettings.modelSeries': 'Series',
'apiSettings.selectFromPresets': 'Select from Presets',
'apiSettings.customModel': 'Custom Model',
'apiSettings.capabilities': 'Capabilities',
'apiSettings.streaming': 'Streaming',
'apiSettings.functionCalling': 'Function Calling',
'apiSettings.vision': 'Vision',
'apiSettings.contextWindow': 'Context Window',
'apiSettings.description': 'Description',
'apiSettings.optional': 'Optional',
'apiSettings.modelIdExists': 'Model ID already exists',
'apiSettings.useModelTreeToManage': 'Use the model tree to manage individual models',
// Common
'common.cancel': 'Cancel',
@@ -1410,6 +1535,7 @@ const i18n = {
'common.saveFailed': 'Failed to save',
'common.unknownError': 'Unknown error',
'common.exception': 'Exception',
'common.status': 'Status',
// Core Memory
'title.coreMemory': 'Core Memory',
@@ -1537,6 +1663,8 @@ const i18n = {
'common.loading': '加载中...',
'common.error': '错误',
'common.success': '成功',
'common.deleteSuccess': '删除成功',
'common.deleteFailed': '删除失败',
'common.retry': '重试',
'common.refresh': '刷新',
'common.minutes': '分钟',
@@ -2869,17 +2997,64 @@ const i18n = {
'apiSettings.editEndpoint': '编辑端点',
'apiSettings.deleteEndpoint': '删除端点',
'apiSettings.providerType': '提供商类型',
'apiSettings.apiFormat': 'API 格式',
'apiSettings.compatible': '兼容',
'apiSettings.customFormat': '自定义格式',
'apiSettings.apiFormatHint': '大多数供应商DeepSeek、Ollama 等)使用 OpenAI 兼容格式',
'apiSettings.displayName': '显示名称',
'apiSettings.apiKey': 'API 密钥',
'apiSettings.apiBaseUrl': 'API 基础 URL',
'apiSettings.useEnvVar': '使用环境变量',
'apiSettings.enableProvider': '启用提供商',
'apiSettings.advancedSettings': '高级设置',
'apiSettings.basicInfo': '基本信息',
'apiSettings.endpointSettings': '端点设置',
'apiSettings.timeout': '超时时间(秒)',
'apiSettings.seconds': '秒',
'apiSettings.timeoutHint': '请求超时时间单位秒默认300',
'apiSettings.maxRetries': '最大重试次数',
'apiSettings.maxRetriesHint': '失败后最大重试次数',
'apiSettings.organization': '组织 ID',
'apiSettings.organizationHint': 'OpenAI 组织 IDorg-...',
'apiSettings.apiVersion': 'API 版本',
'apiSettings.apiVersionHint': 'Azure API 版本(如 2024-02-01',
'apiSettings.rpm': 'RPM 限制',
'apiSettings.tpm': 'TPM 限制',
'apiSettings.unlimited': '无限制',
'apiSettings.proxy': '代理服务器',
'apiSettings.proxyHint': 'HTTP 代理服务器 URL',
'apiSettings.customHeaders': '自定义请求头',
'apiSettings.customHeadersHint': '自定义 HTTP 请求头的 JSON 对象',
'apiSettings.invalidJsonHeaders': '自定义请求头 JSON 格式无效',
'apiSettings.searchProviders': '搜索供应商...',
'apiSettings.selectProvider': '选择供应商',
'apiSettings.selectProviderHint': '从列表中选择一个供应商来查看和管理其设置',
'apiSettings.noProvidersFound': '未找到供应商',
'apiSettings.llmModels': '大语言模型',
'apiSettings.embeddingModels': '向量模型',
'apiSettings.manageModels': '管理',
'apiSettings.addModel': '添加模型',
'apiSettings.multiKeySettings': '多密钥设置',
'apiSettings.noModels': '暂无模型配置',
'apiSettings.previewModel': '预览',
'apiSettings.modelSettings': '模型设置',
'apiSettings.deleteModel': '删除模型',
'apiSettings.providerUpdated': '供应商已更新',
'apiSettings.preview': '预览',
'apiSettings.used': '已使用',
'apiSettings.total': '总计',
'apiSettings.testConnection': '测试连接',
'apiSettings.endpointId': '端点 ID',
'apiSettings.endpointIdHint': '用法: ccw cli -p "..." --model <端点ID>',
'apiSettings.endpoints': '端点',
'apiSettings.addEndpointHint': '创建用于 CLI 的自定义端点别名',
'apiSettings.endpointModel': '模型',
'apiSettings.selectEndpoint': '选择端点',
'apiSettings.selectEndpointHint': '从列表中选择一个端点以查看或编辑其设置',
'apiSettings.provider': '提供商',
'apiSettings.model': '模型',
'apiSettings.selectModel': '选择模型',
'apiSettings.noModelsConfigured': '该供应商未配置模型',
'apiSettings.cacheStrategy': '缓存策略',
'apiSettings.enableContextCaching': '启用上下文缓存',
'apiSettings.cacheTTL': 'TTL (分钟)',
@@ -2910,6 +3085,82 @@ const i18n = {
'apiSettings.addProviderFirst': '请先添加提供商',
'apiSettings.failedToLoad': '加载 API 设置失败',
'apiSettings.toggleVisibility': '切换可见性',
'apiSettings.noProvidersHint': '添加 API 提供商以开始使用',
'apiSettings.noEndpointsHint': '创建自定义端点以快速访问模型',
'apiSettings.cache': '缓存',
'apiSettings.off': '关闭',
'apiSettings.used': '已用',
'apiSettings.total': '总计',
'apiSettings.cacheUsage': '使用率',
'apiSettings.cacheSize': '大小',
'apiSettings.endpointsDescription': '管理自定义 API 端点以快速访问模型',
'apiSettings.totalEndpoints': '总端点数',
'apiSettings.cachedEndpoints': '缓存端点数',
'apiSettings.cacheTabHint': '在主面板中配置全局缓存设置并查看统计信息',
'apiSettings.cacheDescription': '管理响应缓存以提高性能并降低成本',
'apiSettings.cachedEntries': '缓存条目',
'apiSettings.storageUsed': '已用存储',
'apiSettings.cacheActions': '缓存操作',
'apiSettings.cacheStatistics': '缓存统计',
'apiSettings.globalCache': '全局缓存',
// Multi-key management
'apiSettings.apiKeys': 'API 密钥',
'apiSettings.addKey': '添加密钥',
'apiSettings.keyLabel': '标签',
'apiSettings.keyValue': 'API 密钥',
'apiSettings.keyWeight': '权重',
'apiSettings.removeKey': '移除',
'apiSettings.noKeys': '未配置 API 密钥',
'apiSettings.primaryKey': '主密钥',
// Routing strategy
'apiSettings.routingStrategy': '路由策略',
'apiSettings.simpleShuffleRouting': '简单随机',
'apiSettings.weightedRouting': '权重分配',
'apiSettings.latencyRouting': '延迟优先',
'apiSettings.costRouting': '成本优先',
'apiSettings.leastBusyRouting': '最少并发',
'apiSettings.routingHint': '如何在多个 API 密钥间分配请求',
// Health check
'apiSettings.healthCheck': '健康检查',
'apiSettings.enableHealthCheck': '启用健康检查',
'apiSettings.healthInterval': '检查间隔(秒)',
'apiSettings.healthCooldown': '冷却时间(秒)',
'apiSettings.failureThreshold': '失败阈值',
'apiSettings.healthStatus': '状态',
'apiSettings.healthy': '健康',
'apiSettings.unhealthy': '异常',
'apiSettings.unknown': '未知',
'apiSettings.lastCheck': '最后检查',
'apiSettings.testKey': '测试密钥',
'apiSettings.testingKey': '测试中...',
'apiSettings.keyValid': '密钥有效',
'apiSettings.keyInvalid': '密钥无效',
// Embedding models
'apiSettings.embeddingDimensions': '向量维度',
'apiSettings.embeddingMaxTokens': '最大 Token',
'apiSettings.selectEmbeddingModel': '选择嵌入模型',
// Model modal
'apiSettings.addLlmModel': '添加 LLM 模型',
'apiSettings.addEmbeddingModel': '添加嵌入模型',
'apiSettings.modelId': '模型 ID',
'apiSettings.modelName': '显示名称',
'apiSettings.modelSeries': '模型系列',
'apiSettings.selectFromPresets': '从预设选择',
'apiSettings.customModel': '自定义模型',
'apiSettings.capabilities': '能力',
'apiSettings.streaming': '流式输出',
'apiSettings.functionCalling': '函数调用',
'apiSettings.vision': '视觉能力',
'apiSettings.contextWindow': '上下文窗口',
'apiSettings.description': '描述',
'apiSettings.optional': '可选',
'apiSettings.modelIdExists': '模型 ID 已存在',
'apiSettings.useModelTreeToManage': '使用模型树管理各个模型',
// Common
'common.cancel': '取消',
@@ -2934,6 +3185,7 @@ const i18n = {
'common.saveFailed': '保存失败',
'common.unknownError': '未知错误',
'common.exception': '异常',
'common.status': '状态',
// Core Memory
'title.coreMemory': '核心记忆',

File diff suppressed because it is too large

View File

@@ -810,8 +810,8 @@ function buildManualDownloadGuide() {
'<i data-lucide="info" class="w-3.5 h-3.5 mt-0.5 flex-shrink-0"></i>' +
'<div>' +
'<strong>' + (t('codexlens.cacheLocation') || 'Cache Location') + ':</strong><br>' +
'<code class="text-xs">Windows: %LOCALAPPDATA%\\Temp\\fastembed_cache</code><br>' +
'<code class="text-xs">Linux/Mac: ~/.cache/fastembed</code>' +
'<code class="text-xs">Default: ~/.cache/huggingface</code><br>' +
'<code class="text-xs text-muted-foreground">(Check HF_HOME env var if set)</code>' +
'</div>' +
'</div>' +
'</div>' +
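The updated guide text points users at the HuggingFace cache instead of the old fastembed paths. A minimal sketch of how that default location is typically resolved, assuming standard huggingface_hub conventions (the exact HF_HOME precedence is an assumption, not taken from this diff):

```python
import os
from pathlib import Path

def resolve_hf_cache_dir() -> Path:
    """Best-effort guess at the HuggingFace cache root referenced in the guide."""
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        # When HF_HOME is set, downloads are kept under it.
        return Path(hf_home)
    # Default location shown in the updated guide text.
    return Path.home() / ".cache" / "huggingface"

print(resolve_hf_cache_dir())
```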

View File

@@ -67,7 +67,7 @@ const ParamsSchema = z.object({
model: z.string().optional(),
cd: z.string().optional(),
includeDirs: z.string().optional(),
timeout: z.number().default(300000),
timeout: z.number().default(0), // 0 = no internal timeout, controlled by external caller (e.g., bash timeout)
resume: z.union([z.boolean(), z.string()]).optional(), // true = last, string = single ID or comma-separated IDs
id: z.string().optional(), // Custom execution ID (e.g., IMPL-001-step1)
noNative: z.boolean().optional(), // Force prompt concatenation instead of native resume
@@ -1058,8 +1058,10 @@ async function executeCliTool(
reject(new Error(`Failed to spawn ${tool}: ${error.message}`));
});
// Timeout handling
const timeoutId = setTimeout(() => {
// Timeout handling (timeout=0 disables internal timeout, controlled by external caller)
let timeoutId: NodeJS.Timeout | null = null;
if (timeout > 0) {
timeoutId = setTimeout(() => {
timedOut = true;
child.kill('SIGTERM');
setTimeout(() => {
@@ -1068,9 +1070,12 @@ async function executeCliTool(
}
}, 5000);
}, timeout);
}
child.on('close', () => {
if (timeoutId) {
clearTimeout(timeoutId);
}
});
});
}
@@ -1115,8 +1120,8 @@ Modes:
},
timeout: {
type: 'number',
description: 'Timeout in milliseconds (default: 300000 = 5 minutes)',
default: 300000
description: 'Timeout in milliseconds (default: 0 = disabled, controlled by external caller)',
default: 0
}
},
required: ['tool', 'prompt']

View File

@@ -6,17 +6,184 @@
*/
/**
* Supported LLM provider types
* API format types (simplified)
* Most providers use OpenAI-compatible format
*/
export type ProviderType =
| 'openai'
| 'anthropic'
| 'ollama'
| 'azure'
| 'google'
| 'mistral'
| 'deepseek'
| 'custom';
| 'openai' // OpenAI-compatible format (most providers)
| 'anthropic' // Anthropic format
| 'custom'; // Custom format
/**
* Advanced provider settings for LiteLLM compatibility
* Maps to LiteLLM's provider configuration options
*/
export interface ProviderAdvancedSettings {
/** Request timeout in seconds (default: 300) */
timeout?: number;
/** Maximum retry attempts on failure (default: 3) */
maxRetries?: number;
/** Organization ID (OpenAI-specific) */
organization?: string;
/** API version string (Azure-specific, e.g., "2024-02-01") */
apiVersion?: string;
/** Custom HTTP headers as JSON object */
customHeaders?: Record<string, string>;
/** Requests per minute rate limit */
rpm?: number;
/** Tokens per minute rate limit */
tpm?: number;
/** Proxy server URL (e.g., "http://proxy.example.com:8080") */
proxy?: string;
}
/**
* Model type classification
*/
export type ModelType = 'llm' | 'embedding';
/**
* Model capability metadata
*/
export interface ModelCapabilities {
/** Whether the model supports streaming responses */
streaming?: boolean;
/** Whether the model supports function/tool calling */
functionCalling?: boolean;
/** Whether the model supports vision/image input */
vision?: boolean;
/** Context window size in tokens */
contextWindow?: number;
/** Embedding dimension (for embedding models only) */
embeddingDimension?: number;
/** Maximum output tokens */
maxOutputTokens?: number;
}
/**
* Routing strategy for load balancing across multiple keys
*/
export type RoutingStrategy =
| 'simple-shuffle' // Random selection (default, recommended)
| 'weighted' // Weight-based distribution
| 'latency-based' // Route to lowest latency
| 'cost-based' // Route to lowest cost
| 'least-busy'; // Route to least concurrent
/**
* Individual API key configuration with optional weight
*/
export interface ApiKeyEntry {
/** Unique identifier */
id: string;
/** API key value or env var reference */
key: string;
/** Display label for this key */
label?: string;
/** Weight for weighted routing (default: 1) */
weight?: number;
/** Whether this key is enabled */
enabled: boolean;
/** Last health check status */
healthStatus?: 'healthy' | 'unhealthy' | 'unknown';
/** Last health check timestamp */
lastHealthCheck?: string;
/** Error message if unhealthy */
lastError?: string;
}
/**
* Health check configuration
*/
export interface HealthCheckConfig {
/** Enable automatic health checks */
enabled: boolean;
/** Check interval in seconds (default: 300) */
intervalSeconds: number;
/** Cooldown period after failure in seconds (default: 5) */
cooldownSeconds: number;
/** Number of failures before marking unhealthy (default: 3) */
failureThreshold: number;
}
/**
* Model-specific endpoint settings
* Allows per-model configuration overrides
*/
export interface ModelEndpointSettings {
/** Override base URL for this model */
baseUrl?: string;
/** Override timeout for this model */
timeout?: number;
/** Override max retries for this model */
maxRetries?: number;
/** Custom headers for this model */
customHeaders?: Record<string, string>;
/** Cache strategy for this model */
cacheStrategy?: CacheStrategy;
}
/**
* Model definition with type and grouping
*/
export interface ModelDefinition {
/** Unique identifier for this model */
id: string;
/** Display name for UI */
name: string;
/** Model type: LLM or Embedding */
type: ModelType;
/** Model series for grouping (e.g., "GPT-4", "Claude-3") */
series: string;
/** Whether this model is enabled */
enabled: boolean;
/** Model capabilities */
capabilities?: ModelCapabilities;
/** Model-specific endpoint settings */
endpointSettings?: ModelEndpointSettings;
/** Optional description */
description?: string;
/** Creation timestamp (ISO 8601) */
createdAt: string;
/** Last update timestamp (ISO 8601) */
updatedAt: string;
}
/**
* Provider credential configuration
@@ -41,6 +208,24 @@ export interface ProviderCredential {
/** Whether this provider is enabled */
enabled: boolean;
/** Advanced provider settings (optional) */
advancedSettings?: ProviderAdvancedSettings;
/** Multiple API keys for load balancing */
apiKeys?: ApiKeyEntry[];
/** Routing strategy for multi-key load balancing */
routingStrategy?: RoutingStrategy;
/** Health check configuration */
healthCheck?: HealthCheckConfig;
/** LLM models configured for this provider */
llmModels?: ModelDefinition[];
/** Embedding models configured for this provider */
embeddingModels?: ModelDefinition[];
/** Creation timestamp (ISO 8601) */
createdAt: string;

View File

@@ -309,7 +309,7 @@ def generate_embeddings(
# Set/update model configuration for this index
vector_store.set_model_config(
model_profile, embedder.model_name, embedder.embedding_dim
model_profile, embedder.model_name, embedder.embedding_dim, backend=embedding_backend
)
# Use bulk insert mode for efficient batch ANN index building
# This defers ANN updates until end_bulk_insert() is called

View File

@@ -107,8 +107,9 @@ def _get_model_cache_path(cache_dir: Path, info: Dict) -> Path:
Path to the model cache directory
"""
# HuggingFace Hub naming: models--{org}--{model}
model_name = info["model_name"]
sanitized_name = f"models--{model_name.replace('/', '--')}"
# Use cache_name if available (for mapped ONNX models), else model_name
target_name = info.get("cache_name", info["model_name"])
sanitized_name = f"models--{target_name.replace('/', '--')}"
return cache_dir / sanitized_name
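A small sketch of the resulting lookup, using made-up model info dicts to illustrate the cache_name fallback (the specific repo ids are illustrative, not taken from the project's model registry):

```python
from pathlib import Path

def model_cache_path(cache_dir: Path, info: dict) -> Path:
    # Prefer cache_name (used when a mapped ONNX export lives under a different repo id).
    target_name = info.get("cache_name", info["model_name"])
    return cache_dir / f"models--{target_name.replace('/', '--')}"

cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
# Plain model: directory derived from model_name.
print(model_cache_path(cache_dir, {"model_name": "BAAI/bge-small-en-v1.5"}))
# Mapped ONNX model: directory derived from cache_name instead.
print(model_cache_path(cache_dir, {
    "model_name": "jinaai/jina-embeddings-v2-base-code",
    "cache_name": "Xenova/jina-embeddings-v2-base-code",  # hypothetical ONNX mirror
}))
```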

View File

@@ -260,7 +260,7 @@ class HybridSearchEngine:
return []
# Initialize embedder and vector store
from codexlens.semantic.embedder import get_embedder
from codexlens.semantic.factory import get_embedder
from codexlens.semantic.vector_store import VectorStore
vector_store = VectorStore(index_path)
@@ -277,32 +277,51 @@ class HybridSearchEngine:
# Get stored model configuration (preferred) or auto-detect from dimension
model_config = vector_store.get_model_config()
if model_config:
profile = model_config["model_profile"]
backend = model_config.get("backend", "fastembed")
model_name = model_config["model_name"]
model_profile = model_config["model_profile"]
self.logger.debug(
"Using stored model config: %s (%s, %dd)",
profile, model_config["model_name"], model_config["embedding_dim"]
"Using stored model config: %s backend, %s (%s, %dd)",
backend, model_profile, model_name, model_config["embedding_dim"]
)
# Get embedder based on backend
if backend == "litellm":
embedder = get_embedder(backend="litellm", model=model_name)
else:
embedder = get_embedder(backend="fastembed", profile=model_profile)
else:
# Fallback: auto-detect from embedding dimension
detected_dim = vector_store.dimension
if detected_dim is None:
self.logger.info("Vector store dimension unknown, using default profile")
profile = "code" # Default fallback
embedder = get_embedder(backend="fastembed", profile="code")
elif detected_dim == 384:
profile = "fast"
embedder = get_embedder(backend="fastembed", profile="fast")
elif detected_dim == 768:
profile = "code"
embedder = get_embedder(backend="fastembed", profile="code")
elif detected_dim == 1024:
profile = "multilingual" # or balanced, both are 1024
else:
profile = "code" # Default fallback
self.logger.debug(
"No stored model config, auto-detected profile '%s' from dimension %s",
profile, detected_dim
embedder = get_embedder(backend="fastembed", profile="multilingual")
elif detected_dim == 1536:
# Likely OpenAI text-embedding-3-small or ada-002
self.logger.info(
"Detected 1536-dim embeddings (likely OpenAI), using litellm backend with text-embedding-3-small"
)
embedder = get_embedder(backend="litellm", model="text-embedding-3-small")
elif detected_dim == 3072:
# Likely OpenAI text-embedding-3-large
self.logger.info(
"Detected 3072-dim embeddings (likely OpenAI), using litellm backend with text-embedding-3-large"
)
embedder = get_embedder(backend="litellm", model="text-embedding-3-large")
else:
self.logger.debug(
"Unknown dimension %s, using default fastembed profile 'code'",
detected_dim
)
embedder = get_embedder(backend="fastembed", profile="code")
# Use cached embedder (singleton) for performance
embedder = get_embedder(profile=profile)
# Generate query embedding
query_embedding = embedder.embed_single(query)
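The dimension-based fallback above reduces to a small lookup. A condensed sketch, assuming the get_embedder factory signature shown in this hunk (the helper name is made up for illustration):

```python
from codexlens.semantic.factory import get_embedder  # import path as used in the hunk above

def embedder_for_dimension(dim):
    """Sketch of the fallback mapping when no embeddings_config row exists."""
    fastembed_profiles = {384: "fast", 768: "code", 1024: "multilingual"}
    litellm_models = {1536: "text-embedding-3-small", 3072: "text-embedding-3-large"}
    if dim in fastembed_profiles:
        return get_embedder(backend="fastembed", profile=fastembed_profiles[dim])
    if dim in litellm_models:
        return get_embedder(backend="litellm", model=litellm_models[dim])
    # None or unrecognised dimension: default fastembed "code" profile.
    return get_embedder(backend="fastembed", profile="code")
```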

View File

@@ -123,12 +123,34 @@ class VectorStore:
model_profile TEXT NOT NULL,
model_name TEXT NOT NULL,
embedding_dim INTEGER NOT NULL,
backend TEXT NOT NULL DEFAULT 'fastembed',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Migration: Add backend column to existing tables
self._migrate_backend_column(conn)
conn.commit()
def _migrate_backend_column(self, conn: sqlite3.Connection) -> None:
"""Add backend column to existing embeddings_config table if not present.
Args:
conn: Active SQLite connection
"""
# Check if backend column exists
cursor = conn.execute("PRAGMA table_info(embeddings_config)")
columns = [row[1] for row in cursor.fetchall()]
if 'backend' not in columns:
logger.info("Migrating embeddings_config table: adding backend column")
conn.execute("""
ALTER TABLE embeddings_config
ADD COLUMN backend TEXT NOT NULL DEFAULT 'fastembed'
""")
def _init_ann_index(self) -> None:
"""Initialize ANN index (lazy loading from existing data)."""
if not HNSWLIB_AVAILABLE:
@@ -947,11 +969,11 @@ class VectorStore:
"""Get the model configuration used for embeddings in this store.
Returns:
Dictionary with model_profile, model_name, embedding_dim, or None if not set.
Dictionary with model_profile, model_name, embedding_dim, backend, or None if not set.
"""
with sqlite3.connect(self.db_path) as conn:
row = conn.execute(
"SELECT model_profile, model_name, embedding_dim, created_at, updated_at "
"SELECT model_profile, model_name, embedding_dim, backend, created_at, updated_at "
"FROM embeddings_config WHERE id = 1"
).fetchone()
if row:
@@ -959,13 +981,14 @@ class VectorStore:
"model_profile": row[0],
"model_name": row[1],
"embedding_dim": row[2],
"created_at": row[3],
"updated_at": row[4],
"backend": row[3],
"created_at": row[4],
"updated_at": row[5],
}
return None
def set_model_config(
self, model_profile: str, model_name: str, embedding_dim: int
self, model_profile: str, model_name: str, embedding_dim: int, backend: str = 'fastembed'
) -> None:
"""Set the model configuration for embeddings in this store.
@@ -976,19 +999,21 @@ class VectorStore:
model_profile: Model profile name (fast, code, minilm, etc.)
model_name: Full model name (e.g., jinaai/jina-embeddings-v2-base-code)
embedding_dim: Embedding dimension (e.g., 768)
backend: Backend used for embeddings (fastembed or litellm, default: fastembed)
"""
with sqlite3.connect(self.db_path) as conn:
conn.execute(
"""
INSERT INTO embeddings_config (id, model_profile, model_name, embedding_dim)
VALUES (1, ?, ?, ?)
INSERT INTO embeddings_config (id, model_profile, model_name, embedding_dim, backend)
VALUES (1, ?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
model_profile = excluded.model_profile,
model_name = excluded.model_name,
embedding_dim = excluded.embedding_dim,
backend = excluded.backend,
updated_at = CURRENT_TIMESTAMP
""",
(model_profile, model_name, embedding_dim)
(model_profile, model_name, embedding_dim, backend)
)
conn.commit()
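Taken together, a round trip through the migrated config table looks roughly like this; the index path and profile value are placeholders, not taken from the codebase:

```python
from pathlib import Path
from codexlens.semantic.vector_store import VectorStore  # import path as shown in the diff

store = VectorStore(Path("/tmp/example-index"))  # illustrative path

# New litellm-backed index: record the backend alongside profile/name/dimension.
store.set_model_config("openai", "text-embedding-3-small", 1536, backend="litellm")

config = store.get_model_config()
# Existing databases gain the column via _migrate_backend_column and keep the
# DEFAULT 'fastembed' value until set_model_config is called again.
print(config["backend"], config["embedding_dim"])  # -> litellm 1536
```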