feat: Enhance configuration management and embedding capabilities

- Added JSON-based settings management in Config class for embedding and LLM configurations.
- Introduced methods to save and load settings from a JSON file.
- Updated BaseEmbedder and its subclasses to include max_tokens property for better token management.
- Enhanced chunking strategy to support recursive splitting of large symbols with improved overlap handling.
- Implemented comprehensive tests for recursive splitting and chunking behavior.
- Added CLI tools configuration management for better integration with external tools.
- Introduced a new command for compacting session memory into structured text for recovery.
catlog22
2025-12-24 16:32:27 +08:00
parent b00113d212
commit e671b45948
25 changed files with 2889 additions and 153 deletions


@@ -4,7 +4,23 @@
- **Coding Philosophy**: @~/.claude/workflows/coding-philosophy.md
- **Context Requirements**: @~/.claude/workflows/context-tools.md
- **File Modification**: @~/.claude/workflows/file-modification.md
- **CLI Endpoints Config**: @.claude/cli-tools.json
## CLI Endpoints
**Strictly follow the @.claude/cli-tools.json configuration**
Available CLI endpoints are dynamically defined by the config file:
- Built-in tools and their enable/disable status
- Custom API endpoints registered via the Dashboard
- Managed through the CCW Dashboard Status page
## Agent Execution
- **Always use `run_in_background = false`** for Task tool agent calls to ensure synchronous execution and immediate result visibility
## Code Diagnostics
- **Prefer `mcp__ide__getDiagnostics`** for code error checking over shell-based TypeScript compilation
- Usage: `mcp__ide__getDiagnostics({ uri: "file:///path/to/file.ts" })` for specific file or omit uri for all files
- Benefits: Works across platforms, no shell environment issues, real-time IDE integration

.claude/cli-tools.json Normal file

@@ -0,0 +1,46 @@
{
"$schema": "./cli-tools.schema.json",
"version": "1.0.0",
"tools": {
"gemini": {
"enabled": true,
"isBuiltin": true,
"command": "gemini",
"description": "Google AI for code analysis"
},
"qwen": {
"enabled": true,
"isBuiltin": true,
"command": "qwen",
"description": "Alibaba AI assistant"
},
"codex": {
"enabled": true,
"isBuiltin": true,
"command": "codex",
"description": "OpenAI code generation"
},
"claude": {
"enabled": true,
"isBuiltin": true,
"command": "claude",
"description": "Anthropic AI assistant"
}
},
"customEndpoints": [],
"defaultTool": "gemini",
"settings": {
"promptFormat": "plain",
"smartContext": {
"enabled": false,
"maxFiles": 10
},
"nativeResume": true,
"recursiveQuery": true,
"cache": {
"injectionMode": "auto",
"defaultPrefix": "",
"defaultSuffix": ""
}
}
}

.codex/prompts/compact.md Normal file

@@ -0,0 +1,378 @@
---
description: Compact current session memory into structured text for session recovery
argument-hint: "[optional: session description]"
---
# Memory Compact Command (/memory:compact)
## 1. Overview
The `memory:compact` command **compresses current session working memory** into structured text optimized for **session recovery**, extracts critical information, and saves it to persistent storage via MCP `core_memory` tool.
**Core Philosophy**:
- **Session Recovery First**: Capture everything needed to resume work seamlessly
- **Minimize Re-exploration**: Include file paths, decisions, and state to avoid redundant analysis
- **Preserve Train of Thought**: Keep notes and hypotheses for complex debugging
- **Actionable State**: Record last action result and known issues
## 2. Parameters
- `"session description"` (Optional): Session description to supplement objective
- Example: "completed core-memory module"
- Example: "debugging JWT refresh - suspected memory leak"
## 3. Structured Output Format
```markdown
## Session ID
[WFS-ID if workflow session active, otherwise (none)]
## Project Root
[Absolute path to project root, e.g., D:\Claude_dms3]
## Objective
[High-level goal - the "North Star" of this session]
## Execution Plan
[CRITICAL: Embed the LATEST plan in its COMPLETE and DETAILED form]
### Source: [workflow | todo | user-stated | inferred]
<details>
<summary>Full Execution Plan (Click to expand)</summary>
[PRESERVE COMPLETE PLAN VERBATIM - DO NOT SUMMARIZE]
- ALL phases, tasks, subtasks
- ALL file paths (absolute)
- ALL dependencies and prerequisites
- ALL acceptance criteria
- ALL status markers ([x] done, [ ] pending)
- ALL notes and context
Example:
## Phase 1: Setup
- [x] Initialize project structure
- Created D:\Claude_dms3\src\core\index.ts
- Added dependencies: lodash, zod
- [ ] Configure TypeScript
- Update tsconfig.json for strict mode
## Phase 2: Implementation
- [ ] Implement core API
- Target: D:\Claude_dms3\src\api\handler.ts
- Dependencies: Phase 1 complete
- Acceptance: All tests pass
</details>
## Working Files (Modified)
[Absolute paths to actively modified files]
- D:\Claude_dms3\src\file1.ts (role: main implementation)
- D:\Claude_dms3\tests\file1.test.ts (role: unit tests)
## Reference Files (Read-Only)
[Absolute paths to context files - NOT modified but essential for understanding]
- D:\Claude_dms3\.claude\CLAUDE.md (role: project instructions)
- D:\Claude_dms3\src\types\index.ts (role: type definitions)
- D:\Claude_dms3\package.json (role: dependencies)
## Last Action
[Last significant action and its result/status]
## Decisions
- [Decision]: [Reasoning]
- [Decision]: [Reasoning]
## Constraints
- [User-specified limitation or preference]
## Dependencies
- [Added/changed packages or environment requirements]
## Known Issues
- [Deferred bug or edge case]
## Changes Made
- [Completed modification]
## Pending
- [Next step] or (none)
## Notes
[Unstructured thoughts, hypotheses, debugging trails]
```
## 4. Field Definitions
| Field | Purpose | Recovery Value |
|-------|---------|----------------|
| **Session ID** | Workflow session identifier (WFS-*) | Links memory to specific stateful task execution |
| **Project Root** | Absolute path to project directory | Enables correct path resolution in new sessions |
| **Objective** | Ultimate goal of the session | Prevents losing track of broader feature |
| **Execution Plan** | Complete plan from any source (verbatim) | Preserves full planning context, avoids re-planning |
| **Working Files** | Actively modified files (absolute paths) | Immediately identifies where work was happening |
| **Reference Files** | Read-only context files (absolute paths) | Eliminates re-exploration for critical context |
| **Last Action** | Final tool output/status | Immediate state awareness (success/failure) |
| **Decisions** | Architectural choices + reasoning | Prevents re-litigating settled decisions |
| **Constraints** | User-imposed limitations | Maintains personalized coding style |
| **Dependencies** | Package/environment changes | Prevents missing dependency errors |
| **Known Issues** | Deferred bugs/edge cases | Ensures issues aren't forgotten |
| **Changes Made** | Completed modifications | Clear record of what was done |
| **Pending** | Next steps | Immediate action items |
| **Notes** | Hypotheses, debugging trails | Preserves "train of thought" |
## 5. Execution Flow
### Step 1: Analyze Current Session
Extract the following from conversation history:
```javascript
const sessionAnalysis = {
sessionId: "", // WFS-* if workflow session active, null otherwise
projectRoot: "", // Absolute path: D:\Claude_dms3
objective: "", // High-level goal (1-2 sentences)
executionPlan: {
source: "workflow" | "todo" | "user-stated" | "inferred",
content: "" // Full plan content - ALWAYS preserve COMPLETE and DETAILED form
},
workingFiles: [], // {absolutePath, role} - modified files
referenceFiles: [], // {absolutePath, role} - read-only context files
lastAction: "", // Last significant action + result
decisions: [], // {decision, reasoning}
constraints: [], // User-specified limitations
dependencies: [], // Added/changed packages
knownIssues: [], // Deferred bugs
changesMade: [], // Completed modifications
pending: [], // Next steps
notes: "" // Unstructured thoughts
}
```
### Step 2: Generate Structured Text
```javascript
// Helper: Generate execution plan section
const generateExecutionPlan = (plan) => {
const sourceLabels = {
'workflow': 'workflow (IMPL_PLAN.md)',
'todo': 'todo (TodoWrite)',
'user-stated': 'user-stated',
'inferred': 'inferred'
};
// CRITICAL: Preserve complete plan content verbatim - DO NOT summarize
return `### Source: ${sourceLabels[plan.source] || plan.source}
<details>
<summary>Full Execution Plan (Click to expand)</summary>
${plan.content}
</details>`;
};
const structuredText = `## Session ID
${sessionAnalysis.sessionId || '(none)'}
## Project Root
${sessionAnalysis.projectRoot}
## Objective
${sessionAnalysis.objective}
## Execution Plan
${generateExecutionPlan(sessionAnalysis.executionPlan)}
## Working Files (Modified)
${sessionAnalysis.workingFiles.map(f => `- ${f.absolutePath} (role: ${f.role})`).join('\n') || '(none)'}
## Reference Files (Read-Only)
${sessionAnalysis.referenceFiles.map(f => `- ${f.absolutePath} (role: ${f.role})`).join('\n') || '(none)'}
## Last Action
${sessionAnalysis.lastAction}
## Decisions
${sessionAnalysis.decisions.map(d => `- ${d.decision}: ${d.reasoning}`).join('\n') || '(none)'}
## Constraints
${sessionAnalysis.constraints.map(c => `- ${c}`).join('\n') || '(none)'}
## Dependencies
${sessionAnalysis.dependencies.map(d => `- ${d}`).join('\n') || '(none)'}
## Known Issues
${sessionAnalysis.knownIssues.map(i => `- ${i}`).join('\n') || '(none)'}
## Changes Made
${sessionAnalysis.changesMade.map(c => `- ${c}`).join('\n') || '(none)'}
## Pending
${sessionAnalysis.pending.length > 0
? sessionAnalysis.pending.map(p => `- ${p}`).join('\n')
: '(none)'}
## Notes
${sessionAnalysis.notes || '(none)'}`
```
### Step 3: Import to Core Memory via MCP
Use the MCP `core_memory` tool to save the structured text:
```javascript
mcp__ccw-tools__core_memory({
operation: "import",
text: structuredText
})
```
Or via CLI (pipe structured text to import):
```bash
# Write structured text to temp file, then import
echo "$structuredText" | ccw core-memory import
# Or from a file
ccw core-memory import --file /path/to/session-memory.md
```
**Response Format**:
```json
{
"operation": "import",
"id": "CMEM-YYYYMMDD-HHMMSS",
"message": "Created memory: CMEM-YYYYMMDD-HHMMSS"
}
```
### Step 4: Report Recovery ID
After successful import, **clearly display the Recovery ID** to the user:
```
╔════════════════════════════════════════════════════════════════════════════╗
║ ✓ Session Memory Saved ║
║ ║
║ Recovery ID: CMEM-YYYYMMDD-HHMMSS ║
║ ║
║ To restore: "Please import memory <ID>" ║
║ (MCP: core_memory export | CLI: ccw core-memory export --id <ID>) ║
╚════════════════════════════════════════════════════════════════════════════╝
```
## 6. Quality Checklist
Before generating:
- [ ] Session ID captured if workflow session active (WFS-*)
- [ ] Project Root is absolute path (e.g., D:\Claude_dms3)
- [ ] Objective clearly states the "North Star" goal
- [ ] Execution Plan: COMPLETE plan preserved VERBATIM (no summarization)
- [ ] Plan Source: Clearly identified (workflow | todo | user-stated | inferred)
- [ ] Plan Details: ALL phases, tasks, file paths, dependencies, status markers included
- [ ] All file paths are ABSOLUTE (not relative)
- [ ] Working Files: 3-8 modified files with roles
- [ ] Reference Files: Key context files (CLAUDE.md, types, configs)
- [ ] Last Action captures final state (success/failure)
- [ ] Decisions include reasoning, not just choices
- [ ] Known Issues records deferred bugs and edge cases so they aren't forgotten
- [ ] Notes preserve debugging hypotheses if any
## 7. Path Resolution Rules
### Project Root Detection
1. Check current working directory from environment
2. Look for project markers: `.git/`, `package.json`, `.claude/`
3. Use the topmost directory containing these markers (a sketch follows below)
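A minimal sketch of this walk-up detection (using Node's built-in `fs`/`path`; the marker names are the ones listed above):
```javascript
const fs = require('fs');
const path = require('path');
// Walk upward from startDir and return the TOPMOST directory containing a project marker.
const detectProjectRoot = (startDir = process.cwd()) => {
  const markers = ['.git', 'package.json', '.claude'];
  let current = path.resolve(startDir);
  let topmost = null;
  while (true) {
    if (markers.some((m) => fs.existsSync(path.join(current, m)))) {
      topmost = current; // remember and keep climbing; the highest match wins
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return topmost || path.resolve(startDir);
};
```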
### Absolute Path Conversion
```javascript
// Convert relative to absolute
const toAbsolutePath = (relativePath, projectRoot) => {
if (path.isAbsolute(relativePath)) return relativePath;
return path.join(projectRoot, relativePath);
};
// Example: "src/api/auth.ts" → "D:\Claude_dms3\src\api\auth.ts"
```
### Reference File Categories
| Category | Examples | Priority |
|----------|----------|----------|
| Project Config | `.claude/CLAUDE.md`, `package.json`, `tsconfig.json` | High |
| Type Definitions | `src/types/*.ts`, `*.d.ts` | High |
| Related Modules | Parent/sibling modules with shared interfaces | Medium |
| Test Files | Corresponding test files for modified code | Medium |
| Documentation | `README.md`, `ARCHITECTURE.md` | Low |
## 8. Plan Detection (Priority Order)
### Priority 1: Workflow Session (IMPL_PLAN.md)
```javascript
// Check for active workflow session
const manifest = await mcp__ccw-tools__session_manager({
operation: "list",
location: "active"
});
if (manifest.sessions?.length > 0) {
const session = manifest.sessions[0];
const plan = await mcp__ccw-tools__session_manager({
operation: "read",
session_id: session.id,
content_type: "plan"
});
sessionAnalysis.sessionId = session.id;
sessionAnalysis.executionPlan.source = "workflow";
sessionAnalysis.executionPlan.content = plan.content;
}
```
### Priority 2: TodoWrite (Current Session Todos)
```javascript
// Extract from conversation - look for TodoWrite tool calls
// Preserve COMPLETE todo list with all details
const todos = extractTodosFromConversation();
if (todos.length > 0) {
sessionAnalysis.executionPlan.source = "todo";
// Format todos with full context - preserve status markers
sessionAnalysis.executionPlan.content = todos.map(t =>
`- [${t.status === 'completed' ? 'x' : t.status === 'in_progress' ? '>' : ' '}] ${t.content}`
).join('\n');
}
```
### Priority 3: User-Stated Plan
```javascript
// Look for explicit plan statements in user messages:
// - "Here's my plan: 1. ... 2. ... 3. ..."
// - "I want to: first..., then..., finally..."
// - Numbered or bulleted lists describing steps
const userPlan = extractUserStatedPlan();
if (userPlan) {
sessionAnalysis.executionPlan.source = "user-stated";
sessionAnalysis.executionPlan.content = userPlan;
}
```
### Priority 4: Inferred Plan
```javascript
// If no explicit plan, infer from:
// - Task description and breakdown discussion
// - Sequence of actions taken
// - Outstanding work mentioned
const inferredPlan = inferPlanFromDiscussion();
if (inferredPlan) {
sessionAnalysis.executionPlan.source = "inferred";
sessionAnalysis.executionPlan.content = inferredPlan;
}
```
## 9. Notes
- **Timing**: Execute at task completion or before context switch
- **Frequency**: Once per independent task or milestone
- **Recovery**: New session can immediately continue with full context
- **Knowledge Graph**: Entity relationships auto-extracted for visualization
- **Absolute Paths**: Critical for cross-session recovery on different machines


@@ -81,7 +81,7 @@ class LiteLLMEmbedder(AbstractEmbedder):
"""Format model name for LiteLLM.
Returns:
Formatted model name (e.g., "text-embedding-3-small")
Formatted model name (e.g., "openai/text-embedding-3-small")
"""
provider = self._model_config.provider
model = self._model_config.model
@@ -90,6 +90,11 @@ class LiteLLMEmbedder(AbstractEmbedder):
if provider in ["azure", "vertex_ai", "bedrock"]:
return f"{provider}/{model}"
# For providers with custom api_base (OpenAI-compatible endpoints),
# use openai/ prefix to tell LiteLLM to use OpenAI API format
if self._provider_config.api_base and provider not in ["openai", "anthropic"]:
return f"openai/{model}"
return model
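# Illustrative routing summary (model ids below are placeholders, not values from this commit):
#   provider="azure" / "vertex_ai" / "bedrock"      -> "azure/<model>" etc.
#   custom api_base, provider not openai/anthropic  -> "openai/<model>"
#   otherwise                                       -> "<model>" unchanged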
@property
@@ -133,6 +138,10 @@ class LiteLLMEmbedder(AbstractEmbedder):
embedding_kwargs = {**self._litellm_kwargs, **kwargs}
try:
# For OpenAI-compatible endpoints, ensure encoding_format is set
if self._provider_config.api_base and "encoding_format" not in embedding_kwargs:
embedding_kwargs["encoding_format"] = "float"
# Call LiteLLM embedding
response = litellm.embedding(
model=self._format_model_name(),


@@ -2,6 +2,7 @@
from __future__ import annotations
import json
import os
import re
from pathlib import Path
@@ -11,8 +12,12 @@ import yaml
from .models import LiteLLMConfig
# Default configuration path
DEFAULT_CONFIG_PATH = Path.home() / ".ccw" / "config" / "litellm-config.yaml"
# Default configuration paths
# JSON format (UI config) takes priority over YAML format
DEFAULT_JSON_CONFIG_PATH = Path.home() / ".ccw" / "config" / "litellm-api-config.json"
DEFAULT_YAML_CONFIG_PATH = Path.home() / ".ccw" / "config" / "litellm-config.yaml"
# Keep backward compatibility
DEFAULT_CONFIG_PATH = DEFAULT_YAML_CONFIG_PATH
# Global configuration singleton
_config_instance: LiteLLMConfig | None = None
@@ -84,11 +89,147 @@ def _get_default_config() -> dict[str, Any]:
}
def load_config(config_path: Path | str | None = None) -> LiteLLMConfig:
"""Load LiteLLM configuration from YAML file.
def _convert_json_to_internal_format(json_config: dict[str, Any]) -> dict[str, Any]:
"""Convert UI JSON config format to internal format.
The UI stores config in a different structure:
- providers: array of {id, name, type, apiKey, apiBase, llmModels[], embeddingModels[]}
Internal format uses:
- providers: dict of {provider_id: {api_key, api_base}}
- llm_models: dict of {model_id: {provider, model}}
- embedding_models: dict of {model_id: {provider, model, dimensions}}
Args:
config_path: Path to configuration file (default: ~/.ccw/config/litellm-config.yaml)
json_config: Configuration in UI JSON format
Returns:
Configuration in internal format
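Example (illustrative values, not taken from this commit):
    UI JSON:
        {"providers": [{"id": "openai", "apiKey": "sk-XXX",
                        "llmModels": [{"id": "gpt-4o", "name": "gpt-4o"}]}]}
    Internal:
        {"version": 1, "default_provider": "openai",
         "providers": {"openai": {"api_key": "sk-XXX", "api_base": None}},
         "llm_models": {"gpt-4o": {"provider": "openai", "model": "gpt-4o"}},
         "embedding_models": {"default": {"provider": "openai",
                                          "model": "text-embedding-3-small",
                                          "dimensions": 1536}}}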
"""
providers: dict[str, Any] = {}
llm_models: dict[str, Any] = {}
embedding_models: dict[str, Any] = {}
default_provider: str | None = None
for provider in json_config.get("providers", []):
if not provider.get("enabled", True):
continue
provider_id = provider.get("id", "")
if not provider_id:
continue
# Set first enabled provider as default
if default_provider is None:
default_provider = provider_id
# Convert provider with advanced settings
provider_config: dict[str, Any] = {
"api_key": provider.get("apiKey", ""),
"api_base": provider.get("apiBase"),
}
# Map advanced settings
adv = provider.get("advancedSettings", {})
if adv.get("timeout"):
provider_config["timeout"] = adv["timeout"]
if adv.get("maxRetries"):
provider_config["max_retries"] = adv["maxRetries"]
if adv.get("organization"):
provider_config["organization"] = adv["organization"]
if adv.get("apiVersion"):
provider_config["api_version"] = adv["apiVersion"]
if adv.get("customHeaders"):
provider_config["custom_headers"] = adv["customHeaders"]
providers[provider_id] = provider_config
# Convert LLM models
for model in provider.get("llmModels", []):
if not model.get("enabled", True):
continue
model_id = model.get("id", "")
if not model_id:
continue
llm_model_config: dict[str, Any] = {
"provider": provider_id,
"model": model.get("name", ""),
}
# Add model-specific endpoint settings
endpoint = model.get("endpointSettings", {})
if endpoint.get("baseUrl"):
llm_model_config["api_base"] = endpoint["baseUrl"]
if endpoint.get("timeout"):
llm_model_config["timeout"] = endpoint["timeout"]
if endpoint.get("maxRetries"):
llm_model_config["max_retries"] = endpoint["maxRetries"]
# Add capabilities
caps = model.get("capabilities", {})
if caps.get("contextWindow"):
llm_model_config["context_window"] = caps["contextWindow"]
if caps.get("maxOutputTokens"):
llm_model_config["max_output_tokens"] = caps["maxOutputTokens"]
llm_models[model_id] = llm_model_config
# Convert embedding models
for model in provider.get("embeddingModels", []):
if not model.get("enabled", True):
continue
model_id = model.get("id", "")
if not model_id:
continue
embedding_model_config: dict[str, Any] = {
"provider": provider_id,
"model": model.get("name", ""),
"dimensions": model.get("capabilities", {}).get("embeddingDimension", 1536),
}
# Add model-specific endpoint settings
endpoint = model.get("endpointSettings", {})
if endpoint.get("baseUrl"):
embedding_model_config["api_base"] = endpoint["baseUrl"]
if endpoint.get("timeout"):
embedding_model_config["timeout"] = endpoint["timeout"]
embedding_models[model_id] = embedding_model_config
# Ensure we have defaults if no models found
if not llm_models:
llm_models["default"] = {
"provider": default_provider or "openai",
"model": "gpt-4",
}
if not embedding_models:
embedding_models["default"] = {
"provider": default_provider or "openai",
"model": "text-embedding-3-small",
"dimensions": 1536,
}
return {
"version": json_config.get("version", 1),
"default_provider": default_provider or "openai",
"providers": providers,
"llm_models": llm_models,
"embedding_models": embedding_models,
}
def load_config(config_path: Path | str | None = None) -> LiteLLMConfig:
"""Load LiteLLM configuration from JSON or YAML file.
Priority order:
1. Explicit config_path if provided
2. JSON config (UI format): ~/.ccw/config/litellm-api-config.json
3. YAML config: ~/.ccw/config/litellm-config.yaml
4. Default configuration
Args:
config_path: Path to configuration file (optional)
Returns:
Parsed and validated configuration
@@ -97,22 +238,47 @@ def load_config(config_path: Path | str | None = None) -> LiteLLMConfig:
FileNotFoundError: If config file not found and no default available
ValueError: If configuration is invalid
"""
if config_path is None:
config_path = DEFAULT_CONFIG_PATH
else:
config_path = Path(config_path)
raw_config: dict[str, Any] | None = None
is_json_format = False
# Load configuration
if config_path.exists():
if config_path is not None:
config_path = Path(config_path)
if config_path.exists():
try:
with open(config_path, "r", encoding="utf-8") as f:
if config_path.suffix == ".json":
raw_config = json.load(f)
is_json_format = True
else:
raw_config = yaml.safe_load(f)
except Exception as e:
raise ValueError(f"Failed to load configuration from {config_path}: {e}") from e
# Check JSON config first (UI format)
if raw_config is None and DEFAULT_JSON_CONFIG_PATH.exists():
try:
with open(config_path, "r", encoding="utf-8") as f:
with open(DEFAULT_JSON_CONFIG_PATH, "r", encoding="utf-8") as f:
raw_config = json.load(f)
is_json_format = True
except Exception:
pass # Fall through to YAML
# Check YAML config
if raw_config is None and DEFAULT_YAML_CONFIG_PATH.exists():
try:
with open(DEFAULT_YAML_CONFIG_PATH, "r", encoding="utf-8") as f:
raw_config = yaml.safe_load(f)
except Exception as e:
raise ValueError(f"Failed to load configuration from {config_path}: {e}") from e
else:
# Use default configuration
except Exception:
pass # Fall through to default
# Use default configuration
if raw_config is None:
raw_config = _get_default_config()
# Convert JSON format to internal format if needed
if is_json_format:
raw_config = _convert_json_to_internal_format(raw_config)
# Substitute environment variables
config_data = _substitute_env_vars(raw_config)


@@ -5,7 +5,7 @@
import { existsSync, readFileSync, writeFileSync } from 'fs';
import { join } from 'path';
import { StoragePaths, ensureStorageDir } from './storage-paths.js';
import { StoragePaths, GlobalPaths, ensureStorageDir } from './storage-paths.js';
import type {
LiteLLMApiConfig,
ProviderCredential,
@@ -32,12 +32,12 @@ function getDefaultConfig(): LiteLLMApiConfig {
}
/**
* Get config file path for a project
* Get config file path (global, shared across all projects)
*/
function getConfigPath(baseDir: string): string {
const paths = StoragePaths.project(baseDir);
ensureStorageDir(paths.config);
return join(paths.config, 'litellm-api-config.json');
function getConfigPath(_baseDir?: string): string {
const configDir = GlobalPaths.config();
ensureStorageDir(configDir);
return join(configDir, 'litellm-api-config.json');
}
/**
@@ -356,5 +356,166 @@ export function updateGlobalCacheSettings(
saveConfig(baseDir, config);
}
// ===========================
// YAML Config Generation for ccw_litellm
// ===========================
/**
* Convert UI config (JSON) to ccw_litellm config (YAML format object)
* This allows CodexLens to use UI-configured providers
*/
export function generateLiteLLMYamlConfig(baseDir: string): Record<string, unknown> {
const config = loadLiteLLMApiConfig(baseDir);
// Build providers object
const providers: Record<string, unknown> = {};
for (const provider of config.providers) {
if (!provider.enabled) continue;
providers[provider.id] = {
api_key: provider.apiKey,
api_base: provider.apiBase || getDefaultApiBaseForType(provider.type),
};
}
// Build embedding_models object from providers' embeddingModels
const embeddingModels: Record<string, unknown> = {};
for (const provider of config.providers) {
if (!provider.enabled || !provider.embeddingModels) continue;
for (const model of provider.embeddingModels) {
if (!model.enabled) continue;
embeddingModels[model.id] = {
provider: provider.id,
model: model.name,
dimensions: model.capabilities?.embeddingDimension || 1536,
// Use model-specific base URL if set, otherwise use provider's
...(model.endpointSettings?.baseUrl && {
api_base: model.endpointSettings.baseUrl,
}),
};
}
}
// Build llm_models object from providers' llmModels
const llmModels: Record<string, unknown> = {};
for (const provider of config.providers) {
if (!provider.enabled || !provider.llmModels) continue;
for (const model of provider.llmModels) {
if (!model.enabled) continue;
llmModels[model.id] = {
provider: provider.id,
model: model.name,
...(model.endpointSettings?.baseUrl && {
api_base: model.endpointSettings.baseUrl,
}),
};
}
}
// Find default provider
const defaultProvider = config.providers.find((p) => p.enabled)?.id || 'openai';
return {
version: 1,
default_provider: defaultProvider,
providers,
embedding_models: Object.keys(embeddingModels).length > 0 ? embeddingModels : {
default: {
provider: defaultProvider,
model: 'text-embedding-3-small',
dimensions: 1536,
},
},
llm_models: Object.keys(llmModels).length > 0 ? llmModels : {
default: {
provider: defaultProvider,
model: 'gpt-4',
},
},
};
}
/**
* Get default API base URL for provider type
*/
function getDefaultApiBaseForType(type: ProviderType): string {
const defaults: Record<string, string> = {
openai: 'https://api.openai.com/v1',
anthropic: 'https://api.anthropic.com/v1',
custom: 'https://api.example.com/v1',
};
return defaults[type] || 'https://api.openai.com/v1';
}
/**
* Save ccw_litellm YAML config file
* Writes to ~/.ccw/config/litellm-config.yaml
*/
export function saveLiteLLMYamlConfig(baseDir: string): string {
const yamlConfig = generateLiteLLMYamlConfig(baseDir);
// Convert to YAML manually (simple format)
const yamlContent = objectToYaml(yamlConfig);
// Write to ~/.ccw/config/litellm-config.yaml
const homePath = process.env.HOME || process.env.USERPROFILE || '';
const yamlPath = join(homePath, '.ccw', 'config', 'litellm-config.yaml');
// Ensure directory exists
const configDir = join(homePath, '.ccw', 'config');
ensureStorageDir(configDir);
writeFileSync(yamlPath, yamlContent, 'utf-8');
return yamlPath;
}
/**
* Simple object to YAML converter
*/
function objectToYaml(obj: unknown, indent: number = 0): string {
const spaces = ' '.repeat(indent);
if (obj === null || obj === undefined) {
return 'null';
}
if (typeof obj === 'string') {
// Quote strings that contain special characters
if (obj.includes(':') || obj.includes('#') || obj.includes('\n') || obj.startsWith('$')) {
return `"${obj.replace(/"/g, '\\"')}"`;
}
return obj;
}
if (typeof obj === 'number' || typeof obj === 'boolean') {
return String(obj);
}
if (Array.isArray(obj)) {
if (obj.length === 0) return '[]';
return obj.map((item) => `${spaces}- ${objectToYaml(item, indent + 1).trimStart()}`).join('\n');
}
if (typeof obj === 'object') {
const entries = Object.entries(obj as Record<string, unknown>);
if (entries.length === 0) return '{}';
return entries
.map(([key, value]) => {
if (typeof value === 'object' && value !== null && !Array.isArray(value)) {
return `${spaces}${key}:\n${objectToYaml(value, indent + 1)}`;
}
return `${spaces}${key}: ${objectToYaml(value, indent)}`;
})
.join('\n');
}
return String(obj);
}
// Re-export types
export type { ProviderCredential, CustomEndpoint, ProviderType, CacheStrategy };
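For illustration, a usage sketch of the sync helper defined above and the YAML it emits (the provider id and key are hypothetical; the output shape follows `objectToYaml`):
```javascript
// Assumes saveLiteLLMYamlConfig from the module above is in scope.
const yamlPath = saveLiteLLMYamlConfig('/path/to/project');
console.log(yamlPath); // ~/.ccw/config/litellm-config.yaml

// With a single enabled provider { id: 'openai', apiKey: 'sk-XXX' } and no models defined,
// the written file looks roughly like:
//
// version: 1
// default_provider: openai
// providers:
//   openai:
//     api_key: sk-XXX
//     api_base: "https://api.openai.com/v1"
// embedding_models:
//   default:
//     provider: openai
//     model: text-embedding-3-small
//     dimensions: 1536
// llm_models:
//   default:
//     provider: openai
//     model: gpt-4
```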


@@ -33,6 +33,13 @@ import {
getFullConfigResponse,
PREDEFINED_MODELS
} from '../../tools/cli-config-manager.js';
import {
loadClaudeCliTools,
saveClaudeCliTools,
updateClaudeToolEnabled,
updateClaudeCacheSettings,
getClaudeCliToolsInfo
} from '../../tools/claude-cli-tools.js';
export interface RouteContext {
pathname: string;
@@ -558,5 +565,101 @@ export async function handleCliRoutes(ctx: RouteContext): Promise<boolean> {
return true;
}
// API: Get CLI Tools Config from .claude/cli-tools.json (with fallback to global)
if (pathname === '/api/cli/tools-config' && req.method === 'GET') {
try {
const config = loadClaudeCliTools(initialPath);
const info = getClaudeCliToolsInfo(initialPath);
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({
...config,
_configInfo: info
}));
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: (err as Error).message }));
}
return true;
}
// API: Update CLI Tools Config
if (pathname === '/api/cli/tools-config' && req.method === 'PUT') {
handlePostRequest(req, res, async (body: unknown) => {
try {
const updates = body as Partial<any>;
const config = loadClaudeCliTools(initialPath);
// Merge updates
const updatedConfig = {
...config,
...updates,
tools: { ...config.tools, ...(updates.tools || {}) },
settings: {
...config.settings,
...(updates.settings || {}),
cache: {
...config.settings.cache,
...(updates.settings?.cache || {})
}
}
};
saveClaudeCliTools(initialPath, updatedConfig);
broadcastToClients({
type: 'CLI_TOOLS_CONFIG_UPDATED',
payload: { config: updatedConfig, timestamp: new Date().toISOString() }
});
return { success: true, config: updatedConfig };
} catch (err) {
return { error: (err as Error).message, status: 500 };
}
});
return true;
}
// API: Update specific tool enabled status
const toolsConfigMatch = pathname.match(/^\/api\/cli\/tools-config\/([a-zA-Z0-9_-]+)$/);
if (toolsConfigMatch && req.method === 'PUT') {
const toolName = toolsConfigMatch[1];
handlePostRequest(req, res, async (body: unknown) => {
try {
const { enabled } = body as { enabled: boolean };
const config = updateClaudeToolEnabled(initialPath, toolName, enabled);
broadcastToClients({
type: 'CLI_TOOL_TOGGLED',
payload: { tool: toolName, enabled, timestamp: new Date().toISOString() }
});
return { success: true, config };
} catch (err) {
return { error: (err as Error).message, status: 500 };
}
});
return true;
}
// API: Update cache settings
if (pathname === '/api/cli/tools-config/cache' && req.method === 'PUT') {
handlePostRequest(req, res, async (body: unknown) => {
try {
const cacheSettings = body as { injectionMode?: string; defaultPrefix?: string; defaultSuffix?: string };
const config = updateClaudeCacheSettings(initialPath, cacheSettings as any);
broadcastToClients({
type: 'CLI_CACHE_SETTINGS_UPDATED',
payload: { cache: config.settings.cache, timestamp: new Date().toISOString() }
});
return { success: true, config };
} catch (err) {
return { error: (err as Error).message, status: 500 };
}
});
return true;
}
return false;
}
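A brief client-side sketch of the new endpoints (paths and payload fields are taken from the handlers above; the tool name and mode are examples):
```javascript
// Inside an async context on the dashboard:
// Read the merged CLI tools config (includes _configInfo describing its source).
const cfg = await (await fetch('/api/cli/tools-config')).json();

// Disable a single built-in tool.
await fetch('/api/cli/tools-config/qwen', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ enabled: false })
});

// Switch the cache injection mode.
await fetch('/api/cli/tools-config/cache', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ injectionMode: 'manual' })
});
```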


@@ -405,7 +405,7 @@ export async function handleCodexLensRoutes(ctx: RouteContext): Promise<boolean>
// API: CodexLens Init (Initialize workspace index)
if (pathname === '/api/codexlens/init' && req.method === 'POST') {
handlePostRequest(req, res, async (body) => {
const { path: projectPath, indexType = 'vector', embeddingModel = 'code' } = body;
const { path: projectPath, indexType = 'vector', embeddingModel = 'code', embeddingBackend = 'fastembed' } = body;
const targetPath = projectPath || initialPath;
// Build CLI arguments based on index type
@@ -415,6 +415,10 @@ export async function handleCodexLensRoutes(ctx: RouteContext): Promise<boolean>
} else {
// Add embedding model selection for vector index
args.push('--embedding-model', embeddingModel);
// Add embedding backend if not using default fastembed
if (embeddingBackend && embeddingBackend !== 'fastembed') {
args.push('--embedding-backend', embeddingBackend);
}
}
// Broadcast start event


@@ -20,6 +20,8 @@ import {
getGlobalCacheSettings,
updateGlobalCacheSettings,
loadLiteLLMApiConfig,
saveLiteLLMYamlConfig,
generateLiteLLMYamlConfig,
type ProviderCredential,
type CustomEndpoint,
type ProviderType,
@@ -481,5 +483,150 @@ export async function handleLiteLLMApiRoutes(ctx: RouteContext): Promise<boolean
return true;
}
// ===========================
// Config Sync Routes
// ===========================
// POST /api/litellm-api/config/sync - Sync UI config to ccw_litellm YAML config
if (pathname === '/api/litellm-api/config/sync' && req.method === 'POST') {
try {
const yamlPath = saveLiteLLMYamlConfig(initialPath);
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({
success: true,
message: 'Config synced to ccw_litellm',
yamlPath,
}));
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: (err as Error).message }));
}
return true;
}
// GET /api/litellm-api/config/yaml-preview - Preview YAML config without saving
if (pathname === '/api/litellm-api/config/yaml-preview' && req.method === 'GET') {
try {
const yamlConfig = generateLiteLLMYamlConfig(initialPath);
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({
success: true,
config: yamlConfig,
}));
} catch (err) {
res.writeHead(500, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: (err as Error).message }));
}
return true;
}
// ===========================
// CCW-LiteLLM Package Management
// ===========================
// GET /api/litellm-api/ccw-litellm/status - Check ccw-litellm installation status
if (pathname === '/api/litellm-api/ccw-litellm/status' && req.method === 'GET') {
try {
const { spawn } = await import('child_process');
const result = await new Promise<{ installed: boolean; version?: string }>((resolve) => {
const proc = spawn('python', ['-c', 'import ccw_litellm; print(ccw_litellm.__version__ if hasattr(ccw_litellm, "__version__") else "installed")'], {
shell: true,
timeout: 10000
});
let output = '';
proc.stdout?.on('data', (data) => { output += data.toString(); });
proc.on('close', (code) => {
if (code === 0) {
resolve({ installed: true, version: output.trim() || 'unknown' });
} else {
resolve({ installed: false });
}
});
proc.on('error', () => resolve({ installed: false }));
});
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify(result));
} catch (err) {
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ installed: false, error: (err as Error).message }));
}
return true;
}
// POST /api/litellm-api/ccw-litellm/install - Install ccw-litellm package
if (pathname === '/api/litellm-api/ccw-litellm/install' && req.method === 'POST') {
handlePostRequest(req, res, async () => {
try {
const { spawn } = await import('child_process');
const path = await import('path');
const fs = await import('fs');
// Try to find ccw-litellm package in distribution
const possiblePaths = [
path.join(initialPath, 'ccw-litellm'),
path.join(initialPath, '..', 'ccw-litellm'),
path.join(process.cwd(), 'ccw-litellm'),
];
let packagePath = '';
for (const p of possiblePaths) {
const pyproject = path.join(p, 'pyproject.toml');
if (fs.existsSync(pyproject)) {
packagePath = p;
break;
}
}
if (!packagePath) {
// Try pip install from PyPI as fallback
return new Promise((resolve) => {
const proc = spawn('pip', ['install', 'ccw-litellm'], { shell: true, timeout: 300000 });
let output = '';
let error = '';
proc.stdout?.on('data', (data) => { output += data.toString(); });
proc.stderr?.on('data', (data) => { error += data.toString(); });
proc.on('close', (code) => {
if (code === 0) {
resolve({ success: true, message: 'ccw-litellm installed from PyPI' });
} else {
resolve({ success: false, error: error || 'Installation failed' });
}
});
proc.on('error', (err) => resolve({ success: false, error: err.message }));
});
}
// Install from local package
return new Promise((resolve) => {
const proc = spawn('pip', ['install', '-e', packagePath], { shell: true, timeout: 300000 });
let output = '';
let error = '';
proc.stdout?.on('data', (data) => { output += data.toString(); });
proc.stderr?.on('data', (data) => { error += data.toString(); });
proc.on('close', (code) => {
if (code === 0) {
// Broadcast installation event
broadcastToClients({
type: 'CCW_LITELLM_INSTALLED',
payload: { timestamp: new Date().toISOString() }
});
resolve({ success: true, message: 'ccw-litellm installed successfully', path: packagePath });
} else {
resolve({ success: false, error: error || output || 'Installation failed' });
}
});
proc.on('error', (err) => resolve({ success: false, error: err.message }));
});
} catch (err) {
return { success: false, error: (err as Error).message };
}
});
return true;
}
return false;
}
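Similarly, a hedged sketch of driving the sync and package-management routes from the dashboard (field names follow the responses built above):
```javascript
// Preview the YAML config without writing it, then sync it to ~/.ccw/config/litellm-config.yaml.
const preview = await (await fetch('/api/litellm-api/config/yaml-preview')).json();
console.log(preview.config);

const sync = await (await fetch('/api/litellm-api/config/sync', { method: 'POST' })).json();
console.log(sync.yamlPath);

// Check whether the ccw-litellm Python package is importable; install it if missing.
const status = await (await fetch('/api/litellm-api/ccw-litellm/status')).json();
if (!status.installed) {
  await fetch('/api/litellm-api/ccw-litellm/install', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: '{}'
  });
}
```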


@@ -170,6 +170,27 @@
letter-spacing: 0.03em;
}
.cli-tool-badge-disabled {
font-size: 0.5625rem;
font-weight: 600;
padding: 0.125rem 0.375rem;
background: hsl(38 92% 50% / 0.2);
color: hsl(38 92% 50%);
border-radius: 9999px;
text-transform: uppercase;
letter-spacing: 0.03em;
}
/* Disabled tool card state */
.cli-tool-card.disabled {
opacity: 0.7;
border-style: dashed;
}
.cli-tool-card.disabled .cli-tool-name {
color: hsl(var(--muted-foreground));
}
.cli-tool-info {
font-size: 0.6875rem;
margin-bottom: 0.3125rem;
@@ -773,6 +794,29 @@
border-color: hsl(var(--destructive) / 0.5);
}
/* Enable/Disable button variants */
.btn-sm.btn-outline-success {
background: transparent;
border: 1px solid hsl(142 76% 36% / 0.4);
color: hsl(142 76% 36%);
}
.btn-sm.btn-outline-success:hover {
background: hsl(142 76% 36% / 0.1);
border-color: hsl(142 76% 36% / 0.6);
}
.btn-sm.btn-outline-warning {
background: transparent;
border: 1px solid hsl(38 92% 50% / 0.4);
color: hsl(38 92% 50%);
}
.btn-sm.btn-outline-warning:hover {
background: hsl(38 92% 50% / 0.1);
border-color: hsl(38 92% 50% / 0.6);
}
/* Empty State */
.empty-state {
display: flex;


@@ -622,11 +622,110 @@ select.cli-input {
align-items: center;
justify-content: flex-end;
gap: 0.75rem;
margin-top: 1rem;
padding-top: 1rem;
margin-top: 1.25rem;
padding-top: 1.25rem;
border-top: 1px solid hsl(var(--border));
}
.modal-actions button {
display: inline-flex;
align-items: center;
justify-content: center;
gap: 0.5rem;
padding: 0.625rem 1.25rem;
font-size: 0.875rem;
font-weight: 500;
border-radius: 0.5rem;
cursor: pointer;
transition: all 0.2s ease;
min-width: 5rem;
}
.modal-actions .btn-secondary {
background: transparent;
border: 1px solid hsl(var(--border));
color: hsl(var(--muted-foreground));
}
.modal-actions .btn-secondary:hover {
background: hsl(var(--muted));
color: hsl(var(--foreground));
border-color: hsl(var(--muted-foreground) / 0.3);
}
.modal-actions .btn-primary {
background: hsl(var(--primary));
border: 1px solid hsl(var(--primary));
color: hsl(var(--primary-foreground));
}
.modal-actions .btn-primary:hover {
background: hsl(var(--primary) / 0.9);
box-shadow: 0 2px 8px hsl(var(--primary) / 0.3);
}
.modal-actions .btn-primary:disabled {
opacity: 0.5;
cursor: not-allowed;
box-shadow: none;
}
.modal-actions .btn-danger {
background: hsl(var(--destructive));
border: 1px solid hsl(var(--destructive));
color: hsl(var(--destructive-foreground));
}
.modal-actions .btn-danger:hover {
background: hsl(var(--destructive) / 0.9);
box-shadow: 0 2px 8px hsl(var(--destructive) / 0.3);
}
.modal-actions button i,
.modal-actions button svg {
width: 1rem;
height: 1rem;
flex-shrink: 0;
}
/* Handle .btn class prefix */
.modal-actions .btn {
display: inline-flex;
align-items: center;
justify-content: center;
gap: 0.5rem;
padding: 0.625rem 1.25rem;
font-size: 0.875rem;
font-weight: 500;
border-radius: 0.5rem;
cursor: pointer;
transition: all 0.2s ease;
min-width: 5rem;
}
.modal-actions .btn.btn-secondary {
background: transparent;
border: 1px solid hsl(var(--border));
color: hsl(var(--muted-foreground));
}
.modal-actions .btn.btn-secondary:hover {
background: hsl(var(--muted));
color: hsl(var(--foreground));
border-color: hsl(var(--muted-foreground) / 0.3);
}
.modal-actions .btn.btn-primary {
background: hsl(var(--primary));
border: 1px solid hsl(var(--primary));
color: hsl(var(--primary-foreground));
}
.modal-actions .btn.btn-primary:hover {
background: hsl(var(--primary) / 0.9);
box-shadow: 0 2px 8px hsl(var(--primary) / 0.3);
}
/* Button Icon */
.btn-icon {
display: inline-flex;
@@ -1916,4 +2015,84 @@ select.cli-input {
.health-check-grid {
grid-template-columns: 1fr;
}
}
/* ===========================
Model Settings Modal - Endpoint Preview
=========================== */
.endpoint-preview-section {
background: hsl(var(--muted) / 0.3);
border: 1px solid hsl(var(--border));
border-radius: 0.5rem;
padding: 1rem;
margin-bottom: 0.5rem;
}
.endpoint-preview-section h4 {
display: flex;
align-items: center;
gap: 0.5rem;
margin: 0 0 0.75rem 0;
font-size: 0.875rem;
font-weight: 600;
color: hsl(var(--foreground));
}
.endpoint-preview-section h4 i {
width: 16px;
height: 16px;
color: hsl(var(--primary));
}
.endpoint-preview-box {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.625rem 0.75rem;
background: hsl(var(--background));
border: 1px solid hsl(var(--border));
border-radius: 0.375rem;
margin-bottom: 1rem;
}
.endpoint-preview-box code {
flex: 1;
font-family: 'SF Mono', 'Consolas', 'Liberation Mono', monospace;
font-size: 0.8125rem;
color: hsl(var(--primary));
word-break: break-all;
}
.endpoint-preview-box .btn-icon-sm {
flex-shrink: 0;
}
/* Form Section within Modal */
.form-section {
margin-bottom: 1.25rem;
}
.form-section h4 {
margin: 0 0 0.75rem 0;
font-size: 0.8125rem;
font-weight: 600;
color: hsl(var(--muted-foreground));
text-transform: uppercase;
letter-spacing: 0.05em;
}
.form-section:last-of-type {
margin-bottom: 0;
}
/* Capabilities Checkboxes */
.capabilities-checkboxes {
display: flex;
flex-wrap: wrap;
gap: 0.75rem 1.5rem;
}
.capabilities-checkboxes .checkbox-label {
font-size: 0.875rem;
}


@@ -8,6 +8,8 @@ let semanticStatus = { available: false };
let ccwInstallStatus = { installed: true, workflowsInstalled: true, missingFiles: [], installPath: '' };
let defaultCliTool = 'gemini';
let promptConcatFormat = localStorage.getItem('ccw-prompt-format') || 'plain'; // plain, yaml, json
let cliToolsConfig = {}; // CLI tools enable/disable config
let apiEndpoints = []; // API endpoints from LiteLLM config
// Smart Context settings
let smartContextEnabled = localStorage.getItem('ccw-smart-context') === 'true';
@@ -41,6 +43,12 @@ async function loadAllStatuses() {
semanticStatus = data.semantic || { available: false };
ccwInstallStatus = data.ccwInstall || { installed: true, workflowsInstalled: true, missingFiles: [], installPath: '' };
// Load CLI tools config and API endpoints
await Promise.all([
loadCliToolsConfig(),
loadApiEndpoints()
]);
// Update badges
updateCliBadge();
updateCodexLensBadge();
@@ -168,6 +176,67 @@ async function loadInstalledModels() {
}
}
/**
* Load CLI tools config from .claude/cli-tools.json (project or global fallback)
*/
async function loadCliToolsConfig() {
try {
const response = await fetch('/api/cli/tools-config');
if (!response.ok) return null;
const data = await response.json();
// Store full config and extract tools for backward compatibility
cliToolsConfig = data.tools || {};
window.claudeCliToolsConfig = data; // Full config available globally
// Load default tool from config
if (data.defaultTool) {
defaultCliTool = data.defaultTool;
}
console.log('[CLI Config] Loaded from:', data._configInfo?.source || 'unknown', '| Default:', data.defaultTool);
return data;
} catch (err) {
console.error('Failed to load CLI tools config:', err);
return null;
}
}
/**
* Update CLI tool enabled status
*/
async function updateCliToolEnabled(tool, enabled) {
try {
const response = await fetch('/api/cli/tools-config/' + tool, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ enabled: enabled })
});
if (!response.ok) throw new Error('Failed to update');
showRefreshToast(tool + (enabled ? ' enabled' : ' disabled'), 'success');
return await response.json();
} catch (err) {
console.error('Failed to update CLI tool:', err);
showRefreshToast('Failed to update ' + tool, 'error');
return null;
}
}
/**
* Load API endpoints from LiteLLM config
*/
async function loadApiEndpoints() {
try {
const response = await fetch('/api/litellm-api/endpoints');
if (!response.ok) return [];
const data = await response.json();
apiEndpoints = data.endpoints || [];
return apiEndpoints;
} catch (err) {
console.error('Failed to load API endpoints:', err);
return [];
}
}
// ========== Badge Update ==========
function updateCliBadge() {
const badge = document.getElementById('badgeCliTools');
@@ -234,25 +303,41 @@ function renderCliStatus() {
const status = cliToolStatus[tool] || {};
const isAvailable = status.available;
const isDefault = defaultCliTool === tool;
const config = cliToolsConfig[tool] || { enabled: true };
const isEnabled = config.enabled !== false;
const canSetDefault = isAvailable && isEnabled && !isDefault;
return `
<div class="cli-tool-card tool-${tool} ${isAvailable ? 'available' : 'unavailable'}">
<div class="cli-tool-card tool-${tool} ${isAvailable ? 'available' : 'unavailable'} ${!isEnabled ? 'disabled' : ''}">
<div class="cli-tool-header">
<span class="cli-tool-status ${isAvailable ? 'status-available' : 'status-unavailable'}"></span>
<span class="cli-tool-status ${isAvailable && isEnabled ? 'status-available' : 'status-unavailable'}"></span>
<span class="cli-tool-name">${tool.charAt(0).toUpperCase() + tool.slice(1)}</span>
${isDefault ? '<span class="cli-tool-badge">Default</span>' : ''}
${!isEnabled && isAvailable ? '<span class="cli-tool-badge-disabled">Disabled</span>' : ''}
</div>
<div class="cli-tool-desc text-xs text-muted-foreground mt-1">
${toolDescriptions[tool]}
</div>
<div class="cli-tool-info mt-2">
${isAvailable
? `<span class="text-success flex items-center gap-1"><i data-lucide="check-circle" class="w-3 h-3"></i> Ready</span>`
: `<span class="text-muted-foreground flex items-center gap-1"><i data-lucide="circle-dashed" class="w-3 h-3"></i> Not Installed</span>`
}
<div class="cli-tool-info mt-2 flex items-center justify-between">
<div>
${isAvailable
? (isEnabled
? `<span class="text-success flex items-center gap-1"><i data-lucide="check-circle" class="w-3 h-3"></i> Ready</span>`
: `<span class="text-warning flex items-center gap-1"><i data-lucide="pause-circle" class="w-3 h-3"></i> Disabled</span>`)
: `<span class="text-muted-foreground flex items-center gap-1"><i data-lucide="circle-dashed" class="w-3 h-3"></i> Not Installed</span>`
}
</div>
</div>
<div class="cli-tool-actions mt-3">
${isAvailable && !isDefault
<div class="cli-tool-actions mt-3 flex gap-2">
${isAvailable ? (isEnabled
? `<button class="btn-sm btn-outline-warning flex items-center gap-1" onclick="toggleCliTool('${tool}', false)">
<i data-lucide="pause" class="w-3 h-3"></i> Disable
</button>`
: `<button class="btn-sm btn-outline-success flex items-center gap-1" onclick="toggleCliTool('${tool}', true)">
<i data-lucide="play" class="w-3 h-3"></i> Enable
</button>`
) : ''}
${canSetDefault
? `<button class="btn-sm btn-outline flex items-center gap-1" onclick="setDefaultCliTool('${tool}')">
<i data-lucide="star" class="w-3 h-3"></i> Set Default
</button>`
@@ -365,11 +450,42 @@ function renderCliStatus() {
</div>
` : '';
// API Endpoints section
const apiEndpointsHtml = apiEndpoints.length > 0 ? `
<div class="cli-api-endpoints-section" style="margin-top: 1.5rem;">
<div class="cli-section-header" style="display: flex; align-items: center; gap: 0.5rem; margin-bottom: 1rem;">
<h4 style="display: flex; align-items: center; gap: 0.5rem; font-weight: 600; margin: 0;">
<i data-lucide="link" class="w-4 h-4"></i> API Endpoints
</h4>
<span class="badge" style="padding: 0.125rem 0.5rem; font-size: 0.75rem; border-radius: 0.25rem; background: var(--muted); color: var(--muted-foreground);">${apiEndpoints.length}</span>
</div>
<div class="cli-endpoints-list" style="display: grid; grid-template-columns: repeat(auto-fill, minmax(250px, 1fr)); gap: 0.75rem;">
${apiEndpoints.map(ep => `
<div class="cli-endpoint-card ${ep.enabled ? 'available' : 'unavailable'}" style="padding: 0.75rem; border: 1px solid var(--border); border-radius: 0.5rem; background: var(--card);">
<div class="cli-endpoint-header" style="display: flex; align-items: center; gap: 0.5rem; margin-bottom: 0.5rem;">
<span class="cli-tool-status ${ep.enabled ? 'status-available' : 'status-unavailable'}" style="width: 8px; height: 8px; border-radius: 50%; background: ${ep.enabled ? 'var(--success)' : 'var(--muted-foreground)'}; flex-shrink: 0;"></span>
<span class="cli-endpoint-id" style="font-weight: 500; font-size: 0.875rem;">${ep.id}</span>
</div>
<div class="cli-endpoint-info" style="margin-top: 0.25rem;">
<span class="text-xs text-muted-foreground" style="font-size: 0.75rem; color: var(--muted-foreground);">${ep.model}</span>
</div>
</div>
`).join('')}
</div>
</div>
` : '';
// Config source info
const configInfo = window.claudeCliToolsConfig?._configInfo || {};
const configSourceLabel = configInfo.source === 'project' ? 'Project' : configInfo.source === 'global' ? 'Global' : 'Default';
const configSourceClass = configInfo.source === 'project' ? 'text-success' : configInfo.source === 'global' ? 'text-primary' : 'text-muted-foreground';
// CLI Settings section
const settingsHtml = `
<div class="cli-settings-section">
<div class="cli-settings-header">
<h4><i data-lucide="settings" class="w-3.5 h-3.5"></i> Settings</h4>
<span class="badge text-xs ${configSourceClass}" title="${configInfo.activePath || ''}">${configSourceLabel}</span>
</div>
<div class="cli-settings-grid">
<div class="cli-setting-item">
@@ -436,6 +552,20 @@ function renderCliStatus() {
</div>
<p class="cli-setting-desc">Maximum files to include in smart context</p>
</div>
<div class="cli-setting-item">
<label class="cli-setting-label">
<i data-lucide="hard-drive" class="w-3 h-3"></i>
Cache Injection
</label>
<div class="cli-setting-control">
<select class="cli-setting-select" onchange="setCacheInjectionMode(this.value)">
<option value="auto" ${getCacheInjectionMode() === 'auto' ? 'selected' : ''}>Auto</option>
<option value="manual" ${getCacheInjectionMode() === 'manual' ? 'selected' : ''}>Manual</option>
<option value="disabled" ${getCacheInjectionMode() === 'disabled' ? 'selected' : ''}>Disabled</option>
</select>
</div>
<p class="cli-setting-desc">Cache prefix/suffix injection mode for prompts</p>
</div>
</div>
</div>
`;
@@ -453,6 +583,7 @@ function renderCliStatus() {
${codexLensHtml}
${semanticHtml}
</div>
${apiEndpointsHtml}
${settingsHtml}
`;
@@ -464,7 +595,30 @@ function renderCliStatus() {
// ========== Actions ==========
function setDefaultCliTool(tool) {
// Validate: tool must be available and enabled
const status = cliToolStatus[tool] || {};
const config = cliToolsConfig[tool] || { enabled: true };
if (!status.available) {
showRefreshToast(`Cannot set ${tool} as default: not installed`, 'error');
return;
}
if (config.enabled === false) {
showRefreshToast(`Cannot set ${tool} as default: tool is disabled`, 'error');
return;
}
defaultCliTool = tool;
// Save to config
if (window.claudeCliToolsConfig) {
window.claudeCliToolsConfig.defaultTool = tool;
fetch('/api/cli/tools-config', {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ defaultTool: tool })
}).catch(err => console.error('Failed to save default tool:', err));
}
renderCliStatus();
showRefreshToast(`Default CLI tool set to ${tool}`, 'success');
}
@@ -505,11 +659,67 @@ function setRecursiveQueryEnabled(enabled) {
showRefreshToast(`Recursive Query ${enabled ? 'enabled' : 'disabled'}`, 'success');
}
function getCacheInjectionMode() {
if (window.claudeCliToolsConfig && window.claudeCliToolsConfig.settings) {
return window.claudeCliToolsConfig.settings.cache?.injectionMode || 'auto';
}
return localStorage.getItem('ccw-cache-injection-mode') || 'auto';
}
async function setCacheInjectionMode(mode) {
try {
const response = await fetch('/api/cli/tools-config/cache', {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ injectionMode: mode })
});
if (response.ok) {
localStorage.setItem('ccw-cache-injection-mode', mode);
if (window.claudeCliToolsConfig) {
window.claudeCliToolsConfig.settings.cache.injectionMode = mode;
}
showRefreshToast(`Cache injection mode set to ${mode}`, 'success');
} else {
showRefreshToast('Failed to update cache settings', 'error');
}
} catch (err) {
console.error('Failed to update cache settings:', err);
showRefreshToast('Failed to update cache settings', 'error');
}
}
async function refreshAllCliStatus() {
await loadAllStatuses();
renderCliStatus();
}
async function toggleCliTool(tool, enabled) {
// If disabling the current default tool, switch to another available+enabled tool
if (!enabled && defaultCliTool === tool) {
const tools = ['gemini', 'qwen', 'codex', 'claude'];
const newDefault = tools.find(t => {
if (t === tool) return false;
const status = cliToolStatus[t] || {};
const config = cliToolsConfig[t] || { enabled: true };
return status.available && config.enabled !== false;
});
if (newDefault) {
defaultCliTool = newDefault;
if (window.claudeCliToolsConfig) {
window.claudeCliToolsConfig.defaultTool = newDefault;
}
showRefreshToast(`Default tool switched to ${newDefault}`, 'info');
} else {
showRefreshToast(`Warning: No other enabled tool available for default`, 'warning');
}
}
await updateCliToolEnabled(tool, enabled);
await loadAllStatuses();
renderCliStatus();
}
function installCodexLens() {
openCodexLensInstallWizard();
}


@@ -1389,7 +1389,13 @@ const i18n = {
'apiSettings.previewModel': 'Preview',
'apiSettings.modelSettings': 'Model Settings',
'apiSettings.deleteModel': 'Delete Model',
'apiSettings.endpointPreview': 'Endpoint Preview',
'apiSettings.modelBaseUrlOverride': 'Base URL Override',
'apiSettings.modelBaseUrlHint': 'Override the provider base URL for this specific model (leave empty to use provider default)',
'apiSettings.providerUpdated': 'Provider updated',
'apiSettings.syncToCodexLens': 'Sync to CodexLens',
'apiSettings.configSynced': 'Config synced to CodexLens',
'apiSettings.sdkAutoAppends': 'SDK auto-appends',
'apiSettings.preview': 'Preview',
'apiSettings.used': 'used',
'apiSettings.total': 'total',
@@ -1422,6 +1428,7 @@ const i18n = {
'apiSettings.cacheDisabled': 'Cache Disabled',
'apiSettings.providerSaved': 'Provider saved successfully',
'apiSettings.providerDeleted': 'Provider deleted successfully',
'apiSettings.apiBaseUpdated': 'API Base URL updated successfully',
'apiSettings.endpointSaved': 'Endpoint saved successfully',
'apiSettings.endpointDeleted': 'Endpoint deleted successfully',
'apiSettings.cacheCleared': 'Cache cleared successfully',
@@ -3039,7 +3046,12 @@ const i18n = {
'apiSettings.previewModel': '预览',
'apiSettings.modelSettings': '模型设置',
'apiSettings.deleteModel': '删除模型',
'apiSettings.endpointPreview': '端点预览',
'apiSettings.modelBaseUrlOverride': '基础 URL 覆盖',
'apiSettings.modelBaseUrlHint': '为此模型覆盖供应商的基础 URL(留空则使用供应商默认值)',
'apiSettings.providerUpdated': '供应商已更新',
'apiSettings.syncToCodexLens': '同步到 CodexLens',
'apiSettings.configSynced': '配置已同步到 CodexLens',
'apiSettings.preview': '预览',
'apiSettings.used': '已使用',
'apiSettings.total': '总计',
@@ -3072,6 +3084,7 @@ const i18n = {
'apiSettings.cacheDisabled': '缓存已禁用',
'apiSettings.providerSaved': '提供商保存成功',
'apiSettings.providerDeleted': '提供商删除成功',
'apiSettings.apiBaseUpdated': 'API 基础 URL 更新成功',
'apiSettings.endpointSaved': '端点保存成功',
'apiSettings.endpointDeleted': '端点删除成功',
'apiSettings.cacheCleared': '缓存清除成功',


@@ -359,10 +359,20 @@ async function deleteProvider(providerId) {
/**
* Test provider connection
* @param {string} [providerIdParam] - Optional provider ID. If not provided, uses form context or selectedProviderId
*/
async function testProviderConnection() {
const form = document.getElementById('providerForm');
const providerId = form.dataset.providerId;
async function testProviderConnection(providerIdParam) {
var providerId = providerIdParam;
// Try to get providerId from different sources
if (!providerId) {
var form = document.getElementById('providerForm');
if (form && form.dataset.providerId) {
providerId = form.dataset.providerId;
} else if (selectedProviderId) {
providerId = selectedProviderId;
}
}
if (!providerId) {
showRefreshToast(t('apiSettings.saveProviderFirst'), 'warning');
@@ -553,9 +563,9 @@ async function showAddEndpointModal() {
'</div>' +
'</fieldset>' +
'<div class="modal-actions">' +
'<button type="button" class="btn btn-secondary" onclick="closeEndpointModal()">' + t('common.cancel') + '</button>' +
'<button type="button" class="btn btn-secondary" onclick="closeEndpointModal()"><i data-lucide="x"></i> ' + t('common.cancel') + '</button>' +
'<button type="submit" class="btn btn-primary">' +
'<i data-lucide="save"></i> ' + t('common.save') +
'<i data-lucide="check"></i> ' + t('common.save') +
'</button>' +
'</div>' +
'</form>' +
@@ -845,7 +855,10 @@ async function renderApiSettings() {
}
// Build split layout
container.innerHTML = '<div class="api-settings-container api-settings-split">' +
container.innerHTML =
// CCW-LiteLLM Status Container
'<div id="ccwLitellmStatusContainer" class="mb-4"></div>' +
'<div class="api-settings-container api-settings-split">' +
// Left Sidebar
'<aside class="api-settings-sidebar">' +
sidebarTabsHtml +
@@ -878,6 +891,9 @@ async function renderApiSettings() {
renderCacheMainPanel();
}
// Check and render ccw-litellm status
checkCcwLitellmStatus().then(renderCcwLitellmStatusCard);
if (window.lucide) lucide.createIcons();
}
@@ -966,7 +982,10 @@ function renderProviderDetail(providerId) {
}
var maskedKey = provider.apiKey ? '••••••••••••••••' + provider.apiKey.slice(-4) : '••••••••';
var apiBasePreview = (provider.apiBase || getDefaultApiBase(provider.type)) + '/chat/completions';
var currentApiBase = provider.apiBase || getDefaultApiBase(provider.type);
// Show full endpoint URL preview based on active model tab
var endpointPath = activeModelTab === 'embedding' ? '/embeddings' : '/chat/completions';
var apiBasePreview = currentApiBase + endpointPath;
var html = '<div class="provider-detail-header">' +
'<div class="provider-detail-title">' +
@@ -1007,13 +1026,18 @@ function renderProviderDetail(providerId) {
'<button class="btn btn-secondary" onclick="testProviderConnection()">' + t('apiSettings.testConnection') + '</button>' +
'</div>' +
'</div>' +
// API Base URL field
// API Base URL field - editable
'<div class="field-group">' +
'<div class="field-label">' +
'<span>' + t('apiSettings.apiBaseUrl') + '</span>' +
'</div>' +
'<input type="text" class="cli-input" value="' + escapeHtml(provider.apiBase || getDefaultApiBase(provider.type)) + '" readonly />' +
'<span class="field-hint">' + t('apiSettings.preview') + ': ' + apiBasePreview + '</span>' +
'<div class="field-input-group">' +
'<input type="text" class="cli-input" id="provider-detail-apibase" value="' + escapeHtml(currentApiBase) + '" placeholder="https://api.openai.com/v1" oninput="updateApiBasePreview(this.value)" />' +
'<button class="btn btn-secondary" onclick="saveProviderApiBase(\'' + providerId + '\')">' +
'<i data-lucide="save"></i> ' + t('common.save') +
'</button>' +
'</div>' +
'<span class="field-hint" id="api-base-preview">' + t('apiSettings.preview') + ': ' + escapeHtml(apiBasePreview) + '</span>' +
'</div>' +
// Model Section
'<div class="model-section">' +
@@ -1037,11 +1061,14 @@ function renderProviderDetail(providerId) {
'</div>' +
'<div class="model-tree" id="model-tree"></div>' +
'</div>' +
// Multi-key settings button
// Multi-key and sync buttons
'<div class="multi-key-trigger">' +
'<button class="btn btn-secondary multi-key-btn" onclick="showMultiKeyModal(\'' + providerId + '\')">' +
'<i data-lucide="key-round"></i> ' + t('apiSettings.multiKeySettings') +
'</button>' +
'<button class="btn btn-secondary" onclick="syncConfigToCodexLens()">' +
'<i data-lucide="refresh-cw"></i> ' + t('apiSettings.syncToCodexLens') +
'</button>' +
'</div>' +
'</div>';
@@ -1107,18 +1134,21 @@ function renderModelTree(provider) {
? formatContextWindow(model.capabilities.contextWindow)
: '';
// Badge for embedding models shows dimension instead of context window
var embeddingBadge = model.capabilities && model.capabilities.embeddingDimension
? model.capabilities.embeddingDimension + 'd'
: '';
var displayBadge = activeModelTab === 'llm' ? badge : embeddingBadge;
html += '<div class="model-item" data-model-id="' + model.id + '">' +
'<i data-lucide="' + (activeModelTab === 'llm' ? 'sparkles' : 'box') + '" class="model-item-icon"></i>' +
'<span class="model-item-name">' + escapeHtml(model.name) + '</span>' +
(badge ? '<span class="model-item-badge">' + badge + '</span>' : '') +
(displayBadge ? '<span class="model-item-badge">' + displayBadge + '</span>' : '') +
'<div class="model-item-actions">' +
'<button class="btn-icon-sm" onclick="previewModel(\'' + model.id + '\')" title="' + t('apiSettings.previewModel') + '">' +
'<i data-lucide="eye"></i>' +
'</button>' +
'<button class="btn-icon-sm" onclick="showModelSettingsModal(\'' + model.id + '\')" title="' + t('apiSettings.modelSettings') + '">' +
'<button class="btn-icon-sm" onclick="showModelSettingsModal(\'' + selectedProviderId + '\', \'' + model.id + '\', \'' + activeModelTab + '\')" title="' + t('apiSettings.modelSettings') + '">' +
'<i data-lucide="settings"></i>' +
'</button>' +
'<button class="btn-icon-sm text-destructive" onclick="deleteModel(\'' + model.id + '\')" title="' + t('apiSettings.deleteModel') + '">' +
'<button class="btn-icon-sm text-destructive" onclick="deleteModel(\'' + selectedProviderId + '\', \'' + model.id + '\', \'' + activeModelTab + '\')" title="' + t('apiSettings.deleteModel') + '">' +
'<i data-lucide="trash-2"></i>' +
'</button>' +
'</div>' +
@@ -1418,8 +1448,8 @@ function showAddModelModal(providerId, modelType) {
'</div>' +
'<div class="modal-actions">' +
'<button type="button" class="btn btn-secondary" onclick="closeAddModelModal()">' + t('common.cancel') + '</button>' +
'<button type="submit" class="btn btn-primary">' + t('common.save') + '</button>' +
'<button type="button" class="btn btn-secondary" onclick="closeAddModelModal()"><i data-lucide="x"></i> ' + t('common.cancel') + '</button>' +
'<button type="submit" class="btn btn-primary"><i data-lucide="check"></i> ' + t('common.save') + '</button>' +
'</div>' +
'</form>' +
'</div>' +
@@ -1624,29 +1654,51 @@ function showModelSettingsModal(providerId, modelId, modelType) {
var capabilities = model.capabilities || {};
var endpointSettings = model.endpointSettings || {};
// Calculate endpoint preview URL
var providerBase = provider.apiBase || getDefaultApiBase(provider.type);
var modelBaseUrl = endpointSettings.baseUrl || providerBase;
var endpointPath = isLlm ? '/chat/completions' : '/embeddings';
var endpointPreview = modelBaseUrl + endpointPath;
var modalHtml = '<div class="modal-overlay" id="model-settings-modal">' +
'<div class="modal-content" style="max-width: 550px;">' +
'<div class="modal-content" style="max-width: 600px;">' +
'<div class="modal-header">' +
'<h3>' + t('apiSettings.modelSettings') + ': ' + model.name + '</h3>' +
'<h3>' + t('apiSettings.modelSettings') + ': ' + escapeHtml(model.name) + '</h3>' +
'<button class="modal-close" onclick="closeModelSettingsModal()">&times;</button>' +
'</div>' +
'<div class="modal-body">' +
'<form id="model-settings-form" onsubmit="saveModelSettings(event, \'' + providerId + '\', \'' + modelId + '\', \'' + modelType + '\')">' +
// Endpoint Preview Section (combined view + settings)
'<div class="form-section endpoint-preview-section">' +
'<h4><i data-lucide="' + (isLlm ? 'message-square' : 'box') + '"></i> ' + t('apiSettings.endpointPreview') + '</h4>' +
'<div class="endpoint-preview-box">' +
'<code id="model-endpoint-preview">' + escapeHtml(endpointPreview) + '</code>' +
'<button type="button" class="btn-icon-sm" onclick="copyModelEndpoint()" title="' + t('common.copy') + '">' +
'<i data-lucide="copy"></i>' +
'</button>' +
'</div>' +
'<div class="form-group">' +
'<label>' + t('apiSettings.modelBaseUrlOverride') + ' <span class="text-muted">(' + t('common.optional') + ')</span></label>' +
'<input type="text" id="model-settings-baseurl" class="cli-input" value="' + escapeHtml(endpointSettings.baseUrl || '') + '" placeholder="' + escapeHtml(providerBase) + '" oninput="updateModelEndpointPreview(\'' + (isLlm ? 'chat/completions' : 'embeddings') + '\', \'' + escapeHtml(providerBase) + '\')">' +
'<small class="form-hint">' + t('apiSettings.modelBaseUrlHint') + '</small>' +
'</div>' +
'</div>' +
// Basic Info
'<div class="form-section">' +
'<h4>' + t('apiSettings.basicInfo') + '</h4>' +
'<div class="form-group">' +
'<label>' + t('apiSettings.modelName') + '</label>' +
'<input type="text" id="model-settings-name" class="cli-input" value="' + (model.name || '') + '" required>' +
'<input type="text" id="model-settings-name" class="cli-input" value="' + escapeHtml(model.name || '') + '" required>' +
'</div>' +
'<div class="form-group">' +
'<label>' + t('apiSettings.modelSeries') + '</label>' +
'<input type="text" id="model-settings-series" class="cli-input" value="' + (model.series || '') + '" required>' +
'<input type="text" id="model-settings-series" class="cli-input" value="' + escapeHtml(model.series || '') + '" required>' +
'</div>' +
'<div class="form-group">' +
'<label>' + t('apiSettings.description') + '</label>' +
'<textarea id="model-settings-description" class="cli-input" rows="2">' + (model.description || '') + '</textarea>' +
'<textarea id="model-settings-description" class="cli-input" rows="2">' + escapeHtml(model.description || '') + '</textarea>' +
'</div>' +
'</div>' +
@@ -1678,19 +1730,21 @@ function showModelSettingsModal(providerId, modelId, modelType) {
// Endpoint Settings
'<div class="form-section">' +
'<h4>' + t('apiSettings.endpointSettings') + '</h4>' +
'<div class="form-group">' +
'<div class="form-row">' +
'<div class="form-group form-group-half">' +
'<label>' + t('apiSettings.timeout') + ' (' + t('apiSettings.seconds') + ')</label>' +
'<input type="number" id="model-settings-timeout" class="cli-input" value="' + (endpointSettings.timeout || 300) + '" min="10" max="3600">' +
'</div>' +
'<div class="form-group">' +
'<div class="form-group form-group-half">' +
'<label>' + t('apiSettings.maxRetries') + '</label>' +
'<input type="number" id="model-settings-retries" class="cli-input" value="' + (endpointSettings.maxRetries || 3) + '" min="0" max="10">' +
'</div>' +
'</div>' +
'</div>' +
'<div class="modal-actions">' +
'<button type="button" class="btn-secondary" onclick="closeModelSettingsModal()">' + t('common.cancel') + '</button>' +
'<button type="submit" class="btn-primary">' + t('common.save') + '</button>' +
'<button type="button" class="btn-secondary" onclick="closeModelSettingsModal()"><i data-lucide="x"></i> ' + t('common.cancel') + '</button>' +
'<button type="submit" class="btn-primary"><i data-lucide="check"></i> ' + t('common.save') + '</button>' +
'</div>' +
'</form>' +
'</div>' +
@@ -1701,6 +1755,33 @@ function showModelSettingsModal(providerId, modelId, modelType) {
if (window.lucide) lucide.createIcons();
}
/**
* Update model endpoint preview when base URL changes
*/
function updateModelEndpointPreview(endpointPath, defaultBase) {
var baseUrlInput = document.getElementById('model-settings-baseurl');
var previewElement = document.getElementById('model-endpoint-preview');
if (!baseUrlInput || !previewElement) return;
var baseUrl = baseUrlInput.value.trim() || defaultBase;
// Remove trailing slash if present
if (baseUrl.endsWith('/')) {
baseUrl = baseUrl.slice(0, -1);
}
previewElement.textContent = baseUrl + '/' + endpointPath;
}
/**
* Copy model endpoint URL to clipboard
*/
function copyModelEndpoint() {
var previewElement = document.getElementById('model-endpoint-preview');
if (previewElement) {
navigator.clipboard.writeText(previewElement.textContent);
showRefreshToast(t('common.copied'), 'success');
}
}
function closeModelSettingsModal() {
var modal = document.getElementById('model-settings-modal');
if (modal) modal.remove();
@@ -1744,7 +1825,13 @@ function saveModelSettings(event, providerId, modelId, modelType) {
}
// Update endpoint settings
var baseUrlOverride = document.getElementById('model-settings-baseurl').value.trim();
// Remove trailing slash if present
if (baseUrlOverride && baseUrlOverride.endsWith('/')) {
baseUrlOverride = baseUrlOverride.slice(0, -1);
}
models[modelIndex].endpointSettings = {
baseUrl: baseUrlOverride || undefined,
timeout: parseInt(document.getElementById('model-settings-timeout').value) || 300,
maxRetries: parseInt(document.getElementById('model-settings-retries').value) || 3
};
@@ -1774,11 +1861,6 @@ function saveModelSettings(event, providerId, modelId, modelType) {
});
}
function previewModel(providerId, modelId, modelType) {
// Just open the settings modal in read mode for now
showModelSettingsModal(providerId, modelId, modelType);
}
function deleteModel(providerId, modelId, modelType) {
if (!confirm(t('common.confirmDelete'))) return;
@@ -1823,6 +1905,59 @@ function copyProviderApiKey(providerId) {
}
}
/**
* Save provider API base URL
*/
async function saveProviderApiBase(providerId) {
var input = document.getElementById('provider-detail-apibase');
if (!input) return;
var newApiBase = input.value.trim();
// Remove trailing slash if present
if (newApiBase.endsWith('/')) {
newApiBase = newApiBase.slice(0, -1);
}
try {
var response = await fetch('/api/litellm-api/providers/' + providerId, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ apiBase: newApiBase || undefined })
});
if (!response.ok) throw new Error('Failed to update API base');
// Update local data
var provider = apiSettingsData.providers.find(function(p) { return p.id === providerId; });
if (provider) {
provider.apiBase = newApiBase || undefined;
}
// Update preview
updateApiBasePreview(newApiBase);
showRefreshToast(t('apiSettings.apiBaseUpdated'), 'success');
} catch (err) {
console.error('Failed to save API base:', err);
showRefreshToast(t('common.error') + ': ' + err.message, 'error');
}
}
/**
* Update API base preview text showing full endpoint URL
*/
function updateApiBasePreview(apiBase) {
var preview = document.getElementById('api-base-preview');
if (!preview) return;
var base = apiBase || getDefaultApiBase('openai');
// Remove trailing slash if present
if (base.endsWith('/')) {
base = base.slice(0, -1);
}
var endpointPath = activeModelTab === 'embedding' ? '/embeddings' : '/chat/completions';
preview.textContent = t('apiSettings.preview') + ': ' + base + endpointPath;
}
/**
* Delete provider with confirmation
*/
@@ -1859,6 +1994,25 @@ async function deleteProviderWithConfirm(providerId) {
}
}
/**
* Sync config to CodexLens (generate YAML config for ccw_litellm)
*/
async function syncConfigToCodexLens() {
try {
var response = await fetch('/api/litellm-api/config/sync', {
method: 'POST'
});
if (!response.ok) throw new Error('Failed to sync config');
var result = await response.json();
showRefreshToast(t('apiSettings.configSynced') + ' (' + result.yamlPath + ')', 'success');
} catch (err) {
console.error('Failed to sync config:', err);
showRefreshToast(t('common.error') + ': ' + err.message, 'error');
}
}
/**
* Get provider icon class based on type
*/
@@ -2343,7 +2497,7 @@ function showMultiKeyModal(providerId) {
renderHealthCheckSection(provider) +
'</div>' +
'<div class="modal-actions">' +
'<button type="button" class="btn-primary" onclick="closeMultiKeyModal()">' + t('common.close') + '</button>' +
'<button type="button" class="btn-primary" onclick="closeMultiKeyModal()"><i data-lucide="check"></i> ' + t('common.close') + '</button>' +
'</div>' +
'</div>' +
'</div>';
@@ -2578,6 +2732,99 @@ function toggleKeyVisibility(btn) {
}
// ========== CCW-LiteLLM Management ==========
/**
* Check ccw-litellm installation status
*/
async function checkCcwLitellmStatus() {
try {
var response = await fetch('/api/litellm-api/ccw-litellm/status');
var status = await response.json();
window.ccwLitellmStatus = status;
return status;
} catch (e) {
console.warn('[API Settings] Could not check ccw-litellm status:', e);
return { installed: false };
}
}
/**
* Render ccw-litellm status card
*/
function renderCcwLitellmStatusCard() {
var container = document.getElementById('ccwLitellmStatusContainer');
if (!container) return;
var status = window.ccwLitellmStatus || { installed: false };
if (status.installed) {
container.innerHTML =
'<div class="flex items-center gap-2 text-sm">' +
'<span class="inline-flex items-center gap-1.5 px-2.5 py-1 rounded-full bg-success/10 text-success border border-success/20">' +
'<i data-lucide="check-circle" class="w-3.5 h-3.5"></i>' +
'ccw-litellm ' + (status.version || '') +
'</span>' +
'</div>';
} else {
container.innerHTML =
'<div class="flex items-center gap-2">' +
'<span class="inline-flex items-center gap-1.5 px-2.5 py-1 rounded-full bg-muted text-muted-foreground border border-border text-sm">' +
'<i data-lucide="circle" class="w-3.5 h-3.5"></i>' +
'ccw-litellm not installed' +
'</span>' +
'<button class="btn-sm btn-primary" onclick="installCcwLitellm()">' +
'<i data-lucide="download" class="w-3.5 h-3.5"></i> Install' +
'</button>' +
'</div>';
}
if (window.lucide) lucide.createIcons();
}
/**
* Install ccw-litellm package
*/
async function installCcwLitellm() {
var container = document.getElementById('ccwLitellmStatusContainer');
if (container) {
container.innerHTML =
'<div class="flex items-center gap-2 text-sm text-muted-foreground">' +
'<div class="animate-spin w-4 h-4 border-2 border-primary border-t-transparent rounded-full"></div>' +
'Installing ccw-litellm...' +
'</div>';
}
try {
var response = await fetch('/api/litellm-api/ccw-litellm/install', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({})
});
var result = await response.json();
if (result.success) {
showRefreshToast('ccw-litellm installed successfully!', 'success');
// Refresh status
await checkCcwLitellmStatus();
renderCcwLitellmStatusCard();
} else {
showRefreshToast('Failed to install ccw-litellm: ' + result.error, 'error');
renderCcwLitellmStatusCard();
}
} catch (e) {
showRefreshToast('Installation error: ' + e.message, 'error');
renderCcwLitellmStatusCard();
}
}
// Make functions globally accessible
window.checkCcwLitellmStatus = checkCcwLitellmStatus;
window.renderCcwLitellmStatusCard = renderCcwLitellmStatusCard;
window.installCcwLitellm = installCcwLitellm;
// ========== Utility Functions ==========
/**

View File

@@ -1166,10 +1166,12 @@ async function deleteModel(profile) {
* Initialize CodexLens index with bottom floating progress bar
* @param {string} indexType - 'vector' (with embeddings), 'normal' (FTS only), or 'full' (FTS + Vector)
* @param {string} embeddingModel - Model profile: 'code', 'fast'
* @param {string} embeddingBackend - Backend: 'fastembed' (local) or 'litellm' (API)
*/
async function initCodexLensIndex(indexType, embeddingModel) {
async function initCodexLensIndex(indexType, embeddingModel, embeddingBackend) {
indexType = indexType || 'vector';
embeddingModel = embeddingModel || 'code';
embeddingBackend = embeddingBackend || 'fastembed';
// For vector or full index, check if semantic dependencies are available
if (indexType === 'vector' || indexType === 'full') {
@@ -1235,7 +1237,8 @@ async function initCodexLensIndex(indexType, embeddingModel) {
var modelLabel = '';
if (indexType !== 'normal') {
var modelNames = { code: 'Code', fast: 'Fast' };
modelLabel = ' [' + (modelNames[embeddingModel] || embeddingModel) + ']';
var backendLabel = embeddingBackend === 'litellm' ? 'API: ' : '';
modelLabel = ' [' + backendLabel + (modelNames[embeddingModel] || embeddingModel) + ']';
}
progressBar.innerHTML =
@@ -1272,17 +1275,19 @@ async function initCodexLensIndex(indexType, embeddingModel) {
var apiIndexType = (indexType === 'full') ? 'vector' : indexType;
// Start indexing with specified type and model
startCodexLensIndexing(apiIndexType, embeddingModel);
startCodexLensIndexing(apiIndexType, embeddingModel, embeddingBackend);
}
/**
* Start the indexing process
* @param {string} indexType - 'vector' or 'normal'
* @param {string} embeddingModel - Model profile: 'code', 'fast'
* @param {string} embeddingBackend - Backend: 'fastembed' (local) or 'litellm' (API)
*/
async function startCodexLensIndexing(indexType, embeddingModel) {
async function startCodexLensIndexing(indexType, embeddingModel, embeddingBackend) {
indexType = indexType || 'vector';
embeddingModel = embeddingModel || 'code';
embeddingBackend = embeddingBackend || 'fastembed';
var statusText = document.getElementById('codexlensIndexStatus');
var progressBar = document.getElementById('codexlensIndexProgressBar');
var percentText = document.getElementById('codexlensIndexPercent');
@@ -1314,11 +1319,11 @@ async function startCodexLensIndexing(indexType, embeddingModel) {
}
try {
console.log('[CodexLens] Starting index for:', projectPath, 'type:', indexType, 'model:', embeddingModel);
console.log('[CodexLens] Starting index for:', projectPath, 'type:', indexType, 'model:', embeddingModel, 'backend:', embeddingBackend);
var response = await fetch('/api/codexlens/init', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ path: projectPath, indexType: indexType, embeddingModel: embeddingModel })
body: JSON.stringify({ path: projectPath, indexType: indexType, embeddingModel: embeddingModel, embeddingBackend: embeddingBackend })
});
var result = await response.json();
@@ -1883,6 +1888,16 @@ async function renderCodexLensManager() {
await loadCodexLensStatus();
}
// Load LiteLLM API config for embedding backend options
try {
var litellmResponse = await fetch('/api/litellm-api/config');
if (litellmResponse.ok) {
window.litellmApiConfig = await litellmResponse.json();
}
} catch (e) {
console.warn('[CodexLens] Could not load LiteLLM config:', e);
}
var response = await fetch('/api/codexlens/config');
var config = await response.json();
@@ -1946,6 +1961,15 @@ function buildCodexLensManagerPage(config) {
'<div class="bg-card border border-border rounded-lg p-5">' +
'<h4 class="text-lg font-semibold mb-4 flex items-center gap-2"><i data-lucide="layers" class="w-5 h-5 text-primary"></i> ' + t('codexlens.createIndex') + '</h4>' +
'<div class="space-y-4">' +
// Backend selector (fastembed local or litellm API)
'<div class="mb-4">' +
'<label class="block text-sm font-medium mb-1.5">' + (t('codexlens.embeddingBackend') || 'Embedding Backend') + '</label>' +
'<select id="pageBackendSelect" class="w-full px-3 py-2 border border-border rounded-lg bg-background text-sm" onchange="onEmbeddingBackendChange()">' +
'<option value="fastembed">' + (t('codexlens.localFastembed') || 'Local (FastEmbed)') + '</option>' +
'<option value="litellm">' + (t('codexlens.apiLitellm') || 'API (LiteLLM)') + '</option>' +
'</select>' +
'<p class="text-xs text-muted-foreground mt-1">' + (t('codexlens.backendHint') || 'Select local model or remote API endpoint') + '</p>' +
'</div>' +
// Model selector
'<div>' +
'<label class="block text-sm font-medium mb-1.5">' + t('codexlens.embeddingModel') + '</label>' +
@@ -2150,18 +2174,68 @@ function buildModelSelectOptionsForPage() {
return options;
}
/**
* Handle embedding backend change
*/
function onEmbeddingBackendChange() {
var backendSelect = document.getElementById('pageBackendSelect');
var modelSelect = document.getElementById('pageModelSelect');
if (!backendSelect || !modelSelect) return;
var backend = backendSelect.value;
if (backend === 'litellm') {
// Load LiteLLM embedding models
modelSelect.innerHTML = buildLiteLLMModelOptions();
} else {
// Load local fastembed models
modelSelect.innerHTML = buildModelSelectOptionsForPage();
}
}
/**
* Build LiteLLM model options from config
*/
function buildLiteLLMModelOptions() {
var litellmConfig = window.litellmApiConfig || {};
var providers = litellmConfig.providers || [];
var options = '';
providers.forEach(function(provider) {
if (!provider.enabled) return;
var models = provider.models || [];
models.forEach(function(model) {
if (model.type !== 'embedding' || !model.enabled) return;
var label = model.name || model.id;
var selected = options === '' ? ' selected' : '';
options += '<option value="' + model.id + '"' + selected + '>' + label + '</option>';
});
});
if (options === '') {
options = '<option value="" disabled selected>' + (t('codexlens.noApiModels') || 'No API embedding models configured') + '</option>';
}
return options;
}
// Make functions globally accessible
window.onEmbeddingBackendChange = onEmbeddingBackendChange;
/**
* Initialize index from page with selected model
*/
function initCodexLensIndexFromPage(indexType) {
var backendSelect = document.getElementById('pageBackendSelect');
var modelSelect = document.getElementById('pageModelSelect');
var selectedBackend = backendSelect ? backendSelect.value : 'fastembed';
var selectedModel = modelSelect ? modelSelect.value : 'code';
// For FTS-only index, model is not needed
if (indexType === 'normal') {
initCodexLensIndex(indexType);
} else {
initCodexLensIndex(indexType, selectedModel);
initCodexLensIndex(indexType, selectedModel, selectedBackend);
}
}

View File

@@ -0,0 +1,300 @@
/**
* Claude CLI Tools Configuration Manager
* Manages .claude/cli-tools.json with fallback:
* 1. Project workspace: {projectDir}/.claude/cli-tools.json (priority)
* 2. Global: ~/.claude/cli-tools.json (fallback)
*/
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// ========== Types ==========
export interface ClaudeCliTool {
enabled: boolean;
isBuiltin: boolean;
command: string;
description: string;
}
export interface ClaudeCacheSettings {
injectionMode: 'auto' | 'manual' | 'disabled';
defaultPrefix: string;
defaultSuffix: string;
}
export interface ClaudeCliToolsConfig {
$schema?: string;
version: string;
tools: Record<string, ClaudeCliTool>;
customEndpoints: Array<{
id: string;
name: string;
enabled: boolean;
}>;
defaultTool: string;
settings: {
promptFormat: 'plain' | 'yaml' | 'json';
smartContext: {
enabled: boolean;
maxFiles: number;
};
nativeResume: boolean;
recursiveQuery: boolean;
cache: ClaudeCacheSettings;
};
}
// ========== Default Config ==========
const DEFAULT_CONFIG: ClaudeCliToolsConfig = {
version: '1.0.0',
tools: {
gemini: {
enabled: true,
isBuiltin: true,
command: 'gemini',
description: 'Google AI for code analysis'
},
qwen: {
enabled: true,
isBuiltin: true,
command: 'qwen',
description: 'Alibaba AI assistant'
},
codex: {
enabled: true,
isBuiltin: true,
command: 'codex',
description: 'OpenAI code generation'
},
claude: {
enabled: true,
isBuiltin: true,
command: 'claude',
description: 'Anthropic AI assistant'
}
},
customEndpoints: [],
defaultTool: 'gemini',
settings: {
promptFormat: 'plain',
smartContext: {
enabled: false,
maxFiles: 10
},
nativeResume: true,
recursiveQuery: true,
cache: {
injectionMode: 'auto',
defaultPrefix: '',
defaultSuffix: ''
}
}
};
// ========== Helper Functions ==========
function getProjectConfigPath(projectDir: string): string {
return path.join(projectDir, '.claude', 'cli-tools.json');
}
function getGlobalConfigPath(): string {
return path.join(os.homedir(), '.claude', 'cli-tools.json');
}
/**
* Resolve config path with fallback:
* 1. Project: {projectDir}/.claude/cli-tools.json
* 2. Global: ~/.claude/cli-tools.json
* Returns { path, source } where source is 'project' | 'global' | 'default'
*/
function resolveConfigPath(projectDir: string): { path: string; source: 'project' | 'global' | 'default' } {
const projectPath = getProjectConfigPath(projectDir);
if (fs.existsSync(projectPath)) {
return { path: projectPath, source: 'project' };
}
const globalPath = getGlobalConfigPath();
if (fs.existsSync(globalPath)) {
return { path: globalPath, source: 'global' };
}
return { path: projectPath, source: 'default' };
}
function ensureClaudeDir(projectDir: string): void {
const claudeDir = path.join(projectDir, '.claude');
if (!fs.existsSync(claudeDir)) {
fs.mkdirSync(claudeDir, { recursive: true });
}
}
// ========== Main Functions ==========
/**
* Load CLI tools configuration with fallback:
* 1. Project: {projectDir}/.claude/cli-tools.json
* 2. Global: ~/.claude/cli-tools.json
* 3. Default config
*/
export function loadClaudeCliTools(projectDir: string): ClaudeCliToolsConfig & { _source?: string } {
const resolved = resolveConfigPath(projectDir);
try {
if (resolved.source === 'default') {
// No config file found, return defaults
return { ...DEFAULT_CONFIG, _source: 'default' };
}
const content = fs.readFileSync(resolved.path, 'utf-8');
const parsed = JSON.parse(content) as Partial<ClaudeCliToolsConfig>;
// Merge with defaults
const config = {
...DEFAULT_CONFIG,
...parsed,
tools: { ...DEFAULT_CONFIG.tools, ...(parsed.tools || {}) },
settings: {
...DEFAULT_CONFIG.settings,
...(parsed.settings || {}),
smartContext: {
...DEFAULT_CONFIG.settings.smartContext,
...(parsed.settings?.smartContext || {})
},
cache: {
...DEFAULT_CONFIG.settings.cache,
...(parsed.settings?.cache || {})
}
},
_source: resolved.source
};
console.log(`[claude-cli-tools] Loaded config from ${resolved.source}: ${resolved.path}`);
return config;
} catch (err) {
console.error('[claude-cli-tools] Error loading config:', err);
return { ...DEFAULT_CONFIG, _source: 'default' };
}
}
/**
* Save CLI tools configuration to project .claude/cli-tools.json
* Always saves to project directory (not global)
*/
export function saveClaudeCliTools(projectDir: string, config: ClaudeCliToolsConfig & { _source?: string }): void {
ensureClaudeDir(projectDir);
const configPath = getProjectConfigPath(projectDir);
// Remove internal _source field before saving
const { _source, ...configToSave } = config;
try {
fs.writeFileSync(configPath, JSON.stringify(configToSave, null, 2), 'utf-8');
console.log(`[claude-cli-tools] Saved config to project: ${configPath}`);
} catch (err) {
console.error('[claude-cli-tools] Error saving config:', err);
throw new Error(`Failed to save CLI tools config: ${err}`);
}
}
/**
* Update enabled status for a specific tool
*/
export function updateClaudeToolEnabled(
projectDir: string,
toolName: string,
enabled: boolean
): ClaudeCliToolsConfig {
const config = loadClaudeCliTools(projectDir);
if (config.tools[toolName]) {
config.tools[toolName].enabled = enabled;
saveClaudeCliTools(projectDir, config);
}
return config;
}
/**
* Update cache settings
*/
export function updateClaudeCacheSettings(
projectDir: string,
cacheSettings: Partial<ClaudeCacheSettings>
): ClaudeCliToolsConfig {
const config = loadClaudeCliTools(projectDir);
config.settings.cache = {
...config.settings.cache,
...cacheSettings
};
saveClaudeCliTools(projectDir, config);
return config;
}
/**
* Update default tool
*/
export function updateClaudeDefaultTool(
projectDir: string,
defaultTool: string
): ClaudeCliToolsConfig {
const config = loadClaudeCliTools(projectDir);
config.defaultTool = defaultTool;
saveClaudeCliTools(projectDir, config);
return config;
}
/**
* Add custom endpoint
*/
export function addClaudeCustomEndpoint(
projectDir: string,
endpoint: { id: string; name: string; enabled: boolean }
): ClaudeCliToolsConfig {
const config = loadClaudeCliTools(projectDir);
// Check if endpoint already exists
const existingIndex = config.customEndpoints.findIndex(e => e.id === endpoint.id);
if (existingIndex >= 0) {
config.customEndpoints[existingIndex] = endpoint;
} else {
config.customEndpoints.push(endpoint);
}
saveClaudeCliTools(projectDir, config);
return config;
}
/**
* Remove custom endpoint
*/
export function removeClaudeCustomEndpoint(
projectDir: string,
endpointId: string
): ClaudeCliToolsConfig {
const config = loadClaudeCliTools(projectDir);
config.customEndpoints = config.customEndpoints.filter(e => e.id !== endpointId);
saveClaudeCliTools(projectDir, config);
return config;
}
/**
* Get config source info
*/
export function getClaudeCliToolsInfo(projectDir: string): {
projectPath: string;
globalPath: string;
activePath: string;
source: 'project' | 'global' | 'default';
} {
const resolved = resolveConfigPath(projectDir);
return {
projectPath: getProjectConfigPath(projectDir),
globalPath: getGlobalConfigPath(),
activePath: resolved.path,
source: resolved.source
};
}

View File

@@ -16,6 +16,8 @@ const OperationEnum = z.enum(['list', 'import', 'export', 'summary', 'embed', 's
const ParamsSchema = z.object({
operation: OperationEnum,
// Path parameter - highest priority for project resolution
path: z.string().optional(),
text: z.string().optional(),
id: z.string().optional(),
tool: z.enum(['gemini', 'qwen']).optional().default('gemini'),
@@ -106,17 +108,21 @@ interface EmbedStatusResult {
type OperationResult = ListResult | ImportResult | ExportResult | SummaryResult | EmbedResult | SearchResult | EmbedStatusResult;
/**
* Get project path from current working directory
* Get project path - uses explicit path if provided, otherwise falls back to current working directory
* Priority: path parameter > getProjectRoot()
*/
function getProjectPath(): string {
function getProjectPath(explicitPath?: string): string {
if (explicitPath) {
return explicitPath;
}
return getProjectRoot();
}
/**
* Get database path for current project
* Get database path for project
*/
function getDatabasePath(): string {
const projectPath = getProjectPath();
function getDatabasePath(explicitPath?: string): string {
const projectPath = getProjectPath(explicitPath);
const paths = StoragePaths.project(projectPath);
return join(paths.root, 'core-memory', 'core_memory.db');
}
@@ -129,8 +135,8 @@ const PREVIEW_MAX_LENGTH = 100;
* List all memories with compact output
*/
function executeList(params: Params): ListResult {
const { limit } = params;
const store = getCoreMemoryStore(getProjectPath());
const { limit, path } = params;
const store = getCoreMemoryStore(getProjectPath(path));
const memories = store.getMemories({ limit }) as CoreMemory[];
// Convert to compact format with truncated preview
@@ -160,13 +166,13 @@ function executeList(params: Params): ListResult {
* Import text as a new memory
*/
function executeImport(params: Params): ImportResult {
const { text } = params;
const { text, path } = params;
if (!text || text.trim() === '') {
throw new Error('Parameter "text" is required for import operation');
}
const store = getCoreMemoryStore(getProjectPath());
const store = getCoreMemoryStore(getProjectPath(path));
const memory = store.upsertMemory({
content: text.trim(),
});
@@ -184,14 +190,14 @@ function executeImport(params: Params): ImportResult {
* Searches current project first, then all projects if not found
*/
function executeExport(params: Params): ExportResult {
const { id } = params;
const { id, path } = params;
if (!id) {
throw new Error('Parameter "id" is required for export operation');
}
// Try current project first
const store = getCoreMemoryStore(getProjectPath());
// Try current project first (or explicit path if provided)
const store = getCoreMemoryStore(getProjectPath(path));
let memory = store.getMemory(id);
// If not found, search across all projects
@@ -218,13 +224,13 @@ function executeExport(params: Params): ExportResult {
* Generate AI summary for a memory
*/
async function executeSummary(params: Params): Promise<SummaryResult> {
const { id, tool = 'gemini' } = params;
const { id, tool = 'gemini', path } = params;
if (!id) {
throw new Error('Parameter "id" is required for summary operation');
}
const store = getCoreMemoryStore(getProjectPath());
const store = getCoreMemoryStore(getProjectPath(path));
const memory = store.getMemory(id);
if (!memory) {
@@ -245,8 +251,8 @@ async function executeSummary(params: Params): Promise<SummaryResult> {
* Generate embeddings for memory chunks
*/
async function executeEmbed(params: Params): Promise<EmbedResult> {
const { source_id, batch_size = 8, force = false } = params;
const dbPath = getDatabasePath();
const { source_id, batch_size = 8, force = false, path } = params;
const dbPath = getDatabasePath(path);
const result = await MemoryEmbedder.generateEmbeddings(dbPath, {
sourceId: source_id,
@@ -272,13 +278,13 @@ async function executeEmbed(params: Params): Promise<EmbedResult> {
* Search memory chunks using semantic search
*/
async function executeSearch(params: Params): Promise<SearchResult> {
const { query, top_k = 10, min_score = 0.3, source_type } = params;
const { query, top_k = 10, min_score = 0.3, source_type, path } = params;
if (!query) {
throw new Error('Parameter "query" is required for search operation');
}
const dbPath = getDatabasePath();
const dbPath = getDatabasePath(path);
const result = await MemoryEmbedder.searchMemories(dbPath, query, {
topK: top_k,
@@ -309,7 +315,8 @@ async function executeSearch(params: Params): Promise<SearchResult> {
* Get embedding status statistics
*/
async function executeEmbedStatus(params: Params): Promise<EmbedStatusResult> {
const dbPath = getDatabasePath();
const { path } = params;
const dbPath = getDatabasePath(path);
const result = await MemoryEmbedder.getEmbeddingStatus(dbPath);
@@ -368,6 +375,9 @@ Usage:
core_memory(operation="search", query="authentication") # Search memories semantically
core_memory(operation="embed_status") # Check embedding status
Path parameter (highest priority):
core_memory(operation="list", path="/path/to/project") # Use specific project path
Memory IDs use format: CMEM-YYYYMMDD-HHMMSS`,
inputSchema: {
type: 'object',
@@ -377,6 +387,10 @@ Memory IDs use format: CMEM-YYYYMMDD-HHMMSS`,
enum: ['list', 'import', 'export', 'summary', 'embed', 'search', 'embed_status'],
description: 'Operation to perform',
},
path: {
type: 'string',
description: 'Project path (highest priority - overrides auto-detected project root)',
},
text: {
type: 'string',
description: 'Text content to import (required for import operation)',

View File

@@ -4,8 +4,10 @@ import gc
import logging
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice
from pathlib import Path
from threading import Lock
from typing import Dict, Generator, List, Optional, Tuple
try:
@@ -79,6 +81,44 @@ def _generate_chunks_from_cursor(
failed_files.append((file_path, str(e)))
def _create_token_aware_batches(
chunk_generator: Generator,
max_tokens_per_batch: int = 8000,
) -> Generator[List[Tuple], None, None]:
"""Group chunks by total token count instead of fixed count.
Uses fast token estimation (len(content) // 4) for efficiency.
Yields the current batch whenever adding the next chunk would exceed the limit.
Args:
chunk_generator: Generator yielding (chunk, file_path) tuples
max_tokens_per_batch: Maximum tokens per batch (default: 8000)
Yields:
List of (chunk, file_path) tuples representing a batch
"""
current_batch = []
current_tokens = 0
for chunk, file_path in chunk_generator:
# Fast token estimation: len(content) // 4
chunk_tokens = len(chunk.content) // 4
# If adding this chunk would exceed limit and we have items, yield current batch
if current_tokens + chunk_tokens > max_tokens_per_batch and current_batch:
yield current_batch
current_batch = []
current_tokens = 0
# Add chunk to current batch
current_batch.append((chunk, file_path))
current_tokens += chunk_tokens
# Yield final batch if not empty
if current_batch:
yield current_batch
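# --- Hedged usage sketch (illustrative only, not part of the original module) ---
# Shows how the token-aware batching groups chunks: 1000-character chunks estimate
# to 250 tokens each, so with the default 8000-token budget every yielded batch
# holds 32 chunks (plus a final partial batch). SimpleNamespace stands in for the
# real chunk objects, which only need a .content attribute here.
def _demo_token_aware_batching():
    from types import SimpleNamespace

    def fake_chunks():
        for i in range(100):
            yield SimpleNamespace(content="x" * 1000), f"file_{i}.py"

    batches = list(_create_token_aware_batches(fake_chunks(), max_tokens_per_batch=8000))
    # Every batch stays within the budget; only a single oversized chunk could exceed it.
    assert all(sum(len(c.content) // 4 for c, _ in b) <= 8000 for b in batches)
    return [len(b) for b in batches]  # -> [32, 32, 32, 4]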
def _get_path_column(conn: sqlite3.Connection) -> str:
"""Detect whether files table uses 'path' or 'full_path' column.
@@ -189,31 +229,69 @@ def check_index_embeddings(index_path: Path) -> Dict[str, any]:
}
def _get_embedding_defaults() -> tuple[str, str, bool]:
"""Get default embedding settings from config.
Returns:
Tuple of (backend, model, use_gpu)
"""
try:
from codexlens.config import Config
config = Config.load()
return config.embedding_backend, config.embedding_model, config.embedding_use_gpu
except Exception:
return "fastembed", "code", True
def generate_embeddings(
index_path: Path,
embedding_backend: str = "fastembed",
model_profile: str = "code",
embedding_backend: Optional[str] = None,
model_profile: Optional[str] = None,
force: bool = False,
chunk_size: int = 2000,
overlap: int = 200,
progress_callback: Optional[callable] = None,
use_gpu: Optional[bool] = None,
max_tokens_per_batch: Optional[int] = None,
max_workers: int = 1,
) -> Dict[str, any]:
"""Generate embeddings for an index using memory-efficient batch processing.
This function processes files in small batches to keep memory usage under 2GB,
regardless of the total project size.
regardless of the total project size. Supports concurrent API calls for the
LiteLLM backend to improve throughput.
Args:
index_path: Path to _index.db file
embedding_backend: Embedding backend to use (fastembed or litellm)
embedding_backend: Embedding backend to use (fastembed or litellm).
Defaults to config setting.
model_profile: Model profile for fastembed (fast, code, multilingual, balanced)
or model name for litellm (e.g., text-embedding-3-small)
or model name for litellm (e.g., qwen3-embedding).
Defaults to config setting.
force: If True, regenerate even if embeddings exist
chunk_size: Maximum chunk size in characters
overlap: Overlap size in characters for sliding window chunking (default: 200)
progress_callback: Optional callback for progress updates
use_gpu: Whether to use GPU acceleration (fastembed only).
Defaults to config setting.
max_tokens_per_batch: Maximum tokens per batch for token-aware batching.
If None, attempts to get from embedder.max_tokens,
then falls back to 8000. If set, overrides automatic detection.
max_workers: Maximum number of concurrent API calls (default: 1 for sequential).
Recommended: 2-4 for LiteLLM API backends.
Returns:
Result dictionary with generation statistics
"""
# Get defaults from config if not specified
default_backend, default_model, default_gpu = _get_embedding_defaults()
if embedding_backend is None:
embedding_backend = default_backend
if model_profile is None:
model_profile = default_model
if use_gpu is None:
use_gpu = default_gpu
if not SEMANTIC_AVAILABLE:
return {
"success": False,
@@ -261,9 +339,9 @@ def generate_embeddings(
# Initialize embedder using factory (supports both fastembed and litellm)
# For fastembed: model_profile is a profile name (fast/code/multilingual/balanced)
# For litellm: model_profile is a model name (e.g., text-embedding-3-small)
# For litellm: model_profile is a model name (e.g., qwen3-embedding)
if embedding_backend == "fastembed":
embedder = get_embedder_factory(backend="fastembed", profile=model_profile, use_gpu=True)
embedder = get_embedder_factory(backend="fastembed", profile=model_profile, use_gpu=use_gpu)
elif embedding_backend == "litellm":
embedder = get_embedder_factory(backend="litellm", model=model_profile)
else:
@@ -274,7 +352,11 @@ def generate_embeddings(
# skip_token_count=True: Use fast estimation (len/4) instead of expensive tiktoken
# This significantly reduces CPU usage with minimal impact on metadata accuracy
chunker = Chunker(config=ChunkConfig(max_chunk_size=chunk_size, skip_token_count=True))
chunker = Chunker(config=ChunkConfig(
max_chunk_size=chunk_size,
overlap=overlap,
skip_token_count=True
))
if progress_callback:
progress_callback(f"Using model: {embedder.model_name} ({embedder.embedding_dim} dimensions)")
@@ -336,43 +418,105 @@ def generate_embeddings(
cursor, chunker, path_column, FILE_BATCH_SIZE, failed_files
)
# Determine max tokens per batch
# Priority: explicit parameter > embedder.max_tokens > default 8000
if max_tokens_per_batch is None:
max_tokens_per_batch = getattr(embedder, 'max_tokens', 8000)
# Create token-aware batches or fall back to fixed-size batching
if max_tokens_per_batch:
batch_generator = _create_token_aware_batches(
chunk_generator, max_tokens_per_batch
)
else:
# Fallback to fixed-size batching for backward compatibility
def fixed_size_batches():
while True:
batch = list(islice(chunk_generator, EMBEDDING_BATCH_SIZE))
if not batch:
break
yield batch
batch_generator = fixed_size_batches()
batch_number = 0
files_seen = set()
while True:
# Get a small batch of chunks from the generator (EMBEDDING_BATCH_SIZE at a time)
chunk_batch = list(islice(chunk_generator, EMBEDDING_BATCH_SIZE))
if not chunk_batch:
break
# Thread-safe counters for concurrent processing
counter_lock = Lock()
batch_number += 1
def process_batch(batch_data: Tuple[int, List[Tuple]]) -> Tuple[int, set, Optional[str]]:
"""Process a single batch: generate embeddings and store.
# Track unique files for progress
for _, file_path in chunk_batch:
files_seen.add(file_path)
Args:
batch_data: Tuple of (batch_number, chunk_batch)
Returns:
Tuple of (chunks_created, files_in_batch, error_message)
"""
batch_num, chunk_batch = batch_data
batch_files = set()
# Generate embeddings directly to numpy (no tolist() conversion)
try:
# Track files in this batch
for _, file_path in chunk_batch:
batch_files.add(file_path)
# Generate embeddings
batch_contents = [chunk.content for chunk, _ in chunk_batch]
# Pass batch_size to fastembed for optimal GPU utilization
embeddings_numpy = embedder.embed_to_numpy(batch_contents, batch_size=EMBEDDING_BATCH_SIZE)
# Use add_chunks_batch_numpy to avoid numpy->list->numpy roundtrip
# Store embeddings (thread-safe via SQLite's serialized mode)
vector_store.add_chunks_batch_numpy(chunk_batch, embeddings_numpy)
total_chunks_created += len(chunk_batch)
total_files_processed = len(files_seen)
if progress_callback and batch_number % 10 == 0:
progress_callback(f" Batch {batch_number}: {total_chunks_created} chunks, {total_files_processed} files")
# Cleanup intermediate data
del batch_contents, embeddings_numpy, chunk_batch
return len(chunk_batch), batch_files, None
except Exception as e:
logger.error(f"Failed to process embedding batch {batch_number}: {str(e)}")
# Continue to next batch instead of failing entirely
continue
error_msg = f"Batch {batch_num}: {str(e)}"
logger.error(f"Failed to process embedding batch {batch_num}: {str(e)}")
return 0, batch_files, error_msg
# Collect batches for concurrent processing
all_batches = []
for chunk_batch in batch_generator:
batch_number += 1
all_batches.append((batch_number, chunk_batch))
# Process batches (sequential or concurrent based on max_workers)
if max_workers <= 1:
# Sequential processing (original behavior)
for batch_num, chunk_batch in all_batches:
chunks_created, batch_files, error = process_batch((batch_num, chunk_batch))
files_seen.update(batch_files)
total_chunks_created += chunks_created
total_files_processed = len(files_seen)
if progress_callback and batch_num % 10 == 0:
progress_callback(f" Batch {batch_num}: {total_chunks_created} chunks, {total_files_processed} files")
else:
# Concurrent processing for API backends
if progress_callback:
progress_callback(f"Processing {len(all_batches)} batches with {max_workers} concurrent workers...")
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(process_batch, batch): batch[0] for batch in all_batches}
completed = 0
for future in as_completed(futures):
batch_num = futures[future]
try:
chunks_created, batch_files, error = future.result()
with counter_lock:
files_seen.update(batch_files)
total_chunks_created += chunks_created
total_files_processed = len(files_seen)
completed += 1
if progress_callback and completed % 10 == 0:
progress_callback(f" Completed {completed}/{len(all_batches)} batches: {total_chunks_created} chunks")
except Exception as e:
logger.error(f"Batch {batch_num} raised exception: {str(e)}")
# Notify before ANN index finalization (happens when bulk_insert context exits)
if progress_callback:
@@ -445,26 +589,49 @@ def find_all_indexes(scan_dir: Path) -> List[Path]:
def generate_embeddings_recursive(
index_root: Path,
embedding_backend: str = "fastembed",
model_profile: str = "code",
embedding_backend: Optional[str] = None,
model_profile: Optional[str] = None,
force: bool = False,
chunk_size: int = 2000,
overlap: int = 200,
progress_callback: Optional[callable] = None,
use_gpu: Optional[bool] = None,
max_tokens_per_batch: Optional[int] = None,
max_workers: int = 1,
) -> Dict[str, any]:
"""Generate embeddings for all index databases in a project recursively.
Args:
index_root: Root index directory containing _index.db files
embedding_backend: Embedding backend to use (fastembed or litellm)
embedding_backend: Embedding backend to use (fastembed or litellm).
Defaults to config setting.
model_profile: Model profile for fastembed (fast, code, multilingual, balanced)
or model name for litellm (e.g., text-embedding-3-small)
or model name for litellm (e.g., qwen3-embedding).
Defaults to config setting.
force: If True, regenerate even if embeddings exist
chunk_size: Maximum chunk size in characters
overlap: Overlap size in characters for sliding window chunking (default: 200)
progress_callback: Optional callback for progress updates
use_gpu: Whether to use GPU acceleration (fastembed only).
Defaults to config setting.
max_tokens_per_batch: Maximum tokens per batch for token-aware batching.
If None, attempts to get from embedder.max_tokens,
then falls back to 8000. If set, overrides automatic detection.
max_workers: Maximum number of concurrent API calls (default: 1 for sequential).
Recommended: 2-4 for LiteLLM API backends.
Returns:
Aggregated result dictionary with generation statistics
"""
# Get defaults from config if not specified
default_backend, default_model, default_gpu = _get_embedding_defaults()
if embedding_backend is None:
embedding_backend = default_backend
if model_profile is None:
model_profile = default_model
if use_gpu is None:
use_gpu = default_gpu
# Discover all _index.db files
index_files = discover_all_index_dbs(index_root)
@@ -498,7 +665,11 @@ def generate_embeddings_recursive(
model_profile=model_profile,
force=force,
chunk_size=chunk_size,
overlap=overlap,
progress_callback=None, # Don't cascade callbacks
use_gpu=use_gpu,
max_tokens_per_batch=max_tokens_per_batch,
max_workers=max_workers,
)
all_results.append({

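A hedged call sketch tying the new parameters together for an API backend. The index path and the "qwen3-embedding" model name are placeholders (the model has to exist in the LiteLLM provider config), and the worker count follows the 2-4 recommendation in the docstring:

    from pathlib import Path

    result = generate_embeddings(
        Path("/path/to/project/.codexlens/_index.db"),  # illustrative index location
        embedding_backend="litellm",      # API backend instead of local fastembed
        model_profile="qwen3-embedding",  # LiteLLM model name, not a fastembed profile
        overlap=200,                      # sliding-window overlap in characters
        max_tokens_per_batch=None,        # fall back to embedder.max_tokens, then 8000
        max_workers=4,                    # concurrent API calls
    )
    print(result)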
View File

@@ -2,6 +2,7 @@
from __future__ import annotations
import json
import os
from dataclasses import dataclass, field
from functools import cached_property
@@ -14,6 +15,9 @@ from .errors import ConfigError
# Workspace-local directory name
WORKSPACE_DIR_NAME = ".codexlens"
# Settings file name
SETTINGS_FILE_NAME = "settings.json"
def _default_global_dir() -> Path:
"""Get global CodexLens data directory."""
@@ -89,6 +93,13 @@ class Config:
# Hybrid chunker configuration
hybrid_max_chunk_size: int = 2000 # Max characters per chunk before LLM refinement
hybrid_llm_refinement: bool = False # Enable LLM-based semantic boundary refinement
# Embedding configuration
embedding_backend: str = "fastembed" # "fastembed" (local) or "litellm" (API)
embedding_model: str = "code" # For fastembed: profile (fast/code/multilingual/balanced)
# For litellm: model name from config (e.g., "qwen3-embedding")
embedding_use_gpu: bool = True # For fastembed: whether to use GPU acceleration
def __post_init__(self) -> None:
try:
self.data_dir = self.data_dir.expanduser().resolve()
@@ -133,6 +144,67 @@ class Config:
"""Get parsing rules for a specific language, falling back to defaults."""
return {**self.parsing_rules.get("default", {}), **self.parsing_rules.get(language_id, {})}
@cached_property
def settings_path(self) -> Path:
"""Path to the settings file."""
return self.data_dir / SETTINGS_FILE_NAME
def save_settings(self) -> None:
"""Save embedding and other settings to file."""
settings = {
"embedding": {
"backend": self.embedding_backend,
"model": self.embedding_model,
"use_gpu": self.embedding_use_gpu,
},
"llm": {
"enabled": self.llm_enabled,
"tool": self.llm_tool,
"timeout_ms": self.llm_timeout_ms,
"batch_size": self.llm_batch_size,
},
}
with open(self.settings_path, "w", encoding="utf-8") as f:
json.dump(settings, f, indent=2)
def load_settings(self) -> None:
"""Load settings from file if exists."""
if not self.settings_path.exists():
return
try:
with open(self.settings_path, "r", encoding="utf-8") as f:
settings = json.load(f)
# Load embedding settings
embedding = settings.get("embedding", {})
if "backend" in embedding:
self.embedding_backend = embedding["backend"]
if "model" in embedding:
self.embedding_model = embedding["model"]
if "use_gpu" in embedding:
self.embedding_use_gpu = embedding["use_gpu"]
# Load LLM settings
llm = settings.get("llm", {})
if "enabled" in llm:
self.llm_enabled = llm["enabled"]
if "tool" in llm:
self.llm_tool = llm["tool"]
if "timeout_ms" in llm:
self.llm_timeout_ms = llm["timeout_ms"]
if "batch_size" in llm:
self.llm_batch_size = llm["batch_size"]
except Exception:
pass # Ignore missing or malformed settings files and keep in-memory defaults
@classmethod
def load(cls) -> "Config":
"""Load config with settings from file."""
config = cls()
config.load_settings()
return config
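# --- Hedged round-trip sketch (illustrative only, not part of the original module) ---
# Exercises save_settings()/load_settings() against a throwaway data_dir. It assumes
# the remaining Config fields (llm_enabled, llm_tool, ...) keep their dataclass
# defaults, as the hunks above suggest.
def _demo_settings_roundtrip():
    import tempfile

    data_dir = Path(tempfile.mkdtemp())
    config = Config(data_dir=data_dir)
    config.embedding_backend = "litellm"
    config.embedding_model = "qwen3-embedding"
    config.embedding_use_gpu = False
    config.save_settings()  # writes <data_dir>/settings.json

    reloaded = Config(data_dir=data_dir)
    reloaded.load_settings()  # picks the persisted values back up
    assert reloaded.embedding_backend == "litellm"
    assert reloaded.embedding_use_gpu is False
    return reloaded.settings_path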
@dataclass
class WorkspaceConfig:

View File

@@ -38,6 +38,16 @@ class BaseEmbedder(ABC):
"""
...
@property
def max_tokens(self) -> int:
"""Return maximum token limit for embeddings.
Returns:
int: Maximum number of tokens that can be embedded at once.
Default is 8192 if not overridden by implementation.
"""
return 8192
@abstractmethod
def embed_to_numpy(self, texts: str | Iterable[str]) -> np.ndarray:
"""Embed texts to numpy array.

View File

@@ -39,7 +39,7 @@ from codexlens.parsers.tokenizer import get_default_tokenizer
class ChunkConfig:
"""Configuration for chunking strategies."""
max_chunk_size: int = 1000 # Max characters per chunk
overlap: int = 100 # Overlap for sliding window
overlap: int = 200 # Overlap for sliding window (increased from 100 for better context)
strategy: str = "auto" # Chunking strategy: auto, symbol, sliding_window, hybrid
min_chunk_size: int = 50 # Minimum chunk size
skip_token_count: bool = False # Skip expensive token counting (use char/4 estimate)
@@ -80,6 +80,7 @@ class Chunker:
"""Chunk code by extracted symbols (functions, classes).
Each symbol becomes one chunk with its full content.
Large symbols exceeding max_chunk_size are recursively split using a sliding window.
Args:
content: Source code content
@@ -101,27 +102,49 @@ class Chunker:
if len(chunk_content.strip()) < self.config.min_chunk_size:
continue
# Calculate token count if not provided
token_count = None
if symbol_token_counts and symbol.name in symbol_token_counts:
token_count = symbol_token_counts[symbol.name]
else:
token_count = self._estimate_token_count(chunk_content)
# Check if symbol content exceeds max_chunk_size
if len(chunk_content) > self.config.max_chunk_size:
# Create line mapping for correct line number tracking
line_mapping = list(range(start_line, end_line + 1))
chunks.append(SemanticChunk(
content=chunk_content,
embedding=None,
metadata={
"file": str(file_path),
"language": language,
"symbol_name": symbol.name,
"symbol_kind": symbol.kind,
"start_line": start_line,
"end_line": end_line,
"strategy": "symbol",
"token_count": token_count,
}
))
# Use sliding window to split large symbol
sub_chunks = self.chunk_sliding_window(
chunk_content,
file_path=file_path,
language=language,
line_mapping=line_mapping
)
# Update sub_chunks with parent symbol metadata
for sub_chunk in sub_chunks:
sub_chunk.metadata["symbol_name"] = symbol.name
sub_chunk.metadata["symbol_kind"] = symbol.kind
sub_chunk.metadata["strategy"] = "symbol_split"
sub_chunk.metadata["parent_symbol_range"] = (start_line, end_line)
chunks.extend(sub_chunks)
else:
# Calculate token count if not provided
token_count = None
if symbol_token_counts and symbol.name in symbol_token_counts:
token_count = symbol_token_counts[symbol.name]
else:
token_count = self._estimate_token_count(chunk_content)
chunks.append(SemanticChunk(
content=chunk_content,
embedding=None,
metadata={
"file": str(file_path),
"language": language,
"symbol_name": symbol.name,
"symbol_kind": symbol.kind,
"start_line": start_line,
"end_line": end_line,
"strategy": "symbol",
"token_count": token_count,
}
))
return chunks
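A minimal sketch of the metadata an oversized symbol ends up with after the recursive split. Sub-chunk boundaries come from chunk_sliding_window (not shown in this hunk), and the new tests later in this commit exercise the same path:

    from codexlens.entities import Symbol
    from codexlens.semantic.chunker import Chunker, ChunkConfig

    chunker = Chunker(ChunkConfig(max_chunk_size=200, overlap=50))
    body = "def handler():\n" + "\n".join(f"    step_{i} = {i}" for i in range(60))
    symbols = [Symbol(name="handler", kind="function", range=(1, 61))]

    for chunk in chunker.chunk_by_symbol(body, symbols, "svc.py", "python"):
        # Every sub-chunk keeps the parent's identity and full line range; its own
        # line numbers are expected to be remapped through the line_mapping above.
        assert chunk.metadata["strategy"] == "symbol_split"
        assert chunk.metadata["parent_symbol_range"] == (1, 61)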

View File

@@ -165,6 +165,33 @@ class Embedder(BaseEmbedder):
"""Get embedding dimension for current model."""
return self.MODEL_DIMS.get(self._model_name, 768) # Default to 768 if unknown
@property
def max_tokens(self) -> int:
"""Get maximum token limit for current model.
Returns:
int: Maximum number of tokens based on model profile.
- fast: 512 (lightweight, optimized for speed)
- code: 8192 (code-optimized, larger context)
- multilingual: 512 (standard multilingual model)
- balanced: 512 (general purpose)
"""
# Determine profile from model name
profile = None
for prof, model in self.MODELS.items():
if model == self._model_name:
profile = prof
break
# Return token limit based on profile
if profile == "code":
return 8192
elif profile in ("fast", "multilingual", "balanced"):
return 512
else:
# Default for unknown models
return 512
@property
def providers(self) -> List[str]:
"""Get configured ONNX execution providers."""

View File

@@ -63,11 +63,39 @@ class LiteLLMEmbedderWrapper(BaseEmbedder):
"""
return self._embedder.model_name
def embed_to_numpy(self, texts: str | Iterable[str]) -> np.ndarray:
@property
def max_tokens(self) -> int:
"""Return maximum token limit for the embedding model.
Returns:
int: Maximum number of tokens that can be embedded at once.
Inferred from model config or model name patterns.
"""
# Try to get from LiteLLM config first
if hasattr(self._embedder, 'max_input_tokens') and self._embedder.max_input_tokens:
return self._embedder.max_input_tokens
# Infer from model name
model_name_lower = self.model_name.lower()
# Large models (8B or "large" in name)
if '8b' in model_name_lower or 'large' in model_name_lower:
return 32768
# OpenAI text-embedding-3-* models
if 'text-embedding-3' in model_name_lower:
return 8191
# Default fallback
return 8192
def embed_to_numpy(self, texts: str | Iterable[str], **kwargs) -> np.ndarray:
"""Embed texts to numpy array using LiteLLMEmbedder.
Args:
texts: Single text or iterable of texts to embed.
**kwargs: Additional arguments (ignored for LiteLLM backend).
Accepts batch_size for API compatibility with fastembed.
Returns:
numpy.ndarray: Array of shape (n_texts, embedding_dim) containing embeddings.
@@ -76,4 +104,5 @@ class LiteLLMEmbedderWrapper(BaseEmbedder):
texts = [texts]
else:
texts = list(texts)
# LiteLLM handles batching internally, ignore batch_size parameter
return self._embedder.embed(texts)

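The name-based fallback above amounts to a small precedence rule: an explicit limit from the LiteLLM config wins, then model-name heuristics, then a default. The standalone function below is a sketch that mirrors that order; it is not part of the commit, and the example model names are illustrative.

```python
def infer_max_tokens(model_name: str, configured: int | None = None) -> int:
    """Mirror of the fallback order in LiteLLMEmbedderWrapper.max_tokens."""
    if configured:                       # explicit limit from config wins
        return configured
    name = model_name.lower()
    if "8b" in name or "large" in name:  # large models get a 32k window
        return 32768
    if "text-embedding-3" in name:       # OpenAI text-embedding-3-* limit
        return 8191
    return 8192                          # conservative default

# infer_max_tokens("text-embedding-3-small")  -> 8191
# infer_max_tokens("acme-embed-8b")           -> 32768  (hypothetical name)
```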
View File

@@ -0,0 +1,291 @@
"""Tests for recursive splitting of large symbols in chunker."""
import pytest
from codexlens.entities import Symbol
from codexlens.semantic.chunker import Chunker, ChunkConfig
class TestRecursiveSplitting:
"""Test cases for recursive splitting of large symbols."""
def test_small_symbol_no_split(self):
"""Test that small symbols are not split."""
config = ChunkConfig(max_chunk_size=1000, overlap=100)
chunker = Chunker(config)
content = '''def small_function():
# This is a small function
x = 1
y = 2
return x + y
'''
symbols = [Symbol(name='small_function', kind='function', range=(1, 5))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
assert len(chunks) == 1
assert chunks[0].metadata['strategy'] == 'symbol'
assert chunks[0].metadata['symbol_name'] == 'small_function'
assert chunks[0].metadata['symbol_kind'] == 'function'
assert 'parent_symbol_range' not in chunks[0].metadata
def test_large_symbol_splits(self):
"""Test that large symbols are recursively split."""
config = ChunkConfig(max_chunk_size=100, overlap=20)
chunker = Chunker(config)
content = '''def large_function():
# Line 1
# Line 2
# Line 3
# Line 4
# Line 5
# Line 6
# Line 7
# Line 8
# Line 9
# Line 10
# Line 11
# Line 12
# Line 13
# Line 14
# Line 15
pass
'''
symbols = [Symbol(name='large_function', kind='function', range=(1, 18))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Should be split into multiple chunks
assert len(chunks) > 1
# All chunks should have symbol metadata
for chunk in chunks:
assert chunk.metadata['strategy'] == 'symbol_split'
assert chunk.metadata['symbol_name'] == 'large_function'
assert chunk.metadata['symbol_kind'] == 'function'
assert chunk.metadata['parent_symbol_range'] == (1, 18)
def test_boundary_condition(self):
"""Test symbol exactly at max_chunk_size boundary."""
config = ChunkConfig(max_chunk_size=90, overlap=20)
chunker = Chunker(config)
content = '''def boundary_function():
# This function is exactly at boundary
x = 1
y = 2
return x + y
'''
symbols = [Symbol(name='boundary_function', kind='function', range=(1, 5))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Content is slightly over 90 chars, should be split
assert len(chunks) >= 1
assert chunks[0].metadata['strategy'] == 'symbol_split'
def test_multiple_symbols_mixed_sizes(self):
"""Test chunking with multiple symbols of different sizes."""
config = ChunkConfig(max_chunk_size=150, overlap=30)
chunker = Chunker(config)
content = '''def small():
return 1
def medium():
# Medium function
x = 1
y = 2
z = 3
return x + y + z
def very_large():
# Line 1
# Line 2
# Line 3
# Line 4
# Line 5
# Line 6
# Line 7
# Line 8
# Line 9
# Line 10
# Line 11
# Line 12
# Line 13
# Line 14
# Line 15
pass
'''
symbols = [
Symbol(name='small', kind='function', range=(1, 2)),
Symbol(name='medium', kind='function', range=(4, 9)),
Symbol(name='very_large', kind='function', range=(11, 28)),
]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Find chunks for each symbol
small_chunks = [c for c in chunks if c.metadata['symbol_name'] == 'small']
medium_chunks = [c for c in chunks if c.metadata['symbol_name'] == 'medium']
large_chunks = [c for c in chunks if c.metadata['symbol_name'] == 'very_large']
# Small should be filtered (< min_chunk_size)
assert len(small_chunks) == 0
# Medium should not be split
assert len(medium_chunks) == 1
assert medium_chunks[0].metadata['strategy'] == 'symbol'
# Large should be split
assert len(large_chunks) > 1
for chunk in large_chunks:
assert chunk.metadata['strategy'] == 'symbol_split'
def test_line_numbers_preserved(self):
"""Test that line numbers are correctly preserved in sub-chunks."""
config = ChunkConfig(max_chunk_size=100, overlap=20)
chunker = Chunker(config)
content = '''def large_function():
# Line 1 with some extra content to make it longer
# Line 2 with some extra content to make it longer
# Line 3 with some extra content to make it longer
# Line 4 with some extra content to make it longer
# Line 5 with some extra content to make it longer
# Line 6 with some extra content to make it longer
# Line 7 with some extra content to make it longer
# Line 8 with some extra content to make it longer
# Line 9 with some extra content to make it longer
# Line 10 with some extra content to make it longer
pass
'''
symbols = [Symbol(name='large_function', kind='function', range=(1, 13))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Verify line numbers are correct and sequential
assert len(chunks) > 1
assert chunks[0].metadata['start_line'] == 1
# Each chunk should have valid line numbers
for chunk in chunks:
assert chunk.metadata['start_line'] >= 1
assert chunk.metadata['end_line'] <= 13
assert chunk.metadata['start_line'] <= chunk.metadata['end_line']
def test_overlap_in_split_chunks(self):
"""Test that overlap is applied when splitting large symbols."""
config = ChunkConfig(max_chunk_size=100, overlap=30)
chunker = Chunker(config)
content = '''def large_function():
# Line 1
# Line 2
# Line 3
# Line 4
# Line 5
# Line 6
# Line 7
# Line 8
# Line 9
# Line 10
# Line 11
# Line 12
pass
'''
symbols = [Symbol(name='large_function', kind='function', range=(1, 14))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# With overlap, consecutive chunks should overlap
if len(chunks) > 1:
for i in range(len(chunks) - 1):
# Next chunk should start before current chunk ends (overlap)
current_end = chunks[i].metadata['end_line']
next_start = chunks[i + 1].metadata['start_line']
# Overlap should exist
assert next_start <= current_end
def test_empty_symbol_filtered(self):
"""Test that symbols smaller than min_chunk_size are filtered."""
config = ChunkConfig(max_chunk_size=1000, min_chunk_size=50)
chunker = Chunker(config)
content = '''def tiny():
pass
'''
symbols = [Symbol(name='tiny', kind='function', range=(1, 2))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Should be filtered due to min_chunk_size
assert len(chunks) == 0
def test_class_symbol_splits(self):
"""Test that large class symbols are also split correctly."""
config = ChunkConfig(max_chunk_size=120, overlap=25)
chunker = Chunker(config)
content = '''class LargeClass:
"""A large class with many methods."""
def method1(self):
return 1
def method2(self):
return 2
def method3(self):
return 3
def method4(self):
return 4
'''
symbols = [Symbol(name='LargeClass', kind='class', range=(1, 14))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Should be split
assert len(chunks) > 1
# All chunks should preserve class metadata
for chunk in chunks:
assert chunk.metadata['symbol_name'] == 'LargeClass'
assert chunk.metadata['symbol_kind'] == 'class'
assert chunk.metadata['strategy'] == 'symbol_split'
class TestLightweightMode:
"""Test recursive splitting with lightweight token counting."""
def test_large_symbol_splits_lightweight_mode(self):
"""Test that large symbols split correctly in lightweight mode."""
config = ChunkConfig(max_chunk_size=100, overlap=20, skip_token_count=True)
chunker = Chunker(config)
content = '''def large_function():
# Line 1 with some extra content to make it longer
# Line 2 with some extra content to make it longer
# Line 3 with some extra content to make it longer
# Line 4 with some extra content to make it longer
# Line 5 with some extra content to make it longer
# Line 6 with some extra content to make it longer
# Line 7 with some extra content to make it longer
# Line 8 with some extra content to make it longer
# Line 9 with some extra content to make it longer
# Line 10 with some extra content to make it longer
pass
'''
symbols = [Symbol(name='large_function', kind='function', range=(1, 13))]
chunks = chunker.chunk_by_symbol(content, symbols, 'test.py', 'python')
# Should split even in lightweight mode
assert len(chunks) > 1
# All chunks should have token_count (estimated)
for chunk in chunks:
assert 'token_count' in chunk.metadata
assert chunk.metadata['token_count'] > 0

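Assuming the new tests live under the codex-lens package (the exact path is not shown in this hunk), they could be run on their own with something like the sketch below; the module path is a placeholder.

```python
# Placeholder path: adjust to wherever the test module actually lives.
import pytest

pytest.main(["-v", "codex-lens/tests/test_chunker_recursive_split.py"])
```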
View File

@@ -60,6 +60,8 @@
".qwen/",
"codex-lens/src/codexlens/",
"codex-lens/pyproject.toml",
"ccw-litellm/src/ccw_litellm/",
"ccw-litellm/pyproject.toml",
"CLAUDE.md",
"README.md"
],