
LiteLLM Integration Guide

Overview

CCW now supports custom LiteLLM endpoints with integrated context caching. You can configure multiple providers (OpenAI, Anthropic, Ollama, etc.) and create custom endpoints with file-based caching strategies.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      CLI Executor                           │
│                                                             │
│  ┌─────────────┐         ┌──────────────────────────────┐  │
│  │   --model   │────────>│  Route Decision:             │  │
│  │   flag      │         │  - gemini/qwen/codex → CLI   │  │
│  └─────────────┘         │  - custom ID → LiteLLM       │  │
│                          └──────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                  LiteLLM Executor                           │
│                                                             │
│  1. Load endpoint config (litellm-api-config.json)         │
│  2. Extract @patterns from prompt                          │
│  3. Pack files via context-cache                           │
│  4. Call LiteLLM client with cached content + prompt       │
│  5. Return result                                          │
└─────────────────────────────────────────────────────────────┘
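The routing shown above can be summarized in a few lines. The sketch below is illustrative only, assuming the built-in CLI tool names and treating any other --model value as a custom LiteLLM endpoint ID; the actual CLI executor logic may differ.

// Illustrative sketch of the route decision in the diagram above.
// Built-in CLI model names go to their native CLIs; anything else is
// treated as a custom LiteLLM endpoint ID.
const CLI_MODELS = new Set(['gemini', 'qwen', 'codex']);

type Route =
  | { kind: 'cli'; tool: string }
  | { kind: 'litellm'; endpointId: string };

function resolveRoute(model: string): Route {
  return CLI_MODELS.has(model)
    ? { kind: 'cli', tool: model }
    : { kind: 'litellm', endpointId: model };
}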

Configuration

File Location

Configuration is stored per-project:

<project>/.ccw/storage/config/litellm-api-config.json
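A minimal sketch of reading the config from that location, assuming Node's fs/path modules (the loader in CCW itself may handle defaults and validation differently):

// Sketch: read the per-project config at the path shown above.
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

function loadLiteLLMConfig(projectDir: string): unknown {
  const configPath = join(
    projectDir, '.ccw', 'storage', 'config', 'litellm-api-config.json',
  );
  return JSON.parse(readFileSync(configPath, 'utf8'));
}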

Configuration Structure

{
  "version": 1,
  "providers": [
    {
      "id": "openai-1234567890",
      "name": "My OpenAI",
      "type": "openai",
      "apiKey": "${OPENAI_API_KEY}",
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "endpoints": [
    {
      "id": "my-gpt4o",
      "name": "GPT-4o with Context Cache",
      "providerId": "openai-1234567890",
      "model": "gpt-4o",
      "description": "GPT-4o with automatic file caching",
      "cacheStrategy": {
        "enabled": true,
        "ttlMinutes": 60,
        "maxSizeKB": 512,
        "filePatterns": ["*.md", "*.ts", "*.js"]
      },
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "defaultEndpoint": "my-gpt4o",
  "globalCacheSettings": {
    "enabled": true,
    "cacheDir": "~/.ccw/cache/context",
    "maxTotalSizeMB": 100
  }
}
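The same structure expressed as TypeScript types. This is a sketch inferred from the JSON example above, not the exact definitions in the codebase; CacheStrategy mirrors the interface shown later under Cache Strategy Configuration.

// Config shape inferred from the JSON example above; field names follow
// the example, but the real type definitions may differ.
interface CacheStrategy {
  enabled: boolean;
  ttlMinutes: number;
  maxSizeKB: number;
  filePatterns: string[];
}

interface ProviderCredential {
  id: string;
  name: string;
  type: string;        // "openai", "anthropic", "ollama", ...
  apiKey: string;      // may contain ${ENV_VAR} placeholders
  enabled: boolean;
  createdAt: string;
  updatedAt: string;
}

interface CustomEndpoint {
  id: string;
  name: string;
  providerId: string;  // references ProviderCredential.id
  model: string;
  description?: string;
  cacheStrategy: CacheStrategy;
  enabled: boolean;
  createdAt: string;
  updatedAt: string;
}

interface LiteLLMApiConfig {
  version: number;
  providers: ProviderCredential[];
  endpoints: CustomEndpoint[];
  defaultEndpoint?: string;
  globalCacheSettings: {
    enabled: boolean;
    cacheDir: string;
    maxTotalSizeMB: number;
  };
}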

Usage

Via CLI

# Use custom endpoint with --model flag
ccw cli -p "Analyze authentication flow" --tool litellm --model my-gpt4o

# With context patterns (automatically cached)
ccw cli -p "@src/auth/**/*.ts Review security" --tool litellm --model my-gpt4o

# Disable caching for specific call
ccw cli -p "Quick question" --tool litellm --model my-gpt4o --no-cache

Via Dashboard API

Create Provider

curl -X POST http://localhost:3000/api/litellm-api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My OpenAI",
    "type": "openai",
    "apiKey": "${OPENAI_API_KEY}",
    "enabled": true
  }'

Create Endpoint

curl -X POST http://localhost:3000/api/litellm-api/endpoints \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-gpt4o",
    "name": "GPT-4o with Cache",
    "providerId": "openai-1234567890",
    "model": "gpt-4o",
    "cacheStrategy": {
      "enabled": true,
      "ttlMinutes": 60,
      "maxSizeKB": 512,
      "filePatterns": ["*.md", "*.ts"]
    },
    "enabled": true
  }'

Test Provider Connection

curl -X POST http://localhost:3000/api/litellm-api/providers/openai-1234567890/test
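The same routes can be called from TypeScript. The sketch below assumes the dashboard is reachable at the base URL used in the curl examples; the response body shape is not specified here, so it is left as unknown.

// Sketch: calling the dashboard API from TypeScript. Base URL and the
// response shape are assumptions based on the curl examples above.
const BASE = 'http://localhost:3000/api/litellm-api';

async function testProvider(providerId: string): Promise<unknown> {
  const res = await fetch(`${BASE}/providers/${providerId}/test`, { method: 'POST' });
  if (!res.ok) {
    throw new Error(`Provider test failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Example: testProvider('openai-1234567890').then(console.log);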

Context Caching

How It Works

  1. Pattern Detection: The LiteLLM executor scans the prompt for @patterns (see the sketch after this list)

    @src/**/*.ts
    @CLAUDE.md
    @../shared/**/*
    
  2. File Packing: Files matching patterns are packed via context-cache tool

    • Respects max_file_size limit (default: 1MB per file)
    • Applies TTL from endpoint config
    • Generates session ID for retrieval
  3. Cache Integration: Cached content is prepended to prompt

    <cached files>
    ---
    <original prompt>
    
  4. LLM Call: Combined prompt sent to LiteLLM with provider credentials
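A minimal sketch of steps 1 and 3, assuming @patterns are whitespace-delimited tokens starting with "@" and that the leading "@" is stripped from the result; the shipped extractPatterns() and prompt assembly may use different rules.

// Step 1 sketch: pull @patterns out of a prompt.
function extractPatterns(prompt: string): string[] {
  const matches = prompt.match(/@[^\s]+/g) ?? [];
  // Strip the leading "@" so the result is a plain glob list (assumption).
  return matches.map((m) => m.slice(1));
}

// Step 3 sketch: prepend packed file content to the original prompt.
function buildPrompt(cachedFiles: string, originalPrompt: string): string {
  return `${cachedFiles}\n---\n${originalPrompt}`;
}

// extractPatterns('@src/auth/**/*.ts Review security')
// → ['src/auth/**/*.ts']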

Cache Strategy Configuration

interface CacheStrategy {
  enabled: boolean;           // Enable/disable caching for this endpoint
  ttlMinutes: number;         // Cache lifetime (default: 60)
  maxSizeKB: number;          // Max cache size (default: 512KB)
  filePatterns: string[];     // Glob patterns to cache
}

Example: Security Audit with Cache

ccw cli -p "
PURPOSE: OWASP Top 10 security audit of authentication module
TASK: • Check SQL injection • Verify session management • Test XSS vectors
CONTEXT: @src/auth/**/*.ts @src/middleware/auth.ts
EXPECTED: Security report with severity levels and remediation steps
" --tool litellm --model my-security-scanner --mode analysis

What happens:

  1. Executor detects @src/auth/**/*.ts and @src/middleware/auth.ts
  2. Packs matching files into context cache
  3. Cache entry valid for 60 minutes (per endpoint config)
  4. Subsequent calls reuse cached files (no re-packing)
  5. LiteLLM receives full context without manual file specification

Environment Variables

Provider API Keys

LiteLLM uses standard environment variable names:

Provider    Env Var Name
OpenAI      OPENAI_API_KEY
Anthropic   ANTHROPIC_API_KEY
Google      GOOGLE_API_KEY
Azure       AZURE_API_KEY
Mistral     MISTRAL_API_KEY
DeepSeek    DEEPSEEK_API_KEY

Configuration Syntax

Use ${ENV_VAR} syntax in config:

{
  "apiKey": "${OPENAI_API_KEY}"
}

The executor resolves these at runtime via resolveEnvVar().
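A sketch of how such resolution could work; the actual resolveEnvVar() may differ, for example in how it treats unset variables.

// Sketch of ${ENV_VAR} resolution; behavior for unset variables is an
// assumption (here we fail fast).
function resolveEnvVar(value: string): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/gi, (_, name: string) => {
    const resolved = process.env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

// resolveEnvVar('${OPENAI_API_KEY}') → contents of $OPENAI_API_KEY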

API Reference

Config Manager (litellm-api-config-manager.ts)

Provider Management

getAllProviders(baseDir: string): ProviderCredential[]
getProvider(baseDir: string, providerId: string): ProviderCredential | null
getProviderWithResolvedEnvVars(baseDir: string, providerId: string): ProviderCredential & { resolvedApiKey: string } | null
addProvider(baseDir: string, providerData): ProviderCredential
updateProvider(baseDir: string, providerId: string, updates): ProviderCredential
deleteProvider(baseDir: string, providerId: string): boolean

Endpoint Management

getAllEndpoints(baseDir: string): CustomEndpoint[]
getEndpoint(baseDir: string, endpointId: string): CustomEndpoint | null
findEndpointById(baseDir: string, endpointId: string): CustomEndpoint | null
addEndpoint(baseDir: string, endpointData): CustomEndpoint
updateEndpoint(baseDir: string, endpointId: string, updates): CustomEndpoint
deleteEndpoint(baseDir: string, endpointId: string): boolean
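A hedged usage example combining the helpers above. The import path and the exact shapes of providerData / endpointData are assumptions inferred from the configuration structure; they may not match the real parameter types.

// Usage sketch for the config manager API above (import path assumed).
import {
  addProvider,
  addEndpoint,
  getAllEndpoints,
} from './litellm-api-config-manager';

const baseDir = process.cwd(); // project root containing .ccw/

const provider = addProvider(baseDir, {
  name: 'My OpenAI',
  type: 'openai',
  apiKey: '${OPENAI_API_KEY}',
  enabled: true,
});

addEndpoint(baseDir, {
  id: 'my-gpt4o',
  name: 'GPT-4o with Cache',
  providerId: provider.id,
  model: 'gpt-4o',
  cacheStrategy: {
    enabled: true,
    ttlMinutes: 60,
    maxSizeKB: 512,
    filePatterns: ['*.md', '*.ts'],
  },
  enabled: true,
});

console.log(getAllEndpoints(baseDir).map((e) => e.id));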

Executor (litellm-executor.ts)

interface LiteLLMExecutionOptions {
  prompt: string;
  endpointId: string;
  baseDir: string;
  cwd?: string;
  includeDirs?: string[];
  enableCache?: boolean;
  onOutput?: (data: { type: string; data: string }) => void;
}

interface LiteLLMExecutionResult {
  success: boolean;
  output: string;
  model: string;
  provider: string;
  cacheUsed: boolean;
  cachedFiles?: string[];
  error?: string;
}

executeLiteLLMEndpoint(options: LiteLLMExecutionOptions): Promise<LiteLLMExecutionResult>
extractPatterns(prompt: string): string[]
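A usage sketch for the executor, using the options and result interfaces above; the import path is an assumption.

// Usage sketch for executeLiteLLMEndpoint() (import path assumed).
import { executeLiteLLMEndpoint } from './litellm-executor';

async function main(): Promise<void> {
  const result = await executeLiteLLMEndpoint({
    prompt: '@src/auth/**/*.ts Review security',
    endpointId: 'my-gpt4o',
    baseDir: process.cwd(),
    enableCache: true,
    onOutput: ({ type, data }) => process.stdout.write(`[${type}] ${data}`),
  });

  if (result.success) {
    console.log(`Model ${result.model} (cache used: ${result.cacheUsed})`);
    console.log(result.output);
  } else {
    console.error(result.error);
  }
}

main().catch(console.error);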

Dashboard Integration

The dashboard provides a UI for managing LiteLLM configuration:

  • Providers: Add/edit/delete provider credentials
  • Endpoints: Configure custom endpoints with cache strategies
  • Cache Stats: View cache usage and clear entries
  • Test Connections: Verify provider API access

Routes are handled by litellm-api-routes.ts.

Limitations

  1. Python Dependency: Requires the ccw-litellm Python package to be installed
  2. Model Support: Limited to models supported by the LiteLLM library
  3. Cache Scope: Context cache is in-memory (not persisted across restarts)
  4. Pattern Syntax: Only supports glob-style @patterns, not regex

Troubleshooting

Error: "Endpoint not found"

  • Verify endpoint ID matches config file
  • Check litellm-api-config.json exists in .ccw/storage/config/

Error: "API key not configured"

  • Ensure environment variable is set
  • Verify ${ENV_VAR} syntax in config
  • Test with echo $OPENAI_API_KEY

Error: "Failed to spawn Python process"

  • Install ccw-litellm: pip install ccw-litellm
  • Verify Python is accessible: python --version

Cache Not Applied

  • Check endpoint has cacheStrategy.enabled: true
  • Verify prompt contains @patterns
  • Check cache TTL hasn't expired

Examples

See examples/litellm-config.json for a complete configuration template.