# LiteLLM Integration Guide

## Overview

CCW now supports custom LiteLLM endpoints with integrated context caching. You can configure multiple providers (OpenAI, Anthropic, Ollama, etc.) and create custom endpoints with file-based caching strategies.

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        CLI Executor                         │
│                                                             │
│  ┌─────────────┐         ┌──────────────────────────────┐   │
│  │  --model    │────────>│ Route Decision:              │   │
│  │  flag       │         │  - gemini/qwen/codex → CLI   │   │
│  └─────────────┘         │  - custom ID → LiteLLM       │   │
│                          └──────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      LiteLLM Executor                       │
│                                                             │
│  1. Load endpoint config (litellm-api-config.json)          │
│  2. Extract @patterns from prompt                           │
│  3. Pack files via context-cache                            │
│  4. Call LiteLLM client with cached content + prompt        │
│  5. Return result                                           │
└─────────────────────────────────────────────────────────────┘
```

## Configuration

### File Location

Configuration is stored per-project:

```
<project>/.ccw/storage/config/litellm-api-config.json
```

### Configuration Structure
```json
{
  "version": 1,
  "providers": [
    {
      "id": "openai-1234567890",
      "name": "My OpenAI",
      "type": "openai",
      "apiKey": "${OPENAI_API_KEY}",
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "endpoints": [
    {
      "id": "my-gpt4o",
      "name": "GPT-4o with Context Cache",
      "providerId": "openai-1234567890",
      "model": "gpt-4o",
      "description": "GPT-4o with automatic file caching",
      "cacheStrategy": {
        "enabled": true,
        "ttlMinutes": 60,
        "maxSizeKB": 512,
        "filePatterns": ["*.md", "*.ts", "*.js"]
      },
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "defaultEndpoint": "my-gpt4o",
  "globalCacheSettings": {
    "enabled": true,
    "cacheDir": "~/.ccw/cache/context",
    "maxTotalSizeMB": 100
  }
}
```
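
For reference, the shape of this file can be expressed as TypeScript types. This is a sketch inferred from the example above: `ProviderCredential` and `CustomEndpoint` are the names used in the API reference below, while `LiteLLMApiConfig` is a hypothetical name for the top-level shape.

```typescript
// Sketch of the config file shape, inferred from the example above.
// LiteLLMApiConfig is a hypothetical name; field names mirror the JSON.
interface ProviderCredential {
  id: string;          // e.g. "openai-1234567890"
  name: string;
  type: string;        // "openai", "anthropic", "ollama", ...
  apiKey: string;      // supports "${ENV_VAR}" syntax
  enabled: boolean;
  createdAt: string;   // ISO 8601 timestamp
  updatedAt: string;
}

interface CustomEndpoint {
  id: string;          // the value passed to --model
  name: string;
  providerId: string;  // must match a ProviderCredential.id
  model: string;       // model name forwarded to LiteLLM
  description?: string;
  cacheStrategy?: {
    enabled: boolean;
    ttlMinutes: number;
    maxSizeKB: number;
    filePatterns: string[];
  };
  enabled: boolean;
  createdAt: string;
  updatedAt: string;
}

interface LiteLLMApiConfig {
  version: number;
  providers: ProviderCredential[];
  endpoints: CustomEndpoint[];
  defaultEndpoint?: string;  // endpoint id used when none is specified
  globalCacheSettings: {
    enabled: boolean;
    cacheDir: string;
    maxTotalSizeMB: number;
  };
}
```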

## Usage

### Via CLI

```bash
# Use custom endpoint with --model flag
ccw cli -p "Analyze authentication flow" --tool litellm --model my-gpt4o

# With context patterns (automatically cached)
ccw cli -p "@src/auth/**/*.ts Review security" --tool litellm --model my-gpt4o

# Disable caching for a specific call
ccw cli -p "Quick question" --tool litellm --model my-gpt4o --no-cache
```

### Via Dashboard API

#### Create Provider

```bash
curl -X POST http://localhost:3000/api/litellm-api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My OpenAI",
    "type": "openai",
    "apiKey": "${OPENAI_API_KEY}",
    "enabled": true
  }'
```

#### Create Endpoint

```bash
curl -X POST http://localhost:3000/api/litellm-api/endpoints \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-gpt4o",
    "name": "GPT-4o with Cache",
    "providerId": "openai-1234567890",
    "model": "gpt-4o",
    "cacheStrategy": {
      "enabled": true,
      "ttlMinutes": 60,
      "maxSizeKB": 512,
      "filePatterns": ["*.md", "*.ts"]
    },
    "enabled": true
  }'
```

#### Test Provider Connection

```bash
curl -X POST http://localhost:3000/api/litellm-api/providers/openai-1234567890/test
```

## Context Caching

### How It Works

1. **Pattern Detection**: The LiteLLM executor scans the prompt for `@patterns`, e.g. `@src/**/*.ts`, `@CLAUDE.md`, `@../shared/**/*` (see the sketch after this list)
2. **File Packing**: Files matching the patterns are packed via the `context-cache` tool, which:
   - Respects the `max_file_size` limit (default: 1MB per file)
   - Applies the TTL from the endpoint config
   - Generates a session ID for retrieval
3. **Cache Integration**: Cached content is prepended to the prompt: `<cached files> --- <original prompt>`
4. **LLM Call**: The combined prompt is sent to LiteLLM with the provider credentials
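
A minimal sketch of the pattern-detection step (an illustrative regex only; the real `extractPatterns()` in `litellm-executor.ts` may differ):

```typescript
// Illustrative sketch of @pattern extraction; not the actual implementation.
function extractPatterns(prompt: string): string[] {
  // Match "@" followed by a path-like token: word characters plus the
  // characters that appear in globs and relative paths (. / * -).
  const matches = prompt.match(/@[\w./*-]+/g) ?? [];
  // Strip the leading "@" so each result is a plain glob pattern.
  return matches.map((m) => m.slice(1));
}

extractPatterns("@src/auth/**/*.ts Review security");
// => ["src/auth/**/*.ts"]
```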

### Cache Strategy Configuration

```typescript
interface CacheStrategy {
  enabled: boolean;       // Enable/disable caching for this endpoint
  ttlMinutes: number;     // Cache lifetime (default: 60)
  maxSizeKB: number;      // Max cache size (default: 512KB)
  filePatterns: string[]; // Glob patterns to cache
}
```
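
As a usage sketch, a TTL check against this strategy might look like the following, using the `CacheStrategy` interface above (`isCacheEntryFresh` is a hypothetical helper, not part of the CCW API):

```typescript
// Hypothetical helper: is a cache entry still fresh under a strategy?
function isCacheEntryFresh(
  strategy: CacheStrategy,
  createdAt: Date,
  now: Date = new Date(),
): boolean {
  if (!strategy.enabled) return false;
  const ageMinutes = (now.getTime() - createdAt.getTime()) / 60_000;
  return ageMinutes < strategy.ttlMinutes;
}
```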

### Example: Security Audit with Cache

```bash
ccw cli -p "
PURPOSE: OWASP Top 10 security audit of authentication module
TASK: • Check SQL injection • Verify session management • Test XSS vectors
CONTEXT: @src/auth/**/*.ts @src/middleware/auth.ts
EXPECTED: Security report with severity levels and remediation steps
" --tool litellm --model my-security-scanner --mode analysis
```
What happens:

- The executor detects `@src/auth/**/*.ts` and `@src/middleware/auth.ts`
- Matching files are packed into the context cache
- The cache entry stays valid for 60 minutes (per the endpoint config)
- Subsequent calls reuse the cached files (no re-packing)
- LiteLLM receives the full context without manual file specification

## Environment Variables

### Provider API Keys

LiteLLM uses standard environment variable names:

| Provider | Env Var Name |
|----------|--------------|
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Azure | `AZURE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| DeepSeek | `DEEPSEEK_API_KEY` |

### Configuration Syntax

Use `${ENV_VAR}` syntax in the config:

```json
{
  "apiKey": "${OPENAI_API_KEY}"
}
```

The executor resolves these at runtime via `resolveEnvVar()`.
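
A minimal sketch of what that resolution could look like (illustrative; the actual `resolveEnvVar()` may handle more cases):

```typescript
// Illustrative sketch of ${ENV_VAR} resolution; not the actual implementation.
function resolveEnvVar(value: string): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/g, (_match, name: string) => {
    const resolved = process.env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

resolveEnvVar("${OPENAI_API_KEY}"); // => the value of OPENAI_API_KEY
```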

## API Reference

### Config Manager (`litellm-api-config-manager.ts`)

#### Provider Management

```typescript
getAllProviders(baseDir: string): ProviderCredential[]
getProvider(baseDir: string, providerId: string): ProviderCredential | null
getProviderWithResolvedEnvVars(baseDir: string, providerId: string): (ProviderCredential & { resolvedApiKey: string }) | null
addProvider(baseDir: string, providerData): ProviderCredential
updateProvider(baseDir: string, providerId: string, updates): ProviderCredential
deleteProvider(baseDir: string, providerId: string): boolean
```

#### Endpoint Management

```typescript
getAllEndpoints(baseDir: string): CustomEndpoint[]
getEndpoint(baseDir: string, endpointId: string): CustomEndpoint | null
findEndpointById(baseDir: string, endpointId: string): CustomEndpoint | null
addEndpoint(baseDir: string, endpointData): CustomEndpoint
updateEndpoint(baseDir: string, endpointId: string, updates): CustomEndpoint
deleteEndpoint(baseDir: string, endpointId: string): boolean
```
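
A typical programmatic flow chains these calls. The sketch below assumes the argument shapes mirror the configuration structure shown earlier; the exact `providerData`/`endpointData` types and import path are not documented here:

```typescript
// Sketch of chaining the config-manager calls; argument shapes are
// assumed to mirror the configuration structure shown earlier.
import { addProvider, addEndpoint, getEndpoint } from "./litellm-api-config-manager";

const baseDir = "/path/to/project"; // project root containing .ccw/

// Register a provider, then an endpoint that references it.
const provider = addProvider(baseDir, {
  name: "My OpenAI",
  type: "openai",
  apiKey: "${OPENAI_API_KEY}",
  enabled: true,
});

addEndpoint(baseDir, {
  id: "my-gpt4o",
  name: "GPT-4o with Cache",
  providerId: provider.id,
  model: "gpt-4o",
  cacheStrategy: {
    enabled: true,
    ttlMinutes: 60,
    maxSizeKB: 512,
    filePatterns: ["*.md", "*.ts"],
  },
  enabled: true,
});

console.log(getEndpoint(baseDir, "my-gpt4o")?.model); // "gpt-4o"
```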

### Executor (`litellm-executor.ts`)

```typescript
interface LiteLLMExecutionOptions {
  prompt: string;
  endpointId: string;
  baseDir: string;
  cwd?: string;
  includeDirs?: string[];
  enableCache?: boolean;
  onOutput?: (data: { type: string; data: string }) => void;
}

interface LiteLLMExecutionResult {
  success: boolean;
  output: string;
  model: string;
  provider: string;
  cacheUsed: boolean;
  cachedFiles?: string[];
  error?: string;
}

executeLiteLLMEndpoint(options: LiteLLMExecutionOptions): Promise<LiteLLMExecutionResult>
extractPatterns(prompt: string): string[]
```
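
Putting it together, a caller might use the executor like this (a sketch; only the fields defined in the interfaces above are assumed):

```typescript
import { executeLiteLLMEndpoint } from "./litellm-executor";

async function main(): Promise<void> {
  const result = await executeLiteLLMEndpoint({
    prompt: "@src/auth/**/*.ts Review security",
    endpointId: "my-gpt4o",
    baseDir: "/path/to/project",
    enableCache: true,
    // Stream intermediate output as it arrives.
    onOutput: ({ type, data }) => process.stdout.write(`[${type}] ${data}`),
  });

  if (result.success) {
    console.log(`Model: ${result.model}, cache used: ${result.cacheUsed}`);
    console.log(result.output);
  } else {
    console.error(result.error);
  }
}

main();
```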

## Dashboard Integration

The dashboard provides a UI for managing the LiteLLM configuration:

- **Providers**: Add/edit/delete provider credentials
- **Endpoints**: Configure custom endpoints with cache strategies
- **Cache Stats**: View cache usage and clear entries
- **Test Connections**: Verify provider API access

Routes are handled by `litellm-api-routes.ts`.

## Limitations

- **Python Dependency**: Requires the `ccw-litellm` Python package to be installed
- **Model Support**: Limited to models supported by the LiteLLM library
- **Cache Scope**: Context cache is in-memory (not persisted across restarts)
- **Pattern Syntax**: Only supports glob-style `@patterns`, not regex

## Troubleshooting

### Error: "Endpoint not found"

- Verify the endpoint ID matches the config file
- Check that `litellm-api-config.json` exists in `.ccw/storage/config/`

### Error: "API key not configured"

- Ensure the environment variable is set
- Verify the `${ENV_VAR}` syntax in the config
- Test with `echo $OPENAI_API_KEY`

### Error: "Failed to spawn Python process"

- Install ccw-litellm: `pip install ccw-litellm`
- Verify Python is accessible: `python --version`

### Cache Not Applied

- Check that the endpoint has `cacheStrategy.enabled: true`
- Verify the prompt contains `@patterns`
- Check that the cache TTL hasn't expired

## Examples

See `examples/litellm-config.json` for a complete configuration template.