# LiteLLM Integration Guide

## Overview

CCW now supports custom LiteLLM endpoints with integrated context caching. You can configure multiple providers (OpenAI, Anthropic, Ollama, etc.) and create custom endpoints with file-based caching strategies.

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        CLI Executor                         │
│                                                             │
│  ┌─────────────┐         ┌──────────────────────────────┐   │
│  │  --model    │────────>│ Route Decision:              │   │
│  │  flag       │         │  - gemini/qwen/codex → CLI   │   │
│  └─────────────┘         │  - custom ID → LiteLLM       │   │
│                          └──────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      LiteLLM Executor                       │
│                                                             │
│  1. Load endpoint config (litellm-api-config.json)          │
│  2. Extract @patterns from prompt                           │
│  3. Pack files via context-cache                            │
│  4. Call LiteLLM client with cached content + prompt        │
│  5. Return result                                           │
└─────────────────────────────────────────────────────────────┘
```

## Configuration

### File Location

Configuration is stored per-project:

```
<project>/.ccw/storage/config/litellm-api-config.json
```

### Configuration Structure
```json
{
  "version": 1,
  "providers": [
    {
      "id": "openai-1234567890",
      "name": "My OpenAI",
      "type": "openai",
      "apiKey": "${OPENAI_API_KEY}",
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "endpoints": [
    {
      "id": "my-gpt4o",
      "name": "GPT-4o with Context Cache",
      "providerId": "openai-1234567890",
      "model": "gpt-4o",
      "description": "GPT-4o with automatic file caching",
      "cacheStrategy": {
        "enabled": true,
        "ttlMinutes": 60,
        "maxSizeKB": 512,
        "filePatterns": ["*.md", "*.ts", "*.js"]
      },
      "enabled": true,
      "createdAt": "2025-01-01T00:00:00.000Z",
      "updatedAt": "2025-01-01T00:00:00.000Z"
    }
  ],
  "defaultEndpoint": "my-gpt4o",
  "globalCacheSettings": {
    "enabled": true,
    "cacheDir": "~/.ccw/cache/context",
    "maxTotalSizeMB": 100
  }
}
```
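
For reference, the shape of this file can be expressed as TypeScript types. This is a sketch inferred from the example above: `ProviderCredential` and `CustomEndpoint` are the names used in the API reference below, while `LiteLLMApiConfig` is a hypothetical name for the top-level shape.

```typescript
// Sketch of the config file shape, inferred from the example above.
// LiteLLMApiConfig is a hypothetical name; field names mirror the JSON.
interface ProviderCredential {
  id: string;          // e.g. "openai-1234567890"
  name: string;
  type: string;        // "openai", "anthropic", "ollama", ...
  apiKey: string;      // supports "${ENV_VAR}" syntax
  enabled: boolean;
  createdAt: string;   // ISO 8601 timestamp
  updatedAt: string;
}

interface CustomEndpoint {
  id: string;          // the value passed to --model
  name: string;
  providerId: string;  // must match a ProviderCredential.id
  model: string;       // model name forwarded to LiteLLM
  description?: string;
  cacheStrategy?: {
    enabled: boolean;
    ttlMinutes: number;
    maxSizeKB: number;
    filePatterns: string[];
  };
  enabled: boolean;
  createdAt: string;
  updatedAt: string;
}

interface LiteLLMApiConfig {
  version: number;
  providers: ProviderCredential[];
  endpoints: CustomEndpoint[];
  defaultEndpoint?: string;  // endpoint id used when none is specified
  globalCacheSettings: {
    enabled: boolean;
    cacheDir: string;
    maxTotalSizeMB: number;
  };
}
```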

## Usage

### Via CLI

```bash
# Use custom endpoint with --model flag
ccw cli -p "Analyze authentication flow" --tool litellm --model my-gpt4o

# With context patterns (automatically cached)
ccw cli -p "@src/auth/**/*.ts Review security" --tool litellm --model my-gpt4o

# Disable caching for a specific call
ccw cli -p "Quick question" --tool litellm --model my-gpt4o --no-cache
```

### Via Dashboard API

#### Create Provider

```bash
curl -X POST http://localhost:3000/api/litellm-api/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My OpenAI",
    "type": "openai",
    "apiKey": "${OPENAI_API_KEY}",
    "enabled": true
  }'
```

#### Create Endpoint

```bash
curl -X POST http://localhost:3000/api/litellm-api/endpoints \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-gpt4o",
    "name": "GPT-4o with Cache",
    "providerId": "openai-1234567890",
    "model": "gpt-4o",
    "cacheStrategy": {
      "enabled": true,
      "ttlMinutes": 60,
      "maxSizeKB": 512,
      "filePatterns": ["*.md", "*.ts"]
    },
    "enabled": true
  }'
```

#### Test Provider Connection

```bash
curl -X POST http://localhost:3000/api/litellm-api/providers/openai-1234567890/test
```

## Context Caching

### How It Works

1. **Pattern Detection**: The LiteLLM executor scans the prompt for `@patterns`, e.g. `@src/**/*.ts`, `@CLAUDE.md`, `@../shared/**/*` (see the sketch after this list)
2. **File Packing**: Files matching the patterns are packed via the `context-cache` tool, which:
   - Respects the `max_file_size` limit (default: 1MB per file)
   - Applies the TTL from the endpoint config
   - Generates a session ID for retrieval
3. **Cache Integration**: Cached content is prepended to the prompt: `<cached files> --- <original prompt>`
4. **LLM Call**: The combined prompt is sent to LiteLLM with the provider credentials
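
A minimal sketch of the pattern-detection step (an illustrative regex only; the real `extractPatterns()` in `litellm-executor.ts` may differ):

```typescript
// Illustrative sketch of @pattern extraction; not the actual implementation.
function extractPatterns(prompt: string): string[] {
  // Match "@" followed by a path-like token: word characters plus the
  // characters that appear in globs and relative paths (. / * -).
  const matches = prompt.match(/@[\w./*-]+/g) ?? [];
  // Strip the leading "@" so each result is a plain glob pattern.
  return matches.map((m) => m.slice(1));
}

extractPatterns("@src/auth/**/*.ts Review security");
// => ["src/auth/**/*.ts"]
```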

### Cache Strategy Configuration

```typescript
interface CacheStrategy {
  enabled: boolean;       // Enable/disable caching for this endpoint
  ttlMinutes: number;     // Cache lifetime (default: 60)
  maxSizeKB: number;      // Max cache size (default: 512KB)
  filePatterns: string[]; // Glob patterns to cache
}
```
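
As a usage sketch, a TTL check against this strategy might look like the following, using the `CacheStrategy` interface above (`isCacheEntryFresh` is a hypothetical helper, not part of the CCW API):

```typescript
// Hypothetical helper: is a cache entry still fresh under a strategy?
function isCacheEntryFresh(
  strategy: CacheStrategy,
  createdAt: Date,
  now: Date = new Date(),
): boolean {
  if (!strategy.enabled) return false;
  const ageMinutes = (now.getTime() - createdAt.getTime()) / 60_000;
  return ageMinutes < strategy.ttlMinutes;
}
```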

### Example: Security Audit with Cache

```bash
ccw cli -p "
PURPOSE: OWASP Top 10 security audit of authentication module
TASK: • Check SQL injection • Verify session management • Test XSS vectors
CONTEXT: @src/auth/**/*.ts @src/middleware/auth.ts
EXPECTED: Security report with severity levels and remediation steps
" --tool litellm --model my-security-scanner --mode analysis
```
What happens:

- The executor detects `@src/auth/**/*.ts` and `@src/middleware/auth.ts`
- Matching files are packed into the context cache
- The cache entry stays valid for 60 minutes (per the endpoint config)
- Subsequent calls reuse the cached files (no re-packing)
- LiteLLM receives the full context without manual file specification

## Environment Variables

### Provider API Keys

LiteLLM uses standard environment variable names:

| Provider | Env Var Name |
|----------|--------------|
| OpenAI | `OPENAI_API_KEY` |
| Anthropic | `ANTHROPIC_API_KEY` |
| Google | `GOOGLE_API_KEY` |
| Azure | `AZURE_API_KEY` |
| Mistral | `MISTRAL_API_KEY` |
| DeepSeek | `DEEPSEEK_API_KEY` |

### Configuration Syntax

Use `${ENV_VAR}` syntax in the config:

```json
{
  "apiKey": "${OPENAI_API_KEY}"
}
```

The executor resolves these at runtime via `resolveEnvVar()`.
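
A minimal sketch of what that resolution could look like (illustrative; the actual `resolveEnvVar()` may handle more cases):

```typescript
// Illustrative sketch of ${ENV_VAR} resolution; not the actual implementation.
function resolveEnvVar(value: string): string {
  return value.replace(/\$\{([A-Z0-9_]+)\}/g, (_match, name: string) => {
    const resolved = process.env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

resolveEnvVar("${OPENAI_API_KEY}"); // => the value of OPENAI_API_KEY
```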

## API Reference

### Config Manager (`litellm-api-config-manager.ts`)

#### Provider Management

```typescript
getAllProviders(baseDir: string): ProviderCredential[]
getProvider(baseDir: string, providerId: string): ProviderCredential | null
getProviderWithResolvedEnvVars(baseDir: string, providerId: string): (ProviderCredential & { resolvedApiKey: string }) | null
addProvider(baseDir: string, providerData): ProviderCredential
updateProvider(baseDir: string, providerId: string, updates): ProviderCredential
deleteProvider(baseDir: string, providerId: string): boolean
```

#### Endpoint Management

```typescript
getAllEndpoints(baseDir: string): CustomEndpoint[]
getEndpoint(baseDir: string, endpointId: string): CustomEndpoint | null
findEndpointById(baseDir: string, endpointId: string): CustomEndpoint | null
addEndpoint(baseDir: string, endpointData): CustomEndpoint
updateEndpoint(baseDir: string, endpointId: string, updates): CustomEndpoint
deleteEndpoint(baseDir: string, endpointId: string): boolean
```
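
A typical programmatic flow chains these calls. The sketch below assumes the argument shapes mirror the configuration structure shown earlier; the exact `providerData`/`endpointData` types and import path are not documented here:

```typescript
// Sketch of chaining the config-manager calls; argument shapes are
// assumed to mirror the configuration structure shown earlier.
import { addProvider, addEndpoint, getEndpoint } from "./litellm-api-config-manager";

const baseDir = "/path/to/project"; // project root containing .ccw/

// Register a provider, then an endpoint that references it.
const provider = addProvider(baseDir, {
  name: "My OpenAI",
  type: "openai",
  apiKey: "${OPENAI_API_KEY}",
  enabled: true,
});

addEndpoint(baseDir, {
  id: "my-gpt4o",
  name: "GPT-4o with Cache",
  providerId: provider.id,
  model: "gpt-4o",
  cacheStrategy: {
    enabled: true,
    ttlMinutes: 60,
    maxSizeKB: 512,
    filePatterns: ["*.md", "*.ts"],
  },
  enabled: true,
});

console.log(getEndpoint(baseDir, "my-gpt4o")?.model); // "gpt-4o"
```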

### Executor (`litellm-executor.ts`)

```typescript
interface LiteLLMExecutionOptions {
  prompt: string;
  endpointId: string;
  baseDir: string;
  cwd?: string;
  includeDirs?: string[];
  enableCache?: boolean;
  onOutput?: (data: { type: string; data: string }) => void;
}

interface LiteLLMExecutionResult {
  success: boolean;
  output: string;
  model: string;
  provider: string;
  cacheUsed: boolean;
  cachedFiles?: string[];
  error?: string;
}

executeLiteLLMEndpoint(options: LiteLLMExecutionOptions): Promise<LiteLLMExecutionResult>
extractPatterns(prompt: string): string[]
```
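
Putting it together, a caller might use the executor like this (a sketch; only the fields defined in the interfaces above are assumed):

```typescript
import { executeLiteLLMEndpoint } from "./litellm-executor";

async function main(): Promise<void> {
  const result = await executeLiteLLMEndpoint({
    prompt: "@src/auth/**/*.ts Review security",
    endpointId: "my-gpt4o",
    baseDir: "/path/to/project",
    enableCache: true,
    // Stream intermediate output as it arrives.
    onOutput: ({ type, data }) => process.stdout.write(`[${type}] ${data}`),
  });

  if (result.success) {
    console.log(`Model: ${result.model}, cache used: ${result.cacheUsed}`);
    console.log(result.output);
  } else {
    console.error(result.error);
  }
}

main();
```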

## Dashboard Integration

The dashboard provides a UI for managing the LiteLLM configuration:

- **Providers**: Add/edit/delete provider credentials
- **Endpoints**: Configure custom endpoints with cache strategies
- **Cache Stats**: View cache usage and clear entries
- **Test Connections**: Verify provider API access

Routes are handled by `litellm-api-routes.ts`.

## Limitations

- **Python Dependency**: Requires the `ccw-litellm` Python package to be installed
- **Model Support**: Limited to models supported by the LiteLLM library
- **Cache Scope**: Context cache is in-memory (not persisted across restarts)
- **Pattern Syntax**: Only supports glob-style `@patterns`, not regex

## Troubleshooting

### Error: "Endpoint not found"

- Verify the endpoint ID matches the config file
- Check that `litellm-api-config.json` exists in `.ccw/storage/config/`

### Error: "API key not configured"

- Ensure the environment variable is set
- Verify the `${ENV_VAR}` syntax in the config
- Test with `echo $OPENAI_API_KEY`

### Error: "Failed to spawn Python process"

- Install ccw-litellm: `pip install ccw-litellm`
- Verify Python is accessible: `python --version`

### Cache Not Applied

- Check that the endpoint has `cacheStrategy.enabled: true`
- Verify the prompt contains `@patterns`
- Check that the cache TTL hasn't expired

## Examples

See `examples/litellm-config.json` for a complete configuration template.