mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-02-15 02:42:45 +08:00

catlog22 633d918da1 Add quality gates and tuning strategies documentation

- Introduced quality gates specification for skill tuning, detailing quality dimensions, scoring, and gate definitions.
- Added comprehensive tuning strategies for various issue categories, including context explosion, long-tail forgetting, data flow, and agent coordination.
- Created templates for diagnosis reports and fix proposals to standardize documentation and reporting processes.

2026-01-14 12:59:13 +08:00

23 KiB


Tuning Strategies

Detailed fix strategies for each problem category with implementation guidance.

When to Use

Phase	Usage	Section
action-propose-fixes	Strategy selection	Strategy Details
action-apply-fix	Implementation guidance	Implementation
action-verify	Verification steps	Verification

Context Explosion Strategies

Strategy: sliding_window

Purpose: Limit context history to the most recent N items.

Implementation:

// In orchestrator.md or phase files
const MAX_HISTORY_ITEMS = 5;

function updateHistory(state, newItem) {
  const history = state.history || [];
  const updated = [...history, newItem].slice(-MAX_HISTORY_ITEMS);
  return { ...state, history: updated };
}

Files to Modify:

phases/orchestrator.md - Add history management
phases/state-schema.md - Document history limit

Risk: Low

Verification:

Run skill for 10+ iterations
Verify history.length never exceeds MAX_HISTORY_ITEMS
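
The verification steps above can be sketched as a small simulation. This is a minimal sketch, not the skill's actual test harness: it re-declares `updateHistory` so it runs standalone, and the item shape `{ step: i }` and the helper `simulateIterations` are hypothetical.

```javascript
const MAX_HISTORY_ITEMS = 5;

function updateHistory(state, newItem) {
  const history = state.history || [];
  const updated = [...history, newItem].slice(-MAX_HISTORY_ITEMS);
  return { ...state, history: updated };
}

// Simulate `count` iterations and record the largest history size seen.
function simulateIterations(count) {
  let state = {};
  let maxSeen = 0;
  for (let i = 1; i <= count; i++) {
    state = updateHistory(state, { step: i }); // hypothetical item shape
    maxSeen = Math.max(maxSeen, state.history.length);
  }
  return { state, maxSeen };
}

const { state: finalState, maxSeen } = simulateIterations(12);
console.log(maxSeen);                              // 5
console.log(finalState.history.map(h => h.step));  // [ 8, 9, 10, 11, 12 ]
```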

Strategy: path_reference

Purpose: Pass file paths instead of full content.

Implementation:

// Before
const content = Read('data.json');
const prompt = `Analyze: ${content}`;

// After
const dataPath = `${workDir}/data.json`;
const prompt = `Analyze file at: ${dataPath}. Read it first.`;

Files to Modify:

All phase files with ${content} in prompts

Risk: Low

Verification:

Verify agents can still access required data
Check token count reduced

Strategy: context_summarization

Purpose: Add summarization step before passing to next phase.

Implementation:

// Add summarization agent
const summarizeResult = await Task({
  subagent_type: 'universal-executor',
  prompt: `
    Summarize the following in <100 words, preserving key facts:
    ${fullContent}

    Return JSON: { summary: "...", key_points: [...] }
  `
});

// Pass summary instead of full content
nextPhasePrompt = `Previous phase summary: ${summarizeResult.summary}`;

Files to Modify:

Phase transition points
Orchestrator (if autonomous)

Risk: Low

Verification:

Compare output quality with/without summarization
Verify key information preserved

Strategy: structured_state

Purpose: Replace text-based context with structured JSON state.

Implementation:

// Before: Text-based context passing
const context = `
  User requested: ${userRequest}
  Previous output: ${previousOutput}
  Current status: ${status}
`;

// After: Structured state
const state = {
  original_request: userRequest,
  previous_output_path: `${workDir}/output.md`,
  previous_output_summary: "...",
  status: status,
  key_decisions: [/* ... */]
};

Files to Modify:

phases/state-schema.md - Define structure
All phases - Use structured fields

Risk: Medium (requires refactoring)

Verification:

Verify all phases can access required state fields
Check backward compatibility

Long-tail Forgetting Strategies

Strategy: constraint_injection

Purpose: Inject original constraints into every phase prompt.

Implementation:

// Add to every phase prompt template
const phasePrompt = `
[CONSTRAINTS - FROM ORIGINAL REQUEST]
${state.original_requirements.map(r => `- ${r}`).join('\n')}

[CURRENT TASK]
${taskDescription}

[REMINDER]
Output MUST satisfy all constraints listed above.
`;

Files to Modify:

All phases/*.md files
templates/agent-base.md (if exists)

Risk: Low

Verification:

Verify constraints visible in each phase
Test with specific constraint, verify output respects it
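
The "constraints visible in each phase" check can be automated. The following is a minimal sketch under assumptions: `buildPhasePrompt` and `missingConstraints` are hypothetical helper names, and the check only looks for each constraint verbatim in the prompt text.

```javascript
// Build a phase prompt that always carries the original constraints.
function buildPhasePrompt(state, taskDescription) {
  const constraints = (state.original_requirements || [])
    .map(r => `- ${r}`)
    .join('\n');
  return [
    '[CONSTRAINTS - FROM ORIGINAL REQUEST]',
    constraints,
    '',
    '[CURRENT TASK]',
    taskDescription,
    '',
    '[REMINDER]',
    'Output MUST satisfy all constraints listed above.'
  ].join('\n');
}

// Return the constraints that do not appear verbatim in a prompt.
function missingConstraints(state, promptText) {
  return (state.original_requirements || []).filter(r => !promptText.includes(r));
}

const exampleState = {
  original_requirements: ['Use TypeScript', 'No external dependencies']
};
const phasePrompt = buildPhasePrompt(exampleState, 'Generate the parser module');
console.log(missingConstraints(exampleState, phasePrompt)); // []
```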

Strategy: state_constraints_field

Purpose: Add dedicated field in state schema for requirements.

Implementation:

// In state-schema.md
interface State {
  // Add these fields
  original_requirements: string[];    // User's original constraints
  goal_summary: string;               // One-line goal statement
  constraint_violations: string[];    // Track any violations
}

// In action-init.md
function initState(userInput) {
  return {
    original_requirements: extractRequirements(userInput),
    goal_summary: summarizeGoal(userInput),
    constraint_violations: []
  };
}

Files to Modify:

phases/state-schema.md
phases/actions/action-init.md

Risk: Low

Verification:

Verify state.json contains requirements after init
Check requirements persist through all phases

Strategy: checkpoint_restore

Purpose: Save state at key milestones for recovery and verification.

Implementation:

// Add checkpoint function
function createCheckpoint(state, workDir, checkpointName) {
  const checkpointPath = `${workDir}/checkpoints/${checkpointName}.json`;
  Write(checkpointPath, JSON.stringify({
    state: state,
    timestamp: new Date().toISOString(),
    name: checkpointName
  }, null, 2));
  return checkpointPath;
}

// Use at key points
await executePhase2();
createCheckpoint(state, workDir, 'after-phase-2');

Files to Modify:

phases/orchestrator.md
Key phase files

Risk: Low

Verification:

Verify checkpoints created at expected points
Test restore from checkpoint
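
The restore step is not shown in the implementation, so here is one possible shape for it. This is a sketch under assumptions: `restoreCheckpoint` is a hypothetical counterpart to `createCheckpoint`, and `Read`/`Write` are replaced by in-memory stand-ins so the round trip can run without a file system.

```javascript
// In-memory stand-ins for the Read/Write tools (assumption: the real tools
// read and write files on disk).
const files = new Map();
const Write = (path, content) => { files.set(path, content); };
const Read = (path) => {
  if (!files.has(path)) throw new Error(`File not found: ${path}`);
  return files.get(path);
};

function createCheckpoint(state, workDir, checkpointName) {
  const checkpointPath = `${workDir}/checkpoints/${checkpointName}.json`;
  Write(checkpointPath, JSON.stringify({
    state,
    timestamp: new Date().toISOString(),
    name: checkpointName
  }, null, 2));
  return checkpointPath;
}

// Hypothetical counterpart: load a checkpoint and return the saved state.
function restoreCheckpoint(workDir, checkpointName) {
  const checkpointPath = `${workDir}/checkpoints/${checkpointName}.json`;
  const checkpoint = JSON.parse(Read(checkpointPath));
  if (checkpoint.name !== checkpointName) {
    throw new Error(`Checkpoint name mismatch: ${checkpoint.name}`);
  }
  return checkpoint.state;
}

const before = { status: 'running', completed_phases: ['phase-1', 'phase-2'] };
createCheckpoint(before, '/tmp/work', 'after-phase-2');
const restored = restoreCheckpoint('/tmp/work', 'after-phase-2');
console.log(JSON.stringify(restored) === JSON.stringify(before)); // true
```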

Strategy: goal_embedding

Purpose: Track semantic similarity to original goal throughout execution.

Implementation:

// Store goal embedding at init
state.goal_embedding = await embed(state.goal_summary);

// At each major phase, check alignment
const currentPlanEmbedding = await embed(currentPlan);
const similarity = cosineSimilarity(state.goal_embedding, currentPlanEmbedding);

if (similarity < 0.7) {
  console.warn('Goal drift detected! Similarity:', similarity);
  // Trigger re-alignment
}

Files to Modify:

State schema (add embedding field)
Orchestrator (add similarity check)

Risk: Medium (requires embedding infrastructure)

Verification:

Test with intentional drift, verify detection
Verify false positive rate acceptable
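
The implementation calls `cosineSimilarity` without defining it. A minimal sketch of that helper follows, using the standard dot-product formula; `hasDrifted` and `DRIFT_THRESHOLD` are hypothetical names, and the toy 3-dimensional vectors stand in for real embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DRIFT_THRESHOLD = 0.7; // same cutoff as the similarity check in this strategy

function hasDrifted(goalEmbedding, planEmbedding) {
  return cosineSimilarity(goalEmbedding, planEmbedding) < DRIFT_THRESHOLD;
}

// Toy vectors: identical direction vs. orthogonal direction.
console.log(hasDrifted([1, 0, 0], [1, 0, 0])); // false
console.log(hasDrifted([1, 0, 0], [0, 1, 0])); // true
```

Testing the false positive rate would still require real embeddings, since the right threshold depends on the embedding model in use.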

Data Flow Strategies

Strategy: state_centralization

Purpose: Use single state.json for all persistent data.

Implementation:

// Create state manager
const StateManager = {
  read: (workDir) => JSON.parse(Read(`${workDir}/state.json`)),

  update: (workDir, updates) => {
    const current = StateManager.read(workDir);
    const next = { ...current, ...updates, updated_at: new Date().toISOString() };
    Write(`${workDir}/state.json`, JSON.stringify(next, null, 2));
    return next;
  },

  get: (workDir, path) => {
    const state = StateManager.read(workDir);
    return path.split('.').reduce((obj, key) => obj?.[key], state);
  }
};

// Replace direct writes
// Before: Write(`${workDir}/config.json`, config);
// After:  StateManager.update(workDir, { config });

Files to Modify:

All phases that write state
Create shared state manager

Risk: Medium (significant refactoring)

Verification:

Verify single state.json after full run
Check no orphan state files

Strategy: schema_enforcement

Purpose: Add runtime validation of state, using Zod or a similar library, or a hand-written validator as in the example below.

Implementation:

// Define schema (in state-schema.md)
const StateSchema = {
  status: ['pending', 'running', 'completed', 'failed'],
  target_skill: {
    name: 'string',
    path: 'string'
  },
  // ... full schema
};

function validateState(state) {
  const errors = [];

  if (!StateSchema.status.includes(state.status)) {
    errors.push(`Invalid status: ${state.status}`);
  }

  if (typeof state.target_skill?.name !== 'string') {
    errors.push('target_skill.name must be string');
  }

  if (errors.length > 0) {
    throw new Error(`State validation failed:\n${errors.join('\n')}`);
  }

  return true;
}

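// Use before state write
function updateState(workDir, updates) {
  // Load the current state first so updates are merged into it
  const currentState = JSON.parse(Read(`${workDir}/state.json`));
  const newState = { ...currentState, ...updates };
  validateState(newState);  // Throws if invalid
  Write(`${workDir}/state.json`, JSON.stringify(newState, null, 2));
  return newState;
}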

Files to Modify:

phases/state-schema.md - Add validation function
All state write locations

Risk: Low

Verification:

Test with invalid state, verify rejection
Verify valid state accepted

Strategy: field_normalization

Purpose: Normalize field names across all phases.

Implementation:

// Create normalization mapping
const FIELD_NORMALIZATIONS = {
  'title': 'name',
  'identifier': 'id',
  'state': 'status',
  'error': 'errors'
};

function normalizeData(data) {
  // Recurse into arrays element by element so they stay arrays
  if (Array.isArray(data)) return data.map(normalizeData);
  if (typeof data !== 'object' || data === null) return data;

  const normalized = {};
  for (const [key, value] of Object.entries(data)) {
    const normalizedKey = FIELD_NORMALIZATIONS[key] || key;
    normalized[normalizedKey] = normalizeData(value);
  }
  return normalized;
}

// Apply when reading external data
const rawData = JSON.parse(Read(filePath));
const normalizedData = normalizeData(rawData);

Files to Modify:

Data ingestion points
State update functions

Risk: Low

Verification:

Verify consistent field names in state
Check no data loss during normalization
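
One way data can be lost during normalization is a key collision: if an object carries both `title` and `name`, both map to `name` and one value overwrites the other. The sketch below detects that case for a single object; `findCollisions` is a hypothetical helper, not part of the original strategy.

```javascript
const FIELD_NORMALIZATIONS = {
  title: 'name',
  identifier: 'id',
  state: 'status',
  error: 'errors'
};

// List the normalized keys that more than one source key of `data` maps to.
function findCollisions(data) {
  const seen = new Set();
  const collisions = [];
  for (const key of Object.keys(data)) {
    const hasMapping = Object.prototype.hasOwnProperty.call(FIELD_NORMALIZATIONS, key);
    const target = hasMapping ? FIELD_NORMALIZATIONS[key] : key;
    if (seen.has(target)) collisions.push(target);
    seen.add(target);
  }
  return collisions;
}

console.log(findCollisions({ title: 'A', name: 'B', id: 1 })); // [ 'name' ]
console.log(findCollisions({ title: 'A', id: 1 }));            // []
```

Running such a check before normalizing makes it possible to log or reject ambiguous input instead of silently dropping a value.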

Agent Coordination Strategies

Strategy: error_wrapping

Purpose: Add try-catch to all Task calls.

Implementation:

// Wrapper function
async function safeTask(config, state, updateState) {
  const maxRetries = 3;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await Task(config);

      // Validate result
      if (!result) throw new Error('Empty result from agent');

      return result;
    } catch (error) {
      console.log(`Task attempt ${attempt} failed: ${error.message}`);

      if (attempt === maxRetries) {
        updateState({
          // Default to an empty list and zero so a fresh state can still record errors
          errors: [...(state.errors || []), {
            action: config.subagent_type,
            message: error.message,
            timestamp: new Date().toISOString()
          }],
          error_count: (state.error_count || 0) + 1
        });
        throw error;
      }

      // Wait before retry
      await new Promise(r => setTimeout(r, 1000 * attempt));
    }
  }
}

Files to Modify:

All files with Task calls

Risk: Low

Verification:

Simulate agent failure, verify graceful handling
Verify retry logic works
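
To simulate agent failure without a real agent, the task function can be injected. The sketch below is a simplified variant of the wrapper, not the wrapper itself: `retryTask` omits the state bookkeeping, and `makeFlakyTask` is a hypothetical fake agent that fails a fixed number of times.

```javascript
// Simplified retry wrapper with an injectable task function and delay, so the
// retry path can be exercised in a test.
async function retryTask(taskFn, config, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await taskFn(config);
      if (!result) throw new Error('Empty result from agent');
      return result;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Linear backoff: wait longer after each failed attempt
      await new Promise(r => setTimeout(r, baseDelayMs * attempt));
    }
  }
}

// Hypothetical fake agent: fails `failures` times, then succeeds.
function makeFlakyTask(failures) {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) throw new Error(`Simulated failure ${calls}`);
    return { status: 'ok', calls };
  };
}

async function demo() {
  const result = await retryTask(makeFlakyTask(2), {}, { baseDelayMs: 1 });
  console.log(result.calls); // 3

  try {
    await retryTask(makeFlakyTask(Infinity), {}, { baseDelayMs: 1 });
  } catch (error) {
    console.log(error.message); // Simulated failure 3
  }
}

demo();
```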

Strategy: result_validation

Purpose: Validate agent returns before use.

Implementation:

function validateAgentResult(result, expectedSchema) {
  // Try JSON parse
  let parsed;
  try {
    parsed = typeof result === 'string' ? JSON.parse(result) : result;
  } catch (e) {
    throw new Error(`Agent result is not valid JSON: ${result.slice(0, 100)}`);
  }

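  // Reject non-object results (null, numbers, strings) before checking fields
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Agent result must be a JSON object');
  }

  // Check required fields
  for (const field of expectedSchema.required || []) {
    if (!(field in parsed)) {
      throw new Error(`Missing required field: ${field}`);
    }
  }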

  return parsed;
}

// Usage
const rawResult = await Task({...});
const validResult = validateAgentResult(rawResult, {
  required: ['status', 'output_file']
});

Files to Modify:

All locations where agent results are used

Risk: Low

Verification:

Test with invalid agent output
Verify proper error messages

Strategy: flatten_nesting

Purpose: Remove nested agent calls, use orchestrator coordination.

Implementation:

// Before: Agent A calls Agent B in its prompt
// Agent A prompt: "... then call Task({subagent_type: 'B', ...}) ..."

// After: Agent A returns signal, orchestrator handles
// Agent A prompt: "If you need further analysis, return: { needs_agent_b: true, context: ... }"

// Orchestrator handles:
const resultA = await Task({ subagent_type: 'A', ... });
const parsedA = JSON.parse(resultA);

if (parsedA.needs_agent_b) {
  const resultB = await Task({
    subagent_type: 'B',
    prompt: `Continue analysis with context: ${JSON.stringify(parsedA.context)}`
  });
}

Files to Modify:

Phase files with nested Task calls
Orchestrator decision logic

Risk: Medium (may change agent behavior)

Verification:

Verify no nested Task patterns
Test agent chain via orchestrator

Strategy Selection Guide

Issue Type: Context Explosion
├── history grows unbounded? → sliding_window
├── full content in prompts? → path_reference
├── no summarization? → context_summarization
└── text-based context? → structured_state

Issue Type: Long-tail Forgetting
├── constraints not in phases? → constraint_injection
├── no requirements in state? → state_constraints_field
├── no recovery points? → checkpoint_restore
└── goal drift risk? → goal_embedding

Issue Type: Data Flow
├── multiple state files? → state_centralization
├── no validation? → schema_enforcement
└── inconsistent names? → field_normalization

Issue Type: Agent Coordination
├── no error handling? → error_wrapping
├── no result validation? → result_validation
└── nested agent calls? → flatten_nesting

Issue Type: Prompt Engineering
├── vague instructions? → structured_prompt
├── inconsistent output? → output_schema
├── hallucination risk? → grounding_context
└── format drift? → format_enforcement

Issue Type: Architecture
├── unclear responsibilities? → phase_decomposition
├── tight coupling? → interface_contracts
├── poor extensibility? → plugin_architecture
└── complex flow? → state_machine

Issue Type: Performance
├── high token usage? → token_budgeting
├── slow execution? → parallel_execution
├── redundant computation? → result_caching
└── large files? → lazy_loading

Issue Type: Error Handling
├── no recovery? → graceful_degradation
├── silent failures? → error_propagation
├── no logging? → structured_logging
└── unclear errors? → error_context

Issue Type: Output Quality
├── inconsistent quality? → quality_gates
├── no verification? → output_validation
├── format issues? → template_enforcement
└── incomplete output? → completeness_check

Issue Type: User Experience
├── no progress? → progress_tracking
├── unclear status? → status_communication
├── no feedback? → interactive_checkpoints
└── confusing flow? → guided_workflow

General Tuning Strategies (On-demand via Gemini CLI)

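The following strategies target more general optimization scenarios. A concrete implementation is usually generated only after an in-depth analysis with Gemini CLI.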

Prompt Engineering Strategies

Strategy: structured_prompt

Purpose: Convert vague instructions into structured prompts.

Implementation:

// Before: Vague prompt
const prompt = "Please analyze the code and give suggestions";

// After: Structured prompt
const prompt = `
[ROLE]
You are a code analysis expert specializing in ${domain}.

[TASK]
Analyze the provided code for:
1. Code quality issues
2. Performance bottlenecks
3. Security vulnerabilities

[INPUT]
File: ${filePath}
Context: ${context}

[OUTPUT FORMAT]
Return JSON:
{
  "issues": [{ "type": "...", "severity": "...", "location": "...", "suggestion": "..." }],
  "summary": "..."
}

[CONSTRAINTS]
- Focus on actionable issues only
- Limit to top 10 findings
`;

Risk: Low

Verification: Check output consistency across multiple runs

Strategy: output_schema

Purpose: Force LLM output to conform to a specific schema.

Implementation:

// Define expected schema
const outputSchema = {
  type: 'object',
  required: ['status', 'result'],
  properties: {
    status: { enum: ['success', 'error', 'partial'] },
    result: { type: 'object' },
    errors: { type: 'array' }
  }
};

// Include in prompt
const prompt = `
...task description...

[OUTPUT SCHEMA]
Your response MUST be valid JSON matching this schema:
${JSON.stringify(outputSchema, null, 2)}

[VALIDATION]
Before returning, verify your output:
1. Is it valid JSON?
2. Does it have all required fields?
3. Are field types correct?
`;

Risk: Low

Verification: JSON.parse + schema validation

Strategy: grounding_context

Purpose: Provide enough context to reduce hallucinations.

Implementation:

// Gather grounding context
const groundingContext = {
  codebase_patterns: await analyzePatterns(skillPath),
  existing_examples: await findSimilarImplementations(taskType),
  constraints: state.original_requirements
};

const prompt = `
[GROUNDING CONTEXT]
This skill follows these patterns:
${JSON.stringify(groundingContext.codebase_patterns)}

Similar implementations exist at:
${groundingContext.existing_examples.map(e => `- ${e.path}`).join('\n')}

[TASK]
${taskDescription}

[IMPORTANT]
- Only suggest patterns that exist in the codebase
- Reference specific files when making suggestions
- If unsure, indicate uncertainty level
`;

Risk: Medium (requires context gathering)

Verification: Check suggestions match existing patterns

Architecture Strategies

Strategy: phase_decomposition

Purpose: Re-partition phases so that each has a clear responsibility.

Analysis via Gemini:

ccw cli -p "
PURPOSE: Analyze phase decomposition for skill at ${skillPath}
TASK: • Map current phase responsibilities • Identify overlapping concerns • Suggest cleaner boundaries
MODE: analysis
CONTEXT: @phases/**/*.md
EXPECTED: { current_phases: [], overlaps: [], recommended_structure: [] }
" --tool gemini --mode analysis

Implementation Pattern:

Before: Monolithic phases
Phase1: Collect + Analyze + Transform + Output

After: Single-responsibility phases
Phase1: Collect (input gathering)
Phase2: Analyze (processing)
Phase3: Transform (conversion)
Phase4: Output (delivery)

Strategy: interface_contracts

Purpose: Define data contracts between phases.

Implementation:

// Define contracts in state-schema.md
interface PhaseContract {
  input: {
    required: string[];
    optional: string[];
    schema: object;
  };
  output: {
    guarantees: string[];
    schema: object;
  };
}

// Phase 1 output contract
const phase1Contract: PhaseContract = {
  input: {
    required: ['user_request'],
    optional: ['preferences'],
    schema: { /* ... */ }
  },
  output: {
    guarantees: ['parsed_requirements', 'validation_status'],
    schema: { /* ... */ }
  }
};
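
A contract is only useful if something checks it. The sketch below shows one way to enforce a contract of this shape at run time, written in plain JavaScript so it runs without a type checker. `runWithContract` and `missingFields` are hypothetical helpers, and the `schema` fields are omitted for brevity.

```javascript
// Contract for one phase, mirroring the PhaseContract shape (schemas omitted).
const phase1Contract = {
  input: { required: ['user_request'], optional: ['preferences'] },
  output: { guarantees: ['parsed_requirements', 'validation_status'] }
};

// Return the names from `fields` that are undefined on `data`.
function missingFields(data, fields) {
  return fields.filter(field => data[field] === undefined);
}

// Run a phase and enforce its contract on both the input and the output.
function runWithContract(contract, phaseFn, input) {
  const missingInput = missingFields(input, contract.input.required);
  if (missingInput.length > 0) {
    throw new Error(`Missing required input: ${missingInput.join(', ')}`);
  }
  const output = phaseFn(input);
  const missingOutput = missingFields(output, contract.output.guarantees);
  if (missingOutput.length > 0) {
    throw new Error(`Output guarantee not met: ${missingOutput.join(', ')}`);
  }
  return output;
}

const phase1Output = runWithContract(
  phase1Contract,
  input => ({ parsed_requirements: [input.user_request], validation_status: 'ok' }),
  { user_request: 'Summarize the report' }
);
console.log(phase1Output.validation_status); // ok
```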

Performance Strategies

Strategy: token_budgeting

Purpose: Set a token budget for each phase.

Implementation:

const TOKEN_BUDGETS = {
  'phase-collect': 2000,
  'phase-analyze': 5000,
  'phase-generate': 8000,
  total: 15000
};

function checkBudget(phase, estimatedTokens) {
  if (estimatedTokens > TOKEN_BUDGETS[phase]) {
    console.warn(`Phase ${phase} exceeds budget: ${estimatedTokens} > ${TOKEN_BUDGETS[phase]}`);
    // Trigger summarization or truncation
    return false;
  }
  return true;
}
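
`checkBudget` expects an `estimatedTokens` value but does not say where it comes from. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption and will differ from a real tokenizer's count; `estimateTokens` and `checkAllBudgets` are hypothetical helpers.

```javascript
const TOKEN_BUDGETS = {
  'phase-collect': 2000,
  'phase-analyze': 5000,
  'phase-generate': 8000,
  total: 15000
};

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Check each phase prompt against its budget and the sum against the total.
function checkAllBudgets(promptsByPhase) {
  const report = { overBudget: [], totalTokens: 0, totalExceeded: false };
  for (const [phase, promptText] of Object.entries(promptsByPhase)) {
    const tokens = estimateTokens(promptText);
    report.totalTokens += tokens;
    if (tokens > TOKEN_BUDGETS[phase]) report.overBudget.push(phase);
  }
  report.totalExceeded = report.totalTokens > TOKEN_BUDGETS.total;
  return report;
}

const budgetReport = checkAllBudgets({
  'phase-collect': 'x'.repeat(4000),   // ~1000 tokens, within 2000
  'phase-analyze': 'x'.repeat(24000)   // ~6000 tokens, over 5000
});
console.log(budgetReport.overBudget);    // [ 'phase-analyze' ]
console.log(budgetReport.totalTokens);   // 7000
console.log(budgetReport.totalExceeded); // false
```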

Strategy: parallel_execution

Purpose: Execute independent tasks in parallel.

Implementation:

// Before: Sequential
const result1 = await Task({ subagent_type: 'analyzer', prompt: prompt1 });
const result2 = await Task({ subagent_type: 'analyzer', prompt: prompt2 });
const result3 = await Task({ subagent_type: 'analyzer', prompt: prompt3 });

// After: Parallel (when independent)
const [result1, result2, result3] = await Promise.all([
  Task({ subagent_type: 'analyzer', prompt: prompt1, run_in_background: true }),
  Task({ subagent_type: 'analyzer', prompt: prompt2, run_in_background: true }),
  Task({ subagent_type: 'analyzer', prompt: prompt3, run_in_background: true })
]);
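
`Promise.all` rejects as soon as any one promise rejects, so a single failed analyzer discards the results of the others. Where partial results are acceptable, `Promise.allSettled` is an alternative. The sketch below uses hypothetical stand-ins for the Task calls; `runAllSettled` is an illustrative helper name.

```javascript
// Run independent tasks in parallel and keep the successes even if some fail.
async function runAllSettled(taskFns) {
  const settled = await Promise.allSettled(taskFns.map(fn => fn()));
  return {
    results: settled
      .filter(s => s.status === 'fulfilled')
      .map(s => s.value),
    errors: settled
      .filter(s => s.status === 'rejected')
      .map(s => String(s.reason?.message ?? s.reason))
  };
}

// Hypothetical stand-ins for Task calls.
const fakeTasks = [
  async () => 'analysis-1',
  async () => { throw new Error('agent timeout'); },
  async () => 'analysis-3'
];

runAllSettled(fakeTasks).then(({ results, errors }) => {
  console.log(results); // [ 'analysis-1', 'analysis-3' ]
  console.log(errors);  // [ 'agent timeout' ]
});
```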

Strategy: result_caching

Purpose: Cache intermediate results to avoid redundant computation.

Implementation:

const cache = {};

async function cachedAnalysis(key, analysisFunc) {
  if (cache[key]) {
    console.log(`Cache hit: ${key}`);
    return cache[key];
  }

  const result = await analysisFunc();
  cache[key] = result;

  // Persist to disk for cross-session caching
  Write(`${workDir}/cache/${key}.json`, JSON.stringify(result));

  return result;
}

Error Handling Strategies

Strategy: graceful_degradation

Purpose: Degrade gracefully on failure instead of crashing.

Implementation:

async function executeWithDegradation(primaryTask, fallbackTask) {
  try {
    return await primaryTask();
  } catch (error) {
    console.warn(`Primary task failed: ${error.message}, using fallback`);

    try {
      return await fallbackTask();
    } catch (fallbackError) {
      console.error(`Fallback also failed: ${fallbackError.message}`);
      return {
        status: 'degraded',
        partial_result: null,
        error: fallbackError.message
      };
    }
  }
}
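
Because the function returns a `degraded` object instead of throwing, callers have to check for it explicitly. The sketch below shows one way a caller might branch on that marker. It re-declares a compact version of `executeWithDegradation` (logging omitted) so it runs standalone; `analyzeOrSkip` and `fail` are hypothetical names.

```javascript
async function executeWithDegradation(primaryTask, fallbackTask) {
  try {
    return await primaryTask();
  } catch (error) {
    try {
      return await fallbackTask();
    } catch (fallbackError) {
      return { status: 'degraded', partial_result: null, error: fallbackError.message };
    }
  }
}

// Hypothetical caller: branch on the degraded marker instead of assuming success.
async function analyzeOrSkip(primaryTask, fallbackTask) {
  const result = await executeWithDegradation(primaryTask, fallbackTask);
  if (result && result.status === 'degraded') {
    return { used: 'none', detail: result.error };
  }
  return { used: 'task', detail: result };
}

// Helper that builds a task which always fails with the given message.
const fail = (msg) => async () => { throw new Error(msg); };

analyzeOrSkip(fail('primary down'), async () => 'fallback summary')
  .then(r => console.log(r)); // { used: 'task', detail: 'fallback summary' }
analyzeOrSkip(fail('primary down'), fail('fallback down'))
  .then(r => console.log(r)); // { used: 'none', detail: 'fallback down' }
```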

Strategy: structured_logging

Purpose: Add structured logging to make debugging easier.

Implementation:

function log(level, action, data) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    action,
    ...data
  };

  // Append to log file
  const logPath = `${workDir}/execution.log`;
  const existing = Read(logPath) || '';
  Write(logPath, existing + JSON.stringify(entry) + '\n');

  // Console output
  console.log(`[${level}] ${action}:`, JSON.stringify(data));
}

// Usage
log('INFO', 'phase_start', { phase: 'analyze', input_size: 1000 });
log('ERROR', 'agent_failure', { agent: 'universal-executor', error: err.message });

Output Quality Strategies

Strategy: quality_gates

Purpose: Run quality checks before producing output.

Implementation:

const qualityGates = [
  {
    name: 'completeness',
    check: (output) => output.sections?.length >= 3,
    message: 'Output must have at least 3 sections'
  },
  {
    name: 'format',
    check: (output) => /^#\s/.test(output.content),
    message: 'Output must start with markdown heading'
  },
  {
    name: 'length',
    check: (output) => output.content?.length >= 500,
    message: 'Output must be at least 500 characters'
  }
];

function validateOutput(output) {
  const failures = qualityGates
    .filter(gate => !gate.check(output))
    .map(gate => gate.message);

  if (failures.length > 0) {
    throw new Error(`Quality gate failures:\n${failures.join('\n')}`);
  }

  return true;
}
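
When a failed gate should trigger a retry or a revision rather than an exception, a non-throwing variant can be more convenient. The sketch below re-declares the three gates with null-safe checks and reports which ones failed; `evaluateGates` and the sample `draft` object are hypothetical.

```javascript
const qualityGates = [
  {
    name: 'completeness',
    check: (output) => (output.sections?.length ?? 0) >= 3,
    message: 'Output must have at least 3 sections'
  },
  {
    name: 'format',
    check: (output) => /^#\s/.test(output.content ?? ''),
    message: 'Output must start with markdown heading'
  },
  {
    name: 'length',
    check: (output) => (output.content?.length ?? 0) >= 500,
    message: 'Output must be at least 500 characters'
  }
];

// Non-throwing variant: report which gates failed so the caller can decide.
function evaluateGates(output) {
  const failed = qualityGates
    .filter(gate => !gate.check(output))
    .map(gate => gate.name);
  return { passed: failed.length === 0, failed };
}

// Sample draft: long enough and well formatted, but only two sections.
const draft = {
  sections: ['Intro', 'Body'],
  content: '# Report\n' + 'x'.repeat(600)
};
console.log(evaluateGates(draft)); // { passed: false, failed: [ 'completeness' ] }
```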

User Experience Strategies

Strategy: progress_tracking

Purpose: Show execution progress.

Implementation:

function updateProgress(current, total, description) {
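  // Clamp to 100 and use whole cells so the bar is always exactly 20 characters wide
  const percentage = Math.min(100, Math.round((current / total) * 100));
  const filled = Math.floor(percentage / 5);
  const progressBar = '█'.repeat(filled) + '░'.repeat(20 - filled);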

  console.log(`[${progressBar}] ${percentage}% - ${description}`);

  // Update state for UI
  updateState({
    progress: {
      current,
      total,
      percentage,
      description
    }
  });
}

// Usage
updateProgress(1, 5, 'Initializing tuning session...');
updateProgress(2, 5, 'Running context diagnosis...');

Strategy: interactive_checkpoints

Purpose: Pause at key points to get user confirmation.

Implementation:

async function checkpoint(name, summary, options) {
  console.log(`\n=== Checkpoint: ${name} ===`);
  console.log(summary);

  const response = await AskUserQuestion({
    questions: [{
      question: `Review ${name} results. How to proceed?`,
      header: 'Checkpoint',
      options: options || [
        { label: 'Continue', description: 'Proceed with next step' },
        { label: 'Modify', description: 'Adjust parameters and retry' },
        { label: 'Skip', description: 'Skip this step' },
        { label: 'Abort', description: 'Stop the workflow' }
      ],
      multiSelect: false
    }]
  });

  return response;
}
