fix: Add explicit JSON schema requirements to review cycle agent prompts

Move the specific JSON structure requirements out of cli-explore-agent (which stays generic)
and into the review-*-cycle.md prompts. Key requirements now inlined in the prompts:
- Root must be array [{}] not object {}
- analysis_timestamp field (not timestamp/analyzed_at)
- Flat summary structure (not nested by_severity)
- Lowercase severity/id values
- Correct field names (snippet not code_snippet, impact not exploit_scenario)
catlog22
2025-11-26 15:15:30 +08:00
parent cf6a0f1bc0
commit 34a9a23d5b
5 changed files with 120 additions and 106 deletions
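
For illustration only, here is a minimal dimension-results document that satisfies the five requirements listed above (field names taken from the commit message and the prompt diffs below; every value is invented):

```ts
// Hypothetical example — not from the repository. Field names follow the
// requirements above; all values are invented.
const dimensionResults = [                              // root is an array, not an object
  {
    dimension: "security",
    review_id: "rev-20251126-001",                      // invented id
    analysis_timestamp: "2025-11-26T15:15:30+08:00",    // not `timestamp` / `analyzed_at`
    summary: {                                          // flat, no nested `by_severity`
      total_findings: 1, critical: 1, high: 0, medium: 0, low: 0,
    },
    findings: [
      {
        id: "sec-001-a1b2c3d4",                         // lowercase id
        severity: "critical",                           // lowercase, not "Critical"
        snippet: "eval(userInput)",                     // `snippet`, not `code_snippet`
        impact: "Arbitrary code execution",             // `impact`, not `exploit_scenario`
      },
    ],
  },
];
```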

View File

@@ -29,10 +29,10 @@ Phase 1: Task Understanding
 ↓ Parse prompt for: analysis scope, output requirements, schema path
 Phase 2: Analysis Execution
 ↓ Bash structural scan + Gemini semantic analysis (based on mode)
-Phase 3: Schema Validation (if schema specified in prompt)
-↓ Read schema → Validate structure → Check field names
+Phase 3: Schema Validation (MANDATORY if schema specified)
+↓ Read schema → Extract EXACT field names → Validate structure
 Phase 4: Output Generation
-↓ Agent report + File output (if required)
+↓ Agent report + File output (strictly schema-compliant)
 ```
 ---
@@ -100,21 +100,34 @@ RULES: {from prompt, if template specified} | analysis=READ-ONLY
 ## Phase 3: Schema Validation
-**⚠️ MANDATORY if schema file specified in prompt**
-**Step 1**: `Read()` schema file specified in prompt
-**Step 2**: Extract requirements from schema:
-- Required fields and types
-- Field naming conventions
-- Enum values and constraints
-- Root structure (array vs object)
-**Step 3**: Validate before output:
-- Field names exactly match schema
-- Data types correct
-- All required fields present
-- Root structure correct
+### ⚠️ CRITICAL: Schema Compliance Protocol
+**This phase is MANDATORY when schema file is specified in prompt.**
+**Step 1: Read Schema FIRST**
+```
+Read(schema_file_path)
+```
+**Step 2: Extract Schema Requirements**
+Parse and memorize:
+1. **Root structure** - Is it array `[...]` or object `{...}`?
+2. **Required fields** - List all `"required": [...]` arrays
+3. **Field names EXACTLY** - Copy character-by-character (case-sensitive)
+4. **Enum values** - Copy exact strings (e.g., `"critical"` not `"Critical"`)
+5. **Nested structures** - Note flat vs nested requirements
+**Step 3: Pre-Output Validation Checklist**
+Before writing ANY JSON output, verify:
+- [ ] Root structure matches schema (array vs object)
+- [ ] ALL required fields present at each level
+- [ ] Field names EXACTLY match schema (character-by-character)
+- [ ] Enum values EXACTLY match schema (case-sensitive)
+- [ ] Nested structures follow schema pattern (flat vs nested)
+- [ ] Data types correct (string, integer, array, object)
 ---
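
The Step 3 checklist added in the hunk above could be mechanized roughly as follows. This is a minimal sketch assuming a plain JSON Schema object with `type`, `items`, `required`, `properties`, and `enum` keywords; the function name and error messages are illustrative, not part of the agent:

```ts
// Sketch of the pre-output validation checklist. Returns a list of
// violations; an empty list means the checklist passed.
function checkAgainstSchema(schema: any, output: unknown): string[] {
  const errors: string[] = [];
  // Root structure: array vs object
  if (schema.type === "array" && !Array.isArray(output)) {
    errors.push("root structure must be an array");
  }
  const itemSchema = schema.type === "array" ? schema.items : schema;
  const item: any = Array.isArray(output) ? output[0] : output;
  // Required fields present, names matched character-by-character
  for (const field of itemSchema?.required ?? []) {
    if (item == null || !(field in item)) {
      errors.push(`missing required field: ${field}`);
    }
  }
  // Enum values: exact, case-sensitive match
  for (const [name, prop] of Object.entries<any>(itemSchema?.properties ?? {})) {
    if (prop.enum && item?.[name] !== undefined && !prop.enum.includes(item[name])) {
      errors.push(`invalid enum value for ${name}: ${item[name]}`);
    }
  }
  return errors;
}
```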
@@ -129,11 +142,13 @@ Brief summary:
 ### File Output (as specified in prompt)
-**Extract from prompt**:
-- Output file path
-- Schema file path
-**⚠️ MANDATORY**: If schema specified, `Read()` schema before writing, strictly follow field names and structure.
+**⚠️ MANDATORY WORKFLOW**:
+1. `Read()` schema file BEFORE generating output
+2. Extract ALL field names from schema
+3. Build JSON using ONLY schema field names
+4. Validate against checklist before writing
+5. Write file with validated content
 ---
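
Read end to end, the five workflow steps above amount to something like this sketch (Node's `fs` is assumed for file access, and `checkAgainstSchema` is the hypothetical helper from the previous sketch):

```ts
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical end-to-end workflow: read schema → build → validate → write.
function writeSchemaCompliantOutput(
  schemaPath: string,
  outputPath: string,
  build: (schema: any) => unknown,   // builds JSON using ONLY schema field names
): void {
  const schema = JSON.parse(readFileSync(schemaPath, "utf8")); // 1. Read() schema first
  const output = build(schema);                                // 2-3. extract names, build JSON
  const errors = checkAgainstSchema(schema, output);           // 4. validate against checklist
  if (errors.length > 0) {
    throw new Error(`schema violations: ${errors.join("; ")}`);
  }
  writeFileSync(outputPath, JSON.stringify(output, null, 2));  // 5. write validated content
}
```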
@@ -150,23 +165,18 @@ Brief summary:
 ## Key Reminders
 **ALWAYS**:
-1. Understand task requirements from prompt
-2. If schema specified, `Read()` schema before writing
-3. Strictly follow schema field names and structure
-4. Include file:line references
-5. Attribute discovery source (bash/gemini)
+1. Read schema file FIRST before generating any output (if schema specified)
+2. Copy field names EXACTLY from schema (case-sensitive)
+3. Verify root structure matches schema (array vs object)
+4. Match nested/flat structures as schema requires
+5. Use exact enum values from schema (case-sensitive)
+6. Include ALL required fields at every level
+7. Include file:line references in findings
+8. Attribute discovery source (bash/gemini)
 **NEVER**:
 1. Modify any files (read-only agent)
-2. Skip schema validation (if specified)
-3. Use wrong field names
-4. Mix case (e.g., severity must be lowercase)
-5. Omit required fields
-**COMMON FIELD ERRORS**:
-| Wrong | Correct |
-|-------|---------|
-| `"severity": "Critical"` | `"severity": "critical"` |
-| `"code_snippet"` | `"snippet"` |
-| `"timestamp"` | `"analysis_timestamp"` |
-| `{ "dimension": ... }` | `[{ "dimension": ... }]` |
-| `"by_severity": {...}` | `"critical": 2, "high": 4` |
+2. Skip schema reading step when schema is specified
+3. Guess field names - ALWAYS copy from schema
+4. Assume structure - ALWAYS verify against schema

View File

@@ -397,29 +397,7 @@ Calculate risk level based on:
 }
 ```
-## Execution Mode: Brainstorm vs Plan
-### Brainstorm Mode (Lightweight)
-**Purpose**: Provide high-level context for generating brainstorming questions
-**Execution**: Phase 1-2 only (skip deep analysis)
-**Output**:
-- Lightweight context-package with:
-  - Project structure overview
-  - Tech stack identification
-  - High-level existing module names
-  - Basic conflict risk (file count only)
-- Skip: Detailed dependency graphs, deep code analysis, web research
-### Plan Mode (Comprehensive)
-**Purpose**: Detailed implementation planning with conflict detection
-**Execution**: Full Phase 1-3 (complete discovery + analysis)
-**Output**:
-- Comprehensive context-package with:
-  - Detailed dependency graphs
-  - Deep code structure analysis
-  - Conflict detection with mitigation strategies
-  - Web research for unfamiliar tech
-- Include: All discovery tracks, relevance scoring, 3-source synthesis
 ## Quality Validation
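
As a rough sketch of the two context-package shapes the removed block described (interface and field names are invented, derived only from the bullets above):

```ts
// Hypothetical shapes for the two context-package variants.
interface BrainstormContextPackage {
  projectStructure: string[];          // project structure overview
  techStack: string[];                 // tech stack identification
  existingModules: string[];           // high-level module names only
  conflictRisk: { fileCount: number }; // basic risk: file count only
}

// Plan mode extends the lightweight package with deep analysis results.
interface PlanContextPackage extends BrainstormContextPackage {
  dependencyGraph: Record<string, string[]>;         // detailed dependency graphs
  conflicts: { file: string; mitigation: string }[]; // conflict detection + mitigation
  webResearch: string[];                             // notes on unfamiliar tech
}
```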

View File

@@ -397,23 +397,3 @@ function detect_framework_from_config() {
 - ✅ All missing tests catalogued with priority
 - ✅ Execution time < 30 seconds (< 60s for large codebases)
-## Integration Points
-### Called By
-- `/workflow:tools:test-context-gather` - Orchestrator command
-### Calls
-- Code-Index MCP tools (preferred)
-- ripgrep/find (fallback)
-- Bash file operations
-### Followed By
-- `/workflow:tools:test-concept-enhanced` - Test generation analysis
-## Notes
-- **Detection-first**: Always check for existing test-context-package before analysis
-- **Code-Index priority**: Use MCP tools when available, fallback to CLI
-- **Framework agnostic**: Supports Jest, Mocha, pytest, RSpec, etc.
-- **Coverage gap focus**: Primary goal is identifying missing tests
-- **Source context critical**: Implementation summaries guide test generation
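
The "Code-Index priority" note in this removed block describes a try-preferred-then-fallback pattern; a minimal sketch (both search wrappers are hypothetical):

```ts
// Hypothetical wrappers around Code-Index MCP tools and ripgrep/find.
declare function codeIndexSearch(pattern: string): Promise<string[]>;
declare function ripgrepSearch(pattern: string): Promise<string[]>;

async function findFiles(pattern: string): Promise<string[]> {
  try {
    return await codeIndexSearch(pattern); // Code-Index MCP tools (preferred)
  } catch {
    return await ripgrepSearch(pattern);   // ripgrep/find (fallback)
  }
}
```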

View File

@@ -438,19 +438,35 @@ Task(
 - Context Pattern: ${targetFiles.map(f => `@${f}`).join(' ')}
 ## Expected Deliverables
-**MANDATORY**: Before generating any JSON output, read the template example first:
+**MANDATORY**: Before generating any JSON output, read the schema first:
 - Read: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
-- Follow the exact structure and field naming from the example
+- Extract and follow EXACT field names from schema
 1. Dimension Results JSON: ${outputDir}/dimensions/${dimension}.json
-- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
-- MUST include: findings array with severity, file, line, description, recommendation
-- MUST include: summary statistics (total findings, severity distribution)
-- MUST include: cross_references to related findings
+**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
+Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
+Required top-level fields:
+- dimension, review_id, analysis_timestamp (NOT timestamp/analyzed_at)
+- cli_tool_used (gemini|qwen|codex), model, analysis_duration_ms
+- summary (FLAT structure), findings, cross_references
+Summary MUST be FLAT (NOT nested by_severity):
+\`{ "total_findings": N, "critical": N, "high": N, "medium": N, "low": N, "files_analyzed": N, "lines_reviewed": N }\`
+Finding required fields:
+- id: format \`{dim}-{seq}-{uuid8}\` e.g., \`sec-001-a1b2c3d4\` (lowercase)
+- severity: lowercase only (critical|high|medium|low)
+- snippet (NOT code_snippet), impact (NOT exploit_scenario)
+- metadata, iteration (0), status (pending_remediation), cross_references
 2. Analysis Report: ${outputDir}/reports/${dimension}-analysis.md
 - Human-readable summary with recommendations
 - Grouped by severity: critical → high → medium → low
 - Include file:line references for all findings
 3. CLI Output Log: ${outputDir}/reports/${dimension}-cli-output.txt
 - Raw CLI tool output for debugging
 - Include full analysis text
@@ -512,17 +528,24 @@ Task(
 - Mode: analysis (READ-ONLY)
 ## Expected Deliverables
-**MANDATORY**: Before generating any JSON output, read the template example first:
+**MANDATORY**: Before generating any JSON output, read the schema first:
 - Read: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
-- Follow the exact structure and field naming from the example
+- Extract and follow EXACT field names from schema
 1. Deep-Dive Results JSON: ${outputDir}/iterations/iteration-${iteration}-finding-${findingId}.json
-- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
-- MUST include: root_cause with summary, details, affected_scope, similar_patterns
-- MUST include: remediation_plan with approach, steps[], estimated_effort, risk_level
-- MUST include: impact_assessment with files_affected, tests_required, breaking_changes
-- MUST include: reassessed_severity with severity_change_reason
-- MUST include: confidence_score (0.0-1.0)
+**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
+Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
+Required top-level fields:
+- finding_id, dimension, iteration, analysis_timestamp
+- cli_tool_used, model, analysis_duration_ms
+- original_finding, root_cause, remediation_plan
+- impact_assessment, reassessed_severity, confidence_score, cross_references
+All nested objects must follow schema exactly - read schema for field names
 2. Analysis Report: ${outputDir}/reports/deep-dive-${iteration}-${findingId}.md
 - Detailed root cause analysis
 - Step-by-step remediation plan
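
A skeleton of a deep-dive result with the required top-level fields from the hunk above (all values invented; the nested object shapes must still be read from the schema itself):

```ts
// Hypothetical deep-dive result skeleton — nested objects left empty on purpose.
const deepDiveResults = [                 // root is an array, per the requirements above
  {
    finding_id: "sec-001-a1b2c3d4",
    dimension: "security",
    iteration: 1,
    analysis_timestamp: "2025-11-26T15:15:30+08:00",
    cli_tool_used: "gemini",
    model: "example-model",
    analysis_duration_ms: 1234,
    original_finding: {},
    root_cause: {},
    remediation_plan: {},
    impact_assessment: {},
    reassessed_severity: {},
    confidence_score: 0.8,                // 0.0-1.0 per the prior template notes
    cross_references: [],
  },
];
```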

View File

@@ -442,19 +442,35 @@ Task(
 - Mode: analysis (READ-ONLY)
 ## Expected Deliverables
-**MANDATORY**: Before generating any JSON output, read the template example first:
+**MANDATORY**: Before generating any JSON output, read the schema first:
 - Read: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
-- Follow the exact structure and field naming from the example
+- Extract and follow EXACT field names from schema
 1. Dimension Results JSON: ${outputDir}/dimensions/${dimension}.json
-- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
-- MUST include: findings array with severity, file, line, description, recommendation
-- MUST include: summary statistics (total findings, severity distribution)
-- MUST include: cross_references to related findings
+**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
+Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
+Required top-level fields:
+- dimension, review_id, analysis_timestamp (NOT timestamp/analyzed_at)
+- cli_tool_used (gemini|qwen|codex), model, analysis_duration_ms
+- summary (FLAT structure), findings, cross_references
+Summary MUST be FLAT (NOT nested by_severity):
+\`{ "total_findings": N, "critical": N, "high": N, "medium": N, "low": N, "files_analyzed": N, "lines_reviewed": N }\`
+Finding required fields:
+- id: format \`{dim}-{seq}-{uuid8}\` e.g., \`sec-001-a1b2c3d4\` (lowercase)
+- severity: lowercase only (critical|high|medium|low)
+- snippet (NOT code_snippet), impact (NOT exploit_scenario)
+- metadata, iteration (0), status (pending_remediation), cross_references
 2. Analysis Report: ${outputDir}/reports/${dimension}-analysis.md
 - Human-readable summary with recommendations
 - Grouped by severity: critical → high → medium → low
 - Include file:line references for all findings
 3. CLI Output Log: ${outputDir}/reports/${dimension}-cli-output.txt
 - Raw CLI tool output for debugging
 - Include full analysis text
@@ -517,17 +533,24 @@ Task(
 - Mode: analysis (READ-ONLY)
 ## Expected Deliverables
-**MANDATORY**: Before generating any JSON output, read the template example first:
+**MANDATORY**: Before generating any JSON output, read the schema first:
 - Read: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
-- Follow the exact structure and field naming from the example
+- Extract and follow EXACT field names from schema
 1. Deep-Dive Results JSON: ${outputDir}/iterations/iteration-${iteration}-finding-${findingId}.json
-- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
-- MUST include: root_cause with summary, details, affected_scope, similar_patterns
-- MUST include: remediation_plan with approach, steps[], estimated_effort, risk_level
-- MUST include: impact_assessment with files_affected, tests_required, breaking_changes
-- MUST include: reassessed_severity with severity_change_reason
-- MUST include: confidence_score (0.0-1.0)
+**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
+Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
+Required top-level fields:
+- finding_id, dimension, iteration, analysis_timestamp
+- cli_tool_used, model, analysis_duration_ms
+- original_finding, root_cause, remediation_plan
+- impact_assessment, reassessed_severity, confidence_score, cross_references
+All nested objects must follow schema exactly - read schema for field names
 2. Analysis Report: ${outputDir}/reports/deep-dive-${iteration}-${findingId}.md
 - Detailed root cause analysis
 - Step-by-step remediation plan