fix: Add explicit JSON schema requirements to review cycle agent prompts

Move specific JSON structure requirements from cli-explore-agent (keep generic)
to review-*-cycle.md prompts. Key requirements now inline in prompts:
- Root must be array [{}] not object {}
- analysis_timestamp field (not timestamp/analyzed_at)
- Flat summary structure (not nested by_severity)
- Lowercase severity/id values
- Correct field names (snippet not code_snippet, impact not exploit_scenario)
catlog22
2025-11-26 15:15:30 +08:00
parent cf6a0f1bc0
commit 34a9a23d5b
5 changed files with 120 additions and 106 deletions

View File

@@ -29,10 +29,10 @@ Phase 1: Task Understanding
↓ Parse prompt for: analysis scope, output requirements, schema path
Phase 2: Analysis Execution
↓ Bash structural scan + Gemini semantic analysis (based on mode)
Phase 3: Schema Validation (if schema specified in prompt)
↓ Read schema → Validate structure → Check field names
Phase 3: Schema Validation (MANDATORY if schema specified)
↓ Read schema → Extract EXACT field names → Validate structure
Phase 4: Output Generation
↓ Agent report + File output (if required)
↓ Agent report + File output (strictly schema-compliant)
```
---
@@ -100,21 +100,34 @@ RULES: {from prompt, if template specified} | analysis=READ-ONLY
## Phase 3: Schema Validation
**⚠️ MANDATORY if schema file specified in prompt**
### ⚠️ CRITICAL: Schema Compliance Protocol
**Step 1**: `Read()` schema file specified in prompt
**This phase is MANDATORY when schema file is specified in prompt.**
**Step 2**: Extract requirements from schema:
- Required fields and types
- Field naming conventions
- Enum values and constraints
- Root structure (array vs object)
**Step 1: Read Schema FIRST**
```
Read(schema_file_path)
```
**Step 3**: Validate before output:
- Field names exactly match schema
- Data types correct
- All required fields present
- Root structure correct
**Step 2: Extract Schema Requirements**
Parse and memorize:
1. **Root structure** - Is it array `[...]` or object `{...}`?
2. **Required fields** - List all `"required": [...]` arrays
3. **Field names EXACTLY** - Copy character-by-character (case-sensitive)
4. **Enum values** - Copy exact strings (e.g., `"critical"` not `"Critical"`)
5. **Nested structures** - Note flat vs nested requirements
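For illustration only, a minimal hypothetical schema fragment — the field names `report_id`, `created_at`, and `severity` are invented for this sketch, not taken from any real schema file — showing where each of the five items above lives:
```
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "type": "object",
    "required": ["report_id", "created_at", "severity"],
    "properties": {
      "report_id": { "type": "string" },
      "created_at": { "type": "string", "format": "date-time" },
      "severity": { "type": "string", "enum": ["critical", "high", "medium", "low"] }
    }
  }
}
```
Here the root structure is an array, `required` names three fields, the exact field names are `report_id` / `created_at` / `severity`, the enum values are lowercase, and each item is a flat object.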
**Step 3: Pre-Output Validation Checklist**
Before writing ANY JSON output, verify:
- [ ] Root structure matches schema (array vs object)
- [ ] ALL required fields present at each level
- [ ] Field names EXACTLY match schema (character-by-character)
- [ ] Enum values EXACTLY match schema (case-sensitive)
- [ ] Nested structures follow schema pattern (flat vs nested)
- [ ] Data types correct (string, integer, array, object)
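Continuing the hypothetical schema sketched above, a fragment that would pass every item on this checklist (all values are placeholders): array root, all required fields present, exact field names, lowercase enum value, correct types.
```
[
  {
    "report_id": "rpt-001",
    "created_at": "2025-11-26T07:15:30Z",
    "severity": "high"
  }
]
```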
---
@@ -129,11 +142,13 @@ Brief summary:
### File Output (as specified in prompt)
**Extract from prompt**:
- Output file path
- Schema file path
**⚠️ MANDATORY WORKFLOW**:
**⚠️ MANDATORY**: If a schema is specified, `Read()` it before writing and strictly follow its field names and structure.
1. `Read()` schema file BEFORE generating output
2. Extract ALL field names from schema
3. Build JSON using ONLY schema field names
4. Validate against checklist before writing
5. Write file with validated content
---
@@ -150,23 +165,18 @@ Brief summary:
## Key Reminders
**ALWAYS**:
1. Understand task requirements from prompt
2. If schema specified, `Read()` schema before writing
3. Strictly follow schema field names and structure
4. Include file:line references
5. Attribute discovery source (bash/gemini)
1. Read schema file FIRST before generating any output (if schema specified)
2. Copy field names EXACTLY from schema (case-sensitive)
3. Verify root structure matches schema (array vs object)
4. Match nested/flat structures as schema requires
5. Use exact enum values from schema (case-sensitive)
6. Include ALL required fields at every level
7. Include file:line references in findings
8. Attribute discovery source (bash/gemini)
**NEVER**:
1. Modify any files (read-only agent)
2. Skip schema validation (if specified)
3. Use wrong field names
4. Mix case (e.g., severity must be lowercase)
**COMMON FIELD ERRORS**:
| Wrong | Correct |
|-------|---------|
| `"severity": "Critical"` | `"severity": "critical"` |
| `"code_snippet"` | `"snippet"` |
| `"timestamp"` | `"analysis_timestamp"` |
| `{ "dimension": ... }` | `[{ "dimension": ... }]` |
| `"by_severity": {...}` | `"critical": 2, "high": 4` |
2. Skip schema reading step when schema is specified
3. Guess field names - ALWAYS copy from schema
4. Assume structure - ALWAYS verify against schema
5. Omit required fields

View File

@@ -397,29 +397,7 @@ Calculate risk level based on:
}
```
## Execution Mode: Brainstorm vs Plan
### Brainstorm Mode (Lightweight)
**Purpose**: Provide high-level context for generating brainstorming questions
**Execution**: Phase 1-2 only (skip deep analysis)
**Output**:
- Lightweight context-package with:
- Project structure overview
- Tech stack identification
- High-level existing module names
- Basic conflict risk (file count only)
- Skip: Detailed dependency graphs, deep code analysis, web research
### Plan Mode (Comprehensive)
**Purpose**: Detailed implementation planning with conflict detection
**Execution**: Full Phase 1-3 (complete discovery + analysis)
**Output**:
- Comprehensive context-package with:
- Detailed dependency graphs
- Deep code structure analysis
- Conflict detection with mitigation strategies
- Web research for unfamiliar tech
- Include: All discovery tracks, relevance scoring, 3-source synthesis
## Quality Validation

View File

@@ -397,23 +397,3 @@ function detect_framework_from_config() {
- ✅ All missing tests catalogued with priority
- ✅ Execution time < 30 seconds (< 60s for large codebases)
## Integration Points
### Called By
- `/workflow:tools:test-context-gather` - Orchestrator command
### Calls
- Code-Index MCP tools (preferred)
- ripgrep/find (fallback)
- Bash file operations
### Followed By
- `/workflow:tools:test-concept-enhanced` - Test generation analysis
## Notes
- **Detection-first**: Always check for existing test-context-package before analysis
- **Code-Index priority**: Use MCP tools when available, fallback to CLI
- **Framework agnostic**: Supports Jest, Mocha, pytest, RSpec, etc.
- **Coverage gap focus**: Primary goal is identifying missing tests
- **Source context critical**: Implementation summaries guide test generation

View File

@@ -438,19 +438,35 @@ Task(
- Context Pattern: ${targetFiles.map(f => `@${f}`).join(' ')}
## Expected Deliverables
**MANDATORY**: Before generating any JSON output, read the template example first:
**MANDATORY**: Before generating any JSON output, read the schema first:
- Read: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
- Follow the exact structure and field naming from the example
- Extract and follow EXACT field names from schema
1. Dimension Results JSON: ${outputDir}/dimensions/${dimension}.json
- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
- MUST include: findings array with severity, file, line, description, recommendation
- MUST include: summary statistics (total findings, severity distribution)
- MUST include: cross_references to related findings
**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
Required top-level fields:
- dimension, review_id, analysis_timestamp (NOT timestamp/analyzed_at)
- cli_tool_used (gemini|qwen|codex), model, analysis_duration_ms
- summary (FLAT structure), findings, cross_references
Summary MUST be FLAT (NOT nested by_severity):
\`{ "total_findings": N, "critical": N, "high": N, "medium": N, "low": N, "files_analyzed": N, "lines_reviewed": N }\`
Finding required fields:
- id: format \`{dim}-{seq}-{uuid8}\` e.g., \`sec-001-a1b2c3d4\` (lowercase)
- severity: lowercase only (critical|high|medium|low)
- snippet (NOT code_snippet), impact (NOT exploit_scenario)
- metadata, iteration (0), status (pending_remediation), cross_references
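For orientation, a minimal sketch of a compliant dimension-results file under the requirements above. All values are placeholders, `metadata` is left empty, and the schema file itself remains the authoritative source for field names and types:
```
[
  {
    "dimension": "security",
    "review_id": "rev-001",
    "analysis_timestamp": "2025-11-26T07:15:30Z",
    "cli_tool_used": "gemini",
    "model": "placeholder-model",
    "analysis_duration_ms": 42000,
    "summary": {
      "total_findings": 1,
      "critical": 0,
      "high": 1,
      "medium": 0,
      "low": 0,
      "files_analyzed": 12,
      "lines_reviewed": 3400
    },
    "findings": [
      {
        "id": "sec-001-a1b2c3d4",
        "severity": "high",
        "file": "src/example/file.ts",
        "line": 42,
        "description": "Placeholder description of the finding",
        "recommendation": "Placeholder recommendation",
        "snippet": "placeholder code excerpt",
        "impact": "Placeholder impact statement",
        "metadata": {},
        "iteration": 0,
        "status": "pending_remediation",
        "cross_references": []
      }
    ],
    "cross_references": []
  }
]
```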
2. Analysis Report: ${outputDir}/reports/${dimension}-analysis.md
- Human-readable summary with recommendations
- Grouped by severity: critical → high → medium → low
- Include file:line references for all findings
3. CLI Output Log: ${outputDir}/reports/${dimension}-cli-output.txt
- Raw CLI tool output for debugging
- Include full analysis text
@@ -512,17 +528,24 @@ Task(
- Mode: analysis (READ-ONLY)
## Expected Deliverables
**MANDATORY**: Before generating any JSON output, read the template example first:
**MANDATORY**: Before generating any JSON output, read the schema first:
- Read: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
- Follow the exact structure and field naming from the example
- Extract and follow EXACT field names from schema
1. Deep-Dive Results JSON: ${outputDir}/iterations/iteration-${iteration}-finding-${findingId}.json
- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
- MUST include: root_cause with summary, details, affected_scope, similar_patterns
- MUST include: remediation_plan with approach, steps[], estimated_effort, risk_level
- MUST include: impact_assessment with files_affected, tests_required, breaking_changes
- MUST include: reassessed_severity with severity_change_reason
- MUST include: confidence_score (0.0-1.0)
**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
Required top-level fields:
- finding_id, dimension, iteration, analysis_timestamp
- cli_tool_used, model, analysis_duration_ms
- original_finding, root_cause, remediation_plan
- impact_assessment, reassessed_severity, confidence_score, cross_references
All nested objects must follow schema exactly - read schema for field names
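A minimal sketch of a compliant deep-dive result under the requirements above. The nested field names follow the descriptions earlier in this deliverable; the value types (arrays, boolean, the `severity` key inside `reassessed_severity`, the empty `original_finding`) are guesses, all values are placeholders, and the schema file remains authoritative:
```
[
  {
    "finding_id": "sec-001-a1b2c3d4",
    "dimension": "security",
    "iteration": 1,
    "analysis_timestamp": "2025-11-26T07:15:30Z",
    "cli_tool_used": "gemini",
    "model": "placeholder-model",
    "analysis_duration_ms": 42000,
    "original_finding": {},
    "root_cause": {
      "summary": "Placeholder summary",
      "details": "Placeholder details",
      "affected_scope": "placeholder scope",
      "similar_patterns": []
    },
    "remediation_plan": {
      "approach": "Placeholder approach",
      "steps": ["Placeholder step 1", "Placeholder step 2"],
      "estimated_effort": "placeholder estimate",
      "risk_level": "low"
    },
    "impact_assessment": {
      "files_affected": [],
      "tests_required": [],
      "breaking_changes": false
    },
    "reassessed_severity": {
      "severity": "medium",
      "severity_change_reason": "Placeholder reason"
    },
    "confidence_score": 0.8,
    "cross_references": []
  }
]
```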
2. Analysis Report: ${outputDir}/reports/deep-dive-${iteration}-${findingId}.md
- Detailed root cause analysis
- Step-by-step remediation plan

View File

@@ -442,19 +442,35 @@ Task(
- Mode: analysis (READ-ONLY)
## Expected Deliverables
**MANDATORY**: Before generating any JSON output, read the template example first:
**MANDATORY**: Before generating any JSON output, read the schema first:
- Read: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
- Follow the exact structure and field naming from the example
- Extract and follow EXACT field names from schema
1. Dimension Results JSON: ${outputDir}/dimensions/${dimension}.json
- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-dimension-results-schema.json
- MUST include: findings array with severity, file, line, description, recommendation
- MUST include: summary statistics (total findings, severity distribution)
- MUST include: cross_references to related findings
**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
Required top-level fields:
- dimension, review_id, analysis_timestamp (NOT timestamp/analyzed_at)
- cli_tool_used (gemini|qwen|codex), model, analysis_duration_ms
- summary (FLAT structure), findings, cross_references
Summary MUST be FLAT (NOT nested by_severity):
\`{ "total_findings": N, "critical": N, "high": N, "medium": N, "low": N, "files_analyzed": N, "lines_reviewed": N }\`
Finding required fields:
- id: format \`{dim}-{seq}-{uuid8}\` e.g., \`sec-001-a1b2c3d4\` (lowercase)
- severity: lowercase only (critical|high|medium|low)
- snippet (NOT code_snippet), impact (NOT exploit_scenario)
- metadata, iteration (0), status (pending_remediation), cross_references
2. Analysis Report: ${outputDir}/reports/${dimension}-analysis.md
- Human-readable summary with recommendations
- Grouped by severity: critical → high → medium → low
- Include file:line references for all findings
3. CLI Output Log: ${outputDir}/reports/${dimension}-cli-output.txt
- Raw CLI tool output for debugging
- Include full analysis text
@@ -517,17 +533,24 @@ Task(
- Mode: analysis (READ-ONLY)
## Expected Deliverables
**MANDATORY**: Before generating any JSON output, read the template example first:
**MANDATORY**: Before generating any JSON output, read the schema first:
- Read: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
- Follow the exact structure and field naming from the example
- Extract and follow EXACT field names from schema
1. Deep-Dive Results JSON: ${outputDir}/iterations/iteration-${iteration}-finding-${findingId}.json
- MUST follow example template: ~/.claude/workflows/cli-templates/schemas/review-deep-dive-results-schema.json
- MUST include: root_cause with summary, details, affected_scope, similar_patterns
- MUST include: remediation_plan with approach, steps[], estimated_effort, risk_level
- MUST include: impact_assessment with files_affected, tests_required, breaking_changes
- MUST include: reassessed_severity with severity_change_reason
- MUST include: confidence_score (0.0-1.0)
**⚠️ CRITICAL JSON STRUCTURE REQUIREMENTS**:
Root structure MUST be array: \`[{ ... }]\` NOT \`{ ... }\`
Required top-level fields:
- finding_id, dimension, iteration, analysis_timestamp
- cli_tool_used, model, analysis_duration_ms
- original_finding, root_cause, remediation_plan
- impact_assessment, reassessed_severity, confidence_score, cross_references
All nested objects must follow schema exactly - read schema for field names
2. Analysis Report: ${outputDir}/reports/deep-dive-${iteration}-${findingId}.md
- Detailed root cause analysis
- Step-by-step remediation plan