feat: add json_builder tool with schema-aware JSON construction and validation

Unified tool replacing manual schema reading (cat schema) across all agents and skills. Supports 5 commands: init (skeleton), set (incremental field setting with instant validation), validate (full structural + semantic), merge (dedup multiple JSONs), info (compact schema summary). Registers 24 schemas in schema-registry.ts. Updates all agent/skill/command files to use json_builder info/init/validate instead of cat schema references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 20:11:04 +08:00 · 2026-03-29 13:37:41 +08:00
parent bbceef3d36
commit 21a6d29701
16 changed files with 1145 additions and 110 deletions
--- a/.claude/agents/action-planning-agent.md
+++ b/.claude/agents/action-planning-agent.md
@@ -829,6 +829,12 @@ Generate at `.workflow/active/{session_id}/plan.json` following `plan-overview-b

 **Generation Timing**: After all `.task/IMPL-*.json` files are generated, aggregate into plan.json.

+**Validation**: After writing plan.json and task files, validate with json_builder:
+```bash
+ccw tool exec json_builder '{"cmd":"validate","target":"<session>/plan.json","schema":"plan"}'
+ccw tool exec json_builder '{"cmd":"validate","target":"<session>/.task/IMPL-001.json","schema":"task"}'
+```
+
 ### 2.3 IMPL_PLAN.md Structure

 **Template-Based Generation**:
--- a/.claude/agents/cli-explore-agent.md
+++ b/.claude/agents/cli-explore-agent.md
@@ -5,6 +5,7 @@ description: |
  Orchestrates 4-phase workflow: Task Understanding → Analysis Execution → Schema Validation → Output Generation.
  Spawned by /explore command orchestrator.
 tools: Read, Bash, Glob, Grep
+# json_builder available via: ccw tool exec json_builder '{"cmd":"..."}' (Bash)
 color: yellow
 ---

@@ -66,9 +67,9 @@ Phase 4: Output Generation
   Store result as `project_structure` for module-aware file discovery in Phase 2.

 2. **Output Schema Loading** (if output file path specified in prompt):
-   - Exploration output → `cat ~/.ccw/workflows/cli-templates/schemas/explore-json-schema.json`
-   - Other schemas as specified in prompt
-   Read and memorize schema requirements BEFORE any analysis begins (feeds Phase 3 validation).
+   - Get schema summary: `ccw tool exec json_builder '{"cmd":"info","schema":"explore"}'` (or "diagnosis" for bug analysis)
+   - Initialize output file: `ccw tool exec json_builder '{"cmd":"init","schema":"explore","output":"<output_path>"}'`
+   - The tool returns requiredFields, arrayFields, and enumFields — memorize these for Phase 2.

 3. **Project Context Loading** (from spec system):
   - Load exploration specs using: `ccw spec load --category exploration`
@@ -150,55 +151,56 @@ RULES: {from prompt, if template specified} | analysis=READ-ONLY
 ---

 <schema_validation>
-## Phase 3: Schema Validation
+## Phase 3: Incremental Build & Validation (via json_builder)

-### CRITICAL: Schema Compliance Protocol
+**This phase replaces manual JSON writing + self-validation with tool-assisted construction.**

-**This phase is MANDATORY when schema file is specified in prompt.**
-
-**Step 1: Read Schema FIRST**
-```
-Read(schema_file_path)
+**Step 1: Set text fields** (discovered during Phase 2 analysis)
+```bash
+ccw tool exec json_builder '{"cmd":"set","target":"<output_path>","ops":[
+  {"path":"project_structure","value":"..."},
+  {"path":"patterns","value":"..."},
+  {"path":"dependencies","value":"..."},
+  {"path":"integration_points","value":"..."},
+  {"path":"constraints","value":"..."}
+]}'
 ```

-**Step 2: Extract Schema Requirements**
+**Step 2: Append file entries** (as discovered — one `set` per batch)
+```bash
+ccw tool exec json_builder '{"cmd":"set","target":"<output_path>","ops":[
+  {"path":"relevant_files[+]","value":{"path":"src/auth.ts","relevance":0.9,"rationale":"Contains AuthService.login() entry point for JWT generation","role":"modify_target","discovery_source":"bash-scan","key_code":[{"symbol":"login()","location":"L45-78","description":"JWT token generation with bcrypt verification"}],"topic_relation":"Security target — JWT generation lacks token rotation"}},
+  {"path":"relevant_files[+]","value":{...}}
+]}'
+```

-Parse and memorize:
-1. **Root structure** - Is it array `[...]` or object `{...}`?
-2. **Required fields** - List all `"required": [...]` arrays
-3. **Field names EXACTLY** - Copy character-by-character (case-sensitive)
-4. **Enum values** - Copy exact strings (e.g., `"critical"` not `"Critical"`)
-5. **Nested structures** - Note flat vs nested requirements
+The tool **automatically validates** each operation:
+- enum values (role, discovery_source) → rejects invalid
+- minLength (rationale >= 10) → rejects too short
+- type checking → rejects wrong types

-**Step 3: File Rationale Validation** (MANDATORY for relevant_files / affected_files)
+**Step 3: Set metadata**
+```bash
+ccw tool exec json_builder '{"cmd":"set","target":"<output_path>","ops":[
+  {"path":"_metadata.timestamp","value":"auto"},
+  {"path":"_metadata.task_description","value":"..."},
+  {"path":"_metadata.source","value":"cli-explore-agent"},
+  {"path":"_metadata.exploration_angle","value":"..."},
+  {"path":"_metadata.exploration_index","value":1},
+  {"path":"_metadata.total_explorations","value":2}
+]}'
+```

-Every file entry MUST have:
- `rationale` (required, minLength 10): Specific reason tied to the exploration topic, NOT generic
-  - GOOD: "Contains AuthService.login() which is the entry point for JWT token generation"
-  - BAD: "Related to auth" or "Relevant file"
- `role` (required, enum): Structural classification of why it was selected
- `discovery_source` (optional but recommended): How the file was found
- `key_code` (strongly recommended for relevance >= 0.7): Array of {symbol, location?, description}
-  - GOOD: [{"symbol": "AuthService.login()", "location": "L45-L78", "description": "JWT token generation with bcrypt verification, returns token pair"}]
-  - BAD: [{"symbol": "login", "description": "login function"}]
- `topic_relation` (strongly recommended for relevance >= 0.7): Connection from exploration angle perspective
-  - GOOD: "Security exploration targets this file because JWT generation lacks token rotation"
-  - BAD: "Related to security"
+**Step 4: Final validation**
+```bash
+ccw tool exec json_builder '{"cmd":"validate","target":"<output_path>"}'
+```
+Returns `{valid, errors, warnings, stats}`. If errors exist → fix with `set` → re-validate.

-**Step 4: Pre-Output Validation Checklist**
-
-Before writing ANY JSON output, verify:
-
- [ ] Root structure matches schema (array vs object)
- [ ] ALL required fields present at each level
- [ ] Field names EXACTLY match schema (character-by-character)
- [ ] Enum values EXACTLY match schema (case-sensitive)
- [ ] Nested structures follow schema pattern (flat vs nested)
- [ ] Data types correct (string, integer, array, object)
- [ ] Every file in relevant_files has: path + relevance + rationale + role
- [ ] Every rationale is specific (>10 chars, not generic)
- [ ] Files with relevance >= 0.7 have key_code with symbol + description (minLength 10)
- [ ] Files with relevance >= 0.7 have topic_relation explaining connection to angle (minLength 15)
+**Quality reminders** (enforced by tool, but be aware):
+- `rationale`: Must be specific, not generic ("Related to auth" → rejected by semantic check)
+- `key_code`: Strongly recommended for relevance >= 0.7 (warnings if missing)
+- `topic_relation`: Strongly recommended for relevance >= 0.7 (warnings if missing)
 </schema_validation>

 ---
@@ -212,16 +214,12 @@ Brief summary:
 - Task completion status
 - Key findings summary
 - Generated file paths (if any)
+- Validation result (from Phase 3 Step 4)

-### File Output (as specified in prompt)
+### File Output

-**MANDATORY WORKFLOW**:
-
-1. `Read()` schema file BEFORE generating output
-2. Extract ALL field names from schema
-3. Build JSON using ONLY schema field names
-4. Validate against checklist before writing
-5. Write file with validated content
+File is already written by json_builder during Phase 3 (init + set operations).
+Phase 4 only verifies the final validation passed and returns the summary.
 </output_generation>

 ---
@@ -243,28 +241,19 @@ Brief summary:

 **ALWAYS**:
 1. **Search Tool Priority**: ACE (`mcp__ace-tool__search_context`) → CCW (`mcp__ccw-tools__smart_search`) / Built-in (`Grep`, `Glob`, `Read`)
-2. Read schema file FIRST before generating any output (if schema specified)
-3. Copy field names EXACTLY from schema (case-sensitive)
-4. Verify root structure matches schema (array vs object)
-5. Match nested/flat structures as schema requires
-6. Use exact enum values from schema (case-sensitive)
-7. Include ALL required fields at every level
-8. Include file:line references in findings
-9. **Every file MUST have rationale**: Specific selection basis tied to the topic (not generic)
-10. **Every file MUST have role**: Classify as modify_target/dependency/pattern_reference/test_target/type_definition/integration_point/config/context_only
-11. **Track discovery source**: Record how each file was found (bash-scan/cli-analysis/ace-search/dependency-trace/manual)
-12. **Populate key_code for high-relevance files**: relevance >= 0.7 → key_code array with symbol, location, description
-13. **Populate topic_relation for high-relevance files**: relevance >= 0.7 → topic_relation explaining file-to-angle connection
+2. **Use json_builder** for all JSON output: `init` → `set` (incremental) → `validate`
+3. Include file:line references in findings
+4. **Every file MUST have rationale + role** (enforced by json_builder set validation)
+5. **Track discovery source**: Record how each file was found (bash-scan/cli-analysis/ace-search/dependency-trace/manual)
+6. **Populate key_code + topic_relation for high-relevance files** (relevance >= 0.7; json_builder warns if missing)

 **Bash Tool**:
 - Use `run_in_background=false` for all Bash/CLI calls to ensure foreground execution

 **NEVER**:
-1. Modify any files (read-only agent)
-2. Skip schema reading step when schema is specified
-3. Guess field names - ALWAYS copy from schema
-4. Assume structure - ALWAYS verify against schema
-5. Omit required fields
+1. Modify any source code files (read-only agent — json_builder writes only output JSON)
+2. Hand-write JSON output — always use json_builder
+3. Skip the `validate` step before returning
 </operational_rules>

 <output_contract>
@@ -282,11 +271,8 @@ When exploration is complete, return one of:

 Before returning, verify:
 - [ ] All 4 phases were executed (or skipped with justification)
- [ ] Schema was read BEFORE output generation (if schema specified)
- [ ] All field names match schema exactly (case-sensitive)
- [ ] Every file entry has rationale (specific, >10 chars) and role
- [ ] High-relevance files (>= 0.7) have key_code and topic_relation
+- [ ] json_builder `init` was called at start
+- [ ] json_builder `validate` returned `valid: true` (or all errors were fixed)
 - [ ] Discovery sources are tracked for all findings
- [ ] No files were modified (read-only agent)
- [ ] Output format matches schema root structure (array vs object)
+- [ ] No source code files were modified (read-only agent)
 </quality_gate>
--- a/.claude/agents/cli-lite-planning-agent.md
+++ b/.claude/agents/cli-lite-planning-agent.md
@@ -139,16 +139,15 @@ When `process_docs: true`, generate planning-context.md before sub-plan.json:

 ## Schema-Driven Output

-**CRITICAL**: Read the schema reference first to determine output structure:
- `plan-overview-base-schema.json` → Implementation plan with `approach`, `complexity`
- `plan-overview-fix-schema.json` → Fix plan with `root_cause`, `severity`, `risk_level`
+**CRITICAL**: Get schema info via json_builder to determine output structure:
+- `ccw tool exec json_builder '{"cmd":"info","schema":"plan"}'` → Implementation plan with `approach`, `complexity`
+- `ccw tool exec json_builder '{"cmd":"info","schema":"plan-fix"}'` → Fix plan with `root_cause`, `severity`, `risk_level`

-```javascript
-// Step 1: Always read schema first
-const schema = Bash(`cat ${schema_path}`)
-
-// Step 2: Generate plan conforming to schema
-const planObject = generatePlanFromSchema(schema, context)
+After generating plan.json and .task/*.json, validate:
+```bash
+ccw tool exec json_builder '{"cmd":"validate","target":"<session>/plan.json","schema":"plan"}'
+# For each task file:
+ccw tool exec json_builder '{"cmd":"validate","target":"<session>/.task/TASK-001.json","schema":"task"}'
 ```

 </schema_driven_output>
@@ -863,7 +862,7 @@ function validateTask(task) {

 **ALWAYS**:
 - **Search Tool Priority**: ACE (`mcp__ace-tool__search_context`) → CCW (`mcp__ccw-tools__smart_search`) / Built-in (`Grep`, `Glob`, `Read`)
- **Read schema first** to determine output structure
+- **Get schema info via json_builder** to determine output structure
 - Generate task IDs (TASK-001/TASK-002 for plan, FIX-001/FIX-002 for fix-plan)
 - Include depends_on (even if empty [])
 - **Assign cli_execution_id** (`{sessionId}-{taskId}`)
@@ -981,7 +980,7 @@ Upon completion, return one of:

 Before returning, verify:

- [ ] Schema reference was read and output structure matches schema type (base vs fix)
+- [ ] Schema info was obtained via json_builder and output structure matches schema type (base vs fix)
 - [ ] All tasks have valid IDs (TASK-NNN or FIX-NNN format)
 - [ ] All tasks have 2+ implementation steps
 - [ ] All convergence criteria are quantified and testable (no vague language)
--- a/.claude/agents/issue-plan-agent.md
+++ b/.claude/agents/issue-plan-agent.md
@@ -348,7 +348,7 @@ Write({ file_path: filePath, content: newContent })
 .workflow/issues/solutions/{issue-id}.jsonl
 ```

-Each line is a solution JSON containing tasks. Schema: `cat ~/.ccw/workflows/cli-templates/schemas/solution-schema.json`
+Each line is a solution JSON containing tasks. Schema: `ccw tool exec json_builder '{"cmd":"info","schema":"solution"}'`

 ### 2.2 Return Summary

@@ -388,7 +388,7 @@ Each line is a solution JSON containing tasks. Schema: `cat ~/.ccw/workflows/cli

 **ALWAYS**:
 1. **Search Tool Priority**: ACE (`mcp__ace-tool__search_context`) → CCW (`mcp__ccw-tools__smart_search`) / Built-in (`Grep`, `Glob`, `Read`)
-2. Read schema first: `cat ~/.ccw/workflows/cli-templates/schemas/solution-schema.json`
+2. Get schema info: `ccw tool exec json_builder '{"cmd":"info","schema":"solution"}'` (replaces reading raw schema)
 3. Use ACE semantic search as PRIMARY exploration tool
 4. Fetch issue details via `ccw issue status <id> --json`
 5. **Analyze failure history**: Check `issue.feedback` for type='failure', stage='execute'
@@ -408,6 +408,11 @@ Each line is a solution JSON containing tasks. Schema: `cat ~/.ccw/workflows/cli
 4. **Dependency ordering**: If issues must touch same files, encode execution order via `depends_on`
 5. **Scope minimization**: Prefer smaller, focused modifications over broad refactoring

+**VALIDATE**: After writing solution JSONL, validate each solution:
+```bash
+ccw tool exec json_builder '{"cmd":"validate","target":".workflow/issues/solutions/<issue-id>.jsonl","schema":"solution"}'
+```
+
 **NEVER**:
 1. Execute implementation (return plan only)
 2. Use vague criteria ("works correctly", "good performance")