feat: add investigate, security-audit, ship skills (Claude + Codex)

- Add 3 new Claude skills: investigate (Iron Law debugging), security-audit
  (OWASP Top 10 + STRIDE), ship (gated release pipeline)
- Port all 3 skills to Codex v4 format under .codex/skills/ using
  Deep Interaction pattern (spawn_agent + assign_task phase transitions)
- Update README/README_CN acknowledgments: credit gstack
  (https://github.com/garrytan/gstack) as inspiration source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
catlog22
2026-03-30 10:31:13 +08:00
parent 360a0316f7
commit 67ff3fe339
43 changed files with 8892 additions and 0 deletions


@@ -0,0 +1,110 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
allowed-tools: Bash, Read, Write, Edit, Glob, Grep
---
# Investigate
Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.
## Iron Law Principle
**No fix without confirmed root cause.** Every investigation follows a strict evidence chain:
1. Reproduce the bug with concrete evidence
2. Analyze patterns to assess scope
3. Form and test hypotheses (max 3 strikes)
4. Implement minimal fix ONLY after root cause is confirmed
5. Verify fix and generate structured report
Skipping to Phase 4 without a confirmed root cause from Phase 3 is an Iron Law violation and is prohibited.
## Key Design Principles
1. **Evidence-First**: Collect before theorizing. Logs, stack traces, and reproduction steps are mandatory inputs.
2. **Minimal Fix**: Change only what is necessary. Refactoring is not debugging.
3. **3-Strike Escalation**: If 3 consecutive hypothesis tests fail, STOP and escalate with a diagnostic dump.
4. **Regression Coverage**: Every fix must include a test that fails without the fix and passes with it.
5. **Structured Output**: All findings are recorded in machine-readable JSON for future reference.
## Execution Flow
```
Phase 1: Root Cause Investigation
Reproduce bug, collect evidence (errors, logs, traces)
Use ccw cli --tool gemini --mode analysis for initial diagnosis
Output: investigation-report.json
|
v
Phase 2: Pattern Analysis
Search codebase for similar patterns (same error, module, antipattern)
Assess scope: isolated vs systemic
Output: pattern-analysis section in report
|
v
Phase 3: Hypothesis Testing
Form max 3 hypotheses from evidence
Test each with minimal read-only probes
3-strike rule: STOP and escalate on 3 consecutive failures
Output: confirmed root cause with evidence chain
|
v
Phase 4: Implementation [GATE: requires Phase 3 confirmed root cause]
Implement minimal fix
Add regression test
Verify fix resolves reproduction case
|
v
Phase 5: Verification & Report
Run full test suite
Check for regressions
Generate structured debug report to .workflow/.debug/
```
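The gate between Phase 3 and Phase 4 can be sketched as a small state machine. This is an illustration only, not part of the skill's runtime; `report` stands in for the in-memory investigation report described below.

```javascript
// Minimal sketch of the Iron Law phase gate: Phase 4 is reachable
// only when Phase 3 has recorded a confirmed root cause.
function nextPhase(currentPhase, report) {
  if (currentPhase === 3) {
    // Iron Law gate: no confirmed root cause, no implementation.
    if (!report.confirmed_root_cause) return { phase: 3, status: "BLOCKED" };
    return { phase: 4, status: "OK" };
  }
  if (currentPhase >= 5) return { phase: 5, status: "DONE" };
  return { phase: currentPhase + 1, status: "OK" };
}
```

Every other transition is linear; only the 3-to-4 edge is conditional.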
## Directory Setup
```bash
mkdir -p .workflow/.debug
```
## Output Structure
```
.workflow/.debug/
debug-report-{YYYY-MM-DD}-{slug}.json # Structured debug report
```
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.
| Status | When |
|--------|------|
| **DONE** | Root cause confirmed, fix applied, regression test passes, no regressions |
| **DONE_WITH_CONCERNS** | Fix applied but partial test coverage or minor warnings |
| **BLOCKED** | Cannot reproduce bug, or 3-strike escalation triggered in Phase 3 |
| **NEEDS_CONTEXT** | Missing reproduction steps, unclear error conditions |
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-root-cause-investigation.md](phases/01-root-cause-investigation.md) | Evidence collection and reproduction |
| [phases/02-pattern-analysis.md](phases/02-pattern-analysis.md) | Codebase pattern search and scope assessment |
| [phases/03-hypothesis-testing.md](phases/03-hypothesis-testing.md) | Hypothesis formation, testing, and 3-strike rule |
| [phases/04-implementation.md](phases/04-implementation.md) | Minimal fix with Iron Law gate |
| [phases/05-verification-report.md](phases/05-verification-report.md) | Test suite, regression check, report generation |
| [specs/iron-law.md](specs/iron-law.md) | Iron Law rules definition |
| [specs/debug-report-format.md](specs/debug-report-format.md) | Structured debug report JSON schema |
## CLI Integration
This skill leverages `ccw cli` for multi-model analysis at key points:
| Phase | CLI Usage | Mode |
|-------|-----------|------|
| Phase 1 | Initial diagnosis from error evidence | `--mode analysis` |
| Phase 2 | Cross-file pattern search | `--mode analysis` |
| Phase 3 | Hypothesis validation assistance | `--mode analysis` |
All CLI calls use `--mode analysis` (read-only). No write-mode CLI calls are made during Phases 1-3.


@@ -0,0 +1,132 @@
# Phase 1: Root Cause Investigation
Reproduce the bug and collect all available evidence before forming any theories.
## Objective
- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use CLI analysis for initial diagnosis
## Execution Steps
### Step 1: Understand the Bug Report
Parse the user's description to extract:
- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)
```javascript
const bugReport = {
symptom: "extracted from user description",
expected_behavior: "what should happen",
context: "when/where it occurs",
user_provided_files: ["files mentioned by user"],
user_provided_errors: ["error messages provided"]
}
```
### Step 2: Reproduce the Bug
Attempt to reproduce using the most direct method available:
1. **Run the failing test** (if one exists):
```bash
# Identify and run the specific failing test
```
2. **Run the failing command** (if CLI/script):
```bash
# Execute the command that triggers the bug
```
3. **Read error-producing code path** (if reproduction requires complex setup):
- Use `Grep` to find the error message in source code
- Use `Read` to trace the code path that produces the error
- Document the theoretical reproduction path
**If reproduction fails**: Document what was attempted. The investigation can continue with static analysis, but note this as a concern.
### Step 3: Collect Evidence
Gather all available evidence using project tools:
```javascript
// 1. Find error messages in source
Grep({ pattern: "error message text", path: "src/" })
// 2. Find related log output
Grep({ pattern: "relevant log pattern", path: "." })
// 3. Read stack trace files or test output
Read({ file_path: "path/to/failing-test-output" })
// 4. Identify affected files and modules
Glob({ pattern: "**/*relevant-module*" })
```
### Step 4: Initial Diagnosis via CLI Analysis
Use `ccw cli` for a broader diagnostic perspective:
```bash
ccw cli -p "PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @{affected_files} | Evidence: {error_messages_and_traces}
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on {affected_module}" \
--tool gemini --mode analysis
```
### Step 5: Write Investigation Report
Generate `investigation-report.json` in memory (carried to next phase):
```json
{
"phase": 1,
"bug_description": "concise description of the bug",
"reproduction": {
"reproducible": true,
"steps": [
"step 1: ...",
"step 2: ...",
"step 3: observe error"
],
"reproduction_method": "test|command|static_analysis"
},
"evidence": {
"error_messages": ["exact error text"],
"stack_traces": ["relevant stack trace"],
"affected_files": ["file1.ts", "file2.ts"],
"affected_modules": ["module-name"],
"log_output": ["relevant log lines"]
},
"initial_diagnosis": {
"cli_tool_used": "gemini",
"top_suspects": [
{ "description": "suspect 1", "evidence_strength": "strong|moderate|weak", "files": [] }
]
}
}
```
## Output
- **Data**: `investigation-report` (in-memory, passed to Phase 2)
- **Format**: JSON structure as defined above
## Quality Checks
- [ ] Bug symptom clearly documented
- [ ] Reproduction attempted (success or documented failure)
- [ ] At least one piece of concrete evidence collected (error message, stack trace, or failing test)
- [ ] Affected files identified
- [ ] Initial diagnosis generated
## Next Phase
Proceed to [Phase 2: Pattern Analysis](02-pattern-analysis.md) with the investigation report.


@@ -0,0 +1,126 @@
# Phase 2: Pattern Analysis
Search for similar patterns in the codebase to determine if the bug is isolated or systemic.
## Objective
- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation
## Execution Steps
### Step 1: Search for Similar Error Patterns
Look for the same error type or message elsewhere in the codebase:
```javascript
// Search for identical or similar error messages
Grep({ pattern: "error_message_fragment", path: "src/", output_mode: "content", context: 3 })
// Search for the same exception/error type
Grep({ pattern: "ErrorClassName|error_code", path: "src/", output_mode: "files_with_matches" })
// Search for similar error handling patterns
Grep({ pattern: "catch.*{similar_pattern}", path: "src/", output_mode: "content" })
```
### Step 2: Search for Same Antipattern
If the initial diagnosis suggests a coding antipattern, search for it globally:
```javascript
// Examples of antipattern searches:
// Missing null checks
Grep({ pattern: "variable\\.property", path: "src/", glob: "*.ts" })
// Unchecked async operations
Grep({ pattern: "async.*without.*await", path: "src/" })
// Direct mutation of shared state
Grep({ pattern: "shared_state_pattern", path: "src/" })
```
### Step 3: Module-Level Analysis
Examine the affected module for structural issues:
```javascript
// List all files in the affected module
Glob({ pattern: "src/affected-module/**/*" })
// Check imports and dependencies
Grep({ pattern: "import.*from.*affected-module", path: "src/" })
// Check for circular dependencies or unusual patterns
Grep({ pattern: "require.*affected-module", path: "src/" })
```
### Step 4: CLI Cross-File Pattern Analysis (Optional)
For complex patterns that span multiple files, use CLI analysis:
```bash
ccw cli -p "PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '{antipattern_description}' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.{ext} | Bug in {module}, pattern: {pattern_description}
EXPECTED: List of all files with same pattern, risk assessment per occurrence
CONSTRAINTS: Focus on {antipattern} pattern only | Ignore test files for scope" \
--tool gemini --mode analysis
```
### Step 5: Scope Assessment
Classify the bug scope based on findings:
```json
{
"phase": 2,
"pattern_analysis": {
"scope": "isolated|module-wide|systemic",
"similar_occurrences": [
{
"file": "path/to/file.ts",
"line": 42,
"pattern": "description of similar pattern",
"risk": "same_bug|potential_bug|safe"
}
],
"total_occurrences": 1,
"affected_modules": ["module-name"],
"antipattern_identified": "description or null",
"scope_justification": "why this scope classification"
}
}
```
**Scope Definitions**:
- **isolated**: Bug exists in a single location, no similar patterns found
- **module-wide**: Same pattern exists in multiple files within the same module
- **systemic**: Pattern spans multiple modules, may require broader fix
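These definitions can be expressed as a small classifier. This is a sketch under one assumption: file paths follow a `src/<module>/...` layout, so the module name is the second path segment. `occurrences` is the `similar_occurrences` array from the report.

```javascript
// Classify bug scope from similar occurrences found in Phase 2.
// Each occurrence: { file, line, pattern, risk }.
// Assumes paths like "src/<module>/file.ts"; adjust for other layouts.
function classifyScope(occurrences, affectedModule) {
  if (occurrences.length === 0) return "isolated";
  const modules = new Set(occurrences.map(o => o.file.split("/")[1] ?? o.file));
  // All occurrences confined to the affected module => module-wide.
  if (modules.size === 1 && modules.has(affectedModule)) return "module-wide";
  return "systemic";
}
```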
## Output
- **Data**: `pattern-analysis` section added to investigation report (in-memory)
- **Format**: JSON structure as defined above
## Decision Point
| Scope | Action |
|-------|--------|
| isolated | Proceed to Phase 3 with narrow focus |
| module-wide | Proceed to Phase 3, note all occurrences for Phase 4 fix |
| systemic | Proceed to Phase 3, but flag for potential multi-phase fix or separate tracking |
## Quality Checks
- [ ] At least 3 search queries executed against the codebase
- [ ] Scope classified as isolated, module-wide, or systemic
- [ ] Similar occurrences documented with file:line references
- [ ] Scope justification provided with evidence
## Next Phase
Proceed to [Phase 3: Hypothesis Testing](03-hypothesis-testing.md) with the pattern analysis results.


@@ -0,0 +1,177 @@
# Phase 3: Hypothesis Testing
Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.
## Objective
- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive test failures
## Execution Steps
### Step 1: Form Hypotheses
Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:
```json
{
"hypotheses": [
{
"id": "H1",
"description": "The root cause is X because evidence Y",
"evidence_supporting": ["evidence item 1", "evidence item 2"],
"predicted_behavior": "If H1 is correct, then we should observe Z",
"test_method": "How to verify: read file X line Y, check value Z",
"confidence": "high|medium|low"
}
]
}
```
**Hypothesis Formation Rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
### Step 2: Test Hypotheses Sequentially
Test each hypothesis starting from highest confidence. Use read-only probes:
**Allowed test methods**:
- `Read` a specific file and check a specific value or condition
- `Grep` for a pattern that would confirm or deny the hypothesis
- `Bash` to run a specific test or command that reveals the condition
- Temporarily add a log statement to observe runtime behavior (revert after)
**Prohibited during testing**:
- Modifying production code (save that for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)
```javascript
// Example hypothesis test
// H1: "Function X receives null because caller Y doesn't check return value"
const evidence = Read({ file_path: "src/caller.ts" })
// Check: Does caller Y use the return value without null check?
// Result: Confirmed / Rejected with specific evidence
```
### Step 3: Record Test Results
For each hypothesis test:
```json
{
"hypothesis_tests": [
{
"id": "H1",
"test_performed": "Read src/caller.ts:42 - checked null handling",
"result": "confirmed|rejected|inconclusive",
"evidence": "specific observation that confirms or rejects",
"files_checked": ["src/caller.ts:42-55"]
}
]
}
```
### Step 4: 3-Strike Escalation Rule
Track consecutive test failures. A "failure" means the test was inconclusive or the hypothesis was rejected AND no actionable insight was gained.
```
Strike Counter:
[H1 rejected, no insight] → Strike 1
[H2 rejected, no insight] → Strike 2
[H3 rejected, no insight] → Strike 3 → STOP
```
**Important**: A rejected hypothesis that provides useful insight (narrows the search) does NOT count as a strike. Only truly unproductive tests count.
**On 3rd Strike — STOP and Escalate**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #{N}
### Error History
1. Attempt 1: H1 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
2. Attempt 2: H2 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
3. Attempt 3: H3 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
### Current State
- Evidence collected: {summary from Phase 1-2}
- Hypotheses tested: {list}
- Files examined: {list}
### Diagnosis
- Likely root cause area: {best guess based on all evidence}
- Suggested human action: {specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z"}
### Diagnostic Dump
{Full investigation-report.json content}
```
After escalation, set status to **BLOCKED** per Completion Status Protocol.
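The counting rule (only unproductive tests count; three strikes escalate) can be sketched as a pure function. This is an illustration, not skill runtime; the interpretation that an insightful rejection neither adds a strike nor resets the counter is one reading of the rule above.

```javascript
// Track unproductive hypothesis tests toward the 3-strike limit.
// A rejected hypothesis that yielded insight adds no strike (per the
// rule above); a confirmed hypothesis ends the hunt.
function recordTest(counter, result) {
  // result: { outcome: "confirmed"|"rejected"|"inconclusive", insightGained: boolean }
  if (result.outcome === "confirmed") return { strikes: 0, escalate: false };
  const strikes = result.insightGained ? counter.strikes : counter.strikes + 1;
  return { strikes, escalate: strikes >= 3 };
}
```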
### Step 5: Confirm Root Cause
If a hypothesis is confirmed, document the confirmed root cause:
```json
{
"phase": 3,
"confirmed_root_cause": {
"hypothesis_id": "H1",
"description": "Root cause description with full evidence chain",
"evidence_chain": [
"Phase 1: Error message X observed in Y",
"Phase 2: Same pattern found in 3 other files",
"Phase 3: H1 confirmed — null check missing at file.ts:42"
],
"affected_code": {
"file": "path/to/file.ts",
"line_range": "42-55",
"function": "functionName"
}
}
}
```
## Output
- **Data**: `hypothesis-tests` and `confirmed_root_cause` added to investigation report (in-memory)
- **Format**: JSON structure as defined above
## Gate for Phase 4
**Phase 4 can ONLY proceed if `confirmed_root_cause` is present.** This is the Iron Law gate.
| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | Proceed to [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation | STOP, output diagnostic dump, status = BLOCKED |
| Partial insight | Re-form hypotheses with new evidence (stays in Phase 3) |
## Quality Checks
- [ ] Maximum 3 hypotheses formed, each with cited evidence
- [ ] Each hypothesis tested with a specific, documented probe
- [ ] Test results recorded with concrete evidence
- [ ] 3-strike counter maintained correctly
- [ ] Root cause confirmed with full evidence chain OR escalation triggered
## Next Phase
Proceed to [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.


@@ -0,0 +1,139 @@
# Phase 4: Implementation
Implement the minimal fix and add a regression test. Iron Law gate enforced.
## Objective
- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case
## Iron Law Gate Check
**MANDATORY**: Before any code modification, verify:
```javascript
if (!investigation_report.confirmed_root_cause) {
// VIOLATION: Cannot proceed without confirmed root cause
// Return to Phase 3 or escalate
throw new Error("Iron Law violation: No confirmed root cause. Return to Phase 3.")
}
console.log(`Root cause confirmed: ${investigation_report.confirmed_root_cause.description}`)
console.log(`Evidence chain: ${investigation_report.confirmed_root_cause.evidence_chain.length} items`)
console.log(`Affected code: ${investigation_report.confirmed_root_cause.affected_code.file}:${investigation_report.confirmed_root_cause.affected_code.line_range}`)
```
If the gate check fails, do NOT proceed. Return status **BLOCKED** with reason "Iron Law: no confirmed root cause".
## Execution Steps
### Step 1: Plan the Minimal Fix
Define the fix scope BEFORE writing any code:
```json
{
"fix_plan": {
"description": "What the fix does and why",
"changes": [
{
"file": "path/to/file.ts",
"change_type": "modify|add|remove",
"description": "specific change description",
"lines_affected": "42-45"
}
],
"total_files_changed": 1,
"total_lines_changed": "estimated"
}
}
```
**Minimal Fix Rules** (from [specs/iron-law.md](../specs/iron-law.md)):
- Change only what is necessary to fix the confirmed root cause
- Do not refactor surrounding code
- Do not add features
- Do not change formatting or style of unrelated code
- If the fix requires changes to more than 3 files, document justification
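A lightweight check of these rules against a fix plan might look like the sketch below. The `justification` field is an assumed extension of the `fix_plan` structure, used here to carry the written justification for a fix spanning more than 3 files.

```javascript
// Validate a fix plan against the Minimal Fix Rules before editing.
// `justification` is a hypothetical field for the >3-files case.
function checkFixPlan(fixPlan) {
  const issues = [];
  if (fixPlan.changes.length === 0) issues.push("empty fix plan");
  if (fixPlan.changes.length > 3 && !fixPlan.justification) {
    issues.push("more than 3 files changed without written justification");
  }
  return { ok: issues.length === 0, issues };
}
```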
### Step 2: Implement the Fix
Apply the planned changes using `Edit` tool:
```javascript
Edit({
file_path: "path/to/affected/file.ts",
old_string: "buggy code",
new_string: "fixed code"
})
```
### Step 3: Add Regression Test
Create or modify a test that:
1. **Fails** without the fix (tests the exact bug condition)
2. **Passes** with the fix
```javascript
// Identify existing test file for the module
Glob({ pattern: "**/*.test.{ts,js,py}" })
// or
Glob({ pattern: "**/test_*.py" })
// Add regression test
// Test name should reference the bug: "should handle null return from X"
// Test should exercise the exact code path that caused the bug
```
**Regression test requirements**:
- Test name clearly describes the bug scenario
- Test exercises the specific code path identified in root cause
- Test is deterministic (no flaky timing, external dependencies)
- Test is placed in the appropriate test file for the module
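As an illustration, a framework-free regression test for a hypothetical null-handling bug might look like this. `formatDisplayName` is an invented helper standing in for the fixed code; it is not part of any real module.

```javascript
// Hypothetical fixed function: optional chaining guards the null case.
function formatDisplayName(user) {
  return user.displayName?.trim() ?? "";
}

// Regression test: exercises the exact bug condition (null displayName).
// With the buggy version (user.displayName.trim()), this throws a TypeError.
function testHandlesNullDisplayName() {
  const result = formatDisplayName({ displayName: null });
  if (result !== "") throw new Error("expected empty string for null displayName");
}
testHandlesNullDisplayName();
```

The test name and body target the confirmed root cause, not the general behavior of the function.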
### Step 4: Verify Fix Against Reproduction
Re-run the original reproduction case from Phase 1:
```bash
# Run the specific failing test/command from Phase 1
# It should now pass
```
Record the verification result:
```json
{
"phase": 4,
"fix_applied": {
"description": "what was fixed",
"files_changed": ["path/to/file.ts"],
"lines_changed": 3,
"regression_test": {
"file": "path/to/test.ts",
"test_name": "should handle null return from X",
"status": "added|modified"
},
"reproduction_verified": true
}
}
```
## Output
- **Data**: `fix_applied` section added to investigation report (in-memory)
- **Artifacts**: Modified source files and test files
## Quality Checks
- [ ] Iron Law gate passed: confirmed root cause exists
- [ ] Fix is minimal: only necessary changes made
- [ ] Regression test added that covers the specific bug
- [ ] Original reproduction case passes with the fix
- [ ] No unrelated code changes included
## Next Phase
Proceed to [Phase 5: Verification & Report](05-verification-report.md) to run full test suite and generate report.


@@ -0,0 +1,153 @@
# Phase 5: Verification & Report
Run full test suite, check for regressions, and generate the structured debug report.
## Objective
- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to `.workflow/.debug/` directory
## Execution Steps
### Step 1: Run Full Test Suite
```bash
# Detect and run the project's test framework
# npm test / pytest / go test / cargo test / etc.
```
Record results:
```json
{
"test_results": {
"total": 0,
"passed": 0,
"failed": 0,
"skipped": 0,
"regression_test_passed": true,
"new_failures": []
}
}
```
**If new failures are found**:
- Check if the failures are related to the fix
- If related: the fix introduced a regression — return to Phase 4 to adjust
- If unrelated: document as pre-existing failures, proceed with report
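The triage of new versus pre-existing failures amounts to a set difference between the failing tests before and after the fix. A sketch, where inputs are arrays of failing test names:

```javascript
// Split current failures into regressions (new) and pre-existing ones,
// given the tests that were already failing before the fix.
function triageFailures(failingBefore, failingAfter) {
  const before = new Set(failingBefore);
  return {
    new_failures: failingAfter.filter(name => !before.has(name)),
    pre_existing_failures: failingAfter.filter(name => before.has(name)),
  };
}
```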
### Step 2: Regression Check
Verify specifically:
1. The new regression test passes
2. All tests that passed before the fix still pass
3. No new warnings or errors in test output
### Step 3: Generate Structured Debug Report
Create the report following the schema in [specs/debug-report-format.md](../specs/debug-report-format.md):
```bash
mkdir -p .workflow/.debug
```
```json
{
"bug_description": "concise description of the bug",
"reproduction_steps": [
"step 1",
"step 2",
"step 3: observe error"
],
"root_cause": "confirmed root cause description with technical detail",
"evidence_chain": [
"Phase 1: error message X observed in module Y",
"Phase 2: pattern analysis found N similar occurrences",
"Phase 3: hypothesis H1 confirmed — specific condition at file:line"
],
"fix_description": "what was changed and why",
"files_changed": [
{
"path": "src/module/file.ts",
"change_type": "modify",
"description": "added null check before property access"
}
],
"tests_added": [
{
"file": "src/module/__tests__/file.test.ts",
"test_name": "should handle null return from X",
"type": "regression"
}
],
"regression_check_result": {
"passed": true,
"total_tests": 0,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
"concerns": [],
"timestamp": "ISO-8601",
"investigation_duration_phases": 5
}
```
### Step 4: Write Report File
```javascript
const slug = bugDescription.toLowerCase().replace(/[^a-z0-9]+/g, '-').substring(0, 40)
const dateStr = new Date().toISOString().substring(0, 10)
const reportPath = `.workflow/.debug/debug-report-${dateStr}-${slug}.json`
Write({ file_path: reportPath, content: JSON.stringify(report, null, 2) })
```
### Step 5: Output Completion Status
Follow the Completion Status Protocol from `_shared/SKILL-DESIGN-SPEC.md` section 13:
**DONE**:
```
## STATUS: DONE
**Summary**: Fixed {bug_description} — root cause was {root_cause_summary}
### Details
- Phases completed: 5/5
- Root cause: {confirmed_root_cause}
- Fix: {fix_description}
- Regression test: {test_name} in {test_file}
### Outputs
- Debug report: {reportPath}
- Files changed: {list}
- Tests added: {list}
```
**DONE_WITH_CONCERNS**:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Fixed {bug_description} with concerns
### Details
- Phases completed: 5/5
- Concerns:
1. {concern} — Impact: {low|medium} — Suggested fix: {action}
```
## Output
- **File**: `debug-report-{YYYY-MM-DD}-{slug}.json`
- **Location**: `.workflow/.debug/`
- **Format**: JSON (see [specs/debug-report-format.md](../specs/debug-report-format.md))
## Quality Checks
- [ ] Full test suite executed
- [ ] Regression test specifically verified
- [ ] No new test failures introduced (or documented if pre-existing)
- [ ] Debug report written to `.workflow/.debug/`
- [ ] Completion status output follows protocol


@@ -0,0 +1,226 @@
# Debug Report Format
Defines the structured JSON schema for debug reports generated by the investigate skill.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 5 | Generate final report | Full schema |
| Phase 3 (escalation) | Diagnostic dump includes partial report | Partial schema |
---
## JSON Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Debug Report",
"type": "object",
"required": [
"bug_description",
"reproduction_steps",
"root_cause",
"evidence_chain",
"fix_description",
"files_changed",
"tests_added",
"regression_check_result",
"completion_status"
],
"properties": {
"bug_description": {
"type": "string",
"description": "Concise description of the bug symptom",
"minLength": 10
},
"reproduction_steps": {
"type": "array",
"description": "Ordered steps to reproduce the bug",
"items": { "type": "string" },
"minItems": 1
},
"root_cause": {
"type": "string",
"description": "Confirmed root cause with technical detail",
"minLength": 20
},
"evidence_chain": {
"type": "array",
"description": "Ordered evidence from Phase 1 through Phase 3, each prefixed with phase number",
"items": { "type": "string" },
"minItems": 1
},
"fix_description": {
"type": "string",
"description": "What was changed and why",
"minLength": 10
},
"files_changed": {
"type": "array",
"items": {
"type": "object",
"required": ["path", "change_type", "description"],
"properties": {
"path": {
"type": "string",
"description": "Relative file path"
},
"change_type": {
"type": "string",
"enum": ["add", "modify", "remove"]
},
"description": {
"type": "string",
"description": "Brief description of changes to this file"
}
}
}
},
"tests_added": {
"type": "array",
"items": {
"type": "object",
"required": ["file", "test_name", "type"],
"properties": {
"file": {
"type": "string",
"description": "Test file path"
},
"test_name": {
"type": "string",
"description": "Name of the test function or describe block"
},
"type": {
"type": "string",
"enum": ["regression", "unit", "integration"],
"description": "Type of test added"
}
}
}
},
"regression_check_result": {
"type": "object",
"required": ["passed", "total_tests"],
"properties": {
"passed": {
"type": "boolean",
"description": "Whether the full test suite passed"
},
"total_tests": {
"type": "integer",
"description": "Total number of tests executed"
},
"new_failures": {
"type": "array",
"items": { "type": "string" },
"description": "Tests that failed after the fix but passed before"
},
"pre_existing_failures": {
"type": "array",
"items": { "type": "string" },
"description": "Tests that were already failing before the investigation"
}
}
},
"completion_status": {
"type": "string",
"enum": ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
"description": "Final status per Completion Status Protocol"
},
"concerns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"impact": { "type": "string", "enum": ["low", "medium"] },
"suggested_action": { "type": "string" }
}
},
"description": "Non-blocking concerns (populated when status is DONE_WITH_CONCERNS)"
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp of report generation"
},
"investigation_duration_phases": {
"type": "integer",
"description": "Number of phases completed (1-5)",
"minimum": 1,
"maximum": 5
}
}
}
```
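A lightweight completeness check against the schema's `required` list can be written in plain JavaScript. This is a sketch, not a full JSON Schema validator (it ignores types, enums, and length constraints).

```javascript
// Top-level fields every debug report must carry, per the schema above.
const REQUIRED_FIELDS = [
  "bug_description", "reproduction_steps", "root_cause", "evidence_chain",
  "fix_description", "files_changed", "tests_added",
  "regression_check_result", "completion_status",
];

// Return the required fields that are absent from a report object.
function missingFields(report) {
  return REQUIRED_FIELDS.filter(f => report[f] === undefined);
}
```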
## Field Descriptions
| Field | Source Phase | Description |
|-------|-------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phase 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | How many phases were completed |
## Example Report
```json
{
"bug_description": "API returns 500 when user profile has null display_name",
"reproduction_steps": [
"Create user account without setting display_name",
"Call GET /api/users/:id/profile",
"Observe 500 Internal Server Error"
],
"root_cause": "ProfileSerializer.format() calls displayName.trim() without null check at src/serializers/profile.ts:42",
"evidence_chain": [
"Phase 1: TypeError: Cannot read properties of null (reading 'trim') in server logs",
"Phase 2: Same pattern in 2 other serializers (address.ts:28, company.ts:35)",
"Phase 3: H1 confirmed — displayName field is nullable in DB but serializer assumes non-null"
],
"fix_description": "Added null-safe access for displayName in ProfileSerializer.format()",
"files_changed": [
{
"path": "src/serializers/profile.ts",
"change_type": "modify",
"description": "Added optional chaining for displayName.trim() call"
}
],
"tests_added": [
{
"file": "src/serializers/__tests__/profile.test.ts",
"test_name": "should handle null display_name without error",
"type": "regression"
}
],
"regression_check_result": {
"passed": true,
"total_tests": 142,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE",
"concerns": [],
"timestamp": "2026-03-29T15:30:00+08:00",
"investigation_duration_phases": 5
}
```
## Output Location
Reports are written to: `.workflow/.debug/debug-report-{YYYY-MM-DD}-{slug}.json`
Where:
- `{YYYY-MM-DD}` is the investigation date
- `{slug}` is derived from the bug description (lowercase, hyphens, max 40 chars)


@@ -0,0 +1,101 @@
# Iron Law of Debugging
The Iron Law defines the non-negotiable rules that govern every investigation performed by this skill. These rules exist to prevent symptom-fixing and ensure durable, evidence-based solutions.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 3 | Hypothesis must produce confirmed root cause before proceeding | Rule 1 |
| Phase 1 | Reproduction must produce observable evidence | Rule 2 |
| Phase 4 | Fix scope must be minimal | Rule 3 |
| Phase 4 | Regression test is mandatory | Rule 4 |
| Phase 3 | 3 consecutive unproductive hypothesis failures trigger escalation | Rule 5 |
---
## Rules
### Rule 1: Never Fix Without Confirmed Root Cause
**Statement**: No code modification is permitted until a root cause has been confirmed through hypothesis testing with concrete evidence.
**Enforcement**: Phase 4 begins with an Iron Law gate check. If `confirmed_root_cause` is absent from the investigation report, Phase 4 is blocked.
**Rationale**: Fixing symptoms without understanding the cause leads to:
- Incomplete fixes that break under different conditions
- Masking of deeper issues
- Wasted investigation time when the bug recurs
### Rule 2: Evidence Must Be Reproducible
**Statement**: The bug must be reproducible through documented steps, or if not reproducible, the evidence must be sufficient to identify the root cause through static analysis.
**Enforcement**: Phase 1 documents reproduction steps and evidence. If reproduction fails, this is flagged as a concern but does not block investigation if sufficient static evidence exists.
**Acceptable evidence types**:
- Failing test case
- Error message with stack trace
- Log output showing the failure
- Code path analysis showing the defect condition
### Rule 3: Fix Must Be Minimal
**Statement**: The fix must change only what is necessary to address the confirmed root cause. No refactoring, no feature additions, no style changes to unrelated code.
**Enforcement**: Phase 4 requires a fix plan before implementation. Changes exceeding 3 files require written justification.
**What counts as minimal**:
- Adding a missing null check
- Fixing an incorrect condition
- Correcting a wrong variable reference
- Adding a missing import or dependency
**What is NOT minimal**:
- Refactoring the function "while we're here"
- Renaming variables for clarity
- Adding error handling to unrelated code paths
- Reformatting surrounding code
### Rule 4: Regression Test Required
**Statement**: Every fix must include a test that:
1. Fails when the fix is reverted (proves it tests the bug)
2. Passes when the fix is applied (proves the fix works)
**Enforcement**: Phase 4 requires a regression test before the phase is marked complete.
**Test requirements**:
- Test name clearly references the bug scenario
- Test exercises the exact code path of the root cause
- Test is deterministic (no timing dependencies, no external services)
- Test is placed in the appropriate test file for the affected module
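As an illustration only (the report example earlier is TypeScript; this is a hypothetical Python equivalent of the null `display_name` fix), a regression test satisfying both conditions might look like:

```python
def format_display_name(display_name):
    """Fixed version of the hypothetical serializer: null-safe access (the minimal fix)."""
    return display_name.strip() if display_name is not None else ""

def test_handles_null_display_name_without_error():
    # Fails when the null check is reverted (proves it tests the bug)
    assert format_display_name(None) == ""

def test_preserves_normal_display_name():
    # Passes with the fix applied (proves the fix works)
    assert format_display_name("  Ada  ") == "Ada"
```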
### Rule 5: 3-Strike Escalation on Hypothesis Failure
**Statement**: If 3 consecutive hypothesis tests produce no actionable insight, the investigation must STOP and escalate with a full diagnostic dump.
**Enforcement**: Phase 3 tracks a strike counter. On the 3rd consecutive unproductive failure, execution halts and outputs the escalation block.
**What counts as a strike**:
- Hypothesis rejected AND no new insight gained
- Test was inconclusive AND no narrowing of search space
**What does NOT count as a strike**:
- Hypothesis rejected BUT new evidence narrows the search
- Hypothesis rejected BUT reveals a different potential cause
- Test inconclusive BUT identifies a new area to investigate
**Post-escalation**: Status set to BLOCKED. No further automated investigation. Preserve all intermediate outputs for human review.
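One way to track the strike counter is sketched below, assuming a productive outcome resets the consecutive run (`StrikeCounter` is a hypothetical helper, not part of the skill):

```python
class StrikeCounter:
    """Track consecutive unproductive hypothesis failures (Rule 5)."""
    LIMIT = 3

    def __init__(self):
        self.strikes = 0

    def record(self, rejected: bool, new_insight: bool) -> str:
        # A strike accrues only when the hypothesis failed AND no new insight was gained
        if rejected and not new_insight:
            self.strikes += 1
        else:
            self.strikes = 0  # a productive outcome breaks the consecutive run
        return "BLOCKED" if self.strikes >= self.LIMIT else "CONTINUE"
```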
---
## Validation Checklist
Before completing any investigation, verify:
- [ ] Rule 1: Root cause confirmed before any fix was applied
- [ ] Rule 2: Bug reproduction documented (or static evidence justified)
- [ ] Rule 3: Fix changes only necessary code (file count, line count documented)
- [ ] Rule 4: Regression test exists and passes
- [ ] Rule 5: No more than 3 consecutive unproductive hypothesis tests (or escalation triggered)


@@ -0,0 +1,125 @@
---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
allowed-tools: Read, Write, Bash, Glob, Grep
---
# Security Audit
4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.
## Architecture Overview
```
+-------------------------------------------------------------------+
| Phase 1: Supply Chain Scan |
| -> Dependency audit, secrets detection, CI/CD review, LLM risks |
| -> Output: supply-chain-report.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 2: OWASP Review |
| -> OWASP Top 10 2021 code-level analysis via ccw cli |
| -> Output: owasp-findings.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 3: Threat Modeling (STRIDE) |
| -> 6 threat categories mapped to architecture components |
| -> Output: threat-model.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 4: Report & Tracking |
| -> Score calculation, trend comparison, dated report |
| -> Output: .workflow/.security/audit-report-{date}.json |
+-------------------------------------------------------------------+
```
## Key Design Principles
1. **Infrastructure-first**: Phase 1 catches low-hanging fruit (leaked secrets, vulnerable deps) before deeper analysis
2. **Standards-based**: OWASP Top 10 2021 and STRIDE provide systematic coverage
3. **Scoring gates**: Daily quick-scan must score >= 8/10; comprehensive audit requires a minimum of 2/10 for the initial baseline
4. **Trend tracking**: Each audit compares against prior results in `.workflow/.security/`
## Execution Flow
### Quick-Scan Mode (daily)
Run Phase 1 only. Must score >= 8/10 to pass.
### Comprehensive Mode (full audit)
Run all 4 phases sequentially. Initial baseline minimum 2/10.
### Phase Sequence
1. **Phase 1: Supply Chain Scan** -- [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md)
- Dependency audit (npm audit / pip-audit / safety check)
- Secrets detection (API keys, tokens, passwords in source)
- CI/CD config review (injection risks in workflow YAML)
- LLM/AI prompt injection check
2. **Phase 2: OWASP Review** -- [phases/02-owasp-review.md](phases/02-owasp-review.md)
- Systematic OWASP Top 10 2021 code review
- Uses `ccw cli --tool gemini --mode analysis --rule analysis-assess-security-risks`
3. **Phase 3: Threat Modeling** -- [phases/03-threat-modeling.md](phases/03-threat-modeling.md)
- STRIDE threat model mapped to architecture components
- Trust boundary identification and attack surface assessment
4. **Phase 4: Report & Tracking** -- [phases/04-report-tracking.md](phases/04-report-tracking.md)
- Score calculation with severity weights
- Trend comparison with previous audits
- Date-stamped report to `.workflow/.security/`
## Scoring Overview
See [specs/scoring-gates.md](specs/scoring-gates.md) for full specification.
| Severity | Weight | Example |
|----------|--------|---------|
| Critical | 10 | RCE, SQL injection, leaked credentials |
| High | 7 | Broken auth, SSRF, privilege escalation |
| Medium | 4 | XSS, CSRF, verbose error messages |
| Low | 1 | Missing headers, informational disclosures |
**Gates**: Daily quick-scan >= 8/10, Comprehensive initial >= 2/10.
## Directory Setup
```bash
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"
```
## Output Structure
```
.workflow/.security/
audit-report-{YYYY-MM-DD}.json # Dated audit report
supply-chain-report.json # Latest supply chain scan
owasp-findings.json # Latest OWASP findings
threat-model.json # Latest STRIDE threat model
```
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md) | Dependency, secrets, CI/CD, LLM risk scan |
| [phases/02-owasp-review.md](phases/02-owasp-review.md) | OWASP Top 10 2021 code review |
| [phases/03-threat-modeling.md](phases/03-threat-modeling.md) | STRIDE threat modeling |
| [phases/04-report-tracking.md](phases/04-report-tracking.md) | Report generation and trend tracking |
| [specs/scoring-gates.md](specs/scoring-gates.md) | Scoring system and quality gates |
| [specs/owasp-checklist.md](specs/owasp-checklist.md) | OWASP Top 10 detection patterns |
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.
Possible termination statuses:
- **DONE**: All phases completed, score calculated, report generated
- **DONE_WITH_CONCERNS**: Audit completed but findings exceed acceptable thresholds
- **BLOCKED**: Required tools unavailable (e.g., npm/pip not installed), permission denied
- **NEEDS_CONTEXT**: Ambiguous project scope, unclear trust boundaries
Escalation follows the Three-Strike Rule (section 14) per step.


@@ -0,0 +1,139 @@
# Phase 1: Supply Chain Scan
Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM/AI integrations.
## Objective
- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities
## Execution Steps
### Step 1: Dependency Audit
Detect package manager and run appropriate audit tool.
```bash
# Node.js projects
if [ -f package-lock.json ]; then
npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>&1 || true
elif [ -f yarn.lock ]; then
yarn audit --json > "${WORK_DIR}/yarn-audit-raw.json" 2>&1 || true
fi
# Python projects
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
# Fallback to safety check if pip-audit fails or is unavailable
pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" 2>&1 || \
safety check --json > "${WORK_DIR}/safety-raw.json" 2>&1 || true
fi
# Go projects
if [ -f go.sum ]; then
govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```
If audit tools are not installed, log as INFO finding and continue.
### Step 2: Secrets Detection
Scan source files for hardcoded secrets using regex patterns.
```bash
# High-confidence patterns (case-insensitive)
grep -rniE \
'(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["'\''][A-Za-z0-9+/=_-]{16,}' \
--include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
--include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
--include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
. || true
# AWS patterns
grep -rniE '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true
# Private keys
grep -rniE '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true
# Connection strings with passwords
grep -rniE '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true
# JWT tokens (hardcoded)
grep -rniE 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```
Exclude: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.
### Step 3: CI/CD Config Review
Check GitHub Actions and other CI/CD configs for injection risks.
```bash
# Find workflow files
find .github/workflows -name '*.yml' -o -name '*.yaml' 2>/dev/null
# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true
# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true
# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true
# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```
### Step 4: LLM/AI Prompt Injection Check
Scan for patterns indicating prompt injection risk in LLM integrations.
```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
--include='*.py' . || true
```
## Output
- **File**: `supply-chain-report.json`
- **Location**: `${WORK_DIR}/supply-chain-report.json`
- **Format**: JSON
```json
{
"phase": "supply-chain-scan",
"timestamp": "ISO-8601",
"findings": [
{
"category": "dependency|secret|cicd|llm",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "matched text or context",
"remediation": "How to fix"
}
],
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
}
}
```
## Next Phase
Proceed to [Phase 2: OWASP Review](02-owasp-review.md) with supply chain findings as context.


@@ -0,0 +1,156 @@
# Phase 2: OWASP Review
Systematic code-level review against OWASP Top 10 2021 categories.
## Objective
- Review codebase against all 10 OWASP Top 10 2021 categories
- Use CCW CLI multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps
## Prerequisites
- Phase 1 supply-chain-report.json (provides dependency context)
- Read [specs/owasp-checklist.md](../specs/owasp-checklist.md) for detection patterns
## Execution Steps
### Step 1: Identify Target Scope
```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
| head -200
```
### Step 2: CCW CLI Analysis
Run multi-model security analysis using the security risks rule template.
```bash
ccw cli -p "PURPOSE: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.
MODE: analysis
CONTEXT: @src/**/* @**/*.config.* @**/*.env.example
EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.
CONSTRAINTS: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities not theoretical risks
" --tool gemini --mode analysis --rule analysis-assess-security-risks
```
### Step 3: Manual Pattern Scanning
Supplement CLI analysis with targeted pattern scans per OWASP category. Reference [specs/owasp-checklist.md](../specs/owasp-checklist.md) for full pattern list.
**A01 - Broken Access Control**:
```bash
# Missing auth middleware on routes
grep -rnE 'app\.(get|post|put|delete|patch)\(' --include='*.ts' --include='*.js' . | grep -vE 'auth|middleware|protect' || true
# Direct object references without ownership check
grep -rnE 'params\.id|req\.params\.' --include='*.ts' --include='*.js' . || true
```
**A03 - Injection**:
```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A05 - Security Misconfiguration**:
```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A07 - Identification and Authentication Failures**:
```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["'\''][^"'\'']{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```
### Step 4: Consolidate Findings
Merge CLI analysis results and manual pattern scan results. Deduplicate and classify by OWASP category.
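A minimal deduplication sketch, assuming findings are keyed by `(file, line, owasp_id)` (the key choice is an assumption, not part of this spec):

```python
def dedupe_findings(findings):
    """Keep the first occurrence of each (file, line, owasp_id) triple."""
    seen, merged = set(), []
    for finding in findings:
        key = (finding.get("file"), finding.get("line"), finding.get("owasp_id"))
        if key not in seen:
            seen.add(key)
            merged.append(finding)
    return merged
```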
## OWASP Top 10 2021 Categories
| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |
## Output
- **File**: `owasp-findings.json`
- **Location**: `${WORK_DIR}/owasp-findings.json`
- **Format**: JSON
```json
{
"phase": "owasp-review",
"timestamp": "ISO-8601",
"owasp_version": "2021",
"findings": [
{
"owasp_id": "A01",
"owasp_category": "Broken Access Control",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "code snippet or pattern match",
"remediation": "Specific fix recommendation",
"cwe": "CWE-XXX"
}
],
"coverage": {
"A01": "checked|not_applicable",
"A02": "checked|not_applicable",
"A03": "checked|not_applicable",
"A04": "checked|not_applicable",
"A05": "checked|not_applicable",
"A06": "checked|not_applicable",
"A07": "checked|not_applicable",
"A08": "checked|not_applicable",
"A09": "checked|not_applicable",
"A10": "checked|not_applicable"
},
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"categories_checked": 10,
"categories_with_findings": 0
}
}
```
## Next Phase
Proceed to [Phase 3: Threat Modeling](03-threat-modeling.md) with OWASP findings as input for STRIDE analysis.


@@ -0,0 +1,180 @@
# Phase 3: Threat Modeling (STRIDE)
Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.
## Objective
- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings
## STRIDE Categories
| Category | Threat | Question | Typical Targets |
|----------|--------|----------|-----------------|
| **S** - Spoofing | Identity impersonation | Can an attacker pretend to be someone else? | Auth endpoints, API keys, session tokens |
| **T** - Tampering | Data modification | Can data be modified in transit or at rest? | Request bodies, database records, config files |
| **R** - Repudiation | Deniable actions | Can a user deny performing an action? | Audit logs, transaction records, user actions |
| **I** - Information Disclosure | Data leakage | Can sensitive data be exposed? | Error messages, logs, API responses, storage |
| **D** - Denial of Service | Availability disruption | Can the system be made unavailable? | API endpoints, resource-intensive operations |
| **E** - Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? | Role checks, admin routes, permission logic |
## Execution Steps
### Step 1: Architecture Component Discovery
Identify major system components by scanning project structure.
```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
```
### Step 2: Trust Boundary Identification
Map trust boundaries in the system:
1. **External boundary**: User/browser <-> Application server
2. **Service boundary**: Application <-> External APIs/services
3. **Data boundary**: Application <-> Database/storage
4. **Internal boundary**: Public routes <-> Authenticated routes <-> Admin routes
5. **Process boundary**: Main process <-> Worker/subprocess
For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails
### Step 3: STRIDE per Component
For each discovered component, systematically evaluate all 6 STRIDE categories:
**Spoofing Analysis**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?
**Tampering Analysis**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?
**Repudiation Analysis**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?
**Information Disclosure Analysis**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?
**Denial of Service Analysis**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?
**Elevation of Privilege Analysis**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?
### Step 4: Attack Surface Assessment
Quantify the attack surface:
```
Attack Surface = Sum of:
- Number of public API endpoints
- Number of external service integrations
- Number of user-controllable input points
- Number of privileged operations
- Number of data stores with sensitive content
```
Rate each component:
- **High exposure**: Public-facing, handles sensitive data, complex logic
- **Medium exposure**: Authenticated access, moderate data sensitivity
- **Low exposure**: Internal only, no sensitive data, simple operations
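The attack surface sum above can be sketched as follows (field names follow the `attack_surface` block of `threat-model.json`):

```python
def attack_surface_score(counts: dict) -> int:
    """Sum the five counts listed in the formula above (unweighted)."""
    keys = (
        "public_endpoints",
        "external_integrations",
        "input_points",
        "privileged_operations",
        "sensitive_data_stores",
    )
    return sum(counts.get(key, 0) for key in keys)
```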
## Output
- **File**: `threat-model.json`
- **Location**: `${WORK_DIR}/threat-model.json`
- **Format**: JSON
```json
{
"phase": "threat-modeling",
"timestamp": "ISO-8601",
"framework": "STRIDE",
"components": [
{
"name": "Component name",
"type": "api_endpoint|data_store|external_service|auth_module|worker",
"files": ["path/to/file.ts"],
"exposure": "high|medium|low",
"trust_boundaries": ["external", "data"],
"threats": {
"spoofing": {
"applicable": true,
"findings": ["Description of threat"],
"mitigations": ["Existing mitigation"],
"gaps": ["Missing mitigation"]
},
"tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
}
}
],
"trust_boundaries": [
{
"name": "Boundary name",
"from": "Component A",
"to": "Component B",
"enforcement": "TLS|auth_middleware|API_key",
"data_crossing": ["request bodies", "credentials"],
"risk_level": "high|medium|low"
}
],
"attack_surface": {
"public_endpoints": 0,
"external_integrations": 0,
"input_points": 0,
"privileged_operations": 0,
"sensitive_data_stores": 0,
"total_score": 0
},
"summary": {
"components_analyzed": 0,
"threats_identified": 0,
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
"high_exposure_components": 0
}
}
```
## Next Phase
Proceed to [Phase 4: Report & Tracking](04-report-tracking.md) with the threat model to generate the final scored audit report.


@@ -0,0 +1,177 @@
# Phase 4: Report & Tracking
Generate scored audit report, compare with previous audits, and track trends.
## Objective
- Calculate security score from all phase findings
- Compare with previous audit results (if available)
- Generate date-stamped report in `.workflow/.security/`
- Track improvement or regression trends
## Prerequisites
- Phase 1: `supply-chain-report.json`
- Phase 2: `owasp-findings.json`
- Phase 3: `threat-model.json`
- Previous audit: `.workflow/.security/audit-report-*.json` (optional)
## Execution Steps
### Step 1: Aggregate Findings
Collect all findings from phases 1-3 and classify by severity.
```
All findings =
supply-chain-report.findings
+ owasp-findings.findings
+ threat-model threats (where gaps exist)
```
### Step 2: Calculate Score
Apply scoring formula from [specs/scoring-gates.md](../specs/scoring-gates.md):
```
Severity weights:
- Critical: 10 (each critical finding has outsized impact)
- High: 7
- Medium: 4
- Low: 1

normalization_factor = max(10, total_files_scanned)
weighted_penalty = SUM(severity_weight * count_per_severity) / normalization_factor
final_score = max(0, 10.0 - weighted_penalty)
```
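The formula can be sketched as (weights per [specs/scoring-gates.md](../specs/scoring-gates.md)):

```python
def security_score(findings, total_files_scanned):
    """Compute the final score from a list of findings with a 'severity' field."""
    weights = {"critical": 10, "high": 7, "medium": 4, "low": 1}
    normalization_factor = max(10, total_files_scanned)
    weighted_penalty = sum(weights[f["severity"]] for f in findings) / normalization_factor
    return max(0.0, 10.0 - weighted_penalty)  # clamp at zero
```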
**Score interpretation**:
| Score | Rating | Meaning |
|-------|--------|---------|
| 9-10 | Excellent | Minimal risk, production-ready |
| 7-8 | Good | Acceptable risk, minor improvements needed |
| 5-6 | Fair | Notable risks, remediation recommended |
| 3-4 | Poor | Significant risks, remediation required |
| 0-2 | Critical | Severe vulnerabilities, immediate action needed |
### Step 3: Gate Evaluation
**Daily quick-scan gate** (Phase 1 only):
- PASS: score >= 8/10
- FAIL: score < 8/10 -- block deployment or flag for review
**Comprehensive audit gate** (all phases):
- For initial/baseline: PASS if score >= 2/10 (establishes baseline)
- For subsequent: PASS if score >= previous_score (no regression)
- Target: score >= 7/10 for production readiness
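The gate rules above can be sketched as (`evaluate_gate` is a hypothetical helper; thresholds are taken from this spec):

```python
def evaluate_gate(score, mode, previous_score=None):
    """Return PASS/FAIL per the quick-scan and comprehensive gate rules."""
    if mode == "quick-scan":
        return "PASS" if score >= 8 else "FAIL"
    if previous_score is None:  # comprehensive audit, initial baseline
        return "PASS" if score >= 2 else "FAIL"
    return "PASS" if score >= previous_score else "FAIL"  # no regression allowed
```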
### Step 4: Trend Comparison
```bash
# Find previous audit reports
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```
Compare current vs. previous:
- Delta per OWASP category
- Delta per STRIDE category
- New findings vs. resolved findings
- Overall score trend
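The trend classification used in the report's `direction` field might be computed as follows (the 0.5 tolerance for "stable" is an assumption, not part of this spec):

```python
def trend_direction(current_score, previous_score, tolerance=0.5):
    """Classify the score trend for the report's 'direction' field."""
    if previous_score is None:
        return "baseline"
    delta = current_score - previous_score
    if abs(delta) < tolerance:  # assumed tolerance for "stable"
        return "stable"
    return "improving" if delta > 0 else "regressing"
```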
### Step 5: Generate Report
Write the final report with all consolidated data.
## Output
- **File**: `audit-report-{YYYY-MM-DD}.json`
- **Location**: `.workflow/.security/audit-report-{YYYY-MM-DD}.json`
- **Format**: JSON
```json
{
"report": "security-audit",
"version": "1.0",
"timestamp": "ISO-8601",
"date": "YYYY-MM-DD",
"mode": "comprehensive|quick-scan",
"score": {
"overall": 7.5,
"rating": "Good",
"gate": "PASS|FAIL",
"gate_threshold": 8
},
"findings_summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_phase": {
"supply_chain": 0,
"owasp": 0,
"stride": 0
},
"by_owasp": {
"A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
"A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
},
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
},
"top_risks": [
{
"rank": 1,
"title": "Most critical finding",
"severity": "critical",
"source_phase": "owasp",
"remediation": "How to fix",
"effort": "low|medium|high"
}
],
"trend": {
"previous_date": "YYYY-MM-DD or null",
"previous_score": 0,
"score_delta": 0,
"new_findings": 0,
"resolved_findings": 0,
"direction": "improving|stable|regressing|baseline"
},
"phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
"files_scanned": 0,
"remediation_priority": [
{
"priority": 1,
"finding": "Finding title",
"effort": "low",
"impact": "high",
"recommendation": "Specific action"
}
]
}
```
## Report Storage
```bash
# Ensure directory exists
mkdir -p .workflow/.security
# Write report with date stamp
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" ".workflow/.security/audit-report-${DATE}.json"
# Also maintain latest copies of phase outputs
cp "${WORK_DIR}/supply-chain-report.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/owasp-findings.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/threat-model.json" ".workflow/.security/" 2>/dev/null || true
```
## Completion
After report generation, output skill completion status per the Completion Status Protocol:
- **DONE**: All phases completed, report generated, score calculated
- **DONE_WITH_CONCERNS**: Report generated but score below target or regression detected
- **BLOCKED**: Phase data missing or corrupted


@@ -0,0 +1,442 @@
# OWASP Top 10 2021 Checklist
Code-level detection patterns, vulnerable code examples, and remediation templates for each OWASP category.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 2 | Reference during OWASP code review | All categories |
| Phase 4 | Classify findings by OWASP category | Category IDs |
---
## A01: Broken Access Control
**CWE**: CWE-200, CWE-284, CWE-285, CWE-352, CWE-639
### Detection Patterns
```bash
# Missing auth middleware on route handlers
grep -rnE 'app\.(get|post|put|delete|patch)\s*\(\s*["'\''/]' --include='*.ts' --include='*.js' .
# Then verify each route has auth middleware
# Direct object reference without ownership check
grep -rnE 'findById\(.*params|findOne\(.*params|\.get\(.*id' --include='*.ts' --include='*.js' --include='*.py' .
# Path traversal patterns
grep -rnE '(readFile|writeFile|createReadStream|open)\s*\(.*req\.' --include='*.ts' --include='*.js' .
grep -rnE 'os\.path\.join\(.*request\.' --include='*.py' .
# Missing CORS restrictions
grep -rnE 'Access-Control-Allow-Origin.*\*|cors\(\s*\)' --include='*.ts' --include='*.js' .
```
### Vulnerable Code Example
```javascript
// BAD: No ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
const doc = await Document.findById(req.params.id); // Any user can access any doc
res.json(doc);
});
```
### Remediation
```javascript
// GOOD: Ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
const doc = await Document.findOne({ _id: req.params.id, owner: req.user.id });
if (!doc) return res.status(404).json({ error: 'Not found' });
res.json(doc);
});
```
---
## A02: Cryptographic Failures
**CWE**: CWE-259, CWE-327, CWE-331, CWE-798
### Detection Patterns
```bash
# Weak hash algorithms
grep -rniE '(md5|sha1)\s*\(' --include='*.ts' --include='*.js' --include='*.py' --include='*.java' .
# Plaintext password storage
grep -rniE 'password\s*[:=]\s*.*\.(body|query|params)' --include='*.ts' --include='*.js' .
# Hardcoded encryption keys
grep -rniE '(encrypt|cipher|secret|key)\s*[:=]\s*["\x27][A-Za-z0-9+/=]{8,}' --include='*.ts' --include='*.js' --include='*.py' .
# HTTP (not HTTPS) for sensitive operations
grep -rniE 'http://.*\.(api|auth|login|payment)' --include='*.ts' --include='*.js' --include='*.py' .
# Missing encryption at rest
grep -rniE '(password|ssn|credit.?card|social.?security)' --include='*.sql' --include='*.prisma' --include='*.schema' .
```
### Vulnerable Code Example
```python
# BAD: MD5 for password hashing
import hashlib
password_hash = hashlib.md5(password.encode()).hexdigest()
```
### Remediation
```python
# GOOD: bcrypt with proper work factor
import bcrypt
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
```
---
## A03: Injection
**CWE**: CWE-20, CWE-74, CWE-79, CWE-89
### Detection Patterns
```bash
# SQL string concatenation/interpolation
grep -rniE "(query|execute|raw)\s*\(\s*[\`\"'].*(\+|\$\{|%s|\.format)" --include='*.ts' --include='*.js' --include='*.py' .
grep -rniE "f[\"'].*SELECT.*\{" --include='*.py' .
# NoSQL injection
grep -rniE '\$where|\$regex.*req\.' --include='*.ts' --include='*.js' .
grep -rniE 'find\(\s*\{.*req\.(body|query|params)' --include='*.ts' --include='*.js' .
# OS command injection
grep -rniE '(child_process|exec|execSync|spawn|system|popen|subprocess)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# XPath/LDAP injection
grep -rniE '(xpath|ldap).*\+.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# Template injection
grep -rniE '(render_template_string|Template\(.*req\.|eval\(.*req\.)' --include='*.py' --include='*.js' .
```
### Vulnerable Code Example
```javascript
// BAD: SQL string concatenation
const result = await db.query(`SELECT * FROM users WHERE id = ${req.params.id}`);
```
### Remediation
```javascript
// GOOD: Parameterized query
const result = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
```
---
## A04: Insecure Design
**CWE**: CWE-209, CWE-256, CWE-501, CWE-522
### Detection Patterns
```bash
# Missing rate limiting on auth endpoints
grep -rniE '(login|register|reset.?password|forgot.?password)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if rate limiting middleware is applied
# No account lockout mechanism
grep -rniE 'failed.?login|login.?attempt|max.?retries' --include='*.ts' --include='*.js' --include='*.py' .
# Business logic without validation
grep -rniE '(transfer|withdraw|purchase|delete.?account)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check for confirmation/validation steps
```
### Checks
- [ ] Authentication flows have rate limiting
- [ ] Account lockout after N failed attempts
- [ ] Multi-step operations have proper state validation
- [ ] Business-critical operations require confirmation
- [ ] Threat modeling has been performed (see Phase 3)
### Remediation
Implement defense-in-depth: rate limiting, input validation, business logic validation, and multi-step confirmation for critical operations.
---
## A05: Security Misconfiguration
**CWE**: CWE-2, CWE-11, CWE-13, CWE-15, CWE-16, CWE-388
### Detection Patterns
```bash
# Debug mode enabled
grep -rniE '(DEBUG|NODE_ENV)\s*[:=]\s*(true|True|1|"development"|"debug")' \
--include='*.env' --include='*.env.*' --include='*.py' --include='*.json' --include='*.yaml' .
# Default credentials
grep -rniE '(admin|root|test|default).*[:=].*password' --include='*.env' --include='*.yaml' --include='*.json' --include='*.py' .
# Verbose error responses (stack traces to client)
grep -rniE '(stack|stackTrace|traceback).*res\.(json|send)|app\.use.*err.*stack' --include='*.ts' --include='*.js' .
# Missing security headers
grep -rniE '(helmet|X-Frame-Options|X-Content-Type-Options|Strict-Transport-Security)' --include='*.ts' --include='*.js' .
# Directory listing enabled
grep -rniE 'autoindex\s+on|directory.?listing|serveStatic.*index.*false' --include='*.conf' --include='*.ts' --include='*.js' .
# Unnecessary features/services
grep -rniE '(graphiql|playground|swagger-ui).*true' --include='*.ts' --include='*.js' --include='*.py' --include='*.yaml' .
```
### Vulnerable Code Example
```javascript
// BAD: Stack trace in error response
app.use((err, req, res, next) => {
res.status(500).json({ error: err.message, stack: err.stack });
});
```
### Remediation
```javascript
// GOOD: Generic error response in production
app.use((err, req, res, next) => {
console.error(err.stack); // Log internally
res.status(500).json({ error: 'Internal server error' });
});
```
---
## A06: Vulnerable and Outdated Components
**CWE**: CWE-1104
### Detection Patterns
```bash
# Check dependency lock files age
ls -la package-lock.json yarn.lock requirements.txt Pipfile.lock go.sum 2>/dev/null
# Run package audits (from Phase 1)
npm audit --json 2>/dev/null
pip-audit --format json 2>/dev/null
# Check for pinned vs unpinned dependencies
grep -E ':\s*"\^|:\s*"~|:\s*"\*|>=\s' package.json 2>/dev/null
grep -E '^[a-zA-Z].*[^=]==[^=]' requirements.txt 2>/dev/null # Good: pinned
grep -E '^[a-zA-Z].*>=|^[a-zA-Z][^=]*$' requirements.txt 2>/dev/null # Bad: unpinned
```
### Checks
- [ ] All dependencies have pinned versions
- [ ] No known CVEs in dependencies (via audit tools)
- [ ] Dependencies are actively maintained (not abandoned)
- [ ] Lock files are committed to version control
### Remediation
Run `npm audit fix` or `pip install --upgrade` for vulnerable packages. Pin all dependency versions. Set up automated dependency scanning (Dependabot, Renovate).
---
## A07: Identification and Authentication Failures
**CWE**: CWE-255, CWE-259, CWE-287, CWE-384
### Detection Patterns
```bash
# Weak password requirements
grep -rniE 'password.*length.*[0-5]|minlength.*[0-5]|min.?length.*[0-5]' --include='*.ts' --include='*.js' --include='*.py' .
# Missing password hashing
grep -rniE 'password\s*[:=].*req\.' --include='*.ts' --include='*.js' .
# Then check if bcrypt/argon2/scrypt is used before storage
# Session fixation (no rotation after login)
grep -rniE 'session\.regenerate|session\.id\s*=' --include='*.ts' --include='*.js' .
# JWT without expiration
grep -rniE 'jwt\.sign\(' --include='*.ts' --include='*.js' .
# Then check for expiresIn option
# Credentials in URL
grep -rniE '(token|key|password|secret)=[^&\s]+' --include='*.ts' --include='*.js' --include='*.py' .
```
### Vulnerable Code Example
```javascript
// BAD: JWT without expiration
const token = jwt.sign({ userId: user.id }, SECRET);
```
### Remediation
```javascript
// GOOD: JWT with expiration and proper claims
const token = jwt.sign(
{ userId: user.id, role: user.role },
SECRET,
{ expiresIn: '1h', issuer: 'myapp', audience: 'myapp-client' }
);
```
---
## A08: Software and Data Integrity Failures
**CWE**: CWE-345, CWE-353, CWE-426, CWE-494, CWE-502
### Detection Patterns
```bash
# Insecure deserialization
grep -rniE '(pickle\.load|yaml\.load\(|unserialize|JSON\.parse\(.*req\.|eval\()' --include='*.py' --include='*.ts' --include='*.js' --include='*.php' .
# Missing integrity checks on downloads/updates
grep -rniE '(download|fetch|curl|wget)' --include='*.sh' --include='*.yaml' --include='*.yml' .
# Then check for checksum/signature verification
# CI/CD pipeline without pinned action versions
grep -rniE 'uses:\s*[^@]+$|uses:.*@(main|master|latest)' .github/workflows/*.yml 2>/dev/null
# Unsafe YAML loading
grep -rniE 'yaml\.load\(' --include='*.py' .
# Should be yaml.safe_load()
```
### Vulnerable Code Example
```python
# BAD: Unsafe YAML loading
import yaml
data = yaml.load(user_input) # Allows arbitrary code execution
```
### Remediation
```python
# GOOD: Safe YAML loading
import yaml
data = yaml.safe_load(user_input)
```
---
## A09: Security Logging and Monitoring Failures
**CWE**: CWE-223, CWE-532, CWE-778
### Detection Patterns
```bash
# Check for logging of auth events
grep -rniE '(log|logger|logging)\.' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if login/logout/failed-auth events are logged
# Sensitive data in logs
grep -rniE 'log.*(password|token|secret|credit.?card|ssn)' --include='*.ts' --include='*.js' --include='*.py' .
# Empty catch blocks (swallowed errors)
grep -rniE 'catch\s*\([^)]*\)\s*\{\s*\}' --include='*.ts' --include='*.js' .
# Missing audit trail for critical operations
grep -rniE '(delete|update|create|transfer)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if these operations are logged with user context
```
### Checks
- [ ] Failed login attempts are logged with IP and timestamp
- [ ] Successful logins are logged
- [ ] Access control failures are logged
- [ ] Input validation failures are logged
- [ ] Sensitive data is NOT logged (passwords, tokens, PII)
- [ ] Logs include sufficient context (who, what, when, where)
### Remediation
Implement structured logging with: user ID, action, timestamp, IP address, result (success/failure). Exclude sensitive data. Set up log monitoring and alerting for anomalous patterns.
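A minimal sketch of such a structured audit record (field names are illustrative; the sensitive-key filter drops rather than masks):

```python
import json
import logging
from datetime import datetime, timezone

SENSITIVE_FIELDS = {"password", "token", "secret", "ssn", "credit_card"}


def audit_log(logger: logging.Logger, user_id: str, action: str,
              ip: str, success: bool, **context) -> None:
    """Emit one structured audit record: who, what, when, where, result.
    Keys matching SENSITIVE_FIELDS are dropped, never logged."""
    record = {
        "user_id": user_id,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": ip,
        "result": "success" if success else "failure",
        **{k: v for k, v in context.items() if k.lower() not in SENSITIVE_FIELDS},
    }
    logger.info(json.dumps(record))
```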
---
## A10: Server-Side Request Forgery (SSRF)
**CWE**: CWE-918
### Detection Patterns
```bash
# User-controlled URLs in fetch/request calls
grep -rniE '(fetch|axios|http\.request|requests\.(get|post)|urllib)\s*\(.*req\.(body|query|params)' \
--include='*.ts' --include='*.js' --include='*.py' .
# URL construction from user input
grep -rniE '(url|endpoint|target|redirect)\s*[:=].*req\.(body|query|params)' --include='*.ts' --include='*.js' --include='*.py' .
# Image/file fetch from URL
grep -rniE '(download|fetchImage|getFile|loadUrl)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# Redirect without validation
grep -rniE 'res\.redirect\(.*req\.|redirect_to.*request\.' --include='*.ts' --include='*.js' --include='*.py' .
```
### Vulnerable Code Example
```javascript
// BAD: Unvalidated URL fetch
app.get('/proxy', async (req, res) => {
const response = await fetch(req.query.url); // Can access internal services
res.send(await response.text());
});
```
### Remediation
```javascript
// GOOD: URL allowlist validation
const ALLOWED_HOSTS = ['api.example.com', 'cdn.example.com'];
app.get('/proxy', async (req, res) => {
const url = new URL(req.query.url);
if (!ALLOWED_HOSTS.includes(url.hostname)) {
return res.status(400).json({ error: 'Host not allowed' });
}
if (url.protocol !== 'https:') {
return res.status(400).json({ error: 'HTTPS required' });
}
const response = await fetch(url.toString());
res.send(await response.text());
});
```
---
## Quick Reference
| ID | Category | Key Grep Pattern | Severity Baseline |
|----|----------|-----------------|-------------------|
| A01 | Broken Access Control | `findById.*params` without owner check | High |
| A02 | Cryptographic Failures | `md5\|sha1` for passwords | High |
| A03 | Injection | `query.*\+.*req\.\|f".*SELECT.*\{` | Critical |
| A04 | Insecure Design | Missing rate limit on auth routes | Medium |
| A05 | Security Misconfiguration | `DEBUG.*true\|stack.*res.json` | Medium |
| A06 | Vulnerable Components | `npm audit` / `pip-audit` results | Varies |
| A07 | Auth Failures | `jwt.sign` without `expiresIn` | High |
| A08 | Integrity Failures | `pickle.load\|yaml.load` | High |
| A09 | Logging Failures | Empty catch blocks, no auth logging | Medium |
| A10 | SSRF | `fetch.*req.query.url` | High |



@@ -0,0 +1,141 @@
# Scoring Gates
Defines the 10-point scoring system, severity weights, quality gates, and trend tracking format for security audits.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 1 | Quick-scan scoring (daily gate) | Severity Weights, Daily Gate |
| Phase 4 | Full audit scoring and reporting | All sections |
---
## 10-Point Scale
All security audit scores are on a 0-10 scale where 10 = no findings and 0 = critical exposure.
| Score | Rating | Description |
|-------|--------|-------------|
| 9.0 - 10.0 | Excellent | Minimal risk. Production-ready without reservations. |
| 7.0 - 8.9 | Good | Low risk. Acceptable for production with minor improvements. |
| 5.0 - 6.9 | Fair | Moderate risk. Remediation recommended before production. |
| 3.0 - 4.9 | Poor | High risk. Remediation required. Not production-ready. |
| 0.0 - 2.9 | Critical | Severe exposure. Immediate action required. |
## Severity Weights
Each finding is weighted by severity for score calculation.
| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| **Critical** | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| **High** | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| **Medium** | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| **Low** | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |
## Score Calculation
```
Input:
findings[] -- array of all findings with severity
files_scanned -- total source files analyzed
Algorithm:
base_score = 10.0
normalization = max(10, files_scanned)
weighted_sum = 0
for each finding:
weighted_sum += severity_weight(finding.severity)
penalty = weighted_sum / normalization
final_score = max(0, base_score - penalty)
final_score = round(final_score, 1)
return final_score
```
**Example**:
| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|----------|--------------|--------------|---------|-------|
| 1 critical | 50 | 10 | 0.2 | 9.8 |
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
| 20 critical | 20 | 200 | 10.0 | 0.0 |
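The algorithm above can be sketched in Python (a minimal illustration; severity weights taken from the table in this document):

```python
def severity_weight(severity: str) -> int:
    """Severity weights from the Severity Weights table."""
    return {"critical": 10, "high": 7, "medium": 4, "low": 1}[severity.lower()]


def audit_score(findings: list[dict], files_scanned: int) -> float:
    """0-10 audit score: 10 minus the weighted finding sum, normalized by codebase size."""
    normalization = max(10, files_scanned)
    weighted_sum = sum(severity_weight(f["severity"]) for f in findings)
    return round(max(0.0, 10.0 - weighted_sum / normalization), 1)
```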
## Quality Gates
### Daily Quick-Scan Gate
Applies to Phase 1 (Supply Chain Scan) only.
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 8.0 | Continue. No blocking issues. |
| **WARN** | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| **FAIL** | score < 6.0 | Block deployment. Remediate critical/high findings. |
### Comprehensive Audit Gate
Applies to full audit (all 4 phases).
**Initial/Baseline audit** (no previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 2.0 | Baseline established. Plan remediation. |
| **FAIL** | score < 2.0 | Critical exposure. Immediate triage required. |
**Subsequent audits** (previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= previous_score | No regression. Continue improvement. |
| **WARN** | score within 0.5 of previous | Marginal change. Review new findings. |
| **FAIL** | score < previous_score - 0.5 | Regression detected. Investigate new findings. |
**Production readiness target**: score >= 7.0
## Trend Tracking Format
Each audit report stores trend data for comparison.
```json
{
"trend": {
"current_date": "2026-03-29",
"current_score": 7.5,
"previous_date": "2026-03-22",
"previous_score": 6.8,
"score_delta": 0.7,
"new_findings": 2,
"resolved_findings": 5,
"direction": "improving",
"history": [
{ "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
{ "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
{ "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
]
}
}
```
**Direction values**:
| Direction | Condition |
|-----------|-----------|
| `improving` | score_delta > 0.5 |
| `stable` | -0.5 <= score_delta <= 0.5 |
| `regressing` | score_delta < -0.5 |
| `baseline` | No previous audit exists |
## Finding Deduplication
When the same vulnerability appears in multiple phases:
1. Keep the highest-severity classification
2. Merge evidence from all phases
3. Count as a single finding for scoring
4. Note all phases that detected it
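The four rules above might be sketched as follows; the finding shape (`id`, `severity`, `evidence`, `phase`) is a hypothetical structure assumed for illustration:

```python
SEVERITY_ORDER = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def deduplicate(findings: list[dict]) -> list[dict]:
    """Merge findings sharing an id: keep the highest severity, pool evidence,
    count once for scoring, and record every phase that detected the issue."""
    merged: dict[str, dict] = {}
    for f in findings:
        cur = merged.get(f["id"])
        if cur is None:
            merged[f["id"]] = {
                "id": f["id"],
                "severity": f["severity"],
                "evidence": list(f.get("evidence", [])),
                "phases": [f["phase"]],
            }
            continue
        if SEVERITY_ORDER[f["severity"]] > SEVERITY_ORDER[cur["severity"]]:
            cur["severity"] = f["severity"]             # keep highest severity
        cur["evidence"].extend(f.get("evidence", []))   # merge evidence from all phases
        if f["phase"] not in cur["phases"]:
            cur["phases"].append(f["phase"])            # note all detecting phases
    return list(merged.values())
```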


@@ -0,0 +1,105 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
allowed-tools: Read, Write, Bash, Glob, Grep
---
# Ship
Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.
## Key Design Principles
1. **Phase Gates**: Each phase must pass before the next begins — no shipping broken code
2. **Multi-Project Support**: Detects npm (package.json), Python (pyproject.toml), and generic (VERSION) projects
3. **AI-Powered Review**: Uses CCW CLI to run automated code review before release
4. **Audit Trail**: Each phase produces structured output for traceability
5. **Safe Defaults**: Warns on risky operations (direct push to main, major version bumps)
## Architecture Overview
```
User: "ship" / "release" / "publish"
|
v
┌──────────────────────────────────────────────────────────┐
│ Phase 1: Pre-Flight Checks │
│ → git clean? branch ok? tests pass? build ok? │
│ → Output: preflight-report.json │
│ → Gate: ALL checks must pass │
├──────────────────────────────────────────────────────────┤
│ Phase 2: Code Review │
│ → detect merge base, diff against base │
│ → ccw cli --tool gemini --mode analysis │
│ → flag high-risk changes │
│ → Output: review-summary │
│ → Gate: No critical issues flagged │
├──────────────────────────────────────────────────────────┤
│ Phase 3: Version Bump │
│ → detect version file (package.json/pyproject.toml/VERSION) │
│ → determine bump type from commits or user input │
│ → update version file │
│ → Output: version change record │
│ → Gate: Version updated successfully │
├──────────────────────────────────────────────────────────┤
│ Phase 4: Changelog & Commit │
│ → generate changelog from git log since last tag │
│ → update CHANGELOG.md │
│ → create release commit, push to remote │
│ → Output: commit SHA │
│ → Gate: Push successful │
├──────────────────────────────────────────────────────────┤
│ Phase 5: PR Creation │
│ → gh pr create with structured body │
│ → auto-link issues from commits │
│ → Output: PR URL │
│ → Gate: PR created │
└──────────────────────────────────────────────────────────┘
```
## Execution Flow
Execute phases sequentially. Each phase has a gate condition — if the gate fails, stop and report status.
1. **Phase 1**: [Pre-Flight Checks](phases/01-preflight-checks.md) -- Validate git state, branch, tests, build
2. **Phase 2**: [Code Review](phases/02-code-review.md) -- AI-powered diff review with risk assessment
3. **Phase 3**: [Version Bump](phases/03-version-bump.md) -- Detect and update version across project types
4. **Phase 4**: [Changelog & Commit](phases/04-changelog-commit.md) -- Generate changelog, create release commit, push
5. **Phase 5**: [PR Creation](phases/05-pr-creation.md) -- Create PR with structured body and issue links
## Pre-Flight Checklist (Quick Reference)
| Check | Command | Pass Condition |
|-------|---------|----------------|
| Git clean | `git status --porcelain` | Empty output |
| Branch | `git branch --show-current` | Not main/master |
| Tests | `npm test` / `pytest` | Exit code 0 |
| Build | `npm run build` / `python -m build` | Exit code 0 |
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in [SKILL-DESIGN-SPEC.md sections 13-14](../_shared/SKILL-DESIGN-SPEC.md#13-completion-status-protocol).
Every execution terminates with one of:
| Status | When |
|--------|------|
| **DONE** | All 5 phases completed, PR created |
| **DONE_WITH_CONCERNS** | PR created but with review warnings or non-critical issues |
| **BLOCKED** | A gate failed (dirty git, tests fail, push rejected) |
| **NEEDS_CONTEXT** | Cannot determine bump type, ambiguous branch target |
### Escalation
Follows the Three-Strike Rule (SKILL-DESIGN-SPEC section 14). On 3 consecutive failures at the same step, stop and output diagnostic dump.
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-preflight-checks.md](phases/01-preflight-checks.md) | Git, branch, test, build validation |
| [phases/02-code-review.md](phases/02-code-review.md) | AI-powered diff review |
| [phases/03-version-bump.md](phases/03-version-bump.md) | Version detection and bump |
| [phases/04-changelog-commit.md](phases/04-changelog-commit.md) | Changelog generation and release commit |
| [phases/05-pr-creation.md](phases/05-pr-creation.md) | PR creation with issue linking |
| [../_shared/SKILL-DESIGN-SPEC.md](../_shared/SKILL-DESIGN-SPEC.md) | Skill design spec (completion protocol, escalation) |


@@ -0,0 +1,121 @@
# Phase 1: Pre-Flight Checks
Validate that the repository is in a shippable state before proceeding with the release pipeline.
## Objective
- Confirm working tree is clean (no uncommitted changes)
- Validate current branch is appropriate for release
- Run test suite and confirm all tests pass
- Verify build succeeds
## Gate Condition
ALL four checks must pass. If any check fails, stop the pipeline and report BLOCKED status with the specific failure.
## Execution Steps
### Step 1: Git Clean Check
```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
echo "FAIL: Working tree is dirty"
echo "$git_status"
# Gate: BLOCKED — commit or stash changes first
else
echo "PASS: Working tree is clean"
fi
```
**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.
### Step 2: Branch Validation
```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
echo "WARN: Currently on $current_branch — direct push to main/master is risky"
# Ask user for confirmation before proceeding
else
echo "PASS: On branch $current_branch"
fi
```
**Pass condition**: Not on main/master, OR user explicitly confirms direct-to-main release.
**On warning**: Ask user to confirm they intend to release from main/master directly.
### Step 3: Test Suite Execution
Detect and run the project's test suite:
```bash
# Detection priority:
# 1. package.json with "test" script → npm test
# 2. pytest available and tests exist → pytest
# 3. No tests found → WARN and continue
if [ -f "package.json" ] && grep -q '"test"' package.json; then
npm test
elif command -v pytest &>/dev/null && [ -d "tests" -o -d "test" ]; then
pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
pytest
else
echo "WARN: No test suite detected — skipping test check"
fi
```
**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.
### Step 4: Build Verification
Detect and run the project's build step:
```bash
# Detection priority:
# 1. package.json with "build" script → npm run build
# 2. pyproject.toml → python -m build (if build module available)
# 3. Makefile with build target → make build
# 4. No build step → PASS (not all projects need a build)
if [ -f "package.json" ] && grep -q '"build"' package.json; then
npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
make build
else
echo "INFO: No build step detected — skipping build check"
fi
```
**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.
## Output
- **Format**: JSON object with pass/fail per check
- **Structure**:
```json
{
"phase": "preflight",
"timestamp": "ISO-8601",
"checks": {
"git_clean": { "status": "pass|fail", "details": "" },
"branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
"tests": { "status": "pass|fail|skip", "details": "" },
"build": { "status": "pass|fail|skip", "details": "" }
},
"overall": "pass|fail",
"blockers": []
}
```
## Next Phase
If all checks pass, proceed to [Phase 2: Code Review](02-code-review.md).
If any check fails, report BLOCKED status with the preflight report.


@@ -0,0 +1,137 @@
# Phase 2: Code Review
Automated AI-powered code review of changes since the base branch, with risk assessment.
## Objective
- Detect the merge base between current branch and target branch
- Generate diff for review
- Run AI-powered code review via CCW CLI
- Flag high-risk changes (large diffs, sensitive files, breaking changes)
## Gate Condition
No critical issues flagged by the review. Warnings are reported but do not block.
## Execution Steps
### Step 1: Detect Merge Base
```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"
# If on main/master directly, compare against last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
merge_base="$last_tag"
echo "On main — using last tag as base: $last_tag"
else
# Use first commit if no tags exist
merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
echo "No tags found — using initial commit as base"
fi
fi
```
### Step 2: Generate Diff Summary
```bash
# File-level summary
git diff --stat "$merge_base"...HEAD
# Full diff for review
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt
# Count changes for risk assessment
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
```
### Step 3: Risk Assessment
Flag high-risk indicators before AI review:
| Risk Factor | Threshold | Risk Level |
|-------------|-----------|------------|
| Files changed | > 50 | High |
| Lines changed | > 1000 | High |
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
| Migration files | `*migration*`, `*migrate*` | Medium |
```bash
# Check for sensitive file changes
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential' || true)
if [ -n "$sensitive_files" ]; then
echo "HIGH RISK: Sensitive files modified:"
echo "$sensitive_files"
fi
```
### Step 4: AI Code Review
Use CCW CLI for automated analysis:
```bash
ccw cli -p "PURPOSE: Review code changes for release readiness; success = all critical issues identified with file:line references
TASK: Review diff for bugs | Check for breaking changes | Identify security concerns | Assess test coverage gaps
MODE: analysis
CONTEXT: @**/* | Reviewing diff from $merge_base to HEAD ($files_changed files, +$lines_added/-$lines_removed lines)
EXPECTED: Risk assessment (low/medium/high), list of issues with severity and file:line, release recommendation (ship/hold/fix-first)
CONSTRAINTS: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes
" --tool gemini --mode analysis
```
**Note**: Wait for the CLI analysis to complete before proceeding. Do not proceed to Phase 3 while review is running.
### Step 5: Evaluate Review Results
Based on the AI review output:
| Review Result | Action |
|---------------|--------|
| No critical issues | Proceed to Phase 3 |
| Critical issues found | Report BLOCKED, list issues |
| Warnings only | Proceed with DONE_WITH_CONCERNS note |
| Review failed/timeout | Ask user whether to proceed or retry |
## Output
- **Format**: Review summary with risk assessment
- **Structure**:
```json
{
"phase": "code-review",
"merge_base": "commit-sha",
"stats": {
"files_changed": 0,
"lines_added": 0,
"lines_removed": 0
},
"risk_level": "low|medium|high",
"risk_factors": [],
"ai_review": {
"recommendation": "ship|hold|fix-first",
"critical_issues": [],
"warnings": []
},
"overall": "pass|fail|warn"
}
```
## Next Phase
If review passes (no critical issues), proceed to [Phase 3: Version Bump](03-version-bump.md).
If critical issues found, report BLOCKED status with review summary.


@@ -0,0 +1,171 @@
# Phase 3: Version Bump
Detect the current version, determine the bump type, and update the version file.
## Objective
- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change
## Gate Condition
Version file updated successfully with the new version.
## Execution Steps
### Step 1: Detect Version File
Detection priority order:
| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |
```bash
if [ -f "package.json" ]; then
version_file="package.json"
current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
version_file="pyproject.toml"
current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
version_file="VERSION"
current_version=$(cat VERSION | tr -d '[:space:]')
else
echo "NEEDS_CONTEXT: No version file found"
echo "Expected one of: package.json, pyproject.toml, VERSION"
# Ask user which file to use or create
fi
echo "Version file: $version_file"
echo "Current version: $current_version"
```
### Step 2: Determine Bump Type
**Auto-detection from commit messages** (conventional commits):
```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --oneline)
else
commits=$(git log --oneline -20)
fi
# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)
if [ -n "$has_breaking" ]; then
suggested_bump="major"
elif [ -n "$has_feat" ]; then
suggested_bump="minor"
else
suggested_bump="patch"
fi
echo "Suggested bump: $suggested_bump"
```
**User confirmation**:
- For `patch` and `minor`: proceed with suggested bump, inform user
- For `major`: always ask user to confirm before proceeding (major bumps have significant implications)
- User can override the suggestion with an explicit bump type
### Step 3: Calculate New Version
```bash
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"
case "$bump_type" in
major)
new_version="$((major + 1)).0.0"
;;
minor)
new_version="${major}.$((minor + 1)).0"
;;
patch)
new_version="${major}.${minor}.$((patch + 1))"
;;
esac
echo "Version bump: $current_version -> $new_version"
```
### Step 4: Update Version File
```bash
case "$version_file" in
package.json)
# Use node/jq for safe JSON update
jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
# Also update package-lock.json if it exists
if [ -f "package-lock.json" ]; then
jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
fi
;;
pyproject.toml)
# Use sed for TOML update (version line in [project] or [tool.poetry])
sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
;;
VERSION)
echo "$new_version" > VERSION
;;
esac
echo "Updated $version_file: $current_version -> $new_version"
```
### Step 5: Verify Update
```bash
# Re-read to confirm
case "$version_file" in
package.json)
verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
;;
pyproject.toml)
verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
;;
VERSION)
verified=$(tr -d '[:space:]' < VERSION)
;;
esac
if [ "$verified" = "$new_version" ]; then
echo "PASS: Version verified as $new_version"
else
echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```
## Output
- **Format**: Version change record
- **Structure**:
```json
{
"phase": "version-bump",
"version_file": "package.json",
"previous_version": "1.2.3",
"new_version": "1.3.0",
"bump_type": "minor",
"bump_source": "auto-detected|user-specified",
"overall": "pass|fail"
}
```
## Next Phase
If version updated successfully, proceed to [Phase 4: Changelog & Commit](04-changelog-commit.md).
If version update fails, report BLOCKED status.


@@ -0,0 +1,167 @@
# Phase 4: Changelog & Commit
Generate changelog entry from git history, update CHANGELOG.md, create release commit, and push to remote.
## Objective
- Parse git log since last tag into grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with version in the message
- Push the branch to remote
## Gate Condition
Release commit created and pushed to remote successfully.
## Execution Steps
### Step 1: Gather Commits Since Last Tag
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
echo "Generating changelog since tag: $last_tag"
git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
echo "No previous tag found — using last 50 commits"
git log --pretty=format:"%h %s" --no-merges -50
fi
```
### Step 2: Group Commits by Conventional Commit Type
Parse commit messages and group into categories:
| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):`| Features | **Features** |
| `fix:` / `fix(*):`| Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |
```bash
# Example grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since last tag
# 2. Parse prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```
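One way to express that logic in shell (a sketch only; `categorize` is an illustrative name, and the agent is not required to implement grouping this way):

```shell
# Map a conventional-commit subject line to its changelog category.
# Returns "SKIP" for test: commits, which are omitted from the changelog.
categorize() {
  case "$1" in
    feat:*|feat\(*\):*)         echo "Features" ;;
    fix:*|fix\(*\):*)           echo "Bug Fixes" ;;
    perf:*|perf\(*\):*)         echo "Performance" ;;
    docs:*|docs\(*\):*)         echo "Documentation" ;;
    refactor:*|refactor\(*\):*) echo "Refactoring" ;;
    chore:*|chore\(*\):*)       echo "Maintenance" ;;
    test:*|test\(*\):*)         echo "SKIP" ;;
    *)                          echo "Other Changes" ;;
  esac
}
```

Empty categories are then dropped when the markdown sections are assembled.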
### Step 3: Format Changelog Entry
Generate a markdown changelog entry:
```markdown
## [X.Y.Z] - YYYY-MM-DD
### Features
- feat: description (sha)
- feat(scope): description (sha)
### Bug Fixes
- fix: description (sha)
### Performance
- perf: description (sha)
### Other Changes
- chore: description (sha)
```
Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category
### Step 4: Update CHANGELOG.md
```bash
if [ -f "CHANGELOG.md" ]; then
# Insert new entry after the first heading line (# Changelog)
# The new entry goes between the main heading and the previous version entry
# Use Write tool to insert the new section at the correct position
echo "Updating existing CHANGELOG.md"
else
# Create new CHANGELOG.md with header
echo "Creating new CHANGELOG.md"
fi
```
**CHANGELOG.md structure**:
```markdown
# Changelog
## [X.Y.Z] - YYYY-MM-DD
(new entry here)
## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```
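If the insertion is done in shell rather than via the Write tool, a minimal sketch (assuming the file's first line is the `# Changelog` heading; `insert_entry` is an illustrative helper, not part of the skill):

```shell
# Insert a new release entry immediately after the first line of the
# changelog, keeping all previous entries below it.
insert_entry() {
  file="$1"; entry="$2"
  awk -v entry="$entry" '
    NR == 1 { print; print ""; print entry; next }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Note that `awk -v` interprets backslash escapes in the entry text, so entries containing backslashes would need a different passing mechanism.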
### Step 5: Create Release Commit
```bash
# Stage only the release files that actually exist
# (git add aborts entirely if any listed pathspec matches nothing)
for f in package.json package-lock.json pyproject.toml VERSION CHANGELOG.md; do
  [ -f "$f" ] && git add "$f"
done
# Create release commit
git commit -m "$(cat <<'EOF'
chore: bump version to X.Y.Z
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```
**Commit message format**: `chore: bump version to X.Y.Z`
- Follows conventional commit format
- Includes Co-Authored-By trailer
### Step 6: Push to Remote
```bash
current_branch=$(git branch --show-current)
# Check if remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
git push origin "$current_branch"
else
git push -u origin "$current_branch"
fi
```
**On push failure**:
- If rejected (non-fast-forward): Report BLOCKED, suggest `git pull --rebase`
- If permission denied: Report BLOCKED, check remote access
- If no remote configured: Report BLOCKED, suggest `git remote add`
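That classification can be sketched as follows (the matched strings are heuristics only; exact git error wording varies across versions and hosts, and `classify_push_error` is an illustrative name):

```shell
# Map a captured `git push` error message to a BLOCKED reason.
classify_push_error() {
  case "$1" in
    *non-fast-forward*|*"fetch first"*) echo "rebase-needed" ;;
    *"Permission denied"*|*"403"*)      echo "no-access" ;;
    *"No configured push destination"*) echo "no-remote" ;;
    *)                                  echo "unknown" ;;
  esac
}
```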
## Output
- **Format**: Commit and push record
- **Structure**:
```json
{
"phase": "changelog-commit",
"changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
"commit_sha": "abc1234",
"commit_message": "chore: bump version to X.Y.Z",
"pushed_to": "origin/branch-name",
"overall": "pass|fail"
}
```
## Next Phase
If commit and push succeed, proceed to [Phase 5: PR Creation](05-pr-creation.md).
If push fails, report BLOCKED status with error details.


@@ -0,0 +1,163 @@
# Phase 5: PR Creation
Create a pull request via GitHub CLI with a structured body, linked issues, and release metadata.
## Objective
- Create a PR using `gh pr create` with structured body
- Auto-link related issues from commit messages
- Include release summary (version, changes, test plan)
- Output the PR URL
## Gate Condition
PR created successfully and URL returned.
## Execution Steps
### Step 1: Extract Issue References from Commits
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
commits=$(git log --pretty=format:"%s" --no-merges -50)
fi
# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '#[0-9]+' | sort -u || true)
echo "Referenced issues: $issues"
```
### Step 2: Determine Target Branch
```bash
# Default target: main (fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
current_branch=$(git branch --show-current)
echo "PR: $current_branch -> $target_branch"
```
### Step 3: Build PR Title
Format: `release: vX.Y.Z`
```bash
pr_title="release: v${new_version}"
```
If the version context is not available, fall back to a descriptive title from the branch name.
### Step 4: Build PR Body
Construct the PR body using a HEREDOC for correct formatting:
```bash
# Gather change summary since the merge base with the target branch
merge_base=$(git merge-base "origin/$target_branch" HEAD)
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)
# Build linked issues section
if [ -n "$issues" ]; then
issues_section="## Linked Issues
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
else
issues_section=""
fi
```
### Step 5: Create PR via gh CLI
```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<'EOF'
## Summary
Release vX.Y.Z
### Changes
- list of changes from changelog
## Linked Issues
- #N (fixes)
- #M (closes)
## Version
- Previous: X.Y.Z-1
- New: X.Y.Z
- Bump type: patch|minor|major
## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully
Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
**PR body sections**:
| Section | Content |
|---------|---------|
| **Summary** | Version being released, one-line description |
| **Changes** | Grouped changelog entries (from Phase 4) |
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
| **Version** | Previous version, new version, bump type |
| **Test Plan** | Checklist confirming all phases passed |
### Step 6: Capture and Report PR URL
```bash
# gh pr create prints the PR URL on success; capture the last output line
pr_url=$(gh pr create ... 2>&1 | tail -1)
# If stdout parsing proves brittle, `gh pr view --json url -q .url` also works
echo "PR created: $pr_url"
```
## Output
- **Format**: PR creation record
- **Structure**:
```json
{
"phase": "pr-creation",
"pr_url": "https://github.com/owner/repo/pull/N",
"pr_title": "release: vX.Y.Z",
"target_branch": "main",
"source_branch": "feature-branch",
"linked_issues": ["#1", "#2"],
"overall": "pass|fail"
}
```
## Completion
After PR creation, output the final Completion Status:
```
## STATUS: DONE
**Summary**: Released vX.Y.Z — PR created at {pr_url}
### Details
- Phases completed: 5/5
- Version: {previous} -> {new} ({bump_type})
- PR: {pr_url}
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created
### Outputs
- CHANGELOG.md (updated)
- {version_file} (version bumped)
- Release commit: {sha}
- PR: {pr_url}
```
If there were review warnings, use `DONE_WITH_CONCERNS` and list the warnings in the Details section.


@@ -0,0 +1,392 @@
# Investigator Agent
Executes all 5 phases of the systematic debugging investigation under the Iron Law methodology. Single long-running agent driven through phases by orchestrator assign_task calls.
## Identity
- **Type**: `investigation`
- **Role File**: `~/.codex/skills/investigate/agents/investigator.md`
- **task_name**: `investigator`
- **Responsibility**: Full 5-phase investigation execution — evidence collection, pattern search, hypothesis testing, minimal fix, verification
- **fork_context**: false
- **Reasoning Effort**: high
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern before any phase execution
- Read the phase file at the start of each phase before executing that phase's steps
- Collect concrete evidence before forming any theories (evidence-first)
- Check `confirmed_root_cause` exists before executing Phase 4 (Iron Law gate)
- Track 3-strike counter accurately in Phase 3
- Implement only minimal fix — change only what addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Write the final debug report to `.workflow/.debug/` using the schema in `~/.codex/skills/investigate/specs/debug-report-format.md`
- Produce structured output after each phase, then await next assign_task
### MUST NOT
- Skip MANDATORY FIRST STEPS role loading
- Proceed to Phase 4 without `confirmed_root_cause` (Iron Law violation)
- Modify production code during Phases 1-3 (read-only investigation)
- Count a rejected hypothesis as a strike if it yielded new actionable insight
- Refactor, add features, or change formatting beyond the minimal fix
- Change more than 3 files without written justification
- Proceed past Phase 3 BLOCKED status
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Shell execution | Run tests, reproduce bug, detect test framework, run full test suite |
| `Read` | File read | Read source files, test files, phase docs, role files |
| `Write` | File write | Write debug report to `.workflow/.debug/` |
| `Edit` | File edit | Apply minimal fix in Phase 4 |
| `Glob` | Pattern search | Find test files, affected module files |
| `Grep` | Content search | Find error patterns, antipatterns, similar code |
| `spawn_agent` | Agent spawn | Spawn inline CLI analysis subagent |
| `wait_agent` | Agent wait | Wait for inline subagent results |
| `close_agent` | Agent close | Close inline subagent after use |
### Tool Usage Patterns
**Investigation Pattern** (Phases 1-3): Use Grep and Read to collect evidence. No Write or Edit.
**Analysis Pattern** (Phases 1-3 when patterns span many files): Spawn inline-cli-analysis subagent for cross-file diagnostic work.
**Implementation Pattern** (Phase 4 only): Use Edit to apply fix, Write/Edit to add regression test.
**Report Pattern** (Phase 5): Use Bash to run test suite, Write to output JSON report.
---
## Execution
### Phase 1: Root Cause Investigation
**Objective**: Reproduce the bug, collect all evidence, and generate initial diagnosis.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptoms, error messages, context |
| Phase file | Yes | `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` before executing.
2. Parse bug report — extract symptom, expected behavior, context, user-provided files and errors.
3. Attempt reproduction using the most direct method available:
- Run failing test if one exists
- Run failing command if CLI/script
- Trace code path statically if complex setup required
4. Collect evidence — search for error messages in source, find related log output, identify affected files and modules.
5. Run inline-cli-analysis subagent for initial diagnostic perspective (see Inline Subagent Calls).
6. Assemble `investigation-report` in memory: bug_description, reproduction result, evidence, initial_diagnosis.
7. Output Phase 1 summary and await assign_task for Phase 2.
**Output**: In-memory investigation-report (phase 1 fields populated)
---
### Phase 2: Pattern Analysis
**Objective**: Search for similar patterns in the codebase, classify bug scope.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 2 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/02-pattern-analysis.md` |
| investigation-report | Yes | Phase 1 output in context |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/02-pattern-analysis.md` before executing.
2. Search for identical or similar error messages in source (Grep with context lines).
3. Search for the same exception/error type across the codebase.
4. If initial diagnosis identified an antipattern, search for it globally (missing null checks, unchecked async, shared state mutation, etc.).
5. Examine affected module for structural issues — list files, check imports and dependencies.
6. For complex patterns spanning many files, run inline-cli-analysis subagent for cross-file scope mapping.
7. Classify scope: `isolated` | `module-wide` | `systemic` with justification.
8. Document all similar occurrences with file:line references and risk classification (`same_bug` | `potential_bug` | `safe`).
9. Add `pattern_analysis` section to investigation-report in memory.
10. Output Phase 2 summary and await assign_task for Phase 3.
**Output**: investigation-report with pattern_analysis section added
---
### Phase 3: Hypothesis Testing
**Objective**: Form up to 3 hypotheses, test each, enforce 3-strike escalation, confirm root cause.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 3 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` |
| investigation-report | Yes | Phase 1-2 output in context |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` before executing.
2. Form up to 3 ranked hypotheses from Phase 1-2 evidence. Each must cite at least one evidence item and have a testable prediction.
3. Initialize strike counter at 0.
4. Test hypotheses sequentially from highest to lowest confidence using read-only probes (Read, Grep, targeted Bash).
5. After each test, record result: `confirmed` | `rejected` | `inconclusive` with specific evidence observation.
**Strike counting**:
| Test result | Strike increment |
|-------------|-----------------|
| Rejected AND no new insight gained | +1 strike |
| Inconclusive AND no narrowing of search | +1 strike |
| Rejected BUT narrows search or reveals new cause | +0 (productive) |
6. If strike counter reaches 3 — STOP immediately. Output escalation block (see 3-Strike Escalation Output below). Set status BLOCKED.
7. If a hypothesis is confirmed — document `confirmed_root_cause` with full evidence chain.
8. Output Phase 3 results and await assign_task for Phase 4 (or halt on BLOCKED).
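The strike accounting above can be expressed as a small helper (a sketch; `next_strikes` is an illustrative name, and judging whether a failed test was productive remains the agent's call):

```shell
# Apply the strike table: only unproductive rejected/inconclusive tests count.
# Usage: next_strikes <current_strikes> <result> <productive: 0|1>
next_strikes() {
  strikes="$1"; result="$2"; productive="$3"
  if [ "$result" != "confirmed" ] && [ "$productive" -eq 0 ]; then
    strikes=$((strikes + 1))
  fi
  echo "$strikes"
}
```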
**3-Strike Escalation Output**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>
### Error History
1. Attempt 1: <H1 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>
### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation>
### Diagnostic Dump
<Full investigation-report content>
STATUS: BLOCKED
```
**Output**: investigation-report with hypothesis_tests and confirmed_root_cause (or BLOCKED escalation)
---
### Phase 4: Implementation
**Objective**: Verify Iron Law gate, implement minimal fix, add regression test.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 4 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/04-implementation.md` |
| investigation-report | Yes | Must contain confirmed_root_cause |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/04-implementation.md` before executing.
2. **Iron Law Gate Check** — verify `confirmed_root_cause` is present in investigation-report:
| Condition | Action |
|-----------|--------|
| confirmed_root_cause present | Proceed to Step 3 |
| confirmed_root_cause absent | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. |
3. Plan the minimal fix before writing any code. Document: description, files to change, change types, estimated lines.
| Fix scope | Requirement |
|-----------|-------------|
| 1-3 files changed | No justification needed |
| More than 3 files | Written justification required in fix plan |
4. Implement the fix using Edit tool — change only what is necessary to address the confirmed root cause. No refactoring, no style changes to unrelated code.
5. Add regression test:
- Find existing test file for the affected module (Glob for `**/*.test.{ts,js,py}` or `**/test_*.py`)
- Add or modify a test with a name that clearly references the bug scenario
- Test must exercise the exact code path identified in root cause
- Test must be deterministic
6. Re-run the original reproduction case from Phase 1. Verify it now passes.
7. Add `fix_applied` section to investigation-report in memory.
8. Output Phase 4 summary and await assign_task for Phase 5.
**Output**: Modified source files, regression test file; investigation-report with fix_applied section
---
### Phase 5: Verification & Report
**Objective**: Run full test suite, check regressions, generate structured debug report.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 5 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/05-verification-report.md` |
| investigation-report | Yes | All phases populated |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/05-verification-report.md` before executing.
2. Detect and run the project's test framework:
- Check for `package.json` (npm test)
- Check for `pytest.ini` / `pyproject.toml` (pytest)
- Check for `go.mod` (go test)
- Check for `Cargo.toml` (cargo test)
3. Record test results: total, passed, failed, skipped. Note if regression test passed.
4. Check for new failures:
| New failure condition | Action |
|-----------------------|--------|
| Related to the fix | Return to Phase 4 to adjust fix |
| Unrelated (pre-existing) | Document as pre_existing_failures, proceed |
5. Generate debug report JSON following schema in `~/.codex/skills/investigate/specs/debug-report-format.md`. Populate all required fields from investigation-report phases.
6. Create output directory and write report:
```
Bash: mkdir -p .workflow/.debug
```
Filename: `.workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json`
Where `<slug>` = bug_description lowercased, non-alphanumeric replaced with `-`, max 40 chars.
7. Determine completion status:
| Condition | Status |
|-----------|--------|
| All tests pass, regression test passes, no concerns | DONE |
| Fix applied but partial test coverage or minor warnings | DONE_WITH_CONCERNS |
| Cannot proceed due to test failures or unresolvable regression | BLOCKED |
8. Output completion status block.
**Output**: `.workflow/.debug/debug-report-<date>-<slug>.json`
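The slug rule from Step 6 can be sketched as (`slugify` is an illustrative name; this variant also collapses runs of separators and trims leading/trailing dashes):

```shell
# Lowercase, replace non-alphanumeric runs with '-', cap at 40 chars.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-40
}
```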
---
## Inline Subagent Calls
This agent spawns a utility subagent for cross-file diagnostic analysis during Phases 1, 2, and 3 when analysis spans many files or requires broader diagnostic perspective.
### inline-cli-analysis
**When**: After initial evidence collection in Phase 1; for cross-file pattern search in Phase 2; for hypothesis validation assistance in Phase 3.
**Agent File**: `~/.codex/agents/cli-explore-agent.md`
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
<analysis task description — e.g.:
PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on <affected_module>>
Expected: Structured findings with file:line references`
})
const result = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
Substitute the analysis task description with phase-appropriate content:
- Phase 1: Initial diagnosis from error evidence
- Phase 2: Cross-file pattern search and scope mapping
- Phase 3: Hypothesis validation assistance
### Result Handling
| Result | Action |
|--------|--------|
| Success | Integrate findings into investigation-report, continue |
| Timeout / Error | Continue without subagent result, log warning in investigation-report |
---
## Structured Output Template
After each phase, output the following structure before awaiting the next assign_task:
```
## Phase <N> Complete
### Summary
- <one-sentence status of what was accomplished>
### Findings
- <Finding 1>: <specific description with file:line reference>
- <Finding 2>: <specific description with file:line reference>
### Investigation Report Update
- Fields updated: <list of fields added/modified this phase>
- Key data: <most important finding from this phase>
### Status
<AWAITING_NEXT_PHASE | BLOCKED: <reason> | DONE>
```
Final Phase 5 output follows Completion Status Protocol:
```
## STATUS: DONE
**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>
### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>
### Outputs
- Debug report: <reportPath>
- Files changed: <list>
- Tests added: <list>
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Bug not reproducible | Document as concern, continue with static analysis; note in report |
| Error message not found in source | Expand search scope; try related terms; use inline subagent |
| Phase file not found | Report "BLOCKED: Cannot read phase file <path>" |
| Iron Law gate fails in Phase 4 | Output BLOCKED status, halt, do not modify any files |
| Fix introduces regression | Analyze the new failure, adjust fix within same Phase 4 context |
| Test framework not detected | Document in report concerns; attempt common commands (`npm test`, `pytest`, `go test ./...`) |
| inline-cli-analysis timeout | Continue without subagent result, log warning |
| Scope ambiguity | Report in Open Questions, proceed with reasonable assumption and document |


@@ -0,0 +1,362 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
agents: investigator
phases: 5
---
# Investigate
Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.
## Architecture
```
+--------------------------------------------------------------+
| investigate Orchestrator |
| -> Drive investigator agent through 5 sequential phases |
+----------------------------+---------------------------------+
|
spawn_agent (Phase 1 initial task)
|
v
+------------------+
| investigator |
| (single agent, |
| 5-phase loop) |
+------------------+
| ^ |
assign_task | | | assign_task
(Phase 2-5) v | v (Phase 3 gate check)
+------------------+
| Phase 1: Root |
| Phase 2: Pattern |
| Phase 3: Hyp. | <-- Gate: BLOCKED?
| Phase 4: Impl. | <-- Iron Law gate
| Phase 5: Report |
+------------------+
|
v
.workflow/.debug/debug-report-*.json
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|-------------|
| investigator | `investigator` | `~/.codex/skills/investigate/agents/investigator.md` | Full 5-phase investigation execution | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|-------------|-----------|-----------|
| investigator | `investigator` | false | — | Starts fresh; receives all phase context via assign_task messages. No prior conversation history needed. |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|-------------|--------|
| investigator spawned (Phase 1) | false | Clean context; full task description in message |
| Phase 2-5 transitions | N/A | assign_task used, agent already running |
---
## Subagent Registry
Utility subagents callable by the investigator agent during analysis phases:
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-cli-analysis | `~/.codex/agents/cli-explore-agent.md` | investigator | Cross-file diagnostic analysis (replaces ccw cli calls) | haiku |
> Subagents are spawned by the investigator within its own execution context (Pattern 2.8), not by the orchestrator.
---
## Phase Execution
### Phase 1: Root Cause Investigation
**Objective**: Spawn the investigator agent and assign the Phase 1 investigation task. Agent reproduces the bug, collects evidence, and runs initial diagnosis.
**Input**:
| Source | Description |
|--------|-------------|
| User message | Bug description, symptom, context, error messages |
**Execution**:
Build the initial spawn message embedding the bug report and Phase 1 instructions, then spawn the investigator:
```
spawn_agent({
task_name: "investigator",
fork_context: false,
message: `## TASK ASSIGNMENT
### MANDATORY FIRST STEPS (Agent Execute)
1. Read role definition: ~/.codex/skills/investigate/agents/investigator.md (MUST read first)
2. Read: ~/.codex/skills/investigate/phases/01-root-cause-investigation.md
---
## Phase 1: Root Cause Investigation
Bug Report:
<user-provided bug description, symptoms, error messages, context>
Execute Phase 1 per the phase file. Produce investigation-report (in-memory) and report back with:
- Phase 1 complete summary
- bug_description, reproduction result, evidence collected, initial diagnosis
- Await next phase assignment.`
})
const p1Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p1Result | Phase 1 completion summary with evidence, reproduction, initial diagnosis |
---
### Phase 2: Pattern Analysis
**Objective**: Assign Phase 2 to the running investigator. Agent searches codebase for similar patterns and classifies bug scope.
**Input**:
| Source | Description |
|--------|-------------|
| p1Result | Phase 1 output — evidence, affected files, initial suspects |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 2: Pattern Analysis
Read: ~/.codex/skills/investigate/phases/02-pattern-analysis.md
Using your Phase 1 findings, execute Phase 2:
- Search for similar error patterns across the codebase
- Search for the same antipattern if identified
- Classify scope: isolated | module-wide | systemic
- Document all occurrences with file:line references
Report back with pattern_analysis section and scope classification. Await next phase assignment.`
}]
})
const p2Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p2Result | Pattern analysis section: scope classification, similar occurrences, scope justification |
---
### Phase 3: Hypothesis Testing
**Objective**: Assign Phase 3 to the investigator. Agent forms and tests up to 3 hypotheses. Orchestrator checks output for `BLOCKED` marker before proceeding.
**Input**:
| Source | Description |
|--------|-------------|
| p2Result | Pattern analysis results |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 3: Hypothesis Testing
Read: ~/.codex/skills/investigate/phases/03-hypothesis-testing.md
Using Phase 1-2 evidence, execute Phase 3:
- Form up to 3 ranked hypotheses, each citing evidence
- Test each hypothesis with read-only probes
- Track 3-strike counter — if 3 consecutive unproductive failures: STOP and output ESCALATION block with BLOCKED status
- If a hypothesis is confirmed: output confirmed_root_cause with full evidence chain
Report back with hypothesis test results and either:
confirmed_root_cause (proceed to Phase 4)
OR BLOCKED: <escalation dump> (halt)`
}]
})
const p3Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```
**Phase 3 Gate Decision**:
| Condition | Action |
|-----------|--------|
| p3Result contains `confirmed_root_cause` | Proceed to Phase 4 |
| p3Result contains `BLOCKED` | Halt workflow, output escalation dump to user, close investigator |
| p3Result contains `ESCALATION: 3-Strike Limit Reached` | Halt workflow, output diagnostic dump, close investigator |
| Timeout | assign_task "Finalize Phase 3 results now", re-wait 120s; if still timeout → halt |
If BLOCKED: close investigator and surface the diagnostic dump to the user. Do not proceed to Phase 4.
---
### Phase 4: Implementation
**Objective**: Assign Phase 4 only after confirmed root cause. Agent implements minimal fix and adds regression test.
**Input**:
| Source | Description |
|--------|-------------|
| p3Result | confirmed_root_cause with evidence chain, affected file:line |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 4: Implementation
Read: ~/.codex/skills/investigate/phases/04-implementation.md
Iron Law gate confirmed — proceed with implementation:
- Verify confirmed_root_cause is present in your context (gate check)
- Plan the minimal fix before writing any code
- Implement only what is necessary to fix the confirmed root cause
- Add regression test: must fail without fix, pass with fix
- Verify fix against original reproduction case from Phase 1
Report back with fix_applied section. Await Phase 5 assignment.`
}]
})
const p4Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p4Result | fix_applied section: files changed, regression test details, reproduction verified |
---
### Phase 5: Verification & Report
**Objective**: Assign Phase 5 to run the full test suite and generate the structured debug report.
**Input**:
| Source | Description |
|--------|-------------|
| p4Result | fix_applied details — files changed, regression test |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 5: Verification & Report
Read: ~/.codex/skills/investigate/phases/05-verification-report.md
Final phase:
- Run full test suite (detect framework: npm test / pytest / go test / cargo test)
- Verify the regression test passes
- Check for new failures introduced by the fix
- Generate structured debug report per specs/debug-report-format.md
- Write report to .workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json
- Output completion status: DONE | DONE_WITH_CONCERNS | BLOCKED`
}]
})
const p5Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p5Result | Completion status, test suite results, path to debug report file |
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1 (spawn + wait) | 300000 ms | assign_task "Finalize Phase 1 now" + wait 120s; if still timeout → halt |
| Phase 2 (assign + wait) | 300000 ms | assign_task "Finalize Phase 2 now" + wait 120s; if still timeout → halt |
| Phase 3 (assign + wait) | 480000 ms | assign_task "Finalize Phase 3 now" + wait 120s; if still timeout → halt BLOCKED |
| Phase 4 (assign + wait) | 480000 ms | assign_task "Finalize Phase 4 now" + wait 120s; if still timeout → halt |
| Phase 5 (assign + wait) | 300000 ms | assign_task "Finalize Phase 5 now" + wait 120s; if still timeout → partial report |
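The two-step protocol above (nudge once with a "Finalize now" instruction, then re-wait a 120 s grace window) can be sketched as a small helper. This is an illustrative sketch only: `waitFn` and `nudgeFn` stand in for the `wait_agent` / `assign_task` primitives, which are orchestrator pseudocode, not a real JavaScript API.

```javascript
// Wait for an agent, nudging once on timeout before halting the phase.
// waitFn(timeoutMs) resolves with a result or rejects on timeout;
// nudgeFn(text) delivers the "Finalize now" instruction to the agent.
async function waitWithGrace(waitFn, nudgeFn, phase, timeoutMs, graceMs = 120000) {
  try {
    return { status: "ok", result: await waitFn(timeoutMs) };
  } catch (firstTimeout) {
    // First timeout: tell the agent to finalize, then re-wait once.
    await nudgeFn(`Finalize Phase ${phase} now`);
    try {
      return { status: "ok", result: await waitFn(graceMs) };
    } catch (secondTimeout) {
      // Second timeout: halt this phase per the timeout protocol.
      return { status: "halt", result: null };
    }
  }
}
```

A single nudge keeps the protocol bounded: at most one retry per phase, never an open-ended wait.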
### Cleanup Protocol
At workflow end (success or halt), close the investigator agent:
```
close_agent({ target: "investigator" })
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output results" + re-wait 120000 ms |
| Agent timeout (second) | close_agent, report partial results to user |
| Phase 3 BLOCKED | close_agent, surface full escalation dump to user, halt |
| Phase 4 Iron Law violation | close_agent, report "Cannot proceed: no confirmed root cause" |
| Phase 4 introduces regression | Investigator adjusts the fix; orchestrator re-waits the same phase |
| User cancellation | close_agent({ target: "investigator" }), report current state |
| send_message ignored | Escalate to assign_task |
---
## Output Format
```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)
## Results
- Root cause: <confirmed root cause description>
- Fix: <what was changed>
- Regression test: <test name in test file>
## Artifacts
- File: .workflow/.debug/debug-report-<date>-<slug>.json
- Description: Full structured investigation report
## Next Steps (if DONE_WITH_CONCERNS or BLOCKED)
1. <recommended follow-up action>
2. <recommended follow-up action>
```

# Phase 1: Root Cause Investigation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Reproduce the bug and collect all available evidence before forming any theories.
## Objective
- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use inline CLI analysis for initial diagnosis
## Input
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptom, expected behavior, context, user-provided errors |
| User-provided files | Optional | Any files or paths the user mentioned as relevant |
## Execution Steps
### Step 1: Parse the Bug Report
Extract the following from the user's description:
- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)
- **User-provided files**: Any files mentioned
- **User-provided errors**: Any error messages provided
Assemble the extracted fields as the initial `investigation-report` structure in memory:
```
bugReport = {
symptom: <extracted from description>,
expected_behavior: <what should happen>,
context: <when/where it occurs>,
user_provided_files: [<files mentioned>],
user_provided_errors: [<error messages>]
}
```
---
### Step 2: Reproduce the Bug
Attempt reproduction using the most direct method available:
| Method | When to use |
|--------|-------------|
| Run failing test | A specific failing test is known or can be identified |
| Run failing command | Bug is triggered by a CLI command or script |
| Static code path trace | Reproduction requires complex setup; use Read + Grep to trace the path |
Execution for each method:
**Run failing test**:
```
Bash: <detect test runner and run the specific failing test>
```
**Run failing command**:
```
Bash: <execute the command that triggers the bug>
```
**Static code path trace**:
- Use Grep to find the error message text in source
- Use Read to trace the code path that produces the error
- Document the theoretical reproduction path
**Decision table**:
| Outcome | Action |
|---------|--------|
| Reproduction successful | Document steps and method, proceed to Step 3 |
| Reproduction failed | Document what was attempted, note as concern, continue with static analysis |
---
### Step 3: Collect Evidence
Gather all available evidence using project tools:
1. Search for the exact error message text in source files (Grep with 3 lines of context).
2. Search for related log output patterns.
3. Read any stack trace files or test output files if they exist on disk.
4. Use Glob to identify all files in the affected module or area.
5. Read the most directly implicated source files.
Compile findings into the evidence section of the investigation-report:
```
evidence = {
error_messages: [<exact error text>],
stack_traces: [<relevant stack trace>],
affected_files: [<file1>, <file2>],
affected_modules: [<module-name>],
log_output: [<relevant log lines>]
}
```
---
### Step 4: Initial Diagnosis via Inline CLI Analysis
Spawn inline-cli-analysis subagent for broader diagnostic perspective:
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files_from_step3> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength, each with file:line reference
CONSTRAINTS: Read-only analysis | Focus on <affected_module>`
})
const diagResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
Record results in initial_diagnosis section:
```
initial_diagnosis = {
cli_tool_used: "inline-cli-analysis",
top_suspects: [
{ description: <suspect 1>, evidence_strength: "strong|moderate|weak", files: [<files>] }
]
}
```
**Decision table**:
| Outcome | Action |
|---------|--------|
| Subagent returns top suspects | Integrate into investigation-report, proceed to Step 5 |
| Subagent timeout or error | Log warning in investigation-report, proceed to Step 5 without subagent findings |
---
### Step 5: Assemble Investigation Report
Combine all findings into the complete Phase 1 investigation-report:
```
investigation_report = {
phase: 1,
bug_description: <concise one-sentence description>,
reproduction: {
reproducible: true|false,
steps: ["step 1: ...", "step 2: ...", "step 3: observe error"],
reproduction_method: "test|command|static_analysis"
},
evidence: {
error_messages: [<exact error text>],
stack_traces: [<relevant stack trace>],
affected_files: [<file1>, <file2>],
affected_modules: [<module-name>],
log_output: [<relevant log lines>]
},
initial_diagnosis: {
cli_tool_used: "inline-cli-analysis",
top_suspects: [
{ description: <suspect>, evidence_strength: "strong|moderate|weak", files: [] }
]
}
}
```
Output Phase 1 summary and await assign_task for Phase 2.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 1) | In-memory JSON | bug_description, reproduction, evidence, initial_diagnosis |
| Phase 1 summary | Structured text output | Summary for orchestrator, await Phase 2 assignment |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Bug symptom clearly documented | bug_description field populated with at least 10 characters |
| Reproduction attempted | reproduction.reproducible is true or failure documented |
| At least one concrete evidence item collected | evidence.error_messages OR stack_traces OR affected_files non-empty |
| Affected files identified | evidence.affected_files non-empty |
| Initial diagnosis generated | initial_diagnosis.top_suspects has at least one entry (or timeout documented) |
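The success criteria above can be checked mechanically before handing off to Phase 2. A minimal sketch, assuming the field names from the `investigation_report` structure in Step 5 (the function name is illustrative):

```javascript
// Check the Phase 1 success criteria against an investigation report.
// Returns the list of unmet criteria; an empty array means Phase 1 is complete.
function phase1Gaps(report) {
  const gaps = [];
  const ev = report.evidence || {};
  if (!report.bug_description || report.bug_description.length < 10) {
    gaps.push("bug symptom not documented");
  }
  if (!report.reproduction || typeof report.reproduction.reproducible !== "boolean") {
    gaps.push("reproduction not attempted or failure not documented");
  }
  // At least one concrete evidence item is required.
  const hasEvidence = ["error_messages", "stack_traces", "affected_files"]
    .some((k) => Array.isArray(ev[k]) && ev[k].length > 0);
  if (!hasEvidence) gaps.push("no concrete evidence collected");
  if (!Array.isArray(ev.affected_files) || ev.affected_files.length === 0) {
    gaps.push("affected files not identified");
  }
  return gaps;
}
```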
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Cannot reproduce bug | Document what was attempted, set reproducible: false, continue with static analysis |
| Error message not found in source | Expand search to whole project, try related terms, continue |
| No affected files identifiable | Use Glob on broad patterns, document uncertainty |
| inline-cli-analysis timeout | Continue without subagent result, log warning in initial_diagnosis |
| User description insufficient | Document in Open Questions, proceed with available information |
## Next Phase
-> [Phase 2: Pattern Analysis](02-pattern-analysis.md)

# Phase 2: Pattern Analysis
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Search for similar patterns in the codebase to determine if the bug is isolated or systemic.
## Objective
- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 1) | Yes | Evidence, affected files, affected modules, initial diagnosis suspects |
| assign_task message | Yes | Phase 2 instruction |
## Execution Steps
### Step 1: Search for Similar Error Patterns
Search for the same error type or message elsewhere in the codebase:
1. Grep for identical or similar error message fragments in `src/` with 3 lines of context.
2. Grep for the same exception class or error code — output mode: files with matches.
3. Grep for similar error handling patterns in the same module.
**Decision table**:
| Result | Action |
|--------|--------|
| Similar patterns found in same module | Note as module-wide indicator, continue |
| Similar patterns found across multiple modules | Note as systemic indicator, continue |
| No similar patterns found | Note as isolated indicator, continue |
---
### Step 2: Search for the Same Antipattern
If the Phase 1 initial diagnosis identified a coding antipattern, search for it globally:
**Common antipattern examples to search for**:
| Antipattern | Grep pattern style |
|-------------|-------------------|
| Missing null/undefined check | `variable\.property` without guard |
| Unchecked async operation | unhandled promise, missing await |
| Direct mutation of shared state | shared state write without lock |
| Type assumption violation | forced cast without validation |
Execute at least one targeted Grep for the identified antipattern across relevant source directories.
**Decision table**:
| Result | Action |
|--------|--------|
| Antipattern found in multiple files | Classify as module-wide or systemic candidate |
| Antipattern isolated to one location | Classify as isolated candidate |
| No antipattern identifiable | Proceed without antipattern classification |
---
### Step 3: Module-Level Analysis
Examine the affected module for structural issues:
1. Use Glob to list all files in the affected module directory.
2. Grep for imports from the affected module to understand its consumers.
3. Check for circular dependencies or unusual import patterns.
---
### Step 4: CLI Cross-File Pattern Analysis (Optional)
For complex patterns that span many files, use inline-cli-analysis subagent:
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '<antipattern_description>' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.<ext> | Bug in <module>, pattern: <pattern_description>
EXPECTED: List of all files with same pattern, risk assessment per occurrence (same_bug|potential_bug|safe)
CONSTRAINTS: Focus on <antipattern> pattern only | Ignore test files for scope`
})
const patternResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
**Decision table**:
| Condition | Action |
|-----------|--------|
| Pattern spans >3 files in >1 module | Use subagent for full scope map |
| Pattern confined to 1 module | Skip subagent, proceed with manual search results |
| Subagent timeout | Continue with manual search results |
---
### Step 5: Classify Scope and Assemble Pattern Analysis
Classify the bug scope based on all search findings:
**Scope Definitions**:
| Scope | Definition |
|-------|-----------|
| `isolated` | Bug exists in a single location; no similar patterns found elsewhere |
| `module-wide` | Same pattern exists in multiple files within the same module |
| `systemic` | Pattern spans multiple modules; may require broader fix |
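The scope definitions above reduce to a small classification over the similar occurrences found in Steps 1-4. A sketch, assuming each occurrence carries an illustrative `module` and `risk` field as in the `pattern_analysis` structure below:

```javascript
// Classify bug scope from the similar occurrences found in Phase 2.
// Each occurrence: { file, module, risk: "same_bug" | "potential_bug" | "safe" }.
function classifyScope(occurrences) {
  // Only occurrences that could share the bug widen the scope.
  const risky = occurrences.filter((o) => o.risk !== "safe");
  if (risky.length === 0) return "isolated";
  const modules = new Set(risky.map((o) => o.module));
  return modules.size > 1 ? "systemic" : "module-wide";
}
```

Filtering out `safe` matches first keeps a handful of superficially similar but guarded call sites from inflating an isolated bug into a systemic one.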
Assemble `pattern_analysis` section and add to investigation-report:
```
pattern_analysis = {
scope: "isolated|module-wide|systemic",
similar_occurrences: [
{
file: "<path/to/file.ts>",
line: <line number>,
pattern: "<description of similar pattern>",
risk: "same_bug|potential_bug|safe"
}
],
total_occurrences: <count>,
affected_modules: ["<module-name>"],
antipattern_identified: "<description or null>",
scope_justification: "<evidence-based reasoning for this scope classification>"
}
```
**Scope decision table**:
| Scope | Phase 3 Focus |
|-------|--------------|
| isolated | Narrow hypothesis scope to single location |
| module-wide | Note all occurrences for Phase 4 fix planning |
| systemic | Note for potential multi-location fix; flag for separate tracking |
Output Phase 2 summary and await assign_task for Phase 3.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 2) | In-memory JSON | Phase 1 fields + pattern_analysis section added |
| Phase 2 summary | Structured text output | Scope classification with justification, await Phase 3 |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| At least 3 search queries executed | Count of Grep/Glob operations performed |
| Scope classified | pattern_analysis.scope is one of: isolated, module-wide, systemic |
| Similar occurrences documented | pattern_analysis.similar_occurrences populated (empty array acceptable for isolated) |
| Scope justification provided | pattern_analysis.scope_justification non-empty with evidence |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No source directory found | Search from project root, document uncertainty |
| Grep returns too many results | Narrow pattern, add path filter, take top 10 most relevant |
| inline-cli-analysis timeout | Continue with manual search results, log warning |
| Antipattern not identifiable from Phase 1 | Skip Step 2 antipattern search, proceed with error pattern search only |
## Next Phase
-> [Phase 3: Hypothesis Testing](03-hypothesis-testing.md)

# Phase 3: Hypothesis Testing
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.
## Objective
- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive unproductive test failures
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-2) | Yes | Evidence, affected files, pattern analysis, initial suspects |
| assign_task message | Yes | Phase 3 instruction |
## Execution Steps
### Step 1: Form Hypotheses
Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:
**Hypothesis formation rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
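The formation rules above can be enforced with a small validator before testing begins. A sketch, assuming the hypothesis shape shown in the structure below (the function name is illustrative):

```javascript
// Validate a hypothesis set against the formation rules and rank it.
// Returns { errors, sorted }: rule violations, and hypotheses ordered high-confidence first.
function validateHypotheses(hypotheses) {
  const errors = [];
  if (hypotheses.length === 0) errors.push("at least one hypothesis is required");
  if (hypotheses.length > 3) errors.push("maximum 3 hypotheses per investigation");
  for (const h of hypotheses) {
    if (!Array.isArray(h.evidence_supporting) || h.evidence_supporting.length === 0) {
      errors.push(`${h.id}: must cite at least one piece of evidence`);
    }
    if (!h.predicted_behavior) errors.push(`${h.id}: must have a testable prediction`);
  }
  // Rank by confidence, high first.
  const order = { high: 0, medium: 1, low: 2 };
  const sorted = [...hypotheses].sort((a, b) => order[a.confidence] - order[b.confidence]);
  return { errors, sorted };
}
```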
Assemble hypotheses in memory:
```
hypotheses = [
{
id: "H1",
description: "The root cause is <X> because evidence <Y>",
evidence_supporting: ["<evidence item 1>", "<evidence item 2>"],
predicted_behavior: "If H1 is correct, then we should observe <Z>",
test_method: "How to verify: read file <X> line <Y>, check value <Z>",
confidence: "high|medium|low"
}
]
```
Initialize the strike counter to 0.
---
### Step 2: Test Hypotheses Sequentially
Test each hypothesis starting from highest confidence (H1 first). Use read-only probes only during testing.
**Allowed test methods**:
| Method | Usage |
|--------|-------|
| Read a specific file | Check a specific value, condition, or code pattern |
| Grep for a pattern | Confirm or deny the presence of a condition |
| Bash targeted test | Run a specific test that reveals the condition |
| Temporary log statement | Add a log to observe runtime behavior; MUST revert after |
**Prohibited during hypothesis testing**:
- Modifying production code (save for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)
---
### Step 3: Record Test Results
For each hypothesis test, record:
```
hypothesis_test = {
id: "H1",
test_performed: "<what was checked, e.g.: Read src/caller.ts:42 — checked null handling>",
result: "confirmed|rejected|inconclusive",
evidence: "<specific observation that confirms or rejects>",
files_checked: ["<src/caller.ts:42-55>"]
}
```
---
### Step 4: 3-Strike Escalation Rule
Track consecutive unproductive test failures. After each hypothesis test, evaluate:
**Strike evaluation**:
| Test result | New insight gained | Strike action |
|-------------|-------------------|---------------|
| confirmed | — | CONFIRM root cause, end testing |
| rejected | Yes — narrows search or reveals new cause | No strike (productive rejection) |
| rejected | No — no actionable insight | +1 strike |
| inconclusive | Yes — identifies new area | No strike (productive) |
| inconclusive | No — no narrowing | +1 strike |
**Strike counter tracking**:
| Strike count | Action |
|--------------|--------|
| 1 | Continue to next hypothesis |
| 2 | Continue to next hypothesis |
| 3 | STOP — output escalation block immediately |
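The two tables above combine into one evaluation step per hypothesis test. A sketch of that logic (the function name is illustrative):

```javascript
// One step of the 3-strike rule: given a test result and whether it yielded
// a new insight, return the next action and the updated strike count.
function evaluateStrike(result, newInsight, strikes) {
  if (result === "confirmed") return { action: "confirm", strikes };
  // Productive rejections and productive inconclusive results do not count.
  const next = newInsight ? strikes : strikes + 1;
  return { action: next >= 3 ? "escalate" : "continue", strikes: next };
}
```

Note that only *consecutive unproductive* failures accumulate: a productive rejection leaves the counter untouched rather than resetting it, matching the "no strike" rows above.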
**On 3rd Strike — output this escalation block verbatim and halt**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>
### Error History
1. Attempt 1: <H1 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>
### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z">
### Diagnostic Dump
<Full investigation-report content from all phases>
STATUS: BLOCKED
```
After outputting escalation: set status BLOCKED. Do not proceed to Phase 4.
---
### Step 5: Confirm Root Cause
If a hypothesis is confirmed, document the confirmed root cause:
```
confirmed_root_cause = {
hypothesis_id: "H1",
description: "<Root cause description with full technical detail>",
evidence_chain: [
"Phase 1: <Error message X observed in Y>",
"Phase 2: <Same pattern found in N other files>",
"Phase 3: H1 confirmed — <specific condition at file.ts:42>"
],
affected_code: {
file: "<path/to/file.ts>",
line_range: "<42-55>",
function: "<functionName>"
}
}
```
Add `hypothesis_tests` and `confirmed_root_cause` to investigation-report in memory.
Output Phase 3 results and await assign_task for Phase 4.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 3) | In-memory JSON | Phases 1-2 fields + hypothesis_tests + confirmed_root_cause |
| Phase 3 summary or escalation block | Structured text output | Either confirmed root cause or BLOCKED escalation |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Maximum 3 hypotheses formed | Count of hypotheses array |
| Each hypothesis cites evidence | evidence_supporting non-empty for each |
| Each hypothesis tested with documented probe | test_performed field populated for each |
| Strike counter maintained correctly | Count of unproductive consecutive failures |
| Root cause confirmed with evidence chain OR escalation triggered | confirmed_root_cause present OR BLOCKED output |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Evidence insufficient to form 3 hypotheses | Form as many as evidence supports (minimum 1), proceed |
| Partial insight from rejected hypothesis | Do not count as strike; re-form or refine remaining hypotheses with new insight |
| All 3 hypotheses confirmed simultaneously | Use the highest-confidence confirmed hypothesis as the root cause |
| Hypothesis test requires production change | Prohibited — use static analysis or targeted read-only probe instead |
## Gate for Phase 4
Phase 4 can ONLY proceed if `confirmed_root_cause` is present. This is the Iron Law gate.
| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | -> [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation triggered | STOP — output diagnostic dump — STATUS: BLOCKED |
| Partial insight, re-forming hypotheses | Stay in Phase 3, re-test with refined hypotheses |
## Next Phase
-> [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.

# Phase 4: Implementation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Implement the minimal fix and add a regression test. Iron Law gate enforced at entry.
## Objective
- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 3) | Yes | Must contain confirmed_root_cause with evidence chain |
| assign_task message | Yes | Phase 4 instruction |
## Iron Law Gate Check
**MANDATORY FIRST ACTION before any code modification**:
| Condition | Action |
|-----------|--------|
| investigation-report contains `confirmed_root_cause` with non-empty description | Proceed to Step 1 |
| `confirmed_root_cause` absent or empty | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. Do NOT modify any files. |
Log the confirmed state before proceeding:
- Root cause: `<confirmed_root_cause.description>`
- Evidence chain: `<confirmed_root_cause.evidence_chain.length>` items
- Affected code: `<confirmed_root_cause.affected_code.file>:<confirmed_root_cause.affected_code.line_range>`
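The gate check above is deliberately trivial to express, which is the point: it should be a mechanical precondition, not a judgment call. A sketch, assuming the `confirmed_root_cause` shape from Phase 3 (the function name is illustrative):

```javascript
// Iron Law gate: refuse to enter Phase 4 without a confirmed root cause.
// Returns "proceed" or the mandated BLOCKED message.
function ironLawGate(report) {
  const rc = report.confirmed_root_cause;
  if (!rc || !rc.description || rc.description.length === 0) {
    return "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3.";
  }
  return "proceed";
}
```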
## Execution Steps
### Step 1: Plan the Minimal Fix
Define the fix scope BEFORE writing any code:
```
fix_plan = {
description: "<What the fix does and why>",
changes: [
{
file: "<path/to/file.ts>",
change_type: "modify|add|remove",
description: "<specific change description>",
lines_affected: "<42-45>"
}
],
total_files_changed: <count>,
total_lines_changed: "<estimated>"
}
```
**Minimal Fix Rules** (from Iron Law):
| Rule | Requirement |
|------|-------------|
| Change only necessary code | Only the confirmed root cause location |
| No refactoring | Do not restructure surrounding code |
| No feature additions | Fix only; no new capabilities |
| No style/format changes | Do not touch unrelated code formatting |
| >3 files changed | Requires written justification in fix_plan |
**Fix scope decision**:
| Files to change | Action |
|----------------|--------|
| 1-3 files | Proceed without justification |
| More than 3 files | Document justification in fix_plan.description before proceeding |
---
### Step 2: Implement the Fix
Apply the planned changes using Edit:
- Target only the file(s) and line(s) identified in `confirmed_root_cause.affected_code`
- Make exactly the change described in fix_plan
- Verify the edit was applied correctly by reading the modified section
**Decision table**:
| Edit outcome | Action |
|-------------|--------|
| Edit applied correctly | Proceed to Step 3 |
| Edit failed or incorrect | Re-apply with corrected old_string/new_string; if Edit fails 2+ times, use Bash sed as fallback |
| Fix requires more than planned | Document the additional change in fix_plan with justification |
---
### Step 3: Add Regression Test
Create or modify a test that proves the fix:
1. Find the appropriate test file for the affected module:
- Use Glob for `**/*.test.{ts,js,py}`, `**/__tests__/**/*.{ts,js}`, or `**/test_*.py`
- Match the test file to the affected source module
2. Add a regression test with these requirements:
**Regression test requirements**:
| Requirement | Details |
|-------------|---------|
| Test name references the bug | Name clearly describes the bug scenario (e.g., "should handle null display_name without error") |
| Tests exact code path | Exercises the specific path identified in root cause |
| Deterministic | No timing dependencies, no external services |
| Correct placement | In the appropriate test file for the affected module |
| Proves the fix | Must fail when fix is reverted, pass when fix is applied |
**Decision table**:
| Condition | Action |
|-----------|--------|
| Existing test file found for module | Add test to that file |
| No existing test file found | Create new test file following project conventions |
| Multiple candidate test files | Choose the one most directly testing the affected module |
---
### Step 4: Verify Fix Against Reproduction
Re-run the original reproduction case from Phase 1:
- If Phase 1 used a failing test: run that same test now
- If Phase 1 used a failing command: run that same command now
- If Phase 1 used static analysis: run the regression test as verification
Record verification result:
```
fix_applied = {
description: "<what was fixed>",
files_changed: ["<path/to/file.ts>"],
lines_changed: <count>,
regression_test: {
file: "<path/to/test.ts>",
test_name: "<test name>",
status: "added|modified"
},
reproduction_verified: true|false
}
```
**Decision table**:
| Verification result | Action |
|--------------------|--------|
| Reproduction case now passes | Set reproduction_verified: true, proceed to Step 5 |
| Reproduction case still fails | Analyze why fix is insufficient, adjust fix, re-run |
| Cannot verify (setup required) | Document as concern, set reproduction_verified: false, proceed |
---
### Step 5: Assemble Phase 4 Output
Add `fix_applied` to investigation-report in memory. Output Phase 4 summary and await assign_task for Phase 5.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Modified source files | File edits | Minimal fix applied to affected code |
| Regression test | File add/edit | Test covering the exact bug scenario |
| investigation-report (phase 4) | In-memory JSON | Phases 1-3 fields + fix_applied section |
| Phase 4 summary | Structured text output | Fix description, test added, verification result |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Iron Law gate passed | confirmed_root_cause present before any code change |
| Fix is minimal | fix_plan.total_files_changed <= 3 OR justification documented |
| Regression test added | fix_applied.regression_test populated |
| Original reproduction passes | fix_applied.reproduction_verified: true |
| No unrelated code changes | Only confirmed_root_cause.affected_code locations modified |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Iron Law gate fails | Output BLOCKED, halt, do not modify any files |
| Edit tool fails twice | Try Bash sed/awk as fallback; if still failing, use Write to recreate file |
| Fix does not resolve reproduction | Analyze remaining failure, adjust fix within Phase 4 |
| Fix requires changing >3 files | Document justification in fix_plan.description, then proceed |
| No test file found for module | Create new test file following nearest similar test file pattern |
| Regression test is non-deterministic | Refactor test to remove timing/external dependencies |
## Next Phase
-> [Phase 5: Verification & Report](05-verification-report.md)

# Phase 5: Verification & Report
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Run full test suite, check for regressions, and generate the structured debug report.
## Objective
- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to `.workflow/.debug/` directory
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-4) | Yes | All phases populated: evidence, root cause, fix_applied |
| assign_task message | Yes | Phase 5 instruction |
## Execution Steps
### Step 1: Detect and Run Full Test Suite
Detect the project's test framework by checking for project files, then run the full suite:
| Detection file | Test command |
|---------------|-------------|
| `package.json` with `test` script | `npm test` |
| `pytest.ini` or `pyproject.toml` | `pytest` |
| `go.mod` | `go test ./...` |
| `Cargo.toml` | `cargo test` |
| `Makefile` with `test` target | `make test` |
| None detected | Try `npm test`, `pytest`, `go test ./...` sequentially |
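The detection table above is a priority-ordered lookup. A sketch of that mapping; the `hasTestScript` / `hasMakeTestTarget` parameters are illustrative stand-ins for inspecting `package.json` and the `Makefile`, which file presence alone cannot decide:

```javascript
// Map detected project files to a test command, in the priority order above.
// `files` is the list of filenames present at the project root.
function detectTestCommand(files, hasTestScript = false, hasMakeTestTarget = false) {
  const has = (f) => files.includes(f);
  if (has("package.json") && hasTestScript) return "npm test";
  if (has("pytest.ini") || has("pyproject.toml")) return "pytest";
  if (has("go.mod")) return "go test ./...";
  if (has("Cargo.toml")) return "cargo test";
  if (has("Makefile") && hasMakeTestTarget) return "make test";
  return null; // none detected: try npm test, pytest, go test ./... sequentially
}
```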
```
Bash: mkdir -p .workflow/.debug
Bash: <detected test command>
```
Record test results:
```
test_results = {
total: <count>,
passed: <count>,
failed: <count>,
skipped: <count>,
regression_test_passed: true|false,
new_failures: []
}
```
---
### Step 2: Regression Check
Verify specifically:
1. The new regression test passes (check by test name from fix_applied.regression_test.test_name).
2. All tests that were passing before the fix still pass.
3. No new warnings or errors appeared in test output.
**Decision table for new failures**:
| New failure | Assessment | Action |
|-------------|-----------|--------|
| Related to fix (same module, same code path) | Fix introduced regression | Return to Phase 4 to adjust fix |
| Unrelated to fix (different module, pre-existing) | Pre-existing failure | Document in pre_existing_failures, proceed |
| Regression test itself fails | Fix is not working correctly | Return to Phase 4 |
Classify failures:
```
regression_check_result = {
passed: true|false,
total_tests: <count>,
new_failures: ["<test names that newly fail>"],
pre_existing_failures: ["<tests that were already failing>"]
}
```
---
### Step 3: Generate Structured Debug Report
Compile all investigation data into the final debug report JSON following the schema from `~/.codex/skills/investigate/specs/debug-report-format.md`:
```
debug_report = {
"bug_description": "<concise one-sentence description of the bug>",
"reproduction_steps": [
"<step 1>",
"<step 2>",
"<step 3: observe error>"
],
"root_cause": "<confirmed root cause description with technical detail and file:line reference>",
"evidence_chain": [
"Phase 1: <error message X observed in module Y>",
"Phase 2: <pattern analysis found N similar occurrences>",
"Phase 3: hypothesis H<N> confirmed — <specific condition at file:line>"
],
"fix_description": "<what was changed and why>",
"files_changed": [
{
"path": "<src/module/file.ts>",
"change_type": "add|modify|remove",
"description": "<brief description of changes to this file>"
}
],
"tests_added": [
{
"file": "<src/module/__tests__/file.test.ts>",
"test_name": "<should handle null return from X>",
"type": "regression|unit|integration"
}
],
"regression_check_result": {
"passed": true|false,
"total_tests": <count>,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
"concerns": [],
"timestamp": "<ISO-8601 timestamp>",
"investigation_duration_phases": 5
}
```
**Field sources**:
| Field | Source Phase | Description |
|-------|-------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phase 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | Always 5 for complete investigation |
---
### Step 4: Write Report File
Compute the filename:
- `<slug>` = bug_description lowercased, non-alphanumeric characters replaced with `-`, truncated to 40 chars
- `<date>` = current date as YYYY-MM-DD
```
Bash: mkdir -p .workflow/.debug
Write: .workflow/.debug/debug-report-<date>-<slug>.json
Content: <debug_report JSON with 2-space indent>
```
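The slug and date computation above can be sketched in shell. The `bug_description` value here is only an example; the sed/tr pipeline implements the stated rules (lowercase, non-alphanumeric runs to `-`, truncate to 40 chars).

```shell
# Example input; in practice this comes from the debug report.
bug_description="Null pointer when parsing empty config!"

# Lowercase, collapse non-alphanumerics to '-', trim edge dashes, cap at 40 chars.
slug=$(printf '%s' "$bug_description" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -E 's/[^a-z0-9]+/-/g; s/^-+|-+$//g' \
  | cut -c1-40)
date=$(date +%Y-%m-%d)
echo ".workflow/.debug/debug-report-${date}-${slug}.json"
```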
---
### Step 5: Output Completion Status
Determine status and output completion block:
**Status determination**:
| Condition | Status |
|-----------|--------|
| Regression test passes, no new failures, all quality checks met | DONE |
| Fix applied but partial test coverage, minor warnings, or non-critical concerns | DONE_WITH_CONCERNS |
| New test failures introduced by fix (unresolvable), or critical concern | BLOCKED |
**DONE output**:
```
## STATUS: DONE
**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>
### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause.description>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>
### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```
**DONE_WITH_CONCERNS output**:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Fixed <bug_description> with concerns
### Details
- Phases completed: 5/5
- Concerns:
1. <concern> — Impact: low|medium — Suggested fix: <action>
### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.debug/debug-report-<date>-<slug>.json` | JSON file | Full structured investigation report |
| Completion status block | Structured text output | Final status per Completion Status Protocol |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Full test suite executed | Test command ran and produced output |
| Regression test passes | test_results.regression_test_passed: true |
| No new failures introduced | regression_check_result.new_failures is empty (or documented as pre-existing) |
| Debug report written | File exists at `.workflow/.debug/debug-report-<date>-<slug>.json` |
| Completion status output | Status block follows protocol format |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Test framework not detected | Try common commands in order; document uncertainty in concerns |
| New failures related to fix | Return to Phase 4 to adjust; do not write report until resolved |
| New failures unrelated | Document as pre_existing_failures, set DONE_WITH_CONCERNS if impactful |
| Report directory not writable | Try alternate path `.workflow/debug/`; document in output |
| Test suite takes >5 minutes | Run regression test only; note full suite skipped in concerns |
| Regression test was not added in Phase 4 | Document as DONE_WITH_CONCERNS concern |


@@ -0,0 +1,341 @@
# Security Auditor Agent
Executes all 4 phases of the security audit: supply chain scan, OWASP Top 10 review, STRIDE threat modeling, and scored report generation. Driven by orchestrator via assign_task through each phase.
## Identity
- **Type**: `analysis`
- **Role File**: `~/.codex/agents/security-auditor.md`
- **task_name**: `security-auditor`
- **Responsibility**: Read-only analysis (Phases 1-3) + Write (Phase 4 report output)
- **fork_context**: false
- **Reasoning Effort**: high
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Produce structured JSON output for every phase
- Include file:line references in all code-level findings
- Enforce scoring gates: quick-scan >= 8/10; comprehensive initial >= 2/10
- Deduplicate findings that appear in multiple phases (keep highest severity, merge evidence)
- Write phase output files to `.workflow/.security/` before reporting completion
### MUST NOT
- Skip phases in comprehensive mode — all 4 phases must complete in sequence
- Proceed to next phase before writing current phase output file
- Include sensitive discovered values (actual secrets, credentials) in JSON evidence fields — redact with `[REDACTED]`
- Apply suppression (`@ts-ignore`, empty catch) — report findings as-is
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | execution | Run dependency audits, grep patterns, file discovery, directory setup |
| `Read` | read | Load phase files, specs, previous audit reports |
| `Write` | write | Output JSON phase results to `.workflow/.security/` |
| `Glob` | read | Discover source files by pattern for scoping |
| `Grep` | read | Pattern-based security scanning across source files |
| `spawn_agent` | agent | Spawn inline subagent for OWASP CLI analysis (Phase 2) |
| `wait_agent` | agent | Await inline subagent result |
| `close_agent` | agent | Close inline subagent after result received |
### Tool Usage Patterns
**Setup Pattern**: Ensure work directory exists before any phase output.
```
Bash("mkdir -p .workflow/.security")
```
**Read Pattern**: Load phase spec before executing.
```
Read("~/.codex/skills/security-audit/phases/01-supply-chain-scan.md")
Read("~/.codex/skills/security-audit/specs/scoring-gates.md")
```
**Write Pattern**: Output structured JSON after each phase.
```
Write(".workflow/.security/supply-chain-report.json", <json_content>)
```
---
## Execution
### Phase 1: Supply Chain Scan
**Objective**: Detect vulnerable dependencies, hardcoded secrets, CI/CD injection risks, and LLM prompt injection vectors.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` |
| Project root | Yes | Working directory with source files |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` for full execution instructions.
2. Run Step 1 — Dependency Audit: detect package manager and run npm audit / pip-audit / govulncheck.
3. Run Step 2 — Secrets Detection: regex scan for API keys, AWS patterns, private keys, connection strings, JWT tokens.
4. Run Step 3 — CI/CD Config Review: scan `.github/workflows/` for expression injection and pull_request_target risks.
5. Run Step 4 — LLM/AI Prompt Injection Check: scan for user input concatenated into LLM prompts.
6. Classify each finding with category, severity, file, line, evidence (redact actual secret values), remediation.
7. Write output file.
**Decision Table — Dependency Audit**:
| Condition | Action |
|-----------|--------|
| npm / yarn lock file found | Run `npm audit --json` |
| requirements.txt / pyproject.toml found | Run `pip-audit --format json`; fallback to `safety check --json` |
| go.sum found | Run `govulncheck ./...` |
| No lock files found | Log INFO finding: "No lock files detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed"; continue |
**Decision Table — Secrets Detection**:
| Pattern Match | Severity | Category |
|---------------|----------|----------|
| API key / secret / token with 16+ char value | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| `-----BEGIN PRIVATE KEY-----` | Critical | secret |
| DB connection string with password | Critical | secret |
| Hardcoded JWT token | High | secret |
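The pattern matches above can be probed with grep along these lines. The regexes here are illustrative approximations, not the canonical detection rules; real scans should use the phase spec's patterns.

```shell
# Illustrative secrets probes; each pattern is an approximation of a table row.
scan_secrets() {
  dir="$1"
  grep -rnE 'AKIA[0-9A-Z]{16}' "$dir" 2>/dev/null || true                      # AWS access key
  grep -rn -- '-----BEGIN .*PRIVATE KEY-----' "$dir" 2>/dev/null || true       # private key block
  grep -rnE '(api[_-]?key|secret|token)[^A-Za-z0-9]{1,4}[A-Za-z0-9_-]{16,}' \
    "$dir" 2>/dev/null || true                                                 # generic 16+ char value
  return 0
}
```

Remember the MUST NOT rule above: matched values go into findings as `[REDACTED]`, never verbatim.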
**Output**: `.workflow/.security/supply-chain-report.json` — schema per phase spec.
---
### Phase 2: OWASP Review
**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/02-owasp-review.md` |
| OWASP checklist | Yes | `~/.codex/skills/security-audit/specs/owasp-checklist.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/02-owasp-review.md` for full execution instructions.
2. Read `~/.codex/skills/security-audit/specs/owasp-checklist.md` for detection patterns.
3. Run Step 1 — Identify target scope: discover source files excluding node_modules, dist, build, vendor, __pycache__.
4. Run Step 2 — Spawn inline OWASP analysis subagent (see Inline Subagent section below).
5. Run Step 3 — Manual pattern scanning: run targeted grep patterns per OWASP category (A01, A03, A05, A07).
6. Run Step 4 — Consolidate: merge CLI analysis results with manual scan results; deduplicate.
7. Set coverage field for each category: `checked` or `not_applicable`.
8. Write output file.
**Decision Table — Scope**:
| Condition | Action |
|-----------|--------|
| Source files found | Proceed with full scan |
| No source files detected | Report as BLOCKED with scope note |
| Files > 500 | Prioritize: routes/, auth/, api/, handlers/ first |
**Output**: `.workflow/.security/owasp-findings.json` — schema per phase spec.
---
## Inline Subagent: OWASP CLI Analysis (Phase 2, Step 2)
**When**: After identifying target scope in Phase 2, Step 2.
**Agent File**: `~/.codex/agents/cli-explore-agent.md`
```
spawn_agent({
task_name: "inline-owasp-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: OWASP Top 10 2021 security analysis of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
Scope: @src/**/* @**/*.config.* @**/*.env.example
Expected: JSON findings per OWASP category with severity, file:line, evidence, remediation.
Constraints: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities, not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```
**Result Handling**:
| Result | Action |
|--------|--------|
| Success | Integrate findings into owasp-findings.json consolidation step |
| Timeout / Error | Continue with manual pattern scan results only; log warning |
---
### Phase 3: Threat Modeling
**Objective**: Apply STRIDE framework to architecture components; identify trust boundaries and attack surface.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/03-threat-modeling.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/03-threat-modeling.md` for full execution instructions.
2. Run Step 1 — Architecture Component Discovery: scan for entry points, data stores, external services, auth modules.
3. Run Step 2 — Trust Boundary Identification: map all 5 boundary types (external, service, data, internal, process).
4. Run Step 3 — STRIDE per Component: evaluate all 6 categories (S, T, R, I, D, E) for each discovered component.
5. Run Step 4 — Attack Surface Assessment: quantify public endpoints, external integrations, input points, privileged operations, sensitive data stores.
6. Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays.
7. Write output file.
**STRIDE Evaluation Decision Table**:
| Component Type | Priority STRIDE Categories |
|----------------|---------------------------|
| api_endpoint | S (spoofing), T (tampering), D (denial-of-service), E (elevation) |
| auth_module | S (spoofing), R (repudiation), E (elevation) |
| data_store | T (tampering), I (information disclosure), R (repudiation) |
| external_service | T (tampering), I (information disclosure), D (denial-of-service) |
| worker | T (tampering), D (denial-of-service) |
**Output**: `.workflow/.security/threat-model.json` — schema per phase spec.
---
### Phase 4: Report & Tracking
**Objective**: Aggregate all findings, calculate score, compare trends, write dated report.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/04-report-tracking.md` |
| Scoring gates | Yes | `~/.codex/skills/security-audit/specs/scoring-gates.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |
| Threat model | Yes | `.workflow/.security/threat-model.json` |
| Previous audits | No | `.workflow/.security/audit-report-*.json` (for trend) |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/04-report-tracking.md` for full execution instructions.
2. Aggregate all findings from phases 1-3 (supply-chain + owasp + STRIDE gaps).
3. Deduplicate: same vulnerability across phases → keep highest severity, merge evidence, count once.
4. Count files scanned (from phase outputs).
5. Calculate score per formula: `base_score(10.0) - (weighted_sum / max(10, files_scanned))`.
6. Find previous audit: `ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -1`.
7. Compute trend direction and score_delta.
8. Evaluate gate (initial vs. subsequent logic).
9. Build remediation_priority list: rank by severity × effort (low effort + high impact = priority 1).
10. Write dated report.
11. Copy phase outputs to `.workflow/.security/` as latest copies.
**Score Calculation**:
| Severity | Weight |
|----------|--------|
| critical | 10 |
| high | 7 |
| medium | 4 |
| low | 1 |
Formula: `final_score = max(0, round(10.0 - (weighted_sum / max(10, files_scanned)), 1))`
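A worked instance of the formula, using awk for the floating-point math (the counts are an invented example):

```shell
# Example: 2 critical (2*10) + 2 high (2*7) = 34 weighted, 120 files scanned.
weighted_sum=34
files_scanned=120
awk -v ws="$weighted_sum" -v fs="$files_scanned" 'BEGIN {
  denom = (fs > 10) ? fs : 10        # max(10, files_scanned)
  score = 10.0 - ws / denom
  if (score < 0) score = 0           # clamp at 0 per the formula
  printf "%.1f\n", score             # -> 9.7 for this example
}'
```

The `max(10, ...)` floor keeps a handful of findings in a tiny project from zeroing the score.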
**Score Interpretation Table**:
| Score Range | Rating | Meaning |
|-------------|--------|---------|
| 9.0-10.0 | Excellent | Minimal risk, production-ready |
| 7.0-8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0-6.9 | Fair | Notable risks, remediation recommended |
| 3.0-4.9 | Poor | Significant risks, remediation required |
| 0.0-2.9 | Critical | Severe vulnerabilities, immediate action needed |
**Gate Evaluation**:
| Condition | Gate Result | Status |
|-----------|------------|--------|
| No previous audit AND score >= 2.0 | PASS | Baseline established |
| No previous audit AND score < 2.0 | FAIL | DONE_WITH_CONCERNS |
| Previous audit AND score >= previous_score | PASS | No regression |
| Previous audit AND previous_score - 0.5 <= score < previous_score | WARN | DONE_WITH_CONCERNS |
| Previous audit AND score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS |
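The gate rows above can be sketched as a small helper; an empty second argument stands in for "no previous audit" (initial run):

```shell
# Sketch of the gate evaluation; prev empty means initial audit (baseline gate).
evaluate_gate() {
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (p == "")          { r = (s >= 2.0) ? "PASS" : "FAIL" }
    else if (s >= p)      { r = "PASS" }
    else if (s >= p - 0.5){ r = "WARN" }
    else                  { r = "FAIL" }
    print r
  }'
}
```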
**Trend Direction**:
| Condition | direction field |
|-----------|----------------|
| No previous audit | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |
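A minimal sketch of the delta classification; the `baseline` row is decided earlier by the absence of a previous report, so only the numeric cases appear here:

```shell
# Classify score_delta into improving / stable / regressing per the table.
trend_direction() {
  awk -v d="$1" 'BEGIN {
    if (d > 0.5)       print "improving"
    else if (d < -0.5) print "regressing"
    else               print "stable"
  }'
}
```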
**Output**: `.workflow/.security/audit-report-<YYYY-MM-DD>.json` — full schema per phase spec.
---
## Structured Output Template
```
## Summary
- One-sentence completion status with phase completed and finding count
## Score (Phase 4 / quick-scan)
- Score: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Trend: <improving|stable|regressing|baseline> (delta: <+/-N.N>)
## Findings
- Critical: <N> | High: <N> | Medium: <N> | Low: <N>
## Phase Outputs Written
- .workflow/.security/supply-chain-report.json
- .workflow/.security/owasp-findings.json (if Phase 2 completed)
- .workflow/.security/threat-model.json (if Phase 3 completed)
- .workflow/.security/audit-report-<date>.json (if Phase 4 completed)
## Top Risks
1. [severity] <title> — <file>:<line> — <remediation summary>
2. [severity] <title> — <file>:<line> — <remediation summary>
## Open Questions
1. <Any scope ambiguity or blocked items>
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase spec file not found | Read from fallback path; report in Open Questions if unavailable |
| Dependency audit tool missing | Log as INFO finding (category: dependency), continue with other steps |
| No source files found | Report as BLOCKED with path; request scope clarification |
| Inline subagent timeout (Phase 2) | Continue with manual grep results only; note in findings summary |
| Phase output file write failure | Retry once; if still failing report as BLOCKED |
| Previous audit parse error | Treat as baseline (no prior data); note in trend section |
| Timeout approaching mid-phase | Output partial results with "PARTIAL" status, write what is available |


@@ -0,0 +1,384 @@
---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
agents: security-auditor
phases: 4
---
# Security Audit
4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.
## Architecture
```
+----------------------------------------------------------------------+
| security-audit Orchestrator |
| -> Mode selection: quick-scan (Phase 1 only) vs comprehensive |
+-----------------------------------+----------------------------------+
|
+---------------------+---------------------+
| |
[quick-scan mode] [comprehensive mode]
| |
+---------v---------+ +------------v-----------+
| Phase 1 | | Phase 1 |
| Supply Chain Scan | | Supply Chain Scan |
| -> supply-chain- | | -> supply-chain- |
| report.json | | report.json |
+---------+---------+ +------------+-----------+
| |
[score gate] +-----------v-----------+
score >= 8/10 | Phase 2 |
| | OWASP Review |
[DONE or | -> owasp-findings. |
DONE_WITH_CONCERNS] | json |
+-----------+-----------+
|
+-----------v-----------+
| Phase 3 |
| Threat Modeling |
| (STRIDE) |
| -> threat-model.json |
+-----------+-----------+
|
+-----------v-----------+
| Phase 4 |
| Report & Tracking |
| -> audit-report- |
| {date}.json |
+-----------------------+
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|-------------|
| security-auditor | security-auditor | ~/.codex/agents/security-auditor.md | Execute all 4 phases: dependency audit, OWASP review, STRIDE modeling, report generation | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|-------------|-----------|-----------|
| security-auditor | security-auditor | false | — | Starts fresh; all context provided via assign_task phase messages |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|-------------|--------|
| security-auditor spawn | false | Self-contained pipeline; phase inputs passed via assign_task |
---
## Subagent Registry
Utility subagents spawned by `security-auditor` (not by the orchestrator):
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-owasp-analysis | ~/.codex/agents/cli-explore-agent.md | security-auditor (Phase 2) | OWASP Top 10 2021 code-level analysis | haiku |
> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.
---
## Mode Selection
Determine mode from user request before spawning any agent.
| User Intent | Mode | Phases to Execute | Gate |
|-------------|------|-------------------|------|
| "quick scan", "daily check", "fast audit" | quick-scan | Phase 1 only | score >= 8/10 |
| "full audit", "comprehensive", "security audit", "cso" | comprehensive | Phases 1 → 2 → 3 → 4 | no regression (initial: >= 2/10) |
| Ambiguous | Prompt user: "Quick-scan (Phase 1 only) or comprehensive (all 4 phases)?" | — | — |
---
## Phase Execution
### Phase 1: Supply Chain Scan
**Objective**: Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM integrations.
**Input**:
| Source | Description |
|--------|-------------|
| Working directory | Project source to be scanned |
| Mode | quick-scan or comprehensive |
**Execution**:
Spawn the security-auditor agent and assign Phase 1:
```
spawn_agent({
task_name: "security-auditor",
fork_context: false,
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/skills/security-audit/agents/security-auditor.md
## TASK: Phase 1 — Supply Chain Scan
Mode: <quick-scan|comprehensive>
Work directory: .workflow/.security
Execute Phase 1 per: ~/.codex/skills/security-audit/phases/01-supply-chain-scan.md
Deliverables:
- .workflow/.security/supply-chain-report.json
- Structured output summary with finding counts by severity`
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```
**On timeout**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: "Finalize current supply chain scan and output supply-chain-report.json now." }]
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 120000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Dependency, secrets, CI/CD, and LLM findings |
---
### Quick-Scan Gate (quick-scan mode only)
After Phase 1 completes, evaluate score and close agent.
| Condition | Action |
|-----------|--------|
| score >= 8.0 | Status: DONE. No blocking issues. |
| 6.0 <= score < 8.0 | Status: DONE_WITH_CONCERNS. Log warning — review before deploy. |
| score < 6.0 | Status: DONE_WITH_CONCERNS. Block deployment. Remediate critical/high findings. |
```
close_agent({ target: "security-auditor" })
```
> **If quick-scan mode**: Stop here. Output final summary with score and findings count.
---
### Phase 2: OWASP Review (comprehensive mode only)
**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings for context |
| Source files | All .ts/.js/.py/.go/.java excluding node_modules, dist, build |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 2 — OWASP Review
Execute Phase 2 per: ~/.codex/skills/security-audit/phases/02-owasp-review.md
Context: supply-chain-report.json already written to .workflow/.security/
Reference: ~/.codex/skills/security-audit/specs/owasp-checklist.md
Deliverables:
- .workflow/.security/owasp-findings.json
- Coverage for all 10 OWASP categories (A01-A10)` }]
})
const phase2Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/owasp-findings.json` | OWASP findings with owasp_id, severity, file:line, evidence, remediation |
---
### Phase 3: Threat Modeling (comprehensive mode only)
**Objective**: Apply STRIDE threat model to architecture components; assess attack surface.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Phase 2 findings |
| Source files | Route handlers, data stores, auth modules, external service clients |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 3 — Threat Modeling (STRIDE)
Execute Phase 3 per: ~/.codex/skills/security-audit/phases/03-threat-modeling.md
Context: supply-chain-report.json and owasp-findings.json available in .workflow/.security/
Cross-reference Phase 1 and Phase 2 findings when mapping STRIDE categories.
Deliverables:
- .workflow/.security/threat-model.json
- All 6 STRIDE categories (S, T, R, I, D, E) evaluated per component
- Trust boundaries and attack surface quantified` }]
})
const phase3Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/threat-model.json` | STRIDE threat model with components, trust boundaries, attack surface |
---
### Phase 4: Report & Tracking (comprehensive mode only)
**Objective**: Calculate score, compare with previous audits, generate date-stamped report.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 output |
| `.workflow/.security/owasp-findings.json` | Phase 2 output |
| `.workflow/.security/threat-model.json` | Phase 3 output |
| `.workflow/.security/audit-report-*.json` | Previous audit reports (optional, for trend) |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 4 — Report & Tracking
Execute Phase 4 per: ~/.codex/skills/security-audit/phases/04-report-tracking.md
Scoring reference: ~/.codex/skills/security-audit/specs/scoring-gates.md
Steps:
1. Aggregate all findings from phases 1-3
2. Calculate score using formula: base 10.0 - (weighted_sum / normalization)
3. Check for previous audit: ls -t .workflow/.security/audit-report-*.json | head -1
4. Compute trend (improving/stable/regressing/baseline)
5. Evaluate gate (initial >= 2/10; subsequent >= previous_score)
6. Write .workflow/.security/audit-report-<YYYY-MM-DD>.json
Deliverables:
- .workflow/.security/audit-report-<YYYY-MM-DD>.json
- Updated copies of all phase outputs in .workflow/.security/` }]
})
const phase4Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/audit-report-<date>.json` | Full scored report with trend, top risks, remediation priority |
---
### Comprehensive Gate (comprehensive mode only)
After Phase 4 completes, evaluate gate and close agent.
| Audit Type | Condition | Result | Action |
|------------|-----------|--------|--------|
| Initial (no prior audit) | score >= 2.0 | PASS | DONE. Baseline established. Plan remediation. |
| Initial | score < 2.0 | FAIL | DONE_WITH_CONCERNS. Critical exposure. Immediate triage required. |
| Subsequent | score >= previous_score | PASS | DONE. No regression. |
| Subsequent | previous_score - 0.5 <= score < previous_score | WARN | DONE_WITH_CONCERNS. Marginal change. Review new findings. |
| Subsequent | score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS. Regression detected. Investigate new findings. |
```
close_agent({ target: "security-auditor" })
```
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Supply Chain | 300000 ms (5 min) | assign_task "Finalize output now", re-wait 120s |
| Phase 2: OWASP Review | 360000 ms (6 min) | assign_task "Output partial findings", re-wait 120s |
| Phase 3: Threat Modeling | 360000 ms (6 min) | assign_task "Output partial threat model", re-wait 120s |
| Phase 4: Report | 300000 ms (5 min) | assign_task "Write report with available data", re-wait 120s |
### Cleanup Protocol
Agent is closed after the final executed phase (Phase 1 for quick-scan, Phase 4 for comprehensive).
```
close_agent({ target: "security-auditor" })
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output now" + re-wait 120000 ms |
| Agent timeout (second) | Log error, close_agent({ target: "security-auditor" }), report partial results |
| Phase output file missing | assign_task requesting specific file output, re-wait |
| Audit tool not installed (npm/pip) | Phase 1 logs as INFO finding and continues — not a blocker |
| No previous audit found | Treat as baseline — apply initial gate (>= 2/10) |
| User cancellation | close_agent({ target: "security-auditor" }), report current state |
---
## Output Format
```
## Summary
- One-sentence completion status with mode and final score
## Score
- Overall: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Mode: quick-scan|comprehensive
## Findings
- Critical: <N>
- High: <N>
- Medium: <N>
- Low: <N>
## Artifacts
- File: .workflow/.security/supply-chain-report.json
- File: .workflow/.security/owasp-findings.json (comprehensive only)
- File: .workflow/.security/threat-model.json (comprehensive only)
- File: .workflow/.security/audit-report-<date>.json (comprehensive only)
## Top Risks
1. <Most critical finding with file:line and remediation>
2. <Second finding>
## Next Steps
1. Remediate critical findings (effort: <low|medium|high>)
2. Re-run audit to verify fixes
```


@@ -0,0 +1,226 @@
# Phase 1: Supply Chain Scan
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Detect low-hanging security risks in third-party dependencies, hardcoded secrets, CI/CD pipelines, and LLM/AI integrations.
## Objective
- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Project root | Yes | Working directory containing source files and dependency manifests |
| WORK_DIR | Yes | `.workflow/.security` — output directory (create if not exists) |
## Execution Steps
### Step 1: Dependency Audit
Detect package manager and run appropriate audit tool.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| `package-lock.json` or `yarn.lock` present | Run `npm audit --json` |
| `requirements.txt` or `pyproject.toml` present | Run `pip-audit --format json`; fallback `safety check --json` |
| `go.sum` present | Run `govulncheck ./...` |
| No manifest files found | Log INFO finding: "No dependency manifests detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed — manual review needed"; continue |
**Execution**:
```bash
# Ensure output directory exists
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"
# Node.js projects (keep stderr out of the JSON output file)
if [ -f package-lock.json ] || [ -f yarn.lock ]; then
  npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>/dev/null || true
fi
# Python projects: pip-audit first, safety check only as fallback
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
  pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" \
    || safety check --json > "${WORK_DIR}/safety-raw.json" 2>/dev/null \
    || true
fi
# Go projects
if [ -f go.sum ]; then
govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```
---
### Step 2: Secrets Detection
Scan source files for hardcoded secrets using regex patterns. Exclude generated, compiled, and dependency directories.
**Decision Table**:
| Match Type | Severity | Category |
|------------|----------|----------|
| API key / token with 16+ chars | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| Private key PEM block | Critical | secret |
| DB connection string with embedded password | Critical | secret |
| Hardcoded JWT token | High | secret |
| No matches | — | No finding |
**Execution**:
```bash
# Shared exclusions (mirrors the exclude list below); expanded unquoted on purpose
EXCLUDES="--exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist --exclude-dir=build --exclude-dir=__pycache__ --exclude=*.lock --exclude=*.min.js"
# High-confidence patterns (case-insensitive)
grep -rniE $EXCLUDES \
  '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["\x27][A-Za-z0-9+/=_-]{16,}' \
  --include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
  --include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
  --include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
  . || true
# AWS patterns
grep -rniE $EXCLUDES '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true
# Private keys (-e keeps the leading dashes from being parsed as options)
grep -rniE $EXCLUDES -e '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true
# Connection strings with passwords
grep -rniE $EXCLUDES '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true
# JWT tokens (hardcoded)
grep -rniE $EXCLUDES 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```
Exclude from scan: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.
Redact actual matched secret values in findings — use `[REDACTED]` in evidence field.
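The redaction requirement can be sketched as a small filter. The pattern is a simplified version of the API-key regex above (double-quoted values only, `/` omitted from the value class to keep the sed expression simple):

```shell
# Minimal redaction sketch: replace the quoted secret value with [REDACTED]
# before the match is stored as evidence. Pattern is illustrative, not exhaustive.
redact() {
  sed -E 's/((api[_-]?key|secret[_-]?key|access[_-]?token)[[:space:]]*[:=][[:space:]]*)"[A-Za-z0-9+=_-]{16,}"/\1"[REDACTED]"/g'
}
echo 'config.ts:12: api_key = "sk_live_abcdef0123456789"' | redact
# prints: config.ts:12: api_key = "[REDACTED]"
```

Apply the filter to grep output before writing findings, so raw credential values never reach the report.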
---
### Step 3: CI/CD Config Review
Check GitHub Actions and other CI/CD configurations for injection risks.
**Decision Table**:
| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| `${{ github.event.` in `run:` block | High | Expression injection in workflow run step |
| `pull_request_target` with checkout of PR code | High | Privileged workflow triggered by untrusted code |
| `actions/checkout@v1` or `@v2` | Medium | Deprecated action version with known issues |
| `secrets.` passed to untrusted context | High | Secret exposure risk |
| No `.github/workflows/` directory | — | Not applicable; skip |
**Execution**:
```bash
# Find workflow files
find .github/workflows -name '*.yml' -o -name '*.yaml' 2>/dev/null
# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true
# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true
# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true
# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```
---
### Step 4: LLM/AI Prompt Injection Check
Scan for patterns indicating prompt injection risk in LLM integrations.
**Decision Table**:
| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| User input directly concatenated into prompt/system_message | High | LLM prompt injection vector |
| User input in template string passed to LLM call | High | LLM prompt injection via template |
| f-string with user data in `.complete`/`.generate` call | High | Python LLM prompt injection |
| LLM API call detected, no injection pattern | Low | LLM integration present — review for sanitization |
**Execution**:
```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
--include='*.py' . || true
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | JSON | All supply chain findings with severity classifications |
```json
{
"phase": "supply-chain-scan",
"timestamp": "ISO-8601",
"findings": [
{
"category": "dependency|secret|cicd|llm",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "matched text or context",
"remediation": "How to fix"
}
],
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
}
}
```
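A minimal structural check against this schema can be sketched as follows (grep-based and intentionally loose; a JSON-aware tool such as jq would be stricter):

```shell
# Sketch: verify a report file mentions the required top-level keys.
report_has_keys() {  # usage: report_has_keys <file>
  for key in '"phase"' '"findings"' '"summary"' '"by_severity"'; do
    grep -q "$key" "$1" || { echo "missing ${key}"; return 1; }
  done
  echo "required keys present"
}
# Illustrative input
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{ "phase": "supply-chain-scan", "findings": [],
  "summary": { "total": 0, "by_severity": {} } }
EOF
report_has_keys "$tmp"
# prints: required keys present
rm -f "$tmp"
```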
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| All 4 scan steps executed or explicitly skipped with reason | Review step execution log |
| `supply-chain-report.json` written to `.workflow/.security/` | File exists and is valid JSON |
| All findings have category, severity, file, evidence, remediation | JSON schema check |
| Secret values redacted in evidence field | No raw credential values in output |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Audit tool not installed | Log INFO finding; continue with remaining steps |
| `grep` finds no matches | No finding generated for that pattern; continue |
| `.github/workflows/` does not exist | Mark CI/CD step as not_applicable; continue |
| Write to WORK_DIR fails | Attempt `mkdir -p .workflow/.security` and retry once |
## Next Phase
-> [Phase 2: OWASP Review](02-owasp-review.md)


@@ -0,0 +1,232 @@
# Phase 2: OWASP Review
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Systematic code-level review against OWASP Top 10 2021 categories using inline subagent analysis and targeted pattern scanning.
## Objective
- Review codebase against all 10 OWASP Top 10 2021 categories
- Use inline subagent multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `~/.codex/skills/security-audit/specs/owasp-checklist.md` | Yes | Detection patterns per OWASP category |
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency context |
| Project source files | Yes | `.ts`, `.js`, `.py`, `.go`, `.java` excluding deps/build |
## Execution Steps
### Step 1: Identify Target Scope
Discover source files, excluding generated and dependency directories.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Source files found | Proceed to Step 2 |
| No source files found | Report as BLOCKED with path note; do not proceed |
| Files > 500 | Prioritize routes/, auth/, api/, handlers/ first |
**Execution**:
```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
| head -200
```
---
### Step 2: Inline Subagent OWASP Analysis
Spawn inline subagent using `cli-explore-agent` role to perform systematic OWASP analysis.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Subagent completes successfully | Integrate findings into Step 4 consolidation |
| Subagent times out | Continue with manual pattern scan (Step 3) only; log warning |
| Subagent errors | Continue with manual pattern scan only; log warning |
```
spawn_agent({
task_name: "inline-owasp-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.
MODE: analysis
CONTEXT: @src/**/* @**/*.config.* @**/*.env.example
EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.
CONSTRAINTS: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```
---
### Step 3: Manual Pattern Scanning
Supplement inline subagent analysis with targeted grep patterns per OWASP category. Reference `~/.codex/skills/security-audit/specs/owasp-checklist.md` for full pattern list.
**A01 — Broken Access Control**:
```bash
# Missing auth middleware on routes
grep -rnE 'app\.(get|post|put|delete|patch)\(' --exclude-dir=node_modules \
  --include='*.ts' --include='*.js' . \
  | grep -vE 'auth|middleware|protect' || true
# Direct object references without ownership check
grep -rnE 'params\.id|req\.params\.' --exclude-dir=node_modules \
  --include='*.ts' --include='*.js' . || true
```
**A03 — Injection**:
```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A05 — Security Misconfiguration**:
```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A07 — Identification and Authentication Failures**:
```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["\x27][^"\x27]{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```
---
### Step 4: Consolidate Findings
Merge inline subagent results and manual pattern scan results. Deduplicate and classify by OWASP category.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Same finding in both sources | Keep highest severity; merge evidence; note both sources |
| Finding lacks file:line reference | Attempt to resolve via grep; if not resolvable, mark evidence as "pattern match — no line ref" |
| Category has no findings | Set coverage to `checked` with 0 findings |
| Category not applicable to project stack | Set coverage to `not_applicable` with reason |
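The highest-severity-wins rule can be sketched as follows, assuming findings have first been flattened to `file:line|severity|title` lines (a hypothetical intermediate format, not part of the JSON schema):

```shell
# Sketch: keep the highest-severity finding per file:line key.
dedup_findings() {
  awk -F'|' '
    BEGIN { rank["critical"]=4; rank["high"]=3; rank["medium"]=2; rank["low"]=1 }
    {
      key = $1
      # Store first occurrence, or replace when a higher severity arrives
      if (!(key in best) || rank[$2] > rank[sev[key]]) { best[key] = $0; sev[key] = $2 }
    }
    END { for (k in best) print best[k] }
  '
}
printf '%s\n' \
  'src/db.ts:12|high|SQL concat (manual scan)' \
  'src/db.ts:12|critical|SQL injection (subagent)' \
  'src/auth.ts:7|medium|verbose errors' | dedup_findings | sort
```

The trailing `sort` only makes the output order deterministic; merging evidence and source notes would happen when the JSON is assembled.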
---
## OWASP Top 10 2021 Coverage
| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/owasp-findings.json` | JSON | Findings per OWASP category with coverage map |
```json
{
"phase": "owasp-review",
"timestamp": "ISO-8601",
"owasp_version": "2021",
"findings": [
{
"owasp_id": "A01",
"owasp_category": "Broken Access Control",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "code snippet or pattern match",
"remediation": "Specific fix recommendation",
"cwe": "CWE-XXX"
}
],
"coverage": {
"A01": "checked|not_applicable",
"A02": "checked|not_applicable",
"A03": "checked|not_applicable",
"A04": "checked|not_applicable",
"A05": "checked|not_applicable",
"A06": "checked|not_applicable",
"A07": "checked|not_applicable",
"A08": "checked|not_applicable",
"A09": "checked|not_applicable",
"A10": "checked|not_applicable"
},
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"categories_checked": 10,
"categories_with_findings": 0
}
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| All 10 OWASP categories have coverage entry | JSON coverage map has all A01–A10 keys |
| All findings have owasp_id, severity, file, evidence, remediation | JSON schema check |
| `owasp-findings.json` written to `.workflow/.security/` | File exists and is valid JSON |
| Inline subagent result integrated (or skip logged) | Summary includes source note |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Inline subagent timeout | Continue with manual grep results; log "inline-owasp-analysis timed out" in summary |
| OWASP checklist spec not found | Use built-in patterns from this file; note missing spec |
| No source files in scope | Report BLOCKED with path; set all categories to not_applicable |
| Grep produces no matches for a category | Set that category coverage to `checked` with 0 findings |
## Next Phase
-> [Phase 3: Threat Modeling](03-threat-modeling.md)


@@ -0,0 +1,249 @@
# Phase 3: Threat Modeling
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.
## Objective
- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency/CI context |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings to cross-reference in STRIDE gaps |
| Project source files | Yes | Route handlers, data stores, external service clients, auth modules |
## Execution Steps
### Step 1: Architecture Component Discovery
Identify major system components by scanning project structure.
**Decision Table**:
| Component Pattern Found | component.type |
|------------------------|----------------|
| `app.get/post/put/delete/patch`, `router.`, `@app.route`, `@router.` | api_endpoint |
| `createConnection`, `mongoose.connect`, `sqlite`, `redis`, `S3`, `createClient` | data_store |
| `fetch`, `axios`, `http.request`, `requests.get/post`, `urllib` | external_service |
| `jwt`, `passport`, `session`, `oauth`, `bcrypt`, `argon2`, `crypto` | auth_module |
| `worker`, `subprocess`, `child_process`, `celery`, `queue` | worker |
**Execution**:
```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
```
---
### Step 2: Trust Boundary Identification
Map the 5 standard trust boundary types. For each boundary: document what data crosses it, how it is enforced, and what happens when enforcement fails.
**Trust Boundary Types**:
| Boundary | From | To | Key Data Crossing |
|----------|------|----|------------------|
| External boundary | User/browser | Application server | User input, credentials, session tokens |
| Service boundary | Application | External APIs/services | API keys, request bodies, response data |
| Data boundary | Application | Database/storage | Query parameters, credentials, PII |
| Internal boundary | Public routes | Authenticated/admin routes | Auth tokens, role claims |
| Process boundary | Main process | Worker/subprocess | Job parameters, environment variables |
For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails
---
### Step 3: STRIDE per Component
For each discovered component, evaluate all 6 STRIDE categories systematically.
**STRIDE Category Definitions**:
| Category | Threat | Key Question |
|----------|--------|-------------|
| S — Spoofing | Identity impersonation | Can an attacker pretend to be someone else? |
| T — Tampering | Data modification | Can data be modified in transit or at rest? |
| R — Repudiation | Deniable actions | Can a user deny performing an action? |
| I — Information Disclosure | Data leakage | Can sensitive data be exposed? |
| D — Denial of Service | Availability disruption | Can the system be made unavailable? |
| E — Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? |
**Spoofing Analysis Checks**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?
**Tampering Analysis Checks**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?
**Repudiation Analysis Checks**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?
**Information Disclosure Analysis Checks**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?
**Denial of Service Analysis Checks**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?
**Elevation of Privilege Analysis Checks**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?
**Component Exposure Rating**:
| Rating | Criteria |
|--------|----------|
| High | Public-facing, handles sensitive data, complex logic |
| Medium | Authenticated access, moderate data sensitivity |
| Low | Internal only, no sensitive data, simple operations |
---
### Step 4: Attack Surface Assessment
Quantify the attack surface across the entire system.
**Attack Surface Components**:
```
Attack Surface = Sum of:
- Number of public API endpoints
- Number of external service integrations
- Number of user-controllable input points
- Number of privileged operations
- Number of data stores with sensitive content
```
**Decision Table — Attack Surface Rating**:
| Total Score | Interpretation |
|-------------|---------------|
| 0–5 | Low attack surface |
| 6–15 | Moderate attack surface |
| 16–30 | High attack surface |
| > 30 | Very high attack surface — prioritize hardening |
Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays for each STRIDE category. A finding in Phase 2 (e.g., A03 injection) maps to STRIDE T (Tampering) for the relevant component.
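The sum and rating above can be sketched directly in shell. The component counts here are illustrative placeholders, not discovery results:

```shell
# Sketch: attack surface score and rating from component counts.
public_endpoints=8; external_integrations=3; input_points=6
privileged_operations=2; sensitive_data_stores=2
total=$(( public_endpoints + external_integrations + input_points + privileged_operations + sensitive_data_stores ))
# Thresholds from the rating decision table
if [ "$total" -le 5 ]; then rating="low"
elif [ "$total" -le 15 ]; then rating="moderate"
elif [ "$total" -le 30 ]; then rating="high"
else rating="very high"
fi
echo "attack_surface=${total} rating=${rating}"
# prints: attack_surface=21 rating=high
```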
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/threat-model.json` | JSON | STRIDE model with components, trust boundaries, attack surface |
```json
{
"phase": "threat-modeling",
"timestamp": "ISO-8601",
"framework": "STRIDE",
"components": [
{
"name": "Component name",
"type": "api_endpoint|data_store|external_service|auth_module|worker",
"files": ["path/to/file.ts"],
"exposure": "high|medium|low",
"trust_boundaries": ["external", "data"],
"threats": {
"spoofing": {
"applicable": true,
"findings": ["Description of threat"],
"mitigations": ["Existing mitigation"],
"gaps": ["Missing mitigation"]
},
"tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
}
}
],
"trust_boundaries": [
{
"name": "Boundary name",
"from": "Component A",
"to": "Component B",
"enforcement": "TLS|auth_middleware|API_key",
"data_crossing": ["request bodies", "credentials"],
"risk_level": "high|medium|low"
}
],
"attack_surface": {
"public_endpoints": 0,
"external_integrations": 0,
"input_points": 0,
"privileged_operations": 0,
"sensitive_data_stores": 0,
"total_score": 0
},
"summary": {
"components_analyzed": 0,
"threats_identified": 0,
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
"high_exposure_components": 0
}
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| At least one component analyzed | `components` array has at least 1 entry |
| All 6 STRIDE categories evaluated per component | Each component.threats has all 6 keys |
| Trust boundaries mapped | `trust_boundaries` array populated |
| Attack surface quantified | `attack_surface.total_score` calculated |
| `threat-model.json` written to `.workflow/.security/` | File exists and is valid JSON |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No components discovered via grep | Analyze project structure manually (README, package.json); note uncertainty |
| Phase 2 findings not available for cross-reference | Proceed with grep-only; note missing OWASP context |
| Ambiguous architecture (monolith vs microservices) | Document assumption in summary; note for user review |
| No `.github/workflows/` for CI boundary | Mark process boundary as not_applicable |
## Next Phase
-> [Phase 4: Report & Tracking](04-report-tracking.md)


@@ -0,0 +1,300 @@
# Phase 4: Report & Tracking
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Generate scored audit report, compare with previous audits, and track security trends.
## Objective
- Calculate security score from all phase findings
- Compare with previous audit results (if available)
- Generate date-stamped report in `.workflow/.security/`
- Track improvement or regression trends
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings |
| `.workflow/.security/threat-model.json` | Yes | Phase 3 findings (STRIDE gaps) |
| `.workflow/.security/audit-report-*.json` | No | Previous audit reports for trend comparison |
| `~/.codex/skills/security-audit/specs/scoring-gates.md` | Yes | Scoring formula and gate thresholds |
## Execution Steps
### Step 1: Aggregate Findings
Collect all findings from phases 1–3 and classify by severity.
**Aggregation Formula**:
```
All findings =
supply-chain-report.findings
+ owasp-findings.findings
+ threat-model threats (where gaps array is non-empty)
```
**Deduplication Rule**:
| Condition | Action |
|-----------|--------|
| Same vulnerability appears in multiple phases | Keep highest-severity classification; merge evidence; count as single finding |
| Same file:line in different categories | Merge into one finding; note all phases that detected it |
| Unique finding per phase | Include as-is |
---
### Step 2: Calculate Score
Apply scoring formula from `~/.codex/skills/security-audit/specs/scoring-gates.md`.
**Scoring Formula**:
```
base_score = 10.0
severity_weight: Critical = 10, High = 7, Medium = 4, Low = 1
weighted_sum = SUM(severity_weight * count_per_severity)
normalization_factor = max(10, total_files_scanned)
final_score = max(0, base_score - weighted_sum / normalization_factor)
# Critical findings carry outsized weight by design: a handful can fail
# the gate even in a large codebase.
```
**Severity Weights**:
| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| Critical | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| High | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| Medium | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| Low | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |
**Score Interpretation**:
| Score | Rating | Meaning |
|-------|--------|---------|
| 9.0–10.0 | Excellent | Minimal risk, production-ready |
| 7.0–8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0–6.9 | Fair | Notable risks, remediation recommended |
| 3.0–4.9 | Poor | Significant risks, remediation required |
| 0.0–2.9 | Critical | Severe vulnerabilities, immediate action needed |
**Example Score Calculations**:
| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|----------|--------------|--------------|---------|-------|
| 1 critical | 50 | 10 | 0.2 | 9.8 |
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
| 20 critical | 20 | 200 | 10.0 | 0.0 |
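The formula and the example rows above can be sketched as a small helper (awk handles the floating-point arithmetic; the sample call is illustrative):

```shell
# Sketch: final score from severity counts and files scanned.
score() {  # usage: score <critical> <high> <medium> <low> <files_scanned>
  awk -v c="$1" -v h="$2" -v m="$3" -v l="$4" -v files="$5" 'BEGIN {
    weighted = c*10 + h*7 + m*4 + l*1       # severity weights from the table
    norm = (files > 10) ? files : 10        # normalization floor of 10
    s = 10.0 - weighted / norm
    if (s < 0) s = 0
    printf "%.1f\n", s
  }'
}
score 2 3 0 0 50   # 2 critical + 3 high over 50 files; prints 9.2
```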
---
### Step 3: Gate Evaluation
**Daily quick-scan gate** (Phase 1 only):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 8.0 | Continue. No blocking issues. |
| WARN | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| FAIL | score < 6.0 | Block deployment. Remediate critical/high findings. |
**Comprehensive audit gate** (all phases):
Initial/baseline audit (no previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 2.0 | Baseline established. Plan remediation. |
| FAIL | score < 2.0 | Critical exposure. Immediate triage required. |
Subsequent audits (previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= previous_score | No regression. Continue improvement. |
| WARN | previous_score - 0.5 <= score < previous_score | Marginal regression. Review new findings. |
| FAIL | score < previous_score - 0.5 | Regression detected. Investigate new findings. |
Production readiness target: score >= 7.0
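Both gate tables can be sketched as helpers, with thresholds taken from the tables above (the sample calls are illustrative):

```shell
# Sketch: quick-scan gate (fixed thresholds).
quick_scan_gate() {  # usage: quick_scan_gate <score>
  awk -v s="$1" 'BEGIN {
    if (s >= 8.0)      print "PASS"
    else if (s >= 6.0) print "WARN"
    else               print "FAIL"
  }'
}
# Sketch: subsequent comprehensive-audit gate (relative to previous score).
regression_gate() {  # usage: regression_gate <score> <previous_score>
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (s >= p)            print "PASS"
    else if (s >= p - 0.5) print "WARN"
    else                   print "FAIL"
  }'
}
quick_scan_gate 7.5       # prints WARN
regression_gate 6.2 6.5   # prints WARN
```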
---
### Step 4: Trend Comparison
Find and compare with previous audit reports.
**Execution**:
```bash
# Find previous audit reports
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```
**Trend Direction Decision Table**:
| Condition | direction |
|-----------|-----------|
| No previous audit file found | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |
Compare current vs. previous:
- Delta per OWASP category (new findings vs. resolved findings)
- Delta per STRIDE category
- New findings vs. resolved findings (by title/file comparison)
- Overall score trend
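The delta-to-direction mapping from the decision table can be sketched as:

```shell
# Sketch: trend direction from current and previous scores.
trend_direction() {  # usage: trend_direction <current> <previous|none>
  if [ "$2" = "none" ]; then echo "baseline"; return; fi
  awk -v c="$1" -v p="$2" 'BEGIN {
    d = c - p
    if (d > 0.5)        print "improving"
    else if (d >= -0.5) print "stable"
    else                print "regressing"
  }'
}
trend_direction 7.5 6.8   # prints improving
```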
**Trend JSON Format**:
```json
{
"trend": {
"current_date": "2026-03-29",
"current_score": 7.5,
"previous_date": "2026-03-22",
"previous_score": 6.8,
"score_delta": 0.7,
"new_findings": 2,
"resolved_findings": 5,
"direction": "improving",
"history": [
{ "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
{ "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
{ "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
]
}
}
```
---
### Step 5: Generate Report
Assemble and write the final scored report.
**Execution**:
```bash
# Ensure output directory exists
WORK_DIR=".workflow/.security"
mkdir -p "${WORK_DIR}"
# Write report with date stamp; phase outputs already live in WORK_DIR
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" "${WORK_DIR}/audit-report-${DATE}.json"
```
Build `remediation_priority` list: rank by severity weight × inverse effort (low effort + high impact = priority 1).
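The ranking rule can be sketched as follows, assuming findings are flattened to `severity|effort|title` lines (a hypothetical intermediate format) and effort costs of low=1, medium=2, high=3 (an assumption, not from the spec):

```shell
# Sketch: priority score = severity_weight / effort_cost, highest first.
rank_remediation() {
  awk -F'|' '
    BEGIN { w["critical"]=10; w["high"]=7; w["medium"]=4; w["low"]=1
            e["low"]=1; e["medium"]=2; e["high"]=3 }
    { printf "%.2f|%s\n", w[$1] / e[$2], $0 }
  ' | sort -t'|' -k1,1nr | awk -F'|' '{ print NR ". " $4 " (" $2 "/" $3 ")" }'
}
printf '%s\n' \
  'high|low|Rotate leaked API key' \
  'critical|high|Rework auth bypass' \
  'medium|low|Add security headers' | rank_remediation
# prints:
# 1. Rotate leaked API key (high/low)
# 2. Add security headers (medium/low)
# 3. Rework auth bypass (critical/high)
```

Low-effort, high-severity items surface first, matching the priority-1 definition in the success criteria.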
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/audit-report-<YYYY-MM-DD>.json` | JSON | Full scored report with trend, top risks, remediation priority |
```json
{
"report": "security-audit",
"version": "1.0",
"timestamp": "ISO-8601",
"date": "YYYY-MM-DD",
"mode": "comprehensive|quick-scan",
"score": {
"overall": 7.5,
"rating": "Good",
"gate": "PASS|FAIL",
"gate_threshold": 8
},
"findings_summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_phase": {
"supply_chain": 0,
"owasp": 0,
"stride": 0
},
"by_owasp": {
"A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
"A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
},
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
},
"top_risks": [
{
"rank": 1,
"title": "Most critical finding",
"severity": "critical",
"source_phase": "owasp",
"remediation": "How to fix",
"effort": "low|medium|high"
}
],
"trend": {
"previous_date": "YYYY-MM-DD or null",
"previous_score": 0,
"score_delta": 0,
"new_findings": 0,
"resolved_findings": 0,
"direction": "improving|stable|regressing|baseline"
},
"phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
"files_scanned": 0,
"remediation_priority": [
{
"priority": 1,
"finding": "Finding title",
"effort": "low",
"impact": "high",
"recommendation": "Specific action"
}
]
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Score calculated using correct formula | Verify: overall = 10.0 - (weighted_sum / max(10, files_scanned)) |
| Gate evaluation matches mode and audit history | Check gate logic against previous audit presence |
| Trend direction computed correctly | Verify score_delta and direction mapping |
| `audit-report-<date>.json` written to `.workflow/.security/` | File exists, is valid JSON, contains all required fields |
| remediation_priority ranked by severity and effort | Priority 1 = highest severity + lowest effort |
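The score formula in the first criterion can be sketched as below; flooring the result at 0 is an assumption, since the formula as stated could go negative for very noisy scans:

```python
def audit_score(weighted_sum, files_scanned):
    """Base 10.0 minus weighted findings, normalized by file count (minimum factor 10)."""
    normalization = max(10, files_scanned)
    return max(0.0, 10.0 - weighted_sum / normalization)
```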
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase data file missing or corrupted | Report as BLOCKED; output partial report with available data |
| Previous audit parse error | Treat as baseline; note data integrity issue |
| files_scanned is zero | Use normalization_factor of 10 (minimum); continue |
| Date command unavailable | Use ISO timestamp substring for date portion |
| Write fails | Retry once with explicit `mkdir -p`; report BLOCKED if still failing |
## Completion Status
After report generation, output skill completion status:
| Status | Condition |
|--------|-----------|
| DONE | All phases completed, report generated, gate PASS |
| DONE_WITH_CONCERNS | Report generated but gate WARN or FAIL, or regression detected |
| BLOCKED | Phase data missing or corrupted, cannot calculate score |
@@ -0,0 +1,318 @@
# ship-operator Agent
Executes all 5 gated phases of the release pipeline sequentially, enforcing gate conditions before advancing.
## Identity
- **Type**: `pipeline-executor`
- **Role File**: `~/.codex/agents/ship-operator.md`
- **task_name**: `ship-operator`
- **Responsibility**: Code generation / Execution (write mode — git, file updates, push, PR)
- **fork_context**: false
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Read the phase detail file at the start of each phase before executing any step
- Check gate condition after each phase and halt on failure
- Produce structured JSON output for each completed phase
- Confirm with user before proceeding on major version bumps or direct-to-main releases
- Include file:line references in any findings
### MUST NOT
- Skip the MANDATORY FIRST STEPS role loading
- Advance to the next phase if the current phase gate fails
- Push to remote if Phase 3 (version bump) gate failed
- Create a PR if Phase 4 (push) gate failed
- Produce unstructured output
- Modify files outside the release pipeline scope (version file, CHANGELOG.md, package-lock.json)
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Execution | Run git, npm, pytest, gh, jq, sed commands |
| `Read` | File I/O | Read phase detail files, version files, CHANGELOG.md |
| `Write` | File I/O | Write/update CHANGELOG.md, VERSION file |
| `Edit` | File I/O | Update package.json, pyproject.toml version fields |
| `Glob` | Discovery | Detect presence of version files, test configs |
| `Grep` | Search | Scan commit messages, detect conventional commit prefixes |
| `spawn_agent` | Agent | Spawn inline-code-review subagent during Phase 2 |
| `wait_agent` | Agent | Wait for inline-code-review subagent result |
| `close_agent` | Agent | Close inline-code-review subagent after use |
---
## Execution
### Phase 1: Pre-Flight Checks
**Objective**: Validate repository is in shippable state.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/01-preflight-checks.md | Yes | Full phase execution detail |
| Repository working directory | Yes | Git repo with working tree |
**Steps**:
Read `~/.codex/skills/ship/phases/01-preflight-checks.md` first.
Then execute all four checks as specified in that file:
1. Git clean check — `git status --porcelain`
2. Branch validation — `git branch --show-current`
3. Test suite execution — detect and run npm test / pytest
4. Build verification — detect and run npm run build / python -m build / make build
**Decision Table**:
| Condition | Action |
|-----------|--------|
| All checks pass | Set gate = pass, output preflight JSON, await Phase 2 task |
| Any check fails | Set gate = fail, output BLOCKED with failure details, halt |
| Branch is main/master | Set gate = warn, ask user to confirm direct release |
| No tests detected | Set gate = warn (skip), continue to build check |
| No build step detected | Set gate = pass (info), continue |
**Output**: Structured preflight-report JSON (see phase file for schema).
---
### Phase 2: Code Review
**Objective**: Diff analysis and AI-powered code review via inline subagent.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/02-code-review.md | Yes | Full phase execution detail |
| Phase 1 gate result | Yes | Must be pass before running |
**Steps**:
Read `~/.codex/skills/ship/phases/02-code-review.md` first.
1. Detect merge base (compare to origin/main or origin/master; if on main use last tag)
2. Generate diff summary (`git diff --stat`, count files/lines)
3. Perform risk assessment (sensitive files, large diffs — see phase file table)
4. Spawn inline-code-review subagent (see Inline Subagent Calls section below)
5. Evaluate review results against gate condition
**Decision Table**:
| Condition | Action |
|-----------|--------|
| No critical issues | Set gate = pass, output review JSON |
| Critical issues found | Set gate = fail, output BLOCKED with issues list |
| Warnings only | Set gate = warn, proceed, flag DONE_WITH_CONCERNS |
| Subagent timeout or error | Log warning, ask user whether to proceed or retry |
**Output**: Structured code-review JSON (see phase file for schema).
---
### Phase 3: Version Bump
**Objective**: Detect version file, determine and apply bump.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/03-version-bump.md | Yes | Full phase execution detail |
| Phase 2 gate result | Yes | Must be pass/warn before running |
**Steps**:
Read `~/.codex/skills/ship/phases/03-version-bump.md` first.
1. Detect version file (package.json > pyproject.toml > VERSION)
2. Read current version
3. Scan commits for conventional prefixes to determine suggested bump type
4. For major bumps: ask user to confirm before proceeding
5. Calculate new version (semver)
6. Update version file using jq / sed / echo as appropriate
7. Verify update by re-reading
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Version file found and updated | Set gate = pass, output version record |
| No version file found | Set gate = needs_context, ask user, halt until answered |
| Version mismatch after update | Set gate = fail, output BLOCKED |
| User declines major bump | Set gate = blocked, halt |
| Bump type ambiguous | Default to patch, inform user |
**Output**: Structured version-bump JSON (see phase file for schema).
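The semver calculation in step 5 can be sketched as follows (plain `X.Y.Z` versions only; pre-release and build suffixes are out of scope for this sketch):

```python
def bump_version(current, bump_type):
    """Compute the next semantic version for a patch/minor/major bump."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump_type == "major":
        return f"{major + 1}.0.0"
    if bump_type == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # default: patch
```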
---
### Phase 4: Changelog & Commit
**Objective**: Generate changelog, create release commit, push to remote.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/04-changelog-commit.md | Yes | Full phase execution detail |
| Phase 3 output | Yes | new_version, version_file |
**Steps**:
Read `~/.codex/skills/ship/phases/04-changelog-commit.md` first.
1. Gather commits since last tag (`git log "$last_tag"..HEAD`)
2. Group by conventional commit prefix into changelog sections
3. Format markdown changelog entry (`## [X.Y.Z] - YYYY-MM-DD`)
4. Update or create CHANGELOG.md (insert new entry after main heading)
5. Stage changes (`git add -u`)
6. Create release commit (`chore: bump version to <new_version>`)
7. Push branch to remote
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Push succeeded | Set gate = pass, output commit record |
| Push rejected (non-fast-forward) | Set gate = fail, BLOCKED — suggest `git pull --rebase` |
| Permission denied | Set gate = fail, BLOCKED — advise check remote access |
| No remote configured | Set gate = fail, BLOCKED — suggest `git remote add` |
| No previous tag | Use last 50 commits for changelog |
**Output**: Structured changelog-commit JSON (see phase file for schema).
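The grouping in step 2 can be sketched as below; the section names are illustrative assumptions, not the headings mandated by the phase file:

```python
# Illustrative mapping from conventional commit prefixes to changelog headings.
SECTIONS = {"feat": "Added", "fix": "Fixed", "perf": "Performance", "docs": "Documentation"}

def group_commits(subjects):
    """Group commit subjects by conventional prefix; scoped prefixes like feat(api) included."""
    grouped = {}
    for subject in subjects:
        prefix = subject.split(":", 1)[0].split("(")[0] if ":" in subject else ""
        grouped.setdefault(SECTIONS.get(prefix, "Other"), []).append(subject)
    return grouped
```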
---
### Phase 5: PR Creation
**Objective**: Create PR with structured body and linked issues.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/05-pr-creation.md | Yes | Full phase execution detail |
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type |
| Phase 2 output | Yes | merge_base (for change summary) |
**Steps**:
Read `~/.codex/skills/ship/phases/05-pr-creation.md` first.
1. Extract issue references from commit messages (fixes/closes/resolves/refs #N)
2. Determine target branch (main, falling back to master)
3. Build PR title: `release: v<new_version>`
4. Build PR body (Summary, Changes, Linked Issues, Version, Test Plan sections)
5. Create PR via `gh pr create`
6. Capture PR URL from gh output
**Decision Table**:
| Condition | Action |
|-----------|--------|
| PR created, URL returned | Set gate = pass, output PR record, output DONE |
| Phase 2 had warnings only | Set gate = pass with concerns, output DONE_WITH_CONCERNS |
| gh CLI not available | Set gate = fail, BLOCKED — advise `gh auth login` |
| PR creation fails | Set gate = fail, BLOCKED — report error details |
**Output**: Structured PR creation JSON plus final completion status (see phase file for schema).
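The issue-reference extraction in step 1 can be sketched with a regular expression over the commit messages:

```python
import re

# Matches the fixes/closes/resolves/refs #N patterns named in step 1.
ISSUE_REF = re.compile(r"\b(?:fixes|closes|resolves|refs)\s+#(\d+)", re.IGNORECASE)

def linked_issues(commit_messages):
    """Collect unique issue numbers in first-seen order."""
    seen = []
    for message in commit_messages:
        for number in ISSUE_REF.findall(message):
            if number not in seen:
                seen.append(number)
    return seen
```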
---
## Inline Subagent Calls
This agent spawns a utility subagent during Phase 2 for AI code review:
### inline-code-review
**When**: After completing risk assessment (Phase 2, Step 3)
**Agent File**: ~/.codex/agents/cli-explore-agent.md
```
spawn_agent({
task_name: "inline-code-review",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)
Task:
- Review diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues
Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})
const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```
### Result Handling
| Result | Severity | Action |
|--------|----------|--------|
| recommendation: "ship", no critical issues | — | gate = pass, integrate findings |
| recommendation: "hold" or critical issues found | HIGH | gate = fail, BLOCKED — list issues |
| recommendation: "fix-first" | HIGH | gate = fail, BLOCKED — list issues with locations |
| Warnings only, recommendation: "ship" | MEDIUM | gate = warn, proceed with DONE_WITH_CONCERNS |
| Timeout or error | — | Log warning, ask user whether to proceed or retry |
---
## Structured Output Template
```
## Summary
- One-sentence phase completion status
## Phase Result
- Phase: <phase_name>
- Gate: pass | fail | warn | blocked | needs_context
- Status: PASS | BLOCKED | NEEDS_CONTEXT | DONE_WITH_CONCERNS | DONE
## Findings
- Finding 1: specific description with file:line reference (if applicable)
- Finding 2: specific description with file:line reference (if applicable)
## Artifacts
- File: path/to/modified/file
Change: specific modification made
## Open Questions
1. Question needing user answer (if gate = needs_context)
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase detail file not found | Report error, halt — phase files are required |
| Git command fails | Report stderr, set gate = fail, BLOCKED |
| Version file parse error | Report error, set gate = needs_context, ask user |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Build/test failure | Report output, set gate = fail, BLOCKED |
| Push rejected | Report rejection reason, set gate = fail, BLOCKED with suggestion |
| gh CLI missing | Report error, set gate = fail, BLOCKED with install advice |
| Three consecutive failures at same step | Stop, output diagnostic dump, halt |
@@ -0,0 +1,426 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
agents: ship-operator
phases: 5
---
# Ship
Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.
## Architecture
```
+--------------------------------------------------------------+
| ship Orchestrator |
| -> Single ship-operator agent driven through 5 gated phases |
+------------------------------+-------------------------------+
|
+-------------------+-------------------+
v v v
+------------+ +------------+ +------------+
| Phase 1 | --> | Phase 2 | --> | Phase 3 |
| Pre-Flight | | Code Review| | Version |
| Checks | | | | Bump |
+------------+ +------------+ +------------+
v v v
Gate: ALL Gate: No Gate: Version
4 checks critical updated OK
pass issues
|
+-------------------+-------------------+
v v
+------------+ +------------+
| Phase 4 | ----------------------> | Phase 5 |
| Changelog | | PR Creation|
| & Commit | | |
+------------+ +------------+
v v
Gate: Push Gate: PR
succeeded created
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|--------------|
| ship-operator | ship-operator | ~/.codex/agents/ship-operator.md | Execute all 5 release phases sequentially, enforce gates | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|--------------|-----------|-----------|
| ship-operator | ship-operator | false | — | Starts fresh; all context provided in initial task message |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|--------------|--------|
| Pipeline stage with explicit input | false | Context in message, not history |
| Agent is isolated utility | false | Clean context, focused task |
| ship-operator | false | Self-contained release operator; no parent context needed |
---
## Subagent Registry
Utility subagents callable by ship-operator (not separate pipeline stages):
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-code-review | ~/.codex/agents/cli-explore-agent.md | ship-operator | AI code review of diff during Phase 2 | haiku |
> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.
---
## Phase Execution
### Phase 1: Pre-Flight Checks
**Objective**: Validate that the repository is in a shippable state — confirm clean working tree, appropriate branch, passing tests, and successful build.
**Input**:
| Source | Description |
|--------|-------------|
| User trigger | "ship" / "release" / "publish" command |
| Repository | Current git working directory |
| Phase detail | ~/.codex/skills/ship/phases/01-preflight-checks.md |
**Execution**:
Spawn ship-operator with Phase 1 task. The operator reads the phase detail file then executes all four checks.
```
spawn_agent({
task_name: "ship-operator",
fork_context: false,
message: `## TASK ASSIGNMENT
### MANDATORY FIRST STEPS
1. Read role definition: ~/.codex/agents/ship-operator.md (MUST read first)
2. Read phase detail: ~/.codex/skills/ship/phases/01-preflight-checks.md
---
Goal: Execute Phase 1 Pre-Flight Checks for the release pipeline.
Execute all four checks (git clean, branch validation, test suite, build verification).
Output structured preflight-report JSON plus gate status.`
})
const phase1Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| All four checks pass (overall: "pass") | Fast-advance: assign Phase 2 task to ship-operator |
| Any check fails (overall: "fail") | BLOCKED — report failure details, halt pipeline |
| Branch is main/master (warn) | Ask user to confirm direct-to-main release before proceeding |
| Timeout | assign_task "Finalize current work and output results", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| preflight-report JSON | Pass/fail per check, blockers list |
| Gate status | pass / fail / blocked |
---
### Phase 2: Code Review
**Objective**: Detect merge base, generate diff, run AI-powered code review via inline subagent, assess risk, evaluate results.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 1 result | Gate passed (overall: "pass") |
| Repository | Git history, diff data |
| Phase detail | ~/.codex/skills/ship/phases/02-code-review.md |
**Execution**:
Phase 2 is assigned to the already-running ship-operator via assign_task.
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 2 TASK
Read phase detail: ~/.codex/skills/ship/phases/02-code-review.md
Execute Phase 2 Code Review:
1. Detect merge base
2. Generate diff summary
3. Perform risk assessment
4. Spawn inline-code-review subagent for AI analysis
5. Evaluate review results and report gate status` }]
})
const phase2Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 600000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| No critical issues (overall: "pass") | Fast-advance: assign Phase 3 task to ship-operator |
| Critical issues found (overall: "fail") | BLOCKED — report critical issues list, halt pipeline |
| Warnings only (overall: "warn") | Fast-advance to Phase 3, flag DONE_WITH_CONCERNS |
| Review subagent timeout/error | Ask user whether to proceed or retry; if proceed, flag warn |
| Timeout on phase2Result | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Review summary JSON | Risk level, risk factors, AI review recommendation, issues |
| Gate status | pass / fail / warn / blocked |
---
### Phase 3: Version Bump
**Objective**: Detect version file, determine bump type from commits or user input, calculate new version, update version file, verify update.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 2 result | Gate passed (no critical issues) |
| Repository | package.json / pyproject.toml / VERSION |
| Phase detail | ~/.codex/skills/ship/phases/03-version-bump.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 3 TASK
Read phase detail: ~/.codex/skills/ship/phases/03-version-bump.md
Execute Phase 3 Version Bump:
1. Detect version file (package.json > pyproject.toml > VERSION)
2. Determine bump type from commit messages (patch/minor/major)
3. For major bumps: ask user to confirm before proceeding
4. Calculate new version
5. Update version file
6. Verify update
Output version change record JSON plus gate status.` }]
})
const phase3Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| Version file updated and verified (overall: "pass") | Fast-advance: assign Phase 4 task to ship-operator |
| Version file not found | NEEDS_CONTEXT — ask user which file to use; halt until answered |
| Version mismatch after update (overall: "fail") | BLOCKED — report mismatch, halt pipeline |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Version change record JSON | version_file, previous_version, new_version, bump_type, bump_source |
| Gate status | pass / fail / needs_context / blocked |
---
### Phase 4: Changelog & Commit
**Objective**: Parse git log into grouped changelog entry, update CHANGELOG.md, create release commit, push branch to remote.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 3 result | new_version, version_file, bump_type |
| Repository | Git history since last tag |
| Phase detail | ~/.codex/skills/ship/phases/04-changelog-commit.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 4 TASK
Read phase detail: ~/.codex/skills/ship/phases/04-changelog-commit.md
New version: <new_version>
Version file: <version_file>
Execute Phase 4 Changelog & Commit:
1. Gather commits since last tag
2. Group by conventional commit type
3. Format changelog entry
4. Update or create CHANGELOG.md
5. Create release commit (chore: bump version to <new_version>)
6. Push branch to remote
Output commit record JSON plus gate status.` }]
})
const phase4Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| Push succeeded (overall: "pass") | Fast-advance: assign Phase 5 task to ship-operator |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise check remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Commit record JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| Gate status | pass / fail / blocked |
---
### Phase 5: PR Creation
**Objective**: Extract issue references from commits, build PR title and body, create PR via `gh pr create`, capture PR URL.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 4 result | commit_sha, new_version, previous_version, bump_type |
| Phase 2 result | merge_base (for change_summary) |
| Repository | Git history, remote |
| Phase detail | ~/.codex/skills/ship/phases/05-pr-creation.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 5 TASK
Read phase detail: ~/.codex/skills/ship/phases/05-pr-creation.md
New version: <new_version>
Previous version: <previous_version>
Bump type: <bump_type>
Merge base: <merge_base>
Commit SHA: <commit_sha>
Execute Phase 5 PR Creation:
1. Extract issue references from commits
2. Determine target branch
3. Build PR title: "release: v<new_version>"
4. Build PR body with all sections
5. Create PR via gh pr create
6. Capture and report PR URL
Output PR creation record JSON plus final completion status.` }]
})
const phase5Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| PR created, URL returned (overall: "pass") | Pipeline complete — output DONE status |
| PR created with review warnings | Pipeline complete — output DONE_WITH_CONCERNS |
| gh CLI not available | BLOCKED — report error, advise `gh auth login` |
| PR creation fails | BLOCKED — report error details, halt |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| PR record JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | DONE / DONE_WITH_CONCERNS / BLOCKED |
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Pre-Flight | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 2: Code Review | 600000 ms (10 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 3: Version Bump | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 4: Changelog & Commit | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 5: PR Creation | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
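Every phase shares the same timeout protocol, which can be sketched as below. The `timed_out` flag on the wait result is an assumption for illustration; the real `wait_agent` result shape may differ.

```python
def wait_with_retry(wait_agent, assign_task, timeout_ms):
    """On first timeout, nudge the operator to finalize, then re-wait 120 s."""
    result = wait_agent(targets=["ship-operator"], timeout_ms=timeout_ms)
    if result.get("timed_out"):
        assign_task(
            target="ship-operator",
            items=[{"type": "text", "text": "Finalize current work and output results"}],
        )
        result = wait_agent(targets=["ship-operator"], timeout_ms=120000)
    return result
```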
### Cleanup Protocol
After Phase 5 completes (or on any terminal BLOCKED halt), close ship-operator.
```
close_agent({ target: "ship-operator" })
```
### Agent Health Check
```
const remaining = list_agents({})
if (remaining.length > 0) {
remaining.forEach(agent => close_agent({ target: agent.id }))
}
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task with "Finalize current work and output results" + re-wait 120s |
| Agent timeout (second) | Log error, close_agent({ target: "ship-operator" }), report partial results |
| Gate fail — any phase | Log BLOCKED status with phase name and failure detail, close_agent, halt |
| NEEDS_CONTEXT | Pause pipeline, surface question to user, resume with assign_task on answer |
| send_message ignored | Escalate to assign_task |
| Inline subagent timeout | ship-operator handles internally; continue with warn if review failed |
| User cancellation | close_agent({ target: "ship-operator" }), report current pipeline state |
| Fork from closed agent | Not applicable (single agent, no forking) |
---
## Output Format
```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)
## Results
- Phase 1 Pre-Flight: pass/fail
- Phase 2 Code Review: pass/warn/fail
- Phase 3 Version Bump: <previous> -> <new> (<bump_type>)
- Phase 4 Changelog & Commit: commit <sha> pushed to <remote/branch>
- Phase 5 PR Creation: <pr_url>
## Artifacts
- CHANGELOG.md (updated)
- <version_file> (version bumped to <new_version>)
- Release commit: <sha>
- PR: <pr_url>
## Next Steps (Optional)
1. Review and merge the PR
2. Tag the release after merge
```
@@ -0,0 +1,198 @@
# Phase 1: Pre-Flight Checks
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Validate that the repository is in a shippable state before proceeding with the release pipeline.
## Objective
- Confirm working tree is clean (no uncommitted changes)
- Validate current branch is appropriate for release
- Run test suite and confirm all tests pass
- Verify build succeeds
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Repository working directory | Yes | Git repo with working tree |
| package.json / pyproject.toml / Makefile | No | Used for test and build detection |
## Execution Steps
### Step 1: Git Clean Check
Run `git status --porcelain` and evaluate output.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Output is empty | PASS — working tree is clean |
| Output is non-empty | FAIL — working tree is dirty; report dirty files, suggest `git stash` or `git commit` |
```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
echo "FAIL: Working tree is dirty"
echo "$git_status"
# Gate: BLOCKED — commit or stash changes first
else
echo "PASS: Working tree is clean"
fi
```
**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.
---
### Step 2: Branch Validation
Run `git branch --show-current` and evaluate.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Branch is not main or master | PASS — proceed |
| Branch is main or master | WARN — ask user to confirm direct-to-main/master release before proceeding |
| User confirms direct release | PASS with warning noted |
| User declines | BLOCKED — halt pipeline |
```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
echo "WARN: Currently on $current_branch — direct push to main/master is risky"
# Ask user for confirmation before proceeding
else
echo "PASS: On branch $current_branch"
fi
```
**Pass condition**: Not on main/master, OR user explicitly confirms direct-to-main release.
**On warning**: Ask user to confirm they intend to release from main/master directly.
---
### Step 3: Test Suite Execution
Detect project type and run appropriate test command.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json with "test" script exists | Run `npm test` |
| pytest available and tests/ or test/ directory exists | Run `pytest` |
| pyproject.toml with pytest listed exists | Run `pytest` |
| No test suite detected | WARN and continue (skip check) |
| Test command exits code 0 | PASS |
| Test command exits non-zero | FAIL — report test failures, halt pipeline |
```bash
# Detection priority:
# 1. package.json with "test" script → npm test
# 2. pytest available and tests exist → pytest
# 3. No tests found → WARN and continue
if [ -f "package.json" ] && grep -q '"test"' package.json; then
npm test
elif command -v pytest &>/dev/null && { [ -d "tests" ] || [ -d "test" ]; }; then
pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
pytest
else
echo "WARN: No test suite detected — skipping test check"
fi
```
**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.
---
### Step 4: Build Verification
Detect project build step and run it.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json with "build" script exists | Run `npm run build` |
| pyproject.toml exists and python build module available | Run `python -m build` |
| Makefile with build target exists | Run `make build` |
| No build step detected | INFO — skip (not all projects need a build), PASS |
| Build command exits code 0 | PASS |
| Build command exits non-zero | FAIL — report build errors, halt pipeline |
```bash
# Detection priority:
# 1. package.json with "build" script → npm run build
# 2. pyproject.toml → python -m build (if build module available)
# 3. Makefile with build target → make build
# 4. No build step → PASS (not all projects need a build)
if [ -f "package.json" ] && grep -q '"build"' package.json; then
npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
make build
else
echo "INFO: No build step detected — skipping build check"
fi
```
**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| preflight-report | JSON | Pass/fail per check, current branch, blockers list |
```json
{
"phase": "preflight",
"timestamp": "ISO-8601",
"checks": {
"git_clean": { "status": "pass|fail", "details": "" },
"branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
"tests": { "status": "pass|fail|skip", "details": "" },
"build": { "status": "pass|fail|skip", "details": "" }
},
"overall": "pass|fail",
"blockers": []
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Git working tree is clean | `git status --porcelain` returns empty |
| Branch is non-main or user confirmed | Branch check + optional user confirmation |
| Tests pass or skipped with warning | Test command exit code 0, or skip with WARN |
| Build passes or skipped with info | Build command exit code 0, or skip with INFO |
| Overall gate is "pass" | All checks produce pass/warn/skip (no fail) |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Dirty working tree | BLOCKED — list dirty files, suggest `git stash` or `git commit`, halt |
| Tests fail | BLOCKED — report test output, halt pipeline |
| Build fails | BLOCKED — report build output, halt pipeline |
| git command not found | BLOCKED — report environment error |
| No version file or project type detected | WARN — continue, version detection deferred to Phase 3 |
## Next Phase
-> [Phase 2: Code Review](02-code-review.md)
If any check fails (overall: "fail"), report BLOCKED status with the preflight report. Do not proceed.

# Phase 2: Code Review
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Automated AI-powered code review of changes since the base branch, with risk assessment.
## Objective
- Detect the merge base between current branch and target branch
- Generate diff for review
- Assess high-risk indicators before AI review
- Run AI-powered code review via inline subagent
- Flag high-risk changes (large diffs, sensitive files, breaking changes)
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 1 gate result | Yes | overall: "pass" — must have passed |
| Repository git history | Yes | Commit log, diff data |
## Execution Steps
### Step 1: Detect Merge Base
Determine the target branch and find the common ancestor commit.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| origin/main exists | Use main as target branch |
| origin/main not found | Fall back to master as target branch |
| Current branch is main or master | Use last tag as merge base |
| Current branch is main/master and no tags exist | Use initial commit as merge base |
| Current branch is feature branch | Use `git merge-base origin/<target> HEAD` |
```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"
# If on main/master directly, compare against last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
merge_base="$last_tag"
echo "On main — using last tag as base: $last_tag"
else
# Use first commit if no tags exist
merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
echo "No tags found — using initial commit as base"
fi
fi
```
---
### Step 2: Generate Diff Summary
Collect statistics and full diff content.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Diff command succeeds | Record files_changed, lines_added, lines_removed |
| No changes found | WARN — nothing to review; ask user whether to proceed |
```bash
# File-level summary
git diff --stat "$merge_base"...HEAD
# Full diff for review
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt
# Count changes for risk assessment
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
```
---
### Step 3: Risk Assessment
Flag high-risk indicators before AI review.
**Risk Factor Table**:
| Risk Factor | Threshold | Risk Level |
|-------------|-----------|------------|
| Files changed | > 50 | High |
| Lines changed | > 1000 | High |
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
| Migration files | `*migration*`, `*migrate*` | Medium |
```bash
# Check for sensitive file changes
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential|auth' || true)
if [ -n "$sensitive_files" ]; then
echo "HIGH RISK: Sensitive files modified:"
echo "$sensitive_files"
fi
```
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Sensitive files detected | Set risk_level = high, add to risk_factors |
| files_changed > 50 | Set risk_level = high, add to risk_factors |
| lines changed > 1000 | Set risk_level = high, add to risk_factors |
| Config or migration files detected | Set risk_level = medium (if not already high) |
| No risk factors | Set risk_level = low |
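The aggregation rules above can be sketched as a small shell helper; the function and variable names here are illustrative, not part of the skill:

```shell
# Hypothetical helper mirroring the decision table above.
# Inputs: files_changed and lines_changed counts, plus non-empty strings
# when sensitive or config/migration files were detected.
classify_risk() {
  local files_changed=$1 lines_changed=$2 sensitive=$3 config_or_migration=$4
  if [ -n "$sensitive" ] || [ "$files_changed" -gt 50 ] || [ "$lines_changed" -gt 1000 ]; then
    echo "high"
  elif [ -n "$config_or_migration" ]; then
    echo "medium"
  else
    echo "low"
  fi
}
```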
---
### Step 4: AI Code Review via Inline Subagent
Spawn the inline-code-review subagent for AI analysis. This inline subagent replaces the ccw CLI call used in the original Claude skill:
```
spawn_agent({
task_name: "inline-code-review",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)
Task:
- Review diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues
Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})
const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```
**Note**: Wait for the subagent to complete before proceeding. Do not advance to Step 5 while review is running.
---
### Step 5: Evaluate Review Results
Based on the inline subagent output, apply gate logic.
**Review Result Decision Table**:
| Review Result | Action |
|---------------|--------|
| recommendation: "ship", no critical issues | Gate = pass — proceed to Phase 3 |
| recommendation: "hold" or critical issues present | Gate = fail — report BLOCKED, list issues |
| recommendation: "fix-first" | Gate = fail — report BLOCKED, list issues with file:line |
| Warnings only, recommendation: "ship" | Gate = warn — proceed with DONE_WITH_CONCERNS note |
| Review subagent failed or timed out | Ask user whether to proceed or retry |
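As a sketch, the gate logic above might look like the following; the names are illustrative, and the actual evaluation is performed by the agent on the subagent's structured output:

```shell
# Hypothetical gate evaluation over the subagent's review result.
evaluate_gate() {
  local recommendation=$1 critical_count=$2 warning_count=$3
  if [ "$recommendation" = "ship" ] && [ "$critical_count" -eq 0 ]; then
    if [ "$warning_count" -gt 0 ]; then
      echo "warn"   # proceed with a DONE_WITH_CONCERNS note
    else
      echo "pass"
    fi
  else
    echo "fail"     # hold, fix-first, or critical issues: report BLOCKED
  fi
}
```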
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Review summary | JSON | Risk level, risk factors, AI review recommendation, critical issues, warnings |
```json
{
"phase": "code-review",
"merge_base": "commit-sha",
"stats": {
"files_changed": 0,
"lines_added": 0,
"lines_removed": 0
},
"risk_level": "low|medium|high",
"risk_factors": [],
"ai_review": {
"recommendation": "ship|hold|fix-first",
"critical_issues": [],
"warnings": []
},
"overall": "pass|fail|warn"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Merge base detected | merge_base SHA present in output |
| Diff statistics collected | files_changed, lines_added, lines_removed populated |
| Risk assessment completed | risk_level set (low/medium/high), risk_factors populated |
| AI review completed | ai_review.recommendation present |
| Gate condition evaluated | overall set to pass/fail/warn |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| origin/main and origin/master both missing | Use HEAD~1 as merge base, warn user |
| No commits in diff | WARN — nothing to review; ask user whether to proceed |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Inline subagent error | Log error, ask user whether to proceed |
| Critical issues found | BLOCKED — report full issues list with severity and file:line |
## Next Phase
-> [Phase 3: Version Bump](03-version-bump.md)
If review passes (overall: "pass" or "warn"), proceed to Phase 3.
If critical issues found (overall: "fail"), report BLOCKED status with review summary. Do not proceed.

# Phase 3: Version Bump
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Detect the current version, determine the bump type, and update the version file.
## Objective
- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 2 gate result | Yes | overall: "pass" or "warn" — must have passed |
| package.json / pyproject.toml / VERSION | Conditional | One must exist; used for version detection |
| Git history | Yes | Commit messages for bump type auto-detection |
## Execution Steps
### Step 1: Detect Version File
Search for version file in priority order.
**Version File Detection Priority Table**:
| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json found | Set version_file = package.json, read version with node/jq |
| pyproject.toml found (no package.json) | Set version_file = pyproject.toml, read with grep -oP |
| VERSION found (no others) | Set version_file = VERSION, read with cat |
| No version file found | NEEDS_CONTEXT — ask user which file to use or create |
```bash
if [ -f "package.json" ]; then
version_file="package.json"
current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
version_file="pyproject.toml"
current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
version_file="VERSION"
current_version=$(cat VERSION | tr -d '[:space:]')
else
echo "NEEDS_CONTEXT: No version file found"
echo "Expected one of: package.json, pyproject.toml, VERSION"
# Ask user which file to use or create
fi
echo "Version file: $version_file"
echo "Current version: $current_version"
```
---
### Step 2: Determine Bump Type
Auto-detect from commit messages, then confirm with user for major bumps.
**Bump Type Auto-Detection from Conventional Commits**:
```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --oneline)
else
commits=$(git log --oneline -20)
fi
# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)
if [ -n "$has_breaking" ]; then
suggested_bump="major"
elif [ -n "$has_feat" ]; then
suggested_bump="minor"
else
suggested_bump="patch"
fi
echo "Suggested bump: $suggested_bump"
```
**User Confirmation Decision Table**:
| Bump Type | Action |
|-----------|--------|
| patch | Proceed with suggested bump, inform user |
| minor | Proceed with suggested bump, inform user |
| major | Always ask user to confirm before proceeding |
| User overrides suggestion | Use user-specified bump type |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
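One way to express the confirmation rules is the sketch below; `major_confirmed` stands in for the interactive confirmation, which the agent actually performs via a user prompt:

```shell
# Hypothetical resolution of the final bump type.
resolve_bump() {
  local suggested=$1 user_override=$2 major_confirmed=$3
  local bump=${user_override:-$suggested}
  if [ "$bump" = "major" ] && [ "$major_confirmed" != "yes" ]; then
    echo "BLOCKED"   # user must confirm, or re-trigger with an explicit type
  else
    echo "$bump"
  fi
}
```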
---
### Step 3: Calculate New Version
Apply semver arithmetic to derive new version.
**Decision Table**:
| Bump Type | Calculation |
|-----------|-------------|
| major | `(major+1).0.0` |
| minor | `major.(minor+1).0` |
| patch | `major.minor.(patch+1)` |
```bash
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"
case "$bump_type" in
major)
new_version="$((major + 1)).0.0"
;;
minor)
new_version="${major}.$((minor + 1)).0"
;;
patch)
new_version="${major}.${minor}.$((patch + 1))"
;;
esac
echo "Version bump: $current_version -> $new_version"
```
---
### Step 4: Update Version File
Write new version to the appropriate file using the correct method for each format.
**Decision Table**:
| Version File | Update Method |
|--------------|---------------|
| package.json | `jq --arg v "<new_version>" '.version = $v'` + update package-lock.json if present |
| pyproject.toml | `sed -i "s/^version\s*=\s*\".*\"/version = \"<new_version>\"/"` |
| VERSION | `echo "<new_version>" > VERSION` |
```bash
case "$version_file" in
package.json)
# Use node/jq for safe JSON update
jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
# Also update package-lock.json if it exists
if [ -f "package-lock.json" ]; then
jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
fi
;;
pyproject.toml)
# Use sed for TOML update (version line in [project] or [tool.poetry])
sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
;;
VERSION)
echo "$new_version" > VERSION
;;
esac
echo "Updated $version_file: $current_version -> $new_version"
```
---
### Step 5: Verify Update
Re-read version file to confirm the update was applied correctly.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Re-read version equals new_version | PASS — gate satisfied |
| Re-read version does not match | FAIL — report mismatch, BLOCKED |
```bash
# Re-read to confirm
case "$version_file" in
package.json)
verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
;;
pyproject.toml)
verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
;;
VERSION)
verified=$(cat VERSION | tr -d '[:space:]')
;;
esac
if [ "$verified" = "$new_version" ]; then
echo "PASS: Version verified as $new_version"
else
echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Version change record | JSON | version_file, previous_version, new_version, bump_type, bump_source |
```json
{
"phase": "version-bump",
"version_file": "package.json",
"previous_version": "1.2.3",
"new_version": "1.3.0",
"bump_type": "minor",
"bump_source": "auto-detected|user-specified",
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Version file detected | version_file field populated |
| Current version read | current_version field populated |
| Bump type determined | bump_type set to patch/minor/major |
| Version file updated | Write/edit operation succeeded |
| Update verified | Re-read matches new_version |
| overall = "pass" | All steps completed without error |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No version file found | NEEDS_CONTEXT — ask user which file to create or use |
| Version parse error (malformed semver) | NEEDS_CONTEXT — report current value, ask user for correction |
| jq not available | Fall back to node for package.json; report error for others |
| sed fails on pyproject.toml | Try Write tool to rewrite the file; report on failure |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
| Version mismatch after update | BLOCKED — report expected vs actual, suggest manual fix |
## Next Phase
-> [Phase 4: Changelog & Commit](04-changelog-commit.md)
If version updated successfully (overall: "pass"), proceed to Phase 4.
If update fails or context needed, report BLOCKED / NEEDS_CONTEXT. Do not proceed.

# Phase 4: Changelog & Commit
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Generate changelog entry from git history, update CHANGELOG.md, create release commit, and push to remote.
## Objective
- Parse git log since last tag into grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with version in the message
- Push the branch to remote
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 3 output | Yes | new_version, version_file, bump_type |
| Git history | Yes | Commits since last tag |
| CHANGELOG.md | No | Updated in-place if it exists; created if not |
## Execution Steps
### Step 1: Gather Commits Since Last Tag
Retrieve commits to include in the changelog.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Last tag exists | `git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges` |
| No previous tag found | Use last 50 commits: `git log --pretty=format:"%h %s" --no-merges -50` |
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
echo "Generating changelog since tag: $last_tag"
git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
echo "No previous tag found — using last 50 commits"
git log --pretty=format:"%h %s" --no-merges -50
fi
```
---
### Step 2: Group Commits by Conventional Commit Type
Parse commit messages and group into changelog sections.
**Conventional Commit Grouping Table**:
| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):` | Features | **Features** |
| `fix:` / `fix(*):` | Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |
```bash
# Example grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since last tag
# 2. Parse prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```
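A minimal grep-based sketch of the grouping pass; the sample commits below are illustrative:

```shell
# Group sample commit subjects by conventional-commit prefix.
commits='abc1234 feat(auth): add login flow
def5678 fix: handle null session
aaa1111 test: add session cases
bbb2222 chore: bump deps'

features=$(echo "$commits"    | grep -E '^[a-f0-9]+ feat(\([^)]*\))?:'  || true)
fixes=$(echo "$commits"       | grep -E '^[a-f0-9]+ fix(\([^)]*\))?:'   || true)
maintenance=$(echo "$commits" | grep -E '^[a-f0-9]+ chore(\([^)]*\))?:' || true)
# test: commits are intentionally not collected (omitted from the changelog)
```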
---
### Step 3: Format Changelog Entry
Generate a markdown changelog entry using ISO 8601 date format.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Category has commits | Include section with all entries |
| Category is empty | Omit section entirely |
| test: commits present | Omit from changelog output |
Changelog entry format:
```markdown
## [X.Y.Z] - YYYY-MM-DD
### Features
- feat: description (sha)
- feat(scope): description (sha)
### Bug Fixes
- fix: description (sha)
### Performance
- perf: description (sha)
### Other Changes
- chore: description (sha)
```
Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category
---
### Step 4: Update CHANGELOG.md
Write the new entry into CHANGELOG.md.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| CHANGELOG.md exists | Insert new entry after the first heading line (`# Changelog`), before previous version entry |
| CHANGELOG.md does not exist | Create new file with `# Changelog` heading followed by new entry |
```bash
if [ -f "CHANGELOG.md" ]; then
# Insert new entry after the first heading line (# Changelog)
# The new entry goes between the main heading and the previous version entry
# Use Write tool to insert the new section at the correct position
echo "Updating existing CHANGELOG.md"
else
# Create new CHANGELOG.md with header
echo "Creating new CHANGELOG.md"
fi
```
**CHANGELOG.md structure**:
```markdown
# Changelog
## [X.Y.Z] - YYYY-MM-DD
(new entry here)
## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```
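An awk-based sketch of the insertion; the agent may equally use the Write tool, and the file handling here is illustrative:

```shell
# Insert a new changelog entry directly after the "# Changelog" heading.
insert_entry() {
  local file=$1 entry=$2
  awk -v entry="$entry" '
    { print }
    /^# Changelog/ && !done { print ""; print entry; done=1 }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```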
---
### Step 5: Create Release Commit
Stage changed files and create conventionally-formatted release commit.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Version file is package.json | Stage package.json and package-lock.json (if present) |
| Version file is pyproject.toml | Stage pyproject.toml |
| Version file is VERSION | Stage VERSION |
| CHANGELOG.md was updated/created | Stage CHANGELOG.md |
| git commit succeeds | Proceed to push step |
| git commit fails | BLOCKED — report error |
```bash
# Stage the version file and changelog (only files that actually exist,
# since git add fails on missing paths)
for f in package.json package-lock.json pyproject.toml VERSION CHANGELOG.md; do
  [ -f "$f" ] && git add "$f"
done
# Create release commit (unquoted heredoc so $new_version expands)
git commit -m "$(cat <<EOF
chore: bump version to $new_version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```
**Commit message format**: `chore: bump version to <new_version>`
- Follows conventional commit format
- Includes Co-Authored-By trailer
---
### Step 6: Push to Remote
Push the branch to the remote origin.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Remote tracking branch exists | `git push origin "<current_branch>"` |
| No remote tracking branch | `git push -u origin "<current_branch>"` |
| Push succeeds (exit 0) | PASS — gate satisfied |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise check remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |
```bash
current_branch=$(git branch --show-current)
# Check if remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
git push origin "$current_branch"
else
git push -u origin "$current_branch"
fi
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Commit and push record | JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| CHANGELOG.md | Markdown file | Updated with new version entry |
```json
{
"phase": "changelog-commit",
"changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
"commit_sha": "abc1234",
"commit_message": "chore: bump version to X.Y.Z",
"pushed_to": "origin/branch-name",
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Commits gathered since last tag | Commit list non-empty or warn if empty |
| Changelog entry formatted | Markdown entry with correct sections |
| CHANGELOG.md updated or created | File exists with new entry at top |
| Release commit created | `git log -1 --oneline` shows commit |
| Branch pushed to remote | Push command exits 0 |
| overall = "pass" | All steps completed without error |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No commits since last tag | WARN — create minimal changelog entry, continue |
| CHANGELOG.md write error | BLOCKED — report file system error |
| git commit fails (nothing staged) | Verify version file and CHANGELOG.md were modified, re-stage |
| Push rejected (non-fast-forward) | BLOCKED — suggest `git pull --rebase`, halt |
| Push permission denied | BLOCKED — advise check SSH keys or access token |
| No remote configured | BLOCKED — suggest `git remote add origin <url>` |
## Next Phase
-> [Phase 5: PR Creation](05-pr-creation.md)
If commit and push succeed (overall: "pass"), proceed to Phase 5.
If push fails, report BLOCKED status with error details. Do not proceed.

# Phase 5: PR Creation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Create a pull request via GitHub CLI with a structured body, linked issues, and release metadata.
## Objective
- Create a PR using `gh pr create` with structured body
- Auto-link related issues from commit messages
- Include release summary (version, changes, test plan)
- Output the PR URL
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type, version_file |
| Phase 2 output | Yes | merge_base (for change summary) |
| Git history | Yes | Commit messages for issue extraction |
## Execution Steps
### Step 1: Extract Issue References from Commits
Scan commit messages for issue reference patterns.
**Issue Reference Pattern**: `fixes #N`, `closes #N`, `resolves #N`, `refs #N` (case-insensitive, singular and plural forms).
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Last tag exists | Scan commits from last_tag..HEAD |
| No last tag | Scan last 50 commit subjects |
| Issue references found | Deduplicate, sort numerically |
| No issue references found | issues_section = empty (omit section from PR body) |
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
commits=$(git log --pretty=format:"%s" --no-merges -50)
fi
# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '[0-9]+' | sort -nu | sed 's/^/#/' || true)
echo "Referenced issues: $issues"
```
---
### Step 2: Determine Target Branch
Find the appropriate base branch for the PR.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| origin/main exists | target_branch = main |
| origin/main not found | target_branch = master |
```bash
# Default target: main (fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
current_branch=$(git branch --show-current)
echo "PR: $current_branch -> $target_branch"
```
---
### Step 3: Build PR Title
Format the PR title as `release: vX.Y.Z`.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| new_version available from Phase 3 | pr_title = "release: v<new_version>" |
| new_version not available | Fall back to descriptive title derived from branch name |
```bash
if [ -n "${new_version:-}" ]; then
  pr_title="release: v${new_version}"
else
  # Fallback: derive a descriptive title from the branch name
  pr_title="release: $(git branch --show-current)"
fi
```
---
### Step 4: Build PR Body
Construct the full PR body with all sections.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| issues list non-empty | Include "## Linked Issues" section with each issue as `- #N` |
| issues list empty | Omit "## Linked Issues" section |
| Phase 2 warnings exist | Include warning note in Summary section |
```bash
# Gather change summary
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)
# Build linked issues section
if [ -n "$issues" ]; then
issues_section="## Linked Issues
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
else
issues_section=""
fi
```
**PR Body Sections Table**:
| Section | Content |
|---------|---------|
| **Summary** | Version being released, one-line description |
| **Changes** | Grouped changelog entries (from Phase 4) |
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
| **Version** | Previous version, new version, bump type |
| **Test Plan** | Checklist confirming all phases passed |
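The sections above might be assembled as follows; the function and argument names are illustrative:

```shell
# Assemble the PR body from the section contents gathered in earlier steps.
build_pr_body() {
  local new=$1 prev=$2 bump=$3 changes=$4 issues_section=$5
  printf '## Summary\nRelease v%s\n\n## Changes\n%s\n' "$new" "$changes"
  if [ -n "$issues_section" ]; then
    # Linked Issues section is omitted entirely when no issues were found
    printf '\n%s\n' "$issues_section"
  fi
  printf '\n## Version\n- Previous: %s\n- New: %s\n- Bump type: %s\n' "$prev" "$new" "$bump"
}
```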
---
### Step 5: Create PR via gh CLI
Invoke `gh pr create` with title and fully assembled body.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| gh CLI available | Execute `gh pr create` |
| gh CLI not installed | BLOCKED — report missing CLI, advise `gh auth login` |
| PR created successfully | Capture URL from output |
| PR creation fails (already exists) | Report existing PR URL, gate = pass |
| PR creation fails (other error) | BLOCKED — report error details |
```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<'EOF'
## Summary
Release vX.Y.Z
### Changes
- list of changes from changelog
## Linked Issues
- #N (fixes)
- #M (closes)
## Version
- Previous: X.Y.Z-1
- New: X.Y.Z
- Bump type: patch|minor|major
## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully
Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
---
### Step 6: Capture and Report PR URL
Extract the PR URL from gh output.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| URL present in output | Record pr_url, set gate = pass |
| No URL in output | Check `gh pr view --json url` as fallback |
| Both fail | BLOCKED — report failure |
```bash
# gh pr create prints the PR URL on success; capture it from the Step 5
# invocation, or query the open PR for the current branch as a fallback
pr_url=${pr_url:-$(gh pr view --json url -q .url 2>/dev/null)}
echo "PR created: $pr_url"
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| PR creation record | JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | Text block | DONE / DONE_WITH_CONCERNS with full summary |
```json
{
"phase": "pr-creation",
"pr_url": "https://github.com/owner/repo/pull/N",
"pr_title": "release: vX.Y.Z",
"target_branch": "main",
"source_branch": "feature-branch",
"linked_issues": ["#1", "#2"],
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Issue references extracted | issues list populated (or empty with no error) |
| Target branch determined | target_branch set to main or master |
| PR title formatted | pr_title = "release: v<new_version>" |
| PR body assembled with all sections | All required sections present |
| PR created via gh CLI | pr_url present in output |
| Completion status output | DONE or DONE_WITH_CONCERNS block present |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| gh CLI not installed | BLOCKED — report error, advise install + `gh auth login` |
| Not authenticated with gh | BLOCKED — report auth error, advise `gh auth login` |
| PR already exists for branch | Report existing PR URL, treat as pass |
| No changes to create PR for | BLOCKED — report, suggest verifying Phase 4 push succeeded |
| Issue regex finds no matches | issues = [] — omit Linked Issues section, continue |
## Completion Status
After PR creation, output the final Completion Status:
```
## STATUS: DONE
**Summary**: Released vX.Y.Z — PR created at <pr_url>
### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created
### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```
If there were review warnings from Phase 2, use `DONE_WITH_CONCERNS` and list the warnings in the Details section:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Released vX.Y.Z — PR created at <pr_url> (review warnings noted)
### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Concerns: <list review warnings from Phase 2>
### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```
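The choice between the two blocks reduces to a single predicate. A minimal sketch, assuming Phase 2 warnings were collected into a (possibly empty) variable:

```shell
# Pick the final status header from collected Phase 2 warnings.
# An empty argument means no review warnings were recorded.
final_status() {
  if [ -n "$1" ]; then
    echo "## STATUS: DONE_WITH_CONCERNS"
  else
    echo "## STATUS: DONE"
  fi
}
```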


@@ -416,6 +416,8 @@ Visual workflow template editor with drag-drop.
- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, OKLCH color system, anti-AI-slop detection patterns, editorial typography standards, motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily from Impeccable's design knowledge.
- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) are inspired by gstack's workflow designs.
---
## 🤝 Contributing


@@ -416,6 +416,8 @@ The v2 team architecture introduces an **event-driven beat model** for efficient orchestration:
- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, OKLCH color system, anti-AI-slop detection patterns, editorial typography standards, motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily from Impeccable's design knowledge.
- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) are inspired by gstack's workflow designs.
---
## 🤝 Contributing