feat: add investigate, security-audit, ship skills (Claude + Codex)

- Add 3 new Claude skills: investigate (Iron Law debugging), security-audit
  (OWASP Top 10 + STRIDE), ship (gated release pipeline)
- Port all 3 skills to Codex v4 format under .codex/skills/ using
  Deep Interaction pattern (spawn_agent + assign_task phase transitions)
- Update README/README_CN acknowledgments: credit gstack
  (https://github.com/garrytan/gstack) as inspiration source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
catlog22
2026-03-30 10:31:13 +08:00
parent 360a0316f7
commit 67ff3fe339
43 changed files with 8892 additions and 0 deletions


@@ -0,0 +1,110 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
allowed-tools: Bash, Read, Write, Edit, Glob, Grep
---
# Investigate
Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.
## Iron Law Principle
**No fix without confirmed root cause.** Every investigation follows a strict evidence chain:
1. Reproduce the bug with concrete evidence
2. Analyze patterns to assess scope
3. Form and test hypotheses (max 3 strikes)
4. Implement minimal fix ONLY after root cause is confirmed
5. Verify fix and generate structured report
Skipping to Phase 4 without a confirmed root cause from Phase 3 is an Iron Law violation and is prohibited.
## Key Design Principles
1. **Evidence-First**: Collect before theorizing. Logs, stack traces, and reproduction steps are mandatory inputs.
2. **Minimal Fix**: Change only what is necessary. Refactoring is not debugging.
3. **3-Strike Escalation**: If 3 consecutive hypothesis tests fail, STOP and escalate with a diagnostic dump.
4. **Regression Coverage**: Every fix must include a test that fails without the fix and passes with it.
5. **Structured Output**: All findings are recorded in machine-readable JSON for future reference.
## Execution Flow
```
Phase 1: Root Cause Investigation
Reproduce bug, collect evidence (errors, logs, traces)
Use ccw cli --tool gemini --mode analysis for initial diagnosis
Output: investigation-report.json
|
v
Phase 2: Pattern Analysis
Search codebase for similar patterns (same error, module, antipattern)
Assess scope: isolated vs systemic
Output: pattern-analysis section in report
|
v
Phase 3: Hypothesis Testing
Form max 3 hypotheses from evidence
Test each with minimal read-only probes
3-strike rule: STOP and escalate on 3 consecutive failures
Output: confirmed root cause with evidence chain
|
v
Phase 4: Implementation [GATE: requires Phase 3 confirmed root cause]
Implement minimal fix
Add regression test
Verify fix resolves reproduction case
|
v
Phase 5: Verification & Report
Run full test suite
Check for regressions
Generate structured debug report to .workflow/.debug/
```
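The gate between Phase 3 and Phase 4 can be sketched as a small state machine. This is an illustration only, not part of the skill's runtime; `report` stands in for the in-memory investigation report described below.

```javascript
// Minimal sketch of the Iron Law phase gate: Phase 4 is reachable
// only when Phase 3 has recorded a confirmed root cause.
function nextPhase(currentPhase, report) {
  if (currentPhase === 3) {
    // Iron Law gate: no confirmed root cause, no implementation.
    if (!report.confirmed_root_cause) return { phase: 3, status: "BLOCKED" };
    return { phase: 4, status: "OK" };
  }
  if (currentPhase >= 5) return { phase: 5, status: "DONE" };
  return { phase: currentPhase + 1, status: "OK" };
}
```

Every other transition is linear; only the 3-to-4 edge is conditional.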
## Directory Setup
```bash
mkdir -p .workflow/.debug
```
## Output Structure
```
.workflow/.debug/
debug-report-{YYYY-MM-DD}-{slug}.json # Structured debug report
```
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.
| Status | When |
|--------|------|
| **DONE** | Root cause confirmed, fix applied, regression test passes, no regressions |
| **DONE_WITH_CONCERNS** | Fix applied but partial test coverage or minor warnings |
| **BLOCKED** | Cannot reproduce bug, or 3-strike escalation triggered in Phase 3 |
| **NEEDS_CONTEXT** | Missing reproduction steps, unclear error conditions |
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-root-cause-investigation.md](phases/01-root-cause-investigation.md) | Evidence collection and reproduction |
| [phases/02-pattern-analysis.md](phases/02-pattern-analysis.md) | Codebase pattern search and scope assessment |
| [phases/03-hypothesis-testing.md](phases/03-hypothesis-testing.md) | Hypothesis formation, testing, and 3-strike rule |
| [phases/04-implementation.md](phases/04-implementation.md) | Minimal fix with Iron Law gate |
| [phases/05-verification-report.md](phases/05-verification-report.md) | Test suite, regression check, report generation |
| [specs/iron-law.md](specs/iron-law.md) | Iron Law rules definition |
| [specs/debug-report-format.md](specs/debug-report-format.md) | Structured debug report JSON schema |
## CLI Integration
This skill leverages `ccw cli` for multi-model analysis at key points:
| Phase | CLI Usage | Mode |
|-------|-----------|------|
| Phase 1 | Initial diagnosis from error evidence | `--mode analysis` |
| Phase 2 | Cross-file pattern search | `--mode analysis` |
| Phase 3 | Hypothesis validation assistance | `--mode analysis` |
All CLI calls use `--mode analysis` (read-only). No write-mode CLI calls are made during Phases 1-3.


@@ -0,0 +1,132 @@
# Phase 1: Root Cause Investigation
Reproduce the bug and collect all available evidence before forming any theories.
## Objective
- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use CLI analysis for initial diagnosis
## Execution Steps
### Step 1: Understand the Bug Report
Parse the user's description to extract:
- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)
```javascript
const bugReport = {
symptom: "extracted from user description",
expected_behavior: "what should happen",
context: "when/where it occurs",
user_provided_files: ["files mentioned by user"],
user_provided_errors: ["error messages provided"]
}
```
### Step 2: Reproduce the Bug
Attempt to reproduce using the most direct method available:
1. **Run the failing test** (if one exists):
```bash
# Identify and run the specific failing test
```
2. **Run the failing command** (if CLI/script):
```bash
# Execute the command that triggers the bug
```
3. **Read error-producing code path** (if reproduction requires complex setup):
- Use `Grep` to find the error message in source code
- Use `Read` to trace the code path that produces the error
- Document the theoretical reproduction path
**If reproduction fails**: Document what was attempted. The investigation can continue with static analysis, but note this as a concern.
### Step 3: Collect Evidence
Gather all available evidence using project tools:
```javascript
// 1. Find error messages in source
Grep({ pattern: "error message text", path: "src/" })
// 2. Find related log output
Grep({ pattern: "relevant log pattern", path: "." })
// 3. Read stack trace files or test output
Read({ file_path: "path/to/failing-test-output" })
// 4. Identify affected files and modules
Glob({ pattern: "**/*relevant-module*" })
```
### Step 4: Initial Diagnosis via CLI Analysis
Use `ccw cli` for a broader diagnostic perspective:
```bash
ccw cli -p "PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @{affected_files} | Evidence: {error_messages_and_traces}
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on {affected_module}" \
--tool gemini --mode analysis
```
### Step 5: Write Investigation Report
Generate `investigation-report.json` in memory (carried to next phase):
```json
{
"phase": 1,
"bug_description": "concise description of the bug",
"reproduction": {
"reproducible": true,
"steps": [
"step 1: ...",
"step 2: ...",
"step 3: observe error"
],
"reproduction_method": "test|command|static_analysis"
},
"evidence": {
"error_messages": ["exact error text"],
"stack_traces": ["relevant stack trace"],
"affected_files": ["file1.ts", "file2.ts"],
"affected_modules": ["module-name"],
"log_output": ["relevant log lines"]
},
"initial_diagnosis": {
"cli_tool_used": "gemini",
"top_suspects": [
{ "description": "suspect 1", "evidence_strength": "strong|moderate|weak", "files": [] }
]
}
}
```
## Output
- **Data**: `investigation-report` (in-memory, passed to Phase 2)
- **Format**: JSON structure as defined above
## Quality Checks
- [ ] Bug symptom clearly documented
- [ ] Reproduction attempted (success or documented failure)
- [ ] At least one piece of concrete evidence collected (error message, stack trace, or failing test)
- [ ] Affected files identified
- [ ] Initial diagnosis generated
## Next Phase
Proceed to [Phase 2: Pattern Analysis](02-pattern-analysis.md) with the investigation report.


@@ -0,0 +1,126 @@
# Phase 2: Pattern Analysis
Search for similar patterns in the codebase to determine if the bug is isolated or systemic.
## Objective
- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation
## Execution Steps
### Step 1: Search for Similar Error Patterns
Look for the same error type or message elsewhere in the codebase:
```javascript
// Search for identical or similar error messages
Grep({ pattern: "error_message_fragment", path: "src/", output_mode: "content", context: 3 })
// Search for the same exception/error type
Grep({ pattern: "ErrorClassName|error_code", path: "src/", output_mode: "files_with_matches" })
// Search for similar error handling patterns
Grep({ pattern: "catch.*{similar_pattern}", path: "src/", output_mode: "content" })
```
### Step 2: Search for Same Antipattern
If the initial diagnosis suggests a coding antipattern, search for it globally:
```javascript
// Examples of antipattern searches:
// Missing null checks
Grep({ pattern: "variable\\.property", path: "src/", glob: "*.ts" })
// Unchecked async operations
Grep({ pattern: "async.*without.*await", path: "src/" })
// Direct mutation of shared state
Grep({ pattern: "shared_state_pattern", path: "src/" })
```
### Step 3: Module-Level Analysis
Examine the affected module for structural issues:
```javascript
// List all files in the affected module
Glob({ pattern: "src/affected-module/**/*" })
// Check imports and dependencies
Grep({ pattern: "import.*from.*affected-module", path: "src/" })
// Check for circular dependencies or unusual patterns
Grep({ pattern: "require.*affected-module", path: "src/" })
```
### Step 4: CLI Cross-File Pattern Analysis (Optional)
For complex patterns that span multiple files, use CLI analysis:
```bash
ccw cli -p "PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '{antipattern_description}' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.{ext} | Bug in {module}, pattern: {pattern_description}
EXPECTED: List of all files with same pattern, risk assessment per occurrence
CONSTRAINTS: Focus on {antipattern} pattern only | Ignore test files for scope" \
--tool gemini --mode analysis
```
### Step 5: Scope Assessment
Classify the bug scope based on findings:
```json
{
"phase": 2,
"pattern_analysis": {
"scope": "isolated|module-wide|systemic",
"similar_occurrences": [
{
"file": "path/to/file.ts",
"line": 42,
"pattern": "description of similar pattern",
"risk": "same_bug|potential_bug|safe"
}
],
"total_occurrences": 1,
"affected_modules": ["module-name"],
"antipattern_identified": "description or null",
"scope_justification": "why this scope classification"
}
}
```
**Scope Definitions**:
- **isolated**: Bug exists in a single location, no similar patterns found
- **module-wide**: Same pattern exists in multiple files within the same module
- **systemic**: Pattern spans multiple modules, may require broader fix
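These definitions can be expressed as a small classifier. This is a sketch under one assumption: file paths follow a `src/<module>/...` layout, so the module name is the second path segment. `occurrences` is the `similar_occurrences` array from the report.

```javascript
// Classify bug scope from similar occurrences found in Phase 2.
// Each occurrence: { file, line, pattern, risk }.
// Assumes paths like "src/<module>/file.ts"; adjust for other layouts.
function classifyScope(occurrences, affectedModule) {
  if (occurrences.length === 0) return "isolated";
  const modules = new Set(occurrences.map(o => o.file.split("/")[1] ?? o.file));
  // All occurrences confined to the affected module => module-wide.
  if (modules.size === 1 && modules.has(affectedModule)) return "module-wide";
  return "systemic";
}
```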
## Output
- **Data**: `pattern-analysis` section added to investigation report (in-memory)
- **Format**: JSON structure as defined above
## Decision Point
| Scope | Action |
|-------|--------|
| isolated | Proceed to Phase 3 with narrow focus |
| module-wide | Proceed to Phase 3, note all occurrences for Phase 4 fix |
| systemic | Proceed to Phase 3, but flag for potential multi-phase fix or separate tracking |
## Quality Checks
- [ ] At least 3 search queries executed against the codebase
- [ ] Scope classified as isolated, module-wide, or systemic
- [ ] Similar occurrences documented with file:line references
- [ ] Scope justification provided with evidence
## Next Phase
Proceed to [Phase 3: Hypothesis Testing](03-hypothesis-testing.md) with the pattern analysis results.


@@ -0,0 +1,177 @@
# Phase 3: Hypothesis Testing
Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.
## Objective
- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive test failures
## Execution Steps
### Step 1: Form Hypotheses
Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:
```json
{
"hypotheses": [
{
"id": "H1",
"description": "The root cause is X because evidence Y",
"evidence_supporting": ["evidence item 1", "evidence item 2"],
"predicted_behavior": "If H1 is correct, then we should observe Z",
"test_method": "How to verify: read file X line Y, check value Z",
"confidence": "high|medium|low"
}
]
}
```
**Hypothesis Formation Rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
### Step 2: Test Hypotheses Sequentially
Test each hypothesis starting from highest confidence. Use read-only probes:
**Allowed test methods**:
- `Read` a specific file and check a specific value or condition
- `Grep` for a pattern that would confirm or deny the hypothesis
- `Bash` to run a specific test or command that reveals the condition
- Temporarily add a log statement to observe runtime behavior (revert after)
**Prohibited during testing**:
- Modifying production code (save that for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)
```javascript
// Example hypothesis test
// H1: "Function X receives null because caller Y doesn't check return value"
const evidence = Read({ file_path: "src/caller.ts" })
// Check: Does caller Y use the return value without null check?
// Result: Confirmed / Rejected with specific evidence
```
### Step 3: Record Test Results
For each hypothesis test:
```json
{
"hypothesis_tests": [
{
"id": "H1",
"test_performed": "Read src/caller.ts:42 - checked null handling",
"result": "confirmed|rejected|inconclusive",
"evidence": "specific observation that confirms or rejects",
"files_checked": ["src/caller.ts:42-55"]
}
]
}
```
### Step 4: 3-Strike Escalation Rule
Track consecutive test failures. A "failure" means the test was inconclusive or the hypothesis was rejected AND no actionable insight was gained.
```
Strike Counter:
[H1 rejected, no insight] → Strike 1
[H2 rejected, no insight] → Strike 2
[H3 rejected, no insight] → Strike 3 → STOP
```
**Important**: A rejected hypothesis that provides useful insight (narrows the search) does NOT count as a strike. Only truly unproductive tests count.
**On 3rd Strike — STOP and Escalate**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #{N}
### Error History
1. Attempt 1: H1 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
2. Attempt 2: H2 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
3. Attempt 3: H3 — {description}
Test: {what was checked}
Result: {rejected/inconclusive} — {why}
### Current State
- Evidence collected: {summary from Phase 1-2}
- Hypotheses tested: {list}
- Files examined: {list}
### Diagnosis
- Likely root cause area: {best guess based on all evidence}
- Suggested human action: {specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z"}
### Diagnostic Dump
{Full investigation-report.json content}
```
After escalation, set status to **BLOCKED** per Completion Status Protocol.
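The counting rule (only unproductive tests count; three strikes escalate) can be sketched as a pure function. This is an illustration, not skill runtime; the interpretation that an insightful rejection neither adds a strike nor resets the counter is one reading of the rule above.

```javascript
// Track unproductive hypothesis tests toward the 3-strike limit.
// A rejected hypothesis that yielded insight adds no strike (per the
// rule above); a confirmed hypothesis ends the hunt.
function recordTest(counter, result) {
  // result: { outcome: "confirmed"|"rejected"|"inconclusive", insightGained: boolean }
  if (result.outcome === "confirmed") return { strikes: 0, escalate: false };
  const strikes = result.insightGained ? counter.strikes : counter.strikes + 1;
  return { strikes, escalate: strikes >= 3 };
}
```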
### Step 5: Confirm Root Cause
If a hypothesis is confirmed, document the confirmed root cause:
```json
{
"phase": 3,
"confirmed_root_cause": {
"hypothesis_id": "H1",
"description": "Root cause description with full evidence chain",
"evidence_chain": [
"Phase 1: Error message X observed in Y",
"Phase 2: Same pattern found in 3 other files",
"Phase 3: H1 confirmed — null check missing at file.ts:42"
],
"affected_code": {
"file": "path/to/file.ts",
"line_range": "42-55",
"function": "functionName"
}
}
}
```
## Output
- **Data**: `hypothesis-tests` and `confirmed_root_cause` added to investigation report (in-memory)
- **Format**: JSON structure as defined above
## Gate for Phase 4
**Phase 4 can ONLY proceed if `confirmed_root_cause` is present.** This is the Iron Law gate.
| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | Proceed to [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation | STOP, output diagnostic dump, status = BLOCKED |
| Partial insight | Re-form hypotheses with new evidence (stays in Phase 3) |
## Quality Checks
- [ ] Maximum 3 hypotheses formed, each with cited evidence
- [ ] Each hypothesis tested with a specific, documented probe
- [ ] Test results recorded with concrete evidence
- [ ] 3-strike counter maintained correctly
- [ ] Root cause confirmed with full evidence chain OR escalation triggered
## Next Phase
Proceed to [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.


@@ -0,0 +1,139 @@
# Phase 4: Implementation
Implement the minimal fix and add a regression test. Iron Law gate enforced.
## Objective
- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case
## Iron Law Gate Check
**MANDATORY**: Before any code modification, verify:
```javascript
if (!investigation_report.confirmed_root_cause) {
// VIOLATION: Cannot proceed without confirmed root cause
// Return to Phase 3 or escalate
throw new Error("Iron Law violation: No confirmed root cause. Return to Phase 3.")
}
console.log(`Root cause confirmed: ${investigation_report.confirmed_root_cause.description}`)
console.log(`Evidence chain: ${investigation_report.confirmed_root_cause.evidence_chain.length} items`)
console.log(`Affected code: ${investigation_report.confirmed_root_cause.affected_code.file}:${investigation_report.confirmed_root_cause.affected_code.line_range}`)
```
If the gate check fails, do NOT proceed. Return status **BLOCKED** with reason "Iron Law: no confirmed root cause".
## Execution Steps
### Step 1: Plan the Minimal Fix
Define the fix scope BEFORE writing any code:
```json
{
"fix_plan": {
"description": "What the fix does and why",
"changes": [
{
"file": "path/to/file.ts",
"change_type": "modify|add|remove",
"description": "specific change description",
"lines_affected": "42-45"
}
],
"total_files_changed": 1,
"total_lines_changed": "estimated"
}
}
```
**Minimal Fix Rules** (from [specs/iron-law.md](../specs/iron-law.md)):
- Change only what is necessary to fix the confirmed root cause
- Do not refactor surrounding code
- Do not add features
- Do not change formatting or style of unrelated code
- If the fix requires changes to more than 3 files, document justification
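A lightweight check of these rules against a fix plan might look like the sketch below. The `justification` field is an assumed extension of the `fix_plan` structure, used here to carry the written justification for a fix spanning more than 3 files.

```javascript
// Validate a fix plan against the Minimal Fix Rules before editing.
// `justification` is a hypothetical field for the >3-files case.
function checkFixPlan(fixPlan) {
  const issues = [];
  if (fixPlan.changes.length === 0) issues.push("empty fix plan");
  if (fixPlan.changes.length > 3 && !fixPlan.justification) {
    issues.push("more than 3 files changed without written justification");
  }
  return { ok: issues.length === 0, issues };
}
```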
### Step 2: Implement the Fix
Apply the planned changes using `Edit` tool:
```javascript
Edit({
file_path: "path/to/affected/file.ts",
old_string: "buggy code",
new_string: "fixed code"
})
```
### Step 3: Add Regression Test
Create or modify a test that:
1. **Fails** without the fix (tests the exact bug condition)
2. **Passes** with the fix
```javascript
// Identify existing test file for the module
Glob({ pattern: "**/*.test.{ts,js,py}" })
// or
Glob({ pattern: "**/test_*.py" })
// Add regression test
// Test name should reference the bug: "should handle null return from X"
// Test should exercise the exact code path that caused the bug
```
**Regression test requirements**:
- Test name clearly describes the bug scenario
- Test exercises the specific code path identified in root cause
- Test is deterministic (no flaky timing, external dependencies)
- Test is placed in the appropriate test file for the module
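As an illustration, a framework-free regression test for a hypothetical null-handling bug might look like this. `formatDisplayName` is an invented helper standing in for the fixed code; it is not part of any real module.

```javascript
// Hypothetical fixed function: optional chaining guards the null case.
function formatDisplayName(user) {
  return user.displayName?.trim() ?? "";
}

// Regression test: exercises the exact bug condition (null displayName).
// With the buggy version (user.displayName.trim()), this throws a TypeError.
function testHandlesNullDisplayName() {
  const result = formatDisplayName({ displayName: null });
  if (result !== "") throw new Error("expected empty string for null displayName");
}
testHandlesNullDisplayName();
```

The test name and body target the confirmed root cause, not the general behavior of the function.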
### Step 4: Verify Fix Against Reproduction
Re-run the original reproduction case from Phase 1:
```bash
# Run the specific failing test/command from Phase 1
# It should now pass
```
Record the verification result:
```json
{
"phase": 4,
"fix_applied": {
"description": "what was fixed",
"files_changed": ["path/to/file.ts"],
"lines_changed": 3,
"regression_test": {
"file": "path/to/test.ts",
"test_name": "should handle null return from X",
"status": "added|modified"
},
"reproduction_verified": true
}
}
```
## Output
- **Data**: `fix_applied` section added to investigation report (in-memory)
- **Artifacts**: Modified source files and test files
## Quality Checks
- [ ] Iron Law gate passed: confirmed root cause exists
- [ ] Fix is minimal: only necessary changes made
- [ ] Regression test added that covers the specific bug
- [ ] Original reproduction case passes with the fix
- [ ] No unrelated code changes included
## Next Phase
Proceed to [Phase 5: Verification & Report](05-verification-report.md) to run full test suite and generate report.


@@ -0,0 +1,153 @@
# Phase 5: Verification & Report
Run full test suite, check for regressions, and generate the structured debug report.
## Objective
- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to `.workflow/.debug/` directory
## Execution Steps
### Step 1: Run Full Test Suite
```bash
# Detect and run the project's test framework
# npm test / pytest / go test / cargo test / etc.
```
Record results:
```json
{
"test_results": {
"total": 0,
"passed": 0,
"failed": 0,
"skipped": 0,
"regression_test_passed": true,
"new_failures": []
}
}
```
**If new failures are found**:
- Check if the failures are related to the fix
- If related: the fix introduced a regression — return to Phase 4 to adjust
- If unrelated: document as pre-existing failures, proceed with report
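The triage of new versus pre-existing failures amounts to a set difference between the failing tests before and after the fix. A sketch, where inputs are arrays of failing test names:

```javascript
// Split current failures into regressions (new) and pre-existing ones,
// given the tests that were already failing before the fix.
function triageFailures(failingBefore, failingAfter) {
  const before = new Set(failingBefore);
  return {
    new_failures: failingAfter.filter(name => !before.has(name)),
    pre_existing_failures: failingAfter.filter(name => before.has(name)),
  };
}
```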
### Step 2: Regression Check
Verify specifically:
1. The new regression test passes
2. All tests that passed before the fix still pass
3. No new warnings or errors in test output
### Step 3: Generate Structured Debug Report
Create the report following the schema in [specs/debug-report-format.md](../specs/debug-report-format.md):
```bash
mkdir -p .workflow/.debug
```
```json
{
"bug_description": "concise description of the bug",
"reproduction_steps": [
"step 1",
"step 2",
"step 3: observe error"
],
"root_cause": "confirmed root cause description with technical detail",
"evidence_chain": [
"Phase 1: error message X observed in module Y",
"Phase 2: pattern analysis found N similar occurrences",
"Phase 3: hypothesis H1 confirmed — specific condition at file:line"
],
"fix_description": "what was changed and why",
"files_changed": [
{
"path": "src/module/file.ts",
"change_type": "modify",
"description": "added null check before property access"
}
],
"tests_added": [
{
"file": "src/module/__tests__/file.test.ts",
"test_name": "should handle null return from X",
"type": "regression"
}
],
"regression_check_result": {
"passed": true,
"total_tests": 0,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
"concerns": [],
"timestamp": "ISO-8601",
"investigation_duration_phases": 5
}
```
### Step 4: Write Report File
```javascript
const slug = bugDescription.toLowerCase().replace(/[^a-z0-9]+/g, '-').substring(0, 40)
const dateStr = new Date().toISOString().substring(0, 10)
const reportPath = `.workflow/.debug/debug-report-${dateStr}-${slug}.json`
Write({ file_path: reportPath, content: JSON.stringify(report, null, 2) })
```
### Step 5: Output Completion Status
Follow the Completion Status Protocol from `_shared/SKILL-DESIGN-SPEC.md` section 13:
**DONE**:
```
## STATUS: DONE
**Summary**: Fixed {bug_description} — root cause was {root_cause_summary}
### Details
- Phases completed: 5/5
- Root cause: {confirmed_root_cause}
- Fix: {fix_description}
- Regression test: {test_name} in {test_file}
### Outputs
- Debug report: {reportPath}
- Files changed: {list}
- Tests added: {list}
```
**DONE_WITH_CONCERNS**:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Fixed {bug_description} with concerns
### Details
- Phases completed: 5/5
- Concerns:
1. {concern} — Impact: {low|medium} — Suggested fix: {action}
```
## Output
- **File**: `debug-report-{YYYY-MM-DD}-{slug}.json`
- **Location**: `.workflow/.debug/`
- **Format**: JSON (see [specs/debug-report-format.md](../specs/debug-report-format.md))
## Quality Checks
- [ ] Full test suite executed
- [ ] Regression test specifically verified
- [ ] No new test failures introduced (or documented if pre-existing)
- [ ] Debug report written to `.workflow/.debug/`
- [ ] Completion status output follows protocol


@@ -0,0 +1,226 @@
# Debug Report Format
Defines the structured JSON schema for debug reports generated by the investigate skill.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 5 | Generate final report | Full schema |
| Phase 3 (escalation) | Diagnostic dump includes partial report | Partial schema |
---
## JSON Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Debug Report",
"type": "object",
"required": [
"bug_description",
"reproduction_steps",
"root_cause",
"evidence_chain",
"fix_description",
"files_changed",
"tests_added",
"regression_check_result",
"completion_status"
],
"properties": {
"bug_description": {
"type": "string",
"description": "Concise description of the bug symptom",
"minLength": 10
},
"reproduction_steps": {
"type": "array",
"description": "Ordered steps to reproduce the bug",
"items": { "type": "string" },
"minItems": 1
},
"root_cause": {
"type": "string",
"description": "Confirmed root cause with technical detail",
"minLength": 20
},
"evidence_chain": {
"type": "array",
"description": "Ordered evidence from Phase 1 through Phase 3, each prefixed with phase number",
"items": { "type": "string" },
"minItems": 1
},
"fix_description": {
"type": "string",
"description": "What was changed and why",
"minLength": 10
},
"files_changed": {
"type": "array",
"items": {
"type": "object",
"required": ["path", "change_type", "description"],
"properties": {
"path": {
"type": "string",
"description": "Relative file path"
},
"change_type": {
"type": "string",
"enum": ["add", "modify", "remove"]
},
"description": {
"type": "string",
"description": "Brief description of changes to this file"
}
}
}
},
"tests_added": {
"type": "array",
"items": {
"type": "object",
"required": ["file", "test_name", "type"],
"properties": {
"file": {
"type": "string",
"description": "Test file path"
},
"test_name": {
"type": "string",
"description": "Name of the test function or describe block"
},
"type": {
"type": "string",
"enum": ["regression", "unit", "integration"],
"description": "Type of test added"
}
}
}
},
"regression_check_result": {
"type": "object",
"required": ["passed", "total_tests"],
"properties": {
"passed": {
"type": "boolean",
"description": "Whether the full test suite passed"
},
"total_tests": {
"type": "integer",
"description": "Total number of tests executed"
},
"new_failures": {
"type": "array",
"items": { "type": "string" },
"description": "Tests that failed after the fix but passed before"
},
"pre_existing_failures": {
"type": "array",
"items": { "type": "string" },
"description": "Tests that were already failing before the investigation"
}
}
},
"completion_status": {
"type": "string",
"enum": ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
"description": "Final status per Completion Status Protocol"
},
"concerns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": { "type": "string" },
"impact": { "type": "string", "enum": ["low", "medium"] },
"suggested_action": { "type": "string" }
}
},
"description": "Non-blocking concerns (populated when status is DONE_WITH_CONCERNS)"
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "ISO-8601 timestamp of report generation"
},
"investigation_duration_phases": {
"type": "integer",
"description": "Number of phases completed (1-5)",
"minimum": 1,
"maximum": 5
}
}
}
```
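A lightweight completeness check against the schema's `required` list can be written in plain JavaScript. This is a sketch, not a full JSON Schema validator (it ignores types, enums, and length constraints).

```javascript
// Top-level fields every debug report must carry, per the schema above.
const REQUIRED_FIELDS = [
  "bug_description", "reproduction_steps", "root_cause", "evidence_chain",
  "fix_description", "files_changed", "tests_added",
  "regression_check_result", "completion_status",
];

// Return the required fields that are absent from a report object.
function missingFields(report) {
  return REQUIRED_FIELDS.filter(f => report[f] === undefined);
}
```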
## Field Descriptions
| Field | Source Phase | Description |
|-------|-------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phase 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | How many phases were completed |
## Example Report
```json
{
"bug_description": "API returns 500 when user profile has null display_name",
"reproduction_steps": [
"Create user account without setting display_name",
"Call GET /api/users/:id/profile",
"Observe 500 Internal Server Error"
],
"root_cause": "ProfileSerializer.format() calls displayName.trim() without null check at src/serializers/profile.ts:42",
"evidence_chain": [
"Phase 1: TypeError: Cannot read properties of null (reading 'trim') in server logs",
"Phase 2: Same pattern in 2 other serializers (address.ts:28, company.ts:35)",
"Phase 3: H1 confirmed — displayName field is nullable in DB but serializer assumes non-null"
],
"fix_description": "Added null-safe access for displayName in ProfileSerializer.format()",
"files_changed": [
{
"path": "src/serializers/profile.ts",
"change_type": "modify",
"description": "Added optional chaining for displayName.trim() call"
}
],
"tests_added": [
{
"file": "src/serializers/__tests__/profile.test.ts",
"test_name": "should handle null display_name without error",
"type": "regression"
}
],
"regression_check_result": {
"passed": true,
"total_tests": 142,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE",
"concerns": [],
"timestamp": "2026-03-29T15:30:00+08:00",
"investigation_duration_phases": 5
}
```
## Output Location
Reports are written to: `.workflow/.debug/debug-report-{YYYY-MM-DD}-{slug}.json`
Where:
- `{YYYY-MM-DD}` is the investigation date
- `{slug}` is derived from the bug description (lowercase, hyphens, max 40 chars)


@@ -0,0 +1,101 @@
# Iron Law of Debugging
The Iron Law defines the non-negotiable rules that govern every investigation performed by this skill. These rules exist to prevent symptom-fixing and ensure durable, evidence-based solutions.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 3 | Hypothesis must produce confirmed root cause before proceeding | Rule 1 |
| Phase 1 | Reproduction must produce observable evidence | Rule 2 |
| Phase 4 | Fix scope must be minimal | Rule 3 |
| Phase 4 | Regression test is mandatory | Rule 4 |
| Phase 3 | 3 consecutive unproductive hypothesis failures trigger escalation | Rule 5 |
---
## Rules
### Rule 1: Never Fix Without Confirmed Root Cause
**Statement**: No code modification is permitted until a root cause has been confirmed through hypothesis testing with concrete evidence.
**Enforcement**: Phase 4 begins with an Iron Law gate check. If `confirmed_root_cause` is absent from the investigation report, Phase 4 is blocked.
**Rationale**: Fixing symptoms without understanding the cause leads to:
- Incomplete fixes that break under different conditions
- Masking of deeper issues
- Wasted investigation time when the bug recurs
### Rule 2: Evidence Must Be Reproducible
**Statement**: The bug must be reproducible through documented steps, or if not reproducible, the evidence must be sufficient to identify the root cause through static analysis.
**Enforcement**: Phase 1 documents reproduction steps and evidence. If reproduction fails, this is flagged as a concern but does not block investigation if sufficient static evidence exists.
**Acceptable evidence types**:
- Failing test case
- Error message with stack trace
- Log output showing the failure
- Code path analysis showing the defect condition
### Rule 3: Fix Must Be Minimal
**Statement**: The fix must change only what is necessary to address the confirmed root cause. No refactoring, no feature additions, no style changes to unrelated code.
**Enforcement**: Phase 4 requires a fix plan before implementation. Changes exceeding 3 files require written justification.
**What counts as minimal**:
- Adding a missing null check
- Fixing an incorrect condition
- Correcting a wrong variable reference
- Adding a missing import or dependency
**What is NOT minimal**:
- Refactoring the function "while we're here"
- Renaming variables for clarity
- Adding error handling to unrelated code paths
- Reformatting surrounding code
### Rule 4: Regression Test Required
**Statement**: Every fix must include a test that:
1. Fails when the fix is reverted (proves it tests the bug)
2. Passes when the fix is applied (proves the fix works)
**Enforcement**: Phase 4 requires a regression test before the phase is marked complete.
**Test requirements**:
- Test name clearly references the bug scenario
- Test exercises the exact code path of the root cause
- Test is deterministic (no timing dependencies, no external services)
- Test is placed in the appropriate test file for the affected module
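As an illustration only (the report example earlier is TypeScript; this is a hypothetical Python equivalent of the null `display_name` fix), a regression test satisfying both conditions might look like:

```python
def format_display_name(display_name):
    """Fixed version of the hypothetical serializer: null-safe access (the minimal fix)."""
    return display_name.strip() if display_name is not None else ""

def test_handles_null_display_name_without_error():
    # Fails when the null check is reverted (proves it tests the bug)
    assert format_display_name(None) == ""

def test_preserves_normal_display_name():
    # Passes with the fix applied (proves the fix works)
    assert format_display_name("  Ada  ") == "Ada"
```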
### Rule 5: 3-Strike Escalation on Hypothesis Failure
**Statement**: If 3 consecutive hypothesis tests produce no actionable insight, the investigation must STOP and escalate with a full diagnostic dump.
**Enforcement**: Phase 3 tracks a strike counter. On the 3rd consecutive unproductive failure, execution halts and outputs the escalation block.
**What counts as a strike**:
- Hypothesis rejected AND no new insight gained
- Test was inconclusive AND no narrowing of search space
**What does NOT count as a strike**:
- Hypothesis rejected BUT new evidence narrows the search
- Hypothesis rejected BUT reveals a different potential cause
- Test inconclusive BUT identifies a new area to investigate
**Post-escalation**: Status set to BLOCKED. No further automated investigation. Preserve all intermediate outputs for human review.
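One way to track the strike counter is sketched below, assuming a productive outcome resets the consecutive run (`StrikeCounter` is a hypothetical helper, not part of the skill):

```python
class StrikeCounter:
    """Track consecutive unproductive hypothesis failures (Rule 5)."""
    LIMIT = 3

    def __init__(self):
        self.strikes = 0

    def record(self, rejected: bool, new_insight: bool) -> str:
        # A strike accrues only when the hypothesis failed AND no new insight was gained
        if rejected and not new_insight:
            self.strikes += 1
        else:
            self.strikes = 0  # a productive outcome breaks the consecutive run
        return "BLOCKED" if self.strikes >= self.LIMIT else "CONTINUE"
```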
---
## Validation Checklist
Before completing any investigation, verify:
- [ ] Rule 1: Root cause confirmed before any fix was applied
- [ ] Rule 2: Bug reproduction documented (or static evidence justified)
- [ ] Rule 3: Fix changes only necessary code (file count, line count documented)
- [ ] Rule 4: Regression test exists and passes
- [ ] Rule 5: No more than 3 consecutive unproductive hypothesis tests (or escalation triggered)


@@ -0,0 +1,125 @@
---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
allowed-tools: Read, Write, Bash, Glob, Grep
---
# Security Audit
4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.
## Architecture Overview
```
+-------------------------------------------------------------------+
| Phase 1: Supply Chain Scan |
| -> Dependency audit, secrets detection, CI/CD review, LLM risks |
| -> Output: supply-chain-report.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 2: OWASP Review |
| -> OWASP Top 10 2021 code-level analysis via ccw cli |
| -> Output: owasp-findings.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 3: Threat Modeling (STRIDE) |
| -> 6 threat categories mapped to architecture components |
| -> Output: threat-model.json |
+-----------------------------------+-------------------------------+
|
+-----------------------------------v-------------------------------+
| Phase 4: Report & Tracking |
| -> Score calculation, trend comparison, dated report |
| -> Output: .workflow/.security/audit-report-{date}.json |
+-------------------------------------------------------------------+
```
## Key Design Principles
1. **Infrastructure-first**: Phase 1 catches low-hanging fruit (leaked secrets, vulnerable deps) before deeper analysis
2. **Standards-based**: OWASP Top 10 2021 and STRIDE provide systematic coverage
3. **Scoring gates**: Daily quick-scan must score >= 8/10; comprehensive audit requires a minimum of 2/10 for the initial baseline
4. **Trend tracking**: Each audit compares against prior results in `.workflow/.security/`
## Execution Flow
### Quick-Scan Mode (daily)
Run Phase 1 only. Must score >= 8/10 to pass.
### Comprehensive Mode (full audit)
Run all 4 phases sequentially. Initial baseline minimum 2/10.
### Phase Sequence
1. **Phase 1: Supply Chain Scan** -- [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md)
- Dependency audit (npm audit / pip-audit / safety check)
- Secrets detection (API keys, tokens, passwords in source)
- CI/CD config review (injection risks in workflow YAML)
- LLM/AI prompt injection check
2. **Phase 2: OWASP Review** -- [phases/02-owasp-review.md](phases/02-owasp-review.md)
- Systematic OWASP Top 10 2021 code review
- Uses `ccw cli --tool gemini --mode analysis --rule analysis-assess-security-risks`
3. **Phase 3: Threat Modeling** -- [phases/03-threat-modeling.md](phases/03-threat-modeling.md)
- STRIDE threat model mapped to architecture components
- Trust boundary identification and attack surface assessment
4. **Phase 4: Report & Tracking** -- [phases/04-report-tracking.md](phases/04-report-tracking.md)
- Score calculation with severity weights
- Trend comparison with previous audits
- Date-stamped report to `.workflow/.security/`
## Scoring Overview
See [specs/scoring-gates.md](specs/scoring-gates.md) for full specification.
| Severity | Weight | Example |
|----------|--------|---------|
| Critical | 10 | RCE, SQL injection, leaked credentials |
| High | 7 | Broken auth, SSRF, privilege escalation |
| Medium | 4 | XSS, CSRF, verbose error messages |
| Low | 1 | Missing headers, informational disclosures |
**Gates**: Daily quick-scan >= 8/10, Comprehensive initial >= 2/10.
## Directory Setup
```bash
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"
```
## Output Structure
```
.workflow/.security/
audit-report-{YYYY-MM-DD}.json # Dated audit report
supply-chain-report.json # Latest supply chain scan
owasp-findings.json # Latest OWASP findings
threat-model.json # Latest STRIDE threat model
```
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md) | Dependency, secrets, CI/CD, LLM risk scan |
| [phases/02-owasp-review.md](phases/02-owasp-review.md) | OWASP Top 10 2021 code review |
| [phases/03-threat-modeling.md](phases/03-threat-modeling.md) | STRIDE threat modeling |
| [phases/04-report-tracking.md](phases/04-report-tracking.md) | Report generation and trend tracking |
| [specs/scoring-gates.md](specs/scoring-gates.md) | Scoring system and quality gates |
| [specs/owasp-checklist.md](specs/owasp-checklist.md) | OWASP Top 10 detection patterns |
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.
Possible termination statuses:
- **DONE**: All phases completed, score calculated, report generated
- **DONE_WITH_CONCERNS**: Audit completed but findings exceed acceptable thresholds
- **BLOCKED**: Required tools unavailable (e.g., npm/pip not installed), permission denied
- **NEEDS_CONTEXT**: Ambiguous project scope, unclear trust boundaries
Escalation follows the Three-Strike Rule (section 14) per step.


@@ -0,0 +1,139 @@
# Phase 1: Supply Chain Scan
Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM/AI integrations.
## Objective
- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities
## Execution Steps
### Step 1: Dependency Audit
Detect package manager and run appropriate audit tool.
```bash
# Node.js projects
if [ -f package-lock.json ]; then
npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>&1 || true
elif [ -f yarn.lock ]; then
yarn audit --json > "${WORK_DIR}/yarn-audit-raw.json" 2>&1 || true
fi
# Python projects
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
# Fallback to safety check if pip-audit fails or is unavailable
pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" 2>&1 || \
safety check --json > "${WORK_DIR}/safety-raw.json" 2>&1 || true
fi
# Go projects
if [ -f go.sum ]; then
govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```
If audit tools are not installed, log as INFO finding and continue.
### Step 2: Secrets Detection
Scan source files for hardcoded secrets using regex patterns.
```bash
# High-confidence patterns (case-insensitive)
grep -rniE \
'(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["'\''][A-Za-z0-9+/=_-]{16,}' \
--include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
--include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
--include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
. || true
# AWS patterns
grep -rniE '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true
# Private keys
grep -rniE '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true
# Connection strings with passwords
grep -rniE '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true
# JWT tokens (hardcoded)
grep -rniE 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```
Exclude: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.
### Step 3: CI/CD Config Review
Check GitHub Actions and other CI/CD configs for injection risks.
```bash
# Find workflow files
find .github/workflows -name '*.yml' -o -name '*.yaml' 2>/dev/null
# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true
# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true
# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true
# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```
### Step 4: LLM/AI Prompt Injection Check
Scan for patterns indicating prompt injection risk in LLM integrations.
```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
--include='*.py' . || true
```
## Output
- **File**: `supply-chain-report.json`
- **Location**: `${WORK_DIR}/supply-chain-report.json`
- **Format**: JSON
```json
{
"phase": "supply-chain-scan",
"timestamp": "ISO-8601",
"findings": [
{
"category": "dependency|secret|cicd|llm",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "matched text or context",
"remediation": "How to fix"
}
],
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
}
}
```
## Next Phase
Proceed to [Phase 2: OWASP Review](02-owasp-review.md) with supply chain findings as context.


@@ -0,0 +1,156 @@
# Phase 2: OWASP Review
Systematic code-level review against OWASP Top 10 2021 categories.
## Objective
- Review codebase against all 10 OWASP Top 10 2021 categories
- Use CCW CLI multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps
## Prerequisites
- Phase 1 supply-chain-report.json (provides dependency context)
- Read [specs/owasp-checklist.md](../specs/owasp-checklist.md) for detection patterns
## Execution Steps
### Step 1: Identify Target Scope
```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
| head -200
```
### Step 2: CCW CLI Analysis
Run multi-model security analysis using the security risks rule template.
```bash
ccw cli -p "PURPOSE: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.
MODE: analysis
CONTEXT: @src/**/* @**/*.config.* @**/*.env.example
EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.
CONSTRAINTS: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities not theoretical risks
" --tool gemini --mode analysis --rule analysis-assess-security-risks
```
### Step 3: Manual Pattern Scanning
Supplement CLI analysis with targeted pattern scans per OWASP category. Reference [specs/owasp-checklist.md](../specs/owasp-checklist.md) for full pattern list.
**A01 - Broken Access Control**:
```bash
# Missing auth middleware on routes
grep -rnE 'app\.(get|post|put|delete|patch)\(' --include='*.ts' --include='*.js' . | grep -vE 'auth|middleware|protect' || true
# Direct object references without ownership check
grep -rnE 'params\.id|req\.params\.' --include='*.ts' --include='*.js' . || true
```
**A03 - Injection**:
```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A05 - Security Misconfiguration**:
```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A07 - Identification and Authentication Failures**:
```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["'\''][^"'\'']{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```
### Step 4: Consolidate Findings
Merge CLI analysis results and manual pattern scan results. Deduplicate and classify by OWASP category.
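A minimal deduplication sketch, assuming findings are keyed by `(file, line, owasp_id)` (the key choice is an assumption, not part of this spec):

```python
def dedupe_findings(findings):
    """Keep the first occurrence of each (file, line, owasp_id) triple."""
    seen, merged = set(), []
    for finding in findings:
        key = (finding.get("file"), finding.get("line"), finding.get("owasp_id"))
        if key not in seen:
            seen.add(key)
            merged.append(finding)
    return merged
```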
## OWASP Top 10 2021 Categories
| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |
## Output
- **File**: `owasp-findings.json`
- **Location**: `${WORK_DIR}/owasp-findings.json`
- **Format**: JSON
```json
{
"phase": "owasp-review",
"timestamp": "ISO-8601",
"owasp_version": "2021",
"findings": [
{
"owasp_id": "A01",
"owasp_category": "Broken Access Control",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "code snippet or pattern match",
"remediation": "Specific fix recommendation",
"cwe": "CWE-XXX"
}
],
"coverage": {
"A01": "checked|not_applicable",
"A02": "checked|not_applicable",
"A03": "checked|not_applicable",
"A04": "checked|not_applicable",
"A05": "checked|not_applicable",
"A06": "checked|not_applicable",
"A07": "checked|not_applicable",
"A08": "checked|not_applicable",
"A09": "checked|not_applicable",
"A10": "checked|not_applicable"
},
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"categories_checked": 10,
"categories_with_findings": 0
}
}
```
## Next Phase
Proceed to [Phase 3: Threat Modeling](03-threat-modeling.md) with OWASP findings as input for STRIDE analysis.


@@ -0,0 +1,180 @@
# Phase 3: Threat Modeling (STRIDE)
Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.
## Objective
- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings
## STRIDE Categories
| Category | Threat | Question | Typical Targets |
|----------|--------|----------|-----------------|
| **S** - Spoofing | Identity impersonation | Can an attacker pretend to be someone else? | Auth endpoints, API keys, session tokens |
| **T** - Tampering | Data modification | Can data be modified in transit or at rest? | Request bodies, database records, config files |
| **R** - Repudiation | Deniable actions | Can a user deny performing an action? | Audit logs, transaction records, user actions |
| **I** - Information Disclosure | Data leakage | Can sensitive data be exposed? | Error messages, logs, API responses, storage |
| **D** - Denial of Service | Availability disruption | Can the system be made unavailable? | API endpoints, resource-intensive operations |
| **E** - Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? | Role checks, admin routes, permission logic |
## Execution Steps
### Step 1: Architecture Component Discovery
Identify major system components by scanning project structure.
```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
```
### Step 2: Trust Boundary Identification
Map trust boundaries in the system:
1. **External boundary**: User/browser <-> Application server
2. **Service boundary**: Application <-> External APIs/services
3. **Data boundary**: Application <-> Database/storage
4. **Internal boundary**: Public routes <-> Authenticated routes <-> Admin routes
5. **Process boundary**: Main process <-> Worker/subprocess
For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails
### Step 3: STRIDE per Component
For each discovered component, systematically evaluate all 6 STRIDE categories:
**Spoofing Analysis**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?
**Tampering Analysis**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?
**Repudiation Analysis**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?
**Information Disclosure Analysis**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?
**Denial of Service Analysis**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?
**Elevation of Privilege Analysis**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?
### Step 4: Attack Surface Assessment
Quantify the attack surface:
```
Attack Surface = Sum of:
- Number of public API endpoints
- Number of external service integrations
- Number of user-controllable input points
- Number of privileged operations
- Number of data stores with sensitive content
```
Rate each component:
- **High exposure**: Public-facing, handles sensitive data, complex logic
- **Medium exposure**: Authenticated access, moderate data sensitivity
- **Low exposure**: Internal only, no sensitive data, simple operations
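The attack surface sum above can be sketched as follows (field names follow the `attack_surface` block of `threat-model.json`):

```python
def attack_surface_score(counts: dict) -> int:
    """Sum the five counts listed in the formula above (unweighted)."""
    keys = (
        "public_endpoints",
        "external_integrations",
        "input_points",
        "privileged_operations",
        "sensitive_data_stores",
    )
    return sum(counts.get(key, 0) for key in keys)
```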
## Output
- **File**: `threat-model.json`
- **Location**: `${WORK_DIR}/threat-model.json`
- **Format**: JSON
```json
{
"phase": "threat-modeling",
"timestamp": "ISO-8601",
"framework": "STRIDE",
"components": [
{
"name": "Component name",
"type": "api_endpoint|data_store|external_service|auth_module|worker",
"files": ["path/to/file.ts"],
"exposure": "high|medium|low",
"trust_boundaries": ["external", "data"],
"threats": {
"spoofing": {
"applicable": true,
"findings": ["Description of threat"],
"mitigations": ["Existing mitigation"],
"gaps": ["Missing mitigation"]
},
"tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
}
}
],
"trust_boundaries": [
{
"name": "Boundary name",
"from": "Component A",
"to": "Component B",
"enforcement": "TLS|auth_middleware|API_key",
"data_crossing": ["request bodies", "credentials"],
"risk_level": "high|medium|low"
}
],
"attack_surface": {
"public_endpoints": 0,
"external_integrations": 0,
"input_points": 0,
"privileged_operations": 0,
"sensitive_data_stores": 0,
"total_score": 0
},
"summary": {
"components_analyzed": 0,
"threats_identified": 0,
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
"high_exposure_components": 0
}
}
```
## Next Phase
Proceed to [Phase 4: Report & Tracking](04-report-tracking.md) with the threat model to generate the final scored audit report.


@@ -0,0 +1,177 @@
# Phase 4: Report & Tracking
Generate scored audit report, compare with previous audits, and track trends.
## Objective
- Calculate security score from all phase findings
- Compare with previous audit results (if available)
- Generate date-stamped report in `.workflow/.security/`
- Track improvement or regression trends
## Prerequisites
- Phase 1: `supply-chain-report.json`
- Phase 2: `owasp-findings.json`
- Phase 3: `threat-model.json`
- Previous audit: `.workflow/.security/audit-report-*.json` (optional)
## Execution Steps
### Step 1: Aggregate Findings
Collect all findings from phases 1-3 and classify by severity.
```
All findings =
supply-chain-report.findings
+ owasp-findings.findings
+ threat-model threats (where gaps exist)
```
### Step 2: Calculate Score
Apply scoring formula from [specs/scoring-gates.md](../specs/scoring-gates.md):
```
Severity weights:
- Critical: 10 (each critical finding has outsized impact)
- High: 7
- Medium: 4
- Low: 1

normalization_factor = max(10, total_files_scanned)
weighted_penalty = SUM(severity_weight * count_per_severity) / normalization_factor
final_score = max(0, 10.0 - weighted_penalty)
```
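The formula can be sketched as (weights per [specs/scoring-gates.md](../specs/scoring-gates.md)):

```python
def security_score(findings, total_files_scanned):
    """Compute the final score from a list of findings with a 'severity' field."""
    weights = {"critical": 10, "high": 7, "medium": 4, "low": 1}
    normalization_factor = max(10, total_files_scanned)
    weighted_penalty = sum(weights[f["severity"]] for f in findings) / normalization_factor
    return max(0.0, 10.0 - weighted_penalty)  # clamp at zero
```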
**Score interpretation**:
| Score | Rating | Meaning |
|-------|--------|---------|
| 9-10 | Excellent | Minimal risk, production-ready |
| 7-8 | Good | Acceptable risk, minor improvements needed |
| 5-6 | Fair | Notable risks, remediation recommended |
| 3-4 | Poor | Significant risks, remediation required |
| 0-2 | Critical | Severe vulnerabilities, immediate action needed |
### Step 3: Gate Evaluation
**Daily quick-scan gate** (Phase 1 only):
- PASS: score >= 8/10
- FAIL: score < 8/10 -- block deployment or flag for review
**Comprehensive audit gate** (all phases):
- For initial/baseline: PASS if score >= 2/10 (establishes baseline)
- For subsequent: PASS if score >= previous_score (no regression)
- Target: score >= 7/10 for production readiness
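The gate rules above can be sketched as (`evaluate_gate` is a hypothetical helper; thresholds are taken from this spec):

```python
def evaluate_gate(score, mode, previous_score=None):
    """Return PASS/FAIL per the quick-scan and comprehensive gate rules."""
    if mode == "quick-scan":
        return "PASS" if score >= 8 else "FAIL"
    if previous_score is None:  # comprehensive audit, initial baseline
        return "PASS" if score >= 2 else "FAIL"
    return "PASS" if score >= previous_score else "FAIL"  # no regression allowed
```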
### Step 4: Trend Comparison
```bash
# Find previous audit reports
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```
Compare current vs. previous:
- Delta per OWASP category
- Delta per STRIDE category
- New findings vs. resolved findings
- Overall score trend
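The trend classification used in the report's `direction` field might be computed as follows (the 0.5 tolerance for "stable" is an assumption, not part of this spec):

```python
def trend_direction(current_score, previous_score, tolerance=0.5):
    """Classify the score trend for the report's 'direction' field."""
    if previous_score is None:
        return "baseline"
    delta = current_score - previous_score
    if abs(delta) < tolerance:  # assumed tolerance for "stable"
        return "stable"
    return "improving" if delta > 0 else "regressing"
```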
### Step 5: Generate Report
Write the final report with all consolidated data.
## Output
- **File**: `audit-report-{YYYY-MM-DD}.json`
- **Location**: `.workflow/.security/audit-report-{YYYY-MM-DD}.json`
- **Format**: JSON
```json
{
"report": "security-audit",
"version": "1.0",
"timestamp": "ISO-8601",
"date": "YYYY-MM-DD",
"mode": "comprehensive|quick-scan",
"score": {
"overall": 7.5,
"rating": "Good",
"gate": "PASS|FAIL",
"gate_threshold": 8
},
"findings_summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_phase": {
"supply_chain": 0,
"owasp": 0,
"stride": 0
},
"by_owasp": {
"A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
"A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
},
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
},
"top_risks": [
{
"rank": 1,
"title": "Most critical finding",
"severity": "critical",
"source_phase": "owasp",
"remediation": "How to fix",
"effort": "low|medium|high"
}
],
"trend": {
"previous_date": "YYYY-MM-DD or null",
"previous_score": 0,
"score_delta": 0,
"new_findings": 0,
"resolved_findings": 0,
"direction": "improving|stable|regressing|baseline"
},
"phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
"files_scanned": 0,
"remediation_priority": [
{
"priority": 1,
"finding": "Finding title",
"effort": "low",
"impact": "high",
"recommendation": "Specific action"
}
]
}
```
## Report Storage
```bash
# Ensure directory exists
mkdir -p .workflow/.security
# Write report with date stamp
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" ".workflow/.security/audit-report-${DATE}.json"
# Also maintain latest copies of phase outputs
cp "${WORK_DIR}/supply-chain-report.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/owasp-findings.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/threat-model.json" ".workflow/.security/" 2>/dev/null || true
```
## Completion
After report generation, output skill completion status per the Completion Status Protocol:
- **DONE**: All phases completed, report generated, score calculated
- **DONE_WITH_CONCERNS**: Report generated but score below target or regression detected
- **BLOCKED**: Phase data missing or corrupted


@@ -0,0 +1,442 @@
# OWASP Top 10 2021 Checklist
Code-level detection patterns, vulnerable code examples, and remediation templates for each OWASP category.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 2 | Reference during OWASP code review | All categories |
| Phase 4 | Classify findings by OWASP category | Category IDs |
---
## A01: Broken Access Control
**CWE**: CWE-200, CWE-284, CWE-285, CWE-352, CWE-639
### Detection Patterns
```bash
# Missing auth middleware on route handlers
grep -rnE 'app\.(get|post|put|delete|patch)\s*\(\s*["'\''/]' --include='*.ts' --include='*.js' .
# Then verify each route has auth middleware
# Direct object reference without ownership check
grep -rnE 'findById\(.*params|findOne\(.*params|\.get\(.*id' --include='*.ts' --include='*.js' --include='*.py' .
# Path traversal patterns
grep -rnE '(readFile|writeFile|createReadStream|open)\s*\(.*req\.' --include='*.ts' --include='*.js' .
grep -rnE 'os\.path\.join\(.*request\.' --include='*.py' .
# Missing CORS restrictions
grep -rnE 'Access-Control-Allow-Origin.*\*|cors\(\s*\)' --include='*.ts' --include='*.js' .
```
### Vulnerable Code Example
```javascript
// BAD: No ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
const doc = await Document.findById(req.params.id); // Any user can access any doc
res.json(doc);
});
```
### Remediation
```javascript
// GOOD: Ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
const doc = await Document.findOne({ _id: req.params.id, owner: req.user.id });
if (!doc) return res.status(404).json({ error: 'Not found' });
res.json(doc);
});
```
---
## A02: Cryptographic Failures
**CWE**: CWE-259, CWE-327, CWE-331, CWE-798
### Detection Patterns
```bash
# Weak hash algorithms
grep -rniE '(md5|sha1)\s*\(' --include='*.ts' --include='*.js' --include='*.py' --include='*.java' .
# Plaintext password storage
grep -rniE 'password\s*[:=]\s*.*\.(body|query|params)' --include='*.ts' --include='*.js' .
# Hardcoded encryption keys
grep -rniE '(encrypt|cipher|secret|key)\s*[:=]\s*["\x27][A-Za-z0-9+/=]{8,}' --include='*.ts' --include='*.js' --include='*.py' .
# HTTP (not HTTPS) for sensitive operations
grep -rniE 'http://.*\.(api|auth|login|payment)' --include='*.ts' --include='*.js' --include='*.py' .
# Missing encryption at rest
grep -rniE '(password|ssn|credit.?card|social.?security)' --include='*.sql' --include='*.prisma' --include='*.schema' .
```
### Vulnerable Code Example
```python
# BAD: MD5 for password hashing
import hashlib
password_hash = hashlib.md5(password.encode()).hexdigest()
```
### Remediation
```python
# GOOD: bcrypt with proper work factor
import bcrypt
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
```
---
## A03: Injection
**CWE**: CWE-20, CWE-74, CWE-79, CWE-89
### Detection Patterns
```bash
# SQL string concatenation/interpolation
grep -rniE "(query|execute|raw)\s*\(\s*[\`\"'].*(\+|\$\{|%s|\.format)" --include='*.ts' --include='*.js' --include='*.py' .
grep -rniE "f[\"'].*SELECT.*\{" --include='*.py' .
# NoSQL injection
grep -rniE '\$where|\$regex.*req\.' --include='*.ts' --include='*.js' .
grep -rniE 'find\(\s*\{.*req\.(body|query|params)' --include='*.ts' --include='*.js' .
# OS command injection
grep -rniE '(child_process|exec|execSync|spawn|system|popen|subprocess)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# XPath/LDAP injection
grep -rniE '(xpath|ldap).*\+.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# Template injection
grep -rniE '(render_template_string|Template\(.*req\.|eval\(.*req\.)' --include='*.py' --include='*.js' .
```
### Vulnerable Code Example
```javascript
// BAD: SQL string concatenation
const result = await db.query(`SELECT * FROM users WHERE id = ${req.params.id}`);
```
### Remediation
```javascript
// GOOD: Parameterized query
const result = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
```
---
## A04: Insecure Design
**CWE**: CWE-209, CWE-256, CWE-501, CWE-522
### Detection Patterns
```bash
# Missing rate limiting on auth endpoints
grep -rniE '(login|register|reset.?password|forgot.?password)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if rate limiting middleware is applied
# No account lockout mechanism
grep -rniE 'failed.?login|login.?attempt|max.?retries' --include='*.ts' --include='*.js' --include='*.py' .
# Business logic without validation
grep -rniE '(transfer|withdraw|purchase|delete.?account)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check for confirmation/validation steps
```
### Checks
- [ ] Authentication flows have rate limiting
- [ ] Account lockout after N failed attempts
- [ ] Multi-step operations have proper state validation
- [ ] Business-critical operations require confirmation
- [ ] Threat modeling has been performed (see Phase 3)
### Remediation
Implement defense-in-depth: rate limiting, input validation, business logic validation, and multi-step confirmation for critical operations.
---
## A05: Security Misconfiguration
**CWE**: CWE-2, CWE-11, CWE-13, CWE-15, CWE-16, CWE-388
### Detection Patterns
```bash
# Debug mode enabled
grep -rniE '(DEBUG|NODE_ENV)\s*[:=]\s*(true|True|1|"development"|"debug")' \
--include='*.env' --include='*.env.*' --include='*.py' --include='*.json' --include='*.yaml' .
# Default credentials
grep -rniE '(admin|root|test|default).*[:=].*password' --include='*.env' --include='*.yaml' --include='*.json' --include='*.py' .
# Verbose error responses (stack traces to client)
grep -rniE '(stack|stackTrace|traceback).*res\.(json|send)|app\.use.*err.*stack' --include='*.ts' --include='*.js' .
# Missing security headers
grep -rniE '(helmet|X-Frame-Options|X-Content-Type-Options|Strict-Transport-Security)' --include='*.ts' --include='*.js' .
# Directory listing enabled
grep -rniE 'autoindex\s+on|directory.?listing|serveStatic.*index.*false' --include='*.conf' --include='*.ts' --include='*.js' .
# Unnecessary features/services
grep -rniE '(graphiql|playground|swagger-ui).*true' --include='*.ts' --include='*.js' --include='*.py' --include='*.yaml' .
```
### Vulnerable Code Example
```javascript
// BAD: Stack trace in error response
app.use((err, req, res, next) => {
res.status(500).json({ error: err.message, stack: err.stack });
});
```
### Remediation
```javascript
// GOOD: Generic error response in production
app.use((err, req, res, next) => {
console.error(err.stack); // Log internally
res.status(500).json({ error: 'Internal server error' });
});
```
---
## A06: Vulnerable and Outdated Components
**CWE**: CWE-1104
### Detection Patterns
```bash
# Check dependency lock files age
ls -la package-lock.json yarn.lock requirements.txt Pipfile.lock go.sum 2>/dev/null
# Run package audits (from Phase 1)
npm audit --json 2>/dev/null
pip-audit --format json 2>/dev/null
# Check for pinned vs unpinned dependencies
grep -E ':\s*"\^|:\s*"~|:\s*"\*|>=\s' package.json 2>/dev/null
grep -E '^[a-zA-Z].*[^=]==[^=]' requirements.txt 2>/dev/null # Good: pinned
grep -E '^[a-zA-Z].*>=|^[a-zA-Z][^=]*$' requirements.txt 2>/dev/null # Bad: unpinned
```
### Checks
- [ ] All dependencies have pinned versions
- [ ] No known CVEs in dependencies (via audit tools)
- [ ] Dependencies are actively maintained (not abandoned)
- [ ] Lock files are committed to version control
### Remediation
Run `npm audit fix` or `pip install --upgrade` for vulnerable packages. Pin all dependency versions. Set up automated dependency scanning (Dependabot, Renovate).
---
## A07: Identification and Authentication Failures
**CWE**: CWE-255, CWE-259, CWE-287, CWE-384
### Detection Patterns
```bash
# Weak password requirements
grep -rniE 'password.*length.*[0-5]|minlength.*[0-5]|min.?length.*[0-5]' --include='*.ts' --include='*.js' --include='*.py' .
# Missing password hashing
grep -rniE 'password\s*[:=].*req\.' --include='*.ts' --include='*.js' .
# Then check if bcrypt/argon2/scrypt is used before storage
# Session fixation (no rotation after login)
grep -rniE 'session\.regenerate|session\.id\s*=' --include='*.ts' --include='*.js' .
# JWT without expiration
grep -rniE 'jwt\.sign\(' --include='*.ts' --include='*.js' .
# Then check for expiresIn option
# Credentials in URL
grep -rniE '(token|key|password|secret)=[^&\s]+' --include='*.ts' --include='*.js' --include='*.py' .
```
### Vulnerable Code Example
```javascript
// BAD: JWT without expiration
const token = jwt.sign({ userId: user.id }, SECRET);
```
### Remediation
```javascript
// GOOD: JWT with expiration and proper claims
const token = jwt.sign(
{ userId: user.id, role: user.role },
SECRET,
{ expiresIn: '1h', issuer: 'myapp', audience: 'myapp-client' }
);
```
---
## A08: Software and Data Integrity Failures
**CWE**: CWE-345, CWE-353, CWE-426, CWE-494, CWE-502
### Detection Patterns
```bash
# Insecure deserialization
grep -rniE '(pickle\.load|yaml\.load\(|unserialize|JSON\.parse\(.*req\.|eval\()' --include='*.py' --include='*.ts' --include='*.js' --include='*.php' .
# Missing integrity checks on downloads/updates
grep -rniE '(download|fetch|curl|wget)' --include='*.sh' --include='*.yaml' --include='*.yml' .
# Then check for checksum/signature verification
# CI/CD pipeline without pinned action versions
grep -rniE 'uses:\s*[^@]+$|uses:.*@(main|master|latest)' .github/workflows/*.yml 2>/dev/null
# Unsafe YAML loading
grep -rniE 'yaml\.load\(' --include='*.py' .
# Should be yaml.safe_load()
```
### Vulnerable Code Example
```python
# BAD: Unsafe YAML loading
import yaml
data = yaml.load(user_input) # Allows arbitrary code execution
```
### Remediation
```python
# GOOD: Safe YAML loading
import yaml
data = yaml.safe_load(user_input)
```
---
## A09: Security Logging and Monitoring Failures
**CWE**: CWE-223, CWE-532, CWE-778
### Detection Patterns
```bash
# Check for logging of auth events
grep -rniE '(log|logger|logging)\.' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if login/logout/failed-auth events are logged
# Sensitive data in logs
grep -rniE 'log.*(password|token|secret|credit.?card|ssn)' --include='*.ts' --include='*.js' --include='*.py' .
# Empty catch blocks (swallowed errors)
grep -rniE 'catch\s*\([^)]*\)\s*\{\s*\}' --include='*.ts' --include='*.js' .
# Missing audit trail for critical operations
grep -rniE '(delete|update|create|transfer)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if these operations are logged with user context
```
### Checks
- [ ] Failed login attempts are logged with IP and timestamp
- [ ] Successful logins are logged
- [ ] Access control failures are logged
- [ ] Input validation failures are logged
- [ ] Sensitive data is NOT logged (passwords, tokens, PII)
- [ ] Logs include sufficient context (who, what, when, where)
### Remediation
Implement structured logging with: user ID, action, timestamp, IP address, result (success/failure). Exclude sensitive data. Set up log monitoring and alerting for anomalous patterns.
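A minimal sketch of such a structured audit record (field names are illustrative; the sensitive-key filter drops rather than masks):

```python
import json
import logging
from datetime import datetime, timezone

SENSITIVE_FIELDS = {"password", "token", "secret", "ssn", "credit_card"}


def audit_log(logger: logging.Logger, user_id: str, action: str,
              ip: str, success: bool, **context) -> None:
    """Emit one structured audit record: who, what, when, where, result.
    Keys matching SENSITIVE_FIELDS are dropped, never logged."""
    record = {
        "user_id": user_id,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": ip,
        "result": "success" if success else "failure",
        **{k: v for k, v in context.items() if k.lower() not in SENSITIVE_FIELDS},
    }
    logger.info(json.dumps(record))
```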
---
## A10: Server-Side Request Forgery (SSRF)
**CWE**: CWE-918
### Detection Patterns
```bash
# User-controlled URLs in fetch/request calls
grep -rniE '(fetch|axios|http\.request|requests\.(get|post)|urllib)\s*\(.*req\.(body|query|params)' \
--include='*.ts' --include='*.js' --include='*.py' .
# URL construction from user input
grep -rniE '(url|endpoint|target|redirect)\s*[:=].*req\.(body|query|params)' --include='*.ts' --include='*.js' --include='*.py' .
# Image/file fetch from URL
grep -rniE '(download|fetchImage|getFile|loadUrl)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .
# Redirect without validation
grep -rniE 'res\.redirect\(.*req\.|redirect_to.*request\.' --include='*.ts' --include='*.js' --include='*.py' .
```
### Vulnerable Code Example
```javascript
// BAD: Unvalidated URL fetch
app.get('/proxy', async (req, res) => {
const response = await fetch(req.query.url); // Can access internal services
res.send(await response.text());
});
```
### Remediation
```javascript
// GOOD: URL allowlist validation
const ALLOWED_HOSTS = ['api.example.com', 'cdn.example.com'];
app.get('/proxy', async (req, res) => {
const url = new URL(req.query.url);
if (!ALLOWED_HOSTS.includes(url.hostname)) {
return res.status(400).json({ error: 'Host not allowed' });
}
if (url.protocol !== 'https:') {
return res.status(400).json({ error: 'HTTPS required' });
}
const response = await fetch(url.toString());
res.send(await response.text());
});
```
---
## Quick Reference
| ID | Category | Key Grep Pattern | Severity Baseline |
|----|----------|-----------------|-------------------|
| A01 | Broken Access Control | `findById.*params` without owner check | High |
| A02 | Cryptographic Failures | `md5\|sha1` for passwords | High |
| A03 | Injection | `query.*\+.*req\.\|f".*SELECT.*\{` | Critical |
| A04 | Insecure Design | Missing rate limit on auth routes | Medium |
| A05 | Security Misconfiguration | `DEBUG.*true\|stack.*res.json` | Medium |
| A06 | Vulnerable Components | `npm audit` / `pip-audit` results | Varies |
| A07 | Auth Failures | `jwt.sign` without `expiresIn` | High |
| A08 | Integrity Failures | `pickle.load\|yaml.load` | High |
| A09 | Logging Failures | Empty catch blocks, no auth logging | Medium |
| A10 | SSRF | `fetch.*req.query.url` | High |



@@ -0,0 +1,141 @@
# Scoring Gates
Defines the 10-point scoring system, severity weights, quality gates, and trend tracking format for security audits.
## When to Use
| Phase | Usage | Section |
|-------|-------|---------|
| Phase 1 | Quick-scan scoring (daily gate) | Severity Weights, Daily Gate |
| Phase 4 | Full audit scoring and reporting | All sections |
---
## 10-Point Scale
All security audit scores are on a 0-10 scale where 10 = no findings and 0 = critical exposure.
| Score | Rating | Description |
|-------|--------|-------------|
| 9.0 - 10.0 | Excellent | Minimal risk. Production-ready without reservations. |
| 7.0 - 8.9 | Good | Low risk. Acceptable for production with minor improvements. |
| 5.0 - 6.9 | Fair | Moderate risk. Remediation recommended before production. |
| 3.0 - 4.9 | Poor | High risk. Remediation required. Not production-ready. |
| 0.0 - 2.9 | Critical | Severe exposure. Immediate action required. |
## Severity Weights
Each finding is weighted by severity for score calculation.
| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| **Critical** | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| **High** | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| **Medium** | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| **Low** | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |
## Score Calculation
```
Input:
findings[] -- array of all findings with severity
files_scanned -- total source files analyzed
Algorithm:
base_score = 10.0
normalization = max(10, files_scanned)
weighted_sum = 0
for each finding:
weighted_sum += severity_weight(finding.severity)
penalty = weighted_sum / normalization
final_score = max(0, base_score - penalty)
final_score = round(final_score, 1)
return final_score
```
**Example**:
| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|----------|--------------|--------------|---------|-------|
| 1 critical | 50 | 10 | 0.2 | 9.8 |
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
| 20 critical | 20 | 200 | 10.0 | 0.0 |
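The algorithm above can be sketched in Python (a minimal illustration; severity weights taken from the table in this document):

```python
def severity_weight(severity: str) -> int:
    """Severity weights from the Severity Weights table."""
    return {"critical": 10, "high": 7, "medium": 4, "low": 1}[severity.lower()]


def audit_score(findings: list[dict], files_scanned: int) -> float:
    """0-10 audit score: 10 minus the weighted finding sum, normalized by codebase size."""
    normalization = max(10, files_scanned)
    weighted_sum = sum(severity_weight(f["severity"]) for f in findings)
    return round(max(0.0, 10.0 - weighted_sum / normalization), 1)
```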
## Quality Gates
### Daily Quick-Scan Gate
Applies to Phase 1 (Supply Chain Scan) only.
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 8.0 | Continue. No blocking issues. |
| **WARN** | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| **FAIL** | score < 6.0 | Block deployment. Remediate critical/high findings. |
### Comprehensive Audit Gate
Applies to full audit (all 4 phases).
**Initial/Baseline audit** (no previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 2.0 | Baseline established. Plan remediation. |
| **FAIL** | score < 2.0 | Critical exposure. Immediate triage required. |
**Subsequent audits** (previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= previous_score | No regression. Continue improvement. |
| **WARN** | score within 0.5 of previous | Marginal change. Review new findings. |
| **FAIL** | score < previous_score - 0.5 | Regression detected. Investigate new findings. |
**Production readiness target**: score >= 7.0
## Trend Tracking Format
Each audit report stores trend data for comparison.
```json
{
"trend": {
"current_date": "2026-03-29",
"current_score": 7.5,
"previous_date": "2026-03-22",
"previous_score": 6.8,
"score_delta": 0.7,
"new_findings": 2,
"resolved_findings": 5,
"direction": "improving",
"history": [
{ "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
{ "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
{ "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
]
}
}
```
**Direction values**:
| Direction | Condition |
|-----------|-----------|
| `improving` | score_delta > 0.5 |
| `stable` | -0.5 <= score_delta <= 0.5 |
| `regressing` | score_delta < -0.5 |
| `baseline` | No previous audit exists |
## Finding Deduplication
When the same vulnerability appears in multiple phases:
1. Keep the highest-severity classification
2. Merge evidence from all phases
3. Count as a single finding for scoring
4. Note all phases that detected it
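The four rules above might be sketched as follows; the finding shape (`id`, `severity`, `evidence`, `phase`) is a hypothetical structure assumed for illustration:

```python
SEVERITY_ORDER = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def deduplicate(findings: list[dict]) -> list[dict]:
    """Merge findings sharing an id: keep the highest severity, pool evidence,
    count once for scoring, and record every phase that detected the issue."""
    merged: dict[str, dict] = {}
    for f in findings:
        cur = merged.get(f["id"])
        if cur is None:
            merged[f["id"]] = {
                "id": f["id"],
                "severity": f["severity"],
                "evidence": list(f.get("evidence", [])),
                "phases": [f["phase"]],
            }
            continue
        if SEVERITY_ORDER[f["severity"]] > SEVERITY_ORDER[cur["severity"]]:
            cur["severity"] = f["severity"]             # keep highest severity
        cur["evidence"].extend(f.get("evidence", []))   # merge evidence from all phases
        if f["phase"] not in cur["phases"]:
            cur["phases"].append(f["phase"])            # note all detecting phases
    return list(merged.values())
```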


@@ -0,0 +1,105 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
allowed-tools: Read, Write, Bash, Glob, Grep
---
# Ship
Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.
## Key Design Principles
1. **Phase Gates**: Each phase must pass before the next begins — no shipping broken code
2. **Multi-Project Support**: Detects npm (package.json), Python (pyproject.toml), and generic (VERSION) projects
3. **AI-Powered Review**: Uses CCW CLI to run automated code review before release
4. **Audit Trail**: Each phase produces structured output for traceability
5. **Safe Defaults**: Warns on risky operations (direct push to main, major version bumps)
## Architecture Overview
```
User: "ship" / "release" / "publish"
|
v
┌──────────────────────────────────────────────────────────┐
│ Phase 1: Pre-Flight Checks │
│ → git clean? branch ok? tests pass? build ok? │
│ → Output: preflight-report.json │
│ → Gate: ALL checks must pass │
├──────────────────────────────────────────────────────────┤
│ Phase 2: Code Review │
│ → detect merge base, diff against base │
│ → ccw cli --tool gemini --mode analysis │
│ → flag high-risk changes │
│ → Output: review-summary │
│ → Gate: No critical issues flagged │
├──────────────────────────────────────────────────────────┤
│ Phase 3: Version Bump │
│ → detect version file (package.json/pyproject.toml/VERSION) │
│ → determine bump type from commits or user input │
│ → update version file │
│ → Output: version change record │
│ → Gate: Version updated successfully │
├──────────────────────────────────────────────────────────┤
│ Phase 4: Changelog & Commit │
│ → generate changelog from git log since last tag │
│ → update CHANGELOG.md │
│ → create release commit, push to remote │
│ → Output: commit SHA │
│ → Gate: Push successful │
├──────────────────────────────────────────────────────────┤
│ Phase 5: PR Creation │
│ → gh pr create with structured body │
│ → auto-link issues from commits │
│ → Output: PR URL │
│ → Gate: PR created │
└──────────────────────────────────────────────────────────┘
```
## Execution Flow
Execute phases sequentially. Each phase has a gate condition — if the gate fails, stop and report status.
1. **Phase 1**: [Pre-Flight Checks](phases/01-preflight-checks.md) -- Validate git state, branch, tests, build
2. **Phase 2**: [Code Review](phases/02-code-review.md) -- AI-powered diff review with risk assessment
3. **Phase 3**: [Version Bump](phases/03-version-bump.md) -- Detect and update version across project types
4. **Phase 4**: [Changelog & Commit](phases/04-changelog-commit.md) -- Generate changelog, create release commit, push
5. **Phase 5**: [PR Creation](phases/05-pr-creation.md) -- Create PR with structured body and issue links
## Pre-Flight Checklist (Quick Reference)
| Check | Command | Pass Condition |
|-------|---------|----------------|
| Git clean | `git status --porcelain` | Empty output |
| Branch | `git branch --show-current` | Not main/master |
| Tests | `npm test` / `pytest` | Exit code 0 |
| Build | `npm run build` / `python -m build` | Exit code 0 |
## Completion Status Protocol
This skill follows the Completion Status Protocol defined in [SKILL-DESIGN-SPEC.md sections 13-14](../_shared/SKILL-DESIGN-SPEC.md#13-completion-status-protocol).
Every execution terminates with one of:
| Status | When |
|--------|------|
| **DONE** | All 5 phases completed, PR created |
| **DONE_WITH_CONCERNS** | PR created but with review warnings or non-critical issues |
| **BLOCKED** | A gate failed (dirty git, tests fail, push rejected) |
| **NEEDS_CONTEXT** | Cannot determine bump type, ambiguous branch target |
### Escalation
Follows the Three-Strike Rule (SKILL-DESIGN-SPEC section 14). On 3 consecutive failures at the same step, stop and output diagnostic dump.
## Reference Documents
| Document | Purpose |
|----------|---------|
| [phases/01-preflight-checks.md](phases/01-preflight-checks.md) | Git, branch, test, build validation |
| [phases/02-code-review.md](phases/02-code-review.md) | AI-powered diff review |
| [phases/03-version-bump.md](phases/03-version-bump.md) | Version detection and bump |
| [phases/04-changelog-commit.md](phases/04-changelog-commit.md) | Changelog generation and release commit |
| [phases/05-pr-creation.md](phases/05-pr-creation.md) | PR creation with issue linking |
| [../_shared/SKILL-DESIGN-SPEC.md](../_shared/SKILL-DESIGN-SPEC.md) | Skill design spec (completion protocol, escalation) |


@@ -0,0 +1,121 @@
# Phase 1: Pre-Flight Checks
Validate that the repository is in a shippable state before proceeding with the release pipeline.
## Objective
- Confirm working tree is clean (no uncommitted changes)
- Validate current branch is appropriate for release
- Run test suite and confirm all tests pass
- Verify build succeeds
## Gate Condition
ALL four checks must pass. If any check fails, stop the pipeline and report BLOCKED status with the specific failure.
## Execution Steps
### Step 1: Git Clean Check
```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
echo "FAIL: Working tree is dirty"
echo "$git_status"
# Gate: BLOCKED — commit or stash changes first
else
echo "PASS: Working tree is clean"
fi
```
**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.
### Step 2: Branch Validation
```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
echo "WARN: Currently on $current_branch — direct push to main/master is risky"
# Ask user for confirmation before proceeding
else
echo "PASS: On branch $current_branch"
fi
```
**Pass condition**: Not on main/master, OR user explicitly confirms direct-to-main release.
**On warning**: Ask user to confirm they intend to release from main/master directly.
### Step 3: Test Suite Execution
Detect and run the project's test suite:
```bash
# Detection priority:
# 1. package.json with "test" script → npm test
# 2. pytest available and tests exist → pytest
# 3. No tests found → WARN and continue
if [ -f "package.json" ] && grep -q '"test"' package.json; then
npm test
elif command -v pytest &>/dev/null && [ -d "tests" -o -d "test" ]; then
pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
pytest
else
echo "WARN: No test suite detected — skipping test check"
fi
```
**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.
### Step 4: Build Verification
Detect and run the project's build step:
```bash
# Detection priority:
# 1. package.json with "build" script → npm run build
# 2. pyproject.toml → python -m build (if build module available)
# 3. Makefile with build target → make build
# 4. No build step → PASS (not all projects need a build)
if [ -f "package.json" ] && grep -q '"build"' package.json; then
npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
make build
else
echo "INFO: No build step detected — skipping build check"
fi
```
**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.
## Output
- **Format**: JSON object with pass/fail per check
- **Structure**:
```json
{
"phase": "preflight",
"timestamp": "ISO-8601",
"checks": {
"git_clean": { "status": "pass|fail", "details": "" },
"branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
"tests": { "status": "pass|fail|skip", "details": "" },
"build": { "status": "pass|fail|skip", "details": "" }
},
"overall": "pass|fail",
"blockers": []
}
```
## Next Phase
If all checks pass, proceed to [Phase 2: Code Review](02-code-review.md).
If any check fails, report BLOCKED status with the preflight report.


@@ -0,0 +1,137 @@
# Phase 2: Code Review
Automated AI-powered code review of changes since the base branch, with risk assessment.
## Objective
- Detect the merge base between current branch and target branch
- Generate diff for review
- Run AI-powered code review via CCW CLI
- Flag high-risk changes (large diffs, sensitive files, breaking changes)
## Gate Condition
No critical issues flagged by the review. Warnings are reported but do not block.
## Execution Steps
### Step 1: Detect Merge Base
```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"
# If on main/master directly, compare against last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
merge_base="$last_tag"
echo "On main — using last tag as base: $last_tag"
else
# Use first commit if no tags exist
merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
echo "No tags found — using initial commit as base"
fi
fi
```
### Step 2: Generate Diff Summary
```bash
# File-level summary
git diff --stat "$merge_base"...HEAD
# Full diff for review
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt
# Count changes for risk assessment
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
```
### Step 3: Risk Assessment
Flag high-risk indicators before AI review:
| Risk Factor | Threshold | Risk Level |
|-------------|-----------|------------|
| Files changed | > 50 | High |
| Lines changed | > 1000 | High |
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
| Migration files | `*migration*`, `*migrate*` | Medium |
```bash
# Check for sensitive file changes
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential' || true)
if [ -n "$sensitive_files" ]; then
echo "HIGH RISK: Sensitive files modified:"
echo "$sensitive_files"
fi
```
### Step 4: AI Code Review
Use CCW CLI for automated analysis:
```bash
ccw cli -p "PURPOSE: Review code changes for release readiness; success = all critical issues identified with file:line references
TASK: Review diff for bugs | Check for breaking changes | Identify security concerns | Assess test coverage gaps
MODE: analysis
CONTEXT: @**/* | Reviewing diff from $merge_base to HEAD ($files_changed files, +$lines_added/-$lines_removed lines)
EXPECTED: Risk assessment (low/medium/high), list of issues with severity and file:line, release recommendation (ship/hold/fix-first)
CONSTRAINTS: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes
" --tool gemini --mode analysis
```
**Note**: Wait for the CLI analysis to complete before proceeding. Do not proceed to Phase 3 while review is running.
### Step 5: Evaluate Review Results
Based on the AI review output:
| Review Result | Action |
|---------------|--------|
| No critical issues | Proceed to Phase 3 |
| Critical issues found | Report BLOCKED, list issues |
| Warnings only | Proceed with DONE_WITH_CONCERNS note |
| Review failed/timeout | Ask user whether to proceed or retry |
## Output
- **Format**: Review summary with risk assessment
- **Structure**:
```json
{
"phase": "code-review",
"merge_base": "commit-sha",
"stats": {
"files_changed": 0,
"lines_added": 0,
"lines_removed": 0
},
"risk_level": "low|medium|high",
"risk_factors": [],
"ai_review": {
"recommendation": "ship|hold|fix-first",
"critical_issues": [],
"warnings": []
},
"overall": "pass|fail|warn"
}
```
## Next Phase
If review passes (no critical issues), proceed to [Phase 3: Version Bump](03-version-bump.md).
If critical issues found, report BLOCKED status with review summary.


@@ -0,0 +1,171 @@
# Phase 3: Version Bump
Detect the current version, determine the bump type, and update the version file.
## Objective
- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change
## Gate Condition
Version file updated successfully with the new version.
## Execution Steps
### Step 1: Detect Version File
Detection priority order:
| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |
```bash
if [ -f "package.json" ]; then
version_file="package.json"
current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
version_file="pyproject.toml"
current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
version_file="VERSION"
current_version=$(cat VERSION | tr -d '[:space:]')
else
echo "NEEDS_CONTEXT: No version file found"
echo "Expected one of: package.json, pyproject.toml, VERSION"
# Ask user which file to use or create
fi
echo "Version file: $version_file"
echo "Current version: $current_version"
```
### Step 2: Determine Bump Type
**Auto-detection from commit messages** (conventional commits):
```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --oneline)
else
commits=$(git log --oneline -20)
fi
# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)
if [ -n "$has_breaking" ]; then
suggested_bump="major"
elif [ -n "$has_feat" ]; then
suggested_bump="minor"
else
suggested_bump="patch"
fi
echo "Suggested bump: $suggested_bump"
```
**User confirmation**:
- For `patch` and `minor`: proceed with suggested bump, inform user
- For `major`: always ask user to confirm before proceeding (major bumps have significant implications)
- User can override the suggestion with an explicit bump type
### Step 3: Calculate New Version
```bash
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"
case "$bump_type" in
major)
new_version="$((major + 1)).0.0"
;;
minor)
new_version="${major}.$((minor + 1)).0"
;;
patch)
new_version="${major}.${minor}.$((patch + 1))"
;;
esac
echo "Version bump: $current_version -> $new_version"
```
### Step 4: Update Version File
```bash
case "$version_file" in
package.json)
# Use node/jq for safe JSON update
jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
# Also update package-lock.json if it exists
if [ -f "package-lock.json" ]; then
jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
fi
;;
pyproject.toml)
# Use sed for TOML update (version line in [project] or [tool.poetry])
sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
;;
VERSION)
echo "$new_version" > VERSION
;;
esac
echo "Updated $version_file: $current_version -> $new_version"
```
### Step 5: Verify Update
```bash
# Re-read to confirm
case "$version_file" in
package.json)
verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
;;
pyproject.toml)
verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
;;
VERSION)
verified=$(tr -d '[:space:]' < VERSION)
;;
esac
if [ "$verified" = "$new_version" ]; then
echo "PASS: Version verified as $new_version"
else
echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```
## Output
- **Format**: Version change record
- **Structure**:
```json
{
"phase": "version-bump",
"version_file": "package.json",
"previous_version": "1.2.3",
"new_version": "1.3.0",
"bump_type": "minor",
"bump_source": "auto-detected|user-specified",
"overall": "pass|fail"
}
```
## Next Phase
If version updated successfully, proceed to [Phase 4: Changelog & Commit](04-changelog-commit.md).
If version update fails, report BLOCKED status.


@@ -0,0 +1,167 @@
# Phase 4: Changelog & Commit
Generate changelog entry from git history, update CHANGELOG.md, create release commit, and push to remote.
## Objective
- Parse git log since last tag into grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with version in the message
- Push the branch to remote
## Gate Condition
Release commit created and pushed to remote successfully.
## Execution Steps
### Step 1: Gather Commits Since Last Tag
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
echo "Generating changelog since tag: $last_tag"
git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
echo "No previous tag found — using last 50 commits"
git log --pretty=format:"%h %s" --no-merges -50
fi
```
### Step 2: Group Commits by Conventional Commit Type
Parse commit messages and group into categories:
| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):`| Features | **Features** |
| `fix:` / `fix(*):`| Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |
```bash
# Example grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since last tag
# 2. Parse prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```
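One way to express that logic in shell (a sketch only; `categorize` is an illustrative name, and the agent is not required to implement grouping this way):

```shell
# Map a conventional-commit subject line to its changelog category.
# Returns "SKIP" for test: commits, which are omitted from the changelog.
categorize() {
  case "$1" in
    feat:*|feat\(*\):*)         echo "Features" ;;
    fix:*|fix\(*\):*)           echo "Bug Fixes" ;;
    perf:*|perf\(*\):*)         echo "Performance" ;;
    docs:*|docs\(*\):*)         echo "Documentation" ;;
    refactor:*|refactor\(*\):*) echo "Refactoring" ;;
    chore:*|chore\(*\):*)       echo "Maintenance" ;;
    test:*|test\(*\):*)         echo "SKIP" ;;
    *)                          echo "Other Changes" ;;
  esac
}
```

Empty categories are then dropped when the markdown sections are assembled.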
### Step 3: Format Changelog Entry
Generate a markdown changelog entry:
```markdown
## [X.Y.Z] - YYYY-MM-DD
### Features
- feat: description (sha)
- feat(scope): description (sha)
### Bug Fixes
- fix: description (sha)
### Performance
- perf: description (sha)
### Other Changes
- chore: description (sha)
```
Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category
### Step 4: Update CHANGELOG.md
```bash
if [ -f "CHANGELOG.md" ]; then
# Insert new entry after the first heading line (# Changelog)
# The new entry goes between the main heading and the previous version entry
# Use Write tool to insert the new section at the correct position
echo "Updating existing CHANGELOG.md"
else
# Create new CHANGELOG.md with header
echo "Creating new CHANGELOG.md"
fi
```
**CHANGELOG.md structure**:
```markdown
# Changelog
## [X.Y.Z] - YYYY-MM-DD
(new entry here)
## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```
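If the insertion is done in shell rather than via the Write tool, a minimal sketch (assuming the file's first line is the `# Changelog` heading; `insert_entry` is an illustrative helper, not part of the skill):

```shell
# Insert a new release entry immediately after the first line of the
# changelog, keeping all previous entries below it.
insert_entry() {
  file="$1"; entry="$2"
  awk -v entry="$entry" '
    NR == 1 { print; print ""; print entry; next }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Note that `awk -v` interprets backslash escapes in the entry text, so entries containing backslashes would need a different passing mechanism.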
### Step 5: Create Release Commit
```bash
# Stage only the release files that actually exist
# (git add aborts entirely if any listed pathspec matches nothing)
for f in package.json package-lock.json pyproject.toml VERSION CHANGELOG.md; do
  [ -f "$f" ] && git add "$f"
done
# Create release commit
git commit -m "$(cat <<'EOF'
chore: bump version to X.Y.Z
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```
**Commit message format**: `chore: bump version to X.Y.Z`
- Follows conventional commit format
- Includes Co-Authored-By trailer
### Step 6: Push to Remote
```bash
current_branch=$(git branch --show-current)
# Check if remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
git push origin "$current_branch"
else
git push -u origin "$current_branch"
fi
```
**On push failure**:
- If rejected (non-fast-forward): Report BLOCKED, suggest `git pull --rebase`
- If permission denied: Report BLOCKED, check remote access
- If no remote configured: Report BLOCKED, suggest `git remote add`
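That classification can be sketched as follows (the matched strings are heuristics only; exact git error wording varies across versions and hosts, and `classify_push_error` is an illustrative name):

```shell
# Map a captured `git push` error message to a BLOCKED reason.
classify_push_error() {
  case "$1" in
    *non-fast-forward*|*"fetch first"*) echo "rebase-needed" ;;
    *"Permission denied"*|*"403"*)      echo "no-access" ;;
    *"No configured push destination"*) echo "no-remote" ;;
    *)                                  echo "unknown" ;;
  esac
}
```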
## Output
- **Format**: Commit and push record
- **Structure**:
```json
{
"phase": "changelog-commit",
"changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
"commit_sha": "abc1234",
"commit_message": "chore: bump version to X.Y.Z",
"pushed_to": "origin/branch-name",
"overall": "pass|fail"
}
```
## Next Phase
If commit and push succeed, proceed to [Phase 5: PR Creation](05-pr-creation.md).
If push fails, report BLOCKED status with error details.


@@ -0,0 +1,163 @@
# Phase 5: PR Creation
Create a pull request via GitHub CLI with a structured body, linked issues, and release metadata.
## Objective
- Create a PR using `gh pr create` with structured body
- Auto-link related issues from commit messages
- Include release summary (version, changes, test plan)
- Output the PR URL
## Gate Condition
PR created successfully and URL returned.
## Execution Steps
### Step 1: Extract Issue References from Commits
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
commits=$(git log --pretty=format:"%s" --no-merges -50)
fi
# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '#[0-9]+' | sort -u || true)
echo "Referenced issues: $issues"
```
### Step 2: Determine Target Branch
```bash
# Default target: main (fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
current_branch=$(git branch --show-current)
echo "PR: $current_branch -> $target_branch"
```
### Step 3: Build PR Title
Format: `release: vX.Y.Z`
```bash
pr_title="release: v${new_version}"
```
If the version context is not available, fall back to a descriptive title from the branch name.
### Step 4: Build PR Body
Construct the PR body using a HEREDOC for correct formatting:
```bash
# Gather change summary since the merge base with the target branch
merge_base=$(git merge-base "origin/$target_branch" HEAD)
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)
# Build linked issues section
if [ -n "$issues" ]; then
issues_section="## Linked Issues
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
else
issues_section=""
fi
```
### Step 5: Create PR via gh CLI
```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<'EOF'
## Summary
Release vX.Y.Z
### Changes
- list of changes from changelog
## Linked Issues
- #N (fixes)
- #M (closes)
## Version
- Previous: X.Y.Z-1
- New: X.Y.Z
- Bump type: patch|minor|major
## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully
Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
**PR body sections**:
| Section | Content |
|---------|---------|
| **Summary** | Version being released, one-line description |
| **Changes** | Grouped changelog entries (from Phase 4) |
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
| **Version** | Previous version, new version, bump type |
| **Test Plan** | Checklist confirming all phases passed |
### Step 6: Capture and Report PR URL
```bash
# gh pr create prints the PR URL on success; capture the last output line
pr_url=$(gh pr create ... 2>&1 | tail -1)
# If stdout parsing proves brittle, `gh pr view --json url -q .url` also works
echo "PR created: $pr_url"
```
## Output
- **Format**: PR creation record
- **Structure**:
```json
{
"phase": "pr-creation",
"pr_url": "https://github.com/owner/repo/pull/N",
"pr_title": "release: vX.Y.Z",
"target_branch": "main",
"source_branch": "feature-branch",
"linked_issues": ["#1", "#2"],
"overall": "pass|fail"
}
```
## Completion
After PR creation, output the final Completion Status:
```
## STATUS: DONE
**Summary**: Released vX.Y.Z — PR created at {pr_url}
### Details
- Phases completed: 5/5
- Version: {previous} -> {new} ({bump_type})
- PR: {pr_url}
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created
### Outputs
- CHANGELOG.md (updated)
- {version_file} (version bumped)
- Release commit: {sha}
- PR: {pr_url}
```
If there were review warnings, use `DONE_WITH_CONCERNS` and list the warnings in the Details section.


@@ -0,0 +1,392 @@
# Investigator Agent
Executes all 5 phases of the systematic debugging investigation under the Iron Law methodology. Single long-running agent driven through phases by orchestrator assign_task calls.
## Identity
- **Type**: `investigation`
- **Role File**: `~/.codex/skills/investigate/agents/investigator.md`
- **task_name**: `investigator`
- **Responsibility**: Full 5-phase investigation execution — evidence collection, pattern search, hypothesis testing, minimal fix, verification
- **fork_context**: false
- **Reasoning Effort**: high
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern before any phase execution
- Read the phase file at the start of each phase before executing that phase's steps
- Collect concrete evidence before forming any theories (evidence-first)
- Check `confirmed_root_cause` exists before executing Phase 4 (Iron Law gate)
- Track 3-strike counter accurately in Phase 3
- Implement only minimal fix — change only what addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Write the final debug report to `.workflow/.debug/` using the schema in `~/.codex/skills/investigate/specs/debug-report-format.md`
- Produce structured output after each phase, then await next assign_task
### MUST NOT
- Skip MANDATORY FIRST STEPS role loading
- Proceed to Phase 4 without `confirmed_root_cause` (Iron Law violation)
- Modify production code during Phases 1-3 (read-only investigation)
- Count a rejected hypothesis as a strike if it yielded new actionable insight
- Refactor, add features, or change formatting beyond the minimal fix
- Change more than 3 files without written justification
- Proceed past Phase 3 BLOCKED status
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Shell execution | Run tests, reproduce bug, detect test framework, run full test suite |
| `Read` | File read | Read source files, test files, phase docs, role files |
| `Write` | File write | Write debug report to `.workflow/.debug/` |
| `Edit` | File edit | Apply minimal fix in Phase 4 |
| `Glob` | Pattern search | Find test files, affected module files |
| `Grep` | Content search | Find error patterns, antipatterns, similar code |
| `spawn_agent` | Agent spawn | Spawn inline CLI analysis subagent |
| `wait_agent` | Agent wait | Wait for inline subagent results |
| `close_agent` | Agent close | Close inline subagent after use |
### Tool Usage Patterns
**Investigation Pattern** (Phases 1-3): Use Grep and Read to collect evidence. No Write or Edit.
**Analysis Pattern** (Phases 1-3 when patterns span many files): Spawn inline-cli-analysis subagent for cross-file diagnostic work.
**Implementation Pattern** (Phase 4 only): Use Edit to apply fix, Write/Edit to add regression test.
**Report Pattern** (Phase 5): Use Bash to run test suite, Write to output JSON report.
---
## Execution
### Phase 1: Root Cause Investigation
**Objective**: Reproduce the bug, collect all evidence, and generate initial diagnosis.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptoms, error messages, context |
| Phase file | Yes | `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` before executing.
2. Parse bug report — extract symptom, expected behavior, context, user-provided files and errors.
3. Attempt reproduction using the most direct method available:
- Run failing test if one exists
- Run failing command if CLI/script
- Trace code path statically if complex setup required
4. Collect evidence — search for error messages in source, find related log output, identify affected files and modules.
5. Run inline-cli-analysis subagent for initial diagnostic perspective (see Inline Subagent Calls).
6. Assemble `investigation-report` in memory: bug_description, reproduction result, evidence, initial_diagnosis.
7. Output Phase 1 summary and await assign_task for Phase 2.
**Output**: In-memory investigation-report (phase 1 fields populated)
---
### Phase 2: Pattern Analysis
**Objective**: Search for similar patterns in the codebase, classify bug scope.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 2 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/02-pattern-analysis.md` |
| investigation-report | Yes | Phase 1 output in context |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/02-pattern-analysis.md` before executing.
2. Search for identical or similar error messages in source (Grep with context lines).
3. Search for the same exception/error type across the codebase.
4. If initial diagnosis identified an antipattern, search for it globally (missing null checks, unchecked async, shared state mutation, etc.).
5. Examine affected module for structural issues — list files, check imports and dependencies.
6. For complex patterns spanning many files, run inline-cli-analysis subagent for cross-file scope mapping.
7. Classify scope: `isolated` | `module-wide` | `systemic` with justification.
8. Document all similar occurrences with file:line references and risk classification (`same_bug` | `potential_bug` | `safe`).
9. Add `pattern_analysis` section to investigation-report in memory.
10. Output Phase 2 summary and await assign_task for Phase 3.
**Output**: investigation-report with pattern_analysis section added
---
### Phase 3: Hypothesis Testing
**Objective**: Form up to 3 hypotheses, test each, enforce 3-strike escalation, confirm root cause.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 3 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` |
| investigation-report | Yes | Phase 1-2 output in context |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` before executing.
2. Form up to 3 ranked hypotheses from Phase 1-2 evidence. Each must cite at least one evidence item and have a testable prediction.
3. Initialize strike counter at 0.
4. Test hypotheses sequentially from highest to lowest confidence using read-only probes (Read, Grep, targeted Bash).
5. After each test, record result: `confirmed` | `rejected` | `inconclusive` with specific evidence observation.
**Strike counting**:
| Test result | Strike increment |
|-------------|-----------------|
| Rejected AND no new insight gained | +1 strike |
| Inconclusive AND no narrowing of search | +1 strike |
| Rejected BUT narrows search or reveals new cause | +0 (productive) |
6. If strike counter reaches 3 — STOP immediately. Output escalation block (see 3-Strike Escalation Output below). Set status BLOCKED.
7. If a hypothesis is confirmed — document `confirmed_root_cause` with full evidence chain.
8. Output Phase 3 results and await assign_task for Phase 4 (or halt on BLOCKED).
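The strike accounting above can be expressed as a small helper (a sketch; `next_strikes` is an illustrative name, and judging whether a failed test was productive remains the agent's call):

```shell
# Apply the strike table: only unproductive rejected/inconclusive tests count.
# Usage: next_strikes <current_strikes> <result> <productive: 0|1>
next_strikes() {
  strikes="$1"; result="$2"; productive="$3"
  if [ "$result" != "confirmed" ] && [ "$productive" -eq 0 ]; then
    strikes=$((strikes + 1))
  fi
  echo "$strikes"
}
```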
**3-Strike Escalation Output**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>
### Error History
1. Attempt 1: <H1 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>
### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation>
### Diagnostic Dump
<Full investigation-report content>
STATUS: BLOCKED
```
**Output**: investigation-report with hypothesis_tests and confirmed_root_cause (or BLOCKED escalation)
---
### Phase 4: Implementation
**Objective**: Verify Iron Law gate, implement minimal fix, add regression test.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 4 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/04-implementation.md` |
| investigation-report | Yes | Must contain confirmed_root_cause |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/04-implementation.md` before executing.
2. **Iron Law Gate Check** — verify `confirmed_root_cause` is present in investigation-report:
| Condition | Action |
|-----------|--------|
| confirmed_root_cause present | Proceed to Step 3 |
| confirmed_root_cause absent | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. |
3. Plan the minimal fix before writing any code. Document: description, files to change, change types, estimated lines.
| Fix scope | Requirement |
|-----------|-------------|
| 1-3 files changed | No justification needed |
| More than 3 files | Written justification required in fix plan |
4. Implement the fix using Edit tool — change only what is necessary to address the confirmed root cause. No refactoring, no style changes to unrelated code.
5. Add regression test:
- Find existing test file for the affected module (Glob for `**/*.test.{ts,js,py}` or `**/test_*.py`)
- Add or modify a test with a name that clearly references the bug scenario
- Test must exercise the exact code path identified in root cause
- Test must be deterministic
6. Re-run the original reproduction case from Phase 1. Verify it now passes.
7. Add `fix_applied` section to investigation-report in memory.
8. Output Phase 4 summary and await assign_task for Phase 5.
**Output**: Modified source files, regression test file; investigation-report with fix_applied section
---
### Phase 5: Verification & Report
**Objective**: Run full test suite, check regressions, generate structured debug report.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 5 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/05-verification-report.md` |
| investigation-report | Yes | All phases populated |
**Steps**:
1. Read `~/.codex/skills/investigate/phases/05-verification-report.md` before executing.
2. Detect and run the project's test framework:
- Check for `package.json` (npm test)
- Check for `pytest.ini` / `pyproject.toml` (pytest)
- Check for `go.mod` (go test)
- Check for `Cargo.toml` (cargo test)
3. Record test results: total, passed, failed, skipped. Note if regression test passed.
4. Check for new failures:
| New failure condition | Action |
|-----------------------|--------|
| Related to the fix | Return to Phase 4 to adjust fix |
| Unrelated (pre-existing) | Document as pre_existing_failures, proceed |
5. Generate debug report JSON following schema in `~/.codex/skills/investigate/specs/debug-report-format.md`. Populate all required fields from investigation-report phases.
6. Create output directory and write report:
```
Bash: mkdir -p .workflow/.debug
```
Filename: `.workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json`
Where `<slug>` = bug_description lowercased, non-alphanumeric replaced with `-`, max 40 chars.
7. Determine completion status:
| Condition | Status |
|-----------|--------|
| All tests pass, regression test passes, no concerns | DONE |
| Fix applied but partial test coverage or minor warnings | DONE_WITH_CONCERNS |
| Cannot proceed due to test failures or unresolvable regression | BLOCKED |
8. Output completion status block.
**Output**: `.workflow/.debug/debug-report-<date>-<slug>.json`
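The slug rule from Step 6 can be sketched as (`slugify` is an illustrative name; this variant also collapses runs of separators and trims leading/trailing dashes):

```shell
# Lowercase, replace non-alphanumeric runs with '-', cap at 40 chars.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-40
}
```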
---
## Inline Subagent Calls
This agent spawns a utility subagent for cross-file diagnostic analysis during Phases 1, 2, and 3 when analysis spans many files or requires broader diagnostic perspective.
### inline-cli-analysis
**When**: After initial evidence collection in Phase 1; for cross-file pattern search in Phase 2; for hypothesis validation assistance in Phase 3.
**Agent File**: `~/.codex/agents/cli-explore-agent.md`
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
<analysis task description — e.g.:
PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on <affected_module>>
Expected: Structured findings with file:line references`
})
const result = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
Substitute the analysis task description with phase-appropriate content:
- Phase 1: Initial diagnosis from error evidence
- Phase 2: Cross-file pattern search and scope mapping
- Phase 3: Hypothesis validation assistance
### Result Handling
| Result | Action |
|--------|--------|
| Success | Integrate findings into investigation-report, continue |
| Timeout / Error | Continue without subagent result, log warning in investigation-report |
---
## Structured Output Template
After each phase, output the following structure before awaiting the next assign_task:
```
## Phase <N> Complete
### Summary
- <one-sentence status of what was accomplished>
### Findings
- <Finding 1>: <specific description with file:line reference>
- <Finding 2>: <specific description with file:line reference>
### Investigation Report Update
- Fields updated: <list of fields added/modified this phase>
- Key data: <most important finding from this phase>
### Status
<AWAITING_NEXT_PHASE | BLOCKED: <reason> | DONE>
```
Final Phase 5 output follows Completion Status Protocol:
```
## STATUS: DONE
**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>
### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>
### Outputs
- Debug report: <reportPath>
- Files changed: <list>
- Tests added: <list>
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Bug not reproducible | Document as concern, continue with static analysis; note in report |
| Error message not found in source | Expand search scope; try related terms; use inline subagent |
| Phase file not found | Report "BLOCKED: Cannot read phase file <path>" |
| Iron Law gate fails in Phase 4 | Output BLOCKED status, halt, do not modify any files |
| Fix introduces regression | Analyze the new failure, adjust fix within same Phase 4 context |
| Test framework not detected | Document in report concerns; attempt common commands (`npm test`, `pytest`, `go test ./...`) |
| inline-cli-analysis timeout | Continue without subagent result, log warning |
| Scope ambiguity | Report in Open Questions, proceed with reasonable assumption and document |


@@ -0,0 +1,362 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
agents: investigator
phases: 5
---
# Investigate
Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.
## Architecture
```
+--------------------------------------------------------------+
| investigate Orchestrator |
| -> Drive investigator agent through 5 sequential phases |
+----------------------------+---------------------------------+
|
spawn_agent (Phase 1 initial task)
|
v
+------------------+
| investigator |
| (single agent, |
| 5-phase loop) |
+------------------+
| ^ |
assign_task | | | assign_task
(Phase 2-5) v | v (Phase 3 gate check)
+------------------+
| Phase 1: Root |
| Phase 2: Pattern |
| Phase 3: Hyp. | <-- Gate: BLOCKED?
| Phase 4: Impl. | <-- Iron Law gate
| Phase 5: Report |
+------------------+
|
v
.workflow/.debug/debug-report-*.json
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|-------------|
| investigator | `investigator` | `~/.codex/skills/investigate/agents/investigator.md` | Full 5-phase investigation execution | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|-------------|-----------|-----------|
| investigator | `investigator` | false | — | Starts fresh; receives all phase context via assign_task messages. No prior conversation history needed. |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|-------------|--------|
| investigator spawned (Phase 1) | false | Clean context; full task description in message |
| Phase 2-5 transitions | N/A | assign_task used, agent already running |
---
## Subagent Registry
Utility subagents callable by the investigator agent during analysis phases:
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-cli-analysis | `~/.codex/agents/cli-explore-agent.md` | investigator | Cross-file diagnostic analysis (replaces ccw cli calls) | haiku |
> Subagents are spawned by the investigator within its own execution context (Pattern 2.8), not by the orchestrator.
---
## Phase Execution
### Phase 1: Root Cause Investigation
**Objective**: Spawn the investigator agent and assign the Phase 1 investigation task. Agent reproduces the bug, collects evidence, and runs initial diagnosis.
**Input**:
| Source | Description |
|--------|-------------|
| User message | Bug description, symptom, context, error messages |
**Execution**:
Build the initial spawn message embedding the bug report and Phase 1 instructions, then spawn the investigator:
```
spawn_agent({
task_name: "investigator",
fork_context: false,
message: `## TASK ASSIGNMENT
### MANDATORY FIRST STEPS (Agent Execute)
1. Read role definition: ~/.codex/skills/investigate/agents/investigator.md (MUST read first)
2. Read: ~/.codex/skills/investigate/phases/01-root-cause-investigation.md
---
## Phase 1: Root Cause Investigation
Bug Report:
<user-provided bug description, symptoms, error messages, context>
Execute Phase 1 per the phase file. Produce investigation-report (in-memory) and report back with:
- Phase 1 complete summary
- bug_description, reproduction result, evidence collected, initial diagnosis
- Await next phase assignment.`
})
const p1Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p1Result | Phase 1 completion summary with evidence, reproduction, initial diagnosis |
---
### Phase 2: Pattern Analysis
**Objective**: Assign Phase 2 to the running investigator. Agent searches codebase for similar patterns and classifies bug scope.
**Input**:
| Source | Description |
|--------|-------------|
| p1Result | Phase 1 output — evidence, affected files, initial suspects |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 2: Pattern Analysis
Read: ~/.codex/skills/investigate/phases/02-pattern-analysis.md
Using your Phase 1 findings, execute Phase 2:
- Search for similar error patterns across the codebase
- Search for the same antipattern if identified
- Classify scope: isolated | module-wide | systemic
- Document all occurrences with file:line references
Report back with pattern_analysis section and scope classification. Await next phase assignment.`
}]
})
const p2Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p2Result | Pattern analysis section: scope classification, similar occurrences, scope justification |
---
### Phase 3: Hypothesis Testing
**Objective**: Assign Phase 3 to the investigator. Agent forms and tests up to 3 hypotheses. Orchestrator checks output for `BLOCKED` marker before proceeding.
**Input**:
| Source | Description |
|--------|-------------|
| p2Result | Pattern analysis results |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 3: Hypothesis Testing
Read: ~/.codex/skills/investigate/phases/03-hypothesis-testing.md
Using Phase 1-2 evidence, execute Phase 3:
- Form up to 3 ranked hypotheses, each citing evidence
- Test each hypothesis with read-only probes
- Track 3-strike counter — if 3 consecutive unproductive failures: STOP and output ESCALATION block with BLOCKED status
- If a hypothesis is confirmed: output confirmed_root_cause with full evidence chain
Report back with hypothesis test results and either:
confirmed_root_cause (proceed to Phase 4)
OR BLOCKED: <escalation dump> (halt)`
}]
})
const p3Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```
**Phase 3 Gate Decision**:
| Condition | Action |
|-----------|--------|
| p3Result contains `confirmed_root_cause` | Proceed to Phase 4 |
| p3Result contains `BLOCKED` | Halt workflow, output escalation dump to user, close investigator |
| p3Result contains `ESCALATION: 3-Strike Limit Reached` | Halt workflow, output diagnostic dump, close investigator |
| Timeout | assign_task "Finalize Phase 3 results now", re-wait 120s; if still timeout → halt |
If BLOCKED: close investigator and surface the diagnostic dump to the user. Do not proceed to Phase 4.
---
### Phase 4: Implementation
**Objective**: Assign Phase 4 only after confirmed root cause. Agent implements minimal fix and adds regression test.
**Input**:
| Source | Description |
|--------|-------------|
| p3Result | confirmed_root_cause with evidence chain, affected file:line |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 4: Implementation
Read: ~/.codex/skills/investigate/phases/04-implementation.md
Iron Law gate confirmed — proceed with implementation:
- Verify confirmed_root_cause is present in your context (gate check)
- Plan the minimal fix before writing any code
- Implement only what is necessary to fix the confirmed root cause
- Add regression test: must fail without fix, pass with fix
- Verify fix against original reproduction case from Phase 1
Report back with fix_applied section. Await Phase 5 assignment.`
}]
})
const p4Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p4Result | fix_applied section: files changed, regression test details, reproduction verified |
---
### Phase 5: Verification & Report
**Objective**: Assign Phase 5 to run the full test suite and generate the structured debug report.
**Input**:
| Source | Description |
|--------|-------------|
| p4Result | fix_applied details — files changed, regression test |
**Execution**:
```
assign_task({
target: "investigator",
items: [{
type: "text",
text: `## Phase 5: Verification & Report
Read: ~/.codex/skills/investigate/phases/05-verification-report.md
Final phase:
- Run full test suite (detect framework: npm test / pytest / go test / cargo test)
- Verify the regression test passes
- Check for new failures introduced by the fix
- Generate structured debug report per specs/debug-report-format.md
- Write report to .workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json
- Output completion status: DONE | DONE_WITH_CONCERNS | BLOCKED`
}]
})
const p5Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| p5Result | Completion status, test suite results, path to debug report file |
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1 (spawn + wait) | 300000 ms | assign_task "Finalize Phase 1 now" + wait 120s; if still timeout → halt |
| Phase 2 (assign + wait) | 300000 ms | assign_task "Finalize Phase 2 now" + wait 120s; if still timeout → halt |
| Phase 3 (assign + wait) | 480000 ms | assign_task "Finalize Phase 3 now" + wait 120s; if still timeout → halt BLOCKED |
| Phase 4 (assign + wait) | 480000 ms | assign_task "Finalize Phase 4 now" + wait 120s; if still timeout → halt |
| Phase 5 (assign + wait) | 300000 ms | assign_task "Finalize Phase 5 now" + wait 120s; if still timeout → partial report |
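The two-step protocol above (nudge once with a "Finalize now" instruction, then re-wait a 120 s grace window) can be sketched as a small helper. This is an illustrative sketch only: `waitFn` and `nudgeFn` stand in for the `wait_agent` / `assign_task` primitives, which are orchestrator pseudocode, not a real JavaScript API.

```javascript
// Wait for an agent, nudging once on timeout before halting the phase.
// waitFn(timeoutMs) resolves with a result or rejects on timeout;
// nudgeFn(text) delivers the "Finalize now" instruction to the agent.
async function waitWithGrace(waitFn, nudgeFn, phase, timeoutMs, graceMs = 120000) {
  try {
    return { status: "ok", result: await waitFn(timeoutMs) };
  } catch (firstTimeout) {
    // First timeout: tell the agent to finalize, then re-wait once.
    await nudgeFn(`Finalize Phase ${phase} now`);
    try {
      return { status: "ok", result: await waitFn(graceMs) };
    } catch (secondTimeout) {
      // Second timeout: halt this phase per the timeout protocol.
      return { status: "halt", result: null };
    }
  }
}
```

A single nudge keeps the protocol bounded: at most one retry per phase, never an open-ended wait.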
### Cleanup Protocol
At workflow end (success or halt), close the investigator agent:
```
close_agent({ target: "investigator" })
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output results" + re-wait 120000 ms |
| Agent timeout (second) | close_agent, report partial results to user |
| Phase 3 BLOCKED | close_agent, surface full escalation dump to user, halt |
| Phase 4 Iron Law violation | close_agent, report "Cannot proceed: no confirmed root cause" |
| Phase 4 introduces regression | Investigator adjusts the fix; orchestrator re-waits the same phase |
| User cancellation | close_agent({ target: "investigator" }), report current state |
| send_message ignored | Escalate to assign_task |
---
## Output Format
```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)
## Results
- Root cause: <confirmed root cause description>
- Fix: <what was changed>
- Regression test: <test name in test file>
## Artifacts
- File: .workflow/.debug/debug-report-<date>-<slug>.json
- Description: Full structured investigation report
## Next Steps (if DONE_WITH_CONCERNS or BLOCKED)
1. <recommended follow-up action>
2. <recommended follow-up action>
```

# Phase 1: Root Cause Investigation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Reproduce the bug and collect all available evidence before forming any theories.
## Objective
- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use inline CLI analysis for initial diagnosis
## Input
| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptom, expected behavior, context, user-provided errors |
| User-provided files | Optional | Any files or paths the user mentioned as relevant |
## Execution Steps
### Step 1: Parse the Bug Report
Extract the following from the user's description:
- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)
- **User-provided files**: Any files mentioned
- **User-provided errors**: Any error messages provided
Assemble the extracted fields as the initial `investigation-report` structure in memory:
```
bugReport = {
symptom: <extracted from description>,
expected_behavior: <what should happen>,
context: <when/where it occurs>,
user_provided_files: [<files mentioned>],
user_provided_errors: [<error messages>]
}
```
---
### Step 2: Reproduce the Bug
Attempt reproduction using the most direct method available:
| Method | When to use |
|--------|-------------|
| Run failing test | A specific failing test is known or can be identified |
| Run failing command | Bug is triggered by a CLI command or script |
| Static code path trace | Reproduction requires complex setup; use Read + Grep to trace the path |
Execution for each method:
**Run failing test**:
```
Bash: <detect test runner and run the specific failing test>
```
**Run failing command**:
```
Bash: <execute the command that triggers the bug>
```
**Static code path trace**:
- Use Grep to find the error message text in source
- Use Read to trace the code path that produces the error
- Document the theoretical reproduction path
**Decision table**:
| Outcome | Action |
|---------|--------|
| Reproduction successful | Document steps and method, proceed to Step 3 |
| Reproduction failed | Document what was attempted, note as concern, continue with static analysis |
---
### Step 3: Collect Evidence
Gather all available evidence using project tools:
1. Search for the exact error message text in source files (Grep with 3 lines of context).
2. Search for related log output patterns.
3. Read any stack trace files or test output files if they exist on disk.
4. Use Glob to identify all files in the affected module or area.
5. Read the most directly implicated source files.
Compile findings into the evidence section of the investigation-report:
```
evidence = {
error_messages: [<exact error text>],
stack_traces: [<relevant stack trace>],
affected_files: [<file1>, <file2>],
affected_modules: [<module-name>],
log_output: [<relevant log lines>]
}
```
---
### Step 4: Initial Diagnosis via Inline CLI Analysis
Spawn inline-cli-analysis subagent for broader diagnostic perspective:
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files_from_step3> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength, each with file:line reference
CONSTRAINTS: Read-only analysis | Focus on <affected_module>`
})
const diagResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
Record results in initial_diagnosis section:
```
initial_diagnosis = {
cli_tool_used: "inline-cli-analysis",
top_suspects: [
{ description: <suspect 1>, evidence_strength: "strong|moderate|weak", files: [<files>] }
]
}
```
**Decision table**:
| Outcome | Action |
|---------|--------|
| Subagent returns top suspects | Integrate into investigation-report, proceed to Step 5 |
| Subagent timeout or error | Log warning in investigation-report, proceed to Step 5 without subagent findings |
---
### Step 5: Assemble Investigation Report
Combine all findings into the complete Phase 1 investigation-report:
```
investigation_report = {
phase: 1,
bug_description: <concise one-sentence description>,
reproduction: {
reproducible: true|false,
steps: ["step 1: ...", "step 2: ...", "step 3: observe error"],
reproduction_method: "test|command|static_analysis"
},
evidence: {
error_messages: [<exact error text>],
stack_traces: [<relevant stack trace>],
affected_files: [<file1>, <file2>],
affected_modules: [<module-name>],
log_output: [<relevant log lines>]
},
initial_diagnosis: {
cli_tool_used: "inline-cli-analysis",
top_suspects: [
{ description: <suspect>, evidence_strength: "strong|moderate|weak", files: [] }
]
}
}
```
Output Phase 1 summary and await assign_task for Phase 2.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 1) | In-memory JSON | bug_description, reproduction, evidence, initial_diagnosis |
| Phase 1 summary | Structured text output | Summary for orchestrator, await Phase 2 assignment |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Bug symptom clearly documented | bug_description field populated with at least 10 characters |
| Reproduction attempted | reproduction.reproducible is true or failure documented |
| At least one concrete evidence item collected | evidence.error_messages OR stack_traces OR affected_files non-empty |
| Affected files identified | evidence.affected_files non-empty |
| Initial diagnosis generated | initial_diagnosis.top_suspects has at least one entry (or timeout documented) |
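The success criteria above can be checked mechanically before handing off to Phase 2. A minimal sketch, assuming the field names from the `investigation_report` structure in Step 5 (the function name is illustrative):

```javascript
// Check the Phase 1 success criteria against an investigation report.
// Returns the list of unmet criteria; an empty array means Phase 1 is complete.
function phase1Gaps(report) {
  const gaps = [];
  const ev = report.evidence || {};
  if (!report.bug_description || report.bug_description.length < 10) {
    gaps.push("bug symptom not documented");
  }
  if (!report.reproduction || typeof report.reproduction.reproducible !== "boolean") {
    gaps.push("reproduction not attempted or failure not documented");
  }
  // At least one concrete evidence item is required.
  const hasEvidence = ["error_messages", "stack_traces", "affected_files"]
    .some((k) => Array.isArray(ev[k]) && ev[k].length > 0);
  if (!hasEvidence) gaps.push("no concrete evidence collected");
  if (!Array.isArray(ev.affected_files) || ev.affected_files.length === 0) {
    gaps.push("affected files not identified");
  }
  return gaps;
}
```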
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Cannot reproduce bug | Document what was attempted, set reproducible: false, continue with static analysis |
| Error message not found in source | Expand search to whole project, try related terms, continue |
| No affected files identifiable | Use Glob on broad patterns, document uncertainty |
| inline-cli-analysis timeout | Continue without subagent result, log warning in initial_diagnosis |
| User description insufficient | Document in Open Questions, proceed with available information |
## Next Phase
-> [Phase 2: Pattern Analysis](02-pattern-analysis.md)

# Phase 2: Pattern Analysis
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Search for similar patterns in the codebase to determine if the bug is isolated or systemic.
## Objective
- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 1) | Yes | Evidence, affected files, affected modules, initial diagnosis suspects |
| assign_task message | Yes | Phase 2 instruction |
## Execution Steps
### Step 1: Search for Similar Error Patterns
Search for the same error type or message elsewhere in the codebase:
1. Grep for identical or similar error message fragments in `src/` with 3 lines of context.
2. Grep for the same exception class or error code — output mode: files with matches.
3. Grep for similar error handling patterns in the same module.
**Decision table**:
| Result | Action |
|--------|--------|
| Similar patterns found in same module | Note as module-wide indicator, continue |
| Similar patterns found across multiple modules | Note as systemic indicator, continue |
| No similar patterns found | Note as isolated indicator, continue |
---
### Step 2: Search for the Same Antipattern
If the Phase 1 initial diagnosis identified a coding antipattern, search for it globally:
**Common antipattern examples to search for**:
| Antipattern | Grep pattern style |
|-------------|-------------------|
| Missing null/undefined check | `variable\.property` without guard |
| Unchecked async operation | unhandled promise, missing await |
| Direct mutation of shared state | shared state write without lock |
| Type assumption violation | forced cast without validation |
Execute at least one targeted Grep for the identified antipattern across relevant source directories.
**Decision table**:
| Result | Action |
|--------|--------|
| Antipattern found in multiple files | Classify as module-wide or systemic candidate |
| Antipattern isolated to one location | Classify as isolated candidate |
| No antipattern identifiable | Proceed without antipattern classification |
---
### Step 3: Module-Level Analysis
Examine the affected module for structural issues:
1. Use Glob to list all files in the affected module directory.
2. Grep for imports from the affected module to understand its consumers.
3. Check for circular dependencies or unusual import patterns.
---
### Step 4: CLI Cross-File Pattern Analysis (Optional)
For complex patterns that span many files, use inline-cli-analysis subagent:
```
spawn_agent({
task_name: "inline-cli-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '<antipattern_description>' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.<ext> | Bug in <module>, pattern: <pattern_description>
EXPECTED: List of all files with same pattern, risk assessment per occurrence (same_bug|potential_bug|safe)
CONSTRAINTS: Focus on <antipattern> pattern only | Ignore test files for scope`
})
const patternResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```
**Decision table**:
| Condition | Action |
|-----------|--------|
| Pattern spans >3 files in >1 module | Use subagent for full scope map |
| Pattern confined to 1 module | Skip subagent, proceed with manual search results |
| Subagent timeout | Continue with manual search results |
---
### Step 5: Classify Scope and Assemble Pattern Analysis
Classify the bug scope based on all search findings:
**Scope Definitions**:
| Scope | Definition |
|-------|-----------|
| `isolated` | Bug exists in a single location; no similar patterns found elsewhere |
| `module-wide` | Same pattern exists in multiple files within the same module |
| `systemic` | Pattern spans multiple modules; may require broader fix |
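The scope definitions above reduce to a small classification over the similar occurrences found in Steps 1-4. A sketch, assuming each occurrence carries an illustrative `module` and `risk` field as in the `pattern_analysis` structure below:

```javascript
// Classify bug scope from the similar occurrences found in Phase 2.
// Each occurrence: { file, module, risk: "same_bug" | "potential_bug" | "safe" }.
function classifyScope(occurrences) {
  // Only occurrences that could share the bug widen the scope.
  const risky = occurrences.filter((o) => o.risk !== "safe");
  if (risky.length === 0) return "isolated";
  const modules = new Set(risky.map((o) => o.module));
  return modules.size > 1 ? "systemic" : "module-wide";
}
```

Filtering out `safe` matches first keeps a handful of superficially similar but guarded call sites from inflating an isolated bug into a systemic one.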
Assemble `pattern_analysis` section and add to investigation-report:
```
pattern_analysis = {
scope: "isolated|module-wide|systemic",
similar_occurrences: [
{
file: "<path/to/file.ts>",
line: <line number>,
pattern: "<description of similar pattern>",
risk: "same_bug|potential_bug|safe"
}
],
total_occurrences: <count>,
affected_modules: ["<module-name>"],
antipattern_identified: "<description or null>",
scope_justification: "<evidence-based reasoning for this scope classification>"
}
```
**Scope decision table**:
| Scope | Phase 3 Focus |
|-------|--------------|
| isolated | Narrow hypothesis scope to single location |
| module-wide | Note all occurrences for Phase 4 fix planning |
| systemic | Note for potential multi-location fix; flag for separate tracking |
Output Phase 2 summary and await assign_task for Phase 3.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 2) | In-memory JSON | Phase 1 fields + pattern_analysis section added |
| Phase 2 summary | Structured text output | Scope classification with justification, await Phase 3 |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| At least 3 search queries executed | Count of Grep/Glob operations performed |
| Scope classified | pattern_analysis.scope is one of: isolated, module-wide, systemic |
| Similar occurrences documented | pattern_analysis.similar_occurrences populated (empty array acceptable for isolated) |
| Scope justification provided | pattern_analysis.scope_justification non-empty with evidence |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No source directory found | Search from project root, document uncertainty |
| Grep returns too many results | Narrow pattern, add path filter, take top 10 most relevant |
| inline-cli-analysis timeout | Continue with manual search results, log warning |
| Antipattern not identifiable from Phase 1 | Skip Step 2 antipattern search, proceed with error pattern search only |
## Next Phase
-> [Phase 3: Hypothesis Testing](03-hypothesis-testing.md)

# Phase 3: Hypothesis Testing
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.
## Objective
- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive unproductive test failures
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-2) | Yes | Evidence, affected files, pattern analysis, initial suspects |
| assign_task message | Yes | Phase 3 instruction |
## Execution Steps
### Step 1: Form Hypotheses
Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:
**Hypothesis formation rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
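The formation rules above can be enforced with a small validator before testing begins. A sketch, assuming the hypothesis shape shown in the structure below (the function name is illustrative):

```javascript
// Validate a hypothesis set against the formation rules and rank it.
// Returns { errors, sorted }: rule violations, and hypotheses ordered high-confidence first.
function validateHypotheses(hypotheses) {
  const errors = [];
  if (hypotheses.length === 0) errors.push("at least one hypothesis is required");
  if (hypotheses.length > 3) errors.push("maximum 3 hypotheses per investigation");
  for (const h of hypotheses) {
    if (!Array.isArray(h.evidence_supporting) || h.evidence_supporting.length === 0) {
      errors.push(`${h.id}: must cite at least one piece of evidence`);
    }
    if (!h.predicted_behavior) errors.push(`${h.id}: must have a testable prediction`);
  }
  // Rank by confidence, high first.
  const order = { high: 0, medium: 1, low: 2 };
  const sorted = [...hypotheses].sort((a, b) => order[a.confidence] - order[b.confidence]);
  return { errors, sorted };
}
```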
Assemble hypotheses in memory:
```
hypotheses = [
{
id: "H1",
description: "The root cause is <X> because evidence <Y>",
evidence_supporting: ["<evidence item 1>", "<evidence item 2>"],
predicted_behavior: "If H1 is correct, then we should observe <Z>",
test_method: "How to verify: read file <X> line <Y>, check value <Z>",
confidence: "high|medium|low"
}
]
```
Initialize the strike counter to 0.
---
### Step 2: Test Hypotheses Sequentially
Test each hypothesis starting from highest confidence (H1 first). Use read-only probes only during testing.
**Allowed test methods**:
| Method | Usage |
|--------|-------|
| Read a specific file | Check a specific value, condition, or code pattern |
| Grep for a pattern | Confirm or deny the presence of a condition |
| Bash targeted test | Run a specific test that reveals the condition |
| Temporary log statement | Add a log to observe runtime behavior; MUST revert after |
**Prohibited during hypothesis testing**:
- Modifying production code (save for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)
---
### Step 3: Record Test Results
For each hypothesis test, record:
```
hypothesis_test = {
id: "H1",
test_performed: "<what was checked, e.g.: Read src/caller.ts:42 — checked null handling>",
result: "confirmed|rejected|inconclusive",
evidence: "<specific observation that confirms or rejects>",
files_checked: ["<src/caller.ts:42-55>"]
}
```
---
### Step 4: 3-Strike Escalation Rule
Track consecutive unproductive test failures. After each hypothesis test, evaluate:
**Strike evaluation**:
| Test result | New insight gained | Strike action |
|-------------|-------------------|---------------|
| confirmed | — | CONFIRM root cause, end testing |
| rejected | Yes — narrows search or reveals new cause | No strike (productive rejection) |
| rejected | No — no actionable insight | +1 strike |
| inconclusive | Yes — identifies new area | No strike (productive) |
| inconclusive | No — no narrowing | +1 strike |
**Strike counter tracking**:
| Strike count | Action |
|--------------|--------|
| 1 | Continue to next hypothesis |
| 2 | Continue to next hypothesis |
| 3 | STOP — output escalation block immediately |
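The two tables above combine into one evaluation step per hypothesis test. A sketch of that logic (the function name is illustrative):

```javascript
// One step of the 3-strike rule: given a test result and whether it yielded
// a new insight, return the next action and the updated strike count.
function evaluateStrike(result, newInsight, strikes) {
  if (result === "confirmed") return { action: "confirm", strikes };
  // Productive rejections and productive inconclusive results do not count.
  const next = newInsight ? strikes : strikes + 1;
  return { action: next >= 3 ? "escalate" : "continue", strikes: next };
}
```

Note that only *consecutive unproductive* failures accumulate: a productive rejection leaves the counter untouched rather than resetting it, matching the "no strike" rows above.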
**On 3rd Strike — output this escalation block verbatim and halt**:
```
## ESCALATION: 3-Strike Limit Reached
### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>
### Error History
1. Attempt 1: <H1 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
Test: <what was checked>
Result: <rejected/inconclusive> — <why>
### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>
### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z">
### Diagnostic Dump
<Full investigation-report content from all phases>
STATUS: BLOCKED
```
After outputting escalation: set status BLOCKED. Do not proceed to Phase 4.
---
### Step 5: Confirm Root Cause
If a hypothesis is confirmed, document the confirmed root cause:
```
confirmed_root_cause = {
hypothesis_id: "H1",
description: "<Root cause description with full technical detail>",
evidence_chain: [
"Phase 1: <Error message X observed in Y>",
"Phase 2: <Same pattern found in N other files>",
"Phase 3: H1 confirmed — <specific condition at file.ts:42>"
],
affected_code: {
file: "<path/to/file.ts>",
line_range: "<42-55>",
function: "<functionName>"
}
}
```
Add `hypothesis_tests` and `confirmed_root_cause` to investigation-report in memory.
Output Phase 3 results and await assign_task for Phase 4.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 3) | In-memory JSON | Phases 1-2 fields + hypothesis_tests + confirmed_root_cause |
| Phase 3 summary or escalation block | Structured text output | Either confirmed root cause or BLOCKED escalation |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Maximum 3 hypotheses formed | Count of hypotheses array |
| Each hypothesis cites evidence | evidence_supporting non-empty for each |
| Each hypothesis tested with documented probe | test_performed field populated for each |
| Strike counter maintained correctly | Count of unproductive consecutive failures |
| Root cause confirmed with evidence chain OR escalation triggered | confirmed_root_cause present OR BLOCKED output |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Evidence insufficient to form 3 hypotheses | Form as many as evidence supports (minimum 1), proceed |
| Partial insight from rejected hypothesis | Do not count as strike; re-form or refine remaining hypotheses with new insight |
| All 3 hypotheses confirmed simultaneously | Use the highest-confidence confirmed hypothesis as the root cause |
| Hypothesis test requires production change | Prohibited — use static analysis or targeted read-only probe instead |
## Gate for Phase 4
Phase 4 can ONLY proceed if `confirmed_root_cause` is present. This is the Iron Law gate.
| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | -> [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation triggered | STOP — output diagnostic dump — STATUS: BLOCKED |
| Partial insight, re-forming hypotheses | Stay in Phase 3, re-test with refined hypotheses |
## Next Phase
-> [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.

# Phase 4: Implementation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Implement the minimal fix and add a regression test. Iron Law gate enforced at entry.
## Objective
- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 3) | Yes | Must contain confirmed_root_cause with evidence chain |
| assign_task message | Yes | Phase 4 instruction |
## Iron Law Gate Check
**MANDATORY FIRST ACTION before any code modification**:
| Condition | Action |
|-----------|--------|
| investigation-report contains `confirmed_root_cause` with non-empty description | Proceed to Step 1 |
| `confirmed_root_cause` absent or empty | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. Do NOT modify any files. |
Log the confirmed state before proceeding:
- Root cause: `<confirmed_root_cause.description>`
- Evidence chain: `<confirmed_root_cause.evidence_chain.length>` items
- Affected code: `<confirmed_root_cause.affected_code.file>:<confirmed_root_cause.affected_code.line_range>`
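The gate check above is deliberately trivial to express, which is the point: it should be a mechanical precondition, not a judgment call. A sketch, assuming the `confirmed_root_cause` shape from Phase 3 (the function name is illustrative):

```javascript
// Iron Law gate: refuse to enter Phase 4 without a confirmed root cause.
// Returns "proceed" or the mandated BLOCKED message.
function ironLawGate(report) {
  const rc = report.confirmed_root_cause;
  if (!rc || !rc.description || rc.description.length === 0) {
    return "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3.";
  }
  return "proceed";
}
```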
## Execution Steps
### Step 1: Plan the Minimal Fix
Define the fix scope BEFORE writing any code:
```
fix_plan = {
description: "<What the fix does and why>",
changes: [
{
file: "<path/to/file.ts>",
change_type: "modify|add|remove",
description: "<specific change description>",
lines_affected: "<42-45>"
}
],
total_files_changed: <count>,
total_lines_changed: "<estimated>"
}
```
**Minimal Fix Rules** (from Iron Law):
| Rule | Requirement |
|------|-------------|
| Change only necessary code | Only the confirmed root cause location |
| No refactoring | Do not restructure surrounding code |
| No feature additions | Fix only; no new capabilities |
| No style/format changes | Do not touch unrelated code formatting |
| >3 files changed | Requires written justification in fix_plan |
**Fix scope decision**:
| Files to change | Action |
|----------------|--------|
| 1-3 files | Proceed without justification |
| More than 3 files | Document justification in fix_plan.description before proceeding |
---
### Step 2: Implement the Fix
Apply the planned changes using Edit:
- Target only the file(s) and line(s) identified in `confirmed_root_cause.affected_code`
- Make exactly the change described in fix_plan
- Verify the edit was applied correctly by reading the modified section
**Decision table**:
| Edit outcome | Action |
|-------------|--------|
| Edit applied correctly | Proceed to Step 3 |
| Edit failed or incorrect | Re-apply with corrected old_string/new_string; if Edit fails 2+ times, use Bash sed as fallback |
| Fix requires more than planned | Document the additional change in fix_plan with justification |
---
### Step 3: Add Regression Test
Create or modify a test that proves the fix:
1. Find the appropriate test file for the affected module:
- Use Glob for `**/*.test.{ts,js,py}`, `**/__tests__/**/*.{ts,js}`, or `**/test_*.py`
- Match the test file to the affected source module
2. Add a regression test with these requirements:
**Regression test requirements**:
| Requirement | Details |
|-------------|---------|
| Test name references the bug | Name clearly describes the bug scenario (e.g., "should handle null display_name without error") |
| Tests exact code path | Exercises the specific path identified in root cause |
| Deterministic | No timing dependencies, no external services |
| Correct placement | In the appropriate test file for the affected module |
| Proves the fix | Must fail when fix is reverted, pass when fix is applied |
**Decision table**:
| Condition | Action |
|-----------|--------|
| Existing test file found for module | Add test to that file |
| No existing test file found | Create new test file following project conventions |
| Multiple candidate test files | Choose the one most directly testing the affected module |
---
### Step 4: Verify Fix Against Reproduction
Re-run the original reproduction case from Phase 1:
- If Phase 1 used a failing test: run that same test now
- If Phase 1 used a failing command: run that same command now
- If Phase 1 used static analysis: run the regression test as verification
Record verification result:
```
fix_applied = {
description: "<what was fixed>",
files_changed: ["<path/to/file.ts>"],
lines_changed: <count>,
regression_test: {
file: "<path/to/test.ts>",
test_name: "<test name>",
status: "added|modified"
},
reproduction_verified: true|false
}
```
**Decision table**:
| Verification result | Action |
|--------------------|--------|
| Reproduction case now passes | Set reproduction_verified: true, proceed to Step 5 |
| Reproduction case still fails | Analyze why fix is insufficient, adjust fix, re-run |
| Cannot verify (setup required) | Document as concern, set reproduction_verified: false, proceed |
---
### Step 5: Assemble Phase 4 Output
Add `fix_applied` to investigation-report in memory. Output Phase 4 summary and await assign_task for Phase 5.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Modified source files | File edits | Minimal fix applied to affected code |
| Regression test | File add/edit | Test covering the exact bug scenario |
| investigation-report (phase 4) | In-memory JSON | Phases 1-3 fields + fix_applied section |
| Phase 4 summary | Structured text output | Fix description, test added, verification result |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Iron Law gate passed | confirmed_root_cause present before any code change |
| Fix is minimal | fix_plan.total_files_changed <= 3 OR justification documented |
| Regression test added | fix_applied.regression_test populated |
| Original reproduction passes | fix_applied.reproduction_verified: true |
| No unrelated code changes | Only confirmed_root_cause.affected_code locations modified |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Iron Law gate fails | Output BLOCKED, halt, do not modify any files |
| Edit tool fails twice | Try Bash sed/awk as fallback; if still failing, use Write to recreate file |
| Fix does not resolve reproduction | Analyze remaining failure, adjust fix within Phase 4 |
| Fix requires changing >3 files | Document justification in fix_plan.description, then proceed |
| No test file found for module | Create new test file following nearest similar test file pattern |
| Regression test is non-deterministic | Refactor test to remove timing/external dependencies |
## Next Phase
-> [Phase 5: Verification & Report](05-verification-report.md)

# Phase 5: Verification & Report
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Run full test suite, check for regressions, and generate the structured debug report.
## Objective
- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to `.workflow/.debug/` directory
## Input
| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-4) | Yes | All phases populated: evidence, root cause, fix_applied |
| assign_task message | Yes | Phase 5 instruction |
## Execution Steps
### Step 1: Detect and Run Full Test Suite
Detect the project's test framework by checking for project files, then run the full suite:
| Detection file | Test command |
|---------------|-------------|
| `package.json` with `test` script | `npm test` |
| `pytest.ini` or `pyproject.toml` | `pytest` |
| `go.mod` | `go test ./...` |
| `Cargo.toml` | `cargo test` |
| `Makefile` with `test` target | `make test` |
| None detected | Try `npm test`, `pytest`, `go test ./...` sequentially |
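The detection table above is a priority-ordered lookup. A sketch of that mapping; the `hasTestScript` / `hasMakeTestTarget` parameters are illustrative stand-ins for inspecting `package.json` and the `Makefile`, which file presence alone cannot decide:

```javascript
// Map detected project files to a test command, in the priority order above.
// `files` is the list of filenames present at the project root.
function detectTestCommand(files, hasTestScript = false, hasMakeTestTarget = false) {
  const has = (f) => files.includes(f);
  if (has("package.json") && hasTestScript) return "npm test";
  if (has("pytest.ini") || has("pyproject.toml")) return "pytest";
  if (has("go.mod")) return "go test ./...";
  if (has("Cargo.toml")) return "cargo test";
  if (has("Makefile") && hasMakeTestTarget) return "make test";
  return null; // none detected: try npm test, pytest, go test ./... sequentially
}
```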
```
Bash: mkdir -p .workflow/.debug
Bash: <detected test command>
```
Record test results:
```
test_results = {
total: <count>,
passed: <count>,
failed: <count>,
skipped: <count>,
regression_test_passed: true|false,
new_failures: []
}
```
---
### Step 2: Regression Check
Verify specifically:
1. The new regression test passes (check by test name from fix_applied.regression_test.test_name).
2. All tests that were passing before the fix still pass.
3. No new warnings or errors appeared in test output.
**Decision table for new failures**:
| New failure | Assessment | Action |
|-------------|-----------|--------|
| Related to fix (same module, same code path) | Fix introduced regression | Return to Phase 4 to adjust fix |
| Unrelated to fix (different module, pre-existing) | Pre-existing failure | Document in pre_existing_failures, proceed |
| Regression test itself fails | Fix is not working correctly | Return to Phase 4 |
Classify failures:
```
regression_check_result = {
passed: true|false,
total_tests: <count>,
new_failures: ["<test names that newly fail>"],
pre_existing_failures: ["<tests that were already failing>"]
}
```
---
### Step 3: Generate Structured Debug Report
Compile all investigation data into the final debug report JSON following the schema from `~/.codex/skills/investigate/specs/debug-report-format.md`:
```
debug_report = {
"bug_description": "<concise one-sentence description of the bug>",
"reproduction_steps": [
"<step 1>",
"<step 2>",
"<step 3: observe error>"
],
"root_cause": "<confirmed root cause description with technical detail and file:line reference>",
"evidence_chain": [
"Phase 1: <error message X observed in module Y>",
"Phase 2: <pattern analysis found N similar occurrences>",
"Phase 3: hypothesis H<N> confirmed — <specific condition at file:line>"
],
"fix_description": "<what was changed and why>",
"files_changed": [
{
"path": "<src/module/file.ts>",
"change_type": "add|modify|remove",
"description": "<brief description of changes to this file>"
}
],
"tests_added": [
{
"file": "<src/module/__tests__/file.test.ts>",
"test_name": "<should handle null return from X>",
"type": "regression|unit|integration"
}
],
"regression_check_result": {
"passed": true|false,
"total_tests": <count>,
"new_failures": [],
"pre_existing_failures": []
},
"completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
"concerns": [],
"timestamp": "<ISO-8601 timestamp>",
"investigation_duration_phases": 5
}
```
**Field sources**:
| Field | Source Phase | Description |
|-------|-------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phase 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | Always 5 for complete investigation |
---
### Step 4: Write Report File
Compute the filename:
- `<slug>` = bug_description lowercased, non-alphanumeric characters replaced with `-`, truncated to 40 chars
- `<date>` = current date as YYYY-MM-DD
```
Bash: mkdir -p .workflow/.debug
Write: .workflow/.debug/debug-report-<date>-<slug>.json
Content: <debug_report JSON with 2-space indent>
```
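The slug and date computation above can be sketched in shell. The `bug_description` value here is only an example; the sed/tr pipeline implements the stated rules (lowercase, non-alphanumeric runs to `-`, truncate to 40 chars).

```shell
# Example input; in practice this comes from the debug report.
bug_description="Null pointer when parsing empty config!"

# Lowercase, collapse non-alphanumerics to '-', trim edge dashes, cap at 40 chars.
slug=$(printf '%s' "$bug_description" \
  | tr '[:upper:]' '[:lower:]' \
  | sed -E 's/[^a-z0-9]+/-/g; s/^-+|-+$//g' \
  | cut -c1-40)
date=$(date +%Y-%m-%d)
echo ".workflow/.debug/debug-report-${date}-${slug}.json"
```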
---
### Step 5: Output Completion Status
Determine status and output completion block:
**Status determination**:
| Condition | Status |
|-----------|--------|
| Regression test passes, no new failures, all quality checks met | DONE |
| Fix applied but partial test coverage, minor warnings, or non-critical concerns | DONE_WITH_CONCERNS |
| New test failures introduced by fix (unresolvable), or critical concern | BLOCKED |
**DONE output**:
```
## STATUS: DONE
**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>
### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause.description>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>
### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```
**DONE_WITH_CONCERNS output**:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Fixed <bug_description> with concerns
### Details
- Phases completed: 5/5
- Concerns:
1. <concern> — Impact: low|medium — Suggested fix: <action>
### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.debug/debug-report-<date>-<slug>.json` | JSON file | Full structured investigation report |
| Completion status block | Structured text output | Final status per Completion Status Protocol |
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Full test suite executed | Test command ran and produced output |
| Regression test passes | test_results.regression_test_passed: true |
| No new failures introduced | regression_check_result.new_failures is empty (or documented as pre-existing) |
| Debug report written | File exists at `.workflow/.debug/debug-report-<date>-<slug>.json` |
| Completion status output | Status block follows protocol format |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Test framework not detected | Try common commands in order; document uncertainty in concerns |
| New failures related to fix | Return to Phase 4 to adjust; do not write report until resolved |
| New failures unrelated | Document as pre_existing_failures, set DONE_WITH_CONCERNS if impactful |
| Report directory not writable | Try alternate path `.workflow/debug/`; document in output |
| Test suite takes >5 minutes | Run regression test only; note full suite skipped in concerns |
| Regression test was not added in Phase 4 | Document as DONE_WITH_CONCERNS concern |


@@ -0,0 +1,341 @@
# Security Auditor Agent
Executes all 4 phases of the security audit: supply chain scan, OWASP Top 10 review, STRIDE threat modeling, and scored report generation. Driven by orchestrator via assign_task through each phase.
## Identity
- **Type**: `analysis`
- **Role File**: `~/.codex/agents/security-auditor.md`
- **task_name**: `security-auditor`
- **Responsibility**: Read-only analysis (Phases 1-3) + Write (Phase 4 report output)
- **fork_context**: false
- **Reasoning Effort**: high
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Produce structured JSON output for every phase
- Include file:line references in all code-level findings
- Enforce scoring gates: quick-scan >= 8/10; comprehensive initial >= 2/10
- Deduplicate findings that appear in multiple phases (keep highest severity, merge evidence)
- Write phase output files to `.workflow/.security/` before reporting completion
### MUST NOT
- Skip phases in comprehensive mode — all 4 phases must complete in sequence
- Proceed to next phase before writing current phase output file
- Include sensitive discovered values (actual secrets, credentials) in JSON evidence fields — redact with `[REDACTED]`
- Apply suppression (`@ts-ignore`, empty catch) — report findings as-is
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | execution | Run dependency audits, grep patterns, file discovery, directory setup |
| `Read` | read | Load phase files, specs, previous audit reports |
| `Write` | write | Output JSON phase results to `.workflow/.security/` |
| `Glob` | read | Discover source files by pattern for scoping |
| `Grep` | read | Pattern-based security scanning across source files |
| `spawn_agent` | agent | Spawn inline subagent for OWASP CLI analysis (Phase 2) |
| `wait_agent` | agent | Await inline subagent result |
| `close_agent` | agent | Close inline subagent after result received |
### Tool Usage Patterns
**Setup Pattern**: Ensure work directory exists before any phase output.
```
Bash("mkdir -p .workflow/.security")
```
**Read Pattern**: Load phase spec before executing.
```
Read("~/.codex/skills/security-audit/phases/01-supply-chain-scan.md")
Read("~/.codex/skills/security-audit/specs/scoring-gates.md")
```
**Write Pattern**: Output structured JSON after each phase.
```
Write(".workflow/.security/supply-chain-report.json", <json_content>)
```
---
## Execution
### Phase 1: Supply Chain Scan
**Objective**: Detect vulnerable dependencies, hardcoded secrets, CI/CD injection risks, and LLM prompt injection vectors.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` |
| Project root | Yes | Working directory with source files |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` for full execution instructions.
2. Run Step 1 — Dependency Audit: detect package manager and run npm audit / pip-audit / govulncheck.
3. Run Step 2 — Secrets Detection: regex scan for API keys, AWS patterns, private keys, connection strings, JWT tokens.
4. Run Step 3 — CI/CD Config Review: scan `.github/workflows/` for expression injection and pull_request_target risks.
5. Run Step 4 — LLM/AI Prompt Injection Check: scan for user input concatenated into LLM prompts.
6. Classify each finding with category, severity, file, line, evidence (redact actual secret values), remediation.
7. Write output file.
**Decision Table — Dependency Audit**:
| Condition | Action |
|-----------|--------|
| npm / yarn lock file found | Run `npm audit --json` |
| requirements.txt / pyproject.toml found | Run `pip-audit --format json`; fallback to `safety check --json` |
| go.sum found | Run `govulncheck ./...` |
| No lock files found | Log INFO finding: "No lock files detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed"; continue |
**Decision Table — Secrets Detection**:
| Pattern Match | Severity | Category |
|---------------|----------|----------|
| API key / secret / token with 16+ char value | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| `-----BEGIN PRIVATE KEY-----` | Critical | secret |
| DB connection string with password | Critical | secret |
| Hardcoded JWT token | High | secret |
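The pattern matches above can be probed with grep along these lines. The regexes here are illustrative approximations, not the canonical detection rules; real scans should use the phase spec's patterns.

```shell
# Illustrative secrets probes; each pattern is an approximation of a table row.
scan_secrets() {
  dir="$1"
  grep -rnE 'AKIA[0-9A-Z]{16}' "$dir" 2>/dev/null || true                      # AWS access key
  grep -rn -- '-----BEGIN .*PRIVATE KEY-----' "$dir" 2>/dev/null || true       # private key block
  grep -rnE '(api[_-]?key|secret|token)[^A-Za-z0-9]{1,4}[A-Za-z0-9_-]{16,}' \
    "$dir" 2>/dev/null || true                                                 # generic 16+ char value
  return 0
}
```

Remember the MUST NOT rule above: matched values go into findings as `[REDACTED]`, never verbatim.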
**Output**: `.workflow/.security/supply-chain-report.json` — schema per phase spec.
---
### Phase 2: OWASP Review
**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/02-owasp-review.md` |
| OWASP checklist | Yes | `~/.codex/skills/security-audit/specs/owasp-checklist.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/02-owasp-review.md` for full execution instructions.
2. Read `~/.codex/skills/security-audit/specs/owasp-checklist.md` for detection patterns.
3. Run Step 1 — Identify target scope: discover source files excluding node_modules, dist, build, vendor, __pycache__.
4. Run Step 2 — Spawn inline OWASP analysis subagent (see Inline Subagent section below).
5. Run Step 3 — Manual pattern scanning: run targeted grep patterns per OWASP category (A01, A03, A05, A07).
6. Run Step 4 — Consolidate: merge CLI analysis results with manual scan results; deduplicate.
7. Set coverage field for each category: `checked` or `not_applicable`.
8. Write output file.
**Decision Table — Scope**:
| Condition | Action |
|-----------|--------|
| Source files found | Proceed with full scan |
| No source files detected | Report as BLOCKED with scope note |
| Files > 500 | Prioritize: routes/, auth/, api/, handlers/ first |
**Output**: `.workflow/.security/owasp-findings.json` — schema per phase spec.
---
## Inline Subagent: OWASP CLI Analysis (Phase 2, Step 2)
**When**: After identifying target scope in Phase 2, Step 2.
**Agent File**: `~/.codex/agents/cli-explore-agent.md`
```
spawn_agent({
task_name: "inline-owasp-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: OWASP Top 10 2021 security analysis of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
Scope: @src/**/* @**/*.config.* @**/*.env.example
Expected: JSON findings per OWASP category with severity, file:line, evidence, remediation.
Constraints: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities, not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```
**Result Handling**:
| Result | Action |
|--------|--------|
| Success | Integrate findings into owasp-findings.json consolidation step |
| Timeout / Error | Continue with manual pattern scan results only; log warning |
---
### Phase 3: Threat Modeling
**Objective**: Apply STRIDE framework to architecture components; identify trust boundaries and attack surface.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/03-threat-modeling.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/03-threat-modeling.md` for full execution instructions.
2. Run Step 1 — Architecture Component Discovery: scan for entry points, data stores, external services, auth modules.
3. Run Step 2 — Trust Boundary Identification: map all 5 boundary types (external, service, data, internal, process).
4. Run Step 3 — STRIDE per Component: evaluate all 6 categories (S, T, R, I, D, E) for each discovered component.
5. Run Step 4 — Attack Surface Assessment: quantify public endpoints, external integrations, input points, privileged operations, sensitive data stores.
6. Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays.
7. Write output file.
**STRIDE Evaluation Decision Table**:
| Component Type | Priority STRIDE Categories |
|----------------|---------------------------|
| api_endpoint | S (spoofing), T (tampering), D (denial-of-service), E (elevation) |
| auth_module | S (spoofing), R (repudiation), E (elevation) |
| data_store | T (tampering), I (information disclosure), R (repudiation) |
| external_service | T (tampering), I (information disclosure), D (denial-of-service) |
| worker | T (tampering), D (denial-of-service) |
**Output**: `.workflow/.security/threat-model.json` — schema per phase spec.
---
### Phase 4: Report & Tracking
**Objective**: Aggregate all findings, calculate score, compare trends, write dated report.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/04-report-tracking.md` |
| Scoring gates | Yes | `~/.codex/skills/security-audit/specs/scoring-gates.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |
| Threat model | Yes | `.workflow/.security/threat-model.json` |
| Previous audits | No | `.workflow/.security/audit-report-*.json` (for trend) |
**Steps**:
1. Read `~/.codex/skills/security-audit/phases/04-report-tracking.md` for full execution instructions.
2. Aggregate all findings from phases 1-3 (supply-chain + owasp + STRIDE gaps).
3. Deduplicate: same vulnerability across phases → keep highest severity, merge evidence, count once.
4. Count files scanned (from phase outputs).
5. Calculate score per formula: `base_score(10.0) - (weighted_sum / max(10, files_scanned))`.
6. Find previous audit: `ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -1`.
7. Compute trend direction and score_delta.
8. Evaluate gate (initial vs. subsequent logic).
9. Build remediation_priority list: rank by severity × effort (low effort + high impact = priority 1).
10. Write dated report.
11. Copy phase outputs to `.workflow/.security/` as latest copies.
**Score Calculation**:
| Severity | Weight |
|----------|--------|
| critical | 10 |
| high | 7 |
| medium | 4 |
| low | 1 |
Formula: `final_score = max(0, round(10.0 - (weighted_sum / max(10, files_scanned)), 1))`
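A worked instance of the formula, using awk for the floating-point math (the counts are an invented example):

```shell
# Example: 2 critical (2*10) + 2 high (2*7) = 34 weighted, 120 files scanned.
weighted_sum=34
files_scanned=120
awk -v ws="$weighted_sum" -v fs="$files_scanned" 'BEGIN {
  denom = (fs > 10) ? fs : 10        # max(10, files_scanned)
  score = 10.0 - ws / denom
  if (score < 0) score = 0           # clamp at 0 per the formula
  printf "%.1f\n", score             # -> 9.7 for this example
}'
```

The `max(10, ...)` floor keeps a handful of findings in a tiny project from zeroing the score.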
**Score Interpretation Table**:
| Score Range | Rating | Meaning |
|-------------|--------|---------|
| 9.0-10.0 | Excellent | Minimal risk, production-ready |
| 7.0-8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0-6.9 | Fair | Notable risks, remediation recommended |
| 3.0-4.9 | Poor | Significant risks, remediation required |
| 0.0-2.9 | Critical | Severe vulnerabilities, immediate action needed |
**Gate Evaluation**:
| Condition | Gate Result | Status |
|-----------|------------|--------|
| No previous audit AND score >= 2.0 | PASS | Baseline established |
| No previous audit AND score < 2.0 | FAIL | DONE_WITH_CONCERNS |
| Previous audit AND score >= previous_score | PASS | No regression |
| Previous audit AND previous_score - 0.5 <= score < previous_score | WARN | DONE_WITH_CONCERNS |
| Previous audit AND score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS |
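The gate rows above can be sketched as a small helper; an empty second argument stands in for "no previous audit" (initial run):

```shell
# Sketch of the gate evaluation; prev empty means initial audit (baseline gate).
evaluate_gate() {
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (p == "")          { r = (s >= 2.0) ? "PASS" : "FAIL" }
    else if (s >= p)      { r = "PASS" }
    else if (s >= p - 0.5){ r = "WARN" }
    else                  { r = "FAIL" }
    print r
  }'
}
```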
**Trend Direction**:
| Condition | direction field |
|-----------|----------------|
| No previous audit | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |
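A minimal sketch of the delta classification; the `baseline` row is decided earlier by the absence of a previous report, so only the numeric cases appear here:

```shell
# Classify score_delta into improving / stable / regressing per the table.
trend_direction() {
  awk -v d="$1" 'BEGIN {
    if (d > 0.5)       print "improving"
    else if (d < -0.5) print "regressing"
    else               print "stable"
  }'
}
```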
**Output**: `.workflow/.security/audit-report-<YYYY-MM-DD>.json` — full schema per phase spec.
---
## Structured Output Template
```
## Summary
- One-sentence completion status with phase completed and finding count
## Score (Phase 4 / quick-scan)
- Score: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Trend: <improving|stable|regressing|baseline> (delta: <+/-N.N>)
## Findings
- Critical: <N> | High: <N> | Medium: <N> | Low: <N>
## Phase Outputs Written
- .workflow/.security/supply-chain-report.json
- .workflow/.security/owasp-findings.json (if Phase 2 completed)
- .workflow/.security/threat-model.json (if Phase 3 completed)
- .workflow/.security/audit-report-<date>.json (if Phase 4 completed)
## Top Risks
1. [severity] <title> — <file>:<line> — <remediation summary>
2. [severity] <title> — <file>:<line> — <remediation summary>
## Open Questions
1. <Any scope ambiguity or blocked items>
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase spec file not found | Read from fallback path; report in Open Questions if unavailable |
| Dependency audit tool missing | Log as INFO finding (category: dependency), continue with other steps |
| No source files found | Report as BLOCKED with path; request scope clarification |
| Inline subagent timeout (Phase 2) | Continue with manual grep results only; note in findings summary |
| Phase output file write failure | Retry once; if still failing report as BLOCKED |
| Previous audit parse error | Treat as baseline (no prior data); note in trend section |
| Timeout approaching mid-phase | Output partial results with "PARTIAL" status, write what is available |


@@ -0,0 +1,384 @@
---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
agents: security-auditor
phases: 4
---
# Security Audit
4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.
## Architecture
```
+----------------------------------------------------------------------+
| security-audit Orchestrator |
| -> Mode selection: quick-scan (Phase 1 only) vs comprehensive |
+-----------------------------------+----------------------------------+
|
+---------------------+---------------------+
| |
[quick-scan mode] [comprehensive mode]
| |
+---------v---------+ +------------v-----------+
| Phase 1 | | Phase 1 |
| Supply Chain Scan | | Supply Chain Scan |
| -> supply-chain- | | -> supply-chain- |
| report.json | | report.json |
+---------+---------+ +------------+-----------+
| |
[score gate] +-----------v-----------+
score >= 8/10 | Phase 2 |
| | OWASP Review |
[DONE or | -> owasp-findings. |
DONE_WITH_CONCERNS] | json |
+-----------+-----------+
|
+-----------v-----------+
| Phase 3 |
| Threat Modeling |
| (STRIDE) |
| -> threat-model.json |
+-----------+-----------+
|
+-----------v-----------+
| Phase 4 |
| Report & Tracking |
| -> audit-report- |
| {date}.json |
+-----------------------+
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|-------------|
| security-auditor | security-auditor | ~/.codex/agents/security-auditor.md | Execute all 4 phases: dependency audit, OWASP review, STRIDE modeling, report generation | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|-------------|-----------|-----------|
| security-auditor | security-auditor | false | — | Starts fresh; all context provided via assign_task phase messages |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|-------------|--------|
| security-auditor spawn | false | Self-contained pipeline; phase inputs passed via assign_task |
---
## Subagent Registry
Utility subagents spawned by `security-auditor` (not by the orchestrator):
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-owasp-analysis | ~/.codex/agents/cli-explore-agent.md | security-auditor (Phase 2) | OWASP Top 10 2021 code-level analysis | haiku |
> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.
---
## Mode Selection
Determine mode from user request before spawning any agent.
| User Intent | Mode | Phases to Execute | Gate |
|-------------|------|-------------------|------|
| "quick scan", "daily check", "fast audit" | quick-scan | Phase 1 only | score >= 8/10 |
| "full audit", "comprehensive", "security audit", "cso" | comprehensive | Phases 1 → 2 → 3 → 4 | no regression (initial: >= 2/10) |
| Ambiguous | Prompt user: "Quick-scan (Phase 1 only) or comprehensive (all 4 phases)?" | — | — |
---
## Phase Execution
### Phase 1: Supply Chain Scan
**Objective**: Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM integrations.
**Input**:
| Source | Description |
|--------|-------------|
| Working directory | Project source to be scanned |
| Mode | quick-scan or comprehensive |
**Execution**:
Spawn the security-auditor agent and assign Phase 1:
```
spawn_agent({
task_name: "security-auditor",
fork_context: false,
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/skills/security-audit/agents/security-auditor.md
## TASK: Phase 1 — Supply Chain Scan
Mode: <quick-scan|comprehensive>
Work directory: .workflow/.security
Execute Phase 1 per: ~/.codex/skills/security-audit/phases/01-supply-chain-scan.md
Deliverables:
- .workflow/.security/supply-chain-report.json
- Structured output summary with finding counts by severity`
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```
**On timeout**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: "Finalize current supply chain scan and output supply-chain-report.json now." }]
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 120000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Dependency, secrets, CI/CD, and LLM findings |
---
### Quick-Scan Gate (quick-scan mode only)
After Phase 1 completes, evaluate score and close agent.
| Condition | Action |
|-----------|--------|
| score >= 8.0 | Status: DONE. No blocking issues. |
| 6.0 <= score < 8.0 | Status: DONE_WITH_CONCERNS. Log warning — review before deploy. |
| score < 6.0 | Status: DONE_WITH_CONCERNS. Block deployment. Remediate critical/high findings. |
```
close_agent({ target: "security-auditor" })
```
> **If quick-scan mode**: Stop here. Output final summary with score and findings count.
---
### Phase 2: OWASP Review (comprehensive mode only)
**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings for context |
| Source files | All .ts/.js/.py/.go/.java excluding node_modules, dist, build |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 2 — OWASP Review
Execute Phase 2 per: ~/.codex/skills/security-audit/phases/02-owasp-review.md
Context: supply-chain-report.json already written to .workflow/.security/
Reference: ~/.codex/skills/security-audit/specs/owasp-checklist.md
Deliverables:
- .workflow/.security/owasp-findings.json
- Coverage for all 10 OWASP categories (A01-A10)` }]
})
const phase2Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/owasp-findings.json` | OWASP findings with owasp_id, severity, file:line, evidence, remediation |
---
### Phase 3: Threat Modeling (comprehensive mode only)
**Objective**: Apply STRIDE threat model to architecture components; assess attack surface.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Phase 2 findings |
| Source files | Route handlers, data stores, auth modules, external service clients |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 3 — Threat Modeling (STRIDE)
Execute Phase 3 per: ~/.codex/skills/security-audit/phases/03-threat-modeling.md
Context: supply-chain-report.json and owasp-findings.json available in .workflow/.security/
Cross-reference Phase 1 and Phase 2 findings when mapping STRIDE categories.
Deliverables:
- .workflow/.security/threat-model.json
- All 6 STRIDE categories (S, T, R, I, D, E) evaluated per component
- Trust boundaries and attack surface quantified` }]
})
const phase3Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/threat-model.json` | STRIDE threat model with components, trust boundaries, attack surface |
---
### Phase 4: Report & Tracking (comprehensive mode only)
**Objective**: Calculate score, compare with previous audits, generate date-stamped report.
**Input**:
| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 output |
| `.workflow/.security/owasp-findings.json` | Phase 2 output |
| `.workflow/.security/threat-model.json` | Phase 3 output |
| `.workflow/.security/audit-report-*.json` | Previous audit reports (optional, for trend) |
**Execution**:
```
assign_task({
target: "security-auditor",
items: [{ type: "text", text: `## Phase 4 — Report & Tracking
Execute Phase 4 per: ~/.codex/skills/security-audit/phases/04-report-tracking.md
Scoring reference: ~/.codex/skills/security-audit/specs/scoring-gates.md
Steps:
1. Aggregate all findings from phases 1-3
2. Calculate score using formula: base 10.0 - (weighted_sum / normalization)
3. Check for previous audit: ls -t .workflow/.security/audit-report-*.json | head -1
4. Compute trend (improving/stable/regressing/baseline)
5. Evaluate gate (initial >= 2/10; subsequent >= previous_score)
6. Write .workflow/.security/audit-report-<YYYY-MM-DD>.json
Deliverables:
- .workflow/.security/audit-report-<YYYY-MM-DD>.json
- Updated copies of all phase outputs in .workflow/.security/` }]
})
const phase4Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```
**Output**:
| Artifact | Description |
|----------|-------------|
| `.workflow/.security/audit-report-<date>.json` | Full scored report with trend, top risks, remediation priority |
---
### Comprehensive Gate (comprehensive mode only)
After Phase 4 completes, evaluate gate and close agent.
| Audit Type | Condition | Result | Action |
|------------|-----------|--------|--------|
| Initial (no prior audit) | score >= 2.0 | PASS | DONE. Baseline established. Plan remediation. |
| Initial | score < 2.0 | FAIL | DONE_WITH_CONCERNS. Critical exposure. Immediate triage required. |
| Subsequent | score >= previous_score | PASS | DONE. No regression. |
| Subsequent | previous_score - 0.5 <= score < previous_score | WARN | DONE_WITH_CONCERNS. Marginal change. Review new findings. |
| Subsequent | score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS. Regression detected. Investigate new findings. |
```
close_agent({ target: "security-auditor" })
```
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Supply Chain | 300000 ms (5 min) | assign_task "Finalize output now", re-wait 120s |
| Phase 2: OWASP Review | 360000 ms (6 min) | assign_task "Output partial findings", re-wait 120s |
| Phase 3: Threat Modeling | 360000 ms (6 min) | assign_task "Output partial threat model", re-wait 120s |
| Phase 4: Report | 300000 ms (5 min) | assign_task "Write report with available data", re-wait 120s |
### Cleanup Protocol
Agent is closed after the final executed phase (Phase 1 for quick-scan, Phase 4 for comprehensive).
```
close_agent({ target: "security-auditor" })
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output now" + re-wait 120000 ms |
| Agent timeout (second) | Log error, close_agent({ target: "security-auditor" }), report partial results |
| Phase output file missing | assign_task requesting specific file output, re-wait |
| Audit tool not installed (npm/pip) | Phase 1 logs as INFO finding and continues — not a blocker |
| No previous audit found | Treat as baseline — apply initial gate (>= 2/10) |
| User cancellation | close_agent({ target: "security-auditor" }), report current state |
---
## Output Format
```
## Summary
- One-sentence completion status with mode and final score
## Score
- Overall: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Mode: quick-scan|comprehensive
## Findings
- Critical: <N>
- High: <N>
- Medium: <N>
- Low: <N>
## Artifacts
- File: .workflow/.security/supply-chain-report.json
- File: .workflow/.security/owasp-findings.json (comprehensive only)
- File: .workflow/.security/threat-model.json (comprehensive only)
- File: .workflow/.security/audit-report-<date>.json (comprehensive only)
## Top Risks
1. <Most critical finding with file:line and remediation>
2. <Second finding>
## Next Steps
1. Remediate critical findings (effort: <low|medium|high>)
2. Re-run audit to verify fixes
```


@@ -0,0 +1,226 @@
# Phase 1: Supply Chain Scan
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Detect low-hanging security risks in third-party dependencies, hardcoded secrets, CI/CD pipelines, and LLM/AI integrations.
## Objective
- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Project root | Yes | Working directory containing source files and dependency manifests |
| WORK_DIR | Yes | `.workflow/.security` — output directory (create if not exists) |
## Execution Steps
### Step 1: Dependency Audit
Detect package manager and run appropriate audit tool.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| `package-lock.json` or `yarn.lock` present | Run `npm audit --json` |
| `requirements.txt` or `pyproject.toml` present | Run `pip-audit --format json`; fallback `safety check --json` |
| `go.sum` present | Run `govulncheck ./...` |
| No manifest files found | Log INFO finding: "No dependency manifests detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed — manual review needed"; continue |
**Execution**:
```bash
# Ensure output directory exists
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"
# Node.js projects (keep stderr out of the JSON output file)
if [ -f package-lock.json ] || [ -f yarn.lock ]; then
  npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>/dev/null || true
fi
# Python projects: pip-audit first, safety check only as fallback
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
  pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" \
    || safety check --json > "${WORK_DIR}/safety-raw.json" 2>/dev/null \
    || true
fi
# Go projects
if [ -f go.sum ]; then
govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```
---
### Step 2: Secrets Detection
Scan source files for hardcoded secrets using regex patterns. Exclude generated, compiled, and dependency directories.
**Decision Table**:
| Match Type | Severity | Category |
|------------|----------|----------|
| API key / token with 16+ chars | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| Private key PEM block | Critical | secret |
| DB connection string with embedded password | Critical | secret |
| Hardcoded JWT token | High | secret |
| No matches | — | No finding |
**Execution**:
```bash
# Shared exclusions (mirrors the exclude list below); expanded unquoted on purpose
EXCLUDES="--exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist --exclude-dir=build --exclude-dir=__pycache__ --exclude=*.lock --exclude=*.min.js"
# High-confidence patterns (case-insensitive)
grep -rniE $EXCLUDES \
  '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["\x27][A-Za-z0-9+/=_-]{16,}' \
  --include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
  --include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
  --include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
  . || true
# AWS patterns
grep -rniE $EXCLUDES '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true
# Private keys (-e keeps the leading dashes from being parsed as options)
grep -rniE $EXCLUDES -e '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true
# Connection strings with passwords
grep -rniE $EXCLUDES '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true
# JWT tokens (hardcoded)
grep -rniE $EXCLUDES 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```
Exclude from scan: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.
Redact actual matched secret values in findings — use `[REDACTED]` in evidence field.
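The redaction requirement can be sketched as a small filter. The pattern is a simplified version of the API-key regex above (double-quoted values only, `/` omitted from the value class to keep the sed expression simple):

```shell
# Minimal redaction sketch: replace the quoted secret value with [REDACTED]
# before the match is stored as evidence. Pattern is illustrative, not exhaustive.
redact() {
  sed -E 's/((api[_-]?key|secret[_-]?key|access[_-]?token)[[:space:]]*[:=][[:space:]]*)"[A-Za-z0-9+=_-]{16,}"/\1"[REDACTED]"/g'
}
echo 'config.ts:12: api_key = "sk_live_abcdef0123456789"' | redact
# prints: config.ts:12: api_key = "[REDACTED]"
```

Apply the filter to grep output before writing findings, so raw credential values never reach the report.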
---
### Step 3: CI/CD Config Review
Check GitHub Actions and other CI/CD configurations for injection risks.
**Decision Table**:
| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| `${{ github.event.` in `run:` block | High | Expression injection in workflow run step |
| `pull_request_target` with checkout of PR code | High | Privileged workflow triggered by untrusted code |
| `actions/checkout@v1` or `@v2` | Medium | Deprecated action version with known issues |
| `secrets.` passed to untrusted context | High | Secret exposure risk |
| No `.github/workflows/` directory | — | Not applicable; skip |
**Execution**:
```bash
# Find workflow files
find .github/workflows -name '*.yml' -o -name '*.yaml' 2>/dev/null
# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true
# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true
# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true
# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```
---
### Step 4: LLM/AI Prompt Injection Check
Scan for patterns indicating prompt injection risk in LLM integrations.
**Decision Table**:
| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| User input directly concatenated into prompt/system_message | High | LLM prompt injection vector |
| User input in template string passed to LLM call | High | LLM prompt injection via template |
| f-string with user data in `.complete`/`.generate` call | High | Python LLM prompt injection |
| LLM API call detected, no injection pattern | Low | LLM integration present — review for sanitization |
**Execution**:
```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
--include='*.py' . || true
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | JSON | All supply chain findings with severity classifications |
```json
{
"phase": "supply-chain-scan",
"timestamp": "ISO-8601",
"findings": [
{
"category": "dependency|secret|cicd|llm",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "matched text or context",
"remediation": "How to fix"
}
],
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
}
}
```
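A minimal structural check against this schema can be sketched as follows (grep-based and intentionally loose; a JSON-aware tool such as jq would be stricter):

```shell
# Sketch: verify a report file mentions the required top-level keys.
report_has_keys() {  # usage: report_has_keys <file>
  for key in '"phase"' '"findings"' '"summary"' '"by_severity"'; do
    grep -q "$key" "$1" || { echo "missing ${key}"; return 1; }
  done
  echo "required keys present"
}
# Illustrative input
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{ "phase": "supply-chain-scan", "findings": [],
  "summary": { "total": 0, "by_severity": {} } }
EOF
report_has_keys "$tmp"
# prints: required keys present
rm -f "$tmp"
```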
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| All 4 scan steps executed or explicitly skipped with reason | Review step execution log |
| `supply-chain-report.json` written to `.workflow/.security/` | File exists and is valid JSON |
| All findings have category, severity, file, evidence, remediation | JSON schema check |
| Secret values redacted in evidence field | No raw credential values in output |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Audit tool not installed | Log INFO finding; continue with remaining steps |
| `grep` finds no matches | No finding generated for that pattern; continue |
| `.github/workflows/` does not exist | Mark CI/CD step as not_applicable; continue |
| Write to WORK_DIR fails | Attempt `mkdir -p .workflow/.security` and retry once |
## Next Phase
-> [Phase 2: OWASP Review](02-owasp-review.md)


@@ -0,0 +1,232 @@
# Phase 2: OWASP Review
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Systematic code-level review against OWASP Top 10 2021 categories using inline subagent analysis and targeted pattern scanning.
## Objective
- Review codebase against all 10 OWASP Top 10 2021 categories
- Use inline subagent multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `~/.codex/skills/security-audit/specs/owasp-checklist.md` | Yes | Detection patterns per OWASP category |
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency context |
| Project source files | Yes | `.ts`, `.js`, `.py`, `.go`, `.java` excluding deps/build |
## Execution Steps
### Step 1: Identify Target Scope
Discover source files, excluding generated and dependency directories.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Source files found | Proceed to Step 2 |
| No source files found | Report as BLOCKED with path note; do not proceed |
| Files > 500 | Prioritize routes/, auth/, api/, handlers/ first |
**Execution**:
```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
| head -200
```
---
### Step 2: Inline Subagent OWASP Analysis
Spawn inline subagent using `cli-explore-agent` role to perform systematic OWASP analysis.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Subagent completes successfully | Integrate findings into Step 4 consolidation |
| Subagent times out | Continue with manual pattern scan (Step 3) only; log warning |
| Subagent errors | Continue with manual pattern scan only; log warning |
```
spawn_agent({
task_name: "inline-owasp-analysis",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF
TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.
MODE: analysis
CONTEXT: @src/**/* @**/*.config.* @**/*.env.example
EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.
CONSTRAINTS: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```
---
### Step 3: Manual Pattern Scanning
Supplement inline subagent analysis with targeted grep patterns per OWASP category. Reference `~/.codex/skills/security-audit/specs/owasp-checklist.md` for full pattern list.
**A01 — Broken Access Control**:
```bash
# Missing auth middleware on routes
grep -rnE 'app\.(get|post|put|delete|patch)\(' --exclude-dir=node_modules \
  --include='*.ts' --include='*.js' . \
  | grep -vE 'auth|middleware|protect' || true
# Direct object references without ownership check
grep -rnE 'params\.id|req\.params\.' --exclude-dir=node_modules \
  --include='*.ts' --include='*.js' . || true
```
**A03 — Injection**:
```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A05 — Security Misconfiguration**:
```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```
**A07 — Identification and Authentication Failures**:
```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["\x27][^"\x27]{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```
---
### Step 4: Consolidate Findings
Merge inline subagent results and manual pattern scan results. Deduplicate and classify by OWASP category.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Same finding in both sources | Keep highest severity; merge evidence; note both sources |
| Finding lacks file:line reference | Attempt to resolve via grep; if not resolvable, mark evidence as "pattern match — no line ref" |
| Category has no findings | Set coverage to `checked` with 0 findings |
| Category not applicable to project stack | Set coverage to `not_applicable` with reason |
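The highest-severity-wins rule can be sketched as follows, assuming findings have first been flattened to `file:line|severity|title` lines (a hypothetical intermediate format, not part of the JSON schema):

```shell
# Sketch: keep the highest-severity finding per file:line key.
dedup_findings() {
  awk -F'|' '
    BEGIN { rank["critical"]=4; rank["high"]=3; rank["medium"]=2; rank["low"]=1 }
    {
      key = $1
      # Store first occurrence, or replace when a higher severity arrives
      if (!(key in best) || rank[$2] > rank[sev[key]]) { best[key] = $0; sev[key] = $2 }
    }
    END { for (k in best) print best[k] }
  '
}
printf '%s\n' \
  'src/db.ts:12|high|SQL concat (manual scan)' \
  'src/db.ts:12|critical|SQL injection (subagent)' \
  'src/auth.ts:7|medium|verbose errors' | dedup_findings | sort
```

The trailing `sort` only makes the output order deterministic; merging evidence and source notes would happen when the JSON is assembled.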
---
## OWASP Top 10 2021 Coverage
| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/owasp-findings.json` | JSON | Findings per OWASP category with coverage map |
```json
{
"phase": "owasp-review",
"timestamp": "ISO-8601",
"owasp_version": "2021",
"findings": [
{
"owasp_id": "A01",
"owasp_category": "Broken Access Control",
"severity": "critical|high|medium|low",
"title": "Finding title",
"description": "Detailed description",
"file": "path/to/file",
"line": 42,
"evidence": "code snippet or pattern match",
"remediation": "Specific fix recommendation",
"cwe": "CWE-XXX"
}
],
"coverage": {
"A01": "checked|not_applicable",
"A02": "checked|not_applicable",
"A03": "checked|not_applicable",
"A04": "checked|not_applicable",
"A05": "checked|not_applicable",
"A06": "checked|not_applicable",
"A07": "checked|not_applicable",
"A08": "checked|not_applicable",
"A09": "checked|not_applicable",
"A10": "checked|not_applicable"
},
"summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"categories_checked": 10,
"categories_with_findings": 0
}
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| All 10 OWASP categories have coverage entry | JSON coverage map has all A01–A10 keys |
| All findings have owasp_id, severity, file, evidence, remediation | JSON schema check |
| `owasp-findings.json` written to `.workflow/.security/` | File exists and is valid JSON |
| Inline subagent result integrated (or skip logged) | Summary includes source note |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Inline subagent timeout | Continue with manual grep results; log "inline-owasp-analysis timed out" in summary |
| OWASP checklist spec not found | Use built-in patterns from this file; note missing spec |
| No source files in scope | Report BLOCKED with path; set all categories to not_applicable |
| Grep produces no matches for a category | Set that category coverage to `checked` with 0 findings |
## Next Phase
-> [Phase 3: Threat Modeling](03-threat-modeling.md)


@@ -0,0 +1,249 @@
# Phase 3: Threat Modeling
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.
## Objective
- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency/CI context |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings to cross-reference in STRIDE gaps |
| Project source files | Yes | Route handlers, data stores, external service clients, auth modules |
## Execution Steps
### Step 1: Architecture Component Discovery
Identify major system components by scanning project structure.
**Decision Table**:
| Component Pattern Found | component.type |
|------------------------|----------------|
| `app.get/post/put/delete/patch`, `router.`, `@app.route`, `@router.` | api_endpoint |
| `createConnection`, `mongoose.connect`, `sqlite`, `redis`, `S3`, `createClient` | data_store |
| `fetch`, `axios`, `http.request`, `requests.get/post`, `urllib` | external_service |
| `jwt`, `passport`, `session`, `oauth`, `bcrypt`, `argon2`, `crypto` | auth_module |
| `worker`, `subprocess`, `child_process`, `celery`, `queue` | worker |
**Execution**:
```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
--include='*.ts' --include='*.js' --include='*.py' . || true
```
---
### Step 2: Trust Boundary Identification
Map the 5 standard trust boundary types. For each boundary: document what data crosses it, how it is enforced, and what happens when enforcement fails.
**Trust Boundary Types**:
| Boundary | From | To | Key Data Crossing |
|----------|------|----|------------------|
| External boundary | User/browser | Application server | User input, credentials, session tokens |
| Service boundary | Application | External APIs/services | API keys, request bodies, response data |
| Data boundary | Application | Database/storage | Query parameters, credentials, PII |
| Internal boundary | Public routes | Authenticated/admin routes | Auth tokens, role claims |
| Process boundary | Main process | Worker/subprocess | Job parameters, environment variables |
For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails
---
### Step 3: STRIDE per Component
For each discovered component, evaluate all 6 STRIDE categories systematically.
**STRIDE Category Definitions**:
| Category | Threat | Key Question |
|----------|--------|-------------|
| S — Spoofing | Identity impersonation | Can an attacker pretend to be someone else? |
| T — Tampering | Data modification | Can data be modified in transit or at rest? |
| R — Repudiation | Deniable actions | Can a user deny performing an action? |
| I — Information Disclosure | Data leakage | Can sensitive data be exposed? |
| D — Denial of Service | Availability disruption | Can the system be made unavailable? |
| E — Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? |
**Spoofing Analysis Checks**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?
**Tampering Analysis Checks**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?
**Repudiation Analysis Checks**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?
**Information Disclosure Analysis Checks**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?
**Denial of Service Analysis Checks**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?
**Elevation of Privilege Analysis Checks**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?
**Component Exposure Rating**:
| Rating | Criteria |
|--------|----------|
| High | Public-facing, handles sensitive data, complex logic |
| Medium | Authenticated access, moderate data sensitivity |
| Low | Internal only, no sensitive data, simple operations |
---
### Step 4: Attack Surface Assessment
Quantify the attack surface across the entire system.
**Attack Surface Components**:
```
Attack Surface = Sum of:
- Number of public API endpoints
- Number of external service integrations
- Number of user-controllable input points
- Number of privileged operations
- Number of data stores with sensitive content
```
**Decision Table — Attack Surface Rating**:
| Total Score | Interpretation |
|-------------|---------------|
| 0–5 | Low attack surface |
| 6–15 | Moderate attack surface |
| 16–30 | High attack surface |
| > 30 | Very high attack surface — prioritize hardening |
Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays for each STRIDE category. A finding in Phase 2 (e.g., A03 injection) maps to STRIDE T (Tampering) for the relevant component.
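The sum and rating above can be sketched directly in shell. The component counts here are illustrative placeholders, not discovery results:

```shell
# Sketch: attack surface score and rating from component counts.
public_endpoints=8; external_integrations=3; input_points=6
privileged_operations=2; sensitive_data_stores=2
total=$(( public_endpoints + external_integrations + input_points + privileged_operations + sensitive_data_stores ))
# Thresholds from the rating decision table
if [ "$total" -le 5 ]; then rating="low"
elif [ "$total" -le 15 ]; then rating="moderate"
elif [ "$total" -le 30 ]; then rating="high"
else rating="very high"
fi
echo "attack_surface=${total} rating=${rating}"
# prints: attack_surface=21 rating=high
```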
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/threat-model.json` | JSON | STRIDE model with components, trust boundaries, attack surface |
```json
{
"phase": "threat-modeling",
"timestamp": "ISO-8601",
"framework": "STRIDE",
"components": [
{
"name": "Component name",
"type": "api_endpoint|data_store|external_service|auth_module|worker",
"files": ["path/to/file.ts"],
"exposure": "high|medium|low",
"trust_boundaries": ["external", "data"],
"threats": {
"spoofing": {
"applicable": true,
"findings": ["Description of threat"],
"mitigations": ["Existing mitigation"],
"gaps": ["Missing mitigation"]
},
"tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
"elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
}
}
],
"trust_boundaries": [
{
"name": "Boundary name",
"from": "Component A",
"to": "Component B",
"enforcement": "TLS|auth_middleware|API_key",
"data_crossing": ["request bodies", "credentials"],
"risk_level": "high|medium|low"
}
],
"attack_surface": {
"public_endpoints": 0,
"external_integrations": 0,
"input_points": 0,
"privileged_operations": 0,
"sensitive_data_stores": 0,
"total_score": 0
},
"summary": {
"components_analyzed": 0,
"threats_identified": 0,
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
"high_exposure_components": 0
}
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| At least one component analyzed | `components` array has at least 1 entry |
| All 6 STRIDE categories evaluated per component | Each component.threats has all 6 keys |
| Trust boundaries mapped | `trust_boundaries` array populated |
| Attack surface quantified | `attack_surface.total_score` calculated |
| `threat-model.json` written to `.workflow/.security/` | File exists and is valid JSON |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No components discovered via grep | Analyze project structure manually (README, package.json); note uncertainty |
| Phase 2 findings not available for cross-reference | Proceed with grep-only; note missing OWASP context |
| Ambiguous architecture (monolith vs microservices) | Document assumption in summary; note for user review |
| No `.github/workflows/` for CI boundary | Mark process boundary as not_applicable |
## Next Phase
-> [Phase 4: Report & Tracking](04-report-tracking.md)


@@ -0,0 +1,300 @@
# Phase 4: Report & Tracking
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Generate scored audit report, compare with previous audits, and track security trends.
## Objective
- Calculate security score from all phase findings
- Compare with previous audit results (if available)
- Generate date-stamped report in `.workflow/.security/`
- Track improvement or regression trends
## Input
| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings |
| `.workflow/.security/threat-model.json` | Yes | Phase 3 findings (STRIDE gaps) |
| `.workflow/.security/audit-report-*.json` | No | Previous audit reports for trend comparison |
| `~/.codex/skills/security-audit/specs/scoring-gates.md` | Yes | Scoring formula and gate thresholds |
## Execution Steps
### Step 1: Aggregate Findings
Collect all findings from phases 1–3 and classify by severity.
**Aggregation Formula**:
```
All findings =
supply-chain-report.findings
+ owasp-findings.findings
+ threat-model threats (where gaps array is non-empty)
```
**Deduplication Rule**:
| Condition | Action |
|-----------|--------|
| Same vulnerability appears in multiple phases | Keep highest-severity classification; merge evidence; count as single finding |
| Same file:line in different categories | Merge into one finding; note all phases that detected it |
| Unique finding per phase | Include as-is |
---
### Step 2: Calculate Score
Apply scoring formula from `~/.codex/skills/security-audit/specs/scoring-gates.md`.
**Scoring Formula**:
```
base_score = 10.0
severity_weight: Critical = 10, High = 7, Medium = 4, Low = 1
weighted_sum = SUM(severity_weight * count_per_severity)
normalization_factor = max(10, total_files_scanned)
final_score = max(0, base_score - weighted_sum / normalization_factor)
# Critical findings carry outsized weight by design: a handful can fail
# the gate even in a large codebase.
```
**Severity Weights**:
| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| Critical | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| High | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| Medium | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| Low | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |
**Score Interpretation**:
| Score | Rating | Meaning |
|-------|--------|---------|
| 9.0–10.0 | Excellent | Minimal risk, production-ready |
| 7.0–8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0–6.9 | Fair | Notable risks, remediation recommended |
| 3.0–4.9 | Poor | Significant risks, remediation required |
| 0.0–2.9 | Critical | Severe vulnerabilities, immediate action needed |
**Example Score Calculations**:
| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|----------|--------------|--------------|---------|-------|
| 1 critical | 50 | 10 | 0.2 | 9.8 |
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
| 20 critical | 20 | 200 | 10.0 | 0.0 |
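The formula and the example rows above can be sketched as a small helper (awk handles the floating-point arithmetic; the sample call is illustrative):

```shell
# Sketch: final score from severity counts and files scanned.
score() {  # usage: score <critical> <high> <medium> <low> <files_scanned>
  awk -v c="$1" -v h="$2" -v m="$3" -v l="$4" -v files="$5" 'BEGIN {
    weighted = c*10 + h*7 + m*4 + l*1       # severity weights from the table
    norm = (files > 10) ? files : 10        # normalization floor of 10
    s = 10.0 - weighted / norm
    if (s < 0) s = 0
    printf "%.1f\n", s
  }'
}
score 2 3 0 0 50   # 2 critical + 3 high over 50 files; prints 9.2
```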
---
### Step 3: Gate Evaluation
**Daily quick-scan gate** (Phase 1 only):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 8.0 | Continue. No blocking issues. |
| WARN | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| FAIL | score < 6.0 | Block deployment. Remediate critical/high findings. |
**Comprehensive audit gate** (all phases):
Initial/baseline audit (no previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 2.0 | Baseline established. Plan remediation. |
| FAIL | score < 2.0 | Critical exposure. Immediate triage required. |
Subsequent audits (previous audit exists):
| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= previous_score | No regression. Continue improvement. |
| WARN | previous_score - 0.5 <= score < previous_score | Marginal regression. Review new findings. |
| FAIL | score < previous_score - 0.5 | Regression detected. Investigate new findings. |
Production readiness target: score >= 7.0
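Both gate tables can be sketched as helpers, with thresholds taken from the tables above (the sample calls are illustrative):

```shell
# Sketch: quick-scan gate (fixed thresholds).
quick_scan_gate() {  # usage: quick_scan_gate <score>
  awk -v s="$1" 'BEGIN {
    if (s >= 8.0)      print "PASS"
    else if (s >= 6.0) print "WARN"
    else               print "FAIL"
  }'
}
# Sketch: subsequent comprehensive-audit gate (relative to previous score).
regression_gate() {  # usage: regression_gate <score> <previous_score>
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (s >= p)            print "PASS"
    else if (s >= p - 0.5) print "WARN"
    else                   print "FAIL"
  }'
}
quick_scan_gate 7.5       # prints WARN
regression_gate 6.2 6.5   # prints WARN
```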
---
### Step 4: Trend Comparison
Find and compare with previous audit reports.
**Execution**:
```bash
# Find previous audit reports
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```
**Trend Direction Decision Table**:
| Condition | direction |
|-----------|-----------|
| No previous audit file found | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |
Compare current vs. previous:
- Delta per OWASP category (new findings vs. resolved findings)
- Delta per STRIDE category
- New findings vs. resolved findings (by title/file comparison)
- Overall score trend
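The delta-to-direction mapping from the decision table can be sketched as:

```shell
# Sketch: trend direction from current and previous scores.
trend_direction() {  # usage: trend_direction <current> <previous|none>
  if [ "$2" = "none" ]; then echo "baseline"; return; fi
  awk -v c="$1" -v p="$2" 'BEGIN {
    d = c - p
    if (d > 0.5)        print "improving"
    else if (d >= -0.5) print "stable"
    else                print "regressing"
  }'
}
trend_direction 7.5 6.8   # prints improving
```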
**Trend JSON Format**:
```json
{
"trend": {
"current_date": "2026-03-29",
"current_score": 7.5,
"previous_date": "2026-03-22",
"previous_score": 6.8,
"score_delta": 0.7,
"new_findings": 2,
"resolved_findings": 5,
"direction": "improving",
"history": [
{ "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
{ "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
{ "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
]
}
}
```
---
### Step 5: Generate Report
Assemble and write the final scored report.
**Execution**:
```bash
# Ensure output directory exists
WORK_DIR=".workflow/.security"
mkdir -p "${WORK_DIR}"
# Write report with date stamp; phase outputs already live in WORK_DIR
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" "${WORK_DIR}/audit-report-${DATE}.json"
```
Build `remediation_priority` list: rank by severity weight × inverse effort (low effort + high impact = priority 1).
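The ranking rule can be sketched as follows, assuming findings are flattened to `severity|effort|title` lines (a hypothetical intermediate format) and effort costs of low=1, medium=2, high=3 (an assumption, not from the spec):

```shell
# Sketch: priority score = severity_weight / effort_cost, highest first.
rank_remediation() {
  awk -F'|' '
    BEGIN { w["critical"]=10; w["high"]=7; w["medium"]=4; w["low"]=1
            e["low"]=1; e["medium"]=2; e["high"]=3 }
    { printf "%.2f|%s\n", w[$1] / e[$2], $0 }
  ' | sort -t'|' -k1,1nr | awk -F'|' '{ print NR ". " $4 " (" $2 "/" $3 ")" }'
}
printf '%s\n' \
  'high|low|Rotate leaked API key' \
  'critical|high|Rework auth bypass' \
  'medium|low|Add security headers' | rank_remediation
# prints:
# 1. Rotate leaked API key (high/low)
# 2. Add security headers (medium/low)
# 3. Rework auth bypass (critical/high)
```

Low-effort, high-severity items surface first, matching the priority-1 definition in the success criteria.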
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/audit-report-<YYYY-MM-DD>.json` | JSON | Full scored report with trend, top risks, remediation priority |
```json
{
"report": "security-audit",
"version": "1.0",
"timestamp": "ISO-8601",
"date": "YYYY-MM-DD",
"mode": "comprehensive|quick-scan",
"score": {
"overall": 7.5,
"rating": "Good",
"gate": "PASS|FAIL",
"gate_threshold": 8
},
"findings_summary": {
"total": 0,
"by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
"by_phase": {
"supply_chain": 0,
"owasp": 0,
"stride": 0
},
"by_owasp": {
"A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
"A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
},
"by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
},
"top_risks": [
{
"rank": 1,
"title": "Most critical finding",
"severity": "critical",
"source_phase": "owasp",
"remediation": "How to fix",
"effort": "low|medium|high"
}
],
"trend": {
"previous_date": "YYYY-MM-DD or null",
"previous_score": 0,
"score_delta": 0,
"new_findings": 0,
"resolved_findings": 0,
"direction": "improving|stable|regressing|baseline"
},
"phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
"files_scanned": 0,
"remediation_priority": [
{
"priority": 1,
"finding": "Finding title",
"effort": "low",
"impact": "high",
"recommendation": "Specific action"
}
]
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Score calculated using correct formula | Verify: overall = 10.0 - (weighted_sum / max(10, files_scanned)) |
| Gate evaluation matches mode and audit history | Check gate logic against previous audit presence |
| Trend direction computed correctly | Verify score_delta and direction mapping |
| `audit-report-<date>.json` written to `.workflow/.security/` | File exists, is valid JSON, contains all required fields |
| remediation_priority ranked by severity and effort | Priority 1 = highest severity + lowest effort |
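The score formula in the first criterion can be sketched as below; flooring the result at 0 is an assumption, since the formula as stated could go negative for very noisy scans:

```python
def audit_score(weighted_sum, files_scanned):
    """Base 10.0 minus weighted findings, normalized by file count (minimum factor 10)."""
    normalization = max(10, files_scanned)
    return max(0.0, 10.0 - weighted_sum / normalization)
```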
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase data file missing or corrupted | Report as BLOCKED; output partial report with available data |
| Previous audit parse error | Treat as baseline; note data integrity issue |
| files_scanned is zero | Use normalization_factor of 10 (minimum); continue |
| Date command unavailable | Use ISO timestamp substring for date portion |
| Write fails | Retry once with explicit `mkdir -p`; report BLOCKED if still failing |
## Completion Status
After report generation, output skill completion status:
| Status | Condition |
|--------|-----------|
| DONE | All phases completed, report generated, gate PASS |
| DONE_WITH_CONCERNS | Report generated but gate WARN or FAIL, or regression detected |
| BLOCKED | Phase data missing or corrupted, cannot calculate score |
@@ -0,0 +1,318 @@
# ship-operator Agent
Executes all 5 gated phases of the release pipeline sequentially, enforcing gate conditions before advancing.
## Identity
- **Type**: `pipeline-executor`
- **Role File**: `~/.codex/agents/ship-operator.md`
- **task_name**: `ship-operator`
- **Responsibility**: Code generation / Execution (write mode — git, file updates, push, PR)
- **fork_context**: false
## Boundaries
### MUST
- Load role definition via MANDATORY FIRST STEPS pattern
- Read the phase detail file at the start of each phase before executing any step
- Check gate condition after each phase and halt on failure
- Produce structured JSON output for each completed phase
- Confirm with user before proceeding on major version bumps or direct-to-main releases
- Include file:line references in any findings
### MUST NOT
- Skip the MANDATORY FIRST STEPS role loading
- Advance to the next phase if the current phase gate fails
- Push to remote if Phase 3 (version bump) gate failed
- Create a PR if Phase 4 (push) gate failed
- Produce unstructured output
- Modify files outside the release pipeline scope (version file, CHANGELOG.md, package-lock.json)
---
## Toolbox
### Available Tools
| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Execution | Run git, npm, pytest, gh, jq, sed commands |
| `Read` | File I/O | Read phase detail files, version files, CHANGELOG.md |
| `Write` | File I/O | Write/update CHANGELOG.md, VERSION file |
| `Edit` | File I/O | Update package.json, pyproject.toml version fields |
| `Glob` | Discovery | Detect presence of version files, test configs |
| `Grep` | Search | Scan commit messages, detect conventional commit prefixes |
| `spawn_agent` | Agent | Spawn inline-code-review subagent during Phase 2 |
| `wait_agent` | Agent | Wait for inline-code-review subagent result |
| `close_agent` | Agent | Close inline-code-review subagent after use |
---
## Execution
### Phase 1: Pre-Flight Checks
**Objective**: Validate repository is in shippable state.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/01-preflight-checks.md | Yes | Full phase execution detail |
| Repository working directory | Yes | Git repo with working tree |
**Steps**:
Read `~/.codex/skills/ship/phases/01-preflight-checks.md` first.
Then execute all four checks as specified in that file:
1. Git clean check — `git status --porcelain`
2. Branch validation — `git branch --show-current`
3. Test suite execution — detect and run npm test / pytest
4. Build verification — detect and run npm run build / python -m build / make build
**Decision Table**:
| Condition | Action |
|-----------|--------|
| All checks pass | Set gate = pass, output preflight JSON, await Phase 2 task |
| Any check fails | Set gate = fail, output BLOCKED with failure details, halt |
| Branch is main/master | Set gate = warn, ask user to confirm direct release |
| No tests detected | Set gate = warn (skip), continue to build check |
| No build step detected | Set gate = pass (info), continue |
**Output**: Structured preflight-report JSON (see phase file for schema).
---
### Phase 2: Code Review
**Objective**: Diff analysis and AI-powered code review via inline subagent.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/02-code-review.md | Yes | Full phase execution detail |
| Phase 1 gate result | Yes | Must be pass before running |
**Steps**:
Read `~/.codex/skills/ship/phases/02-code-review.md` first.
1. Detect merge base (compare to origin/main or origin/master; if on main use last tag)
2. Generate diff summary (`git diff --stat`, count files/lines)
3. Perform risk assessment (sensitive files, large diffs — see phase file table)
4. Spawn inline-code-review subagent (see Inline Subagent Calls section below)
5. Evaluate review results against gate condition
**Decision Table**:
| Condition | Action |
|-----------|--------|
| No critical issues | Set gate = pass, output review JSON |
| Critical issues found | Set gate = fail, output BLOCKED with issues list |
| Warnings only | Set gate = warn, proceed, flag DONE_WITH_CONCERNS |
| Subagent timeout or error | Log warning, ask user whether to proceed or retry |
**Output**: Structured code-review JSON (see phase file for schema).
---
### Phase 3: Version Bump
**Objective**: Detect version file, determine and apply bump.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/03-version-bump.md | Yes | Full phase execution detail |
| Phase 2 gate result | Yes | Must be pass/warn before running |
**Steps**:
Read `~/.codex/skills/ship/phases/03-version-bump.md` first.
1. Detect version file (package.json > pyproject.toml > VERSION)
2. Read current version
3. Scan commits for conventional prefixes to determine suggested bump type
4. For major bumps: ask user to confirm before proceeding
5. Calculate new version (semver)
6. Update version file using jq / sed / echo as appropriate
7. Verify update by re-reading
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Version file found and updated | Set gate = pass, output version record |
| No version file found | Set gate = needs_context, ask user, halt until answered |
| Version mismatch after update | Set gate = fail, output BLOCKED |
| User declines major bump | Set gate = blocked, halt |
| Bump type ambiguous | Default to patch, inform user |
**Output**: Structured version-bump JSON (see phase file for schema).
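The semver calculation in step 5 can be sketched as follows (plain `X.Y.Z` versions only; pre-release and build suffixes are out of scope for this sketch):

```python
def bump_version(current, bump_type):
    """Compute the next semantic version for a patch/minor/major bump."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump_type == "major":
        return f"{major + 1}.0.0"
    if bump_type == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # default: patch
```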
---
### Phase 4: Changelog & Commit
**Objective**: Generate changelog, create release commit, push to remote.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/04-changelog-commit.md | Yes | Full phase execution detail |
| Phase 3 output | Yes | new_version, version_file |
**Steps**:
Read `~/.codex/skills/ship/phases/04-changelog-commit.md` first.
1. Gather commits since last tag (`git log "$last_tag"..HEAD`)
2. Group by conventional commit prefix into changelog sections
3. Format markdown changelog entry (`## [X.Y.Z] - YYYY-MM-DD`)
4. Update or create CHANGELOG.md (insert new entry after main heading)
5. Stage changes (`git add -u`)
6. Create release commit (`chore: bump version to <new_version>`)
7. Push branch to remote
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Push succeeded | Set gate = pass, output commit record |
| Push rejected (non-fast-forward) | Set gate = fail, BLOCKED — suggest `git pull --rebase` |
| Permission denied | Set gate = fail, BLOCKED — advise check remote access |
| No remote configured | Set gate = fail, BLOCKED — suggest `git remote add` |
| No previous tag | Use last 50 commits for changelog |
**Output**: Structured changelog-commit JSON (see phase file for schema).
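The grouping in step 2 can be sketched as below; the section names are illustrative assumptions, not the headings mandated by the phase file:

```python
# Illustrative mapping from conventional commit prefixes to changelog headings.
SECTIONS = {"feat": "Added", "fix": "Fixed", "perf": "Performance", "docs": "Documentation"}

def group_commits(subjects):
    """Group commit subjects by conventional prefix; scoped prefixes like feat(api) included."""
    grouped = {}
    for subject in subjects:
        prefix = subject.split(":", 1)[0].split("(")[0] if ":" in subject else ""
        grouped.setdefault(SECTIONS.get(prefix, "Other"), []).append(subject)
    return grouped
```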
---
### Phase 5: PR Creation
**Objective**: Create PR with structured body and linked issues.
**Input**:
| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/05-pr-creation.md | Yes | Full phase execution detail |
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type |
| Phase 2 output | Yes | merge_base (for change summary) |
**Steps**:
Read `~/.codex/skills/ship/phases/05-pr-creation.md` first.
1. Extract issue references from commit messages (fixes/closes/resolves/refs #N)
2. Determine target branch (main, falling back to master)
3. Build PR title: `release: v<new_version>`
4. Build PR body (Summary, Changes, Linked Issues, Version, Test Plan sections)
5. Create PR via `gh pr create`
6. Capture PR URL from gh output
**Decision Table**:
| Condition | Action |
|-----------|--------|
| PR created, URL returned | Set gate = pass, output PR record, output DONE |
| Phase 2 had warnings only | Set gate = pass with concerns, output DONE_WITH_CONCERNS |
| gh CLI not available | Set gate = fail, BLOCKED — advise `gh auth login` |
| PR creation fails | Set gate = fail, BLOCKED — report error details |
**Output**: Structured PR creation JSON plus final completion status (see phase file for schema).
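The issue-reference extraction in step 1 can be sketched with a regular expression over the commit messages:

```python
import re

# Matches the fixes/closes/resolves/refs #N patterns named in step 1.
ISSUE_REF = re.compile(r"\b(?:fixes|closes|resolves|refs)\s+#(\d+)", re.IGNORECASE)

def linked_issues(commit_messages):
    """Collect unique issue numbers in first-seen order."""
    seen = []
    for message in commit_messages:
        for number in ISSUE_REF.findall(message):
            if number not in seen:
                seen.append(number)
    return seen
```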
---
## Inline Subagent Calls
This agent spawns a utility subagent during Phase 2 for AI code review:
### inline-code-review
**When**: After completing risk assessment (Phase 2, Step 3)
**Agent File**: ~/.codex/agents/cli-explore-agent.md
```
spawn_agent({
task_name: "inline-code-review",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)
Task:
- Review diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues
Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})
const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```
### Result Handling
| Result | Severity | Action |
|--------|----------|--------|
| recommendation: "ship", no critical issues | — | gate = pass, integrate findings |
| recommendation: "hold" or critical issues found | HIGH | gate = fail, BLOCKED — list issues |
| recommendation: "fix-first" | HIGH | gate = fail, BLOCKED — list issues with locations |
| Warnings only, recommendation: "ship" | MEDIUM | gate = warn, proceed with DONE_WITH_CONCERNS |
| Timeout or error | — | Log warning, ask user whether to proceed or retry |
---
## Structured Output Template
```
## Summary
- One-sentence phase completion status
## Phase Result
- Phase: <phase_name>
- Gate: pass | fail | warn | blocked | needs_context
- Status: PASS | BLOCKED | NEEDS_CONTEXT | DONE_WITH_CONCERNS | DONE
## Findings
- Finding 1: specific description with file:line reference (if applicable)
- Finding 2: specific description with file:line reference (if applicable)
## Artifacts
- File: path/to/modified/file
Change: specific modification made
## Open Questions
1. Question needing user answer (if gate = needs_context)
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Phase detail file not found | Report error, halt — phase files are required |
| Git command fails | Report stderr, set gate = fail, BLOCKED |
| Version file parse error | Report error, set gate = needs_context, ask user |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Build/test failure | Report output, set gate = fail, BLOCKED |
| Push rejected | Report rejection reason, set gate = fail, BLOCKED with suggestion |
| gh CLI missing | Report error, set gate = fail, BLOCKED with install advice |
| Three consecutive failures at same step | Stop, output diagnostic dump, halt |
@@ -0,0 +1,426 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
agents: ship-operator
phases: 5
---
# Ship
Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.
## Architecture
```
+--------------------------------------------------------------+
| ship Orchestrator |
| -> Single ship-operator agent driven through 5 gated phases |
+------------------------------+-------------------------------+
|
+-------------------+-------------------+
v v v
+------------+ +------------+ +------------+
| Phase 1 | --> | Phase 2 | --> | Phase 3 |
| Pre-Flight | | Code Review| | Version |
| Checks | | | | Bump |
+------------+ +------------+ +------------+
v v v
Gate: ALL Gate: No Gate: Version
4 checks critical updated OK
pass issues
|
+-------------------+-------------------+
v v
+------------+ +------------+
| Phase 4 | ----------------------> | Phase 5 |
| Changelog | | PR Creation|
| & Commit | | |
+------------+ +------------+
v v
Gate: Push Gate: PR
succeeded created
```
---
## Agent Registry
| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|--------------|
| ship-operator | ship-operator | ~/.codex/agents/ship-operator.md | Execute all 5 release phases sequentially, enforce gates | Deep Interaction (2.3) | false |
> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.
---
## Fork Context Strategy
| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|--------------|-----------|-----------|
| ship-operator | ship-operator | false | — | Starts fresh; all context provided in initial task message |
**Fork Decision Rules**:
| Condition | fork_context | Reason |
|-----------|--------------|--------|
| Pipeline stage with explicit input | false | Context in message, not history |
| Agent is isolated utility | false | Clean context, focused task |
| ship-operator | false | Self-contained release operator; no parent context needed |
---
## Subagent Registry
Utility subagents callable by ship-operator (not separate pipeline stages):
| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-code-review | ~/.codex/agents/cli-explore-agent.md | ship-operator | AI code review of diff during Phase 2 | haiku |
> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.
---
## Phase Execution
### Phase 1: Pre-Flight Checks
**Objective**: Validate that the repository is in a shippable state — confirm clean working tree, appropriate branch, passing tests, and successful build.
**Input**:
| Source | Description |
|--------|-------------|
| User trigger | "ship" / "release" / "publish" command |
| Repository | Current git working directory |
| Phase detail | ~/.codex/skills/ship/phases/01-preflight-checks.md |
**Execution**:
Spawn ship-operator with Phase 1 task. The operator reads the phase detail file then executes all four checks.
```
spawn_agent({
task_name: "ship-operator",
fork_context: false,
message: `## TASK ASSIGNMENT
### MANDATORY FIRST STEPS
1. Read role definition: ~/.codex/agents/ship-operator.md (MUST read first)
2. Read phase detail: ~/.codex/skills/ship/phases/01-preflight-checks.md
---
Goal: Execute Phase 1 Pre-Flight Checks for the release pipeline.
Execute all four checks (git clean, branch validation, test suite, build verification).
Output structured preflight-report JSON plus gate status.`
})
const phase1Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| All four checks pass (overall: "pass") | Fast-advance: assign Phase 2 task to ship-operator |
| Any check fails (overall: "fail") | BLOCKED — report failure details, halt pipeline |
| Branch is main/master (warn) | Ask user to confirm direct-to-main release before proceeding |
| Timeout | assign_task "Finalize current work and output results", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| preflight-report JSON | Pass/fail per check, blockers list |
| Gate status | pass / fail / blocked |
---
### Phase 2: Code Review
**Objective**: Detect merge base, generate diff, run AI-powered code review via inline subagent, assess risk, evaluate results.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 1 result | Gate passed (overall: "pass") |
| Repository | Git history, diff data |
| Phase detail | ~/.codex/skills/ship/phases/02-code-review.md |
**Execution**:
Phase 2 is assigned to the already-running ship-operator via assign_task.
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 2 TASK
Read phase detail: ~/.codex/skills/ship/phases/02-code-review.md
Execute Phase 2 Code Review:
1. Detect merge base
2. Generate diff summary
3. Perform risk assessment
4. Spawn inline-code-review subagent for AI analysis
5. Evaluate review results and report gate status` }]
})
const phase2Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 600000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| No critical issues (overall: "pass") | Fast-advance: assign Phase 3 task to ship-operator |
| Critical issues found (overall: "fail") | BLOCKED — report critical issues list, halt pipeline |
| Warnings only (overall: "warn") | Fast-advance to Phase 3, flag DONE_WITH_CONCERNS |
| Review subagent timeout/error | Ask user whether to proceed or retry; if proceed, flag warn |
| Timeout on phase2Result | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Review summary JSON | Risk level, risk factors, AI review recommendation, issues |
| Gate status | pass / fail / warn / blocked |
---
### Phase 3: Version Bump
**Objective**: Detect version file, determine bump type from commits or user input, calculate new version, update version file, verify update.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 2 result | Gate passed (no critical issues) |
| Repository | package.json / pyproject.toml / VERSION |
| Phase detail | ~/.codex/skills/ship/phases/03-version-bump.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 3 TASK
Read phase detail: ~/.codex/skills/ship/phases/03-version-bump.md
Execute Phase 3 Version Bump:
1. Detect version file (package.json > pyproject.toml > VERSION)
2. Determine bump type from commit messages (patch/minor/major)
3. For major bumps: ask user to confirm before proceeding
4. Calculate new version
5. Update version file
6. Verify update
Output version change record JSON plus gate status.` }]
})
const phase3Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| Version file updated and verified (overall: "pass") | Fast-advance: assign Phase 4 task to ship-operator |
| Version file not found | NEEDS_CONTEXT — ask user which file to use; halt until answered |
| Version mismatch after update (overall: "fail") | BLOCKED — report mismatch, halt pipeline |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Version change record JSON | version_file, previous_version, new_version, bump_type, bump_source |
| Gate status | pass / fail / needs_context / blocked |
---
### Phase 4: Changelog & Commit
**Objective**: Parse git log into grouped changelog entry, update CHANGELOG.md, create release commit, push branch to remote.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 3 result | new_version, version_file, bump_type |
| Repository | Git history since last tag |
| Phase detail | ~/.codex/skills/ship/phases/04-changelog-commit.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 4 TASK
Read phase detail: ~/.codex/skills/ship/phases/04-changelog-commit.md
New version: <new_version>
Version file: <version_file>
Execute Phase 4 Changelog & Commit:
1. Gather commits since last tag
2. Group by conventional commit type
3. Format changelog entry
4. Update or create CHANGELOG.md
5. Create release commit (chore: bump version to <new_version>)
6. Push branch to remote
Output commit record JSON plus gate status.` }]
})
const phase4Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| Push succeeded (overall: "pass") | Fast-advance: assign Phase 5 task to ship-operator |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise check remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| Commit record JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| Gate status | pass / fail / blocked |
---
### Phase 5: PR Creation
**Objective**: Extract issue references from commits, build PR title and body, create PR via `gh pr create`, capture PR URL.
**Input**:
| Source | Description |
|--------|-------------|
| Phase 4 result | commit_sha, new_version, previous_version, bump_type |
| Phase 2 result | merge_base (for change_summary) |
| Repository | Git history, remote |
| Phase detail | ~/.codex/skills/ship/phases/05-pr-creation.md |
**Execution**:
```
assign_task({
target: "ship-operator",
items: [{ type: "text", text: `## PHASE 5 TASK
Read phase detail: ~/.codex/skills/ship/phases/05-pr-creation.md
New version: <new_version>
Previous version: <previous_version>
Bump type: <bump_type>
Merge base: <merge_base>
Commit SHA: <commit_sha>
Execute Phase 5 PR Creation:
1. Extract issue references from commits
2. Determine target branch
3. Build PR title: "release: v<new_version>"
4. Build PR body with all sections
5. Create PR via gh pr create
6. Capture and report PR URL
Output PR creation record JSON plus final completion status.` }]
})
const phase5Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```
**Gate Decision**:
| Condition | Action |
|-----------|--------|
| PR created, URL returned (overall: "pass") | Pipeline complete — output DONE status |
| PR created with review warnings | Pipeline complete — output DONE_WITH_CONCERNS |
| gh CLI not available | BLOCKED — report error, advise `gh auth login` |
| PR creation fails | BLOCKED — report error details, halt |
| Timeout | assign_task "Finalize current work", re-wait 120s |
**Output**:
| Artifact | Description |
|----------|-------------|
| PR record JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | DONE / DONE_WITH_CONCERNS / BLOCKED |
---
## Lifecycle Management
### Timeout Protocol
| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Pre-Flight | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 2: Code Review | 600000 ms (10 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 3: Version Bump | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 4: Changelog & Commit | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 5: PR Creation | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
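Every phase shares the same timeout protocol, which can be sketched as below. The `timed_out` flag on the wait result is an assumption for illustration; the real `wait_agent` result shape may differ.

```python
def wait_with_retry(wait_agent, assign_task, timeout_ms):
    """On first timeout, nudge the operator to finalize, then re-wait 120 s."""
    result = wait_agent(targets=["ship-operator"], timeout_ms=timeout_ms)
    if result.get("timed_out"):
        assign_task(
            target="ship-operator",
            items=[{"type": "text", "text": "Finalize current work and output results"}],
        )
        result = wait_agent(targets=["ship-operator"], timeout_ms=120000)
    return result
```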
### Cleanup Protocol
After Phase 5 completes (or on any terminal BLOCKED halt), close ship-operator.
```
close_agent({ target: "ship-operator" })
```
### Agent Health Check
```
const remaining = list_agents({})
if (remaining.length > 0) {
remaining.forEach(agent => close_agent({ target: agent.id }))
}
```
---
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task with "Finalize current work and output results" + re-wait 120s |
| Agent timeout (second) | Log error, close_agent({ target: "ship-operator" }), report partial results |
| Gate fail — any phase | Log BLOCKED status with phase name and failure detail, close_agent, halt |
| NEEDS_CONTEXT | Pause pipeline, surface question to user, resume with assign_task on answer |
| send_message ignored | Escalate to assign_task |
| Inline subagent timeout | ship-operator handles internally; continue with warn if review failed |
| User cancellation | close_agent({ target: "ship-operator" }), report current pipeline state |
| Fork from closed agent | Not applicable (single agent, no forking) |
---
## Output Format
```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)
## Results
- Phase 1 Pre-Flight: pass/fail
- Phase 2 Code Review: pass/warn/fail
- Phase 3 Version Bump: <previous> -> <new> (<bump_type>)
- Phase 4 Changelog & Commit: commit <sha> pushed to <remote/branch>
- Phase 5 PR Creation: <pr_url>
## Artifacts
- CHANGELOG.md (updated)
- <version_file> (version bumped to <new_version>)
- Release commit: <sha>
- PR: <pr_url>
## Next Steps (Optional)
1. Review and merge the PR
2. Tag the release after merge
```
@@ -0,0 +1,198 @@
# Phase 1: Pre-Flight Checks
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Validate that the repository is in a shippable state before proceeding with the release pipeline.
## Objective
- Confirm working tree is clean (no uncommitted changes)
- Validate current branch is appropriate for release
- Run test suite and confirm all tests pass
- Verify build succeeds
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Repository working directory | Yes | Git repo with working tree |
| package.json / pyproject.toml / Makefile | No | Used for test and build detection |
## Execution Steps
### Step 1: Git Clean Check
Run `git status --porcelain` and evaluate output.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Output is empty | PASS — working tree is clean |
| Output is non-empty | FAIL — working tree is dirty; report dirty files, suggest `git stash` or `git commit` |
```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
echo "FAIL: Working tree is dirty"
echo "$git_status"
# Gate: BLOCKED — commit or stash changes first
else
echo "PASS: Working tree is clean"
fi
```
**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.
---
### Step 2: Branch Validation
Run `git branch --show-current` and evaluate.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Branch is not main or master | PASS — proceed |
| Branch is main or master | WARN — ask user to confirm direct-to-main/master release before proceeding |
| User confirms direct release | PASS with warning noted |
| User declines | BLOCKED — halt pipeline |
```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
echo "WARN: Currently on $current_branch — direct push to main/master is risky"
# Ask user for confirmation before proceeding
else
echo "PASS: On branch $current_branch"
fi
```
**Pass condition**: Not on main/master, OR user explicitly confirms direct-to-main release.
**On warning**: Ask user to confirm they intend to release from main/master directly.
---
### Step 3: Test Suite Execution
Detect project type and run appropriate test command.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json with "test" script exists | Run `npm test` |
| pytest available and tests/ or test/ directory exists | Run `pytest` |
| pyproject.toml with pytest listed exists | Run `pytest` |
| No test suite detected | WARN and continue (skip check) |
| Test command exits code 0 | PASS |
| Test command exits non-zero | FAIL — report test failures, halt pipeline |
```bash
# Detection priority:
# 1. package.json with "test" script → npm test
# 2. pytest available and tests exist → pytest
# 3. No tests found → WARN and continue
if [ -f "package.json" ] && grep -q '"test"' package.json; then
npm test
elif command -v pytest &>/dev/null && { [ -d "tests" ] || [ -d "test" ]; }; then
pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
pytest
else
echo "WARN: No test suite detected — skipping test check"
fi
```
**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.
---
### Step 4: Build Verification
Detect project build step and run it.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json with "build" script exists | Run `npm run build` |
| pyproject.toml exists and python build module available | Run `python -m build` |
| Makefile with build target exists | Run `make build` |
| No build step detected | INFO — skip (not all projects need a build), PASS |
| Build command exits code 0 | PASS |
| Build command exits non-zero | FAIL — report build errors, halt pipeline |
```bash
# Detection priority:
# 1. package.json with "build" script → npm run build
# 2. pyproject.toml → python -m build (if build module available)
# 3. Makefile with build target → make build
# 4. No build step → PASS (not all projects need a build)
if [ -f "package.json" ] && grep -q '"build"' package.json; then
npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
make build
else
echo "INFO: No build step detected — skipping build check"
fi
```
**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| preflight-report | JSON | Pass/fail per check, current branch, blockers list |
```json
{
"phase": "preflight",
"timestamp": "ISO-8601",
"checks": {
"git_clean": { "status": "pass|fail", "details": "" },
"branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
"tests": { "status": "pass|fail|skip", "details": "" },
"build": { "status": "pass|fail|skip", "details": "" }
},
"overall": "pass|fail",
"blockers": []
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Git working tree is clean | `git status --porcelain` returns empty |
| Branch is non-main or user confirmed | Branch check + optional user confirmation |
| Tests pass or skipped with warning | Test command exit code 0, or skip with WARN |
| Build passes or skipped with info | Build command exit code 0, or skip with INFO |
| Overall gate is "pass" | All checks produce pass/warn/skip (no fail) |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| Dirty working tree | BLOCKED — list dirty files, suggest `git stash` or `git commit`, halt |
| Tests fail | BLOCKED — report test output, halt pipeline |
| Build fails | BLOCKED — report build output, halt pipeline |
| git command not found | BLOCKED — report environment error |
| No version file or project type detected | WARN — continue, version detection deferred to Phase 3 |
## Next Phase
-> [Phase 2: Code Review](02-code-review.md)
If any check fails (overall: "fail"), report BLOCKED status with the preflight report. Do not proceed.

# Phase 2: Code Review
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Automated AI-powered code review of changes since the base branch, with risk assessment.
## Objective
- Detect the merge base between current branch and target branch
- Generate diff for review
- Assess high-risk indicators before AI review
- Run AI-powered code review via inline subagent
- Flag high-risk changes (large diffs, sensitive files, breaking changes)
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 1 gate result | Yes | overall: "pass" — must have passed |
| Repository git history | Yes | Commit log, diff data |
## Execution Steps
### Step 1: Detect Merge Base
Determine the target branch and find the common ancestor commit.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| origin/main exists | Use main as target branch |
| origin/main not found | Fall back to master as target branch |
| Current branch is main or master | Use last tag as merge base |
| Current branch is main/master and no tags exist | Use initial commit as merge base |
| Current branch is feature branch | Use `git merge-base origin/<target> HEAD` |
```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"
# If on main/master directly, compare against last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
merge_base="$last_tag"
echo "On main — using last tag as base: $last_tag"
else
# Use first commit if no tags exist
merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
echo "No tags found — using initial commit as base"
fi
fi
```
---
### Step 2: Generate Diff Summary
Collect statistics and full diff content.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Diff command succeeds | Record files_changed, lines_added, lines_removed |
| No changes found | WARN — nothing to review; ask user whether to proceed |
```bash
# File-level summary
git diff --stat "$merge_base"...HEAD
# Full diff for review
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt
# Count changes for risk assessment
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
```
---
### Step 3: Risk Assessment
Flag high-risk indicators before AI review.
**Risk Factor Table**:
| Risk Factor | Threshold | Risk Level |
|-------------|-----------|------------|
| Files changed | > 50 | High |
| Lines changed | > 1000 | High |
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
| Migration files | `*migration*`, `*migrate*` | Medium |
```bash
# Check for sensitive file changes
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential|auth' || true)
if [ -n "$sensitive_files" ]; then
echo "HIGH RISK: Sensitive files modified:"
echo "$sensitive_files"
fi
```
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Sensitive files detected | Set risk_level = high, add to risk_factors |
| files_changed > 50 | Set risk_level = high, add to risk_factors |
| lines changed > 1000 | Set risk_level = high, add to risk_factors |
| Config or migration files detected | Set risk_level = medium (if not already high) |
| No risk factors | Set risk_level = low |
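The aggregation rules above can be sketched as a small shell helper; the function and variable names here are illustrative, not part of the skill:

```shell
# Hypothetical helper mirroring the decision table above.
# Inputs: files_changed and lines_changed counts, plus non-empty strings
# when sensitive or config/migration files were detected.
classify_risk() {
  local files_changed=$1 lines_changed=$2 sensitive=$3 config_or_migration=$4
  if [ -n "$sensitive" ] || [ "$files_changed" -gt 50 ] || [ "$lines_changed" -gt 1000 ]; then
    echo "high"
  elif [ -n "$config_or_migration" ]; then
    echo "medium"
  else
    echo "low"
  fi
}
```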
---
### Step 4: AI Code Review via Inline Subagent
Spawn the inline-code-review subagent for AI analysis. This inline subagent replaces the ccw CLI call used in the original Claude skill:
```
spawn_agent({
task_name: "inline-code-review",
fork_context: false,
model: "haiku",
reasoning_effort: "medium",
message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md
Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)
Task:
- Review diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues
Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})
const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```
**Note**: Wait for the subagent to complete before proceeding. Do not advance to Step 5 while review is running.
---
### Step 5: Evaluate Review Results
Based on the inline subagent output, apply gate logic.
**Review Result Decision Table**:
| Review Result | Action |
|---------------|--------|
| recommendation: "ship", no critical issues | Gate = pass — proceed to Phase 3 |
| recommendation: "hold" or critical issues present | Gate = fail — report BLOCKED, list issues |
| recommendation: "fix-first" | Gate = fail — report BLOCKED, list issues with file:line |
| Warnings only, recommendation: "ship" | Gate = warn — proceed with DONE_WITH_CONCERNS note |
| Review subagent failed or timed out | Ask user whether to proceed or retry |
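As a sketch, the gate logic above might look like the following; the names are illustrative, and the actual evaluation is performed by the agent on the subagent's structured output:

```shell
# Hypothetical gate evaluation over the subagent's review result.
evaluate_gate() {
  local recommendation=$1 critical_count=$2 warning_count=$3
  if [ "$recommendation" = "ship" ] && [ "$critical_count" -eq 0 ]; then
    if [ "$warning_count" -gt 0 ]; then
      echo "warn"   # proceed with a DONE_WITH_CONCERNS note
    else
      echo "pass"
    fi
  else
    echo "fail"     # hold, fix-first, or critical issues: report BLOCKED
  fi
}
```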
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Review summary | JSON | Risk level, risk factors, AI review recommendation, critical issues, warnings |
```json
{
"phase": "code-review",
"merge_base": "commit-sha",
"stats": {
"files_changed": 0,
"lines_added": 0,
"lines_removed": 0
},
"risk_level": "low|medium|high",
"risk_factors": [],
"ai_review": {
"recommendation": "ship|hold|fix-first",
"critical_issues": [],
"warnings": []
},
"overall": "pass|fail|warn"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Merge base detected | merge_base SHA present in output |
| Diff statistics collected | files_changed, lines_added, lines_removed populated |
| Risk assessment completed | risk_level set (low/medium/high), risk_factors populated |
| AI review completed | ai_review.recommendation present |
| Gate condition evaluated | overall set to pass/fail/warn |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| origin/main and origin/master both missing | Use HEAD~1 as merge base, warn user |
| No commits in diff | WARN — nothing to review; ask user whether to proceed |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Inline subagent error | Log error, ask user whether to proceed |
| Critical issues found | BLOCKED — report full issues list with severity and file:line |
## Next Phase
-> [Phase 3: Version Bump](03-version-bump.md)
If review passes (overall: "pass" or "warn"), proceed to Phase 3.
If critical issues found (overall: "fail"), report BLOCKED status with review summary. Do not proceed.

# Phase 3: Version Bump
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Detect the current version, determine the bump type, and update the version file.
## Objective
- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 2 gate result | Yes | overall: "pass" or "warn" — must have passed |
| package.json / pyproject.toml / VERSION | Conditional | One must exist; used for version detection |
| Git history | Yes | Commit messages for bump type auto-detection |
## Execution Steps
### Step 1: Detect Version File
Search for version file in priority order.
**Version File Detection Priority Table**:
| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |
**Decision Table**:
| Condition | Action |
|-----------|--------|
| package.json found | Set version_file = package.json, read version with node/jq |
| pyproject.toml found (no package.json) | Set version_file = pyproject.toml, read with grep -oP |
| VERSION found (no others) | Set version_file = VERSION, read with cat |
| No version file found | NEEDS_CONTEXT — ask user which file to use or create |
```bash
if [ -f "package.json" ]; then
version_file="package.json"
current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
version_file="pyproject.toml"
current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
version_file="VERSION"
current_version=$(cat VERSION | tr -d '[:space:]')
else
echo "NEEDS_CONTEXT: No version file found"
echo "Expected one of: package.json, pyproject.toml, VERSION"
# Ask user which file to use or create
fi
echo "Version file: $version_file"
echo "Current version: $current_version"
```
---
### Step 2: Determine Bump Type
Auto-detect from commit messages, then confirm with user for major bumps.
**Bump Type Auto-Detection from Conventional Commits**:
```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --oneline)
else
commits=$(git log --oneline -20)
fi
# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)
if [ -n "$has_breaking" ]; then
suggested_bump="major"
elif [ -n "$has_feat" ]; then
suggested_bump="minor"
else
suggested_bump="patch"
fi
echo "Suggested bump: $suggested_bump"
```
**User Confirmation Decision Table**:
| Bump Type | Action |
|-----------|--------|
| patch | Proceed with suggested bump, inform user |
| minor | Proceed with suggested bump, inform user |
| major | Always ask user to confirm before proceeding |
| User overrides suggestion | Use user-specified bump type |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
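One way to express the confirmation rules is the sketch below; `major_confirmed` stands in for the interactive confirmation, which the agent actually performs via a user prompt:

```shell
# Hypothetical resolution of the final bump type.
resolve_bump() {
  local suggested=$1 user_override=$2 major_confirmed=$3
  local bump=${user_override:-$suggested}
  if [ "$bump" = "major" ] && [ "$major_confirmed" != "yes" ]; then
    echo "BLOCKED"   # user must confirm, or re-trigger with an explicit type
  else
    echo "$bump"
  fi
}
```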
---
### Step 3: Calculate New Version
Apply semver arithmetic to derive new version.
**Decision Table**:
| Bump Type | Calculation |
|-----------|-------------|
| major | `(major+1).0.0` |
| minor | `major.(minor+1).0` |
| patch | `major.minor.(patch+1)` |
```bash
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"
case "$bump_type" in
major)
new_version="$((major + 1)).0.0"
;;
minor)
new_version="${major}.$((minor + 1)).0"
;;
patch)
new_version="${major}.${minor}.$((patch + 1))"
;;
esac
echo "Version bump: $current_version -> $new_version"
```
---
### Step 4: Update Version File
Write new version to the appropriate file using the correct method for each format.
**Decision Table**:
| Version File | Update Method |
|--------------|---------------|
| package.json | `jq --arg v "<new_version>" '.version = $v'` + update package-lock.json if present |
| pyproject.toml | `sed -i "s/^version\s*=\s*\".*\"/version = \"<new_version>\"/"` |
| VERSION | `echo "<new_version>" > VERSION` |
```bash
case "$version_file" in
package.json)
# Use node/jq for safe JSON update
jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
# Also update package-lock.json if it exists
if [ -f "package-lock.json" ]; then
jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
fi
;;
pyproject.toml)
# Use sed for TOML update (version line in [project] or [tool.poetry])
sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
;;
VERSION)
echo "$new_version" > VERSION
;;
esac
echo "Updated $version_file: $current_version -> $new_version"
```
---
### Step 5: Verify Update
Re-read version file to confirm the update was applied correctly.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Re-read version equals new_version | PASS — gate satisfied |
| Re-read version does not match | FAIL — report mismatch, BLOCKED |
```bash
# Re-read to confirm
case "$version_file" in
package.json)
verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
;;
pyproject.toml)
verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
;;
VERSION)
verified=$(cat VERSION | tr -d '[:space:]')
;;
esac
if [ "$verified" = "$new_version" ]; then
echo "PASS: Version verified as $new_version"
else
echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Version change record | JSON | version_file, previous_version, new_version, bump_type, bump_source |
```json
{
"phase": "version-bump",
"version_file": "package.json",
"previous_version": "1.2.3",
"new_version": "1.3.0",
"bump_type": "minor",
"bump_source": "auto-detected|user-specified",
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Version file detected | version_file field populated |
| Current version read | current_version field populated |
| Bump type determined | bump_type set to patch/minor/major |
| Version file updated | Write/edit operation succeeded |
| Update verified | Re-read matches new_version |
| overall = "pass" | All steps completed without error |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No version file found | NEEDS_CONTEXT — ask user which file to create or use |
| Version parse error (malformed semver) | NEEDS_CONTEXT — report current value, ask user for correction |
| jq not available | Fall back to node for package.json; report error for others |
| sed fails on pyproject.toml | Try Write tool to rewrite the file; report on failure |
| User declines major bump | BLOCKED — halt, user must re-trigger with explicit bump type |
| Version mismatch after update | BLOCKED — report expected vs actual, suggest manual fix |
## Next Phase
-> [Phase 4: Changelog & Commit](04-changelog-commit.md)
If version updated successfully (overall: "pass"), proceed to Phase 4.
If update fails or context needed, report BLOCKED / NEEDS_CONTEXT. Do not proceed.

# Phase 4: Changelog & Commit
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Generate changelog entry from git history, update CHANGELOG.md, create release commit, and push to remote.
## Objective
- Parse git log since last tag into grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with version in the message
- Push the branch to remote
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 3 output | Yes | new_version, version_file, bump_type |
| Git history | Yes | Commits since last tag |
| CHANGELOG.md | No | Updated in-place if it exists; created if not |
## Execution Steps
### Step 1: Gather Commits Since Last Tag
Retrieve commits to include in the changelog.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Last tag exists | `git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges` |
| No previous tag found | Use last 50 commits: `git log --pretty=format:"%h %s" --no-merges -50` |
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
echo "Generating changelog since tag: $last_tag"
git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
echo "No previous tag found — using last 50 commits"
git log --pretty=format:"%h %s" --no-merges -50
fi
```
---
### Step 2: Group Commits by Conventional Commit Type
Parse commit messages and group into changelog sections.
**Conventional Commit Grouping Table**:
| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):` | Features | **Features** |
| `fix:` / `fix(*):` | Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |
```bash
# Example grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since last tag
# 2. Parse prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```
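A minimal grep-based sketch of the grouping pass; the sample commits below are illustrative:

```shell
# Group sample commit subjects by conventional-commit prefix.
commits='abc1234 feat(auth): add login flow
def5678 fix: handle null session
aaa1111 test: add session cases
bbb2222 chore: bump deps'

features=$(echo "$commits"    | grep -E '^[a-f0-9]+ feat(\([^)]*\))?:'  || true)
fixes=$(echo "$commits"       | grep -E '^[a-f0-9]+ fix(\([^)]*\))?:'   || true)
maintenance=$(echo "$commits" | grep -E '^[a-f0-9]+ chore(\([^)]*\))?:' || true)
# test: commits are intentionally not collected (omitted from the changelog)
```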
---
### Step 3: Format Changelog Entry
Generate a markdown changelog entry using ISO 8601 date format.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Category has commits | Include section with all entries |
| Category is empty | Omit section entirely |
| test: commits present | Omit from changelog output |
Changelog entry format:
```markdown
## [X.Y.Z] - YYYY-MM-DD
### Features
- feat: description (sha)
- feat(scope): description (sha)
### Bug Fixes
- fix: description (sha)
### Performance
- perf: description (sha)
### Other Changes
- chore: description (sha)
```
Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category
---
### Step 4: Update CHANGELOG.md
Write the new entry into CHANGELOG.md.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| CHANGELOG.md exists | Insert new entry after the first heading line (`# Changelog`), before previous version entry |
| CHANGELOG.md does not exist | Create new file with `# Changelog` heading followed by new entry |
```bash
if [ -f "CHANGELOG.md" ]; then
# Insert new entry after the first heading line (# Changelog)
# The new entry goes between the main heading and the previous version entry
# Use Write tool to insert the new section at the correct position
echo "Updating existing CHANGELOG.md"
else
# Create new CHANGELOG.md with header
echo "Creating new CHANGELOG.md"
fi
```
**CHANGELOG.md structure**:
```markdown
# Changelog
## [X.Y.Z] - YYYY-MM-DD
(new entry here)
## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```
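An awk-based sketch of the insertion; the agent may equally use the Write tool, and the file handling here is illustrative:

```shell
# Insert a new changelog entry directly after the "# Changelog" heading.
insert_entry() {
  local file=$1 entry=$2
  awk -v entry="$entry" '
    { print }
    /^# Changelog/ && !done { print ""; print entry; done=1 }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```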
---
### Step 5: Create Release Commit
Stage changed files and create conventionally-formatted release commit.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Version file is package.json | Stage package.json and package-lock.json (if present) |
| Version file is pyproject.toml | Stage pyproject.toml |
| Version file is VERSION | Stage VERSION |
| CHANGELOG.md was updated/created | Stage CHANGELOG.md |
| git commit succeeds | Proceed to push step |
| git commit fails | BLOCKED — report error |
```bash
# Stage the version file and changelog (only files that actually exist,
# since git add fails on missing paths)
for f in package.json package-lock.json pyproject.toml VERSION CHANGELOG.md; do
  [ -f "$f" ] && git add "$f"
done
# Create release commit (unquoted heredoc so $new_version expands)
git commit -m "$(cat <<EOF
chore: bump version to $new_version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```
**Commit message format**: `chore: bump version to <new_version>`
- Follows conventional commit format
- Includes Co-Authored-By trailer
---
### Step 6: Push to Remote
Push the branch to the remote origin.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Remote tracking branch exists | `git push origin "<current_branch>"` |
| No remote tracking branch | `git push -u origin "<current_branch>"` |
| Push succeeds (exit 0) | PASS — gate satisfied |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise check remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |
```bash
current_branch=$(git branch --show-current)
# Check if remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
git push origin "$current_branch"
else
git push -u origin "$current_branch"
fi
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| Commit and push record | JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| CHANGELOG.md | Markdown file | Updated with new version entry |
```json
{
"phase": "changelog-commit",
"changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
"commit_sha": "abc1234",
"commit_message": "chore: bump version to X.Y.Z",
"pushed_to": "origin/branch-name",
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Commits gathered since last tag | Commit list non-empty or warn if empty |
| Changelog entry formatted | Markdown entry with correct sections |
| CHANGELOG.md updated or created | File exists with new entry at top |
| Release commit created | `git log -1 --oneline` shows commit |
| Branch pushed to remote | Push command exits 0 |
| overall = "pass" | All steps completed without error |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| No commits since last tag | WARN — create minimal changelog entry, continue |
| CHANGELOG.md write error | BLOCKED — report file system error |
| git commit fails (nothing staged) | Verify version file and CHANGELOG.md were modified, re-stage |
| Push rejected (non-fast-forward) | BLOCKED — suggest `git pull --rebase`, halt |
| Push permission denied | BLOCKED — advise check SSH keys or access token |
| No remote configured | BLOCKED — suggest `git remote add origin <url>` |
## Next Phase
-> [Phase 5: PR Creation](05-pr-creation.md)
If commit and push succeed (overall: "pass"), proceed to Phase 5.
If push fails, report BLOCKED status with error details. Do not proceed.

# Phase 5: PR Creation
> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.
Create a pull request via GitHub CLI with a structured body, linked issues, and release metadata.
## Objective
- Create a PR using `gh pr create` with structured body
- Auto-link related issues from commit messages
- Include release summary (version, changes, test plan)
- Output the PR URL
## Input
| Source | Required | Description |
|--------|----------|-------------|
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type, version_file |
| Phase 2 output | Yes | merge_base (for change summary) |
| Git history | Yes | Commit messages for issue extraction |
## Execution Steps
### Step 1: Extract Issue References from Commits
Scan commit messages for issue reference patterns.
**Issue Reference Pattern**: `fixes #N`, `closes #N`, `resolves #N`, `refs #N` (case-insensitive, singular and plural forms).
**Decision Table**:
| Condition | Action |
|-----------|--------|
| Last tag exists | Scan commits from last_tag..HEAD |
| No last tag | Scan last 50 commit subjects |
| Issue references found | Deduplicate, sort numerically |
| No issue references found | issues_section = empty (omit section from PR body) |
```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
commits=$(git log --pretty=format:"%s" --no-merges -50)
fi
# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '[0-9]+' | sort -nu | sed 's/^/#/' || true)
echo "Referenced issues: $issues"
```
---
### Step 2: Determine Target Branch
Find the appropriate base branch for the PR.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| origin/main exists | target_branch = main |
| origin/main not found | target_branch = master |
```bash
# Default target: main (fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
target_branch="master"
fi
current_branch=$(git branch --show-current)
echo "PR: $current_branch -> $target_branch"
```
---
### Step 3: Build PR Title
Format the PR title as `release: vX.Y.Z`.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| new_version available from Phase 3 | pr_title = "release: v<new_version>" |
| new_version not available | Fall back to descriptive title derived from branch name |
```bash
if [ -n "${new_version:-}" ]; then
  pr_title="release: v${new_version}"
else
  # Fallback: derive a descriptive title from the branch name
  pr_title="release: $(git branch --show-current)"
fi
```
---
### Step 4: Build PR Body
Construct the full PR body with all sections.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| issues list non-empty | Include "## Linked Issues" section with each issue as `- #N` |
| issues list empty | Omit "## Linked Issues" section |
| Phase 2 warnings exist | Include warning note in Summary section |
```bash
# Gather change summary
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)
# Build linked issues section
if [ -n "$issues" ]; then
issues_section="## Linked Issues
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
else
issues_section=""
fi
```
**PR Body Sections Table**:
| Section | Content |
|---------|---------|
| **Summary** | Version being released, one-line description |
| **Changes** | Grouped changelog entries (from Phase 4) |
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
| **Version** | Previous version, new version, bump type |
| **Test Plan** | Checklist confirming all phases passed |
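The sections above might be assembled as follows; the function and argument names are illustrative:

```shell
# Assemble the PR body from the section contents gathered in earlier steps.
build_pr_body() {
  local new=$1 prev=$2 bump=$3 changes=$4 issues_section=$5
  printf '## Summary\nRelease v%s\n\n## Changes\n%s\n' "$new" "$changes"
  if [ -n "$issues_section" ]; then
    # Linked Issues section is omitted entirely when no issues were found
    printf '\n%s\n' "$issues_section"
  fi
  printf '\n## Version\n- Previous: %s\n- New: %s\n- Bump type: %s\n' "$prev" "$new" "$bump"
}
```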
---
### Step 5: Create PR via gh CLI
Invoke `gh pr create` with title and fully assembled body.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| gh CLI available | Execute `gh pr create` |
| gh CLI not installed | BLOCKED — report missing CLI, advise `gh auth login` |
| PR created successfully | Capture URL from output |
| PR creation fails (already exists) | Report existing PR URL, gate = pass |
| PR creation fails (other error) | BLOCKED — report error details |
```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<'EOF'
## Summary
Release vX.Y.Z
### Changes
- list of changes from changelog
## Linked Issues
- #N (fixes)
- #M (closes)
## Version
- Previous: X.Y.Z-1
- New: X.Y.Z
- Bump type: patch|minor|major
## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully
Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
---
### Step 6: Capture and Report PR URL
Extract the PR URL from gh output.
**Decision Table**:
| Condition | Action |
|-----------|--------|
| URL present in output | Record pr_url, set gate = pass |
| No URL in output | Check `gh pr view --json url` as fallback |
| Both fail | BLOCKED — report failure |
```bash
# gh pr create prints the PR URL on success; capture it from the Step 5
# invocation, or query the open PR for the current branch as a fallback
pr_url=${pr_url:-$(gh pr view --json url -q .url 2>/dev/null)}
echo "PR created: $pr_url"
```
---
## Output
| Artifact | Format | Description |
|----------|--------|-------------|
| PR creation record | JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | Text block | DONE / DONE_WITH_CONCERNS with full summary |
```json
{
"phase": "pr-creation",
"pr_url": "https://github.com/owner/repo/pull/N",
"pr_title": "release: vX.Y.Z",
"target_branch": "main",
"source_branch": "feature-branch",
"linked_issues": ["#1", "#2"],
"overall": "pass|fail"
}
```
## Success Criteria
| Criterion | Validation Method |
|-----------|-------------------|
| Issue references extracted | issues list populated (or empty with no error) |
| Target branch determined | target_branch set to main or master |
| PR title formatted | pr_title = "release: v<new_version>" |
| PR body assembled with all sections | All required sections present |
| PR created via gh CLI | pr_url present in output |
| Completion status output | DONE or DONE_WITH_CONCERNS block present |
## Error Handling
| Scenario | Resolution |
|----------|------------|
| gh CLI not installed | BLOCKED — report error, advise install + `gh auth login` |
| Not authenticated with gh | BLOCKED — report auth error, advise `gh auth login` |
| PR already exists for branch | Report existing PR URL, treat as pass |
| No changes to create PR for | BLOCKED — report, suggest verifying Phase 4 push succeeded |
| Issue regex finds no matches | issues = [] — omit Linked Issues section, continue |
## Completion Status
After PR creation, output the final Completion Status:
```
## STATUS: DONE
**Summary**: Released vX.Y.Z — PR created at <pr_url>
### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created
### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```
If there were review warnings from Phase 2, use `DONE_WITH_CONCERNS` and list the warnings in the Details section:
```
## STATUS: DONE_WITH_CONCERNS
**Summary**: Released vX.Y.Z — PR created at <pr_url> (review warnings noted)
### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Concerns: <list review warnings from Phase 2>
### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```
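The choice between the two blocks reduces to a single predicate. A minimal sketch, assuming Phase 2 warnings were collected into a (possibly empty) variable:

```shell
# Pick the final status header from collected Phase 2 warnings.
# An empty argument means no review warnings were recorded.
final_status() {
  if [ -n "$1" ]; then
    echo "## STATUS: DONE_WITH_CONCERNS"
  else
    echo "## STATUS: DONE"
  fi
}
```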


@@ -416,6 +416,8 @@ Visual workflow template editor with drag-drop.
- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, OKLCH color system, anti-AI-slop detection patterns, editorial typography standards, motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily from Impeccable's design knowledge.
- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) are inspired by gstack's workflow designs.
---
## 🤝 Contributing


@@ -416,6 +416,8 @@ The v2 team architecture introduces an **event-driven beat model** for efficient orchestration:
- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, OKLCH color system, anti-AI-slop detection patterns, editorial typography standards, motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily from Impeccable's design knowledge.
- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) are inspired by gstack's workflow designs.
---
## 🤝 Contributing