mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-30 20:21:09 +08:00
feat: add investigate, security-audit, ship skills (Claude + Codex)
- Add 3 new Claude skills: investigate (Iron Law debugging), security-audit (OWASP Top 10 + STRIDE), ship (gated release pipeline)
- Port all 3 skills to Codex v4 format under .codex/skills/ using Deep Interaction pattern (spawn_agent + assign_task phase transitions)
- Update README/README_CN acknowledgments: credit gstack (https://github.com/garrytan/gstack) as inspiration source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
110  .claude/skills/investigate/SKILL.md  (new file)
@@ -0,0 +1,110 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
allowed-tools: Bash, Read, Write, Edit, Glob, Grep
---

# Investigate

Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.

## Iron Law Principle

**No fix without confirmed root cause.** Every investigation follows a strict evidence chain:

1. Reproduce the bug with concrete evidence
2. Analyze patterns to assess scope
3. Form and test hypotheses (max 3 strikes)
4. Implement minimal fix ONLY after root cause is confirmed
5. Verify fix and generate structured report

Violation of the Iron Law (skipping to Phase 4 without Phase 3 confirmation) is prohibited.
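The evidence chain above behaves like a small state machine with one hard gate before Phase 4. The sketch below is illustrative only (it is not part of the skill's own code); the phase names and the `confirmed_root_cause` field follow this document, while the runner function itself is an assumption.

```javascript
// Minimal sketch of the 5-phase pipeline with the Iron Law gate.
const PHASES = [
  "root-cause-investigation",
  "pattern-analysis",
  "hypothesis-testing",
  "implementation",
  "verification-report",
];

function nextPhase(report, currentIndex) {
  // Iron Law gate: Phase 4 (index 3) requires a confirmed root cause.
  if (currentIndex + 1 === 3 && !report.confirmed_root_cause) {
    return { phase: null, status: "BLOCKED", reason: "Iron Law: no confirmed root cause" };
  }
  if (currentIndex + 1 >= PHASES.length) {
    return { phase: null, status: "DONE" };
  }
  return { phase: PHASES[currentIndex + 1], status: "IN_PROGRESS" };
}
```

The only transition that can refuse to advance is Phase 3 to Phase 4; every other phase hands off unconditionally.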
## Key Design Principles

1. **Evidence-First**: Collect before theorizing. Logs, stack traces, and reproduction steps are mandatory inputs.
2. **Minimal Fix**: Change only what is necessary. Refactoring is not debugging.
3. **3-Strike Escalation**: If 3 consecutive hypothesis tests fail, STOP and escalate with a diagnostic dump.
4. **Regression Coverage**: Every fix must include a test that fails without the fix and passes with it.
5. **Structured Output**: All findings are recorded in machine-readable JSON for future reference.

## Execution Flow

```
Phase 1: Root Cause Investigation
  Reproduce bug, collect evidence (errors, logs, traces)
  Use ccw cli --tool gemini --mode analysis for initial diagnosis
  Output: investigation-report.json
        |
        v
Phase 2: Pattern Analysis
  Search codebase for similar patterns (same error, module, antipattern)
  Assess scope: isolated vs systemic
  Output: pattern-analysis section in report
        |
        v
Phase 3: Hypothesis Testing
  Form max 3 hypotheses from evidence
  Test each with minimal read-only probes
  3-strike rule: STOP and escalate on 3 consecutive failures
  Output: confirmed root cause with evidence chain
        |
        v
Phase 4: Implementation [GATE: requires Phase 3 confirmed root cause]
  Implement minimal fix
  Add regression test
  Verify fix resolves reproduction case
        |
        v
Phase 5: Verification & Report
  Run full test suite
  Check for regressions
  Generate structured debug report to .workflow/.debug/
```

## Directory Setup

```bash
mkdir -p .workflow/.debug
```

## Output Structure

```
.workflow/.debug/
  debug-report-{YYYY-MM-DD}-{slug}.json   # Structured debug report
```
## Completion Status Protocol

This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.

| Status | When |
|--------|------|
| **DONE** | Root cause confirmed, fix applied, regression test passes, no regressions |
| **DONE_WITH_CONCERNS** | Fix applied but partial test coverage or minor warnings |
| **BLOCKED** | Cannot reproduce bug, or 3-strike escalation triggered in Phase 3 |
| **NEEDS_CONTEXT** | Missing reproduction steps, unclear error conditions |

## Reference Documents

| Document | Purpose |
|----------|---------|
| [phases/01-root-cause-investigation.md](phases/01-root-cause-investigation.md) | Evidence collection and reproduction |
| [phases/02-pattern-analysis.md](phases/02-pattern-analysis.md) | Codebase pattern search and scope assessment |
| [phases/03-hypothesis-testing.md](phases/03-hypothesis-testing.md) | Hypothesis formation, testing, and 3-strike rule |
| [phases/04-implementation.md](phases/04-implementation.md) | Minimal fix with Iron Law gate |
| [phases/05-verification-report.md](phases/05-verification-report.md) | Test suite, regression check, report generation |
| [specs/iron-law.md](specs/iron-law.md) | Iron Law rules definition |
| [specs/debug-report-format.md](specs/debug-report-format.md) | Structured debug report JSON schema |

## CLI Integration

This skill leverages `ccw cli` for multi-model analysis at key points:

| Phase | CLI Usage | Mode |
|-------|-----------|------|
| Phase 1 | Initial diagnosis from error evidence | `--mode analysis` |
| Phase 2 | Cross-file pattern search | `--mode analysis` |
| Phase 3 | Hypothesis validation assistance | `--mode analysis` |

All CLI calls use `--mode analysis` (read-only). No write-mode CLI calls during investigation phases 1-3.
132  .claude/skills/investigate/phases/01-root-cause-investigation.md  (new file)
@@ -0,0 +1,132 @@
# Phase 1: Root Cause Investigation

Reproduce the bug and collect all available evidence before forming any theories.

## Objective

- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use CLI analysis for initial diagnosis

## Execution Steps

### Step 1: Understand the Bug Report

Parse the user's description to extract:

- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)

```javascript
const bugReport = {
  symptom: "extracted from user description",
  expected_behavior: "what should happen",
  context: "when/where it occurs",
  user_provided_files: ["files mentioned by user"],
  user_provided_errors: ["error messages provided"]
}
```

### Step 2: Reproduce the Bug

Attempt to reproduce using the most direct method available:

1. **Run the failing test** (if one exists):
   ```bash
   # Identify and run the specific failing test
   ```

2. **Run the failing command** (if CLI/script):
   ```bash
   # Execute the command that triggers the bug
   ```

3. **Read the error-producing code path** (if reproduction requires complex setup):
   - Use `Grep` to find the error message in source code
   - Use `Read` to trace the code path that produces the error
   - Document the theoretical reproduction path

**If reproduction fails**: Document what was attempted. The investigation can continue with static analysis, but note this as a concern.

### Step 3: Collect Evidence

Gather all available evidence using project tools:

```javascript
// 1. Find error messages in source
Grep({ pattern: "error message text", path: "src/" })

// 2. Find related log output
Grep({ pattern: "relevant log pattern", path: "." })

// 3. Read stack trace files or test output
Read({ file_path: "path/to/failing-test-output" })

// 4. Identify affected files and modules
Glob({ pattern: "**/*relevant-module*" })
```

### Step 4: Initial Diagnosis via CLI Analysis

Use `ccw cli` for a broader diagnostic perspective:

```bash
ccw cli -p "PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @{affected_files} | Evidence: {error_messages_and_traces}
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on {affected_module}" \
  --tool gemini --mode analysis
```

### Step 5: Write Investigation Report

Generate `investigation-report.json` in memory (carried to the next phase):

```json
{
  "phase": 1,
  "bug_description": "concise description of the bug",
  "reproduction": {
    "reproducible": true,
    "steps": [
      "step 1: ...",
      "step 2: ...",
      "step 3: observe error"
    ],
    "reproduction_method": "test|command|static_analysis"
  },
  "evidence": {
    "error_messages": ["exact error text"],
    "stack_traces": ["relevant stack trace"],
    "affected_files": ["file1.ts", "file2.ts"],
    "affected_modules": ["module-name"],
    "log_output": ["relevant log lines"]
  },
  "initial_diagnosis": {
    "cli_tool_used": "gemini",
    "top_suspects": [
      { "description": "suspect 1", "evidence_strength": "strong|moderate|weak", "files": [] }
    ]
  }
}
```

## Output

- **Data**: `investigation-report` (in-memory, passed to Phase 2)
- **Format**: JSON structure as defined above

## Quality Checks

- [ ] Bug symptom clearly documented
- [ ] Reproduction attempted (success or documented failure)
- [ ] At least one piece of concrete evidence collected (error message, stack trace, or failing test)
- [ ] Affected files identified
- [ ] Initial diagnosis generated

## Next Phase

Proceed to [Phase 2: Pattern Analysis](02-pattern-analysis.md) with the investigation report.
126  .claude/skills/investigate/phases/02-pattern-analysis.md  (new file)
@@ -0,0 +1,126 @@
# Phase 2: Pattern Analysis

Search for similar patterns in the codebase to determine if the bug is isolated or systemic.

## Objective

- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation

## Execution Steps

### Step 1: Search for Similar Error Patterns

Look for the same error type or message elsewhere in the codebase:

```javascript
// Search for identical or similar error messages
Grep({ pattern: "error_message_fragment", path: "src/", output_mode: "content", context: 3 })

// Search for the same exception/error type
Grep({ pattern: "ErrorClassName|error_code", path: "src/", output_mode: "files_with_matches" })

// Search for similar error handling patterns
Grep({ pattern: "catch.*{similar_pattern}", path: "src/", output_mode: "content" })
```

### Step 2: Search for the Same Antipattern

If the initial diagnosis suggests a coding antipattern, search for it globally:

```javascript
// Examples of antipattern searches:

// Missing null checks
Grep({ pattern: "variable\\.property", path: "src/", glob: "*.ts" })

// Unchecked async operations
Grep({ pattern: "async.*without.*await", path: "src/" })

// Direct mutation of shared state
Grep({ pattern: "shared_state_pattern", path: "src/" })
```

### Step 3: Module-Level Analysis

Examine the affected module for structural issues:

```javascript
// List all files in the affected module
Glob({ pattern: "src/affected-module/**/*" })

// Check imports and dependencies
Grep({ pattern: "import.*from.*affected-module", path: "src/" })

// Check for circular dependencies or unusual patterns
Grep({ pattern: "require.*affected-module", path: "src/" })
```

### Step 4: CLI Cross-File Pattern Analysis (Optional)

For complex patterns that span multiple files, use CLI analysis:

```bash
ccw cli -p "PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '{antipattern_description}' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.{ext} | Bug in {module}, pattern: {pattern_description}
EXPECTED: List of all files with same pattern, risk assessment per occurrence
CONSTRAINTS: Focus on {antipattern} pattern only | Ignore test files for scope" \
  --tool gemini --mode analysis
```

### Step 5: Scope Assessment

Classify the bug scope based on findings:

```json
{
  "phase": 2,
  "pattern_analysis": {
    "scope": "isolated|module-wide|systemic",
    "similar_occurrences": [
      {
        "file": "path/to/file.ts",
        "line": 42,
        "pattern": "description of similar pattern",
        "risk": "same_bug|potential_bug|safe"
      }
    ],
    "total_occurrences": 1,
    "affected_modules": ["module-name"],
    "antipattern_identified": "description or null",
    "scope_justification": "why this scope classification"
  }
}
```

**Scope Definitions**:

- **isolated**: Bug exists in a single location, no similar patterns found
- **module-wide**: Same pattern exists in multiple files within the same module
- **systemic**: Pattern spans multiple modules, may require broader fix
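The three scope definitions above can be sketched as a small classifier. This is an illustrative sketch, not part of the skill: the `risk` values follow the `pattern_analysis` JSON in Step 5, but the per-occurrence `module` field and the exact decision rules are assumptions made for the example.

```javascript
// Hypothetical scope classifier for the Step 5 assessment.
// occurrences: [{ module, risk }] where risk is "same_bug" | "potential_bug" | "safe"
function classifyScope(occurrences, bugModule) {
  // Only occurrences that carry real risk widen the scope.
  const risky = occurrences.filter(o => o.risk !== "safe");
  const modules = new Set(risky.map(o => o.module));
  if (risky.length <= 1) return "isolated";
  if (modules.size === 1 && modules.has(bugModule)) return "module-wide";
  return "systemic";
}
```

Note that "safe" matches are excluded before counting, so a pattern that appears widely but harmlessly still classifies as isolated.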
## Output

- **Data**: `pattern-analysis` section added to investigation report (in-memory)
- **Format**: JSON structure as defined above

## Decision Point

| Scope | Action |
|-------|--------|
| isolated | Proceed to Phase 3 with narrow focus |
| module-wide | Proceed to Phase 3, note all occurrences for Phase 4 fix |
| systemic | Proceed to Phase 3, but flag for potential multi-phase fix or separate tracking |

## Quality Checks

- [ ] At least 3 search queries executed against the codebase
- [ ] Scope classified as isolated, module-wide, or systemic
- [ ] Similar occurrences documented with file:line references
- [ ] Scope justification provided with evidence

## Next Phase

Proceed to [Phase 3: Hypothesis Testing](03-hypothesis-testing.md) with the pattern analysis results.
177  .claude/skills/investigate/phases/03-hypothesis-testing.md  (new file)
@@ -0,0 +1,177 @@
# Phase 3: Hypothesis Testing

Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.

## Objective

- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive test failures

## Execution Steps

### Step 1: Form Hypotheses

Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:

```json
{
  "hypotheses": [
    {
      "id": "H1",
      "description": "The root cause is X because evidence Y",
      "evidence_supporting": ["evidence item 1", "evidence item 2"],
      "predicted_behavior": "If H1 is correct, then we should observe Z",
      "test_method": "How to verify: read file X line Y, check value Z",
      "confidence": "high|medium|low"
    }
  ]
}
```

**Hypothesis Formation Rules**:

- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
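The formation rules above can be checked mechanically. The sketch below is illustrative only; the field names match the hypotheses JSON in Step 1, but the validator itself is not part of the skill.

```javascript
// Hypothetical validator for the Hypothesis Formation Rules.
// Returns a list of rule violations (empty list = valid).
function validateHypotheses(hypotheses) {
  const errors = [];
  if (hypotheses.length > 3) errors.push("more than 3 hypotheses");
  for (const h of hypotheses) {
    if (!h.evidence_supporting || h.evidence_supporting.length === 0) {
      errors.push(`${h.id}: no supporting evidence cited`);
    }
    if (!h.predicted_behavior) {
      errors.push(`${h.id}: no testable prediction`);
    }
  }
  return errors;
}
```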
### Step 2: Test Hypotheses Sequentially

Test each hypothesis starting from highest confidence. Use read-only probes:

**Allowed test methods**:

- `Read` a specific file and check a specific value or condition
- `Grep` for a pattern that would confirm or deny the hypothesis
- `Bash` to run a specific test or command that reveals the condition
- Temporarily add a log statement to observe runtime behavior (revert after)

**Prohibited during testing**:

- Modifying production code (save that for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)

```javascript
// Example hypothesis test
// H1: "Function X receives null because caller Y doesn't check return value"
const evidence = Read({ file_path: "src/caller.ts" })
// Check: Does caller Y use the return value without null check?
// Result: Confirmed / Rejected with specific evidence
```

### Step 3: Record Test Results

For each hypothesis test:

```json
{
  "hypothesis_tests": [
    {
      "id": "H1",
      "test_performed": "Read src/caller.ts:42 - checked null handling",
      "result": "confirmed|rejected|inconclusive",
      "evidence": "specific observation that confirms or rejects",
      "files_checked": ["src/caller.ts:42-55"]
    }
  ]
}
```

### Step 4: 3-Strike Escalation Rule

Track consecutive test failures. A "failure" means the test was inconclusive or the hypothesis was rejected AND no actionable insight was gained.

```
Strike Counter:
  [H1 rejected, no insight] → Strike 1
  [H2 rejected, no insight] → Strike 2
  [H3 rejected, no insight] → Strike 3 → STOP
```

**Important**: A rejected hypothesis that provides useful insight (narrows the search) does NOT count as a strike. Only truly unproductive tests count.
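The strike rule can be sketched as a counter over recorded test results. This is an illustrative sketch, not part of the skill: the `result` values match Step 3's JSON, while the `actionable_insight` field and the choice to reset the run on a productive test are assumptions drawn from "consecutive" in the rule text.

```javascript
// Hypothetical strike counter for the 3-strike escalation rule.
// tests: [{ result, actionable_insight? }]
function countStrikes(tests) {
  let strikes = 0;
  for (const t of tests) {
    const failed = t.result === "rejected" || t.result === "inconclusive";
    if (failed && !t.actionable_insight) {
      strikes += 1;                 // unproductive test: one strike
      if (strikes === 3) return { strikes, escalate: true };
    } else {
      strikes = 0;                  // confirmed or insightful test resets the run
    }
  }
  return { strikes, escalate: false };
}
```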
**On 3rd Strike — STOP and Escalate**:

```
## ESCALATION: 3-Strike Limit Reached

### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #{N}

### Error History
1. Attempt 1: H1 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}
2. Attempt 2: H2 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}
3. Attempt 3: H3 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}

### Current State
- Evidence collected: {summary from Phase 1-2}
- Hypotheses tested: {list}
- Files examined: {list}

### Diagnosis
- Likely root cause area: {best guess based on all evidence}
- Suggested human action: {specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z"}

### Diagnostic Dump
{Full investigation-report.json content}
```

After escalation, set status to **BLOCKED** per the Completion Status Protocol.

### Step 5: Confirm Root Cause

If a hypothesis is confirmed, document the confirmed root cause:

```json
{
  "phase": 3,
  "confirmed_root_cause": {
    "hypothesis_id": "H1",
    "description": "Root cause description with full evidence chain",
    "evidence_chain": [
      "Phase 1: Error message X observed in Y",
      "Phase 2: Same pattern found in 3 other files",
      "Phase 3: H1 confirmed — null check missing at file.ts:42"
    ],
    "affected_code": {
      "file": "path/to/file.ts",
      "line_range": "42-55",
      "function": "functionName"
    }
  }
}
```

## Output

- **Data**: `hypothesis-tests` and `confirmed_root_cause` added to investigation report (in-memory)
- **Format**: JSON structure as defined above

## Gate for Phase 4

**Phase 4 can ONLY proceed if `confirmed_root_cause` is present.** This is the Iron Law gate.

| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | Proceed to [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation | STOP, output diagnostic dump, status = BLOCKED |
| Partial insight | Re-form hypotheses with new evidence (stays in Phase 3) |

## Quality Checks

- [ ] Maximum 3 hypotheses formed, each with cited evidence
- [ ] Each hypothesis tested with a specific, documented probe
- [ ] Test results recorded with concrete evidence
- [ ] 3-strike counter maintained correctly
- [ ] Root cause confirmed with full evidence chain OR escalation triggered

## Next Phase

Proceed to [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.
139  .claude/skills/investigate/phases/04-implementation.md  (new file)
@@ -0,0 +1,139 @@
# Phase 4: Implementation

Implement the minimal fix and add a regression test. Iron Law gate enforced.

## Objective

- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case

## Iron Law Gate Check

**MANDATORY**: Before any code modification, verify:

```javascript
if (!investigation_report.confirmed_root_cause) {
  // VIOLATION: Cannot proceed without confirmed root cause
  // Return to Phase 3 or escalate
  throw new Error("Iron Law violation: No confirmed root cause. Return to Phase 3.")
}

const rc = investigation_report.confirmed_root_cause
console.log(`Root cause confirmed: ${rc.description}`)
console.log(`Evidence chain: ${rc.evidence_chain.length} items`)
console.log(`Affected code: ${rc.affected_code.file}:${rc.affected_code.line_range}`)
```

If the gate check fails, do NOT proceed. Return status **BLOCKED** with reason "Iron Law: no confirmed root cause".

## Execution Steps

### Step 1: Plan the Minimal Fix

Define the fix scope BEFORE writing any code:

```json
{
  "fix_plan": {
    "description": "What the fix does and why",
    "changes": [
      {
        "file": "path/to/file.ts",
        "change_type": "modify|add|remove",
        "description": "specific change description",
        "lines_affected": "42-45"
      }
    ],
    "total_files_changed": 1,
    "total_lines_changed": "estimated"
  }
}
```

**Minimal Fix Rules** (from [specs/iron-law.md](../specs/iron-law.md)):

- Change only what is necessary to fix the confirmed root cause
- Do not refactor surrounding code
- Do not add features
- Do not change formatting or style of unrelated code
- If the fix requires changes to more than 3 files, document justification

### Step 2: Implement the Fix

Apply the planned changes using the `Edit` tool:

```javascript
Edit({
  file_path: "path/to/affected/file.ts",
  old_string: "buggy code",
  new_string: "fixed code"
})
```

### Step 3: Add Regression Test

Create or modify a test that:

1. **Fails** without the fix (tests the exact bug condition)
2. **Passes** with the fix

```javascript
// Identify existing test file for the module
Glob({ pattern: "**/*.test.{ts,js,py}" })
// or
Glob({ pattern: "**/test_*.py" })

// Add regression test
// Test name should reference the bug: "should handle null return from X"
// Test should exercise the exact code path that caused the bug
```

**Regression test requirements**:

- Test name clearly describes the bug scenario
- Test exercises the specific code path identified in root cause
- Test is deterministic (no flaky timing, external dependencies)
- Test is placed in the appropriate test file for the module

### Step 4: Verify Fix Against Reproduction

Re-run the original reproduction case from Phase 1:

```bash
# Run the specific failing test/command from Phase 1
# It should now pass
```

Record the verification result:

```json
{
  "phase": 4,
  "fix_applied": {
    "description": "what was fixed",
    "files_changed": ["path/to/file.ts"],
    "lines_changed": 3,
    "regression_test": {
      "file": "path/to/test.ts",
      "test_name": "should handle null return from X",
      "status": "added|modified"
    },
    "reproduction_verified": true
  }
}
```

## Output

- **Data**: `fix_applied` section added to investigation report (in-memory)
- **Artifacts**: Modified source files and test files

## Quality Checks

- [ ] Iron Law gate passed: confirmed root cause exists
- [ ] Fix is minimal: only necessary changes made
- [ ] Regression test added that covers the specific bug
- [ ] Original reproduction case passes with the fix
- [ ] No unrelated code changes included

## Next Phase

Proceed to [Phase 5: Verification & Report](05-verification-report.md) to run the full test suite and generate the report.
153  .claude/skills/investigate/phases/05-verification-report.md  (new file)
@@ -0,0 +1,153 @@
# Phase 5: Verification & Report

Run the full test suite, check for regressions, and generate the structured debug report.

## Objective

- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to the `.workflow/.debug/` directory

## Execution Steps

### Step 1: Run Full Test Suite

```bash
# Detect and run the project's test framework
# npm test / pytest / go test / cargo test / etc.
```

Record results:

```json
{
  "test_results": {
    "total": 0,
    "passed": 0,
    "failed": 0,
    "skipped": 0,
    "regression_test_passed": true,
    "new_failures": []
  }
}
```

**If new failures are found**:

- Check if the failures are related to the fix
- If related: the fix introduced a regression — return to Phase 4 to adjust
- If unrelated: document as pre-existing failures, proceed with report

### Step 2: Regression Check

Verify specifically:

1. The new regression test passes
2. All tests that passed before the fix still pass
3. No new warnings or errors in test output

### Step 3: Generate Structured Debug Report

Create the report following the schema in [specs/debug-report-format.md](../specs/debug-report-format.md):

```bash
mkdir -p .workflow/.debug
```

```json
{
  "bug_description": "concise description of the bug",
  "reproduction_steps": [
    "step 1",
    "step 2",
    "step 3: observe error"
  ],
  "root_cause": "confirmed root cause description with technical detail",
  "evidence_chain": [
    "Phase 1: error message X observed in module Y",
    "Phase 2: pattern analysis found N similar occurrences",
    "Phase 3: hypothesis H1 confirmed — specific condition at file:line"
  ],
  "fix_description": "what was changed and why",
  "files_changed": [
    {
      "path": "src/module/file.ts",
      "change_type": "modify",
      "description": "added null check before property access"
    }
  ],
  "tests_added": [
    {
      "file": "src/module/__tests__/file.test.ts",
      "test_name": "should handle null return from X",
      "type": "regression"
    }
  ],
  "regression_check_result": {
    "passed": true,
    "total_tests": 0,
    "new_failures": [],
    "pre_existing_failures": []
  },
  "completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
  "concerns": [],
  "timestamp": "ISO-8601",
  "investigation_duration_phases": 5
}
```

### Step 4: Write Report File

```javascript
const slug = bugDescription
  .toLowerCase()
  .replace(/[^a-z0-9]+/g, '-')
  .replace(/^-+|-+$/g, '')   // trim stray leading/trailing hyphens
  .substring(0, 40)
const dateStr = new Date().toISOString().substring(0, 10)
const reportPath = `.workflow/.debug/debug-report-${dateStr}-${slug}.json`

Write({ file_path: reportPath, content: JSON.stringify(report, null, 2) })
```
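To make the naming concrete, the construction above can be run standalone. The bug description and date below are hypothetical, chosen only to show the resulting path:

```javascript
// Standalone illustration of the report-path construction, with a
// hypothetical bug description and a fixed date for determinism.
const bugDescription = "Null return from parseConfig crashes loader!"
const slug = bugDescription.toLowerCase().replace(/[^a-z0-9]+/g, '-').substring(0, 40)
// slug → "null-return-from-parseconfig-crashes-loa" (truncated at 40 chars)
const reportPath = `.workflow/.debug/debug-report-2026-01-15-${slug}.json`
```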
### Step 5: Output Completion Status

Follow the Completion Status Protocol from `_shared/SKILL-DESIGN-SPEC.md` section 13:

**DONE**:
```
## STATUS: DONE

**Summary**: Fixed {bug_description} — root cause was {root_cause_summary}

### Details
- Phases completed: 5/5
- Root cause: {confirmed_root_cause}
- Fix: {fix_description}
- Regression test: {test_name} in {test_file}

### Outputs
- Debug report: {reportPath}
- Files changed: {list}
- Tests added: {list}
```

**DONE_WITH_CONCERNS**:
```
## STATUS: DONE_WITH_CONCERNS

**Summary**: Fixed {bug_description} with concerns

### Details
- Phases completed: 5/5
- Concerns:
  1. {concern} — Impact: {low|medium} — Suggested fix: {action}
```

## Output

- **File**: `debug-report-{YYYY-MM-DD}-{slug}.json`
- **Location**: `.workflow/.debug/`
- **Format**: JSON (see [specs/debug-report-format.md](../specs/debug-report-format.md))

## Quality Checks

- [ ] Full test suite executed
- [ ] Regression test specifically verified
- [ ] No new test failures introduced (or documented if pre-existing)
- [ ] Debug report written to `.workflow/.debug/`
- [ ] Completion status output follows protocol
226
.claude/skills/investigate/specs/debug-report-format.md
Normal file
@@ -0,0 +1,226 @@
# Debug Report Format

Defines the structured JSON schema for debug reports generated by the investigate skill.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| Phase 5 | Generate final report | Full schema |
| Phase 3 (escalation) | Diagnostic dump includes partial report | Partial schema |

---

## JSON Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Debug Report",
  "type": "object",
  "required": [
    "bug_description",
    "reproduction_steps",
    "root_cause",
    "evidence_chain",
    "fix_description",
    "files_changed",
    "tests_added",
    "regression_check_result",
    "completion_status"
  ],
  "properties": {
    "bug_description": {
      "type": "string",
      "description": "Concise description of the bug symptom",
      "minLength": 10
    },
    "reproduction_steps": {
      "type": "array",
      "description": "Ordered steps to reproduce the bug",
      "items": { "type": "string" },
      "minItems": 1
    },
    "root_cause": {
      "type": "string",
      "description": "Confirmed root cause with technical detail",
      "minLength": 20
    },
    "evidence_chain": {
      "type": "array",
      "description": "Ordered evidence from Phase 1 through Phase 3, each prefixed with phase number",
      "items": { "type": "string" },
      "minItems": 1
    },
    "fix_description": {
      "type": "string",
      "description": "What was changed and why",
      "minLength": 10
    },
    "files_changed": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["path", "change_type", "description"],
        "properties": {
          "path": {
            "type": "string",
            "description": "Relative file path"
          },
          "change_type": {
            "type": "string",
            "enum": ["add", "modify", "remove"]
          },
          "description": {
            "type": "string",
            "description": "Brief description of changes to this file"
          }
        }
      }
    },
    "tests_added": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["file", "test_name", "type"],
        "properties": {
          "file": {
            "type": "string",
            "description": "Test file path"
          },
          "test_name": {
            "type": "string",
            "description": "Name of the test function or describe block"
          },
          "type": {
            "type": "string",
            "enum": ["regression", "unit", "integration"],
            "description": "Type of test added"
          }
        }
      }
    },
    "regression_check_result": {
      "type": "object",
      "required": ["passed", "total_tests"],
      "properties": {
        "passed": {
          "type": "boolean",
          "description": "Whether the full test suite passed"
        },
        "total_tests": {
          "type": "integer",
          "description": "Total number of tests executed"
        },
        "new_failures": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Tests that failed after the fix but passed before"
        },
        "pre_existing_failures": {
          "type": "array",
          "items": { "type": "string" },
          "description": "Tests that were already failing before the investigation"
        }
      }
    },
    "completion_status": {
      "type": "string",
      "enum": ["DONE", "DONE_WITH_CONCERNS", "BLOCKED"],
      "description": "Final status per Completion Status Protocol"
    },
    "concerns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "impact": { "type": "string", "enum": ["low", "medium"] },
          "suggested_action": { "type": "string" }
        }
      },
      "description": "Non-blocking concerns (populated when status is DONE_WITH_CONCERNS)"
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO-8601 timestamp of report generation"
    },
    "investigation_duration_phases": {
      "type": "integer",
      "description": "Number of phases completed (1-5)",
      "minimum": 1,
      "maximum": 5
    }
  }
}
```
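A quick smoke check that a generated report at least names every required key can be done with grep (a sketch only; grep on JSON is crude, and full validation should use a real JSON Schema validator):

```shell
#!/bin/sh
# Smoke-check a debug report for the schema's required keys.
# The sample report here is inline for illustration.
report=$(mktemp)
printf '%s' '{"bug_description":"x","reproduction_steps":[],"root_cause":"y","evidence_chain":[],"fix_description":"z","files_changed":[],"tests_added":[],"regression_check_result":{},"completion_status":"DONE"}' > "$report"

for key in bug_description reproduction_steps root_cause evidence_chain \
           fix_description files_changed tests_added regression_check_result \
           completion_status; do
  grep -q "\"$key\"" "$report" || echo "missing required key: $key"
done
```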

## Field Descriptions

| Field | Source Phase | Description |
|-------|-------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phase 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | How many phases were completed |

## Example Report

```json
{
  "bug_description": "API returns 500 when user profile has null display_name",
  "reproduction_steps": [
    "Create user account without setting display_name",
    "Call GET /api/users/:id/profile",
    "Observe 500 Internal Server Error"
  ],
  "root_cause": "ProfileSerializer.format() calls displayName.trim() without null check at src/serializers/profile.ts:42",
  "evidence_chain": [
    "Phase 1: TypeError: Cannot read properties of null (reading 'trim') in server logs",
    "Phase 2: Same pattern in 2 other serializers (address.ts:28, company.ts:35)",
    "Phase 3: H1 confirmed — displayName field is nullable in DB but serializer assumes non-null"
  ],
  "fix_description": "Added null-safe access for displayName in ProfileSerializer.format()",
  "files_changed": [
    {
      "path": "src/serializers/profile.ts",
      "change_type": "modify",
      "description": "Added optional chaining for displayName.trim() call"
    }
  ],
  "tests_added": [
    {
      "file": "src/serializers/__tests__/profile.test.ts",
      "test_name": "should handle null display_name without error",
      "type": "regression"
    }
  ],
  "regression_check_result": {
    "passed": true,
    "total_tests": 142,
    "new_failures": [],
    "pre_existing_failures": []
  },
  "completion_status": "DONE",
  "concerns": [],
  "timestamp": "2026-03-29T15:30:00+08:00",
  "investigation_duration_phases": 5
}
```

## Output Location

Reports are written to: `.workflow/.debug/debug-report-{YYYY-MM-DD}-{slug}.json`

Where:
- `{YYYY-MM-DD}` is the investigation date
- `{slug}` is derived from the bug description (lowercase, hyphens, max 40 chars)
101
.claude/skills/investigate/specs/iron-law.md
Normal file
@@ -0,0 +1,101 @@
# Iron Law of Debugging

The Iron Law defines the non-negotiable rules that govern every investigation performed by this skill. These rules exist to prevent symptom-fixing and ensure durable, evidence-based solutions.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| Phase 3 | Hypothesis must produce confirmed root cause before proceeding | Rule 1 |
| Phase 1 | Reproduction must produce observable evidence | Rule 2 |
| Phase 4 | Fix scope must be minimal | Rule 3 |
| Phase 4 | Regression test is mandatory | Rule 4 |
| Phase 3 | 3 consecutive unproductive hypothesis failures trigger escalation | Rule 5 |

---

## Rules

### Rule 1: Never Fix Without Confirmed Root Cause

**Statement**: No code modification is permitted until a root cause has been confirmed through hypothesis testing with concrete evidence.

**Enforcement**: Phase 4 begins with an Iron Law gate check. If `confirmed_root_cause` is absent from the investigation report, Phase 4 is blocked.

**Rationale**: Fixing symptoms without understanding the cause leads to:
- Incomplete fixes that break under different conditions
- Masking of deeper issues
- Wasted investigation time when the bug recurs

### Rule 2: Evidence Must Be Reproducible

**Statement**: The bug must be reproducible through documented steps, or if not reproducible, the evidence must be sufficient to identify the root cause through static analysis.

**Enforcement**: Phase 1 documents reproduction steps and evidence. If reproduction fails, this is flagged as a concern but does not block investigation if sufficient static evidence exists.

**Acceptable evidence types**:
- Failing test case
- Error message with stack trace
- Log output showing the failure
- Code path analysis showing the defect condition

### Rule 3: Fix Must Be Minimal

**Statement**: The fix must change only what is necessary to address the confirmed root cause. No refactoring, no feature additions, no style changes to unrelated code.

**Enforcement**: Phase 4 requires a fix plan before implementation. Changes exceeding 3 files require written justification.

**What counts as minimal**:
- Adding a missing null check
- Fixing an incorrect condition
- Correcting a wrong variable reference
- Adding a missing import or dependency

**What is NOT minimal**:
- Refactoring the function "while we're here"
- Renaming variables for clarity
- Adding error handling to unrelated code paths
- Reformatting surrounding code

### Rule 4: Regression Test Required

**Statement**: Every fix must include a test that:
1. Fails when the fix is reverted (proves it tests the bug)
2. Passes when the fix is applied (proves the fix works)

**Enforcement**: Phase 4 requires a regression test before the phase is marked complete.

**Test requirements**:
- Test name clearly references the bug scenario
- Test exercises the exact code path of the root cause
- Test is deterministic (no timing dependencies, no external services)
- Test is placed in the appropriate test file for the affected module

### Rule 5: 3-Strike Escalation on Hypothesis Failure

**Statement**: If 3 consecutive hypothesis tests produce no actionable insight, the investigation must STOP and escalate with a full diagnostic dump.

**Enforcement**: Phase 3 tracks a strike counter. On the 3rd consecutive unproductive failure, execution halts and outputs the escalation block.

**What counts as a strike**:
- Hypothesis rejected AND no new insight gained
- Test was inconclusive AND no narrowing of search space

**What does NOT count as a strike**:
- Hypothesis rejected BUT new evidence narrows the search
- Hypothesis rejected BUT reveals a different potential cause
- Test inconclusive BUT identifies a new area to investigate

**Post-escalation**: Status set to BLOCKED. No further automated investigation. Preserve all intermediate outputs for human review.
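The strike accounting above can be sketched as a small shell helper (illustrative only; the reset on a productive result follows from the word "consecutive" in the statement):

```shell
#!/bin/sh
# Track consecutive unproductive hypothesis failures.
# A result counts as a strike only when it is rejected/inconclusive AND
# yields no new insight; any productive result resets the counter.
strikes=0

record_result() {  # $1 = "strike" or "productive"
  if [ "$1" = "strike" ]; then
    strikes=$((strikes + 1))
  else
    strikes=0
  fi
  if [ "$strikes" -ge 3 ]; then
    echo "ESCALATE: 3 consecutive unproductive hypothesis tests"
  fi
}

record_result strike
record_result strike
record_result strike   # third consecutive strike triggers the escalation line
```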

---

## Validation Checklist

Before completing any investigation, verify:

- [ ] Rule 1: Root cause confirmed before any fix was applied
- [ ] Rule 2: Bug reproduction documented (or static evidence justified)
- [ ] Rule 3: Fix changes only necessary code (file count, line count documented)
- [ ] Rule 4: Regression test exists and passes
- [ ] Rule 5: No more than 3 consecutive unproductive hypothesis tests (or escalation triggered)
125
.claude/skills/security-audit/SKILL.md
Normal file
@@ -0,0 +1,125 @@
---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
allowed-tools: Read, Write, Bash, Glob, Grep
---

# Security Audit

4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.

## Architecture Overview

```
+-------------------------------------------------------------------+
| Phase 1: Supply Chain Scan                                        |
| -> Dependency audit, secrets detection, CI/CD review, LLM risks   |
| -> Output: supply-chain-report.json                               |
+-----------------------------------+-------------------------------+
                                    |
+-----------------------------------v-------------------------------+
| Phase 2: OWASP Review                                             |
| -> OWASP Top 10 2021 code-level analysis via ccw cli              |
| -> Output: owasp-findings.json                                    |
+-----------------------------------+-------------------------------+
                                    |
+-----------------------------------v-------------------------------+
| Phase 3: Threat Modeling (STRIDE)                                 |
| -> 6 threat categories mapped to architecture components          |
| -> Output: threat-model.json                                      |
+-----------------------------------+-------------------------------+
                                    |
+-----------------------------------v-------------------------------+
| Phase 4: Report & Tracking                                        |
| -> Score calculation, trend comparison, dated report              |
| -> Output: .workflow/.security/audit-report-{date}.json           |
+-------------------------------------------------------------------+
```

## Key Design Principles

1. **Infrastructure-first**: Phase 1 catches low-hanging fruit (leaked secrets, vulnerable deps) before deeper analysis
2. **Standards-based**: OWASP Top 10 2021 and STRIDE provide systematic coverage
3. **Scoring gates**: Daily quick-scan must score >= 8/10; a comprehensive audit accepts a minimum of 2/10 for the initial baseline
4. **Trend tracking**: Each audit compares against prior results in `.workflow/.security/`

## Execution Flow

### Quick-Scan Mode (daily)

Run Phase 1 only. Must score >= 8/10 to pass.

### Comprehensive Mode (full audit)

Run all 4 phases sequentially. Initial baseline minimum 2/10.

### Phase Sequence

1. **Phase 1: Supply Chain Scan** -- [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md)
   - Dependency audit (npm audit / pip-audit / safety check)
   - Secrets detection (API keys, tokens, passwords in source)
   - CI/CD config review (injection risks in workflow YAML)
   - LLM/AI prompt injection check
2. **Phase 2: OWASP Review** -- [phases/02-owasp-review.md](phases/02-owasp-review.md)
   - Systematic OWASP Top 10 2021 code review
   - Uses `ccw cli --tool gemini --mode analysis --rule analysis-assess-security-risks`
3. **Phase 3: Threat Modeling** -- [phases/03-threat-modeling.md](phases/03-threat-modeling.md)
   - STRIDE threat model mapped to architecture components
   - Trust boundary identification and attack surface assessment
4. **Phase 4: Report & Tracking** -- [phases/04-report-tracking.md](phases/04-report-tracking.md)
   - Score calculation with severity weights
   - Trend comparison with previous audits
   - Date-stamped report to `.workflow/.security/`

## Scoring Overview

See [specs/scoring-gates.md](specs/scoring-gates.md) for the full specification.

| Severity | Weight | Example |
|----------|--------|---------|
| Critical | 10 | RCE, SQL injection, leaked credentials |
| High | 7 | Broken auth, SSRF, privilege escalation |
| Medium | 4 | XSS, CSRF, verbose error messages |
| Low | 1 | Missing headers, informational disclosures |

**Gates**: Daily quick-scan >= 8/10, Comprehensive initial >= 2/10.
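The exact formula lives in specs/scoring-gates.md; one plausible shape, sketched purely for illustration, assumes the score starts at 10 and subtracts the weighted finding counts, floored at 0:

```shell
#!/bin/sh
# Hypothetical score calculation; the real formula is defined in
# specs/scoring-gates.md. This sketch only illustrates severity weighting.
# Assumed: score = max(0, 10 - (10*critical + 7*high + 4*medium + 1*low)).
critical=0; high=0; medium=1; low=2

penalty=$((10 * critical + 7 * high + 4 * medium + 1 * low))
score=$((10 - penalty))
if [ "$score" -lt 0 ]; then score=0; fi

echo "score: $score/10"
if [ "$score" -ge 8 ]; then echo "quick-scan gate: PASS"; else echo "quick-scan gate: FAIL"; fi
```

With one medium and two low findings the assumed penalty is 6, so the sketch reports 4/10 and fails the daily gate.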

## Directory Setup

```bash
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"
```

## Output Structure

```
.workflow/.security/
  audit-report-{YYYY-MM-DD}.json   # Dated audit report
  supply-chain-report.json         # Latest supply chain scan
  owasp-findings.json              # Latest OWASP findings
  threat-model.json                # Latest STRIDE threat model
```

## Reference Documents

| Document | Purpose |
|----------|---------|
| [phases/01-supply-chain-scan.md](phases/01-supply-chain-scan.md) | Dependency, secrets, CI/CD, LLM risk scan |
| [phases/02-owasp-review.md](phases/02-owasp-review.md) | OWASP Top 10 2021 code review |
| [phases/03-threat-modeling.md](phases/03-threat-modeling.md) | STRIDE threat modeling |
| [phases/04-report-tracking.md](phases/04-report-tracking.md) | Report generation and trend tracking |
| [specs/scoring-gates.md](specs/scoring-gates.md) | Scoring system and quality gates |
| [specs/owasp-checklist.md](specs/owasp-checklist.md) | OWASP Top 10 detection patterns |

## Completion Status Protocol

This skill follows the Completion Status Protocol defined in `_shared/SKILL-DESIGN-SPEC.md` sections 13-14.

Possible termination statuses:
- **DONE**: All phases completed, score calculated, report generated
- **DONE_WITH_CONCERNS**: Audit completed but findings exceed acceptable thresholds
- **BLOCKED**: Required tools unavailable (e.g., npm/pip not installed), permission denied
- **NEEDS_CONTEXT**: Ambiguous project scope, unclear trust boundaries

Escalation follows the Three-Strike Rule (section 14) per step.
139
.claude/skills/security-audit/phases/01-supply-chain-scan.md
Normal file
@@ -0,0 +1,139 @@
# Phase 1: Supply Chain Scan

Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM/AI integrations.

## Objective

- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities

## Execution Steps

### Step 1: Dependency Audit

Detect the package manager and run the appropriate audit tool.

```bash
# Node.js projects
if [ -f package-lock.json ] || [ -f yarn.lock ]; then
  npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>&1 || true
fi

# Python projects
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
  pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" 2>&1 || true
  # Fallback: safety check
  safety check --json > "${WORK_DIR}/safety-raw.json" 2>&1 || true
fi

# Go projects
if [ -f go.sum ]; then
  govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```

If audit tools are not installed, log an INFO finding and continue.
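The "log and continue" behavior can be wrapped in a small guard (an illustrative helper, not part of the skill; the tool name in the demo call is a placeholder):

```shell
#!/bin/sh
# Run an audit tool only if it is installed; otherwise emit an INFO
# finding and keep going, so a missing tool never fails the whole scan.
run_audit() {
  tool="$1"; shift
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" "$@" || true
  else
    echo "INFO: $tool not installed; skipping" >&2
  fi
}

# "example-audit-tool" is a placeholder name, not a real CLI.
run_audit example-audit-tool --json
```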

### Step 2: Secrets Detection

Scan source files for hardcoded secrets using regex patterns.

```bash
# High-confidence patterns (case-insensitive)
grep -rniE \
  '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["'"'"'][A-Za-z0-9+/=_-]{16,}' \
  --include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
  --include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
  --include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
  . || true

# AWS patterns
grep -rniE '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true

# Private keys ("--" stops option parsing, since the pattern starts with a dash)
grep -rniE -- '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true

# Connection strings with passwords
grep -rniE '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true

# JWT tokens (hardcoded)
grep -rniE 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```

Exclude: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.
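The exclusions can be applied directly with GNU grep's `--exclude-dir`/`--exclude` flags; a self-contained sketch (the temp-dir layout and the planted secret are illustrative):

```shell
#!/bin/sh
# Demonstrate the secret scan with the exclusions applied:
# the hit in src/ is reported, the copy under node_modules/ is skipped.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/node_modules"
printf 'api_key = "ABCDEF1234567890abcdef"\n' > "$demo/src/config.py"
printf 'api_key = "ABCDEF1234567890abcdef"\n' > "$demo/node_modules/vendor.js"

grep -rniE \
  --exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist \
  --exclude-dir=build --exclude-dir=__pycache__ \
  --exclude='*.lock' --exclude='*.min.js' \
  'api[_-]?key[[:space:]]*[:=][[:space:]]*["'"'"'][A-Za-z0-9+/=_-]{16,}' \
  "$demo" || true
```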

### Step 3: CI/CD Config Review

Check GitHub Actions and other CI/CD configs for injection risks.

```bash
# Find workflow files
find .github/workflows \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null

# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true

# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true

# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true

# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```

### Step 4: LLM/AI Prompt Injection Check

Scan for patterns indicating prompt injection risk in LLM integrations.

```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
  --include='*.py' . || true
```

## Output

- **File**: `supply-chain-report.json`
- **Location**: `${WORK_DIR}/supply-chain-report.json`
- **Format**: JSON

```json
{
  "phase": "supply-chain-scan",
  "timestamp": "ISO-8601",
  "findings": [
    {
      "category": "dependency|secret|cicd|llm",
      "severity": "critical|high|medium|low",
      "title": "Finding title",
      "description": "Detailed description",
      "file": "path/to/file",
      "line": 42,
      "evidence": "matched text or context",
      "remediation": "How to fix"
    }
  ],
  "summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
  }
}
```
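The `by_severity` tally can be derived from the findings array with a crude text count (a sketch only; `jq` would be more robust, and this assumes one occurrence of `"severity"` per finding in compact JSON):

```shell
#!/bin/sh
# Count findings per severity from a compact findings JSON.
# The inline report is sample data for illustration.
report='{"findings":[{"severity":"high"},{"severity":"low"},{"severity":"high"}]}'

for sev in critical high medium low; do
  n=$(printf '%s' "$report" | grep -o "\"severity\":\"$sev\"" | wc -l)
  echo "$sev: $n"
done
```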

## Next Phase

Proceed to [Phase 2: OWASP Review](02-owasp-review.md) with supply chain findings as context.
156
.claude/skills/security-audit/phases/02-owasp-review.md
Normal file
@@ -0,0 +1,156 @@
# Phase 2: OWASP Review

Systematic code-level review against the OWASP Top 10 2021 categories.

## Objective

- Review the codebase against all 10 OWASP Top 10 2021 categories
- Use CCW CLI multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps

## Prerequisites

- Phase 1 `supply-chain-report.json` (provides dependency context)
- Read [specs/owasp-checklist.md](../specs/owasp-checklist.md) for detection patterns

## Execution Steps

### Step 1: Identify Target Scope

```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
  ! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
  ! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
  | head -200
```

### Step 2: CCW CLI Analysis

Run multi-model security analysis using the security risks rule template.

```bash
ccw cli -p "PURPOSE: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF

TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.

MODE: analysis

CONTEXT: @src/**/* @**/*.config.* @**/*.env.example

EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.

CONSTRAINTS: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities, not theoretical risks
" --tool gemini --mode analysis --rule analysis-assess-security-risks
```

### Step 3: Manual Pattern Scanning

Supplement the CLI analysis with targeted pattern scans per OWASP category. See [specs/owasp-checklist.md](../specs/owasp-checklist.md) for the full pattern list.

**A01 - Broken Access Control**:
```bash
# Missing auth middleware on routes
grep -rn 'app\.\(get\|post\|put\|delete\|patch\)(' --include='*.ts' --include='*.js' . | grep -v 'auth\|middleware\|protect'
# Direct object references without ownership check
grep -rn 'params\.id\|req\.params\.' --include='*.ts' --include='*.js' . || true
```

**A03 - Injection**:
```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```

**A05 - Security Misconfiguration**:
```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```

**A07 - Identification and Authentication Failures**:
```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["'"'"'][^"'"'"']{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```

### Step 4: Consolidate Findings

Merge the CLI analysis results and the manual pattern scan results. Deduplicate and classify by OWASP category.
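Deduplication can be sketched over a flattened tab-separated intermediate (the `file / line / owasp_id / title` layout is an assumption for illustration, not a format the skill mandates):

```shell
#!/bin/sh
# Deduplicate findings that share (file, line, owasp_id).
# The TSV intermediate and its sample rows are assumed for the demo.
printf '%s\t%s\t%s\t%s\n' \
  src/app.ts 42 A01 'missing auth middleware' \
  src/app.ts 42 A01 'missing auth middleware (cli)' \
  src/db.ts  10 A03 'sql concatenation' \
  > /tmp/findings.tsv

tab=$(printf '\t')
sort -t "$tab" -k1,3 -u /tmp/findings.tsv   # keeps one row per (file, line, owasp_id)
```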

## OWASP Top 10 2021 Categories

| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |

## Output

- **File**: `owasp-findings.json`
- **Location**: `${WORK_DIR}/owasp-findings.json`
- **Format**: JSON

```json
{
  "phase": "owasp-review",
  "timestamp": "ISO-8601",
  "owasp_version": "2021",
  "findings": [
    {
      "owasp_id": "A01",
      "owasp_category": "Broken Access Control",
      "severity": "critical|high|medium|low",
      "title": "Finding title",
      "description": "Detailed description",
      "file": "path/to/file",
      "line": 42,
      "evidence": "code snippet or pattern match",
      "remediation": "Specific fix recommendation",
      "cwe": "CWE-XXX"
    }
  ],
  "coverage": {
    "A01": "checked|not_applicable",
    "A02": "checked|not_applicable",
    "A03": "checked|not_applicable",
    "A04": "checked|not_applicable",
    "A05": "checked|not_applicable",
    "A06": "checked|not_applicable",
    "A07": "checked|not_applicable",
    "A08": "checked|not_applicable",
    "A09": "checked|not_applicable",
    "A10": "checked|not_applicable"
  },
  "summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "categories_checked": 10,
    "categories_with_findings": 0
  }
}
```

## Next Phase

Proceed to [Phase 3: Threat Modeling](03-threat-modeling.md) with OWASP findings as input for STRIDE analysis.
180
.claude/skills/security-audit/phases/03-threat-modeling.md
Normal file
@@ -0,0 +1,180 @@
|
||||
# Phase 3: Threat Modeling (STRIDE)

Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.

## Objective

- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings

## STRIDE Categories

| Category | Threat | Question | Typical Targets |
|----------|--------|----------|-----------------|
| **S** - Spoofing | Identity impersonation | Can an attacker pretend to be someone else? | Auth endpoints, API keys, session tokens |
| **T** - Tampering | Data modification | Can data be modified in transit or at rest? | Request bodies, database records, config files |
| **R** - Repudiation | Deniable actions | Can a user deny performing an action? | Audit logs, transaction records, user actions |
| **I** - Information Disclosure | Data leakage | Can sensitive data be exposed? | Error messages, logs, API responses, storage |
| **D** - Denial of Service | Availability disruption | Can the system be made unavailable? | API endpoints, resource-intensive operations |
| **E** - Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? | Role checks, admin routes, permission logic |

## Execution Steps

### Step 1: Architecture Component Discovery

Identify major system components by scanning the project structure.

```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true
```

### Step 2: Trust Boundary Identification

Map trust boundaries in the system:

1. **External boundary**: User/browser <-> Application server
2. **Service boundary**: Application <-> External APIs/services
3. **Data boundary**: Application <-> Database/storage
4. **Internal boundary**: Public routes <-> Authenticated routes <-> Admin routes
5. **Process boundary**: Main process <-> Worker/subprocess

For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails

### Step 3: STRIDE per Component

For each discovered component, systematically evaluate all 6 STRIDE categories:

**Spoofing Analysis**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?

**Tampering Analysis**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?

**Repudiation Analysis**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?

**Information Disclosure Analysis**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?

**Denial of Service Analysis**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?

**Elevation of Privilege Analysis**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?

### Step 4: Attack Surface Assessment

Quantify the attack surface:

```
Attack Surface = Sum of:
- Number of public API endpoints
- Number of external service integrations
- Number of user-controllable input points
- Number of privileged operations
- Number of data stores with sensitive content
```

Rate each component:
- **High exposure**: Public-facing, handles sensitive data, complex logic
- **Medium exposure**: Authenticated access, moderate data sensitivity
- **Low exposure**: Internal only, no sensitive data, simple operations
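The tally and the exposure ratings above can be sketched as small helpers. This is a minimal illustration, not part of the skill itself; the field names mirror the `attack_surface` object in `threat-model.json`, while the exposure thresholds are an assumed mapping of the three criteria.

```python
def attack_surface_score(public_endpoints, external_integrations,
                         input_points, privileged_operations,
                         sensitive_data_stores):
    """Sum the five counts, matching the attack_surface fields."""
    return (public_endpoints + external_integrations + input_points
            + privileged_operations + sensitive_data_stores)

def exposure(public_facing, sensitive_data, complex_logic):
    """Assumed mapping of the High/Medium/Low criteria above."""
    if public_facing and (sensitive_data or complex_logic):
        return "high"
    if public_facing or sensitive_data:
        return "medium"
    return "low"
```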
## Output

- **File**: `threat-model.json`
- **Location**: `${WORK_DIR}/threat-model.json`
- **Format**: JSON

```json
{
  "phase": "threat-modeling",
  "timestamp": "ISO-8601",
  "framework": "STRIDE",
  "components": [
    {
      "name": "Component name",
      "type": "api_endpoint|data_store|external_service|auth_module|worker",
      "files": ["path/to/file.ts"],
      "exposure": "high|medium|low",
      "trust_boundaries": ["external", "data"],
      "threats": {
        "spoofing": {
          "applicable": true,
          "findings": ["Description of threat"],
          "mitigations": ["Existing mitigation"],
          "gaps": ["Missing mitigation"]
        },
        "tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
      }
    }
  ],
  "trust_boundaries": [
    {
      "name": "Boundary name",
      "from": "Component A",
      "to": "Component B",
      "enforcement": "TLS|auth_middleware|API_key",
      "data_crossing": ["request bodies", "credentials"],
      "risk_level": "high|medium|low"
    }
  ],
  "attack_surface": {
    "public_endpoints": 0,
    "external_integrations": 0,
    "input_points": 0,
    "privileged_operations": 0,
    "sensitive_data_stores": 0,
    "total_score": 0
  },
  "summary": {
    "components_analyzed": 0,
    "threats_identified": 0,
    "by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
    "high_exposure_components": 0
  }
}
```

## Next Phase

Proceed to [Phase 4: Report & Tracking](04-report-tracking.md) with the threat model to generate the final scored audit report.
177  .claude/skills/security-audit/phases/04-report-tracking.md  Normal file
@@ -0,0 +1,177 @@
# Phase 4: Report & Tracking

Generate a scored audit report, compare with previous audits, and track trends.

## Objective

- Calculate the security score from all phase findings
- Compare with previous audit results (if available)
- Generate a date-stamped report in `.workflow/.security/`
- Track improvement or regression trends

## Prerequisites

- Phase 1: `supply-chain-report.json`
- Phase 2: `owasp-findings.json`
- Phase 3: `threat-model.json`
- Previous audit: `.workflow/.security/audit-report-*.json` (optional)

## Execution Steps

### Step 1: Aggregate Findings

Collect all findings from phases 1-3 and classify by severity.

```
All findings =
  supply-chain-report.findings
  + owasp-findings.findings
  + threat-model threats (where gaps exist)
```
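The aggregation rule above can be sketched as follows, assuming the three phase reports have already been parsed into dicts. Defaulting gap-derived STRIDE findings to `"medium"` severity is an assumption; the source does not specify a severity for them.

```python
def aggregate_findings(supply_chain, owasp, threat_model):
    """Merge parsed phase reports into one findings list.

    STRIDE threats count only where mitigation gaps exist, per the
    aggregation rule; gap findings default to "medium" (an assumption).
    """
    findings = list(supply_chain.get("findings", []))
    findings += owasp.get("findings", [])
    for comp in threat_model.get("components", []):
        for category, threat in comp.get("threats", {}).items():
            for gap in threat.get("gaps", []):
                findings.append({
                    "severity": "medium",
                    "title": f"{category} gap in {comp['name']}",
                    "description": gap,
                })
    return findings
```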
### Step 2: Calculate Score

Apply the scoring formula from [specs/scoring-gates.md](../specs/scoring-gates.md):

```
Base score = 10.0

Severity weights:
  - Critical: weight = 10 (each critical finding has outsized impact)
  - High: weight = 7
  - Medium: weight = 4
  - Low: weight = 1

Normalization factor = max(10, total_files_scanned)

Weighted penalty = SUM(finding_weight * count_per_severity) / normalization_factor
Final score = max(0, 10.0 - weighted_penalty)
```

**Score interpretation**:

| Score | Rating | Meaning |
|-------|--------|---------|
| 9-10 | Excellent | Minimal risk, production-ready |
| 7-8 | Good | Acceptable risk, minor improvements needed |
| 5-6 | Fair | Notable risks, remediation recommended |
| 3-4 | Poor | Significant risks, remediation required |
| 0-2 | Critical | Severe vulnerabilities, immediate action needed |

### Step 3: Gate Evaluation

**Daily quick-scan gate** (Phase 1 only):
- PASS: score >= 8/10
- FAIL: score < 8/10 -- block deployment or flag for review

**Comprehensive audit gate** (all phases):
- For initial/baseline: PASS if score >= 2/10 (establishes baseline)
- For subsequent: PASS if score >= previous_score (no regression)
- Target: score >= 7/10 for production readiness
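The gate rules just listed can be written out as a small function. This sketch follows the thresholds stated in this phase (the quick-scan table in `scoring-gates.md` additionally defines a WARN band, which is omitted here for brevity).

```python
def evaluate_gate(score, mode, previous_score=None):
    """Return "PASS" or "FAIL" per the gate rules above.

    mode is "quick-scan" or "comprehensive"; a comprehensive audit
    with no previous_score is treated as the baseline audit.
    """
    if mode == "quick-scan":
        return "PASS" if score >= 8.0 else "FAIL"
    if previous_score is None:                         # baseline audit
        return "PASS" if score >= 2.0 else "FAIL"
    return "PASS" if score >= previous_score else "FAIL"  # no regression
```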
### Step 4: Trend Comparison

```bash
# Find previous audit reports
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```

Compare current vs. previous:
- Delta per OWASP category
- Delta per STRIDE category
- New findings vs. resolved findings
- Overall score trend
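A sketch of how the score trend could be classified for the report's `trend` block; the zero-delta cutoff for "stable" is an assumption, since the source does not define the exact boundaries.

```python
def trend(current_score, previous_score=None, new=0, resolved=0):
    """Classify the score trend for the trend block of the report."""
    if previous_score is None:
        return {"score_delta": 0, "direction": "baseline"}
    delta = round(current_score - previous_score, 1)
    if delta > 0:
        direction = "improving"
    elif delta < 0:
        direction = "regressing"
    else:
        direction = "stable"
    return {"score_delta": delta, "direction": direction,
            "new_findings": new, "resolved_findings": resolved}
```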
### Step 5: Generate Report

Write the final report with all consolidated data.

## Output

- **File**: `audit-report-{YYYY-MM-DD}.json`
- **Location**: `.workflow/.security/audit-report-{YYYY-MM-DD}.json`
- **Format**: JSON

```json
{
  "report": "security-audit",
  "version": "1.0",
  "timestamp": "ISO-8601",
  "date": "YYYY-MM-DD",
  "mode": "comprehensive|quick-scan",
  "score": {
    "overall": 7.5,
    "rating": "Good",
    "gate": "PASS|FAIL",
    "gate_threshold": 8
  },
  "findings_summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "by_phase": {
      "supply_chain": 0,
      "owasp": 0,
      "stride": 0
    },
    "by_owasp": {
      "A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
      "A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
    },
    "by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
  },
  "top_risks": [
    {
      "rank": 1,
      "title": "Most critical finding",
      "severity": "critical",
      "source_phase": "owasp",
      "remediation": "How to fix",
      "effort": "low|medium|high"
    }
  ],
  "trend": {
    "previous_date": "YYYY-MM-DD or null",
    "previous_score": 0,
    "score_delta": 0,
    "new_findings": 0,
    "resolved_findings": 0,
    "direction": "improving|stable|regressing|baseline"
  },
  "phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
  "files_scanned": 0,
  "remediation_priority": [
    {
      "priority": 1,
      "finding": "Finding title",
      "effort": "low",
      "impact": "high",
      "recommendation": "Specific action"
    }
  ]
}
```

## Report Storage

```bash
# Ensure directory exists
mkdir -p .workflow/.security

# Write report with date stamp
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" ".workflow/.security/audit-report-${DATE}.json"

# Also maintain latest copies of phase outputs
cp "${WORK_DIR}/supply-chain-report.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/owasp-findings.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/threat-model.json" ".workflow/.security/" 2>/dev/null || true
```

## Completion

After report generation, output the skill completion status per the Completion Status Protocol:

- **DONE**: All phases completed, report generated, score calculated
- **DONE_WITH_CONCERNS**: Report generated but score below target or regression detected
- **BLOCKED**: Phase data missing or corrupted
442  .claude/skills/security-audit/specs/owasp-checklist.md  Normal file
@@ -0,0 +1,442 @@
# OWASP Top 10 2021 Checklist

Code-level detection patterns, vulnerable code examples, and remediation templates for each OWASP category.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| Phase 2 | Reference during OWASP code review | All categories |
| Phase 4 | Classify findings by OWASP category | Category IDs |

---

## A01: Broken Access Control

**CWE**: CWE-200, CWE-284, CWE-285, CWE-352, CWE-639

### Detection Patterns

```bash
# Missing auth middleware on route handlers
grep -rnE 'app\.(get|post|put|delete|patch)\s*\(\s*["\x27/]' --include='*.ts' --include='*.js' .
# Then verify each route has auth middleware

# Direct object reference without ownership check
grep -rnE 'findById\(.*params|findOne\(.*params|\.get\(.*id' --include='*.ts' --include='*.js' --include='*.py' .

# Path traversal patterns
grep -rnE '(readFile|writeFile|createReadStream|open)\s*\(.*req\.' --include='*.ts' --include='*.js' .
grep -rnE 'os\.path\.join\(.*request\.' --include='*.py' .

# Missing CORS restrictions
grep -rnE 'Access-Control-Allow-Origin.*\*|cors\(\s*\)' --include='*.ts' --include='*.js' .
```

### Vulnerable Code Example

```javascript
// BAD: No ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
  const doc = await Document.findById(req.params.id); // Any user can access any doc
  res.json(doc);
});
```

### Remediation

```javascript
// GOOD: Ownership check
app.get('/api/documents/:id', auth, async (req, res) => {
  const doc = await Document.findOne({ _id: req.params.id, owner: req.user.id });
  if (!doc) return res.status(404).json({ error: 'Not found' });
  res.json(doc);
});
```

---

## A02: Cryptographic Failures

**CWE**: CWE-259, CWE-327, CWE-331, CWE-798

### Detection Patterns

```bash
# Weak hash algorithms
grep -rniE '(md5|sha1)\s*\(' --include='*.ts' --include='*.js' --include='*.py' --include='*.java' .

# Plaintext password storage
grep -rniE 'password\s*[:=]\s*.*\.(body|query|params)' --include='*.ts' --include='*.js' .

# Hardcoded encryption keys
grep -rniE '(encrypt|cipher|secret|key)\s*[:=]\s*["\x27][A-Za-z0-9+/=]{8,}' --include='*.ts' --include='*.js' --include='*.py' .

# HTTP (not HTTPS) for sensitive operations
grep -rniE 'http://.*\.(api|auth|login|payment)' --include='*.ts' --include='*.js' --include='*.py' .

# Missing encryption at rest
grep -rniE '(password|ssn|credit.?card|social.?security)' --include='*.sql' --include='*.prisma' --include='*.schema' .
```

### Vulnerable Code Example

```python
# BAD: MD5 for password hashing
import hashlib
password_hash = hashlib.md5(password.encode()).hexdigest()
```

### Remediation

```python
# GOOD: bcrypt with proper work factor
import bcrypt
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
```

---

## A03: Injection

**CWE**: CWE-20, CWE-74, CWE-79, CWE-89

### Detection Patterns

```bash
# SQL string concatenation/interpolation
grep -rniE "(query|execute|raw)\s*\(\s*[\`\"'].*(\+|\$\{|%s|\.format)" --include='*.ts' --include='*.js' --include='*.py' .
grep -rniE "f[\"'].*SELECT.*\{" --include='*.py' .

# NoSQL injection
grep -rniE '\$where|\$regex.*req\.' --include='*.ts' --include='*.js' .
grep -rniE 'find\(\s*\{.*req\.(body|query|params)' --include='*.ts' --include='*.js' .

# OS command injection
grep -rniE '(child_process|exec|execSync|spawn|system|popen|subprocess)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .

# XPath/LDAP injection
grep -rniE '(xpath|ldap).*\+.*req\.' --include='*.ts' --include='*.js' --include='*.py' .

# Template injection
grep -rniE '(render_template_string|Template\(.*req\.|eval\(.*req\.)' --include='*.py' --include='*.js' .
```

### Vulnerable Code Example

```javascript
// BAD: SQL string concatenation
const result = await db.query(`SELECT * FROM users WHERE id = ${req.params.id}`);
```

### Remediation

```javascript
// GOOD: Parameterized query
const result = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
```

---

## A04: Insecure Design

**CWE**: CWE-209, CWE-256, CWE-501, CWE-522

### Detection Patterns

```bash
# Missing rate limiting on auth endpoints
grep -rniE '(login|register|reset.?password|forgot.?password)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if rate limiting middleware is applied

# No account lockout mechanism
grep -rniE 'failed.?login|login.?attempt|max.?retries' --include='*.ts' --include='*.js' --include='*.py' .

# Business logic without validation
grep -rniE '(transfer|withdraw|purchase|delete.?account)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check for confirmation/validation steps
```

### Checks

- [ ] Authentication flows have rate limiting
- [ ] Account lockout after N failed attempts
- [ ] Multi-step operations have proper state validation
- [ ] Business-critical operations require confirmation
- [ ] Threat modeling has been performed (see Phase 3)

### Remediation

Implement defense-in-depth: rate limiting, input validation, business logic validation, and multi-step confirmation for critical operations.
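As a hedged illustration of the "account lockout after N failed attempts" check, here is a hypothetical in-memory lockout tracker. It is a sketch only: a real deployment would back the attempt counts with a shared store such as Redis so lockouts survive restarts and apply across instances.

```python
import time

class LockoutTracker:
    """Hypothetical sliding-window account-lockout sketch (in-memory)."""

    def __init__(self, max_attempts=5, window_seconds=900):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._attempts = {}  # username -> list of failure timestamps

    def record_failure(self, username, now=None):
        """Record a failed login, discarding failures outside the window."""
        now = time.time() if now is None else now
        recent = [t for t in self._attempts.get(username, [])
                  if now - t < self.window]
        recent.append(now)
        self._attempts[username] = recent

    def is_locked(self, username, now=None):
        """Locked when max_attempts failures fall within the window."""
        now = time.time() if now is None else now
        recent = [t for t in self._attempts.get(username, [])
                  if now - t < self.window]
        return len(recent) >= self.max_attempts
```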
---

## A05: Security Misconfiguration

**CWE**: CWE-2, CWE-11, CWE-13, CWE-15, CWE-16, CWE-388

### Detection Patterns

```bash
# Debug mode enabled
grep -rniE '(DEBUG|NODE_ENV)\s*[:=]\s*(true|True|1|"development"|"debug")' \
  --include='*.env' --include='*.env.*' --include='*.py' --include='*.json' --include='*.yaml' .

# Default credentials
grep -rniE '(admin|root|test|default).*[:=].*password' --include='*.env' --include='*.yaml' --include='*.json' --include='*.py' .

# Verbose error responses (stack traces to client)
grep -rniE '(stack|stackTrace|traceback).*res\.(json|send)|app\.use.*err.*stack' --include='*.ts' --include='*.js' .

# Missing security headers
grep -rniE '(helmet|X-Frame-Options|X-Content-Type-Options|Strict-Transport-Security)' --include='*.ts' --include='*.js' .

# Directory listing enabled
grep -rniE 'autoindex\s+on|directory.?listing|serveStatic.*index.*false' --include='*.conf' --include='*.ts' --include='*.js' .

# Unnecessary features/services
grep -rniE '(graphiql|playground|swagger-ui).*true' --include='*.ts' --include='*.js' --include='*.py' --include='*.yaml' .
```

### Vulnerable Code Example

```javascript
// BAD: Stack trace in error response
app.use((err, req, res, next) => {
  res.status(500).json({ error: err.message, stack: err.stack });
});
```

### Remediation

```javascript
// GOOD: Generic error response in production
app.use((err, req, res, next) => {
  console.error(err.stack); // Log internally
  res.status(500).json({ error: 'Internal server error' });
});
```

---

## A06: Vulnerable and Outdated Components

**CWE**: CWE-1104

### Detection Patterns

```bash
# Check dependency lock files age
ls -la package-lock.json yarn.lock requirements.txt Pipfile.lock go.sum 2>/dev/null

# Run package audits (from Phase 1)
npm audit --json 2>/dev/null
pip-audit --format json 2>/dev/null

# Check for pinned vs unpinned dependencies
grep -E ':\s*"\^|:\s*"~|:\s*"\*|>=\s' package.json 2>/dev/null
grep -E '^[a-zA-Z].*[^=]==[^=]' requirements.txt 2>/dev/null  # Good: pinned
grep -E '^[a-zA-Z].*>=|^[a-zA-Z][^=]*$' requirements.txt 2>/dev/null  # Bad: unpinned
```

### Checks

- [ ] All dependencies have pinned versions
- [ ] No known CVEs in dependencies (via audit tools)
- [ ] Dependencies are actively maintained (not abandoned)
- [ ] Lock files are committed to version control

### Remediation

Run `npm audit fix` or `pip install --upgrade` for vulnerable packages. Pin all dependency versions. Set up automated dependency scanning (Dependabot, Renovate).

---

## A07: Identification and Authentication Failures

**CWE**: CWE-255, CWE-259, CWE-287, CWE-384

### Detection Patterns

```bash
# Weak password requirements
grep -rniE 'password.*length.*[0-5]|minlength.*[0-5]|min.?length.*[0-5]' --include='*.ts' --include='*.js' --include='*.py' .

# Missing password hashing
grep -rniE 'password\s*[:=].*req\.' --include='*.ts' --include='*.js' .
# Then check if bcrypt/argon2/scrypt is used before storage

# Session fixation (no rotation after login)
grep -rniE 'session\.regenerate|session\.id\s*=' --include='*.ts' --include='*.js' .

# JWT without expiration
grep -rniE 'jwt\.sign\(' --include='*.ts' --include='*.js' .
# Then check for expiresIn option

# Credentials in URL
grep -rniE '(token|key|password|secret)=[^&\s]+' --include='*.ts' --include='*.js' --include='*.py' .
```

### Vulnerable Code Example

```javascript
// BAD: JWT without expiration
const token = jwt.sign({ userId: user.id }, SECRET);
```

### Remediation

```javascript
// GOOD: JWT with expiration and proper claims
const token = jwt.sign(
  { userId: user.id, role: user.role },
  SECRET,
  { expiresIn: '1h', issuer: 'myapp', audience: 'myapp-client' }
);
```

---

## A08: Software and Data Integrity Failures

**CWE**: CWE-345, CWE-353, CWE-426, CWE-494, CWE-502

### Detection Patterns

```bash
# Insecure deserialization
grep -rniE '(pickle\.load|yaml\.load\(|unserialize|JSON\.parse\(.*req\.|eval\()' --include='*.py' --include='*.ts' --include='*.js' --include='*.php' .

# Missing integrity checks on downloads/updates
grep -rniE '(download|fetch|curl|wget)' --include='*.sh' --include='*.yaml' --include='*.yml' .
# Then check for checksum/signature verification

# CI/CD pipeline without pinned action versions
grep -rniE 'uses:\s*[^@]+$|uses:.*@(main|master|latest)' .github/workflows/*.yml 2>/dev/null

# Unsafe YAML loading
grep -rniE 'yaml\.load\(' --include='*.py' .
# Should be yaml.safe_load()
```

### Vulnerable Code Example

```python
# BAD: Unsafe YAML loading
import yaml
data = yaml.load(user_input)  # Allows arbitrary code execution
```

### Remediation

```python
# GOOD: Safe YAML loading
import yaml
data = yaml.safe_load(user_input)
```

---

## A09: Security Logging and Monitoring Failures

**CWE**: CWE-223, CWE-532, CWE-778

### Detection Patterns

```bash
# Check for logging of auth events
grep -rniE '(log|logger|logging)\.' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if login/logout/failed-auth events are logged

# Sensitive data in logs
grep -rniE 'log.*(password|token|secret|credit.?card|ssn)' --include='*.ts' --include='*.js' --include='*.py' .

# Empty catch blocks (swallowed errors)
grep -rniE 'catch\s*\([^)]*\)\s*\{\s*\}' --include='*.ts' --include='*.js' .

# Missing audit trail for critical operations
grep -rniE '(delete|update|create|transfer)' --include='*.ts' --include='*.js' --include='*.py' .
# Then check if these operations are logged with user context
```

### Checks

- [ ] Failed login attempts are logged with IP and timestamp
- [ ] Successful logins are logged
- [ ] Access control failures are logged
- [ ] Input validation failures are logged
- [ ] Sensitive data is NOT logged (passwords, tokens, PII)
- [ ] Logs include sufficient context (who, what, when, where)

### Remediation

Implement structured logging with: user ID, action, timestamp, IP address, result (success/failure). Exclude sensitive data. Set up log monitoring and alerting for anomalous patterns.
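A minimal sketch of that remediation: a structured audit-log helper that records who/what/where/result and drops sensitive keys before emitting. The field set and the `SENSITIVE` list are assumptions for illustration, not a prescribed schema.

```python
import json
import logging

# Keys that must never reach the log (assumed deny-list for this sketch)
SENSITIVE = {"password", "token", "secret", "ssn", "credit_card"}

def audit_log(logger, user_id, action, ip, result, **context):
    """Emit one structured audit entry, filtering sensitive context keys."""
    entry = {"user_id": user_id, "action": action, "ip": ip, "result": result}
    entry.update({k: v for k, v in context.items() if k not in SENSITIVE})
    logger.info(json.dumps(entry))
    return entry
```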
---

## A10: Server-Side Request Forgery (SSRF)

**CWE**: CWE-918

### Detection Patterns

```bash
# User-controlled URLs in fetch/request calls
grep -rniE '(fetch|axios|http\.request|requests\.(get|post)|urllib)\s*\(.*req\.(body|query|params)' \
  --include='*.ts' --include='*.js' --include='*.py' .

# URL construction from user input
grep -rniE '(url|endpoint|target|redirect)\s*[:=].*req\.(body|query|params)' --include='*.ts' --include='*.js' --include='*.py' .

# Image/file fetch from URL
grep -rniE '(download|fetchImage|getFile|loadUrl)\s*\(.*req\.' --include='*.ts' --include='*.js' --include='*.py' .

# Redirect without validation
grep -rniE 'res\.redirect\(.*req\.|redirect_to.*request\.' --include='*.ts' --include='*.js' --include='*.py' .
```

### Vulnerable Code Example

```javascript
// BAD: Unvalidated URL fetch
app.get('/proxy', async (req, res) => {
  const response = await fetch(req.query.url); // Can access internal services
  res.send(await response.text());
});
```

### Remediation

```javascript
// GOOD: URL allowlist validation
const ALLOWED_HOSTS = ['api.example.com', 'cdn.example.com'];

app.get('/proxy', async (req, res) => {
  const url = new URL(req.query.url);
  if (!ALLOWED_HOSTS.includes(url.hostname)) {
    return res.status(400).json({ error: 'Host not allowed' });
  }
  if (url.protocol !== 'https:') {
    return res.status(400).json({ error: 'HTTPS required' });
  }
  const response = await fetch(url.toString());
  res.send(await response.text());
});
```

---

## Quick Reference

| ID | Category | Key Grep Pattern | Severity Baseline |
|----|----------|-----------------|-------------------|
| A01 | Broken Access Control | `findById.*params` without owner check | High |
| A02 | Cryptographic Failures | `md5\|sha1` for passwords | High |
| A03 | Injection | `query.*\+.*req\.\|f".*SELECT.*\{` | Critical |
| A04 | Insecure Design | Missing rate limit on auth routes | Medium |
| A05 | Security Misconfiguration | `DEBUG.*true\|stack.*res.json` | Medium |
| A06 | Vulnerable Components | `npm audit` / `pip-audit` results | Varies |
| A07 | Auth Failures | `jwt.sign` without `expiresIn` | High |
| A08 | Integrity Failures | `pickle.load\|yaml.load` | High |
| A09 | Logging Failures | Empty catch blocks, no auth logging | Medium |
| A10 | SSRF | `fetch.*req.query.url` | High |
141  .claude/skills/security-audit/specs/scoring-gates.md  Normal file
@@ -0,0 +1,141 @@
# Scoring Gates

Defines the 10-point scoring system, severity weights, quality gates, and trend tracking format for security audits.

## When to Use

| Phase | Usage | Section |
|-------|-------|---------|
| Phase 1 | Quick-scan scoring (daily gate) | Severity Weights, Daily Gate |
| Phase 4 | Full audit scoring and reporting | All sections |

---

## 10-Point Scale

All security audit scores are on a 0-10 scale where 10 = no findings and 0 = critical exposure.

| Score | Rating | Description |
|-------|--------|-------------|
| 9.0 - 10.0 | Excellent | Minimal risk. Production-ready without reservations. |
| 7.0 - 8.9 | Good | Low risk. Acceptable for production with minor improvements. |
| 5.0 - 6.9 | Fair | Moderate risk. Remediation recommended before production. |
| 3.0 - 4.9 | Poor | High risk. Remediation required. Not production-ready. |
| 0.0 - 2.9 | Critical | Severe exposure. Immediate action required. |

## Severity Weights

Each finding is weighted by severity for score calculation.

| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| **Critical** | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| **High** | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| **Medium** | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| **Low** | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |

## Score Calculation

```
Input:
  findings[]      -- array of all findings with severity
  files_scanned   -- total source files analyzed

Algorithm:
  base_score = 10.0
  normalization = max(10, files_scanned)

  weighted_sum = 0
  for each finding:
    weighted_sum += severity_weight(finding.severity)

  penalty = weighted_sum / normalization
  final_score = max(0, base_score - penalty)
  final_score = round(final_score, 1)

  return final_score
```
|
||||
**Example**:
|
||||
|
||||
| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|
||||
|----------|--------------|--------------|---------|-------|
|
||||
| 1 critical | 50 | 10 | 0.2 | 9.8 |
|
||||
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
|
||||
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
|
||||
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
|
||||
| 20 critical | 20 | 200 | 10.0 | 0.0 |
|
||||
|
||||
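The algorithm above can be sketched as a small shell function, with awk handling the floating-point math (severity counts and file count are passed as arguments):

```shell
# Compute the audit score from severity counts.
# Usage: score <critical> <high> <medium> <low> <files_scanned>
score() {
  awk -v c="$1" -v h="$2" -v m="$3" -v l="$4" -v f="$5" 'BEGIN {
    norm = (f > 10) ? f : 10                 # normalization = max(10, files_scanned)
    penalty = (c * 10 + h * 7 + m * 4 + l * 1) / norm
    s = 10.0 - penalty
    if (s < 0) s = 0                         # floor at 0
    printf "%.1f\n", s                       # round to one decimal
  }'
}

score 2 3 0 0 50   # 2 critical + 3 high over 50 files -> 9.2
```

Running it against the example table reproduces each row (e.g. `score 2 3 0 0 50` prints `9.2`).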
## Quality Gates

### Daily Quick-Scan Gate

Applies to Phase 1 (Supply Chain Scan) only.

| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 8.0 | Continue. No blocking issues. |
| **WARN** | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| **FAIL** | score < 6.0 | Block deployment. Remediate critical/high findings. |

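The daily gate is a straightforward threshold check; a minimal sketch:

```shell
# Classify a quick-scan score against the daily gate thresholds.
daily_gate() {
  awk -v s="$1" 'BEGIN {
    if (s >= 8.0)      print "PASS"
    else if (s >= 6.0) print "WARN"
    else               print "FAIL"
  }'
}

daily_gate 7.2   # falls in the 6.0 <= score < 8.0 band -> WARN
```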
### Comprehensive Audit Gate

Applies to the full audit (all 4 phases).

**Initial/Baseline audit** (no previous audit exists):

| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= 2.0 | Baseline established. Plan remediation. |
| **FAIL** | score < 2.0 | Critical exposure. Immediate triage required. |

**Subsequent audits** (previous audit exists):

| Result | Condition | Action |
|--------|-----------|--------|
| **PASS** | score >= previous_score | No regression. Continue improvement. |
| **WARN** | previous_score - 0.5 <= score < previous_score | Marginal change. Review new findings. |
| **FAIL** | score < previous_score - 0.5 | Regression detected. Investigate new findings. |

**Production readiness target**: score >= 7.0

## Trend Tracking Format

Each audit report stores trend data for comparison.

```json
{
  "trend": {
    "current_date": "2026-03-29",
    "current_score": 7.5,
    "previous_date": "2026-03-22",
    "previous_score": 6.8,
    "score_delta": 0.7,
    "new_findings": 2,
    "resolved_findings": 5,
    "direction": "improving",
    "history": [
      { "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
      { "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
      { "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
    ]
  }
}
```

**Direction values**:

| Direction | Condition |
|-----------|-----------|
| `improving` | score_delta > 0.5 |
| `stable` | -0.5 <= score_delta <= 0.5 |
| `regressing` | score_delta < -0.5 |
| `baseline` | No previous audit exists |

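The direction classification can be sketched as a function (an empty previous score means no prior audit):

```shell
# Classify trend direction from current and previous scores.
# Usage: direction <current_score> <previous_score>  (pass "" when no previous audit)
direction() {
  if [ -z "$2" ]; then
    echo "baseline"
    return
  fi
  awk -v c="$1" -v p="$2" 'BEGIN {
    d = c - p
    if (d > 0.5)       print "improving"
    else if (d < -0.5) print "regressing"
    else               print "stable"
  }'
}

direction 7.5 6.8   # score_delta = 0.7 -> improving
```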
## Finding Deduplication

When the same vulnerability appears in multiple phases:

1. Keep the highest-severity classification
2. Merge evidence from all phases
3. Count as a single finding for scoring
4. Note all phases that detected it
105
.claude/skills/ship/SKILL.md
Normal file
@@ -0,0 +1,105 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
allowed-tools: Read, Write, Bash, Glob, Grep
---

# Ship

Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.

## Key Design Principles

1. **Phase Gates**: Each phase must pass before the next begins — no shipping broken code
2. **Multi-Project Support**: Detects npm (package.json), Python (pyproject.toml), and generic (VERSION) projects
3. **AI-Powered Review**: Uses CCW CLI to run automated code review before release
4. **Audit Trail**: Each phase produces structured output for traceability
5. **Safe Defaults**: Warns on risky operations (direct push to main, major version bumps)

## Architecture Overview

```
User: "ship" / "release" / "publish"
            |
            v
┌──────────────────────────────────────────────────────────────┐
│ Phase 1: Pre-Flight Checks                                   │
│ → git clean? branch ok? tests pass? build ok?                │
│ → Output: preflight-report.json                              │
│ → Gate: ALL checks must pass                                 │
├──────────────────────────────────────────────────────────────┤
│ Phase 2: Code Review                                         │
│ → detect merge base, diff against base                       │
│ → ccw cli --tool gemini --mode analysis                      │
│ → flag high-risk changes                                     │
│ → Output: review-summary                                     │
│ → Gate: No critical issues flagged                           │
├──────────────────────────────────────────────────────────────┤
│ Phase 3: Version Bump                                        │
│ → detect version file (package.json/pyproject.toml/VERSION)  │
│ → determine bump type from commits or user input             │
│ → update version file                                        │
│ → Output: version change record                              │
│ → Gate: Version updated successfully                         │
├──────────────────────────────────────────────────────────────┤
│ Phase 4: Changelog & Commit                                  │
│ → generate changelog from git log since last tag             │
│ → update CHANGELOG.md                                        │
│ → create release commit, push to remote                      │
│ → Output: commit SHA                                         │
│ → Gate: Push successful                                      │
├──────────────────────────────────────────────────────────────┤
│ Phase 5: PR Creation                                         │
│ → gh pr create with structured body                          │
│ → auto-link issues from commits                              │
│ → Output: PR URL                                             │
│ → Gate: PR created                                           │
└──────────────────────────────────────────────────────────────┘
```

## Execution Flow

Execute phases sequentially. Each phase has a gate condition — if the gate fails, stop and report status.

1. **Phase 1**: [Pre-Flight Checks](phases/01-preflight-checks.md) -- Validate git state, branch, tests, build
2. **Phase 2**: [Code Review](phases/02-code-review.md) -- AI-powered diff review with risk assessment
3. **Phase 3**: [Version Bump](phases/03-version-bump.md) -- Detect and update version across project types
4. **Phase 4**: [Changelog & Commit](phases/04-changelog-commit.md) -- Generate changelog, create release commit, push
5. **Phase 5**: [PR Creation](phases/05-pr-creation.md) -- Create PR with structured body and issue links

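The gate discipline can be sketched as a small driver loop (the phase names here are hypothetical shell functions standing in for the five phase documents):

```shell
# Run each phase in order; the first failed gate stops the pipeline.
run_pipeline() {
  for phase in preflight code_review version_bump changelog_commit pr_create; do
    if ! "$phase"; then
      echo "BLOCKED at $phase"
      return 1
    fi
  done
  echo "DONE"
}
```

Any phase returning nonzero blocks everything after it, which is exactly the "no shipping broken code" principle above.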
## Pre-Flight Checklist (Quick Reference)

| Check | Command | Pass Condition |
|-------|---------|----------------|
| Git clean | `git status --porcelain` | Empty output |
| Branch | `git branch --show-current` | Not main/master |
| Tests | `npm test` / `pytest` | Exit code 0 |
| Build | `npm run build` / `python -m build` | Exit code 0 |

## Completion Status Protocol

This skill follows the Completion Status Protocol defined in [SKILL-DESIGN-SPEC.md sections 13-14](../_shared/SKILL-DESIGN-SPEC.md#13-completion-status-protocol).

Every execution terminates with one of:

| Status | When |
|--------|------|
| **DONE** | All 5 phases completed, PR created |
| **DONE_WITH_CONCERNS** | PR created but with review warnings or non-critical issues |
| **BLOCKED** | A gate failed (dirty git, tests fail, push rejected) |
| **NEEDS_CONTEXT** | Cannot determine bump type, ambiguous branch target |

### Escalation

Follows the Three-Strike Rule (SKILL-DESIGN-SPEC section 14). On 3 consecutive failures at the same step, stop and output a diagnostic dump.

## Reference Documents

| Document | Purpose |
|----------|---------|
| [phases/01-preflight-checks.md](phases/01-preflight-checks.md) | Git, branch, test, build validation |
| [phases/02-code-review.md](phases/02-code-review.md) | AI-powered diff review |
| [phases/03-version-bump.md](phases/03-version-bump.md) | Version detection and bump |
| [phases/04-changelog-commit.md](phases/04-changelog-commit.md) | Changelog generation and release commit |
| [phases/05-pr-creation.md](phases/05-pr-creation.md) | PR creation with issue linking |
| [../_shared/SKILL-DESIGN-SPEC.md](../_shared/SKILL-DESIGN-SPEC.md) | Skill design spec (completion protocol, escalation) |
121
.claude/skills/ship/phases/01-preflight-checks.md
Normal file
@@ -0,0 +1,121 @@
# Phase 1: Pre-Flight Checks

Validate that the repository is in a shippable state before proceeding with the release pipeline.

## Objective

- Confirm working tree is clean (no uncommitted changes)
- Validate current branch is appropriate for release
- Run test suite and confirm all tests pass
- Verify build succeeds

## Gate Condition

ALL four checks must pass. If any check fails, stop the pipeline and report BLOCKED status with the specific failure.

## Execution Steps

### Step 1: Git Clean Check

```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
  echo "FAIL: Working tree is dirty"
  echo "$git_status"
  # Gate: BLOCKED — commit or stash changes first
else
  echo "PASS: Working tree is clean"
fi
```

**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.

### Step 2: Branch Validation

```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
  echo "WARN: Currently on $current_branch — direct push to main/master is risky"
  # Ask user for confirmation before proceeding
else
  echo "PASS: On branch $current_branch"
fi
```

**Pass condition**: Not on main/master, OR user explicitly confirms direct-to-main release.
**On warning**: Ask user to confirm they intend to release from main/master directly.

### Step 3: Test Suite Execution

Detect and run the project's test suite:

```bash
# Detection priority:
# 1. package.json with "test" script → npm test
# 2. pytest available and tests exist → pytest
# 3. No tests found → WARN and continue

if [ -f "package.json" ] && grep -q '"test"' package.json; then
  npm test
elif command -v pytest &>/dev/null && { [ -d "tests" ] || [ -d "test" ]; }; then
  pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
  pytest
else
  echo "WARN: No test suite detected — skipping test check"
fi
```

**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.

### Step 4: Build Verification

Detect and run the project's build step:

```bash
# Detection priority:
# 1. package.json with "build" script → npm run build
# 2. pyproject.toml → python -m build (if build module available)
# 3. Makefile with build target → make build
# 4. No build step → PASS (not all projects need a build)

if [ -f "package.json" ] && grep -q '"build"' package.json; then
  npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
  python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
  make build
else
  echo "INFO: No build step detected — skipping build check"
fi
```

**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.

## Output

- **Format**: JSON object with pass/fail per check
- **Structure**:

```json
{
  "phase": "preflight",
  "timestamp": "ISO-8601",
  "checks": {
    "git_clean": { "status": "pass|fail", "details": "" },
    "branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
    "tests": { "status": "pass|fail|skip", "details": "" },
    "build": { "status": "pass|fail|skip", "details": "" }
  },
  "overall": "pass|fail",
  "blockers": []
}
```

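A minimal sketch of assembling that report from the four check results (the `preflight_report` helper and its `pass|warn|fail|skip` arguments are illustrative, not part of the skill spec; the real report carries `details` and `blockers` too):

```shell
# Build a compact preflight JSON from individual check statuses.
# Usage: preflight_report <git_clean> <branch> <tests> <build>
preflight_report() {
  overall="pass"
  for status in "$1" "$3" "$4"; do       # a branch "warn" does not block
    [ "$status" = "fail" ] && overall="fail"
  done
  printf '{"phase":"preflight","checks":{"git_clean":"%s","branch":"%s","tests":"%s","build":"%s"},"overall":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$overall"
}

preflight_report pass warn pass skip
```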
## Next Phase

If all checks pass, proceed to [Phase 2: Code Review](02-code-review.md).
If any check fails, report BLOCKED status with the preflight report.
137
.claude/skills/ship/phases/02-code-review.md
Normal file
@@ -0,0 +1,137 @@
# Phase 2: Code Review

Automated AI-powered code review of changes since the base branch, with risk assessment.

## Objective

- Detect the merge base between current branch and target branch
- Generate diff for review
- Run AI-powered code review via CCW CLI
- Flag high-risk changes (large diffs, sensitive files, breaking changes)

## Gate Condition

No critical issues flagged by the review. Warnings are reported but do not block.

## Execution Steps

### Step 1: Detect Merge Base

```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
  target_branch="master"
fi

# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"

# If on main/master directly, compare against last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
  last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
  if [ -n "$last_tag" ]; then
    merge_base="$last_tag"
    echo "On main — using last tag as base: $last_tag"
  else
    # Use first commit if no tags exist
    merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
    echo "No tags found — using initial commit as base"
  fi
fi
```

### Step 2: Generate Diff Summary

```bash
# File-level summary
git diff --stat "$merge_base"...HEAD

# Full diff for review
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt

# Count changes for risk assessment
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
```

### Step 3: Risk Assessment

Flag high-risk indicators before AI review:

| Risk Factor | Threshold | Risk Level |
|-------------|-----------|------------|
| Files changed | > 50 | High |
| Lines changed | > 1000 | High |
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
| Migration files | `*migration*`, `*migrate*` | Medium |

```bash
# Check for sensitive file changes
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential' || true)
if [ -n "$sensitive_files" ]; then
  echo "HIGH RISK: Sensitive files modified:"
  echo "$sensitive_files"
fi
```

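The thresholds in the table can be combined into a single overall level; a sketch (argument names are illustrative):

```shell
# Combine risk signals into one level.
# Usage: risk_level <files_changed> <lines_changed> <sensitive_files> <config_files>
risk_level() {
  if [ -n "$3" ] || [ "$1" -gt 50 ] || [ "$2" -gt 1000 ]; then
    echo "high"       # sensitive files or a large diff
  elif [ -n "$4" ]; then
    echo "medium"     # config/migration files touched
  else
    echo "low"
  fi
}

risk_level 12 340 "" ""                   # small diff, nothing sensitive
risk_level 12 340 ".env.production" ""    # a sensitive file trumps diff size
```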
### Step 4: AI Code Review

Use CCW CLI for automated analysis:

```bash
ccw cli -p "PURPOSE: Review code changes for release readiness; success = all critical issues identified with file:line references
TASK: Review diff for bugs | Check for breaking changes | Identify security concerns | Assess test coverage gaps
MODE: analysis
CONTEXT: @**/* | Reviewing diff from $merge_base to HEAD ($files_changed files, +$lines_added/-$lines_removed lines)
EXPECTED: Risk assessment (low/medium/high), list of issues with severity and file:line, release recommendation (ship/hold/fix-first)
CONSTRAINTS: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes
" --tool gemini --mode analysis
```

**Note**: Wait for the CLI analysis to complete before proceeding. Do not proceed to Phase 3 while the review is running.

### Step 5: Evaluate Review Results

Based on the AI review output:

| Review Result | Action |
|---------------|--------|
| No critical issues | Proceed to Phase 3 |
| Critical issues found | Report BLOCKED, list issues |
| Warnings only | Proceed with DONE_WITH_CONCERNS note |
| Review failed/timeout | Ask user whether to proceed or retry |

## Output

- **Format**: Review summary with risk assessment
- **Structure**:

```json
{
  "phase": "code-review",
  "merge_base": "commit-sha",
  "stats": {
    "files_changed": 0,
    "lines_added": 0,
    "lines_removed": 0
  },
  "risk_level": "low|medium|high",
  "risk_factors": [],
  "ai_review": {
    "recommendation": "ship|hold|fix-first",
    "critical_issues": [],
    "warnings": []
  },
  "overall": "pass|fail|warn"
}
```

## Next Phase

If the review passes (no critical issues), proceed to [Phase 3: Version Bump](03-version-bump.md).
If critical issues are found, report BLOCKED status with the review summary.
171
.claude/skills/ship/phases/03-version-bump.md
Normal file
@@ -0,0 +1,171 @@
# Phase 3: Version Bump

Detect the current version, determine the bump type, and update the version file.

## Objective

- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change

## Gate Condition

Version file updated successfully with the new version.

## Execution Steps

### Step 1: Detect Version File

Detection priority order:

| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |

```bash
if [ -f "package.json" ]; then
  version_file="package.json"
  current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
  version_file="pyproject.toml"
  current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
  version_file="VERSION"
  current_version=$(cat VERSION | tr -d '[:space:]')
else
  echo "NEEDS_CONTEXT: No version file found"
  echo "Expected one of: package.json, pyproject.toml, VERSION"
  # Ask user which file to use or create
fi

echo "Version file: $version_file"
echo "Current version: $current_version"
```

### Step 2: Determine Bump Type

**Auto-detection from commit messages** (conventional commits):

```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
  commits=$(git log "$last_tag"..HEAD --oneline)
else
  commits=$(git log --oneline -20)
fi

# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)

if [ -n "$has_breaking" ]; then
  suggested_bump="major"
elif [ -n "$has_feat" ]; then
  suggested_bump="minor"
else
  suggested_bump="patch"
fi

echo "Suggested bump: $suggested_bump"
```

**User confirmation**:
- For `patch` and `minor`: proceed with the suggested bump and inform the user
- For `major`: always ask the user to confirm before proceeding (major bumps have significant implications)
- The user can override the suggestion with an explicit bump type

### Step 3: Calculate New Version

```bash
# bump_type is suggested_bump from Step 2, unless the user overrode it
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"

case "$bump_type" in
  major)
    new_version="$((major + 1)).0.0"
    ;;
  minor)
    new_version="${major}.$((minor + 1)).0"
    ;;
  patch)
    new_version="${major}.${minor}.$((patch + 1))"
    ;;
esac

echo "Version bump: $current_version -> $new_version"
```

### Step 4: Update Version File

```bash
case "$version_file" in
  package.json)
    # Use jq for a safe JSON update
    jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
    # Also update package-lock.json if it exists
    if [ -f "package-lock.json" ]; then
      jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
    fi
    ;;
  pyproject.toml)
    # Use sed for the TOML update (version line in [project] or [tool.poetry])
    sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
    ;;
  VERSION)
    echo "$new_version" > VERSION
    ;;
esac

echo "Updated $version_file: $current_version -> $new_version"
```

### Step 5: Verify Update

```bash
# Re-read to confirm
case "$version_file" in
  package.json)
    verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
    ;;
  pyproject.toml)
    verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
    ;;
  VERSION)
    verified=$(cat VERSION | tr -d '[:space:]')
    ;;
esac

if [ "$verified" = "$new_version" ]; then
  echo "PASS: Version verified as $new_version"
else
  echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```

## Output

- **Format**: Version change record
- **Structure**:

```json
{
  "phase": "version-bump",
  "version_file": "package.json",
  "previous_version": "1.2.3",
  "new_version": "1.3.0",
  "bump_type": "minor",
  "bump_source": "auto-detected|user-specified",
  "overall": "pass|fail"
}
```

## Next Phase

If the version updated successfully, proceed to [Phase 4: Changelog & Commit](04-changelog-commit.md).
If the version update fails, report BLOCKED status.
167
.claude/skills/ship/phases/04-changelog-commit.md
Normal file
@@ -0,0 +1,167 @@
# Phase 4: Changelog & Commit

Generate a changelog entry from git history, update CHANGELOG.md, create a release commit, and push to the remote.

## Objective

- Parse git log since the last tag into a grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with the version in the message
- Push the branch to the remote

## Gate Condition

Release commit created and pushed to remote successfully.

## Execution Steps

### Step 1: Gather Commits Since Last Tag

```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")

if [ -n "$last_tag" ]; then
  echo "Generating changelog since tag: $last_tag"
  git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
  echo "No previous tag found — using last 50 commits"
  git log --pretty=format:"%h %s" --no-merges -50
fi
```

### Step 2: Group Commits by Conventional Commit Type

Parse commit messages and group into categories:

| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):` | Features | **Features** |
| `fix:` / `fix(*):` | Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |

```bash
# Example grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since last tag
# 2. Parse prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```

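For reference, the grouping outlined above can be realized with a short awk pass over the `"%h %s"` lines from Step 1. This is a sketch, not the literal script the agent runs, and section order in the output is not guaranteed by awk's `for (s in out)`:

```shell
# Group "sha subject" commit lines into changelog sections.
group_commits() {
  awk '{
    sha = $1
    sub($1 FS, "")                               # strip the sha, keep the subject
    if ($0 ~ /^feat[(:]/)          sec = "Features"
    else if ($0 ~ /^fix[(:]/)      sec = "Bug Fixes"
    else if ($0 ~ /^perf[(:]/)     sec = "Performance"
    else if ($0 ~ /^docs[(:]/)     sec = "Documentation"
    else if ($0 ~ /^refactor[(:]/) sec = "Refactoring"
    else if ($0 ~ /^chore[(:]/)    sec = "Maintenance"
    else if ($0 ~ /^test[(:]/)     next           # omitted from changelog
    else                           sec = "Other Changes"
    out[sec] = out[sec] "- " $0 " (" sha ")\n"
  } END {
    for (s in out) printf "### %s\n%s\n", s, out[s]
  }'
}

printf 'abc1234 feat: add login\ndef5678 fix: null check\n' | group_commits
```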
### Step 3: Format Changelog Entry

Generate a markdown changelog entry:

```markdown
## [X.Y.Z] - YYYY-MM-DD

### Features
- feat: description (sha)
- feat(scope): description (sha)

### Bug Fixes
- fix: description (sha)

### Performance
- perf: description (sha)

### Other Changes
- chore: description (sha)
```

Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category

### Step 4: Update CHANGELOG.md

```bash
if [ -f "CHANGELOG.md" ]; then
  # Insert new entry after the first heading line (# Changelog)
  # The new entry goes between the main heading and the previous version entry
  # Use Write tool to insert the new section at the correct position
  echo "Updating existing CHANGELOG.md"
else
  # Create new CHANGELOG.md with header
  echo "Creating new CHANGELOG.md"
fi
```

**CHANGELOG.md structure**:

```markdown
# Changelog

## [X.Y.Z] - YYYY-MM-DD
(new entry here)

## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```

### Step 5: Create Release Commit

```bash
# Stage the version file and changelog, skipping files that don't exist
# (a plain `git add` with a missing pathspec fails and stages nothing)
for f in package.json package-lock.json pyproject.toml VERSION CHANGELOG.md; do
  [ -f "$f" ] && git add "$f"
done

# Create release commit
git commit -m "$(cat <<'EOF'
chore: bump version to X.Y.Z

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```

**Commit message format**: `chore: bump version to X.Y.Z`
- Follows conventional commit format
- Includes Co-Authored-By trailer

### Step 6: Push to Remote

```bash
current_branch=$(git branch --show-current)

# Check if remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
  git push origin "$current_branch"
else
  git push -u origin "$current_branch"
fi
```

**On push failure**:
- If rejected (non-fast-forward): Report BLOCKED, suggest `git pull --rebase`
- If permission denied: Report BLOCKED, check remote access
- If no remote configured: Report BLOCKED, suggest `git remote add`

## Output

- **Format**: Commit and push record
- **Structure**:

```json
{
  "phase": "changelog-commit",
  "changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
  "commit_sha": "abc1234",
  "commit_message": "chore: bump version to X.Y.Z",
  "pushed_to": "origin/branch-name",
  "overall": "pass|fail"
}
```

## Next Phase

If commit and push succeed, proceed to [Phase 5: PR Creation](05-pr-creation.md).
If the push fails, report BLOCKED status with error details.
163
.claude/skills/ship/phases/05-pr-creation.md
Normal file
@@ -0,0 +1,163 @@
# Phase 5: PR Creation

Create a pull request via GitHub CLI with a structured body, linked issues, and release metadata.

## Objective

- Create a PR using `gh pr create` with a structured body
- Auto-link related issues from commit messages
- Include release summary (version, changes, test plan)
- Output the PR URL

## Gate Condition

PR created successfully and URL returned.

## Execution Steps

### Step 1: Extract Issue References from Commits

```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")

if [ -n "$last_tag" ]; then
  commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
  commits=$(git log --pretty=format:"%s" --no-merges -50)
fi

# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '#[0-9]+' | sort -u || true)

echo "Referenced issues: $issues"
```

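The extraction pipeline can be exercised in isolation as a function. A sketch, with `\s` swapped for POSIX `[[:space:]]` so it also works outside GNU grep:

```shell
# Pull unique issue references out of commit subjects on stdin.
extract_issues() {
  grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)[[:space:]]*#[0-9]+' \
    | grep -oE '#[0-9]+' | sort -u
}

printf 'fix: null check, fixes #12\nfeat: login, closes #7\nchore: tidy, refs #12\n' | extract_issues
```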
### Step 2: Determine Target Branch
|
||||
|
||||
```bash
|
||||
# Default target: main (fallback: master)
|
||||
target_branch="main"
|
||||
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
|
||||
target_branch="master"
|
||||
fi
|
||||
|
||||
current_branch=$(git branch --show-current)
|
||||
echo "PR: $current_branch -> $target_branch"
|
||||
```
|
||||
|
||||
### Step 3: Build PR Title
|
||||
|
||||
Format: `release: vX.Y.Z`
|
||||
|
||||
```bash
|
||||
pr_title="release: v${new_version}"
|
||||
```
|
||||
|
||||
If the version context is not available, fall back to a descriptive title from the branch name.
### Step 4: Build PR Body

Construct the PR body using a HEREDOC for correct formatting:

```bash
# Determine the merge base against the target branch (used for the change summary)
merge_base=$(git merge-base "origin/$target_branch" HEAD)

# Gather change summary
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)

# Build linked issues section
if [ -n "$issues" ]; then
  issues_section="## Linked Issues
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
else
  issues_section=""
fi
```
### Step 5: Create PR via gh CLI

Use an unquoted HEREDOC delimiter so the variables assembled in Step 4 are interpolated into the body:

```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<EOF
## Summary
Release v${new_version}

### Changes
${change_summary}

${issues_section}

## Version
- Previous: ${prev_version}
- New: ${new_version}
- Bump type: ${bump_type}

## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully

Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```

`prev_version`, `new_version`, and `bump_type` come from the earlier version-bump phase.
**PR body sections**:

| Section | Content |
|---------|---------|
| **Summary** | Version being released, one-line description |
| **Changes** | Grouped changelog entries (from Phase 4) |
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
| **Version** | Previous version, new version, bump type |
| **Test Plan** | Checklist confirming all phases passed |
### Step 6: Capture and Report PR URL

Capture the URL from the single Step 5 invocation — do not run `gh pr create` a second time, or a duplicate PR will be attempted:

```bash
# gh pr create outputs the PR URL on success; wrap the Step 5 command
pr_url=$(gh pr create ... 2>&1 | tail -1)
echo "PR created: $pr_url"
```
## Output

- **Format**: PR creation record
- **Structure**:

```json
{
  "phase": "pr-creation",
  "pr_url": "https://github.com/owner/repo/pull/N",
  "pr_title": "release: vX.Y.Z",
  "target_branch": "main",
  "source_branch": "feature-branch",
  "linked_issues": ["#1", "#2"],
  "overall": "pass|fail"
}
```

## Completion

After PR creation, output the final Completion Status:

```
## STATUS: DONE

**Summary**: Released vX.Y.Z — PR created at {pr_url}

### Details
- Phases completed: 5/5
- Version: {previous} -> {new} ({bump_type})
- PR: {pr_url}
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created

### Outputs
- CHANGELOG.md (updated)
- {version_file} (version bumped)
- Release commit: {sha}
- PR: {pr_url}
```

If there were review warnings, use `DONE_WITH_CONCERNS` and list the warnings in the Details section.
392
.codex/skills/investigate/agents/investigator.md
Normal file
@@ -0,0 +1,392 @@
# Investigator Agent

Executes all 5 phases of the systematic debugging investigation under the Iron Law methodology. A single long-running agent driven through phases by orchestrator assign_task calls.

## Identity

- **Type**: `investigation`
- **Role File**: `~/.codex/skills/investigate/agents/investigator.md`
- **task_name**: `investigator`
- **Responsibility**: Full 5-phase investigation execution — evidence collection, pattern search, hypothesis testing, minimal fix, verification
- **fork_context**: false
- **Reasoning Effort**: high

## Boundaries

### MUST

- Load role definition via MANDATORY FIRST STEPS pattern before any phase execution
- Read the phase file at the start of each phase before executing that phase's steps
- Collect concrete evidence before forming any theories (evidence-first)
- Check `confirmed_root_cause` exists before executing Phase 4 (Iron Law gate)
- Track the 3-strike counter accurately in Phase 3
- Implement only the minimal fix — change only what addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Write the final debug report to `.workflow/.debug/` using the schema in `~/.codex/skills/investigate/specs/debug-report-format.md`
- Produce structured output after each phase, then await the next assign_task

### MUST NOT

- Skip MANDATORY FIRST STEPS role loading
- Proceed to Phase 4 without `confirmed_root_cause` (Iron Law violation)
- Modify production code during Phases 1-3 (read-only investigation)
- Count a rejected hypothesis as a strike if it yielded new actionable insight
- Refactor, add features, or change formatting beyond the minimal fix
- Change more than 3 files without written justification
- Proceed past Phase 3 BLOCKED status

---
## Toolbox

### Available Tools

| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Shell execution | Run tests, reproduce bug, detect test framework, run full test suite |
| `Read` | File read | Read source files, test files, phase docs, role files |
| `Write` | File write | Write debug report to `.workflow/.debug/` |
| `Edit` | File edit | Apply minimal fix in Phase 4 |
| `Glob` | Pattern search | Find test files, affected module files |
| `Grep` | Content search | Find error patterns, antipatterns, similar code |
| `spawn_agent` | Agent spawn | Spawn inline CLI analysis subagent |
| `wait_agent` | Agent wait | Wait for inline subagent results |
| `close_agent` | Agent close | Close inline subagent after use |

### Tool Usage Patterns

**Investigation Pattern** (Phases 1-3): Use Grep and Read to collect evidence. No Write or Edit.

**Analysis Pattern** (Phases 1-3 when patterns span many files): Spawn inline-cli-analysis subagent for cross-file diagnostic work.

**Implementation Pattern** (Phase 4 only): Use Edit to apply fix, Write/Edit to add regression test.

**Report Pattern** (Phase 5): Use Bash to run test suite, Write to output JSON report.

---
## Execution

### Phase 1: Root Cause Investigation

**Objective**: Reproduce the bug, collect all evidence, and generate an initial diagnosis.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptoms, error messages, context |
| Phase file | Yes | `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` |

**Steps**:

1. Read `~/.codex/skills/investigate/phases/01-root-cause-investigation.md` before executing.
2. Parse the bug report — extract symptom, expected behavior, context, user-provided files and errors.
3. Attempt reproduction using the most direct method available:
   - Run the failing test if one exists
   - Run the failing command if CLI/script
   - Trace the code path statically if complex setup is required
4. Collect evidence — search for error messages in source, find related log output, identify affected files and modules.
5. Run the inline-cli-analysis subagent for an initial diagnostic perspective (see Inline Subagent Calls).
6. Assemble the `investigation-report` in memory: bug_description, reproduction result, evidence, initial_diagnosis.
7. Output the Phase 1 summary and await assign_task for Phase 2.

**Output**: In-memory investigation-report (Phase 1 fields populated)

---
### Phase 2: Pattern Analysis

**Objective**: Search for similar patterns in the codebase, classify bug scope.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 2 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/02-pattern-analysis.md` |
| investigation-report | Yes | Phase 1 output in context |

**Steps**:

1. Read `~/.codex/skills/investigate/phases/02-pattern-analysis.md` before executing.
2. Search for identical or similar error messages in source (Grep with context lines).
3. Search for the same exception/error type across the codebase.
4. If the initial diagnosis identified an antipattern, search for it globally (missing null checks, unchecked async, shared state mutation, etc.).
5. Examine the affected module for structural issues — list files, check imports and dependencies.
6. For complex patterns spanning many files, run the inline-cli-analysis subagent for cross-file scope mapping.
7. Classify scope: `isolated` | `module-wide` | `systemic` with justification.
8. Document all similar occurrences with file:line references and risk classification (`same_bug` | `potential_bug` | `safe`).
9. Add the `pattern_analysis` section to investigation-report in memory.
10. Output the Phase 2 summary and await assign_task for Phase 3.

**Output**: investigation-report with pattern_analysis section added

---
### Phase 3: Hypothesis Testing

**Objective**: Form up to 3 hypotheses, test each, enforce 3-strike escalation, confirm root cause.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 3 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` |
| investigation-report | Yes | Phase 1-2 output in context |

**Steps**:

1. Read `~/.codex/skills/investigate/phases/03-hypothesis-testing.md` before executing.
2. Form up to 3 ranked hypotheses from Phase 1-2 evidence. Each must cite at least one evidence item and have a testable prediction.
3. Initialize the strike counter at 0.
4. Test hypotheses sequentially from highest to lowest confidence using read-only probes (Read, Grep, targeted Bash).
5. After each test, record the result: `confirmed` | `rejected` | `inconclusive` with a specific evidence observation.

**Strike counting**:

| Test result | Strike increment |
|-------------|-----------------|
| Rejected AND no new insight gained | +1 strike |
| Inconclusive AND no narrowing of search | +1 strike |
| Rejected BUT narrows search or reveals new cause | +0 (productive) |

6. If the strike counter reaches 3 — STOP immediately. Output the escalation block (see 3-Strike Escalation Output below). Set status BLOCKED.
7. If a hypothesis is confirmed — document `confirmed_root_cause` with the full evidence chain.
8. Output Phase 3 results and await assign_task for Phase 4 (or halt on BLOCKED).
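The strike accounting above can be sketched as a small counter (illustrative only — the agent tracks this in its own context, not in shell):

```shell
# Illustrative strike counter: unproductive failures count, productive ones do not.
strikes=0
record_test() {
  local result="$1"      # confirmed | rejected | inconclusive
  local productive="$2"  # "yes" if the test narrowed the search or revealed a new cause
  if [ "$result" != "confirmed" ] && [ "$productive" = "no" ]; then
    strikes=$((strikes + 1))
  fi
  if [ "$strikes" -ge 3 ]; then
    echo "ESCALATE: 3-strike limit reached"
  fi
}
```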
**3-Strike Escalation Output**:

```
## ESCALATION: 3-Strike Limit Reached

### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>

### Error History
1. Attempt 1: <H1 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>

### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>

### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation>

### Diagnostic Dump
<Full investigation-report content>

STATUS: BLOCKED
```

**Output**: investigation-report with hypothesis_tests and confirmed_root_cause (or BLOCKED escalation)

---
### Phase 4: Implementation

**Objective**: Verify the Iron Law gate, implement the minimal fix, add a regression test.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 4 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/04-implementation.md` |
| investigation-report | Yes | Must contain confirmed_root_cause |

**Steps**:

1. Read `~/.codex/skills/investigate/phases/04-implementation.md` before executing.

2. **Iron Law Gate Check** — verify `confirmed_root_cause` is present in investigation-report:

   | Condition | Action |
   |-----------|--------|
   | confirmed_root_cause present | Proceed to Step 3 |
   | confirmed_root_cause absent | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. |

3. Plan the minimal fix before writing any code. Document: description, files to change, change types, estimated lines.

   | Fix scope | Requirement |
   |-----------|-------------|
   | 1-3 files changed | No justification needed |
   | More than 3 files | Written justification required in fix plan |

4. Implement the fix using the Edit tool — change only what is necessary to address the confirmed root cause. No refactoring, no style changes to unrelated code.
5. Add a regression test:
   - Find the existing test file for the affected module (Glob for `**/*.test.{ts,js,py}` or `**/test_*.py`)
   - Add or modify a test with a name that clearly references the bug scenario
   - The test must exercise the exact code path identified in the root cause
   - The test must be deterministic
6. Re-run the original reproduction case from Phase 1. Verify it now passes.
7. Add the `fix_applied` section to investigation-report in memory.
8. Output the Phase 4 summary and await assign_task for Phase 5.

**Output**: Modified source files, regression test file; investigation-report with fix_applied section

---
### Phase 5: Verification & Report

**Objective**: Run the full test suite, check for regressions, generate the structured debug report.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Phase 5 instruction |
| Phase file | Yes | `~/.codex/skills/investigate/phases/05-verification-report.md` |
| investigation-report | Yes | All phases populated |

**Steps**:

1. Read `~/.codex/skills/investigate/phases/05-verification-report.md` before executing.
2. Detect and run the project's test framework:
   - Check for `package.json` (npm test)
   - Check for `pytest.ini` / `pyproject.toml` (pytest)
   - Check for `go.mod` (go test)
   - Check for `Cargo.toml` (cargo test)
3. Record test results: total, passed, failed, skipped. Note if the regression test passed.
4. Check for new failures:

   | New failure condition | Action |
   |-----------------------|--------|
   | Related to the fix | Return to Phase 4 to adjust fix |
   | Unrelated (pre-existing) | Document as pre_existing_failures, proceed |

5. Generate the debug report JSON following the schema in `~/.codex/skills/investigate/specs/debug-report-format.md`. Populate all required fields from investigation-report phases.
6. Create the output directory and write the report:

   ```
   Bash: mkdir -p .workflow/.debug
   ```

   Filename: `.workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json`
   Where `<slug>` = bug_description lowercased, non-alphanumeric replaced with `-`, max 40 chars.
7. Determine completion status:

   | Condition | Status |
   |-----------|--------|
   | All tests pass, regression test passes, no concerns | DONE |
   | Fix applied but partial test coverage or minor warnings | DONE_WITH_CONCERNS |
   | Cannot proceed due to test failures or unresolvable regression | BLOCKED |

8. Output completion status block.
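The slug rule in step 6 could be implemented along these lines (a sketch — the agent may compute it however it likes):

```shell
# Sketch of the filename rule: lowercase, non-alphanumeric -> "-", max 40 chars.
make_slug() {
  echo "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | cut -c1-40
}

slug=$(make_slug "NullPointer in OrderService.submit()")
report_path=".workflow/.debug/debug-report-$(date +%F)-${slug}.json"
echo "$report_path"
```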
**Output**: `.workflow/.debug/debug-report-<date>-<slug>.json`

---
## Inline Subagent Calls

This agent spawns a utility subagent for cross-file diagnostic analysis during Phases 1, 2, and 3 when analysis spans many files or requires a broader diagnostic perspective.

### inline-cli-analysis

**When**: After initial evidence collection in Phase 1; for cross-file pattern search in Phase 2; for hypothesis validation assistance in Phase 3.

**Agent File**: `~/.codex/agents/cli-explore-agent.md`

```
spawn_agent({
  task_name: "inline-cli-analysis",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

<analysis task description — e.g.:
PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength
CONSTRAINTS: Read-only analysis | Focus on <affected_module>>

Expected: Structured findings with file:line references`
})
const result = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```

Substitute the analysis task description with phase-appropriate content:
- Phase 1: Initial diagnosis from error evidence
- Phase 2: Cross-file pattern search and scope mapping
- Phase 3: Hypothesis validation assistance

### Result Handling

| Result | Action |
|--------|--------|
| Success | Integrate findings into investigation-report, continue |
| Timeout / Error | Continue without subagent result, log warning in investigation-report |

---
## Structured Output Template

After each phase, output the following structure before awaiting the next assign_task:

```
## Phase <N> Complete

### Summary
- <one-sentence status of what was accomplished>

### Findings
- <Finding 1>: <specific description with file:line reference>
- <Finding 2>: <specific description with file:line reference>

### Investigation Report Update
- Fields updated: <list of fields added/modified this phase>
- Key data: <most important finding from this phase>

### Status
<AWAITING_NEXT_PHASE | BLOCKED: <reason> | DONE>
```

The final Phase 5 output follows the Completion Status Protocol:

```
## STATUS: DONE

**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>

### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>

### Outputs
- Debug report: <reportPath>
- Files changed: <list>
- Tests added: <list>
```

---
## Error Handling

| Scenario | Resolution |
|----------|------------|
| Bug not reproducible | Document as concern, continue with static analysis; note in report |
| Error message not found in source | Expand search scope; try related terms; use inline subagent |
| Phase file not found | Report "BLOCKED: Cannot read phase file <path>" |
| Iron Law gate fails in Phase 4 | Output BLOCKED status, halt, do not modify any files |
| Fix introduces regression | Analyze the new failure, adjust fix within same Phase 4 context |
| Test framework not detected | Document in report concerns; attempt common commands (`npm test`, `pytest`, `go test ./...`) |
| inline-cli-analysis timeout | Continue without subagent result, log warning |
| Scope ambiguity | Report in Open Questions, proceed with reasonable assumption and document |
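The framework-detection behavior referenced in the table above (and in Phase 5 step 2) might be sketched as a marker-file mapping — a hypothetical helper mirroring the documented detection order, not part of the agent contract:

```shell
# Hypothetical mapping from a detected marker file to the test command.
test_command_for() {
  case "$1" in
    package.json)              echo "npm test" ;;
    pytest.ini|pyproject.toml) echo "pytest" ;;
    go.mod)                    echo "go test ./..." ;;
    Cargo.toml)                echo "cargo test" ;;
    *)                         echo "" ;;   # not detected: fall back to trying common commands
  esac
}
```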
362
.codex/skills/investigate/orchestrator.md
Normal file
@@ -0,0 +1,362 @@
---
name: investigate
description: Systematic debugging with Iron Law methodology. 5-phase investigation from evidence collection to verified fix. Triggers on "investigate", "debug", "root cause".
agents: investigator
phases: 5
---

# Investigate

Systematic debugging skill that enforces the Iron Law: never fix without a confirmed root cause. Produces a structured debug report with full evidence chain, minimal fix, and regression test.

## Architecture

```
+--------------------------------------------------------------+
|                   investigate Orchestrator                   |
|   -> Drive investigator agent through 5 sequential phases    |
+----------------------------+---------------------------------+
                             |
                             | spawn_agent (Phase 1 initial task)
                             v
                   +------------------+
                   |   investigator   |
                   |  (single agent,  |
                   |   5-phase loop)  |
                   +------------------+
                        |        ^
          assign_task   |        |  phase results
          (Phase 2-5)   v        |
                   +------------------+
                   | Phase 1: Root    |
                   | Phase 2: Pattern |
                   | Phase 3: Hyp.    |  <-- Gate: BLOCKED?
                   | Phase 4: Impl.   |  <-- Iron Law gate
                   | Phase 5: Report  |
                   +------------------+
                             |
                             v
              .workflow/.debug/debug-report-*.json
```

---
## Agent Registry

| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|-------------|
| investigator | `investigator` | `~/.codex/skills/investigate/agents/investigator.md` | Full 5-phase investigation execution | Deep Interaction (2.3) | false |

> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.

---
## Fork Context Strategy

| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|-------------|-----------|-----------|
| investigator | `investigator` | false | — | Starts fresh; receives all phase context via assign_task messages. No prior conversation history needed. |

**Fork Decision Rules**:

| Condition | fork_context | Reason |
|-----------|-------------|--------|
| investigator spawned (Phase 1) | false | Clean context; full task description in message |
| Phase 2-5 transitions | N/A | assign_task used, agent already running |

---
## Subagent Registry

Utility subagents callable by the investigator agent during analysis phases:

| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-cli-analysis | `~/.codex/agents/cli-explore-agent.md` | investigator | Cross-file diagnostic analysis (replaces ccw cli calls) | haiku |

> Subagents are spawned by the investigator within its own execution context (Pattern 2.8), not by the orchestrator.

---
## Phase Execution

### Phase 1: Root Cause Investigation

**Objective**: Spawn the investigator agent and assign the Phase 1 investigation task. The agent reproduces the bug, collects evidence, and runs the initial diagnosis.

**Input**:

| Source | Description |
|--------|-------------|
| User message | Bug description, symptom, context, error messages |

**Execution**:

Build the initial spawn message embedding the bug report and Phase 1 instructions, then spawn the investigator:

```
spawn_agent({
  task_name: "investigator",
  fork_context: false,
  message: `## TASK ASSIGNMENT

### MANDATORY FIRST STEPS (Agent Execute)
1. Read role definition: ~/.codex/skills/investigate/agents/investigator.md (MUST read first)
2. Read: ~/.codex/skills/investigate/phases/01-root-cause-investigation.md

---

## Phase 1: Root Cause Investigation

Bug Report:
<user-provided bug description, symptoms, error messages, context>

Execute Phase 1 per the phase file. Produce investigation-report (in-memory) and report back with:
- Phase 1 complete summary
- bug_description, reproduction result, evidence collected, initial diagnosis
- Await next phase assignment.`
})

const p1Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| p1Result | Phase 1 completion summary with evidence, reproduction, initial diagnosis |

---
### Phase 2: Pattern Analysis

**Objective**: Assign Phase 2 to the running investigator. The agent searches the codebase for similar patterns and classifies bug scope.

**Input**:

| Source | Description |
|--------|-------------|
| p1Result | Phase 1 output — evidence, affected files, initial suspects |

**Execution**:

```
assign_task({
  target: "investigator",
  items: [{
    type: "text",
    text: `## Phase 2: Pattern Analysis

Read: ~/.codex/skills/investigate/phases/02-pattern-analysis.md

Using your Phase 1 findings, execute Phase 2:
- Search for similar error patterns across the codebase
- Search for the same antipattern if identified
- Classify scope: isolated | module-wide | systemic
- Document all occurrences with file:line references

Report back with pattern_analysis section and scope classification. Await next phase assignment.`
  }]
})

const p2Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| p2Result | Pattern analysis section: scope classification, similar occurrences, scope justification |

---
### Phase 3: Hypothesis Testing

**Objective**: Assign Phase 3 to the investigator. The agent forms and tests up to 3 hypotheses. The orchestrator checks the output for the `BLOCKED` marker before proceeding.

**Input**:

| Source | Description |
|--------|-------------|
| p2Result | Pattern analysis results |

**Execution**:

```
assign_task({
  target: "investigator",
  items: [{
    type: "text",
    text: `## Phase 3: Hypothesis Testing

Read: ~/.codex/skills/investigate/phases/03-hypothesis-testing.md

Using Phase 1-2 evidence, execute Phase 3:
- Form up to 3 ranked hypotheses, each citing evidence
- Test each hypothesis with read-only probes
- Track 3-strike counter — if 3 consecutive unproductive failures: STOP and output ESCALATION block with BLOCKED status
- If a hypothesis is confirmed: output confirmed_root_cause with full evidence chain

Report back with hypothesis test results and either:
confirmed_root_cause (proceed to Phase 4)
OR BLOCKED: <escalation dump> (halt)`
  }]
})

const p3Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```

**Phase 3 Gate Decision**:

| Condition | Action |
|-----------|--------|
| p3Result contains `confirmed_root_cause` | Proceed to Phase 4 |
| p3Result contains `BLOCKED` | Halt workflow, output escalation dump to user, close investigator |
| p3Result contains `ESCALATION: 3-Strike Limit Reached` | Halt workflow, output diagnostic dump, close investigator |
| Timeout | assign_task "Finalize Phase 3 results now", re-wait 120s; if still timeout → halt |

If BLOCKED: close the investigator and surface the diagnostic dump to the user. Do not proceed to Phase 4.

---
### Phase 4: Implementation

**Objective**: Assign Phase 4 only after a confirmed root cause. The agent implements the minimal fix and adds a regression test.

**Input**:

| Source | Description |
|--------|-------------|
| p3Result | confirmed_root_cause with evidence chain, affected file:line |

**Execution**:

```
assign_task({
  target: "investigator",
  items: [{
    type: "text",
    text: `## Phase 4: Implementation

Read: ~/.codex/skills/investigate/phases/04-implementation.md

Iron Law gate confirmed — proceed with implementation:
- Verify confirmed_root_cause is present in your context (gate check)
- Plan the minimal fix before writing any code
- Implement only what is necessary to fix the confirmed root cause
- Add regression test: must fail without fix, pass with fix
- Verify fix against original reproduction case from Phase 1

Report back with fix_applied section. Await Phase 5 assignment.`
  }]
})

const p4Result = wait_agent({ targets: ["investigator"], timeout_ms: 480000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| p4Result | fix_applied section: files changed, regression test details, reproduction verified |

---
### Phase 5: Verification & Report
|
||||
|
||||
**Objective**: Assign Phase 5 to run the full test suite and generate the structured debug report.
|
||||
|
||||
**Input**:
|
||||
|
||||
| Source | Description |
|
||||
|--------|-------------|
|
||||
| p4Result | fix_applied details — files changed, regression test |
|
||||
|
||||
**Execution**:
|
||||
|
||||
```
|
||||
assign_task({
|
||||
target: "investigator",
|
||||
items: [{
|
||||
type: "text",
|
||||
text: `## Phase 5: Verification & Report
|
||||
|
||||
Read: ~/.codex/skills/investigate/phases/05-verification-report.md
|
||||
|
||||
Final phase:
|
||||
- Run full test suite (detect framework: npm test / pytest / go test / cargo test)
|
||||
- Verify the regression test passes
|
||||
- Check for new failures introduced by the fix
|
||||
- Generate structured debug report per specs/debug-report-format.md
|
||||
- Write report to .workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json
|
||||
- Output completion status: DONE | DONE_WITH_CONCERNS | BLOCKED`
|
||||
}]
|
||||
})
|
||||
|
||||
const p5Result = wait_agent({ targets: ["investigator"], timeout_ms: 300000 })
|
||||
```
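The report path convention (`.workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json`) can be sketched in Python; the slug rules here (lowercase, dash-separated, truncated) are illustrative assumptions, not part of the spec:

```python
import re
from datetime import date
from pathlib import PurePosixPath

def debug_report_path(bug_description: str, on: date) -> str:
    """Build .workflow/.debug/debug-report-<YYYY-MM-DD>-<slug>.json."""
    # Hypothetical slug rules: lowercase, alphanumerics only, dash-separated,
    # truncated to keep filenames short.
    slug = re.sub(r"[^a-z0-9]+", "-", bug_description.lower()).strip("-")[:40].rstrip("-")
    return str(PurePosixPath(".workflow/.debug") / f"debug-report-{on.isoformat()}-{slug}.json")
```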

**Output**:

| Artifact | Description |
|----------|-------------|
| p5Result | Completion status, test suite results, path to debug report file |

---

## Lifecycle Management

### Timeout Protocol

| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1 (spawn + wait) | 300000 ms | assign_task "Finalize Phase 1 now" + wait 120s; if still timeout → halt |
| Phase 2 (assign + wait) | 300000 ms | assign_task "Finalize Phase 2 now" + wait 120s; if still timeout → halt |
| Phase 3 (assign + wait) | 480000 ms | assign_task "Finalize Phase 3 now" + wait 120s; if still timeout → halt BLOCKED |
| Phase 4 (assign + wait) | 480000 ms | assign_task "Finalize Phase 4 now" + wait 120s; if still timeout → halt |
| Phase 5 (assign + wait) | 300000 ms | assign_task "Finalize Phase 5 now" + wait 120s; if still timeout → partial report |
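The nudge-then-halt pattern in the table can be sketched as one helper; `wait_fn` and `nudge_fn` stand in for `wait_agent` and `assign_task`, and the None-on-timeout return shape is an assumption for illustration:

```python
# Minimal sketch of the timeout protocol, assuming wait_fn(timeout_ms) returns
# the agent result or None on timeout, and nudge_fn delivers the finalize message.
def wait_with_nudge(wait_fn, nudge_fn, phase: int, timeout_ms: int, retry_ms: int = 120_000):
    result = wait_fn(timeout_ms)
    if result is not None:
        return result                            # normal completion
    nudge_fn(f"Finalize Phase {phase} now")      # first timeout: nudge once
    result = wait_fn(retry_ms)                   # re-wait 120s
    if result is not None:
        return result
    raise TimeoutError(f"Phase {phase} did not finalize; halting")  # second timeout: halt
```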

### Cleanup Protocol

At workflow end (success or halt), close the investigator agent:

```
close_agent({ target: "investigator" })
```

---

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output results" + re-wait 120000 ms |
| Agent timeout (second) | close_agent, report partial results to user |
| Phase 3 BLOCKED | close_agent, surface full escalation dump to user, halt |
| Phase 4 Iron Law violation | close_agent, report "Cannot proceed: no confirmed root cause" |
| Phase 4 introduces regression | Investigator returns to fix adjustment; orchestrator re-waits same phase |
| User cancellation | close_agent({ target: "investigator" }), report current state |
| send_message ignored | Escalate to assign_task |

---

## Output Format

```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)

## Results
- Root cause: <confirmed root cause description>
- Fix: <what was changed>
- Regression test: <test name in test file>

## Artifacts
- File: .workflow/.debug/debug-report-<date>-<slug>.json
- Description: Full structured investigation report

## Next Steps (if DONE_WITH_CONCERNS or BLOCKED)
1. <recommended follow-up action>
2. <recommended follow-up action>
```

---

`.codex/skills/investigate/phases/01-root-cause-investigation.md` (new file, 212 lines)

# Phase 1: Root Cause Investigation

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Reproduce the bug and collect all available evidence before forming any theories.

## Objective

- Reproduce the bug with concrete, observable symptoms
- Collect all evidence: error messages, logs, stack traces, affected files
- Establish a baseline understanding of what goes wrong and where
- Use inline CLI analysis for initial diagnosis

## Input

| Source | Required | Description |
|--------|----------|-------------|
| assign_task message | Yes | Bug description, symptom, expected behavior, context, user-provided errors |
| User-provided files | Optional | Any files or paths the user mentioned as relevant |

## Execution Steps

### Step 1: Parse the Bug Report

Extract the following from the user's description:
- **Symptom**: What observable behavior is wrong?
- **Expected**: What should happen instead?
- **Context**: When/where does it occur? (specific input, environment, timing)
- **User-provided files**: Any files mentioned
- **User-provided errors**: Any error messages provided

Assemble the extracted fields as the initial `investigation-report` structure in memory:

```
bugReport = {
  symptom: <extracted from description>,
  expected_behavior: <what should happen>,
  context: <when/where it occurs>,
  user_provided_files: [<files mentioned>],
  user_provided_errors: [<error messages>]
}
```

---

### Step 2: Reproduce the Bug

Attempt reproduction using the most direct method available:

| Method | When to use |
|--------|-------------|
| Run failing test | A specific failing test is known or can be identified |
| Run failing command | Bug is triggered by a CLI command or script |
| Static code path trace | Reproduction requires complex setup; use Read + Grep to trace the path |

Execution for each method:

**Run failing test**:
```
Bash: <detect test runner and run the specific failing test>
```

**Run failing command**:
```
Bash: <execute the command that triggers the bug>
```

**Static code path trace**:
- Use Grep to find the error message text in source
- Use Read to trace the code path that produces the error
- Document the theoretical reproduction path

**Decision table**:

| Outcome | Action |
|---------|--------|
| Reproduction successful | Document steps and method, proceed to Step 3 |
| Reproduction failed | Document what was attempted, note as concern, continue with static analysis |

---

### Step 3: Collect Evidence

Gather all available evidence using project tools:

1. Search for the exact error message text in source files (Grep with 3 lines of context).
2. Search for related log output patterns.
3. Read any stack trace files or test output files if they exist on disk.
4. Use Glob to identify all files in the affected module or area.
5. Read the most directly implicated source files.
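The "Grep with 3 lines of context" search in item 1 behaves like the sketch below; the function is an illustrative stand-in for the Grep tool, not part of the skill:

```python
def find_with_context(text: str, needle: str, context: int = 3) -> list[str]:
    """Return each matching line with `context` lines before and after,
    mimicking grep -C; line numbers are 1-based like grep -n output."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        if needle in line:
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            out.append("\n".join(f"{n + 1}: {l}" for n, l in enumerate(lines[lo:hi], start=lo)))
    return out
```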

Compile findings into the evidence section of the investigation-report:

```
evidence = {
  error_messages: [<exact error text>],
  stack_traces: [<relevant stack trace>],
  affected_files: [<file1>, <file2>],
  affected_modules: [<module-name>],
  log_output: [<relevant log lines>]
}
```

---

### Step 4: Initial Diagnosis via Inline CLI Analysis

Spawn inline-cli-analysis subagent for broader diagnostic perspective:

```
spawn_agent({
  task_name: "inline-cli-analysis",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

PURPOSE: Diagnose root cause of bug from collected evidence
TASK: Analyze error context | Trace data flow | Identify suspicious code patterns
MODE: analysis
CONTEXT: @<affected_files_from_step3> | Evidence: <error_messages_and_traces>
EXPECTED: Top 3 likely root causes ranked by evidence strength, each with file:line reference
CONSTRAINTS: Read-only analysis | Focus on <affected_module>`
})
const diagResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```

Record results in initial_diagnosis section:

```
initial_diagnosis = {
  cli_tool_used: "inline-cli-analysis",
  top_suspects: [
    { description: <suspect 1>, evidence_strength: "strong|moderate|weak", files: [<files>] }
  ]
}
```

**Decision table**:

| Outcome | Action |
|---------|--------|
| Subagent returns top suspects | Integrate into investigation-report, proceed to Step 5 |
| Subagent timeout or error | Log warning in investigation-report, proceed to Step 5 without subagent findings |

---

### Step 5: Assemble Investigation Report

Combine all findings into the complete Phase 1 investigation-report:

```
investigation_report = {
  phase: 1,
  bug_description: <concise one-sentence description>,
  reproduction: {
    reproducible: true|false,
    steps: ["step 1: ...", "step 2: ...", "step 3: observe error"],
    reproduction_method: "test|command|static_analysis"
  },
  evidence: {
    error_messages: [<exact error text>],
    stack_traces: [<relevant stack trace>],
    affected_files: [<file1>, <file2>],
    affected_modules: [<module-name>],
    log_output: [<relevant log lines>]
  },
  initial_diagnosis: {
    cli_tool_used: "inline-cli-analysis",
    top_suspects: [
      { description: <suspect>, evidence_strength: "strong|moderate|weak", files: [] }
    ]
  }
}
```

Output Phase 1 summary and await assign_task for Phase 2.

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 1) | In-memory JSON | bug_description, reproduction, evidence, initial_diagnosis |
| Phase 1 summary | Structured text output | Summary for orchestrator, await Phase 2 assignment |

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Bug symptom clearly documented | bug_description field populated with 10+ chars |
| Reproduction attempted | reproduction.reproducible is true or failure documented |
| At least one concrete evidence item collected | evidence.error_messages OR stack_traces OR affected_files non-empty |
| Affected files identified | evidence.affected_files non-empty |
| Initial diagnosis generated | initial_diagnosis.top_suspects has at least one entry (or timeout documented) |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Cannot reproduce bug | Document what was attempted, set reproducible: false, continue with static analysis |
| Error message not found in source | Expand search to whole project, try related terms, continue |
| No affected files identifiable | Use Glob on broad patterns, document uncertainty |
| inline-cli-analysis timeout | Continue without subagent result, log warning in initial_diagnosis |
| User description insufficient | Document in Open Questions, proceed with available information |

## Next Phase

-> [Phase 2: Pattern Analysis](02-pattern-analysis.md)

---

`.codex/skills/investigate/phases/02-pattern-analysis.md` (new file, 181 lines)

# Phase 2: Pattern Analysis

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Search for similar patterns in the codebase to determine if the bug is isolated or systemic.

## Objective

- Search for similar error patterns, antipatterns, or code smells across the codebase
- Determine if the bug is an isolated incident or part of a systemic issue
- Identify related code that may be affected by the same root cause
- Refine the scope of the investigation

## Input

| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 1) | Yes | Evidence, affected files, affected modules, initial diagnosis suspects |
| assign_task message | Yes | Phase 2 instruction |

## Execution Steps

### Step 1: Search for Similar Error Patterns

Search for the same error type or message elsewhere in the codebase:

1. Grep for identical or similar error message fragments in `src/` with 3 lines of context.
2. Grep for the same exception class or error code — output mode: files with matches.
3. Grep for similar error handling patterns in the same module.

**Decision table**:

| Result | Action |
|--------|--------|
| Similar patterns found in same module | Note as module-wide indicator, continue |
| Similar patterns found across multiple modules | Note as systemic indicator, continue |
| No similar patterns found | Note as isolated indicator, continue |

---

### Step 2: Search for the Same Antipattern

If the Phase 1 initial diagnosis identified a coding antipattern, search for it globally:

**Common antipattern examples to search for**:

| Antipattern | Grep pattern style |
|-------------|-------------------|
| Missing null/undefined check | `variable\.property` without guard |
| Unchecked async operation | unhandled promise, missing await |
| Direct mutation of shared state | shared state write without lock |
| Type assumption violation | forced cast without validation |

Execute at least one targeted Grep for the identified antipattern across relevant source directories.

**Decision table**:

| Result | Action |
|--------|--------|
| Antipattern found in multiple files | Classify as module-wide or systemic candidate |
| Antipattern isolated to one location | Classify as isolated candidate |
| No antipattern identifiable | Proceed without antipattern classification |

---

### Step 3: Module-Level Analysis

Examine the affected module for structural issues:

1. Use Glob to list all files in the affected module directory.
2. Grep for imports from the affected module to understand its consumers.
3. Check for circular dependencies or unusual import patterns.

---

### Step 4: CLI Cross-File Pattern Analysis (Optional)

For complex patterns that span many files, use inline-cli-analysis subagent:

```
spawn_agent({
  task_name: "inline-cli-analysis",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

PURPOSE: Identify all instances of antipattern across codebase; success = complete scope map
TASK: Search for pattern '<antipattern_description>' | Map all occurrences | Assess systemic risk
MODE: analysis
CONTEXT: @src/**/*.<ext> | Bug in <module>, pattern: <pattern_description>
EXPECTED: List of all files with same pattern, risk assessment per occurrence (same_bug|potential_bug|safe)
CONSTRAINTS: Focus on <antipattern> pattern only | Ignore test files for scope`
})
const patternResult = wait_agent({ targets: ["inline-cli-analysis"], timeout_ms: 180000 })
close_agent({ target: "inline-cli-analysis" })
```

**Decision table**:

| Condition | Action |
|-----------|--------|
| Pattern spans >3 files in >1 module | Use subagent for full scope map |
| Pattern confined to 1 module | Skip subagent, proceed with manual search results |
| Subagent timeout | Continue with manual search results |

---

### Step 5: Classify Scope and Assemble Pattern Analysis

Classify the bug scope based on all search findings:

**Scope Definitions**:

| Scope | Definition |
|-------|-----------|
| `isolated` | Bug exists in a single location; no similar patterns found elsewhere |
| `module-wide` | Same pattern exists in multiple files within the same module |
| `systemic` | Pattern spans multiple modules; may require broader fix |
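These three definitions reduce to a small decision function over the modules where similar occurrences were found; a sketch, with the occurrence-modules list as an assumed input shape:

```python
def classify_scope(bug_module: str, occurrence_modules: list[str]) -> str:
    """Map similar-pattern occurrences to the scope taxonomy:
    isolated | module-wide | systemic."""
    if not occurrence_modules:
        return "isolated"      # no similar patterns found elsewhere
    if all(m == bug_module for m in occurrence_modules):
        return "module-wide"   # same pattern, multiple files, one module
    return "systemic"          # pattern crosses module boundaries
```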

Assemble `pattern_analysis` section and add to investigation-report:

```
pattern_analysis = {
  scope: "isolated|module-wide|systemic",
  similar_occurrences: [
    {
      file: "<path/to/file.ts>",
      line: <line number>,
      pattern: "<description of similar pattern>",
      risk: "same_bug|potential_bug|safe"
    }
  ],
  total_occurrences: <count>,
  affected_modules: ["<module-name>"],
  antipattern_identified: "<description or null>",
  scope_justification: "<evidence-based reasoning for this scope classification>"
}
```

**Scope decision table**:

| Scope | Phase 3 Focus |
|-------|--------------|
| isolated | Narrow hypothesis scope to single location |
| module-wide | Note all occurrences for Phase 4 fix planning |
| systemic | Note for potential multi-location fix; flag for separate tracking |

Output Phase 2 summary and await assign_task for Phase 3.

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 2) | In-memory JSON | Phase 1 fields + pattern_analysis section added |
| Phase 2 summary | Structured text output | Scope classification with justification, await Phase 3 |

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| At least 3 search queries executed | Count of Grep/Glob operations performed |
| Scope classified | pattern_analysis.scope is one of: isolated, module-wide, systemic |
| Similar occurrences documented | pattern_analysis.similar_occurrences populated (empty array acceptable for isolated) |
| Scope justification provided | pattern_analysis.scope_justification non-empty with evidence |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| No source directory found | Search from project root, document uncertainty |
| Grep returns too many results | Narrow pattern, add path filter, take top 10 most relevant |
| inline-cli-analysis timeout | Continue with manual search results, log warning |
| Antipattern not identifiable from Phase 1 | Skip Step 2 antipattern search, proceed with error pattern search only |

## Next Phase

-> [Phase 3: Hypothesis Testing](03-hypothesis-testing.md)

---

`.codex/skills/investigate/phases/03-hypothesis-testing.md` (new file, 214 lines)

# Phase 3: Hypothesis Testing

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.

## Objective

- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive unproductive test failures

## Input

| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-2) | Yes | Evidence, affected files, pattern analysis, initial suspects |
| assign_task message | Yes | Phase 3 instruction |

## Execution Steps

### Step 1: Form Hypotheses

Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:

**Hypothesis formation rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation

Assemble hypotheses in memory:

```
hypotheses = [
  {
    id: "H1",
    description: "The root cause is <X> because evidence <Y>",
    evidence_supporting: ["<evidence item 1>", "<evidence item 2>"],
    predicted_behavior: "If H1 is correct, then we should observe <Z>",
    test_method: "How to verify: read file <X> line <Y>, check value <Z>",
    confidence: "high|medium|low"
  }
]
```

Initialize strike counter: 0

---

### Step 2: Test Hypotheses Sequentially

Test each hypothesis starting from highest confidence (H1 first). Use read-only probes only during testing.

**Allowed test methods**:

| Method | Usage |
|--------|-------|
| Read a specific file | Check a specific value, condition, or code pattern |
| Grep for a pattern | Confirm or deny the presence of a condition |
| Bash targeted test | Run a specific test that reveals the condition |
| Temporary log statement | Add a log to observe runtime behavior; MUST revert after |

**Prohibited during hypothesis testing**:
- Modifying production code (save for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)

---

### Step 3: Record Test Results

For each hypothesis test, record:

```
hypothesis_test = {
  id: "H1",
  test_performed: "<what was checked, e.g.: Read src/caller.ts:42 — checked null handling>",
  result: "confirmed|rejected|inconclusive",
  evidence: "<specific observation that confirms or rejects>",
  files_checked: ["<src/caller.ts:42-55>"]
}
```

---

### Step 4: 3-Strike Escalation Rule

Track consecutive unproductive test failures. After each hypothesis test, evaluate:

**Strike evaluation**:

| Test result | New insight gained | Strike action |
|-------------|-------------------|---------------|
| confirmed | — | CONFIRM root cause, end testing |
| rejected | Yes — narrows search or reveals new cause | No strike (productive rejection) |
| rejected | No — no actionable insight | +1 strike |
| inconclusive | Yes — identifies new area | No strike (productive) |
| inconclusive | No — no narrowing | +1 strike |

**Strike counter tracking**:

| Strike count | Action |
|--------------|--------|
| 1 | Continue to next hypothesis |
| 2 | Continue to next hypothesis |
| 3 | STOP — output escalation block immediately |
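The two tables combine into one small state update; a sketch, where `insight` records whether the test produced an actionable narrowing:

```python
def update_strikes(strikes: int, result: str, insight: bool) -> tuple[int, str]:
    """Apply the strike-evaluation and strike-counter tables.
    Returns (new_strike_count, action)."""
    if result == "confirmed":
        return strikes, "confirm_root_cause"   # end testing
    if insight:
        return strikes, "continue"             # productive rejection/inconclusive: no strike
    strikes += 1                               # unproductive failure
    return strikes, ("escalate" if strikes >= 3 else "continue")
```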

**On 3rd Strike — output this escalation block verbatim and halt**:

```
## ESCALATION: 3-Strike Limit Reached

### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>

### Error History
1. Attempt 1: <H1 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>

### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>

### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z">

### Diagnostic Dump
<Full investigation-report content from all phases>

STATUS: BLOCKED
```

After outputting escalation: set status BLOCKED. Do not proceed to Phase 4.

---

### Step 5: Confirm Root Cause

If a hypothesis is confirmed, document the confirmed root cause:

```
confirmed_root_cause = {
  hypothesis_id: "H1",
  description: "<Root cause description with full technical detail>",
  evidence_chain: [
    "Phase 1: <Error message X observed in Y>",
    "Phase 2: <Same pattern found in N other files>",
    "Phase 3: H1 confirmed — <specific condition at file.ts:42>"
  ],
  affected_code: {
    file: "<path/to/file.ts>",
    line_range: "<42-55>",
    function: "<functionName>"
  }
}
```

Add `hypothesis_tests` and `confirmed_root_cause` to investigation-report in memory.

Output Phase 3 results and await assign_task for Phase 4.

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| investigation-report (phase 3) | In-memory JSON | Phases 1-2 fields + hypothesis_tests + confirmed_root_cause |
| Phase 3 summary or escalation block | Structured text output | Either confirmed root cause or BLOCKED escalation |

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Maximum 3 hypotheses formed | Count of hypotheses array |
| Each hypothesis cites evidence | evidence_supporting non-empty for each |
| Each hypothesis tested with documented probe | test_performed field populated for each |
| Strike counter maintained correctly | Count of unproductive consecutive failures |
| Root cause confirmed with evidence chain OR escalation triggered | confirmed_root_cause present OR BLOCKED output |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Evidence insufficient to form 3 hypotheses | Form as many as evidence supports (minimum 1), proceed |
| Partial insight from rejected hypothesis | Do not count as strike; re-form or refine remaining hypotheses with new insight |
| All 3 hypotheses confirmed simultaneously | Use highest-confidence confirmed one as root cause |
| Hypothesis test requires production change | Prohibited — use static analysis or targeted read-only probe instead |

## Gate for Phase 4

Phase 4 can ONLY proceed if `confirmed_root_cause` is present. This is the Iron Law gate.

| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | -> [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation triggered | STOP — output diagnostic dump — STATUS: BLOCKED |
| Partial insight, re-forming hypotheses | Stay in Phase 3, re-test with refined hypotheses |

## Next Phase

-> [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.

---

`.codex/skills/investigate/phases/04-implementation.md` (new file, 195 lines)

# Phase 4: Implementation

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Implement the minimal fix and add a regression test. Iron Law gate enforced at entry.

## Objective

- Verify Iron Law gate: confirmed root cause MUST exist from Phase 3
- Implement the minimal fix that addresses the confirmed root cause
- Add a regression test that fails without the fix and passes with it
- Verify the fix resolves the original reproduction case

## Input

| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phase 3) | Yes | Must contain confirmed_root_cause with evidence chain |
| assign_task message | Yes | Phase 4 instruction |

## Iron Law Gate Check

**MANDATORY FIRST ACTION before any code modification**:

| Condition | Action |
|-----------|--------|
| investigation-report contains `confirmed_root_cause` with non-empty description | Proceed to Step 1 |
| `confirmed_root_cause` absent or empty | Output "BLOCKED: Iron Law violation — no confirmed root cause. Return to Phase 3." Halt. Do NOT modify any files. |

Log the confirmed state before proceeding:
- Root cause: `<confirmed_root_cause.description>`
- Evidence chain: `<confirmed_root_cause.evidence_chain.length>` items
- Affected code: `<confirmed_root_cause.affected_code.file>:<confirmed_root_cause.affected_code.line_range>`
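The gate check reduces to a single guard over the Phase 3 report; a sketch, assuming the report is a plain dict shaped like the Phase 3 structures:

```python
def iron_law_gate(report: dict) -> str:
    """Return 'proceed' only when a confirmed root cause with a non-empty
    description exists; otherwise block before any file is modified."""
    rc = report.get("confirmed_root_cause") or {}
    if not str(rc.get("description", "")).strip():
        return ("BLOCKED: Iron Law violation — no confirmed root cause. "
                "Return to Phase 3.")
    return "proceed"
```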

## Execution Steps

### Step 1: Plan the Minimal Fix

Define the fix scope BEFORE writing any code:

```
fix_plan = {
  description: "<What the fix does and why>",
  changes: [
    {
      file: "<path/to/file.ts>",
      change_type: "modify|add|remove",
      description: "<specific change description>",
      lines_affected: "<42-45>"
    }
  ],
  total_files_changed: <count>,
  total_lines_changed: "<estimated>"
}
```

**Minimal Fix Rules** (from Iron Law):

| Rule | Requirement |
|------|-------------|
| Change only necessary code | Only the confirmed root cause location |
| No refactoring | Do not restructure surrounding code |
| No feature additions | Fix only; no new capabilities |
| No style/format changes | Do not touch unrelated code formatting |
| >3 files changed | Requires written justification in fix_plan |

**Fix scope decision**:

| Files to change | Action |
|----------------|--------|
| 1-3 files | Proceed without justification |
| More than 3 files | Document justification in fix_plan.description before proceeding |

---

### Step 2: Implement the Fix

Apply the planned changes using Edit:

- Target only the file(s) and line(s) identified in `confirmed_root_cause.affected_code`
- Make exactly the change described in fix_plan
- Verify the edit was applied correctly by reading the modified section

**Decision table**:

| Edit outcome | Action |
|-------------|--------|
| Edit applied correctly | Proceed to Step 3 |
| Edit failed or incorrect | Re-apply with corrected old_string/new_string; if Edit fails 2+ times, use Bash sed as fallback |
| Fix requires more than planned | Document the additional change in fix_plan with justification |

---

### Step 3: Add Regression Test

Create or modify a test that proves the fix:

1. Find the appropriate test file for the affected module:
   - Use Glob for `**/*.test.{ts,js,py}`, `**/__tests__/**/*.{ts,js}`, or `**/test_*.py`
   - Match the test file to the affected source module
2. Add a regression test with these requirements:

**Regression test requirements**:

| Requirement | Details |
|-------------|---------|
| Test name references the bug | Name clearly describes the bug scenario (e.g., "should handle null display_name without error") |
| Tests exact code path | Exercises the specific path identified in root cause |
| Deterministic | No timing dependencies, no external services |
| Correct placement | In the appropriate test file for the affected module |
| Proves the fix | Must fail when fix is reverted, pass when fix is applied |

**Decision table**:

| Condition | Action |
|-----------|--------|
| Existing test file found for module | Add test to that file |
| No existing test file found | Create new test file following project conventions |
| Multiple candidate test files | Choose the one most directly testing the affected module |
|
||||
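As a sketch of what such a regression test can look like (the module, the `format_display_name` function, and the bug scenario are hypothetical illustrations, not part of this skill):

```python
# Hypothetical fixed function: the bug was a crash when display_name is None.
def format_display_name(user: dict) -> str:
    # The fix: fall back to the username when display_name is missing.
    name = user.get("display_name")
    return name if name is not None else user.get("username", "unknown")


def test_handles_null_display_name_without_error():
    # Exercises the exact code path from the confirmed root cause:
    # a user record whose display_name is None.
    user = {"username": "alice", "display_name": None}
    assert format_display_name(user) == "alice"
```

The test name states the bug scenario, runs deterministically, and fails if the fallback in the fix is reverted.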

---

### Step 4: Verify Fix Against Reproduction

Re-run the original reproduction case from Phase 1:

- If Phase 1 used a failing test: run that same test now
- If Phase 1 used a failing command: run that same command now
- If Phase 1 used static analysis: run the regression test as verification

Record verification result:

```
fix_applied = {
  description: "<what was fixed>",
  files_changed: ["<path/to/file.ts>"],
  lines_changed: <count>,
  regression_test: {
    file: "<path/to/test.ts>",
    test_name: "<test name>",
    status: "added|modified"
  },
  reproduction_verified: true|false
}
```

**Decision table**:

| Verification result | Action |
|---------------------|--------|
| Reproduction case now passes | Set reproduction_verified: true, proceed to Step 5 |
| Reproduction case still fails | Analyze why fix is insufficient, adjust fix, re-run |
| Cannot verify (setup required) | Document as concern, set reproduction_verified: false, proceed |

---

### Step 5: Assemble Phase 4 Output

Add `fix_applied` to investigation-report in memory. Output Phase 4 summary and await assign_task for Phase 5.

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| Modified source files | File edits | Minimal fix applied to affected code |
| Regression test | File add/edit | Test covering the exact bug scenario |
| investigation-report (phase 4) | In-memory JSON | Phases 1-3 fields + fix_applied section |
| Phase 4 summary | Structured text output | Fix description, test added, verification result |

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Iron Law gate passed | confirmed_root_cause present before any code change |
| Fix is minimal | fix_plan.total_files_changed <= 3 OR justification documented |
| Regression test added | fix_applied.regression_test populated |
| Original reproduction passes | fix_applied.reproduction_verified: true |
| No unrelated code changes | Only confirmed_root_cause.affected_code locations modified |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Iron Law gate fails | Output BLOCKED, halt, do not modify any files |
| Edit tool fails twice | Try Bash sed/awk as fallback; if still failing, use Write to recreate file |
| Fix does not resolve reproduction | Analyze remaining failure, adjust fix within Phase 4 |
| Fix requires changing >3 files | Document justification in fix_plan, then proceed |
| No test file found for module | Create new test file following nearest similar test file pattern |
| Regression test is non-deterministic | Refactor test to remove timing/external dependencies |

## Next Phase

-> [Phase 5: Verification & Report](05-verification-report.md)

---

`.codex/skills/investigate/phases/05-verification-report.md` (new file, 240 lines)

# Phase 5: Verification & Report

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Run the full test suite, check for regressions, and generate the structured debug report.

## Objective

- Run the full test suite to verify no regressions were introduced
- Generate a structured debug report for future reference
- Output the report to the `.workflow/.debug/` directory

## Input

| Source | Required | Description |
|--------|----------|-------------|
| investigation-report (phases 1-4) | Yes | All phases populated: evidence, root cause, fix_applied |
| assign_task message | Yes | Phase 5 instruction |

## Execution Steps

### Step 1: Detect and Run Full Test Suite

Detect the project's test framework by checking for project files, then run the full suite:

| Detection file | Test command |
|----------------|--------------|
| `package.json` with `test` script | `npm test` |
| `pytest.ini` or `pyproject.toml` | `pytest` |
| `go.mod` | `go test ./...` |
| `Cargo.toml` | `cargo test` |
| `Makefile` with `test` target | `make test` |
| None detected | Try `npm test`, `pytest`, `go test ./...` sequentially |

```
Bash: mkdir -p .workflow/.debug
Bash: <detected test command>
```

Record test results:

```
test_results = {
  total: <count>,
  passed: <count>,
  failed: <count>,
  skipped: <count>,
  regression_test_passed: true|false,
  new_failures: []
}
```

---

### Step 2: Regression Check

Verify specifically:

1. The new regression test passes (check by test name from fix_applied.regression_test.test_name).
2. All tests that were passing before the fix still pass.
3. No new warnings or errors appeared in test output.

**Decision table for new failures**:

| New failure | Assessment | Action |
|-------------|------------|--------|
| Related to fix (same module, same code path) | Fix introduced regression | Return to Phase 4 to adjust fix |
| Unrelated to fix (different module, pre-existing) | Pre-existing failure | Document in pre_existing_failures, proceed |
| Regression test itself fails | Fix is not working correctly | Return to Phase 4 |

Classify failures:

```
regression_check_result = {
  passed: true|false,
  total_tests: <count>,
  new_failures: ["<test names that newly fail>"],
  pre_existing_failures: ["<tests that were already failing>"]
}
```

---

### Step 3: Generate Structured Debug Report

Compile all investigation data into the final debug report JSON following the schema from `~/.codex/skills/investigate/specs/debug-report-format.md`:

```
debug_report = {
  "bug_description": "<concise one-sentence description of the bug>",
  "reproduction_steps": [
    "<step 1>",
    "<step 2>",
    "<step 3: observe error>"
  ],
  "root_cause": "<confirmed root cause description with technical detail and file:line reference>",
  "evidence_chain": [
    "Phase 1: <error message X observed in module Y>",
    "Phase 2: <pattern analysis found N similar occurrences>",
    "Phase 3: hypothesis H<N> confirmed — <specific condition at file:line>"
  ],
  "fix_description": "<what was changed and why>",
  "files_changed": [
    {
      "path": "<src/module/file.ts>",
      "change_type": "add|modify|remove",
      "description": "<brief description of changes to this file>"
    }
  ],
  "tests_added": [
    {
      "file": "<src/module/__tests__/file.test.ts>",
      "test_name": "<should handle null return from X>",
      "type": "regression|unit|integration"
    }
  ],
  "regression_check_result": {
    "passed": true|false,
    "total_tests": <count>,
    "new_failures": [],
    "pre_existing_failures": []
  },
  "completion_status": "DONE|DONE_WITH_CONCERNS|BLOCKED",
  "concerns": [],
  "timestamp": "<ISO-8601 timestamp>",
  "investigation_duration_phases": 5
}
```

**Field sources**:

| Field | Source Phase | Description |
|-------|--------------|-------------|
| `bug_description` | Phase 1 | User-reported symptom, one sentence |
| `reproduction_steps` | Phase 1 | Ordered steps to trigger the bug |
| `root_cause` | Phase 3 | Confirmed cause with file:line reference |
| `evidence_chain` | Phases 1-3 | Each item prefixed with "Phase N:" |
| `fix_description` | Phase 4 | What code was changed and why |
| `files_changed` | Phase 4 | Each file with change type and description |
| `tests_added` | Phase 4 | Regression tests covering the bug |
| `regression_check_result` | Phase 5 | Full test suite results |
| `completion_status` | Phase 5 | Final status per protocol |
| `concerns` | Phase 5 | Non-blocking issues (if any) |
| `timestamp` | Phase 5 | When report was generated |
| `investigation_duration_phases` | Phase 5 | Always 5 for complete investigation |

---

### Step 4: Write Report File

Compute the filename:

- `<slug>` = bug_description lowercased, non-alphanumeric characters replaced with `-`, truncated to 40 chars
- `<date>` = current date as YYYY-MM-DD

```
Bash: mkdir -p .workflow/.debug
Write: .workflow/.debug/debug-report-<date>-<slug>.json
Content: <debug_report JSON with 2-space indent>
```
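The slug rule above can be written in a few lines. One detail is assumed here: runs of consecutive non-alphanumeric characters collapse into a single `-` rather than one `-` per character.

```python
import datetime
import re

def report_filename(bug_description: str) -> str:
    # Lowercase, replace non-alphanumeric runs with "-", truncate to 40 chars.
    slug = re.sub(r"[^a-z0-9]+", "-", bug_description.lower()).strip("-")[:40]
    date = datetime.date.today().isoformat()  # YYYY-MM-DD
    return f".workflow/.debug/debug-report-{date}-{slug}.json"
```

For example, "Null display_name crashes profile page!" becomes `debug-report-<date>-null-display-name-crashes-profile-page.json`.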

---

### Step 5: Output Completion Status

Determine status and output completion block:

**Status determination**:

| Condition | Status |
|-----------|--------|
| Regression test passes, no new failures, all quality checks met | DONE |
| Fix applied but partial test coverage, minor warnings, or non-critical concerns | DONE_WITH_CONCERNS |
| New test failures introduced by fix (unresolvable), or critical concern | BLOCKED |

**DONE output**:

```
## STATUS: DONE

**Summary**: Fixed <bug_description> — root cause was <root_cause_summary>

### Details
- Phases completed: 5/5
- Root cause: <confirmed_root_cause.description>
- Fix: <fix_description>
- Regression test: <test_name> in <test_file>

### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```

**DONE_WITH_CONCERNS output**:

```
## STATUS: DONE_WITH_CONCERNS

**Summary**: Fixed <bug_description> with concerns

### Details
- Phases completed: 5/5
- Concerns:
  1. <concern> — Impact: low|medium — Suggested fix: <action>

### Outputs
- Debug report: .workflow/.debug/debug-report-<date>-<slug>.json
- Files changed: <list>
- Tests added: <list>
```

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.debug/debug-report-<date>-<slug>.json` | JSON file | Full structured investigation report |
| Completion status block | Structured text output | Final status per Completion Status Protocol |

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Full test suite executed | Test command ran and produced output |
| Regression test passes | test_results.regression_test_passed: true |
| No new failures introduced | regression_check_result.new_failures is empty (or documented as pre-existing) |
| Debug report written | File exists at `.workflow/.debug/debug-report-<date>-<slug>.json` |
| Completion status output | Status block follows protocol format |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Test framework not detected | Try common commands in order; document uncertainty in concerns |
| New failures related to fix | Return to Phase 4 to adjust; do not write report until resolved |
| New failures unrelated | Document as pre_existing_failures, set DONE_WITH_CONCERNS if impactful |
| Report directory not writable | Try alternate path `.workflow/debug/`; document in output |
| Test suite takes >5 minutes | Run regression test only; note full suite skipped in concerns |
| Regression test was not added in Phase 4 | Document as DONE_WITH_CONCERNS concern |

---

`.codex/skills/security-audit/agents/security-auditor.md` (new file, 341 lines)

# Security Auditor Agent

Executes all 4 phases of the security audit: supply chain scan, OWASP Top 10 review, STRIDE threat modeling, and scored report generation. Driven by the orchestrator via assign_task through each phase.

## Identity

- **Type**: `analysis`
- **Role File**: `~/.codex/agents/security-auditor.md`
- **task_name**: `security-auditor`
- **Responsibility**: Read-only analysis (Phases 1–3) + Write (Phase 4 report output)
- **fork_context**: false
- **Reasoning Effort**: high

## Boundaries

### MUST

- Load role definition via MANDATORY FIRST STEPS pattern
- Produce structured JSON output for every phase
- Include file:line references in all code-level findings
- Enforce scoring gates: quick-scan >= 8/10; comprehensive initial >= 2/10
- Deduplicate findings that appear in multiple phases (keep highest severity, merge evidence)
- Write phase output files to `.workflow/.security/` before reporting completion

### MUST NOT

- Skip phases in comprehensive mode — all 4 phases must complete in sequence
- Proceed to next phase before writing current phase output file
- Include sensitive discovered values (actual secrets, credentials) in JSON evidence fields — redact with `[REDACTED]`
- Apply suppression (`@ts-ignore`, empty catch) — report findings as-is

---

## Toolbox

### Available Tools

| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | execution | Run dependency audits, grep patterns, file discovery, directory setup |
| `Read` | read | Load phase files, specs, previous audit reports |
| `Write` | write | Output JSON phase results to `.workflow/.security/` |
| `Glob` | read | Discover source files by pattern for scoping |
| `Grep` | read | Pattern-based security scanning across source files |
| `spawn_agent` | agent | Spawn inline subagent for OWASP CLI analysis (Phase 2) |
| `wait_agent` | agent | Await inline subagent result |
| `close_agent` | agent | Close inline subagent after result received |

### Tool Usage Patterns

**Setup Pattern**: Ensure work directory exists before any phase output.

```
Bash("mkdir -p .workflow/.security")
```

**Read Pattern**: Load phase spec before executing.

```
Read("~/.codex/skills/security-audit/phases/01-supply-chain-scan.md")
Read("~/.codex/skills/security-audit/specs/scoring-gates.md")
```

**Write Pattern**: Output structured JSON after each phase.

```
Write(".workflow/.security/supply-chain-report.json", <json_content>)
```

---

## Execution

### Phase 1: Supply Chain Scan

**Objective**: Detect vulnerable dependencies, hardcoded secrets, CI/CD injection risks, and LLM prompt injection vectors.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` |
| Project root | Yes | Working directory with source files |

**Steps**:

1. Read `~/.codex/skills/security-audit/phases/01-supply-chain-scan.md` for full execution instructions.
2. Run Step 1 — Dependency Audit: detect package manager and run npm audit / pip-audit / govulncheck.
3. Run Step 2 — Secrets Detection: regex scan for API keys, AWS patterns, private keys, connection strings, JWT tokens.
4. Run Step 3 — CI/CD Config Review: scan `.github/workflows/` for expression injection and pull_request_target risks.
5. Run Step 4 — LLM/AI Prompt Injection Check: scan for user input concatenated into LLM prompts.
6. Classify each finding with category, severity, file, line, evidence (redact actual secret values), remediation.
7. Write output file.

**Decision Table — Dependency Audit**:

| Condition | Action |
|-----------|--------|
| npm / yarn lock file found | Run `npm audit --json` |
| requirements.txt / pyproject.toml found | Run `pip-audit --format json`; fallback to `safety check --json` |
| go.sum found | Run `govulncheck ./...` |
| No lock files found | Log INFO finding: "No lock files detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed"; continue |

**Decision Table — Secrets Detection**:

| Pattern Match | Severity | Category |
|---------------|----------|----------|
| API key / secret / token with 16+ char value | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| `-----BEGIN PRIVATE KEY-----` | Critical | secret |
| DB connection string with password | Critical | secret |
| Hardcoded JWT token | High | secret |
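A minimal sketch of such a scan follows. The regexes are illustrative only (real scanners use far richer rule sets and will still produce false positives), and evidence is redacted per the MUST NOT rule above.

```python
import re

# Illustrative patterns covering three rows of the table; not exhaustive.
SECRET_PATTERNS = [
    ("AWS AKIA key pattern", "critical",
     re.compile(r"AKIA[0-9A-Z]{16}")),
    ("Private key block", "critical",
     re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----")),
    ("API key/secret/token with 16+ char value", "critical",
     re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]")),
]

def scan_line(path: str, lineno: int, line: str) -> list[dict]:
    findings = []
    for title, severity, pattern in SECRET_PATTERNS:
        if pattern.search(line):
            findings.append({
                "category": "secret",
                "severity": severity,
                "title": title,
                "file": path,
                "line": lineno,
                "evidence": "[REDACTED]",  # never emit the matched value
            })
    return findings
```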

**Output**: `.workflow/.security/supply-chain-report.json` — schema per phase spec.

---

### Phase 2: OWASP Review

**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/02-owasp-review.md` |
| OWASP checklist | Yes | `~/.codex/skills/security-audit/specs/owasp-checklist.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |

**Steps**:

1. Read `~/.codex/skills/security-audit/phases/02-owasp-review.md` for full execution instructions.
2. Read `~/.codex/skills/security-audit/specs/owasp-checklist.md` for detection patterns.
3. Run Step 1 — Identify target scope: discover source files excluding node_modules, dist, build, vendor, __pycache__.
4. Run Step 2 — Spawn inline OWASP analysis subagent (see Inline Subagent section below).
5. Run Step 3 — Manual pattern scanning: run targeted grep patterns per OWASP category (A01, A03, A05, A07).
6. Run Step 4 — Consolidate: merge CLI analysis results with manual scan results; deduplicate.
7. Set coverage field for each category: `checked` or `not_applicable`.
8. Write output file.

**Decision Table — Scope**:

| Condition | Action |
|-----------|--------|
| Source files found | Proceed with full scan |
| No source files detected | Report as BLOCKED with scope note |
| Files > 500 | Prioritize: routes/, auth/, api/, handlers/ first |

**Output**: `.workflow/.security/owasp-findings.json` — schema per phase spec.

---

## Inline Subagent: OWASP CLI Analysis (Phase 2, Step 2)

**When**: After identifying target scope in Phase 2, Step 2.

**Agent File**: `~/.codex/agents/cli-explore-agent.md`

```
spawn_agent({
  task_name: "inline-owasp-analysis",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

Goal: OWASP Top 10 2021 security analysis of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF

Scope: @src/**/* @**/*.config.* @**/*.env.example

Expected: JSON findings per OWASP category with severity, file:line, evidence, remediation.

Constraints: Code-level analysis only | Every finding must have file:line reference | Focus on real vulnerabilities not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```

**Result Handling**:

| Result | Action |
|--------|--------|
| Success | Integrate findings into owasp-findings.json consolidation step |
| Timeout / Error | Continue with manual pattern scan results only; log warning |

---

### Phase 3: Threat Modeling

**Objective**: Apply STRIDE framework to architecture components; identify trust boundaries and attack surface.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/03-threat-modeling.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |

**Steps**:

1. Read `~/.codex/skills/security-audit/phases/03-threat-modeling.md` for full execution instructions.
2. Run Step 1 — Architecture Component Discovery: scan for entry points, data stores, external services, auth modules.
3. Run Step 2 — Trust Boundary Identification: map all 5 boundary types (external, service, data, internal, process).
4. Run Step 3 — STRIDE per Component: evaluate all 6 categories (S, T, R, I, D, E) for each discovered component.
5. Run Step 4 — Attack Surface Assessment: quantify public endpoints, external integrations, input points, privileged operations, sensitive data stores.
6. Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays.
7. Write output file.

**STRIDE Evaluation Decision Table**:

| Component Type | Priority STRIDE Categories |
|----------------|----------------------------|
| api_endpoint | S (spoofing), T (tampering), D (denial-of-service), E (elevation) |
| auth_module | S (spoofing), R (repudiation), E (elevation) |
| data_store | T (tampering), I (information disclosure), R (repudiation) |
| external_service | T (tampering), I (information disclosure), D (denial-of-service) |
| worker | T (tampering), D (denial-of-service) |

**Output**: `.workflow/.security/threat-model.json` — schema per phase spec.

---

### Phase 4: Report & Tracking

**Objective**: Aggregate all findings, calculate score, compare trends, write dated report.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| Phase spec | Yes | `~/.codex/skills/security-audit/phases/04-report-tracking.md` |
| Scoring gates | Yes | `~/.codex/skills/security-audit/specs/scoring-gates.md` |
| Supply chain report | Yes | `.workflow/.security/supply-chain-report.json` |
| OWASP findings | Yes | `.workflow/.security/owasp-findings.json` |
| Threat model | Yes | `.workflow/.security/threat-model.json` |
| Previous audits | No | `.workflow/.security/audit-report-*.json` (for trend) |

**Steps**:

1. Read `~/.codex/skills/security-audit/phases/04-report-tracking.md` for full execution instructions.
2. Aggregate all findings from phases 1–3 (supply-chain + OWASP + STRIDE gaps).
3. Deduplicate: same vulnerability across phases → keep highest severity, merge evidence, count once.
4. Count files scanned (from phase outputs).
5. Calculate score per formula: `base_score(10.0) - (weighted_sum / max(10, files_scanned))`.
6. Find previous audit: `ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -1`.
7. Compute trend direction and score_delta.
8. Evaluate gate (initial vs. subsequent logic).
9. Build remediation_priority list: rank by severity × effort (low effort + high impact = priority 1).
10. Write dated report.
11. Copy phase outputs to `.workflow/.security/` as latest copies.
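Step 3's deduplication could be sketched as follows. The identity key used here (category plus file:line) is an assumption for illustration; the phase spec defines the real matching rule.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def deduplicate(findings: list[dict]) -> list[dict]:
    merged: dict[tuple, dict] = {}
    for f in findings:
        # Assumed identity: same category at the same file:line is the same finding.
        key = (f["category"], f["file"], f["line"])
        if key not in merged:
            merged[key] = dict(f, evidence=list(f.get("evidence", [])))
        else:
            kept = merged[key]
            # Keep the highest severity, merge evidence, count once.
            if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
                kept["severity"] = f["severity"]
            kept["evidence"].extend(f.get("evidence", []))
    return list(merged.values())
```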

**Score Calculation**:

| Severity | Weight |
|----------|--------|
| critical | 10 |
| high | 7 |
| medium | 4 |
| low | 1 |

Formula: `final_score = max(0, round(10.0 - (weighted_sum / max(10, files_scanned)), 1))`

**Score Interpretation Table**:

| Score Range | Rating | Meaning |
|-------------|--------|---------|
| 9.0 – 10.0 | Excellent | Minimal risk, production-ready |
| 7.0 – 8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0 – 6.9 | Fair | Notable risks, remediation recommended |
| 3.0 – 4.9 | Poor | Significant risks, remediation required |
| 0.0 – 2.9 | Critical | Severe vulnerabilities, immediate action needed |
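The weights and formula above translate directly into code:

```python
WEIGHTS = {"critical": 10, "high": 7, "medium": 4, "low": 1}

def final_score(findings: list[dict], files_scanned: int) -> float:
    # weighted_sum: severity weights summed across deduplicated findings.
    weighted_sum = sum(WEIGHTS[f["severity"]] for f in findings)
    # The max(10, ...) floor keeps a tiny project from being wiped out
    # by a single finding; the score is clamped at 0 from below.
    return max(0, round(10.0 - weighted_sum / max(10, files_scanned), 1))
```

For example, 2 critical and 3 medium findings over 40 scanned files give a weighted sum of 32 and a score of `10.0 - 32/40 = 9.2`.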

**Gate Evaluation**:

| Condition | Gate Result | Status |
|-----------|-------------|--------|
| No previous audit AND score >= 2.0 | PASS | Baseline established |
| No previous audit AND score < 2.0 | FAIL | DONE_WITH_CONCERNS |
| Previous audit AND score >= previous_score | PASS | No regression |
| Previous audit AND score below previous_score by at most 0.5 | WARN | DONE_WITH_CONCERNS |
| Previous audit AND score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS |

**Trend Direction**:

| Condition | direction field |
|-----------|-----------------|
| No previous audit | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |

**Output**: `.workflow/.security/audit-report-<YYYY-MM-DD>.json` — full schema per phase spec.

---

## Structured Output Template

```
## Summary
- One-sentence completion status with phase completed and finding count

## Score (Phase 4 / quick-scan)
- Score: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Trend: <improving|stable|regressing|baseline> (delta: <+/-N.N>)

## Findings
- Critical: <N>  High: <N>  Medium: <N>  Low: <N>

## Phase Outputs Written
- .workflow/.security/supply-chain-report.json
- .workflow/.security/owasp-findings.json (if Phase 2 completed)
- .workflow/.security/threat-model.json (if Phase 3 completed)
- .workflow/.security/audit-report-<date>.json (if Phase 4 completed)

## Top Risks
1. [severity] <title> — <file>:<line> — <remediation summary>
2. [severity] <title> — <file>:<line> — <remediation summary>

## Open Questions
1. <Any scope ambiguity or blocked items>
```

---

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Phase spec file not found | Read from fallback path; report in Open Questions if unavailable |
| Dependency audit tool missing | Log as INFO finding (category: dependency), continue with other steps |
| No source files found | Report as BLOCKED with path; request scope clarification |
| Inline subagent timeout (Phase 2) | Continue with manual grep results only; note in findings summary |
| Phase output file write failure | Retry once; if still failing, report as BLOCKED |
| Previous audit parse error | Treat as baseline (no prior data); note in trend section |
| Timeout approaching mid-phase | Output partial results with "PARTIAL" status, write what is available |

---

`.codex/skills/security-audit/orchestrator.md` (new file, 384 lines)

---
name: security-audit
description: OWASP Top 10 and STRIDE security auditing with supply chain analysis. Triggers on "security audit", "security scan", "cso".
agents: security-auditor
phases: 4
---

# Security Audit

4-phase security audit covering supply chain risks, OWASP Top 10 code review, STRIDE threat modeling, and trend-tracked reporting. Produces structured JSON findings in `.workflow/.security/`.

## Architecture

```
+----------------------------------------------------------------------+
|                     security-audit Orchestrator                      |
|    -> Mode selection: quick-scan (Phase 1 only) vs comprehensive     |
+-----------------------------------+----------------------------------+
                                    |
              +---------------------+---------------------+
              |                                           |
      [quick-scan mode]                          [comprehensive mode]
              |                                           |
    +---------v---------+                     +------------v-----------+
    |      Phase 1      |                     |        Phase 1         |
    | Supply Chain Scan |                     |   Supply Chain Scan    |
    | -> supply-chain-  |                     |   -> supply-chain-     |
    |    report.json    |                     |      report.json       |
    +---------+---------+                     +------------+-----------+
              |                                            |
        [score gate]                           +-----------v-----------+
       score >= 8/10                           |        Phase 2        |
              |                                |     OWASP Review      |
         [DONE or                              |  -> owasp-findings.   |
   DONE_WITH_CONCERNS]                         |     json              |
                                               +-----------+-----------+
                                                           |
                                               +-----------v-----------+
                                               |        Phase 3        |
                                               |    Threat Modeling    |
                                               |       (STRIDE)        |
                                               | -> threat-model.json  |
                                               +-----------+-----------+
                                                           |
                                               +-----------v-----------+
                                               |        Phase 4        |
                                               |   Report & Tracking   |
                                               |  -> audit-report-     |
                                               |     {date}.json       |
                                               +-----------------------+
```
|
||||

---

## Agent Registry

| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|--------------|
| security-auditor | security-auditor | ~/.codex/agents/security-auditor.md | Execute all 4 phases: dependency audit, OWASP review, STRIDE modeling, report generation | Deep Interaction (2.3) | false |

> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.

---

## Fork Context Strategy

| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|--------------|-----------|-----------|
| security-auditor | security-auditor | false | — | Starts fresh; all context provided via assign_task phase messages |

**Fork Decision Rules**:

| Condition | fork_context | Reason |
|-----------|--------------|--------|
| security-auditor spawn | false | Self-contained pipeline; phase inputs passed via assign_task |

---

## Subagent Registry

Utility subagents spawned by `security-auditor` (not by the orchestrator):

| Subagent | Agent File | Callable By | Purpose | Model |
|----------|------------|-------------|---------|-------|
| inline-owasp-analysis | ~/.codex/agents/cli-explore-agent.md | security-auditor (Phase 2) | OWASP Top 10 2021 code-level analysis | haiku |

> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.

---

## Mode Selection

Determine the mode from the user request before spawning any agent.

| User Intent | Mode | Phases to Execute | Gate |
|-------------|------|-------------------|------|
| "quick scan", "daily check", "fast audit" | quick-scan | Phase 1 only | score >= 8/10 |
| "full audit", "comprehensive", "security audit", "cso" | comprehensive | Phases 1 → 2 → 3 → 4 | no regression (initial: >= 2/10) |
| Ambiguous | Prompt user: "Quick-scan (Phase 1 only) or comprehensive (all 4 phases)?" | — | — |
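The intent-to-mode mapping above can be sketched as follows (a minimal illustration; the keyword tuples are assumptions drawn from the table, not an exhaustive trigger list):

```python
def select_mode(request: str) -> str:
    """Map a user request to an audit mode per the intent table."""
    text = request.lower()
    if any(k in text for k in ("quick scan", "daily check", "fast audit")):
        return "quick-scan"
    if any(k in text for k in ("full audit", "comprehensive", "security audit", "cso")):
        return "comprehensive"
    return "ask-user"  # ambiguous: prompt the user before spawning any agent
```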

---

## Phase Execution

### Phase 1: Supply Chain Scan

**Objective**: Detect low-hanging security risks in dependencies, secrets, CI/CD pipelines, and LLM integrations.

**Input**:

| Source | Description |
|--------|-------------|
| Working directory | Project source to be scanned |
| Mode | quick-scan or comprehensive |

**Execution**:

Spawn the security-auditor agent and assign Phase 1:

```
spawn_agent({
  task_name: "security-auditor",
  fork_context: false,
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/skills/security-audit/agents/security-auditor.md

## TASK: Phase 1 — Supply Chain Scan

Mode: <quick-scan|comprehensive>
Work directory: .workflow/.security

Execute Phase 1 per: ~/.codex/skills/security-audit/phases/01-supply-chain-scan.md

Deliverables:
- .workflow/.security/supply-chain-report.json
- Structured output summary with finding counts by severity`
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```

**On timeout**:

```
assign_task({
  target: "security-auditor",
  items: [{ type: "text", text: "Finalize current supply chain scan and output supply-chain-report.json now." }]
})
const phase1Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 120000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Dependency, secrets, CI/CD, and LLM findings |

---

### Quick-Scan Gate (quick-scan mode only)

After Phase 1 completes, evaluate the score and close the agent.

| Condition | Action |
|-----------|--------|
| score >= 8.0 | Status: DONE. No blocking issues. |
| 6.0 <= score < 8.0 | Status: DONE_WITH_CONCERNS. Log warning — review before deploy. |
| score < 6.0 | Status: DONE_WITH_CONCERNS. Block deployment. Remediate critical/high findings. |

```
close_agent({ target: "security-auditor" })
```

> **If quick-scan mode**: Stop here. Output the final summary with score and findings count.
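The gate table above reduces to a pair of threshold checks; a minimal sketch (Python used here purely for illustration):

```python
def quick_scan_gate(score: float) -> tuple[str, str]:
    """Evaluate the quick-scan gate thresholds from the table above."""
    if score >= 8.0:
        return ("DONE", "No blocking issues.")
    if score >= 6.0:
        return ("DONE_WITH_CONCERNS", "Log warning; review before deploy.")
    return ("DONE_WITH_CONCERNS", "Block deployment; remediate critical/high findings.")
```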

---

### Phase 2: OWASP Review (comprehensive mode only)

**Objective**: Systematic code-level review against all 10 OWASP Top 10 2021 categories.

**Input**:

| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings for context |
| Source files | All .ts/.js/.py/.go/.java excluding node_modules, dist, build |

**Execution**:

```
assign_task({
  target: "security-auditor",
  items: [{ type: "text", text: `## Phase 2 — OWASP Review

Execute Phase 2 per: ~/.codex/skills/security-audit/phases/02-owasp-review.md

Context: supply-chain-report.json already written to .workflow/.security/
Reference: ~/.codex/skills/security-audit/specs/owasp-checklist.md

Deliverables:
- .workflow/.security/owasp-findings.json
- Coverage for all 10 OWASP categories (A01–A10)` }]
})
const phase2Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| `.workflow/.security/owasp-findings.json` | OWASP findings with owasp_id, severity, file:line, evidence, remediation |

---

### Phase 3: Threat Modeling (comprehensive mode only)

**Objective**: Apply STRIDE threat model to architecture components; assess attack surface.

**Input**:

| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Phase 2 findings |
| Source files | Route handlers, data stores, auth modules, external service clients |

**Execution**:

```
assign_task({
  target: "security-auditor",
  items: [{ type: "text", text: `## Phase 3 — Threat Modeling (STRIDE)

Execute Phase 3 per: ~/.codex/skills/security-audit/phases/03-threat-modeling.md

Context: supply-chain-report.json and owasp-findings.json available in .workflow/.security/
Cross-reference Phase 1 and Phase 2 findings when mapping STRIDE categories.

Deliverables:
- .workflow/.security/threat-model.json
- All 6 STRIDE categories (S, T, R, I, D, E) evaluated per component
- Trust boundaries and attack surface quantified` }]
})
const phase3Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 360000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| `.workflow/.security/threat-model.json` | STRIDE threat model with components, trust boundaries, attack surface |

---

### Phase 4: Report & Tracking (comprehensive mode only)

**Objective**: Calculate score, compare with previous audits, generate date-stamped report.

**Input**:

| Source | Description |
|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | Phase 1 output |
| `.workflow/.security/owasp-findings.json` | Phase 2 output |
| `.workflow/.security/threat-model.json` | Phase 3 output |
| `.workflow/.security/audit-report-*.json` | Previous audit reports (optional, for trend) |

**Execution**:

```
assign_task({
  target: "security-auditor",
  items: [{ type: "text", text: `## Phase 4 — Report & Tracking

Execute Phase 4 per: ~/.codex/skills/security-audit/phases/04-report-tracking.md

Scoring reference: ~/.codex/skills/security-audit/specs/scoring-gates.md

Steps:
1. Aggregate all findings from phases 1–3
2. Calculate score using formula: base 10.0 - (weighted_sum / normalization)
3. Check for previous audit: ls -t .workflow/.security/audit-report-*.json | head -1
4. Compute trend (improving/stable/regressing/baseline)
5. Evaluate gate (initial >= 2/10; subsequent >= previous_score)
6. Write .workflow/.security/audit-report-<YYYY-MM-DD>.json

Deliverables:
- .workflow/.security/audit-report-<YYYY-MM-DD>.json
- Updated copies of all phase outputs in .workflow/.security/` }]
})
const phase4Result = wait_agent({ targets: ["security-auditor"], timeout_ms: 300000 })
```

**Output**:

| Artifact | Description |
|----------|-------------|
| `.workflow/.security/audit-report-<date>.json` | Full scored report with trend, top risks, remediation priority |

---

### Comprehensive Gate (comprehensive mode only)

After Phase 4 completes, evaluate the gate and close the agent.

| Audit Type | Condition | Result | Action |
|------------|-----------|--------|--------|
| Initial (no prior audit) | score >= 2.0 | PASS | DONE. Baseline established. Plan remediation. |
| Initial | score < 2.0 | FAIL | DONE_WITH_CONCERNS. Critical exposure. Immediate triage required. |
| Subsequent | score >= previous_score | PASS | DONE. No regression. |
| Subsequent | previous_score - 0.5 <= score < previous_score | WARN | DONE_WITH_CONCERNS. Marginal change. Review new findings. |
| Subsequent | score < previous_score - 0.5 | FAIL | DONE_WITH_CONCERNS. Regression detected. Investigate new findings. |

```
close_agent({ target: "security-auditor" })
```

---

## Lifecycle Management

### Timeout Protocol

| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Supply Chain | 300000 ms (5 min) | assign_task "Finalize output now", re-wait 120s |
| Phase 2: OWASP Review | 360000 ms (6 min) | assign_task "Output partial findings", re-wait 120s |
| Phase 3: Threat Modeling | 360000 ms (6 min) | assign_task "Output partial threat model", re-wait 120s |
| Phase 4: Report | 300000 ms (5 min) | assign_task "Write report with available data", re-wait 120s |

### Cleanup Protocol

The agent is closed after the final executed phase (Phase 1 for quick-scan, Phase 4 for comprehensive).

```
close_agent({ target: "security-auditor" })
```

---

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task "Finalize current work and output now" + re-wait 120000 ms |
| Agent timeout (second) | Log error, close_agent({ target: "security-auditor" }), report partial results |
| Phase output file missing | assign_task requesting specific file output, re-wait |
| Audit tool not installed (npm/pip) | Phase 1 logs as INFO finding and continues — not a blocker |
| No previous audit found | Treat as baseline — apply initial gate (>= 2/10) |
| User cancellation | close_agent({ target: "security-auditor" }), report current state |

---

## Output Format

```
## Summary
- One-sentence completion status with mode and final score

## Score
- Overall: <N>/10 (<Rating>)
- Gate: PASS|FAIL|WARN
- Mode: quick-scan|comprehensive

## Findings
- Critical: <N>
- High: <N>
- Medium: <N>
- Low: <N>

## Artifacts
- File: .workflow/.security/supply-chain-report.json
- File: .workflow/.security/owasp-findings.json (comprehensive only)
- File: .workflow/.security/threat-model.json (comprehensive only)
- File: .workflow/.security/audit-report-<date>.json (comprehensive only)

## Top Risks
1. <Most critical finding with file:line and remediation>
2. <Second finding>

## Next Steps
1. Remediate critical findings (effort: <low|medium|high>)
2. Re-run audit to verify fixes
```
.codex/skills/security-audit/phases/01-supply-chain-scan.md
Normal file
@@ -0,0 +1,226 @@

# Phase 1: Supply Chain Scan

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, you **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Detect low-hanging security risks in third-party dependencies, hardcoded secrets, CI/CD pipelines, and LLM/AI integrations.

## Objective

- Audit third-party dependencies for known vulnerabilities
- Scan source code for leaked secrets and credentials
- Review CI/CD configuration for injection risks
- Check for LLM/AI prompt injection vulnerabilities

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Project root | Yes | Working directory containing source files and dependency manifests |
| WORK_DIR | Yes | `.workflow/.security` — output directory (create if it does not exist) |

## Execution Steps

### Step 1: Dependency Audit

Detect the package manager and run the appropriate audit tool.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| `package-lock.json` or `yarn.lock` present | Run `npm audit --json` |
| `requirements.txt` or `pyproject.toml` present | Run `pip-audit --format json`; fallback `safety check --json` |
| `go.sum` present | Run `govulncheck ./...` |
| No manifest files found | Log INFO finding: "No dependency manifests detected"; continue |
| Audit tool not installed | Log INFO finding: "<tool> not installed — manual review needed"; continue |

**Execution**:

```bash
# Ensure output directory exists
mkdir -p .workflow/.security
WORK_DIR=".workflow/.security"

# Node.js projects
if [ -f package-lock.json ] || [ -f yarn.lock ]; then
  npm audit --json > "${WORK_DIR}/npm-audit-raw.json" 2>&1 || true
fi

# Python projects
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then
  pip-audit --format json --output "${WORK_DIR}/pip-audit-raw.json" 2>&1 || true
  # Fallback: safety check
  safety check --json > "${WORK_DIR}/safety-raw.json" 2>&1 || true
fi

# Go projects
if [ -f go.sum ]; then
  govulncheck ./... 2>&1 | tee "${WORK_DIR}/govulncheck-raw.txt" || true
fi
```

---

### Step 2: Secrets Detection

Scan source files for hardcoded secrets using regex patterns. Exclude generated, compiled, and dependency directories.

**Decision Table**:

| Match Type | Severity | Category |
|------------|----------|----------|
| API key / token with 16+ chars | Critical | secret |
| AWS AKIA key pattern | Critical | secret |
| Private key PEM block | Critical | secret |
| DB connection string with embedded password | Critical | secret |
| Hardcoded JWT token | High | secret |
| No matches | — | No finding |

**Execution**:

```bash
# High-confidence patterns (case-insensitive)
grep -rniE \
  '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key)\s*[:=]\s*["\x27][A-Za-z0-9+/=_-]{16,}' \
  --include='*.ts' --include='*.js' --include='*.py' --include='*.go' \
  --include='*.java' --include='*.rb' --include='*.env' --include='*.yml' \
  --include='*.yaml' --include='*.json' --include='*.toml' --include='*.cfg' \
  . || true

# AWS patterns
grep -rniE '(AKIA[0-9A-Z]{16}|aws[_-]?secret[_-]?access[_-]?key)' . || true

# Private keys (-e keeps grep from parsing the leading dashes as options)
grep -rniE -e '-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----' . || true

# Connection strings with passwords
grep -rniE '(mongodb|postgres|mysql|redis)://[^:]+:[^@]+@' . || true

# JWT tokens (hardcoded)
grep -rniE 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' . || true
```

Exclude from the scan: `node_modules/`, `.git/`, `dist/`, `build/`, `__pycache__/`, `*.lock`, `*.min.js`.

Redact actual matched secret values in findings — use `[REDACTED]` in the evidence field.
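The redaction rule can be sketched as follows (the patterns shown mirror a subset of the grep scans above and are not exhaustive):

```python
import re

# Subset of the scan patterns above, for illustration only.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS access key ID
    re.compile(r"eyJ[\w-]{10,}\.[\w-]{10,}\.[\w-]{10,}"),  # hardcoded JWT
]

def redact(evidence: str) -> str:
    """Replace matched secret values with [REDACTED] before writing a finding."""
    for pattern in SECRET_PATTERNS:
        evidence = pattern.sub("[REDACTED]", evidence)
    return evidence
```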

---

### Step 3: CI/CD Config Review

Check GitHub Actions and other CI/CD configurations for injection risks.

**Decision Table**:

| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| `${{ github.event.` in `run:` block | High | Expression injection in workflow run step |
| `pull_request_target` with checkout of PR code | High | Privileged workflow triggered by untrusted code |
| `actions/checkout@v1` or `@v2` | Medium | Deprecated action version with known issues |
| `secrets.` passed to untrusted context | High | Secret exposure risk |
| No `.github/workflows/` directory | — | Not applicable; skip |

**Execution**:

```bash
# Find workflow files
find .github/workflows \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null || true

# Check for expression injection in run: blocks
# Dangerous: ${{ github.event.pull_request.title }} in run:
grep -rn '\${{.*github\.event\.' .github/workflows/ 2>/dev/null || true

# Check for pull_request_target with checkout of PR code
grep -rn 'pull_request_target' .github/workflows/ 2>/dev/null || true

# Check for use of deprecated/vulnerable actions
grep -rn 'actions/checkout@v1\|actions/checkout@v2' .github/workflows/ 2>/dev/null || true

# Check for secrets passed to untrusted contexts
grep -rn 'secrets\.' .github/workflows/ 2>/dev/null || true
```

---

### Step 4: LLM/AI Prompt Injection Check

Scan for patterns indicating prompt injection risk in LLM integrations.

**Decision Table**:

| Pattern Found | Severity | Finding |
|---------------|----------|---------|
| User input directly concatenated into prompt/system_message | High | LLM prompt injection vector |
| User input in template string passed to LLM call | High | LLM prompt injection via template |
| f-string with user data in `.complete`/`.generate` call | High | Python LLM prompt injection |
| LLM API call detected, no injection pattern | Low | LLM integration present — review for sanitization |

**Execution**:

```bash
# User input concatenated directly into prompts
grep -rniE '(prompt|system_message|messages)\s*[+=].*\b(user_input|request\.(body|query|params)|req\.)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Template strings with user data in LLM calls
grep -rniE '(openai|anthropic|llm|chat|completion)\.' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Check for missing input sanitization before LLM calls
grep -rniE 'f".*{.*}.*".*\.(chat|complete|generate)' \
  --include='*.py' . || true
```

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/supply-chain-report.json` | JSON | All supply chain findings with severity classifications |

```json
{
  "phase": "supply-chain-scan",
  "timestamp": "ISO-8601",
  "findings": [
    {
      "category": "dependency|secret|cicd|llm",
      "severity": "critical|high|medium|low",
      "title": "Finding title",
      "description": "Detailed description",
      "file": "path/to/file",
      "line": 42,
      "evidence": "matched text or context",
      "remediation": "How to fix"
    }
  ],
  "summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "by_category": { "dependency": 0, "secret": 0, "cicd": 0, "llm": 0 }
  }
}
```
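The `summary` block can be derived mechanically from the findings array; a minimal sketch:

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")
CATEGORIES = ("dependency", "secret", "cicd", "llm")

def summarize(findings: list[dict]) -> dict:
    """Compute the summary block of supply-chain-report.json from findings."""
    by_sev = Counter(f["severity"] for f in findings)
    by_cat = Counter(f["category"] for f in findings)
    return {
        "total": len(findings),
        "by_severity": {s: by_sev.get(s, 0) for s in SEVERITIES},
        "by_category": {c: by_cat.get(c, 0) for c in CATEGORIES},
    }
```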

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| All 4 scan steps executed or explicitly skipped with reason | Review step execution log |
| `supply-chain-report.json` written to `.workflow/.security/` | File exists and is valid JSON |
| All findings have category, severity, file, evidence, remediation | JSON schema check |
| Secret values redacted in evidence field | No raw credential values in output |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Audit tool not installed | Log INFO finding; continue with remaining steps |
| `grep` finds no matches | No finding generated for that pattern; continue |
| `.github/workflows/` does not exist | Mark CI/CD step as not_applicable; continue |
| Write to WORK_DIR fails | Attempt `mkdir -p .workflow/.security` and retry once |

## Next Phase

-> [Phase 2: OWASP Review](02-owasp-review.md)
.codex/skills/security-audit/phases/02-owasp-review.md
Normal file
@@ -0,0 +1,232 @@

# Phase 2: OWASP Review

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, you **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Systematic code-level review against OWASP Top 10 2021 categories using inline subagent analysis and targeted pattern scanning.

## Objective

- Review the codebase against all 10 OWASP Top 10 2021 categories
- Use inline subagent multi-model analysis for comprehensive coverage
- Produce structured findings with file:line references and remediation steps

## Input

| Source | Required | Description |
|--------|----------|-------------|
| `~/.codex/skills/security-audit/specs/owasp-checklist.md` | Yes | Detection patterns per OWASP category |
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency context |
| Project source files | Yes | `.ts`, `.js`, `.py`, `.go`, `.java` excluding deps/build |

## Execution Steps

### Step 1: Identify Target Scope

Discover source files, excluding generated and dependency directories.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Source files found | Proceed to Step 2 |
| No source files found | Report as BLOCKED with path note; do not proceed |
| Files > 500 | Prioritize routes/, auth/, api/, handlers/ first |

**Execution**:

```bash
# Identify source directories (exclude deps, build, test fixtures)
# Focus on: API routes, auth modules, data access, input handlers
find . -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.java' \) \
  ! -path '*/node_modules/*' ! -path '*/dist/*' ! -path '*/.git/*' \
  ! -path '*/build/*' ! -path '*/__pycache__/*' ! -path '*/vendor/*' \
  | head -200
```

---

### Step 2: Inline Subagent OWASP Analysis

Spawn an inline subagent using the `cli-explore-agent` role to perform systematic OWASP analysis.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Subagent completes successfully | Integrate findings into Step 4 consolidation |
| Subagent times out | Continue with manual pattern scan (Step 3) only; log warning |
| Subagent errors | Continue with manual pattern scan only; log warning |

```
spawn_agent({
  task_name: "inline-owasp-analysis",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

Goal: OWASP Top 10 2021 security audit of this codebase.
Systematically check each OWASP category:
A01 Broken Access Control | A02 Cryptographic Failures | A03 Injection |
A04 Insecure Design | A05 Security Misconfiguration | A06 Vulnerable Components |
A07 Identification/Auth Failures | A08 Software/Data Integrity Failures |
A09 Security Logging/Monitoring Failures | A10 SSRF

TASK: For each OWASP category, scan relevant code patterns, identify vulnerabilities with file:line references, classify severity, provide remediation.

MODE: analysis

CONTEXT: @src/**/* @**/*.config.* @**/*.env.example

EXPECTED: JSON-structured findings per OWASP category with severity, file:line, evidence, remediation.

CONSTRAINTS: Code-level analysis only | Every finding must have a file:line reference | Focus on real vulnerabilities, not theoretical risks`
})
const result = wait_agent({ targets: ["inline-owasp-analysis"], timeout_ms: 300000 })
close_agent({ target: "inline-owasp-analysis" })
```

---

### Step 3: Manual Pattern Scanning

Supplement the inline subagent analysis with targeted grep patterns per OWASP category. Reference `~/.codex/skills/security-audit/specs/owasp-checklist.md` for the full pattern list.

**A01 — Broken Access Control**:

```bash
# Missing auth middleware on routes
grep -rn 'app\.\(get\|post\|put\|delete\|patch\)(' --include='*.ts' --include='*.js' . | grep -v 'auth\|middleware\|protect' || true
# Direct object references without ownership check
grep -rn 'params\.id\|req\.params\.' --include='*.ts' --include='*.js' . || true
```

**A03 — Injection**:

```bash
# SQL string concatenation
grep -rniE '(query|execute|raw)\s*\(\s*[`"'\'']\s*SELECT.*\+\s*|f".*SELECT.*{' --include='*.ts' --include='*.js' --include='*.py' . || true
# Command injection
grep -rniE '(exec|spawn|system|popen|subprocess)\s*\(' --include='*.ts' --include='*.js' --include='*.py' . || true
```

**A05 — Security Misconfiguration**:

```bash
# Debug mode enabled
grep -rniE '(DEBUG|debug)\s*[:=]\s*(true|True|1|"true")' --include='*.env' --include='*.py' --include='*.ts' --include='*.json' . || true
# CORS wildcard
grep -rniE "cors.*\*|Access-Control-Allow-Origin.*\*" --include='*.ts' --include='*.js' --include='*.py' . || true
```

**A07 — Identification and Authentication Failures**:

```bash
# Weak password patterns
grep -rniE 'password.*length.*[0-5][^0-9]|minlength.*[0-5][^0-9]' --include='*.ts' --include='*.js' --include='*.py' . || true
# Hardcoded credentials
grep -rniE '(password|passwd|pwd)\s*[:=]\s*["\x27][^"\x27]{3,}' --include='*.ts' --include='*.js' --include='*.py' --include='*.env' . || true
```
---

### Step 4: Consolidate Findings

Merge the inline subagent results and the manual pattern scan results. Deduplicate and classify by OWASP category.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Same finding in both sources | Keep highest severity; merge evidence; note both sources |
| Finding lacks file:line reference | Attempt to resolve via grep; if not resolvable, mark evidence as "pattern match — no line ref" |
| Category has no findings | Set coverage to `checked` with 0 findings |
| Category not applicable to project stack | Set coverage to `not_applicable` with reason |
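The first two merge rules can be sketched as follows (the deduplication key is an assumption; any stable identity such as `(owasp_id, file, line)` works):

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def merge_findings(subagent: list[dict], manual: list[dict]) -> list[dict]:
    """Deduplicate findings across sources, keeping the highest severity."""
    merged = {}
    for source, findings in (("subagent", subagent), ("manual", manual)):
        for f in findings:
            key = (f["owasp_id"], f.get("file"), f.get("line"))
            prev = merged.get(key)
            if prev is None:
                merged[key] = {**f, "sources": [source]}
                continue
            if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[prev["severity"]]:
                prev["severity"] = f["severity"]
            if source not in prev["sources"]:
                prev["sources"].append(source)
    return list(merged.values())
```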
---

## OWASP Top 10 2021 Coverage

| ID | Category | Key Checks |
|----|----------|------------|
| A01 | Broken Access Control | Missing auth, IDOR, path traversal, CORS |
| A02 | Cryptographic Failures | Weak algorithms, plaintext storage, missing TLS |
| A03 | Injection | SQL, NoSQL, OS command, LDAP, XPath injection |
| A04 | Insecure Design | Missing threat modeling, insecure business logic |
| A05 | Security Misconfiguration | Debug enabled, default creds, verbose errors |
| A06 | Vulnerable and Outdated Components | Known CVEs in dependencies (from Phase 1) |
| A07 | Identification and Authentication Failures | Weak passwords, missing MFA, session issues |
| A08 | Software and Data Integrity Failures | Unsigned updates, insecure deserialization, CI/CD |
| A09 | Security Logging and Monitoring Failures | Missing audit logs, no alerting, insufficient logging |
| A10 | Server-Side Request Forgery (SSRF) | Unvalidated URLs, internal resource access |

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/owasp-findings.json` | JSON | Findings per OWASP category with coverage map |

```json
{
  "phase": "owasp-review",
  "timestamp": "ISO-8601",
  "owasp_version": "2021",
  "findings": [
    {
      "owasp_id": "A01",
      "owasp_category": "Broken Access Control",
      "severity": "critical|high|medium|low",
      "title": "Finding title",
      "description": "Detailed description",
      "file": "path/to/file",
      "line": 42,
      "evidence": "code snippet or pattern match",
      "remediation": "Specific fix recommendation",
      "cwe": "CWE-XXX"
    }
  ],
  "coverage": {
    "A01": "checked|not_applicable",
    "A02": "checked|not_applicable",
    "A03": "checked|not_applicable",
    "A04": "checked|not_applicable",
    "A05": "checked|not_applicable",
    "A06": "checked|not_applicable",
    "A07": "checked|not_applicable",
    "A08": "checked|not_applicable",
    "A09": "checked|not_applicable",
    "A10": "checked|not_applicable"
  },
  "summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "categories_checked": 10,
    "categories_with_findings": 0
  }
}
```

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| All 10 OWASP categories have a coverage entry | JSON coverage map has all A01–A10 keys |
| All findings have owasp_id, severity, file, evidence, remediation | JSON schema check |
| `owasp-findings.json` written to `.workflow/.security/` | File exists and is valid JSON |
| Inline subagent result integrated (or skip logged) | Summary includes source note |
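
The coverage criterion in the table above can be checked mechanically. A minimal sketch in POSIX shell; the sample file and the grep-based check are illustrative, not part of the skill spec:

```shell
# Grep-based check that the coverage map lists all ten OWASP categories.
# The sample file below stands in for a real owasp-findings.json.
cat > /tmp/owasp-findings.json <<'EOF'
{ "coverage": { "A01": "checked", "A02": "checked", "A03": "checked", "A04": "checked",
                "A05": "not_applicable", "A06": "checked", "A07": "checked",
                "A08": "checked", "A09": "checked", "A10": "checked" } }
EOF
missing=0
for key in A01 A02 A03 A04 A05 A06 A07 A08 A09 A10; do
  grep -q "\"$key\"" /tmp/owasp-findings.json || missing=$((missing + 1))
done
[ "$missing" -eq 0 ] && echo "coverage OK" || echo "missing $missing categories"
```

A real validation would parse the JSON (e.g., with jq) rather than grep for key names, but the shape of the check is the same.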
## Error Handling

| Scenario | Resolution |
|----------|------------|
| Inline subagent timeout | Continue with manual grep results; log "inline-owasp-analysis timed out" in summary |
| OWASP checklist spec not found | Use built-in patterns from this file; note missing spec |
| No source files in scope | Report BLOCKED with path; set all categories to not_applicable |
| Grep produces no matches for a category | Set that category coverage to `checked` with 0 findings |
## Next Phase

-> [Phase 3: Threat Modeling](03-threat-modeling.md)
249
.codex/skills/security-audit/phases/03-threat-modeling.md
Normal file
@@ -0,0 +1,249 @@

# Phase 3: Threat Modeling

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Map STRIDE threat categories to architecture components, identify trust boundaries, and assess attack surface.

## Objective

- Apply the STRIDE threat model to the project architecture
- Identify trust boundaries between system components
- Assess attack surface area per component
- Cross-reference with Phase 1 and Phase 2 findings

## Input

| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings for dependency/CI context |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings to cross-reference in STRIDE gaps |
| Project source files | Yes | Route handlers, data stores, external service clients, auth modules |
## Execution Steps

### Step 1: Architecture Component Discovery

Identify major system components by scanning project structure.

**Decision Table**:

| Component Pattern Found | component.type |
|-------------------------|----------------|
| `app.get/post/put/delete/patch`, `router.`, `@app.route`, `@router.` | api_endpoint |
| `createConnection`, `mongoose.connect`, `sqlite`, `redis`, `S3`, `createClient` | data_store |
| `fetch`, `axios`, `http.request`, `requests.get/post`, `urllib` | external_service |
| `jwt`, `passport`, `session`, `oauth`, `bcrypt`, `argon2`, `crypto` | auth_module |
| `worker`, `subprocess`, `child_process`, `celery`, `queue` | worker |

**Execution**:

```bash
# Identify entry points (API routes, CLI commands, event handlers)
grep -rlE '(app\.(get|post|put|delete|patch|use)|router\.|@app\.route|@router\.)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify data stores (database connections, file storage)
grep -rlE '(createConnection|mongoose\.connect|sqlite|redis|S3|createClient)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify external service integrations
grep -rlE '(fetch|axios|http\.request|requests\.(get|post)|urllib)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true

# Identify auth/session components
grep -rlE '(jwt|passport|session|oauth|bcrypt|argon2|crypto)' \
  --include='*.ts' --include='*.js' --include='*.py' . || true
```

---
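
The decision table above can be applied per file. A minimal sketch in POSIX shell, stopping at the first matching family; the `component_type` helper and the demo files are hypothetical:

```shell
# Return the first matching component.type for a file, in table order.
component_type() {
  f="$1"
  grep -qE 'app\.(get|post|put|delete|patch|use)|router\.' "$f" && { echo api_endpoint; return; }
  grep -qE 'createConnection|mongoose\.connect|sqlite|redis|createClient' "$f" && { echo data_store; return; }
  grep -qE 'jwt|passport|session|oauth|bcrypt|argon2' "$f" && { echo auth_module; return; }
  grep -qE 'fetch|axios|http\.request' "$f" && { echo external_service; return; }
  echo unknown
}
printf 'app.get("/users", handler)\n' > /tmp/demo.ts
component_type /tmp/demo.ts   # prints: api_endpoint
```

A file that matches several families would be recorded once per family by the real skill; the sketch stops at the first match for brevity.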

### Step 2: Trust Boundary Identification

Map the five standard trust boundary types.

**Trust Boundary Types**:

| Boundary | From | To | Key Data Crossing |
|----------|------|----|-------------------|
| External boundary | User/browser | Application server | User input, credentials, session tokens |
| Service boundary | Application | External APIs/services | API keys, request bodies, response data |
| Data boundary | Application | Database/storage | Query parameters, credentials, PII |
| Internal boundary | Public routes | Authenticated/admin routes | Auth tokens, role claims |
| Process boundary | Main process | Worker/subprocess | Job parameters, environment variables |

For each boundary, document:
- What crosses the boundary (data types, credentials)
- How the boundary is enforced (middleware, TLS, auth)
- What happens when enforcement fails

---

### Step 3: STRIDE per Component

For each discovered component, evaluate all 6 STRIDE categories systematically.

**STRIDE Category Definitions**:

| Category | Threat | Key Question |
|----------|--------|--------------|
| S — Spoofing | Identity impersonation | Can an attacker pretend to be someone else? |
| T — Tampering | Data modification | Can data be modified in transit or at rest? |
| R — Repudiation | Deniable actions | Can a user deny performing an action? |
| I — Information Disclosure | Data leakage | Can sensitive data be exposed? |
| D — Denial of Service | Availability disruption | Can the system be made unavailable? |
| E — Elevation of Privilege | Unauthorized access | Can a user gain higher privileges? |

**Spoofing Analysis Checks**:
- Are authentication mechanisms in place at all entry points?
- Can API keys or tokens be forged or replayed?
- Are session tokens properly validated and rotated?

**Tampering Analysis Checks**:
- Is input validation applied before processing?
- Are database queries parameterized?
- Can request bodies or headers be manipulated to alter behavior?
- Are file uploads validated for type and content?

**Repudiation Analysis Checks**:
- Are user actions logged with sufficient detail (who, what, when)?
- Are logs tamper-proof or centralized?
- Can critical operations (payments, deletions) be traced to a user?

**Information Disclosure Analysis Checks**:
- Do error responses leak stack traces or internal paths?
- Are sensitive fields (passwords, tokens) excluded from logs and API responses?
- Is PII properly handled (encryption at rest, masking in logs)?
- Do debug endpoints or verbose modes expose internals?

**Denial of Service Analysis Checks**:
- Are rate limits applied to public endpoints?
- Can resource-intensive operations be triggered without limits?
- Are file upload sizes bounded?
- Are database queries bounded (pagination, timeouts)?

**Elevation of Privilege Analysis Checks**:
- Are role/permission checks applied consistently?
- Can horizontal privilege escalation occur (accessing other users' data)?
- Can vertical escalation occur (user -> admin)?
- Are admin/debug routes properly protected?

**Component Exposure Rating**:

| Rating | Criteria |
|--------|----------|
| High | Public-facing, handles sensitive data, complex logic |
| Medium | Authenticated access, moderate data sensitivity |
| Low | Internal only, no sensitive data, simple operations |

---

### Step 4: Attack Surface Assessment

Quantify the attack surface across the entire system.

**Attack Surface Components**:

```
Attack Surface = Sum of:
  - Number of public API endpoints
  - Number of external service integrations
  - Number of user-controllable input points
  - Number of privileged operations
  - Number of data stores with sensitive content
```

**Decision Table — Attack Surface Rating**:

| Total Score | Interpretation |
|-------------|----------------|
| 0–5 | Low attack surface |
| 6–15 | Moderate attack surface |
| 16–30 | High attack surface |
| > 30 | Very high attack surface — prioritize hardening |

Cross-reference Phase 1 and Phase 2 findings when populating `gaps` arrays for each STRIDE category. A finding in Phase 2 (e.g., A03 injection) maps to STRIDE T (Tampering) for the relevant component.

---
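
The rating thresholds above translate directly into code. A minimal sketch in POSIX shell; the `rate_surface` helper is illustrative:

```shell
# Rate the attack surface from the five component counts.
rate_surface() {  # args: endpoints integrations input_points privileged_ops sensitive_stores
  total=$(( $1 + $2 + $3 + $4 + $5 ))
  if   [ "$total" -le 5 ];  then echo "low ($total)"
  elif [ "$total" -le 15 ]; then echo "moderate ($total)"
  elif [ "$total" -le 30 ]; then echo "high ($total)"
  else                           echo "very_high ($total)"
  fi
}
rate_surface 4 2 6 1 2   # prints: moderate (15)
```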
|
||||
|
||||
## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/threat-model.json` | JSON | STRIDE model with components, trust boundaries, attack surface |

```json
{
  "phase": "threat-modeling",
  "timestamp": "ISO-8601",
  "framework": "STRIDE",
  "components": [
    {
      "name": "Component name",
      "type": "api_endpoint|data_store|external_service|auth_module|worker",
      "files": ["path/to/file.ts"],
      "exposure": "high|medium|low",
      "trust_boundaries": ["external", "data"],
      "threats": {
        "spoofing": {
          "applicable": true,
          "findings": ["Description of threat"],
          "mitigations": ["Existing mitigation"],
          "gaps": ["Missing mitigation"]
        },
        "tampering": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "repudiation": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "information_disclosure": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "denial_of_service": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] },
        "elevation_of_privilege": { "applicable": true, "findings": [], "mitigations": [], "gaps": [] }
      }
    }
  ],
  "trust_boundaries": [
    {
      "name": "Boundary name",
      "from": "Component A",
      "to": "Component B",
      "enforcement": "TLS|auth_middleware|API_key",
      "data_crossing": ["request bodies", "credentials"],
      "risk_level": "high|medium|low"
    }
  ],
  "attack_surface": {
    "public_endpoints": 0,
    "external_integrations": 0,
    "input_points": 0,
    "privileged_operations": 0,
    "sensitive_data_stores": 0,
    "total_score": 0
  },
  "summary": {
    "components_analyzed": 0,
    "threats_identified": 0,
    "by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 },
    "high_exposure_components": 0
  }
}
```
## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| At least one component analyzed | `components` array has at least 1 entry |
| All 6 STRIDE categories evaluated per component | Each `component.threats` object has all 6 keys |
| Trust boundaries mapped | `trust_boundaries` array populated |
| Attack surface quantified | `attack_surface.total_score` calculated |
| `threat-model.json` written to `.workflow/.security/` | File exists and is valid JSON |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| No components discovered via grep | Analyze project structure manually (README, package.json); note uncertainty |
| Phase 2 findings not available for cross-reference | Proceed with grep-only analysis; note missing OWASP context |
| Ambiguous architecture (monolith vs microservices) | Document assumption in summary; note for user review |
| No `.github/workflows/` for CI boundary | Mark process boundary as not_applicable |

## Next Phase

-> [Phase 4: Report & Tracking](04-report-tracking.md)
300
.codex/skills/security-audit/phases/04-report-tracking.md
Normal file
@@ -0,0 +1,300 @@

# Phase 4: Report & Tracking

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Generate a scored audit report, compare it with previous audits, and track security trends.

## Objective

- Calculate a security score from all phase findings
- Compare with previous audit results (if available)
- Generate a date-stamped report in `.workflow/.security/`
- Track improvement or regression trends

## Input

| Source | Required | Description |
|--------|----------|-------------|
| `.workflow/.security/supply-chain-report.json` | Yes | Phase 1 findings |
| `.workflow/.security/owasp-findings.json` | Yes | Phase 2 findings |
| `.workflow/.security/threat-model.json` | Yes | Phase 3 findings (STRIDE gaps) |
| `.workflow/.security/audit-report-*.json` | No | Previous audit reports for trend comparison |
| `~/.codex/skills/security-audit/specs/scoring-gates.md` | Yes | Scoring formula and gate thresholds |
## Execution Steps

### Step 1: Aggregate Findings

Collect all findings from Phases 1–3 and classify them by severity.

**Aggregation Formula**:

```
All findings =
    supply-chain-report.findings
  + owasp-findings.findings
  + threat-model threats (where gaps array is non-empty)
```

**Deduplication Rules**:

| Condition | Action |
|-----------|--------|
| Same vulnerability appears in multiple phases | Keep highest-severity classification; merge evidence; count as single finding |
| Same file:line in different categories | Merge into one finding; note all phases that detected it |
| Unique finding per phase | Include as-is |

---
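
The highest-severity rule can be sketched with `awk`, keyed on `file:line`. The comma-separated input format and the sample findings are illustrative; the weights follow the table in Step 2:

```shell
# Keep only the highest-severity finding per file:line key.
cat > /tmp/findings.csv <<'EOF'
high,src/auth.ts:42,Broken session check
critical,src/auth.ts:42,Auth bypass
low,src/log.ts:7,Verbose logging
EOF
dedup_out=$(awk -F',' '
  BEGIN { w["critical"]=10; w["high"]=7; w["medium"]=4; w["low"]=1 }
  { if (w[$1] > best[$2]) { best[$2] = w[$1]; line[$2] = $0 } }
  END { for (k in line) print line[k] }
' /tmp/findings.csv | sort)
printf '%s\n' "$dedup_out"
```

The two `src/auth.ts:42` entries collapse into the single critical finding, matching the "same vulnerability in multiple phases" rule.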

### Step 2: Calculate Score

Apply the scoring formula from `~/.codex/skills/security-audit/specs/scoring-gates.md`.

**Scoring Formula**:

```
Base score = 10.0

Severity weights per finding:
- Critical: weight = 10 (each critical finding has outsized impact)
- High: weight = 7
- Medium: weight = 4
- Low: weight = 1

normalization_factor = max(10, total_files_scanned)
weighted_penalty = SUM(severity_weight * count_per_severity) / normalization_factor
final_score = max(0, 10.0 - weighted_penalty)
```

**Severity Weights**:

| Severity | Weight | Criteria | Examples |
|----------|--------|----------|----------|
| Critical | 10 | Exploitable with high impact, no user interaction needed | RCE, SQL injection with data access, leaked production credentials, auth bypass |
| High | 7 | Exploitable with significant impact, may need user interaction | Broken authentication, SSRF, privilege escalation, XSS with session theft |
| Medium | 4 | Limited exploitability or moderate impact | Reflected XSS, CSRF, verbose error messages, missing security headers |
| Low | 1 | Informational or minimal impact | Missing best-practice headers, minor info disclosure, deprecated dependencies without known exploit |

**Score Interpretation**:

| Score | Rating | Meaning |
|-------|--------|---------|
| 9.0–10.0 | Excellent | Minimal risk, production-ready |
| 7.0–8.9 | Good | Acceptable risk, minor improvements needed |
| 5.0–6.9 | Fair | Notable risks, remediation recommended |
| 3.0–4.9 | Poor | Significant risks, remediation required |
| 0.0–2.9 | Critical | Severe vulnerabilities, immediate action needed |

**Example Score Calculations**:

| Findings | Files Scanned | Weighted Sum | Penalty | Score |
|----------|---------------|--------------|---------|-------|
| 1 critical | 50 | 10 | 0.2 | 9.8 |
| 2 critical, 3 high | 50 | 41 | 0.82 | 9.2 |
| 5 critical, 10 high | 50 | 120 | 2.4 | 7.6 |
| 10 critical, 20 high, 15 medium | 100 | 300 | 3.0 | 7.0 |
| 20 critical | 20 | 200 | 10.0 | 0.0 |

---
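
The worked examples above can be reproduced with a small helper. A minimal sketch using `awk`; the `score` function name is illustrative, and the formula is the one defined in this step:

```shell
# final_score = max(0, 10.0 - weighted_sum / max(10, files_scanned))
score() {  # args: critical high medium low files_scanned
  awk -v c="$1" -v h="$2" -v m="$3" -v l="$4" -v f="$5" 'BEGIN {
    norm = (f > 10) ? f : 10
    s = 10.0 - (c*10 + h*7 + m*4 + l*1) / norm
    if (s < 0) s = 0
    printf "%.1f\n", s
  }'
}
score 2 3 0 0 50    # prints: 9.2
score 20 0 0 0 20   # prints: 0.0
```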

### Step 3: Gate Evaluation

**Daily quick-scan gate** (Phase 1 only):

| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 8.0 | Continue. No blocking issues. |
| WARN | 6.0 <= score < 8.0 | Log warning. Review findings before deploy. |
| FAIL | score < 6.0 | Block deployment. Remediate critical/high findings. |

**Comprehensive audit gate** (all phases):

Initial/baseline audit (no previous audit exists):

| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= 2.0 | Baseline established. Plan remediation. |
| FAIL | score < 2.0 | Critical exposure. Immediate triage required. |

Subsequent audits (a previous audit exists):

| Result | Condition | Action |
|--------|-----------|--------|
| PASS | score >= previous_score | No regression. Continue improvement. |
| WARN | score within 0.5 of previous_score | Marginal change. Review new findings. |
| FAIL | score < previous_score - 0.5 | Regression detected. Investigate new findings. |

Rows are evaluated top to bottom; the first matching condition wins.

Production readiness target: score >= 7.0

---
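
For subsequent audits, the gate reduces to a three-way comparison. A minimal sketch; the `gate` helper is illustrative:

```shell
# Subsequent-audit gate: compare current score against the previous audit.
gate() {  # args: current_score previous_score
  awk -v s="$1" -v p="$2" 'BEGIN {
    if (s >= p)            print "PASS"
    else if (p - s <= 0.5) print "WARN"
    else                   print "FAIL"
  }'
}
gate 7.5 6.8   # prints: PASS
gate 6.5 6.8   # prints: WARN
gate 5.0 6.8   # prints: FAIL
```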

### Step 4: Trend Comparison

Find previous audit reports and compare them with the current results.

**Execution**:

```bash
# Find previous audit reports (newest first)
ls -t .workflow/.security/audit-report-*.json 2>/dev/null | head -5
```

**Trend Direction Decision Table**:

| Condition | direction |
|-----------|-----------|
| No previous audit file found | `baseline` |
| score_delta > 0.5 | `improving` |
| -0.5 <= score_delta <= 0.5 | `stable` |
| score_delta < -0.5 | `regressing` |

Compare current vs. previous:
- Delta per OWASP category (new findings vs. resolved findings)
- Delta per STRIDE category
- New findings vs. resolved findings (by title/file comparison)
- Overall score trend

**Trend JSON Format**:

```json
{
  "trend": {
    "current_date": "2026-03-29",
    "current_score": 7.5,
    "previous_date": "2026-03-22",
    "previous_score": 6.8,
    "score_delta": 0.7,
    "new_findings": 2,
    "resolved_findings": 5,
    "direction": "improving",
    "history": [
      { "date": "2026-03-15", "score": 5.2, "total_findings": 45 },
      { "date": "2026-03-22", "score": 6.8, "total_findings": 32 },
      { "date": "2026-03-29", "score": 7.5, "total_findings": 29 }
    ]
  }
}
```

---
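
The decision table above maps onto a small helper. A minimal sketch; the `direction` function is illustrative, and an empty argument stands for the no-previous-audit case:

```shell
# Map score_delta to a trend direction.
direction() {  # arg: score_delta, or "" when no previous audit exists
  if [ -z "$1" ]; then echo baseline; return; fi
  awk -v d="$1" 'BEGIN {
    if (d > 0.5)        print "improving"
    else if (d >= -0.5) print "stable"
    else                print "regressing"
  }'
}
direction ""     # prints: baseline
direction 0.7    # prints: improving
direction -0.2   # prints: stable
direction -1.1   # prints: regressing
```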

### Step 5: Generate Report

Assemble and write the final scored report.

**Execution**:

```bash
# Ensure directory exists
mkdir -p .workflow/.security

# Write report with date stamp
DATE=$(date +%Y-%m-%d)
cp "${WORK_DIR}/audit-report.json" ".workflow/.security/audit-report-${DATE}.json"

# Also maintain latest copies of phase outputs
cp "${WORK_DIR}/supply-chain-report.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/owasp-findings.json" ".workflow/.security/" 2>/dev/null || true
cp "${WORK_DIR}/threat-model.json" ".workflow/.security/" 2>/dev/null || true
```

Build `remediation_priority` list: rank by severity weight × inverse effort (low effort + high impact = priority 1).

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| `.workflow/.security/audit-report-<YYYY-MM-DD>.json` | JSON | Full scored report with trend, top risks, remediation priority |

```json
{
  "report": "security-audit",
  "version": "1.0",
  "timestamp": "ISO-8601",
  "date": "YYYY-MM-DD",
  "mode": "comprehensive|quick-scan",
  "score": {
    "overall": 7.5,
    "rating": "Good",
    "gate": "PASS|FAIL",
    "gate_threshold": 8
  },
  "findings_summary": {
    "total": 0,
    "by_severity": { "critical": 0, "high": 0, "medium": 0, "low": 0 },
    "by_phase": {
      "supply_chain": 0,
      "owasp": 0,
      "stride": 0
    },
    "by_owasp": {
      "A01": 0, "A02": 0, "A03": 0, "A04": 0, "A05": 0,
      "A06": 0, "A07": 0, "A08": 0, "A09": 0, "A10": 0
    },
    "by_stride": { "S": 0, "T": 0, "R": 0, "I": 0, "D": 0, "E": 0 }
  },
  "top_risks": [
    {
      "rank": 1,
      "title": "Most critical finding",
      "severity": "critical",
      "source_phase": "owasp",
      "remediation": "How to fix",
      "effort": "low|medium|high"
    }
  ],
  "trend": {
    "previous_date": "YYYY-MM-DD or null",
    "previous_score": 0,
    "score_delta": 0,
    "new_findings": 0,
    "resolved_findings": 0,
    "direction": "improving|stable|regressing|baseline"
  },
  "phases_completed": ["supply-chain-scan", "owasp-review", "threat-modeling", "report-tracking"],
  "files_scanned": 0,
  "remediation_priority": [
    {
      "priority": 1,
      "finding": "Finding title",
      "effort": "low",
      "impact": "high",
      "recommendation": "Specific action"
    }
  ]
}
```
## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Score calculated using correct formula | Verify: base 10.0 - (weighted_sum / max(10, files_scanned)) |
| Gate evaluation matches mode and audit history | Check gate logic against previous audit presence |
| Trend direction computed correctly | Verify score_delta and direction mapping |
| `audit-report-<date>.json` written to `.workflow/.security/` | File exists, is valid JSON, contains all required fields |
| remediation_priority ranked by severity and effort | Priority 1 = highest severity + lowest effort |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Phase data file missing or corrupted | Report as BLOCKED; output partial report with available data |
| Previous audit parse error | Treat as baseline; note data integrity issue |
| files_scanned is zero | Use normalization_factor of 10 (minimum); continue |
| Date command unavailable | Use ISO timestamp substring for date portion |
| Write fails | Retry once with explicit `mkdir -p`; report BLOCKED if still failing |

## Completion Status

After report generation, output the skill completion status:

| Status | Condition |
|--------|-----------|
| DONE | All phases completed, report generated, gate PASS |
| DONE_WITH_CONCERNS | Report generated but gate WARN or FAIL, or regression detected |
| BLOCKED | Phase data missing or corrupted; cannot calculate score |
318
.codex/skills/ship/agents/ship-operator.md
Normal file
@@ -0,0 +1,318 @@

# ship-operator Agent

Executes all 5 gated phases of the release pipeline sequentially, enforcing gate conditions before advancing.

## Identity

- **Type**: `pipeline-executor`
- **Role File**: `~/.codex/agents/ship-operator.md`
- **task_name**: `ship-operator`
- **Responsibility**: Code generation / Execution (write mode — git, file updates, push, PR)
- **fork_context**: false
## Boundaries

### MUST

- Load role definition via MANDATORY FIRST STEPS pattern
- Read the phase detail file at the start of each phase before executing any step
- Check gate condition after each phase and halt on failure
- Produce structured JSON output for each completed phase
- Confirm with user before proceeding on major version bumps or direct-to-main releases
- Include file:line references in any findings

### MUST NOT

- Skip the MANDATORY FIRST STEPS role loading
- Advance to the next phase if the current phase gate fails
- Push to remote if Phase 3 (version bump) gate failed
- Create a PR if Phase 4 (push) gate failed
- Produce unstructured output
- Modify files outside the release pipeline scope (version file, CHANGELOG.md, package-lock.json)

---

## Toolbox

### Available Tools

| Tool | Type | Purpose |
|------|------|---------|
| `Bash` | Execution | Run git, npm, pytest, gh, jq, sed commands |
| `Read` | File I/O | Read phase detail files, version files, CHANGELOG.md |
| `Write` | File I/O | Write/update CHANGELOG.md, VERSION file |
| `Edit` | File I/O | Update package.json, pyproject.toml version fields |
| `Glob` | Discovery | Detect presence of version files, test configs |
| `Grep` | Search | Scan commit messages, detect conventional commit prefixes |
| `spawn_agent` | Agent | Spawn inline-code-review subagent during Phase 2 |
| `wait_agent` | Agent | Wait for inline-code-review subagent result |
| `close_agent` | Agent | Close inline-code-review subagent after use |

---

## Execution

### Phase 1: Pre-Flight Checks

**Objective**: Validate that the repository is in a shippable state.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/01-preflight-checks.md | Yes | Full phase execution detail |
| Repository working directory | Yes | Git repo with working tree |

**Steps**:

Read `~/.codex/skills/ship/phases/01-preflight-checks.md` first.

Then execute all four checks as specified in that file:
1. Git clean check — `git status --porcelain`
2. Branch validation — `git branch --show-current`
3. Test suite execution — detect and run npm test / pytest
4. Build verification — detect and run npm run build / python -m build / make build

**Decision Table**:

| Condition | Action |
|-----------|--------|
| All checks pass | Set gate = pass, output preflight JSON, await Phase 2 task |
| Any check fails | Set gate = fail, output BLOCKED with failure details, halt |
| Branch is main/master | Set gate = warn, ask user to confirm direct release |
| No tests detected | Set gate = warn (skip), continue to build check |
| No build step detected | Set gate = pass (info), continue |

**Output**: Structured preflight-report JSON (see phase file for schema).

---
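
Checks 1 and 2 can be sketched in a few lines of shell. A throwaway repository stands in for the real working directory, and the `preflight` helper is illustrative; the phase file remains the authoritative procedure:

```shell
# Minimal pre-flight sketch: clean-tree check plus branch validation.
repo=$(mktemp -d) && cd "$repo" && git init -q -b release/demo .
preflight() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "gate=fail reason=dirty-working-tree"
    return
  fi
  branch=$(git branch --show-current)
  case "$branch" in
    main|master) echo "gate=warn reason=direct-release branch=$branch" ;;
    *)           echo "gate=pass branch=$branch" ;;
  esac
}
preflight                   # prints: gate=pass branch=release/demo
touch newfile && preflight  # prints: gate=fail reason=dirty-working-tree
```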

### Phase 2: Code Review

**Objective**: Diff analysis and AI-powered code review via an inline subagent.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/02-code-review.md | Yes | Full phase execution detail |
| Phase 1 gate result | Yes | Must be pass before running |

**Steps**:

Read `~/.codex/skills/ship/phases/02-code-review.md` first.

1. Detect the merge base (compare to origin/main or origin/master; if on main, use the last tag)
2. Generate a diff summary (`git diff --stat`, count files/lines)
3. Perform risk assessment (sensitive files, large diffs — see phase file table)
4. Spawn the inline-code-review subagent (see the Inline Subagent Calls section below)
5. Evaluate review results against the gate condition

**Decision Table**:

| Condition | Action |
|-----------|--------|
| No critical issues | Set gate = pass, output review JSON |
| Critical issues found | Set gate = fail, output BLOCKED with issues list |
| Warnings only | Set gate = warn, proceed, flag DONE_WITH_CONCERNS |
| Subagent timeout or error | Log warning, ask user whether to proceed or retry |

**Output**: Structured code-review JSON (see phase file for schema).

---

### Phase 3: Version Bump

**Objective**: Detect the version file, then determine and apply the bump.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/03-version-bump.md | Yes | Full phase execution detail |
| Phase 2 gate result | Yes | Must be pass/warn before running |

**Steps**:

Read `~/.codex/skills/ship/phases/03-version-bump.md` first.

1. Detect the version file (package.json > pyproject.toml > VERSION)
2. Read the current version
3. Scan commits for conventional prefixes to determine the suggested bump type
4. For major bumps: ask the user to confirm before proceeding
5. Calculate the new version (semver)
6. Update the version file using jq / sed / echo as appropriate
7. Verify the update by re-reading

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Version file found and updated | Set gate = pass, output version record |
| No version file found | Set gate = needs_context, ask user, halt until answered |
| Version mismatch after update | Set gate = fail, output BLOCKED |
| User declines major bump | Set gate = blocked, halt |
| Bump type ambiguous | Default to patch, inform user |

**Output**: Structured version-bump JSON (see phase file for schema).

---
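
Step 5 is plain semver arithmetic. A minimal sketch assuming a bare MAJOR.MINOR.PATCH version (no pre-release or build suffixes); the `bump` helper is illustrative:

```shell
# Compute the next version from the current version and the bump type.
bump() {  # args: current_version bump_type
  v=$1
  major=${v%%.*}; rest=${v#*.}
  minor=${rest%%.*}; patch=${rest#*.}
  case "$2" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "$major.$((minor + 1)).0" ;;
    patch) echo "$major.$minor.$((patch + 1))" ;;
  esac
}
bump 1.4.2 minor   # prints: 1.5.0
bump 1.4.2 major   # prints: 2.0.0
```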

### Phase 4: Changelog & Commit

**Objective**: Generate the changelog, create the release commit, and push to the remote.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/04-changelog-commit.md | Yes | Full phase execution detail |
| Phase 3 output | Yes | new_version, version_file |

**Steps**:

Read `~/.codex/skills/ship/phases/04-changelog-commit.md` first.

1. Gather commits since the last tag (`git log "$last_tag"..HEAD`)
2. Group by conventional commit prefix into changelog sections
3. Format the markdown changelog entry (`## [X.Y.Z] - YYYY-MM-DD`)
4. Update or create CHANGELOG.md (insert the new entry after the main heading)
5. Stage changes (`git add -u`)
6. Create the release commit (`chore: bump version to <new_version>`)
7. Push the branch to the remote

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Push succeeded | Set gate = pass, output commit record |
| Push rejected (non-fast-forward) | Set gate = fail, BLOCKED — suggest `git pull --rebase` |
| Permission denied | Set gate = fail, BLOCKED — advise checking remote access |
| No remote configured | Set gate = fail, BLOCKED — suggest `git remote add` |
| No previous tag | Use the last 50 commits for the changelog |

**Output**: Structured changelog-commit JSON (see phase file for schema).

---
|
||||
|
||||
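Steps 1-3 of the changelog phase reduce to listing commits in a range and bucketing them by prefix. A minimal sketch; the section names (`feat`/`fix`/`chore`) and the exact formatting are illustrative assumptions, and the phase file defines the canonical grouping:

```shell
# Hypothetical sketch of Steps 1-3: list commit subjects in a range and
# group them into changelog sections by conventional-commit prefix.
changelog_entry() {
  local range="$1" version="$2" prefix lines
  printf '## [%s] - %s\n' "$version" "$(date +%F)"
  for prefix in feat fix chore; do
    # Keep only subjects that start with this prefix, e.g. "feat: ..."
    lines=$(git log --pretty='- %s' "$range" | grep "^- ${prefix}" || true)
    if [ -n "$lines" ]; then
      printf '\n### %s\n%s\n' "$prefix" "$lines"
    fi
  done
}
```

Step 4 then splices this entry into CHANGELOG.md below the main heading.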
### Phase 5: PR Creation

**Objective**: Create a PR with a structured body and linked issues.

**Input**:

| Source | Required | Description |
|--------|----------|-------------|
| ~/.codex/skills/ship/phases/05-pr-creation.md | Yes | Full phase execution detail |
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type |
| Phase 2 output | Yes | merge_base (for change summary) |

**Steps**:

Read `~/.codex/skills/ship/phases/05-pr-creation.md` first.

1. Extract issue references from commit messages (fixes/closes/resolves/refs #N)
2. Determine the target branch (main, falling back to master)
3. Build the PR title: `release: v<new_version>`
4. Build the PR body (Summary, Changes, Linked Issues, Version, and Test Plan sections)
5. Create the PR via `gh pr create`
6. Capture the PR URL from the gh output

**Decision Table**:

| Condition | Action |
|-----------|--------|
| PR created, URL returned | Set gate = pass, output PR record, output DONE |
| Phase 2 had warnings only | Set gate = pass with concerns, output DONE_WITH_CONCERNS |
| gh CLI not available | Set gate = fail, BLOCKED — advise `gh auth login` |
| PR creation fails | Set gate = fail, BLOCKED — report error details |

**Output**: Structured PR-creation JSON plus final completion status (see phase file for schema).

---

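Steps 3-5 of PR creation amount to assembling a title and a multi-section body, then handing both to `gh pr create`. A minimal sketch with placeholder values; only two body sections are shown, and the full layout lives in the phase file:

```shell
# Hypothetical values carried over from earlier phases
new_version="1.2.3"
target_branch="main"

pr_title="release: v${new_version}"
# Body sections mirror the phase description (Summary, Changes, Linked
# Issues, Version, Test Plan); only two are sketched here.
pr_body=$(printf '## Summary\nRelease v%s\n\n## Test Plan\n- CI green\n' "$new_version")

# Create the PR and capture its URL (gh prints the URL on success):
# pr_url=$(gh pr create --base "$target_branch" --title "$pr_title" --body "$pr_body")
```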
## Inline Subagent Calls

This agent spawns a utility subagent during Phase 2 for AI code review:

### inline-code-review

**When**: After completing the risk assessment (Phase 2, Step 3)
**Agent File**: ~/.codex/agents/cli-explore-agent.md

```
spawn_agent({
  task_name: "inline-code-review",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)

Task:
- Review the diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues

Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})
const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```

### Result Handling

| Result | Severity | Action |
|--------|----------|--------|
| recommendation: "ship", no critical issues | — | gate = pass, integrate findings |
| recommendation: "hold" or critical issues found | HIGH | gate = fail, BLOCKED — list issues |
| recommendation: "fix-first" | HIGH | gate = fail, BLOCKED — list issues with locations |
| Warnings only, recommendation: "ship" | MEDIUM | gate = warn, proceed with DONE_WITH_CONCERNS |
| Timeout or error | — | Log warning, ask user whether to proceed or retry |

---

## Structured Output Template

```
## Summary
- One-sentence phase completion status

## Phase Result
- Phase: <phase_name>
- Gate: pass | fail | warn | blocked | needs_context
- Status: PASS | BLOCKED | NEEDS_CONTEXT | DONE_WITH_CONCERNS | DONE

## Findings
- Finding 1: specific description with file:line reference (if applicable)
- Finding 2: specific description with file:line reference (if applicable)

## Artifacts
- File: path/to/modified/file
  Change: specific modification made

## Open Questions
1. Question needing user answer (if gate = needs_context)
```

---

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Phase detail file not found | Report error, halt — phase files are required |
| Git command fails | Report stderr, set gate = fail, BLOCKED |
| Version file parse error | Report error, set gate = needs_context, ask user |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Build/test failure | Report output, set gate = fail, BLOCKED |
| Push rejected | Report rejection reason, set gate = fail, BLOCKED with suggestion |
| gh CLI missing | Report error, set gate = fail, BLOCKED with install advice |
| Three consecutive failures at same step | Stop, output diagnostic dump, halt |
426
.codex/skills/ship/orchestrator.md
Normal file
@@ -0,0 +1,426 @@
---
name: ship
description: Structured release pipeline with pre-flight checks, AI code review, version bump, changelog, and PR creation. Triggers on "ship", "release", "publish".
agents: ship-operator
phases: 5
---

# Ship

Structured release pipeline that guides code from working branch to pull request through 5 gated phases: pre-flight checks, automated code review, version bump, changelog generation, and PR creation.

## Architecture

```
+--------------------------------------------------------------+
|                      ship Orchestrator                        |
| -> Single ship-operator agent driven through 5 gated phases   |
+------------------------------+-------------------------------+
                               |
           +-------------------+-------------------+
           v                   v                   v
    +------------+      +------------+      +------------+
    |  Phase 1   | -->  |  Phase 2   | -->  |  Phase 3   |
    | Pre-Flight |      | Code Review|      |  Version   |
    |   Checks   |      |            |      |    Bump    |
    +------------+      +------------+      +------------+
           v                   v                   v
      Gate: ALL           Gate: No           Gate: Version
      4 checks            critical           updated OK
      pass                issues
                               |
           +-------------------+-------------------+
           v                                       v
    +------------+                          +------------+
    |  Phase 4   | ---------------------->  |  Phase 5   |
    | Changelog  |                          | PR Creation|
    |  & Commit  |                          |            |
    +------------+                          +------------+
           v                                       v
      Gate: Push                              Gate: PR
      succeeded                               created
```

---

## Agent Registry

| Agent | task_name | Role File | Responsibility | Pattern | fork_context |
|-------|-----------|-----------|----------------|---------|--------------|
| ship-operator | ship-operator | ~/.codex/agents/ship-operator.md | Execute all 5 release phases sequentially, enforce gates | Deep Interaction (2.3) | false |

> **COMPACT PROTECTION**: Agent files are execution documents. When context compression occurs and agent instructions are reduced to summaries, **you MUST immediately `Read` the corresponding agent.md to reload before continuing execution**.

---

## Fork Context Strategy

| Agent | task_name | fork_context | fork_from | Rationale |
|-------|-----------|--------------|-----------|-----------|
| ship-operator | ship-operator | false | — | Starts fresh; all context provided in initial task message |

**Fork Decision Rules**:

| Condition | fork_context | Reason |
|-----------|--------------|--------|
| Pipeline stage with explicit input | false | Context in message, not history |
| Agent is isolated utility | false | Clean context, focused task |
| ship-operator | false | Self-contained release operator; no parent context needed |

---

## Subagent Registry

Utility subagents callable by ship-operator (not separate pipeline stages):

| Subagent | Agent File | Callable By | Purpose | Model |
|----------|-----------|-------------|---------|-------|
| inline-code-review | ~/.codex/agents/cli-explore-agent.md | ship-operator | AI code review of diff during Phase 2 | haiku |

> Subagents are spawned by agents within their own execution context (Pattern 2.8), not by the orchestrator.

---

## Phase Execution

### Phase 1: Pre-Flight Checks

**Objective**: Validate that the repository is in a shippable state — confirm a clean working tree, an appropriate branch, passing tests, and a successful build.

**Input**:

| Source | Description |
|--------|-------------|
| User trigger | "ship" / "release" / "publish" command |
| Repository | Current git working directory |
| Phase detail | ~/.codex/skills/ship/phases/01-preflight-checks.md |

**Execution**:

Spawn ship-operator with the Phase 1 task. The operator reads the phase detail file, then executes all four checks.

```
spawn_agent({
  task_name: "ship-operator",
  fork_context: false,
  message: `## TASK ASSIGNMENT

### MANDATORY FIRST STEPS
1. Read role definition: ~/.codex/agents/ship-operator.md (MUST read first)
2. Read phase detail: ~/.codex/skills/ship/phases/01-preflight-checks.md

---

Goal: Execute Phase 1 Pre-Flight Checks for the release pipeline.

Execute all four checks (git clean, branch validation, test suite, build verification).
Output structured preflight-report JSON plus gate status.`
})
const phase1Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```

**Gate Decision**:

| Condition | Action |
|-----------|--------|
| All four checks pass (overall: "pass") | Fast-advance: assign Phase 2 task to ship-operator |
| Any check fails (overall: "fail") | BLOCKED — report failure details, halt pipeline |
| Branch is main/master (warn) | Ask user to confirm direct-to-main release before proceeding |
| Timeout | assign_task "Finalize current work and output results", re-wait 120s |

**Output**:

| Artifact | Description |
|----------|-------------|
| preflight-report JSON | Pass/fail per check, blockers list |
| Gate status | pass / fail / blocked |

---

### Phase 2: Code Review

**Objective**: Detect the merge base, generate the diff, run AI-powered code review via the inline subagent, assess risk, and evaluate the results.

**Input**:

| Source | Description |
|--------|-------------|
| Phase 1 result | Gate passed (overall: "pass") |
| Repository | Git history, diff data |
| Phase detail | ~/.codex/skills/ship/phases/02-code-review.md |

**Execution**:

Phase 2 is assigned to the already-running ship-operator via assign_task.

```
assign_task({
  target: "ship-operator",
  items: [{ type: "text", text: `## PHASE 2 TASK

Read phase detail: ~/.codex/skills/ship/phases/02-code-review.md

Execute Phase 2 Code Review:
1. Detect merge base
2. Generate diff summary
3. Perform risk assessment
4. Spawn inline-code-review subagent for AI analysis
5. Evaluate review results and report gate status` }]
})
const phase2Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 600000 })
```

**Gate Decision**:

| Condition | Action |
|-----------|--------|
| No critical issues (overall: "pass") | Fast-advance: assign Phase 3 task to ship-operator |
| Critical issues found (overall: "fail") | BLOCKED — report critical issues list, halt pipeline |
| Warnings only (overall: "warn") | Fast-advance to Phase 3, flag DONE_WITH_CONCERNS |
| Review subagent timeout/error | Ask user whether to proceed or retry; if proceeding, flag warn |
| Timeout on phase2Result | assign_task "Finalize current work", re-wait 120s |

**Output**:

| Artifact | Description |
|----------|-------------|
| Review summary JSON | Risk level, risk factors, AI review recommendation, issues |
| Gate status | pass / fail / warn / blocked |

---

### Phase 3: Version Bump

**Objective**: Detect the version file, determine the bump type from commits or user input, calculate the new version, update the version file, and verify the update.

**Input**:

| Source | Description |
|--------|-------------|
| Phase 2 result | Gate passed (no critical issues) |
| Repository | package.json / pyproject.toml / VERSION |
| Phase detail | ~/.codex/skills/ship/phases/03-version-bump.md |

**Execution**:

```
assign_task({
  target: "ship-operator",
  items: [{ type: "text", text: `## PHASE 3 TASK

Read phase detail: ~/.codex/skills/ship/phases/03-version-bump.md

Execute Phase 3 Version Bump:
1. Detect version file (package.json > pyproject.toml > VERSION)
2. Determine bump type from commit messages (patch/minor/major)
3. For major bumps: ask user to confirm before proceeding
4. Calculate new version
5. Update version file
6. Verify update

Output version change record JSON plus gate status.` }]
})
const phase3Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```

**Gate Decision**:

| Condition | Action |
|-----------|--------|
| Version file updated and verified (overall: "pass") | Fast-advance: assign Phase 4 task to ship-operator |
| Version file not found | NEEDS_CONTEXT — ask user which file to use; halt until answered |
| Version mismatch after update (overall: "fail") | BLOCKED — report mismatch, halt pipeline |
| User declines major bump | BLOCKED — halt; user must re-trigger with an explicit bump type |
| Timeout | assign_task "Finalize current work", re-wait 120s |

**Output**:

| Artifact | Description |
|----------|-------------|
| Version change record JSON | version_file, previous_version, new_version, bump_type, bump_source |
| Gate status | pass / fail / needs_context / blocked |

---

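The "calculate new version" step is plain semver arithmetic. A minimal sketch, assuming versions are bare MAJOR.MINOR.PATCH with no pre-release suffix; the `bump_version` helper name is a hypothetical for illustration:

```shell
# Hypothetical helper: compute the next version for a given bump type.
bump_version() {
  local major minor patch
  IFS=. read -r major minor patch <<< "$1"
  case "$2" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "${major}.$((minor + 1)).0" ;;
    patch) echo "${major}.${minor}.$((patch + 1))" ;;
    *)     echo "unknown bump type: $2" >&2; return 1 ;;
  esac
}
```

Note that major and minor bumps reset the lower components to zero, which is why the "version mismatch after update" gate re-reads the file rather than trusting the arithmetic.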
### Phase 4: Changelog & Commit

**Objective**: Parse the git log into a grouped changelog entry, update CHANGELOG.md, create the release commit, and push the branch to the remote.

**Input**:

| Source | Description |
|--------|-------------|
| Phase 3 result | new_version, version_file, bump_type |
| Repository | Git history since last tag |
| Phase detail | ~/.codex/skills/ship/phases/04-changelog-commit.md |

**Execution**:

```
assign_task({
  target: "ship-operator",
  items: [{ type: "text", text: `## PHASE 4 TASK

Read phase detail: ~/.codex/skills/ship/phases/04-changelog-commit.md

New version: <new_version>
Version file: <version_file>

Execute Phase 4 Changelog & Commit:
1. Gather commits since last tag
2. Group by conventional commit type
3. Format changelog entry
4. Update or create CHANGELOG.md
5. Create release commit (chore: bump version to <new_version>)
6. Push branch to remote

Output commit record JSON plus gate status.` }]
})
const phase4Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```

**Gate Decision**:

| Condition | Action |
|-----------|--------|
| Push succeeded (overall: "pass") | Fast-advance: assign Phase 5 task to ship-operator |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise checking remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |
| Timeout | assign_task "Finalize current work", re-wait 120s |

**Output**:

| Artifact | Description |
|----------|-------------|
| Commit record JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| Gate status | pass / fail / blocked |

---

### Phase 5: PR Creation

**Objective**: Extract issue references from commits, build the PR title and body, create the PR via `gh pr create`, and capture the PR URL.

**Input**:

| Source | Description |
|--------|-------------|
| Phase 4 result | commit_sha, new_version, previous_version, bump_type |
| Phase 2 result | merge_base (for change_summary) |
| Repository | Git history, remote |
| Phase detail | ~/.codex/skills/ship/phases/05-pr-creation.md |

**Execution**:

```
assign_task({
  target: "ship-operator",
  items: [{ type: "text", text: `## PHASE 5 TASK

Read phase detail: ~/.codex/skills/ship/phases/05-pr-creation.md

New version: <new_version>
Previous version: <previous_version>
Bump type: <bump_type>
Merge base: <merge_base>
Commit SHA: <commit_sha>

Execute Phase 5 PR Creation:
1. Extract issue references from commits
2. Determine target branch
3. Build PR title: "release: v<new_version>"
4. Build PR body with all sections
5. Create PR via gh pr create
6. Capture and report PR URL

Output PR creation record JSON plus final completion status.` }]
})
const phase5Result = wait_agent({ targets: ["ship-operator"], timeout_ms: 300000 })
```

**Gate Decision**:

| Condition | Action |
|-----------|--------|
| PR created, URL returned (overall: "pass") | Pipeline complete — output DONE status |
| PR created with review warnings | Pipeline complete — output DONE_WITH_CONCERNS |
| gh CLI not available | BLOCKED — report error, advise `gh auth login` |
| PR creation fails | BLOCKED — report error details, halt |
| Timeout | assign_task "Finalize current work", re-wait 120s |

**Output**:

| Artifact | Description |
|----------|-------------|
| PR record JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | DONE / DONE_WITH_CONCERNS / BLOCKED |

---

## Lifecycle Management

### Timeout Protocol

| Phase | Default Timeout | On Timeout |
|-------|-----------------|------------|
| Phase 1: Pre-Flight | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 2: Code Review | 600000 ms (10 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 3: Version Bump | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 4: Changelog & Commit | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |
| Phase 5: PR Creation | 300000 ms (5 min) | assign_task "Finalize current work", re-wait 120s |

### Cleanup Protocol

After Phase 5 completes (or on any terminal BLOCKED halt), close ship-operator.

```
close_agent({ target: "ship-operator" })
```

### Agent Health Check

```
const remaining = list_agents({})
if (remaining.length > 0) {
  remaining.forEach(agent => close_agent({ target: agent.id }))
}
```

---

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Agent timeout (first) | assign_task with "Finalize current work and output results" + re-wait 120s |
| Agent timeout (second) | Log error, close_agent({ target: "ship-operator" }), report partial results |
| Gate fail — any phase | Log BLOCKED status with phase name and failure detail, close_agent, halt |
| NEEDS_CONTEXT | Pause pipeline, surface question to user, resume with assign_task on answer |
| send_message ignored | Escalate to assign_task |
| Inline subagent timeout | ship-operator handles internally; continue with warn if review failed |
| User cancellation | close_agent({ target: "ship-operator" }), report current pipeline state |
| Fork from closed agent | Not applicable (single agent, no forking) |

---

## Output Format

```
## Summary
- One-sentence completion status (DONE / DONE_WITH_CONCERNS / BLOCKED)

## Results
- Phase 1 Pre-Flight: pass/fail
- Phase 2 Code Review: pass/warn/fail
- Phase 3 Version Bump: <previous> -> <new> (<bump_type>)
- Phase 4 Changelog & Commit: commit <sha> pushed to <remote/branch>
- Phase 5 PR Creation: <pr_url>

## Artifacts
- CHANGELOG.md (updated)
- <version_file> (version bumped to <new_version>)
- Release commit: <sha>
- PR: <pr_url>

## Next Steps (Optional)
1. Review and merge the PR
2. Tag the release after merge
```
198
.codex/skills/ship/phases/01-preflight-checks.md
Normal file
@@ -0,0 +1,198 @@
# Phase 1: Pre-Flight Checks

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, you **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Validate that the repository is in a shippable state before proceeding with the release pipeline.

## Objective

- Confirm the working tree is clean (no uncommitted changes)
- Validate that the current branch is appropriate for release
- Run the test suite and confirm all tests pass
- Verify that the build succeeds

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Repository working directory | Yes | Git repo with working tree |
| package.json / pyproject.toml / Makefile | No | Used for test and build detection |

## Execution Steps
### Step 1: Git Clean Check

Run `git status --porcelain` and evaluate the output.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Output is empty | PASS — working tree is clean |
| Output is non-empty | FAIL — working tree is dirty; report dirty files, suggest `git stash` or `git commit` |

```bash
git_status=$(git status --porcelain)
if [ -n "$git_status" ]; then
  echo "FAIL: Working tree is dirty"
  echo "$git_status"
  # Gate: BLOCKED — commit or stash changes first
else
  echo "PASS: Working tree is clean"
fi
```

**Pass condition**: `git status --porcelain` produces empty output.
**On failure**: Report dirty files and suggest `git stash` or `git commit`.

---

### Step 2: Branch Validation

Run `git branch --show-current` and evaluate.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Branch is not main or master | PASS — proceed |
| Branch is main or master | WARN — ask user to confirm direct-to-main/master release before proceeding |
| User confirms direct release | PASS with warning noted |
| User declines | BLOCKED — halt pipeline |

```bash
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
  echo "WARN: Currently on $current_branch — direct push to main/master is risky"
  # Ask user for confirmation before proceeding
else
  echo "PASS: On branch $current_branch"
fi
```

**Pass condition**: Not on main/master, OR the user explicitly confirms a direct-to-main release.
**On warning**: Ask the user to confirm they intend to release from main/master directly.

---

### Step 3: Test Suite Execution

Detect the project type and run the appropriate test command.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| package.json with "test" script exists | Run `npm test` |
| pytest available and tests/ or test/ directory exists | Run `pytest` |
| pyproject.toml with pytest listed exists | Run `pytest` |
| No test suite detected | WARN and continue (skip check) |
| Test command exits with code 0 | PASS |
| Test command exits non-zero | FAIL — report test failures, halt pipeline |

```bash
# Detection priority:
# 1. package.json with "test" script -> npm test
# 2. pytest available and tests exist -> pytest
# 3. No tests found -> WARN and continue

if [ -f "package.json" ] && grep -q '"test"' package.json; then
  npm test
elif command -v pytest &>/dev/null && { [ -d "tests" ] || [ -d "test" ]; }; then
  pytest
elif [ -f "pyproject.toml" ] && grep -q 'pytest' pyproject.toml; then
  pytest
else
  echo "WARN: No test suite detected — skipping test check"
fi
```

**Pass condition**: Test command exits with code 0, or no tests detected (warn).
**On failure**: Report test failures and stop the pipeline.

---

### Step 4: Build Verification

Detect the project build step and run it.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| package.json with "build" script exists | Run `npm run build` |
| pyproject.toml exists and python build module available | Run `python -m build` |
| Makefile with build target exists | Run `make build` |
| No build step detected | INFO — skip (not all projects need a build), PASS |
| Build command exits with code 0 | PASS |
| Build command exits non-zero | FAIL — report build errors, halt pipeline |

```bash
# Detection priority:
# 1. package.json with "build" script -> npm run build
# 2. pyproject.toml -> python -m build (if build module available)
# 3. Makefile with build target -> make build
# 4. No build step -> PASS (not all projects need a build)

if [ -f "package.json" ] && grep -q '"build"' package.json; then
  npm run build
elif [ -f "pyproject.toml" ] && python -m build --help &>/dev/null; then
  python -m build
elif [ -f "Makefile" ] && grep -q '^build:' Makefile; then
  make build
else
  echo "INFO: No build step detected — skipping build check"
fi
```

**Pass condition**: Build command exits with code 0, or no build step detected.
**On failure**: Report build errors and stop the pipeline.

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| preflight-report | JSON | Pass/fail per check, current branch, blockers list |

```json
{
  "phase": "preflight",
  "timestamp": "ISO-8601",
  "checks": {
    "git_clean": { "status": "pass|fail", "details": "" },
    "branch": { "status": "pass|warn", "current": "branch-name", "details": "" },
    "tests": { "status": "pass|fail|skip", "details": "" },
    "build": { "status": "pass|fail|skip", "details": "" }
  },
  "overall": "pass|fail",
  "blockers": []
}
```
## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Git working tree is clean | `git status --porcelain` returns empty |
| Branch is non-main or user confirmed | Branch check + optional user confirmation |
| Tests pass or skipped with warning | Test command exit code 0, or skip with WARN |
| Build passes or skipped with info | Build command exit code 0, or skip with INFO |
| Overall gate is "pass" | All checks produce pass/warn/skip (no fail) |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| Dirty working tree | BLOCKED — list dirty files, suggest `git stash` or `git commit`, halt |
| Tests fail | BLOCKED — report test output, halt pipeline |
| Build fails | BLOCKED — report build output, halt pipeline |
| git command not found | BLOCKED — report environment error |
| No version file or project type detected | WARN — continue, version detection deferred to Phase 3 |

## Next Phase

-> [Phase 2: Code Review](02-code-review.md)

If any check fails (overall: "fail"), report BLOCKED status with the preflight report. Do not proceed.
228
.codex/skills/ship/phases/02-code-review.md
Normal file
@@ -0,0 +1,228 @@
# Phase 2: Code Review

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, you **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Automated AI-powered code review of changes since the base branch, with risk assessment.

## Objective

- Detect the merge base between the current branch and the target branch
- Generate the diff for review
- Assess high-risk indicators before AI review
- Run AI-powered code review via the inline subagent
- Flag high-risk changes (large diffs, sensitive files, breaking changes)

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Phase 1 gate result | Yes | overall: "pass" — must have passed |
| Repository git history | Yes | Commit log, diff data |

## Execution Steps
### Step 1: Detect Merge Base

Determine the target branch and find the common ancestor commit.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| origin/main exists | Use main as target branch |
| origin/main not found | Fall back to master as target branch |
| Current branch is main or master | Use last tag as merge base |
| Current branch is main/master and no tags exist | Use initial commit as merge base |
| Current branch is a feature branch | Use `git merge-base origin/<target> HEAD` |

```bash
# Determine target branch (default: main, fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
  target_branch="master"
fi

# Find merge base
merge_base=$(git merge-base "origin/$target_branch" HEAD)
echo "Merge base: $merge_base"

# If on main/master directly, compare against the last tag
current_branch=$(git branch --show-current)
if [ "$current_branch" = "main" ] || [ "$current_branch" = "master" ]; then
  last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
  if [ -n "$last_tag" ]; then
    merge_base="$last_tag"
    echo "On main — using last tag as base: $last_tag"
  else
    # Use the first commit if no tags exist
    merge_base=$(git rev-list --max-parents=0 HEAD | head -1)
    echo "No tags found — using initial commit as base"
  fi
fi
```

---

### Step 2: Generate Diff Summary
|
||||
|
||||
Collect statistics and full diff content.
|
||||
|
||||
**Decision Table**:
|
||||
|
||||
| Condition | Action |
|
||||
|-----------|--------|
|
||||
| Diff command succeeds | Record files_changed, lines_added, lines_removed |
|
||||
| No changes found | WARN — nothing to review; ask user whether to proceed |
|
||||
|
||||
```bash
|
||||
# File-level summary
|
||||
git diff --stat "$merge_base"...HEAD
|
||||
|
||||
# Full diff for review
|
||||
git diff "$merge_base"...HEAD > /tmp/ship-review-diff.txt
|
||||
|
||||
# Count changes for risk assessment
|
||||
files_changed=$(git diff --name-only "$merge_base"...HEAD | wc -l)
|
||||
lines_added=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$1} END {print s}')
|
||||
lines_removed=$(git diff --numstat "$merge_base"...HEAD | awk '{s+=$2} END {print s}')
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Risk Assessment
|
||||
|
||||
Flag high-risk indicators before AI review.
|
||||
|
||||
**Risk Factor Table**:
|
||||
|
||||
| Risk Factor | Threshold | Risk Level |
|
||||
|-------------|-----------|------------|
|
||||
| Files changed | > 50 | High |
|
||||
| Lines changed | > 1000 | High |
|
||||
| Sensitive files modified | Any of: `.env*`, `*secret*`, `*credential*`, `*auth*`, `*.key`, `*.pem` | High |
|
||||
| Config files modified | `package.json`, `pyproject.toml`, `tsconfig.json`, `Dockerfile` | Medium |
|
||||
| Migration files | `*migration*`, `*migrate*` | Medium |
|
||||
|
||||
```bash
|
||||
# Check for sensitive file changes
|
||||
sensitive_files=$(git diff --name-only "$merge_base"...HEAD | grep -iE '\.(env|key|pem)|secret|credential' || true)
|
||||
if [ -n "$sensitive_files" ]; then
|
||||
echo "HIGH RISK: Sensitive files modified:"
|
||||
echo "$sensitive_files"
|
||||
fi
|
||||
```
|
||||
|
||||
**Decision Table**:
|
||||
|
||||
| Condition | Action |
|
||||
|-----------|--------|
|
||||
| Sensitive files detected | Set risk_level = high, add to risk_factors |
|
||||
| files_changed > 50 | Set risk_level = high, add to risk_factors |
|
||||
| lines changed > 1000 | Set risk_level = high, add to risk_factors |
|
||||
| Config or migration files detected | Set risk_level = medium (if not already high) |
|
||||
| No risk factors | Set risk_level = low |
|
||||
|
||||
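The aggregation above can be sketched as shell logic (a minimal sketch; the input variables follow Steps 2-3, and the placeholder defaults here are illustrative only):

```shell
# Aggregate a single risk_level from the factors collected in Steps 2-3.
# Assumed inputs from earlier steps: files_changed, lines_added,
# lines_removed, sensitive_files, config_or_migration_files.
files_changed=${files_changed:-12}
lines_added=${lines_added:-200}
lines_removed=${lines_removed:-50}
sensitive_files=${sensitive_files:-}
config_or_migration_files=${config_or_migration_files:-}

risk_level="low"
risk_factors=""
total_lines=$((lines_added + lines_removed))

if [ -n "$sensitive_files" ]; then
  risk_level="high"
  risk_factors="${risk_factors}sensitive files modified; "
fi
if [ "$files_changed" -gt 50 ]; then
  risk_level="high"
  risk_factors="${risk_factors}files_changed > 50; "
fi
if [ "$total_lines" -gt 1000 ]; then
  risk_level="high"
  risk_factors="${risk_factors}lines changed > 1000; "
fi
# Config/migration files only raise the level when not already high
if [ -n "$config_or_migration_files" ] && [ "$risk_level" != "high" ]; then
  risk_level="medium"
  risk_factors="${risk_factors}config/migration files modified; "
fi

echo "risk_level=$risk_level"
```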
---

### Step 4: AI Code Review via Inline Subagent

Spawn the inline-code-review subagent for AI analysis. Replace the ccw CLI call from the original with this inline subagent:

```
spawn_agent({
  task_name: "inline-code-review",
  fork_context: false,
  model: "haiku",
  reasoning_effort: "medium",
  message: `### MANDATORY FIRST STEPS
1. Read: ~/.codex/agents/cli-explore-agent.md

Goal: Review code changes for release readiness
Context: Diff from <merge_base> to HEAD (<files_changed> files, +<lines_added>/-<lines_removed> lines)

Task:
- Review diff for bugs and correctness issues
- Check for breaking changes (API, config, schema)
- Identify security concerns
- Assess test coverage gaps
- Flag formatting-only changes to exclude from critical issues

Expected: Risk level (low/medium/high), list of issues with severity and file:line reference, release recommendation (ship|hold|fix-first)
Constraints: Focus on correctness and security | Flag breaking API changes | Ignore formatting-only changes`
})

const result = wait_agent({ targets: ["inline-code-review"], timeout_ms: 300000 })
close_agent({ target: "inline-code-review" })
```

**Note**: Wait for the subagent to complete before proceeding. Do not advance to Step 5 while the review is running.

---

### Step 5: Evaluate Review Results

Based on the inline subagent output, apply gate logic.

**Review Result Decision Table**:

| Review Result | Action |
|---------------|--------|
| recommendation: "ship", no critical issues | Gate = pass — proceed to Phase 3 |
| recommendation: "hold" or critical issues present | Gate = fail — report BLOCKED, list issues |
| recommendation: "fix-first" | Gate = fail — report BLOCKED, list issues with file:line |
| Warnings only, recommendation: "ship" | Gate = warn — proceed with DONE_WITH_CONCERNS note |
| Review subagent failed or timed out | Ask user whether to proceed or retry |
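The gate logic can be sketched as follows (a minimal sketch; it assumes the subagent's JSON output has already been parsed into `recommendation`, `critical_count`, and `warning_count`, which are illustrative names):

```shell
# Map the review result onto the gate value used by later phases.
recommendation=${recommendation:-ship}
critical_count=${critical_count:-0}
warning_count=${warning_count:-2}

if [ "$critical_count" -gt 0 ] || [ "$recommendation" = "hold" ] || [ "$recommendation" = "fix-first" ]; then
  overall="fail"      # BLOCKED: report issues, do not proceed
elif [ "$warning_count" -gt 0 ]; then
  overall="warn"      # proceed with DONE_WITH_CONCERNS note
else
  overall="pass"
fi

echo "overall=$overall"
```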
---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| Review summary | JSON | Risk level, risk factors, AI review recommendation, critical issues, warnings |

```json
{
  "phase": "code-review",
  "merge_base": "commit-sha",
  "stats": {
    "files_changed": 0,
    "lines_added": 0,
    "lines_removed": 0
  },
  "risk_level": "low|medium|high",
  "risk_factors": [],
  "ai_review": {
    "recommendation": "ship|hold|fix-first",
    "critical_issues": [],
    "warnings": []
  },
  "overall": "pass|fail|warn"
}
```

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Merge base detected | merge_base SHA present in output |
| Diff statistics collected | files_changed, lines_added, lines_removed populated |
| Risk assessment completed | risk_level set (low/medium/high), risk_factors populated |
| AI review completed | ai_review.recommendation present |
| Gate condition evaluated | overall set to pass/fail/warn |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| origin/main and origin/master both missing | Use HEAD~1 as merge base, warn user |
| No commits in diff | WARN — nothing to review; ask user whether to proceed |
| Inline subagent timeout | Log warning, ask user whether to proceed without AI review |
| Inline subagent error | Log error, ask user whether to proceed |
| Critical issues found | BLOCKED — report full issues list with severity and file:line |

## Next Phase

-> [Phase 3: Version Bump](03-version-bump.md)

If review passes (overall: "pass" or "warn"), proceed to Phase 3.
If critical issues found (overall: "fail"), report BLOCKED status with review summary. Do not proceed.
`.codex/skills/ship/phases/03-version-bump.md` (new file, 259 lines)
@@ -0,0 +1,259 @@

# Phase 3: Version Bump

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Detect the current version, determine the bump type, and update the version file.

## Objective

- Detect which version file the project uses
- Read the current version
- Determine bump type (patch/minor/major) from commit messages or user input
- Update the version file
- Record the version change

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Phase 2 gate result | Yes | overall: "pass" or "warn" — must have passed |
| package.json / pyproject.toml / VERSION | Conditional | One must exist; used for version detection |
| Git history | Yes | Commit messages for bump type auto-detection |

## Execution Steps

### Step 1: Detect Version File

Search for the version file in priority order.

**Version File Detection Priority Table**:

| Priority | File | Read Method |
|----------|------|-------------|
| 1 | `package.json` | `jq -r .version package.json` |
| 2 | `pyproject.toml` | `grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml` |
| 3 | `VERSION` | `cat VERSION` |

**Decision Table**:

| Condition | Action |
|-----------|--------|
| package.json found | Set version_file = package.json, read version with node/jq |
| pyproject.toml found (no package.json) | Set version_file = pyproject.toml, read with grep -oP |
| VERSION found (no others) | Set version_file = VERSION, read with cat |
| No version file found | NEEDS_CONTEXT — ask user which file to use or create |

```bash
if [ -f "package.json" ]; then
  version_file="package.json"
  current_version=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
elif [ -f "pyproject.toml" ]; then
  version_file="pyproject.toml"
  current_version=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
elif [ -f "VERSION" ]; then
  version_file="VERSION"
  current_version=$(tr -d '[:space:]' < VERSION)
else
  echo "NEEDS_CONTEXT: No version file found"
  echo "Expected one of: package.json, pyproject.toml, VERSION"
  # Ask user which file to use or create
fi

echo "Version file: $version_file"
echo "Current version: $current_version"
```
---

### Step 2: Determine Bump Type

Auto-detect from commit messages, then confirm with the user for major bumps.

**Bump Type Auto-Detection from Conventional Commits**:

```bash
# Get commits since last tag
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$last_tag" ]; then
  commits=$(git log "$last_tag"..HEAD --oneline)
else
  commits=$(git log --oneline -20)
fi

# Scan for conventional commit prefixes
has_breaking=$(echo "$commits" | grep -iE '(BREAKING CHANGE|!:)' || true)
has_feat=$(echo "$commits" | grep -iE '^[a-f0-9]+ feat' || true)
has_fix=$(echo "$commits" | grep -iE '^[a-f0-9]+ fix' || true)

if [ -n "$has_breaking" ]; then
  suggested_bump="major"
elif [ -n "$has_feat" ]; then
  suggested_bump="minor"
else
  suggested_bump="patch"
fi

echo "Suggested bump: $suggested_bump"
```

**User Confirmation Decision Table**:

| Bump Type | Action |
|-----------|--------|
| patch | Proceed with suggested bump, inform user |
| minor | Proceed with suggested bump, inform user |
| major | Always ask user to confirm before proceeding |
| User overrides suggestion | Use user-specified bump type |
| User declines major bump | BLOCKED — halt; user must re-trigger with an explicit bump type |
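The confirmation gate can be sketched as follows (a minimal sketch; `user_confirmed` is a hypothetical stand-in for the interactive answer the agent collects from the user):

```shell
# Major bumps always require explicit confirmation before proceeding.
suggested_bump=${suggested_bump:-major}
user_confirmed=${user_confirmed:-no}

if [ "$suggested_bump" = "major" ] && [ "$user_confirmed" != "yes" ]; then
  bump_status="BLOCKED"   # halt; user must re-trigger with an explicit bump type
else
  bump_status="proceed"
  bump_type="$suggested_bump"
fi

echo "bump_status=$bump_status"
```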
---

### Step 3: Calculate New Version

Apply semver arithmetic to derive the new version.

**Decision Table**:

| Bump Type | Calculation |
|-----------|-------------|
| major | `(major+1).0.0` |
| minor | `major.(minor+1).0` |
| patch | `major.minor.(patch+1)` |

```bash
# Parse semver components
IFS='.' read -r major minor patch <<< "$current_version"

case "$bump_type" in
  major)
    new_version="$((major + 1)).0.0"
    ;;
  minor)
    new_version="${major}.$((minor + 1)).0"
    ;;
  patch)
    new_version="${major}.${minor}.$((patch + 1))"
    ;;
esac

echo "Version bump: $current_version -> $new_version"
```
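Since Error Handling flags malformed semver, a guard before the arithmetic avoids producing a garbage bump (a minimal sketch; the demo default version is illustrative):

```shell
# Validate the semver shape before doing arithmetic on its components.
# A malformed version should surface NEEDS_CONTEXT instead of a bad bump.
current_version=${current_version:-1.2.3}

if echo "$current_version" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
  semver_ok="yes"
else
  semver_ok="no"
  echo "NEEDS_CONTEXT: malformed version '$current_version'"
fi

echo "semver_ok=$semver_ok"
```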
---

### Step 4: Update Version File

Write the new version to the appropriate file using the correct method for each format.

**Decision Table**:

| Version File | Update Method |
|--------------|---------------|
| package.json | `jq --arg v "<new_version>" '.version = $v'` + update package-lock.json if present |
| pyproject.toml | `sed -i "s/^version\s*=\s*\".*\"/version = \"<new_version>\"/"` |
| VERSION | `echo "<new_version>" > VERSION` |

```bash
case "$version_file" in
  package.json)
    # Use node/jq for safe JSON update
    jq --arg v "$new_version" '.version = $v' package.json > tmp.json && mv tmp.json package.json
    # Also update package-lock.json if it exists
    if [ -f "package-lock.json" ]; then
      jq --arg v "$new_version" '.version = $v | .packages[""].version = $v' package-lock.json > tmp.json && mv tmp.json package-lock.json
    fi
    ;;
  pyproject.toml)
    # Use sed for TOML update (version line in [project] or [tool.poetry])
    sed -i "s/^version\s*=\s*\".*\"/version = \"$new_version\"/" pyproject.toml
    ;;
  VERSION)
    echo "$new_version" > VERSION
    ;;
esac

echo "Updated $version_file: $current_version -> $new_version"
```

---

### Step 5: Verify Update

Re-read the version file to confirm the update was applied correctly.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Re-read version equals new_version | PASS — gate satisfied |
| Re-read version does not match | FAIL — report mismatch, BLOCKED |

```bash
# Re-read to confirm
case "$version_file" in
  package.json)
    verified=$(node -p "require('./package.json').version" 2>/dev/null || jq -r .version package.json)
    ;;
  pyproject.toml)
    verified=$(grep -oP 'version\s*=\s*"\K[^"]+' pyproject.toml | head -1)
    ;;
  VERSION)
    verified=$(tr -d '[:space:]' < VERSION)
    ;;
esac

if [ "$verified" = "$new_version" ]; then
  echo "PASS: Version verified as $new_version"
else
  echo "FAIL: Version mismatch — expected $new_version, got $verified"
fi
```

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| Version change record | JSON | version_file, previous_version, new_version, bump_type, bump_source |

```json
{
  "phase": "version-bump",
  "version_file": "package.json",
  "previous_version": "1.2.3",
  "new_version": "1.3.0",
  "bump_type": "minor",
  "bump_source": "auto-detected|user-specified",
  "overall": "pass|fail"
}
```

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Version file detected | version_file field populated |
| Current version read | current_version field populated |
| Bump type determined | bump_type set to patch/minor/major |
| Version file updated | Write/edit operation succeeded |
| Update verified | Re-read matches new_version |
| overall = "pass" | All steps completed without error |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| No version file found | NEEDS_CONTEXT — ask user which file to create or use |
| Version parse error (malformed semver) | NEEDS_CONTEXT — report current value, ask user for correction |
| jq not available | Fall back to node for package.json; report error for others |
| sed fails on pyproject.toml | Try the Write tool to rewrite the file; report on failure |
| User declines major bump | BLOCKED — halt; user must re-trigger with an explicit bump type |
| Version mismatch after update | BLOCKED — report expected vs actual, suggest manual fix |

## Next Phase

-> [Phase 4: Changelog & Commit](04-changelog-commit.md)

If the version updated successfully (overall: "pass"), proceed to Phase 4.
If the update fails or context is needed, report BLOCKED / NEEDS_CONTEXT. Do not proceed.
`.codex/skills/ship/phases/04-changelog-commit.md` (new file, 263 lines)
@@ -0,0 +1,263 @@

# Phase 4: Changelog & Commit

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Generate a changelog entry from git history, update CHANGELOG.md, create a release commit, and push to the remote.

## Objective

- Parse the git log since the last tag into a grouped changelog entry
- Update or create CHANGELOG.md
- Create a release commit with the version in the message
- Push the branch to the remote

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Phase 3 output | Yes | new_version, version_file, bump_type |
| Git history | Yes | Commits since last tag |
| CHANGELOG.md | No | Updated in place if it exists; created if not |

## Execution Steps

### Step 1: Gather Commits Since Last Tag

Retrieve the commits to include in the changelog.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Last tag exists | `git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges` |
| No previous tag found | Use last 50 commits: `git log --pretty=format:"%h %s" --no-merges -50` |

```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")

if [ -n "$last_tag" ]; then
  echo "Generating changelog since tag: $last_tag"
  git log "$last_tag"..HEAD --pretty=format:"%h %s" --no-merges
else
  echo "No previous tag found — using last 50 commits"
  git log --pretty=format:"%h %s" --no-merges -50
fi
```

---

### Step 2: Group Commits by Conventional Commit Type

Parse commit messages and group them into changelog sections.

**Conventional Commit Grouping Table**:

| Prefix | Category | Changelog Section |
|--------|----------|-------------------|
| `feat:` / `feat(*):` | Features | **Features** |
| `fix:` / `fix(*):` | Bug Fixes | **Bug Fixes** |
| `perf:` | Performance | **Performance** |
| `docs:` | Documentation | **Documentation** |
| `refactor:` | Refactoring | **Refactoring** |
| `chore:` | Maintenance | **Maintenance** |
| `test:` | Testing | *(omitted from changelog)* |
| Other | Miscellaneous | **Other Changes** |

```bash
# Grouping logic (executed by the agent, not a literal script):
# 1. Read all commits since the last tag
# 2. Parse the prefix from each commit message
# 3. Group into categories
# 4. Format as markdown sections
# 5. Omit empty categories
```
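The grouping logic above can be sketched concretely (a minimal sketch covering three of the categories; the inline `commits` sample is illustrative, real input comes from the Step 1 `git log` output):

```shell
# Group conventional commits into changelog sections.
commits='abc1234 feat(api): add export endpoint
def5678 fix: handle empty diff
0123abc chore: bump deps
9876fed test: add gate tests'

# Match "<sha> <type>" followed by "(", "!", or ":" to allow scopes.
features=$(echo "$commits" | grep -E '^[0-9a-f]+ feat(\(|!|:)' || true)
fixes=$(echo "$commits"    | grep -E '^[0-9a-f]+ fix(\(|!|:)'  || true)
chores=$(echo "$commits"   | grep -E '^[0-9a-f]+ chore(\(|!|:)' || true)
# test: commits are intentionally left out of the changelog

entry=""
[ -n "$features" ] && entry="${entry}### Features
${features}

"
[ -n "$fixes" ] && entry="${entry}### Bug Fixes
${fixes}

"
[ -n "$chores" ] && entry="${entry}### Maintenance
${chores}
"

echo "$entry"
```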
---

### Step 3: Format Changelog Entry

Generate a markdown changelog entry using the ISO 8601 date format.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Category has commits | Include section with all entries |
| Category is empty | Omit section entirely |
| test: commits present | Omit from changelog output |

Changelog entry format:

```markdown
## [X.Y.Z] - YYYY-MM-DD

### Features
- feat: description (sha)
- feat(scope): description (sha)

### Bug Fixes
- fix: description (sha)

### Performance
- perf: description (sha)

### Other Changes
- chore: description (sha)
```

Rules:
- Date format: YYYY-MM-DD (ISO 8601)
- Each entry includes the short SHA for traceability
- Empty categories are omitted
- Entries are listed in chronological order within each category

---

### Step 4: Update CHANGELOG.md

Write the new entry into CHANGELOG.md.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| CHANGELOG.md exists | Insert new entry after the first heading line (`# Changelog`), before the previous version entry |
| CHANGELOG.md does not exist | Create a new file with a `# Changelog` heading followed by the new entry |

```bash
if [ -f "CHANGELOG.md" ]; then
  # Insert the new entry after the first heading line (# Changelog).
  # The new entry goes between the main heading and the previous version entry.
  # Use the Write tool to insert the new section at the correct position.
  echo "Updating existing CHANGELOG.md"
else
  # Create a new CHANGELOG.md with a header
  echo "Creating new CHANGELOG.md"
fi
```

**CHANGELOG.md structure**:

```markdown
# Changelog

## [X.Y.Z] - YYYY-MM-DD
(new entry here)

## [X.Y.Z-1] - YYYY-MM-DD
(previous entry)
```
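The insertion itself can be sketched with awk (a minimal sketch using a temp file and demo content; the skill may equally perform this edit with the Write tool as noted above):

```shell
# Insert the new entry directly after the `# Changelog` heading,
# before the previous version entry.
changelog="/tmp/CHANGELOG.demo.md"
printf '# Changelog\n\n## [1.2.3] - 2024-01-01\n- fix: old entry\n' > "$changelog"
new_entry='## [1.3.0] - 2024-02-01\n- feat: new entry'   # \n expanded by awk -v

if [ -f "$changelog" ]; then
  awk -v entry="$new_entry" '
    NR == 1 && /^# Changelog/ { print; print ""; print entry; next }
    { print }
  ' "$changelog" > "${changelog}.tmp" && mv "${changelog}.tmp" "$changelog"
else
  printf '# Changelog\n\n%b\n' "$new_entry" > "$changelog"
fi

head -3 "$changelog"
```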
---

### Step 5: Create Release Commit

Stage the changed files and create a conventionally formatted release commit.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Version file is package.json | Stage package.json and package-lock.json (if present) |
| Version file is pyproject.toml | Stage pyproject.toml |
| Version file is VERSION | Stage VERSION |
| CHANGELOG.md was updated/created | Stage CHANGELOG.md |
| git commit succeeds | Proceed to push step |
| git commit fails | BLOCKED — report error |

```bash
# Stage only the release files; 2>/dev/null ignores files that do not exist
git add package.json package-lock.json pyproject.toml VERSION CHANGELOG.md 2>/dev/null

# Create release commit (unquoted heredoc so $new_version expands)
git commit -m "$(cat <<EOF
chore: bump version to $new_version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
```

**Commit message format**: `chore: bump version to <new_version>`
- Follows the conventional commit format
- Includes a Co-Authored-By trailer

---

### Step 6: Push to Remote

Push the branch to the remote origin.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Remote tracking branch exists | `git push origin "<current_branch>"` |
| No remote tracking branch | `git push -u origin "<current_branch>"` |
| Push succeeds (exit 0) | PASS — gate satisfied |
| Push rejected (non-fast-forward) | BLOCKED — report error, suggest `git pull --rebase` |
| Permission denied | BLOCKED — report error, advise checking remote access |
| No remote configured | BLOCKED — report error, suggest `git remote add` |

```bash
current_branch=$(git branch --show-current)

# Check whether a remote tracking branch exists
if git rev-parse --verify "origin/$current_branch" &>/dev/null; then
  git push origin "$current_branch"
else
  git push -u origin "$current_branch"
fi
```

---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| Commit and push record | JSON | changelog_entry, commit_sha, commit_message, pushed_to |
| CHANGELOG.md | Markdown file | Updated with the new version entry |

```json
{
  "phase": "changelog-commit",
  "changelog_entry": "## [X.Y.Z] - YYYY-MM-DD ...",
  "commit_sha": "abc1234",
  "commit_message": "chore: bump version to X.Y.Z",
  "pushed_to": "origin/branch-name",
  "overall": "pass|fail"
}
```

## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Commits gathered since last tag | Commit list non-empty, or warn if empty |
| Changelog entry formatted | Markdown entry with correct sections |
| CHANGELOG.md updated or created | File exists with the new entry at the top |
| Release commit created | `git log -1 --oneline` shows the commit |
| Branch pushed to remote | Push command exits 0 |
| overall = "pass" | All steps completed without error |

## Error Handling

| Scenario | Resolution |
|----------|------------|
| No commits since last tag | WARN — create a minimal changelog entry, continue |
| CHANGELOG.md write error | BLOCKED — report file system error |
| git commit fails (nothing staged) | Verify the version file and CHANGELOG.md were modified, re-stage |
| Push rejected (non-fast-forward) | BLOCKED — suggest `git pull --rebase`, halt |
| Push permission denied | BLOCKED — advise checking SSH keys or access token |
| No remote configured | BLOCKED — suggest `git remote add origin <url>` |

## Next Phase

-> [Phase 5: PR Creation](05-pr-creation.md)

If commit and push succeed (overall: "pass"), proceed to Phase 5.
If the push fails, report BLOCKED status with error details. Do not proceed.
`.codex/skills/ship/phases/05-pr-creation.md` (new file, 280 lines)
@@ -0,0 +1,280 @@

# Phase 5: PR Creation

> **COMPACT PROTECTION**: This is a core execution phase. If context compression has occurred and this file is only a summary, **MUST `Read` this file again before executing any Step**. Do not execute from memory.

Create a pull request via the GitHub CLI with a structured body, linked issues, and release metadata.

## Objective

- Create a PR using `gh pr create` with a structured body
- Auto-link related issues from commit messages
- Include a release summary (version, changes, test plan)
- Output the PR URL

## Input

| Source | Required | Description |
|--------|----------|-------------|
| Phase 4 output | Yes | commit_sha, pushed_to |
| Phase 3 output | Yes | new_version, previous_version, bump_type, version_file |
| Phase 2 output | Yes | merge_base (for change summary) |
| Git history | Yes | Commit messages for issue extraction |

## Execution Steps

### Step 1: Extract Issue References from Commits

Scan commit messages for issue reference patterns.

**Issue Reference Pattern**: `fixes #N`, `closes #N`, `resolves #N`, `refs #N` (case-insensitive, singular and plural forms).

**Decision Table**:

| Condition | Action |
|-----------|--------|
| Last tag exists | Scan commits from last_tag..HEAD |
| No last tag | Scan last 50 commit subjects |
| Issue references found | Deduplicate, sort numerically |
| No issue references found | issues_section = empty (omit section from PR body) |

```bash
last_tag=$(git describe --tags --abbrev=0 2>/dev/null || echo "")

if [ -n "$last_tag" ]; then
  commits=$(git log "$last_tag"..HEAD --pretty=format:"%s" --no-merges)
else
  commits=$(git log --pretty=format:"%s" --no-merges -50)
fi

# Extract issue references: fixes #N, closes #N, resolves #N, refs #N
# (sort numerically on the digits after '#', deduplicating)
issues=$(echo "$commits" | grep -oiE '(fix(es)?|close[sd]?|resolve[sd]?|refs?)\s*#[0-9]+' | grep -oE '#[0-9]+' | sort -t'#' -k2,2n -u || true)

echo "Referenced issues: $issues"
```

---

### Step 2: Determine Target Branch

Find the appropriate base branch for the PR.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| origin/main exists | target_branch = main |
| origin/main not found | target_branch = master |

```bash
# Default target: main (fallback: master)
target_branch="main"
if ! git rev-parse --verify "origin/$target_branch" &>/dev/null; then
  target_branch="master"
fi

current_branch=$(git branch --show-current)
echo "PR: $current_branch -> $target_branch"
```
---
|
||||
|
||||
### Step 3: Build PR Title
|
||||
|
||||
Format the PR title as `release: vX.Y.Z`.
|
||||
|
||||
**Decision Table**:
|
||||
|
||||
| Condition | Action |
|
||||
|-----------|--------|
|
||||
| new_version available from Phase 3 | pr_title = "release: v<new_version>" |
|
||||
| new_version not available | Fall back to descriptive title derived from branch name |
|
||||
|
||||
```bash
|
||||
pr_title="release: v${new_version}"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Step 4: Build PR Body
|
||||
|
||||
Construct the full PR body with all sections.
|
||||
|
||||
**Decision Table**:
|
||||
|
||||
| Condition | Action |
|
||||
|-----------|--------|
|
||||
| issues list non-empty | Include "## Linked Issues" section with each issue as `- #N` |
|
||||
| issues list empty | Omit "## Linked Issues" section |
|
||||
| Phase 2 warnings exist | Include warning note in Summary section |
|
||||
|
||||
```bash
|
||||
# Gather change summary
|
||||
change_summary=$(git log "$merge_base"..HEAD --pretty=format:"- %s (%h)" --no-merges)
|
||||
|
||||
# Build linked issues section
|
||||
if [ -n "$issues" ]; then
|
||||
issues_section="## Linked Issues
|
||||
$(echo "$issues" | while read -r issue; do echo "- $issue"; done)"
|
||||
else
|
||||
issues_section=""
|
||||
fi
|
||||
```
|
||||
|
||||
**PR Body Sections Table**:
|
||||
|
||||
| Section | Content |
|
||||
|---------|---------|
|
||||
| **Summary** | Version being released, one-line description |
|
||||
| **Changes** | Grouped changelog entries (from Phase 4) |
|
||||
| **Linked Issues** | Auto-extracted `fixes #N`, `closes #N` references |
|
||||
| **Version** | Previous version, new version, bump type |
|
||||
| **Test Plan** | Checklist confirming all phases passed |
|
||||
|
||||
---
|
||||
|
||||
### Step 5: Create PR via gh CLI

Invoke `gh pr create` with the title and the fully assembled body.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| gh CLI available | Execute `gh pr create` |
| gh CLI not installed | BLOCKED — report missing CLI, advise `gh auth login` |
| PR created successfully | Capture URL from output |
| PR creation fails (already exists) | Report existing PR URL, gate = pass |
| PR creation fails (other error) | BLOCKED — report error details |

```bash
gh pr create --title "$pr_title" --base "$target_branch" --body "$(cat <<'EOF'
## Summary
Release vX.Y.Z

### Changes
- list of changes from changelog

## Linked Issues
- #N (fixes)
- #M (closes)

## Version
- Previous: X.Y.Z-1
- New: X.Y.Z
- Bump type: patch|minor|major

## Test Plan
- [ ] Pre-flight checks passed (git clean, branch, tests, build)
- [ ] AI code review completed with no critical issues
- [ ] Version bump verified in version file
- [ ] Changelog updated with all changes since last release
- [ ] Release commit pushed successfully

Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
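The "already exists" and fallback rows of the decision table can be handled by retrying with `gh pr view` when creation fails. A sketch under the assumption that `gh` is installed and authenticated; the title, base branch, and body arguments come from the earlier steps:

```shell
#!/bin/sh
# Create the PR, or reuse the existing one for the current branch.
create_or_reuse_pr() {
  # $1 = title, $2 = base branch, $3 = body
  if url=$(gh pr create --title "$1" --base "$2" --body "$3" 2>&1); then
    echo "created $url"                 # fresh PR: gate = pass
  elif url=$(gh pr view --json url --jq '.url' 2>/dev/null); then
    echo "exists $url"                  # PR already open: gate = pass
  else
    echo "BLOCKED: gh pr create failed and no existing PR found" >&2
    return 1
  fi
}
```

When `gh pr create` exits non-zero (for example because a PR is already open for the branch), `gh pr view` resolves the existing PR's URL, matching the decision table's gate = pass outcome; if both fail, the step reports BLOCKED.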
---

### Step 6: Capture and Report PR URL

Extract the PR URL from the gh output.

**Decision Table**:

| Condition | Action |
|-----------|--------|
| URL present in output | Record pr_url, set gate = pass |
| No URL in output | Check `gh pr view --json url` as fallback |
| Both fail | BLOCKED — report failure |

```bash
# gh pr create outputs the PR URL on success
pr_url=$(gh pr create ... 2>&1 | tail -1)
echo "PR created: $pr_url"
```
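Because `gh pr create` may print progress or remote messages around the URL, extracting it by pattern is more robust than assuming it is the last line. A sketch over a hypothetical captured `output`:

```shell
#!/bin/sh
# Pull the first github.com pull-request URL out of gh's captured output.
output="Creating pull request for feature-branch into main
https://github.com/owner/repo/pull/42"

pr_url=$(printf '%s\n' "$output" \
  | grep -Eo 'https://github\.com/[^ ]+/pull/[0-9]+' \
  | head -1)
echo "PR created: $pr_url"
```

If the pattern finds nothing, the decision table's fallback (`gh pr view --json url`) applies before reporting BLOCKED.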
---

## Output

| Artifact | Format | Description |
|----------|--------|-------------|
| PR creation record | JSON | pr_url, pr_title, target_branch, source_branch, linked_issues |
| Final completion status | Text block | DONE / DONE_WITH_CONCERNS with full summary |

```json
{
  "phase": "pr-creation",
  "pr_url": "https://github.com/owner/repo/pull/N",
  "pr_title": "release: vX.Y.Z",
  "target_branch": "main",
  "source_branch": "feature-branch",
  "linked_issues": ["#1", "#2"],
  "overall": "pass|fail"
}
```
## Success Criteria

| Criterion | Validation Method |
|-----------|-------------------|
| Issue references extracted | issues list populated (or empty with no error) |
| Target branch determined | target_branch set to main or master |
| PR title formatted | pr_title = "release: v<new_version>" |
| PR body assembled with all sections | All required sections present |
| PR created via gh CLI | pr_url present in output |
| Completion status output | DONE or DONE_WITH_CONCERNS block present |
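The "target branch determined" criterion (main or master) can be validated against the remote branch list. A sketch in which a stubbed `remote_branches` variable stands in for real `git branch -r` output:

```shell
#!/bin/sh
# Prefer main; fall back to master; otherwise the gate is BLOCKED.
# Stubbed branch list stands in for `git branch -r` output.
remote_branches="origin/master
origin/feature-x"

if printf '%s\n' "$remote_branches" | grep -qx 'origin/main'; then
  target_branch="main"
elif printf '%s\n' "$remote_branches" | grep -qx 'origin/master'; then
  target_branch="master"
else
  echo "BLOCKED: neither main nor master found" >&2
  exit 1
fi
echo "target: $target_branch"
```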
## Error Handling

| Scenario | Resolution |
|----------|------------|
| gh CLI not installed | BLOCKED — report error, advise install + `gh auth login` |
| Not authenticated with gh | BLOCKED — report auth error, advise `gh auth login` |
| PR already exists for branch | Report existing PR URL, treat as pass |
| No changes to create PR for | BLOCKED — report, suggest verifying Phase 4 push succeeded |
| Issue regex finds no matches | issues = [] — omit Linked Issues section, continue |
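The issue-reference extraction mentioned above can be sketched with grep over the commit log. This is an assumption about how the `fixes #N` / `closes #N` markers are gathered, and the sample `log` is hypothetical:

```shell
#!/bin/sh
# Scan commit messages for closing keywords and collect issue numbers.
log="fix: null deref, fixes #12
docs: update readme
feat: retry logic (closes #34)"

issues=$(printf '%s\n' "$log" \
  | grep -Eio '(fixes|closes|resolves) #[0-9]+' \
  | grep -Eo '#[0-9]+' \
  | sort -u)
printf '%s\n' "$issues"
```

If no commit message matches, `grep` produces nothing and `issues` is empty, which maps to the "omit Linked Issues section, continue" row above.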
## Completion Status

After PR creation, output the final Completion Status:

```
## STATUS: DONE

**Summary**: Released vX.Y.Z — PR created at <pr_url>

### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Key outputs: CHANGELOG.md updated, release commit pushed, PR created

### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```

If there were review warnings from Phase 2, use `DONE_WITH_CONCERNS` and list the warnings in the Details section:

```
## STATUS: DONE_WITH_CONCERNS

**Summary**: Released vX.Y.Z — PR created at <pr_url> (review warnings noted)

### Details
- Phases completed: 5/5
- Version: <previous> -> <new> (<bump_type>)
- PR: <pr_url>
- Concerns: <list review warnings from Phase 2>

### Outputs
- CHANGELOG.md (updated)
- <version_file> (version bumped)
- Release commit: <sha>
- PR: <pr_url>
```
@@ -416,6 +416,8 @@ Visual workflow template editor with drag-drop.

- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, OKLCH color system, anti-AI-slop detection patterns, editorial typography standards, motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily from Impeccable's design knowledge.

- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) are inspired by gstack's workflow designs.

---

## 🤝 Contributing
@@ -416,6 +416,8 @@ The v2 team architecture introduces an **event-driven beat model** for efficient orchestration:

- **[Impeccable](https://github.com/pbakaus/impeccable)** — Design audit methodology, the OKLCH color system, anti-AI-slop detection patterns, editorial-grade typography standards, a motion/animation token architecture, and vanilla JS interaction patterns. The UI team skills (`team-ui-polish`, `team-interactive-craft`, `team-motion-design`, `team-visual-a11y`, `team-uidesign`, `team-ux-improve`) draw heavily on Impeccable's design knowledge.

- **[gstack](https://github.com/garrytan/gstack)** — Systematic debugging methodology, security audit frameworks, and release pipeline patterns. The three skills `investigate` (Iron Law debugging), `security-audit` (OWASP Top 10 + STRIDE), and `ship` (gated release pipeline) draw their design inspiration from gstack's workflow designs.

---

## 🤝 Contributing