mirror of https://github.com/catlog22/Claude-Code-Workflow.git synced 2026-03-30 20:21:09 +08:00

catlog22 67ff3fe339 feat: add investigate, security-audit, ship skills (Claude + Codex)

- Add 3 new Claude skills: investigate (Iron Law debugging), security-audit
  (OWASP Top 10 + STRIDE), ship (gated release pipeline)
- Port all 3 skills to Codex v4 format under .codex/skills/ using
  Deep Interaction pattern (spawn_agent + assign_task phase transitions)
- Update README/README_CN acknowledgments: credit gstack
  (https://github.com/garrytan/gstack) as inspiration source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-30 10:31:13 +08:00

Phase 3: Hypothesis Testing

COMPACT PROTECTION: This is a core execution phase. If context compression has occurred and this file is only a summary, MUST Read this file again before executing any Step. Do not execute from memory.

Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.

Objective

Form a maximum of 3 hypotheses from Phase 1-2 evidence
Test each hypothesis with minimal, read-only probes
Confirm or reject each hypothesis with concrete evidence
Enforce 3-strike rule: STOP and escalate after 3 consecutive unproductive test failures

Input

Source	Required	Description
investigation-report (phases 1-2)	Yes	Evidence, affected files, pattern analysis, initial suspects
assign_task message	Yes	Phase 3 instruction

Execution Steps

Step 1: Form Hypotheses

Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:

Hypothesis formation rules:

Each hypothesis must cite at least one piece of evidence from Phase 1-2
Each hypothesis must have a testable prediction
Rank by confidence (high first)
Maximum 3 hypotheses per investigation

Assemble hypotheses in memory:

hypotheses = [
  {
    id: "H1",
    description: "The root cause is <X> because evidence <Y>",
    evidence_supporting: ["<evidence item 1>", "<evidence item 2>"],
    predicted_behavior: "If H1 is correct, then we should observe <Z>",
    test_method: "How to verify: read file <path> line <N>, check value <expected>",
    confidence: "high|medium|low"
  }
]

Initialize the strike counter to 0.
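The formation rules above can be checked mechanically before testing starts. The following is a minimal sketch, assuming the hypothesis fields shown in the template; the names `Hypothesis`, `validateHypotheses`, and `rankByConfidence` are hypothetical and not part of the skill itself.

```typescript
// Illustrative sketch only: type and function names are hypothetical.
type Confidence = "high" | "medium" | "low";

interface Hypothesis {
  id: string;
  description: string;
  evidence_supporting: string[];
  predicted_behavior: string;
  test_method: string;
  confidence: Confidence;
}

const MAX_HYPOTHESES = 3;
const CONFIDENCE_ORDER: Confidence[] = ["high", "medium", "low"];

// Returns a list of rule violations; an empty list means the set is well formed.
function validateHypotheses(hypotheses: Hypothesis[]): string[] {
  const problems: string[] = [];
  if (hypotheses.length === 0) problems.push("at least 1 hypothesis is required");
  if (hypotheses.length > MAX_HYPOTHESES) problems.push("more than 3 hypotheses");
  for (const h of hypotheses) {
    if (h.evidence_supporting.length === 0) problems.push(h.id + ": cites no evidence");
    if (h.predicted_behavior.trim() === "") problems.push(h.id + ": no testable prediction");
  }
  return problems;
}

// Returns a copy sorted high -> medium -> low; ties keep their original order.
function rankByConfidence(hypotheses: Hypothesis[]): Hypothesis[] {
  return hypotheses
    .map((h, index) => ({ h, index }))
    .sort((a, b) =>
      CONFIDENCE_ORDER.indexOf(a.h.confidence) - CONFIDENCE_ORDER.indexOf(b.h.confidence) ||
      a.index - b.index)
    .map((entry) => entry.h);
}
```

Testing would then walk the ranked list from the first element, so H1 is always the highest-confidence candidate.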

Step 2: Test Hypotheses Sequentially

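Test each hypothesis in order of confidence, starting with the highest (H1 first). Use only read-only probes during testing; the single exception is a temporary log statement, which must be reverted afterwards.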

Allowed test methods:

Method	Usage
Read a specific file	Check a specific value, condition, or code pattern
Grep for a pattern	Confirm or deny the presence of a condition
Bash targeted test	Run a specific test that reveals the condition
Temporary log statement	Add a log to observe runtime behavior; MUST revert after

Prohibited during hypothesis testing:

Modifying production code (deferred to Phase 4)
Changing multiple things at once
Running the full test suite (targeted checks only)
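The allowed and prohibited lists above amount to a small policy check over completed probes. A minimal sketch follows, assuming each probe is recorded with a kind and a target; `ProbeKind`, `Probe`, and `checkProbe` are hypothetical names, and the "changing multiple things at once" rule is not modeled here.

```typescript
// Hypothetical names throughout; this only mirrors the two lists above.
type ProbeKind =
  | "read_file"
  | "grep"
  | "targeted_test"
  | "temporary_log"
  | "production_edit"
  | "full_test_suite";

interface Probe {
  kind: ProbeKind;
  target: string;      // file, pattern, or test name
  reverted?: boolean;  // only meaningful for temporary_log
}

// Decides whether a completed probe respects the Phase 3 testing rules.
function checkProbe(probe: Probe): { allowed: boolean; reason: string } {
  switch (probe.kind) {
    case "read_file":
    case "grep":
    case "targeted_test":
      return { allowed: true, reason: "read-only probe" };
    case "temporary_log":
      return probe.reverted === true
        ? { allowed: true, reason: "temporary log, reverted" }
        : { allowed: false, reason: "temporary log must be reverted" };
    case "production_edit":
      return { allowed: false, reason: "production changes belong to Phase 4" };
    case "full_test_suite":
      return { allowed: false, reason: "targeted checks only" };
  }
}
```

Treating an unreverted temporary log as a violation keeps the working tree identical before and after the test, which is the property the read-only rule is protecting.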

Step 3: Record Test Results

For each hypothesis test, record:

hypothesis_test = {
  id: "H1",
  test_performed: "<what was checked, e.g.: Read src/caller.ts:42 — checked null handling>",
  result: "confirmed|rejected|inconclusive",
  evidence: "<specific observation that confirms or rejects>",
  files_checked: ["<src/caller.ts:42-55>"]
}

Step 4: 3-Strike Escalation Rule

Track consecutive unproductive test failures. After each hypothesis test, evaluate:

Strike evaluation:

Test result	New insight gained	Strike action
confirmed	—	CONFIRM root cause, end testing
rejected	Yes — narrows search or reveals new cause	No strike (productive rejection)
rejected	No — no actionable insight	+1 strike
inconclusive	Yes — identifies new area	No strike (productive)
inconclusive	No — no narrowing	+1 strike

Strike counter tracking:

Strike count	Action
1	Continue to next hypothesis
2	Continue to next hypothesis
3	STOP — output escalation block immediately
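The two tables above can be read as a single state transition. The sketch below is one possible encoding under a stated assumption: a productive result leaves the counter unchanged rather than resetting it, since the tables do not say which applies. `TestOutcome`, `NextAction`, and `applyOutcome` are hypothetical names.

```typescript
type TestResult = "confirmed" | "rejected" | "inconclusive";

interface TestOutcome {
  result: TestResult;
  newInsight: boolean; // did the test narrow the search or reveal a new area?
}

type NextAction = "confirm_root_cause" | "continue" | "escalate";

const STRIKE_LIMIT = 3;

// Applies one test outcome to the current strike count.
// Assumption: productive results keep the count as-is (no reset).
function applyOutcome(
  strikes: number,
  outcome: TestOutcome
): { strikes: number; action: NextAction } {
  if (outcome.result === "confirmed") {
    return { strikes, action: "confirm_root_cause" };
  }
  const next = outcome.newInsight ? strikes : strikes + 1;
  return { strikes: next, action: next >= STRIKE_LIMIT ? "escalate" : "continue" };
}
```

For example, a productive rejection followed by three unproductive results ends in `escalate` on the fourth test, because only the unproductive ones add strikes.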

On 3rd Strike — output this escalation block verbatim and halt:

## ESCALATION: 3-Strike Limit Reached

### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #<N>

### Error History
1. Attempt 1: <H1 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
2. Attempt 2: <H2 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>
3. Attempt 3: <H3 description>
   Test: <what was checked>
   Result: <rejected/inconclusive> — <why>

### Current State
- Evidence collected: <summary from Phase 1-2>
- Hypotheses tested: <list>
- Files examined: <list>

### Diagnosis
- Likely root cause area: <best guess based on all evidence>
- Suggested human action: <specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z">

### Diagnostic Dump
<Full investigation-report content from all phases>

STATUS: BLOCKED

After outputting escalation: set status BLOCKED. Do not proceed to Phase 4.

Step 5: Confirm Root Cause

If a hypothesis is confirmed, document the confirmed root cause:

confirmed_root_cause = {
  hypothesis_id: "H1",
  description: "<Root cause description with full technical detail>",
  evidence_chain: [
    "Phase 1: <Error message X observed in Y>",
    "Phase 2: <Same pattern found in N other files>",
    "Phase 3: H1 confirmed — <specific condition at file.ts:42>"
  ],
  affected_code: {
    file: "<path/to/file.ts>",
    line_range: "<42-55>",
    function: "<functionName>"
  }
}

Add hypothesis_tests and confirmed_root_cause to investigation-report in memory.

Output Phase 3 results and await assign_task for Phase 4.
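Adding the Phase 3 fields to the in-memory report might look like the sketch below. It assumes the report is a plain object whose Phase 1-2 fields are left untouched; the helper name `addPhase3Results` and the interface names are hypothetical, while the field names follow the templates above.

```typescript
interface HypothesisTest {
  id: string;
  test_performed: string;
  result: "confirmed" | "rejected" | "inconclusive";
  evidence: string;
  files_checked: string[];
}

interface ConfirmedRootCause {
  hypothesis_id: string;
  description: string;
  evidence_chain: string[];
  affected_code: { file: string; line_range: string; function: string };
}

interface Phase3Fields {
  hypothesis_tests: HypothesisTest[];
  confirmed_root_cause?: ConfirmedRootCause;
}

// Returns a new report object; the Phase 1-2 report passed in is not mutated.
// confirmed_root_cause is only set when a hypothesis was actually confirmed.
function addPhase3Results<T extends object>(
  report: T,
  tests: HypothesisTest[],
  rootCause?: ConfirmedRootCause
): T & Phase3Fields {
  const fields: Phase3Fields = { hypothesis_tests: tests };
  if (rootCause !== undefined) {
    fields.confirmed_root_cause = rootCause;
  }
  return { ...report, ...fields };
}
```

Leaving `confirmed_root_cause` absent, rather than setting it to an empty placeholder, keeps the Phase 4 gate a simple presence check.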

Output

Artifact	Format	Description
investigation-report (phase 3)	In-memory JSON	Phases 1-2 fields + hypothesis_tests + confirmed_root_cause
Phase 3 summary or escalation block	Structured text output	Either confirmed root cause or BLOCKED escalation

Success Criteria

Criterion	Validation Method
Maximum 3 hypotheses formed	Count of hypotheses array
Each hypothesis cites evidence	evidence_supporting non-empty for each
Each hypothesis tested with documented probe	test_performed field populated for each
Strike counter maintained correctly	Count of unproductive consecutive failures
Root cause confirmed with evidence chain OR escalation triggered	confirmed_root_cause present OR BLOCKED output

Error Handling

Scenario	Resolution
Evidence insufficient to form 3 hypotheses	Form as many as evidence supports (minimum 1), proceed
Partial insight from rejected hypothesis	Do not count as strike; re-form or refine remaining hypotheses with new insight
More than one hypothesis confirmed	Use the highest-confidence confirmed hypothesis as the root cause
Hypothesis test requires production change	Prohibited — use static analysis or targeted read-only probe instead

Gate for Phase 4

Phase 4 can ONLY proceed if confirmed_root_cause is present. This is the Iron Law gate.

Outcome	Next Step
Root cause confirmed	-> Phase 4: Implementation
3-strike escalation triggered	STOP — output diagnostic dump — STATUS: BLOCKED
Partial insight, re-forming hypotheses	Stay in Phase 3, re-test with refined hypotheses
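The gate table above reduces to a three-way decision. A minimal sketch, assuming the phase state is summarized by whether `confirmed_root_cause` is present and by the current strike count; `Phase3State`, `NextStep`, and `phase4Gate` are hypothetical names.

```typescript
interface Phase3State {
  confirmedRootCause: boolean; // true when confirmed_root_cause is present in the report
  strikes: number;             // current strike count from Step 4
}

type NextStep = "phase_4_implementation" | "blocked" | "stay_in_phase_3";

const ESCALATION_THRESHOLD = 3;

// Iron Law gate: Phase 4 is reachable only through a confirmed root cause.
function phase4Gate(state: Phase3State): NextStep {
  if (state.confirmedRootCause) return "phase_4_implementation";
  if (state.strikes >= ESCALATION_THRESHOLD) return "blocked";
  return "stay_in_phase_3";
}
```

Checking the confirmed root cause first means a confirmation on the final test still proceeds to Phase 4, which matches Step 4, where a confirmed result ends testing without adding a strike.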

Next Phase

-> Phase 4: Implementation. Proceed ONLY with a confirmed root cause.
