Claude-Code-Workflow/.claude/skills/investigate/phases/03-hypothesis-testing.md

# Phase 3: Hypothesis Testing

Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.

## Objective

- Form a maximum of 3 hypotheses from Phase 1-2 evidence
- Test each hypothesis with minimal, read-only probes
- Confirm or reject each hypothesis with concrete evidence
- Enforce 3-strike rule: STOP and escalate after 3 consecutive test failures

## Execution Steps

### Step 1: Form Hypotheses

Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:

```json
{
  "hypotheses": [
    {
      "id": "H1",
      "description": "The root cause is X because evidence Y",
      "evidence_supporting": ["evidence item 1", "evidence item 2"],
      "predicted_behavior": "If H1 is correct, then we should observe Z",
      "test_method": "How to verify: read file X line Y, check value Z",
      "confidence": "high|medium|low"
    }
  ]
}
```

**Hypothesis Formation Rules**:
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
- Each hypothesis must have a testable prediction
- Rank by confidence (high first)
- Maximum 3 hypotheses per investigation
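
The formation rules above can be checked mechanically before any testing starts. The following is a minimal sketch, not part of the skill itself: `validateHypotheses`, `CONFIDENCE_RANK`, and `MAX_HYPOTHESES` are hypothetical names introduced for illustration, and the input is assumed to follow the JSON shape shown in this step.

```javascript
// Hypothetical helper (not part of the skill): checks the formation rules
// and returns the hypotheses ranked by confidence, high first.
const CONFIDENCE_RANK = new Map([["high", 0], ["medium", 1], ["low", 2]]);
const MAX_HYPOTHESES = 3;

function validateHypotheses(hypotheses) {
  const errors = [];

  if (hypotheses.length > MAX_HYPOTHESES) {
    errors.push(`Too many hypotheses: ${hypotheses.length} (max ${MAX_HYPOTHESES})`);
  }

  for (const h of hypotheses) {
    // Rule: cite at least one piece of evidence from Phase 1-2.
    if (!Array.isArray(h.evidence_supporting) || h.evidence_supporting.length === 0) {
      errors.push(`${h.id}: must cite at least one piece of evidence`);
    }
    // Rule: carry a testable prediction.
    if (typeof h.predicted_behavior !== "string" || h.predicted_behavior.trim() === "") {
      errors.push(`${h.id}: missing a testable prediction`);
    }
    if (!CONFIDENCE_RANK.has(h.confidence)) {
      errors.push(`${h.id}: confidence must be high, medium, or low`);
    }
  }

  // Rule: rank by confidence. Copy first so the caller's array is not mutated;
  // unknown confidence values sort last.
  const rank = (h) => CONFIDENCE_RANK.get(h.confidence) ?? CONFIDENCE_RANK.size;
  const ranked = [...hypotheses].sort((a, b) => rank(a) - rank(b));

  return { valid: errors.length === 0, errors, ranked };
}
```

Testing would then proceed through `ranked` in order, and only when `valid` is true.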

### Step 2: Test Hypotheses Sequentially

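Test each hypothesis in order, starting with the highest confidence. Probes should be read-only; the only permitted write is a temporary log statement that is reverted as soon as the observation is made: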

**Allowed test methods**:
- `Read` a specific file and check a specific value or condition
- `Grep` for a pattern that would confirm or deny the hypothesis
- `Bash` to run a specific test or command that reveals the condition
- Temporarily add a log statement to observe runtime behavior (revert after)

**Prohibited during testing**:
- Modifying production code beyond a temporary, reverted log statement (save fixes for Phase 4)
- Changing multiple things at once
- Running the full test suite (targeted checks only)

```javascript
// Example hypothesis test
// H1: "Function X receives null because caller Y doesn't check return value"
const evidence = Read({ file_path: "src/caller.ts" })
// Check: Does caller Y use the return value without null check?
// Result: Confirmed / Rejected with specific evidence
```

### Step 3: Record Test Results

For each hypothesis test:

```json
{
  "hypothesis_tests": [
    {
      "id": "H1",
      "test_performed": "Read src/caller.ts:42 - checked null handling",
      "result": "confirmed|rejected|inconclusive",
      "evidence": "specific observation that confirms or rejects",
      "files_checked": ["src/caller.ts:42-55"]
    }
  ]
}
```

### Step 4: 3-Strike Escalation Rule

Track consecutive test failures. A test counts as a "failure" (a strike) only when both conditions hold: the result was inconclusive or the hypothesis was rejected, and no actionable insight was gained.

```
Strike Counter:
  [H1 rejected, no insight] → Strike 1
  [H2 rejected, no insight] → Strike 2
  [H3 rejected, no insight] → Strike 3 → STOP
```

**Important**: A rejected hypothesis that provides useful insight (narrows the search) does NOT count as a strike. Only truly unproductive tests count.
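
One way to keep the counter honest is to derive it from the recorded test results rather than tracking it by hand. The sketch below rests on two assumptions that the rule does not spell out: each test record carries a hypothetical boolean field `insight_gained` (not part of the `hypothesis_tests` schema), and any test that is not a strike resets the consecutive run. `isStrike` and `countStrikes` are illustrative names.

```javascript
// Hypothetical helpers (not part of the skill).
const STRIKE_LIMIT = 3;

// A strike is an inconclusive or rejected test that yielded no actionable insight.
// `insight_gained` is an assumed field, not part of the documented record schema.
function isStrike(test) {
  const unproductive = test.result === "rejected" || test.result === "inconclusive";
  return unproductive && !test.insight_gained;
}

// Walks the tests in order and reports whether the limit was reached.
function countStrikes(tests) {
  let strikes = 0;
  for (const test of tests) {
    // Assumption: a non-strike test resets the consecutive count.
    strikes = isStrike(test) ? strikes + 1 : 0;
    if (strikes >= STRIKE_LIMIT) {
      return { strikes, escalate: true, stoppedAt: test.id };
    }
  }
  return { strikes, escalate: false, stoppedAt: null };
}
```

Under these assumptions, three unproductive tests in a row yield `escalate: true`, while a rejected hypothesis that narrowed the search breaks the run.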

**On 3rd Strike — STOP and Escalate**:

```
## ESCALATION: 3-Strike Limit Reached

### Failed Step
- Phase: 3 — Hypothesis Testing
- Step: Hypothesis test #{N}

### Error History
1. Attempt 1: H1 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}
2. Attempt 2: H2 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}
3. Attempt 3: H3 — {description}
   Test: {what was checked}
   Result: {rejected/inconclusive} — {why}

### Current State
- Evidence collected: {summary from Phase 1-2}
- Hypotheses tested: {list}
- Files examined: {list}

### Diagnosis
- Likely root cause area: {best guess based on all evidence}
- Suggested human action: {specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z"}

### Diagnostic Dump
{Full investigation-report.json content}
```

After escalation, set status to **BLOCKED** per Completion Status Protocol.

### Step 5: Confirm Root Cause

If a hypothesis is confirmed, document the confirmed root cause:

```json
{
  "phase": 3,
  "confirmed_root_cause": {
    "hypothesis_id": "H1",
    "description": "Root cause description with full evidence chain",
    "evidence_chain": [
      "Phase 1: Error message X observed in Y",
      "Phase 2: Same pattern found in 3 other files",
      "Phase 3: H1 confirmed — null check missing at file.ts:42"
    ],
    "affected_code": {
      "file": "path/to/file.ts",
      "line_range": "42-55",
      "function": "functionName"
    }
  }
}
```

## Output

- **Data**: `hypothesis_tests` and `confirmed_root_cause` added to investigation report (in-memory)
- **Format**: JSON structure as defined above

## Gate for Phase 4

**Phase 4 can ONLY proceed if `confirmed_root_cause` is present.** This is the Iron Law gate.

| Outcome | Next Step |
|---------|-----------|
| Root cause confirmed | Proceed to [Phase 4: Implementation](04-implementation.md) |
| 3-strike escalation | STOP, output diagnostic dump, status = BLOCKED |
| Partial insight | Re-form hypotheses with new evidence (stays in Phase 3) |
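
The outcome table can be read as a small decision function. This is a sketch under stated assumptions: `nextStep` is a hypothetical name, the `PROCEED` and `CONTINUE` labels are invented for illustration (only `BLOCKED` comes from this document), and the strike count is assumed to be supplied by the caller.

```javascript
// Hypothetical gate check (not part of the skill).
// `report` is the in-memory investigation report; `strikes` is the current
// consecutive strike count.
function nextStep(report, strikes) {
  // Gate: Phase 4 may proceed only when confirmed_root_cause is present.
  if (report.confirmed_root_cause) {
    return { status: "PROCEED", next: "04-implementation.md" };
  }
  // 3-strike escalation: stop and output the diagnostic dump.
  if (strikes >= 3) {
    return { status: "BLOCKED", next: null };
  }
  // Partial insight: re-form hypotheses and stay in Phase 3.
  return { status: "CONTINUE", next: "03-hypothesis-testing.md" };
}
```

Checking `confirmed_root_cause` first means a confirmation on the final attempt still passes the gate rather than escalating.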

## Quality Checks

- [ ] Maximum 3 hypotheses formed, each with cited evidence
- [ ] Each hypothesis tested with a specific, documented probe
- [ ] Test results recorded with concrete evidence
- [ ] 3-strike counter maintained correctly
- [ ] Root cause confirmed with full evidence chain OR escalation triggered

## Next Phase

Proceed to [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.