mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-30 20:21:09 +08:00
- Add 3 new Claude skills: investigate (Iron Law debugging), security-audit (OWASP Top 10 + STRIDE), ship (gated release pipeline) - Port all 3 skills to Codex v4 format under .codex/skills/ using Deep Interaction pattern (spawn_agent + assign_task phase transitions) - Update README/README_CN acknowledgments: credit gstack (https://github.com/garrytan/gstack) as inspiration source Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
178 lines
5.3 KiB
Markdown
178 lines
5.3 KiB
Markdown
# Phase 3: Hypothesis Testing
|
|
|
|
Form hypotheses from evidence and test each one. Enforce the 3-strike escalation rule.
|
|
|
|
## Objective
|
|
|
|
- Form a maximum of 3 hypotheses from Phase 1-2 evidence
|
|
- Test each hypothesis with minimal, read-only probes
|
|
- Confirm or reject each hypothesis with concrete evidence
|
|
- Enforce 3-strike rule: STOP and escalate after 3 consecutive test failures
|
|
|
|
## Execution Steps
|
|
|
|
### Step 1: Form Hypotheses
|
|
|
|
Using evidence from Phase 1 (investigation report) and Phase 2 (pattern analysis), form up to 3 ranked hypotheses:
|
|
|
|
```json
|
|
{
|
|
"hypotheses": [
|
|
{
|
|
"id": "H1",
|
|
"description": "The root cause is X because evidence Y",
|
|
"evidence_supporting": ["evidence item 1", "evidence item 2"],
|
|
"predicted_behavior": "If H1 is correct, then we should observe Z",
|
|
"test_method": "How to verify: read file X line Y, check value Z",
|
|
"confidence": "high|medium|low"
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Hypothesis Formation Rules**:
|
|
- Each hypothesis must cite at least one piece of evidence from Phase 1-2
|
|
- Each hypothesis must have a testable prediction
|
|
- Rank by confidence (high first)
|
|
- Maximum 3 hypotheses per investigation
|
|
|
|
### Step 2: Test Hypotheses Sequentially
|
|
|
|
Test each hypothesis starting from highest confidence. Use read-only probes:
|
|
|
|
**Allowed test methods**:
|
|
- `Read` a specific file and check a specific value or condition
|
|
- `Grep` for a pattern that would confirm or deny the hypothesis
|
|
- `Bash` to run a specific test or command that reveals the condition
|
|
- Temporarily add a log statement to observe runtime behavior (revert after)
|
|
|
|
**Prohibited during testing**:
|
|
- Modifying production code (save that for Phase 4)
|
|
- Changing multiple things at once
|
|
- Running the full test suite (targeted checks only)
|
|
|
|
```javascript
|
|
// Example hypothesis test
|
|
// H1: "Function X receives null because caller Y doesn't check return value"
|
|
const evidence = Read({ file_path: "src/caller.ts" })
|
|
// Check: Does caller Y use the return value without null check?
|
|
// Result: Confirmed / Rejected with specific evidence
|
|
```
|
|
|
|
### Step 3: Record Test Results
|
|
|
|
For each hypothesis test:
|
|
|
|
```json
|
|
{
|
|
"hypothesis_tests": [
|
|
{
|
|
"id": "H1",
|
|
"test_performed": "Read src/caller.ts:42 - checked null handling",
|
|
"result": "confirmed|rejected|inconclusive",
|
|
"evidence": "specific observation that confirms or rejects",
|
|
"files_checked": ["src/caller.ts:42-55"]
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
### Step 4: 3-Strike Escalation Rule
|
|
|
|
Track consecutive test failures. A "failure" means the test was inconclusive or the hypothesis was rejected AND no actionable insight was gained.
|
|
|
|
```
|
|
Strike Counter:
|
|
[H1 rejected, no insight] → Strike 1
|
|
[H2 rejected, no insight] → Strike 2
|
|
[H3 rejected, no insight] → Strike 3 → STOP
|
|
```
|
|
|
|
**Important**: A rejected hypothesis that provides useful insight (narrows the search) does NOT count as a strike. Only truly unproductive tests count.
|
|
|
|
**On 3rd Strike — STOP and Escalate**:
|
|
|
|
```
|
|
## ESCALATION: 3-Strike Limit Reached
|
|
|
|
### Failed Step
|
|
- Phase: 3 — Hypothesis Testing
|
|
- Step: Hypothesis test #{N}
|
|
|
|
### Error History
|
|
1. Attempt 1: H1 — {description}
|
|
Test: {what was checked}
|
|
Result: {rejected/inconclusive} — {why}
|
|
2. Attempt 2: H2 — {description}
|
|
Test: {what was checked}
|
|
Result: {rejected/inconclusive} — {why}
|
|
3. Attempt 3: H3 — {description}
|
|
Test: {what was checked}
|
|
Result: {rejected/inconclusive} — {why}
|
|
|
|
### Current State
|
|
- Evidence collected: {summary from Phase 1-2}
|
|
- Hypotheses tested: {list}
|
|
- Files examined: {list}
|
|
|
|
### Diagnosis
|
|
- Likely root cause area: {best guess based on all evidence}
|
|
- Suggested human action: {specific recommendation — e.g., "Add logging to X", "Check runtime config Y", "Reproduce in debugger at Z"}
|
|
|
|
### Diagnostic Dump
|
|
{Full investigation-report.json content}
|
|
```
|
|
|
|
After escalation, set status to **BLOCKED** per Completion Status Protocol.
|
|
|
|
### Step 5: Confirm Root Cause
|
|
|
|
If a hypothesis is confirmed, document the confirmed root cause:
|
|
|
|
```json
|
|
{
|
|
"phase": 3,
|
|
"confirmed_root_cause": {
|
|
"hypothesis_id": "H1",
|
|
"description": "Root cause description with full evidence chain",
|
|
"evidence_chain": [
|
|
"Phase 1: Error message X observed in Y",
|
|
"Phase 2: Same pattern found in 3 other files",
|
|
"Phase 3: H1 confirmed — null check missing at file.ts:42"
|
|
],
|
|
"affected_code": {
|
|
"file": "path/to/file.ts",
|
|
"line_range": "42-55",
|
|
"function": "functionName"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Output
|
|
|
|
- **Data**: `hypothesis-tests` and `confirmed_root_cause` added to investigation report (in-memory)
|
|
- **Format**: JSON structure as defined above
|
|
|
|
## Gate for Phase 4
|
|
|
|
**Phase 4 can ONLY proceed if `confirmed_root_cause` is present.** This is the Iron Law gate.
|
|
|
|
| Outcome | Next Step |
|
|
|---------|-----------|
|
|
| Root cause confirmed | Proceed to [Phase 4: Implementation](04-implementation.md) |
|
|
| 3-strike escalation | STOP, output diagnostic dump, status = BLOCKED |
|
|
| Partial insight | Re-form hypotheses with new evidence (stays in Phase 3) |
|
|
|
|
## Quality Checks
|
|
|
|
- [ ] Maximum 3 hypotheses formed, each with cited evidence
|
|
- [ ] Each hypothesis tested with a specific, documented probe
|
|
- [ ] Test results recorded with concrete evidence
|
|
- [ ] 3-strike counter maintained correctly
|
|
- [ ] Root cause confirmed with full evidence chain OR escalation triggered
|
|
|
|
## Next Phase
|
|
|
|
Proceed to [Phase 4: Implementation](04-implementation.md) ONLY with confirmed root cause.
|