Files
catlog22 b361f42c1c Add E2E tests for MCP Tool Execution and Session Lifecycle
- Implement comprehensive end-to-end tests for MCP Tool Execution, covering tool discovery, execution, parameter validation, error handling, and timeout scenarios.
- Introduce tests for the complete lifecycle of a workflow session, including initialization, task management, status updates, and archiving.
- Validate dual parameter format support and handle boundary conditions such as invalid JSON, non-existent sessions, and path traversal attempts.
- Ensure concurrent task updates are handled without data loss and that task data is preserved when archiving sessions.
- List sessions across all locations and verify metadata inclusion in the results.
2026-01-05 09:44:08 +08:00

299 lines
8.0 KiB
Markdown

# E2E Test Suite for CCW
End-to-end tests for the Claude Code Workflow (CCW) project, implementing comprehensive test scenarios based on Gemini's analysis.
## Test Files
### 1. `session-lifecycle.e2e.test.ts`
**Priority: HIGH** - Tests the complete session lifecycle (Golden Path)
**Scenarios Covered:**
- ✅ Session initialization → Add tasks → Update status → Archive
- ✅ Dual parameter format support (legacy/new)
- ✅ Boundary conditions:
- Invalid JSON in task files
- Non-existent session references
- Path traversal prevention
- Concurrent task updates
- ✅ Data preservation during archiving
- ✅ Multi-location session listing
**Key Test Cases:**
```typescript
// Golden path: Full lifecycle
init write tasks update status archive
// Boundary tests
- Invalid JSON handling
- Path traversal attempts: '../../../etc/passwd'
- Concurrent updates without data loss
- Complex nested data preservation
```
### 2. `dashboard-websocket.e2e.test.ts`
**Priority: HIGH** - Tests Dashboard real-time updates via WebSocket
**Scenarios Covered:**
- ✅ WebSocket connection and event dispatch
- ✅ Fire-and-forget notification behavior
- ✅ Event types:
- `SESSION_CREATED`
- `SESSION_UPDATED`
- `TASK_UPDATED`
- `SESSION_ARCHIVED`
- ✅ Multiple concurrent WebSocket clients
- ✅ Network failure resilience
- ✅ Event payload validation
- ✅ Client reconnection handling
**Key Test Cases:**
```typescript
// Real-time updates
CLI command HTTP hook WebSocket broadcast Dashboard update
// Fire-and-forget verification
Request duration < 1000ms, no blocking
// Multi-client broadcast
3 concurrent clients receive same event
```
### 3. `mcp-tools.e2e.test.ts`
**Priority: HIGH** - Tests MCP JSON-RPC tool execution
**Scenarios Covered:**
- ✅ Tool discovery (`tools/list`)
- ✅ Tool execution (`tools/call`)
- ✅ Parameter validation
- ✅ Error handling:
- Missing required parameters
- Invalid parameter values
- Non-existent tools
- ✅ Security validation (path traversal prevention)
- ✅ Concurrent tool calls
- ✅ Tool schema completeness
**Key Test Cases:**
```typescript
// JSON-RPC protocol
tools/list Returns tool schemas
tools/call Executes with parameters
// Security
Path traversal attempt: '../../../etc/passwd' Rejected
// Concurrency
3 parallel tool calls No interference
```
## Running Tests
### Run All E2E Tests
```bash
npm test ccw/tests/e2e/*.test.ts
```
### Run Individual Test Suite
```bash
# Session lifecycle tests
node --experimental-strip-types --test ccw/tests/e2e/session-lifecycle.e2e.test.ts
# WebSocket tests
node --experimental-strip-types --test ccw/tests/e2e/dashboard-websocket.e2e.test.ts
# MCP tools tests
node --experimental-strip-types --test ccw/tests/e2e/mcp-tools.e2e.test.ts
```
### Run with Verbose Output
```bash
node --experimental-strip-types --test --test-reporter=spec ccw/tests/e2e/*.test.ts
```
## Test Architecture
### Mock Strategy
Following Gemini's recommendations:
1. **`executeTool` Mocking** (Avoided)
- Tests use real `session_manager` tool for authenticity
- Temporary directories isolate test environments
2. **`memfs` Mocking** (Not needed)
- Tests use real filesystem with `mkdtempSync`
- Automatic cleanup with `afterEach` hooks
3. **`http.request` Mocking** (WebSocket tests)
- Custom `WebSocketClient` class for real protocol testing
- Fire-and-forget behavior verified via timing measurements
### Test Fixtures
#### Session Lifecycle
```typescript
projectRoot = mkdtempSync('/tmp/ccw-e2e-session-lifecycle-')
sessionPath = projectRoot/.workflow/active/WFS-xxx
```
#### Dashboard WebSocket
```typescript
server = startServer(projectRoot, randomPort)
wsClient = new WebSocketClient()
wsClient.connect(port)
```
#### MCP Tools
```typescript
mcpClient = new McpClient()
mcpClient.start() // Spawns ccw-mcp.js
mcpClient.call('tools/list', {})
```
## Test Patterns
### Arrangement-Act-Assert (AAA)
```typescript
it('test description', async () => {
// Arrange
const sessionId = 'WFS-test-001';
await sessionManager.handler({ operation: 'init', ... });
// Act
const result = await sessionManager.handler({ operation: 'read', ... });
// Assert
assert.equal(result.success, true);
assert.equal(result.result.session_id, sessionId);
});
```
### Setup and Teardown
```typescript
before(async () => {
projectRoot = mkdtempSync('/tmp/ccw-test-');
process.chdir(projectRoot);
});
afterEach(() => {
rmSync(workflowPath(projectRoot), { recursive: true, force: true });
});
after(() => {
process.chdir(originalCwd);
rmSync(projectRoot, { recursive: true, force: true });
});
```
### Error Assertion
```typescript
// Verify error handling
const result = await handler({ invalid: 'params' });
assert.equal(result.success, false);
assert.ok(result.error.includes('expected error message'));
```
## Boundary Conditions Tested
### Invalid Input
- ❌ Malformed JSON in files
- ❌ Missing required parameters
- ❌ Invalid parameter types
- ❌ Non-existent resources
### Security
- 🔒 Path traversal attempts: `../../../etc/passwd`
- 🔒 Invalid session ID formats: `bad/session/id`
- 🔒 Directory escape in task IDs
### Concurrency
- 🔄 Multiple simultaneous task updates
- 🔄 Concurrent WebSocket clients (3+)
- 🔄 Parallel MCP tool calls
### Network Failures
- 🌐 Dashboard server unreachable
- 🌐 WebSocket disconnect/reconnect
- 🌐 Fire-and-forget behavior (no blocking)
## Integration with Existing Tests
These E2E tests complement existing integration tests:
```
ccw/tests/
├── integration/
│ ├── session-lifecycle.test.ts (Unit-level session ops)
│ ├── session-routes.test.ts (HTTP API routes)
│ └── ...
└── e2e/
├── session-lifecycle.e2e.test.ts (Full workflow golden path)
├── dashboard-websocket.e2e.test.ts (Real-time updates)
└── mcp-tools.e2e.test.ts (JSON-RPC protocol)
```
**Difference:**
- **Integration tests**: Test individual components in isolation
- **E2E tests**: Test complete user workflows across components
## Coverage Goals
Based on Gemini's analysis:
-**Session Lifecycle**: 100% golden path coverage
-**WebSocket Events**: All event types (`SESSION_*`, `TASK_*`)
-**MCP Tools**: Core tools (`session_manager`, `smart_search`, `write_file`, `edit_file`)
-**Boundary Conditions**: 8+ edge cases per test suite
-**Error Handling**: Consistent error format validation
## Known Limitations
1. **File System Mock**: Tests use real filesystem (not `memfs`)
- **Reason**: Ensures compatibility with actual file operations
- **Trade-off**: Slightly slower than in-memory tests
2. **Process Spawning**: MCP tests spawn real Node processes
- **Reason**: Verifies JSON-RPC stdio protocol accurately
- **Trade-off**: Platform-dependent (requires Node.js)
3. **Network Timing**: WebSocket tests may be flaky on slow systems
- **Mitigation**: Timeout values set to 5000ms (generous)
## Future Enhancements
1. **Performance Benchmarks**
- Measure session operations latency
- WebSocket event dispatch time
- MCP tool execution overhead
2. **Load Testing**
- 100+ concurrent WebSocket clients
- Bulk session creation (1000+ sessions)
- High-frequency task updates
3. **Visual Testing** (Playwright)
- Dashboard UI interaction
- Real-time chart updates
- Task queue drag-and-drop
## References
- **Gemini Analysis**: Based on comprehensive test analysis report
- **Node.js Test Runner**: Native test framework (no external dependencies)
- **MCP Protocol**: Model Context Protocol JSON-RPC specification
- **WebSocket Protocol**: RFC 6455 compliance
## Contributing
When adding new E2E tests:
1. Follow AAA pattern (Arrange-Act-Assert)
2. Use descriptive test names: `it('completes full session lifecycle: init → add tasks → update status → archive')`
3. Test both happy path and boundary conditions
4. Clean up resources in `afterEach` hooks
5. Mock console output to reduce noise: `mock.method(console, 'error', () => {})`
6. Add test documentation to this README
## License
MIT - Same as CCW project