System Architect Analysis Template
This template guides the system-architect role analysis toward modeling with SPEC.md-level precision.
Required Sections
1. Architecture Overview
High-level system architecture and component interaction.
2. Data Model
Define 3-5 core entities with precise field definitions.
Template:
## Data Model
### Entity: [EntityName]
**Purpose**: [What this entity represents]
| Field | Type | Constraint | Description |
|-------|------|------------|-------------|
| id | UUID | NOT NULL, PRIMARY KEY | Unique identifier |
| status | Enum(created, active, suspended, deleted) | NOT NULL, DEFAULT 'created' | Entity lifecycle state |
| created_at | Timestamp | NOT NULL, DEFAULT NOW() | Creation timestamp |
| updated_at | Timestamp | NOT NULL, DEFAULT NOW() | Last update timestamp |
| [field_name] | [type] | [constraints] | [description] |
**Relationships**:
- [EntityA] → [EntityB]: [relationship type] (one-to-many, many-to-many, etc.)
- Foreign keys: [field] REFERENCES [table(field)]
**Indexes**:
- PRIMARY KEY: id
- INDEX: status, created_at
- UNIQUE: [unique_field]
Example:
### Entity: Order
**Purpose**: Represents a customer purchase order
| Field | Type | Constraint | Description |
|-------|------|------------|-------------|
| id | UUID | NOT NULL, PRIMARY KEY | Unique order identifier |
| user_id | UUID | NOT NULL, FOREIGN KEY | Reference to User entity |
| status | Enum(pending, processing, completed, cancelled) | NOT NULL, DEFAULT 'pending' | Order lifecycle state |
| total_amount | Decimal(10,2) | NOT NULL, CHECK (total_amount >= 0) | Total order amount in USD |
| created_at | Timestamp | NOT NULL, DEFAULT NOW() | Order creation time |
| updated_at | Timestamp | NOT NULL, DEFAULT NOW() | Last status update time |
**Relationships**:
- Order → User: many-to-one (each order belongs to one user)
- Order → OrderItem: one-to-many (each order contains multiple items)
**Indexes**:
- PRIMARY KEY: id
- INDEX: user_id, status, created_at
- COMPOSITE INDEX: (user_id, status) for user order queries
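The Order entity above can also be mirrored in application code. The following is a minimal Python sketch under stated assumptions, not part of the template: the class names `Order` and `OrderStatus`, the use of `Decimal` for `total_amount`, and the validation in `__post_init__` are illustrative choices that restate the table's constraints.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from enum import Enum
from uuid import UUID, uuid4


class OrderStatus(Enum):
    """Values copied from the status column of the Order table."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


def _now() -> datetime:
    # Stand-in for DEFAULT NOW(); UTC is an assumption.
    return datetime.now(timezone.utc)


@dataclass
class Order:
    user_id: UUID                      # NOT NULL, FOREIGN KEY to User
    total_amount: Decimal              # Decimal(10,2), CHECK (total_amount >= 0)
    id: UUID = field(default_factory=uuid4)
    status: OrderStatus = OrderStatus.PENDING
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def __post_init__(self) -> None:
        if self.total_amount < 0:
            raise ValueError("total_amount must be >= 0")
        # Decimal(10,2): at most 2 fractional digits and 8 integer digits.
        if self.total_amount != self.total_amount.quantize(Decimal("0.01")):
            raise ValueError("total_amount allows at most 2 decimal places")
        if self.total_amount >= Decimal("100000000"):
            raise ValueError("total_amount exceeds Decimal(10,2) range")
```

Enforcing the same constraints in both the schema and the application layer is a design choice; the database definition remains the source of truth in this sketch.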
3. State Machine
Define lifecycle state machines for entities with complex workflows.
Template:
## State Machine: [EntityName] Lifecycle
**ASCII State Diagram**:
[Initial] --event--> [State1] --event--> [State2] --event--> [Final]
    |                   |                   |
    +---error-----------+---error-----------+----> [Error State]
**State Definitions**:
| State | Description | Entry Conditions | Exit Events |
|-------|-------------|------------------|-------------|
| [state] | [what this state means] | [conditions to enter] | [events that trigger exit] |
**State Transitions**:
| From State | Event | To State | Side Effects | Validation |
|------------|-------|----------|--------------|------------|
| [from] | [event] | [to] | [what happens] | [pre-conditions] |
**Error Handling**:
| Error Scenario | From State | Recovery Action | Timeout |
|----------------|------------|-----------------|---------|
| [error] | [state] | [action] | [duration] |
Example:
## State Machine: Order Lifecycle
**ASCII State Diagram**:
[Created] --submit--> [Pending] --process--> [Processing] --complete--> [Completed]
    |                     |                       |
    |                     +---cancel--------------+----> [Cancelled]
    |                     |
    +---timeout-----------+---payment_failed------> [Failed]
**State Definitions**:
| State | Description | Entry Conditions | Exit Events |
|-------|-------------|------------------|-------------|
| Created | Order initialized but not submitted | User adds items to cart | submit, timeout |
| Pending | Order submitted, awaiting payment | Payment initiated | process, cancel, payment_failed |
| Processing | Payment confirmed, fulfilling order | Payment successful | complete, cancel |
| Completed | Order fulfilled and delivered | All items shipped | - |
| Cancelled | Order cancelled by user or system | User cancels or admin cancels | - |
| Failed | Order failed due to payment or system error | Payment failed or timeout | - |
**State Transitions**:
| From State | Event | To State | Side Effects | Validation |
|------------|-------|----------|--------------|------------|
| Created | submit | Pending | Initiate payment, lock inventory | Cart not empty, items available |
| Pending | process | Processing | Charge payment, allocate inventory | Payment authorized |
| Pending | cancel | Cancelled | Release inventory, refund if paid | User request or timeout |
| Pending | payment_failed | Failed | Release inventory, notify user | Payment gateway error |
| Processing | complete | Completed | Ship items, send confirmation | All items shipped |
| Processing | cancel | Cancelled | Stop fulfillment, refund payment | Admin approval required |
**Error Handling**:
| Error Scenario | From State | Recovery Action | Timeout |
|----------------|------------|-----------------|---------|
| Payment timeout | Pending | Auto-cancel, release inventory | 15 minutes |
| Inventory unavailable | Processing | Cancel order, full refund | Immediate |
| Shipping failure | Processing | Retry 3x, then cancel | 24 hours |
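The State Transitions table above maps directly onto a lookup keyed by (state, event). The sketch below is one possible encoding, not a prescribed implementation: the names `TRANSITIONS`, `TERMINAL_STATES`, `InvalidTransition`, and `next_state` are hypothetical, and the side effects and validation columns are deliberately left out.

```python
from typing import Dict, Tuple

# (from_state, event) -> to_state, copied row by row from the
# State Transitions table of the Order example.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Created", "submit"): "Pending",
    ("Pending", "process"): "Processing",
    ("Pending", "cancel"): "Cancelled",
    ("Pending", "payment_failed"): "Failed",
    ("Processing", "complete"): "Completed",
    ("Processing", "cancel"): "Cancelled",
}

# States with no exit events in the State Definitions table.
TERMINAL_STATES = {"Completed", "Cancelled", "Failed"}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def next_state(current: str, event: str) -> str:
    """Return the target state, or raise if the table has no such row."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise InvalidTransition(
            f"event {event!r} is not allowed in state {current!r}"
        ) from None
```

Keeping the transitions as data makes it easy to check that the code and the table stay in sync, for example by asserting that no terminal state appears as a source state.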
4. Error Handling Strategy
Define global error handling approach.
Template:
## Error Handling Strategy
### Error Classification
| Error Type | Classification | Retry Strategy | User Impact |
|------------|----------------|----------------|-------------|
| [error] | Transient/Permanent/Degraded | [strategy] | [impact] |
### Recovery Mechanisms
| Component | Error | Recovery | Timeout | Fallback |
|-----------|-------|----------|---------|----------|
| [component] | [error] | [action] | [duration] | [fallback] |
### Circuit Breaker
- **Threshold**: [failure count] failures in [time window]
- **Open Duration**: [duration]
- **Half-Open Test**: [test strategy]
Example:
## Error Handling Strategy
### Error Classification
| Error Type | Classification | Retry Strategy | User Impact |
|------------|----------------|----------------|-------------|
| DB Connection Lost | Transient | Exponential backoff, 3 retries | Request delayed up to 7s (1s + 2s + 4s) |
| Invalid Input | Permanent | No retry, return 400 | Immediate error response |
| Payment Gateway Timeout | Transient | Retry 2x with 5s delay | Order pending, notify user |
| Inventory Service Down | Degraded | Use cached data, mark stale | Show approximate availability |
### Recovery Mechanisms
| Component | Error | Recovery | Timeout | Fallback |
|-----------|-------|----------|---------|----------|
| Database | Connection lost | Retry 3x with exponential backoff (1s, 2s, 4s) | 10s total | Return 503 Service Unavailable |
| Payment Gateway | Timeout | Retry 2x with 5s delay | 15s total | Mark order as pending, async retry |
| Inventory Service | Service down | Use cached inventory (max 5min old) | 3s | Show "limited availability" |
| Email Service | Send failure | Queue for async retry (5 attempts over 24h) | N/A | Log failure, continue order |
### Circuit Breaker
- **Threshold**: 5 failures in 60 seconds
- **Open Duration**: 30 seconds
- **Half-Open Test**: Single request, if success → close, if fail → open for 60s
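To make the circuit breaker parameters concrete, here is a minimal single-threaded sketch using the example values (5 failures in 60 seconds, open for 30 seconds, reopen for 60 seconds after a failed half-open trial). The class and method names are hypothetical, the injectable `clock` exists only to make the behavior testable, and a concurrent service would additionally need locking so that only one trial request runs in the half-open state.

```python
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold: int = 5, window: float = 60.0,
                 open_duration: float = 30.0, reopen_duration: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window = window
        self.open_duration = open_duration
        self.reopen_duration = reopen_duration
        self.clock = clock
        self.failures: Deque[float] = deque()  # timestamps of recent failures
        self.state = "closed"
        self.open_until = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        now = self.clock()
        if self.state == "open":
            if now < self.open_until:
                raise CircuitOpenError("circuit is open")
            self.state = "half_open"  # let one trial request through
        try:
            result = fn()
        except Exception:
            self._on_failure(now)
            raise
        if self.state == "half_open":
            # Trial succeeded: close the circuit and forget old failures.
            self.failures.clear()
            self.state = "closed"
        return result

    def _on_failure(self, now: float) -> None:
        if self.state == "half_open":
            self._open(now, self.reopen_duration)  # trial failed
            return
        self.failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self._open(now, self.open_duration)

    def _open(self, now: float, duration: float) -> None:
        self.state = "open"
        self.open_until = now + duration
        self.failures.clear()
```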
5. Observability Requirements
Define metrics, logs, and health checks.
Template:
## Observability Requirements
### Metrics
| Metric Name | Type | Labels | Description | Alert Threshold |
|-------------|------|--------|-------------|-----------------|
| [metric] | Counter/Gauge/Histogram | [labels] | [description] | [threshold] |
### Log Events
| Event | Level | Fields | When to Log |
|-------|-------|--------|-------------|
| [event] | INFO/WARN/ERROR | [fields] | [condition] |
### Health Checks
| Endpoint | Check | Success Criteria | Timeout |
|----------|-------|------------------|---------|
| [endpoint] | [check] | [criteria] | [duration] |
Example:
## Observability Requirements
### Metrics
| Metric Name | Type | Labels | Description | Alert Threshold |
|-------------|------|--------|-------------|-----------------|
| http_request_duration_ms | Histogram | endpoint, method, status | Request latency distribution | P99 > 500ms |
| active_orders | Gauge | status | Current orders by status | - |
| order_state_transitions | Counter | from_state, to_state, event | State transition count | - |
| payment_failures | Counter | reason | Payment failure count | > 10/min |
| db_connection_pool_size | Gauge | pool_name | Active DB connections | > 80% capacity |
### Log Events
| Event | Level | Fields | When to Log |
|-------|-------|--------|-------------|
| order_created | INFO | order_id, user_id, total_amount | Every order creation |
| order_state_changed | INFO | order_id, from_state, to_state, event | Every state transition |
| payment_failed | WARN | order_id, user_id, reason, amount | Payment failure |
| inventory_unavailable | ERROR | order_id, item_id, requested_qty | Inventory check fails |
| circuit_breaker_opened | ERROR | component, failure_count | Circuit breaker opens |
### Health Checks
| Endpoint | Check | Success Criteria | Timeout |
|----------|-------|------------------|---------|
| /health/live | Process alive | HTTP 200 | 1s |
| /health/ready | DB + Cache reachable | HTTP 200, all deps OK | 5s |
| /health/db | Database query | SELECT 1 succeeds | 3s |
| /health/cache | Cache ping | PING succeeds | 2s |
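As an illustration of how a `/health/ready` handler might aggregate the dependency checks above, the following sketch runs each check with its own timeout and reports an overall status. It is a sketch under assumptions: the function name `run_checks`, the result strings, and the use of HTTP 503 for "not ready" are not specified by the table, which only defines the success criterion.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, Tuple

# name -> (check function returning True on success, timeout in seconds)
Check = Tuple[Callable[[], bool], float]


def run_checks(checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Run all checks in parallel and return (http_status, per-check result)."""
    results: Dict[str, str] = {}
    # Note: leaving the with-block waits for worker threads to finish, so a
    # check that hangs forever would still block here; real checks should
    # enforce their own I/O timeouts as well.
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {
            name: (pool.submit(fn), timeout)
            for name, (fn, timeout) in checks.items()
        }
        for name, (future, timeout) in futures.items():
            try:
                results[name] = "ok" if future.result(timeout=timeout) else "fail"
            except FutureTimeout:
                results[name] = "timeout"
            except Exception:
                results[name] = "fail"
    status = 200 if all(r == "ok" for r in results.values()) else 503
    return status, results
```

A caller would register one entry per dependency, for example a database check with a 3 second timeout and a cache check with a 2 second timeout, matching the table.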
6. Configuration Model
Define all configurable parameters for the system.
Template:
## Configuration Model
| Config Key | Type | Default | Constraint | Description | Env Variable |
|------------|------|---------|------------|-------------|--------------|
| [key] | [type] | [default] | [constraint] | [description] | [env var] |
**Configuration Categories**:
- **Application**: Core application settings
- **Database**: Database connection and pool settings
- **Cache**: Cache configuration
- **Security**: Authentication and authorization settings
- **Observability**: Logging and monitoring settings
**Validation Rules**:
- [Config key] MUST be [constraint]
- [Config key] SHOULD be [recommendation]
**Configuration Loading**:
- Priority: Environment variables > Config file > Defaults
- Hot reload: [Supported/Not supported]
Example:
## Configuration Model
| Config Key | Type | Default | Constraint | Description | Env Variable |
|------------|------|---------|------------|-------------|-------------|
| SERVER_PORT | int | 8080 | 1024-65535 | HTTP server port | APP_PORT |
| DB_HOST | string | localhost | Valid hostname/IP | Database host | DATABASE_HOST |
| DB_PORT | int | 5432 | 1024-65535 | Database port | DATABASE_PORT |
| DB_MAX_CONNECTIONS | int | 100 | 10-1000 | Max DB connection pool size | DB_POOL_SIZE |
| CACHE_TTL_SECONDS | int | 3600 | 60-86400 | Cache entry TTL | CACHE_TTL |
| SESSION_TIMEOUT_SECONDS | int | 1800 | 300-7200 | User session timeout | SESSION_TIMEOUT |
| LOG_LEVEL | enum | INFO | DEBUG/INFO/WARN/ERROR | Logging level | LOG_LEVEL |
| ENABLE_METRICS | bool | true | true/false | Enable Prometheus metrics | METRICS_ENABLED |
| RATE_LIMIT_PER_MINUTE | int | 100 | 1-10000 | API rate limit per user | RATE_LIMIT |
**Configuration Categories**:
- **Application**: SERVER_PORT, LOG_LEVEL
- **Database**: DB_HOST, DB_PORT, DB_MAX_CONNECTIONS
- **Cache**: CACHE_TTL_SECONDS
- **Security**: SESSION_TIMEOUT_SECONDS, RATE_LIMIT_PER_MINUTE
- **Observability**: ENABLE_METRICS, LOG_LEVEL
**Validation Rules**:
- DB_MAX_CONNECTIONS MUST be <= database server max_connections
- SESSION_TIMEOUT_SECONDS SHOULD be >= 300 (5 minutes) for usability and <= 7200 (2 hours) for security
- CACHE_TTL_SECONDS MUST be >= 60
- RATE_LIMIT_PER_MINUTE SHOULD be tuned based on expected load
**Configuration Loading**:
- Priority: Environment variables > config.yaml > Defaults
- Hot reload: Supported for LOG_LEVEL, CACHE_TTL_SECONDS, RATE_LIMIT_PER_MINUTE
- Restart required: SERVER_PORT, DB_HOST, DB_PORT
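The loading priority (environment variables, then the config file, then defaults) and the range constraints can be sketched as follows. This is an illustrative sketch only: `INT_SETTINGS` and `load_config` are hypothetical names, just three integer keys from the example table are covered, and the config file is assumed to be already parsed into a mapping.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple

# key -> (environment variable, default, min, max), from the example table.
INT_SETTINGS: Dict[str, Tuple[str, int, int, int]] = {
    "SERVER_PORT": ("APP_PORT", 8080, 1024, 65535),
    "DB_MAX_CONNECTIONS": ("DB_POOL_SIZE", 100, 10, 1000),
    "CACHE_TTL_SECONDS": ("CACHE_TTL", 3600, 60, 86400),
}


def load_config(file_values: Mapping[str, Any],
                env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Resolve each key as env var > config file > default, then validate."""
    env = os.environ if env is None else env
    config: Dict[str, int] = {}
    for key, (env_name, default, low, high) in INT_SETTINGS.items():
        if env_name in env:
            raw: Any = env[env_name]
        elif key in file_values:
            raw = file_values[key]
        else:
            raw = default
        value = int(raw)  # raises ValueError on non-numeric input
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside {low}-{high}")
        config[key] = value
    return config
```

Failing fast at startup on an out-of-range value is a design choice in this sketch; a system with hot reload would likely reject the new value and keep the previous one instead.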
7. Boundary Scenarios
Define system behavior in edge cases and operational scenarios.
Template:
## Boundary Scenarios
### Concurrency
- **Max Concurrent Requests**: [number]
- **Queueing Strategy**: [Drop oldest / Block / Return 503]
- **Thread Pool Size**: [number]
- **Connection Pool Size**: [number]
### Rate Limiting
- **Per-User Limit**: [requests/minute]
- **Global Limit**: [requests/minute]
- **Burst Allowance**: [number]
- **Rate Limit Response**: HTTP 429 with Retry-After header
### Graceful Shutdown
- **Drain Period**: [duration] - Stop accepting new requests
- **In-Flight Timeout**: [duration] - Wait for active requests
- **Force Kill After**: [duration] - Hard shutdown
- **Shutdown Hooks**: [cleanup actions]
### Resource Cleanup
- **Idle Connection Timeout**: [duration]
- **Stale Session Cleanup**: [frequency]
- **Temporary File Cleanup**: [frequency]
- **Log Rotation**: [frequency/size]
### Scalability
- **Horizontal Scaling**: [Supported/Not supported]
- **Stateless Design**: [Yes/No]
- **Shared State**: [Redis/Database/None]
- **Load Balancing**: [Round-robin/Least-connections/IP-hash]
### Disaster Recovery
- **Backup Frequency**: [frequency]
- **Backup Retention**: [duration]
- **Recovery Time Objective (RTO)**: [duration]
- **Recovery Point Objective (RPO)**: [duration]
Example:
## Boundary Scenarios
### Concurrency
- **Max Concurrent Requests**: 1000 per instance
- **Queueing Strategy**: Return HTTP 503 when queue full (max 100 queued)
- **Thread Pool Size**: 200 worker threads
- **Connection Pool Size**: 100 database connections
### Rate Limiting
- **Per-User Limit**: 100 requests/minute
- **Global Limit**: 10,000 requests/minute per instance
- **Burst Allowance**: 20 requests (allow short bursts)
- **Rate Limit Response**: HTTP 429 with `Retry-After: 60` header
### Graceful Shutdown
- **Drain Period**: 10 seconds - Stop accepting new requests, return 503
- **In-Flight Timeout**: 30 seconds - Wait for active requests to complete
- **Force Kill After**: 60 seconds - Hard shutdown if requests still active
- **Shutdown Hooks**:
  - Close database connections
  - Flush metrics to monitoring system
  - Save in-memory cache to Redis
  - Deregister from service discovery
### Resource Cleanup
- **Idle Connection Timeout**: 5 minutes - Close idle DB connections
- **Stale Session Cleanup**: Every 1 hour - Remove expired sessions
- **Temporary File Cleanup**: Every 24 hours - Delete files older than 7 days
- **Log Rotation**: Daily or when file size > 100MB
### Scalability
- **Horizontal Scaling**: Supported - Stateless design
- **Stateless Design**: Yes - All state in Redis/Database
- **Shared State**: Redis for sessions, PostgreSQL for persistent data
- **Load Balancing**: Least-connections algorithm (sticky sessions not required)
### Disaster Recovery
- **Backup Frequency**: Database backup every 6 hours
- **Backup Retention**: 30 days
- **Recovery Time Objective (RTO)**: 1 hour - System operational within 1 hour
- **Recovery Point Objective (RPO)**: 6 hours - Max 6 hours of data loss acceptable
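The rate limiting values in the example (100 requests/minute per user with a burst allowance of 20) fit a token bucket. The sketch below is one possible interpretation, not the template's prescribed algorithm: treating the burst allowance as the bucket capacity is an assumption, the class name `TokenBucket` is hypothetical, and `Retry-After` is computed from the refill rate rather than fixed at 60 seconds.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Per-user token bucket; single-threaded, in-memory sketch."""

    def __init__(self, rate_per_minute: float = 100, burst: int = 20,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_minute / 60.0   # tokens added per second
        self.capacity = float(burst)         # assumption: burst == capacity
        self.clock = clock
        # user_id -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def check(self, user_id: str) -> Tuple[int, Dict[str, str]]:
        """Return (http_status, headers) for one incoming request."""
        now = self.clock()
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill according to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[user_id] = (tokens - 1.0, now)
            return 200, {}
        self.buckets[user_id] = (tokens, now)
        retry_after = math.ceil((1.0 - tokens) / self.rate)
        return 429, {"Retry-After": str(retry_after)}
```

With horizontal scaling, the bucket state would need to live in the shared store (Redis in the example) rather than in process memory.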
Usage Instructions
When generating a system-architect analysis, include all of the following sections:
- Architecture Overview (high-level)
- Data Model (3-5 core entities)
- State Machine (1-2 entities with complex lifecycle)
- Error Handling Strategy (global + per-component)
- Observability Requirements (metrics, logs, health checks)
- Configuration Model (all configurable parameters)
- Boundary Scenarios (concurrency, rate limiting, shutdown, cleanup, scalability, DR)
All sections MUST use RFC 2119 keywords (MUST, SHOULD, MAY) for constraints.
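A generated analysis can be checked mechanically against the two rules above. The following sketch assumes the analysis is Markdown whose sections are level-2 headings named as in this template; the function name `review` and the shape of its result are hypothetical.

```python
import re
from typing import Dict, List

REQUIRED_SECTIONS = [
    "Architecture Overview",
    "Data Model",
    "State Machine",
    "Error Handling Strategy",
    "Observability Requirements",
    "Configuration Model",
    "Boundary Scenarios",
]
RFC2119 = re.compile(r"\b(MUST|SHOULD|MAY)\b")


def review(markdown: str) -> Dict[str, List[str]]:
    """Report missing required sections and sections without RFC 2119 keywords."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*)$", line)  # level-2 headings only
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # "State Machine: Order Lifecycle" counts as the "State Machine" section.
    missing = [
        name for name in REQUIRED_SECTIONS
        if not any(title.startswith(name) for title in sections)
    ]
    without_keywords = [
        title for title, body in sections.items()
        if not RFC2119.search("\n".join(body))
    ]
    return {
        "missing_sections": missing,
        "sections_without_rfc2119": without_keywords,
    }
```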