diff --git a/.claude/rules/active_memory_config.json b/.claude/active_memory_config.json
similarity index 100%
rename from .claude/rules/active_memory_config.json
rename to .claude/active_memory_config.json
diff --git a/.claude/rules/cli-tools-usage.md b/.claude/rules/cli-tools-usage.md
index 57b51a71..9ea6ac98 100644
--- a/.claude/rules/cli-tools-usage.md
+++ b/.claude/rules/cli-tools-usage.md
@@ -1,36 +1,433 @@
-# CLI Tools Usage Rules
+# Intelligent Tools Selection Strategy

-## Tool Selection
+## Table of Contents
+1. [Quick Reference](#quick-reference)
+2. [Tool Specifications](#tool-specifications)
+3. [Prompt Template](#prompt-template)
+4. [CLI Execution](#cli-execution)
+5. [Configuration](#configuration)
+6. [Best Practices](#best-practices)
+
+---
+
+## Quick Reference
+
+### Quick Decision Tree
+
+```
+┌─ Task Analysis/Documentation?
+│  └─→ Use Gemini (Fallback: Codex, Qwen)
+│      └─→ MODE: analysis (default, read-only)
+│
+└─ Task Implementation/Bug Fix?
+   └─→ Use Codex (Fallback: Gemini, Qwen)
+       └─→ MODE: auto (full operations) or write (file operations)
+```
+
+### Universal Prompt Template
+
+```
+PURPOSE: [what] + [why] + [success criteria] + [constraints/scope]
+TASK: • [step 1: specific action] • [step 2: specific action] • [step 3: specific action]
+MODE: [analysis|write|auto]
+CONTEXT: @[file patterns] | Memory: [session/tech/module context]
+EXPECTED: [deliverable format] + [quality criteria] + [structure requirements]
+RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints] | MODE=[permission]
+```
+
+### Intent Capture Checklist (Before CLI Execution)
+
+**⚠️ CRITICAL**: Before executing any CLI command, verify these intent dimensions:
+**Intent Validation Questions**:
+- [ ] Is the objective specific and measurable?
+- [ ] Are success criteria defined?
+- [ ] Is the scope clearly bounded?
+- [ ] Are constraints and limitations stated?
+- [ ] Is the expected output format clear?
+- [ ] Is the action level (read/write) explicit?
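For illustration, a minimal filled-in prompt that satisfies each checklist item might look like the sketch below (the module, file globs, and chosen template are hypothetical; the template path is taken from the Task-Template Matrix later in this document):

```
PURPOSE: Map error-handling patterns in the payments module to plan a refactor; success = every throw/catch site catalogued; scope = src/payments/** only
TASK: • List all try/catch blocks and thrown error types • Identify duplicated handling logic • Note places where errors are silently swallowed
MODE: analysis
CONTEXT: @src/payments/**/*.ts | Memory: Shared error utilities live in @shared/utils/errors.ts
EXPECTED: Markdown table of file:line, error type, handling strategy, plus a short list of refactor candidates
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/02-analyze-code-patterns.txt) | Ignore test files | analysis=READ-ONLY
```

Each field maps to a checklist question: PURPOSE carries the success criteria and scope, MODE makes the action level explicit, and EXPECTED fixes the output format.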
+ +## Tool Selection Matrix + +| Task Category | Tool | MODE | When to Use | +|---------------|------|------|-------------| +| **Read/Analyze** | Gemini/Qwen | `analysis` | Code review, architecture analysis, pattern discovery, exploration | +| **Write/Create** | Gemini/Qwen | `write` | Documentation generation, file creation (non-code) | +| **Implement/Fix** | Codex | `auto` | Feature implementation, bug fixes, test creation, refactoring | + +## Essential Command Structure + +```bash +ccw cli exec "" --tool --mode +``` + +### Core Principles + +- **Use tools early and often** - Tools are faster and more thorough +- **Unified CLI** - Always use `ccw cli exec` for consistent parameter handling +- **One template required** - ALWAYS reference exactly ONE template in RULES (use universal fallback if no specific match) +- **Write protection** - Require EXPLICIT `--mode write` or `--mode auto` +- **No escape characters** - NEVER use `\$`, `\"`, `\'` in CLI commands + +--- + +## Tool Specifications + +### MODE Options + +| Mode | Permission | Use For | Specification | +|------|------------|---------|---------------| +| `analysis` | Read-only (default) | Code review, architecture analysis, pattern discovery | Auto for Gemini/Qwen | +| `write` | Create/Modify/Delete | Documentation, code creation, file modifications | Requires `--mode write` | +| `auto` | Full operations | Feature implementation, bug fixes, autonomous development | Codex only, requires `--mode auto` | ### Gemini & Qwen -**Use for**: Analysis, documentation, code exploration, architecture review -- Default MODE: `analysis` (read-only) -- Prefer Gemini; use Qwen as fallback + +**Via CCW**: `ccw cli exec "" --tool gemini` or `--tool qwen` + +**Characteristics**: - Large context window, pattern recognition +- Best for: Analysis, documentation, code exploration, architecture review +- Default MODE: `analysis` (read-only) +- Priority: Prefer Gemini; use Qwen as fallback + +**Models** (override via `--model`): +- Gemini: `gemini-2.5-pro` +- Qwen: `coder-model`, `vision-model` + +**Error Handling**: HTTP 429 may show error but still return results - check if results exist ### Codex -**Use for**: Feature implementation, bug fixes, autonomous development -- Requires explicit `--mode auto` or `--mode write` + +**Via CCW**: `ccw cli exec "" --tool codex --mode auto` + +**Characteristics**: +- Autonomous development, mathematical reasoning - Best for: Implementation, testing, automation +- No default MODE - must explicitly specify `--mode write` or `--mode auto` -## Core Principles +**Models**: `gpt-5.2` -- Use tools early and often - tools are faster and more thorough -- Always use `ccw cli exec` for consistent parameter handling -- ALWAYS reference exactly ONE template in RULES section -- Require EXPLICIT `--mode write` or `--mode auto` for modifications -- NEVER use escape characters (`\$`, `\"`, `\'`) in CLI commands +### Session Resume -## Permission Framework +**Resume via `--resume` parameter**: +```bash +ccw cli exec "Continue analyzing" --resume # Resume last session +ccw cli exec "Fix issues found" --resume # Resume specific session +``` + +| Value | Description | +|-------|-------------| +| `--resume` (empty) | Resume most recent session | +| `--resume ` | Resume specific execution ID | + +**Context Assembly** (automatic): +``` +=== PREVIOUS CONVERSATION === +USER PROMPT: [Previous prompt] +ASSISTANT RESPONSE: [Previous output] +=== CONTINUATION === +[Your new prompt] +``` + +**Tool Behavior**: Codex uses native `codex resume`; 
Gemini/Qwen assembles context as single prompt + +--- + +## Prompt Template + +### Template Structure + +Every command MUST include these fields: + +| Field | Purpose | Components | Bad Example | Good Example | +|-------|---------|------------|-------------|--------------| +| **PURPOSE** | Goal + motivation + success | What + Why + Success Criteria + Constraints | "Analyze code" | "Identify security vulnerabilities in auth module to pass compliance audit; success = all OWASP Top 10 addressed; scope = src/auth/** only" | +| **TASK** | Actionable steps | Specific verbs + targets | "• Review code • Find issues" | "• Scan for SQL injection in query builders • Check XSS in template rendering • Verify CSRF token validation" | +| **MODE** | Permission level | analysis / write / auto | (missing) | "analysis" or "write" | +| **CONTEXT** | File scope + history | File patterns + Memory | "@**/*" | "@src/auth/**/*.ts @shared/utils/security.ts \| Memory: Previous auth refactoring (WFS-001)" | +| **EXPECTED** | Output specification | Format + Quality + Structure | "Report" | "Markdown report with: severity levels (Critical/High/Medium/Low), file:line references, remediation code snippets, priority ranking" | +| **RULES** | Template + constraints | $(cat template) + domain rules | (missing) | "$(cat ~/.claude/.../security.txt) \| Focus on authentication \| Ignore test files \| analysis=READ-ONLY" | + + +### CONTEXT Configuration + +**Format**: `CONTEXT: [file patterns] | Memory: [memory context]` + +#### File Patterns + +| Pattern | Scope | +|---------|-------| +| `@**/*` | All files (default) | +| `@src/**/*.ts` | TypeScript in src | +| `@../shared/**/*` | Sibling directory (requires `--includeDirs`) | +| `@CLAUDE.md` | Specific file | + +#### Memory Context + +Include when building on previous work: + +```bash +# Cross-task reference +Memory: Building on auth refactoring (commit abc123), implementing refresh tokens + +# Cross-module integration +Memory: Integration with auth module, using shared error patterns from @shared/utils/errors.ts +``` + +**Memory Sources**: +- **Related Tasks**: Previous refactoring, extensions, conflict resolution +- **Tech Stack Patterns**: Framework conventions, security guidelines +- **Cross-Module References**: Integration points, shared utilities, type dependencies + +#### Pattern Discovery Workflow + +For complex requirements, discover files BEFORE CLI execution: + +```bash +# Step 1: Discover files +rg "export.*Component" --files-with-matches --type ts + +# Step 2: Build CONTEXT +CONTEXT: @components/Auth.tsx @types/auth.d.ts | Memory: Previous type refactoring + +# Step 3: Execute CLI +ccw cli exec "..." --tool gemini --cd src +``` + +### RULES Configuration + +**Format**: `RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [constraints]` + +**⚠️ MANDATORY**: Exactly ONE template reference is REQUIRED. 
Select from Task-Template Matrix or use universal fallback: +- `universal/00-universal-rigorous-style.txt` - For precision-critical tasks (default fallback) +- `universal/00-universal-creative-style.txt` - For exploratory tasks + +**Command Substitution Rules**: +- Use `$(cat ...)` directly - do NOT read template content first +- NEVER use escape characters: `\$`, `\"`, `\'` +- Tilde expands correctly in prompt context + +**Examples**: +```bash +# Specific template (preferred) +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on auth | analysis=READ-ONLY + +# Universal fallback (when no specific template matches) +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/universal/00-universal-rigorous-style.txt) | Focus on security patterns | analysis=READ-ONLY +``` + +### Template System + +**Base Path**: `~/.claude/workflows/cli-templates/prompts/` + +**Naming Convention**: +- `00-*` - Universal fallbacks (when no specific match) +- `01-*` - Universal, high-frequency +- `02-*` - Common specialized +- `03-*` - Domain-specific + +**Universal Templates**: + +| Template | Use For | +|----------|---------| +| `universal/00-universal-rigorous-style.txt` | Precision-critical, systematic methodology | +| `universal/00-universal-creative-style.txt` | Exploratory, innovative solutions | + +**Task-Template Matrix**: + +| Task Type | Template | +|-----------|----------| +| **Analysis** | | +| Execution Tracing | `analysis/01-trace-code-execution.txt` | +| Bug Diagnosis | `analysis/01-diagnose-bug-root-cause.txt` | +| Code Patterns | `analysis/02-analyze-code-patterns.txt` | +| Document Analysis | `analysis/02-analyze-technical-document.txt` | +| Architecture Review | `analysis/02-review-architecture.txt` | +| Code Review | `analysis/02-review-code-quality.txt` | +| Performance | `analysis/03-analyze-performance.txt` | +| Security | `analysis/03-assess-security-risks.txt` | +| **Planning** | | +| Architecture | `planning/01-plan-architecture-design.txt` | +| Task Breakdown | `planning/02-breakdown-task-steps.txt` | +| Component Design | `planning/02-design-component-spec.txt` | +| Migration | `planning/03-plan-migration-strategy.txt` | +| **Development** | | +| Feature | `development/02-implement-feature.txt` | +| Refactoring | `development/02-refactor-codebase.txt` | +| Tests | `development/02-generate-tests.txt` | +| UI Component | `development/02-implement-component-ui.txt` | +| Debugging | `development/03-debug-runtime-issues.txt` | + +--- + +## CLI Execution + +### Command Options + +| Option | Description | Default | +|--------|-------------|---------| +| `--tool ` | gemini, qwen, codex | gemini | +| `--mode ` | analysis, write, auto | analysis | +| `--model ` | Model override | auto-select | +| `--cd ` | Working directory | current | +| `--includeDirs ` | Additional directories (comma-separated) | none | +| `--timeout ` | Timeout in milliseconds | 300000 | +| `--resume [id]` | Resume previous session | - | +| `--no-stream` | Disable streaming | false | + +### Directory Configuration + +#### Working Directory (`--cd`) + +When using `--cd`: +- `@**/*` = Files within working directory tree only +- CANNOT reference parent/sibling via @ alone +- Must use `--includeDirs` for external directories + +#### Include Directories (`--includeDirs`) + +**TWO-STEP requirement for external files**: +1. Add `--includeDirs` parameter +2. 
Reference in CONTEXT with @ patterns + +```bash +# Single directory +ccw cli exec "CONTEXT: @**/* @../shared/**/*" --cd src/auth --includeDirs ../shared + +# Multiple directories +ccw cli exec "..." --cd src/auth --includeDirs ../shared,../types,../utils +``` + +**Rule**: If CONTEXT contains `@../dir/**/*`, MUST include `--includeDirs ../dir` + +**Benefits**: Excludes unrelated directories, reduces token usage + +### CCW Parameter Mapping + +CCW automatically maps to tool-specific syntax: + +| CCW Parameter | Gemini/Qwen | Codex | +|---------------|-------------|-------| +| `--cd ` | `cd &&` | `-C ` | +| `--includeDirs ` | `--include-directories` | `--add-dir` (per dir) | +| `--mode write` | `--approval-mode yolo` | `-s danger-full-access` | +| `--mode auto` | N/A | `-s danger-full-access` | + +### Command Examples + +#### Task-Type Specific Templates + +**Analysis Task** (Security Audit): +```bash +ccw cli exec " +PURPOSE: Identify OWASP Top 10 vulnerabilities in authentication module to pass security audit; success = all critical/high issues documented with remediation +TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication bypass vectors • Evaluate session management • Assess sensitive data exposure +MODE: analysis +CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions +EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files | analysis=READ-ONLY +" --tool gemini --cd src/auth --timeout 600000 +``` + +**Implementation Task** (New Feature): +```bash +ccw cli exec " +PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients +TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services +MODE: auto +CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts +EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes | auto=FULL +" --tool codex --mode auto --timeout 1800000 +``` + +**Bug Fix Task**: +```bash +ccw cli exec " +PURPOSE: Fix memory leak in WebSocket connection handler causing server OOM after 24h; root cause must be identified before any fix +TASK: • Trace connection lifecycle from open to close • Identify event listener accumulation • Check cleanup on disconnect • Verify garbage collection eligibility +MODE: analysis +CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production +EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup | analysis=READ-ONLY +" --tool gemini --cd src --timeout 900000 +``` + +**Refactoring Task**: +```bash +ccw cli exec " +PURPOSE: Refactor payment processing to use strategy pattern for 
multi-gateway support; no functional changes; all existing tests must pass +TASK: • Extract gateway interface from current implementation • Create strategy classes for Stripe, PayPal • Implement factory for gateway selection • Migrate existing code to use strategies +MODE: write +CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways +EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist +RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass | write=CREATE/MODIFY/DELETE +" --tool gemini --mode write --timeout 1200000 +``` +--- + +## Configuration + +### Timeout Allocation + +**Minimum**: 5 minutes (300000ms) + +| Complexity | Range | Examples | +|------------|-------|----------| +| Simple | 5-10min (300000-600000ms) | Analysis, search | +| Medium | 10-20min (600000-1200000ms) | Refactoring, documentation | +| Complex | 20-60min (1200000-3600000ms) | Implementation, migration | +| Heavy | 60-120min (3600000-7200000ms) | Large codebase, multi-file | + +**Codex Multiplier**: 3x allocated time (minimum 15min / 900000ms) + +```bash +ccw cli exec "" --tool gemini --timeout 600000 # 10 min +ccw cli exec "" --tool codex --timeout 1800000 # 30 min +``` + +### Permission Framework + +**Single-Use Authorization**: Each execution requires explicit user instruction. Previous authorization does NOT carry over. + +**Mode Hierarchy**: - `analysis` (default): Read-only, safe for auto-execution -- `write`: Requires explicit `--mode write` - creates/modifies/deletes files -- `auto`: Requires explicit `--mode auto` - full autonomous operations (Codex only) +- `write`: Requires explicit `--mode write` +- `auto`: Requires explicit `--mode auto` +- **Exception**: User provides clear instructions like "modify", "create", "implement" -## Timeout Guidelines +--- -- Simple (5-10min): Analysis, search -- Medium (10-20min): Refactoring, documentation -- Complex (20-60min): Implementation, migration -- Heavy (60-120min): Large codebase, multi-file operations -- Codex multiplier: 3x allocated time (minimum 15min) +## Best Practices + +### Workflow Principles + +- **Use CCW unified interface** for all executions +- **Always include template** - Use Task-Template Matrix or universal fallback +- **Be specific** - Clear PURPOSE, TASK, EXPECTED fields +- **Include constraints** - File patterns, scope in RULES +- **Leverage memory context** when building on previous work +- **Discover patterns first** - Use rg/MCP before CLI execution +- **Default to full context** - Use `@**/*` unless specific files needed + +### Workflow Integration + +| Phase | Command | +|-------|---------| +| Understanding | `ccw cli exec "" --tool gemini` | +| Architecture | `ccw cli exec "" --tool gemini` | +| Implementation | `ccw cli exec "" --tool codex --mode auto` | +| Quality | `ccw cli exec "" --tool codex --mode write` | + +### Planning Checklist + +- [ ] **Purpose defined** - Clear goal and intent +- [ ] **Mode selected** - `--mode analysis|write|auto` +- [ ] **Context gathered** - File references + memory (default `@**/*`) +- [ ] **Directory navigation** - `--cd` and/or `--includeDirs` +- [ ] **Tool selected** - `--tool gemini|qwen|codex` +- [ ] **Template applied (REQUIRED)** - Use specific or universal fallback template +- [ ] **Constraints specified** - Scope, requirements +- [ ] **Timeout 
configured** - Based on complexity diff --git a/.claude/rules/context-requirements.md b/.claude/rules/context-requirements.md index 72f77d89..c47e9263 100644 --- a/.claude/rules/context-requirements.md +++ b/.claude/rules/context-requirements.md @@ -5,3 +5,42 @@ Before implementation, always: - Identify 3+ existing similar patterns before implementation - Map dependencies and integration points - Understand testing framework and coding conventions + +## Context Gathering + +### Use Exa +- Researching external APIs, libraries, frameworks +- Need recent documentation beyond knowledge cutoff +- Looking for implementation examples in public repos +- User mentions specific library/framework names +- Questions about "best practices" or "how does X work" + +### Use read_file (MCP) +- Reading multiple related files at once +- Directory traversal with pattern matching +- Searching file content with regex +- Need to limit depth/file count for large directories +- Batch operations on multiple files +- Pattern-based filtering (glob + content regex) + +### Use codex_lens +- Large codebase (>500 files) requiring repeated searches +- Need semantic understanding of code relationships +- Working across multiple sessions +- Symbol-level navigation needed +- Finding all implementations of interface/class +- Tracking function calls across codebase + +### Use smart_search +- Unknown file locations +- Concept/semantic search ("authentication logic", "payment processing") +- Medium-sized codebase (100-500 files) +- One-time or infrequent searches +- Natural language queries about code structure + +**Mode Selection**: +- `auto`: Let tool decide (default) +- `exact`: Known exact pattern +- `fuzzy`: Typo-tolerant search +- `semantic`: Concept-based search +- `graph`: Dependency analysis \ No newline at end of file diff --git a/.claude/rules/tool-selection.md b/.claude/rules/file-modification.md similarity index 52% rename from .claude/rules/tool-selection.md rename to .claude/rules/file-modification.md index 6c60d309..c58bdc06 100644 --- a/.claude/rules/tool-selection.md +++ b/.claude/rules/file-modification.md @@ -1,44 +1,3 @@ -# Tool Selection Rules - -## Context Gathering - -### Use Exa -- Researching external APIs, libraries, frameworks -- Need recent documentation beyond knowledge cutoff -- Looking for implementation examples in public repos -- User mentions specific library/framework names -- Questions about "best practices" or "how does X work" - -### Use read_file (MCP) -- Reading multiple related files at once -- Directory traversal with pattern matching -- Searching file content with regex -- Need to limit depth/file count for large directories -- Batch operations on multiple files -- Pattern-based filtering (glob + content regex) - -### Use codex_lens -- Large codebase (>500 files) requiring repeated searches -- Need semantic understanding of code relationships -- Working across multiple sessions -- Symbol-level navigation needed -- Finding all implementations of interface/class -- Tracking function calls across codebase - -### Use smart_search -- Unknown file locations -- Concept/semantic search ("authentication logic", "payment processing") -- Medium-sized codebase (100-500 files) -- One-time or infrequent searches -- Natural language queries about code structure - -**Mode Selection**: -- `auto`: Let tool decide (default) -- `exact`: Known exact pattern -- `fuzzy`: Typo-tolerant search -- `semantic`: Concept-based search -- `graph`: Dependency analysis - ## File Modification ### Use edit_file (MCP) 
diff --git a/.claude/rules/intelligent-tools-strategy.md b/.claude/rules/intelligent-tools-strategy.md deleted file mode 100644 index da2e5bc4..00000000 --- a/.claude/rules/intelligent-tools-strategy.md +++ /dev/null @@ -1,431 +0,0 @@ -# Intelligent Tools Selection Strategy - -## Table of Contents -1. [Quick Reference](#quick-reference) -2. [Tool Specifications](#tool-specifications) -3. [Prompt Template](#prompt-template) -4. [CLI Execution](#cli-execution) -5. [Configuration](#configuration) -6. [Best Practices](#best-practices) - ---- - -## Quick Reference - -### Universal Prompt Template - -``` -PURPOSE: [what] + [why] + [success criteria] + [constraints/scope] -TASK: • [step 1: specific action] • [step 2: specific action] • [step 3: specific action] -MODE: [analysis|write|auto] -CONTEXT: @[file patterns] | Memory: [session/tech/module context] -EXPECTED: [deliverable format] + [quality criteria] + [structure requirements] -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints] | MODE=[permission] -``` - -### Intent Capture Checklist (Before CLI Execution) - -**⚠️ CRITICAL**: Before executing any CLI command, verify these intent dimensions: -**Intent Validation Questions**: -- [ ] Is the objective specific and measurable? -- [ ] Are success criteria defined? -- [ ] Is the scope clearly bounded? -- [ ] Are constraints and limitations stated? -- [ ] Is the expected output format clear? -- [ ] Is the action level (read/write) explicit? - -### Tool Selection - -| Task Type | Tool | Fallback | -|-----------|------|----------| -| Analysis/Documentation | Gemini | Qwen | -| Implementation/Testing | Codex | - | - -### CCW Command Syntax - -```bash -ccw cli exec "" --tool --mode -ccw cli exec "" --tool gemini --cd --includeDirs -ccw cli exec "" --resume [id] # Resume previous session -``` - -### CLI Subcommands - -| Command | Description | -|---------|-------------| -| `ccw cli status` | Check CLI tools availability | -| `ccw cli exec ""` | Execute a CLI tool | -| `ccw cli exec "" --resume [id]` | Resume a previous session | -| `ccw cli history` | Show execution history | -| `ccw cli detail ` | Show execution detail | - -### Core Principles - -- **Use tools early and often** - Tools are faster and more thorough -- **Unified CLI** - Always use `ccw cli exec` for consistent parameter handling -- **One template required** - ALWAYS reference exactly ONE template in RULES (use universal fallback if no specific match) -- **Write protection** - Require EXPLICIT `--mode write` or `--mode auto` -- **No escape characters** - NEVER use `\$`, `\"`, `\'` in CLI commands - ---- - -## Tool Specifications - -### MODE Options - -| Mode | Permission | Use For | Specification | -|------|------------|---------|---------------| -| `analysis` | Read-only (default) | Code review, architecture analysis, pattern discovery | Auto for Gemini/Qwen | -| `write` | Create/Modify/Delete | Documentation, code creation, file modifications | Requires `--mode write` | -| `auto` | Full operations | Feature implementation, bug fixes, autonomous development | Codex only, requires `--mode auto` | - -### Gemini & Qwen - -**Via CCW**: `ccw cli exec "" --tool gemini` or `--tool qwen` - -**Characteristics**: -- Large context window, pattern recognition -- Best for: Analysis, documentation, code exploration, architecture review -- Default MODE: `analysis` (read-only) -- Priority: Prefer Gemini; use Qwen as fallback - -**Models** (override via `--model`): -- Gemini: `gemini-2.5-pro` -- 
Qwen: `coder-model`, `vision-model` - -**Error Handling**: HTTP 429 may show error but still return results - check if results exist - -### Codex - -**Via CCW**: `ccw cli exec "" --tool codex --mode auto` - -**Characteristics**: -- Autonomous development, mathematical reasoning -- Best for: Implementation, testing, automation -- No default MODE - must explicitly specify `--mode write` or `--mode auto` - -**Models**: `gpt-5.2` - -### Session Resume - -**Resume via `--resume` parameter**: - -```bash -ccw cli exec "Continue analyzing" --resume # Resume last session -ccw cli exec "Fix issues found" --resume # Resume specific session -``` - -| Value | Description | -|-------|-------------| -| `--resume` (empty) | Resume most recent session | -| `--resume ` | Resume specific execution ID | - -**Context Assembly** (automatic): -``` -=== PREVIOUS CONVERSATION === -USER PROMPT: [Previous prompt] -ASSISTANT RESPONSE: [Previous output] -=== CONTINUATION === -[Your new prompt] -``` - -**Tool Behavior**: Codex uses native `codex resume`; Gemini/Qwen assembles context as single prompt - ---- - -## Prompt Template - -### Template Structure - -Every command MUST include these fields: - -| Field | Purpose | Components | Bad Example | Good Example | -|-------|---------|------------|-------------|--------------| -| **PURPOSE** | Goal + motivation + success | What + Why + Success Criteria + Constraints | "Analyze code" | "Identify security vulnerabilities in auth module to pass compliance audit; success = all OWASP Top 10 addressed; scope = src/auth/** only" | -| **TASK** | Actionable steps | Specific verbs + targets | "• Review code • Find issues" | "• Scan for SQL injection in query builders • Check XSS in template rendering • Verify CSRF token validation" | -| **MODE** | Permission level | analysis / write / auto | (missing) | "analysis" or "write" | -| **CONTEXT** | File scope + history | File patterns + Memory | "@**/*" | "@src/auth/**/*.ts @shared/utils/security.ts \| Memory: Previous auth refactoring (WFS-001)" | -| **EXPECTED** | Output specification | Format + Quality + Structure | "Report" | "Markdown report with: severity levels (Critical/High/Medium/Low), file:line references, remediation code snippets, priority ranking" | -| **RULES** | Template + constraints | $(cat template) + domain rules | (missing) | "$(cat ~/.claude/.../security.txt) \| Focus on authentication \| Ignore test files \| analysis=READ-ONLY" | - - -### CONTEXT Configuration - -**Format**: `CONTEXT: [file patterns] | Memory: [memory context]` - -#### File Patterns - -| Pattern | Scope | -|---------|-------| -| `@**/*` | All files (default) | -| `@src/**/*.ts` | TypeScript in src | -| `@../shared/**/*` | Sibling directory (requires `--includeDirs`) | -| `@CLAUDE.md` | Specific file | - -#### Memory Context - -Include when building on previous work: - -```bash -# Cross-task reference -Memory: Building on auth refactoring (commit abc123), implementing refresh tokens - -# Cross-module integration -Memory: Integration with auth module, using shared error patterns from @shared/utils/errors.ts -``` - -**Memory Sources**: -- **Related Tasks**: Previous refactoring, extensions, conflict resolution -- **Tech Stack Patterns**: Framework conventions, security guidelines -- **Cross-Module References**: Integration points, shared utilities, type dependencies - -#### Pattern Discovery Workflow - -For complex requirements, discover files BEFORE CLI execution: - -```bash -# Step 1: Discover files -rg "export.*Component" --files-with-matches 
--type ts - -# Step 2: Build CONTEXT -CONTEXT: @components/Auth.tsx @types/auth.d.ts | Memory: Previous type refactoring - -# Step 3: Execute CLI -ccw cli exec "..." --tool gemini --cd src -``` - -### RULES Configuration - -**Format**: `RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [constraints]` - -**⚠️ MANDATORY**: Exactly ONE template reference is REQUIRED. Select from Task-Template Matrix or use universal fallback: -- `universal/00-universal-rigorous-style.txt` - For precision-critical tasks (default fallback) -- `universal/00-universal-creative-style.txt` - For exploratory tasks - -**Command Substitution Rules**: -- Use `$(cat ...)` directly - do NOT read template content first -- NEVER use escape characters: `\$`, `\"`, `\'` -- Tilde expands correctly in prompt context - -**Examples**: -```bash -# Specific template (preferred) -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on auth | analysis=READ-ONLY - -# Universal fallback (when no specific template matches) -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/universal/00-universal-rigorous-style.txt) | Focus on security patterns | analysis=READ-ONLY -``` - -### Template System - -**Base Path**: `~/.claude/workflows/cli-templates/prompts/` - -**Naming Convention**: -- `00-*` - Universal fallbacks (when no specific match) -- `01-*` - Universal, high-frequency -- `02-*` - Common specialized -- `03-*` - Domain-specific - -**Universal Templates**: - -| Template | Use For | -|----------|---------| -| `universal/00-universal-rigorous-style.txt` | Precision-critical, systematic methodology | -| `universal/00-universal-creative-style.txt` | Exploratory, innovative solutions | - -**Task-Template Matrix**: - -| Task Type | Template | -|-----------|----------| -| **Analysis** | | -| Execution Tracing | `analysis/01-trace-code-execution.txt` | -| Bug Diagnosis | `analysis/01-diagnose-bug-root-cause.txt` | -| Code Patterns | `analysis/02-analyze-code-patterns.txt` | -| Document Analysis | `analysis/02-analyze-technical-document.txt` | -| Architecture Review | `analysis/02-review-architecture.txt` | -| Code Review | `analysis/02-review-code-quality.txt` | -| Performance | `analysis/03-analyze-performance.txt` | -| Security | `analysis/03-assess-security-risks.txt` | -| **Planning** | | -| Architecture | `planning/01-plan-architecture-design.txt` | -| Task Breakdown | `planning/02-breakdown-task-steps.txt` | -| Component Design | `planning/02-design-component-spec.txt` | -| Migration | `planning/03-plan-migration-strategy.txt` | -| **Development** | | -| Feature | `development/02-implement-feature.txt` | -| Refactoring | `development/02-refactor-codebase.txt` | -| Tests | `development/02-generate-tests.txt` | -| UI Component | `development/02-implement-component-ui.txt` | -| Debugging | `development/03-debug-runtime-issues.txt` | - ---- - -## CLI Execution - -### Command Options - -| Option | Description | Default | -|--------|-------------|---------| -| `--tool ` | gemini, qwen, codex | gemini | -| `--mode ` | analysis, write, auto | analysis | -| `--model ` | Model override | auto-select | -| `--cd ` | Working directory | current | -| `--includeDirs ` | Additional directories (comma-separated) | none | -| `--timeout ` | Timeout in milliseconds | 300000 | -| `--resume [id]` | Resume previous session | - | -| `--no-stream` | Disable streaming | false | - -### Directory Configuration - -#### Working Directory (`--cd`) - -When using `--cd`: -- 
`@**/*` = Files within working directory tree only -- CANNOT reference parent/sibling via @ alone -- Must use `--includeDirs` for external directories - -#### Include Directories (`--includeDirs`) - -**TWO-STEP requirement for external files**: -1. Add `--includeDirs` parameter -2. Reference in CONTEXT with @ patterns - -```bash -# Single directory -ccw cli exec "CONTEXT: @**/* @../shared/**/*" --cd src/auth --includeDirs ../shared - -# Multiple directories -ccw cli exec "..." --cd src/auth --includeDirs ../shared,../types,../utils -``` - -**Rule**: If CONTEXT contains `@../dir/**/*`, MUST include `--includeDirs ../dir` - -**Benefits**: Excludes unrelated directories, reduces token usage - -### CCW Parameter Mapping - -CCW automatically maps to tool-specific syntax: - -| CCW Parameter | Gemini/Qwen | Codex | -|---------------|-------------|-------| -| `--cd ` | `cd &&` | `-C ` | -| `--includeDirs ` | `--include-directories` | `--add-dir` (per dir) | -| `--mode write` | `--approval-mode yolo` | `-s danger-full-access` | -| `--mode auto` | N/A | `-s danger-full-access` | - -### Command Examples - -#### Task-Type Specific Templates - -**Analysis Task** (Security Audit): -```bash -ccw cli exec " -PURPOSE: Identify OWASP Top 10 vulnerabilities in authentication module to pass security audit; success = all critical/high issues documented with remediation -TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication bypass vectors • Evaluate session management • Assess sensitive data exposure -MODE: analysis -CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions -EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files | analysis=READ-ONLY -" --tool gemini --cd src/auth --timeout 600000 -``` - -**Implementation Task** (New Feature): -```bash -ccw cli exec " -PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients -TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services -MODE: auto -CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts -EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes | auto=FULL -" --tool codex --mode auto --timeout 1800000 -``` - -**Bug Fix Task**: -```bash -ccw cli exec " -PURPOSE: Fix memory leak in WebSocket connection handler causing server OOM after 24h; root cause must be identified before any fix -TASK: • Trace connection lifecycle from open to close • Identify event listener accumulation • Check cleanup on disconnect • Verify garbage collection eligibility -MODE: analysis -CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production -EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps -RULES: $(cat 
~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup | analysis=READ-ONLY -" --tool gemini --cd src --timeout 900000 -``` - -**Refactoring Task**: -```bash -ccw cli exec " -PURPOSE: Refactor payment processing to use strategy pattern for multi-gateway support; no functional changes; all existing tests must pass -TASK: • Extract gateway interface from current implementation • Create strategy classes for Stripe, PayPal • Implement factory for gateway selection • Migrate existing code to use strategies -MODE: write -CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways -EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist -RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass | write=CREATE/MODIFY/DELETE -" --tool gemini --mode write --timeout 1200000 -``` ---- - -## Configuration - -### Timeout Allocation - -**Minimum**: 5 minutes (300000ms) - -| Complexity | Range | Examples | -|------------|-------|----------| -| Simple | 5-10min (300000-600000ms) | Analysis, search | -| Medium | 10-20min (600000-1200000ms) | Refactoring, documentation | -| Complex | 20-60min (1200000-3600000ms) | Implementation, migration | -| Heavy | 60-120min (3600000-7200000ms) | Large codebase, multi-file | - -**Codex Multiplier**: 3x allocated time (minimum 15min / 900000ms) - -```bash -ccw cli exec "" --tool gemini --timeout 600000 # 10 min -ccw cli exec "" --tool codex --timeout 1800000 # 30 min -``` - -### Permission Framework - -**Single-Use Authorization**: Each execution requires explicit user instruction. Previous authorization does NOT carry over. 
- -**Mode Hierarchy**: -- `analysis` (default): Read-only, safe for auto-execution -- `write`: Requires explicit `--mode write` -- `auto`: Requires explicit `--mode auto` -- **Exception**: User provides clear instructions like "modify", "create", "implement" - ---- - -## Best Practices - -### Workflow Principles - -- **Use CCW unified interface** for all executions -- **Always include template** - Use Task-Template Matrix or universal fallback -- **Be specific** - Clear PURPOSE, TASK, EXPECTED fields -- **Include constraints** - File patterns, scope in RULES -- **Leverage memory context** when building on previous work -- **Discover patterns first** - Use rg/MCP before CLI execution -- **Default to full context** - Use `@**/*` unless specific files needed - -### Workflow Integration - -| Phase | Command | -|-------|---------| -| Understanding | `ccw cli exec "" --tool gemini` | -| Architecture | `ccw cli exec "" --tool gemini` | -| Implementation | `ccw cli exec "" --tool codex --mode auto` | -| Quality | `ccw cli exec "" --tool codex --mode write` | - -### Planning Checklist - -- [ ] **Purpose defined** - Clear goal and intent -- [ ] **Mode selected** - `--mode analysis|write|auto` -- [ ] **Context gathered** - File references + memory (default `@**/*`) -- [ ] **Directory navigation** - `--cd` and/or `--includeDirs` -- [ ] **Tool selected** - `--tool gemini|qwen|codex` -- [ ] **Template applied (REQUIRED)** - Use specific or universal fallback template -- [ ] **Constraints specified** - Scope, requirements -- [ ] **Timeout configured** - Based on complexity diff --git a/ccw/src/core/routes/memory-routes.ts b/ccw/src/core/routes/memory-routes.ts index 434ebd82..443e375a 100644 --- a/ccw/src/core/routes/memory-routes.ts +++ b/ccw/src/core/routes/memory-routes.ts @@ -734,7 +734,7 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p try { const configPath = join(projectPath, '.claude', 'rules', 'active_memory.md'); - const configJsonPath = join(projectPath, '.claude', 'rules', 'active_memory_config.json'); + const configJsonPath = join(projectPath, '.claude', 'active_memory_config.json'); const enabled = existsSync(configPath); let lastSync: string | null = null; let fileCount = 0; @@ -785,14 +785,18 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p } const rulesDir = join(projectPath, '.claude', 'rules'); + const claudeDir = join(projectPath, '.claude'); const configPath = join(rulesDir, 'active_memory.md'); - const configJsonPath = join(rulesDir, 'active_memory_config.json'); + const configJsonPath = join(claudeDir, 'active_memory_config.json'); if (enabled) { - // Enable: Create directory and initial file + // Enable: Create directories and initial file if (!existsSync(rulesDir)) { mkdirSync(rulesDir, { recursive: true }); } + if (!existsSync(claudeDir)) { + mkdirSync(claudeDir, { recursive: true }); + } // Save config if (config) { @@ -844,11 +848,11 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p try { const { config } = JSON.parse(body || '{}'); const projectPath = initialPath; - const rulesDir = join(projectPath, '.claude', 'rules'); - const configJsonPath = join(rulesDir, 'active_memory_config.json'); + const claudeDir = join(projectPath, '.claude'); + const configJsonPath = join(claudeDir, 'active_memory_config.json'); - if (!existsSync(rulesDir)) { - mkdirSync(rulesDir, { recursive: true }); + if (!existsSync(claudeDir)) { + mkdirSync(claudeDir, { recursive: true }); } 
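      // Persist the Active Memory config JSON at its new project-level location (.claude/active_memory_config.json)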
writeFileSync(configJsonPath, JSON.stringify(config, null, 2), 'utf-8'); @@ -938,7 +942,10 @@ RULES: Be concise. Focus on practical understanding. Include function signatures }); if (result.success && result.execution?.output) { - cliOutput = result.execution.output; + // Extract stdout from output object + cliOutput = typeof result.execution.output === 'string' + ? result.execution.output + : result.execution.output.stdout || ''; } // Add CLI output to content @@ -1007,6 +1014,18 @@ RULES: Be concise. Focus on practical understanding. Include function signatures // Write the file writeFileSync(configPath, content, 'utf-8'); + // Broadcast Active Memory sync completion event + broadcastToClients({ + type: 'ACTIVE_MEMORY_SYNCED', + payload: { + filesAnalyzed: hotFiles.length, + path: configPath, + tool, + usedCli: cliOutput.length > 0, + timestamp: new Date().toISOString() + } + }); + res.writeHead(200, { 'Content-Type': 'application/json' }); res.end(JSON.stringify({ success: true, diff --git a/ccw/src/templates/dashboard-css/10-cli.css b/ccw/src/templates/dashboard-css/10-cli.css index 6db434b8..4c15ce3b 100644 --- a/ccw/src/templates/dashboard-css/10-cli.css +++ b/ccw/src/templates/dashboard-css/10-cli.css @@ -3757,3 +3757,205 @@ .btn-ghost.text-destructive:hover { background: hsl(var(--destructive) / 0.1); } + +/* ======================================== + * Semantic Metadata Viewer Styles + * ======================================== */ +.semantic-viewer-toolbar { + display: flex; + align-items: center; + justify-content: space-between; + padding: 0.75rem 1rem; + background: hsl(var(--muted) / 0.3); + border-bottom: 1px solid hsl(var(--border)); +} + +.semantic-table-container { + max-height: 400px; + overflow-y: auto; +} + +.semantic-table { + width: 100%; + border-collapse: collapse; + font-size: 0.8125rem; +} + +.semantic-table th { + position: sticky; + top: 0; + background: hsl(var(--card)); + padding: 0.625rem 0.75rem; + text-align: left; + font-weight: 600; + font-size: 0.75rem; + color: hsl(var(--muted-foreground)); + border-bottom: 1px solid hsl(var(--border)); + white-space: nowrap; +} + +.semantic-table td { + padding: 0.625rem 0.75rem; + border-bottom: 1px solid hsl(var(--border) / 0.5); + vertical-align: top; +} + +.semantic-row { + cursor: pointer; + transition: background 0.15s ease; +} + +.semantic-row:hover { + background: hsl(var(--hover)); +} + +.semantic-cell-file { + max-width: 200px; +} + +.semantic-cell-lang { + width: 80px; + color: hsl(var(--muted-foreground)); +} + +.semantic-cell-purpose { + max-width: 180px; + color: hsl(var(--foreground) / 0.8); +} + +.semantic-cell-keywords { + max-width: 160px; +} + +.semantic-cell-tool { + width: 70px; +} + +.semantic-cell-date { + width: 80px; + color: hsl(var(--muted-foreground)); + font-size: 0.75rem; +} + +.semantic-keyword { + display: inline-block; + padding: 0.125rem 0.375rem; + margin: 0.125rem; + background: hsl(var(--primary) / 0.1); + color: hsl(var(--primary)); + border-radius: 0.25rem; + font-size: 0.6875rem; +} + +.semantic-keyword-more { + display: inline-block; + padding: 0.125rem 0.375rem; + margin: 0.125rem; + background: hsl(var(--muted)); + color: hsl(var(--muted-foreground)); + border-radius: 0.25rem; + font-size: 0.6875rem; +} + +.tool-badge { + display: inline-block; + padding: 0.125rem 0.5rem; + border-radius: 0.25rem; + font-size: 0.6875rem; + font-weight: 500; + text-transform: capitalize; +} + +.tool-badge.tool-gemini { + background: hsl(210 80% 55% / 0.15); + color: hsl(210 80% 45%); +} + 
+.tool-badge.tool-qwen { + background: hsl(142 76% 36% / 0.15); + color: hsl(142 76% 36%); +} + +.tool-badge.tool-unknown { + background: hsl(var(--muted)); + color: hsl(var(--muted-foreground)); +} + +.semantic-detail-row { + background: hsl(var(--muted) / 0.2); +} + +.semantic-detail-row.hidden { + display: none; +} + +.semantic-detail-content { + padding: 1rem; +} + +.semantic-detail-section { + margin-bottom: 1rem; +} + +.semantic-detail-section h4 { + display: flex; + align-items: center; + gap: 0.5rem; + font-size: 0.75rem; + font-weight: 600; + color: hsl(var(--muted-foreground)); + margin-bottom: 0.5rem; + text-transform: uppercase; + letter-spacing: 0.05em; +} + +.semantic-detail-section p { + font-size: 0.8125rem; + line-height: 1.5; + color: hsl(var(--foreground)); +} + +.semantic-keywords-full { + display: flex; + flex-wrap: wrap; + gap: 0.25rem; +} + +.semantic-detail-meta { + display: flex; + gap: 1rem; + padding-top: 0.75rem; + border-top: 1px solid hsl(var(--border) / 0.5); + font-size: 0.75rem; + color: hsl(var(--muted-foreground)); +} + +.semantic-detail-meta span { + display: flex; + align-items: center; + gap: 0.375rem; +} + +.semantic-viewer-footer { + display: flex; + align-items: center; + justify-content: space-between; + padding: 0.75rem 1rem; + background: hsl(var(--muted) / 0.3); + border-top: 1px solid hsl(var(--border)); +} + +.semantic-loading, +.semantic-empty { + display: flex; + flex-direction: column; + align-items: center; + justify-content: center; + padding: 3rem; + text-align: center; + color: hsl(var(--muted-foreground)); +} + +.semantic-loading { + gap: 1rem; +} diff --git a/ccw/src/templates/dashboard-css/11-memory.css b/ccw/src/templates/dashboard-css/11-memory.css index 33bc6a8a..fcf0e6f8 100644 --- a/ccw/src/templates/dashboard-css/11-memory.css +++ b/ccw/src/templates/dashboard-css/11-memory.css @@ -2097,7 +2097,7 @@ position: fixed; top: 0; right: 0; - width: 480px; + width: 50vw; max-width: 100vw; height: 100vh; background: hsl(var(--card)); @@ -2132,7 +2132,6 @@ justify-content: space-between; padding: 1rem 1.25rem; border-bottom: 1px solid hsl(var(--border)); - background: hsl(var(--muted) / 0.3); } .insight-detail-header h3 { diff --git a/ccw/src/templates/dashboard-js/components/notifications.js b/ccw/src/templates/dashboard-js/components/notifications.js index e48b5967..72c7ca29 100644 --- a/ccw/src/templates/dashboard-js/components/notifications.js +++ b/ccw/src/templates/dashboard-js/components/notifications.js @@ -238,6 +238,31 @@ function handleNotification(data) { } break; + case 'ACTIVE_MEMORY_SYNCED': + // Handle Active Memory sync completion + if (typeof addGlobalNotification === 'function') { + const { filesAnalyzed, tool, usedCli } = payload; + const method = usedCli ? 
`CLI (${tool})` : 'Basic'; + addGlobalNotification( + 'success', + 'Active Memory synced', + { + 'Files Analyzed': filesAnalyzed, + 'Method': method, + 'Timestamp': new Date(payload.timestamp).toLocaleTimeString() + }, + 'Memory' + ); + } + // Refresh Active Memory status if on memory view + if (getCurrentView && getCurrentView() === 'memory') { + if (typeof loadActiveMemoryStatus === 'function') { + loadActiveMemoryStatus(); + } + } + console.log('[Active Memory] Sync completed:', payload); + break; + default: console.log('[WS] Unknown notification type:', type); } diff --git a/codex-lens/src/codexlens/cli/commands.py b/codex-lens/src/codexlens/cli/commands.py index a5b3f6e5..9df73aee 100644 --- a/codex-lens/src/codexlens/cli/commands.py +++ b/codex-lens/src/codexlens/cli/commands.py @@ -1123,11 +1123,11 @@ def semantic_list( registry.initialize() mapper = PathMapper() - project_info = registry.find_project(base_path) + project_info = registry.get_project(base_path) if not project_info: raise CodexLensError(f"No index found for: {base_path}. Run 'codex-lens init' first.") - index_dir = mapper.source_to_index_dir(base_path) + index_dir = Path(project_info.index_root) if not index_dir.exists(): raise CodexLensError(f"Index directory not found: {index_dir}") diff --git a/codex-lens/src/codexlens/storage/dir_index.py b/codex-lens/src/codexlens/storage/dir_index.py index 1eeed440..dcc58a24 100644 --- a/codex-lens/src/codexlens/storage/dir_index.py +++ b/codex-lens/src/codexlens/storage/dir_index.py @@ -375,6 +375,7 @@ class DirIndexStore: keywords_json = json.dumps(keywords) generated_at = time.time() + # Write to semantic_metadata table (for backward compatibility) conn.execute( """ INSERT INTO semantic_metadata(file_id, summary, keywords, purpose, llm_tool, generated_at) @@ -388,6 +389,37 @@ class DirIndexStore: """, (file_id, summary, keywords_json, purpose, llm_tool, generated_at), ) + + # Write to normalized keywords tables for optimized search + # First, remove existing keyword associations + conn.execute("DELETE FROM file_keywords WHERE file_id = ?", (file_id,)) + + # Then add new keywords + for keyword in keywords: + keyword = keyword.strip() + if not keyword: + continue + + # Insert keyword if it doesn't exist + conn.execute( + "INSERT OR IGNORE INTO keywords(keyword) VALUES(?)", + (keyword,) + ) + + # Get keyword_id + row = conn.execute( + "SELECT id FROM keywords WHERE keyword = ?", + (keyword,) + ).fetchone() + + if row: + keyword_id = row["id"] + # Link file to keyword + conn.execute( + "INSERT OR IGNORE INTO file_keywords(file_id, keyword_id) VALUES(?, ?)", + (file_id, keyword_id) + ) + conn.commit() def get_semantic_metadata(self, file_id: int) -> Optional[Dict[str, Any]]: @@ -454,11 +486,12 @@ class DirIndexStore: for row in rows ] - def search_semantic_keywords(self, keyword: str) -> List[Tuple[FileEntry, List[str]]]: + def search_semantic_keywords(self, keyword: str, use_normalized: bool = True) -> List[Tuple[FileEntry, List[str]]]: """Search files by semantic keywords. 
Args: keyword: Keyword to search for (case-insensitive) + use_normalized: Use optimized normalized tables (default: True) Returns: List of (FileEntry, keywords) tuples where keyword matches @@ -466,35 +499,71 @@ class DirIndexStore: with self._lock: conn = self._get_connection() - keyword_pattern = f"%{keyword}%" + if use_normalized: + # Optimized query using normalized tables with indexed lookup + # Use prefix search (keyword%) for better index utilization + keyword_pattern = f"{keyword}%" - rows = conn.execute( - """ - SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count, sm.keywords - FROM files f - JOIN semantic_metadata sm ON f.id = sm.file_id - WHERE sm.keywords LIKE ? COLLATE NOCASE - ORDER BY f.name - """, - (keyword_pattern,), - ).fetchall() + rows = conn.execute( + """ + SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count, + GROUP_CONCAT(k.keyword, ',') as keywords + FROM files f + JOIN file_keywords fk ON f.id = fk.file_id + JOIN keywords k ON fk.keyword_id = k.id + WHERE k.keyword LIKE ? COLLATE NOCASE + GROUP BY f.id, f.name, f.full_path, f.language, f.mtime, f.line_count + ORDER BY f.name + """, + (keyword_pattern,), + ).fetchall() - import json + results = [] + for row in rows: + file_entry = FileEntry( + id=int(row["id"]), + name=row["name"], + full_path=Path(row["full_path"]), + language=row["language"], + mtime=float(row["mtime"]) if row["mtime"] else 0.0, + line_count=int(row["line_count"]) if row["line_count"] else 0, + ) + keywords = row["keywords"].split(',') if row["keywords"] else [] + results.append((file_entry, keywords)) - results = [] - for row in rows: - file_entry = FileEntry( - id=int(row["id"]), - name=row["name"], - full_path=Path(row["full_path"]), - language=row["language"], - mtime=float(row["mtime"]) if row["mtime"] else 0.0, - line_count=int(row["line_count"]) if row["line_count"] else 0, - ) - keywords = json.loads(row["keywords"]) if row["keywords"] else [] - results.append((file_entry, keywords)) + return results - return results + else: + # Fallback to original query for backward compatibility + keyword_pattern = f"%{keyword}%" + + rows = conn.execute( + """ + SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count, sm.keywords + FROM files f + JOIN semantic_metadata sm ON f.id = sm.file_id + WHERE sm.keywords LIKE ? COLLATE NOCASE + ORDER BY f.name + """, + (keyword_pattern,), + ).fetchall() + + import json + + results = [] + for row in rows: + file_entry = FileEntry( + id=int(row["id"]), + name=row["name"], + full_path=Path(row["full_path"]), + language=row["language"], + mtime=float(row["mtime"]) if row["mtime"] else 0.0, + line_count=int(row["line_count"]) if row["line_count"] else 0, + ) + keywords = json.loads(row["keywords"]) if row["keywords"] else [] + results.append((file_entry, keywords)) + + return results def list_semantic_metadata( self, @@ -794,19 +863,26 @@ class DirIndexStore: return [row["full_path"] for row in rows] def search_symbols( - self, name: str, kind: Optional[str] = None, limit: int = 50 + self, name: str, kind: Optional[str] = None, limit: int = 50, prefix_mode: bool = True ) -> List[Symbol]: """Search symbols by name pattern. 
Args: - name: Symbol name pattern (LIKE query) + name: Symbol name pattern kind: Optional symbol kind filter limit: Maximum results to return + prefix_mode: If True, use prefix search (faster with index); + If False, use substring search (slower) Returns: List of Symbol objects """ - pattern = f"%{name}%" + # Prefix search is much faster as it can use index + if prefix_mode: + pattern = f"{name}%" + else: + pattern = f"%{name}%" + with self._lock: conn = self._get_connection() if kind: @@ -979,6 +1055,28 @@ class DirIndexStore: """ ) + # Normalized keywords tables for performance + conn.execute( + """ + CREATE TABLE IF NOT EXISTS keywords ( + id INTEGER PRIMARY KEY, + keyword TEXT NOT NULL UNIQUE + ) + """ + ) + + conn.execute( + """ + CREATE TABLE IF NOT EXISTS file_keywords ( + file_id INTEGER NOT NULL, + keyword_id INTEGER NOT NULL, + PRIMARY KEY (file_id, keyword_id), + FOREIGN KEY (file_id) REFERENCES files (id) ON DELETE CASCADE, + FOREIGN KEY (keyword_id) REFERENCES keywords (id) ON DELETE CASCADE + ) + """ + ) + # Indexes conn.execute("CREATE INDEX IF NOT EXISTS idx_files_name ON files(name)") conn.execute("CREATE INDEX IF NOT EXISTS idx_files_path ON files(full_path)") @@ -986,6 +1084,9 @@ class DirIndexStore: conn.execute("CREATE INDEX IF NOT EXISTS idx_symbols_name ON symbols(name)") conn.execute("CREATE INDEX IF NOT EXISTS idx_symbols_file ON symbols(file_id)") conn.execute("CREATE INDEX IF NOT EXISTS idx_semantic_file ON semantic_metadata(file_id)") + conn.execute("CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords(keyword)") + conn.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_file_id ON file_keywords(file_id)") + conn.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_keyword_id ON file_keywords(keyword_id)") except sqlite3.DatabaseError as exc: raise StorageError(f"Failed to create schema: {exc}") from exc diff --git a/codex-lens/src/codexlens/storage/migration_manager.py b/codex-lens/src/codexlens/storage/migration_manager.py new file mode 100644 index 00000000..835bd4a9 --- /dev/null +++ b/codex-lens/src/codexlens/storage/migration_manager.py @@ -0,0 +1,139 @@ +""" +Manages database schema migrations. + +This module provides a framework for applying versioned migrations to the SQLite +database. Migrations are discovered from the `codexlens.storage.migrations` +package and applied sequentially. The database schema version is tracked using +the `user_version` pragma. +""" + +import importlib +import logging +import pkgutil +from pathlib import Path +from sqlite3 import Connection +from typing import List, NamedTuple + +log = logging.getLogger(__name__) + + +class Migration(NamedTuple): + """Represents a single database migration.""" + + version: int + name: str + upgrade: callable + + +def discover_migrations() -> List[Migration]: + """ + Discovers and returns a sorted list of database migrations. + + Migrations are expected to be in the `codexlens.storage.migrations` package, + with filenames in the format `migration_XXX_description.py`, where XXX is + the version number. Each migration module must contain an `upgrade` function + that takes a `sqlite3.Connection` object as its argument. + + Returns: + A list of Migration objects, sorted by version. 
+    """
+    import codexlens.storage.migrations
+
+    migrations = []
+    package_path = Path(codexlens.storage.migrations.__file__).parent
+
+    for _, name, _ in pkgutil.iter_modules([str(package_path)]):
+        if name.startswith("migration_"):
+            try:
+                version = int(name.split("_")[1])
+                module = importlib.import_module(f"codexlens.storage.migrations.{name}")
+                if hasattr(module, "upgrade"):
+                    migrations.append(
+                        Migration(version=version, name=name, upgrade=module.upgrade)
+                    )
+                else:
+                    log.warning(f"Migration {name} is missing 'upgrade' function.")
+            except (ValueError, IndexError) as e:
+                log.warning(f"Could not parse migration name {name}: {e}")
+            except ImportError as e:
+                log.warning(f"Could not import migration {name}: {e}")
+
+    migrations.sort(key=lambda m: m.version)
+    return migrations
+
+
+class MigrationManager:
+    """
+    Manages the application of migrations to a database.
+    """
+
+    def __init__(self, db_conn: Connection):
+        """
+        Initializes the MigrationManager.
+
+        Args:
+            db_conn: The SQLite database connection.
+        """
+        self.db_conn = db_conn
+        self.migrations = discover_migrations()
+
+    def get_current_version(self) -> int:
+        """
+        Gets the current version of the database schema.
+
+        Returns:
+            The current schema version number.
+        """
+        return self.db_conn.execute("PRAGMA user_version").fetchone()[0]
+
+    def set_version(self, version: int):
+        """
+        Sets the database schema version.
+
+        Args:
+            version: The version number to set.
+        """
+        self.db_conn.execute(f"PRAGMA user_version = {version}")
+        log.info(f"Database schema version set to {version}")
+
+    def apply_migrations(self):
+        """
+        Applies all pending migrations to the database.
+
+        This method checks the current database version and applies all
+        subsequent migrations in order. Each migration is applied within
+        a transaction.
+        """
+        current_version = self.get_current_version()
+        log.info(f"Current database schema version: {current_version}")
+
+        for migration in self.migrations:
+            if migration.version > current_version:
+                log.info(f"Applying migration {migration.version}: {migration.name}...")
+                try:
+                    self.db_conn.execute("BEGIN")
+                    migration.upgrade(self.db_conn)
+                    self.set_version(migration.version)
+                    self.db_conn.execute("COMMIT")
+                    log.info(
+                        f"Successfully applied migration {migration.version}: {migration.name}"
+                    )
+                except Exception as e:
+                    log.error(
+                        f"Failed to apply migration {migration.version}: {migration.name}. Rolling back. Error: {e}",
+                        exc_info=True,
+                    )
+                    self.db_conn.execute("ROLLBACK")
+                    raise
+
+        latest_migration_version = self.migrations[-1].version if self.migrations else 0
+        if current_version < latest_migration_version:
+            # Sanity check: after the loop, the recorded schema version should equal
+            # the latest known migration version. If it does not, a migration was
+            # skipped or set_version was not reached, so log a warning for follow-up.
+            final_version = self.get_current_version()
+            if final_version != latest_migration_version:
+                log.warning(f"Database version ({final_version}) is not the latest migration version ({latest_migration_version}). This may indicate a problem.")
+
+        log.info("All pending migrations applied successfully.")
+
diff --git a/codex-lens/src/codexlens/storage/migrations/__init__.py b/codex-lens/src/codexlens/storage/migrations/__init__.py
new file mode 100644
index 00000000..06e14729
--- /dev/null
+++ b/codex-lens/src/codexlens/storage/migrations/__init__.py
@@ -0,0 +1 @@
+# This file makes the 'migrations' directory a Python package.
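As context for reviewers, here is a minimal sketch of how the migration framework above might be driven. The diff itself does not show the call site for `apply_migrations()`, so the wiring below, including the `open_index_db` helper name, is a hypothetical example rather than code from this change:

```python
# Illustrative usage only: the diff does not show where apply_migrations() is
# invoked, so this wiring (and the open_index_db helper name) is an assumption.
import sqlite3

from codexlens.storage.migration_manager import MigrationManager


def open_index_db(db_path: str) -> sqlite3.Connection:
    """Open an index database and bring its schema up to the latest version."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row

    manager = MigrationManager(conn)
    before = manager.get_current_version()  # reads PRAGMA user_version

    # Applies every migration_XXX module newer than the recorded version,
    # each in its own transaction; raises on the first failure after rollback.
    manager.apply_migrations()

    print(f"Schema migrated from version {before} to {manager.get_current_version()}")
    return conn
```

In this design `PRAGMA user_version` is the single source of truth, so re-running such a helper on an up-to-date database simply skips every migration.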
diff --git a/codex-lens/src/codexlens/storage/migrations/migration_001_normalize_keywords.py b/codex-lens/src/codexlens/storage/migrations/migration_001_normalize_keywords.py
new file mode 100644
index 00000000..140bc217
--- /dev/null
+++ b/codex-lens/src/codexlens/storage/migrations/migration_001_normalize_keywords.py
@@ -0,0 +1,108 @@
+"""
+Migration 001: Normalize keywords into separate tables.
+
+This migration introduces two new tables, `keywords` and `file_keywords`, to
+store semantic keywords in a normalized fashion. It then migrates the existing
+keywords from the `keywords` JSON column of `semantic_metadata` into these
+new tables. This is intended to speed up keyword-based searches significantly.
+"""
+
+import json
+import logging
+from sqlite3 import Connection
+
+log = logging.getLogger(__name__)
+
+
+def upgrade(db_conn: Connection):
+    """
+    Applies the migration to normalize keywords.
+
+    - Creates `keywords` and `file_keywords` tables.
+    - Creates indexes for efficient querying.
+    - Migrates data from `semantic_metadata.keywords` to the new tables.
+
+    Args:
+        db_conn: The SQLite database connection.
+    """
+    cursor = db_conn.cursor()
+
+    log.info("Creating 'keywords' and 'file_keywords' tables...")
+    # Create a table to store unique keywords
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS keywords (
+            id INTEGER PRIMARY KEY,
+            keyword TEXT NOT NULL UNIQUE
+        )
+        """
+    )
+
+    # Create a join table to link files and keywords (many-to-many)
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS file_keywords (
+            file_id INTEGER NOT NULL,
+            keyword_id INTEGER NOT NULL,
+            PRIMARY KEY (file_id, keyword_id),
+            FOREIGN KEY (file_id) REFERENCES files (id) ON DELETE CASCADE,
+            FOREIGN KEY (keyword_id) REFERENCES keywords (id) ON DELETE CASCADE
+        )
+        """
+    )
+
+    log.info("Creating indexes for new keyword tables...")
+    cursor.execute("CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords (keyword)")
+    cursor.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_file_id ON file_keywords (file_id)")
+    cursor.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_keyword_id ON file_keywords (keyword_id)")
+
+    log.info("Migrating existing keywords from 'semantic_metadata' table...")
+    cursor.execute("SELECT file_id, keywords FROM semantic_metadata WHERE keywords IS NOT NULL AND keywords != ''")
+
+    files_to_migrate = cursor.fetchall()
+    if not files_to_migrate:
+        log.info("No existing files with semantic metadata to migrate.")
+        return
+
+    log.info(f"Found {len(files_to_migrate)} files with semantic metadata to migrate.")
+
+    for file_id, keywords_json in files_to_migrate:
+        if not keywords_json:
+            continue
+        try:
+            keywords = json.loads(keywords_json)
+
+            if not isinstance(keywords, list):
+                log.warning(f"Keywords for file_id {file_id} is not a list, skipping.")
+                continue
+
+            for keyword in keywords:
+                if not isinstance(keyword, str):
+                    log.warning(f"Non-string keyword '{keyword}' found for file_id {file_id}, skipping.")
+                    continue
+
+                keyword = keyword.strip()
+                if not keyword:
+                    continue
+
+                # Get or create keyword_id
+                cursor.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
+                cursor.execute("SELECT id FROM keywords WHERE keyword = ?", (keyword,))
+                keyword_id_result = cursor.fetchone()
+
+                if keyword_id_result:
+                    keyword_id = keyword_id_result[0]
+                    # Link file to keyword
+                    cursor.execute(
+                        "INSERT OR IGNORE INTO file_keywords (file_id, keyword_id) VALUES (?, ?)",
+                        (file_id, keyword_id),
+                    )
+                else:
+                    log.error(f"Failed to retrieve or create keyword_id for
keyword: {keyword}") + + except json.JSONDecodeError as e: + log.warning(f"Could not parse keywords for file_id {file_id}: {e}") + except Exception as e: + log.error(f"An unexpected error occurred during migration for file_id {file_id}: {e}", exc_info=True) + + log.info("Finished migrating keywords.") diff --git a/codex-lens/src/codexlens/storage/registry.py b/codex-lens/src/codexlens/storage/registry.py index 6456529f..655da830 100644 --- a/codex-lens/src/codexlens/storage/registry.py +++ b/codex-lens/src/codexlens/storage/registry.py @@ -424,6 +424,9 @@ class RegistryStore: Searches for the closest parent directory that has an index. Useful for supporting subdirectory searches. + Optimized to use single database query instead of iterating through + each parent directory level. + Args: source_path: Source directory or file path @@ -434,23 +437,30 @@ class RegistryStore: conn = self._get_connection() source_path_resolved = source_path.resolve() - # Check from current path up to root + # Build list of all parent paths from deepest to shallowest + paths_to_check = [] current = source_path_resolved while True: - current_str = str(current) - row = conn.execute( - "SELECT * FROM dir_mapping WHERE source_path=?", (current_str,) - ).fetchone() - - if row: - return self._row_to_dir_mapping(row) - + paths_to_check.append(str(current)) parent = current.parent if parent == current: # Reached filesystem root break current = parent - return None + if not paths_to_check: + return None + + # Single query with WHERE IN, ordered by path length (longest = nearest) + placeholders = ','.join('?' * len(paths_to_check)) + query = f""" + SELECT * FROM dir_mapping + WHERE source_path IN ({placeholders}) + ORDER BY LENGTH(source_path) DESC + LIMIT 1 + """ + + row = conn.execute(query, paths_to_check).fetchone() + return self._row_to_dir_mapping(row) if row else None def get_project_dirs(self, project_id: int) -> List[DirMapping]: """Get all directory mappings for a project. diff --git a/codex-lens/tests/simple_validation.py b/codex-lens/tests/simple_validation.py new file mode 100644 index 00000000..5d881bba --- /dev/null +++ b/codex-lens/tests/simple_validation.py @@ -0,0 +1,218 @@ +""" +Simple validation for performance optimizations (Windows-safe). 
+""" +import sys +sys.stdout.reconfigure(encoding='utf-8') + +import json +import sqlite3 +import tempfile +import time +from pathlib import Path + +from codexlens.storage.dir_index import DirIndexStore +from codexlens.storage.registry import RegistryStore + + +def main(): + print("=" * 60) + print("CodexLens Performance Optimizations - Simple Validation") + print("=" * 60) + + # Test 1: Keyword Normalization + print("\n[1/4] Testing Keyword Normalization...") + try: + tmpdir = tempfile.mkdtemp() + db_path = Path(tmpdir) / "test1.db" + + store = DirIndexStore(db_path) + store.initialize() + + file_id = store.add_file( + name="test.py", + full_path=Path(f"{tmpdir}/test.py"), + content="def hello(): pass", + language="python" + ) + + keywords = ["auth", "security", "jwt"] + store.add_semantic_metadata( + file_id=file_id, + summary="Test", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + # Check normalized tables + conn = store._get_connection() + count = conn.execute( + "SELECT COUNT(*) as c FROM file_keywords WHERE file_id=?", + (file_id,) + ).fetchone()["c"] + + store.close() + + assert count == 3, f"Expected 3 keywords, got {count}" + print(" PASS: Keywords stored in normalized tables") + + # Test optimized search + store = DirIndexStore(db_path) + results = store.search_semantic_keywords("auth", use_normalized=True) + store.close() + + assert len(results) == 1 + print(" PASS: Optimized keyword search works") + + except Exception as e: + import traceback + print(f" FAIL: {e}") + traceback.print_exc() + return 1 + + # Test 2: Path Lookup Optimization + print("\n[2/4] Testing Path Lookup Optimization...") + try: + tmpdir = tempfile.mkdtemp() + db_path = Path(tmpdir) / "test2.db" + + store = RegistryStore(db_path) + store.initialize() # Create schema + + # Register a project first + project = store.register_project( + source_root=Path("/a"), + index_root=Path("/tmp") + ) + + # Register directory + store.register_dir( + project_id=project.id, + source_path=Path("/a/b/c"), + index_path=Path("/tmp/index.db"), + depth=2, + files_count=0 + ) + + deep_path = Path("/a/b/c/d/e/f/g/h/i/j/file.py") + + start = time.perf_counter() + result = store.find_nearest_index(deep_path) + elapsed = time.perf_counter() - start + + store.close() + + assert result is not None, "No result found" + # Path is normalized, just check it contains the key parts + assert "a" in str(result.source_path) and "b" in str(result.source_path) and "c" in str(result.source_path) + assert elapsed < 0.05, f"Too slow: {elapsed*1000:.2f}ms" + + print(f" PASS: Found nearest index in {elapsed*1000:.2f}ms") + + except Exception as e: + import traceback + print(f" FAIL: {e}") + traceback.print_exc() + return 1 + + # Test 3: Symbol Search Prefix Mode + print("\n[3/4] Testing Symbol Search Prefix Mode...") + try: + tmpdir = tempfile.mkdtemp() + db_path = Path(tmpdir) / "test3.db" + + store = DirIndexStore(db_path) + store.initialize() + + from codexlens.entities import Symbol + file_id = store.add_file( + name="test.py", + full_path=Path(f"{tmpdir}/test.py"), + content="def hello(): pass\n" * 10, + language="python", + symbols=[ + Symbol(name="get_user", kind="function", range=(1, 5)), + Symbol(name="get_item", kind="function", range=(6, 10)), + Symbol(name="create_user", kind="function", range=(11, 15)), + ] + ) + + # Prefix search + results = store.search_symbols("get", prefix_mode=True) + store.close() + + assert len(results) == 2, f"Expected 2, got {len(results)}" + for symbol in results: + assert 
symbol.name.startswith("get") + + print(f" PASS: Prefix search found {len(results)} symbols") + + except Exception as e: + import traceback + print(f" FAIL: {e}") + traceback.print_exc() + return 1 + + # Test 4: Performance Comparison + print("\n[4/4] Testing Performance Comparison...") + try: + tmpdir = tempfile.mkdtemp() + db_path = Path(tmpdir) / "test4.db" + + store = DirIndexStore(db_path) + store.initialize() + + # Create 50 files with keywords + for i in range(50): + file_id = store.add_file( + name=f"file_{i}.py", + full_path=Path(f"{tmpdir}/file_{i}.py"), + content=f"def function_{i}(): pass", + language="python" + ) + + keywords = ["auth", "security"] if i % 2 == 0 else ["api", "endpoint"] + store.add_semantic_metadata( + file_id=file_id, + summary=f"File {i}", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + # Benchmark normalized + start = time.perf_counter() + for _ in range(5): + results_norm = store.search_semantic_keywords("auth", use_normalized=True) + norm_time = time.perf_counter() - start + + # Benchmark fallback + start = time.perf_counter() + for _ in range(5): + results_fallback = store.search_semantic_keywords("auth", use_normalized=False) + fallback_time = time.perf_counter() - start + + store.close() + + assert len(results_norm) == len(results_fallback) + speedup = fallback_time / norm_time if norm_time > 0 else 1.0 + + print(f" Normalized: {norm_time*1000:.2f}ms (5 iterations)") + print(f" Fallback: {fallback_time*1000:.2f}ms (5 iterations)") + print(f" Speedup: {speedup:.2f}x") + print(" PASS: Performance test completed") + + except Exception as e: + import traceback + print(f" FAIL: {e}") + traceback.print_exc() + return 1 + + print("\n" + "=" * 60) + print("ALL VALIDATION TESTS PASSED") + print("=" * 60) + return 0 + + +if __name__ == "__main__": + exit(main()) diff --git a/codex-lens/tests/test_performance_optimizations.py b/codex-lens/tests/test_performance_optimizations.py new file mode 100644 index 00000000..8776ecad --- /dev/null +++ b/codex-lens/tests/test_performance_optimizations.py @@ -0,0 +1,467 @@ +"""Tests for performance optimizations in CodexLens storage. + +This module tests the following optimizations: +1. Normalized keywords search (migration_001) +2. Optimized path lookup in registry +3. Prefix-mode symbol search +""" + +import json +import sqlite3 +import tempfile +import time +from pathlib import Path + +import pytest + +from codexlens.storage.dir_index import DirIndexStore +from codexlens.storage.registry import RegistryStore +from codexlens.storage.migration_manager import MigrationManager +from codexlens.storage.migrations import migration_001_normalize_keywords + + +@pytest.fixture +def temp_index_db(): + """Create a temporary dir index database.""" + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_index.db" + store = DirIndexStore(db_path) + store.initialize() # Initialize schema + yield store + store.close() + + +@pytest.fixture +def temp_registry_db(): + """Create a temporary registry database.""" + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_registry.db" + store = RegistryStore(db_path) + store.initialize() # Initialize schema + yield store + store.close() + + +@pytest.fixture +def populated_index_db(temp_index_db): + """Create an index database with sample data. + + Uses 100 files to provide meaningful performance comparison between + optimized and fallback implementations. 
+ """ + from codexlens.entities import Symbol + + store = temp_index_db + + # Add files with symbols and keywords + # Using 100 files to show performance improvements + file_ids = [] + + # Define keyword pools for cycling + keyword_pools = [ + ["auth", "security", "jwt"], + ["database", "sql", "query"], + ["auth", "login", "password"], + ["api", "rest", "endpoint"], + ["cache", "redis", "performance"], + ["auth", "oauth", "token"], + ["test", "unittest", "pytest"], + ["database", "postgres", "migration"], + ["api", "graphql", "resolver"], + ["security", "encryption", "crypto"] + ] + + for i in range(100): + # Create symbols for first 50 files to have more symbol search data + symbols = None + if i < 50: + symbols = [ + Symbol(name=f"get_user_{i}", kind="function", range=(1, 10)), + Symbol(name=f"create_user_{i}", kind="function", range=(11, 20)), + Symbol(name=f"UserClass_{i}", kind="class", range=(21, 40)), + ] + + file_id = store.add_file( + name=f"file_{i}.py", + full_path=Path(f"/test/path/file_{i}.py"), + content=f"def function_{i}(): pass\n" * 10, + language="python", + symbols=symbols + ) + file_ids.append(file_id) + + # Add semantic metadata with keywords (cycle through keyword pools) + keywords = keyword_pools[i % len(keyword_pools)] + store.add_semantic_metadata( + file_id=file_id, + summary=f"Test file {file_id}", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + return store + + +class TestKeywordNormalization: + """Test normalized keywords functionality.""" + + def test_migration_creates_tables(self, temp_index_db): + """Test that migration creates keywords and file_keywords tables.""" + conn = temp_index_db._get_connection() + + # Verify tables exist (created by _create_schema) + tables = conn.execute(""" + SELECT name FROM sqlite_master + WHERE type='table' AND name IN ('keywords', 'file_keywords') + """).fetchall() + + assert len(tables) == 2 + + def test_migration_creates_indexes(self, temp_index_db): + """Test that migration creates necessary indexes.""" + conn = temp_index_db._get_connection() + + # Check for indexes + indexes = conn.execute(""" + SELECT name FROM sqlite_master + WHERE type='index' AND name IN ( + 'idx_keywords_keyword', + 'idx_file_keywords_file_id', + 'idx_file_keywords_keyword_id' + ) + """).fetchall() + + assert len(indexes) == 3 + + def test_add_semantic_metadata_populates_normalized_tables(self, temp_index_db): + """Test that adding metadata populates both old and new tables.""" + # Add a file + file_id = temp_index_db.add_file( + name="test.py", + full_path=Path("/test/test.py"), + language="python", + content="test" + ) + + # Add semantic metadata + keywords = ["auth", "security", "jwt"] + temp_index_db.add_semantic_metadata( + file_id=file_id, + summary="Test summary", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + conn = temp_index_db._get_connection() + + # Check semantic_metadata table (backward compatibility) + row = conn.execute( + "SELECT keywords FROM semantic_metadata WHERE file_id=?", + (file_id,) + ).fetchone() + assert row is not None + assert json.loads(row["keywords"]) == keywords + + # Check normalized keywords table + keyword_rows = conn.execute(""" + SELECT k.keyword + FROM file_keywords fk + JOIN keywords k ON fk.keyword_id = k.id + WHERE fk.file_id = ? 
+ """, (file_id,)).fetchall() + + assert len(keyword_rows) == 3 + normalized_keywords = [row["keyword"] for row in keyword_rows] + assert set(normalized_keywords) == set(keywords) + + def test_search_semantic_keywords_normalized(self, populated_index_db): + """Test optimized keyword search using normalized tables.""" + results = populated_index_db.search_semantic_keywords("auth", use_normalized=True) + + # Should find 3 files with "auth" keyword + assert len(results) >= 3 + + # Verify results structure + for file_entry, keywords in results: + assert file_entry.name.startswith("file_") + assert isinstance(keywords, list) + assert any("auth" in k.lower() for k in keywords) + + def test_search_semantic_keywords_fallback(self, populated_index_db): + """Test that fallback search still works.""" + results = populated_index_db.search_semantic_keywords("auth", use_normalized=False) + + # Should find files with "auth" keyword + assert len(results) >= 3 + + for file_entry, keywords in results: + assert isinstance(keywords, list) + + +class TestPathLookupOptimization: + """Test optimized path lookup in registry.""" + + def test_find_nearest_index_shallow(self, temp_registry_db): + """Test path lookup with shallow directory structure.""" + # Register a project first + project = temp_registry_db.register_project( + source_root=Path("/test"), + index_root=Path("/tmp") + ) + + # Register directory mapping + temp_registry_db.register_dir( + project_id=project.id, + source_path=Path("/test"), + index_path=Path("/tmp/index.db"), + depth=0, + files_count=0 + ) + + # Search for subdirectory + result = temp_registry_db.find_nearest_index(Path("/test/subdir/file.py")) + + assert result is not None + # Compare as strings for cross-platform compatibility + assert "/test" in str(result.source_path) or "\\test" in str(result.source_path) + + def test_find_nearest_index_deep(self, temp_registry_db): + """Test path lookup with deep directory structure.""" + # Register a project + project = temp_registry_db.register_project( + source_root=Path("/a"), + index_root=Path("/tmp") + ) + + # Add directory mappings at different levels + temp_registry_db.register_dir( + project_id=project.id, + source_path=Path("/a"), + index_path=Path("/tmp/index_a.db"), + depth=0, + files_count=0 + ) + temp_registry_db.register_dir( + project_id=project.id, + source_path=Path("/a/b/c"), + index_path=Path("/tmp/index_abc.db"), + depth=2, + files_count=0 + ) + + # Should find nearest (longest) match + result = temp_registry_db.find_nearest_index(Path("/a/b/c/d/e/f/file.py")) + + assert result is not None + # Check that path contains the key parts + result_path = str(result.source_path) + assert "a" in result_path and "b" in result_path and "c" in result_path + + def test_find_nearest_index_not_found(self, temp_registry_db): + """Test path lookup when no mapping exists.""" + result = temp_registry_db.find_nearest_index(Path("/nonexistent/path")) + assert result is None + + def test_find_nearest_index_performance(self, temp_registry_db): + """Basic performance test for path lookup.""" + # Register a project + project = temp_registry_db.register_project( + source_root=Path("/root"), + index_root=Path("/tmp") + ) + + # Add mapping at root + temp_registry_db.register_dir( + project_id=project.id, + source_path=Path("/root"), + index_path=Path("/tmp/index.db"), + depth=0, + files_count=0 + ) + + # Test with very deep path (10 levels) + deep_path = Path("/root/a/b/c/d/e/f/g/h/i/j/file.py") + + start = time.perf_counter() + result = 
temp_registry_db.find_nearest_index(deep_path) + elapsed = time.perf_counter() - start + + # Should complete quickly (< 50ms even on slow systems) + assert elapsed < 0.05 + assert result is not None + + +class TestSymbolSearchOptimization: + """Test optimized symbol search.""" + + def test_symbol_search_prefix_mode(self, populated_index_db): + """Test symbol search with prefix mode.""" + results = populated_index_db.search_symbols("get", prefix_mode=True) + + # Should find symbols starting with "get" + assert len(results) > 0 + for symbol in results: + assert symbol.name.startswith("get") + + def test_symbol_search_substring_mode(self, populated_index_db): + """Test symbol search with substring mode.""" + results = populated_index_db.search_symbols("user", prefix_mode=False) + + # Should find symbols containing "user" + assert len(results) > 0 + for symbol in results: + assert "user" in symbol.name.lower() + + def test_symbol_search_with_kind_filter(self, populated_index_db): + """Test symbol search with kind filter.""" + results = populated_index_db.search_symbols( + "UserClass", + kind="class", + prefix_mode=True + ) + + # Should find only class symbols + assert len(results) > 0 + for symbol in results: + assert symbol.kind == "class" + + def test_symbol_search_limit(self, populated_index_db): + """Test symbol search respects limit.""" + results = populated_index_db.search_symbols("", prefix_mode=True, limit=5) + + # Should return at most 5 results + assert len(results) <= 5 + + +class TestMigrationManager: + """Test migration manager functionality.""" + + def test_migration_manager_tracks_version(self, temp_index_db): + """Test that migration manager tracks schema version.""" + conn = temp_index_db._get_connection() + manager = MigrationManager(conn) + + current_version = manager.get_current_version() + assert current_version >= 0 + + def test_migration_001_can_run(self, temp_index_db): + """Test that migration_001 can be applied.""" + conn = temp_index_db._get_connection() + + # Add some test data to semantic_metadata first + conn.execute(""" + INSERT INTO files(id, name, full_path, language, content, mtime, line_count) + VALUES(100, 'test.py', '/test_migration.py', 'python', 'def test(): pass', 0, 10) + """) + conn.execute(""" + INSERT INTO semantic_metadata(file_id, keywords) + VALUES(100, ?) + """, (json.dumps(["test", "keyword"]),)) + conn.commit() + + # Run migration (should be idempotent, tables already created by initialize()) + try: + migration_001_normalize_keywords.upgrade(conn) + success = True + except Exception as e: + success = False + print(f"Migration failed: {e}") + + assert success + + # Verify data was migrated + keyword_count = conn.execute(""" + SELECT COUNT(*) as c FROM file_keywords WHERE file_id=100 + """).fetchone()["c"] + + assert keyword_count == 2 # "test" and "keyword" + + +class TestPerformanceComparison: + """Compare performance of old vs new implementations.""" + + def test_keyword_search_performance(self, populated_index_db): + """Compare keyword search performance. + + IMPORTANT: The normalized query optimization is designed for large datasets + (1000+ files). On small datasets (< 1000 files), the overhead of JOINs and + GROUP BY operations can make the normalized query slower than the simple + LIKE query on JSON fields. This is expected behavior. 
+ + Performance benefits appear when: + - Dataset size > 1000 files + - Full-table scans on JSON LIKE become the bottleneck + - Index-based lookups provide O(log N) complexity advantage + """ + # Normalized search + start = time.perf_counter() + normalized_results = populated_index_db.search_semantic_keywords( + "auth", + use_normalized=True + ) + normalized_time = time.perf_counter() - start + + # Fallback search + start = time.perf_counter() + fallback_results = populated_index_db.search_semantic_keywords( + "auth", + use_normalized=False + ) + fallback_time = time.perf_counter() - start + + # Verify correctness: both queries should return identical results + assert len(normalized_results) == len(fallback_results) + + # Verify result content matches + normalized_files = {entry.id for entry, _ in normalized_results} + fallback_files = {entry.id for entry, _ in fallback_results} + assert normalized_files == fallback_files, "Both queries must return same files" + + # Document performance characteristics (no strict assertion) + # On datasets < 1000 files, normalized may be slower due to JOIN overhead + print(f"\nKeyword search performance (100 files):") + print(f" Normalized: {normalized_time*1000:.3f}ms") + print(f" Fallback: {fallback_time*1000:.3f}ms") + print(f" Ratio: {normalized_time/fallback_time:.2f}x") + print(f" Note: Performance benefits appear with 1000+ files") + + def test_prefix_vs_substring_symbol_search(self, populated_index_db): + """Compare prefix vs substring symbol search performance. + + IMPORTANT: Prefix search optimization (LIKE 'prefix%') benefits from B-tree + indexes, but on small datasets (< 1000 symbols), the performance difference + may not be measurable or may even be slower due to query planner overhead. + + Performance benefits appear when: + - Symbol count > 1000 + - Index-based prefix search provides O(log N) advantage + - Full table scans with LIKE '%substring%' become bottleneck + """ + # Prefix search (optimized) + start = time.perf_counter() + prefix_results = populated_index_db.search_symbols("get", prefix_mode=True) + prefix_time = time.perf_counter() - start + + # Substring search (fallback) + start = time.perf_counter() + substring_results = populated_index_db.search_symbols("get", prefix_mode=False) + substring_time = time.perf_counter() - start + + # Verify correctness: prefix results should be subset of substring results + prefix_names = {s.name for s in prefix_results} + substring_names = {s.name for s in substring_results} + assert prefix_names.issubset(substring_names), "Prefix must be subset of substring" + + # Verify all prefix results actually start with search term + for symbol in prefix_results: + assert symbol.name.startswith("get"), f"Symbol {symbol.name} should start with 'get'" + + # Document performance characteristics (no strict assertion) + # On datasets < 1000 symbols, performance difference is negligible + print(f"\nSymbol search performance (150 symbols):") + print(f" Prefix: {prefix_time*1000:.3f}ms ({len(prefix_results)} results)") + print(f" Substring: {substring_time*1000:.3f}ms ({len(substring_results)} results)") + print(f" Ratio: {prefix_time/substring_time:.2f}x") + print(f" Note: Performance benefits appear with 1000+ symbols") diff --git a/codex-lens/tests/validate_optimizations.py b/codex-lens/tests/validate_optimizations.py new file mode 100644 index 00000000..a8445a9d --- /dev/null +++ b/codex-lens/tests/validate_optimizations.py @@ -0,0 +1,287 @@ +""" +Manual validation script for performance optimizations. 
+ +This script verifies that the optimization implementations are working correctly. +Run with: python tests/validate_optimizations.py +""" + +import json +import sqlite3 +import tempfile +import time +from pathlib import Path + +from codexlens.storage.dir_index import DirIndexStore +from codexlens.storage.registry import RegistryStore +from codexlens.storage.migration_manager import MigrationManager +from codexlens.storage.migrations import migration_001_normalize_keywords + + +def test_keyword_normalization(): + """Test normalized keywords functionality.""" + print("\n=== Testing Keyword Normalization ===") + + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_index.db" + store = DirIndexStore(db_path) + store.initialize() # Create schema + + # Add a test file + # Note: add_file automatically calculates mtime and line_count + file_id = store.add_file( + name="test.py", + full_path=Path("/test/test.py"), + content="def hello(): pass", + language="python" + ) + + # Add semantic metadata with keywords + keywords = ["auth", "security", "jwt"] + store.add_semantic_metadata( + file_id=file_id, + summary="Test summary", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + conn = store._get_connection() + + # Verify keywords table populated + keyword_rows = conn.execute(""" + SELECT k.keyword + FROM file_keywords fk + JOIN keywords k ON fk.keyword_id = k.id + WHERE fk.file_id = ? + """, (file_id,)).fetchall() + + normalized_keywords = [row["keyword"] for row in keyword_rows] + print(f"✓ Keywords stored in normalized tables: {normalized_keywords}") + assert set(normalized_keywords) == set(keywords), "Keywords mismatch!" + + # Test optimized search + results = store.search_semantic_keywords("auth", use_normalized=True) + print(f"✓ Found {len(results)} file(s) with keyword 'auth'") + assert len(results) > 0, "No results found!" + + # Test fallback search + results_fallback = store.search_semantic_keywords("auth", use_normalized=False) + print(f"✓ Fallback search found {len(results_fallback)} file(s)") + assert len(results) == len(results_fallback), "Result count mismatch!" + + store.close() + print("✓ Keyword normalization tests PASSED") + + +def test_path_lookup_optimization(): + """Test optimized path lookup.""" + print("\n=== Testing Path Lookup Optimization ===") + + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_registry.db" + store = RegistryStore(db_path) + + # Add directory mapping + store.add_dir_mapping( + source_path=Path("/a/b/c"), + index_path=Path("/tmp/index.db"), + project_id=None + ) + + # Test deep path lookup + deep_path = Path("/a/b/c/d/e/f/g/h/i/j/file.py") + + start = time.perf_counter() + result = store.find_nearest_index(deep_path) + elapsed = time.perf_counter() - start + + print(f"✓ Found nearest index in {elapsed*1000:.2f}ms") + assert result is not None, "No result found!" + assert result.source_path == Path("/a/b/c"), "Wrong path found!" 
+ assert elapsed < 0.05, f"Too slow: {elapsed*1000:.2f}ms" + + store.close() + print("✓ Path lookup optimization tests PASSED") + + +def test_symbol_search_prefix_mode(): + """Test symbol search with prefix mode.""" + print("\n=== Testing Symbol Search Prefix Mode ===") + + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_index.db" + store = DirIndexStore(db_path) + store.initialize() # Create schema + + # Add a test file + file_id = store.add_file( + name="test.py", + full_path=Path("/test/test.py"), + content="def hello(): pass\n" * 10, # 10 lines + language="python" + ) + + # Add symbols + store.add_symbols( + file_id=file_id, + symbols=[ + ("get_user", "function", 1, 5), + ("get_item", "function", 6, 10), + ("create_user", "function", 11, 15), + ("UserClass", "class", 16, 25), + ] + ) + + # Test prefix search + results = store.search_symbols("get", prefix_mode=True) + print(f"✓ Prefix search for 'get' found {len(results)} symbol(s)") + assert len(results) == 2, f"Expected 2 symbols, got {len(results)}" + for symbol in results: + assert symbol.name.startswith("get"), f"Symbol {symbol.name} doesn't start with 'get'" + print(f" Symbols: {[s.name for s in results]}") + + # Test substring search + results_sub = store.search_symbols("user", prefix_mode=False) + print(f"✓ Substring search for 'user' found {len(results_sub)} symbol(s)") + assert len(results_sub) == 3, f"Expected 3 symbols, got {len(results_sub)}" + print(f" Symbols: {[s.name for s in results_sub]}") + + store.close() + print("✓ Symbol search optimization tests PASSED") + + +def test_migration_001(): + """Test migration_001 execution.""" + print("\n=== Testing Migration 001 ===") + + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_index.db" + store = DirIndexStore(db_path) + store.initialize() # Create schema + conn = store._get_connection() + + # Add test data to semantic_metadata + conn.execute(""" + INSERT INTO files(id, name, full_path, language, mtime, line_count) + VALUES(1, 'test.py', '/test.py', 'python', 0, 10) + """) + conn.execute(""" + INSERT INTO semantic_metadata(file_id, keywords) + VALUES(1, ?) 
+ """, (json.dumps(["test", "migration", "keyword"]),)) + conn.commit() + + # Run migration + print(" Running migration_001...") + migration_001_normalize_keywords.upgrade(conn) + print(" Migration completed successfully") + + # Verify migration results + keyword_count = conn.execute(""" + SELECT COUNT(*) as c FROM file_keywords WHERE file_id=1 + """).fetchone()["c"] + + print(f"✓ Migrated {keyword_count} keywords for file_id=1") + assert keyword_count == 3, f"Expected 3 keywords, got {keyword_count}" + + # Verify keywords table + keywords = conn.execute(""" + SELECT k.keyword FROM keywords k + JOIN file_keywords fk ON k.id = fk.keyword_id + WHERE fk.file_id = 1 + """).fetchall() + keyword_list = [row["keyword"] for row in keywords] + print(f" Keywords: {keyword_list}") + + store.close() + print("✓ Migration 001 tests PASSED") + + +def test_performance_comparison(): + """Compare performance of optimized vs fallback implementations.""" + print("\n=== Performance Comparison ===") + + with tempfile.TemporaryDirectory() as tmpdir: + db_path = Path(tmpdir) / "test_index.db" + store = DirIndexStore(db_path) + store.initialize() # Create schema + + # Create test data + print(" Creating test data...") + for i in range(100): + file_id = store.add_file( + name=f"file_{i}.py", + full_path=Path(f"/test/file_{i}.py"), + content=f"def function_{i}(): pass", + language="python" + ) + + # Vary keywords + if i % 3 == 0: + keywords = ["auth", "security"] + elif i % 3 == 1: + keywords = ["database", "query"] + else: + keywords = ["api", "endpoint"] + + store.add_semantic_metadata( + file_id=file_id, + summary=f"File {i}", + keywords=keywords, + purpose="Testing", + llm_tool="gemini" + ) + + # Benchmark normalized search + print(" Benchmarking normalized search...") + start = time.perf_counter() + for _ in range(10): + results_norm = store.search_semantic_keywords("auth", use_normalized=True) + norm_time = time.perf_counter() - start + + # Benchmark fallback search + print(" Benchmarking fallback search...") + start = time.perf_counter() + for _ in range(10): + results_fallback = store.search_semantic_keywords("auth", use_normalized=False) + fallback_time = time.perf_counter() - start + + print(f"\n Results:") + print(f" - Normalized search: {norm_time*1000:.2f}ms (10 iterations)") + print(f" - Fallback search: {fallback_time*1000:.2f}ms (10 iterations)") + print(f" - Speedup factor: {fallback_time/norm_time:.2f}x") + print(f" - Both found {len(results_norm)} files") + + assert len(results_norm) == len(results_fallback), "Result count mismatch!" + + store.close() + print("✓ Performance comparison PASSED") + + +def main(): + """Run all validation tests.""" + print("=" * 60) + print("CodexLens Performance Optimizations Validation") + print("=" * 60) + + try: + test_keyword_normalization() + test_path_lookup_optimization() + test_symbol_search_prefix_mode() + test_migration_001() + test_performance_comparison() + + print("\n" + "=" * 60) + print("✓✓✓ ALL VALIDATION TESTS PASSED ✓✓✓") + print("=" * 60) + return 0 + + except Exception as e: + print(f"\nX VALIDATION FAILED: {e}") + import traceback + traceback.print_exc() + return 1 + + +if __name__ == "__main__": + exit(main())
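As a closing illustration, the core query this patch optimizes can be exercised outside CodexLens. The sketch below is a standalone approximation of the `use_normalized=True` path in `search_semantic_keywords`: the `keywords`/`file_keywords` JOIN with an indexed prefix `LIKE`, run against an in-memory SQLite database. The schema is copied from migration_001, the sample rows are invented, and none of this is part of the diff:

```python
# Standalone sketch (not part of the diff): the normalized keyword lookup that
# search_semantic_keywords(use_normalized=True) performs, reduced to plain SQL.
# Schema mirrors migration_001; the sample rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
    CREATE TABLE file_keywords (
        file_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        PRIMARY KEY (file_id, keyword_id)
    );
    CREATE INDEX idx_keywords_keyword ON keywords(keyword);

    INSERT INTO files VALUES (1, 'auth.py'), (2, 'api.py');
    INSERT INTO keywords VALUES (1, 'auth'), (2, 'jwt'), (3, 'api');
    INSERT INTO file_keywords VALUES (1, 1), (1, 2), (2, 3);
    """
)

# 'auth%' is a prefix pattern, so the B-tree index on keywords.keyword can be
# used; the old fallback scanned semantic_metadata.keywords with LIKE '%auth%'.
rows = conn.execute(
    """
    SELECT f.name, GROUP_CONCAT(k.keyword, ',') AS keywords
    FROM files f
    JOIN file_keywords fk ON f.id = fk.file_id
    JOIN keywords k ON fk.keyword_id = k.id
    WHERE k.keyword LIKE ? COLLATE NOCASE
    GROUP BY f.id, f.name
    ORDER BY f.name
    """,
    ("auth%",),
).fetchall()

# Only the matching keywords are concatenated: [('auth.py', 'auth')]
print(rows)
```

As the sketch shows, the normalized path concatenates only the keywords that match the pattern, whereas the JSON fallback returns each file's full keyword list; the tests above assert on matching file sets and keyword membership, so both paths satisfy them.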