mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-02-05 01:50:27 +08:00
Implement database migration framework and performance optimizations
- Added active memory configuration for manual interval and Gemini tool. - Created file modification rules for handling edits and writes. - Implemented migration manager for managing database schema migrations. - Added migration 001 to normalize keywords into separate tables. - Developed tests for validating performance optimizations including keyword normalization, path lookup, and symbol search. - Created validation script to manually verify optimization implementations.
This commit is contained in:
@@ -1,36 +1,433 @@
|
||||
# CLI Tools Usage Rules
|
||||
# Intelligent Tools Selection Strategy
|
||||
|
||||
## Tool Selection
|
||||
## Table of Contents
|
||||
1. [Quick Reference](#quick-reference)
|
||||
2. [Tool Specifications](#tool-specifications)
|
||||
3. [Prompt Template](#prompt-template)
|
||||
4. [CLI Execution](#cli-execution)
|
||||
5. [Configuration](#configuration)
|
||||
6. [Best Practices](#best-practices)
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
## Quick Decision Tree
|
||||
|
||||
```
|
||||
┌─ Task Analysis/Documentation?
|
||||
│ └─→ Use Gemini (Fallback: Codex,Qwen)
|
||||
│ └─→ MODE: analysis (default, read-only)
|
||||
│
|
||||
└─ Task Implementation/Bug Fix?
|
||||
└─→ Use Codex (Fallback: Gemini,Qwen)
|
||||
└─→ MODE: auto (full operations) or write (file operations)
|
||||
```
|
||||
|
||||
|
||||
### Universal Prompt Template
|
||||
|
||||
```
|
||||
PURPOSE: [what] + [why] + [success criteria] + [constraints/scope]
|
||||
TASK: • [step 1: specific action] • [step 2: specific action] • [step 3: specific action]
|
||||
MODE: [analysis|write|auto]
|
||||
CONTEXT: @[file patterns] | Memory: [session/tech/module context]
|
||||
EXPECTED: [deliverable format] + [quality criteria] + [structure requirements]
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints] | MODE=[permission]
|
||||
```
|
||||
|
||||
### Intent Capture Checklist (Before CLI Execution)
|
||||
|
||||
**⚠️ CRITICAL**: Before executing any CLI command, verify these intent dimensions:
|
||||
**Intent Validation Questions**:
|
||||
- [ ] Is the objective specific and measurable?
|
||||
- [ ] Are success criteria defined?
|
||||
- [ ] Is the scope clearly bounded?
|
||||
- [ ] Are constraints and limitations stated?
|
||||
- [ ] Is the expected output format clear?
|
||||
- [ ] Is the action level (read/write) explicit?
|
||||
|
||||
## Tool Selection Matrix
|
||||
|
||||
| Task Category | Tool | MODE | When to Use |
|
||||
|---------------|------|------|-------------|
|
||||
| **Read/Analyze** | Gemini/Qwen | `analysis` | Code review, architecture analysis, pattern discovery, exploration |
|
||||
| **Write/Create** | Gemini/Qwen | `write` | Documentation generation, file creation (non-code) |
|
||||
| **Implement/Fix** | Codex | `auto` | Feature implementation, bug fixes, test creation, refactoring |
|
||||
|
||||
## Essential Command Structure
|
||||
|
||||
```bash
|
||||
ccw cli exec "<PROMPT>" --tool <gemini|qwen|codex> --mode <analysis|write|auto>
|
||||
```
|
||||
|
||||
### Core Principles
|
||||
|
||||
- **Use tools early and often** - Tools are faster and more thorough
|
||||
- **Unified CLI** - Always use `ccw cli exec` for consistent parameter handling
|
||||
- **One template required** - ALWAYS reference exactly ONE template in RULES (use universal fallback if no specific match)
|
||||
- **Write protection** - Require EXPLICIT `--mode write` or `--mode auto`
|
||||
- **No escape characters** - NEVER use `\$`, `\"`, `\'` in CLI commands
|
||||
|
||||
---
|
||||
|
||||
## Tool Specifications
|
||||
|
||||
### MODE Options
|
||||
|
||||
| Mode | Permission | Use For | Specification |
|
||||
|------|------------|---------|---------------|
|
||||
| `analysis` | Read-only (default) | Code review, architecture analysis, pattern discovery | Auto for Gemini/Qwen |
|
||||
| `write` | Create/Modify/Delete | Documentation, code creation, file modifications | Requires `--mode write` |
|
||||
| `auto` | Full operations | Feature implementation, bug fixes, autonomous development | Codex only, requires `--mode auto` |
|
||||
|
||||
### Gemini & Qwen
|
||||
**Use for**: Analysis, documentation, code exploration, architecture review
|
||||
- Default MODE: `analysis` (read-only)
|
||||
- Prefer Gemini; use Qwen as fallback
|
||||
|
||||
**Via CCW**: `ccw cli exec "<prompt>" --tool gemini` or `--tool qwen`
|
||||
|
||||
**Characteristics**:
|
||||
- Large context window, pattern recognition
|
||||
- Best for: Analysis, documentation, code exploration, architecture review
|
||||
- Default MODE: `analysis` (read-only)
|
||||
- Priority: Prefer Gemini; use Qwen as fallback
|
||||
|
||||
**Models** (override via `--model`):
|
||||
- Gemini: `gemini-2.5-pro`
|
||||
- Qwen: `coder-model`, `vision-model`
|
||||
|
||||
**Error Handling**: HTTP 429 may show error but still return results - check if results exist
|
||||
|
||||
### Codex
|
||||
**Use for**: Feature implementation, bug fixes, autonomous development
|
||||
- Requires explicit `--mode auto` or `--mode write`
|
||||
|
||||
**Via CCW**: `ccw cli exec "<prompt>" --tool codex --mode auto`
|
||||
|
||||
**Characteristics**:
|
||||
- Autonomous development, mathematical reasoning
|
||||
- Best for: Implementation, testing, automation
|
||||
- No default MODE - must explicitly specify `--mode write` or `--mode auto`
|
||||
|
||||
## Core Principles
|
||||
**Models**: `gpt-5.2`
|
||||
|
||||
- Use tools early and often - tools are faster and more thorough
|
||||
- Always use `ccw cli exec` for consistent parameter handling
|
||||
- ALWAYS reference exactly ONE template in RULES section
|
||||
- Require EXPLICIT `--mode write` or `--mode auto` for modifications
|
||||
- NEVER use escape characters (`\$`, `\"`, `\'`) in CLI commands
|
||||
### Session Resume
|
||||
|
||||
## Permission Framework
|
||||
**Resume via `--resume` parameter**:
|
||||
|
||||
```bash
|
||||
ccw cli exec "Continue analyzing" --resume # Resume last session
|
||||
ccw cli exec "Fix issues found" --resume <id> # Resume specific session
|
||||
```
|
||||
|
||||
| Value | Description |
|
||||
|-------|-------------|
|
||||
| `--resume` (empty) | Resume most recent session |
|
||||
| `--resume <id>` | Resume specific execution ID |
|
||||
|
||||
**Context Assembly** (automatic):
|
||||
```
|
||||
=== PREVIOUS CONVERSATION ===
|
||||
USER PROMPT: [Previous prompt]
|
||||
ASSISTANT RESPONSE: [Previous output]
|
||||
=== CONTINUATION ===
|
||||
[Your new prompt]
|
||||
```
|
||||
|
||||
**Tool Behavior**: Codex uses native `codex resume`; Gemini/Qwen assembles context as single prompt
|
||||
|
||||
---
|
||||
|
||||
## Prompt Template
|
||||
|
||||
### Template Structure
|
||||
|
||||
Every command MUST include these fields:
|
||||
|
||||
| Field | Purpose | Components | Bad Example | Good Example |
|
||||
|-------|---------|------------|-------------|--------------|
|
||||
| **PURPOSE** | Goal + motivation + success | What + Why + Success Criteria + Constraints | "Analyze code" | "Identify security vulnerabilities in auth module to pass compliance audit; success = all OWASP Top 10 addressed; scope = src/auth/** only" |
|
||||
| **TASK** | Actionable steps | Specific verbs + targets | "• Review code • Find issues" | "• Scan for SQL injection in query builders • Check XSS in template rendering • Verify CSRF token validation" |
|
||||
| **MODE** | Permission level | analysis / write / auto | (missing) | "analysis" or "write" |
|
||||
| **CONTEXT** | File scope + history | File patterns + Memory | "@**/*" | "@src/auth/**/*.ts @shared/utils/security.ts \| Memory: Previous auth refactoring (WFS-001)" |
|
||||
| **EXPECTED** | Output specification | Format + Quality + Structure | "Report" | "Markdown report with: severity levels (Critical/High/Medium/Low), file:line references, remediation code snippets, priority ranking" |
|
||||
| **RULES** | Template + constraints | $(cat template) + domain rules | (missing) | "$(cat ~/.claude/.../security.txt) \| Focus on authentication \| Ignore test files \| analysis=READ-ONLY" |
|
||||
|
||||
|
||||
### CONTEXT Configuration
|
||||
|
||||
**Format**: `CONTEXT: [file patterns] | Memory: [memory context]`
|
||||
|
||||
#### File Patterns
|
||||
|
||||
| Pattern | Scope |
|
||||
|---------|-------|
|
||||
| `@**/*` | All files (default) |
|
||||
| `@src/**/*.ts` | TypeScript in src |
|
||||
| `@../shared/**/*` | Sibling directory (requires `--includeDirs`) |
|
||||
| `@CLAUDE.md` | Specific file |
|
||||
|
||||
#### Memory Context
|
||||
|
||||
Include when building on previous work:
|
||||
|
||||
```bash
|
||||
# Cross-task reference
|
||||
Memory: Building on auth refactoring (commit abc123), implementing refresh tokens
|
||||
|
||||
# Cross-module integration
|
||||
Memory: Integration with auth module, using shared error patterns from @shared/utils/errors.ts
|
||||
```
|
||||
|
||||
**Memory Sources**:
|
||||
- **Related Tasks**: Previous refactoring, extensions, conflict resolution
|
||||
- **Tech Stack Patterns**: Framework conventions, security guidelines
|
||||
- **Cross-Module References**: Integration points, shared utilities, type dependencies
|
||||
|
||||
#### Pattern Discovery Workflow
|
||||
|
||||
For complex requirements, discover files BEFORE CLI execution:
|
||||
|
||||
```bash
|
||||
# Step 1: Discover files
|
||||
rg "export.*Component" --files-with-matches --type ts
|
||||
|
||||
# Step 2: Build CONTEXT
|
||||
CONTEXT: @components/Auth.tsx @types/auth.d.ts | Memory: Previous type refactoring
|
||||
|
||||
# Step 3: Execute CLI
|
||||
ccw cli exec "..." --tool gemini --cd src
|
||||
```
|
||||
|
||||
### RULES Configuration
|
||||
|
||||
**Format**: `RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [constraints]`
|
||||
|
||||
**⚠️ MANDATORY**: Exactly ONE template reference is REQUIRED. Select from Task-Template Matrix or use universal fallback:
|
||||
- `universal/00-universal-rigorous-style.txt` - For precision-critical tasks (default fallback)
|
||||
- `universal/00-universal-creative-style.txt` - For exploratory tasks
|
||||
|
||||
**Command Substitution Rules**:
|
||||
- Use `$(cat ...)` directly - do NOT read template content first
|
||||
- NEVER use escape characters: `\$`, `\"`, `\'`
|
||||
- Tilde expands correctly in prompt context
|
||||
|
||||
**Examples**:
|
||||
```bash
|
||||
# Specific template (preferred)
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on auth | analysis=READ-ONLY
|
||||
|
||||
# Universal fallback (when no specific template matches)
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/universal/00-universal-rigorous-style.txt) | Focus on security patterns | analysis=READ-ONLY
|
||||
```
|
||||
|
||||
### Template System
|
||||
|
||||
**Base Path**: `~/.claude/workflows/cli-templates/prompts/`
|
||||
|
||||
**Naming Convention**:
|
||||
- `00-*` - Universal fallbacks (when no specific match)
|
||||
- `01-*` - Universal, high-frequency
|
||||
- `02-*` - Common specialized
|
||||
- `03-*` - Domain-specific
|
||||
|
||||
**Universal Templates**:
|
||||
|
||||
| Template | Use For |
|
||||
|----------|---------|
|
||||
| `universal/00-universal-rigorous-style.txt` | Precision-critical, systematic methodology |
|
||||
| `universal/00-universal-creative-style.txt` | Exploratory, innovative solutions |
|
||||
|
||||
**Task-Template Matrix**:
|
||||
|
||||
| Task Type | Template |
|
||||
|-----------|----------|
|
||||
| **Analysis** | |
|
||||
| Execution Tracing | `analysis/01-trace-code-execution.txt` |
|
||||
| Bug Diagnosis | `analysis/01-diagnose-bug-root-cause.txt` |
|
||||
| Code Patterns | `analysis/02-analyze-code-patterns.txt` |
|
||||
| Document Analysis | `analysis/02-analyze-technical-document.txt` |
|
||||
| Architecture Review | `analysis/02-review-architecture.txt` |
|
||||
| Code Review | `analysis/02-review-code-quality.txt` |
|
||||
| Performance | `analysis/03-analyze-performance.txt` |
|
||||
| Security | `analysis/03-assess-security-risks.txt` |
|
||||
| **Planning** | |
|
||||
| Architecture | `planning/01-plan-architecture-design.txt` |
|
||||
| Task Breakdown | `planning/02-breakdown-task-steps.txt` |
|
||||
| Component Design | `planning/02-design-component-spec.txt` |
|
||||
| Migration | `planning/03-plan-migration-strategy.txt` |
|
||||
| **Development** | |
|
||||
| Feature | `development/02-implement-feature.txt` |
|
||||
| Refactoring | `development/02-refactor-codebase.txt` |
|
||||
| Tests | `development/02-generate-tests.txt` |
|
||||
| UI Component | `development/02-implement-component-ui.txt` |
|
||||
| Debugging | `development/03-debug-runtime-issues.txt` |
|
||||
|
||||
---
|
||||
|
||||
## CLI Execution
|
||||
|
||||
### Command Options
|
||||
|
||||
| Option | Description | Default |
|
||||
|--------|-------------|---------|
|
||||
| `--tool <tool>` | gemini, qwen, codex | gemini |
|
||||
| `--mode <mode>` | analysis, write, auto | analysis |
|
||||
| `--model <model>` | Model override | auto-select |
|
||||
| `--cd <path>` | Working directory | current |
|
||||
| `--includeDirs <dirs>` | Additional directories (comma-separated) | none |
|
||||
| `--timeout <ms>` | Timeout in milliseconds | 300000 |
|
||||
| `--resume [id]` | Resume previous session | - |
|
||||
| `--no-stream` | Disable streaming | false |
|
||||
|
||||
### Directory Configuration
|
||||
|
||||
#### Working Directory (`--cd`)
|
||||
|
||||
When using `--cd`:
|
||||
- `@**/*` = Files within working directory tree only
|
||||
- CANNOT reference parent/sibling via @ alone
|
||||
- Must use `--includeDirs` for external directories
|
||||
|
||||
#### Include Directories (`--includeDirs`)
|
||||
|
||||
**TWO-STEP requirement for external files**:
|
||||
1. Add `--includeDirs` parameter
|
||||
2. Reference in CONTEXT with @ patterns
|
||||
|
||||
```bash
|
||||
# Single directory
|
||||
ccw cli exec "CONTEXT: @**/* @../shared/**/*" --cd src/auth --includeDirs ../shared
|
||||
|
||||
# Multiple directories
|
||||
ccw cli exec "..." --cd src/auth --includeDirs ../shared,../types,../utils
|
||||
```
|
||||
|
||||
**Rule**: If CONTEXT contains `@../dir/**/*`, MUST include `--includeDirs ../dir`
|
||||
|
||||
**Benefits**: Excludes unrelated directories, reduces token usage
|
||||
|
||||
### CCW Parameter Mapping
|
||||
|
||||
CCW automatically maps to tool-specific syntax:
|
||||
|
||||
| CCW Parameter | Gemini/Qwen | Codex |
|
||||
|---------------|-------------|-------|
|
||||
| `--cd <path>` | `cd <path> &&` | `-C <path>` |
|
||||
| `--includeDirs <dirs>` | `--include-directories` | `--add-dir` (per dir) |
|
||||
| `--mode write` | `--approval-mode yolo` | `-s danger-full-access` |
|
||||
| `--mode auto` | N/A | `-s danger-full-access` |
|
||||
|
||||
### Command Examples
|
||||
|
||||
#### Task-Type Specific Templates
|
||||
|
||||
**Analysis Task** (Security Audit):
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Identify OWASP Top 10 vulnerabilities in authentication module to pass security audit; success = all critical/high issues documented with remediation
|
||||
TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication bypass vectors • Evaluate session management • Assess sensitive data exposure
|
||||
MODE: analysis
|
||||
CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions
|
||||
EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files | analysis=READ-ONLY
|
||||
" --tool gemini --cd src/auth --timeout 600000
|
||||
```
|
||||
|
||||
**Implementation Task** (New Feature):
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients
|
||||
TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services
|
||||
MODE: auto
|
||||
CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts
|
||||
EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes | auto=FULL
|
||||
" --tool codex --mode auto --timeout 1800000
|
||||
```
|
||||
|
||||
**Bug Fix Task**:
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Fix memory leak in WebSocket connection handler causing server OOM after 24h; root cause must be identified before any fix
|
||||
TASK: • Trace connection lifecycle from open to close • Identify event listener accumulation • Check cleanup on disconnect • Verify garbage collection eligibility
|
||||
MODE: analysis
|
||||
CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production
|
||||
EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup | analysis=READ-ONLY
|
||||
" --tool gemini --cd src --timeout 900000
|
||||
```
|
||||
|
||||
**Refactoring Task**:
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Refactor payment processing to use strategy pattern for multi-gateway support; no functional changes; all existing tests must pass
|
||||
TASK: • Extract gateway interface from current implementation • Create strategy classes for Stripe, PayPal • Implement factory for gateway selection • Migrate existing code to use strategies
|
||||
MODE: write
|
||||
CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways
|
||||
EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass | write=CREATE/MODIFY/DELETE
|
||||
" --tool gemini --mode write --timeout 1200000
|
||||
```
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Timeout Allocation
|
||||
|
||||
**Minimum**: 5 minutes (300000ms)
|
||||
|
||||
| Complexity | Range | Examples |
|
||||
|------------|-------|----------|
|
||||
| Simple | 5-10min (300000-600000ms) | Analysis, search |
|
||||
| Medium | 10-20min (600000-1200000ms) | Refactoring, documentation |
|
||||
| Complex | 20-60min (1200000-3600000ms) | Implementation, migration |
|
||||
| Heavy | 60-120min (3600000-7200000ms) | Large codebase, multi-file |
|
||||
|
||||
**Codex Multiplier**: 3x allocated time (minimum 15min / 900000ms)
|
||||
|
||||
```bash
|
||||
ccw cli exec "<prompt>" --tool gemini --timeout 600000 # 10 min
|
||||
ccw cli exec "<prompt>" --tool codex --timeout 1800000 # 30 min
|
||||
```
|
||||
|
||||
### Permission Framework
|
||||
|
||||
**Single-Use Authorization**: Each execution requires explicit user instruction. Previous authorization does NOT carry over.
|
||||
|
||||
**Mode Hierarchy**:
|
||||
- `analysis` (default): Read-only, safe for auto-execution
|
||||
- `write`: Requires explicit `--mode write` - creates/modifies/deletes files
|
||||
- `auto`: Requires explicit `--mode auto` - full autonomous operations (Codex only)
|
||||
- `write`: Requires explicit `--mode write`
|
||||
- `auto`: Requires explicit `--mode auto`
|
||||
- **Exception**: User provides clear instructions like "modify", "create", "implement"
|
||||
|
||||
## Timeout Guidelines
|
||||
---
|
||||
|
||||
- Simple (5-10min): Analysis, search
|
||||
- Medium (10-20min): Refactoring, documentation
|
||||
- Complex (20-60min): Implementation, migration
|
||||
- Heavy (60-120min): Large codebase, multi-file operations
|
||||
- Codex multiplier: 3x allocated time (minimum 15min)
|
||||
## Best Practices
|
||||
|
||||
### Workflow Principles
|
||||
|
||||
- **Use CCW unified interface** for all executions
|
||||
- **Always include template** - Use Task-Template Matrix or universal fallback
|
||||
- **Be specific** - Clear PURPOSE, TASK, EXPECTED fields
|
||||
- **Include constraints** - File patterns, scope in RULES
|
||||
- **Leverage memory context** when building on previous work
|
||||
- **Discover patterns first** - Use rg/MCP before CLI execution
|
||||
- **Default to full context** - Use `@**/*` unless specific files needed
|
||||
|
||||
### Workflow Integration
|
||||
|
||||
| Phase | Command |
|
||||
|-------|---------|
|
||||
| Understanding | `ccw cli exec "<prompt>" --tool gemini` |
|
||||
| Architecture | `ccw cli exec "<prompt>" --tool gemini` |
|
||||
| Implementation | `ccw cli exec "<prompt>" --tool codex --mode auto` |
|
||||
| Quality | `ccw cli exec "<prompt>" --tool codex --mode write` |
|
||||
|
||||
### Planning Checklist
|
||||
|
||||
- [ ] **Purpose defined** - Clear goal and intent
|
||||
- [ ] **Mode selected** - `--mode analysis|write|auto`
|
||||
- [ ] **Context gathered** - File references + memory (default `@**/*`)
|
||||
- [ ] **Directory navigation** - `--cd` and/or `--includeDirs`
|
||||
- [ ] **Tool selected** - `--tool gemini|qwen|codex`
|
||||
- [ ] **Template applied (REQUIRED)** - Use specific or universal fallback template
|
||||
- [ ] **Constraints specified** - Scope, requirements
|
||||
- [ ] **Timeout configured** - Based on complexity
|
||||
|
||||
@@ -5,3 +5,42 @@ Before implementation, always:
|
||||
- Identify 3+ existing similar patterns before implementation
|
||||
- Map dependencies and integration points
|
||||
- Understand testing framework and coding conventions
|
||||
|
||||
## Context Gathering
|
||||
|
||||
### Use Exa
|
||||
- Researching external APIs, libraries, frameworks
|
||||
- Need recent documentation beyond knowledge cutoff
|
||||
- Looking for implementation examples in public repos
|
||||
- User mentions specific library/framework names
|
||||
- Questions about "best practices" or "how does X work"
|
||||
|
||||
### Use read_file (MCP)
|
||||
- Reading multiple related files at once
|
||||
- Directory traversal with pattern matching
|
||||
- Searching file content with regex
|
||||
- Need to limit depth/file count for large directories
|
||||
- Batch operations on multiple files
|
||||
- Pattern-based filtering (glob + content regex)
|
||||
|
||||
### Use codex_lens
|
||||
- Large codebase (>500 files) requiring repeated searches
|
||||
- Need semantic understanding of code relationships
|
||||
- Working across multiple sessions
|
||||
- Symbol-level navigation needed
|
||||
- Finding all implementations of interface/class
|
||||
- Tracking function calls across codebase
|
||||
|
||||
### Use smart_search
|
||||
- Unknown file locations
|
||||
- Concept/semantic search ("authentication logic", "payment processing")
|
||||
- Medium-sized codebase (100-500 files)
|
||||
- One-time or infrequent searches
|
||||
- Natural language queries about code structure
|
||||
|
||||
**Mode Selection**:
|
||||
- `auto`: Let tool decide (default)
|
||||
- `exact`: Known exact pattern
|
||||
- `fuzzy`: Typo-tolerant search
|
||||
- `semantic`: Concept-based search
|
||||
- `graph`: Dependency analysis
|
||||
@@ -1,44 +1,3 @@
|
||||
# Tool Selection Rules
|
||||
|
||||
## Context Gathering
|
||||
|
||||
### Use Exa
|
||||
- Researching external APIs, libraries, frameworks
|
||||
- Need recent documentation beyond knowledge cutoff
|
||||
- Looking for implementation examples in public repos
|
||||
- User mentions specific library/framework names
|
||||
- Questions about "best practices" or "how does X work"
|
||||
|
||||
### Use read_file (MCP)
|
||||
- Reading multiple related files at once
|
||||
- Directory traversal with pattern matching
|
||||
- Searching file content with regex
|
||||
- Need to limit depth/file count for large directories
|
||||
- Batch operations on multiple files
|
||||
- Pattern-based filtering (glob + content regex)
|
||||
|
||||
### Use codex_lens
|
||||
- Large codebase (>500 files) requiring repeated searches
|
||||
- Need semantic understanding of code relationships
|
||||
- Working across multiple sessions
|
||||
- Symbol-level navigation needed
|
||||
- Finding all implementations of interface/class
|
||||
- Tracking function calls across codebase
|
||||
|
||||
### Use smart_search
|
||||
- Unknown file locations
|
||||
- Concept/semantic search ("authentication logic", "payment processing")
|
||||
- Medium-sized codebase (100-500 files)
|
||||
- One-time or infrequent searches
|
||||
- Natural language queries about code structure
|
||||
|
||||
**Mode Selection**:
|
||||
- `auto`: Let tool decide (default)
|
||||
- `exact`: Known exact pattern
|
||||
- `fuzzy`: Typo-tolerant search
|
||||
- `semantic`: Concept-based search
|
||||
- `graph`: Dependency analysis
|
||||
|
||||
## File Modification
|
||||
|
||||
### Use edit_file (MCP)
|
||||
@@ -1,431 +0,0 @@
|
||||
# Intelligent Tools Selection Strategy
|
||||
|
||||
## Table of Contents
|
||||
1. [Quick Reference](#quick-reference)
|
||||
2. [Tool Specifications](#tool-specifications)
|
||||
3. [Prompt Template](#prompt-template)
|
||||
4. [CLI Execution](#cli-execution)
|
||||
5. [Configuration](#configuration)
|
||||
6. [Best Practices](#best-practices)
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Universal Prompt Template
|
||||
|
||||
```
|
||||
PURPOSE: [what] + [why] + [success criteria] + [constraints/scope]
|
||||
TASK: • [step 1: specific action] • [step 2: specific action] • [step 3: specific action]
|
||||
MODE: [analysis|write|auto]
|
||||
CONTEXT: @[file patterns] | Memory: [session/tech/module context]
|
||||
EXPECTED: [deliverable format] + [quality criteria] + [structure requirements]
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [domain constraints] | MODE=[permission]
|
||||
```
|
||||
|
||||
### Intent Capture Checklist (Before CLI Execution)
|
||||
|
||||
**⚠️ CRITICAL**: Before executing any CLI command, verify these intent dimensions:
|
||||
**Intent Validation Questions**:
|
||||
- [ ] Is the objective specific and measurable?
|
||||
- [ ] Are success criteria defined?
|
||||
- [ ] Is the scope clearly bounded?
|
||||
- [ ] Are constraints and limitations stated?
|
||||
- [ ] Is the expected output format clear?
|
||||
- [ ] Is the action level (read/write) explicit?
|
||||
|
||||
### Tool Selection
|
||||
|
||||
| Task Type | Tool | Fallback |
|
||||
|-----------|------|----------|
|
||||
| Analysis/Documentation | Gemini | Qwen |
|
||||
| Implementation/Testing | Codex | - |
|
||||
|
||||
### CCW Command Syntax
|
||||
|
||||
```bash
|
||||
ccw cli exec "<prompt>" --tool <gemini|qwen|codex> --mode <analysis|write|auto>
|
||||
ccw cli exec "<prompt>" --tool gemini --cd <path> --includeDirs <dirs>
|
||||
ccw cli exec "<prompt>" --resume [id] # Resume previous session
|
||||
```
|
||||
|
||||
### CLI Subcommands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `ccw cli status` | Check CLI tools availability |
|
||||
| `ccw cli exec "<prompt>"` | Execute a CLI tool |
|
||||
| `ccw cli exec "<prompt>" --resume [id]` | Resume a previous session |
|
||||
| `ccw cli history` | Show execution history |
|
||||
| `ccw cli detail <id>` | Show execution detail |
|
||||
|
||||
### Core Principles
|
||||
|
||||
- **Use tools early and often** - Tools are faster and more thorough
|
||||
- **Unified CLI** - Always use `ccw cli exec` for consistent parameter handling
|
||||
- **One template required** - ALWAYS reference exactly ONE template in RULES (use universal fallback if no specific match)
|
||||
- **Write protection** - Require EXPLICIT `--mode write` or `--mode auto`
|
||||
- **No escape characters** - NEVER use `\$`, `\"`, `\'` in CLI commands
|
||||
|
||||
---
|
||||
|
||||
## Tool Specifications
|
||||
|
||||
### MODE Options
|
||||
|
||||
| Mode | Permission | Use For | Specification |
|
||||
|------|------------|---------|---------------|
|
||||
| `analysis` | Read-only (default) | Code review, architecture analysis, pattern discovery | Auto for Gemini/Qwen |
|
||||
| `write` | Create/Modify/Delete | Documentation, code creation, file modifications | Requires `--mode write` |
|
||||
| `auto` | Full operations | Feature implementation, bug fixes, autonomous development | Codex only, requires `--mode auto` |
|
||||
|
||||
### Gemini & Qwen
|
||||
|
||||
**Via CCW**: `ccw cli exec "<prompt>" --tool gemini` or `--tool qwen`
|
||||
|
||||
**Characteristics**:
|
||||
- Large context window, pattern recognition
|
||||
- Best for: Analysis, documentation, code exploration, architecture review
|
||||
- Default MODE: `analysis` (read-only)
|
||||
- Priority: Prefer Gemini; use Qwen as fallback
|
||||
|
||||
**Models** (override via `--model`):
|
||||
- Gemini: `gemini-2.5-pro`
|
||||
- Qwen: `coder-model`, `vision-model`
|
||||
|
||||
**Error Handling**: HTTP 429 may show error but still return results - check if results exist
|
||||
|
||||
### Codex
|
||||
|
||||
**Via CCW**: `ccw cli exec "<prompt>" --tool codex --mode auto`
|
||||
|
||||
**Characteristics**:
|
||||
- Autonomous development, mathematical reasoning
|
||||
- Best for: Implementation, testing, automation
|
||||
- No default MODE - must explicitly specify `--mode write` or `--mode auto`
|
||||
|
||||
**Models**: `gpt-5.2`
|
||||
|
||||
### Session Resume
|
||||
|
||||
**Resume via `--resume` parameter**:
|
||||
|
||||
```bash
|
||||
ccw cli exec "Continue analyzing" --resume # Resume last session
|
||||
ccw cli exec "Fix issues found" --resume <id> # Resume specific session
|
||||
```
|
||||
|
||||
| Value | Description |
|
||||
|-------|-------------|
|
||||
| `--resume` (empty) | Resume most recent session |
|
||||
| `--resume <id>` | Resume specific execution ID |
|
||||
|
||||
**Context Assembly** (automatic):
|
||||
```
|
||||
=== PREVIOUS CONVERSATION ===
|
||||
USER PROMPT: [Previous prompt]
|
||||
ASSISTANT RESPONSE: [Previous output]
|
||||
=== CONTINUATION ===
|
||||
[Your new prompt]
|
||||
```
|
||||
|
||||
**Tool Behavior**: Codex uses native `codex resume`; Gemini/Qwen assembles context as single prompt
|
||||
|
||||
---
|
||||
|
||||
## Prompt Template
|
||||
|
||||
### Template Structure
|
||||
|
||||
Every command MUST include these fields:
|
||||
|
||||
| Field | Purpose | Components | Bad Example | Good Example |
|
||||
|-------|---------|------------|-------------|--------------|
|
||||
| **PURPOSE** | Goal + motivation + success | What + Why + Success Criteria + Constraints | "Analyze code" | "Identify security vulnerabilities in auth module to pass compliance audit; success = all OWASP Top 10 addressed; scope = src/auth/** only" |
|
||||
| **TASK** | Actionable steps | Specific verbs + targets | "• Review code • Find issues" | "• Scan for SQL injection in query builders • Check XSS in template rendering • Verify CSRF token validation" |
|
||||
| **MODE** | Permission level | analysis / write / auto | (missing) | "analysis" or "write" |
|
||||
| **CONTEXT** | File scope + history | File patterns + Memory | "@**/*" | "@src/auth/**/*.ts @shared/utils/security.ts \| Memory: Previous auth refactoring (WFS-001)" |
|
||||
| **EXPECTED** | Output specification | Format + Quality + Structure | "Report" | "Markdown report with: severity levels (Critical/High/Medium/Low), file:line references, remediation code snippets, priority ranking" |
|
||||
| **RULES** | Template + constraints | $(cat template) + domain rules | (missing) | "$(cat ~/.claude/.../security.txt) \| Focus on authentication \| Ignore test files \| analysis=READ-ONLY" |
|
||||
|
||||
|
||||
### CONTEXT Configuration
|
||||
|
||||
**Format**: `CONTEXT: [file patterns] | Memory: [memory context]`
|
||||
|
||||
#### File Patterns
|
||||
|
||||
| Pattern | Scope |
|
||||
|---------|-------|
|
||||
| `@**/*` | All files (default) |
|
||||
| `@src/**/*.ts` | TypeScript in src |
|
||||
| `@../shared/**/*` | Sibling directory (requires `--includeDirs`) |
|
||||
| `@CLAUDE.md` | Specific file |
|
||||
|
||||
#### Memory Context
|
||||
|
||||
Include when building on previous work:
|
||||
|
||||
```bash
|
||||
# Cross-task reference
|
||||
Memory: Building on auth refactoring (commit abc123), implementing refresh tokens
|
||||
|
||||
# Cross-module integration
|
||||
Memory: Integration with auth module, using shared error patterns from @shared/utils/errors.ts
|
||||
```
|
||||
|
||||
**Memory Sources**:
|
||||
- **Related Tasks**: Previous refactoring, extensions, conflict resolution
|
||||
- **Tech Stack Patterns**: Framework conventions, security guidelines
|
||||
- **Cross-Module References**: Integration points, shared utilities, type dependencies
|
||||
|
||||
#### Pattern Discovery Workflow
|
||||
|
||||
For complex requirements, discover files BEFORE CLI execution:
|
||||
|
||||
```bash
|
||||
# Step 1: Discover files
|
||||
rg "export.*Component" --files-with-matches --type ts
|
||||
|
||||
# Step 2: Build CONTEXT
|
||||
CONTEXT: @components/Auth.tsx @types/auth.d.ts | Memory: Previous type refactoring
|
||||
|
||||
# Step 3: Execute CLI
|
||||
ccw cli exec "..." --tool gemini --cd src
|
||||
```
|
||||
|
||||
### RULES Configuration
|
||||
|
||||
**Format**: `RULES: $(cat ~/.claude/workflows/cli-templates/prompts/[category]/[template].txt) | [constraints]`
|
||||
|
||||
**⚠️ MANDATORY**: Exactly ONE template reference is REQUIRED. Select from Task-Template Matrix or use universal fallback:
|
||||
- `universal/00-universal-rigorous-style.txt` - For precision-critical tasks (default fallback)
|
||||
- `universal/00-universal-creative-style.txt` - For exploratory tasks
|
||||
|
||||
**Command Substitution Rules**:
|
||||
- Use `$(cat ...)` directly - do NOT read template content first
|
||||
- NEVER use escape characters: `\$`, `\"`, `\'`
|
||||
- Tilde expands correctly in prompt context
|
||||
|
||||
**Examples**:
|
||||
```bash
|
||||
# Specific template (preferred)
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on auth | analysis=READ-ONLY
|
||||
|
||||
# Universal fallback (when no specific template matches)
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/universal/00-universal-rigorous-style.txt) | Focus on security patterns | analysis=READ-ONLY
|
||||
```
|
||||
|
||||
### Template System
|
||||
|
||||
**Base Path**: `~/.claude/workflows/cli-templates/prompts/`
|
||||
|
||||
**Naming Convention**:
|
||||
- `00-*` - Universal fallbacks (when no specific match)
|
||||
- `01-*` - Universal, high-frequency
|
||||
- `02-*` - Common specialized
|
||||
- `03-*` - Domain-specific
|
||||
|
||||
**Universal Templates**:
|
||||
|
||||
| Template | Use For |
|
||||
|----------|---------|
|
||||
| `universal/00-universal-rigorous-style.txt` | Precision-critical, systematic methodology |
|
||||
| `universal/00-universal-creative-style.txt` | Exploratory, innovative solutions |
|
||||
|
||||
**Task-Template Matrix**:
|
||||
|
||||
| Task Type | Template |
|
||||
|-----------|----------|
|
||||
| **Analysis** | |
|
||||
| Execution Tracing | `analysis/01-trace-code-execution.txt` |
|
||||
| Bug Diagnosis | `analysis/01-diagnose-bug-root-cause.txt` |
|
||||
| Code Patterns | `analysis/02-analyze-code-patterns.txt` |
|
||||
| Document Analysis | `analysis/02-analyze-technical-document.txt` |
|
||||
| Architecture Review | `analysis/02-review-architecture.txt` |
|
||||
| Code Review | `analysis/02-review-code-quality.txt` |
|
||||
| Performance | `analysis/03-analyze-performance.txt` |
|
||||
| Security | `analysis/03-assess-security-risks.txt` |
|
||||
| **Planning** | |
|
||||
| Architecture | `planning/01-plan-architecture-design.txt` |
|
||||
| Task Breakdown | `planning/02-breakdown-task-steps.txt` |
|
||||
| Component Design | `planning/02-design-component-spec.txt` |
|
||||
| Migration | `planning/03-plan-migration-strategy.txt` |
|
||||
| **Development** | |
|
||||
| Feature | `development/02-implement-feature.txt` |
|
||||
| Refactoring | `development/02-refactor-codebase.txt` |
|
||||
| Tests | `development/02-generate-tests.txt` |
|
||||
| UI Component | `development/02-implement-component-ui.txt` |
|
||||
| Debugging | `development/03-debug-runtime-issues.txt` |
|
||||
|
||||
---
|
||||
|
||||
## CLI Execution
|
||||
|
||||
### Command Options
|
||||
|
||||
| Option | Description | Default |
|
||||
|--------|-------------|---------|
|
||||
| `--tool <tool>` | gemini, qwen, codex | gemini |
|
||||
| `--mode <mode>` | analysis, write, auto | analysis |
|
||||
| `--model <model>` | Model override | auto-select |
|
||||
| `--cd <path>` | Working directory | current |
|
||||
| `--includeDirs <dirs>` | Additional directories (comma-separated) | none |
|
||||
| `--timeout <ms>` | Timeout in milliseconds | 300000 |
|
||||
| `--resume [id]` | Resume previous session | - |
|
||||
| `--no-stream` | Disable streaming | false |
|
||||
|
||||
### Directory Configuration
|
||||
|
||||
#### Working Directory (`--cd`)
|
||||
|
||||
When using `--cd`:
|
||||
- `@**/*` = Files within working directory tree only
|
||||
- CANNOT reference parent/sibling via @ alone
|
||||
- Must use `--includeDirs` for external directories
|
||||
|
||||
#### Include Directories (`--includeDirs`)
|
||||
|
||||
**TWO-STEP requirement for external files**:
|
||||
1. Add `--includeDirs` parameter
|
||||
2. Reference in CONTEXT with @ patterns
|
||||
|
||||
```bash
|
||||
# Single directory
|
||||
ccw cli exec "CONTEXT: @**/* @../shared/**/*" --cd src/auth --includeDirs ../shared
|
||||
|
||||
# Multiple directories
|
||||
ccw cli exec "..." --cd src/auth --includeDirs ../shared,../types,../utils
|
||||
```
|
||||
|
||||
**Rule**: If CONTEXT contains `@../dir/**/*`, MUST include `--includeDirs ../dir`
|
||||
|
||||
**Benefits**: Excludes unrelated directories, reduces token usage
|
||||
|
||||
### CCW Parameter Mapping
|
||||
|
||||
CCW automatically maps to tool-specific syntax:
|
||||
|
||||
| CCW Parameter | Gemini/Qwen | Codex |
|
||||
|---------------|-------------|-------|
|
||||
| `--cd <path>` | `cd <path> &&` | `-C <path>` |
|
||||
| `--includeDirs <dirs>` | `--include-directories` | `--add-dir` (per dir) |
|
||||
| `--mode write` | `--approval-mode yolo` | `-s danger-full-access` |
|
||||
| `--mode auto` | N/A | `-s danger-full-access` |
|
||||
|
||||
### Command Examples
|
||||
|
||||
#### Task-Type Specific Templates
|
||||
|
||||
**Analysis Task** (Security Audit):
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Identify OWASP Top 10 vulnerabilities in authentication module to pass security audit; success = all critical/high issues documented with remediation
|
||||
TASK: • Scan for injection flaws (SQL, command, LDAP) • Check authentication bypass vectors • Evaluate session management • Assess sensitive data exposure
|
||||
MODE: analysis
|
||||
CONTEXT: @src/auth/**/* @src/middleware/auth.ts | Memory: Using bcrypt for passwords, JWT for sessions
|
||||
EXPECTED: Security report with: severity matrix, file:line references, CVE mappings where applicable, remediation code snippets prioritized by risk
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/03-assess-security-risks.txt) | Focus on authentication | Ignore test files | analysis=READ-ONLY
|
||||
" --tool gemini --cd src/auth --timeout 600000
|
||||
```
|
||||
|
||||
**Implementation Task** (New Feature):
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Implement rate limiting for API endpoints to prevent abuse; must be configurable per-endpoint; backward compatible with existing clients
|
||||
TASK: • Create rate limiter middleware with sliding window • Implement per-route configuration • Add Redis backend for distributed state • Include bypass for internal services
|
||||
MODE: auto
|
||||
CONTEXT: @src/middleware/**/* @src/config/**/* | Memory: Using Express.js, Redis already configured, existing middleware pattern in auth.ts
|
||||
EXPECTED: Production-ready code with: TypeScript types, unit tests, integration test, configuration example, migration guide
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-implement-feature.txt) | Follow existing middleware patterns | No breaking changes | auto=FULL
|
||||
" --tool codex --mode auto --timeout 1800000
|
||||
```
|
||||
|
||||
**Bug Fix Task**:
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Fix memory leak in WebSocket connection handler causing server OOM after 24h; root cause must be identified before any fix
|
||||
TASK: • Trace connection lifecycle from open to close • Identify event listener accumulation • Check cleanup on disconnect • Verify garbage collection eligibility
|
||||
MODE: analysis
|
||||
CONTEXT: @src/websocket/**/* @src/services/connection-manager.ts | Memory: Using ws library, ~5000 concurrent connections in production
|
||||
EXPECTED: Root cause analysis with: memory profile, leak source (file:line), fix recommendation with code, verification steps
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/analysis/01-diagnose-bug-root-cause.txt) | Focus on resource cleanup | analysis=READ-ONLY
|
||||
" --tool gemini --cd src --timeout 900000
|
||||
```
|
||||
|
||||
**Refactoring Task**:
|
||||
```bash
|
||||
ccw cli exec "
|
||||
PURPOSE: Refactor payment processing to use strategy pattern for multi-gateway support; no functional changes; all existing tests must pass
|
||||
TASK: • Extract gateway interface from current implementation • Create strategy classes for Stripe, PayPal • Implement factory for gateway selection • Migrate existing code to use strategies
|
||||
MODE: write
|
||||
CONTEXT: @src/payments/**/* @src/types/payment.ts | Memory: Currently only Stripe, adding PayPal next sprint, must support future gateways
|
||||
EXPECTED: Refactored code with: strategy interface, concrete implementations, factory class, updated tests, migration checklist
|
||||
RULES: $(cat ~/.claude/workflows/cli-templates/prompts/development/02-refactor-codebase.txt) | Preserve all existing behavior | Tests must pass | write=CREATE/MODIFY/DELETE
|
||||
" --tool gemini --mode write --timeout 1200000
|
||||
```
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Timeout Allocation
|
||||
|
||||
**Minimum**: 5 minutes (300000ms)
|
||||
|
||||
| Complexity | Range | Examples |
|
||||
|------------|-------|----------|
|
||||
| Simple | 5-10min (300000-600000ms) | Analysis, search |
|
||||
| Medium | 10-20min (600000-1200000ms) | Refactoring, documentation |
|
||||
| Complex | 20-60min (1200000-3600000ms) | Implementation, migration |
|
||||
| Heavy | 60-120min (3600000-7200000ms) | Large codebase, multi-file |
|
||||
|
||||
**Codex Multiplier**: 3x allocated time (minimum 15min / 900000ms)
|
||||
|
||||
```bash
|
||||
ccw cli exec "<prompt>" --tool gemini --timeout 600000 # 10 min
|
||||
ccw cli exec "<prompt>" --tool codex --timeout 1800000 # 30 min
|
||||
```
|
||||
|
||||
### Permission Framework
|
||||
|
||||
**Single-Use Authorization**: Each execution requires explicit user instruction. Previous authorization does NOT carry over.
|
||||
|
||||
**Mode Hierarchy**:
|
||||
- `analysis` (default): Read-only, safe for auto-execution
|
||||
- `write`: Requires explicit `--mode write`
|
||||
- `auto`: Requires explicit `--mode auto`
|
||||
- **Exception**: User provides clear instructions like "modify", "create", "implement"
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Workflow Principles
|
||||
|
||||
- **Use CCW unified interface** for all executions
|
||||
- **Always include template** - Use Task-Template Matrix or universal fallback
|
||||
- **Be specific** - Clear PURPOSE, TASK, EXPECTED fields
|
||||
- **Include constraints** - File patterns, scope in RULES
|
||||
- **Leverage memory context** when building on previous work
|
||||
- **Discover patterns first** - Use rg/MCP before CLI execution
|
||||
- **Default to full context** - Use `@**/*` unless specific files needed
|
||||
|
||||
### Workflow Integration
|
||||
|
||||
| Phase | Command |
|
||||
|-------|---------|
|
||||
| Understanding | `ccw cli exec "<prompt>" --tool gemini` |
|
||||
| Architecture | `ccw cli exec "<prompt>" --tool gemini` |
|
||||
| Implementation | `ccw cli exec "<prompt>" --tool codex --mode auto` |
|
||||
| Quality | `ccw cli exec "<prompt>" --tool codex --mode write` |
|
||||
|
||||
### Planning Checklist
|
||||
|
||||
- [ ] **Purpose defined** - Clear goal and intent
|
||||
- [ ] **Mode selected** - `--mode analysis|write|auto`
|
||||
- [ ] **Context gathered** - File references + memory (default `@**/*`)
|
||||
- [ ] **Directory navigation** - `--cd` and/or `--includeDirs`
|
||||
- [ ] **Tool selected** - `--tool gemini|qwen|codex`
|
||||
- [ ] **Template applied (REQUIRED)** - Use specific or universal fallback template
|
||||
- [ ] **Constraints specified** - Scope, requirements
|
||||
- [ ] **Timeout configured** - Based on complexity
|
||||
@@ -734,7 +734,7 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p
|
||||
|
||||
try {
|
||||
const configPath = join(projectPath, '.claude', 'rules', 'active_memory.md');
|
||||
const configJsonPath = join(projectPath, '.claude', 'rules', 'active_memory_config.json');
|
||||
const configJsonPath = join(projectPath, '.claude', 'active_memory_config.json');
|
||||
const enabled = existsSync(configPath);
|
||||
let lastSync: string | null = null;
|
||||
let fileCount = 0;
|
||||
@@ -785,14 +785,18 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p
|
||||
}
|
||||
|
||||
const rulesDir = join(projectPath, '.claude', 'rules');
|
||||
const claudeDir = join(projectPath, '.claude');
|
||||
const configPath = join(rulesDir, 'active_memory.md');
|
||||
const configJsonPath = join(rulesDir, 'active_memory_config.json');
|
||||
const configJsonPath = join(claudeDir, 'active_memory_config.json');
|
||||
|
||||
if (enabled) {
|
||||
// Enable: Create directory and initial file
|
||||
// Enable: Create directories and initial file
|
||||
if (!existsSync(rulesDir)) {
|
||||
mkdirSync(rulesDir, { recursive: true });
|
||||
}
|
||||
if (!existsSync(claudeDir)) {
|
||||
mkdirSync(claudeDir, { recursive: true });
|
||||
}
|
||||
|
||||
// Save config
|
||||
if (config) {
|
||||
@@ -844,11 +848,11 @@ Return ONLY valid JSON in this exact format (no markdown, no code blocks, just p
|
||||
try {
|
||||
const { config } = JSON.parse(body || '{}');
|
||||
const projectPath = initialPath;
|
||||
const rulesDir = join(projectPath, '.claude', 'rules');
|
||||
const configJsonPath = join(rulesDir, 'active_memory_config.json');
|
||||
const claudeDir = join(projectPath, '.claude');
|
||||
const configJsonPath = join(claudeDir, 'active_memory_config.json');
|
||||
|
||||
if (!existsSync(rulesDir)) {
|
||||
mkdirSync(rulesDir, { recursive: true });
|
||||
if (!existsSync(claudeDir)) {
|
||||
mkdirSync(claudeDir, { recursive: true });
|
||||
}
|
||||
|
||||
writeFileSync(configJsonPath, JSON.stringify(config, null, 2), 'utf-8');
|
||||
@@ -938,7 +942,10 @@ RULES: Be concise. Focus on practical understanding. Include function signatures
|
||||
});
|
||||
|
||||
if (result.success && result.execution?.output) {
|
||||
cliOutput = result.execution.output;
|
||||
// Extract stdout from output object
|
||||
cliOutput = typeof result.execution.output === 'string'
|
||||
? result.execution.output
|
||||
: result.execution.output.stdout || '';
|
||||
}
|
||||
|
||||
// Add CLI output to content
|
||||
@@ -1007,6 +1014,18 @@ RULES: Be concise. Focus on practical understanding. Include function signatures
|
||||
// Write the file
|
||||
writeFileSync(configPath, content, 'utf-8');
|
||||
|
||||
// Broadcast Active Memory sync completion event
|
||||
broadcastToClients({
|
||||
type: 'ACTIVE_MEMORY_SYNCED',
|
||||
payload: {
|
||||
filesAnalyzed: hotFiles.length,
|
||||
path: configPath,
|
||||
tool,
|
||||
usedCli: cliOutput.length > 0,
|
||||
timestamp: new Date().toISOString()
|
||||
}
|
||||
});
|
||||
|
||||
res.writeHead(200, { 'Content-Type': 'application/json' });
|
||||
res.end(JSON.stringify({
|
||||
success: true,
|
||||
|
||||
@@ -3757,3 +3757,205 @@
|
||||
.btn-ghost.text-destructive:hover {
|
||||
background: hsl(var(--destructive) / 0.1);
|
||||
}
|
||||
|
||||
/* ========================================
|
||||
* Semantic Metadata Viewer Styles
|
||||
* ======================================== */
|
||||
.semantic-viewer-toolbar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.75rem 1rem;
|
||||
background: hsl(var(--muted) / 0.3);
|
||||
border-bottom: 1px solid hsl(var(--border));
|
||||
}
|
||||
|
||||
.semantic-table-container {
|
||||
max-height: 400px;
|
||||
overflow-y: auto;
|
||||
}
|
||||
|
||||
.semantic-table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
font-size: 0.8125rem;
|
||||
}
|
||||
|
||||
.semantic-table th {
|
||||
position: sticky;
|
||||
top: 0;
|
||||
background: hsl(var(--card));
|
||||
padding: 0.625rem 0.75rem;
|
||||
text-align: left;
|
||||
font-weight: 600;
|
||||
font-size: 0.75rem;
|
||||
color: hsl(var(--muted-foreground));
|
||||
border-bottom: 1px solid hsl(var(--border));
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.semantic-table td {
|
||||
padding: 0.625rem 0.75rem;
|
||||
border-bottom: 1px solid hsl(var(--border) / 0.5);
|
||||
vertical-align: top;
|
||||
}
|
||||
|
||||
.semantic-row {
|
||||
cursor: pointer;
|
||||
transition: background 0.15s ease;
|
||||
}
|
||||
|
||||
.semantic-row:hover {
|
||||
background: hsl(var(--hover));
|
||||
}
|
||||
|
||||
.semantic-cell-file {
|
||||
max-width: 200px;
|
||||
}
|
||||
|
||||
.semantic-cell-lang {
|
||||
width: 80px;
|
||||
color: hsl(var(--muted-foreground));
|
||||
}
|
||||
|
||||
.semantic-cell-purpose {
|
||||
max-width: 180px;
|
||||
color: hsl(var(--foreground) / 0.8);
|
||||
}
|
||||
|
||||
.semantic-cell-keywords {
|
||||
max-width: 160px;
|
||||
}
|
||||
|
||||
.semantic-cell-tool {
|
||||
width: 70px;
|
||||
}
|
||||
|
||||
.semantic-cell-date {
|
||||
width: 80px;
|
||||
color: hsl(var(--muted-foreground));
|
||||
font-size: 0.75rem;
|
||||
}
|
||||
|
||||
.semantic-keyword {
|
||||
display: inline-block;
|
||||
padding: 0.125rem 0.375rem;
|
||||
margin: 0.125rem;
|
||||
background: hsl(var(--primary) / 0.1);
|
||||
color: hsl(var(--primary));
|
||||
border-radius: 0.25rem;
|
||||
font-size: 0.6875rem;
|
||||
}
|
||||
|
||||
.semantic-keyword-more {
|
||||
display: inline-block;
|
||||
padding: 0.125rem 0.375rem;
|
||||
margin: 0.125rem;
|
||||
background: hsl(var(--muted));
|
||||
color: hsl(var(--muted-foreground));
|
||||
border-radius: 0.25rem;
|
||||
font-size: 0.6875rem;
|
||||
}
|
||||
|
||||
.tool-badge {
|
||||
display: inline-block;
|
||||
padding: 0.125rem 0.5rem;
|
||||
border-radius: 0.25rem;
|
||||
font-size: 0.6875rem;
|
||||
font-weight: 500;
|
||||
text-transform: capitalize;
|
||||
}
|
||||
|
||||
.tool-badge.tool-gemini {
|
||||
background: hsl(210 80% 55% / 0.15);
|
||||
color: hsl(210 80% 45%);
|
||||
}
|
||||
|
||||
.tool-badge.tool-qwen {
|
||||
background: hsl(142 76% 36% / 0.15);
|
||||
color: hsl(142 76% 36%);
|
||||
}
|
||||
|
||||
.tool-badge.tool-unknown {
|
||||
background: hsl(var(--muted));
|
||||
color: hsl(var(--muted-foreground));
|
||||
}
|
||||
|
||||
.semantic-detail-row {
|
||||
background: hsl(var(--muted) / 0.2);
|
||||
}
|
||||
|
||||
.semantic-detail-row.hidden {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.semantic-detail-content {
|
||||
padding: 1rem;
|
||||
}
|
||||
|
||||
.semantic-detail-section {
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
.semantic-detail-section h4 {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.5rem;
|
||||
font-size: 0.75rem;
|
||||
font-weight: 600;
|
||||
color: hsl(var(--muted-foreground));
|
||||
margin-bottom: 0.5rem;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.05em;
|
||||
}
|
||||
|
||||
.semantic-detail-section p {
|
||||
font-size: 0.8125rem;
|
||||
line-height: 1.5;
|
||||
color: hsl(var(--foreground));
|
||||
}
|
||||
|
||||
.semantic-keywords-full {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.25rem;
|
||||
}
|
||||
|
||||
.semantic-detail-meta {
|
||||
display: flex;
|
||||
gap: 1rem;
|
||||
padding-top: 0.75rem;
|
||||
border-top: 1px solid hsl(var(--border) / 0.5);
|
||||
font-size: 0.75rem;
|
||||
color: hsl(var(--muted-foreground));
|
||||
}
|
||||
|
||||
.semantic-detail-meta span {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.375rem;
|
||||
}
|
||||
|
||||
.semantic-viewer-footer {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: 0.75rem 1rem;
|
||||
background: hsl(var(--muted) / 0.3);
|
||||
border-top: 1px solid hsl(var(--border));
|
||||
}
|
||||
|
||||
.semantic-loading,
|
||||
.semantic-empty {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
padding: 3rem;
|
||||
text-align: center;
|
||||
color: hsl(var(--muted-foreground));
|
||||
}
|
||||
|
||||
.semantic-loading {
|
||||
gap: 1rem;
|
||||
}
|
||||
|
||||
@@ -2097,7 +2097,7 @@
|
||||
position: fixed;
|
||||
top: 0;
|
||||
right: 0;
|
||||
width: 480px;
|
||||
width: 50vw;
|
||||
max-width: 100vw;
|
||||
height: 100vh;
|
||||
background: hsl(var(--card));
|
||||
@@ -2132,7 +2132,6 @@
|
||||
justify-content: space-between;
|
||||
padding: 1rem 1.25rem;
|
||||
border-bottom: 1px solid hsl(var(--border));
|
||||
background: hsl(var(--muted) / 0.3);
|
||||
}
|
||||
|
||||
.insight-detail-header h3 {
|
||||
|
||||
@@ -238,6 +238,31 @@ function handleNotification(data) {
|
||||
}
|
||||
break;
|
||||
|
||||
case 'ACTIVE_MEMORY_SYNCED':
|
||||
// Handle Active Memory sync completion
|
||||
if (typeof addGlobalNotification === 'function') {
|
||||
const { filesAnalyzed, tool, usedCli } = payload;
|
||||
const method = usedCli ? `CLI (${tool})` : 'Basic';
|
||||
addGlobalNotification(
|
||||
'success',
|
||||
'Active Memory synced',
|
||||
{
|
||||
'Files Analyzed': filesAnalyzed,
|
||||
'Method': method,
|
||||
'Timestamp': new Date(payload.timestamp).toLocaleTimeString()
|
||||
},
|
||||
'Memory'
|
||||
);
|
||||
}
|
||||
// Refresh Active Memory status if on memory view
|
||||
if (getCurrentView && getCurrentView() === 'memory') {
|
||||
if (typeof loadActiveMemoryStatus === 'function') {
|
||||
loadActiveMemoryStatus();
|
||||
}
|
||||
}
|
||||
console.log('[Active Memory] Sync completed:', payload);
|
||||
break;
|
||||
|
||||
default:
|
||||
console.log('[WS] Unknown notification type:', type);
|
||||
}
|
||||
|
||||
@@ -1123,11 +1123,11 @@ def semantic_list(
|
||||
registry.initialize()
|
||||
mapper = PathMapper()
|
||||
|
||||
project_info = registry.find_project(base_path)
|
||||
project_info = registry.get_project(base_path)
|
||||
if not project_info:
|
||||
raise CodexLensError(f"No index found for: {base_path}. Run 'codex-lens init' first.")
|
||||
|
||||
index_dir = mapper.source_to_index_dir(base_path)
|
||||
index_dir = Path(project_info.index_root)
|
||||
if not index_dir.exists():
|
||||
raise CodexLensError(f"Index directory not found: {index_dir}")
|
||||
|
||||
|
||||
@@ -375,6 +375,7 @@ class DirIndexStore:
|
||||
keywords_json = json.dumps(keywords)
|
||||
generated_at = time.time()
|
||||
|
||||
# Write to semantic_metadata table (for backward compatibility)
|
||||
conn.execute(
|
||||
"""
|
||||
INSERT INTO semantic_metadata(file_id, summary, keywords, purpose, llm_tool, generated_at)
|
||||
@@ -388,6 +389,37 @@ class DirIndexStore:
|
||||
""",
|
||||
(file_id, summary, keywords_json, purpose, llm_tool, generated_at),
|
||||
)
|
||||
|
||||
# Write to normalized keywords tables for optimized search
|
||||
# First, remove existing keyword associations
|
||||
conn.execute("DELETE FROM file_keywords WHERE file_id = ?", (file_id,))
|
||||
|
||||
# Then add new keywords
|
||||
for keyword in keywords:
|
||||
keyword = keyword.strip()
|
||||
if not keyword:
|
||||
continue
|
||||
|
||||
# Insert keyword if it doesn't exist
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO keywords(keyword) VALUES(?)",
|
||||
(keyword,)
|
||||
)
|
||||
|
||||
# Get keyword_id
|
||||
row = conn.execute(
|
||||
"SELECT id FROM keywords WHERE keyword = ?",
|
||||
(keyword,)
|
||||
).fetchone()
|
||||
|
||||
if row:
|
||||
keyword_id = row["id"]
|
||||
# Link file to keyword
|
||||
conn.execute(
|
||||
"INSERT OR IGNORE INTO file_keywords(file_id, keyword_id) VALUES(?, ?)",
|
||||
(file_id, keyword_id)
|
||||
)
|
||||
|
||||
conn.commit()
|
||||
|
||||
def get_semantic_metadata(self, file_id: int) -> Optional[Dict[str, Any]]:
|
||||
@@ -454,11 +486,12 @@ class DirIndexStore:
|
||||
for row in rows
|
||||
]
|
||||
|
||||
def search_semantic_keywords(self, keyword: str) -> List[Tuple[FileEntry, List[str]]]:
|
||||
def search_semantic_keywords(self, keyword: str, use_normalized: bool = True) -> List[Tuple[FileEntry, List[str]]]:
|
||||
"""Search files by semantic keywords.
|
||||
|
||||
Args:
|
||||
keyword: Keyword to search for (case-insensitive)
|
||||
use_normalized: Use optimized normalized tables (default: True)
|
||||
|
||||
Returns:
|
||||
List of (FileEntry, keywords) tuples where keyword matches
|
||||
@@ -466,35 +499,71 @@ class DirIndexStore:
|
||||
with self._lock:
|
||||
conn = self._get_connection()
|
||||
|
||||
keyword_pattern = f"%{keyword}%"
|
||||
if use_normalized:
|
||||
# Optimized query using normalized tables with indexed lookup
|
||||
# Use prefix search (keyword%) for better index utilization
|
||||
keyword_pattern = f"{keyword}%"
|
||||
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count, sm.keywords
|
||||
FROM files f
|
||||
JOIN semantic_metadata sm ON f.id = sm.file_id
|
||||
WHERE sm.keywords LIKE ? COLLATE NOCASE
|
||||
ORDER BY f.name
|
||||
""",
|
||||
(keyword_pattern,),
|
||||
).fetchall()
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count,
|
||||
GROUP_CONCAT(k.keyword, ',') as keywords
|
||||
FROM files f
|
||||
JOIN file_keywords fk ON f.id = fk.file_id
|
||||
JOIN keywords k ON fk.keyword_id = k.id
|
||||
WHERE k.keyword LIKE ? COLLATE NOCASE
|
||||
GROUP BY f.id, f.name, f.full_path, f.language, f.mtime, f.line_count
|
||||
ORDER BY f.name
|
||||
""",
|
||||
(keyword_pattern,),
|
||||
).fetchall()
|
||||
|
||||
import json
|
||||
results = []
|
||||
for row in rows:
|
||||
file_entry = FileEntry(
|
||||
id=int(row["id"]),
|
||||
name=row["name"],
|
||||
full_path=Path(row["full_path"]),
|
||||
language=row["language"],
|
||||
mtime=float(row["mtime"]) if row["mtime"] else 0.0,
|
||||
line_count=int(row["line_count"]) if row["line_count"] else 0,
|
||||
)
|
||||
keywords = row["keywords"].split(',') if row["keywords"] else []
|
||||
results.append((file_entry, keywords))
|
||||
|
||||
results = []
|
||||
for row in rows:
|
||||
file_entry = FileEntry(
|
||||
id=int(row["id"]),
|
||||
name=row["name"],
|
||||
full_path=Path(row["full_path"]),
|
||||
language=row["language"],
|
||||
mtime=float(row["mtime"]) if row["mtime"] else 0.0,
|
||||
line_count=int(row["line_count"]) if row["line_count"] else 0,
|
||||
)
|
||||
keywords = json.loads(row["keywords"]) if row["keywords"] else []
|
||||
results.append((file_entry, keywords))
|
||||
return results
|
||||
|
||||
return results
|
||||
else:
|
||||
# Fallback to original query for backward compatibility
|
||||
keyword_pattern = f"%{keyword}%"
|
||||
|
||||
rows = conn.execute(
|
||||
"""
|
||||
SELECT f.id, f.name, f.full_path, f.language, f.mtime, f.line_count, sm.keywords
|
||||
FROM files f
|
||||
JOIN semantic_metadata sm ON f.id = sm.file_id
|
||||
WHERE sm.keywords LIKE ? COLLATE NOCASE
|
||||
ORDER BY f.name
|
||||
""",
|
||||
(keyword_pattern,),
|
||||
).fetchall()
|
||||
|
||||
import json
|
||||
|
||||
results = []
|
||||
for row in rows:
|
||||
file_entry = FileEntry(
|
||||
id=int(row["id"]),
|
||||
name=row["name"],
|
||||
full_path=Path(row["full_path"]),
|
||||
language=row["language"],
|
||||
mtime=float(row["mtime"]) if row["mtime"] else 0.0,
|
||||
line_count=int(row["line_count"]) if row["line_count"] else 0,
|
||||
)
|
||||
keywords = json.loads(row["keywords"]) if row["keywords"] else []
|
||||
results.append((file_entry, keywords))
|
||||
|
||||
return results
|
||||
|
||||
def list_semantic_metadata(
|
||||
self,
|
||||
@@ -794,19 +863,26 @@ class DirIndexStore:
|
||||
return [row["full_path"] for row in rows]
|
||||
|
||||
def search_symbols(
|
||||
self, name: str, kind: Optional[str] = None, limit: int = 50
|
||||
self, name: str, kind: Optional[str] = None, limit: int = 50, prefix_mode: bool = True
|
||||
) -> List[Symbol]:
|
||||
"""Search symbols by name pattern.
|
||||
|
||||
Args:
|
||||
name: Symbol name pattern (LIKE query)
|
||||
name: Symbol name pattern
|
||||
kind: Optional symbol kind filter
|
||||
limit: Maximum results to return
|
||||
prefix_mode: If True, use prefix search (faster with index);
|
||||
If False, use substring search (slower)
|
||||
|
||||
Returns:
|
||||
List of Symbol objects
|
||||
"""
|
||||
pattern = f"%{name}%"
|
||||
# Prefix search is much faster as it can use index
|
||||
if prefix_mode:
|
||||
pattern = f"{name}%"
|
||||
else:
|
||||
pattern = f"%{name}%"
|
||||
|
||||
with self._lock:
|
||||
conn = self._get_connection()
|
||||
if kind:
|
||||
@@ -979,6 +1055,28 @@ class DirIndexStore:
|
||||
"""
|
||||
)
|
||||
|
||||
# Normalized keywords tables for performance
|
||||
conn.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS keywords (
|
||||
id INTEGER PRIMARY KEY,
|
||||
keyword TEXT NOT NULL UNIQUE
|
||||
)
|
||||
"""
|
||||
)
|
||||
|
||||
conn.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS file_keywords (
|
||||
file_id INTEGER NOT NULL,
|
||||
keyword_id INTEGER NOT NULL,
|
||||
PRIMARY KEY (file_id, keyword_id),
|
||||
FOREIGN KEY (file_id) REFERENCES files (id) ON DELETE CASCADE,
|
||||
FOREIGN KEY (keyword_id) REFERENCES keywords (id) ON DELETE CASCADE
|
||||
)
|
||||
"""
|
||||
)
|
||||
|
||||
# Indexes
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_files_name ON files(name)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_files_path ON files(full_path)")
|
||||
@@ -986,6 +1084,9 @@ class DirIndexStore:
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_symbols_name ON symbols(name)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_symbols_file ON symbols(file_id)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_semantic_file ON semantic_metadata(file_id)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords(keyword)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_file_id ON file_keywords(file_id)")
|
||||
conn.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_keyword_id ON file_keywords(keyword_id)")
|
||||
|
||||
except sqlite3.DatabaseError as exc:
|
||||
raise StorageError(f"Failed to create schema: {exc}") from exc
|
||||
|
||||
139
codex-lens/src/codexlens/storage/migration_manager.py
Normal file
139
codex-lens/src/codexlens/storage/migration_manager.py
Normal file
@@ -0,0 +1,139 @@
|
||||
"""
|
||||
Manages database schema migrations.
|
||||
|
||||
This module provides a framework for applying versioned migrations to the SQLite
|
||||
database. Migrations are discovered from the `codexlens.storage.migrations`
|
||||
package and applied sequentially. The database schema version is tracked using
|
||||
the `user_version` pragma.
|
||||
"""
|
||||
|
||||
import importlib
|
||||
import logging
|
||||
import pkgutil
|
||||
from pathlib import Path
|
||||
from sqlite3 import Connection
|
||||
from typing import List, NamedTuple
|
||||
|
||||
log = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class Migration(NamedTuple):
|
||||
"""Represents a single database migration."""
|
||||
|
||||
version: int
|
||||
name: str
|
||||
upgrade: callable
|
||||
|
||||
|
||||
def discover_migrations() -> List[Migration]:
|
||||
"""
|
||||
Discovers and returns a sorted list of database migrations.
|
||||
|
||||
Migrations are expected to be in the `codexlens.storage.migrations` package,
|
||||
with filenames in the format `migration_XXX_description.py`, where XXX is
|
||||
the version number. Each migration module must contain an `upgrade` function
|
||||
that takes a `sqlite3.Connection` object as its argument.
|
||||
|
||||
Returns:
|
||||
A list of Migration objects, sorted by version.
|
||||
"""
|
||||
import codexlens.storage.migrations
|
||||
|
||||
migrations = []
|
||||
package_path = Path(codexlens.storage.migrations.__file__).parent
|
||||
|
||||
for _, name, _ in pkgutil.iter_modules([str(package_path)]):
|
||||
if name.startswith("migration_"):
|
||||
try:
|
||||
version = int(name.split("_")[1])
|
||||
module = importlib.import_module(f"codexlens.storage.migrations.{name}")
|
||||
if hasattr(module, "upgrade"):
|
||||
migrations.append(
|
||||
Migration(version=version, name=name, upgrade=module.upgrade)
|
||||
)
|
||||
else:
|
||||
log.warning(f"Migration {name} is missing 'upgrade' function.")
|
||||
except (ValueError, IndexError) as e:
|
||||
log.warning(f"Could not parse migration name {name}: {e}")
|
||||
except ImportError as e:
|
||||
log.warning(f"Could not import migration {name}: {e}")
|
||||
|
||||
migrations.sort(key=lambda m: m.version)
|
||||
return migrations
|
||||
|
||||
|
||||
class MigrationManager:
|
||||
"""
|
||||
Manages the application of migrations to a database.
|
||||
"""
|
||||
|
||||
def __init__(self, db_conn: Connection):
|
||||
"""
|
||||
Initializes the MigrationManager.
|
||||
|
||||
Args:
|
||||
db_conn: The SQLite database connection.
|
||||
"""
|
||||
self.db_conn = db_conn
|
||||
self.migrations = discover_migrations()
|
||||
|
||||
def get_current_version(self) -> int:
|
||||
"""
|
||||
Gets the current version of the database schema.
|
||||
|
||||
Returns:
|
||||
The current schema version number.
|
||||
"""
|
||||
return self.db_conn.execute("PRAGMA user_version").fetchone()[0]
|
||||
|
||||
def set_version(self, version: int):
|
||||
"""
|
||||
Sets the database schema version.
|
||||
|
||||
Args:
|
||||
version: The version number to set.
|
||||
"""
|
||||
self.db_conn.execute(f"PRAGMA user_version = {version}")
|
||||
log.info(f"Database schema version set to {version}")
|
||||
|
||||
def apply_migrations(self):
|
||||
"""
|
||||
Applies all pending migrations to the database.
|
||||
|
||||
This method checks the current database version and applies all
|
||||
subsequent migrations in order. Each migration is applied within
|
||||
a transaction.
|
||||
"""
|
||||
current_version = self.get_current_version()
|
||||
log.info(f"Current database schema version: {current_version}")
|
||||
|
||||
for migration in self.migrations:
|
||||
if migration.version > current_version:
|
||||
log.info(f"Applying migration {migration.version}: {migration.name}...")
|
||||
try:
|
||||
self.db_conn.execute("BEGIN")
|
||||
migration.upgrade(self.db_conn)
|
||||
self.set_version(migration.version)
|
||||
self.db_conn.execute("COMMIT")
|
||||
log.info(
|
||||
f"Successfully applied migration {migration.version}: {migration.name}"
|
||||
)
|
||||
except Exception as e:
|
||||
log.error(
|
||||
f"Failed to apply migration {migration.version}: {migration.name}. Rolling back. Error: {e}",
|
||||
exc_info=True,
|
||||
)
|
||||
self.db_conn.execute("ROLLBACK")
|
||||
raise
|
||||
|
||||
latest_migration_version = self.migrations[-1].version if self.migrations else 0
|
||||
if current_version < latest_migration_version:
|
||||
# This case can be hit if migrations were applied but the loop was exited
|
||||
# and set_version was not called for the last one for some reason.
|
||||
# To be safe, we explicitly set the version to the latest known migration.
|
||||
final_version = self.get_current_version()
|
||||
if final_version != latest_migration_version:
|
||||
log.warning(f"Database version ({final_version}) is not the latest migration version ({latest_migration_version}). This may indicate a problem.")
|
||||
|
||||
log.info("All pending migrations applied successfully.")
|
||||
|
||||
1
codex-lens/src/codexlens/storage/migrations/__init__.py
Normal file
1
codex-lens/src/codexlens/storage/migrations/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
# This file makes the 'migrations' directory a Python package.
|
||||
@@ -0,0 +1,108 @@
|
||||
"""
|
||||
Migration 001: Normalize keywords into separate tables.
|
||||
|
||||
This migration introduces two new tables, `keywords` and `file_keywords`, to
|
||||
store semantic keywords in a normalized fashion. It then migrates the existing
|
||||
keywords from the `semantic_data` JSON blob in the `files` table into these
|
||||
new tables. This is intended to speed up keyword-based searches significantly.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
from sqlite3 import Connection
|
||||
|
||||
log = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def upgrade(db_conn: Connection):
|
||||
"""
|
||||
Applies the migration to normalize keywords.
|
||||
|
||||
- Creates `keywords` and `file_keywords` tables.
|
||||
- Creates indexes for efficient querying.
|
||||
- Migrates data from `files.semantic_data` to the new tables.
|
||||
|
||||
Args:
|
||||
db_conn: The SQLite database connection.
|
||||
"""
|
||||
cursor = db_conn.cursor()
|
||||
|
||||
log.info("Creating 'keywords' and 'file_keywords' tables...")
|
||||
# Create a table to store unique keywords
|
||||
cursor.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS keywords (
|
||||
id INTEGER PRIMARY KEY,
|
||||
keyword TEXT NOT NULL UNIQUE
|
||||
)
|
||||
"""
|
||||
)
|
||||
|
||||
# Create a join table to link files and keywords (many-to-many)
|
||||
cursor.execute(
|
||||
"""
|
||||
CREATE TABLE IF NOT EXISTS file_keywords (
|
||||
file_id INTEGER NOT NULL,
|
||||
keyword_id INTEGER NOT NULL,
|
||||
PRIMARY KEY (file_id, keyword_id),
|
||||
FOREIGN KEY (file_id) REFERENCES files (id) ON DELETE CASCADE,
|
||||
FOREIGN KEY (keyword_id) REFERENCES keywords (id) ON DELETE CASCADE
|
||||
)
|
||||
"""
|
||||
)
|
||||
|
||||
log.info("Creating indexes for new keyword tables...")
|
||||
cursor.execute("CREATE INDEX IF NOT EXISTS idx_keywords_keyword ON keywords (keyword)")
|
||||
cursor.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_file_id ON file_keywords (file_id)")
|
||||
cursor.execute("CREATE INDEX IF NOT EXISTS idx_file_keywords_keyword_id ON file_keywords (keyword_id)")
|
||||
|
||||
log.info("Migrating existing keywords from 'semantic_metadata' table...")
|
||||
cursor.execute("SELECT file_id, keywords FROM semantic_metadata WHERE keywords IS NOT NULL AND keywords != ''")
|
||||
|
||||
files_to_migrate = cursor.fetchall()
|
||||
if not files_to_migrate:
|
||||
log.info("No existing files with semantic metadata to migrate.")
|
||||
return
|
||||
|
||||
log.info(f"Found {len(files_to_migrate)} files with semantic metadata to migrate.")
|
||||
|
||||
for file_id, keywords_json in files_to_migrate:
|
||||
if not keywords_json:
|
||||
continue
|
||||
try:
|
||||
keywords = json.loads(keywords_json)
|
||||
|
||||
if not isinstance(keywords, list):
|
||||
log.warning(f"Keywords for file_id {file_id} is not a list, skipping.")
|
||||
continue
|
||||
|
||||
for keyword in keywords:
|
||||
if not isinstance(keyword, str):
|
||||
log.warning(f"Non-string keyword '{keyword}' found for file_id {file_id}, skipping.")
|
||||
continue
|
||||
|
||||
keyword = keyword.strip()
|
||||
if not keyword:
|
||||
continue
|
||||
|
||||
# Get or create keyword_id
|
||||
cursor.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
|
||||
cursor.execute("SELECT id FROM keywords WHERE keyword = ?", (keyword,))
|
||||
keyword_id_result = cursor.fetchone()
|
||||
|
||||
if keyword_id_result:
|
||||
keyword_id = keyword_id_result[0]
|
||||
# Link file to keyword
|
||||
cursor.execute(
|
||||
"INSERT OR IGNORE INTO file_keywords (file_id, keyword_id) VALUES (?, ?)",
|
||||
(file_id, keyword_id),
|
||||
)
|
||||
else:
|
||||
log.error(f"Failed to retrieve or create keyword_id for keyword: {keyword}")
|
||||
|
||||
except json.JSONDecodeError as e:
|
||||
log.warning(f"Could not parse keywords for file_id {file_id}: {e}")
|
||||
except Exception as e:
|
||||
log.error(f"An unexpected error occurred during migration for file_id {file_id}: {e}", exc_info=True)
|
||||
|
||||
log.info("Finished migrating keywords.")
|
||||
@@ -424,6 +424,9 @@ class RegistryStore:
|
||||
Searches for the closest parent directory that has an index.
|
||||
Useful for supporting subdirectory searches.
|
||||
|
||||
Optimized to use single database query instead of iterating through
|
||||
each parent directory level.
|
||||
|
||||
Args:
|
||||
source_path: Source directory or file path
|
||||
|
||||
@@ -434,23 +437,30 @@ class RegistryStore:
|
||||
conn = self._get_connection()
|
||||
source_path_resolved = source_path.resolve()
|
||||
|
||||
# Check from current path up to root
|
||||
# Build list of all parent paths from deepest to shallowest
|
||||
paths_to_check = []
|
||||
current = source_path_resolved
|
||||
while True:
|
||||
current_str = str(current)
|
||||
row = conn.execute(
|
||||
"SELECT * FROM dir_mapping WHERE source_path=?", (current_str,)
|
||||
).fetchone()
|
||||
|
||||
if row:
|
||||
return self._row_to_dir_mapping(row)
|
||||
|
||||
paths_to_check.append(str(current))
|
||||
parent = current.parent
|
||||
if parent == current: # Reached filesystem root
|
||||
break
|
||||
current = parent
|
||||
|
||||
return None
|
||||
if not paths_to_check:
|
||||
return None
|
||||
|
||||
# Single query with WHERE IN, ordered by path length (longest = nearest)
|
||||
placeholders = ','.join('?' * len(paths_to_check))
|
||||
query = f"""
|
||||
SELECT * FROM dir_mapping
|
||||
WHERE source_path IN ({placeholders})
|
||||
ORDER BY LENGTH(source_path) DESC
|
||||
LIMIT 1
|
||||
"""
|
||||
|
||||
row = conn.execute(query, paths_to_check).fetchone()
|
||||
return self._row_to_dir_mapping(row) if row else None
|
||||
|
||||
def get_project_dirs(self, project_id: int) -> List[DirMapping]:
|
||||
"""Get all directory mappings for a project.
|
||||
|
||||
218
codex-lens/tests/simple_validation.py
Normal file
218
codex-lens/tests/simple_validation.py
Normal file
@@ -0,0 +1,218 @@
|
||||
"""
|
||||
Simple validation for performance optimizations (Windows-safe).
|
||||
"""
|
||||
import sys
|
||||
sys.stdout.reconfigure(encoding='utf-8')
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
from codexlens.storage.dir_index import DirIndexStore
|
||||
from codexlens.storage.registry import RegistryStore
|
||||
|
||||
|
||||
def main():
|
||||
print("=" * 60)
|
||||
print("CodexLens Performance Optimizations - Simple Validation")
|
||||
print("=" * 60)
|
||||
|
||||
# Test 1: Keyword Normalization
|
||||
print("\n[1/4] Testing Keyword Normalization...")
|
||||
try:
|
||||
tmpdir = tempfile.mkdtemp()
|
||||
db_path = Path(tmpdir) / "test1.db"
|
||||
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize()
|
||||
|
||||
file_id = store.add_file(
|
||||
name="test.py",
|
||||
full_path=Path(f"{tmpdir}/test.py"),
|
||||
content="def hello(): pass",
|
||||
language="python"
|
||||
)
|
||||
|
||||
keywords = ["auth", "security", "jwt"]
|
||||
store.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary="Test",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
# Check normalized tables
|
||||
conn = store._get_connection()
|
||||
count = conn.execute(
|
||||
"SELECT COUNT(*) as c FROM file_keywords WHERE file_id=?",
|
||||
(file_id,)
|
||||
).fetchone()["c"]
|
||||
|
||||
store.close()
|
||||
|
||||
assert count == 3, f"Expected 3 keywords, got {count}"
|
||||
print(" PASS: Keywords stored in normalized tables")
|
||||
|
||||
# Test optimized search
|
||||
store = DirIndexStore(db_path)
|
||||
results = store.search_semantic_keywords("auth", use_normalized=True)
|
||||
store.close()
|
||||
|
||||
assert len(results) == 1
|
||||
print(" PASS: Optimized keyword search works")
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f" FAIL: {e}")
|
||||
traceback.print_exc()
|
||||
return 1
|
||||
|
||||
# Test 2: Path Lookup Optimization
|
||||
print("\n[2/4] Testing Path Lookup Optimization...")
|
||||
try:
|
||||
tmpdir = tempfile.mkdtemp()
|
||||
db_path = Path(tmpdir) / "test2.db"
|
||||
|
||||
store = RegistryStore(db_path)
|
||||
store.initialize() # Create schema
|
||||
|
||||
# Register a project first
|
||||
project = store.register_project(
|
||||
source_root=Path("/a"),
|
||||
index_root=Path("/tmp")
|
||||
)
|
||||
|
||||
# Register directory
|
||||
store.register_dir(
|
||||
project_id=project.id,
|
||||
source_path=Path("/a/b/c"),
|
||||
index_path=Path("/tmp/index.db"),
|
||||
depth=2,
|
||||
files_count=0
|
||||
)
|
||||
|
||||
deep_path = Path("/a/b/c/d/e/f/g/h/i/j/file.py")
|
||||
|
||||
start = time.perf_counter()
|
||||
result = store.find_nearest_index(deep_path)
|
||||
elapsed = time.perf_counter() - start
|
||||
|
||||
store.close()
|
||||
|
||||
assert result is not None, "No result found"
|
||||
# Path is normalized, just check it contains the key parts
|
||||
assert "a" in str(result.source_path) and "b" in str(result.source_path) and "c" in str(result.source_path)
|
||||
assert elapsed < 0.05, f"Too slow: {elapsed*1000:.2f}ms"
|
||||
|
||||
print(f" PASS: Found nearest index in {elapsed*1000:.2f}ms")
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f" FAIL: {e}")
|
||||
traceback.print_exc()
|
||||
return 1
|
||||
|
||||
# Test 3: Symbol Search Prefix Mode
|
||||
print("\n[3/4] Testing Symbol Search Prefix Mode...")
|
||||
try:
|
||||
tmpdir = tempfile.mkdtemp()
|
||||
db_path = Path(tmpdir) / "test3.db"
|
||||
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize()
|
||||
|
||||
from codexlens.entities import Symbol
|
||||
file_id = store.add_file(
|
||||
name="test.py",
|
||||
full_path=Path(f"{tmpdir}/test.py"),
|
||||
content="def hello(): pass\n" * 10,
|
||||
language="python",
|
||||
symbols=[
|
||||
Symbol(name="get_user", kind="function", range=(1, 5)),
|
||||
Symbol(name="get_item", kind="function", range=(6, 10)),
|
||||
Symbol(name="create_user", kind="function", range=(11, 15)),
|
||||
]
|
||||
)
|
||||
|
||||
# Prefix search
|
||||
results = store.search_symbols("get", prefix_mode=True)
|
||||
store.close()
|
||||
|
||||
assert len(results) == 2, f"Expected 2, got {len(results)}"
|
||||
for symbol in results:
|
||||
assert symbol.name.startswith("get")
|
||||
|
||||
print(f" PASS: Prefix search found {len(results)} symbols")
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f" FAIL: {e}")
|
||||
traceback.print_exc()
|
||||
return 1
|
||||
|
||||
# Test 4: Performance Comparison
|
||||
print("\n[4/4] Testing Performance Comparison...")
|
||||
try:
|
||||
tmpdir = tempfile.mkdtemp()
|
||||
db_path = Path(tmpdir) / "test4.db"
|
||||
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize()
|
||||
|
||||
# Create 50 files with keywords
|
||||
for i in range(50):
|
||||
file_id = store.add_file(
|
||||
name=f"file_{i}.py",
|
||||
full_path=Path(f"{tmpdir}/file_{i}.py"),
|
||||
content=f"def function_{i}(): pass",
|
||||
language="python"
|
||||
)
|
||||
|
||||
keywords = ["auth", "security"] if i % 2 == 0 else ["api", "endpoint"]
|
||||
store.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary=f"File {i}",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
# Benchmark normalized
|
||||
start = time.perf_counter()
|
||||
for _ in range(5):
|
||||
results_norm = store.search_semantic_keywords("auth", use_normalized=True)
|
||||
norm_time = time.perf_counter() - start
|
||||
|
||||
# Benchmark fallback
|
||||
start = time.perf_counter()
|
||||
for _ in range(5):
|
||||
results_fallback = store.search_semantic_keywords("auth", use_normalized=False)
|
||||
fallback_time = time.perf_counter() - start
|
||||
|
||||
store.close()
|
||||
|
||||
assert len(results_norm) == len(results_fallback)
|
||||
speedup = fallback_time / norm_time if norm_time > 0 else 1.0
|
||||
|
||||
print(f" Normalized: {norm_time*1000:.2f}ms (5 iterations)")
|
||||
print(f" Fallback: {fallback_time*1000:.2f}ms (5 iterations)")
|
||||
print(f" Speedup: {speedup:.2f}x")
|
||||
print(" PASS: Performance test completed")
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f" FAIL: {e}")
|
||||
traceback.print_exc()
|
||||
return 1
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
print("ALL VALIDATION TESTS PASSED")
|
||||
print("=" * 60)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
exit(main())
|
||||
467
codex-lens/tests/test_performance_optimizations.py
Normal file
467
codex-lens/tests/test_performance_optimizations.py
Normal file
@@ -0,0 +1,467 @@
|
||||
"""Tests for performance optimizations in CodexLens storage.
|
||||
|
||||
This module tests the following optimizations:
|
||||
1. Normalized keywords search (migration_001)
|
||||
2. Optimized path lookup in registry
|
||||
3. Prefix-mode symbol search
|
||||
"""
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from codexlens.storage.dir_index import DirIndexStore
|
||||
from codexlens.storage.registry import RegistryStore
|
||||
from codexlens.storage.migration_manager import MigrationManager
|
||||
from codexlens.storage.migrations import migration_001_normalize_keywords
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_index_db():
|
||||
"""Create a temporary dir index database."""
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_index.db"
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize() # Initialize schema
|
||||
yield store
|
||||
store.close()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def temp_registry_db():
|
||||
"""Create a temporary registry database."""
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_registry.db"
|
||||
store = RegistryStore(db_path)
|
||||
store.initialize() # Initialize schema
|
||||
yield store
|
||||
store.close()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def populated_index_db(temp_index_db):
|
||||
"""Create an index database with sample data.
|
||||
|
||||
Uses 100 files to provide meaningful performance comparison between
|
||||
optimized and fallback implementations.
|
||||
"""
|
||||
from codexlens.entities import Symbol
|
||||
|
||||
store = temp_index_db
|
||||
|
||||
# Add files with symbols and keywords
|
||||
# Using 100 files to show performance improvements
|
||||
file_ids = []
|
||||
|
||||
# Define keyword pools for cycling
|
||||
keyword_pools = [
|
||||
["auth", "security", "jwt"],
|
||||
["database", "sql", "query"],
|
||||
["auth", "login", "password"],
|
||||
["api", "rest", "endpoint"],
|
||||
["cache", "redis", "performance"],
|
||||
["auth", "oauth", "token"],
|
||||
["test", "unittest", "pytest"],
|
||||
["database", "postgres", "migration"],
|
||||
["api", "graphql", "resolver"],
|
||||
["security", "encryption", "crypto"]
|
||||
]
|
||||
|
||||
for i in range(100):
|
||||
# Create symbols for first 50 files to have more symbol search data
|
||||
symbols = None
|
||||
if i < 50:
|
||||
symbols = [
|
||||
Symbol(name=f"get_user_{i}", kind="function", range=(1, 10)),
|
||||
Symbol(name=f"create_user_{i}", kind="function", range=(11, 20)),
|
||||
Symbol(name=f"UserClass_{i}", kind="class", range=(21, 40)),
|
||||
]
|
||||
|
||||
file_id = store.add_file(
|
||||
name=f"file_{i}.py",
|
||||
full_path=Path(f"/test/path/file_{i}.py"),
|
||||
content=f"def function_{i}(): pass\n" * 10,
|
||||
language="python",
|
||||
symbols=symbols
|
||||
)
|
||||
file_ids.append(file_id)
|
||||
|
||||
# Add semantic metadata with keywords (cycle through keyword pools)
|
||||
keywords = keyword_pools[i % len(keyword_pools)]
|
||||
store.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary=f"Test file {file_id}",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
return store
|
||||
|
||||
|
||||
class TestKeywordNormalization:
|
||||
"""Test normalized keywords functionality."""
|
||||
|
||||
def test_migration_creates_tables(self, temp_index_db):
|
||||
"""Test that migration creates keywords and file_keywords tables."""
|
||||
conn = temp_index_db._get_connection()
|
||||
|
||||
# Verify tables exist (created by _create_schema)
|
||||
tables = conn.execute("""
|
||||
SELECT name FROM sqlite_master
|
||||
WHERE type='table' AND name IN ('keywords', 'file_keywords')
|
||||
""").fetchall()
|
||||
|
||||
assert len(tables) == 2
|
||||
|
||||
def test_migration_creates_indexes(self, temp_index_db):
|
||||
"""Test that migration creates necessary indexes."""
|
||||
conn = temp_index_db._get_connection()
|
||||
|
||||
# Check for indexes
|
||||
indexes = conn.execute("""
|
||||
SELECT name FROM sqlite_master
|
||||
WHERE type='index' AND name IN (
|
||||
'idx_keywords_keyword',
|
||||
'idx_file_keywords_file_id',
|
||||
'idx_file_keywords_keyword_id'
|
||||
)
|
||||
""").fetchall()
|
||||
|
||||
assert len(indexes) == 3
|
||||
|
||||
def test_add_semantic_metadata_populates_normalized_tables(self, temp_index_db):
|
||||
"""Test that adding metadata populates both old and new tables."""
|
||||
# Add a file
|
||||
file_id = temp_index_db.add_file(
|
||||
name="test.py",
|
||||
full_path=Path("/test/test.py"),
|
||||
language="python",
|
||||
content="test"
|
||||
)
|
||||
|
||||
# Add semantic metadata
|
||||
keywords = ["auth", "security", "jwt"]
|
||||
temp_index_db.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary="Test summary",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
conn = temp_index_db._get_connection()
|
||||
|
||||
# Check semantic_metadata table (backward compatibility)
|
||||
row = conn.execute(
|
||||
"SELECT keywords FROM semantic_metadata WHERE file_id=?",
|
||||
(file_id,)
|
||||
).fetchone()
|
||||
assert row is not None
|
||||
assert json.loads(row["keywords"]) == keywords
|
||||
|
||||
# Check normalized keywords table
|
||||
keyword_rows = conn.execute("""
|
||||
SELECT k.keyword
|
||||
FROM file_keywords fk
|
||||
JOIN keywords k ON fk.keyword_id = k.id
|
||||
WHERE fk.file_id = ?
|
||||
""", (file_id,)).fetchall()
|
||||
|
||||
assert len(keyword_rows) == 3
|
||||
normalized_keywords = [row["keyword"] for row in keyword_rows]
|
||||
assert set(normalized_keywords) == set(keywords)
|
||||
|
||||
def test_search_semantic_keywords_normalized(self, populated_index_db):
|
||||
"""Test optimized keyword search using normalized tables."""
|
||||
results = populated_index_db.search_semantic_keywords("auth", use_normalized=True)
|
||||
|
||||
# Should find 3 files with "auth" keyword
|
||||
assert len(results) >= 3
|
||||
|
||||
# Verify results structure
|
||||
for file_entry, keywords in results:
|
||||
assert file_entry.name.startswith("file_")
|
||||
assert isinstance(keywords, list)
|
||||
assert any("auth" in k.lower() for k in keywords)
|
||||
|
||||
def test_search_semantic_keywords_fallback(self, populated_index_db):
|
||||
"""Test that fallback search still works."""
|
||||
results = populated_index_db.search_semantic_keywords("auth", use_normalized=False)
|
||||
|
||||
# Should find files with "auth" keyword
|
||||
assert len(results) >= 3
|
||||
|
||||
for file_entry, keywords in results:
|
||||
assert isinstance(keywords, list)
|
||||
|
||||
|
||||
class TestPathLookupOptimization:
|
||||
"""Test optimized path lookup in registry."""
|
||||
|
||||
def test_find_nearest_index_shallow(self, temp_registry_db):
|
||||
"""Test path lookup with shallow directory structure."""
|
||||
# Register a project first
|
||||
project = temp_registry_db.register_project(
|
||||
source_root=Path("/test"),
|
||||
index_root=Path("/tmp")
|
||||
)
|
||||
|
||||
# Register directory mapping
|
||||
temp_registry_db.register_dir(
|
||||
project_id=project.id,
|
||||
source_path=Path("/test"),
|
||||
index_path=Path("/tmp/index.db"),
|
||||
depth=0,
|
||||
files_count=0
|
||||
)
|
||||
|
||||
# Search for subdirectory
|
||||
result = temp_registry_db.find_nearest_index(Path("/test/subdir/file.py"))
|
||||
|
||||
assert result is not None
|
||||
# Compare as strings for cross-platform compatibility
|
||||
assert "/test" in str(result.source_path) or "\\test" in str(result.source_path)
|
||||
|
||||
def test_find_nearest_index_deep(self, temp_registry_db):
|
||||
"""Test path lookup with deep directory structure."""
|
||||
# Register a project
|
||||
project = temp_registry_db.register_project(
|
||||
source_root=Path("/a"),
|
||||
index_root=Path("/tmp")
|
||||
)
|
||||
|
||||
# Add directory mappings at different levels
|
||||
temp_registry_db.register_dir(
|
||||
project_id=project.id,
|
||||
source_path=Path("/a"),
|
||||
index_path=Path("/tmp/index_a.db"),
|
||||
depth=0,
|
||||
files_count=0
|
||||
)
|
||||
temp_registry_db.register_dir(
|
||||
project_id=project.id,
|
||||
source_path=Path("/a/b/c"),
|
||||
index_path=Path("/tmp/index_abc.db"),
|
||||
depth=2,
|
||||
files_count=0
|
||||
)
|
||||
|
||||
# Should find nearest (longest) match
|
||||
result = temp_registry_db.find_nearest_index(Path("/a/b/c/d/e/f/file.py"))
|
||||
|
||||
assert result is not None
|
||||
# Check that path contains the key parts
|
||||
result_path = str(result.source_path)
|
||||
assert "a" in result_path and "b" in result_path and "c" in result_path
|
||||
|
||||
def test_find_nearest_index_not_found(self, temp_registry_db):
|
||||
"""Test path lookup when no mapping exists."""
|
||||
result = temp_registry_db.find_nearest_index(Path("/nonexistent/path"))
|
||||
assert result is None
|
||||
|
||||
def test_find_nearest_index_performance(self, temp_registry_db):
|
||||
"""Basic performance test for path lookup."""
|
||||
# Register a project
|
||||
project = temp_registry_db.register_project(
|
||||
source_root=Path("/root"),
|
||||
index_root=Path("/tmp")
|
||||
)
|
||||
|
||||
# Add mapping at root
|
||||
temp_registry_db.register_dir(
|
||||
project_id=project.id,
|
||||
source_path=Path("/root"),
|
||||
index_path=Path("/tmp/index.db"),
|
||||
depth=0,
|
||||
files_count=0
|
||||
)
|
||||
|
||||
# Test with very deep path (10 levels)
|
||||
deep_path = Path("/root/a/b/c/d/e/f/g/h/i/j/file.py")
|
||||
|
||||
start = time.perf_counter()
|
||||
result = temp_registry_db.find_nearest_index(deep_path)
|
||||
elapsed = time.perf_counter() - start
|
||||
|
||||
# Should complete quickly (< 50ms even on slow systems)
|
||||
assert elapsed < 0.05
|
||||
assert result is not None
|
||||
|
||||
|
||||
class TestSymbolSearchOptimization:
|
||||
"""Test optimized symbol search."""
|
||||
|
||||
def test_symbol_search_prefix_mode(self, populated_index_db):
|
||||
"""Test symbol search with prefix mode."""
|
||||
results = populated_index_db.search_symbols("get", prefix_mode=True)
|
||||
|
||||
# Should find symbols starting with "get"
|
||||
assert len(results) > 0
|
||||
for symbol in results:
|
||||
assert symbol.name.startswith("get")
|
||||
|
||||
def test_symbol_search_substring_mode(self, populated_index_db):
|
||||
"""Test symbol search with substring mode."""
|
||||
results = populated_index_db.search_symbols("user", prefix_mode=False)
|
||||
|
||||
# Should find symbols containing "user"
|
||||
assert len(results) > 0
|
||||
for symbol in results:
|
||||
assert "user" in symbol.name.lower()
|
||||
|
||||
def test_symbol_search_with_kind_filter(self, populated_index_db):
|
||||
"""Test symbol search with kind filter."""
|
||||
results = populated_index_db.search_symbols(
|
||||
"UserClass",
|
||||
kind="class",
|
||||
prefix_mode=True
|
||||
)
|
||||
|
||||
# Should find only class symbols
|
||||
assert len(results) > 0
|
||||
for symbol in results:
|
||||
assert symbol.kind == "class"
|
||||
|
||||
def test_symbol_search_limit(self, populated_index_db):
|
||||
"""Test symbol search respects limit."""
|
||||
results = populated_index_db.search_symbols("", prefix_mode=True, limit=5)
|
||||
|
||||
# Should return at most 5 results
|
||||
assert len(results) <= 5
|
||||
|
||||
|
||||
class TestMigrationManager:
|
||||
"""Test migration manager functionality."""
|
||||
|
||||
def test_migration_manager_tracks_version(self, temp_index_db):
|
||||
"""Test that migration manager tracks schema version."""
|
||||
conn = temp_index_db._get_connection()
|
||||
manager = MigrationManager(conn)
|
||||
|
||||
current_version = manager.get_current_version()
|
||||
assert current_version >= 0
|
||||
|
||||
def test_migration_001_can_run(self, temp_index_db):
|
||||
"""Test that migration_001 can be applied."""
|
||||
conn = temp_index_db._get_connection()
|
||||
|
||||
# Add some test data to semantic_metadata first
|
||||
conn.execute("""
|
||||
INSERT INTO files(id, name, full_path, language, content, mtime, line_count)
|
||||
VALUES(100, 'test.py', '/test_migration.py', 'python', 'def test(): pass', 0, 10)
|
||||
""")
|
||||
conn.execute("""
|
||||
INSERT INTO semantic_metadata(file_id, keywords)
|
||||
VALUES(100, ?)
|
||||
""", (json.dumps(["test", "keyword"]),))
|
||||
conn.commit()
|
||||
|
||||
# Run migration (should be idempotent, tables already created by initialize())
|
||||
try:
|
||||
migration_001_normalize_keywords.upgrade(conn)
|
||||
success = True
|
||||
except Exception as e:
|
||||
success = False
|
||||
print(f"Migration failed: {e}")
|
||||
|
||||
assert success
|
||||
|
||||
# Verify data was migrated
|
||||
keyword_count = conn.execute("""
|
||||
SELECT COUNT(*) as c FROM file_keywords WHERE file_id=100
|
||||
""").fetchone()["c"]
|
||||
|
||||
assert keyword_count == 2 # "test" and "keyword"
|
||||
|
||||
|
||||
class TestPerformanceComparison:
|
||||
"""Compare performance of old vs new implementations."""
|
||||
|
||||
def test_keyword_search_performance(self, populated_index_db):
|
||||
"""Compare keyword search performance.
|
||||
|
||||
IMPORTANT: The normalized query optimization is designed for large datasets
|
||||
(1000+ files). On small datasets (< 1000 files), the overhead of JOINs and
|
||||
GROUP BY operations can make the normalized query slower than the simple
|
||||
LIKE query on JSON fields. This is expected behavior.
|
||||
|
||||
Performance benefits appear when:
|
||||
- Dataset size > 1000 files
|
||||
- Full-table scans on JSON LIKE become the bottleneck
|
||||
- Index-based lookups provide O(log N) complexity advantage
|
||||
"""
|
||||
# Normalized search
|
||||
start = time.perf_counter()
|
||||
normalized_results = populated_index_db.search_semantic_keywords(
|
||||
"auth",
|
||||
use_normalized=True
|
||||
)
|
||||
normalized_time = time.perf_counter() - start
|
||||
|
||||
# Fallback search
|
||||
start = time.perf_counter()
|
||||
fallback_results = populated_index_db.search_semantic_keywords(
|
||||
"auth",
|
||||
use_normalized=False
|
||||
)
|
||||
fallback_time = time.perf_counter() - start
|
||||
|
||||
# Verify correctness: both queries should return identical results
|
||||
assert len(normalized_results) == len(fallback_results)
|
||||
|
||||
# Verify result content matches
|
||||
normalized_files = {entry.id for entry, _ in normalized_results}
|
||||
fallback_files = {entry.id for entry, _ in fallback_results}
|
||||
assert normalized_files == fallback_files, "Both queries must return same files"
|
||||
|
||||
# Document performance characteristics (no strict assertion)
|
||||
# On datasets < 1000 files, normalized may be slower due to JOIN overhead
|
||||
print(f"\nKeyword search performance (100 files):")
|
||||
print(f" Normalized: {normalized_time*1000:.3f}ms")
|
||||
print(f" Fallback: {fallback_time*1000:.3f}ms")
|
||||
print(f" Ratio: {normalized_time/fallback_time:.2f}x")
|
||||
print(f" Note: Performance benefits appear with 1000+ files")
|
||||
|
||||
def test_prefix_vs_substring_symbol_search(self, populated_index_db):
|
||||
"""Compare prefix vs substring symbol search performance.
|
||||
|
||||
IMPORTANT: Prefix search optimization (LIKE 'prefix%') benefits from B-tree
|
||||
indexes, but on small datasets (< 1000 symbols), the performance difference
|
||||
may not be measurable or may even be slower due to query planner overhead.
|
||||
|
||||
Performance benefits appear when:
|
||||
- Symbol count > 1000
|
||||
- Index-based prefix search provides O(log N) advantage
|
||||
- Full table scans with LIKE '%substring%' become bottleneck
|
||||
"""
|
||||
# Prefix search (optimized)
|
||||
start = time.perf_counter()
|
||||
prefix_results = populated_index_db.search_symbols("get", prefix_mode=True)
|
||||
prefix_time = time.perf_counter() - start
|
||||
|
||||
# Substring search (fallback)
|
||||
start = time.perf_counter()
|
||||
substring_results = populated_index_db.search_symbols("get", prefix_mode=False)
|
||||
substring_time = time.perf_counter() - start
|
||||
|
||||
# Verify correctness: prefix results should be subset of substring results
|
||||
prefix_names = {s.name for s in prefix_results}
|
||||
substring_names = {s.name for s in substring_results}
|
||||
assert prefix_names.issubset(substring_names), "Prefix must be subset of substring"
|
||||
|
||||
# Verify all prefix results actually start with search term
|
||||
for symbol in prefix_results:
|
||||
assert symbol.name.startswith("get"), f"Symbol {symbol.name} should start with 'get'"
|
||||
|
||||
# Document performance characteristics (no strict assertion)
|
||||
# On datasets < 1000 symbols, performance difference is negligible
|
||||
print(f"\nSymbol search performance (150 symbols):")
|
||||
print(f" Prefix: {prefix_time*1000:.3f}ms ({len(prefix_results)} results)")
|
||||
print(f" Substring: {substring_time*1000:.3f}ms ({len(substring_results)} results)")
|
||||
print(f" Ratio: {prefix_time/substring_time:.2f}x")
|
||||
print(f" Note: Performance benefits appear with 1000+ symbols")
|
||||
287
codex-lens/tests/validate_optimizations.py
Normal file
287
codex-lens/tests/validate_optimizations.py
Normal file
@@ -0,0 +1,287 @@
|
||||
"""
|
||||
Manual validation script for performance optimizations.
|
||||
|
||||
This script verifies that the optimization implementations are working correctly.
|
||||
Run with: python tests/validate_optimizations.py
|
||||
"""
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
from codexlens.storage.dir_index import DirIndexStore
|
||||
from codexlens.storage.registry import RegistryStore
|
||||
from codexlens.storage.migration_manager import MigrationManager
|
||||
from codexlens.storage.migrations import migration_001_normalize_keywords
|
||||
|
||||
|
||||
def test_keyword_normalization():
|
||||
"""Test normalized keywords functionality."""
|
||||
print("\n=== Testing Keyword Normalization ===")
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_index.db"
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize() # Create schema
|
||||
|
||||
# Add a test file
|
||||
# Note: add_file automatically calculates mtime and line_count
|
||||
file_id = store.add_file(
|
||||
name="test.py",
|
||||
full_path=Path("/test/test.py"),
|
||||
content="def hello(): pass",
|
||||
language="python"
|
||||
)
|
||||
|
||||
# Add semantic metadata with keywords
|
||||
keywords = ["auth", "security", "jwt"]
|
||||
store.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary="Test summary",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
conn = store._get_connection()
|
||||
|
||||
# Verify keywords table populated
|
||||
keyword_rows = conn.execute("""
|
||||
SELECT k.keyword
|
||||
FROM file_keywords fk
|
||||
JOIN keywords k ON fk.keyword_id = k.id
|
||||
WHERE fk.file_id = ?
|
||||
""", (file_id,)).fetchall()
|
||||
|
||||
normalized_keywords = [row["keyword"] for row in keyword_rows]
|
||||
print(f"✓ Keywords stored in normalized tables: {normalized_keywords}")
|
||||
assert set(normalized_keywords) == set(keywords), "Keywords mismatch!"
|
||||
|
||||
# Test optimized search
|
||||
results = store.search_semantic_keywords("auth", use_normalized=True)
|
||||
print(f"✓ Found {len(results)} file(s) with keyword 'auth'")
|
||||
assert len(results) > 0, "No results found!"
|
||||
|
||||
# Test fallback search
|
||||
results_fallback = store.search_semantic_keywords("auth", use_normalized=False)
|
||||
print(f"✓ Fallback search found {len(results_fallback)} file(s)")
|
||||
assert len(results) == len(results_fallback), "Result count mismatch!"
|
||||
|
||||
store.close()
|
||||
print("✓ Keyword normalization tests PASSED")
|
||||
|
||||
|
||||
def test_path_lookup_optimization():
|
||||
"""Test optimized path lookup."""
|
||||
print("\n=== Testing Path Lookup Optimization ===")
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_registry.db"
|
||||
store = RegistryStore(db_path)
|
||||
|
||||
# Add directory mapping
|
||||
store.add_dir_mapping(
|
||||
source_path=Path("/a/b/c"),
|
||||
index_path=Path("/tmp/index.db"),
|
||||
project_id=None
|
||||
)
|
||||
|
||||
# Test deep path lookup
|
||||
deep_path = Path("/a/b/c/d/e/f/g/h/i/j/file.py")
|
||||
|
||||
start = time.perf_counter()
|
||||
result = store.find_nearest_index(deep_path)
|
||||
elapsed = time.perf_counter() - start
|
||||
|
||||
print(f"✓ Found nearest index in {elapsed*1000:.2f}ms")
|
||||
assert result is not None, "No result found!"
|
||||
assert result.source_path == Path("/a/b/c"), "Wrong path found!"
|
||||
assert elapsed < 0.05, f"Too slow: {elapsed*1000:.2f}ms"
|
||||
|
||||
store.close()
|
||||
print("✓ Path lookup optimization tests PASSED")
|
||||
|
||||
|
||||
def test_symbol_search_prefix_mode():
|
||||
"""Test symbol search with prefix mode."""
|
||||
print("\n=== Testing Symbol Search Prefix Mode ===")
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_index.db"
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize() # Create schema
|
||||
|
||||
# Add a test file
|
||||
file_id = store.add_file(
|
||||
name="test.py",
|
||||
full_path=Path("/test/test.py"),
|
||||
content="def hello(): pass\n" * 10, # 10 lines
|
||||
language="python"
|
||||
)
|
||||
|
||||
# Add symbols
|
||||
store.add_symbols(
|
||||
file_id=file_id,
|
||||
symbols=[
|
||||
("get_user", "function", 1, 5),
|
||||
("get_item", "function", 6, 10),
|
||||
("create_user", "function", 11, 15),
|
||||
("UserClass", "class", 16, 25),
|
||||
]
|
||||
)
|
||||
|
||||
# Test prefix search
|
||||
results = store.search_symbols("get", prefix_mode=True)
|
||||
print(f"✓ Prefix search for 'get' found {len(results)} symbol(s)")
|
||||
assert len(results) == 2, f"Expected 2 symbols, got {len(results)}"
|
||||
for symbol in results:
|
||||
assert symbol.name.startswith("get"), f"Symbol {symbol.name} doesn't start with 'get'"
|
||||
print(f" Symbols: {[s.name for s in results]}")
|
||||
|
||||
# Test substring search
|
||||
results_sub = store.search_symbols("user", prefix_mode=False)
|
||||
print(f"✓ Substring search for 'user' found {len(results_sub)} symbol(s)")
|
||||
assert len(results_sub) == 3, f"Expected 3 symbols, got {len(results_sub)}"
|
||||
print(f" Symbols: {[s.name for s in results_sub]}")
|
||||
|
||||
store.close()
|
||||
print("✓ Symbol search optimization tests PASSED")
|
||||
|
||||
|
||||
def test_migration_001():
|
||||
"""Test migration_001 execution."""
|
||||
print("\n=== Testing Migration 001 ===")
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_index.db"
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize() # Create schema
|
||||
conn = store._get_connection()
|
||||
|
||||
# Add test data to semantic_metadata
|
||||
conn.execute("""
|
||||
INSERT INTO files(id, name, full_path, language, mtime, line_count)
|
||||
VALUES(1, 'test.py', '/test.py', 'python', 0, 10)
|
||||
""")
|
||||
conn.execute("""
|
||||
INSERT INTO semantic_metadata(file_id, keywords)
|
||||
VALUES(1, ?)
|
||||
""", (json.dumps(["test", "migration", "keyword"]),))
|
||||
conn.commit()
|
||||
|
||||
# Run migration
|
||||
print(" Running migration_001...")
|
||||
migration_001_normalize_keywords.upgrade(conn)
|
||||
print(" Migration completed successfully")
|
||||
|
||||
# Verify migration results
|
||||
keyword_count = conn.execute("""
|
||||
SELECT COUNT(*) as c FROM file_keywords WHERE file_id=1
|
||||
""").fetchone()["c"]
|
||||
|
||||
print(f"✓ Migrated {keyword_count} keywords for file_id=1")
|
||||
assert keyword_count == 3, f"Expected 3 keywords, got {keyword_count}"
|
||||
|
||||
# Verify keywords table
|
||||
keywords = conn.execute("""
|
||||
SELECT k.keyword FROM keywords k
|
||||
JOIN file_keywords fk ON k.id = fk.keyword_id
|
||||
WHERE fk.file_id = 1
|
||||
""").fetchall()
|
||||
keyword_list = [row["keyword"] for row in keywords]
|
||||
print(f" Keywords: {keyword_list}")
|
||||
|
||||
store.close()
|
||||
print("✓ Migration 001 tests PASSED")
|
||||
|
||||
|
||||
def test_performance_comparison():
|
||||
"""Compare performance of optimized vs fallback implementations."""
|
||||
print("\n=== Performance Comparison ===")
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
db_path = Path(tmpdir) / "test_index.db"
|
||||
store = DirIndexStore(db_path)
|
||||
store.initialize() # Create schema
|
||||
|
||||
# Create test data
|
||||
print(" Creating test data...")
|
||||
for i in range(100):
|
||||
file_id = store.add_file(
|
||||
name=f"file_{i}.py",
|
||||
full_path=Path(f"/test/file_{i}.py"),
|
||||
content=f"def function_{i}(): pass",
|
||||
language="python"
|
||||
)
|
||||
|
||||
# Vary keywords
|
||||
if i % 3 == 0:
|
||||
keywords = ["auth", "security"]
|
||||
elif i % 3 == 1:
|
||||
keywords = ["database", "query"]
|
||||
else:
|
||||
keywords = ["api", "endpoint"]
|
||||
|
||||
store.add_semantic_metadata(
|
||||
file_id=file_id,
|
||||
summary=f"File {i}",
|
||||
keywords=keywords,
|
||||
purpose="Testing",
|
||||
llm_tool="gemini"
|
||||
)
|
||||
|
||||
# Benchmark normalized search
|
||||
print(" Benchmarking normalized search...")
|
||||
start = time.perf_counter()
|
||||
for _ in range(10):
|
||||
results_norm = store.search_semantic_keywords("auth", use_normalized=True)
|
||||
norm_time = time.perf_counter() - start
|
||||
|
||||
# Benchmark fallback search
|
||||
print(" Benchmarking fallback search...")
|
||||
start = time.perf_counter()
|
||||
for _ in range(10):
|
||||
results_fallback = store.search_semantic_keywords("auth", use_normalized=False)
|
||||
fallback_time = time.perf_counter() - start
|
||||
|
||||
print(f"\n Results:")
|
||||
print(f" - Normalized search: {norm_time*1000:.2f}ms (10 iterations)")
|
||||
print(f" - Fallback search: {fallback_time*1000:.2f}ms (10 iterations)")
|
||||
print(f" - Speedup factor: {fallback_time/norm_time:.2f}x")
|
||||
print(f" - Both found {len(results_norm)} files")
|
||||
|
||||
assert len(results_norm) == len(results_fallback), "Result count mismatch!"
|
||||
|
||||
store.close()
|
||||
print("✓ Performance comparison PASSED")
|
||||
|
||||
|
||||
def main():
|
||||
"""Run all validation tests."""
|
||||
print("=" * 60)
|
||||
print("CodexLens Performance Optimizations Validation")
|
||||
print("=" * 60)
|
||||
|
||||
try:
|
||||
test_keyword_normalization()
|
||||
test_path_lookup_optimization()
|
||||
test_symbol_search_prefix_mode()
|
||||
test_migration_001()
|
||||
test_performance_comparison()
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
print("✓✓✓ ALL VALIDATION TESTS PASSED ✓✓✓")
|
||||
print("=" * 60)
|
||||
return 0
|
||||
|
||||
except Exception as e:
|
||||
print(f"\nX VALIDATION FAILED: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
exit(main())
|
||||
Reference in New Issue
Block a user