mirror of
https://github.com/catlog22/Claude-Code-Workflow.git
synced 2026-03-21 19:08:17 +08:00
feat: enhance search, ranking, reranker and CLI tooling across ccw and codex-lens
Major improvements to smart-search, chain-search cascade, ranking pipeline, reranker factory, CLI history store, codex-lens integration, and uv-manager. Simplify command-generator skill by inlining phases. Add comprehensive tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
name: command-generator
description: Command file generator - 5 phase workflow for creating Claude Code command files with YAML frontmatter. Generates .md command files for project or user scope. Triggers on "create command", "new command", "command generator".
allowed-tools: Read, Write, Edit, Bash, Glob, AskUserQuestion
---

# Command Generator

<purpose>

Generate Claude Code command .md files with concrete, domain-specific content in GSD style. Produces command files at project-level (`.claude/commands/`) or user-level (`~/.claude/commands/`) with YAML frontmatter, XML semantic tags (`<purpose>`, `<process>`, `<step>`, `<error_codes>`, `<success_criteria>`), and actionable execution logic, NOT empty placeholders.

Invoked when the user requests "create command", "new command", or "command generator".

</purpose>

<required_reading>

- @.claude/skills/command-generator/specs/command-design-spec.md
- @.claude/skills/command-generator/templates/command-md.md

</required_reading>

<process>

<step name="validate_params" priority="first">

**Parse and validate all input parameters.**

Extract from `$ARGUMENTS` or skill args:

| Parameter | Required | Validation | Example |
|-----------|----------|------------|---------|
| `$SKILL_NAME` | Yes | `/^[a-z][a-z0-9-]*$/`, min 1 char | `deploy`, `create` |
| `$DESCRIPTION` | Yes | min 10 chars | `"Deploy application to production"` |
| `$LOCATION` | Yes | `"project"` or `"user"` | `project` |
| `$GROUP` | No | `/^[a-z][a-z0-9-]*$/` or null | `issue`, `workflow` |
| `$ARGUMENT_HINT` | No | any string or empty | `"<url> [--priority 1-5]"` |

**Validation rules:**

- Missing required param → Error with a specific message (e.g., `"skillName is required"`)
- Invalid `$SKILL_NAME` pattern → Error: `"skillName must be lowercase alphanumeric with hyphens, starting with a letter"`
- Invalid `$LOCATION` → Error: `"location must be 'project' or 'user'"`
- Invalid `$GROUP` pattern → Warning, continue

**Normalize:** trim + lowercase for `$SKILL_NAME`, `$LOCATION`, and `$GROUP`.

</step>

<step name="resolve_target_path">

**Resolve the target file path based on location and group.**

Path mapping:

| Location | Base Directory |
|----------|---------------|
| `project` | `.claude/commands` |
| `user` | `~/.claude/commands` (expand `~` to `$HOME`) |

Path construction:
```
If $GROUP:
    $TARGET_DIR = {base}/{$GROUP}
    $TARGET_PATH = {base}/{$GROUP}/{$SKILL_NAME}.md
Else:
    $TARGET_DIR = {base}
    $TARGET_PATH = {base}/{$SKILL_NAME}.md
```

Check if `$TARGET_PATH` already exists → store as `$FILE_EXISTS` (true/false).

</step>

<step name="gather_requirements">

**Gather domain-specific requirements to generate concrete content.**

Infer the command's domain from `$SKILL_NAME`, `$DESCRIPTION`, and `$ARGUMENT_HINT`:

| Signal | Extract |
|--------|---------|
| `$SKILL_NAME` | Action verb (deploy, create, analyze, sync) → step naming |
| `$DESCRIPTION` | Domain keywords → execution logic, error scenarios |
| `$ARGUMENT_HINT` | Flags/args → parse_input step details, validation rules |
| `$GROUP` | Command family → related commands, shared patterns |

**Determine command complexity:**

| Complexity | Criteria | Steps to Generate |
|------------|----------|-------------------|
| Simple | Single action, no flags | 2-3 steps |
| Standard | 1-2 flags, clear workflow | 3-4 steps |
| Complex | Multiple flags, multi-phase | 4-6 steps |
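The complexity heuristic above can be sketched as a small classifier. This is an illustrative sketch only: the `classifyComplexity` helper and its flag-counting regex are assumptions for this document, not part of the skill.

```javascript
// Illustrative sketch: classify command complexity from $ARGUMENT_HINT.
// The flag-counting regex is a rough assumption, not part of this skill.
function classifyComplexity(argumentHint) {
  // Count flag tokens such as -y, --yes, --priority
  // (only when preceded by start of string, whitespace, '[' or '|')
  const flags = (argumentHint.match(/(?:^|[\s[|])--?[a-z][a-z-]*/g) || []).length;
  if (flags === 0) return { complexity: 'Simple', steps: '2-3' };
  if (flags <= 2) return { complexity: 'Standard', steps: '3-4' };
  return { complexity: 'Complex', steps: '4-6' };
}
```

Note the prefix guard: a bare `/--?[a-z]/` scan would false-match hyphenated words like `text-description` inside the hint.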

**If complexity is unclear**, ask the user:

```
AskUserQuestion(
  header: "Command Scope",
  question: "What are the main execution steps for this command?",
  options: [
    { label: "Simple", description: "Single action: validate → execute → report" },
    { label: "Standard", description: "Multi-step: parse → process → verify → report" },
    { label: "Complex", description: "Full workflow: parse → explore → execute → verify → report" },
    { label: "I'll describe", description: "Let me specify the steps" }
  ]
)
```

Store the answers as `$COMMAND_STEPS`, `$ERROR_SCENARIOS`, `$SUCCESS_CONDITIONS`.

</step>

<step name="draft_content">

**Generate concrete, domain-specific command content in GSD style.**

This is the core generation step. Draft the COMPLETE command file, not a template with placeholders, using the gathered requirements.

**YAML Frontmatter:**
```yaml
---
name: $SKILL_NAME
description: $DESCRIPTION
argument-hint: $ARGUMENT_HINT  # only if provided
---
```

**`<purpose>` section:** Write 2-3 sentences describing:

- What the command does (action + target)
- When it is invoked (trigger conditions)
- What it produces (output artifacts or effects)

**`<required_reading>` section:** Infer from the domain:

- If the command reads config → `@.claude/CLAUDE.md` or relevant config files
- If the command modifies code → relevant source directories
- If the command is part of a group → other commands in the same group

**`<process>` section with `<step>` blocks:**

For each step in `$COMMAND_STEPS`, generate a `<step name="snake_case">` block containing:

1. **`parse_input`** (always first, `priority="first"`):
   - Parse `$ARGUMENTS` for flags and positional args derived from `$ARGUMENT_HINT`
   - Include specific flag detection logic (e.g., `if arguments contain "--env"`)
   - Include validation with specific error messages
   - Include a decision routing table if multiple modes exist

2. **Domain-specific execution steps** (2-4 steps):
   - Each step has a **bold action description**
   - Include concrete shell commands, file operations, or tool calls
   - Use `$UPPER_CASE` variables for user input, `${computed}` for derived values
   - Include conditional logic with specific conditions (not generic)
   - Reference actual file paths and tool names

3. **`report`** (always last):
   - Format output with a banner and status
   - Include file paths, timestamps, next step suggestions
**Shell Correctness Checklist (MANDATORY for every shell block):**

| Rule | Wrong | Correct |
|------|-------|---------|
| Multi-line output | `echo "{ ... }"` (unquoted multi-line) | `cat <<'EOF' > file` ... `EOF` (heredoc) |
| Variable init | Use `$VAR` after a conditional | `VAR="default"` BEFORE any conditional that sets it |
| Error exit | `echo "Error: ..."` (no exit) | `echo "Error: ..." # (see code: E00X)` + `exit 1` |
| Quoting | `$VAR` in commands | `"$VAR"` (double-quoted in all expansions) |
| Exit on fail | Command chain without checks | `set -e` or explicit `\|\| { echo "Failed"; exit 1; }` |
| Command from var | `$CMD --flag` (word-split fragile) | Use an array: `cmd=(...); "${cmd[@]}"` |
| Prerequisites | Implicit `git`/`curl` usage | Declare in `<prerequisites>` section |

**Golden Example** (a correctly written execution step):

````markdown
<step name="run_deployment">

**Execute deployment to target environment.**

```bash
DEPLOY_STATUS="pending"  # Initialize before conditional use

# Save current state for rollback
cp .deploy/latest.json .deploy/previous.json 2>/dev/null || true

# Write deployment manifest via heredoc
cat <<EOF > .deploy/latest.json
{
  "env": "$ENV",
  "tag": "$DEPLOY_TAG",
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "commit": "$(git rev-parse --short HEAD)",
  "status": "deploying"
}
EOF

# Execute deployment
if ! deploy_cmd --env "$ENV" --tag "$DEPLOY_TAG" 2>&1 | tee .deploy/latest.log; then
  echo "Error: Deployment to $ENV failed" # (see code: E004)
  exit 1
fi

DEPLOY_STATUS="success"
```

| Condition | Action |
|-----------|--------|
| Deploy succeeds | Update status → `"deployed"`, continue to verify |
| Deploy fails | Log error `# (see code: E004)`, exit 1 |
| `$ROLLBACK_MODE` | Load `.deploy/previous.json`, redeploy the prior version |

</step>
````

**Optional `<prerequisites>` section** (include when the command uses external tools):

```markdown
<prerequisites>
- git (2.20+) — version control operations
- curl — health check endpoints
- jq — JSON processing (optional)
</prerequisites>
```

**`<error_codes>` table:** Generate 3-6 specific error codes:

- Derive validation failures from `$ARGUMENT_HINT` (E001-E003)
- Derive domain-specific failure modes (E004+)
- Include 1-2 warnings (W001+)
- Each code has: Code, Severity, Description, **Stage** (which step triggers it)
- **Cross-reference rule**: Every `# (see code: E00X)` comment in `<process>` MUST have a matching row in `<error_codes>`, and every error code row MUST be referenced by at least one inline comment
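The cross-reference rule can be checked mechanically. This is a hedged sketch: the `crossCheckErrorCodes` helper and its regexes are assumptions for illustration, not part of the skill.

```javascript
// Illustrative sketch: verify the cross-reference rule between inline
// "# (see code: EXXX)" comments and <error_codes> table rows.
function crossCheckErrorCodes(markdown) {
  // Codes referenced by inline comments in <process>
  const inline = new Set([...markdown.matchAll(/\(see code: ([EW]\d{3})\)/g)].map(m => m[1]));
  // Codes defined as table rows (lines starting "| E001 |" etc.)
  const rows = new Set([...markdown.matchAll(/^\|\s*([EW]\d{3})\s*\|/gm)].map(m => m[1]));
  return {
    missingRows: [...inline].filter(code => !rows.has(code)),   // referenced, but no table row
    unreferenced: [...rows].filter(code => code.startsWith('E') && !inline.has(code)) // row never cited
  };
}
```

Warnings (W-codes) are excluded from the unreferenced check here, since only errors are required to carry inline `# (see code: ...)` comments.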

**`<success_criteria>` checkboxes:** Generate 4-8 verifiable conditions:

- Input validation passed
- Each execution step completed its action
- Output artifacts exist / effects applied
- Report displayed

**Quality rules for generated content:**

- NO bracket placeholders (`[Describe...]`, `[List...]`): all content must be concrete
- Steps must contain actionable logic, not descriptions of what to do
- Error codes must reference specific failure conditions from this command's domain
- Success criteria must be verifiable (not "command works correctly")
- Every shell block must pass the Shell Correctness Checklist above
- Follow patterns from @.claude/skills/command-generator/templates/command-md.md for structural reference only

</step>

<step name="write_file">

**Write generated content to the target path.**

**If `$FILE_EXISTS`:** Warn: `"Command file already exists at {path}. Will overwrite."`

```bash
mkdir -p "$TARGET_DIR"
```

Write the drafted content to `$TARGET_PATH` using the Write tool.

**Verify:** Read the file back and confirm:

- File exists and is non-empty
- Contains `<purpose>` tag with concrete content (no placeholders)
- Contains at least 2 `<step name=` blocks with shell code or tool calls
- Contains `<error_codes>` with at least 3 rows including a Stage column
- Contains `<success_criteria>` with at least 4 checkboxes
- No unresolved `{{...}}` or `[...]` placeholders remain
- Every `# (see code: E0XX)` has a matching `<error_codes>` row (cross-ref check)
- Every shell block uses a heredoc for multi-line output (no bare multi-line echo)
- All state variables initialized before conditional use
- All error paths include `exit 1` after the error message
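The placeholder portion of the checklist above can be sketched as a small scan. The `findPlaceholders` helper and its bracket-verb list are illustrative assumptions; a bare `[...]` scan would false-positive on `- [ ]` checkboxes, hence the verb prefix.

```javascript
// Illustrative sketch: flag unresolved template tokens in generated output.
function findPlaceholders(content) {
  // {{mustache}} substitution tokens left unresolved
  const mustache = content.match(/\{\{[^}]+\}\}/g) || [];
  // [Describe...]-style bracket placeholders (verb list is an assumption)
  const brackets = content.match(/\[(?:Describe|List|Add|TODO)[^\]]*\]/g) || [];
  return [...mustache, ...brackets];
}
```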

**If verification fails:** Fix the content in place using the Edit tool.

**Report completion:**
```
Command generated successfully!

File: {$TARGET_PATH}
Name: {$SKILL_NAME}
Description: {$DESCRIPTION}
Location: {$LOCATION}
Group: {$GROUP or "(none)"}
Steps: {number of <step> blocks generated}
Error codes: {number of error codes}

Next Steps:
1. Review and customize {$TARGET_PATH}
2. Test: /{$GROUP}:{$SKILL_NAME} or /{$SKILL_NAME}
```

</step>

</process>

<error_codes>

| Code | Severity | Description | Stage |
|------|----------|-------------|-------|
| E001 | error | skillName is required | validate_params |
| E002 | error | description is required (min 10 chars) | validate_params |
| E003 | error | location is required ("project" or "user") | validate_params |
| E004 | error | skillName must be lowercase alphanumeric with hyphens | validate_params |
| E005 | error | Failed to infer command domain from description | gather_requirements |
| E006 | error | Failed to write command file | write_file |
| E007 | error | Generated content contains unresolved placeholders | write_file |
| W001 | warning | group must be lowercase alphanumeric with hyphens | validate_params |
| W002 | warning | Command file already exists, will overwrite | write_file |
| W003 | warning | Could not infer required_reading, using defaults | draft_content |

</error_codes>

<success_criteria>

- [ ] All required parameters validated (`$SKILL_NAME`, `$DESCRIPTION`, `$LOCATION`)
- [ ] Target path resolved with the correct scope (project vs user) and group
- [ ] Command domain inferred from description and argument hint
- [ ] Concrete `<purpose>` drafted (no placeholders)
- [ ] 2-6 `<step>` blocks generated with domain-specific logic
- [ ] `<error_codes>` table generated with 3+ specific codes
- [ ] `<success_criteria>` generated with 4+ verifiable checkboxes
- [ ] File written to `$TARGET_PATH` and verified
- [ ] Zero bracket placeholders in the final output
- [ ] Completion report displayed

</success_criteria>

---

The following phase documents were deleted in this commit; their content is now inlined in the steps above.

# Phase 1: Parameter Validation

Validate all required parameters for command generation.

## Objective

Ensure all required parameters are provided before proceeding with command generation:

- **skillName**: Command identifier (required)
- **description**: Command description (required)
- **location**: Target scope - "project" or "user" (required)
- **group**: Optional grouping subdirectory
- **argumentHint**: Optional argument hint string

## Input

Parameters received from skill invocation:

- `skillName`: string (required)
- `description`: string (required)
- `location`: "project" | "user" (required)
- `group`: string (optional)
- `argumentHint`: string (optional)

## Validation Rules

### Required Parameters

```javascript
const requiredParams = {
  skillName: {
    type: 'string',
    minLength: 1,
    pattern: /^[a-z][a-z0-9-]*$/, // lowercase, alphanumeric, hyphens
    error: 'skillName must be lowercase alphanumeric with hyphens, starting with a letter'
  },
  description: {
    type: 'string',
    minLength: 10,
    error: 'description must be at least 10 characters'
  },
  location: {
    type: 'string',
    enum: ['project', 'user'],
    error: 'location must be "project" or "user"'
  }
};
```

### Optional Parameters

```javascript
const optionalParams = {
  group: {
    type: 'string',
    pattern: /^[a-z][a-z0-9-]*$/,
    default: null,
    error: 'group must be lowercase alphanumeric with hyphens'
  },
  argumentHint: {
    type: 'string',
    default: '',
    error: 'argumentHint must be a string'
  }
};
```

## Execution Steps

### Step 1: Extract Parameters

```javascript
// Extract from skill args
const params = {
  skillName: args.skillName,
  description: args.description,
  location: args.location,
  group: args.group || null,
  argumentHint: args.argumentHint || ''
};
```

### Step 2: Validate Required Parameters

```javascript
function validateRequired(params, rules) {
  const errors = [];

  for (const [key, rule] of Object.entries(rules)) {
    const value = params[key];

    // Check existence
    if (value === undefined || value === null || value === '') {
      errors.push(`${key} is required`);
      continue;
    }

    // Check type
    if (typeof value !== rule.type) {
      errors.push(`${key} must be a ${rule.type}`);
      continue;
    }

    // Check minLength
    if (rule.minLength && value.length < rule.minLength) {
      errors.push(`${key} must be at least ${rule.minLength} characters`);
    }

    // Check pattern
    if (rule.pattern && !rule.pattern.test(value)) {
      errors.push(rule.error);
    }

    // Check enum
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${key} must be one of: ${rule.enum.join(', ')}`);
    }
  }

  return errors;
}

const requiredErrors = validateRequired(params, requiredParams);
if (requiredErrors.length > 0) {
  throw new Error(`Validation failed:\n${requiredErrors.join('\n')}`);
}
```

### Step 3: Validate Optional Parameters

```javascript
function validateOptional(params, rules) {
  const warnings = [];

  for (const [key, rule] of Object.entries(rules)) {
    const value = params[key];

    if (value !== null && value !== undefined && value !== '') {
      if (rule.pattern && !rule.pattern.test(value)) {
        warnings.push(`${key}: ${rule.error}`);
      }
    }
  }

  return warnings;
}

const optionalWarnings = validateOptional(params, optionalParams);
// Log warnings but continue
```

### Step 4: Normalize Parameters

```javascript
const validatedParams = {
  skillName: params.skillName.trim().toLowerCase(),
  description: params.description.trim(),
  location: params.location.trim().toLowerCase(),
  group: params.group ? params.group.trim().toLowerCase() : null,
  argumentHint: params.argumentHint ? params.argumentHint.trim() : ''
};
```

## Output

```javascript
{
  status: 'validated',
  params: validatedParams,
  warnings: optionalWarnings
}
```

## Next Phase

Proceed to [Phase 2: Target Path Resolution](02-target-path-resolution.md) with `validatedParams`.
@@ -1,171 +0,0 @@
|
|||||||
# Phase 2: Target Path Resolution
|
|
||||||
|
|
||||||
Resolve the target commands directory based on location parameter.
|
|
||||||
|
|
||||||
## Objective
|
|
||||||
|
|
||||||
Determine the correct target path for the command file based on:
|
|
||||||
- **location**: "project" or "user" scope
|
|
||||||
- **group**: Optional subdirectory for command organization
|
|
||||||
- **skillName**: Command filename (with .md extension)
|
|
||||||
|
|
||||||
## Input
|
|
||||||
|
|
||||||
From Phase 1 validation:
|
|
||||||
```javascript
|
|
||||||
{
|
|
||||||
skillName: string, // e.g., "create"
|
|
||||||
description: string,
|
|
||||||
location: "project" | "user",
|
|
||||||
group: string | null, // e.g., "issue"
|
|
||||||
argumentHint: string
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Path Resolution Rules
|
|
||||||
|
|
||||||
### Location Mapping
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
const locationMap = {
|
|
||||||
project: '.claude/commands',
|
|
||||||
user: '~/.claude/commands' // Expands to user home directory
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
### Path Construction
|
|
||||||
|
|
||||||
```javascript
|
|
||||||
function resolveTargetPath(params) {
|
|
||||||
const baseDir = locationMap[params.location];
|
|
||||||
|
|
||||||
if (!baseDir) {
|
|
||||||
throw new Error(`Invalid location: ${params.location}. Must be "project" or "user".`);
|
|
||||||
}
|
|
||||||
|
|
  // Expand ~ to user home if present
  const expandedBase = baseDir.startsWith('~')
    ? path.join(os.homedir(), baseDir.slice(1))
    : baseDir;

  // Build full path
  let targetPath;
  if (params.group) {
    // Grouped command: .claude/commands/{group}/{skillName}.md
    targetPath = path.join(expandedBase, params.group, `${params.skillName}.md`);
  } else {
    // Top-level command: .claude/commands/{skillName}.md
    targetPath = path.join(expandedBase, `${params.skillName}.md`);
  }

  return targetPath;
}
```

## Execution Steps

### Step 1: Get Base Directory

```javascript
const location = validatedParams.location;
const baseDir = locationMap[location];

if (!baseDir) {
  throw new Error(`Invalid location: ${location}. Must be "project" or "user".`);
}
```

### Step 2: Expand User Path (if applicable)

```javascript
const os = require('os');
const path = require('path');

let expandedBase = baseDir;
if (baseDir.startsWith('~')) {
  expandedBase = path.join(os.homedir(), baseDir.slice(1));
}
```

### Step 3: Construct Full Path

```javascript
let targetPath;
let targetDir;

if (validatedParams.group) {
  // Command with group subdirectory
  targetDir = path.join(expandedBase, validatedParams.group);
  targetPath = path.join(targetDir, `${validatedParams.skillName}.md`);
} else {
  // Top-level command
  targetDir = expandedBase;
  targetPath = path.join(targetDir, `${validatedParams.skillName}.md`);
}
```

### Step 4: Ensure Target Directory Exists

```javascript
// Create the directory if it does not already exist
Bash(`mkdir -p "${targetDir}"`);
```

### Step 5: Check File Existence

```javascript
const fileExists = Bash(`test -f "${targetPath}" && echo "EXISTS" || echo "NOT_FOUND"`);

if (fileExists.includes('EXISTS')) {
  console.warn(`Warning: Command file already exists at ${targetPath}. Will overwrite.`);
}
```
## Output

```javascript
{
  status: 'resolved',
  targetPath: targetPath,   // Full path to command file
  targetDir: targetDir,     // Directory containing command
  fileName: `${skillName}.md`,
  fileExists: fileExists.includes('EXISTS'),
  params: validatedParams   // Pass through to next phase
}
```

## Path Examples

### Project Scope (No Group)

```
location: "project"
skillName: "deploy"
-> .claude/commands/deploy.md
```

### Project Scope (With Group)

```
location: "project"
skillName: "create"
group: "issue"
-> .claude/commands/issue/create.md
```

### User Scope (No Group)

```
location: "user"
skillName: "global-status"
-> ~/.claude/commands/global-status.md
```

### User Scope (With Group)

```
location: "user"
skillName: "sync"
group: "session"
-> ~/.claude/commands/session/sync.md
```

## Next Phase

Proceed to [Phase 3: Template Loading](03-template-loading.md) with `targetPath` and `params`.
@@ -1,123 +0,0 @@
# Phase 3: Template Loading

Load the command template file for content generation.

## Objective

Load the command template from the skill's templates directory. The template provides:
- YAML frontmatter structure
- Placeholder variables for substitution
- Standard command file sections

## Input

From Phase 2:
```javascript
{
  targetPath: string,
  targetDir: string,
  fileName: string,
  fileExists: boolean,
  params: {
    skillName: string,
    description: string,
    location: string,
    group: string | null,
    argumentHint: string
  }
}
```

## Template Location

```
.claude/skills/command-generator/templates/command-md.md
```

## Execution Steps

### Step 1: Locate Template File

```javascript
// The template lives in the skill's templates directory
const skillDir = '.claude/skills/command-generator';
const templatePath = `${skillDir}/templates/command-md.md`;
```

### Step 2: Read Template Content

```javascript
const templateContent = Read(templatePath);

if (!templateContent) {
  throw new Error(`Command template not found at ${templatePath}`);
}
```

### Step 3: Validate Template Structure

```javascript
// Verify the template contains the expected placeholders
const requiredPlaceholders = ['{{name}}', '{{description}}'];
const optionalPlaceholders = ['{{group}}', '{{argumentHint}}'];

for (const placeholder of requiredPlaceholders) {
  if (!templateContent.includes(placeholder)) {
    throw new Error(`Template missing required placeholder: ${placeholder}`);
  }
}
```

### Step 4: Store Template for Next Phase

```javascript
const template = {
  content: templateContent,
  requiredPlaceholders: requiredPlaceholders,
  optionalPlaceholders: optionalPlaceholders
};
```
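The read-and-validate steps above reduce to one function. `loadTemplate` below is a hypothetical standalone version that takes the template text directly instead of calling the `Read` tool:

```javascript
// Validate that a template contains every required placeholder and
// return it bundled with its placeholder lists (as in Step 4).
function loadTemplate(templateContent) {
  if (!templateContent || templateContent.trim().length === 0) {
    throw new Error('Empty template');
  }
  const requiredPlaceholders = ['{{name}}', '{{description}}'];
  const optionalPlaceholders = ['{{group}}', '{{argumentHint}}'];
  for (const placeholder of requiredPlaceholders) {
    if (!templateContent.includes(placeholder)) {
      throw new Error(`Template missing required placeholder: ${placeholder}`);
    }
  }
  return { content: templateContent, requiredPlaceholders, optionalPlaceholders };
}
```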
## Template Format Reference

The template should follow this structure:

```markdown
---
name: {{name}}
description: {{description}}
{{#if group}}group: {{group}}{{/if}}
{{#if argumentHint}}argument-hint: {{argumentHint}}{{/if}}
---

# {{name}} Command

[Template content with placeholders]
```

## Output

```javascript
{
  status: 'loaded',
  template: {
    content: templateContent,
    requiredPlaceholders: requiredPlaceholders,
    optionalPlaceholders: optionalPlaceholders
  },
  targetPath: targetPath,
  params: params
}
```

## Error Handling

| Error | Action |
|-------|--------|
| Template file not found | Throw error with path |
| Missing required placeholder | Throw error with missing placeholder name |
| Empty template | Throw error |

## Next Phase

Proceed to [Phase 4: Content Formatting](04-content-formatting.md) with `template`, `targetPath`, and `params`.
@@ -1,184 +0,0 @@
# Phase 4: Content Formatting

Format template content by substituting placeholders with parameter values.

## Objective

Replace all placeholder variables in the template with validated parameter values:
- `{{name}}` -> skillName
- `{{description}}` -> description
- `{{group}}` -> group (if provided)
- `{{argumentHint}}` -> argumentHint (if provided)

## Input

From Phase 3:
```javascript
{
  template: {
    content: string,
    requiredPlaceholders: string[],
    optionalPlaceholders: string[]
  },
  targetPath: string,
  params: {
    skillName: string,
    description: string,
    location: string,
    group: string | null,
    argumentHint: string
  }
}
```

## Placeholder Mapping

```javascript
const placeholderMap = {
  '{{name}}': params.skillName,
  '{{description}}': params.description,
  '{{group}}': params.group || '',
  '{{argumentHint}}': params.argumentHint || ''
};
```

## Execution Steps

### Step 1: Initialize Content

```javascript
let formattedContent = template.content;
```

### Step 2: Substitute Required Placeholders

```javascript
// These must always be replaced
formattedContent = formattedContent.replace(/\{\{name\}\}/g, params.skillName);
formattedContent = formattedContent.replace(/\{\{description\}\}/g, params.description);
```

### Step 3: Handle Optional Placeholders

```javascript
// Group placeholder
if (params.group) {
  formattedContent = formattedContent.replace(/\{\{group\}\}/g, params.group);
} else {
  // Remove the group line if not provided
  formattedContent = formattedContent.replace(/^group: \{\{group\}\}\n?/gm, '');
  formattedContent = formattedContent.replace(/\{\{group\}\}/g, '');
}

// Argument hint placeholder
if (params.argumentHint) {
  formattedContent = formattedContent.replace(/\{\{argumentHint\}\}/g, params.argumentHint);
} else {
  // Remove the argument-hint line if not provided
  formattedContent = formattedContent.replace(/^argument-hint: \{\{argumentHint\}\}\n?/gm, '');
  formattedContent = formattedContent.replace(/\{\{argumentHint\}\}/g, '');
}
```

### Step 4: Handle Conditional Sections

```javascript
// Collapse blank runs left behind by missing optional frontmatter fields
formattedContent = formattedContent.replace(/\n{3,}/g, '\n\n');

// Handle {{#if group}} style conditionals
if (formattedContent.includes('{{#if')) {
  // Process group conditional
  if (params.group) {
    formattedContent = formattedContent.replace(/\{\{#if group\}\}([\s\S]*?)\{\{\/if\}\}/g, '$1');
  } else {
    formattedContent = formattedContent.replace(/\{\{#if group\}\}[\s\S]*?\{\{\/if\}\}/g, '');
  }

  // Process argumentHint conditional
  if (params.argumentHint) {
    formattedContent = formattedContent.replace(/\{\{#if argumentHint\}\}([\s\S]*?)\{\{\/if\}\}/g, '$1');
  } else {
    formattedContent = formattedContent.replace(/\{\{#if argumentHint\}\}[\s\S]*?\{\{\/if\}\}/g, '');
  }
}
```

### Step 5: Validate Final Content

```javascript
// Ensure no unresolved placeholders remain
const unresolvedPlaceholders = formattedContent.match(/\{\{[^}]+\}\}/g);
if (unresolvedPlaceholders) {
  console.warn(`Warning: Unresolved placeholders found: ${unresolvedPlaceholders.join(', ')}`);
}

// Ensure the frontmatter is valid
const frontmatterMatch = formattedContent.match(/^---\n([\s\S]*?)\n---/);
if (!frontmatterMatch) {
  throw new Error('Generated content has invalid frontmatter structure');
}
```
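Steps 1-5 can be combined into a single pass. This sketch (hypothetical `formatContent` helper, using the same regexes as above) shows substitution, conditional handling, and frontmatter validation together:

```javascript
// Render a command template: resolve {{#if ...}} conditionals,
// substitute placeholders, collapse blank runs, validate frontmatter.
function formatContent(template, params) {
  let out = template;
  // {{#if group}}...{{/if}} style conditionals: keep body when the
  // value is set, drop the whole block otherwise.
  for (const [key, value] of Object.entries({ group: params.group, argumentHint: params.argumentHint })) {
    const block = new RegExp(`\\{\\{#if ${key}\\}\\}([\\s\\S]*?)\\{\\{\\/if\\}\\}`, 'g');
    out = out.replace(block, value ? '$1' : '');
  }
  // Simple placeholders
  out = out
    .replace(/\{\{name\}\}/g, params.skillName)
    .replace(/\{\{description\}\}/g, params.description)
    .replace(/\{\{group\}\}/g, params.group || '')
    .replace(/\{\{argumentHint\}\}/g, params.argumentHint || '');
  // Collapse blank runs left behind by removed optional lines
  out = out.replace(/\n{3,}/g, '\n\n');
  if (!/^---\n[\s\S]*?\n---/.test(out)) {
    throw new Error('Generated content has invalid frontmatter structure');
  }
  return out;
}
```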
### Step 6: Generate Summary

```javascript
const summary = {
  name: params.skillName,
  description: params.description.substring(0, 50) + (params.description.length > 50 ? '...' : ''),
  location: params.location,
  group: params.group,
  hasArgumentHint: !!params.argumentHint
};
```

## Output

```javascript
{
  status: 'formatted',
  content: formattedContent,
  targetPath: targetPath,
  summary: summary
}
```

## Content Example

### Input Template

```markdown
---
name: {{name}}
description: {{description}}
{{#if group}}group: {{group}}{{/if}}
{{#if argumentHint}}argument-hint: {{argumentHint}}{{/if}}
---

# {{name}} Command
```

### Output (with all fields)

```markdown
---
name: create
description: Create structured issue from GitHub URL or text description
group: issue
argument-hint: [-y|--yes] <github-url | text-description> [--priority 1-5]
---

# create Command
```

### Output (minimal fields)

```markdown
---
name: deploy
description: Deploy application to production environment
---

# deploy Command
```

## Next Phase

Proceed to [Phase 5: File Generation](05-file-generation.md) with `content` and `targetPath`.
@@ -1,185 +0,0 @@
# Phase 5: File Generation

Write the formatted content to the target command file.

## Objective

Generate the final command file by:
1. Checking for an existing file (warn if present)
2. Writing formatted content to the target path
3. Confirming successful generation

## Input

From Phase 4:
```javascript
{
  status: 'formatted',
  content: string,
  targetPath: string,
  summary: {
    name: string,
    description: string,
    location: string,
    group: string | null,
    hasArgumentHint: boolean
  }
}
```

## Execution Steps

### Step 1: Pre-Write Check

```javascript
// Check if the file already exists
const fileExists = Bash(`test -f "${targetPath}" && echo "EXISTS" || echo "NOT_FOUND"`);

if (fileExists.includes('EXISTS')) {
  console.warn(`
WARNING: Command file already exists at: ${targetPath}
The file will be overwritten with new content.
`);
}
```

### Step 2: Ensure Directory Exists

```javascript
// Get the directory from the target path
const targetDir = path.dirname(targetPath);

// Create the directory if it doesn't exist
Bash(`mkdir -p "${targetDir}"`);
```

### Step 3: Write File

```javascript
// Write the formatted content
Write(targetPath, content);
```

### Step 4: Verify Write

```javascript
// Confirm the file was created
const verifyExists = Bash(`test -f "${targetPath}" && echo "SUCCESS" || echo "FAILED"`);

if (!verifyExists.includes('SUCCESS')) {
  throw new Error(`Failed to create command file at ${targetPath}`);
}

// Verify the content was written
const writtenContent = Read(targetPath);
if (!writtenContent || writtenContent.length === 0) {
  throw new Error('Command file created but appears to be empty');
}
```
### Step 5: Generate Success Report

```javascript
const report = {
  status: 'completed',
  file: {
    path: targetPath,
    name: summary.name,
    location: summary.location,
    group: summary.group,
    size: writtenContent.length,
    created: new Date().toISOString()
  },
  command: {
    name: summary.name,
    description: summary.description,
    hasArgumentHint: summary.hasArgumentHint
  },
  nextSteps: [
    `Edit ${targetPath} to add implementation details`,
    'Add usage examples and execution flow',
    'Test the command with Claude Code'
  ]
};
```

## Output

### Success Output

```javascript
{
  status: 'completed',
  file: {
    path: '.claude/commands/issue/create.md',
    name: 'create',
    location: 'project',
    group: 'issue',
    size: 1234,
    created: '2026-02-27T12:00:00.000Z'
  },
  command: {
    name: 'create',
    description: 'Create structured issue from GitHub URL...',
    hasArgumentHint: true
  },
  nextSteps: [
    'Edit .claude/commands/issue/create.md to add implementation details',
    'Add usage examples and execution flow',
    'Test the command with Claude Code'
  ]
}
```

### Console Output

```
Command generated successfully!

File: .claude/commands/issue/create.md
Name: create
Description: Create structured issue from GitHub URL...
Location: project
Group: issue

Next Steps:
1. Edit .claude/commands/issue/create.md to add implementation details
2. Add usage examples and execution flow
3. Test the command with Claude Code
```

## Error Handling

| Error | Action |
|-------|--------|
| Directory creation failed | Throw error with directory path |
| File write failed | Throw error with target path |
| Empty file detected | Throw error and attempt cleanup |
| Permission denied | Throw error with permission hint |

## Cleanup on Failure

```javascript
// If any step fails, attempt to clean up partial artifacts
function cleanup(targetPath) {
  try {
    Bash(`rm -f "${targetPath}"`);
  } catch (e) {
    // Ignore cleanup errors
  }
}
```

## Completion

The command file has been successfully generated. The skill execution is complete.

### Usage Example

```bash
# Invoke the generated command (group prefix included when grouped)
/issue:create https://github.com/owner/repo/issues/123
/issue:create "Login fails with special chars"
```
@@ -1,160 +1,65 @@
 # Command Design Specification
 
-Guidelines and best practices for designing Claude Code command files.
+Guidelines for Claude Code command files generated by command-generator.
 
-## Command File Structure
-
-### YAML Frontmatter
-
-Every command file must start with YAML frontmatter containing:
+## YAML Frontmatter
 
 ```yaml
 ---
-name: command-name          # Required: Command identifier (lowercase, hyphens)
-description: Description    # Required: Brief description of command purpose
-argument-hint: "[args]"     # Optional: Argument format hint
-allowed-tools: Tool1, Tool2 # Optional: Restricted tool set
-examples:                   # Optional: Usage examples
-  - /command:example1
-  - /command:example2 --flag
+name: command-name          # Required: lowercase with hyphens
+description: Description    # Required: brief purpose
+argument-hint: "[args]"     # Optional: argument format hint
+allowed-tools: Tool1, Tool2 # Optional: restricted tool set
 ---
 ```
 
-### Frontmatter Fields
-
-| Field | Required | Description |
-|-------|----------|-------------|
-| `name` | Yes | Command identifier, lowercase with hyphens |
-| `description` | Yes | Brief description, appears in command listings |
-| `argument-hint` | No | Usage hint for arguments (shown in help) |
-| `allowed-tools` | No | Restrict available tools for this command |
-| `examples` | No | Array of usage examples |
-
 ## Naming Conventions
 
-### Command Names
-
-- Use lowercase letters only
-- Separate words with hyphens (`create-issue`, not `createIssue`)
-- Keep names short but descriptive (2-3 words max)
-- Use verbs for actions (`deploy`, `create`, `analyze`)
-
-### Group Names
-
-- Groups organize related commands
-- Use singular nouns (`issue`, `session`, `workflow`)
-- Common groups: `issue`, `workflow`, `session`, `memory`, `cli`
-
-### Path Examples
+| Element | Convention | Examples |
+|---------|-----------|----------|
+| Command name | lowercase, hyphens, 2-3 words max | `deploy`, `create-issue` |
+| Group name | singular noun | `issue`, `session`, `workflow` |
+| Verbs for actions | imperative | `deploy`, `create`, `analyze` |
+
+## Path Structure
 
 ```
 .claude/commands/deploy.md           # Top-level command
 .claude/commands/issue/create.md     # Grouped command
-.claude/commands/workflow/init.md    # Grouped command
+~/.claude/commands/global-status.md  # User-level command
 ```
 
-## Content Sections
-
-### Required Sections
-
-1. **Overview**: Brief description of command purpose
-2. **Usage**: Command syntax and examples
-3. **Execution Flow**: High-level process diagram
-
-### Recommended Sections
-
-4. **Implementation**: Code examples for each phase
-5. **Error Handling**: Error cases and recovery
-6. **Related Commands**: Links to related functionality
-
-## Best Practices
-
-### 1. Clear Purpose
-
-Each command should do one thing well:
+## Content Structure (GSD Style)
+
+Generated commands should use XML semantic tags:
+
+| Tag | Required | Purpose |
+|-----|----------|---------|
+| `<purpose>` | Yes | What the command does, when invoked, what it produces |
+| `<required_reading>` | Yes | Files to read before execution (@ notation) |
+| `<process>` | Yes | Container for execution steps |
+| `<step name="...">` | Yes | Individual execution steps with snake_case names |
+| `<error_codes>` | No | Error code table with severity and description |
+| `<success_criteria>` | Yes | Checkbox list of verifiable completion conditions |
+
+## Step Naming
+
+- Use snake_case: `parse_input`, `validate_config`, `write_output`
+- Use action verbs: `discover`, `validate`, `spawn`, `collect`, `report`
+- First step gets `priority="first"` attribute
+
+## Error Messages
 
 ```
-Good: /issue:create - Create a new issue
-Bad:  /issue:manage - Create, update, delete issues (too broad)
+Good: Error: GitHub issue URL required
+      Usage: /issue:create <github-url>
+
+Bad:  Error: Invalid input
 ```
 
-### 2. Consistent Structure
-
-Follow the same pattern across all commands in a group:
-
-```markdown
-# All issue commands should have:
-- Overview
-- Usage with examples
-- Phase-based implementation
-- Error handling table
-```
-
-### 3. Progressive Detail
-
-Start simple, add detail in phases:
-
-```
-Phase 1: Quick overview
-Phase 2: Implementation details
-Phase 3: Edge cases and errors
-```
-
-### 4. Reusable Patterns
-
-Use consistent patterns for common operations:
-
-```javascript
-// Input parsing pattern
-const args = parseArguments($ARGUMENTS);
-const flags = parseFlags($ARGUMENTS);
-
-// Validation pattern
-if (!args.required) {
-  throw new Error('Required argument missing');
-}
-```
-
 ## Scope Guidelines
 
-### Project Commands (`.claude/commands/`)
-
-- Project-specific workflows
-- Team conventions
-- Integration with project tools
-
-### User Commands (`~/.claude/commands/`)
-
-- Personal productivity tools
-- Cross-project utilities
-- Global configuration
-
-## Error Messages
-
-### Good Error Messages
-
-```
-Error: GitHub issue URL required
-Usage: /issue:create <github-url>
-Example: /issue:create https://github.com/owner/repo/issues/123
-```
-
-### Bad Error Messages
-
-```
-Error: Invalid input
-```
-
-## Testing Commands
-
-After creating a command, test:
-
-1. **Basic invocation**: Does it run without arguments?
-2. **Argument parsing**: Does it handle valid arguments?
-3. **Error cases**: Does it show helpful errors for invalid input?
-4. **Help text**: Is the usage clear?
-
-## Related Documentation
-
-- [SKILL-DESIGN-SPEC.md](../_shared/SKILL-DESIGN-SPEC.md) - Full skill design specification
-- [../skill-generator/SKILL.md](../skill-generator/SKILL.md) - Meta-skill for creating skills
+| Scope | Location | Use For |
+|-------|----------|---------|
+| Project | `.claude/commands/` | Team workflows, project integrations |
+| User | `~/.claude/commands/` | Personal tools, cross-project utilities |
@@ -1,75 +1,112 @@
----
-name: {{name}}
-description: {{description}}
-{{#if argumentHint}}argument-hint: {{argumentHint}}
-{{/if}}---
-
-# {{name}} Command
-
-## Overview
-
-[Describe the command purpose and what it does]
-
-## Usage
-
-```bash
-/{{#if group}}{{group}}:{{/if}}{{name}} [arguments]
-```
-
-**Examples**:
-```bash
-# Example 1: Basic usage
-/{{#if group}}{{group}}:{{/if}}{{name}}
-
-# Example 2: With arguments
-/{{#if group}}{{group}}:{{/if}}{{name}} --option value
-```
-
-## Execution Flow
-
-```
-Phase 1: Input Parsing
-- Parse arguments and flags
-- Validate input parameters
-
-Phase 2: Core Processing
-- Execute main logic
-- Handle edge cases
-
-Phase 3: Output Generation
-- Format results
-- Display to user
-```
-
-## Implementation
-
-### Phase 1: Input Parsing
-
-```javascript
-// Parse command arguments
-const args = parseArguments($ARGUMENTS);
-```
-
-### Phase 2: Core Processing
-
-```javascript
-// TODO: Implement core logic
-```
-
-### Phase 3: Output Generation
-
-```javascript
-// TODO: Format and display output
-```
-
-## Error Handling
-
-| Error | Action |
-|-------|--------|
-| Invalid input | Show usage and error message |
-| Processing failure | Log error and suggest recovery |
-
-## Related Commands
-
-- [Related command 1]
-- [Related command 2]
+# Command Template — Structural Reference
+
+This template defines the **structural pattern** for generated commands. The `draft_content` step uses this as a guide to generate concrete, domain-specific content — NOT as a literal copy target.
+
+## Required Structure
+
+```markdown
+---
+name: {$SKILL_NAME}
+description: {$DESCRIPTION}
+argument-hint: {$ARGUMENT_HINT}   # omit line if empty
+---
+
+<purpose>
+{2-3 concrete sentences: what it does + when invoked + what it produces}
+</purpose>
+
+<required_reading>
+{@ references to files this command needs before execution}
+</required_reading>
+
+<prerequisites>  <!-- include when command uses external CLI tools -->
+- {tool} ({version}+) — {what it's used for}
+</prerequisites>
+
+<process>
+
+<step name="parse_input" priority="first">
+**Parse arguments and validate input.**
+
+Parse `$ARGUMENTS` for:
+- {specific flags from $ARGUMENT_HINT}
+- {positional args}
+
+{Decision routing table if multiple modes:}
+| Condition | Action |
+|-----------|--------|
+| flag present | set variable |
+| missing required | Error: "message" `# (see code: E001)` + `exit 1` |
+</step>
+
+<step name="{domain_action_1}">
+**{Concrete action description.}**
+
+$STATE_VAR="default"  <!-- Initialize BEFORE conditional -->
+
+```bash
+# Use heredoc for multi-line output
+cat <<EOF > output-file
+{structured content with $VARIABLES}
+EOF
+
+# Every error path: message + code ref + exit
+if [ ! -f "$REQUIRED_FILE" ]; then
+  echo "Error: Required file missing"  # (see code: E003)
+  exit 1
+fi
+```
+
+| Condition | Action |
+|-----------|--------|
+| success | Continue to next step |
+| failure | Error `# (see code: E0XX)`, exit 1 |
+</step>
+
+<step name="report">
+**Format and display results.**
+
+{Banner with status, file paths, next steps}
+</step>
+
+</process>
+
+<error_codes>
+
+| Code | Severity | Description | Stage |
+|------|----------|-------------|-------|
+| E001 | error | {specific to parse_input validation} | parse_input |
+| E002 | error | {specific to domain action failure} | {step_name} |
+| W001 | warning | {specific recoverable condition} | {step_name} |
+
+<!-- Every code MUST be referenced by `# (see code: EXXX)` in <process> -->
+</error_codes>
+
+<success_criteria>
+- [ ] {Input validated}
+- [ ] {Domain action 1 completed}
+- [ ] {Domain action 2 completed}
+- [ ] {Output produced / effect applied}
+</success_criteria>
+```
+
+## Content Quality Rules
+
+| Rule | Bad Example | Good Example |
+|------|-------------|--------------|
+| No bracket placeholders | `[Describe purpose]` | `Deploy to target environment with rollback on failure.` |
+| Concrete step names | `execute` | `run_deployment`, `validate_config` |
+| Specific error codes | `E001: Invalid input` | `E001: --env must be "prod" or "staging"` |
+| Verifiable criteria | `Command works` | `Deployment log written to .deploy/latest.log` |
+| Real shell commands | `# TODO: implement` | `kubectl apply -f $MANIFEST_PATH` |
+
+## Step Naming Conventions
+
+| Domain | Typical Steps |
+|--------|--------------|
+| Deploy/Release | `validate_config`, `run_deployment`, `verify_health`, `report` |
+| CRUD operations | `parse_input`, `validate_entity`, `persist_changes`, `report` |
+| Analysis/Review | `parse_input`, `gather_context`, `run_analysis`, `present_findings` |
+| Sync/Migration | `parse_input`, `detect_changes`, `apply_sync`, `verify_state` |
+| Build/Generate | `parse_input`, `resolve_dependencies`, `run_build`, `write_output` |
@@ -29,7 +29,7 @@ import {
   projectExists,
   getStorageLocationInstructions
 } from '../tools/storage-manager.js';
-import { getHistoryStore, findProjectWithExecution } from '../tools/cli-history-store.js';
+import { getHistoryStore, findProjectWithExecution, getRegisteredExecutionHistory } from '../tools/cli-history-store.js';
 import { createSpinner } from '../utils/ui.js';
 import { loadClaudeCliSettings } from '../tools/claude-cli-tools.js';

@@ -421,11 +421,15 @@ async function outputAction(conversationId: string | undefined, options: OutputV
   if (!result) {
     const hint = options.project
       ? `in project: ${options.project}`
-      : 'in current directory or parent directories';
+      : 'in registered CCW project history';
     console.error(chalk.red(`Error: Execution not found: ${conversationId}`));
     console.error(chalk.gray(` Searched ${hint}`));
+    console.error(chalk.gray(' Tip: use the real CCW execution ID, not an outer task label.'));
+    console.error(chalk.gray(' Capture [CCW_EXEC_ID=...] from stderr, or start with --id <your-id>.'));
+    console.error(chalk.gray(' Discover IDs via: ccw cli show or ccw cli history'));
     console.error(chalk.gray('Usage: ccw cli output <conversation-id> [--project <path>]'));
     process.exit(1);
+    return;
   }

   if (options.raw) {

@@ -1394,7 +1398,7 @@ async function showAction(options: { all?: boolean }): Promise<void> {

   // 2. Get recent history from SQLite
   const historyLimit = options.all ? 100 : 20;
-  const history = await getExecutionHistoryAsync(process.cwd(), { limit: historyLimit, recursive: true });
+  const history = getRegisteredExecutionHistory({ limit: historyLimit });
   const historyById = new Map(history.executions.map(exec => [exec.id, exec]));

   // 3. Build unified list: active first, then history (de-duped)

@@ -1595,7 +1599,7 @@ async function historyAction(options: HistoryOptions): Promise<void> {
   console.log(chalk.bold.cyan('\n CLI Execution History\n'));

   // Use recursive: true to aggregate history from parent and child projects (matches Dashboard behavior)
-  const history = await getExecutionHistoryAsync(process.cwd(), { limit: parseInt(limit, 10), tool, status, recursive: true });
+  const history = getRegisteredExecutionHistory({ limit: parseInt(limit, 10), tool, status });

   if (history.executions.length === 0) {
     console.log(chalk.gray(' No executions found.\n'));

@@ -1650,7 +1654,14 @@ async function detailAction(conversationId: string | undefined): Promise<void> {
     process.exit(1);
   }

-  const conversation = getConversationDetail(process.cwd(), conversationId);
+  let conversation = getConversationDetail(process.cwd(), conversationId);
+
+  if (!conversation) {
+    const found = findProjectWithExecution(conversationId, process.cwd());
+    if (found) {
+      conversation = getConversationDetail(found.projectPath, conversationId);
+    }
+  }

   if (!conversation) {
     console.error(chalk.red(`Error: Conversation not found: ${conversationId}`));
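The detail-view change above follows a general pattern: try the current project first, then locate the owning project and retry there. A minimal sketch of that control flow, with hypothetical stand-ins for `getConversationDetail` and `findProjectWithExecution`:

```typescript
type Lookup<T> = (projectPath: string, id: string) => T | null;
type Finder = (id: string) => { projectPath: string } | null;

// Resolve an id locally first; on a miss, find the owning project and retry.
function resolveWithFallback<T>(
  localPath: string,
  id: string,
  lookup: Lookup<T>,
  findOwner: Finder,
): T | null {
  let result = lookup(localPath, id);
  if (!result) {
    const found = findOwner(id);
    if (found) {
      result = lookup(found.projectPath, id);
    }
  }
  return result;
}
```

The two-step shape keeps the common case (id lives in the current project) cheap, and only pays for the global scan on a miss.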
@@ -16,7 +16,7 @@ import { spawn } from 'child_process';
 import { join, dirname } from 'path';
 import { existsSync } from 'fs';
 import { fileURLToPath } from 'url';
-import { getCodexLensPython } from '../utils/codexlens-path.js';
+import { getCodexLensHiddenPython } from '../utils/codexlens-path.js';
 import { getCoreMemoryStore } from './core-memory-store.js';
 import type { Stage1Output } from './core-memory-store.js';
 import { StoragePaths } from '../config/storage-paths.js';

@@ -26,7 +26,7 @@ const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);

 // Venv paths (reuse CodexLens venv)
-const VENV_PYTHON = getCodexLensPython();
+const VENV_PYTHON = getCodexLensHiddenPython();

 // Script path
 const EMBEDDER_SCRIPT = join(__dirname, '..', '..', 'scripts', 'memory_embedder.py');

@@ -116,8 +116,11 @@ function runPython(args: string[], timeout: number = 300000): Promise<string> {

     // Spawn Python process
     const child = spawn(VENV_PYTHON, [EMBEDDER_SCRIPT, ...args], {
+      shell: false,
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout,
+      windowsHide: true,
+      env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
     });

     let stdout = '';
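The same spawn hardening recurs across the embedder and CodexLens call sites in this commit. A sketch of the shared option shape; the helper name `buildHardenedSpawnOptions` is hypothetical (the diffs inline these fields at each call site), but the four fields are exactly the ones being added:

```typescript
// Hypothetical helper illustrating the hardened spawn options used above:
// no shell interpretation, no flashing console window on Windows, and
// UTF-8 Python I/O so non-ASCII output survives the pipe.
function buildHardenedSpawnOptions(timeout: number) {
  return {
    shell: false as const,               // argv passed directly, no shell quoting pitfalls
    stdio: ['ignore', 'pipe', 'pipe'] as const,
    timeout,                             // kill the child if it hangs
    windowsHide: true,                   // suppress the console window on Windows
    env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
  };
}
```

`shell: false` plus an explicit argv is also the safety fix that lets paths with spaces or shell metacharacters be passed without quoting.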
@@ -8,7 +8,7 @@ import {
   executeCodexLens,
   installSemantic,
 } from '../../../tools/codex-lens.js';
-import { getCodexLensPython } from '../../../utils/codexlens-path.js';
+import { getCodexLensHiddenPython } from '../../../utils/codexlens-path.js';
 import { spawn } from 'child_process';
 import type { GpuMode } from '../../../tools/codex-lens.js';
 import { loadLiteLLMApiConfig, getAvailableModelsForType, getProvider, getAllProviders } from '../../../config/litellm-api-config-manager.js';

@@ -59,10 +59,13 @@ except Exception as e:
     sys.exit(1)
 `;

-  const pythonPath = getCodexLensPython();
+  const pythonPath = getCodexLensHiddenPython();
   const child = spawn(pythonPath, ['-c', pythonScript], {
+    shell: false,
     stdio: ['ignore', 'pipe', 'pipe'],
     timeout,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
   });

   let stdout = '';
@@ -126,8 +126,10 @@ export async function handleCodexLensWatcherRoutes(ctx: RouteContext): Promise<b
   const args = ['-m', 'codexlens', 'watch', targetPath, '--debounce', String(debounceMs)];
   watcherProcess = spawn(pythonPath, args, {
     cwd: targetPath,
+    shell: false,
     stdio: ['ignore', 'pipe', 'pipe'],
-    env: { ...process.env }
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8' }
   });

   watcherStats = {
@@ -4,7 +4,11 @@
 */
 import { z } from 'zod';
 import { spawn } from 'child_process';
-import { getSystemPython } from '../../utils/python-utils.js';
+import {
+  getSystemPythonCommand,
+  parsePythonCommandSpec,
+  type PythonCommandSpec,
+} from '../../utils/python-utils.js';
 import {
   isUvAvailable,
   createCodexLensUvManager

@@ -102,10 +106,11 @@ interface CcwLitellmStatusResponse {
 }

 function checkCcwLitellmImport(
-  pythonCmd: string,
-  options: { timeout: number; shell?: boolean }
+  pythonCmd: string | PythonCommandSpec,
+  options: { timeout: number }
 ): Promise<CcwLitellmEnvCheck> {
-  const { timeout, shell = false } = options;
+  const { timeout } = options;
+  const pythonSpec = typeof pythonCmd === 'string' ? parsePythonCommandSpec(pythonCmd) : pythonCmd;

   const sanitizePythonError = (stderrText: string): string | undefined => {
     const trimmed = stderrText.trim();

@@ -119,11 +124,12 @@ function checkCcwLitellmImport(
   };

   return new Promise((resolve) => {
-    const child = spawn(pythonCmd, ['-c', 'import ccw_litellm; print(ccw_litellm.__version__)'], {
+    const child = spawn(pythonSpec.command, [...pythonSpec.args, '-c', 'import ccw_litellm; print(ccw_litellm.__version__)'], {
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout,
       windowsHide: true,
-      shell,
+      shell: false,
+      env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
     });

     let stdout = '';

@@ -142,20 +148,20 @@ function checkCcwLitellmImport(
       const error = sanitizePythonError(stderr);

       if (code === 0 && version) {
-        resolve({ python: pythonCmd, installed: true, version });
+        resolve({ python: pythonSpec.display, installed: true, version });
         return;
       }

       if (code === null) {
-        resolve({ python: pythonCmd, installed: false, error: `Timed out after ${timeout}ms` });
+        resolve({ python: pythonSpec.display, installed: false, error: `Timed out after ${timeout}ms` });
         return;
       }

-      resolve({ python: pythonCmd, installed: false, error: error || undefined });
+      resolve({ python: pythonSpec.display, installed: false, error: error || undefined });
     });

     child.on('error', (err) => {
-      resolve({ python: pythonCmd, installed: false, error: err.message });
+      resolve({ python: pythonSpec.display, installed: false, error: err.message });
     });
   });
 }

@@ -940,7 +946,7 @@ export async function handleLiteLLMApiRoutes(ctx: RouteContext): Promise<boolean
   // Diagnostics only: if not installed in venv, also check system python so users understand mismatches.
   // NOTE: `installed` flag remains the CodexLens venv status (we want isolated venv dependencies).
   const systemPython = !codexLensVenv.installed
-    ? await checkCcwLitellmImport(getSystemPython(), { timeout: statusTimeout, shell: true })
+    ? await checkCcwLitellmImport(getSystemPythonCommand(), { timeout: statusTimeout })
     : undefined;

   const result: CcwLitellmStatusResponse = {

@@ -1410,10 +1416,19 @@ export async function handleLiteLLMApiRoutes(ctx: RouteContext): Promise<boolean

   // Priority 2: Fallback to system pip uninstall
   console.log('[ccw-litellm uninstall] Using pip fallback...');
-  const pythonCmd = getSystemPython();
+  const pythonCmd = getSystemPythonCommand();

   return new Promise((resolve) => {
-    const proc = spawn(pythonCmd, ['-m', 'pip', 'uninstall', '-y', 'ccw-litellm'], { shell: true, timeout: 120000 });
+    const proc = spawn(
+      pythonCmd.command,
+      [...pythonCmd.args, '-m', 'pip', 'uninstall', '-y', 'ccw-litellm'],
+      {
+        shell: false,
+        timeout: 120000,
+        windowsHide: true,
+        env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
+      },
+    );
     let output = '';
     let error = '';
     proc.stdout?.on('data', (data) => { output += data.toString(); });
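Passing the system Python around as a command-plus-args spec, rather than a shell string, is what allows every call site above to drop `shell: true`. A hedged sketch of what such a parser could look like; the real `parsePythonCommandSpec` in `python-utils.ts` may handle quoting and platform cases differently:

```typescript
interface PythonCommandSpec {
  command: string;   // executable to spawn, e.g. 'py'
  args: string[];    // leading arguments, e.g. ['-3']
  display: string;   // human-readable form for diagnostics
}

// Hypothetical parser: splits a command string like "py -3" on whitespace
// so it can be handed to spawn() as argv, with no shell interpretation.
function parsePythonCommandSpec(raw: string): PythonCommandSpec {
  const parts = raw.trim().split(/\s+/).filter(Boolean);
  const [command, ...args] = parts.length > 0 ? parts : ['python'];
  return { command, args, display: raw.trim() || 'python' };
}
```

Call sites then spawn with `spec.command` and `[...spec.args, ...extraArgs]`, and report `spec.display` in status responses.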
@@ -16,7 +16,7 @@ import { spawn } from 'child_process';
 import { join, dirname } from 'path';
 import { existsSync } from 'fs';
 import { fileURLToPath } from 'url';
-import { getCodexLensPython } from '../utils/codexlens-path.js';
+import { getCodexLensHiddenPython } from '../utils/codexlens-path.js';
 import { StoragePaths, ensureStorageDir } from '../config/storage-paths.js';

 // Get directory of this module

@@ -24,7 +24,7 @@ const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);

 // Venv python path (reuse CodexLens venv)
-const VENV_PYTHON = getCodexLensPython();
+const VENV_PYTHON = getCodexLensHiddenPython();

 // Script path
 const EMBEDDER_SCRIPT = join(__dirname, '..', '..', 'scripts', 'unified_memory_embedder.py');

@@ -170,8 +170,11 @@ function runPython<T>(request: Record<string, unknown>, timeout: number = 300000
   }

   const child = spawn(VENV_PYTHON, [EMBEDDER_SCRIPT], {
+    shell: false,
     stdio: ['pipe', 'pipe', 'pipe'],
     timeout,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
   });

   let stdout = '';
@@ -1532,6 +1532,197 @@ export function closeAllStores(): void {
   storeCache.clear();
 }

+function collectHistoryDatabasePaths(): string[] {
+  const projectsDir = join(getCCWHome(), 'projects');
+  if (!existsSync(projectsDir)) {
+    return [];
+  }
+
+  const historyDbPaths: string[] = [];
+  const visitedDirs = new Set<string>();
+  const skipDirs = new Set(['cache', 'cli-history', 'config', 'memory']);
+
+  function scanDirectory(dir: string): void {
+    const resolvedDir = resolve(dir);
+    if (visitedDirs.has(resolvedDir)) {
+      return;
+    }
+    visitedDirs.add(resolvedDir);
+
+    const historyDb = join(resolvedDir, 'cli-history', 'history.db');
+    if (existsSync(historyDb)) {
+      historyDbPaths.push(historyDb);
+    }
+
+    try {
+      const entries = readdirSync(resolvedDir, { withFileTypes: true });
+      for (const entry of entries) {
+        if (!entry.isDirectory() || skipDirs.has(entry.name)) {
+          continue;
+        }
+        scanDirectory(join(resolvedDir, entry.name));
+      }
+    } catch {
+      // Ignore unreadable directories during best-effort global scans.
+    }
+  }
+
+  scanDirectory(projectsDir);
+  return historyDbPaths;
+}
+
+function getConversationLocationColumns(db: Database.Database): {
+  projectRootSelect: string;
+  relativePathSelect: string;
+} {
+  const tableInfo = db.prepare(`PRAGMA table_info(conversations)`).all() as Array<{ name: string }>;
+  const hasProjectRoot = tableInfo.some(col => col.name === 'project_root');
+  const hasRelativePath = tableInfo.some(col => col.name === 'relative_path');
+
+  return {
+    projectRootSelect: hasProjectRoot ? 'c.project_root AS project_root' : `'' AS project_root`,
+    relativePathSelect: hasRelativePath ? 'c.relative_path AS relative_path' : `'' AS relative_path`
+  };
+}
+
+function normalizeHistoryTimestamp(updatedAt: unknown, createdAt: unknown): number {
+  const parsedUpdatedAt = typeof updatedAt === 'string' ? Date.parse(updatedAt) : NaN;
+  if (!Number.isNaN(parsedUpdatedAt)) {
+    return parsedUpdatedAt;
+  }
+
+  const parsedCreatedAt = typeof createdAt === 'string' ? Date.parse(createdAt) : NaN;
+  return Number.isNaN(parsedCreatedAt) ? 0 : parsedCreatedAt;
+}
+
+export function getRegisteredExecutionHistory(options: {
+  limit?: number;
+  offset?: number;
+  tool?: string | null;
+  status?: string | null;
+  category?: ExecutionCategory | null;
+} = {}): {
+  total: number;
+  count: number;
+  executions: (HistoryIndexEntry & { sourceDir?: string })[];
+} {
+  const {
+    limit = 50,
+    offset = 0,
+    tool = null,
+    status = null,
+    category = null
+  } = options;
+
+  const perStoreLimit = Math.max(limit + offset, limit, 1);
+  const allExecutions: (HistoryIndexEntry & { sourceDir?: string })[] = [];
+  let totalCount = 0;
+
+  for (const historyDb of collectHistoryDatabasePaths()) {
+    let db: Database.Database | null = null;
+    try {
+      db = new Database(historyDb, { readonly: true });
+      const { projectRootSelect, relativePathSelect } = getConversationLocationColumns(db);
+
+      let whereClause = '1=1';
+      const params: Record<string, string | number> = { limit: perStoreLimit };
+
+      if (tool) {
+        whereClause += ' AND c.tool = @tool';
+        params.tool = tool;
+      }
+
+      if (status) {
+        whereClause += ' AND c.latest_status = @status';
+        params.status = status;
+      }
+
+      if (category) {
+        whereClause += ' AND c.category = @category';
+        params.category = category;
+      }
+
+      const countRow = db.prepare(`
+        SELECT COUNT(*) AS count
+        FROM conversations c
+        WHERE ${whereClause}
+      `).get(params) as { count?: number } | undefined;
+      totalCount += countRow?.count || 0;
+
+      const rows = db.prepare(`
+        SELECT
+          c.id,
+          c.created_at AS timestamp,
+          c.updated_at,
+          c.tool,
+          c.latest_status AS status,
+          c.category,
+          c.total_duration_ms AS duration_ms,
+          c.turn_count,
+          c.prompt_preview,
+          ${projectRootSelect},
+          ${relativePathSelect}
+        FROM conversations c
+        WHERE ${whereClause}
+        ORDER BY c.updated_at DESC
+        LIMIT @limit
+      `).all(params) as Array<{
+        id: string;
+        timestamp: string;
+        updated_at?: string;
+        tool: string;
+        status: string;
+        category?: ExecutionCategory;
+        duration_ms: number;
+        turn_count?: number;
+        prompt_preview: unknown;
+        project_root?: string;
+        relative_path?: string;
+      }>;
+
+      for (const row of rows) {
+        allExecutions.push({
+          id: row.id,
+          timestamp: row.timestamp,
+          updated_at: row.updated_at,
+          tool: row.tool,
+          status: row.status,
+          category: row.category || 'user',
+          duration_ms: row.duration_ms,
+          turn_count: row.turn_count,
+          prompt_preview: typeof row.prompt_preview === 'string'
+            ? row.prompt_preview
+            : (row.prompt_preview ? JSON.stringify(row.prompt_preview) : ''),
+          sourceDir: row.project_root || row.relative_path || undefined
+        });
+      }
+    } catch {
+      // Skip databases that are unavailable or incompatible.
+    } finally {
+      db?.close();
+    }
+  }
+
+  allExecutions.sort((a, b) => normalizeHistoryTimestamp(b.updated_at, b.timestamp) - normalizeHistoryTimestamp(a.updated_at, a.timestamp));
+
+  const dedupedExecutions: (HistoryIndexEntry & { sourceDir?: string })[] = [];
+  const seenIds = new Set<string>();
+  for (const execution of allExecutions) {
+    if (seenIds.has(execution.id)) {
+      continue;
+    }
+    seenIds.add(execution.id);
+    dedupedExecutions.push(execution);
+  }
+
+  const pagedExecutions = dedupedExecutions.slice(offset, offset + limit);
+  return {
+    total: dedupedExecutions.length || totalCount,
+    count: pagedExecutions.length,
+    executions: pagedExecutions
+  };
+}
+
 /**
  * Find project path that contains the given execution
  * Searches upward through parent directories and all registered projects

@@ -1579,43 +1770,28 @@ export function findProjectWithExecution(

   // Strategy 2: Search in all registered projects (global search)
   // This covers cases where execution might be in a completely different project tree
-  const projectsDir = join(getCCWHome(), 'projects');
-  if (existsSync(projectsDir)) {
-    try {
-      const entries = readdirSync(projectsDir, { withFileTypes: true });
-
-      for (const entry of entries) {
-        if (!entry.isDirectory()) continue;
-
-        const projectId = entry.name;
-        const historyDb = join(projectsDir, projectId, 'cli-history', 'history.db');
-
-        if (!existsSync(historyDb)) continue;
-
-        try {
-          // Open and query this database directly
-          const db = new Database(historyDb, { readonly: true });
-          const turn = db.prepare(`
-            SELECT * FROM turns
-            WHERE conversation_id = ?
-            ORDER BY turn_number DESC
-            LIMIT 1
-          `).get(conversationId);
-
-          db.close();
-
-          if (turn) {
-            // Found in this project - return the projectId
-            // Note: projectPath is set to projectId since we don't have the original path stored
-            return { projectPath: projectId, projectId };
-          }
-        } catch {
-          // Skip this database (might be corrupted or locked)
-          continue;
-        }
-      }
-    } catch {
-      // Failed to read projects directory
-    }
-  }
+  for (const historyDb of collectHistoryDatabasePaths()) {
+    let db: Database.Database | null = null;
+    try {
+      db = new Database(historyDb, { readonly: true });
+      const { projectRootSelect } = getConversationLocationColumns(db);
+      const row = db.prepare(`
+        SELECT ${projectRootSelect}
+        FROM conversations c
+        WHERE c.id = ?
+        LIMIT 1
+      `).get(conversationId) as { project_root?: string } | undefined;
+
+      if (row?.project_root) {
+        return {
+          projectPath: row.project_root,
+          projectId: getProjectId(row.project_root)
+        };
+      }
+    } catch {
+      // Skip this database (might be corrupted or locked)
+    } finally {
+      db?.close();
+    }
+  }
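The aggregation at the heart of `getRegisteredExecutionHistory` above (merge rows from every store, sort newest-first by `updated_at` with a `created_at` fallback, then de-dupe by id) reduces to a pure function. A simplified sketch with a hypothetical row type:

```typescript
interface HistoryRow {
  id: string;
  timestamp: string;    // created_at, ISO 8601
  updated_at?: string;  // may be missing in older schemas
}

// Prefer updated_at; fall back to created_at; treat unparseable dates as epoch 0.
function toMillis(updatedAt: string | undefined, createdAt: string): number {
  const updated = updatedAt ? Date.parse(updatedAt) : NaN;
  if (!Number.isNaN(updated)) return updated;
  const created = Date.parse(createdAt);
  return Number.isNaN(created) ? 0 : created;
}

// Merge per-store result sets, newest first; the first occurrence of each id wins.
function mergeHistory(stores: HistoryRow[][], limit: number): HistoryRow[] {
  const all: HistoryRow[] = ([] as HistoryRow[]).concat(...stores);
  all.sort((a, b) => toMillis(b.updated_at, b.timestamp) - toMillis(a.updated_at, a.timestamp));
  const seen = new Set<string>();
  const out: HistoryRow[] = [];
  for (const row of all) {
    if (seen.has(row.id)) continue;
    seen.add(row.id);
    out.push(row);
  }
  return out.slice(0, limit);
}
```

Sorting before de-duping is what makes "first occurrence wins" equivalent to "keep the most recently updated copy" when the same id appears in several stores.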
@@ -13,10 +13,10 @@ import type { ToolSchema, ToolResult } from '../types/tool.js';
 import { spawn } from 'child_process';
 import { join } from 'path';
 import { getProjectRoot } from '../utils/path-validator.js';
-import { getCodexLensPython } from '../utils/codexlens-path.js';
+import { getCodexLensHiddenPython } from '../utils/codexlens-path.js';

 // CodexLens venv configuration
-const CODEXLENS_VENV = getCodexLensPython();
+const CODEXLENS_VENV = getCodexLensHiddenPython();

 // Define Zod schema for validation
 const ParamsSchema = z.object({

@@ -122,8 +122,11 @@ except Exception as e:
 `;

   const child = spawn(CODEXLENS_VENV, ['-c', pythonScript], {
+    shell: false,
     stdio: ['ignore', 'pipe', 'pipe'],
     timeout,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
   });

   let stdout = '';
@@ -11,10 +11,15 @@

 import { z } from 'zod';
 import type { ToolSchema, ToolResult } from '../types/tool.js';
-import { spawn, execSync, exec } from 'child_process';
-import { existsSync, mkdirSync, rmSync } from 'fs';
-import { join } from 'path';
-import { getSystemPython, parsePythonVersion, isPythonVersionCompatible } from '../utils/python-utils.js';
+import { spawn, spawnSync, execSync, exec, type SpawnOptions, type SpawnSyncOptionsWithStringEncoding } from 'child_process';
+import { existsSync, mkdirSync, rmSync, statSync } from 'fs';
+import { join, resolve } from 'path';
+import {
+  getSystemPythonCommand,
+  parsePythonVersion,
+  isPythonVersionCompatible,
+  type PythonCommandSpec,
+} from '../utils/python-utils.js';
 import { EXEC_TIMEOUTS } from '../utils/exec-constants.js';
 import {
   UvManager,

@@ -26,6 +31,7 @@ import {
   getCodexLensDataDir,
   getCodexLensVenvDir,
   getCodexLensPython,
+  getCodexLensHiddenPython,
   getCodexLensPip,
 } from '../utils/codexlens-path.js';
 import {

@@ -58,6 +64,10 @@ interface SemanticStatusCache {
 let semanticStatusCache: SemanticStatusCache | null = null;
 const SEMANTIC_STATUS_TTL = 5 * 60 * 1000; // 5 minutes TTL

+type HiddenCodexLensSpawnSyncOptions = Omit<SpawnSyncOptionsWithStringEncoding, 'encoding'> & {
+  encoding?: BufferEncoding;
+};
+
 // Track running indexing process for cancellation
 let currentIndexingProcess: ReturnType<typeof spawn> | null = null;
 let currentIndexingAborted = false;

@@ -69,13 +79,34 @@ const VENV_CHECK_TIMEOUT = process.platform === 'win32' ? 15000 : 10000;
  * Pre-flight check: verify Python 3.9+ is available before attempting bootstrap.
  * Returns an error message if Python is not suitable, or null if OK.
  */
+function probePythonVersion(
+  pythonCommand: PythonCommandSpec,
+  runner: typeof spawnSync = spawnSync,
+): string {
+  const result = runner(
+    pythonCommand.command,
+    [...pythonCommand.args, '--version'],
+    buildCodexLensSpawnSyncOptions({
+      timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
+    }),
+  );
+
+  if (result.error) {
+    throw result.error;
+  }
+
+  const versionOutput = `${result.stdout ?? ''}${result.stderr ?? ''}`.trim();
+  if (result.status !== 0) {
+    throw new Error(versionOutput || `Python version probe exited with code ${String(result.status)}`);
+  }
+
+  return versionOutput;
+}
+
 function preFlightCheck(): string | null {
   try {
-    const pythonCmd = getSystemPython();
-    const version = execSync(`${pythonCmd} --version 2>&1`, {
-      encoding: 'utf8',
-      timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
-    }).trim();
+    const pythonCommand = getSystemPythonCommand();
+    const version = probePythonVersion(pythonCommand);
     const parsed = parsePythonVersion(version);
     if (!parsed) {
       return `Cannot parse Python version from: "${version}". Ensure Python 3.9+ is installed.`;

@@ -244,7 +275,7 @@ async function checkVenvStatus(force = false): Promise<ReadyStatus> {
     return result;
   }

-  const pythonPath = getCodexLensPython();
+  const pythonPath = getCodexLensHiddenPython();

   // Check python executable exists
   if (!existsSync(pythonPath)) {

@@ -259,18 +290,21 @@
   console.log('[PERF][CodexLens] checkVenvStatus spawning Python...');

   return new Promise((resolve) => {
-    const child = spawn(pythonPath, ['-c', 'import sys; import codexlens; import watchdog; print(f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"); print(codexlens.__version__)'], {
-      stdio: ['ignore', 'pipe', 'pipe'],
-      timeout: VENV_CHECK_TIMEOUT,
-    });
+    const child = spawn(
+      pythonPath,
+      ['-c', 'import sys; import codexlens; import watchdog; print(f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"); print(codexlens.__version__)'],
+      buildCodexLensSpawnOptions(venvPath, VENV_CHECK_TIMEOUT, {
+        stdio: ['ignore', 'pipe', 'pipe'],
+      }),
+    );

     let stdout = '';
     let stderr = '';

-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       stdout += data.toString();
     });
-    child.stderr.on('data', (data) =>
+    child.stderr?.on('data', (data) =>
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -380,18 +414,21 @@ try:
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(json.dumps({"available": False, "error": str(e)}))
|
print(json.dumps({"available": False, "error": str(e)}))
|
||||||
`;
|
`;
|
||||||
const child = spawn(getCodexLensPython(), ['-c', checkCode], {
|
const child = spawn(
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
getCodexLensHiddenPython(),
|
||||||
timeout: 15000,
|
['-c', checkCode],
|
||||||
});
|
buildCodexLensSpawnOptions(getCodexLensVenvDir(), 15000, {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stdout = '';
|
let stdout = '';
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
|
|
||||||
child.stdout.on('data', (data) => {
|
child.stdout?.on('data', (data) => {
|
||||||
stdout += data.toString();
|
stdout += data.toString();
|
||||||
});
|
});
|
||||||
child.stderr.on('data', (data) => {
|
child.stderr?.on('data', (data) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -441,13 +478,16 @@ async function ensureLiteLLMEmbedderReady(): Promise<BootstrapResult> {
|
|||||||
|
|
||||||
// Check if ccw_litellm can be imported
|
// Check if ccw_litellm can be imported
|
||||||
const importStatus = await new Promise<{ ok: boolean; error?: string }>((resolve) => {
|
const importStatus = await new Promise<{ ok: boolean; error?: string }>((resolve) => {
|
||||||
const child = spawn(getCodexLensPython(), ['-c', 'import ccw_litellm; print("OK")'], {
|
const child = spawn(
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
getCodexLensHiddenPython(),
|
||||||
timeout: 15000,
|
['-c', 'import ccw_litellm; print("OK")'],
|
||||||
});
|
buildCodexLensSpawnOptions(getCodexLensVenvDir(), 15000, {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
child.stderr.on('data', (data) => {
|
child.stderr?.on('data', (data) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -522,10 +562,19 @@ async function ensureLiteLLMEmbedderReady(): Promise<BootstrapResult> {
|
|||||||
const venvPython = getCodexLensPython();
|
const venvPython = getCodexLensPython();
|
||||||
console.warn(`[CodexLens] pip not found at: ${pipPath}. Attempting to bootstrap pip with ensurepip...`);
|
console.warn(`[CodexLens] pip not found at: ${pipPath}. Attempting to bootstrap pip with ensurepip...`);
|
||||||
try {
|
try {
|
||||||
execSync(`\"${venvPython}\" -m ensurepip --upgrade`, {
|
const ensurePipResult = spawnSync(
|
||||||
stdio: 'inherit',
|
venvPython,
|
||||||
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
['-m', 'ensurepip', '--upgrade'],
|
||||||
});
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (ensurePipResult.error) {
|
||||||
|
throw ensurePipResult.error;
|
||||||
|
}
|
||||||
|
if (ensurePipResult.status !== 0) {
|
||||||
|
throw new Error(`ensurepip exited with code ${String(ensurePipResult.status)}`);
|
||||||
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(`[CodexLens] ensurepip failed: ${(err as Error).message}`);
|
console.warn(`[CodexLens] ensurepip failed: ${(err as Error).message}`);
|
||||||
}
|
}
|
||||||
@@ -549,13 +598,36 @@ async function ensureLiteLLMEmbedderReady(): Promise<BootstrapResult> {
|
|||||||
|
|
||||||
try {
|
try {
|
||||||
if (localPath) {
|
if (localPath) {
|
||||||
const pipFlag = editable ? '-e' : '';
|
const pipArgs = editable ? ['install', '-e', localPath] : ['install', localPath];
|
||||||
const pipInstallSpec = editable ? `"${localPath}"` : `"${localPath}"`;
|
|
||||||
console.log(`[CodexLens] Installing ccw-litellm from local path with pip: ${localPath} (editable: ${editable})`);
|
console.log(`[CodexLens] Installing ccw-litellm from local path with pip: ${localPath} (editable: ${editable})`);
|
||||||
execSync(`"${pipPath}" install ${pipFlag} ${pipInstallSpec}`.replace(/ +/g, ' '), { stdio: 'inherit', timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL });
|
const installResult = spawnSync(
|
||||||
|
pipPath,
|
||||||
|
pipArgs,
|
||||||
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (installResult.error) {
|
||||||
|
throw installResult.error;
|
||||||
|
}
|
||||||
|
if (installResult.status !== 0) {
|
||||||
|
throw new Error(`pip install exited with code ${String(installResult.status)}`);
|
||||||
|
}
|
||||||
} else {
|
} else {
|
||||||
console.log('[CodexLens] Installing ccw-litellm from PyPI with pip...');
|
console.log('[CodexLens] Installing ccw-litellm from PyPI with pip...');
|
||||||
execSync(`"${pipPath}" install ccw-litellm`, { stdio: 'inherit', timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL });
|
const installResult = spawnSync(
|
||||||
|
pipPath,
|
||||||
|
['install', 'ccw-litellm'],
|
||||||
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (installResult.error) {
|
||||||
|
throw installResult.error;
|
||||||
|
}
|
||||||
|
if (installResult.status !== 0) {
|
||||||
|
throw new Error(`pip install exited with code ${String(installResult.status)}`);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return {
|
return {
|
||||||
@@ -609,7 +681,7 @@ interface PythonEnvInfo {
|
|||||||
* DirectML requires: 64-bit Python, version 3.8-3.12
|
* DirectML requires: 64-bit Python, version 3.8-3.12
|
||||||
*/
|
*/
|
||||||
async function checkPythonEnvForDirectML(): Promise<PythonEnvInfo> {
|
async function checkPythonEnvForDirectML(): Promise<PythonEnvInfo> {
|
||||||
const pythonPath = getCodexLensPython();
|
const pythonPath = getCodexLensHiddenPython();
|
||||||
|
|
||||||
if (!existsSync(pythonPath)) {
|
if (!existsSync(pythonPath)) {
|
||||||
return { version: '', majorMinor: '', architecture: 0, compatible: false, error: 'Python not found in venv' };
|
return { version: '', majorMinor: '', architecture: 0, compatible: false, error: 'Python not found in venv' };
|
||||||
@@ -619,8 +691,19 @@ async function checkPythonEnvForDirectML(): Promise<PythonEnvInfo> {
|
|||||||
// Get Python version and architecture in one call
|
// Get Python version and architecture in one call
|
||||||
// Use % formatting instead of f-string to avoid Windows shell escaping issues with curly braces
|
// Use % formatting instead of f-string to avoid Windows shell escaping issues with curly braces
|
||||||
const checkScript = `import sys, struct; print('%d.%d.%d|%d' % (sys.version_info.major, sys.version_info.minor, sys.version_info.micro, struct.calcsize('P') * 8))`;
|
const checkScript = `import sys, struct; print('%d.%d.%d|%d' % (sys.version_info.major, sys.version_info.minor, sys.version_info.micro, struct.calcsize('P') * 8))`;
|
||||||
const result = execSync(`"${pythonPath}" -c "${checkScript}"`, { encoding: 'utf-8', timeout: 10000 }).trim();
|
const result = spawnSync(
|
||||||
const [version, archStr] = result.split('|');
|
pythonPath,
|
||||||
|
['-c', checkScript],
|
||||||
|
buildCodexLensSpawnSyncOptions({ timeout: 10000 }),
|
||||||
|
);
|
||||||
|
if (result.error) {
|
||||||
|
throw result.error;
|
||||||
|
}
|
||||||
|
const output = `${result.stdout ?? ''}${result.stderr ?? ''}`.trim();
|
||||||
|
if (result.status !== 0) {
|
||||||
|
throw new Error(output || `Python probe exited with code ${String(result.status)}`);
|
||||||
|
}
|
||||||
|
const [version, archStr] = output.split('|');
|
||||||
const architecture = parseInt(archStr, 10);
|
const architecture = parseInt(archStr, 10);
|
||||||
const [major, minor] = version.split('.').map(Number);
|
const [major, minor] = version.split('.').map(Number);
|
||||||
const majorMinor = `${major}.${minor}`;
|
const majorMinor = `${major}.${minor}`;
|
||||||
@@ -898,15 +981,18 @@ async function installSemantic(gpuMode: GpuMode = 'cpu'): Promise<BootstrapResul
|
|||||||
console.log(`[CodexLens] Packages: ${packages.join(', ')}`);
|
console.log(`[CodexLens] Packages: ${packages.join(', ')}`);
|
||||||
|
|
||||||
// Install ONNX Runtime first with force-reinstall to ensure clean state
|
// Install ONNX Runtime first with force-reinstall to ensure clean state
|
||||||
const installOnnx = spawn(pipPath, ['install', '--force-reinstall', onnxPackage], {
|
const installOnnx = spawn(
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
pipPath,
|
||||||
timeout: 600000, // 10 minutes for GPU packages
|
['install', '--force-reinstall', onnxPackage],
|
||||||
});
|
buildCodexLensSpawnOptions(getCodexLensVenvDir(), 600000, {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let onnxStdout = '';
|
let onnxStdout = '';
|
||||||
let onnxStderr = '';
|
let onnxStderr = '';
|
||||||
|
|
||||||
installOnnx.stdout.on('data', (data) => {
|
installOnnx.stdout?.on('data', (data) => {
|
||||||
onnxStdout += data.toString();
|
onnxStdout += data.toString();
|
||||||
const line = data.toString().trim();
|
const line = data.toString().trim();
|
||||||
if (line.includes('Downloading') || line.includes('Installing')) {
|
if (line.includes('Downloading') || line.includes('Installing')) {
|
||||||
@@ -914,7 +1000,7 @@ async function installSemantic(gpuMode: GpuMode = 'cpu'): Promise<BootstrapResul
|
|||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
installOnnx.stderr.on('data', (data) => {
|
installOnnx.stderr?.on('data', (data) => {
|
||||||
onnxStderr += data.toString();
|
onnxStderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -927,15 +1013,18 @@ async function installSemantic(gpuMode: GpuMode = 'cpu'): Promise<BootstrapResul
|
|||||||
console.log(`[CodexLens] ${onnxPackage} installed successfully`);
|
console.log(`[CodexLens] ${onnxPackage} installed successfully`);
|
||||||
|
|
||||||
// Now install remaining packages
|
// Now install remaining packages
|
||||||
const child = spawn(pipPath, ['install', ...packages], {
|
const child = spawn(
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
pipPath,
|
||||||
timeout: 600000,
|
['install', ...packages],
|
||||||
});
|
buildCodexLensSpawnOptions(getCodexLensVenvDir(), 600000, {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stdout = '';
|
let stdout = '';
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
|
|
||||||
child.stdout.on('data', (data) => {
|
child.stdout?.on('data', (data) => {
|
||||||
stdout += data.toString();
|
stdout += data.toString();
|
||||||
const line = data.toString().trim();
|
const line = data.toString().trim();
|
||||||
if (line.includes('Downloading') || line.includes('Installing') || line.includes('Collecting')) {
|
if (line.includes('Downloading') || line.includes('Installing') || line.includes('Collecting')) {
|
||||||
@@ -943,7 +1032,7 @@ async function installSemantic(gpuMode: GpuMode = 'cpu'): Promise<BootstrapResul
|
|||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
child.stderr.on('data', (data) => {
|
child.stderr?.on('data', (data) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -1028,8 +1117,20 @@ async function bootstrapVenv(): Promise<BootstrapResult> {
|
|||||||
if (!existsSync(venvDir)) {
|
if (!existsSync(venvDir)) {
|
||||||
try {
|
try {
|
||||||
console.log('[CodexLens] Creating virtual environment...');
|
console.log('[CodexLens] Creating virtual environment...');
|
||||||
const pythonCmd = getSystemPython();
|
const pythonCmd = getSystemPythonCommand();
|
||||||
execSync(`${pythonCmd} -m venv "${venvDir}"`, { stdio: 'inherit', timeout: EXEC_TIMEOUTS.PROCESS_SPAWN });
|
const createResult = spawnSync(
|
||||||
|
pythonCmd.command,
|
||||||
|
[...pythonCmd.args, '-m', 'venv', venvDir],
|
||||||
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PROCESS_SPAWN,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (createResult.error) {
|
||||||
|
throw createResult.error;
|
||||||
|
}
|
||||||
|
if (createResult.status !== 0) {
|
||||||
|
throw new Error(`venv creation exited with code ${String(createResult.status)}`);
|
||||||
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
return {
|
return {
|
||||||
success: false,
|
success: false,
|
||||||
@@ -1049,10 +1150,19 @@ async function bootstrapVenv(): Promise<BootstrapResult> {
|
|||||||
const venvPython = getCodexLensPython();
|
const venvPython = getCodexLensPython();
|
||||||
console.warn(`[CodexLens] pip not found at: ${pipPath}. Attempting to bootstrap pip with ensurepip...`);
|
console.warn(`[CodexLens] pip not found at: ${pipPath}. Attempting to bootstrap pip with ensurepip...`);
|
||||||
try {
|
try {
|
||||||
execSync(`\"${venvPython}\" -m ensurepip --upgrade`, {
|
const ensurePipResult = spawnSync(
|
||||||
stdio: 'inherit',
|
venvPython,
|
||||||
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
['-m', 'ensurepip', '--upgrade'],
|
||||||
});
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (ensurePipResult.error) {
|
||||||
|
throw ensurePipResult.error;
|
||||||
|
}
|
||||||
|
if (ensurePipResult.status !== 0) {
|
||||||
|
throw new Error(`ensurepip exited with code ${String(ensurePipResult.status)}`);
|
||||||
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(`[CodexLens] ensurepip failed: ${(err as Error).message}`);
|
console.warn(`[CodexLens] ensurepip failed: ${(err as Error).message}`);
|
||||||
}
|
}
|
||||||
@@ -1063,8 +1173,20 @@ async function bootstrapVenv(): Promise<BootstrapResult> {
|
|||||||
console.warn('[CodexLens] pip still missing after ensurepip; recreating venv with system Python...');
|
console.warn('[CodexLens] pip still missing after ensurepip; recreating venv with system Python...');
|
||||||
try {
|
try {
|
||||||
rmSync(venvDir, { recursive: true, force: true });
|
rmSync(venvDir, { recursive: true, force: true });
|
||||||
const pythonCmd = getSystemPython();
|
const pythonCmd = getSystemPythonCommand();
|
||||||
execSync(`${pythonCmd} -m venv \"${venvDir}\"`, { stdio: 'inherit', timeout: EXEC_TIMEOUTS.PROCESS_SPAWN });
|
const recreateResult = spawnSync(
|
||||||
|
pythonCmd.command,
|
||||||
|
[...pythonCmd.args, '-m', 'venv', venvDir],
|
||||||
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PROCESS_SPAWN,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (recreateResult.error) {
|
||||||
|
throw recreateResult.error;
|
||||||
|
}
|
||||||
|
if (recreateResult.status !== 0) {
|
||||||
|
throw new Error(`venv recreation exited with code ${String(recreateResult.status)}`);
|
||||||
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
return {
|
return {
|
||||||
success: false,
|
success: false,
|
||||||
@@ -1090,9 +1212,21 @@ async function bootstrapVenv(): Promise<BootstrapResult> {
|
|||||||
}
|
}
|
||||||
|
|
||||||
const editable = isDevEnvironment() && !discovery.insideNodeModules;
|
const editable = isDevEnvironment() && !discovery.insideNodeModules;
|
||||||
const pipFlag = editable ? ' -e' : '';
|
const pipArgs = editable ? ['install', '-e', discovery.path] : ['install', discovery.path];
|
||||||
console.log(`[CodexLens] Installing from local path: ${discovery.path} (editable: ${editable})`);
|
console.log(`[CodexLens] Installing from local path: ${discovery.path} (editable: ${editable})`);
|
||||||
execSync(`"${pipPath}" install${pipFlag} "${discovery.path}"`, { stdio: 'inherit', timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL });
|
const installResult = spawnSync(
|
||||||
|
pipPath,
|
||||||
|
pipArgs,
|
||||||
|
buildCodexLensSpawnSyncOptions({
|
||||||
|
timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
if (installResult.error) {
|
||||||
|
throw installResult.error;
|
||||||
|
}
|
||||||
|
if (installResult.status !== 0) {
|
||||||
|
throw new Error(`pip install exited with code ${String(installResult.status)}`);
|
||||||
|
}
|
||||||
|
|
||||||
// Clear cache after successful installation
|
// Clear cache after successful installation
|
||||||
clearVenvStatusCache();
|
clearVenvStatusCache();
|
||||||
@@ -1237,6 +1371,12 @@ function shouldRetryWithoutLanguageFilters(args: string[], error?: string): bool
|
|||||||
return args.includes('--language') && Boolean(error && /Got unexpected extra arguments?\b/i.test(error));
|
return args.includes('--language') && Boolean(error && /Got unexpected extra arguments?\b/i.test(error));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function shouldRetryWithLegacySearchArgs(args: string[], error?: string): boolean {
|
||||||
|
return args[0] === 'search'
|
||||||
|
&& (args.includes('--limit') || args.includes('--mode') || args.includes('--offset'))
|
||||||
|
&& Boolean(error && /Got unexpected extra arguments?\b/i.test(error));
|
||||||
|
}
|
||||||
|
|
||||||
function stripFlag(args: string[], flag: string): string[] {
|
function stripFlag(args: string[], flag: string): string[] {
|
||||||
return args.filter((arg) => arg !== flag);
|
return args.filter((arg) => arg !== flag);
|
||||||
}
|
}
|
||||||
@@ -1253,6 +1393,29 @@ function stripOptionWithValues(args: string[], option: string): string[] {
|
|||||||
return nextArgs;
|
return nextArgs;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function stripSearchCompatibilityOptions(args: string[]): string[] {
|
||||||
|
return stripOptionWithValues(
|
||||||
|
stripOptionWithValues(
|
||||||
|
stripOptionWithValues(args, '--offset'),
|
||||||
|
'--mode',
|
||||||
|
),
|
||||||
|
'--limit',
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
function appendWarning(existing: string | undefined, next: string | undefined): string | undefined {
|
||||||
|
if (!next) {
|
||||||
|
return existing;
|
||||||
|
}
|
||||||
|
if (!existing) {
|
||||||
|
return next;
|
||||||
|
}
|
||||||
|
if (existing.includes(next)) {
|
||||||
|
return existing;
|
||||||
|
}
|
||||||
|
return `${existing} ${next}`;
|
||||||
|
}
|
||||||
|
|
||||||
function shouldRetryWithAstGrepPreference(args: string[], error?: string): boolean {
|
function shouldRetryWithAstGrepPreference(args: string[], error?: string): boolean {
|
||||||
return !args.includes('--use-astgrep')
|
return !args.includes('--use-astgrep')
|
||||||
&& !args.includes('--no-use-astgrep')
|
&& !args.includes('--no-use-astgrep')
|
||||||
@@ -1334,6 +1497,142 @@ function tryExtractJsonPayload(raw: string): unknown | null {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function parseLegacySearchPaths(output: string | undefined, cwd: string): string[] {
|
||||||
|
const lines = stripAnsiCodes(output || '')
|
||||||
|
.split(/\r?\n/)
|
||||||
|
.map((line) => line.trim())
|
||||||
|
.filter(Boolean);
|
||||||
|
|
||||||
|
const filePaths: string[] = [];
|
||||||
|
for (const line of lines) {
|
||||||
|
if (line.includes('RuntimeWarning:') || line.startsWith('warn(') || line.startsWith('Warning:')) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
const candidate = /^[a-zA-Z]:[\\/]|^\//.test(line)
|
||||||
|
? line
|
||||||
|
: resolve(cwd, line);
|
||||||
|
|
||||||
|
try {
|
||||||
|
if (statSync(candidate).isFile()) {
|
||||||
|
filePaths.push(candidate);
|
||||||
|
}
|
||||||
|
} catch {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return [...new Set(filePaths)];
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildLegacySearchPayload(query: string, filePaths: string[], limit: number): Record<string, unknown> {
|
||||||
|
const results = filePaths.slice(0, limit).map((path, index) => ({
|
||||||
|
path,
|
||||||
|
score: Math.max(0.1, 1 - index * 0.05),
|
||||||
|
excerpt: '',
|
||||||
|
content: '',
|
||||||
|
source: 'legacy_text_output',
|
||||||
|
symbol: null,
|
||||||
|
}));
|
||||||
|
|
||||||
|
return {
|
||||||
|
success: true,
|
||||||
|
result: {
|
||||||
|
query,
|
||||||
|
count: filePaths.length,
|
||||||
|
results,
|
||||||
|
},
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildLegacySearchFilesPayload(query: string, filePaths: string[], limit: number): Record<string, unknown> {
|
||||||
|
return {
|
||||||
|
success: true,
|
||||||
|
result: {
|
||||||
|
query,
|
||||||
|
count: filePaths.length,
|
||||||
|
files: filePaths.slice(0, limit),
|
||||||
|
},
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildEmptySearchPayload(query: string, filesOnly: boolean): Record<string, unknown> {
|
||||||
|
return filesOnly
|
||||||
|
? {
|
||||||
|
success: true,
|
||||||
|
result: {
|
||||||
|
query,
|
||||||
|
count: 0,
|
||||||
|
files: [],
|
||||||
|
},
|
||||||
|
}
|
||||||
|
: {
|
||||||
|
success: true,
|
||||||
|
result: {
|
||||||
|
query,
|
||||||
|
count: 0,
|
||||||
|
results: [],
|
||||||
|
},
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function normalizeSearchCommandResult(
|
||||||
|
result: ExecuteResult,
|
||||||
|
options: { query: string; cwd: string; limit: number; filesOnly: boolean },
|
||||||
|
): ExecuteResult {
|
||||||
|
if (!result.success) {
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
const { query, cwd, limit, filesOnly } = options;
|
||||||
|
const rawOutput = typeof result.output === 'string' ? result.output : '';
|
||||||
|
const parsedPayload = rawOutput ? tryExtractJsonPayload(rawOutput) : null;
|
||||||
|
if (parsedPayload !== null) {
|
||||||
|
if (filesOnly) {
|
||||||
|
result.files = parsedPayload;
|
||||||
|
} else {
|
||||||
|
result.results = parsedPayload;
|
||||||
|
}
|
||||||
|
delete result.output;
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
const legacyPaths = parseLegacySearchPaths(rawOutput, cwd);
|
||||||
|
if (legacyPaths.length > 0) {
|
||||||
|
const warning = filesOnly
|
||||||
|
? 'CodexLens CLI returned legacy plain-text file output; synthesized JSON-compatible search_files results.'
|
||||||
|
: 'CodexLens CLI returned legacy plain-text search output; synthesized JSON-compatible search results.';
|
||||||
|
|
||||||
|
if (filesOnly) {
|
||||||
|
result.files = buildLegacySearchFilesPayload(query, legacyPaths, limit);
|
||||||
|
} else {
|
||||||
|
result.results = buildLegacySearchPayload(query, legacyPaths, limit);
|
||||||
|
}
|
||||||
|
delete result.output;
|
||||||
|
result.warning = appendWarning(result.warning, warning);
|
||||||
|
result.message = appendWarning(result.message, warning);
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
const warning = rawOutput.trim()
|
||||||
|
? (filesOnly
|
||||||
|
? 'CodexLens CLI returned non-JSON search_files output; synthesized an empty JSON-compatible fallback payload.'
|
||||||
|
: 'CodexLens CLI returned non-JSON search output; synthesized an empty JSON-compatible fallback payload.')
|
||||||
|
: (filesOnly
|
||||||
|
? 'CodexLens CLI returned empty stdout in JSON mode for search_files; synthesized an empty JSON-compatible fallback payload.'
|
||||||
|
: 'CodexLens CLI returned empty stdout in JSON mode for search; synthesized an empty JSON-compatible fallback payload.');
|
||||||
|
|
||||||
|
if (filesOnly) {
|
||||||
|
result.files = buildEmptySearchPayload(query, true);
|
||||||
|
} else {
|
||||||
|
result.results = buildEmptySearchPayload(query, false);
|
||||||
|
}
|
||||||
|
delete result.output;
|
||||||
|
result.warning = appendWarning(result.warning, warning);
|
||||||
|
result.message = appendWarning(result.message, warning);
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
function extractStructuredError(payload: unknown): string | null {
|
function extractStructuredError(payload: unknown): string | null {
|
||||||
if (!payload || typeof payload !== 'object' || Array.isArray(payload)) {
|
if (!payload || typeof payload !== 'object' || Array.isArray(payload)) {
|
||||||
return null;
|
return null;
|
||||||
@@ -1394,6 +1693,11 @@ async function executeCodexLens(args: string[], options: ExecuteOptions = {}): P
|
|||||||
transform: (currentArgs: string[]) => stripOptionWithValues(currentArgs, '--language'),
|
transform: (currentArgs: string[]) => stripOptionWithValues(currentArgs, '--language'),
|
||||||
warning: 'CodexLens CLI rejected --language filters; retried without language scoping.',
|
warning: 'CodexLens CLI rejected --language filters; retried without language scoping.',
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
shouldRetry: shouldRetryWithLegacySearchArgs,
|
||||||
|
transform: stripSearchCompatibilityOptions,
|
||||||
|
warning: 'CodexLens CLI rejected search --limit/--mode compatibility flags; retried with minimal legacy search args.',
|
||||||
|
},
|
||||||
{
|
{
|
||||||
shouldRetry: shouldRetryWithAstGrepPreference,
|
shouldRetry: shouldRetryWithAstGrepPreference,
|
||||||
transform: (currentArgs: string[]) => [...currentArgs, '--use-astgrep'],
|
transform: (currentArgs: string[]) => [...currentArgs, '--use-astgrep'],
|
||||||
@@ -1441,6 +1745,32 @@ async function executeCodexLens(args: string[], options: ExecuteOptions = {}): P
|
|||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function buildCodexLensSpawnOptions(cwd: string, timeout: number, overrides: SpawnOptions = {}): SpawnOptions {
|
||||||
|
const { env, ...rest } = overrides;
|
||||||
|
return {
|
||||||
|
cwd,
|
||||||
|
shell: false,
|
||||||
|
timeout,
|
||||||
|
windowsHide: true,
|
||||||
|
env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
|
||||||
|
...rest,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildCodexLensSpawnSyncOptions(
|
||||||
|
overrides: HiddenCodexLensSpawnSyncOptions = {},
|
||||||
|
): SpawnSyncOptionsWithStringEncoding {
|
||||||
|
const { env, encoding, ...rest } = overrides;
|
||||||
|
return {
|
||||||
|
shell: false,
|
||||||
|
windowsHide: true,
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
|
||||||
|
...rest,
|
||||||
|
encoding: encoding ?? 'utf8',
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
async function executeCodexLensOnce(args: string[], options: ExecuteOptions = {}): Promise<ExecuteResult> {
|
async function executeCodexLensOnce(args: string[], options: ExecuteOptions = {}): Promise<ExecuteResult> {
|
||||||
const { timeout = 300000, cwd = process.cwd(), onProgress } = options; // Default 5 min
|
const { timeout = 300000, cwd = process.cwd(), onProgress } = options; // Default 5 min
|
||||||
|
|
||||||
@@ -1456,13 +1786,7 @@ async function executeCodexLensOnce(args: string[], options: ExecuteOptions = {}
|
|||||||
// spawn's cwd option handles drive changes correctly on Windows
|
// spawn's cwd option handles drive changes correctly on Windows
|
||||||
const spawnArgs = ['-m', 'codexlens', ...args];
|
const spawnArgs = ['-m', 'codexlens', ...args];
|
||||||
|
|
||||||
const child = spawn(getCodexLensPython(), spawnArgs, {
|
const child = spawn(getCodexLensHiddenPython(), spawnArgs, buildCodexLensSpawnOptions(cwd, timeout));
|
||||||
cwd,
|
|
||||||
shell: false, // CRITICAL: Prevent command injection
|
|
||||||
timeout,
|
|
||||||
// Ensure proper encoding on Windows
|
|
||||||
env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
|
|
||||||
});
|
|
||||||
|
|
||||||
// Track indexing process for cancellation (only for init commands)
|
// Track indexing process for cancellation (only for init commands)
|
||||||
const isIndexingCommand = args.includes('init');
|
const isIndexingCommand = args.includes('init');
|
||||||
@@ -1566,13 +1890,22 @@ async function executeCodexLensOnce(args: string[], options: ExecuteOptions = {}
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const trimmedStdout = stdout.trim();
|
||||||
if (code === 0) {
|
if (code === 0) {
|
||||||
safeResolve({ success: true, output: stdout.trim() });
|
const warning = args.includes('--json') && trimmedStdout.length === 0
|
||||||
|
? `CodexLens CLI exited successfully but produced empty stdout in JSON mode for ${args[0] ?? 'command'}.`
|
||||||
|
: undefined;
|
||||||
|
safeResolve({
|
||||||
|
success: true,
|
||||||
|
output: trimmedStdout || undefined,
|
||||||
|
warning,
|
||||||
|
message: warning,
|
||||||
|
});
|
||||||
} else {
|
} else {
|
||||||
safeResolve({
|
safeResolve({
|
||||||
success: false,
|
success: false,
|
||||||
error: extractCodexLensFailure(stdout, stderr, code),
|
error: extractCodexLensFailure(stdout, stderr, code),
|
||||||
output: stdout.trim() || undefined,
|
output: trimmedStdout || undefined,
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
@@ -1627,18 +1960,12 @@ async function searchCode(params: Params): Promise<ExecuteResult> {
|
|||||||
args.push('--enrich');
|
args.push('--enrich');
|
||||||
}
|
}
|
||||||
|
|
||||||
const result = await executeCodexLens(args, { cwd: path });
|
return normalizeSearchCommandResult(await executeCodexLens(args, { cwd: path }), {
|
||||||
|
query,
|
||||||
if (result.success && result.output) {
|
cwd: path,
|
||||||
try {
|
limit,
|
||||||
result.results = JSON.parse(result.output);
|
filesOnly: false,
|
||||||
delete result.output;
|
});
|
||||||
} catch {
|
|
||||||
// Keep raw output if JSON parse fails
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return result;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -1672,18 +1999,12 @@ async function searchFiles(params: Params): Promise<ExecuteResult> {
|
|||||||
args.push('--enrich');
|
args.push('--enrich');
|
||||||
}
|
}
|
||||||
|
|
||||||
const result = await executeCodexLens(args, { cwd: path });
|
return normalizeSearchCommandResult(await executeCodexLens(args, { cwd: path }), {
|
||||||
|
query,
|
||||||
if (result.success && result.output) {
|
cwd: path,
|
||||||
try {
|
limit,
|
||||||
result.files = JSON.parse(result.output);
|
filesOnly: true,
|
||||||
delete result.output;
|
});
|
||||||
} catch {
|
|
||||||
// Keep raw output if JSON parse fails
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return result;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
@@ -2185,11 +2506,18 @@ export {
 
 // Export Python path for direct spawn usage (e.g., watcher)
 export function getVenvPythonPath(): string {
-  return getCodexLensPython();
+  return getCodexLensHiddenPython();
 }
 
 export type { GpuMode, PythonEnvInfo };
 
+export const __testables = {
+  normalizeSearchCommandResult,
+  parseLegacySearchPaths,
+  buildCodexLensSpawnOptions,
+  probePythonVersion,
+};
+
 // Backward-compatible export for tests
 export const codexLensTool = {
   name: schema.name,
@@ -12,7 +12,7 @@
 import { spawn } from 'child_process';
 import { existsSync } from 'fs';
 import { join } from 'path';
-import { getCodexLensPython, getCodexLensVenvDir } from '../utils/codexlens-path.js';
+import { getCodexLensPython, getCodexLensHiddenPython, getCodexLensVenvDir } from '../utils/codexlens-path.js';
 
 export interface LiteLLMConfig {
   pythonPath?: string; // Default: CodexLens venv Python
@@ -24,7 +24,7 @@ export interface LiteLLMConfig {
 const IS_WINDOWS = process.platform === 'win32';
 const CODEXLENS_VENV = getCodexLensVenvDir();
 const VENV_BIN_DIR = IS_WINDOWS ? 'Scripts' : 'bin';
-const PYTHON_EXECUTABLE = IS_WINDOWS ? 'python.exe' : 'python';
+const PYTHON_EXECUTABLE = IS_WINDOWS ? 'pythonw.exe' : 'python';
 
 /**
  * Get the Python path from CodexLens venv
@@ -36,6 +36,10 @@ export function getCodexLensVenvPython(): string {
   if (existsSync(venvPython)) {
     return venvPython;
   }
+  const hiddenPython = getCodexLensHiddenPython();
+  if (existsSync(hiddenPython)) {
+    return hiddenPython;
+  }
   // Fallback to system Python if venv not available
   return 'python';
 }
@@ -46,10 +50,14 @@ export function getCodexLensVenvPython(): string {
  * @returns Path to Python executable
  */
 export function getCodexLensPythonPath(): string {
-  const codexLensPython = getCodexLensPython();
+  const codexLensPython = getCodexLensHiddenPython();
   if (existsSync(codexLensPython)) {
     return codexLensPython;
   }
+  const fallbackPython = getCodexLensPython();
+  if (existsSync(fallbackPython)) {
+    return fallbackPython;
+  }
   // Fallback to system Python if venv not available
   return 'python';
 }
@@ -100,8 +108,10 @@ export class LiteLLMClient {
 
     return new Promise((resolve, reject) => {
       const proc = spawn(this.pythonPath, ['-m', 'ccw_litellm.cli', ...args], {
+        shell: false,
+        windowsHide: true,
         stdio: ['pipe', 'pipe', 'pipe'],
-        env: { ...process.env }
+        env: { ...process.env, PYTHONIOENCODING: 'utf-8' }
       });
 
       let stdout = '';
@@ -20,7 +20,7 @@
 
 import { z } from 'zod';
 import type { ToolSchema, ToolResult } from '../types/tool.js';
-import { spawn, execSync } from 'child_process';
+import { spawn, spawnSync, type SpawnOptions } from 'child_process';
 import { existsSync, readFileSync, statSync } from 'fs';
 import { dirname, join, resolve } from 'path';
 import {
@@ -346,8 +346,12 @@ interface SearchMetadata {
   api_max_workers?: number;
   endpoint_count?: number;
   use_gpu?: boolean;
+  reranker_enabled?: boolean;
+  reranker_backend?: string;
+  reranker_model?: string;
   cascade_strategy?: string;
   staged_stage2_mode?: string;
+  static_graph_enabled?: boolean;
   preset?: string;
 }
 
@@ -474,8 +478,52 @@
 ];
 
 let codexLensFtsBackendBroken = false;
+const autoInitJobs = new Map<string, { startedAt: number; languages?: string[] }>();
 const autoEmbedJobs = new Map<string, { startedAt: number; backend?: string; model?: string }>();
+
+type SmartSearchRuntimeOverrides = {
+  checkSemanticStatus?: typeof checkSemanticStatus;
+  getVenvPythonPath?: typeof getVenvPythonPath;
+  spawnProcess?: typeof spawn;
+  now?: () => number;
+};
+
+const runtimeOverrides: SmartSearchRuntimeOverrides = {};
+
+function getSemanticStatusRuntime(): typeof checkSemanticStatus {
+  return runtimeOverrides.checkSemanticStatus ?? checkSemanticStatus;
+}
+
+function getVenvPythonPathRuntime(): typeof getVenvPythonPath {
+  return runtimeOverrides.getVenvPythonPath ?? getVenvPythonPath;
+}
+
+function getSpawnRuntime(): typeof spawn {
+  return runtimeOverrides.spawnProcess ?? spawn;
+}
+
+function getNowRuntime(): number {
+  return (runtimeOverrides.now ?? Date.now)();
+}
+
+function buildSmartSearchSpawnOptions(cwd: string, overrides: SpawnOptions = {}): SpawnOptions {
+  const { env, ...rest } = overrides;
+  return {
+    cwd,
+    shell: false,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
+    ...rest,
+  };
+}
+
+function shouldDetachBackgroundSmartSearchProcess(): boolean {
+  // On Windows, detached Python children can still create a transient console
+  // window even when windowsHide is set. Background warmup only needs to outlive
+  // the current request, not the MCP server process.
+  return process.platform !== 'win32';
+}
 
 /**
  * Truncate content to specified length with ellipsis
  * @param content - The content to truncate
@@ -523,6 +571,58 @@ interface RipgrepQueryModeResolution {
   warning?: string;
 }
 
+const GENERATED_QUERY_RE = /(?<!\w)(dist|build|out|coverage|htmlcov|generated|bundle|compiled|artifact|artifacts|\.workflow)(?!\w)/i;
+const ENV_STYLE_QUERY_RE = /\b[A-Z][A-Z0-9]+(?:_[A-Z0-9]+)+\b/;
+const TOPIC_TOKEN_RE = /[A-Za-z][A-Za-z0-9]*/g;
+const LEXICAL_PRIORITY_SURFACE_TOKENS = new Set([
+  'config',
+  'configs',
+  'configuration',
+  'configurations',
+  'setting',
+  'settings',
+  'backend',
+  'backends',
+  'environment',
+  'env',
+  'variable',
+  'variables',
+  'factory',
+  'factories',
+  'override',
+  'overrides',
+  'option',
+  'options',
+  'flag',
+  'flags',
+  'mode',
+  'modes',
+]);
+const LEXICAL_PRIORITY_FOCUS_TOKENS = new Set([
+  'embedding',
+  'embeddings',
+  'reranker',
+  'rerankers',
+  'onnx',
+  'api',
+  'litellm',
+  'fastembed',
+  'local',
+  'legacy',
+  'stage',
+  'stage2',
+  'stage3',
+  'stage4',
+  'precomputed',
+  'realtime',
+  'static',
+  'global',
+  'graph',
+  'selection',
+  'model',
+  'models',
+]);
+
 function sanitizeSearchQuery(query: string | undefined): string | undefined {
   if (!query) {
     return query;
@@ -676,6 +776,18 @@ function noteCodexLensFtsCompatibility(error: string | undefined): boolean {
   return true;
 }
 
+function shouldSurfaceCodexLensFtsCompatibilityWarning(options: {
+  compatibilityTriggeredThisQuery: boolean;
+  skipExactDueToCompatibility: boolean;
+  ripgrepResultCount: number;
+}): boolean {
+  if (options.ripgrepResultCount > 0) {
+    return false;
+  }
+
+  return options.compatibilityTriggeredThisQuery || options.skipExactDueToCompatibility;
+}
+
 function summarizeBackendError(error: string | undefined): string {
   const cleanError = stripAnsi(error || '').trim();
   if (!cleanError) {
@@ -765,6 +877,61 @@ function hasCentralizedVectorArtifacts(indexRoot: unknown): boolean {
   ].every((artifactPath) => existsSync(artifactPath));
 }
 
+function asObjectRecord(value: unknown): Record<string, unknown> | undefined {
+  if (!value || typeof value !== 'object' || Array.isArray(value)) {
+    return undefined;
+  }
+  return value as Record<string, unknown>;
+}
+
+function asFiniteNumber(value: unknown): number | undefined {
+  if (typeof value !== 'number' || !Number.isFinite(value)) {
+    return undefined;
+  }
+  return value;
+}
+
+function asBoolean(value: unknown): boolean | undefined {
+  return typeof value === 'boolean' ? value : undefined;
+}
+
+function extractEmbeddingsStatusSummary(embeddingsData: unknown): {
+  coveragePercent: number;
+  totalChunks: number;
+  hasEmbeddings: boolean;
+} {
+  const embeddings = asObjectRecord(embeddingsData) ?? {};
+  const root = asObjectRecord(embeddings.root) ?? embeddings;
+  const centralized = asObjectRecord(embeddings.centralized);
+
+  const totalIndexes = asFiniteNumber(root.total_indexes)
+    ?? asFiniteNumber(embeddings.total_indexes)
+    ?? 0;
+  const indexesWithEmbeddings = asFiniteNumber(root.indexes_with_embeddings)
+    ?? asFiniteNumber(embeddings.indexes_with_embeddings)
+    ?? 0;
+  const totalChunks = asFiniteNumber(root.total_chunks)
+    ?? asFiniteNumber(embeddings.total_chunks)
+    ?? 0;
+  const coveragePercent = asFiniteNumber(root.coverage_percent)
+    ?? asFiniteNumber(embeddings.coverage_percent)
+    ?? (totalIndexes > 0 ? (indexesWithEmbeddings / totalIndexes) * 100 : 0);
+  const hasEmbeddings = asBoolean(root.has_embeddings)
+    ?? asBoolean(centralized?.usable)
+    ?? (totalChunks > 0 || indexesWithEmbeddings > 0 || coveragePercent > 0);
+
+  return {
+    coveragePercent,
+    totalChunks,
+    hasEmbeddings,
+  };
+}
+
+function selectEmbeddingsStatusPayload(statusData: unknown): Record<string, unknown> {
+  const status = asObjectRecord(statusData) ?? {};
+  return asObjectRecord(status.embeddings_status) ?? asObjectRecord(status.embeddings) ?? {};
+}
+
 function collectBackendError(
   errors: string[],
   backendName: string,
@@ -825,8 +992,77 @@ function formatSmartSearchCommand(action: string, pathValue: string, extraParams
   return `smart_search(${args.join(', ')})`;
 }
 
+function parseOptionalBooleanEnv(raw: string | undefined): boolean | undefined {
+  const normalized = raw?.trim().toLowerCase();
+  if (!normalized) {
+    return undefined;
+  }
+
+  if (['1', 'true', 'on', 'yes'].includes(normalized)) {
+    return true;
+  }
+
+  if (['0', 'false', 'off', 'no'].includes(normalized)) {
+    return false;
+  }
+
+  return undefined;
+}
+
 function isAutoEmbedMissingEnabled(config: CodexLensConfig | null | undefined): boolean {
-  return config?.embedding_auto_embed_missing !== false;
+  const envOverride = parseOptionalBooleanEnv(process.env.CODEXLENS_AUTO_EMBED_MISSING);
+  if (envOverride !== undefined) {
+    return envOverride;
+  }
+
+  if (process.platform === 'win32') {
+    return false;
+  }
+
+  if (typeof config?.embedding_auto_embed_missing === 'boolean') {
+    return config.embedding_auto_embed_missing;
+  }
+
+  return true;
+}
+
+function isAutoInitMissingEnabled(): boolean {
+  const envOverride = parseOptionalBooleanEnv(process.env.CODEXLENS_AUTO_INIT_MISSING);
+  if (envOverride !== undefined) {
+    return envOverride;
+  }
+
+  return process.platform !== 'win32';
+}
+
+function getAutoEmbedMissingDisabledReason(config: CodexLensConfig | null | undefined): string {
+  const envOverride = parseOptionalBooleanEnv(process.env.CODEXLENS_AUTO_EMBED_MISSING);
+  if (envOverride === false) {
+    return 'Automatic embedding warmup is disabled by CODEXLENS_AUTO_EMBED_MISSING=false.';
+  }
+
+  if (config?.embedding_auto_embed_missing === false) {
+    return 'Automatic embedding warmup is disabled by embedding.auto_embed_missing=false.';
+  }
+
+  if (process.platform === 'win32') {
+    return 'Automatic embedding warmup is disabled by default on Windows even if CodexLens config resolves auto_embed_missing=true. Set CODEXLENS_AUTO_EMBED_MISSING=true to opt in.';
+  }
+
+  return 'Automatic embedding warmup is disabled.';
+}
+
+function getAutoInitMissingDisabledReason(): string {
+  const envOverride = parseOptionalBooleanEnv(process.env.CODEXLENS_AUTO_INIT_MISSING);
+  if (envOverride === false) {
+    return 'Automatic static index warmup is disabled by CODEXLENS_AUTO_INIT_MISSING=false.';
+  }
+
+  if (process.platform === 'win32') {
+    return 'Automatic static index warmup is disabled by default on Windows. Set CODEXLENS_AUTO_INIT_MISSING=true to opt in.';
+  }
+
+  return 'Automatic static index warmup is disabled.';
 }
 
 function buildIndexSuggestions(indexStatus: IndexStatus, scope: SearchScope): SearchSuggestion[] | undefined {
@@ -930,29 +1166,24 @@ async function checkIndexStatus(path: string = '.'): Promise<IndexStatus> {
|
|||||||
const status = parsed.result || parsed;
|
const status = parsed.result || parsed;
|
||||||
|
|
||||||
// Get embeddings coverage from comprehensive status
|
// Get embeddings coverage from comprehensive status
|
||||||
const embeddingsData = status.embeddings || {};
|
const embeddingsData = selectEmbeddingsStatusPayload(status);
|
||||||
const totalIndexes = Number(embeddingsData.total_indexes || 0);
|
const legacyEmbeddingsData = asObjectRecord(status.embeddings) ?? {};
|
||||||
const indexesWithEmbeddings = Number(embeddingsData.indexes_with_embeddings || 0);
|
const embeddingsSummary = extractEmbeddingsStatusSummary(embeddingsData);
|
||||||
const totalChunks = Number(embeddingsData.total_chunks || 0);
|
const totalIndexes = Number(legacyEmbeddingsData.total_indexes || asObjectRecord(embeddingsData)?.total_indexes || 0);
|
||||||
const hasCentralizedVectors = hasCentralizedVectorArtifacts(status.index_root);
|
const embeddingsCoverage = embeddingsSummary.coveragePercent;
|
||||||
let embeddingsCoverage = typeof embeddingsData.coverage_percent === 'number'
|
const totalChunks = embeddingsSummary.totalChunks;
|
||||||
? embeddingsData.coverage_percent
|
|
||||||
: (totalIndexes > 0 ? (indexesWithEmbeddings / totalIndexes) * 100 : 0);
|
|
||||||
if (hasCentralizedVectors) {
|
|
||||||
embeddingsCoverage = Math.max(embeddingsCoverage, 100);
|
|
||||||
}
|
|
||||||
const indexed = Boolean(status.projects_count > 0 || status.total_files > 0 || status.index_root || totalIndexes > 0 || totalChunks > 0);
|
const indexed = Boolean(status.projects_count > 0 || status.total_files > 0 || status.index_root || totalIndexes > 0 || totalChunks > 0);
|
||||||
const has_embeddings = indexesWithEmbeddings > 0 || embeddingsCoverage > 0 || totalChunks > 0 || hasCentralizedVectors;
|
const has_embeddings = embeddingsSummary.hasEmbeddings;
|
||||||
|
|
||||||
// Extract model info if available
|
// Extract model info if available
|
||||||
const modelInfoData = embeddingsData.model_info;
|
const modelInfoData = asObjectRecord(embeddingsData.model_info);
|
||||||
const modelInfo: ModelInfo | undefined = modelInfoData ? {
|
const modelInfo: ModelInfo | undefined = modelInfoData ? {
|
||||||
model_profile: modelInfoData.model_profile,
|
model_profile: typeof modelInfoData.model_profile === 'string' ? modelInfoData.model_profile : undefined,
|
||||||
model_name: modelInfoData.model_name,
|
model_name: typeof modelInfoData.model_name === 'string' ? modelInfoData.model_name : undefined,
|
||||||
embedding_dim: modelInfoData.embedding_dim,
|
embedding_dim: typeof modelInfoData.embedding_dim === 'number' ? modelInfoData.embedding_dim : undefined,
|
||||||
backend: modelInfoData.backend,
|
backend: typeof modelInfoData.backend === 'string' ? modelInfoData.backend : undefined,
|
||||||
created_at: modelInfoData.created_at,
|
created_at: typeof modelInfoData.created_at === 'string' ? modelInfoData.created_at : undefined,
|
||||||
updated_at: modelInfoData.updated_at,
|
updated_at: typeof modelInfoData.updated_at === 'string' ? modelInfoData.updated_at : undefined,
|
||||||
} : undefined;
|
} : undefined;
|
||||||
|
|
||||||
let warning: string | undefined;
|
let warning: string | undefined;
|
||||||
@@ -1039,6 +1270,39 @@ function looksLikeCodeQuery(query: string): boolean {
|
|||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function queryTargetsGeneratedFiles(query: string): boolean {
|
||||||
|
return GENERATED_QUERY_RE.test(query.trim());
|
||||||
|
}
|
||||||
|
|
||||||
|
function prefersLexicalPriorityQuery(query: string): boolean {
|
||||||
|
const trimmed = query.trim();
|
||||||
|
if (!trimmed) return false;
|
||||||
|
if (ENV_STYLE_QUERY_RE.test(trimmed)) return true;
|
||||||
|
|
||||||
|
const tokens = new Set((trimmed.match(TOPIC_TOKEN_RE) ?? []).map((token) => token.toLowerCase()));
|
||||||
|
if (tokens.size === 0) return false;
|
||||||
|
if (tokens.has('factory') || tokens.has('factories')) return true;
|
||||||
|
if ((tokens.has('environment') || tokens.has('env')) && (tokens.has('variable') || tokens.has('variables'))) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
if (
|
||||||
|
tokens.has('backend') &&
|
||||||
|
['embedding', 'embeddings', 'reranker', 'rerankers', 'onnx', 'api', 'litellm', 'fastembed', 'local', 'legacy']
|
||||||
|
.some((token) => tokens.has(token))
|
||||||
|
) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
let surfaceHit = false;
|
||||||
|
let focusHit = false;
|
||||||
|
for (const token of tokens) {
|
||||||
|
if (LEXICAL_PRIORITY_SURFACE_TOKENS.has(token)) surfaceHit = true;
|
||||||
|
if (LEXICAL_PRIORITY_FOCUS_TOKENS.has(token)) focusHit = true;
|
||||||
|
if (surfaceHit && focusHit) return true;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Classify query intent and recommend search mode
|
* Classify query intent and recommend search mode
|
||||||
* Simple mapping: hybrid (NL + index + embeddings) | exact (index or insufficient embeddings) | ripgrep (no index)
|
* Simple mapping: hybrid (NL + index + embeddings) | exact (index or insufficient embeddings) | ripgrep (no index)
|
@@ -1051,6 +1315,8 @@ function classifyIntent(query: string, hasIndex: boolean = false, hasSufficientE
   const isNaturalLanguage = detectNaturalLanguage(query);
   const isCodeQuery = looksLikeCodeQuery(query);
   const isRegexPattern = detectRegex(query);
+  const targetsGeneratedFiles = queryTargetsGeneratedFiles(query);
+  const prefersLexicalPriority = prefersLexicalPriorityQuery(query);
 
   let mode: string;
   let confidence: number;
@@ -1058,9 +1324,9 @@ function classifyIntent(query: string, hasIndex: boolean = false, hasSufficientE
|
|||||||
if (!hasIndex) {
|
if (!hasIndex) {
|
||||||
mode = 'ripgrep';
|
mode = 'ripgrep';
|
||||||
confidence = 1.0;
|
confidence = 1.0;
|
||||||
} else if (isCodeQuery || isRegexPattern) {
|
} else if (targetsGeneratedFiles || prefersLexicalPriority || isCodeQuery || isRegexPattern) {
|
||||||
mode = 'exact';
|
mode = 'exact';
|
||||||
confidence = 0.95;
|
confidence = targetsGeneratedFiles ? 0.97 : prefersLexicalPriority ? 0.93 : 0.95;
|
||||||
} else if (isNaturalLanguage && hasSufficientEmbeddings) {
|
} else if (isNaturalLanguage && hasSufficientEmbeddings) {
|
||||||
mode = 'hybrid';
|
mode = 'hybrid';
|
||||||
confidence = 0.9;
|
confidence = 0.9;
|
||||||
@@ -1075,6 +1341,8 @@ function classifyIntent(query: string, hasIndex: boolean = false, hasSufficientE
|
|||||||
if (detectNaturalLanguage(query)) detectedPatterns.push('natural language');
|
if (detectNaturalLanguage(query)) detectedPatterns.push('natural language');
|
||||||
if (detectFilePath(query)) detectedPatterns.push('file path');
|
if (detectFilePath(query)) detectedPatterns.push('file path');
|
||||||
if (detectRelationship(query)) detectedPatterns.push('relationship');
|
if (detectRelationship(query)) detectedPatterns.push('relationship');
|
||||||
|
if (targetsGeneratedFiles) detectedPatterns.push('generated artifact');
|
||||||
|
if (prefersLexicalPriority) detectedPatterns.push('lexical priority');
|
||||||
if (isCodeQuery) detectedPatterns.push('code identifier');
|
if (isCodeQuery) detectedPatterns.push('code identifier');
|
||||||
|
|
||||||
const reasoning = `Query classified as ${mode} (confidence: ${confidence.toFixed(2)}, detected: ${detectedPatterns.join(', ')}, index: ${hasIndex ? 'available' : 'not available'}, embeddings: ${hasSufficientEmbeddings ? 'sufficient' : 'insufficient'})`;
|
const reasoning = `Query classified as ${mode} (confidence: ${confidence.toFixed(2)}, detected: ${detectedPatterns.join(', ')}, index: ${hasIndex ? 'available' : 'not available'}, embeddings: ${hasSufficientEmbeddings ? 'sufficient' : 'insufficient'})`;
|
||||||
@@ -1087,12 +1355,21 @@ function classifyIntent(query: string, hasIndex: boolean = false, hasSufficientE
|
|||||||
* @param toolName - Tool executable name
|
* @param toolName - Tool executable name
|
||||||
* @returns True if available
|
* @returns True if available
|
||||||
*/
|
*/
|
||||||
function checkToolAvailability(toolName: string): boolean {
|
function checkToolAvailability(
|
||||||
|
toolName: string,
|
||||||
|
lookupRuntime: typeof spawnSync = spawnSync,
|
||||||
|
): boolean {
|
||||||
try {
|
try {
|
||||||
const isWindows = process.platform === 'win32';
|
const isWindows = process.platform === 'win32';
|
||||||
const command = isWindows ? 'where' : 'which';
|
const command = isWindows ? 'where' : 'which';
|
||||||
execSync(`${command} ${toolName}`, { stdio: 'ignore', timeout: EXEC_TIMEOUTS.SYSTEM_INFO });
|
const result = lookupRuntime(command, [toolName], {
|
||||||
return true;
|
shell: false,
|
||||||
|
windowsHide: true,
|
||||||
|
stdio: 'ignore',
|
||||||
|
timeout: EXEC_TIMEOUTS.SYSTEM_INFO,
|
||||||
|
env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
|
||||||
|
});
|
||||||
|
return !result.error && result.status === 0;
|
||||||
} catch {
|
} catch {
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
@@ -1330,6 +1607,23 @@ function normalizeEmbeddingBackend(backend?: string): string | undefined {
|
|||||||
return normalized;
|
return normalized;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function buildIndexInitArgs(projectPath: string, options: { force?: boolean; languages?: string[]; noEmbeddings?: boolean } = {}): string[] {
|
||||||
|
const { force = false, languages, noEmbeddings = true } = options;
|
||||||
|
const args = ['index', 'init', projectPath];
|
||||||
|
|
||||||
|
if (noEmbeddings) {
|
||||||
|
args.push('--no-embeddings');
|
||||||
|
}
|
||||||
|
if (force) {
|
||||||
|
args.push('--force');
|
||||||
|
}
|
||||||
|
if (languages && languages.length > 0) {
|
||||||
|
args.push(...languages.flatMap((language) => ['--language', language]));
|
||||||
|
}
|
||||||
|
|
||||||
|
return args;
|
||||||
|
}
|
||||||
|
|
||||||
function resolveEmbeddingSelection(
|
function resolveEmbeddingSelection(
|
||||||
requestedBackend: string | undefined,
|
requestedBackend: string | undefined,
|
||||||
requestedModel: string | undefined,
|
requestedModel: string | undefined,
|
||||||
@@ -1502,17 +1796,17 @@ function spawnBackgroundEmbeddingsViaPython(params: {
|
|||||||
}): { success: boolean; error?: string } {
|
}): { success: boolean; error?: string } {
|
||||||
const { projectPath, backend, model } = params;
|
const { projectPath, backend, model } = params;
|
||||||
try {
|
try {
|
||||||
const child = spawn(getVenvPythonPath(), ['-c', buildEmbeddingPythonCode(params)], {
|
const child = getSpawnRuntime()(
|
||||||
cwd: projectPath,
|
getVenvPythonPathRuntime()(),
|
||||||
shell: false,
|
['-c', buildEmbeddingPythonCode(params)],
|
||||||
detached: true,
|
buildSmartSearchSpawnOptions(projectPath, {
|
||||||
stdio: 'ignore',
|
detached: shouldDetachBackgroundSmartSearchProcess(),
|
||||||
windowsHide: true,
|
stdio: 'ignore',
|
||||||
env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
|
}),
|
||||||
});
|
);
|
||||||
|
|
||||||
autoEmbedJobs.set(projectPath, {
|
autoEmbedJobs.set(projectPath, {
|
||||||
startedAt: Date.now(),
|
startedAt: getNowRuntime(),
|
||||||
backend,
|
backend,
|
||||||
model,
|
model,
|
||||||
});
|
});
|
||||||
@@ -1532,6 +1826,84 @@ function spawnBackgroundEmbeddingsViaPython(params: {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function spawnBackgroundIndexInit(params: {
|
||||||
|
projectPath: string;
|
||||||
|
languages?: string[];
|
||||||
|
}): { success: boolean; error?: string } {
|
||||||
|
const { projectPath, languages } = params;
|
||||||
|
try {
|
||||||
|
const pythonPath = getVenvPythonPathRuntime()();
|
||||||
|
if (!existsSync(pythonPath)) {
|
||||||
|
return {
|
||||||
|
success: false,
|
||||||
|
error: 'CodexLens Python environment is not ready yet.',
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
const child = getSpawnRuntime()(
|
||||||
|
pythonPath,
|
||||||
|
['-m', 'codexlens', ...buildIndexInitArgs(projectPath, { languages })],
|
||||||
|
buildSmartSearchSpawnOptions(projectPath, {
|
||||||
|
detached: shouldDetachBackgroundSmartSearchProcess(),
|
||||||
|
stdio: 'ignore',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
|
autoInitJobs.set(projectPath, {
|
||||||
|
startedAt: getNowRuntime(),
|
||||||
|
languages,
|
||||||
|
});
|
||||||
|
|
||||||
|
const cleanup = () => {
|
||||||
|
autoInitJobs.delete(projectPath);
|
||||||
|
};
|
||||||
|
child.on('error', cleanup);
|
||||||
|
child.on('close', cleanup);
|
||||||
|
child.unref();
|
||||||
|
return { success: true };
|
||||||
|
} catch (error) {
|
||||||
|
return {
|
||||||
|
success: false,
|
||||||
|
error: error instanceof Error ? error.message : String(error),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function maybeStartBackgroundAutoInit(
|
||||||
|
scope: SearchScope,
|
||||||
|
indexStatus: IndexStatus,
|
||||||
|
): Promise<{ note?: string; warning?: string }> {
|
||||||
|
if (indexStatus.indexed) {
|
||||||
|
return {};
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!isAutoInitMissingEnabled()) {
|
||||||
|
return {
|
||||||
|
note: getAutoInitMissingDisabledReason(),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
if (autoInitJobs.has(scope.workingDirectory)) {
|
||||||
|
return {
|
||||||
|
note: 'Background static index build is already running for this path.',
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
const spawned = spawnBackgroundIndexInit({
|
||||||
|
projectPath: scope.workingDirectory,
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!spawned.success) {
|
||||||
|
return {
|
||||||
|
warning: `Automatic static index warmup could not start: ${spawned.error}`,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
note: 'Background static index build started for this path. Re-run search shortly for indexed FTS results.',
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
async function maybeStartBackgroundAutoEmbed(
|
async function maybeStartBackgroundAutoEmbed(
|
||||||
scope: SearchScope,
|
scope: SearchScope,
|
||||||
indexStatus: IndexStatus,
|
indexStatus: IndexStatus,
|
||||||
@@ -1542,7 +1914,7 @@ async function maybeStartBackgroundAutoEmbed(
|
|||||||
|
|
||||||
if (!isAutoEmbedMissingEnabled(indexStatus.config)) {
|
if (!isAutoEmbedMissingEnabled(indexStatus.config)) {
|
||||||
return {
|
return {
|
||||||
note: 'Automatic embedding warmup is disabled by CODEXLENS_AUTO_EMBED_MISSING=false.',
|
note: getAutoEmbedMissingDisabledReason(indexStatus.config),
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1554,7 +1926,7 @@ async function maybeStartBackgroundAutoEmbed(
|
|||||||
|
|
||||||
const backend = normalizeEmbeddingBackend(indexStatus.config?.embedding_backend) ?? 'fastembed';
|
const backend = normalizeEmbeddingBackend(indexStatus.config?.embedding_backend) ?? 'fastembed';
|
||||||
const model = indexStatus.config?.embedding_model?.trim() || undefined;
|
const model = indexStatus.config?.embedding_model?.trim() || undefined;
|
||||||
const semanticStatus = await checkSemanticStatus();
|
const semanticStatus = await getSemanticStatusRuntime()();
|
||||||
if (!semanticStatus.available) {
|
if (!semanticStatus.available) {
|
||||||
return {
|
return {
|
||||||
warning: 'Automatic embedding warmup skipped because semantic dependencies are not ready.',
|
warning: 'Automatic embedding warmup skipped because semantic dependencies are not ready.',
|
||||||
@@ -1604,18 +1976,19 @@ async function executeEmbeddingsViaPython(params: {
|
|||||||
const pythonCode = buildEmbeddingPythonCode(params);
|
const pythonCode = buildEmbeddingPythonCode(params);
|
||||||
|
|
||||||
return await new Promise((resolve) => {
|
return await new Promise((resolve) => {
|
||||||
const child = spawn(getVenvPythonPath(), ['-c', pythonCode], {
|
const child = getSpawnRuntime()(
|
||||||
cwd: projectPath,
|
getVenvPythonPathRuntime()(),
|
||||||
shell: false,
|
['-c', pythonCode],
|
||||||
timeout: 1800000,
|
buildSmartSearchSpawnOptions(projectPath, {
|
||||||
env: { ...process.env, PYTHONIOENCODING: 'utf-8' },
|
timeout: 1800000,
|
||||||
});
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stdout = '';
|
let stdout = '';
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
const progressMessages: string[] = [];
|
const progressMessages: string[] = [];
|
||||||
|
|
||||||
child.stdout.on('data', (data: Buffer) => {
|
child.stdout?.on('data', (data: Buffer) => {
|
||||||
const chunk = data.toString();
|
const chunk = data.toString();
|
||||||
stdout += chunk;
|
stdout += chunk;
|
||||||
for (const line of chunk.split(/\r?\n/)) {
|
for (const line of chunk.split(/\r?\n/)) {
|
||||||
@@ -1625,7 +1998,7 @@ async function executeEmbeddingsViaPython(params: {
|
|||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
child.stderr.on('data', (data: Buffer) => {
|
child.stderr?.on('data', (data: Buffer) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -1683,13 +2056,7 @@ async function executeInitAction(params: Params, force: boolean = false): Promis
|
|||||||
|
|
||||||
// Build args with --no-embeddings for FTS-only index (faster)
|
// Build args with --no-embeddings for FTS-only index (faster)
|
||||||
// Use 'index init' subcommand (new CLI structure)
|
// Use 'index init' subcommand (new CLI structure)
|
||||||
const args = ['index', 'init', scope.workingDirectory, '--no-embeddings'];
|
const args = buildIndexInitArgs(scope.workingDirectory, { force, languages });
|
||||||
if (force) {
|
|
||||||
args.push('--force'); // Force full rebuild
|
|
||||||
}
|
|
||||||
if (languages && languages.length > 0) {
|
|
||||||
args.push(...languages.flatMap((language) => ['--language', language]));
|
|
||||||
}
|
|
||||||
|
|
||||||
// Track progress updates
|
// Track progress updates
|
||||||
const progressUpdates: ProgressInfo[] = [];
|
const progressUpdates: ProgressInfo[] = [];
|
||||||
@@ -1805,8 +2172,12 @@ async function executeEmbedAction(params: Params): Promise<SearchResult> {
|
|||||||
api_max_workers: normalizedBackend === 'litellm' ? effectiveApiMaxWorkers : undefined,
|
api_max_workers: normalizedBackend === 'litellm' ? effectiveApiMaxWorkers : undefined,
|
||||||
endpoint_count: endpoints.length,
|
endpoint_count: endpoints.length,
|
||||||
use_gpu: true,
|
use_gpu: true,
|
||||||
|
reranker_enabled: currentStatus.config?.reranker_enabled,
|
||||||
|
reranker_backend: currentStatus.config?.reranker_backend,
|
||||||
|
reranker_model: currentStatus.config?.reranker_model,
|
||||||
cascade_strategy: currentStatus.config?.cascade_strategy,
|
cascade_strategy: currentStatus.config?.cascade_strategy,
|
||||||
staged_stage2_mode: currentStatus.config?.staged_stage2_mode,
|
staged_stage2_mode: currentStatus.config?.staged_stage2_mode,
|
||||||
|
static_graph_enabled: currentStatus.config?.static_graph_enabled,
|
||||||
note: [embeddingSelection.note, progressMessage].filter(Boolean).join(' | ') || undefined,
|
note: [embeddingSelection.note, progressMessage].filter(Boolean).join(' | ') || undefined,
|
||||||
preset: embeddingSelection.preset,
|
preset: embeddingSelection.preset,
|
||||||
},
|
},
|
||||||
@@ -1856,6 +2227,9 @@ async function executeStatusAction(params: Params): Promise<SearchResult> {
|
|||||||
if (cfg.staged_stage2_mode) {
|
if (cfg.staged_stage2_mode) {
|
||||||
statusParts.push(`Stage2: ${cfg.staged_stage2_mode}`);
|
statusParts.push(`Stage2: ${cfg.staged_stage2_mode}`);
|
||||||
}
|
}
|
||||||
|
if (typeof cfg.static_graph_enabled === 'boolean') {
|
||||||
|
statusParts.push(`Static Graph: ${cfg.static_graph_enabled ? 'on' : 'off'}`);
|
||||||
|
}
|
||||||
|
|
||||||
// Reranker info
|
// Reranker info
|
||||||
if (cfg.reranker_enabled) {
|
if (cfg.reranker_enabled) {
|
||||||
@@ -1874,6 +2248,12 @@ async function executeStatusAction(params: Params): Promise<SearchResult> {
|
|||||||
action: 'status',
|
action: 'status',
|
||||||
path: scope.workingDirectory,
|
path: scope.workingDirectory,
|
||||||
warning: indexStatus.warning,
|
warning: indexStatus.warning,
|
||||||
|
reranker_enabled: indexStatus.config?.reranker_enabled,
|
||||||
|
reranker_backend: indexStatus.config?.reranker_backend,
|
||||||
|
reranker_model: indexStatus.config?.reranker_model,
|
||||||
|
cascade_strategy: indexStatus.config?.cascade_strategy,
|
||||||
|
staged_stage2_mode: indexStatus.config?.staged_stage2_mode,
|
||||||
|
static_graph_enabled: indexStatus.config?.static_graph_enabled,
|
||||||
suggestions: buildIndexSuggestions(indexStatus, scope),
|
suggestions: buildIndexSuggestions(indexStatus, scope),
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
@@ -2026,6 +2406,7 @@ async function executeFuzzyMode(params: Params): Promise<SearchResult> {
|
|||||||
const ftsWasBroken = codexLensFtsBackendBroken;
|
const ftsWasBroken = codexLensFtsBackendBroken;
|
||||||
const ripgrepQueryMode = resolveRipgrepQueryMode(query, regex, tokenize);
|
const ripgrepQueryMode = resolveRipgrepQueryMode(query, regex, tokenize);
|
||||||
const fuzzyWarnings: string[] = [];
|
const fuzzyWarnings: string[] = [];
|
||||||
|
const skipExactDueToCompatibility = ftsWasBroken && !ripgrepQueryMode.literalFallback;
|
||||||
|
|
||||||
let skipExactReason: string | undefined;
|
let skipExactReason: string | undefined;
|
||||||
if (ripgrepQueryMode.literalFallback) {
|
if (ripgrepQueryMode.literalFallback) {
|
||||||
@@ -2043,10 +2424,7 @@ async function executeFuzzyMode(params: Params): Promise<SearchResult> {
|
|||||||
]);
|
]);
|
||||||
timer.mark('parallel_search');
|
timer.mark('parallel_search');
|
||||||
|
|
||||||
if (!skipExactReason && !ftsWasBroken && codexLensFtsBackendBroken) {
|
if (skipExactReason && !skipExactDueToCompatibility) {
|
||||||
fuzzyWarnings.push('CodexLens FTS backend is incompatible with the current CLI runtime. Falling back to ripgrep results.');
|
|
||||||
}
|
|
||||||
if (skipExactReason) {
|
|
||||||
fuzzyWarnings.push(skipExactReason);
|
fuzzyWarnings.push(skipExactReason);
|
||||||
}
|
}
|
||||||
if (ripgrepResult.status === 'fulfilled' && ripgrepResult.value.metadata?.warning) {
|
if (ripgrepResult.status === 'fulfilled' && ripgrepResult.value.metadata?.warning) {
|
||||||
@@ -2070,6 +2448,16 @@ async function executeFuzzyMode(params: Params): Promise<SearchResult> {
|
|||||||
resultsMap.set('ripgrep', ripgrepResult.value.results as any[]);
|
resultsMap.set('ripgrep', ripgrepResult.value.results as any[]);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const ripgrepResultCount = (resultsMap.get('ripgrep') ?? []).length;
|
||||||
|
const compatibilityTriggeredThisQuery = !skipExactReason && !ftsWasBroken && codexLensFtsBackendBroken;
|
||||||
|
if (shouldSurfaceCodexLensFtsCompatibilityWarning({
|
||||||
|
compatibilityTriggeredThisQuery,
|
||||||
|
skipExactDueToCompatibility,
|
||||||
|
ripgrepResultCount,
|
||||||
|
})) {
|
||||||
|
fuzzyWarnings.push('CodexLens FTS backend is incompatible with the current CLI runtime. Falling back to ripgrep results.');
|
||||||
|
}
|
||||||
|
|
||||||
// If both failed, return error
|
// If both failed, return error
|
||||||
if (resultsMap.size === 0) {
|
if (resultsMap.size === 0) {
|
||||||
const errors: string[] = [];
|
const errors: string[] = [];
|
||||||
@@ -2286,20 +2674,23 @@ async function executeRipgrepMode(params: Params): Promise<SearchResult> {
|
|||||||
});
|
});
|
||||||
|
|
||||||
return new Promise((resolve) => {
|
return new Promise((resolve) => {
|
||||||
const child = spawn(command, args, {
|
const child = getSpawnRuntime()(
|
||||||
cwd: scope.workingDirectory || getProjectRoot(),
|
command,
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
args,
|
||||||
});
|
buildSmartSearchSpawnOptions(scope.workingDirectory || getProjectRoot(), {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stdout = '';
|
let stdout = '';
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
let resultLimitReached = false;
|
let resultLimitReached = false;
|
||||||
|
|
||||||
child.stdout.on('data', (data) => {
|
child.stdout?.on('data', (data) => {
|
||||||
stdout += data.toString();
|
stdout += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
child.stderr.on('data', (data) => {
|
child.stderr?.on('data', (data) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -3484,19 +3875,22 @@ async function executeFindFilesAction(params: Params): Promise<SearchResult> {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
const child = spawn('rg', args, {
|
const child = getSpawnRuntime()(
|
||||||
cwd: scope.workingDirectory || getProjectRoot(),
|
'rg',
|
||||||
stdio: ['ignore', 'pipe', 'pipe'],
|
args,
|
||||||
});
|
buildSmartSearchSpawnOptions(scope.workingDirectory || getProjectRoot(), {
|
||||||
|
stdio: ['ignore', 'pipe', 'pipe'],
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
|
||||||
let stdout = '';
|
let stdout = '';
|
||||||
let stderr = '';
|
let stderr = '';
|
||||||
|
|
||||||
child.stdout.on('data', (data) => {
|
child.stdout?.on('data', (data) => {
|
||||||
stdout += data.toString();
|
stdout += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
child.stderr.on('data', (data) => {
|
child.stderr?.on('data', (data) => {
|
||||||
stderr += data.toString();
|
stderr += data.toString();
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -3800,6 +4194,12 @@ function enrichMetadataWithIndexStatus(
|
|||||||
nextMetadata.index_status = indexStatus.indexed
|
nextMetadata.index_status = indexStatus.indexed
|
||||||
? (indexStatus.has_embeddings ? 'indexed' : 'partial')
|
? (indexStatus.has_embeddings ? 'indexed' : 'partial')
|
||||||
: 'not_indexed';
|
: 'not_indexed';
|
||||||
|
nextMetadata.reranker_enabled = indexStatus.config?.reranker_enabled;
|
||||||
|
nextMetadata.reranker_backend = indexStatus.config?.reranker_backend;
|
||||||
|
nextMetadata.reranker_model = indexStatus.config?.reranker_model;
|
||||||
|
nextMetadata.cascade_strategy = indexStatus.config?.cascade_strategy;
|
||||||
|
nextMetadata.staged_stage2_mode = indexStatus.config?.staged_stage2_mode;
|
||||||
|
nextMetadata.static_graph_enabled = indexStatus.config?.static_graph_enabled;
|
||||||
nextMetadata.warning = mergeWarnings(nextMetadata.warning, indexStatus.warning);
|
nextMetadata.warning = mergeWarnings(nextMetadata.warning, indexStatus.warning);
|
||||||
nextMetadata.suggestions = mergeSuggestions(nextMetadata.suggestions, buildIndexSuggestions(indexStatus, scope));
|
nextMetadata.suggestions = mergeSuggestions(nextMetadata.suggestions, buildIndexSuggestions(indexStatus, scope));
|
||||||
return nextMetadata;
|
return nextMetadata;
|
||||||
@@ -3890,7 +4290,7 @@ export async function handler(params: Record<string, unknown>): Promise<ToolResu
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
let autoEmbedNote: string | undefined;
|
let backgroundNote: string | undefined;
|
||||||
|
|
||||||
// Transform output based on output_mode (for search actions only)
|
// Transform output based on output_mode (for search actions only)
|
||||||
if (action === 'search' || action === 'search_files') {
|
if (action === 'search' || action === 'search_files') {
|
||||||
@@ -3898,12 +4298,13 @@ export async function handler(params: Record<string, unknown>): Promise<ToolResu
|
|||||||
const indexStatus = await checkIndexStatus(scope.workingDirectory);
|
const indexStatus = await checkIndexStatus(scope.workingDirectory);
|
||||||
result.metadata = enrichMetadataWithIndexStatus(result.metadata, indexStatus, scope);
|
result.metadata = enrichMetadataWithIndexStatus(result.metadata, indexStatus, scope);
|
||||||
|
|
||||||
|
const autoInitStatus = await maybeStartBackgroundAutoInit(scope, indexStatus);
|
||||||
const autoEmbedStatus = await maybeStartBackgroundAutoEmbed(scope, indexStatus);
|
const autoEmbedStatus = await maybeStartBackgroundAutoEmbed(scope, indexStatus);
|
||||||
autoEmbedNote = autoEmbedStatus.note;
|
backgroundNote = mergeNotes(autoInitStatus.note, autoEmbedStatus.note);
|
||||||
result.metadata = {
|
result.metadata = {
|
||||||
...(result.metadata ?? {}),
|
...(result.metadata ?? {}),
|
||||||
note: mergeNotes(result.metadata?.note, autoEmbedStatus.note),
|
note: mergeNotes(result.metadata?.note, autoInitStatus.note, autoEmbedStatus.note),
|
||||||
warning: mergeWarnings(result.metadata?.warning, autoEmbedStatus.warning),
|
warning: mergeWarnings(result.metadata?.warning, autoInitStatus.warning, autoEmbedStatus.warning),
|
||||||
};
|
};
|
||||||
|
|
||||||
// Add pagination metadata for search results if not already present
|
// Add pagination metadata for search results if not already present
|
||||||
@@ -3935,8 +4336,8 @@ export async function handler(params: Record<string, unknown>): Promise<ToolResu
|
|||||||
if (result.metadata?.warning) {
|
if (result.metadata?.warning) {
|
||||||
advisoryLines.push('', 'Warnings:', `- ${result.metadata.warning}`);
|
advisoryLines.push('', 'Warnings:', `- ${result.metadata.warning}`);
|
||||||
}
|
}
|
||||||
if (autoEmbedNote) {
|
if (backgroundNote) {
|
||||||
advisoryLines.push('', 'Notes:', `- ${autoEmbedNote}`);
|
advisoryLines.push('', 'Notes:', `- ${backgroundNote}`);
|
||||||
}
|
}
|
||||||
if (result.metadata?.suggestions && result.metadata.suggestions.length > 0) {
|
if (result.metadata?.suggestions && result.metadata.suggestions.length > 0) {
|
||||||
advisoryLines.push('', 'Suggestions:');
|
advisoryLines.push('', 'Suggestions:');
|
||||||
@@ -3972,13 +4373,40 @@ export async function handler(params: Record<string, unknown>): Promise<ToolResu
|
|||||||
*/
|
*/
|
||||||
export const __testables = {
|
export const __testables = {
|
||||||
isCodexLensCliCompatibilityError,
|
isCodexLensCliCompatibilityError,
|
||||||
|
shouldSurfaceCodexLensFtsCompatibilityWarning,
|
||||||
|
buildSmartSearchSpawnOptions,
|
||||||
|
shouldDetachBackgroundSmartSearchProcess,
|
||||||
|
checkToolAvailability,
|
||||||
parseCodexLensJsonOutput,
|
parseCodexLensJsonOutput,
|
||||||
parsePlainTextFileMatches,
|
parsePlainTextFileMatches,
|
||||||
hasCentralizedVectorArtifacts,
|
hasCentralizedVectorArtifacts,
|
||||||
|
extractEmbeddingsStatusSummary,
|
||||||
|
selectEmbeddingsStatusPayload,
|
||||||
resolveRipgrepQueryMode,
|
resolveRipgrepQueryMode,
|
||||||
|
queryTargetsGeneratedFiles,
|
||||||
|
prefersLexicalPriorityQuery,
|
||||||
|
classifyIntent,
|
||||||
resolveEmbeddingSelection,
|
resolveEmbeddingSelection,
|
||||||
|
parseOptionalBooleanEnv,
|
||||||
|
isAutoInitMissingEnabled,
|
||||||
isAutoEmbedMissingEnabled,
|
isAutoEmbedMissingEnabled,
|
||||||
|
getAutoInitMissingDisabledReason,
|
||||||
|
getAutoEmbedMissingDisabledReason,
|
||||||
buildIndexSuggestions,
|
buildIndexSuggestions,
|
||||||
|
maybeStartBackgroundAutoInit,
|
||||||
|
maybeStartBackgroundAutoEmbed,
|
||||||
|
__setRuntimeOverrides(overrides: Partial<SmartSearchRuntimeOverrides>) {
|
||||||
|
Object.assign(runtimeOverrides, overrides);
|
||||||
|
},
|
||||||
|
__resetRuntimeOverrides() {
|
||||||
|
for (const key of Object.keys(runtimeOverrides) as Array<keyof SmartSearchRuntimeOverrides>) {
|
||||||
|
delete runtimeOverrides[key];
|
||||||
|
}
|
||||||
|
},
|
||||||
|
__resetBackgroundJobs() {
|
||||||
|
autoInitJobs.clear();
|
||||||
|
autoEmbedJobs.clear();
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
export async function executeInitWithProgress(
|
export async function executeInitWithProgress(
|
||||||
|
@@ -9,6 +9,7 @@
  * 2. Default: ~/.codexlens
  */
 
+import { existsSync } from 'fs';
 import { join } from 'path';
 import { homedir } from 'os';
 
@@ -47,6 +48,26 @@ export function getCodexLensPython(): string {
     : join(venvDir, 'bin', 'python');
 }
 
+/**
+ * Get the preferred Python executable for hidden/windowless CodexLens subprocesses.
+ * On Windows this prefers pythonw.exe when available to avoid transient console windows.
+ *
+ * @returns Path to the preferred hidden-subprocess Python executable
+ */
+export function getCodexLensHiddenPython(): string {
+  if (process.platform !== 'win32') {
+    return getCodexLensPython();
+  }
+
+  const venvDir = getCodexLensVenvDir();
+  const pythonwPath = join(venvDir, 'Scripts', 'pythonw.exe');
+  if (existsSync(pythonwPath)) {
+    return pythonwPath;
+  }
+
+  return getCodexLensPython();
+}
+
 /**
  * Get the pip executable path in the CodexLens venv.
  *
@@ -3,9 +3,19 @@
  * Shared module for consistent Python discovery across the application
  */
 
-import { execSync } from 'child_process';
+import { spawnSync, type SpawnSyncOptionsWithStringEncoding } from 'child_process';
 import { EXEC_TIMEOUTS } from './exec-constants.js';
 
+export interface PythonCommandSpec {
+  command: string;
+  args: string[];
+  display: string;
+}
+
+type HiddenPythonProbeOptions = Omit<SpawnSyncOptionsWithStringEncoding, 'encoding'> & {
+  encoding?: BufferEncoding;
+};
+
 function isExecTimeoutError(error: unknown): boolean {
   const err = error as { code?: unknown; errno?: unknown; message?: unknown } | null;
   const code = err?.code ?? err?.errno;
@@ -14,6 +24,98 @@ function isExecTimeoutError(error: unknown): boolean {
   return message.includes('ETIMEDOUT');
 }
 
+function quoteCommandPart(value: string): string {
+  if (!/[\s"]/.test(value)) {
+    return value;
+  }
+  return `"${value.replaceAll('"', '\\"')}"`;
+}
+
+function formatPythonCommandDisplay(command: string, args: string[]): string {
+  return [quoteCommandPart(command), ...args.map(quoteCommandPart)].join(' ');
+}
+
+function buildPythonCommandSpec(command: string, args: string[] = []): PythonCommandSpec {
+  return {
+    command,
+    args: [...args],
+    display: formatPythonCommandDisplay(command, args),
+  };
+}
+
+function tokenizeCommandSpec(raw: string): string[] {
+  const tokens: string[] = [];
+  const tokenPattern = /"((?:\\"|[^"])*)"|(\S+)/g;
+
+  for (const match of raw.matchAll(tokenPattern)) {
+    const quoted = match[1];
+    const plain = match[2];
+    if (quoted !== undefined) {
+      tokens.push(quoted.replaceAll('\\"', '"'));
+    } else if (plain !== undefined) {
+      tokens.push(plain);
+    }
+  }
+
+  return tokens;
+}
+
+export function parsePythonCommandSpec(raw: string): PythonCommandSpec {
+  const trimmed = raw.trim();
+  if (!trimmed) {
+    throw new Error('Python command cannot be empty');
+  }
+
+  // Unquoted executable paths on Windows commonly contain spaces.
+  if (!trimmed.includes('"') && /[\\/]/.test(trimmed)) {
+    return buildPythonCommandSpec(trimmed);
+  }
+
+  const tokens = tokenizeCommandSpec(trimmed);
+  if (tokens.length === 0) {
+    return buildPythonCommandSpec(trimmed);
+  }
+
+  return buildPythonCommandSpec(tokens[0], tokens.slice(1));
+}
+
+function buildPythonProbeOptions(
+  overrides: HiddenPythonProbeOptions = {},
+): SpawnSyncOptionsWithStringEncoding {
+  const { env, encoding, ...rest } = overrides;
+  return {
+    shell: false,
+    windowsHide: true,
+    timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
+    stdio: ['ignore', 'pipe', 'pipe'],
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
+    ...rest,
+    encoding: encoding ?? 'utf8',
+  };
+}
+
+export function probePythonCommandVersion(
+  pythonCommand: PythonCommandSpec,
+  runner: typeof spawnSync = spawnSync,
+): string {
+  const result = runner(
+    pythonCommand.command,
+    [...pythonCommand.args, '--version'],
+    buildPythonProbeOptions(),
+  );
+
+  if (result.error) {
+    throw result.error;
+  }
+
+  const versionOutput = `${result.stdout ?? ''}${result.stderr ?? ''}`.trim();
+  if (result.status !== 0) {
+    throw new Error(versionOutput || `Python version probe exited with code ${String(result.status)}`);
+  }
+
+  return versionOutput;
+}
+
 /**
  * Parse Python version string to major.minor numbers
  * @param versionStr - Version string like "Python 3.11.5"
@@ -42,66 +144,72 @@ export function isPythonVersionCompatible(major: number, minor: number): boolean
  * Detect available Python 3 executable
  * Supports CCW_PYTHON environment variable for custom Python path
  * On Windows, uses py launcher to find compatible versions
- * @returns Python executable command
+ * @returns Python executable command spec
  */
-export function getSystemPython(): string {
-  // Check for user-specified Python via environment variable
-  const customPython = process.env.CCW_PYTHON;
+export function getSystemPythonCommand(runner: typeof spawnSync = spawnSync): PythonCommandSpec {
+  const customPython = process.env.CCW_PYTHON?.trim();
   if (customPython) {
+    const customSpec = parsePythonCommandSpec(customPython);
     try {
-      const version = execSync(`"${customPython}" --version 2>&1`, { encoding: 'utf8', timeout: EXEC_TIMEOUTS.PYTHON_VERSION });
+      const version = probePythonCommandVersion(customSpec, runner);
       if (version.includes('Python 3')) {
         const parsed = parsePythonVersion(version);
         if (parsed && !isPythonVersionCompatible(parsed.major, parsed.minor)) {
-          console.warn(`[Python] Warning: CCW_PYTHON points to Python ${parsed.major}.${parsed.minor}, which may not be compatible with onnxruntime (requires 3.9-3.12)`);
+          console.warn(
+            `[Python] Warning: CCW_PYTHON points to Python ${parsed.major}.${parsed.minor}, which may not be compatible with onnxruntime (requires 3.9-3.12)`,
+          );
         }
-        return `"${customPython}"`;
+        return customSpec;
       }
     } catch (err: unknown) {
       if (isExecTimeoutError(err)) {
-        console.warn(`[Python] Warning: CCW_PYTHON version check timed out after ${EXEC_TIMEOUTS.PYTHON_VERSION}ms, falling back to system Python`);
+        console.warn(
+          `[Python] Warning: CCW_PYTHON version check timed out after ${EXEC_TIMEOUTS.PYTHON_VERSION}ms, falling back to system Python`,
+        );
      } else {
-        console.warn(`[Python] Warning: CCW_PYTHON="${customPython}" is not a valid Python executable, falling back to system Python`);
+        console.warn(
+          `[Python] Warning: CCW_PYTHON="${customPython}" is not a valid Python executable, falling back to system Python`,
+        );
      }
    }
  }
 
-  // On Windows, try py launcher with specific versions first (3.12, 3.11, 3.10, 3.9)
   if (process.platform === 'win32') {
     const compatibleVersions = ['3.12', '3.11', '3.10', '3.9'];
     for (const ver of compatibleVersions) {
+      const launcherSpec = buildPythonCommandSpec('py', [`-${ver}`]);
       try {
-        const version = execSync(`py -${ver} --version 2>&1`, { encoding: 'utf8', timeout: EXEC_TIMEOUTS.PYTHON_VERSION });
+        const version = probePythonCommandVersion(launcherSpec, runner);
         if (version.includes(`Python ${ver}`)) {
           console.log(`[Python] Found compatible Python ${ver} via py launcher`);
-          return `py -${ver}`;
+          return launcherSpec;
         }
       } catch (err: unknown) {
         if (isExecTimeoutError(err)) {
-          console.warn(`[Python] Warning: py -${ver} version check timed out after ${EXEC_TIMEOUTS.PYTHON_VERSION}ms`);
+          console.warn(
+            `[Python] Warning: py -${ver} version check timed out after ${EXEC_TIMEOUTS.PYTHON_VERSION}ms`,
+          );
        }
-        // Version not installed, try next
      }
    }
  }
 
   const commands = process.platform === 'win32' ? ['python', 'py', 'python3'] : ['python3', 'python'];
-  let fallbackCmd: string | null = null;
+  let fallbackCmd: PythonCommandSpec | null = null;
   let fallbackVersion: { major: number; minor: number } | null = null;
 
   for (const cmd of commands) {
+    const pythonSpec = buildPythonCommandSpec(cmd);
     try {
-      const version = execSync(`${cmd} --version 2>&1`, { encoding: 'utf8', timeout: EXEC_TIMEOUTS.PYTHON_VERSION });
+      const version = probePythonCommandVersion(pythonSpec, runner);
       if (version.includes('Python 3')) {
         const parsed = parsePythonVersion(version);
         if (parsed) {
-          // Prefer compatible version (3.9-3.12)
           if (isPythonVersionCompatible(parsed.major, parsed.minor)) {
-            return cmd;
+            return pythonSpec;
          }
-          // Keep track of first Python 3 found as fallback
           if (!fallbackCmd) {
-            fallbackCmd = cmd;
+            fallbackCmd = pythonSpec;
             fallbackVersion = parsed;
           }
         }
@@ -110,13 +218,14 @@ export function getSystemPython(): string {
       if (isExecTimeoutError(err)) {
         console.warn(`[Python] Warning: ${cmd} --version timed out after ${EXEC_TIMEOUTS.PYTHON_VERSION}ms`);
       }
-      // Try next command
     }
   }
 
-  // If no compatible version found, use fallback with warning
   if (fallbackCmd && fallbackVersion) {
-    console.warn(`[Python] Warning: Only Python ${fallbackVersion.major}.${fallbackVersion.minor} found, which may not be compatible with onnxruntime (requires 3.9-3.12).`);
+    console.warn(
+      `[Python] Warning: Only Python ${fallbackVersion.major}.${fallbackVersion.minor} found, which may not be compatible with onnxruntime (requires 3.9-3.12).`,
+    );
+    console.warn('[Python] Semantic search may fail with ImportError for onnxruntime.');
     console.warn('[Python] To use a specific Python version, set CCW_PYTHON environment variable:');
     console.warn('  Windows: set CCW_PYTHON=C:\\path\\to\\python.exe');
     console.warn('  Unix: export CCW_PYTHON=/path/to/python3.11');
@@ -124,7 +233,19 @@ export function getSystemPython(): string {
     return fallbackCmd;
   }
 
-  throw new Error('Python 3 not found. Please install Python 3.9-3.12 and ensure it is in PATH, or set CCW_PYTHON environment variable.');
+  throw new Error(
+    'Python 3 not found. Please install Python 3.9-3.12 and ensure it is in PATH, or set CCW_PYTHON environment variable.',
+  );
+}
+
+/**
+ * Detect available Python 3 executable
+ * Supports CCW_PYTHON environment variable for custom Python path
+ * On Windows, uses py launcher to find compatible versions
+ * @returns Python executable command
+ */
+export function getSystemPython(): string {
+  return getSystemPythonCommand().display;
 }
 
 /**
@@ -135,6 +256,14 @@ export function getPipCommand(): { pythonCmd: string; pipArgs: string[] } {
   const pythonCmd = getSystemPython();
   return {
     pythonCmd,
-    pipArgs: ['-m', 'pip']
+    pipArgs: ['-m', 'pip'],
   };
 }
+
+export const __testables = {
+  buildPythonCommandSpec,
+  buildPythonProbeOptions,
+  formatPythonCommandDisplay,
+  parsePythonCommandSpec,
+  probePythonCommandVersion,
+};
|||||||
@@ -9,7 +9,7 @@
  * - Support for local project installs with extras
  */
 
-import { execSync, spawn } from 'child_process';
+import { spawn, spawnSync, type SpawnOptions, type SpawnSyncOptionsWithStringEncoding } from 'child_process';
 import { existsSync, mkdirSync } from 'fs';
 import { join, dirname } from 'path';
 import { homedir, platform, arch } from 'os';
@@ -52,6 +52,74 @@ const UV_BINARY_NAME = IS_WINDOWS ? 'uv.exe' : 'uv';
 const VENV_BIN_DIR = IS_WINDOWS ? 'Scripts' : 'bin';
 const PYTHON_EXECUTABLE = IS_WINDOWS ? 'python.exe' : 'python';
 
+type HiddenUvSpawnSyncOptions = Omit<SpawnSyncOptionsWithStringEncoding, 'encoding'> & {
+  encoding?: BufferEncoding;
+};
+
+function buildUvSpawnOptions(overrides: SpawnOptions = {}): SpawnOptions {
+  const { env, ...rest } = overrides;
+  return {
+    shell: false,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
+    ...rest,
+  };
+}
+
+function buildUvSpawnSyncOptions(
+  overrides: HiddenUvSpawnSyncOptions = {},
+): SpawnSyncOptionsWithStringEncoding {
+  const { env, encoding, ...rest } = overrides;
+  return {
+    shell: false,
+    windowsHide: true,
+    env: { ...process.env, PYTHONIOENCODING: 'utf-8', ...env },
+    ...rest,
+    encoding: encoding ?? 'utf-8',
+  };
+}
+
+function findExecutableOnPath(executable: string, runner: typeof spawnSync = spawnSync): string | null {
+  const lookupCommand = IS_WINDOWS ? 'where' : 'which';
+  const result = runner(
+    lookupCommand,
+    [executable],
+    buildUvSpawnSyncOptions({
+      timeout: EXEC_TIMEOUTS.SYSTEM_INFO,
+      stdio: ['ignore', 'pipe', 'pipe'],
+    }),
+  );
+
+  if (result.error || result.status !== 0) {
+    return null;
+  }
+
+  const output = `${result.stdout ?? ''}`.trim();
+  if (!output) {
+    return null;
+  }
+
+  return output.split(/\r?\n/)[0] || null;
+}
+
+function hasWindowsPythonLauncherVersion(version: string, runner: typeof spawnSync = spawnSync): boolean {
+  const result = runner(
+    'py',
+    [`-${version}`, '--version'],
+    buildUvSpawnSyncOptions({
+      timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
+      stdio: ['ignore', 'pipe', 'pipe'],
+    }),
+  );
+
+  if (result.error || result.status !== 0) {
+    return false;
+  }
+
+  const output = `${result.stdout ?? ''}${result.stderr ?? ''}`;
+  return output.includes(`Python ${version}`);
+}
+
 /**
  * Get the path to the UV binary
  * Search order:
@@ -105,15 +173,9 @@ export function getUvBinaryPath(): string {
   }
 
   // 4. Try system PATH using 'which' or 'where'
-  try {
-    const cmd = IS_WINDOWS ? 'where uv' : 'which uv';
-    const result = execSync(cmd, { encoding: 'utf-8', timeout: EXEC_TIMEOUTS.SYSTEM_INFO, stdio: ['pipe', 'pipe', 'pipe'] });
-    const foundPath = result.trim().split('\n')[0];
-    if (foundPath && existsSync(foundPath)) {
-      return foundPath;
-    }
-  } catch {
-    // UV not found in PATH
+  const foundPath = findExecutableOnPath('uv');
+  if (foundPath && existsSync(foundPath)) {
+    return foundPath;
   }
 
   // Return default path (may not exist)
@@ -135,10 +197,10 @@ export async function isUvAvailable(): Promise<boolean> {
   }
 
   return new Promise((resolve) => {
-    const child = spawn(uvPath, ['--version'], {
+    const child = spawn(uvPath, ['--version'], buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
-    });
+    }));
 
     child.on('close', (code) => {
       resolve(code === 0);
@@ -162,14 +224,14 @@ export async function getUvVersion(): Promise<string | null> {
   }
 
   return new Promise((resolve) => {
-    const child = spawn(uvPath, ['--version'], {
+    const child = spawn(uvPath, ['--version'], buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
-    });
+    }));
 
     let stdout = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       stdout += data.toString();
     });
 
@@ -207,19 +269,29 @@ export async function ensureUvInstalled(): Promise<boolean> {
   if (IS_WINDOWS) {
     // Windows: Use PowerShell to run the install script
     const installCmd = 'irm https://astral.sh/uv/install.ps1 | iex';
-    child = spawn('powershell', ['-ExecutionPolicy', 'ByPass', '-Command', installCmd], {
-      stdio: 'inherit',
+    child = spawn('powershell', ['-ExecutionPolicy', 'ByPass', '-Command', installCmd], buildUvSpawnOptions({
+      stdio: ['pipe', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
-    });
+    }));
   } else {
     // Unix: Use curl and sh
     const installCmd = 'curl -LsSf https://astral.sh/uv/install.sh | sh';
-    child = spawn('sh', ['-c', installCmd], {
-      stdio: 'inherit',
+    child = spawn('sh', ['-c', installCmd], buildUvSpawnOptions({
+      stdio: ['pipe', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
-    });
+    }));
   }
 
+  child.stdout?.on('data', (data) => {
+    const line = data.toString().trim();
+    if (line) console.log(`[UV] ${line}`);
+  });
+
+  child.stderr?.on('data', (data) => {
+    const line = data.toString().trim();
+    if (line) console.log(`[UV] ${line}`);
+  });
+
   child.on('close', (code) => {
     if (code === 0) {
       console.log('[UV] UV installed successfully');
@@ -315,21 +387,21 @@ export class UvManager {
       console.log(`[UV] Python version: ${this.pythonVersion}`);
     }
 
-    const child = spawn(uvPath, args, {
+    const child = spawn(uvPath, args, buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PROCESS_SPAWN,
-    });
+    }));
 
     let stderr = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       const line = data.toString().trim();
       if (line) {
         console.log(`[UV] ${line}`);
       }
     });
 
-    child.stderr.on('data', (data) => {
+    child.stderr?.on('data', (data) => {
       stderr += data.toString();
       const line = data.toString().trim();
       if (line) {
@@ -390,22 +462,22 @@ export class UvManager {
 
     console.log(`[UV] Installing from project: ${installSpec} (editable: ${editable})`);
 
-    const child = spawn(uvPath, args, {
+    const child = spawn(uvPath, args, buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
       cwd: projectPath,
-    });
+    }));
 
     let stderr = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       const line = data.toString().trim();
       if (line) {
         console.log(`[UV] ${line}`);
       }
     });
 
-    child.stderr.on('data', (data) => {
+    child.stderr?.on('data', (data) => {
       stderr += data.toString();
       const line = data.toString().trim();
       if (line && !line.startsWith('Resolved') && !line.startsWith('Prepared') && !line.startsWith('Installed')) {
@@ -460,21 +532,21 @@ export class UvManager {
 
     console.log(`[UV] Installing packages: ${packages.join(', ')}`);
 
-    const child = spawn(uvPath, args, {
+    const child = spawn(uvPath, args, buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
-    });
+    }));
 
     let stderr = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       const line = data.toString().trim();
       if (line) {
         console.log(`[UV] ${line}`);
       }
     });
 
-    child.stderr.on('data', (data) => {
+    child.stderr?.on('data', (data) => {
       stderr += data.toString();
     });
 
@@ -524,21 +596,21 @@ export class UvManager {
 
     console.log(`[UV] Uninstalling packages: ${packages.join(', ')}`);
 
-    const child = spawn(uvPath, args, {
+    const child = spawn(uvPath, args, buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
-    });
+    }));
 
     let stderr = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       const line = data.toString().trim();
       if (line) {
         console.log(`[UV] ${line}`);
       }
     });
 
-    child.stderr.on('data', (data) => {
+    child.stderr?.on('data', (data) => {
       stderr += data.toString();
     });
 
@@ -585,21 +657,21 @@ export class UvManager {
 
     console.log(`[UV] Syncing dependencies from: ${requirementsPath}`);
 
-    const child = spawn(uvPath, args, {
+    const child = spawn(uvPath, args, buildUvSpawnOptions({
       stdio: ['ignore', 'pipe', 'pipe'],
       timeout: EXEC_TIMEOUTS.PACKAGE_INSTALL,
-    });
+    }));
 
     let stderr = '';
 
-    child.stdout.on('data', (data) => {
+    child.stdout?.on('data', (data) => {
       const line = data.toString().trim();
       if (line) {
         console.log(`[UV] ${line}`);
       }
     });
 
-    child.stderr.on('data', (data) => {
+    child.stderr?.on('data', (data) => {
       stderr += data.toString();
     });
 
@@ -640,14 +712,14 @@ export class UvManager {
     return new Promise((resolve) => {
       const args = ['pip', 'list', '--format', 'json', '--python', this.getVenvPython()];
 
-      const child = spawn(uvPath, args, {
+      const child = spawn(uvPath, args, buildUvSpawnOptions({
        stdio: ['ignore', 'pipe', 'pipe'],
        timeout: EXEC_TIMEOUTS.PROCESS_SPAWN,
-      });
+      }));
 
      let stdout = '';
 
-      child.stdout.on('data', (data) => {
+      child.stdout?.on('data', (data) => {
        stdout += data.toString();
      });
 
@@ -704,20 +776,20 @@ export class UvManager {
     }
 
     return new Promise((resolve) => {
-      const child = spawn(pythonPath, args, {
+      const child = spawn(pythonPath, args, buildUvSpawnOptions({
        stdio: ['ignore', 'pipe', 'pipe'],
        timeout: options.timeout ?? EXEC_TIMEOUTS.PROCESS_SPAWN,
        cwd: options.cwd,
-      });
+      }));
 
      let stdout = '';
      let stderr = '';
 
-      child.stdout.on('data', (data) => {
+      child.stdout?.on('data', (data) => {
        stdout += data.toString();
      });
 
-      child.stderr.on('data', (data) => {
+      child.stderr?.on('data', (data) => {
        stderr += data.toString();
      });
 
@@ -779,17 +851,8 @@ export function getPreferredCodexLensPythonSpec(): string {
   // depend on onnxruntime 1.15.x wheels, which are not consistently available for cp312.
   const preferredVersions = ['3.11', '3.10', '3.12'];
   for (const version of preferredVersions) {
-    try {
-      const output = execSync(`py -${version} --version`, {
-        encoding: 'utf-8',
-        timeout: EXEC_TIMEOUTS.PYTHON_VERSION,
-        stdio: ['pipe', 'pipe', 'pipe'],
-      });
-      if (output.includes(`Python ${version}`)) {
-        return version;
-      }
-    } catch {
-      // Try next installed version
+    if (hasWindowsPythonLauncherVersion(version)) {
+      return version;
     }
   }
 
@@ -830,3 +893,10 @@ export async function bootstrapUvVenv(
   const manager = new UvManager({ venvPath, pythonVersion });
   return manager.createVenv();
 }
+
+export const __testables = {
+  buildUvSpawnOptions,
+  buildUvSpawnSyncOptions,
+  findExecutableOnPath,
+  hasWindowsPythonLauncherVersion,
+};
118 ccw/tests/cli-history-cross-project.test.js Normal file
@@ -0,0 +1,118 @@
+/**
+ * Cross-project regression coverage for `ccw cli history` and `ccw cli detail`.
+ */
+
+import { after, afterEach, before, describe, it, mock } from 'node:test';
+import assert from 'node:assert/strict';
+import { mkdtempSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+
+const TEST_CCW_HOME = mkdtempSync(join(tmpdir(), 'ccw-cli-history-cross-home-'));
+process.env.CCW_DATA_DIR = TEST_CCW_HOME;
+
+const cliCommandPath = new URL('../dist/commands/cli.js', import.meta.url).href;
+const cliExecutorPath = new URL('../dist/tools/cli-executor.js', import.meta.url).href;
+const historyStorePath = new URL('../dist/tools/cli-history-store.js', import.meta.url).href;
+
+function createConversation({ id, prompt, updatedAt }) {
+  return {
+    id,
+    created_at: updatedAt,
+    updated_at: updatedAt,
+    tool: 'gemini',
+    model: 'default',
+    mode: 'analysis',
+    category: 'user',
+    total_duration_ms: 456,
+    turn_count: 1,
+    latest_status: 'success',
+    turns: [
+      {
+        turn: 1,
+        timestamp: updatedAt,
+        prompt,
+        duration_ms: 456,
+        status: 'success',
+        exit_code: 0,
+        output: {
+          stdout: 'CROSS PROJECT OK',
+          stderr: '',
+          truncated: false,
+          cached: false,
+        },
+      },
+    ],
+  };
+}
+
+describe('ccw cli history/detail cross-project', async () => {
+  let cliModule;
+  let cliExecutorModule;
+  let historyStoreModule;
+
+  before(async () => {
+    cliModule = await import(cliCommandPath);
+    cliExecutorModule = await import(cliExecutorPath);
+    historyStoreModule = await import(historyStorePath);
+  });
+
+  afterEach(() => {
+    mock.restoreAll();
+    try {
+      historyStoreModule?.closeAllStores?.();
+    } catch {
+      // ignore
+    }
+  });
+
+  after(() => {
+    try {
+      historyStoreModule?.closeAllStores?.();
+    } catch {
+      // ignore
+    }
+    rmSync(TEST_CCW_HOME, { recursive: true, force: true });
+  });
+
+  it('finds history and detail for executions stored in another registered project', async () => {
+    const projectRoot = mkdtempSync(join(tmpdir(), 'ccw-cli-cross-project-history-'));
+    const unrelatedCwd = mkdtempSync(join(tmpdir(), 'ccw-cli-cross-project-cwd-'));
+    const previousCwd = process.cwd();
+
+    try {
+      const store = new historyStoreModule.CliHistoryStore(projectRoot);
+      store.saveConversation(createConversation({
+        id: 'CONV-CROSS-PROJECT-1',
+        prompt: 'Cross project prompt',
+        updatedAt: new Date('2025-02-01T00:00:01.000Z').toISOString(),
+      }));
+      store.close();
+
+      const logs = [];
+      mock.method(console, 'log', (...args) => {
+        logs.push(args.map(String).join(' '));
+      });
+      mock.method(console, 'error', (...args) => {
+        logs.push(args.map(String).join(' '));
+      });
+
+      process.chdir(unrelatedCwd);
+
+      await cliModule.cliCommand('history', [], { limit: '20' });
+      assert.ok(logs.some((line) => line.includes('CONV-CROSS-PROJECT-1')));
+
+      await cliExecutorModule.getExecutionHistoryAsync(projectRoot, { limit: 1 });
+
+      logs.length = 0;
+      await cliModule.cliCommand('detail', ['CONV-CROSS-PROJECT-1'], {});
+      assert.ok(logs.some((line) => line.includes('Conversation Detail')));
+      assert.ok(logs.some((line) => line.includes('CONV-CROSS-PROJECT-1')));
+      assert.ok(logs.some((line) => line.includes('Cross project prompt')));
+    } finally {
+      process.chdir(previousCwd);
+      rmSync(projectRoot, { recursive: true, force: true });
+      rmSync(unrelatedCwd, { recursive: true, force: true });
+    }
+  });
+});
@@ -123,6 +123,39 @@ describe('ccw cli output --final', async () => {
     }
   });
 
+  it('loads cached output from another registered project without --project', async () => {
+    const projectRoot = createTestProjectRoot();
+    const unrelatedCwd = createTestProjectRoot();
+    const previousCwd = process.cwd();
+    const store = new historyStoreModule.CliHistoryStore(projectRoot);
+
+    try {
+      store.saveConversation(createConversation({
+        id: 'EXEC-CROSS-PROJECT-OUTPUT',
+        stdoutFull: 'cross project raw output',
+        parsedOutput: 'cross project parsed output',
+        finalOutput: 'cross project final output',
+      }));
+
+      process.chdir(unrelatedCwd);
+
+      const logs = [];
+      mock.method(console, 'log', (...args) => {
+        logs.push(args.map(String).join(' '));
+      });
+      mock.method(console, 'error', () => {});
+
+      await cliModule.cliCommand('output', ['EXEC-CROSS-PROJECT-OUTPUT'], {});
+
+      assert.equal(logs.at(-1), 'cross project final output');
+    } finally {
+      process.chdir(previousCwd);
+      store.close();
+      rmSync(projectRoot, { recursive: true, force: true });
+      rmSync(unrelatedCwd, { recursive: true, force: true });
+    }
+  });
+
   it('fails fast for explicit --final when no final agent result can be recovered', async () => {
     const projectRoot = createTestProjectRoot();
     const store = new historyStoreModule.CliHistoryStore(projectRoot);
@@ -159,4 +192,34 @@ describe('ccw cli output --final', async () => {
       rmSync(projectRoot, { recursive: true, force: true });
     }
   });
+
+  it('prints CCW execution ID guidance when output cannot find the requested execution', async () => {
+    const projectRoot = createTestProjectRoot();
+    const previousCwd = process.cwd();
+
+    try {
+      process.chdir(projectRoot);
+
+      const errors = [];
+      const exitCodes = [];
+
+      mock.method(console, 'log', () => {});
+      mock.method(console, 'error', (...args) => {
+        errors.push(args.map(String).join(' '));
+      });
+      mock.method(process, 'exit', (code) => {
+        exitCodes.push(code);
+      });
+
+      await cliModule.cliCommand('output', ['rebuttal-structure-analysis'], {});
+
+      assert.deepEqual(exitCodes, [1]);
+      assert.ok(errors.some((line) => line.includes('real CCW execution ID')));
+      assert.ok(errors.some((line) => line.includes('CCW_EXEC_ID')));
+      assert.ok(errors.some((line) => line.includes('ccw cli show or ccw cli history')));
+    } finally {
+      process.chdir(previousCwd);
+      rmSync(projectRoot, { recursive: true, force: true });
+    }
+  });
 });
@@ -163,6 +163,42 @@ describe('ccw cli show running time formatting', async () => {
     assert.match(rendered, /1h\.\.\./);
   });
 
+  it('lists executions from other registered projects in show output', async () => {
+    const projectRoot = mkdtempSync(join(tmpdir(), 'ccw-cli-show-cross-project-'));
+    const unrelatedCwd = mkdtempSync(join(tmpdir(), 'ccw-cli-show-cross-cwd-'));
+    const previousCwd = process.cwd();
+
+    try {
+      process.chdir(unrelatedCwd);
+      const store = new historyStoreModule.CliHistoryStore(projectRoot);
+      store.saveConversation(createConversationRecord({
+        id: 'EXEC-CROSS-PROJECT-SHOW',
+        prompt: 'cross project show prompt',
+        updatedAt: new Date('2025-02-02T00:00:00.000Z').toISOString(),
+        durationMs: 1800,
+      }));
+      store.close();
+
+      stubActiveExecutionsResponse([]);
+
+      const logs = [];
+      mock.method(console, 'log', (...args) => {
+        logs.push(args.map(String).join(' '));
+      });
+      mock.method(console, 'error', () => {});
+
+      await cliModule.cliCommand('show', [], {});
+
+      const rendered = logs.join('\n');
+      assert.match(rendered, /EXEC-CROSS-PROJECT-SHOW/);
+      assert.match(rendered, /cross project show prompt/);
+    } finally {
+      process.chdir(previousCwd);
+      rmSync(projectRoot, { recursive: true, force: true });
+      rmSync(unrelatedCwd, { recursive: true, force: true });
+    }
+  });
+
   it('suppresses stale running rows when saved history is newer than the active start time', async () => {
     const projectRoot = mkdtempSync(join(tmpdir(), 'ccw-cli-show-stale-project-'));
     const previousCwd = process.cwd();
@@ -13,6 +13,38 @@ after(() => {
 });
 
 describe('CodexLens CLI compatibility retries', () => {
+  it('builds hidden Python spawn options for CLI invocations', async () => {
+    const moduleUrl = new URL(`../dist/tools/codex-lens.js?spawn-opts=${Date.now()}`, import.meta.url).href;
+    const { __testables } = await import(moduleUrl);
+
+    const options = __testables.buildCodexLensSpawnOptions(tmpdir(), 12345);
+
+    assert.equal(options.cwd, tmpdir());
+    assert.equal(options.shell, false);
+    assert.equal(options.timeout, 12345);
+    assert.equal(options.windowsHide, true);
+    assert.equal(options.env.PYTHONIOENCODING, 'utf-8');
+  });
+
+  it('probes Python version without a shell-backed console window', async () => {
+    const moduleUrl = new URL(`../dist/tools/codex-lens.js?python-probe=${Date.now()}`, import.meta.url).href;
+    const { __testables } = await import(moduleUrl);
+    const probeCalls = [];
+
+    const version = __testables.probePythonVersion({ command: 'python', args: [], display: 'python' }, (command, args, options) => {
+      probeCalls.push({ command, args, options });
+      return { status: 0, stdout: '', stderr: 'Python 3.11.9\n' };
+    });
+
+    assert.equal(version, 'Python 3.11.9');
+    assert.equal(probeCalls.length, 1);
+    assert.equal(probeCalls[0].command, 'python');
+    assert.deepEqual(probeCalls[0].args, ['--version']);
+    assert.equal(probeCalls[0].options.shell, false);
+    assert.equal(probeCalls[0].options.windowsHide, true);
+    assert.equal(probeCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
+  });
+
   it('initializes a tiny index even when CLI emits compatibility conflicts first', async () => {
     const moduleUrl = new URL(`../dist/tools/codex-lens.js?compat=${Date.now()}`, import.meta.url).href;
     const { checkVenvStatus, executeCodexLens } = await import(moduleUrl);
@@ -32,4 +64,76 @@ describe('CodexLens CLI compatibility retries', () => {
|
|||||||
assert.equal(result.success, true, result.error ?? 'Expected init to succeed');
|
assert.equal(result.success, true, result.error ?? 'Expected init to succeed');
|
||||||
assert.ok((result.output ?? '').length > 0 || (result.warning ?? '').length > 0, 'Expected init output or compatibility warning');
|
+    assert.ok((result.output ?? '').length > 0 || (result.warning ?? '').length > 0, 'Expected init output or compatibility warning');
   });

+  it('synthesizes a machine-readable fallback when JSON search output is empty', async () => {
+    const moduleUrl = new URL(`../dist/tools/codex-lens.js?compat-empty=${Date.now()}`, import.meta.url).href;
+    const { __testables } = await import(moduleUrl);
+
+    const normalized = __testables.normalizeSearchCommandResult(
+      { success: true },
+      { query: 'missing symbol', cwd: tmpdir(), limit: 5, filesOnly: false },
+    );
+
+    assert.equal(normalized.success, true);
+    assert.match(normalized.warning ?? '', /empty stdout/i);
+    assert.deepEqual(normalized.results, {
+      success: true,
+      result: {
+        query: 'missing symbol',
+        count: 0,
+        results: [],
+      },
+    });
+  });
+
+  it('returns structured semantic search results for a local embedded workspace', async () => {
+    const codexLensUrl = new URL(`../dist/tools/codex-lens.js?compat-search=${Date.now()}`, import.meta.url).href;
+    const smartSearchUrl = new URL(`../dist/tools/smart-search.js?compat-search=${Date.now()}`, import.meta.url).href;
+    const codexLensModule = await import(codexLensUrl);
+    const smartSearchModule = await import(smartSearchUrl);
+
+    const ready = await codexLensModule.checkVenvStatus(true);
+    if (!ready.ready) {
+      console.log('Skipping: CodexLens not ready');
+      return;
+    }
+
+    const semantic = await codexLensModule.checkSemanticStatus();
+    if (!semantic.available) {
+      console.log('Skipping: semantic dependencies not ready');
+      return;
+    }
+
+    const projectDir = mkdtempSync(join(tmpdir(), 'codexlens-search-'));
+    tempDirs.push(projectDir);
+    writeFileSync(
+      join(projectDir, 'sample.ts'),
+      'export function greet(name) { return `hello ${name}`; }\nexport const sum = (a, b) => a + b;\n',
+    );
+
+    const init = await smartSearchModule.handler({ action: 'init', path: projectDir });
+    assert.equal(init.success, true, init.error ?? 'Expected smart-search init to succeed');
+
+    const embed = await smartSearchModule.handler({
+      action: 'embed',
+      path: projectDir,
+      embeddingBackend: 'local',
+      force: true,
+    });
+    assert.equal(embed.success, true, embed.error ?? 'Expected smart-search embed to succeed');
+
+    const result = await codexLensModule.codexLensTool.execute({
+      action: 'search',
+      path: projectDir,
+      query: 'greet function',
+      mode: 'semantic',
+      format: 'json',
+    });
+
+    assert.equal(result.success, true, result.error ?? 'Expected semantic search compatibility fallback to succeed');
+    const payload = result.results?.result ?? result.results;
+    assert.ok(Array.isArray(payload?.results), 'Expected structured search results payload');
+    assert.ok(payload.results.length > 0, 'Expected at least one structured semantic search result');
+    assert.doesNotMatch(result.error ?? '', /unexpected extra arguments/i);
+  });
 });

ccw/tests/codexlens-path.test.js (new file, 66 lines)
@@ -0,0 +1,66 @@
+import { after, afterEach, describe, it } from 'node:test';
+import assert from 'node:assert/strict';
+import { mkdtempSync, rmSync } from 'node:fs';
+import { createRequire, syncBuiltinESMExports } from 'node:module';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+
+const require = createRequire(import.meta.url);
+// eslint-disable-next-line @typescript-eslint/no-var-requires
+const fs = require('node:fs');
+
+const originalExistsSync = fs.existsSync;
+const originalCodexLensDataDir = process.env.CODEXLENS_DATA_DIR;
+const tempDirs = [];
+
+afterEach(() => {
+  fs.existsSync = originalExistsSync;
+  syncBuiltinESMExports();
+
+  if (originalCodexLensDataDir === undefined) {
+    delete process.env.CODEXLENS_DATA_DIR;
+  } else {
+    process.env.CODEXLENS_DATA_DIR = originalCodexLensDataDir;
+  }
+});
+
+after(() => {
+  while (tempDirs.length > 0) {
+    rmSync(tempDirs.pop(), { recursive: true, force: true });
+  }
+});
+
+describe('codexlens-path hidden python selection', () => {
+  it('prefers pythonw.exe for hidden Windows subprocesses when available', async () => {
+    if (process.platform !== 'win32') {
+      return;
+    }
+
+    const dataDir = mkdtempSync(join(tmpdir(), 'ccw-codexlens-hidden-python-'));
+    tempDirs.push(dataDir);
+    process.env.CODEXLENS_DATA_DIR = dataDir;
+
+    const expectedPythonw = join(dataDir, 'venv', 'Scripts', 'pythonw.exe');
+    fs.existsSync = (path) => String(path) === expectedPythonw;
+    syncBuiltinESMExports();
+
+    const moduleUrl = new URL(`../dist/utils/codexlens-path.js?t=${Date.now()}`, import.meta.url);
+    const mod = await import(moduleUrl.href);
+
+    assert.equal(mod.getCodexLensHiddenPython(), expectedPythonw);
+  });
+
+  it('falls back to python.exe when pythonw.exe is unavailable', async () => {
+    const dataDir = mkdtempSync(join(tmpdir(), 'ccw-codexlens-hidden-fallback-'));
+    tempDirs.push(dataDir);
+    process.env.CODEXLENS_DATA_DIR = dataDir;
+
+    fs.existsSync = () => false;
+    syncBuiltinESMExports();
+
+    const moduleUrl = new URL(`../dist/utils/codexlens-path.js?t=${Date.now()}`, import.meta.url);
+    const mod = await import(moduleUrl.href);
+
+    assert.equal(mod.getCodexLensHiddenPython(), mod.getCodexLensPython());
+  });
+});
@@ -105,7 +105,10 @@ describe('memory-embedder-bridge', () => {
     assert.equal(spawnCalls.length, 1);
     assert.equal(spawnCalls[0].args.at(-2), 'status');
     assert.equal(spawnCalls[0].args.at(-1), 'C:\\tmp\\db.sqlite');
+    assert.equal(spawnCalls[0].options.shell, false);
     assert.equal(spawnCalls[0].options.timeout, 30000);
+    assert.equal(spawnCalls[0].options.windowsHide, true);
+    assert.equal(spawnCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
   });

   it('generateEmbeddings builds args for sourceId, batchSize, and force', async () => {
@@ -138,7 +141,10 @@ describe('memory-embedder-bridge', () => {
     assert.equal(args[batchSizeIndex + 1], '4');

     assert.ok(args.includes('--force'));
+    assert.equal(spawnCalls[0].options.shell, false);
     assert.equal(spawnCalls[0].options.timeout, 300000);
+    assert.equal(spawnCalls[0].options.windowsHide, true);
+    assert.equal(spawnCalls[0].options.env.PYTHONIOENCODING, 'utf-8');

     spawnCalls.length = 0;
     spawnPlan.push({

@@ -103,7 +103,7 @@ describe('LiteLLM client bridge', () => {

     assert.equal(available, true);
     assert.equal(spawnCalls.length, 1);
-    assert.equal(spawnCalls[0].command, 'python');
+    assert.equal(spawnCalls[0].command, mod.getCodexLensVenvPython());
     assert.deepEqual(spawnCalls[0].args, ['-m', 'ccw_litellm.cli', 'version']);
   });

@@ -117,6 +117,19 @@ describe('LiteLLM client bridge', () => {
     assert.equal(spawnCalls[0].command, 'python3');
   });

+  it('spawns LiteLLM Python with hidden window options', async () => {
+    spawnPlan.push({ type: 'close', code: 0, stdout: '1.2.3\n' });
+
+    const client = new mod.LiteLLMClient({ timeout: 10 });
+    const available = await client.isAvailable();
+
+    assert.equal(available, true);
+    assert.equal(spawnCalls.length, 1);
+    assert.equal(spawnCalls[0].options.shell, false);
+    assert.equal(spawnCalls[0].options.windowsHide, true);
+    assert.equal(spawnCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
+  });
+
   it('isAvailable returns false on spawn error', async () => {
     spawnPlan.push({ type: 'error', error: new Error('ENOENT') });

@@ -154,7 +167,7 @@ describe('LiteLLM client bridge', () => {

     assert.deepEqual(cfg, { ok: true });
     assert.equal(spawnCalls.length, 1);
-    assert.deepEqual(spawnCalls[0].args, ['-m', 'ccw_litellm.cli', 'config', '--json']);
+    assert.deepEqual(spawnCalls[0].args, ['-m', 'ccw_litellm.cli', 'config']);
   });

   it('getConfig throws on malformed JSON', async () => {

@@ -76,6 +76,26 @@ describe('Smart Search - Query Intent + RRF Weights', async () => {
     });
   });

+  describe('classifyIntent lexical routing', () => {
+    it('routes config/backend queries to exact when index and embeddings are available', () => {
+      if (!smartSearchModule) return;
+      const classification = smartSearchModule.__testables.classifyIntent(
+        'embedding backend fastembed local litellm api config',
+        true,
+        true,
+      );
+      assert.strictEqual(classification.mode, 'exact');
+      assert.match(classification.reasoning, /lexical priority/i);
+    });
+
+    it('routes generated artifact queries to exact when index and embeddings are available', () => {
+      if (!smartSearchModule) return;
+      const classification = smartSearchModule.__testables.classifyIntent('dist bundle output', true, true);
+      assert.strictEqual(classification.mode, 'exact');
+      assert.match(classification.reasoning, /generated artifact/i);
+    });
+  });
+
   describe('adjustWeightsByIntent', () => {
     it('maps keyword intent to exact-heavy weights', () => {
       if (!smartSearchModule) return;
@@ -119,4 +139,3 @@ describe('Smart Search - Query Intent + RRF Weights', async () => {
     });
   });
 });
-
@@ -1,16 +1,19 @@
-import { afterEach, before, describe, it } from 'node:test';
+import { after, afterEach, before, describe, it } from 'node:test';
 import assert from 'node:assert/strict';
 import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';

 const smartSearchPath = new URL('../dist/tools/smart-search.js', import.meta.url).href;
+const originalAutoInitMissing = process.env.CODEXLENS_AUTO_INIT_MISSING;
+const originalAutoEmbedMissing = process.env.CODEXLENS_AUTO_EMBED_MISSING;

 describe('Smart Search MCP usage defaults and path handling', async () => {
   let smartSearchModule;
   const tempDirs = [];

   before(async () => {
+    process.env.CODEXLENS_AUTO_INIT_MISSING = 'false';
     try {
       smartSearchModule = await import(smartSearchPath);
     } catch (err) {
@@ -18,10 +21,30 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     }
   });

+  after(() => {
+    if (originalAutoInitMissing === undefined) {
+      delete process.env.CODEXLENS_AUTO_INIT_MISSING;
+    } else {
+      process.env.CODEXLENS_AUTO_INIT_MISSING = originalAutoInitMissing;
+    }
+
+    if (originalAutoEmbedMissing === undefined) {
+      delete process.env.CODEXLENS_AUTO_EMBED_MISSING;
+      return;
+    }
+    process.env.CODEXLENS_AUTO_EMBED_MISSING = originalAutoEmbedMissing;
+  });
+
   afterEach(() => {
     while (tempDirs.length > 0) {
       rmSync(tempDirs.pop(), { recursive: true, force: true });
     }
+    if (smartSearchModule?.__testables) {
+      smartSearchModule.__testables.__resetRuntimeOverrides();
+      smartSearchModule.__testables.__resetBackgroundJobs();
+    }
+    process.env.CODEXLENS_AUTO_INIT_MISSING = 'false';
+    delete process.env.CODEXLENS_AUTO_EMBED_MISSING;
   });

   function createWorkspace() {
@@ -30,6 +53,15 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     return dir;
   }

+  function createDetachedChild() {
+    return {
+      on() {
+        return this;
+      },
+      unref() {},
+    };
+  }
+
   it('keeps schema defaults aligned with runtime docs', () => {
     if (!smartSearchModule) return;

@@ -50,14 +82,202 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     assert.equal(props.output_mode.default, 'ace');
   });

-  it('defaults auto embedding warmup to enabled unless explicitly disabled', () => {
+  it('defaults auto embedding warmup off on Windows unless explicitly enabled', () => {
     if (!smartSearchModule) return;

     const { __testables } = smartSearchModule;
-    assert.equal(__testables.isAutoEmbedMissingEnabled(undefined), true);
-    assert.equal(__testables.isAutoEmbedMissingEnabled({}), true);
-    assert.equal(__testables.isAutoEmbedMissingEnabled({ embedding_auto_embed_missing: true }), true);
+    delete process.env.CODEXLENS_AUTO_EMBED_MISSING;
+    assert.equal(__testables.isAutoEmbedMissingEnabled(undefined), process.platform !== 'win32');
+    assert.equal(__testables.isAutoEmbedMissingEnabled({}), process.platform !== 'win32');
+    assert.equal(
+      __testables.isAutoEmbedMissingEnabled({ embedding_auto_embed_missing: true }),
+      process.platform === 'win32' ? false : true,
+    );
     assert.equal(__testables.isAutoEmbedMissingEnabled({ embedding_auto_embed_missing: false }), false);
+    process.env.CODEXLENS_AUTO_EMBED_MISSING = 'true';
+    assert.equal(__testables.isAutoEmbedMissingEnabled({ embedding_auto_embed_missing: false }), true);
+    process.env.CODEXLENS_AUTO_EMBED_MISSING = 'off';
+    assert.equal(__testables.isAutoEmbedMissingEnabled({ embedding_auto_embed_missing: true }), false);
+  });
+
+  it('defaults auto index warmup off on Windows unless explicitly enabled', () => {
+    if (!smartSearchModule) return;
+
+    const { __testables } = smartSearchModule;
+    delete process.env.CODEXLENS_AUTO_INIT_MISSING;
+    assert.equal(__testables.isAutoInitMissingEnabled(), process.platform !== 'win32');
+    process.env.CODEXLENS_AUTO_INIT_MISSING = 'off';
+    assert.equal(__testables.isAutoInitMissingEnabled(), false);
+    process.env.CODEXLENS_AUTO_INIT_MISSING = '1';
+    assert.equal(__testables.isAutoInitMissingEnabled(), true);
+  });
+
+  it('explains when Windows disables background warmup by default', () => {
+    if (!smartSearchModule) return;
+
+    const { __testables } = smartSearchModule;
+    delete process.env.CODEXLENS_AUTO_INIT_MISSING;
+    delete process.env.CODEXLENS_AUTO_EMBED_MISSING;
+
+    const initReason = __testables.getAutoInitMissingDisabledReason();
+    const embedReason = __testables.getAutoEmbedMissingDisabledReason({});
+
+    if (process.platform === 'win32') {
+      assert.match(initReason, /disabled by default on Windows/i);
+      assert.match(embedReason, /disabled by default on Windows/i);
+      assert.match(embedReason, /auto_embed_missing=true/i);
+    } else {
+      assert.match(initReason, /disabled/i);
+      assert.match(embedReason, /disabled/i);
+    }
+  });
+
+  it('builds hidden subprocess options for Smart Search child processes', () => {
+    if (!smartSearchModule) return;
+
+    const options = smartSearchModule.__testables.buildSmartSearchSpawnOptions(tmpdir(), {
+      detached: true,
+      stdio: 'ignore',
+      timeout: 12345,
+    });
+
+    assert.equal(options.cwd, tmpdir());
+    assert.equal(options.shell, false);
+    assert.equal(options.windowsHide, true);
+    assert.equal(options.detached, true);
+    assert.equal(options.timeout, 12345);
+    assert.equal(options.env.PYTHONIOENCODING, 'utf-8');
+  });
+
+  it('avoids detached background warmup children on Windows consoles', () => {
+    if (!smartSearchModule) return;
+
+    assert.equal(
+      smartSearchModule.__testables.shouldDetachBackgroundSmartSearchProcess(),
+      process.platform !== 'win32',
+    );
+  });
+
+  it('checks tool availability without shell-based lookup popups', () => {
+    if (!smartSearchModule) return;
+
+    const lookupCalls = [];
+    const available = smartSearchModule.__testables.checkToolAvailability(
+      'rg',
+      (command, args, options) => {
+        lookupCalls.push({ command, args, options });
+        return { status: 0, stdout: '', stderr: '' };
+      },
+    );
+
+    assert.equal(available, true);
+    assert.equal(lookupCalls.length, 1);
+    assert.equal(lookupCalls[0].command, process.platform === 'win32' ? 'where' : 'which');
+    assert.deepEqual(lookupCalls[0].args, ['rg']);
+    assert.equal(lookupCalls[0].options.shell, false);
+    assert.equal(lookupCalls[0].options.windowsHide, true);
+    assert.equal(lookupCalls[0].options.stdio, 'ignore');
+    assert.equal(lookupCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
+  });
+
+  it('starts background static index build once for unindexed paths', async () => {
+    if (!smartSearchModule) return;
+
+    const { __testables } = smartSearchModule;
+    const dir = createWorkspace();
+    const fakePython = join(dir, 'python.exe');
+    writeFileSync(fakePython, '');
+    process.env.CODEXLENS_AUTO_INIT_MISSING = 'true';
+
+    const spawnCalls = [];
+    __testables.__setRuntimeOverrides({
+      getVenvPythonPath: () => fakePython,
+      now: () => 1234567890,
+      spawnProcess: (command, args, options) => {
+        spawnCalls.push({ command, args, options });
+        return createDetachedChild();
+      },
+    });
+
+    const scope = { workingDirectory: dir, searchPaths: ['.'] };
+    const indexStatus = { indexed: false, has_embeddings: false };
+
+    const first = await __testables.maybeStartBackgroundAutoInit(scope, indexStatus);
+    const second = await __testables.maybeStartBackgroundAutoInit(scope, indexStatus);
+
+    assert.match(first.note, /started/i);
+    assert.match(second.note, /already running/i);
+    assert.equal(spawnCalls.length, 1);
+    assert.equal(spawnCalls[0].command, fakePython);
+    assert.deepEqual(spawnCalls[0].args, ['-m', 'codexlens', 'index', 'init', dir, '--no-embeddings']);
+    assert.equal(spawnCalls[0].options.cwd, dir);
+    assert.equal(
+      spawnCalls[0].options.detached,
+      smartSearchModule.__testables.shouldDetachBackgroundSmartSearchProcess(),
+    );
+    assert.equal(spawnCalls[0].options.windowsHide, true);
+  });
+
+  it('starts background embedding build without detached Windows consoles', async () => {
+    if (!smartSearchModule) return;
+
+    const { __testables } = smartSearchModule;
+    const dir = createWorkspace();
+    const fakePython = join(dir, 'python.exe');
+    writeFileSync(fakePython, '');
+    process.env.CODEXLENS_AUTO_EMBED_MISSING = 'true';
+
+    const spawnCalls = [];
+    __testables.__setRuntimeOverrides({
+      getVenvPythonPath: () => fakePython,
+      checkSemanticStatus: async () => ({ available: true, litellmAvailable: true }),
+      now: () => 1234567890,
+      spawnProcess: (command, args, options) => {
+        spawnCalls.push({ command, args, options });
+        return createDetachedChild();
+      },
+    });
+
+    const status = await __testables.maybeStartBackgroundAutoEmbed(
+      { workingDirectory: dir, searchPaths: ['.'] },
+      {
+        indexed: true,
+        has_embeddings: false,
+        config: { embedding_backend: 'fastembed' },
+      },
+    );
+
+    assert.match(status.note, /started/i);
+    assert.equal(spawnCalls.length, 1);
+    assert.equal(spawnCalls[0].command, fakePython);
+    assert.deepEqual(spawnCalls[0].args.slice(0, 1), ['-c']);
+    assert.equal(spawnCalls[0].options.cwd, dir);
+    assert.equal(
+      spawnCalls[0].options.detached,
+      smartSearchModule.__testables.shouldDetachBackgroundSmartSearchProcess(),
+    );
+    assert.equal(spawnCalls[0].options.windowsHide, true);
+    assert.equal(spawnCalls[0].options.stdio, 'ignore');
+  });
+
+  it('surfaces warnings when background static index warmup cannot start', async () => {
+    if (!smartSearchModule) return;
+
+    const { __testables } = smartSearchModule;
+    const dir = createWorkspace();
+    process.env.CODEXLENS_AUTO_INIT_MISSING = 'true';
+
+    __testables.__setRuntimeOverrides({
+      getVenvPythonPath: () => join(dir, 'missing-python.exe'),
+    });
+
+    const status = await __testables.maybeStartBackgroundAutoInit(
+      { workingDirectory: dir, searchPaths: ['.'] },
+      { indexed: false, has_embeddings: false },
+    );
+
+    assert.match(status.warning, /Automatic static index warmup could not start/i);
+    assert.match(status.warning, /not ready yet/i);
   });

   it('honors explicit small limit values', async () => {
@@ -246,15 +466,98 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     assert.match(String(matches[0].file).replace(/\\/g, '/'), /target\.ts$/);
   });

-  it('detects centralized vector artifacts as full embedding coverage evidence', () => {
+  it('uses root-scoped embedding status instead of subtree artifacts', () => {
     if (!smartSearchModule) return;

-    const dir = createWorkspace();
-    writeFileSync(join(dir, '_vectors.hnsw'), 'hnsw');
-    writeFileSync(join(dir, '_vectors_meta.db'), 'meta');
-    writeFileSync(join(dir, '_binary_vectors.mmap'), 'mmap');
+    const summary = smartSearchModule.__testables.extractEmbeddingsStatusSummary({
+      total_indexes: 3,
+      indexes_with_embeddings: 2,
+      total_chunks: 24,
+      coverage_percent: 66.7,
+      root: {
+        total_files: 4,
+        files_with_embeddings: 0,
+        total_chunks: 0,
+        coverage_percent: 0,
+        has_embeddings: false,
+      },
+      subtree: {
+        total_indexes: 3,
+        indexes_with_embeddings: 2,
+        total_files: 12,
+        files_with_embeddings: 8,
+        total_chunks: 24,
+        coverage_percent: 66.7,
+      },
+      centralized: {
+        dense_index_exists: true,
+        binary_index_exists: true,
+        meta_db_exists: true,
+        usable: false,
+      },
+    });

-    assert.equal(smartSearchModule.__testables.hasCentralizedVectorArtifacts(dir), true);
+    assert.equal(summary.coveragePercent, 0);
+    assert.equal(summary.totalChunks, 0);
+    assert.equal(summary.hasEmbeddings, false);
+  });
+
+  it('accepts validated root centralized readiness from CLI status payloads', () => {
+    if (!smartSearchModule) return;
+
+    const summary = smartSearchModule.__testables.extractEmbeddingsStatusSummary({
+      total_indexes: 2,
+      indexes_with_embeddings: 1,
+      total_chunks: 10,
+      coverage_percent: 25,
+      root: {
+        total_files: 2,
+        files_with_embeddings: 1,
+        total_chunks: 3,
+        coverage_percent: 50,
+        has_embeddings: true,
+      },
+      centralized: {
+        usable: true,
+        dense_ready: true,
+        chunk_metadata_rows: 3,
+      },
+    });
+
+    assert.equal(summary.coveragePercent, 50);
+    assert.equal(summary.totalChunks, 3);
+    assert.equal(summary.hasEmbeddings, true);
+  });
+
+  it('prefers embeddings_status over legacy embeddings summary payloads', () => {
+    if (!smartSearchModule) return;
+
+    const payload = smartSearchModule.__testables.selectEmbeddingsStatusPayload({
+      embeddings: {
+        total_indexes: 7,
+        indexes_with_embeddings: 4,
+        total_chunks: 99,
+      },
+      embeddings_status: {
+        total_indexes: 7,
+        total_chunks: 3,
+        root: {
+          total_files: 2,
+          files_with_embeddings: 1,
+          total_chunks: 3,
+          coverage_percent: 50,
+          has_embeddings: true,
+        },
+        centralized: {
+          usable: true,
+          dense_ready: true,
+          chunk_metadata_rows: 3,
+        },
+      },
+    });
+
+    assert.equal(payload.root.total_chunks, 3);
+    assert.equal(payload.centralized.usable, true);
   });

   it('recognizes CodexLens CLI compatibility failures and invalid regex fallback', () => {
@@ -281,6 +584,37 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     assert.match(resolution.warning, /literal ripgrep matching/i);
   });

+  it('suppresses compatibility-only fuzzy warnings when ripgrep already produced hits', () => {
+    if (!smartSearchModule) return;
+
+    assert.equal(
+      smartSearchModule.__testables.shouldSurfaceCodexLensFtsCompatibilityWarning({
+        compatibilityTriggeredThisQuery: true,
+        skipExactDueToCompatibility: false,
+        ripgrepResultCount: 2,
+      }),
+      false,
+    );
+
+    assert.equal(
+      smartSearchModule.__testables.shouldSurfaceCodexLensFtsCompatibilityWarning({
+        compatibilityTriggeredThisQuery: true,
+        skipExactDueToCompatibility: false,
+        ripgrepResultCount: 0,
+      }),
+      true,
+    );
+
+    assert.equal(
+      smartSearchModule.__testables.shouldSurfaceCodexLensFtsCompatibilityWarning({
+        compatibilityTriggeredThisQuery: false,
+        skipExactDueToCompatibility: true,
+        ripgrepResultCount: 0,
+      }),
+      true,
+    );
+  });
+
   it('builds actionable index suggestions for unhealthy index states', () => {
     if (!smartSearchModule) return;

@@ -318,4 +652,52 @@ describe('Smart Search MCP usage defaults and path handling', async () => {
     assert.match(toolResult.error, /Both search backends failed:/);
     assert.match(toolResult.error, /(FTS|Ripgrep)/);
   });

+  it('returns structured semantic results after local init and embed without JSON parse warnings', async () => {
+    if (!smartSearchModule) return;
+
+    const codexLensModule = await import(new URL(`../dist/tools/codex-lens.js?smart-semantic=${Date.now()}`, import.meta.url).href);
+    const ready = await codexLensModule.checkVenvStatus(true);
+    if (!ready.ready) {
+      console.log('Skipping: CodexLens not ready');
+      return;
+    }
+
+    const semantic = await codexLensModule.checkSemanticStatus();
+    if (!semantic.available) {
+      console.log('Skipping: semantic dependencies not ready');
+      return;
+    }
+
+    const dir = createWorkspace();
+    writeFileSync(
+      join(dir, 'sample.ts'),
+      'export function parseCodexLensOutput() { return stripAnsiOutput(); }\nexport const sum = (a, b) => a + b;\n',
+    );
+
+    const init = await smartSearchModule.handler({ action: 'init', path: dir });
+    assert.equal(init.success, true, init.error ?? 'Expected init to succeed');
+
+    const embed = await smartSearchModule.handler({
+      action: 'embed',
+      path: dir,
+      embeddingBackend: 'local',
+      force: true,
+    });
+    assert.equal(embed.success, true, embed.error ?? 'Expected local embed to succeed');
+
+    const search = await smartSearchModule.handler({
+      action: 'search',
+      mode: 'semantic',
+      path: dir,
+      query: 'parse CodexLens output strip ANSI',
+      limit: 5,
+    });
+
+    assert.equal(search.success, true, search.error ?? 'Expected semantic search to succeed');
+    assert.equal(search.result.success, true);
+    assert.equal(search.result.results.format, 'ace');
+    assert.ok(search.result.results.total >= 1, 'Expected at least one structured semantic match');
+    assert.doesNotMatch(search.result.metadata?.warning ?? '', /Failed to parse JSON output/i);
+  });
 });

ccw/tests/unified-vector-index.test.ts (new file, 97 lines)
@@ -0,0 +1,97 @@
import { after, beforeEach, describe, it } from 'node:test';
import assert from 'node:assert/strict';
import { EventEmitter } from 'node:events';
import { createRequire } from 'node:module';
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const require = createRequire(import.meta.url);
// eslint-disable-next-line @typescript-eslint/no-var-requires
const fs = require('node:fs') as typeof import('node:fs');
// eslint-disable-next-line @typescript-eslint/no-var-requires
const childProcess = require('node:child_process') as typeof import('node:child_process');

class FakeChildProcess extends EventEmitter {
  stdout = new EventEmitter();
  stderr = new EventEmitter();
  stdinChunks: string[] = [];
  stdin = {
    write: (chunk: string | Buffer) => {
      this.stdinChunks.push(String(chunk));
      return true;
    },
    end: () => undefined,
  };
}

type SpawnCall = {
  command: string;
  args: string[];
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  options: any;
  child: FakeChildProcess;
};

const spawnCalls: SpawnCall[] = [];
const tempDirs: string[] = [];
let embedderAvailable = true;

const originalExistsSync = fs.existsSync;
const originalSpawn = childProcess.spawn;

fs.existsSync = ((..._args: unknown[]) => embedderAvailable) as typeof fs.existsSync;

childProcess.spawn = ((command: string, args: string[] = [], options: unknown = {}) => {
  const child = new FakeChildProcess();
  spawnCalls.push({ command: String(command), args: args.map(String), options, child });

  queueMicrotask(() => {
    child.stdout.emit('data', JSON.stringify({
      success: true,
      total_chunks: 4,
      hnsw_available: true,
      hnsw_count: 4,
      dimension: 384,
    }));
    child.emit('close', 0);
  });

  return child as unknown as ReturnType<typeof childProcess.spawn>;
}) as typeof childProcess.spawn;

after(() => {
  fs.existsSync = originalExistsSync;
  childProcess.spawn = originalSpawn;
  while (tempDirs.length > 0) {
    rmSync(tempDirs.pop() as string, { recursive: true, force: true });
  }
});

describe('unified-vector-index', () => {
  beforeEach(() => {
    embedderAvailable = true;
    spawnCalls.length = 0;
  });

  it('spawns CodexLens venv python with hidden window options', async () => {
    const projectDir = mkdtempSync(join(tmpdir(), 'ccw-unified-vector-index-'));
    tempDirs.push(projectDir);

    const moduleUrl = new URL('../dist/core/unified-vector-index.js', import.meta.url);
    moduleUrl.searchParams.set('t', String(Date.now()));
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const mod: any = await import(moduleUrl.href);

    const index = new mod.UnifiedVectorIndex(projectDir);
    const status = await index.getStatus();

    assert.equal(status.success, true);
    assert.equal(spawnCalls.length, 1);
    assert.equal(spawnCalls[0].options.shell, false);
    assert.equal(spawnCalls[0].options.windowsHide, true);
    assert.equal(spawnCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
    assert.deepEqual(spawnCalls[0].options.stdio, ['pipe', 'pipe', 'pipe']);
    assert.match(spawnCalls[0].child.stdinChunks.join(''), /"operation":"status"/);
  });
});
@@ -3,13 +3,16 @@ import assert from 'node:assert/strict';
import { execSync } from 'node:child_process';

const uvManagerPath = new URL('../dist/utils/uv-manager.js', import.meta.url).href;
const pythonUtilsPath = new URL('../dist/utils/python-utils.js', import.meta.url).href;

describe('CodexLens UV python preference', async () => {
  let mod;
  let pythonUtils;
  const originalPython = process.env.CCW_PYTHON;

  before(async () => {
    mod = await import(uvManagerPath);
    pythonUtils = await import(pythonUtilsPath);
  });

  afterEach(() => {
@@ -25,6 +28,73 @@ describe('CodexLens UV python preference', async () => {
    assert.equal(mod.getPreferredCodexLensPythonSpec(), 'C:/Custom/Python/python.exe');
  });

  it('parses py launcher commands into spawn-safe command specs', () => {
    const spec = pythonUtils.parsePythonCommandSpec('py -3.11');

    assert.equal(spec.command, 'py');
    assert.deepEqual(spec.args, ['-3.11']);
    assert.equal(spec.display, 'py -3.11');
  });

  it('treats unquoted Windows-style executable paths as a single command', () => {
    const spec = pythonUtils.parsePythonCommandSpec('C:/Program Files/Python311/python.exe');

    assert.equal(spec.command, 'C:/Program Files/Python311/python.exe');
    assert.deepEqual(spec.args, []);
    assert.equal(spec.display, '"C:/Program Files/Python311/python.exe"');
  });

  it('probes Python launcher versions without opening a shell window', () => {
    const probeCalls = [];
    const version = pythonUtils.probePythonCommandVersion(
      { command: 'py', args: ['-3.11'], display: 'py -3.11' },
      (command, args, options) => {
        probeCalls.push({ command, args, options });
        return { status: 0, stdout: '', stderr: 'Python 3.11.9\n' };
      },
    );

    assert.equal(version, 'Python 3.11.9');
    assert.equal(probeCalls.length, 1);
    assert.equal(probeCalls[0].command, 'py');
    assert.deepEqual(probeCalls[0].args, ['-3.11', '--version']);
    assert.equal(probeCalls[0].options.shell, false);
    assert.equal(probeCalls[0].options.windowsHide, true);
    assert.equal(probeCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
  });

  it('looks up uv on PATH without spawning a visible shell window', () => {
    const lookupCalls = [];
    const found = mod.__testables.findExecutableOnPath('uv', (command, args, options) => {
      lookupCalls.push({ command, args, options });
      return { status: 0, stdout: 'C:/Tools/uv.exe\n', stderr: '' };
    });

    assert.equal(found, 'C:/Tools/uv.exe');
    assert.equal(lookupCalls.length, 1);
    assert.equal(lookupCalls[0].command, process.platform === 'win32' ? 'where' : 'which');
    assert.deepEqual(lookupCalls[0].args, ['uv']);
    assert.equal(lookupCalls[0].options.shell, false);
    assert.equal(lookupCalls[0].options.windowsHide, true);
    assert.equal(lookupCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
  });

  it('checks Windows launcher preferences with hidden subprocess options', () => {
    const probeCalls = [];
    const available = mod.__testables.hasWindowsPythonLauncherVersion('3.11', (command, args, options) => {
      probeCalls.push({ command, args, options });
      return { status: 0, stdout: '', stderr: 'Python 3.11.9\n' };
    });

    assert.equal(available, true);
    assert.equal(probeCalls.length, 1);
    assert.equal(probeCalls[0].command, 'py');
    assert.deepEqual(probeCalls[0].args, ['-3.11', '--version']);
    assert.equal(probeCalls[0].options.shell, false);
    assert.equal(probeCalls[0].options.windowsHide, true);
    assert.equal(probeCalls[0].options.env.PYTHONIOENCODING, 'utf-8');
  });

  it('prefers Python 3.11 or 3.10 on Windows when available', () => {
    if (process.platform !== 'win32') return;
    delete process.env.CCW_PYTHON;
@@ -41,6 +41,56 @@ pip install codex-lens[semantic-directml]
pip install codex-lens[full]
```

### Local ONNX Reranker Bootstrap

Use the pinned bootstrap flow when you want the local-only reranker backend in an
existing CodexLens virtual environment without asking pip to resolve the whole
project extras set at once.

1. Start from the CodexLens repo root and create or activate the project venv.
2. Review the pinned install manifest in `scripts/requirements-reranker-local.txt`.
3. Render the deterministic setup plan:

```bash
python scripts/bootstrap_reranker_local.py --dry-run
```

The bootstrap script always targets the selected venv Python, installs the local
ONNX reranker stack in a fixed order, and keeps the package set pinned to the
validated Python 3.13-compatible combination:

- `numpy==2.4.0`
- `onnxruntime==1.23.2`
- `huggingface-hub==0.36.2`
- `transformers==4.53.3`
- `optimum[onnxruntime]==2.1.0`

When you are ready to apply it to the CodexLens venv, use:

```bash
python scripts/bootstrap_reranker_local.py --apply
```

To pre-download the default local reranker model (`Xenova/ms-marco-MiniLM-L-6-v2`)
into the repo-local Hugging Face cache, use:

```bash
python scripts/bootstrap_reranker_local.py --apply --download-model
```

The dry-run plan also prints the equivalent explicit model download command. On
Windows PowerShell with the default repo venv, it looks like:

```bash
.venv/Scripts/hf.exe download Xenova/ms-marco-MiniLM-L-6-v2 --local-dir .cache/huggingface/models/Xenova--ms-marco-MiniLM-L-6-v2
```

After installation, probe the backend from the same venv:

```bash
python scripts/bootstrap_reranker_local.py --apply --probe
```

## Requirements

- Python >= 3.10
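The fixed install order described above can be sketched as a small plan renderer. This is an illustration of the plan shape only, not the bootstrap script's actual implementation; the `render_plan` helper and the `.venv/bin/python` path are assumptions for the example.

```python
# Sketch: render one pinned "pip install" step per package, in the fixed order
# the bootstrap flow documents. The helper name and venv path are hypothetical.
import shlex

PINNED_STACK = [
    "numpy==2.4.0",
    "onnxruntime==1.23.2",
    "huggingface-hub==0.36.2",
    "transformers==4.53.3",
    "optimum[onnxruntime]==2.1.0",
]


def render_plan(python_exe: str) -> list[str]:
    # Every step targets the selected venv Python explicitly, so the plan is
    # deterministic regardless of which interpreter is first on PATH.
    return [
        f"{shlex.quote(python_exe)} -m pip install {shlex.quote(pin)}"
        for pin in PINNED_STACK
    ]


for step in render_plan(".venv/bin/python"):
    print(step)
```

Keeping one pinned package per step mirrors why the flow avoids resolving the whole extras set at once: each install is reproducible and fails in isolation.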
|||||||
@@ -0,0 +1,16 @@
|
|||||||
|
{"query":"executeHybridMode dense_rerank semantic smart_search","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-semantic-routing","notes":"CCW semantic mode delegates to CodexLens dense_rerank."}
|
||||||
|
{"query":"parse CodexLens JSON output strip ANSI smart_search","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-json-fallback","notes":"Covers JSON/plain-text fallback handling for CodexLens output."}
|
||||||
|
{"query":"smart_search init embed search action schema","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-action-schema","notes":"Find the Zod schema that defines init/embed/search actions."}
|
||||||
|
{"query":"auto init missing job dedupe smart_search","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-auto-init","notes":"Targets background init/embed warmup and dedupe state."}
|
||||||
|
{"query":"smart_search exact mode fallback to CodexLens fts","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-exact-fallback","notes":"Tracks the exact-mode fallback path into CodexLens FTS."}
|
||||||
|
{"query":"smart_search settings snapshot embedding backend reranker backend staged stage2 mode","relevant_paths":["ccw/src/tools/smart-search.ts"],"intent":"ccw-config-snapshot","notes":"Reads local config snapshot for embedding/reranker/staged pipeline settings."}
|
||||||
|
{"query":"embedding backend fastembed local litellm api config","relevant_paths":["codex-lens/src/codexlens/config.py"],"intent":"codexlens-embedding-config","notes":"Local-only benchmark should resolve to fastembed defaults."}
|
||||||
|
{"query":"reranker backend onnx api legacy configuration","relevant_paths":["codex-lens/src/codexlens/config.py","codex-lens/src/codexlens/env_config.py"],"intent":"codexlens-reranker-config","notes":"Covers both config dataclass fields and env overrides."}
|
||||||
|
{"query":"staged stage2 mode precomputed realtime static_global_graph","relevant_paths":["codex-lens/src/codexlens/config.py","codex-lens/src/codexlens/env_config.py"],"intent":"codexlens-stage2-config","notes":"Benchmark matrix should exercise the three supported stage2 modes."}
|
||||||
|
{"query":"enable staged rerank stage 4 config","relevant_paths":["codex-lens/src/codexlens/config.py"],"intent":"codexlens-stage4-rerank","notes":"Stage 4 rerank flag needs to stay enabled for local benchmarks."}
|
||||||
|
{"query":"cascade_search dense_rerank staged pipeline ChainSearchEngine","relevant_paths":["codex-lens/src/codexlens/search/chain_search.py"],"intent":"chain-search-cascade","notes":"Baseline query for the central retrieval engine."}
|
||||||
|
{"query":"realtime LSP expand stage2 search pipeline","relevant_paths":["codex-lens/src/codexlens/search/chain_search.py"],"intent":"chain-search-stage2-realtime","notes":"Targets realtime stage2 expansion logic."}
|
||||||
|
{"query":"static global graph stage2 expansion implementation","relevant_paths":["codex-lens/src/codexlens/search/chain_search.py"],"intent":"chain-search-stage2-static","notes":"Targets static_global_graph stage2 expansion logic."}
|
||||||
|
{"query":"cross encoder rerank stage 4 implementation","relevant_paths":["codex-lens/src/codexlens/search/chain_search.py"],"intent":"chain-search-rerank","notes":"Relevant for dense_rerank and staged rerank latency comparisons."}
|
||||||
|
{"query":"get_reranker factory onnx backend selection","relevant_paths":["codex-lens/src/codexlens/semantic/reranker/factory.py"],"intent":"reranker-factory","notes":"Keeps the benchmark aligned with local ONNX reranker selection."}
|
||||||
|
{"query":"EMBEDDING_BACKEND and RERANKER_BACKEND environment variables","relevant_paths":["codex-lens/src/codexlens/env_config.py"],"intent":"env-overrides","notes":"Covers CCW/CodexLens local-only environment overrides."}
|
||||||
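Each labeled-query line above is a standalone JSON object that must carry a `query` string and a non-empty `relevant_paths` list; the benchmark's loader rejects anything else. A minimal stdlib check of that contract (the `validate_labeled_query` helper name is an assumption for illustration):

```python
import json


def validate_labeled_query(line: str) -> dict:
    """Validate one JSONL entry: an object with query + non-empty relevant_paths."""
    item = json.loads(line)
    if not isinstance(item, dict) or "query" not in item or "relevant_paths" not in item:
        raise ValueError(f"expected object with query/relevant_paths: {item!r}")
    if not isinstance(item["relevant_paths"], list) or not item["relevant_paths"]:
        raise ValueError(f"relevant_paths must be a non-empty list: {item!r}")
    return item


# One of the dataset lines shown above:
line = (
    '{"query":"get_reranker factory onnx backend selection",'
    '"relevant_paths":["codex-lens/src/codexlens/semantic/reranker/factory.py"],'
    '"intent":"reranker-factory"}'
)
item = validate_labeled_query(line)
print(item["intent"])  # -> reranker-factory
```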
@@ -239,6 +239,7 @@ def main() -> None:
    config.staged_clustering_strategy = str(args.staged_cluster_strategy or "path").strip().lower()
    # Stability: on some Windows setups, DirectML/ONNX can crash under load.
    config.embedding_use_gpu = False
    config.reranker_use_gpu = False

    registry = RegistryStore()
    registry.initialize()
@@ -362,4 +363,3 @@ def main() -> None:

if __name__ == "__main__":
    main()
codex-lens/benchmarks/compare_ccw_smart_search_stage2.py (new file, 980 lines)
@@ -0,0 +1,980 @@
#!/usr/bin/env python
"""Benchmark local-only staged stage2 modes for CCW smart_search queries.

This benchmark reuses the existing CodexLens benchmark style, but focuses on
the real search intents that drive CCW `smart_search`. It evaluates:

1. `dense_rerank` baseline
2. `staged` + `precomputed`
3. `staged` + `realtime`
4. `staged` + `static_global_graph`

Metrics:
- Hit@K
- MRR@K
- Recall@K
- latency (avg/p50/p95)

The runner is intentionally local-only. By default it uses:
- embedding backend: `fastembed`
- reranker backend: `onnx`

Examples:
    python benchmarks/compare_ccw_smart_search_stage2.py --dry-run
    python benchmarks/compare_ccw_smart_search_stage2.py --self-check
    python benchmarks/compare_ccw_smart_search_stage2.py --source .. --k 10
    python benchmarks/compare_ccw_smart_search_stage2.py --embedding-model code --reranker-model cross-encoder/ms-marco-MiniLM-L-6-v2
"""

from __future__ import annotations

import argparse
from copy import deepcopy
import gc
import json
import os
import re
import statistics
import sys
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from codexlens.config import Config
from codexlens.search.chain_search import ChainSearchEngine, SearchOptions
from codexlens.search.ranking import (
    QueryIntent,
    detect_query_intent,
    is_generated_artifact_path,
    is_test_file,
    query_prefers_lexical_search,
    query_targets_generated_files,
)
from codexlens.storage.path_mapper import PathMapper
from codexlens.storage.registry import RegistryStore


DEFAULT_SOURCE = Path(__file__).resolve().parents[2]
DEFAULT_QUERIES_FILE = Path(__file__).parent / "accuracy_queries_ccw_smart_search.jsonl"
DEFAULT_OUTPUT = Path(__file__).parent / "results" / "ccw_smart_search_stage2.json"

VALID_STAGE2_MODES = ("precomputed", "realtime", "static_global_graph")
VALID_LOCAL_EMBEDDING_BACKENDS = ("fastembed",)
VALID_LOCAL_RERANKER_BACKENDS = ("onnx", "fastembed", "legacy")
VALID_BASELINE_METHODS = ("auto", "fts", "hybrid")
DEFAULT_LOCAL_ONNX_RERANKER_MODEL = "Xenova/ms-marco-MiniLM-L-6-v2"
def _now_ms() -> float:
    return time.perf_counter() * 1000.0


def _normalize_path_key(path: str) -> str:
    try:
        candidate = Path(path)
        if str(candidate) and (candidate.is_absolute() or re.match(r"^[A-Za-z]:", str(candidate))):
            normalized = str(candidate.resolve())
        else:
            normalized = str(candidate)
    except Exception:
        normalized = path
    normalized = normalized.replace("/", "\\")
    if os.name == "nt":
        normalized = normalized.lower()
    return normalized


def _dedup_topk(paths: Iterable[str], k: int) -> List[str]:
    output: List[str] = []
    seen: set[str] = set()
    for path in paths:
        if path in seen:
            continue
        seen.add(path)
        output.append(path)
        if len(output) >= k:
            break
    return output


def _first_hit_rank(topk_paths: Sequence[str], relevant: set[str]) -> Optional[int]:
    for index, path in enumerate(topk_paths, start=1):
        if path in relevant:
            return index
    return None


def _mrr(ranks: Sequence[Optional[int]]) -> float:
    values = [1.0 / rank for rank in ranks if rank and rank > 0]
    return statistics.mean(values) if values else 0.0


def _mean(values: Sequence[float]) -> float:
    return statistics.mean(values) if values else 0.0


def _percentile(values: Sequence[float], percentile: float) -> float:
    if not values:
        return 0.0
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    index = (len(ordered) - 1) * percentile
    lower = int(index)
    upper = min(lower + 1, len(ordered) - 1)
    if lower == upper:
        return ordered[lower]
    fraction = index - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
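The metric helpers can be made concrete with a toy ranking. The sketch below restates `_mrr` and `_percentile` in self-contained form (same formulas as the benchmark helpers above) and checks them by hand:

```python
import statistics
from typing import Optional, Sequence


def mrr(ranks: Sequence[Optional[int]]) -> float:
    # Mean reciprocal rank over queries that had a hit; 0.0 when nothing hit.
    values = [1.0 / r for r in ranks if r and r > 0]
    return statistics.mean(values) if values else 0.0


def percentile(values: Sequence[float], p: float) -> float:
    # Linear interpolation between closest ranks, as in the benchmark helper.
    if not values:
        return 0.0
    ordered = sorted(values)
    idx = (len(ordered) - 1) * p
    lo = int(idx)
    hi = min(lo + 1, len(ordered) - 1)
    if lo == hi:
        return ordered[lo]
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (idx - lo)


# First-hit ranks for three queries: hit at 1, hit at 4, no hit at all.
print(mrr([1, 4, None]))  # -> 0.625  ((1.0 + 0.25) / 2; the miss contributes nothing)
print(percentile([10.0, 20.0, 30.0, 40.0], 0.95))  # -> 38.5 (interpolated between 30 and 40)
```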
def _load_labeled_queries(path: Path, limit: Optional[int]) -> List[Dict[str, Any]]:
|
||||||
|
if not path.is_file():
|
||||||
|
raise SystemExit(f"Queries file does not exist: {path}")
|
||||||
|
|
||||||
|
output: List[Dict[str, Any]] = []
|
||||||
|
for raw_line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
|
||||||
|
line = raw_line.strip()
|
||||||
|
if not line or line.startswith("#"):
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
item = json.loads(line)
|
||||||
|
except Exception as exc:
|
||||||
|
raise SystemExit(f"Invalid JSONL line in {path}: {raw_line!r} ({exc})") from exc
|
||||||
|
if not isinstance(item, dict) or "query" not in item or "relevant_paths" not in item:
|
||||||
|
raise SystemExit(f"Invalid query item (expected object with query/relevant_paths): {item!r}")
|
||||||
|
relevant_paths = item.get("relevant_paths")
|
||||||
|
if not isinstance(relevant_paths, list) or not relevant_paths:
|
||||||
|
raise SystemExit(f"Query item must include non-empty relevant_paths[]: {item!r}")
|
||||||
|
output.append(item)
|
||||||
|
if limit is not None and len(output) >= limit:
|
||||||
|
break
|
||||||
|
return output
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_expected_paths(source_root: Path, paths: Sequence[str]) -> Tuple[List[str], set[str], List[str]]:
|
||||||
|
resolved_display: List[str] = []
|
||||||
|
resolved_keys: set[str] = set()
|
||||||
|
missing: List[str] = []
|
||||||
|
|
||||||
|
for raw_path in paths:
|
||||||
|
candidate = Path(raw_path)
|
||||||
|
if not candidate.is_absolute():
|
||||||
|
candidate = (source_root / candidate).resolve()
|
||||||
|
if not candidate.exists():
|
||||||
|
missing.append(str(candidate))
|
||||||
|
resolved_display.append(str(candidate))
|
||||||
|
resolved_keys.add(_normalize_path_key(str(candidate)))
|
||||||
|
return resolved_display, resolved_keys, missing
|
||||||
|
|
||||||
|
|
||||||
|
def _validate_local_only_backends(embedding_backend: str, reranker_backend: str) -> None:
|
||||||
|
if embedding_backend not in VALID_LOCAL_EMBEDDING_BACKENDS:
|
||||||
|
raise SystemExit(
|
||||||
|
"This runner is local-only. "
|
||||||
|
f"--embedding-backend must be one of {', '.join(VALID_LOCAL_EMBEDDING_BACKENDS)}; got {embedding_backend!r}"
|
||||||
|
)
|
||||||
|
if reranker_backend not in VALID_LOCAL_RERANKER_BACKENDS:
|
||||||
|
raise SystemExit(
|
||||||
|
"This runner is local-only. "
|
||||||
|
f"--reranker-backend must be one of {', '.join(VALID_LOCAL_RERANKER_BACKENDS)}; got {reranker_backend!r}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _validate_stage2_modes(stage2_modes: Sequence[str]) -> List[str]:
|
||||||
|
normalized = [str(mode).strip().lower() for mode in stage2_modes if str(mode).strip()]
|
||||||
|
if not normalized:
|
||||||
|
raise SystemExit("At least one --stage2-modes entry is required")
|
||||||
|
invalid = [mode for mode in normalized if mode not in VALID_STAGE2_MODES]
|
||||||
|
if invalid:
|
||||||
|
raise SystemExit(
|
||||||
|
f"Invalid --stage2-modes entry: {invalid[0]} "
|
||||||
|
f"(valid: {', '.join(VALID_STAGE2_MODES)})"
|
||||||
|
)
|
||||||
|
deduped: List[str] = []
|
||||||
|
seen: set[str] = set()
|
||||||
|
for mode in normalized:
|
||||||
|
if mode in seen:
|
||||||
|
continue
|
||||||
|
seen.add(mode)
|
||||||
|
deduped.append(mode)
|
||||||
|
return deduped
|
||||||
|
|
||||||
|
|
||||||
|
def _validate_baseline_methods(methods: Sequence[str]) -> List[str]:
|
||||||
|
normalized = [str(method).strip().lower() for method in methods if str(method).strip()]
|
||||||
|
invalid = [method for method in normalized if method not in VALID_BASELINE_METHODS]
|
||||||
|
if invalid:
|
||||||
|
raise SystemExit(
|
||||||
|
f"Invalid --baseline-methods entry: {invalid[0]} "
|
||||||
|
f"(valid: {', '.join(VALID_BASELINE_METHODS)})"
|
||||||
|
)
|
||||||
|
deduped: List[str] = []
|
||||||
|
seen: set[str] = set()
|
||||||
|
for method in normalized:
|
||||||
|
if method in seen:
|
||||||
|
continue
|
||||||
|
seen.add(method)
|
||||||
|
deduped.append(method)
|
||||||
|
return deduped
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class StrategyRun:
|
||||||
|
strategy_key: str
|
||||||
|
strategy: str
|
||||||
|
stage2_mode: Optional[str]
|
||||||
|
effective_method: str
|
||||||
|
execution_method: str
|
||||||
|
latency_ms: float
|
||||||
|
topk_paths: List[str]
|
||||||
|
first_hit_rank: Optional[int]
|
||||||
|
hit_at_k: bool
|
||||||
|
recall_at_k: float
|
||||||
|
generated_artifact_count: int
|
||||||
|
test_file_count: int
|
||||||
|
error: Optional[str] = None
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class QueryEvaluation:
|
||||||
|
query: str
|
||||||
|
intent: Optional[str]
|
||||||
|
notes: Optional[str]
|
||||||
|
relevant_paths: List[str]
|
||||||
|
runs: Dict[str, StrategyRun]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class PairwiseDelta:
|
||||||
|
mode_a: str
|
||||||
|
mode_b: str
|
||||||
|
hit_at_k_delta: float
|
||||||
|
mrr_at_k_delta: float
|
||||||
|
avg_recall_at_k_delta: float
|
||||||
|
avg_latency_ms_delta: float
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class StrategySpec:
|
||||||
|
strategy_key: str
|
||||||
|
strategy: str
|
||||||
|
stage2_mode: Optional[str]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class StrategyRuntime:
|
||||||
|
strategy_spec: StrategySpec
|
||||||
|
config: Config
|
||||||
|
registry: RegistryStore
|
||||||
|
engine: ChainSearchEngine
|
||||||
|
|
||||||
|
|
||||||
|
def _strategy_specs(
|
||||||
|
stage2_modes: Sequence[str],
|
||||||
|
include_dense_baseline: bool,
|
||||||
|
*,
|
||||||
|
baseline_methods: Sequence[str],
|
||||||
|
) -> List[StrategySpec]:
|
||||||
|
specs: List[StrategySpec] = []
|
||||||
|
for method in baseline_methods:
|
||||||
|
specs.append(StrategySpec(strategy_key=method, strategy=method, stage2_mode=None))
|
||||||
|
if include_dense_baseline:
|
||||||
|
specs.append(StrategySpec(strategy_key="dense_rerank", strategy="dense_rerank", stage2_mode=None))
|
||||||
|
for stage2_mode in stage2_modes:
|
||||||
|
specs.append(
|
||||||
|
StrategySpec(
|
||||||
|
strategy_key=f"staged:{stage2_mode}",
|
||||||
|
strategy="staged",
|
||||||
|
stage2_mode=stage2_mode,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return specs
|
||||||
|
|
||||||
|
|
||||||
|
def _build_strategy_runtime(base_config: Config, strategy_spec: StrategySpec) -> StrategyRuntime:
|
||||||
|
runtime_config = deepcopy(base_config)
|
||||||
|
registry = RegistryStore()
|
||||||
|
registry.initialize()
|
||||||
|
mapper = PathMapper()
|
||||||
|
engine = ChainSearchEngine(registry=registry, mapper=mapper, config=runtime_config)
|
||||||
|
return StrategyRuntime(
|
||||||
|
strategy_spec=strategy_spec,
|
||||||
|
config=runtime_config,
|
||||||
|
registry=registry,
|
||||||
|
engine=engine,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _select_effective_method(query: str, requested_method: str) -> str:
|
||||||
|
requested = str(requested_method).strip().lower()
|
||||||
|
if requested != "auto":
|
||||||
|
return requested
|
||||||
|
if query_targets_generated_files(query) or query_prefers_lexical_search(query):
|
||||||
|
return "fts"
|
||||||
|
intent = detect_query_intent(query)
|
||||||
|
if intent == QueryIntent.KEYWORD:
|
||||||
|
return "fts"
|
||||||
|
if intent == QueryIntent.SEMANTIC:
|
||||||
|
return "dense_rerank"
|
||||||
|
return "hybrid"
|
||||||
|
|
||||||
|
|
||||||
|
def _filter_dataset_by_query_match(
|
||||||
|
dataset: Sequence[Dict[str, Any]],
|
||||||
|
query_match: Optional[str],
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""Filter labeled queries by case-insensitive substring match."""
|
||||||
|
needle = str(query_match or "").strip().casefold()
|
||||||
|
if not needle:
|
||||||
|
return list(dataset)
|
||||||
|
return [
|
||||||
|
dict(item)
|
||||||
|
for item in dataset
|
||||||
|
if needle in str(item.get("query", "")).casefold()
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _apply_query_limit(
|
||||||
|
dataset: Sequence[Dict[str, Any]],
|
||||||
|
query_limit: Optional[int],
|
||||||
|
) -> List[Dict[str, Any]]:
|
||||||
|
"""Apply the optional query limit after any dataset-level filtering."""
|
||||||
|
if query_limit is None:
|
||||||
|
return list(dataset)
|
||||||
|
return [dict(item) for item in list(dataset)[: max(0, int(query_limit))]]
|
||||||
|
|
||||||
|
|
||||||
|
def _write_json_payload(path: Path, payload: Dict[str, Any]) -> None:
|
||||||
|
"""Persist a benchmark payload as UTF-8 JSON."""
|
||||||
|
path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
|
||||||
|
|
||||||
|
|
||||||
|
def _write_final_outputs(
|
||||||
|
*,
|
||||||
|
output_path: Path,
|
||||||
|
progress_output: Optional[Path],
|
||||||
|
payload: Dict[str, Any],
|
||||||
|
) -> None:
|
||||||
|
"""Persist the final completed payload to both result and progress outputs."""
|
||||||
|
_write_json_payload(output_path, payload)
|
||||||
|
if progress_output is not None:
|
||||||
|
_write_json_payload(progress_output, payload)
|
||||||
|
|
||||||
|
|
||||||
|
def _make_progress_payload(
    *,
    args: argparse.Namespace,
    source_root: Path,
    strategy_specs: Sequence[StrategySpec],
    evaluations: Sequence[QueryEvaluation],
    query_index: int,
    total_queries: int,
    run_index: int,
    total_runs: int,
    current_query: str,
    current_strategy_key: str,
) -> Dict[str, Any]:
    """Create a partial progress snapshot for long benchmark runs."""
    return {
        "status": "running",
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "source": str(source_root),
        "queries_file": str(args.queries_file),
        "query_count": len(evaluations),
        "planned_query_count": total_queries,
        "k": int(args.k),
        "coarse_k": int(args.coarse_k),
        "strategy_keys": [spec.strategy_key for spec in strategy_specs],
        "progress": {
            "completed_queries": query_index,
            "total_queries": total_queries,
            "completed_runs": run_index,
            "total_runs": total_runs,
            "current_query": current_query,
            "current_strategy_key": current_strategy_key,
        },
        "evaluations": [
            {
                "query": evaluation.query,
                "intent": evaluation.intent,
                "notes": evaluation.notes,
                "relevant_paths": evaluation.relevant_paths,
                "runs": {key: asdict(run) for key, run in evaluation.runs.items()},
            }
            for evaluation in evaluations
        ],
    }


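The helper `_write_json_payload` is defined earlier in the file and not shown in this hunk. Since the progress file is rewritten after every query while other tools may be polling it, a natural implementation writes atomically: dump to a temp file in the same directory, then rename over the target. A minimal sketch under that assumption (the name and exact behavior of the real helper are not confirmed by this diff):

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_payload(path: Path, payload: dict) -> None:
    """Write JSON atomically: dump to a sibling temp file, then os.replace onto the target.

    os.replace is atomic on POSIX and on Windows (same volume), so a concurrent
    reader never observes a half-written snapshot.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False, indent=2)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the stray temp file if the dump or rename failed.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

The same-directory temp file matters: `os.replace` across filesystems is not atomic, so the temp file must live next to its target.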
def _make_search_options(method: str, *, k: int) -> SearchOptions:
    normalized = str(method).strip().lower()
    if normalized == "fts":
        return SearchOptions(
            total_limit=k,
            hybrid_mode=False,
            enable_fuzzy=False,
            enable_vector=False,
            pure_vector=False,
            enable_cascade=False,
        )
    if normalized == "hybrid":
        return SearchOptions(
            total_limit=k,
            hybrid_mode=True,
            enable_fuzzy=False,
            enable_vector=True,
            pure_vector=False,
            enable_cascade=False,
        )
    if normalized in {"dense_rerank", "staged"}:
        return SearchOptions(
            total_limit=k,
            hybrid_mode=True,
            enable_fuzzy=False,
            enable_vector=True,
            pure_vector=False,
            enable_cascade=True,
        )
    raise ValueError(f"Unsupported benchmark method: {method}")


def _run_strategy(
    engine: ChainSearchEngine,
    config: Config,
    *,
    strategy_spec: StrategySpec,
    query: str,
    source_path: Path,
    k: int,
    coarse_k: int,
    relevant: set[str],
) -> StrategyRun:
    gc.collect()
    effective_method = _select_effective_method(query, strategy_spec.strategy)
    execution_method = "cascade" if effective_method in {"dense_rerank", "staged"} else effective_method
    previous_cascade_strategy = getattr(config, "cascade_strategy", None)
    previous_stage2_mode = getattr(config, "staged_stage2_mode", None)

    start_ms = _now_ms()
    try:
        options = _make_search_options(
            "staged" if strategy_spec.strategy == "staged" else effective_method,
            k=k,
        )
        if strategy_spec.strategy == "staged":
            config.cascade_strategy = "staged"
            if strategy_spec.stage2_mode:
                config.staged_stage2_mode = strategy_spec.stage2_mode
            result = engine.cascade_search(
                query=query,
                source_path=source_path,
                k=k,
                coarse_k=coarse_k,
                options=options,
                strategy="staged",
            )
        elif effective_method == "dense_rerank":
            config.cascade_strategy = "dense_rerank"
            result = engine.cascade_search(
                query=query,
                source_path=source_path,
                k=k,
                coarse_k=coarse_k,
                options=options,
                strategy="dense_rerank",
            )
        else:
            result = engine.search(
                query=query,
                source_path=source_path,
                options=options,
            )
        latency_ms = _now_ms() - start_ms
        paths_raw = [item.path for item in (result.results or []) if getattr(item, "path", None)]
        topk = _dedup_topk((_normalize_path_key(path) for path in paths_raw), k=k)
        rank = _first_hit_rank(topk, relevant)
        recall = 0.0
        if relevant:
            recall = len(set(topk) & relevant) / float(len(relevant))
        return StrategyRun(
            strategy_key=strategy_spec.strategy_key,
            strategy=strategy_spec.strategy,
            stage2_mode=strategy_spec.stage2_mode,
            effective_method=effective_method,
            execution_method=execution_method,
            latency_ms=latency_ms,
            topk_paths=topk,
            first_hit_rank=rank,
            hit_at_k=rank is not None,
            recall_at_k=recall,
            generated_artifact_count=sum(1 for path in topk if is_generated_artifact_path(path)),
            test_file_count=sum(1 for path in topk if is_test_file(path)),
            error=None,
        )
    except Exception as exc:
        latency_ms = _now_ms() - start_ms
        return StrategyRun(
            strategy_key=strategy_spec.strategy_key,
            strategy=strategy_spec.strategy,
            stage2_mode=strategy_spec.stage2_mode,
            effective_method=effective_method,
            execution_method=execution_method,
            latency_ms=latency_ms,
            topk_paths=[],
            first_hit_rank=None,
            hit_at_k=False,
            recall_at_k=0.0,
            generated_artifact_count=0,
            test_file_count=0,
            error=f"{type(exc).__name__}: {exc}",
        )
    finally:
        config.cascade_strategy = previous_cascade_strategy
        config.staged_stage2_mode = previous_stage2_mode


def _summarize_runs(runs: Sequence[StrategyRun]) -> Dict[str, Any]:
    latencies = [run.latency_ms for run in runs if not run.error]
    ranks = [run.first_hit_rank for run in runs]
    effective_method_counts: Dict[str, int] = {}
    for run in runs:
        effective_method_counts[run.effective_method] = effective_method_counts.get(run.effective_method, 0) + 1
    return {
        "query_count": len(runs),
        "hit_at_k": _mean([1.0 if run.hit_at_k else 0.0 for run in runs]),
        "mrr_at_k": _mrr(ranks),
        "avg_recall_at_k": _mean([run.recall_at_k for run in runs]),
        "avg_latency_ms": _mean(latencies),
        "p50_latency_ms": _percentile(latencies, 0.50),
        "p95_latency_ms": _percentile(latencies, 0.95),
        "avg_generated_artifact_count": _mean([float(run.generated_artifact_count) for run in runs]),
        "avg_test_file_count": _mean([float(run.test_file_count) for run in runs]),
        "runs_with_generated_artifacts": sum(1 for run in runs if run.generated_artifact_count > 0),
        "runs_with_test_files": sum(1 for run in runs if run.test_file_count > 0),
        "effective_methods": effective_method_counts,
        "errors": sum(1 for run in runs if run.error),
    }


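`_mrr` and the hit-rate aggregation above operate on 1-based first-hit ranks where `None` marks a query with no relevant result in the top-k. The helpers themselves are defined earlier in the file and not shown in this hunk; a plausible standalone sketch of the two metrics (names assumed to match the convention used here):

```python
from typing import Optional, Sequence


def mrr(ranks: Sequence[Optional[int]]) -> float:
    """Mean reciprocal rank over 1-based first-hit ranks; misses (None) contribute 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank is not None) / len(ranks)


def hit_rate(ranks: Sequence[Optional[int]]) -> float:
    """Fraction of queries whose first relevant result appeared within the top-k."""
    if not ranks:
        return 0.0
    return sum(1 for rank in ranks if rank is not None) / len(ranks)
```

Note that misses divide into the denominator rather than being dropped, so an all-miss run scores 0.0 instead of raising on an empty sum.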
def _build_pairwise_deltas(stage2_summaries: Dict[str, Dict[str, Any]]) -> List[PairwiseDelta]:
    modes = list(stage2_summaries.keys())
    deltas: List[PairwiseDelta] = []
    for left_index in range(len(modes)):
        for right_index in range(left_index + 1, len(modes)):
            left = modes[left_index]
            right = modes[right_index]
            left_summary = stage2_summaries[left]
            right_summary = stage2_summaries[right]
            deltas.append(
                PairwiseDelta(
                    mode_a=left,
                    mode_b=right,
                    hit_at_k_delta=left_summary["hit_at_k"] - right_summary["hit_at_k"],
                    mrr_at_k_delta=left_summary["mrr_at_k"] - right_summary["mrr_at_k"],
                    avg_recall_at_k_delta=left_summary["avg_recall_at_k"] - right_summary["avg_recall_at_k"],
                    avg_latency_ms_delta=left_summary["avg_latency_ms"] - right_summary["avg_latency_ms"],
                )
            )
    return deltas


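The nested index loop above enumerates each unordered pair of modes exactly once (a minus b, never b minus a). The same traversal can be written with `itertools.combinations`, which makes the "unordered pairs, insertion order preserved" intent explicit; a small sketch restricted to one metric for illustration:

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple


def pairwise_latency_deltas(summaries: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, float]]:
    """For every unordered mode pair (a, b), report avg_latency_ms(a) - avg_latency_ms(b).

    combinations() over a dict yields key pairs in insertion order, matching
    the index-based double loop it replaces.
    """
    return [
        (a, b, summaries[a]["avg_latency_ms"] - summaries[b]["avg_latency_ms"])
        for a, b in combinations(summaries, 2)
    ]
```

Either form produces n*(n-1)/2 deltas for n modes; the sign convention (first mode minus second) is what the report's `mode_a`/`mode_b` labels rely on.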
def _make_plan_payload(
    *,
    args: argparse.Namespace,
    source_root: Path,
    dataset: Sequence[Dict[str, Any]],
    baseline_methods: Sequence[str],
    stage2_modes: Sequence[str],
    strategy_specs: Sequence[StrategySpec],
) -> Dict[str, Any]:
    return {
        "mode": "dry-run" if args.dry_run else "self-check",
        "local_only": True,
        "source": str(source_root),
        "queries_file": str(args.queries_file),
        "query_count": len(dataset),
        "query_match": args.query_match,
        "k": int(args.k),
        "coarse_k": int(args.coarse_k),
        "baseline_methods": list(baseline_methods),
        "stage2_modes": list(stage2_modes),
        "strategy_keys": [spec.strategy_key for spec in strategy_specs],
        "local_backends": {
            "embedding_backend": args.embedding_backend,
            "embedding_model": args.embedding_model,
            "reranker_backend": args.reranker_backend,
            "reranker_model": args.reranker_model,
            "embedding_use_gpu": bool(args.embedding_use_gpu),
            "reranker_use_gpu": bool(args.reranker_use_gpu),
        },
        "output": str(args.output),
        "progress_output": str(args.progress_output) if args.progress_output else None,
        "dataset_preview": [
            {
                "query": item.get("query"),
                "intent": item.get("intent"),
                "relevant_paths": item.get("relevant_paths"),
            }
            for item in list(dataset)[: min(3, len(dataset))]
        ],
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--source",
        type=Path,
        default=DEFAULT_SOURCE,
        help="Source root to benchmark. Defaults to the repository root so CCW and CodexLens paths resolve together.",
    )
    parser.add_argument(
        "--queries-file",
        type=Path,
        default=DEFAULT_QUERIES_FILE,
        help="Labeled JSONL dataset of CCW smart_search queries",
    )
    parser.add_argument("--query-limit", type=int, default=None, help="Optional query limit")
    parser.add_argument(
        "--query-match",
        type=str,
        default=None,
        help="Optional case-insensitive substring filter for selecting specific benchmark queries.",
    )
    parser.add_argument("--k", type=int, default=10, help="Top-k to evaluate")
    parser.add_argument("--coarse-k", type=int, default=100, help="Stage-1 coarse_k")
    parser.add_argument(
        "--baseline-methods",
        nargs="*",
        default=list(VALID_BASELINE_METHODS),
        help="Requested smart_search baselines to compare before staged modes (valid: auto, fts, hybrid).",
    )
    parser.add_argument(
        "--stage2-modes",
        nargs="*",
        default=list(VALID_STAGE2_MODES),
        help="Stage-2 modes to compare",
    )
    parser.add_argument("--warmup", type=int, default=0, help="Warmup iterations per strategy")
    parser.add_argument(
        "--embedding-backend",
        default="fastembed",
        help="Local embedding backend. This runner only accepts fastembed.",
    )
    parser.add_argument(
        "--embedding-model",
        default="code",
        help="Embedding model/profile for the local embedding backend",
    )
    parser.add_argument(
        "--embedding-use-gpu",
        action="store_true",
        help="Enable GPU acceleration for local embeddings. Off by default for stability.",
    )
    parser.add_argument(
        "--reranker-backend",
        default="onnx",
        help="Local reranker backend. Supported local values: onnx, fastembed, legacy.",
    )
    parser.add_argument(
        "--reranker-model",
        default=DEFAULT_LOCAL_ONNX_RERANKER_MODEL,
        help="Reranker model name for the local reranker backend",
    )
    parser.add_argument(
        "--reranker-use-gpu",
        action="store_true",
        help="Enable GPU acceleration for the local reranker. Off by default for stability.",
    )
    parser.add_argument(
        "--skip-dense-baseline",
        action="store_true",
        help="Only compare staged stage2 modes and skip the dense_rerank baseline.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Validate dataset/config and print the benchmark plan without running retrieval.",
    )
    parser.add_argument(
        "--self-check",
        action="store_true",
        help="Smoke-check the entrypoint by validating dataset, source paths, and stage matrix wiring.",
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=DEFAULT_OUTPUT,
        help="Output JSON path",
    )
    parser.add_argument(
        "--progress-output",
        type=Path,
        default=None,
        help="Optional JSON path updated after each query with partial progress and completed runs.",
    )
    return parser


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    source_root = args.source.expanduser().resolve()
    if not source_root.exists():
        raise SystemExit(f"Source path does not exist: {source_root}")
    if int(args.k) <= 0:
        raise SystemExit("--k must be > 0")
    if int(args.coarse_k) <= 0:
        raise SystemExit("--coarse-k must be > 0")
    if int(args.coarse_k) < int(args.k):
        raise SystemExit("--coarse-k must be >= --k")
    if int(args.warmup) < 0:
        raise SystemExit("--warmup must be >= 0")

    embedding_backend = str(args.embedding_backend).strip().lower()
    reranker_backend = str(args.reranker_backend).strip().lower()
    _validate_local_only_backends(embedding_backend, reranker_backend)
    baseline_methods = _validate_baseline_methods(args.baseline_methods)
    stage2_modes = _validate_stage2_modes(args.stage2_modes)

    dataset = _load_labeled_queries(args.queries_file, None)
    dataset = _filter_dataset_by_query_match(dataset, args.query_match)
    dataset = _apply_query_limit(dataset, args.query_limit)
    if not dataset:
        raise SystemExit("No queries to run")

    missing_paths: List[str] = []
    for item in dataset:
        _, _, item_missing = _resolve_expected_paths(source_root, [str(path) for path in item["relevant_paths"]])
        missing_paths.extend(item_missing)
    if missing_paths:
        preview = ", ".join(missing_paths[:3])
        raise SystemExit(
            "Dataset relevant_paths do not resolve under the selected source root. "
            f"Examples: {preview}"
        )

    strategy_specs = _strategy_specs(
        stage2_modes,
        include_dense_baseline=not args.skip_dense_baseline,
        baseline_methods=baseline_methods,
    )

    if args.dry_run or args.self_check:
        payload = _make_plan_payload(
            args=args,
            source_root=source_root,
            dataset=dataset,
            baseline_methods=baseline_methods,
            stage2_modes=stage2_modes,
            strategy_specs=strategy_specs,
        )
        if args.self_check:
            payload["status"] = "ok"
            payload["checks"] = {
                "dataset_loaded": True,
                "stage2_matrix_size": len(stage2_modes),
                "local_only_validation": True,
                "source_path_exists": True,
            }
        print(json.dumps(payload, ensure_ascii=False, indent=2))
        return

    config = Config.load()
    config.cascade_strategy = "staged"
    config.enable_staged_rerank = True
    config.enable_cross_encoder_rerank = True
    config.embedding_backend = embedding_backend
    config.embedding_model = str(args.embedding_model).strip()
    config.embedding_use_gpu = bool(args.embedding_use_gpu)
    config.embedding_auto_embed_missing = False
    config.reranker_backend = reranker_backend
    config.reranker_model = str(args.reranker_model).strip()
    config.reranker_use_gpu = bool(args.reranker_use_gpu)

    strategy_runtimes = {
        spec.strategy_key: _build_strategy_runtime(config, spec)
        for spec in strategy_specs
    }

    evaluations: List[QueryEvaluation] = []
    total_queries = len(dataset)
    total_runs = total_queries * len(strategy_specs)
    completed_runs = 0

    try:
        if int(args.warmup) > 0:
            warm_query = str(dataset[0]["query"]).strip()
            warm_relevant_paths = [str(path) for path in dataset[0]["relevant_paths"]]
            _, warm_relevant, _ = _resolve_expected_paths(source_root, warm_relevant_paths)
            for spec in strategy_specs:
                runtime = strategy_runtimes[spec.strategy_key]
                for _ in range(int(args.warmup)):
                    _run_strategy(
                        runtime.engine,
                        runtime.config,
                        strategy_spec=spec,
                        query=warm_query,
                        source_path=source_root,
                        k=min(int(args.k), 5),
                        coarse_k=min(int(args.coarse_k), 50),
                        relevant=warm_relevant,
                    )

        for index, item in enumerate(dataset, start=1):
            query = str(item.get("query", "")).strip()
            if not query:
                continue
            print(f"[query {index}/{total_queries}] {query}", flush=True)
            relevant_paths, relevant, _ = _resolve_expected_paths(
                source_root,
                [str(path) for path in item["relevant_paths"]],
            )
            runs: Dict[str, StrategyRun] = {}
            for spec in strategy_specs:
                if args.progress_output is not None:
                    _write_json_payload(
                        args.progress_output,
                        _make_progress_payload(
                            args=args,
                            source_root=source_root,
                            strategy_specs=strategy_specs,
                            evaluations=evaluations,
                            query_index=index - 1,
                            total_queries=total_queries,
                            run_index=completed_runs,
                            total_runs=total_runs,
                            current_query=query,
                            current_strategy_key=spec.strategy_key,
                        ),
                    )
                print(
                    f"[run {completed_runs + 1}/{total_runs}] "
                    f"strategy={spec.strategy_key} query={query}",
                    flush=True,
                )
                runtime = strategy_runtimes[spec.strategy_key]
                runs[spec.strategy_key] = _run_strategy(
                    runtime.engine,
                    runtime.config,
                    strategy_spec=spec,
                    query=query,
                    source_path=source_root,
                    k=int(args.k),
                    coarse_k=int(args.coarse_k),
                    relevant=relevant,
                )
                completed_runs += 1
                run = runs[spec.strategy_key]
                outcome = "error" if run.error else "ok"
                print(
                    f"[done {completed_runs}/{total_runs}] "
                    f"strategy={spec.strategy_key} outcome={outcome} "
                    f"latency_ms={run.latency_ms:.2f} "
                    f"first_hit_rank={run.first_hit_rank}",
                    flush=True,
                )
            evaluations.append(
                QueryEvaluation(
                    query=query,
                    intent=str(item.get("intent")) if item.get("intent") is not None else None,
                    notes=str(item.get("notes")) if item.get("notes") is not None else None,
                    relevant_paths=relevant_paths,
                    runs=runs,
                )
            )
            if args.progress_output is not None:
                _write_json_payload(
                    args.progress_output,
                    _make_progress_payload(
                        args=args,
                        source_root=source_root,
                        strategy_specs=strategy_specs,
                        evaluations=evaluations,
                        query_index=index,
                        total_queries=total_queries,
                        run_index=completed_runs,
                        total_runs=total_runs,
                        current_query=query,
                        current_strategy_key="complete",
                    ),
                )
    finally:
        for runtime in strategy_runtimes.values():
            try:
                runtime.engine.close()
            except Exception:
                pass
        for runtime in strategy_runtimes.values():
            try:
                runtime.registry.close()
            except Exception:
                pass

    strategy_summaries: Dict[str, Dict[str, Any]] = {}
    for spec in strategy_specs:
        spec_runs = [evaluation.runs[spec.strategy_key] for evaluation in evaluations if spec.strategy_key in evaluation.runs]
        summary = _summarize_runs(spec_runs)
        summary["strategy"] = spec.strategy
        summary["stage2_mode"] = spec.stage2_mode
        strategy_summaries[spec.strategy_key] = summary

    stage2_mode_matrix = {
        mode: strategy_summaries[f"staged:{mode}"]
        for mode in stage2_modes
        if f"staged:{mode}" in strategy_summaries
    }
    pairwise_deltas = [asdict(item) for item in _build_pairwise_deltas(stage2_mode_matrix)]

    payload = {
        "status": "completed",
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "source": str(source_root),
        "queries_file": str(args.queries_file),
        "query_count": len(evaluations),
        "query_match": args.query_match,
        "k": int(args.k),
        "coarse_k": int(args.coarse_k),
        "local_only": True,
        "strategies": strategy_summaries,
        "stage2_mode_matrix": stage2_mode_matrix,
        "pairwise_stage2_deltas": pairwise_deltas,
        "config": {
            "embedding_backend": config.embedding_backend,
            "embedding_model": config.embedding_model,
            "embedding_use_gpu": bool(config.embedding_use_gpu),
            "reranker_backend": config.reranker_backend,
            "reranker_model": config.reranker_model,
            "reranker_use_gpu": bool(config.reranker_use_gpu),
            "enable_staged_rerank": bool(config.enable_staged_rerank),
            "enable_cross_encoder_rerank": bool(config.enable_cross_encoder_rerank),
        },
        "progress_output": str(args.progress_output) if args.progress_output else None,
        "evaluations": [
            {
                "query": evaluation.query,
                "intent": evaluation.intent,
                "notes": evaluation.notes,
                "relevant_paths": evaluation.relevant_paths,
                "runs": {key: asdict(run) for key, run in evaluation.runs.items()},
            }
            for evaluation in evaluations
        ],
    }

    _write_final_outputs(
        output_path=args.output,
        progress_output=args.progress_output,
        payload=payload,
    )
    print(json.dumps(payload, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()
@@ -280,8 +280,9 @@ def main() -> None:
     if args.staged_cluster_strategy:
         config.staged_clustering_strategy = str(args.staged_cluster_strategy)
     # Stability: on some Windows setups, fastembed + DirectML can crash under load.
-    # Dense_rerank uses the embedding backend that matches the index; force CPU here.
+    # Force local embeddings and reranking onto CPU for reproducible benchmark runs.
     config.embedding_use_gpu = False
+    config.reranker_use_gpu = False
     registry = RegistryStore()
     registry.initialize()
     mapper = PathMapper()
codex-lens/benchmarks/results/ccw_smart_search_stage2.json (new file, 1704 lines; diff suppressed because it is too large)
@@ -0,0 +1,526 @@
{
  "timestamp": "2026-03-14 23:16:55",
  "source": "D:\\Claude_dms3",
  "queries_file": "D:\\Claude_dms3\\codex-lens\\benchmarks\\accuracy_queries_ccw_smart_search.jsonl",
  "query_count": 4,
  "k": 10,
  "coarse_k": 100,
  "local_only": true,
  "strategies": {
    "dense_rerank": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 20171.940174996853,
      "p50_latency_ms": 14222.247749984264,
      "p95_latency_ms": 35222.31535999476,
      "errors": 0,
      "strategy": "dense_rerank",
      "stage2_mode": null
    },
    "staged:precomputed": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13679.793299987912,
      "p50_latency_ms": 12918.63379997015,
      "p95_latency_ms": 16434.964765003322,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "precomputed"
    },
    "staged:realtime": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13885.101849973202,
      "p50_latency_ms": 13826.323699980974,
      "p95_latency_ms": 14867.712269958853,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "realtime"
    },
    "staged:static_global_graph": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13336.124025002122,
      "p50_latency_ms": 13415.476950019598,
      "p95_latency_ms": 13514.329230004549,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "static_global_graph"
    }
  },
  "stage2_mode_matrix": {
    "precomputed": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13679.793299987912,
      "p50_latency_ms": 12918.63379997015,
      "p95_latency_ms": 16434.964765003322,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "precomputed"
    },
    "realtime": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13885.101849973202,
      "p50_latency_ms": 13826.323699980974,
      "p95_latency_ms": 14867.712269958853,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "realtime"
    },
    "static_global_graph": {
      "query_count": 4,
      "hit_at_k": 0.0,
      "mrr_at_k": 0.0,
      "avg_recall_at_k": 0.0,
      "avg_latency_ms": 13336.124025002122,
      "p50_latency_ms": 13415.476950019598,
      "p95_latency_ms": 13514.329230004549,
      "errors": 0,
      "strategy": "staged",
      "stage2_mode": "static_global_graph"
    }
  },
  "pairwise_stage2_deltas": [
    {
      "mode_a": "precomputed",
      "mode_b": "realtime",
      "hit_at_k_delta": 0.0,
      "mrr_at_k_delta": 0.0,
      "avg_recall_at_k_delta": 0.0,
      "avg_latency_ms_delta": -205.30854998528957
    },
    {
      "mode_a": "precomputed",
      "mode_b": "static_global_graph",
      "hit_at_k_delta": 0.0,
      "mrr_at_k_delta": 0.0,
      "avg_recall_at_k_delta": 0.0,
      "avg_latency_ms_delta": 343.66927498579025
    },
    {
      "mode_a": "realtime",
      "mode_b": "static_global_graph",
      "hit_at_k_delta": 0.0,
      "mrr_at_k_delta": 0.0,
      "avg_recall_at_k_delta": 0.0,
      "avg_latency_ms_delta": 548.9778249710798
    }
  ],
  "config": {
    "embedding_backend": "fastembed",
    "embedding_model": "code",
    "embedding_use_gpu": false,
    "reranker_backend": "onnx",
    "reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "enable_staged_rerank": true,
    "enable_cross_encoder_rerank": true
  },
  "evaluations": [
    {
      "query": "executeHybridMode dense_rerank semantic smart_search",
      "intent": "ccw-semantic-routing",
      "notes": "CCW semantic mode delegates to CodexLens dense_rerank.",
      "relevant_paths": [
        "D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"
      ],
      "runs": {
        "dense_rerank": {
          "strategy_key": "dense_rerank",
          "strategy": "dense_rerank",
          "stage2_mode": null,
          "latency_ms": 38829.27079999447,
          "topk_paths": [
            "d:\\claude_dms3\\ccw\\src\\core\\routes\\issue-routes.ts",
            "d:\\claude_dms3\\ccw\\src\\tools\\session-manager.ts",
            "d:\\claude_dms3\\ccw\\src\\types\\queue-types.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\nativesessionpanel.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts",
            "d:\\claude_dms3\\ccw\\src\\core\\memory-extraction-pipeline.ts",
            "d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\skills-page.spec.ts",
            "d:\\claude_dms3\\ccw\\dist\\tools\\discover-design-files.js",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\api-settings\\clisettingsmodal.tsx",
            "d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts"
          ],
          "first_hit_rank": null,
          "hit_at_k": false,
          "recall_at_k": 0.0,
          "error": null
        },
        "staged:precomputed": {
          "strategy_key": "staged:precomputed",
          "strategy": "staged",
          "stage2_mode": "precomputed",
          "latency_ms": 16915.833400011063,
          "topk_paths": [
            "d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
            "d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
            "d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
            "d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
            "d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
          ],
          "first_hit_rank": null,
          "hit_at_k": false,
          "recall_at_k": 0.0,
          "error": null
        },
        "staged:realtime": {
          "strategy_key": "staged:realtime",
          "strategy": "staged",
          "stage2_mode": "realtime",
          "latency_ms": 13961.2567999959,
          "topk_paths": [
            "d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
            "d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
            "d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
            "d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
            "d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
          ],
          "first_hit_rank": null,
          "hit_at_k": false,
          "recall_at_k": 0.0,
          "error": null
        },
        "staged:static_global_graph": {
          "strategy_key": "staged:static_global_graph",
          "strategy": "staged",
          "stage2_mode": "static_global_graph",
          "latency_ms": 12986.330999970436,
          "topk_paths": [
            "d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
            "d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
            "d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
            "d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
            "d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
            "d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
          ],
          "first_hit_rank": null,
          "hit_at_k": false,
          "recall_at_k": 0.0,
          "error": null
        }
      }
    },
    {
      "query": "parse CodexLens JSON output strip ANSI smart_search",
      "intent": "ccw-json-fallback",
      "notes": "Covers JSON/plain-text fallback handling for CodexLens output.",
      "relevant_paths": [
        "D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"
      ],
      "runs": {
        "dense_rerank": {
          "strategy_key": "dense_rerank",
          "strategy": "dense_rerank",
          "stage2_mode": null,
          "latency_ms": 14782.901199996471,
          "topk_paths": [
            "d:\\claude_dms3\\ccw\\src\\tools\\codex-lens-lsp.ts",
            "d:\\claude_dms3\\ccw\\frontend\\src\\components\\issue\\queue\\queueexecuteinsession.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\terminal-dashboard\\queuepanel.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\usewebsocket.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useflows.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-error-monitoring.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\tests\\native-session-discovery.test.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\services\\checkpoint-service.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\tests\\integration\\system-routes.test.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:precomputed": {
|
||||||
|
"strategy_key": "staged:precomputed",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed",
|
||||||
|
"latency_ms": 13710.042499959469,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\userealtimeupdates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\stores\\queueexecutionstore.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\themeshare.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\clistreampanel.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\terminal-panel\\queueexecutionlistview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\test\\i18n.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\history-importer.js"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:realtime": {
|
||||||
|
"strategy_key": "staged:realtime",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime",
|
||||||
|
"latency_ms": 15027.674999952316,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\userealtimeupdates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\stores\\queueexecutionstore.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\themeshare.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\clistreampanel.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\terminal-panel\\queueexecutionlistview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\test\\i18n.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\history-importer.js"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:static_global_graph": {
|
||||||
|
"strategy_key": "staged:static_global_graph",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph",
|
||||||
|
"latency_ms": 13389.622500002384,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\userealtimeupdates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\stores\\queueexecutionstore.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\themeshare.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\clistreampanel.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\terminal-panel\\queueexecutionlistview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\test\\i18n.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\history-importer.js"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"query": "smart_search init embed search action schema",
|
||||||
|
"intent": "ccw-action-schema",
|
||||||
|
"notes": "Find the Zod schema that defines init/embed/search actions.",
|
||||||
|
"relevant_paths": [
|
||||||
|
"D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"
|
||||||
|
],
|
||||||
|
"runs": {
|
||||||
|
"dense_rerank": {
|
||||||
|
"strategy_key": "dense_rerank",
|
||||||
|
"strategy": "dense_rerank",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"latency_ms": 13661.594299972057,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\ask-question.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\a2ui\\a2uipopupcard.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\discovery-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\a2ui\\a2uiwebsockethandler.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\discovery.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\__tests__\\ask-question.test.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\a2ui\\a2uiwebsockethandler.js",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\dashboard.spec.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:precomputed": {
|
||||||
|
"strategy_key": "staged:precomputed",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed",
|
||||||
|
"latency_ms": 12127.225099980831,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\lite-scanner-complete.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\themeselector.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\team\\teamheader.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\ask-question.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\a2ui\\a2uipopupcard.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\issue\\discovery\\findinglist.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\api-settings\\clisettingsmodal.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\discovery-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\a2ui\\a2uiwebsockethandler.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:realtime": {
|
||||||
|
"strategy_key": "staged:realtime",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime",
|
||||||
|
"latency_ms": 12860.084999978542,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\lite-scanner-complete.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\themeselector.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\team\\teamheader.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\ask-question.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\a2ui\\a2uipopupcard.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\issue\\discovery\\findinglist.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\api-settings\\clisettingsmodal.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\discovery-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\a2ui\\a2uiwebsockethandler.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:static_global_graph": {
|
||||||
|
"strategy_key": "staged:static_global_graph",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph",
|
||||||
|
"latency_ms": 13441.331400036812,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\lite-scanner-complete.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\themeselector.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\team\\teamheader.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\ask-question.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\a2ui\\a2uipopupcard.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\issue\\discovery\\findinglist.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\api-settings\\clisettingsmodal.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\discovery-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\a2ui\\a2uiwebsockethandler.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"query": "auto init missing job dedupe smart_search",
|
||||||
|
"intent": "ccw-auto-init",
|
||||||
|
"notes": "Targets background init/embed warmup and dedupe state.",
|
||||||
|
"relevant_paths": [
|
||||||
|
"D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"
|
||||||
|
],
|
||||||
|
"runs": {
|
||||||
|
"dense_rerank": {
|
||||||
|
"strategy_key": "dense_rerank",
|
||||||
|
"strategy": "dense_rerank",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"latency_ms": 13413.994400024414,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\pages\\memorypage.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\memory-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\usememory.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\batchoperationtoolbar.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\memory.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useprompthistory.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\stores\\flowstore.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\services\\deepwiki-service.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\claude-routes.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:precomputed": {
|
||||||
|
"strategy_key": "staged:precomputed",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed",
|
||||||
|
"latency_ms": 11966.072200000286,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\ui\\commandcombobox.tsx",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\orchestrator\\orchestrationplanbuilder.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\pages\\memorypage.tsx"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:realtime": {
|
||||||
|
"strategy_key": "staged:realtime",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime",
|
||||||
|
"latency_ms": 13691.39059996605,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\ui\\commandcombobox.tsx",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\orchestrator\\orchestrationplanbuilder.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\pages\\memorypage.tsx"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:static_global_graph": {
|
||||||
|
"strategy_key": "staged:static_global_graph",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph",
|
||||||
|
"latency_ms": 13527.211199998856,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\ui\\commandcombobox.tsx",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\orchestrator\\orchestrationplanbuilder.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\lsp\\handlers.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\search\\global_graph_expander.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\build\\lib\\codexlens\\api\\definition.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\pages\\memorypage.tsx"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
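A minimal sketch of how per-run fields such as `first_hit_rank`, `hit_at_k`, `recall_at_k`, and the aggregate `mrr_at_k` are conventionally derived from `topk_paths` against `relevant_paths`. This is an illustration of the standard metric definitions, not the benchmark's actual scoring code; `score_run` is a hypothetical helper. Note that in the report above `relevant_paths` uses `D:\Claude_dms3` while `topk_paths` uses lowercase `d:\claude_dms3`, so the sketch lowercases both sides before comparing.

```python
def score_run(topk_paths, relevant_paths, k=10):
    """Illustrative scoring of one run (hypothetical helper, not the real code)."""
    # Lowercase to tolerate the drive-letter casing mismatch visible above.
    rel = {p.lower() for p in relevant_paths}
    top = [p.lower() for p in topk_paths[:k]]
    # Rank (1-based) of the first relevant path, or None if absent from top-k.
    first_hit_rank = next((i + 1 for i, p in enumerate(top) if p in rel), None)
    hit_at_k = first_hit_rank is not None
    # Fraction of relevant paths that appear anywhere in the top-k list.
    recall_at_k = sum(1 for p in rel if p in set(top)) / len(rel) if rel else 0.0
    # Reciprocal rank; averaging this over queries gives mrr_at_k.
    mrr_at_k = 1.0 / first_hit_rank if first_hit_rank else 0.0
    return {
        "first_hit_rank": first_hit_rank,
        "hit_at_k": hit_at_k,
        "recall_at_k": recall_at_k,
        "mrr_at_k": mrr_at_k,
    }

# Example: the single relevant file ranked first yields hit_at_k=True, mrr=1.0.
score_run(
    ["d:\\claude_dms3\\ccw\\src\\tools\\smart-search.ts"],
    ["D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"],
)
```

Under these definitions, every run above with `first_hit_rank: null` contributes zero to both hit@k and MRR, which is consistent with the 0.0 aggregates reported for the cascade strategies.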
@@ -0,0 +1,415 @@
|
|||||||
|
{
|
||||||
|
"timestamp": "2026-03-15 00:19:16",
|
||||||
|
"source": "D:\\Claude_dms3",
|
||||||
|
"queries_file": "D:\\Claude_dms3\\codex-lens\\benchmarks\\accuracy_queries_ccw_smart_search.jsonl",
|
||||||
|
"query_count": 1,
|
||||||
|
"k": 10,
|
||||||
|
"coarse_k": 100,
|
||||||
|
"local_only": true,
|
||||||
|
"strategies": {
|
||||||
|
"auto": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 1.0,
|
||||||
|
"mrr_at_k": 1.0,
|
||||||
|
"avg_recall_at_k": 1.0,
|
||||||
|
"avg_latency_ms": 1377.3565999865532,
|
||||||
|
"p50_latency_ms": 1377.3565999865532,
|
||||||
|
"p95_latency_ms": 1377.3565999865532,
|
||||||
|
"avg_generated_artifact_count": 0.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 0,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"fts": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "auto",
|
||||||
|
"stage2_mode": null
|
||||||
|
},
|
||||||
|
"fts": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 1.0,
|
||||||
|
"mrr_at_k": 1.0,
|
||||||
|
"avg_recall_at_k": 1.0,
|
||||||
|
"avg_latency_ms": 1460.0819000601768,
|
||||||
|
"p50_latency_ms": 1460.0819000601768,
|
||||||
|
"p95_latency_ms": 1460.0819000601768,
|
||||||
|
"avg_generated_artifact_count": 0.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 0,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"fts": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "fts",
|
||||||
|
"stage2_mode": null
|
||||||
|
},
|
||||||
|
"hybrid": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 45991.74140000343,
|
||||||
|
"p50_latency_ms": 45991.74140000343,
|
||||||
|
"p95_latency_ms": 45991.74140000343,
|
||||||
|
"avg_generated_artifact_count": 0.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 0,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"hybrid": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "hybrid",
|
||||||
|
"stage2_mode": null
|
||||||
|
},
|
||||||
|
"dense_rerank": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 22739.62610000372,
|
||||||
|
"p50_latency_ms": 22739.62610000372,
|
||||||
|
"p95_latency_ms": 22739.62610000372,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 2.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 1,
|
||||||
|
"effective_methods": {
|
||||||
|
"dense_rerank": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "dense_rerank",
|
||||||
|
"stage2_mode": null
|
||||||
|
},
|
||||||
|
"staged:precomputed": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 14900.017599999905,
|
||||||
|
"p50_latency_ms": 14900.017599999905,
|
||||||
|
"p95_latency_ms": 14900.017599999905,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed"
|
||||||
|
},
|
||||||
|
"staged:realtime": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 14104.314599990845,
|
||||||
|
"p50_latency_ms": 14104.314599990845,
|
||||||
|
"p95_latency_ms": 14104.314599990845,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime"
|
||||||
|
},
|
||||||
|
"staged:static_global_graph": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 11906.852500021458,
|
||||||
|
"p50_latency_ms": 11906.852500021458,
|
||||||
|
"p95_latency_ms": 11906.852500021458,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"stage2_mode_matrix": {
|
||||||
|
"precomputed": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 14900.017599999905,
|
||||||
|
"p50_latency_ms": 14900.017599999905,
|
||||||
|
"p95_latency_ms": 14900.017599999905,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed"
|
||||||
|
},
|
||||||
|
"realtime": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 14104.314599990845,
|
||||||
|
"p50_latency_ms": 14104.314599990845,
|
||||||
|
"p95_latency_ms": 14104.314599990845,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime"
|
||||||
|
},
|
||||||
|
"static_global_graph": {
|
||||||
|
"query_count": 1,
|
||||||
|
"hit_at_k": 0.0,
|
||||||
|
"mrr_at_k": 0.0,
|
||||||
|
"avg_recall_at_k": 0.0,
|
||||||
|
"avg_latency_ms": 11906.852500021458,
|
||||||
|
"p50_latency_ms": 11906.852500021458,
|
||||||
|
"p95_latency_ms": 11906.852500021458,
|
||||||
|
"avg_generated_artifact_count": 1.0,
|
||||||
|
"avg_test_file_count": 0.0,
|
||||||
|
"runs_with_generated_artifacts": 1,
|
||||||
|
"runs_with_test_files": 0,
|
||||||
|
"effective_methods": {
|
||||||
|
"staged": 1
|
||||||
|
},
|
||||||
|
"errors": 0,
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"pairwise_stage2_deltas": [
|
||||||
|
{
|
||||||
|
"mode_a": "precomputed",
|
||||||
|
"mode_b": "realtime",
|
||||||
|
"hit_at_k_delta": 0.0,
|
||||||
|
"mrr_at_k_delta": 0.0,
|
||||||
|
"avg_recall_at_k_delta": 0.0,
|
||||||
|
"avg_latency_ms_delta": 795.7030000090599
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"mode_a": "precomputed",
|
||||||
|
"mode_b": "static_global_graph",
|
||||||
|
"hit_at_k_delta": 0.0,
|
||||||
|
"mrr_at_k_delta": 0.0,
|
||||||
|
"avg_recall_at_k_delta": 0.0,
|
||||||
|
"avg_latency_ms_delta": 2993.165099978447
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"mode_a": "realtime",
|
||||||
|
"mode_b": "static_global_graph",
|
||||||
|
"hit_at_k_delta": 0.0,
|
||||||
|
"mrr_at_k_delta": 0.0,
|
||||||
|
"avg_recall_at_k_delta": 0.0,
|
||||||
|
"avg_latency_ms_delta": 2197.462099969387
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"config": {
|
||||||
|
"embedding_backend": "fastembed",
|
||||||
|
"embedding_model": "code",
|
||||||
|
"embedding_use_gpu": false,
|
||||||
|
"reranker_backend": "onnx",
|
||||||
|
"reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
|
||||||
|
"reranker_use_gpu": false,
|
||||||
|
"enable_staged_rerank": true,
|
||||||
|
"enable_cross_encoder_rerank": true
|
||||||
|
},
|
||||||
|
"evaluations": [
|
||||||
|
{
|
||||||
|
"query": "executeHybridMode dense_rerank semantic smart_search",
|
||||||
|
"intent": "ccw-semantic-routing",
|
||||||
|
"notes": "CCW semantic mode delegates to CodexLens dense_rerank.",
|
||||||
|
"relevant_paths": [
|
||||||
|
"D:\\Claude_dms3\\ccw\\src\\tools\\smart-search.ts"
|
||||||
|
],
|
||||||
|
"runs": {
|
||||||
|
"auto": {
|
||||||
|
"strategy_key": "auto",
|
||||||
|
"strategy": "auto",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"effective_method": "fts",
|
||||||
|
"execution_method": "fts",
|
||||||
|
"latency_ms": 1377.3565999865532,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\smart-search.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": 1,
|
||||||
|
"hit_at_k": true,
|
||||||
|
"recall_at_k": 1.0,
|
||||||
|
"generated_artifact_count": 0,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"fts": {
|
||||||
|
"strategy_key": "fts",
|
||||||
|
"strategy": "fts",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"effective_method": "fts",
|
||||||
|
"execution_method": "fts",
|
||||||
|
"latency_ms": 1460.0819000601768,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\smart-search.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": 1,
|
||||||
|
"hit_at_k": true,
|
||||||
|
"recall_at_k": 1.0,
|
||||||
|
"generated_artifact_count": 0,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"hybrid": {
|
||||||
|
"strategy_key": "hybrid",
|
||||||
|
"strategy": "hybrid",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"effective_method": "hybrid",
|
||||||
|
"execution_method": "hybrid",
|
||||||
|
"latency_ms": 45991.74140000343,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\config\\litellm-api-config-manager.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\semantic\\reranker\\api_reranker.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\core-memory.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\cli\\commands.py",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\scripts\\generate_embeddings.py",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\notification-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\team-msg.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\types\\remote-notification.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\memory-store.ts",
|
||||||
|
"d:\\claude_dms3\\codex-lens\\src\\codexlens\\semantic\\vector_store.py"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"generated_artifact_count": 0,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"dense_rerank": {
|
||||||
|
"strategy_key": "dense_rerank",
|
||||||
|
"strategy": "dense_rerank",
|
||||||
|
"stage2_mode": null,
|
||||||
|
"effective_method": "dense_rerank",
|
||||||
|
"execution_method": "cascade",
|
||||||
|
"latency_ms": 22739.62610000372,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\issue-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\session-manager.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\types\\queue-types.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\nativesessionpanel.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\memory-extraction-pipeline.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\skills-page.spec.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\tools\\discover-design-files.js",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\api-settings\\clisettingsmodal.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\tests\\e2e\\api-settings.spec.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"generated_artifact_count": 1,
|
||||||
|
"test_file_count": 2,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:precomputed": {
|
||||||
|
"strategy_key": "staged:precomputed",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "precomputed",
|
||||||
|
"effective_method": "staged",
|
||||||
|
"execution_method": "cascade",
|
||||||
|
"latency_ms": 14900.017599999905,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"generated_artifact_count": 1,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:realtime": {
|
||||||
|
"strategy_key": "staged:realtime",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "realtime",
|
||||||
|
"effective_method": "staged",
|
||||||
|
"execution_method": "cascade",
|
||||||
|
"latency_ms": 14104.314599990845,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"generated_artifact_count": 1,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
},
|
||||||
|
"staged:static_global_graph": {
|
||||||
|
"strategy_key": "staged:static_global_graph",
|
||||||
|
"strategy": "staged",
|
||||||
|
"stage2_mode": "static_global_graph",
|
||||||
|
"effective_method": "staged",
|
||||||
|
"execution_method": "cascade",
|
||||||
|
"latency_ms": 11906.852500021458,
|
||||||
|
"topk_paths": [
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\tools\\native-session-discovery.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\commands\\memory.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\hooks\\useissues.test.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\routes\\cli-sessions-routes.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\lib\\api.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\frontend\\src\\components\\shared\\filepreview.tsx",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\hooks\\hook-templates.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\utils\\file-reader.ts",
|
||||||
|
"d:\\claude_dms3\\ccw\\dist\\core\\routes\\cli-sessions-routes.js",
|
||||||
|
"d:\\claude_dms3\\ccw\\src\\core\\history-importer.ts"
|
||||||
|
],
|
||||||
|
"first_hit_rank": null,
|
||||||
|
"hit_at_k": false,
|
||||||
|
"recall_at_k": 0.0,
|
||||||
|
"generated_artifact_count": 1,
|
||||||
|
"test_file_count": 0,
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
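The per-strategy metrics above (`first_hit_rank`, `hit_at_k`, `recall_at_k`) can be recomputed from `topk_paths` and a set of expected "gold" paths. The following is an illustrative sketch — `score_topk` is a hypothetical name, not a function from this repository:

```python
def score_topk(topk_paths: list[str], gold_paths: set[str]) -> dict:
    """Derive first_hit_rank (1-based), hit_at_k, and recall_at_k from a ranked list."""
    first_hit_rank = None
    hits = 0
    for rank, path in enumerate(topk_paths, start=1):
        if path in gold_paths:
            hits += 1
            if first_hit_rank is None:
                first_hit_rank = rank
    recall = hits / len(gold_paths) if gold_paths else 0.0
    return {
        "first_hit_rank": first_hit_rank,
        "hit_at_k": first_hit_rank is not None,
        "recall_at_k": recall,
    }
```

When no gold path appears in the top-k list, this yields exactly the shape recorded above: a null rank, `hit_at_k` false, and recall 0.0.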
@@ -57,9 +57,9 @@ semantic-directml = [
 # Cross-encoder reranking (second-stage, optional)
 # Install with: pip install codexlens[reranker] (default: ONNX backend)
 reranker-onnx = [
-    "optimum~=1.16.0",
-    "onnxruntime~=1.15.0",
-    "transformers~=4.36.0",
+    "optimum[onnxruntime]~=2.1.0",
+    "onnxruntime~=1.23.0",
+    "transformers~=4.53.0",
 ]

 # Remote reranking via HTTP API
@@ -79,9 +79,9 @@ reranker-legacy = [

 # Backward-compatible alias for default reranker backend
 reranker = [
-    "optimum~=1.16.0",
-    "onnxruntime~=1.15.0",
-    "transformers~=4.36.0",
+    "optimum[onnxruntime]~=2.1.0",
+    "onnxruntime~=1.23.0",
+    "transformers~=4.53.0",
 ]

 # Encoding detection for non-UTF8 files
@@ -116,3 +116,12 @@ package-dir = { "" = "src" }

 [tool.setuptools.package-data]
 "codexlens.lsp" = ["lsp-servers.json"]
+
+[tool.pytest.ini_options]
+markers = [
+    "integration: marks tests that exercise broader end-to-end or dependency-heavy flows",
+]
+filterwarnings = [
+    "ignore:'BaseCommand' is deprecated and will be removed in Click 9.0.*:DeprecationWarning",
+    "ignore:The '__version__' attribute is deprecated and will be removed in Click 9.1.*:DeprecationWarning",
+]
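The `~=` pins above use PEP 440 compatible-release semantics. A minimal sketch of what such a pin accepts — assuming plain numeric versions, ignoring pre-releases, epochs, and local version suffixes (use `packaging.specifiers` for the real thing):

```python
def compatible(installed: str, pin: str) -> bool:
    """True if `installed` satisfies '~=pin' per PEP 440's compatible-release rule."""
    pin_parts = [int(p) for p in pin.split(".")]
    inst_parts = [int(p) for p in installed.split(".")]
    # All but the last pin component must match exactly (this also caps the
    # next release series, e.g. ~=1.23.0 rejects 1.24.x)...
    if inst_parts[: len(pin_parts) - 1] != pin_parts[:-1]:
        return False
    # ...and the final component must be at least the pinned value.
    return inst_parts[len(pin_parts) - 1] >= pin_parts[-1]
```

Under this rule the old `onnxruntime~=1.15.0` pin could never resolve to the 1.23.x wheels required by optimum 2.x, which is why all three pins were bumped together.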
340  codex-lens/scripts/bootstrap_reranker_local.py  Normal file
@@ -0,0 +1,340 @@
#!/usr/bin/env python3
"""Bootstrap a local-only ONNX reranker environment for CodexLens.

This script defaults to dry-run output so it can be used as a reproducible
bootstrap manifest. When `--apply` is passed, it installs pinned reranker
packages into the selected virtual environment and can optionally pre-download
the ONNX reranker model into a repo-local Hugging Face cache.

Examples:
    python scripts/bootstrap_reranker_local.py --dry-run
    python scripts/bootstrap_reranker_local.py --apply --download-model
    python scripts/bootstrap_reranker_local.py --venv .venv --model Xenova/ms-marco-MiniLM-L-12-v2
"""

from __future__ import annotations

import argparse
import os
import shlex
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


PROJECT_ROOT = Path(__file__).resolve().parents[1]
MANIFEST_PATH = Path(__file__).with_name("requirements-reranker-local.txt")
DEFAULT_MODEL = "Xenova/ms-marco-MiniLM-L-6-v2"
DEFAULT_HF_HOME = PROJECT_ROOT / ".cache" / "huggingface"

STEP_NOTES = {
    "runtime": "Install the local ONNX runtime first so optimum/transformers do not backtrack over runtime wheels.",
    "hf-stack": "Pin the Hugging Face stack used by the ONNX reranker backend.",
}


@dataclass(frozen=True)
class RequirementStep:
    name: str
    packages: tuple[str, ...]


def _normalize_venv_path(raw_path: str | Path) -> Path:
    return (Path(raw_path) if raw_path else PROJECT_ROOT / ".venv").expanduser().resolve()


def _venv_python(venv_path: Path) -> Path:
    if os.name == "nt":
        return venv_path / "Scripts" / "python.exe"
    return venv_path / "bin" / "python"


def _venv_huggingface_cli(venv_path: Path) -> Path:
    if os.name == "nt":
        preferred = venv_path / "Scripts" / "hf.exe"
        return preferred if preferred.exists() else venv_path / "Scripts" / "huggingface-cli.exe"
    preferred = venv_path / "bin" / "hf"
    return preferred if preferred.exists() else venv_path / "bin" / "huggingface-cli"


def _default_shell() -> str:
    return "powershell" if os.name == "nt" else "bash"


def _shell_quote(value: str, shell: str) -> str:
    if shell == "bash":
        return shlex.quote(value)
    return "'" + value.replace("'", "''") + "'"


def _format_command(parts: Iterable[str], shell: str) -> str:
    return " ".join(_shell_quote(str(part), shell) for part in parts)


def _format_set_env(name: str, value: str, shell: str) -> str:
    quoted_value = _shell_quote(value, shell)
    if shell == "bash":
        return f"export {name}={quoted_value}"
    return f"$env:{name} = {quoted_value}"


def _model_local_dir(hf_home: Path, model_name: str) -> Path:
    slug = model_name.replace("/", "--")
    return hf_home / "models" / slug


def _parse_manifest(manifest_path: Path) -> list[RequirementStep]:
    current_name: str | None = None
    current_packages: list[str] = []
    steps: list[RequirementStep] = []

    for raw_line in manifest_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line:
            continue

        if line.startswith("# [") and line.endswith("]"):
            if current_name and current_packages:
                steps.append(RequirementStep(current_name, tuple(current_packages)))
            current_name = line[3:-1]
            current_packages = []
            continue

        if line.startswith("#"):
            continue

        if current_name is None:
            raise ValueError(f"Package entry found before a section header in {manifest_path}")
        current_packages.append(line)

    if current_name and current_packages:
        steps.append(RequirementStep(current_name, tuple(current_packages)))

    if not steps:
        raise ValueError(f"No requirement steps found in {manifest_path}")
    return steps


def _pip_install_command(python_path: Path, packages: Iterable[str]) -> list[str]:
    return [
        str(python_path),
        "-m",
        "pip",
        "install",
        "--upgrade",
        "--disable-pip-version-check",
        "--upgrade-strategy",
        "only-if-needed",
        "--only-binary=:all:",
        *packages,
    ]


def _probe_command(python_path: Path) -> list[str]:
    return [
        str(python_path),
        "-c",
        (
            "from codexlens.semantic.reranker.factory import check_reranker_available; "
            "print(check_reranker_available('onnx'))"
        ),
    ]


def _download_command(huggingface_cli: Path, model_name: str, model_dir: Path) -> list[str]:
    return [
        str(huggingface_cli),
        "download",
        model_name,
        "--local-dir",
        str(model_dir),
    ]


def _print_plan(
    shell: str,
    venv_path: Path,
    python_path: Path,
    huggingface_cli: Path,
    manifest_path: Path,
    steps: list[RequirementStep],
    model_name: str,
    hf_home: Path,
) -> None:
    model_dir = _model_local_dir(hf_home, model_name)

    print("CodexLens local reranker bootstrap")
    print(f"manifest: {manifest_path}")
    print(f"target_venv: {venv_path}")
    print(f"target_python: {python_path}")
    print("backend: onnx")
    print(f"model: {model_name}")
    print(f"hf_home: {hf_home}")
    print("mode: dry-run")
    print("notes:")
    print("- Uses only the selected venv Python; no global pip commands are emitted.")
    print("- Targets the local ONNX reranker backend only; no API or LiteLLM providers are involved.")
    print("")
    print("pinned_steps:")
    for step in steps:
        print(f"- {step.name}: {', '.join(step.packages)}")
        note = STEP_NOTES.get(step.name)
        if note:
            print(f"  note: {note}")
    print("")
    print("commands:")
    print(
        "1. "
        + _format_command(
            [
                str(python_path),
                "-m",
                "pip",
                "install",
                "--upgrade",
                "pip",
                "setuptools",
                "wheel",
            ],
            shell,
        )
    )
    command_index = 2
    for step in steps:
        print(f"{command_index}. " + _format_command(_pip_install_command(python_path, step.packages), shell))
        command_index += 1
    print(f"{command_index}. " + _format_set_env("HF_HOME", str(hf_home), shell))
    command_index += 1
    print(f"{command_index}. " + _format_command(_download_command(huggingface_cli, model_name, model_dir), shell))
    command_index += 1
    print(f"{command_index}. " + _format_command(_probe_command(python_path), shell))
    print("")
    print("optional_runtime_env:")
    print(_format_set_env("RERANKER_BACKEND", "onnx", shell))
    print(_format_set_env("RERANKER_MODEL", str(model_dir), shell))
    print(_format_set_env("HF_HOME", str(hf_home), shell))


def _run_command(command: list[str], *, env: dict[str, str] | None = None) -> None:
    command_env = os.environ.copy()
    if env:
        command_env.update(env)
    command_env.setdefault("PYTHONUTF8", "1")
    command_env.setdefault("PYTHONIOENCODING", "utf-8")
    subprocess.run(command, check=True, env=command_env)


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Bootstrap pinned local-only ONNX reranker dependencies for a CodexLens virtual environment.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__,
    )
    parser.add_argument(
        "--venv",
        type=Path,
        default=PROJECT_ROOT / ".venv",
        help="Path to the CodexLens virtual environment (default: ./.venv under codex-lens).",
    )
    parser.add_argument(
        "--model",
        default=DEFAULT_MODEL,
        help=f"Model repo to pre-download for local reranking (default: {DEFAULT_MODEL}).",
    )
    parser.add_argument(
        "--hf-home",
        type=Path,
        default=DEFAULT_HF_HOME,
        help="Repo-local Hugging Face cache directory used for optional model downloads.",
    )
    parser.add_argument(
        "--shell",
        choices=("powershell", "bash"),
        default=_default_shell(),
        help="Shell syntax to use when rendering dry-run commands.",
    )
    parser.add_argument(
        "--apply",
        action="store_true",
        help="Execute the pinned install steps against the selected virtual environment.",
    )
    parser.add_argument(
        "--download-model",
        action="store_true",
        help="When used with --apply, pre-download the model into the configured HF_HOME directory.",
    )
    parser.add_argument(
        "--probe",
        action="store_true",
        help="When used with --apply, run a small reranker availability probe at the end.",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Print the deterministic bootstrap plan. This is also the default when --apply is omitted.",
    )

    args = parser.parse_args()

    steps = _parse_manifest(MANIFEST_PATH)
    venv_path = _normalize_venv_path(args.venv)
    python_path = _venv_python(venv_path)
    huggingface_cli = _venv_huggingface_cli(venv_path)
    hf_home = args.hf_home.expanduser().resolve()

    if not args.apply:
        _print_plan(
            shell=args.shell,
            venv_path=venv_path,
            python_path=python_path,
            huggingface_cli=huggingface_cli,
            manifest_path=MANIFEST_PATH,
            steps=steps,
            model_name=args.model,
            hf_home=hf_home,
        )
        return 0

    if not python_path.exists():
        print(f"Target venv Python not found: {python_path}", file=sys.stderr)
        return 1

    _run_command(
        [
            str(python_path),
            "-m",
            "pip",
            "install",
            "--upgrade",
            "pip",
            "setuptools",
            "wheel",
        ]
    )
    for step in steps:
        _run_command(_pip_install_command(python_path, step.packages))

    if args.download_model:
        if not huggingface_cli.exists():
            print(f"Expected venv-local Hugging Face CLI not found: {huggingface_cli}", file=sys.stderr)
            return 1
        download_env = os.environ.copy()
        download_env["HF_HOME"] = str(hf_home)
        hf_home.mkdir(parents=True, exist_ok=True)
        _run_command(_download_command(huggingface_cli, args.model, _model_local_dir(hf_home, args.model)), env=download_env)

    if args.probe:
        local_model_dir = _model_local_dir(hf_home, args.model)
        probe_env = os.environ.copy()
        probe_env["HF_HOME"] = str(hf_home)
        probe_env.setdefault("RERANKER_BACKEND", "onnx")
        probe_env.setdefault("RERANKER_MODEL", str(local_model_dir if local_model_dir.exists() else args.model))
        _run_command(_probe_command(python_path), env=probe_env)

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
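The script's dry-run output renders the same command plan in two shell dialects. The quoting rules it relies on can be sketched standalone — POSIX quoting via `shlex` for bash, and single-quote doubling for PowerShell, where `''` escapes a quote inside a single-quoted string (a condensed mirror of `_shell_quote` / `_format_set_env` above):

```python
import shlex

def shell_quote(value: str, shell: str) -> str:
    """Quote a value for bash (POSIX rules) or PowerShell (quote doubling)."""
    if shell == "bash":
        return shlex.quote(value)
    return "'" + value.replace("'", "''") + "'"

def format_set_env(name: str, value: str, shell: str) -> str:
    """Render an environment-variable assignment in the chosen shell dialect."""
    quoted = shell_quote(value, shell)
    return f"export {name}={quoted}" if shell == "bash" else f"$env:{name} = {quoted}"
```

This keeps the emitted plan copy-pasteable into either shell without further escaping.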
13  codex-lens/scripts/requirements-reranker-local.txt  Normal file
@@ -0,0 +1,13 @@
# Ordered local ONNX reranker pins for CodexLens.
# Validated against the repo-local Python 3.13 virtualenv on Windows.
# bootstrap_reranker_local.py installs each section in file order to keep
# pip resolver work bounded and repeatable.

# [runtime]
numpy==2.4.0
onnxruntime==1.23.2

# [hf-stack]
huggingface-hub==0.36.2
transformers==4.53.3
optimum[onnxruntime]==2.1.0
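The `# [section]` headers above are what the bootstrap script's `_parse_manifest` groups pins by. A minimal standalone parser for that shape, reading from a string rather than a file (the section-detection logic mirrors the script; the function name here is illustrative):

```python
def parse_manifest_text(text: str) -> dict[str, list[str]]:
    """Group '# [section]' manifest lines into ordered pin lists per section."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# [") and line.endswith("]"):
            # Section header: strip the '# [' prefix and ']' suffix.
            current = line[3:-1]
            sections[current] = []
            continue
        if line.startswith("#"):
            continue  # ordinary comment
        if current is None:
            raise ValueError("package entry before a section header")
        sections[current].append(line)
    return sections
```

Installing section by section in file order keeps each pip resolve small, which is the bounded-resolver behavior the manifest comments describe.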
@@ -2,10 +2,13 @@
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import inspect
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
|
import re
|
||||||
import shutil
|
import shutil
|
||||||
|
import subprocess
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Annotated, Any, Dict, Iterable, List, Optional
|
from typing import Annotated, Any, Dict, Iterable, List, Optional
|
||||||
|
|
||||||
@@ -22,6 +25,13 @@ from codexlens.storage.registry import RegistryStore, ProjectInfo
|
|||||||
from codexlens.storage.index_tree import IndexTreeBuilder
|
from codexlens.storage.index_tree import IndexTreeBuilder
|
||||||
from codexlens.storage.dir_index import DirIndexStore
|
from codexlens.storage.dir_index import DirIndexStore
|
||||||
from codexlens.search.chain_search import ChainSearchEngine, SearchOptions
|
from codexlens.search.chain_search import ChainSearchEngine, SearchOptions
|
||||||
|
from codexlens.search.ranking import (
|
||||||
|
QueryIntent,
|
||||||
|
apply_path_penalties,
|
||||||
|
detect_query_intent,
|
||||||
|
query_prefers_lexical_search,
|
||||||
|
query_targets_generated_files,
|
||||||
|
)
|
||||||
from codexlens.watcher import WatcherManager, WatcherConfig
|
from codexlens.watcher import WatcherManager, WatcherConfig
|
||||||
|
|
||||||
from .output import (
|
from .output import (
|
||||||
@@ -34,6 +44,56 @@ from .output import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
app = typer.Typer(help="CodexLens CLI — local code indexing and search.")
|
app = typer.Typer(help="CodexLens CLI — local code indexing and search.")
|
||||||
|
# Index subcommand group for reorganized commands
|
||||||
|
def _patch_typer_click_help_compat() -> None:
|
||||||
|
"""Patch Typer help rendering for Click versions that pass ctx to make_metavar()."""
|
||||||
|
import click.core
|
||||||
|
from typer.core import TyperArgument
|
||||||
|
|
||||||
|
try:
|
||||||
|
params = inspect.signature(TyperArgument.make_metavar).parameters
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return
|
||||||
|
|
||||||
|
if len(params) != 1:
|
||||||
|
return
|
||||||
|
|
||||||
|
def _compat_make_metavar(self, ctx=None): # type: ignore[override]
|
||||||
|
if self.metavar is not None:
|
||||||
|
return self.metavar
|
||||||
|
|
||||||
|
var = (self.name or "").upper()
|
||||||
|
if not self.required:
|
||||||
|
var = f"[{var}]"
|
||||||
|
|
||||||
|
try:
|
||||||
|
type_var = self.type.get_metavar(param=self, ctx=ctx)
|
||||||
|
except TypeError:
|
||||||
|
try:
|
||||||
|
type_var = self.type.get_metavar(self, ctx)
|
||||||
|
except TypeError:
|
||||||
|
type_var = self.type.get_metavar(self)
|
||||||
|
|
||||||
|
if type_var:
|
||||||
|
var += f":{type_var}"
|
||||||
|
if self.nargs != 1:
|
||||||
|
var += "..."
|
||||||
|
return var
|
||||||
|
|
||||||
|
TyperArgument.make_metavar = _compat_make_metavar
|
||||||
|
|
||||||
|
param_params = inspect.signature(click.core.Parameter.make_metavar).parameters
|
||||||
|
if len(param_params) == 2:
|
||||||
|
original_param_make_metavar = click.core.Parameter.make_metavar
|
||||||
|
|
||||||
|
def _compat_param_make_metavar(self, ctx=None): # type: ignore[override]
|
||||||
|
return original_param_make_metavar(self, ctx)
|
||||||
|
|
||||||
|
click.core.Parameter.make_metavar = _compat_param_make_metavar
|
||||||
|
|
||||||
|
|
||||||
|
_patch_typer_click_help_compat()
|
||||||
|
|
||||||
|
|
||||||
# Index subcommand group for reorganized commands
|
# Index subcommand group for reorganized commands
|
||||||
index_app = typer.Typer(help="Index management commands (init, embeddings, binary, status, migrate, all)")
|
index_app = typer.Typer(help="Index management commands (init, embeddings, binary, status, migrate, all)")
|
||||||
@@ -119,6 +179,281 @@ def _extract_embedding_error(embed_result: Dict[str, Any]) -> str:
|
|||||||
return "Embedding generation failed (no error details provided)"
|
return "Embedding generation failed (no error details provided)"
|
||||||
|
|
||||||
|
|
||||||
|
def _auto_select_search_method(query: str) -> str:
|
||||||
|
"""Choose a default search method from query intent."""
|
||||||
|
if query_targets_generated_files(query) or query_prefers_lexical_search(query):
|
||||||
|
return "fts"
|
||||||
|
|
||||||
|
intent = detect_query_intent(query)
|
||||||
|
if intent == QueryIntent.KEYWORD:
|
||||||
|
return "fts"
|
||||||
|
if intent == QueryIntent.SEMANTIC:
|
||||||
|
return "dense_rerank"
|
||||||
|
return "hybrid"
|
||||||
|
|
||||||
|
|
||||||
|
_CLI_NON_CODE_EXTENSIONS = {
|
||||||
|
"md", "txt", "json", "yaml", "yml", "xml", "csv", "log",
|
||||||
|
"ini", "cfg", "conf", "toml", "env", "properties",
|
||||||
|
"html", "htm", "svg", "png", "jpg", "jpeg", "gif", "ico", "webp",
|
||||||
|
"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx",
|
||||||
|
"lock", "sum", "mod",
|
||||||
|
}
|
||||||
|
_FALLBACK_ARTIFACT_DIRS = {
|
||||||
|
"dist",
|
||||||
|
"build",
|
||||||
|
"out",
|
||||||
|
"coverage",
|
||||||
|
"htmlcov",
|
||||||
|
".cache",
|
||||||
|
".workflow",
|
||||||
|
".next",
|
||||||
|
".nuxt",
|
||||||
|
".parcel-cache",
|
||||||
|
".turbo",
|
||||||
|
"tmp",
|
||||||
|
"temp",
|
||||||
|
"generated",
|
||||||
|
}
|
||||||
|
_FALLBACK_SOURCE_DIRS = {
|
||||||
|
"src",
|
||||||
|
"lib",
|
||||||
|
"core",
|
||||||
|
"app",
|
||||||
|
"server",
|
||||||
|
"client",
|
||||||
|
"services",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_extension_filters(exclude_extensions: Optional[Iterable[str]]) -> set[str]:
|
||||||
|
"""Normalize extension filters to lowercase values without leading dots."""
|
||||||
|
normalized: set[str] = set()
|
||||||
|
for ext in exclude_extensions or []:
|
||||||
|
cleaned = (ext or "").strip().lower().lstrip(".")
|
||||||
|
if cleaned:
|
||||||
|
normalized.add(cleaned)
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
|
||||||
|
def _score_filesystem_fallback_match(
|
||||||
|
query: str,
|
||||||
|
path_text: str,
|
||||||
|
line_text: str,
|
||||||
|
*,
|
||||||
|
base_score: float,
|
||||||
|
) -> float:
|
||||||
|
"""Score filesystem fallback hits with light source-aware heuristics."""
|
||||||
|
score = max(0.0, float(base_score))
|
||||||
|
if score <= 0:
|
||||||
|
return 0.0
|
||||||
|
|
||||||
|
query_intent = detect_query_intent(query)
|
||||||
|
if query_intent != QueryIntent.KEYWORD:
|
||||||
|
return score
|
||||||
|
|
||||||
|
path_parts = {
|
||||||
|
part.casefold()
|
||||||
|
for part in str(path_text).replace("\\", "/").split("/")
|
||||||
|
if part and part != "."
|
||||||
|
}
|
||||||
|
if _FALLBACK_SOURCE_DIRS.intersection(path_parts):
|
||||||
|
score *= 1.15
|
||||||
|
|
||||||
|
symbol = (query or "").strip()
|
||||||
|
if " " in symbol or not symbol:
|
||||||
|
return score
|
||||||
|
|
||||||
|
escaped_symbol = re.escape(symbol)
|
||||||
|
definition_patterns = (
|
||||||
|
rf"^\s*(?:export\s+)?(?:async\s+)?def\s+{escaped_symbol}\b",
|
||||||
|
rf"^\s*(?:export\s+)?(?:async\s+)?function\s+{escaped_symbol}\b",
|
||||||
|
rf"^\s*(?:export\s+)?class\s+{escaped_symbol}\b",
|
||||||
|
rf"^\s*(?:export\s+)?interface\s+{escaped_symbol}\b",
|
||||||
|
rf"^\s*(?:export\s+)?type\s+{escaped_symbol}\b",
|
||||||
|
rf"^\s*(?:export\s+)?(?:const|let|var)\s+{escaped_symbol}\b",
|
||||||
|
)
|
||||||
|
if any(re.search(pattern, line_text) for pattern in definition_patterns):
|
||||||
|
score *= 1.8
|
||||||
|
|
||||||
|
return score
|
||||||
|
|
||||||
|
|
||||||
|
def _filesystem_fallback_search(
|
||||||
|
query: str,
|
||||||
|
search_path: Path,
|
||||||
|
*,
|
||||||
|
limit: int,
|
||||||
|
config: Config,
|
||||||
|
code_only: bool = False,
|
||||||
|
exclude_extensions: Optional[Iterable[str]] = None,
|
||||||
|
) -> Optional[dict[str, Any]]:
|
||||||
|
"""Fallback to ripgrep when indexed keyword search returns no results."""
|
||||||
|
rg_path = shutil.which("rg")
|
||||||
|
if not rg_path or not query.strip():
|
||||||
|
return None
|
||||||
|
|
||||||
|
import time
|
||||||
|
|
||||||
|
allow_generated = query_targets_generated_files(query)
|
||||||
|
ignored_dirs = {name for name in IndexTreeBuilder.IGNORE_DIRS if name}
|
||||||
|
ignored_dirs.add(".workflow")
|
||||||
|
if allow_generated:
|
||||||
|
ignored_dirs.difference_update(_FALLBACK_ARTIFACT_DIRS)
|
||||||
|
|
||||||
|
excluded_exts = _normalize_extension_filters(exclude_extensions)
|
||||||
|
if code_only:
|
||||||
|
excluded_exts.update(_CLI_NON_CODE_EXTENSIONS)
|
||||||
|
|
||||||
|
args = [
|
||||||
|
rg_path,
|
||||||
|
"--json",
|
||||||
|
"--line-number",
|
||||||
|
"--fixed-strings",
|
||||||
|
"--smart-case",
|
||||||
|
"--max-count",
|
||||||
|
"1",
|
||||||
|
]
|
||||||
|
if allow_generated:
|
||||||
|
args.append("--hidden")
|
||||||
|
|
||||||
|
for dirname in sorted(ignored_dirs):
|
||||||
|
args.extend(["--glob", f"!**/{dirname}/**"])
|
||||||
|
|
||||||
|
args.extend([query, str(search_path)])
|
||||||
|
|
||||||
|
start_time = time.perf_counter()
|
||||||
|
proc = subprocess.run(
|
||||||
|
args,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE,
|
||||||
|
text=True,
|
||||||
|
encoding="utf-8",
|
||||||
|
errors="replace",
|
||||||
|
check=False,
|
||||||
|
)
|
||||||
|
|
||||||
|
if proc.returncode not in (0, 1):
|
||||||
|
return None
|
||||||
|
|
||||||
|
matches: List[SearchResult] = []
|
||||||
|
seen_paths: set[str] = set()
|
||||||
|
for raw_line in proc.stdout.splitlines():
|
||||||
|
if len(matches) >= limit:
|
||||||
|
break
|
||||||
|
try:
|
||||||
|
event = json.loads(raw_line)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
continue
|
||||||
|
if event.get("type") != "match":
|
||||||
|
continue
|
||||||
|
|
||||||
|
data = event.get("data") or {}
|
||||||
|
path_text = ((data.get("path") or {}).get("text") or "").strip()
|
||||||
|
if not path_text or path_text in seen_paths:
|
||||||
|
continue
|
||||||
|
|
||||||
|
path_obj = Path(path_text)
|
||||||
|
extension = path_obj.suffix.lower().lstrip(".")
|
||||||
|
if extension and extension in excluded_exts:
|
||||||
|
continue
|
||||||
|
if code_only and config.language_for_path(path_obj) is None:
|
||||||
|
continue
|
||||||
|
|
||||||
|
line_text = ((data.get("lines") or {}).get("text") or "").rstrip("\r\n")
|
||||||
|
line_number = data.get("line_number")
|
||||||
|
seen_paths.add(path_text)
|
||||||
|
base_score = float(limit - len(matches))
|
||||||
|
matches.append(
|
||||||
|
SearchResult(
|
||||||
|
path=path_text,
|
||||||
|
score=_score_filesystem_fallback_match(
|
||||||
|
query,
|
||||||
|
path_text,
|
||||||
|
line_text,
|
||||||
|
base_score=base_score,
|
||||||
|
),
|
||||||
|
excerpt=line_text.strip() or line_text or path_text,
|
||||||
|
content=None,
|
||||||
|
metadata={
|
||||||
|
"filesystem_fallback": True,
|
||||||
|
"backend": "ripgrep-fallback",
|
||||||
|
"stale_index_suspected": True,
|
||||||
|
},
|
||||||
|
start_line=line_number,
|
||||||
|
end_line=line_number,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
if not matches:
|
||||||
|
return None
|
||||||
|
|
||||||
|
matches = apply_path_penalties(
|
||||||
|
matches,
|
||||||
|
query,
|
||||||
|
test_file_penalty=config.test_file_penalty,
|
||||||
|
generated_file_penalty=config.generated_file_penalty,
|
||||||
|
)
|
||||||
|
return {
|
||||||
|
"results": matches,
|
||||||
|
"time_ms": (time.perf_counter() - start_time) * 1000.0,
|
||||||
|
"fallback": {
|
||||||
|
"backend": "ripgrep-fallback",
|
||||||
|
"stale_index_suspected": True,
|
||||||
|
"reason": "Indexed FTS search returned no results; filesystem fallback used.",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _remove_tree_best_effort(target: Path) -> dict[str, Any]:
|
||||||
|
"""Remove a directory tree without aborting on locked files."""
|
||||||
|
target = target.resolve()
|
||||||
|
if not target.exists():
|
||||||
|
return {
|
||||||
|
"removed": True,
|
||||||
|
"partial": False,
|
||||||
|
"locked_paths": [],
|
||||||
|
"errors": [],
|
||||||
|
"remaining_path": None,
|
||||||
|
}
|
||||||
|
|
||||||
|
locked_paths: List[str] = []
|
||||||
|
errors: List[str] = []
|
||||||
|
entries = sorted(target.rglob("*"), key=lambda path: len(path.parts), reverse=True)
|
||||||
|
|
||||||
|
for entry in entries:
|
||||||
|
try:
|
||||||
|
if entry.is_dir() and not entry.is_symlink():
|
||||||
|
entry.rmdir()
|
||||||
|
else:
|
||||||
|
entry.unlink()
|
||||||
|
except FileNotFoundError:
|
||||||
|
continue
|
||||||
|
except PermissionError:
|
||||||
|
locked_paths.append(str(entry))
|
||||||
|
except OSError as exc:
|
||||||
|
if entry.is_dir():
|
||||||
|
continue
|
||||||
|
errors.append(f"{entry}: {exc}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
target.rmdir()
|
||||||
|
except FileNotFoundError:
|
||||||
|
pass
|
||||||
|
except PermissionError:
|
||||||
|
locked_paths.append(str(target))
|
||||||
|
except OSError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
return {
|
||||||
|
"removed": not target.exists(),
|
||||||
|
"partial": target.exists(),
|
||||||
|
"locked_paths": sorted(set(locked_paths)),
|
||||||
|
"errors": errors,
|
||||||
|
"remaining_path": str(target) if target.exists() else None,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
def _get_index_root() -> Path:
|
def _get_index_root() -> Path:
|
||||||
"""Get the index root directory from config or default.
|
"""Get the index root directory from config or default.
|
||||||
|
|
||||||
@@ -542,7 +877,7 @@ def search(
|
|||||||
offset: int = typer.Option(0, "--offset", min=0, help="Pagination offset - skip first N results."),
|
offset: int = typer.Option(0, "--offset", min=0, help="Pagination offset - skip first N results."),
|
||||||
depth: int = typer.Option(-1, "--depth", "-d", help="Search depth (-1 = unlimited, 0 = current only)."),
|
depth: int = typer.Option(-1, "--depth", "-d", help="Search depth (-1 = unlimited, 0 = current only)."),
|
||||||
files_only: bool = typer.Option(False, "--files-only", "-f", help="Return only file paths without content snippets."),
|
files_only: bool = typer.Option(False, "--files-only", "-f", help="Return only file paths without content snippets."),
|
||||||
method: str = typer.Option("dense_rerank", "--method", "-m", help="Search method: 'dense_rerank' (semantic, default), 'fts' (exact keyword)."),
|
method: str = typer.Option("auto", "--method", "-m", help="Search method: 'auto' (intent-aware, default), 'dense_rerank' (semantic), 'fts' (exact keyword)."),
|
||||||
use_fuzzy: bool = typer.Option(False, "--use-fuzzy", help="Enable fuzzy matching in FTS method."),
|
use_fuzzy: bool = typer.Option(False, "--use-fuzzy", help="Enable fuzzy matching in FTS method."),
|
||||||
code_only: bool = typer.Option(False, "--code-only", help="Only return code files (excludes md, txt, json, yaml, xml, etc.)."),
|
code_only: bool = typer.Option(False, "--code-only", help="Only return code files (excludes md, txt, json, yaml, xml, etc.)."),
|
||||||
exclude_extensions: Optional[str] = typer.Option(None, "--exclude-extensions", help="Comma-separated list of file extensions to exclude (e.g., 'md,txt,json')."),
|
exclude_extensions: Optional[str] = typer.Option(None, "--exclude-extensions", help="Comma-separated list of file extensions to exclude (e.g., 'md,txt,json')."),
|
||||||
@@ -576,14 +911,16 @@ def search(
     Use --depth to limit search recursion (0 = current dir only).
 
     Search Methods:
-    - dense_rerank (default): Semantic search using Dense embedding coarse retrieval +
+    - auto (default): Intent-aware routing. KEYWORD -> fts, MIXED -> hybrid,
+      SEMANTIC -> dense_rerank.
+    - dense_rerank: Semantic search using Dense embedding coarse retrieval +
       Cross-encoder reranking. Best for natural language queries and code understanding.
     - fts: Full-text search using FTS5 (unicode61 tokenizer). Best for exact code
       identifiers like function/class names. Use --use-fuzzy for typo tolerance.
 
     Method Selection Guide:
-    - Code identifiers (function/class names): fts
-    - Natural language queries: dense_rerank (default)
+    - Code identifiers (function/class names): auto or fts
+    - Natural language queries: auto or dense_rerank
     - Typo-tolerant search: fts --use-fuzzy
 
     Requirements:
@@ -591,7 +928,7 @@ def search(
     Use 'codexlens embeddings-generate' to create embeddings first.
 
     Examples:
-        # Default semantic search (dense_rerank)
+        # Default intent-aware search
         codexlens search "authentication logic"
 
         # Exact code identifier search
@@ -612,7 +949,7 @@ def search(
 
     # Map old mode values to new method values
     mode_to_method = {
-        "auto": "hybrid",
+        "auto": "auto",
         "exact": "fts",
         "fuzzy": "fts",  # with use_fuzzy=True
         "hybrid": "hybrid",
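The hunk above changes the legacy `--mode` translation so that `auto` now maps to the new intent-aware `auto` method instead of `hybrid`. A minimal sketch of that translation (the dict mirrors the hunk; the `resolve_legacy_mode` helper and its `use_fuzzy` handling are my own illustration, not the codex-lens source):

```python
# Legacy --mode to --method mapping as shown in the hunk above.
# Assumption: the CLI sets use_fuzzy=True when the legacy mode is "fuzzy".
MODE_TO_METHOD = {
    "auto": "auto",
    "exact": "fts",
    "fuzzy": "fts",  # with use_fuzzy=True
    "hybrid": "hybrid",
}

def resolve_legacy_mode(mode: str) -> tuple[str, bool]:
    """Translate a legacy mode into (method, use_fuzzy)."""
    method = MODE_TO_METHOD.get(mode, mode)
    return method, mode == "fuzzy"
```

With this table, old scripts passing `--mode fuzzy` keep working: they resolve to the `fts` method with fuzzy matching enabled.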
@@ -638,19 +975,27 @@ def search(
 
     # Validate method - simplified interface exposes only dense_rerank and fts
     # Other methods (vector, hybrid, cascade) are hidden but still work for backward compatibility
-    valid_methods = ["fts", "dense_rerank", "vector", "hybrid", "cascade"]
+    valid_methods = ["auto", "fts", "dense_rerank", "vector", "hybrid", "cascade"]
     if actual_method not in valid_methods:
         if json_mode:
-            print_json(success=False, error=f"Invalid method: {actual_method}. Use 'dense_rerank' (semantic) or 'fts' (exact keyword).")
+            print_json(success=False, error=f"Invalid method: {actual_method}. Use 'auto', 'dense_rerank', or 'fts'.")
         else:
             console.print(f"[red]Invalid method:[/red] {actual_method}")
-            console.print("[dim]Use 'dense_rerank' (semantic, default) or 'fts' (exact keyword)[/dim]")
+            console.print("[dim]Use 'auto' (default), 'dense_rerank' (semantic), or 'fts' (exact keyword)[/dim]")
         raise typer.Exit(code=1)
 
+    resolved_method = (
+        _auto_select_search_method(query)
+        if actual_method == "auto"
+        else actual_method
+    )
+    display_method = resolved_method
+    execution_method = resolved_method
+
     # Map dense_rerank to cascade method internally
     internal_cascade_strategy = cascade_strategy
-    if actual_method == "dense_rerank":
-        actual_method = "cascade"
+    if execution_method == "dense_rerank":
+        execution_method = "cascade"
         internal_cascade_strategy = "dense_rerank"
 
     # Validate cascade_strategy if provided (for advanced users)
@@ -733,32 +1078,32 @@ def search(
     # vector: Pure vector semantic search
    # hybrid: RRF fusion of sparse + dense
     # cascade: Two-stage binary + dense retrieval
-    if actual_method == "fts":
+    if execution_method == "fts":
         hybrid_mode = False
         enable_fuzzy = use_fuzzy
         enable_vector = False
         pure_vector = False
         enable_cascade = False
-    elif actual_method == "vector":
+    elif execution_method == "vector":
         hybrid_mode = True
         enable_fuzzy = False
         enable_vector = True
         pure_vector = True
         enable_cascade = False
-    elif actual_method == "hybrid":
+    elif execution_method == "hybrid":
         hybrid_mode = True
         enable_fuzzy = use_fuzzy
         enable_vector = True
         pure_vector = False
         enable_cascade = False
-    elif actual_method == "cascade":
+    elif execution_method == "cascade":
         hybrid_mode = True
         enable_fuzzy = False
         enable_vector = True
         pure_vector = False
         enable_cascade = True
     else:
-        raise ValueError(f"Invalid method: {actual_method}")
+        raise ValueError(f"Invalid method: {execution_method}")
 
     # Parse exclude_extensions from comma-separated string
     exclude_exts_list = None
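The if/elif chain in this hunk sets five boolean flags per execution method. The same matrix can be sketched as a lookup table; the flag names and values below are copied from the hunk, but the `NamedTuple` packaging and `flags_for` helper are illustrative only, not how codex-lens implements it:

```python
from typing import NamedTuple

class MethodFlags(NamedTuple):
    hybrid_mode: bool
    enable_fuzzy: bool   # True means "honor --use-fuzzy" for this method
    enable_vector: bool
    pure_vector: bool
    enable_cascade: bool

# Flag matrix equivalent to the if/elif dispatch in the hunk above.
METHOD_FLAGS = {
    "fts":     MethodFlags(False, True,  False, False, False),
    "vector":  MethodFlags(True,  False, True,  True,  False),
    "hybrid":  MethodFlags(True,  True,  True,  False, False),
    "cascade": MethodFlags(True,  False, True,  False, True),
}

def flags_for(execution_method: str, use_fuzzy: bool) -> MethodFlags:
    # A KeyError here plays the role of the ValueError in the else branch.
    flags = METHOD_FLAGS[execution_method]
    return flags._replace(enable_fuzzy=use_fuzzy and flags.enable_fuzzy)
```

The table makes the invariants easy to audit: only `fts` disables `hybrid_mode`, only `vector` sets `pure_vector`, and only `cascade` sets `enable_cascade`.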
@@ -790,10 +1135,28 @@ def search(
             console.print(fp)
     else:
         # Dispatch to cascade_search for cascade method
-        if actual_method == "cascade":
+        if execution_method == "cascade":
             result = engine.cascade_search(query, search_path, k=limit, options=options, strategy=internal_cascade_strategy)
         else:
             result = engine.search(query, search_path, options)
+        effective_results = result.results
+        effective_files_matched = result.stats.files_matched
+        effective_time_ms = result.stats.time_ms
+        fallback_payload = None
+        if display_method == "fts" and not use_fuzzy and not effective_results:
+            fallback_payload = _filesystem_fallback_search(
+                query,
+                search_path,
+                limit=limit,
+                config=config,
+                code_only=code_only,
+                exclude_extensions=exclude_exts_list,
+            )
+            if fallback_payload is not None:
+                effective_results = fallback_payload["results"]
+                effective_files_matched = len(effective_results)
+                effective_time_ms = result.stats.time_ms + float(fallback_payload["time_ms"])
+
         results_list = [
             {
                 "path": r.path,
@@ -803,25 +1166,29 @@ def search(
                 "source": getattr(r, "search_source", None),
                 "symbol": getattr(r, "symbol", None),
             }
-            for r in result.results
+            for r in effective_results
         ]
 
         payload = {
             "query": query,
-            "method": actual_method,
+            "method": display_method,
             "count": len(results_list),
             "results": results_list,
             "stats": {
                 "dirs_searched": result.stats.dirs_searched,
-                "files_matched": result.stats.files_matched,
-                "time_ms": result.stats.time_ms,
+                "files_matched": effective_files_matched,
+                "time_ms": effective_time_ms,
             },
         }
+        if fallback_payload is not None:
+            payload["fallback"] = fallback_payload["fallback"]
         if json_mode:
             print_json(success=True, result=payload)
         else:
-            render_search_results(result.results, verbose=verbose)
-            console.print(f"[dim]Method: {actual_method} | Searched {result.stats.dirs_searched} directories in {result.stats.time_ms:.1f}ms[/dim]")
+            render_search_results(effective_results, verbose=verbose)
+            if fallback_payload is not None:
+                console.print("[yellow]No indexed matches found; showing filesystem fallback results (stale index suspected).[/yellow]")
+            console.print(f"[dim]Method: {display_method} | Searched {result.stats.dirs_searched} directories in {effective_time_ms:.1f}ms[/dim]")
 
     except SearchError as exc:
         if json_mode:
@@ -1454,7 +1821,7 @@ def projects(
         mapper = PathMapper()
         index_root = mapper.source_to_index_dir(project_path)
         if index_root.exists():
-            shutil.rmtree(index_root)
+            _remove_tree_best_effort(index_root)
 
         if json_mode:
             print_json(success=True, result={"removed": str(project_path)})
@@ -1966,17 +2333,30 @@ def clean(
             registry_path.unlink()
 
         # Remove all indexes
-        shutil.rmtree(index_root)
+        removal = _remove_tree_best_effort(index_root)
 
         result = {
             "cleaned": str(index_root),
             "size_freed_mb": round(total_size / (1024 * 1024), 2),
+            "partial": bool(removal["partial"]),
+            "locked_paths": removal["locked_paths"],
+            "remaining_path": removal["remaining_path"],
+            "errors": removal["errors"],
         }
 
         if json_mode:
             print_json(success=True, result=result)
         else:
-            console.print(f"[green]Removed all indexes:[/green] {result['size_freed_mb']} MB freed")
+            if result["partial"]:
+                console.print(
+                    f"[yellow]Partially removed all indexes:[/yellow] {result['size_freed_mb']} MB freed"
+                )
+                if result["locked_paths"]:
+                    console.print(
+                        f"[dim]Locked paths left behind: {len(result['locked_paths'])}[/dim]"
+                    )
+            else:
+                console.print(f"[green]Removed all indexes:[/green] {result['size_freed_mb']} MB freed")
 
     elif path:
         # Remove specific project
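The diff only shows call sites for `_remove_tree_best_effort` and the result keys the `clean` command consumes (`partial`, `locked_paths`, `remaining_path`, `errors`). A hypothetical sketch consistent with that shape — this body is an assumption, not the codex-lens implementation:

```python
# Hypothetical best-effort tree removal: delete what we can, record what we
# cannot (e.g. SQLite files still locked by another process on Windows),
# and report whether anything was left behind.
import shutil
from pathlib import Path
from typing import Any, Dict

def remove_tree_best_effort(root: Path) -> Dict[str, Any]:
    locked: list = []
    errors: list = []

    def on_error(func, path, exc_info):
        # Record the failure and keep going instead of aborting the removal.
        locked.append(str(path))
        errors.append(f"{path}: {exc_info[1]}")

    shutil.rmtree(root, onerror=on_error)
    partial = root.exists()
    return {
        "partial": partial,
        "locked_paths": locked,
        "remaining_path": str(root) if partial else None,
        "errors": errors,
    }
```

The key design point visible in the diff is that `clean` no longer crashes on a locked file: it reports a partial removal and lists the paths left behind.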
@@ -2003,18 +2383,29 @@ def clean(
         registry.close()
 
         # Remove indexes
-        shutil.rmtree(project_index)
+        removal = _remove_tree_best_effort(project_index)
 
         result = {
             "cleaned": str(project_path),
             "index_path": str(project_index),
             "size_freed_mb": round(total_size / (1024 * 1024), 2),
+            "partial": bool(removal["partial"]),
+            "locked_paths": removal["locked_paths"],
+            "remaining_path": removal["remaining_path"],
+            "errors": removal["errors"],
         }
 
         if json_mode:
             print_json(success=True, result=result)
         else:
-            console.print(f"[green]Removed indexes for:[/green] {project_path}")
+            if result["partial"]:
+                console.print(f"[yellow]Partially removed indexes for:[/yellow] {project_path}")
+                if result["locked_paths"]:
+                    console.print(
+                        f"[dim]Locked paths left behind: {len(result['locked_paths'])}[/dim]"
+                    )
+            else:
+                console.print(f"[green]Removed indexes for:[/green] {project_path}")
             console.print(f"  Freed: {result['size_freed_mb']} MB")
 
     else:
@@ -2617,7 +3008,7 @@ def embeddings_status(
         codexlens embeddings-status ~/projects/my-app  # Check project (auto-finds index)
     """
     _deprecated_command_warning("embeddings-status", "index status")
-    from codexlens.cli.embedding_manager import check_index_embeddings, get_embedding_stats_summary
+    from codexlens.cli.embedding_manager import get_embedding_stats_summary, get_embeddings_status
 
     # Determine what to check
     if path is None:
@@ -3715,7 +4106,7 @@ def index_status(
     """
     _configure_logging(verbose, json_mode)
 
-    from codexlens.cli.embedding_manager import check_index_embeddings, get_embedding_stats_summary
+    from codexlens.cli.embedding_manager import get_embedding_stats_summary, get_embeddings_status
 
     # Determine target path and index root
     if path is None:
@@ -3751,13 +4142,19 @@ def index_status(
         raise typer.Exit(code=1)
 
     # Get embeddings status
-    embeddings_result = get_embedding_stats_summary(index_root)
+    embeddings_result = get_embeddings_status(index_root)
+    embeddings_summary_result = get_embedding_stats_summary(index_root)
 
     # Build combined result
     result = {
         "index_root": str(index_root),
-        "embeddings": embeddings_result.get("result") if embeddings_result.get("success") else None,
-        "embeddings_error": embeddings_result.get("error") if not embeddings_result.get("success") else None,
+        # Keep "embeddings" backward-compatible as the subtree summary payload.
+        "embeddings": embeddings_summary_result.get("result") if embeddings_summary_result.get("success") else None,
+        "embeddings_error": embeddings_summary_result.get("error") if not embeddings_summary_result.get("success") else None,
+        "embeddings_status": embeddings_result.get("result") if embeddings_result.get("success") else None,
+        "embeddings_status_error": embeddings_result.get("error") if not embeddings_result.get("success") else None,
+        "embeddings_summary": embeddings_summary_result.get("result") if embeddings_summary_result.get("success") else None,
+        "embeddings_summary_error": embeddings_summary_result.get("error") if not embeddings_summary_result.get("success") else None,
     }
 
     if json_mode:
@@ -3770,13 +4167,39 @@ def index_status(
         console.print("[bold]Dense Embeddings (HNSW):[/bold]")
         if embeddings_result.get("success"):
             data = embeddings_result["result"]
-            total = data.get("total_indexes", 0)
-            with_emb = data.get("indexes_with_embeddings", 0)
-            total_chunks = data.get("total_chunks", 0)
+            root = data.get("root") or data
+            subtree = data.get("subtree") or {}
+            centralized = data.get("centralized") or {}
 
-            console.print(f"  Total indexes: {total}")
-            console.print(f"  Indexes with embeddings: [{'green' if with_emb > 0 else 'yellow'}]{with_emb}[/]/{total}")
-            console.print(f"  Total chunks: {total_chunks:,}")
+            console.print(f"  Root files: {root.get('total_files', 0)}")
+            console.print(
+                f"  Root files with embeddings: "
+                f"[{'green' if root.get('has_embeddings') else 'yellow'}]{root.get('files_with_embeddings', 0)}[/]"
+                f"/{root.get('total_files', 0)}"
+            )
+            console.print(f"  Root coverage: {root.get('coverage_percent', 0):.1f}%")
+            console.print(f"  Root chunks: {root.get('total_chunks', 0):,}")
+            console.print(f"  Root storage mode: {root.get('storage_mode', 'none')}")
+            console.print(
+                f"  Centralized dense: "
+                f"{'ready' if centralized.get('dense_ready') else ('present' if centralized.get('dense_index_exists') else 'missing')}"
+            )
+            console.print(
+                f"  Centralized binary: "
+                f"{'ready' if centralized.get('binary_ready') else ('present' if centralized.get('binary_index_exists') else 'missing')}"
+            )
+
+            subtree_total = subtree.get("total_indexes", 0)
+            subtree_with_embeddings = subtree.get("indexes_with_embeddings", 0)
+            subtree_chunks = subtree.get("total_chunks", 0)
+            if subtree_total:
+                console.print("\n[bold]Subtree Summary:[/bold]")
+                console.print(f"  Total indexes: {subtree_total}")
+                console.print(
+                    f"  Indexes with embeddings: "
+                    f"[{'green' if subtree_with_embeddings > 0 else 'yellow'}]{subtree_with_embeddings}[/]/{subtree_total}"
+                )
+                console.print(f"  Total chunks: {subtree_chunks:,}")
         else:
             console.print(f"  [yellow]--[/yellow] {embeddings_result.get('error', 'Not available')}")
 
@@ -48,6 +48,8 @@ from itertools import islice
 from pathlib import Path
 from typing import Any, Dict, Generator, List, Optional, Tuple
 
+from codexlens.storage.index_filters import filter_index_paths
+
 try:
     from codexlens.semantic import SEMANTIC_AVAILABLE, is_embedding_backend_available
 except ImportError:
|
|||||||
VectorStore = None # type: ignore[assignment]
|
VectorStore = None # type: ignore[assignment]
|
||||||
|
|
||||||
try:
|
try:
|
||||||
from codexlens.config import VECTORS_META_DB_NAME
|
from codexlens.config import (
|
||||||
|
BINARY_VECTORS_MMAP_NAME,
|
||||||
|
VECTORS_HNSW_NAME,
|
||||||
|
VECTORS_META_DB_NAME,
|
||||||
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
|
VECTORS_HNSW_NAME = "_vectors.hnsw"
|
||||||
VECTORS_META_DB_NAME = "_vectors_meta.db"
|
VECTORS_META_DB_NAME = "_vectors_meta.db"
|
||||||
|
BINARY_VECTORS_MMAP_NAME = "_binary_vectors.mmap"
|
||||||
|
|
||||||
try:
|
try:
|
||||||
from codexlens.search.ranking import get_file_category
|
from codexlens.search.ranking import get_file_category
|
||||||
@@ -410,6 +418,98 @@ def check_index_embeddings(index_path: Path) -> Dict[str, any]:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _sqlite_table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
|
||||||
|
"""Return whether a SQLite table exists."""
|
||||||
|
cursor = conn.execute(
|
||||||
|
"SELECT name FROM sqlite_master WHERE type='table' AND name=?",
|
||||||
|
(table_name,),
|
||||||
|
)
|
||||||
|
return cursor.fetchone() is not None
|
||||||
|
|
||||||
|
|
||||||
|
def _sqlite_count_rows(conn: sqlite3.Connection, table_name: str) -> int:
|
||||||
|
"""Return row count for a table, or 0 when the table is absent."""
|
||||||
|
if not _sqlite_table_exists(conn, table_name):
|
||||||
|
return 0
|
||||||
|
cursor = conn.execute(f"SELECT COUNT(*) FROM {table_name}")
|
||||||
|
return int(cursor.fetchone()[0] or 0)
|
||||||
|
|
||||||
|
|
||||||
|
def _sqlite_count_distinct_rows(conn: sqlite3.Connection, table_name: str, column_name: str) -> int:
|
||||||
|
"""Return distinct row count for a table column, or 0 when the table is absent."""
|
||||||
|
if not _sqlite_table_exists(conn, table_name):
|
||||||
|
return 0
|
||||||
|
cursor = conn.execute(f"SELECT COUNT(DISTINCT {column_name}) FROM {table_name}")
|
||||||
|
return int(cursor.fetchone()[0] or 0)
|
||||||
|
|
||||||
|
|
||||||
|
def _get_model_info_from_index(index_path: Path) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Read embedding model metadata from an index if available."""
|
||||||
|
try:
|
||||||
|
with sqlite3.connect(index_path) as conn:
|
||||||
|
if not _sqlite_table_exists(conn, "embeddings_config"):
|
||||||
|
return None
|
||||||
|
from codexlens.semantic.vector_store import VectorStore
|
||||||
|
with VectorStore(index_path) as vs:
|
||||||
|
config = vs.get_model_config()
|
||||||
|
if not config:
|
||||||
|
return None
|
||||||
|
return {
|
||||||
|
"model_profile": config.get("model_profile"),
|
||||||
|
"model_name": config.get("model_name"),
|
||||||
|
"embedding_dim": config.get("embedding_dim"),
|
||||||
|
"backend": config.get("backend"),
|
||||||
|
"created_at": config.get("created_at"),
|
||||||
|
"updated_at": config.get("updated_at"),
|
||||||
|
}
|
||||||
|
except Exception:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _inspect_centralized_embeddings(index_root: Path) -> Dict[str, Any]:
|
||||||
|
"""Inspect centralized vector artifacts stored directly at the current root."""
|
||||||
|
dense_index_path = index_root / VECTORS_HNSW_NAME
|
||||||
|
meta_db_path = index_root / VECTORS_META_DB_NAME
|
||||||
|
binary_index_path = index_root / BINARY_VECTORS_MMAP_NAME
|
||||||
|
|
||||||
|
result: Dict[str, Any] = {
|
||||||
|
"index_root": str(index_root),
|
||||||
|
"dense_index_path": str(dense_index_path) if dense_index_path.exists() else None,
|
||||||
|
"binary_index_path": str(binary_index_path) if binary_index_path.exists() else None,
|
||||||
|
"meta_db_path": str(meta_db_path) if meta_db_path.exists() else None,
|
||||||
|
"dense_index_exists": dense_index_path.exists(),
|
||||||
|
"binary_index_exists": binary_index_path.exists(),
|
||||||
|
"meta_db_exists": meta_db_path.exists(),
|
||||||
|
"chunk_metadata_rows": 0,
|
||||||
|
"binary_vector_rows": 0,
|
||||||
|
"files_with_embeddings": 0,
|
||||||
|
"dense_ready": False,
|
||||||
|
"binary_ready": False,
|
||||||
|
"usable": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
if not meta_db_path.exists():
|
||||||
|
return result
|
||||||
|
|
||||||
|
try:
|
||||||
|
with sqlite3.connect(meta_db_path) as conn:
|
||||||
|
result["chunk_metadata_rows"] = _sqlite_count_rows(conn, "chunk_metadata")
|
||||||
|
result["binary_vector_rows"] = _sqlite_count_rows(conn, "binary_vectors")
|
||||||
|
result["files_with_embeddings"] = _sqlite_count_distinct_rows(conn, "chunk_metadata", "file_path")
|
||||||
|
except Exception as exc:
|
||||||
|
result["error"] = f"Failed to inspect centralized metadata: {exc}"
|
||||||
|
return result
|
||||||
|
|
||||||
|
result["dense_ready"] = result["dense_index_exists"] and result["chunk_metadata_rows"] > 0
|
||||||
|
result["binary_ready"] = (
|
||||||
|
result["binary_index_exists"]
|
||||||
|
and result["chunk_metadata_rows"] > 0
|
||||||
|
and result["binary_vector_rows"] > 0
|
||||||
|
)
|
||||||
|
result["usable"] = result["dense_ready"] or result["binary_ready"]
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
def _get_embedding_defaults() -> tuple[str, str, bool, List, str, float]:
|
def _get_embedding_defaults() -> tuple[str, str, bool, List, str, float]:
|
||||||
"""Get default embedding settings from config.
|
"""Get default embedding settings from config.
|
||||||
|
|
||||||
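The SQLite helpers added in this hunk degrade gracefully: a missing table counts as zero rows instead of raising `sqlite3.OperationalError`. A self-contained demonstration against an in-memory database (the helper bodies are copied from the hunk, renamed without the leading underscore so the snippet stands alone):

```python
import sqlite3

def sqlite_table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    )
    return cursor.fetchone() is not None

def sqlite_count_rows(conn: sqlite3.Connection, table_name: str) -> int:
    # Missing tables count as empty rather than raising OperationalError.
    if not sqlite_table_exists(conn, table_name):
        return 0
    return int(conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0] or 0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_metadata (file_path TEXT)")
conn.executemany("INSERT INTO chunk_metadata VALUES (?)", [("a.py",), ("a.py",), ("b.py",)])
```

This is what lets `_inspect_centralized_embeddings` report zeros for a metadata database created by an older version that lacks the `binary_vectors` table.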
@@ -1024,7 +1124,7 @@ def _discover_index_dbs_internal(index_root: Path) -> List[Path]:
|
|||||||
if not index_root.exists():
|
if not index_root.exists():
|
||||||
return []
|
return []
|
||||||
|
|
||||||
return sorted(index_root.rglob("_index.db"))
|
return sorted(filter_index_paths(index_root.rglob("_index.db"), index_root))
|
||||||
|
|
||||||
|
|
||||||
def build_centralized_binary_vectors_from_existing(
|
def build_centralized_binary_vectors_from_existing(
|
||||||
@@ -1353,7 +1453,7 @@ def find_all_indexes(scan_dir: Path) -> List[Path]:
|
|||||||
if not scan_dir.exists():
|
if not scan_dir.exists():
|
||||||
return []
|
return []
|
||||||
|
|
||||||
return list(scan_dir.rglob("_index.db"))
|
return _discover_index_dbs_internal(scan_dir)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -1866,8 +1966,32 @@ def get_embeddings_status(index_root: Path) -> Dict[str, any]:
         Aggregated status with coverage statistics, model info, and timestamps
     """
     index_files = _discover_index_dbs_internal(index_root)
+    centralized = _inspect_centralized_embeddings(index_root)
+    root_index_path = index_root / "_index.db"
+    root_index_exists = root_index_path.exists()
 
     if not index_files:
+        root_result = {
+            "index_path": str(root_index_path),
+            "exists": root_index_exists,
+            "total_files": 0,
+            "files_with_embeddings": 0,
+            "files_without_embeddings": 0,
+            "total_chunks": 0,
+            "coverage_percent": 0.0,
+            "has_embeddings": False,
+            "storage_mode": "none",
+        }
+        subtree_result = {
+            "total_indexes": 0,
+            "total_files": 0,
+            "files_with_embeddings": 0,
+            "files_without_embeddings": 0,
+            "total_chunks": 0,
+            "coverage_percent": 0.0,
+            "indexes_with_embeddings": 0,
+            "indexes_without_embeddings": 0,
+        }
         return {
             "success": True,
             "result": {
@@ -1880,72 +2004,123 @@ def get_embeddings_status(index_root: Path) -> Dict[str, any]:
                 "indexes_with_embeddings": 0,
                 "indexes_without_embeddings": 0,
                 "model_info": None,
+                "root": root_result,
+                "subtree": subtree_result,
+                "centralized": centralized,
             },
         }
 
-    total_files = 0
-    files_with_embeddings = 0
-    total_chunks = 0
-    indexes_with_embeddings = 0
-    model_info = None
+    subtree_total_files = 0
+    subtree_files_with_embeddings = 0
+    subtree_total_chunks = 0
+    subtree_indexes_with_embeddings = 0
+    subtree_model_info = None
     latest_updated_at = None
 
     for index_path in index_files:
         status = check_index_embeddings(index_path)
-        if status["success"]:
-            result = status["result"]
-            total_files += result["total_files"]
-            files_with_embeddings += result["files_with_chunks"]
-            total_chunks += result["total_chunks"]
-            if result["has_embeddings"]:
-                indexes_with_embeddings += 1
-
-                # Get model config from first index with embeddings (they should all match)
-                if model_info is None:
-                    try:
-                        from codexlens.semantic.vector_store import VectorStore
-                        with VectorStore(index_path) as vs:
-                            config = vs.get_model_config()
-                            if config:
-                                model_info = {
-                                    "model_profile": config.get("model_profile"),
-                                    "model_name": config.get("model_name"),
-                                    "embedding_dim": config.get("embedding_dim"),
-                                    "backend": config.get("backend"),
-                                    "created_at": config.get("created_at"),
-                                    "updated_at": config.get("updated_at"),
-                                }
-                                latest_updated_at = config.get("updated_at")
-                    except Exception:
-                        pass
-                else:
-                    # Track the latest updated_at across all indexes
-                    try:
-                        from codexlens.semantic.vector_store import VectorStore
-                        with VectorStore(index_path) as vs:
-                            config = vs.get_model_config()
-                            if config and config.get("updated_at"):
-                                if latest_updated_at is None or config["updated_at"] > latest_updated_at:
-                                    latest_updated_at = config["updated_at"]
-                    except Exception:
-                        pass
-
-    # Update model_info with latest timestamp
-    if model_info and latest_updated_at:
-        model_info["updated_at"] = latest_updated_at
+        if not status["success"]:
+            continue
+        result = status["result"]
+        subtree_total_files += result["total_files"]
+        subtree_files_with_embeddings += result["files_with_chunks"]
+        subtree_total_chunks += result["total_chunks"]
+        if not result["has_embeddings"]:
+            continue
+        subtree_indexes_with_embeddings += 1
+        candidate_model_info = _get_model_info_from_index(index_path)
+        if not candidate_model_info:
+            continue
+        if subtree_model_info is None:
+            subtree_model_info = candidate_model_info
+            latest_updated_at = candidate_model_info.get("updated_at")
+            continue
+        candidate_updated_at = candidate_model_info.get("updated_at")
+        if candidate_updated_at and (latest_updated_at is None or candidate_updated_at > latest_updated_at):
+            latest_updated_at = candidate_updated_at
+
+    if subtree_model_info and latest_updated_at:
+        subtree_model_info["updated_at"] = latest_updated_at
+
+    root_total_files = 0
+    root_files_with_embeddings = 0
+    root_total_chunks = 0
+    root_has_embeddings = False
+    root_storage_mode = "none"
+
+    if root_index_exists:
+        root_status = check_index_embeddings(root_index_path)
+        if root_status["success"]:
+            root_data = root_status["result"]
+            root_total_files = int(root_data["total_files"])
+            if root_data["has_embeddings"]:
+                root_files_with_embeddings = int(root_data["files_with_chunks"])
+                root_total_chunks = int(root_data["total_chunks"])
+                root_has_embeddings = True
+                root_storage_mode = "distributed"
+
+    if centralized["usable"]:
+        root_files_with_embeddings = int(centralized["files_with_embeddings"])
+        root_total_chunks = int(centralized["chunk_metadata_rows"])
+        root_has_embeddings = True
+        root_storage_mode = "centralized" if root_storage_mode == "none" else "mixed"
+
+    model_info = None
+    if root_has_embeddings:
+        if root_storage_mode in {"distributed", "mixed"} and root_index_exists:
+            model_info = _get_model_info_from_index(root_index_path)
+        if model_info is None and root_storage_mode in {"centralized", "mixed"}:
+            model_info = subtree_model_info
+
+    root_coverage_percent = round(
+        (root_files_with_embeddings / root_total_files * 100) if root_total_files > 0 else 0,
+        1,
+    )
+    root_files_without_embeddings = max(root_total_files - root_files_with_embeddings, 0)
+
+    root_result = {
+        "index_path": str(root_index_path),
+        "exists": root_index_exists,
+        "total_files": root_total_files,
+        "files_with_embeddings": root_files_with_embeddings,
+        "files_without_embeddings": root_files_without_embeddings,
+        "total_chunks": root_total_chunks,
+        "coverage_percent": root_coverage_percent,
+        "has_embeddings": root_has_embeddings,
+        "storage_mode": root_storage_mode,
+    }
+    subtree_result = {
+        "total_indexes": len(index_files),
+        "total_files": subtree_total_files,
+        "files_with_embeddings": subtree_files_with_embeddings,
+        "files_without_embeddings": subtree_total_files - subtree_files_with_embeddings,
+        "total_chunks": subtree_total_chunks,
+        "coverage_percent": round(
+            (subtree_files_with_embeddings / subtree_total_files * 100) if subtree_total_files > 0 else 0,
+            1,
+        ),
+        "indexes_with_embeddings": subtree_indexes_with_embeddings,
+        "indexes_without_embeddings": len(index_files) - subtree_indexes_with_embeddings,
+    }
 
     return {
         "success": True,
         "result": {
-            "total_indexes": len(index_files),
-            "total_files": total_files,
-            "files_with_embeddings": files_with_embeddings,
-            "files_without_embeddings": total_files - files_with_embeddings,
+            "total_indexes": 1 if root_index_exists else 0,
+            "total_files": root_total_files,
+            "files_with_embeddings": root_files_with_embeddings,
+            "files_without_embeddings": root_files_without_embeddings,
||||||
"total_chunks": total_chunks,
|
"total_chunks": root_total_chunks,
|
||||||
"coverage_percent": round((files_with_embeddings / total_files * 100) if total_files > 0 else 0, 1),
|
"coverage_percent": root_coverage_percent,
|
||||||
"indexes_with_embeddings": indexes_with_embeddings,
|
"indexes_with_embeddings": 1 if root_has_embeddings else 0,
|
||||||
"indexes_without_embeddings": len(index_files) - indexes_with_embeddings,
|
"indexes_without_embeddings": 1 if root_index_exists and not root_has_embeddings else 0,
|
||||||
"model_info": model_info,
|
"model_info": model_info,
|
||||||
|
"root": root_result,
|
||||||
|
"subtree": subtree_result,
|
||||||
|
"centralized": centralized,
|
||||||
},
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -126,11 +126,14 @@ class Config:
     enable_reranking: bool = False
     reranking_top_k: int = 50
     symbol_boost_factor: float = 1.5
+    test_file_penalty: float = 0.15  # Penalty for test/fixture paths during final ranking
+    generated_file_penalty: float = 0.35  # Penalty for generated/build artifact paths during final ranking

     # Optional cross-encoder reranking (second stage; requires optional reranker deps)
     enable_cross_encoder_rerank: bool = False
     reranker_backend: str = "onnx"
     reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
+    reranker_use_gpu: bool = True  # Whether reranker backends should use GPU acceleration
     reranker_top_k: int = 50
     reranker_max_input_tokens: int = 8192  # Maximum tokens for reranker API batching
     reranker_chunk_type_weights: Optional[Dict[str, float]] = None  # Weights for chunk types: {"code": 1.0, "docstring": 0.7}
@@ -312,6 +315,7 @@ class Config:
             "enabled": self.enable_cross_encoder_rerank,
             "backend": self.reranker_backend,
             "model": self.reranker_model,
+            "use_gpu": self.reranker_use_gpu,
             "top_k": self.reranker_top_k,
             "max_input_tokens": self.reranker_max_input_tokens,
             "pool_enabled": self.reranker_pool_enabled,
@@ -418,6 +422,8 @@ class Config:
             )
         if "model" in reranker:
             self.reranker_model = reranker["model"]
+        if "use_gpu" in reranker:
+            self.reranker_use_gpu = reranker["use_gpu"]
         if "top_k" in reranker:
             self.reranker_top_k = reranker["top_k"]
         if "max_input_tokens" in reranker:
@@ -712,6 +718,7 @@ class Config:
             EMBEDDING_COOLDOWN: Rate limit cooldown for embedding
             RERANKER_MODEL: Override reranker model
             RERANKER_BACKEND: Override reranker backend
+            RERANKER_USE_GPU: Override reranker GPU usage (true/false)
             RERANKER_ENABLED: Override reranker enabled state (true/false)
             RERANKER_POOL_ENABLED: Enable reranker high availability pool
             RERANKER_STRATEGY: Load balance strategy for reranker
@@ -832,6 +839,11 @@ class Config:
         else:
             log.warning("Invalid RERANKER_BACKEND in .env: %r", reranker_backend)

+        reranker_use_gpu = get_env("RERANKER_USE_GPU")
+        if reranker_use_gpu:
+            self.reranker_use_gpu = _parse_bool(reranker_use_gpu)
+            log.debug("Overriding reranker_use_gpu from .env: %s", self.reranker_use_gpu)
+
         reranker_enabled = get_env("RERANKER_ENABLED")
         if reranker_enabled:
             value = reranker_enabled.lower()
@@ -878,6 +890,25 @@ class Config:
             except ValueError:
                 log.warning("Invalid RERANKER_TEST_FILE_PENALTY in .env: %r", test_penalty)

+        ranking_test_penalty = get_env("TEST_FILE_PENALTY")
+        if ranking_test_penalty:
+            try:
+                self.test_file_penalty = float(ranking_test_penalty)
+                log.debug("Overriding test_file_penalty from .env: %s", self.test_file_penalty)
+            except ValueError:
+                log.warning("Invalid TEST_FILE_PENALTY in .env: %r", ranking_test_penalty)
+
+        generated_penalty = get_env("GENERATED_FILE_PENALTY")
+        if generated_penalty:
+            try:
+                self.generated_file_penalty = float(generated_penalty)
+                log.debug(
+                    "Overriding generated_file_penalty from .env: %s",
+                    self.generated_file_penalty,
+                )
+            except ValueError:
+                log.warning("Invalid GENERATED_FILE_PENALTY in .env: %r", generated_penalty)
+
         docstring_weight = get_env("RERANKER_DOCSTRING_WEIGHT")
         if docstring_weight:
             try:
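The env-override handling added above follows one pattern throughout: read the raw string, try to parse it, and keep the existing value with a warning when parsing fails. A minimal standalone sketch of that pattern (the `apply_float_override` helper and the dict-based env source here are illustrative stand-ins for the real `get_env`/`Config` plumbing, not code from this diff):

```python
import logging

log = logging.getLogger(__name__)


def apply_float_override(env: dict, key: str, current: float) -> float:
    """Return the parsed float override for `key`, or `current` if absent/invalid."""
    raw = env.get(key)
    if not raw:
        return current
    try:
        return float(raw)
    except ValueError:
        # Invalid values are logged and ignored, mirroring the hunks above.
        log.warning("Invalid %s in .env: %r", key, raw)
        return current


# Mirrors the TEST_FILE_PENALTY / GENERATED_FILE_PENALTY handling above.
test_penalty = apply_float_override({"TEST_FILE_PENALTY": "0.25"}, "TEST_FILE_PENALTY", 0.15)
bad_penalty = apply_float_override({"GENERATED_FILE_PENALTY": "oops"}, "GENERATED_FILE_PENALTY", 0.35)
```

Keeping the prior value on parse failure means a typo in `.env` degrades to the shipped default instead of crashing config load.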
@@ -23,6 +23,7 @@ ENV_VARS = {
    # Reranker configuration (overrides settings.json)
    "RERANKER_MODEL": "Reranker model name (overrides settings.json)",
    "RERANKER_BACKEND": "Reranker backend: fastembed, onnx, api, litellm, legacy",
+   "RERANKER_USE_GPU": "Use GPU for local reranker backends: true/false",
    "RERANKER_ENABLED": "Enable reranker: true/false",
    "RERANKER_API_KEY": "API key for reranker service (SiliconFlow/Cohere/Jina)",
    "RERANKER_API_BASE": "Base URL for reranker API (overrides provider default)",
@@ -65,6 +66,9 @@ ENV_VARS = {
    # Chunking configuration
    "CHUNK_STRIP_COMMENTS": "Strip comments from code chunks for embedding: true/false (default: true)",
    "CHUNK_STRIP_DOCSTRINGS": "Strip docstrings from code chunks for embedding: true/false (default: true)",
+   # Search ranking tuning
+   "TEST_FILE_PENALTY": "Penalty for test/fixture paths in final search ranking: 0.0-1.0 (default: 0.15)",
+   "GENERATED_FILE_PENALTY": "Penalty for generated/build artifact paths in final search ranking: 0.0-1.0 (default: 0.35)",
    # Reranker tuning
    "RERANKER_TEST_FILE_PENALTY": "Penalty for test files in reranking: 0.0-1.0 (default: 0.0)",
    "RERANKER_DOCSTRING_WEIGHT": "Weight for docstring chunks in reranking: 0.0-1.0 (default: 1.0)",
File diff suppressed because it is too large
@@ -7,6 +7,7 @@ results via Reciprocal Rank Fusion (RRF) algorithm.
 from __future__ import annotations

 import logging
+import threading
 import time
 from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError, as_completed
 from contextlib import contextmanager
@@ -34,19 +35,21 @@ from codexlens.config import Config
 from codexlens.config import VECTORS_HNSW_NAME
 from codexlens.entities import SearchResult
 from codexlens.search.ranking import (
-    DEFAULT_WEIGHTS,
+    DEFAULT_WEIGHTS as RANKING_DEFAULT_WEIGHTS,
     QueryIntent,
     apply_symbol_boost,
     cross_encoder_rerank,
     detect_query_intent,
     filter_results_by_category,
     get_rrf_weights,
+    query_prefers_lexical_search,
     reciprocal_rank_fusion,
     rerank_results,
     simple_weighted_fusion,
     tag_search_source,
 )
 from codexlens.storage.dir_index import DirIndexStore
+from codexlens.storage.index_filters import filter_index_paths

 # Optional LSP imports (for real-time graph expansion)
 try:
@@ -67,8 +70,13 @@ class HybridSearchEngine:
         default_weights: Default RRF weights for each source
     """

-    # NOTE: DEFAULT_WEIGHTS imported from ranking.py - single source of truth
-    # FTS + vector hybrid mode (exact: 0.3, fuzzy: 0.1, vector: 0.6)
+    # Public compatibility contract for callers/tests that expect the legacy
+    # three-backend defaults on the engine instance.
+    DEFAULT_WEIGHTS = {
+        "exact": 0.3,
+        "fuzzy": 0.1,
+        "vector": 0.6,
+    }
+
     def __init__(
         self,
@@ -95,11 +103,172 @@ class HybridSearchEngine:
                 f"Did you mean to pass index_path to search() instead of __init__()?"
             )

-        self.weights = weights or DEFAULT_WEIGHTS.copy()
+        self.weights = weights
         self._config = config
         self.embedder = embedder
         self.reranker: Any = None
         self._use_gpu = config.embedding_use_gpu if config else True
+        self._centralized_cache_lock = threading.RLock()
+        self._centralized_model_config_cache: Dict[str, Any] = {}
+        self._centralized_embedder_cache: Dict[tuple[Any, ...], Any] = {}
+        self._centralized_ann_cache: Dict[tuple[str, int], Any] = {}
+        self._centralized_query_embedding_cache: Dict[tuple[Any, ...], Any] = {}
+
+    @property
+    def weights(self) -> Dict[str, float]:
+        """Public/default weights exposed for backwards compatibility."""
+        return dict(self._weights)
+
+    @weights.setter
+    def weights(self, value: Optional[Dict[str, float]]) -> None:
+        """Update public and internal fusion weights together."""
+        if value is None:
+            public_weights = self.DEFAULT_WEIGHTS.copy()
+            fusion_weights = dict(RANKING_DEFAULT_WEIGHTS)
+            fusion_weights.update(public_weights)
+        else:
+            if not isinstance(value, dict):
+                raise TypeError(f"weights must be a dict, got {type(value).__name__}")
+            public_weights = dict(value)
+            fusion_weights = dict(value)
+
+        self._weights = public_weights
+        self._fusion_weights = fusion_weights
+
+    @staticmethod
+    def _clamp_search_score(score: float) -> float:
+        """Keep ANN-derived similarity scores within SearchResult's valid domain."""
+        return max(0.0, float(score))
+
+    def _get_centralized_model_config(self, index_root: Path) -> Optional[Dict[str, Any]]:
+        """Load and cache the centralized embedding model config for an index root."""
+        root_key = str(Path(index_root).resolve())
+
+        with self._centralized_cache_lock:
+            if root_key in self._centralized_model_config_cache:
+                cached = self._centralized_model_config_cache[root_key]
+                return dict(cached) if isinstance(cached, dict) else None
+
+        model_config: Optional[Dict[str, Any]] = None
+        try:
+            from codexlens.semantic.vector_store import VectorStore
+
+            central_index_path = Path(root_key) / "_index.db"
+            if central_index_path.exists():
+                with VectorStore(central_index_path) as vs:
+                    loaded = vs.get_model_config()
+                if isinstance(loaded, dict):
+                    model_config = dict(loaded)
+                    self.logger.debug(
+                        "Loaded model config from centralized index: %s",
+                        model_config,
+                    )
+        except Exception as exc:
+            self.logger.debug(
+                "Failed to load model config from centralized index: %s",
+                exc,
+            )
+
+        with self._centralized_cache_lock:
+            self._centralized_model_config_cache[root_key] = (
+                dict(model_config) if isinstance(model_config, dict) else None
+            )
+
+        return dict(model_config) if isinstance(model_config, dict) else None
+
+    def _get_centralized_embedder(
+        self,
+        model_config: Optional[Dict[str, Any]],
+    ) -> tuple[Any, int, tuple[Any, ...]]:
+        """Resolve and cache the embedder used for centralized vector search."""
+        from codexlens.semantic.factory import get_embedder
+
+        backend = "fastembed"
+        model_name: Optional[str] = None
+        model_profile = "code"
+        use_gpu = bool(self._use_gpu)
+        embedding_dim: Optional[int] = None
+
+        if model_config:
+            backend = str(model_config.get("backend", "fastembed") or "fastembed")
+            model_name = model_config.get("model_name")
+            model_profile = str(model_config.get("model_profile", "code") or "code")
+            raw_dim = model_config.get("embedding_dim")
+            embedding_dim = int(raw_dim) if raw_dim else None
+
+        if backend == "litellm":
+            embedder_key: tuple[Any, ...] = ("litellm", model_name or "", None)
+        else:
+            embedder_key = ("fastembed", model_profile, use_gpu)
+
+        with self._centralized_cache_lock:
+            cached = self._centralized_embedder_cache.get(embedder_key)
+        if cached is None:
+            if backend == "litellm":
+                cached = get_embedder(backend="litellm", model=model_name)
+            else:
+                cached = get_embedder(
+                    backend="fastembed",
+                    profile=model_profile,
+                    use_gpu=use_gpu,
+                )
+            with self._centralized_cache_lock:
+                existing = self._centralized_embedder_cache.get(embedder_key)
+                if existing is None:
+                    self._centralized_embedder_cache[embedder_key] = cached
+                else:
+                    cached = existing
+
+        if embedding_dim is None:
+            embedding_dim = int(getattr(cached, "embedding_dim", 0) or 0)
+
+        return cached, embedding_dim, embedder_key
+
+    def _get_centralized_ann_index(self, index_root: Path, dim: int) -> Any:
+        """Load and cache a centralized ANN index for repeated searches."""
+        from codexlens.semantic.ann_index import ANNIndex
+
+        resolved_root = Path(index_root).resolve()
+        cache_key = (str(resolved_root), int(dim))
+
+        with self._centralized_cache_lock:
+            cached = self._centralized_ann_cache.get(cache_key)
+            if cached is not None:
+                return cached
+
+        ann_index = ANNIndex.create_central(index_root=resolved_root, dim=int(dim))
+        if not ann_index.load():
+            return None
+
+        with self._centralized_cache_lock:
+            existing = self._centralized_ann_cache.get(cache_key)
+            if existing is None:
+                self._centralized_ann_cache[cache_key] = ann_index
+                return ann_index
+            return existing
+
+    def _get_cached_query_embedding(
+        self,
+        query: str,
+        embedder: Any,
+        embedder_key: tuple[Any, ...],
+    ) -> Any:
+        """Cache repeated query embeddings for the same embedder settings."""
+        cache_key = embedder_key + (query,)
+
+        with self._centralized_cache_lock:
+            cached = self._centralized_query_embedding_cache.get(cache_key)
+            if cached is not None:
+                return cached
+
+        query_embedding = embedder.embed_single(query)
+        with self._centralized_cache_lock:
+            existing = self._centralized_query_embedding_cache.get(cache_key)
+            if existing is None:
+                self._centralized_query_embedding_cache[cache_key] = query_embedding
+                return query_embedding
+            return existing
+
     def search(
         self,
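The centralized cache helpers added above all share one shape: check the cache under the lock, build outside the lock, then re-check so a racing thread's value wins. A self-contained sketch of that pattern (the `CachedLoader` class and its string factory are illustrative, not part of this diff):

```python
import threading


class CachedLoader:
    """Lock-guarded cache mirroring the _get_centralized_* helpers above."""

    def __init__(self, factory):
        self._lock = threading.RLock()
        self._cache = {}
        self._factory = factory
        self.builds = 0  # counts real constructions, for demonstration

    def get(self, key):
        with self._lock:
            cached = self._cache.get(key)
            if cached is not None:
                return cached
        # Build outside the lock (construction may be slow), then re-check
        # so that a value inserted by a racing thread is preferred.
        value = self._factory(key)
        self.builds += 1
        with self._lock:
            existing = self._cache.get(key)
            if existing is None:
                self._cache[key] = value
                return value
            return existing


loader = CachedLoader(lambda k: f"index-for-{k}")
first = loader.get("root")
second = loader.get("root")
```

Holding the lock only around cache reads and writes keeps slow work (model loading, ANN index loading, query embedding) off the critical section.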
@@ -154,6 +323,7 @@ class HybridSearchEngine:

         # Detect query intent early for category filtering at index level
         query_intent = detect_query_intent(query)
+        lexical_priority_query = query_prefers_lexical_search(query)
         # Map intent to category for vector search:
         # - KEYWORD (code intent) -> filter to 'code' only
         # - SEMANTIC (doc intent) -> no filter (allow docs to surface)
@@ -182,11 +352,11 @@ class HybridSearchEngine:
             backends["exact"] = True
         if enable_fuzzy:
             backends["fuzzy"] = True
-        if enable_vector:
+        if enable_vector and not lexical_priority_query:
             backends["vector"] = True

         # Add LSP graph expansion if requested and available
-        if enable_lsp_graph and HAS_LSP:
+        if enable_lsp_graph and HAS_LSP and not lexical_priority_query:
             backends["lsp_graph"] = True
         elif enable_lsp_graph and not HAS_LSP:
             self.logger.warning(
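The hunk above gates vector search and LSP graph expansion behind `query_prefers_lexical_search`, whose body is not shown in this diff. A plausible minimal heuristic, reusing the env-style identifier pattern that the ranking module compiles later in this commit (the routing rule itself is an illustrative assumption, not the shipped logic):

```python
import re

# Same shape as the _ENV_STYLE_QUERY_RE pattern added to ranking.py below;
# the prefers_lexical routing rule is a hypothetical sketch.
ENV_STYLE_RE = re.compile(r"\b[A-Z][A-Z0-9]+(?:_[A-Z0-9]+)+\b")


def prefers_lexical(query: str) -> bool:
    """Route exact-identifier queries (e.g. SCREAMING_SNAKE env vars) to FTS only."""
    return bool(ENV_STYLE_RE.search(query))


lexical = prefers_lexical("where is RERANKER_USE_GPU read")
semantic = prefers_lexical("how does reranking work")
```

Skipping the vector backend and rerankers for such queries avoids paying embedding latency when an exact token match is what the user wants.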
@@ -214,7 +384,7 @@ class HybridSearchEngine:
         # Filter weights to only active backends
         active_weights = {
             source: weight
-            for source, weight in self.weights.items()
+            for source, weight in self._fusion_weights.items()
             if source in results_map
         }

@@ -247,10 +417,16 @@ class HybridSearchEngine:
             )

         # Optional: embedding-based reranking on top results
-        if self._config is not None and self._config.enable_reranking:
+        if (
+            self._config is not None
+            and self._config.enable_reranking
+            and not lexical_priority_query
+        ):
             with timer("reranking", self.logger):
                 if self.embedder is None:
-                    self.embedder = self._get_reranking_embedder()
+                    with self._centralized_cache_lock:
+                        if self.embedder is None:
+                            self.embedder = self._get_reranking_embedder()
                 fused_results = rerank_results(
                     query,
                     fused_results[:100],
@@ -267,10 +443,13 @@ class HybridSearchEngine:
             self._config is not None
             and self._config.enable_reranking
             and self._config.enable_cross_encoder_rerank
+            and not lexical_priority_query
         ):
             with timer("cross_encoder_rerank", self.logger):
                 if self.reranker is None:
-                    self.reranker = self._get_cross_encoder_reranker()
+                    with self._centralized_cache_lock:
+                        if self.reranker is None:
+                            self.reranker = self._get_cross_encoder_reranker()
                 if self.reranker is not None:
                     fused_results = cross_encoder_rerank(
                         query,
@@ -363,11 +542,18 @@ class HybridSearchEngine:

         device: str | None = None
         kwargs: dict[str, Any] = {}
+        reranker_use_gpu = bool(
+            getattr(
+                self._config,
+                "reranker_use_gpu",
+                getattr(self._config, "embedding_use_gpu", True),
+            )
+        )
+
         if backend == "onnx":
-            kwargs["use_gpu"] = bool(getattr(self._config, "embedding_use_gpu", True))
+            kwargs["use_gpu"] = reranker_use_gpu
         elif backend == "legacy":
-            if not bool(getattr(self._config, "embedding_use_gpu", True)):
+            if not reranker_use_gpu:
                 device = "cpu"
         elif backend == "api":
             # Pass max_input_tokens for adaptive batching
@@ -573,60 +759,16 @@ class HybridSearchEngine:
             List of SearchResult objects ordered by semantic similarity
         """
         try:
-            import sqlite3
-            import json
-            from codexlens.semantic.factory import get_embedder
-            from codexlens.semantic.ann_index import ANNIndex
-
-            # Get model config from the first index database we can find
-            # (all indexes should use the same embedding model)
             index_root = hnsw_path.parent
-            model_config = None
-
-            # Try to get model config from the centralized index root first
-            # (not the sub-directory index_path, which may have outdated config)
-            try:
-                from codexlens.semantic.vector_store import VectorStore
-                central_index_path = index_root / "_index.db"
-                if central_index_path.exists():
-                    with VectorStore(central_index_path) as vs:
-                        model_config = vs.get_model_config()
-                        self.logger.debug(
-                            "Loaded model config from centralized index: %s",
-                            model_config
-                        )
-            except Exception as e:
-                self.logger.debug("Failed to load model config from centralized index: %s", e)
-
-            # Detect dimension from HNSW file if model config not found
+            model_config = self._get_centralized_model_config(index_root)
             if model_config is None:
-                self.logger.debug("Model config not found, will detect from HNSW index")
-                # Create a temporary ANNIndex to load and detect dimension
-                # We need to know the dimension to properly load the index
-
-            # Get embedder based on model config or default
-            if model_config:
-                backend = model_config.get("backend", "fastembed")
-                model_name = model_config["model_name"]
-                model_profile = model_config["model_profile"]
-                embedding_dim = model_config["embedding_dim"]
-
-                if backend == "litellm":
-                    embedder = get_embedder(backend="litellm", model=model_name)
-                else:
-                    embedder = get_embedder(backend="fastembed", profile=model_profile)
-            else:
-                # Default to code profile
-                embedder = get_embedder(backend="fastembed", profile="code")
-                embedding_dim = embedder.embedding_dim
+                self.logger.debug("Model config not found, will detect from cached embedder")
+            embedder, embedding_dim, embedder_key = self._get_centralized_embedder(model_config)

             # Load centralized ANN index
             start_load = time.perf_counter()
-            ann_index = ANNIndex.create_central(
-                index_root=index_root,
-                dim=embedding_dim,
-            )
-            if not ann_index.load():
+            ann_index = self._get_centralized_ann_index(index_root=index_root, dim=embedding_dim)
+            if ann_index is None:
                 self.logger.warning("Failed to load centralized vector index from %s", hnsw_path)
                 return []
             self.logger.debug(
@@ -637,7 +779,7 @@ class HybridSearchEngine:

             # Generate query embedding
             start_embed = time.perf_counter()
-            query_embedding = embedder.embed_single(query)
+            query_embedding = self._get_cached_query_embedding(query, embedder, embedder_key)
             self.logger.debug(
                 "[TIMING] query_embedding: %.2fms",
                 (time.perf_counter() - start_embed) * 1000
@@ -658,7 +800,7 @@ class HybridSearchEngine:
                 return []

             # Convert distances to similarity scores (for cosine: score = 1 - distance)
-            scores = [1.0 - d for d in distances]
+            scores = [self._clamp_search_score(1.0 - d) for d in distances]

             # Fetch chunk metadata from semantic_chunks tables
             # We need to search across all _index.db files in the project
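The clamp introduced above guards the cosine-distance-to-similarity conversion: ANN backends can report distances slightly above 1.0 for near-orthogonal vectors, which would otherwise yield negative scores. A standalone sketch of the conversion (the `clamp_similarity` name is illustrative; the diff's version is the `_clamp_search_score` static method):

```python
def clamp_similarity(distance: float) -> float:
    """Cosine distance -> similarity, floored at 0.0 as in _clamp_search_score."""
    return max(0.0, 1.0 - distance)


# A distance of 0.25 maps to a 0.75 similarity; a distance above 1.0,
# which some ANN indexes can emit, is floored instead of going negative.
ok = clamp_similarity(0.25)
floored = clamp_similarity(1.2)
```

Flooring at zero keeps every score inside the domain that `SearchResult` and the downstream fusion code expect.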
@@ -755,7 +897,7 @@ class HybridSearchEngine:
                 start_line = row.get("start_line")
                 end_line = row.get("end_line")

-                score = score_map.get(chunk_id, 0.0)
+                score = self._clamp_search_score(score_map.get(chunk_id, 0.0))

                 # Build excerpt
                 excerpt = content[:200] + "..." if len(content) > 200 else content
@@ -818,7 +960,7 @@ class HybridSearchEngine:
         import json

         # Find all _index.db files
-        index_files = list(index_root.rglob("_index.db"))
+        index_files = filter_index_paths(index_root.rglob("_index.db"), index_root)

         results = []
         found_ids = set()
@@ -870,7 +1012,7 @@ class HybridSearchEngine:
                 metadata_json = row["metadata"]
                 metadata = json.loads(metadata_json) if metadata_json else {}

-                score = score_map.get(chunk_id, 0.0)
+                score = self._clamp_search_score(score_map.get(chunk_id, 0.0))

                 # Build excerpt
                 excerpt = content[:200] + "..." if len(content) > 200 else content
@@ -6,6 +6,7 @@ for combining results from heterogeneous search backends (exact FTS, fuzzy FTS,

 from __future__ import annotations

+import logging
 import re
 import math
 from enum import Enum
@@ -14,6 +15,8 @@ from typing import Any, Dict, List, Optional

 from codexlens.entities import SearchResult, AdditionalLocation

+logger = logging.getLogger(__name__)
+

 # Default RRF weights for hybrid search
 DEFAULT_WEIGHTS = {
@@ -32,6 +35,229 @@ class QueryIntent(str, Enum):
     MIXED = "mixed"


+_TEST_QUERY_RE = re.compile(
+    r"\b(test|tests|spec|specs|fixture|fixtures|benchmark|benchmarks)\b",
+    flags=re.IGNORECASE,
+)
+_AUXILIARY_QUERY_RE = re.compile(
+    r"\b(example|examples|demo|demos|sample|samples|debug|benchmark|benchmarks|profile|profiling)\b",
+    flags=re.IGNORECASE,
+)
+_ARTIFACT_QUERY_RE = re.compile(
+    r"(?<!\w)(dist|build|out|coverage|htmlcov|generated|bundle|compiled|artifact|artifacts|\.workflow)(?!\w)",
+    flags=re.IGNORECASE,
+)
+_ENV_STYLE_QUERY_RE = re.compile(r"\b[A-Z][A-Z0-9]+(?:_[A-Z0-9]+)+\b")
+_AUXILIARY_DIR_NAMES = frozenset(
+    {"example", "examples", "demo", "demos", "sample", "samples",
+     "benchmark", "benchmarks", "profile", "profiles"}
+)
+_GENERATED_DIR_NAMES = frozenset(
+    {"dist", "build", "out", "coverage", "htmlcov", ".cache", ".workflow",
+     ".next", ".nuxt", ".parcel-cache", ".turbo", "tmp", "temp", "generated"}
+)
+_GENERATED_FILE_SUFFIXES = (
+    ".generated.ts", ".generated.tsx", ".generated.js", ".generated.jsx",
+    ".generated.py", ".gen.ts", ".gen.tsx", ".gen.js", ".gen.jsx",
+    ".min.js", ".min.css", ".bundle.js", ".bundle.css",
+)
+_SOURCE_DIR_NAMES = frozenset(
+    {"src", "lib", "core", "app", "server", "client", "services"}
+)
+_IDENTIFIER_QUERY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+_TOPIC_TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
+_EXPLICIT_PATH_HINT_MARKER_RE = re.compile(r"[_\-/\\.]")
+_SEMANTIC_QUERY_STOPWORDS = frozenset(
+    {"the", "a", "an", "is", "are", "was", "were", "be", "been", "being",
+     "have", "has", "had", "do", "does", "did", "will", "would", "could",
+     "should", "may", "might", "must", "can", "to", "of", "in", "for", "on",
+     "with", "at", "by", "from", "as", "into", "through", "and", "but", "if",
+     "or", "not", "this", "that", "these", "those", "it", "its", "how",
+     "what", "where", "when", "why", "which", "who", "whom"}
+)
+_PATH_TOPIC_STOPWORDS = frozenset(
+    {*_SOURCE_DIR_NAMES, *_AUXILIARY_DIR_NAMES, *_GENERATED_DIR_NAMES,
+     "tool", "tools", "util", "utils", "test", "tests", "spec", "specs",
+     "fixture", "fixtures", "index", "main", "ts", "tsx", "js", "jsx",
+     "mjs", "cjs", "py", "java", "go", "rs", "rb", "php", "cs", "cpp",
+     "cc", "c", "h"}
+)
+_LEXICAL_PRIORITY_SURFACE_TOKENS = frozenset(
+    {"config", "configs", "configuration", "configurations", "setting",
+     "settings", "backend", "backends", "environment", "env", "variable",
+     "variables", "factory", "factories", "override", "overrides",
+     "option", "options", "flag", "flags", "mode", "modes"}
+)
+_LEXICAL_PRIORITY_FOCUS_TOKENS = frozenset(
+    {"embedding", "embeddings", "reranker", "rerankers", "onnx", "api",
+     "litellm", "fastembed", "local", "legacy", "stage", "stage2", "stage3",
+     "stage4", "precomputed", "realtime", "static", "global", "graph",
+     "selection", "model", "models"}
+)


 def normalize_weights(weights: Dict[str, float | None]) -> Dict[str, float | None]:
     """Normalize weights to sum to 1.0 (best-effort)."""
     total = sum(float(v) for v in weights.values() if v is not None)
@@ -66,6 +292,7 @@ def detect_query_intent(query: str) -> QueryIntent:
     has_code_signals = bool(
         re.search(r"(::|->|\.)", trimmed)
         or re.search(r"[A-Z][a-z]+[A-Z]", trimmed)
+        or re.search(r"\b[a-z]+[A-Z][A-Za-z0-9_]*\b", trimmed)
         or re.search(r"\b\w+_\w+\b", trimmed)
         or re.search(
             r"\b(def|class|function|const|let|var|import|from|return|async|await|interface|type)\b",
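The one-line addition above extends the code-signal check to lowerCamelCase identifiers, which the existing `[A-Z][a-z]+[A-Z]` pattern misses. A minimal illustration of just that regex (example strings are hypothetical):

```python
import re

# The new signal from the hunk: lowerCamelCase identifiers such as "getUserId".
camel = re.compile(r"\b[a-z]+[A-Z][A-Za-z0-9_]*\b")

print(bool(camel.search("where is getUserId defined")))  # True: "getUserId" starts lowercase
print(bool(camel.search("HybridSearchEngine")))           # False: UpperCamelCase has no leading [a-z]+ run
```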
@@ -119,6 +346,56 @@ def get_rrf_weights(
     return adjust_weights_by_intent(detect_query_intent(query), base_weights)


+def query_targets_test_files(query: str) -> bool:
+    """Return True when the query explicitly targets tests/spec fixtures."""
+    return bool(_TEST_QUERY_RE.search((query or "").strip()))
+
+
+def query_targets_generated_files(query: str) -> bool:
+    """Return True when the query explicitly targets generated/build artifacts."""
+    return bool(_ARTIFACT_QUERY_RE.search((query or "").strip()))
+
+
+def query_targets_auxiliary_files(query: str) -> bool:
+    """Return True when the query explicitly targets examples, benchmarks, or debug files."""
+    return bool(_AUXILIARY_QUERY_RE.search((query or "").strip()))
+
+
+def query_prefers_lexical_search(query: str) -> bool:
+    """Return True when config/env/factory style queries are safer with lexical-first search."""
+    trimmed = (query or "").strip()
+    if not trimmed:
+        return False
+
+    if _ENV_STYLE_QUERY_RE.search(trimmed):
+        return True
+
+    query_tokens = set(_semantic_query_topic_tokens(trimmed))
+    if not query_tokens:
+        return False
+
+    if query_tokens.intersection({"factory", "factories"}):
+        return True
+
+    if query_tokens.intersection({"environment", "env"}) and query_tokens.intersection({"variable", "variables"}):
+        return True
+
+    if "backend" in query_tokens and query_tokens.intersection(
+        {"embedding", "embeddings", "reranker", "rerankers", "onnx", "api", "litellm", "fastembed", "local", "legacy"}
+    ):
+        return True
+
+    surface_hits = query_tokens.intersection(_LEXICAL_PRIORITY_SURFACE_TOKENS)
+    focus_hits = query_tokens.intersection(_LEXICAL_PRIORITY_FOCUS_TOKENS)
+    return bool(surface_hits and focus_hits)
+
+
+def _normalized_path_parts(path: str) -> List[str]:
+    """Normalize a path string into casefolded components for heuristics."""
+    normalized = (path or "").replace("\\", "/")
+    return [part.casefold() for part in normalized.split("/") if part and part != "."]
+
+
 # File extensions to category mapping for fast lookup
 _EXT_TO_CATEGORY: Dict[str, str] = {
     # Code extensions
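The first branch of `query_prefers_lexical_search` routes any query containing an env-style ALL_CAPS token straight to lexical-first search. The pattern can be checked in isolation (re-declared verbatim from the constants hunk; `CODEX_LENS_BACKEND` is a hypothetical variable name used only for illustration):

```python
import re

# Env-style token pattern from the constants hunk: at least two ALL_CAPS
# segments joined by underscores, e.g. MY_FLAG or SOME_LONG_NAME.
env_re = re.compile(r"\b[A-Z][A-Z0-9]+(?:_[A-Z0-9]+)+\b")

print(bool(env_re.search("what sets CODEX_LENS_BACKEND")))   # True: matches the ALL_CAPS token
print(bool(env_re.search("environment variables overview"))) # False: no ALL_CAPS_WITH_UNDERSCORES token
```

This is why prose like "environment variables overview" still falls through to the token-overlap heuristics further down, while a literal variable name short-circuits immediately.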
@@ -196,6 +473,482 @@ def filter_results_by_category(
     return filtered


+def is_test_file(path: str) -> bool:
+    """Return True when a path clearly refers to a test/spec file."""
+    parts = _normalized_path_parts(path)
+    if not parts:
+        return False
+    basename = parts[-1]
+    return (
+        basename.startswith("test_")
+        or basename.endswith("_test.py")
+        or basename.endswith(".test.ts")
+        or basename.endswith(".test.tsx")
+        or basename.endswith(".test.js")
+        or basename.endswith(".test.jsx")
+        or basename.endswith(".spec.ts")
+        or basename.endswith(".spec.tsx")
+        or basename.endswith(".spec.js")
+        or basename.endswith(".spec.jsx")
+        or "tests" in parts[:-1]
+        or "test" in parts[:-1]
+        or "__fixtures__" in parts[:-1]
+        or "fixtures" in parts[:-1]
+    )
+
+
+def is_generated_artifact_path(path: str) -> bool:
+    """Return True when a path clearly points at generated/build artifacts."""
+    parts = _normalized_path_parts(path)
+    if not parts:
+        return False
+    basename = parts[-1]
+    return any(part in _GENERATED_DIR_NAMES for part in parts[:-1]) or basename.endswith(
+        _GENERATED_FILE_SUFFIXES
+    )
+
+
+def is_auxiliary_reference_path(path: str) -> bool:
+    """Return True for examples, benchmarks, demos, and debug helper files."""
+    parts = _normalized_path_parts(path)
+    if not parts:
+        return False
+    basename = parts[-1]
+    if any(part in _AUXILIARY_DIR_NAMES for part in parts[:-1]):
+        return True
+    return (
+        basename.startswith("debug_")
+        or basename.startswith("benchmark")
+        or basename.startswith("profile_")
+        or "_benchmark" in basename
+        or "_profile" in basename
+    )
+
+
+def _extract_identifier_query(query: str) -> Optional[str]:
+    """Return a single-token identifier query when definition boosting is safe."""
+    trimmed = (query or "").strip()
+    if not trimmed or " " in trimmed:
+        return None
+    if not _IDENTIFIER_QUERY_RE.fullmatch(trimmed):
+        return None
+    return trimmed
+
+
+def extract_explicit_path_hints(query: str) -> List[List[str]]:
+    """Extract explicit path/file hints from separator-style query tokens.
+
+    Natural-language queries often contain one or two high-signal feature/file
+    hints such as ``smart_search`` or ``smart-search.ts`` alongside broader
+    platform words like ``CodexLens``. These hints should be treated as more
+    specific than the surrounding prose.
+    """
+    hints: List[List[str]] = []
+    seen: set[tuple[str, ...]] = set()
+    for raw_part in re.split(r"\s+", query or ""):
+        candidate = raw_part.strip().strip("\"'`()[]{}<>:,;")
+        if not candidate or not _EXPLICIT_PATH_HINT_MARKER_RE.search(candidate):
+            continue
+        tokens = [
+            token
+            for token in _split_identifier_like_tokens(candidate)
+            if token not in _PATH_TOPIC_STOPWORDS
+        ]
+        if len(tokens) < 2:
+            continue
+        key = tuple(tokens)
+        if key in seen:
+            continue
+        seen.add(key)
+        hints.append(list(key))
+    return hints
+
+
+def _is_source_implementation_path(path: str) -> bool:
+    """Return True when a path looks like an implementation file under a source dir."""
+    parts = _normalized_path_parts(path)
+    if not parts:
+        return False
+    return any(part in _SOURCE_DIR_NAMES for part in parts[:-1])
+
+
+def _result_text_candidates(result: SearchResult) -> List[str]:
+    """Collect short text snippets that may contain a symbol definition."""
+    candidates: List[str] = []
+    for text in (result.excerpt, result.content):
+        if not isinstance(text, str) or not text.strip():
+            continue
+        for line in text.splitlines():
+            stripped = line.strip()
+            if stripped:
+                candidates.append(stripped)
+            if len(candidates) >= 6:
+                break
+        if len(candidates) >= 6:
+            break
+
+    symbol_name = result.symbol_name
+    if not symbol_name and result.symbol is not None:
+        symbol_name = getattr(result.symbol, "name", None)
+    if isinstance(symbol_name, str) and symbol_name.strip():
+        candidates.append(symbol_name.strip())
+    return candidates
+
+
+def _result_defines_identifier(result: SearchResult, symbol: str) -> bool:
+    """Best-effort check for whether a result snippet looks like a symbol definition."""
+    escaped_symbol = re.escape(symbol)
+    definition_patterns = (
+        rf"^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?def\s+{escaped_symbol}\b",
+        rf"^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?function\s+{escaped_symbol}\b",
+        rf"^\s*(?:export\s+)?(?:default\s+)?class\s+{escaped_symbol}\b",
+        rf"^\s*(?:export\s+)?(?:default\s+)?interface\s+{escaped_symbol}\b",
+        rf"^\s*(?:export\s+)?(?:default\s+)?type\s+{escaped_symbol}\b",
+        rf"^\s*(?:export\s+)?(?:default\s+)?(?:const|let|var)\s+{escaped_symbol}\b",
+        rf"^\s*{escaped_symbol}\s*=\s*(?:async\s+)?\(",
+        rf"^\s*{escaped_symbol}\s*=\s*(?:async\s+)?[^=]*=>",
+    )
+    for candidate in _result_text_candidates(result):
+        if any(re.search(pattern, candidate) for pattern in definition_patterns):
+            return True
+    return False
+
+
+def _split_identifier_like_tokens(text: str) -> List[str]:
+    """Split identifier-like text into normalized word tokens."""
+    if not text:
+        return []
+
+    tokens: List[str] = []
+    for raw_token in _TOPIC_TOKEN_RE.findall(text):
+        expanded = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", raw_token)
+        expanded = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", expanded)
+        for token in expanded.split():
+            normalized = _normalize_topic_token(token)
+            if normalized:
+                tokens.append(normalized)
+    return tokens
+
+
+def _normalize_topic_token(token: str) -> Optional[str]:
+    """Normalize lightweight topic tokens for query/path overlap heuristics."""
+    normalized = (token or "").casefold()
+    if len(normalized) < 2 or normalized.isdigit():
+        return None
+    if len(normalized) > 4 and normalized.endswith("ies"):
+        normalized = f"{normalized[:-3]}y"
+    elif len(normalized) > 3 and normalized.endswith("s") and not normalized.endswith("ss"):
+        normalized = normalized[:-1]
+    return normalized or None
+
+
+def _dedupe_preserve_order(tokens: List[str]) -> List[str]:
+    """Deduplicate tokens while preserving the first-seen order."""
+    deduped: List[str] = []
+    seen: set[str] = set()
+    for token in tokens:
+        if token in seen:
+            continue
+        seen.add(token)
+        deduped.append(token)
+    return deduped
+
+
+def _semantic_query_topic_tokens(query: str) -> List[str]:
+    """Extract salient natural-language tokens for lightweight topic matching."""
+    tokens = [
+        token
+        for token in _split_identifier_like_tokens(query)
+        if token not in _SEMANTIC_QUERY_STOPWORDS
+    ]
+    return _dedupe_preserve_order(tokens)
+
+
+def _path_topic_tokens(path: str) -> tuple[List[str], List[str]]:
+    """Extract normalized topic tokens from a path and its basename."""
+    parts = _normalized_path_parts(path)
+    if not parts:
+        return [], []
+
+    path_tokens: List[str] = []
+    basename_tokens: List[str] = []
+    last_index = len(parts) - 1
+    for index, part in enumerate(parts):
+        target = basename_tokens if index == last_index else path_tokens
+        for token in _split_identifier_like_tokens(part):
+            if token in _PATH_TOPIC_STOPWORDS:
+                continue
+            target.append(token)
+    return _dedupe_preserve_order(path_tokens), _dedupe_preserve_order(basename_tokens)
+
+
+def _source_path_topic_boost(
+    query: str,
+    path: str,
+    query_intent: QueryIntent,
+) -> tuple[float, List[str]]:
+    """Return a path/topic boost when a query strongly overlaps a source path."""
+    query_tokens = _semantic_query_topic_tokens(query)
+    if len(query_tokens) < 2:
+        return 1.0, []
+
+    path_tokens, basename_tokens = _path_topic_tokens(path)
+    if not path_tokens and not basename_tokens:
+        return 1.0, []
+
+    path_token_set = set(path_tokens) | set(basename_tokens)
+    basename_overlap = [token for token in query_tokens if token in basename_tokens]
+    all_overlap = [token for token in query_tokens if token in path_token_set]
+    explicit_hint_tokens = extract_explicit_path_hints(query)
+
+    for hint_tokens in explicit_hint_tokens:
+        if basename_tokens == hint_tokens:
+            if query_intent == QueryIntent.KEYWORD:
+                return 4.5, hint_tokens[:3]
+            return 2.4, hint_tokens[:3]
+        if all(token in basename_tokens for token in hint_tokens):
+            if query_intent == QueryIntent.KEYWORD:
+                return 4.5, hint_tokens[:3]
+            return 1.6, hint_tokens[:3]
+
+    if query_prefers_lexical_search(query):
+        lexical_surface_overlap = [
+            token for token in basename_tokens if token in query_tokens and token in _LEXICAL_PRIORITY_SURFACE_TOKENS
+        ]
+        if lexical_surface_overlap:
+            lexical_overlap = lexical_surface_overlap[:3]
+            if query_intent == QueryIntent.KEYWORD:
+                return 5.5, lexical_overlap
+            return 5.0, lexical_overlap
+
+    if query_intent == QueryIntent.KEYWORD:
+        if len(basename_overlap) >= 2:
+            # Multi-token identifier-style queries often name the feature/file directly.
+            # Give basename matches a stronger lift so they can survive workspace fan-out.
+            multiplier = min(4.5, 2.0 + 1.25 * float(len(basename_overlap)))
+            return multiplier, basename_overlap[:3]
+        if len(all_overlap) >= 3:
+            multiplier = min(2.0, 1.1 + 0.2 * len(all_overlap))
+            return multiplier, all_overlap[:3]
+        return 1.0, []
+
+    if len(basename_overlap) >= 2:
+        multiplier = min(1.45, 1.15 + 0.1 * len(basename_overlap))
+        return multiplier, basename_overlap[:3]
+    if len(all_overlap) >= 3:
+        multiplier = min(1.3, 1.05 + 0.05 * len(all_overlap))
+        return multiplier, all_overlap[:3]
+    return 1.0, []
+
+
+def apply_path_penalties(
+    results: List[SearchResult],
+    query: str,
+    *,
+    test_file_penalty: float = 0.15,
+    generated_file_penalty: float = 0.35,
+) -> List[SearchResult]:
+    """Apply lightweight path-based penalties to reduce noisy rankings."""
+    if not results or (test_file_penalty <= 0 and generated_file_penalty <= 0):
+        return results
+
+    query_intent = detect_query_intent(query)
+    skip_test_penalty = query_targets_test_files(query)
+    skip_auxiliary_penalty = query_targets_auxiliary_files(query)
+    skip_generated_penalty = query_targets_generated_files(query)
+    query_topic_tokens = _semantic_query_topic_tokens(query)
+    keyword_path_query = query_intent == QueryIntent.KEYWORD and len(query_topic_tokens) >= 2
+    explicit_feature_query = bool(extract_explicit_path_hints(query))
+    source_oriented_query = (
+        explicit_feature_query
+        or keyword_path_query
+        or (
+            query_intent in {QueryIntent.SEMANTIC, QueryIntent.MIXED}
+            and len(query_topic_tokens) >= 2
+        )
+    )
+    identifier_query = None
+    if query_intent == QueryIntent.KEYWORD:
+        identifier_query = _extract_identifier_query(query)
+    effective_test_penalty = float(test_file_penalty)
+    if effective_test_penalty > 0 and not skip_test_penalty:
+        if query_intent == QueryIntent.KEYWORD:
+            # Identifier-style queries should prefer implementation files over test references.
+            effective_test_penalty = max(effective_test_penalty, 0.35)
+        elif query_intent in {QueryIntent.SEMANTIC, QueryIntent.MIXED}:
+            # Natural-language code queries should still prefer implementation files over references.
+            effective_test_penalty = max(effective_test_penalty, 0.25)
+        if explicit_feature_query:
+            # Explicit feature/file hints should be even more biased toward source implementations.
+            effective_test_penalty = max(effective_test_penalty, 0.45)
+    effective_auxiliary_penalty = effective_test_penalty
+    if effective_auxiliary_penalty > 0 and not skip_auxiliary_penalty and explicit_feature_query:
+        # Examples/benchmarks are usually descriptive noise for feature-targeted implementation queries.
+        effective_auxiliary_penalty = max(effective_auxiliary_penalty, 0.5)
+    effective_generated_penalty = float(generated_file_penalty)
+    if effective_generated_penalty > 0 and not skip_generated_penalty:
+        if source_oriented_query:
+            effective_generated_penalty = max(effective_generated_penalty, 0.45)
+        if explicit_feature_query:
+            effective_generated_penalty = max(effective_generated_penalty, 0.6)
+
+    penalized: List[SearchResult] = []
+    for result in results:
+        multiplier = 1.0
+        penalty_multiplier = 1.0
+        boost_multiplier = 1.0
+        penalty_reasons: List[str] = []
+        boost_reasons: List[str] = []
+
+        if effective_test_penalty > 0 and not skip_test_penalty and is_test_file(result.path):
+            penalty_multiplier *= max(0.0, 1.0 - effective_test_penalty)
+            penalty_reasons.append("test_file")
+
+        if (
+            effective_auxiliary_penalty > 0
+            and not skip_auxiliary_penalty
+            and not is_test_file(result.path)
+            and is_auxiliary_reference_path(result.path)
+        ):
+            penalty_multiplier *= max(0.0, 1.0 - effective_auxiliary_penalty)
+            penalty_reasons.append("auxiliary_file")
+
+        if (
+            effective_generated_penalty > 0
+            and not skip_generated_penalty
+            and is_generated_artifact_path(result.path)
+        ):
+            penalty_multiplier *= max(0.0, 1.0 - effective_generated_penalty)
+            penalty_reasons.append("generated_artifact")
+
+        if (
+            identifier_query
+            and not is_test_file(result.path)
+            and not is_generated_artifact_path(result.path)
+            and _result_defines_identifier(result, identifier_query)
+        ):
+            if _is_source_implementation_path(result.path):
+                boost_multiplier *= 2.0
+                boost_reasons.append("source_definition")
+            else:
+                boost_multiplier *= 1.35
+                boost_reasons.append("symbol_definition")
+
+        if (
+            (query_intent in {QueryIntent.SEMANTIC, QueryIntent.MIXED} or keyword_path_query)
+            and not skip_test_penalty
+            and not skip_auxiliary_penalty
+            and not skip_generated_penalty
+            and not is_test_file(result.path)
+            and not is_generated_artifact_path(result.path)
+            and not is_auxiliary_reference_path(result.path)
+            and _is_source_implementation_path(result.path)
+        ):
+            semantic_path_boost, overlap_tokens = _source_path_topic_boost(
+                query,
+                result.path,
+                query_intent,
+            )
+            if semantic_path_boost > 1.0:
+                boost_multiplier *= semantic_path_boost
+                boost_reasons.append("source_path_topic_overlap")
+
+        multiplier = penalty_multiplier * boost_multiplier
+        if penalty_reasons or boost_reasons:
+            metadata = {
+                **result.metadata,
+                "path_rank_multiplier": multiplier,
+            }
+            if penalty_reasons:
+                metadata["path_penalty_reasons"] = penalty_reasons
+                metadata["path_penalty_multiplier"] = penalty_multiplier
+            if boost_reasons:
+                metadata["path_boost_reasons"] = boost_reasons
+                metadata["path_boost_multiplier"] = boost_multiplier
+                if "source_path_topic_overlap" in boost_reasons and overlap_tokens:
+                    metadata["path_boost_overlap_tokens"] = overlap_tokens
+            penalized.append(
+                result.model_copy(
+                    deep=True,
+                    update={
+                        "score": max(0.0, float(result.score) * multiplier),
+                        "metadata": metadata,
+                    },
+                )
+            )
+        else:
+            penalized.append(result)
+
+    penalized.sort(key=lambda r: r.score, reverse=True)
+    return penalized
+
+
+def rebalance_noisy_results(
+    results: List[SearchResult],
+    query: str,
+) -> List[SearchResult]:
+    """Move noisy test/generated/auxiliary results behind implementation hits when safe."""
+    if not results:
+        return []
+
+    query_intent = detect_query_intent(query)
+    skip_test_penalty = query_targets_test_files(query)
+    skip_auxiliary_penalty = query_targets_auxiliary_files(query)
+    skip_generated_penalty = query_targets_generated_files(query)
+    query_topic_tokens = _semantic_query_topic_tokens(query)
+    keyword_path_query = query_intent == QueryIntent.KEYWORD and len(query_topic_tokens) >= 2
+    explicit_feature_query = bool(extract_explicit_path_hints(query))
+    source_oriented_query = (
+        explicit_feature_query
+        or keyword_path_query
+        or (
+            query_intent in {QueryIntent.SEMANTIC, QueryIntent.MIXED}
+            and len(query_topic_tokens) >= 2
+        )
+    )
+    if not source_oriented_query:
+        return results
+
+    max_generated_results = len(results) if skip_generated_penalty else 0
+    max_test_results = len(results) if skip_test_penalty else (0 if explicit_feature_query else 1)
+    max_auxiliary_results = len(results) if skip_auxiliary_penalty else (0 if explicit_feature_query else 1)
+
+    selected: List[SearchResult] = []
+    deferred: List[SearchResult] = []
+    generated_count = 0
+    test_count = 0
+    auxiliary_count = 0
+
+    for result in results:
+        if not skip_generated_penalty and is_generated_artifact_path(result.path):
+            if generated_count >= max_generated_results:
+                deferred.append(result)
+                continue
+            generated_count += 1
+            selected.append(result)
+            continue
+
+        if not skip_test_penalty and is_test_file(result.path):
+            if test_count >= max_test_results:
+                deferred.append(result)
+                continue
+            test_count += 1
+            selected.append(result)
+            continue
+
+        if not skip_auxiliary_penalty and is_auxiliary_reference_path(result.path):
+            if auxiliary_count >= max_auxiliary_results:
+                deferred.append(result)
+                continue
+            auxiliary_count += 1
+            selected.append(result)
+            continue
+
+        selected.append(result)
+
+    return selected + deferred
+
+
 def simple_weighted_fusion(
     results_map: Dict[str, List[SearchResult]],
     weights: Dict[str, float] = None,
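The penalty arithmetic in `apply_path_penalties` is multiplicative: each matching category shrinks the score by `(1 - penalty)`, and the multipliers compose. A minimal standalone sketch of that arithmetic using the default values shown above (the helper name here is hypothetical, introduced only for illustration):

```python
# Defaults from apply_path_penalties: test_file_penalty=0.15, generated_file_penalty=0.35.
def penalized_score(score: float, is_test: bool, is_generated: bool,
                    test_penalty: float = 0.15, generated_penalty: float = 0.35) -> float:
    multiplier = 1.0
    if is_test:
        # Test hits keep (1 - 0.15) = 85% of their score.
        multiplier *= max(0.0, 1.0 - test_penalty)
    if is_generated:
        # Generated artifacts keep (1 - 0.35) = 65%; penalties compose multiplicatively.
        multiplier *= max(0.0, 1.0 - generated_penalty)
    return max(0.0, score * multiplier)

print(penalized_score(1.0, is_test=True, is_generated=False))  # 0.85
print(penalized_score(1.0, is_test=True, is_generated=True))   # 0.85 * 0.65 = 0.5525
```

Note that in the real function the effective penalties are first raised per query intent (e.g. to 0.35 for identifier-style queries), so this sketch shows only the base case.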
@@ -633,10 +1386,16 @@ def cross_encoder_rerank(
             raw_scores = reranker.predict(pairs, batch_size=int(batch_size))
         else:
             return results
-    except Exception:
+    except Exception as exc:
+        logger.debug("Cross-encoder rerank failed; returning original ranking: %s", exc)
         return results

     if not raw_scores or len(raw_scores) != rerank_count:
+        logger.debug(
+            "Cross-encoder rerank returned %d scores for %d candidates; returning original ranking",
+            len(raw_scores) if raw_scores else 0,
+            rerank_count,
+        )
         return results

     scores = [float(s) for s in raw_scores]
@@ -653,26 +1412,13 @@ def cross_encoder_rerank(
     else:
         probs = [sigmoid(s) for s in scores]

+    query_intent = detect_query_intent(query)
+    skip_test_penalty = query_targets_test_files(query)
+    skip_auxiliary_penalty = query_targets_auxiliary_files(query)
+    skip_generated_penalty = query_targets_generated_files(query)
+    keyword_path_query = query_intent == QueryIntent.KEYWORD and len(_semantic_query_topic_tokens(query)) >= 2
     reranked_results: List[SearchResult] = []

-    # Helper to detect test files
-    def is_test_file(path: str) -> bool:
-        if not path:
-            return False
-        basename = path.split("/")[-1].split("\\")[-1]
-        return (
-            basename.startswith("test_") or
-            basename.endswith("_test.py") or
-            basename.endswith(".test.ts") or
-            basename.endswith(".test.js") or
-            basename.endswith(".spec.ts") or
-            basename.endswith(".spec.js") or
-            "/tests/" in path or
-            "\\tests\\" in path or
-            "/test/" in path or
-            "\\test\\" in path
-        )
-
     for idx, result in enumerate(results):
         if idx < rerank_count:
             prev_score = float(result.score)
@@ -699,6 +1445,52 @@ def cross_encoder_rerank(
|
|||||||
if test_file_penalty > 0 and is_test_file(result.path):
|
if test_file_penalty > 0 and is_test_file(result.path):
|
||||||
combined_score = combined_score * (1.0 - test_file_penalty)
|
combined_score = combined_score * (1.0 - test_file_penalty)
|
||||||
|
|
||||||
|
cross_encoder_floor_reason = None
|
||||||
|
cross_encoder_floor_score = None
|
||||||
|
cross_encoder_floor_overlap_tokens: List[str] = []
|
||||||
|
if (
|
||||||
|
(query_intent in {QueryIntent.SEMANTIC, QueryIntent.MIXED} or keyword_path_query)
|
||||||
|
and not skip_test_penalty
|
||||||
|
and not skip_auxiliary_penalty
|
||||||
|
and not skip_generated_penalty
|
||||||
|
and not is_test_file(result.path)
|
||||||
|
and not is_generated_artifact_path(result.path)
|
||||||
|
and not is_auxiliary_reference_path(result.path)
|
||||||
|
and _is_source_implementation_path(result.path)
|
||||||
|
):
|
||||||
|
semantic_path_boost, overlap_tokens = _source_path_topic_boost(
|
||||||
|
query,
|
||||||
|
result.path,
|
||||||
|
query_intent,
|
||||||
|
)
|
||||||
|
if semantic_path_boost > 1.0:
|
||||||
|
floor_ratio = 0.8 if semantic_path_boost >= 1.35 else 0.75
|
||||||
|
candidate_floor = prev_score * floor_ratio
|
||||||
|
if candidate_floor > combined_score:
|
||||||
|
combined_score = candidate_floor
|
||||||
|
cross_encoder_floor_reason = (
|
||||||
|
"keyword_source_path_overlap"
|
||||||
|
if query_intent == QueryIntent.KEYWORD
|
||||||
|
else "semantic_source_path_overlap"
|
||||||
|
)
|
||||||
|
cross_encoder_floor_score = candidate_floor
|
||||||
|
cross_encoder_floor_overlap_tokens = overlap_tokens
|
||||||
|
|
||||||
|
metadata = {
|
||||||
|
**result.metadata,
|
||||||
|
"pre_cross_encoder_score": prev_score,
|
||||||
|
"cross_encoder_score": ce_score,
|
||||||
|
"cross_encoder_prob": ce_prob,
|
||||||
|
"cross_encoder_reranked": True,
|
||||||
|
}
|
||||||
|
if cross_encoder_floor_reason is not None:
|
||||||
|
metadata["cross_encoder_floor_reason"] = cross_encoder_floor_reason
|
||||||
|
metadata["cross_encoder_floor_score"] = cross_encoder_floor_score
|
||||||
|
if cross_encoder_floor_overlap_tokens:
|
||||||
|
metadata["cross_encoder_floor_overlap_tokens"] = (
|
||||||
|
cross_encoder_floor_overlap_tokens
|
||||||
|
)
|
||||||
|
|
||||||
reranked_results.append(
|
reranked_results.append(
|
||||||
SearchResult(
|
SearchResult(
|
||||||
path=result.path,
|
path=result.path,
|
||||||
@@ -707,13 +1499,7 @@ def cross_encoder_rerank(
|
|||||||
content=result.content,
|
content=result.content,
|
||||||
symbol=result.symbol,
|
symbol=result.symbol,
|
||||||
chunk=result.chunk,
|
chunk=result.chunk,
|
||||||
metadata={
|
metadata=metadata,
|
||||||
**result.metadata,
|
|
||||||
"pre_cross_encoder_score": prev_score,
|
|
||||||
"cross_encoder_score": ce_score,
|
|
||||||
"cross_encoder_prob": ce_prob,
|
|
||||||
"cross_encoder_reranked": True,
|
|
||||||
},
|
|
||||||
start_line=result.start_line,
|
start_line=result.start_line,
|
||||||
end_line=result.end_line,
|
end_line=result.end_line,
|
||||||
symbol_name=result.symbol_name,
|
symbol_name=result.symbol_name,
|
||||||
|
|||||||
@@ -383,8 +383,37 @@ class ANNIndex:
         if self._index is None or self._current_count == 0:
             return [], []  # Empty index

-        # Perform kNN search
-        labels, distances = self._index.knn_query(query, k=top_k)
+        effective_k = min(max(int(top_k), 0), self._current_count)
+        if effective_k == 0:
+            return [], []
+
+        try:
+            self._index.set_ef(max(self.ef, effective_k))
+        except Exception:
+            pass
+
+        while True:
+            try:
+                labels, distances = self._index.knn_query(query, k=effective_k)
+                break
+            except Exception as exc:
+                if "contiguous 2D array" in str(exc) and effective_k > 1:
+                    next_k = max(1, effective_k // 2)
+                    logger.debug(
+                        "ANN search knn_query failed for k=%d; retrying with k=%d: %s",
+                        effective_k,
+                        next_k,
+                        exc,
+                    )
+                    if next_k == effective_k:
+                        raise
+                    effective_k = next_k
+                    try:
+                        self._index.set_ef(max(self.ef, effective_k))
+                    except Exception:
+                        pass
+                    continue
+                raise
+
         # Convert to lists and flatten (knn_query returns 2D arrays)
         ids = labels[0].tolist()
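The ANN hunk above clamps `top_k` to the number of stored vectors and halves `k` when hnswlib rejects the query. The clamp-and-retry shape can be sketched standalone; the helper name, the generic `RuntimeError` catch, and the toy backend are illustrative stand-ins, not the codexlens API:

```python
from typing import Callable, List, Tuple

def clamped_knn(
    query_fn: Callable[[int], Tuple[List[int], List[float]]],
    stored_count: int,
    top_k: int,
) -> Tuple[List[int], List[float]]:
    """Clamp k to the stored vector count, then retry with halved k when the
    backend rejects the request (mirrors the hunk's retry loop, simplified)."""
    effective_k = min(max(int(top_k), 0), stored_count)
    if effective_k == 0:
        return [], []
    while True:
        try:
            return query_fn(effective_k)
        except RuntimeError:
            if effective_k > 1:
                effective_k = max(1, effective_k // 2)
                continue
            raise

# Toy backend that only serves k <= 3, standing in for hnswlib's knn_query.
def toy_backend(k: int) -> Tuple[List[int], List[float]]:
    if k > 3:
        raise RuntimeError("contiguous 2D array")
    return list(range(k)), [0.0] * k

ids, dists = clamped_knn(toy_backend, stored_count=3, top_k=10)
print(ids)  # → [0, 1, 2]
```

Requesting `top_k=8` against this backend exercises the retry path: 8 fails, 4 fails, 2 succeeds.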
@@ -15,7 +15,7 @@ def check_reranker_available(backend: str) -> tuple[bool, str | None]:

     Notes:
     - "fastembed" uses fastembed TextCrossEncoder (pip install fastembed>=0.4.0). [Recommended]
-    - "onnx" redirects to "fastembed" for backward compatibility.
+    - "onnx" uses Optimum + ONNX Runtime (pip install onnxruntime optimum[onnxruntime] transformers).
     - "legacy" uses sentence-transformers CrossEncoder (pip install codexlens[reranker-legacy]).
     - "api" uses a remote reranking HTTP API (requires httpx).
     - "litellm" uses `ccw-litellm` for unified access to LLM providers.
@@ -33,10 +33,9 @@ def check_reranker_available(backend: str) -> tuple[bool, str | None]:
         return check_fastembed_reranker_available()

     if backend == "onnx":
-        # Redirect to fastembed for backward compatibility
-        from .fastembed_reranker import check_fastembed_reranker_available
-
-        return check_fastembed_reranker_available()
+        from .onnx_reranker import check_onnx_reranker_available
+
+        return check_onnx_reranker_available()

     if backend == "litellm":
         try:
@@ -66,7 +65,7 @@ def check_reranker_available(backend: str) -> tuple[bool, str | None]:


 def get_reranker(
-    backend: str = "fastembed",
+    backend: str = "onnx",
     model_name: str | None = None,
     *,
     device: str | None = None,
@@ -76,18 +75,18 @@ def get_reranker(

     Args:
         backend: Reranker backend to use. Options:
-            - "fastembed": FastEmbed TextCrossEncoder backend (default, recommended)
-            - "onnx": Redirects to fastembed for backward compatibility
+            - "onnx": Optimum + ONNX Runtime backend (default)
+            - "fastembed": FastEmbed TextCrossEncoder backend
             - "api": HTTP API backend (remote providers)
             - "litellm": LiteLLM backend (LLM-based, for API mode)
             - "legacy": sentence-transformers CrossEncoder backend (optional)
         model_name: Model identifier for model-based backends. Defaults depend on backend:
+            - onnx: Xenova/ms-marco-MiniLM-L-6-v2
             - fastembed: Xenova/ms-marco-MiniLM-L-6-v2
-            - onnx: (redirects to fastembed)
             - api: BAAI/bge-reranker-v2-m3 (SiliconFlow)
             - legacy: cross-encoder/ms-marco-MiniLM-L-6-v2
             - litellm: default
-        device: Optional device string for backends that support it (legacy only).
+        device: Optional device string for backends that support it (legacy and onnx).
         **kwargs: Additional backend-specific arguments.

     Returns:
@@ -111,16 +110,17 @@ def get_reranker(
         return FastEmbedReranker(model_name=resolved_model_name, **kwargs)

     if backend == "onnx":
-        # Redirect to fastembed for backward compatibility
-        ok, err = check_reranker_available("fastembed")
+        ok, err = check_reranker_available("onnx")
         if not ok:
             raise ImportError(err)

-        from .fastembed_reranker import FastEmbedReranker
+        from .onnx_reranker import ONNXReranker

-        resolved_model_name = (model_name or "").strip() or FastEmbedReranker.DEFAULT_MODEL
-        _ = device  # Device selection is managed via fastembed providers.
-        return FastEmbedReranker(model_name=resolved_model_name, **kwargs)
+        resolved_model_name = (model_name or "").strip() or ONNXReranker.DEFAULT_MODEL
+        effective_kwargs = dict(kwargs)
+        if "use_gpu" not in effective_kwargs and device is not None:
+            effective_kwargs["use_gpu"] = str(device).strip().lower() not in {"cpu", "none"}
+        return ONNXReranker(model_name=resolved_model_name, **effective_kwargs)

     if backend == "legacy":
         ok, err = check_reranker_available("legacy")
@@ -58,6 +58,38 @@ def _iter_batches(items: Sequence[Any], batch_size: int) -> Iterable[Sequence[An
         yield items[i : i + batch_size]


+def _normalize_provider_specs(
+    providers: Sequence[Any] | None,
+) -> tuple[list[str], list[dict[str, Any]]]:
+    """Split execution-provider specs into Optimum-compatible names and options."""
+    normalized_providers: list[str] = []
+    normalized_options: list[dict[str, Any]] = []
+
+    for provider in providers or ():
+        provider_name: str | None = None
+        provider_options: dict[str, Any] = {}
+
+        if isinstance(provider, tuple):
+            if provider:
+                provider_name = str(provider[0]).strip()
+            if len(provider) > 1 and isinstance(provider[1], dict):
+                provider_options = dict(provider[1])
+        elif provider is not None:
+            provider_name = str(provider).strip()
+
+        if not provider_name:
+            continue
+
+        normalized_providers.append(provider_name)
+        normalized_options.append(provider_options)
+
+    if not normalized_providers:
+        normalized_providers.append("CPUExecutionProvider")
+        normalized_options.append({})
+
+    return normalized_providers, normalized_options
+
+
 class ONNXReranker(BaseReranker):
     """Cross-encoder reranker using Optimum + ONNX Runtime with lazy loading."""

@@ -110,19 +142,21 @@ class ONNXReranker(BaseReranker):
             use_gpu=self.use_gpu, with_device_options=True
         )

+        provider_names, provider_options = _normalize_provider_specs(self.providers)
+
         # Some Optimum versions accept `providers`, others accept a single `provider`.
         # Prefer passing the full providers list, with a conservative fallback.
         model_kwargs: dict[str, Any] = {}
        try:
             params = signature(ORTModelForSequenceClassification.from_pretrained).parameters
             if "providers" in params:
-                model_kwargs["providers"] = self.providers
+                model_kwargs["providers"] = provider_names
+                if "provider_options" in params:
+                    model_kwargs["provider_options"] = provider_options
             elif "provider" in params:
-                provider_name = "CPUExecutionProvider"
-                if self.providers:
-                    first = self.providers[0]
-                    provider_name = first[0] if isinstance(first, tuple) else str(first)
-                model_kwargs["provider"] = provider_name
+                model_kwargs["provider"] = provider_names[0]
+                if "provider_options" in params and provider_options[0]:
+                    model_kwargs["provider_options"] = provider_options[0]
         except Exception:
             model_kwargs = {}
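The `_normalize_provider_specs` helper added above accepts a mixed sequence of bare provider names and `(name, options)` tuples and returns parallel name/options lists, defaulting to CPU. A simplified re-implementation of that behavior, inferred from the hunk rather than imported from codexlens:

```python
from __future__ import annotations

from typing import Any, Sequence

def normalize_provider_specs(
    providers: Sequence[Any] | None,
) -> tuple[list[str], list[dict[str, Any]]]:
    """Accept "CUDAExecutionProvider" or ("CUDAExecutionProvider", {...}) entries
    and return parallel name/options lists, falling back to CPUExecutionProvider."""
    names: list[str] = []
    options: list[dict[str, Any]] = []
    for provider in providers or ():
        name: str | None = None
        opts: dict[str, Any] = {}
        if isinstance(provider, tuple):
            if provider:
                name = str(provider[0]).strip()
            if len(provider) > 1 and isinstance(provider[1], dict):
                opts = dict(provider[1])
        elif provider is not None:
            name = str(provider).strip()
        if not name:
            continue  # drop empty/blank specs entirely
        names.append(name)
        options.append(opts)
    if not names:
        names.append("CPUExecutionProvider")
        options.append({})
    return names, options

print(normalize_provider_specs([("CUDAExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"]))
# → (['CUDAExecutionProvider', 'CPUExecutionProvider'], [{'device_id': 0}, {}])
```

Keeping the two lists parallel is what lets the loading code pass `providers`/`provider_options` together, or fall back to the first entry for Optimum versions that only accept a single `provider`.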
codex-lens/src/codexlens/storage/index_filters.py (new file, 47 lines)
@@ -0,0 +1,47 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Iterable, List, Optional, Set
+
+from codexlens.storage.index_tree import DEFAULT_IGNORE_DIRS
+
+
+EXTRA_IGNORED_INDEX_DIRS = frozenset({".workflow"})
+IGNORED_INDEX_DIRS = frozenset({name.casefold() for name in DEFAULT_IGNORE_DIRS | set(EXTRA_IGNORED_INDEX_DIRS)})
+
+
+def is_ignored_index_path(
+    index_path: Path,
+    scan_root: Path,
+    *,
+    ignored_dir_names: Optional[Set[str]] = None,
+) -> bool:
+    """Return True when an index lives under an ignored/generated subtree."""
+
+    ignored = (
+        {name.casefold() for name in ignored_dir_names}
+        if ignored_dir_names is not None
+        else IGNORED_INDEX_DIRS
+    )
+
+    try:
+        relative_parts = index_path.resolve().relative_to(scan_root.resolve()).parts[:-1]
+    except ValueError:
+        return False
+
+    return any(part.casefold() in ignored for part in relative_parts)
+
+
+def filter_index_paths(
+    index_paths: Iterable[Path],
+    scan_root: Path,
+    *,
+    ignored_dir_names: Optional[Set[str]] = None,
+) -> List[Path]:
+    """Filter out discovered indexes that belong to ignored/generated subtrees."""
+
+    return [
+        path
+        for path in index_paths
+        if not is_ignored_index_path(path, scan_root, ignored_dir_names=ignored_dir_names)
+    ]
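The new module's check is a casefolded ancestor-directory test. A self-contained sketch of the same logic, re-implemented here rather than imported (since `DEFAULT_IGNORE_DIRS` lives inside codexlens, an illustrative subset is used, and the real module's `.resolve()` calls are skipped to keep the sketch filesystem-free):

```python
from pathlib import PurePosixPath

# Illustrative subset; the real module derives this from DEFAULT_IGNORE_DIRS.
IGNORED = {name.casefold() for name in {"node_modules", ".git", ".workflow"}}

def is_ignored_index_path(index_path: PurePosixPath, scan_root: PurePosixPath) -> bool:
    """True when any directory between scan_root and the index file is ignored."""
    try:
        # parts[:-1] drops the filename itself; only parent dirs are checked
        relative_parts = index_path.relative_to(scan_root).parts[:-1]
    except ValueError:
        # index is outside scan_root: not ours to filter
        return False
    return any(part.casefold() in IGNORED for part in relative_parts)

root = PurePosixPath("/repo")
print(is_ignored_index_path(PurePosixPath("/repo/.workflow/idx/_index.db"), root))  # → True
print(is_ignored_index_path(PurePosixPath("/repo/src/_index.db"), root))            # → False
```

Returning `False` for paths outside the scan root is the conservative choice: a path the filter cannot classify is left to the caller rather than silently dropped.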
@@ -252,6 +252,18 @@ class IndexTreeBuilder:
         # Collect directories by depth
         dirs_by_depth = self._collect_dirs_by_depth(source_root, languages)

+        if force_full:
+            pruned_dirs = self._prune_stale_project_dirs(
+                project_id=project_info.id,
+                source_root=source_root,
+                dirs_by_depth=dirs_by_depth,
+            )
+            if pruned_dirs:
+                self.logger.info(
+                    "Pruned %d stale directory mappings before full rebuild",
+                    len(pruned_dirs),
+                )
+
         if not dirs_by_depth:
             self.logger.warning("No indexable directories found in %s", source_root)
             if global_index is not None:
@@ -450,6 +462,52 @@ class IndexTreeBuilder:

     # === Internal Methods ===

+    def _prune_stale_project_dirs(
+        self,
+        *,
+        project_id: int,
+        source_root: Path,
+        dirs_by_depth: Dict[int, List[Path]],
+    ) -> List[Path]:
+        """Remove registry mappings for directories no longer included in the index tree."""
+        source_root = source_root.resolve()
+        valid_dirs: Set[Path] = {
+            path.resolve()
+            for paths in dirs_by_depth.values()
+            for path in paths
+        }
+        valid_dirs.add(source_root)
+
+        stale_mappings = []
+        for mapping in self.registry.get_project_dirs(project_id):
+            mapping_path = mapping.source_path.resolve()
+            if mapping_path in valid_dirs:
+                continue
+            try:
+                mapping_path.relative_to(source_root)
+            except ValueError:
+                continue
+            stale_mappings.append(mapping)
+
+        stale_mappings.sort(
+            key=lambda mapping: len(mapping.source_path.resolve().relative_to(source_root).parts),
+            reverse=True,
+        )
+
+        pruned_paths: List[Path] = []
+        for mapping in stale_mappings:
+            try:
+                if self.registry.unregister_dir(mapping.source_path):
+                    pruned_paths.append(mapping.source_path.resolve())
+            except Exception as exc:
+                self.logger.warning(
+                    "Failed to prune stale mapping for %s: %s",
+                    mapping.source_path,
+                    exc,
+                )
+
+        return pruned_paths
+
     def _collect_dirs_by_depth(
         self, source_root: Path, languages: List[str] = None
     ) -> Dict[int, List[Path]]:
@@ -620,8 +678,9 @@ class IndexTreeBuilder:
             "static_graph_enabled": self.config.static_graph_enabled,
             "static_graph_relationship_types": self.config.static_graph_relationship_types,
             "use_astgrep": getattr(self.config, "use_astgrep", False),
-            "ignore_patterns": list(getattr(self.config, "ignore_patterns", [])),
-            "extension_filters": list(getattr(self.config, "extension_filters", [])),
+            "ignore_patterns": list(self.ignore_patterns),
+            "extension_filters": list(self.extension_filters),
+            "incremental": bool(self.incremental),
         }

         worker_args = [
@@ -693,6 +752,9 @@ class IndexTreeBuilder:
         # Ensure index directory exists
         index_db_path.parent.mkdir(parents=True, exist_ok=True)

+        if not self.incremental:
+            _reset_index_db_files(index_db_path)
+
         # Create directory index
         if self.config.global_symbol_index_enabled:
             global_index = GlobalSymbolIndex(global_index_db_path, project_id=project_id)
@@ -1100,6 +1162,18 @@ def _matches_extension_filters(path: Path, patterns: List[str], source_root: Opt
     return _matches_path_patterns(path, patterns, source_root)


+def _reset_index_db_files(index_db_path: Path) -> None:
+    """Best-effort removal of a directory index DB and common SQLite sidecars."""
+    for suffix in ("", "-wal", "-shm", "-journal"):
+        target = Path(f"{index_db_path}{suffix}") if suffix else index_db_path
+        try:
+            target.unlink()
+        except FileNotFoundError:
+            continue
+        except OSError:
+            continue
+
+
 def _build_dir_worker(args: tuple) -> DirBuildResult:
     """Worker function for parallel directory building.

@@ -1140,6 +1214,9 @@ def _build_dir_worker(args: tuple) -> DirBuildResult:
         global_index = GlobalSymbolIndex(Path(global_index_db_path), project_id=int(project_id))
         global_index.initialize()

+    if not bool(config_dict.get("incremental", True)):
+        _reset_index_db_files(index_db_path)
+
     store = DirIndexStore(index_db_path, config=config, global_index=global_index)
     store.initialize()
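`_reset_index_db_files` above deletes the index DB plus its SQLite sidecars (`-wal`, `-shm`, `-journal`) before a non-incremental rebuild, ignoring files that do not exist. The same best-effort pattern, runnable against a temporary directory (helper name reused for illustration only):

```python
import tempfile
from pathlib import Path

def reset_index_db_files(index_db_path: Path) -> None:
    """Best-effort removal of a SQLite DB and its -wal/-shm/-journal sidecars."""
    for suffix in ("", "-wal", "-shm", "-journal"):
        target = Path(f"{index_db_path}{suffix}") if suffix else index_db_path
        try:
            target.unlink()
        except OSError:  # FileNotFoundError is a subclass of OSError
            continue

with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "_index.db"
    for suffix in ("", "-wal", "-shm"):  # no -journal, exercising the missing-file path
        Path(f"{db}{suffix}").write_bytes(b"")
    reset_index_db_files(db)
    leftovers = [p.name for p in Path(tmp).iterdir()]
    print(leftovers)  # → []
```

Deleting the `-wal`/`-shm` files together with the DB matters: a fresh database created next to a stale WAL file can otherwise replay or conflict with old pages.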
@@ -591,6 +591,56 @@ class RegistryStore:

         return [self._row_to_dir_mapping(row) for row in rows]

+    def find_descendant_project_roots(self, source_root: Path) -> List[DirMapping]:
+        """Return root directory mappings for nested projects under ``source_root``."""
+        with self._lock:
+            conn = self._get_connection()
+            source_root_resolved = source_root.resolve()
+            source_root_str = self._normalize_path_for_comparison(source_root_resolved)
+
+            rows = conn.execute(
+                """
+                SELECT dm.*
+                FROM dir_mapping dm
+                INNER JOIN projects p ON p.id = dm.project_id
+                WHERE dm.source_path = p.source_root
+                  AND p.source_root LIKE ?
+                ORDER BY p.source_root ASC
+                """,
+                (f"{source_root_str}%",),
+            ).fetchall()
+
+            descendant_roots: List[DirMapping] = []
+            normalized_root_path = Path(source_root_str)
+
+            for row in rows:
+                mapping = self._row_to_dir_mapping(row)
+                normalized_mapping_path = Path(
+                    self._normalize_path_for_comparison(mapping.source_path.resolve())
+                )
+
+                if normalized_mapping_path == normalized_root_path:
+                    continue
+
+                try:
+                    normalized_mapping_path.relative_to(normalized_root_path)
+                except ValueError:
+                    continue
+
+                descendant_roots.append(mapping)
+
+            descendant_roots.sort(
+                key=lambda mapping: (
+                    len(
+                        mapping.source_path.resolve().relative_to(
+                            source_root_resolved
+                        ).parts
+                    ),
+                    self._normalize_path_for_comparison(mapping.source_path.resolve()),
+                )
+            )
+            return descendant_roots
+
     def update_dir_stats(self, source_path: Path, files_count: int) -> None:
         """Update directory statistics.
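The method above pairs a SQL `LIKE 'prefix%'` pre-filter with a `Path.relative_to` check, because a bare prefix match also accepts siblings like `/repository-other` under root `/repo`. The path-side logic in isolation, with hypothetical in-memory roots standing in for the registry rows:

```python
from pathlib import PurePosixPath

def descendant_roots(source_root: str, project_roots: list[str]) -> list[str]:
    """Keep strict descendants of source_root, ordered shallow-to-deep, then by path."""
    root = PurePosixPath(source_root)
    kept: list[PurePosixPath] = []
    for candidate in map(PurePosixPath, project_roots):
        if candidate == root:
            continue  # the root itself is not its own descendant
        try:
            candidate.relative_to(root)  # raises ValueError for non-descendants
        except ValueError:
            continue
        kept.append(candidate)
    kept.sort(key=lambda p: (len(p.relative_to(root).parts), str(p)))
    return [str(p) for p in kept]

roots = ["/repo", "/repo/pkg/inner", "/repo/vendor", "/repository-other"]
print(descendant_roots("/repo", roots))  # → ['/repo/vendor', '/repo/pkg/inner']
```

Note that `"/repository-other".startswith("/repo")` is true, yet `relative_to` correctly rejects it; that is exactly the false positive the second-stage check removes.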
@@ -11,12 +11,25 @@ Common Fixtures:
 - sample_code_files: Factory for creating sample code files
 """

-import pytest
-import tempfile
-import shutil
-from pathlib import Path
-from typing import Dict, Any
 import sqlite3
+import shutil
+import tempfile
+import warnings
+from pathlib import Path
+from typing import Any, Dict
+
+import pytest
+
+warnings.filterwarnings(
+    "ignore",
+    message=r"'BaseCommand' is deprecated and will be removed in Click 9\.0\..*",
+    category=DeprecationWarning,
+)
+warnings.filterwarnings(
+    "ignore",
+    message=r"The '__version__' attribute is deprecated and will be removed in Click 9\.1\..*",
+    category=DeprecationWarning,
+)


 @pytest.fixture
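The conftest hunk above silences two specific Click deprecation warnings by regex on the message while leaving everything else visible. The mechanism can be demonstrated standalone (the warning text here is fabricated to match the filter, not emitted by Click itself):

```python
import warnings

CLICK_FILTER = dict(
    action="ignore",
    message=r"'BaseCommand' is deprecated and will be removed in Click 9\.0\..*",
    category=DeprecationWarning,
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # filterwarnings prepends, so the ignore rule wins over "always"
    warnings.filterwarnings(**CLICK_FILTER)
    warnings.warn(
        "'BaseCommand' is deprecated and will be removed in Click 9.0. Use 'Command' instead.",
        DeprecationWarning,
    )
    warnings.warn("something else is deprecated", DeprecationWarning)

print(len(caught))  # → 1: only the non-matching warning is recorded
```

The `message` argument is matched as a regex against the start of the warning text, which is why the module-level filters in conftest escape the dots in `9\.0\.` and append `.*`.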
@@ -98,6 +98,23 @@ class TestANNIndex:
         assert ids[0] == 1  # ID of first vector
         assert distances[0] < 0.01  # Very small distance (almost identical)

+    @pytest.mark.skipif(
+        not _hnswlib_available(),
+        reason="hnswlib not installed"
+    )
+    def test_search_clamps_top_k_to_available_vectors(self, temp_db, sample_vectors, sample_ids):
+        """Search should clamp top_k to the loaded vector count."""
+        from codexlens.semantic.ann_index import ANNIndex
+
+        index = ANNIndex(temp_db, dim=384)
+        index.add_vectors(sample_ids[:3], sample_vectors[:3])
+
+        ids, distances = index.search(sample_vectors[0], top_k=10)
+
+        assert len(ids) == 3
+        assert len(distances) == 3
+        assert ids[0] == 1
+
     @pytest.mark.skipif(
         not _hnswlib_available(),
         reason="hnswlib not installed"
(File diff suppressed because it is too large)

codex-lens/tests/test_compare_ccw_smart_search_stage2.py (new file, 350 lines)
@@ -0,0 +1,350 @@
+from __future__ import annotations
+
+import importlib.util
+import json
+import sys
+from pathlib import Path
+from types import SimpleNamespace
+
+
+MODULE_PATH = Path(__file__).resolve().parents[1] / "benchmarks" / "compare_ccw_smart_search_stage2.py"
+MODULE_NAME = "compare_ccw_smart_search_stage2_test_module"
+MODULE_SPEC = importlib.util.spec_from_file_location(MODULE_NAME, MODULE_PATH)
+assert MODULE_SPEC is not None and MODULE_SPEC.loader is not None
+benchmark = importlib.util.module_from_spec(MODULE_SPEC)
+sys.modules[MODULE_NAME] = benchmark
+MODULE_SPEC.loader.exec_module(benchmark)
+
+
+class _FakeChainResult:
+    def __init__(self, paths: list[str]) -> None:
+        self.results = [SimpleNamespace(path=path) for path in paths]
+
+
+class _FakeEngine:
+    def __init__(
+        self,
+        *,
+        search_paths: list[str] | None = None,
+        cascade_paths: list[str] | None = None,
+    ) -> None:
+        self.search_paths = search_paths or []
+        self.cascade_paths = cascade_paths or []
+        self.search_calls: list[dict[str, object]] = []
+        self.cascade_calls: list[dict[str, object]] = []
+
+    def search(self, query: str, source_path: Path, options: object) -> _FakeChainResult:
+        self.search_calls.append(
+            {
+                "query": query,
+                "source_path": source_path,
+                "options": options,
+            }
+        )
+        return _FakeChainResult(self.search_paths)
+
+    def cascade_search(
+        self,
+        query: str,
+        source_path: Path,
+        *,
+        k: int,
+        coarse_k: int,
+        options: object,
+        strategy: str,
+    ) -> _FakeChainResult:
+        self.cascade_calls.append(
+            {
+                "query": query,
+                "source_path": source_path,
+                "k": k,
+                "coarse_k": coarse_k,
+                "options": options,
+                "strategy": strategy,
+            }
+        )
+        return _FakeChainResult(self.cascade_paths)
+
+
+def test_strategy_specs_include_baselines_before_stage2_modes() -> None:
+    specs = benchmark._strategy_specs(
+        ["realtime", "static_global_graph"],
+        include_dense_baseline=True,
+        baseline_methods=["auto", "fts", "hybrid"],
+    )
+
+    assert [spec.strategy_key for spec in specs] == [
+        "auto",
+        "fts",
+        "hybrid",
+        "dense_rerank",
+        "staged:realtime",
+        "staged:static_global_graph",
+    ]
+
+
+def test_select_effective_method_matches_cli_auto_routing() -> None:
+    assert benchmark._select_effective_method("find_descendant_project_roots", "auto") == "fts"
+    assert benchmark._select_effective_method("build dist artifact output", "auto") == "fts"
+    assert benchmark._select_effective_method("embedding backend fastembed local litellm api config", "auto") == "fts"
+    assert benchmark._select_effective_method("get_reranker factory onnx backend selection", "auto") == "fts"
+    assert benchmark._select_effective_method("how does the authentication flow work", "auto") == "dense_rerank"
+    assert benchmark._select_effective_method("how smart_search keyword routing works", "auto") == "hybrid"
+
+
+def test_filter_dataset_by_query_match_uses_case_insensitive_substring() -> None:
+    dataset = [
+        {"query": "embedding backend fastembed local litellm api config", "relevant_paths": ["a"]},
+        {"query": "get_reranker factory onnx backend selection", "relevant_paths": ["b"]},
+        {"query": "how does smart search route keyword queries", "relevant_paths": ["c"]},
+    ]
+
+    filtered = benchmark._filter_dataset_by_query_match(dataset, "BACKEND")
+    assert [item["query"] for item in filtered] == [
+        "embedding backend fastembed local litellm api config",
+        "get_reranker factory onnx backend selection",
+    ]
+
+    narrow_filtered = benchmark._filter_dataset_by_query_match(dataset, "FASTEMBED")
+    assert [item["query"] for item in narrow_filtered] == [
+        "embedding backend fastembed local litellm api config",
+    ]
+
+    unfiltered = benchmark._filter_dataset_by_query_match(dataset, None)
+    assert [item["query"] for item in unfiltered] == [item["query"] for item in dataset]
+
+
+def test_apply_query_limit_runs_after_filtering() -> None:
+    dataset = [
+        {"query": "executeHybridMode dense_rerank semantic smart_search", "relevant_paths": ["a"]},
+        {"query": "embedding backend fastembed local litellm api config", "relevant_paths": ["b"]},
+        {"query": "reranker backend onnx api legacy configuration", "relevant_paths": ["c"]},
+    ]
+
+    filtered = benchmark._filter_dataset_by_query_match(dataset, "backend")
+    limited = benchmark._apply_query_limit(filtered, 1)
+
+    assert [item["query"] for item in limited] == [
+        "embedding backend fastembed local litellm api config",
+    ]
+
+
+def test_make_progress_payload_reports_partial_completion() -> None:
+    args = SimpleNamespace(
+        queries_file=Path("queries.jsonl"),
+        k=10,
+        coarse_k=100,
+    )
+    strategy_specs = [
+        benchmark.StrategySpec(strategy_key="auto", strategy="auto", stage2_mode=None),
+        benchmark.StrategySpec(strategy_key="dense_rerank", strategy="dense_rerank", stage2_mode=None),
+    ]
+    evaluations = [
+        benchmark.QueryEvaluation(
+            query="embedding backend fastembed local litellm api config",
+            intent="config",
+            notes=None,
+            relevant_paths=["codex-lens/src/codexlens/config.py"],
+            runs={
+                "auto": benchmark.StrategyRun(
+                    strategy_key="auto",
+                    strategy="auto",
+                    stage2_mode=None,
+                    effective_method="fts",
+                    execution_method="fts",
+                    latency_ms=123.0,
+                    topk_paths=["config.py"],
+                    first_hit_rank=1,
+                    hit_at_k=True,
+                    recall_at_k=1.0,
+                    generated_artifact_count=0,
+                    test_file_count=0,
+                    error=None,
+                )
+            },
+        )
+    ]
+
+    payload = benchmark._make_progress_payload(
+        args=args,
+        source_root=Path("D:/repo"),
+        strategy_specs=strategy_specs,
+        evaluations=evaluations,
+        query_index=1,
|
||||||
|
total_queries=3,
|
||||||
|
run_index=2,
|
||||||
|
total_runs=6,
|
||||||
|
current_query="embedding backend fastembed local litellm api config",
|
||||||
|
current_strategy_key="complete",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert payload["status"] == "running"
|
||||||
|
assert payload["progress"]["completed_queries"] == 1
|
||||||
|
assert payload["progress"]["completed_runs"] == 2
|
||||||
|
assert payload["progress"]["total_runs"] == 6
|
||||||
|
assert payload["strategy_keys"] == ["auto", "dense_rerank"]
|
||||||
|
assert payload["evaluations"][0]["runs"]["auto"]["effective_method"] == "fts"
|
||||||
|
|
||||||
|
|
||||||
|
def test_write_final_outputs_updates_progress_snapshot(tmp_path: Path) -> None:
|
||||||
|
output_path = tmp_path / "results.json"
|
||||||
|
progress_path = tmp_path / "progress.json"
|
||||||
|
payload = {
|
||||||
|
"status": "completed",
|
||||||
|
"query_count": 1,
|
||||||
|
"strategies": {"auto": {"effective_methods": {"fts": 1}}},
|
||||||
|
}
|
||||||
|
|
||||||
|
benchmark._write_final_outputs(
|
||||||
|
output_path=output_path,
|
||||||
|
progress_output=progress_path,
|
||||||
|
payload=payload,
|
||||||
|
)
|
||||||
|
|
||||||
|
assert json.loads(output_path.read_text(encoding="utf-8")) == payload
|
||||||
|
assert json.loads(progress_path.read_text(encoding="utf-8")) == payload
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_parser_defaults_reranker_gpu_to_disabled() -> None:
|
||||||
|
parser = benchmark.build_parser()
|
||||||
|
args = parser.parse_args([])
|
||||||
|
|
||||||
|
assert args.embedding_use_gpu is False
|
||||||
|
assert args.reranker_use_gpu is False
|
||||||
|
assert args.reranker_model == benchmark.DEFAULT_LOCAL_ONNX_RERANKER_MODEL
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_strategy_runtime_clones_config(monkeypatch, tmp_path: Path) -> None:
|
||||||
|
class _FakeRegistry:
|
||||||
|
def __init__(self) -> None:
|
||||||
|
self.initialized = False
|
||||||
|
|
||||||
|
def initialize(self) -> None:
|
||||||
|
self.initialized = True
|
||||||
|
|
||||||
|
class _FakeMapper:
|
||||||
|
pass
|
||||||
|
|
||||||
|
class _FakeEngine:
|
||||||
|
def __init__(self, *, registry, mapper, config) -> None:
|
||||||
|
self.registry = registry
|
||||||
|
self.mapper = mapper
|
||||||
|
self.config = config
|
||||||
|
|
||||||
|
monkeypatch.setattr(benchmark, "RegistryStore", _FakeRegistry)
|
||||||
|
monkeypatch.setattr(benchmark, "PathMapper", _FakeMapper)
|
||||||
|
monkeypatch.setattr(benchmark, "ChainSearchEngine", _FakeEngine)
|
||||||
|
|
||||||
|
base_config = benchmark.Config(data_dir=tmp_path, reranker_use_gpu=False)
|
||||||
|
strategy_spec = benchmark.StrategySpec(strategy_key="dense_rerank", strategy="dense_rerank", stage2_mode=None)
|
||||||
|
|
||||||
|
runtime = benchmark._build_strategy_runtime(base_config, strategy_spec)
|
||||||
|
|
||||||
|
assert runtime.strategy_spec == strategy_spec
|
||||||
|
assert runtime.config is not base_config
|
||||||
|
assert runtime.config.reranker_use_gpu is False
|
||||||
|
assert runtime.registry.initialized is True
|
||||||
|
assert runtime.engine.config is runtime.config
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_strategy_routes_auto_keyword_queries_to_fts_search() -> None:
|
||||||
|
engine = _FakeEngine(
|
||||||
|
search_paths=[
|
||||||
|
"D:/repo/src/codexlens/storage/registry.py",
|
||||||
|
"D:/repo/build/lib/codexlens/storage/registry.py",
|
||||||
|
]
|
||||||
|
)
|
||||||
|
config = SimpleNamespace(cascade_strategy="staged", staged_stage2_mode="realtime")
|
||||||
|
relevant = {benchmark._normalize_path_key("D:/repo/src/codexlens/storage/registry.py")}
|
||||||
|
|
||||||
|
run = benchmark._run_strategy(
|
||||||
|
engine,
|
||||||
|
config,
|
||||||
|
strategy_spec=benchmark.StrategySpec(strategy_key="auto", strategy="auto", stage2_mode=None),
|
||||||
|
query="find_descendant_project_roots",
|
||||||
|
source_path=Path("D:/repo"),
|
||||||
|
k=5,
|
||||||
|
coarse_k=20,
|
||||||
|
relevant=relevant,
|
||||||
|
)
|
||||||
|
|
||||||
|
assert len(engine.search_calls) == 1
|
||||||
|
assert len(engine.cascade_calls) == 0
|
||||||
|
assert run.effective_method == "fts"
|
||||||
|
assert run.execution_method == "fts"
|
||||||
|
assert run.hit_at_k is True
|
||||||
|
assert run.generated_artifact_count == 1
|
||||||
|
assert run.test_file_count == 0
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_strategy_uses_cascade_for_dense_rerank_and_restores_config() -> None:
|
||||||
|
engine = _FakeEngine(cascade_paths=["D:/repo/src/tools/smart-search.ts"])
|
||||||
|
config = SimpleNamespace(cascade_strategy="staged", staged_stage2_mode="static_global_graph")
|
||||||
|
relevant = {benchmark._normalize_path_key("D:/repo/src/tools/smart-search.ts")}
|
||||||
|
|
||||||
|
run = benchmark._run_strategy(
|
||||||
|
engine,
|
||||||
|
config,
|
||||||
|
strategy_spec=benchmark.StrategySpec(
|
||||||
|
strategy_key="dense_rerank",
|
||||||
|
strategy="dense_rerank",
|
||||||
|
stage2_mode=None,
|
||||||
|
),
|
||||||
|
query="how does smart search route keyword queries",
|
||||||
|
source_path=Path("D:/repo"),
|
||||||
|
k=5,
|
||||||
|
coarse_k=20,
|
||||||
|
relevant=relevant,
|
||||||
|
)
|
||||||
|
|
||||||
|
assert len(engine.search_calls) == 0
|
||||||
|
assert len(engine.cascade_calls) == 1
|
||||||
|
assert engine.cascade_calls[0]["strategy"] == "dense_rerank"
|
||||||
|
assert run.effective_method == "dense_rerank"
|
||||||
|
assert run.execution_method == "cascade"
|
||||||
|
assert run.hit_at_k is True
|
||||||
|
assert config.cascade_strategy == "staged"
|
||||||
|
assert config.staged_stage2_mode == "static_global_graph"
|
||||||
|
|
||||||
|
|
||||||
|
def test_summarize_runs_tracks_effective_method_and_artifact_pressure() -> None:
|
||||||
|
summary = benchmark._summarize_runs(
|
||||||
|
[
|
||||||
|
benchmark.StrategyRun(
|
||||||
|
strategy_key="auto",
|
||||||
|
strategy="auto",
|
||||||
|
stage2_mode=None,
|
||||||
|
effective_method="fts",
|
||||||
|
execution_method="fts",
|
||||||
|
latency_ms=10.0,
|
||||||
|
topk_paths=["a"],
|
||||||
|
first_hit_rank=1,
|
||||||
|
hit_at_k=True,
|
||||||
|
recall_at_k=1.0,
|
||||||
|
generated_artifact_count=1,
|
||||||
|
test_file_count=0,
|
||||||
|
error=None,
|
||||||
|
),
|
||||||
|
benchmark.StrategyRun(
|
||||||
|
strategy_key="auto",
|
||||||
|
strategy="auto",
|
||||||
|
stage2_mode=None,
|
||||||
|
effective_method="hybrid",
|
||||||
|
execution_method="hybrid",
|
||||||
|
latency_ms=30.0,
|
||||||
|
topk_paths=["b"],
|
||||||
|
first_hit_rank=None,
|
||||||
|
hit_at_k=False,
|
||||||
|
recall_at_k=0.0,
|
||||||
|
generated_artifact_count=0,
|
||||||
|
test_file_count=2,
|
||||||
|
error=None,
|
||||||
|
),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
assert summary["effective_methods"] == {"fts": 1, "hybrid": 1}
|
||||||
|
assert summary["runs_with_generated_artifacts"] == 1
|
||||||
|
assert summary["runs_with_test_files"] == 1
|
||||||
|
assert summary["avg_generated_artifact_count"] == 0.5
|
||||||
|
assert summary["avg_test_file_count"] == 1.0
|
||||||
83
codex-lens/tests/test_config_search_env_overrides.py
Normal file
@@ -0,0 +1,83 @@
"""Unit tests for Config .env overrides for final search ranking penalties."""

from __future__ import annotations

import tempfile
from pathlib import Path

import pytest

from codexlens.config import Config


@pytest.fixture
def temp_config_dir() -> Path:
    """Create temporary directory for config data_dir."""
    tmpdir = tempfile.TemporaryDirectory(ignore_cleanup_errors=True)
    yield Path(tmpdir.name)
    try:
        tmpdir.cleanup()
    except (PermissionError, OSError):
        pass


def test_search_penalty_env_overrides_apply(temp_config_dir: Path) -> None:
    config = Config(data_dir=temp_config_dir)

    env_path = temp_config_dir / ".env"
    env_path.write_text(
        "\n".join(
            [
                "TEST_FILE_PENALTY=0.25",
                "GENERATED_FILE_PENALTY=0.4",
                "",
            ]
        ),
        encoding="utf-8",
    )

    config.load_settings()

    assert config.test_file_penalty == 0.25
    assert config.generated_file_penalty == 0.4


def test_reranker_gpu_env_override_apply(temp_config_dir: Path) -> None:
    config = Config(data_dir=temp_config_dir)

    env_path = temp_config_dir / ".env"
    env_path.write_text(
        "\n".join(
            [
                "RERANKER_USE_GPU=false",
                "",
            ]
        ),
        encoding="utf-8",
    )

    config.load_settings()

    assert config.reranker_use_gpu is False


def test_search_penalty_env_overrides_invalid_ignored(temp_config_dir: Path) -> None:
    config = Config(data_dir=temp_config_dir)

    env_path = temp_config_dir / ".env"
    env_path.write_text(
        "\n".join(
            [
                "TEST_FILE_PENALTY=oops",
                "GENERATED_FILE_PENALTY=nope",
                "",
            ]
        ),
        encoding="utf-8",
    )

    config.load_settings()

    assert config.test_file_penalty == 0.15
    assert config.generated_file_penalty == 0.35
    assert config.reranker_use_gpu is True
204
codex-lens/tests/test_embedding_status_root_model.py
Normal file
@@ -0,0 +1,204 @@
import gc
import shutil
import sqlite3
import tempfile
import time
from pathlib import Path

import pytest

import codexlens.cli.embedding_manager as embedding_manager
from codexlens.cli.embedding_manager import get_embedding_stats_summary, get_embeddings_status


@pytest.fixture
def status_temp_dir() -> Path:
    temp_path = Path(tempfile.mkdtemp())
    try:
        yield temp_path
    finally:
        gc.collect()
        for _ in range(5):
            try:
                if temp_path.exists():
                    shutil.rmtree(temp_path)
                break
            except PermissionError:
                time.sleep(0.1)


def _create_index_db(index_path: Path, files: list[str], embedded_files: list[str] | None = None) -> None:
    index_path.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(index_path) as conn:
        cursor = conn.cursor()
        cursor.execute(
            """
            CREATE TABLE files (
                id INTEGER PRIMARY KEY,
                path TEXT NOT NULL UNIQUE,
                content TEXT,
                language TEXT,
                hash TEXT
            )
            """
        )
        cursor.executemany(
            "INSERT INTO files (path, content, language, hash) VALUES (?, ?, ?, ?)",
            [(file_path, "", "python", f"hash-{idx}") for idx, file_path in enumerate(files)],
        )

        if embedded_files is not None:
            cursor.execute(
                """
                CREATE TABLE semantic_chunks (
                    id INTEGER PRIMARY KEY,
                    file_path TEXT NOT NULL,
                    content TEXT,
                    embedding BLOB,
                    metadata TEXT,
                    category TEXT
                )
                """
            )
            cursor.executemany(
                "INSERT INTO semantic_chunks (file_path, content, embedding, metadata, category) VALUES (?, ?, ?, ?, ?)",
                [(file_path, "chunk", b"vec", "{}", "code") for file_path in embedded_files],
            )
        conn.commit()


def _create_vectors_meta_db(meta_path: Path, embedded_files: list[str], binary_vector_count: int = 0) -> None:
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    with sqlite3.connect(meta_path) as conn:
        cursor = conn.cursor()
        cursor.execute(
            """
            CREATE TABLE chunk_metadata (
                chunk_id INTEGER PRIMARY KEY,
                file_path TEXT NOT NULL,
                content TEXT,
                start_line INTEGER,
                end_line INTEGER,
                category TEXT,
                metadata TEXT,
                source_index_db TEXT
            )
            """
        )
        cursor.execute(
            """
            CREATE TABLE binary_vectors (
                chunk_id INTEGER PRIMARY KEY,
                vector BLOB NOT NULL
            )
            """
        )
        cursor.executemany(
            """
            INSERT INTO chunk_metadata (
                chunk_id, file_path, content, start_line, end_line, category, metadata, source_index_db
            ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            """,
            [
                (idx, file_path, "chunk", 1, 1, "code", "{}", str(meta_path.parent / "_index.db"))
                for idx, file_path in enumerate(embedded_files, start=1)
            ],
        )
        cursor.executemany(
            "INSERT INTO binary_vectors (chunk_id, vector) VALUES (?, ?)",
            [(idx, b"\x01") for idx in range(1, binary_vector_count + 1)],
        )
        conn.commit()


def test_root_status_does_not_inherit_child_embeddings(
    monkeypatch: pytest.MonkeyPatch, status_temp_dir: Path
) -> None:
    workspace = status_temp_dir / "workspace"
    workspace.mkdir()
    _create_index_db(workspace / "_index.db", ["a.py", "b.py"])
    _create_index_db(workspace / "child" / "_index.db", ["child.py"], embedded_files=["child.py"])

    monkeypatch.setattr(
        embedding_manager,
        "_get_model_info_from_index",
        lambda index_path: {
            "model_profile": "fast",
            "model_name": "unit-test-model",
            "embedding_dim": 384,
            "backend": "fastembed",
            "created_at": "2026-03-13T00:00:00Z",
            "updated_at": "2026-03-13T00:00:00Z",
        } if index_path.parent.name == "child" else None,
    )

    status = get_embeddings_status(workspace)
    assert status["success"] is True

    result = status["result"]
    assert result["coverage_percent"] == 0.0
    assert result["files_with_embeddings"] == 0
    assert result["root"]["has_embeddings"] is False
    assert result["model_info"] is None
    assert result["subtree"]["indexes_with_embeddings"] == 1
    assert result["subtree"]["coverage_percent"] > 0


def test_root_status_uses_validated_centralized_metadata(status_temp_dir: Path) -> None:
    workspace = status_temp_dir / "workspace"
    workspace.mkdir()
    _create_index_db(workspace / "_index.db", ["a.py", "b.py"])
    _create_vectors_meta_db(workspace / "_vectors_meta.db", ["a.py"])
    (workspace / "_vectors.hnsw").write_bytes(b"hnsw")

    status = get_embeddings_status(workspace)
    assert status["success"] is True

    result = status["result"]
    assert result["coverage_percent"] == 50.0
    assert result["files_with_embeddings"] == 1
    assert result["total_chunks"] == 1
    assert result["root"]["has_embeddings"] is True
    assert result["root"]["storage_mode"] == "centralized"
    assert result["centralized"]["dense_ready"] is True
    assert result["centralized"]["usable"] is True


def test_embedding_stats_summary_skips_ignored_artifact_indexes(status_temp_dir: Path) -> None:
    workspace = status_temp_dir / "workspace"
    workspace.mkdir()
    _create_index_db(workspace / "_index.db", ["root.py"])
    _create_index_db(workspace / "src" / "_index.db", ["src.py"])
    _create_index_db(workspace / "dist" / "_index.db", ["bundle.py"], embedded_files=["bundle.py"])
    _create_index_db(workspace / ".workflow" / "_index.db", ["trace.py"], embedded_files=["trace.py"])

    summary = get_embedding_stats_summary(workspace)

    assert summary["success"] is True
    result = summary["result"]
    assert result["total_indexes"] == 2
    assert {Path(item["path"]).relative_to(workspace).as_posix() for item in result["indexes"]} == {
        "_index.db",
        "src/_index.db",
    }


def test_root_status_ignores_empty_centralized_artifacts(status_temp_dir: Path) -> None:
    workspace = status_temp_dir / "workspace"
    workspace.mkdir()
    _create_index_db(workspace / "_index.db", ["a.py", "b.py"])
    _create_vectors_meta_db(workspace / "_vectors_meta.db", [])
    (workspace / "_vectors.hnsw").write_bytes(b"hnsw")
    (workspace / "_binary_vectors.mmap").write_bytes(b"mmap")

    status = get_embeddings_status(workspace)
    assert status["success"] is True

    result = status["result"]
    assert result["coverage_percent"] == 0.0
    assert result["files_with_embeddings"] == 0
    assert result["root"]["has_embeddings"] is False
    assert result["centralized"]["chunk_metadata_rows"] == 0
    assert result["centralized"]["binary_vector_rows"] == 0
    assert result["centralized"]["usable"] is False
@@ -833,6 +833,36 @@ class TestHybridSearchAdaptiveWeights:
         assert captured["weights"]["vector"] > 0.6
 
+    def test_default_engine_weights_keep_lsp_graph_backend_available(self):
+        """Legacy public defaults should not discard LSP graph fusion weights internally."""
+        from unittest.mock import patch
+
+        engine = HybridSearchEngine()
+
+        results_map = {
+            "exact": [SearchResult(path="a.py", score=10.0, excerpt="a")],
+            "fuzzy": [SearchResult(path="b.py", score=9.0, excerpt="b")],
+            "vector": [SearchResult(path="c.py", score=0.9, excerpt="c")],
+            "lsp_graph": [SearchResult(path="d.py", score=0.8, excerpt="d")],
+        }
+
+        captured = {}
+        from codexlens.search import ranking as ranking_module
+
+        def capture_rrf(map_in, weights_in, k=60):
+            captured["weights"] = dict(weights_in)
+            return ranking_module.reciprocal_rank_fusion(map_in, weights_in, k=k)
+
+        with patch.object(HybridSearchEngine, "_search_parallel", return_value=results_map), patch(
+            "codexlens.search.hybrid_search.reciprocal_rank_fusion",
+            side_effect=capture_rrf,
+        ):
+            engine.search(Path("dummy.db"), "auth flow", enable_vector=True, enable_lsp_graph=True)
+
+        assert engine.weights == HybridSearchEngine.DEFAULT_WEIGHTS
+        assert "lsp_graph" in captured["weights"]
+        assert captured["weights"]["lsp_graph"] > 0.0
+
     def test_reranking_enabled(self, tmp_path):
         """Reranking runs only when explicitly enabled via config."""
         from unittest.mock import patch
@@ -93,7 +93,8 @@ def test_get_cross_encoder_reranker_uses_factory_backend_onnx_gpu_flag(
         enable_reranking=True,
         enable_cross_encoder_rerank=True,
         reranker_backend="onnx",
-        embedding_use_gpu=False,
+        embedding_use_gpu=True,
+        reranker_use_gpu=False,
     )
     engine = HybridSearchEngine(config=config)
 
@@ -109,6 +110,58 @@ def test_get_cross_encoder_reranker_uses_factory_backend_onnx_gpu_flag(
     assert get_args["kwargs"]["use_gpu"] is False
 
 
+def test_get_cross_encoder_reranker_uses_cpu_device_for_legacy_when_reranker_gpu_disabled(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path,
+) -> None:
+    calls: dict[str, object] = {}
+
+    def fake_check_reranker_available(backend: str):
+        calls["check_backend"] = backend
+        return True, None
+
+    sentinel = object()
+
+    def fake_get_reranker(*, backend: str, model_name=None, device=None, **kwargs):
+        calls["get_args"] = {
+            "backend": backend,
+            "model_name": model_name,
+            "device": device,
+            "kwargs": kwargs,
+        }
+        return sentinel
+
+    monkeypatch.setattr(
+        "codexlens.semantic.reranker.check_reranker_available",
+        fake_check_reranker_available,
+    )
+    monkeypatch.setattr(
+        "codexlens.semantic.reranker.get_reranker",
+        fake_get_reranker,
+    )
+
+    config = Config(
+        data_dir=tmp_path / "legacy-cpu",
+        enable_reranking=True,
+        enable_cross_encoder_rerank=True,
+        reranker_backend="legacy",
+        reranker_model="dummy-model",
+        embedding_use_gpu=True,
+        reranker_use_gpu=False,
+    )
+    engine = HybridSearchEngine(config=config)
+
+    reranker = engine._get_cross_encoder_reranker()
+    assert reranker is sentinel
+    assert calls["check_backend"] == "legacy"
+
+    get_args = calls["get_args"]
+    assert isinstance(get_args, dict)
+    assert get_args["backend"] == "legacy"
+    assert get_args["model_name"] == "dummy-model"
+    assert get_args["device"] == "cpu"
+
+
 def test_get_cross_encoder_reranker_returns_none_when_backend_unavailable(
     monkeypatch: pytest.MonkeyPatch,
     tmp_path,
@@ -150,6 +150,30 @@ class TestHybridSearchBackends:
         assert "exact" in backends
         assert "vector" in backends
 
+    def test_search_lexical_priority_query_skips_vector_backend(self, temp_paths, mock_config):
+        """Config/env/factory queries should stay lexical-first in hybrid mode."""
+        engine = HybridSearchEngine(config=mock_config)
+        index_path = temp_paths / "_index.db"
+
+        with patch.object(engine, "_search_parallel") as mock_parallel:
+            mock_parallel.return_value = {
+                "exact": [SearchResult(path="config.py", score=10.0, excerpt="exact")],
+                "fuzzy": [SearchResult(path="env_config.py", score=8.0, excerpt="fuzzy")],
+            }
+
+            results = engine.search(
+                index_path,
+                "embedding backend fastembed local litellm api config",
+                enable_fuzzy=True,
+                enable_vector=True,
+            )
+
+        assert len(results) >= 1
+        backends = mock_parallel.call_args[0][2]
+        assert "exact" in backends
+        assert "fuzzy" in backends
+        assert "vector" not in backends
+
     def test_search_pure_vector(self, temp_paths, mock_config):
         """Pure vector mode should only use vector backend."""
         engine = HybridSearchEngine(config=mock_config)
@@ -257,6 +281,39 @@ class TestHybridSearchFusion:
 
         mock_rerank.assert_called_once()
 
+    def test_search_lexical_priority_query_skips_expensive_reranking(self, temp_paths, mock_config):
+        """Lexical-priority queries should bypass embedder and cross-encoder reranking."""
+        mock_config.enable_reranking = True
+        mock_config.enable_cross_encoder_rerank = True
+        mock_config.reranking_top_k = 50
+        mock_config.reranker_top_k = 20
+        engine = HybridSearchEngine(config=mock_config)
+        index_path = temp_paths / "_index.db"
+
+        with patch.object(engine, "_search_parallel") as mock_parallel:
+            mock_parallel.return_value = {
+                "exact": [SearchResult(path="config.py", score=10.0, excerpt="code")],
+                "fuzzy": [SearchResult(path="env_config.py", score=9.0, excerpt="env vars")],
+            }
+
+            with patch("codexlens.search.hybrid_search.rerank_results") as mock_rerank, patch(
+                "codexlens.search.hybrid_search.cross_encoder_rerank"
+            ) as mock_cross_encoder, patch.object(
+                engine,
+                "_get_cross_encoder_reranker",
+            ) as mock_get_reranker:
+                results = engine.search(
+                    index_path,
+                    "get_reranker factory onnx backend selection",
+                    enable_fuzzy=True,
+                    enable_vector=True,
+                )
+
+        assert len(results) >= 1
+        mock_rerank.assert_not_called()
+        mock_cross_encoder.assert_not_called()
+        mock_get_reranker.assert_not_called()
+
     def test_search_category_filtering(self, temp_paths, mock_config):
         """Category filtering should separate code/doc results by intent."""
         mock_config.enable_category_filter = True
@@ -316,6 +373,217 @@ class TestSearchParallel:
         mock_fuzzy.assert_called_once()
 
+
+class TestCentralizedMetadataFetch:
+    """Tests for centralized metadata retrieval helpers."""
+
+    def test_fetch_from_vector_meta_store_clamps_negative_scores(self, temp_paths, mock_config, monkeypatch):
+        engine = HybridSearchEngine(config=mock_config)
+
+        class FakeMetaStore:
+            def __init__(self, _path):
+                pass
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, exc_type, exc, tb):
+                return False
+
+            def get_chunks_by_ids(self, _chunk_ids, category=None):
+                assert category is None
+                return [
+                    {
+                        "chunk_id": 7,
+                        "file_path": "src/app.py",
+                        "content": "def app(): pass",
+                        "metadata": {},
+                        "start_line": 1,
+                        "end_line": 1,
+                    }
+                ]
+
+        import codexlens.storage.vector_meta_store as vector_meta_store
+
+        monkeypatch.setattr(vector_meta_store, "VectorMetadataStore", FakeMetaStore)
+
+        results = engine._fetch_from_vector_meta_store(
+            temp_paths / "_vectors_meta.db",
+            [7],
+            {7: -0.01},
+        )
+
+        assert len(results) == 1
+        assert results[0].path == "src/app.py"
+        assert results[0].score == 0.0
+
+
+class TestCentralizedVectorCaching:
+    """Tests for centralized vector search runtime caches."""
+
+    def test_search_vector_centralized_reuses_cached_resources(
+        self,
+        temp_paths,
+        mock_config,
+    ):
+        engine = HybridSearchEngine(config=mock_config)
+        hnsw_path = temp_paths / "_vectors.hnsw"
+        hnsw_path.write_bytes(b"hnsw")
+
+        vector_store_opened: List[Path] = []
+
+        class FakeVectorStore:
+            def __init__(self, path):
+                vector_store_opened.append(Path(path))
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, exc_type, exc, tb):
+                return False
+
+            def get_model_config(self):
+                return {
+                    "backend": "fastembed",
+                    "model_name": "BAAI/bge-small-en-v1.5",
+                    "model_profile": "fast",
+                    "embedding_dim": 384,
+                }
+
+        class FakeEmbedder:
+            embedding_dim = 384
+
+            def __init__(self):
+                self.embed_calls: List[str] = []
+
+            def embed_single(self, query):
+                self.embed_calls.append(query)
+                return [0.1, 0.2, 0.3]
+
+        class FakeAnnIndex:
+            def __init__(self):
+                self.load_calls = 0
+                self.search_calls = 0
+
+            def load(self):
+                self.load_calls += 1
+                return True
+
+            def count(self):
+                return 3
+
+            def search(self, _query_vec, top_k):
+                self.search_calls += 1
+                assert top_k == 10
+                return [7], [0.2]
+
+        fake_embedder = FakeEmbedder()
+        fake_ann_index = FakeAnnIndex()
+
+        with patch("codexlens.semantic.vector_store.VectorStore", FakeVectorStore), patch(
+            "codexlens.semantic.factory.get_embedder",
+            return_value=fake_embedder,
+        ) as mock_get_embedder, patch(
+            "codexlens.semantic.ann_index.ANNIndex.create_central",
+            return_value=fake_ann_index,
+        ) as mock_create_central, patch.object(
+            engine,
+            "_fetch_chunks_by_ids_centralized",
+            return_value=[SearchResult(path="src/app.py", score=0.8, excerpt="hit")],
+        ) as mock_fetch:
+            first = engine._search_vector_centralized(
+                temp_paths / "child-a" / "_index.db",
+                hnsw_path,
+                "smart search routing",
+                limit=5,
+            )
+            second = engine._search_vector_centralized(
+                temp_paths / "child-b" / "_index.db",
+                hnsw_path,
+                "smart search routing",
+                limit=5,
+            )
+
+        assert [result.path for result in first] == ["src/app.py"]
+        assert [result.path for result in second] == ["src/app.py"]
+        assert vector_store_opened == [temp_paths / "_index.db"]
+        assert mock_get_embedder.call_count == 1
+        assert mock_create_central.call_count == 1
+        assert fake_ann_index.load_calls == 1
+        assert fake_embedder.embed_calls == ["smart search routing"]
+        assert fake_ann_index.search_calls == 2
+        assert mock_fetch.call_count == 2
+
+    def test_search_vector_centralized_respects_embedding_use_gpu(
+        self,
+        temp_paths,
+        mock_config,
+    ):
+        engine = HybridSearchEngine(config=mock_config)
+        hnsw_path = temp_paths / "_vectors.hnsw"
+        hnsw_path.write_bytes(b"hnsw")
+
+        class FakeVectorStore:
+            def __init__(self, _path):
+                pass
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, exc_type, exc, tb):
+                return False
+
+            def get_model_config(self):
+                return {
+                    "backend": "fastembed",
|
||||||
|
"model_name": "BAAI/bge-small-en-v1.5",
|
||||||
|
"model_profile": "code",
|
||||||
|
"embedding_dim": 384,
|
||||||
|
}
|
||||||
|
|
||||||
|
class FakeEmbedder:
|
||||||
|
embedding_dim = 384
|
||||||
|
|
||||||
|
def embed_single(self, _query):
|
||||||
|
return [0.1, 0.2]
|
||||||
|
|
||||||
|
class FakeAnnIndex:
|
||||||
|
def load(self):
|
||||||
|
return True
|
||||||
|
|
||||||
|
def count(self):
|
||||||
|
return 1
|
||||||
|
|
||||||
|
def search(self, _query_vec, top_k):
|
||||||
|
assert top_k == 6
|
||||||
|
return [9], [0.1]
|
||||||
|
|
||||||
|
with patch("codexlens.semantic.vector_store.VectorStore", FakeVectorStore), patch(
|
||||||
|
"codexlens.semantic.factory.get_embedder",
|
||||||
|
return_value=FakeEmbedder(),
|
||||||
|
) as mock_get_embedder, patch(
|
||||||
|
"codexlens.semantic.ann_index.ANNIndex.create_central",
|
||||||
|
return_value=FakeAnnIndex(),
|
||||||
|
), patch.object(
|
||||||
|
engine,
|
||||||
|
"_fetch_chunks_by_ids_centralized",
|
||||||
|
return_value=[SearchResult(path="src/app.py", score=0.9, excerpt="hit")],
|
||||||
|
):
|
||||||
|
results = engine._search_vector_centralized(
|
||||||
|
temp_paths / "_index.db",
|
||||||
|
hnsw_path,
|
||||||
|
"semantic query",
|
||||||
|
limit=3,
|
||||||
|
)
|
||||||
|
|
||||||
|
assert len(results) == 1
|
||||||
|
assert mock_get_embedder.call_count == 1
|
||||||
|
assert mock_get_embedder.call_args.kwargs == {
|
||||||
|
"backend": "fastembed",
|
||||||
|
"profile": "code",
|
||||||
|
"use_gpu": False,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
# =============================================================================
|
# =============================================================================
|
||||||
# Tests: _search_lsp_graph
|
# Tests: _search_lsp_graph
|
||||||
# =============================================================================
|
# =============================================================================
|
674  codex-lens/tests/test_index_status_cli_contract.py  Normal file
@@ -0,0 +1,674 @@
import json

from typer.testing import CliRunner

import codexlens.cli.commands as commands
from codexlens.cli.commands import app
import codexlens.cli.embedding_manager as embedding_manager
from codexlens.config import Config
from codexlens.entities import SearchResult
from codexlens.search.chain_search import ChainSearchResult, SearchStats


def test_index_status_json_preserves_legacy_embeddings_contract(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    (workspace / "_index.db").touch()

    legacy_summary = {
        "total_indexes": 3,
        "indexes_with_embeddings": 1,
        "total_chunks": 42,
        "indexes": [
            {
                "project": "child",
                "path": str(workspace / "child" / "_index.db"),
                "has_embeddings": True,
                "total_chunks": 42,
                "total_files": 1,
                "coverage_percent": 100.0,
            }
        ],
    }
    root_status = {
        "total_indexes": 3,
        "total_files": 2,
        "files_with_embeddings": 0,
        "files_without_embeddings": 2,
        "total_chunks": 0,
        "coverage_percent": 0.0,
        "indexes_with_embeddings": 1,
        "indexes_without_embeddings": 2,
        "model_info": None,
        "root": {
            "index_path": str(workspace / "_index.db"),
            "exists": False,
            "total_files": 2,
            "files_with_embeddings": 0,
            "files_without_embeddings": 2,
            "total_chunks": 0,
            "coverage_percent": 0.0,
            "has_embeddings": False,
            "storage_mode": "none",
        },
        "subtree": {
            "total_indexes": 3,
            "total_files": 3,
            "files_with_embeddings": 1,
            "files_without_embeddings": 2,
            "total_chunks": 42,
            "coverage_percent": 33.3,
            "indexes_with_embeddings": 1,
            "indexes_without_embeddings": 2,
        },
        "centralized": {
            "dense_index_exists": False,
            "binary_index_exists": False,
            "dense_ready": False,
            "binary_ready": False,
            "usable": False,
            "chunk_metadata_rows": 0,
            "binary_vector_rows": 0,
            "files_with_embeddings": 0,
        },
    }

    monkeypatch.setattr(
        embedding_manager,
        "get_embeddings_status",
        lambda _index_root: {"success": True, "result": root_status},
    )
    monkeypatch.setattr(
        embedding_manager,
        "get_embedding_stats_summary",
        lambda _index_root: {"success": True, "result": legacy_summary},
    )
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type(
            "FakeRegistryStore",
            (),
            {
                "initialize": lambda self: None,
                "close": lambda self: None,
            },
        ),
    )
    monkeypatch.setattr(
        commands,
        "PathMapper",
        type(
            "FakePathMapper",
            (),
            {
                "source_to_index_db": lambda self, _target_path: workspace / "_index.db",
            },
        ),
    )

    runner = CliRunner()
    result = runner.invoke(app, ["index", "status", str(workspace), "--json"])

    assert result.exit_code == 0, result.output
    payload = json.loads(result.stdout)
    body = payload["result"]
    assert body["embeddings"] == legacy_summary
    assert body["embeddings_error"] is None
    assert body["embeddings_status"] == root_status
    assert body["embeddings_status_error"] is None
    assert body["embeddings_summary"] == legacy_summary


def test_search_json_preserves_dense_rerank_method_label(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    search_result = ChainSearchResult(
        query="greet function",
        results=[
            SearchResult(
                path=str(workspace / "src" / "app.py"),
                score=0.97,
                excerpt="def greet(name):",
                content="def greet(name):\n    return f'hello {name}'\n",
            )
        ],
        symbols=[],
        stats=SearchStats(dirs_searched=2, files_matched=1, time_ms=12.5),
    )
    captured: dict[str, object] = {}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type(
            "FakeRegistryStore",
            (),
            {
                "initialize": lambda self: None,
                "close": lambda self: None,
            },
        ),
    )
    monkeypatch.setattr(
        commands,
        "PathMapper",
        type(
            "FakePathMapper",
            (),
            {},
        ),
    )

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["registry"] = registry
            captured["mapper"] = mapper
            captured["config"] = config

        def search(self, *_args, **_kwargs):
            raise AssertionError("dense_rerank should dispatch via cascade_search")

        def cascade_search(self, query, source_path, k=10, options=None, strategy=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["limit"] = k
            captured["options"] = options
            captured["strategy"] = strategy
            return search_result

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "greet function", "--path", str(workspace), "--method", "dense_rerank", "--json"],
    )

    assert result.exit_code == 0, result.output
    payload = json.loads(result.stdout)
    body = payload["result"]
    assert body["method"] == "dense_rerank"
    assert body["count"] == 1
    assert body["results"][0]["path"] == str(workspace / "src" / "app.py")
    assert captured["strategy"] == "dense_rerank"
    assert captured["limit"] == 20


def test_search_json_auto_routes_keyword_queries_to_fts(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    search_result = ChainSearchResult(
        query="windowsHide",
        results=[
            SearchResult(
                path=str(workspace / "src" / "spawn.ts"),
                score=0.91,
                excerpt="windowsHide: true",
                content="spawn('node', [], { windowsHide: true })",
            )
        ],
        symbols=[],
        stats=SearchStats(dirs_searched=2, files_matched=1, time_ms=8.0),
    )
    captured: dict[str, object] = {}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type("FakeRegistryStore", (), {"initialize": lambda self: None, "close": lambda self: None}),
    )
    monkeypatch.setattr(commands, "PathMapper", type("FakePathMapper", (), {}))

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["config"] = config

        def search(self, query, source_path, options=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["options"] = options
            return search_result

        def cascade_search(self, *_args, **_kwargs):
            raise AssertionError("auto keyword queries should not dispatch to cascade_search")

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "windowsHide", "--path", str(workspace), "--json"],
    )

    assert result.exit_code == 0, result.output
    body = json.loads(result.stdout)["result"]
    assert body["method"] == "fts"
    assert captured["options"].enable_vector is False
    assert captured["options"].hybrid_mode is False


def test_search_json_auto_routes_mixed_queries_to_hybrid(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    search_result = ChainSearchResult(
        query="how does my_function work",
        results=[
            SearchResult(
                path=str(workspace / "src" / "app.py"),
                score=0.81,
                excerpt="def my_function():",
                content="def my_function():\n    return 1\n",
            )
        ],
        symbols=[],
        stats=SearchStats(dirs_searched=2, files_matched=1, time_ms=10.0),
    )
    captured: dict[str, object] = {}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type("FakeRegistryStore", (), {"initialize": lambda self: None, "close": lambda self: None}),
    )
    monkeypatch.setattr(commands, "PathMapper", type("FakePathMapper", (), {}))

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["config"] = config

        def search(self, query, source_path, options=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["options"] = options
            return search_result

        def cascade_search(self, *_args, **_kwargs):
            raise AssertionError("mixed auto queries should not dispatch to cascade_search")

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "how does my_function work", "--path", str(workspace), "--json"],
    )

    assert result.exit_code == 0, result.output
    body = json.loads(result.stdout)["result"]
    assert body["method"] == "hybrid"
    assert captured["options"].enable_vector is True
    assert captured["options"].hybrid_mode is True
    assert captured["options"].enable_cascade is False


def test_search_json_auto_routes_generated_artifact_queries_to_fts(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    search_result = ChainSearchResult(
        query="dist bundle output",
        results=[
            SearchResult(
                path=str(workspace / "dist" / "bundle.js"),
                score=0.77,
                excerpt="bundle output",
                content="console.log('bundle')",
            )
        ],
        symbols=[],
        stats=SearchStats(dirs_searched=2, files_matched=1, time_ms=9.0),
    )
    captured: dict[str, object] = {}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type("FakeRegistryStore", (), {"initialize": lambda self: None, "close": lambda self: None}),
    )
    monkeypatch.setattr(commands, "PathMapper", type("FakePathMapper", (), {}))

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["config"] = config

        def search(self, query, source_path, options=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["options"] = options
            return search_result

        def cascade_search(self, *_args, **_kwargs):
            raise AssertionError("generated artifact auto queries should not dispatch to cascade_search")

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "dist bundle output", "--path", str(workspace), "--json"],
    )

    assert result.exit_code == 0, result.output
    body = json.loads(result.stdout)["result"]
    assert body["method"] == "fts"
    assert captured["options"].enable_vector is False
    assert captured["options"].hybrid_mode is False


def test_auto_select_search_method_prefers_fts_for_lexical_config_queries() -> None:
    assert commands._auto_select_search_method("embedding backend fastembed local litellm api config") == "fts"
    assert commands._auto_select_search_method("get_reranker factory onnx backend selection") == "fts"
    assert commands._auto_select_search_method("how to authenticate users safely?") == "dense_rerank"


def test_search_json_fts_zero_results_uses_filesystem_fallback(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    indexed_result = ChainSearchResult(
        query="find_descendant_project_roots",
        results=[],
        symbols=[],
        stats=SearchStats(dirs_searched=3, files_matched=0, time_ms=7.5),
    )
    fallback_result = SearchResult(
        path=str(workspace / "src" / "registry.py"),
        score=1.0,
        excerpt="def find_descendant_project_roots(...):",
        content=None,
        metadata={
            "filesystem_fallback": True,
            "backend": "ripgrep-fallback",
            "stale_index_suspected": True,
        },
        start_line=12,
        end_line=12,
    )
    captured: dict[str, object] = {"fallback_calls": 0}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type("FakeRegistryStore", (), {"initialize": lambda self: None, "close": lambda self: None}),
    )
    monkeypatch.setattr(commands, "PathMapper", type("FakePathMapper", (), {}))

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["config"] = config

        def search(self, query, source_path, options=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["options"] = options
            return indexed_result

        def cascade_search(self, *_args, **_kwargs):
            raise AssertionError("fts zero-result queries should not dispatch to cascade_search")

    def fake_fallback(query, source_path, *, limit, config, code_only=False, exclude_extensions=None):
        captured["fallback_calls"] = int(captured["fallback_calls"]) + 1
        captured["fallback_query"] = query
        captured["fallback_path"] = source_path
        captured["fallback_limit"] = limit
        captured["fallback_code_only"] = code_only
        captured["fallback_exclude_extensions"] = exclude_extensions
        return {
            "results": [fallback_result],
            "time_ms": 2.5,
            "fallback": {
                "backend": "ripgrep-fallback",
                "stale_index_suspected": True,
                "reason": "Indexed FTS search returned no results; filesystem fallback used.",
            },
        }

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)
    monkeypatch.setattr(commands, "_filesystem_fallback_search", fake_fallback)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "find_descendant_project_roots", "--method", "fts", "--path", str(workspace), "--json"],
    )

    assert result.exit_code == 0, result.output
    body = json.loads(result.stdout)["result"]
    assert body["method"] == "fts"
    assert body["count"] == 1
    assert body["results"][0]["path"] == str(workspace / "src" / "registry.py")
    assert body["results"][0]["excerpt"] == "def find_descendant_project_roots(...):"
    assert body["stats"]["files_matched"] == 1
    assert body["stats"]["time_ms"] == 10.0
    assert body["fallback"] == {
        "backend": "ripgrep-fallback",
        "stale_index_suspected": True,
        "reason": "Indexed FTS search returned no results; filesystem fallback used.",
    }
    assert captured["fallback_calls"] == 1
    assert captured["fallback_query"] == "find_descendant_project_roots"
    assert captured["fallback_path"] == workspace
    assert captured["fallback_limit"] == 20
    assert captured["options"].enable_vector is False
    assert captured["options"].hybrid_mode is False


def test_search_json_hybrid_zero_results_does_not_use_filesystem_fallback(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    indexed_result = ChainSearchResult(
        query="how does my_function work",
        results=[],
        symbols=[],
        stats=SearchStats(dirs_searched=4, files_matched=0, time_ms=11.0),
    )
    captured: dict[str, object] = {"fallback_calls": 0}

    monkeypatch.setattr(commands.Config, "load", staticmethod(lambda: Config(data_dir=tmp_path / "data")))
    monkeypatch.setattr(
        commands,
        "RegistryStore",
        type("FakeRegistryStore", (), {"initialize": lambda self: None, "close": lambda self: None}),
    )
    monkeypatch.setattr(commands, "PathMapper", type("FakePathMapper", (), {}))

    class FakeChainSearchEngine:
        def __init__(self, registry, mapper, config=None):
            captured["config"] = config

        def search(self, query, source_path, options=None):
            captured["query"] = query
            captured["source_path"] = source_path
            captured["options"] = options
            return indexed_result

        def cascade_search(self, *_args, **_kwargs):
            raise AssertionError("hybrid queries should not dispatch to cascade_search")

    def fake_fallback(*_args, **_kwargs):
        captured["fallback_calls"] = int(captured["fallback_calls"]) + 1
        return None

    monkeypatch.setattr(commands, "ChainSearchEngine", FakeChainSearchEngine)
    monkeypatch.setattr(commands, "_filesystem_fallback_search", fake_fallback)

    runner = CliRunner()
    result = runner.invoke(
        app,
        ["search", "how does my_function work", "--path", str(workspace), "--json"],
    )

    assert result.exit_code == 0, result.output
    body = json.loads(result.stdout)["result"]
    assert body["method"] == "hybrid"
    assert body["count"] == 0
    assert "fallback" not in body
    assert body["stats"]["files_matched"] == 0
    assert body["stats"]["time_ms"] == 11.0
    assert captured["fallback_calls"] == 0
    assert captured["options"].enable_vector is True
    assert captured["options"].hybrid_mode is True


def test_filesystem_fallback_search_prefers_source_definitions_for_keyword_queries(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    workspace.mkdir()

    source_path = workspace / "src" / "registry.py"
    test_path = workspace / "tests" / "test_registry.py"
    ref_path = workspace / "src" / "chain_search.py"

    match_lines = [
        {
            "type": "match",
            "data": {
                "path": {"text": str(test_path)},
                "lines": {"text": "def test_find_descendant_project_roots_returns_nested_project_roots():\n"},
                "line_number": 12,
            },
        },
        {
            "type": "match",
            "data": {
                "path": {"text": str(source_path)},
                "lines": {"text": "def find_descendant_project_roots(self, source_root: Path) -> List[DirMapping]:\n"},
                "line_number": 48,
            },
        },
        {
            "type": "match",
            "data": {
                "path": {"text": str(ref_path)},
                "lines": {"text": "descendant_roots = self.registry.find_descendant_project_roots(source_root)\n"},
                "line_number": 91,
            },
        },
    ]

    monkeypatch.setattr(commands.shutil, "which", lambda _name: "rg")
    monkeypatch.setattr(
        commands.subprocess,
        "run",
        lambda *_args, **_kwargs: type(
            "FakeCompletedProcess",
            (),
            {
                "returncode": 0,
                "stdout": "\n".join(json.dumps(line) for line in match_lines),
                "stderr": "",
            },
        )(),
    )

    fallback = commands._filesystem_fallback_search(
        "find_descendant_project_roots",
        workspace,
        limit=5,
        config=Config(data_dir=tmp_path / "data"),
    )

    assert fallback is not None
    assert fallback["fallback"]["backend"] == "ripgrep-fallback"
    assert fallback["results"][0].path == str(source_path)
    assert fallback["results"][1].path == str(ref_path)
    assert fallback["results"][2].path == str(test_path)
    assert fallback["results"][0].score > fallback["results"][1].score > fallback["results"][2].score


def test_clean_json_reports_partial_success_when_locked_files_remain(
    monkeypatch,
    tmp_path,
) -> None:
    workspace = tmp_path / "workspace"
    project_index = tmp_path / "indexes" / "workspace"
    project_index.mkdir(parents=True)
    (project_index / "_index.db").write_text("db", encoding="utf-8")
    locked_path = project_index / "nested" / "_index.db"
    locked_path.parent.mkdir(parents=True)
    locked_path.write_text("locked", encoding="utf-8")

    captured: dict[str, object] = {}

    class FakePathMapper:
        def __init__(self):
            self.index_root = tmp_path / "indexes"

        def source_to_index_dir(self, source_path):
            captured["mapped_source"] = source_path
            return project_index

    class FakeRegistryStore:
        def initialize(self):
            captured["registry_initialized"] = True

        def unregister_project(self, source_path):
            captured["unregistered_project"] = source_path
            return True

        def close(self):
            captured["registry_closed"] = True

    def fake_remove_tree(target):
        captured["removed_target"] = target
        return {
            "removed": False,
            "partial": True,
            "locked_paths": [str(locked_path)],
            "remaining_path": str(project_index),
            "errors": [],
        }

    monkeypatch.setattr(commands, "PathMapper", FakePathMapper)
    monkeypatch.setattr(commands, "RegistryStore", FakeRegistryStore)
    monkeypatch.setattr(commands, "_remove_tree_best_effort", fake_remove_tree)

    runner = CliRunner()
    result = runner.invoke(app, ["clean", str(workspace), "--json"])

    assert result.exit_code == 0, result.output
    payload = json.loads(result.stdout)
    body = payload["result"]
    assert payload["success"] is True
    assert body["cleaned"] == str(workspace.resolve())
    assert body["index_path"] == str(project_index)
    assert body["partial"] is True
    assert body["locked_paths"] == [str(locked_path)]
    assert body["remaining_path"] == str(project_index)
    assert captured["registry_initialized"] is True
    assert captured["registry_closed"] is True
    assert captured["unregistered_project"] == workspace.resolve()
    assert captured["removed_target"] == project_index
@@ -5,7 +5,10 @@ from pathlib import Path
 from unittest.mock import MagicMock

 from codexlens.config import Config
-from codexlens.storage.index_tree import IndexTreeBuilder
+from codexlens.storage.dir_index import DirIndexStore
+from codexlens.storage.index_tree import DirBuildResult, IndexTreeBuilder
+from codexlens.storage.path_mapper import PathMapper
+from codexlens.storage.registry import RegistryStore


 def _relative_dirs(source_root: Path, dirs_by_depth: dict[int, list[Path]]) -> set[str]:
@@ -145,3 +148,148 @@ def test_builder_loads_saved_ignore_and_extension_filters_by_default(tmp_path: P
     assert [path.name for path in source_files] == ["app.ts"]
     assert "frontend/dist" not in discovered_dirs
 
+
+def test_prune_stale_project_dirs_removes_ignored_artifact_mappings(tmp_path: Path) -> None:
+    workspace = tmp_path / "workspace"
+    src_dir = workspace / "src"
+    dist_dir = workspace / "dist"
+    src_dir.mkdir(parents=True)
+    dist_dir.mkdir(parents=True)
+    (src_dir / "app.py").write_text("print('ok')\n", encoding="utf-8")
+    (dist_dir / "bundle.py").write_text("print('artifact')\n", encoding="utf-8")
+
+    mapper = PathMapper(index_root=tmp_path / "indexes")
+    registry = RegistryStore(db_path=tmp_path / "registry.db")
+    registry.initialize()
+    project = registry.register_project(workspace, mapper.source_to_index_dir(workspace))
+    registry.register_dir(project.id, workspace, mapper.source_to_index_db(workspace), depth=0)
+    registry.register_dir(project.id, src_dir, mapper.source_to_index_db(src_dir), depth=1)
+    registry.register_dir(project.id, dist_dir, mapper.source_to_index_db(dist_dir), depth=1)
+
+    builder = IndexTreeBuilder(
+        registry=registry,
+        mapper=mapper,
+        config=Config(data_dir=tmp_path / "data"),
+        incremental=False,
+    )
+
+    dirs_by_depth = builder._collect_dirs_by_depth(workspace)
+    pruned = builder._prune_stale_project_dirs(
+        project_id=project.id,
+        source_root=workspace,
+        dirs_by_depth=dirs_by_depth,
+    )
+
+    remaining = {mapping.source_path.resolve() for mapping in registry.get_project_dirs(project.id)}
+    registry.close()
+
+    assert dist_dir.resolve() in pruned
+    assert workspace.resolve() in remaining
+    assert src_dir.resolve() in remaining
+    assert dist_dir.resolve() not in remaining
+
+
+def test_force_full_build_prunes_stale_ignored_mappings(tmp_path: Path) -> None:
+    workspace = tmp_path / "workspace"
+    src_dir = workspace / "src"
+    dist_dir = workspace / "dist"
+    src_dir.mkdir(parents=True)
+    dist_dir.mkdir(parents=True)
+    (src_dir / "app.py").write_text("print('ok')\n", encoding="utf-8")
+    (dist_dir / "bundle.py").write_text("print('artifact')\n", encoding="utf-8")
+
+    mapper = PathMapper(index_root=tmp_path / "indexes")
+    registry = RegistryStore(db_path=tmp_path / "registry.db")
+    registry.initialize()
+    project = registry.register_project(workspace, mapper.source_to_index_dir(workspace))
+    registry.register_dir(project.id, workspace, mapper.source_to_index_db(workspace), depth=0)
+    registry.register_dir(project.id, dist_dir, mapper.source_to_index_db(dist_dir), depth=1)
+
+    builder = IndexTreeBuilder(
+        registry=registry,
+        mapper=mapper,
+        config=Config(
+            data_dir=tmp_path / "data",
+            global_symbol_index_enabled=False,
+        ),
+        incremental=False,
+    )
+
+    def fake_build_level_parallel(
+        dirs: list[Path],
+        languages,
+        workers,
+        *,
+        source_root: Path,
+        project_id: int,
+        global_index_db_path: Path,
+    ) -> list[DirBuildResult]:
+        return [
+            DirBuildResult(
+                source_path=dir_path,
+                index_path=mapper.source_to_index_db(dir_path),
+                files_count=1 if dir_path == src_dir else 0,
+                symbols_count=0,
+                subdirs=[],
+            )
+            for dir_path in dirs
+        ]
+
+    builder._build_level_parallel = fake_build_level_parallel  # type: ignore[method-assign]
+    builder._link_children_to_parent = MagicMock()
+
+    build_result = builder.build(workspace, force_full=True, workers=1)
+
+    remaining = {mapping.source_path.resolve() for mapping in registry.get_project_dirs(project.id)}
+    registry.close()
+
+    assert build_result.total_dirs == 2
+    assert workspace.resolve() in remaining
+    assert src_dir.resolve() in remaining
+    assert dist_dir.resolve() not in remaining
+
+
+def test_force_full_build_rewrites_directory_db_and_drops_stale_ignored_subdirs(
+    tmp_path: Path,
+) -> None:
+    project_root = tmp_path / "project"
+    src_dir = project_root / "src"
+    build_dir = project_root / "build"
+    src_dir.mkdir(parents=True)
+    build_dir.mkdir(parents=True)
+    (src_dir / "app.py").write_text("print('ok')\n", encoding="utf-8")
+    (build_dir / "generated.py").write_text("print('artifact')\n", encoding="utf-8")
+
+    mapper = PathMapper(index_root=tmp_path / "indexes")
+    registry = RegistryStore(db_path=tmp_path / "registry.db")
+    registry.initialize()
+    config = Config(
+        data_dir=tmp_path / "data",
+        global_symbol_index_enabled=False,
+    )
+
+    root_index_db = mapper.source_to_index_db(project_root)
+    with DirIndexStore(root_index_db, config=config) as store:
+        store.register_subdir(
+            name="build",
+            index_path=mapper.source_to_index_db(build_dir),
+            files_count=1,
+        )
+
+    builder = IndexTreeBuilder(
+        registry=registry,
+        mapper=mapper,
+        config=config,
+        incremental=False,
+    )
+
+    build_result = builder.build(project_root, force_full=True, workers=1)
+
+    with DirIndexStore(root_index_db, config=config) as store:
+        subdir_names = [link.name for link in store.get_subdirs()]
+
+    registry.close()
+
+    assert build_result.total_dirs == 2
+    assert subdir_names == ["src"]
@@ -24,13 +24,24 @@ from codexlens.entities import SearchResult
 from codexlens.search.ranking import (
     DEFAULT_WEIGHTS,
     QueryIntent,
+    apply_path_penalties,
+    extract_explicit_path_hints,
+    cross_encoder_rerank,
     adjust_weights_by_intent,
     apply_symbol_boost,
     detect_query_intent,
     filter_results_by_category,
     get_rrf_weights,
     group_similar_results,
+    is_auxiliary_reference_path,
+    is_generated_artifact_path,
+    is_test_file,
     normalize_weights,
+    query_prefers_lexical_search,
+    query_targets_auxiliary_files,
+    query_targets_generated_files,
+    query_targets_test_files,
+    rebalance_noisy_results,
     reciprocal_rank_fusion,
     simple_weighted_fusion,
 )
@@ -73,6 +84,7 @@ class TestDetectQueryIntent:
     def test_detect_keyword_intent(self):
         """CamelCase/underscore queries should be detected as KEYWORD."""
         assert detect_query_intent("MyClassName") == QueryIntent.KEYWORD
+        assert detect_query_intent("windowsHide") == QueryIntent.KEYWORD
         assert detect_query_intent("my_function_name") == QueryIntent.KEYWORD
         assert detect_query_intent("foo::bar") == QueryIntent.KEYWORD
 
@@ -91,6 +103,25 @@ class TestDetectQueryIntent:
         assert detect_query_intent("") == QueryIntent.MIXED
         assert detect_query_intent(" ") == QueryIntent.MIXED
 
+    def test_query_targets_test_files(self):
+        """Queries explicitly mentioning tests should skip test penalties."""
+        assert query_targets_test_files("how do tests cover auth flow?")
+        assert query_targets_test_files("spec fixtures for parser")
+        assert not query_targets_test_files("windowsHide")
+
+    def test_query_targets_generated_files(self):
+        """Queries explicitly mentioning build artifacts should skip that penalty."""
+        assert query_targets_generated_files("inspect dist bundle output")
+        assert query_targets_generated_files("generated artifacts under build")
+        assert not query_targets_generated_files("cache invalidation strategy")
+
+    def test_query_prefers_lexical_search(self):
+        """Config/env/factory queries should prefer lexical-first routing."""
+        assert query_prefers_lexical_search("embedding backend fastembed local litellm api config")
+        assert query_prefers_lexical_search("get_reranker factory onnx backend selection")
+        assert query_prefers_lexical_search("EMBEDDING_BACKEND and RERANKER_BACKEND environment variables")
+        assert not query_prefers_lexical_search("how does smart search route keyword queries")
+
 
 # =============================================================================
 # Tests: adjust_weights_by_intent
@@ -129,6 +160,427 @@ class TestAdjustWeightsByIntent:
         assert adjusted["exact"] == pytest.approx(0.3, abs=0.01)
 
 
+class TestPathPenalties:
+    """Tests for lightweight path-based ranking penalties."""
+
+    def test_is_test_file(self):
+        assert is_test_file("/repo/tests/test_auth.py")
+        assert is_test_file("D:\\repo\\src\\auth.spec.ts")
+        assert is_test_file("/repo/frontend/src/pages/discoverypage.test.tsx")
+        assert is_test_file("/repo/frontend/src/pages/discoverypage.spec.jsx")
+        assert not is_test_file("/repo/src/auth.py")
+
+    def test_is_generated_artifact_path(self):
+        assert is_generated_artifact_path("/repo/dist/app.js")
+        assert is_generated_artifact_path("/repo/src/generated/client.ts")
+        assert is_generated_artifact_path("D:\\repo\\frontend\\.next\\server.js")
+        assert not is_generated_artifact_path("/repo/src/auth.py")
+
+    def test_is_auxiliary_reference_path(self):
+        assert is_auxiliary_reference_path("/repo/examples/auth_demo.py")
+        assert is_auxiliary_reference_path("/repo/benchmarks/search_eval.py")
+        assert is_auxiliary_reference_path("/repo/tools/debug_search.py")
+        assert not is_auxiliary_reference_path("/repo/src/auth.py")
+
+    def test_query_targets_auxiliary_files(self):
+        assert query_targets_auxiliary_files("show smart search examples")
+        assert query_targets_auxiliary_files("benchmark smart search")
+        assert not query_targets_auxiliary_files("smart search routing")
+
+    def test_apply_path_penalties_demotes_test_files(self):
+        results = [
+            _make_result(path="/repo/tests/test_auth.py", score=10.0),
+            _make_result(path="/repo/src/auth.py", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "authenticate user",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/auth.py"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+
+    def test_apply_path_penalties_more_aggressively_demotes_tests_for_keyword_queries(self):
+        results = [
+            _make_result(path="/repo/tests/test_auth.py", score=5.0),
+            _make_result(path="/repo/src/auth.py", score=4.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "find_descendant_project_roots",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/auth.py"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+        assert penalized[1].metadata["path_penalty_multiplier"] == pytest.approx(0.55)
+        assert penalized[1].metadata["path_rank_multiplier"] == pytest.approx(0.55)
+
+    def test_apply_path_penalties_more_aggressively_demotes_tests_for_semantic_queries(self):
+        results = [
+            _make_result(path="/repo/tests/test_auth.py", score=5.0),
+            _make_result(path="/repo/src/auth.py", score=4.1),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "how does auth routing work",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/auth.py"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+        assert penalized[1].metadata["path_penalty_multiplier"] == pytest.approx(0.75)
+
+    def test_apply_path_penalties_boosts_source_definitions_for_identifier_queries(self):
+        results = [
+            _make_result(
+                path="/repo/tests/test_registry.py",
+                score=4.2,
+                excerpt='query="find_descendant_project_roots"',
+            ),
+            _make_result(
+                path="/repo/src/registry.py",
+                score=3.0,
+                excerpt="def find_descendant_project_roots(self, source_root: Path) -> list[str]:",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "find_descendant_project_roots",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/registry.py"
+        assert penalized[0].metadata["path_boost_reasons"] == ["source_definition"]
+        assert penalized[0].metadata["path_boost_multiplier"] == pytest.approx(2.0)
+        assert penalized[0].metadata["path_rank_multiplier"] == pytest.approx(2.0)
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+
+    def test_apply_path_penalties_boosts_source_paths_for_semantic_feature_queries(self):
+        results = [
+            _make_result(
+                path="/repo/tests/smart-search-intent.test.js",
+                score=0.832,
+                excerpt="describes how smart search routes keyword queries",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.555,
+                excerpt="smart search keyword routing logic",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "how does smart search route keyword queries",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[0].metadata["path_boost_reasons"] == ["source_path_topic_overlap"]
+        assert penalized[0].metadata["path_boost_multiplier"] == pytest.approx(1.35)
+        assert penalized[0].metadata["path_boost_overlap_tokens"] == ["smart", "search"]
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+
+    def test_apply_path_penalties_strongly_boosts_keyword_basename_overlap(self):
+        results = [
+            _make_result(
+                path="/repo/src/tools/core-memory.ts",
+                score=0.04032417772512223,
+                excerpt="memory listing helpers",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.009836065573770493,
+                excerpt="smart search keyword routing logic",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "executeHybridMode dense_rerank semantic smart_search",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[0].metadata["path_boost_reasons"] == ["source_path_topic_overlap"]
+        assert penalized[0].metadata["path_boost_multiplier"] == pytest.approx(4.5)
+        assert penalized[0].metadata["path_boost_overlap_tokens"] == ["smart", "search"]
+
+    def test_extract_explicit_path_hints_ignores_generic_platform_terms(self):
+        assert extract_explicit_path_hints(
+            "parse CodexLens JSON output strip ANSI smart_search",
+        ) == [["smart", "search"]]
+
+    def test_apply_path_penalties_prefers_explicit_feature_hint_over_platform_terms(self):
+        results = [
+            _make_result(
+                path="/repo/src/tools/codex-lens-lsp.ts",
+                score=0.045,
+                excerpt="CodexLens LSP bridge",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.03,
+                excerpt="parse JSON output and strip ANSI for plain-text fallback",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "parse CodexLens JSON output strip ANSI smart_search",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[0].metadata["path_boost_reasons"] == ["source_path_topic_overlap"]
+        assert penalized[0].metadata["path_boost_overlap_tokens"] == ["smart", "search"]
+
+    def test_apply_path_penalties_strongly_boosts_lexical_config_modules(self):
+        results = [
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=22.07,
+                excerpt="embedding backend local api config routing",
+            ),
+            _make_result(
+                path="/repo/src/codexlens/config.py",
+                score=4.88,
+                excerpt="embedding_backend = 'fastembed'",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "embedding backend fastembed local litellm api config",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/codexlens/config.py"
+        assert penalized[0].metadata["path_boost_reasons"] == ["source_path_topic_overlap"]
+        assert penalized[0].metadata["path_boost_multiplier"] == pytest.approx(5.0)
+        assert penalized[0].metadata["path_boost_overlap_tokens"] == ["config"]
+
+    def test_apply_path_penalties_more_aggressively_demotes_tests_for_explicit_feature_queries(self):
+        results = [
+            _make_result(
+                path="/repo/tests/smart-search-intent.test.js",
+                score=1.0,
+                excerpt="smart search intent coverage",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.58,
+                excerpt="plain-text JSON fallback for smart search",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "parse CodexLens JSON output strip ANSI smart_search",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["test_file"]
+        assert penalized[1].metadata["path_penalty_multiplier"] == pytest.approx(0.55)
+
+    def test_apply_path_penalties_demotes_generated_artifacts(self):
+        results = [
+            _make_result(path="/repo/dist/auth.js", score=10.0),
+            _make_result(path="/repo/src/auth.ts", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "authenticate user",
+            generated_file_penalty=0.35,
+        )
+
+        assert penalized[0].path == "/repo/src/auth.ts"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["generated_artifact"]
+
+    def test_apply_path_penalties_more_aggressively_demotes_generated_artifacts_for_explicit_feature_queries(self):
+        results = [
+            _make_result(
+                path="/repo/dist/tools/smart-search.js",
+                score=1.0,
+                excerpt="built smart search output",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.45,
+                excerpt="plain-text JSON fallback for smart search",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "parse CodexLens JSON output strip ANSI smart_search",
+            generated_file_penalty=0.35,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["generated_artifact"]
+        assert penalized[1].metadata["path_penalty_multiplier"] == pytest.approx(0.4)
+
+    def test_apply_path_penalties_demotes_auxiliary_reference_files(self):
+        results = [
+            _make_result(path="/repo/examples/simple_search_comparison.py", score=10.0),
+            _make_result(path="/repo/src/search/router.py", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "how does smart search route keyword queries",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/search/router.py"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["auxiliary_file"]
+
+    def test_apply_path_penalties_more_aggressively_demotes_auxiliary_files_for_explicit_feature_queries(self):
+        results = [
+            _make_result(
+                path="/repo/benchmarks/smart_search_demo.py",
+                score=1.0,
+                excerpt="demo for smart search fallback",
+            ),
+            _make_result(
+                path="/repo/src/tools/smart-search.ts",
+                score=0.52,
+                excerpt="plain-text JSON fallback for smart search",
+            ),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "parse CodexLens JSON output strip ANSI smart_search",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/src/tools/smart-search.ts"
+        assert penalized[1].metadata["path_penalty_reasons"] == ["auxiliary_file"]
+        assert penalized[1].metadata["path_penalty_multiplier"] == pytest.approx(0.5)
+
+    def test_apply_path_penalties_skips_when_query_targets_tests(self):
+        results = [
+            _make_result(path="/repo/tests/test_auth.py", score=10.0),
+            _make_result(path="/repo/src/auth.py", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "auth tests",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/tests/test_auth.py"
+
+    def test_apply_path_penalties_skips_generated_penalty_when_query_targets_artifacts(self):
+        results = [
+            _make_result(path="/repo/dist/auth.js", score=10.0),
+            _make_result(path="/repo/src/auth.ts", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "dist auth bundle",
+            generated_file_penalty=0.35,
+        )
+
+        assert penalized[0].path == "/repo/dist/auth.js"
+
+    def test_rebalance_noisy_results_pushes_explicit_feature_query_noise_behind_source_files(self):
+        results = [
+            _make_result(path="/repo/src/tools/smart-search.ts", score=0.9),
+            _make_result(path="/repo/tests/smart-search-intent.test.tsx", score=0.8),
+            _make_result(path="/repo/src/core/cli-routes.ts", score=0.7),
+            _make_result(path="/repo/dist/tools/smart-search.js", score=0.6),
+            _make_result(path="/repo/benchmarks/smart_search_demo.py", score=0.5),
+        ]
+
+        rebalanced = rebalance_noisy_results(
+            results,
+            "parse CodexLens JSON output strip ANSI smart_search",
+        )
+
+        assert [item.path for item in rebalanced[:2]] == [
+            "/repo/src/tools/smart-search.ts",
+            "/repo/src/core/cli-routes.ts",
+        ]
+
+    def test_rebalance_noisy_results_preserves_tests_when_query_targets_them(self):
+        results = [
+            _make_result(path="/repo/tests/smart-search-intent.test.tsx", score=0.9),
+            _make_result(path="/repo/src/tools/smart-search.ts", score=0.8),
+        ]
+
+        rebalanced = rebalance_noisy_results(results, "smart search tests")
+
+        assert [item.path for item in rebalanced] == [
+            "/repo/tests/smart-search-intent.test.tsx",
+            "/repo/src/tools/smart-search.ts",
+        ]
+
+    def test_apply_path_penalties_skips_auxiliary_penalty_when_query_targets_examples(self):
+        results = [
+            _make_result(path="/repo/examples/simple_search_comparison.py", score=10.0),
+            _make_result(path="/repo/src/search/router.py", score=9.0),
+        ]
+
+        penalized = apply_path_penalties(
+            results,
+            "smart search examples",
+            test_file_penalty=0.15,
+        )
+
+        assert penalized[0].path == "/repo/examples/simple_search_comparison.py"
+
+
+class TestCrossEncoderRerank:
+    """Tests for cross-encoder reranking edge cases."""
+
+    def test_cross_encoder_rerank_preserves_strong_source_candidates_for_semantic_feature_queries(self):
+        class DummyReranker:
+            def score_pairs(self, pairs, batch_size=32):
+                _ = (pairs, batch_size)
+                return [0.8323705792427063, 1.2463066923373844e-05]
+
+        reranked = cross_encoder_rerank(
+            "how does smart search route keyword queries",
+            [
+                _make_result(
+                    path="/repo/tests/smart-search-intent.test.js",
+                    score=0.5989155769348145,
+                    excerpt="describes how smart search routes keyword queries",
+                ),
+                _make_result(
+                    path="/repo/src/tools/smart-search.ts",
+                    score=0.554444432258606,
+                    excerpt="smart search keyword routing logic",
+                ),
+            ],
+            DummyReranker(),
+            top_k=2,
+        )
+        reranked = apply_path_penalties(
+            reranked,
+            "how does smart search route keyword queries",
+            test_file_penalty=0.15,
+        )
+
+        assert reranked[0].path == "/repo/src/tools/smart-search.ts"
+        assert reranked[0].metadata["cross_encoder_floor_reason"] == "semantic_source_path_overlap"
+        assert reranked[0].metadata["cross_encoder_floor_overlap_tokens"] == ["smart", "search"]
+        assert reranked[0].metadata["path_boost_reasons"] == ["source_path_topic_overlap"]
+        assert reranked[1].metadata["path_penalty_reasons"] == ["test_file"]
+
+
 # =============================================================================
 # Tests: get_rrf_weights
 # =============================================================================
@@ -67,3 +67,60 @@ def test_find_nearest_index(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) ->
     assert found is not None
     assert found.id == mapping.id
+
+
+def test_find_descendant_project_roots_returns_nested_project_roots(tmp_path: Path) -> None:
+    db_path = tmp_path / "registry.db"
+    workspace_root = tmp_path / "workspace"
+    child_a = workspace_root / "packages" / "app-a"
+    child_b = workspace_root / "tools" / "app-b"
+    outside_root = tmp_path / "external"
+
+    with RegistryStore(db_path=db_path) as store:
+        workspace_project = store.register_project(
+            workspace_root,
+            tmp_path / "indexes" / "workspace",
+        )
+        child_a_project = store.register_project(
+            child_a,
+            tmp_path / "indexes" / "workspace" / "packages" / "app-a",
+        )
+        child_b_project = store.register_project(
+            child_b,
+            tmp_path / "indexes" / "workspace" / "tools" / "app-b",
+        )
+        outside_project = store.register_project(
+            outside_root,
+            tmp_path / "indexes" / "external",
+        )
+
+        store.register_dir(
+            workspace_project.id,
+            workspace_root,
+            tmp_path / "indexes" / "workspace" / "_index.db",
+            depth=0,
+        )
+        child_a_mapping = store.register_dir(
+            child_a_project.id,
+            child_a,
+            tmp_path / "indexes" / "workspace" / "packages" / "app-a" / "_index.db",
+            depth=0,
+        )
+        child_b_mapping = store.register_dir(
+            child_b_project.id,
+            child_b,
+            tmp_path / "indexes" / "workspace" / "tools" / "app-b" / "_index.db",
+            depth=0,
+        )
+        store.register_dir(
+            outside_project.id,
+            outside_root,
+            tmp_path / "indexes" / "external" / "_index.db",
+            depth=0,
+        )
+
+        descendants = store.find_descendant_project_roots(workspace_root)
+
+    assert [mapping.index_path for mapping in descendants] == [
+        child_a_mapping.index_path,
+        child_b_mapping.index_path,
+    ]
@@ -313,3 +313,89 @@ def test_onnx_reranker_scores_pairs_with_sigmoid_normalization(
|
|||||||
|
|
||||||
    expected = [1.0 / (1.0 + math.exp(-float(i))) for i in range(len(pairs))]
    assert scores == pytest.approx(expected, rel=1e-6, abs=1e-6)


def test_onnx_reranker_splits_tuple_providers_into_provider_options(
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    import numpy as np

    captured: dict[str, object] = {}

    dummy_onnxruntime = types.ModuleType("onnxruntime")

    dummy_optimum = types.ModuleType("optimum")
    dummy_optimum.__path__ = []
    dummy_optimum_ort = types.ModuleType("optimum.onnxruntime")

    class DummyModelOutput:
        def __init__(self, logits: np.ndarray) -> None:
            self.logits = logits

    class DummyModel:
        input_names = ["input_ids", "attention_mask"]

        def __call__(self, **inputs):
            batch = int(inputs["input_ids"].shape[0])
            return DummyModelOutput(logits=np.zeros((batch, 1), dtype=np.float32))

    class DummyORTModelForSequenceClassification:
        @classmethod
        def from_pretrained(
            cls,
            model_name: str,
            providers=None,
            provider_options=None,
            **kwargs,
        ):
            captured["model_name"] = model_name
            captured["providers"] = providers
            captured["provider_options"] = provider_options
            captured["kwargs"] = kwargs
            return DummyModel()

    dummy_optimum_ort.ORTModelForSequenceClassification = DummyORTModelForSequenceClassification

    dummy_transformers = types.ModuleType("transformers")

    class DummyAutoTokenizer:
        model_max_length = 512

        @classmethod
        def from_pretrained(cls, model_name: str, **kwargs):
            _ = model_name, kwargs
            return cls()

        def __call__(self, *, text, text_pair, return_tensors, **kwargs):
            _ = text_pair, kwargs
            assert return_tensors == "np"
            batch = len(text)
            return {
                "input_ids": np.zeros((batch, 4), dtype=np.int64),
                "attention_mask": np.ones((batch, 4), dtype=np.int64),
            }

    dummy_transformers.AutoTokenizer = DummyAutoTokenizer

    monkeypatch.setitem(sys.modules, "onnxruntime", dummy_onnxruntime)
    monkeypatch.setitem(sys.modules, "optimum", dummy_optimum)
    monkeypatch.setitem(sys.modules, "optimum.onnxruntime", dummy_optimum_ort)
    monkeypatch.setitem(sys.modules, "transformers", dummy_transformers)

    reranker = get_reranker(
        backend="onnx",
        model_name="dummy-model",
        use_gpu=True,
        providers=[
            ("DmlExecutionProvider", {"device_id": 1}),
            "CPUExecutionProvider",
        ],
    )
    assert isinstance(reranker, ONNXReranker)

    scores = reranker.score_pairs([("q", "d")], batch_size=1)

    assert scores == pytest.approx([0.5])
    assert captured["model_name"] == "dummy-model"
    assert captured["providers"] == ["DmlExecutionProvider", "CPUExecutionProvider"]
    assert captured["provider_options"] == [{"device_id": 1}, {}]
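The provider-splitting behaviour this test asserts — a mixed list of plain provider names and `(name, options)` tuples becoming two parallel lists — can be sketched as follows. This is a hypothetical standalone helper illustrating the expected mapping, not the actual codex-lens implementation:

```python
# Hypothetical sketch: split a mixed ONNX Runtime providers list into the
# parallel `providers` / `provider_options` lists that
# ORTModelForSequenceClassification.from_pretrained accepts.
def split_providers(providers):
    names, options = [], []
    for entry in providers:
        if isinstance(entry, tuple):
            # (name, options) pair: keep the name, carry its options dict
            name, opts = entry
            names.append(name)
            options.append(dict(opts))
        else:
            # bare name: no per-provider options
            names.append(entry)
            options.append({})
    return names, options

names, options = split_providers(
    [("DmlExecutionProvider", {"device_id": 1}), "CPUExecutionProvider"]
)
print(names)    # ['DmlExecutionProvider', 'CPUExecutionProvider']
print(options)  # [{'device_id': 1}, {}]
```

The parallel-list shape matters because ONNX Runtime matches `provider_options[i]` to `providers[i]` by position, which is exactly what the `captured` assertions above verify.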
@@ -428,6 +428,51 @@ class TestIndexPathCollection:
        assert len(paths) == 1
        engine.close()

    def test_collect_skips_ignored_artifact_indexes(self, mock_registry, mock_mapper, temp_dir):
        """Test collection skips dist/build-style artifact subtrees."""
        root_dir = temp_dir / "project"
        root_dir.mkdir()

        root_db = root_dir / "_index.db"
        root_store = DirIndexStore(root_db)
        root_store.initialize()

        src_dir = root_dir / "src"
        src_dir.mkdir()
        src_db = src_dir / "_index.db"
        src_store = DirIndexStore(src_db)
        src_store.initialize()

        dist_dir = root_dir / "dist"
        dist_dir.mkdir()
        dist_db = dist_dir / "_index.db"
        dist_store = DirIndexStore(dist_db)
        dist_store.initialize()

        workflow_dir = root_dir / ".workflow"
        workflow_dir.mkdir()
        workflow_db = workflow_dir / "_index.db"
        workflow_store = DirIndexStore(workflow_db)
        workflow_store.initialize()

        root_store.register_subdir(name="src", index_path=src_db)
        root_store.register_subdir(name="dist", index_path=dist_db)
        root_store.register_subdir(name=".workflow", index_path=workflow_db)

        root_store.close()
        src_store.close()
        dist_store.close()
        workflow_store.close()

        engine = ChainSearchEngine(mock_registry, mock_mapper)
        paths = engine._collect_index_paths(root_db, depth=-1)

        assert {path.relative_to(root_dir).as_posix() for path in paths} == {
            "_index.db",
            "src/_index.db",
        }
        engine.close()

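The skip behaviour this test exercises — `dist` and `.workflow` child indexes pruned while `src` survives — can be sketched as a simple filter over registered subdirectories. This is a hypothetical sketch; the ignore set and function shape are assumptions, not the real `_collect_index_paths` signature:

```python
# Hypothetical sketch of the artifact-skip logic exercised by
# test_collect_skips_ignored_artifact_indexes: while gathering child
# _index.db paths, prune build-artifact and dot-prefixed directories.
IGNORED_SUBDIRS = {"dist", "build", "node_modules"}  # assumed ignore set

def collect_index_paths(root_db, subdirs):
    """subdirs: mapping of registered subdir name -> child _index.db path."""
    paths = [root_db]
    for name, index_path in subdirs.items():
        # skip known artifact trees and hidden directories like .workflow
        if name in IGNORED_SUBDIRS or name.startswith("."):
            continue
        paths.append(index_path)
    return paths

paths = collect_index_paths(
    "project/_index.db",
    {
        "src": "project/src/_index.db",
        "dist": "project/dist/_index.db",
        ".workflow": "project/.workflow/_index.db",
    },
)
print(paths)  # ['project/_index.db', 'project/src/_index.db']
```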

class TestResultMergeAndRank:
    """Tests for _merge_and_rank method."""
@@ -490,6 +535,36 @@ class TestResultMergeAndRank:
        assert merged == []
        engine.close()

    def test_merge_applies_test_file_penalty_for_non_test_query(self, mock_registry, mock_mapper):
        """Non-test queries should lightly demote test files during merge."""
        engine = ChainSearchEngine(mock_registry, mock_mapper)

        results = [
            SearchResult(path="/repo/tests/test_auth.py", score=10.0, excerpt="match 1"),
            SearchResult(path="/repo/src/auth.py", score=9.0, excerpt="match 2"),
        ]

        merged = engine._merge_and_rank(results, limit=10, query="authenticate users")

        assert merged[0].path == "/repo/src/auth.py"
        assert merged[1].metadata["path_penalty_reasons"] == ["test_file"]
        engine.close()

    def test_merge_applies_generated_file_penalty_for_non_artifact_query(self, mock_registry, mock_mapper):
        """Non-artifact queries should lightly demote generated/build results during merge."""
        engine = ChainSearchEngine(mock_registry, mock_mapper)

        results = [
            SearchResult(path="/repo/dist/auth.js", score=10.0, excerpt="match 1"),
            SearchResult(path="/repo/src/auth.ts", score=9.0, excerpt="match 2"),
        ]

        merged = engine._merge_and_rank(results, limit=10, query="authenticate users")

        assert merged[0].path == "/repo/src/auth.ts"
        assert merged[1].metadata["path_penalty_reasons"] == ["generated_artifact"]
        engine.close()


# === Hierarchical Chain Search Tests ===
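The penalty behaviour the two merge tests above assert — test files demoted below source hits for non-test queries, generated artifacts demoted for non-artifact queries, with the reason recorded in metadata — can be sketched like this. The marker lists and penalty factors are assumptions chosen to match the tests, not the real implementation:

```python
# Hypothetical sketch of path-based penalties applied during merge-and-rank:
# demote test files when the query is not about tests, and demote
# generated/build outputs when the query is not about artifacts, recording
# each reason so callers can inspect why a result was pushed down.
TEST_MARKERS = ("tests/", "test_", ".test.", ".spec.")  # assumed heuristics
ARTIFACT_MARKERS = ("dist/", "build/", "out/")          # assumed heuristics

def apply_path_penalties(path, score, query,
                         test_penalty=0.15, artifact_penalty=0.35):
    reasons = []
    q = query.lower()
    if "test" not in q and any(m in path for m in TEST_MARKERS):
        score *= 1.0 - test_penalty
        reasons.append("test_file")
    if not any(w in q for w in ("dist", "build", "artifact")) and any(
        m in path for m in ARTIFACT_MARKERS
    ):
        score *= 1.0 - artifact_penalty
        reasons.append("generated_artifact")
    return score, reasons

score, reasons = apply_path_penalties(
    "/repo/tests/test_auth.py", 10.0, "authenticate users"
)
print(score, reasons)  # 8.5 ['test_file']
```

With these numbers a 10.0-scoring test file drops to 8.5, below a 9.0 source hit, which is exactly the reordering the tests check.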
@@ -400,15 +400,20 @@ class TestStage4OptionalRerank:
    """Tests for Stage 4: Optional cross-encoder reranking."""

    def test_stage4_reranks_with_reranker(
        self, mock_registry, mock_mapper, temp_paths
    ):
        """Test _stage4_optional_rerank overfetches before final trim."""
        config = Config(data_dir=temp_paths / "data")
        config.reranker_top_k = 4
        config.reranking_top_k = 4
        engine = ChainSearchEngine(mock_registry, mock_mapper, config=config)

        results = [
            SearchResult(path="a.py", score=0.9, excerpt="a"),
            SearchResult(path="b.py", score=0.8, excerpt="b"),
            SearchResult(path="c.py", score=0.7, excerpt="c"),
            SearchResult(path="d.py", score=0.6, excerpt="d"),
            SearchResult(path="e.py", score=0.5, excerpt="e"),
        ]

        # Mock the _cross_encoder_rerank method that _stage4 calls
@@ -416,12 +421,14 @@ class TestStage4OptionalRerank:
            mock_rerank.return_value = [
                SearchResult(path="c.py", score=0.95, excerpt="c"),
                SearchResult(path="a.py", score=0.85, excerpt="a"),
                SearchResult(path="d.py", score=0.83, excerpt="d"),
                SearchResult(path="e.py", score=0.81, excerpt="e"),
            ]

            reranked = engine._stage4_optional_rerank("query", results, k=2)

            mock_rerank.assert_called_once_with("query", results, 4)
            assert len(reranked) == 4
            # First result should be reranked winner
            assert reranked[0].path == "c.py"
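The overfetch behaviour asserted above — `k=2` requested but the reranker called with a budget of 4, and all 4 returned — can be sketched as follows. This is a hypothetical reduction of the Stage 4 logic under the assumption that the budget is the larger of the caller's `k` and the configured `reranker_top_k`:

```python
# Hypothetical sketch of Stage 4 overfetch: rerank with the larger reranker
# budget instead of the caller's k, so the final trim (and any later path
# penalties) still has extra candidates to reorder.
def stage4_optional_rerank(rerank_fn, query, results, k, reranker_top_k=4):
    budget = max(k, reranker_top_k)        # overfetch beyond the caller's k
    return rerank_fn(query, results, budget)  # trimmed down to k later
```

Trimming to `k` only after reranking means a late penalty pass can still promote a lower-ranked candidate into the final top-k, which is the point the staged-cascade tests below make explicit.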
@@ -633,6 +640,113 @@ class TestStagedCascadeIntegration:
|
|||||||
a_result = next(r for r in result.results if r.path == "a.py")
|
a_result = next(r for r in result.results if r.path == "a.py")
|
||||||
assert a_result.score == 0.9
|
assert a_result.score == 0.9
|
||||||
|
|
||||||
|
def test_staged_cascade_expands_stage3_target_for_rerank_budget(
|
||||||
|
self, mock_registry, mock_mapper, temp_paths
|
||||||
|
):
|
||||||
|
"""Test staged cascade preserves enough Stage 3 reps for rerank budget."""
|
||||||
|
config = Config(data_dir=temp_paths / "data")
|
||||||
|
config.enable_staged_rerank = True
|
||||||
|
config.reranker_top_k = 6
|
||||||
|
config.reranking_top_k = 6
|
||||||
|
|
||||||
|
engine = ChainSearchEngine(mock_registry, mock_mapper, config=config)
|
||||||
|
expanded_results = [
|
||||||
|
SearchResult(path=f"src/file-{index}.ts", score=1.0 - (index * 0.01), excerpt="x")
|
||||||
|
for index in range(8)
|
||||||
|
]
|
||||||
|
|
||||||
|
with patch.object(engine, "_find_start_index") as mock_find:
|
||||||
|
mock_find.return_value = temp_paths / "index" / "_index.db"
|
||||||
|
|
||||||
|
with patch.object(engine, "_collect_index_paths") as mock_collect:
|
||||||
|
mock_collect.return_value = [temp_paths / "index" / "_index.db"]
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage1_binary_search") as mock_stage1:
|
||||||
|
mock_stage1.return_value = (
|
||||||
|
[SearchResult(path="seed.ts", score=0.9, excerpt="seed")],
|
||||||
|
temp_paths / "index",
|
||||||
|
)
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage2_lsp_expand") as mock_stage2:
|
||||||
|
mock_stage2.return_value = expanded_results
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage3_cluster_prune") as mock_stage3:
|
||||||
|
mock_stage3.return_value = expanded_results[:6]
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage4_optional_rerank") as mock_stage4:
|
||||||
|
mock_stage4.return_value = expanded_results[:2]
|
||||||
|
|
||||||
|
engine.staged_cascade_search(
|
||||||
|
"query",
|
||||||
|
temp_paths / "src",
|
||||||
|
k=2,
|
||||||
|
coarse_k=20,
|
||||||
|
)
|
||||||
|
|
||||||
|
mock_stage3.assert_called_once_with(
|
||||||
|
expanded_results,
|
||||||
|
6,
|
||||||
|
query="query",
|
||||||
|
)
|
||||||
|
|
||||||
|
def test_staged_cascade_overfetches_rerank_before_final_trim(
|
||||||
|
self, mock_registry, mock_mapper, temp_paths
|
||||||
|
):
|
||||||
|
"""Test staged rerank keeps enough candidates for path penalties to work."""
|
||||||
|
config = Config(data_dir=temp_paths / "data")
|
||||||
|
config.enable_staged_rerank = True
|
||||||
|
config.reranker_top_k = 4
|
||||||
|
config.reranking_top_k = 4
|
||||||
|
config.test_file_penalty = 0.15
|
||||||
|
config.generated_file_penalty = 0.35
|
||||||
|
|
||||||
|
engine = ChainSearchEngine(mock_registry, mock_mapper, config=config)
|
||||||
|
|
||||||
|
src_primary = str(temp_paths / "src" / "tools" / "smart-search.ts")
|
||||||
|
src_secondary = str(temp_paths / "src" / "tools" / "codex-lens.ts")
|
||||||
|
test_primary = str(temp_paths / "tests" / "integration" / "cli-routes.test.ts")
|
||||||
|
test_secondary = str(
|
||||||
|
temp_paths / "frontend" / "tests" / "e2e" / "prompt-memory.spec.ts"
|
||||||
|
)
|
||||||
|
query = "parse CodexLens JSON output strip ANSI smart_search"
|
||||||
|
clustered_results = [
|
||||||
|
SearchResult(path=test_primary, score=0.98, excerpt="test"),
|
||||||
|
SearchResult(path=test_secondary, score=0.97, excerpt="test"),
|
||||||
|
SearchResult(path=src_primary, score=0.96, excerpt="source"),
|
||||||
|
SearchResult(path=src_secondary, score=0.95, excerpt="source"),
|
||||||
|
]
|
||||||
|
|
||||||
|
with patch.object(engine, "_find_start_index") as mock_find:
|
||||||
|
mock_find.return_value = temp_paths / "index" / "_index.db"
|
||||||
|
|
||||||
|
with patch.object(engine, "_collect_index_paths") as mock_collect:
|
||||||
|
mock_collect.return_value = [temp_paths / "index" / "_index.db"]
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage1_binary_search") as mock_stage1:
|
||||||
|
mock_stage1.return_value = (
|
||||||
|
[SearchResult(path=src_primary, score=0.9, excerpt="seed")],
|
||||||
|
temp_paths / "index",
|
||||||
|
)
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage2_lsp_expand") as mock_stage2:
|
||||||
|
mock_stage2.return_value = clustered_results
|
||||||
|
|
||||||
|
with patch.object(engine, "_stage3_cluster_prune") as mock_stage3:
|
||||||
|
mock_stage3.return_value = clustered_results
|
||||||
|
|
||||||
|
with patch.object(engine, "_cross_encoder_rerank") as mock_rerank:
|
||||||
|
mock_rerank.return_value = clustered_results
|
||||||
|
|
||||||
|
result = engine.staged_cascade_search(
|
||||||
|
query,
|
||||||
|
temp_paths / "src",
|
||||||
|
k=2,
|
||||||
|
coarse_k=20,
|
||||||
|
)
|
||||||
|
|
||||||
|
mock_rerank.assert_called_once_with(query, clustered_results, 4)
|
||||||
|
assert [item.path for item in result.results] == [src_primary, src_secondary]
|
||||||
|
|
||||||
|
|
||||||
# =============================================================================
|
# =============================================================================
|
||||||
# Graceful Degradation Tests
|
# Graceful Degradation Tests
|
||||||
|
|||||||
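The budget plumbing the first staged-cascade test asserts — Stage 3 asked for 6 representatives even though the caller requested `k=2` — can be sketched as a one-line target expansion. A hypothetical reduction, assuming the target is simply the larger of `k` and the reranker budget when staged rerank is enabled:

```python
# Hypothetical sketch: when staged rerank is enabled, expand the Stage 3
# prune target to at least the reranker budget so enough cluster
# representatives survive for Stage 4 to rerank.
def stage3_target(k, reranker_top_k, staged_rerank_enabled):
    if staged_rerank_enabled:
        return max(k, reranker_top_k)
    return k

print(stage3_target(k=2, reranker_top_k=6, staged_rerank_enabled=True))  # 6
```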