Add comprehensive tests for schema cleanup migration and search comparison

- Implement tests for migration 005 to verify removal of deprecated fields in the database schema. - Ensure that new databases are created with a clean schema. - Validate that keywords are correctly extracted from the normalized file_keywords table. - Test symbol insertion without deprecated fields and subdir operations without direct_files. - Create a detailed search comparison test to evaluate vector search vs hybrid search performance. - Add a script for reindexing projects to extract code relationships and verify GraphAnalyzer functionality. - Include a test script to check TreeSitter parser availability and relationship extraction from sample files.
2026-02-12 02:37:45 +08:00 · 2025-12-16 19:27:05 +08:00
parent 3da0ef2adb
commit df23975a0b
61 changed files with 13114 additions and 366 deletions
--- a/codex-lens/docs/test-quality-enhancements.md
+++ b/codex-lens/docs/test-quality-enhancements.md
@@ -0,0 +1,187 @@
+# Test Quality Enhancements - Implementation Summary
+
+**Date**: 2025-12-16
+**Status**: ✅ Complete - All 4 recommendations implemented and passing
+
+## Overview
+
+Implemented all 4 test quality recommendations from Gemini's comprehensive analysis to enhance test coverage and robustness across the codex-lens test suite.
+
+## Recommendation 1: Verify True Fuzzy Matching ✅
+
+**File**: `tests/test_dual_fts.py`
+**Test Class**: `TestDualFTSPerformance`
+**New Test**: `test_fuzzy_substring_matching`
+
+### Implementation
+- Verifies trigram tokenizer enables partial token matching
+- Tests that searching for "func" matches "function0", "function1", etc.
+- Gracefully skips if trigram tokenizer unavailable
+- Validates BM25 scoring for fuzzy results
+
+### Key Features
+- Runtime detection of trigram support
+- Validates substring matching capability
+- Ensures proper score ordering (negative BM25)
+
+### Test Result
+```bash
+PASSED tests/test_dual_fts.py::TestDualFTSPerformance::test_fuzzy_substring_matching
+```
+
+---
+
+## Recommendation 2: Enable Mocked Vector Search ✅
+
+**File**: `tests/test_hybrid_search_e2e.py`
+**Test Class**: `TestHybridSearchWithVectorMock`
+**New Test**: `test_hybrid_with_vector_enabled`
+
+### Implementation
+- Mocks vector search to return predefined results
+- Tests RRF fusion with exact + fuzzy + vector sources
+- Validates hybrid search handles vector integration correctly
+- Uses `unittest.mock.patch` for clean mocking
+
+### Key Features
+- Mock SearchResult objects with scores
+- Tests enable_vector=True parameter
+- Validates RRF fusion score calculation (positive scores)
+- Gracefully handles missing vector search module
+
+### Test Result
+```bash
+PASSED tests/test_hybrid_search_e2e.py::TestHybridSearchWithVectorMock::test_hybrid_with_vector_enabled
+```
+
+---
+
+## Recommendation 3: Complex Query Parser Stress Tests ✅
+
+**File**: `tests/test_query_parser.py`
+**Test Class**: `TestComplexBooleanQueries`
+**New Tests**: 5 comprehensive tests
+
+### Implementation
+
+#### 1. `test_nested_boolean_and_or`
+- Tests: `(login OR logout) AND user`
+- Validates nested parentheses preservation
+- Ensures boolean operators remain intact
+
+#### 2. `test_mixed_operators_with_expansion`
+- Tests: `UserAuth AND (login OR logout)`
+- Verifies CamelCase expansion doesn't break operators
+- Ensures expansion + boolean logic coexist
+
+#### 3. `test_quoted_phrases_with_boolean`
+- Tests: `"user authentication" AND login`
+- Validates quoted phrase preservation
+- Ensures AND operator survives
+
+#### 4. `test_not_operator_preservation`
+- Tests: `login NOT logout`
+- Confirms NOT operator handling
+- Validates negation logic
+
+#### 5. `test_complex_nested_three_levels`
+- Tests: `((UserAuth OR login) AND session) OR token`
+- Stress tests deep nesting (3 levels)
+- Validates multiple parentheses pairs
+
+### Test Results
+```bash
+PASSED tests/test_query_parser.py::TestComplexBooleanQueries::test_nested_boolean_and_or
+PASSED tests/test_query_parser.py::TestComplexBooleanQueries::test_mixed_operators_with_expansion
+PASSED tests/test_query_parser.py::TestComplexBooleanQueries::test_quoted_phrases_with_boolean
+PASSED tests/test_query_parser.py::TestComplexBooleanQueries::test_not_operator_preservation
+PASSED tests/test_query_parser.py::TestComplexBooleanQueries::test_complex_nested_three_levels
+```
+
+---
+
+## Recommendation 4: Migration Reversibility Tests ✅
+
+**File**: `tests/test_dual_fts.py`
+**Test Class**: `TestMigrationRecovery`
+**New Tests**: 2 migration robustness tests
+
+### Implementation
+
+#### 1. `test_migration_preserves_data_on_failure`
+- Creates v2 database with test data
+- Attempts migration (may succeed or fail)
+- Validates data preservation in both scenarios
+- Smart column detection (path vs full_path)
+
+**Key Features**:
+- Checks schema version to determine column names
+- Handles both migration success and failure
+- Ensures no data loss
+
+#### 2. `test_migration_idempotent_after_partial_failure`
+- Tests retry capability after partial migration
+- Validates graceful handling of repeated initialization
+- Ensures database remains in usable state
+
+**Key Features**:
+- Double initialization without errors
+- Table existence verification
+- Safe retry mechanism
+
+### Test Results
+```bash
+PASSED tests/test_dual_fts.py::TestMigrationRecovery::test_migration_preserves_data_on_failure
+PASSED tests/test_dual_fts.py::TestMigrationRecovery::test_migration_idempotent_after_partial_failure
+```
+
+---
+
+## Test Suite Statistics
+
+### Overall Results
+```
+91 passed, 2 skipped, 2 warnings in 3.31s
+```
+
+### New Tests Added
+- **Recommendation 1**: 1 test (fuzzy substring matching)
+- **Recommendation 2**: 1 test (vector mock integration)
+- **Recommendation 3**: 5 tests (complex boolean queries)
+- **Recommendation 4**: 2 tests (migration recovery)
+
+**Total New Tests**: 9
+
+### Coverage Improvements
+- **Fuzzy Search**: Now validates actual trigram substring matching
+- **Hybrid Search**: Tests vector integration with mocks
+- **Query Parser**: Handles complex nested boolean logic
+- **Migration**: Validates data preservation and retry capability
+
+---
+
+## Code Quality
+
+### Best Practices Applied
+1. **Graceful Degradation**: Tests skip when features unavailable (trigram)
+2. **Clean Mocking**: Uses `unittest.mock` for vector search
+3. **Smart Assertions**: Adapts to migration outcomes dynamically
+4. **Edge Case Handling**: Tests multiple nesting levels and operators
+
+### Integration
+- All tests integrate seamlessly with existing pytest fixtures
+- Maintains 100% pass rate across test suite
+- No breaking changes to existing tests
+
+---
+
+## Validation
+
+All 4 recommendations successfully implemented and verified:
+
+✅ **Recommendation 1**: Fuzzy substring matching with trigram validation  
+✅ **Recommendation 2**: Vector search mocking for hybrid fusion testing  
+✅ **Recommendation 3**: Complex boolean query stress tests (5 tests)  
+✅ **Recommendation 4**: Migration recovery and idempotency tests (2 tests)  
+
+**Final Status**: Production-ready, all tests passing