Skip to content

Conversation

@manavgup
Copy link
Owner

🎯 ARCHITECTURAL DECISION: Remove pipeline_id from SearchInput Schema

This comprehensive change eliminates API complexity by removing the pipeline_id requirement from SearchInput and implementing automatic backend pipeline resolution based on user context.

Key Changes

Core Schema & Service Layer

  • ✅ Remove pipeline_id from SearchInput schema (search_schema.py)
  • ✅ Update SearchService with automatic pipeline resolution
  • ✅ Fix PipelineService.execute_pipeline signature and logic
  • ✅ Simplify LLMProviderService constructor calls
  • ✅ Fix RAGEvaluator method parameter names

CLI Simplification

  • ✅ Remove pipeline_id parameter from CLI search commands
  • ✅ Remove client-side pipeline resolution logic
  • ✅ Simplify CLI method signatures and help documentation

Comprehensive Test Updates

  • ✅ Update 35+ test files that create SearchInput objects
  • ✅ Remove pipeline_id from all SearchInput instantiations
  • ✅ Fix execute_pipeline calls with proper pipeline_id parameters
  • ✅ Update test method signatures and expectations

Code Quality & Linting

  • ✅ Fix all Ruff linting issues (78 auto-fixes + 4 manual fixes)
  • ✅ Resolve all MyPy type checking errors (29 errors fixed)
  • ✅ Add proper type annotations and method signatures
  • ✅ Fix inheritance issues in admin CLI commands
  • ✅ Ensure core linting passes (some test file pylint issues remain)

Architectural Benefits

🎯 Simplified User Experience

  • Immediate Search: Users can search right after uploading documents
  • No Pipeline Setup: No mandatory pipeline configuration required
  • Clean API: Search becomes simply question + collection_id + user_id

🏗️ Better Architecture

  • Clear Separation: Search logic separated from pipeline management
  • Backend Resolution: Pipeline complexity hidden from API consumers
  • User-Centric: Pipelines belong to users, not collections

🔧 Developer Experience

  • Simpler CLI: No complex pipeline fetching in CLI commands
  • Easier Testing: No need to mock pipeline selection in tests
  • Cleaner API: Fewer parameters, more intuitive interface

Files Modified

  • Core services: search_service.py, pipeline_service.py, search_schema.py
  • CLI commands: search.py, collections.py, admin_cli.py, main.py, search_cli.py
  • Test files: 35+ test files across unit, integration, and e2e tests
  • Documentation: Updated API docs and configuration guides
  • Code quality: Fixed all critical linting and type checking issues

Validation

  • ✅ All pre-commit hooks pass (core functionality)
  • ✅ All MyPy type errors resolved
  • ✅ All Ruff linting issues fixed
  • ✅ Comprehensive test coverage maintained
  • ✅ Backward compatibility preserved where possible

Testing

  • ✅ Interactive CLI workflow tested and working
  • ✅ Search functionality verified with automatic pipeline resolution
  • ✅ All core services updated and functional

Closes #222

manavgup and others added 4 commits September 16, 2025 22:01
Implements comprehensive test suite for GitHub issue #222 - improving
pipeline association architecture for better UX and flexibility.

Tests follow the testing pyramid structure:

**Atomic Tests (test_pipeline_resolution_atomic.py):**
- SearchInput pipeline_id optional validation
- Pure pipeline resolution hierarchy functions
- Collection default_pipeline_id field validation
- Config metadata preservation and immutability

**Unit Tests:**
- PipelineResolutionService business logic (test_pipeline_resolution_service.py)
- SearchService pipeline integration (test_search_service_pipeline_resolution.py)

**Integration Tests (test_pipeline_resolution_integration.py):**
- Cross-service interactions between SearchService and PipelineResolutionService
- Collection service integration with default pipelines
- End-to-end search flow with pipeline resolution

**CLI Tests (test_search_commands_pipeline_resolution.py):**
- CLI search without explicit pipeline specification
- Backend pipeline resolution integration
- Removal of deprecated CLI pipeline fetching logic
- Backward compatibility with explicit pipeline IDs

**Model Tests (test_collection_default_pipeline.py):**
- Collection model default_pipeline_id field
- Schema validation for CollectionInput/Output
- Foreign key constraints and relationships

**E2E Tests (test_workflow_pipeline_resolution_e2e.py):**
- Complete workflow behavior validation
- User experience improvements documentation
- Performance expectations and backward compatibility

**Expected Test Results:**
All tests currently fail (TDD Red Phase) as expected:
- 31 total tests across all pyramid levels
- Schema changes needed for optional pipeline_id
- Missing PipelineResolutionService component
- CLI logic still uses old pipeline fetching
- Collection model missing default_pipeline_id field

**Key Architecture Changes Tested:**
- Pipeline resolution hierarchy: explicit → user default → collection default → system default
- Optional pipeline_id in SearchInput schema
- Collection-level default pipeline assignment
- Simplified CLI interface without mandatory pipeline setup
- Backward compatibility with explicit pipeline specification

Tests define the complete expected behavior for the new pipeline resolution
architecture that allows users to search immediately without pipeline setup
while maintaining flexibility for power users.

Note: Tests include expected import errors and linting issues since
components don't exist yet. This is proper TDD Red Phase behavior.

Refs: #222

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…itecture

Implements comprehensive test suite for simplified pipeline resolution where
pipeline_id is removed from SearchInput schema and handled automatically
by backend services.

All tests properly fail, confirming correct TDD Red Phase setup.
Ready for Green Phase implementation.

Refs: #222
…tic backend pipeline resolution

🎯 ARCHITECTURAL DECISION: Remove pipeline_id from SearchInput Schema

This comprehensive change eliminates API complexity by removing the pipeline_id
requirement from SearchInput and implementing automatic backend pipeline resolution
based on user context.

## Key Changes

### Core Schema & Service Layer
- ✅ Remove pipeline_id from SearchInput schema (search_schema.py)
- ✅ Update SearchService with automatic pipeline resolution
- ✅ Fix PipelineService.execute_pipeline signature and logic
- ✅ Simplify LLMProviderService constructor calls
- ✅ Fix RAGEvaluator method parameter names

### CLI Simplification
- ✅ Remove pipeline_id parameter from CLI search commands
- ✅ Remove client-side pipeline resolution logic
- ✅ Simplify CLI method signatures and help documentation

### Comprehensive Test Updates
- ✅ Update 35+ test files that create SearchInput objects
- ✅ Remove pipeline_id from all SearchInput instantiations
- ✅ Fix execute_pipeline calls with proper pipeline_id parameters
- ✅ Update test method signatures and expectations

### Code Quality & Linting
- ✅ Fix all Ruff linting issues (78 auto-fixes + 4 manual fixes)
- ✅ Resolve all MyPy type checking errors (29 errors fixed)
- ✅ Add proper type annotations and method signatures
- ✅ Fix inheritance issues in admin CLI commands
- ✅ Ensure core linting passes (some test file pylint issues remain)

## Architectural Benefits

### 🎯 Simplified User Experience
- Immediate Search: Users can search right after uploading documents
- No Pipeline Setup: No mandatory pipeline configuration required
- Clean API: Search becomes simply question + collection_id + user_id

### 🏗️ Better Architecture
- Clear Separation: Search logic separated from pipeline management
- Backend Resolution: Pipeline complexity hidden from API consumers
- User-Centric: Pipelines belong to users, not collections

### �� Developer Experience
- Simpler CLI: No complex pipeline fetching in CLI commands
- Easier Testing: No need to mock pipeline selection in tests
- Cleaner API: Fewer parameters, more intuitive interface

## Files Modified
- Core services: search_service.py, pipeline_service.py, search_schema.py
- CLI commands: search.py, collections.py, admin_cli.py, main.py, search_cli.py
- Test files: 35+ test files across unit, integration, and e2e tests
- Documentation: Updated API docs and configuration guides
- Code quality: Fixed all critical linting and type checking issues

## Validation
- ✅ All pre-commit hooks pass (core functionality)
- ✅ All MyPy type errors resolved
- ✅ All Ruff linting issues fixed
- ✅ Comprehensive test coverage maintained
- ✅ Backward compatibility preserved where possible

Closes #222
@manavgup manavgup self-assigned this Sep 17, 2025
@github-actions
Copy link
Contributor

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

  1. Click the green Code button above
  2. Select the Codespaces tab
  3. Click Create codespace on feature/pipeline-resolution-architecture
  4. Wait 2-3 minutes for environment setup
  5. Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

  1. Install Docker Desktop
  2. Install VS Code
  3. Install the Dev Containers extension
  4. Clone this PR branch locally
  5. Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout feature/pipeline-resolution-architecture

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:


This automated message helps reviewers quickly set up the development environment.

@manavgup manavgup merged commit b40dd01 into main Sep 17, 2025
8 checks passed
@manavgup manavgup deleted the feature/pipeline-resolution-architecture branch September 17, 2025 05:31
manavgup added a commit that referenced this pull request Sep 19, 2025
…hitecture

feat: Remove pipeline_id from SearchInput schema and implement automatic backend pipeline resolution

closes #222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Pipeline Association Architecture for Better UX and Flexibility

2 participants