-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Add document upload endpoint and improve error handling #341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- New POST /api/collections/{collection_id}/documents endpoint for uploading documents to existing collections - Returns HTTP 409 Conflict for duplicate collection names instead of 500 - Reuses existing document processing pipeline from _upload_files_and_trigger_processing - Improves error handling with AlreadyExistsError exception mapping
🚀 Development Environment OptionsThis repository supports Dev Containers for a consistent development environment. Option 1: GitHub Codespaces (Recommended)Create a cloud-based development environment:
Option 2: VS Code Dev Containers (Local)Use Dev Containers on your local machine:
Option 3: Traditional Local SetupSet up the development environment manually: # Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout pr-02-backend-upload-error-handling
# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate Available CommandsOnce in your development environment: make help # Show all available commands
make dev-validate # Validate environment setup
make test-atomic # Run atomic tests
make test-unit # Run unit tests
make lint # Run linting Services AvailableWhen running
This automated message helps reviewers quickly set up the development environment. |
Code Review: PR #341SummaryThis PR adds a new endpoint for uploading documents to existing collections and improves error handling for duplicate collection names. Overall solid changes that follow codebase conventions, but several areas need attention. ✅ Strengths
🔴 Critical Issues1. Accessing Private Method from Router (Line 705)Router calls _upload_files_and_trigger_processing (private method). This violates encapsulation. 2. Import Organization (Lines 2-28)Imports reorganized incorrectly - core before fastapi breaks stdlib→third-party→local order. Will fail ruff checks. 3. Missing Test CoverageNo tests for new endpoint. Need unit tests, integration tests, and error case tests.
|
Fix import ordering to pass ruff CI checks. Signed-off-by: Manav Gupta <manavg@gmail.com>
Root cause: Ruff's isort configuration only knew about 'rag_solution' as first-party, causing inconsistent import ordering for 'core', 'auth', and 'vectordbs' modules. Changes: - Updated pyproject.toml to include core, auth, vectordbs in known-first-party - Fixed import order in collection_router.py (third-party before first-party) - Fixed import order in milvus_store.py (third-party before first-party) This ensures consistent import sorting across: 1. Local dev (pyproject.toml) 2. Pre-commit hooks (.pre-commit-config.yaml) 3. CI/CD pipeline (01-lint.yml) Resolves Ruff I001 (isort) linting failures in PR #341
Implements single source of truth for code quality standards to ensure consistency between local development, pre-commit hooks, and CI/CD. ## Backend Changes (Python) - Updated Ruff to v0.14.0 (from v0.1.11 pre-commit, v0.5.7 poetry) - Updated MyPy to v1.15.0 (from v1.13.0) - Updated Pylint and Pydocstyle to latest versions - Configured isort to recognize all first-party packages: * rag_solution, core, auth, vectordbs - Changed linting scope from 'rag_solution/' to ALL backend files: * backend/rag_solution/ * backend/vectordbs/ * backend/core/ * backend/auth/ * backend/tests/ ## Frontend Changes (TypeScript/JavaScript) - Added .eslintrc.json with standardized rules - Configured react-app preset with custom overrides - Ready for Prettier integration ## CI/CD Workflow Updates (.github/workflows/01-lint.yml) - Ruff now checks ALL backend files using pyproject.toml config - MyPy uses --config-file pyproject.toml for consistency - Pylint checks all backend directories with --rcfile - Pydocstyle checks all backend directories with --config ## Pre-commit Hooks Updates - Ruff v0.14.0 with --config pointing to pyproject.toml - MyPy v1.15.0 for consistency with CI/CD - All hooks reference central configuration files ## Makefile Updates - lint-ruff now checks ALL backend files (backend/.) - All linting commands use centralized config files ## Documentation - Created comprehensive code-quality-standards.md - Added visual diagrams showing: * Three-layer enforcement (Local, Pre-commit, CI/CD) * Single source of truth architecture * Linting checklist matrix - Updated mkdocs.yml navigation - Documented version history and command reference ## Benefits ✅ Single source of truth (pyproject.toml, .eslintrc.json) ✅ Consistent linting across all environments ✅ Same Ruff/MyPy/Pylint versions everywhere ✅ All backend Python files now linted ✅ Clear documentation with visual representations ✅ Easy to update: change config once, applies everywhere Resolves inconsistent import sorting and linting behavior
Auto-fixed 47+ import sorting violations and formatting issues across the entire backend using centralized isort configuration. Files updated: - backend/auth/ (oidc, middleware) - backend/core/ (authentication_middleware) - backend/main.py - backend/vectordbs/ (all vector database implementations) - backend/tests/ (all test files) - backend/dev_tests/ (manual test files) Import order now follows configured standard: 1. Standard library 2. Third-party (fastapi, pydantic, sqlalchemy, pymilvus, etc.) 3. First-party (core, auth, rag_solution, vectordbs) Additional fixes: - Fixed unused variable warning in token_tracking test - Applied Ruff format to 2 files Remaining SIM117 warnings (nested with statements) are informational.
## Changes ### CI/CD Workflow (.github/workflows/01-lint.yml) - Combined separate 'Ruff Check' and 'Ruff Format Check' into single step - New step 'Ruff (Lint + Format)' runs both checks sequentially - Benefits: * Simpler workflow (one step instead of two) * Faster execution (shared poetry install) * Clearer GitHub Actions UI ### Formatting Fixes - Applied Ruff formatting to test_token_tracking_e2e_tdd.py - Applied formatting via pre-commit hooks - Resolves CI/CD format check failure ### Documentation Update (docs/development/code-quality-standards.md) - Updated linting checklist matrix to show combined Ruff step - Updated blocking checks list - Maintains accuracy with actual CI/CD implementation ## Before - Step 1: Ruff Check (lint only) - Step 2: Ruff Format Check (format only) - Two separate poetry installations ## After - Single step: Ruff (Lint + Format) - One poetry installation - Both checks run in sequence
- Auto-fix import sorting issues (I001) in test files - Combine nested with statements (SIM117) into single with statements - Format test_token_tracking_e2e_tdd.py to comply with Ruff formatting Changes: - backend/tests/e2e/test_conversation_e2e_tdd.py: Import sorting - backend/tests/e2e/test_seamless_workflow_tdd.py: Import sorting - backend/tests/e2e/test_token_tracking_e2e_tdd.py: Import sorting + formatting - backend/tests/fixtures/auth.py: Import sorting - backend/tests/integration/test_podcast_generation_integration.py: Combined 5 nested with statements All Ruff checks now pass (check + format)
This commit establishes backend/pyproject.toml as the single source of truth for all linting tool versions and configurations across: - Local development (Makefile targets) - Pre-commit hooks (.pre-commit-config.yaml) - CI/CD pipelines (.github/workflows/01-lint.yml) Changes: 1. Tool Versions (pyproject.toml): - Ruff: ^0.14.0 - MyPy: ^1.15.0 - Pylint: ^3.3.8 - Pydocstyle: ^6.3.0 2. Configuration Updates: - Added 'main' to known-first-party for import sorting - All tools now reference pyproject.toml explicitly 3. Makefile Improvements: - lint-ruff: Now checks both lint AND format - lint-mypy: Added --config-file and consistent args - lint-pylint: Uses pyproject.toml rcfile - lint-docstrings: Includes all modules consistently - format-ruff/format-check: Now scans entire backend (.) - format-imports/check-imports: Now scans entire backend (.) 4. Pre-commit Hook Updates: - Added version comments matching pyproject.toml - Added descriptive names to each hook - Unified MyPy args with Makefile and CI/CD - All hooks now explicitly reference backend/pyproject.toml 5. CI/CD Workflow Updates: - Added header comment documenting tool versions - Explicit reference to pyproject.toml as source of truth 6. Documentation: - Created docs/development/unified-linting.md - Comprehensive guide on unified linting approach - Troubleshooting section for common issues - Migration history and references 7. Test File Fixes: - Fixed import sorting in all test files - Applied Ruff formatting consistently Benefits: - Consistency: Same results in local dev, pre-commit, and CI/CD - Maintainability: Single file (pyproject.toml) to update configs - Transparency: Clear documentation of versions and settings - Speed: Ruff handles most checks in milliseconds - Developer Experience: No surprises between environments All linting checks pass: ✅ Ruff check: All checks passed ✅ Ruff format: 347 files already formatted
…_e2e_tdd.py Ruff prefers a different formatting style for this multi-line assert statement. This resolves the CI/CD Ruff format check failure. Changed from: assert create_response.status_code == 200, ( f"message" ) To: assert ( create_response.status_code == 200 ), f"message"
Fixed 4 Ruff linting issues: - RUF059: Prefix unused unpacked variables with underscore - RUF022: Sort __all__ list alphabetically
Fixed 4 Ruff linting issues: - RUF059: Prefix unused unpacked variables with underscore - RUF022: Sort __all__ list alphabetically
Activated backend/.venv to use Ruff 0.14.0 instead of system Ruff 0.5.7. This resolves CI/CD formatting check failures.
Summary
Adds a new endpoint for uploading documents to existing collections and improves error handling for duplicate collection names.
Changes
New Endpoint
Improved Error Handling
Technical Details
The new endpoint:
Testing
Related Issues
Fixes issue where uploading documents to existing collections returned 404 error.
Improves error handling for duplicate collection creation attempts.