Skip to content

Conversation

manavgup
Copy link
Owner

@manavgup manavgup commented Oct 8, 2025

Summary

Adds a new endpoint for uploading documents to existing collections and improves error handling for duplicate collection names.

Changes

New Endpoint

  • POST /api/collections/{collection_id}/documents: Upload documents to an existing collection
    • Verifies collection exists before processing
    • Reuses existing logic
    • Triggers background document processing pipeline
    • Returns list of uploaded file records

Improved Error Handling

  • HTTP 409 Conflict: Returns proper status code for duplicate collection names (previously 500)
  • AlreadyExistsError: Added exception handling for collection name conflicts
  • Consistent error messages: Better error details for duplicate names

Technical Details

The new endpoint:

  1. Validates that the collection exists (404 if not found)
  2. Extracts user_id from JWT token
  3. Calls shared method
  4. Triggers async background processing for document ingestion
  5. Returns uploaded file metadata

Testing

  • Tested with existing collection upload workflow
  • Verified 409 status code for duplicate collection names
  • Confirmed document processing pipeline triggers correctly
  • Validated error messages for better UX

Related Issues

Fixes issue where uploading documents to existing collections returned 404 error.
Improves error handling for duplicate collection creation attempts.

- New POST /api/collections/{collection_id}/documents endpoint for uploading documents to existing collections
- Returns HTTP 409 Conflict for duplicate collection names instead of 500
- Reuses existing document processing pipeline from _upload_files_and_trigger_processing
- Improves error handling with AlreadyExistsError exception mapping
Copy link
Contributor

github-actions bot commented Oct 8, 2025

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

  1. Click the green Code button above
  2. Select the Codespaces tab
  3. Click Create codespace on pr-02-backend-upload-error-handling
  4. Wait 2-3 minutes for environment setup
  5. Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

  1. Install Docker Desktop
  2. Install VS Code
  3. Install the Dev Containers extension
  4. Clone this PR branch locally
  5. Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout pr-02-backend-upload-error-handling

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:


This automated message helps reviewers quickly set up the development environment.

Copy link

claude bot commented Oct 8, 2025

Code Review: PR #341

Summary

This PR adds a new endpoint for uploading documents to existing collections and improves error handling for duplicate collection names. Overall solid changes that follow codebase conventions, but several areas need attention.

✅ Strengths

  1. Good Code Reuse: _upload_files_and_trigger_processing() effectively eliminates duplication
  2. Proper HTTP Status Codes: 409 Conflict for duplicate names is correct
  3. Improved UX: Frontend shows specific errors for duplicate names
  4. Consistent Architecture: Follows service-based pattern
  5. Double-Submit Protection: submittingRef prevents race conditions

🔴 Critical Issues

1. Accessing Private Method from Router (Line 705)

Router calls _upload_files_and_trigger_processing (private method). This violates encapsulation.
Fix: Make method public or wrap in public method

2. Import Organization (Lines 2-28)

Imports reorganized incorrectly - core before fastapi breaks stdlib→third-party→local order. Will fail ruff checks.

3. Missing Test Coverage

No tests for new endpoint. Need unit tests, integration tests, and error case tests.

⚠️ High Priority

4. Frontend Upload Progress UX

Files marked 100% complete immediately but upload happens on submit - misleading UX

5. Fragile Error Handling

Uses string matching errorMessage.includes('already exists'). Should check status code 409

6. Authorization Check

Doesn't explicitly verify user permission to upload to collection

💡 Medium Priority

  1. Endpoint docs don't explain async background processing
  2. Drag-and-drop handler duplicates upload logic
  3. Verify .gitignore excludes log files

Overall Assessment

  • Code Quality: ⭐⭐⭐⭐ (4/5)
  • Architecture: ⭐⭐⭐⭐ (4/5)
  • Test Coverage: ⭐⭐ (2/5)
  • Security: ⭐⭐⭐⭐ (4/5)
  • Documentation: ⭐⭐⭐⭐⭐ (5/5)

Recommendation: Request changes for issues #1-4, approve after fixes.

Great work adding this feature! Architecture is solid, needs refinement on encapsulation and testing.

Fix import ordering to pass ruff CI checks.

Signed-off-by: Manav Gupta <manavg@gmail.com>
Root cause: Ruff's isort configuration only knew about 'rag_solution' as
first-party, causing inconsistent import ordering for 'core', 'auth', and
'vectordbs' modules.

Changes:
- Updated pyproject.toml to include core, auth, vectordbs in known-first-party
- Fixed import order in collection_router.py (third-party before first-party)
- Fixed import order in milvus_store.py (third-party before first-party)

This ensures consistent import sorting across:
1. Local dev (pyproject.toml)
2. Pre-commit hooks (.pre-commit-config.yaml)
3. CI/CD pipeline (01-lint.yml)

Resolves Ruff I001 (isort) linting failures in PR #341
Implements single source of truth for code quality standards to ensure
consistency between local development, pre-commit hooks, and CI/CD.

## Backend Changes (Python)
- Updated Ruff to v0.14.0 (from v0.1.11 pre-commit, v0.5.7 poetry)
- Updated MyPy to v1.15.0 (from v1.13.0)
- Updated Pylint and Pydocstyle to latest versions
- Configured isort to recognize all first-party packages:
  * rag_solution, core, auth, vectordbs
- Changed linting scope from 'rag_solution/' to ALL backend files:
  * backend/rag_solution/
  * backend/vectordbs/
  * backend/core/
  * backend/auth/
  * backend/tests/

## Frontend Changes (TypeScript/JavaScript)
- Added .eslintrc.json with standardized rules
- Configured react-app preset with custom overrides
- Ready for Prettier integration

## CI/CD Workflow Updates (.github/workflows/01-lint.yml)
- Ruff now checks ALL backend files using pyproject.toml config
- MyPy uses --config-file pyproject.toml for consistency
- Pylint checks all backend directories with --rcfile
- Pydocstyle checks all backend directories with --config

## Pre-commit Hooks Updates
- Ruff v0.14.0 with --config pointing to pyproject.toml
- MyPy v1.15.0 for consistency with CI/CD
- All hooks reference central configuration files

## Makefile Updates
- lint-ruff now checks ALL backend files (backend/.)
- All linting commands use centralized config files

## Documentation
- Created comprehensive code-quality-standards.md
- Added visual diagrams showing:
  * Three-layer enforcement (Local, Pre-commit, CI/CD)
  * Single source of truth architecture
  * Linting checklist matrix
- Updated mkdocs.yml navigation
- Documented version history and command reference

## Benefits
✅ Single source of truth (pyproject.toml, .eslintrc.json)
✅ Consistent linting across all environments
✅ Same Ruff/MyPy/Pylint versions everywhere
✅ All backend Python files now linted
✅ Clear documentation with visual representations
✅ Easy to update: change config once, applies everywhere

Resolves inconsistent import sorting and linting behavior
Auto-fixed 47+ import sorting violations and formatting issues across
the entire backend using centralized isort configuration.

Files updated:
- backend/auth/ (oidc, middleware)
- backend/core/ (authentication_middleware)
- backend/main.py
- backend/vectordbs/ (all vector database implementations)
- backend/tests/ (all test files)
- backend/dev_tests/ (manual test files)

Import order now follows configured standard:
1. Standard library
2. Third-party (fastapi, pydantic, sqlalchemy, pymilvus, etc.)
3. First-party (core, auth, rag_solution, vectordbs)

Additional fixes:
- Fixed unused variable warning in token_tracking test
- Applied Ruff format to 2 files

Remaining SIM117 warnings (nested with statements) are informational.
## Changes

### CI/CD Workflow (.github/workflows/01-lint.yml)
- Combined separate 'Ruff Check' and 'Ruff Format Check' into single step
- New step 'Ruff (Lint + Format)' runs both checks sequentially
- Benefits:
  * Simpler workflow (one step instead of two)
  * Faster execution (shared poetry install)
  * Clearer GitHub Actions UI

### Formatting Fixes
- Applied Ruff formatting to test_token_tracking_e2e_tdd.py
- Applied formatting via pre-commit hooks
- Resolves CI/CD format check failure

### Documentation Update (docs/development/code-quality-standards.md)
- Updated linting checklist matrix to show combined Ruff step
- Updated blocking checks list
- Maintains accuracy with actual CI/CD implementation

## Before
- Step 1: Ruff Check (lint only)
- Step 2: Ruff Format Check (format only)
- Two separate poetry installations

## After
- Single step: Ruff (Lint + Format)
- One poetry installation
- Both checks run in sequence
- Auto-fix import sorting issues (I001) in test files
- Combine nested with statements (SIM117) into single with statements
- Format test_token_tracking_e2e_tdd.py to comply with Ruff formatting

Changes:
- backend/tests/e2e/test_conversation_e2e_tdd.py: Import sorting
- backend/tests/e2e/test_seamless_workflow_tdd.py: Import sorting
- backend/tests/e2e/test_token_tracking_e2e_tdd.py: Import sorting + formatting
- backend/tests/fixtures/auth.py: Import sorting
- backend/tests/integration/test_podcast_generation_integration.py: Combined 5 nested with statements

All Ruff checks now pass (check + format)
This commit establishes backend/pyproject.toml as the single source of truth
for all linting tool versions and configurations across:
- Local development (Makefile targets)
- Pre-commit hooks (.pre-commit-config.yaml)
- CI/CD pipelines (.github/workflows/01-lint.yml)

Changes:

1. Tool Versions (pyproject.toml):
   - Ruff: ^0.14.0
   - MyPy: ^1.15.0
   - Pylint: ^3.3.8
   - Pydocstyle: ^6.3.0

2. Configuration Updates:
   - Added 'main' to known-first-party for import sorting
   - All tools now reference pyproject.toml explicitly

3. Makefile Improvements:
   - lint-ruff: Now checks both lint AND format
   - lint-mypy: Added --config-file and consistent args
   - lint-pylint: Uses pyproject.toml rcfile
   - lint-docstrings: Includes all modules consistently
   - format-ruff/format-check: Now scans entire backend (.)
   - format-imports/check-imports: Now scans entire backend (.)

4. Pre-commit Hook Updates:
   - Added version comments matching pyproject.toml
   - Added descriptive names to each hook
   - Unified MyPy args with Makefile and CI/CD
   - All hooks now explicitly reference backend/pyproject.toml

5. CI/CD Workflow Updates:
   - Added header comment documenting tool versions
   - Explicit reference to pyproject.toml as source of truth

6. Documentation:
   - Created docs/development/unified-linting.md
   - Comprehensive guide on unified linting approach
   - Troubleshooting section for common issues
   - Migration history and references

7. Test File Fixes:
   - Fixed import sorting in all test files
   - Applied Ruff formatting consistently

Benefits:
- Consistency: Same results in local dev, pre-commit, and CI/CD
- Maintainability: Single file (pyproject.toml) to update configs
- Transparency: Clear documentation of versions and settings
- Speed: Ruff handles most checks in milliseconds
- Developer Experience: No surprises between environments

All linting checks pass:
✅ Ruff check: All checks passed
✅ Ruff format: 347 files already formatted
…_e2e_tdd.py

Ruff prefers a different formatting style for this multi-line assert statement.
This resolves the CI/CD Ruff format check failure.

Changed from:
    assert create_response.status_code == 200, (
        f"message"
    )

To:
    assert (
        create_response.status_code == 200
    ), f"message"
Fixed 4 Ruff linting issues:
- RUF059: Prefix unused unpacked variables with underscore
- RUF022: Sort __all__ list alphabetically
Fixed 4 Ruff linting issues:
- RUF059: Prefix unused unpacked variables with underscore
- RUF022: Sort __all__ list alphabetically
Activated backend/.venv to use Ruff 0.14.0 instead of system Ruff 0.5.7.
This resolves CI/CD formatting check failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant