-
Notifications
You must be signed in to change notification settings - Fork 132
Fix distributed tests #1451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix distributed tests #1451
Conversation
Reviewer's GuideThis PR overhauls the distributed test infrastructure by centralizing job creation, refining environment variable management, improving the distributed worker fixture, adjusting test expectations, skipping incompatible tests in distributed mode, and enhancing the CI workflow with unified dependencies and a venv caching step. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of repeated monkeypatch.delenv("DATACHAIN_JOB_ID") in the unit tests—consider introducing a dedicated fixture to clear that env var automatically and reduce boilerplate.
- The cloud_server_credentials fixture mixes direct os.environ mutations with monkeypatch calls; switching entirely to monkeypatch.setenv/delenv will keep test isolation more consistent.
- The CI workflow’s virtualenv and wheel-building steps are quite verbose—extracting them into a reusable GitHub Action or caching the built venv could simplify the pipeline and speed up builds.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a lot of repeated monkeypatch.delenv("DATACHAIN_JOB_ID") in the unit tests—consider introducing a dedicated fixture to clear that env var automatically and reduce boilerplate.
- The cloud_server_credentials fixture mixes direct os.environ mutations with monkeypatch calls; switching entirely to monkeypatch.setenv/delenv will keep test isolation more consistent.
- The CI workflow’s virtualenv and wheel-building steps are quite verbose—extracting them into a reusable GitHub Action or caching the built venv could simplify the pipeline and speed up builds.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates test infrastructure to handle distributed mode execution and improves test isolation by ensuring clean environment state. The key changes focus on preventing test interference from environment variables and skipping checkpoint tests in distributed mode.
Key changes:
- Added
monkeypatch.delenv("DATACHAIN_JOB_ID", raising=False)to multiple test functions to ensure clean job ID state - Added
@pytest.mark.skipifdecorators to checkpoint tests to skip them in distributed mode - Modified fixtures to properly set up job IDs and create unique worker queues for distributed testing
- Simplified error handling in UDF tests to expect consistent RuntimeError behavior
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_job_management.py | Added monkeypatch parameter and job ID cleanup to ensure test isolation |
| tests/unit/test_catalog_loader.py | Added environment cleanup for distributed mode testing |
| tests/unit/lib/test_checkpoints.py | Added skipif decorators to skip checkpoint tests in distributed mode |
| tests/func/test_checkpoints.py | Added skipif decorator to skip checkpoint test in distributed mode |
| tests/func/test_udf.py | Simplified error handling to expect consistent RuntimeError behavior |
| tests/conftest.py | Refactored fixtures to set up job IDs properly and create unique worker queues; changed cloud_server_credentials scope to session |
| .github/workflows/tests-studio.yml | Consolidated FFmpeg installation and added datachain venv initialization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @pytest.mark.skipif( | ||
| "os.environ.get('DATACHAIN_DISTRIBUTED')", | ||
| reason="Checkpoints test skipped in distributed mode", | ||
| ) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pytest.mark.skipif condition uses a string expression but os is not imported in this file. This will cause a NameError at test collection time. Import os at the top of the file or use 'DATACHAIN_DISTRIBUTED' in os.environ with a proper import.
| @pytest.mark.skipif( | ||
| "os.environ.get('DATACHAIN_DISTRIBUTED')", | ||
| reason="Checkpoints test skipped in distributed mode", | ||
| ) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pytest.mark.skipif condition uses a string expression but os is not imported in this file. This will cause a NameError at test collection time. Import os at the top of the file or use 'DATACHAIN_DISTRIBUTED' in os.environ with a proper import.
| @pytest.mark.skipif( | ||
| "os.environ.get('DATACHAIN_DISTRIBUTED')", | ||
| reason="Checkpoints test skipped in distributed mode", | ||
| ) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pytest.mark.skipif condition uses a string expression but os is not imported in this file. This will cause a NameError at test collection time. Import os at the top of the file or use 'DATACHAIN_DISTRIBUTED' in os.environ with a proper import.
| @pytest.mark.skipif( | ||
| "os.environ.get('DATACHAIN_DISTRIBUTED')", | ||
| reason="Checkpoints test skipped in distributed mode", | ||
| ) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pytest.mark.skipif condition uses a string expression but os is not imported in this file. This will cause a NameError at test collection time. Import os at the top of the file or use 'DATACHAIN_DISTRIBUTED' in os.environ with a proper import.
| @pytest.mark.skipif( | ||
| "os.environ.get('DATACHAIN_DISTRIBUTED')", | ||
| reason="Checkpoints test skipped in distributed mode", | ||
| ) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @pytest.mark.skipif condition uses a string expression but os is not imported in this file. This will cause a NameError at test collection time. Import os at the top of the file or use 'DATACHAIN_DISTRIBUTED' in os.environ with a proper import.
| _metastore.cleanup_for_tests() | ||
| else: | ||
| _metastore = SQLiteMetastore(db_file=tmp_path / "test.db") | ||
| _metastore = SQLiteMetastore(db_file=str(tmp_path / "test.db")) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting Path to string is unnecessary here. The SQLiteMetastore constructor likely accepts Path objects directly. If the change from tmp_path / \"test.db\" to str(tmp_path / \"test.db\") is intentional to fix an issue, it would be helpful to understand why, but this appears inconsistent with other code that uses paths directly.
| _metastore = SQLiteMetastore(db_file=str(tmp_path / "test.db")) | |
| _metastore = SQLiteMetastore(db_file=tmp_path / "test.db") |
| _warehouse.cleanup_for_tests() | ||
| else: | ||
| _warehouse = SQLiteWarehouse(db_file=tmp_path / "test.db") | ||
| _warehouse = SQLiteWarehouse(db_file=str(tmp_path / "test.db")) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting Path to string is unnecessary here. The SQLiteWarehouse constructor likely accepts Path objects directly. If the change from tmp_path / \"test.db\" to str(tmp_path / \"test.db\") is intentional to fix an issue, it would be helpful to understand why, but this appears inconsistent with other code that uses paths directly.
| _warehouse = SQLiteWarehouse(db_file=str(tmp_path / "test.db")) | |
| _warehouse = SQLiteWarehouse(db_file=tmp_path / "test.db") |
| PYTHON_VERSION: ${{ matrix.pyv }} | ||
| DATACHAIN_VENV_DIR: /tmp/local/datachain_venv/python${{ matrix.pyv }} | ||
| run: | | ||
| virtualenv -p "$(which python"${PYTHON_VERSION}")" "${DATACHAIN_VENV_DIR}" |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space in the command substitution syntax. The correct syntax should be \"$(which python\"${PYTHON_VERSION}\")\" or more clearly \"$(which python${PYTHON_VERSION})\". The current syntax python\"${PYTHON_VERSION}\" will not properly interpolate the variable.
| virtualenv -p "$(which python"${PYTHON_VERSION}")" "${DATACHAIN_VENV_DIR}" | |
| virtualenv -p "$(which python${PYTHON_VERSION})" "${DATACHAIN_VENV_DIR}" |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1451 +/- ##
=======================================
Coverage 87.96% 87.96%
=======================================
Files 160 160
Lines 15377 15377
Branches 2224 2224
=======================================
Hits 13527 13527
Misses 1336 1336
Partials 514 514
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Alternative to #1438
See also SaaS PRs.
Summary by Sourcery
Fix distributed tests by centralizing job creation and environment setup, refactoring Celery worker orchestration, skipping incompatible tests in distributed mode, standardizing test fixtures, and updating CI workflow for unified FFmpeg installation and prebuilt datachain_server venv initialization.
Enhancements:
CI:
Tests: