# Adapt to pooch and cleanup testing data #29

**Merged**
FYI, this PR requires changes to be made in existing projects before it can be merged.
This was referenced Aug 22, 2024.

**huard** approved these changes on Aug 22, 2024.
**Zeitsperre** added a commit to bird-house/birdhouse-deploy that referenced this pull request on Aug 23, 2024:
## Overview

This PR updates the cloning of the xclim-testdata repo to reflect structural changes.

## Changes

**Non-breaking changes**
- Adjusts the location of the xclim-testdata data folder

## Related Issue / Discussion

- Ouranosinc/xclim-testdata#29
- Ouranosinc/xclim#1889

## CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
**Zeitsperre** added a commit to Ouranosinc/xclim that referenced this pull request on Aug 28, 2024:
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes / features)
  - This PR relies on changes to be merged in Ouranosinc/xclim-testdata#29
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug fixes / features)
- [x] CHANGELOG.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added

### What kind of change does this PR introduce?

* Replaces the logic for file gathering and caching from the in-house developed version to instead use `pooch`.
* In order to fetch testing data, one can now use the following:

```python
from xclim.testing.utils import nimbus

n = nimbus()
# from a fork of xclim-testdata:
nimbus(repo="https://github.com/Me/My_Repo", branch="my_test_branch")

file = n.fetch("some_folder/some_data.nc")
```

* Removes the remote GitHub calls for every file request (which were performed by `_get()`).
* Exports most of the file request and cache handling to `pooch`, while maintaining a relatively unchanged API for users.
* (To be confirmed) Speeds up the delivery of test data to tests by reducing the number of redundant calls to fixtures and relying on a single `pooch` instance to prevent multiple setup stages.

### Does this PR introduce a breaking change?

Absolutely. `get_file` and `open_dataset` no longer fetch remote files from GitHub. Instead, a locally stored `registry.txt` file contains the checksums of all files needed to run the tests, and the appropriate file is returned from a locally held cache. If a file's checksum does not match the expected value, an attempt is made to replace it from the remote storage.
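The registry-backed lookup described above can be sketched with the standard library alone. This is a hypothetical illustration of the idea (a registry of SHA-256 checksums, a local cache, and a re-download only when a file is missing or its checksum does not match), not `pooch`'s or `xclim`'s actual implementation; `fetch_verified`, the plain `dict` registry and the `download` callable are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


def sha256sum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(
    name: str,
    registry: Dict[str, str],
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Hypothetical helper: return a cached file, re-downloading it if its checksum is wrong.

    `registry` maps relative file paths to expected SHA-256 hex digests, and
    `download(name, target)` is an assumed callable that writes the remote copy to `target`.
    """
    if name not in registry:
        raise KeyError(f"{name!r} is not listed in the registry")
    target = cache_dir / name
    expected = registry[name]
    # Only touch the network when the file is missing or its checksum does not match.
    if not target.exists() or sha256sum(target) != expected:
        target.parent.mkdir(parents=True, exist_ok=True)
        download(name, target)
        if sha256sum(target) != expected:
            raise ValueError(f"checksum mismatch for {name!r} after download")
    return target.resolve()
```

With a warm cache, each call costs one local hash computation and no remote request, which is the kind of saving the removal of per-file GitHub calls is aiming for.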
Additionally, the `md5` files that accompanied all testing data files are now obsolete thanks to the use of the registry. The testing data is now versioned according to the `xclim-testdata` version/tag.

All the `prefetch` logic baked into the `pytest` calls has been removed, making the setup code much easier to follow. There is no longer a need to run `$ xclim prefetch_testing_data` unless users are running on Windows (for the very first run of `pytest` only).

There are now three environment variables to help developers:
- `XCLIM_TESTDATA_BRANCH` - Controls the branch name of `xclim-testdata`.
- `XCLIM_TESTDATA_CACHE_DIR` - Controls the local folder to be used when fetching the test data.
- `XCLIM_TESTDATA_REPO_URL` - Controls the repository URL for `xclim-testdata` (for forks).

`platformdirs` is no longer a hard dependency. The default cache directory will only be determined if `pooch` is installed.

### Other information:

There is still a lot of potential here to tighten this up; I'd like to land on a design that is clean and easily portable to other projects.

What is unchanged is that `pytest` will still do the following on every run:
1. Check that a locally stored copy of the test data exists in a platform-dependent default location and, if not found, fetch a copy.
2. Each worker of `pytest` creates its own copy of the test data, delivered by its own `pooch` instance and written to a thread-safe temporary directory.
3. The equivalent of the `get_file()` fixture is now `nimbus.fetch()`, which provides absolute paths to files appropriate to the platform and worker.

Many tests related to testing the file accessors have also been removed (as these are now out of scope).
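A minimal sketch of how the three environment variables could be resolved into a single configuration object. The variable names come from this PR; the fallback values (`DEFAULT_REPO_URL`, `DEFAULT_BRANCH`, `DEFAULT_CACHE_DIR`) and the `XclimTestdataSettings` / `load_config` names are hypothetical placeholders, not `xclim`'s actual defaults or API.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical fallbacks for illustration only; xclim's real defaults are not reproduced here.
DEFAULT_REPO_URL = "https://github.com/Ouranosinc/xclim-testdata"
DEFAULT_BRANCH = "main"
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "xclim-testdata"


@dataclass(frozen=True)
class XclimTestdataSettings:
    repo_url: str
    branch: str
    cache_dir: Path


def load_config(environ: Optional[Mapping[str, str]] = None) -> XclimTestdataSettings:
    """Read the XCLIM_TESTDATA_* variables, falling back to the placeholder defaults."""
    env = os.environ if environ is None else environ
    return XclimTestdataSettings(
        repo_url=env.get("XCLIM_TESTDATA_REPO_URL", DEFAULT_REPO_URL),
        branch=env.get("XCLIM_TESTDATA_BRANCH", DEFAULT_BRANCH),
        cache_dir=Path(env.get("XCLIM_TESTDATA_CACHE_DIR", str(DEFAULT_CACHE_DIR))).expanduser(),
    )
```

Accepting an explicit mapping (for example `load_config({"XCLIM_TESTDATA_BRANCH": "my_test_branch"})`) keeps the lookup testable without mutating `os.environ`.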
`.md5` checksum file generated with `make_check_sums.py`

This will be paired with changes to be made in:

### Changes

- `data` folder.
- `registry.txt` file containing paired filepaths and sha256sums for each file under `data`.
- `README.md` file and updates the examples to show the new API.

### More Information
This new system will leverage the `pooch` library to handle checksum verification and file caching, removing the need for many helper functions currently copied into `xclim` and `clisops` (and `RavenPy`). The `pooch` approach should speed up the fetching of testing data, as checksums will no longer be verified on a file-by-file basis using remote calls to GitHub. Instead, locally cached files will be compared against a `registry.txt` file that is downloaded from this repository when tests are spun up, and new files will be downloaded as needed (wherever a local file's checksum does not match the registry).

The goal here is to reduce the number of lines of code in both `xclim` and `clisops`, and possibly other projects.
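The registry comparison described above can be sketched in two steps: generating `registry.txt` from the `data` folder, then listing which cached files are missing or out of date. This is a standard-library illustration that assumes each registry line has the form `<relative path> <sha256 hex digest>`, as the description suggests; `pooch` ships its own tooling for this (for example `pooch.make_registry`), and the function names below are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_registry(data_dir: Path) -> str:
    """Return registry text with one '<relative/posix/path> <sha256>' line per file, sorted."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        lines.append(f"{rel} {file_sha256(path)}")
    return "\n".join(lines) + "\n"


def parse_registry(text: str) -> Dict[str, str]:
    """Parse registry text into {relative path: sha256}.

    Skips blank lines and '#' comments; assumes file names contain no whitespace.
    """
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, digest = line.split()[:2]
        registry[name] = digest
    return registry


def stale_files(registry: Dict[str, str], cache_dir: Path) -> List[str]:
    """List registry entries whose cached copy is missing or has a different checksum."""
    stale = []
    for name, digest in sorted(registry.items()):
        cached = cache_dir / name
        if not cached.is_file() or file_sha256(cached) != digest:
            stale.append(name)
    return stale
```

Only the names returned by `stale_files` would need to be downloaded; every other file is served from the local cache without contacting GitHub.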