Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pooch #1889

Merged
merged 22 commits into from
Aug 28, 2024
Merged

Use pooch #1889

merged 22 commits into from
Aug 28, 2024

Conversation

Zeitsperre
Copy link
Collaborator

@Zeitsperre Zeitsperre commented Aug 20, 2024

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGELOG.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • Replaces the logic for file gathering and caching from the in-house developed version to instead use pooch.
    • In order to fetch testing data, one can now use the following:
    from xclim.testing.helpers import nimbus
    
    n = nimbus()
    # from a fork of xclim-testdata: nimbus(repo="https://github.com/Me/My_Repo", branch="my_test_branch")
    file = n.fetch("some_folder/some_data.nc")
  • Removes the remote GitHub calls for every file request (which was performed by _get()).
  • Exports most of the file request and cache handling to pooch, while maintaining a relatively unchanged API for users.
  • (To be confirmed) Speeds up the delivery of test data to tests by reducing the amount of redundant calls to fixtures and relying on a single pooch instance of pooch to prevent multiple setup stages.

Does this PR introduce a breaking change?

Absolutely. get_file and open_dataset no longer fetch remote files from GitHub. Instead, a locally-stored registry.txt file contains all the checksums of all files needed to run the tests and returns the appropriate file from a locally-held cache. If the file checksum does not match the expected value, it will attempt to replace it from the remote storage.

Additionally, the md5 files that accompanied all testing data files are now obsolete thanks to the use of the registry. The testing data is now versioned according to the xclim-testdata version/tag.

All the prefetch logic baked into the pytest calls has been removed, making the setup code much easier to follow. There is no longer a need to run $ xclim prefetch_testing_data unless users are running on Windows (for the very first run of pytest only).

There are now three environment variables to help developers:

  • XCLIM_TESTDATA_BRANCH
    • Controls the branch name of xclim-testdata.
  • XCLIM_TESTDATA_CACHE_DIR
    • Controls the local folder to be used when fetching the test data.
  • XCLIM_TESTDATA_REPO_URL
    • Controls the repository URL for xclim-testdata (for forks)

platformdirs is no longer a hard dependency. The default cache directory will only be determined if pooch is installed.

Other information:

There is still a lot of potential here to tighten this up; I'd like to land on a design that is clean and easily portable to other projects.

What is unchanged is that pytest will still do the following on every run:

  1. Check that a locally stored copy of the test data exists in a platform-dependent default location and, if not found, will fetch a copy.
  2. Each worker of pytest creates its own copy of the test data, which is delivered by its own pooch instance, written to a threadsafe temporary directory
  3. The equivalent to the get_file() fixture is now nimbus.fetch(), providing the absolute paths to files, respective of platform and workers.

Many tests related to testing the file accessors have been removed (as these are now out of scope), but I'd like to test the implementation a bit before merging this (for both coverage and peace of mind).

@github-actions github-actions bot added the API Interfacing and User Concerns label Aug 20, 2024
@github-actions github-actions bot added the CI Automation and Contiunous Integration label Aug 20, 2024
Copy link

github-actions bot commented Aug 20, 2024

Note

It appears that this Pull Request modifies the main.yml workflow.

On inspection, the XCLIM_TESTDATA_BRANCH environment variable is set to the most recent tag (v2024.8.23).

No further action is required.

@Zeitsperre Zeitsperre marked this pull request as ready for review August 21, 2024 19:27
@github-actions github-actions bot added information For development/intsructional purposes docs Improvements to documenation labels Aug 22, 2024
.github/workflows/main.yml Outdated Show resolved Hide resolved
Zeitsperre added a commit to bird-house/birdhouse-deploy that referenced this pull request Aug 23, 2024
## Overview

This PR updates the cloning of the xclim-testdata repo to reflect
structural changes.

## Changes

**Non-breaking changes**
- Adjusts the location of the xclim-testdata data folder

## Related Issue / Discussion

- Ouranosinc/xclim-testdata#29
- Ouranosinc/xclim#1889

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.
Note that using ``[skip ci]``, ``[ci skip]`` or ``[no ci]`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
Zeitsperre added a commit that referenced this pull request Aug 23, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

xclim/testing/helpers.py Outdated Show resolved Hide resolved
@coxipi
Copy link
Contributor

coxipi commented Aug 23, 2024

THe only change on user side is that open_dataset will automatically use caching if pooch is installed correct?

I tried reviewing the code but there is a lot of things I don't really understand, my approval would not be much better than rubber stamp

Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Thanks.
A suggestion and yet again a nitpicky comment on the changelog ;).

CHANGELOG.rst Outdated Show resolved Hide resolved
docs/notebooks/analogs.ipynb Outdated Show resolved Hide resolved
@github-actions github-actions bot added the approved Approved for additional tests label Aug 27, 2024
@Zeitsperre Zeitsperre merged commit c3045b1 into main Aug 28, 2024
23 checks passed
@Zeitsperre Zeitsperre deleted the use-pooch branch August 28, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interfacing and User Concerns approved Approved for additional tests CI Automation and Contiunous Integration docs Improvements to documenation information For development/intsructional purposes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants