[RFC] Contrib test suite + tests for timm and sentence_transformers #1200
Conversation
The documentation is not available anymore as the PR was closed or merged.
Very cool! I left some minor questions, but overall this looks to be going in the right direction! 🔥
@osanseviero thanks for your feedback, it helped a lot!
This way we don't have to handle conflicts between different dependencies/versions. It makes local tests a bit more complex as it requires one env per contrib. Hopefully not many people will really need to run this stuff locally. What I plan to do is to add a script (either separate, or in the makefile) to handle all the stuff with the venvs. WDYT in general?

Refactored the `makefile` targets a bit:

```sh
# Setup all virtualenvs
make contrib_setup

# Run all tests
make contrib_tests

# Setup and run all tests at once
make contrib

# Delete all virtual envs (if corrupted)
make contrib_clear
```

And for a specific lib:

```sh
# Setup timm tests
make contrib_setup_timm

# Run timm tests
make contrib_test_timm

# Setup and run timm tests at once
make contrib_timm

# Delete timm virtualenv
make contrib_clear_timm
```
Looking neat! I want to make a second pass through this PR and also let's see if others have any thoughts
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
```yaml
workflow_dispatch:
push:
  branches:
    - ci_contrib_*
```
should we run this for main as well?
I don't think so. If we want to trigger it manually from `main`, it's possible, but if we run it all the time, the `main` branch could end up in a ❌ status, as I expect contrib tests to fail (code can break in a downstream library without any change on our side).
```python
@contextlib.contextmanager
def production_endpoint() -> Generator:
```
Is this really needed? With `HF_ENDPOINT` you could change the endpoint you're using.
> Is this really needed?

I'd say yes, it's pretty much the same as what already exists in the hfh `tests/`. The problem with `HF_ENDPOINT` is that it is evaluated only once, at startup. What I want here is to make all calls go to the staging environment (especially pushing to repos) except for some calls that have to be made to the production environment (especially loading models).

Another solution could be to upload test models to staging, but then we wouldn't notice if a model changed in production.
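For illustration, a minimal sketch of what such a context manager could look like, assuming the endpoint lives in a patchable module constant — the attribute path and constant below are assumptions, not the actual implementation in this PR:

```python
import contextlib
from typing import Generator
from unittest.mock import patch

PRODUCTION_ENDPOINT = "https://huggingface.co"  # assumed constant


@contextlib.contextmanager
def production_endpoint() -> Generator:
    """Temporarily route Hub calls to production instead of staging."""
    # Patching a single constant only works if no URL was built from it at
    # import time -- which is exactly the limitation of HF_ENDPOINT discussed
    # above. The real helper may need to patch more than one attribute.
    with patch("huggingface_hub.constants.ENDPOINT", PRODUCTION_ENDPOINT):
        yield
```

A test could then wrap only the model-loading calls in `with production_endpoint(): ...` while everything else keeps targeting staging.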
Thanks for the review @osanseviero. I made some changes and addressed all of your comments.
Looks great! I left a few comments
```yaml
contrib: [
  "sentence_transformers",
  "timm",
]
```
Great idea to use a matrix here!
Should this matrix be governed by an `ls` first, so that we get individual jobs for each folder under `contrib` without the need to specify each folder here? (Nitpick though; this could be done, or not, in a follow-up PR.)
`contrib/README.md` (outdated):

```
4. Edit `makefile` to add the lib to `CONTRIB_LIBS` variable. Example: `CONTRIB_LIBS := timm transformers`
5. Edit `.github/workflows/contrib-tests.yml` to add the lib to `matrix.contrib` list. Example: `contrib: ["timm", "transformers"]`
```
I'd eventually look into automating these two so that it's slightly less error-prone, but as said above: nitpick
Now done by `make style` and `make quality`. Good call to reduce contribution effort.
> Contrib tests can be [manually triggered in GitHub](https://github.com/huggingface/huggingface_hub/actions) with the `Contrib tests` workflow.
>
> Tests are not run in the default test suite (for each PR) as this would slow down the development process. The goal is to notice breaking changes, not to avoid them. In particular, it is interesting to trigger it before a release to make sure it will not cause too much friction.
We could also run them once a week just to check
Cron job added in d5949fa. It will run every week, on Saturday at midnight.
`contrib/requirements.txt` (outdated):

```
pytest
pytest-env
```
IMO we could just require the `testing` extra here instead of adding another `requirements.txt` file.
Good call. Made the change.
```python
import pytest
from sentence_transformers import SentenceTransformer


@pytest.mark.xfail(
    reason=(
        "Production endpoint is hardcoded in sentence_transformers when pushing to Hub."
    )
)
def test_push_to_hub(
    multi_qa_model: SentenceTransformer, repo_name: str, cleanup_repo: None
) -> None:
    multi_qa_model.save_to_hub(repo_name)
```
Would be nice to eventually have a test that doesn't fail, to ensure that save-to-hub actually works :) We could have a specific org for that, like skops does, but I understand it's a bit complex to set up + very annoying to have testing artifacts on the actual Hub.
> Would be nice to eventually have a test that doesn't fail to ensure that save to hub actually works :)

Yes, completely agree on that. I'd like to do that later. I am about to open an issue/PR on the sentence_transformers side to test that properly. Worst case scenario, I have set a reminder for myself in 10 days.
```python
import timm


def test_push_to_hub(repo_name: str, cleanup_repo: None) -> None:
    model = timm.create_model("resnet18")
    timm.models.hub.push_to_hf_hub(model, repo_name)
```
Should we also test that the pushed model matches what we expect? For example, that we can redownload it and use it once again, as this would be the usual workflow?
(Discussed offline.) The decision has been made that the purpose of the `contrib/` test suite is only to test deprecation warnings in downstream libraries. Testing the validity of a pushed/downloaded model is therefore out of scope here.

This can be reevaluated in the future :)
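If this is reevaluated later, here is a rough sketch of what such a round-trip test could look like. The test name and assertion are illustrative, not part of this PR; it relies on timm's `hf_hub:` prefix for loading models straight from the Hub:

```python
import timm


def test_push_and_reload(repo_name: str, cleanup_repo: None) -> None:
    """Push a model to the Hub, then reload it and check it is usable."""
    model = timm.create_model("resnet18")
    timm.models.hub.push_to_hf_hub(model, repo_name)

    # timm can load models directly from the Hub with the "hf_hub:" prefix.
    reloaded = timm.create_model(f"hf_hub:{repo_name}", pretrained=True)
    assert type(reloaded) is type(model)
```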
Agree with @LysandreJik's feedback; other than that it LGTM! 🔥 Great work.

My only concern is identifying issues in 3rd-party libraries close to a release rather than earlier in the process, so the more often we can run this (e.g. every week sounds great), the better.
**Codecov Report**

Base: 84.37% // Head: 84.33% // Decreases project coverage by `-0.04%`.

```
@@            Coverage Diff             @@
##             main    #1200      +/-   ##
==========================================
- Coverage   84.37%   84.33%   -0.04%
==========================================
  Files          44       44
  Lines        4365     4355      -10
==========================================
- Hits         3683     3673      -10
  Misses        682      682
```

☔ View full report at Codecov.
@osanseviero @LysandreJik thanks for the last reviews :)
So I think we are now finally good to go 😄 🔥
First discussed in #1190. The goal is to proactively detect breaking changes and deprecation warnings in downstream libraries. This is a very first implementation, with tests for the `timm` and `sentence_transformers` libraries to validate the concept. I think for a start we can have a GitHub workflow that is only triggered manually or on `ci_contrib_*` branches. It doesn't really make sense to run it on each PR or on `main`.

**How it works?**

The `contrib` folder contains simple end-to-end scripts to test the integration of `huggingface_hub` in downstream libraries. The main goal is to proactively notice breaking changes and deprecation warnings. Each library is tested in its own virtualenv (with its own dependencies). Here is the workflow for the `timm` library:

1. Create a virtualenv in `./contrib/timm/.venv`.
2. Install dependencies from `./contrib/timm/requirements.txt`: installs `timm` from the `main` branch. Configurable for each lib.
3. Run `pytest ./contrib/timm`.

See #1200 (comment) for more details.

**How to add a new library?**

To add another contrib lib, one must:

1. Create a folder, e.g. `./contrib/transformers`.
2. Add a `requirements.txt` file specific to this lib. Example: `./contrib/transformers/requirements.txt`.
3. Add tests, e.g. `./contrib/transformers/test_push_to_hub.py`.
4. Run `make style` to edit `makefile` and `.github/workflows/contrib-tests.yml` to add the new lib.

**Run contrib tests in CI**

Contrib tests can be manually triggered in GitHub with the `Contrib tests` workflow. CI is also triggered on branches starting with `ci_contrib_*`.

Tests are not run in the default test suite (for each PR) as this would slow down the development process. The goal is to notice breaking changes, not to avoid them. In particular, it is interesting to trigger it before a release to make sure it will not cause too much friction.

**Run contrib tests locally**

Tests are separated to avoid conflicts between version dependencies. Before running tests, a virtual env must be set up for each contrib library. To do so, run:
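```sh
# Setup all virtualenvs
make contrib_setup

# Or setup a single lib only
make contrib_setup_timm
```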
See #1200 (comment) for more details.
Todo: