Faster, more space-efficient tutorials #1124
Conversation
This is waiting on the 100 image version of EuroSAT, right?
Yep, will start trying to integrate that now.
Force-pushed from 724311f to c026e42
Notebook tests are passing for the first time in almost a year!
I think this PR is mostly complete. I couldn't get caching working, but we can figure that out another day. A few things remaining I'm concerned about:
Depending on whether the notebook was saved locally or on Colab, the indentation changes, meaning every single line is changed (maybe we can find a JSON autoformatter to fix this?). Saving on Colab also adds a ton of extraneous metadata that I don't want. We could remove all outputs, which would fix nbmake and reduce the size of the files, but then you wouldn't be able to see plots without running the tutorials. We could also use nbsphinx to generate the outputs, but this would be slow, require downloads, and need to happen on every commit. We would at least want to make smaller downloads for the last 3 tutorials.
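One low-tech way to address both the indentation churn and the stored outputs is to normalize the notebook JSON before committing. Below is a minimal standard-library sketch; the `normalize_notebook` helper and the choice to clear all per-cell metadata are illustrative, not part of this PR (nbstripout does the same job more robustly):

```python
import json


def normalize_notebook(path: str) -> None:
    """Strip outputs and per-cell metadata, then rewrite with fixed indentation."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # drop stored outputs (plots, logs, etc.)
            cell["execution_count"] = None
        # drop per-cell metadata such as Colab widget/cell ids
        cell.get("metadata", {}).clear()
    with open(path, "w") as f:
        # Jupyter writes .ipynb files with indent=1, so this matches local saves
        json.dump(nb, f, indent=1)
        f.write("\n")
```

Running something like this over the tutorial notebooks before each commit would make diffs stable regardless of whether a notebook was last saved locally or on Colab.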
It would also be nice if we could run isort/flake8/pyupgrade on our notebooks, or even store each notebook in a .py file and auto-convert it to .ipynb on the fly like PyTorch does. But I know less about those options.
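For reference, the .py-to-notebook round trip described above is what tools like jupytext automate. Purely to illustrate the idea (`py_to_ipynb` is a hypothetical helper, not a real API), a percent-format script can be split into notebook code cells like so:

```python
def py_to_ipynb(src: str) -> dict:
    """Toy converter: split a '# %%'-delimited script into notebook code cells."""
    cells: list[list[str]] = []
    chunk: list[str] = []

    def flush() -> None:
        # Only keep chunks that contain non-blank lines
        if any(line.strip() for line in chunk):
            cells.append(chunk.copy())
        chunk.clear()

    for line in src.splitlines():
        if line.startswith("# %%"):  # cell delimiter in percent format
            flush()
        else:
            chunk.append(line)
    flush()

    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": "\n".join(c).strip("\n"),
            }
            for c in cells
        ],
    }
```

Because the .py source is plain Python, the usual linters (isort, flake8, pyupgrade) would work on it unmodified, which is the main appeal of this layout.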
Opened a discussion for this: #1152. I guess I'm fine merging this PR as is, although it would be nice to decide whether we want to include outputs in tutorials before we merge this PR, which adds 10K lines of code.
Not sure why the tests are crashing again. They were working fine on Colab. May have to try stripping outputs again. |
```diff
 env:
   MLHUB_API_KEY: ${{ secrets.MLHUB_API_KEY }}
-run: pytest --nbmake --nbmake-timeout=3000 docs/tutorials --durations=10
+run: pytest --nbmake --durations=10 --reruns=10 docs/tutorials
```
Can you explain this?
`--durations=10` prints the 10 slowest tests (very useful for seeing which tests are worth speeding up). `--reruns=10` reruns failing tests up to 10 times until they pass.
Even with all the changes in this PR, the tests seem to still fail intermittently. I'm hoping that once treebeardtech/nbmake#80 is solved, the error message will be more useful. Until then, rerunning the tests is necessary to ensure that they pass.
* Speed up notebook tests
* Black fix
* Mock rest of variables
* Undo URL changes
* Update conda deps
* Notebooks also plot images
* Fix undefined variable
* Test with serial data loading
* Use tempfile for all data download directories
* Encode timeout in notebook
* Share datasets across processes
* Fix missing import
* Pretrained Weights: use EuroSAT100
* Transforms: use EuroSAT100
* Trainers: use EuroSAT100
* Blacken
* MPLBACKEND is already Agg by default on Linux
* Indices: use EuroSAT100
* Pretrained Weights: add output
* Pretrained Weights: add output
* Trainers: save output
* Pretrained Weights: ResNet 50 -> 18
* Trainers: better graph
* Indices: add missing plot
* Cache downloads
* Small edit
* Revert "Cache downloads" (reverts commit 5276c53)
* Revert "Revert "Cache downloads"" (reverts commit 137c69e)
* env only
* half env
* Variable with no braces
* Set tmpdir elsewhere
* Give up on tmpdir caching
* Trainers: clear output
* lightning.pytorch package import
* nbstripout
* Rerun upon failure
* Re-add caching
* Rerun failures on release branch too
This PR attempts to simplify and speed up our notebook tests, and includes the following changes:
Closes #665
Closes #1074
Before
Numbers from here.
*failed, likely longer if passed
†duplicate copies of each download
After
Numbers from here.
††shared copies across downloads