Speeding up and reducing the memory footprint of the trainer tests #344
Conversation
This isn't as much of a speed/memory difference as I was expecting. Any thoughts on why things are still quite slow and use a lot of memory?
For classification, we could probably make the stride something large so we do fewer multiplies during convolution. I wonder whether that would make a difference.
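As a rough illustration (not code from the PR): the output grid, and with it the number of multiply-accumulates, shrinks quadratically with the stride.

import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)
for stride in (1, 4):
    conv = nn.Conv2d(3, 8, kernel_size=3, stride=stride, padding=1)
    out = conv(x)
    # MACs per conv layer ~= output elements * in_channels * kH * kW
    macs = out.numel() * 3 * 3 * 3
    print(f"stride={stride}: out={tuple(out.shape)}, ~{macs:,} MACs")
# stride=1 -> (1, 8, 64, 64), ~884,736 MACs
# stride=4 -> (1, 8, 16, 16), ~55,296 MACs (16x fewer)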
For "test_classification.py", if I monkeypatch timm in every method we use ClassificationTask then it drops resource usage a little bit more:
However, even if I replace the models with no-ops (this is kind of hacky, so I haven't committed it), the resource usage doesn't drop any lower. This means the datasets/datamodules are responsible for the remaining time/memory. It's a decent improvement, though: it drops test time (sampling from recent Ubuntu 3.6 runs) from ~2m20s to ~1m35s (a 33% decrease).
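For reference, a sketch of the per-method monkeypatching described above; the test name and the tiny stand-in model are assumptions, not the PR's actual code:

import timm
import torch.nn as nn

def test_classification_trainer(monkeypatch):
    # Replace timm.create_model so ClassificationTask builds a tiny
    # stand-in instead of a real backbone.
    monkeypatch.setattr(
        timm, "create_model", lambda *args, **kwargs: nn.Conv2d(3, 1, 1)
    )
    # ... construct the ClassificationTask and run the trainer as usual ...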
How can I do this once for the whole file?
Force-pushed the branch from c3e640c to fc54edd.
With the shrinking of the OSCD test data, "test_segmentation.py" is now much cheaper to run (a huge improvement).
LGTM. Not sure if we need docstrings for the classes in test_utils.
Review comment on this diff:

@@ -30,6 +32,8 @@ def test_trainer(self, name: str, classname: Type[LightningDataModule]) -> None:
         model_kwargs = conf_dict["module"]
         model = RegressionTask(**model_kwargs)
+
+        model.model = RegressionTestModel()
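For context, a hedged sketch of what a dummy stand-in like RegressionTestModel might look like; the PR's actual definition lives in test_utils and may differ:

import torch
import torch.nn as nn

class RegressionTestModel(nn.Module):
    """Tiny stand-in for a real ResNet18: supports forward/backward
    passes at a small fraction of the cost."""

    def __init__(self) -> None:
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # (B, 3, H, W) -> (B, 3, 1, 1)
        self.fc = nn.Linear(3, 1)            # single regression output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.flatten(self.pool(x), 1))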
I could monkeypatch torchvision.models.resnet18 here to avoid having the ResNet18 object actually created in the constructor, but it seems like that won't save much time/RAM and will need to change when we upgrade RegressionTask.
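Roughly what that would look like (not done in this PR, for the reasons above; assumes RegressionTask resolves torchvision.models.resnet18 at call time, and the test name is hypothetical):

import torch.nn as nn
import torchvision.models

def test_regression_trainer(monkeypatch):
    # Skip constructing a real ResNet18 entirely.
    monkeypatch.setattr(
        torchvision.models, "resnet18", lambda *args, **kwargs: nn.Conv2d(3, 1, 1)
    )
    # ... construct the RegressionTask; its constructor now builds the tiny model ...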
Tested again with the latest commit and latest main:

main$ /usr/bin/time -v pytest
    User time (seconds): 735.98
    System time (seconds): 289.61
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:19.87
    Maximum resident set size (kbytes): 8144368

tests/fix_memory$ /usr/bin/time -v pytest
    User time (seconds): 620.70
    System time (seconds): 210.79
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.44
    Maximum resident set size (kbytes): 7420892

So that's a ~13% (10 s) reduction in wall-clock time and a ~10% (700 MB) reduction in peak RAM. This is definitely an improvement, but it's unclear whether it is enough to avoid all OOM errors on Windows; I've still seen at least one Windows OOM error on main even after the reduction in OSCD pad size.
Collection of comments:
You would probably want to create a fixture in …
I don't think that I want to make a fixture. I want to make it such that when the code in ClassificationTask calls …
Fixtures don't need to have a return value; they can monkeypatch a section of code and then yield, allowing test functions to use the monkeypatched library.
Oh cool, so it would be something like:
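Presumably something along these lines (the fixture name and the tiny stand-in model here are guesses, not the PR's actual code):

import pytest
import timm
import torch.nn as nn

@pytest.fixture
def mocked_timm(monkeypatch):
    # Patch once; every test that requests this fixture sees the tiny model,
    # and the patch is undone automatically after the yield.
    monkeypatch.setattr(
        timm, "create_model", lambda *args, **kwargs: nn.Conv2d(3, 1, 1)
    )
    yield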
Then I can add that as a parameter to any test I want to use monkeypatched timm in?
Yep, plus …
Can you rebase now that dataset-specific trainers (and tests) are gone? That should hopefully show a further reduction in memory usage.
Force-pushed the branch from fc54edd to 9155955.
Now comparing main to tests/fix_memory:
Was going to wait until after 0.2.0, but given that those tests are consistently failing, I think we need this now. Changed the base to releases/v0.2 so that these can be merged into that PR branch.
* 0.2.0 release
* Fix notebooks
* Fix minimal dependency tests
* Fix integration tests
* Fix integration tests
* Try to avoid running GitHub Actions twice on release PRs
* Revert "Try to avoid running GitHub Actions twice on release PRs" (reverts commit a1ac7ab)
* GeoDatasets use intersection, not addition
* Adding stack_samples to benchmarks
* Fix zero division error in SEN12MS tests
* Replaces test models with dummy models (#344)
* lc values must be < num_classes
* updated indices tutorial with latest indices

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Co-authored-by: isaaccorley <22203655+isaaccorley@users.noreply.github.com>
Our "tests/trainers/" unit tests currently use the following resources (based off of
/usr/bin/time --verbose pytest tests/trainers/
on my machine):This PR monkeypatches / replaces the models (ResNet18s and UNets with ResNet18 backbones) in the generic tests with smaller models. This reduces the overall resource usage to:
For "test_classification.py" specifically the before/after is:
to
Similarly, for "test_segmentation.py":
to