Refactor datamodule/model testing #329
Conversation
153b60c → a102194 (Compare)
Ah yeah, I saw this too; I had to delete the bounding box that gets put in the `sample` for the Chesapeake trainer to work.
On Mon, Dec 27, 2021 at 8:52 AM, Adam J. Stewart commented on this pull request:
In torchgeo/datasets/utils.py
<#329 (comment)>:
@@ -198,7 +198,7 @@ def download_radiant_mlhub_collection(
     collection.download(output_dir=download_root, api_key=api_key)

-@dataclass(frozen=True)
+@dataclass
pytorch-lightning seems to want to overwrite the values in all objects in
a sample, including the bounding box. I'm not sure why it needs to do this,
but if I don't remove frozen=True, I see the following testing error:
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:737: in fit
self._call_and_handle_interrupt(
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:682: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:772: in _fit_impl
self._run(model, ckpt_path=ckpt_path)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1195: in _run
self._dispatch()
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1274: in _dispatch
self.training_type_plugin.start_training(self)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:202: in start_training
self._results = trainer.run_stage()
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1284: in run_stage
return self._run_train()
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1314: in _run_train
self.fit_loop.run()
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py:234: in advance
self.epoch_loop.run(data_fetcher)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:160: in advance
batch = self.trainer.accelerator.batch_to_device(batch)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:206: in batch_to_device
return model._apply_batch_transfer_handler(batch, device=device, dataloader_idx=dataloader_idx)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py:291: in _apply_batch_transfer_handler
batch = self.transfer_batch_to_device(batch, device, dataloader_idx)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/core/hooks.py:707: in transfer_batch_to_device
return move_data_to_device(batch, device)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py:279: in move_data_to_device
return apply_to_collection(batch, dtype=dtype, function=batch_to)
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py:100: in apply_to_collection
v = apply_to_collection(
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py:114: in apply_to_collection
v = apply_to_collection(
../.spack/.spack-env/view/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py:146: in apply_to_collection
setattr(result, field_name, v)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BoundingBox(minx=-8411085.25243577, maxx=-8410957.25243577, miny=4720865.81310193, maxy=4720993.81310193, mint=0.0, maxt=9.223372036854776e+18), name = 'minx'
value = -8411085.25243577
> ???
E dataclasses.FrozenInstanceError: cannot assign to field 'minx'
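The root cause is easy to reproduce in isolation: `apply_to_collection` ends up calling `setattr` on dataclass fields, which a frozen dataclass forbids. A minimal sketch with a toy `BoundingBox` (not the real torchgeo class):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class BoundingBox:
    minx: float
    maxx: float


box = BoundingBox(minx=0.0, maxx=1.0)
try:
    # pytorch-lightning's apply_to_collection does the equivalent of this
    # when it rebuilds dataclass instances after moving tensors to a device
    setattr(box, "minx", 2.0)
except dataclasses.FrozenInstanceError as err:
    print(err)  # cannot assign to field 'minx'
```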
# The values here are taken from the defaults here https://pytorch-lightning.readthedocs.io/en/1.3.8/common/trainer.html#init
# this probably should be made into a schema, e.g. as shown https://omegaconf.readthedocs.io/en/2.0_branch/structured_config.html#merging-with-other-configs
trainer:  # These are the parameters passed to the pytorch lightning Trainer object
  logger: True
  checkpoint_callback: True
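A minimal sketch of the schema idea using a plain dataclass (the name `TrainerConf` is hypothetical; `OmegaConf.structured(TrainerConf)` could then type-check user configs when merging, per the OmegaConf docs linked above):

```python
from dataclasses import dataclass


@dataclass
class TrainerConf:
    """Hypothetical typed schema mirroring the pl.Trainer defaults above."""

    logger: bool = True
    checkpoint_callback: bool = True
    # OmegaConf.structured(TrainerConf) would validate user-supplied
    # config values against these field names and types on merge.


conf = TrainerConf()
print(conf.logger, conf.checkpoint_callback)
```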
Many of these defaults have been renamed, and the previous values give deprecation warnings. We may want to stop hardcoding these values and instead let pytorch-lightning assign them automatically.
ignore_zeros: False
datamodule:
  naip_root_dir: "tests/data/naip"
  chesapeake_root_dir: "tests/data/chesapeake/BAYWIDE"
- batch_size: 32
+ batch_size: 2
With a batch size of 1 (which I'm using in all other tests) this breaks, and I don't know why.
root_dir: "tests/data/oscd"
batch_size: 1
num_workers: 0
val_split_pct: 0.5
`val_split_pct == 0` breaks the tests and I don't know why.
"ucmerced", | ||
], | ||
) | ||
def test_tasks(task: str, tmp_path: Path) -> None: |
These tests are now in `tests/trainers/*.py` and will be run on every commit instead of just on releases.
batch_size: 64
patches_per_tile: 2
patch_size: 64
batch_size: 2
BYOLTask tests fail with a batch size of 1 and I have no idea why. SemanticSegmentationTask tests work fine with a batch size of 1. Oh, the mysteries of life...
If any preprocessing methods use `.squeeze()`, then they will remove the batch dimension, which will in turn break the forward pass.
We should update these to specify the only dim that should be squeezed, e.g. `.squeeze(dim=1)`.
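A minimal numpy sketch of the failure mode (shapes are illustrative; torch's `Tensor.squeeze(dim=...)` behaves analogously to numpy's `axis` argument):

```python
import numpy as np

# A mask batch of shape (batch, 1, H, W) with batch_size == 1
batch = np.zeros((1, 1, 4, 4))

# squeeze() drops *every* size-1 dim, including the batch dim
print(batch.squeeze().shape)        # (4, 4): batch dimension silently lost
# squeeze(axis=1) drops only the channel dim, keeping the batch dim
print(batch.squeeze(axis=1).shape)  # (1, 4, 4)
```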
Just tried this (both dim=0 and dim=1) and as soon as it fixes one issue, it creates another one. I don't think I have the time to debug this any further, but if anyone wants to submit a follow-up PR to fix this I would be very happy.
@@ -261,7 +261,7 @@ def __init__(
     image_size: the size of the training images
     hidden_layer: the hidden layer in ``model`` to attach the projection
         head to, can be the name of the layer or index of the layer
-    input_channels: number of input channels to the model
+    in_channels: number of input channels to the model
This trainer was pretty out-of-sync with our other trainers and uses different key names. Tried to sync them a bit more.
082a078 → e3811cf (Compare)
batch_size: 2
num_workers: 0
class_set: ${experiment.module.num_classes}
use_prior_labels: True
These datamodule settings (`use_prior_labels: True`) work with BYOLTask but not with SemanticSegmentationTask, and I have no idea why.
Not sure what error you're getting, but BYOLTask doesn't use the masks, so I'm not surprised it passes. The prior labels, I believe, are soft probabilities, so I don't think we've set up the SemanticSegmentationTask loss to handle that.
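A hedged numpy sketch of the distinction (hypothetical values): a standard cross-entropy loss indexes the log-probabilities with hard class labels, while soft prior labels require a weighted sum over classes instead.

```python
import numpy as np

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 2.5]])
# Log-softmax over the class dimension
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Hard labels (class indices): what a standard CE loss expects
hard = np.array([0, 2])
hard_ce = -log_probs[np.arange(len(hard)), hard].mean()

# Soft prior labels (per-class probabilities): need a different reduction,
# a dot product of the label distribution with the log-probabilities
soft = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.2, 0.7]])
soft_ce = -(soft * log_probs).sum(axis=1).mean()

print(hard_ce, soft_ce)
```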
For OSCD, I think a lot of the confusion is that our image is `[2 x C x H x W]` instead of `[2C x H x W]`, unlike all other datasets. We could concatenate the two images instead of stacking to resolve some of these issues. Alternatively, we could create a new …
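A minimal numpy sketch of the two layouts (shapes are illustrative; torch's `stack`/`cat` behave the same way):

```python
import numpy as np

# Two co-registered images of a change-detection pair, each C x H x W
img1 = np.zeros((3, 4, 4))
img2 = np.ones((3, 4, 4))

stacked = np.stack([img1, img2])             # (2, 3, 4, 4): 2 x C x H x W
concatenated = np.concatenate([img1, img2])  # (6, 4, 4):    2C x H x W

print(stacked.shape, concatenated.shape)
```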
This lgtm. My only concern is that adding `data.py` scripts as an additional step for adding datasets may be confusing/complicated for new contributors. I'm not sure if I would consider …
* Refactor RegressionTask testing
* Programmatically determine max value
* Refactor ClassificationTask testing
* Silence warnings
* Refactor SegmentationTask testing
* Fix training mappings
* Fix GeoDataset trainers
* Fix ETCI trainer fake data
* Update OSCD training data
* Get LandCoverAI tests to pass
* Fix OSCD checksum handling
* Fix NAIP-Chesapeake tests
* Fix OSCD tests
* Keep BoundingBox icy
* Fix other datamodules
* Fix chesapeake testing
* Refactor BYOLTask tests
* Style fixes
* Silence pytorch-lightning warnings
* Get coverage for Chesapeake CVPR prior
* Fix trainer tests
Note: The word "model" in this PR refers to pytorch-lightning models, AKA `torchgeo.trainers.*Task` objects. The word "trainer" in this PR refers to `pl.Trainer` objects, not `torchgeo.trainers.*Task` objects. I really wish naming conventions between torchvision and pytorch-lightning were more consistent...

Motivation

Previously, we attempted to unit test all datamodules and models. However, pytorch-lightning isn't designed for unit testing: datamodules/models don't work standalone; they need a `pl.Trainer` class wrapping them. This meant that we needed to monkeypatch large parts of the code to add fake trainers and loggers. In order to properly test #286, we would need to monkeypatch even more features. This isn't sustainable, and it would break if pytorch-lightning ever changed their API.

Implementation

This PR removes almost all tests from `tests/datamodules` and converts `tests/trainers` to use real datamodules and models for integration testing with a `pl.Trainer`. This has a number of advantages. The change also necessitates modifying some of our testing data to increase image sizes. In these cases, I've added a `data.py` script that can be used to generate testing data. We should start encouraging these for contributors adding new datasets. In the future, this could allow us to remove all fake data from the repo and generate it on-the-fly for testing purposes.
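A hypothetical sketch of what such a `data.py` generator could look like. The real scripts depend on each dataset's on-disk format (e.g. GeoTIFFs); this numpy-only version just illustrates the idea of tiny, deterministically seeded fake images, and the function name and file layout are invented for illustration.

```python
"""Hypothetical data.py-style generator for fake test data."""
import os
import tempfile

import numpy as np

SIZE = 32  # keep fake images tiny so the repo stays small


def generate_fake_image(path: str, channels: int = 3,
                        size: int = SIZE, seed: int = 0) -> np.ndarray:
    """Write a small, reproducible random image to ``path``."""
    rng = np.random.default_rng(seed)  # fixed seed -> stable checksums
    img = rng.integers(0, 256, size=(channels, size, size), dtype=np.uint8)
    np.save(path, img)
    return img


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "fake_image.npy")
    print(generate_fake_image(out).shape)  # (3, 32, 32)
```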