Trainers: split tasks into separate files, add SemanticSegmentationTask #224
Conversation
I don't understand why tasks should be organized by problem type -- by dataset seems more appropriate to me as there could be several different tasks for a given dataset. |
The way I'm envisioning things is that there are many Datasets/DataModules but only a few tasks. That's the whole reason we consolidated everything to a single ClassificationTask and RegressionTask. The plan is to do the same for SemanticSegmentationTask. Basically, all of these tasks are almost identical, and half of our code base is the same code repeated multiple times. Because of that repetition, it's really hard to add a new model or loss function without adding it in several places. Conversely, if you fix a bug, you have to fix it in multiple places, making it easy to miss a spot.
Can you give an example of what you mean by this? For COWC there is a classification version and a regression version, so you just choose the appropriate task and use it. I'm not sure why you would need multiple different kinds of RegressionTasks. |
Right now we have things like |
Going to add a |
I understand the motivation for factoring out the general logic into these super classes, I'm asking why reorganize them into different files by task type? Do you imagine that these will be the only trainers we have?
Any cases where you need to subclass to override the train_step in different ways for a given dataset/datamodule. An easy example is different training setups with the ChesapeakeCVPR dataset where you incorporate different layers in the loss function. You may want to train in a vanilla way with only the high-resolution labels, but you may also want to additionally use the NLCD labels in a different training run. Another example is the change detection datasets/tasks, where a generic loss won't really make sense. |
Yes, more or less. Obviously we will add additional trainers for things like InstanceSegmentationTask, ObjectDetectionTask, etc. But I think that these trainers cover the vast majority of use cases.
Anything that is dataset-specific should go in the DataModule. I think that this is generally possible if we structure things intelligently, but I could be very wrong.
Could this be handled by adding a
Can you elaborate on this? We can always add additional loss functions to the task. I (possibly naively) think of change detection as simply binary semantic segmentation. |
Maybe even a better example: different training augmentations will be appropriate for different datasets -- especially cropping. We went ahead and just removed most augmentations from the trainers for simplicity; however, proper augmentation is a crucial part of training. It would be cumbersome to define these via config, and in Kornia these should be defined in the tasks, not the datamodules, to take advantage of GPU acceleration. I imagine that we will, at the least, need lightweight classes per task to define this. |
Is that the only reason we can't put data augmentations in a DataModule? I feel like we can work around that if we need to; we can always submit a PR to PyTorch Lightning. |
You can definitely add a layers arg to the ChesapeakeCVPRDataModule, and then it will return |
Number of image channels is controlled by config. If number of mask channels is not 1, we would add a MultiLabelSemanticSegmentationTask like we did with classification. Then number of mask channels will also be controlled by config. |
This is an issue of where the computation is run: you can run Kornia augmentation code wherever you want, but DataModules don't get put on the GPU (or even know about the GPU, AFAIK).
It isn't a matter of "number of masks" or "number of input channels", but how you might want to use them in training the model. In ChesapeakeCVPR we assume you have high-resolution labels in some places and low-resolution labels everywhere. You can train the semantic segmentation task using a loss function that depends on both the high-resolution and low-resolution labels. Put differently, there are (at least) two valid ways of training a semantic segmentation model on ChesapeakeCVPR:
|
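To make the distinction concrete, here is a rough, hypothetical sketch of the kind of mixed-supervision loss described above: it uses the high-resolution labels wherever they exist and falls back to the coarse NLCD-derived labels elsewhere. The function name, the `hr_valid` mask, and the `alpha` weighting are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn.functional as F


def mixed_supervision_loss(
    pred: torch.Tensor,       # (N, C, H, W) logits
    hr_labels: torch.Tensor,  # (N, H, W) high-resolution land cover labels
    lr_labels: torch.Tensor,  # (N, H, W) coarse labels, e.g. derived from NLCD
    hr_valid: torch.Tensor,   # (N, H, W) bool, True where high-res labels exist
    alpha: float = 0.5,
) -> torch.Tensor:
    """Hypothetical loss mixing high- and low-resolution supervision."""
    # Per-pixel cross entropy against each label source.
    per_pixel_hr = F.cross_entropy(pred, hr_labels, reduction="none")
    per_pixel_lr = F.cross_entropy(pred, lr_labels, reduction="none")
    # Supervise with high-res labels where available, coarse labels elsewhere.
    hr_loss = per_pixel_hr[hr_valid].mean()
    lr_loss = per_pixel_lr[~hr_valid].mean()
    return alpha * hr_loss + (1 - alpha) * lr_loss
```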
For the Kornia thing: does the Task have any knowledge of the DataModule? I wonder if we can add an if-statement like

For the Chesapeake stuff: you've definitely convinced me that there are situations that can't be handled by

Basically, I understand your point that not everything can be done with these simple tasks, but if 99% of the work can be done with these, I think that's good enough. Everything else can be done ad hoc by the user or in |
See https://colab.research.google.com/github/kornia/tutorials/blob/master/source/data_augmentation_kornia_lightning_gpu.ipynb or https://github.com/microsoft/torchgeo/blob/trainers/refactor/torchgeo/trainers/segmentation.py#L331 for an example of how Kornia + PyTorch Lightning works.

Yeah, I largely agree with that (that we shouldn't actually implement the trainer I described above in torchgeo proper), but this is about reorganizing the task classes. Trying to put everything into {classification, regression, semanticsegmentation, ...} categories is limiting because, fundamentally, training is strongly coupled to datasets. ImageNet training looks very different than MNIST training. Are there arguments for this reorg other than "TorchGeo shouldn't have many tasks"? |
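For readers following along, here is a minimal sketch of the task-side pattern being discussed: Kornia augmentation modules defined on the LightningModule run on whatever device the batch is on, so they get GPU acceleration during training. This is not the code from the linked segmentation.py; the photometric transforms and stand-in model are arbitrary choices for illustration.

```python
import kornia.augmentation as K
import pytorch_lightning as pl
import torch
import torch.nn.functional as F


class AugmentedSegmentationTask(pl.LightningModule):
    """Minimal sketch: GPU-side Kornia augmentations defined on the task."""

    def __init__(self) -> None:
        super().__init__()
        # The augmentations are nn.Modules attached to the task, so they are
        # applied to the batch after it has been moved to the device.
        self.train_aug = torch.nn.Sequential(
            K.RandomGaussianNoise(std=0.05, p=0.5),
            K.ColorJitter(0.1, 0.1, 0.1, 0.1, p=0.5),
        )
        self.model = torch.nn.Conv2d(3, 2, kernel_size=1)  # stand-in model

    def training_step(self, batch, batch_idx):
        x, y = batch["image"], batch["mask"]
        # Photometric transforms only touch the image; geometric transforms
        # would also need to be applied to the mask (e.g. via Kornia's
        # AugmentationSequential).
        x = self.train_aug(x)
        y_hat = self.model(x)
        return F.cross_entropy(y_hat, y)
```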
The proof is in the pudding. This PR adds several new features:
In the process of adding these new features, I:
If I can add all of these features while reducing the total lines of code, the refactor was a success in my book. The number of tasks was never the issue, it was the fact that so much of our code was duplicated, and that:
This simply isn't maintainable, and leads to tasks that have different features and different bugs. |
I think you're missing my question: I'm totally fine with the refactoring part (that part is great); I don't understand why we are moving/reorganizing the dataset-specific tasks into the generic files. |
Because the dataset-specific tasks are deprecated and will be removed in the near future. The only difference between these dataset-specific tasks is how samples are plotted. This logic should be moved to the respective Dataset so that both trainer- and non-trainer-based workflows can benefit from it. Once that's done, these dataset-specific tasks will be removed. If you want I can keep those tasks in separate files until they get removed, but it will increase the lines of code a bit just because of duplicate imports. I don't have a strong preference about this since they will be gone in a couple months. |
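As a rough illustration of the idea (a toy dataset, not an actual torchgeo class), putting plotting on the dataset means both trainer-based and notebook workflows can call the same `plot` method:

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset


class ExampleSegmentationDataset(Dataset):
    """Toy sketch: plotting logic lives on the dataset, not on a trainer."""

    def __len__(self) -> int:
        return 10

    def __getitem__(self, index: int):
        # Random data stands in for real imagery and masks.
        return {
            "image": torch.rand(3, 64, 64),
            "mask": torch.randint(0, 2, (64, 64)),
        }

    def plot(self, sample):
        # Any workflow (trainer callback, notebook, script) can reuse this.
        fig, axes = plt.subplots(1, 2, figsize=(8, 4))
        axes[0].imshow(sample["image"].permute(1, 2, 0))
        axes[0].set_title("image")
        axes[1].imshow(sample["mask"])
        axes[1].set_title("mask")
        for ax in axes:
            ax.axis("off")
        return fig
```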
I think this is fine, but we are probably going to find that segmentation and detection tasks have custom differences for each dataset, so we may want to switch to organizing tasks into their own task folders with separate scripts. |
Can you give examples of this that can't be handled by dataset-specific DataModules or generic hparams? So far all of the examples I've seen are too hyper-specific to warrant including them in TorchGeo. That doesn't mean that users can't subclass SemanticSegmentationTask and override certain methods, but that these don't need to be included in TorchGeo. |
E.g., I have re-implemented a RESISC45 task that inherits from the ClassificationTask and just contains a constructor with Kornia augmentations and an override of train_step to use them. Without these, training loss goes to 0 and train acc goes to 1. With these, training is better regularized and val/test acc is better (see below). I think it makes sense to not have this in |
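For illustration, a rough reconstruction of what such a subclass might look like. This is not the actual implementation referenced above; the specific transforms, the `**kwargs` pass-through, and the exact `training_step` signature are assumptions about the `ClassificationTask` API.

```python
import kornia.augmentation as K
import torch
from torchgeo.trainers import ClassificationTask


class AugmentedRESISC45Task(ClassificationTask):
    """Hypothetical subclass adding GPU-side Kornia training augmentations."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Training-only augmentations; the specific transforms are guesses.
        self.train_aug = torch.nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            K.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),
        )

    def training_step(self, batch, batch_idx):
        # Augment the batch, then defer to the generic classification step.
        batch["image"] = self.train_aug(batch["image"])
        return super().training_step(batch, batch_idx)
```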
I'm pretty confident that we can find a way to use Kornia augmentations in a DataModule on the GPU without having to specify them in a Task. If not, we can open a PR with PyTorch Lightning to add support for this. This seems like something they would love to have support for since DataModules are designed to handle data loading, and data loaders are designed to handle data augmentations. |
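One possible route, sketched under the assumption that PyTorch Lightning's `on_after_batch_transfer` hook on a `LightningDataModule` receives the batch after it has been moved to the training device, and that the datamodule holds a `trainer` reference once attached:

```python
import kornia.augmentation as K
import pytorch_lightning as pl
import torch


class AugmentingDataModule(pl.LightningDataModule):
    """Sketch: GPU-side augmentation attached to a DataModule via a hook."""

    def __init__(self) -> None:
        super().__init__()
        self.train_aug = torch.nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            K.ColorJitter(0.1, 0.1, 0.1, 0.1, p=0.5),
        )

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # By this point the batch is already on the training device, so the
        # Kornia ops below run on the GPU even though they live in the
        # DataModule rather than the Task.
        trainer = getattr(self, "trainer", None)
        if trainer is not None and trainer.training:
            batch["image"] = self.train_aug(batch["image"])
        return batch
```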
Changing PyTorch Lightning sounds like a huge task, but I know nothing about the internals of Lightning. How does this sound:
|
I'm okay with 1, but 2 seems like a step in the wrong direction. We already have an example of the augmentation pattern with LandCoverAISegmentationTask.

I can open issues with PyTorch Lightning to get the ball rolling on some of these ideas, but I don't want it to hold up this PR and the 0.1 release. |
But the RESISC45 trainer badly overfits, see above. It will be extra motivation to figure out augmentations ;)
|
Anyway, I think we agree on this one, happy to approve whenever |
Do you still want me to do 1 first or do you think this is good to merge as is? I'm pretty confident that we can get rid of the dataset-specific tasks, but if not we may need to think more about where to put them or whether to include them in TorchGeo proper. |
number 1 first |
Trainers: split tasks into separate files, add SemanticSegmentationTask (microsoft#224)
* Trainers: split tasks into separate files
* Add SemanticSegmentationTask
* Fix doc tests
* Keep dataset-specific tasks in separate files
* Remove duplicate So2Sat trainer
This is the last refactor I want to get in before the 0.1.0 release. See #205 for motivation.