
Trainers: split tasks into separate files, add SemanticSegmentationTask #224

Merged
adamjstewart merged 5 commits into main from trainers/refactor on Nov 7, 2021

Conversation

adamjstewart
Collaborator

This is the last refactor I want to get in before the 0.1.0 release. See #205 for motivation.

@adamjstewart adamjstewart added the trainers PyTorch Lightning trainers label Nov 5, 2021
@calebrob6
Member

I don't understand why tasks should be organized by problem type -- by dataset seems more appropriate to me as there could be several different tasks for a given dataset.

@adamjstewart
Collaborator Author

I don't understand why tasks should be organized by problem type

The way I'm envisioning things is that there are many Datasets/DataModules but only a few tasks. That's the whole reason we consolidated everything to a single ClassificationTask and RegressionTask. The plan is to do the same for SemanticSegmentationTask.

Basically, all of these tasks are almost identical, and half of our code base is the same code repeated multiple times. Because of that repetition, it's really hard to add a new model or loss function without adding it in several places. Conversely, if you fix a bug, you have to fix it in multiple places, making it easy to miss a spot.

there could be several different tasks for a given dataset.

Can you give an example of what you mean by this? For COWC there is a classification version and a regression version, so you just choose the appropriate task and use it. I'm not sure why you would need multiple different kinds of RegressionTasks.

@adamjstewart
Collaborator Author

Right now we have things like LandCoverAISegmentationTask and NAIPChesapeakeSegmentationTask, but in the long run these will be unified into a single SemanticSegmentationTask. This PR does the preliminary move of putting all of these in a single file so that it's easier to combine them later and keep some history.

@adamjstewart adamjstewart marked this pull request as draft November 5, 2021 21:44
@adamjstewart
Collaborator Author

Going to add a SemanticSegmentationTask and remove most of this duplication; shouldn't take long.

@calebrob6
Member

calebrob6 commented Nov 5, 2021

I understand the motivation for factoring out the general logic into these superclasses; I'm asking why we should reorganize them into different files by task type. Do you imagine that these will be the only trainers we have?

Can you give an example of what you mean by this? For COWC there is a classification version and a regression version, so you just choose the appropriate task and use it. I'm not sure why you would need multiple different kinds of RegressionTasks.

Any case where you need to subclass and override train_step in different ways for a given dataset/datamodule. An easy example is different training setups with the ChesapeakeCVPR dataset, where you incorporate different layers in the loss function. You may want to train in a vanilla way with just the high-resolution labels, but then you may also want to additionally use the NLCD labels in a different training run. Another example is the change detection datasets/tasks, where a generic loss won't really make sense.

@adamjstewart
Collaborator Author

Do you imagine that these will be the only trainers we have?

Yes, more or less. Obviously we will add additional trainers for things like InstanceSegmentationTask, ObjectDetectionTask, etc. But I think that these trainers cover the vast majority of use cases.

Any case where you need to subclass and override train_step in different ways for a given dataset/datamodule.

Anything that is dataset-specific should go in the DataModule. I think that this is generally possible if we structure things intelligently, but I could be very wrong.

An easy example is different training setups with the ChesapeakeCVPR dataset, where you incorporate different layers in the loss function. You may want to train in a vanilla way with just the high-resolution labels, but then you may also want to additionally use the NLCD labels in a different training run.

Could this be handled by adding a layers arg to ChesapeakeCVPRDataModule?
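
Roughly what I have in mind (a sketch with a synthetic stand-in dataset; the layers values and sample keys here are illustrative assumptions, not existing API):

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, Dataset


class FakeChesapeake(Dataset):
    """Synthetic stand-in for ChesapeakeCVPR: returns one mask channel per
    requested label layer (illustration only)."""

    def __init__(self, layers):
        self.layers = layers

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {
            "image": torch.rand(4, 256, 256),
            "mask": torch.randint(0, 5, (len(self.layers), 256, 256)),
        }


class ChesapeakeCVPRDataModule(pl.LightningDataModule):
    def __init__(self, layers=("lc",), batch_size=4):
        super().__init__()
        self.layers = list(layers)  # e.g. ["lc"] or ["lc", "nlcd"]
        self.batch_size = batch_size

    def setup(self, stage=None):
        self.dataset = FakeChesapeake(self.layers)

    def train_dataloader(self):
        return DataLoader(self.dataset, batch_size=self.batch_size)
```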

Another example is the change detection datasets/tasks, where a generic loss won't really make sense.

Can you elaborate on this? We can always add additional loss functions to the task. I (possibly naively) think of change detection as simply binary semantic segmentation.

@adamjstewart adamjstewart marked this pull request as ready for review November 5, 2021 22:52
@adamjstewart adamjstewart changed the title from "Trainers: split tasks into separate files" to "Trainers: split tasks into separate files, add SemanticSegmentationTask" Nov 5, 2021
@calebrob6
Member

Maybe even a better example: different training augmentations will be appropriate for different datasets -- especially cropping. We went ahead and removed most augmentations from the trainers for simplicity; however, proper augmentation is a crucial part of training. It would be cumbersome to define these via config, and in Kornia these should be defined in the tasks, not the datamodules, to take advantage of GPU acceleration. I imagine that we will, at the least, need lightweight classes per task to define this.

@adamjstewart
Collaborator Author

adamjstewart commented Nov 5, 2021

in Kornia these should be defined in the tasks, not the datamodules, to take advantage of GPU acceleration

Is that the only reason we can't put data augmentations in a DataModule? I feel like we can work around that if we need to; we can always submit a PR to PyTorch Lightning.

@calebrob6
Member

Could this be handled by adding a layers arg to ChesapeakeCVPRDataModule?

You can definitely add a layers arg to the ChesapeakeCVPRDataModule, and then it will return sample["image"] and sample["mask"] with different numbers of channels. But now, how does the SemanticSegmentationTask know what to do with this?

@adamjstewart
Collaborator Author

adamjstewart commented Nov 5, 2021

You can definitely add a layers arg to the ChesapeakeCVPRDataModule, and then it will return sample["image"] and sample["mask"] with different numbers of channels. But now, how does the SemanticSegmentationTask know what to do with this?

Number of image channels is controlled by config. If number of mask channels is not 1, we would add a MultiLabelSemanticSegmentationTask like we did with classification. Then number of mask channels will also be controlled by config.

@calebrob6
Member

Is that the only reason we can't put data augmentations in a DataModule? I feel like we can work around that if we need to; we can always submit a PR to PyTorch Lightning.

This is an issue of where the computation is run; you can run Kornia augmentation code wherever you want, but DataModules don't get put on the GPU (or even know about the GPU, AFAIK).

Number of image channels is controlled by config. If number of mask channels is not 1, we would add a MultiLabelSemanticSegmentationTask like we did with classification. Then number of mask channels will also be controlled by config.

It isn't a matter of "number of masks" or "number of input channels", but of how you might want to use them in training the model. In ChesapeakeCVPR we assume you have high-resolution labels in some places and low-resolution labels everywhere. You can train the semantic segmentation task using a loss function that depends on both the high-resolution and low-resolution labels. Put differently, there are (at least) two valid ways of training a semantic segmentation model on ChesapeakeCVPR:

  • One uses one layer of labels and a normal cross-entropy loss. This is a vanilla setup that is handled nicely by SemanticSegmentationTask.
  • One uses two layers of labels and a super-resolution loss. This is a slightly more involved setup that fundamentally cannot be handled by SemanticSegmentationTask, because it involves specific knowledge about the dataset. You could have a MixedLowAndHighResolutionLabelSemanticSegmentationTask that handles this, but that seems more confusing to me.

@adamjstewart
Collaborator Author

For the Kornia thing: does the Task have any knowledge of the DataModule? I wonder if we can add an if-statement like if hasattr(self.datamodule, 'train_augmentation'): do stuff...
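
Something along these lines (a rough sketch; train_augmentation is a hypothetical attribute, and the model and batch shapes are stand-ins):

```python
import pytorch_lightning as pl
import torch
from torch import nn


class SketchSegmentationTask(pl.LightningModule):
    """Sketch: defer to an optional, hypothetical `train_augmentation`
    attribute on the datamodule if one exists."""

    def __init__(self):
        super().__init__()
        self.model = nn.Conv2d(3, 2, kernel_size=1)  # stand-in model
        self.loss = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        # assumes batch["image"] is (B, 3, H, W) and batch["mask"] is (B, H, W)
        x, y = batch["image"], batch["mask"]
        dm = getattr(self.trainer, "datamodule", None)
        if dm is not None and hasattr(dm, "train_augmentation"):
            x = dm.train_augmentation(x)  # applied on whatever device x is on
        loss = self.loss(self.model(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```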

For the Chesapeake stuff: you've definitely convinced me that there are situations that can't be handled by SemanticSegmentationTask, but I would argue that what you've described sounds more like an interesting research project and less like a useful library feature. I think that's something that would make more sense in experiments/ than in torchgeo/trainers, as it isn't particularly useful for the average user, or even the power user, unless they happen to be working on that specific dataset. I'm not saying it isn't something we could support, but it's not something I would expose to the user. The goal of a library is to provide tools to make hard things easy and impossible things possible. I want to enable users to use TorchGeo to do cool research, and this hyper-specific Chesapeake task sounds like really cool research! And the fact that a user can subclass SemanticSegmentationTask and override a single function and do that is amazing.

Basically, I understand your point that not everything can be done with these simple tasks, but if 99% of the work can be done with these, I think that's good enough. Everything else can be done adhoc by the user or in experiments/.

@calebrob6
Member

calebrob6 commented Nov 6, 2021

See https://colab.research.google.com/github/kornia/tutorials/blob/master/source/data_augmentation_kornia_lightning_gpu.ipynb or https://github.com/microsoft/torchgeo/blob/trainers/refactor/torchgeo/trainers/segmentation.py#L331 for an example of how Kornia + PyTorch Lightning works. K.AugmentationSequential is a Module that gets automatically placed on the device that the LightningModule is using (this could even be across multiple GPUs/machines).
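
The pattern looks roughly like this (a minimal sketch, not the torchgeo code; the model and batch keys are stand-ins):

```python
import kornia.augmentation as K
import pytorch_lightning as pl
import torch
from torch import nn


class AugmentedSegmentationTask(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Conv2d(3, 2, kernel_size=1)  # stand-in model
        self.loss = nn.CrossEntropyLoss()
        # An nn.Module assigned as a submodule, so Lightning moves it to the
        # same device(s) as the model automatically.
        self.train_aug = K.AugmentationSequential(
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            data_keys=["input", "mask"],  # apply identical flips to the mask
        )

    def training_step(self, batch, batch_idx):
        # assumes batch["mask"] is (B, 1, H, W); Kornia wants float masks
        x, y = batch["image"], batch["mask"].float()
        x, y = self.train_aug(x, y)  # runs on the GPU, not in the workers
        loss = self.loss(self.model(x), y.squeeze(1).long())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```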

Yeah, I largely agree with that (that we shouldn't actually implement the trainer I described above in torchgeo proper), but this is about reorganizing the task classes. Trying to put everything into {classification, regression, semanticsegmentation, ...} categories is limiting because, fundamentally, training is strongly coupled to datasets. ImageNet training looks very different from MNIST training. Are there arguments for this reorg other than "TorchGeo shouldn't have many tasks"?

@adamjstewart
Collaborator Author

adamjstewart commented Nov 6, 2021

Are there arguments for this reorg other than "TorchGeo shouldn't have many tasks"?

The proof is in the pudding. This PR adds several new features:

  • SEN12MS now supports deeplabv3+ and fcn models
  • NAIP + Chesapeake now supports focal loss
  • SEN12MS now supports focal loss
  • SEN12MS now tracks IoU metrics

In the process of adding these new features, I:

  • Removed 650 lines of code
  • Shaved 1 min off testing time
  • Improved maintainability
  • Made it trivially easy to add trainers for new semantic segmentation datasets

If I can add all of these features while reducing the total lines of code, the refactor was a success in my book. The number of tasks was never the issue; the issue was that so much of our code was duplicated, and that:

  • If you add a new task, you have to duplicate that same code again
  • If you want to add a new model/loss/metric, you need to do it in 4+ files
  • If you find and fix a bug, you need to do it in 4+ files

This simply isn't maintainable, and leads to tasks that have different features and different bugs.

@calebrob6
Member

I think you're missing my question. I'm totally fine with the refactoring part, that part is great; what I don't understand is why we are moving/reorganizing the dataset-specific tasks into the generic files.

@adamjstewart
Collaborator Author

Because the dataset-specific tasks are deprecated and will be removed in the near future. The only difference between these dataset-specific tasks is how samples are plotted. This logic should be moved to the respective Dataset so that both trainer- and non-trainer-based workflows can benefit from it. Once that's done, these dataset-specific tasks will be removed.
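
For example, on the Dataset side the plotting logic could look something like this (a sketch, not the final API; the sample keys follow torchgeo's conventions, but treat the details as illustrative):

```python
from typing import Dict

import matplotlib.pyplot as plt
import torch


def plot(sample: Dict[str, torch.Tensor]) -> plt.Figure:
    """Render the image and mask from a single sample dict."""
    image = sample["image"].permute(1, 2, 0).numpy()
    mask = sample["mask"].numpy()
    fig, axs = plt.subplots(1, 2, figsize=(8, 4))
    axs[0].imshow(image[:, :, :3])  # first three bands as RGB
    axs[0].set_title("image")
    axs[0].axis("off")
    axs[1].imshow(mask)
    axs[1].set_title("mask")
    axs[1].axis("off")
    return fig
```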

If you want I can keep those tasks in separate files until they get removed, but it will increase the lines of code a bit just because of duplicate imports. I don't have a strong preference about this since they will be gone in a couple months.

@isaaccorley
Collaborator

I think this is fine, but we are probably going to find that segmentation and detection tasks have custom differences for each dataset, so we may want to switch to organizing tasks into their own task folders with separate scripts.

@adamjstewart
Collaborator Author

we are probably going to find that segmentation and detection tasks have custom differences for each dataset

Can you give examples of this that can't be handled by dataset-specific DataModules or generic hparams? So far all of the examples I've seen are too hyper-specific to warrant including them in TorchGeo. That doesn't mean that users can't subclass SemanticSegmentationTask and override certain methods, but that these don't need to be included in TorchGeo.

@calebrob6
Member

E.g., I have re-implemented a RESISC45 task that inherits from ClassificationTask and just contains a constructor with Kornia augmentations and an override of train_step to use them (roughly the pattern sketched below). Without these, training loss goes to 0 and train accuracy goes to 1. With these, training is better regularized and val/test accuracy is better (see the chart below). I think it makes sense not to have this in classification.py, but where should it go?

[chart: train/val accuracy curves with and without the Kornia augmentations]
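
The pattern, roughly (a sketch with illustrative augmentations, not the exact code I pushed; I'm assuming ClassificationTask's forward and the "image"/"label" sample keys here):

```python
import kornia.augmentation as K
import torch
import torch.nn.functional as F

from torchgeo.trainers import ClassificationTask


class RESISC45ClassificationTask(ClassificationTask):
    """Sketch: ClassificationTask plus GPU-side Kornia train augmentations."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.train_aug = K.AugmentationSequential(
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            K.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),
            data_keys=["input"],
        )

    def training_step(self, batch, batch_idx):
        x, y = batch["image"], batch["label"]
        with torch.no_grad():
            x = self.train_aug(x)  # augmentations execute on the GPU
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss)
        return loss
```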

@adamjstewart
Collaborator Author

I'm pretty confident that we can find a way to use Kornia augmentations in a DataModule on the GPU without having to specify them in a Task. If not, we can open a PR with PyTorch Lightning to add support for this. This seems like something they would love to have support for since DataModules are designed to handle data loading, and data loaders are designed to handle data augmentations.
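
For instance, Lightning already has an on_after_batch_transfer hook that receives the batch after it has been moved to the device, so something like this might work from the DataModule side (an untested sketch; the batch keys are assumptions):

```python
import kornia.augmentation as K
import pytorch_lightning as pl


class AugmentingDataModule(pl.LightningDataModule):
    """Sketch: apply Kornia augmentations from the DataModule after the
    batch has already been transferred to the GPU."""

    def __init__(self):
        super().__init__()
        self.train_aug = K.AugmentationSequential(
            K.RandomHorizontalFlip(p=0.5),
            data_keys=["input"],
        )

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # The batch is already on the device here, so the augmentation runs
        # on the GPU even though it is defined in the DataModule.
        if self.trainer is not None and self.trainer.training:
            batch["image"] = self.train_aug(batch["image"])
        return batch
```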

@calebrob6
Member

Changing PyTorch Lightning sounds like a huge task, but I know nothing about the internals of Lightning. How does this sound:

  1. keep the current dataset-specific trainers in their own files just until we can refactor out their functionality
  2. I'll push the custom RESISC trainer to torchgeo/trainers/resisc45.py as another example of the augmentation pattern
  3. we'll look to move dataset-specific augmentation to the DataModules, refactor plotting into the Datasets, and deal with any other dataset-specific logic that comes up

@adamjstewart
Collaborator Author

I'm okay with 1, but 2 seems like a step in the wrong direction. We already have an example of the augmentation pattern with LandCoverAISegmentationTask.

I can open issues with PyTorch Lightning to get the ball rolling on some of these ideas, but I don't want it to hold up this PR and the 0.1 release.

@calebrob6
Member

calebrob6 commented Nov 6, 2021 via email

@calebrob6
Member

Anyway, I think we agree on this one; happy to approve whenever.

@adamjstewart
Collaborator Author

Do you still want me to do 1 first or do you think this is good to merge as is? I'm pretty confident that we can get rid of the dataset-specific tasks, but if not we may need to think more about where to put them or whether to include them in TorchGeo proper.

@calebrob6
Member

number 1 first

@adamjstewart adamjstewart merged commit 967b4b1 into main Nov 7, 2021
@adamjstewart adamjstewart deleted the trainers/refactor branch November 7, 2021 04:57
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
Trainers: split tasks into separate files, add SemanticSegmentationTask (microsoft#224)

* Trainers: split tasks into separate files

* Add SemanticSegmentationTask

* Fix doc tests

* Keep dataset-specific tasks in separate files

* Remove duplicate So2Sat trainer