Trainers: split tasks into separate files, add SemanticSegmentationTask #224
Conversation
I don't understand why tasks should be organized by problem type -- by dataset seems more appropriate to me as there could be several different tasks for a given dataset. |
The way I'm envisioning things is that there are many Datasets/DataModules but only a few tasks. That's the whole reason we consolidated everything to a single ClassificationTask and RegressionTask. The plan is to do the same for SemanticSegmentationTask. Basically, all of these tasks are almost identical, and half of our code base is the same code repeated multiple times. Because of that repetition, it's really hard to add a new model or loss function without adding it in several places. Conversely, if you fix a bug, you have to fix it in multiple places, making it easy to miss a spot.
Can you give an example of what you mean by this? For COWC there is a classification version and a regression version, so you just choose the appropriate task and use it. I'm not sure why you would need multiple different kinds of RegressionTasks. |
Right now we have things like |
Going to add a |
I understand the motivation for factoring out the general logic into these super classes, I'm asking why reorganize them into different files by task type? Do you imagine that these will be the only trainers we have?
Any cases where you need to subclass to override the train_step in different ways for a given dataset/datamodule. An easy example is different training setups with the ChesapeakeCVPR dataset where you incorporate different layers in the loss function. You may want to train in a vanilla way with only the high-resolution labels, but you may also want to additionally use the NLCD labels in a different training run. Another example is the change detection datasets/tasks, where a generic loss won't really make sense. |
Yes, more or less. Obviously we will add additional trainers for things like InstanceSegmentationTask, ObjectDetectionTask, etc. But I think that these trainers cover the vast majority of use cases.
Anything that is dataset-specific should go in the DataModule. I think that this is generally possible if we structure things intelligently, but I could be very wrong.
Could this be handled by adding a
Can you elaborate on this? We can always add additional loss functions to the task. I (possibly naively) think of change detection as simply binary semantic segmentation. |
Maybe even a better example: different training augmentations will be appropriate for different datasets -- especially cropping. We went ahead and just removed most augmentations from the trainers for simplicity; however, proper augmentation is a crucial part of training. It would be cumbersome to define these via config, and in Kornia these should be defined in the tasks, not the datamodules, to take advantage of GPU acceleration. I imagine that we will, at the least, need lightweight classes per task to define this. |
Is that the only reason we can't put data augmentations in a DataModule? I feel like we can work around that if we need to; we can always submit a PR to PyTorch Lightning. |
You can definitely add a layers arg to the ChesapeakeCVPRDataModule, and then it will return |
Number of image channels is controlled by config. If number of mask channels is not 1, we would add a MultiLabelSemanticSegmentationTask like we did with classification. Then number of mask channels will also be controlled by config. |
This is an issue of where the computation is run: you can run Kornia augmentation code wherever you want, but DataModules don't get put on the GPU (or even know about the GPU, AFAIK).
It isn't a matter of "number of masks" or "number of input channels", but how you might want to use them in training the model. In ChesapeakeCVPR we assume you have high-resolution labels in some places and low-resolution labels everywhere. You can train the semantic segmentation task using a loss function that depends on both the high-resolution and low-resolution labels. Put differently, there are (at least) two valid ways of training a semantic segmentation model on ChesapeakeCVPR:
|
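To make the distinction concrete, here is a rough, hypothetical sketch of the kind of mixed-supervision loss described above: it uses the high-resolution labels wherever they exist and falls back to the coarse NLCD-derived labels elsewhere. The function name, the `hr_valid` mask, and the `alpha` weighting are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn.functional as F


def mixed_supervision_loss(
    pred: torch.Tensor,       # (N, C, H, W) logits
    hr_labels: torch.Tensor,  # (N, H, W) high-resolution land cover labels
    lr_labels: torch.Tensor,  # (N, H, W) coarse labels, e.g. derived from NLCD
    hr_valid: torch.Tensor,   # (N, H, W) bool, True where high-res labels exist
    alpha: float = 0.5,
) -> torch.Tensor:
    """Hypothetical loss mixing high- and low-resolution supervision."""
    # Per-pixel cross entropy against each label source.
    per_pixel_hr = F.cross_entropy(pred, hr_labels, reduction="none")
    per_pixel_lr = F.cross_entropy(pred, lr_labels, reduction="none")
    # Supervise with high-res labels where available, coarse labels elsewhere.
    hr_loss = per_pixel_hr[hr_valid].mean()
    lr_loss = per_pixel_lr[~hr_valid].mean()
    return alpha * hr_loss + (1 - alpha) * lr_loss
```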
For the Kornia thing: does the Task have any knowledge of the DataModule? I wonder if we can add an if-statement like

For the Chesapeake stuff: you've definitely convinced me that there are situations that can't be handled by

Basically, I understand your point that not everything can be done with these simple tasks, but if 99% of the work can be done with these, I think that's good enough. Everything else can be done ad hoc by the user or in |
See https://colab.research.google.com/github/kornia/tutorials/blob/master/source/data_augmentation_kornia_lightning_gpu.ipynb or https://github.com/microsoft/torchgeo/blob/trainers/refactor/torchgeo/trainers/segmentation.py#L331 for an example of how Kornia + PyTorch Lightning works.

Yeah, I largely agree with that (that we shouldn't actually implement the trainer I described above in torchgeo proper), but this is about reorganizing the task classes. Trying to put everything into {classification, regression, semanticsegmentation, ...} categories is limiting because, fundamentally, training is strongly coupled to datasets. ImageNet training looks very different than MNIST training. Are there arguments for this reorg other than "TorchGeo shouldn't have many tasks"? |
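For readers following along, here is a minimal sketch of the task-side pattern being discussed: Kornia augmentation modules defined on the LightningModule run on whatever device the batch is on, so they get GPU acceleration during training. This is not the code from the linked segmentation.py; the photometric transforms and stand-in model are arbitrary choices for illustration.

```python
import kornia.augmentation as K
import pytorch_lightning as pl
import torch
import torch.nn.functional as F


class AugmentedSegmentationTask(pl.LightningModule):
    """Minimal sketch: GPU-side Kornia augmentations defined on the task."""

    def __init__(self) -> None:
        super().__init__()
        # The augmentations are nn.Modules attached to the task, so they are
        # applied to the batch after it has been moved to the device.
        self.train_aug = torch.nn.Sequential(
            K.RandomGaussianNoise(std=0.05, p=0.5),
            K.ColorJitter(0.1, 0.1, 0.1, 0.1, p=0.5),
        )
        self.model = torch.nn.Conv2d(3, 2, kernel_size=1)  # stand-in model

    def training_step(self, batch, batch_idx):
        x, y = batch["image"], batch["mask"]
        # Photometric transforms only touch the image; geometric transforms
        # would also need to be applied to the mask (e.g. via Kornia's
        # AugmentationSequential).
        x = self.train_aug(x)
        y_hat = self.model(x)
        return F.cross_entropy(y_hat, y)
```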
The proof is in the pudding. This PR adds several new features:
In the process of adding these new features, I:
If I can add all of these features while reducing the total lines of code, the refactor was a success in my book. The number of tasks was never the issue, it was the fact that so much of our code was duplicated, and that:
This simply isn't maintainable, and leads to tasks that have different features and different bugs. |
I think you're missing my question: I'm totally fine with the refactoring part (that part is great); I don't understand why we are moving/reorganizing the dataset-specific tasks into the generic files. |
Because the dataset-specific tasks are deprecated and will be removed in the near future. The only difference between these dataset-specific tasks is how samples are plotted. This logic should be moved to the respective Dataset so that both trainer- and non-trainer-based workflows can benefit from it. Once that's done, these dataset-specific tasks will be removed. If you want I can keep those tasks in separate files until they get removed, but it will increase the lines of code a bit just because of duplicate imports. I don't have a strong preference about this since they will be gone in a couple months. |
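As a rough illustration of the idea (a toy dataset, not an actual torchgeo class), putting plotting on the dataset means both trainer-based and notebook workflows can call the same `plot` method:

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset


class ExampleSegmentationDataset(Dataset):
    """Toy sketch: plotting logic lives on the dataset, not on a trainer."""

    def __len__(self) -> int:
        return 10

    def __getitem__(self, index: int):
        # Random data stands in for real imagery and masks.
        return {
            "image": torch.rand(3, 64, 64),
            "mask": torch.randint(0, 2, (64, 64)),
        }

    def plot(self, sample):
        # Any workflow (trainer callback, notebook, script) can reuse this.
        fig, axes = plt.subplots(1, 2, figsize=(8, 4))
        axes[0].imshow(sample["image"].permute(1, 2, 0))
        axes[0].set_title("image")
        axes[1].imshow(sample["mask"])
        axes[1].set_title("mask")
        for ax in axes:
            ax.axis("off")
        return fig
```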
I think this is fine, but we are probably going to find that segmentation and detection tasks have custom differences for each dataset, so we may want to switch to organizing tasks into their own task folders with separate scripts. |
Can you give examples of this that can't be handled by dataset-specific DataModules or generic hparams? So far all of the examples I've seen are too hyper-specific to warrant including them in TorchGeo. That doesn't mean that users can't subclass SemanticSegmentationTask and override certain methods, but that these don't need to be included in TorchGeo. |
E.g., I have re-implemented a RESISC45 task that inherits from the ClassificationTask and just contains a constructor with Kornia augmentations and an override of train_step to use them. Without these, training loss goes to 0 and train acc goes to 1. With these, training is better regularized and val/test acc is better (see below). I think it makes sense to not have this in |
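For illustration, a rough reconstruction of what such a subclass might look like. This is not the actual implementation referenced above; the specific transforms, the `**kwargs` pass-through, and the exact `training_step` signature are assumptions about the `ClassificationTask` API.

```python
import kornia.augmentation as K
import torch
from torchgeo.trainers import ClassificationTask


class AugmentedRESISC45Task(ClassificationTask):
    """Hypothetical subclass adding GPU-side Kornia training augmentations."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Training-only augmentations; the specific transforms are guesses.
        self.train_aug = torch.nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            K.RandomVerticalFlip(p=0.5),
            K.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),
        )

    def training_step(self, batch, batch_idx):
        # Augment the batch, then defer to the generic classification step.
        batch["image"] = self.train_aug(batch["image"])
        return super().training_step(batch, batch_idx)
```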
I'm pretty confident that we can find a way to use Kornia augmentations in a DataModule on the GPU without having to specify them in a Task. If not, we can open a PR with PyTorch Lightning to add support for this. This seems like something they would love to have support for since DataModules are designed to handle data loading, and data loaders are designed to handle data augmentations. |
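One possible route, sketched under the assumption that PyTorch Lightning's `on_after_batch_transfer` hook on a `LightningDataModule` receives the batch after it has been moved to the training device, and that the datamodule holds a `trainer` reference once attached:

```python
import kornia.augmentation as K
import pytorch_lightning as pl
import torch


class AugmentingDataModule(pl.LightningDataModule):
    """Sketch: GPU-side augmentation attached to a DataModule via a hook."""

    def __init__(self) -> None:
        super().__init__()
        self.train_aug = torch.nn.Sequential(
            K.RandomHorizontalFlip(p=0.5),
            K.ColorJitter(0.1, 0.1, 0.1, 0.1, p=0.5),
        )

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # By this point the batch is already on the training device, so the
        # Kornia ops below run on the GPU even though they live in the
        # DataModule rather than the Task.
        trainer = getattr(self, "trainer", None)
        if trainer is not None and trainer.training:
            batch["image"] = self.train_aug(batch["image"])
        return batch
```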
Changing PyTorch Lightning sounds like a huge task, but I know nothing about the internals of Lightning. How does this sound:
|
I'm okay with 1, but 2 seems like a step in the wrong direction. We already have an example of the augmentation pattern with LandCoverAISegmentationTask.

I can open issues with PyTorch Lightning to get the ball rolling on some of these ideas, but I don't want it to hold up this PR and the 0.1 release. |
But the RESISC45 trainer badly overfits, see above. It will be extra motivation to figure out augmentations ;)
|
Anyway, I think we agree on this one, happy to approve whenever |
Do you still want me to do 1 first or do you think this is good to merge as is? I'm pretty confident that we can get rid of the dataset-specific tasks, but if not we may need to think more about where to put them or whether to include them in TorchGeo proper. |
number 1 first |
Trainers: split tasks into separate files, add SemanticSegmentationTask (microsoft#224)
* Trainers: split tasks into separate files
* Add SemanticSegmentationTask
* Fix doc tests
* Keep dataset-specific tasks in separate files
* Remove duplicate So2Sat trainer
This is the last refactor I want to get in before the 0.1.0 release. See #205 for motivation.