Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigEarthNet Trainers #211

Merged
merged 20 commits into from
Nov 2, 2021
Merged

Conversation

isaaccorley
Copy link
Collaborator

  • Adds torchgeo.trainers.BigEarthNetClassificationTask and torchgeo.trainers.BigEarthNetDataModule
  • Adds torchgeo.trainers.tasks.MultiLabelClassficationTask. This differs from ClassificationTask by using torch.nn.BCEWithLogitsLoss and modifies the metrics to handle multilabel outputs
  • Adds BigEarthNet train configs
  • Adds unit tests
  • Added additional BigEarthNet test data to allow for a train/val/test split
  • Adds estimate band min/max stats based on random samples from the dataset

TODO:

  • Need to train to get benchmark result and compare to recent papers

@isaaccorley isaaccorley added the trainers PyTorch Lightning trainers label Oct 29, 2021
@isaaccorley isaaccorley requested a review from calebrob6 October 29, 2021 04:24
@isaaccorley isaaccorley self-assigned this Oct 29, 2021
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this trainer need to be added to train.py?

tests/trainers/test_bigearthnet.py Outdated Show resolved Hide resolved
torchgeo/trainers/__init__.py Show resolved Hide resolved
conf/bigearthnet.yaml Show resolved Hide resolved
tests/trainers/test_tasks.py Show resolved Hide resolved
torchgeo/trainers/bigearthnet.py Show resolved Hide resolved
torchgeo/trainers/tasks.py Show resolved Hide resolved
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still haven't heard a good reason why we can't have a single ClassificationTask and MultiLabelClassificationTask without all of these subclasses.

tests/trainers/test_bigearthnet.py Outdated Show resolved Hide resolved
tests/trainers/test_tasks.py Outdated Show resolved Hide resolved
tests/trainers/test_tasks.py Outdated Show resolved Hide resolved
torchgeo/trainers/__init__.py Outdated Show resolved Hide resolved
@isaaccorley
Copy link
Collaborator Author

I refactored test_tasks to use a dummy dataset and datamodule so we can test for varying number of channels but avoid having to repeat tests for bigearthnet and so2sat datamodule specific args. Let me know if this works.

@isaaccorley
Copy link
Collaborator Author

You are right thought that there is not a good case for making a separate task for each classification dataset. We can definitely remove them if that's the route you want to take.

DataLoader.__module__ = "torch.utils.data"


class BigEarthNetClassificationTask(MultiLabelClassificationTask):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting rid of this class is the last remaining thing to do before this can be merged.

Copy link
Collaborator

@adamjstewart adamjstewart Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would refactor ClassificationTask and MultiLabelClassificationTask first in separate PRs to keep this PR from getting any bigger.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should merge this because I still need to refactor the num classes as an input arg before I can remove it. If I do that then I will need to remove/refactor all other classification tasks as well.

@adamjstewart adamjstewart merged commit 3cc63de into microsoft:main Nov 2, 2021
@isaaccorley isaaccorley deleted the trainers/bigearthnet branch November 2, 2021 16:29
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* add additional bigearthnet test data for train/val/test split

* update bigearthnet dataset length test

* add MultiLabelClassificationTask

* add BigEarthNet trainer and datamodule

* add bigearthnet and multilabelclassificationtask tests

* mypy and format

* add estimated band min/max values for normalization

* softmax outputs to correctly compute metrics

* update min/max stats for 100k samples

* organize imports in torchgeo.trainers.__init__.py

* clean up fixtures in test_tasks.py

* added bigearthnet to train.py

* format

* move fixtures into class methods

* consolidate bigearthnet fixtures

* refactor tasks tests

* add scope=class

* style/mypy fixes

* mypy fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
trainers PyTorch Lightning trainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants