Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datasets/cv4a kenya crop type competition #22

Merged
merged 13 commits into from
Jun 7, 2021

Conversation

calebrob6
Copy link
Member

Draft of the CV4A Kenya Crop Type Dataset.

Some things to check:

  • flake8 is complaining about line length and the formatting of some numpy slicing, not sure how to appease it.
  • The subset of bands you pass to the constructor must be a tuple as the caching needs hashable types to work. We need to enforce this.
  • Not sure if I did the type annotations correctly.
  • The train/test split for this dataset is defined at a pixel level. I included get_splits() that returns the list of field_ids in each split, but it isn't immediately obvious to me how someone would use this. Maybe we should discuss this in detail.
  • I made a jupyter notebook for benchmarking this dataset, and displaying how chips are returned for different chip_size and stride values. Should we include notebooks like this in the repo, if not, what is the best way to give code examples that include visuals? E.g. the following image is when chip_size=stride=256
    image

@calebrob6 calebrob6 requested a review from adamjstewart June 5, 2021 19:47
@calebrob6 calebrob6 marked this pull request as draft June 5, 2021 19:47
@adamjstewart
Copy link
Collaborator

  1. Black should fix most of the line length issues, I'll push a commit to address this.
  2. We can add an assert statement for that
  3. I'll take a look
  4. I'll have to think about the train/test split stuff a bit
  5. In the long run, I would like to include Jupyter Notebooks in a tutorials/examples section much like torchvision does. I've never done this before so I'll have to investigate what is involved in getting this up and running. I would also like to incorporate these notebooks into CI tests so we can keep them up-to-date.

We'll need to add radiant_mlhub as a dependency in a few files, I'll get started on a Spack package for it.

@calebrob6
Copy link
Member Author

calebrob6 commented Jun 5, 2021

Black should fix most of the line length issues, I'll push a commit to address this.

The line length issue was because there was an URL in a comment block that was > 88 characters. I removed it because it was unnecessary, but how do you generally deal with this? Is there an easy way to auto format comments that are > 88 chars?

We can add an assert statement for that

Added to the _validate_bands method.

I'll take a look

I saw that this was what mypy was complaining about and fixed the blatant issues. I'm unfamiliar with using typing, so any comments/suggestions would be much appreciated!

In the long run, I would like to include Jupyter Notebooks in a tutorials/examples section much like torchvision does. I've never done this before so I'll have to investigate what is involved in getting this up and running. I would also like to incorporate these notebooks into CI tests so we can keep them up-to-date.

This sounds great. Should we put them in a temporary place for now, then git mv later?

@adamjstewart
Copy link
Collaborator

Is there an easy way to auto format comments that are > 88 chars?

Usually flake8 is smart enough to ignore URLs. If you put the URL on a line by itself that usually works.

Should we put them in a temporary place for now, then git mv later?

Yeah, throw them in a docs directory for now. You'll also need to remove *.ipynb from .gitignore.

@calebrob6
Copy link
Member Author

Okay should be good to go now! I'll leave this to you to review and we can talk about the train/test splits whenever.

@calebrob6 calebrob6 linked an issue Jun 6, 2021 that may be closed by this pull request
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed a first round of review. Overall, looks great. There's a lot of design decisions here we can argue about in the future, but for now I would rather merge this as is and iterate on those later. Left a few comments/questions. I'll also push a couple of commits to fix minor missing things I noticed. I can also address most of the things I commented on if you don't mind.

.gitignore Outdated Show resolved Hide resolved
docs/CV4A Kenya Crop Type Dataset.ipynb Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Outdated Show resolved Hide resolved
torchgeo/datasets/cv4a_kenya_crop_type.py Outdated Show resolved Hide resolved
@calebrob6
Copy link
Member Author

Completed a first round of review. Overall, looks great. There's a lot of design decisions here we can argue about in the future, but for now I would rather merge this as is and iterate on those later. Left a few comments/questions. I'll also push a couple of commits to fix minor missing things I noticed. I can also address most of the things I commented on if you don't mind.

Yeah I don't mind at all -- thanks for the detailed comments! I wanted to do this exercise as early as possible so that I can understand the way you are thinking about the details. I'll incorporate the above comments into the next one I do.

@adamjstewart adamjstewart added datasets Geospatial or benchmark datasets supervised labels Jun 7, 2021
@adamjstewart adamjstewart marked this pull request as ready for review June 7, 2021 17:14
@adamjstewart adamjstewart merged commit a197de4 into main Jun 7, 2021
@adamjstewart adamjstewart deleted the datasets/cv4a-kenya-crop-type-competition branch June 7, 2021 18:50
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CV4A Kenya Crop Type Competition Dataset
2 participants