Develop geographic sampling strategy based on Worldcover #28

Closed
2 tasks done
yellowcap opened this issue Nov 13, 2023 · 1 comment · Fixed by #29
Labels: data-pipeline (Pull Requests about the data pipeline)

Comments

@yellowcap
Member

yellowcap commented Nov 13, 2023

To get a balanced dataset, we will probably need to oversample certain land cover types, such as urban areas and wetlands. We will use a simplified version of the Worldcover dataset for this sampling effort.

Deliverables:

  • A script that samples locations based on land cover classes
  • A list of locations / scenes to process for the initial data collection
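The class-based sampling deliverable could look roughly like the following minimal sketch. It is an illustration only, not the script that landed in #29: it assumes each candidate tile has already been reduced to one dominant Worldcover class code, and the function name `stratified_sample`, the `(tile_id, class_code)` input format and the equal per-class quota are hypothetical choices. Drawing the same number of tiles from every class is what oversamples rare classes such as built-up (code 50) and herbaceous wetland (code 90) relative to their share of land area.

```python
import random
from collections import defaultdict


def stratified_sample(tiles, per_class, seed=0):
    """Draw up to `per_class` tiles for each land cover class (hypothetical helper).

    `tiles` is a list of (tile_id, class_code) pairs, where class_code is the
    tile's dominant Worldcover class (assumed to be precomputed elsewhere).
    Every class gets the same quota, so rare classes are oversampled relative
    to their area share; classes with fewer candidates are taken whole.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group candidate tile IDs by their dominant class.
    by_class = defaultdict(list)
    for tile_id, class_code in tiles:
        by_class[class_code].append(tile_id)

    selected = []
    for class_code in sorted(by_class):
        ids = by_class[class_code]
        k = min(per_class, len(ids))  # cap the quota at what is available
        selected.extend((tile_id, class_code) for tile_id in rng.sample(ids, k))
    return selected
```

With 90 tree cover tiles (code 10), 8 built-up tiles and 2 wetland tiles and `per_class=5`, this returns 5, 5 and 2 tiles respectively. Because classes with fewer candidates than the quota are kept whole, a real pipeline might need a larger candidate pool for the rarest classes.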
@yellowcap yellowcap added the data-pipeline Pull Requests about the data pipeline label Nov 13, 2023
@yellowcap yellowcap self-assigned this Nov 13, 2023
@brunosan
Member

As much as possible, please try to extend this to include other sampling sources.

If we only sample to get all land cover classes in roughly equal proportions, we are teaching the model that this is the important variance to pay attention to. That is not wrong, but I worry we will under-index the semantics that many applications need to learn about nature (beyond land cover) and about people. So, if possible, also sample from:

For nature:

For people and human assets:
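One hedged way to mix additional sampling sources into the land cover sampling is to give each source its own share of the tile budget and merge the draws, keeping a tile only once if several sources pick it. This sketch assumes every source can be expressed as a list of candidate tile IDs; `sample_from_sources`, its arguments and the weighting scheme are hypothetical and do not come from this thread.

```python
import random


def sample_from_sources(sources, budget, weights=None, seed=0):
    """Merge samples from several named sources into one deduplicated list.

    Hypothetical helper. `sources` maps a source name to a list of candidate
    tile IDs, `budget` is the approximate total number of tiles to draw, and
    `weights` optionally maps each source name to a relative weight
    (default: equal weights).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    names = sorted(sources)  # fixed order, so results are deterministic
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)

    seen = set()
    combined = []
    for name in names:
        # Each source gets a share of the budget proportional to its weight.
        quota = round(budget * weights[name] / total_weight)
        candidates = sources[name]
        for tile_id in rng.sample(candidates, min(quota, len(candidates))):
            # A tile picked by several sources is kept once, credited to the
            # first source (in sorted order) that selected it.
            if tile_id not in seen:
                seen.add(tile_id)
                combined.append((tile_id, name))
    return combined
```

Because quotas are rounded per source, the merged list can come out slightly shorter or longer than `budget`, and shorter still once duplicates are dropped.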

@brunosan brunosan added this to the v0 Release milestone Nov 15, 2023
yellowcap added 3 commits that referenced this issue Nov 15, 2023
@weiji14 weiji14 linked a pull request Nov 16, 2023 that will close this issue
yellowcap added a commit that referenced this issue Nov 16, 2023
yellowcap added a commit that referenced this issue Nov 16, 2023
* Add landcover based sampling scripts

Closes #28

* Drop duplicates, fix typo, uncomment compute_stats function.

* Fix comment that was out of sync with code
2 participants