Consider hosting dataset on Huggingface & source.coop #15

Closed
robmarkcole opened this issue Sep 14, 2023 · 4 comments

Comments

robmarkcole commented Sep 14, 2023

I'm seeing very slow download speeds for the dataset, even though my connection is fast:

satlas-dataset-v1-sentinel2-small.tar                  39%[++] 183.55G  3.87MB/s

I've had very fast downloads from Hugging Face and suggest it as an additional place to host and distribute the dataset.

Additionally, https://beta.source.coop/ would be a relevant portal.
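
For a rough sense of scale, the progress line in this comment can be turned into a time estimate. This is only a back-of-the-envelope sketch: it assumes wget's `G` and `MB` are binary units (GiB, MiB), that the displayed 39% is accurate to the nearest percent, and that the rate stays constant. The function name is made up for illustration.

```python
def hours_remaining(done_gib: float, fraction_done: float, rate_mib_s: float) -> float:
    """Estimate hours left, given bytes fetched so far, progress, and rate."""
    total_gib = done_gib / fraction_done            # implied total archive size
    remaining_mib = (total_gib - done_gib) * 1024   # GiB -> MiB
    return remaining_mib / rate_mib_s / 3600        # seconds -> hours


# Figures taken from the progress line: 183.55G fetched, 39%, 3.87MB/s.
hours = hours_remaining(183.55, 0.39, 3.87)
print(f"about {hours:.0f} hours left at the current rate")
```

wget's own ETA is derived from its running average, so it will not necessarily agree with an estimate based on the instantaneous rate.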

Collaborator

favyen2 commented Sep 15, 2023

I will try to add it at https://huggingface.co/allenai/satlas-pretrain, but it may take some time due to the large size of the dataset.

@robmarkcole
Author

Hi @favyen2, I see you got a couple of files up, which is great. Could you prioritise the following data? I've been trying to download it since the start of the week and it is still going:

satlas_explorer_datasets_2023-07-24.tar                45%          ] 430.34G  3.34MB/s    eta 32h 8m

Author

robmarkcole commented Oct 16, 2023

For the explorer dataset, it took most of the week to download the tar and most of the weekend to untar it. On reviewing the labelled datasets, the sizes are:

  • marine_infra: 28 GB, detection json + images
  • solar_farm: 19 GB, segmentation mask + images
  • tree_cover: 963 GB, segmentation mask + images
  • wind_turbine: 40.02 GB, detection json + images

The three small datasets could be uploaded as individual datasets. HF has a 40 GB limit (TBC) per zip/tar, so these should be fine. This would be a much faster experience for people who only care about one of them.
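
A quick way to sanity-check the sizes listed above against a per-archive limit is sketched below. The 40 GB figure is the unconfirmed ("TBC") number from this comment, not a verified Hugging Face limit, and the sizes are the on-disk figures quoted in the list; compressed archives may be smaller.

```python
# Sizes in GB as quoted in the list above.
SIZES_GB = {
    "marine_infra": 28.0,
    "solar_farm": 19.0,
    "tree_cover": 963.0,
    "wind_turbine": 40.02,
}

# Assumption: the unconfirmed per-archive limit mentioned in the comment.
ASSUMED_LIMIT_GB = 40.0


def split_by_limit(sizes: dict[str, float], limit_gb: float) -> tuple[list[str], list[str]]:
    """Return (names that fit in one archive, names that exceed the limit)."""
    fits = sorted(name for name, size in sizes.items() if size <= limit_gb)
    over = sorted(name for name, size in sizes.items() if size > limit_gb)
    return fits, over


fits, over = split_by_limit(SIZES_GB, ASSUMED_LIMIT_GB)
print("single archive:", fits)
print("needs splitting or compression:", over)
```

Under a strict 40 GB cutoff, wind_turbine at 40.02 GB lands marginally over, so it might need compressing or splitting into two archives if the limit turns out to be exact.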

robmarkcole changed the title from "Consider hosting datasets on Huggingface" to "Consider hosting dataset on Huggingface & source.coop" on Nov 21, 2023
Collaborator

favyen2 commented Feb 28, 2024

The dataset is now available on Hugging Face. The hand-labeled datasets for individual tasks are updated regularly and we are still deciding how to release those on an ongoing basis.
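
For anyone who only needs part of the data, a minimal sketch of a selective download with the `huggingface_hub` client follows. The repo id comes from the link earlier in this thread; the file layout inside the repo is not described here, so any patterns passed in are the caller's guess, and the helper name and default directory are made up for illustration. Since the URL has no `datasets/` prefix, the sketch assumes the default repo type.

```python
# Repo id taken from the URL posted earlier in this thread.
REPO_ID = "allenai/satlas-pretrain"


def fetch_matching(patterns, local_dir="satlas", repo_id=REPO_ID):
    """Download only the repo files whose paths match the given glob patterns.

    Requires `pip install huggingface_hub`. The import is deferred so that
    defining this helper does not need the package or network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=list(patterns),  # e.g. ["*.tar"]; layout is an assumption
        local_dir=local_dir,
    )
```

Calling `fetch_matching(["*sentinel2*"])` would fetch only files with `sentinel2` in their path, if any exist under that name; check the repo's file listing first rather than relying on that pattern.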

favyen2 closed this as completed on Feb 28, 2024