Missing files in Hugging Face #46

Open
ADHuan opened this issue Jun 21, 2024 · 3 comments

Comments

ADHuan commented Jun 21, 2024

Hi,

I recently downloaded the full Satlas pretrain dataset from Hugging Face. However, upon reviewing the file lists, I noticed that several tar files for the NAIP dataset are missing for the years 2013, 2016, 2017, and 2018. The missing files are illustrated in the attached screenshots below.

[Screenshots of the file lists: 2013-1, 2013-2, 2016, 2017, 2018]

Additionally, I have roughly calculated the total data size of the available tar files for 2013, 2016, 2017, and 2018. There seems to be a mismatch between this total and the data size listed in your AWS S3 bucket.
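
A minimal sketch of that kind of comparison, assuming two hypothetical listings that map tar file names to sizes in bytes (one taken from the S3 bucket, one from Hugging Face). The file names and sizes below are made up for illustration and are not the real SatlasPretrain file names.

```python
def compare_listings(expected, available):
    """Compare two {filename: size_in_bytes} listings.

    Returns (missing, size_mismatch, expected_total, available_total), where
    `missing` holds names present in `expected` but absent from `available`,
    and `size_mismatch` holds names present in both with different sizes.
    """
    missing = sorted(set(expected) - set(available))
    size_mismatch = sorted(
        name
        for name in expected
        if name in available and expected[name] != available[name]
    )
    return missing, size_mismatch, sum(expected.values()), sum(available.values())


# Hypothetical listings; names and sizes are placeholders, not real data.
s3_listing = {
    "naip_2018_part0.tar": 100,
    "naip_2018_part1.tar": 200,
    "naip_2018_part2.tar": 300,
}
hf_listing = {
    "naip_2018_part0.tar": 100,
    "naip_2018_part2.tar": 250,
}

missing, size_mismatch, s3_total, hf_total = compare_listings(s3_listing, hf_listing)
print("missing on Hugging Face:", missing)
print("size differs:", size_mismatch)
print("total bytes (S3 vs Hugging Face):", s3_total, hf_total)
```

Comparing per-file names as well as totals helps separate files that are absent from files that are present but truncated.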

Could you please verify the completeness of the dataset and ensure that all files are available for download on Hugging Face?

Collaborator

favyen2 commented Jun 21, 2024

Yes, it is complete, and I checked that the total data size matches for 2018 (1.7 TB). Your screenshots show that files from every year are available. NAIP images are captured once every 2-3 years at a given location.

favyen2 closed this as completed Jun 21, 2024

favyen2 commented Jun 21, 2024

OK, I see now what you mean about the missing files. I will check it.

favyen2 reopened this Jun 21, 2024

favyen2 commented Aug 1, 2024

In the meantime, please download from S3: https://github.com/allenai/satlas/blob/main/satlaspretrain_urls.txt
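
A rough sketch of picking the relevant entries out of a URL list, assuming the list is plain text with one URL per line and that the dataset name and year appear somewhere in each URL. Both are assumptions about the file's layout, and the `example.com` URLs below are placeholders rather than the real S3 paths.

```python
def select_urls(url_list_text, keyword, years):
    """Return URLs that contain `keyword` and at least one of the given years."""
    selected = []
    for line in url_list_text.splitlines():
        url = line.strip()
        # Skip blank lines and comment lines.
        if not url or url.startswith("#"):
            continue
        if keyword in url and any(str(year) in url for year in years):
            selected.append(url)
    return selected


# Placeholder content standing in for the downloaded URL list.
sample_list = """
https://example.com/satlas/naip-2013.tar
https://example.com/satlas/naip-2015.tar
https://example.com/satlas/sentinel2-2018.tar
https://example.com/satlas/naip-2018.tar
"""

urls = select_urls(sample_list, "naip", [2013, 2016, 2017, 2018])
for url in urls:
    print(url)
```

The matching here is a plain substring check, so it should be adjusted once the actual naming pattern in the list is known.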
