Can you please add the Stanford dog dataset? #4504
Would you like to give it a try, @dgrnd4? (Maybe with the help of the dataset author?)
@julien-c I'm sorry, but I have no idea how it works. Can I add the dataset myself by following the "instructions to add a new dataset"?
Hi! The ADD NEW DATASET instructions are indeed the best place to start. It's also perfectly fine to add a dataset if it's public, even if it's not yours. Let me know if you need some additional pointers.
If no one is working on this, I could take this up!
@khushmeeet This is the link where I already added the dataset. If you can, I would ask you to do the following:
Thank you!!
Hi @khushmeeet! Thanks for the interest. You can self-assign the issue by commenting #self-assign. Also, I think we can skip @dgrnd4's steps, as we try to avoid any custom processing on top of raw data. One can later copy the script and override
Thanks @mariosasko @dgrnd4. Since the dataset is already on the Hub and preprocessing is not recommended, I am not sure whether there is any other task to do. However, I can't seem to find relevant
@khushmeeet @mariosasko The point is that the images must be processed so that they all have the same size; otherwise they can't be used for tasks such as training.
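Bringing every image to one size is usually done as a per-example transform after loading rather than inside the generation script. Below is a minimal sketch of the geometry for one common recipe (scale the shorter side to a target length, then center-crop a square). The function name and the 224-pixel default are assumptions for illustration; the code is plain Python and only computes the size and crop box you would hand to Pillow.

```python
from typing import Tuple

Size = Tuple[int, int]
Box = Tuple[int, int, int, int]


def resize_and_crop_geometry(width: int, height: int, target: int = 224) -> Tuple[Size, Box]:
    """Hypothetical helper: compute a resize size and a center-crop box.

    Returns (new_size, crop_box): new_size is the (width, height) to resize to,
    crop_box is a (left, upper, right, lower) box of target x target pixels.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")

    # Scale so that the shorter side becomes exactly `target` pixels;
    # max() guards against rounding a side below `target`.
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))

    # Center a target x target window inside the resized image.
    left = (new_w - target) // 2
    upper = (new_h - target) // 2
    return (new_w, new_h), (left, upper, left + target, upper + target)


if __name__ == "__main__":
    # A 500x375 landscape image is resized to 299x224, then center-cropped to 224x224.
    print(resize_and_crop_geometry(500, 375))  # ((299, 224), (37, 0, 261, 224))
```

With Pillow, `img.resize(new_size).crop(crop_box)` would apply the result. In the `datasets` library, such a function would typically be wrapped in a callable passed to `Dataset.map` or `Dataset.with_transform`, which keeps the raw files untouched, in line with the "no custom processing on top of raw data" policy.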
@dgrnd4 Yes, but this can be done after loading. @khushmeeet The linked version is implemented as a no-code dataset and is generated directly from the ZIP archive, but our "GitHub" datasets (datasets without a user/org namespace on the Hub) need a generation script, and you can find one here.
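Much of a generation script's job is turning archive paths into (key, example) pairs, which is the shape a `datasets` builder's `_generate_examples` method yields. The sketch below is standalone and hypothetical: it does not import `datasets`, the function names are made up, and it assumes the archive uses one folder per breed named `<WordNet ID>-<breed>`, for example `n02085620-Chihuahua`. The paths in the usage example are illustrative.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Iterator, Tuple


def breed_from_folder(folder: str) -> str:
    """'n02085620-Chihuahua' -> 'Chihuahua' (assumes '<wnid>-<breed>' naming)."""
    # Split on the first hyphen only: some breed names contain hyphens themselves.
    _, sep, breed = folder.partition("-")
    return breed if sep else folder


def generate_examples(image_paths: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (key, example) pairs like a `_generate_examples` method would."""
    for path in sorted(image_paths):  # sorted -> deterministic example order
        p = PurePosixPath(path)
        if p.suffix.lower() != ".jpg":
            continue  # skip non-image files that may sit in the archive
        # The file path doubles as a unique key; the parent folder gives the label.
        yield path, {"image": path, "label": breed_from_folder(p.parent.name)}


if __name__ == "__main__":
    paths = [
        "Images/n02085620-Chihuahua/example_1.jpg",
        "Images/n02095314-wire-haired_fox_terrier/example_2.jpg",
    ]
    for key, example in generate_examples(paths):
        print(key, example["label"])
```

In an actual script, the label column would usually be declared as a `datasets.ClassLabel` feature so that breed names map to integer ids.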
@mariosasko The point is this: I use something like the following to get a 90% train / 10% test split, and then carve out a validation set (10% of the whole dataset):
The structure of the data is:
So how do I extract the images and labels? EDIT: splitting the dataset into train, test, and validation:
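The arithmetic behind that split: after holding out 10% for test, a validation set equal to 10% of the whole dataset is 0.1 / 0.9 = 1/9 of the remaining 90%, which leaves an 80/10/10 split. A minimal standard-library sketch over example indices follows; the function name and the seed value are hypothetical.

```python
import random
from typing import List, Tuple


def three_way_split(n: int, test_frac: float = 0.1, val_frac: float = 0.1,
                    seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and return (train, val, test) index lists.

    Both fractions are relative to the *whole* dataset.
    """
    if n < 0 or test_frac < 0 or val_frac < 0 or test_frac + val_frac >= 1:
        raise ValueError("need n >= 0 and non-negative fractions summing to < 1")

    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded -> reproducible split

    # First cut: hold out the test set.
    n_test = round(n * test_frac)
    test, rest = indices[:n_test], indices[n_test:]

    # Second cut: rescale so validation is val_frac of the whole dataset,
    # i.e. val_frac / (1 - test_frac) of what is left (0.1 / 0.9 = 1/9 here).
    n_val = round(len(rest) * val_frac / (1 - test_frac))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = three_way_split(100)
    print(len(train), len(val), len(test))  # 80 10 10
```

With the `datasets` library, the same two cuts can be made with `Dataset.train_test_split(test_size=0.1, seed=42)` followed by a second `train_test_split(test_size=1/9, seed=42)` on the resulting `"train"` split. Because of rounding, split sizes can differ by one example from the exact fractions.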
#self-assign
I have opened a PR adding the stanford-dog dataset. It contains only the dataset generation script, with no data preprocessing code. Let me know if any changes are required or if anything should be added to the README.
Is this issue still open? I am new to open source and would like to take this one as my first contribution.
I didn't know about this issue until now, but I added my version of the dataset, with the bounding boxes, to the Hub. I could have made it cleaner, though, by building the splits from the .txt files and putting the annotations into the COCO format.
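Putting boxes into COCO format mostly means turning corner coordinates into `[x_min, y_min, width, height]`. A minimal sketch, assuming the per-image annotations are PASCAL VOC-style XML with `<object>/<name>` and `<object>/<bndbox>` elements; the function name and the sample coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Annotation = Dict[str, Union[str, List[float]]]


def voc_to_coco_bboxes(xml_text: str) -> List[Annotation]:
    """Parse VOC-style XML and return one {'category', 'bbox'} dict per object.

    'bbox' follows the COCO convention [x_min, y_min, width, height].
    """
    root = ET.fromstring(xml_text)
    annotations: List[Annotation] = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # an object without a box carries no localization info
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        annotations.append({
            "category": obj.findtext("name", default=""),
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
        })
    return annotations


SAMPLE = """<annotation>
  <object>
    <name>Chihuahua</name>
    <bndbox><xmin>25</xmin><ymin>10</ymin><xmax>276</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(voc_to_coco_bboxes(SAMPLE))
    # [{'category': 'Chihuahua', 'bbox': [25.0, 10.0, 251.0, 488.0]}]
```

VOC coordinates are often treated as inclusive pixel indices, and some converters therefore add 1 to the width and height; this sketch does not.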
Adding a Dataset
Instructions to add a new dataset can be found here.