Add generate_to_hf Method to FastData Class #9
Hey @AndreaFrancis, thanks for working on this! I've left a few comments regarding the huggingface_hub implementation. Let me know if you have any questions :)
Sorry this sat for so long, @AndreaFrancis! Please @ me or request a review from me so I get notified, otherwise it risks getting lost in my inbox 😅. This is awesome, thank you so much for putting this together!! I'm very excited to use this myself since we have a project that will be a lot easier with this added 🤓.
As it stands now, I believe that the order of the inputs and what gets saved to the hub will not be preserved. I think it would be quite difficult to get that functionality, so just documenting it is good since the original
Could you add that note and also the suggestions from @Wauplin to your PR, ensuring that it's added to the ipynb instead of the py files? After that I think it is good to go 🤓
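For readers who do need the original order, one possible workaround is sketched below. It assumes that records are produced concurrently and therefore arrive in completion order, which the thread does not confirm. The helper fake_generate and the idx field are hypothetical and are not part of fastdata; the idea is only that tagging each record with its input position lets a consumer sort the saved rows afterwards.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_generate(text: str) -> str:
    # Hypothetical stand-in for a model call (NOT a fastdata function).
    # The random sleep makes completion order differ from input order.
    time.sleep(random.uniform(0, 0.01))
    return text.upper()


def generate_with_index(inputs, max_workers=4):
    """Run fake_generate concurrently and tag each record with its input index."""
    records = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the position of its input.
        futures = {pool.submit(fake_generate, x): i for i, x in enumerate(inputs)}
        for fut in as_completed(futures):
            i = futures[fut]
            records.append({"idx": i, "input": inputs[i], "output": fut.result()})
    return records  # completion order, not input order


def restore_order(records):
    """Sort records back into input order using the assumed 'idx' field."""
    return sorted(records, key=lambda r: r["idx"])
```

With this layout the saved files can stay unordered, and anyone loading the dataset can sort on the index column when order matters.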
Co-authored-by: Lucain <lucainp@gmail.com>
Thank you for your feedback, @Wauplin and @ncoop57! I’ve addressed all your suggestions regarding CommitScheduler and included the order topic in the code. @ncoop57, please let me know if I missed anything. Here are the steps I followed:
Let me know if any further adjustments are needed :)
Hey @AndreaFrancis looks good, however, I'm having issues running the code. It executes successfully, but the repo generated is empty. I double checked my HF token and it is properly set. I tried fresh pip installing the repo and running the command
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 9.26it/s]
A new repository has been create on ncoop57/my_dataset
[I am going to the beach this weekend ➡ *Voy a la playa este fin de semana*, I am going to the gym after work ➡ *Voy al gimnasio después del trabajo*, I am going to the park with my kids ➡ *Voy al parque con mis hijos*, I am going to the movies with my friends ➡ *Voy al cine con mis amigos*, I am going to the store to buy some groceries ➡ *Voy a la tienda a comprar algunos comestibles*, I am going to the library to read some books ➡ *Voy a la biblioteca a leer algunos libros*, I am going to the zoo to see the animals ➡ *Voy al zoológico a ver a los animales*, I am going to the museum to see the art ➡ *Voy al museo a ver el arte*, I am going to the restaurant to eat some food ➡ *Voy al restaurante a comer algo de comida*]
I similarly ran the code cells in the core.ipynb to no success. Here are the two repos it created:
I also checked the local file system and nothing was created. Any ideas for what I'm doing wrong?
Co-authored-by: Lucain <lucainp@gmail.com>
Thank you for your guidance! I had initially expected this behavior too, but I mistakenly tried fixing it with
This might be ready for another review (hopefully, the final one! 🤞)
Great! Looking good from a huggingface_hub side! 🤗
Great, thanks so much @AndreaFrancis and @Wauplin!! Super excited to use this feature 🤓
Amazing work, this is a super nice feature!
Related Issue:
This PR addresses Issue #3.
Introduced a new method, generate_to_hf, in the FastData class. This method facilitates seamless integration with Hugging Face by generating and uploading datasets directly to the Hugging Face Hub with the following new parameters:
Functionality:
max_items_per_file records.
fastdata and synthetic tags for easier discovery of datasets created using this library.
Users will find datasets tagged with fastdata on Hugging Face Datasets, making it easier to explore AI-generated data.
Sample Output:
Check out a sample dataset created using this method: FastData Example.
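To illustrate what a per-file cap such as max_items_per_file might mean in practice, here is a minimal sketch that splits records into JSON Lines files with at most that many rows each. This is not the PR's actual implementation: the function name write_chunked_jsonl, the data_00000.jsonl naming scheme, and the JSONL format are all assumptions made for illustration. In the PR discussion, uploading is handled through huggingface_hub's CommitScheduler, which periodically pushes the contents of a local folder to a Hub repo; the sketch stops at writing local files, so it needs no token or network access.

```python
import json
from pathlib import Path


def write_chunked_jsonl(records, folder, max_items_per_file=2):
    """Write records into JSONL files holding at most max_items_per_file rows each.

    Hypothetical helper: the file naming and format are assumptions,
    not taken from the fastdata source.
    """
    if max_items_per_file < 1:
        raise ValueError("max_items_per_file must be at least 1")
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), max_items_per_file):
        chunk = records[start:start + max_items_per_file]
        # Zero-padded chunk number keeps files in a stable lexical order.
        path = folder / f"data_{start // max_items_per_file:05d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for rec in chunk:
                # ensure_ascii=False keeps accented characters readable.
                f.write(json.dumps(rec, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths
```

A folder written this way is the kind of directory a scheduler could watch and upload at intervals, and smaller files mean a failed upload loses fewer records.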