Load GitHub datasets from Hub #4059
The documentation is not available anymore as the PR was closed or merged.
Currently the GitHub datasets versioning is synced with the … We could stop this behavior and always use the latest version of the dataset, but then any breaking change would break GitHub datasets for previous versions of the library. It would be nice to think about tools that allow backward compatibility if we ever need to make a breaking change in some datasets. Maybe a way to specify which revision of the dataset to use based on the … If we keep this behavior, then maybe add a note in setup.py to push to PyPI only after the …
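The idea of choosing a dataset revision based on the installed library version can be sketched as follows. This is a minimal, hypothetical illustration, not what this PR implements: the `(minimum library version, revision)` pairs, the revision names, and the helpers `parse_version` / `pick_revision` are all assumptions made up for the example. Each pair records the first library version a revision supports, so an older library falls back to an older revision instead of breaking.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted release string like '1.18.3' into (1, 18, 3).

    Assumption: only numeric dotted versions; pre-releases such as
    '2.0.0.dev0' are out of scope for this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def pick_revision(installed: str, requirements: List[Tuple[str, str]]) -> str:
    """Pick the dataset repository revision to load for a library version.

    `requirements` is a hypothetical list of (minimum library version, revision)
    pairs, one per breaking change in the dataset script. The newest revision
    whose minimum version is <= the installed version wins.
    """
    installed_v = parse_version(installed)
    compatible = [
        (parse_version(min_version), revision)
        for min_version, revision in requirements
        if parse_version(min_version) <= installed_v
    ]
    if not compatible:
        raise ValueError(f"no dataset revision supports library version {installed}")
    # Compare on the version tuple only, then return the matching revision name.
    return max(compatible, key=lambda pair: pair[0])[1]


# Hypothetical revisions: "legacy-1.x" for 1.x libraries, "main" from 2.0.0 on.
requirements = [("1.0.0", "legacy-1.x"), ("2.0.0", "main")]
print(pick_revision("1.18.3", requirements))  # legacy-1.x
print(pick_revision("2.1.0", requirements))   # main
```

The chosen revision would then be handed to whatever fetches the dataset script; recent versions of `datasets.load_dataset` accept a `revision` argument, which is one plausible place to plug it in.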
@lhoestq I was going to increase the … But then I realized that loading from the Hub would work as well. That is why I opened this PR. We should definitely decide which behavior we want:
Not sure which would be better in the long term...
Not sure I understand this. Previous versions of the …
Yes, you're right, previous versions of …
Ideally we should drop the differences between GitHub datasets and community datasets, and maybe provide a way to fall back to an older version of a dataset repository if the user's …
I just noticed I literally opened the same PR lol. I'm still convinced that we should do a better version compatibility check, but we can look at that later IMO.
Normally in open-source projects, when there is a duplicate PR, the later one is tagged as "duplicate" and closed. 😜 Let me get this clear in my mind: you're saying that the blocking point that was preventing this PR from being merged is no longer a blocking point and could be addressed in a subsequent PR?
Let me close the duplicate one, sorry
Yes 🙈
Cool! LGTM :)
Finally, we'll remove the differences between Hub datasets and GitHub datasets ^^
(Note that after this PR, any change made to a dataset will affect all `datasets` versions from now on.)
Yes, this aligns the behavior with Hub datasets, where this is already the case.
We have repeatedly had connection errors when requesting GitHub, because the site is sometimes unavailable.
This PR requests the Hub instead, once all GitHub datasets are mirrored on the Hub.
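The switch from GitHub to the Hub essentially changes where a dataset script is downloaded from. The sketch below only builds the two URLs and downloads nothing; it is an illustration, not this PR's code. Assumptions: the GitHub scripts lived at `datasets/<name>/<name>.py` in the huggingface/datasets repository and were fetched at a tag matching the library version, the mirrored Hub repository has the same name as the dataset, and Hub files are served under `/datasets/<repo>/resolve/<revision>/<file>`. The helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed endpoints; both helpers below are hypothetical.
GITHUB_RAW = "https://raw.githubusercontent.com/huggingface/datasets"
HUB_ENDPOINT = "https://huggingface.co"


def github_script_url(name: str, revision: str) -> str:
    """Old source: the dataset script fetched from the GitHub repository."""
    return f"{GITHUB_RAW}/{revision}/datasets/{name}/{name}.py"


def hub_script_url(name: str, revision: str = "main") -> str:
    """New source: the same script fetched from its mirror dataset repo on the Hub."""
    # Revisions such as "refs/pr/1" contain slashes, so encode them fully.
    return f"{HUB_ENDPOINT}/datasets/{name}/resolve/{quote(revision, safe='')}/{name}.py"


print(github_script_url("squad", "1.18.3"))
# https://raw.githubusercontent.com/huggingface/datasets/1.18.3/datasets/squad/squad.py
print(hub_script_url("squad"))
# https://huggingface.co/datasets/squad/resolve/main/squad.py
```

Since the Hub copy follows a single revision (here `main`) rather than a per-release tag, every library version sees the latest script, which matches the note in the review that changes to a dataset now affect all versions.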
Fix #2048
Related to: