Consider hosting datasets in this GitHub repo, using our XetData extension #25

Closed
srinify opened this issue Dec 2, 2023 · 1 comment

Comments


srinify commented Dec 2, 2023

Hey folks! I work at XetHub, where we scale Git to handle large files. We recently launched a GitHub integration that brings this capability to GitHub repos too, and it's free forever for public repos.

As an example, we brought 100+ GB of ONNX model files into this repo: https://github.com/xetdata/onnx-models

Once people have installed our tiny Git extension, cloning this repo downloads all of the large files too. They can also choose to skip the large files.

If that sounds interesting, I'd love to collaborate to make this happen.

srinify changed the title from "Consider hosting on XetHub" to "Consider hosting datasets in this GitHub repo, using our XetData extension" on Dec 2, 2023
Collaborator

favyen2 commented Dec 14, 2023

We still intend to upload the data to Hugging Face. However, we will use separate repositories for the code and for the data, since even with LFS, keeping the data in the same repo makes a full copy of the repo unwieldy (especially with our 40 TB dataset).

favyen2 closed this as completed on Dec 14, 2023