Storage Proposal #297

Open
ArthurMinovsky opened this issue Aug 25, 2023 · 1 comment
@ArthurMinovsky
Collaborator

The proposal should include:

  • Why we need more storage
  1. Model checkpoints
  2. Data sourcing
  3. Common Crawl creation
@boat1603
Collaborator

  • We will use 80TB to store multimodal datasets: 50TB for the 5B files of the LAION image dataset [reference] and 30TB for other multimodal datasets (COYO-700M, Conceptual 12M, and others).
  • We will use 40TB for our multimodal experiments (model weights and preprocessed or cleaned data).
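
As a quick sanity check on the figures above, here is a minimal sketch that sums the requested allocations. The dictionary keys are illustrative labels I made up for this example, not names used anywhere in the project; the numbers are the ones quoted in this comment.

```python
# Storage figures quoted in this comment, in terabytes.
# The keys are hypothetical labels chosen for illustration only.
STORAGE_TB = {
    "laion_5b_images": 50,
    "other_multimodal_datasets": 30,  # COYO-700M, Conceptual 12M, and others
    "multimodal_experiments": 40,     # weights, preprocessed or cleaned data
}


def total_storage_tb(budget):
    """Return the sum of all allocations in the budget, in TB."""
    return sum(budget.values())


# Dataset storage is the LAION share plus the other multimodal datasets.
dataset_tb = STORAGE_TB["laion_5b_images"] + STORAGE_TB["other_multimodal_datasets"]
total_tb = total_storage_tb(STORAGE_TB)

print(f"Datasets: {dataset_tb} TB")
print(f"Total requested: {total_tb} TB")
```

Under these figures the dataset share comes to 80TB and the overall request to 120TB, assuming the two bullets do not overlap.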

new5558 assigned new5558 and unassigned boat1603 on Sep 19, 2023