Open
Description
Dear maintainers,
I've been working with the pretraining data of different LLMs, and I've noticed that a dataset with a unique name, such as "dolmino-mix-1124", can encompass other well-known datasets. For example, "dolmino-mix-1124" contains DCLM and FLAN. Since this information is relevant to people working with pre- and post-training data, could we add a column listing the datasets included in each model's training data, along with a way to compare them across models? The list of datasets is long, so comparing them by hand is impractical. Thank you!
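To make the request more concrete, here is a rough sketch of the kind of comparison I have in mind. It is only an illustration under assumptions: the model names (`model-a`, `model-b`), the second mix (`example-mix`), and the helper names are hypothetical, and the component list for "dolmino-mix-1124" covers only the two datasets mentioned above, not its full composition.

```python
# Mapping from a named data mix to the datasets it is known to include.
# Only the dolmino-mix-1124 entry reflects this issue; it is partial.
MIX_COMPONENTS = {
    "dolmino-mix-1124": {"DCLM", "FLAN"},
    "example-mix": {"DCLM"},  # hypothetical mix for illustration
}

# Hypothetical models and the mixes listed as their training data.
MODEL_TRAINING_DATA = {
    "model-a": ["dolmino-mix-1124"],
    "model-b": ["example-mix"],
}


def expand(datasets):
    """Return the named datasets plus any datasets they are known to include."""
    expanded = set()
    for name in datasets:
        expanded.add(name)
        expanded |= MIX_COMPONENTS.get(name, set())
    return expanded


def compare(first, second):
    """Compare the expanded training data of two models."""
    a = expand(MODEL_TRAINING_DATA[first])
    b = expand(MODEL_TRAINING_DATA[second])
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }


result = compare("model-a", "model-b")
print(result["shared"])  # datasets both models were exposed to
```

With a structure like this, the overlap between two models (here DCLM) shows up even though the top-level dataset names differ.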