Document model repo best practices #53

osanseviero · 2021-08-25T13:59:47Z

I would like to start documenting good practices for model repos so we can add them to our documentation.

Some come to mind rather quickly:

- One model per repo (avoid having multiple models in the same repo)
- Add metadata to the model card
- Add metrics to the metadata of the model card

How do we want to encourage users to store multiple checkpoints in a single repo? There was a related discussion for GPT-J and for other contributions.

- One branch per checkpoint?
- One commit per checkpoint?

My suggestion:

- When using checkpoints for version control, use one commit per checkpoint.
  - For example, Mistral has 600 checkpoints per model, and each checkpoint corresponds to a different training step. In that case, I think it makes sense to have a commit/tag per checkpoint.
- When the checkpoints are variants of a model with slightly different characteristics, use one branch per checkpoint.
  - For example, GPT-J 6B has a half-precision checkpoint and a single-precision checkpoint.
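
To make the suggestion above concrete, here is a minimal sketch in Python. It assumes a recent `huggingface_hub` release in which `HfApi.upload_folder`, `HfApi.create_tag`, and `HfApi.create_branch` are available. The repo id, the folder names, the `step-0001000` tag scheme, and the helper names `checkpoint_tag`, `publish_training_checkpoints`, and `publish_variant` are all hypothetical. It shows one possible layout, not an agreed convention.

```python
from typing import Any, List, Mapping, Optional


def checkpoint_tag(step: int) -> str:
    """Hypothetical naming scheme: zero-padded so tags sort by step."""
    return f"step-{step:07d}"


def _default_api() -> Any:
    # Imported lazily so the helpers can be exercised with a stand-in object.
    # A real run needs network access and a write token.
    from huggingface_hub import HfApi
    return HfApi()


def publish_training_checkpoints(
    repo_id: str,
    step_to_folder: Mapping[int, str],
    api: Optional[Any] = None,
) -> List[str]:
    """One commit per checkpoint on main, each commit tagged with its step."""
    api = api if api is not None else _default_api()
    tags = []
    for step in sorted(step_to_folder):
        # Each upload creates a new commit that replaces same-named files.
        api.upload_folder(
            repo_id=repo_id,
            folder_path=step_to_folder[step],
            commit_message=f"Add checkpoint at step {step}",
        )
        tag = checkpoint_tag(step)
        # With no explicit revision, the tag points at the head of main.
        api.create_tag(repo_id=repo_id, tag=tag)
        tags.append(tag)
    return tags


def publish_variant(
    repo_id: str,
    branch: str,
    folder: str,
    api: Optional[Any] = None,
) -> None:
    """One branch per variant, e.g. a half-precision copy of the weights."""
    api = api if api is not None else _default_api()
    api.create_branch(repo_id=repo_id, branch=branch, exist_ok=True)
    api.upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        revision=branch,
        commit_message=f"Add {branch} weights",
    )
```

With a layout like this, a user would pick a checkpoint by passing the tag or branch name as the `revision` argument when loading or downloading the model.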

I'm just gathering ideas so any are welcome!

cc @patrickvonplaten @julien-c @LysandreJik @lewtun @NielsRogge I hope I did not forget anyone

StellaAthena · 2021-08-25T14:25:11Z

In #13022, @xloem raises an important point about git-lfs:

> Just a note that for organizing models outside the hugging face cache, it is more convenient to have subfolders or separate repos for different content, because git-lfs can be very slow filtering many gigabytes when switching branches. Not planning on arguing the point, just making sure the use-case is shared.
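
For context on the trade-off described here, this is a rough sketch of fetching a single revision or a single subfolder through the Hub client instead of switching branches in a local git-lfs clone. It assumes `huggingface_hub.snapshot_download` with its `revision`, `allow_patterns`, and `local_dir` parameters. The helper name `fetch_checkpoint` and all repo, branch, and folder names are hypothetical. It does not settle which layout is better; it only shows that both layouts can be consumed without a branch checkout.

```python
from typing import Any, Callable, Optional


def fetch_checkpoint(
    repo_id: str,
    revision: Optional[str] = None,
    subfolder: Optional[str] = None,
    local_dir: Optional[str] = None,
    downloader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Download one branch/tag, or one subfolder, of a model repo."""
    if downloader is None:
        # Imported lazily; a real call needs network access.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download

    kwargs = {"repo_id": repo_id}
    if revision is not None:
        # Branch-per-checkpoint or tag-per-checkpoint layout.
        kwargs["revision"] = revision
    if subfolder is not None:
        # Subfolder-per-checkpoint layout: only matching files are fetched.
        kwargs["allow_patterns"] = [f"{subfolder.strip('/')}/*"]
    if local_dir is not None:
        # Place files in a plain directory outside the default cache.
        kwargs["local_dir"] = local_dir
    return downloader(**kwargs)
```

For example, `fetch_checkpoint("user/model", revision="float16", local_dir="gptj-fp16")` would target a branch, while `fetch_checkpoint("user/model", subfolder="step-1000")` would target a subfolder on the default branch.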

lewtun · 2022-03-21T15:33:31Z

cc @lvwerra who has experience with creating model repos with multiple checkpoints for largish models like CodeParrot

osanseviero self-assigned this Aug 25, 2021

LysandreJik transferred this issue from huggingface/huggingface_hub Mar 16, 2022

osanseviero added the "documentation" label (Improvements or additions to documentation) Mar 17, 2022

osanseviero mentioned this issue Apr 15, 2022

Docs Revamp: new "Repositories" page for hub-docs #92

Merged

NimaBoscarino mentioned this issue May 25, 2022

Documentation Revamp #156

Merged
