Mbarthez training #7

SimonBenhamou · 2024-05-16T16:38:45Z

I can't find the code used to continue mbart pretraining to create mbarthez anywhere in the repository. Did you make it available somewhere?

More specifically, I'm interested in understanding how you adapted the mbart tokenizer. It looks like the checkpoint on huggingface uses the barthez tokenizer, not the mbart tokenizer. So my question is: how did you align the pretrained mbart embeddings with the barthez tokenizer vocab?
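
To make the question concrete, here is a minimal sketch of one way such an alignment could be done. It is only an assumption for illustration, not the method actually used for mbarthez: tokens that appear in both vocabularies keep their pretrained row, and tokens that exist only in the new vocabulary get a small random initialization. The function name `remap_embeddings`, the toy vocabularies, and the `std` value are all hypothetical.

```python
import random


def remap_embeddings(old_vocab, old_emb, new_vocab, seed=0, std=0.02):
    """Build an embedding table for new_vocab from one indexed by old_vocab.

    old_vocab / new_vocab: dict mapping token string -> row index
        (new_vocab ids are assumed to be contiguous, 0..len-1).
    old_emb: list of rows (lists of floats), indexed by old_vocab ids.
    Returns (new_emb, copied), where copied counts the reused rows.
    """
    rng = random.Random(seed)
    dim = len(old_emb[0])
    new_emb = [None] * len(new_vocab)
    copied = 0
    for token, new_id in new_vocab.items():
        old_id = old_vocab.get(token)
        if old_id is not None:
            # Shared token: reuse the pretrained embedding row.
            new_emb[new_id] = list(old_emb[old_id])
            copied += 1
        else:
            # Token unknown to the old vocab: small random init (assumption).
            new_emb[new_id] = [rng.gauss(0.0, std) for _ in range(dim)]
    return new_emb, copied


# Toy example with made-up vocabularies and 2-dimensional embeddings.
old_vocab = {"<s>": 0, "▁le": 1, "▁the": 2}
old_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_vocab = {"<s>": 0, "▁chat": 1, "▁le": 2}

new_emb, copied = remap_embeddings(old_vocab, old_emb, new_vocab)
```

In this toy case `<s>` and `▁le` are copied over, while `▁chat` is freshly initialized. Is this roughly what was done, or was a different strategy used?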
