Issue #33 points out that the mBERT vocabulary contains 99 unused entries, intended for users to add task-specific vocabulary during fine-tuning. We could use some of these entries to improve the vocabulary's coverage of Irish without having to train from scratch. However, so as not to get in the way of users of our models who want the unused entries for their own tasks, we should not use all 99.
One way to choose the entries to add would be to induce new vocabularies from a clean Irish corpus, reducing the target vocabulary size until the number of new entries, i.e. entries that are not in the mBERT vocabulary, is less than or equal to the number of entries we want to add, say 49.
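A minimal sketch of this selection procedure, under stated assumptions: `induce_vocab` is a hypothetical callable that returns the vocabulary induced from the Irish corpus for a given target size (for example a wrapper around a WordPiece trainer), and the unused slots are assumed to appear in the vocabulary file as lines of the form `[unusedN]`. The function names, the step size and the default budget of 49 are illustrative, not part of any existing code in this project.

```python
from typing import Callable, Iterable, List, Set


def pick_new_entries(
    induce_vocab: Callable[[int], Iterable[str]],  # hypothetical inducer
    existing_vocab: Set[str],
    start_size: int,
    budget: int = 49,
    step: int = 1,
) -> List[str]:
    """Shrink the induced vocabulary until at most `budget` entries are new."""
    size = start_size
    while size > 0:
        # Keep the inducer's order; drop entries the existing vocabulary has.
        new_entries = [t for t in induce_vocab(size) if t not in existing_vocab]
        if len(new_entries) <= budget:
            return new_entries
        size -= step
    return []


def fill_unused_slots(vocab_lines: List[str], new_entries: List[str]) -> List[str]:
    """Overwrite '[unusedN]' placeholders, in file order, with the new entries."""
    remaining = list(new_entries)
    result = []
    for line in vocab_lines:
        # Assumption: unused slots look like '[unused1]', '[unused2]', ...
        if remaining and line.startswith("[unused") and line.endswith("]"):
            result.append(remaining.pop(0))
        else:
            result.append(line)
    if remaining:
        raise ValueError(f"{len(remaining)} entries did not fit into unused slots")
    return result


if __name__ == "__main__":
    # Toy inducer: pretends the corpus yields a fixed ranking of candidates.
    ranked = ["an", "agus", "bhí", "raibh", "chuig"]
    existing = {"an", "agus"}
    chosen = pick_new_entries(lambda n: ranked[:n], existing, start_size=5, budget=2)
    vocab = ["[PAD]", "[unused1]", "[unused2]", "[unused3]", "an", "agus"]
    print(fill_unused_slots(vocab, chosen))
```

Because only as many placeholders are overwritten as there are chosen entries, the remaining `[unusedN]` slots stay available for users' own tasks.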
jowagner changed the title from "Populate unused vocabulary entries of mBERT model" to "Populate unused vocabulary entries of our mBERT-based models" on Nov 27, 2020