I normally use indicnlp to tokenize and Moses to train the MT system, but your model is giving better accuracy. Can you give an insight into the amount of corpus used to train the model? Thank you.
fairseq/data/cvit/corpora.py handles datasets with tag-based inclusion, which I specify through a configuration file (example). An example training script on our cluster looks like this.
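For what it's worth, here is a minimal sketch of what tag-based inclusion can look like; the `Corpus` fields, tags, and paths below are all hypothetical illustrations, not the actual contents of corpora.py:

```python
# Hypothetical sketch of tag-based corpus selection; the fields, tags,
# and paths are illustrative, not taken from corpora.py itself.
from collections import namedtuple

Corpus = namedtuple("Corpus", ["name", "path", "lang", "tags"])

REGISTRY = [
    Corpus("ilci", "/data/ilci/train.hi", "hi", frozenset({"train", "multi"})),
    Corpus("ufal", "/data/ufal/train.ta", "ta", frozenset({"train"})),
    Corpus("wat-dev", "/data/wat/dev.hi", "hi", frozenset({"dev"})),
]

def select(registry, required_tags):
    """Keep only corpora whose tags include every required tag."""
    required = frozenset(required_tags)
    return [c for c in registry if required <= c.tags]

# A configuration file would then just name the tags to include.
for corpus in select(REGISTRY, {"train"}):
    print(corpus.name, corpus.lang, corpus.path)
```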
Hey, is there a way to add vocabulary (I mean words) to the model instead of retraining the entire model? Can we edit the files in mm-all-iter1 to do this?
This paper might have some useful information, I think. I'd just retrain with the new vocabulary; the turnaround is approximately a day on 4 GPUs to start getting reasonable numbers. This one used 1080 Tis or 2080 Tis.
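If you do want to experiment with editing the vocabulary files directly, a rough sketch along these lines shows why retraining is still needed. It assumes a standard fairseq dictionary file; the filename `dict.hi.txt` and the new tokens are hypothetical examples:

```python
# Sketch: extend a fairseq dictionary file with new words.
# 'dict.hi.txt' and the tokens below are hypothetical examples.
from fairseq.data import Dictionary

d = Dictionary.load("dict.hi.txt")
old_size = len(d)

for word in ["नमस्ते", "शुक्रिया"]:  # hypothetical new tokens
    d.add_symbol(word)

d.save("dict.hi.txt")
print(f"vocab grew from {old_size} to {len(d)}")

# Caveat: the checkpoint's embedding matrix still has old_size rows,
# so the model must be resized and the new rows start out randomly
# initialized; without further training they carry no useful signal.
```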