Initial checkpoints
The AR and NAR model checkpoints for Mars 5. The AR model was trained for 1.68M steps and the NAR model for 1.26M steps. Because training used gradient accumulation, the number of optimizer updates is a fraction of these step counts (steps divided by the number of gradient accumulation steps).
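The relation between training steps and optimizer updates can be sketched as below. This is only an illustration: the accumulation factor is not stated here, so `HYPOTHETICAL_ACCUM_STEPS` is an assumed placeholder value, and the helper `optimizer_updates` is a hypothetical name, not part of the Mars 5 code.

```python
def optimizer_updates(training_steps: int, grad_accum_steps: int) -> int:
    """Return the number of optimizer updates for a given number of
    training steps when gradients are accumulated over
    `grad_accum_steps` steps before each update."""
    if grad_accum_steps < 1:
        raise ValueError("grad_accum_steps must be >= 1")
    return training_steps // grad_accum_steps


# Step counts as reported for the checkpoints (rounded figures).
AR_STEPS = 1_680_000
NAR_STEPS = 1_260_000

# ASSUMPTION: the real accumulation factor is not given in this document.
# The value 4 is a placeholder chosen purely for illustration.
HYPOTHETICAL_ACCUM_STEPS = 4

ar_updates = optimizer_updates(AR_STEPS, HYPOTHETICAL_ACCUM_STEPS)
nar_updates = optimizer_updates(NAR_STEPS, HYPOTHETICAL_ACCUM_STEPS)

print(f"AR updates (assuming accumulation of {HYPOTHETICAL_ACCUM_STEPS}): {ar_updates}")
print(f"NAR updates (assuming accumulation of {HYPOTHETICAL_ACCUM_STEPS}): {nar_updates}")
```

With an accumulation factor of 1 the update count equals the step count; any larger factor reduces it proportionally.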