Hi, I would like to know how much VRAM is normally required to train this model with similar hyperparameters. Thanks