I am attempting to reproduce the pre-training stage of VideoLLaMA2. I have already downloaded the Valley and LLaVA image datasets and started pre-training. I noticed that the video tensor produced by LazySupervisedDataset and DataCollatorForSupervisedDataset has shape (16, 3, 336, 336), but without making any modifications, the video tensor arriving in the forward method of VideoLLaMA2MistralForCausalLM has shape (2, 3, 336, 336). I cannot find where this change happens, and I do not understand the logic behind it. Could you help me resolve this issue?
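One way to narrow this down is to log the video tensor shape right at the entry of the model's forward and compare it with the shape the collator emits. The sketch below is a hypothetical debugging helper, not part of the VideoLLaMA2 codebase; the keyword name "images" is an assumption and should be replaced with whatever key the collator actually passes.

```python
# Hypothetical debugging helper, not part of the VideoLLaMA2 repo. It wraps
# model.forward so the shape of the incoming video tensor is printed on every
# call, making it easy to see whether the (16, 3, 336, 336) tensor from the
# collator has already been reduced to (2, 3, 336, 336) by the time forward runs.
import functools

import torch


def log_video_shapes(model, video_kw="images"):
    """Patch model.forward to print the video tensor shape at entry."""
    original_forward = model.forward

    @functools.wraps(original_forward)
    def wrapped_forward(*args, **kwargs):
        video = kwargs.get(video_kw)
        if isinstance(video, torch.Tensor):
            print(f"[debug] {video_kw} shape at forward entry: {tuple(video.shape)}")
        elif isinstance(video, (list, tuple)):
            shapes = [tuple(v.shape) for v in video if isinstance(v, torch.Tensor)]
            print(f"[debug] {video_kw} shapes at forward entry: {shapes}")
        return original_forward(*args, **kwargs)

    model.forward = wrapped_forward
    return model
```

Calling log_video_shapes(model) right after the model is built, together with a matching print in DataCollatorForSupervisedDataset, narrows the change to whatever sits between the collator and forward (for example a trainer wrapper or a frame-sampling setting), though which component is responsible depends on the actual training script.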
@BenoitHanotte @hill2hill @lixin4ever @hangzhang-nlp Could you help me with this issue? It is very important to me, and I have already spent about two weeks on it. I would greatly appreciate your assistance in resolving it.