training gpu hours #5

Open
Li-Jicheng opened this issue Mar 4, 2025 · 3 comments

Comments

@Li-Jicheng

Hi, great work! Can you share how many GPUs were used and the total training time? Thanks!

@shenyunhang
Collaborator

Hi @Li-Jicheng, here are the statistics:

Stage-1: 12 nodes for 24 hours
Stage-2: 12 nodes for 76 hours (2024-10-14 13:31 -> 2024-10-17 17:55)
Stage-3: 32 nodes for 26 hours (2024-11-27 12:54 -> 2024-11-28 14:53)
Stage-4: 32 nodes for 78 hours (2024-11-28 16:16 -> 2024-12-01 22:33)

Each node has 16 NPUs with 64 GB of memory.
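For a rough sense of the total compute, the node counts and wall-clock hours above can be converted into NPU-hours. This is only back-of-envelope arithmetic on the numbers in this comment, assuming all 16 NPUs per node were used for the full duration of each stage:

```python
# Back-of-envelope NPU-hours from the reported schedule (numbers from the comment above).
stages = {
    "stage-1": (12, 24),  # (nodes, wall-clock hours)
    "stage-2": (12, 76),
    "stage-3": (32, 26),
    "stage-4": (32, 78),
}
NPUS_PER_NODE = 16

total = 0
for name, (nodes, hours) in stages.items():
    npu_hours = nodes * NPUS_PER_NODE * hours
    total += npu_hours
    print(f"{name}: {nodes} nodes x {hours} h = {npu_hours:,} NPU-hours")
print(f"total: {total:,} NPU-hours")
```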

For more details, you may refer to our training logs at:

https://huggingface.co/VITA-MLLM/Long-VITA-16K/raw/main/log_node11.txt

https://huggingface.co/VITA-MLLM/Long-VITA-128K/raw/main/log_node31.txt

https://huggingface.co/VITA-MLLM/Long-VITA-1M/raw/main/log_node31.txt
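If it helps, the logs can also be pulled programmatically. A minimal sketch using `huggingface_hub`, with the repo IDs and filenames taken from the raw URLs above:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Repo IDs and log filenames taken from the URLs above; each file is
# downloaded into the local Hugging Face cache and its path is printed.
logs = [
    ("VITA-MLLM/Long-VITA-16K", "log_node11.txt"),
    ("VITA-MLLM/Long-VITA-128K", "log_node31.txt"),
    ("VITA-MLLM/Long-VITA-1M", "log_node31.txt"),
]
for repo_id, filename in logs:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(repo_id, "->", path)
```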

@Li-Jicheng
Author

Li-Jicheng commented Mar 12, 2025

Thank you for your prompt response. I have a couple of follow-up questions, if you’re willing to assist:

1. Long-VITA’s long image context capability is impressive. In my scenario, I’ll be working with prompts that include lengthy text instructions alongside a few images. Do you think Long-VITA is well-suited for this? Are there other models you’d recommend exploring for such tasks?

2. I have access to 4 nodes, each equipped with 8 A100 GPUs. (I can potentially scale to 8 nodes, but only for short-term experiments, up to a week at most.) Do you believe this hardware setup is sufficient to replicate results similar to Long-VITA’s?

Appreciate your insights!

@shenyunhang
Collaborator

1. Long-VITA is suitable for this task. You could also try the other models that are compared in our paper.

2. With 32 A100s, you can likely train a 512K-context model. Only stage-4 needs 64 GPUs to train the 1024K-context model. If you use a smaller model, e.g., 7B, 32 A100s should be able to train the 1024K model.
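As a rough sanity check on the hardware question, the asker's budget (4 nodes × 8 A100 = 32 GPUs, with an optional one-week burst to 8 nodes) can be compared against the NPU-hours reported earlier in this thread. This is only arithmetic on the numbers quoted above and ignores throughput and memory differences between A100 GPUs and the NPUs used for training:

```python
# Reported training compute across the four stages (nodes, hours) from the statistics above,
# at 16 NPUs per node.
reported_npu_hours = sum(nodes * 16 * hours
                         for nodes, hours in [(12, 24), (12, 76), (32, 26), (32, 78)])

# Available budget: 4 nodes x 8 A100 GPUs steady state,
# or 8 nodes x 8 GPUs for at most one week of experiments.
steady_gpus = 4 * 8
burst_gpu_hours = 8 * 8 * 24 * 7

print(f"reported: {reported_npu_hours:,} NPU-hours")
print(f"one-week burst budget: {burst_gpu_hours:,} GPU-hours")
print(f"steady-state days to match: {reported_npu_hours / (steady_gpus * 24):.0f}")
```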
