training gpu hours #5

Open
Li-Jicheng opened this issue Mar 4, 2025 · 3 comments

Comments

@Li-Jicheng

Hi, great work! Can you share how many GPUs were used and the total training time? Thanks!

@shenyunhang
Collaborator

Hi @Li-Jicheng, here are the statistics:

Stage-1: 12 nodes for 24 hours
Stage-2: 12 nodes for 76 hours (2024-10-14 13:31 -> 2024-10-17 17:55)
Stage-3: 32 nodes for 26 hours (2024-11-27 12:54 -> 2024-11-28 14:53)
Stage-4: 32 nodes for 78 hours (2024-11-28 16:16 -> 2024-12-01 22:33)

Each node has 16 NPUs with 64 GB of memory.
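For a rough sense of the total compute, the node counts and wall-clock hours above can be converted into NPU-hours. This is only back-of-envelope arithmetic on the numbers in this comment, assuming all 16 NPUs per node were used for the full duration of each stage:

```python
# Back-of-envelope NPU-hours from the reported schedule (numbers from the comment above).
stages = {
    "stage-1": (12, 24),  # (nodes, wall-clock hours)
    "stage-2": (12, 76),
    "stage-3": (32, 26),
    "stage-4": (32, 78),
}
NPUS_PER_NODE = 16

total = 0
for name, (nodes, hours) in stages.items():
    npu_hours = nodes * NPUS_PER_NODE * hours
    total += npu_hours
    print(f"{name}: {nodes} nodes x {hours} h = {npu_hours:,} NPU-hours")
print(f"total: {total:,} NPU-hours")
```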

For more details, you may refer to our training logs at:

https://huggingface.co/VITA-MLLM/Long-VITA-16K/raw/main/log_node11.txt

https://huggingface.co/VITA-MLLM/Long-VITA-128K/raw/main/log_node31.txt

https://huggingface.co/VITA-MLLM/Long-VITA-1M/raw/main/log_node31.txt
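If it helps, the logs can also be pulled programmatically. A minimal sketch using `huggingface_hub`, with the repo IDs and filenames taken from the raw URLs above:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Repo IDs and log filenames taken from the URLs above; each file is
# downloaded into the local Hugging Face cache and its path is printed.
logs = [
    ("VITA-MLLM/Long-VITA-16K", "log_node11.txt"),
    ("VITA-MLLM/Long-VITA-128K", "log_node31.txt"),
    ("VITA-MLLM/Long-VITA-1M", "log_node31.txt"),
]
for repo_id, filename in logs:
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(repo_id, "->", path)
```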

@Li-Jicheng
Author

Li-Jicheng commented Mar 12, 2025

Thank you for your prompt response. I have a couple of follow-up questions, if you’re willing to assist:

1. Long-VITA’s long image context capability is impressive. In my scenario, I’ll be working with prompts that include lengthy text instructions alongside a few images. Do you think Long-VITA is well-suited for this? Are there other models you’d recommend exploring for such tasks?

2. I have access to 4 nodes, each equipped with 8 A100 GPUs. (I can potentially scale to 8 nodes, but only for short-term experiments, up to a week at most.) Do you believe this hardware setup is sufficient to replicate results similar to Long-VITA’s?

Appreciate your insights!

@shenyunhang
Collaborator

1. Long-VITA is suitable for this task. You could also try the other models that are compared in our paper.

2. With 32 A100s, you can likely train a 512K-context model. Only stage-4 needs 64 GPUs to train the 1024K-context model. If you use a smaller model, e.g., 7B, 32 A100s should be able to train the 1024K model.
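As a rough sanity check on the hardware question, the asker's budget (4 nodes × 8 A100 = 32 GPUs, with an optional one-week burst to 8 nodes) can be compared against the NPU-hours reported earlier in this thread. This is only arithmetic on the numbers quoted above and ignores throughput and memory differences between A100 GPUs and the NPUs used for training:

```python
# Reported training compute across the four stages (nodes, hours) from the statistics above,
# at 16 NPUs per node.
reported_npu_hours = sum(nodes * 16 * hours
                         for nodes, hours in [(12, 24), (12, 76), (32, 26), (32, 78)])

# Available budget: 4 nodes x 8 A100 GPUs steady state,
# or 8 nodes x 8 GPUs for at most one week of experiments.
steady_gpus = 4 * 8
burst_gpu_hours = 8 * 8 * 24 * 7

print(f"reported: {reported_npu_hours:,} NPU-hours")
print(f"one-week burst budget: {burst_gpu_hours:,} GPU-hours")
print(f"steady-state days to match: {reported_npu_hours / (steady_gpus * 24):.0f}")
```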
