codellama training issue with Multiple GPUs in SFTTrainer #844
Comments
Can you compare also
Hi @Humza1996
Is torch.bfloat16 not a good option?
This looks similar to the issue I created at #921, specifically how all the GPUs are idle except one.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I am trying to train codellama-7B in int8 using the SFT trainer from trl. The model size after quantization is around 8 GB. I first trained it on an RTX 3090 24GB (35 TFLOPS), and complete training took ~380 hours. Then I upgraded my system and am now trying to train it on 4x A4000 (~64 GB total, 82 TFLOPS). Training time on the new setup has increased to ~4200 hours, which is surprising and clearly wrong: it should be lower than on the previous setup because both VRAM and compute power have increased. What is the best way to utilize multiple GPUs for LLM training? I am using the following code block:
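The block below is not the original code (which is not reproduced above); it is a minimal illustrative sketch of an int8 SFTTrainer setup of this kind, with an assumed model id, dataset, and hyperparameters. The detail that usually matters for multi-GPU throughput is the device_map: device_map="auto" shards the quantized model across all four cards (naive model parallelism, only one GPU busy at a time), whereas loading the whole model onto each process's own GPU with device_map={"": local_rank} lets accelerate run data-parallel training across all of them.

```python
# Illustrative sketch only: model id, dataset, and hyperparameters are placeholders.
import os

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint

# Each DDP process (spawned by `accelerate launch`) gets its own LOCAL_RANK.
local_rank = int(os.environ.get("LOCAL_RANK", 0))

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    # Put the full int8 model on this process's GPU instead of sharding it
    # across all GPUs with device_map="auto".
    device_map={"": local_rank},
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset with a "text" column; replace with the actual training data.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="./codellama-sft",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        bf16=True,
        ddp_find_unused_parameters=False,
    ),
)
trainer.train()
```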
GPU usage:
accelerate env:
- `Accelerate` version: 0.23.0
- `Accelerate` default config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: MULTI_GPU
  - mixed_precision: bf16
  - use_cpu: False
  - debug: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
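With this accelerate config, the training script has to be started through accelerate so that one process per GPU is spawned; starting it with plain python runs only a single process. The script name below is assumed:

```bash
accelerate launch train_codellama.py
```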