inference ---- out of memory #23

Open
AugWrite opened this issue Jul 9, 2024 · 2 comments
AugWrite commented Jul 9, 2024

Which demo is the parallelized inference code? I invoked it as below following the README, but it keeps failing with an error.
CUDA_VISIBLE_DEVICES=4,5,6,7 python cli_demo_sat.py --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 --fp16 --quant 8 --english --nproc_per_node 4

(The server has 8× V100-32G GPUs.)
Error:
[2024-07-08 18:19:08,531] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-08 18:19:10,471] [WARNING] Failed to load bitsandbytes:No module named 'bitsandbytes'
[2024-07-08 18:19:12,346] [INFO] building CogCoMModel model ...
[2024-07-08 18:19:12,348] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
[2024-07-08 18:19:27,219] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 17639685376
[2024-07-08 18:19:45,973] [INFO] [RANK 0] CUDA out of memory. Tried to allocate 86.00 MiB. GPU
[2024-07-08 18:19:45,974] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:19,633] [INFO] [RANK 0] > successfully loaded /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:21,178] [INFO] [RANK 0] > Quantizing model weight to 8 bits
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/CogCoM/CogCoM-main/cogcom/demo/cli_demo_sat_zd.py", line 167, in <module>
[rank0]: main()
[rank0]: File "/data/CogCoM/CogCoM-main/cogcom/demo/cli_demo_sat_zd.py", line 70, in main
[rank0]: quantize(model.transformer, args.quant)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 282, in quantize
[rank0]: replace_linear(model)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 247, in replace_linear
[rank0]: setattr(module, name, QuantizedColumnParallelLinear(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 158, in __init__
[rank0]: super(QuantizedColumnParallelLinear, self).__init__(*args, **kwargs)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/mpu/layers.py", line 256, in __init__
[rank0]: self.weight = Parameter(torch.empty(self.output_size_per_partition,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU
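The log reports "number of parameters on model parallel rank 0: 17639685376", i.e. the whole 17B model is on a single rank because the script ran in model-only mode (model parallel size 1). A back-of-envelope check (a sketch, not from the thread) shows why one 32 GB V100 cannot hold it even in fp16:

```python
# Back-of-envelope: can a 17B-parameter model fit on one 32 GB GPU in fp16?
# Parameter count taken from the log line "number of parameters ...: 17639685376".
PARAMS = 17_639_685_376
BYTES_PER_PARAM_FP16 = 2          # fp16 weights only; activations and buffers are extra
GIB = 1024 ** 3

weights_gib = PARAMS * BYTES_PER_PARAM_FP16 / GIB
print(f"fp16 weights alone: {weights_gib:.1f} GiB")   # ~32.9 GiB
print("fits on a 32 GiB GPU:", weights_gib < 32)      # False
```

The weights alone already exceed 32 GiB, so the allocator fails on a small 96 MiB request near the end of loading; sharding the model across 4 GPUs brings each rank down to roughly 8 GiB of weights.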

terryII commented Jul 9, 2024

You can launch it like this: torchrun --standalone --nnodes=1 --nproc-per-node=4 cogcom/demo/cli_demo_sat.py --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 --fp16 --quant 8. Add the quantization flag only if you need it. @AugWrite
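The torchrun launch fixes this because torchrun exports RANK, WORLD_SIZE, and LOCAL_RANK for each worker process, exactly the variables the earlier log warning said were missing; with them set, SAT shards the model across the 4 GPUs instead of loading it all on rank 0. A minimal sketch of the distinction (launched_with_torchrun is a hypothetical helper, not part of SAT):

```python
import os

# torchrun exports these for every worker process; distributed frameworks
# read them to enable multi-process model parallelism instead of falling
# back to single-process "model-only mode".
REQUIRED = ["RANK", "WORLD_SIZE", "LOCAL_RANK"]

def launched_with_torchrun(env=os.environ):
    """Hypothetical helper: True when the distributed env vars are present."""
    return all(k in env for k in REQUIRED)

# Plain `python cli_demo_sat.py ...` -> vars absent -> whole model on one GPU:
print(launched_with_torchrun({}))  # False
# Under `torchrun --nproc-per-node=4 ...` each worker sees something like:
print(launched_with_torchrun({"RANK": "0", "WORLD_SIZE": "4", "LOCAL_RANK": "0"}))  # True
```

Note that the script's own --nproc_per_node flag in the original command does not spawn processes; only the torchrun launcher does.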

AugWrite commented Jul 9, 2024

Wow, thank you! That solved it!! (^ ▽ ^)
