Which demo is the parallelized inference demo? I invoked it as follows according to the README, but it keeps failing with an error:
CUDA_VISIBLE_DEVICES=4,5,6,7 python cli_demo_sat.py --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 --fp16 --quant 8 --english --nproc_per_node 4
(The server has 8x V100-32G GPUs.)
Error output:
[2024-07-08 18:19:08,531] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-08 18:19:10,471] [WARNING] Failed to load bitsandbytes:No module named 'bitsandbytes'
[2024-07-08 18:19:12,346] [INFO] building CogCoMModel model ...
[2024-07-08 18:19:12,348] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-07-08 18:19:12,349] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
[2024-07-08 18:19:27,219] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 17639685376
[2024-07-08 18:19:45,973] [INFO] [RANK 0] CUDA out of memory. Tried to allocate 86.00 MiB. GPU
[2024-07-08 18:19:45,974] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:19,633] [INFO] [RANK 0] > successfully loaded /data/CogCoM/CogCoM/cogcom-chat-17b/50000/mp_rank_00_model_states.pt
[2024-07-08 18:20:21,178] [INFO] [RANK 0] > Quantizing model weight to 8 bits
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/CogCoM/CogCoM-main/cogcom/demo/cli_demo_sat_zd.py", line 167, in <module>
[rank0]: main()
[rank0]: File "/data/CogCoM/CogCoM-main/cogcom/demo/cli_demo_sat_zd.py", line 70, in main
[rank0]: quantize(model.transformer, args.quant)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 282, in quantize
[rank0]: replace_linear(model)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 280, in replace_linear
[rank0]: replace_linear(sub_module)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 247, in replace_linear
[rank0]: setattr(module, name, QuantizedColumnParallelLinear(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/quantization/kernels.py", line 158, in __init__
[rank0]: super(QuantizedColumnParallelLinear, self).__init__(*args, **kwargs)
[rank0]: File "/data/anaconda3/envs/CogCOM/lib/python3.11/site-packages/sat/mpu/layers.py", line 256, in __init__
[rank0]: self.weight = Parameter(torch.empty(self.output_size_per_partition,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU
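
For reference: the log reports "initializing model parallel with size 1" and "You are using model-only mode", which is what a plain single-process `python` launch produces, and it explicitly asks for the RANK, WORLD_SIZE and LOCAL_RANK environment variables. Those are normally set by launching one process per GPU with `torchrun`. Below is a minimal sketch of such a launch, assuming cli_demo_sat.py supports the standard SAT/CogVLM-style model-parallel launch; the flags and paths simply mirror the command above, so please verify against the CogCoM README.

# Sketch only, not a verified fix: torchrun starts 4 processes (one per visible GPU)
# and sets RANK / WORLD_SIZE / LOCAL_RANK for each of them, as the log requests.
# Whether the demo then shards the 17B weights across the 4 ranks is an assumption.
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --standalone --nnodes=1 --nproc-per-node=4 \
    cli_demo_sat.py \
    --from_pretrained /data/CogCoM/CogCoM/cogcom-chat-17b \
    --local_tokenizer /data/CogCoM/CogCoM-main/vicuna-7b-v1.5 \
    --fp16 --quant 8 --english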