gptManagerBenchmark std::bad_alloc error #66

clockfly · 2023-10-23T01:46:19Z

machine: nv 4090 24GB
model: llama13B-gptq (the GPU memory should be enough)
Problem: std::bad_alloc error when starting GptManager.
Expected: runs successfully.

root@ubuntu-devel:/code/tensorrt_llm/cpp/build/benchmarks# CUDA_VISIBLE_DEVICES=0  ./gptManagerBenchmark     --model llama13b_gptq_compiled     --engine_dir /code/tensorrt_llm/models/llama13b_gptq_compiled     --type IFB     --dataset /code/tensorrt_llm/models/llama13b_gptq/preprocessed_dataset.json  --log_level verbose --kv_cache_free_gpu_mem_fraction 0.2
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

The text was updated successfully, but these errors were encountered:

jdemouth-nvidia · 2023-10-23T04:55:23Z

Hi @clockfly ,

Can you share the command used to build the model, please? We'd like to see if we can reproduce the problem.

Thanks,
Julien

ljayx · 2023-10-23T10:51:25Z

@jdemouth-nvidia
Same issue. Please help.

build:

python build.py --model_dir /models/Llama-2-7b-chat-hf \
	--dtype float16 \
	--use_gpt_attention_plugin float16 \
	--use_gemm_plugin float16 \
	--output_dir /models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--use_inflight_batching \
	--paged_kv_cache \
	--remove_input_padding

run:

CUDA_VISIBLE_DEVICES=7 ${proj_dir}/cpp/bbb/benchmarks/gptManagerBenchmark \
	--model=llama \
	--engine_dir=/models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--dataset=${proj_dir}/benchmarks/cpp/preprocessed_dataset.json \
	--log_level=verbose

log:

[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

Model: llama2-7b
GPU: 8 x A100 80GB, but used only the last
CPU: 96 x Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
Memory: 768GB

juney-nvidia · 2023-10-23T13:09:10Z

@jdemouth-nvidia Same issue. Please help.

build:

python build.py --model_dir /models/Llama-2-7b-chat-hf \
	--dtype float16 \
	--use_gpt_attention_plugin float16 \
	--use_gemm_plugin float16 \
	--output_dir /models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--use_inflight_batching \
	--paged_kv_cache \
	--remove_input_padding

run:

CUDA_VISIBLE_DEVICES=7 ${proj_dir}/cpp/bbb/benchmarks/gptManagerBenchmark \
	--model=llama \
	--engine_dir=/models/Llama-2-7b-chat-hf/fp16/1-gpu/ \
	--dataset=${proj_dir}/benchmarks/cpp/preprocessed_dataset.json \
	--log_level=verbose

log:

[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

Model: llama2-7b GPU: 8 x A100 80GB, but used only the last CPU: 96 x Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz Memory: 768GB

Thanks for sharing the concrete steps, we will try to reproduce it and hopefully provide some feedback tomorrow.

June

ljayx · 2023-10-24T06:35:25Z

Hi June, how's the issue going.
I'm stuck here. From the backtrace, it looks like the binary throwed from std::filesystem::path ctor. Not sure if CXX11_ABI matters since it's different with CXX17.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZNSt10filesystem7__cxx114pathC1ERKS1_+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

kaiyux · 2023-10-24T14:21:51Z

Hi @clockfly @ljayx , I did not reproduce that issue.

Can you share the operating system that you're using? Thanks.

ryxli · 2023-10-24T20:04:45Z

Hi @kaiyux, was also able to reproduce this issue with both V1 and IFB:

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

Build Engine

python build.py --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --enable_context_fmha \
                --world_size 8 \
                --tp_size 8 \
                --model_dir $HF_DIR \
                --output_dir $ENGINE_DIR \
                --use_inflight_batching \
                --paged_kv_cache

Run

mpirun -n 8 --allow-run-as-root gptManagerBenchmark \
    --model llama \
    --engine_dir $ENGINE_DIR \
    --type V1 \
    --dataset $DATASET_OUT \
    --log_level verbose

Out

[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 8, rank: 0
[TensorRT-LLM][INFO] MPI size: 8, rank: 7
[TensorRT-LLM][INFO] MPI size: 8, rank: 1
[TensorRT-LLM][INFO] MPI size: 8, rank: 2
[TensorRT-LLM][INFO] MPI size: 8, rank: 5
[TensorRT-LLM][INFO] MPI size: 8, rank: 3
[TensorRT-LLM][INFO] MPI size: 8, rank: 6
[TensorRT-LLM][INFO] MPI size: 8, rank: 4
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
[TensorRT-LLM][ERROR] std::bad_alloc
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[7622,1],7]
  Exit code:    1

ljayx · 2023-10-25T00:27:10Z

Hi @clockfly @ljayx , I did not reproduce that issue.

Can you share the operating system that you're using? Thanks.

It is a docker container with base image nvcr.io/nvidia/pytorch:23.08-py3 running on host.

## Host:
CentOS release 7.6 (Final)
Linux sh***d 5.10.0-1.0.0.29 #1 SMP Fri Jul 28 08:23:51 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

## Container:
Ubuntu 22.04.2 LTS
Linux sh***d 5.10.0-1.0.0.29 #1 SMP Fri Jul 28 08:23:51 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

gesanqiu · 2023-10-25T06:14:16Z

Meet the same issue on the base image nvcr.io/nvidia/pytorch:23.08-py3.
Build Engine:

python build.py --model_dir /workdir/hf_models/llama-2-7b-chat-hf/ --dtype float16 --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --max_batch_size 16 --max_input_len 2048 --max_output_len 2048 --use_inflight_batching --paged_kv_cache --remove_input_padding --output_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/

Run:

root@dell:/workdir/TensorRT-LLM/cpp/build# CUDA_VISIBLE_DEVICES=1 ./benchmarks/gptManagerBenchmark --model llama_7b --engine_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/ --type IFB --dataset ../../examples/llama/llama_preprocessed_dataset.json --log_level verbose
[TensorRT-LLM][INFO] Set logger level by TRACE
[TensorRT-LLM][DEBUG] Registered plugin creator Identity version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator BertAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator GPTAttention version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Gemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Send version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Recv version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllReduce version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator AllGather version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Layernorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Rmsnorm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator SmoothQuantGemm version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator LayernormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizePerToken version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator QuantizeTensor version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator RmsnormQuantization version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyGroupwiseQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator WeightOnlyQuantMatmul version 1 in namespace tensorrt_llm
[TensorRT-LLM][DEBUG] Registered plugin creator Lookup version 1 in namespace tensorrt_llm
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][ERROR] std::bad_alloc

gptSessionBenchmark also run into error.

root@dell:/workdir/TensorRT-LLM/cpp/build# CUDA_VISIBLE_DEVICES=1 ./benchmarks/gptSessionBenchmark --model llama_7b --engine_dir /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/ --batch_size "1" --input_output_len "60,20"
[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: Error opening engine file: /workdir/trt_llm_models/llama-2-7b-chat/fp16-inflight/1-gpu/llama_7b_float16_tp1_rank0.engine (/workdir/TensorRT-LLM/cpp/tensorrt_llm/runtime/utils/sessionUtils.cpp:42)
1       0x5555ebef46ee tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7f522d9790ed /workdir/TensorRT-LLM/cpp/build/tensorrt_llm/libtensorrt_llm.so(+0x3720ed) [0x7f522d9790ed]
3       0x5555ebf04dd3 ./benchmarks/gptSessionBenchmark(+0x23dd3) [0x5555ebf04dd3]
4       0x5555ebef7bef ./benchmarks/gptSessionBenchmark(+0x16bef) [0x5555ebef7bef]
5       0x7f51f0bd8d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f51f0bd8d90]
6       0x7f51f0bd8e40 __libc_start_main + 128
7       0x5555ebef9fe5 ./benchmarks/gptSessionBenchmark(+0x18fe5) [0x5555ebef9fe5]
root@dell:/workdir/TensorRT-LLM/cpp/build#

ljayx · 2023-10-25T13:16:49Z

Hi June, how's the issue going. I'm stuck here. From the backtrace, it looks like the binary throwed from std::filesystem::path ctor. Not sure if CXX11_ABI matters since it's different with CXX17.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZNSt10filesystem7__cxx114pathC1ERKS1_+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

I resolved the issue. The root cause is CXX11_ABI related.

Root cause:
The Dockerfile skipped install pytorch, but the default pytorch inside nvcr.io/nvidia/pytorch:23.08-py3 has cxx11_abi.
The CMakeLists.txt enabled USE_CXX11_ABI as per the pytorch cxx11_abi.
Since GptManager used std::filesystem::path and this api is different between C++11 and C++17, the binary throwed from the ctor.

Solution:
Just:

bash install_pytorch.sh src_non_cxx11_abi

It should an issue would confuse users.

ryxli · 2023-10-25T20:36:15Z

*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(ZNSt10filesystem7__cxx114pathC1ERKS1+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

@ljayx thanks for finding the issue and sharing the workaround.
Would you mind sharing how you get the backtrace to show up in the console?

ryxli · 2023-10-25T20:39:53Z

Also share this, unsure if it's relevant

the tensorrtllm_backend Dockerfile sets the pytorch installation arg as

# `pypi` for x86_64 arch and `src_cxx11_abi` for aarch64 arch
ARG TORCH_INSTALL_TYPE="pypi"
COPY tensorrt_llm/docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh

https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/dockerfile/Dockerfile.trt_llm_backend#L33C1-L37C1

VS skipped in this package Dockerfile:

# Install PyTorch
ARG TORCH_INSTALL_TYPE="skip"
COPY docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh

Using same base image:

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG BASE_TAG=23.08-py3

ryxli · 2023-10-25T23:57:53Z

Install pytorch with src_non_cxx11_abi seems to fix the issues encountered when using the batch manager, but maybe breaks some of the other scripts ?

python3 build.py .....

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1099, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 86, in <module>
    from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 34, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 112, in <module>
    from .launch import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 27, in <module>
    from ..utils.other import merge_dicts
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 24, in <module>
    from .transformer_engine import convert_model
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
    import transformer_engine.pytorch as te
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
    from .layernorm_linear import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
    from .. import cpp_extensions as tex
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
    from transformer_engine_extensions import *
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tensorrt_llm/examples/llama/build.py", line 24, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1101, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

ljayx · 2023-10-26T01:57:18Z

*** Process received signal ***
...
...
[ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt28__throw_bad_array_new_lengthv+0x0)[0x7f790fd70265]
[ 9] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(ZNSt10filesystem7__cxx114pathC1ERKS1+0xff)[0x7f794d1995ef]
[10] /ljay/workspace/local/TensorRT-LLM/cpp/bbb/tensorrt_llm/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager18TrtGptModelFactory6createERKNSt10filesystem7__cxx114pathENS0_15TrtGptModelTypeEiNS0_15batch_scheduler15SchedulerPolicyERKNS0_25TrtGptModelOptionalParamsE+0xcf)[0x7f794d19bf8f]

@ljayx thanks for finding the issue and sharing the workaround. Would you mind sharing how you get the backtrace to show up in the console?

@ryxli I write a simple demo, FYI:

int main(void) {
    auto logger = std::make_shared<TllmLogger>();
    using severity = nvinfer1::ILogger::Severity;
    logger->setLevel(severity::kWARNING);
    initTrtLlmPlugins(logger.get());

    std::filesystem::path engine_path{"/ljay/model/llama2-7b-chat-hf/trt_engines/fp16/1-gpu/"};
    auto model_type = TrtGptModelType::InflightFusedBatching;
    int32_t max_seq_len = 512;
    int32_t max_num_req = 8;
    int32_t max_beam_width = 1;
    int32_t max_tokens_in_paged_kvcache = -1;
    float kv_cache_free_gpu_mem_fraction = -1;
    bool enable_trt_overlap = false;
    uint64_t terminate_reqId = 10000;

    batch_scheduler::SchedulerPolicy scheduler_policy{batch_scheduler::SchedulerPolicy::GUARANTEED_NO_EVICT};
    auto const worldConfig = WorldConfig::mpi(*logger);

    const TrtGptModelOptionalParams& optional_params = TrtGptModelOptionalParams(
        max_num_req, max_tokens_in_paged_kvcache, kv_cache_free_gpu_mem_fraction, enable_trt_overlap);

    auto m = std::make_shared<GptManager>(engine_path, model_type, max_beam_width, scheduler_policy, requests_callback, response_callback, nullptr, nullptr, optional_params, terminate_reqId);

    for (;;)
        ;
}

ljayx · 2023-10-26T02:00:24Z

Install pytorch with src_non_cxx11_abi seems to fix the issues encountered when using the batch manager, but maybe breaks some of the other scripts ?

python3 build.py .....

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1099, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 32, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 86, in <module>
    from accelerate import dispatch_model, infer_auto_device_map, init_empty_weights
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 34, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 112, in <module>
    from .launch import (
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/launch.py", line 27, in <module>
    from ..utils.other import merge_dicts
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 24, in <module>
    from .transformer_engine import convert_model
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
    import transformer_engine.pytorch as te
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
    from .module import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
    from .layernorm_linear import LayerNormLinear
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 15, in <module>
    from .. import cpp_extensions as tex
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
    from transformer_engine_extensions import *
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tensorrt_llm/examples/llama/build.py", line 24, in <module>
    from transformers import LlamaConfig, LlamaForCausalLM
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1090, in __getattr__
    value = getattr(module, name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1089, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1101, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Hit the same issue with you. Please Nvidia experts help check it @jdemouth-nvidia @kaiyux

ljayx · 2023-10-26T03:08:29Z

tried uninstall transformer-engine, build.py and gptManagerBenchmark get to work. Not sure will it have bad influence.

pip uninstall transformer-engine

kaiyux · 2023-10-26T11:17:59Z

We are still trying to reproduce and investigate, we will get back to you when we have conclusion. Thanks.

zhaoyang-star · 2023-10-26T11:45:36Z

Same error. Keep tuned.

juney-nvidia · 2023-10-27T02:48:00Z

Thanks for the patience. We have found the root cause and are now working on the fix. We will push the fix(with other enhancements) in the recent days, and when it gets pushed, a new "announcement" will also get updated.

June

Shixiaowei02 · 2023-10-27T03:55:19Z

The fixed MR is: #152, we tested it and will merge it. Thank you all for your support and help! @zhaoyang-star @ljayx @ryxli @gesanqiu @clockfly

jdemouth-nvidia added the triaged Issue has been triaged by maintainers label Oct 23, 2023

clockfly closed this as completed Oct 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gptManagerBenchmark std::bad_alloc error #66

gptManagerBenchmark std::bad_alloc error #66

clockfly commented Oct 23, 2023

jdemouth-nvidia commented Oct 23, 2023

ljayx commented Oct 23, 2023

juney-nvidia commented Oct 23, 2023

ljayx commented Oct 24, 2023

kaiyux commented Oct 24, 2023

ryxli commented Oct 24, 2023 •

edited

Loading

ljayx commented Oct 25, 2023

gesanqiu commented Oct 25, 2023

ljayx commented Oct 25, 2023

ryxli commented Oct 25, 2023

ryxli commented Oct 25, 2023 •

edited

Loading

ryxli commented Oct 25, 2023

ljayx commented Oct 26, 2023 •

edited

Loading

ljayx commented Oct 26, 2023

ljayx commented Oct 26, 2023

kaiyux commented Oct 26, 2023

zhaoyang-star commented Oct 26, 2023

juney-nvidia commented Oct 27, 2023

Shixiaowei02 commented Oct 27, 2023 •

edited

Loading

gptManagerBenchmark std::bad_alloc error #66

gptManagerBenchmark std::bad_alloc error #66

Comments

clockfly commented Oct 23, 2023

jdemouth-nvidia commented Oct 23, 2023

ljayx commented Oct 23, 2023

juney-nvidia commented Oct 23, 2023

ljayx commented Oct 24, 2023

kaiyux commented Oct 24, 2023

ryxli commented Oct 24, 2023 • edited Loading

ljayx commented Oct 25, 2023

gesanqiu commented Oct 25, 2023

ljayx commented Oct 25, 2023

ryxli commented Oct 25, 2023

ryxli commented Oct 25, 2023 • edited Loading

ryxli commented Oct 25, 2023

ljayx commented Oct 26, 2023 • edited Loading

ljayx commented Oct 26, 2023

ljayx commented Oct 26, 2023

kaiyux commented Oct 26, 2023

zhaoyang-star commented Oct 26, 2023

juney-nvidia commented Oct 27, 2023

Shixiaowei02 commented Oct 27, 2023 • edited Loading

ryxli commented Oct 24, 2023 •

edited

Loading

ryxli commented Oct 25, 2023 •

edited

Loading

ljayx commented Oct 26, 2023 •

edited

Loading

Shixiaowei02 commented Oct 27, 2023 •

edited

Loading