qwen 2-1.5B model build error #2420

Open
rexmxw02 opened this issue Nov 6, 2024 · 4 comments
Labels: bug (Something isn't working), duplicate (This issue or pull request already exists), triaged (Issue has been triaged by maintainers)

rexmxw02 commented Nov 6, 2024

System Info

[TensorRT-LLM] TensorRT-LLM version: 0.14.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 /app/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /Qwen2-1.5B-Instruct --output_dir /qwen --dtype float16 --tp_size 1 --pp_size 1

Expected behavior

python3 /app/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /Qwen2-1.5B-Instruct --output_dir /qwen --dtype float16 --tp_size 1 --pp_size 1

actual behavior

python3 /app/tensorrt_llm/examples/qwen/convert_checkpoint.py --model_dir /Qwen2-1.5B-Instruct --output_dir /qwen --dtype float16 --tp_size 1 --pp_size 1

[TensorRT-LLM] TensorRT-LLM version: 0.14.0
0.14.0
229it [00:59, 3.88it/s]
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 303, in <module>
    main()
  File "/app/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 295, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 251, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 258, in execute
    f(args, rank)
  File "/app/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 241, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/app/tensorrt_llm/tensorrt_llm/models/qwen/model.py", line 428, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/app/tensorrt_llm/tensorrt_llm/models/model_weights_loader.py", line 408, in generate_tllm_weights
    self.load(tllm_key,
  File "/app/tensorrt_llm/tensorrt_llm/models/model_weights_loader.py", line 296, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/layers/linear.py", line 407, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
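The error means the weights loader handed `postprocess` a `None` instead of a tensor, i.e. no checkpoint tensor was matched for that key. A minimal stand-alone sketch of the failing pattern and a defensive guard (the `FakeTensor` class and `postprocess` signature are illustrative, not the actual TensorRT-LLM code):

```python
class FakeTensor:
    """Minimal stand-in for torch.Tensor, just enough to show the failure."""
    def __init__(self, dtype="float32"):
        self.dtype = dtype

    def to(self, dtype):
        # Mirrors torch.Tensor.to(dtype): returns a converted tensor.
        return FakeTensor(dtype)


def postprocess(weights, dtype="float16"):
    """Sketch of the postprocess step: cast loaded weights to the model dtype.

    In the failing run, `weights` arrives as None because the checkpoint
    loader found no tensor for the requested key, so weights.to(...) raises
    AttributeError: 'NoneType' object has no attribute 'to'.
    """
    if weights is None:
        # Defensive guard: fail with a message that names the real problem.
        raise ValueError("no tensor was loaded for this key; "
                         "check the checkpoint / key mapping")
    return weights.to(dtype)


print(postprocess(FakeTensor()).dtype)  # float16
```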


@rexmxw02 rexmxw02 added the bug Something isn't working label Nov 6, 2024
rexmxw02 commented Nov 6, 2024

Hardware: NVIDIA H20 GPU; TensorRT-LLM version: release 0.14.0.

nv-guomingz (Collaborator) commented:

Hi @rexmxw02, we've fixed this issue on the main branch rather than in the 0.14 release; please try the main branch.
A similar issue for reference: #2388 (comment)
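Since the fix landed on main after the 0.14.0 release, it can save a rebuild to check which version is actually installed before retrying. A small sketch (the `needs_upgrade` helper and its naive comparison are illustrative; it assumes purely numeric dotted versions):

```python
from importlib.metadata import version, PackageNotFoundError

def needs_upgrade(installed: str, last_broken: str = "0.14.0") -> bool:
    """Naive dotted-version comparison; assumes numeric components only."""
    def parse(v: str):
        return tuple(int(p) for p in v.split(".")[:3])
    return parse(installed) <= parse(last_broken)

try:
    # The distribution name is assumed here; adjust if pip lists it differently.
    installed = version("tensorrt_llm")
except PackageNotFoundError:
    installed = "0.14.0"  # version reported in the logs above

print(installed, "needs upgrade:", needs_upgrade(installed))
```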

@nv-guomingz nv-guomingz added duplicate This issue or pull request already exists triaged Issue has been triaged by maintainers labels Nov 6, 2024
rexmxw02 commented Nov 7, 2024

The conversion now succeeds, but the build fails:
trtllm-build --checkpoint_dir /qwen --output_dir /qwen_trt_engine_1.5b --gemm_plugin float16 --max_batch_size 120 --max_input_len 10000 --max_num_tokens 10800
[TensorRT-LLM] TensorRT-LLM version: 0.14.0
[11/07/2024-02:14:57] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set gemm_plugin to float16.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set nccl_plugin to auto.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set lookup_plugin to None.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set lora_plugin to None.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set moe_plugin to auto.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set context_fmha to True.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set remove_input_padding to True.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set reduce_fusion to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set enable_xqa to True.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set tokens_per_block to 64.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set multiple_profiles to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set paged_state to True.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set streamingllm to False.
[11/07/2024-02:14:57] [TRT-LLM] [I] Set use_fused_mlp to True.
[11/07/2024-02:14:57] [TRT-LLM] [W] Implicitly setting QWenConfig.qwen_type = qwen2
[11/07/2024-02:14:57] [TRT-LLM] [W] Implicitly setting QWenConfig.moe_intermediate_size = 0
[11/07/2024-02:14:57] [TRT-LLM] [W] Implicitly setting QWenConfig.moe_shared_expert_intermediate_size = 0
[11/07/2024-02:14:57] [TRT-LLM] [W] Implicitly setting QWenConfig.tie_word_embeddings = True
[11/07/2024-02:14:57] [TRT-LLM] [I] Compute capability: (9, 0)
[11/07/2024-02:14:57] [TRT-LLM] [I] SM count: 78
[11/07/2024-02:14:57] [TRT-LLM] [I] SM clock: 1980 MHz
[11/07/2024-02:14:57] [TRT-LLM] [I] int4 TFLOPS: 0
[11/07/2024-02:14:57] [TRT-LLM] [I] int8 TFLOPS: 2530
[11/07/2024-02:14:57] [TRT-LLM] [I] fp8 TFLOPS: 2530
[11/07/2024-02:14:57] [TRT-LLM] [I] float16 TFLOPS: 1265
[11/07/2024-02:14:57] [TRT-LLM] [I] bfloat16 TFLOPS: 1265
[11/07/2024-02:14:57] [TRT-LLM] [I] float32 TFLOPS: 632
[11/07/2024-02:14:57] [TRT-LLM] [I] Total Memory: 95 GiB
[11/07/2024-02:14:57] [TRT-LLM] [I] Memory clock: 2619 MHz
[11/07/2024-02:14:57] [TRT-LLM] [I] Memory bus width: 6144
[11/07/2024-02:14:57] [TRT-LLM] [I] Memory bandwidth: 4022 GB/s
[11/07/2024-02:14:57] [TRT-LLM] [I] NVLink is active: True
[11/07/2024-02:14:57] [TRT-LLM] [I] NVLink version: 4
[11/07/2024-02:14:57] [TRT-LLM] [I] NVLink bandwidth: 450 GB/s
[11/07/2024-02:15:00] [TRT-LLM] [I] Set dtype to float16.
[11/07/2024-02:15:00] [TRT-LLM] [I] Set paged_kv_cache to True.
[11/07/2024-02:15:00] [TRT-LLM] [W] Overriding paged_state to False
[11/07/2024-02:15:00] [TRT-LLM] [I] Set paged_state to False.
[11/07/2024-02:15:00] [TRT-LLM] [I] max_seq_len is not specified, using deduced value 32768
[11/07/2024-02:15:00] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.

[11/07/2024-02:15:00] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[11/07/2024-02:15:22] [TRT] [I] [MemUsageChange] Init CUDA: CPU +16, GPU +0, now: CPU 1637, GPU 96996 (MiB)
[11/07/2024-02:15:43] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +4382, GPU +274, now: CPU 6175, GPU 97270 (MiB)
[11/07/2024-02:15:43] [TRT-LLM] [I] Set nccl_plugin to None.
[TensorRT-LLM][ERROR] tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] CUDA runtime error in cublasCreate(handle.get()): CUBLAS_STATUS_INTERNAL_ERROR (/app/tensorrt_llm/cpp/tensorrt_llm/plugins/common/plugin.cpp:288)
1 0x7fd60decb0c9 /app/tensorrt_llm/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x5e0c9) [0x7fd60decb0c9]
2 0x7fd60df8a6cc /app/tensorrt_llm/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x11d6cc) [0x7fd60df8a6cc]
3 0x7fd60df317fe tensorrt_llm::plugins::GemmPlugin::init() + 46
4 0x7fd60df31f42 tensorrt_llm::plugins::GemmPlugin::GemmPlugin(int, int, int, int, nvinfer1::DataType, bool, float, std::shared_ptr<tensorrt_llm::plugins::CublasLtGemmPluginProfiler> const&) + 354
5 0x7fd60df324fc tensorrt_llm::plugins::GemmPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*) + 828
6 0x7fda82363799 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x163799) [0x7fda82363799]
7 0x7fda8224b35e /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x4b35e) [0x7fda8224b35e]
8 0x5635ba21bc9e /usr/bin/python3(+0x15ac9e) [0x5635ba21bc9e]
9 0x5635ba2123cb _PyObject_MakeTpCall + 603
10 0x5635ba22a3eb /usr/bin/python3(+0x1693eb) [0x5635ba22a3eb]
11 0x5635ba20a59a _PyEval_EvalFrameDefault + 25674
12 0x5635ba21c59c _PyFunction_Vectorcall + 124
13 0x5635ba205b77 _PyEval_EvalFrameDefault + 6695
14 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
15 0x5635ba205b77 _PyEval_EvalFrameDefault + 6695
16 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
17 0x5635ba22adb2 PyObject_Call + 290
18 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
19 0x5635ba22a25e /usr/bin/python3(+0x16925e) [0x5635ba22a25e]
20 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
21 0x5635ba211564 _PyObject_FastCallDictTstate + 196
22 0x5635ba22746c _PyObject_Call_Prepend + 92
23 0x5635ba342180 /usr/bin/python3(+0x281180) [0x5635ba342180]
24 0x5635ba2123cb _PyObject_MakeTpCall + 603
25 0x5635ba20afab _PyEval_EvalFrameDefault + 28251
26 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
27 0x5635ba22adb2 PyObject_Call + 290
28 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
29 0x5635ba21c59c _PyFunction_Vectorcall + 124
30 0x5635ba21160d _PyObject_FastCallDictTstate + 365
31 0x5635ba22746c _PyObject_Call_Prepend + 92
32 0x5635ba342180 /usr/bin/python3(+0x281180) [0x5635ba342180]
33 0x5635ba2123cb _PyObject_MakeTpCall + 603
34 0x5635ba20b63b _PyEval_EvalFrameDefault + 29931
35 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
36 0x5635ba22adb2 PyObject_Call + 290
37 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
38 0x5635ba21c59c _PyFunction_Vectorcall + 124
39 0x5635ba21160d _PyObject_FastCallDictTstate + 365
40 0x5635ba22746c _PyObject_Call_Prepend + 92
41 0x5635ba342180 /usr/bin/python3(+0x281180) [0x5635ba342180]
42 0x5635ba22ad4b PyObject_Call + 187
43 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
44 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
45 0x5635ba205b77 _PyEval_EvalFrameDefault + 6695
46 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
47 0x5635ba22adb2 PyObject_Call + 290
48 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
49 0x5635ba22a111 /usr/bin/python3(+0x169111) [0x5635ba22a111]
50 0x5635ba22adb2 PyObject_Call + 290
51 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
52 0x5635ba21c59c _PyFunction_Vectorcall + 124
53 0x5635ba21160d _PyObject_FastCallDictTstate + 365
54 0x5635ba22746c _PyObject_Call_Prepend + 92
55 0x5635ba342180 /usr/bin/python3(+0x281180) [0x5635ba342180]
56 0x5635ba22ad4b PyObject_Call + 187
57 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
58 0x5635ba21c59c _PyFunction_Vectorcall + 124
59 0x5635ba204827 _PyEval_EvalFrameDefault + 1751
60 0x5635ba21c59c _PyFunction_Vectorcall + 124
61 0x5635ba22adb2 PyObject_Call + 290
62 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
63 0x5635ba21c59c _PyFunction_Vectorcall + 124
64 0x5635ba22adb2 PyObject_Call + 290
65 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
66 0x5635ba21c59c _PyFunction_Vectorcall + 124
67 0x5635ba22adb2 PyObject_Call + 290
68 0x5635ba206a9d _PyEval_EvalFrameDefault + 10573
69 0x5635ba21c59c _PyFunction_Vectorcall + 124
70 0x5635ba204827 _PyEval_EvalFrameDefault + 1751
71 0x5635ba200f96 /usr/bin/python3(+0x13ff96) [0x5635ba200f96]
72 0x5635ba2f6c66 PyEval_EvalCode + 134
73 0x5635ba321b38 /usr/bin/python3(+0x260b38) [0x5635ba321b38]
74 0x5635ba31b3fb /usr/bin/python3(+0x25a3fb) [0x5635ba31b3fb]
75 0x5635ba321885 /usr/bin/python3(+0x260885) [0x5635ba321885]
76 0x5635ba320d68 _PyRun_SimpleFileObject + 424
77 0x5635ba3209b3 _PyRun_AnyFileObject + 67
78 0x5635ba31345e Py_RunMain + 702
79 0x5635ba2e9a3d Py_BytesMain + 45
80 0x7fdb24124d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fdb24124d90]
81 0x7fdb24124e40 __libc_start_main + 128
82 0x5635ba2e9935 _start + 37
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/app/tensorrt_llm/tensorrt_llm/commands/build.py", line 568, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/app/tensorrt_llm/tensorrt_llm/commands/build.py", line 423, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/app/tensorrt_llm/tensorrt_llm/commands/build.py", line 390, in build_and_save
    engine = build_model(build_config,
  File "/app/tensorrt_llm/tensorrt_llm/commands/build.py", line 383, in build_model
    return build(model, build_config)
  File "/app/tensorrt_llm/tensorrt_llm/builder.py", line 1162, in build
    model(**inputs)
  File "/app/tensorrt_llm/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/models/modeling_utils.py", line 920, in forward
    hidden_states = self.transformer.forward(**kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/models/qwen/model.py", line 222, in forward
    hidden_states = self.layers.forward(
  File "/app/tensorrt_llm/tensorrt_llm/models/modeling_utils.py", line 522, in forward
    hidden_states = layer(
  File "/app/tensorrt_llm/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/models/qwen/model.py", line 137, in forward
    attention_output = self.attention(
  File "/app/tensorrt_llm/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/layers/attention.py", line 684, in forward
    qkv = self.qkv(hidden_states, qkv_lora_params)
  File "/app/tensorrt_llm/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/app/tensorrt_llm/tensorrt_llm/layers/linear.py", line 289, in forward
    return self.multiply_collect(
  File "/app/tensorrt_llm/tensorrt_llm/layers/linear.py", line 272, in multiply_collect
    x = self.multiply_and_lora(
  File "/app/tensorrt_llm/tensorrt_llm/layers/linear.py", line 238, in multiply_and_lora
    x = _gemm_plugin(x,
  File "/app/tensorrt_llm/tensorrt_llm/layers/linear.py", line 127, in _gemm_plugin
    layer = default_trtnet().add_plugin_v2(plug_inputs, gemm_plug)
TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt.tensorrt.INetworkDefinition, inputs: List[tensorrt.tensorrt.ITensor], plugin: tensorrt.tensorrt.IPluginV2) -> tensorrt.tensorrt.IPluginV2Layer

Invoked with: <tensorrt.tensorrt.INetworkDefinition object at 0x7fdb1d530730>, [<tensorrt.tensorrt.ITensor object at 0x7fdb1d570d30>, <tensorrt.tensorrt.ITensor object at 0x7fd8da553170>], None
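The `None` in the last argument is the gemm plugin itself: after `cublasCreate` failed inside `GemmPlugin::init`, plugin creation returned a null plugin, which the Python wrapper passed straight to `add_plugin_v2`. A guard at the call site would turn the opaque TypeError into a direct error. A sketch (both `add_plugin_checked` and `FakeNetwork` are hypothetical, not the TensorRT API):

```python
class FakeNetwork:
    """Stand-in for tensorrt.INetworkDefinition, for this sketch only."""
    def add_plugin_v2(self, inputs, plugin):
        return ("plugin_layer", plugin)


def add_plugin_checked(network, inputs, plugin):
    # When plugin creation fails (e.g. the CUBLAS_STATUS_INTERNAL_ERROR
    # above), the creator returns None; fail loudly here instead of letting
    # add_plugin_v2 raise an unrelated-looking TypeError.
    if plugin is None:
        raise RuntimeError("gemm plugin creation failed; see the "
                           "cuBLAS error earlier in the build log")
    return network.add_plugin_v2(inputs, plugin)


layer = add_plugin_checked(FakeNetwork(), ["x", "w"], object())
print(layer[0])  # plugin_layer
```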

nv-guomingz (Collaborator) commented:

> CUBLAS_STATUS_INTERNAL_ERROR

Hi @rexmxw02, could you please try the latest code base to see whether the issue still exists?
I can't reproduce your issue on my H100.
