run.py modified as follows:
```python
# *** is a prompt long enough that print('input_tokens:', len(input_tokens[0]))
# reports more than about 2000 tokens. The error appears once the input exceeds
# roughly 2000 tokens, even though --max_output_len is 4096.
input = """***"""
parser.add_argument('--input_text', type=str, default=input)
```
That's because the default dynamic shared memory size is only 46 KB, which is not enough once the total sequence length exceeds about 6k tokens in a sampling kernel. You can try fixing this issue by adding:
```cpp
if (smem_size >= 46 * 1024)
{
    // Opt in to a larger dynamic shared memory allocation for the kernel.
    cudaError_t res = cudaFuncSetAttribute(
        batchApplyRepetitionPenalty<T, RepetitionPenaltyType::Additive>,
        cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size);
}
```
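To see why roughly 6k tokens is the breaking point, here is a minimal back-of-the-envelope sketch. The 8-bytes-per-token figure is an assumption (an int32 token id plus a float32 penalty per previous token), not taken from the TensorRT-LLM source; only the 46 KB threshold comes from the fix above.

```python
# Estimate when the sampling (repetition-penalty) kernel's dynamic shared
# memory exceeds the default limit, i.e. when cudaFuncSetAttribute is needed.
DEFAULT_SMEM_LIMIT = 46 * 1024  # bytes, per the snippet above

def needs_smem_opt_in(total_len: int, bytes_per_token: int = 8) -> bool:
    """Return True if total_len tokens would need more dynamic shared
    memory than the default limit (assumed bytes_per_token per token)."""
    return total_len * bytes_per_token >= DEFAULT_SMEM_LIMIT

# A 4k-token sequence fits; ~2k input plus max_output_len=4096 does not:
print(needs_smem_opt_in(4096))         # False
print(needs_smem_opt_in(2000 + 4096))  # True
```

Under this assumption the limit is crossed at 46*1024 / 8 ≈ 5888 tokens, consistent with the reported "around 2000" input tokens plus 4096 output tokens.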
Baichuan2-13B-Chat

```shell
python examples/baichuan/build.py --model_version v2_13b --max_input_len=4096 --max_output_len=4096 --model_dir ./models/Baichuan2-13B-Chat/ --dtype float16 --use_gemm_plugin float16 --use_gpt_attention_plugin float16 --use_weight_only --output_dir ./models/tmp/baichuan_v2_13b/trt_engines/fp16+flashattention+int8+4096/1-gpu/
```

Model conversion is successful.

```shell
python examples/baichuan/run.py --model_version v2_13b --max_output_len=4096 --tokenizer_dir=./models/Baichuan2-13B-Chat/ --engine_dir=./models/tmp/baichuan_v2_13b/trt_engines/fp16+flashattention+int8+4096/1-gpu/
```