A GPT model based on the triton-with-ft always generates a sequence with a length of request_max_output_len instead of ending generation with the eos_id.
#577
Closed
songkq opened this issue on Apr 24, 2023 · 2 comments
@byshiue Hi, could you please give some advice on this issue?
A GPT model based on triton-with-ft always generates a sequence of length request_max_output_len. Generation does not stop even when the eos_id is emitted. Once request_max_output_len is set, the elapsed_time stays the same regardless of the length of the input query and of output_sequence_length.
Model: nemo-megatron-gpt-5B
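For context, here is a minimal client sketch showing where end_id (the eos token) and request_output_len are passed to the FasterTransformer backend. This is an assumption-laden example: the model name "fastertransformer", the tensor names (input_ids, input_lengths, request_output_len, end_id, output_ids, sequence_length), the token ids, and the server URL follow the public fastertransformer_backend GPT examples and may differ for your config.pbtxt.

```python
# Sketch of a Triton HTTP request to a FasterTransformer GPT model.
# Tensor names, dtypes, model name, and token ids are assumptions based on the
# fastertransformer_backend GPT examples; adjust them to your deployment.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

input_ids = np.array([[50256, 15496, 995]], dtype=np.uint32)        # tokenized prompt (hypothetical ids)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
request_output_len = np.array([[64]], dtype=np.uint32)              # request_max_output_len
end_id = np.array([[50256]], dtype=np.uint32)                       # eos_id that should stop generation

inputs = []
for name, data in [("input_ids", input_ids),
                   ("input_lengths", input_lengths),
                   ("request_output_len", request_output_len),
                   ("end_id", end_id)]:
    tensor = httpclient.InferInput(name, list(data.shape), "UINT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

result = client.infer("fastertransformer", inputs)
output_ids = result.as_numpy("output_ids")
sequence_length = result.as_numpy("sequence_length")
print(output_ids, sequence_length)
```

The reported symptom is that output_ids always comes back padded out to request_output_len and sequence_length equals that maximum, even when end_id appears early in the generated tokens.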
This is a known issue; please refer to issue #487.
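Until that is resolved, a common client-side workaround is to truncate the returned tokens at the first eos_id. This is only a sketch, under the assumption that the eos token does appear in output_ids even though generation keeps running to the maximum length:

```python
import numpy as np

def truncate_at_eos(output_ids: np.ndarray, eos_id: int) -> list:
    """Cut a 1-D array of generated token ids at the first eos_id, if present."""
    ids = output_ids.tolist()
    return ids[:ids.index(eos_id)] if eos_id in ids else ids

# Example: everything from the eos token (50256 here, a hypothetical id) onward is dropped.
print(truncate_at_eos(np.array([15496, 995, 50256, 0, 0]), 50256))  # -> [15496, 995]
```

This trims the padded tail for downstream use but does not recover the wasted decode time, since the server still generates up to request_max_output_len tokens.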