Description
Hi, I'm trying to get only the newly generated tokens. Currently the model returns all token ids, including the input tokens.
I tried updating the postprocessing model to take REQUEST_INPUT_LEN from the preprocessing model, but the ensemble fails to load with this error:
E1101 04:46:29.381834 3651 model_repository_manager.cc:563] Invalid argument: in ensemble ensemble, step of model 'ensemble' receives inputs originated from different decoupled models.
This appears to happen because the postprocessing model would take inputs from both the decoupled model tensorrt_llm and the non-decoupled model preprocessing. The ensemble loads if I set model_transaction_policy.decoupled of the tensorrt_llm model to false.
I could also do the tokenization outside of the model ensemble, but that duplicates computation. Any suggestions?
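
For reference, here is a minimal sketch of the outside-the-ensemble workaround I mean. It assumes the returned ids for a single sequence start with the prompt tokens, unpadded, and that the client already knows the prompt length from tokenizing the prompt itself. The helper name strip_prompt_tokens and the token ids are made up for illustration; nothing here is part of the Triton or TensorRT-LLM API.

```python
from typing import List, Sequence


def strip_prompt_tokens(output_ids: Sequence[int], input_len: int) -> List[int]:
    """Return only the ids generated after the prompt.

    Hypothetical helper: assumes output_ids is one sequence whose first
    input_len entries are the prompt tokens echoed back by the model.
    """
    if input_len < 0 or input_len > len(output_ids):
        raise ValueError("input_len must be between 0 and len(output_ids)")
    return list(output_ids[input_len:])


# Illustrative ids only; real values would come from the tokenizer and the model.
prompt_ids = [101, 7592, 2088]
output_ids = prompt_ids + [1010, 2129, 102]

new_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_ids)
```

The drawback is the one described above: the client has to tokenize the prompt a second time just to learn its length, which the preprocessing model has already computed.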