How to get the newly generated tokens only? #227

@yunfeng-scale

Hi, I'm trying to get only the newly generated tokens. It looks like the model currently returns all token ids, including the input tokens.

I tried updating the postprocessing model to take REQUEST_INPUT_LEN from the preprocessing model, but the ensemble then fails to load with this error:

E1101 04:46:29.381834 3651 model_repository_manager.cc:563] Invalid argument: in ensemble ensemble, step of model 'ensemble' receives inputs originated from different decoupled models.

This seems to happen because the postprocessing model takes inputs from both the decoupled model tensorrt_llm and the non-decoupled model preprocessing. The ensemble loads if I set model_transaction_policy.decoupled to false for the tensorrt_llm model.

I could also run the tokenization outside of the model ensemble, but that duplicates computation the ensemble already does. Any suggestions?
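
For illustration, the slicing step itself is small once the prompt length is available next to the output ids. The sketch below is not the backend's API: the function name `strip_prompt_tokens` is hypothetical, and it assumes the output ids have shape [batch, beams, seq_len] and the input lengths have shape [batch] or [batch, 1].

```python
import numpy as np


def strip_prompt_tokens(output_ids, input_lengths):
    """Return only the newly generated token ids for each request.

    Assumptions (not confirmed by the backend):
      output_ids    -- int array of shape [batch, beams, seq_len],
                       holding prompt tokens followed by generated tokens
      input_lengths -- int array of shape [batch] or [batch, 1],
                       the prompt length of each request
    """
    output_ids = np.asarray(output_ids)
    input_lengths = np.asarray(input_lengths).reshape(-1)

    new_tokens = []
    for batch_idx, prompt_len in enumerate(input_lengths):
        # Drop the first prompt_len positions of every beam for this request.
        beams = [beam[prompt_len:].tolist() for beam in output_ids[batch_idx]]
        new_tokens.append(beams)
    return new_tokens
```

This returns nested Python lists because each request can have a different prompt length. It does not remove any padding or end tokens that follow the generated text; that would have to be handled separately.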

Labels: feature request, triaged
