Configuration to allow output of special tokens #970

Closed

Conversation

blahblahasdf
Contributor

I have a model that generates special tokens that carry important meaning for generation tasks. At present there is no way to get these special tokens back in the generated text because of the hardcoded input to `detokenize_incrementally` in `llm_engine`.

This PR adds a `--keep-special-tokens` startup option, which resolves this limitation.

I elected to resolve this with a `ModelConfig` change instead of changing `SamplingParams`, but I could see an argument the other way.
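
For context, a rough sketch of how the flag would thread into the engine; the `keep_special_tokens` field on `ModelConfig` and the exact `detokenize_incrementally` call below are illustrative, not the final code:

```python
import argparse

# Illustrative startup flag; the actual PR wires this through the engine's
# existing argument parser rather than a standalone parser.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--keep-special-tokens",
    action="store_true",
    help="Keep special tokens in the generated text instead of stripping them.",
)
args = parser.parse_args()

# Hypothetical ModelConfig field that carries the flag down to llm_engine:
#     model_config.keep_special_tokens = args.keep_special_tokens
#
# llm_engine would then pass the flag through instead of a hardcoded value:
#     detokenize_incrementally(
#         ...,
#         skip_special_tokens=not model_config.keep_special_tokens,
#     )
```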

@blahblahasdf
Contributor Author

This is an approach to addressing #893.

@WoosukKwon
Collaborator

Hi @blahblahasdf, sorry for the late response. Could you elaborate on why you added it as a model parameter instead of a sampling parameter?

@blahblahasdf
Contributor Author

No worries @WoosukKwon. For my use cases I want the special tokens for every request, which made it feel like a configuration of the model rather than of the request. As I said, I could easily see it the other way, and I'd be happy to change this to an option on `SamplingParams` instead if that fits better with your design.
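
If it moved to the request side, I'd picture something along these lines (just a sketch; the field name and default here are placeholders, not an existing API):

```python
from dataclasses import dataclass

# Sketch of a per-request option instead of a server-wide ModelConfig flag.
@dataclass
class SamplingParams:
    temperature: float = 1.0
    top_p: float = 1.0
    # When False, special tokens are kept in the detokenized output.
    skip_special_tokens: bool = True

# Per-request control: this request keeps special tokens in its output.
params = SamplingParams(skip_special_tokens=False)
```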

@WoosukKwon
Collaborator

@blahblahasdf Thanks! I believe HF transformers also views it as a generation parameter rather than a model hyperparameter. Could you update the code accordingly?
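
For reference, in HF transformers the choice is made per decode call rather than at model load time, roughly:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ids = tokenizer("Hello world").input_ids  # includes [CLS] and [SEP]

# skip_special_tokens is a decode-time argument, not a model hyperparameter.
print(tokenizer.decode(ids, skip_special_tokens=False))  # "[CLS] hello world [SEP]"
print(tokenizer.decode(ids, skip_special_tokens=True))   # "hello world"
```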

@blahblahasdf
Contributor Author

Great, will do!

@blahblahasdf
Contributor Author

I created a new PR, #1186, with the different approach.
