Add Transformers logits manipulators #241

Open
0xymoro opened this issue Nov 2, 2023 · 3 comments
Labels: Community want to contribute, feature request, Sampling, triaged

Comments

@0xymoro
Contributor

0xymoro commented Nov 2, 2023

Hi, really interesting work. We're currently using HF TGI in production and are exploring switching to TensorRT-LLM. Are there plans to add logits processors such as typical_p that transformers supports? That would greatly ease the transition. Thanks!

@0xymoro
Contributor Author

0xymoro commented Nov 2, 2023

In particular, typical p has proved to produce significantly more natural sequences in our production environment (300k users). The Python code is at line 456 of https://github.com/huggingface/transformers/blob/main/src/transformers/generation/logits_process.py and it is a pretty simple entropy calculation: it computes the entropy of the next-token distribution and filters out tokens whose information content is far above it (unpredictable/off the rails) or far below it (boring and contributing nothing new).

I see that sampling is done at a much lower level here and is pretty different, but please let me know if I can help by making a PR. I'm not as familiar with CUDA programming as I am with Python, but I'm happy to help if there's any way.
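
For reference, the entropy-based filtering described above can be sketched in dependency-free Python. This is only an illustration of the idea, not the transformers implementation and not TensorRT-LLM code: the function name `typical_filter` and its parameters `mass` and `min_tokens_to_keep` are assumptions chosen for this sketch, it handles a single vocabulary-sized list of finite logits rather than a batch, and it breaks ties by position rather than by threshold.

```python
import math


def typical_filter(logits, mass=0.9, min_tokens_to_keep=1):
    """Return a copy of `logits` with atypical tokens set to -inf.

    Hypothetical helper for illustration; assumes all logits are finite.
    """
    # Numerically stable log-softmax over the vocabulary.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    probs = [math.exp(lp) for lp in log_probs]

    # Entropy of the next-token distribution, in nats.
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))

    # Rank tokens by how far their information content (-log p)
    # lies from the entropy; the closest tokens are the most "typical".
    order = sorted(range(len(logits)),
                   key=lambda i: abs(-log_probs[i] - entropy))

    # Keep the most typical tokens until they cover `mass` probability.
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= mass and len(keep) >= min_tokens_to_keep:
            break

    # Everything else is masked out before sampling.
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


if __name__ == "__main__":
    example = [math.log(p) for p in (0.5, 0.25, 0.125, 0.125)]
    print(typical_filter(example, mass=0.7))
```

Note that, unlike top-p, the most probable token is not guaranteed to survive: with a small `mass`, a token whose information content sits closer to the entropy is kept ahead of it.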

@juney-nvidia
Collaborator

@jerryMeng100

Thanks for sharing the idea.

You are more than welcome to contribute typical P support to TensorRT-LLM.
Currently, the community contribution process is as follows (it may be iterated on and improved based on the concrete feedback we receive):

  • Community members prepare the MR and validate it in their local environment.
  • When the MR is ready, they can ping us for code review (like this one and this one). Dedicated NVIDIA engineers will be assigned to work with the community contributor to merge the MR into our internal repo and take it through the full internal validation process.
  • Once it passes internal validation, the community-contributed code will be incorporated into the next release commit (like this one), either on the main branch or on a new release branch. The commit will explicitly acknowledge the community member by name and list them as co-author, so that the community contribution is respected and suitably acknowledged.

Please let us know whether this makes sense to you.

Thanks
June

@juney-nvidia juney-nvidia self-assigned this Nov 5, 2023
@ncomly-nvidia ncomly-nvidia added the triaged Issue has been triaged by maintainers label Nov 6, 2023
@nv-guomingz
Collaborator

6 participants