
feat: Add model_kwargs and tokenizer_kwargs option to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder #8055

Closed
sjrl opened this issue Jul 23, 2024 · 0 comments · Fixed by #8145


sjrl (Contributor) commented on Jul 23, 2024

Is your feature request related to a problem? Please describe.
We are starting to see more open-source embedding and ranking models with long maximum sequence lengths (e.g. up to 8K tokens). This is a great advancement!

However, as a user I'd sometimes like to set the max length of these models to a lower value (e.g. 1024) so I can better control memory usage at inference time. For example, when the limit is left at 8K tokens and I accidentally pass one very large document to the Ranker or the Embedders, the whole batch gets padded to an 8K-token sequence length, which can cause an OOM error if I only have a small amount of resources.

This is easily fixable if I can specify `model_max_length`, a kwarg that can be passed to the tokenizer's `from_pretrained` method.
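
To illustrate the underlying mechanism, here is a minimal sketch (`BAAI/bge-m3` is just an example of an 8K-context model): Hugging Face tokenizers accept `model_max_length` directly through `from_pretrained`, and truncation then respects the lowered limit.

```python
from transformers import AutoTokenizer

# Cap the tokenizer at 1024 tokens instead of the model's native 8K limit.
# "BAAI/bge-m3" is only an example model name.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3", model_max_length=1024)

# With truncation enabled, every input is now cut off at 1024 tokens,
# so one oversized document can no longer inflate the whole batch.
encoded = tokenizer("a very long document ...", truncation=True)
assert len(encoded["input_ids"]) <= 1024
```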

So, in general, I think it would be wise to add `model_kwargs` and `tokenizer_kwargs` as optional parameters wherever we load models from HuggingFace or SentenceTransformers. A good place to start would be the components `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, and `SentenceTransformersTextEmbedder`.
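
To make the request concrete, here is a rough sketch of what the new parameters could look like on these components. The `model_kwargs` and `tokenizer_kwargs` arguments below are the suggested additions, not an existing API, and the model names are just examples.

```python
import torch
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.rankers import TransformersSimilarityRanker

# Proposed: forward tokenizer_kwargs to AutoTokenizer.from_pretrained
ranker = TransformersSimilarityRanker(
    model="BAAI/bge-reranker-v2-m3",              # example model
    tokenizer_kwargs={"model_max_length": 1024},  # proposed parameter
)

# Proposed: forward model_kwargs / tokenizer_kwargs to the underlying model
embedder = SentenceTransformersDocumentEmbedder(
    model="BAAI/bge-m3",                          # example model
    model_kwargs={"torch_dtype": torch.float16},  # proposed parameter
    tokenizer_kwargs={"model_max_length": 1024},  # proposed parameter
)
```

If I remember correctly, recent versions of sentence-transformers already accept `model_kwargs` and `tokenizer_kwargs` in the `SentenceTransformer` constructor, so the embedders could forward these arguments directly.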

Additional context
Some example models that would benefit from these parameters:
