feat: Add model_kwargs and tokenizer_kwargs option to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder #8055
Closed · sjrl opened this issue on Jul 23, 2024 · 0 comments · Fixed by #8145
Is your feature request related to a problem? Please describe.
We are starting to see more open-source embedding and ranking models with long maximum sequence lengths (e.g. up to 8k tokens). This is a great advancement!
However, as a user I'd sometimes like to set the max length of these models to a lower value (e.g. 1024) so I can better control memory usage at inference time. For example, when the max length is left at 8k tokens and I accidentally pass one very long document to the Ranker or Embedders, the whole batch gets padded to an 8k sequence length, which can cause an OOM error when I only have a small amount of resources.
This is easily fixable if I can specify `model_max_length`, which is a kwarg I can pass to the tokenizer's `from_pretrained` method.
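For reference, a minimal sketch of how this already works in plain `transformers` (the model name is just an illustrative long-context checkpoint):

```python
from transformers import AutoTokenizer

# Kwargs passed to from_pretrained are forwarded to the tokenizer's __init__,
# so model_max_length can be capped at load time. With truncation enabled,
# padded batches are then bounded at that sequence length.
tokenizer = AutoTokenizer.from_pretrained(
    "BAAI/bge-m3",  # illustrative 8k-token embedding model
    model_max_length=1024,
)

# No sequence in this batch will exceed 1024 tokens.
batch = tokenizer(["a very long document ..."], truncation=True, padding=True)
```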
So in general I think it would be wise to add `model_kwargs` and `tokenizer_kwargs` as optional params when we load models from Hugging Face or Sentence Transformers. A good place to start would be the components `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, and `SentenceTransformersTextEmbedder`.
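A sketch of what the proposed parameters could look like on these components; the `tokenizer_kwargs` parameter name is part of this proposal, not the current API, and the model names are illustrative:

```python
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.rankers import TransformersSimilarityRanker

# Proposed: forward tokenizer_kwargs to the underlying from_pretrained call,
# capping the max sequence length regardless of the model's default.
embedder = SentenceTransformersDocumentEmbedder(
    model="BAAI/bge-m3",
    tokenizer_kwargs={"model_max_length": 1024},
)

ranker = TransformersSimilarityRanker(
    model="BAAI/bge-reranker-v2-m3",
    tokenizer_kwargs={"model_max_length": 1024},
)
```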
Additional context
Some example models that would benefit from these parameters:
- Models where `default_language` can be passed as a model_kwarg, to benefit from the language-specific adapter for embedding (see the sketch below).
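For that adapter case, a hedged sketch of how `model_kwargs` might be forwarded; the model name is a hypothetical placeholder, and whether a given model's `from_pretrained` accepts `default_language` is an assumption about that model:

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

# Proposed: forward model_kwargs to the model's from_pretrained call,
# selecting the language-specific adapter at load time.
embedder = SentenceTransformersTextEmbedder(
    model="some/multilingual-adapter-model",  # hypothetical placeholder
    model_kwargs={"default_language": "en"},  # assumed model-specific kwarg
)
```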