
[Question]: You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding. #3617

Open
Weishaoya opened this issue Nov 24, 2024 · 2 comments

Comments

@Weishaoya

Describe your problem

The retrieval speed is too slow. How do I follow the prompt "You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding." to speed up the retrieval_test API? My ragflow version is at commit a20b820, and I hope you can help me.
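For context, that warning comes from Hugging Face transformers and contrasts the two tokenization patterns below. A minimal sketch assuming bert-base-uncased; where RAGFlow's embedding code actually tokenizes is a separate question:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
texts = ["how to speed up retrieval", "fast tokenizers batch well"]

# Pattern that triggers the warning: encode each text separately,
# then pad the per-text encodings in a second pass.
encodings = [tokenizer.encode_plus(t) for t in texts]
batch = tokenizer.pad(encodings, padding=True, return_tensors="pt")

# Pattern the warning recommends: __call__ tokenizes and pads the
# whole batch in one pass through the fast (Rust) backend.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
```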

@Weishaoya Weishaoya added the question Further information is requested label Nov 24, 2024
@KevinHuSh
Collaborator

Sorry, I can't identify where the BertTokenizerFast is used here.
Did you set a re-rank model? That is quite time-consuming.
Otherwise, we've accelerated retrieval in the latest code.

@Weishaoya
Author

Thank you! I didn't set a re-rank model.
My ragflow service was running in a VMware environment with no GPU and only 8 CPUs. When I switched to a GPU environment and set RERANK_PAGE_LIMIT: 2 and page_size: 2, the retrieval_test API became much faster and the warning mentioned in the title disappeared. Apart from the first call, which takes about 5 s to load the embedding model onto the GPU, each test now takes under 0.5 s. I would like to ask: in an environment without a GPU, is the embedding model loaded and its memory released on every retrieval_test call? That would explain why it is so slow.
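If that is what is happening, the usual mitigation is to cache the model at module level so it is constructed once per process rather than per request. A minimal sketch assuming a transformers-based embedder; the helper and model name are hypothetical, not RAGFlow's actual code:

```python
from functools import lru_cache

from transformers import AutoModel, AutoTokenizer


@lru_cache(maxsize=1)
def get_embedder(model_name: str = "BAAI/bge-small-en-v1.5"):
    """Load the tokenizer and model once per process and reuse them.

    Without caching, a CPU-only host would pay the full load cost
    (disk read plus weight deserialization) on every request.
    The model name here is illustrative, not RAGFlow's default.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    return tokenizer, model
```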
