
[Question]: You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding. #3617

Open
Weishaoya opened this issue Nov 24, 2024 · 2 comments

Comments

@Weishaoya

Describe your problem

The retrieval speed is too slow. How do I follow the prompt "You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding." to speed up the retrieval_test API? My ragflow version is at commit a20b820, and I hope you can help me.
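For context, that warning comes from Hugging Face transformers and contrasts the two tokenization patterns below. A minimal sketch assuming bert-base-uncased; where RAGFlow's embedding code actually tokenizes is a separate question:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
texts = ["how to speed up retrieval", "fast tokenizers batch well"]

# Pattern that triggers the warning: encode each text separately,
# then pad the per-text encodings in a second pass.
encodings = [tokenizer.encode_plus(t) for t in texts]
batch = tokenizer.pad(encodings, padding=True, return_tensors="pt")

# Pattern the warning recommends: __call__ tokenizes and pads the
# whole batch in one pass through the fast (Rust) backend.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
```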

@Weishaoya Weishaoya added the question Further information is requested label Nov 24, 2024
@KevinHuSh
Collaborator

Sorry, I can't identify where the BertTokenizerFast is used here.
Did you set a re-rank model? That is quite time-consuming.
Otherwise, we've accelerated retrieval in the latest code.

@Weishaoya
Author

Thank you! I didn't set a re-rank model.
My ragflow service was running in a VMware environment with no GPU and only 8 CPUs. When I switched to a GPU environment and set RERANK_PAGE_LIMIT: 2 and page_size: 2, the retrieval_test API became much faster and the warning mentioned in the title disappeared. Apart from the first call, which takes about 5 s to load the embedding model onto the GPU, each test now takes under 0.5 s. I would like to ask: in an environment without a GPU, is the embedding model loaded and its memory released on every retrieval_test call? That would explain why it is so slow.
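If that is what is happening, the usual mitigation is to cache the model at module level so it is constructed once per process rather than per request. A minimal sketch assuming a transformers-based embedder; the helper and model name are hypothetical, not RAGFlow's actual code:

```python
from functools import lru_cache

from transformers import AutoModel, AutoTokenizer


@lru_cache(maxsize=1)
def get_embedder(model_name: str = "BAAI/bge-small-en-v1.5"):
    """Load the tokenizer and model once per process and reuse them.

    Without caching, a CPU-only host would pay the full load cost
    (disk read plus weight deserialization) on every request.
    The model name here is illustrative, not RAGFlow's default.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    return tokenizer, model
```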
