Comparison of evaluation metrics in pure-Chinese application scenarios #14

Open
zhanghx0905 opened this issue Jan 28, 2024 · 5 comments

Comments

@zhanghx0905

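Hello, I see that your work has achieved excellent evaluation results.

I would like to know how your embedding and reranker combination compares with other combinations, such as bge-zh and bge-reranker, on a pure-Chinese RAG evaluation set.
In our current deployment, the documents are mainly in Chinese, and we expect very little need for bilingual support.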

@shenlei1020
Collaborator

@zhanghx0905
Author

Thanks for the update. I have one more question.

https://huggingface.co/maidalun1020/bce-reranker-base_v1/blob/main/tokenizer_config.json

Why is model_max_length set so large? Shouldn't it be set to 512?

@shenlei1020
Collaborator

I suggest doing it like this: specify a maximum length of 512 when calling the cross encoder through sentence-transformers.
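For readers following along, a minimal sketch of what that suggestion might look like, assuming the `sentence-transformers` package is installed and using the `max_length` argument of its `CrossEncoder` class. The helper names `load_reranker` and `rerank` are hypothetical and not part of this project; the model name is taken from the Hugging Face link earlier in the thread.

```python
from typing import List, Tuple

# Model name from the Hugging Face URL quoted in this thread.
DEFAULT_MODEL = "maidalun1020/bce-reranker-base_v1"
# Value recommended by the collaborator, overriding the large
# model_max_length found in tokenizer_config.json.
MAX_LENGTH = 512


def load_reranker(model_name: str = DEFAULT_MODEL, max_length: int = MAX_LENGTH):
    """Load a cross-encoder that truncates each input pair to max_length tokens.

    Hypothetical helper. The import is done lazily so that the rest of this
    module can be used without sentence-transformers installed.
    """
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_name, max_length=max_length)


def rerank(reranker, query: str, passages: List[str]) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted best-first.

    `reranker` is any object with a `predict(pairs)` method, such as the
    CrossEncoder returned by load_reranker().
    """
    pairs = [(query, passage) for passage in passages]
    scores = reranker.predict(pairs)
    scored = [(passage, float(score)) for passage, score in zip(passages, scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Typical usage would be `rerank(load_reranker(), query, passages)`. The first call downloads the model weights, so it needs network access.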

@zhanghx0905
Author

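> I suggest doing it like this: specify a maximum length of 512 when calling the cross encoder through sentence-transformers.

Thanks. Is there a particular reason for this?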

@Kgoeson

Kgoeson commented Mar 8, 2024

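> I suggest doing it like this: specify a maximum length of 512 when calling the cross encoder through sentence-transformers.

What is the largest value model_max_length can take? Our current document requirements call for somewhat larger text chunks.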
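The thread does not answer this question. One common workaround for larger chunks, assuming the 512-token limit recommended above applies to each query-passage pair, is to split a long passage into overlapping token windows, score each window separately, and keep the best score. The sketch below is only an illustration of that idea: the function names, the window and overlap sizes, and the max aggregation are all assumptions, not this project's documented behavior.

```python
from typing import List, Sequence


def split_into_windows(
    token_ids: Sequence[int], window: int = 400, overlap: int = 80
) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most `window` tokens.

    The defaults are assumed values chosen to leave room for the query and
    special tokens under a 512-token limit.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    if not token_ids:
        return []

    step = window - overlap
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + window]))
        # Stop once a window has reached the end of the sequence.
        if start + window >= len(token_ids):
            break
    return windows


def aggregate_scores(window_scores: Sequence[float]) -> float:
    """Combine per-window scores into one passage score (assumed: take the max)."""
    if not window_scores:
        raise ValueError("need at least one window score")
    return max(window_scores)
```

Each window would then be decoded back to text (or passed as token ids) and scored against the query with the reranker. Taking the maximum favors passages that contain at least one highly relevant span.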
