Can this method be used for duplicate detection? #1

Open
JasonBike opened this issue Sep 30, 2018 · 2 comments

Comments

@JasonBike
How can I input a piece of text and then search the whole corpus to find out whether it contains any highly similar documents?

@joway
Owner

joway commented Oct 5, 2018

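As long as your hash algorithm is suitable, you can use it for duplicate detection. However, I recommend it for sentence-level duplicate checking; this approach is not well suited to duplicate checking of large texts.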

@JasonBike
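
To illustrate what a "suitable" hash might look like for sentence-level duplicate checking, here is a minimal sketch. The thread does not name a specific algorithm, so the choice of SimHash over character bigrams, the 64-bit fingerprint size, the `max_distance=3` threshold, and all function names below are assumptions for illustration, not this project's API.

```python
import hashlib

BITS = 64  # fingerprint width (assumption; matches the 8-byte digest below)


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams.

    Character n-grams avoid the need for word segmentation, so the same
    code handles Chinese, English, and symbols.
    """
    cleaned = "".join(text.split())  # drop all whitespace
    if len(cleaned) < n:
        return [cleaned] if cleaned else []
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def simhash(text):
    """Compute a SimHash fingerprint of the text as a 64-bit integer."""
    counts = [0] * BITS
    for gram in char_ngrams(text):
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each n-gram votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def find_near_duplicates(query, corpus, max_distance=3):
    """Return (distance, text) pairs within max_distance bits, closest first.

    This is a linear scan for clarity; a real index would bucket the
    fingerprints instead of comparing against every entry.
    """
    q = simhash(query)
    hits = [(hamming(q, simhash(doc)), doc) for doc in corpus]
    return sorted((d, doc) for d, doc in hits if d <= max_distance)
```

Calling `find_near_duplicates(sentence, sentences)` returns the stored sentences whose fingerprints differ from the query in at most a few bits. The threshold would need tuning on real data.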

@NingerJohn
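As long as your hash algorithm is suitable, you can use it for duplicate detection. However, I recommend it for sentence-level duplicate checking; this approach is not well suited to duplicate checking of large texts.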

@JasonBike

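For large texts, is there anything you would recommend? Our use case is duplicate checking of question stems, which typically run a few hundred characters (Chinese, English, and symbols).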

3 participants