Inaccurate results: two completely different texts produce exactly the same simhash #6
Comments
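These texts are also computed as duplicates... 2018/04/13 17:17:40.041 [I] [main.go:87] 9ba529d31d516007
You could try increasing the top_n parameter. It is meant to be tuned to your actual use case.
@betazk I tried that and it didn't help. In the end I switched to the TF-IDF algorithm. :)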
Which algorithm did you switch to? Could you share a GitHub link? @Anderson-Lu @betazk
Text 1:
Text 2:
top_n = 1
Surprisingly, the simhash of both texts comes out as
11215416433742798855
11215416433742798855
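One possible explanation, sketched under assumptions: the thread does not show how this library computes its fingerprint, so the code below is a generic textbook simhash, not the project's implementation. The names `weightedTerm` and `simhash` and all weights are hypothetical. It illustrates that when only the single heaviest keyword is kept (`topN = 1`), the fingerprint collapses to the hash of that one keyword, so two otherwise unrelated texts that share their top keyword get identical values.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// weightedTerm is a hypothetical keyword/weight pair, e.g. the output
// of a keyword extractor. It is not a type from the library in this issue.
type weightedTerm struct {
	Term   string
	Weight float64
}

// simhash builds a 64-bit fingerprint from the topN heaviest terms.
// Each term votes +Weight or -Weight on every bit position according to
// the bits of its FNV-1a hash; the sign of the total decides the output bit.
func simhash(terms []weightedTerm, topN int) uint64 {
	sorted := append([]weightedTerm(nil), terms...) // copy, keep caller's slice intact
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Weight > sorted[j].Weight
	})
	if topN >= 0 && topN < len(sorted) {
		sorted = sorted[:topN]
	}

	var acc [64]float64
	for _, t := range sorted {
		h := fnv.New64a()
		h.Write([]byte(t.Term))
		bits := h.Sum64()
		for i := 0; i < 64; i++ {
			if bits&(1<<uint(i)) != 0 {
				acc[i] += t.Weight
			} else {
				acc[i] -= t.Weight
			}
		}
	}

	var fp uint64
	for i := 0; i < 64; i++ {
		if acc[i] > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

func main() {
	// Made-up keyword weights for two unrelated documents that happen
	// to share the same heaviest keyword.
	docA := []weightedTerm{{"data", 3.0}, {"golang", 2.5}, {"hash", 2.0}}
	docB := []weightedTerm{{"data", 3.0}, {"cooking", 2.5}, {"recipe", 2.0}}

	// With topN = 1 only "data" contributes, so both fingerprints are equal.
	fmt.Printf("topN=1: %016x %016x\n", simhash(docA, 1), simhash(docB, 1))
	// With topN = 3 the other keywords also vote on each bit.
	fmt.Printf("topN=3: %016x %016x\n", simhash(docA, 3), simhash(docB, 3))
}
```

If the library behaves like this sketch, raising top_n only helps when the extra keywords actually differ between the two texts and carry enough weight to outvote the shared ones.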