Dataset #2

Open
doris-he opened this issue Nov 14, 2018 · 8 comments

Comments

@doris-he

It still doesn't work. How should I modify it? Any guidance would be appreciated.

@zzdely

zzdely commented Jan 16, 2019

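Hello, how did you compute the IDF file from this dataset? Did you simply take math.log(N/n) for each word? That is what I did, but running it over the whole dataset takes a frightening amount of time. Is there some special technique for this?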

@huangtianan

Could you send me a copy of the dataset? Many thanks, 1003899428@qq.com

@suifeng227

Hello, could you send me a copy of the dataset? Many thanks, 602669371@qq.com

@bigzhao
Owner

bigzhao commented May 29, 2019

Data link: https://pan.baidu.com/s/1LBfqT86y7TEf4hDNCU6DpA
Password: qa2u

> Could you send me a copy of the dataset? Many thanks, 1003899428@qq.com

@bigzhao
Owner

bigzhao commented May 29, 2019

> Hello, could you send me a copy of the dataset? Many thanks, 602669371@qq.com

Data link: https://pan.baidu.com/s/1LBfqT86y7TEf4hDNCU6DpA
Password: qa2u

@huangtianan

huangtianan commented May 29, 2019 via email

@bigzhao
Owner

bigzhao commented May 30, 2019

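> Hello, how did you compute the IDF file from this dataset? Did you simply take math.log(N/n) for each word? That is what I did, but running it over the whole dataset takes a frightening amount of time. Is there some special technique for this?

Yes, that is how I did it. The code I used at the time was provided by the author of jieba, and as far as I remember it did not take especially long to run.
For reference, see fxsjy/jieba#393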
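
If the slowdown comes from rescanning the corpus for every word, counting document frequencies in a single pass usually helps. The following is only a minimal sketch of that idea, not the script from the jieba issue; it assumes the documents are already tokenized into lists of words, and `compute_idf` and the sample `docs` are hypothetical names used for illustration.

```python
import math
from collections import Counter


def compute_idf(docs):
    """Return {word: log(N / n)} where N is the number of documents
    and n is the number of documents containing the word."""
    doc_freq = Counter()
    n_docs = 0
    for tokens in docs:
        n_docs += 1
        # set() so that a word is counted once per document
        doc_freq.update(set(tokens))
    return {word: math.log(n_docs / n) for word, n in doc_freq.items()}


# Hypothetical toy corpus, already tokenized
docs = [
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]
idf = compute_idf(docs)
```

Each document is visited once, so the cost grows with the total number of tokens rather than with vocabulary size times corpus size.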

@dulimei

dulimei commented Sep 4, 2019

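Hello, I have a question about gensim's Doc2Vec: why does the vector saved from training differ from the one inferred for the same tokens? And why does the inferred result keep changing between calls?
The training samples themselves: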

print common_texts
[['human', 'interface', 'computer'], ['survey', 'user', 'computer', 'system', 'response', 'time'], ['eps', 'user', 'interface', 'system'], ['system', 'human', 'system', 'eps'], ['user', 'response', 'time'], ['trees'], ['graph', 'trees'], ['graph', 'minors', 'trees'], ['graph', 'minors', 'survey']]
modelnew.docvecs[0]
array([-0.03136408, -0.03000615, 0.03789993, 0.00673222, 0.06904926],
dtype=float32)

Inferred results:

vector1 = modelnew.infer_vector(['human', 'interface', 'computer'], alpha=0.1, min_alpha=0.0001, steps=5)
print vector1
[-0.09213404 -0.02755559 -0.05117805 0.03933969 -0.09278027]

vector1 = modelnew.infer_vector(['human', 'interface', 'computer'], alpha=0.1, min_alpha=0.0001, steps=5)
print vector1
[-0.09092788 -0.02800747 -0.05157306 0.03857147 -0.09291061]
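
Some variation is expected here: `infer_vector` starts from a randomly initialized vector and trains it for a few steps, so repeated calls generally do not return identical numbers, and on a nine-document toy corpus the result need not match the vector stored during training. A common way to judge such vectors is by similarity rather than exact equality. The sketch below does that with the numbers posted above; the variable names and the `cosine` helper are illustrative and not part of gensim.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Values copied from the post above
trained = [-0.03136408, -0.03000615, 0.03789993, 0.00673222, 0.06904926]
inferred_1 = [-0.09213404, -0.02755559, -0.05117805, 0.03933969, -0.09278027]
inferred_2 = [-0.09092788, -0.02800747, -0.05157306, 0.03857147, -0.09291061]

# The two inferred vectors point in almost the same direction
sim_infer = cosine(inferred_1, inferred_2)
# The trained vector and the inferred one do not agree on this toy model
sim_train = cosine(trained, inferred_1)

print(sim_infer, sim_train)
```

By this measure the two inference runs are nearly identical, so the run-to-run change is small noise; the gap to the trained vector is the larger effect, and more training data or more inference steps are the usual things to try.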

@bigzhao bigzhao changed the title from "problem" to "Dataset" Sep 11, 2019

6 participants