WikiCorpus scans corpus to determine vocabulary when an empty dictionary is provided #2052

Closed
ouromoros opened this issue May 16, 2018 · 3 comments


@ouromoros

I have the following code:

from gensim.corpora import WikiCorpus

# inp is the path to the Wikipedia dump file
wiki = WikiCorpus(inp, dictionary={}, lemmatize=False)

When I run it, WikiCorpus scans the whole corpus to determine the vocabulary, even though a dictionary is provided.

This clearly contradicts the documentation:

dictionary (Dictionary, optional) – Dictionary, if not provided, this scans the corpus once, to determine its vocabulary (this needs really long time).

@piskvorky
Owner

piskvorky commented May 16, 2018

@ouromoros please report the versions, as per our issue template.

@steremma this seems to be a bug introduced here: https://github.com/RaRe-Technologies/gensim/pull/1821/files#diff-eece52d95c280dabe57c803c95d6bb96L335 . That commit changed logic that worked and that is still documented.
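For readers following along, here is a minimal sketch of how an empty dictionary could end up treated as "not provided". It assumes the regression came from a truthiness check replacing an `is None` check, which the thread does not spell out. The names `scan_corpus`, `pick_dictionary_truthy`, and `pick_dictionary_is_none` are hypothetical stand-ins, not gensim functions.

```python
def scan_corpus():
    """Hypothetical stand-in for the expensive vocabulary scan."""
    scan_corpus.calls += 1
    return {"token": 0}


# Count how often the (pretend) scan runs.
scan_corpus.calls = 0


def pick_dictionary_truthy(dictionary=None):
    # Truthiness check: an empty dict is falsy, so it is handled
    # as if no dictionary had been passed, and the scan runs.
    return dictionary or scan_corpus()


def pick_dictionary_is_none(dictionary=None):
    # Identity check: only a missing argument (None) triggers the scan;
    # an empty dict is returned unchanged.
    if dictionary is None:
        return scan_corpus()
    return dictionary
```

With the `is None` variant, passing `{}` returns that same object and the scan never runs, which matches the documented behaviour quoted above.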

@steremma
Contributor

@piskvorky I have already submitted a fix in #2042. I would also prefer to complement it with a test that catches the mistake, but I am very short on time at the moment.

@ouromoros
Author

@piskvorky Sorry for not including the version, but you're right, that commit is the cause. @steremma's code should fix it, so it seems this issue can be closed soon.
