
Link to common datasets #746

Closed
piskvorky opened this issue Jun 18, 2016 · 9 comments
Labels: difficulty easy (Easy issue: required small fix), documentation (Current issue related to documentation), wishlist (Feature request)

Comments

@piskvorky
Owner

piskvorky commented Jun 18, 2016

There are a number of datasets, and even trained models, that are suitable as gensim input.

Collect them, then create and promote a page that links to these resources.

Example:

@piskvorky piskvorky added documentation Current issue related to documentation difficulty easy Easy issue: required small fix labels Jun 18, 2016
@panamantis

Here's another resource. I'm still looking for a pre-trained doc2vec model.
https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models

@tmylk
Contributor

tmylk commented Oct 6, 2016

Hi @panamantis, thanks for the link. Did you come across any pre-trained doc2vec models?

@joyjeni

joyjeni commented Oct 16, 2016

I'm checking the pre-trained word2vec and topic-modelling models mentioned at https://github.com/ai-ku/wvec and http://www.pdhillon.com/code.html

@chinmayapancholi13
Contributor

chinmayapancholi13 commented Mar 8, 2017

Hey! I found the following pre-trained word2vec resources to be relevant as well.
https://github.com/alexandres/lexvec
http://cistern.cis.lmu.de/meta-emb/
https://github.com/icoxfog417/fastTextJapaneseTutorial
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes

Two pre-trained doc2vec models, one for 'English Wikipedia' and another for 'Associated Press News', are provided here: https://github.com/jhlau/doc2vec

@tmylk
Contributor

tmylk commented Mar 9, 2017

More pre-trained word2vec models from @akutuzov

http://ltr.uio.no/semvec/en/about#models

@akutuzov
Contributor

@tmylk the preferred link to the WebVectors service has changed: http://ltr.uio.no/semvec/ is deprecated; the correct URL is now http://vectors.nlpl.eu/explore/embeddings/

@menshikh-iv
Contributor

menshikh-iv commented Oct 2, 2017

Will be resolved in #1492 and #1453.

@menshikh-iv menshikh-iv added the wishlist Feature request label Oct 2, 2017
@menshikh-iv
Contributor

Resolved in #1705

@piskvorky
Owner Author

@menshikh-iv which of the resources above (from @akutuzov, @chinmayapancholi13, @joyjeni, @panamantis) are already included? Are there any plans to include the others (where relevant)? Thanks.

7 participants