Link to common datasets #746

piskvorky · 2016-06-18T06:13:41Z

There are many datasets, and even trained models, that are suitable as gensim input.

Collect them, then create and promote a page that links to these resources.

Example:

GloVe vectors by Stanford NLP: http://nlp.stanford.edu/data/glove.840B.300d.zip
Various LSI/LDA/word2vec models trained on Wikipedia (I think I saw English, German, and Spanish)
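The GloVe files linked above are plain text with one `<word> <v1> ... <vN>` line per word. By default, `KeyedVectors.load_word2vec_format` expects an extra `<vocab_size> <vector_dim>` header line, which GloVe files lack. The following is a minimal stdlib-only sketch of that conversion. It assumes space-separated values, one word per line. The function name `glove_to_word2vec_text` and the toy file are hypothetical, for illustration only, and are not part of gensim's API.

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, out_path, encoding="utf-8"):
    """Write a copy of a GloVe text file with a word2vec text header.

    Assumes one space-separated "<word> <v1> ... <vN>" line per word.
    Returns (vocab_size, vector_dim).
    """
    vocab_size = 0
    dim = None
    # First pass: count vector lines; take the dimensionality from the first one.
    with open(glove_path, encoding=encoding) as src:
        for line in src:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            if dim is None:
                dim = len(parts) - 1
            vocab_size += 1
    if dim is None:
        raise ValueError("no vectors found in %r" % glove_path)

    # Second pass: write the header, then each vector line (trailing whitespace normalised).
    with open(glove_path, encoding=encoding) as src, \
            open(out_path, "w", encoding=encoding) as dst:
        dst.write("%d %d\n" % (vocab_size, dim))
        for line in src:
            stripped = line.rstrip()
            if len(stripped.split(" ")) >= 2:
                dst.write(stripped + "\n")
    return vocab_size, dim


if __name__ == "__main__":
    # Tiny made-up GloVe-style file, just to show the conversion.
    tmp_dir = tempfile.mkdtemp()
    glove_file = os.path.join(tmp_dir, "toy_glove.txt")
    w2v_file = os.path.join(tmp_dir, "toy_w2v.txt")
    with open(glove_file, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\nof 0.4 0.5 0.6\n")
    print(glove_to_word2vec_text(glove_file, w2v_file))  # (2, 3)
```

The converted file can then be loaded with `gensim.models.KeyedVectors.load_word2vec_format(out_path, binary=False)`. In gensim 4.0 and later, you can skip the conversion and read the GloVe file directly by passing `no_header=True`.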

panamantis · 2016-06-30T13:50:31Z

Here's another resource. I'm still looking for a pre-trained doc2vec model.
https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models

tmylk · 2016-10-06T05:12:17Z

Hi @panamantis, thanks for the link. Did you come across any pre-trained doc2vec models?

joyjeni · 2016-10-16T09:38:09Z

I'm checking the pre-trained word2vec and topic-modelling models mentioned in https://github.com/ai-ku/wvec and http://www.pdhillon.com/code.html

chinmayapancholi13 · 2017-03-08T11:52:48Z

Hey! I found the following pre-trained word2vec resources to be relevant as well.
https://github.com/alexandres/lexvec
http://cistern.cis.lmu.de/meta-emb/
https://github.com/icoxfog417/fastTextJapaneseTutorial
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes

Two pre-trained doc2vec models, one for English Wikipedia and another for Associated Press News, are provided here: https://github.com/jhlau/doc2vec

tmylk · 2017-03-09T13:18:16Z

More pre-trained word2vec models from @akutuzov

http://ltr.uio.no/semvec/en/about#models

akutuzov · 2017-03-15T21:56:48Z

@tmylk the preferred link to the WebVectors service has changed:
http://ltr.uio.no/semvec/ is deprecated; the correct URL is now http://vectors.nlpl.eu/explore/embeddings/

menshikh-iv · 2017-10-02T13:43:32Z

Will be resolved in #1492 and #1453.

menshikh-iv · 2017-11-14T08:33:03Z

Resolved in #1705

piskvorky · 2017-11-14T12:32:59Z

@menshikh-iv which of the resources above (from @akutuzov, @chinmayapancholi13, @joyjeni, @panamantis) are already included? Are there any plans to include the others (where relevant)? Thanks.

piskvorky added the documentation (Current issue related to documentation) and difficulty easy (Easy issue: required small fix) labels on Jun 18, 2016

macks22 mentioned this issue on Jun 28, 2017 in Data/Model storage #1453 (closed)

menshikh-iv added the wishlist (Feature request) label on Oct 2, 2017

menshikh-iv closed this as completed Nov 14, 2017

menshikh-iv mentioned this issue on Nov 14, 2017 in Add more datasets/models to gensim-data #1717 (open, 10 tasks)
