Add recipe for Gensim #3225

Merged Sep 4, 2017 (35 commits)
Changes from 26 commits
Commits
4f80272
Add recipe for Gensim
invalid-email-address Jun 30, 2017
1b25b9c
Update meta.yaml
souravsingh Jul 6, 2017
d96feb8
Add additional dependencies
souravsingh Jul 6, 2017
f88123a
Make fixes
souravsingh Jul 6, 2017
321e1bb
Update minimum NumPy and Scipy vers and fix tests
souravsingh Jul 7, 2017
62aef3d
Update meta.yaml
souravsingh Jul 7, 2017
454a9a5
Fix test
souravsingh Jul 14, 2017
b28f0e8
Fix problems
souravsingh Jul 14, 2017
bb4ecc5
Update meta.yaml
souravsingh Jul 17, 2017
7831ca8
Update meta.yaml
souravsingh Jul 17, 2017
22c08c7
Update meta.yaml
souravsingh Jul 17, 2017
74bda87
Small fix
souravsingh Jul 18, 2017
b6a7a3e
Fix test requirements
souravsingh Jul 18, 2017
118e8b4
Update meta.yaml
souravsingh Jul 18, 2017
e47dff8
Remove cython and update the about sections
souravsingh Jul 20, 2017
b38a352
Updates to about section
souravsingh Jul 20, 2017
aa07df0
Update version
souravsingh Aug 3, 2017
85fb030
Skip win32 builds
souravsingh Aug 3, 2017
f6e8a6c
Remove python-annoy from test dependency
souravsingh Aug 3, 2017
31e6c43
Update meta.yaml
souravsingh Aug 4, 2017
df8323e
Add command
souravsingh Aug 7, 2017
6c23113
Update meta.yaml
souravsingh Aug 16, 2017
c71e153
Make updates to recipe
souravsingh Aug 16, 2017
42ac3c5
Update meta.yaml
souravsingh Aug 16, 2017
fe9051d
Update meta.yaml
souravsingh Aug 17, 2017
a5395e8
Add test requirements
souravsingh Aug 17, 2017
6d2164a
Make changes to accomodate nosetests
souravsingh Aug 22, 2017
a43cba4
Pin scikit-learn version
souravsingh Aug 24, 2017
0d7ff05
Remove tensorflow from test requirements
souravsingh Aug 24, 2017
3b6612f
Skip builds entirely for OSX and Windows
souravsingh Aug 24, 2017
61854a0
Fix formatting
souravsingh Aug 24, 2017
4df111d
Remove keras from test requires
souravsingh Aug 26, 2017
45218e8
Test support for win64 platform
souravsingh Aug 26, 2017
1bb5ce8
Update meta.yaml
ocefpaf Sep 1, 2017
53a16a2
add the skip statement
ocefpaf Sep 1, 2017
74 changes: 74 additions & 0 deletions recipes/gensim/meta.yaml
@@ -0,0 +1,74 @@
{% set name = "gensim" %}
{% set version = "2.3.0" %}
{% set sha256 = "7d0dccc7d2c576e270037949874800b7cfbc86ef081ff981483f612cd18e223f" %}
Member

Should we update this every release?

Contributor Author

The version and the sha256 checksum are updated for every release.

Member

Should we update it manually every release?

Contributor Author (souravsingh, Aug 22, 2017)

Yes, we only need to update the version and the sha256 for every release. This is done by sending a PR to the gensim feedstock repository.
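For illustration only, the per-release edit described above could be scripted roughly as follows. This is a sketch, not part of the recipe or of any conda-forge tooling: the helper name `bump_recipe` is hypothetical, and it assumes the recipe keeps its `{% set version = "..." %}` and `{% set sha256 = "..." %}` lines in the form shown in this diff.

```python
import re


def bump_recipe(recipe_text, new_version, new_sha256):
    """Return recipe_text with the version and sha256 `set` lines replaced.

    Hypothetical helper: it only rewrites the quoted values on the two
    Jinja `set` lines and leaves everything else untouched.
    """
    text, n_version = re.subn(
        r'(\{%\s*set\s+version\s*=\s*")[^"]*(")',
        lambda m: m.group(1) + new_version + m.group(2),
        recipe_text,
    )
    text, n_sha = re.subn(
        r'(\{%\s*set\s+sha256\s*=\s*")[^"]*(")',
        lambda m: m.group(1) + new_sha256 + m.group(2),
        text,
    )
    # Refuse to guess if the recipe does not look the way we assumed.
    if n_version != 1 or n_sha != 1:
        raise ValueError("expected exactly one version line and one sha256 line")
    return text
```

The new version string and checksum have to be supplied by whoever prepares the update; the sketch does not look them up.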

Member (menshikh-iv, Aug 24, 2017)

Sorry, I don't follow: should we update these two lines by creating a PR here after each release? Another question: what is this hash computed from?

Contributor Author

If a new release is made before this recipe has been merged, we update the two lines here to ensure that the latest version is packaged.

Otherwise, if this patch is merged before the new release, we will send a pull request to the gensim feedstock repository, which would look similar to https://github.com/conda-forge/validictory-feedstock. The maintainers of the recipe (listed in the recipe-maintainers section of the recipe) will then review and merge the patch to update the version of the conda package.

The sha256 is the checksum of the source tarball, taken from the project's page on pypi.io.
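As a rough illustration of how the source URL and the checksum relate, here is a minimal Python sketch. The function names `source_url` and `sha256_of_file` are made up for this example, and the URL pattern is simply copied from the `url` template in the recipe's `source` section.

```python
import hashlib

# Assumption: this mirrors the `url` template used in the recipe's source section.
PYPI_SOURCE_TEMPLATE = (
    "https://pypi.io/packages/source/{first}/{name}/{name}-{version}.tar.gz"
)


def source_url(name, version):
    """Expand the recipe's URL template for a package name and version."""
    return PYPI_SOURCE_TEMPLATE.format(first=name[0], name=name, version=version)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `sha256_of_file` on the tarball downloaded from `source_url("gensim", "2.3.0")` should give the value stored in the recipe's `sha256` variable, if the download is intact.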


package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  fn: {{ name }}-{{ version }}.tar.gz
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: {{ sha256 }}

build:
  number: 0
  script: python setup.py install --single-version-externally-managed --record record.txt

requirements:
  build:
    - python
    - setuptools
    - numpy >=1.11.3
    - scipy >=0.18.1
    - six >=1.5.0
    - smart_open >=1.2.1

  run:
    - python
    - numpy >=1.11.3
    - scipy >=0.18.1
    - six >=1.5.0
    - smart_open >=1.2.1

test:
  requires:
    - morfessor
    - scikit-learn

  imports:
    - gensim
    - gensim.corpora
    - gensim.models
    - gensim.models.wrappers
    - gensim.parsing
    - gensim.scripts
    - gensim.similarities
    - gensim.sklearn_integration
    - gensim.summarization
    - gensim.test
    - gensim.topic_coherence

  commands:
Member

Why did you remove the test run?

Contributor Author (souravsingh, Aug 21, 2017)

There are a few reasons for removing the test run:

  1. TensorFlow doesn't support 32-bit Windows, so running the Gensim tests for win-32 on AppVeyor will fail. It is possible to skip win-32 builds, but that might defeat the purpose of full support.

  2. The test run times out on Travis (which is used for building the recipe for OSX) at the Doc2Vec parallel training test, because it produces no output for 10 minutes.

Do we want to add the test run to the recipe?

Member (menshikh-iv, Aug 22, 2017)

Yes, we definitely want to run the tests. As for win32: we can check Windows support with x64 instead.

Contributor Author

Tests are failing for sklearn_integration:

======================================================================
ERROR: testPipeline (gensim.test.test_sklearn_integration.TestSklLdaModelWrapper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/gensim/test/test_sklearn_integration.py", line 153, in testPipeline
    text_lda.fit(corpus, data.target)
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/sklearn/pipeline.py", line 257, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/staged-recipes/build_artefacts/gensim_1503413959047/_t_env/lib/python2.7/site-packages/sklearn/pipeline.py", line 226, in _fit
    self.steps[step_idx] = (name, fitted_transformer)
TypeError: 'tuple' object does not support item assignment
-------------------- >> begin captured logging << --------------------

Member

This is a problem with sklearn==0.19.0; we have already fixed it in the develop branch. For now you can "hardcode" sklearn to 0.18.2 to avoid this (and remove the pin in the next gensim release).
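The traceback above ends at an in-place assignment into `self.steps`, and the error message suggests the steps were held in a tuple at that point. The following plain-Python sketch reproduces only that mechanism; it does not use scikit-learn, and the helper `replace_step` and the step names are placeholders invented for the example.

```python
def replace_step(steps, index, name, fitted):
    """Mimic the assignment from the traceback: steps[index] = (name, fitted)."""
    steps[index] = (name, fitted)
    return steps


# Placeholder step names and objects, purely for illustration.
steps_as_list = [("vectorizer", None), ("classifier", None)]
steps_as_tuple = tuple(steps_as_list)

# A list is mutable, so replacing one step in place works.
replace_step(steps_as_list, 0, "vectorizer", "fitted")

# A tuple is immutable, so the same assignment raises TypeError.
try:
    replace_step(steps_as_tuple, 0, "vectorizer", "fitted")
except TypeError as exc:
    error_message = str(exc)
```

Under that reading, passing the steps as a list rather than a tuple would sidestep the error, which is one plausible shape for the fix mentioned for the develop branch; the thread does not say what the actual fix was.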

Contributor Author

The pinning has solved the sklearn tests, but there is a new failure:

======================================================================
ERROR: Test Keras 'Embedding' layer returned by 'get_embedding_layer' function for a smaller version of the 20NewsGroup classification problem.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/gensim/test/test_keras_integration.py", line 100, in testEmbeddingLayer20NewsGroup
    data = fetch_20newsgroups(subset='train', categories=['alt.atheism', 'comp.graphics', 'sci.space'])
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.py", line 225, in fetch_20newsgroups
    cache_path=cache_path)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.py", line 91, in download_20newsgroups
    opener = urlopen(URL)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/staged-recipes/build_artefacts/gensim_1503567059210/_t_env/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 110] Connection timed out>

    - conda inspect linkages -p $PREFIX gensim # [not win]
    - conda inspect objects -p $PREFIX gensim # [osx]

about:
  home: http://github.com/RaRe-Technologies/gensim
  license: LGPL 3.0
  license_file: COPYING
  license_family: LGPL
  summary: 'Topic Modelling for Humans'
  description: |
piskvorky, Jul 20, 2017

Please use the official tagline and description of gensim (see https://github.com/RaRe-Technologies/gensim).

Is the | some specific formatter?

Contributor Author

Yes

    Gensim is a Python library for topic modelling, document indexing
    and similarity retrieval with large corpora.
    Target audience is the natural language processing (NLP)
    and information retrieval (IR) community.
  doc_url: http://radimrehurek.com/gensim/
  dev_url: https://github.com/RaRe-Technologies/gensim

extra:
  recipe-maintainers:
    - souravsingh
    - tmylk
    - menshikh-iv