
Commit

Merge pull request #990 from weixuanfu/v0_11_1
v0.11.1 minor release
weixuanfu authored Jan 3, 2020
2 parents 3d31727 + e6e7ce6 commit aea42a5
Showing 21 changed files with 124 additions and 103 deletions.
7 changes: 0 additions & 7 deletions .travis.yml
@@ -12,13 +12,6 @@ matrix:
env: PYTHON_VERSION="3.7" COVERAGE="true" DASK_ML_VERSION="1.0.0"
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- name: "Python 3.7 on macOS"
os: osx
osx_image: xcode10.2 # Python 3.7.2 running on macOS 10.14.3
language: shell # 'language: python' is an error on Travis CI macOS
env: PYTHON_VERSION="3.7" DASK_ML_VERSION="1.0.0"
before_install:
- wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
install: source ./ci/.travis_install.sh
script: bash ./ci/.travis_test.sh
after_success:
2 changes: 1 addition & 1 deletion docs/examples/index.html
@@ -203,7 +203,7 @@ <h2 id="overview">Overview</h2>
<td>subscription prediction</td>
<td>classification</td>
<td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/Bank+Marketing">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Stratergy.ipynb">link</a></td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Strategy.ipynb">link</a></td>
</tr>
<tr>
<td>MAGIC Gamma Telescope</td>
2 changes: 1 addition & 1 deletion docs/index.html
@@ -213,5 +213,5 @@

<!--
MkDocs version : 0.17.2
Build Date UTC : 2019-11-05 20:44:02
Build Date UTC : 2020-01-03 17:34:52
-->
4 changes: 2 additions & 2 deletions docs/search/search_index.json

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions docs/sitemap.xml
@@ -4,79 +4,79 @@

<url>
<loc>http://epistasislab.github.io/tpot/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/installing/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/using/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/api/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/examples/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/contributing/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/releases/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/citing/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/support/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>



<url>
<loc>http://epistasislab.github.io/tpot/related/</loc>
<lastmod>2019-11-05</lastmod>
<lastmod>2020-01-03</lastmod>
<changefreq>daily</changefreq>
</url>

2 changes: 1 addition & 1 deletion docs/using/index.html
@@ -661,7 +661,7 @@ <h1 id="template-option-in-tpot">Template option in TPOT</h1>

<p>If a specific operator, e.g. <code>SelectPercentile</code>, is preferred for usage in the 1st step of the pipeline, the template can be defined like 'SelectPercentile-Transformer-Classifier'.</p>
<h1 id="featuresetselector-in-tpot">FeatureSetSelector in TPOT</h1>
<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
<p>Please check our <a href="https://www.biorxiv.org/content/10.1101/502484v1.article-info">preprint paper</a> for more details.</p>
<pre><code class="Python">from tpot import TPOTClassifier
import numpy as np
2 changes: 1 addition & 1 deletion docs_sources/using.md
@@ -550,7 +550,7 @@ If a specific operator, e.g. `SelectPercentile`, is preferred for usage in the 1

# FeatureSetSelector in TPOT

`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.
`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.

Please check our [preprint paper](https://www.biorxiv.org/content/10.1101/502484v1.article-info) for more details.
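The three-column CSV layout described in the `FeatureSetSelector` paragraph can be illustrated with Python's standard `csv` module. This is a minimal sketch, not TPOT's own loader: the header names `Subset`, `Size` and `Features`, the set and gene names, and the helper `parse_feature_sets` are all hypothetical. For a `numpy.ndarray` input, the third column would hold column indexes instead of feature names.

```python
import csv
import io

# Hypothetical file in the three-column layout: set name, number of
# features in the set, ";"-delimited feature names. Header names are assumed.
EXAMPLE_CSV = """Subset,Size,Features
set_A,3,gene_1;gene_2;gene_3
set_B,2,gene_4;gene_5
"""


def parse_feature_sets(text):
    """Return {set name: [feature names]}, checking each declared size."""
    sets = {}
    for row in csv.DictReader(io.StringIO(text)):
        features = row["Features"].split(";")
        if len(features) != int(row["Size"]):
            raise ValueError(f"{row['Subset']}: Size column says {row['Size']}, "
                             f"but {len(features)} features are listed")
        sets[row["Subset"]] = features
    return sets


feature_sets = parse_feature_sets(EXAMPLE_CSV)
print(feature_sets["set_B"])  # ['gene_4', 'gene_5']
```

Checking the second column against the parsed list is one way to catch a malformed row early; whether TPOT itself performs such a check is not stated here.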

2 changes: 1 addition & 1 deletion optional-requirements.txt
@@ -1,3 +1,3 @@
xgboost==0.6a2
xgboost==0.90
scikit-mdr==0.4.4
skrebate==0.3.4
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,10 +1,10 @@
deap>=1.2
nose==1.3.7
numpy>=1.16.3
scikit-learn>=0.21.0
scikit-learn>=0.22.0
scipy>=1.3.1
tqdm>=4.36.1
update-checker>=0.16
stopit>=1.1.1
stopit>=1.1.2
pandas>=0.24.2
joblib>=0.13.2
2 changes: 1 addition & 1 deletion setup.py
@@ -37,7 +37,7 @@ def calculate_version():
zip_safe=True,
install_requires=['numpy>=1.16.3',
'scipy>=1.3.1',
'scikit-learn>=0.21.0',
'scikit-learn>=0.22.0',
'deap>=1.2',
'update_checker>=0.16',
'tqdm>=4.36.1',
2 changes: 0 additions & 2 deletions tests/driver_tests.py
@@ -296,8 +296,6 @@ def test_print_args(self):
VERBOSITY = 1
"""
print

self.assertEqual(_sort_lines(expected_output), _sort_lines(output))


25 changes: 13 additions & 12 deletions tests/export_tests.py
@@ -71,7 +71,6 @@ def test_export_random_ind():
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from tpot.export_utils import set_param_recursive
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
@@ -80,14 +79,14 @@ def test_export_random_ind():
train_test_split(features, tpot_data['target'], random_state=39)
exported_pipeline = BernoulliNB(alpha=1.0, fit_prior=False)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 39)
# Fix random state in exported estimator
if hasattr(exported_pipeline, 'random_state'):
    setattr(exported_pipeline, 'random_state', 39)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
"""
exported_code = export_pipeline(pipeline, tpot_obj.operators, tpot_obj._pset, random_state=tpot_obj.random_state)

assert expected_code == exported_code
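The first `export_tests.py` hunk changes the expected export for a single `BernoulliNB` estimator. A bare estimator is not a `Pipeline` and has no `.steps` attribute, so the old `set_param_recursive(exported_pipeline.steps, ...)` line would raise `AttributeError` there; the new code guards the assignment with `hasattr`. The sketch below shows the same idea without scikit-learn. `FakePipeline`, `FakeTree`, `FakeNB` and `fix_random_state` are hypothetical stand-ins, not TPOT's `set_param_recursive`.

```python
class FakePipeline:
    """Hypothetical stand-in for sklearn's Pipeline: holds (name, step) pairs."""
    def __init__(self, steps):
        self.steps = steps


class FakeTree:
    """Hypothetical estimator that, like DecisionTreeClassifier, has random_state."""
    def __init__(self):
        self.random_state = None


class FakeNB:
    """Hypothetical estimator that, like BernoulliNB, has no random_state."""


def fix_random_state(estimator, seed):
    """Set random_state on every pipeline step, or on a single estimator."""
    if hasattr(estimator, "steps"):
        for _name, step in estimator.steps:
            fix_random_state(step, seed)
    elif hasattr(estimator, "random_state"):
        setattr(estimator, "random_state", seed)


single = FakeNB()
fix_random_state(single, 39)  # no .steps and no random_state: nothing to set
tree = FakeTree()
fix_random_state(FakePipeline([("nb", FakeNB()), ("tree", tree)]), 39)
print(tree.random_state)  # 39
```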


@@ -487,18 +486,17 @@ def test_export_pipeline_6():
"""Assert that exported_pipeline() generated a compile source file with random_state and data_file_path."""

pipeline_string = (
'KNeighborsClassifier('
'input_matrix, '
'KNeighborsClassifier__n_neighbors=10, '
'KNeighborsClassifier__p=1, '
'KNeighborsClassifier__weights=uniform'
')'
'DecisionTreeClassifier(SelectPercentile(input_matrix, SelectPercentile__percentile=20),'
'DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=8,'
'DecisionTreeClassifier__min_samples_leaf=5, DecisionTreeClassifier__min_samples_split=5)'
)
pipeline = creator.Individual.from_string(pipeline_string, tpot_obj._pset)
expected_code = """import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from tpot.export_utils import set_param_recursive
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
@@ -507,7 +505,10 @@ def test_export_pipeline_6():
training_features, testing_features, training_target, testing_target = \\
train_test_split(features, tpot_data['target'], random_state=42)
exported_pipeline = KNeighborsClassifier(n_neighbors=10, p=1, weights="uniform")
exported_pipeline = make_pipeline(
SelectPercentile(score_func=f_classif, percentile=20),
DecisionTreeClassifier(criterion="gini", max_depth=8, min_samples_leaf=5, min_samples_split=5)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)
4 changes: 2 additions & 2 deletions tests/stacking_estimator_tests.py
@@ -78,7 +78,7 @@ def test_StackingEstimator_3():

# test cv score
cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features, training_target, cv=3, scoring='accuracy'))
known_cv_score = 0.9472823753147593
known_cv_score = 0.9643652561247217

assert np.allclose(known_cv_score, cv_score)

@@ -101,6 +101,6 @@ def test_StackingEstimator_4():

# test cv score
cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features_r, training_target_r, cv=3, scoring='r2'))
known_cv_score = 0.7989564328211737
known_cv_score = 0.8216045257587923

assert np.allclose(known_cv_score, cv_score)
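Both `StackingEstimator` tests compare against hard-coded scores with `np.allclose`, which by default accepts a difference only when `|a - b| <= atol + rtol * |b|`, with `rtol=1e-05` and `atol=1e-08`. Any shift in the cross-validated score, presumably here from the scikit-learn 0.22 requirement introduced in this commit, therefore forces new constants. `allclose_scalar` is a hypothetical scalar re-implementation of that check, used only to show that the old and new constants are far outside the tolerance.

```python
def allclose_scalar(a, b, rtol=1e-05, atol=1e-08):
    """Scalar version of the test numpy.allclose applies element-wise."""
    return abs(a - b) <= atol + rtol * abs(b)


old_known = 0.9472823753147593  # value before this commit
new_known = 0.9643652561247217  # value after this commit
print(allclose_scalar(old_known, new_known))  # False
```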
29 changes: 24 additions & 5 deletions tests/tpot_tests.py
@@ -58,7 +58,10 @@
from joblib import Memory
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin, TransformerMixin
from sklearn.feature_selection.base import SelectorMixin
try:
    from sklearn.feature_selection._base import SelectorMixin
except ImportError:
    from sklearn.feature_selection.base import SelectorMixin
from deap import creator, gp
from deap.tools import ParetoFront
from nose.tools import nottest, assert_raises, assert_not_equal, assert_greater_equal, assert_equal, assert_in
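`tests/tpot_tests.py` now tries the private module path `sklearn.feature_selection._base` first and falls back to the older public `sklearn.feature_selection.base`; scikit-learn 0.22 began moving such modules behind underscore-prefixed names and deprecating the old paths. A generic version of that fallback can be sketched with `importlib`. The helper `import_first` is hypothetical, and the demo uses only standard-library modules so it runs without scikit-learn.

```python
import importlib


def import_first(attr, *module_paths):
    """Return `attr` from the first module in `module_paths` that provides it."""
    last_error = None
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember the failure and try the next (older) location.
            last_error = exc
    raise ImportError(f"cannot import {attr!r} from any of {module_paths}") from last_error


# Demo with standard-library modules only: the first path does not exist,
# so the helper falls back to the second one.
dumps = import_first("dumps", "no_such_module_for_demo", "json")
print(dumps({"a": 1}))  # {"a": 1}
```

With scikit-learn installed, `import_first("SelectorMixin", "sklearn.feature_selection._base", "sklearn.feature_selection.base")` would follow the same order as the test's `try`/`except` block.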
@@ -965,7 +968,7 @@ def test_fit_4():
assert tpot_obj.generations == 1000000

# reset generations to 20 just in case that the failed test may take too much time
tpot_obj.generations == 20
tpot_obj.generations = 20

tpot_obj.fit(training_features, training_target)
assert tpot_obj._pop == []
@@ -988,7 +991,7 @@ def test_fit_5():
assert tpot_obj.generations == 1000000

# reset generations to 20 just in case that the failed test may take too much time
tpot_obj.generations == 20
tpot_obj.generations = 20

tpot_obj.fit(training_features, training_target)
assert tpot_obj._pop != []
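The `test_fit_4` and `test_fit_5` fixes replace `tpot_obj.generations == 20`, a comparison whose result is discarded, with the assignment `tpot_obj.generations = 20`. The difference can be seen with a hypothetical stand-in class:

```python
class FakeEstimator:
    """Hypothetical stand-in for the TPOT object used in the tests."""
    def __init__(self):
        self.generations = 1000000


est = FakeEstimator()
est.generations == 20   # comparison: evaluates to False and is thrown away
print(est.generations)  # 1000000
est.generations = 20    # assignment: actually changes the attribute
print(est.generations)  # 20
```

Static checkers such as pylint can report the first form as a statement with no effect.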
@@ -1426,7 +1429,15 @@ def pareto_eq(ind1, ind2):
sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)

try:
cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
with warnings.catch_warnings():
warnings.simplefilter('ignore')
cv_scores = cross_val_score(sklearn_pipeline,
training_features,
training_target,
cv=5,
scoring='accuracy',
verbose=0,
error_score='raise')
mean_cv_scores = np.mean(cv_scores)
except Exception as e:
mean_cv_scores = -float('inf')
@@ -1460,7 +1471,15 @@ def pareto_eq(ind1, ind2):
sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)

try:
cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
with warnings.catch_warnings():
warnings.simplefilter('ignore')
cv_scores = cross_val_score(sklearn_pipeline,
training_features,
training_target,
cv=5,
scoring='accuracy',
verbose=0,
error_score='raise')
mean_cv_scores = np.mean(cv_scores)
except Exception as e:
mean_cv_scores = -float('inf')
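The two `pareto_eq` hunks wrap `cross_val_score` in `warnings.catch_warnings()` and pass `error_score='raise'`, so a pipeline that fails to fit raises into the `except` branch and scores `-inf` instead of, presumably, receiving the NaN default that scikit-learn 0.22 uses for `error_score`. The sketch below reproduces that control flow with the standard library only; `score_or_neg_inf`, `noisy_score` and `failing_score` are hypothetical.

```python
import math
import warnings


def score_or_neg_inf(score_fn):
    """Run score_fn with warnings silenced; map any exception to -inf."""
    try:
        with warnings.catch_warnings():
            # The filter change is undone when the block exits.
            warnings.simplefilter("ignore")
            return score_fn()
    except Exception:
        return -math.inf


def noisy_score():
    warnings.warn("convergence not reached")  # silenced by the helper
    return 0.9


def failing_score():
    # Stands in for a fit error propagated because error_score='raise'.
    raise ValueError("pipeline failed to fit")


print(score_or_neg_inf(noisy_score))    # 0.9
print(score_or_neg_inf(failing_score))  # -inf
```

Because `catch_warnings` restores the previous filter state on exit, the silencing is confined to the scoring call and does not hide warnings elsewhere in the test run.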
2 changes: 1 addition & 1 deletion tpot/_version.py
@@ -23,4 +23,4 @@
"""

__version__ = '0.11.0'
__version__ = '0.11.1'