
Commit
Merge pull request #765 from openml/develop
Release 0.10
mfeurer authored Aug 19, 2019
2 parents 8efcf9d + 0f99118 commit 0f36642
Showing 57 changed files with 1,670 additions and 296 deletions.
9 changes: 5 additions & 4 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -15,10 +15,11 @@ env:
- TEST_DIR=/tmp/test_dir/
- MODULE=openml
matrix:
- DISTRIB="conda" PYTHON_VERSION="3.5" SKLEARN_VERSION="0.20.0"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.20.0"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.20.0" RUN_FLAKE8="true" SKIP_TESTS="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.20.0" COVERAGE="true" DOCPUSH="true"
- DISTRIB="conda" PYTHON_VERSION="3.5" SKLEARN_VERSION="0.21.2"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.21.2"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.21.2" RUN_FLAKE8="true" SKIP_TESTS="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.21.2" COVERAGE="true" DOCPUSH="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.20.2"
# Checks for older scikit-learn versions (which also don't nicely work with
# Python3.7)
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.19.2"
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -81,6 +81,10 @@ following rules before you submit a pull request:
Drafts often benefit from the inclusion of a
[task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments)
in the PR description.
- Add [unit tests](https://github.com/openml/openml-python/tree/develop/tests) and [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced.
- If a unit test uploads to the test server, please ensure that the uploaded entities are collected for deletion afterwards, to prevent the test server from filling up. For example: `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`.
- Please ensure that every example runs against the test server by beginning it with a call to `openml.config.start_using_configuration_for_example()`.
- All tests pass when running `pytest`. On
Unix-like systems, check with (from the toplevel source folder):
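The upload-then-mark cleanup convention described in the bullets above can be sketched as follows. `FakeTestBase` is a hypothetical stand-in for the project's real `TestBase` class, shown only to illustrate the pattern of registering every uploaded entity so a teardown step can delete it from the test server:

```python
# Hypothetical sketch of the cleanup pattern: a minimal stand-in for
# TestBase that records uploaded entities so a teardown hook can delete
# them later. Names mirror the guideline above but are illustrative,
# not the actual openml-python test API.
class FakeTestBase:
    # class-level registry shared by all tests, mirroring the idea of
    # collecting entities for deletion after the whole suite finishes
    _entities_to_remove = []

    @classmethod
    def _mark_entity_for_removal(cls, entity_type, entity_id):
        cls._entities_to_remove.append((entity_type, entity_id))

    @classmethod
    def _cleanup(cls):
        """Drain the registry, returning what would be deleted server-side."""
        removed = []
        while cls._entities_to_remove:
            removed.append(cls._entities_to_remove.pop())
        return removed


# A test would upload an entity, then immediately mark it:
FakeTestBase._mark_entity_for_removal('data', 42)
FakeTestBase._mark_entity_for_removal('flow', (17, 'sklearn.pipeline.Pipeline'))
removed = FakeTestBase._cleanup()
```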
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
BSD 3-Clause License

Copyright (c) 2014-2018, Matthias Feurer, Jan van Rijn, Andreas Müller,
Copyright (c) 2014-2019, Matthias Feurer, Jan van Rijn, Andreas Müller,
Joaquin Vanschoren and others.
All rights reserved.

2 changes: 2 additions & 0 deletions PULL_REQUEST_TEMPLATE.md
@@ -9,6 +9,8 @@ Please make sure that:
* for any new function or class added, please add it to doc/api.rst
* the list of classes and functions should be alphabetical
* for any new functionality, consider adding a relevant example
* add unit tests for any new functionality
* mark files uploaded to the test server for removal using `_mark_entity_for_removal()`
-->

#### Reference Issue
20 changes: 19 additions & 1 deletion ci_scripts/test.sh
@@ -1,5 +1,11 @@
set -e

# check status and branch before running the unit tests
before="`git status --porcelain -b`"
before="$before"
# storing current working directory
curr_dir=`pwd`

run_tests() {
# Get into a temp directory to run test from the installed scikit learn and
# check if we do not leave artifacts
@@ -22,7 +28,7 @@ run_tests() {
PYTEST_ARGS=''
fi

pytest -n 4 --duration=20 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py' $PYTEST_ARGS $test_dir
pytest -n 4 --durations=20 --timeout=600 --timeout-method=thread -sv --ignore='test_OpenMLDemo.py' $PYTEST_ARGS $test_dir
}

if [[ "$RUN_FLAKE8" == "true" ]]; then
@@ -32,3 +38,15 @@ fi
if [[ "$SKIP_TESTS" != "true" ]]; then
run_tests
fi

# changing directory to stored working directory
cd $curr_dir
# check status and branch after running the unit tests
# compares with $before to check for remaining files
after="`git status --porcelain -b`"
if [[ "$before" != "$after" ]]; then
echo 'git status from before: '$before
echo 'git status from after: '$after
echo "Not all generated files have been deleted!"
exit 1
fi
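The guard added to `ci_scripts/test.sh` snapshots the working tree before and after the tests and fails the build if the two differ. The same idea can be sketched in Python; this sketch uses a plain directory listing instead of `git status --porcelain`, and the leftover file is simulated for illustration:

```python
# Minimal sketch of the artifact guard: snapshot the working directory
# before the tests, run them, snapshot again, and flag any new files.
import os
import tempfile


def snapshot(path):
    """Return the set of file paths under `path`, relative to it."""
    found = set()
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), path))
    return found


workdir = tempfile.mkdtemp()
before = snapshot(workdir)

# ... the test suite would run here; simulate a test that leaves junk:
with open(os.path.join(workdir, 'leftover.arff'), 'w') as f:
    f.write('@RELATION junk\n')

after = snapshot(workdir)
leftovers = after - before
if leftovers:
    print('Not all generated files have been deleted:', sorted(leftovers))
```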
2 changes: 2 additions & 0 deletions doc/api.rst
@@ -72,6 +72,7 @@ Modules
get_dataset
get_datasets
list_datasets
list_qualities
status_update

:mod:`openml.evaluations`: Evaluation Functions
@@ -83,6 +84,7 @@ Modules
:template: function.rst

list_evaluations
list_evaluation_measures

:mod:`openml.flows`: Flow Functions
-----------------------------------
3 changes: 2 additions & 1 deletion doc/conf.py
@@ -15,6 +15,7 @@
import os
import sys
import sphinx_bootstrap_theme
import time
import openml

# If extensions (or modules to document with autodoc) are in another directory,
@@ -65,7 +66,7 @@
# General information about the project.
project = u'OpenML'
copyright = (
u'2014-2019, the OpenML-Python team.'
u'2014-{}, the OpenML-Python team.'.format(time.strftime("%Y,%m,%d,%H,%M,%S").split(',')[0])
)
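As an aside, the committed expression builds a full timestamp and then keeps only the year; `time.strftime("%Y")` yields the same value directly. A quick standard-library-only equivalence check:

```python
# The verbose form from the diff versus the direct form: both produce
# the current four-digit year for the copyright string.
import time

verbose_year = time.strftime("%Y,%m,%d,%H,%M,%S").split(',')[0]
simple_year = time.strftime("%Y")
copyright_str = u'2014-{}, the OpenML-Python team.'.format(simple_year)
```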

# The version info for the project you're documenting, acts as replacement for
13 changes: 6 additions & 7 deletions doc/index.rst
@@ -21,16 +21,12 @@ Example
.. code:: python
import openml
from sklearn import preprocessing, tree, pipeline
# Set the OpenML API Key which is required to upload your runs.
# You can get your own API by signing up to OpenML.org.
openml.config.apikey = 'ABC'
from sklearn import impute, tree, pipeline
# Define a scikit-learn classifier or pipeline
clf = pipeline.Pipeline(
steps=[
('imputer', preprocessing.Imputer()),
('imputer', impute.SimpleImputer()),
('estimator', tree.DecisionTreeClassifier())
]
)
@@ -39,10 +35,13 @@ Example
task = openml.tasks.get_task(31)
# Run the scikit-learn model on the task.
run = openml.runs.run_model_on_task(clf, task)
# Publish the experiment on OpenML (optional, requires an API key).
# Publish the experiment on OpenML (optional, requires an API key;
# you can get your own API key by signing up at OpenML.org).
run.publish()
print('View the run online: %s/run/%d' % (openml.config.server, run.run_id))
You can find more examples in our `examples gallery <examples/index.html>`_.

----------------------------
How to get OpenML for python
----------------------------
22 changes: 22 additions & 0 deletions doc/progress.rst
@@ -6,6 +6,27 @@
Changelog
=========

0.10.0
~~~~~~
* ADD #737: Add `list_evaluations_setups` to return hyperparameters along with the list of evaluations.
* FIX #261: The test server is cleared of all files uploaded during unit testing.
* FIX #447: All files created by unit tests are deleted after the completion of all unit tests.
* FIX #589: Fix a bug that prevented the columns to ignore from being uploaded when creating and publishing a dataset.
* FIX #608: Fix a "dataset_id referenced before assignment" error in the `get_run` function.
* DOC #639: More descriptive documentation for the function to convert array format.
* DOC #719: Add documentation on uploading tasks.
* ADD #687: Add a function to retrieve the list of available evaluation measures.
* ADD #695: Add a function to retrieve all available data quality measures.
* ADD #412: Add a function to trim flow names for scikit-learn flows.
* ADD #715: `list_evaluations` now has an option to sort evaluations by score (value).
* ADD #722: Automatic reinstantiation of flows in `run_model_on_task`, with clearer errors if that is not possible.
* ADD #412: The scikit-learn extension populates the short name field for flows.
* MAINT #726: Update examples to remove deprecation warnings from scikit-learn.
* MAINT #752: Update OpenML-Python to be compatible with sklearn 0.21.


0.9.0
~~~~~
* ADD #560: OpenML-Python can now handle regression tasks as well.
@@ -21,6 +42,7 @@ Changelog
* ADD #659: Lazy loading of task splits.
* ADD #516: `run_flow_on_task` flow uploading is now optional.
* ADD #680: Adds `openml.config.start_using_configuration_for_example` (and resp. stop) to easily connect to the test server.
* ADD #75, #653: Adds a pretty print for objects of the top-level classes.
* FIX #642: `check_datasets_active` now correctly also returns active status of deactivated datasets.
* FIX #304, #636: Allow serialization of numpy datatypes and list of lists of more types (e.g. bools, ints) for flows.
* FIX #651: Fixed a bug that would prevent openml-python from finding the user's config file.
11 changes: 5 additions & 6 deletions examples/fetch_evaluations_tutorial.py
@@ -20,7 +20,6 @@

############################################################################
import openml
from pprint import pprint

############################################################################
# Listing evaluations
@@ -37,7 +36,7 @@
output_format='dataframe')

# Querying the returned results for precision above 0.98
pprint(evals[evals.value > 0.98])
print(evals[evals.value > 0.98])

#############################################################################
# Viewing a sample task
@@ -47,7 +46,7 @@
# We will start by displaying a simple *supervised classification* task:
task_id = 167140 # https://www.openml.org/t/167140
task = openml.tasks.get_task(task_id)
pprint(vars(task))
print(task)

#############################################################################
# Obtaining all the evaluations for the task
@@ -60,11 +59,11 @@
evals = openml.evaluations.list_evaluations(function=metric, task=[task_id],
output_format='dataframe')
# Displaying the first 10 rows
pprint(evals.head(n=10))
print(evals.head(n=10))
# Sorting the evaluations in decreasing order of the metric chosen
evals = evals.sort_values(by='value', ascending=False)
print("\nDisplaying head of sorted dataframe: ")
pprint(evals.head())
print(evals.head())
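The sort performed in the tutorial snippet above can be sketched without pandas: evaluations as a list of records, ordered by decreasing metric value. The record contents here are made up for illustration:

```python
# pandas-free sketch of sorting evaluations by score in decreasing
# order, mirroring evals.sort_values(by='value', ascending=False).
evals_records = [
    {'flow_name': 'sklearn.tree.DecisionTreeClassifier', 'value': 0.91},
    {'flow_name': 'sklearn.ensemble.RandomForestClassifier', 'value': 0.97},
    {'flow_name': 'sklearn.svm.SVC', 'value': 0.95},
]
ranked = sorted(evals_records, key=lambda row: row['value'], reverse=True)
best_flow = ranked[0]['flow_name']
```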

#############################################################################
# Obtaining CDF of metric for chosen task
@@ -147,4 +146,4 @@ def plot_flow_compare(evaluations, top_n=10, metric='predictive_accuracy'):
flow_ids = evals.flow_id.unique()[:top_n]
flow_names = evals.flow_name.unique()[:top_n]
for i in range(top_n):
pprint((flow_ids[i], flow_names[i]))
print((flow_ids[i], flow_names[i]))
40 changes: 30 additions & 10 deletions examples/flows_and_runs_tutorial.py
@@ -6,8 +6,7 @@
"""

import openml
from pprint import pprint
from sklearn import ensemble, neighbors, preprocessing, pipeline, tree
from sklearn import compose, ensemble, impute, neighbors, preprocessing, pipeline, tree

############################################################################
# Train machine learning models
@@ -39,8 +38,9 @@
target=dataset.default_target_attribute
)
print("Categorical features: {}".format(categorical_indicator))
enc = preprocessing.OneHotEncoder(categorical_features=categorical_indicator)
X = enc.fit_transform(X)
transformer = compose.ColumnTransformer(
[('one_hot_encoder', preprocessing.OneHotEncoder(categories='auto'), categorical_indicator)])
X = transformer.fit_transform(X)
clf.fit(X, y)
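What the `ColumnTransformer`/`OneHotEncoder` combination above does to the categorical columns can be sketched in plain Python for a single column: each category becomes its own 0/1 indicator column. The data is made up for illustration:

```python
# Plain-Python sketch of one-hot encoding a single categorical column,
# illustrating what OneHotEncoder produces for each selected column.
def one_hot(column):
    """Return (sorted categories, encoded rows) for one categorical column."""
    categories = sorted(set(column))
    encoded = [[1 if value == cat else 0 for cat in categories]
               for value in column]
    return categories, encoded


categories, encoded = one_hot(['red', 'green', 'red', 'blue'])
```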

############################################################################
@@ -57,7 +57,7 @@
# Run the flow
run = openml.runs.run_model_on_task(clf, task)

# pprint(vars(run), depth=2)
print(run)

############################################################################
# Share the run on the OpenML server
@@ -74,18 +74,38 @@
# We can now also inspect the flow object which was automatically created:

flow = openml.flows.get_flow(run.flow_id)
pprint(vars(flow), depth=1)
print(flow)

############################################################################
# It also works with pipelines
# ############################
#
# When you need to handle 'dirty' data, build pipelines to model them automatically.
task = openml.tasks.get_task(115)
task = openml.tasks.get_task(1)
features = task.get_dataset().features
nominal_feature_indices = [
i for i in range(len(features))
if features[i].name != task.target_name and features[i].data_type == 'nominal'
]
pipe = pipeline.Pipeline(steps=[
('Imputer', preprocessing.Imputer(strategy='median')),
('OneHotEncoder', preprocessing.OneHotEncoder(sparse=False, handle_unknown='ignore')),
('Classifier', ensemble.RandomForestClassifier())
(
'Preprocessing',
compose.ColumnTransformer([
('Nominal', pipeline.Pipeline(
[
('Imputer', impute.SimpleImputer(strategy='most_frequent')),
(
'Encoder',
preprocessing.OneHotEncoder(
sparse=False, handle_unknown='ignore',
)
),
]),
nominal_feature_indices,
),
]),
),
('Classifier', ensemble.RandomForestClassifier(n_estimators=10))
])

run = openml.runs.run_model_on_task(pipe, task, avoid_duplicate_runs=False)
7 changes: 6 additions & 1 deletion examples/introduction_tutorial.py
@@ -1,6 +1,6 @@
"""
Introduction
===================
============
An introduction to OpenML, followed up by a simple example.
"""
@@ -15,6 +15,8 @@
# * Works seamlessly with scikit-learn and other libraries
# * Large scale benchmarking, compare to state of the art
#

############################################################################
# Installation
# ^^^^^^^^^^^^
# Installation is done via ``pip``:
@@ -26,6 +28,8 @@
# For further information, please check out the installation guide at
# https://openml.github.io/openml-python/master/contributing.html#installation
#

############################################################################
# Authentication
# ^^^^^^^^^^^^^^
#
@@ -49,6 +53,7 @@
# .. warning:: This example uploads data. For that reason, this example
# connects to the test server instead. This prevents the live server from
# crowding with example datasets, tasks, studies, and so on.

############################################################################
import openml
from sklearn import neighbors
4 changes: 2 additions & 2 deletions examples/sklearn/openml_run_example.py
@@ -5,7 +5,7 @@
An example of an automated machine learning experiment.
"""
import openml
from sklearn import tree, preprocessing, pipeline
from sklearn import impute, tree, pipeline

############################################################################
# .. warning:: This example uploads data. For that reason, this example
@@ -21,7 +21,7 @@
# Define a scikit-learn pipeline
clf = pipeline.Pipeline(
steps=[
('imputer', preprocessing.Imputer()),
('imputer', impute.SimpleImputer()),
('estimator', tree.DecisionTreeClassifier())
]
)