minor readme updates to align formatting with facet readme #59

Merged
merged 3 commits into from
Nov 4, 2020
62 changes: 36 additions & 26 deletions README.rst
@@ -2,11 +2,13 @@

|

`sklearndf` is an open source library designed to address a common need with
scikit-learn: the outputs of transformers are numpy arrays, even when the input is a
data frame. However, to inspect a model it is essential to keep track of the feature names.
*sklearndf* is an open source library designed to address a common need with
`scikit-learn <https://github.com/scikit-learn/scikit-learn>`__: the outputs of
transformers are numpy arrays, even when the input is a
data frame. However, to inspect a model it is essential to keep track of the
feature names.
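
The problem described above can be reproduced with plain scikit-learn. The following is a minimal sketch, not part of the README under review: the toy data frame and its values are made up for illustration, and it assumes scikit-learn's default output configuration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data frame with named columns and one missing value per column
# (hypothetical values, chosen only for illustration).
df = pd.DataFrame({"age": [22.0, np.nan, 35.0], "fare": [7.25, 71.28, np.nan]})

# With default settings, the native transformer returns a numpy array,
# so the column names "age" and "fare" are no longer attached to the result.
imputed = SimpleImputer(strategy="median").fit_transform(df)

print(type(imputed).__name__)
print(hasattr(imputed, "columns"))
```

The result carries the same values a data frame would, but the link between each column and its feature name has to be tracked by hand.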

To this end, `sklearndf` enhances scikit-learn's estimators as follows:
To this end, *sklearndf* enhances scikit-learn's estimators as follows:

- **Preserve data frame structure**:
Return data frames as results of transformations, preserving feature names as the column index.
@@ -17,13 +19,12 @@ To this end, `sklearndf` enhances scikit-learn's estimators as follows:


|azure_pypi| |azure_conda| |azure_devops_master_ci| |code_cov|
|python_versions| |code_style| |documentation_status|
|made_with_sphinx_doc| |License_badge|
|python_versions| |code_style| |made_with_sphinx_doc| |License_badge|

Installation
---------------------

sklearndf supports both PyPI and Anaconda
*sklearndf* supports both PyPI and Anaconda

Anaconda
~~~~~~~~~~~~~~~~~~~~~
@@ -44,7 +45,7 @@ Quickstart
----------------------

The following quickstart guide provides a minimal example workflow to get up and running
with sklearndf.
with *sklearndf*.


Creating a DataFrame friendly scikit-learn preprocessing pipeline
@@ -62,7 +63,7 @@ We will build a preprocessing pipeline which:
- for categorical variables fills missing values with the string 'Unknown' and then one-hot encodes
- for numerical values fills missing values using median values

The strength of sklearndf is to maintain the scikit-learn conventions and expressivity,
The strength of *sklearndf* is to maintain the scikit-learn conventions and expressivity,
while also preserving data frames, and hence feature names. We can see this after using
fit_transform on our preprocessing pipeline.

@@ -72,7 +73,7 @@ fit_transform on our preprocessing pipeline.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Relevant sklearndf imports
# relevant sklearndf imports
from sklearndf.transformation import (
ColumnTransformerDF,
OneHotEncoderDF,
@@ -84,14 +85,14 @@ fit_transform on our preprocessing pipeline.
)
from sklearndf.classification import RandomForestClassifierDF

# Load titanic data
# load titanic data
titanic_X, titanic_y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)

# Select features
# select features
numerical_features = ['age', 'fare']
categorical_features = ['embarked', 'sex', 'pclass']

# Create a preprocessing pipeline
# create a preprocessing pipeline
preprocessing_numeric_df = SimpleImputerDF(strategy="median")

preprocessing_categorical_df = PipelineDF(
@@ -108,7 +109,7 @@ fit_transform on our preprocessing pipeline.
]
)

# Run preprocessing
# run preprocessing
transformed_df = preprocessing_df.fit_transform(X=titanic_X, y=titanic_y)
transformed_df.head()

@@ -131,7 +132,7 @@ fit_transform on our preprocessing pipeline.
Tracing features from post-transform to original
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sklearndf pipeline has a `feature_names_original_` attribute which returns a series
The *sklearndf* pipeline has a `feature_names_original_` attribute which returns a series
mapping the output columns (the series' index) to the input columns (the series' values).
We can therefore easily select all output features generated from a given input feature,
such as in this case for embarked.
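
The selection described above can be sketched with pandas alone. This is an illustrative stand-in, not output from *sklearndf*: the series below is built by hand, its one-hot column names (``embarked_C`` and so on) are assumed, and ``output_features_for`` is a hypothetical helper. With a fitted pipeline, the series would come from ``feature_names_original_`` instead.

```python
import pandas as pd

# Hand-built stand-in for the series returned by `feature_names_original_`:
# the index holds output column names, the values hold the originating
# input columns. The one-hot column names here are assumptions.
feature_names_original = pd.Series(
    data=["age", "fare", "embarked", "embarked", "embarked", "sex", "sex"],
    index=[
        "age",
        "fare",
        "embarked_C",
        "embarked_Q",
        "embarked_S",
        "sex_female",
        "sex_male",
    ],
)


def output_features_for(mapping: pd.Series, input_feature: str) -> list:
    """Return the output columns derived from one input column (hypothetical helper)."""
    return mapping[mapping == input_feature].index.to_list()


embarked_columns = output_features_for(feature_names_original, "embarked")
print(embarked_columns)
```

Because the mapping is an ordinary pandas series, a boolean mask on its values is enough to recover every output column that originates from a given input column.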
@@ -160,13 +161,13 @@ such as in this case for embarked.
Completing the pipeline with a classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Scikit-learn regressors and classifiers have a sklearndf sibling obtained by appending
Scikit-learn regressors and classifiers have a *sklearndf* sibling obtained by appending
DF to the class name; the API remains the same.
The result of any predict and decision function will be returned as a pandas series
(single output) or data frame (class probabilities or multi-output).

We can combine the preprocessing pipeline above with a classifier to create a full
predictive pipeline. sklearndf provides two useful, specialised pipeline objects for
predictive pipeline. *sklearndf* provides two useful, specialised pipeline objects for
this, RegressorPipelineDF and ClassifierPipelineDF. Both implement a special two-step
pipeline with one preprocessing step and one prediction step, while staying compatible
with the general sklearn pipeline idiom.
@@ -197,12 +198,14 @@ on a test set.

model score: 0.79

Download the getting started tutorial and explore *sklearndf* for yourself here: |binder|

Contributing
---------------------------

sklearndf is stable and is being supported long-term.
*sklearndf* is stable and is being supported long-term.

Contributions to sklearndf are welcome and appreciated.
Contributions to *sklearndf* are welcome and appreciated.
For any bug reports or feature requests/enhancements please use the appropriate
`GitHub form <https://github.com/BCG-Gamma/sklearndf/issues>`_, and if you wish to do so,
please open a PR addressing the issue.
@@ -215,7 +218,7 @@ For further information on contributing please see our [LINK: contribution guide
License
---------------------------

sklearndf is licensed under Apache 2.0 as described in the
*sklearndf* is licensed under Apache 2.0 as described in the
`LICENSE <https://github.com/BCG-Gamma/sklearndf/LICENSE>`_ file.


@@ -225,13 +228,14 @@ Acknowledgements
This package provides a layer on top of some popular building blocks for Machine
Learning:

The `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ learners and
pipelining support the corresponding sklearndf implementations.
The `scikit-learn <https://github.com/scikit-learn/scikit-learn>`__ learners and
pipelining support the corresponding *sklearndf* implementations.

BCG GAMMA
---------------------------

If you would like to know more about the team behind sklearndf please see our [LINK: about us] page.
If you would like to know more about the team behind *sklearndf* please see our
[LINK: about us] page.

We are always on the lookout for passionate and talented data scientists to join the
BCG GAMMA team. If you would like to know more you can find out about BCG GAMMA
@@ -242,21 +246,27 @@ or have a look at

.. |azure_conda| image:: https://
:target: https://

.. |azure_pypi| image:: https://
:target: https://

.. |azure_devops_master_ci| image:: https://
:target: https://

.. |code_cov| image:: https://
:target: https://
.. |documentation_status| image:: https://
:target: https://

.. |python_versions| image:: https://img.shields.io/badge/python-3.7|3.8-blue.svg
:target: https://www.python.org/downloads/release/python-380/

.. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black

.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
:target: https://www.sphinx-doc.org/

.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
:target: https://opensource.org/licenses/Apache-2.0
:target: https://opensource.org/licenses/Apache-2.0

.. |binder| image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/