Tubular pre-processing for machine learning!
tubular implements pre-processing steps for tabular data commonly used in machine learning pipelines.
The transformers are compatible with scikit-learn Pipelines. Each has a transform method to apply the pre-processing step to data and a fit method to learn the relevant information from the data, if applicable.
The transformers in tubular work with data in pandas DataFrames.
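As a rough illustration of the fit/transform convention described above, here is a minimal sketch of a transformer that works on pandas DataFrames and slots into a scikit-learn Pipeline. The `MeanImputer` class is hypothetical and written only for this example; it is not part of tubular and does not reflect how tubular's own imputers are implemented.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class MeanImputer(BaseEstimator, TransformerMixin):
    """Hypothetical example transformer (not a tubular class)."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # learn the relevant information: the mean of each column
        self.means_ = X[self.columns].mean().to_dict()
        return self

    def transform(self, X):
        # apply the learned means to a copy, leaving the input untouched
        X = X.copy()
        X[self.columns] = X[self.columns].fillna(self.means_)
        return X


train = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, None]})
new = pd.DataFrame({"a": [None, 5.0], "b": [None, 1.0]})

pipeline = Pipeline([("impute", MeanImputer(columns=["a", "b"]))])
pipeline.fit(train)
out = pipeline.transform(new)
print(out)
```

The means are learnt once from `train` in `fit` and then reused on `new` in `transform`, which is the same separation the tubular transformers follow when they have something to learn.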
There are a variety of transformers to assist with:
- capping
- dates
- imputation
- mapping
- categorical encoding
- numeric operations
Here is a simple example of applying capping to two columns:
from tubular.capping import CappingTransformer
import pandas as pd
from sklearn.datasets import fetch_california_housing
# load the california housing dataset
cali = fetch_california_housing()
X = pd.DataFrame(cali['data'], columns=cali['feature_names'])
# initialise a capping transformer for 2 columns
capper = CappingTransformer(capping_values={'AveOccup': [0, 10], 'HouseAge': [0, 50]})
# transform the data
X_capped = capper.transform(X)
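To make the effect of capping concrete, the sketch below reproduces it with plain pandas on a small made-up DataFrame. It assumes that each `[min, max]` pair means "clip values to lie within this range"; it is an illustration of the idea, not tubular's implementation.

```python
import pandas as pd

# small made-up frame standing in for the housing data
X = pd.DataFrame(
    {"AveOccup": [-1.0, 2.5, 40.0], "HouseAge": [5.0, 52.0, 30.0]}
)

# [min, max] pairs in the same shape as capping_values in the example
capping_values = {"AveOccup": [0, 10], "HouseAge": [0, 50]}

# clip each listed column to its range, leaving the input frame unchanged
X_capped = X.copy()
for column, (lower, upper) in capping_values.items():
    X_capped[column] = X_capped[column].clip(lower=lower, upper=upper)

print(X_capped)
```

Values below the minimum are raised to it, values above the maximum are lowered to it, and everything in between is left as it was.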
The easiest way to get tubular is directly from PyPI with:
pip install tubular
The documentation for tubular can be found on readthedocs.
Instructions for building the docs locally can be found in docs/README.
To help you get started, there are example notebooks in the examples folder in the repo that show how to use each transformer.
To open the example notebooks in Binder, click here or click on the launch binder shield above, then click on the directory button in the side bar to the left to navigate to the specific notebook.
For bugs and feature requests please open an issue.
This project uses pytest as its test framework. To build the package locally and run the tests, follow the steps below.
First clone the repo and move to the root directory:
git clone https://github.com/lvgig/tubular.git
cd tubular
Next install tubular and development dependencies:
pip install . -r requirements-dev.txt
Finally run the test suite with pytest:
pytest
tubular is under active development, and we're super excited if you're interested in contributing!
See the CONTRIBUTING file for the full details of our working practices.