awslabs · felixbiessmann · May 4, 2021 · May 5, 2021 · May 5, 2021 · Jun 1, 2021
diff --git a/README.md b/README.md
@@ -1,32 +1,28 @@
 DataWig - Imputation for Tables
 ================================
 
-[![PyPI version](https://badge.fury.io/py/datawig.svg)](https://badge.fury.io/py/datawig.svg)
 [![GitHub license](https://img.shields.io/github/license/awslabs/datawig.svg)](https://github.com/awslabs/datawig/blob/master/LICENSE)
 [![GitHub issues](https://img.shields.io/github/issues/awslabs/datawig.svg)](https://github.com/awslabs/datawig/issues)
-[![Build Status](https://travis-ci.org/awslabs/datawig.svg?branch=master)](https://travis-ci.org/awslabs/datawig)
 
 DataWig learns Machine Learning models to impute missing values in tables.
 
-See our user-guide and extended documentation [here](https://datawig.readthedocs.io/en/latest).
+The latest version of DataWig is built around the [tabular prediction API of AutoGluon](https://auto.gluon.ai/stable/tutorials/tabular_prediction/index.html).
+
+This change will lead to better imputation models and faster training, but not all of the original DataWig API has been migrated yet.
 
 ## Installation
 
-### CPU
-```bash
-pip3 install datawig
+Clone the repository and set up a virtual environment in the root directory of the package:
+
+```
+python3 -m venv venv
 ```
 
-### GPU
-If you want to run DataWig on a GPU you need to make sure your version of Apache MXNet Incubating contains the GPU bindings.
-Depending on your version of CUDA, you can do this by running the following:
+Install the package from local sources:
 
-```bash
-wget https://raw.githubusercontent.com/awslabs/datawig/master/requirements/requirements.gpu-cu${CUDA_VERSION}.txt
-pip install datawig --no-deps -r requirements.gpu-cu${CUDA_VERSION}.txt
-rm requirements.gpu-cu${CUDA_VERSION}.txt
 ```
-where `${CUDA_VERSION}` can be `75` (7.5), `80` (8.0), `90` (9.0), or `91` (9.1).
+./venv/bin/pip install -e .
+```
 
 ## Running DataWig
 The DataWig API expects your data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). Here is an example of how the dataframe might look:
@@ -37,124 +33,109 @@ The DataWig API expects your data as a [pandas DataFrame](https://pandas.pydata.
 | SDCards     | Best SDCard ever ...  | 8GB  | Blue  |
 | Dress       | This **yellow** dress | M    | **?** |
 
-### Quickstart Example
+DataWig lets you impute missing values in two ways:
+  * A `.complete` functionality inspired by [`fancyimpute`](https://github.com/iskandr/fancyimpute)
+  * A `sklearn`-like API with `.fit` and `.predict` methods
+
+## Quickstart Example
 
-For most use cases, the `SimpleImputer` class is the best starting point. For convenience there is the function [SimpleImputer.complete](https://datawig.readthedocs.io/en/latest/source/API.html#datawig.simple_imputer.SimpleImputer.complete) that takes a DataFrame and fits an imputation model for each column with missing values, with all other columns as inputs:
+Here are some examples of the DataWig API, also available as a [notebook](datawig-examples.ipynb).
+
+### Using `AutoGluonImputer.complete`
 
 ```python
 import datawig, numpy
 
 # generate some data with simple nonlinear dependency
-df = datawig.utils.generate_df_numeric() 
+df = datawig.utils.generate_df_numeric()
 # mask 10% of the values
 df_with_missing = df.mask(numpy.random.rand(*df.shape) > .9)
 
 # impute missing values
-df_with_missing_imputed = datawig.SimpleImputer.complete(df_with_missing)
+df_with_missing_imputed = datawig.AutoGluonImputer.complete(df_with_missing)
 
 ```
 
-You can also impute values in specific columns only (called `output_column` below) using values in other columns (called `input_columns` below). DataWig currently supports imputation of categorical columns and numeric columns.
+### Using `AutoGluonImputer.fit` and `.predict`
+
+This usage is very similar to the underlying [tabular prediction API of AutoGluon](https://auto.gluon.ai/stable/tutorials/tabular_prediction/index.html), but we added some convenience functionality such as precision filtering for categorical imputations.
 
-### Imputation of categorical columns
+You can also impute values in specific columns only (called `output_column` below) using values in other columns (called `input_columns` below). DataWig currently supports imputation of categorical and numeric columns. Type inference is based on [``pandas.api.types.is_numeric_dtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.is_numeric_dtype.html).
+
+#### Imputation of categorical columns
+
+Let's first generate some random strings hidden in longer random strings:
 
 ```python
 import datawig
 
-df = datawig.utils.generate_df_string( num_samples=200, 
-                                       data_column_name='sentences', 
+df = datawig.utils.generate_df_string( num_samples=200,
+                                       data_column_name='sentences',
                                        label_column_name='label')
+df.head(n=2)
+```
+
+The generated data will look like this:
+
+| sentences | label |
+|-----------|-------|
+| wILsn T366D r1Psz KAnDn 8RfUf GuuRU | 8RfUf |
+| 8RfUf jBq5U BqVnh pnXfL GuuRU XYnSP | 8RfUf |
+
+Now let's split the rows into training and test data and train an imputation model:
 
+```python
 df_train, df_test = datawig.utils.random_split(df)
 
-#Initialize a SimpleImputer model
-imputer = datawig.SimpleImputer(
+imputer = datawig.AutoGluonImputer(
     input_columns=['sentences'], # column(s) containing information about the column we want to impute
-    output_column='label', # the column we'd like to impute values for
-    output_path = 'imputer_model' # stores model data and metrics
+    output_column='label' # the column we'd like to impute values for
     )
 
 #Fit an imputer model on the train data
-imputer.fit(train_df=df_train)
+imputer.fit(train_df=df_train, time_limit=100)
 
 #Impute missing values and return original dataframe with predictions
 imputed = imputer.predict(df_test)
 ```
 
-### Imputation of numerical columns
+#### Imputation of numerical columns
+
+Imputation of numerical values works just as it does for categorical values.
+
+Let's first generate some numeric values with a quadratic dependency:
 
 ```python
 import datawig
 
-df = datawig.utils.generate_df_numeric( num_samples=200, 
-                                        data_column_name='x', 
-                                        label_column_name='y')         
+df = datawig.utils.generate_df_numeric( num_samples=200,
+                                        data_column_name='x',
+                                        label_column_name='y')
+
 df_train, df_test = datawig.utils.random_split(df)
 
-#Initialize a SimpleImputer model
-imputer = datawig.SimpleImputer(
+imputer = datawig.AutoGluonImputer(
     input_columns=['x'], # column(s) containing information about the column we want to impute
     output_column='y', # the column we'd like to impute values for
-    output_path = 'imputer_model' # stores model data and metrics
     )
 
 #Fit an imputer model on the train data
-imputer.fit(train_df=df_train, num_epochs=50)
+imputer.fit(train_df=df_train, time_limit=100)
 
 #Impute missing values and return original dataframe with predictions
 imputed = imputer.predict(df_test)
-
 ```
 
-In order to have more control over the types of models and preprocessings, the `Imputer` class allows directly specifying all relevant model features and parameters. 
-
-For details on usage, refer to the provided [examples](./examples).
 
 ### Acknowledgments
 Thanks to [David Greenberg](https://github.com/dgreenberg) for the package name.
 
-### Building documentation
-
-```bash
-git clone git@github.com:awslabs/datawig.git
-cd datawig/docs
-make html
-open _build/html/index.html
-```
-
 
 ### Executing Tests
 
-Clone the repository from git and set up virtualenv in the root dir of the package:
-
-```
-python3 -m venv venv
-```
-
-Install the package from local sources:
-
-```
-./venv/bin/pip install -e .
-```
-
 Run tests:
 
 ```
 ./venv/bin/pip install -r requirements/requirements.dev.txt
 ./venv/bin/python -m pytest
 ```
-
-
-### Updating PyPi distribution
-
-Before updating, increment the version in setup.py.
-
-```
-git clone git@github.com:awslabs/datawig.git
-cd datawig
-# build local distribution for current version
-python setup.py sdist
-# upload to PyPi
-twine upload --skip-existing dist/*
-```
-