TransTab: A flexible transferable tabular learning framework
Documentation is available at https://transtab.readthedocs.io/en/latest/index.html.
The paper is available at https://arxiv.org/pdf/2205.09328.pdf.
Read the 5-minute blog post at realsunlab.medium.com to understand TransTab!
- [05/04/23] Check out version 0.0.5 of TransTab!
- [01/04/23] Check out version 0.0.3 of TransTab!
- [12/03/22] Check out our blog post for a quick understanding of TransTab!
- [08/31/22] Version 0.0.2 supports encoding tabular inputs into embeddings directly. An example is provided here. Several bugs are fixed.

- Table embedding.
- Direct processing of tables with missing values.
- Regression support.
This repository provides the Python package transtab for building flexible tabular prediction models. Basic usage of transtab takes only a few lines:
import transtab
# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-g')
# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)
# training hyperparameters (example values; see the documentation for all options)
training_arguments = {'num_epoch': 10, 'batch_size': 64, 'lr': 1e-4, 'output_dir': './ckpt'}
# start training
transtab.train(model, trainset, valset, **training_arguments)
# make predictions, df_x is a pd.DataFrame with shape (n, d)
# return the predictions ypred with shape (n, 1) if binary classification;
# (n, n_class) if multiclass classification.
df_x, y_test = testset  # testset is an (X, y) pair
ypred = transtab.predict(model, df_x)
It's easy, isn't it?
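For the built-in datasets, load_data returns the categorical, numerical, and binary column lists. For your own pd.DataFrame you have to supply these lists to transtab.build_classifier yourself. The sketch below is one possible heuristic and is not part of transtab: split_columns is a hypothetical helper, and it assumes missing cells are None or NaN and that binary columns hold only 0/1 values.

```python
import numbers
from typing import Dict, List, Sequence, Tuple


def split_columns(
    columns: Dict[str, Sequence],
) -> Tuple[List[str], List[str], List[str]]:
    """Heuristically sort column names into (categorical, numerical, binary).

    Hypothetical helper, not part of transtab: a column is binary if every
    non-missing value is 0 or 1, numerical if every non-missing value is a
    real number, and categorical otherwise (including all-missing columns).
    """
    cat_cols, num_cols, bin_cols = [], [], []
    for name, values in columns.items():
        # drop missing cells: None, and NaN (the only value not equal to itself)
        observed = [v for v in values if v is not None and v == v]
        is_number = bool(observed) and all(
            isinstance(v, numbers.Real) for v in observed
        )
        if is_number and all(v in (0, 1) for v in observed):
            bin_cols.append(name)
        elif is_number:
            num_cols.append(name)
        else:
            cat_cols.append(name)
    return cat_cols, num_cols, bin_cols
```

With pandas, cat_cols, num_cols, bin_cols = split_columns(df.to_dict('list')) produces the three lists. Review them by hand afterwards: a 0/1 column may really be a count, and a numeric code may really be categorical.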
First, install the PyTorch version that matches your system by following the guide at https://pytorch.org/get-started/locally/.
Then install transtab from PyPI:
pip install transtab
or install it directly from the GitHub repository:
pip install git+https://github.com/RyanWangZf/transtab.git
Please refer to the documentation for more guidance on installation and troubleshooting.
A novel feature of transtab is its ability to learn from multiple distinct tables. For example, a pretrained model can be further trained on a new table in a few lines:
import transtab
# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')
# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-approval')
# update categorical/numerical/binary column map of the loaded model
model.update({'cat': cat_cols, 'num': num_cols, 'bin': bin_cols})
# example hyperparameters; a separate output_dir keeps this run apart from ./ckpt
training_arguments = {'num_epoch': 10, 'output_dir': './ckpt_finetune'}
# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
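Why can one model consume tables with different columns? As we read the TransTab paper, each cell is featurized together with its column name, so rows from any table map into the same text-token space. The toy sketch below illustrates that idea only; featurize_row is hypothetical and is not transtab's implementation (the real model tokenizes and embeds these phrases rather than returning strings).

```python
from typing import Dict, List, Sequence, Tuple, Union

Cell = Union[str, int, float]


def featurize_row(
    row: Dict[str, Cell],
    cat_cols: Sequence[str],
    num_cols: Sequence[str],
    bin_cols: Sequence[str],
) -> List[Tuple[str, float]]:
    """Turn one row into (phrase, weight) pairs keyed by column names.

    Toy illustration of column-name-aware featurization, not transtab code.
    """
    tokens: List[Tuple[str, float]] = []
    for col in cat_cols:
        # categorical: column name and cell value form one text phrase
        tokens.append((f"{col} {row[col]}", 1.0))
    for col in num_cols:
        # numerical: the column name is the phrase, the value is its weight
        tokens.append((col, float(row[col])))
    for col in bin_cols:
        # binary: the column name is kept only when the value is 1
        if row[col] == 1:
            tokens.append((col, 1.0))
    return tokens
```

Rows from two tables with different schemas both become lists of phrases, so the column map passed to model.update only has to tell the model which column has which type.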
We can also conduct contrastive pretraining on multiple distinct tables:
import transtab
# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data(dataname_list)
# build contrastive learner; supervised=True enables supervised VPCL
# (vertical-partition contrastive learning), which also uses the labels
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)
# example hyperparameters (see the documentation for all options)
training_arguments = {'num_epoch': 10, 'output_dir': './ckpt'}
# start contrastive pretraining
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
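VPCL builds its views by splitting each row's columns vertically into partitions. The sketch below shows one way to compute overlapping partitions of a column list. The names vertical_partitions, num_partition, and overlap_ratio are chosen for this illustration, and the library's own partitioning may differ in details.

```python
import math
from typing import List, Sequence


def vertical_partitions(
    columns: Sequence[str], num_partition: int, overlap_ratio: float
) -> List[List[str]]:
    """Split a column list into contiguous partitions that share some columns.

    Illustrative sketch only; not transtab's implementation.
    """
    if not 1 <= num_partition <= len(columns):
        raise ValueError("num_partition must be between 1 and len(columns)")
    # contiguous, near-equal chunks: the first `extra` chunks get one more column
    base, extra = divmod(len(columns), num_partition)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_partition):
        size = base + (1 if i < extra else 0)
        chunks.append(list(columns[start:start + size]))
        start += size
    # number of neighbouring columns each partition borrows
    overlap = math.ceil(len(chunks[0]) * overlap_ratio)
    if overlap == 0 or num_partition == 1:
        return chunks
    parts: List[List[str]] = []
    for i, chunk in enumerate(chunks):
        if i < num_partition - 1:
            # borrow the first `overlap` columns of the next chunk
            parts.append(chunk + chunks[i + 1][:overlap])
        else:
            # the last chunk borrows the tail of the previous chunk
            parts.append(chunk + chunks[i - 1][-overlap:])
    return parts
```

Each partition of a row acts as one view of that row. As described in the paper, self-supervised VPCL treats views of the same row as positive pairs, while supervised=True also treats views of rows that share a class label as positives.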
If you find this package useful, please consider citing the following paper:
@inproceedings{wang2022transtab,
title={TransTab: Learning Transferable Tabular Transformers Across Tables},
author={Wang, Zifeng and Sun, Jimeng},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}