Awesome Self-Supervised Learning for Non-Sequential Tabular Data (SSL4NSTD)

This repository collects frontier research on self-supervised learning (SSL) for tabular data, which has recently become a popular topic.
This list is maintained by Wei-Wei Du and Wei-Yao Wang and is actively updated.
If you come across relevant resources or find errors in this repository, feel free to open an issue or submit a PR.

Our Survey Paper

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML'24 Journal Track)

Citation

@article{DBLP:journals/corr/abs-2402-01204,
  author       = {Wei{-}Yao Wang and
                  Wei{-}Wei Du and
                  Derek Xu and
                  Wei Wang and
                  Wen{-}Chih Peng},
  title        = {A Survey on Self-Supervised Learning for Non-Sequential Tabular Data},
  journal      = {CoRR},
  volume       = {abs/2402.01204},
  year         = {2024}
}

Papers

Predictive Learning

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain (NeurIPS'20) [Paper] [Supplementary] [Code]
TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL'20) [Paper]
TABBIE: Pretrained Representations of Tabular Data (NAACL'21) [Paper] [Code]
CORE: Self- and Semi-supervised Tabular Learning with COnditional REgularizations (NeurIPS'21) [Paper]
TabTransformer: Tabular Data Modeling Using Contextual Embeddings [Paper]
TabNet: Attentive Interpretable Tabular Learning (AAAI'21) [Paper] [Code]
Self-Supervision Enhanced Feature Selection with Correlated Gates (ICLR'22) [Paper] [Code]
TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper] [Code] [Blog]
LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks (NeurIPS'22) [Paper] [Code]
Self Supervised Pre-training for Large Scale Tabular Data (NeurIPS'22 TRL Workshop) [Paper] [Blog]
Local Contrastive Feature Learning for Tabular Data (CIKM'22) [Paper]
Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data (preprint'23) [Paper]
Generative Table Pre-training Empowers Models for Tabular Prediction (EMNLP'23) [Paper] [Code]
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (ICLR'23) [Paper] [Code]
STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables (ICLR'23) [Paper] [Code]
Language Models are Realistic Tabular Data Generators (ICLR'23) [Paper] [Code]
Self-supervised Representation Learning from Random Data Projectors (NeurIPS'23 TRL Workshop) [Paper] [Code]
SwitchTab: Switched Autoencoders Are Effective Tabular Learners (AAAI'24) [Paper]
Making Pre-trained Language Models Great on Tabular Prediction (ICLR'24) [Paper]
Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains (ICML'24) [Paper] [Code]
Large Scale Transfer Learning for Tabular Data via Language Modeling (NeurIPS'24) [Paper] [Code]
Accurate predictions on small data with a tabular foundation model (Nature'25) [Paper]
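The following minimal NumPy sketch illustrates the predictive pretext idea using VIME-style input corruption, as we understand it from the paper. Each cell is masked with probability `p_m`. A masked cell is replaced by a value drawn from the same feature's empirical marginal, which is done here by shuffling each column independently. The pretext targets are the mask and the original row. The function name `vime_pretext` is hypothetical, and this is not the authors' code.

```python
import numpy as np


def vime_pretext(x: np.ndarray, p_m: float, rng: np.random.Generator):
    """Build a (mask target, corrupted input) pair, VIME-style (hypothetical helper).

    x   : (n_samples, n_features) numeric array
    p_m : probability that any given cell is masked
    """
    n, d = x.shape
    # Binary mask: 1 means "replace this cell".
    mask = rng.binomial(1, p_m, size=(n, d))
    # Sample replacements from each feature's empirical marginal
    # by permuting every column independently.
    x_bar = np.empty_like(x)
    for j in range(d):
        x_bar[:, j] = x[rng.permutation(n), j]
    # Keep the original value where mask == 0, use the replacement where mask == 1.
    x_tilde = x * (1 - mask) + x_bar * mask
    # A masked cell can receive its own value back, so the target marks
    # only the cells whose value actually changed.
    mask_target = (x != x_tilde).astype(np.int64)
    return mask_target, x_tilde
```

An encoder trained on `x_tilde` would typically feed two heads. One head predicts `mask_target` (binary cross-entropy) and the other reconstructs `x` (for example with MSE).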

Contrastive Learning

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption (ICLR'22) [Paper] [Code]
STab: Self-supervised Learning for Tabular Data (NeurIPS'22 Workshop on TRL) [Paper]
TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper]
PTaRL: Prototype-based Tabular Representation Learning via Space Calibration (ICLR'24) [Paper]
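The contrastive recipe can be sketched in the spirit of SCARF. First, a random subset of each row's features is corrupted by resampling those values from the feature's empirical marginal. Both the original and the corrupted rows are then embedded and scored with an InfoNCE loss. The matching row serves as the positive and the other rows in the batch serve as negatives. `scarf_corrupt` and `info_nce` are hypothetical names, and the fixed per-row corruption count is a simplification. The encoder is omitted, and the paper's exact loss and hyperparameters may differ.

```python
import numpy as np


def scarf_corrupt(x: np.ndarray, corruption_rate: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Return a corrupted copy of x (hypothetical helper, SCARF-style).

    For every row, floor(corruption_rate * n_features) distinct columns are
    chosen uniformly at random. Their values are replaced by values of the
    same column taken from randomly chosen rows (the empirical marginal).
    """
    n, d = x.shape
    q = int(corruption_rate * d)
    x_tilde = x.copy()
    for i in range(n):
        cols = rng.choice(d, size=q, replace=False)  # features to corrupt
        rows = rng.integers(0, n, size=q)            # donor rows
        x_tilde[i, cols] = x[rows, cols]             # donors come from the clean x
    return x_tilde


def info_nce(z_anchor: np.ndarray, z_positive: np.ndarray,
             tau: float = 1.0) -> float:
    """One-directional InfoNCE loss.

    Row i of z_positive is the positive for row i of z_anchor.
    All other rows in the batch act as negatives.
    """
    a = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    p = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = a @ p.T / tau                       # cosine similarities / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

In training, `info_nce(f(x), f(scarf_corrupt(x, c, rng)))` would be minimized over an encoder `f` (plus projection head), where `f` and `c` are placeholders.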

Hybrid Learning

SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning (NeurIPS'21) [Paper] [Supplementary] [Code]
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training (NeurIPS'22 Workshop on TRL) [Paper] [Code]
Transfer Learning with Deep Tabular Models (ICLR'23) [Paper] [Code]
DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM'23) [Paper] [Code]
ReConTab: Regularized Contrastive Representation Learning for Tabular Data (NeurIPS'23 Workshop on TRL) [Paper]
XTab: Cross-table Pretraining for Tabular Transformers (ICML'23) [Paper]
UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science (ICLR'24) [Paper]
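Hybrid methods such as SubTab combine a predictive term with a contrastive one. The sketch below covers only the view-generation step. It loosely follows SubTab's idea of splitting the columns into overlapping feature subsets. The contiguous-block rule and the helper names `feature_subsets` and `make_views` are our own simplification, not the paper's exact procedure.

```python
import numpy as np


def feature_subsets(n_features: int, n_subsets: int, overlap: int):
    """Split column indices into contiguous blocks (hypothetical scheme).

    Each block is widened by `overlap` columns on both sides where possible,
    so neighbouring blocks share features.
    """
    if not 1 <= n_subsets <= n_features:
        raise ValueError("need 1 <= n_subsets <= n_features")
    bounds = np.linspace(0, n_features, n_subsets + 1).astype(int)
    subsets = []
    for k in range(n_subsets):
        lo = max(0, bounds[k] - overlap)
        hi = min(n_features, bounds[k + 1] + overlap)
        subsets.append(np.arange(lo, hi))
    return subsets


def make_views(x: np.ndarray, subsets):
    """Return one view per subset, i.e. x restricted to that subset's columns."""
    return [x[:, cols] for cols in subsets]
```

In a full pipeline, each view would be encoded separately. A decoder would reconstruct the full row from each view, which gives the predictive term. A contrastive loss would pull together embeddings of views from the same row. The total objective is then a weighted sum of the two terms.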

Benchmarks

Benchmark	Task	#Datasets	Paper
MLPCBench	Classification	40	Kadra et al., 2021
DLBench	Classification, Regression	11	Shwartz-Ziv and Armon, 2022
TabularBench	Classification, Regression	45	Grinsztajn et al., 2022
TabZilla	Classification	36	McElfresh et al., 2023
TabPretNet	Unlabeled, Classification, Regression	2000	Ye et al., 2023
The Tremendous TabLib Trawl (T4)	Unlabeled	3.1M	Gardner et al., 2024

Tutorials

Self-Supervised Learning: Self-Prediction and Contrastive Learning (NeurIPS'21) [Website]

Workshops

Table Representation Learning (NeurIPS) [Website]

Related Survey

Deep Neural Networks and Tabular Data: A Survey [Paper]
Self-Supervised Learning for Recommender Systems: A Survey (TKDE) [Paper]
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data [Paper]
Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects [Paper]
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence [Paper]
A Survey on Time-Series Pre-Trained Models [Paper]

Tools & Libraries

PyTorch Frame: A modular deep learning framework for building neural network models on heterogeneous tabular data [Link]
PyTorch Tabular: A Framework for Deep Learning with Tabular Data [Link]
pytorch-widedeep: A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch [Link]
