Awesome Self-Supervised Learning for Non-Sequential Tabular Data (SSL4NSTD)


This repository collects frontier research on self-supervised learning for tabular data, a topic that has recently attracted considerable attention.
This list is maintained by Wei-Wei Du and Wei-Yao Wang and is actively updated.
If you come across relevant resources or find errors in this repository, feel free to open an issue or submit a PR.

Survey Paper

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML-24 Journal Track)

Citation

@article{DBLP:journals/corr/abs-2402-01204,
  author       = {Wei{-}Yao Wang and
                  Wei{-}Wei Du and
                  Derek Xu and
                  Wei Wang and
                  Wen{-}Chih Peng},
  title        = {A Survey on Self-Supervised Learning for Non-Sequential Tabular Data},
  journal      = {CoRR},
  volume       = {abs/2402.01204},
  year         = {2024}
}

Papers

Predictive Learning

  • VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain (NeurIPS'20) [Paper] [Supplementary] [Code]
  • TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL'20) [Paper]
  • TABBIE: Pretrained Representations of Tabular Data (NAACL'21) [Paper] [Code]
  • CORE: Self- and Semi-supervised Tabular Learning with COnditional REgularizations (NeurIPS'21) [Paper]
  • TabTransformer: Tabular Data Modeling Using Contextual Embeddings [Paper]
  • TabNet: Attentive Interpretable Tabular Learning (AAAI'21) [Paper] [Code]
  • Self-Supervision Enhanced Feature Selection with Correlated Gates (ICLR'22) [Paper] [Code]
  • TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper] [Code] [Blog]
  • LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks (NeurIPS'22) [Paper] [Code]
  • Self Supervised Pre-training for Large Scale Tabular Data (NeurIPS'22 TRL Workshop) [Paper] [Blog]
  • Local Contrastive Feature Learning for Tabular Data (CIKM'22) [Paper]
  • Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data (preprint'23) [Paper]
  • Generative Table Pre-training Empowers Models for Tabular Prediction (EMNLP'23) [Paper] [Code]
  • TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (ICLR'23) [Paper] [Code]
  • STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables (ICLR'23) [Paper] [Code]
  • Language Models are Realistic Tabular Data Generators (ICLR'23) [Paper] [Code]
  • Self-supervised Representation Learning from Random Data Projectors (NeurIPS'23 TRL Workshop) [Paper] [Code]
  • SwitchTab: Switched Autoencoders Are Effective Tabular Learners (AAAI'24) [Paper]
  • Making Pre-trained Language Models Great on Tabular Prediction (ICLR'24) [Paper]
  • Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains (ICML'24) [Paper] [Code]
  • Large Scale Transfer Learning for Tabular Data via Language Modeling (preprint'24) [Paper] [Code]

Contrastive Learning

  • SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption (ICLR'22) [Paper] [Code]
  • STab: Self-supervised Learning for Tabular Data (NeurIPS'22 Workshop on TRL) [Paper]
  • TransTab: Learning Transferable Tabular Transformers Across Tables (NeurIPS'22) [Paper]
  • PTaRL: Prototype-based Tabular Representation Learning via Space Calibration (ICLR'24) [Paper]

Hybrid Learning

  • SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning (NeurIPS'21) [Paper] [Supplementary] [Code]
  • SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training (NeurIPS'22 Workshop on TRL) [Paper] [Code]
  • Transfer Learning with Deep Tabular Models (ICLR'23) [Paper] [Code]
  • DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM'23) [Paper] [Code]
  • ReConTab: Regularized Contrastive Representation Learning for Tabular Data (NeurIPS'23 Workshop on TRL) [Paper]
  • XTab: Cross-table Pretraining for Tabular Transformers (ICML'23) [Paper]
  • UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science (ICLR'24) [Paper]

Benchmarks

| Benchmark | Task | #Datasets | Paper |
|---|---|---|---|
| MLPCBench | Classification | 40 | Kadra et al., 2021 |
| DLBench | Classification, Regression | 11 | Shwartz-Ziv and Armon, 2022 |
| TabularBench | Classification, Regression | 45 | Grinsztajn et al., 2022 |
| TabZilla | Classification | 36 | McElfresh et al., 2023 |
| TabPretNet | Unlabeled, Classification, Regression | 2000 | Ye et al., 2023 |
| The Tremendous TabLib Trawl (T4) | Unlabeled | 3.1M | Gardner et al., 2024 |

Tutorials

  • Self-Supervised Learning: Self-Prediction and Contrastive Learning (NeurIPS'21) [Website]

Workshops

  • Table Representation Learning (NeurIPS) [Website]

Related Surveys

  • Deep Neural Networks and Tabular Data: A Survey [Paper]
  • Self-Supervised Learning for Recommender Systems: A Survey (TKDE) [Paper]
  • Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data [Paper]
  • Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects [Paper]
  • On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence [Paper]
  • A Survey on Time-Series Pre-Trained Models [Paper]

Tools & Libraries

  • PyTorch Frame: A modular deep learning framework for building neural network models on heterogeneous tabular data [Link]
  • PyTorch Tabular: A Framework for Deep Learning with Tabular Data [Link]
