Summary: This is a non-exhaustive list of repositories containing datasets, mainly for Supervised and Unsupervised Learning. Internal datasets are stored on Zenodo.
Table of Contents
- UCI Machine Learning Repository
- Papers with Code
- Awesome Public Datasets
- Datasets on Kaggle
- DeepLearning.Net
- MLdata
- List of datasets for machine learning research
- PyTorch datasets
- TensorFlow datasets
- Network (Graph) Datasets
- MNIST
- CIFAR-10
- CIFAR-100
- ImageNet
- Tiny ImageNet
- Fashion-MNIST
- iNaturalist 2017, 2018, 2019 and 2021
- Stanford Cars
Sentiment Analysis
News Classification
Topic Classification
Question Answering (Retrieval-based)
Natural Language Inference (NLI)
Linguistic Acceptability
Semantic Textual Similarity
The following are larger datasets for training Representation Learning Models (RLMs), which extract dense representations (also called features or embeddings).