⭐ Welcome to the Graviti AI community! We are devoted to making datasets more accessible and interoperable for AI developers, and to fostering a supportive community for building machine learning applications.
- 📑 Open datasets catalog
- ▶️ Quick start on how to use open datasets
- 📖 Step-by-step Tutorial
- ✍️ Become a contributor
- 🔍 Find more datasets on Graviti
- ❓ Q&A
- 💡 Documentation
- 🧑‍🤝‍🧑 Join the community
These datasets are great for machine learning learners, researchers and engineers to train models for image classification, object detection, visual relationship detection, instance segmentation, and more.
The full list is available on Graviti Community.
Please DO NOT modify this file directly. To contribute, go to the relevant dataset page.
The Datasets repo is a lightweight library of high-quality datasets. All are open source and cover a diverse range of tasks, annotation types, and sizes.
Search by task type or keyword if you need a specific dataset. You can fork a dataset from its dataset page and read the data through the SDK.
Popular tasks
- Object Detection
- Classification
- Keypoints Detection
- Segmentation
- Pose Estimation
- ASR
- OCR
⭐ You have a complex problem or project involving a large amount of data and lots of variables. You know that finding a public dataset to train your machine learning model would be the best approach. How do you deal with data that’s in a variety of formats? How do you choose the dataset for your model?
We'll walk you through step by step from the basics to advanced techniques and help you get started!
- Sign up for an account
Go to graviti.com to sign up.
Get an AccessKey on Graviti Developer Tools.
An AccessKey is needed to authenticate identity when using TensorBay via SDK or CLI.
The AccessKey carries full permissions for your account, so keep it secure.
- Install the TensorBay Python SDK
- To install TensorBay SDK and CLI by pip, run the following command:
pip3 install tensorbay
- To verify the SDK and CLI version, run the following command:
gas --version
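
The installed version can also be checked from Python. This is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of TensorBay.

```python
from importlib import metadata


def installed_version(package="tensorbay"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "tensorbay is not installed in this environment")
```

A `None` result usually means `pip3 install tensorbay` ran against a different Python environment than the one you are using now.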
- Authorize a Client Instance
from tensorbay import GAS
gas = GAS("<YOUR_ACCESSKEY>")
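
Because the AccessKey grants access to your account, you may prefer not to hard-code it in scripts. The sketch below reads it from an environment variable instead. The variable name `GRAVITI_ACCESS_KEY` and both helper functions are assumptions made for this example, not names defined by the SDK; only `GAS(<key>)` comes from the snippet above.

```python
import os


def load_access_key(env_var="GRAVITI_ACCESS_KEY", environ=None):
    """Read the AccessKey from an environment variable (the name is hypothetical)."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Graviti AccessKey first.")
    return key


def make_client(access_key):
    """Create the client the same way as the snippet above."""
    from tensorbay import GAS  # imported lazily so load_access_key works without the SDK

    return GAS(access_key)
```

With the variable set in your shell, `gas = make_client(load_access_key())` gives the same client as before without the key appearing in source control.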
- Select an open dataset
You need to fork an open dataset from the community to your Graviti workspace before processing the data.
- Search datasets from the open dataset catalog 📖
- Preview the data and annotations
View the data visualization in advance to quickly understand a dataset and its semantic information.
- On the dataset page, choose to fork the dataset in the 'Explore Dataset' drop-down menu.
- Find the dataset on the 'Your Datasets' list
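
Once the fork is done, you can confirm it from the SDK as well as from the web page. The following is a rough sketch, assuming that `GAS.list_dataset_names()` and `Dataset.keys()` behave as described in the TensorBay SDK documentation; the two helper functions are hypothetical names introduced here.

```python
def is_in_workspace(gas, dataset_name):
    """Return True if `dataset_name` is among the names the client can list."""
    return any(name == dataset_name for name in gas.list_dataset_names())


def segment_names(gas, dataset_name):
    """List the segment names of a dataset that is already in the workspace."""
    from tensorbay.dataset import Dataset  # lazy import: needs the SDK installed

    dataset = Dataset(dataset_name, gas)
    return list(dataset.keys())
```

For example, `is_in_workspace(gas, "DogsVsCats")` should be true after forking DogsVsCats, and `segment_names(gas, "DogsVsCats")` then shows which segments you can read.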
- Prepare data
You can customize open datasets into the right dataset for your models by using the features below.
- Integrate with machine learning frameworks (PyTorch, TensorFlow and more)
- PyTorch 📖
The typical method to integrate a dataset with PyTorch is to build a ‘Segment’ class derived from ‘torch.utils.data.Dataset’.
- TensorFlow 📖
The typical method to integrate a dataset with TensorFlow is to build a callable ‘Segment’ class.
- We recommend enabling the cache for a better training experience. Sample code is below (it requires enough local storage to hold the dataset).
    from paddle.io import DataLoader, Dataset
    from PIL import Image
    from tensorbay.dataset import Dataset as TensorBayDataset

    class DogsVSCatsSegment(Dataset):
        """Wrap one segment of the DogsVsCats dataset."""

        def __init__(self, gas, segment_name, transform):
            super().__init__()
            self.dataset = TensorBayDataset("DogsVsCats", gas)
            self.dataset.enable_cache()  # turn on the local cache
            self.segment = self.dataset[segment_name]
            self.category_to_index = self.dataset.catalog.classification.get_category_to_index()
            self.transform = transform
            print(self.dataset.cache_enabled)  # confirm that the cache is enabled
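
The class above only shows `__init__`. A map-style dataset also needs `__len__` and `__getitem__` before a data loader can iterate over it. The sketch below shows one way those two methods could look, written without any framework dependency so it can be read and run on its own. It assumes each item in a segment offers `open()` and `label.classification.category`, as in the TensorBay data model; `SegmentItems`, `open_with_pil`, and the `load_image` parameter are hypothetical names for this example.

```python
import io
from types import SimpleNamespace


def open_with_pil(fp):
    """Default image loader; requires Pillow."""
    from PIL import Image  # lazy import so the demo below runs without Pillow

    return Image.open(fp).convert("RGB")


class SegmentItems:
    """Index-based access to (image, label) pairs of one segment."""

    def __init__(self, segment, category_to_index, transform, load_image=open_with_pil):
        self.segment = segment
        self.category_to_index = category_to_index
        self.transform = transform
        self.load_image = load_image

    def __len__(self):
        return len(self.segment)

    def __getitem__(self, idx):
        data = self.segment[idx]
        with data.open() as fp:  # load and transform while the file is still open
            image = self.transform(self.load_image(fp))
        category = data.label.classification.category
        return image, self.category_to_index[category]


# Demo with stand-in objects instead of a real TensorBay segment.
def fake_data(category, payload):
    label = SimpleNamespace(classification=SimpleNamespace(category=category))
    return SimpleNamespace(open=lambda: io.BytesIO(payload), label=label)


demo = SegmentItems(
    segment=[fake_data("cat", b"a"), fake_data("dog", b"bc")],
    category_to_index={"cat": 0, "dog": 1},
    transform=len,                    # stand-in for a real image transform
    load_image=lambda fp: fp.read(),  # stand-in for the Pillow loader
)
```

In a real project the same two methods would go directly into the `Dataset` subclass, with `self.segment`, `self.category_to_index`, and `self.transform` set in `__init__` as shown earlier.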
- Check the full tutorial for advanced tools and techniques.
Contributions are welcome and greatly appreciated. You can become a community contributor in many different ways, and we value all forms of contribution, including:
- Improve code
- Improve docs
- Report bugs
- Write blogs
- Give talks
- Provide ideas
- Answer questions
Can I use these datasets for my project?
Sure! You're free to do so. Check the detailed license information on each dataset page.
Can I add a dataset here?
Send us a pull request and we'll discuss.
To connect with practitioners like you, join our community Discord.