GitHub - Deci-AI/super-gradients at bcdeb0d9a4748e681c8cb2eea0ba8c23360f9767

Latest commit: Louis-Dupont, "improve docstring and homogenize some names", Apr 11, 2023 (bcdeb0d) · 1,818 commits

Build, train, and fine-tune production-ready deep learning SOTA vision models

Version 3 is out! Notebooks have been updated!

Website • Docs • Getting Started • Pretrained Models • Community • License • Deci Platform

Build with SuperGradients

Support various computer vision tasks

Ready to deploy pre-trained SOTA models

# Load model with pretrained weights
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLOX_S, pretrained_weights="coco")

All Computer Vision Models - Pretrained Checkpoints can be found in the Model Zoo

Classification

Semantic Segmentation

Object Detection

Easy to train SOTA Models

Easily load and fine-tune production-ready, pre-trained SOTA models that incorporate best practices and validated hyper-parameters for achieving best-in-class accuracy. For more information on how to do this, go to Getting Started.

Plug and play recipes

python -m super_gradients.examples.train_from_recipe_example.train_from_recipe --config-name=imagenet_regnetY architecture=regnetY800 dataset_interface.data_dir=<YOUR_Imagenet_LOCAL_PATH> ckpt_root_dir=<CHECKPOINT_DIRECTORY>

More examples of how and why to use recipes can be found in Recipes.

Production readiness

All SuperGradients models are production ready in the sense that they are compatible with deployment tools such as TensorRT (Nvidia) and OpenVINO (Intel) and can be easily taken into production. With a few lines of code you can integrate the models into your codebase.

# Load model with pretrained weights
import torch
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLOX_S, pretrained_weights="coco")

# Prepare model for conversion
# Input size is in the format [Batch x Channels x Height x Width]; 640 is the standard COCO input dimension
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 640, 640])

# Create dummy_input with the same shape as the input size above
dummy_input = torch.randn(1, 3, 640, 640)

# Convert model to onnx
torch.onnx.export(model, dummy_input, "yolox_s.onnx")

More information on how to take your model to production can be found in Getting Started notebooks

Quick Installation

pip install super-gradients

What's New

【17/11/2022】 Integration with ClearML
【06/09/2022】 PP-LiteSeg - new pre-trained checkpoints and recipes for Cityscapes with SOTA mIoU scores (~1.5% above paper)🎯
【07/08/2022】DDRNet23 - new pre-trained checkpoints and recipes for Cityscapes with SOTA mIoU scores (~1% above paper)🎯
【27/07/2022】YOLOX models (object detection) - recipes and pre-trained checkpoints.
【07/07/2022】SSD Lite MobileNet V2,V1 - Training recipes and pre-trained checkpoints on COCO - Tailored for edge devices! 📱
【07/07/2022】 STDC - new pre-trained checkpoints and recipes for Cityscapes with super SOTA mIoU scores (~2.5% above paper)🎯

Check out SG full release notes.

Coming soon

PP-Yolo-E implementation
Quantization aware training (QAT)
Tools for faster training
Integration with more professional tools.

Table of Contents

Getting Started
Advanced Features
Installation Methods
- Prerequisites
- Quick Installation
Implemented Model Architectures
Contributing
Citation
Community
License
Deci Platform

Getting Started

Start Training with Just 1 Command Line

The simplest way to start training SOTA models is with SuperGradients' reproducible recipes. Just define your dataset path and where you want your checkpoints to be saved, and you are good to go from your terminal!

Just make sure that you set up your dataset according to the data dir specified in the recipe.

python -m super_gradients.examples.train_from_recipe_example.train_from_recipe --config-name=imagenet_regnetY architecture=regnetY800 dataset_interface.data_dir=<YOUR_Imagenet_LOCAL_PATH> ckpt_root_dir=<CHECKPOINT_DIRECTORY>

Quickly Load Pre-Trained Weights for Your Desired Model with SOTA Performance

Want to try our pre-trained models on your machine? Import SuperGradients, then load your desired architecture and pre-trained weights from our SOTA model zoo.

# The pretrained_weights argument loads weights that were pre-trained on the given dataset
from super_gradients.training import models

model = models.get("model-name", pretrained_weights="pretrained-model-name")

Classification

Transfer Learning

Classification Transfer Learning

GitHub source

Semantic Segmentation

Object Detection

Transfer Learning

Detection Transfer Learning

GitHub source

How to Connect Custom Dataset

Detection How to Connect Custom Dataset

GitHub source

How to Predict Using Pre-trained Model

Segmentation, Detection and Classification Prediction

How to Predict Using Pre-trained Model

GitHub source

Advanced Features

Knowledge Distillation Training

Knowledge distillation is a training technique that uses a large model (the teacher) to improve the performance of a smaller model (the student). Learn more about SuperGradients knowledge distillation training with our example notebook on Google Colab, which uses a pre-trained BEiT base teacher model and a ResNet18 student model on CIFAR10. It is an easy-to-use tutorial that runs on free GPU hardware.

Knowledge Distillation Training

GitHub source
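To make the teacher/student idea concrete, here is a minimal, library-free sketch of a commonly used distillation objective: hard-label cross-entropy blended with a temperature-softened KL term. This is a generic formulation and not necessarily the loss SuperGradients uses; the function names, `temperature` and `alpha` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term toward the teacher (assumed formulation)."""
    # Hard-label term: standard cross-entropy at temperature 1.
    cross_entropy = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    # Multiplying by T^2 keeps the soft term's scale comparable across temperatures.
    return alpha * cross_entropy + (1.0 - alpha) * temperature ** 2 * kl


teacher = [5.0, 1.0, -2.0]
loss_close = distillation_loss([4.0, 1.5, -1.0], teacher, label=0)  # student agrees with teacher
loss_far = distillation_loss([-2.0, 1.0, 5.0], teacher, label=0)    # student disagrees
```

A student whose logits resemble the teacher's gets a lower loss than one that disagrees, which is the signal that drives the smaller model toward the larger one.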

Recipes

To train a model, it is necessary to configure 4 main components: dataset, architecture, training and checkpoint params. These components are aggregated into a single "main" recipe .yaml file that inherits them. It is also possible (and recommended for flexibility) to override default settings with custom ones. All recipes can be found here.
Recipes support out of the box every model, metric or loss implemented in SuperGradients, but you can easily extend this to any custom object you need by "registering" it. Check out this tutorial for more information.

How to Use Recipes

GitHub source
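The override mechanism can be pictured as a nested dictionary of defaults onto which `key.path=value` pairs are applied. The sketch below only mimics that idea in plain Python; it is not how SuperGradients loads recipes, and `apply_overrides`, `training_hyperparams` and `max_epochs` are hypothetical names chosen for illustration.

```python
import copy


def apply_overrides(recipe, overrides):
    """Return a copy of `recipe` with `dotted.key=value` overrides applied (values stay strings)."""
    merged = copy.deepcopy(recipe)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})  # walk down, creating levels if missing
        node[leaf] = value
    return merged


# Hypothetical defaults standing in for the four components of a recipe.
default_recipe = {
    "architecture": "regnetY800",
    "dataset_interface": {"data_dir": "/data/imagenet"},
    "training_hyperparams": {"max_epochs": "100"},
    "ckpt_root_dir": "checkpoints",
}

custom = apply_overrides(
    default_recipe,
    ["dataset_interface.data_dir=/mnt/imagenet", "ckpt_root_dir=/tmp/ckpts"],
)
```

Only the overridden keys change; everything else is inherited from the defaults, which is what makes a recipe reusable across machines and datasets.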

Using Distributed Data Parallel (DDP)

Why use DDP?

Recent deep learning models have grown so large that training on a single GPU can take weeks. To train models in a timely fashion, it is necessary to train them on multiple GPUs. Using hundreds of GPUs can reduce the training time of a model from a week to less than an hour.

How does it work?

Each GPU has its own process, which controls a copy of the model, loads its own mini-batch from disk, and sends it to its GPU during training. During the backward pass, the gradients are reduced (averaged) across all GPUs, so every GPU ends up with the same gradient locally. As a result, the model weights stay synchronized across all GPUs after each optimizer step.
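The synchronization described here can be simulated without any GPU. The following toy sketch (plain Python, two "workers", a two-parameter linear model) is only an illustration of gradient averaging; the helper names are made up and it does not use PyTorch's actual DDP machinery.

```python
def all_reduce_mean(per_worker_grads):
    """Average gradients element-wise across workers, as an all-reduce would."""
    count = len(per_worker_grads)
    return [sum(values) / count for values in zip(*per_worker_grads)]


def local_gradient(weights, batch):
    """Gradient of the mean squared error for y = w0 * x + w1 on one mini-batch."""
    g0 = g1 = 0.0
    for x, y in batch:
        error = weights[0] * x + weights[1] - y
        g0 += 2 * error * x / len(batch)
        g1 += 2 * error / len(batch)
    return [g0, g1]


# Two simulated workers, each with its own model replica and its own mini-batch.
replicas = [[0.0, 0.0], [0.0, 0.0]]
batches = [[(1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
learning_rate = 0.01

for _ in range(3):
    grads = [local_gradient(w, b) for w, b in zip(replicas, batches)]
    shared = all_reduce_mean(grads)  # every worker receives the same averaged gradient
    replicas = [[w - learning_rate * g for w, g in zip(weights, shared)] for weights in replicas]
```

Because every replica starts from the same weights and applies the same averaged gradient, the replicas remain identical after every step even though each one saw different data.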

How to use it?

You can use SuperGradients to train your model with DDP in just a few lines.

main.py

from super_gradients import init_trainer, Trainer
from super_gradients.common import MultiGPUMode
from super_gradients.training.utils.distributed_training_utils import setup_device

# Initialize the environment
init_trainer()

# Launch DDP on 4 GPUs
setup_device(multi_gpu=MultiGPUMode.DISTRIBUTED_DATA_PARALLEL, num_gpus=4)

# Create the trainer
trainer = Trainer(experiment_name=...)

# Everything you do below will run on 4 GPUs

...

trainer.train(...)

Finally, you can launch your distributed training with a simple python call.

python main.py

Please note that if you work with torch<1.9.0 (deprecated), you will have to launch your training with either torch.distributed.launch or torchrun, in which case nproc_per_node will override the number of GPUs set in setup_device:

python -m torch.distributed.launch --nproc_per_node=4 main.py

torchrun --nproc_per_node=4 main.py

Calling functions on a single node

In DDP training, we often want to execute code only on the master rank (i.e. rank 0). In SG, users usually execute their own code by triggering "Phase Callbacks" (see the "Using phase callbacks" section below). You can make sure the desired code runs only on rank 0 by using ddp_silent_mode or the multi_process_safe decorator. For example, consider the simple phase callback below, which uploads the first 3 images of every batch during training to TensorBoard:

from super_gradients.training.utils.callbacks import PhaseCallback, PhaseContext, Phase
from super_gradients.common.environment.env_helpers import multi_process_safe

class Upload3TrainImagesCallback(PhaseCallback):
    def __init__(
        self,
    ):
        super().__init__(phase=Phase.TRAIN_BATCH_END)
    
    @multi_process_safe
    def __call__(self, context: PhaseContext):
        batch_imgs = context.inputs.cpu().detach().numpy()
        tag = "batch_" + str(context.batch_idx) + "_images"
        context.sg_logger.add_images(tag=tag, images=batch_imgs[: 3], global_step=context.epoch)

The @multi_process_safe decorator ensures that the callback will only be triggered by rank 0. Alternatively, this can also be done by the SG trainer boolean attribute (which the phase context has access to), ddp_silent_mode, which is set to False iff the current process rank is zero (even after the process group has been killed):

from super_gradients.training.utils.callbacks import PhaseCallback, PhaseContext, Phase

class Upload3TrainImagesCallback(PhaseCallback):
    def __init__(
        self,
    ):
        super().__init__(phase=Phase.TRAIN_BATCH_END)

    def __call__(self, context: PhaseContext):
        if not context.ddp_silent_mode:
            batch_imgs = context.inputs.cpu().detach().numpy()
            tag = "batch_" + str(context.batch_idx) + "_images"
            context.sg_logger.add_images(tag=tag, images=batch_imgs[: 3], global_step=context.epoch)

Note that ddp_silent_mode can also be accessed through the trainer object (Trainer.ddp_silent_mode). Hence, it can be used in scripts after calling Trainer.train() when some part of the script should be run on rank 0 only.
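For intuition, a rank-0 guard can be written as a small decorator. The sketch below is not the implementation of multi_process_safe; it assumes the launcher exports the process rank in the `RANK` environment variable (as torchrun does), and `rank_zero_only` and `log_message` are hypothetical names.

```python
import functools
import os


def rank_zero_only(fn):
    """Run `fn` only when the RANK environment variable is 0 or unset; otherwise return None."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if int(os.environ.get("RANK", "0")) == 0:
            return fn(*args, **kwargs)
        return None  # silently skipped on every other rank
    return wrapper


@rank_zero_only
def log_message(message):
    return f"[rank 0] {message}"
```

With this guard, logging, checkpoint uploads and similar side effects happen once per job rather than once per GPU process.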

Good to know

Your total batch size will be (number of GPUs x batch size), so you might want to increase your learning rate. There is no clear rule, but a common rule of thumb is to increase the learning rate linearly with the number of GPUs.
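As a worked example of this rule of thumb, the helper below computes the total batch size and a linearly scaled learning rate. `scale_for_ddp` is a hypothetical helper, not part of SuperGradients, and linear scaling is a heuristic rather than a guarantee.

```python
def scale_for_ddp(per_gpu_batch_size, base_lr, num_gpus):
    """Return (total batch size, linearly scaled learning rate) for `num_gpus` workers."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_batch_size = per_gpu_batch_size * num_gpus
    scaled_lr = base_lr * num_gpus  # linear scaling heuristic
    return total_batch_size, scaled_lr


# A batch size of 32 per GPU on 4 GPUs, starting from a single-GPU learning rate of 0.1.
total, lr = scale_for_ddp(per_gpu_batch_size=32, base_lr=0.1, num_gpus=4)
```

Here the total batch size is 128 and the scaled learning rate is four times the single-GPU value; treat the result as a starting point to validate, not a fixed setting.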

Easily change architecture parameters

from super_gradients.training import models

# instantiate default pretrained resnet18
default_resnet18 = models.get(model_name="resnet18", num_classes=100, pretrained_weights="imagenet")

# instantiate pretrained resnet18, turning DropPath on with probability 0.5
droppath_resnet18 = models.get(model_name="resnet18", arch_params={"droppath_prob": 0.5}, num_classes=100, pretrained_weights="imagenet")

# instantiate pretrained resnet18, without classifier head. Output will be from the last stage before global pooling
backbone_resnet18 = models.get(model_name="resnet18", arch_params={"backbone_mode": True}, pretrained_weights="imagenet")

Using phase callbacks

from super_gradients import Trainer
from torch.optim.lr_scheduler import ReduceLROnPlateau
from super_gradients.training.utils.callbacks import Phase, LRSchedulerCallback
from super_gradients.training.metrics.classification_metrics import Accuracy

# define PyTorch train and validation loaders and optimizer

# define what to be called in the callback
rop_lr_scheduler = ReduceLROnPlateau(optimizer, mode="max", patience=10, verbose=True)

# define phase callbacks, they will fire as defined in Phase
phase_callbacks = [LRSchedulerCallback(scheduler=rop_lr_scheduler,
                                       phase=Phase.VALIDATION_EPOCH_END,
                                       metric_name="Accuracy")]

# create a trainer object, see the declaration for more parameters
trainer = Trainer("experiment_name")

# define phase_callbacks as part of the training parameters
train_params = {"phase_callbacks": phase_callbacks}

Integration to Weights and Biases

from super_gradients import Trainer

# create a trainer object, see the declaration for more parameters
trainer = Trainer("experiment_name")

train_params = {
    # ... other training parameters
    "sg_logger": "wandb_sg_logger",  # Weights&Biases Logger, see class WandBSGLogger for details
    "sg_logger_params": {  # parameters that will be passed to __init__ of the logger
        "project_name": "project_name",  # W&B project name
        "save_checkpoints_remote": True,
        "save_tensorboard_remote": True,
        "save_logs_remote": True,
    },
}

Integration to ClearML

from super_gradients import Trainer

# create a trainer object, see the declaration for more parameters
trainer = Trainer("experiment_name")

train_params = {
    # ... other training parameters
    "sg_logger": "clearml_sg_logger",  # ClearML Logger, see class ClearMLSGLogger for details
    "sg_logger_params": {  # parameters that will be passed to __init__ of the logger
        "project_name": "project_name",  # ClearML project name
        "save_checkpoints_remote": True,
        "save_tensorboard_remote": True,
        "save_logs_remote": True,
    },
}

Installation Methods

Prerequisites

General requirements

Python 3.7, 3.8 or 3.9 installed.
1.9.0 <= torch < 1.14
- https://pytorch.org/get-started/locally/
The Python packages that are specified in requirements.txt.

To train on nvidia GPUs

Nvidia CUDA Toolkit >= 11.2
CuDNN >= 8.1.x
Nvidia Driver with CUDA >= 11.2 support (≥460.x)

Quick Installation

Install stable version using PyPI

See in PyPI

pip install super-gradients

That's it!

Install using GitHub

pip install git+https://github.com/Deci-AI/super-gradients.git@stable

Implemented Model Architectures

All Computer Vision Models - Pretrained Checkpoints can be found in the Model Zoo

Detailed list can be found here

Image Classification

Semantic Segmentation

Object Detection

Implemented Datasets

Deci provides implementations for various datasets. If you need to download any of the datasets, you can find instructions.

Image Classification

Semantic Segmentation

Object Detection

Documentation

Check SuperGradients Docs for full documentation, user guide, and examples.

Contributing

To learn about making a contribution to SuperGradients, please see our Contribution page.

Our awesome contributors:

Made with contrib.rocks.

Citation

If you are using SuperGradients library or benchmarks in your research, please cite SuperGradients deep learning training library.

Community

If you want to be a part of the growing SuperGradients community, hear about all the exciting news and updates, get help, request advanced features, or file a bug or issue report, we would love to welcome you aboard!

Slack is the place to be and ask questions about SuperGradients and get support. Click here to join our Slack
To report a bug, file an issue on GitHub.
Join the SG Newsletter for staying up to date with new features and models, important announcements, and upcoming events.
For a short meeting with us, use this link and choose your preferred time.

License

This project is released under the Apache 2.0 license.

Citing

BibTeX

@misc{supergradients,
  doi = {10.5281/ZENODO.7789328},
  url = {https://zenodo.org/record/7789328},
  author = {Aharon,  Shay and {Louis-Dupont} and {Ofri Masad} and Yurkova,  Kate and {Lotem Fridman} and {Lkdci} and Khvedchenya,  Eugene and Rubin,  Ran and Bagrov,  Natan and Tymchenko,  Borys and Keren,  Tomer and Zhilko,  Alexander and {Eran-Deci}},
  title = {Super-Gradients},
  publisher = {GitHub},
  journal = {GitHub repository},
  year = {2021},
}

Latest DOI

Deci Platform

Deci Platform is our end to end platform for building, optimizing and deploying deep learning models to production.

Request free trial to enjoy immediate improvement in throughput, latency, memory footprint and model size.

Features:

Automatically compile and quantize your models with just a few clicks (TensorRT, OpenVINO).
Gain up to 10X improvement in throughput, latency, memory and model size.
Easily benchmark your models’ performance on different hardware and batch sizes.
Invite co-workers to collaborate on models and communicate your progress.
Deci supports all common frameworks and hardware, from Intel CPUs to Nvidia's GPUs and Jetsons.

Request free trial here
