The FasterTransformer DeBERTa implementation matches the HuggingFace DeBERTa-V2 model (https://huggingface.co/docs/transformers/model_doc/deberta-v2).
This document describes what FasterTransformer provides for the DeBERTa model, explaining the workflow and optimizations. We also provide a guide to help users run the DeBERTa model on FasterTransformer.
- Checkpoint loading
  - HuggingFace (see the loading sketch after this list)
- Data type
  - FP32
  - FP16
  - BF16
- Features
  - Multi-GPU multi-node inference (implemented, not verified yet)
  - Disentangled attention mechanism support with fused kernels
- Frameworks
  - PyTorch
  - TensorFlow
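As referenced above, here is a minimal sketch of loading a HuggingFace DeBERTa-V2 checkpoint and casting it to the supported data types. The checkpoint name is only an example, and FT's own weight-conversion step is documented in the DeBERTa examples; this sketch only illustrates the checkpoint and data-type options listed above.

```python
import torch
from transformers import DebertaV2Model

# Load a HuggingFace DeBERTa-V2 checkpoint (DeBERTa-V3 checkpoints also
# use the V2 architecture); this is the input to FT's weight converter.
model = DebertaV2Model.from_pretrained("microsoft/deberta-v3-base")

# The supported inference data types correspond to the usual torch dtypes.
model = model.half()                 # FP16
# model = model.to(torch.bfloat16)   # BF16 (Ampere or newer GPUs)
# model = model.float()              # FP32 (default)
```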
We implemented an efficient algorithm to compute the disentangled attention matrices for DeBERTa-variant Transformers.
Unlike BERT, where each word is represented by one vector that sums the content embedding and position embedding, the DeBERTa design first proposed the concept of disentangled attention, which uses two vectors to encode content and position respectively and forms attention weights by summing disentangled matrices. A performance gap has been identified between this new attention scheme and the original self-attention, mainly due to the extra indexing and gather operations. The major optimizations implemented here include: (i) fusion of gather and pointwise operations, (ii) exploiting the pattern of the relative position matrix and short-circuiting out-of-boundary index calculations, and (iii) parallel index calculation.
The disentangled attention support is primarily intended to be used with the DeBERTa network (matching the HuggingFace DeBERTa and DeBERTa-V2 implementations), but it also applies to generic architectures that adopt disentangled attention.
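For reference, here is a minimal PyTorch sketch of the disentangled score computation described above. It mirrors the math, not FT's fused CUDA kernels; the function names are ours, `k` denotes the maximum relative distance, and we use the simple clamped relative position of the original DeBERTa paper (DeBERTa-V2 additionally applies log-bucketed positions, omitted here).

```python
import torch

def rel_pos_index(q_len, k_len, k):
    # delta(i, j) = i - j, clamped to [-k, k-1], then shifted to [0, 2k-1]
    # so it can index the relative-position embedding table.
    delta = torch.arange(q_len)[:, None] - torch.arange(k_len)[None, :]
    return delta.clamp(-k, k - 1) + k                      # (q_len, k_len)

def disentangled_scores(q_c, k_c, q_r, k_r, k):
    # q_c, k_c: (batch, heads, seq, d) content projections
    # q_r, k_r: (heads, 2k, d) projections of relative-position embeddings
    b, h, n, d = q_c.shape
    idx = rel_pos_index(n, n, k).to(q_c.device).expand(b, h, n, n)
    c2c = q_c @ k_c.transpose(-1, -2)                      # content-to-content
    # content-to-position: score against all 2k relative embeddings, then
    # gather the entry matching delta(i, j); this gather plus the pointwise
    # ops around it is what the fused kernels optimize.
    c2p = torch.gather(q_c @ k_r.transpose(-1, -2), -1, idx)
    # position-to-content: K_c[j] . Q_r[delta(j, i)], gathered row-wise,
    # then transposed back to (i, j) layout.
    p2c = torch.gather(k_c @ q_r.transpose(-1, -2), -1, idx).transpose(-1, -2)
    return (c2c + c2p + p2c) / (3 * d) ** 0.5              # 1/sqrt(3d) scaling
```

The FT kernels compute the same three terms, but fuse the gathers with the surrounding pointwise operations and skip index arithmetic for out-of-range relative positions.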
The following section lists the requirements to use FasterTransformer.
- CMake >= 3.13 for PyTorch
- CUDA 11.0 or newer version
- NCCL 2.10 or newer version
- Python: only verified on Python 3.
- TensorFlow >= 2.0: verified on 2.10.0.
Ensure you have the following components:
- NVIDIA Docker and an NGC container are recommended
- An NVIDIA Pascal, Volta, Turing, or Ampere based GPU
For more information about how to get started with NGC containers, see the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- Getting Started Using NVIDIA GPU Cloud
- Accessing And Pulling From The NGC Container Registry
- Running PyTorch
For those unable to use the NGC container, to set up the required environment or create your own container, see the versioned NVIDIA Container Support Matrix.
You can choose the PyTorch and Python versions you want. Here, we suggest the image `nvcr.io/nvidia/pytorch:22.09-py3`, which contains PyTorch 1.13.0 and Python 3.8.
```bash
nvidia-docker run -ti --shm-size 5g --rm nvcr.io/nvidia/pytorch:22.09-py3 bash
git clone https://github.com/NVIDIA/FasterTransformer.git
mkdir -p FasterTransformer/build
cd FasterTransformer/build
git submodule init && git submodule update
```
- Note: the `xx` of `-DSM=xx` in the following scripts denotes the compute capability of your GPU. The following table shows the compute capability of common GPUs.
GPU | compute capability |
---|---|
P40 | 60 |
P4 | 61 |
V100 | 70 |
T4 | 75 |
A100 | 80 |
A30 | 80 |
A10 | 86 |
By default, `-DSM` is set to 70, 75, 80 and 86. Compiling for more `-DSM` values takes longer, so we suggest setting `-DSM` only for the device you use. Here, we use `xx` as a placeholder for convenience.
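If you are unsure which value to pass, you can query the compute capability from inside the container; a minimal check using PyTorch (available in the suggested NGC image):

```python
import torch

# Compute capability of GPU 0, e.g. (8, 0) on A100 -> use -DSM=80
major, minor = torch.cuda.get_device_capability(0)
print(f"-DSM={major}{minor}")
```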
- Build with TensorFlow

  ```bash
  docker build -f docker/Dockerfile.tf2 --build-arg SM=xx --tag=ft-tf2 .
  docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm ft-tf2:latest
  mkdir build && cd build
  cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DBUILD_TF2=ON -DTF_PATH=/usr/local/lib/python3.8/dist-packages/tensorflow/ ..
  make -j12
  ```

  This will build the TensorFlow custom class. Please make sure that TensorFlow >= 2.0.
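  Once built, the custom op library can be loaded from Python; a sketch, where the `.so` file name is an assumption to adjust to your build output:

  ```python
  import tensorflow as tf

  # The path is illustrative; check build/lib for the actual library
  # name produced by your FasterTransformer build.
  ft_ops = tf.load_op_library("./lib/libtf_deberta.so")
  ```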
- Build with PyTorch

  ```bash
  docker build -f docker/Dockerfile.torch --build-arg SM=xx --tag=ft-pytorch .
  mkdir build && cd build
  cmake -DSM=xx -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
  make -j12
  ```

  This will build the TorchScript custom class. Please make sure that PyTorch >= 1.5.0.
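  Similarly, the TorchScript custom classes are loaded from the built library before use; a sketch, where the library name is an assumption that may differ across FT versions:

  ```python
  import torch

  # The file name is illustrative; check build/lib for the library your
  # build produced before loading the FT custom classes.
  torch.classes.load_library("./lib/libth_transformer.so")
  ```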
Please refer to the DeBERTa examples for a demo of FT DeBERTa usage. Meanwhile, task-specific examples are under development.