Merge pull request NVIDIA#970 from nv-kkudrynski/kkudrynski/readme_notice

Adding links to performance benchmark page
nv-kkudrynski authored Jul 21, 2021
2 parents 3d8d878 + 49e23b4 commit d788e8d
Showing 52 changed files with 104 additions and 0 deletions.
2 changes: 2 additions & 0 deletions CUDA-Optimized/FastSpeech/README.md
@@ -315,6 +315,8 @@ Sample result waveforms are [FP32](fastspeech/trt/samples) and [FP16](fastspeech

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions Kaldi/SpeechRecognition/README.md
@@ -192,6 +192,8 @@ you can set `count` to `1` in the [`instance_group` section](https://docs.nvidia

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Metrics

2 changes: 2 additions & 0 deletions MxNet/Classification/RN50v1.5/README.md
@@ -552,6 +552,8 @@ By default:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

To benchmark training and inference, run:
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/efficientnet/README.md
@@ -492,6 +492,8 @@ Quantized models could also be used to classify new images using the `classify.p

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/resnet50v1.5/README.md
@@ -498,6 +498,8 @@ To run inference on JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/resnext101-32x4d/README.md
@@ -481,6 +481,8 @@ To run inference on JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md
@@ -483,6 +483,8 @@ To run inference on JPEG image using pretrained weights:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Classification/ConvNets/triton/resnet50/README.md
@@ -325,6 +325,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Offline scenario
This table lists the common variable parameters for all performance measurements:
@@ -194,6 +194,8 @@ To process static configuration logs, `triton/scripts/process_output.sh` script

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Dynamic batching performance
The Triton Inference Server has a dynamic batching mechanism built-in that can be enabled. When it is enabled, the server creates inference batches from multiple received requests. This allows us to achieve better performance than doing inference on each single request. The single request is assumed to be a single image that needs to be inferenced. With dynamic batching enabled, the server will concatenate single image requests into an inference batch. The upper bound of the size of the inference batch is set to 64. All these parameters are configurable.

@@ -195,6 +195,8 @@ To process static configuration logs, `triton/scripts/process_output.sh` script

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Dynamic batching performance
The Triton Inference Server has a dynamic batching mechanism built-in that can be enabled. When it is enabled, the server creates inference batches from multiple received requests. This allows us to achieve better performance than doing inference on each single request. The single request is assumed to be a single image that needs to be inferenced. With dynamic batching enabled, the server will concatenate single image requests into an inference batch. The upper bound of the size of the inference batch is set to 64. All these parameters are configurable.

2 changes: 2 additions & 0 deletions PyTorch/Detection/SSD/README.md
@@ -565,6 +565,8 @@ To use the inference example script in your own code, you can call the `main` fu
## Performance
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/BERT/README.md
@@ -692,6 +692,8 @@ For SQuAD, to run inference interactively on question-context pairs, use the scr
The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. More information on how to perform inference using NVIDIA Triton Inference Server can be found in [triton/README.md](./triton/README.md).

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/BERT/triton/README.md
@@ -102,6 +102,8 @@ To make the machine wait until the server is initialized, and the model is ready

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

The numbers below are averages, measured on Triton on V100 32G GPU, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).

| Format | GPUs | Batch size | Sequence length | Throughput - FP32(sequences/sec) | Throughput - mixed precision(sequences/sec) | Throughput speedup (mixed precision/FP32) |
2 changes: 2 additions & 0 deletions PyTorch/LanguageModeling/Transformer-XL/README.md
@@ -1113,6 +1113,8 @@ perplexity on the test dataset.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/DLRM/README.md
@@ -574,6 +574,8 @@ The NVIDIA Triton Inference Server provides a cloud inferencing solution optimiz

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/DLRM/triton/README.md
@@ -192,6 +192,8 @@ For more information about `perf_client` please refer to [official documentation

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Throughput/Latency results

Throughput is measured in recommendations/second, and latency in milliseconds.
2 changes: 2 additions & 0 deletions PyTorch/Recommendation/NCF/README.md
@@ -379,6 +379,8 @@ The script will then:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

#### Training performance benchmark
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/MaskRCNN/README.md
@@ -484,6 +484,8 @@ __Note__: The score is always the Average Precision(AP) at
- maxDets = 100

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking
Benchmarking can be performed for both training and inference. Both scripts run the Mask R-CNN model using the parameters defined in `configs/e2e_mask_rcnn_R_50_FPN_1x.yaml`. You can specify whether benchmarking is performed in FP16, TF32 or FP32 by specifying it as an argument to the benchmarking scripts.
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/nnUNet/README.md
@@ -454,6 +454,8 @@ The script will then:

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks to measure the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Segmentation/nnUNet/triton/README.md
@@ -344,6 +344,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).


### Offline scenario
This table lists the common variable parameters for all performance measurements:
2 changes: 2 additions & 0 deletions PyTorch/SpeechRecognition/Jasper/README.md
@@ -567,6 +567,8 @@ More information on how to perform inference using Triton Inference Server with

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.

2 changes: 2 additions & 0 deletions PyTorch/SpeechRecognition/Jasper/triton/README.md
@@ -274,6 +274,8 @@ For more information about `perf_client`, refer to the [official documentation](

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Inference Benchmarking in Triton Inference Server

To benchmark the inference performance on Volta Turing or Ampere GPU, run `bash triton/scripts/execute_all_perf_runs.sh` according to [Quick-Start-Guide](#quick-start-guide) Step 7.
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/FastPitch/README.md
@@ -532,6 +532,8 @@ More examples are presented on the website with [samples](https://fastpitch.gith

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/FastPitch/triton/README.md
@@ -342,6 +342,8 @@ we can consider that all clients are local.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).



### Offline scenario
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/Tacotron2/README.md
@@ -524,6 +524,8 @@ python inference.py --tacotron2 <Tacotron2_checkpoint> --waveglow <WaveGlow_chec

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model
2 changes: 2 additions & 0 deletions PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/README.md
@@ -160,6 +160,8 @@ By default the `./build_trtis.sh` script builds the TensorRT engines with FP16 m

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

The following tables show inference statistics for the Tacotron2 and WaveGlow
text-to-speech system.
The tables include average latency, latency standard deviation,
2 changes: 2 additions & 0 deletions PyTorch/Translation/GNMT/README.md
@@ -932,6 +932,8 @@ To view all available options for inference, run `python3 translate.py --help`.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking
The following section shows how to run benchmarks measuring the model
performance in training and inference modes.
2 changes: 2 additions & 0 deletions PyTorch/Translation/Transformer/README.md
@@ -364,6 +364,8 @@ sacrebleu -t wmt14/full -l en-de --echo src | python inference.py --buffer-size

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions TensorFlow/Classification/ConvNets/resnet50v1.5/README.md
@@ -451,6 +451,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
2 changes: 2 additions & 0 deletions TensorFlow/Classification/ConvNets/resnext101-32x4d/README.md
@@ -420,6 +420,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.
@@ -415,6 +415,8 @@ The optional `--xla` and `--amp` flags control XLA and AMP during inference.

## Performance

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).

### Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.