
T4 setting to achieve maximum performance #20

Open · vilmara opened this issue Feb 18, 2020 · 11 comments

vilmara commented Feb 18, 2020

Hi @nvpohanh, could you please confirm whether any settings beyond those listed below are needed to get maximum performance on a T4 GPU with the MLPerf inference benchmarks? (A rough command sketch follows the list.)

  • Set Transparent Huge Pages (THP) to always (for the Server scenario)
  • Turn off ECC on the GPU memory
  • Set the GPU graphics and memory clocks to their maximum rates
  • Ensure sufficient cooling efficiency
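For reference, here is a rough sketch of the commands these settings correspond to; the clock values are placeholders to be taken from the supported-clocks list, and the ECC change only takes effect after a GPU reset or reboot:

$ echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled  # THP = always
$ sudo nvidia-smi -pm 1                        # persistence mode, keeps clock settings applied
$ sudo nvidia-smi -e 0                         # disable ECC (takes effect after GPU reset/reboot)
$ sudo nvidia-smi -q -d SUPPORTED_CLOCKS       # list the valid memory/graphics clock pairs
$ sudo nvidia-smi -ac <memMHz>,<graphicsMHz>   # lock application clocks to the chosen maximum pair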
@nvpohanh

Yes, please let me know if you still don't get the reported performance numbers.


vilmara commented May 6, 2020

Hi @nvpohanh, I am still seeing lower-than-expected performance; please see below:

On RTX6000:
MobileNet | Server scenario: ~43,594 img/sec (NVIDIA has reported ~47,775-49,775 img/sec on RTX8000)

On T4:
ResNet-50 | Server scenario: ~4,782 img/sec (NVIDIA has reported ~5,193 img/sec)


nvpohanh commented May 6, 2020

It is expected that the RTX6000/8000 is slightly slower than the TITAN RTX. To track this down, could you share which clock frequency and power level the GPU stabilizes at during inference? You can monitor that by running nvidia-smi dmon -s pc concurrently with the harness.

As for the T4, it would be very helpful to know the GPU temperature. You can use the same nvidia-smi dmon -s pc command to monitor it.
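For example, something along these lines (the sampling interval is arbitrary, and the exact column set of dmon can vary slightly across driver versions):

$ nvidia-smi dmon -s pc -d 5
# -s p reports power draw and GPU/memory temperature, -s c reports SM and memory clocks;
# run it alongside the harness and watch whether pclk drops or gtemp climbs under load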


vilmara commented May 6, 2020

@nvpohanh, I will run the tests again; meanwhile, could you please provide the reference MLPerf inference performance numbers for the RTX6000?


nvpohanh commented May 6, 2020

Unfortunately we didn't submit RTX6000 numbers for MLPerf Inference v0.5.


vilmara commented May 7, 2020

Hi @nvpohanh, does the NVIDIA MLPerf inference code scale automatically across multiple RTX8000/6000 GPUs? I am running inference with 3 GPUs; however, it does not seem to scale linearly. See below:

On 1x RTX6000:
MobileNet | Server scenario: ~43,594 img/sec

On 3x RTX6000:
MobileNet | Server scenario: ~93,724.99 img/sec (expected: 43,594 × 3 ≈ 130,781)


vilmara commented May 12, 2020

> It is expected that the RTX6000/8000 is slightly slower than the TITAN RTX. To track this down, could you share which clock frequency and power level the GPU stabilizes at during inference? You can monitor that by running nvidia-smi dmon -s pc concurrently with the harness.

[Screenshot: Capture_3xRTX6000 — nvidia-smi dmon -s pc output captured during the 3x RTX6000 run]

The clocks were set up as shown below:

sudo nvidia-smi -ac 6501,1620
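To double-check that these application clocks actually stick (the driver can silently clamp them), a query like the following should help; the fields are standard nvidia-smi --query-gpu properties:

$ nvidia-smi --query-gpu=clocks.applications.memory,clocks.applications.graphics,clocks.sm --format=csv
# compare the reported application clocks against the requested 6501,1620,
# and watch clocks.sm during the run to see whether the GPU holds that target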


vilmara commented Jun 8, 2020

Hi @nvpohanh, following up on my May 7 question about multi-GPU scaling (MobileNet Server scenario: ~43,594 img/sec on 1x RTX6000 vs. ~93,724.99 img/sec on 3x RTX6000, against an expected ~130,781): does the NVIDIA MLPerf inference code work with multi-GPU on any system, or does it support multi-GPU only on the systems used for the v0.5 submission?


psyhtest commented Jun 16, 2020

I'm running BERT experiments on an AWS G4 g4dn.4xlarge instance using a single T4. The supported clocks are a bit lower than in @vilmara's case:

$ sudo nvidia-smi -q -d SUPPORTED_CLOCKS                                     

==============NVSMI LOG==============

Timestamp                           : Tue Jun 16 16:19:30 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:00:1E.0
    Supported Clocks
        Memory                      : 5001 MHz
            Graphics                : 1590 MHz
            Graphics                : 1575 MHz
...

But at the maximum frequency the GPU is not really stable, dropping down to ~900 MHz in a SingleStream run:

$ sudo nvidia-smi -ac 5001,1590
...
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 98016803
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 11.27
QPS w/o loadgen overhead        : 11.27

Min latency (ns)                : 81513330
Max latency (ns)                : 101580923
Mean latency (ns)               : 88718991
50.00 percentile latency (ns)   : 87841138
90.00 percentile latency (ns)   : 98016803
95.00 percentile latency (ns)   : 98051350
97.00 percentile latency (ns)   : 98064832
99.00 percentile latency (ns)   : 98093310
99.90 percentile latency (ns)   : 101015009

It's quite stable at ~800 MHz:

$ sudo nvidia-smi -ac 5001,795
$ python3 run.py --backend=pytorch --scenario=SingleStream
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 100668289
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 10.13
QPS w/o loadgen overhead        : 10.13

Min latency (ns)                : 97998490
Max latency (ns)                : 103667665
Mean latency (ns)               : 98672758
50.00 percentile latency (ns)   : 98098138
90.00 percentile latency (ns)   : 100668289
95.00 percentile latency (ns)   : 101665011
97.00 percentile latency (ns)   : 102236103
99.00 percentile latency (ns)   : 102918025
99.90 percentile latency (ns)   : 103634703

but a bit faster at 900 MHz:

$ sudo nvidia-smi -ac 5001,900
$ python3 run.py --backend=pytorch --scenario=SingleStream
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 99973436
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 10.57
QPS w/o loadgen overhead        : 10.57

Min latency (ns)                : 88573483
Max latency (ns)                : 109587347
Mean latency (ns)               : 94590988
50.00 percentile latency (ns)   : 94583175
90.00 percentile latency (ns)   : 99973436
95.00 percentile latency (ns)   : 101330092
97.00 percentile latency (ns)   : 102890999
99.00 percentile latency (ns)   : 104760879
99.90 percentile latency (ns)   : 107273294

(900 MHz is "a bit faster" than 800 MHz at the 90th percentile. At the 99th percentile, 900 MHz is actually a bit slower than 800 MHz.)

@nvpohanh

My impression is that the T4s on AWS do not have very good cooling... Could you try nvidia-smi dmon -s pc to see what the GPU temperature is? If it reaches above 75 °C, something needs to be improved.
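If the dmon output is too verbose, a narrower query (standard nvidia-smi query fields, sampled every 5 seconds here) can log just the values relevant to this diagnosis:

$ nvidia-smi --query-gpu=temperature.gpu,clocks.sm,power.draw --format=csv -l 5
# a temperature creeping past ~75C together with a falling SM clock would point to thermal throttling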

@psyhtest

@nvpohanh Yes, that's the case. I suspect this VM instance takes a slice of a larger machine. Perhaps the neighbours are maxing out their GPUs :).
