T4 settings to achieve maximum performance #20
Yes, please let me know if you still don't get the reported performance numbers.
Hi @nvpohanh, I am still seeing performance issues; please see below.

on RTX6000:

on T4:
It is expected that RTX6000/8000 is slightly slower than TitanRTX. To track this down, could you share which clock frequency and power level the GPU stabilizes at during inference? You can monitor that by running `nvidia-smi` while the benchmark runs. About the T4, it would be super helpful to understand the GPU temperature. You can use the same `nvidia-smi` query.
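(A minimal sketch of the monitoring suggested above, assuming a one-second sampling interval; the query fields are standard `nvidia-smi` options:)

```
# Log SM clock, power draw, temperature, and performance state once per second
# while the benchmark runs in another terminal.
$ nvidia-smi --query-gpu=timestamp,clocks.sm,power.draw,temperature.gpu,pstate \
    --format=csv -l 1
```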
@nvpohanh, I will run the tests again. Meanwhile, could you please provide the reference MLPerf inference performance numbers for RTX6000?
Unfortunately we didn't submit RTX6000 numbers for MLPerf Inference v0.5.
Hi @nvpohanh, does the NVIDIA MLPerf inference code scale automatically with RTX8000/6000? I am running the inference with 3 GPUs; however, it does not seem to scale linearly. See below:

on 1x_RTX6000:

on 3x_RTX6000:
The clocks were set up as shown below:
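(A hedged aside on the scaling question: a quick way to check whether all three GPUs actually stay busy, and at what clocks, during the run; the query fields are standard `nvidia-smi` options:)

```
# Watch per-GPU utilization and SM clock to spot idle or throttled GPUs
$ nvidia-smi --query-gpu=index,utilization.gpu,clocks.sm --format=csv -l 1
```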
Hi @nvpohanh, does the NVIDIA MLPerf inference code work with multiple GPUs on any system, or does it support multi-GPU only on the systems used for the v0.5 submission?
I'm running BERT experiments on an AWS G4 instance. The supported clocks are:

```
$ sudo nvidia-smi -q -d SUPPORTED_CLOCKS
==============NVSMI LOG==============
Timestamp : Tue Jun 16 16:19:30 2020
Driver Version : 440.33.01
CUDA Version : 10.2
Attached GPUs : 1
GPU 00000000:00:1E.0
Supported Clocks
Memory : 5001 MHz
Graphics : 1590 MHz
Graphics : 1575 MHz
...
```

But at the maximum frequency the GPU is not really stable, dropping down to ~900 MHz in a SingleStream run:

```
$ sudo nvidia-smi -ac 5001,1590
...
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 98016803
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 11.27
QPS w/o loadgen overhead : 11.27
Min latency (ns) : 81513330
Max latency (ns) : 101580923
Mean latency (ns) : 88718991
50.00 percentile latency (ns) : 87841138
90.00 percentile latency (ns) : 98016803
95.00 percentile latency (ns) : 98051350
97.00 percentile latency (ns) : 98064832
99.00 percentile latency (ns) : 98093310
99.90 percentile latency (ns) : 101015009
```

It's quite stable at ~800 MHz:

```
$ sudo nvidia-smi -ac 5001,795
$ python3 run.py --backend=pytorch --scenario=SingleStream
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 100668289
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 10.13
QPS w/o loadgen overhead : 10.13
Min latency (ns) : 97998490
Max latency (ns) : 103667665
Mean latency (ns) : 98672758
50.00 percentile latency (ns) : 98098138
90.00 percentile latency (ns) : 100668289
95.00 percentile latency (ns) : 101665011
97.00 percentile latency (ns) : 102236103
99.00 percentile latency (ns) : 102918025
99.90 percentile latency (ns) : 103634703
```

But it's a bit faster at 900 MHz:

```
$ sudo nvidia-smi -ac 5001,900
$ python3 run.py --backend=pytorch --scenario=SingleStream
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode : Performance
90th percentile latency (ns) : 99973436
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 10.57
QPS w/o loadgen overhead : 10.57
Min latency (ns) : 88573483
Max latency (ns) : 109587347
Mean latency (ns) : 94590988
50.00 percentile latency (ns) : 94583175
90.00 percentile latency (ns) : 99973436
95.00 percentile latency (ns) : 101330092
97.00 percentile latency (ns) : 102890999
99.00 percentile latency (ns) : 104760879
99.90 percentile latency (ns) : 107273294
```

(900 MHz is "a bit faster" than 800 MHz at the 90th percentile. At the 99th percentile, 900 MHz is actually a bit slower than 800 MHz.)
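(An aside, not from the thread: instead of only setting application clocks with `-ac`, drivers of this vintage also support locking the graphics clock, which may avoid the drops seen at 1590 MHz. A sketch, assuming the driver accepts `-lgc` on this GPU; the 900 MHz target is taken from the runs above:)

```
# Lock the graphics clock to a fixed 900 MHz; reset later with: sudo nvidia-smi -rgc
$ sudo nvidia-smi -lgc 900,900
```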
I have the impression that the T4 on AWS has not-so-good cooling... Could you try
@nvpohanh Yes, that's the case. I suspect this VM instance takes a slice of a larger machine. Perhaps the neighbours are maxing out their GPUs :).
Hi @nvpohanh, could you please confirm whether any settings beyond those shown below are needed to get maximum performance on a T4 GPU with the MLPerf inference benchmarks?
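(A hedged illustration of a typical pre-benchmark setup on T4, not necessarily the exact settings referenced above; the 5001,1590 clock pair comes from the SUPPORTED_CLOCKS listing earlier in the thread:)

```
# Keep the driver loaded between runs
$ sudo nvidia-smi -pm 1
# Set application clocks to the maximum supported memory,graphics frequencies
$ sudo nvidia-smi -ac 5001,1590
```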