- 1. Preamble
- 2. Overview
- 3. Power Measurement BKM
- 4. Power Measurement Flow
- 5. Rules
- 5.1. Total SUT power must be measured
- 5.2. Power Delivery
- 5.3. SUT must be consistent with the description
- 5.4. SUT and Power Analyzer to remain unchanged from submission to results
- 5.5. Replicability is Mandatory
- 5.6. SUT must be available upon request
- 5.7. Power is measured in the Performance phase
- 5.8. LoadGen and Start/End TimeStamps
- 5.9. Power and Performance Measurement and Reporting
- 5.10. Realistic Power Management
- 6. What needs to be submitted?
- 7. Frequently Asked Questions (FAQ)
Updated 5 January 2022.
Points of contact:
- Sachin Idgunji (sidgunji@mlcommons.org).
- Arun Tejusve Raghunath Rajan (tejus@mlcommons.org).
This document describes how to measure power for benchmarks in the MLPerf Inference suite. It outlines only the additional steps required for Power submissions, compared with Performance-only submissions. Please see the MLPerf Inference rules, which cover the submission, review, and publication process for MLPerf Inference benchmarks in detail.
Power is measured at the system level ("at the wall"). All submitters must use the software and scripts developed by the MLPerf Power working group ("the MLPerf Power workflow").
The MLPerf name and logo are trademarks. In order to refer to a result using the MLPerf name, the result must conform to the letter and spirit of the rules specified in this document. The MLPerf organization reserves the right to solely determine if a use of its name or logo is acceptable.
A system under test (SUT) consists of a defined set of hardware and software resources that will be measured for performance and power. The hardware resources may include processors, accelerators, memories, disks, and interconnect. The software resources may include an operating system, compilers, libraries, and drivers that significantly affect the running time or power consumption of a benchmark.
A Director/Server system is any system that can act as the host for the power measurements, e.g., a standard PC with an interface that connects to the power analyzer.
A power analyzer (aka power meter) is a device used to measure the wall power of a system. The SUT is powered through the power analyzer, which measures the power, voltage, and current drawn by the SUT. The other end of the power analyzer is connected to the Director system, which runs software to log the collected data.
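For illustration only, here is a minimal sketch of a Director-side logging loop, assuming a hypothetical analyzer or daemon that streams newline-delimited readings over TCP. The host, port, and line format are assumptions, not part of these rules; the actual Director software is provided by the MLPerf Power workflow.

```python
# Minimal sketch (NOT the official Director software): read newline-delimited
# power readings from a hypothetical analyzer/daemon over TCP and append them
# to a log file with a UTC timestamp.
import socket
from datetime import datetime, timezone

def log_power(host: str = "localhost", port: int = 8888,
              outfile: str = "power_log.txt") -> None:
    with socket.create_connection((host, port)) as sock, \
         open(outfile, "a") as log:
        # Hypothetical line format: "230.1,2.95,678.4" = volts, amps, watts
        for line in sock.makefile("r"):
            stamp = datetime.now(timezone.utc).isoformat()
            log.write(f"{stamp},{line.strip()}\n")

if __name__ == "__main__":
    log_power()
```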
NTP stands for Network Time Protocol; it ensures time synchronization between the different systems used for power collection. The SUT and the Director must be synchronized via NTP.
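As a quick illustration, the sketch below checks a machine's clock offset against an NTP server. It assumes the third-party ntplib package (pip install ntplib); the one-second tolerance is an illustrative choice, not a value mandated by these rules.

```python
# Sketch: report how far the local clock is from an NTP reference.
import ntplib

MAX_OFFSET_SECONDS = 1.0  # hypothetical tolerance, for illustration only

def check_ntp_offset(server: str = "pool.ntp.org") -> float:
    """Return the offset (seconds) between the local clock and the NTP server."""
    client = ntplib.NTPClient()
    response = client.request(server, version=3)
    return response.offset

if __name__ == "__main__":
    offset = check_ntp_offset()
    print(f"Clock offset vs NTP: {offset:.3f} s")
    if abs(offset) > MAX_OFFSET_SECONDS:
        raise SystemExit("Clock is not NTP-synchronized; fix before measuring power.")
```

Running a check like this on both the SUT and the Director before a measurement run helps confirm that the timestamps used to window the power samples are comparable.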
PTD (Power Thermal Daemon) is a utility developed by SPEC® for power measurement and logging. SPEC has provided a license for MLCommons members to use PTD. Please adhere to the End User License Agreement (EULA) between MLCommons and SPEC (see the FAQ below). In particular, PTD may be used solely with the MLPerf Power workflow.
The power measurement BKM (Best Known Method) is documented here. That document also contains a block diagram of the power measurement setup, which must be followed for all power submissions.
The power measurement flow is detailed here. It is part of the power automation tool found here, and it is the only flow that may be used for power submissions; results measured by any other means will not be accepted. The flow incorporates a "Dynamic Ranging" method that ensures the power measurements are taken with high precision and accuracy, and it ensures that only the performance section of the benchmark is measured. In this version of the submission, the flow does not apply to disaggregated systems.
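To illustrate the dynamic-ranging idea, the sketch below picks fixed analyzer ranges from the peaks observed during a ranging pass, so the subsequent testing pass can run with the analyzer locked to ranges that maximize measurement accuracy. The Sample class and the 1.1x headroom factor are assumptions for illustration; the actual logic lives inside the MLPerf Power workflow and SPEC PTDaemon.

```python
# Sketch: choose fixed voltage/current ranges from a ranging pass.
from dataclasses import dataclass

@dataclass
class Sample:
    volts: float
    amps: float

def pick_fixed_ranges(ranging_samples: list[Sample], headroom: float = 1.1):
    """Return (voltage_range, current_range) just above the observed peaks."""
    peak_v = max(s.volts for s in ranging_samples)
    peak_a = max(s.amps for s in ranging_samples)
    return peak_v * headroom, peak_a * headroom

# Usage with made-up samples from a ranging pass:
samples = [Sample(229.8, 2.4), Sample(230.1, 3.1), Sample(230.0, 2.9)]
v_range, a_range = pick_fixed_ranges(samples)
print(f"Testing pass ranges: {v_range:.1f} V, {a_range:.2f} A")
```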
All rules for performance are applicable for power. In addition:
The power consumption must be measured at the system level, i.e. including all components that are exercised by LoadGen, e.g. the host processor on which LoadGen runs, accelerators, memory, fans, etc.
The MLPerf Power workflow measures AC power consumed at the wall. The power delivery to the SUT must not include any battery or other energy storage operating in conjunction with the AC supply before power reaches the system's PSUs.
The SUT used for performance/power measurements must match the description provided as part of the submission.
The SUT and the power analyzer must remain unchanged from the time of submission until the results are published, so that in the event of an audit the results can readily be verified and reproduced on the existing setup.
If you are measuring the performance of a publicly available and widely-used system or framework, you must use publicly available and widely-used versions of the system or framework.
If you are measuring the performance of an experimental framework or system, you must make the system and framework you use available upon demand for replication.
The MLPerf Inference rules specify several phases for a benchmark: accuracy, performance, and compliance. Power is measured only during the performance phase, not in any other phase.
The MLPerf Power workflow uses exactly the same LoadGen as performance-only runs. LoadGen logs the system timestamp at the start and at the end of a performance run; the workflow uses these timestamps to compute the power consumption of the run.
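To illustrate the timestamp windowing, the sketch below averages only the power samples that fall between LoadGen's start and end timestamps. The sample format and field names are assumptions; the official workflow parses the actual LoadGen and PTDaemon logs.

```python
# Sketch: average power over the LoadGen performance window.
from datetime import datetime

def average_power(samples, start: datetime, end: datetime) -> float:
    """Mean watts over samples whose timestamp lies in [start, end]."""
    in_window = [watts for (ts, watts) in samples if start <= ts <= end]
    if not in_window:
        raise ValueError("no power samples inside the performance window")
    return sum(in_window) / len(in_window)

# Usage with made-up (timestamp, watts) samples:
t = lambda s: datetime.fromisoformat(s)
samples = [(t("2022-01-05T10:00:00"), 410.0),
           (t("2022-01-05T10:00:01"), 415.5),
           (t("2022-01-05T10:00:02"), 412.3)]
print(average_power(samples, t("2022-01-05T10:00:00"), t("2022-01-05T10:00:02")))
```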
Power and performance measurements must come from the same run for a given benchmark and scenario. The MLPerf Power workflow ensures this by default, and this behavior must not be changed. For example, it is not permitted to run the same benchmark and scenario three times and report the highest performance and the lowest power consumption among the three runs.
The goal of the testing is to mimic real-world usage scenarios as much as possible and to demonstrate the benefits of realistic power management. Therefore, we require that:
- Any power management system be qualified for use appropriate to the submission type (e.g., a generally available system must use software/firmware qualified for general availability and shipping with the platform).
- Any change in power management behavior occur without manual intervention.
A valid submission must have all mandatory fields of the SUT description filled in by the submitter.
All logs created by the MLPerf Power workflow must be submitted, including the performance measurement logs generated by LoadGen running on the SUT and the power measurement logs generated by the software running on the Director (for both the ranging and testing phases).
The power analyzer is configured automatically by the software that is part of the MLPerf Power workflow. For the v1.0, v1.1, and v2.0 rounds, the software supports connecting only a single meter to a single SUT; connecting multiple meters to a single SUT is not supported.
The power meter configuration must be reported in a file called analyzer_table.*, placed as follows (see the lookup sketch after this list):
- If the configuration is common to all scenarios, benchmarks, and systems: under the <division>/<submitter>/measurements directory.
- If the configuration is common to all scenarios and benchmarks running on a system: under the <division>/<submitter>/measurements/<system> directory.
- If the configuration is common to all scenarios for a benchmark running on a system: under the <division>/<submitter>/measurements/<system>/<benchmark> directory.
- If the configuration is specific to a scenario for a benchmark running on a system: under the <division>/<submitter>/measurements/<system>/<benchmark>/<scenario> directory.
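The sketch below illustrates the lookup order implied by these placement rules, from most to least specific. The helper function is hypothetical and not part of any official tool; root corresponds to the <division>/<submitter> directory.

```python
# Sketch: resolve the most specific analyzer_table.* for a scenario.
from pathlib import Path

def find_analyzer_table(root: Path, system: str, benchmark: str, scenario: str):
    """Return the most specific analyzer_table.* under root, or None."""
    candidates = [
        root / "measurements" / system / benchmark / scenario,  # most specific
        root / "measurements" / system / benchmark,
        root / "measurements" / system,
        root / "measurements",                                  # least specific
    ]
    for directory in candidates:
        matches = sorted(directory.glob("analyzer_table.*"))
        if matches:
            return matches[0]
    return None

# Usage (hypothetical layout):
# find_analyzer_table(Path("closed/ACME"), "sys1", "resnet", "Offline")
```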
For reproducibility and verifiability of each submission, the Power Management Settings used to achieve the measured performance and power for the submission must be documented.
The Power Management Settings must be reported in a file called power_settings.*, placed according to the same rules as the analyzer_table.* file above.
During the review period, reviewers may try to reproduce runs on submitters' systems. Faithfully reproducing power results requires an exact replica of the setup, including the AC power source and the break-out-box configuration, and there is also part-to-part variability to contend with. If a power run cannot be reproduced exactly during review, the submitter is advised to offer their system for audit, so the auditor can verify that the submission follows all the rules.
Q: How do I get access to SPEC PTDaemon?
A: The MLPerf Power workflow uses proprietary software (SPEC PTDaemon). To access this software, your organization must be a member of MLCommons. In addition, an authorized representative of your organization must sign the MLPerf Power EULA and send it to support@mlcommons.org. Once your organization signs the EULA, MLCommons staff will give you access to a private GitHub repo containing the tools.
Q: Do I have to use the MLPerf Power workflow?
A: Yes, you must use the MLPerf Power workflow for any results submitted to MLPerf. The workflow integrates a number of checks and balances that ensure the highest quality of collected power measurements.
Q: Which power analyzers are supported?
A: For the v1.0, v1.1, and v2.0 rounds, only Yokogawa power analyzers are supported. In the future, we may support any power analyzer supported by PTDaemon.