Here's a bit of background on the scaling coefficients before I describe the quadprog solver.
The numbers in accelwattch_sass_hw.xml correspond to "xi" in equation (12) of the AccelWattch paper. These are the scaling coefficients by which we multiply the initial McPAT energy per access to correct inaccuracies in McPAT's initial estimates. To get component energy, we then multiply the scaled energy per access by the component activity factors, as mentioned in the paper.
The AccelWattch HW model (accelwattch_sass_hw.xml) collects the hardware power measurements and performance counters (component activity factors) for the components shown in Table 1 from hardware counters, except for the shaded components. Note that there are no hardware counters for L1i and register file activity. That is why the scaling coefficients for L1i and register file in accelwattch_sass_hw.xml are set to 0: those components are not modeled separately for AccelWattch HW. In total, there should be 23 dynamic power components for AccelWattch HW, 25 for AccelWattch SASS SIM (+L1i, +RegisterFile), and 31 for AccelWattch PTX SIM, since PTX simulation mode models a few more components such as INT_MUL24.
For the tuning process, we selected a starting point with per-component scaling coefficients obtained from the GPUWattch model for the NVIDIA Fermi GTX 480, instead of starting with all scaling coefficients set to 1 (see Section 5.4 of the paper).
The quadprog_solver.m script sets up the input matrices, enforces the constraints that guard the solver against unrealistic estimates (shown in equation (14)), and calls quadprog. The output of the quadprog solver should be a single column of "factors", which you multiply with your corresponding starting per-component scaling coefficients to obtain new scaling coefficients that bring the sum of your scaled per-component powers closer to the total system power measurements of the microbenchmarks.
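The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the AccelWattch code: the function names, the three-component setup, and all numbers are made up, and the units are arbitrary.

```python
def component_energy(coeff, energy_per_access, activity):
    """Scaled energy per access times the component activity factor."""
    return coeff * energy_per_access * activity


def apply_factors(coeffs, factors):
    """New coefficient = solver factor * current coefficient, per component."""
    if len(coeffs) != len(factors):
        raise ValueError("one factor per component is required")
    return [c * f for c, f in zip(coeffs, factors)]


# Hypothetical three-component example; all values are illustrative only.
coeffs = [2.0, 0.5, 0.0]             # 0.0: component not modeled separately
energy_per_access = [1.5, 4.0, 3.0]  # initial McPAT-style estimates
activity = [10.0, 2.0, 7.0]          # activity factors from counters

energies = [
    component_energy(c, e, a)
    for c, e, a in zip(coeffs, energy_per_access, activity)
]
total = sum(energies)  # modeled total, to be compared with measured power

# A single column of factors, as the solver would return it (made up here).
factors = [0.5, 2.0, 1.0]
new_coeffs = apply_factors(coeffs, factors)
```

A component whose coefficient is 0 contributes nothing to the total, and no factor can change that, which matches how L1i and register file drop out of the HW model.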
You then scale your per-component powers with the corresponding new scaling coefficients and pass them into the solver for the next iteration. You stop iterating when the solver is unable to improve your per-component scaling coefficients any further.
The version of this quadprog script that's included in the repository has the input/output matrices sized for accelwattch_ptx_sim.xml with 31 dynamic power components modeled (the first 31 columns of input.csv), and it expects the measured total system power for the microbenchmarks in column 32 of input.csv. To use it for AccelWattch HW, you would have to resize the input and output matrices in the script accordingly: 23 columns for per-component powers and the 24th column for total system power measurements. You would also have to update matrix C in the script at each iteration of the solver (and at the starting point) based on your scaling coefficients, to respect the constraints we enforce in equation (14) to guard against unrealistic power estimates for the scaled energy per instruction of the execution units.
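The iteration described above could look roughly like the Python sketch below. It makes several assumptions: `split_rows`, `solve_factors`, and `tune` are hypothetical names, and the solver is a simple projected-gradient least-squares fit with fixed box bounds standing in for quadprog. The constraints of equation (14) and the per-iteration update of matrix C are not reproduced here.

```python
def split_rows(rows, n_components):
    """Split each CSV row into per-component powers and measured total power."""
    powers = [[float(v) for v in row[:n_components]] for row in rows]
    measured = [float(row[n_components]) for row in rows]
    return powers, measured


def solve_factors(powers, measured, lo=0.1, hi=10.0, steps=5000):
    """Stand-in for quadprog: box-constrained least squares, projected gradient."""
    n = len(powers[0])
    factors = [1.0] * n
    # Twice the squared Frobenius norm bounds the gradient's Lipschitz
    # constant, so a step of 1/lipschitz keeps the descent stable.
    lipschitz = 2.0 * sum(v * v for row in powers for v in row)
    if lipschitz == 0.0:
        return factors
    for _ in range(steps):
        residuals = [sum(p * f for p, f in zip(row, factors)) - m
                     for row, m in zip(powers, measured)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residuals, powers))
                for j in range(n)]
        # Gradient step followed by projection onto the box [lo, hi].
        factors = [min(hi, max(lo, f - g / lipschitz))
                   for f, g in zip(factors, grad)]
    return factors


def squared_error(powers, measured):
    """Sum of squared gaps between modeled and measured total power."""
    return sum((sum(row) - m) ** 2 for row, m in zip(powers, measured))


def tune(powers, measured, coeffs, tol=1e-9, max_iters=50):
    """Rescale and re-solve until the solver no longer improves the fit."""
    error = squared_error(powers, measured)
    for _ in range(max_iters):
        factors = solve_factors(powers, measured)
        scaled = [[p * f for p, f in zip(row, factors)] for row in powers]
        new_error = squared_error(scaled, measured)
        if error - new_error <= tol:
            break
        coeffs = [c * f for c, f in zip(coeffs, factors)]
        powers, error = scaled, new_error
    return coeffs, error


# Made-up data: two component columns, then the measured total in column 3.
rows = [["1", "2", "3"], ["3", "1", "6.5"], ["2", "2", "5"]]
powers, measured = split_rows(rows, n_components=2)
coeffs, error = tune(powers, measured, coeffs=[3.0, 4.0])
```

With the layout described above, `n_components` would be 23 for AccelWattch HW (total power in column 24) and 31 for the PTX SIM sizing that ships in the repository (total power in column 32).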
-
Hello,
In an attempt to reproduce the power coefficients of the simulation .xml files for the V100 GPU, as reported in the AccelWattch paper, we have collected:
By running the AccelWattch model we obtain an initial, inaccurate power estimate, which we wish to use as the first step of the regression that calculates the power coefficients, as described in the paper. We would like to know how to use the collected data to perform that regression and produce the desired .xml files.
We have found the /accel-sim-framework/util/accelwattch/quadprogsolver.m script, which appears to be part of the process; however, we have not been able to find suitable documentation on how to use it.
Thank you in advance!