Reduce number of calls to rocprof #384
I reviewed the PR and, in general, things look good. The one thing I wanted to review closely was the Omniperf metrics that are explicitly dependent on SQ_*.csv (indicated by the

Things line up for the most part; however, when contrasting the two I find that

v1 (original code vs. original code)
omniperf analyze -p workloads/orig_run1/MI300A/ -p workloads/orig_run2/MI300A/ -b 11.2.10 13.2.5

v2 (original code vs. ben's modification)
omniperf analyze -p workloads/orig_run1/MI300A/ -p workloads/bens_run1/MI300A/ -b 11.2.10 13.2.5
Hi @coleramos425,
I think this is run-to-run variation. It might depend on the workload. I ran the same experiment using the occupancy.hip sample workload (MI300X):
Improve perfmon coalescing
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Interleave TCC channel counters
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Remove duplicate normal counters
Interleave TCC channel counters in the output file, e.g. TCC_HIT[0] TCC_ATOMIC[0] ... TCC_HIT[1] TCC_ATOMIC[1]
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Save accumulate counters to SQ_ files
Omniperf analyze expects the accumulate counters to be in SQ_*.csv files. Since these files also contain PMC counters (we are trying to fit as many counters into each file as possible to minimize runs), we need to include these SQ_*.csv files in pmc_perf.csv.
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Update to work with rocprof v1
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Remove unused method
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Set correct number of TCC channels for gfx942
Ran into rocprof error:
ROCProfiler: fatal error: input metric'TCC_EA0_RDREQ[16]' not supported on this hardware: gfx942
gfx942 has 16 channels, not 32.
Signed-off-by: benrichard-amd <ben.richard@amd.com>

Fix code formatting
Signed-off-by: benrichard-amd <ben.richard@amd.com>
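
The interleaved ordering and the gfx942 channel count mentioned in the commit messages above can be illustrated with a minimal sketch. It is not taken from the Omniperf source: the function name `interleave_tcc_counters` and the `TCC_CHANNELS` table are hypothetical, and only the gfx942 value of 16 comes from this PR.

```python
# Hypothetical lookup table; only the gfx942 entry (16 channels) is stated in the PR.
TCC_CHANNELS = {"gfx942": 16}


def interleave_tcc_counters(base_names, num_channels):
    """Expand per-channel TCC counters so that all counters for channel 0
    come first, then all counters for channel 1, and so on."""
    return [f"{name}[{ch}]" for ch in range(num_channels) for name in base_names]


names = interleave_tcc_counters(["TCC_HIT", "TCC_ATOMIC"], TCC_CHANNELS["gfx942"])
print(names[:4])   # ['TCC_HIT[0]', 'TCC_ATOMIC[0]', 'TCC_HIT[1]', 'TCC_ATOMIC[1]']
print(names[-1])   # TCC_ATOMIC[15]
```

With 16 channels the highest index generated is 15, so a name such as TCC_EA0_RDREQ[16], which triggered the rocprof error quoted above, is never produced.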
b2d124a to 5eaed48
LGTM
For the record, in a vanilla Omniperf profiling run (e.g. no IP block filtering), this PR reduces the number of required application replays from 24 to 15. In the study of mixbench profiling performance (below), I found this leads to a ~27% improvement. Note: this test was run using a production rocprofiler build. We can expect an even larger improvement when this is applied in combination with profiler performance enhancements in a future release.
CC: @koomie
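
For reference, the relative drop in replay count implied by the 24 and 15 figures above can be worked out directly. This is plain arithmetic on the numbers quoted in the comment; the variable names are illustrative only.

```python
replays_before = 24  # application replays before this PR (from the comment above)
replays_after = 15   # application replays after this PR (from the comment above)

# Fraction of replays eliminated by the improved coalescing.
replay_reduction = (replays_before - replays_after) / replays_before
print(f"{replay_reduction:.1%}")  # 37.5%
```

The ~27% figure in the comment is a separate measurement from the mixbench study and is not derived from this ratio.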
…m-rel-6.2 (#422)

* Improve perfmon coalescing

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Interleave TCC channel counters

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Remove duplicate normal counters

Interleave TCC channel counters in the output file, e.g. TCC_HIT[0] TCC_ATOMIC[0] ... TCC_HIT[1] TCC_ATOMIC[1]

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Save accumulate counters to SQ_ files

Omniperf analyze expects the accumulate counters to be in SQ_*.csv files. Since these files also contain PMC counters (we are trying to fit as many counters into each file as possible to minimize runs), we need to include these SQ_*.csv files in pmc_perf.csv.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Update to work with rocprof v1

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Remove unused method

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Set correct number of TCC channels for gfx942

Ran into rocprof error:
ROCProfiler: fatal error: input metric'TCC_EA0_RDREQ[16]' not supported on this hardware: gfx942
gfx942 has 16 channels, not 32.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix code formatting

Signed-off-by: benrichard-amd <ben.richard@amd.com>

---------

Signed-off-by: benrichard-amd <ben.richard@amd.com>
Reduces number of calls to rocprof by improving perfmon coalescing.
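
To illustrate the general idea of coalescing counters into fewer profiler passes, here is a minimal greedy first-fit sketch. It is not the algorithm implemented in this PR: the `coalesce` function, the per-block slot limits, and the choice of example counters are all assumptions made for illustration, and the hardware block is simply taken to be the prefix before the first underscore.

```python
from collections import Counter


def coalesce(counters, limits):
    """Greedy first-fit packing: put each counter into the first pass whose
    hardware block still has a free slot, opening a new pass if none does.
    Assumes every block limit is at least 1."""
    passes = []  # list of passes, each a list of counter names
    usage = []   # parallel list: slots used per block in each pass
    for name in counters:
        block = name.split("_", 1)[0]  # assumed convention: block is the name prefix
        for members, used in zip(passes, usage):
            if used[block] < limits[block]:
                members.append(name)
                used[block] += 1
                break
        else:
            # No existing pass has room for this block, so start a new one.
            passes.append([name])
            usage.append(Counter({block: 1}))
    return passes


# Hypothetical limits: real per-block counter capacities differ by architecture.
limits = {"SQ": 2, "TCC": 1}
counters = ["SQ_WAVES", "SQ_BUSY_CYCLES", "SQ_INSTS_VALU", "TCC_HIT", "TCC_MISS"]
passes = coalesce(counters, limits)
for i, members in enumerate(passes):
    print(i, members)
```

Under these made-up limits the five counters fit into two passes, because counters from different blocks share a pass instead of each block getting its own. Each pass corresponds to one application replay, which is the quantity this PR reduces.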