part 1 to support rocprofv3 (#492)
* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we specified each channel for TCC counters. rocprofv3 does
not support specifying each TCC channel and instead automatically sums
across channels when given the TCC counter name. The counter name with the
_sum suffix is also supported, including in v1 and v2, so we use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multiple times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf assumed each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels, a JSON
file is still output for that process, but it contains no dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2, rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumulate
counters. v3 does not have this, so we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* Use correct GPU ID

When converting JSON -> CSV we were assigning node_id to GPU_ID. Since
the JSON contains non-GPU devices, the node_id for GPUs might not
start at 0 as expected.

This commit maps the agent ID to the appropriate GPU ID.

* Parse scratch memory per work item from JSON

* Support rocprofv3 CSV parsing

JSON decoding is very slow for large files. Include support for parsing
rocprofv3 CSV output and make that the default.

CSV/JSON can be toggled via the ROCPROF_OUTPUT_FORMAT environment
variable, e.g. ROCPROF_OUTPUT_FORMAT=csv or ROCPROF_OUTPUT_FORMAT=json.

* black format after merge

* format isort

* change return of rocprof_cmd to try to resolve test's error

* hack to pick last part of rocminfo's name

* debug log of hacks

* Modify test_profile_general.py ctest to include MI300 enablement. Currently failing because roofline files are explicitly excluded for the SoC and the roof-only tests contain auto-failing asserts, both originally in place because roofline was not yet enabled on MI300.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* black and isort formatted

* corrected line of copyright
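
The note above about counters that appear multiple times in the JSON output can be illustrated with a minimal sketch. The record layout here (`dispatch_id`, `counter`, `value`) is a simplified assumption made for illustration and is not the actual rocprofv3 JSON schema; `sum_counters` is a hypothetical helper, not a function from this commit. It also shows that an input with no dispatches yields an empty result rather than an error.

```python
from collections import defaultdict


def sum_counters(records):
    """Sum repeated (dispatch, counter) entries and pivot to one row per dispatch."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["dispatch_id"], rec["counter"])] += rec["value"]

    # One dict per dispatch, one key per counter (CSV-like row).
    rows = defaultdict(dict)
    for (dispatch_id, counter), value in totals.items():
        rows[dispatch_id][counter] = value
    return dict(rows)


# Hypothetical, simplified records; the real rocprofv3 layout differs.
records = [
    {"dispatch_id": 1, "counter": "SQ_WAVES", "value": 4.0},
    {"dispatch_id": 1, "counter": "SQ_WAVES", "value": 6.0},
    {"dispatch_id": 1, "counter": "GRBM_COUNT", "value": 100.0},
]
table = sum_counters(records)

# A process that dispatched no kernels simply produces no rows.
empty = sum_counters([])
```

Keying the running totals on the (dispatch, counter) pair keeps the summation independent of the order in which repeated entries appear.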

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
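
The "Use correct GPU ID" fix in the message above can be pictured with a small sketch. It assumes a list of agent records with hypothetical `node_id` and `type` fields, and it assumes GPUs are numbered in ascending `node_id` order; neither detail is confirmed by the visible diff, and `build_gpu_id_map` is an illustrative name.

```python
def build_gpu_id_map(agents):
    """Map each GPU agent's node_id to a zero-based GPU ID."""
    # Assumption: GPU IDs follow ascending node_id order among GPU agents.
    gpu_nodes = sorted(a["node_id"] for a in agents if a["type"] == "GPU")
    return {node_id: gpu_id for gpu_id, node_id in enumerate(gpu_nodes)}


# Hypothetical agent list: two CPUs come first, so GPU node_ids start at 2.
agents = [
    {"node_id": 0, "type": "CPU"},
    {"node_id": 1, "type": "CPU"},
    {"node_id": 2, "type": "GPU"},
    {"node_id": 3, "type": "GPU"},
]
gpu_id_map = build_gpu_id_map(agents)
```

With this mapping, a dispatch recorded on node 2 is reported as GPU 0 instead of GPU 2.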
4 people authored Dec 4, 2024
1 parent 903241a commit 6c470ce
Showing 7 changed files with 509 additions and 34 deletions.
8 changes: 8 additions & 0 deletions src/rocprof_compute_base.py
@@ -140,6 +140,8 @@ def detect_profiler(self):
self.__profiler_mode = "rocprofv1"
elif str(rocprof_cmd).endswith("rocprofv2"):
self.__profiler_mode = "rocprofv2"
elif str(rocprof_cmd).endswith("rocprofv3"):
self.__profiler_mode = "rocprofv3"
else:
console_error(
"Incompatible profiler: %s. Supported profilers include: %s"
@@ -221,6 +223,12 @@ def run_profiler(self):
profiler = rocprof_v2_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
)
elif self.__profiler_mode == "rocprofv3":
from rocprof_compute_profile.profiler_rocprof_v3 import rocprof_v3_profiler

profiler = rocprof_v3_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
)
elif self.__profiler_mode == "rocscope":
from rocprof_compute_profile.profiler_rocscope import rocscope_profiler
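
The suffix-based detection in the hunks above can be sketched in isolation. This is a standalone illustration, not the project's code: `detect_profiler_mode` and the `known` tuple are hypothetical, only the v2 and v3 names visible in the diff are covered, and a `ValueError` stands in for the project's `console_error` call.

```python
import os


def detect_profiler_mode(rocprof_cmd, known=("rocprofv2", "rocprofv3")):
    """Return the profiler mode whose name the command path ends with."""
    name = os.path.basename(str(rocprof_cmd))
    for mode in known:
        if name.endswith(mode):
            return mode
    # The real code reports this through console_error instead of raising.
    raise ValueError(f"Incompatible profiler: {rocprof_cmd}")


mode = detect_profiler_mode("/opt/rocm/bin/rocprofv3")
```

Adding a new profiler then means adding one name to the tuple and one dispatch branch that constructs the matching profiler class.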

8 changes: 7 additions & 1 deletion src/rocprof_compute_profile/profiler_base.py
@@ -169,6 +169,8 @@ def join_prof(self, out=None):
"TID",
"SIG",
"OBJ",
"Correlation_ID_",
"Wave_Size_",
# rocscope specific merged counters, keep original
"dispatch_",
# extras
@@ -358,7 +360,11 @@ def run_profiling(self, version: str, prog: str):
# Fetch any SoC/profiler specific profiling options
options = self._soc.get_profiler_options()
options += self.get_profiler_options(fname)
if self.__profiler == "rocprofv1" or self.__profiler == "rocprofv2":
if (
self.__profiler == "rocprofv1"
or self.__profiler == "rocprofv2"
or self.__profiler == "rocprofv3"
):
run_prof(
fname=fname,
profiler_options=options,
92 changes: 92 additions & 0 deletions src/rocprof_compute_profile/profiler_rocprof_v3.py
@@ -0,0 +1,92 @@
##############################################################################bl
# MIT License
#
# Copyright (c) 2024 - 2024 Advanced Micro Devices, Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
##############################################################################el

import os
import shlex

from rocprof_compute_profile.profiler_base import RocProfCompute_Base
from utils.utils import console_error, console_log, demarcate, replace_timestamps


class rocprof_v3_profiler(RocProfCompute_Base):
    def __init__(self, profiling_args, profiler_mode, soc):
        super().__init__(profiling_args, profiler_mode, soc)
        self.ready_to_profile = (
            self.get_args().roof_only
            and not os.path.isfile(os.path.join(self.get_args().path, "pmc_perf.csv"))
            or not self.get_args().roof_only
        )

    def get_profiler_options(self, fname):
        app_cmd = shlex.split(self.get_args().remaining)
        output_format = "csv"
        if "ROCPROF_OUTPUT_FORMAT" in os.environ.keys():
            output_format = os.environ["ROCPROF_OUTPUT_FORMAT"].lower()

        if output_format not in ["csv", "json"]:
            console_error("Invalid rocprof output format", True)

        args = [
            # v3 requires output directory argument
            "-d",
            self.get_args().path + "/" + "out",
            "--kernel-trace",
            "--output-format",
            output_format,
            "--",
        ]
        args.extend(app_cmd)
        return args

    # -----------------------
    # Required child methods
    # -----------------------
    @demarcate
    def pre_processing(self):
        """Perform any pre-processing steps prior to profiling."""
        super().pre_processing()

    @demarcate
    def run_profiling(self, version, prog):
        """Run profiling."""
        if self.ready_to_profile:
            if self.get_args().roof_only:
                console_log(
                    "roofline", "Generating pmc_perf.csv (roofline counters only)."
                )
            # Log profiling options and setup filtering
            super().run_profiling(version, prog)
        else:
            console_log("roofline", "Detected existing pmc_perf.csv")

    @demarcate
    def post_processing(self):
        """Perform any post-processing steps prior to profiling."""
        super().post_processing()

        if self.ready_to_profile:
            # Manually join each pmc_perf*.csv output
            self.join_prof()
            # Replace timestamp data to solve a known rocprof bug
            # replace_timestamps(self.get_args().path)
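
Two pieces of logic in the new file above are easier to check in isolation. The sketch below uses standalone functions with hypothetical names rather than the class itself. It shows that the `ready_to_profile` expression has the same truth table as `not (roof_only and pmc_perf_exists)`, and it mirrors the `ROCPROF_OUTPUT_FORMAT` lookup, with a `ValueError` standing in for `console_error`.

```python
import itertools


def ready_to_profile(roof_only, pmc_perf_exists):
    """Profile unless a roof-only run already has its pmc_perf.csv."""
    return not (roof_only and pmc_perf_exists)


def resolve_output_format(environ):
    """Pick the rocprofv3 output format from the environment, defaulting to csv."""
    fmt = environ.get("ROCPROF_OUTPUT_FORMAT", "csv").lower()
    if fmt not in ("csv", "json"):
        # The committed code calls console_error here instead of raising.
        raise ValueError(f"Invalid rocprof output format: {fmt}")
    return fmt


# Compare against the expression as written in __init__ for all four inputs.
for roof_only, exists in itertools.product([False, True], repeat=2):
    as_written = roof_only and not exists or not roof_only
    assert ready_to_profile(roof_only, exists) == as_written

default_format = resolve_output_format({})
json_format = resolve_output_format({"ROCPROF_OUTPUT_FORMAT": "JSON"})
```

Passing the environment in as a mapping, instead of reading `os.environ` directly, makes the format selection straightforward to test.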
67 changes: 37 additions & 30 deletions src/rocprof_compute_soc/soc_base.py
@@ -299,6 +299,11 @@ def add(self, counter) -> bool:
return self.blocks[block].add(counter)


# TODO: This is a HACK
def using_v3():
    return "ROCPROF" in os.environ.keys() and os.environ["ROCPROF"] == "rocprofv3"


@demarcate
def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir):
"""Sort and bucket all related performance counters to minimize required application passes"""
@@ -334,14 +339,21 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir):
# Normal counters
for ctr in counters:

# v3 doesn't seem to support this counter
if using_v3():
if ctr.startswith("TCC_BUBBLE"):
continue

# Channel counter e.g. TCC_ATOMIC[0]
if "[" in ctr:

# Remove channel number, append "_expand" so we know
# add the channel numbers back later
# Remove channel number, append "_sum" so rocprof will
# sum the counters for us instead of specifying every
# channel.
channel = int(ctr.split("[")[1].split("]")[0])
if channel == 0:
counter_name = ctr.split("[")[0] + "_expand"
counter_name = ctr.split("[")[0] + "_sum"

try:
normal_counters[counter_name] += 1
except:
@@ -363,8 +375,19 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir):
# Each accumulate counter is in a different file
for ctrs in accumulate_counters:

# Get name of the counter and use it as file name
ctr_name = ctrs[ctrs.index("SQ_ACCUM_PREV_HIRES") - 1]

if using_v3():
# v3 does not support SQ_ACCUM_PREV_HIRES. Instead we defined our own
# counters in counter_defs.yaml that use the accumulate() function. These
# use the name of the accumulate counter with _ACCUM appended to them.
ctrs.remove("SQ_ACCUM_PREV_HIRES")

accum_name = ctr_name + "_ACCUM"

ctrs.append(accum_name)

# Use the name of the accumulate counter as the file name
output_files.append(CounterFile(ctr_name + ".txt", perfmon_config))
for ctr in ctrs:
output_files[-1].add(ctr)
@@ -393,26 +416,8 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir):

pmc = []
for block_name in f.blocks.keys():
if block_name == "TCC":

# Expand and interleve the TCC channel counters
# e.g. TCC_HIT[0] TCC_ATOMIC[0] ... TCC_HIT[1] TCC_ATOMIC[1] ...
channel_counters = []
for ctr in f.blocks[block_name].elements:
if "_expand" in ctr:
channel_counters.append(ctr.split("_expand")[0])

for i in range(0, perfmon_config["TCC_channels"]):
for c in channel_counters:
pmc.append("{}[{}]".format(c, i))

# Handle the rest of the TCC counters
for ctr in f.blocks[block_name].elements:
if "_expand" not in ctr:
pmc.append(ctr)
else:
for ctr in f.blocks[block_name].elements:
pmc.append(ctr)
for ctr in f.blocks[block_name].elements:
pmc.append(ctr)

stext = "pmc: " + " ".join(pmc)

@@ -425,9 +430,11 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir):
fd.close()

# Add a timestamp file
fd = open(os.path.join(workload_perfmon_dir, "timestamps.txt"), "w")
fd.write("pmc:\n\n")
fd.write("gpu:\n")
fd.write("range:\n")
fd.write("kernel:\n")
fd.close()
# TODO: Does v3 need this?
if not using_v3():
fd = open(os.path.join(workload_perfmon_dir, "timestamps.txt"), "w")
fd.write("pmc:\n\n")
fd.write("gpu:\n")
fd.write("range:\n")
fd.write("kernel:\n")
fd.close()
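
The two counter rewrites in this file, channel counters becoming `_sum` counters and `SQ_ACCUM_PREV_HIRES` being swapped for an `_ACCUM` counter, can be sketched as standalone helpers. The function names are hypothetical. Returning `None` for non-zero channels is an assumption about how the surrounding loop treats them, since that part of the loop is not visible in the diff.

```python
def to_sum_counter(ctr):
    """Map TCC_HIT[0] to TCC_HIT_sum; non-channel counters pass through."""
    if "[" not in ctr:
        return ctr
    name, rest = ctr.split("[", 1)
    channel = int(rest.split("]")[0])
    # Assumption: only channel 0 contributes an entry; other channels are skipped.
    return name + "_sum" if channel == 0 else None


def rewrite_accumulate(ctrs):
    """Replace SQ_ACCUM_PREV_HIRES with <counter>_ACCUM, as done for rocprofv3."""
    # The accumulate counter's name is the entry just before SQ_ACCUM_PREV_HIRES.
    ctr_name = ctrs[ctrs.index("SQ_ACCUM_PREV_HIRES") - 1]
    rewritten = list(ctrs)
    rewritten.remove("SQ_ACCUM_PREV_HIRES")
    rewritten.append(ctr_name + "_ACCUM")
    return ctr_name, rewritten


summed = [to_sum_counter(c) for c in ["TCC_HIT[0]", "TCC_HIT[3]", "SQ_WAVES"]]
file_stem, accum = rewrite_accumulate(["SQ_INSTR_LEVEL_SMEM", "SQ_ACCUM_PREV_HIRES"])
```

Unlike the committed code, `rewrite_accumulate` works on a copy, so the caller's list is left unchanged.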
2 changes: 1 addition & 1 deletion src/rocprof_compute_soc/soc_gfx942.py
@@ -54,7 +54,7 @@ def __init__(self, args, mspec):
"gfx940",
)
)
self.set_compatible_profilers(["rocprofv1", "rocprofv2"])
self.set_compatible_profilers(["rocprofv1", "rocprofv2", "rocprofv3"])
# Per IP block max number of simultaneous counters. GFX IP Blocks
self.set_perfmon_config(
{