rocm 711 asset update #35
Merged
3 commits
````diff
@@ -1,58 +1,83 @@
-GPU Agent provides programmable APIs to configure and monitor AMD Instinct GPUs
+# GPU Agent provides programmable APIs to configure and monitor AMD Instinct GPUs

-To build GPU Agent, follow the steps below:
+## To build GPU Agent, follow the steps below:

-1. setup workspace (required once)
+### setup workspace (required once)

-```
-# git submodule update --init --recursive -f
+```bash
+$ git submodule update --init --recursive -f
 ```

-2. create build container image (required once)
+### create build container image (required once)

-```
-# make build-container
+```bash
+$ make build-container
 ```

-3. vendor setup workspace (required once)
+### Building artifacts

+Follow either of the two methods below to build gpuagent and gpuctl binaries
+
+#### Manual Steps
+
+vendor setup workspace for manual building (required once)
+
 - choose build/developer environment
-- rhel9
+- rhel9
 ```bash
-$ GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-builder-rhel:9 make docker-shell
+[user@host]# GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-builder-rhel:9 make docker-shell
 ```

-- ubuntu 22.04
+- ubuntu 22.04
 ```bash
-$ GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-bldr-ubuntu:22.04 make docker-shell
+[user@host]# GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-bldr-ubuntu:22.04 make docker-shell
 ```

+- golang dependency setup (required once)
+```bash
+[user@build-container ]# make gopkglist
+```
+- golang vendor setup
 ```bash
-[root@dev gpu-agent]# cd sw/nic/gpuagent/
-[root@dev gpuagent]# go mod vendor
-
-```
-
-4. building artifacts
-- choose build base os
-- rhel9
-```bash
-# GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-builder-rhel:9 make gpuagent
+[user@build-container ]# cd sw/nic/gpuagent/
+[user@build-container ]# go mod vendor
 ```

-- ubuntu 22.04
+- bild gpuagent (within build-container)
 ```bash
-$ GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-bldr-ubuntu:22.04 make docker-shell
+[user@build-container ]# make
 ```

-5. artifacts location
+#### Full target build in single step (from host)
+
+Choose build base os
+
+- rhel9
+```bash
+[user@host]# GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-builder-rhel:9 make gpuagent
+```
+
+- ubuntu 22.04
+```bash
+[user@host]# GPUAGENT_BLD_CONTAINER_IMAGE=gpuagent-bldr-ubuntu:22.04 make gpuagent
+```
+
+### Artifacts location
 - gpuagent binary can be found at - ${TOP_DIR}/sw/nic/build/x86_64/sim/bin/gpuagent
 - gpuctl binary can be found at - ${TOP_DIR}/sw/nic/build/x86_64/sim/bin/gpuctl

-6. To clean the build artifacts (run it within build-container)
+### To clean the build artifacts (run it within build-container)

 ```bash
 [root@dev gpu-agent]# make -C sw/nic/gpuagent clean
 [root@dev gpu-agent]#
 ```

+# Things to note
+- For updating any amdsmi library to any other version, make sure the libamdsmi.so libraries are built correctly and are available in sw/nic/build/x86_64/sim/lib/ path. These are required during runtime, mismatch in library version may lead to runtime issues. These libraries are built from [amdsmi git](https://github.com/rocm/amdsmi/). The commit/tag the current gpuagent is built on can be found in [file](sw/nic/third-party/rocm/amd_smi_lib/version.txt)
+- apply patches on amdsmi found in [here](patch/amdsmi)
+- amdsmi build instructions are available [here](sw/nic/gpuagent/api/smi/amdsmi/README.md)
+
+# Troubleshooting
+- If you face any issue with golang dependencies, re-run `make gopkglist` and `go mod vendor` command.
+- some go files are generated during build time, if you face any issue related to missing files, run `make gpuagent` command within build-container, then re-run `go mod vendor` command.
````
`@@ -0,0 +1,59 @@`

```diff
diff --git a/src/amd_smi/amd_smi.cc b/src/amd_smi/amd_smi.cc
index 2bb04732..a169b0bc 100644
--- a/src/amd_smi/amd_smi.cc
+++ b/src/amd_smi/amd_smi.cc
@@ -635,6 +635,11 @@ amdsmi_get_gpu_device_uuid(amdsmi_processor_handle processor_handle,
     return status;
 }

+// Add a static cache for KFD nodes with initialization flag
+static std::once_flag kfd_nodes_initialized;
+static std::map<uint64_t, std::shared_ptr<amd::smi::KFDNode>> cached_nodes;
+static uint32_t cached_smallest_node_id = std::numeric_limits<uint32_t>::max();
+
 amdsmi_status_t
 amdsmi_get_gpu_enumeration_info(amdsmi_processor_handle processor_handle,
                                 amdsmi_enumeration_info_t *info){
@@ -663,25 +668,26 @@ amdsmi_get_gpu_enumeration_info(amdsmi_processor_handle processor_handle,
     info->drm_render = gpu_device->get_drm_render_minor();

     // Retrieve HIP ID (difference from the smallest node ID) and HSA ID
-    std::map<uint64_t, std::shared_ptr<amd::smi::KFDNode>> nodes;
-    if (amd::smi::DiscoverKFDNodes(&nodes) == 0) {
-        uint32_t smallest_node_id = std::numeric_limits<uint32_t>::max();
-        for (const auto& node_pair : nodes) {
-            uint32_t node_id = 0;
-            if (node_pair.second->get_node_id(&node_id) == 0) {
-                smallest_node_id = std::min(smallest_node_id, node_id);
+    // Initialize KFD nodes once
+    std::call_once(kfd_nodes_initialized, []() {
+        if (amd::smi::DiscoverKFDNodes(&cached_nodes) == 0) {
+            for (const auto& node_pair : cached_nodes) {
+                uint32_t node_id = 0;
+                if (node_pair.second->get_node_id(&node_id) == 0) {
+                    cached_smallest_node_id = std::min(cached_smallest_node_id, node_id);
+                }
             }
         }
+    });

-        // Default to 0xffffffff as not supported
-        info->hsa_id = std::numeric_limits<uint32_t>::max();
-        info->hip_id = std::numeric_limits<uint32_t>::max();
-        amdsmi_kfd_info_t kfd_info;
-        status = amdsmi_get_gpu_kfd_info(processor_handle, &kfd_info);
-        if (status == AMDSMI_STATUS_SUCCESS) {
-            info->hsa_id = kfd_info.node_id;
-            info->hip_id = kfd_info.node_id - smallest_node_id;
-        }
+    // Default to 0xffffffff as not supported
+    info->hsa_id = std::numeric_limits<uint32_t>::max();
+    info->hip_id = std::numeric_limits<uint32_t>::max();
+    amdsmi_kfd_info_t kfd_info;
+    status = amdsmi_get_gpu_kfd_info(processor_handle, &kfd_info);
+    if (status == AMDSMI_STATUS_SUCCESS) {
+        info->hsa_id = kfd_info.node_id;
+        info->hip_id = kfd_info.node_id - cached_smallest_node_id;
     }

     // Retrieve HIP UUID
```