Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add expected behavior to unified memory #3617

Merged
merged 1 commit into from
Nov 26, 2024
Merged

Conversation

matyas-streamhpc
Copy link

No description provided.

@matyas-streamhpc matyas-streamhpc self-assigned this Sep 19, 2024
@neon60 neon60 marked this pull request as draft September 19, 2024 09:43
@neon60 neon60 force-pushed the update-unified-memory branch from da1233f to 267b2f2 Compare September 26, 2024 10:23
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 7e27ba9 to 84f9097 Compare September 29, 2024 17:13
@neon60 neon60 force-pushed the update-unified-memory branch 4 times, most recently from 934e4d3 to 4baf272 Compare October 9, 2024 09:59
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 546b8d9 to 6244fc2 Compare October 15, 2024 05:37
@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to October 15, 2024 08:23
@neon60 neon60 marked this pull request as ready for review October 15, 2024 08:23
@neon60 neon60 requested a review from MKKnorr October 15, 2024 08:24
@MKKnorr MKKnorr force-pushed the update-unified-memory branch from f4ab47d to 74ebf79 Compare October 25, 2024 11:03
@neon60 neon60 force-pushed the hip_runtime_api_how-to branch from 38d2787 to 8d5b771 Compare October 30, 2024 09:13
@neon60 neon60 force-pushed the update-unified-memory branch from 6879e02 to ab44e41 Compare October 31, 2024 13:16
@neon60 neon60 changed the base branch from hip_runtime_api_how-to to docs/develop October 31, 2024 14:30
********************************************************************************

The HIP runtime API provides C and C++ functionalities to manage event, stream,
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Compute

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently CLR is supposed to be Compute Language Runtime

The runtime offers functions for allocating, freeing, and copying device memory,
along with transferring data between host and device memory.

Here are the various memory management techniques:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these techniques could use a brief description?

.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/pageable_pinned.svg

The pageable and pinned memory allow you to exercise direct control over
memory operations, which is known as explicit memory management. When using the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In memory_management.rst we list four memory management techniques, none of which is "explicit memory management". Is it possible to relate this explicit memory management to one of the four techniques? Or should it be listed as one of the techniques?

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. We have device_output and host_output, shouldn't those come into play here?

Pinned memory
================================================================================

Pinned memory or page-locked memory is stored in pages that are locked in specific sectors in RAM and can't be migrated. The pointer can be used on both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should note that pinned memory is allocated by hipMalloc() and hipHostMalloc().

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be DeviceToHost?

as HBM2e. Device memory can be allocated as global memory, constant, texture or
surface memory.

Global memory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this a bulleted list rather than four separate subheads.

:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable, if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
``hipHostMallocMapped`` are unset. If neither these flags nor the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the footnote to the table references hipHostMallocMapped flag, but the table above does not show this flag. This also is mentioned in the following note.

functions. To unmap the memory, use :cpp:func:`hipMemUnmap`. To release the
virtual address range, use :cpp:func:`hipMemAddressFree`. Finally, to release
the physical memory, use :cpp:func:`hipMemRelease`. A side effect of these
functions is the lack of synchronization when memory is released. If you call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call :cpp:func:hipFree when you have multiple streams running in parallel, it
synchronizes the device. This causes worse resource usage and performance.

This seems like it could be an important note?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 24: - Learning curve: Requires additional effort to understand and utilize SOMA effectively.

Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments

@neon60 neon60 force-pushed the update-unified-memory branch from 78c8a5a to aa3584c Compare November 5, 2024 07:56
@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to November 5, 2024 07:57
@neon60 neon60 force-pushed the update-unified-memory branch from aa3584c to 7c6949e Compare November 5, 2024 08:10
@neon60 neon60 force-pushed the hip_runtime_api_how-to branch 2 times, most recently from eb41cb1 to 7cb026c Compare November 7, 2024 12:00
@MKKnorr MKKnorr force-pushed the update-unified-memory branch 2 times, most recently from 0c4abec to 07a7fdc Compare November 8, 2024 10:41
Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@MKKnorr MKKnorr force-pushed the update-unified-memory branch from 07a7fdc to af9064b Compare November 15, 2024 09:29
@neon60 neon60 force-pushed the hip_runtime_api_how-to branch from 0c52420 to 31fa125 Compare November 18, 2024 16:16
@neon60 neon60 deleted the branch docs/develop November 18, 2024 17:39
@neon60 neon60 closed this Nov 18, 2024
@neon60 neon60 reopened this Nov 18, 2024
@MKKnorr MKKnorr changed the base branch from hip_runtime_api_how-to to docs/develop November 19, 2024 09:25
@MKKnorr MKKnorr force-pushed the update-unified-memory branch 3 times, most recently from 890f0a5 to bbb00d0 Compare November 25, 2024 09:44
@g-h-c
Copy link

g-h-c commented Nov 26, 2024

I think many people would be caught up by the new/delete example not working, as its ability to work is not only dependent on the device attributes but also the kernel HMM support. I would add some comments to the actual code, e.g.

// for this example to work the environment variable HSA_XNACK needs to be set to 1, and the kernel needs HMM support

We can potentially indicate that a way to check for HMM support is setting the environment variable, AMD_LOG_LEVEL e.g. to 3 or 4. For instance on my instinct accelerator it prints:

:3:rocdevice.cpp :1790: 10080897639915 us: HMM support: 1, XNACK: 1, Direct host access: 0

when set to 3.
But this last thing about AMD_LOG_LEVEL does not necessarily belong to this page

@MKKnorr MKKnorr force-pushed the update-unified-memory branch from bbb00d0 to 24856f9 Compare November 26, 2024 12:00
@MKKnorr
Copy link

MKKnorr commented Nov 26, 2024

Thanks for the review, I agree with your points.

I think many people would be caught up by the new/delete example not working, as its ability to work is not only dependent on the device attributes but also the kernel HMM support. I would add some comments to the actual code, e.g.

I clarified that a bit more under the system requirements, and moved the first mention of HMM support there.

// for this example to work the environment variable HSA_XNACK needs to be set to 1, and the kernel needs HMM support

Added a comment similar to this to the example

We can potentially indicate that a way to check for HMM support is setting the environment variable, AMD_LOG_LEVEL e.g. to 3 or 4. For instance on my instinct accelerator it prints:

:3:rocdevice.cpp :1790: 10080897639915 us: HMM support: 1, XNACK: 1, Direct host access: 0

when set to 3. But this last thing about AMD_LOG_LEVEL does not necessarily belong to this page

Agreed, this is a bit overkill, as AMD_LOG_LEVEL=3 also prints a lot of other stuff, which might make it hard to look for the HMM and XNACK support. Also similar information is listed on https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack, which is being linked several times in this document.

@MKKnorr MKKnorr force-pushed the update-unified-memory branch from 24856f9 to b6da83e Compare November 26, 2024 12:48
@MKKnorr MKKnorr merged commit 923f80a into docs/develop Nov 26, 2024
4 checks passed
@neon60 neon60 deleted the update-unified-memory branch December 5, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants