Add expected behavior to unified memory #3617
docs/how-to/hip_runtime_api.rst
********************************************************************************

The HIP runtime API provides C and C++ functionalities to manage event, stream,
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common
Suggested change:
- and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common
+ and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Compute
Apparently CLR is supposed to stand for Compute Language Runtime.
The runtime offers functions for allocating, freeing, and copying device memory,
along with transferring data between host and device memory.

Here are the various memory management techniques:
It seems like these techniques could use a brief description?
.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/pageable_pinned.svg

The pageable and pinned memory allow you to exercise direct control over
memory operations, which is known as explicit memory management. When using the
In memory_management.rst we list four memory management techniques, none of which is "explicit memory management". Is it possible to relate this explicit memory management to one of the four techniques? Or should it be listed as one of the techniques?
// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));
This doesn't seem right to me. We have device_output and host_output, shouldn't those come into play here?
Pinned memory
================================================================================

Pinned memory or page-locked memory is stored in pages that are locked in specific sectors in RAM and can't be migrated. The pointer can be used on both
Seems like we should note that pinned memory is allocated by hipMalloc() and hipHostMalloc().
// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));
Shouldn't this be DeviceToHost?
as HBM2e. Device memory can be allocated as global memory, constant, texture or
surface memory.

Global memory
I suggest making this a bulleted list rather than four separate subheads.
:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable, if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
``hipHostMallocMapped`` are unset. If neither these flags nor the
Here the footnote to the table references the hipHostMallocMapped flag, but the table above does not show this flag. The flag is also mentioned in the following note.
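Since the quoted footnote says ``HIP_HOST_COHERENT`` only matters when none of the three flags are passed, it can help to log whether the variable is set at all before allocating. The following is a minimal sketch in plain C++ under that assumption. The helper names ``read_env`` and ``describe_env`` are hypothetical and not part of HIP, and the code only reports the raw value without interpreting what any value means to the runtime.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the raw value of an environment variable,
// or std::nullopt when the variable is not set. It does not interpret the
// value in any way.
std::optional<std::string> read_env(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return std::nullopt;
    }
    return std::string(value);
}

// Hypothetical helper: builds a one-line report such as
// "HIP_HOST_COHERENT=1" or "HIP_HOST_COHERENT is not set".
std::string describe_env(const char* name) {
    const std::optional<std::string> value = read_env(name);
    if (!value) {
        return std::string(name) + " is not set";
    }
    return std::string(name) + "=" + *value;
}
```

Logging ``describe_env("HIP_HOST_COHERENT")`` next to a ``hipHostMalloc`` call that passes none of the three flags shows whether the variable could be in play for that allocation.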
functions. To unmap the memory, use :cpp:func:`hipMemUnmap`. To release the
virtual address range, use :cpp:func:`hipMemAddressFree`. Finally, to release
the physical memory, use :cpp:func:`hipMemRelease`. A side effect of these
functions is the lack of synchronization when memory is released. If you call
"If you call :cpp:func:`hipFree` when you have multiple streams running in parallel, it synchronizes the device. This causes worse resource usage and performance."
This seems like it could be an important note?
On line 24: - Learning curve: Requires additional effort to understand and utilize SOMA effectively.
added comments
Looks good to me.
I think many people would be caught out by the new/delete example not working, because whether it works depends not only on the device attributes but also on the kernel's HMM support. I would add some comments to the actual code, e.g.
We could also mention that one way to check for HMM support is to set the environment variable AMD_LOG_LEVEL, e.g. to 3 or 4. For instance, on my Instinct accelerator it prints:
when set to 3.
Thanks for the review, I agree with your points.
I clarified that a bit more under the system requirements, and moved the first mention of HMM support there.
Added a comment similar to this to the example
Agreed, this is a bit overkill, as AMD_LOG_LEVEL=3 also prints a lot of other output, which can make it hard to find the HMM and XNACK support lines. Similar information is also listed at https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack, which is linked several times in this document.