memcpy_async should cache only in L2 when possible #220

gonzalobg · 2021-10-28T19:02:51Z

This PR adds source sizes to the memcpy_async specializations and, for the 16-byte alignment case, changes the cache operator from caching at all levels to caching only at the global (L2) level.
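On architectures that support it, memcpy_async can be implemented with the PTX `cp.async` instruction, which takes an optional `src-size` operand: only that many bytes are read from the source, and the remaining bytes of the `cp-size` destination are zero-filled. The host-side C++ model below is a minimal sketch of that semantics, assuming the PTX ISA description of `cp.async`. `model_cp_async` is a hypothetical name, not a libcu++ or CUDA API, and the sketch does not show how this PR computes the source size it passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side model (not the real instruction): mimics the
// effect of cp.async with a src-size operand. cp_size bytes end up in dst;
// only the first src_size bytes are read from src, the rest are zeroed.
void model_cp_async(void* dst, const void* src,
                    std::size_t cp_size, std::size_t src_size) {
    // cp.async copies 4, 8 or 16 bytes; src-size must not exceed cp-size.
    assert(cp_size == 4 || cp_size == 8 || cp_size == 16);
    assert(src_size <= cp_size);
    std::memcpy(dst, src, src_size);
    // Zero-fill the tail of the destination chunk.
    std::memset(static_cast<unsigned char*>(dst) + src_size, 0,
                cp_size - src_size);
}
```

For example, an 8-byte copy with a source size of 5 writes the first 5 source bytes followed by 3 zero bytes, and leaves memory past the 8-byte chunk untouched.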

After an application carves some shared memory out of L1 and then copies data into that shared memory, it will most likely read that data only from shared memory. Caching the data only in L2, and not in L1, therefore seems like the better default.

Closes #135.

This commit changes memcpy_async for 16-byte alignment from the ca (cache all) hint to the cg (cache global) hint, and also specifies the size of the source.
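The "when possible" in the title is consistent with a PTX constraint: `cp.async.cg` (cache in L2 only, bypassing L1) accepts only a 16-byte copy size, while `cp.async.ca` accepts 4, 8, or 16 bytes. The sketch below uses a hypothetical helper, `cp_async_ptx`, to show the instruction text a specialization might emit for each granularity under that assumption; it is an illustration, not the code merged in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not libcu++ code): builds the PTX cp.async
// instruction text for a given copy granularity, as it might appear in an
// inline asm statement. %0 = shared dst, %1 = global src, %2 = src-size.
std::string cp_async_ptx(std::size_t cp_size) {
    // cp.async only supports copy sizes of 4, 8 and 16 bytes.
    assert(cp_size == 4 || cp_size == 8 || cp_size == 16);
    // .cg is only legal with a 16-byte copy, so the 4- and 8-byte
    // cases keep .ca (cache at all levels).
    const std::string cache_op = (cp_size == 16) ? "cg" : "ca";
    return "cp.async." + cache_op + ".shared.global [%0], [%1], " +
           std::to_string(cp_size) + ", %2;";
}
```

For example, `cp_async_ptx(16)` yields `cp.async.cg.shared.global [%0], [%1], 16, %2;`, while `cp_async_ptx(8)` yields the `.ca` form with a copy size of 8.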

griwes

Looks good to me, but do secure an approval from @ogiroux too ;>

gonzalobg added 2 commits October 28, 2021 10:07

memcpy_async with 16 byte alignment uses cg and pass src size

a08d5b9

Pass src_size to memcpy_async 4 and 8 byte alignment specializations

652c092

gonzalobg assigned griwes and ogiroux Oct 28, 2021

gonzalobg changed the title ~~memcpy_async should use cache only in the L2 when possible~~ memcpy_async should cache only in L2 when possible Nov 3, 2021

wmaxey requested a review from griwes November 3, 2021 20:31

wmaxey added the testing: internal ci in progress and testing: internal ci passed labels (Passed internal NVIDIA CI (DVS)) and removed the testing: internal ci in progress label Nov 3, 2021

griwes approved these changes Nov 5, 2021

wmaxey merged commit 4f42427 into NVIDIA:main Nov 5, 2021
