fix illegal memory access #150

psychocoderHPC · 2019-01-04T13:07:39Z

The number of warps per multiprocessor depends on the architecture.
In some places, the warp id was used to index block shared memory with a fixed size of 32.
Since sm_20 a multiprocessor can hold more than 32 resident warps (up to 64), which can cause an out-of-bounds memory access.

- add helper methods for:
  - MaxThreadsPerBlock
  - WarpSize
  - warpid_withinblock()
- fix collective warp aggregations

This bug affects:

- the device function getAvailableSlotsAccelerator()
- the distribution policy XMallocSIMD
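
A minimal sketch of why a fixed-size array of 32 entries is safe with a block-local warp index but not with the multiprocessor-wide %warpid. The struct names mirror the helpers listed above and the values 1024 and 32 are taken from the diff; the plain `static constexpr` members, the `warpsPerBlock` constant, and the `isValidSlot` helper are assumptions for illustration, not the code from this PR.

```cpp
#include <cstdint>

// Upper bounds named after the helpers added in this PR (values as in the diff).
struct MaxThreadsPerBlock
{
    static constexpr std::uint32_t value = 1024;
};

struct WarpSize
{
    static constexpr std::uint32_t value = 32;
};

// Hypothetical constant: the most warps a single block can contain.
constexpr std::uint32_t warpsPerBlock =
    MaxThreadsPerBlock::value / WarpSize::value;

// Hypothetical helper: true if `warpIndex` is a valid slot in a
// per-block array with `warpsPerBlock` entries.
constexpr bool isValidSlot(std::uint32_t warpIndex)
{
    return warpIndex < warpsPerBlock;
}
```

With these bounds a block-local warp index is at most 31, while a multiprocessor-wide index such as 47 or 63 falls outside the array.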

ax3l

Interestingly, that's still 32 but with valid access now ;-)

ax3l · 2019-01-04T15:36:18Z

src/include/mallocMC/mallocMC_utils.hpp

+   */
+  struct MaxThreadsPerBlock
+  {
+	// valid for sm_2.X - sm_7.5


ax3l · 2019-01-04T15:38:24Z

src/include/mallocMC/mallocMC_utils.hpp

+    BOOST_STATIC_ASSERT uint32_t value = 1024;
+  };
+
+  /** the maximal number threads per block


Copy-paste error in the doc title :)

ax3l · 2019-01-04T15:38:27Z

src/include/mallocMC/mallocMC_utils.hpp

+   */
+  struct WarpSize
+  {
+	// valid for sm_2.X - sm_7.5


ax3l · 2019-01-04T15:38:58Z

src/include/mallocMC/mallocMC_utils.hpp

+
+  /** warp id within a block
+   *
+   * The id is constant over the livetime of the thread.


needs more details please:
the warpid you calculate here is: unique for the warps in a block over the lifetime of a whole block, right?
Because it could also be unique only for the active number of threads, etc.

typo: lifetime

> the warpid you calculate here is: unique for the warps in a block over the lifetime of a whole block, right?

Yes, but since a thread can only have a warpid during its lifetime, the doc is correct. As long as the thread exists, it has a corresponding warpid. It does not matter whether the thread is active or inactive.

ax3l · 2019-01-04T15:41:46Z

src/include/mallocMC/mallocMC_utils.hpp

+   * The id is constant over the livetime of the thread.
+   * The id is not equal to warpid().
+   *
+   * @return warp id within the block


in warpid() we should add a comment that it should not be used besides for diagnostics.
Maybe we want to implement the %ctaid and %tid getter as described by the docs.

Can you please grep the code after your change to check no warpid is used anywhere else?

> in warpid() we should add a comment that it should not be used besides for diagnostics.

We can extend the documentation, but the point about diagnostics is not correct: warpid is still used for a hashing function, where it makes sense and is not critical.

> Maybe we want to implement the %ctaid and %tid getter as described by the docs.

I do not understand how this helps, because %tid == threadIdx.

ax3l · 2019-01-04T15:51:45Z

src/include/mallocMC/mallocMC_utils.hpp

+  MAMC_ACCELERATOR inline boost::uint32_t warpid_withinblock()
+  {
+    return (
+      threadIdx.z * blockDim.y * blockDim.x +


that implementation is likely not true if any of the block dims is not exactly a multiple of the warp size.

For this reason, %ctaid and %tid should be used to compute a virtual warp index if such a value is needed in kernel code.

> that implementation is likely not true if any of the block dims is not exactly a multiple of the warp size.

It is not required to be a multiple. The code computes the linear thread index and then divides it by the warp size.
Could you please explain why it should be wrong?

Offline discussion: the question is only about how threads are linearized into warps at runtime. Do we have a reference that the same mapping is used during thread scheduling?

As we found out, CTA and T (cooperative thread arrays and threads) are just PTX speak for blocks in grids and threads.
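
The linearization discussed in this thread can be sketched on the host. The CUDA C++ Programming Guide defines the thread ID of index (x, y, z) in a block of size (Dx, Dy, Dz) as x + y*Dx + z*Dx*Dy and states that each warp contains threads of consecutive, increasing thread IDs, which is the mapping the division by the warp size relies on. The `Dim3` type and both function names below are hypothetical stand-ins for `threadIdx`/`blockDim`, not the code from this PR.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's threadIdx / blockDim.
struct Dim3
{
    std::uint32_t x, y, z;
};

// Linear thread ID inside a block: x + y * Dx + z * Dx * Dy.
constexpr std::uint32_t linearThreadId(Dim3 idx, Dim3 dim)
{
    return idx.z * dim.y * dim.x + idx.y * dim.x + idx.x;
}

// Warp index within the block: linear thread ID divided by the warp size.
constexpr std::uint32_t warpIdWithinBlock(
    Dim3 idx, Dim3 dim, std::uint32_t warpSize = 32)
{
    return linearThreadId(idx, dim) / warpSize;
}
```

For a 10 x 10 block, whose dimensions are not multiples of the warp size, thread (2, 3, 0) has linear ID 32 and falls into warp 1, and the last thread (9, 9, 0) has ID 99 and falls into warp 3.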

matthias-springer · 2019-01-04T17:11:21Z

src/include/mallocMC/mallocMC_utils.hpp

+  struct MaxThreadsPerBlock
+  {
+	// valid for sm_2.X - sm_7.5
+    BOOST_STATIC_ASSERT uint32_t value = 1024;


I am getting a compile error here. Shouldn't there be parentheses around any static assert? Or what does BOOST_STATIC_ASSERT do here?

Oh, it is a copy-paste mistake; it must be BOOST_STATIC_CONSTEXPR.

psychocoderHPC · 2019-01-04T18:36:15Z

src/include/mallocMC/mallocMC_utils.hpp

+  struct WarpSize
+  {
+	// valid for sm_2.X - sm_7.5
+    BOOST_STATIC_ASSERT uint32_t value = 32;


must be BOOST_STATIC_CONSTEXPR

ax3l · 2019-01-07T09:23:54Z

src/include/mallocMC/mallocMC_utils.hpp

+  /** warp index within a multiprocessor
+   *
+   * Index of the warp within the multiprocessor at the moment of the query.
+   * The result is volatile and can different with each query.


Index of the warp on its assigned multiprocessor
can be different

ax3l · 2019-01-07T09:24:42Z

src/include/mallocMC/mallocMC_utils.hpp

  MAMC_ACCELERATOR inline boost::uint32_t warpid()
  {
    boost::uint32_t mywarpid;
    asm("mov.u32 %0, %%warpid;" : "=r" (mywarpid));
    return mywarpid;
  }
+
+  /** maximum number of warps on the multiprocessor


psychocoderHPC · 2019-01-07T14:06:48Z

@matthias-springer I fixed my copy-paste issue. Could you please test whether this PR solves the issue for you?

matthias-springer · 2019-01-08T02:04:39Z

@psychocoderHPC It works! Thank you!

psychocoderHPC added the bug label Jan 4, 2019

psychocoderHPC added this to the 2.4.0crp milestone Jan 4, 2019

psychocoderHPC requested review from slizzered and ax3l January 4, 2019 13:07

psychocoderHPC mentioned this pull request Jan 4, 2019

Use threadIdx instead of %%warpid. #149

Closed

ax3l reviewed Jan 4, 2019

matthias-springer reviewed Jan 4, 2019

psychocoderHPC commented Jan 4, 2019

ax3l reviewed Jan 7, 2019

fix style, fix wrong used qualifier

cab1dd5

- remove tabs
- update documentation
- fix wrong used variable qualifier

psychocoderHPC force-pushed the fix-warpsPerSM branch from 1919f96 to cab1dd5 Compare January 7, 2019 14:06

ax3l approved these changes Jan 8, 2019

ax3l merged commit 8dbb2dd into alpaka-group:dev Jan 8, 2019

psychocoderHPC deleted the fix-warpsPerSM branch January 8, 2019 07:04

ax3l mentioned this pull request Feb 13, 2019

Prepare 2.3.1crp Release #152

Open

ax3l modified the milestones: 2.4.0crp, 2.3.0crp: Refactoring Globals, Removing Macros Feb 13, 2019

ax3l mentioned this pull request Feb 14, 2019

Update mallocMC to 2.3.1crp ComputationalRadiationPhysics/picongpu#2893

Merged
