AccumulateTopology uses uninitialized global memory? #62

Open
icoderaven opened this issue Mar 8, 2019 · 5 comments
@icoderaven (Contributor)

Hi @ramakarl

In the process of debugging my application today, I ran cuda-memcheck with

```
cuda-memcheck --tool initcheck
```

and discovered that using AccumulateTopology() produces a whole host of uninitialized global memory reads. This is easy to replicate by running it on the gPointFusion sample:

```
cuda-memcheck --tool initcheck ./gPointFusion
```

Is this a bug? I don't see this when using RebuildTopology(), but I'd much rather not write additional logic to store the active brick center locations and then add a new point cloud to those points.

Tracing back the error message, it seems to come from within the RadixSort CuDPP methods.

```
========= Uninitialized __global__ memory read of size 4
=========     at 0x000001c8 in void thrust::cuda_cub::cub::RadixSortScanBinsKernel<thrust::cuda_cub::cub::DeviceRadixSortPolicy<__int64, thrust::cuda_cub::cub::NullType, int>::Policy700, int>(thrust::cuda_cub::cub::NullType*, int)
=========     by thread (374,0,0) in block (0,0,0)
=========     Address 0x7fc3305a69d8
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x24f90d]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so [0x458172]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so [0x458367]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so [0x4870f5]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so (_ZN6thrust8cuda_cub3cub23RadixSortScanBinsKernelINS1_21DeviceRadixSortPolicyIxNS1_8NullTypeEiE9Policy700EiEEvPT0_i + 0xdf) [0x3c3d0f]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so (_ZN6thrust8cuda_cub3cub15DeviceRadixSort8SortKeysIxEE9cudaErrorPvRmRNS1_12DoubleBufferIT_EEiiiP11CUstream_stb + 0x168e) [0x425d1e]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so (_ZN6thrust8cuda_cub4sortINS0_3tagENS_10device_ptrIxEENS_4lessIxEEEEvRNS0_16execution_policyIT_EET0_SB_T1_ + 0xe6) [0x42b086]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so (_Z7runSortIxEvPT_PjmPK18CUDPPRadixSortPlan + 0x57) [0x432197]
=========     Host Frame:/home/icoderaven/packages/gvdb2/build/shared_cudpp/lib/libcudpp_cu9.2.so (cudppRadixSort + 0x3e) [0xe003e]
=========     Host Frame:/home/icoderaven/packages/gvdb2/install/lib/libgvdb.so (_ZN4nvdb10VolumeGVDB15RadixSortByByteEii + 0xb9) [0x80fd9]
=========     Host Frame:/home/icoderaven/packages/gvdb2/install/lib/libgvdb.so (_ZN4nvdb10VolumeGVDB17ActivateBricksGPUEifNS_9Vector3DFEiNS_9Vector3DIE + 0x357) [0x82f57]
=========     Host Frame:/home/icoderaven/packages/gvdb2/install/lib/libgvdb.so (_ZN4nvdb10VolumeGVDB15RebuildTopologyEifNS_9Vector3DFE + 0x180) [0x8c1b0]
=========     Host Frame:/home/icoderaven/packages/gvdb2/install/lib/libgvdb.so (_ZN4nvdb10VolumeGVDB18AccumulateTopologyEifNS_9Vector3DFEi + 0xa0) [0x8c300]
```
@ramakarl (Contributor) commented Apr 1, 2019

It may be an issue in CuDPP.

@icoderaven (Contributor, Author)

Even after the move away from CUDPP, this error still shows up when using Thrust (specifically when casting the device pointer to host in gvdb_cutils.cu), but it doesn't seem to be much of an issue. I'm not sure whether there are performance or safety implications, however.
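
For context, the pattern in question looks roughly like the sketch below: a raw CUDA allocation wrapped in a `thrust::device_ptr` and sorted. All names and sizes here are hypothetical, not the actual gvdb_cutils.cu code. If the allocation has spare capacity that is never written and the sort (or a later copy back to host) touches the full allocation, initcheck flags the untouched bytes:

```cpp
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Hypothetical repro: sort a device buffer whose tail was never written.
void sortKeys(unsigned long long* d_keys, size_t numValid, size_t capacity)
{
    // cudaMalloc'd memory starts uninitialized. Zeroing the whole
    // allocation up front would silence initcheck for the unwritten tail:
    // cudaMemset(d_keys, 0, capacity * sizeof(*d_keys));

    // Wrap the raw pointer so thrust dispatches to the device backend.
    thrust::device_ptr<unsigned long long> keys(d_keys);

    // Sorting only the valid range never reads the uninitialized tail;
    // sorting keys .. keys + capacity would.
    thrust::sort(keys, keys + numValid);
}
```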

@NBickford-NV (Collaborator) commented Jun 9, 2020

Ah, that's interesting to hear! I'll probably take some time soon to run through it and see where Thrust is using uninitialized memory. (You probably noticed that I closed most of the CUDPP-related issues earlier today; I left this one open because it seemed like it could still be an issue, so that's good to hear!)

@icoderaven (Contributor, Author)

I remember looking at it a couple of weeks back, and there were two instances.
First, in createLinearMem, initcheck complained about memory not being initialised. I think that was benign; I'm not sure how much of an impact memsetting it to 0 would have (I didn't profile it).
Second was the location I mentioned. I suspect it has to do with the unified memory used in Thrust, but I don't have much experience there, so I left it at that and didn't update this thread. Your recent activity did prompt me to post, as you correctly guessed!
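
To illustrate the first instance, here's a minimal sketch of the kind of fix being suggested, assuming the allocation goes through cudaMalloc; the function name and signature are hypothetical, not the actual createLinearMem code:

```cpp
#include <cuda_runtime.h>

// Hypothetical sketch: zero a fresh linear allocation so that later
// reads of never-written bytes don't trip initcheck.
cudaError_t createLinearMemZeroed(void** d_ptr, size_t bytes)
{
    cudaError_t err = cudaMalloc(d_ptr, bytes);
    if (err != cudaSuccess)
        return err;
    // One extra device-side memset per allocation; likely cheap next to
    // the transfers and kernels that follow, but worth profiling.
    return cudaMemset(*d_ptr, 0, bytes);
}
```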

@icoderaven (Contributor, Author)

BTW, I think I buried this in one of my previous issues that I closed two weeks back, but in AccumulateTopology, adding a new set of points that would increase the spanning cover, and therefore require a new root node to be created, makes the library segfault. I wrote a quick fix that works for me for now, which involves reparenting the root node, but it might be worth a second look (and I think it should definitely be merged into mainline):
icoderaven@b3256a9
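
For readers who haven't opened the linked commit, the general shape of such a fix is sketched below. This is a generic octree-style illustration under my own assumptions, not the actual GVDB data structures or the code in that commit; every type and field here is hypothetical:

```cpp
// Hypothetical sketch: when new points fall outside the current root's
// cover, grow the tree upward by creating a larger root and reparenting
// the old root under it, instead of indexing past the existing cover
// (one way such a segfault can arise).
struct Node {
    Node* parent      = nullptr;
    Node* children[8] = {};
    int   level       = 0;  // higher level = larger spatial cover
};

Node* growCover(Node* root, int childSlot)
{
    Node* newRoot = new Node();
    newRoot->level = root->level + 1;     // one level more cover
    newRoot->children[childSlot] = root;  // old root becomes a child
    root->parent = newRoot;               // reparent the old root
    return newRoot;                       // caller updates its root pointer
}
```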
