RMM Memory Leak after running for a while [QST] #75

Closed
lucafuji opened this issue Apr 9, 2019 · 11 comments
Labels
feature request New feature or request question Further information is requested

Comments

@lucafuji
Contributor

lucafuji commented Apr 9, 2019

What is your question?
We integrated RMM into AresDB last week and have been running it in staging.
We use pooled memory management and the default stream for memory allocation.

After about 30 minutes, all the memory on one GPU appears to be exhausted, and the next memory allocation fails with a segmentation fault.

I don't think there is a memory leak in our code, because it worked previously when we called cudaMalloc/cudaFree directly.

Here is the link to our code:
https://github.com/uber/aresdb/blob/master/memutils/memory/rmm_alloc.cu
Thank you so much!

@lucafuji lucafuji added the question Further information is requested label Apr 9, 2019
@jrhemstad
Contributor

Are you 100% sure that you don't have any other rogue cudaMalloc calls within your code? If the memory pool grows to fill the entire GPU memory and you attempt a normal cudaMalloc, you'll get an OOM error.

Also, you say that you get a segmentation fault on the allocation. If you were running out of memory, I would expect an OOM error code rather than a segmentation fault.

@jrhemstad
Contributor

You can also enable logging at initialization of RMM with this flag: https://github.com/rapidsai/rmm/blob/branch-0.7/include/rmm/rmm_api.h#L68

This will log every allocation and free, allowing you to plot your allocs/frees over time and see whether there are leaks in your calls.

@harrism
Member

harrism commented Apr 30, 2019

@lucafuji any response to @jrhemstad's logging suggestion? I don't think there are sufficient details here to allow us to repro locally.

@lucafuji
Contributor Author

lucafuji commented May 7, 2019

@lucafuji any response to @jrhemstad 's logging suggestion? I don't think there are sufficient details here to allow us to repro locally.

Yes, let me turn on logging and try it.

@lucafuji
Contributor Author

lucafuji commented May 9, 2019

Here is the log I got:

Event Type,Device ID,Address,Stream,Size (bytes),Free Memory,Total Memory,Current Allocs,Start,End,Elapsed,Location
Alloc,0,0x7f1b37c00000,0,720984,0,0,1,46.0344,46.0345,0.000155089,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b37e00000,0,5056,0,0,2,46.0429,46.0431,0.000168821,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3ae00000,0,101376,0,0,3,46.0431,46.0433,0.000150959,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b02000000,0,23246208,0,0,4,46.0433,46.0435,0.000166125,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b00000000,0,23246176,0,0,5,46.0454,46.0455,0.000150455,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b01800000,0,5811544,0,0,6,46.0456,46.0457,0.000131093,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b37e01400,0,5120,0,0,7,46.0458,46.0458,1e-05,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3ae18c00,0,160064,0,0,8,46.0459,46.0459,9.546e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1afc000000,0,45950208,0,0,9,46.046,46.0462,0.000174189,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3ae3fe00,0x7f1b04000bf0,91903,0,0,10,46.0481,46.0481,1.7684e-05,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b3ae3fe00,0x7f1b04000bf0,0,0,0,9,46.0484,46.0484,4.976e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b3ae3fe00,0x7f1b04000bf0,51967,0,0,10,46.0485,46.0485,5.73e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b3ae3fe00,0x7f1b04000bf0,0,0,0,9,46.0487,46.0487,4.16e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b3ae3fe00,0x7f1b04000bf0,51967,0,0,10,46.0489,46.0489,4.43e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b3ae3fe00,0x7f1b04000bf0,0,0,0,9,46.049,46.049,3.58e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1afa000000,0,26015104,0,0,10,46.0491,46.0499,0.000850349,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af8000000,0,29266992,0,0,11,46.0521,46.0523,0.000153303,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af6000000,0,29266992,0,0,12,46.0524,46.0525,0.000150801,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1afec00000,0,14633496,0,0,13,46.0526,46.0528,0.000127401,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3b000000,0,14633496,0,0,14,46.0529,46.053,0.000128175,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af4000000,0,29266992,0,0,15,46.0531,46.0532,0.000146059,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af2000000,0,29266992,0,0,16,46.0533,46.0535,0.00014292,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b40c00000,0,14633496,0,0,17,46.0536,46.0537,0.000127568,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af0000000,0,14633496,0,0,18,46.0538,46.0539,0.000133091,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1b02000000,0,0,0,0,17,46.0555,46.0555,1.904e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3ae00000,0,0,0,0,16,46.0556,46.0556,6.18e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b37e00000,0,0,0,0,15,46.0558,46.0558,9.69e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b00000000,0,0,0,0,14,46.0559,46.0559,5.1e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b01800000,0,0,0,0,13,46.0561,46.0561,5.59e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afa000000,0,0,0,0,12,46.0563,46.0563,5.18e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1aec000000,0x7f1b04000bf0,39172479,0,0,13,46.0579,46.0581,0.000173683,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1aec000000,0x7f1b04000bf0,0,0,0,12,46.0582,46.0582,5.77e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b3ae3fe00,0x7f1b04000bf0,37375,0,0,13,46.0582,46.0582,5.5e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b3ae3fe00,0x7f1b04000bf0,0,0,0,12,46.0636,46.0636,5.81e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1ae8000000,0,45950152,0,0,13,46.0636,46.0638,0.000178471,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b00000000,0,11487538,0,0,14,46.0639,46.0639,5.326e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b01800000,0x7f1b040010c0,180735,0,0,15,46.0673,46.0673,2.625e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b01800000,0x7f1b040010c0,0,0,0,14,46.0677,46.0677,4.87e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b01800000,0x7f1b040010c0,69375,0,0,15,46.0679,46.0679,5.39e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b01800000,0x7f1b040010c0,0,0,0,14,46.0681,46.0681,4.05e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b01800000,0x7f1b040010c0,69375,0,0,15,46.0683,46.0683,4.03e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b01800000,0x7f1b040010c0,0,0,0,14,46.0685,46.0685,3.17e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1ae4000000,0,34897200,0,0,15,46.0685,46.0687,0.00016921,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ae0000000,0,39343856,0,0,16,46.0716,46.0718,0.000160882,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1adc000000,0,39343856,0,0,17,46.072,46.0721,0.000167852,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1af6000000,0,0,0,0,16,46.0723,46.0723,1.065e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af8000000,0,0,0,0,15,46.0725,46.0725,4.75e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1b02000000,0,19671928,0,0,16,46.0727,46.0727,7.47e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1afa000000,0,19671928,0,0,17,46.0729,46.0729,6.82e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1afec00000,0,0,0,0,16,46.0732,46.0732,5.15e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3b000000,0,0,0,0,15,46.0734,46.0734,5.95e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1ad8000000,0,39343856,0,0,16,46.0736,46.0738,0.000164241,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ad4000000,0,39343856,0,0,17,46.074,46.0741,0.000162211,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1af2000000,0,0,0,0,16,46.0743,46.0743,5.79e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af4000000,0,0,0,0,15,46.0746,46.0746,5.8e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1af2000000,0,19671928,0,0,16,46.0748,46.0748,8.17e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af4000000,0,19671928,0,0,17,46.075,46.075,7.85e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1af0000000,0,0,0,0,16,46.0753,46.0753,5.63e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b40c00000,0,0,0,0,15,46.0755,46.0755,6.56e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afc000000,0,0,0,0,14,46.0777,46.0777,6.77e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3ae18c00,0,0,0,0,13,46.0779,46.0779,6.2e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b37e01400,0,0,0,0,12,46.0781,46.0781,5.49e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae8000000,0,0,0,0,11,46.0784,46.0784,4.35e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b00000000,0,0,0,0,10,46.0787,46.0787,6.51e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae4000000,0,0,0,0,9,46.0789,46.0789,4.58e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1ad0000000,0x7f1b040010c0,52608383,0,0,10,46.0811,46.0813,0.000179708,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1ad0000000,0x7f1b040010c0,0,0,0,9,46.0814,46.0814,5.11e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b01800000,0x7f1b040010c0,49663,0,0,10,46.0814,46.0814,4.06e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b01800000,0x7f1b040010c0,0,0,0,9,46.0888,46.0888,4.69e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Free,0,0x7f1b37c00000,0,0,0,0,8,46.0889,46.0889,7.79e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1adc000000,0,0,0,0,7,46.0892,46.0892,4.6e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae0000000,0,0,0,0,6,46.0895,46.0895,4.27e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b02000000,0,0,0,0,5,46.09,46.09,6.68e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afa000000,0,0,0,0,4,46.0903,46.0903,5.11e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ad4000000,0,0,0,0,3,46.0906,46.0906,4.37e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ad8000000,0,0,0,0,2,46.0909,46.0909,4.42e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af4000000,0,0,0,0,1,46.0912,46.0912,4.57e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af2000000,0,0,0,0,0,46.0915,46.0915,4.78e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1b37c00000,0,720984,0,0,1,53.8111,53.8111,3.913e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b37e00000,0,5056,0,0,2,53.8116,53.8116,1.095e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3ae00000,0,101376,0,0,3,53.8118,53.8118,8.01e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b00000000,0,23246208,0,0,4,53.8125,53.8125,8.89e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b02000000,0,23246176,0,0,5,53.8147,53.8147,9.92e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af0000000,0,5811544,0,0,6,53.815,53.815,9.18e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b37e01400,0,5120,0,0,7,53.8153,53.8153,7.33e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3ae18c00,0,160064,0,0,8,53.8156,53.8156,8.06e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ae8000000,0,45950208,0,0,9,53.816,53.816,8.03e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b0182c200,0x7f1c30000d30,91903,0,0,10,53.8181,53.8181,2.865e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b0182c200,0x7f1c30000d30,0,0,0,9,53.8184,53.8184,9.72e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b0182c200,0x7f1c30000d30,51967,0,0,10,53.8185,53.8185,5.69e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b0182c200,0x7f1c30000d30,0,0,0,9,53.8187,53.8187,3.76e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b0182c200,0x7f1c30000d30,51967,0,0,10,53.8189,53.8189,4.22e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b0182c200,0x7f1c30000d30,0,0,0,9,53.8191,53.8191,3.12e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1afa000000,0,26015104,0,0,10,53.8191,53.8191,6.68e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af2000000,0,29266992,0,0,11,53.8221,53.8221,8.11e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af4000000,0,29266992,0,0,12,53.8224,53.8224,6.74e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1afec00000,0,14633496,0,0,13,53.8228,53.8228,7.94e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b3b000000,0,14633496,0,0,14,53.8231,53.8231,7.12e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af6000000,0,29266992,0,0,15,53.8235,53.8236,6.38e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af8000000,0,29266992,0,0,16,53.824,53.824,6.13e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b40c00000,0,14633496,0,0,17,53.8244,53.8244,5.89e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ae4000000,0,14633496,0,0,18,53.8248,53.8248,6.43e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1b00000000,0,0,0,0,17,53.8265,53.8265,8.54e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3ae00000,0,0,0,0,16,53.8269,53.8269,5.44e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b37e00000,0,0,0,0,15,53.8273,53.8273,5.32e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b02000000,0,0,0,0,14,53.8277,53.8277,4.92e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af0000000,0,0,0,0,13,53.8281,53.8281,5.66e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afa000000,0,0,0,0,12,53.8285,53.8285,4.76e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1ad4000000,0x7f1c30000d30,39172479,0,0,13,53.8304,53.8304,1.655e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1ad4000000,0x7f1c30000d30,0,0,0,12,53.8305,53.8305,3.88e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1b0182c200,0x7f1c30000d30,37375,0,0,13,53.8305,53.8305,3.77e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1b0182c200,0x7f1c30000d30,0,0,0,12,53.8358,53.8358,4.69e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1afc000000,0,45950152,0,0,13,53.8359,53.8359,8.02e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af0000000,0,11487538,0,0,14,53.8362,53.8362,7.54e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1af0af4a00,0x7f1c30000d50,180735,0,0,15,53.8398,53.8398,2.422e-06,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1af0af4a00,0x7f1c30000d50,0,0,0,14,53.8402,53.8403,4.46e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1af0af4a00,0x7f1c30000d50,69375,0,0,15,53.8404,53.8404,5.61e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1af0af4a00,0x7f1c30000d50,0,0,0,14,53.8406,53.8406,3.59e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1af0af4a00,0x7f1c30000d50,69375,0,0,15,53.8408,53.8408,3.63e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1af0af4a00,0x7f1c30000d50,0,0,0,14,53.841,53.841,3.84e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1ad8000000,0,34897200,0,0,15,53.841,53.841,5.2e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1adc000000,0,39343856,0,0,16,53.8443,53.8443,7.41e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ae0000000,0,39343856,0,0,17,53.8616,53.8616,8.61e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1af4000000,0,0,0,0,16,53.8862,53.8862,7.55e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af2000000,0,0,0,0,15,53.8867,53.8867,7.89e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1ae4df4c00,0,19671928,0,0,16,53.8872,53.8872,8.82e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1b00000000,0,19671928,0,0,17,53.8876,53.8876,8.65e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1afec00000,0,0,0,0,16,53.8881,53.8881,7.2e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3b000000,0,0,0,0,15,53.8886,53.8886,9.65e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1acc000000,0,39343856,0,0,16,53.8891,53.8894,0.000243787,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1ac8000000,0,39343856,0,0,17,53.8898,53.89,0.000176231,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1af8000000,0,0,0,0,16,53.8904,53.8904,7.5e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af6000000,0,0,0,0,15,53.8909,53.8909,6.86e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1b02000000,0,19671928,0,0,16,53.8914,53.8914,9.21e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Alloc,0,0x7f1afa000000,0,19671928,0,0,17,53.8919,53.8919,8.92e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:85
Free,0,0x7f1ae4000000,0,0,0,0,16,53.8924,53.8924,6.06e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b40c00000,0,0,0,0,15,53.8929,53.8929,7.06e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae8000000,0,0,0,0,14,53.8953,53.8953,7.59e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b3ae18c00,0,0,0,0,13,53.8957,53.8957,6.11e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b37e01400,0,0,0,0,12,53.8962,53.8962,6.05e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afc000000,0,0,0,0,11,53.8967,53.8967,5.19e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1af0000000,0,0,0,0,10,53.8973,53.8973,4.48e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ad8000000,0,0,0,0,9,53.8978,53.8978,5.42e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Alloc,0,0x7f1ac4000000,0x7f1c30000d50,52608383,0,0,10,53.9003,53.9005,0.000174657,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1ac4000000,0x7f1c30000d50,0,0,0,9,53.9006,53.9006,5.2e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Alloc,0,0x7f1af0af4a00,0x7f1c30000d50,49663,0,0,10,53.9006,53.9006,4.94e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:47
Free,0,0x7f1af0af4a00,0x7f1c30000d50,0,0,0,9,53.9079,53.9079,5.02e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/query/thrust_rmm_allocator.hpp:58
Free,0,0x7f1b37c00000,0,0,0,0,8,53.908,53.908,7.59e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae0000000,0,0,0,0,7,53.9085,53.9085,5.33e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1adc000000,0,0,0,0,6,53.9091,53.9091,3.69e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ae4df4c00,0,0,0,0,5,53.9097,53.9097,5.54e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b00000000,0,0,0,0,4,53.9102,53.9102,6.38e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1ac8000000,0,0,0,0,3,53.9108,53.9108,4.5e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1acc000000,0,0,0,0,2,53.9113,53.9113,3.83e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1afa000000,0,0,0,0,1,53.9119,53.9119,5.75e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102
Free,0,0x7f1b02000000,0,0,0,0,0,53.9124,53.9124,5.71e-07,/home/zewang/workplace/gocode/src/github.com/uber/aresdb/memutils/memory/rmm_alloc.cu:102

It seems that every time I run a query, memory usage keeps increasing, which suggests that some memory that should be freed is not.
It works well if we call cudaMalloc/cudaFree directly.
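One way to check a log like the one above for unmatched allocations is to replay it and track which addresses are still live at the end. The following is a minimal sketch, not part of RMM: `LogSummary`, `split_csv` and `summarize` are hypothetical names, and the column positions (Address third, Size fifth) are taken from the header row of the posted log. It assumes well-formed rows with no quoted fields.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Result of replaying an RMM-style CSV event log (hypothetical helper type).
struct LogSummary {
  std::map<std::string, std::size_t> live;  // address -> bytes not yet freed
  std::size_t peak_live_bytes = 0;          // high-water mark of requested bytes
};

// Split one CSV row on commas (assumes the log has no quoted fields).
std::vector<std::string> split_csv(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream row(line);
  std::string field;
  while (std::getline(row, field, ',')) fields.push_back(field);
  return fields;
}

// Replay Alloc/Free rows; any other row (header, blank line) is ignored.
LogSummary summarize(std::istream& log) {
  LogSummary summary;
  std::size_t live_bytes = 0;
  std::string line;
  while (std::getline(log, line)) {
    const std::vector<std::string> f = split_csv(line);
    if (f.size() < 5) continue;
    const std::string& address = f[2];
    if (f[0] == "Alloc") {
      const std::size_t bytes = std::stoull(f[4]);
      auto it = summary.live.find(address);
      if (it != summary.live.end()) live_bytes -= it->second;  // address reused
      summary.live[address] = bytes;
      live_bytes += bytes;
      if (live_bytes > summary.peak_live_bytes) summary.peak_live_bytes = live_bytes;
    } else if (f[0] == "Free") {
      // Free rows log size 0, so the size comes from the matching Alloc.
      auto it = summary.live.find(address);
      if (it != summary.live.end()) {
        live_bytes -= it->second;
        summary.live.erase(it);
      }
    }
  }
  return summary;
}
```

An empty `live` map after a query means every logged allocation was matched by a free. In the log posted above, the Current Allocs column does return to 0 at the end of each query, so growth in device memory use would have to come from somewhere other than unmatched rmmAlloc/rmmFree calls.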

@lucafuji
Contributor Author

lucafuji commented May 9, 2019

@harrism ^^

@lucafuji
Contributor Author

lucafuji commented May 9, 2019

Hi @harrism, I think the problem is within PooledAllocation.

When I initialize RMM with these options, no memory leak occurs:

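rmmOptions_t options = {
    CudaDefaultAllocation,  // allocation mode: plain cudaMalloc/cudaFree, no pool
    0,                      // second field: initial pool size
    true                    // enable logging
};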

@harrism
Member

harrism commented May 10, 2019

I believe this is a limitation of cnmem, the allocator underlying RMM. If cnmem exceeds its initial pool size, it grows through individual calls to cudaMalloc, and the resulting blocks cannot be merged into a larger pool. Each of these cudaMalloc calls is the size of the cnmemMalloc request that triggered it, not larger. Therefore, if the application exceeds the initial pool size and then makes a lot of small allocations, you end up with a VERY fragmented upper memory that can't be defragmented. If the application exceeds the initial pool size and then makes a few LARGE allocations, the upper memory is less fragmented and can be used effectively for suballocation. I think cuDF typically hits the latter case, but AresDB is hitting the former.
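The effect described above can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not cnmem's actual implementation: `ToyGrowingPool` is a hypothetical class with no initial pool, in which every growth step reserves exactly the requested size, blocks are never merged or split, and freed blocks stay reserved by the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: each growth step stands in for one cudaMalloc of exactly the
// requested size, and blocks from different growth steps are never merged.
class ToyGrowingPool {
 public:
  // Returns the id of the block that serves the request.
  std::size_t allocate(std::size_t bytes) {
    // First fit: reuse the first free block that is large enough.
    for (std::size_t id = 0; id < blocks_.size(); ++id) {
      if (!blocks_[id].in_use && blocks_[id].size >= bytes) {
        blocks_[id].in_use = true;
        return id;
      }
    }
    // No single free block fits, so the pool grows by exactly `bytes`.
    blocks_.push_back(Block{bytes, true});
    reserved_bytes_ += bytes;
    return blocks_.size() - 1;
  }

  // Marks the block free; it stays reserved by the pool.
  void deallocate(std::size_t id) {
    assert(id < blocks_.size() && blocks_[id].in_use);
    blocks_[id].in_use = false;
  }

  // Total bytes the pool has ever reserved from the device.
  std::size_t reserved_bytes() const { return reserved_bytes_; }

  // Sum of the sizes of all free blocks, whether or not they are contiguous.
  std::size_t free_bytes() const {
    std::size_t total = 0;
    for (const Block& block : blocks_) {
      if (!block.in_use) total += block.size;
    }
    return total;
  }

 private:
  struct Block {
    std::size_t size;
    bool in_use;
  };
  std::vector<Block> blocks_;
  std::size_t reserved_bytes_ = 0;
};
```

In this model, allocating and then freeing 1024 blocks of 1 KiB leaves 1 MiB free, yet a single 1 MiB request still forces the pool to grow to 2 MiB, because no individual free block is large enough.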

A workaround might be to initialize the pool with a large fraction of the total GPU memory, e.g. 75% or 95%. (Just replace the 0 in the second field of rmmOptions_t with that value.) This should also improve allocation performance when usage is high.
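As a rough sketch of how such a value could be computed, the helper below turns a fraction into a byte count. `initial_pool_bytes` and its 256-byte default alignment are hypothetical, and it assumes the second field of rmmOptions_t takes a size in bytes. The total would come from a query such as the CUDA runtime's cudaMemGetInfo, which is left out here so the snippet compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Returns `fraction` of `total_bytes`, rounded down to a multiple of
// `alignment`. Hypothetical helper; the alignment default is an assumption.
std::size_t initial_pool_bytes(std::size_t total_bytes, double fraction,
                               std::size_t alignment = 256) {
  assert(fraction > 0.0 && fraction <= 1.0);
  assert(alignment > 0);
  const auto raw =
      static_cast<std::size_t>(static_cast<double>(total_bytes) * fraction);
  return raw - raw % alignment;
}
```

For a 16 GiB device and a fraction of 0.75, this yields 12 GiB, which would replace the 0 in the options shown earlier when pooled allocation is enabled.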

Ultimately I would like to redesign the allocator to be smarter and to have a parameter that controls growth steps. Alternatively, it could fall back to cudaMalloc when the pool is exceeded, so that at least freeing the small allocations makes that memory usable again.

@harrism
Member

harrism commented May 10, 2019

I'll keep this issue open as a placeholder for pool growth redesign for now.

@harrism harrism added the feature request New feature or request label May 10, 2019
@nouiz

nouiz commented May 15, 2019 via email

@harrism
Member

harrism commented May 16, 2019

All of the above require modifying or replacing cnmem, so no matter what it's a big undertaking.
