RMM Memory Leak after running for a while [QST] #75
Are you 100% sure that you don't have any other rogue allocations outside of RMM? Also, you say that you get a segmentation fault on the allocation; I would expect that if you were running out of memory, you'd get an OOM error code instead.
You can also enable logging when initializing RMM with this flag: https://github.com/rapidsai/rmm/blob/branch-0.7/include/rmm/rmm_api.h#L68. This will log every allocation/free, letting you plot your allocs/frees over time and see whether there are leaks in your calls.
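For what it's worth, turning that flag on looks roughly like this. A minimal sketch assuming the 0.7-era C API's `rmmOptions_t` struct and `rmmInitialize`; field, enum, and header names may differ in other versions:

```cpp
#include <cstdio>
#include <rmm/rmm_api.h>

int main() {
  rmmOptions_t options{};                     // zero-initialize all fields
  options.allocation_mode   = PoolAllocation; // pooled (cnmem) allocator
  options.initial_pool_size = 0;              // 0 = library default size
  options.enable_logging    = true;           // record every alloc/free
  if (rmmInitialize(&options) != RMM_SUCCESS) {
    std::fprintf(stderr, "rmmInitialize failed\n");
    return 1;
  }
  // ... run the workload, then inspect the log for unmatched allocations ...
  rmmFinalize();
  return 0;
}
```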
@lucafuji, any response to @jrhemstad's logging suggestion? I don't think there are sufficient details here for us to reproduce locally.
Yes, let me turn on logging and try.
Here is the log I got.
It seems that every time I run a query, memory usage keeps increasing; some memory that should be freed is apparently not being freed.
@harrism ^^
Hi @harrism, I think the problem is within PooledAllocation: when I initialize RMM without it (using the default allocation mode), no memory leak happens.
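If it helps others reproduce, the difference between the two runs presumably comes down to a single field at initialization. A sketch only, assuming the same 0.7-era options struct as above:

```cpp
#include <rmm/rmm_api.h>

// Sketch: the leak reproduces with PoolAllocation but not with plain
// CUDA allocation, so toggling this one field isolates the problem.
void initRmm(bool usePool) {
  rmmOptions_t options{};
  options.allocation_mode = usePool ? PoolAllocation         // cnmem pool
                                    : CudaDefaultAllocation; // raw cudaMalloc
  options.initial_pool_size = 0;  // only meaningful for PoolAllocation
  rmmInitialize(&options);
}
```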
I believe what is occurring is a limitation of cnmem, the allocator underlying RMM. If cnmem exceeds its initial pool size, it grows by individual calls to cudaMalloc that cannot be merged into a larger pool. These cudaMalloc calls are the size of the calls to cnmemMalloc, not larger. Therefore, if the application exceeds the initial pool size and then makes a lot of small allocations, you end up with a VERY fragmented upper memory that can't be defragmented. If the application exceeds the initial pool size and then makes a few LARGE allocations, the upper memory is not as fragmented, so it can be used effectively for suballocation. I think cuDF typically hits the latter case, but AresDB is hitting the former.

A workaround might be to initialize with a large fraction of the total GPU memory, e.g. 75% or 95%. (Just replace the 0 in the initial pool size; see the sketch below.)

Ultimately I would like to redesign the allocator to be smarter, with a parameter to control growth steps. Or it could fall back to cudaMalloc when the pool is exceeded, so that at least freeing the small allocations will make that memory usable again.
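A sketch of that workaround, sizing the initial pool from the free memory reported by `cudaMemGetInfo`; the options struct again assumes the 0.7-era API:

```cpp
#include <cuda_runtime.h>
#include <rmm/rmm_api.h>

// Start the pool at ~90% of currently free device memory, so cnmem
// rarely has to grow past the initial pool (sketch, not tested).
void initRmmBigPool() {
  size_t freeBytes = 0, totalBytes = 0;
  cudaMemGetInfo(&freeBytes, &totalBytes);

  rmmOptions_t options{};
  options.allocation_mode   = PoolAllocation;
  options.initial_pool_size = static_cast<size_t>(freeBytes * 0.9);
  rmmInitialize(&options);
}
```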
I'll keep this issue open as a placeholder for pool growth redesign for now. |
Here is a workaround that could be implemented, based on ideas in the PyTorch memory allocator. When the pool can't fill a request and cudaMalloc fails to extend the allocation, what about freeing all the cached allocations that aren't in use, and then calling cudaMalloc with the sum of all those freed allocations? This would be the equivalent of a slower coalescing of those smaller free blocks.

This will never be as good as @harrism's suggestion, but it could make that fix less frequently needed, at some extra cost.
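A rough sketch of that fallback; the cached-block bookkeeping here is hypothetical, not cnmem's actual internals:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for blocks the pool has cached (freed by the
// application but not yet returned to the driver).
struct CachedBlock { void* ptr; std::size_t size; };
static std::vector<CachedBlock> cachedBlocks;

void* allocWithCoalescingFallback(std::size_t size) {
  void* p = nullptr;
  if (cudaMalloc(&p, size) == cudaSuccess) return p;  // normal growth path

  // Growth failed: return every cached, unused block to the driver,
  // summing their sizes as we go ...
  std::size_t reclaimed = 0;
  for (const CachedBlock& b : cachedBlocks) {
    cudaFree(b.ptr);
    reclaimed += b.size;
  }
  cachedBlocks.clear();

  // ... then request one block covering the reclaimed space plus the
  // current allocation: in effect a slow, coarse coalescing of the small
  // free blocks. A real allocator would re-seed its pool from this block
  // and carve `size` bytes out of it.
  if (cudaMalloc(&p, reclaimed + size) != cudaSuccess) return nullptr;
  return p;
}
```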
All of the above require modifying or replacing cnmem, so no matter what, it's a big undertaking.
What is your question?
AresDB integrated RMM last week, and we have been running it in staging for a while.
We use pooled memory management and the default stream for memory allocation.
After about 30 minutes, it seems all the memory on one GPU card is exhausted, and a segmentation fault happens on the next memory allocation.
I don't think there are any memory leaks in our code, since it worked fine previously when we called cudaMalloc/cudaFree directly.
Here is the link to our code:
https://github.com/uber/aresdb/blob/master/memutils/memory/rmm_alloc.cu
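The pattern in that file is roughly the following. A hedged paraphrase, not the actual AresDB code; `RMM_ALLOC`/`RMM_FREE` are the 0.7-era convenience macros:

```cpp
#include <rmm/rmm.h>

// Sketch of the usage described above: pooled allocator already
// initialized, all allocations on the default stream (stream 0).
void* deviceAllocate(size_t bytes) {
  void* ptr = nullptr;
  RMM_ALLOC(&ptr, bytes, 0);  // 0 = default stream
  return ptr;
}

void deviceFree(void* ptr) {
  RMM_FREE(ptr, 0);
}
```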
Thank you so much!