I was training a model that requires many `single_train_step`s on my GPU (NVIDIA GeForce RTX 2060 Mobile) when I noticed unusual RAM (not video RAM) usage. To test, I changed the number of epochs in https://lux.csail.mit.edu/stable/tutorials/beginner/2_PolynomialFitting from 250 to 2,500,000; the RAM usage of the single process (as reported by the KDE System Monitor) keeps growing over time, reaching 4.6 GB. The issue does not occur if I disable LuxCUDA and run on the CPU. I think there is a memory leak.
I can reproduce this, but I don't think it is a memory leak. It is probably just Julia not freeing memory that it doesn't need to. I tried adding a `GC.gc(true)` at the end of the run, and it was able to free all the memory, which (I think) wouldn't have been the case if it were a memory leak.
That said, 4.6 GB seems extremely high. I ran the job with very limited available memory (~2 GB), and the memory usage saturated at a certain point. Can you try adding a `GC.gc(true)` at the end of every epoch and see whether the memory usage still grows?
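For concreteness, the suggestion above could look roughly like this — a minimal sketch assuming a training loop in the style of the linked Polynomial Fitting tutorial (`model`, `ps`, `st`, and the data `(x, y)` are assumed to be set up as in that tutorial; only the `GC.gc(true)` call is new):

```julia
using Lux, Optimisers

# Training loop in the style of the Polynomial Fitting tutorial,
# with a forced full GC pass after every epoch.
tstate = Lux.Training.TrainState(model, ps, st, Adam(0.03f0))
for epoch in 1:2_500_000
    _, loss, _, tstate = Lux.Training.single_train_step!(
        AutoZygote(), MSELoss(), (x, y), tstate)
    # Force a full collection so memory that is no longer reachable
    # is released eagerly instead of accumulating between epochs.
    GC.gc(true)
end
```

Calling `GC.gc(true)` every epoch is expensive; it is only meant as a diagnostic here — if memory still grows with it in place, that points at a genuine leak rather than lazy collection.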
I tested again. I was using JupyterLab, and `GC.gc(true)` did not work for me. I ran `GC.gc(true)` every 50,000 epochs and once at the end; memory usage was still 4.5 GB at the end. I don't know whether Jupyter or something else matters.