
Enabling release memory (device memory deallocations) mode after each run from the Execution Plan #444

Merged
merged 6 commits into beehive-lab:develop from feat/mem/forcerelease
Jun 10, 2024

Conversation

@jjfumero jjfumero commented Jun 6, 2024

Description

TornadoVM fully manages device memory in a way similar to Java memory management. TornadoVM has a hard limit for the maximum amount of device memory to use, and the TornadoVM runtime can allocate as many buffers as needed within that region. Thus, the memory used expands until the maximum limit is reached.

In addition, TornadoVM maintains lists of free and used buffers. When an execution plan finishes, device buffers are marked as free but never actually released (e.g., via clReleaseMemObject in OpenCL); instead, they are declared free so that other task-graphs can reuse the already allocated areas. If compaction is needed, TornadoVM deallocates the buffers and allocates a new consecutive region.
This whole process is fully transparent to the programmer.

However, there might be cases in which programmers would like the TornadoVM runtime to free all resources after an execution plan has finished. This PR adds support for this feature.

If the flag -Dtornado.reuse.device.buffers=false is set, then TornadoVM allocates and deallocates device buffers every time an execution plan is launched. By default, the flag is set to true (to reuse buffers as much as possible).
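The reuse-vs-release behaviour described above can be modelled, in very simplified form, as a free-list buffer pool. The following sketch is plain Java and purely illustrative (class and method names are invented here, not the actual TornadoVM internals):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, illustrative model of TornadoVM-style device-buffer management.
// With reuseBuffers = true, buffers freed after a run stay on the device in a
// free list; with reuseBuffers = false (modelling
// -Dtornado.reuse.device.buffers=false), they are released immediately.
public class DeviceBufferPool {

    static class Buffer {
        final long size;
        Buffer(long size) { this.size = size; }
    }

    private final long maxDeviceMemory;   // hard limit for device memory
    private long allocated;               // bytes currently held on the "device"
    private final List<Buffer> freeList = new ArrayList<>();
    private final List<Buffer> usedList = new ArrayList<>();
    private final boolean reuseBuffers;

    public DeviceBufferPool(long maxDeviceMemory, boolean reuseBuffers) {
        this.maxDeviceMemory = maxDeviceMemory;
        this.reuseBuffers = reuseBuffers;
    }

    // Reuse a free buffer of the same size if possible; otherwise allocate.
    public Buffer allocate(long size) {
        Iterator<Buffer> it = freeList.iterator();
        while (it.hasNext()) {
            Buffer b = it.next();
            if (b.size == size) {
                it.remove();
                usedList.add(b);
                return b;
            }
        }
        if (allocated + size > maxDeviceMemory) {
            throw new IllegalStateException("Out of device memory");
        }
        allocated += size;                // models clCreateBuffer / cuMemAlloc
        Buffer b = new Buffer(size);
        usedList.add(b);
        return b;
    }

    // Called when an execution plan finishes.
    public void executionPlanFinished() {
        if (reuseBuffers) {
            freeList.addAll(usedList);    // mark free, keep on device
        } else {
            for (Buffer b : usedList) {   // models clReleaseMemObject / cuMemFree
                allocated -= b.size;
            }
        }
        usedList.clear();
    }

    public long bytesAllocated() { return allocated; }
}
```

In reuse mode, a second execution plan requesting the same buffer size hits the free list and triggers no new allocation; in release mode, every launch pays the full allocation/deallocation cost, which matches the clReleaseMemObject / zeMemFree / cuMemFree calls in the logs below.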

Behaviour

To trace all JNI calls, including allocations and deallocations, we need to enable the LOG_JNI macro (and LOG_PTX for the PTX backend):

diff --git a/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h b/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
index 94e46bf8d..9079d6c78 100644
--- a/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
+++ b/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
@@ -31,7 +31,7 @@
 #define PRINT_DATA_TIMES 0
 #define PRINT_DATA_SIZES 0
 
-#define LOG_JNI 0
+#define LOG_JNI 1
 
 #define LOG_OCL_AND_VALIDATE(name, result)                     \
     if (LOG_JNI == 1)  {                                       \
diff --git a/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h b/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
index 5e0dd6eec..d32fd804e 100644
--- a/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
+++ b/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
@@ -26,7 +26,7 @@
 #define TORNADO_PTX_LOG_H
 
 #include <cuda.h>
-#define LOG_PTX 0
+#define LOG_PTX 1
 
 #define LOG_PTX_AND_VALIDATE(name, result)                      \
     if (LOG_PTX == 1)  {  
$ tornado-test --printKernel --jvm="-Dtornado.reuse.device.buffers=false" -V uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd 

// OpenCL

[TornadoVM-OCL-JNI] Calling : clEnqueueNDRangeKernel -> Status: 0
[TornadoVM-OCL-JNI] Calling : clEnqueueReadBuffer -> Status: 0
[TornadoVM-OCL-JNI] Calling : clFlush -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clFlush -> Status: 0

Level Zero:

[TornadoVM-SPIRV-JNI]  Calling : zeCommandListAppendMemoryCopy-[INTEGER] -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListAppendBarrier -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListClose -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueExecuteCommandLists -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueSynchronize -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListReset -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListClose -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueExecuteCommandLists -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueSynchronize -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListReset -> Status: 0

PTX:

[TornadoVM-PTX-JNI] Calling : cuLaunchKernel -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventCreate (beforeEvent) -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventCreate (afterEvent) -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemcpyDtoHMemSeg -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuStreamSynchronize -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuStreamSynchronize -> Status: 0
Test: class uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd
	Running test: testVectorFloatAdd         ................  [PASS] 

Problem description

n/a.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

Any test with the flag -Dtornado.reuse.device.buffers=false:

$ tornado-test --printKernel --jvm="-Dtornado.reuse.device.buffers=false" -V uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd 

## all unit tests are also passing
make tests 

@jjfumero jjfumero added enhancement New feature or request runtime labels Jun 6, 2024
@jjfumero jjfumero self-assigned this Jun 6, 2024

@stratika stratika left a comment


Thanks, it looks good to me. The only comment that I have (as we discussed) is to rethink the strategy of releasing the allocated memory and data structures when the JVM finishes. This is not part of this PR.

Note: I also tried the multi-threaded execution plans with the flag, and it works:

tornado-test --jvm="-Dtornado.reuse.device.buffers=false -Dtornado.device.memory=2GB" -V --fast uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans

@jjfumero jjfumero merged commit a0d04e7 into beehive-lab:develop Jun 10, 2024
2 checks passed
@jjfumero jjfumero deleted the feat/mem/forcerelease branch June 10, 2024 17:11