
Enabling release memory (device memory deallocations) mode after each run from the Execution Plan #444

Merged
merged 6 commits into beehive-lab:develop from feat/mem/forcerelease
Jun 10, 2024

Conversation

@jjfumero jjfumero commented Jun 6, 2024

Description

TornadoVM fully manages device memory in a way similar to Java memory management. TornadoVM has a hard limit for the maximum amount of device memory to use, and the TornadoVM runtime can allocate as many buffers as needed within that region. Thus, the memory used expands until the maximum limit is reached.

In addition, TornadoVM maintains lists of free and used buffers. When an execution plan finishes, device buffers are marked as free but never actually released (e.g., via clReleaseMemObject in OpenCL); instead, they are declared free so that other task-graphs can reuse the already allocated areas. If compaction is needed, TornadoVM deallocates the buffers and allocates a new consecutive region.
This whole process is fully transparent to the programmer.

However, there might be cases in which programmers would like the TornadoVM runtime to free all resources after an execution plan has finished. This PR adds support for this feature.

If the flag -Dtornado.reuse.device.buffers=false is set, then TornadoVM allocates and deallocates device buffers every time an execution plan is launched. By default, the flag is set to true (to reuse buffers as much as possible).
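The reuse-vs-release behaviour described above can be modelled, in very simplified form, as a free-list buffer pool. The following sketch is plain Java and purely illustrative (class and method names are invented here, not the actual TornadoVM internals):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, illustrative model of TornadoVM-style device-buffer management.
// With reuseBuffers = true, buffers freed after a run stay on the device in a
// free list; with reuseBuffers = false (modelling
// -Dtornado.reuse.device.buffers=false), they are released immediately.
public class DeviceBufferPool {

    static class Buffer {
        final long size;
        Buffer(long size) { this.size = size; }
    }

    private final long maxDeviceMemory;   // hard limit for device memory
    private long allocated;               // bytes currently held on the "device"
    private final List<Buffer> freeList = new ArrayList<>();
    private final List<Buffer> usedList = new ArrayList<>();
    private final boolean reuseBuffers;

    public DeviceBufferPool(long maxDeviceMemory, boolean reuseBuffers) {
        this.maxDeviceMemory = maxDeviceMemory;
        this.reuseBuffers = reuseBuffers;
    }

    // Reuse a free buffer of the same size if possible; otherwise allocate.
    public Buffer allocate(long size) {
        Iterator<Buffer> it = freeList.iterator();
        while (it.hasNext()) {
            Buffer b = it.next();
            if (b.size == size) {
                it.remove();
                usedList.add(b);
                return b;
            }
        }
        if (allocated + size > maxDeviceMemory) {
            throw new IllegalStateException("Out of device memory");
        }
        allocated += size;                // models clCreateBuffer / cuMemAlloc
        Buffer b = new Buffer(size);
        usedList.add(b);
        return b;
    }

    // Called when an execution plan finishes.
    public void executionPlanFinished() {
        if (reuseBuffers) {
            freeList.addAll(usedList);    // mark free, keep on device
        } else {
            for (Buffer b : usedList) {   // models clReleaseMemObject / cuMemFree
                allocated -= b.size;
            }
        }
        usedList.clear();
    }

    public long bytesAllocated() { return allocated; }
}
```

In reuse mode, a second execution plan requesting the same buffer size hits the free list and triggers no new allocation; in release mode, every launch pays the full allocation/deallocation cost, which matches the clReleaseMemObject / zeMemFree / cuMemFree calls in the logs below.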

Behaviour

To trace all JNI calls, including allocations and deallocations, we need to enable the LOG_JNI macro (and LOG_PTX for the PTX backend):

diff --git a/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h b/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
index 94e46bf8d..9079d6c78 100644
--- a/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
+++ b/tornado-drivers/opencl-jni/src/main/cpp/source/ocl_log.h
@@ -31,7 +31,7 @@
 #define PRINT_DATA_TIMES 0
 #define PRINT_DATA_SIZES 0
 
-#define LOG_JNI 0
+#define LOG_JNI 1
 
 #define LOG_OCL_AND_VALIDATE(name, result)                     \
     if (LOG_JNI == 1)  {                                       \
diff --git a/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h b/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
index 5e0dd6eec..d32fd804e 100644
--- a/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
+++ b/tornado-drivers/ptx-jni/src/main/cpp/source/ptx_log.h
@@ -26,7 +26,7 @@
 #define TORNADO_PTX_LOG_H
 
 #include <cuda.h>
-#define LOG_PTX 0
+#define LOG_PTX 1
 
 #define LOG_PTX_AND_VALIDATE(name, result)                      \
     if (LOG_PTX == 1)  {  
$ tornado-test --printKernel --jvm="-Dtornado.reuse.device.buffers=false" -V uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd 

// OpenCL

[TornadoVM-OCL-JNI] Calling : clEnqueueNDRangeKernel -> Status: 0
[TornadoVM-OCL-JNI] Calling : clEnqueueReadBuffer -> Status: 0
[TornadoVM-OCL-JNI] Calling : clFlush -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clReleaseMemObject -> Status: 0
[TornadoVM-OCL-JNI] Calling : clFlush -> Status: 0

Level Zero:

[TornadoVM-SPIRV-JNI]  Calling : zeCommandListAppendMemoryCopy-[INTEGER] -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListAppendBarrier -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListClose -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueExecuteCommandLists -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueSynchronize -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListReset -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeMemFree -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListClose -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueExecuteCommandLists -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandQueueSynchronize -> Status: 0
[TornadoVM-SPIRV-JNI]  Calling : zeCommandListReset -> Status: 0

PTX:

[TornadoVM-PTX-JNI] Calling : cuLaunchKernel -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventCreate (beforeEvent) -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventCreate (afterEvent) -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemcpyDtoHMemSeg -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuEventRecord -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuCtxSetCurrent -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuMemFree -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuStreamSynchronize -> Status: 0
[TornadoVM-PTX-JNI] Calling : cuStreamSynchronize -> Status: 0
Test: class uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd
	Running test: testVectorFloatAdd         ................  [PASS] 

Problem description

n/a.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

Any test with the flag -Dtornado.reuse.device.buffers=false:

$ tornado-test --printKernel --jvm="-Dtornado.reuse.device.buffers=false" -V uk.ac.manchester.tornado.unittests.foundation.TestFloats#testVectorFloatAdd 

## all unit tests are also passing
make tests 

@jjfumero jjfumero added enhancement New feature or request runtime labels Jun 6, 2024
@jjfumero jjfumero self-assigned this Jun 6, 2024

@stratika stratika left a comment


Thanks, it looks good to me. The only comment that I have (as we discussed) is to rethink the strategy of releasing the allocated memory and data structures when the JVM finishes. This is not part of this PR.

Note: I also tried the multi-threaded execution plans with the flag, and it works:

tornado-test --jvm="-Dtornado.reuse.device.buffers=false -Dtornado.device.memory=2GB" -V --fast uk.ac.manchester.tornado.unittests.multithreaded.TestMultiThreadedExecutionPlans

@jjfumero jjfumero merged commit a0d04e7 into beehive-lab:develop Jun 10, 2024
2 checks passed
@jjfumero jjfumero deleted the feat/mem/forcerelease branch June 10, 2024 17:11