Memory Usage in Kernel
Kilik Kuo edited this page Mar 7, 2017
·
13 revisions
This page shares findings or problems we've encountered.
NOTE: On Windows, an out-of-resources error can occur easily on NVIDIA devices because TDR (the TdrLevel setting) is enabled; turn it off to fix the error.
- On a CPU device, local memory is regular RAM, the same as global memory.
- On a GPU device, local memory is a very fast, on-chip, controllable cache.
To find out the device's local memory size:
import pyopencl as cl
from pyopencl import device_info as di
# dev is the target cl.Device instance.
local_memory_size = dev.get_info(di.LOCAL_MEM_SIZE)
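As a rough sketch, the same query can be run over every device to compare local memory size and type (the device listings further down this page report GLOBAL for the CPU devices and LOCAL for the GPUs). The helper names `format_size` and `describe_local_memory` are hypothetical, and the import guard is only there so the snippet still loads on a machine without pyopencl.

```python
try:
    import pyopencl as cl
except ImportError:  # pyopencl not installed; only format_size() is usable
    cl = None


def format_size(num_bytes):
    """Format a byte count like the device listings on this page (e.g. 32.00KB)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return "%dB" % num_bytes
    return "%.2f%s" % (value, units[index])


def describe_local_memory():
    """Return one line per OpenCL device with its local memory size and type."""
    if cl is None:
        return []
    lines = []
    for platform in cl.get_platforms():
        for dev in platform.get_devices():
            size = dev.get_info(cl.device_info.LOCAL_MEM_SIZE)
            mem_type = dev.get_info(cl.device_info.LOCAL_MEM_TYPE)
            type_name = cl.device_local_mem_type.to_string(mem_type)
            lines.append("%s : %s (%s)" % (dev.name, format_size(size), type_name))
    return lines
```

Calling `describe_local_memory()` and printing each returned line gives a quick side-by-side view of the devices on the machine; it returns an empty list when pyopencl is not available.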
Memory caching implementation on Intel architecture
- Two ways
-
a) In-kernel allocation, e.g.
#define LM_SIZE 1024
__kernel void test_1(...) {
    __local int localArray[LM_SIZE];
}
__kernel void test_2(...) {
    __local int localArray[1024];
}
-
b) Host-side allocation, e.g. allocating 32 KB of local memory.
- Python
prog.test_input_local(queue, global_work_items, local_work_items, cl.LocalMemory(32*1024)).wait()
- Kernel
__kernel void test_input_local(local int* localArray) {}
NOTE: With host-side allocation, local memory usage cannot be determined at compile time, so an out-of-resources error may occur at run time.
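One way to lower the risk of a run-time out-of-resources error is to compare the requested size with the device limit before launching. The sketch below is only an illustration: `static_local_bytes` and `fits_in_local_memory` are hypothetical helper names, `int` is taken as 4 bytes (as OpenCL C defines it), and staying within the reported limit is a necessary condition rather than a guarantee, since a driver may reserve some local memory for itself.

```python
INT_SIZE = 4  # sizeof(int) in OpenCL C is 32 bits


def static_local_bytes(element_count, element_size=INT_SIZE):
    """Bytes used by an in-kernel array such as `__local int localArray[1024]`."""
    return element_count * element_size


def fits_in_local_memory(dynamic_bytes, static_bytes, device_limit):
    """True if host-side plus in-kernel local memory stays within the limit."""
    return dynamic_bytes + static_bytes <= device_limit


# Limits taken from the device listings on this page.
CPU_LIMIT = 32 * 1024  # Intel CPU devices report 32.00KB
NV_LIMIT = 48 * 1024   # GeForce GTX 950M reports 48.00KB

in_kernel = static_local_bytes(1024)  # localArray[LM_SIZE] with LM_SIZE = 1024
requested = 32 * 1024                 # same size as cl.LocalMemory(32*1024)

print(in_kernel)                                              # 4096
print(fits_in_local_memory(requested, 0, CPU_LIMIT))          # True
print(fits_in_local_memory(requested, in_kernel, CPU_LIMIT))  # False
print(fits_in_local_memory(requested, in_kernel, NV_LIMIT))   # True
```

With pyopencl, the limit can come from `dev.get_info(di.LOCAL_MEM_SIZE)` as shown earlier, and `kernel.get_work_group_info(cl.kernel_work_group_info.LOCAL_MEM_SIZE, dev)` reports the local memory a built kernel uses on a given device.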
-
TBD
- Used when a) register spilling happens, or b) a private array is used.
TBD
TBD
Device name : Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
Device type : CPU
Device version : OpenCL 2.1 (Build 18)
Device Profile : FULL_PROFILE
=====================================================================================
Global memory cache line size : 64B
Global memory cache size : 256.00KB
Global memory cache type : READ_WRITE_CACHE
Global memory size : 7.71GB
Max preferred size of global variables : 64.00KB
Local memory size : 32.00KB
Local memory type : GLOBAL
Max constant arguments count : 480
Max size of a constant buffer : 128.00KB
Max global variable size : 64.00KB
Max size of memory object allocation : 1.93GB
Max parameter size : 3.75KB
Max pipe objects : 16
max work group size : 8192
max work item dimensions : 3
max work item size : [8192, 8192, 8192]
base address align : 1024
Local memory size : 32.00KB
The max size of the device queue : 4.00GB
The size of the device queue : 4.00GB
Compute Units : 4
Device name : GeForce GTX 950M
Device type : GPU
Device version : OpenCL 1.2 CUDA
Device Profile : FULL_PROFILE
=====================================================================================
Global memory cache line size : 128B
Global memory cache size : 80.00KB
Global memory cache type : READ_WRITE_CACHE
Global memory size : 2.00GB
Max preferred size of global variables : Not available (version)
Local memory size : 48.00KB
Local memory type : LOCAL
Max constant arguments count : 9
Max size of a constant buffer : 64.00KB
Max global variable size : Not available (version)
Max size of memory object allocation : 512.00MB
Max parameter size : 4.25KB
Max pipe objects : Not available (version)
max work group size : 1024
max work item dimensions : 3
max work item size : [1024, 1024, 64]
base address align : 4096
Local memory size : 48.00KB
The max size of the device queue : Not available (version)
The size of the device queue : Not available (version)
Device command-queue properties : Not available (version)
Host command-queue properties : OUT_OF_ORDER_EXEC_MODE_ENABLE
: PROFILING_ENABLE
Device name : Intel(R) HD Graphics 530
Device type : GPU
Device version : OpenCL 2.0
Device Profile : FULL_PROFILE
=====================================================================================
Global memory cache line size : 64B
Global memory cache size : 512.00KB
Global memory cache type : READ_WRITE_CACHE
Global memory size : 3.15GB
Max preferred size of global variables : 2.00GB
Local memory size : 64.00KB
Local memory type : LOCAL
Max constant arguments count : 8
Max size of a constant buffer : 2.00GB
Max global variable size : 64.00KB
Max size of memory object allocation : 2.00GB
Max parameter size : 1.00KB
Max pipe objects : 1
max work group size : 256
max work item dimensions : 3
max work item size : [256, 256, 256]
base address align : 1024
Local memory size : 64.00KB
The max size of the device queue : 64.00MB
The size of the device queue : 128.00KB
Device command-queue properties : OUT_OF_ORDER_EXEC_MODE_ENABLE
: PROFILING_ENABLE
Host command-queue properties : OUT_OF_ORDER_EXEC_MODE_ENABLE
: PROFILING_ENABLE
Device name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Device type : CPU
Device version : OpenCL 2.0 (Build 359)
Device Profile : FULL_PROFILE
=====================================================================================
Global memory cache line size : 64B
Global memory cache size : 256.00KB
Global memory cache type : READ_WRITE_CACHE
Global memory size : 7.89GB
Max preferred size of global variables : 64.00KB
Local memory size : 32.00KB
Local memory type : GLOBAL
Max constant arguments count : 480
Max size of a constant buffer : 128.00KB
Max global variable size : 64.00KB
Max size of memory object allocation : 1.97GB
Max parameter size : 3.75KB
Max pipe objects : 16
max work group size : 8192
max work item dimensions : 3
max work item size : [8192, 8192, 8192]
base address align : 1024
Local memory size : 32.00KB
The max size of the device queue : 4.00GB
The size of the device queue : 4.00GB
Device command-queue properties : OUT_OF_ORDER_EXEC_MODE_ENABLE
: PROFILING_ENABLE
Host command-queue properties : OUT_OF_ORDER_EXEC_MODE_ENABLE
: PROFILING_ENABLE