
Memory Usage in Kernel

Kilik Kuo edited this page Mar 5, 2017 · 13 revisions

This page collects findings and problems we've encountered with memory usage in OpenCL kernels.

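Local memory:

  • On a CPU device, local memory is regular RAM, the same as global memory (the CPU listings below report its type as GLOBAL).
  • On a GPU device, local memory is very fast on-chip memory that the kernel controls explicitly, like a programmable cache (the GPU listings below report its type as LOCAL).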

Memory caching implementation on Intel architecture

TBD

When is local memory used?

TBD

Example

TBD


Private memory:

TBD

When is private memory used?

TBD

Example

TBD


Tested Device Information 1

   Device name                             : Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
   Device type                             :                                      CPU
   Device version                          :                    OpenCL 2.1 (Build 18)
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 256.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   7.71GB
   Max preferred size of global variables  :                                  64.00KB
   Local memory size                       :                                  32.00KB
   Local memory type                       :                                   GLOBAL
   Max constant arguments count            :                                      480
   Max size of a constant buffer           :                                 128.00KB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   1.93GB
   Max parameter size                      :                                   3.75KB
   Max pipe objects                        :                                       16
   max work group size                     :                                     8192
   max work item dimensions                :                                        3
   max work item size                      :                       [8192, 8192, 8192]
   base address align                      :                                     1024
   Local memory size                       :                                  32.00KB
   The max size of the device queue        :                                   4.00GB
   The size of the device queue            :                                   4.00GB

Tested Device Information 2

   Device name                             :                         GeForce GTX 950M
   Device type                             :                                      GPU
   Device version                          :                          OpenCL 1.2 CUDA
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                     128B
   Global memory cache size                :                                  80.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   2.00GB
   Max preferred size of global variables  :                  Not available (version)
   Local memory size                       :                                  48.00KB
   Local memory type                       :                                    LOCAL
   Max constant arguments count            :                                        9
   Max size of a constant buffer           :                                  64.00KB
   Max global variable size                :                  Not available (version)
   Max size of memory object allocation    :                                 512.00MB
   Max parameter size                      :                                   4.25KB
   Max pipe objects                        :                  Not available (version)
   max work group size                     :                                     1024
   max work item dimensions                :                                        3
   max work item size                      :                         [1024, 1024, 64]
   base address align                      :                                     4096
   Local memory size                       :                                  48.00KB
   The max size of the device queue        :                  Not available (version)
   The size of the device queue            :                  Not available (version)
   Device command-queue properties         :                  Not available (version)
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE

Tested Device Information 3

   Device name                             :                 Intel(R) HD Graphics 530
   Device type                             :                                      GPU
   Device version                          :                              OpenCL 2.0
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 512.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   3.15GB
   Max preferred size of global variables  :                                   2.00GB
   Local memory size                       :                                  64.00KB
   Local memory type                       :                                    LOCAL
   Max constant arguments count            :                                        8
   Max size of a constant buffer           :                                   2.00GB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   2.00GB
   Max parameter size                      :                                   1.00KB
   Max pipe objects                        :                                       1
   max work group size                     :                                      256
   max work item dimensions                :                                        3
   max work item size                      :                          [256, 256, 256]
   base address align                      :                                     1024
   Local memory size                       :                                  64.00KB
   The max size of the device queue        :                                  64.00MB
   The size of the device queue            :                                 128.00KB
   Device command-queue properties         :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE

Tested Device Information 4

   Device name                             : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
   Device type                             :                                      CPU
   Device version                          :                   OpenCL 2.0 (Build 359)
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 256.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   7.89GB
   Max preferred size of global variables  :                                  64.00KB
   Local memory size                       :                                  32.00KB
   Local memory type                       :                                   GLOBAL
   Max constant arguments count            :                                      480
   Max size of a constant buffer           :                                 128.00KB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   1.97GB
   Max parameter size                      :                                   3.75KB
   Max pipe objects                        :                                       16
   max work group size                     :                                     8192
   max work item dimensions                :                                        3
   max work item size                      :                       [8192, 8192, 8192]
   base address align                      :                                     1024
   Local memory size                       :                                  32.00KB
   The max size of the device queue        :                                   4.00GB
   The size of the device queue            :                                   4.00GB
   Device command-queue properties         :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
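
One way to read the four listings together is to ask how much local memory each work-item gets when a work-group runs at the maximum reported size. The sketch below is plain C arithmetic over figures copied from the listings above; it does not call OpenCL. The `device_info` struct and the `local_bytes_per_item` helper are hypothetical names introduced here for illustration, and the even split across work-items is an assumption, since a real kernel may share one local buffer among all items in the group.

```c
#include <stddef.h>

/* Hypothetical record holding values copied from the listings above.
 * This is not an OpenCL type. */
typedef struct {
    const char *name;
    const char *local_mem_type;   /* "GLOBAL" or "LOCAL", as reported */
    size_t local_mem_bytes;       /* "Local memory size" row */
    size_t max_work_group_size;   /* "max work group size" row */
} device_info;

static const device_info devices[] = {
    {"Intel(R) Core(TM) i7-4578U CPU",  "GLOBAL", 32 * 1024, 8192},
    {"GeForce GTX 950M",                "LOCAL",  48 * 1024, 1024},
    {"Intel(R) HD Graphics 530",        "LOCAL",  64 * 1024,  256},
    {"Intel(R) Core(TM) i7-6700HQ CPU", "GLOBAL", 32 * 1024, 8192},
};

/* Bytes of local memory per work-item if a work-group of `group_size`
 * items divides the device's local memory evenly (an assumption).
 * Returns 0 when the group size is zero or exceeds the device limit. */
size_t local_bytes_per_item(const device_info *d, size_t group_size)
{
    if (group_size == 0 || group_size > d->max_work_group_size) {
        return 0;
    }
    return d->local_mem_bytes / group_size;
}
```

Calling `local_bytes_per_item(&devices[i], devices[i].max_work_group_size)` gives 4 bytes per item on both CPUs, 48 bytes on the GTX 950M, and 256 bytes on the HD Graphics 530. On the CPUs the small figure matters less, since local memory there is ordinary RAM anyway.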