-
I was just wondering if there is a way to create arrays in shared memory. In the Windows 10 Task Manager > Performance tab, it shows that the Intel GPU has only shared memory, while the NVIDIA GPU has both dedicated and shared memory, and I can monitor the usage of each. When I run my code on the Intel GPU, the arrays are created in shared memory. When I run my code on the NVIDIA GPU, the arrays are created in dedicated GPU memory, which on some computers is actually smaller than the shared memory, so large arrays may throw a MEM_OBJECT_ALLOCATION_FAILURE error.

I have looked into SVM, but that appears to be something different (and a bit too complex), and I don't think it is a solution here. How can I specify which memory to use when creating/copying buffers, please? A simple example of elementwise multiplication of two 2D arrays would be a super bonus.
-
To be unhelpful, this question is technically out of scope of PyOpenCL, which merely exposes the abstraction offered by OpenCL; what physical memory is used to back allocations is up to the actual ICD ("driver"/"implementation"). So the right thing to do is to look at the documentation for the ICD. To be a bit more helpful, there are a few useful questions here.

One question is, "how does Windows report this memory usage?". I unfortunately can't help with that, since I know next to nothing about Windows. The other two I can try.

For (current-gen) Intel, this is easy: there is only DRAM on the mainboard. Some of it may be earmarked specifically for GPU use (and therefore reported as "Dedicated"), but in general there is only one type of physical memory. While the non-dedicated memory may have some overheads in access due to page tables and such, generally both should be equivalent bandwidth- and latency-wise.

For Nvidia, there is (typically) DRAM directly attached to the GPU, and that's what you get when you allocate memory. This is typically somewhat limited in size (a few gigs). Nvidia GPUs can also access host memory directly, but this has sufficiently high cost that it is not often an appealing option.

So overall, there is potentially little reason to want anything but the default mode of memory allocation. Could you clarify what you're trying to accomplish?
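For what it's worth, here is a sketch of the requested elementwise multiplication of two 2D arrays with PyOpenCL. Note the hedge: `ALLOC_HOST_PTR` is only a *hint* that the buffer should be backed by host-accessible memory; whether that ends up in what Task Manager calls "shared" memory is entirely up to the ICD, per the point above.

```python
import numpy as np

try:
    import pyopencl as cl
    HAVE_CL = True
except ImportError:
    HAVE_CL = False  # fall back to a NumPy-only reference run

# Two 2D float32 arrays and the NumPy reference result.
a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
expected = a * b

if HAVE_CL:
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # ALLOC_HOST_PTR *requests* host-accessible backing memory; the ICD
    # decides what physical memory actually backs the allocation.
    flags = mf.READ_ONLY | mf.COPY_HOST_PTR | mf.ALLOC_HOST_PTR
    a_buf = cl.Buffer(ctx, flags, hostbuf=a)
    b_buf = cl.Buffer(ctx, flags, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY | mf.ALLOC_HOST_PTR, a.nbytes)

    # 1D kernel over the flattened (contiguous) arrays.
    prg = cl.Program(ctx, """
    __kernel void mul(__global const float *a,
                      __global const float *b,
                      __global float *out)
    {
        int gid = get_global_id(0);
        out[gid] = a[gid] * b[gid];
    }
    """).build()

    prg.mul(queue, (a.size,), None, a_buf, b_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)
    assert np.allclose(result, expected)
```

Dropping `ALLOC_HOST_PTR` (keeping just `COPY_HOST_PTR`) gives the default behavior described above, which on Nvidia will normally land in the on-GPU DRAM and is usually what you want for performance.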