
NVIDIA 2D Thread Scheduler Fixed #421

Merged

6 commits merged into beehive-lab:develop on May 14, 2024

Conversation

jjfumero (Member)

Description

This patch provides a new Thread Scheduler for NVIDIA GPUs.

Problem description

When using the latest NVIDIA drivers (e.g., 550.76), the thread block is set to 32x32 for 2D kernels. This block size appears to be illegal only with the latest NVIDIA drivers. This patch provides a custom NVIDIA scheduler to fix it. With this patch, performance increases by ~300 GFLOPs over the default scheduler on my RTX 3070 GPU for the canonical matrix multiplication.
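The gist of such a scheduler can be sketched as follows (hypothetical names and logic, not the actual patch): never submit a 32x32 = 1024-thread block to a device whose reported work-group limit is smaller, by shrinking the block until its volume fits the limit the OpenCL API reports (e.g., via CL_KERNEL_WORK_GROUP_SIZE).

// Hypothetical sketch, not the actual TornadoVM code.
final class Nvidia2DBlockSizer {

    static long[] blockSize2D(long maxWorkGroupSize, long globalX, long globalY) {
        long blockX = 32;
        long blockY = 32;
        // Halve along Y first, then X, until blockX * blockY fits the limit.
        while (blockX * blockY > maxWorkGroupSize && blockY > 1) {
            blockY /= 2;
        }
        while (blockX * blockY > maxWorkGroupSize && blockX > 1) {
            blockX /= 2;
        }
        // A block dimension must not exceed the global dimension itself.
        blockX = Math.min(blockX, globalX);
        blockY = Math.min(blockY, globalY);
        return new long[] { blockX, blockY };
    }
}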

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

make BACKEND=opencl
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 512
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 1024
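For reference, the example exercises a 2D kernel along these lines (a simplified sketch of the canonical TornadoVM matrix multiplication, not the exact MatrixMultiplication2D source): both outer loops carry the @Parallel annotation, so they map to the 2D thread schedule this patch fixes.

import uk.ac.manchester.tornado.api.annotations.Parallel;

public class MxM {
    // Both @Parallel loops map to a 2D NDRange on the GPU, so the block
    // size chosen by the scheduler applies directly to this kernel.
    public static void matrixMultiplication(float[] a, float[] b, float[] c, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            for (@Parallel int j = 0; j < size; j++) {
                float sum = 0.0f;
                for (int k = 0; k < size; k++) {
                    sum += a[i * size + k] * b[k * size + j];
                }
                c[i * size + j] = sum;
            }
        }
    }
}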

@jjfumero added the OpenCL, runtime, and fix (Provides a fix) labels on May 14, 2024
@jjfumero self-assigned this on May 14, 2024
@jjfumero (Member Author)

Related issue: #356

private long[] calculateEffectiveMaxWorkItemSizes(TaskMetaData metaData) {
    long[] intermediates = new long[] { 1, 1, 1 };
Collaborator
Any particular reason for this name of the array? Shouldn't it reflect the max work items?

Member Author

This is to calculate the local work group. We initialize it to one in every dimension and then fill it with the dimensions used.

Collaborator

Yes, but the name is intermediates? Can we make it more specific?

Member Author

Yes, I agree. I just refactored this method.
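For context, a hypothetical sketch of the shape being discussed (the TaskMetaData parameter simplified to plain arguments): the array starts at 1 in every dimension, and only the dimensions the kernel actually uses are filled with the device-reported limits.

// Hypothetical sketch, not the refactored TornadoVM method.
private static long[] calculateEffectiveMaxWorkItemSizes(long[] deviceMaxWorkItemSizes, int dims) {
    long[] effectiveMaxWorkItemSizes = new long[] { 1, 1, 1 };
    // Unused dimensions stay at 1; used ones take the device-reported limit.
    for (int i = 0; i < dims; i++) {
        effectiveMaxWorkItemSizes[i] = deviceMaxWorkItemSizes[i];
    }
    return effectiveMaxWorkItemSizes;
}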

…drivers/opencl/TornadoPlatformInterface.java

Co-authored-by: Thanos Stratikopoulos <34061419+stratika@users.noreply.github.com>
@stratika (Collaborator)

I guess for older driver versions, we will not see any difference, right? I tried it with 525.147.05.
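A hypothetical sketch of how such driver gating could look (the version format and the 550 threshold are assumptions; the follow-up beehive-lab#424, listed in the changelog below, later routed some older NVIDIA drivers to a generic scheduler):

// Hypothetical: choose the custom NVIDIA scheduler only for driver versions
// where 32x32 blocks are known to fail; the 550 threshold is an assumption.
static boolean useCustomNvidiaScheduler(String driverVersion) {
    try {
        int major = Integer.parseInt(driverVersion.split("\\.")[0]);
        return major >= 550;
    } catch (NumberFormatException e) {
        return false; // unparsable version: fall back to the generic scheduler
    }
}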

@stratika left a comment (Collaborator)

LGTM

@jjfumero merged commit 48468bf into beehive-lab:develop on May 14, 2024
2 checks passed
@jjfumero deleted the feat/nvidia/scheduler branch on May 14, 2024 15:32
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request May 28, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- beehive-lab#402 <beehive-lab#402>: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403 <beehive-lab#403>: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405 <beehive-lab#405>: Disable Loop-Interchange for CPU offloading.
- beehive-lab#407 <beehive-lab#407>: Debugging of OpenCL kernel builds improved.
- beehive-lab#410 <beehive-lab#410>: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418 <beehive-lab#418>: TornadoOptions and TornadoLogger improved.
- beehive-lab#423 <beehive-lab#423>: MxM using ns instead of ms to report performance.
- beehive-lab#425 <beehive-lab#425>: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429 <beehive-lab#429>: Documentation of the installation process updated and improved.
- beehive-lab#432 <beehive-lab#432>: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- beehive-lab#409 <beehive-lab#409>: Guidelines to build the documentation.
- beehive-lab#411 <beehive-lab#411>: Windows installer improved.
- beehive-lab#412 <beehive-lab#412>: Python installer improved to check and download all Python dependencies before running the main installer.
- beehive-lab#413 <beehive-lab#413>: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424 <beehive-lab#424>: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
- beehive-lab#430 <beehive-lab#430>: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- beehive-lab#400 <beehive-lab#400>: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414 <beehive-lab#414>: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415 <beehive-lab#415>: Check style errors fixed.
- beehive-lab#416 <beehive-lab#416>: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417 <beehive-lab#417>: Lazy-copy out fixed for Java fields.
- beehive-lab#420 <beehive-lab#420>: Fix Mandelbrot example.
- beehive-lab#421 <beehive-lab#421>: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422 <beehive-lab#422>: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426 <beehive-lab#426>: Fix Logger for all backends.
- beehive-lab#428 <beehive-lab#428>: Math cos/sin operations supported for vector types.
- beehive-lab#431 <beehive-lab#431>: Jenkins files fixed.