
NVIDIA 2D Thread Scheduler Fixed #421

Merged

6 commits merged into beehive-lab:develop on May 14, 2024

Conversation

jjfumero (Member)

Description

This patch provides a new Thread Scheduler for NVIDIA GPUs.

Problem description

When using the latest NVIDIA drivers (e.g., 550.76), the thread block is set to 32x32 for 2D kernels. This block size appears to be illegal only with the latest NVIDIA drivers. This patch provides a custom NVIDIA scheduler to fix it. With this patch, performance increases by ~300 GFLOPs over the default scheduler on my RTX 3070 GPU for the canonical matrix multiplication.
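The gist of such a scheduler can be sketched as follows (hypothetical names and logic, not the actual patch): never submit a 32x32 = 1024-thread block to a device whose reported work-group limit is smaller, by shrinking the block until its volume fits the limit the OpenCL API reports (e.g., via CL_KERNEL_WORK_GROUP_SIZE).

// Hypothetical sketch, not the actual TornadoVM code.
final class Nvidia2DBlockSizer {

    static long[] blockSize2D(long maxWorkGroupSize, long globalX, long globalY) {
        long blockX = 32;
        long blockY = 32;
        // Halve along Y first, then X, until blockX * blockY fits the limit.
        while (blockX * blockY > maxWorkGroupSize && blockY > 1) {
            blockY /= 2;
        }
        while (blockX * blockY > maxWorkGroupSize && blockX > 1) {
            blockX /= 2;
        }
        // A block dimension must not exceed the global dimension itself.
        blockX = Math.min(blockX, globalX);
        blockY = Math.min(blockY, globalY);
        return new long[] { blockX, blockY };
    }
}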

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

make BACKEND=opencl
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 512
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 1024
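For reference, the example exercises a 2D kernel along these lines (a simplified sketch of the canonical TornadoVM matrix multiplication, not the exact MatrixMultiplication2D source): both outer loops carry the @Parallel annotation, so they map to the 2D thread schedule this patch fixes.

import uk.ac.manchester.tornado.api.annotations.Parallel;

public class MxM {
    // Both @Parallel loops map to a 2D NDRange on the GPU, so the block
    // size chosen by the scheduler applies directly to this kernel.
    public static void matrixMultiplication(float[] a, float[] b, float[] c, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            for (@Parallel int j = 0; j < size; j++) {
                float sum = 0.0f;
                for (int k = 0; k < size; k++) {
                    sum += a[i * size + k] * b[k * size + j];
                }
                c[i * size + j] = sum;
            }
        }
    }
}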

@jjfumero added the OpenCL, runtime, and fix (Provides a fix) labels on May 14, 2024
@jjfumero self-assigned this on May 14, 2024
@jjfumero (Member Author)

Related issue: #356

private long[] calculateEffectiveMaxWorkItemSizes(TaskMetaData metaData) {
    long[] intermediates = new long[] { 1, 1, 1 };
Collaborator
Any particular reason for this name of the array? Shouldn't it reflect the max work items?

Member Author

This is to calculate the local work group. We initialize it to one in every dimension and then fill it with the dimensions used.

Collaborator

Yes, but the name is intermediates? Can we make it more specific?

Member Author

Yes, I agree. I just refactored this method.
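For context, a hypothetical sketch of the shape being discussed (the TaskMetaData parameter simplified to plain arguments): the array starts at 1 in every dimension, and only the dimensions the kernel actually uses are filled with the device-reported limits.

// Hypothetical sketch, not the refactored TornadoVM method.
private static long[] calculateEffectiveMaxWorkItemSizes(long[] deviceMaxWorkItemSizes, int dims) {
    long[] effectiveMaxWorkItemSizes = new long[] { 1, 1, 1 };
    // Unused dimensions stay at 1; used ones take the device-reported limit.
    for (int i = 0; i < dims; i++) {
        effectiveMaxWorkItemSizes[i] = deviceMaxWorkItemSizes[i];
    }
    return effectiveMaxWorkItemSizes;
}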

…drivers/opencl/TornadoPlatformInterface.java

Co-authored-by: Thanos Stratikopoulos <34061419+stratika@users.noreply.github.com>
@stratika (Collaborator)

I guess for older driver versions, we will not see any difference, right? I tried it with 525.147.05.
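A hypothetical sketch of how such driver gating could look (the version format and the 550 threshold are assumptions; the follow-up beehive-lab#424, listed in the changelog below, later routed some older NVIDIA drivers to a generic scheduler):

// Hypothetical: choose the custom NVIDIA scheduler only for driver versions
// where 32x32 blocks are known to fail; the 550 threshold is an assumption.
static boolean useCustomNvidiaScheduler(String driverVersion) {
    try {
        int major = Integer.parseInt(driverVersion.split("\\.")[0]);
        return major >= 550;
    } catch (NumberFormatException e) {
        return false; // unparsable version: fall back to the generic scheduler
    }
}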

@stratika left a comment (Collaborator)

LGTM

@jjfumero merged commit 48468bf into beehive-lab:develop on May 14, 2024
2 checks passed
@jjfumero deleted the feat/nvidia/scheduler branch on May 14, 2024 15:32
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request May 28, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- beehive-lab#402 <beehive-lab#402>: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403 <beehive-lab#403>: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405 <beehive-lab#405>: Disable Loop-Interchange for CPU offloading.
- beehive-lab#407 <beehive-lab#407>: Debugging of OpenCL kernel builds improved.
- beehive-lab#410 <beehive-lab#410>: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418 <beehive-lab#418>: TornadoOptions and TornadoLogger improved.
- beehive-lab#423 <beehive-lab#423>: MxM using ns instead of ms to report performance.
- beehive-lab#425 <beehive-lab#425>: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429 <beehive-lab#429>: Documentation of the installation process updated and improved.
- beehive-lab#432 <beehive-lab#432>: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- beehive-lab#409 <beehive-lab#409>: Guidelines to build the documentation.
- beehive-lab#411 <beehive-lab#411>: Windows installer improved.
- beehive-lab#412 <beehive-lab#412>: Python installer improved to check and download all Python dependencies before running the main installer.
- beehive-lab#413 <beehive-lab#413>: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424 <beehive-lab#424>: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
- beehive-lab#430 <beehive-lab#430>: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- beehive-lab#400 <beehive-lab#400>: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414 <beehive-lab#414>: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415 <beehive-lab#415>: Check style errors fixed.
- beehive-lab#416 <beehive-lab#416>: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417 <beehive-lab#417>: Lazy-copy out fixed for Java fields.
- beehive-lab#420 <beehive-lab#420>: Fix Mandelbrot example.
- beehive-lab#421 <beehive-lab#421>: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422 <beehive-lab#422>: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426 <beehive-lab#426>: Fix Logger for all backends.
- beehive-lab#428 <beehive-lab#428>: Math cos/sin operations supported for vector types.
- beehive-lab#431 <beehive-lab#431>: Jenkins files fixed.