[fix] Problem with FPGA execution for multiple tasks and the default scheduler #416
Merged: jjfumero merged 5 commits into beehive-lab:develop from stratika:fix/401/fpga-multiple-tasks on May 13, 2024
… to obtain the default local work group from the abstract class
I could reproduce the fix with my configuration. Thanks @stratika.
jjfumero approved these changes on May 13, 2024
mairooni approved these changes on May 13, 2024
LGTM
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request on May 28, 2024
Improvements
~~~~~~~~~~~~~~~~~~
- beehive-lab#402: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405: Disable Loop-Interchange for CPU offloading.
- beehive-lab#407: Debugging OpenCL Kernels builds improved.
- beehive-lab#410: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418: TornadoOptions and TornadoLogger improved.
- beehive-lab#423: MxM using ns instead of ms to report performance.
- beehive-lab#425: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429: Documentation of the installation process updated and improved.
- beehive-lab#432: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~
- beehive-lab#409: Guidelines to build the documentation.
- beehive-lab#411: Windows installer improved.
- beehive-lab#412: Python installer improved to check and download all Python dependencies before the main installer.
- beehive-lab#413: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
- beehive-lab#430: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~
- beehive-lab#400: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415: Check style errors fixed.
- beehive-lab#416: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417: Lazy-copy out fixed for Java fields.
- beehive-lab#420: Fix Mandelbrot example.
- beehive-lab#421: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426: Fix Logger for all backends.
- beehive-lab#428: Math cos/sin operations supported for vector types.
- beehive-lab#431: Jenkins files fixed.
Description
This PR provides a fix for the issue described in #401.
Note: This PR was tested in Intel emulation mode. I do not have access to a Xilinx FPGA to test it.
Problem description
There are two identified problems:
1. In the OCLCodeCache class, we have a method that checks whether force compilation has been triggered, and the Intel FPGA compilers are invoked only if that check is true. This seems to be an old check from the time we had the lookupbuffer kernel, when we waited until all LAUNCH bytecodes corresponding to all task indices (i.e., all tasks within a TaskGraph) had been issued before triggering the forceCompilation() method of the TornadoVM class. See here.
2. Using the executor.withDefaultScheduler() configuration in the ExecutionPlan seems to break the execution and results in an OpenCL error (CL_INVALID_WORK_GROUP_SIZE) when the clEnqueueNDRangeKernel function is invoked.

To fix the first problem, I removed the shouldCompile check that existed in OCLCodeCache. To my understanding, this is an old check, and it is no longer required since we deprecated the lookupbuffer kernel.

To fix the second problem, I performed a short refactoring of OCLKernelScheduler (an abstract class) and OCLFPGAScheduler, which extends it, so that the default local work group for FPGAs is determined when executor.withDefaultScheduler() is enabled in a TornadoExecutionPlan.

This change also prompted me to test the BlurFilter example with a WorkerGrid, so I applied a small update to OCLGridInfo to check the default FPGA local work group.
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
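The second fix moves the choice of the default local work group into the abstract scheduler, so that the FPGA subclass can supply its own value instead of a generic one. The sketch below illustrates that pattern only under stated assumptions: the class names KernelSchedulerSketch and FpgaSchedulerSketch, the method localWorkGroupFor, and the {64, 1, 1} FPGA work-group size are hypothetical and are not TornadoVM's actual classes or configuration. The divisibility check mirrors one of the conditions under which OpenCL 1.x reports CL_INVALID_WORK_GROUP_SIZE from clEnqueueNDRangeKernel; a local size that differs from a kernel's reqd_work_group_size attribute is another.

```java
import java.util.Arrays;

// Hypothetical sketch: these names are illustrative, not TornadoVM's real scheduler classes.
abstract class KernelSchedulerSketch {

    // Each backend-specific scheduler decides its own default local work group.
    protected abstract long[] defaultLocalWorkGroup(int dimensions);

    // Shared logic in the abstract class: ask the subclass for its default and reject
    // global sizes that OpenCL 1.x would refuse with CL_INVALID_WORK_GROUP_SIZE.
    public final long[] localWorkGroupFor(long[] globalWork) {
        long[] local = defaultLocalWorkGroup(globalWork.length);
        for (int i = 0; i < globalWork.length; i++) {
            if (globalWork[i] % local[i] != 0) {
                throw new IllegalArgumentException(
                        "Global size " + globalWork[i] + " in dimension " + i
                                + " is not a multiple of local size " + local[i]);
            }
        }
        return local;
    }
}

final class FpgaSchedulerSketch extends KernelSchedulerSketch {

    // Assumed value: it must match the work-group size the FPGA kernel was built for,
    // otherwise the kernel launch is rejected by the OpenCL runtime.
    private static final long[] FPGA_LOCAL_WORK_GROUP = {64, 1, 1};

    @Override
    protected long[] defaultLocalWorkGroup(int dimensions) {
        // OpenCL supports at most 3 dimensions, so copying a prefix is enough.
        return Arrays.copyOf(FPGA_LOCAL_WORK_GROUP, dimensions);
    }
}

public class DefaultLocalWorkGroupDemo {
    public static void main(String[] args) {
        KernelSchedulerSketch scheduler = new FpgaSchedulerSketch();
        long[] local = scheduler.localWorkGroupFor(new long[] {1024});
        System.out.println(Arrays.toString(local)); // prints [64]
    }
}
```

Keeping the validation in the abstract class means every backend shares one code path for the default scheduler, while only the value of the default differs per device type.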
How to test the new patch?
Then, you can run the following, as also described in issue #401:
rm -rf fpga-source-comp
tornado --debug --threadInfo --jvm="-Dblur.red.device=0:3 -Dblur.green.device=0:3 -Dblur.blue.device=0:3 -Dtornado.recover.bailout=False" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.BlurFilter
Output:
You can download, apply the patch and build TornadoVM:
and then run the same example:
rm -rf fpga-source-comp
tornado --debug --threadInfo --jvm="-Dblur.red.device=0:3 -Dblur.green.device=0:3 -Dblur.blue.device=0:3 -Dtornado.recover.bailout=False" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.BlurFilter
Output:
MultipleTasks example that runs two kernels on the FPGA.
You can download, apply the patch and build TornadoVM:
and then run the same example:
rm -rf fpga-source-comp
tornado --threadInfo --jvm="-Dexample.foo.device=0:3 -Dexample.bar.device=0:3" -m tornado.examples/uk.ac.manchester.tornado.examples.MultipleTasks
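The MultipleTasks run exercises the first fix, where each task in a task graph needs its own FPGA binary. The following is a minimal model, under stated assumptions, of why a compile step guarded by a force-compilation flag can leave tasks without a kernel. CodeCacheSketch, install, and setForceCompilation are hypothetical names that stand in for the removed shouldCompile logic in OCLCodeCache; the task names foo and bar are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a per-task FPGA code cache; not TornadoVM's OCLCodeCache.
final class CodeCacheSketch {
    private final boolean gateOnForceCompilation; // models the removed shouldCompile check
    private boolean forceCompilation;             // old flag, set after all LAUNCH bytecodes
    private final List<String> compiledTasks = new ArrayList<>();

    CodeCacheSketch(boolean gateOnForceCompilation) {
        this.gateOnForceCompilation = gateOnForceCompilation;
    }

    void setForceCompilation(boolean value) {
        forceCompilation = value;
    }

    // Returns true if the task was handed to the (simulated) FPGA compiler.
    boolean install(String taskName) {
        if (gateOnForceCompilation && !forceCompilation) {
            return false; // old path: compilation deferred, so no binary exists for this task
        }
        compiledTasks.add(taskName);
        return true;
    }

    List<String> compiledTasks() {
        return compiledTasks;
    }
}

public class CodeCacheGateDemo {
    public static void main(String[] args) {
        String[] tasks = {"foo", "bar"};
        CodeCacheSketch gated = new CodeCacheSketch(true);    // behaviour before the fix
        CodeCacheSketch ungated = new CodeCacheSketch(false); // behaviour after the fix
        for (String task : tasks) {
            gated.install(task);
            ungated.install(task);
        }
        System.out.println("gated:   " + gated.compiledTasks());   // prints gated:   []
        System.out.println("ungated: " + ungated.compiledTasks()); // prints ungated: [foo, bar]
    }
}
```

In this model, nothing ever sets the flag once the lookupbuffer-era trigger is gone, so the gated cache compiles no task at all; dropping the gate lets each task be compiled when it is installed.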