
Windows variant of Linux installer without MSys2 #356

Merged: 12 commits into beehive-lab:develop from winstall on Mar 26, 2024

Conversation

otabuzzman
Contributor

Description

The PR adds an installer script to simplify installation on Windows. The script is supposed to work similarly to the Linux one: it downloads and compiles all repos necessary to build TornadoVM. The script requires standard installations of the Windows tools (Visual Studio Community 2022, CMake, Maven, and Python) as well as GraalVM unpacked somewhere in the file system.

The script is stored in bin and named tornadovm-installer.cmd. It provides a help option (--help). Further information is in an additional section on Windows installation in the TornadoVM documentation (readthedocs).
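
A quick way to see the script's usage is the --help option mentioned above (any flags beyond --help are not listed here and would need to be checked in the script itself):

bin\tornadovm-installer.cmd --help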

The script downloads the forked beehive-lab repos of the SPIR-V Toolkit and the Level Zero JNI, and checks out the winstall branch of each. Repo URLs and branch names are hard-coded into the script; both need to be changed after merging, if you decide to do so.

Repo URLs and branch names have also been hard-coded into the bin/compile script used by the Linux installer, for testing purposes on Linux. The compile script therefore needs the same changes after merging.
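
For context, the hard-coded #winstall suffix amounts to cloning the fork and checking out that branch; a roughly equivalent manual command (illustrative only, not what the scripts literally run, and using the plain github.com URLs of the forks) would be:

git clone --branch winstall https://github.com/otabuzzman/levelzero-jni
git clone --branch winstall https://github.com/otabuzzman/beehive-spirv-toolkit.git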

Problem description

n/a.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

The unit tests provided with TornadoVM have been executed on Windows 11, Windows Server 2022 and Amazon Linux 2. Details are in this Google sheet. Some notes after a rough inspection:

  • The test method testBatchNotEven failed on every system for every backend, with the same expected/was values for each failure. It might thus be a fundamental problem.
  • The test methods testTornadoMathSinPIDouble and testTornadoMathCosPIDouble failed on every system for the PTX backend with compile errors. CosPI/SinPI might thus not be implemented at all for PTX.
  • The test method testCopyInWithDevice fails sometimes. This might be due to different timings and a too small delta value in assertEquals (see the sketch after this list).
  • The remaining failed test methods only affected Windows 11 and the SPIR-V backend. These need investigation.
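
Regarding the testCopyInWithDevice note above, this is a minimal sketch of a tolerance-based JUnit assertion; the class, values, and delta below are made up for illustration and are not taken from the actual test:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DeltaAssertionExample {

    @Test
    public void compareWithTolerance() {
        double expected = 1.0;
        double actual = 1.0000005; // e.g. a value copied back from the device

        // assertEquals(expected, actual, delta) passes if |expected - actual| <= delta.
        // If delta is chosen too small, rounding- or timing-dependent results can make
        // the test fail intermittently, as observed for testCopyInWithDevice.
        assertEquals(expected, actual, 1e-3);
    }
}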

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

On a Windows box (a consolidated command sequence follows the list):

  • Install Visual Studio Community 2022, CMake, Maven, GraalVM, and Python (using the respective Windows installer for each)
  • Run the Windows installer script bin\tornadovm-installer.cmd
  • Set up the environment with the setvars.cmd command
  • List devices with python %TORNADO_SDK%\bin\tornado --devices
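
A consolidated sketch of that sequence, assuming the commands are issued from the TornadoVM source directory (the tornado-test invocation is taken from the test runs shown later in this conversation):

bin\tornadovm-installer.cmd
setvars.cmd
python %TORNADO_SDK%\bin\tornado --devices
python %TORNADO_SDK%\bin\tornado-test -V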

@CLAassistant

CLAassistant commented Mar 19, 2024

CLA assistant check
All committers have signed the CLA.

@jjfumero
Member

jjfumero commented Mar 19, 2024

Thank you @otabuzzman. This is awesome! I was planning to do something like this soon, so this is very timely. Give me a few days to check on my Windows PC and try all the instructions step by step.

@otabuzzman
Contributor Author

otabuzzman commented Mar 20, 2024 via email

@jjfumero jjfumero self-assigned this Mar 20, 2024
@jjfumero
Member

I will start with the dependencies and then switch to this main repo.

@jjfumero jjfumero requested a review from stratika March 21, 2024 13:49
@jjfumero
Member

jjfumero commented Mar 21, 2024

I could make it work. However, depending on the backend, I get errors.

OpenCL:

python %TORNADO_SDK%\bin\tornado --threadInfo -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on NVIDIA GeForce RTX 3070 (Device 0).

        Single Threaded CPU Execution: 2.63 GFlops, Total time = 102 ms
        Streams Execution: 16.78 GFlops, Total time = 16 ms
        TornadoVM Execution on GPU (Accelerated): 268.44 GFlops, Total Time = 1 ms
        Speedup: 102.0x
        Verification false

But the same kernel, running with SPIR-V (Level Zero) and CUDA PTX works fine:

python %TORNADO_SDK%\bin\tornado --threadInfo -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3070 GPU
        Dims              : 2
        Thread dimensions : [512, 512]
        Blocks dimensions : [16, 16, 1]
        Grids dimensions  : [32, 32, 1]

        Single Threaded CPU Execution: 2.63 GFlops, Total time = 102 ms
        Streams Execution: 16.78 GFlops, Total time = 16 ms
        TornadoVM Execution on GPU (Accelerated): 268.44 GFlops, Total Time = 1 ms
        Speedup: 102.0x
        Verification true
python %TORNADO_SDK%\bin\tornado --threadInfo -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
        Dims              : 2
        Global work offset: [0, 0]
        Global work size  : [512, 512]
        Local  work size  : [512, 1, 1]
        Number of workgroups  : [1, 512]

        Single Threaded CPU Execution: 2.40 GFlops, Total time = 112 ms
        Streams Execution: 17.90 GFlops, Total time = 15 ms
        TornadoVM Execution on GPU (Accelerated): 22.37 GFlops, Total Time = 12 ms
        Speedup: 9.333333333333334x
        Verification true

It looks to me like a driver issue, but this test passes on Linux and OSx.

OpenCL devices:

python %TORNADO_SDK%\bin\tornado --devices
WARNING: Using incubator modules: jdk.incubator.vector
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 4
  Tornado device=0:0  (DEFAULT)
        OPENCL --  [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
                Global Memory Size: 8.0 GB
                Local Memory Size: 48.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [1024]
                Max WorkGroup Configuration: [1024, 1024, 64]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
        OPENCL --  [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
                Global Memory Size: 12.7 GB
                Local Memory Size: 64.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [512]
                Max WorkGroup Configuration: [512, 512, 512]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:2
        OPENCL --  [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
                Global Memory Size: 31.7 GB
                Local Memory Size: 32.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [8192]
                Max WorkGroup Configuration: [8192, 8192, 8192]
                Device OpenCL C version: OpenCL C 3.0

  Tornado device=0:3
        OPENCL --  [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
                Global Memory Size: 31.7 GB
                Local Memory Size: 256.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [67108864]
                Max WorkGroup Configuration: [67108864, 67108864, 67108864]
                Device OpenCL C version: OpenCL C 1.2


[TornadoVM-OCL-JNI] ERROR : clReleaseContext -> Returned: -34

The errors seem to be related to the FPGA, which we need to access in emulation mode.

@@ -144,7 +144,7 @@ def build_levelzero_jni_lib(rebuild=False):
         [
             "git",
             "clone",
-            "https://github.com/beehive-lab/levelzero-jni",
+            "https://github.com/otabuzzman/levelzero-jni#winstall",
Member

Just to keep a note: we should merge the dependencies first and then update this URL to the official repos.

Collaborator

Once we merge the develop branch of levelzero-jni to master, we can revert this link.

@@ -184,7 +184,7 @@ def build_spirv_toolkit_and_level_zero(rebuild=False):
         [
             "git",
             "clone",
-            "https://github.com/beehive-lab/beehive-spirv-toolkit.git",
+            "https://github.com/otabuzzman/beehive-spirv-toolkit.git#winstall",
Member

Same here

Resolved review threads: bin/tornadovm-installer.cmd; docs/source/CHANGELOG.rst (three outdated threads)
@jjfumero
Member

jjfumero commented Mar 21, 2024

Strange: with OpenCL and my setup, nothing works. It looks to me like a problem with my configuration:

python %TORNADO_SDK%\bin\tornado --devices
WARNING: Using incubator modules: jdk.incubator.vector
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 4
  Tornado device=0:0  (DEFAULT)
        OPENCL --  [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
                Global Memory Size: 8.0 GB
                Local Memory Size: 48.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [1024]
                Max WorkGroup Configuration: [1024, 1024, 64]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
        OPENCL --  [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
                Global Memory Size: 12.7 GB
                Local Memory Size: 64.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [512]
                Max WorkGroup Configuration: [512, 512, 512]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:2
        OPENCL --  [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
                Global Memory Size: 31.7 GB
                Local Memory Size: 32.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [8192]
                Max WorkGroup Configuration: [8192, 8192, 8192]
                Device OpenCL C version: OpenCL C 3.0

  Tornado device=0:3
        OPENCL --  [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
                Global Memory Size: 31.7 GB
                Local Memory Size: 256.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [67108864]
                Max WorkGroup Configuration: [67108864, 67108864, 67108864]
                Device OpenCL C version: OpenCL C 1.2


[TornadoVM-OCL-JNI] ERROR : clReleaseContext -> Returned: -34

C:\Users\jjfum\source\repos\TornadoVM>python %TORNADO_SDK%\bin\tornado-test
python C:/Users/jjfum/source/repos/TornadoVM/bin/sdk/bin/tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=False "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.foundation.TestIntegers"
WARNING: Using incubator modules: jdk.incubator.vector

[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

@otabuzzman
Contributor Author

Strange behavior, indeed. What oneAPI components are installed in your setup? In mine there is only the Intel® CPU Runtime for OpenCL™ Applications with SYCL support. To make it work, the steps given on the webpage in the section Known Issues and Limitations had to be applied.

What is that FPGA emulator? Can you switch it off?

@jjfumero
Member

In my case I installed the oneAPI Base Toolkit, which includes the FPGA emulation and other tools. I have also installed the Intel ARC GPU drivers, since from time to time I swap my 3070 for the ARC 750 for experiments, and this might be causing the problem.
The thing is:

  • Using the Msys64 tool on Windows, OpenCL runs fine
  • Native Windows (using VS Tools) runs fine with PTX and SPIR-V. PTX runs on the same NVIDIA 3070 that "fails" with OpenCL.

I will dig in to investigate the problem, but it is good to know it works for you. I will also work with Thanos to try to reproduce this on a different machine.

@jjfumero
Member

jjfumero commented Mar 22, 2024

Update:

  • I updated the NVIDIA driver and removed an old installation of the oneAPI Toolkit (I had two, 2022 and 2024.0.1), and unit tests are now passing with the OpenCL backend on the RTX 3070. Not all of them, though, and the Matrix Multiplication benchmark still fails.
> python %TORNADO_SDK%\bin\tornado --devices

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 4
  Tornado device=0:0  (DEFAULT)
        OPENCL --  [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
                Global Memory Size: 8.0 GB
                Local Memory Size: 48.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [1024]
                Max WorkGroup Configuration: [1024, 1024, 64]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
        OPENCL --  [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
                Global Memory Size: 12.7 GB
                Local Memory Size: 64.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [512]
                Max WorkGroup Configuration: [512, 512, 512]
                Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:2
        OPENCL --  [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
                Global Memory Size: 31.7 GB
                Local Memory Size: 32.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [8192]
                Max WorkGroup Configuration: [8192, 8192, 8192]
                Device OpenCL C version: OpenCL C 3.0

  Tornado device=0:3
        OPENCL --  [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
                Global Memory Size: 31.7 GB
                Local Memory Size: 256.0 KB
                Workgroup Dimensions: 3
                Total Number of Block Threads: [67108864]
                Max WorkGroup Configuration: [67108864, 67108864, 67108864]
                Device OpenCL C version: OpenCL C 1.2


> python %TORNADO_SDK%\bin\tornado-test -V

Test: class uk.ac.manchester.tornado.unittests.foundation.TestIntegers
        Running test: test01                     ................  [PASS]
        Running test: test03                     ................  [PASS]
        Running test: test04                     ................  [PASS]
        Running test: test05                     ................  [PASS]
        Running test: test06                     ................  [PASS]
        Running test: test07                     ................  [PASS]
        Running test: test02                     ................  [PASS]

Test: class uk.ac.manchester.tornado.unittests.foundation.TestFloats
        Running test: testFloatsCopy             ................  [PASS]
        Running test: testVectorFloatMul         ................  [PASS]
        Running test: testVectorFloatDiv         ................  [PASS]
        Running test: testVectorFloatAdd         ................  [PASS]
        Running test: testVectorFloatSub         ................  [PASS]

Test: class uk.ac.manchester.tornado.unittests.foundation.TestDoubles
        Running test: testDoublesMul             ................  [PASS]
        Running test: testDoublesCopy            ................  [PASS]
        Running test: testDoublesAdd             ................  [PASS]
        Running test: testDoublesDiv             ................  [PASS]
        Running test: testDoublesSub             ................  [PASS]

        ...
Test: class uk.ac.manchester.tornado.unittests.compute.ComputeTests
        Running test: testNBodyBigNoWorker       ................  [PASS]
        Running test: testBlackScholes           ................  [PASS]
        Running test: testHilbert                ................  [PASS]
        Running test: testNBodySmall             ................  [PASS]
        Running test: testDFTVectorTypes         ................  [PASS]
        Running test: matrixVector               ................  [PASS]
        Running test: testDFTFloat               ................  [PASS]
        Running test: testRenderTrack            ................  [PASS]
        Running test: testDFTDouble              ................  [PASS]
        Running test: testMandelbrot             ................  [FAILED]
                \_[REASON] expected:<8> but was:<9>
        Running test: testMontecarlo             ................  [PASS]
        Running test: matrixVectorFloat4         ................  [PASS]
        Running test: testJuliaSets              ................  [FAILED]
                \_[REASON] expected:<-1000.0> but was:<1.5197569>
        Running test: testNBody                  ................  [PASS]
        Running test: testEuler                  ................  [PASS]
        ...

==================================================
              Unit tests report
==================================================

{'[PASS]': 579, '[FAILED]': 16, '[UNSUPPORTED]': 22}
Coverage [PASS/(PASS+FAIL)]: 97.31%
Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 93.84%

==================================================
....

@jjfumero
Member

Based on the previous test, I am leaning more towards a misconfiguration of OpenCL on my Windows 11 machine.

@jjfumero
Member

> I used the cmd.exe tool. Later I realized that using Python would have been better since it is necessary to run and test TornadoVM interactively anyway. I now think that customizing the original installer should be possible with little effort and am considering giving it a try.

Ok. My only concern is that, as it is, it kind of branches away from the style we have for Linux and OSx. To simplify the process of merging and review, my suggestion is that, for this iteration of the code, we move on with this CMD tool, and you can open a second PR with the Python migration if you want. Is this something you would like to try?

@jjfumero
Member

More updates regarding NVIDIA OpenCL support on Windows 11:

  • I uninstalled oneAPI just to see if that was the problem. Same failure for the Matrix Multiply.
  • I installed the ARC drivers -> same behaviour.
  • On closer inspection, I noticed that the Matrix Multiply in OpenCL on the RTX 3070 via the NVIDIA driver is correct for small matrices (less than 32x32), which makes me think it is related to the block size. The same GPU works in WSL under Linux Ubuntu; the only difference is the driver. The same local workgroup is selected for the PTX CUDA backend in TornadoVM, and it works. So this suggests to me it is a matter of drivers.
  • I updated my NVIDIA driver from "stable" to "gaming", and I noticed the same behaviour.

I am running out of ideas, but at least we know it is not due to the installation of oneAPI + ARC Drivers.

@jjfumero
Member

Ok, I think I got it.

So the error is printed by the driver and captured in our JNI code that dispatches OpenCL kernels:

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on NVIDIA GeForce RTX 3070 (Device 0).

This mainly suggests an issue with the block size. Since I noticed that smaller block sizes are executed correctly with OpenCL, I modified the Matrix Multiplication example in TornadoVM as follows:

TaskGraph taskGraph = new TaskGraph("s0") //
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, matrixA, matrixB) //
                .task("t0", MatrixMultiplication2D::matrixMultiplication, matrixA, matrixB, matrixC, size) //
                .transferToHost(DataTransferMode.EVERY_EXECUTION, matrixC);

        ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
        TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);

        WorkerGrid workerGrid = new WorkerGrid2D(matrixA.getNumRows(), matrixA.getNumColumns());
        GridScheduler gridScheduler = new GridScheduler("s0.t0", workerGrid);
        workerGrid.setLocalWork(16, 16, 1);

        executor.withGridScheduler(gridScheduler).withWarmUp();

Diff:

diff --git a/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java b/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
index 0426e2dbb..a28ed57c6 100644
--- a/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
+++ b/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
@@ -20,9 +20,7 @@ package uk.ac.manchester.tornado.examples.compute;
 import java.util.Random;
 import java.util.stream.IntStream;

-import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
-import uk.ac.manchester.tornado.api.TaskGraph;
-import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
+import uk.ac.manchester.tornado.api.*;
 import uk.ac.manchester.tornado.api.annotations.Parallel;
 import uk.ac.manchester.tornado.api.enums.DataTransferMode;
 import uk.ac.manchester.tornado.api.enums.TornadoDeviceType;
@@ -97,7 +95,12 @@ public class MatrixMultiplication2D {

         ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
         TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);
-        executor.withWarmUp();
+
+        WorkerGrid workerGrid = new WorkerGrid2D(matrixA.getNumRows(), matrixA.getNumColumns());
+        GridScheduler gridScheduler = new GridScheduler("s0.t0", workerGrid);
+        workerGrid.setLocalWork(16, 16, 1);
+
+        executor.withGridScheduler(gridScheduler).withWarmUp();

         // 1. Warm up Tornado
         for (int i = 0; i < WARMING_UP_ITERATIONS; i++) {

So I forced execution in blocks of 16x16 instead of the default value of 32x32, and the execution is correct.

Task info: s0.t0
        Backend           : OPENCL
        Device            : NVIDIA GeForce RTX 3070 CL_DEVICE_TYPE_GPU (available)
        Dims              : 2
        Global work offset: [0, 0, 0]
        Global work size  : [512, 512, 1]
        Local  work size  : [16, 16, 1]
        Number of workgroups  : [32, 32, 1]

        Single Threaded CPU Execution: 2.58 GFlops, Total time = 104 ms
        Streams Execution: 15.79 GFlops, Total time = 17 ms
        TornadoVM Execution on GPU (Accelerated): 268.44 GFlops, Total Time = 1 ms
        Speedup: 104.0x
        Verification true

Takeaways:

  • This issue (results not correct in OpenCL for MxM) does not have anything to do with the new installation for Windows, so we can move on with the PR.
  • This is a new issue (new for us at least) regarding the block sizes. The TornadoVM Runtime selects the block size using the NVIDIA Guidelines for OpenCL: https://www.nvidia.com/content/cudazone/download/opencl/nvidia_opencl_programmingguide.pdf
    So TornadoVM does not reinvent the wheel, and that block size should be valid because TornadoVM queries the device properties first. We will investigate this for Windows in a separate issue.
  • We can have ARC drivers + oneAPI drivers in combination with NVIDIA for Windows 11.

@otabuzzman
Contributor Author

otabuzzman commented Mar 22, 2024 via email


Resolved review thread: tornado-assembly/src/bin/test-native.cmd
@jjfumero jjfumero added the enhancement New feature or request label Mar 26, 2024
@stratika (Collaborator) left a comment

LGTM. I think we can iterate to simplify the installation part of the documentation. Very good work on the native installation for Windows.

I tested it on Windows 11.

@jjfumero
Member

I will merge this. Awesome work, @otabuzzman. Thank you!

@jjfumero jjfumero merged commit 6bb982e into beehive-lab:develop Mar 26, 2024
2 checks passed
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request Mar 27, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- `beehive-lab#344 <https://github.com/beehive-lab/TornadoVM/pull/344>`_: Support for Multi-threaded Execution Plans.
- `beehive-lab#347 <https://github.com/beehive-lab/TornadoVM/pull/347>`_: Enhanced examples.
- `beehive-lab#350 <https://github.com/beehive-lab/TornadoVM/pull/350>`_: Obtain internal memory segment for the Tornado Native Arrays without the object header.
- `beehive-lab#357 <https://github.com/beehive-lab/TornadoVM/pull/357>`_: API extensions to query and apply filters to backends and devices from the ``TornadoExecutionPlan``.
- `beehive-lab#359 <https://github.com/beehive-lab/TornadoVM/pull/359>`_: Support Factory Methods for FFI-based array collections to be used/composed in TornadoVM Task-Graphs.

Compatibility
~~~~~~~~~~~~~~~~~~

- `beehive-lab#351 <https://github.com/beehive-lab/TornadoVM/pull/351>`_: Compatibility of TornadoVM Native Arrays with the Java Vector API.
- `beehive-lab#352 <https://github.com/beehive-lab/TornadoVM/pull/352>`_: Refactor memory limit to take into account primitive types and wrappers.
- `beehive-lab#354 <https://github.com/beehive-lab/TornadoVM/pull/354>`_: Add DFT-sample benchmark in FP32.
- `beehive-lab#356 <https://github.com/beehive-lab/TornadoVM/pull/356>`_: Initial support for Windows 11 using Visual Studio Development tools.
- `beehive-lab#361 <https://github.com/beehive-lab/TornadoVM/pull/361>`_: Compatibility with the SPIR-V toolkit v0.0.4.
- `beehive-lab#366 <https://github.com/beehive-lab/TornadoVM/pull/363>`_: Level Zero JNI Dependency updated to 0.1.3.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `beehive-lab#346 <https://github.com/beehive-lab/TornadoVM/pull/346>`_: Computation of local-work group sizes for the Level Zero/SPIR-V backend fixed.
- `beehive-lab#360 <https://github.com/beehive-lab/TornadoVM/pull/358>`_: Fix native tests to check the JIT compiler for each backend.
- `beehive-lab#355 <https://github.com/beehive-lab/TornadoVM/pull/355>`_: Fix custom exceptions when a driver/device is not found.
@jjfumero jjfumero mentioned this pull request Mar 27, 2024
@otabuzzman otabuzzman deleted the winstall branch March 27, 2024 10:28
@jjfumero jjfumero mentioned this pull request May 14, 2024