testing for portblas portfft by rscohn2 · Pull Request #495 · uxlfoundation/oneMath

rscohn2 · 2024-05-21T20:11:42Z

Description

Add unit tests for portFFT and portBLAS to CI.

.github/workflows/pr.yml

hjabird

With Romain's comments, LGTM.

Rbiessy · 2024-05-22T09:51:34Z

We'll look into ways to reduce the time it takes to run the tests. We may need to cache the SYCL kernels across the tests.

rscohn2 · 2024-05-22T10:58:51Z

> We'll look into ways to reduce the time it takes to run the tests. We may need to cache the SYCL kernels across the tests.

The build times are comparable to MKL backend. MKL tests run in < 5 minutes. CI failed with a timeout because both portblas and portfft tests take > 4 hours to run. Are you saying that driver compile time is the cause? I can add a regex to select a subset of the tests to run, with a target of running for 5 minutes. What do you recommend?

Rbiessy · 2024-05-22T13:31:13Z

> The build times are comparable to MKL backend. MKL tests run in < 5 minutes. CI failed with a timeout because both portblas and portfft tests take > 4 hours to run. Are you saying that driver compile time is the cause? I can add a regex to select a subset of the tests to run, with a target of running for 5 minutes. What do you recommend?

Yes, JIT compilation may be the issue, but we still need to confirm that. In the current situation we would have to disable too many tests, so I suggest waiting until we know more. Running all the tests in under 5 minutes seems very optimistic for portFFT or portBLAS, though.

rscohn2 · 2024-05-22T16:39:23Z

I changed it so we build all the unit tests for portfft/portblas but only run a handful of tests. Everything passes and port* finishes before MKL. It is not what we want for the long term, but we can enable more as we get faster machines and hopefully find a way to reduce the JIT time. Are you OK with committing what is there now?
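As a rough illustration of "build everything, run a handful", a workflow job could look like the sketch below. This is not the content of `pr.yml` from this PR: the job name, the `build` directory, and the test-name pattern `Gemm|Axpy` are hypothetical placeholders, and compiler setup and backend-specific CMake options are omitted.

```yaml
# Minimal sketch, assuming a SYCL compiler is already set up on the runner.
jobs:
  portblas-tests:                # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 60          # job-level guard against multi-hour test runs
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        # Backend selection options omitted; see the project's build docs.
        run: cmake -S . -B build
      - name: Build all unit tests
        run: cmake --build build --parallel
      - name: Run a small subset of tests
        # -R keeps only tests whose names match the regular expression.
        # 'Gemm|Axpy' is a placeholder pattern, not taken from the PR.
        run: ctest --test-dir build -R 'Gemm|Axpy' --output-on-failure
```

Widening the `-R` pattern later is a one-line change, which fits the stated plan of enabling more tests as faster runners become available.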

dnhsieh-intel · 2024-05-22T17:01:29Z

An indirectly related question: I noticed that previous workflows were run on my forked repo. Do you think a condition like this can avoid the runs on forked repos?

rscohn2 · 2024-05-22T18:21:19Z

> Do you think a condition like this can avoid the runs on forked repos?

That will work, but you have to put the condition in every job of every workflow. I don't think there is a file-level way to do it. An alternative is to disable GitHub Actions for your fork: go to Settings > Actions > General and look near the top.
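The exact condition referred to above is not preserved in this thread. A common way to write a per-job guard of this kind is a job-level `if:` on the `github.repository` context, sketched below. The repository name is taken from this PR's title and the job name is hypothetical.

```yaml
# Minimal sketch of a per-job guard. It must be repeated in every job.
jobs:
  unit-tests:                    # hypothetical job name
    # Run only when the workflow executes in the upstream repository.
    # Adjust the owner/name to match the actual upstream repo.
    if: github.repository == 'uxlfoundation/oneMath'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```

Because the guard lives on each job rather than on the workflow file, a job added later without it would still run on forks.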

dnhsieh-intel · 2024-05-22T19:00:24Z

I see. I hope GitHub can provide a more maintainable way for this. At the same time, we may need to communicate this with contributors, but this is out of scope of this PR.

Rbiessy · 2024-05-23T11:20:46Z

@rscohn2 our local testing confirms that AOT (ahead-of-time) compilation of the kernels speeds up testing a lot. It is the best option when we know the tests will run on a specific device.
For the portBLAS backend, this can be done by setting the CMake flag -DPORTBLAS_TUNING_TARGET=INTEL_CPU, which sets the SYCL target to spir64_x86_64. With this we were able to compile in 32 minutes and run all the tests in 8 minutes on an i9-12900K. I think it would take a bit longer on the current GitHub runner, though.

For the portFFT backend you might as well directly set the target to spir64_x86_64 and enable more tests if that's ok.

We'll try to improve the documentation for this. We need to find a meaningful default behavior while still letting the user tune for more specific use-cases.
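The portBLAS suggestion above might translate into a configure step like the following sketch. Only `-DPORTBLAS_TUNING_TARGET=INTEL_CPU` comes from this thread. `-DENABLE_PORTBLAS_BACKEND=ON` is an assumption about the project's CMake options at the time of this PR, and the job name and `build` directory are hypothetical.

```yaml
# Minimal sketch, assuming a SYCL compiler is already set up on the runner.
jobs:
  portblas-aot:                  # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure portBLAS with AOT compilation for Intel CPUs
        # PORTBLAS_TUNING_TARGET=INTEL_CPU is the flag named in the thread;
        # per the thread it sets the SYCL target to spir64_x86_64.
        # ENABLE_PORTBLAS_BACKEND is assumed; other options are omitted.
        run: |
          cmake -S . -B build \
            -DENABLE_PORTBLAS_BACKEND=ON \
            -DPORTBLAS_TUNING_TARGET=INTEL_CPU
      - name: Build
        run: cmake --build build --parallel
```

The trade-off is a longer build in exchange for a much shorter test run, since kernels are compiled once ahead of time instead of at each test's first launch.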

rscohn2 · 2024-05-23T11:42:36Z

The GitHub runner has 2 cores/4 threads, but I think we will have more cores available soon. For the 8-minute run, were the tests run in parallel or serially? @dnhsieh-intel prefers serial tests because the logs are easier to read.

Rbiessy · 2024-05-23T13:18:25Z

We just ran ctest so the tests should have been run serially when we measured 8 minutes.
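For reference, `ctest` without `-j` runs tests one at a time, and `-j N` runs up to N tests concurrently. The sketch below shows both forms as workflow steps. The step names and the `build` directory are hypothetical, and it assumes earlier steps in the same job have already built the tests.

```yaml
# Minimal sketch of serial versus parallel test execution.
jobs:
  run-tests:                     # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - name: Run tests serially (ctest default, logs stay in order)
        run: ctest --test-dir build --output-on-failure
      # Alternative: run up to 4 tests at once, matching the runner's
      # 4 hardware threads. Output from different tests may interleave.
      # - name: Run tests in parallel
      #   run: ctest --test-dir build -j 4 --output-on-failure
```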

rscohn2 · 2024-05-23T16:56:54Z

portFFT tests are not passing with AOT:

https://github.com/oneapi-src/oneMKL/actions/runs/9208889271/job/25332103601#:~:text=48-,unknown%20file%3A%20Failure,%5B%20%20FAILED%20%20%5D,-ComputeTestSuite/ComputeTests_in_place_COMPLEX.COMPLEX_SINGLE_in_place_buffer

Rbiessy · 2024-05-24T08:16:47Z

Right, that actually makes sense: portFFT uses spec constants (SYCL specialization constants). Let's revert the target to spir64 for portFFT then.

rscohn2 · 2024-05-26T11:38:58Z

With AOT, portBLAS takes 1.5 hours to compile and 5 minutes to run. The runners only have 2 cores, but we are getting runners with more cores, so a parallel build will be faster. The second longest is MKL BLAS, which takes 1 hour, most of it spent compiling.
portFFT is back to not using AOT and runs only a couple of tests. Everything passes and runs in an acceptable amount of time.

Rbiessy

Just a minor comment, looks good to me otherwise, thanks

.github/workflows/pr.yml

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

.github/workflows/pr.yml

…get (#511)

* The portFFT backends targets Nvidia by default
* In some situations, this causes a failure whilst compiling (CUDA might not be installed)
* This checks the target is supported before use

This solves the issue in PR #495

rscohn2 added 2 commits May 21, 2024 15:02

testing for portblas portfft

1da2708

update

907b236

rscohn2 requested review from Rbiessy, dnhsieh-intel and hjabird May 21, 2024 20:54

rscohn2 marked this pull request as ready for review May 21, 2024 20:55

Rbiessy reviewed May 22, 2024

hjabird reviewed May 22, 2024

rscohn2 added 2 commits May 22, 2024 08:43

respond to review comments

eda138e

update

b243696

rscohn2 added 2 commits May 23, 2024 08:26

AOT compile for port*

5c67db8

portfft tests do not run

e51c68a

rscohn2 added 2 commits May 24, 2024 07:23

portfft is spir64

85ed214

use ubuntu-latest runners

d1a46b0

Rbiessy approved these changes May 27, 2024

remove commented out code

634ade9

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

dnhsieh-intel reviewed May 29, 2024

dnhsieh-intel approved these changes May 29, 2024

review comment suggestions

69ac4b1

rscohn2 merged commit 6d6a7b7 into uxlfoundation:develop May 30, 2024

hjabird mentioned this pull request Jun 12, 2024

[DFT][CMake] Check CUDA support with portFFT before using it as a target #511

Merged

normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024

testing for portblas portfft (uxlfoundation#495)

cdf7ab2

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
