XE_HP* related unit tests fail with all compute-runtime versions #498

Closed
anbe42 opened this issue Jan 26, 2022 · 15 comments

@anbe42

anbe42 commented Jan 26, 2022

Hi,

while trying to update the Debian packages for the Intel OpenCL stack, I've run into two failing tests:

./opencl/test/unit_test/kernel/cache_flush_xehp_and_later_tests.inl:345: Failure
Expected equality of these values:
  rangesExpected
    Which is: { 8-byte object <01-E0 5D-F0 AD-55 00-00>, 8-byte object <01-00 5E-F0 AD-55 00-00>, 8-byte object <00-20 5E-F0 AD-55 00-00> }
  validateL3ControlPolicy.l3RangesParsed
    Which is: { 8-byte object <01-E0 5D-F0 FF-FF FF-3F>, 8-byte object <01-00 5E-F0 FF-FF FF-3F>, 8-byte object <00-20 5E-F0 FF-FF FF-3F> }
[  FAILED  ][ XE_HP_SDV ][ 6297 ] GivenCacheFlushAfterWalkerAndTimestampPacketsEnabledWhenAllocationRequiresCacheFlushThenFlushCommandPresentAfterWalkerXEHP.I

./opencl/test/unit_test/kernel/cache_flush_xehp_and_later_tests.inl:288: Failure
Expected equality of these values:
  rangesExpected
    Which is: { 8-byte object <00-10 62-F0 AD-55 00-00>, 8-byte object <01-20 62-F0 AD-55 00-00>, 8-byte object <01-40 62-F0 AD-55 00-00> }
  validateL3ControlPolicy.l3RangesParsed
    Which is: { 8-byte object <00-10 62-F0 FF-FF FF-3F>, 8-byte object <01-20 62-F0 FF-FF FF-3F>, 8-byte object <01-40 62-F0 FF-FF FF-3F> }
[  FAILED  ][ XE_HP_SDV ][ 6297 ] GivenCacheFlushAfterWalkerEnabledWhenAllocationRequiresCacheFlushThenFlushCommandPresentAfterWalkerXEHP.I

This happened while building intel-compute-runtime 22.03.22192 using igc 1.0.9933 and llvm-12. There may well be some unsupported combination of build dependency versions ...
I'd be thankful for any pointers on where to start looking.
(I can successfully build 21.32.20609 using igc 1.0.8744 (or 1.0.9933 as well) and llvm-12 in the same setup. I haven't tried any versions in between.)

@anbe42
Author

anbe42 commented Jan 27, 2022

I've managed to bisect the issue to commit 947297d (Compile XeHpSdv by default) in 21.44.21506, so the problem has been there since XeHpSdv got enabled ...
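
For anyone repeating such a bisect: the failure is intermittent (see the numbers just below), so a single passing run does not prove that a commit is good. Here is a minimal sketch of a helper for `git bisect run`, assuming the failing test can be started as a single command. The script name `flaky_bisect.py`, the retry count of 20, and the function names are illustrative and not taken from compute-runtime.

```python
import subprocess
import sys


def fails_at_least_once(cmd: list, attempts: int) -> bool:
    """Run cmd up to `attempts` times; return True as soon as one run fails."""
    for _ in range(attempts):
        if subprocess.run(cmd).returncode != 0:
            return True
    return False


def bisect_exit_code(cmd: list, attempts: int = 20) -> int:
    """Map the outcome to what `git bisect run` expects.

    git treats exit code 0 as "good" and 1..127 (except 125) as "bad".
    """
    return 1 if fails_at_least_once(cmd, attempts) else 0


# Hypothetical usage: git bisect run python3 flaky_bisect.py <test command...>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(bisect_exit_code(sys.argv[1:]))
```

Rebuilding the tree at each bisect step is left out here. With the roughly 43% failure rate reported in this comment, and assuming the runs are independent, 20 clean runs in a row on a bad commit would occur with probability 0.57^20, which is well under 0.01%.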

I've also noticed that it does not always happen (only 43 out of 100 attempts in a longer series) and that the mismatching 8-byte objects have different values in each run:

log.81.2:    Which is: { 8-byte object <01-E0 C4-DD EF-55 00-00>, 8-byte object <01-00 C5-DD EF-55 00-00>, 8-byte object <00-20 C5-DD EF-55 00-00> }
log.81.2:    Which is: { 8-byte object <01-E0 C4-DD FF-FF FF-3F>, 8-byte object <01-00 C5-DD FF-FF FF-3F>, 8-byte object <00-20 C5-DD FF-FF FF-3F> }
log.81.2:    Which is: { 8-byte object <00-50 C5-DD EF-55 00-00>, 8-byte object <01-60 C5-DD EF-55 00-00>, 8-byte object <01-80 C5-DD EF-55 00-00> }
log.81.2:    Which is: { 8-byte object <00-50 C5-DD FF-FF FF-3F>, 8-byte object <01-60 C5-DD FF-FF FF-3F>, 8-byte object <01-80 C5-DD FF-FF FF-3F> }
log.84.2:    Which is: { 8-byte object <02-C0 E8-9B F8-55 00-00>, 8-byte object <00-00 E9-9B F8-55 00-00> }
log.84.2:    Which is: { 8-byte object <02-C0 E8-9B FF-FF FF-3F>, 8-byte object <00-00 E9-9B FF-FF FF-3F> }
log.84.2:    Which is: { 8-byte object <02-80 E9-9B F8-55 00-00>, 8-byte object <00-C0 E9-9B F8-55 00-00> }
log.84.2:    Which is: { 8-byte object <02-80 E9-9B FF-FF FF-3F>, 8-byte object <00-C0 E9-9B FF-FF FF-3F> }
log.86.2:    Which is: { 8-byte object <01-A0 0B-F0 50-56 00-00>, 8-byte object <01-C0 0B-F0 50-56 00-00>, 8-byte object <00-E0 0B-F0 50-56 00-00> }
log.86.2:    Which is: { 8-byte object <01-A0 0B-F0 FF-FF FF-3F>, 8-byte object <01-C0 0B-F0 FF-FF FF-3F>, 8-byte object <00-E0 0B-F0 FF-FF FF-3F> }
log.86.2:    Which is: { 8-byte object <01-60 0E-F0 50-56 00-00>, 8-byte object <01-80 0E-F0 50-56 00-00>, 8-byte object <00-A0 0E-F0 50-56 00-00> }
log.86.2:    Which is: { 8-byte object <01-60 0E-F0 FF-FF FF-3F>, 8-byte object <01-80 0E-F0 FF-FF FF-3F>, 8-byte object <00-A0 0E-F0 FF-FF FF-3F> }
log.87.2:    Which is: { 8-byte object <01-A0 35-8B B9-55 00-00>, 8-byte object <01-C0 35-8B B9-55 00-00>, 8-byte object <00-E0 35-8B B9-55 00-00> }
log.87.2:    Which is: { 8-byte object <01-A0 35-8B FF-FF FF-3F>, 8-byte object <01-C0 35-8B FF-FF FF-3F>, 8-byte object <00-E0 35-8B FF-FF FF-3F> }
log.87.2:    Which is: { 8-byte object <01-A0 35-8B B9-55 00-00>, 8-byte object <01-C0 35-8B B9-55 00-00>, 8-byte object <00-E0 35-8B B9-55 00-00> }
log.87.2:    Which is: { 8-byte object <01-A0 35-8B FF-FF FF-3F>, 8-byte object <01-C0 35-8B FF-FF FF-3F>, 8-byte object <00-E0 35-8B FF-FF FF-3F> }
log.92.2:    Which is: { 8-byte object <01-A0 4D-F0 00-56 00-00>, 8-byte object <01-C0 4D-F0 00-56 00-00>, 8-byte object <00-E0 4D-F0 00-56 00-00> }
log.92.2:    Which is: { 8-byte object <01-A0 4D-F0 FF-FF FF-3F>, 8-byte object <01-C0 4D-F0 FF-FF FF-3F>, 8-byte object <00-E0 4D-F0 FF-FF FF-3F> }
log.92.2:    Which is: { 8-byte object <01-A0 55-F0 00-56 00-00>, 8-byte object <01-C0 55-F0 00-56 00-00>, 8-byte object <00-E0 55-F0 00-56 00-00> }
log.92.2:    Which is: { 8-byte object <01-A0 55-F0 FF-FF FF-3F>, 8-byte object <01-C0 55-F0 FF-FF FF-3F>, 8-byte object <00-E0 55-F0 FF-FF FF-3F> }
log.95.2:    Which is: { 8-byte object <01-20 19-D9 C5-55 00-00>, 8-byte object <01-40 19-D9 C5-55 00-00>, 8-byte object <00-60 19-D9 C5-55 00-00> }
log.95.2:    Which is: { 8-byte object <01-20 19-D9 FF-FF FF-3F>, 8-byte object <01-40 19-D9 FF-FF FF-3F>, 8-byte object <00-60 19-D9 FF-FF FF-3F> }
log.95.2:    Which is: { 8-byte object <00-B0 1F-D9 C5-55 00-00>, 8-byte object <02-C0 1F-D9 C5-55 00-00> }
log.95.2:    Which is: { 8-byte object <00-B0 1F-D9 FF-FF FF-3F>, 8-byte object <02-C0 1F-D9 FF-FF FF-3F> }
log.98.2:    Which is: { 8-byte object <01-60 3F-C7 37-56 00-00>, 8-byte object <01-80 3F-C7 37-56 00-00>, 8-byte object <00-A0 3F-C7 37-56 00-00> }
log.98.2:    Which is: { 8-byte object <01-60 3F-C7 FF-FF FF-3F>, 8-byte object <01-80 3F-C7 FF-FF FF-3F>, 8-byte object <00-A0 3F-C7 FF-FF FF-3F> }
log.98.2:    Which is: { 8-byte object <02-40 36-C7 37-56 00-00>, 8-byte object <00-80 36-C7 37-56 00-00> }
log.98.2:    Which is: { 8-byte object <02-40 36-C7 FF-FF FF-3F>, 8-byte object <00-80 36-C7 FF-FF FF-3F> }
log.99.2:    Which is: { 8-byte object <02-C0 DB-C7 5C-55 00-00>, 8-byte object <00-00 DC-C7 5C-55 00-00> }
log.99.2:    Which is: { 8-byte object <02-C0 DB-C7 FF-FF FF-3F>, 8-byte object <00-00 DC-C7 FF-FF FF-3F> }
log.99.2:    Which is: { 8-byte object <00-50 EA-C7 5C-55 00-00>, 8-byte object <01-60 EA-C7 5C-55 00-00>, 8-byte object <01-80 EA-C7 5C-55 00-00> }
log.99.2:    Which is: { 8-byte object <00-50 EA-C7 FF-FF FF-3F>, 8-byte object <01-60 EA-C7 FF-FF FF-3F>, 8-byte object <01-80 EA-C7 FF-FF FF-3F> }

There seems to be some pattern: the expected value is <uu-vv ww-xx yy-zz 00-00> but the actual value is <uu-vv ww-xx FF-FF FF-3F>
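
The pattern can be made explicit by reading each 8-byte object as a 64-bit integer. This is a small sketch, assuming the dump lists bytes in memory order on a little-endian host (x86-64); that is an assumption, not something the test output states. The helper name `parse_gtest_bytes` is made up for illustration. The two sample values are copied from the first failure reported in this issue.

```python
def parse_gtest_bytes(dump: str) -> int:
    """Turn a byte dump such as '01-E0 5D-F0 AD-55 00-00' into an integer.

    Assumes the bytes are printed in memory order on a little-endian machine.
    """
    raw = bytes.fromhex(dump.replace("-", "").replace(" ", ""))
    return int.from_bytes(raw, "little")


expected = parse_gtest_bytes("01-E0 5D-F0 AD-55 00-00")
actual = parse_gtest_bytes("01-E0 5D-F0 FF-FF FF-3F")

LOW_MASK = 0xFFFFFFFF
same_low = (expected & LOW_MASK) == (actual & LOW_MASK)

print(f"expected = {expected:#018x}")
print(f"actual   = {actual:#018x}")
print(f"low 32 bits equal:    {same_low}")
print(f"expected high 32 bits: {expected >> 32:#010x}")
print(f"actual high 32 bits:   {actual >> 32:#010x}")
```

Under that assumption the low 32 bits agree in this sample and only the upper 32 bits differ, with the actual value carrying 0x3fffffff there. This only restates the observation; it does not identify the cause.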

@eero-t

eero-t commented Feb 1, 2022

HP_SDV tests are not the only ones failing for me (on a Xeon build host) when building the latest "compute-runtime" 22.04.22286 release.

To get its build (tests) to succeed (with LLVM v11), I need to give CMake: -DSUPPORT_XE_HP_SDV=0 -DSUPPORT_XE_HP_CORE=0 -DSUPPORT_XE_HPC_CORE=0 -DSUPPORT_XE_HPG_CORE=0

I can enable support for DG1 and earlier HW though: -DSUPPORT_GEN9=1 -DSUPPORT_GEN11=1 -DSUPPORT_GEN12LP=1 -DSUPPORT_DG1=1

For now this is fine because the public kernel does not have support for XE_HP* (and even enabling support for DG1 requires force probing).
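
As an illustration of the workaround, the two sets of flags quoted in this comment can be assembled into one CMake invocation. This is only a sketch: the `-DSUPPORT_*` names come from the comment, while the helper name `cmake_platform_flags` and the source directory `../compute-runtime` are placeholders.

```python
def cmake_platform_flags(enabled: list, disabled: list) -> list:
    """Build -DSUPPORT_<NAME>=1/0 arguments for the given platform names."""
    flags = [f"-DSUPPORT_{name}=1" for name in enabled]
    flags += [f"-DSUPPORT_{name}=0" for name in disabled]
    return flags


# Platform names exactly as they appear in the flags quoted in this comment.
ENABLED = ["GEN9", "GEN11", "GEN12LP", "DG1"]
DISABLED = ["XE_HP_SDV", "XE_HP_CORE", "XE_HPC_CORE", "XE_HPG_CORE"]

# "../compute-runtime" is a placeholder for wherever the sources live.
args = ["cmake", "../compute-runtime"] + cmake_platform_flags(ENABLED, DISABLED)
print(" ".join(args))
```

The printed line can be pasted into a shell from the build directory, or `args` can be handed to `subprocess.run`.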

@eero-t

eero-t commented Feb 24, 2022

@anbe42 I would suggest renaming this as "XE_HP* related unit tests fail with all compute-runtime versions".

anbe42 changed the title from "two test failures building 22.03.22192 on Debian" to "XE_HP* related unit tests fail with all compute-runtime versions" on Feb 24, 2022
@JablonskiMateusz
Contributor

@anbe42 what compiler version are you using?

@anbe42
Author

anbe42 commented Feb 24, 2022

@JablonskiMateusz compute-runtime is built with gcc 11 (Debian package). It uses igc built with llvm 12 (Debian package) and gcc 11 (for any components built with gcc instead of clang/llvm). igc in turn uses opencl-clang and spirv-llvm-translator built with llvm-12.

@eero-t

eero-t commented Feb 24, 2022

@JablonskiMateusz In my case, I was building "compute-runtime" 22.04.22286 with gcc v10 and LLVM v11 on Ubuntu 21.04:

LD_LIBRARY_PATH=/home/nobody/source/compute-runtime/build/bin /home/nobody/source/compute-runtime/build/bin/ocloc -gen_file -file /home/nobody/source/compute-runtime/opencl/test/unit_test/test_files/CopyBuffer_simd16.cl -device dg2 -64 -out_dir /home/nobody/source/compute-runtime/build/bin/XE_HPG_COREdg2/0/test_files/x64/ -revision_id 0 -options -g -options_name

error: System thread kernel could not be created!
error: backend compiler failed build.

Build failed with error code: -11

After disabling all XE_HP* options, the build worked again.

Note that this is with "igc-1.0.9636" because the "compute-runtime" build fails with newer IGC versions, see: intel/intel-graphics-compiler#224

opencl-clang was an oldish one from the distro, but all SPIRV deps are built by IGC from sources.

I have not tried enabling the XE_HP* options with newer "compute-runtime" versions (I've been waiting for that IGC bug to get some attention first).

@JablonskiMateusz
Contributor

@eero-t regarding the issue:

error: System thread kernel could not be created!
error: backend compiler failed build.
Build failed with error code: -11

igc-1.0.9636 didn't work very well with DG2; the issue should go away with igc-1.0.9933.

@eero-t

eero-t commented Feb 25, 2022

> igc-1.0.9636 didn't work very well with DG2; the issue should go away with igc-1.0.9933.

@JablonskiMateusz OK, as the unit tests for newer HW in compute-runtime also need a newer IGC, I assume disabling the XE_HP* platforms while still using an older IGC version is the correct thing to do?

Could compute-runtime add an IGC version check and disable them automatically when IGC is too old, to avoid user confusion about this?
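
A rough sketch of what such a check could look like, purely as an illustration of the idea. The minimum version `1.0.9933` is an assumption derived from the maintainer's remark about DG2 in this thread, not a documented requirement, and the function names are invented here. compute-runtime's real build logic is CMake, not Python.

```python
import re

# Assumption: taken from the remark that the DG2 problem "should go away
# with igc-1.0.9933". Not an official minimum for all XE_HP* platforms.
MIN_IGC_FOR_XE_HP = (1, 0, 9933)


def parse_igc_version(tag: str) -> tuple:
    """Parse 'igc-1.0.9636' or '1.0.9636' into a tuple of integers."""
    match = re.fullmatch(r"(?:igc-)?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognised IGC version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def xe_hp_supported(tag: str) -> bool:
    """True if the given IGC version meets the assumed minimum."""
    return parse_igc_version(tag) >= MIN_IGC_FOR_XE_HP


for tag in ("igc-1.0.9636", "igc-1.0.9933"):
    decision = "keep XE_HP* enabled" if xe_hp_supported(tag) else "disable XE_HP*"
    print(f"{tag}: {decision}")
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "1.0.10000" would sort before "1.0.9933".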

EDIT: I misunderstood your comment at first, this was rewritten after the lightbulb lit...

@eero-t

eero-t commented Feb 25, 2022

> igc-1.0.9636 didn't work very well with DG2; the issue should go away with igc-1.0.9933.

While that explains my issues, what about the XE_HP* failures that the original bug reporter got with "igc-1.0.9933"? Are even newer IGC versions needed for the XE_HP* platforms (supported by compute-runtime) other than DG2?

EDIT: Looking at the IGC release notes after "igc-1.0.9636", I started to wonder whether this issue is VC (Vector Compiler / intrinsics) related. Any idea whether things should work any better if one disabled / did not use VC in the IGC build?

(I have the impression that no-VC is a less tested code path in IGC; that's why I'm asking first whether it's even supposed to work better, before trying it myself.)

@JablonskiMateusz
Contributor

Regarding the two failures that @anbe42 mentioned at the very beginning of the issue, I set up a workspace with gcc11 and reproduced the issue.

@JablonskiMateusz
Contributor

This is in our queue. We expect to have a fix in the near future.

@eero-t

eero-t commented Mar 17, 2022

> I set up a workspace with gcc11 and reproduced the issue.
> ...
> This is in our queue. We expect to have a fix in the near future.

Any news?

@JablonskiMateusz
Contributor

> > I set up a workspace with gcc11 and reproduced the issue.
> > ...
> > This is in our queue. We expect to have a fix in the near future.
>
> Any news?

Could you try with 95103c3?

@tjaalton
Contributor

tjaalton commented Apr 6, 2022

seems to work here

@JablonskiMateusz
Contributor

I'm closing the issue as the fix is delivered and @tjaalton confirmed it works.
@anbe42 please create a new issue if you still see any related issues. Thanks :)
