utest failures on various arches #1469
Thanks. The dsdot error is most likely spurious, as it flags all architectures that use the generic C implementation from kernel/generic/dot.c; I have just seen it myself on arm64. That is a drawback of using canned "known good" results across all architectures instead of comparing against the reference implementation running on the same CPU. The ROT and AXPY failures are "interesting"...
Let me know if you need access to the non-x86 machines or more info.
Actually, it is the C implementation in kernel/arm/dot.c, used by the arm, mips and zarch kernels, that provides an inexact dsdot.
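To make the dsdot precision point concrete: dsdot takes single-precision vectors but, in the reference BLAS, accumulates the result in double precision. One plausible way an implementation ends up "inexact" is by summing the products in a float. The sketch below contrasts the two approaches. The function names are hypothetical, and this is not the code of kernel/arm/dot.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: products and running sum kept in float,
   widened to double only at the end. */
double dsdot_float_acc(size_t n, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];          /* rounds to float at every step */
    return (double)acc;
}

/* dsdot-style: each float is widened first, so every product is exact
   (24-bit x 24-bit significands fit in 53 bits) and the sum is in double. */
double dsdot_double_acc(size_t n, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return acc;
}
```

For example, with x = {16777216, 1, 1} and y = {1, 1, 1}, the float accumulator stays at 16777216 because 2^24 + 1 is not representable in float, while the double version returns 16777218. This assumes float arithmetic is really evaluated in single precision, as it is with SSE on x86-64.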
Unfortunately I cannot debug the armv7 issues; I can only guess that they may have crept in through the VFP changes of #1221 @ashwinyes
@martin-frbg Looks like this is an issue with #1462. I went briefly through the commit diff and, if I am not mistaken, there are some serious issues with the commit. In utest, I assume we need to call both the reference BLAS implementation and the OpenBLAS implementation and then compare the results. In the commit, all calls to the reference BLAS implementation have been removed, so there is no point in comparing the results. Also, in some places the input arrays have been modified, so even if the reference BLAS calls are restored, it will most likely generate failures. Could you please explain what this commit is doing?
The commit restored older tests, of which only one (test_amax) had been similarly converted by xianyi (5a8447e) from the previously used cutest framework to something that would run on Travis and AppVeyor. I have replaced the calls to the reference implementation with an array of results from running the respective netlib functions locally (on Haswell, but this should not cause large differences). Can you please point out where you think input arrays have been modified? The x2, y2 arrays, which were previously both input and output for the reference implementation, now hold the reference results. The only flaw I see with my changes is that the calculated and expected values are currently swapped in the error message.
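The canned-results approach boils down to comparing each kernel result against a stored netlib value within a tolerance, with "expected" printed before "got". A minimal sketch follows; `check_close` is a hypothetical helper, not the utest framework's API, and the message format merely mirrors the log in the issue description.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compare a result against a canned reference value
   (e.g. precomputed once with netlib BLAS) using an absolute tolerance.
   Arguments are ordered expected-then-got so the message cannot swap them. */
int check_close(const char *name, double expected, double got, double tol)
{
    assert(name != NULL && tol >= 0.0);
    double diff = expected > got ? expected - got : got - expected;
    /* written as !(diff <= tol) so that a NaN result also counts as a failure */
    if (!(diff <= tol)) {
        printf("ERR: %s expected %.3e, got %.3e (diff %.3e, tol %.3e)\n",
               name, expected, got, diff, tol);
        return 0;
    }
    return 1;
}
```

Because the same canned value is used on every architecture, the tolerance has to absorb legitimate rounding differences between kernels, which is exactly where a tight tolerance can flag a result that is merely less precise.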
Note also that the failed tests are checking a valid but probably highly unusual corner case (zero increments), so I am not happy with just replacing your optimized axpy and rot code with their generic C implementations that happen to pass the test. Omitting these two tests on arm may be an option if nobody has time to look at them now.
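Zero increments are legal in BLAS: the reference implementation simply keeps reading x[0] and updating y[0]. A minimal C sketch of netlib-style daxpy indexing (0-based; `ref_daxpy` is a hypothetical name) shows the expected outcome, which is presumably what the inc_0 tests exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of netlib-style daxpy, y := alpha*x + y, with 0-based indexing.
   Negative increments start from the far end of the vector, as in the
   reference BLAS. With incx == 0 and incy == 0, every iteration reads x[0]
   and updates y[0], so y[0] receives n successive additions of alpha*x[0]. */
void ref_daxpy(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)   /* netlib daxpy also returns early for alpha == 0 */
        return;
    assert(x != NULL && y != NULL);
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

With x = {2}, y = {1}, alpha = 0.5, n = 3 and both increments zero, y[0] ends up as 4: three additions of 1.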
@martin-frbg Apologies, I spoke too soon without fully understanding your commit. I will also look at fixing the arm and arm64 code when I have the bandwidth.
No problem. I thought I had added the explanation about using canned results instead of live netlib calls to the commit message, but either I forgot or it got lost when I did a compressed merge to get rid of all the intermediate attempts to get the prototypes right for all platforms.
In fact, all the kernels for ARMv8 (including ThunderX) use KERNEL.ARMV8 as the base for DSDOT, so your changes were affecting ThunderX as well. I didn't realise it until now :) But anyway, PR #1475 has the fix for the DSDOT utest errors on ARM64.
I confirm that the "dsdot" failures are now gone on armv7 and s390x.
I promise to exercise more restraint in the future regarding ARMV8/ThunderX... though I do wonder what I was testing when I did not see problems with the ARMV8 dot.S, only with the arm/dot.c that it was originally falling back to for dsdot.
Now aarch64 is green as well.
Do your builds include MIPS architectures as well? I suspect those might show at least the "dsdot" failure too.
Unfortunately not; we only do builds on the official Fedora arches: armv7, aarch64, ppc64, ppc64le, s390x and x86_64.
Perhaps https://gcc.gnu.org/wiki/CompileFarm for MIPS? A possible connection with GCC is gfortran's option to call BLAS for matmul.
In QEMU, mips32 fails the dsdot test (for the same reason as the others above). mips64 additionally fails both complex axpy tests, complex swap and all rot tests. cpuid_mips.c needed a patch to even compile.
Thanks for the pointer to the GCC compile farm. It seems their mips64 machines are actually $300 routers; is this all that is left of that platform nowadays?
@sharkcz are you still seeing the failures on ppc?
yes, I do, with the recent develop branch and using the POWER6 kernel. The expected/got values are now swapped, though.
Thanks. Actually I was not sure which kernel you were using, and had hoped it might be POWER8, which got a zrot rewrite a few weeks ago.
With the POWER8 kernel I see a segfault from the dblat3 test, POWER7 gets the same failures as POWER6, and a build with POWER5 results in assembler errors.
A segfault with POWER8 is obviously not good; is it in any way apparent which of the tests in dblat3 crashes it? POWER7 maps to POWER6 internally, so no surprise there. POWER5 I have no idea about...
@martin-frbg is anything wrong with the z13 or power8le kernels? I still have access to those machines and could check at the weekend.
power8le is OK from what I can see in our CI.
So it is "only" the POWER8 target on POWER6 hardware (or an emulator) that is segfaulting? In that case I guess it can be ignored as "probably not expected to work". @quickwritereader I believe that with my trivial fix for dsdot the z13 target should pass all utests.
The POWER8 target segfaults on Power8 hardware (a ppc64 big-endian VM); we build the POWER6 target (on Power8 hardware) as a lowest common denominator at the distro level.
I am pretty sure the POWER8 code is written with power8le in mind; perhaps this needs to be clarified somewhere. (Wrong-endian assembly would still compile, I guess, and just do unexpected things later.)
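A small illustration of why a byte-order mismatch shows up as wrong results rather than build errors: reading the bytes of a double in the opposite order yields a perfectly valid, but unrelated, number. This is a generic C sketch (`byteswap_double` is a hypothetical helper) that assumes 8-byte IEEE 754 doubles; it does not reproduce what the POWER8 kernels actually do.

```c
#include <assert.h>
#include <string.h>

/* Reverses the byte order of a double, i.e. how data laid out for one
   byte order would be seen by code that assumes the other. */
double byteswap_double(double v)
{
    unsigned char b[sizeof v];
    assert(sizeof v == 8);          /* IEEE 754 binary64 assumed */
    memcpy(b, &v, sizeof v);
    for (size_t i = 0; i < sizeof v / 2; i++) {
        unsigned char t = b[i];
        b[i] = b[sizeof v - 1 - i];
        b[sizeof v - 1 - i] = t;
    }
    memcpy(&v, b, sizeof v);
    return v;
}
```

For instance, byteswap_double(1.0) comes back as a tiny subnormal value instead of 1.0, and nothing fails until the numbers are checked.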
yes, that's very plausible. I'm wondering what the best target is for a distro still supporting ppc64; I would prefer correctness over speed...
I don't have a requirement for ppc, but I was going to suggest RHEL7/Fedora (though I don't know SuSE); then I noticed who provides developer shell access. Why not RHEL7?
(I took his question to mean "what TARGET should I set in the OpenBLAS build process to get something that still works on ppc64".) While I assume the zrot bug has been there for ages and only this one test uncovered it, replacing the POWER6 zrot assembly with the generic implementation will probably not hurt performance too much. (I am much less sure whether I could work out from the working srot/drot implementation how to handle the incx=0 case properly in assembly.)
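For a generic implementation, the incx=0 case mostly comes down to indexing. With real c and s, the reference zdrot applies the plane rotation to the real and imaginary parts independently, and with zero increments it rotates the same pair of elements n times. Below is a sketch using BLAS-style interleaved (re, im) storage. `ref_zdrot` is a hypothetical name and is not OpenBLAS's generic zrot.c.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of reference zdrot semantics: for each element pair,
   x := c*x + s*y,  y := c*y - s*x  (c, s real).
   x and y store complex numbers as interleaved (re, im) doubles;
   incx and incy count complex elements, as in BLAS. */
void ref_zdrot(int n, double *x, int incx, double *y, int incy,
               double c, double s)
{
    if (n <= 0)
        return;
    assert(x != NULL && y != NULL);
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        double *xp = x + 2 * ix;
        double *yp = y + 2 * iy;
        for (int k = 0; k < 2; k++) {  /* k = 0: real part, k = 1: imaginary part */
            double t = c * xp[k] + s * yp[k];
            yp[k] = c * yp[k] - s * xp[k];
            xp[k] = t;
        }
        ix += incx;
        iy += incy;
    }
}
```

Consider incx = incy = 0, n = 2, c = 0 and s = 1, which is a 90-degree rotation applied twice. Starting from x = 1+2i and y = 3+4i, the result is x = -1-2i and y = -3-4i.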
yes, Martin's interpretation is what I meant by the question :-)
And as ppc64 is being sunset in the Linux world, using the generic zrot would definitely be acceptable for us.
Could you try with the two additional lines from #1535 in KERNEL.POWER6, please?
Thanks for testing. That would seem to leave the ARMV7 axpy and rot implementations, axpy_vfp.S and rot_vfp.S, where my current thinking is that the check for incx=0, incy=0 at https://github.com/xianyi/OpenBLAS/blob/8a3b6fa108b15331c2af8777d1ea0206f85673b8/kernel/arm/axpy_vfp.S#L444-L448 (and similarly in the rot file) is premature, causing an exit before the first element of y has been updated. I suspect these lines should be commented out or, if an early exit is desired, moved to the end of the KERNEL_S1 implementation(s). I do not have ARMV7 hardware for testing.
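In C terms, the suspected problem looks like `saxpy_early_exit` below. Its zero-increment check returns before y[0] has been touched, whereas the reference behaviour is to apply all n updates to y[0]. This is a hypothetical C analogue of the control flow, not a transcription of axpy_vfp.S. For brevity it only handles non-negative increments.

```c
#include <assert.h>

/* Strided saxpy loop for non-negative increments (hypothetical helper). */
static void saxpy_loop(int n, float alpha, const float *x, int incx,
                       float *y, int incy)
{
    assert(incx >= 0 && incy >= 0);
    for (int i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}

/* Suspected pattern: bail out when both increments are zero. */
void saxpy_early_exit(int n, float alpha, const float *x, int incx,
                      float *y, int incy)
{
    if (n <= 0 || (incx == 0 && incy == 0))
        return;                     /* premature: y[0] is never updated */
    saxpy_loop(n, alpha, x, incx, y, incy);
}

/* Without the early exit, zero increments fall through to the normal loop. */
void saxpy_no_early_exit(int n, float alpha, const float *x, int incx,
                         float *y, int incy)
{
    if (n <= 0)
        return;
    saxpy_loop(n, alpha, x, incx, y, incy);
}
```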
ARMV7 changes verified via QEMU now.
Awesome, I did a build on real hardware and all is OK.
The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and the generic kernel seems to have better precision. Tested on SiFive FU740 (HiFive Unmatched) and QEMU. Also see OpenMathLib#1469.
After commit e7366a4 we see test failures on various non-x86 arches:
s390x
TEST 11/22 dsdot:dsdot_n_1 [FAIL]
ERR: test_dsdot.c:47 expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)
TEST 16/21 rot:csrot_inc_0 [FAIL]
ERR: test_rot.c:109 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-04)
TEST 18/21 rot:zdrot_inc_0 [FAIL]
ERR: test_rot.c:71 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-13)