Bug in zgemm on AMD #380

Closed
andreasnoack opened this issue Jun 3, 2014 · 67 comments

@andreasnoack
Contributor

On an AMD FX(tm)-8320 Eight-Core Processor machine running Windows, with OpenBLAS compiled with 64-bit integer support, the following program

program test

    complex*16 :: a(4,4), b(4,4), c(4,4)

    a(:,:) = (1.0d0, 1.0d0)
    b(:,:) = 1.0d0

    call zgemm('N', 'N', 4_8, 4_8, 4_8, (1.0d0, 0.0d0), a, 4_8, b, 4_8, (0.0d0, 0.0d0), c, 4_8)

    write(*,*) c

end program

gives the wrong answer

(  0.0000000000000000     ,  8.0000000000000000     ) (  0.0000000000000000
 ,  8.0000000000000000     ) (  0.0000000000000000     ,  8.0000000000000000
 ) (  0.0000000000000000     ,  8.0000000000000000     ) (  0.0000000000000000
   ,  8.0000000000000000     ) (  0.0000000000000000     ,  8.0000000000000000
   ) (  0.0000000000000000     ,  8.0000000000000000     ) (  0.0000000000000000
     ,  8.0000000000000000     ) (  0.0000000000000000     ,  8.0000000000000000
     ) (  0.0000000000000000     ,  8.0000000000000000     ) (  0.00000000000000
00     ,  8.0000000000000000     ) (  0.0000000000000000     ,  8.00000000000000
00     ) (  0.0000000000000000     ,  8.0000000000000000     ) (  0.000000000000
0000     ,  8.0000000000000000     ) (  0.0000000000000000     ,  8.000000000000
0000     ) (  0.0000000000000000     ,  8.0000000000000000     )
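
For reference, the expected product can be checked without any BLAS library. With every entry of a equal to (1, 1) and every entry of b equal to (1, 0), each entry of c should be 4 * (1, 1) = (4, 4). The following is only a minimal sketch of that check, not part of the original report; the helper name naive_gemm is hypothetical and the triple loop merely stands in for what zgemm computes with 'N', 'N'.

```python
def naive_gemm(alpha, a, b, beta, c):
    """Reference C = alpha*A*B + beta*C with plain triple loops (no BLAS)."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0j] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0j
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # With beta == 0 the old contents of C are ignored.
            prev = beta * c[i][j] if beta != 0 else 0j
            out[i][j] = alpha * acc + prev
    return out


n = 4
a = [[complex(1.0, 1.0)] * n for _ in range(n)]  # a(:,:) = (1.0d0, 1.0d0)
b = [[complex(1.0, 0.0)] * n for _ in range(n)]  # b(:,:) = 1.0d0
c = [[0j] * n for _ in range(n)]                 # stand-in for c; unused since beta = 0

result = naive_gemm(1 + 0j, a, b, 0j, c)
print(result[0][0])  # (4+4j)
```

Every entry of result is (4+4j), so the reported (0, 8) values differ from the expected answer in both the real and the imaginary part.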
@andreasnoack
Contributor Author

The problem is also present for cgemm.

@wernsaar
Contributor

wernsaar commented Jun 4, 2014

Hi,

I cannot reproduce this error.
Please give me more information.

Is this a 32-bit or 64-bit build, what is the TARGET, and do you use threading?

Regards

Werner

@andreasnoack
Contributor Author

We have only seen it on the reported architecture. It is a 64 bit build and the bug is there both with and without threading. @tkelman Can you say more about the TARGET?

@tkelman
Contributor

tkelman commented Jun 4, 2014

I think TARGET is empty. The set of Makefile flags, as best as I can reconstruct them, are:

FFLAGS=-O2 USE_THREAD=1 NUM_THREADS=16 NO_AFFINITY=1 DYNAMIC_ARCH=1 INTERFACE64=1 BINARY=64

@wernsaar
Contributor

wernsaar commented Jun 4, 2014

On 04.06.2014 14:50, Tony Kelman wrote:

I think TARGET is empty. The set of Makefile flags, as best as I can reconstruct them, are:

FFLAGS=-O2 USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=50 NUM_THREADS=16 NO_AFFINITY=1 DYNAMIC_ARCH=1 INTERFACE64=1 BINARY=64


Hi,

could you please compile with the following command line:

FFLAGS=-O2 USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=50 NUM_THREADS=16 NO_AFFINITY=1 TARGET=PILEDRIVER INTERFACE64=1 BINARY=64

Which C- and Fortran compiler do you use?

Regards

Werner

@wernsaar
Contributor

wernsaar commented Jun 4, 2014

Hi,

I simply cannot reproduce the error.
I have now published a binary package for Windows on SourceForge.
Please download
http://sourceforge.net/projects/openblas/files/v0.2.9-rc2-lapacktest/openblas-v0.2.9-rc2-lapack-piledriver_win64bit_int64.tar.gz

The file Makefile.rule is included.
The Fortran test is in the ftest directory.

Best regards

Werner

@tkelman
Contributor

tkelman commented Jun 4, 2014

I used GCC and Gfortran 4.8.2, MinGW cross-compiled either from Linux (Julia binaries) or Cygwin (my build of up-to-date develop branch) to compile OpenBLAS.

The user who's seeing the issue was compiling the test program with

$ gfortran --version
GNU Fortran (tdm64-2) 4.8.1

@wernsaar
Contributor

wernsaar commented Jun 4, 2014

Hi,

please try the binary that I have now published on sourceforge.net:
http://sourceforge.net/projects/openblas/files/v0.2.9-rc2-lapacktest/openblas-v0.2.9-rc2-lapack-piledriver_win64bit_int64.tar.gz

and report the result.
I cannot find a problem in the BLAS library, so this may be a
different problem.

Best regards

Werner

@tkelman
Contributor

tkelman commented Jun 4, 2014

cc @jasax

@wernsaar wernsaar added this to the 0.2.9 version milestone Jun 4, 2014
@wernsaar wernsaar closed this as completed Jun 4, 2014
@ViralBShah
Contributor

@wernsaar Which Fortran compiler and what version do you use?

@wernsaar
Contributor

wernsaar commented Jun 5, 2014

Hi,

I built OpenBLAS for Windows on our Ubuntu machine with mingw64.
But please wait a short while. Xianyi wrote to me that he found a bug
when building the DLL on Windows. He will provide a fix soon.

Best regards

Werner

@ViralBShah
Contributor

Best to reopen the issue.

@xianyi xianyi reopened this Jun 6, 2014
@xianyi
Collaborator

xianyi commented Jun 6, 2014

I just fixed a bug in generating the DLL. I think this is a different bug.

@wernsaar
Contributor

wernsaar commented Jun 6, 2014

Hi,

@ViralBShah
Is it possible to debug the Fortran test program with gdb on Windows?
If yes, compile the Fortran code with the flags -O0 -g, and open the exe in gdb.

Type:
break level3.c:358
run
Keep typing s until you are in zgemm_kernel_n_PILEDRIVER (or whatever name
zgemm_kernel_n has in your build).

Please send me the result of the command:
info all-registers

Regards

Werner

@ViralBShah
Contributor

The issue was reported by @jasax on his machine. @jasax Could you provide what @wernsaar is requesting?

@jasax

jasax commented Jun 6, 2014

Hi,

Which openblas.dll do you want me to use with the gdb test? I have the
following versions:

  1. original julia 0.2.1
  2. 0.3 dev from a few days ago.
  3. provided at http://sourceforge.net/projects/juliadeps-win/files/openblas-47b22763f8ab0219-x86_64-w64-mingw32.7z/download
    (didn't work)
  4. http://sourceforge.net/projects/openblas/files/v0.2.9-rc2-lapacktest/openblas-v0.2.9-rc2-lapack-piledriver_win64bit_int64.tar.gz
    (which worked!)

Please tell me. Meanwhile I'll try the latest.

Best
Jose


@wernsaar
Contributor

wernsaar commented Jun 6, 2014

Hi,

please use the dll from this download:
http://sourceforge.net/projects/openblas/files/v0.2.9-rc2-lapacktest/openblas-v0.2.9-rc2-lapack-dynamic_win64bit_int64.tar.gz

I want to know which zgemm_kernel is used.

Regards

Werner

@jasax

jasax commented Jun 6, 2014

Hi,

Sorry, I now see that I have missed the test programs (I thought it
was just the simple Fortran program that tests complex multiplication).

So, which is the test program? Do I have to compile the OpenBLAS source
code? What is level3.c?

Thx

Jose


@wernsaar
Contributor

wernsaar commented Jun 7, 2014

Hi,

please use the short Fortran program for the test.
level3.c is an OpenBLAS source file; you don't need to compile it.
It only appears in the gdb syntax for setting a breakpoint.

Regards,

Werner

@jasax

jasax commented Jun 9, 2014

Hi Werner,

I was away for the weekend; only now am I working on the AMD 8320 8-core PC
with a little free time.

Sorry, I really don't understand the instructions regarding the
level3.c function.
I ran your command list (picture linked below):
https://dl.dropboxusercontent.com/u/9722274/Julia/gdb1.png
But the gdb run doesn't stop at any level3.c function. The
program runs to the end without stopping.

So I must be doing something wrong. Please see if you can detect the
roots of my ignorance and give me a few more details ;-)

Regards
Jose


@jasax

jasax commented Jun 9, 2014

Hi Werner,

Perhaps I'm being a bit stupid and this is what you want :-)
I thought level3.c was some inner routine in openblas :-(

Please see if this is what is intended.

Regards

Jose

#############################

$ gdb a.exe
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests\a.exe...done.
(gdb) break 12
Breakpoint 1 at 0x4015ba: file test.f90, line 12.
(gdb) run
Starting program: c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests/a.exe
[New Thread 6372.0x1a5c]
[New Thread 6372.0xed0]
[New Thread 6372.0xf60]
[New Thread 6372.0x594]
[New Thread 6372.0x88]
[New Thread 6372.0x1154]
[New Thread 6372.0x7a0]
[New Thread 6372.0x14a4]

Breakpoint 1, test () at test.f90:12
12 call zgemm('N', 'N', 4_8, 4_8, 4_8, (1.0d0, 0.0d0), a, 4_8, b, 4_8, (0.0d0, 0.0d0), c, 4_8)
(gdb) s
14 write(*,*) c
(gdb) info all-registers
rax 0x6cf442b0 1827947184
rbx 0x22fe10 2293264
rcx 0x2a20000 44171264
rdx 0x6cf442b0 1827947184
rsi 0x22fe10 2293264
rdi 0x4f6da0 5205408
rbp 0x22f8e0 0x22f8e0
rsp 0x22f860 0x22f860
r8 0x2a20000 44171264
r9 0x2a20000 44171264
r10 0x40 64
r11 0x0 0
r12 0x1 1
r13 0x8 8
r14 0x0 0
r15 0x0 0
rip 0x401657 0x401657 <test+359>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x37f 895
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
xmm0 ( (0x0, 0x1, 0x0, 0x1), (0x1, 0x1), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f), (0x0, 0x0, 0x0, 0x3ff0,
0x0, 0x0, 0x0, 0x3ff0), (0x0, 0x3ff00000, 0x0, 0x3ff00000), (0x3ff0000000000000
, 0x3ff0000000000000), 0x3ff00000000000003ff0000000000000 )
xmm1 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm2 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm3 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm4 ( (0x0, 0x1, 0x0, 0x1), (0x1, 0x1), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f), (0x0, 0x0, 0x0, 0x3ff0,
0x0, 0x0, 0x0, 0x3ff0), (0x0, 0x3ff00000, 0x0, 0x3ff00000), (0x3ff0000000000000
, 0x3ff0000000000000), 0x3ff00000000000003ff0000000000000 )
xmm5 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm6 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm7 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm8 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm9 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm10 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm11 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm12 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm13 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm14 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
xmm15 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0, 0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), 0x00000000000000000000000000000
000 )
mxcsr 0x1f80 IM DM ZM OM UM PM

#############################
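
As a reading aid for the dump above: the only non-zero vector registers, xmm0 and xmm4, each hold the 64-bit pattern 0x3ff0000000000000 twice. Assuming the values are IEEE-754 doubles, that pattern decodes to 1.0. The sketch below shows the decoding; the helper name bits_to_double is hypothetical. Note also that the single step went from line 12 straight to line 14 with rip still inside test, so these registers were captured after zgemm had returned, not inside the kernel.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Each half of xmm0 / xmm4 in the dump above.
print(bits_to_double(0x3ff0000000000000))  # 1.0
```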


@wernsaar
Contributor

wernsaar commented Jun 9, 2014

Hi,

level3.c is the single-threaded mid-level function for gemm.
Could you please repeat the test, but without threading?

Best regards

Werner

@jasax
Copy link

jasax commented Jun 9, 2014

Hi Werner,

Sorry, I've been looking for a way to disable threads in gdb and could not
find one in the gdb help.
I only found how to select one thread among several...

So, do I have to recompile test.f90 without threads (and which compiler
switch does that?), or can I force single-threaded execution in
gdb?

Thx, and sorry for my ignorance of these matters :-)

Jose

On 6/9/14, wernsaar notifications@github.com wrote:

On 09.06.2014 16:08, jasax wrote:

Hi Werner,

Perhaps I'm being a bit stupid and this is what you want :-)
I thought level3.c was some inner routine in openblas :-(

Please see if this is what is intended.

Regards

Jose

#############################

$ gdb a.exe
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests\a.exe...done.
(gdb) break 12
Breakpoint 1 at 0x4015ba: file test.f90, line 12.
(gdb) run
Starting program: c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests/a.exe
[New Thread 6372.0x1a5c]
[New Thread 6372.0xed0]
[New Thread 6372.0xf60]
[New Thread 6372.0x594]
[New Thread 6372.0x88]
[New Thread 6372.0x1154]
[New Thread 6372.0x7a0]
[New Thread 6372.0x14a4]

Breakpoint 1, test () at test.f90:12
12 call zgemm('N', 'N', 4_8, 4_8, 4_8, (1.0d0, 0.0d0), a, 4_8, b, 4_8, (0.0d0, 0.0d0), c, 4_8)
(gdb) s
14 write(*,*) c
(gdb) info all-registers
rax 0x6cf442b0 1827947184
rbx 0x22fe10 2293264
rcx 0x2a20000 44171264
rdx 0x6cf442b0 1827947184
rsi 0x22fe10 2293264
rdi 0x4f6da0 5205408
rbp 0x22f8e0 0x22f8e0
rsp 0x22f860 0x22f860
r8 0x2a20000 44171264
r9 0x2a20000 44171264
r10 0x40 64
r11 0x0 0
r12 0x1 1
r13 0x8 8
r14 0x0 0
r15 0x0 0
rip 0x401657 0x401657 <test+359>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x37f 895
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
xmm0 ( (0x0, 0x1, 0x0, 0x1), (0x1, 0x1), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f), (0x0, 0x0, 0x0,
0x3ff0,
0x0, 0x0, 0x0, 0x3ff0), (0x0, 0x3ff00000, 0x0, 0x3ff00000),
(0x3ff0000000000000
, 0x3ff0000000000000), 0x3ff00000000000003ff0000000000000 )
xmm1 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm2 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm3 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm4 ( (0x0, 0x1, 0x0, 0x1), (0x1, 0x1), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f), (0x0, 0x0, 0x0,
0x3ff0,
0x0, 0x0, 0x0, 0x3ff0), (0x0, 0x3ff00000, 0x0, 0x3ff00000),
(0x3ff0000000000000
, 0x3ff0000000000000), 0x3ff00000000000003ff0000000000000 )
xmm5 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm6 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm7 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm8 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm9 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm10 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm11 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm12 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm13 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm14 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
xmm15 ( (0x0, 0x0, 0x0, 0x0), (0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0x0
, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0,
0x0, 0
x0, 0x0, 0x0), (0x0, 0x0, 0x0, 0x0), (0x0, 0x0),
0x00000000000000000000000000000
000 )
mxcsr 0x1f80 IM DM ZM OM UM PM

#############################

@wernsaar
Copy link
Contributor

wernsaar commented Jun 9, 2014

Hi,

On Linux, I simply type export OMP_NUM_THREADS=1 before starting gdb.
I don't know how to set this on Windows.

Regards

Werner
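
For anyone repeating this step, here is a minimal sketch of setting the thread count before launching the debugger. OMP_NUM_THREADS comes from the message above. OPENBLAS_NUM_THREADS is OpenBLAS's own thread-count variable; it is not mentioned in this thread and is included here only as an assumption that it may help. The program name a.exe is the test binary used earlier.

```shell
# Limit the BLAS thread pool to one thread for everything started from this shell.
export OMP_NUM_THREADS=1
# Assumption: OPENBLAS_NUM_THREADS is not mentioned in this thread; it is
# OpenBLAS's own variable and is set here only as an extra precaution.
export OPENBLAS_NUM_THREADS=1

# Confirm what a child process (such as gdb) would inherit.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"

# Then start the debugger from the same shell, for example:
#   gdb a.exe
```

An MSYS/MinGW shell accepts the export form as written. In a Windows cmd.exe prompt the equivalent should be set OMP_NUM_THREADS=1. Whether this yields exactly one thread depends on how the library was built.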

@jasax
Copy link

jasax commented Jun 9, 2014

Hi,

The mingw shell is (or should be...) similar to a bash shell in Linux,
so I used your export command.
Two threads are still started (not 8 as before), and I still
can't reach the level3.c code. See the screenshot below. Please tell me
what I am doing wrong.
https://dl.dropboxusercontent.com/u/9722274/Julia/gdb2.png

Regards

Jose

@ViralBShah
Copy link
Contributor

I thought that Piledriver kernels fall back to Bulldozer - is that right? If so, does that mean that the bug could potentially exist on Bulldozer too?

@jasax
Copy link

jasax commented Jun 11, 2014

Hi,

I have the compilation log (below), but warnings and errors are still
written to the console, not to the log file. So the information comes in
two blocks (separated by ##############): first the log file, then the
console output. Also, I had to create the make.inc file manually...
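
A side note on the split output: a plain > redirects only standard output, so compiler warnings and errors (written to standard error) stay on the console. The sketch below shows the usual way to capture both in one file. fake_build is a hypothetical stand-in for the real make invocation, and the two lines it prints are taken from the logs below purely as sample text.

```shell
# Stand-in for the real build: writes progress to stdout and a warning to stderr.
# (fake_build is a hypothetical placeholder, not part of OpenBLAS.)
fake_build() {
  echo "gfortran -O2 -frecursive -c sgbbrd.f -o sgbbrd.o"
  echo "warning: variable 'range_N' set but not used" >&2
}

log=$(mktemp)

# '> file' captures stdout only; adding '2>&1' after it sends stderr to the same file.
fake_build > "$log" 2>&1

# Both lines now appear in the log.
cat "$log"
```

With the real build this would read make TARGET=PILEDRIVER DEBUG=1 NOFORTRAN=0 > log1 2>&1.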

Now the failure happens when exporting (lots of) symbols during the
gfortran compilation of "lapack-netlib". I really don't know what to do
from here... (although mingw is a good system, it's not a 100% substitute
for the Linux compilation toolchain...).

Any help appreciated...

Regards

Jose

..................................................................................

ar -ru ../../libopenblas_piledriverp-r0.2.9.a strtri_UU_single.obj
strtri_UN_single.obj strtri_LU_single.obj strtri_LN_single.obj
strtri_UU_parallel.obj strtri_UN_parallel.obj strtri_LU_parallel.obj
strtri_LN_parallel.obj dtrtri_UU_single.obj dtrtri_UN_single.obj
dtrtri_LU_single.obj dtrtri_LN_single.obj dtrtri_UU_parallel.obj
dtrtri_UN_parallel.obj dtrtri_LU_parallel.obj dtrtri_LN_parallel.obj
ctrtri_UU_single.obj ctrtri_UN_single.obj ctrtri_LU_single.obj
ctrtri_LN_single.obj ctrtri_UU_parallel.obj ctrtri_UN_parallel.obj
ctrtri_LU_parallel.obj ctrtri_LN_parallel.obj ztrtri_UU_single.obj
ztrtri_UN_single.obj ztrtri_LU_single.obj ztrtri_LN_single.obj
ztrtri_UU_parallel.obj ztrtri_UN_parallel.obj ztrtri_LU_parallel.obj
ztrtri_LN_parallel.obj
make[2]: Leaving directory `/e/Compiled/OpenBLAS-develop/lapack/trtri'
make[1]: Leaving directory `/e/Compiled/OpenBLAS-develop/lapack'
make[1]: Entering directory `/e/Compiled/OpenBLAS-develop/lapack-netlib'
( cd SRC; make )
make[2]: Entering directory `/e/Compiled/OpenBLAS-develop/lapack-netlib/SRC'
gfortran -O2 -frecursive -c sgbbrd.f -o sgbbrd.o
gfortran -O2 -frecursive -c sgbcon.f -o sgbcon.o
gfortran -O2 -frecursive -c sgbequ.f -o sgbequ.o
gfortran -O2 -frecursive -c sgbrfs.f -o sgbrfs.o
gfortran -O2 -frecursive -c sgbsv.f -o sgbsv.o

.................................
gfortran -O2 -frecursive -c slags2.f -o slags2.o
gfortran -O2 -frecursive -c slagtm.f -o slagtm.o
gfortran -O2 -frecursive -c slagv2.f -o slagv2.o
gfortran -O2 -frecursive -c slahqr.f -o slahqr.o
gfortran -O2 -frecursive -c slahrd.f -o slahrd.o
gfortran -O2 -frecursive -c slahr2.f -o slahr2.o
gfortran -O2 -frecursive -c slaic1.f -o slaic1.o
gfortran -O2 -frecursive -c slaln2.f -o slaln2.o
gfortran -O2 -frecursive -c slals0.f -o slals0.o

##########################

................................................

trtri_L_parallel.c: In function 'ztrtri_LN_parallel':
trtri_L_parallel.c:57:17: warning: variable 'range_N' set but not used [-Wunused-but-set-variable]
BLASLONG lda, range_N[2];
^
Makefile:7: make.inc: No such file or directory
make[1]: *** No rule to make target `make.inc'. Stop.
make: *** [netlib] Error 2

JAugusto@THOR /e/Compiled/OpenBLAS-develop
$ a2dll
sh: a2dll: command not found

JAugusto@THOR /e/Compiled/OpenBLAS-develop
$ lib2a
sh: lib2a: command not found

JAugusto@THOR /e/Compiled/OpenBLAS-develop
$ make TARGET=PILEDRIVER DEBUG=1 NOFORTRAN=0 >log1
The system cannot find the path specified.
Cannot export cbbcsd_: symbol not defined
Cannot export cbdsqr_: symbol not defined
Cannot export cgbbrd_: symbol not defined
Cannot export cgbcon_: symbol not defined
Cannot export cgbequ_: symbol not defined
Cannot export cgbequb_: symbol not defined
Cannot export cgbrfs_: symbol not defined
Cannot export cgbsv_: symbol not defined
Cannot export cgbsvx_: symbol not defined

...........................................................................

Cannot export zunmr3_: symbol not defined
Cannot export zunmrq_: symbol not defined
Cannot export zunmrq_: symbol not defined
Cannot export zunmrz_: symbol not defined
Cannot export zunmrz_: symbol not defined
Cannot export zunmtr_: symbol not defined
Cannot export zunmtr_: symbol not defined
Cannot export zupgtr_: symbol not defined
Cannot export zupgtr_: symbol not defined
Cannot export zupmtr_: symbol not defined
Cannot export zupmtr_: symbol not defined
collect2.exe: error: ld returned 1 exit status
make[1]: *** [../libopenblas.dll] Error 1
make: *** [shared] Error 2

JAugusto@THOR /e/Compiled/OpenBLAS-develop
$

@tkelman
Copy link
Contributor

tkelman commented Jun 12, 2014

@jasax it seems like your MinGW installation has trouble with Fortran? Weren't you able to compile the Fortran test examples earlier? Regardless, I built a debug version of OpenBLAS v0.2.9 using Julia's make flags and uploaded it here: http://sourceforge.net/projects/juliadeps-win/files/openblas-v0.2.9-x86_64-w64-mingw32-debug.7z/download

Do you see the problem with that binary? If so, please try the gdb steps discussed earlier.

@wernsaar
Contributor

Hi,

I wrote 2 enhancements for dynamic_arch:

  1. You can set the environment variable OPENBLAS_VERBOSE=2, then the name of the core is printed:

Example:

OPENBLAS_VERBOSE=2 ./dlinpack.goto 1000 1000 1
Core: Bulldozer
From : 1000 To : 1000 Step = 1
SIZE Residual Decompose Solve Total
1000 : 7.212009e-13 37346.18 MFlops 2971.77 MFlops 36097.32 MFlops

  2. You can force a specific core by setting the environment variable OPENBLAS_CORETYPE:

Example:

OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=barcelona ./dlinpack.goto 1000 1000 1
Core: Barcelona
From : 1000 To : 1000 Step = 1
SIZE Residual Decompose Solve Total
1000 : 3.805178e-12 30099.18 MFlops 2032.52 MFlops 28905.32 MFlops

OPENBLAS_VERBOSE=2 OPENBLAS_CORETYPE=opteron ./dlinpack.goto 1000 1000 1
Core: Opteron
From : 1000 To : 1000 Step = 1
SIZE Residual Decompose Solve Total
1000 : Illegal instruction

The name of the coretype is not case sensitive.
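
To illustrate what "not case sensitive" means for OPENBLAS_CORETYPE, here is a minimal Python sketch of such a lookup. It is not the OpenBLAS implementation: the function name resolve_coretype and the KNOWN_CORES table are hypothetical, and the table only lists core names that appear in this thread.

```python
import os

# Hypothetical table: only core names mentioned in this thread,
# spelled the way the "Core: ..." line prints them.
KNOWN_CORES = ("Nehalem", "Sandybridge", "Haswell",
               "Opteron", "Barcelona", "Bulldozer", "Piledriver")


def resolve_coretype(env=None):
    """Return the canonical core name requested via OPENBLAS_CORETYPE,
    or None if the variable is unset or names an unknown core."""
    env = os.environ if env is None else env
    requested = env.get("OPENBLAS_CORETYPE")
    if requested is None:
        return None
    wanted = requested.strip().lower()
    for name in KNOWN_CORES:
        # Compare case-insensitively, so "barcelona" and "BARCELONA" match.
        if name.lower() == wanted:
            return name
    return None


if __name__ == "__main__":
    core = resolve_coretype()
    if core is not None:
        print("Core:", core)
```

Passing a plain dict instead of os.environ makes the lookup easy to try without changing the real environment.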

To test this, you can checkout my fork at github:
https://github.com/wernsaar/OpenBLAS.git

Best regards

Werner

@jasax

jasax commented Jun 13, 2014

Hi,

I compiled the small Fortran example with no problem.

But the full source tree is a different kind of animal ;-) Perhaps the
errors have to do with the Makefile structure or statements. MinGW uses
mingw-make by default, not GNU make. I usually compile Ruby and Lua
from source without problems, but they don't use gfortran ;-) With some
other software I try to compile with MinGW, I get failures...

Do you know if anyone is regularly compiling OpenBLAS from source
using MinGW (or Cygwin)?

But I saw that a tarball built with debug flags has already been made
available, so I'll give that one a shot tomorrow.

Regards
Jose


@tkelman
Contributor

tkelman commented Jun 13, 2014

Yeah we're building openblas very regularly in msys2 with msys-gmake, and cygwin-to-mingw cross-compile with cygwin-gmake. I suspect mingw-make can't handle the openblas Makefiles properly. Also the mingw-w64 distribution you're using (tdm) is one I've used on other projects, but not with openblas and julia.

@jasax

jasax commented Jun 16, 2014

Hi Werner, Tony,

I finally had some time to test the debug build of libopenblas on my
8-core machine at work. I was away at the end of last week and over the
weekend...

Here is the output of info all-register inside zgemm.

I tried to set the number of threads to 1 (as you suggested), but 2 are
launched (perhaps not overlapping...).

Please send further instructions :-)

Regards

Jose

JAugusto@THOR ~
$ cd /c/Users/JAugusto/Dropbox/Julia/OPenBLAS_Tests/

JAugusto@THOR /c/Users/JAugusto/Dropbox/Julia/OPenBLAS_Tests
$ export OMP_NUM_THREADS=1

JAugusto@THOR /c/Users/JAugusto/Dropbox/Julia/OPenBLAS_Tests
$ gdb a.exe
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests\a.exe...done.
(gdb) break level3.c:358
No source file named level3.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (level3.c:358) pending.
(gdb) run
Starting program: c:\Users\JAugusto\Dropbox\Julia\OPenBLAS_Tests/a.exe
[New Thread 1652.0xd84]
[New Thread 1652.0x2bc]

Breakpoint 1, zgemm_nn (args=0x22f7a0, range_m=0x0, range_n=0x0, sa=0x2010040, sb=0x210c380, dummy=0) at level3.c:358
358 level3.c: No such file or directory.
(gdb) s
zgemm_kernel_n_PILEDRIVER () at ../kernel/x86_64/zgemm_kernel_2x2_piledriver.S:429
429 ../kernel/x86_64/zgemm_kernel_2x2_piledriver.S: No such file or directory.
(gdb) info all-register
rax 0x3ff0000000000000 4607182418800017408
rbx 0x210c380 34653056
rcx 0x4 4
rdx 0x2 2
rsi 0x22fb10 2292496
rdi 0x6e7090 7237776
rbp 0x22f690 0x22f690
rsp 0x22f608 0x22f608
r8 0x4 4
r9 0x6f056860 1862625376
r10 0x2 2
r11 0x4 4
r12 0x1 1
r13 0x8 8
r14 0x0 0
r15 0x0 0
rip 0x6f056860 0x6f056860 <zgemm_kernel_n_PILEDRIVER>
eflags 0x206 [ PF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x37f 895
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
xmm0 {v4_float = {0x0, 0x1, 0x0, 0x0}, v2_double = {0x1, 0x0}, v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x3ff0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x3ff00000, 0x0, 0x0}, v2_int64 = {0x3ff0000000000000, 0x0}, uint128 = 0x00000000000000003ff0000000000000}
xmm1 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm2 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm3 {v4_float = {0x0, 0x1, 0x0, 0x0}, v2_double = {0x1, 0x0}, v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x3ff0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x3ff00000, 0x0, 0x0}, v2_int64 = {0x3ff0000000000000, 0x0}, uint128 = 0x00000000000000003ff0000000000000}
xmm4 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm5 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm6 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm7 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm8 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm9 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm10 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm11 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm12 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm13 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm14 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
xmm15 {v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x00000000000000000000000000000000}
mxcsr 0x1f80 IM DM ZM OM UM PM


@wernsaar
Contributor

Hi,

Thank you for the debug output.
It seems that the registers rdx and rsi hold wrong values, but I don't know
whether gdb only stepped into the assembly code because the source file was not found.

Regards
Werner
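
As a side note on reading the dump: the raw 64-bit patterns can be decoded as IEEE-754 doubles. The short Python sketch below does this for rax and for the two v2_int64 lanes of xmm0, using the values shown in the gdb output. The helper name bits_to_double is hypothetical, and reading xmm0 as (real, imaginary) of the alpha = (1.0d0, 0.0d0) argument from the original test program is an assumption, not something the dump confirms.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Values copied from the gdb output in this thread.
rax = 0x3ff0000000000000
xmm0_v2_int64 = (0x3ff0000000000000, 0x0)

print(bits_to_double(rax))                               # 1.0
print(tuple(bits_to_double(q) for q in xmm0_v2_int64))   # (1.0, 0.0)
```

Both decode to plain values (1.0 and (1.0, 0.0)), so these two registers at least do not look corrupted. The integer registers rdx and rsi mentioned above cannot be checked this way.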

@wernsaar
Contributor

Hi,
please send me the files Makefile_kernel.conf and config_kernel.h

Regards
Werner

@wernsaar
Contributor

I lowered the stack usage for the gemm kernels.
Could you please clone my repository https://github.com/wernsaar/OpenBLAS
and try the tests again?

Thanks
Werner

@tkelman
Contributor

tkelman commented Jun 19, 2014

Thanks Werner. Building from your develop branch now, will post a binary in the Julia issue so the users who were having this problem can test.

@tkelman
Contributor

tkelman commented Jun 19, 2014

Oh, and here are the 2 files you requested. They come from a build of your latest commit 23203d5 on my Sandy Bridge machine (which does not exhibit the bug). The build has DYNAMIC_ARCH enabled, so we will test it on a few Haswell machines to see whether the bug occurs:
https://gist.github.com/tkelman/b6e833ce3333310828dc

@wernsaar
Contributor

Hi,

On SANDYBRIDGE machines, a local stack is not used for the gemm kernels, but
it is on later processors.
I reduced the stack usage on the later processors to max. 8192 MB,
in the hope that this was the bug.

Best regards

Werner

@tkelman
Contributor

tkelman commented Jun 19, 2014

Not sure about AMD yet, but one of the Haswell users (i7-4700MQ) is still seeing incorrect results unless he sets OPENBLAS_CORETYPE=nehalem. We need to come up with a reduced Fortran or C test case for that one, it's not the same issue as this thread.
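
For whoever writes a reduced test case, it helps to have an independent reference result to compare against. The Python sketch below computes the expected output for the 4x4 reproducer from the original report in this issue (a filled with (1.0, 1.0), b filled with 1.0, alpha = 1, beta = 0) using a naive triple loop. The function name reference_gemm is hypothetical, and this is only a reference for the AMD reproducer; a Haswell test case would need its own inputs.

```python
def reference_gemm(alpha, a, b, beta, c):
    """Naive C := alpha*A*B + beta*C for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


n = 4
a = [[complex(1.0, 1.0)] * n for _ in range(n)]
b = [[complex(1.0, 0.0)] * n for _ in range(n)]
c = [[complex(0.0, 0.0)] * n for _ in range(n)]

expected = reference_gemm(complex(1.0, 0.0), a, b, complex(0.0, 0.0), c)
print(expected[0][0])  # (4+4j)
```

Every entry of the correct result is (4.0, 4.0), whereas the faulty zgemm in the original report returned (0.0, 8.0) for every entry.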

@wernsaar
Contributor

Thank you for the test report. Did you run the test on Windows?
On Linux, I can't reproduce the error.

Would it be possible for me to get remote access to a test machine to debug
the gemm kernel?

Best regards

Werner

@tkelman
Contributor

tkelman commented Jun 19, 2014

Yes, Windows 8 x86_64 was the Haswell user running into problems. I just asked him about the possibility of remote debugging in the Julia issue.

My lab got a new Windows Haswell machine recently, I haven't had a chance to test out OpenBLAS on it yet, but should be able to borrow it starting sometime next week.

@wernsaar
Contributor

Hi,

This evening I had another idea. Do you also get wrong results when setting
OPENBLAS_CORETYPE=sandybridge?
If so, this could well be an assembler problem, where the AVX and FMA instructions
are assembled into different code on different platforms.

Best Regards

Werner

@tkelman
Contributor

tkelman commented Jun 19, 2014

Hi Werner, let's leave this issue for discussing the AMD problem, and use #392 for the Haswell one

@wernsaar
Contributor

A lot of bugs that occur when compiling for Windows are now fixed, so I am closing this issue.

@ViralBShah
Contributor

Can you make a 0.2.10 release candidate with these fixes, so that we can try it out?

@xianyi
Collaborator

xianyi commented Jun 29, 2014

Hi @ViralBShah,
I merged @wernsaar's patch and released version v0.2.10.rc1.
https://github.com/xianyi/OpenBLAS/releases/tag/v0.2.10.rc1

@ViralBShah
Contributor

Thank you very much. I have updated julia to use 0.2.10.rc1 and will keep you posted as we verify everything.
