fail to compile for AMD A8-3800 APU #586

Closed
hadyelsahar opened this issue Jun 1, 2015 · 7 comments

@hadyelsahar

I downloaded the latest version from the master branch. When I try to build the source code with make, I get the following error:

getarch_2nd.c: In function ‘main’:
getarch_2nd.c:12:35: error: ‘SGEMM_DEFAULT_UNROLL_M’ undeclared (first use in this function)
printf("SGEMM_UNROLL_M=%d\n", SGEMM_DEFAULT_UNROLL_M);

I can't find the AMD A8 series processors in the TargetList.txt file. Does this mean they are not supported? Is there anything I can do?

@wernsaar
Contributor

wernsaar commented Jun 1, 2015

Hi,

Please download the latest develop branch first.
Then try to run
make TARGET=STEAMROLLER
or
make TARGET=PILEDRIVER

Regards
Werner

@hadyelsahar
Author

I got past the error in the build step, but I hit the same error again when I run make install:

sudo make PREFIX=/opt/OpenBLAS install

getarch_2nd.c: In function ‘main’:
getarch_2nd.c:12:35: error: ‘SGEMM_DEFAULT_UNROLL_M’ undeclared (first use in this function)
     printf("SGEMM_UNROLL_M=%d\n", SGEMM_DEFAULT_UNROLL_M);
                                   ^
getarch_2nd.c:12:35: note: each undeclared identifier is reported only once for each function it appears in
getarch_2nd.c:13:35: error: ‘SGEMM_DEFAULT_UNROLL_N’ undeclared (first use in this function)
     printf("SGEMM_UNROLL_N=%d\n", SGEMM_DEFAULT_UNROLL_N);
                                   ^
getarch_2nd.c:14:35: error: ‘DGEMM_DEFAULT_UNROLL_M’ undeclared (first use in this function)
     printf("DGEMM_UNROLL_M=%d\n", DGEMM_DEFAULT_UNROLL_M);
                                   ^
getarch_2nd.c:15:35: error: ‘DGEMM_DEFAULT_UNROLL_N’ undeclared (first use in this function)
     printf("DGEMM_UNROLL_N=%d\n", DGEMM_DEFAULT_UNROLL_N);
                                   ^

@wernsaar
Contributor

wernsaar commented Jun 3, 2015

Hi,

You also have to pass TARGET=... when running make install.

We need more information about this processor; then we
can update some files so that it is detected correctly.

Best regards
Werner
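
The advice above amounts to using the same TARGET value for both the build and the install step. A minimal sketch, assuming a helper named openblas_commands (hypothetical, not part of OpenBLAS) that only prints the two commands so they can be reviewed before running them inside the source tree:

```shell
# Hypothetical helper: print the build and install commands for one TARGET,
# so that both steps are guaranteed to use the same value.
openblas_commands() {
    target=$1
    prefix=${2:-/opt/OpenBLAS}   # default prefix taken from the thread
    echo "make TARGET=$target"
    echo "sudo make TARGET=$target PREFIX=$prefix install"
}

# Print the commands for the target suggested earlier in the thread.
openblas_commands STEAMROLLER
```

Nothing is executed by the sketch itself; the printed lines are the commands from the thread with TARGET added to the install step.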

@hadyelsahar
Author

Thank you, it seems that it was installed successfully.
However, since this was part of the Torch dependency installation script, I hope this modification didn't break anything.

We need more information about this processor, then we
can update some files for right detection.

What kind of additional information do you need?

Thank you
Regards

@wernsaar
Contributor

wernsaar commented Jun 5, 2015

Hi,

If this is a Linux system, you can send me the output of cat /proc/cpuinfo.

Can you also download and install the latest ACML for AMD? You will find cpuid.exe
in the util folder. Run this program and send me the output.

Best regards
Werner
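
On Linux, the flags line of /proc/cpuinfo lists the instruction-set extensions the CPU reports, which can help when deciding whether a given TARGET is a plausible match. A minimal sketch, assuming a helper named cpu_has_flag (hypothetical) and assuming that avx, fma and fma4 are the flags of interest for the AMD targets mentioned in this thread:

```shell
# Hypothetical helper: succeed if the named flag appears on the first
# "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Split the line into one word per line, then require an exact match,
    # so that asking for "avx" does not match "avx2".
    grep -m 1 '^flags' "$file" | tr ' \t' '\n\n' | grep -qx "$flag"
}

# Report a few flags if /proc/cpuinfo is available on this system.
if [ -r /proc/cpuinfo ]; then
    for f in avx fma fma4; do
        if cpu_has_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
    done
fi
```

This only reads what the kernel reports; it does not replace the full cat /proc/cpuinfo output requested above.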

@hadyelsahar
Author

I think compiling and installing with TARGET=STEAMROLLER hasn't solved the problem, although the build and install succeeded.

When I install Torch, which depends on OpenBLAS, and run its tests, I get a SIGILL error. While debugging, I found that the problem is in the OpenBLAS library, specifically in the function dgemm_oncopy(). Below are the stack trace and the disassembly of the current function.

The stack trace:

#0  0x00007ffff532df20 in dgemm_oncopy () from /opt/OpenBLAS/lib/libopenblas.so.0
#1  0x0000000000000006 in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x00007ffff51bd66f in dtrsm_LNUN () from /opt/OpenBLAS/lib/libopenblas.so.0
#4  0x00007ffff50e9734 in dtrsm_ () from /opt/OpenBLAS/lib/libopenblas.so.0
#5  0x00007ffff562e679 in dtrtrs_ () from /opt/OpenBLAS/lib/libopenblas.so.0
#6  0x00007ffff555bb0f in dgels_ () from /opt/OpenBLAS/lib/libopenblas.so.0
#7  0x00007ffff62c8449 in THDoubleLapack_gels () from /home/hadynew/torch/install/lib/libTH.so
#8  0x00007ffff62bf518 in THDoubleTensor_gels () from /home/hadynew/torch/install/lib/libTH.so
#9  0x00007ffff67f3f2d in torch_DoubleTensor_gels () from /home/hadynew/torch/install/lib/lua/5.1/libtorch.so
#10 0x000000000047cf9a in lj_BC_FUNCC ()
#11 0x00007ffff681cd42 in torch_gels () from /home/hadynew/torch/install/lib/lua/5.1/libtorch.so
#12 0x000000000047cf9a in lj_BC_FUNCC ()
#13 0x000000000046c57d in lua_pcall ()
#14 0x0000000000406f4f in pmain ()
#15 0x000000000047cf9a in lj_BC_FUNCC ()
#16 0x000000000046c5f7 in lua_cpcall ()
#17 0x0000000000404f04 in main ()

Dump of assembler code for function dgemm_oncopy:

   0x00007ffff532de00 <+0>: push   %r13
   0x00007ffff532de02 <+2>: push   %r12
   0x00007ffff532de04 <+4>: lea    0x0(,%rcx,8),%rcx
   0x00007ffff532de0c <+12>:    mov    %rsi,%r10
   0x00007ffff532de0f <+15>:    sar    %r10
   0x00007ffff532de12 <+18>:    jle    0x7ffff532dfd0 <dgemm_oncopy+464>
   0x00007ffff532de18 <+24>:    nopl   0x0(%rax,%rax,1)
   0x00007ffff532de20 <+32>:    mov    %rdx,%r11
   0x00007ffff532de23 <+35>:    lea    (%rdx,%rcx,1),%r12
   0x00007ffff532de27 <+39>:    lea    (%rdx,%rcx,2),%rdx
   0x00007ffff532de2b <+43>:    mov    %rdi,%r9
   0x00007ffff532de2e <+46>:    sar    $0x3,%r9
   0x00007ffff532de32 <+50>:    jle    0x7ffff532df10 <dgemm_oncopy+272>
   0x00007ffff532de38 <+56>:    nopl   0x0(%rax,%rax,1)
   0x00007ffff532de40 <+64>:    prefetchw 0x100(%r8)
   0x00007ffff532de48 <+72>:    prefetchnta 0x100(%r11)
   0x00007ffff532de50 <+80>:    vmovsd (%r11),%xmm0
   0x00007ffff532de55 <+85>:    vmovsd 0x8(%r11),%xmm1
   0x00007ffff532de5b <+91>:    vmovsd 0x10(%r11),%xmm2
   0x00007ffff532de61 <+97>:    vmovsd 0x18(%r11),%xmm3
   0x00007ffff532de67 <+103>:   vmovsd 0x20(%r11),%xmm4
   0x00007ffff532de6d <+109>:   vmovsd 0x28(%r11),%xmm5
   0x00007ffff532de73 <+115>:   vmovsd 0x30(%r11),%xmm6
   0x00007ffff532de79 <+121>:   vmovsd 0x38(%r11),%xmm7
   0x00007ffff532de7f <+127>:   prefetchnta 0x100(%r12)
   0x00007ffff532de88 <+136>:   vmovhpd (%r12),%xmm0,%xmm0
   0x00007ffff532de8e <+142>:   vmovhpd 0x8(%r12),%xmm1,%xmm1
   0x00007ffff532de95 <+149>:   vmovhpd 0x10(%r12),%xmm2,%xmm2
   0x00007ffff532de9c <+156>:   vmovhpd 0x18(%r12),%xmm3,%xmm3
   0x00007ffff532dea3 <+163>:   vmovhpd 0x20(%r12),%xmm4,%xmm4
   0x00007ffff532deaa <+170>:   vmovhpd 0x28(%r12),%xmm5,%xmm5
   0x00007ffff532deb1 <+177>:   vmovhpd 0x30(%r12),%xmm6,%xmm6
   0x00007ffff532deb8 <+184>:   vmovhpd 0x38(%r12),%xmm7,%xmm7
   0x00007ffff532debf <+191>:   prefetchw 0x140(%r8)
   0x00007ffff532dec7 <+199>:   vmovups %xmm0,(%r8)
   0x00007ffff532decc <+204>:   vmovups %xmm1,0x10(%r8)
   0x00007ffff532ded2 <+210>:   vmovups %xmm2,0x20(%r8)
   0x00007ffff532ded8 <+216>:   vmovups %xmm3,0x30(%r8)
   0x00007ffff532dede <+222>:   vmovups %xmm4,0x40(%r8)
   0x00007ffff532dee4 <+228>:   vmovups %xmm5,0x50(%r8)
   0x00007ffff532deea <+234>:   vmovups %xmm6,0x60(%r8)
   0x00007ffff532def0 <+240>:   vmovups %xmm7,0x70(%r8)
   0x00007ffff532def6 <+246>:   add    $0x40,%r11
   0x00007ffff532defa <+250>:   add    $0x40,%r12
   0x00007ffff532defe <+254>:   sub    $0xffffffffffffff80,%r8
   0x00007ffff532df02 <+258>:   dec    %r9
   0x00007ffff532df05 <+261>:   jg     0x7ffff532de40 <dgemm_oncopy+64>
   0x00007ffff532df0b <+267>:   nopl   0x0(%rax,%rax,1)
   0x00007ffff532df10 <+272>:   test   $0x4,%rdi
   0x00007ffff532df17 <+279>:   je     0x7ffff532df80 <dgemm_oncopy+384>
   0x00007ffff532df19 <+281>:   nopl   0x0(%rax)
=> 0x00007ffff532df20 <+288>:   vmovsd (%r11),%xmm0
   0x00007ffff532df25 <+293>:   vmovsd 0x8(%r11),%xmm1
   0x00007ffff532df2b <+299>:   vmovsd 0x10(%r11),%xmm2
   0x00007ffff532df31 <+305>:   vmovsd 0x18(%r11),%xmm3
   0x00007ffff532df37 <+311>:   vmovhpd (%r12),%xmm0,%xmm0
   0x00007ffff532df3d <+317>:   vmovhpd 0x8(%r12),%xmm1,%xmm1
   0x00007ffff532df44 <+324>:   vmovhpd 0x10(%r12),%xmm2,%xmm2
   0x00007ffff532df4b <+331>:   vmovhpd 0x18(%r12),%xmm3,%xmm3
   0x00007ffff532df52 <+338>:   vmovups %xmm0,(%r8)
   0x00007ffff532df57 <+343>:   vmovups %xmm1,0x10(%r8)
   0x00007ffff532df5d <+349>:   vmovups %xmm2,0x20(%r8)
   0x00007ffff532df63 <+355>:   vmovups %xmm3,0x30(%r8)
   0x00007ffff532df69 <+361>:   add    $0x20,%r11
   0x00007ffff532df6d <+365>:   add    $0x20,%r12
   0x00007ffff532df71 <+369>:   sub    $0xffffffffffffffc0,%r8
   0x00007ffff532df75 <+373>:   data32 nopw %cs:0x0(%rax,%rax,1)
   0x00007ffff532df80 <+384>:   mov    %rdi,%r9
   0x00007ffff532df83 <+387>:   and    $0x3,%r9
   0x00007ffff532df87 <+391>:   jle    0x7ffff532dfc0 <dgemm_oncopy+448>
   0x00007ffff532df89 <+393>:   nopl   0x0(%rax)
   0x00007ffff532df90 <+400>:   vmovsd (%r11),%xmm0
   0x00007ffff532df95 <+405>:   vmovhpd (%r12),%xmm0,%xmm0
   0x00007ffff532df9b <+411>:   vmovups %xmm0,(%r8)
   0x00007ffff532dfa0 <+416>:   add    $0x8,%r11
   0x00007ffff532dfa4 <+420>:   add    $0x8,%r12
   0x00007ffff532dfa8 <+424>:   add    $0x10,%r8
   0x00007ffff532dfac <+428>:   dec    %r9
   0x00007ffff532dfaf <+431>:   jg     0x7ffff532df90 <dgemm_oncopy+400>
   0x00007ffff532dfb1 <+433>:   data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x00007ffff532dfc0 <+448>:   dec    %r10
   0x00007ffff532dfc3 <+451>:   jg     0x7ffff532de20 <dgemm_oncopy+32>
   0x00007ffff532dfc9 <+457>:   nopl   0x0(%rax)
   0x00007ffff532dfd0 <+464>:   test   $0x1,%rsi
   0x00007ffff532dfd7 <+471>:   jle    0x7ffff532e050 <dgemm_oncopy+592>
   0x00007ffff532dfd9 <+473>:   mov    %rdx,%r11
   0x00007ffff532dfdc <+476>:   mov    %rdi,%r9
   0x00007ffff532dfdf <+479>:   sar    $0x2,%r9
   0x00007ffff532dfe3 <+483>:   jle    0x7ffff532e020 <dgemm_oncopy+544>
   0x00007ffff532dfe5 <+485>:   data32 nopw %cs:0x0(%rax,%rax,1)
   0x00007ffff532dff0 <+496>:   vmovups (%r11),%xmm0
   0x00007ffff532dff5 <+501>:   vmovups 0x10(%r11),%xmm1
   0x00007ffff532dffb <+507>:   vmovups %xmm0,(%r8)
   0x00007ffff532e000 <+512>:   vmovups %xmm1,0x10(%r8)
   0x00007ffff532e006 <+518>:   add    $0x20,%r11
   0x00007ffff532e00a <+522>:   sub    $0xffffffffffffffe0,%r8
   0x00007ffff532e00e <+526>:   dec    %r9
   0x00007ffff532e011 <+529>:   jg     0x7ffff532dff0 <dgemm_oncopy+496>
   0x00007ffff532e013 <+531>:   data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x00007ffff532e020 <+544>:   mov    %rdi,%r9
   0x00007ffff532e023 <+547>:   and    $0x3,%r9
   0x00007ffff532e027 <+551>:   jle    0x7ffff532e050 <dgemm_oncopy+592>
   0x00007ffff532e029 <+553>:   nopl   0x0(%rax)
   0x00007ffff532e030 <+560>:   vmovsd (%r11),%xmm0
   0x00007ffff532e035 <+565>:   vmovsd %xmm0,(%r8)
   0x00007ffff532e03a <+570>:   add    $0x8,%r11
   0x00007ffff532e03e <+574>:   add    $0x8,%r8
   0x00007ffff532e042 <+578>:   dec    %r9
   0x00007ffff532e045 <+581>:   jg     0x7ffff532e030 <dgemm_oncopy+560>
   0x00007ffff532e047 <+583>:   nopw   0x0(%rax,%rax,1)
   0x00007ffff532e050 <+592>:   pop    %r12
   0x00007ffff532e052 <+594>:   pop    %r13
   0x00007ffff532e054 <+596>:   retq   

I'll post the processor details in an upcoming comment.

Thank you
Regards

@wernsaar
Contributor

wernsaar commented Jun 6, 2015

Hi,

It looks like there is stack corruption:
frames #1 and #2 of the stack trace are strange.
Check the maximum stack size with ulimit -s.
Try setting the stack size with ulimit -s 16384.
If Torch uses threads, you should build OpenBLAS with the flag
FCOMMON_OPT=-frecursive.
Run lapack-test after you build OpenBLAS.

Regards
Werner
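
The steps above can be sketched as a short shell session. This is an illustration only: it assumes that lapack-test refers to the make target of that name and that the same TARGET as before is reused, and the build commands are shown as comments because they only work inside the OpenBLAS source tree.

```shell
# Show the current soft limit on stack size (in KiB, or "unlimited").
ulimit -s

# Try to change it for this shell and its children; this can fail if the
# hard limit is lower, so report that instead of aborting.
ulimit -s 16384 2>/dev/null || echo "could not set stack size to 16384 KiB"

# Rebuild with the suggested Fortran flag, then run the LAPACK tests
# (assumed invocation, run from the OpenBLAS source directory):
#   make clean
#   make TARGET=STEAMROLLER FCOMMON_OPT=-frecursive
#   make lapack-test
```

The ulimit setting applies only to the current shell and the processes started from it, so the Torch tests would need to be launched from that same shell.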
