wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

FreyJo · 2018-02-24T16:34:10Z

Hey guys,

I found a bug in blasfeo_dgemv_t:
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
Running the following example, I get wrong results using both HIGH PERFORMANCE and REFERENCE:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 9;
    int nu = 2;

    struct blasfeo_dmat A;
    struct blasfeo_dvec lambda;
    struct blasfeo_dvec lambda0;

    blasfeo_allocate_dmat(nx, nx, &A); //    
    blasfeo_allocate_dvec(nx+nu, &lambda);
    blasfeo_allocate_dvec(nx+nu, &lambda0);

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
    }
    blasfeo_dgese(nx, nx, 1.0, &A, 0, 0);

    blasfeo_print_dmat(nx, nx, &A, 0, 0);
    printf("lambda = \n");
    blasfeo_print_dvec(nx+nu, &lambda, 0);    

    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 0.0, &lambda0, 0, &lambda, 0); // recheck!
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+  nu, &lambda,0);

    return 0;
}

REFERENCE prints:

lambda: result =
3.600000e+01
3.600000e+01
1.070000e+02
1.070000e+02
3.160000e+02
3.160000e+02
9.390000e+02
9.390000e+02
2.804000e+03
9.000000e+00
1.000000e+01

whereas HP prints:

lambda: result =
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
2.960000e+02
9.000000e+00
1.000000e+01

The correct result should be:

I hope this helps fixing stuff!

The text was updated successfully, but these errors were encountered:

tmmsartor · 2018-02-24T22:36:47Z

Thanks for reporting, I guess the problem is that you passed the same chunk of memory both for result z and for the multiplication vector x, this is forbidden also in standard BLAS implementation.
The only difference is that REF_BLAS overwrites addition vector y with the result.

y := alpha * A * x + beta * y // BLAS
z := alpha * A * x + beta * y // BLASFEO

Despite the fact y and z may be different in BLASFEO z and x must be different in both cases,
i.e. in Netlib_BLAS implementation this is directly enforced by the language. In Fortran having aliased arguments breaks the language standard.
Fun fact: this is also one of the reasons someone believe that Fortran is faster than C.

With the following modifications to your code I'm getting the expected both with HP and REF,
if you need indeed to overwrite vector x, in case, you have to make a copy of the vector.

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
        blasfeo_dvecin1((double) 1, &lambda0, ii);
    }
   ...
    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 1.0, &lambda0, 0, &lambda0, 0);
    printf("lambda0: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda0, 0);
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda, 0);

lambda0: result =
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
1.000000e+00
1.000000e+00

lambda: result =
0.000000e+00
1.000000e+00
2.000000e+00
3.000000e+00
4.000000e+00
5.000000e+00
6.000000e+00
7.000000e+00
8.000000e+00
9.000000e+00
1.000000e+01

Thanks again for reporting we will add more documentation about this kind of unexpected behaviors.
Maybe it would also be helpful to add some check inside the routines to avoid those.
ping @giaf

FreyJo · 2018-02-25T11:31:37Z

Oops, I should have checked this before..
But thanks a lot for your reply!

giaf · 2018-02-25T15:12:33Z

Yep it's true, x and z can not be the same. The reason is that, the elements of x will change as soon as the elements of z are computed. x is not stored internally, so if x and z are the same vector, you see that this can not work.

The same applies for most other routines (as e.g. dgemm), with the only exception of triangular matrices. In that case, it is possible to multiply or solver by a triangular matrix "in place" by performing the computations in the proper order. This comes from the triangular shape of the matrix: on the edge side, only one element of x is involved, and once the operation is performed, it can be overwritten with the result z. The standard BLAS interface acknowledge this by requiring to overwrite the result for triangular matrices; the BLASFEO interface allows for this, but does not force it.

FreyJo closed this as completed Feb 25, 2018

tmmsartor mentioned this issue Feb 26, 2018

Argument aliasing check #30

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

FreyJo commented Feb 24, 2018

tmmsartor commented Feb 24, 2018 •

edited

Loading

FreyJo commented Feb 25, 2018

giaf commented Feb 25, 2018

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

Comments

FreyJo commented Feb 24, 2018

tmmsartor commented Feb 24, 2018 • edited Loading

FreyJo commented Feb 25, 2018

giaf commented Feb 25, 2018

tmmsartor commented Feb 24, 2018 •

edited

Loading