Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

Closed
FreyJo opened this issue Feb 24, 2018 · 3 comments
Closed

wrong results using blasfeo_dgemv_t - HP and REFERENCE #27

FreyJo opened this issue Feb 24, 2018 · 3 comments

Comments

@FreyJo
Copy link
Contributor

FreyJo commented Feb 24, 2018

Hey guys,

I found a bug in blasfeo_dgemv_t:
I am using BLASFEO_TARGET = X64_INTEL_SANDY_BRIDGE.
Running the following example, I get wrong results using both HIGH PERFORMANCE and REFERENCE:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// blasfeo
#include <blasfeo/include/blasfeo_target.h>
#include <blasfeo/include/blasfeo_common.h>
#include <blasfeo/include/blasfeo_d_aux.h>
#include <blasfeo/include/blasfeo_d_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_v_aux_ext_dep.h>
#include <blasfeo/include/blasfeo_d_blas.h>
int main() {
    int nx = 9;
    int nu = 2;

    struct blasfeo_dmat A;
    struct blasfeo_dvec lambda;
    struct blasfeo_dvec lambda0;

    blasfeo_allocate_dmat(nx, nx, &A); //    
    blasfeo_allocate_dvec(nx+nu, &lambda);
    blasfeo_allocate_dvec(nx+nu, &lambda0);

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
    }
    blasfeo_dgese(nx, nx, 1.0, &A, 0, 0);

    blasfeo_print_dmat(nx, nx, &A, 0, 0);
    printf("lambda = \n");
    blasfeo_print_dvec(nx+nu, &lambda, 0);    

    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 0.0, &lambda0, 0, &lambda, 0); // recheck!
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+  nu, &lambda,0);

    return 0;
}

REFERENCE prints:

lambda: result =
3.600000e+01
3.600000e+01
1.070000e+02
1.070000e+02
3.160000e+02
3.160000e+02
9.390000e+02
9.390000e+02
2.804000e+03
9.000000e+00
1.000000e+01

whereas HP prints:

lambda: result =
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
3.600000e+01
2.960000e+02
9.000000e+00
1.000000e+01

The correct result should be:

lambda =

    36
    36
    36
    36
    36
    36
    36
    36
    36
     9
    10

I hope this helps fixing stuff!

@tmmsartor
Copy link
Contributor

tmmsartor commented Feb 24, 2018

Thanks for reporting, I guess the problem is that you passed the same chunk of memory both for result z and for the multiplication vector x, this is forbidden also in standard BLAS implementation.
The only difference is that REF_BLAS overwrites addition vector y with the result.

y := alpha * A * x + beta * y // BLAS
z := alpha * A * x + beta * y // BLASFEO

Despite the fact y and z may be different in BLASFEO z and x must be different in both cases,
i.e. in Netlib_BLAS implementation this is directly enforced by the language. In Fortran having aliased arguments breaks the language standard.
Fun fact: this is also one of the reasons someone believe that Fortran is faster than C.

With the following modifications to your code I'm getting the expected both with HP and REF,
if you need indeed to overwrite vector x, in case, you have to make a copy of the vector.

    for (int ii = 0; ii < nx+nu; ii++) {
        blasfeo_dvecin1((double) ii, &lambda, ii);
        blasfeo_dvecin1((double) 1, &lambda0, ii);
    }
   ...
    blasfeo_dgemv_t(nx, nx, 1.0, &A, 0, 0, &lambda, 0, 1.0, &lambda0, 0, &lambda0, 0);
    printf("lambda0: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda0, 0);
    printf("lambda: result = \n");
    blasfeo_print_exp_dvec(nx+nu, &lambda, 0);

lambda0: result =
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
3.700000e+01
1.000000e+00
1.000000e+00

lambda: result =
0.000000e+00
1.000000e+00
2.000000e+00
3.000000e+00
4.000000e+00
5.000000e+00
6.000000e+00
7.000000e+00
8.000000e+00
9.000000e+00
1.000000e+01

Thanks again for reporting we will add more documentation about this kind of unexpected behaviors.
Maybe it would also be helpful to add some check inside the routines to avoid those.
ping @giaf

@FreyJo
Copy link
Contributor Author

FreyJo commented Feb 25, 2018

Oops, I should have checked this before..
But thanks a lot for your reply!

@FreyJo FreyJo closed this as completed Feb 25, 2018
@giaf
Copy link
Owner

giaf commented Feb 25, 2018

Yep it's true, x and z can not be the same. The reason is that, the elements of x will change as soon as the elements of z are computed. x is not stored internally, so if x and z are the same vector, you see that this can not work.

The same applies for most other routines (as e.g. dgemm), with the only exception of triangular matrices. In that case, it is possible to multiply or solver by a triangular matrix "in place" by performing the computations in the proper order. This comes from the triangular shape of the matrix: on the edge side, only one element of x is involved, and once the operation is performed, it can be overwritten with the result z. The standard BLAS interface acknowledge this by requiring to overwrite the result for triangular matrices; the BLASFEO interface allows for this, but does not force it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants