Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out-of-bounds in xORBDB2 and xORBDB3 #551

Conversation

weslleyspereira
Copy link
Collaborator

@weslleyspereira weslleyspereira commented May 11, 2021

Fix #549.

Description
In #549, we verified xORBDB2 and xORBDB3 use subarrays of size 0 starting at X(N+1), where N is the size of X. These arrays are passed as arguments to other subroutines that never reference the arrays. Therefore, this is a false positive case of out-of-bounds. The code provokes no errors in practice, but some compilers (like gfortran) identify and return an out-of-bounds error. This PR tries to rewrite the code while keeping the check of bounds. [edited]

Checklist

  • Fix sORBDB2
  • Fix sORBDB3
  • Fix dORBDB2 and dORBDB3

@weslleyspereira
Copy link
Collaborator Author

The strategy was:

  1. Remove I=P and I=Q from the loops.

  2. Replace

SLARFGP( 1, ALPHA, OUTOFBOUNDS_V, LDV, TAU )

by

TAU = 1 - SIGN( 1, ALPHA )
ALPHA = ABS( ALPHA )
  1. Remove calls of SLARF where the output argument was invalid.

@codecov
Copy link

codecov bot commented May 11, 2021

Codecov Report

Merging #551 (5a5d229) into master (2dafa3d) will decrease coverage by 0.00%.
The diff coverage is 67.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #551      +/-   ##
==========================================
- Coverage   82.37%   82.36%   -0.01%     
==========================================
  Files        1894     1894              
  Lines      190677   190729      +52     
==========================================
+ Hits       157063   157099      +36     
- Misses      33614    33630      +16     
Impacted Files Coverage Δ
SRC/cunbdb3.f 40.00% <15.78%> (-1.54%) ⬇️
SRC/zunbdb3.f 39.43% <15.78%> (-1.48%) ⬇️
SRC/dorbdb3.f 41.79% <20.00%> (-1.76%) ⬇️
SRC/sorbdb3.f 41.79% <20.00%> (-1.76%) ⬇️
SRC/cunbdb1.f 87.30% <100.00%> (+0.63%) ⬆️
SRC/cunbdb2.f 88.73% <100.00%> (+0.85%) ⬆️
SRC/dorbdb1.f 86.66% <100.00%> (+0.70%) ⬆️
SRC/dorbdb2.f 88.23% <100.00%> (+0.93%) ⬆️
SRC/sorbdb1.f 86.66% <100.00%> (+0.70%) ⬆️
SRC/sorbdb2.f 88.23% <100.00%> (+0.93%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2dafa3d...5a5d229. Read the comment docs.

@langou
Copy link
Contributor

langou commented May 11, 2021

SLARFGP( 1, ALPHA, OUTOFBOUNDS_V, LDV, TAU )

Not related to this specific PR. But this reminds me that there is a bug in xLARFGP that I think we never managed to fix, and we initially inserted xLARFGP everywhere, and then removed it. It worries me that some routines in LAPACK are using xLARFGP. I do not know what to do about it since I do not know whether the code relies on the P or not. One idea would be to try with xLARFG and see if it works. Or we can leave it as is. It is possible that the usage of xLARFGP in this routine makes it such that xLARFGP is stable. All in all, I do not know.

@christoph-conrads
Copy link
Contributor

christoph-conrads commented May 11, 2021

I do not know what to do about it since I do not know whether the code relies on the P or not.

The computed beta values are used as atan2 inputs. Without further review, one must assume that the code relies on the beta values being positive.

Description
In #549, it was highlighted xORBDB2 and xORBDB3 had several out-of-bounds accesses. This PR tries to fix some of them.

There are no out-of-bounds accesses. Given an array X with valid indices 1, 2, ..., N, the code references the subarray of length zero starting at X( N + 1).

Checklist

* [x]  Fix sORBDB2

* [x]  Fix sORBDB3

* [ ]  Fix dORBDB2 and dORBDB3

Add to that:

  • SORBDB1 (for when the column count Q is the smallest dimension)
  • The array accesses in the other do-loop in SORBDB{1,2,3} when the smaller submatrix has more than zero rows
  • DORBDB{1,2,3}
  • CUNBDB{1,2,3}
  • ZUNBDB{1,2,3}

@weslleyspereira
Copy link
Collaborator Author

weslleyspereira commented May 13, 2021

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

* I updated the description of this PR.

@christoph-conrads
Copy link
Contributor

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

  • I updated the description of this PR.

Valgrind and address sanitizers work well for me. They also avoid issue #339 when calling LAPACK code from C with the function prototypes in LAPACKE/include/lapack.h (the header omits the hidden arguments of type size_t that contain the length of the strings passed to the callee; consequently C callers do not initialize these values).

Valgrind can be used with any executable: valgrind --valgrind-arg1 --valgrind-arg2 -- my-program --my-arg. Address sanitizers can be used by passing -fsanitize=address on the compiler and linker command line. It is also possible to use address sanitizers without re-building the code by pre-loading them (e.g., env LD_PRELOAD=/path/to/libasan.so my-executable --my-args) but I am not sure which problems can be detected this way. On some Linux distributions, the address sanitizer libraries are not part of the GCC package installed by the system's package manager (e.g., on CentOS); use of the sanitizer will then cause linker errors.

@christoph-conrads
Copy link
Contributor

I see... If one wants to fix all of them, we would need to rewrite a number of algorithms with no real improvement in practice. I couldn't think of a good way to continue using -fcheck=bounds and avoid the false positives. Any ideas?

Just found in sgeqr2.f: Use X( MIN(I+1, N) ).

lapack/SRC/sgeqr2.f

Lines 177 to 182 in 2dafa3d

DO 10 I = 1, K
*
* Generate elementary reflector H(i) to annihilate A(i+1:m,i)
*
CALL SLARFG( M-I+1, A( I, I ), A( MIN( I+1, M ), I ), 1,
$ TAU( I ) )

@weslleyspereira weslleyspereira force-pushed the fix-out--of--bounds-in-ORBDB-2-and-3-routines branch from 9b9443b to 9cd55e4 Compare May 14, 2021 12:57
@weslleyspereira weslleyspereira marked this pull request as ready for review May 14, 2021 12:57
@weslleyspereira
Copy link
Collaborator Author

Thanks, @christoph-conrads! That was easy to implement. I followed your suggestions and created a new commit. I am now applying the fixes to xORBDB1, and then I will look at the complex cases.

@weslleyspereira
Copy link
Collaborator Author

Done! I hope I didn't miss anything.

@christoph-conrads
Copy link
Contributor

christoph-conrads commented May 14, 2021

Thanks, @christoph-conrads! That was easy to implement. I followed your suggestions and created a new commit. I am now applying the fixes to xORBDB1, and then I will look at the complex cases.

This was not a suggestion, I just responded to your request for ideas, and I think this is a bad idea.

These changes trade readability for -fcheck=bounds shutting up about alleged out-of-bounds accesses in the special case where the matrix at hand is not a submatrix of a bigger matrix. Valgrind and the address sanitizer can detect OOB access in this special case, too. In any other case, -fcheck=bounds will not help. Moreover, if there is any numerical mistake in those 250+ lines touched by this commit, none of these tools will help and given variable names like I, I1, I2, I3, and I4, the humans cannot help either.

PS: I gave up on -fcheck=bounds months ago and rely only on the address sanitizer as well as Valgrind.

@weslleyspereira
Copy link
Collaborator Author

I will drop this PR. I don't think this is a good idea anymore. I agree with the sentence:

These changes trade readability for -fcheck=bounds shutting up about alleged out-of-bounds accesses in the special case where the matrix at hand is not a submatrix of a bigger matrix.

I think it would be good to have a routine in the CI to run the tests under Valgrind. Thanks @christoph-conrads!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

xUNCSD2BY1 does not handle submatrices with empty rows properly
3 participants