[test_conv2d] Memory access fault #142
Comments
@shurale-nkn Please delete your local user performance database and retest.
The problem related to a02171c is probably fixed. However I think that the initial issue still persists. I mean this one:
When there is no perfdb record (or the record is invalid), the Solver should provide a valid default perf-config.
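A minimal sketch of the fallback described above, assuming a hypothetical PerfConfig type, a made-up validity rule, and made-up default values (none of these are MIOpen's actual types or numbers):

```cpp
#include <cassert>
#include <optional>

// Hypothetical tuning parameters; NOT MIOpen's real perf-config type.
struct PerfConfig {
    int tile_w = 0;
    int tile_h = 0;
};

// Assumed validity rule for illustration: a positive power of two, at most 32.
bool IsValidTile(int v) { return v > 0 && v <= 32 && (v & (v - 1)) == 0; }

bool IsValid(const PerfConfig& c) { return IsValidTile(c.tile_w) && IsValidTile(c.tile_h); }

// Assumed default; a real solver would pick values known to be safe for it.
PerfConfig DefaultConfig() { return PerfConfig{8, 8}; }

// db_record is empty when the perf-db has no entry for the problem.
// A missing or invalid record never reaches the kernel: the default is used instead.
PerfConfig SelectConfig(const std::optional<PerfConfig>& db_record) {
    if (db_record && IsValid(*db_record)) {
        return *db_record;
    }
    const PerfConfig fallback = DefaultConfig();
    assert(IsValid(fallback)); // the default itself must satisfy the validity rule
    return fallback;
}
```

The point of the sketch is that validation happens on every lookup, so a stale or corrupted user database cannot produce a configuration the kernel was not written for.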
This shouldn't be released. Also, we see memory access faults on Jenkins from time to time, and they are hard to reproduce. This matches the random nature of this issue, so it is the "suspect zero".
The cause of the issue is that the OpenCL kernel writes to wrong memory locations. @TejashShah do you have time for this? If not, then we'll introduce a workaround that disables this solver.
@TejashShah, @shurale-nkn, and @atamazov, is this issue still at large? Do we have a plan to address it?
Further TODOs:
Originally posted by @atamazov in #179 (comment)
@atamazov do we have a plan to implement these TODOs? (or did you mean to add the "priority_medium" ticket :D )
The plan is to resolve this ticket, see my previous comment.
@aserio The priority_high label is correct here, because of the possibility of undefined behavior (UB).
@atamazov and @TejashShah, have you been able to disable asymmetric padding for all OCL kernels and disable asymmetric padding for asm kernels that do not support it? Who is responsible for creating this pull request?
No (no time). Me.
@TejashShah will look into this issue.
@atamazov I spent some time trying to understand this issue. You are right to point out that we should disable asymmetric padding across kernels. Even if someone wrote a kernel that supports asymmetric padding, how would a caller (a framework) describe a problem with asymmetric padding? Our current problem description seems to assume symmetric padding, with the following variables defined in ProblemDescription: int pad_h = 0; There is no pad_h_top, pad_h_bottom, pad_w_left, or pad_w_right to convey asymmetric padding. So I believe that, in general, asymmetric padding is already effectively disabled across the board. I tried the following configs, which I thought were asymmetric cases. None of them attempted any solver except gemm. So, at present, there appears to be some sort of guard that prevents solvers from running in asymmetric cases.
Do you have specific configs that you find problematic in MIOpen due to the missing asymmetric padding check?
@atamazov Essentially, I am trying to find failing asymmetric configs (segfault or verification failure) that MIOpen currently exercises and that would no longer be exercised once I change applicability to explicitly disable asymmetric cases.
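The applicability check discussed above can be sketched as follows. This is only an illustration: PaddedProblem and its per-side fields are hypothetical (as noted above, the current ProblemDescription has no such fields), and IsApplicable here is a free function made up for the example, not a solver's real interface.

```cpp
// Hypothetical problem description carrying per-side padding.
// These fields do NOT exist in the current ProblemDescription.
struct PaddedProblem {
    int pad_h_top = 0;
    int pad_h_bottom = 0;
    int pad_w_left = 0;
    int pad_w_right = 0;
};

// Padding is symmetric when both sides match along each axis.
bool IsSymmetricPadding(const PaddedProblem& p) {
    return p.pad_h_top == p.pad_h_bottom && p.pad_w_left == p.pad_w_right;
}

// Hypothetical applicability rule: a kernel that does not explicitly
// support asymmetric padding is rejected for asymmetric problems.
bool IsApplicable(const PaddedProblem& p, bool kernel_supports_asymmetric) {
    return kernel_supports_asymmetric || IsSymmetricPadding(p);
}
```

With a rule like this, an asymmetric config is filtered out before a kernel is selected instead of being passed to a kernel that assumes symmetric padding.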
Let's open a ticket for asymmetrical padding support.
#341 merged
Error in test_conv2d from MIOpenConvBwdWrWS2.cl
System:
Vega 20
HIP
ROCm 3.0.6
Checked commits: ce986b2, 3a716fe, fcd5563, 481d6b9, cea6064, a02171c
To reproduce:
Reproducible only with the BWD + WRW combination. The error is present in all combinations, but only in this one does it show up as a "Memory access fault".