-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-7015 event: proper event handing in case of progress errors (#4911) #4961
Conversation
crt_progress can sometimes return errors that are not timeout. Those are not properly handled in synchronous IO calls and cause the private event to be in an unusable state for following IOs. This PR checks the error from cart and reinits the private event in case an error was returned so it can be resused rather then returning to the user. Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4961/2/execution/node/1093/log |
* other progress failure; op should fail with that err. reset | ||
* the private event first so it can be resused. | ||
*/ | ||
rc2 = daos_event_priv_reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should never fail for private event, we might just assert it's zero.
if (rc == 0) | ||
|
||
/** progress succeeded, loop can exit if event completed */ | ||
if (rc == 0) { | ||
rc = ev_thpriv.ev_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"rc" will be overwritten by the next crt_progress_cond() immediately, we might not need to save it?
The main behavior changed by this patch is, we originally exit from the loop for ev_thpriv.ev_error != 0, now we don't do that anymore, is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought that if the error is set, then the loop will exit anyway in the check?
maybe im wrong..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you're right though, we should exit the loop. i will fix this first on master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created this PR.
#4981
will integrate into this 1.2 pr once that lands
- exit loop if event error is reported Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
nlt warnings here seems because there is no reference build detected. |
Agreed. This has now been fixed for new builds/PRs. |
crt_progress can sometimes return errors that are not timeout. Those
are not properly handled in synchronous IO calls and cause the private
event to be in an unusable state for following IOs.
This PR checks the error from cart and reinits the private event in
case an error was returned so it can be resused rather then returning
to the user.
Signed-off-by: Mohamad Chaarawi mohamad.chaarawi@intel.com