Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-7015 event: proper event handing in case of progress errors (#4911) #4961

Merged
merged 2 commits into from
Mar 16, 2021

Conversation

mchaarawi
Copy link
Contributor

crt_progress can sometimes return errors that are not timeout. Those
are not properly handled in synchronous IO calls and cause the private
event to be in an unusable state for following IOs.

This PR checks the error from cart and reinits the private event in
case an error was returned so it can be resused rather then returning
to the user.

Signed-off-by: Mohamad Chaarawi mohamad.chaarawi@intel.com

crt_progress can sometimes return errors that are not timeout. Those
are not properly handled in synchronous IO calls and cause the private
event to be in an unusable state for following IOs.

This PR checks the error from cart and reinits the private event in
case an error was returned so it can be resused rather then returning
to the user.

Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@daosbuild1
Copy link
Collaborator

Test stage Scan CentOS 7 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4961/2/execution/node/1093/log

* other progress failure; op should fail with that err. reset
* the private event first so it can be resused.
*/
rc2 = daos_event_priv_reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should never fail for private event, we might just assert it's zero.

if (rc == 0)

/** progress succeeded, loop can exit if event completed */
if (rc == 0) {
rc = ev_thpriv.ev_error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"rc" will be overwritten by the next crt_progress_cond() immediately, we might not need to save it?
The main behavior changed by this patch is, we originally exit from the loop for ev_thpriv.ev_error != 0, now we don't do that anymore, is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought that if the error is set, then the loop will exit anyway in the check?
maybe im wrong..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you're right though, we should exit the loop. i will fix this first on master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created this PR.
#4981
will integrate into this 1.2 pr once that lands

- exit loop if event error is reported

Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@mchaarawi mchaarawi changed the title DAOS-6958 event: proper event handing in case of progress errors (#4911) DAOS-7009 event: proper event handing in case of progress errors (#4911) Mar 12, 2021
@mchaarawi mchaarawi changed the title DAOS-7009 event: proper event handing in case of progress errors (#4911) DAOS-7015 event: proper event handing in case of progress errors (#4911) Mar 12, 2021
@mchaarawi mchaarawi requested review from gnailzenh and a team March 15, 2021 19:48
@mchaarawi
Copy link
Contributor Author

nlt warnings here seems because there is no reference build detected.

@ashleypittman
Copy link
Contributor

nlt warnings here seems because there is no reference build detected.

Agreed.

This has now been fixed for new builds/PRs.

@jolivier23 jolivier23 merged commit 9e160e0 into release/1.2 Mar 16, 2021
@jolivier23 jolivier23 deleted the DAOS-6958 branch March 16, 2021 22:57
@ashleypittman ashleypittman mentioned this pull request Apr 28, 2021
@ashleypittman ashleypittman mentioned this pull request May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants