Delete: remove the metafile check #209
Conversation
For some reason, after updating the metafile and then reading it back immediately, the read may not reflect the update in time; even though the deletions are all successful, the command still returns failure. Checking the ->exit status is sufficient.
For the case:
* When gluster-block delete was executed for the first time on this node, the deletion was successful, but before it could send the response, gluster-blockd died (the addr meta status would be CLEANUPFAIL).
For this case we can also check the ->exit status instead of reading it from the metafile.
The deletion failure has also been seen in customer cases.
Fixes: gluster#204
Signed-off-by: Xiubo Li <xiubli@redhat.com>
For some reason, after updating the metafile and then reading it back
immediately, the read may not reflect the update in time; even though
the deletions are all successful, the command still returns failure.
@lxbsz This shouldn't happen, please see the implementation of GB_METAUPDATE_OR_GOTO().
We basically open the metafile and close it in place, which should flush the data to the metafile.
Can you check whether we were using the previously read values, or deferring the blockGetMetaInfo() call until after the update?
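For readers unfamiliar with the update path, below is a minimal sketch of the "open the metafile, append the update, close it" pattern described above, written directly against libgfapi. It is not the actual GB_METAUPDATE_OR_GOTO macro; the volume name, host, metafile path, and the meta_append() helper are placeholders, and the metafile is assumed to already exist.

```c
/* Sketch only: illustrates the open/append/close pattern described above,
 * not the real GB_METAUPDATE_OR_GOTO macro from gluster-block. */
#include <glusterfs/api/glfs.h>
#include <fcntl.h>
#include <string.h>

static int
meta_append(glfs_t *fs, const char *metapath, const char *line)
{
    glfs_fd_t *fd;
    ssize_t    ret;

    /* Open for append; closing the fd below is what is expected to make
     * the update visible to the next reader of the metafile. */
    fd = glfs_open(fs, metapath, O_WRONLY | O_APPEND);
    if (!fd)
        return -1;

    ret = glfs_write(fd, line, strlen(line), 0);
    glfs_close(fd);

    return (ret < 0) ? -1 : 0;
}

int
main(void)
{
    glfs_t *fs = glfs_new("block-hosting-vol");   /* placeholder volume name */
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "localhost", 24007);
    if (glfs_init(fs))
        return 1;

    /* Placeholder metafile path and status line. */
    meta_append(fs, "/block-meta/sample-block", "CLEANUPSUCCESS\n");
    glfs_fini(fs);
    return 0;
}
```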
For the case:
- When gluster-block delete was executed for the first time on this node, the deletion was successful, but before it could send the response, gluster-blockd died (the addr meta status would be CLEANUPFAIL).
For this case we can also check the ->exit status instead of reading it from the metafile.
Yes, probably this was achieved by 0a65c01?
So with the above PR, we can probably remove this extra read.
I will test this case and get back; for now this PR looks good to me.
Thanks!
Yeah, I saw that. It has the SYNC flag, but it does not seem to be working here in gluster.
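For context, "the SYNC flag" here presumably refers to opening the metafile with O_SYNC so each write completes before returning; whether commit 0a65c01 does exactly this is an assumption. A minimal sketch of that kind of open, with a hypothetical helper name:

```c
/* Assumption: the SYNC flag discussed above means O_SYNC on the metafile
 * open. Helper name is illustrative only. */
#include <glusterfs/api/glfs.h>
#include <fcntl.h>

glfs_fd_t *
open_meta_sync(glfs_t *fs, const char *metapath)
{
    /* O_SYNC asks the filesystem to complete each write (data and
     * associated metadata) before glfs_write() returns. */
    return glfs_open(fs, metapath, O_WRONLY | O_APPEND | O_SYNC);
}
```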
I checked this already: it calls blockGetMetaInfo() again and allocates a new info struct. Do you see the code I removed in this PR? That is the code doing this, and the same applies to the modify-size case. In tcmu-runner I have added a retry that waits up to 5 seconds (sketched after the logs below), and that works well; it is the same situation as the delete case. I can reproduce this very easily with the tests/basic.t script on RHEL 7. Thanks.
The test cases:
The logs:
From the mountpoint:
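The retry/wait workaround mentioned above lives in tcmu-runner (open-iscsi/tcmu-runner#546); the sketch below only conveys the idea, and the helper name and polling interval are hypothetical.

```c
/* Rough sketch of the retry-for-5-seconds workaround described above.
 * Names are hypothetical, not the actual tcmu-runner change. */
#include <stdbool.h>
#include <unistd.h>

/* Assumed helper: re-reads the block metadata and reports whether the
 * deletion is marked successful there. */
extern bool meta_shows_cleanup_success(const char *block_name);

static bool
wait_for_meta_update(const char *block_name)
{
    /* Poll once per second for at most 5 seconds. */
    for (int i = 0; i < 5; i++) {
        if (meta_shows_cleanup_success(block_name))
            return true;
        sleep(1);
    }
    return false;
}
```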
@pranithk are you aware of any similar bug in the glusterfs api? Many thanks!
@lxbsz BTW, I'm using the latest gluster-block master (+ PR#205) with glusterfs-api-5.5-1.fc29.x86_64 on Fedora 29, kernel 5.0.3-200.fc29.x86_64. I have run this test at least 25 times and still have not hit the issue. Can you confirm your glusterfs-api version and see if the above-mentioned version works for you as well?
I am using RHEL 7 with glusterfs-api-devel-6.0-0.4.rc1.fc29.x86_64 and the gluster-block/tcmu-runner upstream code. Will try this later.
Sorry, I didn't get the context. Which bug in the glusterfs api are you referring to?
@pranithk Xiubo hit a case where he gets old block metadata from blockGetMetaInfo() even after updating the block metadata with GB_METAUPDATE_OR_GOTO(). If you remember, GB_METAUPDATE_OR_GOTO opens the metafile, writes the update, and closes it in place. I have tested the same with glusterfs-api-5.5-1.fc29.x86_64 and it works for me as expected. @pranithk so I wanted to check with you whether you are aware of any such bugs. @lxbsz Hacking around it like open-iscsi/tcmu-runner#546 is not preferred; instead we need to report this and get it fixed in the filesystem itself. Thanks!
Yeah, agreed. But we cannot be sure that users or customers will be aware of this; in tcmu-runner we just try to avoid it when they are using the buggy glusterfs version, but that does not fix it completely. Makes sense?
@pkalever / @lxbsz Okay, makes sense. Could you raise a bug on gluster bugzilla with the steps, with the component set to 'core'? I am not sure which xlator could have led to this problem; I will need to git-bisect to figure that out and may have to reassign the bug to another developer based on that info. Please make sure to give the exact steps, since it may get assigned to a component whose developer may not know gluster-block well enough. If you have an automated test/script, that would be best.
@pranithk Yeah, makes sense. I will raise a BZ for this today. Thanks.
@pranithk thanks for the quick turnaround. Yes, Xiubo seems to have hit it easily just by running ./tests/basic.t, so hopefully that should be enough. @lxbsz please make sure to point to our installation guide from the README in the BZ description, to help the gluster core developers quickly come up to speed with the setup. We will be happy to further assist on the BZ in case any other info is required. Thanks!
Sure. The bugzilla: Thanks.
@lxbsz From the BZ, it looks like you were not using the recommended settings on the block hosting volume. If that is the case, can we close it? Thanks!
Yeah, this issue has been resolved with the recommended settings.
What does this PR achieve? Why do we need it?
For some reason, after updating the metafile and then reading it back
immediately, the read may not reflect the update in time; even though
the deletions are all successful, the command still returns failure.
Checking the ->exit status is sufficient.
For the case:
- When gluster-block delete was executed for the first time on this node, the deletion was successful, but before it could send the response, gluster-blockd died (the addr meta status would be CLEANUPFAIL).
For this case we can also check the ->exit status instead of reading it from the metafile.
The deletion failure has also been seen in customer cases.
Does this PR fix issues?
Fixes: #204
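To make the intent of "check the ->exit status" concrete, here is a hedged sketch of the check this PR moves to: trust the per-node exit status returned by the delete operation rather than re-reading the metafile afterwards. The ->exit field is taken from the PR description; the struct layout and helper name are illustrative, not the exact gluster-block code.

```c
/* Sketch of the exit-status check described in this PR. Struct and
 * function names are illustrative placeholders. */
#include <stdbool.h>
#include <stddef.h>

struct block_remote_reply {
    int   exit;      /* per-node exit status of the delete */
    char *out;       /* captured output, unused here */
};

static bool
delete_succeeded(const struct block_remote_reply *replies, size_t n)
{
    /* The deletion counts as successful only if every node reported
     * exit == 0; no extra blockGetMetaInfo() round-trip is needed. */
    for (size_t i = 0; i < n; i++) {
        if (replies[i].exit != 0)
            return false;
    }
    return true;
}
```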