-
Notifications
You must be signed in to change notification settings - Fork 37
Analyze lctl output for real snapshot listing failures #2282
Conversation
c7edd57
to
e7a0bae
Compare
It may be a good idea to catch "resource temporarily unavailable" at lctl call here: integrated-manager-for-lustre/iml-agent/src/lustre.rs Lines 13 to 24 in 2cd9ed8
and then retry. So all command would work more reliably. |
e7a0bae
to
e6216c0
Compare
.checked_output() | ||
.await; | ||
|
||
if let Err(CmdError::Output(o)) = &r { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this should wrap the lctl
call as a new fn instead of being within it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it first, but the function was private anyway consisting only of Command::new("/usr/sbin/lctl")
(after I removed premature decoding of output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this fn lctl_retry
, and keep the original lctl
fn in place. the lctl
fn is being used in a few other modules already, and it's behavior should stay the same I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, other modules will just become more sound out of the box. I don't think there is sense in failing for a temporarily unavailable resource.
Initially there were two checks when listing snapshots, but this one (resource temporarily unavailable) seems the best be handled earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we seen this on any other lctl call (and do we know the behavior is the same)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's valid question. As a precaution we indeed can have two version of lctl function. But the lctl executable is using perror() or strerror() consistently. And retrying temporary failures is still needed.
It is really simple: we would loop iff lctl prints "Resource temporarily unavailable" (to stderr + exit code, it could not be a snapshot comment :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the call into IML's liblustreapi library? Might be cleaner than doing a shell command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it lctl_retry
e6216c0
to
e2e63fd
Compare
I successfully created few snapshots from two terminals at the same time :) |
Signed-off-by: Igor Pashev <pashev.igor@gmail.com>
e2e63fd
to
5f50959
Compare
Replaced by #2364 |
Closing #2248.
Signed-off-by: Igor Pashev pashev.igor@gmail.com
This change is