-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ch.Part.API: Allow retry of failed removals #2083
Conversation
Signed-off-by: Tiffany Harris <tiffany.harris@ibm.com>
23110f3
to
7417007
Compare
} else { | ||
return types.ErrChannelRemovalFailure | ||
} | ||
if removalStatus := r.pendingRemoval[channelID]; removalStatus { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we simply make the pendingRemoval
map, a map from string
to struct{}
?
I see it this way:
- If we are in the process of removing, then we have the element in the map, so we return a pending error.
- If we succeeded the removal, then great, we return that it doesn't exist so we can't remove it.
- Else, we failed the removal, so we just re-try and it's equivalent to the first removal attempt, no?
Why do we need to keep a boolean map value if we have 3 possible states, and 2 maps (existing channels and pending channels) which can encode 4 states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Definitely makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. The remove is asynch, so first the request thread stops the chain/follower, moves the chain/follower to the pendingRemove map, spawns a go-routine to remove it, and returns. You need some way of signaling to a potential second remove request thread that the remove is on-going in the background (with the go-routine spawned by the first). This is exactly the boolean in the third map - true
signals an ongoing remove, false
a failed remove, and the key-value is deleted when the remove succeeds. If we had a struct{} in there, there was no way of detecting that background removal is in progress or when it ended, for good or bad. Note that the chains/followers/pendingRemove maps are mutually exclusive. A map[string]boolean
can encodes these 3 states, whereas a map[string]struct{}
cannot.
In addition, see this comment here: #2027 (comment) which advocated for a
struct {
cRel types.ClusterRelation
status types.Status
}
in the pendingRemove
map instead of a boolean, and a new additional status types.StatusFailure
to make diagnosing failures even easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'll admit to not being as familiar with the flow here and will defer to Yoav. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need some way of signaling to a potential second remove request thread that the remove is on-going in the background (with the go-routine spawned by the first).
The fact that the channel is in the pending map conveys that. As long as the mapping exists, there is a goroutine active, and whenever the mapping doesn't exist, there is no such goroutine. This implies that the goroutine deletes the mapping once it is done - whether it succeeded or failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not a good idea.
The go-routine should set the mapping channelID->false when it fails and terminates. This way:
- a
List
(GET request) of the channel can reveal that the removal had failed - subsequent Join requests are rejected (the ledger may be in an inconsistent state)
- but remove can be retried safely.
If we don't keep it in the map, the Registrar has no record that the removal had failed, or that the channel exists, since it is not in either channels/followers/pendingRemove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the assumption that we want to reject joins, then let's go ahead with what you propose
Looks good to me! |
Type of change
Description
Allow retrying of failed channel removals per comment