Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ch.Part.API: Allow retry of failed removals #2083

Merged
merged 1 commit into from
Nov 9, 2020

Conversation

stephyee
Copy link
Contributor

@stephyee stephyee commented Nov 6, 2020

Type of change

  • Improvement

Description

Allow retrying of failed channel removals per comment

Signed-off-by: Tiffany Harris <tiffany.harris@ibm.com>
@stephyee stephyee marked this pull request as ready for review November 6, 2020 22:55
@stephyee stephyee requested a review from a team as a code owner November 6, 2020 22:55
} else {
return types.ErrChannelRemovalFailure
}
if removalStatus := r.pendingRemoval[channelID]; removalStatus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply make the pendingRemoval map, a map from string to struct{} ?

I see it this way:

  1. If we are in the process of removing, then we have the element in the map, so we return a pending error.
  2. If we succeeded the removal, then great, we return that it doesn't exist so we can't remove it.
  3. Else, we failed the removal, so we just re-try and it's equivalent to the first removal attempt, no?

Why do we need to keep a boolean map value if we have 3 possible states, and 2 maps (existing channels and pending channels) which can encode 4 states?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Definitely makes sense.

Copy link
Contributor

@tock-ibm tock-ibm Nov 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The remove is asynch, so first the request thread stops the chain/follower, moves the chain/follower to the pendingRemove map, spawns a go-routine to remove it, and returns. You need some way of signaling to a potential second remove request thread that the remove is on-going in the background (with the go-routine spawned by the first). This is exactly the boolean in the third map - true signals an ongoing remove, false a failed remove, and the key-value is deleted when the remove succeeds. If we had a struct{} in there, there was no way of detecting that background removal is in progress or when it ended, for good or bad. Note that the chains/followers/pendingRemove maps are mutually exclusive. A map[string]boolean can encodes these 3 states, whereas a map[string]struct{} cannot.

In addition, see this comment here: #2027 (comment) which advocated for a

struct {
  cRel     types.ClusterRelation
  status  types.Status
}

in the pendingRemove map instead of a boolean, and a new additional status types.StatusFailure to make diagnosing failures even easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'll admit to not being as familiar with the flow here and will defer to Yoav. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need some way of signaling to a potential second remove request thread that the remove is on-going in the background (with the go-routine spawned by the first).

The fact that the channel is in the pending map conveys that. As long as the mapping exists, there is a goroutine active, and whenever the mapping doesn't exist, there is no such goroutine. This implies that the goroutine deletes the mapping once it is done - whether it succeeded or failed.

Copy link
Contributor

@tock-ibm tock-ibm Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a good idea.
The go-routine should set the mapping channelID->false when it fails and terminates. This way:

  1. a List (GET request) of the channel can reveal that the removal had failed
  2. subsequent Join requests are rejected (the ledger may be in an inconsistent state)
  3. but remove can be retried safely.
    If we don't keep it in the map, the Registrar has no record that the removal had failed, or that the channel exists, since it is not in either channels/followers/pendingRemove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the assumption that we want to reject joins, then let's go ahead with what you propose

@tock-ibm
Copy link
Contributor

tock-ibm commented Nov 8, 2020

Looks good to me!
Good suggestion by @yacovm .

@yacovm yacovm merged commit cc77194 into hyperledger:master Nov 9, 2020
@stephyee stephyee deleted the removechannel branch November 9, 2020 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants