-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Channel participation API: make joins crash fault tolerant #1608
Conversation
func (r *Registrar) removeJoinBlock(channelID string) error { | ||
if err := r.joinBlockFileRepo.Remove(channelID); err != nil { | ||
return errors.Wrapf(err, "failed removing joinblock for channel %s", channelID) | ||
} | ||
|
||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a version of this exported and protected with a mutex, so that when a follower finishes on boarding it will be able to call and remove the join block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason the follower should handle removing the join block and not the registrar itself? For now, I've added the remove logic into SwitchFollowerToChain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the ledger progresses past the join block, there is no reason to keep it around... there is no harm in that, but it still feels a bit weird. Note that the follower can live for quite some time past the join-block...
Nevertheless, this is acceptable.
Another thing we could do in support of lazy removal is to use loadJoinBlocks()
and remove a join block if the ledger's height > join-block.Number
.
err = errors.Wrap(err, "failed creating chain support for join") | ||
|
||
if err2 := r.removeJoinBlock(channelID); err2 != nil { | ||
// Stack file repo error on top of chain support creation error | ||
err = errors.WithStack(err2) | ||
} | ||
|
||
return types.ChannelInfo{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point we have to clean the ledger as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I haven't addressed any comments yet. Just rebased this onto master and adjusted the code as necessary. |
Just another rebase to spread out the effort of keeping up with master. |
7136b04
to
9ae563b
Compare
4c1544d
to
f7e2c61
Compare
// TODO if follower is onboarding, remove the joinblock from file repo | ||
_, status := follower.StatusReport() | ||
if status == types.StatusOnBoarding { | ||
if err := r.removeJoinBlock(channelID); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not going to work if you have a follower that passed the join block (i.e. active) but did not yet remove the join block. That is especially true with the lazy removal that you implemented in this commit, but may also be true regardless because those things happen concurrently.
So a better approach IMO would be to either (1) remove it unconditionally and ignore the error if it is not found (best to make a special error type to identify that) or (2) list join blocks and remove it only if it exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to always attempt removing the join block.
if err := r.joinBlockFileRepo.Save(channelID, blockBytes); err != nil { | ||
return types.ChannelInfo{}, errors.Wrapf(err, "failed saving joinblock to file repo for channel %s", channelID) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this isMember, err := clusterConsenter.IsChannelMember(configBlock)
(one line down) returns an error, the ledger and join block are not removed.
So, instead of having the "cleanup code" repeated 3 times, my original idea was to have named return arguments, i.e. ... (info types.ChannelInfo, err error)
, and right here do a defer that looks at the returned err
, and if it is not nil, do the cleanup.
An alternative is to save the join-block after the call to clusterConsenter.IsChannelMember(configBlock)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
blockBytes, err := proto.Marshal(configBlock) | ||
if err != nil { | ||
return types.ChannelInfo{}, errors.Wrap(err, "failed marshaling joinblock") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this goes bad, you have to remove the empty ledger created above. Same if the joinBlockFileRepo.Save
below fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if err2 := r.ledgerFactory.Remove(channelID); err2 != nil { | ||
// Stack ledger removal error on top of chain support creation error | ||
err = errors.WithStack(err2) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put this in a defer that checks the return err
right after the ledger is created,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if err2 := r.removeJoinBlock(channelID); err2 != nil { | ||
// Stack file repo error on top of ledger removal + chain support creation error | ||
err = errors.WithStack(err2) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put this in a defer that checks the return err
right after the join-block is saved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f56230d
to
109b5e6
Compare
Looks good to me. |
@@ -608,19 +619,41 @@ func (r *Registrar) JoinChannel(channelID string, configBlock *cb.Block, isAppCh | |||
if err != nil { | |||
return types.ChannelInfo{}, err | |||
} | |||
defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tock-ibm What about the case where the initLedgerResourcesClusterConsenter fails? That will leave around the ledger for the channel. Do you think newLedgerResources should be updated to remove the ledger if the call to ledgerFactory.GetOrCreate() fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. GetOrCreate
is the last thing in newLedgerResources
so I am not too worried about that. Hopefully GetOrCreate
does not leave an empty ledger behind if it fails. initLedgerResourcesClusterConsenter
is used in other places as well, in which the ledger is not empty, so we cant effort to cleanup within it as well.
However if initLedgerResourcesClusterConsenter
fails after newLedgerResources
, that requires cleanup in this method, that is correct.
Here is an idea: how about putting the defer func()
from L622 just before initLedgerResourcesClusterConsenter
(to L617)?
It might try to remove a non-existing ledger, but that is not an error according to the blockstore_provider.go#L131
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks generally good to me, one nit noted below. Will wait to approve until other conversations have resolved.
|
||
func (r *Registrar) removeJoinBlock(channelID string) error { | ||
if err := r.joinBlockFileRepo.Remove(channelID); err != nil { | ||
return errors.Wrapf(err, "failed removing joinblock for channel %s", channelID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we wrapping an error we generate? Why not simply 'WithMessage', looking up the tree, it looks like these errors are already generated with a stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I've just been a little lazy with differentiating between Wrap and WithMessage lately... plus it's what the rest of the file (and maybe package?) does with the errors. I think it would be worth a single pass through to go with the "standard" we set forth for fabric: use Wrap for errors that come from an external library and use WithMessage for errors that fabric already generated using pkg/errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened FAB-18229
Remove joinblock upon completion of JoinChannel (succcess or failure).. FAB-18008 #done Signed-off-by: Danny Cao <dcao@us.ibm.com> Signed-off-by: Will Lahti <wtlahti@us.ibm.com>
Type of change
Description
First commit:
Second commit:
Related issues
FAB-18008