Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Commit after ChangeView #642

Merged
merged 35 commits into from
Mar 25, 2019
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
aa16b7f
Prevent commit after ChangeView.
jsolman Mar 18, 2019
d686236
Minor simplification.
jsolman Mar 18, 2019
0d98768
Minimize differences.
jsolman Mar 18, 2019
d9ea44e
Don't consider view changing once F nodes committed.
jsolman Mar 19, 2019
57065f8
Add a comment describing current potential issues.
jsolman Mar 19, 2019
442bfe8
Save consensus context upon changing view to prevent split commit.
jsolman Mar 19, 2019
f471b32
Adjust comment concerning potential issues.
jsolman Mar 19, 2019
7d5b201
Further improve comment.
jsolman Mar 19, 2019
470b35c
Reduce to minimal changes to fix the main issue that can cause the ne…
jsolman Mar 19, 2019
bfb729e
Rename helper method that checks if F nodes are already committed.
jsolman Mar 19, 2019
76bd576
Greater than F nodes must have committed for it to not be possible fo…
jsolman Mar 19, 2019
ee72d58
Rename flag to MoreThanFNodesCommitted.
jsolman Mar 19, 2019
3dcc435
Merge branch 'master' into consensus/preventStall
vncoelho Mar 19, 2019
b18735e
Merge branch 'master' into consensus/preventStall
jsolman Mar 20, 2019
aa365fd
Make a new flag for clarity. Add TODO for fixing counting committed p…
jsolman Mar 20, 2019
4959176
Keep track of commit payloads in all views to support counting commits.
jsolman Mar 20, 2019
abb8063
Merge branch 'master' into consensus/preventStall
jsolman Mar 20, 2019
68f5235
Rename flag for clarity.
jsolman Mar 20, 2019
eee35cc
Remove unnecessary change of recording commit from another view.
jsolman Mar 20, 2019
2e7d8ad
Fix typo in comment.
jsolman Mar 20, 2019
28b45cc
If we already have a commit for the validator at the current view, we…
jsolman Mar 20, 2019
b6bf4da
Add warning log for receiving a different commit from a validator. A …
jsolman Mar 20, 2019
ce7685b
minor changes
erikzhang Mar 21, 2019
b8beafc
use `LogLevel.Warning`
erikzhang Mar 21, 2019
731b915
Merge branch 'master' into consensus/preventStall
erikzhang Mar 21, 2019
7ba991b
Check `ViewNumber` in `OnPrepareRequestReceived()` and `OnPrepareResp…
erikzhang Mar 21, 2019
6fc4864
Simplify `OnRecoveryMessageReceived()`
erikzhang Mar 21, 2019
b7e7981
minor change
erikzhang Mar 21, 2019
9838afc
Fix a merge issue. OnConsensusPayload method should not be accepting …
jsolman Mar 21, 2019
663a8ae
Revert "Fix a merge issue. OnConsensusPayload method should not be ac…
jsolman Mar 21, 2019
d0f89c6
Merge branch 'master' into consensus/preventStall
jsolman Mar 21, 2019
c3005ac
Merge branch 'master' into consensus/preventStall
shargon Mar 21, 2019
dda3e9a
Merge branch 'master' into consensus/preventStall
vncoelho Mar 24, 2019
5e925b0
Merge branch 'master' into consensus/preventStall
igormcoelho Mar 24, 2019
45091ce
Merge branch 'master' into consensus/preventStall
shargon Mar 25, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 20 additions & 22 deletions neo/Consensus/ConsensusService.cs
Original file line number Diff line number Diff line change
Expand Up @@ -182,25 +182,27 @@ private void OnChangeViewReceived(ConsensusPayload payload, ChangeView message)
// ChangeView messages include a Timestamp when the change view is sent, thus if a node restarts
// and issues a change view for the same view, it will have a different hash and will correctly respond
// again; however replay attacks of the ChangeView message from arbitrary nodes will not trigger an
// additonal recovery message response.
// additional recovery message response.
if (!knownHashes.Add(payload.Hash)) return;
if (message.NewViewNumber <= context.ViewNumber)
{
bool shouldSendRecovery = false;
// Limit recovery to sending from `f` nodes when the request is from a lower view number.
int allowedRecoveryNodeCount = context.F();
for (int i = 0; i < allowedRecoveryNodeCount; i++)
if (!context.CommitSent())
jsolman marked this conversation as resolved.
Show resolved Hide resolved
{
var eligibleResponders = context.Validators.Length - 1;
var chosenIndex = (payload.ValidatorIndex + i + message.NewViewNumber) % eligibleResponders;
if (chosenIndex >= payload.ValidatorIndex) chosenIndex++;
if (chosenIndex != context.MyIndex) continue;
shouldSendRecovery = true;
break;
}

if (!shouldSendRecovery) return;
bool shouldSendRecovery = false;
// Limit recovery to sending from `f` nodes when the request is from a lower view number.
int allowedRecoveryNodeCount = context.F();
for (int i = 0; i < allowedRecoveryNodeCount; i++)
{
var eligibleResponders = context.Validators.Length - 1;
var chosenIndex = (payload.ValidatorIndex + i + message.NewViewNumber) % eligibleResponders;
if (chosenIndex >= payload.ValidatorIndex) chosenIndex++;
if (chosenIndex != context.MyIndex) continue;
shouldSendRecovery = true;
break;
}

if (!shouldSendRecovery) return;
}
Log($"send recovery from view: {message.ViewNumber} to view: {context.ViewNumber}");
localNode.Tell(new LocalNode.SendDirectly { Inventory = context.MakeRecoveryMessage() });
}
Expand Down Expand Up @@ -297,7 +299,7 @@ private void OnRecoveryMessageReceived(ConsensusPayload payload, RecoveryMessage
if (ReverifyAndProcessPayload(changeViewPayload)) validChangeViews++;
}
if (message.ViewNumber != context.ViewNumber) return;
if (!context.CommitSent())
if (!context.ViewChanging() && !context.CommitSent())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should allow receiving PrepareRequest and PrepareResponse even when ViewChanging() == true. Because ViewChanging() indicates that I want to change views, it may not be able to change successfully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments below if it accepts prepare requests and prepare responses in the current view after it has sent a ChangeView message it can end up sending the other nodes to the next view, where they will never be able to receive the recovery message from this view with the commits, since it will be a lower view number; thus the network will be stalled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the view has not been changed, then we should allow consensus on the current view. Otherwise, we may not be able to reach a consensus or change views.

Copy link
Contributor Author

@jsolman jsolman Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the view is changing for a node (then the view has to be considered to be changed from it's perspective), because it has already sent a change view message that can be used to allow others to accept a change view, so it cannot commit in the current view. If it did commit in the current view after sending a change view, then the others can all accept moving to the later view, where they won't be able to reach consensus.

If more than F nodes have committed, then this code does allow receiving PrepareRequest and PrepareResponse in the current view effectively to obtain consensus by treating the ViewChanging flag as not changing. Maybe you missed this line that is changed in this PR:

public static bool MoreThanFNodesCommitted(this IConsensusContext context) => context.CommitPayloads.Count(p => p != null) > context.F();

If less than or equal to F nodes have been committed then they will still be able to obtain consensus in a higher view. So this code should address your concern of nodes being able to reach consensus in the current view when it is not possible to change view anymore.

{
if (!context.RequestSentOrReceived())
{
Expand Down Expand Up @@ -333,7 +335,7 @@ private void OnRecoveryMessageReceived(ConsensusPayload payload, RecoveryMessage

private void OnPrepareRequestReceived(ConsensusPayload payload, PrepareRequest message)
{
if (context.RequestSentOrReceived()) return;
if (context.RequestSentOrReceived() || context.ViewChanging()) return;
jsolman marked this conversation as resolved.
Show resolved Hide resolved
if (payload.ValidatorIndex != context.PrimaryIndex) return;
Log($"{nameof(OnPrepareRequestReceived)}: height={payload.BlockIndex} view={message.ViewNumber} index={payload.ValidatorIndex} tx={message.TransactionHashes.Length}");
if (message.Timestamp <= context.PrevHeader().Timestamp || message.Timestamp > TimeProvider.Current.UtcNow.AddMinutes(10).ToTimestamp())
Expand Down Expand Up @@ -393,7 +395,7 @@ private void OnPrepareRequestReceived(ConsensusPayload payload, PrepareRequest m

private void OnPrepareResponseReceived(ConsensusPayload payload, PrepareResponse message)
{
if (context.PreparationPayloads[payload.ValidatorIndex] != null) return;
if (context.PreparationPayloads[payload.ValidatorIndex] != null || context.ViewChanging()) return;
jsolman marked this conversation as resolved.
Show resolved Hide resolved
if (context.PreparationPayloads[context.PrimaryIndex] != null && !message.PreparationHash.Equals(context.PreparationPayloads[context.PrimaryIndex].Hash))
return;
Log($"{nameof(OnPrepareResponseReceived)}: height={payload.BlockIndex} view={message.ViewNumber} index={payload.ValidatorIndex}");
Expand Down Expand Up @@ -487,12 +489,8 @@ private void OnTimer(Timer timer)
private void OnTransaction(Transaction transaction)
{
if (transaction.Type == TransactionType.MinerTransaction) return;
if (!context.IsBackup() || !context.RequestSentOrReceived() || context.ResponseSent() || context.BlockSent())
if (!context.IsBackup() || context.ViewChanging() || !context.RequestSentOrReceived() || context.ResponseSent() || context.BlockSent())
return;
// If we are changing view but we already have enough preparation payloads to commit in the current view,
// we must keep on accepting transactions in the current view to be able to create the block.
if (context.ViewChanging() &&
context.PreparationPayloads.Count(p => p != null) < context.M()) return;
if (context.Transactions.ContainsKey(transaction.Hash)) return;
if (!context.TransactionHashes.Contains(transaction.Hash)) return;
AddTransaction(transaction, true);
Expand Down
13 changes: 11 additions & 2 deletions neo/Consensus/Helper.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using Neo.Network.P2P.Payloads;
using System.Linq;
using Neo.Network.P2P.Payloads;
using Neo.Persistence;
using System.Runtime.CompilerServices;

Expand Down Expand Up @@ -26,8 +27,16 @@ internal static class Helper
public static bool CommitSent(this IConsensusContext context) => context.CommitPayloads[context.MyIndex] != null;
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool BlockSent(this IConsensusContext context) => context.Block != null;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ViewChanging(this IConsensusContext context) => context.ChangeViewPayloads[context.MyIndex]?.GetDeserializedMessage<ChangeView>().NewViewNumber > context.ViewNumber;
public static bool ViewChanging(this IConsensusContext context) => context.ChangeViewPayloads[context.MyIndex]?.GetDeserializedMessage<ChangeView>().NewViewNumber > context.ViewNumber && !context.MoreThanFNodesCommitted();
jsolman marked this conversation as resolved.
Show resolved Hide resolved

// A possible attack can happen if the last node to commit is malicious and either sends change view after his
// commit to stall nodes in a higher view, or if he refuses to send recovery messages. In addition, if a node
// asking change views loses network or crashes and comes back when nodes are committed in more than one higher
// numbered view, it is possible for the node accepting recovery and commit in any of the higher views, thus
// potentially splitting nodes among views and stalling the network.
public static bool MoreThanFNodesCommitted(this IConsensusContext context) => context.CommitPayloads.Count(p => p != null) > context.F();
jsolman marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be >= context.F(). When F nodes are committed, if false is returned here, then the local node may commit, and the committed nodes may become F+1.

Copy link
Contributor Author

@jsolman jsolman Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in that case the other nodes can join it in this view because they won’t be able to get M nodes to change view out of this view. So I think it is correct as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you use >= F it causes a problem because that would allow a node to commit after sending it’s change view, possibly allowing the other nodes to move to the next view after it had committed leading to a stall.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commits > F and commits >= F only differ when commits == F. I think you should not commit when commits == F. Otherwise it will enter the F+1 situation. So here we should use >= F.

Copy link
Contributor Author

@jsolman jsolman Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When commits == F, the remaining nodes are M nodes which is enough to still come to consensus in the next view. Therefore this must use > F as the criteria for forcing coming to consensus in this view.

Copy link
Contributor Author

@jsolman jsolman Mar 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I thought of how to solve it. It needs to have a new protocol message that is like a change view commit. When M of those messages have been received committed nodes in lower view can uncommit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a recovery message that contains M ChangeView messages is equivalent to a Commitment to change view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a recovery message is received from a higher view it will be possible to uncommit from the lower view. No new protocol messages are needed. We just need to send the recovery if we see a node committed in a lower view.

Copy link
Contributor Author

@jsolman jsolman Mar 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an elegant solution almost complete now to maintain liveness with F failed nodes. Changing view needed to distinguish between a request to change view and a commitment to change view. In this way, nodes can continue to accept preparations until they commit to change view, this in combination with commits from lower views able to be uncommitted, solves liveness issues when there are up to F failed nodes.

Copy link
Member

@vncoelho vncoelho Mar 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff, I agree. I think it is a good idea. If we have a solution that do not block commit forever it is much better and will respect the properties of the classical pBFT.

But I think that we still need some properties that we worked and developed in this PR. The properties are locking nodes to commit if they are on viewchanging (this is a specification of the pbft).

@erikzhang and @igormcoelho & @shargon, let's insight possible "worst case situations" and think if this idea will cause any possibility of exposing a double header:

  1. f_0 = |f| honest nodes are committed in view0;
  2. 2f+1 =|M| nodes change view to view1.

Consideration:

  • f_m = |f| nodes from view1 are malicious and can use their commit signatures both at view0 and view1;

Implications:

  • 2f = |f_0| + |f_m| signatures can exist at view0;
  • f_0 nodes at view0 can be moved to view1 (any of the 2f+1 CN can move then to view1);

Security:

  • We should be sure that any of the f+1 honests that went to view1 (2.) will have a hardware fail. In this sense, I am thinking that PR-643 still should be merged, @jsolman. We do not want these nodes to be back because of hardware fail and give their signatures to f_c + f_m = |M| -1 at view0.
  • I think that we still need to lock those 2f+1 nodes during changeview (lock with viewchanging flag) that went to view1, otherwise, we can produce two headers:
  1. f_0 = |f| honest nodes are committed in view0;
  2. 2f+1 =|M| nodes change view to view1.
  3. f_h = |f| honest nodes stayed at view0 committed while they also sent their changeview package to view1

Hypotheses:

  • A single malicious nodes n \in f_m can now produce a double header;

Procedure:

  • This node n would only need to sign the block at view0 and do not relay it. Then, move other nodes to view1 generating a new block.


[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint GetPrimaryIndex(this IConsensusContext context, byte viewNumber)
Expand Down