-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slash approval voters on approving invalid blocks - dynamically #635
Comments
Yes, approval checkers receiving some lesser slash always made sense, but yeah %/k ensures this stays below the backer slash, and solves the issue. I think 50%/k or 10%/k work fine too, while also being robust if we ever lower the backer count to one. |
We should be able to make this a function of the number of required backers. |
Why bother if 50% / k works, no? |
I like this! 50% is probably marginally better than 100% if we also want to make it so this doesn't result in disabling. |
Yeah. I just like to maximize the slash to whatever is possible in a security preserving way. Going with simply 50% is also a good option, especially when going with 100% would complicate things a lot - calculating from the number of required backers seems pretty easy though. |
As an aside, we occasionally discuss not signing approval votes, which directly conflicts here. We'd need almost a complete graph before that works, which makes not signing hard anyways. If we've enough connectivity, then could invalid approval votes be punished? In disputes, we'd need votes about who voted approve, which adds some bitfield into dispute votes, and worse likely demands repeated votes. You could then slash whoever approval voted according to a 2/3rd majority. It'd be a mess but.. It's possible worse issues exist too, ala paritytech/polkadot#7482 (comment) |
We'd need to give up on this slash if paritytech/polkadot#7482 lands I think, which brings various knock on effects. |
Do we really need to distinguish approval checkers from explicit "for invalid" dispute votes? If not, that would make the implementation much easier. And I don't see why we would need to distinguish. |
A dispute is a dispute. It shouldn't matter if it comes from an approval checker or another validator. |
This combined with disabling for minor slashes could have very problematic consequences as attackers could force out a lot of honest nodes by sacrificing fewer of their own. One solution to it was to implement a proportional disabling in case of minor offences (when there is already a lot of validators disabled above some threshold).
I generally believe that we would already be safe from those attacks thanks to time disputes/time overrun changes. They should also cover cases against the candidate and fully prevent this attack scenario. Example:
Question now is... is there any other problem with slashing valid on invalid? Was it the only reason why it was initially disabled? If there isn't anything else this should not strictly depend on proportional disabling. |
Yes, with time disputes we should already be safe when it comes to time nondeterminism. The problem is that PVF execution is complex and we do want to support multiple implementations (and versions of those) and different machine architectures. It seems impossible to "guarantee" realistically that there will never be an exploitable non-determinism. I would be very happy if we at least eliminated all the conceptual ones: Time disputes being one of them, native stack size limits might be solvable by something similar, but different implementations might always behave slightly different on edge cases - which could be exploited. Therefore I would consider defense in depth here not just nice to have, but actually an important requirement to have secure parachain consensus under realistic conditions. What could be an alternative to proportional disabling is to stop disabling any non 100% slashes (assuming non accumulating slashes here), once a certain threshold of disabled validators is reached. This seems fine and avoids nasty flip-flopping of disabled state. Accumulating slashes should then really be avoided though, because then validators could get kicked out longer term (too little stake at the next election). |
Do we need any disabling, except when 100% slashed? We should list the miss-behavior types to figure this out, but at least here I think the time overrun fees and 50% / k trick permits never disables. |
We need either disabling for non 100% slashes or accumulating slashes to deter repeating minor offences. Both have roughly similar security properties and deter repeated attacks. The benefit of disabling for minor offences is in cases of generally honest node operators that for some reason had a minor fault and they repeatedly and rapidly commit this minor offence. In slash accumulating this would drain their stake. In disabling this disables them and gives time to react with minimal losses. |
I actually started doing it recently and I think have a list of different misbehaviour types, how we deal with them now. I still need to check some equivocation cases Im not sure about and I'm listing some future changes like time overruns / time disputes to see the direction we are going and what we'll be covering going ahead. |
We'd still have this 50% / k rule there, so the adversary loses more nodes than he takes down, so no security concern here, right? We've changed these details enough, not sure our story from the retreat, but really the details matter here enormously.. At some point, there were once four+ voting modes in time disputes, "valid with time" which could mean under or over time, and "invalid by reason", where one reason was really exhausting the clock. A 2/3rd "invalid" vote means 100% slashing, regardless of the reasons. If not 2/3rd invalid then we do not slash per se, but we charge backers by the overrun median time. We admit disputes that concludes 2/3rd "invalid" while some honest nodes vote "valid", but we think those honest nodes claim times far beyond the backing timeout (or else we've some other execution bug). Q1. Could we apportion the 50%/k slash based upon how slow the nodes votes?
Q2. Can 2/3rd "valid but overtime" votes trigger reversion?
At first blush Q1 no but Q2 yes looks most robust.. |
Co-authored-by: Alistair Singh <alistair.singh7@gmail.com>
Initial implementation for the plan discussed here: #701 Built on top of #1178 v0: paritytech/polkadot#7554, ## Overall idea When approval-voting checks a candidate and is ready to advertise the approval, defer it in a per-relay chain block until we either have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what candidates we have available. This should allow us to reduce the number of approvals messages we have to create/send/verify. The parameters are configurable, so we should find some values that balance: - Security of the network: Delaying broadcasting of an approval shouldn't but the finality at risk and to make sure that never happens we won't delay sending a vote if we are past 2/3 from the no-show time. - Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from the measurements we did on versi, it bottlenecks approval-distribution/approval-voting when increase significantly the number of validators and parachains - Block storage: In case of disputes we have to import this votes on chain and that increase the necessary storage with MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way of the network functioning and we will limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed. ## Other fixes: - Fixed the fact that we were sending random assignments to non-validators, that was wrong because those won't do anything with it and they won't gossip it either because they do not have a grid topology set, so we would waste the random assignments. - Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments. ## TODO: - [x] Get feedback, that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think. - [x] More and more testing. - [x] Test in versi. - [x] Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration. - [x] Make sure the backwards compatibility works correctly - [x] Make sure this direction is compatible with other streams of work: #635 & #742 - [x] Final versi burn-in before merging --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
If we've trouble having access to We've If an adversary can split an escalation optimally then we're slashing |
Stale as dynamic slashing will not be implemented. We will be going with 2% as seen in here: see stage 4 of the new Disabling Strategy. 0% AgainstValid (just disable so they lose some future rewards) and 2% for ForInvalid. |
Currently we punt on slashing approval voters approving invalid blocks. This is sound from a gambler's ruin perspective, as the backers are the ones to be supposed to having skin in the game. We will always slash at least those backers.
While the security properties are maintained, there is a slight problem: Without slashing approval voters, there is no incentive left to actually do the validation. Approving without having checked is risk free, for the individual. Of course not for the network: Blindly approving puts the whole chain at risk, given that validators are heavily invested, they should have an interest in maintaining the network's security. Nevertheless there is a tragedy of the commons situation: "If it is just me not validating - I'll save a few resources and as long as everybody else is checking, everything is fine."
Right now this should be a very low risk as running modified source code is neither effort less nor risk free and also for now resources are mostly spent elsewhere. The actual validation is relatively cheap, nevertheless longer term this situation should be improved.
There are at least two possibilities:
While 1 would be a more solid solution: It would just not be possible to not doing the validation without getting slashed/losing out on rewards. On the flip-side it is not even clear how this should work, there are some ideas how this could work, but any proposed solution is far from trivial and would add a lot of complexity.
Which leaves us with option 2. So, why did we remove that slash in the first place? The reason was an attack vector: If someone hacked the backing validators/was willing to sacrifice them and would not care about getting them slashed, he could craft a candidate timing out on two thirds of the validators: Thus the candidate would be deemed invalid and the attacker got up to a third of the validators slashed 100%, by only controlling as little as two backers. It is worth mentioning, that this attack is not resolved by time disputes as time disputes only resolve disputes concluding for the candidate and charging backers is also no threat if they are willing to get slashed.
So why is this a problem? This would clearly be an unjustified slash, governance could refund them. True, but slashing also means:
Both of these would gain an attacker an advantage: Knocking out up to a third of all validators. With byzantine assumptions the attacker already controls up to a third. So with this attack, he would control half the validators!
Proposed Solution
First, we have to understand that this attack is an amplification attack: By only controlling two validators, the attacker can knock out up to a third. This attack only makes any sense if the attacker is able to knock out more honest nodes than he is sacrificing malicious nodes!
But this can easily be avoided: If we changed the slashing strategy such that not each approval voter gets slashed 100% of their stash, but instead that 100% gets distributed among all approving approval checkers:
Or simply: Individual slash per approval voter: 100%/k ... where k is the number of approving checkers.
What does this gain us? A lot:
Prerequisite: Good disabling strategy, which only fully disables nodes that got slashed 100%.
Tasks
The text was updated successfully, but these errors were encountered: