Validator Re-Enabling #5724

Overkillus · 2024-09-16T13:11:54Z

Aims to implement Stage 3 of Validator Disabling as outlined here: #4359

Features:

New Disabling Strategy (Staking level)
Re-enabling logic (Session level)
More generic disabling decision output
New Disabling Events

Testing & Security:

Closes #4745
Closes #2418

tdimitrov

I had a quick pass by focusing mainly on the approach. It looks good, nice work @Overkillus!

I've left some thoughts about a corner case with the re-enabling.

substrate/frame/staking/src/lib.rs

substrate/frame/staking/src/slashing.rs

substrate/frame/staking/src/tests.rs

substrate/frame/staking/src/pallet/mod.rs

polkadot/runtime/westend/src/lib.rs

Overkillus · 2024-11-04T19:52:28Z

bot fmt

command-bot · 2024-11-04T19:52:33Z

@Overkillus https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7692315 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh". Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 15-7908d2d9-da54-4b6e-af31-1b138de1c56d to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot · 2024-11-04T19:55:45Z

@Overkillus Command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh" has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7692315 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7692315/artifacts/download.

Ank4n · 2024-11-04T22:49:30Z

Is it possible to move most of the disabling logic to pallet_session?

It seems staking does not really need to know anything about disabling. Currently, we are maintaining two copies of the disabled validator indices, one in pallet_session and one in pallet_staking, apparently because staking only knows about eras, but I am not sure what that means. Staking does not have any concept of disabling in it, and it should only care about slashing. If the concern is that we want to keep a validator disabled for the whole era, staking returns a set of validators only for sessions that trigger an era. For other sessions, it simply returns None. So session should know when to clear disabled validators.

There is a larger reason as to why I am flagging this. With staking moving to AH, the offence lifecycle would be something like this:

Offence is reported and verified on RC.
Any disabling logic is applied and propagated by pallet_session. It should also keep track of SlashPerbill in its DisabledValidators store.
An offence report (Offender, Session, SlashPerbill) is sent to AH for slashing.

In practice, pallet_staking will be replaced by pallet_staking_client (or a more suitable name) that will fill in the gaps. That is, it will act as OnOffenceHandler and SessionManager on RC, and async communicate with Staking/AH.
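The three steps above can be sketched outside of FRAME as follows. This is only an illustration under stated assumptions: `OffenceReport`, `RelayChainSession`, the `outbox` queue, and the use of a plain `u32` for the slash fraction are hypothetical and do not come from pallet_session or pallet_staking.

```rust
/// Hypothetical report sent from the relay chain (RC) to Asset Hub (AH).
#[derive(Debug, Clone, PartialEq)]
struct OffenceReport {
	/// Validator index of the offender (illustrative; a real report would carry an account id).
	offender: u32,
	/// Session in which the offence happened.
	session: u32,
	/// Slash fraction in parts per billion, standing in for SlashPerbill.
	slash_parts_per_billion: u32,
}

/// Hypothetical RC-side state: the disabled set plus a queue of outgoing reports.
struct RelayChainSession {
	/// (validator index, slash fraction) pairs, kept sorted by validator index.
	disabled: Vec<(u32, u32)>,
	/// Reports waiting to be delivered asynchronously to AH.
	outbox: Vec<OffenceReport>,
}

impl RelayChainSession {
	/// Called once an offence has been verified on RC (step 1).
	fn on_verified_offence(&mut self, offender: u32, session: u32, slash: u32) {
		// Step 2: apply disabling locally and remember the highest slash fraction seen.
		match self.disabled.binary_search_by_key(&offender, |(idx, _)| *idx) {
			Ok(pos) => {
				if slash > self.disabled[pos].1 {
					self.disabled[pos].1 = slash;
				}
			},
			Err(pos) => self.disabled.insert(pos, (offender, slash)),
		}
		// Step 3: queue the report for AH; slashing itself happens there.
		self.outbox.push(OffenceReport { offender, session, slash_parts_per_billion: slash });
	}
}
```

In this sketch the disabling decision never waits on AH, which is the property the async setup would seem to require.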

tl;dr: Given that Session and Staking communication would become async, this disabling logic doesn’t seem compatible or, at the very least, "good design" with that in mind.

Ank4n · 2024-11-04T21:20:29Z

prdoc/pr_5724.prdoc

+    description: |
+      Implementation of the Stage 3 for the New Disabling Strategy: https://github.com/paritytech/polkadot-sdk/issues/4359
+
+      This PR changes when an active validator node gets disabled within parachain consensus (reduced responsibilities and


Right?

Suggested change

-      This PR changes when an active validator node gets disabled within parachain consensus (reduced responsibilities and
+      This PR changes when an active validator node gets disabled within relaychain consensus (reduced responsibilities and

Not really. Parachains don't have their own native consensus or security and they derive it from the RLC.

In Polkadot, when we say parachain consensus we are referring to the parts of the Polkadot protocol that allow parachains to inherit security and consensus, AKA the part where we assign responsibility (backing), ensure availability, recheck correctness (approval checking rerunning of PVs) and punish wrongdoers (disputes and slashes).

I can maybe change it to Parachain Consensus Protocol to make it a bit clearer, but otherwise it is consistent with all our other materials. See here for instance: https://wiki.polkadot.network/docs/learn-parachains-faq#:~:text=%22Parachain%20consensus%22%20is%20special%20in,can%20control%20their%20own%20consensus.

Ank4n · 2024-11-04T21:25:53Z

substrate/frame/session/src/lib.rs

+		if i >= Validators::<T>::decode_len().unwrap_or(0) as u32 {
+			return false
+		}


Should this ever happen? Might want to mark this defensive.

Ank4n · 2024-11-04T21:27:13Z

substrate/frame/session/src/lib.rs

@@ -735,6 +735,23 @@ impl<T: Config> Pallet<T> {
 		})
 	}

+	/// Re-enable the validator of index `i`, returns `false` if the validator was already enabled.
+	pub fn enable_index(i: u32) -> bool {
+		if i >= Validators::<T>::decode_len().unwrap_or(0) as u32 {


This could be defensive as well.

Suggested change

-		if i >= Validators::<T>::decode_len().unwrap_or(0) as u32 {
+		if i >= Validators::<T>::decode_len().defensive_unwrap_or(0) as u32 {
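For readers following along without the pallet at hand, here is a minimal standalone sketch of the check being discussed. `SessionModel` and its fields are hypothetical in-memory stand-ins for the pallet's storage; only the `enable_index` name and its documented return value are taken from the diff above, and the real function body may differ.

```rust
/// Hypothetical in-memory stand-in for the session pallet's storage.
struct SessionModel {
	/// Number of validators in the current session.
	validator_count: u32,
	/// Sorted indices of the currently disabled validators.
	disabled: Vec<u32>,
}

impl SessionModel {
	/// Re-enable the validator at index `i`.
	/// Returns `false` if the index is out of bounds or the validator was already enabled.
	fn enable_index(&mut self, i: u32) -> bool {
		// In the pallet, the length lookup feeding this check is what the review
		// proposes to make defensive.
		if i >= self.validator_count {
			return false;
		}
		match self.disabled.binary_search(&i) {
			// Found in the disabled list: remove it, keeping the list sorted.
			Ok(pos) => {
				self.disabled.remove(pos);
				true
			},
			// Not disabled, so there is nothing to re-enable.
			Err(_) => false,
		}
	}
}
```

An out-of-bounds index and an already enabled validator both yield `false` here, so a caller cannot tell the two apart from the return value alone.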

substrate/frame/staking/src/lib.rs

Co-authored-by: Ankan <10196091+Ank4n@users.noreply.github.com>

Overkillus · 2024-11-05T12:55:16Z

@Ank4n

Is it possible to move most of the disabling logic to pallet_session?

(assuming we want to do this)

I will discuss if this is a good idea later below but even if we assume we want to do this I don't believe it is appropriate to do this in this PR. This PR only aims to, with minimal changes and refactors, simply allow for validator re-enabling using all the previous already in-place logic. I want to make sure that this PR is minimal and adheres to what was already pre-agreed in the design document for disabling, to limit audit time and just be more sure that everything does exactly what we think it does.

If we have plans to refactor the staking pallet (which as you mentioned will be done anyway during the port to AH) then it should be done in a separate primarily refactor/port PR.

Is it possible to move most of the disabling logic to pallet_session?

(answering if we want to do this)

Staking pallet as it stands is responsible for more than just slashing. It holds important params with regards to the active validator set. For instance min, ideal, and max validator counts. What it means to be disabled is not within the purview of the staking pallet, what it does is simply keeps track of highest offenders up to some limit and makes this info available for others. This is highly customisable and different users of this pallet can add new strategies with new limits.
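As a rough illustration of "keep track of the highest offenders up to some limit", combined with re-enabling, consider the standalone sketch below. The `Decision` struct, the `decide` function, and the use of a plain `u32` as severity are assumptions made for this example; they are not the types used in the PR, and the PR's actual strategy may treat repeated offences differently.

```rust
/// Hypothetical outcome of a disabling decision.
#[derive(Debug, PartialEq)]
struct Decision {
	/// Validator index to disable, if any.
	disable: Option<u32>,
	/// Validator index to re-enable, if any.
	reenable: Option<u32>,
}

/// `disabled` holds (validator index, severity) pairs; `limit` caps how many may be disabled.
fn decide(disabled: &[(u32, u32)], limit: usize, offender: u32, severity: u32) -> Decision {
	let none = Decision { disable: None, reenable: None };
	// Already disabled: this sketch makes no new decision for repeat offenders.
	if disabled.iter().any(|(idx, _)| *idx == offender) {
		return none;
	}
	// Below the limit: simply disable the new offender.
	if disabled.len() < limit {
		return Decision { disable: Some(offender), reenable: None };
	}
	// At the limit: swap out the lowest-severity offender, but only for a strictly
	// more severe offence.
	match disabled.iter().min_by_key(|(_, sev)| *sev) {
		Some((idx, sev)) if severity > *sev =>
			Decision { disable: Some(offender), reenable: Some(*idx) },
		_ => none,
	}
}
```

With a limit of 2 and disabled entries of severity 10 and 50, a new offence of severity 30 would disable the new offender and re-enable the one with severity 10, while a new offence of severity 10 would change nothing.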

Since we keep those validators for an era it makes sense to keep track of their offences (potential disabling status) also for a full era. Storing this information in session when we aim to keep it for era seems like an unintuitive approach.

I might agree that we could make it less opinionated by moving disabling strategies to session and instead keeping a history of all offences (unfiltered by disabling strategies) in staking. Nevertheless, in this approach we should still keep track of all offences in staking. An offence is more than a slash; there might be other byproducts or consequences and we need to allow for them.

In the new pipeline for offences that you suggested, where do you think a history of offences within an era should be stored?

Session - Staking coupling

It is okay to keep staking bounded to session, but session should not be dependent on staking. Adding logic to session that is actually era-wide in scope through hacky reads of signals from staking seems, well, hacky. This information should be kept in a context that is explicitly era-wide and shared with session.

TLDR: This might be a needed change but is outside of the scope of this PR and some of that information should still live in era-wide and not session-wide scopes.

Ank4n · 2024-11-05T14:23:17Z

@Overkillus

I wouldn't try to block you if you want to go ahead with this PR, especially since most of the things I flagged should ideally have been flagged with the earlier PRs, and this PR does not introduce a new design decision. That said:

I will discuss if this is a good idea later below but even if we assume we want to do this I don't believe it is appropriate to do this in this PR.

Since this PR touches the logic that we know we will have to migrate in the next couple of months, it is reasonable enough to make those changes in this PR.

Staking pallet as it stands is responsible for more than just slashing. It holds important params with regards to the active validator set. For instance min, ideal, and max validator counts. What it means to be disabled is not within the purview of the staking pallet, what it does is simply keeps track of highest offenders up to some limit and makes this info available for others. This is highly customisable and different users of this pallet can add new strategies with new limits.

I believe you only require the current count of the active validator set. Could you point me to any param that you need specifically from pallet_staking for disabling? The SlashPerbill is decided based on the offence (also outside pallet_staking) and can be tracked in storage for disabled validators in pallet_session as well. I don't see that it makes any difference.

In the new pipeline for offences that you suggested, where do you think a history of offences within an era should be stored?
It is okay to keep staking bounded to session, but session should not be dependent on staking. Adding logic to session that is actually era-wide in scope through hacky reads of signals from staking seems, well, hacky. This information should be kept in a context that is explicitly era-wide and shared with session.

Why does an offence or disabling have to relate to an era? pallet_session can use a variety of ways to manage this.

Only clear disabled validators when the underlying economic condition changes, i.e. when it receives a new set of validators.
Any time a validator is disabled, keep them disabled for the next x sessions. x can be 6.
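Both options can be combined in a small session-local tracker. The sketch below is standalone and purely illustrative: `DisabledTracker`, its fields, and the rule that a repeated offence extends the window are assumptions, not existing pallet_session behaviour.

```rust
use std::collections::BTreeMap;

/// Hypothetical session-local tracker; not part of pallet_session.
struct DisabledTracker {
	/// Validator index -> last session (inclusive) in which it stays disabled.
	disabled_until: BTreeMap<u32, u32>,
	/// The `x` from the list above: how many sessions a validator stays disabled.
	disable_sessions: u32,
}

impl DisabledTracker {
	/// Disable `validator` for the next `disable_sessions` sessions.
	fn disable(&mut self, validator: u32, current_session: u32) {
		let until = current_session + self.disable_sessions;
		// A repeated offence can extend the window but never shortens it.
		let entry = self.disabled_until.entry(validator).or_insert(until);
		if until > *entry {
			*entry = until;
		}
	}

	/// Called at the start of `session`.
	fn on_new_session(&mut self, session: u32, new_validator_set: bool) {
		if new_validator_set {
			// Option 1: a new validator set clears everything.
			self.disabled_until.clear();
		} else {
			// Option 2: drop entries whose window has passed.
			self.disabled_until.retain(|_, until| *until >= session);
		}
	}

	fn is_disabled(&self, validator: u32) -> bool {
		self.disabled_until.contains_key(&validator)
	}
}
```

With x set to 6, a validator disabled in session 10 stays disabled through session 16 and is enabled again from session 17, unless a new validator set arrives earlier.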

I think we have already tried to hack around with the concept of Era and Session, and having to maintain DisabledValidators in two places is a code smell stemming from that. If we look at pallet-staking independently, disabled in its context makes no meaningful sense. Another code smell is that the disable logic right now is spread between staking and session, with session acting like a proxy that receives disabled validators from staking and passes it to the session handlers.

IMO, this is a flawed design, in addition to the fact that it also doesn’t fit well with the post-AH migration setup.

Overkillus added 7 commits August 27, 2024 13:03

expand DisabledValidators with severity

3d438de

comment

b2d21dd

migration to disabling with severity

4c01a56

initial func tests

996dfab

session val index enabling

d908f65

Disabling Decision Abstraction

ec350ff

DisablingDecision struct refactor

927d686

Overkillus added I1-security The node fails to follow expected, security-sensitive, behaviour. T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Sep 16, 2024

Overkillus requested review from tdimitrov and gpestana September 16, 2024 13:11

Overkillus self-assigned this Sep 16, 2024

Overkillus added 6 commits September 17, 2024 11:16

prdoc

533025b

extra staking func tests

29111f0

decision docs

a9d2450

migration desc in prdoc

dbaa856

severity update todo

7bcfd3e

migration cleanup

9fe1cac

tdimitrov reviewed Sep 19, 2024

substrate/frame/staking/src/lib.rs (outdated)

substrate/frame/staking/src/slashing.rs (outdated)

substrate/frame/staking/src/slashing.rs (outdated)

Overkillus added 7 commits September 23, 2024 18:13

repeated offences can update severity

0d994ad

extra severity update tests

50dbecf

refactor match in add_offending_validator

1db6a26

reenabling test update

29c59a6

reenabling test updates p2

2b78972

changing config and default to reenabling strategy

32941ab

Mock tests and disabling events

c153fc0

Overkillus changed the title ~~Validator Re-Enabling (master PR)~~ Validator Re-Enabling Sep 24, 2024

Overkillus added 2 commits September 24, 2024 19:07

fmt

63cca97

prdoc format fix

ca99191

Overkillus marked this pull request as ready for review September 30, 2024 09:33

tdimitrov reviewed Oct 29, 2024

substrate/frame/staking/src/tests.rs

Overkillus added 4 commits October 30, 2024 13:40

#64 fix

a3aa92f

reverting #64 fix in paras runtime

ece9025

perbill -> OffenceSeverity wrapper

1e1bc48

OffenceSeverity adjusted tests

3382397

ordian reviewed Oct 31, 2024

substrate/frame/staking/src/pallet/mod.rs (outdated)

ordian reviewed Oct 31, 2024

polkadot/runtime/westend/src/lib.rs

Overkillus added 3 commits November 4, 2024 08:59

deduplicate disable_limit

bd7cd72

typo

a07a7e8

non-cumulative assumption for re-enabling strategy impl

675b2be

ordian approved these changes Nov 4, 2024

tdimitrov approved these changes Nov 4, 2024

Overkillus removed request for a team and gpestana November 4, 2024 18:58

paritytech-review-bot bot requested a review from a team November 4, 2024 18:59

ordian requested a review from Ank4n November 4, 2024 19:12

Overkillus added 2 commits November 4, 2024 19:35

Merge branch 'master' into mkz-re-enabling

f07d895

md fmt

273342a

Overkillus added 2 commits November 4, 2024 20:13

fmt

cac5a61

clean

b8f6e04

Ank4n reviewed Nov 4, 2024

typo

2fe9b6d

Co-authored-by: Ankan <10196091+Ank4n@users.noreply.github.com>

paritytech-review-bot bot requested a review from a team November 5, 2024 11:21

Merge branch 'master' into mkz-re-enabling

beaf472
