Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sys 3925 fix primary validator #367

Merged
merged 34 commits into from
Apr 8, 2024
Merged

Conversation

micaelffrancoAV
Copy link
Contributor

@micaelffrancoAV micaelffrancoAV commented Mar 18, 2024

This PR aims to fix the validator chosen to perform certain tasks. Before we used to have this based on block number and that would always return the same validator if the given block number was a multiple of 10.
This new implementation fixes this issue by always incrementing the validator everytime we call calculate_primary_validator with the aim of balancing the validator resources.

This PR includes:

  • Fix to primary validator calculation (calculate_primary_validator).
  • Fix to isPrimary function.
  • Removed tests for the old logic and added new tests.

pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
pallets/eth-bridge/src/tx.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
node/src/chain_spec/stable/mod.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/eth-bridge/src/lib.rs Outdated Show resolved Hide resolved
pallets/eth-bridge/src/tx.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/tests/tests.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@nahuseyoum nahuseyoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
};

if index >= validators.len() {
return Err(Error::<T>::InvalidValidatorIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an open question: Instead of throwing an error here, can we reset the index to 0 and start again? Will this cause any issues? I know we should never get in this state but if we can self heal in a safe way it would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense.
We should not have any issues by resetting the index, going back to zero in the end of the array would be the correct fix for this error because it will follow the modular approach we currently have when advancing index.

Thanks for the comment, I think we should apply it.

pallets/eth-bridge/src/tx.rs Outdated Show resolved Hide resolved
pallets/summary/src/challenge.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
pallets/avn/src/lib.rs Outdated Show resolved Hide resolved
OperationType::Avn => counters.avn as usize,
};

if index >= validators.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick: if this ever happens on a regular basis, the validator at position 0 will be chosen more frequently than everyone else. We could use an alternative rule (to not always be 0) but that would make it more complex, so I'm ok with the 0 as long as we understand this will skew probabilities if it we frequently change the validator set. That is not expected to happen, so I guess it's just an academic point.

OperationType::Avn => counters.avn as usize,
};

if index >= validators.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably ok to have this check here, but this is an indication that we did not do our work correctly somewhere else.
This can only happen if the validators have changed (one has been removed) after we've last chosen the primary collator. But then, that means that one action left the system in a broken state, which is not good.

How to solve this?

  1. ensure that all actions that are potentially destructive clean up after them. Removing validators has to update the primary collator.

Analysis: I don't like this, because it means we have to know, at each specific site (removing validators) all the other parts of the logic that depend on it. It is not the competence of the validators manager to know that, and so it should not be this doing the update of primary collator.

  1. make the operations associated with primary collator resilient to changes in their parameters:

This should be preferable. Ideally, we would not have to this check to update the primary collator in all the functions that make use of that storage item. Currently, we have this one (is_primary) but we semantically doing this verification and update here feels wrong.
This function should only check if a validator is the primary validator. But if there is ambiguity about what that primary is, because the value in storage may be inconsistent, then I think we should have instead a facade for that storage item, a function whose purpose it is to ensure it always reports a valid value.

This whole logic of checking could be migrated there. From the point of view of meaning of the code, it would be more correct. But we would have to discipline ourselves NOT to use the storage item directly.
From a performance aspect, every time we wanted to access the storage item we would do this check and update the item if needed. Most of times, we will do a couple of ifs and return the item's value.

The alternative, here, also incurs this extra cost, but every time we call is_primary. So depending on how often we call each, the cost may be the same or not. But it never goes away. So I'd go with what makes more logical sense.

@micaelffrancoAV micaelffrancoAV merged commit 30dbf7a into main Apr 8, 2024
5 checks passed
@micaelffrancoAV micaelffrancoAV deleted the SYS-3925_Fix_primary_validator branch April 8, 2024 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants