
Excessive copying of ValidatorStake in epoch manager #2610

Closed
MaksymZavershynskyi opened this issue May 4, 2020 · 18 comments · Fixed by #2805
Labels: A-chain (Area: Chain, client & related), A-security (Area: Security issues)

@MaksymZavershynskyi
Contributor

MaksymZavershynskyi commented May 4, 2020

Currently, every time the epoch manager processes block X, it produces a BlockInfo and copies all proposals from the BlockInfo of block X-1 into the BlockInfo of block X. This accumulates cloned proposals with every block processed, until the end of the epoch. In other words, a proposal submitted in the first block of an epoch is cloned once for every subsequent block in that epoch. See the cloning code here: https://github.com/nearprotocol/nearcore/blob/master/chain/epoch_manager/src/lib.rs#L431
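
Here is a simplified sketch of that accumulation pattern; the struct fields and the `process_block` helper are illustrative stand-ins, not the actual nearcore types:

```rust
// A simplified illustration of the accumulation pattern: each new BlockInfo
// clones every proposal already carried by its parent, so a proposal from the
// first block of the epoch is re-cloned on every subsequent block.
#[derive(Clone, Debug)]
struct ValidatorStake {
    account_id: String,
    stake: u128,
}

#[derive(Clone, Default, Debug)]
struct BlockInfo {
    proposals: Vec<ValidatorStake>,
}

fn process_block(prev: &BlockInfo, new_proposals: Vec<ValidatorStake>) -> BlockInfo {
    let mut info = BlockInfo {
        // This clone is the problem: its cost grows linearly with the number
        // of blocks already processed in the epoch.
        proposals: prev.proposals.clone(),
    };
    info.proposals.extend(new_proposals);
    info
}

fn main() {
    let mut prev = BlockInfo::default();
    for height in 0..5u64 {
        let proposal = ValidatorStake {
            account_id: format!("validator{height}.near"),
            stake: 100,
        };
        prev = process_block(&prev, vec![proposal]);
        println!("block {height}: {} proposals carried forward", prev.proposals.len());
    }
}
```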

This creates a security vulnerability. If someone submits staking transactions from different accounts at 1000 transactions per block, it takes only 227 blocks until the total amount of cloned memory exceeds 4 GiB, since a single ValidatorStake can take up to 112 bytes. Since our current cache holds up to 1000 blocks, this attack becomes feasible, see: https://github.com/nearprotocol/nearcore/blob/7c75fe5bc2eb44ae08a3445116157ac8fa77f2ac/chain/epoch_manager/src/lib.rs#L31

Simple solutions that wouldn't work

Reducing cache size

We could try reducing the cache size, but even with no cache at all the node would still break simply from cloning huge amounts of memory on each new block. Specifically, after 5 hours (assuming we process 1 block per second, i.e. 18,000 blocks' worth of proposals have accumulated) we would be copying about 1.8 GiB per block. Just holding two consecutive BlockInfos would then take almost 4 GiB of memory.

Increasing the cost of staking transactions

If we assume that it is safe to copy 100 MiB of proposals on each block processed, and an epoch consists of 3600*12 blocks, we would have to raise the fees enough to prohibit more than roughly 22 staking transactions per block. This number seems far too low.
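
The 22-per-block figure follows from a back-of-envelope calculation; a small sketch of it, assuming 112 bytes per ValidatorStake as above:

```rust
// Back-of-envelope check of the "roughly 22 staking transactions per block"
// figure, under the stated assumptions: a 100 MiB per-block copy budget,
// a 3600 * 12 block epoch, and up to 112 bytes per ValidatorStake.
fn main() {
    let budget_bytes: u64 = 100 * 1024 * 1024; // 100 MiB
    let blocks_per_epoch: u64 = 3600 * 12;     // 43,200 blocks
    let stake_size: u64 = 112;                 // bytes per ValidatorStake
    // Near the end of the epoch each block copies every proposal accumulated
    // so far, so the per-block proposal limit k must satisfy:
    //   k * blocks_per_epoch * stake_size <= budget_bytes
    let max_proposals_per_block = budget_bytes / (blocks_per_epoch * stake_size);
    println!("~{max_proposals_per_block} proposals per block"); // prints ~21
}
```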

The proposed solution

I suggest redesigning the epoch manager to avoid cloning ValidatorStake altogether.

@MaksymZavershynskyi added the A-chain (Area: Chain, client & related) and A-security (Area: Security issues) labels on May 4, 2020
@MaksymZavershynskyi
Contributor Author

CC @mikhailOK, @bowenwang1996

@MaksymZavershynskyi
Contributor Author

Moving discussion from Slack:

@bowenwang1996
If we don’t want to charge a lot, we can add separate limits on the number of proposals per block

@nearmax
I would reserve protocol modifications for problems that are hard to solve algorithmically. This seems to be a simple design inefficiency.

I don’t see a good reason why we should be aggregating this data on every single block.

We can probably compute the aggregated proposals at the end of the epoch by just iterating over the blocks in the epoch.

@bowenwang1996
Collaborator

bowenwang1996 commented May 4, 2020

I suggest redesigning the epoch manager to avoid cloning ValidatorStake altogether.

We can avoid cloning, but we still have to do aggregation at some point, which will still be slow. Just keeping that many proposals in memory will cost around 460 MB, which may be fine, but the validator-set computation will likely be slower than acceptable.

@bowenwang1996
Collaborator

bowenwang1996 commented May 4, 2020

We can probably compute the aggregated proposals at the end of the epoch by just iterating over the blocks in the epoch.

That is what we did before. See my comment above.

@bowenwang1996
Collaborator

To avoid over-optimization, I think we can do the aggregation at the end of the epoch like we did before, but the current cost might need to be adjusted so that the aggregation and the subsequent computation aren't too slow. I'll do some benchmarking to see what an appropriate number of proposals is.

@mikhailOK
Contributor

The stake return at the end of the epoch has to touch all accounts that made proposals, and it's not possible to optimize because it changes state.
So we need something in the protocol to prevent a lot of small stakes - for example, a limit on the number of proposals below 1/2 of the last threshold.

@MaksymZavershynskyi
Contributor Author

@mikhailOK

The stake return at the end of the epoch has to touch all accounts that made proposals, and it's not possible to optimize because it changes state.

This does not seem to be the bottleneck right now.

@bowenwang1996

We can avoid cloning, but we still have to do aggregation at some point, which will still be slow. Just keeping that many proposals in memory will cost around 460 MB, which may be fine, but the validator-set computation will likely be slower than acceptable.

We can do the aggregation on the database side. Our aggregation is very simple and very similar to what database indexes do. There is no need to load all proposals into memory at the end of the epoch: if we figure out a smart way of writing proposals into the database on each block, then epoch aggregation can be a simple database query. RocksDB, however, does not support queries, so see my explanation below, but I think we can pull it off anyway. Here is how we could solve it if there were no forks:

  • Create a new column in the database;
  • Every time we process a block, for each ValidatorStake write the following key into the column:
    as_string(maximum_stake - stake) + <account id> + as_string(block_index). The format of this key ensures the proposals are ordered by stake size, largest first (see the key-encoding sketch after this comment);
  • At the end of the epoch, start iterating over this column and stop when the records in the column no longer correspond to the stake required for a single seat.

Properties of this approach:

  • We delegate the job of storing ordered proposals to the database engine, instead of reordering them ourselves;
  • For the same account id, larger stakes come first, so it maintains the invariant that a stake can only increase.
  • We completely ignore small stakes at the end of the epoch, since we iterate the column largest stakes first.

Can we make it work with forks?
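
A minimal sketch of the key encoding described above, under the no-forks assumption; `make_key`, the fixed-width zero-padded decimal encoding, and the balances are illustrative, not the actual nearcore storage layer:

```rust
// A sketch of the proposed key layout, assuming a RocksDB-style store that
// iterates keys in lexicographic byte order.
type Balance = u128;

/// Build a column key so that iteration visits the largest stakes first:
/// zero-padded (maximum_stake - stake), then account id, then block index.
fn make_key(max_stake: Balance, stake: Balance, account_id: &str, block_index: u64) -> Vec<u8> {
    // Fixed-width, zero-padded decimal makes lexicographic order match numeric order.
    let mut key = format!("{:039}", max_stake - stake).into_bytes();
    key.extend_from_slice(account_id.as_bytes());
    key.extend_from_slice(format!("{:020}", block_index).as_bytes());
    key
}

fn main() {
    let max = Balance::MAX;
    let small = make_key(max, 5_000, "alice.near", 10);
    let large = make_key(max, 9_000, "bob.near", 12);
    // The larger stake (bob's) sorts first under byte-wise comparison.
    assert!(large < small);
    println!("ordering check passed");
}
```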

@mikhailOK
Contributor

@nearmax

This does not seem to be the bottleneck right now.

  • Create a million accounts and send a million staking transactions; all of their balances get locked.
  • At the end of the epoch all balances get unlocked at once, which is a massive state transition.

We still need to defend from this attack

@MaksymZavershynskyi
Contributor Author

We still need to defend from this attack

I agree.

@bowenwang1996
Collaborator

bowenwang1996 commented May 5, 2020

@nearmax I think I didn't fully understand your approach.

At the end of the epoch, start iterating over this column and stop when the records in the column no longer correspond to the stake required for a single seat.

How does this prevent taking multiple proposals from the same account? We only take one per account. Also we always take the last proposal whereas that is not preserved in your approach.

EDIT: it might work if we switch to using as_string(max_block_index - block_index) + <account id> + as_string(maximum_stake - stake). This can also sort of work with forks -- we find the last final block, do the aggregation this way up to it, and then do a manual aggregation from the last final block onward (which should not be much under normal circumstances). A rough sketch of this ordering follows below.
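
A rough sketch of that ordering, using a BTreeMap as a stand-in for a column iterated in lexicographic key order; the names, key widths, and values are illustrative only:

```rust
// Keys sorted by descending block index, so the first key seen for an account
// is its latest proposal.
use std::collections::{BTreeMap, HashSet};

fn key(max_block: u64, block_index: u64, account_id: &str, max_stake: u128, stake: u128) -> String {
    format!("{:020}{}{:039}", max_block - block_index, account_id, max_stake - stake)
}

fn main() {
    let (max_block, max_stake) = (u64::MAX, u128::MAX);
    let mut column = BTreeMap::new();
    // Two proposals from the same account at different heights.
    column.insert(key(max_block, 5, "alice.near", max_stake, 1_000), ("alice.near", 1_000u128));
    column.insert(key(max_block, 9, "alice.near", max_stake, 7_000), ("alice.near", 7_000u128));

    let mut seen = HashSet::new();
    for (_key, (account, stake)) in &column {
        if seen.insert(*account) {
            // The first hit per account is the proposal from the highest block index.
            println!("{account}: {stake}"); // prints "alice.near: 7000"
        }
    }
}
```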

@bowenwang1996
Collaborator

@mikhailOK good point.

@MaksymZavershynskyi
Contributor Author

@bowenwang1996

How does this prevent taking multiple proposals from the same account? We only take one per account.
Also we always take the last proposal whereas that is not preserved in your approach.

Even better! We then create another column (B) with key/value: and . When we process a block, as we iterate over its proposals, we extract and remove the entry as_string(maximum_stake - prev_stake) + <account id> from column (A) and write as_string(maximum_stake - stake) + <account id> to it.

The original column (A) would then need to have the key: as_string(maximum_stake - stake) + <account id>.

@bowenwang1996
Collaborator

@nearmax looks like something is missing here

with key/value: and .

Also I don't see how that works with forks. Optimizations aside, it seems that the main problem right now is the amount of state updates we potentially need to do. I am curious how many accounts can be updated in 1 second. It is probably related to the size and shape of the trie.

@MaksymZavershynskyi
Contributor Author

Up to 2000 -- that's our TPS with financial transactions.

@mikhailOK
Contributor

The staking pool contract potentially wants to send a lot of proposals from one account; our defense against the million-proposal attack should not ban that case.

@bowenwang1996
Collaborator

After discussion with @evgenykuzyakov, @nearmax, @mikhailOK, @ilblackdragon, we decided to use the following approach: say the seat price computed at the end of epoch X-1 (for epoch X+1) is P; then during epoch X, we reject all staking transactions whose stake is less than P/N, where N is a system parameter. The idea is that the seat price shouldn't change too drastically between epochs, and we can leverage that to reject staking transactions early. Now we need to decide on the value of N (10 was proposed).
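
A minimal sketch of that check, assuming the divisor ends up as a system parameter (the eventual fix calls it `minimum_stake_divisor`); the function name and the numbers are illustrative:

```rust
// During epoch X, reject staking transactions below P / N, where P is the seat
// price computed at the end of epoch X-1 and N is a system parameter.
type Balance = u128;

fn is_stake_allowed(stake: Balance, last_seat_price: Balance, minimum_stake_divisor: Balance) -> bool {
    stake >= last_seat_price / minimum_stake_divisor
}

fn main() {
    let seat_price: Balance = 1_000_000;
    // With N = 10, anything below 100,000 is rejected early.
    assert!(is_stake_allowed(150_000, seat_price, 10));
    assert!(!is_stake_allowed(50_000, seat_price, 10));
    println!("checks passed");
}
```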

@ilblackdragon
Member

Additionally, all BlockInfo fields that are really just "current" tip trackers should probably be reworked. We originally traversed all blocks within an epoch, but that ended up being too slow. Maybe a better alternative exists.

@bowenwang1996
Collaborator

@ilblackdragon we need to store slashed validators at the tip.

bowenwang1996 added a commit that referenced this issue on Jun 15, 2020 (#2805):
Fixes #2610 in two parts:

1. Introduce a `minimum_stake_divisor` to set a lower limit for staking transactions. If someone tries to stake less than `last_seat_price / minimum_stake_divisor`, the transaction will fail.

2. Instead of copying the proposals into every BlockInfo, use an aggregator that keeps the proposal information updated up to the last final block; at the end of an epoch, process the remaining blocks between the last final block and the last block of the epoch (rough sketch below).
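
A minimal sketch of the aggregator idea from part 2, assuming a simple map from account to latest proposal; `EpochAggregator` and its fields are illustrative, not the actual types introduced by the PR:

```rust
// Instead of re-cloning proposals into every BlockInfo, fold them into one
// running aggregator that tracks the last final block.
use std::collections::HashMap;

type AccountId = String;
type Balance = u128;

#[derive(Default)]
struct EpochAggregator {
    /// Latest proposal per account, covering all blocks up to `last_final_height`.
    proposals: HashMap<AccountId, Balance>,
    last_final_height: u64,
}

impl EpochAggregator {
    /// Fold one block's proposals in as finality advances; each block is
    /// processed once instead of its proposals being copied into every child.
    fn update(&mut self, height: u64, block_proposals: &[(AccountId, Balance)]) {
        for (account, stake) in block_proposals {
            self.proposals.insert(account.clone(), *stake);
        }
        self.last_final_height = height;
    }
}

fn main() {
    let mut agg = EpochAggregator::default();
    agg.update(1, &[("alice.near".to_string(), 10)]);
    agg.update(2, &[("alice.near".to_string(), 25), ("bob.near".to_string(), 7)]);
    // At epoch end, only blocks after `last_final_height` still need a manual pass.
    println!("{:?} up to height {}", agg.proposals, agg.last_final_height);
}
```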

Test plan
---------
* Unit tests in epoch manager to make sure that the change in how we store aggregated information doesn't break anything.
* Tests in `neard/src/runtime` to make sure that the proposals that are now invalid are properly rejected.