
Excessive copying of ValidatorStake in epoch manager #2610

Closed
MaksymZavershynskyi opened this issue May 4, 2020 · 18 comments · Fixed by #2805
Labels: A-chain (Area: Chain, client & related), A-security (Area: Security issues)

@MaksymZavershynskyi
Contributor

MaksymZavershynskyi commented May 4, 2020

Currently, every time the epoch manager processes block X, it produces a BlockInfo and copies all proposals from the BlockInfo of block X-1 into the BlockInfo of block X. This accumulates cloned proposals with every block processed, until the end of the epoch. In other words, a proposal submitted in the first block of an epoch is cloned once for every subsequent block in that epoch. See the cloning code here: https://github.com/nearprotocol/nearcore/blob/master/chain/epoch_manager/src/lib.rs#L431
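
Here is a simplified sketch of that accumulation pattern; the struct fields and the `process_block` helper are illustrative stand-ins, not the actual nearcore types:

```rust
// A simplified illustration of the accumulation pattern: each new BlockInfo
// clones every proposal already carried by its parent, so a proposal from the
// first block of the epoch is re-cloned on every subsequent block.
#[derive(Clone, Debug)]
struct ValidatorStake {
    account_id: String,
    stake: u128,
}

#[derive(Clone, Default, Debug)]
struct BlockInfo {
    proposals: Vec<ValidatorStake>,
}

fn process_block(prev: &BlockInfo, new_proposals: Vec<ValidatorStake>) -> BlockInfo {
    let mut info = BlockInfo {
        // This clone is the problem: its cost grows linearly with the number
        // of blocks already processed in the epoch.
        proposals: prev.proposals.clone(),
    };
    info.proposals.extend(new_proposals);
    info
}

fn main() {
    let mut prev = BlockInfo::default();
    for height in 0..5u64 {
        let proposal = ValidatorStake {
            account_id: format!("validator{height}.near"),
            stake: 100,
        };
        prev = process_block(&prev, vec![proposal]);
        println!("block {height}: {} proposals carried forward", prev.proposals.len());
    }
}
```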

This creates a security vulnerability. If someone submits staking transactions from different accounts at 1000 transactions per block, it takes only 227 blocks until the total amount of cloned memory exceeds 4 GiB, since a single ValidatorStake can take up to 112 bytes. Since our current cache holds up to 1000 blocks, this attack becomes feasible, see: https://github.com/nearprotocol/nearcore/blob/7c75fe5bc2eb44ae08a3445116157ac8fa77f2ac/chain/epoch_manager/src/lib.rs#L31

Simple solutions that wouldn't work

Reducing cache size

We could try reducing the cache size, but even with no cache at all the node would still break simply from cloning huge amounts of memory on each new block. Specifically, after 5 hours (assuming we process 1 block per second, i.e. 18,000 blocks' worth of proposals have accumulated) we would be copying about 1.8 GiB per block. Just holding two consecutive BlockInfos would then take almost 4 GiB of memory.

Increasing the cost of staking transactions

If we assume that it is safe to copy 100 MiB of proposals on each block processed, and an epoch consists of 3600*12 blocks, we would have to raise the fees enough to prohibit more than roughly 22 staking transactions per block. This number seems far too low.
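
The 22-per-block figure follows from a back-of-envelope calculation; a small sketch of it, assuming 112 bytes per ValidatorStake as above:

```rust
// Back-of-envelope check of the "roughly 22 staking transactions per block"
// figure, under the stated assumptions: a 100 MiB per-block copy budget,
// a 3600 * 12 block epoch, and up to 112 bytes per ValidatorStake.
fn main() {
    let budget_bytes: u64 = 100 * 1024 * 1024; // 100 MiB
    let blocks_per_epoch: u64 = 3600 * 12;     // 43,200 blocks
    let stake_size: u64 = 112;                 // bytes per ValidatorStake
    // Near the end of the epoch each block copies every proposal accumulated
    // so far, so the per-block proposal limit k must satisfy:
    //   k * blocks_per_epoch * stake_size <= budget_bytes
    let max_proposals_per_block = budget_bytes / (blocks_per_epoch * stake_size);
    println!("~{max_proposals_per_block} proposals per block"); // prints ~21
}
```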

The proposed solution

I suggest redesigning the epoch manager to avoid cloning ValidatorStake altogether.

@MaksymZavershynskyi added the A-chain (Area: Chain, client & related) and A-security (Area: Security issues) labels on May 4, 2020
@MaksymZavershynskyi
Contributor Author

CC @mikhailOK, @bowenwang1996

@MaksymZavershynskyi
Contributor Author

Moving discussion from Slack:

@bowenwang1996
If we don’t want to charge a lot, we can add separate limits on the number of proposals per block

@nearmax
I would reserve protocol modifications for problems that are hard to solve algorithmically. This seems to be a simple design inefficiency.

I don’t see a good reason why we should be aggregating this data on every single block.

We can probably compute the aggregated proposals at the end of the epoch by just iterating over the blocks in the epoch.

@bowenwang1996
Collaborator

bowenwang1996 commented May 4, 2020

I suggest redesigning the epoch manager to avoid cloning ValidatorStake altogether.

We can avoid cloning, but we still have to do aggregation at some point, which will still be slow. Just keeping that many proposals in memory will cost around 460 MB, which may be fine, but the validator-set computation will likely be slower than acceptable.

@bowenwang1996
Collaborator

bowenwang1996 commented May 4, 2020

We can probably compute the aggregated proposals at the end of the epoch by just iterating over the blocks in the epoch.

That is what we did before. See my comment above.

@bowenwang1996
Collaborator

To avoid over-optimization, I think we can do the aggregation at the end of the epoch like we did before, but the current cost might need to be adjusted so that the aggregation and the subsequent computation aren't too slow. I'll do some benchmarking to see what an appropriate number of proposals is.

@mikhailOK
Contributor

The stake return at the end of the epoch has to touch all accounts that made proposals, and it's not possible to optimize because it changes state.
So we need something in the protocol to prevent a lot of small stakes - for example, a limit on the number of proposals below 1/2 of the last threshold.

@MaksymZavershynskyi
Contributor Author

@mikhailOK

The stake return at the end of the epoch has to touch all accounts that made proposals, and it's not possible to optimize because it changes state.

This does not seem to be the bottleneck right now.

@bowenwang1996

We can avoid cloning, but we still have to do aggregation at some point, which will still be slow. Just keeping that many proposals in memory will cost around 460 MB, which may be fine, but the validator-set computation will likely be slower than acceptable.

We can do the aggregation on the database side. Our aggregation is very simple and very similar to what database indexes do. There is no need to load all proposals into memory at the end of the epoch: if we figure out a smart way of writing proposals into the database on each block, then epoch aggregation can be a simple database query. RocksDB, however, does not support queries, so see my explanation below, but I think we can pull it off anyway. Here is how we could solve it if there were no forks:

  • Create a new column in the database;
  • Every time we process a block, for each ValidatorStake write the following key into the column:
    as_string(maximum_stake - stake) + <account id> + as_string(block_index). The format of this key ensures the proposals are ordered by stake size, largest first (see the key-encoding sketch after this comment);
  • At the end of the epoch, start iterating over this column and stop when the records in the column no longer correspond to the stake required for a single seat.

Properties of this approach:

  • We delegate the job of storing ordered proposals to the database engine, instead of reordering them ourselves;
  • For the same account id, larger stakes come first, so it maintains the invariant that a stake can only increase.
  • We completely ignore small stakes at the end of the epoch, since we iterate the column largest stakes first.

Can we make it work with forks?
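
A minimal sketch of the key encoding described above, under the no-forks assumption; `make_key`, the fixed-width zero-padded decimal encoding, and the balances are illustrative, not the actual nearcore storage layer:

```rust
// A sketch of the proposed key layout, assuming a RocksDB-style store that
// iterates keys in lexicographic byte order.
type Balance = u128;

/// Build a column key so that iteration visits the largest stakes first:
/// zero-padded (maximum_stake - stake), then account id, then block index.
fn make_key(max_stake: Balance, stake: Balance, account_id: &str, block_index: u64) -> Vec<u8> {
    // Fixed-width, zero-padded decimal makes lexicographic order match numeric order.
    let mut key = format!("{:039}", max_stake - stake).into_bytes();
    key.extend_from_slice(account_id.as_bytes());
    key.extend_from_slice(format!("{:020}", block_index).as_bytes());
    key
}

fn main() {
    let max = Balance::MAX;
    let small = make_key(max, 5_000, "alice.near", 10);
    let large = make_key(max, 9_000, "bob.near", 12);
    // The larger stake (bob's) sorts first under byte-wise comparison.
    assert!(large < small);
    println!("ordering check passed");
}
```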

@mikhailOK
Contributor

@nearmax

This does not seem to be the bottleneck right now.

  • Create a million accounts and send a million staking transactions; all of their balances get locked.
  • At the end of the epoch all balances get unlocked at once, which is a massive state transition.

We still need to defend from this attack

@MaksymZavershynskyi
Contributor Author

We still need to defend from this attack

I agree.

@bowenwang1996
Collaborator

bowenwang1996 commented May 5, 2020

@nearmax I think I didn't fully understand your approach.

At the end of the epoch, start iterating over this column and stop when the records in the column no longer correspond to the stake required for a single seat.

How does this prevent taking multiple proposals from the same account? We only take one per account. Also we always take the last proposal whereas that is not preserved in your approach.

EDIT: it might work if we switch to using as_string(max_block_index - block_index) + <account id> + as_string(maximum_stake - stake). This can also sort of work with forks -- we find the last final block, do the aggregation this way up to it, and then do a manual aggregation from the last final block onward (which should not be much under normal circumstances). A rough sketch of this ordering follows below.
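
A rough sketch of that ordering, using a BTreeMap as a stand-in for a column iterated in lexicographic key order; the names, key widths, and values are illustrative only:

```rust
// Keys sorted by descending block index, so the first key seen for an account
// is its latest proposal.
use std::collections::{BTreeMap, HashSet};

fn key(max_block: u64, block_index: u64, account_id: &str, max_stake: u128, stake: u128) -> String {
    format!("{:020}{}{:039}", max_block - block_index, account_id, max_stake - stake)
}

fn main() {
    let (max_block, max_stake) = (u64::MAX, u128::MAX);
    let mut column = BTreeMap::new();
    // Two proposals from the same account at different heights.
    column.insert(key(max_block, 5, "alice.near", max_stake, 1_000), ("alice.near", 1_000u128));
    column.insert(key(max_block, 9, "alice.near", max_stake, 7_000), ("alice.near", 7_000u128));

    let mut seen = HashSet::new();
    for (_key, (account, stake)) in &column {
        if seen.insert(*account) {
            // The first hit per account is the proposal from the highest block index.
            println!("{account}: {stake}"); // prints "alice.near: 7000"
        }
    }
}
```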

@bowenwang1996
Collaborator

@mikhailOK good point.

@MaksymZavershynskyi
Contributor Author

@bowenwang1996

How does this prevent taking multiple proposals from the same account? We only take one per account.
Also we always take the last proposal whereas that is not preserved in your approach.

Even better! We then create another column (B) with key/value: and . When we process a block, as we iterate over its proposals, we extract and remove the entry as_string(maximum_stake - prev_stake) + <account id> from column (A) and write as_string(maximum_stake - stake) + <account id> to it.

The original column (A) would then need to have the key: as_string(maximum_stake - stake) + <account id>.

@bowenwang1996
Collaborator

@nearmax looks like something is missing here

with key/value: and .

Also I don't see how that works with forks. Optimizations aside, it seems that the main problem right now is the amount of state updates we potentially need to do. I am curious how many accounts can be updated in 1 second. It is probably related to the size and shape of the trie.

@MaksymZavershynskyi
Contributor Author

Up to 2000 -- that's our TPS with financial transactions.

@mikhailOK
Contributor

The staking pool contract potentially wants to send a lot of proposals from one account; our defense against the million-proposal attack should not ban that case.

@bowenwang1996
Collaborator

After discussion with @evgenykuzyakov, @nearmax, @mikhailOK, @ilblackdragon, we decided to use the following approach: say the seat price computed at the end of epoch X-1 (for epoch X+1) is P; then during epoch X, we reject all staking transactions whose stake is less than P/N, where N is a system parameter. The idea is that the seat price shouldn't change too drastically between epochs, and we can leverage that to reject staking transactions early. Now we need to decide on the value of N (10 was proposed).
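
A minimal sketch of that check, assuming the divisor ends up as a system parameter (the eventual fix calls it `minimum_stake_divisor`); the function name and the numbers are illustrative:

```rust
// During epoch X, reject staking transactions below P / N, where P is the seat
// price computed at the end of epoch X-1 and N is a system parameter.
type Balance = u128;

fn is_stake_allowed(stake: Balance, last_seat_price: Balance, minimum_stake_divisor: Balance) -> bool {
    stake >= last_seat_price / minimum_stake_divisor
}

fn main() {
    let seat_price: Balance = 1_000_000;
    // With N = 10, anything below 100,000 is rejected early.
    assert!(is_stake_allowed(150_000, seat_price, 10));
    assert!(!is_stake_allowed(50_000, seat_price, 10));
    println!("checks passed");
}
```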

@ilblackdragon
Member

Additionally, all BlockInfo fields that are really just "current" tip trackers should probably be reworked. We originally traversed all blocks within an epoch, but that ended up being too slow. Maybe a better alternative exists.

@bowenwang1996
Collaborator

@ilblackdragon we need to store slashed validators at the tip.

bowenwang1996 added a commit that referenced this issue on Jun 15, 2020 (#2805):
Fixes #2610 in two parts:

1. Introduce a `minimum_stake_divisor` to set a lower limit for staking transactions. If someone tries to stake less than `last_seat_price / minimum_stake_divisor`, the transaction will fail.

2. Instead of copying the proposals into every BlockInfo, use an aggregator that keeps the proposal information updated up to the last final block; at the end of an epoch, process the remaining blocks between the last final block and the last block of the epoch (rough sketch below).
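
A minimal sketch of the aggregator idea from part 2, assuming a simple map from account to latest proposal; `EpochAggregator` and its fields are illustrative, not the actual types introduced by the PR:

```rust
// Instead of re-cloning proposals into every BlockInfo, fold them into one
// running aggregator that tracks the last final block.
use std::collections::HashMap;

type AccountId = String;
type Balance = u128;

#[derive(Default)]
struct EpochAggregator {
    /// Latest proposal per account, covering all blocks up to `last_final_height`.
    proposals: HashMap<AccountId, Balance>,
    last_final_height: u64,
}

impl EpochAggregator {
    /// Fold one block's proposals in as finality advances; each block is
    /// processed once instead of its proposals being copied into every child.
    fn update(&mut self, height: u64, block_proposals: &[(AccountId, Balance)]) {
        for (account, stake) in block_proposals {
            self.proposals.insert(account.clone(), *stake);
        }
        self.last_final_height = height;
    }
}

fn main() {
    let mut agg = EpochAggregator::default();
    agg.update(1, &[("alice.near".to_string(), 10)]);
    agg.update(2, &[("alice.near".to_string(), 25), ("bob.near".to_string(), 7)]);
    // At epoch end, only blocks after `last_final_height` still need a manual pass.
    println!("{:?} up to height {}", agg.proposals, agg.last_final_height);
}
```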

Test plan
---------
* Unit tests in epoch manager to make sure that the change in how we store aggregated information doesn't break anything.
* Tests in `neard/src/runtime` to make sure that the proposals that are now invalid are properly rejected.