Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit ChunkStateWitness size to 16MB #10615

Closed
wants to merge 3 commits into from

Conversation

jancionear
Copy link
Contributor

@jancionear jancionear commented Feb 14, 2024

This PR limits the size of ChunkStateWitness to 16MB. Witnesses larger than 16MB will be considered invalid.

It'll help to protect from abuse - huge ChunkStateWitnesses would require a lot of resource to process, and they would take up a lot of space in orphan witness pool, so it's good to immediately reject witnesses that are too big.

During StatelessNet loadtests by @staffik the maximum observed size of ChunkStateWitness was 8MB, so 16MB should be a safe limit.

There's a safeguard to make sure that a node doesn't send out a ChunkStateWitness which is larger than 16MB. It's good to have a safeguard there, we don't want to get all nodes banned if it turns out that the estimate was wrong and naturally produced ChunkStateWitnesses are sometimes larger than 16MB.

Technically we measure the size of ChunkStateWitnessInner, not ChunkStateWitness. It was easier to do it this way because sign_chunk_state_witness returns the size of ChunkStateWitnessInner, which is also later used in the metrics.
Limiting the size of ChunkStateWitnessInner is enough, as ChunkStateWitness is made up of ChunkStateWitnessInner and a constant-size signature, so its size is also limited.

Refs: #10259 (but not fixes, it's a primitive limit that doesn't take into account computation costs, etc)

@jancionear jancionear added the A-stateless-validation Area: stateless validation label Feb 14, 2024
@jancionear jancionear requested a review from pugachAG February 14, 2024 22:36
@jancionear jancionear requested a review from a team as a code owner February 14, 2024 22:36
@jancionear jancionear requested a review from staffik February 14, 2024 22:39
Copy link

codecov bot commented Feb 14, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (9c66c5b) 72.19% compared to head (a8bf9a2) 72.17%.
Report is 3 commits behind head on master.

Files Patch % Lines
...src/stateless_validation/state_witness_producer.rs 55.55% 4 Missing ⚠️
...client/src/stateless_validation/chunk_validator.rs 97.10% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10615      +/-   ##
==========================================
- Coverage   72.19%   72.17%   -0.03%     
==========================================
  Files         726      726              
  Lines      147697   147777      +80     
  Branches   147697   147777      +80     
==========================================
+ Hits       106636   106656      +20     
- Misses      36264    36321      +57     
- Partials     4797     4800       +3     
Flag Coverage Δ
backward-compatibility 0.08% <0.00%> (-0.01%) ⬇️
db-migration 0.08% <0.00%> (-0.01%) ⬇️
genesis-check 1.24% <0.00%> (-0.01%) ⬇️
integration-tests 36.98% <12.82%> (-0.01%) ⬇️
linux 71.22% <85.89%> (+<0.01%) ⬆️
linux-nightly 71.62% <92.30%> (-0.01%) ⬇️
macos 54.91% <85.89%> (-0.20%) ⬇️
pytests 1.46% <0.00%> (-0.01%) ⬇️
sanity-checks 1.26% <0.00%> (-0.01%) ⬇️
unittests 68.09% <92.30%> (-0.03%) ⬇️
upgradability 0.13% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@staffik staffik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we can increase MAX_CHUNK_STATE_WITNESS_INNER_SIZE later if it turns out to be too small.

vec![],
&EmptyValidatorSigner::default(),
));
let dummy_partial_state = PartialState::TrieValues(Vec::new());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: PartialState::default()

fn dummy_chunk_state_witness() -> ChunkStateWitness {
let dummy_chunk_header = ShardChunkHeader::V3(ShardChunkHeaderV3::new(
CryptoHash::default(),
CryptoHash::default(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: StateRoot::default()

&EmptyValidatorSigner::default(),
));
let dummy_partial_state = PartialState::TrieValues(Vec::new());
ChunkStateWitness {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We have ChunkStateWitness::empty(), although there is a comment suggesting it might be removed in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had added it for genesis chunk state witness but we can use it for tests

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock, please address the comments.

Copy link
Contributor

@shreyan-gupta shreyan-gupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@pugachAG pugachAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this result in shard becoming unavailable in case when actual state witness size exceeds size limit because all chunk producers for that shard will use the same set of receipts trying to build the chunk?

@Longarithm
Copy link
Member

Right... I see two alternatives to resolve this:

  1. We should probably do this check for now only when we add state witness to cache.
    Then shard becomes unavailable only if for 1/3 chunk validators witness is orphaned.
    But then we also need to keep 5 retries of immediate state witness processing to avoid data races.

  2. Calculate state witness size along with tx/receipt execution.
    Stop execution if previous chunk(s) is(are) skipped when size exceeds some boundary.

2nd is more proper but 1st looks simpler for now.

@staffik
Copy link
Contributor

staffik commented Feb 15, 2024

Stop execution if previous chunk(s) is(are) skipped when size exceeds some boundary.

@Longarithm Do you mean stopping production/validation of current chunk if previous chunk's state witness was too large? I think I do not understand the second idea.

@jancionear
Copy link
Contributor Author

Won't this result in shard becoming unavailable in case when actual state witness size exceeds size limit because all chunk producers for that shard will use the same set of receipts trying to build the chunk?

That's a good point...
I was hoping that every chunk producer produces slightly different chunks, because it chooses slightly different transactions from the pool, but this might not be the case. The receipts are the same every time, plus chunk witness proves a transition proposed by the previous chunk, so things could still break.

But it should still be safe to enforce the size limit for orphaned state witnesses. Orphaned witnesses are an optimization, so it isn't a problem if some nodes miss them. And even then it would happen very rarely, as most witnesses are smaller than 16MB.

Thank you for staying vigilant! I'll close this PR, this requires more work to solve properly. For now I'll just add a size limit for orphans, but leave the rest as is.

@jancionear jancionear closed this Feb 15, 2024
jancionear added a commit to jancionear/nearcore that referenced this pull request Feb 15, 2024
Let's limit the size of orphan witnesses which are
kept in OrphanChunkStateWitnessPool to 16MB to limit
the maximum amount of memory that this pool can consume.

During StatelessNet loadtests by @staffik, the maximum
observed ChunkStateWitness size was 8MB, so 16MB should
be a safe limit. Witnesses above this size won't be saved.
This shouldn't be a problem, if a validator misses one orphaned
ChunkStateWitness they just won't vote on it, which is a normal
occurence in the protocol.

Only size of orphaned witnesses is limited. Witnesses which are
processed immediately can still be aribtrarily large. It might
not be safe to limit non-orphan witness, see the discussion in
near#10615
jancionear added a commit to jancionear/nearcore that referenced this pull request Feb 20, 2024
Let's limit the size of orphan witnesses which are
kept in OrphanChunkStateWitnessPool to 16MB to limit
the maximum amount of memory that this pool can consume.

During StatelessNet loadtests by @staffik, the maximum
observed ChunkStateWitness size was 8MB, so 16MB should
be a safe limit. Witnesses above this size won't be saved.
This shouldn't be a problem, if a validator misses one orphaned
ChunkStateWitness they just won't vote on it, which is a normal
occurence in the protocol.

Only size of orphaned witnesses is limited. Witnesses which are
processed immediately can still be aribtrarily large. It might
not be safe to limit non-orphan witness, see the discussion in
near#10615
github-merge-queue bot pushed a commit that referenced this pull request Feb 23, 2024
### Description
This PR adds a pool for orphaned `ChunkStateWitnesses`.
To process a `ChunkStateWitness` we need the previous block, but
sometimes it isn't available immediately. The node might receive a
`ChunkStateWitness` before the block that's required to process it. In
such cases the witness becomes an "orphaned chunk state witness" and
it's put in `OrphanChunkStateWitnessPool`, where it waits for the
desired block to appear. Once a new block is accepted, we fetch all
orphaned witnesses that were waiting for this block from the pool and
process them.

### Design of `OrphanStateWitnessPool`

`OrphanStateWitnessPool` keeps a cache which maps `shard_id` and
`height` to an orphan `ChunkStateWitness` with these parameters:
```rust
witness_cache: LruCache<(ShardId, BlockHeight), ChunkStateWitness>,
```

All `ChunkStateWitnesses` go through basic validation before being put
in the orphan cache.
* The signature is checked to make sure that this witness really comes
from the right chunk producer that should produce a witness at this
height and shard_id.
* Client keeps only witnesses which are within 5 blocks of the current
chain head to prevent spam attacks. Without this limitation a single
malicious chunk producer could fill the whole cache with their fake
witnesses.
* There's also a limitation on witness size to limit the amount of
memory consumed by the pool. During StatelessNet loadtests performed by
`@staffik` and `@Longarithm` the observed `ChunkStateWitness` sIze was
16-32MB, so a 40MB limit should be alright. This PR only limits the size
of orphaned witnesses, limiting the size of non-orphan witnesses is much
more tricky, see the discussion in
#10615.

It's impossible to fully validate an orphaned witness, but this partial
validation should be enough to protect against attacks on the orphan
pool.

Under normal circumstances there should be only a few orphaned witnesses
per shard. If the node has fallen behind by more than a few blocks, it
has to catch up and its chunk endorsements don't matter.
The default cache capacity is set to 25 witnesses. With 5 shards it
provides capacity for 5 orphaned witnesses on each shard, which should
be enough.
Assuming that a single witness can take up 40 MB, the pool will consume
at most 1GB at full capacity.

The changes are divided into individual commits, they can be reviewed
commit-by-commit.

### Fixes
Fixes: #10552
Fixes: near/stakewars-iv#15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-stateless-validation Area: stateless validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants