Limit ChunkStateWitness size to 16MB #10615

jancionear · 2024-02-14T22:36:43Z

This PR limits the size of ChunkStateWitness to 16MB. Witnesses larger than 16MB will be considered invalid.

It'll help to protect from abuse: huge ChunkStateWitnesses would require a lot of resources to process and would take up a lot of space in the orphan witness pool, so it's good to immediately reject witnesses that are too big.

During StatelessNet loadtests by @staffik the maximum observed size of ChunkStateWitness was 8MB, so 16MB should be a safe limit.

There's a safeguard to make sure that a node doesn't send out a ChunkStateWitness which is larger than 16MB. It's good to have a safeguard there: we don't want to get all nodes banned if it turns out that the estimate was wrong and naturally produced ChunkStateWitnesses are sometimes larger than 16MB.
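
A minimal sketch of what such a send-side safeguard could look like. The constant name is borrowed from the review discussion, but its exact value (16 MiB rather than 16,000,000 bytes), the `SendDecision` type, and the `decide_send` function are assumptions made for illustration, not the code in this PR.

```rust
/// Assumed limit on the serialized inner witness, in bytes.
/// Whether the PR uses 16 * 1024 * 1024 or 16_000_000 is not stated here.
pub const MAX_CHUNK_STATE_WITNESS_INNER_SIZE: usize = 16 * 1024 * 1024;

/// Hypothetical result of the producer-side check.
#[derive(Debug, PartialEq, Eq)]
pub enum SendDecision {
    /// The witness is small enough to be sent to chunk validators.
    Send,
    /// The witness is over the limit, so sending it would only get it rejected.
    SkipTooLarge { size: usize, limit: usize },
}

/// Decide whether a freshly produced witness should be sent out,
/// given the size of its serialized inner part.
pub fn decide_send(serialized_inner_size: usize) -> SendDecision {
    if serialized_inner_size > MAX_CHUNK_STATE_WITNESS_INNER_SIZE {
        SendDecision::SkipTooLarge {
            size: serialized_inner_size,
            limit: MAX_CHUNK_STATE_WITNESS_INNER_SIZE,
        }
    } else {
        SendDecision::Send
    }
}
```

In this sketch a witness exactly at the limit is still sent; only strictly larger witnesses are held back.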

Technically we measure the size of ChunkStateWitnessInner, not ChunkStateWitness. It was easier to do it this way because sign_chunk_state_witness returns the size of ChunkStateWitnessInner, which is also later used in the metrics.
Limiting the size of ChunkStateWitnessInner is enough, as ChunkStateWitness is made up of ChunkStateWitnessInner and a constant-size signature, so its size is also limited.
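
The reasoning above can be sketched roughly as follows. `SignedWitness`, `check_inner_size`, `total_size`, and the 64-byte `SIGNATURE_SIZE` are hypothetical stand-ins chosen for this example; the real types and the real signature encoding may differ.

```rust
/// Assumed limit on the serialized inner witness, in bytes.
pub const MAX_CHUNK_STATE_WITNESS_INNER_SIZE: usize = 16 * 1024 * 1024;

/// Assumed constant signature size, used only for illustration.
pub const SIGNATURE_SIZE: usize = 64;

/// Hypothetical stand-in for a signed witness: serialized inner part plus signature.
pub struct SignedWitness {
    pub inner_bytes: Vec<u8>,
    pub signature: [u8; SIGNATURE_SIZE],
}

/// Error returned when the inner part is over the limit.
#[derive(Debug, PartialEq, Eq)]
pub struct WitnessTooLarge {
    pub inner_size: usize,
    pub limit: usize,
}

/// Validate only the inner part, as described in the PR text.
pub fn check_inner_size(witness: &SignedWitness) -> Result<(), WitnessTooLarge> {
    let inner_size = witness.inner_bytes.len();
    if inner_size > MAX_CHUNK_STATE_WITNESS_INNER_SIZE {
        return Err(WitnessTooLarge {
            inner_size,
            limit: MAX_CHUNK_STATE_WITNESS_INNER_SIZE,
        });
    }
    Ok(())
}

/// Total size is the inner size plus a constant, so bounding the inner
/// part also bounds the whole witness.
pub fn total_size(witness: &SignedWitness) -> usize {
    witness.inner_bytes.len() + SIGNATURE_SIZE
}
```

Any witness that passes `check_inner_size` has a total size of at most `MAX_CHUNK_STATE_WITNESS_INNER_SIZE + SIGNATURE_SIZE` in this model.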

Refs: #10259 (but not fixes, it's a primitive limit that doesn't take into account computation costs, etc)

codecov · 2024-02-14T22:58:29Z

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (9c66c5b) 72.19% compared to head (a8bf9a2) 72.17%.
Report is 3 commits behind head on master.

Files	Patch %	Lines
...src/stateless_validation/state_witness_producer.rs	55.55%	4 Missing ⚠️
...client/src/stateless_validation/chunk_validator.rs	97.10%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10615      +/-   ##
==========================================
- Coverage   72.19%   72.17%   -0.03%     
==========================================
  Files         726      726              
  Lines      147697   147777      +80     
  Branches   147697   147777      +80     
==========================================
+ Hits       106636   106656      +20     
- Misses      36264    36321      +57     
- Partials     4797     4800       +3

Flag	Coverage Δ
backward-compatibility	`0.08% <0.00%> (-0.01%)`	⬇️
db-migration	`0.08% <0.00%> (-0.01%)`	⬇️
genesis-check	`1.24% <0.00%> (-0.01%)`	⬇️
integration-tests	`36.98% <12.82%> (-0.01%)`	⬇️
linux	`71.22% <85.89%> (+<0.01%)`	⬆️
linux-nightly	`71.62% <92.30%> (-0.01%)`	⬇️
macos	`54.91% <85.89%> (-0.20%)`	⬇️
pytests	`1.46% <0.00%> (-0.01%)`	⬇️
sanity-checks	`1.26% <0.00%> (-0.01%)`	⬇️
unittests	`68.09% <92.30%> (-0.03%)`	⬇️
upgradability	`0.13% <0.00%> (-0.01%)`	⬇️

staffik

LGTM, we can increase MAX_CHUNK_STATE_WITNESS_INNER_SIZE later if it turns out to be too small.

staffik · 2024-02-15T08:29:03Z

chain/client/src/stateless_validation/chunk_validator.rs

+            vec![],
+            &EmptyValidatorSigner::default(),
+        ));
+        let dummy_partial_state = PartialState::TrieValues(Vec::new());


nit: PartialState::default()

staffik · 2024-02-15T08:31:58Z

chain/client/src/stateless_validation/chunk_validator.rs

+    fn dummy_chunk_state_witness() -> ChunkStateWitness {
+        let dummy_chunk_header = ShardChunkHeader::V3(ShardChunkHeaderV3::new(
+            CryptoHash::default(),
+            CryptoHash::default(),


nit: StateRoot::default()

staffik · 2024-02-15T08:36:13Z

chain/client/src/stateless_validation/chunk_validator.rs

+            &EmptyValidatorSigner::default(),
+        ));
+        let dummy_partial_state = PartialState::TrieValues(Vec::new());
+        ChunkStateWitness {


nit: We have ChunkStateWitness::empty(), although there is a comment suggesting it might be removed in the future.

Yeah, I had added it for genesis chunk state witness but we can use it for tests

Longarithm

Approving to unblock, please address the comments.

shreyan-gupta

Looks good!

pugachAG

Won't this result in the shard becoming unavailable when the actual state witness size exceeds the size limit, because all chunk producers for that shard will use the same set of receipts when trying to build the chunk?

Longarithm · 2024-02-15T11:17:15Z

Right... I see two alternatives to resolve this:

1. We should probably do this check for now only when we add the state witness to the cache. Then the shard becomes unavailable only if the witness is orphaned for 1/3 of chunk validators. But then we also need to keep 5 retries of immediate state witness processing to avoid data races.
2. Calculate state witness size along with tx/receipt execution. Stop execution if previous chunk(s) is(are) skipped when size exceeds some boundary.

2nd is more proper but 1st looks simpler for now.
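
One possible reading of the first half of the second alternative, as a rough sketch: track the witness size while receipts are executed and defer the remaining receipts once a soft boundary is reached. The `Receipt` and `ExecutionOutcome` types, the `recorded_bytes` field, and `execute_with_size_budget` are all hypothetical, and the condition about skipped previous chunks is not modeled here.

```rust
/// Hypothetical receipt: `recorded_bytes` stands for the amount of state
/// that executing it would add to the witness.
pub struct Receipt {
    pub id: u64,
    pub recorded_bytes: usize,
}

/// Hypothetical summary of one chunk's execution.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ExecutionOutcome {
    pub executed: Vec<u64>,
    pub delayed: Vec<u64>,
    pub witness_size: usize,
}

/// Execute receipts in order, stopping once the accumulated witness size
/// has reached `soft_limit`. The check happens before each receipt, so the
/// final size can overshoot the limit by at most one receipt, and at least
/// one receipt is always executed when `soft_limit > 0`.
pub fn execute_with_size_budget(receipts: &[Receipt], soft_limit: usize) -> ExecutionOutcome {
    let mut outcome = ExecutionOutcome::default();
    for (index, receipt) in receipts.iter().enumerate() {
        if outcome.witness_size >= soft_limit {
            // Boundary reached: everything from here on is deferred.
            outcome.delayed.extend(receipts[index..].iter().map(|r| r.id));
            break;
        }
        outcome.witness_size += receipt.recorded_bytes;
        outcome.executed.push(receipt.id);
    }
    outcome
}
```

Because every chunk producer would apply the same rule to the same receipts, a limit of this kind bounds the witness at production time instead of rejecting it afterwards.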

staffik · 2024-02-15T11:25:19Z

Stop execution if previous chunk(s) is(are) skipped when size exceeds some boundary.

@Longarithm Do you mean stopping production/validation of current chunk if previous chunk's state witness was too large? I think I do not understand the second idea.

jancionear · 2024-02-15T12:08:22Z

Won't this result in the shard becoming unavailable when the actual state witness size exceeds the size limit, because all chunk producers for that shard will use the same set of receipts when trying to build the chunk?

That's a good point...
I was hoping that every chunk producer produces slightly different chunks, because it chooses slightly different transactions from the pool, but this might not be the case. The receipts are the same every time, plus chunk witness proves a transition proposed by the previous chunk, so things could still break.

But it should still be safe to enforce the size limit for orphaned state witnesses. Orphaned witnesses are an optimization, so it isn't a problem if some nodes miss them. And even then it would happen very rarely, as most witnesses are smaller than 16MB.

Thank you for staying vigilant! I'll close this PR; solving this properly requires more work. For now I'll just add a size limit for orphans, but leave the rest as is.

@staffik

Let's limit the size of orphan witnesses which are kept in OrphanChunkStateWitnessPool to 16MB to bound the maximum amount of memory that this pool can consume. During StatelessNet loadtests by @staffik, the maximum observed ChunkStateWitness size was 8MB, so 16MB should be a safe limit. Witnesses above this size won't be saved. This shouldn't be a problem: if a validator misses one orphaned ChunkStateWitness they just won't vote on it, which is a normal occurrence in the protocol. Only the size of orphaned witnesses is limited. Witnesses which are processed immediately can still be arbitrarily large. It might not be safe to limit non-orphan witnesses; see the discussion in near#10615.

### Description

This PR adds a pool for orphaned `ChunkStateWitnesses`.

To process a `ChunkStateWitness` we need the previous block, but sometimes it isn't available immediately. The node might receive a `ChunkStateWitness` before the block that's required to process it. In such cases the witness becomes an "orphaned chunk state witness" and it's put in `OrphanChunkStateWitnessPool`, where it waits for the desired block to appear. Once a new block is accepted, we fetch all orphaned witnesses that were waiting for this block from the pool and process them.

### Design of `OrphanStateWitnessPool`

`OrphanStateWitnessPool` keeps a cache which maps `shard_id` and `height` to an orphan `ChunkStateWitness` with these parameters:

```rust
witness_cache: LruCache<(ShardId, BlockHeight), ChunkStateWitness>,
```

All `ChunkStateWitnesses` go through basic validation before being put in the orphan cache.

* The signature is checked to make sure that this witness really comes from the right chunk producer that should produce a witness at this height and shard_id.
* Client keeps only witnesses which are within 5 blocks of the current chain head to prevent spam attacks. Without this limitation a single malicious chunk producer could fill the whole cache with their fake witnesses.
* There's also a limitation on witness size to limit the amount of memory consumed by the pool. During StatelessNet loadtests performed by `@staffik` and `@Longarithm` the observed `ChunkStateWitness` size was 16-32MB, so a 40MB limit should be alright. This PR only limits the size of orphaned witnesses, limiting the size of non-orphan witnesses is much more tricky, see the discussion in #10615.

It's impossible to fully validate an orphaned witness, but this partial validation should be enough to protect against attacks on the orphan pool.

Under normal circumstances there should be only a few orphaned witnesses per shard. If the node has fallen behind by more than a few blocks, it has to catch up and its chunk endorsements don't matter.

The default cache capacity is set to 25 witnesses. With 5 shards it provides capacity for 5 orphaned witnesses on each shard, which should be enough. Assuming that a single witness can take up 40 MB, the pool will consume at most 1GB at full capacity.

The changes are divided into individual commits, they can be reviewed commit-by-commit.

### Fixes

Fixes: #10552
Fixes: near/stakewars-iv#15
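
The design described above can be approximated with a small standard-library-only sketch. It is not the implementation from #10613: it swaps the `LruCache` for a `VecDeque` with oldest-first eviction, omits the signature check, treats "within 5 blocks of the head" as an absolute height difference, and reads 40MB as 40,000,000 bytes. All type, constant, and method names below are assumptions.

```rust
use std::collections::VecDeque;

type ShardId = u64;
type BlockHeight = u64;
type BlockHash = [u8; 32];

/// Assumed values taken from the description above (40MB read as decimal).
pub const MAX_ORPHAN_WITNESS_SIZE: usize = 40_000_000;
pub const MAX_ORPHAN_DISTANCE_FROM_HEAD: BlockHeight = 5;
pub const DEFAULT_CAPACITY: usize = 25;

/// Hypothetical stand-in for an orphaned witness; only metadata is kept.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrphanWitness {
    pub shard_id: ShardId,
    pub height: BlockHeight,
    pub prev_block_hash: BlockHash,
    pub encoded_size: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AddOrphanResult {
    Added,
    TooBig,
    TooFarFromHead,
}

/// Simplified pool: oldest-first eviction instead of true LRU.
pub struct OrphanPool {
    capacity: usize,
    entries: VecDeque<OrphanWitness>,
}

impl OrphanPool {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: VecDeque::new() }
    }

    /// Apply the size and distance checks, then store the witness.
    pub fn add(&mut self, witness: OrphanWitness, head_height: BlockHeight) -> AddOrphanResult {
        if witness.encoded_size > MAX_ORPHAN_WITNESS_SIZE {
            return AddOrphanResult::TooBig;
        }
        if witness.height.abs_diff(head_height) > MAX_ORPHAN_DISTANCE_FROM_HEAD {
            return AddOrphanResult::TooFarFromHead;
        }
        // At most one witness per (shard_id, height) key, as in the cache above.
        let key = (witness.shard_id, witness.height);
        self.entries.retain(|w| (w.shard_id, w.height) != key);
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(witness);
        AddOrphanResult::Added
    }

    /// Remove and return every witness that was waiting for `block`.
    pub fn take_waiting_for(&mut self, block: &BlockHash) -> Vec<OrphanWitness> {
        let (ready, rest): (Vec<_>, Vec<_>) =
            self.entries.drain(..).partition(|w| &w.prev_block_hash == block);
        self.entries = rest.into();
        ready
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With these assumed constants, `DEFAULT_CAPACITY * MAX_ORPHAN_WITNESS_SIZE` is 1,000,000,000 bytes, which matches the 1GB worst case quoted in the description.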

jancionear added 3 commits February 14, 2024 21:34

Limit size of ChunkStateWitness to 16MB

1be0f3e

Don't send out witnesses larger than 16MB

641c309

Add a unit test for validate_chunk_state_witness_size()

a8bf9a2

jancionear added the A-stateless-validation Area: stateless validation label Feb 14, 2024

jancionear requested a review from pugachAG February 14, 2024 22:36

jancionear requested a review from a team as a code owner February 14, 2024 22:36

jancionear mentioned this pull request Feb 14, 2024

feat: orphan chunk state witness pool #10613

Merged

jancionear requested a review from staffik February 14, 2024 22:39

staffik approved these changes Feb 15, 2024

Longarithm approved these changes Feb 15, 2024

shreyan-gupta approved these changes Feb 15, 2024

pugachAG requested changes Feb 15, 2024

jancionear closed this Feb 15, 2024

jancionear mentioned this pull request Apr 12, 2024

feat: compress state witness #10715

Merged
