
fix: don't panic when a Client receives ChunkStateWitness with invalid shard_id #10621

Merged
merged 6 commits into near:master on Feb 21, 2024

Conversation

jancionear (Contributor)

When a Client starts processing a new ChunkStateWitness it has received, it immediately fetches the previous block and the previous chunk using the prev_block_hash and shard_id provided by the ChunkStateWitness.
It turns out that the current implementation of Chain::get_prev_chunk_header panics when given an invalid shard_id, which means that a malicious peer could send a ChunkStateWitness with a bad shard_id and crash the node that received it.

let prev_block_hash = witness.inner.chunk_header.prev_block_hash();
let prev_block = match self.chain.get_block(prev_block_hash) {
    Ok(block) => block,
    Err(_) => {
        return Ok(Some(witness));
    }
};
let prev_chunk_header = Chain::get_prev_chunk_header(
    self.epoch_manager.as_ref(),
    &prev_block,
    witness.inner.chunk_header.shard_id(),
)?;

Chain::get_prev_chunk_header is quite deceptive: it returns a Result, but still does an unwrap() inside. Let's fix the problem by removing the unwrap(). From now on, Chain::get_prev_chunk_header will return an error when it encounters an invalid shard_id.
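
For context, a sketch of the fixed lookup (the shapes follow the snippets quoted in this PR, but the exact nearcore code and the Error::InvalidShardId variant are assumptions, not the verbatim change):

// Inside Chain::get_prev_chunk_header: the internal unwrap() on the chunk
// lookup is replaced with an error that propagates to the caller, so a bad
// shard_id can no longer crash the node.
let prev_shard_id =
    epoch_manager.get_prev_shard_ids(prev_block.hash(), vec![shard_id])?[0];
let prev_chunk_header = prev_block
    .chunks()
    .get(prev_shard_id as usize)
    .ok_or(Error::InvalidShardId(shard_id))? // was: .unwrap()
    .clone();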

A test is added to ensure that this bug doesn't happen again.

@jancionear jancionear added the A-stateless-validation Area: stateless validation label Feb 15, 2024
@jancionear jancionear requested a review from a team as a code owner February 15, 2024 17:25
@jancionear (Contributor, Author) commented Feb 15, 2024

Initially I tried to put the test in chain/client/src/tests, but I couldn't get it to work. Integration tests can make use of the GenesisExt trait, which makes it easy to create a valid Genesis for use in tests, but that trait isn't available in the near-client crate (I'd have to import the whole of nearcore). I couldn't get it to work with GenesisConfig::test(), so I settled on putting it together with the integration tests.

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (69c3923) 72.19% compared to head (a14e553) 72.20%.
Report is 3 commits behind head on master.

Files                                Patch %   Lines
chain/epoch-manager/src/adapter.rs   70.58%    1 Missing and 4 partials ⚠️
chain/chain/src/chain.rs             83.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10621      +/-   ##
==========================================
+ Coverage   72.19%   72.20%   +0.01%     
==========================================
  Files         726      729       +3     
  Lines      147719   148103     +384     
  Branches   147719   148103     +384     
==========================================
+ Hits       106647   106940     +293     
- Misses      36272    36352      +80     
- Partials     4800     4811      +11     
Flag                     Coverage Δ
backward-compatibility   0.16% <0.00%> (+0.08%) ⬆️
db-migration             0.16% <0.00%> (+0.08%) ⬆️
genesis-check            1.33% <0.00%> (+0.08%) ⬆️
integration-tests        37.05% <80.00%> (-0.01%) ⬇️
linux                    71.22% <76.66%> (+<0.01%) ⬆️
linux-nightly            71.66% <80.00%> (+0.03%) ⬆️
macos                    55.19% <76.66%> (+0.31%) ⬆️
pytests                  1.54% <0.00%> (+0.08%) ⬆️
sanity-checks            1.34% <0.00%> (+0.08%) ⬆️
unittests                68.11% <76.66%> (+0.01%) ⬆️
upgradability            0.21% <0.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown.

@Longarithm (Member) left a comment

👍

@@ -4183,7 +4183,11 @@ impl Chain {
shard_id: ShardId,
) -> Result<ShardChunkHeader, Error> {
let prev_shard_id = epoch_manager.get_prev_shard_ids(prev_block.hash(), vec![shard_id])?[0];
Longarithm (Member) commented on the line above:
While you are at it, could you please add a short comment to get_prev_shard_ids saying that if the result is Ok, the length of the resulting vector is the same as the length of the argument vector? I was under the impression that this is not the case.

jancionear (Contributor, Author):
Added a comment.

IMO it always returns a Vec of the same length. It maps every shard to its parent shard and then collects them into a Vec, so the length should be the same:

fn get_prev_shard_ids(
    &self,
    prev_hash: &CryptoHash,
    shard_ids: Vec<ShardId>,
) -> Result<Vec<ShardId>, Error> {
    if self.is_next_block_epoch_start(prev_hash)? {
        let shard_layout = self.get_shard_layout_from_prev_block(prev_hash)?;
        let prev_shard_layout = self.get_shard_layout(&self.get_epoch_id(prev_hash)?)?;
        if prev_shard_layout != shard_layout {
            return Ok(shard_ids
                .into_iter()
                .map(|shard_id| {
                    shard_layout.get_parent_shard_id(shard_id).map(|parent_shard_id| {
                        assert!(
                            prev_shard_layout.shard_ids().any(|i| i == parent_shard_id),
                            "invalid shard layout. parent_shard_id: {}\nshard_layout: {:?}\nprev_shard_layout: {:?}",
                            parent_shard_id,
                            shard_layout,
                            prev_shard_layout
                        );
                        parent_shard_id
                    })
                })
                .collect::<Result<_, ShardLayoutError>>()?);
        }
    }
    Ok(shard_ids)
}

pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> {
    if !self.shard_ids().any(|id| id == shard_id) {
        return Err(ShardLayoutError::InvalidShardIdError { shard_id });
    }
    let parent_shard_id = match self {
        Self::V0(_) => panic!("shard layout has no parent shard"),
        Self::V1(v1) => match &v1.to_parent_shard_map {
            // we can safely unwrap here because the construction of to_parent_shard_map
            // guarantees that every shard has a parent shard
            Some(to_parent_shard_map) => *to_parent_shard_map.get(shard_id as usize).unwrap(),
            None => panic!("shard_layout has no parent shard"),
        },
    };
    Ok(parent_shard_id)
}
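
For reference, the invariant that the requested comment documents could read roughly like this (the exact wording added in the PR may differ):

/// If the result is Ok, the returned Vec has the same length as the
/// `shard_ids` argument: each input shard_id is mapped to its parent shard
/// in the previous epoch's layout (or to itself if the shard layout did
/// not change).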

jancionear (Contributor, Author):
But I also replaced the [0] with .first(), just for peace of mind; there's no reason to risk a panic there.
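
Roughly like this (the Error::InvalidShardId variant used here is an assumption):

// Take the first (and only) element without indexing, turning a missing
// element into an error rather than a panic.
let prev_shard_id = *epoch_manager
    .get_prev_shard_ids(prev_block.hash(), vec![shard_id])?
    .first()
    .ok_or(Error::InvalidShardId(shard_id))?;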

jancionear (Contributor, Author):
Although a more proper way to fix it would be to add a function that operates on a single ShardId. I think I'll go with that.

@jancionear (Contributor, Author):

v2:

  • Improved the comment on get_prev_shard_ids to better explain how it works
  • Added get_prev_shard_id, which operates on a single shard_id instead of a whole Vec. Indexing into a Vec could potentially panic, so it's safer to use this single-shard version inside get_prev_chunk_header (a sketch follows below).
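
A minimal sketch of such a single-shard helper, assuming it delegates to the existing Vec-based get_prev_shard_ids (the actual implementation in this PR may differ, and Error::Other is an assumed error variant):

fn get_prev_shard_id(
    &self,
    prev_hash: &CryptoHash,
    shard_id: ShardId,
) -> Result<ShardId, Error> {
    // get_prev_shard_ids returns one output id per input id, so the single
    // element must exist; still, avoid indexing and surface a descriptive
    // error instead of panicking if that invariant is ever broken.
    self.get_prev_shard_ids(prev_hash, vec![shard_id])?
        .into_iter()
        .next()
        .ok_or_else(|| Error::Other("get_prev_shard_ids returned an empty Vec".to_string()))
}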

@Longarithm Longarithm added this pull request to the merge queue Feb 21, 2024
Merged via the queue into near:master with commit fc79a67 Feb 21, 2024
28 of 29 checks passed