
fix: don't panic when a Client receives ChunkStateWitness with invalid shard_id #10621

Merged
merged 6 commits into near:master on Feb 21, 2024

Conversation

jancionear (Contributor)

When a Client starts processing a new ChunkStateWitness it has received, it immediately fetches the previous block and the previous chunk using the prev_block_hash and shard_id provided by the ChunkStateWitness.
It turns out that the current implementation of Chain::get_prev_chunk_header panics when given an invalid shard_id, which means that a malicious peer could send a ChunkStateWitness with a bad shard_id and crash the node that received it.

let prev_block_hash = witness.inner.chunk_header.prev_block_hash();
let prev_block = match self.chain.get_block(prev_block_hash) {
    Ok(block) => block,
    Err(_) => {
        return Ok(Some(witness));
    }
};
let prev_chunk_header = Chain::get_prev_chunk_header(
    self.epoch_manager.as_ref(),
    &prev_block,
    witness.inner.chunk_header.shard_id(),
)?;

Chain::get_prev_chunk_header is quite deceptive: it returns a Result, but still does an unwrap() inside. Let's fix the problem by removing the unwrap(). From now on, Chain::get_prev_chunk_header will return an error when it encounters an invalid shard_id.
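
For context, a sketch of the fixed lookup (the shapes follow the snippets quoted in this PR, but the exact nearcore code and the Error::InvalidShardId variant are assumptions, not the verbatim change):

// Inside Chain::get_prev_chunk_header: the internal unwrap() on the chunk
// lookup is replaced with an error that propagates to the caller, so a bad
// shard_id can no longer crash the node.
let prev_shard_id =
    epoch_manager.get_prev_shard_ids(prev_block.hash(), vec![shard_id])?[0];
let prev_chunk_header = prev_block
    .chunks()
    .get(prev_shard_id as usize)
    .ok_or(Error::InvalidShardId(shard_id))? // was: .unwrap()
    .clone();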

A test is added to ensure that this bug doesn't happen again.

@jancionear jancionear added the A-stateless-validation Area: stateless validation label Feb 15, 2024
@jancionear jancionear requested a review from a team as a code owner February 15, 2024 17:25
@jancionear (Contributor, Author) commented Feb 15, 2024

Initially I tried to put the test in chain/client/src/tests, but I couldn't get it to work. Integration tests can make use of the GenesisExt trait, which makes it easy to create a valid Genesis for use in tests, but that trait isn't available in the near-client crate (I'd have to import the whole of nearcore). I couldn't get it to work with GenesisConfig::test(), so I settled on putting it together with the integration tests.

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (69c3923) 72.19% compared to head (a14e553) 72.20%.
Report is 3 commits behind head on master.

Files                                Patch %   Lines
chain/epoch-manager/src/adapter.rs   70.58%    1 Missing and 4 partials ⚠️
chain/chain/src/chain.rs             83.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10621      +/-   ##
==========================================
+ Coverage   72.19%   72.20%   +0.01%     
==========================================
  Files         726      729       +3     
  Lines      147719   148103     +384     
  Branches   147719   148103     +384     
==========================================
+ Hits       106647   106940     +293     
- Misses      36272    36352      +80     
- Partials     4800     4811      +11     
Flag                     Coverage Δ
backward-compatibility   0.16% <0.00%> (+0.08%) ⬆️
db-migration             0.16% <0.00%> (+0.08%) ⬆️
genesis-check            1.33% <0.00%> (+0.08%) ⬆️
integration-tests        37.05% <80.00%> (-0.01%) ⬇️
linux                    71.22% <76.66%> (+<0.01%) ⬆️
linux-nightly            71.66% <80.00%> (+0.03%) ⬆️
macos                    55.19% <76.66%> (+0.31%) ⬆️
pytests                  1.54% <0.00%> (+0.08%) ⬆️
sanity-checks            1.34% <0.00%> (+0.08%) ⬆️
unittests                68.11% <76.66%> (+0.01%) ⬆️
upgradability            0.21% <0.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown.

@Longarithm (Member) left a comment

👍

@@ -4183,7 +4183,11 @@ impl Chain {
shard_id: ShardId,
) -> Result<ShardChunkHeader, Error> {
let prev_shard_id = epoch_manager.get_prev_shard_ids(prev_block.hash(), vec![shard_id])?[0];
Longarithm (Member) commented on the line above:
While you are at it, could you please add a short comment to get_prev_shard_ids saying that if the result is Ok, the length of the resulting vector is the same as the length of the argument vector? I was under the impression that this is not the case.

jancionear (Contributor, Author):
Added a comment.

IMO it always returns a Vec of the same length. It maps every shard to its parent shard and then collects them into a Vec, so the length should be the same:

fn get_prev_shard_ids(
    &self,
    prev_hash: &CryptoHash,
    shard_ids: Vec<ShardId>,
) -> Result<Vec<ShardId>, Error> {
    if self.is_next_block_epoch_start(prev_hash)? {
        let shard_layout = self.get_shard_layout_from_prev_block(prev_hash)?;
        let prev_shard_layout = self.get_shard_layout(&self.get_epoch_id(prev_hash)?)?;
        if prev_shard_layout != shard_layout {
            return Ok(shard_ids
                .into_iter()
                .map(|shard_id| {
                    shard_layout.get_parent_shard_id(shard_id).map(|parent_shard_id| {
                        assert!(
                            prev_shard_layout.shard_ids().any(|i| i == parent_shard_id),
                            "invalid shard layout. parent_shard_id: {}\nshard_layout: {:?}\nprev_shard_layout: {:?}",
                            parent_shard_id,
                            shard_layout,
                            prev_shard_layout
                        );
                        parent_shard_id
                    })
                })
                .collect::<Result<_, ShardLayoutError>>()?);
        }
    }
    Ok(shard_ids)
}

pub fn get_parent_shard_id(&self, shard_id: ShardId) -> Result<ShardId, ShardLayoutError> {
    if !self.shard_ids().any(|id| id == shard_id) {
        return Err(ShardLayoutError::InvalidShardIdError { shard_id });
    }
    let parent_shard_id = match self {
        Self::V0(_) => panic!("shard layout has no parent shard"),
        Self::V1(v1) => match &v1.to_parent_shard_map {
            // we can safely unwrap here because the construction of to_parent_shard_map
            // guarantees that every shard has a parent shard
            Some(to_parent_shard_map) => *to_parent_shard_map.get(shard_id as usize).unwrap(),
            None => panic!("shard_layout has no parent shard"),
        },
    };
    Ok(parent_shard_id)
}
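
For reference, the invariant that the requested comment documents could read roughly like this (the exact wording added in the PR may differ):

/// If the result is Ok, the returned Vec has the same length as the
/// `shard_ids` argument: each input shard_id is mapped to its parent shard
/// in the previous epoch's layout (or to itself if the shard layout did
/// not change).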

jancionear (Contributor, Author):
But I also replaced the [0] with .first(), just for peace of mind; there's no reason to risk a panic there.
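
Roughly like this (the Error::InvalidShardId variant used here is an assumption):

// Take the first (and only) element without indexing, turning a missing
// element into an error rather than a panic.
let prev_shard_id = *epoch_manager
    .get_prev_shard_ids(prev_block.hash(), vec![shard_id])?
    .first()
    .ok_or(Error::InvalidShardId(shard_id))?;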

jancionear (Contributor, Author):
Although a more proper way to fix it would be to add a function that operates on a single ShardId. I think I'll go with that.

@jancionear (Contributor, Author):

v2:

  • Improved the comment on get_prev_shard_ids to better explain how it works
  • Added get_prev_shard_id, which operates on a single shard_id instead of a whole Vec. Indexing into a Vec could potentially panic, so it's safer to use this single-shard version inside get_prev_chunk_header (a sketch follows below).
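
A minimal sketch of such a single-shard helper, assuming it delegates to the existing Vec-based get_prev_shard_ids (the actual implementation in this PR may differ, and Error::Other is an assumed error variant):

fn get_prev_shard_id(
    &self,
    prev_hash: &CryptoHash,
    shard_id: ShardId,
) -> Result<ShardId, Error> {
    // get_prev_shard_ids returns one output id per input id, so the single
    // element must exist; still, avoid indexing and surface a descriptive
    // error instead of panicking if that invariant is ever broken.
    self.get_prev_shard_ids(prev_hash, vec![shard_id])?
        .into_iter()
        .next()
        .ok_or_else(|| Error::Other("get_prev_shard_ids returned an empty Vec".to_string()))
}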

@Longarithm Longarithm added this pull request to the merge queue Feb 21, 2024
Merged via the queue into near:master with commit fc79a67 Feb 21, 2024
28 of 29 checks passed