Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fee mismatch on snapshot deserialize #12697

Merged
merged 4 commits into from
Oct 9, 2020

Conversation

carllin
Copy link
Contributor

@carllin carllin commented Oct 6, 2020

Problem

#12670 (comment)

Summary of Changes

As I see it, two options:

  1. Proper fix: Serialize lamports_per_signature field and add gating to roll this out
  2. Just take the lamports_per_signature value from the fee calculator on deserialize

Fixes #

@carllin carllin requested review from ryoqun and t-nelson October 6, 2020 23:15
@@ -67,7 +67,6 @@ impl FeeCalculator {
pub struct FeeRateGovernor {
// The current cost of a signature This amount may increase/decrease over time based on
// cluster processing load.
#[serde(skip)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where gating would be necessary. Alternatively we just set this value on deserialize to bank.fee_calculator.lamports_per_signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun after talking with @t-nelson, it seems removing the skip here directly also breaks RPC, seems like the path of least resistance is to just manually assign this field for now. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the path of least resistance is to just manually assign this field for now. Thoughts?

@carllin yeah, I think this solution is good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when did this bug started to occur? Since beginning? ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm yeah I would guess since it was first checked in xD

@mvines mvines added the v1.3 label Oct 6, 2020
@mvines
Copy link
Member

mvines commented Oct 6, 2020

(+v1.3 label since this could also affect mainnet-beta once the pico-inflation feature is enabled)

@ryoqun
Copy link
Member

ryoqun commented Oct 7, 2020

Proper fix: Serialize lamports_per_signature field and add gating to roll this out

@carllin Also, note that gating here would not be the pretty solana features, rather would be a snapshot version bump.

@ryoqun
Copy link
Member

ryoqun commented Oct 7, 2020

@carllin First of all, thanks for debugging and actually fixing this inconsistency bug. :)

However, what I'm a bit wondering is that I felt that the vote mismatches were very likely to occur only when the solana-tps-bench is running (i.e. under high load).

For me, this looks like a deterministic inconsystency issue. Could this bug explain the indermisticic behavior I've observed? For one thing, my observation isn't so well formal. So I'm not conclusive. (I just sensed this way when tackling another issue (tower assertion fixes #12671)).

@carllin
Copy link
Contributor Author

carllin commented Oct 7, 2020

@ryoqun, you're right that this is a deterministic bug.

For me when following your steps to repro yesterday, I was seeing this bug repro on restart under high tps, I wonder if it has something to do with how the fee is calculated here: https://github.com/solana-labs/solana/blob/master/sdk/src/fee_calculator.rs#L138, which is derived from latest_signatures_per_slot, which needs to hit some threshold before it changes.

@codecov
Copy link

codecov bot commented Oct 7, 2020

Codecov Report

Merging #12697 into master will decrease coverage by 0.0%.
The diff coverage is 95.5%.

@@            Coverage Diff            @@
##           master   #12697     +/-   ##
=========================================
- Coverage    81.9%    81.8%   -0.1%     
=========================================
  Files         353      353             
  Lines       84208    84243     +35     
=========================================
- Hits        69004    68990     -14     
- Misses      15204    15253     +49     

@ryoqun
Copy link
Member

ryoqun commented Oct 7, 2020

@carllin this is LGTM for me even if this is still a draft. Is there any remaining work like write a unit-test? I think this is a bit hard.
Or you might be able to write one by hooking up a new path-through call flow with new_from_fields & get_fields_to_serialize and impl From<BankFieldsToDeserialize> for BankFieldsToSerialize

@carllin carllin marked this pull request as ready for review October 7, 2020 18:57
@carllin
Copy link
Contributor Author

carllin commented Oct 7, 2020

@ryoqun, looks like the compare_bank() function should have covered this case but was missing some cases. I modified the existing test_bank_serialize_newer to cover this case, what do you think?

Comment on lines -3594 to -3596
if ptr::eq(self, dbank) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be kept to deadlock free? #10466

Yeah, we should have added an comment...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah good catch, thanks!

&& *self.inflation.read().unwrap() == *other.inflation.read().unwrap()
&& *self.stakes.read().unwrap() == *other.stakes.read().unwrap()
&& self.epoch_stakes == other.epoch_stakes
&& self.is_delta.load(Relaxed) == other.is_delta.load(Relaxed)
Copy link
Member

@ryoqun ryoqun Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how we can maintain this for the addition of newer fields... This problem exists from compare_bank.

@@ -477,6 +477,42 @@ pub(crate) struct BankFieldsToSerialize<'a> {
pub(crate) is_delta: bool,
}

impl PartialEq for Bank {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, why can't we just derive PartialEq? A comment might be desirable to explain the limitation.

Copy link
Contributor Author

@carllin carllin Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe so the issue was RwLock doesn't implement PartialEq, and we can't implement it for RwLock because it doesn't belong to this crate, added comment!

@@ -219,7 +219,7 @@ fn test_bank_serialize_style(serde_style: SerdeStyle) {
assert_eq!(dbank.get_balance(&key1.pubkey()), 0);
assert_eq!(dbank.get_balance(&key2.pubkey()), 10);
assert_eq!(dbank.get_balance(&key3.pubkey()), 0);
bank2.compare_bank(&dbank);
assert!(bank2 == dbank);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: how about assert_eq!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I ran into with assert_eq! was that it requires Bank implement Debug, which means all the program stack CowCachedExecutors -> Executor -> InvokeContext have to all implement Debug , and I didn't want to plumb all of that through 😩

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carllin FYI, I've took the initiative here after stressed with not being able to freely dbg!() bunch of bank fields: https://github.com/solana-labs/solana/pull/13017/files#diff-b5dead0a38e5f63447595bbeda1428d3aa851d33954343bb1bc38b4b00add5fdR159

@@ -219,7 +219,7 @@ fn test_bank_serialize_style(serde_style: SerdeStyle) {
assert_eq!(dbank.get_balance(&key1.pubkey()), 0);
assert_eq!(dbank.get_balance(&key2.pubkey()), 10);
assert_eq!(dbank.get_balance(&key3.pubkey()), 0);
bank2.compare_bank(&dbank);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

(I wonder why this was like this not defining ==.)..

@ryoqun
Copy link
Member

ryoqun commented Oct 8, 2020

@carllin reviewed. :)

@ryoqun
Copy link
Member

ryoqun commented Oct 8, 2020

@ryoqun, looks like the compare_bank() function should have covered this case but was missing some cases. I modified the existing test_bank_serialize_newer to cover this case, what do you think?

Really thanks for adding this. I'm very happy knowing that we're now covered under tests. :)

Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@carllin carllin merged commit c879e7c into solana-labs:master Oct 9, 2020
mergify bot pushed a commit that referenced this pull request Oct 9, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)

# Conflicts:
#	runtime/src/bank.rs
mergify bot pushed a commit that referenced this pull request Oct 9, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)
carllin added a commit that referenced this pull request Oct 9, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)
mergify bot added a commit that referenced this pull request Oct 9, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)

Co-authored-by: carllin <wumu727@gmail.com>
carllin added a commit that referenced this pull request Oct 12, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)
carllin added a commit that referenced this pull request Oct 12, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)
carllin added a commit that referenced this pull request Oct 12, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)
mergify bot added a commit that referenced this pull request Oct 12, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit c879e7c)

Co-authored-by: carllin <wumu727@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants