Make CostTracker aware of inflight transactions #437
Some alternatives that were considered:
I don't think holding the freeze lock for longer will work; we'd also have to acquire the lock earlier. Right now we only hold the freeze lock during (record, commit), but not execution. I think that approach is riskier, and I'm not sure we should be backporting that.
lgtm - let's get approval from second set of eyes before merging.
Ah right, thank you. So yeah, that would mean VERY significantly extending the duration of holding that lock.
Yep, 100% agreed on that approach being much riskier.
@t-nelson - I think you've followed along with the issue this PR aims to address, so I'm adding you as a reviewer with Tao being OOO this week.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #437   +/-   ##
=======================================
  Coverage    81.8%    81.9%
=======================================
  Files         842      842
  Lines      228492   228518   +26
=======================================
+ Hits       187104   187165   +61
+ Misses      41388    41353   -35
Overall LGTM. I think we could optimize the read locking on the cost reporting
LGTM
c30bbbb 😬
lgtm - how did I miss the report removal 🤦
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
When a leader is packing a Bank, transaction costs are added to the CostTracker and then later updated or removed, depending on whether the tx is committed. However, it is possible for a Bank to be frozen while there are several tx's in flight. CostUpdateService submits a metric with cost information almost immediately after a Bank has been frozen. The result is that we have observed cost details being submitted before some cost removals take place, which causes a massive over-reporting of the block cost compared to the actual cost. This PR adds a field to track the number of transactions that are inflight, and adds a simple mechanism to try to allow that value to settle to zero before submitting the datapoint. The number of inflight tx's is submitted with the datapoint, so even if the value does not settle to zero, we can still detect this case and know the metric is tainted. Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com> (cherry picked from commit 9076348)
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.
Problem
When a leader is packing a Bank, transactions are added to the cost-tracker and then later updated or removed, depending on whether the transactions were committed or not. However, it is also possible for a Bank to get frozen while these transactions are "in-flight".
After the bank is frozen, it is sent over to CostUpdateService almost immediately. The result is that we've observed a Bank having its cost details reported to metrics BEFORE the updates / removals were made to the CostTracker for these in-flight transactions. This causes a leader to submit metrics with an over-reported cost in comparison to what we calculate from replaying the block.
Taking over work done by @apfitzge in #398
Summary of Changes
Fixes #366
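For readers who want a concrete picture of the inflight-tracking idea described above, here is a minimal sketch in Rust. It is not the code from this PR: SketchCostTracker, CostReport, report_when_settled, and the retry parameters are all hypothetical names invented for illustration, and the sketch uses plain atomics rather than whatever locking the real CostTracker uses. It only shows the shape of the approach: count transactions whose cost has been reserved but not yet finalized, give that count a bounded chance to reach zero before reporting, and include the count in the report so a tainted datapoint can be detected.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Hypothetical, simplified stand-in for a cost tracker (not the real CostTracker API).
#[derive(Default)]
pub struct SketchCostTracker {
    block_cost: AtomicU64,
    in_flight: AtomicUsize,
}

impl SketchCostTracker {
    /// Reserve the estimated cost when a transaction is picked for the block.
    pub fn add_in_flight(&self, estimated_cost: u64) {
        self.block_cost.fetch_add(estimated_cost, Ordering::SeqCst);
        self.in_flight.fetch_add(1, Ordering::SeqCst);
    }

    /// The transaction was committed: swap the estimate for the actual cost.
    /// The cost is adjusted before the inflight count is decremented.
    pub fn commit(&self, estimated_cost: u64, actual_cost: u64) {
        self.block_cost.fetch_sub(estimated_cost, Ordering::SeqCst);
        self.block_cost.fetch_add(actual_cost, Ordering::SeqCst);
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    /// The transaction was not committed: give back the reserved cost.
    pub fn remove(&self, estimated_cost: u64) {
        self.block_cost.fetch_sub(estimated_cost, Ordering::SeqCst);
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

/// What would be submitted as a datapoint in this sketch.
#[derive(Debug, PartialEq)]
pub struct CostReport {
    pub block_cost: u64,
    /// Non-zero means the reported cost may still include unsettled estimates.
    pub in_flight: usize,
}

/// Wait (bounded) for inflight transactions to settle, then build the report.
/// `max_attempts` and `pause` are assumed tuning knobs, not values from the PR.
pub fn report_when_settled(
    tracker: &SketchCostTracker,
    max_attempts: usize,
    pause: Duration,
) -> CostReport {
    for _ in 0..max_attempts {
        if tracker.in_flight.load(Ordering::SeqCst) == 0 {
            break;
        }
        thread::sleep(pause);
    }
    // Read the inflight count first: if it is zero and the bank is frozen
    // (so nothing new is added), the cost read afterwards is final.
    let in_flight = tracker.in_flight.load(Ordering::SeqCst);
    let block_cost = tracker.block_cost.load(Ordering::SeqCst);
    CostReport { block_cost, in_flight }
}
```

In this sketch, a report taken while two transactions are still in flight carries in_flight == 2 and a cost that includes both estimates; once one is committed and the other removed, the same call returns in_flight == 0 and only the committed cost. Bounding the wait means the reporter cannot block indefinitely, at the price of occasionally emitting a datapoint that is flagged as tainted.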