-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler Frequency Fixes #4545
Scheduler Frequency Fixes #4545
Conversation
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. |
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
if num_scanned >= self.config.max_scanned_transactions_per_scheduling_pass { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavioral change here: break loop if the number of scanned transactions exceed the configured maximum
}; | ||
|
||
let (received_packet_results, receive_time_us) = measure_us!(self | ||
.packet_receiver | ||
.receive_packets(recv_timeout, remaining_queue_capacity, |packet| { | ||
.receive_packets(recv_timeout, MAX_RECEIVE_PACKETS, |packet| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavioral change: max received number of transactions is 5000. This is a "loose maximum" as it will continue to receive packets up until it has receive >= 5000. This is because the smallest unit we receive is a Vec<PacketBatch>
which obviously can contain more than one packet.
const MAX_PACKET_RECEIVE_TIME: Duration = Duration::from_millis(10); | ||
let (recv_timeout, should_buffer) = match decision { | ||
BufferedPacketsDecision::Consume(_) => ( | ||
BufferedPacketsDecision::Consume(_) | BufferedPacketsDecision::Hold => ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavioral change: Hold
can now result in 0 timeout if the container is not empty. This is intended to avoid long timeouts of 10ms if a Hold pops up between our leader slots (it can)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the profiles I've been taking so far, we seem to spend exactly 20ms doing nothing in between slots. Curious to see what happens now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't you recently profile on the box with this change in there? did you see something there?
self.receive_completed()?; | ||
self.process_transactions(&decision)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavioral change: receive completed transactions just before scheduling instead of just after.
* Change PrioGraphSchedulerConfig::default for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions (cherry picked from commit 6e93845) # Conflicts: # core/src/banking_stage/transaction_scheduler/prio_graph_scheduler.rs # core/src/banking_stage/transaction_scheduler/receive_and_buffer.rs # core/src/banking_stage/transaction_scheduler/scheduler_controller.rs
* Change PrioGraphSchedulerConfig::default for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions (cherry picked from commit 6e93845) # Conflicts: # core/src/banking_stage/transaction_scheduler/prio_graph_scheduler.rs # core/src/banking_stage/transaction_scheduler/receive_and_buffer.rs # core/src/banking_stage/transaction_scheduler/scheduler_controller.rs
* Change prio_graph_scheduler configurations for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions --------- Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
* Change prio_graph_scheduler configurations for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions --------- Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
* v2.0: Reclaims more old accounts in `clean` (backport of anza-xyz#4044) (anza-xyz#4089) * Reclaims more old accounts in `clean` (anza-xyz#4044) (cherry picked from commit 3d43824) # Conflicts: # accounts-db/src/accounts_db.rs # accounts-db/src/accounts_db/tests.rs * fix merge conflicts --------- Co-authored-by: Brooks <brooks@anza.xyz> * v2.0: Fixes clean_old_storages_with_reclaims tests (backport of anza-xyz#4147) (anza-xyz#4166) * Fixes clean_old_storages_with_reclaims tests (anza-xyz#4147) (cherry picked from commit 4eabeed) # Conflicts: # accounts-db/src/accounts_db/tests.rs * fix merge conflicts --------- Co-authored-by: Brooks <brooks@anza.xyz> * v2.0: blockstore: mark slot as dead on data shred merkle root conflict (backport of anza-xyz#3970) (anza-xyz#4074) * blockstore: mark slot as dead on data shred merkle root conflict (anza-xyz#3970) (cherry picked from commit 5564a94) # Conflicts: # ledger/src/blockstore.rs * fix conflicts --------- Co-authored-by: Ashwin Sekar <ashwin@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@solana.com> * Bump version to v2.0.22 (anza-xyz#4200) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: hardcode rust version for publish-crate (anza-xyz#4228) * Bump version to v2.0.23 (anza-xyz#4419) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: rolls out chained Merkle shreds to ~21% of mainnet slots (backport of anza-xyz#4431) (anza-xyz#4434) rolls out chained Merkle shreds to ~21% of mainnet slots (anza-xyz#4431) (cherry picked from commit 9d09787) Co-authored-by: behzad nouri <behzadnouri@gmail.com> * v2.0: [rpc] Fatal `getSignaturesForAddress()` when Bigtable errors (backport of anza-xyz#3700) (anza-xyz#4442) [rpc] Fatal `getSignaturesForAddress()` when Bigtable errors (anza-xyz#3700) * Unindent code in `get_signatures_for_address` * Add a custom JSON-RPC error to throw when long-term storage (ie. Bigtable) can't be reached * When the `before`/`until` signatures can't be found, throw `SignatureNotFound` instead of `RowNotFound` * Fatal `getSignaturesForAddress` calls when Bigtable must be queried but can't be reached (cherry picked from commit 52f132c) Co-authored-by: Steven Luscher <steveluscher@users.noreply.github.com> * v2.0: ci: bump [upload|download]-artifact to v4 (anza-xyz#4501) ci: bump [upload|download]-artifact to v4 * v2.0: ci: hardcode crate publishing version (anza-xyz#4515) ci: hardcode rust version for publish-crate * Bump version to v2.0.24 (anza-xyz#4528) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: fix: reduce max packet receive time during leader window (backport of anza-xyz#2801) (anza-xyz#4544) fix: reduce max packet receive time during leader window (anza-xyz#2801) (cherry picked from commit 20e0df4) Co-authored-by: cavemanloverboy <93507302+cavemanloverboy@users.noreply.github.com> * v2.0: Scheduler Frequency Fixes (backport of anza-xyz#4545) (anza-xyz#4576) * Change prio_graph_scheduler configurations for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions --------- Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com> --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Brooks <brooks@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@solana.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yihau Chen <yihau.chen@icloud.com> Co-authored-by: behzad nouri <behzadnouri@gmail.com> Co-authored-by: Steven Luscher <steveluscher@users.noreply.github.com> Co-authored-by: cavemanloverboy <93507302+cavemanloverboy@users.noreply.github.com> Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
* v2.0: Reclaims more old accounts in `clean` (backport of anza-xyz#4044) (anza-xyz#4089) * Reclaims more old accounts in `clean` (anza-xyz#4044) (cherry picked from commit 3d43824) # Conflicts: # accounts-db/src/accounts_db.rs # accounts-db/src/accounts_db/tests.rs * fix merge conflicts --------- Co-authored-by: Brooks <brooks@anza.xyz> * v2.0: Fixes clean_old_storages_with_reclaims tests (backport of anza-xyz#4147) (anza-xyz#4166) * Fixes clean_old_storages_with_reclaims tests (anza-xyz#4147) (cherry picked from commit 4eabeed) # Conflicts: # accounts-db/src/accounts_db/tests.rs * fix merge conflicts --------- Co-authored-by: Brooks <brooks@anza.xyz> * v2.0: blockstore: mark slot as dead on data shred merkle root conflict (backport of anza-xyz#3970) (anza-xyz#4074) * blockstore: mark slot as dead on data shred merkle root conflict (anza-xyz#3970) (cherry picked from commit 5564a94) # Conflicts: # ledger/src/blockstore.rs * fix conflicts --------- Co-authored-by: Ashwin Sekar <ashwin@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@solana.com> * Bump version to v2.0.22 (anza-xyz#4200) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: hardcode rust version for publish-crate (anza-xyz#4228) * Bump version to v2.0.23 (anza-xyz#4419) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: rolls out chained Merkle shreds to ~21% of mainnet slots (backport of anza-xyz#4431) (anza-xyz#4434) rolls out chained Merkle shreds to ~21% of mainnet slots (anza-xyz#4431) (cherry picked from commit 9d09787) Co-authored-by: behzad nouri <behzadnouri@gmail.com> * v2.0: [rpc] Fatal `getSignaturesForAddress()` when Bigtable errors (backport of anza-xyz#3700) (anza-xyz#4442) [rpc] Fatal `getSignaturesForAddress()` when Bigtable errors (anza-xyz#3700) * Unindent code in `get_signatures_for_address` * Add a custom JSON-RPC error to throw when long-term storage (ie. Bigtable) can't be reached * When the `before`/`until` signatures can't be found, throw `SignatureNotFound` instead of `RowNotFound` * Fatal `getSignaturesForAddress` calls when Bigtable must be queried but can't be reached (cherry picked from commit 52f132c) Co-authored-by: Steven Luscher <steveluscher@users.noreply.github.com> * v2.0: ci: bump [upload|download]-artifact to v4 (anza-xyz#4501) ci: bump [upload|download]-artifact to v4 * v2.0: ci: hardcode crate publishing version (anza-xyz#4515) ci: hardcode rust version for publish-crate * Bump version to v2.0.24 (anza-xyz#4528) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * v2.0: fix: reduce max packet receive time during leader window (backport of anza-xyz#2801) (anza-xyz#4544) fix: reduce max packet receive time during leader window (anza-xyz#2801) (cherry picked from commit 20e0df4) Co-authored-by: cavemanloverboy <93507302+cavemanloverboy@users.noreply.github.com> * v2.0: Scheduler Frequency Fixes (backport of anza-xyz#4545) (anza-xyz#4576) * Change prio_graph_scheduler configurations for 1k maxs, 256 look ahead * Break loop on scanned transaction count * make Hold decision behave same as Consume during receive * receive maximum of 5_000 packets - loose max * receive_completed before process_transactions --------- Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com> --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Brooks <brooks@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@anza.xyz> Co-authored-by: Ashwin Sekar <ashwin@solana.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yihau Chen <yihau.chen@icloud.com> Co-authored-by: behzad nouri <behzadnouri@gmail.com> Co-authored-by: Steven Luscher <steveluscher@users.noreply.github.com> Co-authored-by: cavemanloverboy <93507302+cavemanloverboy@users.noreply.github.com> Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
Problem
Summary of Changes
Fixes #