client: Start resending sooner during send_and_confirm_transactions_in_parallel
#348
Conversation
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #348      +/-  ##
=========================================
- Coverage    81.9%    81.8%     -0.1%
=========================================
  Files         837      837
  Lines      226898   226900        +2
=========================================
- Hits       185871   185817       -54
- Misses      41027    41083       +56
Thanks for working on this! People in different Solana chats are complaining daily about deploy speed. For example, see this issue: solana-labs#34444
Depending on the importance of this task, we can prioritize running a deployment on testnet while measuring the time to deploy and other relevant metrics: solana-labs#32873
That's a good point. I've been using just a little test program that uses
        }
        .boxed_local(),
    ];
    join_all(futures).await.into_iter().collect::<Result<_>>()?;
Overall this looks great. Could you please add a timeout of 2 minutes (= 150 blockhashes) on the join_all? Sometimes the CLI gets stuck.
I've noticed that sometimes... do you have any idea what causes it? In my experience, I don't see the block height updating, which makes me think that something in confirm_transactions_till_block_height_... is hanging.

I'd prefer that we only abort when we know that the block height has been passed, rather than hardcoding 2 minutes. We can detach the future with tokio::spawn, loop checking is_finished() on it, and abort it if the blockhash is expired. What do you think?
Yeah, good idea: a Notify channel, and here we break if the block height exceeds the last transaction's block height, or something along those lines. This should be done for all joins and/or awaits.
Sweet, do you mind if I include it in a separate PR? It isn't directly related to the changes here and can be done independently.
Nah not at all
…ctions_in_parallel` (backport of anza-xyz#348) (anza-xyz#357)

client: Start resending sooner during `send_and_confirm_transactions_in_parallel` (anza-xyz#348)

client: Confirm sooner during send_and_confirm_in_parallel

(cherry picked from commit b2f4fb3)

Co-authored-by: Jon C <me@jonc.dev>
Problem
send_and_confirm_transactions_in_parallel is really great, but when one particular validator is slow or offline, sending and confirming transactions can take forever. This is because we wait to send all of the transactions before moving on to resending. With the congestion on the network nowadays, this makes deploying a program very difficult.
Summary of Changes
This really just moves things around to start retrying and confirming sooner.
Rather than sending all transactions first and then kicking off the resends, the idea is to add each transaction to the unconfirmed queue before sending it, and to start retrying two seconds after sending begins.
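The reordering described above can be sketched as follows. This is a hypothetical illustration, not the actual solana-client internals: the type, field, and constant names (ParallelSender, unconfirmed, RESEND_DELAY) are assumptions introduced for the example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Assumed delay before the resend/confirm loop starts, per the description.
const RESEND_DELAY: Duration = Duration::from_secs(2);

// Illustrative sender state (not the real client's types).
struct ParallelSender {
    unconfirmed: HashMap<u64, String>, // index -> serialized tx (stand-in)
    send_started: Instant,
}

impl ParallelSender {
    fn new() -> Self {
        Self {
            unconfirmed: HashMap::new(),
            send_started: Instant::now(),
        }
    }

    fn send_all(&mut self, txs: Vec<String>) {
        for (index, tx) in txs.into_iter().enumerate() {
            // Register as unconfirmed BEFORE the initial send, so the retry
            // loop already owns the transaction even if this pass stalls.
            self.unconfirmed.insert(index as u64, tx);
            // ... best-effort initial send would happen here ...
        }
    }

    // The resend loop becomes eligible two seconds after sending began,
    // rather than only after the entire initial send pass completes.
    fn resend_due(&self) -> bool {
        self.send_started.elapsed() >= RESEND_DELAY
    }
}

fn main() {
    let mut sender = ParallelSender::new();
    sender.send_all(vec!["tx0".to_string(), "tx1".to_string()]);
    // Both transactions are tracked immediately, before any confirmation.
    assert_eq!(sender.unconfirmed.len(), 2);
    // Right after starting, the 2-second resend delay has not yet elapsed.
    assert!(!sender.resend_due());
}
```

The key property is that a slow or offline validator during the initial send no longer blocks retries: the unconfirmed queue is populated up front, so the resend loop can make progress in parallel.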
The impact of this change is tough to gauge, especially given volatile priority fees on mainnet, but I tried sending 100 self-transfers with 2 lamports (2 million micro-lamports) per CU in priority fees and a 450 CU limit. Over 5 trials, here's what I got:
While testing without the patch, I had to abort a couple of runs because the sending was absolutely crawling, only one send every 3 seconds.