Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: Start resending sooner during send_and_confirm_transactions_in_parallel #348

Merged
merged 1 commit into from
Mar 21, 2024

Conversation

joncinque
Copy link

Problem

send_and_confirm_transactions_in_parallel is really great, but during times when one particular validator is slow or offline, sending and confirming transactions can take forever. This is because we wait to send all of the transactions before moving on to resending.

With the congestion on the network nowadays, it makes deploying a program very difficult.

Summary of Changes

This really just moves things around to start retrying and confirming sooner.

Rather than send all transactions first, and then kick off the resends, the concept is to add the transaction to the unconfirmed queue before sending, and also start retrying two seconds after the start of the sending.

The impact of this change is tough to gauge, especially because of volatile priority fees on mainnet, but I tried sending 100 self-transfers with 2 lamports (2 million micro-lamports) per CU for priority fees, and a 450 CU limit. On 5 trials, here's what I got:

  • without patch: 40s, 45s, 34s, 37s, 36s
  • with patch: 27s, 68s, 38s, 29s, 20s

While testing without the patch, I had to abort a couple of runs because the sending was absolutely crawling, only one send every 3 seconds.

Copy link

mergify bot commented Mar 21, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 95.12195% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 81.8%. Comparing base (184ba6c) to head (d7aa979).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #348     +/-   ##
=========================================
- Coverage    81.9%    81.8%   -0.1%     
=========================================
  Files         837      837             
  Lines      226898   226900      +2     
=========================================
- Hits       185871   185817     -54     
- Misses      41027    41083     +56     

@KirillLykov
Copy link

KirillLykov commented Mar 21, 2024

thanks for working on that! People in different Solana chats complaining daily about deploy speed. For example, this issue solana-labs#34444

@KirillLykov
Copy link

Depending on the importance of this task we can prioritize running deployment on testnet with measuring the time to deploy and other relevant metrics solana-labs#32873

@joncinque
Copy link
Author

That's a good point, I've been using just a little test program that uses send_and_confirm_transactions_in_parallel and getting rough numbers. At least some test which writes a program to a buffer would be useful.

}
.boxed_local(),
];
join_all(futures).await.into_iter().collect::<Result<_>>()?;
Copy link

@godmodegalactus godmodegalactus Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great, could you add some timeouts on join_all for 2 minutes = 150 blockhashes please. Sometimes CLI gets stuck.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that sometimes... do you have any idea what causes it? In my experience, I don't see the block height updating, which makes me think that something in confirm_transactions_till_block_height_... is hanging.

I'd prefer that we only abort when we know that the block height has been passed, rather than hardcoding 2 minutes. We can detach the future with tokio::spawn, loop checking is_finished() on it, and abort it if the blockhash is expired. What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good idea a Notify channel and here we break if the blockheight exceeds the last transaction blockheight or something. This should be done for all join and/or awaits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, do you mind if I include it in a separate PR? It isn't directly related to the changes here and can be done independently

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah not at all

@joncinque joncinque merged commit b2f4fb3 into anza-xyz:master Mar 21, 2024
47 checks passed
@joncinque joncinque deleted the confirmsooner branch March 21, 2024 13:35
mergify bot pushed a commit that referenced this pull request Mar 21, 2024
…in_parallel` (#348)

client: Confirm sooner during send_and_confirm_in_parallel
(cherry picked from commit b2f4fb3)
joncinque added a commit that referenced this pull request Mar 22, 2024
…ctions_in_parallel` (backport of #348) (#357)

client: Start resending sooner during `send_and_confirm_transactions_in_parallel` (#348)

client: Confirm sooner during send_and_confirm_in_parallel
(cherry picked from commit b2f4fb3)

Co-authored-by: Jon C <me@jonc.dev>
anwayde pushed a commit to firedancer-io/agave that referenced this pull request Jul 23, 2024
…ctions_in_parallel` (backport of anza-xyz#348) (anza-xyz#357)

client: Start resending sooner during `send_and_confirm_transactions_in_parallel` (anza-xyz#348)

client: Confirm sooner during send_and_confirm_in_parallel
(cherry picked from commit b2f4fb3)

Co-authored-by: Jon C <me@jonc.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants