
Fix: reduce chain-liveness poll frequency #3610

Merged
kantai merged 2 commits into master from fix/chain-liveness-poll on Mar 14, 2023
Conversation

@kantai (Member) commented Mar 14, 2023

Description

This PR reduces the chain-liveness poll frequency. This was driving the issues facing Hiro's testnet miner yesterday.

codecov bot commented Mar 14, 2023

Codecov Report

Merging #3610 (6ccadda) into master (753aa1a) will increase coverage by 0.46%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3610      +/-   ##
==========================================
+ Coverage   29.86%   30.33%   +0.46%     
==========================================
  Files         298      298              
  Lines      275064   275069       +5     
==========================================
+ Hits        82143    83435    +1292     
+ Misses     192921   191634    -1287     
Impacted Files                              Coverage Δ
testnet/stacks-node/src/config.rs           48.37% <100.00%> (+0.24%) ⬆️
testnet/stacks-node/src/run_loop/neon.rs    83.14% <100.00%> (-0.56%) ⬇️

... and 75 files with indirect coverage changes


@kantai requested a review from jcnelson on March 14, 2023 14:43
@jcnelson (Member) left a comment

LGTM. I'm curious how this led to the livelock conditions you were seeing?

@jcnelson (Member)

Also, does this supersede #3609?

@kantai (Member, Author) commented Mar 14, 2023

Yes, I think #3609 isn't necessary with this patch.

Our testnet miner was configured with:

```toml
[miner]
first_attempt_time_ms = 5000
subsequent_attempt_time_ms = 15000
microblock_attempt_time_ms = 5000
```

This caused the chain-liveness thread to submit a new stacks block event or new burn block event to the chains coordinator every ~15 seconds. When the coordinator receives those events, it interrupts the miner regardless of whether or not a new burn or stacks block actually arrived. That simply wasn't giving the miner enough time to mine a block.
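For illustration, a minimal sketch of that interaction, with hypothetical names throughout (`PollConfig`, `CoordinatorEvent`, and `chain_liveness_thread` are stand-ins, not the actual stacks-node types): a liveness thread wakes on a fixed interval and sends the coordinator the same kind of event a real block arrival would, so a short interval means the miner is repeatedly interrupted.

```rust
use std::sync::mpsc::Sender;
use std::thread;
use std::time::Duration;

/// Events the chains coordinator reacts to (hypothetical stand-ins).
enum CoordinatorEvent {
    NewStacksBlock,
    NewBurnBlock,
}

struct PollConfig {
    /// Seconds between liveness wake-ups. If this is in the same ballpark as
    /// the miner's attempt times, the miner is interrupted before it can
    /// finish assembling a block.
    poll_interval_secs: u64,
}

fn chain_liveness_thread(cfg: PollConfig, coordinator: Sender<CoordinatorEvent>) {
    loop {
        thread::sleep(Duration::from_secs(cfg.poll_interval_secs));
        // The coordinator treats this like a real arrival and signals the
        // miner to stop, even if no new burn or stacks block was processed.
        if coordinator.send(CoordinatorEvent::NewStacksBlock).is_err() {
            // The coordinator side hung up; shut the liveness thread down.
            break;
        }
    }
}
```

Lengthening that interval relative to the miner's attempt times is, in effect, what this PR does.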

I think this PR is still a band-aid for the underlying behavior. There are perhaps two other approaches that should be explored:

  1. Refactor the miner blocking signal in the chains coordinator to wait until it knows that a block will be processed before interrupting the miner. That has its own kind of concurrency difficulty: the chains coordinator and the miner contend for a lock on the chainstate DB (the chains coordinator tries to begin a tx before it checks whether or not there's a block to process).
  2. Refactor the RunTenure directive handler to wait_on() the miner blocking signal. If the miner is in the blocked state, then RunTenure should just wait until it unblocks (up to some timeout), since simply losing a RunTenure directive is costly to active mining. (A rough sketch of this follows the list.)
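A hedged sketch of option 2, under stated assumptions: `MinerBlockSignal`, `handle_run_tenure`, and the 30-second timeout are all hypothetical, not the actual run_loop/neon.rs API. It only shows the shape of waiting on the blocking signal with a bounded timeout instead of dropping the directive outright.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for the miner blocking signal shared between the
/// chains coordinator and the mining code.
struct MinerBlockSignal {
    blocked: Mutex<bool>,
    unblocked: Condvar,
}

impl MinerBlockSignal {
    /// Wait until the coordinator clears the blocked flag, or give up after
    /// `timeout`. Returns true if the miner may proceed.
    fn wait_on(&self, timeout: Duration) -> bool {
        let guard = self.blocked.lock().unwrap();
        let (guard, wait_result) = self
            .unblocked
            .wait_timeout_while(guard, timeout, |blocked| *blocked)
            .unwrap();
        drop(guard);
        !wait_result.timed_out()
    }
}

fn handle_run_tenure(signal: &Arc<MinerBlockSignal>) {
    // Rather than discarding the directive whenever the miner is blocked,
    // wait a bounded amount of time for the coordinator to finish first.
    if signal.wait_on(Duration::from_secs(30)) {
        // ... proceed with block assembly for this tenure ...
    } else {
        // Still blocked after the timeout; only now drop the directive.
    }
}
```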

@obycode (Contributor) left a comment

lgtm

@kantai force-pushed the fix/chain-liveness-poll branch from fbe1b95 to 6ccadda on March 14, 2023 18:45
@kantai merged commit bfa941e into master on Mar 14, 2023
@kantai deleted the fix/chain-liveness-poll branch on March 14, 2023 22:48
@igorsyl (Contributor) commented Mar 15, 2023

FYI @jferrant @stjepangolemac

@blockstack-devops (Contributor)

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network locked as resolved and limited conversation to collaborators on Nov 13, 2024