Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stacks 2.1] 500 response from GET /v2/pox #3261

Closed
zone117x opened this issue Aug 24, 2022 · 16 comments
Closed

[Stacks 2.1] 500 response from GET /v2/pox #3261

zone117x opened this issue Aug 24, 2022 · 16 comments
Assignees
Labels
bug Unwanted or unintended property causing functional harm locked ship now This is a blocker for the next release (either non-breaking or breaking change) stacks-2.1

Comments

@zone117x
Copy link
Member

zone117x commented Aug 24, 2022

Running the latest next branch in mocknet mode, and the GET /v2/pox endpoint often returns an HTTP 500 error with the message: Failed to query peer info. On my machine, there's around a 2 second window of time where the 500 errors are returned.

It's easier to reproduce this issue with a short mocknet block time. So perhaps some race condition that happens around block assembly / broadcast time?

Here's a snippet of logs during a 500 request, with STACKS_LOG_DEBUG=1:
stacks-node-500.log.txt

It looks like these are the relevant lines:

INFO [1661342705.107448] [testnet/stacks-node/src/tenure.rs:70] [main] Node starting new tenure with VRF f1d24b909c98118e60f4396d65ab320dfa7c907c43d5de2031f8b170203441a3
DEBG [1661342705.199959] [src/net/p2p.rs:4918] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): load chain view for burn block 30
DEBG [1661342705.204901] [src/net/inv.rs:2530] [ThreadId(2)] Awaken inv sync to re-scan peer block inventories at height 28
DEBG [1661342705.205719] [src/net/inv.rs:2238] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): PoX bit vector is PoxId([true]) (reloaded=true)
INFO [1661342705.206280] [src/net/http.rs:1635] [ThreadId(2)] Handle HTTPRequest, verb: GET, peer_addr: 127.0.0.1:55041, path: /v2/pox, query: 
WARN [1661342705.233753] [src/net/rpc.rs:695] [ThreadId(2)] Failed to get PoX info GetPoxInfo(HttpRequestMetadata { version: Http11, peer: PeerHost::DNS(localhost,20443), keep_alive: true, canonical_stacks_tip_height: None }, UseLatestAnchoredTip): DBError(NotFoundError)
DEBG [1661342705.234666] [src/net/rpc.rs:3092] [ThreadId(2)] Processed HTTPRequest, path: /v2/pox, processing_time_ms: 28, conn_id: 8, peer_addr: 127.0.0.1:55041
DEBG [1661342705.234893] [src/net/p2p.rs:3611] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): network work state is GetPublicIP
WARN [1661342705.234911] [src/net/p2p.rs:2453] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): failed to learn public IP: NoSuchNeighbor
DEBG [1661342705.234920] [src/net/p2p.rs:2415] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): reset public IP query state
DEBG [1661342705.234928] [src/net/p2p.rs:3611] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): network work state is BlockInvSync
DEBG [1661342705.234937] [src/net/p2p.rs:2472] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): network inventory sync
DEBG [1661342705.234946] [src/net/inv.rs:2254] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): Inventory state has 0 block stats tracked
DEBG [1661342705.235067] [src/net/inv.rs:2518] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): Sampled 0 outbound sync peers
DEBG [1661342705.235402] [src/net/inv.rs:2384] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): inventory sync finished; sortition start is 1
DEBG [1661342705.235417] [src/net/inv.rs:2409] [ThreadId(2)] local.80000000://(bind=0.0.0.0:20444)(pub=None): inv sync finished, learned something new (have 0 inv states), ibd: false, num_inv_syncs: 0, num_sync_neighbors: 0

Using config.toml to bootstrap to epoch 2.1:

[burnchain]
chain = "bitcoin"
mode = "mocknet"
commit_anchor_block_within = 1500

[[burnchain.epochs]]
epoch_name = "1.0"
start_height = 0

[[burnchain.epochs]]
epoch_name = "2.0"
start_height = 0

[[burnchain.epochs]]
epoch_name = "2.05"
start_height = 1

[[burnchain.epochs]]
epoch_name = "2.1"
start_height = 2
@zone117x
Copy link
Member Author

zone117x commented Sep 5, 2022

I found this error is triggered on this line: https://github.com/stacks-network/stacks-blockchain/blob/7d960260b525d270e97bbafc8f09bd77d883b37b/src/net/rpc.rs#L420

This get_ancestor_sort_id call returns db_error:NotFoundError https://github.com/stacks-network/stacks-blockchain/blob/7d960260b525d270e97bbafc8f09bd77d883b37b/src/chainstate/burn/db/sortdb.rs#L3488-L3490

I'm not sure what the fix is here. Perhaps a mismatch between the reward start height and sortition ID?

@zone117x
Copy link
Member Author

zone117x commented Sep 6, 2022

Also of note: I think the is_pox_active: cur_cycle_pox_active value from the above line (and the parent object current_cycle) is the only part of this function that doesn't respect the tip parameter. Although, I'm not sure if a non-default tip param is explicitly supported in other parts of this /v2/pox fn.

@jcnelson
Copy link
Member

jcnelson commented Sep 6, 2022

This can be reproduced if you hit up /v2/pox aggressively during when a sortition is being processed.

@jcnelson
Copy link
Member

jcnelson commented Sep 6, 2022

Related: don't use the tip argument at all since the sortition DB won't honor it. Also, see if we can get a "read transaction" for querying all the state we need in one query.

@jcnelson jcnelson closed this as completed Sep 6, 2022
@zone117x
Copy link
Member Author

zone117x commented Sep 6, 2022

@jcnelson accidental close?

@jcnelson
Copy link
Member

jcnelson commented Sep 7, 2022

yup, fat-fingered

@jcnelson jcnelson self-assigned this Oct 27, 2022
@zone117x zone117x added the bug Unwanted or unintended property causing functional harm label Oct 27, 2022
@jcnelson
Copy link
Member

jcnelson commented Nov 1, 2022

I think this is fixed in next if you wanna try it again.

@zone117x
Copy link
Member Author

zone117x commented Nov 2, 2022

Unfortunately it doesn't appear to be fixed. Running into the same problem (occurs at the same frequency as originally reported).

Pulling in the temp work-around from #3281 still works in preventing the 500 responses (it just ignores errors from that sortdb.is_pox_active(burnchain, &burnchain_tip) line.

Something is up with that is_pox_active function, I've gone down a few rabbit holes trying to debug why but no luck.

@jcnelson
Copy link
Member

jcnelson commented Nov 4, 2022

Does this happen if v1_unlock_height is the same as epoch 2.1's start_height?

@jcnelson jcnelson added the ship now This is a blocker for the next release (either non-breaking or breaking change) label Nov 7, 2022
@zone117x
Copy link
Member Author

zone117x commented Nov 7, 2022

Does this happen if v1_unlock_height is the same as epoch 2.1's start_height?

I'm actually unable to get the node to mine the epoch 2.1 start block when setting [burnchain.pox_2_activation] to the same value as [[burnchain.epochs]] epoch 2.1 start_height.

I get the following error message repeating:

ERRO [1667850564.225276] [testnet/stacks-node/src/neon_node.rs:1774] [miner-block-http://0.0.0.0:20443] Relayer: Failure fetching recipient set: ChainstateError(ClarityError(Interpreter(Unchecked(NoSuchContract("ST000000000000000000002AMW42H.pox-2")))))

Here's the config I'm using:

[node]
name = "krypton-node"
rpc_bind = "0.0.0.0:20443"
p2p_bind = "0.0.0.0:20444"
working_dir = "/chainstate/stacks-blockchain-data"

miner = true
use_test_genesis_chainstate = true
pox_sync_sample_secs = 1
wait_time_for_blocks = 0
wait_time_for_microblocks = 50
microblock_frequency = 1000

[miner]
first_attempt_time_ms = 30000
subsequent_attempt_time_ms = 5000

[connection_options]
disable_block_download = true
disable_inbound_handshakes = true
disable_inbound_walks = true
public_ip_address = "1.1.1.1:1234"

[burnchain]
chain = "bitcoin"
mode = "krypton"
poll_time_secs = 1
peer_host = "localhost"
peer_port = 18444
rpc_port = 18443
rpc_ssl = false
username = "btc"
password = "btc"
timeout = 30

pox_2_activation = 105 # <---- mining error when set to the same value as epoch 2.1 start height

[[burnchain.epochs]]
epoch_name = "1.0"
start_height = 0

[[burnchain.epochs]]
epoch_name = "2.0"
start_height = 103

[[burnchain.epochs]]
epoch_name = "2.05"
start_height = 104

[[burnchain.epochs]]
epoch_name = "2.1"
start_height = 105

I also get the same error when bumping it to one block higher. In the above config, I need to set pox_2_activation to 107 (2 higher than the epoch21 start height) in order for the miner to progress.

It also works if I set both the epoch21 start height and pox_2_activation to 107.

Is pox_2_activation rounding down to the height of the previous pox cycle?

@gregorycoppola
Copy link
Contributor

@zone117x i'm not sure what the discrepancy is, but I saw it work with

[node]
name = "krypton-node"
rpc_bind = "0.0.0.0:20443"
p2p_bind = "0.0.0.0:20444"
# working_dir = "$DATA_DIR"

seed = "9e446f6b0c6a96cf2190e54bcd5a8569c3e386f091605499464389b8d4e0bfc201"
local_peer_seed = "9e446f6b0c6a96cf2190e54bcd5a8569c3e386f091605499464389b8d4e0bfc201"

miner = true
use_test_genesis_chainstate = true
pox_sync_sample_secs = 5
wait_time_for_blocks = 0
wait_time_for_microblocks = 1000
microblock_frequency = 5000
# mine_microblocks = true
# max_microblocks = 10
#
[[events_observer]]
endpoint = "localhost:50303"
retry_count = 255
events_keys = ["*"]

[miner]
first_attempt_time_ms = 30000
subsequent_attempt_time_ms = 5000
block_reward_recipient = "ST9V4MGBMGHX0WR3MG6JA53HNKMJ2HF9VWRQNTKR"

[connection_options]
# inv_sync_interval = 10
# download_interval = 10
# walk_interval = 10
disable_block_download = true
disable_inbound_handshakes = true
disable_inbound_walks = true
public_ip_address = "1.1.1.1:1234"

[burnchain]
chain = "bitcoin"
mode = "krypton"
poll_time_secs = 1
pox_2_activation = 105

### bitcoind-regtest connection info
peer_host = "127.0.0.1"
peer_port = 18444
rpc_port = 18443
rpc_ssl = false
username = "krypton"
password = "krypton"
timeout = 30

[[burnchain.epochs]]
epoch_name = "1.0"
start_height = 0

[[burnchain.epochs]]
epoch_name = "2.0"
start_height = 103

[[burnchain.epochs]]
epoch_name = "2.05"
start_height = 104

[[burnchain.epochs]]
epoch_name = "2.1"
start_height = 105

@zone117x
Copy link
Member Author

zone117x commented Nov 10, 2022

The problem is some block height configs (seemingly arbitrarily) don't work. My guess is something related to pox_2_activation needing to be on a pox cycle boundary relative to one of the start_height configs.

@zone117x
Copy link
Member Author

zone117x commented Nov 10, 2022

Also this thread is getting a bit off topic (although the latest issue we are discussing might be related to the initial issue idk). The focus of this issue to fix the HTTP 500 responses returned from GET /v2/pox.

Should I open a new issue for Miner halts with error for certain block height configs?

cc @jcnelson ?

@jcnelson jcnelson mentioned this issue Nov 16, 2022
@jcnelson
Copy link
Member

Should I open a new issue for Miner halts with error for certain block height configs?

Yes, let's do that.

@zone117x
Copy link
Member Author

The /v2/pox HTTP 500 error was fixed by @jcnelson in #3399

I'm closing this issue, and opening a new one for the Miner halts with error for certain block height configs issue discussed above.

@blockstack-devops
Copy link
Contributor

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Unwanted or unintended property causing functional harm locked ship now This is a blocker for the next release (either non-breaking or breaking change) stacks-2.1
Projects
None yet
Development

No branches or pull requests

4 participants