Feature Activation Phased Testing on Testnet #773

Closed
glevco opened this issue Sep 13, 2023 · 18 comments

glevco commented Sep 13, 2023

This is a tracking issue for the Feature Activation Phased Testing. For more information, read the RFC.

@glevco glevco self-assigned this Sep 13, 2023
@glevco glevco converted this from a draft issue Sep 13, 2023

glevco commented Sep 13, 2023

After completing all the PRs required to make the Phased Testing work, the process was started correctly. However, we noticed that new blocks mined on the testnet were not signaling support for NOP_FEATURE_1, as they were supposed to. We found out that the tx-mining-service, which is responsible for block mining on the testnet, also had to be updated to support signal bits, and it had not been. Therefore, the initial trial of the whole process is considered failed, and it will have to be restarted.


glevco commented Oct 2, 2023

To fix the issue found above, we had to update hathor-core, python-hathorlib, and tx-mining-service:

All those fixes have been merged and released.


glevco commented Oct 2, 2023

New blocks mined on the testnet are now being configured correctly. Although the process will be restarted for a full check of every evaluation interval, we can already see that it is working as expected.

Logs indicate that the full node is correctly starting with the intended feature support signals:

Screenshot 2023-10-02 at 14 02 30

The state for each feature is also logged on new blocks, as described in the RFC:

Screenshot 2023-10-02 at 14 05 51

Lastly, we can track the progress via the explorer:

Screenshot 2023-10-02 at 14 06 41

And also visualize that blocks are indeed signaling:

Screenshot 2023-10-02 at 14 07 03


glevco commented Oct 2, 2023

We also noticed that the implementation of mandatory signaling described in the original RFC was missing.

It has been implemented and is under review in #785. The Phased Testing procedure will be restarted after this is merged and released.


glevco commented Nov 23, 2023

The Phased Testing will be restarted with new NOP features, as defined in #879.


glevco commented Dec 26, 2023

Because the release of the full node version containing the new NOP features was slightly delayed, testnet blocks did not start signaling support as soon as the activation process started. Nevertheless, we released it 1-2 days after the start, leaving enough time to reach the configured threshold of 75%.

At the block with height 3386879, the Feature Activation panel of the explorer shows that no features were under the activation process:

Screenshot 2023-12-26 at 18 25 22

This is correct, as this is the block just before the start_height, which is 3386880.

We can also use this CloudWatch query to view the related logs:

fields @timestamp, @message
| sort @timestamp desc
| filter @message like 'height": 3386879'
| filter @logStream = 'testnet-golf-v0_58_0_rc4-20231221_release_candidate-cluster/node-scratch/eu-central-1-aaa612f6d80640c58d4efa1d9f945852'
| limit 100

And in the result we can see that the new NOP features are in the DEFINED state:

Screenshot 2023-12-26 at 18 35 43
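
To repeat this lookup for other heights, the query can be generated from a template. The sketch below is only a convenience helper: `build_height_query`, its parameters, and the `example-log-stream` name are hypothetical, and the query text simply mirrors the one shown above.

```python
def build_height_query(height: int, log_stream: str, limit: int = 100) -> str:
    """Build a CloudWatch Logs Insights query for log lines mentioning a block height.

    Hypothetical helper: it only reproduces the query from this thread
    with the height and log stream made configurable.
    """
    lines = [
        "fields @timestamp, @message",
        "| sort @timestamp desc",
        # Matches the JSON fragment `height": <n>` inside the log message.
        f"| filter @message like 'height\": {height}'",
        f"| filter @logStream = '{log_stream}'",
        f"| limit {limit}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "example-log-stream" is a placeholder, not a real stream name.
    print(build_height_query(3386880, "example-log-stream"))
```

The resulting string can be pasted into the Logs Insights console, for example to compare the block just before start_height with the one at start_height.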


glevco commented Dec 26, 2023

Then, at the block with height 3386880, the process started, but the block was not signaling support, as explained above:

Screenshot 2023-12-26 at 18 37 34

And from the CloudWatch logs:

Screenshot 2023-12-26 at 18 39 59

That happened at 12/7/2023 11:03:43 AM.


glevco commented Dec 26, 2023

After a while, the threshold of 75% was passed for NOP_FEATURE_4:

Screenshot 2023-12-21 at 15 26 37

NOP_FEATURE_5 is going to be activated through lock_in_on_timeout=True, and NOP_FEATURE_6 is going to fail. That's why their acceptance is 0%.

Everything is working as expected, up to this point.


glevco commented Dec 26, 2023

At the block with height 3427199, we can check that the feature states are still STARTED. Then, at height 3427200, exactly 40320 blocks after the start_height, the features make their transitions:

Screenshot 2023-12-26 at 18 46 26

From the logs:

Screenshot 2023-12-26 at 18 49 46

This happened at 12/23/2023 5:08:25 AM. Interestingly, this is about 16 days after the start of the process, instead of the expected 14 days, because blocks are not consistently 30 seconds apart.


glevco commented Dec 26, 2023

At this time, the current best block is the one with height 3437279, from 12/26/2023 5:29:41 PM.

This is the current state:

Screenshot 2023-12-26 at 18 54 00

There is one action point so far:


glevco commented Dec 27, 2023

Also, the MUST_SIGNAL phase started for NOP_FEATURE_5 on Dec 23rd, and today non-signaling blocks reached 25% of the evaluation interval. We should have prepared a deployment of the tx-mining-service node with support enabled for this feature.

Due to human error, this wasn't done in time, so all blocks have been rejected on the testnet since 17:29 BRT (the block linked above is the last one). We are already deploying a new node to fix this, with the support option enabled for this feature. Here are the tx-mining-service logs showing the last block that was found, and the stall after that:

Screenshot 2023-12-26 at 22 49 04

Nonetheless, this correctly tested the mandatory signaling behavior: once the share of non-signaling blocks in the interval reaches 1 - threshold, any further block that fails to signal support must be rejected. Here's the error when trying to push such a block on my local node connected to the testnet:

Screenshot 2023-12-26 at 22 44 07

So, the Feature Activation process is working as expected, up to this point.
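
The mandatory signaling rule can be sketched as follows. This is a simplified illustration, not the hathor-core implementation: the function names are hypothetical, the threshold is expressed as a percentage for readability, and the count of non-signaling blocks is assumed to start at the boundary block 3427200.

```python
EVALUATION_INTERVAL = 40320  # blocks per evaluation interval
THRESHOLD_PERCENT = 75       # configured acceptance threshold


def max_non_signaling_blocks(interval: int, threshold_percent: int) -> int:
    """Number of non-signaling blocks an interval can hold while still
    being able to reach the threshold."""
    # Ceiling division in integer arithmetic, to avoid float rounding.
    required_signaling = (interval * threshold_percent + 99) // 100
    return interval - required_signaling


def is_block_accepted(non_signaling_so_far: int, block_signals: bool) -> bool:
    """During MUST_SIGNAL, a signaling block is always accepted; a
    non-signaling block is accepted only while the allowance is not used up."""
    if block_signals:
        return True
    allowance = max_non_signaling_blocks(EVALUATION_INTERVAL, THRESHOLD_PERCENT)
    return non_signaling_so_far < allowance


if __name__ == "__main__":
    # Heights reported in this thread: MUST_SIGNAL began at the 3427200
    # boundary, and 3437279 was the last block accepted before the stall.
    accepted_without_signal = 3437279 - 3427200 + 1
    print(accepted_without_signal)
    print(max_non_signaling_blocks(EVALUATION_INTERVAL, THRESHOLD_PERCENT))
    print(is_block_accepted(accepted_without_signal, block_signals=False))
```

With these numbers, the non-signaling blocks accepted between heights 3427200 and 3437279 exactly use up the 25% allowance, which is consistent with every later non-signaling block being rejected.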


glevco commented Dec 27, 2023

Another action point, as suggested by @jansegre:


glevco commented Dec 27, 2023

Block mining has been restored. Here's the signaling on the most recent block:

Screenshot 2023-12-27 at 12 52 38

And the acceptance for NOP_FEATURE_5 on MUST_SIGNAL is climbing:

Screenshot 2023-12-27 at 12 53 44

@glevco glevco changed the title Feature Activation Phased Testing Feature Activation Phased Testing on Testnet Dec 27, 2023

glevco commented Dec 27, 2023

Another action point suggested by @jansegre as a UX improvement:


glevco commented Jan 8, 2024

Today another Evaluation Interval block was reached, and the Feature Activation states advanced accordingly and correctly.

Here's the complete features table from the explorer:

Screenshot 2024-01-07 at 22 26 46

We can see that NOP_FEATURE_6 failed, as expected. NOP_FEATURE_4's state didn't change, as its minimum activation height hasn't been reached yet, and NOP_FEATURE_5 transitioned from MUST_SIGNAL to LOCKED_IN.

Here's the signaling in the current best block:

Screenshot 2024-01-07 at 22 26 56

There we can confirm the same states.

And lastly, from the logs:

Screenshot 2024-01-07 at 22 33 00

Then, in the next transition (~two weeks from now), we should see features 4 and 5 transitioning to ACTIVE.
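
The sequence of states observed so far, and the one expected next, can be reproduced with a small state machine. The sketch below is inferred from the transitions reported in this thread and is not the hathor-core implementation. The `timeout_height` and `minimum_activation_height` values are illustrative assumptions chosen to match the observed behavior, not the configuration from #879, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DEFINED = "DEFINED"
    STARTED = "STARTED"
    MUST_SIGNAL = "MUST_SIGNAL"
    LOCKED_IN = "LOCKED_IN"
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"


@dataclass(frozen=True)
class Criteria:
    start_height: int
    timeout_height: int
    minimum_activation_height: int = 0
    lock_in_on_timeout: bool = False


def next_state(state: State, height: int, c: Criteria, threshold_reached: bool) -> State:
    """State at an evaluation-boundary block, given the previous state."""
    if state is State.DEFINED:
        return State.STARTED if height >= c.start_height else State.DEFINED
    if state is State.STARTED:
        if threshold_reached:
            return State.LOCKED_IN
        if height >= c.timeout_height:
            return State.MUST_SIGNAL if c.lock_in_on_timeout else State.FAILED
        return State.STARTED
    if state is State.MUST_SIGNAL:
        return State.LOCKED_IN
    if state is State.LOCKED_IN:
        return State.ACTIVE if height >= c.minimum_activation_height else State.LOCKED_IN
    return state  # ACTIVE and FAILED are final


# Boundaries are 40320 blocks apart, starting at start_height 3386880.
BOUNDARIES = [3386880, 3427200, 3467520, 3507840]


def simulate(c: Criteria, threshold_reached_at: set) -> list:
    state, history = State.DEFINED, []
    for height in BOUNDARIES:
        state = next_state(state, height, c, height in threshold_reached_at)
        history.append(state.value)
    return history


# Illustrative (assumed) criteria for the three NOP features.
FEATURE_4 = Criteria(3386880, timeout_height=3467520, minimum_activation_height=3507840)
FEATURE_5 = Criteria(3386880, timeout_height=3427200, lock_in_on_timeout=True)
FEATURE_6 = Criteria(3386880, timeout_height=3467520)

if __name__ == "__main__":
    print(simulate(FEATURE_4, {3427200}))  # threshold passed in the first interval
    print(simulate(FEATURE_5, set()))
    print(simulate(FEATURE_6, set()))
```

Under these assumed criteria, the last boundary (3507840) is where features 4 and 5 would become ACTIVE, while feature 6 stays FAILED.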


glevco commented Jan 22, 2024

The testnet Phased Testing completed yesterday. Features 4 and 5 transitioned to ACTIVE, as expected, generating logs that are displayed conditionally on those features. This means the process is fully working and ready to be tested on mainnet.

Here's the current state from the explorer:

Screenshot 2024-01-22 at 12 06 52

And here is the moment the features transitioned, from block 3507839 to 3507840, activating the logs:

Screenshot 2024-01-22 at 14 11 27


glevco commented Jan 22, 2024

Reviewers: @msbrogli @jansegre

@glevco glevco moved this from In Progress (WIP) to In Progress (Done) in Hathor Network Jan 22, 2024

glevco commented Aug 5, 2024

Closing this as the process is already up and running on mainnet.

@glevco glevco closed this as completed Aug 5, 2024
@github-project-automation github-project-automation bot moved this from In Progress (Done) to Waiting to be deployed in Hathor Network Aug 5, 2024
@glevco glevco moved this from Waiting to be deployed to Done in Hathor Network Aug 5, 2024