Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] - cncli.sh - integration of full stakepool history load into epochdata table #1793

Merged
merged 23 commits into from
Sep 25, 2024

Conversation

asnakep
Copy link
Contributor

@asnakep asnakep commented Jul 30, 2024

Hello CNTools/Koios Team,

I would like to propose the integration in cncli.sh of a process I wrote to perform a full load of epochdata table in blocklog db.

It checks all previous epochs where your pool got at least one block based on the query "SELECT DISTINCT epoch FROM blocklog."

It also takes into account the current epoch and the next epoch if it's found in the blocklog table.

Depending of the pool history it can take some minutes or a lot of hours.
It took 2 hours and ~30 minutes to process 138 epochs.

It is shown in cncli.sh menu as:

epochdata Manually re-calculate leaderlog to load stakepool history into epochdata table of blocklog db. Needs completion of cncli.sh sync process.
   all One-time re-calculation of all epochs (avg execution duration: 1hr / 50 epochs)
   epoch One-time re-calculation for the specified epoch

Following the logic of validate all and validate epoch, I added the feature to process a single epoch, with overwrite if epoch already exists.

Altough the most important is epochdata all which requires to run only once, or more if is needed for some reason.

@Scitz0 thanks a lot for your suggestions in PR 1782, it looks much better now.
Anyway I wait for your feedback in case something needs to be reviewed.

Thank you,
kind regards,
Manuel

Description

Added new function cncliEpochData in cncli.sh

Where should the reviewer start?

Run: "cncli.sh epochdata all" or "cncli.sh epochdata epoch"

Motivation and context

The scenario is this: An operator runs a stakepool for months or years without using cntools/cncli and makes blocks during this period. At some point, the operator starts using cntools/cncli, and when cncli initializes, all block schedules from the first pool block until the last are entered into the blocklog table in blocklog.db. Meanwhile, the epochdata table is updated at every slot leader check, so you only have data in epochdata from when you started using CNTools, not from before.

How has this been tested?

New function cncliEpochData has been extensively tested during mainnet epochs 500.

A new function getProtocolParamsHist has been added in env file and added in getConsensus func of cncli.sh to handle consensus choice in previous epochs.

Needed variables in cncli.sh :

POOL_ID
POOL_ID_BECH32

Copy link
Contributor

@Scitz0 Scitz0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the code changes, its now much easier to read 👍
Though I did find stuff to fix so please check comments added.

scripts/cnode-helper-scripts/env Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
@asnakep
Copy link
Contributor Author

asnakep commented Jul 31, 2024

Hello @Scitz0

all requested points have been applied/fixed, I'll keep testing tomorrow for next epoch, and then I'll push the updates

Cheers,
Manuel

@Scitz0
Copy link
Contributor

Scitz0 commented Jul 31, 2024

Hello @Scitz0

all requested points have been applied/fixed, I'll keep testing tomorrow for next epoch, and then I'll push the updates

Cheers, Manuel

👍 I will have a look again once pushed. Thanks for changes.

@asnakep
Copy link
Contributor Author

asnakep commented Aug 1, 2024

Hello @Scitz0

I've just pushed the changes

Cheers
Manuel

@asnakep
Copy link
Contributor Author

asnakep commented Aug 2, 2024

Hello @Scitz0

I've to review an issue i have with getconsensus and next epoch, I'll do a new push when solved.

Cheers,
Manuel

@asnakep
Copy link
Contributor Author

asnakep commented Aug 2, 2024

updated getConsensus function to use getProtocolParams on next_epoch instead of getProtocolParamsHist

Copy link
Contributor

@Scitz0 Scitz0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added several requested code changes now... lets revisit again once all are addressed.
I also marked all old as resolved to make PR less confusing.

scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
@asnakep
Copy link
Contributor Author

asnakep commented Aug 2, 2024

Hello @Scitz0

I just pushed last changes

cheers,
Manuel

@asnakep asnakep requested a review from Scitz0 August 2, 2024 19:35
EPOCHS=$(sqlite3 "$BLOCKLOG_DB" "SELECT group_concat(epoch,' ') FROM (SELECT DISTINCT epoch FROM blocklog ORDER BY epoch);")
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
@asnakep
Copy link
Contributor Author

asnakep commented Aug 3, 2024

Hello @Scitz0

I just pushed last changes, I'm ready to review them as per last comment

cheers,
Manuel

scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
…rentEpoch, runNextEpoch and runPreviousEpochs
@asnakep
Copy link
Contributor Author

asnakep commented Aug 3, 2024

Hello @Scitz0

I just pushed last changes. I removed --ledger-set from the three leaderlog functions, added --epoch param in runCurrentEpoch and runNextEpoch accordingly to the input var $1

local epoch="$1" in now in the three leaderlog funcs and the calls from processAllEpochs and processSingleEpoch,
have been updated as:

runCurrentEpoch ${epoch}
runNextEpoch ${epoch}
runPreviousEpochs ${epoch}

Cheers,
Manuel

echo "ERROR: Failed to fetch protocol parameters for epoch ${1}.";

2. Requoted "$1" in condition [[ ${epoch} = "$1" ]] && matched=true && break
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Outdated Show resolved Hide resolved
@Scitz0
Copy link
Contributor

Scitz0 commented Aug 3, 2024

Hello @Scitz0

I just pushed last changes. I removed --ledger-set from the three leaderlog functions, added --epoch param in runCurrentEpoch and runNextEpoch accordingly to the input var $1

local epoch="$1" in now in the three leaderlog funcs and the calls from processAllEpochs and processSingleEpoch, have been updated as:

runCurrentEpoch ${epoch}
runNextEpoch ${epoch}
runPreviousEpochs ${epoch}

Cheers, Manuel

Ok great, though added some more comments as its currently a bit inconsistent with usage of local ${epoch} variable and ${1}. So either go ${1} everywhere or move local variable to the top and use everywhere instead of $1

@asnakep
Copy link
Contributor Author

asnakep commented Aug 3, 2024

Hello @Scitz0

last updates have been pushed

cheers,
Manuel

@Scitz0
Copy link
Contributor

Scitz0 commented Aug 3, 2024

I now marked all as resolved and pushed a small fix to solve merge conflict.

The one thing I want to fix before we merge is the wait time at epoch start. You dont have to do anything about that right now, just leave PR as is and I'll think about it for a bit.

@asnakep
Copy link
Contributor Author

asnakep commented Aug 4, 2024

Hello @Scitz0

Thanks so much for your precious advices on how to proceed with this work. This is the first time I have worked with CNTools scripts, and I hope I haven’t bothered you too much with all the reviews.

Cheers,
Manuel

@Scitz0
Copy link
Contributor

Scitz0 commented Aug 4, 2024

Hello @Scitz0

Thanks so much for your precious advices on how to proceed with this work. This is the first time I have worked with CNTools scripts, and I hope I haven’t bothered you too much with all the reviews.

Cheers,
Manuel

You are welcome and not at all, always glad to see when people take the time try to improve and add features to the scripts.

TrevorBenson and others added 2 commits August 10, 2024 11:15
## Description
1. Changes to the nginx mithril relay LB and squid mithril relay
   * Nginx mithril relay LB:
     * Uses stream protocol
     * proxy_connect_timeout of 10 seconds
     * max_fails set to 1
* fail_timeout set to 10 seconds per relay (20 seconds for 2 relay
configurations, 30 for 3, etc.)
   * Squid mithril relay:
* Change to how ACL's and allow rulesare written for multiple block
producers
* Includes rules for non block producers (other relays, local relay IP,
simplifies testing from other than block producer) etc.
2. Enhances the verify_signer_registered function
* Continues to check if mithril-signer has registered in the current
epoch.
* If the signer has registered for the current epoch it will also check
if the signer registered two epochs earlier and reports if the signer is
valid to sign certificates in the current epoch or not.
3. Adds a new MITHRIL_HOME variable to the
`scripts/cnode-helper-scripts/env` file.
   * Defaults to the current path of `${CNODE_HOME}/mithril`.
* Provides the SPO a way to set a unique path for where mithril
environment and mithril data-stores will be located.
4. Changes to a status code for checks on mithril minimum versions.
Sourced from past conversation w/ @Scitz0
5. Adds functions **semantic_version_check** and
**check_mithril_upgrade_safe**
* GitHub Actions workflow now uses **check_mithril_upgrade_safe**
against the repo `node-latest.txt` to determine if changing
`mithril-latest.xt` is acceptable.
* Not yet implemented in `guild-deploy.sh`, but the
**check_mithril_upgrade_safe** should support both CI Actions and
production node/signer environments. Open for implementation
discussions.

## Where to start?
I'd suggest review of each commit vs. reviewing the files changed. The
commit for **Refactor mithril functions into mithril library.** will be
quite large. There is a general section of functions used by multiple
scripts, and then functions are sorted into blocks by the script that
uses them, client, relay or signer. Each section is alphabetically
sorted.

## Motivation and context
1. Nginx relay/lb
 * Issues with current nginx config reported in Koios support channel.
 * Simplified testing of squid proxy when additional IPs are provided.
* Offline relays were adding 60 second delays until nginx timeouts
occured and LB moved onto the next relay.
 * Tuned the max fails to 1 and fail_timeout 10 seconds * # of relays.
* Temporarily prevents re-using an offline relay if many requests get
sent through the same LB
2. Reduce confusion for new signer users who were not aware signing had
a 2 epoch delay.
3. Flexibility for SPO's to choose their own directories.
4. Past discussions with @Scitz0 
5. Past discussions with @rdlrt 

## How has this been tested?
1. Nginx relay/squid
   * Direct curl requests on the block producer:
`curl -4 --proxy http://127.0.0.1:3132
https://aggregator.release-mainnet.api.mithril.network/aggregator`

* Enabling tcpdump on port 3132 for all (squid) mithril relays,
disabling the first relay in the mithril_relays upstream definition of
the (nginx) mithril relay lb. Then timing the request while watching
traffic from each mithril relay:
`time curl -4 --proxy http://127.0.0.1:3132
https://aggregator.release-mainnet.api.mithril.network/aggregator`.

* The first request takes 10.0x seconds, to succeed, subsequent requests
take sub 1 second and traffic shows round robin over the remaining
online mithril relays. If round robin reaches the offline mithril relay
before the fail_timout has expired it gets skipped and the traffic is
again sent to the first online (second in the list) upstream mithril
relay.

2. Manually
3. Test virtual machines altering path to `/opt/cardano/mithril`
4. Testing locally and in forked repository github actions
5. Testing locally and in forked repository github actions

![image](https://github.com/user-attachments/assets/3838f427-1f38-4808-8188-eb517def0153)

---------

Co-authored-by: RdLrT <3169068+rdlrt@users.noreply.github.com>
@rdlrt
Copy link
Contributor

rdlrt commented Aug 23, 2024

The PR will not be merged until koios-1.2.0a is on mainnet (since we'd be able to query stats much earlier with upgraded dbysnc) - removing the wait time post epoch transition

@rdlrt rdlrt force-pushed the alpha branch 3 times, most recently from 7366184 to 73bebe5 Compare August 26, 2024 07:33
@rdlrt rdlrt force-pushed the alpha branch 4 times, most recently from a014d46 to 3b839bc Compare September 4, 2024 03:43
@rdlrt
Copy link
Contributor

rdlrt commented Sep 4, 2024

With release of koios-1.2.0, this PR should be unblocked - need to ensure that stake snapshot stats are available immediately post epoch transition, and remove 6 hour hard coded timeout (replace it perhaps with check tip and ensure epoch has new block post transition)

rdlrt and others added 3 commits September 11, 2024 23:22
2. replaced first six hours of new epoch condition for function: getLedgerData

3. added print available epochs from blocklog table in blocklog db
 No slots found in blocklog table for epoch 222.
 choose from epochs in list:
  230 252 274 282 299 311 312 316 317 318......
@asnakep
Copy link
Contributor Author

asnakep commented Sep 19, 2024

Hello Koios Team,

sorry for the delay on this PR, I just pushed these changes:

  1. fixed linting issues.
  2. replaced first six hours of new epoch condition for function: getLedgerData
  3. added print available epochs from blocklog table in blocklog db
    No slots found in blocklog table for epoch 222.
    choose from epochs in list:
    230 252 274 282 299 311 312 316 317 318......

Thank you,
cheers,
Manuel

Copy link
Contributor

@Scitz0 Scitz0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few more things while testing this PR.

scripts/cnode-helper-scripts/cncli.sh Show resolved Hide resolved
scripts/cnode-helper-scripts/cncli.sh Show resolved Hide resolved
Manuel and others added 2 commits September 24, 2024 17:45
2. changed: sed 's/"//g' >> "$tmpcsv"
   for tr -d '"' >> "$tmpcsv"
on runCurrentEpoch, runNextEpoch, runPreviousEpochs
@Scitz0 Scitz0 merged commit f737d0a into cardano-community:alpha Sep 25, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants