Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add docs for resequencing #323

Merged
merged 3 commits into from
Oct 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
189 changes: 189 additions & 0 deletions docs/resequence-sequencer/resequence-sequencer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
# Resequencing batches with the Erigon sequencer

In the case of the sequencer receiving "bad" batches, which are effectively unprovable, it is possible to resequence.
The attached [script](./test_resequence.sh) provides an automation to test this kind of scenario. Please refer to the script for quick testing.

In cases where you'd want to manually trigger such cases, refer to the below steps:

The high level steps to resequence in the provided script is:

1. Stop sequencer
2. Change configs to simulate bad batches
3. Start sequencer with modified config
4. Inject load
5. Wait for batches to virtualize
6. Stop cdk-node-001
7. Stop sequencer
8. Rollback batches on L1 contract
9. Unwind to batch in sequencer with `integration` command
10. Change sequencer config to resequence
11. Start sequencer
12. Once resequencing is done/timed out, stop the sequencer
13. Change to normal config and restart sequencer
14. Start cdk-node-001
15. Compare block hashes from sequencer and erigon rpc

Assuming that you encountered bad batches, or want to resequence for some reason, we can reference the steps above:

#### Make backup of the configs

```bash
kurtosis service exec cdk cdk-erigon-sequencer-001 "cp \-r /etc/cdk-erigon/ /tmp/"
```

#### Stop the cdk-node
It is important to stop the cdk-node-001 service when attempting this procedure.

```bash
kurtosis service stop cdk cdk-node-001
```

#### Stop the sequencer
The Erigon sequencer image in Kurtosis CDK is setup so that the `cdk-erigon` process can be killed without exitting the container. This allows changing the configuration of the sequencer more easily.

```bash
# Send a SIGTRAP signal to the proc-runner process
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGTRAP "proc-runner.sh"" || true
# Send a SIGINT signal to the cdk-erigon process
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGINT "cdk-erigon"" || true
```

#### Geting the latest L1 verified batch
This can usually be done by querying the L1 explorer, but in a Kurtosis devnet environment, this can be done by querying the rollup manager contract.

```bash
# Queries the latest verified batch number
current_batch=$(cast logs --rpc-url "$(kurtosis port print cdk el-1-geth-lighthouse rpc)" --address 0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91 --from-block 0 -j | jq -r '.[] | select(.topics[0] == "0x9c72852172521097ba7e1482e6b44b351323df0155f97f4ea18fcec28e1f5966" or .topics[0] == "0xd1ec3a1216f08b6eff72e169ceb548b782db18a6614852618d86bb19f3f9b0d3") | .topics[1]' | tail -n 1 | sed 's/^0x//')

# Converts hexadecimal value
current_batch_dec=$((16#$current_batch))
```

- `0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91` is the address of our particular rollup contract.
- FilterVerifyBatches is a free log retrieval operation binding the contract event `0x9c72852172521097ba7e1482e6b44b351323df0155f97f4ea18fcec28e1f5966` for Validium Etrog networks.
- `0xd1ec3a1216f08b6eff72e169ceb548b782db18a6614852618d86bb19f3f9b0d3` is the verification topic for Etrog networks.

#### Rollback batches on L1 contract
Since the CDK network is managed by the L1 rollup manager contract, its important to trigger a batch rollback on the L1 rollup manager contract for our particular network after the L2 network is stopped.

```bash
cast send "0x2F50ef6b8e8Ee4E579B17619A92dE3E2ffbD8AD2" "rollbackBatches(address,uint64)" "0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91" "$latest_verified_batch" --private-key "0x12d7de8621a77640c9241b2595ba78ce443d05e94090365ab3bb5e19df82c625" --rpc-url "$(kurtosis port print cdk el-1-geth-lighthouse rpc)"
```

- `0x2F50ef6b8e8Ee4E579B17619A92dE3E2ffbD8AD2` is the address of the rollup manager contract on L1
- `0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91` is the address of our particular rollup contract.
- `0x12d7de8621a77640c9241b2595ba78ce443d05e94090365ab3bb5e19df82c625` is the private key of the admin address.

#### Unwind batches in the sequencer using `integration` command

The erigon sequencer image used in Kurtosis CDK comes with a built-in `integration` command line tool.

```bash
$ integration --help
[cdk-erigon-lib] timestamp 2024-03-12:16:34
long and heavy integration tests for Erigon

Usage:
integration [command]

Available Commands:
compare_bucket compare bucket to the same bucket in '--chaindata.reference'
compare_states compare state buckets to buckets in '--chaindata.reference'
completion Generate the autocompletion script for the specified shell
f_to_mdbx copy data from '--chaindata' to '--chaindata.to'
force_set_history_v3 Override existing --history.v3 flag value (if you know what you are doing)
force_set_prune Override existing --prune flag value (if you know what you are doing)
force_set_snapshot Override existing --snapshots flag value (if you know what you are doing)
help Help about any command
loop_exec
loop_ih
mdbx_to_mdbx copy data from '--chaindata' to '--chaindata.to'
print_migrations
print_stages
read_domains Run block execution and commitment with Domains.
remove_migration
reset_state Reset StateStages (5,6,7,8,9,10) and buckets
run_migrations
stage_bodies
stage_call_traces
stage_exec
stage_hash_state
stage_headers
stage_history
stage_log_index
stage_senders
stage_snapshots
stage_trie
stage_tx_lookup
state_domains Run block execution and commitment with Domains.
state_stages Run all StateStages (which happen after senders) in loop.
Examples:
--unwind=1 --unwind.every=10 # 10 blocks forward, 1 block back, 10 blocks forward, ...
--unwind=10 --unwind.every=1 # 1 block forward, 10 blocks back, 1 blocks forward, ...
--unwind=10 # 10 blocks back, then stop
--integrity.fast=false --integrity.slow=false # Performs DB integrity checks each step. You can disable slow or fast checks.
--block # Stop at exact blocks
--chaindata.reference # When finish all cycles, does comparison to this db file.

state_stages_zkevm Run all StateStages in loop.
Examples:
state_stages_zkevm --datadir=/datadirs/hermez-mainnet--unwind-batch-no=10 # unwind so the tip is the highest block in batch number 10
state_stages_zkevm --datadir=/datadirs/hermez-mainnet --unwind-batch-no=2 --chain=hermez-bali --log.console.verbosity=4 --datadir-compare=/datadirs/pre-synced-block-100 # unwind to batch 2 and compare with another datadir

warmup

Flags:
-h, --help help for integration
--log.console.json Format console logs with JSON
--log.console.verbosity string Set the log level for console logs (default "info")
--log.dir.json Format file logs with JSON
--log.dir.path string Path to store user and error logs to disk
--log.dir.verbosity string Set the log verbosity for logs stored to disk (default "info")
--log.json Format console logs with JSON
--metrics Enable metrics collection and reporting
--metrics.addr string Enable stand-alone metrics HTTP server listening interface (default "127.0.0.1")
--metrics.port int Metrics HTTP server listening port (default 6060)
--pprof Enable the pprof HTTP server
--pprof.addr string pprof HTTP server listening interface (default "127.0.0.1")
--pprof.cpuprofile string Write CPU profile to the given file
--pprof.port int pprof HTTP server listening port (default 6060)
--trace string Write execution trace to the given file
--verbosity string Set the log level for console logs (default "info")

Use "integration [command] --help" for more information about a command.

```

Using the `integration` command, we can unwind batches to the latest verified batch.

```bash
kurtosis service exec cdk cdk-erigon-sequencer-001 "integration state_stages_zkevm --config=/etc/cdk-erigon/config.yaml --unwind-batch-no=$latest_verified_batch --chain dynamic-kurtosis --datadir /home/erigon/data/dynamic-kurtosis-sequencer"
```

#### Change sequencer config to resequence with timeout enabled

```bash
kurtosis service exec cdk cdk-erigon-sequencer-001 "timeout 300s cdk-erigon --pprof=true --pprof.addr 0.0.0.0 --config /etc/cdk-erigon/config.yaml --datadir /home/erigon/data/dynamic-kurtosis-sequencer --zkevm.sequencer-resequence-strict=false --zkevm.sequencer-resequence=true --zkevm.sequencer-resequence-reuse-l1-info-index=true"
```

After the above is done, stop the sequencer again.

```bash
# Send a SIGTRAP signal to the proc-runner process
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGTRAP "proc-runner.sh"" || true
# Send a SIGINT signal to the cdk-erigon process
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGINT "cdk-erigon"" || true
```

#### Restart the cdk-node

```bash
kurtosis service start cdk cdk-node-001
```

#### Monitor logs and check blocks

The resequencing should be complete. Monitor the logs for the CDK components:
- Check the latest block number and arbitrary block hashes from the sequencer
- Check the latest block number and arbitrary block hashes from the erigon rpc and compare
- Check that the L1 verified batch number increments after some time
167 changes: 167 additions & 0 deletions docs/resequence-sequencer/test_resequence.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
#!/bin/bash

get_latest_l2_batch() {
local latest_block
latest_block=$(cast block latest --rpc-url "$(kurtosis port print cdk cdk-erigon-sequencer-001 rpc)" | grep "number" | awk '{print $2}')

local latest_batch
latest_batch=$(cast rpc zkevm_batchNumberByBlockNumber "$latest_block" --rpc-url "$(kurtosis port print cdk cdk-erigon-sequencer-001 rpc)" | sed 's/^"//;s/"$//')

if [[ -z "$latest_batch" ]]; then
echo "Error: Failed to get latest batch number" >&2
return 1
fi

latest_batch_dec=$((latest_batch))

echo "$latest_batch_dec"
}

get_latest_l1_verified_batch() {
current_batch=$(cast logs --rpc-url "$(kurtosis port print cdk el-1-geth-lighthouse rpc)" --address 0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91 --from-block 0 -j | jq -r '.[] | select(.topics[0] == "0x9c72852172521097ba7e1482e6b44b351323df0155f97f4ea18fcec28e1f5966" or .topics[0] == "0xd1ec3a1216f08b6eff72e169ceb548b782db18a6614852618d86bb19f3f9b0d3") | .topics[1]' | tail -n 1 | sed 's/^0x//')
current_batch_dec=$((16#$current_batch))
echo "$current_batch_dec"
}


wait_for_l1_batch() {
local timeout=$1
local batch_type=$2
local start_time

start_time=$(date +%s)

latest_batch=$(get_latest_l2_batch)
if [[ $latest_batch -ne 0 ]]; then
echo "Error: Failed to get latest batch number" >&2
return 1
fi

echo "Waiting for batch $latest_batch to be ${batch_type}..."
while true; do
current_time=$(date +%s)
if [ $((current_time - start_time)) -ge "$timeout" ]; then
echo "Timeout reached. Batch $latest_batch was not ${batch_type} within $timeout seconds."
return 1
fi

if [ "$batch_type" = "virtual" ]; then
current_batch=$(cast logs --rpc-url "$(kurtosis port print cdk el-1-geth-lighthouse rpc)" --address 0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91 --from-block 0 -j | jq -r '.[] | select(.topics[0] == "0x3e54d0825ed78523037d00a81759237eb436ce774bd546993ee67a1b67b6e766") | .topics[1]' | tail -n 1 | sed 's/^0x//')
current_batch=$((16#$current_batch))
elif [ "$batch_type" = "verified" ]; then
current_batch=$(cast rpc zkevm_verifiedBatchNumber --rpc-url "$(kurtosis port print cdk cdk-erigon-node-001 rpc)" | sed 's/^"//;s/"$//')
else
echo "Invalid batch type. Use 'virtual' or 'verified'."
return 1
fi

if [[ -z "$current_batch" ]]; then
echo "Error: Failed to get current batch number" >&2
return 1
fi

current_batch_dec=$((current_batch))
echo "Current ${batch_type} batch: $current_batch_dec, Latest batch: $latest_batch"
if [ "$current_batch_dec" -ge "$latest_batch" ]; then
echo "Batch $latest_batch has been ${batch_type}."
return 0
fi
sleep 10
done
}

stop_cdk_erigon_sequencer() {
echo "Stopping cdk-erigon"
# kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGTRAP $(pgrep "proc-runner.sh")" || true
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGTRAP \"proc-runner.sh\"" || true
sleep 1
kurtosis service exec cdk cdk-erigon-sequencer-001 "pkill -SIGINT \"cdk-erigon\"" || true
sleep 30
}

# Set -e to exit on any command failure
set -e

stop_cdk_erigon_sequencer

echo "Copying and modifying config"
# shellcheck disable=SC2016 # double quotes result in syntax error, single quotes needed.
kurtosis service exec cdk cdk-erigon-sequencer-001 'cp \-r /etc/cdk-erigon/ /tmp/ && sed -i '\''s/zkevm\.executor-strict: true/zkevm.executor-strict: false/;s/zkevm\.executor-urls: zkevm-stateless-executor-001:50071/zkevm.executor-urls: ","/;$a zkevm.disable-virtual-counters: true'\'' /tmp/cdk-erigon/config.yaml'

echo "Starting cdk-erigon with modified config"
kurtosis service exec cdk cdk-erigon-sequencer-001 "nohup cdk-erigon --pprof=true --pprof.addr 0.0.0.0 --config /tmp/cdk-erigon/config.yaml --datadir /home/erigon/data/dynamic-kurtosis-sequencer > /proc/1/fd/1 2>&1 &"

# Wait for cdk-erigon to start
sleep 30

echo "Running loadtest using polycli"
/usr/local/bin/polycli loadtest --rpc-url "$(kurtosis port print cdk cdk-erigon-node-001 rpc)" --private-key "0x12d7de8621a77640c9241b2595ba78ce443d05e94090365ab3bb5e19df82c625" --verbosity 600 --requests 2000 --rate-limit 500 --mode uniswapv3 --legacy

echo "Waiting for batch virtualization"
if ! wait_for_l1_batch 600 "virtual"; then
echo "Failed to wait for batch virtualization"
exit 1
fi

echo "Stopping cdk node"
kurtosis service stop cdk cdk-node-001

stop_cdk_erigon_sequencer


# Good batch before counter overflow
latest_verified_batch=$(get_latest_l1_verified_batch)

# Rollback to the last good batch before the counter overflow on L1
echo "Rolling back to batch $latest_verified_batch"
cast send "0x2F50ef6b8e8Ee4E579B17619A92dE3E2ffbD8AD2" "rollbackBatches(address,uint64)" "0x1Fe038B54aeBf558638CA51C91bC8cCa06609e91" "$latest_verified_batch" --private-key "0x12d7de8621a77640c9241b2595ba78ce443d05e94090365ab3bb5e19df82c625" --rpc-url "$(kurtosis port print cdk el-1-geth-lighthouse rpc)"

echo "Using integration tool to unwind to batch $latest_verified_batch"
kurtosis service exec cdk cdk-erigon-sequencer-001 "integration state_stages_zkevm --config=/etc/cdk-erigon/config.yaml --unwind-batch-no=$latest_verified_batch --chain dynamic-kurtosis --datadir /home/erigon/data/dynamic-kurtosis-sequencer"

echo "Starting cdk-erigon with resequencing and counter enabled"
kurtosis service exec cdk cdk-erigon-sequencer-001 "timeout 300s cdk-erigon --pprof=true --pprof.addr 0.0.0.0 --config /etc/cdk-erigon/config.yaml --datadir /home/erigon/data/dynamic-kurtosis-sequencer --zkevm.sequencer-resequence-strict=false --zkevm.sequencer-resequence=true --zkevm.sequencer-resequence-reuse-l1-info-index=true"

stop_cdk_erigon_sequencer

echo "Starting cdk-erigon with normal execution"
kurtosis service exec cdk cdk-erigon-sequencer-001 "nohup cdk-erigon --pprof=true --pprof.addr 0.0.0.0 --config /etc/cdk-erigon/config.yaml --datadir /home/erigon/data/dynamic-kurtosis-sequencer > /proc/1/fd/1 2>&1 &"

# Wait for cdk-erigon to start
sleep 30

echo "Restarting cdk node"
kurtosis service start cdk cdk-node-001

echo "Getting latest block number from sequencer"
latest_block=$(cast block latest --rpc-url "$(kurtosis port print cdk cdk-erigon-sequencer-001 rpc)" | grep "number" | awk '{print $2}')
echo "Latest block number from sequencer: $latest_block"

echo "Calculating comparison block number"
comparison_block=$((latest_block - 10))
echo "Block number to compare (10 blocks behind): $comparison_block"

echo "Getting block hash from sequencer"
sequencer_hash=$(cast block $comparison_block --rpc-url "$(kurtosis port print cdk cdk-erigon-sequencer-001 rpc)" | grep "hash" | awk '{print $2}')

echo "Getting block hash from node"
node_hash=$(cast block $comparison_block --rpc-url "$(kurtosis port print cdk cdk-erigon-node-001 rpc)" | grep "hash" | awk '{print $2}')

echo "Sequencer block hash: $sequencer_hash"
echo "Node block hash: $node_hash"

echo "Comparing block hashes"
if [ "$sequencer_hash" = "$node_hash" ]; then
echo "The block hashes match for block number $comparison_block."
else
echo "The block hashes do not match for block number $comparison_block."
exit 1
fi

echo "Waiting for batch verification"
if ! wait_for_l1_batch 1200 "verified"; then
echo "Failed to wait for batch verification"
exit 1
fi

echo "All steps completed successfully"
Loading