
Deploy Geth and set of Nimbus nodes for Sepolia testnet #104

Closed

jakubgs opened this issue Jun 7, 2022 · 16 comments


jakubgs commented Jun 7, 2022

There is a new testnet brewing intended to test The Merge on Sepolia:

This testnet is launching on the 20th of June, and each org will receive 100 validators.

The support for this testnet has not yet been merged into nimbus-eth2.

What we'll need is:

Since this is a small and temporary testnet, we can use an existing host for that, for example nimbus.kiln.

@parithosh

Minor correction: the Sepolia beacon chain is to test the merge on Sepolia (its own PoW chain), not Ropsten.

But the rest sounds good :)

jakubgs changed the title from "Deploy Geth and set of Nimbus nodes for Ropsten testnet" to "Deploy Geth and set of Nimbus nodes for Sepolia testnet" on Jun 7, 2022

jakubgs commented Jun 7, 2022

Yeah. I copied the title from the previous issue, and didn't edit it.


jakubgs commented Jun 8, 2022

Adjusted the description to take into account that we need one Geth instance for each beacon node instance with validators.
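For context, this means each beacon node process with validators gets its own execution client endpoint. A rough sketch of what one such instance's invocation could look like (illustrative only; the data dir, ports and web3 URL are assumptions loosely based on values that show up later in this issue):

# hypothetical invocation of one beacon node instance, pointed at its own Geth endpoint
build/nimbus_beacon_node \
  --network=sepolia \
  --data-dir=/data/beacon-node-sepolia-unstable-01/data \
  --web3-url=ws://127.0.0.1:9557 \
  --rest --rest-port=9311 \
  --metrics --metrics-port=9211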


jakubgs commented Jun 14, 2022

Looks like the beacon chain config is in place: https://github.com/eth-clients/merge-testnets/tree/main/sepolia
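A quick way to pull the config for inspection (assuming the raw file path mirrors the repo layout linked above):

curl -sL https://raw.githubusercontent.com/eth-clients/merge-testnets/main/sepolia/config.yaml | head -n 20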


Marudny commented Jun 14, 2022

OK, it is a very tricky and ugly setup.

  1. As we wanted to deploy Sepolia instances on the Kiln nodes, the initial idea was to create a sort of virtual host (a new set of metadata for the same bare-metal host), but that idea was rejected because of potential issues with the bootstrap playbook.
  2. We followed the idea of modifying infra-tf-dummy-module to accept multiple groups instead of a single group:
➜  nimbus_nodes_ropsten_hetzner git:(sepolia) ✗ diff -u variables.tf ../nimbus_nodes_kiln_hetzner/variables.tf 
--- variables.tf	2022-06-14 12:47:58.878934452 +0200
+++ ../nimbus_nodes_kiln_hetzner/variables.tf	2022-06-14 12:48:53.001772099 +0200
@@ -26,9 +26,9 @@
   default     = "node"
 }
 
-variable "group" {
-  description = "Name of Ansible group to add hosts to."
-  type        = string
+variable "groups" {
+  description = "Name of Ansible groups to add hosts to."
+  type        = list(string)
 }
 
 variable "env" {
➜  nimbus_nodes_ropsten_hetzner git:(sepolia) ✗ diff -u main.tf ../nimbus_nodes_kiln_hetzner/main.tf      
--- main.tf	2022-06-14 12:47:58.878934452 +0200
+++ ../nimbus_nodes_kiln_hetzner/main.tf	2022-06-14 12:51:42.842877706 +0200
@@ -11,6 +11,7 @@
   stage = var.stage != "" ? var.stage : terraform.workspace
   tokens = split(".", local.stage)
   dc     = "${var.prefix}-${var.region}"
+  groups = concat(var.groups, [local.dc, "${var.env}.${local.stage}"])
   # map of hostname => ip
   hostnames = { for i, ip in var.ips :
     "${var.name}-${format("%02d", i + 1)}.${local.dc}.${var.env}.${local.stage}" => ip
@@ -23,7 +24,7 @@
 resource "ansible_host" "host" {
   for_each           = local.hostnames
   inventory_hostname = each.key
-  groups             = [var.group, local.dc, "${var.env}.${local.stage}"]
+  groups             = local.groups
 
   vars = {
     hostname     = each.key
➜  nimbus_nodes_ropsten_hetzner git:(sepolia) 
  3. The above modification worked fine and I was able to put the node into both groups (see the inventory check after this list):
    groups = [ "nimbus-kiln-metal", "nimbus-sepolia-metal" ]

  4. Because of Ansible's architecture, if a host belongs to two or more groups, the group_vars file of the alphabetically last group wins. As a result, 4 instances of Kiln nodes were initially created under the Sepolia name.

  5. Based on https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable we checked different options, and include_vars was the one that worked:

 tasks:
    - include_vars:
        file: group_vars/nimbus-sepolia-metal.yml
    - include_role: name=infra-role-geth
  6. Note that include_vars changes the order of variable assignment, so all variables defined in group_vars/nimbus-kiln-metal.yml and not overwritten by group_vars/nimbus-sepolia-metal.yml still apply.

  7. After deployment there were issues with --syncmode=snap; it seems there are no peers with snap enabled, so the node couldn't sync. The first instance was switched manually to full sync mode and synced after that (see the sketch at the end of this comment).

  8. The full DB takes 1.1 GB of disk space.
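To confirm the dual group membership from point 3, a hedged check against the generated Ansible inventory (assumes the repo's inventory is wired up to the Terraform state; group names as above):

# the same host should be printed for both groups
ansible-inventory --list \
  | jq -r '."nimbus-kiln-metal".hosts[]?, ."nimbus-sepolia-metal".hosts[]?'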

Ansible configuration is here:
4130b80
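For point 7, a sketch of the sync mode switch (not the exact flags from our compose files; the data dir and HTTP settings are illustrative):

# switch a Geth instance from snap sync to full sync for Sepolia
geth --sepolia \
  --datadir /data/geth-sepolia-01 \
  --syncmode=full \
  --http --http.addr 127.0.0.1 --http.port 8545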


Marudny commented Jun 15, 2022

All Geth instances are synced.
Config here:
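For reference, a simple way to double-check the sync state of each instance over JSON-RPC (hedged example; the HTTP RPC port is an assumption about how the instances are exposed):

curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://127.0.0.1:8545
# {"jsonrpc":"2.0","id":1,"result":false} -- false means fully synced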


jakubgs commented Jun 16, 2022

Looks like Sepolia support has been merged:

We should get the beacon chain nodes up tomorrow, as genesis is on Sunday, June 19, 2022, 2:00:00 PM UTC.


Marudny commented Jun 17, 2022

Nodes are up and running:

artur@metal-01.he-eu-hel1.nimbus.kiln:/etc/sudoers.d % for PORT in {9311..9314}; do c "0:$PORT/eth/v1/node/syncing" | jq -c; done
{"data":{"head_slot":"0","sync_distance":"0","is_syncing":false}}
{"data":{"head_slot":"0","sync_distance":"0","is_syncing":false}}
{"data":{"head_slot":"0","sync_distance":"0","is_syncing":false}}
{"data":{"head_slot":"0","sync_distance":"0","is_syncing":false}}
artur@metal-01.he-eu-hel1.nimbus.kiln:/etc/sudoers.d % 
artur@metal-01.he-eu-hel1.nimbus.kiln:/etc/sudoers.d %  for PORT in {9311..9314}; do c "0:$PORT/eth/v1/beacon/genesis" | jq -c '.data | {genesis_time,genesis_fork_version}'; done 
{"genesis_time":"1655733600","genesis_fork_version":"0x90000069"}
{"genesis_time":"1655733600","genesis_fork_version":"0x90000069"}
{"genesis_time":"1655733600","genesis_fork_version":"0x90000069"}
{"genesis_time":"1655733600","genesis_fork_version":"0x90000069"}

Genesis time in our config is set 24 hrs later than in the Sepolia config (Jun 20, 2:00:00 PM UTC vs Jun 19, 2:00:00 PM UTC).

https://github.com/eth-clients/merge-testnets/blob/f1e50e2d17fbb5a4988964365a5e2ae118b5265c/sepolia/config.yaml#L8-L11
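For reference, the reported timestamp converts as follows (GNU date):

date -u -d @1655733600
# Mon Jun 20 14:00:00 UTC 2022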

I haven't deployed validators yet, as those are missing in the https://github.com/status-im/nimbus-private repo.


Marudny commented Jun 20, 2022

Validators have been deployed:

artur@metal-01.he-eu-hel1.nimbus.kiln:~ % for PORT in {9211..9214}; do c "0:$PORT/metrics" | grep '^validators '; done
validators 25.0
validators 25.0
validators 25.0
validators 25.0
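As an extra check, the standard beacon API can confirm a given key is actually active on chain (hedged example; <pubkey> stands for one of our deposited validator public keys):

curl -s "0:9311/eth/v1/beacon/states/head/validators/<pubkey>" | jq -r '.data.status'
# expected: active_ongoing for a genesis validator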


Marudny commented Jun 20, 2022

As mentioned on Discord, I've faced a problem with compiling the test_merge_vectors.nim script:

Compilation command (run as user nimbus):
MAKE="make" V=0 /data/beacon-node-sepolia-unstable-01/repo/vendor/nimbus-build-system/scripts/env.sh scripts/compile_nim_program.sh /tmp/test scripts/test_merge_vectors.nim --verbosity:0 --hints:off

nimbus@metal-01.he-eu-hel1.nimbus.kiln:/data/beacon-node-sepolia-unstable-01/repo$ MAKE="make" V=0 /data/beacon-node-sepolia-unstable-01/repo/vendor/nimbus-build-system/scripts/env.sh scripts/compile_nim_program.sh /tmp/test scripts/test_merge_vectors.nim --verbosity:0 --hints:off 
/data/beacon-node-sepolia-unstable-01/repo/scripts/test_merge_vectors.nim(11, 7) template/generic instantiation of `suite` from here
/data/beacon-node-sepolia-unstable-01/repo/scripts/test_merge_vectors.nim(16, 8) template/generic instantiation of `test` from here
/data/beacon-node-sepolia-unstable-01/repo/vendor/nim-unittest2/unittest2.nim(802, 54) template/generic instantiation of `testSetupIMPL` from here
/data/beacon-node-sepolia-unstable-01/repo/scripts/test_merge_vectors.nim(13, 49) Error: type mismatch: got <type Web3DataProvider, Address, string>
but expected one of: 
proc new(T: type Web3DataProvider; depositContractAddress: Eth1Address;
         web3Url: string; jwtSecret: Option[seq[byte]]): Future[
    Result[Web3DataProviderRef, string]]
  first type mismatch at position: 4
  missing parameter: jwtSecret
proc new(t: typedesc): auto
  first type mismatch at position: 2
  extra argument given
2 other mismatching symbols have been suppressed; compile with --showAllMismatches:on to see them

expression: new(Web3DataProvider, default(Eth1Address), "http://127.0.0.1:8545")

Waiting for a fixed version.


jakubgs commented Jun 20, 2022

Yeah, looks correct:

[screenshot]

And the only way it wouldn't start on time is if most servers running beacon nodes had the wrong time set via NTP in the same way.
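A quick sanity check on any of the hosts (assuming systemd, so timedatectl is available):

timedatectl | grep -Ei 'synchronized|ntp'
# System clock synchronized: yes
# NTP service: active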


jakubgs commented Jun 20, 2022

I believe here we see the Altair fork at epoch 50:

[screenshot]

[screenshot]


jakubgs commented Jun 21, 2022

Nice, we can clearly see 3 points: genesis (epoch 0), Altair fork (epoch 50), Bellatrix fork (epoch 100):

[screenshot]


Marudny commented Jun 22, 2022

Devs have released a new version of the test script. Tersec suggested using test_merge_node instead of test_merge_vectors.

How to compile:
MAKE="make" V=0 /data/beacon-node-sepolia-unstable-01/repo/vendor/nimbus-build-system/scripts/env.sh scripts/compile_nim_program.sh test_merge_node scripts/test_merge_node.nim --verbosity:0 --hints:off

How to use:

  1. Create jwt.hex with a random (for now) string of 64 hex characters, e.g.: tr -dc a-f0-9 </dev/urandom | head -c 64 > jwt.hex

  2. Call:

artur@metal-01.he-eu-hel1.nimbus.kiln:~ % ./test_merge_node ws://localhost:9557 jwt.hex     
NOT 2022-06-22 10:37:28.918+00:00 New database from snapshot                 tid=2967401 file=blockchain_dag.nim:1826 genesisBlockRoot=9058609e genesisStateRoot=0fef5721 tailBlockRoot=9058609e tailStateRoot=0fef5721 fork="(previous_version: 00000000, current_version: 00000000, epoch: 0)" validators=64 tailStateSlot=0 genesisStateSlot=0
exchangeTransitionConfiguration ValueError: {"code":-32601,"message":"the method engine_exchangeTransitionConfigurationV1 does not exist/is not available"}
Invalid TTD errors are fine in this context

The above output is for Sepolia. The same script called against Kiln works fine.

A docker-compose.yml comparison didn't show anything interesting except:

-      --syncmode=snap
+      --syncmode=full

It's still under investigation.
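One hedged way to narrow it down is to ask the Geth endpoint which RPC namespaces it actually exposes (rpc_modules is a standard Geth call; the HTTP port here is an assumption and may differ per instance):

curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"rpc_modules","params":[],"id":1}' \
  http://127.0.0.1:8545 | jq '.result'
# if "engine" is not in the result, this endpoint does not serve the Engine API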


jakubgs commented Jun 28, 2022

I guess this is fine.

jakubgs closed this as completed on Jun 28, 2022