Scroll chain geth stops unexpectedly at 1444216 #592

Closed
Johnaverse opened this issue Dec 9, 2023 · 12 comments · Fixed by #679
Comments

Johnaverse commented Dec 9, 2023

System information

Geth version: 5.0.0-mainnet
Git Commit: cfd9de0

Chain stopped at 1444216
RPC Server stopped
Software crashed

Error log

CRIT [12-09|16:25:02.529] Unexpected QueueIndex in ReadL1MessagesFrom expected=81965 got=81967 startIndex=81959 maxCount=10
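(For context: this CRIT appears to come from a continuity check over the L1 messages stored in the node's local database, which aborts once the queue index of the next stored message jumps past the expected value. The sketch below illustrates that idea only; it is not the actual l2geth code, and the L1MessageTx type, the in-memory slice, and readL1MessagesFrom are hypothetical stand-ins for the real database iterator.)

package main

import "log"

// L1MessageTx is a hypothetical stand-in for a stored L1 message and its queue index.
type L1MessageTx struct {
	QueueIndex uint64
}

// readL1MessagesFrom walks stored messages starting at startIndex and expects their
// queue indices to be strictly consecutive; a gap means the local DB is missing one
// or more L1 messages.
func readL1MessagesFrom(stored []L1MessageTx, startIndex, maxCount uint64) []L1MessageTx {
	msgs := make([]L1MessageTx, 0, maxCount)
	expected := startIndex
	for _, msg := range stored {
		if uint64(len(msgs)) >= maxCount {
			break
		}
		if msg.QueueIndex != expected {
			// The condition behind "Unexpected QueueIndex in ReadL1MessagesFrom".
			log.Fatalf("Unexpected QueueIndex in ReadL1MessagesFrom expected=%d got=%d startIndex=%d maxCount=%d",
				expected, msg.QueueIndex, startIndex, maxCount)
		}
		msgs = append(msgs, msg)
		expected++
	}
	return msgs
}

func main() {
	// Indices 81959..81964 are present, 81965 and 81966 are missing, 81967 follows.
	stored := []L1MessageTx{
		{81959}, {81960}, {81961}, {81962}, {81963}, {81964}, {81967},
	}
	readL1MessagesFrom(stored, 81959, 10) // logs the fatal message above and exits
}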

Thegaram commented Dec 9, 2023

Thank you for reporting this.

  • Does the node recover normal operation after restarting it?
  • Can you please share the whole log output of the node?
  • Could you please run the following two commands on the l2geth console and share the output: scroll.getL1MessageByIndex(81965) and scroll.getL1MessageByIndex(81966)? (Alternatively, you can call scroll_getL1MessageByIndex through the RPC API; see the sketch after this list.)
  • Is it possible that your node's DB somehow got corrupted?
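For the RPC variant, here is a minimal sketch in Go using the go-ethereum rpc client, assuming the node's HTTP endpoint is http://localhost:8545 and the scroll namespace is enabled via --http.api (adjust both to your setup; the queue index is sent as a plain JSON number, so adapt the encoding if your node expects a hex quantity):

package main

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/ethereum/go-ethereum/rpc"
)

func main() {
	// Hypothetical endpoint; match it to the node's --http.addr / --http.port.
	client, err := rpc.Dial("http://localhost:8545")
	if err != nil {
		panic(err)
	}
	defer client.Close()

	for _, index := range []uint64{81965, 81966} {
		var msg json.RawMessage
		// scroll_getL1MessageByIndex requires the "scroll" API namespace on the node.
		if err := client.CallContext(context.Background(), &msg, "scroll_getL1MessageByIndex", index); err != nil {
			fmt.Printf("index %d: error: %v\n", index, err)
			continue
		}
		fmt.Printf("index %d: %s\n", index, msg)
	}
}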

oscgu commented Dec 10, 2023

Also experiencing this issue. It all started with the following logs appearing:

########## BAD BLOCK #########
Chain config: {ChainID: 534352 Homestead: 0 DAO: <nil> DAOSupport: true EIP150: 0 EIP155: 0 EIP158: 0 Byzantium: 0 Constantinople: 0 Petersburg: 0 Istanbul: 0, Muir Glacier: <nil>, Berlin: 0, London: 0, Arrow Glacier: <nil>, Archimedes: 0, Shanghai: 0, Engine: clique, Scroll config: {useZktrie: true, maxTxPerBlock: 100, MaxTxPayloadBytesPerBlock: 122880, feeVaultAddress: 0x5300000000000000000000000000000000000005, enableEIP2718: false, enableEIP1559: false, l1Config: {l1ChainId: 1, l1MessageQueueAddress: 0x0d7E906BD9cAFa154b048cFa766Cc1E54E39AF9B, numL1MessagesPerBlock: 10}}}

Number: 1423862
Hash: 0x1a92f3343f84f99b26c9eb10fc37b344af097c01effb3b4cf6a7f6584794d617
ParentHash: 0xdae307d76bff29414562559dce7ece8b35db9f6f2b822742106aa5aa17a9c94e


Error: unknown L1 message
##############################

After restarting the node, it broke completely with:

CRIT [12-09|13:54:47.177] Unexpected QueueIndex in ReadL1MessagesFrom expected=80699 got=80701 startIndex=80699 maxCount=10

I cannot access the geth console because the node crashes almost instantly.

@Thegaram

Thanks for providing more context @oscgu. Is this the same node that @Johnaverse reported, or did you two independently run into the same issue?

oscgu commented Dec 11, 2023

@Thegaram, we both independently ran into the same issue

@Johnaverse (Author)

I confirmed that a resync works, but the crashed node itself is not recoverable.

@Johnaverse (Author)

Same issue as @oscgu: the node crashes instantly, so I can't call the console method.
Restarting doesn't help.

@Johnaverse (Author)

@Thegaram Any updates?

@Thegaram

We have not been able to reproduce this issue. We will work on reviewing the code to see how this could happen.

In the meantime, could you provide some additional context?

  • How do you run the node, and what are the exact CLI arguments you provide? (Feel free to redact anything sensitive.)
  • What L1 node do you connect to? Is it a managed service like Infura, or your own L1 node? In the latter case, which execution and consensus clients are you using (client name and version)?

@Johnaverse (Author)

From my side, we are using our own L1 node, and the Scroll node runs on Debian 12.
erigon version: v2.55.1
lighthouse version: v4.5.0
Scroll geth flags:

  --datadir=/var/lib/geth
  --scroll
  --verbosity=3
  --http
  --http.corsdomain=*
  --http.vhosts=*
  --http.addr=0.0.0.0
  --http.port=8545
  --http.api=eth,net,web3,debug,scroll
  --ws
  --ws.addr=0.0.0.0
  --ws.port=8546
  --ws.origins=*
  --ws.api=debug,eth,txpool,net,engine
  --metrics
  --metrics.addr=0.0.0.0
  --metrics.port=6060
  --syncmode=full
  --gcmode=archive
  --maxpeers=100
  --nat=extip:0.0.0.0
  --port=30303
  --l1.endpoint=https://l1:8545
  --l1.confirmations=finalized

oscgu commented Dec 18, 2023

We are running the node in our k8s cluster (in a container).
The data is stored on dedicated TrueNAS servers, and the L1 node we connect to is a public one.
Args:

          - "--scroll"
          - "--metrics"
          - "--metrics.addr=0.0.0.0"
          - "--metrics.expensive"
          - "--syncmode=full"
          - "--gcmode=archive"
          - "--http"
          - "--http.addr=0.0.0.0"
          - "--http.corsdomain=*"
          - "--ws.addr=0.0.0.0"
          - "--http.vhosts=*"
          - "--port=..."
          - "--nat=extip:..."
          - "--datadir=/scroll/.ethereum"
          - "--l1.endpoint=https://..."

PeaStew commented Mar 25, 2024

Same issue for us recently; it affected several nodes:

Unexpected QueueIndex in ReadL1MessagesFrom expected=216,169 got=216,177 startIndex=216,162 maxCount=10

@Thegaram

The issue was likely fixed in #679. We'll test the latest version internally and publish a release soon.
