
feat: add toolkit for exporting and transforming missing block header fields #903


Open · wants to merge 5 commits into develop
@@ -0,0 +1 @@
data/
13 changes: 13 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/Dockerfile
@@ -0,0 +1,13 @@
FROM golang:1.22

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

COPY . .

RUN go build -o main .

ENTRYPOINT ["./main"]
63 changes: 63 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/README.md
@@ -0,0 +1,63 @@
# Export missing block header fields toolkit

A toolkit for exporting and transforming missing block header fields of Scroll before {{upgrade_name}} TODO: replace when upgrade is clear.
💡 Verification agent

**Replace the `{{upgrade_name}}` placeholder.**

The TODO comment indicates that the placeholder needs to be replaced once the upgrade name is confirmed.

🏁 Script executed:

```bash
#!/bin/bash
# Check if the placeholder still exists in other files
grep -r "{{upgrade_name}}" --include="*.go" --include="*.md" .
```

The placeholder still exists in multiple locations and must be updated before merging:

- `rollup/missing_header_fields/export-headers-toolkit/README.md`
  - Line 3: "before {{upgrade_name}} TODO: replace when upgrade is clear."
  - Line 5: "before {{upgrade_name}}, these fields were not stored on L1/DA."
- `rollup/missing_header_fields/export-headers-toolkit/cmd/root.go`
  - In the root command description: "missing from header data stored on L1 before {{upgrade_name}}."

Please replace each `{{upgrade_name}}` with the confirmed upgrade identifier.


## Context
We use the [Clique consensus](https://eips.ethereum.org/EIPS/eip-225) in Scroll L2. Among others, it requires the following header fields:
- `extraData`
- `difficulty`

However, before {{upgrade_name}}, these fields were not stored on L1/DA.
For nodes to be able to reconstruct the correct block hashes when reading data only from L1,
we need to provide the historical values of these fields to those nodes through a separate file.
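
For reference, per [EIP-225](https://eips.ethereum.org/EIPS/eip-225) the `extraData` field is a 32-byte vanity prefix followed by the signer's seal (65 bytes for ordinary Clique blocks; the headers handled by this toolkit carry 65- or 85-byte seals). A minimal sketch of how such a blob splits, with `splitExtraData` being a hypothetical helper rather than part of this toolkit:

```go
package main

import "fmt"

const vanitySize = 32

// splitExtraData separates a Clique extraData blob into its 32-byte vanity
// prefix and the trailing seal bytes.
func splitExtraData(extra []byte) (vanity [32]byte, seal []byte, err error) {
    if len(extra) < vanitySize {
        return vanity, nil, fmt.Errorf("extraData too short: %d bytes", len(extra))
    }
    copy(vanity[:], extra[:vanitySize])
    return vanity, extra[vanitySize:], nil
}

func main() {
    extra := make([]byte, vanitySize+65) // zeroed vanity + zeroed 65-byte seal, for illustration
    vanity, seal, _ := splitExtraData(extra)
    fmt.Printf("vanity: %x, seal length: %d\n", vanity, len(seal))
}
```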

This toolkit provides commands to export the missing fields, deduplicate the data and create a file
with the missing fields from which nodes can reconstruct the correct block hashes when reading data only from L1.

The toolkit provides the following commands:
- `fetch` - Fetch missing block header fields from a running Scroll L2 node and store in a file
- `dedup` - Deduplicate the headers file, print unique values and create a new file with the deduplicated headers

## Binary layout of the deduplicated missing header fields file
The deduplicated header file binary layout is as follows:

```plaintext
<unique_vanity_count:uint8><unique_vanity_1:[32]byte>...<unique_vanity_n:[32]byte><header_1:header>...<header_n:header>

Where:
- unique_vanity_count: number of unique vanities n
- unique_vanity_i: unique vanity i
- header_i: block header i
- header:
<flags:uint8><seal:[65|85]byte>
- flags: bitmask, lsb first
- bit 0-5: index of the vanity in the sorted vanities list
- bit 6: 0 if difficulty is 2, 1 if difficulty is 1
- bit 7: 0 if seal length is 65, 1 if seal length is 85
```
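
To make the bitmask concrete, here is a minimal sketch that decodes a flags byte according to the layout above (`decodeFlags` is illustrative, not part of the toolkit):

```go
package main

import "fmt"

// decodeFlags unpacks the per-header flags byte: bits 0-5 select one of up to
// 64 unique vanities, bit 6 encodes the difficulty (0 -> 2, 1 -> 1), and
// bit 7 selects the seal length (0 -> 65 bytes, 1 -> 85 bytes).
func decodeFlags(flags uint8) (vanityIndex int, difficulty uint64, sealLen int) {
    vanityIndex = int(flags & 0x3F) // bits 0-5
    difficulty = 2
    if flags&(1<<6) != 0 { // bit 6
        difficulty = 1
    }
    sealLen = 65
    if flags&(1<<7) != 0 { // bit 7
        sealLen = 85
    }
    return vanityIndex, difficulty, sealLen
}

func main() {
    fmt.Println(decodeFlags(0b11000011)) // vanity index 3, difficulty 1, 85-byte seal
}
```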

## How to run
Each command has its own set of flags and options. To display the help message, run with the `--help` flag.

1. Fetch the missing block header fields from a running Scroll L2 node via RPC and store them in a file (approx. 40 min for 5.5M blocks).
2. Deduplicate the headers file, print unique values and create a new file with the deduplicated headers.

```bash
go run main.go fetch --rpc=http://localhost:8545 --start=0 --end=100 --batch=10 --parallelism=10 --output=headers.bin --humanOutput=true
go run main.go dedup --input=headers.bin --output=headers-dedup.bin
```


### With Docker
To run the toolkit with Docker, build the Docker image and run the commands inside the container.

```bash
docker build -t export-headers-toolkit .

# Depending on the Docker config, you may need to find the RPC container's IP with docker inspect; the host IP may also work: http://172.17.0.1:8545
docker run --rm -v "$(pwd)":/app/result export-headers-toolkit fetch --rpc=<address> --start=0 --end=5422047 --batch=10000 --parallelism=10 --output=/app/result/headers.bin --humanOutput=/app/result/headers.csv
docker run --rm -v "$(pwd)":/app/result export-headers-toolkit dedup --input=/app/result/headers.bin --output=/app/result/headers-dedup.bin
```



296 changes: 296 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/cmd/dedup.go
@@ -0,0 +1,296 @@
package cmd

import (
    "bufio"
    "bytes"
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "io"
    "log"
    "os"
    "strconv"
    "strings"

    "github.com/spf13/cobra"

    "github.com/scroll-tech/go-ethereum/common"

    "github.com/scroll-tech/go-ethereum/export-headers-toolkit/types"
)

// dedupCmd represents the dedup command
var dedupCmd = &cobra.Command{
    Use:   "dedup",
    Short: "Deduplicate the headers file, print unique values and create a new file with the deduplicated headers",
    Long: `Deduplicate the headers file, print unique values and create a new file with the deduplicated headers.

The binary layout of the deduplicated file is as follows:
- 1 byte for the count of unique vanities
- 32 bytes for each unique vanity
- for each header:
  - 1 byte (bitmask, lsb first):
    - bit 0-5: index of the vanity in the sorted vanities list
    - bit 6: 0 if difficulty is 2, 1 if difficulty is 1
    - bit 7: 0 if seal length is 65, 1 if seal length is 85
  - 65 or 85 bytes for the seal`,
    Run: func(cmd *cobra.Command, args []string) {
        inputFile, err := cmd.Flags().GetString("input")
        if err != nil {
            log.Fatalf("Error reading input flag: %v", err)
        }
        outputFile, err := cmd.Flags().GetString("output")
        if err != nil {
            log.Fatalf("Error reading output flag: %v", err)
        }
        verifyFile, err := cmd.Flags().GetString("verify")
        if err != nil {
            log.Fatalf("Error reading verify flag: %v", err)
        }

        if verifyFile != "" {
            verifyInputFile(verifyFile, inputFile)
        }

        _, seenVanity, _ := runAnalysis(inputFile)
        runDedup(inputFile, outputFile, seenVanity)

        if verifyFile != "" {
            verifyOutputFile(verifyFile, outputFile)
        }

        runSHA256(outputFile)
    },
}

func init() {
    rootCmd.AddCommand(dedupCmd)

    dedupCmd.Flags().String("input", "headers.bin", "headers file")
    dedupCmd.Flags().String("output", "headers-dedup.bin", "deduplicated, binary formatted file")
    dedupCmd.Flags().String("verify", "", "verify the input and output files with the given .csv file")
}

func runAnalysis(inputFile string) (seenDifficulty map[uint64]int, seenVanity map[[32]byte]bool, seenSealLen map[int]int) {
    reader := newHeaderReader(inputFile)
    defer reader.close()

    // track header fields we've seen
    seenDifficulty = make(map[uint64]int)
    seenVanity = make(map[[32]byte]bool)
    seenSealLen = make(map[int]int)

    reader.read(func(header *types.Header) {
        seenDifficulty[header.Difficulty]++
        seenVanity[header.Vanity()] = true
        seenSealLen[header.SealLen()]++
    })

    // Print distinct values and report
    fmt.Println("--------------------------------------------------")
    for diff, count := range seenDifficulty {
        fmt.Printf("Difficulty %d: %d\n", diff, count)
    }

    for vanity := range seenVanity {
        fmt.Printf("Vanity: %x\n", vanity)
    }

    for sealLen, count := range seenSealLen {
        fmt.Printf("SealLen %d bytes: %d\n", sealLen, count)
    }

    fmt.Println("--------------------------------------------------")
    fmt.Printf("Unique values seen in the headers file (last seen block: %d):\n", reader.lastHeader.Number)
    fmt.Printf("Distinct count: Difficulty:%d, Vanity:%d, SealLen:%d\n", len(seenDifficulty), len(seenVanity), len(seenSealLen))
    fmt.Printf("--------------------------------------------------\n\n")

    return seenDifficulty, seenVanity, seenSealLen
}

func runDedup(inputFile, outputFile string, seenVanity map[[32]byte]bool) {
    reader := newHeaderReader(inputFile)
    defer reader.close()

    writer := newMissingHeaderFileWriter(outputFile, seenVanity)
    defer writer.close()

    writer.missingHeaderWriter.writeVanities()

    reader.read(func(header *types.Header) {
        writer.missingHeaderWriter.write(header)
    })
}

func runSHA256(outputFile string) {
    f, err := os.Open(outputFile)
    defer f.Close()
    if err != nil {
        log.Fatalf("Error opening file: %v", err)
    }

Comment on lines +125 to +131

⚠️ Potential issue

**`defer f.Close()` registered before the error check on `os.Open`**

If `os.Open` returns an error, the deferred `Close` is registered on a nil file handle.
Move the `defer` after the error check:

```diff
-f, err := os.Open(outputFile)
-defer f.Close()
-if err != nil {
-    log.Fatalf("Error opening file: %v", err)
-}
+f, err := os.Open(outputFile)
+if err != nil {
+    log.Fatalf("Error opening file: %v", err)
+}
+defer f.Close()
```

    h := sha256.New()
    if _, err = io.Copy(h, f); err != nil {
        log.Fatalf("Error hashing file: %v", err)
    }

    fmt.Printf("Deduplicated headers written to %s with sha256 checksum: %x\n", outputFile, h.Sum(nil))
}

type headerReader struct {
    file       *os.File
    reader     *bufio.Reader
    lastHeader *types.Header
}

func newHeaderReader(inputFile string) *headerReader {
    f, err := os.Open(inputFile)
    if err != nil {
        log.Fatalf("Error opening input file: %v", err)
    }

    h := &headerReader{
        file:   f,
        reader: bufio.NewReader(f),
    }

    return h
}

func (h *headerReader) read(callback func(header *types.Header)) {
    headerSizeBytes := make([]byte, types.HeaderSizeSerialized)

    for {
        // each record is a size prefix (big-endian) followed by the serialized header
        _, err := io.ReadFull(h.reader, headerSizeBytes)
        if err != nil {
            if err == io.EOF {
                break
            }
            log.Fatalf("Error reading headerSizeBytes: %v", err)
        }
        headerSize := binary.BigEndian.Uint16(headerSizeBytes)

        headerBytes := make([]byte, headerSize)
        _, err = io.ReadFull(h.reader, headerBytes)
        if err != nil {
            if err == io.EOF {
                break
            }
            log.Fatalf("Error reading headerBytes: %v", err)
        }
        header := new(types.Header).FromBytes(headerBytes)

        // sanity check: make sure headers are in order
        if h.lastHeader != nil && header.Number != h.lastHeader.Number+1 {
            fmt.Println("lastHeader:", h.lastHeader.String())
            log.Fatalf("Missing block: %d, got %d instead", h.lastHeader.Number+1, header.Number)
        }
        h.lastHeader = header

        callback(header)
    }
}

func (h *headerReader) close() {
    h.file.Close()
}

type csvHeaderReader struct {
    file   *os.File
    reader *bufio.Reader
}

func newCSVHeaderReader(verifyFile string) *csvHeaderReader {
    f, err := os.Open(verifyFile)
    if err != nil {
        log.Fatalf("Error opening verify file: %v", err)
    }

    h := &csvHeaderReader{
        file:   f,
        reader: bufio.NewReader(f),
    }

    return h
}

func (h *csvHeaderReader) readNext() *types.Header {
    line, err := h.reader.ReadString('\n')
    if err != nil {
        if err == io.EOF {
            return nil
        }
        log.Fatalf("Error reading line: %v", err)
    }

    s := strings.Split(line, ",")
    extraString := strings.Split(s[2], "\n")

    num, err := strconv.ParseUint(s[0], 10, 64)
    if err != nil {
        log.Fatalf("Error parsing block number: %v", err)
    }
    difficulty, err := strconv.ParseUint(s[1], 10, 64)
    if err != nil {
        log.Fatalf("Error parsing difficulty: %v", err)
    }
    extra := common.FromHex(extraString[0])

Comment on lines +217 to +238

🛠️ Refactor suggestion

**Robustness: guard against malformed CSV rows**

`csvHeaderReader.readNext` assumes every line contains at least three comma-separated fields. An empty or malformed line causes an index out of range panic.

Add basic validation:

```go
parts := strings.Split(strings.TrimSpace(line), ",")
if len(parts) < 3 {
    log.Fatalf("Malformed CSV line: %q", line)
}
```

    header := types.NewHeader(num, difficulty, extra)
    return header
}

func (h *csvHeaderReader) close() {
    h.file.Close()
}

func verifyInputFile(verifyFile, inputFile string) {
    csvReader := newCSVHeaderReader(verifyFile)
    defer csvReader.close()

    binaryReader := newHeaderReader(inputFile)
    defer binaryReader.close()

    binaryReader.read(func(header *types.Header) {
        csvHeader := csvReader.readNext()

        if !csvHeader.Equal(header) {
            log.Fatalf("Header mismatch: %v != %v", csvHeader, header)
        }
    })

    log.Printf("All headers match in %s and %s\n", verifyFile, inputFile)
}

func verifyOutputFile(verifyFile, outputFile string) {
    csvReader := newCSVHeaderReader(verifyFile)
    defer csvReader.close()

    dedupReader, err := NewReader(outputFile)
    if err != nil {
        log.Fatalf("Error opening dedup file: %v", err)
    }
    defer dedupReader.Close()

    for {
        header := csvReader.readNext()
        if header == nil {
            if _, _, err = dedupReader.ReadNext(); err == nil {
                log.Fatalf("Expected EOF, got more headers")
            }
            break
        }

        difficulty, extraData, err := dedupReader.Read(header.Number)
        if err != nil {
            log.Fatalf("Error reading header: %v", err)
        }

        if header.Difficulty != difficulty {
            log.Fatalf("Difficulty mismatch: headerNum %d: %d != %d", header.Number, header.Difficulty, difficulty)
        }
        if !bytes.Equal(header.ExtraData, extraData) {
            log.Fatalf("ExtraData mismatch: headerNum %d: %x != %x", header.Number, header.ExtraData, extraData)
        }
    }
}
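
For orientation, a minimal sketch of how a consumer might look up a block's missing fields from the deduplicated file, based solely on the reader API as exercised in `verifyOutputFile` above (`NewReader`, `Read`, `Close`); the import path is an assumption:

```go
package main

import (
    "fmt"
    "log"

    // assumed import path for the package that defines NewReader
    "github.com/scroll-tech/go-ethereum/export-headers-toolkit/cmd"
)

func main() {
    reader, err := cmd.NewReader("headers-dedup.bin")
    if err != nil {
        log.Fatalf("Error opening dedup file: %v", err)
    }
    defer reader.Close()

    // Read returns the difficulty and extraData recorded for the given block number.
    difficulty, extraData, err := reader.Read(42)
    if err != nil {
        log.Fatalf("Error reading header 42: %v", err)
    }
    fmt.Printf("block 42: difficulty=%d extraData=%x\n", difficulty, extraData)
}
```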