Skip to content

Conversation

jonastheis
Copy link

@jonastheis jonastheis commented May 21, 2025

1. Purpose or design rationale of this PR

fix nil pointer due to CommittedBatchMeta not being found in the DB and add rewind of L1 sync height to recover from missing CommittedBatchMeta. This is most likely caused by a faulty L1 RPC (missing or out-of-order events). With this mechanism rollup verifier should be able to handle and recover from such a case.

This should also fix: #1142

2. PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • build: Changes that affect the build system or external dependencies (example scopes: yarn, eslint, typescript)
  • ci: Changes to our CI configuration files and scripts (example scopes: vercel, github, cypress)
  • docs: Documentation-only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that doesn't fix a bug, or add a feature, or improves performance
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • test: Adding missing tests or correcting existing tests

3. Deployment tag versioning

Has the version in params/version.go been updated?

  • This PR doesn't involve a new deployment, git tag, docker image tag, and it doesn't affect traces
  • Yes

4. Breaking change label

Does this PR have the breaking-change label?

  • This PR is not a breaking change
  • Yes

Summary by CodeRabbit

  • Bug Fixes

    • Improved rollup synchronization to better recover from missing batch events by rewinding the sync height further.
  • Chores

    • Updated internal version number.

…nd add rewind of L1 sync height to recover from missing CommittedBatchMeta
Copy link

coderabbitai bot commented May 21, 2025

"""

Walkthrough

The patch increments the version constant and introduces new error handling in the rollup sync service. Specifically, it adds a sentinel error and logic to rewind the sync height by 100 blocks when missing batch events are detected, aiming to address missing or nil batch metadata scenarios during rollup event synchronization.

Changes

File(s) Change Summary
params/version.go Incremented the VersionPatch constant from 48 to 49.
rollup/rollup_sync_service/rollup_sync_service.go Added ErrMissingBatchEvent error and rewindL1Height constant set to 100; updated error handling and control flow for batch meta retrieval and event synchronization, including recursive rewind logic for missing batch events.

Sequence Diagram(s)

sequenceDiagram
    participant RollupSyncService
    participant L1Chain

    RollupSyncService->>L1Chain: fetchRollupEvents()
    alt updateRollupEvents returns ErrMissingBatchEvent
        RollupSyncService->>RollupSyncService: Rewind L1 sync height by 100
        RollupSyncService->>L1Chain: fetchRollupEvents() (recursive)
    else Other errors
        RollupSyncService->>RollupSyncService: Handle error as before
    end
Loading

Assessment against linked issues

Objective Addressed Explanation
Prevent nil pointer dereference in rollup_sync_service by handling missing or nil batch metadata (#1142)

Poem

A patch hops in with version anew,
And batch events missing? We know what to do!
If metadata’s gone, don’t panic or frown,
Just rewind the chain and fetch back down.
With errors now caught, the sync hops along—
This bunny ensures your rollup stays strong! 🐇✨
"""

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e6deac3 and 044e038.

📒 Files selected for processing (1)
  • rollup/rollup_sync_service/rollup_sync_service.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • rollup/rollup_sync_service/rollup_sync_service.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test
  • GitHub Check: Analyze (go)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
rollup/rollup_sync_service/rollup_sync_service.go (2)

230-238: Consider adding a maximum rewind limit.

While the rewind mechanism is effective, there's no limit to how many times it can be applied recursively. Consider adding a maximum rewind counter to prevent potential infinite rewinding if the batch event cannot be recovered after several attempts.

+const (
+   // maxRewindAttempts is the maximum number of times we'll attempt to rewind the L1 sync height
+   // before giving up and reporting a more serious error.
+   maxRewindAttempts = 5
+)

 func (s *RollupSyncService) fetchRollupEvents() error {
 	s.stateMu.Lock()
 	defer s.stateMu.Unlock()

+	// Keep track of rewind attempts within the current fetchRollupEvents call
+	rewindAttempts := 0
+
 	for {
 		prevL1Height := s.callDataBlobSource.L1Height()

 		daEntries, err := s.callDataBlobSource.NextData()
 		if err != nil {
 			if errors.Is(err, da.ErrSourceExhausted) {
 				log.Trace("Sync service exhausted data source, waiting for next data")
 				return nil
 			}

 			return fmt.Errorf("failed to get next data: %w", err)
 		}

 		if err = s.updateRollupEvents(daEntries); err != nil {
 			if errors.Is(err, ErrShouldResetSyncHeight) {
 				log.Warn("Resetting sync height to L1 block 7892668 to fix L1 message queue hash calculation")
 				s.callDataBlobSource.SetL1Height(7892668)

 				return nil
 			}
 			if errors.Is(err, ErrMissingBatchEvent) {
+				rewindAttempts++
+				if rewindAttempts > maxRewindAttempts {
+					return fmt.Errorf("maximum rewind attempts (%d) exceeded, still missing batch events: %w", maxRewindAttempts, err)
+				}
 				// If there's a missing batch event, rewind the L1 sync height by some blocks to re-fetch from L1 RPC and
 				// replay creating corresponding CommittedBatchMeta in local DB.
 				// This happens recursively until the missing event has been recovered as we will call fetchRollupEvents again
 				// with the `L1Height = prevL1Height - rewindL1Height`.
 				s.callDataBlobSource.SetL1Height(prevL1Height - rewindL1Height)

 				return fmt.Errorf("missing batch event, rewinding L1 sync height by %d blocks: %w", rewindL1Height, err)
 			}

230-238: Enhance logging for rewind operations.

Consider adding more detailed logging about the rewind operation to aid in debugging and tracking recovery attempts.

 			if errors.Is(err, ErrMissingBatchEvent) {
 				// If there's a missing batch event, rewind the L1 sync height by some blocks to re-fetch from L1 RPC and
 				// replay creating corresponding CommittedBatchMeta in local DB.
 				// This happens recursively until the missing event has been recovered as we will call fetchRollupEvents again
 				// with the `L1Height = prevL1Height - rewindL1Height`.
+				rewoundHeight := prevL1Height - rewindL1Height
+				log.Warn("Missing batch event detected, rewinding L1 sync height", 
+					"current_height", prevL1Height, 
+					"rewound_height", rewoundHeight,
+					"rewind_amount", rewindL1Height,
+					"error", err)
 				s.callDataBlobSource.SetL1Height(prevL1Height - rewindL1Height)

 				return fmt.Errorf("missing batch event, rewinding L1 sync height by %d blocks: %w", rewindL1Height, err)
 			}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between c66a003 and 26f43a5.

📒 Files selected for processing (2)
  • params/version.go (1 hunks)
  • rollup/rollup_sync_service/rollup_sync_service.go (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
params/version.go (1)

27-27: Version bump aligns with fix for CommittedBatchMeta nil pointer issue.

The version increment from 48 to 49 correctly reflects the protocol changes introduced by the new error handling and L1 sync height rewind mechanism.

rollup/rollup_sync_service/rollup_sync_service.go (6)

46-47: New constant for rewind height provides clear recovery mechanism.

The constant defines how many L1 blocks to rewind when a missing batch event is detected. The value of 10 should provide sufficient backtracking to recover from temporary L1 RPC inconsistencies.


53-53: New sentinel error correctly identifies missing batch event conditions.

The ErrMissingBatchEvent sentinel error allows for specific handling of missing batch metadata scenarios throughout the error chain.


230-238: Robust recovery mechanism for missing batch events.

The new error handling correctly identifies ErrMissingBatchEvent cases and implements a rewind mechanism to re-fetch potentially missed events from the L1 chain. This directly addresses the nil pointer issue mentioned in the PR description.


314-318: Properly identify and handle missing parent batch metadata.

The error handling for parent CommittedBatchMeta now explicitly checks for nil values and properly wraps errors to propagate the ErrMissingBatchEvent signal up the call stack.


322-326: Properly identify and handle missing current batch metadata.

Similar to the parent batch metadata handling, this ensures that missing current batch metadata is properly identified and signals the need for a rewind.


470-474: Consistent error handling in getCommittedBatchMeta function.

The error handling for missing parent committed batch metadata is consistently implemented across all relevant functions in the codebase.

Thegaram
Thegaram previously approved these changes May 21, 2025
georgehao
georgehao previously approved these changes May 21, 2025
colinlyguo
colinlyguo previously approved these changes May 21, 2025
@jonastheis jonastheis dismissed stale reviews from colinlyguo, georgehao, and Thegaram via e6deac3 May 21, 2025 05:52
colinlyguo
colinlyguo previously approved these changes May 21, 2025
Thegaram
Thegaram previously approved these changes May 21, 2025
Copy link

@Thegaram Thegaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved but might make a few changes before we merge. (Let's tag on this branch first.)

@Thegaram
Copy link

Pushed a tag scroll-v5.8.49-rc0.

@jonastheis jonastheis dismissed stale reviews from Thegaram and colinlyguo via 044e038 May 21, 2025 08:17
@Thegaram Thegaram merged commit 141a8df into develop May 21, 2025
14 checks passed
@Thegaram Thegaram deleted the fix-rollup-sync-service branch May 21, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rollup_sync_service: invalid memory address or nil pointer dereference
4 participants