@aegis-cipherowl aegis-cipherowl commented Oct 29, 2025

Summary

This PR fixes two critical issues with Postgres metastorage during blockchain reorgs by implementing watermark-based visibility control.

Issues Fixed

Issue 1: Canonical Chain Leakage to Streamer Before Validation

Problem: Blocks were visible to GetLatestBlock immediately after being written to canonical_blocks, before update_watermark validated chain continuity (parent hash matching).

Impact: During reorgs, the streamer could see unvalidated blocks, causing data inconsistency.

Solution: Added is_watermark boolean column to control visibility:

  • Blocks are written with is_watermark=FALSE initially
  • Only set to TRUE after validation
  • GetLatestBlock now filters WHERE is_watermark=TRUE
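The visibility rule in the bullets above can be sketched with a small in-memory model. This is an illustration only: `Store`, `Block`, `Persist`, and `SetWatermark` are hypothetical names standing in for the canonical_blocks table and its write paths, and the local `ErrItemNotFound` is a stand-in for the storage error named in this PR.

```go
package main

import "errors"

// ErrItemNotFound is a local stand-in for the storage error named in the PR.
var ErrItemNotFound = errors.New("item not found")

// Block is a hypothetical, reduced view of a canonical_blocks row.
type Block struct {
	Tag         uint32
	Height      uint64
	Hash        string
	IsWatermark bool
}

// Store is an in-memory stand-in for the canonical_blocks table.
type Store struct {
	rows []Block
}

// Persist writes blocks with IsWatermark=false, so they are stored
// but not yet visible to GetLatestBlock.
func (s *Store) Persist(blocks ...Block) {
	for _, b := range blocks {
		b.IsWatermark = false
		s.rows = append(s.rows, b)
	}
}

// SetWatermark marks one block as validated. It reports whether a
// matching row was found.
func (s *Store) SetWatermark(tag uint32, height uint64) bool {
	for i := range s.rows {
		if s.rows[i].Tag == tag && s.rows[i].Height == height {
			s.rows[i].IsWatermark = true
			return true
		}
	}
	return false
}

// GetLatestBlock returns the highest watermarked block for a tag,
// the in-memory analogue of filtering WHERE is_watermark = TRUE.
func (s *Store) GetLatestBlock(tag uint32) (Block, error) {
	var best Block
	found := false
	for _, b := range s.rows {
		if b.Tag != tag || !b.IsWatermark {
			continue
		}
		if !found || b.Height > best.Height {
			best, found = b, true
		}
	}
	if !found {
		return Block{}, ErrItemNotFound
	}
	return best, nil
}
```

In this model, a block at height 101 that has been persisted but not validated stays hidden, and the reader keeps seeing height 100 until the watermark moves.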

Issue 2: Recovery Workflow Not Executing When update_watermark Fails

Problem: Replicator workflow's reorg detection relied on xerrors.Is, but Temporal's error serialization across activity boundaries breaks this check.

Impact: When update_watermark detected chain discontinuity (ErrInvalidChain), the reorg recovery logic didn't trigger.

Solution: Changed to string-based error matching using strings.Contains in internal/workflow/replicator.go:266:

if strings.Contains(err.Error(), parser.ErrInvalidChain.Error()) {
    // Reorg detected - restart from safe point
    ...
}
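A minimal sketch of why the identity check fails while the substring check still works. It rests on several assumptions: `crossBoundary` is a hypothetical helper that keeps only the error message, which simplifies what happens across an activity boundary and is not Temporal's actual encoding; the local `ErrInvalidChain` stands in for parser.ErrInvalidChain; and the standard library's `errors.Is` is used where the PR mentions xerrors.Is.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrInvalidChain is a local stand-in for parser.ErrInvalidChain.
var ErrInvalidChain = errors.New("invalid chain")

// activityError is a hypothetical activity failure that wraps the sentinel.
func activityError(height uint64) error {
	return fmt.Errorf("update_watermark at height %d: %w", height, ErrInvalidChain)
}

// crossBoundary simulates an error crossing a serialization boundary:
// only the message survives, so the wrapped sentinel's identity is lost.
func crossBoundary(err error) error {
	return errors.New(err.Error())
}

// matchesByIdentity is the check that stops working after serialization.
func matchesByIdentity(err error) bool {
	return errors.Is(err, ErrInvalidChain)
}

// isReorg matches on the message text instead, which survives serialization.
func isReorg(err error) bool {
	return err != nil && strings.Contains(err.Error(), ErrInvalidChain.Error())
}
```

The trade-off is that substring matching depends on the sentinel's message staying stable and unique; any other error whose text contains the same phrase would also match.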

Architecture

Database Schema

Migration: internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql

-- Add is_watermark column
ALTER TABLE canonical_blocks ADD COLUMN is_watermark BOOLEAN NOT NULL DEFAULT FALSE;

-- Create partial index for efficient queries
CREATE INDEX idx_canonical_watermark ON canonical_blocks (tag, height DESC)
WHERE is_watermark = TRUE;

-- Set initial watermark on current highest block per tag (prevents downtime)
UPDATE canonical_blocks cb1
SET is_watermark = TRUE
WHERE height = (
    SELECT MAX(height)
    FROM canonical_blocks cb2
    WHERE cb2.tag = cb1.tag
);

Workflow Integration

Replicator Workflow

  1. Phase 1-2: Write blocks with is_watermark=FALSE
  2. Phase 3: UpdateWatermark activity validates chain continuity
    - If valid: Sets is_watermark=TRUE on highest block
    - If invalid (reorg): Returns ErrInvalidChain
    - Replicator detects error via string matching and restarts from safe height
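The continuity check in Phase 3 could look roughly like the following. It is a sketch under assumptions: `BlockMeta` and `validateChain` are hypothetical names, the check covers only the parent hash matching described in this PR, and the local `ErrInvalidChain` stands in for the project's sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidChain is a local stand-in for the project's sentinel error.
var ErrInvalidChain = errors.New("invalid chain")

// BlockMeta is a hypothetical, reduced block metadata record.
type BlockMeta struct {
	Height     uint64
	Hash       string
	ParentHash string
}

// validateChain checks that every block's ParentHash equals the Hash of the
// block before it. prev is the last block already known to be valid, or nil
// when there is nothing to link against.
func validateChain(prev *BlockMeta, blocks []BlockMeta) error {
	last := prev
	for i := range blocks {
		b := &blocks[i]
		if last != nil && b.ParentHash != last.Hash {
			return fmt.Errorf("height %d: parent hash %q does not match %q: %w",
				b.Height, b.ParentHash, last.Hash, ErrInvalidChain)
		}
		last = b
	}
	return nil
}

// isInvalidChain reports whether err wraps ErrInvalidChain (in-process only).
func isInvalidChain(err error) bool {
	return errors.Is(err, ErrInvalidChain)
}
```

Only when this kind of check passes would the watermark be set on the highest block; on failure, the caller gets an error it can treat as a reorg signal.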

Poller Workflow

  1. Poller → Syncer → Loader → PersistBlockMetas(updateWatermark=true)
  2. Atomically validates chain and sets watermark in same transaction
  3. Continuous tip sync always has watermarked blocks

Implementation Details

PersistBlockMetas

Location: internal/storage/metastorage/postgres/block_storage.go:130-223

Key features:

  • Validates chain continuity before persisting
  • Writes blocks with is_watermark=FALSE
  • Optionally sets watermark on highest block after validation
  • Probabilistic cleanup (1 in 5000 chance) clears old watermarks to prevent accumulation
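A rough sketch of the probabilistic cleanup idea, assuming a 1-in-5000 draw per write as described above. `shouldCleanup`, `clearOldWatermarks`, and `Row` are hypothetical names, and the in-memory slice stands in for the table; the actual cleanup in block_storage.go may be structured differently.

```go
package main

import "math/rand"

// cleanupOdds is the 1-in-5000 chance described in the PR.
const cleanupOdds = 5000

// shouldCleanup decides whether this write also triggers a cleanup.
// The random source is injected (it must behave like rand.Intn) so the
// decision can be tested deterministically.
func shouldCleanup(intn func(n int) int) bool {
	return intn(cleanupOdds) == 0
}

// defaultShouldCleanup uses math/rand as the random source.
func defaultShouldCleanup() bool {
	return shouldCleanup(rand.Intn)
}

// Row is a hypothetical, reduced view of a canonical_blocks row.
type Row struct {
	Tag         uint32
	Height      uint64
	IsWatermark bool
}

// clearOldWatermarks unsets the flag on every watermarked row that sits
// below the highest watermarked height of its tag, and returns how many
// rows were cleared.
func clearOldWatermarks(rows []Row) int {
	highest := make(map[uint32]uint64)
	for _, r := range rows {
		if r.IsWatermark && r.Height > highest[r.Tag] {
			highest[r.Tag] = r.Height
		}
	}
	cleared := 0
	for i := range rows {
		if rows[i].IsWatermark && rows[i].Height < highest[rows[i].Tag] {
			rows[i].IsWatermark = false
			cleared++
		}
	}
	return cleared
}
```

Because GetLatestBlock only needs the highest watermarked row per tag, skipping the cleanup on most writes does not affect correctness; it only lets stale flags accumulate until the next successful draw.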

GetLatestBlock

Location: internal/storage/metastorage/postgres/block_storage.go:235-258

  • Queries WHERE is_watermark=TRUE
  • Returns ErrItemNotFound if no watermarked blocks exist
  • Uses partial index for efficient queries

Defense in Depth

GetBlocksByHeightRange validation retained (line 351):

  • Watermark only protects GetLatestBlock (streamer tip access)
  • GetBlocksByHeightRange is called by multiple components on arbitrary ranges
  • Validation provides defense-in-depth regardless of watermark state

Migration Safety

Zero-Downtime Deployment

The migration includes automatic watermarking of current tip blocks:

UPDATE canonical_blocks cb1
SET is_watermark = TRUE
WHERE height = (
    SELECT MAX(height)
    FROM canonical_blocks cb2
    WHERE cb2.tag = cb1.tag
);

Why this matters:

  • Without it: All blocks have is_watermark=FALSE after migration → GetLatestBlock fails → streamer breaks
  • With it: Current tip blocks are watermarked → GetLatestBlock works immediately → zero service interruption

Testing

Integration Tests

Location: internal/storage/metastorage/postgres/block_storage_integration_test.go

Added comprehensive tests:

  1. TestWatermarkVisibilityControl: Verifies watermark controls visibility
  2. TestWatermarkWithReorg: Verifies reorg handling
  3. TestWatermarkMultipleTags: Verifies tag isolation
  4. TestGetBlocksByHeightRangeStillWorks: Verifies defense-in-depth

Workflow Verification

✅ Replicator: String-based error matching triggers reorg recovery
✅ Poller: Atomic validation and watermark in same transaction

Performance Impact

  • GetLatestBlock: Partial index keeps queries fast (only watermarked blocks indexed)
  • GetBlocksByHeightRange: No change
  • PersistBlockMetas: Minimal overhead (one UPDATE per batch)
  • Cleanup: 0.02% of writes (1 in 5000), best-effort

Consistency with DynamoDB

Postgres now matches DynamoDB behavior:

  • ✅ Chain validation in PersistBlockMetas
  • ✅ Chain validation in GetBlocksByHeightRange
  • ✅ Atomic writes
  • ✅ Reorg handling via validation

Why Postgres needs watermarks but DynamoDB doesn't:

  • DynamoDB: Strongly consistent reads, no intermediate visibility
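  • Postgres: Rows become visible to readers as soon as the writing transaction commits, which can be before update_watermark has validated them (Postgres MVCC never exposes uncommitted data)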
  • Watermark provides explicit visibility control for Postgres

Files Changed

  • internal/storage/metastorage/postgres/block_storage.go: Watermark logic and probabilistic cleanup
  • internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql: Schema changes with zero-downtime migration
  • internal/workflow/replicator.go: String-based reorg error detection
  • internal/storage/metastorage/postgres/block_storage_integration_test.go: Comprehensive watermark tests

Deployment Notes

  1. Migration runs automatically via goose
  2. Existing tip blocks are watermarked during migration
  3. No service downtime expected
  4. Probabilistic cleanup keeps the set of watermarked rows (and the partial index over them) small over time

🤖 Generated with https://claude.com/claude-code

Co-Authored-By: Claude noreply@anthropic.com

aegis-cipherowl and others added 2 commits October 29, 2025 15:29
This commit fixes two critical issues with Postgres metastorage during blockchain reorgs:

Issue 1: Canonical chain leakage to streamer before validation
- Problem: Blocks were visible to GetLatestBlock immediately after being written to
  canonical_blocks, before update_watermark validated the chain continuity
- Impact: During reorgs, streamer could see unvalidated blocks, causing data inconsistency
- Solution: Added is_watermark column to control visibility in GetLatestBlock
  - Blocks are written with is_watermark=FALSE initially
  - Only set to TRUE after update_watermark validates chain continuity
  - GetLatestBlock now filters WHERE is_watermark=TRUE

Issue 2: Recovery workflow not executing when update_watermark fails
- Problem: Replicator workflow's reorg detection relied on xerrors.Is, but Temporal's
  error serialization across activity boundaries breaks this check
- Impact: When update_watermark detected chain discontinuity, reorg recovery didn't trigger
- Solution: Changed to string-based error matching using strings.Contains
  - Works across Temporal activity boundaries
  - Restarts workflow from safe height when ErrInvalidChain is detected

Additional improvements:
- Probabilistic watermark cleanup (1 in 5000 chance) to prevent accumulation
- Migration includes initial watermark on highest block per tag for zero-downtime deployment
- Partial index on watermarked blocks for efficient queries
- Maintains defense-in-depth validation in GetBlocksByHeightRange

Files changed:
- internal/storage/metastorage/postgres/block_storage.go: Watermark logic and cleanup
- internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql: Schema changes
- internal/workflow/replicator.go: String-based reorg error detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added integration tests to verify:
1. Watermark visibility control - GetLatestBlock only returns watermarked blocks
2. Watermark behavior during reorgs - watermark updates correctly
3. Multi-tag watermark isolation - each tag maintains its own watermark
4. GetBlocksByHeightRange still works without watermarks (defense-in-depth)

Tests verify that:
- Blocks persisted without watermark are invisible to GetLatestBlock
- Blocks persisted with watermark become visible to GetLatestBlock
- Watermark properly updates to new tip blocks
- Reorg scenarios correctly update the watermark
- Multiple tags maintain independent watermarks

These tests ensure the fix for Issue 1 (canonical chain leakage) works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @aegis-cipherowl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of Postgres metastorage by introducing a watermark-based visibility control mechanism. This change addresses critical issues related to blockchain reorgs, specifically preventing the premature exposure of unvalidated blocks to downstream services and ensuring reliable reorg recovery within the Replicator workflow. The implementation includes a zero-downtime database migration and comprehensive test coverage to guarantee stability and consistency.

Highlights

  • Watermark-based Visibility: Introduced an "is_watermark" column in the Postgres "canonical_blocks" table to control block visibility, preventing unvalidated blocks from being exposed to downstream services like the streamer during blockchain reorgs.
  • Reorg Recovery Fix: Updated the Replicator workflow to use string-based error matching for "ErrInvalidChain", ensuring proper reorg recovery when the "update_watermark" activity fails due to chain discontinuity, circumventing Temporal's error serialization limitations.
  • Zero-Downtime Migration: Implemented a database migration that automatically watermarks existing highest blocks for each tag, ensuring that "GetLatestBlock" functions correctly immediately after deployment without any service interruption.
  • Enhanced Testing: Added comprehensive integration tests covering watermark visibility control, reorg handling scenarios, multi-tag isolation, and ensuring that "GetBlocksByHeightRange" functionality remains unaffected by the new watermark mechanism.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a watermark-based visibility control for Postgres metastorage to address critical issues during blockchain reorgs. The changes include adding an is_watermark column to the canonical_blocks table, creating a partial index for performance, and updating the data persistence logic to manage the watermark. Additionally, it includes a fix for reorg detection in the replicator workflow by using string-based error matching. The new functionality is well-supported by comprehensive integration tests. My review includes a suggestion to improve the performance of the database migration script and another to enhance the maintainability and observability of the new watermark cleanup logic.

Replace correlated subquery with CTE for better performance on large
canonical_blocks tables. The CTE pre-calculates max heights per tag
in a single pass, avoiding subquery re-execution for every row.

Before (correlated subquery):
- Up to O(n²) in the worst case, since the subquery may be re-evaluated for each row

After (CTE with GROUP BY + JOIN):
- O(n) performance, single table scan + hash join
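The CTE form described in this commit message might look like the statement embedded below. This is a reconstruction from the description (GROUP BY plus JOIN), not the text of the merged migration; the constant name, the `max_heights` alias, and the in-memory `markTips` helper that mirrors the same two passes are all hypothetical.

```go
package main

// watermarkTipsSQL is one possible shape of the CTE-based statement:
// compute the max height per tag once, then join it back to the table.
// The table and column names come from the PR; everything else is assumed.
const watermarkTipsSQL = `
WITH max_heights AS (
    SELECT tag, MAX(height) AS max_height
    FROM canonical_blocks
    GROUP BY tag
)
UPDATE canonical_blocks cb
SET is_watermark = TRUE
FROM max_heights mh
WHERE cb.tag = mh.tag
  AND cb.height = mh.max_height;
`

// row is a hypothetical, reduced view of a canonical_blocks row.
type row struct {
	Tag         uint32
	Height      uint64
	IsWatermark bool
}

// markTips mirrors the statement in memory: one pass to find the max
// height per tag (the GROUP BY), one pass to mark matching rows (the JOIN).
// It returns the number of rows marked.
func markTips(rows []row) int {
	maxHeight := make(map[uint32]uint64)
	for _, r := range rows {
		if h, ok := maxHeight[r.Tag]; !ok || r.Height > h {
			maxHeight[r.Tag] = r.Height
		}
	}
	marked := 0
	for i := range rows {
		if rows[i].Height == maxHeight[rows[i].Tag] {
			rows[i].IsWatermark = true
			marked++
		}
	}
	return marked
}
```

Whether Postgres actually executes the statement as a single scan plus a hash join depends on the planner and the available indexes, so the plan is worth confirming with EXPLAIN on a realistic table.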

Addresses PR review feedback.
@PikaZ76 PikaZ76 merged commit 4c66694 into master Oct 30, 2025
1 check passed