-
Notifications
You must be signed in to change notification settings - Fork 3
fix: Add watermark-based visibility control for Postgres metastorage #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit fixes two critical issues with Postgres metastorage during blockchain reorgs: Issue 1: Canonical chain leakage to streamer before validation - Problem: Blocks were visible to GetLatestBlock immediately after being written to canonical_blocks, before update_watermark validated the chain continuity - Impact: During reorgs, streamer could see unvalidated blocks, causing data inconsistency - Solution: Added is_watermark column to control visibility in GetLatestBlock - Blocks are written with is_watermark=FALSE initially - Only set to TRUE after update_watermark validates chain continuity - GetLatestBlock now filters WHERE is_watermark=TRUE Issue 2: Recovery workflow not executing when update_watermark fails - Problem: Replicator workflow's reorg detection relied on xerrors.Is, but Temporal's error serialization across activity boundaries breaks this check - Impact: When update_watermark detected chain discontinuity, reorg recovery didn't trigger - Solution: Changed to string-based error matching using strings.Contains - Works across Temporal activity boundaries - Restarts workflow from safe height when ErrInvalidChain is detected Additional improvements: - Probabilistic watermark cleanup (1 in 5000 chance) to prevent accumulation - Migration includes initial watermark on highest block per tag for zero-downtime deployment - Partial index on watermarked blocks for efficient queries - Maintains defense-in-depth validation in GetBlocksByHeightRange Files changed: - internal/storage/metastorage/postgres/block_storage.go: Watermark logic and cleanup - internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql: Schema changes - internal/workflow/replicator.go: String-based reorg error detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added integration tests to verify: 1. Watermark visibility control - GetLatestBlock only returns watermarked blocks 2. Watermark behavior during reorgs - watermark updates correctly 3. Multi-tag watermark isolation - each tag maintains its own watermark 4. GetBlocksByHeightRange still works without watermarks (defense-in-depth) Tests verify that: - Blocks persisted without watermark are invisible to GetLatestBlock - Blocks persisted with watermark become visible to GetLatestBlock - Watermark properly updates to new tip blocks - Reorg scenarios correctly update the watermark - Multiple tags maintain independent watermarks These tests ensure the fix for Issue 1 (canonical chain leakage) works correctly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary of ChangesHello @aegis-cipherowl, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of Postgres metastorage by introducing a watermark-based visibility control mechanism. This change addresses critical issues related to blockchain reorgs, specifically preventing the premature exposure of unvalidated blocks to downstream services and ensuring reliable reorg recovery within the Replicator workflow. The implementation includes a zero-downtime database migration and comprehensive test coverage to guarantee stability and consistency. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a watermark-based visibility control for Postgres metastorage to address critical issues during blockchain reorgs. The changes include adding an is_watermark column to the canonical_blocks table, creating a partial index for performance, and updating the data persistence logic to manage the watermark. Additionally, it includes a fix for reorg detection in the replicator workflow by using string-based error matching. The new functionality is well-supported by comprehensive integration tests. My review includes a suggestion to improve the performance of the database migration script and another to enhance the maintainability and observability of the new watermark cleanup logic.
internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql
Outdated
Show resolved
Hide resolved
Replace correlated subquery with CTE for better performance on large canonical_blocks tables. The CTE pre-calculates max heights per tag in a single pass, avoiding subquery re-execution for every row. Before (correlated subquery): - O(n²) performance, subquery runs for each row After (CTE with GROUP BY + JOIN): - O(n) performance, single table scan + hash join Addresses PR review feedback.
Summary
This PR fixes two critical issues with Postgres metastorage during blockchain reorgs by implementing watermark-based visibility control.
Issues Fixed
Issue 1: Canonical Chain Leakage to Streamer Before Validation
Problem: Blocks were visible to
GetLatestBlockimmediately after being written tocanonical_blocks, beforeupdate_watermarkvalidated chain continuity (parent hash matching).Impact: During reorgs, the streamer could see unvalidated blocks, causing data inconsistency.
Solution: Added
is_watermarkboolean column to control visibility:is_watermark=FALSEinitiallyTRUEafter validationGetLatestBlocknow filtersWHERE is_watermark=TRUEIssue 2: Recovery Workflow Not Executing When update_watermark Fails
Problem: Replicator workflow's reorg detection relied on
xerrors.Is, but Temporal's error serialization across activity boundaries breaks this check.Impact: When
update_watermarkdetected chain discontinuity (ErrInvalidChain), the reorg recovery logic didn't trigger.Solution: Changed to string-based error matching using
strings.Containsininternal/workflow/replicator.go:266:Architecture
Database Schema
Migration: internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql
-- Add is_watermark column
ALTER TABLE canonical_blocks ADD COLUMN is_watermark BOOLEAN NOT NULL DEFAULT FALSE;
-- Create partial index for efficient queries
CREATE INDEX idx_canonical_watermark ON canonical_blocks (tag, height DESC)
WHERE is_watermark = TRUE;
-- Set initial watermark on current highest block per tag (prevents downtime)
UPDATE canonical_blocks cb1
SET is_watermark = TRUE
WHERE height = (
SELECT MAX(height)
FROM canonical_blocks cb2
WHERE cb2.tag = cb1.tag
);
Workflow Integration
Replicator Workflow
- If valid: Sets is_watermark=TRUE on highest block
- If invalid (reorg): Returns ErrInvalidChain
- Replicator detects error via string matching and restarts from safe height
Poller Workflow
Implementation Details
PersistBlockMetas
Location: internal/storage/metastorage/postgres/block_storage.go:130-223
Key features:
GetLatestBlock
Location: internal/storage/metastorage/postgres/block_storage.go:235-258
Defense in Depth
GetBlocksByHeightRange validation retained (line 351):
Migration Safety
Zero-Downtime Deployment
The migration includes automatic watermarking of current tip blocks:
UPDATE canonical_blocks cb1
SET is_watermark = TRUE
WHERE height = (
SELECT MAX(height)
FROM canonical_blocks cb2
WHERE cb2.tag = cb1.tag
);
Why this matters:
Testing
Integration Tests
Location: internal/storage/metastorage/postgres/block_storage_integration_test.go
Added comprehensive tests:
Workflow Verification
✅ Replicator: String-based error matching triggers reorg recovery✅ Poller: Atomic validation and watermark in same transaction
Performance Impact
Consistency with DynamoDB
Postgres now matches DynamoDB behavior:
Why Postgres needs watermarks but DynamoDB doesn't:
Files Changed
Deployment Notes
🤖 Generated with https://claude.com/claude-code
Co-Authored-By: Claude noreply@anthropic.com