@leozc leozc commented Apr 1, 2025

What changed? Why?

  1. This change brings a set of improvements from CipherOwl on top of the original ChainStorage from Coinbase.
  2. Key improvements:
    2.1 Adding LTC / Tron support
    2.2 Adding ZSTD support
    2.3 Making ChainStorage friendlier to open-source / k8s environments

How did you test the change?

  • [ X ] unit test
  • [ X ] integration test
  • functional test
  • adhoc test (described below)
  • running in production

ImNumber4 and others added 30 commits January 24, 2024 19:36
…replicator

# Conflicts:
#	internal/workflow/replicator.go
Signed-off-by: Henry Yang <henry.yang@cipherowl.com>
PikaZ76 and others added 30 commits September 24, 2025 17:10
The syncer activity was timing out due to slow blockchain node responses
that exceeded the heartbeat timeout (2 minutes). This fix adds heartbeat
calls at critical points:

1. Before and after processing each block in getBlocksInParallel
2. Before and after slow blockchain client calls in BatchGetBlockMetadata

This ensures that even when blockchain nodes are slow to respond (e.g.,
Story mainnet), the activity sends heartbeats frequently enough to prevent
Temporal from timing out the activity.

The issue was particularly noticeable with slow chains where fetching
blocks could take several minutes, causing the activity to exceed the
2-minute heartbeat timeout and fail with "activity timeout due to missing
heartbeats".

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
fix: add more frequent heartbeats in syncer activity to prevent timeouts
add validation, deleted deprecated migration sequence plan
This commit fixes two critical issues with Postgres metastorage during blockchain reorgs:

Issue 1: Canonical chain leakage to streamer before validation
- Problem: Blocks were visible to GetLatestBlock immediately after being written to
  canonical_blocks, before update_watermark validated the chain continuity
- Impact: During reorgs, streamer could see unvalidated blocks, causing data inconsistency
- Solution: Added is_watermark column to control visibility in GetLatestBlock
  - Blocks are written with is_watermark=FALSE initially
  - Only set to TRUE after update_watermark validates chain continuity
  - GetLatestBlock now filters WHERE is_watermark=TRUE
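
The visibility rule above can be illustrated with an in-memory sketch. This is an assumption-laden model, not the Postgres implementation from this PR: the `store` type, its method names, and the `Tag`/`Height` fields are hypothetical, and only the `is_watermark` idea comes from the commit message.

```go
package main

import "fmt"

// block models a row in canonical_blocks. Only the is_watermark flag is
// taken from the commit message; the other fields are assumptions.
type block struct {
	Tag         uint32
	Height      uint64
	IsWatermark bool
}

// store is a hypothetical in-memory stand-in for the Postgres table.
type store struct{ rows []block }

// persist writes blocks with IsWatermark=false, so they stay hidden
// from getLatestBlock until validation succeeds.
func (s *store) persist(tag uint32, heights ...uint64) {
	for _, h := range heights {
		s.rows = append(s.rows, block{Tag: tag, Height: h})
	}
}

// updateWatermark marks one block as visible. The caller is assumed to
// have validated chain continuity before calling this.
func (s *store) updateWatermark(tag uint32, height uint64) bool {
	for i := range s.rows {
		if s.rows[i].Tag == tag && s.rows[i].Height == height {
			s.rows[i].IsWatermark = true
			return true
		}
	}
	return false
}

// getLatestBlock returns the highest watermarked block for a tag,
// mirroring the WHERE is_watermark=TRUE filter.
func (s *store) getLatestBlock(tag uint32) (block, bool) {
	var latest block
	found := false
	for _, r := range s.rows {
		if r.Tag != tag || !r.IsWatermark {
			continue
		}
		if !found || r.Height > latest.Height {
			latest, found = r, true
		}
	}
	return latest, found
}

func main() {
	s := &store{}
	s.persist(1, 100, 101, 102)
	_, ok := s.getLatestBlock(1)
	fmt.Println("visible before watermark:", ok)
	s.updateWatermark(1, 102)
	b, ok := s.getLatestBlock(1)
	fmt.Println("visible after watermark:", ok, "height:", b.Height)
}
```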

Issue 2: Recovery workflow not executing when update_watermark fails
- Problem: Replicator workflow's reorg detection relied on xerrors.Is, but Temporal's
  error serialization across activity boundaries breaks this check
- Impact: When update_watermark detected chain discontinuity, reorg recovery didn't trigger
- Solution: Changed to string-based error matching using strings.Contains
  - Works across Temporal activity boundaries
  - Restarts workflow from safe height when ErrInvalidChain is detected

Additional improvements:
- Probabilistic watermark cleanup (1 in 5000 chance) to prevent accumulation
- Migration includes initial watermark on highest block per tag for zero-downtime deployment
- Partial index on watermarked blocks for efficient queries
- Maintains defense-in-depth validation in GetBlocksByHeightRange
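
The probabilistic cleanup mentioned above could look roughly like this. It is a sketch under assumptions: `shouldCleanup` and `cleanupOdds` are hypothetical names, and only the 1-in-5000 odds come from the commit message.

```go
package main

import (
	"fmt"
	"math/rand"
)

// cleanupOdds reflects the "1 in 5000 chance" from the commit message.
const cleanupOdds = 5000

// shouldCleanup reports whether this write should also trigger cleanup.
// The random source is injected so the decision can be tested
// deterministically; rand.Intn has the matching signature.
func shouldCleanup(intn func(n int) int) bool {
	return intn(cleanupOdds) == 0
}

func main() {
	const writes = 1000000
	runs := 0
	for i := 0; i < writes; i++ {
		if shouldCleanup(rand.Intn) {
			runs++
		}
	}
	fmt.Printf("cleanup ran %d times in %d writes\n", runs, writes)
}
```

On average this runs cleanup once per 5000 writes without needing a separate scheduler or shared counter.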

Files changed:
- internal/storage/metastorage/postgres/block_storage.go: Watermark logic and cleanup
- internal/storage/metastorage/postgres/db/migrations/20250129000001_add_watermark.sql: Schema changes
- internal/workflow/replicator.go: String-based reorg error detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added integration tests to verify:
1. Watermark visibility control - GetLatestBlock only returns watermarked blocks
2. Watermark behavior during reorgs - watermark updates correctly
3. Multi-tag watermark isolation - each tag maintains its own watermark
4. GetBlocksByHeightRange still works without watermarks (defense-in-depth)

Tests verify that:
- Blocks persisted without watermark are invisible to GetLatestBlock
- Blocks persisted with watermark become visible to GetLatestBlock
- Watermark properly updates to new tip blocks
- Reorg scenarios correctly update the watermark
- Multiple tags maintain independent watermarks

These tests ensure the fix for Issue 1 (canonical chain leakage) works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace correlated subquery with CTE for better performance on large
canonical_blocks tables. The CTE pre-calculates max heights per tag
in a single pass, avoiding subquery re-execution for every row.

Before (correlated subquery):
- Up to O(n²) work in the worst case, since the subquery can be re-executed for each row

After (CTE with GROUP BY + JOIN):
- Roughly O(n) work: a single table scan plus a join (typically a hash join)
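
As an analogy in Go rather than SQL, the two query shapes correspond to the two functions below. This is not the query from this PR: the `row` type and both function names are hypothetical, and a real planner may execute either form differently.

```go
package main

import "fmt"

// row is a hypothetical projection of canonical_blocks.
type row struct {
	Tag    uint32
	Height uint64
}

// highestCorrelated mimics a correlated subquery: for every row it
// rescans all rows to find the max height for that row's tag.
func highestCorrelated(rows []row) []row {
	var out []row
	for _, r := range rows {
		best := uint64(0)
		for _, o := range rows {
			if o.Tag == r.Tag && o.Height > best {
				best = o.Height
			}
		}
		if r.Height == best {
			out = append(out, r)
		}
	}
	return out
}

// highestGrouped mimics the CTE: one pass computes max height per tag
// (GROUP BY), and a second pass joins rows against that map.
func highestGrouped(rows []row) []row {
	maxByTag := make(map[uint32]uint64)
	for _, r := range rows {
		if h, ok := maxByTag[r.Tag]; !ok || r.Height > h {
			maxByTag[r.Tag] = r.Height
		}
	}
	var out []row
	for _, r := range rows {
		if r.Height == maxByTag[r.Tag] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []row{{1, 10}, {1, 12}, {2, 7}, {2, 5}, {1, 11}}
	fmt.Println(highestCorrelated(rows))
	fmt.Println(highestGrouped(rows))
}
```

Both functions return the same rows; the difference is only in how many times the input is scanned.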

Addresses PR review feedback.
…-handling

fix: Add watermark-based visibility control for Postgres metastorage
retry workflow for create temporal session failure by continue as new