Description and context
Background
Wormhole
The guardian is responsible for witnessing all core contract message emissions on all chains. On Solana, the message data is stored in a Solana account via the postMessage or postMessageUnreliable instruction (wormhole/node/pkg/watchers/solana/client.go, lines 192 to 193 in 46bcc70).
In addition to all of its other tasks, the guardian node must detect and process every Wormhole message in a block in less time than it takes the following block to be produced, so as not to fall behind. As of this writing, according to the Solana Explorer, the average block time is ~422ms with about 3.7k transactions per second.
Notably, the Solana contract includes only one log (msg!), which logs the sequence number (wormhole/solana/bridge/program/src/api/post_message.rs, lines 224 to 225 in 46bcc70).
Solana
With the advent of Versioned Transactions, v0 transactions added support for Address Lookup Tables. This means that, for v0 transactions, an instruction's program index may be located in the lookup table, and populating that table requires an additional RPC call for the given account. A quick check of a recent block shows a mix of 172 v0 transactions (79 of them with address table lookups) alongside 3,074 legacy transactions.
Additionally, there is a long-standing Solana node DoS prevention around log messages, which truncates a transaction's log to a default of 10k bytes.
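To make the lookup-table cost concrete, here is a minimal sketch of the extra round-trip as a raw JSON-RPC getAccountInfo call. The endpoint URL and table address are placeholders, and the real watcher uses its own RPC client rather than net/http directly; this only illustrates that each unique lookup table is one more request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder endpoint and lookup table address -- substitute real values.
	rpcURL := "https://<RPC_HOST>"
	tableAddress := "<LOOKUP_TABLE_ADDRESS>"

	// One extra getAccountInfo call per unique lookup table referenced by a v0 transaction.
	req := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "getAccountInfo",
		"params": []interface{}{
			tableAddress,
			map[string]string{"encoding": "base64"},
		},
	}
	body, _ := json.Marshal(req)

	resp, err := http.Post(rpcURL, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Result struct {
			Value struct {
				// data is ["<base64 payload>", "base64"]; the payload holds the
				// lookup table header followed by the stored 32-byte addresses.
				Data []string `json:"data"`
			} `json:"value"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("lookup table account data (base64):", out.Result.Value.Data)
}
```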
Current Watcher Implementation
The current guardian watcher (note: there are two, one for confirmed and one for finalized) performs the following steps:
1. Fetch the block and its transactions.
2. Iterate over each transaction in the block.
3. Skip any transaction whose log messages do not mention the core bridge program (the log filter).
4. Resolve the transaction's account keys (performing an additional RPC call to load the lookup table for any v0 transaction leveraging lookup tables) [source]
5. Decode and process any postMessage or postMessageUnreliable instructions.
The explicit purpose of the log filter (step 3) was to prevent the RPC footprint of this method from growing linearly with the number of v0 transactions using lookup tables.
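As a rough illustration of the prefilter idea only (a sketch, not the actual client.go code; the program address and log lines are placeholders):

```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder for the configured core bridge program address.
const coreBridge = "<CORE_BRIDGE_PROGRAM_ID>"

// tx is a pared-down stand-in for the fields the watcher cares about:
// the transaction signature and the log messages from the transaction meta.
type tx struct {
	Signature string
	Logs      []string // meta.logMessages, truncated by the node at ~10k bytes by default
}

// mentionsCoreBridge mirrors the idea of the log prefilter: only transactions whose
// (possibly truncated) logs mention the core bridge program are decoded further,
// which is what avoids loading lookup tables for unrelated v0 transactions.
func mentionsCoreBridge(t tx) bool {
	for _, line := range t.Logs {
		if strings.Contains(line, coreBridge) {
			return true
		}
	}
	return false
}

func main() {
	block := []tx{
		{Signature: "sigA", Logs: []string{"Program " + coreBridge + " invoke [1]", "Program log: Sequence: 42"}},
		{Signature: "sigB", Logs: []string{"Program log: something unrelated"}},
		// A transaction that logged >10k bytes before invoking the core bridge
		// looks like sigB here: the relevant lines are cut off, so it is skipped.
	}
	for _, t := range block {
		if mentionsCoreBridge(t) {
			fmt.Println("decode and process:", t.Signature)
		} else {
			fmt.Println("skip:", t.Signature)
		}
	}
}
```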
However, this comes with an extremely notable shortcoming: the watcher will skip any transaction where the critical log does not appear within the first 10k bytes of log output. Reliable messages missed in this way can still be reobserved.
Steps to reproduce
Write and invoke a Solana program that performs the following:
Log 10k bytes
Call post_message on the core bridge
Experienced behavior
The message is not observed by the guardians and a VAA is not produced.
Expected behavior
The message is observed by the guardians and a VAA is produced.
Solution recommendation
I am not immediately confident that a different solution is more desirable than the status quo, as they all come with trade-offs. Here is a list of alternatives I have considered.
Investigate an alternative to the log check in step 3 above. One option is to check for instructions that otherwise look like the postMessage or postMessageUnreliable instruction. However, this requires decoding all transactions in a block and has the potential for false positives, leading to loading more lookup tables than necessary. The scaling performance of decoding every transaction would have to be considered along with the false-positive rate based on historical transactions.
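As a sketch of what such a structural check might look like (the discriminator values and account key strings are illustrative assumptions, not confirmed against the core bridge program):

```go
package main

import "fmt"

// Illustrative discriminator values -- confirm against the core bridge program
// before relying on them.
const (
	postMessageID           = 0x01
	postMessageUnreliableID = 0x08
)

// instruction is a pared-down compiled instruction: an index into the
// transaction's account keys plus the raw instruction data.
type instruction struct {
	ProgramIDIndex int
	Data           []byte
}

// looksLikePostMessage reports whether an instruction could plausibly be a
// postMessage/postMessageUnreliable call. For a legacy transaction the program
// index resolves against the static account keys; for a v0 transaction the index
// may point past them into a lookup table, which is exactly the case where an
// extra RPC call would be needed to confirm, so the check stays deliberately
// loose there (and can therefore produce false positives).
func looksLikePostMessage(ix instruction, staticKeys []string, coreBridge string) bool {
	if len(ix.Data) == 0 {
		return false
	}
	if ix.Data[0] != postMessageID && ix.Data[0] != postMessageUnreliableID {
		return false
	}
	if ix.ProgramIDIndex < len(staticKeys) {
		// Resolvable without a lookup table: require an exact program match.
		return staticKeys[ix.ProgramIDIndex] == coreBridge
	}
	// Program lives in a lookup table; treat as a candidate and pay the extra RPC call.
	return true
}

func main() {
	keys := []string{"<FEE_PAYER>", "<CORE_BRIDGE_PROGRAM_ID>"}
	ix := instruction{ProgramIDIndex: 1, Data: []byte{postMessageID /* ...payload... */}}
	fmt.Println(looksLikePostMessage(ix, keys, "<CORE_BRIDGE_PROGRAM_ID>"))
}
```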
Switch to the websockets implementation used by Pyth. The trade-off here is that contributors have seen degradation and missed events when relying on programSubscribe. For this reason, there is a check which prevents this from being used for any chain other than Pyth. The reliability could be investigated and then toggled via a feature flag to allow individual guardians to test the performance and reliability against their RPC nodes. My understanding is that these subscriptions do require greater RPC resources than the existing approach.
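For reference, a minimal sketch of such a subscription over the RPC node's websocket endpoint, using the gorilla/websocket package (the endpoint and program ID are placeholders; this is not the Pyth watcher code):

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Placeholder endpoint; programSubscribe is served on the node's websocket port.
	conn, _, err := websocket.DefaultDialer.Dial("wss://<RPC_HOST>/", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Subscribe to updates for all accounts owned by the core bridge program, which
	// includes the message accounts written by postMessage/postMessageUnreliable.
	sub := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "programSubscribe",
		"params": []interface{}{
			"<CORE_BRIDGE_PROGRAM_ID>",
			map[string]string{"encoding": "base64", "commitment": "confirmed"},
		},
	}
	if err := conn.WriteJSON(sub); err != nil {
		log.Fatal(err)
	}

	// Each notification carries the updated account's data; a watcher built on this
	// would parse the posted message account rather than relying on logs at all.
	for {
		var msg json.RawMessage
		if err := conn.ReadJSON(&msg); err != nil {
			log.Fatal(err)
		}
		log.Printf("notification: %s", msg)
	}
}
```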
Rewrite the watcher to use getSignaturesForAddress to filter the transactions. I'm unsure of the cost of this, but it could at least narrow the set down to only those transactions involving the core bridge program. However, this is at least one additional RPC call per block, which again would have to be completed quickly, and those transactions would still need to go through all of the existing processing.
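A minimal sketch of that query as a raw JSON-RPC call (endpoint and program ID are placeholders; each returned signature would still need getTransaction and the existing instruction processing before an observation can be made):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	rpcURL := "https://<RPC_HOST>" // placeholder endpoint

	// Ask only for signatures that touched the core bridge program, instead of
	// scanning every transaction in every block.
	req := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "getSignaturesForAddress",
		"params": []interface{}{
			"<CORE_BRIDGE_PROGRAM_ID>",
			map[string]interface{}{"limit": 100, "commitment": "finalized"},
		},
	}
	body, _ := json.Marshal(req)

	resp, err := http.Post(rpcURL, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Result []struct {
			Signature string `json:"signature"`
			Slot      uint64 `json:"slot"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	for _, sig := range out.Result {
		// Each of these still needs to be fetched and run through the existing
		// processing path before a message can be observed.
		fmt.Println(sig.Slot, sig.Signature)
	}
}
```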
It is again important to note that RPC load can cause a Solana node to fall behind the network, and slow RPC responses can cause a guardian to fall behind its peers and delay quorum for messages. As is the case for every guardian responsibility, it is critical for guardians to process Solana messages in a timely and performant manner. The log limitation is an effective compromise, but I am opening this issue to document the limitation, reveal the considerations, and weigh alternatives.
@evan-gray How can we make the guardian reobserve the missing message?
Guardians may manually or automatically reobserve missing transactions via their admin commands; ideally, these are not required during normal network operations.
As far as I understand, integrators who believe they have a missing message should reach out on the Wormhole discord for support.