Description and context
Background
Wormhole
The guardian is responsible for witnessing all core contract message emissions on all chains. On Solana, the message data is stored in a Solana account via the postMessage or postMessageUnreliable instruction (wormhole/node/pkg/watchers/solana/client.go, lines 192 to 193 in 46bcc70).
In addition to all of its other tasks, the guardian node must detect and process every Wormhole message in a block in less time than it takes the following block to be produced, so as not to fall behind. As of this writing, according to the Solana Explorer, the average block time is ~422ms with about 3.7k transactions per second.
Notably, the Solana contract includes only one log (msg!), which logs the sequence number (wormhole/solana/bridge/program/src/api/post_message.rs, lines 224 to 225 in 46bcc70).
Solana
With the advent of Versioned Transactions, v0 transactions added support for Address Lookup Tables. This means that, for v0 transactions, an instruction's program index may be located in the lookup table, and populating that table requires an additional RPC call for the given account. A quick check of a recent block shows a mix of 172 v0 transactions (79 of them with address table lookups) alongside 3,074 legacy transactions.
Additionally, there is a long-standing Solana node DoS prevention around log messages, which truncates a transaction's log to a default of 10k bytes.
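To make the lookup-table cost concrete, here is a minimal sketch of the extra round-trip as a raw JSON-RPC getAccountInfo call. The endpoint URL and table address are placeholders, and the real watcher uses its own RPC client rather than net/http directly; this only illustrates that each unique lookup table is one more request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder endpoint and lookup table address -- substitute real values.
	rpcURL := "https://<RPC_HOST>"
	tableAddress := "<LOOKUP_TABLE_ADDRESS>"

	// One extra getAccountInfo call per unique lookup table referenced by a v0 transaction.
	req := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "getAccountInfo",
		"params": []interface{}{
			tableAddress,
			map[string]string{"encoding": "base64"},
		},
	}
	body, _ := json.Marshal(req)

	resp, err := http.Post(rpcURL, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Result struct {
			Value struct {
				// data is ["<base64 payload>", "base64"]; the payload holds the
				// lookup table header followed by the stored 32-byte addresses.
				Data []string `json:"data"`
			} `json:"value"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("lookup table account data (base64):", out.Result.Value.Data)
}
```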
Current Watcher Implementation
The current guardian watcher (note: there are two, one for confirmed and one for finalized) performs the following steps:
1. Fetch the block and its transactions.
2. Iterate over each transaction in the block.
3. Skip any transaction whose log messages do not mention the core bridge program (the log filter).
4. Resolve the transaction's account keys (performing an additional RPC call to load the lookup table for any v0 transaction leveraging lookup tables) [source]
5. Decode and process any postMessage or postMessageUnreliable instructions.
The explicit purpose of the log filter (step 3) was to prevent the RPC footprint of this method from growing linearly with the number of v0 transactions using lookup tables.
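As a rough illustration of the prefilter idea only (a sketch, not the actual client.go code; the program address and log lines are placeholders):

```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder for the configured core bridge program address.
const coreBridge = "<CORE_BRIDGE_PROGRAM_ID>"

// tx is a pared-down stand-in for the fields the watcher cares about:
// the transaction signature and the log messages from the transaction meta.
type tx struct {
	Signature string
	Logs      []string // meta.logMessages, truncated by the node at ~10k bytes by default
}

// mentionsCoreBridge mirrors the idea of the log prefilter: only transactions whose
// (possibly truncated) logs mention the core bridge program are decoded further,
// which is what avoids loading lookup tables for unrelated v0 transactions.
func mentionsCoreBridge(t tx) bool {
	for _, line := range t.Logs {
		if strings.Contains(line, coreBridge) {
			return true
		}
	}
	return false
}

func main() {
	block := []tx{
		{Signature: "sigA", Logs: []string{"Program " + coreBridge + " invoke [1]", "Program log: Sequence: 42"}},
		{Signature: "sigB", Logs: []string{"Program log: something unrelated"}},
		// A transaction that logged >10k bytes before invoking the core bridge
		// looks like sigB here: the relevant lines are cut off, so it is skipped.
	}
	for _, t := range block {
		if mentionsCoreBridge(t) {
			fmt.Println("decode and process:", t.Signature)
		} else {
			fmt.Println("skip:", t.Signature)
		}
	}
}
```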
However, this comes with an extremely notable shortcoming: the watcher will skip any transaction where the critical log does not appear within the first 10k bytes of log output. Reliable messages missed in this way can still be reobserved.
Steps to reproduce
Write and invoke a Solana program that performs the following:
Log 10k bytes
Call post_message on the core bridge
Experienced behavior
The message is not observed by the guardians and a VAA is not produced.
Expected behavior
The message is observed by the guardians and a VAA is produced.
Solution recommendation
I am not immediately confident that a different solution is more desirable than the status quo, as they all come with trade-offs. Here is a list of alternatives I have considered.
Investigate an alternative to the log check in step 3 above. One option is to check for instructions that otherwise look like the postMessage or postMessageUnreliable instruction. However, this requires decoding all transactions in a block and has the potential for false positives, leading to loading more lookup tables than necessary. The scaling performance of decoding every transaction would have to be considered along with the false-positive rate based on historical transactions.
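As a sketch of what such a structural check might look like (the discriminator values and account key strings are illustrative assumptions, not confirmed against the core bridge program):

```go
package main

import "fmt"

// Illustrative discriminator values -- confirm against the core bridge program
// before relying on them.
const (
	postMessageID           = 0x01
	postMessageUnreliableID = 0x08
)

// instruction is a pared-down compiled instruction: an index into the
// transaction's account keys plus the raw instruction data.
type instruction struct {
	ProgramIDIndex int
	Data           []byte
}

// looksLikePostMessage reports whether an instruction could plausibly be a
// postMessage/postMessageUnreliable call. For a legacy transaction the program
// index resolves against the static account keys; for a v0 transaction the index
// may point past them into a lookup table, which is exactly the case where an
// extra RPC call would be needed to confirm, so the check stays deliberately
// loose there (and can therefore produce false positives).
func looksLikePostMessage(ix instruction, staticKeys []string, coreBridge string) bool {
	if len(ix.Data) == 0 {
		return false
	}
	if ix.Data[0] != postMessageID && ix.Data[0] != postMessageUnreliableID {
		return false
	}
	if ix.ProgramIDIndex < len(staticKeys) {
		// Resolvable without a lookup table: require an exact program match.
		return staticKeys[ix.ProgramIDIndex] == coreBridge
	}
	// Program lives in a lookup table; treat as a candidate and pay the extra RPC call.
	return true
}

func main() {
	keys := []string{"<FEE_PAYER>", "<CORE_BRIDGE_PROGRAM_ID>"}
	ix := instruction{ProgramIDIndex: 1, Data: []byte{postMessageID /* ...payload... */}}
	fmt.Println(looksLikePostMessage(ix, keys, "<CORE_BRIDGE_PROGRAM_ID>"))
}
```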
Switch to the websockets implementation used by Pyth. The trade-off here is that contributors have seen degradation and missed events when relying on programSubscribe. For this reason, there is a check which prevents this from being used for any chain other than Pyth. The reliability could be investigated and then toggled via a feature flag to allow individual guardians to test the performance and reliability against their RPC nodes. My understanding is that these subscriptions do require greater RPC resources than the existing approach.
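For reference, a minimal sketch of such a subscription over the RPC node's websocket endpoint, using the gorilla/websocket package (the endpoint and program ID are placeholders; this is not the Pyth watcher code):

```go
package main

import (
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Placeholder endpoint; programSubscribe is served on the node's websocket port.
	conn, _, err := websocket.DefaultDialer.Dial("wss://<RPC_HOST>/", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Subscribe to updates for all accounts owned by the core bridge program, which
	// includes the message accounts written by postMessage/postMessageUnreliable.
	sub := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "programSubscribe",
		"params": []interface{}{
			"<CORE_BRIDGE_PROGRAM_ID>",
			map[string]string{"encoding": "base64", "commitment": "confirmed"},
		},
	}
	if err := conn.WriteJSON(sub); err != nil {
		log.Fatal(err)
	}

	// Each notification carries the updated account's data; a watcher built on this
	// would parse the posted message account rather than relying on logs at all.
	for {
		var msg json.RawMessage
		if err := conn.ReadJSON(&msg); err != nil {
			log.Fatal(err)
		}
		log.Printf("notification: %s", msg)
	}
}
```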
Rewrite the watcher to use getSignaturesForAddress to filter the transactions. I'm unsure of the cost of this, but it could at least narrow the set down to only those transactions involving the core bridge program. However, this is at least one additional RPC call per block, which again would have to be completed quickly, and those transactions would still need to go through all of the existing processing.
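A minimal sketch of that query as a raw JSON-RPC call (endpoint and program ID are placeholders; each returned signature would still need getTransaction and the existing instruction processing before an observation can be made):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	rpcURL := "https://<RPC_HOST>" // placeholder endpoint

	// Ask only for signatures that touched the core bridge program, instead of
	// scanning every transaction in every block.
	req := map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "getSignaturesForAddress",
		"params": []interface{}{
			"<CORE_BRIDGE_PROGRAM_ID>",
			map[string]interface{}{"limit": 100, "commitment": "finalized"},
		},
	}
	body, _ := json.Marshal(req)

	resp, err := http.Post(rpcURL, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Result []struct {
			Signature string `json:"signature"`
			Slot      uint64 `json:"slot"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	for _, sig := range out.Result {
		// Each of these still needs to be fetched and run through the existing
		// processing path before a message can be observed.
		fmt.Println(sig.Slot, sig.Signature)
	}
}
```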
It is again important to note that RPC load can cause a Solana node to fall behind the network, and slow RPC responses can cause a guardian to fall behind its peers and delay quorum for messages. As is the case for every guardian responsibility, it is critical for guardians to process Solana messages in a timely and performant manner. The log limitation is an effective compromise, but I am opening this issue to document the limitation, reveal the considerations, and weigh alternatives.
@evan-gray How can we make the guardian reobserve the missing message?
Guardians may manually or automatically reobserve missing transactions via their admin commands; ideally, these are not required during normal network operations.
As far as I understand, integrators who believe they have a missing message should reach out on the Wormhole discord for support.