Refactor loop to event based processing #743
Conversation
```dockerfile
ENV PACKAGES curl make git libc-dev bash gcc linux-headers eudev-dev

RUN apk add --no-cache $PACKAGES

WORKDIR /go/src/github.com/osmosis-labs

RUN wget -O /lib/libwasmvm_muslc.a https://github.com/CosmWasm/wasmvm/releases/download/v1.0.0/libwasmvm_muslc.$(uname -m).a
```
Ahh good catch on these.
Gonna be reading through this on/off today while I play catch-up a bit! I did run into a couple of issues running the integration tests locally; I tried
Attempting to run the relayer on the sommelier<->osmosis path on mainnet also yields a panic.
Thanks for catching this. This should be resolved now!
Looks like another panic when running; here's the full stacktrace: https://gist.github.com/jtieri/f2bce727e7f63ea236c7d0e82e6ebbf4
Meant to remove that.
I think the general architecture is a routing or pub/sub type workflow, which I do like.
Right now, I don't think I have authority to approve this PR yet given my lack of experience with the relayer.
To state the obvious, this is a large PR which always comes with risk. I'm betting manual QA and ibctest will have our bases covered though.
In the future (if this is a routing pattern), I wonder if it's worth exploring something similar to Go http routing, like https://github.com/go-chi/chi.
I'm not saying use http routing verbatim, but it's a nice pattern to declare your routes and attach handlers and (optional) middleware.
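For reference, here is a minimal sketch of the chi-style declaration being suggested. The route and handler names are hypothetical placeholders, not anything from the relayer:

```go
package main

import (
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
)

func main() {
	r := chi.NewRouter()

	// Middleware is attached once and wraps every route declared below.
	r.Use(middleware.Logger)
	r.Use(middleware.Recoverer)

	// Routes are declared up front with their handlers.
	r.Get("/packets", listPackets)

	http.ListenAndServe(":8080", r)
}

// listPackets is a placeholder handler for illustration only.
func listPackets(w http.ResponseWriter, _ *http.Request) {
	w.Write([]byte("packets"))
}
```

The appeal is exactly the point above: routes, handlers, and optional middleware are declared in one place.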
Agreed. I think a few days of mainnet testing, in addition to the passing
To David's point, there is a bit much going on in the chain processor.
After nearly a decade working in distributed systems in Go, the concurrency pattern that I much prefer over independent goroutines taking common locks is what I consider the "control loop" pattern. A single goroutine is responsible for all of the mutable data, without requiring any mutexes. The goroutine uses a single for-select loop to handle mutating actions, and it can use request-response channels to respond to queries.
I haven't absorbed the chain processor API enough to say with certainty whether this would apply cleanly here. Sometimes this pattern can be a bit more verbose, but the significant advantage is that you completely avoid deadlocks (because only one goroutine is modifying the data, no need for sync.Mutex), and you completely avoid data races (only one goroutine is reading or writing the set of data). The other tradeoff is that you have to return read-only copies, which I don't expect will be an issue here.
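As a rough sketch of that pattern (all names here are illustrative, not taken from the relayer):

```go
package sketch

// query is a read request answered over its own response channel.
type query struct {
	key  string
	resp chan string
}

// store owns all mutable data from a single goroutine; callers interact only
// through the updates and queries channels, so no mutexes are needed.
type store struct {
	updates chan [2]string // key/value pairs to write
	queries chan query
}

// run is the control loop: the only goroutine that ever reads or writes data.
func (s *store) run() {
	data := map[string]string{}
	for {
		select {
		case kv := <-s.updates:
			data[kv[0]] = kv[1]
		case q := <-s.queries:
			q.resp <- data[q.key] // respond with a copy of the value, never the map itself
		}
	}
}
```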
relayer/paths/path_processor.go
Outdated
```go
srcPathEnd.messagesLock.Lock()
dstPathEnd.messagesLock.Lock()
srcPathEnd.inProgressMessageSendsLock.Lock()
dstPathEnd.inProgressMessageSendsLock.Lock()
defer srcPathEnd.messagesLock.Unlock()
defer dstPathEnd.messagesLock.Unlock()
defer srcPathEnd.inProgressMessageSendsLock.Unlock()
defer dstPathEnd.inProgressMessageSendsLock.Unlock()
```
Taking several locks in sequence will incur a high risk of deadlock, for any application that has enough goroutines active.
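For illustration (not relayer code), the classic way this goes wrong is two goroutines taking the same locks in opposite order:

```go
package main

import "sync"

func main() {
	var muA, muB sync.Mutex
	var wg sync.WaitGroup
	wg.Add(2)

	go func() {
		defer wg.Done()
		muA.Lock()
		defer muA.Unlock()
		muB.Lock() // blocks if the other goroutine already holds muB and is waiting on muA
		defer muB.Unlock()
	}()

	go func() {
		defer wg.Done()
		muB.Lock() // opposite acquisition order
		defer muB.Unlock()
		muA.Lock()
		defer muA.Unlock()
	}()

	wg.Wait() // deadlocks whenever the two goroutines interleave their first Lock calls
}
```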
I missed this somehow. Yeah, high chance of deadlock. As Mark stated:
A single goroutine is responsible for all of the mutable data, without requiring any mutexes.
I am also a fan of this pattern. I've heard it called a "single writer" pattern. Only 1 goroutine writes (or mutates) data.
To echo Mark, hard to tell if that's feasible here though.
I cleaned this section up by making copies of the messages, using those copies for iteration to determine what needs to be deleted, and then only locking/deleting for the specific maps after all of that (roughly the shape sketched below).
I also removed the locks for the ibc message assembly, since those can just be separate slices that are concatenated after the waitgroup.
Right now the chain processors push messages into the path processor, and the path processor manages the backlog of messages coming from both chains, and clears out old packets. That data could be separated into two different stores, but we'd still have to lock at some point since the data is being read/written by different threads.
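Roughly, the copy-then-delete shape described above looks like this; the type and field names are hypothetical stand-ins, not the actual relayer structs:

```go
package sketch

import "sync"

// pathEndRuntime is a stand-in for the real struct holding the message maps;
// the field names here are hypothetical.
type pathEndRuntime struct {
	messagesLock sync.Mutex
	messages     map[string]struct{}
}

// pruneMessages copies keys under the lock, decides what to delete without
// holding any locks, then re-takes the lock only for the deletions.
func (pe *pathEndRuntime) pruneMessages(shouldDelete func(key string) bool) {
	pe.messagesLock.Lock()
	keys := make([]string, 0, len(pe.messages))
	for k := range pe.messages {
		keys = append(keys, k)
	}
	pe.messagesLock.Unlock()

	var toDelete []string
	for _, k := range keys {
		if shouldDelete(k) {
			toDelete = append(toDelete, k)
		}
	}

	pe.messagesLock.Lock()
	for _, k := range toDelete {
		delete(pe.messages, k)
	}
	pe.messagesLock.Unlock()
}
```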
```go
	var clientInfo *ibc.ClientInfo
	var action string
	for _, event := range messageLog.Events {
		switch event.Type {
```
Might be good to log an ignored event type at debug level in the default case.
This is a good idea for debugging.
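A sketch of the suggested default case, assuming a zap logger is available on the processor (the handled event types and the `handleEvent` wrapper are stand-ins, not the actual code):

```go
package sketch

import "go.uber.org/zap"

// handleEvent shows the shape of the suggested default case: any event type
// the processor does not recognize is logged at debug level and skipped.
func handleEvent(log *zap.Logger, eventType string) {
	switch eventType {
	case "send_packet", "recv_packet":
		// ... existing handling elided ...
	default:
		log.Debug("Ignoring unhandled event type", zap.String("event_type", eventType))
	}
}
```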
One final comment: I recommend unit tests around the relayer rejecting/accepting a packet. E.g., is the channel open, do the port and channel IDs match up, is the sequence correct, channel filtering (already suggested), etc. Basically, starting here: https://github.com/cosmos/ibc/blob/master/spec/core/ics-004-channel-and-packet-semantics/README.md#a-day-in-the-life-of-a-packet

The above could be tested with ibctest. However, given ibctest's black-box style, the more we add, the more brittle it becomes. Plus, black-box tests are slow. The routing and validation of packets can probably be segmented into units. Hence, unit tests will be valuable, give instant feedback, and let us quickly test a variety of edge/error cases.

If there's appetite for the above, I'm happy to make an issue and tackle it in the near future.
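For shape only, a table-driven test along those lines might look like this; `testPacket` and `validatePacket` are hypothetical stand-ins for whatever the real packet type and validation entry point end up being:

```go
package sketch

import (
	"errors"
	"testing"
)

// testPacket and validatePacket are hypothetical, for illustration only.
type testPacket struct {
	channelOpen bool
	sequence    uint64
}

func validatePacket(p testPacket) error {
	if !p.channelOpen {
		return errors.New("channel not open")
	}
	if p.sequence == 0 {
		return errors.New("invalid packet sequence")
	}
	return nil
}

func TestValidatePacket(t *testing.T) {
	tests := []struct {
		name    string
		packet  testPacket
		wantErr bool
	}{
		{name: "valid packet", packet: testPacket{channelOpen: true, sequence: 1}},
		{name: "channel not open", packet: testPacket{sequence: 1}, wantErr: true},
		{name: "zero sequence", packet: testPacket{channelOpen: true}, wantErr: true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validatePacket(tt.packet)
			if (err != nil) != tt.wantErr {
				t.Fatalf("validatePacket() error = %v, wantErr %v", err, tt.wantErr)
			}
		})
	}
}
```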
"golang.org/x/sync/errgroup" | ||
) | ||
|
||
const ( |
Bringing these up to be user configurable is good.
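One possible shape for that, with illustrative names (not the actual identifiers in the PR): keep the constants as defaults and let a config struct override them.

```go
package sketch

import "time"

// defaultQueryLoopDuration plays the role of one of the constants above.
const defaultQueryLoopDuration = 1 * time.Second

// ChainProcessorConfig lets users override the defaults; zero values fall
// back to the built-in constants.
type ChainProcessorConfig struct {
	QueryLoopDuration time.Duration
}

func (c ChainProcessorConfig) queryLoopDuration() time.Duration {
	if c.QueryLoopDuration > 0 {
		return c.QueryLoopDuration
	}
	return defaultQueryLoopDuration
}
```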
```go
	}
}

func (ccp *CosmosChainProcessor) IsRelevantChannel(channelKey ibc.ChannelKey) bool {
```
We also need an IsRelevantConnection.
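Something with the same shape as IsRelevantChannel, sketched here with a stand-in type (the `connectionIDs` field is hypothetical):

```go
package sketch

// cosmosChainProcessor is a stand-in for the real CosmosChainProcessor.
type cosmosChainProcessor struct {
	connectionIDs map[string]bool
}

// IsRelevantConnection mirrors IsRelevantChannel, but keyed on connection IDs.
func (ccp *cosmosChainProcessor) IsRelevantConnection(connectionID string) bool {
	return ccp.connectionIDs[connectionID]
}
```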
```go
				action = attr.Value
			}
		}
	case "connection_open_init", "connection_open_try", "connection_open_ack",
```
We need to catch channel_open_init and complete the channel opening (i.e. format and send the completion transactions).
I had this implemented as a PoC in a stale branch in case it's helpful here.
7fb2846#diff-8a85df9f5b36f2f6c8bbf9f000dc704f92f60bc6e94da171df51c9cd3a422620L51
There are two upstream changes in ibc-go that we need, IIRC, so that we can properly pull the channel version identifier from the events.
cosmos/ibc-go#1204
cosmos/ibc-go#1203
Thomas from Interchain said, "So, the fixes have been implemented in the release/v3.1.x (and release/v2.3.x) branch already. So the next release should include them, but we're not yet sure if that's going to be a minor release or v4." cosmos/ibc-go#1279
Awesome. I'll get the connection and channel handshakes pulled into the chain/path processors also.
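For context, a rough sketch of how an additional case could slot into the existing event switch shown above; the `handleChannelHandshake` helper is hypothetical:

```go
// Fragment only: an additional case for the existing event switch. The idea
// is to record handshake progress so the path processor can assemble and
// send the next message in the channel handshake (e.g. MsgChannelOpenTry
// after channel_open_init).
case "channel_open_init", "channel_open_try", "channel_open_ack", "channel_open_confirm":
	ccp.handleChannelHandshake(event)
```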
```go
func (ccp *CosmosChainProcessor) GetMsgRecvPacket(signer string, msgRecvPacket provider.RelayerMessage) (provider.RelayerMessage, error) {
	msg := getCosmosMsg[*chantypes.MsgRecvPacket](msgRecvPacket)

	key := host.PacketCommitmentKey(msg.Packet.SourcePort, msg.Packet.SourceChannel, msg.Packet.Sequence)
```
Do we wanna handle retries here or upstream? Prob upstream
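If retries do end up living upstream, the wrapper could be something as simple as the generic sketch below (not existing relayer code):

```go
package sketch

import (
	"context"
	"time"
)

// retry runs op up to attempts times with a flat delay between tries,
// stopping early if the context is cancelled.
func retry(ctx context.Context, attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
	}
	return err
}
```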
As an update per conversation outside this PR, I am going to break down this PR into smaller chunks. I am planning on a first PR with the minimal Chain and Path processors so we can decide on an optimal relationship between the two. Runtime consists of multiple chain processors and potentially multiple path processors, with both multiple path processors per chain processor and multiple chain processors per path processor, so we can experiment with different routing and concurrency strategies that will allow the parallelization needed between them.
All of this functionality is now present; it was handled by smaller PRs that revised all of these components.
Refactors main query loop to a new event-based processing method based on processing loops for the involved chains.

For the cosmos specific implementation:

Drastically reduces query load on the RPC nodes. Now the query loop is slimmed down to (per chain):

Now no other queries are necessary for determining whether packets need to be relayed, and also for constructing the `MsgUpdateClient`. For constructing packet relay messages (`MsgRecvPacket`, `MsgAcknowledgement`, `MsgTimeout`, `MsgTimeoutOnClose`), just need one additional lightweight query per packet to get the tendermint proof from the counterparty chain.

High level constructs:

`ChainProcessor` (interface)
- One will be run per chain involved in the desired paths. Specific implementation should be provided for different chain types. Responsibilities of the `ChainProcessor` implementation:
  - `PathProcessors` where applicable
  - `PathProcessor`s for closed channels

`CosmosChainProcessor` implementation:
- caches things as blocks are polled to prevent additional queries when relaying:
  - `MsgUpdateClient` messages for counterparty chain

`PathProcessor` (struct)
- `pathEnd1` and `pathEnd2`, linking up to the `ChainProcessor` for each PathEnd.
- `PathEndRuntime` maintains the state of IBC packets for the given side of the path (`MsgRecvPacket`, `MsgAcknowledgement`, `MsgTimeout`, `MsgTimeoutOnClose`).
- `InSync` (query loop is up to date with the chain)
- `MsgUpdateClient` to prepend all non-empty message sends.

The `ChainProcessor` currently holds reference to the `ChainProvider`. The `ChainProvider` can probably now be slimmed down quite a bit, but the methods are still quite useful for the CLI methods, so will need to determine whether that makes sense going through the `ChainProcessor` or not. Consolidation can definitely be improved between the `ChainProcessor` and `ChainProvider` to DRY things up.

Other stuff:
Resolves #721
Resolves #726
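To make the division of responsibilities concrete, here is a loose sketch of what a minimal ChainProcessor interface along these lines could look like; the method names are illustrative, not taken from the PR:

```go
package sketch

import "context"

// ChainProcessor is an illustrative shape for the per-chain processing unit.
type ChainProcessor interface {
	// Run starts the per-chain polling/processing loop and blocks until ctx
	// is cancelled or an unrecoverable error occurs.
	Run(ctx context.Context) error

	// InSync reports whether the processor has caught up with the chain tip.
	InSync() bool
}
```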