Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hermes resubmits already received packet #1608

Closed
5 tasks
lyqingye opened this issue Nov 24, 2021 · 12 comments
Closed
5 tasks

Hermes resubmits already received packet #1608

lyqingye opened this issue Nov 24, 2021 · 12 comments
Labels
A: bug Admin: something isn't working A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews A: question Admin: further information is requested I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC
Milestone

Comments

@lyqingye
Copy link

lyqingye commented Nov 24, 2021

Crate

hermes

Summary of Bug

 worker: schedule execution encountered error: failed with underlying error: gRPC call failed with status: status: InvalidArgument, message: "failed to execute message; message index: 2: receive packet verification failed: packet sequence (8472) already has been received: invalid packet: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

Ref: #1479

This bug still appears with v0.9.0

Version

v0.9.0

Steps to Reproduce

Acceptance Criteria


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere
Copy link
Member

adizere commented Nov 29, 2021

Hey, thanks for submitting this bug report.

The log output is quite minimal, it's not clear if this is an error from gas simulation step (which is OK), or from tx submission step (which would be a bug indeed).

Can you provide more context to your logs? Thank you.

@adizere adizere added A: bug Admin: something isn't working I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC A: question Admin: further information is requested labels Nov 29, 2021
@adizere adizere added this to the v0.9.3 milestone Nov 29, 2021
@lyqingye
Copy link
Author

Hey, adizere
The direct cause of this error is send msgs-batch (estimate_gas) The MsgRecvPacket message with the same sequence is contained in batch.

@lyqingye
Copy link
Author

lyqingye commented Nov 30, 2021

I try to add log code at send_tx like that

fn send_tx(&mut self, proto_msgs: Vec<Any>) -> Result<Response, Error> {
        for msg in &proto_msgs {
            if msg.type_url == "/ibc.core.channel.v1.MsgRecvPacket" {
                let packet: MsgRecvPacket  = prost::Message::decode(msg.value.as_slice()).unwrap();
                println!("{:?}",packet);
            }
        }

when this error occurs,the output of the above code is

MsgRecvPacket { packet: Some(Packet { sequence: 7, source_port: "oracle", source_channel: "channel-0",.......
MsgRecvPacket { packet: Some(Packet { sequence: 7, source_port: "oracle", source_channel: "channel-0",.......

this method is called twice in the same NewBlock, I can't reproduce the problem, but it does exist
https://github.com/informalsystems/ibc-rs/blob/master/relayer/src/event/rpc.rs#L13

@adizere
Copy link
Member

adizere commented Nov 30, 2021

I see, thank you!

One more question: Do you happen to know if this problem with duplicate sequences appears for a certain network? Or did you notice the problem for any network?

Any additional feedback would be welcome, since it still seems difficult for us to reproduce & fix this based on the discussion so far.

@lyqingye
Copy link
Author

I see, thank you!

One more question: Do you happen to know if this problem with duplicate sequences appears for a certain network? Or did you notice the problem for any network?

Any additional feedback would be welcome, since it still seems difficult for us to reproduce & fix this based on the discussion so far.

bandprotocol , it produce ibc-packet at Endblock , only in this scenario can the above situation occur. At present, I am gradually narrowing the scope of bugs

@romac romac changed the title [bug] resubmit already received packet Hermes resubmits already received packet Nov 30, 2021
@lyqingye
Copy link
Author

lyqingye commented Dec 1, 2021

add some log at get_all_events

pub fn get_all_events(
    chain_id: &ChainId,
    result: RpcEvent,
    latest_height: u64,
) -> Result<Vec<(Height, IbcEvent)>, String> {
    let mut vals: Vec<(Height, IbcEvent)> = vec![];

    match &result.data {
        RpcEventData::NewBlock { block, .. } => {

            let height = Height::new(
                ChainId::chain_version(chain_id.to_string().as_str()),
                u64::from(block.as_ref().ok_or("tx.height")?.header.height),
            );

            if height.revision_height <= latest_height {
                return Ok(vals);
            }
            tracing::info!("Chain:{:?} height:{:?}",chain_id,height);

log output, same thread call twice

2021-12-01T02:07:15.849061Z  INFO ThreadId(16) Chain:ChainId { id: "bandchain", version: 0 } height:Height { revision: 0, height: 2279 }
2021-12-01T02:07:15.852507Z  INFO ThreadId(16) Chain:ChainId { id: "bandchain", version: 0 } height:Height { revision: 0, height: 2279 }

now, I use a temporary solution: filter the same messages when sending msgs in same batch.
this is not a good solution. I will continue to track this problem when I have time

@lyqingye
Copy link
Author

lyqingye commented Dec 2, 2021

Hey @adizere I found the cause of the problem

https://github.com/informalsystems/ibc-rs/blob/master/relayer/src/event/monitor.rs#L103

 pub fn all() -> Vec<Query> {
        // Note: Tendermint-go supports max 5 query specifiers!
        vec![
            new_block(),
            ibc_client(),
            ibc_connection(),
            ibc_channel(),
            // This will be needed when we send misbehavior evidence to full node
            // Query::eq("message.module", "evidence"),
        ]
    }

hermes EventMonitor Subscribe NewBlock and Channel Events Topic, Bandchain Produce SendPacket Ibc Event at NewBlock Events, so Hermes captured the same SendPacket event from both topics.

Try the following steps:

  1. try subscribe ws://rpc.laozi1.bandchain.org/websocket (I using postman to subscribe websocket)
{
    "jsonrpc": "2.0",
    "method": "subscribe",
    "id": 0,
    "params": {
        "query": "message.module='ibc_channel'"
    }
}
  1. view log output when dest chain produce IbcPacket at BeginBlock/EndBlock
{ "jsonrpc": "2.0", "id": 0, "result": { "query": "message.module='ibc_channel'", "data": { "type": "tendermint/event/NewBlockHeader", "value": { "header":

{ "jsonrpc": "2.0", "id": 0, "result": { "query": "message.module='ibc_channel'", "data": { "type": "tendermint/event/NewBlock", "value": { "block":

{ "jsonrpc": "2.0", "id": 0, "result": { "query": "message.module='ibc_channel'", "data": { "type": "tendermint/event/Tx", "value": 

conclution:

when SendPacket produce at BeginBlock/EndBlock and websocket subscribe condition is "message.module='ibc_channel'"
you will get the NewBlock Event.
so when you subscribe new_block and message.module='ibc_channel'" topic at same time, you will get two NewBlock events, so Hermes parses the two NewBlock events and gets two identical SendPacket events.

It seems that this problem only occurs in the lower version of Cosmos-SDK, which is not found in the latest Cosmos-Hub

solution

filter RpcEvent.query string at get_all_events

pub fn get_all_events(
    chain_id: &ChainId,
    result: RpcEvent,
) -> Result<Vec<(Height, IbcEvent)>, String> {
    let mut vals: Vec<(Height, IbcEvent)> = vec![];

    match &result.data {
        RpcEventData::NewBlock { block, .. } => {
           // filter NewBlock query string
            if result.query == queries::new_block().to_string() {
                let height = Height::new(
                    ChainId::chain_version(chain_id.to_string().as_str()),
                    u64::from(block.as_ref().ok_or("tx.height")?.header.height),
                );

                vals.push((height, ClientEvents::NewBlock::new(height).into()));
                if let Some(events) = &result.events {
                    let ibc_events =
                        send_packet_from_block_events(height, events.clone().into_iter().collect());
                    if !ibc_events.is_empty() {
                        vals.extend(ibc_events);
                    }
                }
            }
        }

now hermes works normally!
I'm a beginner. My solution is temporary, If there is a better solution, please share!

@adizere
Copy link
Member

adizere commented Dec 6, 2021

so when you subscribe new_block and message.module='ibc_channel'" topic at same time, you will get two NewBlock events, so Hermes parses the two NewBlock events and gets two identical SendPacket events.

That's quite a subtle bug. Your solution is pretty good I would say, and the debugging part even more so, thanks for sharing these findings with us!

It seems that this problem only occurs in the lower version of Cosmos-SDK, which is not found in the latest Cosmos-Hub

This is very interesting, and makes me wonder if it should be a priority to include a permanent fix in Hermes for this problem, if at all. It's a matter of time until Bandchain also updates their SDK version and then the problem will disappear. That being said, we are pushing to releasing a couple of major improvements in a new version, and it's unlikely we'll have the capacity to fix this in the next release (#1648), but we'll keep it on our radar.

I assume you're running a forked version of Hermes. So it's not a problem if this fix is delayed, no? Let me know otherwise and maybe we can include it. Thank you for you understanding.

@dualsystems
Copy link
Contributor

dualsystems commented Dec 7, 2021

@lyqingye
Copy link
Author

lyqingye commented Dec 7, 2021

so when you subscribe new_block and message.module='ibc_channel'" topic at same time, you will get two NewBlock events, so Hermes parses the two NewBlock events and gets two identical SendPacket events.

That's quite a subtle bug. Your solution is pretty good I would say, and the debugging part even more so, thanks for sharing these findings with us!

It seems that this problem only occurs in the lower version of Cosmos-SDK, which is not found in the latest Cosmos-Hub

This is very interesting, and makes me wonder if it should be a priority to include a permanent fix in Hermes for this problem, if at all. It's a matter of time until Bandchain also updates their SDK version and then the problem will disappear. That being said, we are pushing to releasing a couple of major improvements in a new version, and it's unlikely we'll have the capacity to fix this in the next release (#1648), but we'll keep it on our radar.

I assume you're running a forked version of Hermes. So it's not a problem if this fix is delayed, no? Let me know otherwise and maybe we can include it. Thank you for you understanding.

Yes, I running a forked version of Hermes, but I still hope Hermes can permanently fix the problem. because this is a problem caused by the SDK and will affect all chains using this version.

@adizere adizere added the A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews label Jan 21, 2022
@romac romac modified the milestones: v0.11.1, v0.12.0 Feb 2, 2022
@romac
Copy link
Member

romac commented Feb 8, 2022

@lyqingye Can you please test with this branch and see if it fixes the problem? (cf. #1844)

@lyqingye
Copy link
Author

lyqingye commented Feb 11, 2022

#1844

I didn't try the code for this branch, but I reviewed it #1852 The code is consistent with the code I repaired, so it seems to be no problem

@romac romac closed this as completed Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: bug Admin: something isn't working A: low-priority Admin: low priority / non urgent issue, expect longer wait time for PR reviews A: question Admin: further information is requested I: logic Internal: related to the relaying logic I: rpc Internal: related to (g)RPC
Projects
None yet
Development

No branches or pull requests

3 participants