Message gets dropped because ttl reaches 0 #2849

Closed
bowenwang1996 opened this issue Jun 14, 2020 · 5 comments

@bowenwang1996
Collaborator

bowenwang1996 commented Jun 14, 2020

We've seen on betanet the following:

Jun 12 05:20:50.928  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:7s9EHT2cx55MzcLvJzUTr9yzvoxroy2ktpEQTrVDk4pa, signature: ed25519:4p1xu4YpLf25Jkvg5rPdsJQqkCYuaSdugc32RQkrPTxDc9wsCeHXkt1PHt9jwLqEfG8kU42oqRiJGjcT5o2pjfgz, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`G1ArxgVond2oD8KAgnFcxPebFn7bG1nrq6saLxwjSdAw`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z
Jun 12 05:20:50.994  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:37ye9asfGMRYej5gkVugYPLZT1hCWmjf6YGPpWnisGPW, signature: ed25519:4zTxideW33AgTj6TtNXu5E3f2TaHQj7mrxV4CYW7h5wibEmznW2sSjRSHbJcCZAyu6QkkYagsxxFDwf4YijCWzyQ, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`GHmRRLPkiN9X6g2HXX9Sp77Mwgo3f8CspQqw4Evthp5i`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z
Jun 12 05:20:51.118  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:Dq9JMFJ4n5rW1X7oXXQxPXHcH99f2j9ueBVqHpf435WM, signature: ed25519:XaNevh3BYnEYu4rUB7iCBK5f7zKKF9YnZwSFMzBi8Yc5xR9sLNQoAhJGfQdUD13WPZ2FA6c5UfZm7QUgPe4BEiH, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`DdmuW199Mkn3eePmPQ1vzMCrQDRqTsSJ8yQswmT62y9r`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z
Jun 12 05:20:51.174  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:Ar9Gpj7J2H5acqDJDUmGg3VRWhbc8Xtmydt77JV8fBQb, signature: ed25519:4f4UzazyqpydUFiQVxNnQzUXMuP47VXzxAcdTB7nS4CXU3wYsPoZ4g7cETxNkCS9TaX6y4RVyq6QMcogGNcpkmMF, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`6cc1kQMAYvxgoK5DP5NznK9u633fBdN3Cynn8eXcAP3M`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z
Jun 12 05:20:51.198  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:FFfFQvcroRq56dB2BoKYxeJG8hr77C5iSJrQjnrqBBR3, signature: ed25519:Yy1tNUuSyw2dp5E4C2eJmnQgsupdfYic2PmrQmmTX7pBaRHBE4NAgwKfHZLE4yUNLCktSKMZne2d2WsZ1vGKnDT, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`DdmuW199Mkn3eePmPQ1vzMCrQDRqTsSJ8yQswmT62y9r`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z
Jun 12 05:20:51.391  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:37ye9asfGMRYej5gkVugYPLZT1hCWmjf6YGPpWnisGPW, signature: ed25519:3aTgiNQuBsaphSYSMXvowUsbAUEAu1bdaiXgTtHyELkhuZsSqs3jSFEPKfBttMnwLminQbYZXwhaJBgPX5JezrG, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`G1ArxgVond2oD8KAgnFcxPebFn7bG1nrq6saLxwjSdAw`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z

Jun 12 05:20:51.522  WARN network: Message dropped because TTL reached 0. Message: RoutedMessage { target: PeerId(ed25519:C6fKqCcFDbHm9DPTN2CR5QeembdsF7zcrwALyj2bjXMw), author: ed25519:FGoGjEhfprqkhNMhumK2DHgdr7Hf7Vww3EW97kKGL5fN, signature: ed25519:3MA8d2JVqLyge5siuV8EaJPJo3gJg4E8USzK5VwGqby746e1Xrunr8HuCNQz47NfeU5L2FwKBxP1sFvCK5kHPE4o, ttl: 0, body: PartialEncodedChunkRequest(PartialEncodedChunkRequestMsg { chunk_hash: ChunkHash(`HVtzQyHb6EF6XPrsayu5DRKqhCayqMhgB1fMkTjkAw43`), part_ords: [49], tracking_shards: {} }) } From: ed25519:3ZjWJWFCxoFpdco7NFdmXVJ3tijt86XcUnGEKm9eUW9z

It seems that there is some routing loop in the network, but it's not clear to me why that would happen.

@bowenwang1996 bowenwang1996 added the A-network Area: Network label Jun 14, 2020
@mfornet mfornet added the C-bug Category: This is a bug label Jun 14, 2020
@bowenwang1996
Collaborator Author

@mfornet any updates? We need to understand whether we should address this before phase 1.

@mfornet
Member

mfornet commented Jun 22, 2020

> @mfornet any updates? We need to understand whether we should address this before phase 1.

I've been investigating this. It looks like a problem with the routing table that creates a cycle, but I haven't been able to track down the root cause yet. Is this happening consistently on some node?

@bowenwang1996
Collaborator Author

No. At least I have seen this reported only once, but that is not (and should not be) an argument for the severity (or the lack thereof) of the issue.

@mfornet
Member

mfornet commented Jun 23, 2020

@bowenwang1996 the simplest scenario where this might happen is three nodes A, B, and C, all connected to each other, where C goes offline. A and B each learn immediately that their own connection to C has dropped, but it takes "a moment" for each to learn that the other has also lost its connection to C. In that window A sends a routed message for C via B, and B sends it back to A (to route it to C). The message bounces between them until its TTL reaches 0 and it is dropped.
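
The three-node scenario above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the nearcore routing code: `Outcome`, `Tables`, `route`, and `tables_from` are hypothetical names, nodes are single characters, and each relay is assumed to decrement the TTL and drop the message once it reaches 0.

```rust
use std::collections::HashMap;

/// Hypothetical result of routing one message (not a nearcore type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Delivered { hops: u32 },
    DroppedTtlZero { hops: u32, dropped_by: char },
    NoRoute { at: char },
}

/// tables[node][target] is the neighbour `node` currently believes leads to `target`.
type Tables = HashMap<char, HashMap<char, char>>;

/// Walks a message from `source` towards `target`, one hop per loop iteration.
/// Assumption: every relay decrements the TTL and drops the message at 0.
fn route(tables: &Tables, source: char, target: char, mut ttl: u8) -> Outcome {
    let mut holder = source;
    let mut hops = 0;
    loop {
        // Look up where the current holder thinks the target can be reached.
        let next = match tables.get(&holder).and_then(|t| t.get(&target)) {
            Some(&next) => next,
            None => return Outcome::NoRoute { at: holder },
        };
        hops += 1;
        if next == target {
            return Outcome::Delivered { hops };
        }
        // `next` is only a relay: it spends one unit of TTL before forwarding.
        ttl = ttl.saturating_sub(1);
        if ttl == 0 {
            return Outcome::DroppedTtlZero { hops, dropped_by: next };
        }
        holder = next;
    }
}

/// Builds routing tables from (node, target, next_hop) triples.
fn tables_from(entries: &[(char, char, char)]) -> Tables {
    let mut tables = Tables::new();
    for &(node, target, next) in entries {
        tables.entry(node).or_default().insert(target, next);
    }
    tables
}
```

With stale tables `tables_from(&[('A', 'C', 'B'), ('B', 'C', 'A')])`, a message from A to C never reaches C: `route` returns `DroppedTtlZero` after exactly as many hops as the initial TTL. Once both tables have dropped their entry for C, the same call returns `NoRoute` at A instead of looping.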

Something similar can happen with many more nodes when a node goes offline. Routing table synchronization should happen fairly fast (on the order of seconds), so this issue will be present only for a short period of time, and most of the time it will happen while trying to route a message to a disconnected peer.

I'm of course assuming those nodes are running legit (non-tampered) clients, because the easiest way to reproduce this today is simply to send a message with ttl=1, or to send routed messages through non-optimal routes.
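
To see why a message sent with ttl=1 triggers the warning at the first relay, here is a sketch of the per-hop decision. The types and the `on_receive` function are hypothetical stand-ins rather than nearcore's actual API, and the decrement-then-check order is an assumption chosen to match the `ttl: 0` shown in the logged messages.

```rust
/// Hypothetical stand-in for a routed message; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Routed {
    target: String,
    ttl: u8,
}

/// What a node does with a routed message it has just received.
#[derive(Debug, PartialEq)]
enum Action {
    Accept,
    Forward(Routed),
    Drop(Routed),
}

/// Assumption: a relay decrements the TTL first and forwards only if it is still positive.
fn on_receive(me: &str, mut msg: Routed) -> Action {
    if msg.target == me {
        // The message is addressed to this node, so the TTL no longer matters.
        return Action::Accept;
    }
    msg.ttl = msg.ttl.saturating_sub(1);
    if msg.ttl == 0 {
        // This is the point at which a "TTL reached 0" warning would be logged.
        Action::Drop(msg)
    } else {
        Action::Forward(msg)
    }
}
```

Under these assumptions, a relay B that receives a message for C with `ttl: 1` returns `Action::Drop` carrying `ttl: 0`, while the same message with a larger TTL is forwarded with the TTL reduced by one.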

@mfornet
Member

mfornet commented Jun 23, 2020

@bowenwang1996 I'm closing this, since this behavior is "expected" in some situations. Reopen if we see a high number of dropped messages for this reason.

@mfornet mfornet closed this as completed Jun 23, 2020