Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ipfs/go-bitswap] Failure to fetch content from connected nodes #83

Open
vdamle opened this issue Aug 30, 2021 · 16 comments
Open

[ipfs/go-bitswap] Failure to fetch content from connected nodes #83

vdamle opened this issue Aug 30, 2021 · 16 comments
Labels
help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up

Comments

@vdamle
Copy link

vdamle commented Aug 30, 2021

Environment

IPFS Version:

go-ipfs version: 0.9.1-dc2715a
Repo version: 11
System version: amd64/linux
Golang version: go1.15.2

We are running a private IPFS cluster with 2 or more nodes (docker/k8s)

Description

As an example, we've got a network with 2 nodes - node1 and node2. After startup, we observe that node2 cannot fetch content from node1 (a bootstrap peer) because it is not registered as a Partner. We see on node1 that node2 is registered as a Peer/Partner. I tried adding some additonal debug logs and best I could tell, the steps to register a BitSwapNetwork Receiver to receive events from the network was not complete before node2 received a "connected" notification from node1. It seems to me like this is a race condition and we see this occur 90% of the time, when we spin up nodes (ref: https://github.com/ipfs/go-bitswap/blob/0e81f7c1fef795a8d34cdd494fa2d8fdc045d199/bitswap.go#L215 ). As a throw of dice, I tried to move up the call before any options/decision engine is setup, this did not make a whole lot of difference to the outcome.

Workaround

  • Execute ipfs swarm peer disconnect <address>
  • Execute ipfs swarm peer connect <address>

We can see that the PeerConnected handler is invoked from Connected, which adds the Peer to the peer list. After this, WANT_HAVE is sent as expected to the peer and content is retrieved.

PS: Originally, I created the following issue in go-ipfs, but after further investigation, it feels like go-bitswap is the right place to report this.

I'd love to hear any suggestions/thoughts for how this can be addressed since I have a limited view of the architecture/intent for how connections are supposed to work. I have attached logs and output of commands from both nodes.

ipfs-config-node1.txt
ipfs-config-node2.txt
ipfs-id-node1.txt
ipfs-id-node2.txt
ipfs-node1.log
ipfs-node2.log

@vdamle vdamle added the need/triage Needs initial labeling and prioritization label Aug 30, 2021
@welcome
Copy link

welcome bot commented Aug 30, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Stebalien
Copy link
Member

We usually handle this by setting up the node as follows:

  1. Instantiate the libp2p node, but don't start listening on addresses.
  2. Instantiate any services that need libp2p.
  3. Finally, start listening.

However, ideally bitswap would add all existing peers on startup. If you want to write a quick patch to do that, I'd be happy to take a look.

@BigLep BigLep added need/author-input Needs input from the original author and removed need/triage Needs initial labeling and prioritization labels Sep 24, 2021
@BigLep
Copy link
Contributor

BigLep commented Sep 24, 2021

@vdamle : is this a patch you're interested in contributing?

@vdamle
Copy link
Author

vdamle commented Sep 24, 2021

Sorry, I missed the update @BigLep / @Stebalien - thanks for the input!

I'd love to contribute a patch.

@Stebalien : Here's what I'm thinking:

Please let me know if I'm on the right track. In the meantime, I will try to make the above changes and test in my private network deployment.

@Stebalien
Copy link
Member

So, I tried to implement it and realized this just isn't going to work, or at least not "well" because your libp2p host may have received connections before it's listening on the bitswap protocol.

The correct fix is to:

  1. Wait for "identify" and "protocol changed" events on the libp2p event bus instead of connect events. That way, two peers will only "see" each other once they start listening on the bitswap protocol, not when they first connect.
  2. On start, iterate through all peers and check to see if they speak the bitswap protocol. If so, simulate an event as in step 1.

But this needs to be done carefully to make sure it's somewhat atomic.

@BigLep
Copy link
Contributor

BigLep commented Oct 1, 2021

@vdamle: do you have a better idea of what a patch would need to look like given @Stebalien 's feedback?

@BigLep BigLep added help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up labels Jan 7, 2022
@BigLep
Copy link
Contributor

BigLep commented Jan 7, 2022

@vdamle : is this something you'd be to contribute?

2022-01-07 conversation: @aschmahmann ran into this as well in 202112 while working on Testground/integration testing. Not fixing this means tests have to work around this issue.

@BigLep BigLep removed the need/author-input Needs input from the original author label Mar 25, 2022
@peterargue
Copy link

@BigLep @Stebalien I took a stab at this in ipfs/go-bitswap#590. Am I on the right track?

@BigLep BigLep added the need/author-input Needs input from the original author label Oct 11, 2022
@github-actions
Copy link

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@peterargue
Copy link

what additional information is needed?

@BigLep
Copy link
Contributor

BigLep commented Oct 18, 2022

@peterargue : apologies for the confusion. It looks like we need to do some bot adjustments for when the ball is back in the maintainer's court. We'll need maintainers to look at this (next triage is scheduled for 2022-10-20).

@peterargue
Copy link

@BigLep no worries and thanks for the update

@peterargue
Copy link

@BigLep any idea when the team will have some time to take a look at this PR? FWIW, we've been running with these changes on a live network for the last 4 weeks with great results. All of the correct peers are added on startup and updated as expected.

@BigLep
Copy link
Contributor

BigLep commented Nov 27, 2022

Hi @peterargue : I personally haven't been at the most recent triage meetings to give more insight.

@guseggert : has this come up during weekly triage? Do we have any initial feedback here?

@Jorropo Jorropo changed the title Failure to fetch content from connected nodes [ipfs/go-bitswap] Failure to fetch content from connected nodes Jan 27, 2023
@welcome
Copy link

welcome bot commented Jan 27, 2023

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Jorropo Jorropo transferred this issue from ipfs/go-bitswap Jan 27, 2023
@github-actions
Copy link

github-actions bot commented Feb 3, 2023

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@BigLep BigLep removed need/author-input Needs input from the original author kind/stale labels Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up
Projects
None yet
Development

No branches or pull requests

4 participants