Excessively large number of inbound /ipfs/id/push/1.0.0 streams with v0.21.0-rc1 #9957

Closed
mrd0ll4r opened this issue Jun 15, 2023 · 4 comments · Fixed by #9959
Labels
kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP

Comments

@mrd0ll4r
Contributor

mrd0ll4r commented Jun 15, 2023

Installation method

built from source

Version

Compiled from tag v0.21.0-rc1 with Go 1.20.5:

Kubo version: 0.21.0-rc1
Repo version: 14
System version: amd64/linux
Golang version: go1.20.5

Config

# Modified as such:

ipfs config profile apply server

ipfs config --bool 'Swarm.ResourceMgr.Enabled' false

ipfs config --json 'Swarm.ConnMgr' '{
  "GracePeriod": "0s",
  "HighWater": 100000,
  "LowWater": 0,
  "Type": "basic"
}'

ipfs config --bool 'Swarm.RelayService.Enabled' false
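
For reference, a quick way to double-check what the daemon actually ended up with (just a sanity check; jq is an extra tool here, not part of kubo):

# Dump the effective config and pick out the sections modified above
ipfs config show | jq '{ConnMgr: .Swarm.ConnMgr, ResourceMgr: .Swarm.ResourceMgr, RelayService: .Swarm.RelayService}'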

Description

I'm that guy running https://grafana.monitoring.ipfs.trudi.group
This is our setup.
In particular, we run two daemons in docker-compose, see here.
The images are built using this Dockerfile
and configured using this script.

I recently moved from v0.18.1 to v0.21.0-rc1. I did not change the config mods I have been running before. We have a plugin to export Bitswap messages and information from the Peerstore (this is called every few minutes by an external client). We also export information about the peer store to Prometheus, see here.

It's mostly running fine, although with fewer connections than before; that's probably just a question of time.
I noticed, however, that I'm approaching 1M goroutines per daemon, which is quite a bit more than before, see here.
I believe this might be connected to the number of inbound /ipfs/id/push/1.0.0 streams I have, see here.
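
A rough way to see this from the CLI, in case it's useful (a sketch, assuming the API listens on the default 127.0.0.1:5001; as far as I know the swarm command counts streams in both directions, it doesn't split inbound/outbound):

# Count currently open identify-push streams across all peers
ipfs swarm peers --streams | grep -c '/ipfs/id/push/1.0.0'

# Current goroutine count, via kubo's pprof endpoint on the API address
curl -s 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=1' | head -n 1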

Interestingly, the (linear-with-time) rise in inbound streams does not happen immediately when we start the daemons, and not at the same time for both daemons, although they were started within seconds of each other; see this graph. The second daemon follows a few hours later. Because the symptoms don't show up at the same time in both daemons, it doesn't feel like this is directly related to our regular data exports. It feels more like some concurrency bug in kubo that shows up only after a while. This is the graph in question, in case Grafana doesn't work:
[graph: inbound /ipfs/id/push/1.0.0 stream counts for both daemons over time]
The daemons did not restart in between (there's a panel for that somewhere).

Not too sure what's going on here. Let me know if I can help debug. I wonder if this is related to how we're exporting data from the Peerstore -- we're only using public functionality; was there some API change I missed, some cleanup or something? I will try running without our client for a while.
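
If it helps, this is roughly what I can pull from the daemons for debugging (again assuming the default API address):

# Full goroutine dump, to see which handlers the goroutines are parked in
curl -s 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2' -o goroutines.txt

# Or the full diagnostic bundle kubo can collect (CPU, heap, goroutine profiles)
ipfs diag profile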

@mrd0ll4r mrd0ll4r added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Jun 15, 2023
@Jorropo
Contributor

Jorropo commented Jun 15, 2023

Thanks for the report, we know about it: libp2p/go-libp2p-kad-dht#849. We have a PR on the way.

This very heavily degrades the node's ability to handle traffic.

@Jorropo
Contributor

Jorropo commented Jun 15, 2023

@mrd0ll4r on an unrelated note: there is a regression in go1.20.5, so we are still building with 1.19.10 for the moment (golang/go#60674). You shouldn't be affected too badly; AFAIK right now it only prevents ipfs add with binary file names from working properly.
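
If you want to check what a given binary was built with, go version -m prints the toolchain and module versions (nothing kubo-specific, works on any Go binary):

# Show which Go toolchain and kubo version the installed binary was built with
go version -m "$(command -v ipfs)" | head -n 3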

@Jorropo Jorropo added P0 Critical: Tackled by core team ASAP and removed need/triage Needs initial labeling and prioritization labels Jun 15, 2023
Jorropo added a commit that referenced this issue Jun 15, 2023
Streams used to be blocked on ping IO because we didn't handle the DHT ping check asynchronously.

Fixes #9957
Jorropo added a commit that referenced this issue Jun 15, 2023
Streams used to be blocked on ping IO because we didn't handle the DHT ping check asynchronously.

Include fixes from libp2p/go-libp2p-kad-dht#851
Fixes #9957
hacdias pushed a commit that referenced this issue Jun 15, 2023
Streams used to be blocked on ping IO because we didn't handle the DHT ping check asynchronously.

Include fixes from libp2p/go-libp2p-kad-dht#851
Fixes #9957
@mrd0ll4r
Contributor Author

@Jorropo oh! I was wondering why you were building with 1.19, but didn't find anything obvious in the issues. Thanks for letting me know! I'll do that too, then :)

@Jorropo
Contributor

Jorropo commented Jun 15, 2023

I don't think it's obvious; it's a small edge case that the CI caught. There might be more.
