-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add error log when resource manager throttles crawler #772
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me, don't have time to test this out at the moment (somewhat necessary given the abysmal test coverage of the accelerated DHT client 😬). Should be able to later today or tomorrow.
If you want another set of hands to run tests to see how helpful this is at finding the limits @Jorropo may be free.
fullrt/dht.go
Outdated
func(p peer.ID, err error) { | ||
if errors.Is(err, network.ErrResourceLimitExceeded) { | ||
limitErrOnce.Do(func() { | ||
logger.Errorf("Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for details.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it look when testing this out within go-ipfs? Have you been able to use this mechanism to discover what we should raise the default limits to such that this should not trigger for people?
See debug logs for details.
What types of logs are you expecting people are going to examine? For example, do they need to look at the dht-crawler
logs rather than the fullrt
ones, do they need to look at rcmgr
logs, etc.
I'm not sure there's enough info here for a user to "figure it out" more than just filing an issue on this repo. If we need to add text to the README or put a link to a wiki for this repo we can do that, otherwise if there's some simple text we can use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it look when testing this out within go-ipfs? Have you been able to use this mechanism to discover what we should raise the default limits to such that this should not trigger for people?
So the logs look like this (next to ipfs/kubo#8980):
2022-05-18T11:34:24.234-0400 ERROR fullrtdht fullrt/dht.go:304 Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for details.
2022-05-18T11:34:32.201-0400 ERROR resourcemanager libp2p/rcmgr_logging.go:51 Resource limits were exceeded 55405 times, consider inspecting logs and raising the resource manager limits.
There aren't details about specifically which kind of limit was hit (outbound conns, streams, mem, FDs), and at which scope, so that error alone is not enough to know what limits need to be raised. I can add more brains to ipfs/kubo#8980 to try to come up with some recommendation but that will take some time, since we'd need to register DHT interactions w/ libp2p as a proper resource manager "system". I figured we could release it like this and see how frequently this happens (with raised default limits).
What types of logs are you expecting people are going to examine? For example, do they need to look at the dht-crawler logs rather than the fullrt ones, do they need to look at rcmgr logs, etc.
For these specific errors related to refreshing the accelerated DHT client's routing table, yeah it's just dht-crawler
. I can add that into the error message, since this is specific to that case.
The accelerated DHT client more broadly uses at least three logging subsystems (dht-crawler, fullrtdht, dht) where RM errors might surface.
Since this could leave the Accelerated DHT client in a degraded state, due to not being able to completely populate its routing table, we want to signal this to the user until we can modify the client to degrade more gracefully when hitting resource limits. Note that this logs only once per crawl, to avoid spamming the user.
3a1113f
to
b874f6b
Compare
Since this could leave the Accelerated DHT client in a degraded state,
due to not being able to completely populate its routing table, we
want to signal this to the user until we can modify the client to
degrade more gracefully when hitting resource limits.
Note that this logs only once per crawl, to avoid spamming the user.