-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rcmgr: the last charge #9680
rcmgr: the last charge #9680
Conversation
Blocked on libp2p/go-libp2p#2155 |
core/node/libp2p/rcmgr_defaults.go
Outdated
} | ||
|
||
maxMemoryMB := maxMemory / (1024 * 1024) | ||
maxFD := int(cfg.ResourceMgr.MaxFileDescriptors.WithDefault(int64(fd.GetNumFDs()) / 2)) | ||
|
||
// We want to see this message on startup, that's why we are using fmt instead of log. | ||
fmt.Printf(` | ||
msg := fmt.Sprintf(` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message needs to be updated, like we don't apply user-supplied overrides on top, and "ipfs swarm limit all" is no longer a valid command.
I think it'd be fine to just remove this entirely though, does it really add much value anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the value of the message is that it makes clear what maxMemoryMB value is being used for computing defaults. I'm torn whether to remove it, because this has been source of confusion before, but maybe it was more confusing with all the other stuff going on. We could remove for now.
If we do keep this message, we should:
- talk about
ipfs swarm resources
- remove the user-supplied overrides on top
- document what the returned string is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look again
core/node/groups.go
Outdated
@@ -340,9 +338,9 @@ var Core = fx.Options( | |||
fx.Provide(Files), | |||
) | |||
|
|||
func Networked(bcfg *BuildCfg, cfg *config.Config) fx.Option { | |||
func Networked(bcfg *BuildCfg, cfg *config.Config, userRessourceOverrides rcmgr.PartialLimitConfig) fx.Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func Networked(bcfg *BuildCfg, cfg *config.Config, userRessourceOverrides rcmgr.PartialLimitConfig) fx.Option { | |
func Networked(bcfg *BuildCfg, cfg *config.Config, userResourceOverrides rcmgr.PartialLimitConfig) fx.Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I had no bandwidth to review code, but small ask about end user error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @Jorropo for taking this on and helping us get this over the line.
A few high level comments:
- After incorporating comments, please add example output in the PR from running this code. That helps validate that we're getting what's expected.
- I'll take a crack at adding some commits for the documentation side. That should also help ensure we're aligned here on the desired output.
- I was surprised not to see a call to
rcmgr. NewLimiterFromJSON
. That was the idea behind Resource Manager: RemoveResourceMgr.Limits
configuration from Kubo and use the default libp2p JSON file for it #9603. We instead seem to be plumbing through "UserResrouceLimitOverrides" in various places. Instead can we just say that "if you provide limits.json" we take a hands off approach and let the user do whatever they like. At that point a user is skiing out of bounds and assume all liability.
- As an aside, given there is nothing special about the name "limits.json" lets come up with a clearer name. I have a comment about this inline.
- I recognize that there is going to need to be some back and forth here. I'll make it clear that the release is delayed.
core/commands/swarm.go
Outdated
}, | ||
Encoders: cmds.EncoderMap{ | ||
cmds.Text: cmds.MakeTypedEncoder(func(req *cmds.Request, w io.Writer, ris libp2p.ResourceInfos) error { | ||
tw := tabwriter.NewWriter(w, 30, 8, 0, '\t', 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we emit TSV anywhere else?
If the default is to do JSON output, we can always have folks use jq to convert to TSV if desired.
( .[0] | keys_unsorted), (.[] | [.[]]) | @tsv
Example snippet: https://jqplay.org/s/SsPvm8FG3ER
That said I agree a TSV table is most user-friendly and if we just want to do that, that's fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always add more encoders later, the framework we use for commands supports them both.
repo/fsrepo/fsrepo.go
Outdated
@@ -437,6 +443,16 @@ func (r *FSRepo) openConfig() error { | |||
return nil | |||
} | |||
|
|||
// openUserRessourceOverrides will remove all overrides if the file is not present. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "remove all overrides"?
I assume if the file isn't present that then we are only using computed defaults. Is that right? Can you please clarify?
Also can we add comments about why Kubo even needs to be looking at libp2p's limits.json file here? Why doesn't libp2p just use it automatically? (I think I know the answer, but I think we should be clear.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also add a link to go-libp2p limits.json docs so it's clear where this file path came from.
(That said I can't find a place to link to per https://filecoinproject.slack.com/archives/C03FFEVK30F/p1677704303811539 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so per Slack thread, there is nothing special about limits.json. My bad for not realizing that earlier. Given we get to come up with the name then lets be more self describing. Maybe something like libp2p-resource-limit-overrides.json
. Key thing is to get the libp2p
prefix and to have the name match what we agree to call this in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lotus and boost already name this file limits.json
, I like the consistency.
What do you mean by "remove all overrides"?
It gives a partial config that is empty.
The partial config object will use the concrete value when we call .Build
when a 0 is present, if I give you an object that is all zeros everywhere, all of the values will be replaced by the supplied concrete.
I'll update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed verbally using a name that is clear for Kubo users (most of which aren't paying attention to conventions in Filecoin projects like Lotus and Boost). "limits.json" is too generic as there a lot of potential limits that aren't being defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated please take a look again (when I push)
@@ -106,6 +88,8 @@ func ResourceManager(cfg config.SwarmConfig) interface{} { | |||
ropts = append(ropts, rcmgr.WithTrace(traceFilePath)) | |||
} | |||
|
|||
limiter := rcmgr.NewFixedLimiter(limitConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have thought that if there is a "limit.json" file that we do rcmgr.NewLimiterFromJSON(reader for limit.json, limitConfig)
That was the spirit of #9603. I think this is an important distinction and I have added a top-level comment on that. Getting alignment there will affect some of the implementation I suspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this verbally. @Jorropo understandably wants to have all the IO happen up front and not sneaking in later down the line. Per #9680 (comment), we are replicating the logic of cmgr.NewLimiterFromJSON
, but doing it upfront to make it clearer to reason about when I/O is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some logic questions coming up for me while writing docs and checking it against the implementation...
core/node/libp2p/rcmgr_defaults.go
Outdated
} | ||
|
||
return defaultLimitConfig.Build(orig), nil | ||
return partialLimits.Build(rcmgr.DefaultLimits.Scale(int64(maxMemory), maxFD)), msg, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few important logic changes here and I'm not sure they're right...
- Previously we were doing the autoscaling before the
if cfg.ConnMgr.Type.WithDefault(config.DefaultConnMgrType) != "none"
code that ensuresSystem.ConnsInbound
is high enough. That was giving us the logic ofmax(computed System.ConnsInbound, connmgrHighWaterTimesTwo)
. It now seems like we just getconnmgrHighWaterTimesTwo
. - Prevously we were scaling some base limits by maxMemroy and maxFD. Now we're scaling rcmgr.DefaultLimits. I'm confused why
rcmgr.DefaultLimits
is even in the picture.partialLimits
already has all the scopes defined. Why do we need DefaultLimits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- this is a mistake that slipped through while rebasing and fixing conflicts, fixed.
- partial limit is not scalled, we used to define both a base and a scalling config, Antonio changed it to now where we only define a base and then use default scalling, I don't know what are the impacts of not using a custom scalling.
// NewLimiterFromJSON creates a new limiter by parsing a json configuration.
func NewLimiterFromJSON(in io.Reader, defaults ConcreteLimitConfig) (Limiter, error) {
cfg, err := readLimiterConfigFromJSON(in, defaults)
if err != nil {
return nil, err
}
return &fixedLimiter{cfg}, nil
}
func readLimiterConfigFromJSON(in io.Reader, defaults ConcreteLimitConfig) (ConcreteLimitConfig, error) {
var cfg PartialLimitConfig
if err := json.NewDecoder(in).Decode(&cfg); err != nil {
return ConcreteLimitConfig{}, err
}
return cfg.Build(defaults), nil
} We have the same: var cfg PartialLimitConfig
if err := json.NewDecoder(in).Decode(&cfg); err != nil {
return ConcreteLimitConfig{}, err
} logic (except ours will ignore if the file is not present). |
25ab9a8
to
a78f61a
Compare
Rcmgr tests are passing on my machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jorropo . This is looking great. Thanks for the back and forth. The reasons I didn't give it an "Approve" was because:
- Missing the docs changes.
- I'd ideally like to see the output of
ipfs swarm resources
to confirm it's what we'd expect - I had one question about the tests and why we're gating some of the asserts with an if check.
That said I'm good with you merging if:
- You add the docs
- You look through my comment and incorporate where they make sense
- Are highly confident that the
ipfs swarm resources
output is what you'd expect given a value of Swarm.ResrouceMgr.MaxMemory of something like ~4GB.
If you aren't feeling confident, I'm good to review tomorrow (2023-03-03). (I realize that means the RC delays to Monday, 2023-03-03, in that case.)
de74d8e
to
bb850b3
Compare
Example output:
The I don't think this is a blocker for the rc. |
c8a8406
to
3a36dcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - thanks @Jorropo
Maybe to the resource manager .md file add something like
What do svc
and peer
scopes list a lot of blockAll
values for ipfs swarm resources
?
The svc
and peer
scopes function at the stream (not connection) level. As a result, limits for
- FD
- Conns
- ConnsInbound
- ConnsOutbound
aren't applicable for these scopes. They use the same internal struct within go-libp2p as other scopes like system, transient, etc. but have a value of "blockAll" internally which is why they are outputted in this way.
Wouldn't it be trivial to just drop those limits for |
Also in that latest example output you posted, I don't see any "unlimited" limits, so I can't verify that the result is interpretable in that case. |
limit = "unlimited" | ||
percentage = "n/a" | ||
case rcmgr.BlockAllLimit64: | ||
limit = "blockAll" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit = "unlimited" | |
percentage = "n/a" | |
case rcmgr.BlockAllLimit64: | |
limit = "blockAll" | |
limit = "Unlimited" | |
percentage = "n/a" | |
case rcmgr.BlockAllLimit64: | |
limit = "BlockAll" |
for consistency with the other enum string representations used in the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the output use unlimited
and blockAll
:
https://github.com/libp2p/go-libp2p/blob/59a14cf3194d5d057c45cb1dbc7b1af3a116bc7a/p2p/host/resource-manager/limit_defaults.go#L127-L131
This is required because of the cyclic module dependencies, this will fix CI for ipfs/kubo#9680.
@@ -309,7 +309,7 @@ jobs: | |||
- run: | |||
name: Cloning | |||
command: | | |||
git clone https://github.com/ipfs/go-ipfs-http-client.git | |||
git clone https://github.com/ipfs/go-ipfs-http-client.git -b bump-for-rcmgr-last-push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove once the cylce has been upgraded.
@@ -149,6 +149,7 @@ jobs: | |||
with: | |||
repository: ipfs/go-ipfs-http-client | |||
path: go-ipfs-http-client | |||
ref: bump-for-rcmgr-last-push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove once the cylce has been upgraded.
Co-Authored-By: Antonio Navarro Perez <antnavper@gmail.com>
This is required because of the cyclic module dependencies, this will fix CI for ipfs/kubo#9680.
This is required because of the cyclic module dependencies, this will fix CI for ipfs/kubo#9680.
Includes work from: #9623
Includes work from: #9612
Fixes: #9650
Fixes: #9621
Fixes: #9577
Fixes: #9603