-
Notifications
You must be signed in to change notification settings - Fork 24
fix: give one minute timeouts to function calls instead of block retrievals #44
Conversation
…ievals to get around issues with shared IPLD ADL contexts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually grok how these functions fit together, or who the caller of the exported functions here is, or what their expectations are, so my review should be considered low-value.
However, it does look to me like -- depending on what callers actually expect from these methods -- in some of the cases, the current diff might not be hoisting the context initialization high enough up the call stack.
(Maybe it'll work fine, in some cases, or even in "all" cases considering how this is used "in practice". I don't know. If so, it still probably suggests a weirdly close coupling of this library to a specific single application that uses it. I'm not gonna try to police that or change that today; just sayin'.)
In general I wonder if this defer cancel()
behavior is a useful idea. Would anything bad happen if we just dropped it? And I continue to doubt that timeouts being introduced by a library at all are a sensible idea, even under the alluring guise of "defense in depth".
// create a new cancellable session | ||
ctx, cancel := context.WithTimeout(ctx, time.Minute) | ||
defer cancel() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, because the method's return is (cid.Cid, []string, error)
-- there's no Node
which might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it says "i'm cancelled", so it's good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems to be the one that's used by go-ipfs and caused the reported error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps for all the other ones aside from this we just remove the timeout.
@Stebalien you were thinking that leaving these timeouts in might make debugging changes here easier (#36 (comment)). Any objection to leaving this one in and removing the rest?
The change here means we have 1min to resolve the function instead of 1min per block, but that should be way more than enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember why that had anything to do with tracking down bugs... (might have been WRT keeping changes minimal?).
I'm generally against timeouts in places like this where there's no reasonable "we should have received a response by X time" so I'd rather remove all of the hard-coded timeouts like this eventually.
resolver/resolver.go
Outdated
// create a new cancellable session | ||
ctx, cancel := context.WithTimeout(ctx, time.Minute) | ||
defer cancel() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is (ipld.Node, ipld.Link, error)
-- emphasis on that it returns a Node
, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There don't seem to be too many users of this. There's go-ipfs' https://github.com/ipfs/go-ipfs/blob/92854db7aed4424fad117ceb4e13f64a80ff348b/fuse/readonly/readonly_unix.go#L70
From the way it's used there it doesn't seem to be expecting to return a HAMT. @hannahhoward can you confirm that a HAMT pointing to a HAMT won't cause issues with the above code?
The function also seems to be used https://github.com/filecoin-project/lotus/blob/4fc78bf72037ada0cb3edd0f551172428ddbc9e8/node/impl/full/chain.go#L593, which exhibits the concerns you were indicating. It doesn't deal with UnixFS HAMTs, but ... that's not a great feeling. Should be fine if we drop the timeouts per #44 (comment) though.
resolver/resolver.go
Outdated
// create a new cancellable session | ||
ctx, cancel := context.WithTimeout(ctx, time.Minute) | ||
defer cancel() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is ([]ipld.Node, error)
-- emphasis on that it returns a Node
, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if ResolvePathComponents
or ResolveLinks
are used anywhere. https://grep.app/ doesn't show any uses. We can probably drop the timeouts here (as mentioned in #44 (comment))
…t work with fetchers and return IPLD nodes
// | ||
// Note: if/when the context is cancelled or expires then if a multi-block ADL node is returned then it may not be | ||
// possible to load certain values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so I've removed the other 3 timeouts but left this one in as it's at least plausible.
I'm still concerned that the other functions all return nodes where cancelling the context prevents the nodes from being traversable. However, the caller is able to control if/when this happens so it's less of a problem. I added a comment calling out the difficulty.
cc @warpfork any better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM by me. I definitely feel this is a really weird place to have timeouts in general.
fix: give one minute timeouts to function calls instead of block retrievals This commit was moved from ipfs/go-path@7031c42
This allows us to get around issues with shared IPLD ADL contexts such as ipfs/kubo#8420