-
Notifications
You must be signed in to change notification settings - Fork 24
IPLD Prime In IPFS: Target Merge Branch #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working my way through the libraries and some of the issues I raise might come lower down the stack, but wanted to get a few thoughts out there while I'm still reviewing.
resolver/resolver.go
Outdated
DAG ipld.NodeGetter | ||
|
||
ResolveOnce ResolveOnce | ||
FetchConfig fetcher.FetcherConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of exposing this as a fetcher.FetcherConfig
struct as opposed to some interface?
Perhaps this question is really about go-fetcher, but it looks like in the non-test code here we really just need something that exposes NewSession(ctx) fetcher.Fetcher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd vote weakly (both weak vote and weakly held position) that that might be premature abstraction. If there's probably only that one shape of config fields in the fetcher component, we might as well admit it, because it'll be more discoverable and easier to autocomplete. But, I don't have strong feelings about this, not having a super complete grasp of how diverse fetcher config might be planned to become.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it ended up being an interface anyway! (due to various other changes happening in go-fetcher) :)
resolver/resolver.go
Outdated
func NewBasicResolver(ds ipld.DAGService) *Resolver { | ||
func NewBasicResolver(bs blockservice.BlockService) *Resolver { | ||
fc := fetcher.NewFetcherConfig(bs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We previously took in a DagService here which, on the read end, is being replaced by the Fetcher. Shouldn't we pass in a Fetcher interface of some sort instead of a blockservice which then gets wrapped?
Perhaps it's the same outcome either way, but IIUC part of the "ipld-prime" way here is to be dealing with things on a DAG/node rather than block level which means that the thing we want is something that cares about DAGs rather than blocks.
WDYT about having something like a FetcherSession
(or better name) interface that just implements NewSession(context.Context) Fetcher
? It might make writing tests simpler and allow us to more easily make changes later (i.e. if we need to extend the input interface to do more than either the structs we're using already support then new functionality and the change is easy, or they don't and the change would have been hard either way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this all ended up happening!
break | ||
} | ||
// resolve node before last path segment | ||
nodes, lastCid, depth, err := r.resolveNodes(ctx, c, pathSelector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm likely missing something here, but why can't I just use resolveNodes
for the full path selector/why do we need to stop and make the last path segment special?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPLD selectors will follow a link transparently. If we're going to match the current behavior of the function, which returns the CID of the block pointed to by the last segment on the path without loading the block itself, we need to stop the selector traverse before the block gets loaded. If we traverse the full path with a selector, the block will get loaded. Perhaps @warpfork can clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is like the difference between stat
and lstat
in linux syscalls. Most of IPLD functions are behaviorally like stat
; they'll traverse links implicitly. In particular, yeah, as currently written, traversal.Walk
loads links before inspecting them to see if the link node is matched itself.
I'd be open to having more lstat
-like functionality in go-ipld-prime. (If someone wants to do that, 👍 But assume that won't make this PR easier to land today, so, 🤷 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without loading the block itself
To make sure I understand, are you saying the issue is that if rootCID/foo/bar
is a CID bafybar
then we'll end up loading the block instead of just returning the CID? In the case where rootCID/foo/bar
is an integer but rootCID/foo
is a CID we'll have to load the rootCID/foo
block anyway.
If so @warpfork any thoughts on if this is the way to resolve this? I'd think that basically all of the logic from this function would live in a selector traversal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann your summary is correct.
In practice, most path in IPFS is protobuf, where each segment really is a whole new node. We're trying to not have any more actual block loads (which are a network fetch) than before, and I believe the code as written mirrors what is current in terms of network fetches.
In the long term, as @warpfork points out, this repository is highly transitional at this point, so there may not be elegant solutions to everything. We are shooting for as much backward compatibility as possible.
resolver/resolver_test.go
Outdated
resolver.FetchConfig.NodeReifier = unixfsnode.Reify | ||
node, lnk, err := resolver.ResolvePath(ctx, p) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
|
||
uNode, ok := node.(unixfsnode.PathedPBNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is UnixFS really required here, aren't we just traversing DagPb nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc this go-path library is really all for unixfsv1 semantics. If we just wanted to traverse over dagpb nodes directly, we'd... traversal.*
over ipld.Node
, and there'd be nothing here.
I'm hoping in the longer run, this whole library will also just disappear into being a "normal" traversal.*
over these things from the unixfsnode
package, which are actually ADLs that are making the unixfsv1 semantics legible as normal Node step-wise semantics. IIUC, where were at now, with this PR, is that we're eroding go-path into being becoming this (but not getting rid of go-path yet, because incrementalism/scope-avoidance/you-know).
What this is testing is that the NodeReifier
hook system is actually producing one of those ADLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea what we're trying to do is maintain current behavior -- which is pathing by string, which neccesitated the unixfsnode library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with regular protonodes -- not unixfs files/directories -- you need the ADL to do pathing by string.
go.mod
Outdated
github.com/ipfs/go-blockservice v0.1.4 | ||
github.com/ipfs/go-cid v0.0.7 | ||
github.com/ipfs/go-datastore v0.4.5 | ||
github.com/ipfs/go-fetcher v1.2.0 | ||
github.com/ipfs/go-ipfs-blockstore v0.1.4 | ||
github.com/ipfs/go-ipfs-exchange-offline v0.0.1 | ||
github.com/ipfs/go-ipld-cbor v0.0.3 | ||
github.com/ipfs/go-ipld-format v0.2.0 | ||
github.com/ipfs/go-log v1.0.4 | ||
github.com/ipfs/go-merkledag v0.3.2 | ||
github.com/ipfs/go-unixfsnode v1.1.1 | ||
github.com/ipld/go-ipld-prime v0.9.1-0.20210402181957-7406578571d1 | ||
github.com/stretchr/testify v1.7.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It kind of bothers me that we've ended up with so many extra dependencies here, even though they're mostly just used for testing. I feel like @warpfork may have some thoughts/suggestions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... there's a mixture of things going on here though...
- I bet a couple of these are new because the
go mod tidy
tool is listing more things lately (and they're not actually new dependencies). - Yeah, I'd like this list to be less...
- ... but testing is a predominatingly worthy cause, if that's what's causing them to appear.
- It's maybe not worth spending a ton of time on it in this repo, if we hope to erode this repo into the abyss entirely. (But let's watchdog intensely on the things that replace it, yeah.)
- The real problem, no matter which way we slice it, is... if you really wanna be sad, go look at the lock file. Somehow we've got transitive dependencies sprawling all the way to
etcd
(??) andbtcd
(?!) andgopherjs
(??!). I don't know which of these direct dependencies has those wildly out of control transitive sprawls, but that's probably the thing I'd worry about first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There a few less now. Some are just new dependencies (the fetcher). A few are yes -- for testing. While we could drop a bunch of these by switching away from protonodes, I'd rather not we ship v0.10.0 with these changes -- the main consume of this functionality is UnixFS, so it feels important to test the main exercised code path.
* first pass * Update resolver/resolver.go Co-authored-by: Eric Myhre <hash@exultant.us> * update dependencies to tagged versions * correctly handles nested nodes within blocks * return link from resolve path so we can fetch container block * return expected NoSuchLink error * more accurate errors * feat(resolver): remove resolve once remove ResolveOnce as it's no longer used and is just confusing Co-authored-by: acruikshank <acruikshank@example.com> Co-authored-by: Eric Myhre <hash@exultant.us> Co-authored-by: hannahhoward <hannah@hannahhoward.net>
802a897
to
1fd7492
Compare
resolve go vet and staticcheck issues. note we had to ignore two lines that use deprecated behavior, but which replacing could have unintended effects
Note: I had to update go-log cause of dependencies This module uses |
Fairly large comment coming. It will be non-blocking for this PR, but is triggered by my journey trying to understand the PR. You can skip it, unless you're thinking about where we're going with all this in the future. TOC:
We have a lot of reposIn reviewing this, I got the feeling I'm implicitly reviewing things across a number of other repos. And indeed: I drew a graph of go modules that reach between go-ipfs and go-ipld-prime as of today's tip of PR7976, and the current versions any of them reference amongst themselves: It's pretty hairy. Some of these (especially some of the sheer versions within each module) should go away by the time we're done with the mergeparty. Some of the complexity will stick around. And some of these repos are new (which frankly terrifies me, by default, unless they're outweighed by the number of repos disappearing). I want to understand:
More things in those middle two states means we've got more work to do in the future, but also means there's a broader range of what we frame as acceptable and successful for the current work cycle. ☁️ ☁️ ☁️ ☁️ ☁️ discovery logThis section is going to be high detail and high noise. Consider skipping it and going straight to the Summary at the end. The following notes are stream of consciousness. And I'm going to brain-OOM at least once in the middle of these notes. I'm confessing that very intentionally: I think other people will probably also brain-OOM when trying to chart this, because it's legitimately a maze. We'll need to keep this in mind when allocating time for work (and for planning work!) in the future. If we don't make good roadmaps, and document intended scope for repos... we'll get stuck doing re-discovery like this, and it's both time-consuming, and ultimately reduces the odds of success in reaching a simpler state by the end of renovations. Roadmaps and scope docs are good. Not time-cheap to make. But worth it. Okay! Here we go:
☁️ ☁️ ☁️ ☁️ ☁️ summaryLet's answer those original questions by color-coding these repos:
Green and black are good. Brown and orange mean there's still work to do in future cycles. Repos:
Okay. Ouch. That's a lot of brown circles. We didn't get a lot of successful removes. That means we're definitely not done renovating. And it also means we didn't score a lot of wins for overall complexity reduction yet (-> we might not actually be scoring dev velocity improvements by the end of this PR & mergeparty, either). I think we still consider this a good outcome. A lot of work was accomplished. (A lot of it is just at higher granularity than this repo/module-sized field-of-view can easily see.) There's one green circle that's a new thing that we're really happy about, and this PR here in And now we're at the end of a time-box for working on this. Hopefully we're in a better position for the next round of work. And hopefully in that next round of work, we'll take a bunch more of these brown circles and move them to black. Sorry for the verbosity. I just really needed to refresh on what the goalposts are. This PR is easier to review (and a lot easier to be happy about!) when one remembers that yeah, a lot of it is gonna smell janky because it's still constrained by legacy -- it's transitional code and we know it -- and the goal isn't to polish this, it's to move on quickly and try to replace most of this with other new APIs that are better, in the next round of work. Knowing and agreeing that we're not looking for polish / sanitization / complexity-reduction out of this patchset makes it much easier to approve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm basically good with this.
The reached state is not attractive, but the reasoning about where to park scope for right now seems to hold.
go.mod
Outdated
github.com/ipld/go-codec-dagpb v1.2.1-0.20210330082435-8ec6b0fbad18 | ||
github.com/ipld/go-ipld-prime v0.9.1-0.20210402181957-7406578571d1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: When we're ready to merge let's use tagged versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that is understood.
resolver/resolver.go
Outdated
// Resolver provides path resolution to IPFS | ||
// It has a pointer to a DAGService, which is uses to resolve nodes. | ||
// It reference to a FetcherFactory, which is uses to resolve nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "It references a" or "It has a reference to a"
break | ||
} | ||
// resolve node before last path segment | ||
nodes, lastCid, depth, err := r.resolveNodes(ctx, c, pathSelector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without loading the block itself
To make sure I understand, are you saying the issue is that if rootCID/foo/bar
is a CID bafybar
then we'll end up loading the block instead of just returning the CID? In the case where rootCID/foo/bar
is an integer but rootCID/foo
is a CID we'll have to load the rootCID/foo
block anyway.
If so @warpfork any thoughts on if this is the way to resolve this? I'd think that basically all of the logic from this function would live in a selector traversal.
resolver/resolver.go
Outdated
if len(nodes) < 1 { | ||
return nil, nil, fmt.Errorf("path %v did not resolve to a node", fpath) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this allowed to not equal 1? Since we're using a pathLeafSelector
shouldn't we only match the last element in the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any instance in which it could be greater than 1 based of the way selectors work.
It should equal exactly 1, assuming the entire path was resolved -- if it wasn't, then it will have 0 nodes, indicating a failure.
resolver/resolver.go
Outdated
} | ||
|
||
// ResolveSingle simply resolves one hop of a path through a graph with no | ||
// extra context (does not opaquely resolve through sharded nodes) | ||
func ResolveSingle(ctx context.Context, ds ipld.NodeGetter, nd ipld.Node, names []string) (*ipld.Link, []string, error) { | ||
// Deprecated: fetch node as ipld-prime or convert it and then use a selector to traverse through it. | ||
func ResolveSingle(ctx context.Context, ds format.NodeGetter, nd format.Node, names []string) (*format.Link, []string, error) { | ||
return nd.ResolveLink(names) | ||
} | ||
|
||
// ResolvePathComponents fetches the nodes for each segment of the given path. | ||
// It uses the first path component as a hash (key) of the first node, then | ||
// resolves all other components walking the links, with ResolveLinks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the code or the comment here need to change? Why aren't we using ResolveLinks
as before, do the functions not quite match up anymore, is it an efficiency thing, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should change and I will make said change.
As for why they don't quite operate the same any more... the functions just don't line up in quite the same way and it ends up being easier to seperate them.
uNode, ok := node.(unixfsnode.PathedPBNode) | ||
require.True(t, ok) | ||
fd := uNode.FieldData() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? randNode()
generates a DagPB node that is likely not valid UnixFS.
Should we be checking the "Data" field in the IPLD node instead, or am I missing the point here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnixFSNode the repo should probably be renamed something along the lines of "ADLs for DagPB" because it's not entirely just UnixFS.
"PathedPBNode" is an ADL that's not quite UnixFS but also not just DagPB with no additional logic.
Basically, it turns a non-unixfs DagPB node into a map where the keys are the Link names.
We end up needing this for path traversal even when we're not dealing with unixFS.
Perhaps this particular ADL should not live in the go-unixfsnode repo. I don't particularly feel like it's pressing to move it right this second though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this is interesting. It seems like we currently (i.e. before the IPLD prime PRs) support pathing through DagPB (not UnixFS) data by name even when ignoring the UnixFS resolver. This also seems to preclude any ability to address by path the Data
fields in DagPB nodes or the Size
fields in their links.
AFAICT this seems like a mistake since it prevents some of the DagPB data from being addressable by pathing. This is likely something we'll need to resolve as part of ipfs/kubo#7976 (comment).
…fields (#42) * fix(resolver): LookupBySegment to handle list indexes as well as map fields * Add test for /mixed/path/segment/types/1/2/3
@warpfork regarding your larger analysis, I largely agree with your conclusions. I would say the goals of this work were two fold:
I would say overall these two goals have been achieved. With the caveat that I think we learned more about some things than others. go-unixfsnode, while somewhat inadvertant, turned out to be the most fruitful exploration here, teaching us a lot about what ADLs can do, and how to actually get rid of go-unixfs eventually. go-fetcher in my mind was originally intended to eventually replace go-merkledag and go-blockservice, but that would require a bunch more work. Moreover, I'm not sure we've proved the interfaces by any means. I suspect the integration with Filecoin will help drive clarity on what our top level "go get me some stuff, either from disk or the internet" interface should ultimately look like. That's why I'm personally happy it did not make it on to the IPFS Core API. I think we learned something but not as much as I'd hoped. As you say, right now it's mostly glue code and utility functions on top of go-ipld-prime & go-blockservice While we haven't "won" yet in the sense of making the code simpler, ultimately, almost every single node now used in IPFS has an ipld-prime implementation underneath (mostly do, for better or worse, to our go-ipld-legacy shim layer). This was always a part 1 of many, and I really would like to get it merged soon so it's not hanging out there forever. |
@aschmahmann I've address the various PR comments and I hope we're at the point of approval? |
@aschmahmann : is going to leave a comment on some weirdness he sees in the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a nit, but otherwise seems good to go here.
Official merge branch for IPLD Prime In IPFS effort.
Switches usage over to go-fetcher
See: #34