Implement missing features to include IPLD in IPFS #14
Comments
The protobuf links all have a field called
In IPLD, the Size field is used in a few places:
If the size is optional, we will have to modify IPFS to handle the case when it is absent. And we should remind ourselves that even in cases where it is present or mandatory, it may not be correct. If we rely on the advertised size to. That's why I would prefer the size information to be optional.
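A minimal sketch of how a reader could treat the size as optional and untrusted, assuming a hypothetical `Link` type whose `Size` is a pointer (nil when absent). The names `Link`, `SizeHint`, `readUpTo` and `errTooLarge` are illustrative only, not the go-ipld or merkledag API.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// Link is a hypothetical IPLD link: the hash is mandatory, the size is not.
type Link struct {
	Hash string
	Size *uint64 // nil when the object does not advertise a size
}

// SizeHint returns the advertised size and whether one was present.
// Even when present it is only a hint: it may be wrong.
func (l Link) SizeHint() (uint64, bool) {
	if l.Size == nil {
		return 0, false
	}
	return *l.Size, true
}

// errTooLarge is returned when the linked data exceeds the caller's limit.
var errTooLarge = errors.New("linked data exceeds limit")

// readUpTo reads the target of a link without trusting its advertised size:
// it enforces the caller's own limit on the bytes actually read.
func readUpTo(r io.Reader, limit int64) ([]byte, error) {
	// Read one byte past the limit so oversize data can be detected.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

func main() {
	advertised := uint64(3)
	l := Link{Hash: "QmExample", Size: &advertised}
	if n, ok := l.SizeHint(); ok {
		fmt.Println("advertised size:", n) // a hint, not a guarantee
	}
	// The real content is larger than advertised; the reader's limit still holds.
	_, err := readUpTo(bytes.NewReader([]byte("hello")), 4)
	fmt.Println(err)
}
```

The point of the design is that correctness never depends on the advertised value: code that needs a bound enforces its own.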
There is another issue with the It is used basically in the functions This
Well, actually the
edit: or perhaps not. I was using golang tooling that hid a few call sites.
edit again: but instead of constructing a Merkle tree in memory, we should perhaps imagine a streaming interface that allows adding huge trees that cannot fit in memory.
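A sketch of what such a streaming interface could look like, assuming a hypothetical `StreamBuilder` that hands each node to a caller-supplied sink as soon as it is complete, so the caller can write it out instead of holding the whole tree in memory. The node encoding here is simply the concatenation of SHA-256 child hashes, chosen for illustration; it is not the IPFS/IPLD wire format, and all names are made up.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Sink receives each encoded node as soon as it is complete.
type Sink func(hash [32]byte, encoded []byte)

// StreamBuilder builds a Merkle tree bottom-up from a stream of leaves,
// holding at most fanout pending child hashes per tree level.
type StreamBuilder struct {
	fanout int
	levels [][][32]byte // levels[i] holds pending child hashes at height i
	sink   Sink
}

func NewStreamBuilder(fanout int, sink Sink) *StreamBuilder {
	if fanout < 2 {
		panic("fanout must be at least 2")
	}
	return &StreamBuilder{fanout: fanout, sink: sink}
}

// encodeChildren is an illustrative encoding: concatenated child hashes.
func encodeChildren(children [][32]byte) []byte {
	out := make([]byte, 0, 32*len(children))
	for _, c := range children {
		out = append(out, c[:]...)
	}
	return out
}

// emit hashes a node, hands it to the sink, and returns its hash.
func (b *StreamBuilder) emit(encoded []byte) [32]byte {
	h := sha256.Sum256(encoded)
	b.sink(h, encoded)
	return h
}

// push records a hash at the given height and emits a parent once full.
func (b *StreamBuilder) push(height int, h [32]byte) {
	if height == len(b.levels) {
		b.levels = append(b.levels, nil)
	}
	b.levels[height] = append(b.levels[height], h)
	if len(b.levels[height]) == b.fanout {
		parent := b.emit(encodeChildren(b.levels[height]))
		b.levels[height] = nil
		b.push(height+1, parent)
	}
}

// AddLeaf emits a data leaf and records it at height 0.
func (b *StreamBuilder) AddLeaf(data []byte) {
	b.push(0, b.emit(data))
}

// Finish flushes partially filled levels and returns the root hash.
// It returns false if no leaves were added.
func (b *StreamBuilder) Finish() ([32]byte, bool) {
	for i := 0; i < len(b.levels); i++ {
		pending := b.levels[i]
		if len(pending) == 0 {
			continue
		}
		var h [32]byte
		if len(pending) == 1 {
			h = pending[0] // promote a lone hash instead of wrapping it
		} else {
			h = b.emit(encodeChildren(pending))
		}
		b.levels[i] = nil
		if i == len(b.levels)-1 {
			return h, true
		}
		b.levels[i+1] = append(b.levels[i+1], h)
	}
	return [32]byte{}, false
}

func main() {
	count := 0
	b := NewStreamBuilder(2, func(h [32]byte, enc []byte) { count++ })
	for _, chunk := range []string{"a", "b", "c"} {
		b.AddLeaf([]byte(chunk))
	}
	root, ok := b.Finish()
	fmt.Printf("root %x ok=%v nodes=%d\n", root[:4], ok, count)
}
```

Memory use here grows with the depth of the tree (about `fanout` hashes per level), not with its total size.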
(first responses, more coming)
that's right.
this sounds good. all of this should happen over the so the steps should be:
ripping it out into its own repo may make things easier or harder, not sure. whatever is seen as easiest.
@mildred i think your analysis re the Node pointer is correct, and i agree with your thoughts. the pointers are used all over importing, i believe, to make mutations of whole graphs easier. and we will want to make this possible anyway, to enable users to write algorithms that run over the graph natively. here, i'd like to borrow from how ORMs and graph OMs are implemented: in those, the link object includes a pointer which is never serialized, and which may or may not be nil. thus, if it's there, we use the node; if it's not, we must fetch it, etc. this may also be doable in the current go-ipld package by making the but, note that there is not an easy way around all of this in go that preserves the ease of serialization the we could give up on that for the sake of nice programmatic use. not sure. it may be ok to have two packages, or to encourage experimentation defining them for a while, until we find a great one.

by the way, it's good to bikeshed all our concerns here, as "very nice programmatic interfaces" are notoriously hard to get right, and can enable tons of very nice use cases and client programs. the Go team routinely points out they had long discussions in defining the core types of the stdlib.

Updated, so please re-read if reading from email.
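A minimal sketch of the ORM-style link described above, assuming hypothetical `Node`, `Link` and `Fetcher` types (not the actual go-ipld or merkledag types). The `json:"-"` struct tag keeps the cached pointer out of the serialized form; `Resolve` uses the pointer when it is set and fetches (then caches) the node when it is nil.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Node is a hypothetical in-memory IPLD node.
type Node struct {
	Data  string  `json:"data"`
	Links []*Link `json:"links"`
}

// Link carries the serialized hash plus an optional cached pointer.
// The `json:"-"` tag means the pointer is never serialized.
type Link struct {
	Hash string `json:"hash"`
	Node *Node  `json:"-"` // cache only; may be nil
}

// Fetcher loads a node by hash, e.g. from a blockstore or the network.
type Fetcher func(hash string) (*Node, error)

// Resolve returns the cached node if present, otherwise fetches and caches it.
func (l *Link) Resolve(fetch Fetcher) (*Node, error) {
	if l.Node != nil {
		return l.Node, nil
	}
	n, err := fetch(l.Hash)
	if err != nil {
		return nil, err
	}
	l.Node = n
	return n, nil
}

func main() {
	store := map[string]*Node{"QmChild": {Data: "child"}}
	fetches := 0
	fetch := func(h string) (*Node, error) {
		fetches++
		n, ok := store[h]
		if !ok {
			return nil, errors.New("not found: " + h)
		}
		return n, nil
	}
	parent := &Node{Data: "parent", Links: []*Link{{Hash: "QmChild"}}}
	child, _ := parent.Links[0].Resolve(fetch)
	_, _ = parent.Links[0].Resolve(fetch) // served from the cached pointer
	out, _ := json.Marshal(parent)
	fmt.Println(child.Data, fetches, string(out))
}
```

Here the second `Resolve` does not call the fetcher, and the marshalled parent contains only the hash of its link.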
I was also wondering whether we have to deserialize everything from JSON/CBOR/ProtoBuf, or whether we could deserialize only what we are interested in. In that case, we must keep the original bytes around in case we need to do something more with the object. What I am saying here is that IPFS is only interested in the links and the data section of an object. Other keys might be interesting, but only to some parts of the system: the HTTP gateway might want to read content type information, and IPNS might want to read a few more things.

This way, instead of having a very generic (and inefficient) data structure (maps of maps of maps), we could have more efficient data structures, constructed only when necessary and always by deserializing the original byte stream. These data structures can hold anything of interest (like the We would have to make sure, of course, that we don't serialize them back into an object directly, as that would drop important data that wasn't deserialized. But because objects are immutable, I don't think we will run into this issue much. The only case where we serialize objects is when we construct them anew.
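A sketch of the "keep the original bytes, decode only what you need" idea, using JSON and `encoding/json` for illustration; the field names `links`, `hash` and `content-type` and the `LazyNode` type are assumptions, not an IPLD schema. Note that `json.Unmarshal` still scans the whole document, so this mainly avoids building Go values for fields nobody asked for; a format with random access could skip more work.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LazyNode keeps the original encoded bytes and decodes fields on demand.
type LazyNode struct {
	raw []byte
}

// Links decodes only the "links" field; other fields are not materialized.
func (n LazyNode) Links() ([]string, error) {
	var view struct {
		Links []struct {
			Hash string `json:"hash"`
		} `json:"links"`
	}
	if err := json.Unmarshal(n.raw, &view); err != nil {
		return nil, err
	}
	hashes := make([]string, 0, len(view.Links))
	for _, l := range view.Links {
		hashes = append(hashes, l.Hash)
	}
	return hashes, nil
}

// Field returns one top-level field still in encoded form, for subsystems
// (e.g. a gateway reading a content type) that need more than the links.
func (n LazyNode) Field(key string) (json.RawMessage, bool, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(n.raw, &top); err != nil {
		return nil, false, err
	}
	v, ok := top[key]
	return v, ok, nil
}

// Bytes returns the original encoding, so undecoded data is never lost.
func (n LazyNode) Bytes() []byte { return n.raw }

func main() {
	n := LazyNode{raw: []byte(`{"links":[{"hash":"QmA"},{"hash":"QmB"}],"content-type":"text/plain","data":"aGk="}`)}
	links, _ := n.Links()
	ct, ok, _ := n.Field("content-type")
	fmt.Println(links, string(ct), ok)
}
```

Because callers only ever re-emit `Bytes()`, the partial views cannot accidentally drop fields they never decoded.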
i do like the idea of keeping the thing serialized and only pulling out what you need. in general this makes programs faster, because sometimes you can avoid deserializing, or can even just update the relevant pieces. ((btw, we can make capnp-like support for go-ipld (or really just cbor docs) which only pulls out the data when you need it. so R/W is a bit slower, but by not reading everything we win.)) anyway, what you describe might make for a nice implementation of go-ipld. I'm not sure though -- there is a lot of benefit that comes from supporting maps natively and giving users that type of easy (json-like) access. I think it would be nice to be able to have both. the tricky part comes when users use it. i wonder then if maybe we should be implementing this as an btw:
well, not anymore, in that the data section is now "everything else except the links".
cc @whyrusleeping to take a look and provide feedback, as he manipulates nodes a lot. @whyrusleeping -- do you prefer the "convert to map" approach, or the "keep it all serialized and only read out (duplicated, or on every access) what you need" approach?
I'm not super sure what's happening here, but here is some feedback i can provide: the 'Node' field in the link objects is very useful, but isn't something we want serialized; it's just a cache to make working on the trees easier. Removing it would be fine, but might make certain operations on dags a little slower. In regards to moving merkledag out into its own package, gx is getting a lot nicer; i can probably try it out for extracting that package soon.
What prevents us from accessing the Node through a hash map that associates a link hash to the Because, in this ipld package as it is now, the But as we discussed just before, we can just keep the encoded data in encoded form, and decode only what we need for efficient reading and traversal. We keep the data structures simple (structs with pointers instead of hash maps). Node modification would be a little more difficult, as we mustn't lose the data that we didn't bother to decode.
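For comparison with the pointer-in-link approach, a minimal sketch of the hash-map alternative: links stay plain serializable values, and a separate cache maps a link hash to its decoded node. `NodeCache`, `NewNodeCache` and the `fetch` callback are hypothetical names, not part of go-ipld or merkledag.

```go
package main

import "fmt"

// Link is the serialized form only: no pointer to the target node.
type Link struct {
	Hash string
}

// Node is a hypothetical decoded node.
type Node struct {
	Data  string
	Links []Link
}

// NodeCache associates a link hash with an already decoded node.
type NodeCache struct {
	nodes map[string]*Node
	fetch func(hash string) (*Node, error)
}

func NewNodeCache(fetch func(string) (*Node, error)) *NodeCache {
	return &NodeCache{nodes: make(map[string]*Node), fetch: fetch}
}

// Get returns the cached node for a link, fetching it on first access.
func (c *NodeCache) Get(l Link) (*Node, error) {
	if n, ok := c.nodes[l.Hash]; ok {
		return n, nil
	}
	n, err := c.fetch(l.Hash)
	if err != nil {
		return nil, err
	}
	c.nodes[l.Hash] = n
	return n, nil
}

func main() {
	calls := 0
	cache := NewNodeCache(func(h string) (*Node, error) {
		calls++
		return &Node{Data: "content of " + h}, nil
	})
	// Two links to the same hash share one cached node.
	root := Node{Links: []Link{{Hash: "QmA"}, {Hash: "QmA"}}}
	for _, l := range root.Links {
		n, _ := cache.Get(l)
		fmt.Println(n.Data)
	}
	fmt.Println("fetches:", calls)
}
```

The trade-off is a map lookup per hop instead of a pointer dereference, in exchange for link types that need no special handling when serialized and for natural sharing of identical subtrees.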
i think having the pointers on the object is very useful for writing algorithms that traverse the graphs. it makes thinking about them much easier to be able to do also, we could write a
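To illustrate why in-memory pointers make graph algorithms easier to write, here is a sketch of a depth-first walk that simply follows link pointers. The `Node` and `Link` types are hypothetical; storing the hash on the node is a simplification for the example. Since the graph is a DAG, shared subtrees are tracked by hash and visited once.

```go
package main

import "fmt"

// Link keeps a direct pointer to its already loaded target (nil if not loaded).
type Link struct {
	Hash string
	Node *Node
}

// Node is a hypothetical in-memory node; Hash is stored here for simplicity.
type Node struct {
	Hash  string
	Data  string
	Links []Link
}

// Walk visits every loaded node reachable from root once, depth-first.
func Walk(root *Node, visit func(*Node)) {
	seen := map[string]bool{}
	var rec func(n *Node)
	rec = func(n *Node) {
		if n == nil || seen[n.Hash] {
			return
		}
		seen[n.Hash] = true
		visit(n)
		for _, l := range n.Links {
			rec(l.Node) // a nil pointer means "not loaded"; skipped here
		}
	}
	rec(root)
}

func main() {
	shared := &Node{Hash: "QmShared", Data: "leaf"}
	a := &Node{Hash: "QmA", Links: []Link{{Hash: "QmShared", Node: shared}}}
	b := &Node{Hash: "QmB", Links: []Link{{Hash: "QmShared", Node: shared}}}
	root := &Node{Hash: "QmRoot", Links: []Link{{Hash: "QmA", Node: a}, {Hash: "QmB", Node: b}}}
	Walk(root, func(n *Node) { fmt.Println(n.Hash) })
}
```

A real traversal would probably fetch unloaded links rather than skip them, combining this walk with a resolve step.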
Late commenter here! I'm really interested in @mildred's comment about Some examples in the spec include a
@noffle I think creating ipfs/unixfs objects should remain the same (adding size information whenever it makes sense to), but when you're reading a general IPLD object, you shouldn't assume there to be a size attached (and even if there is, it's not necessarily accurate).
@davidar: I think that makes sense. Baking size into the format seems like an overly stringent requirement. Not needing to know the size is a nice property: it means you can finally build objects without needing to actually look up the things you link to -- the multihash is sufficient. The best we can do then is to ensure that our tooling tries to include it whenever it's reasonable to do so (much more important on binary blobs than on metadata that is likely small).
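A sketch of the writing side of this: a link can be built from its hash alone, and the size is written only when the tool happens to know it. It uses `encoding/json` with `omitempty` on a pointer field so that a nil size is left out of the output; the `Link`, `NewLink` and `WithSize` names and the JSON field names are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Link is a hypothetical serialized link. With omitempty on a pointer,
// the size field is written only when the builder actually knows it.
type Link struct {
	Hash string  `json:"hash"`
	Size *uint64 `json:"size,omitempty"`
}

// NewLink builds a link from a hash alone; no lookup of the target needed.
func NewLink(hash string) Link { return Link{Hash: hash} }

// WithSize returns a copy of the link with a known size recorded,
// e.g. when an importer has just written the blob itself.
func (l Link) WithSize(n uint64) Link {
	l.Size = &n
	return l
}

func main() {
	meta := NewLink("QmMetadata")                  // small object: size omitted
	blob := NewLink("QmBigBlob").WithSize(1 << 20) // size known at import time
	out, _ := json.Marshal([]Link{meta, blob})
	fmt.Println(string(out))
}
```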
Perhaps we don't need to do anything, but I think we may be missing a few things to integrate IPLD into IPFS.
If I'm not mistaken, the go-ipld package should replace the github.com/ipfs/go-ipfs/merkledag package. Is that right?
If so, I think we should first modify the old merkledag package to look a bit more like go-ipld (remove external access to struct members, rename a few things to match go-ipld function names if needed). That will also require modifying the core of IPFS.
Then, we must implement any missing functions in go-ipld, and make the switch.
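One way the first step could look, sketched under assumptions: the node's fields become unexported, access goes through methods, and core code depends on a small interface that both the old merkledag type and a go-ipld type could satisfy. `DagNode`, `legacyNode`, `LinkHashes` and `countLinks` are hypothetical names, not the actual method set of either package.

```go
package main

import "fmt"

// DagNode is a hypothetical interface IPFS core could depend on, so the
// old merkledag type and a go-ipld type become interchangeable.
type DagNode interface {
	Data() []byte
	LinkHashes() []string
}

// legacyNode mimics a merkledag-style node whose fields are no longer
// exported: callers go through methods instead of struct members.
type legacyNode struct {
	data  []byte
	links []string
}

func (n *legacyNode) Data() []byte { return n.data }

// LinkHashes returns a copy so callers cannot mutate the node's links.
func (n *legacyNode) LinkHashes() []string {
	return append([]string(nil), n.links...)
}

// countLinks is written against the interface, not a concrete package,
// so it keeps working after the switch.
func countLinks(nodes ...DagNode) int {
	total := 0
	for _, n := range nodes {
		total += len(n.LinkHashes())
	}
	return total
}

func main() {
	n := &legacyNode{data: []byte("hello"), links: []string{"QmA", "QmB"}}
	fmt.Println(string(n.Data()), countLinks(n))
}
```

Once callers only use the interface, swapping the implementation underneath becomes a local change.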
Does that sound like a good idea? @jbenet, anyone?