Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch add-link fails to write result to blockstore #1925

Closed
eminence opened this issue Nov 2, 2015 · 8 comments
Closed

patch add-link fails to write result to blockstore #1925

eminence opened this issue Nov 2, 2015 · 8 comments

Comments

@eminence
Copy link
Contributor

eminence commented Nov 2, 2015

This isn't new, mostly recently tested with 36de29a

$ ipfs object new 
QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n
$ ipfs object patch QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n add-link name QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n
QmaLj6BqZg5BnjpVCBeUVzqGQcGuZPdBtVxz9zS4MbDTnu

At this point I expect this new hash (Qm...Tnu) to be available, but it isn't:

$ ipfs object stat QmaLj6BqZg5BnjpVCBeUVzqGQcGuZPdBtVxz9zS4MbDTnu
Error: merkledag: not found

Oddly, the above sequence works if you start with a unixfs-dir object (instead of an empty object)

eminence added a commit to eminence/go-ipfs that referenced this issue Nov 2, 2015
License: MIT
Signed-off-by: Andrew Chin <achin@eminence32.net>
@eminence
Copy link
Contributor Author

eminence commented Nov 3, 2015

Ok, I dug into this tonight, and forty-three thousand printfs later, I've found the root of this issue. It's kinda gnarly, I'm not sure how to solve this.

While the title of this issue says "fails to write to blockstore", that's not quite true -- ipfs ends up writing something to the blockstore, but it's not the thing it gives you the hash for. In the above example, instead of writing QmaLj6BqZg5BnjpVCBeUVzqGQcGuZPdBtVxz9zS4MbDTnu to the blockstore, it writes Qmb52TwanzjR4kXxnzvydzzqjmi1yNb9GEtc7hRogjsgf6 :

$ ./ipfs object get Qmb52TwanzjR4kXxnzvydzzqjmi1yNb9GEtc7hRogjsgf6
{
  "Links": [
    {
      "Name": "name",
      "Hash": "QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n",
      "Size": 0
    }
  ],
  "Data": ""
}

What's the difference and where did Qm...gf8 come from? The block actually written is one that doesn't have a data field at all. Since it is optional, it can be left out entirely from the protobuf serialized wire format.

But the hash returned by patch-object is the hash of a protobuf object that has a zero-length data field. This discprecenty is first introduced in Node.Copy(). To see this, we can instrument Node.Copy() to print out the encoded bytes of the new cloned node, and then the original node:

func (n *Node) Copy() *Node {
    nnode := new(Node)
    nnode.Data = make([]byte, len(n.Data))
    copy(nnode.Data, n.Data)

    nnode.Links = make([]*Link, len(n.Links))
    copy(nnode.Links, n.Links)

    nnode_encoded, _ := nnode.Encoded(true)
    n_encoded, _ := n.Encoded(true)
    fmt.Println("nnode enc:", nnode_encoded)
    fmt.Println("n encoded:", n_encoded)

    return nnode
}

The result:

nnode enc: [18 44 10 34 18 32 227 176 196 66 152 252 28 20 154 251 244 200 153 111 185 36 39 174 65 228 100 155 147 76 164 149 153 27 120 82 184 85 18 4 110 97 109 101 24 0 10 0]
n encoded: [18 44 10 34 18 32 227 176 196 66 152 252 28 20 154 251 244 200 153 111 185 36 39 174 65 228 100 155 147 76 164 149 153 27 120 82 184 85 18 4 110 97 109 101 24 0]

The new node has 2 extra bytes on the end. This corresponds to our optional bytes Data = 1 field (the 10 is a combination of a field number of 1, and a wiretype of 2, and the 0 is the length of the field). It's present in the new node (which we compute a multihash for and return it to the user), but not the original node (which is the object written to the blockstore)

If this was a greenfield project, I might suggest making the Data field required, so there wouldn't be two different ways to serialize the same object. I'm not sure if we can change it now in a compatible way (maybe if we added a default value?)

@whyrusleeping
Copy link
Member

@jbenet has strong opinions about 'optional' vs 'required' in protobuf. The issue here is actually (as you point out) that a zero length array and a nil array are nearly the same in go, but different in protobuf.

@whyrusleeping
Copy link
Member

I think this has been encountered before, and the idea we (i think i remember) decided on was do set zero length arrays to nil on marshaling.

@jbenet
Copy link
Member

jbenet commented Nov 3, 2015

@eminence great detective work! 🔎

I think this has been encountered before, and the idea we (i think i remember) decided on was do set zero length arrays to nil on marshaling.

yes, we should set them to nil.

If this was a greenfield project, I might suggest making the Data field required, so there wouldn't be two different ways to serialize the same object. I'm not sure if we can change it now in a compatible way (maybe if we added a default value?)

it's unfortunate but will be ok. we can always be compatible with what we get and aim towards producing canonical reps.

it is unfortunate if other objects exist out there that have an empty bytes. they wont dedup :(

also note this will change when https://github.com/ipfs/go-ipld/ lands

@eminence
Copy link
Contributor Author

eminence commented Nov 3, 2015

Thanks for the fix! I've confirmed that it's working. On the subject on canonical representations, you can sometimes see weird thing when working with PBNodes that were not generated by go-ipfs:

$ ipfs object patch QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n add-link HEAD QmWeTS4M8W9VKFxhZXB2JjnJuhaKXRrWyJ43qDGShn5KYo
QmViMFCVHgxE8dzHgiz1FedBWBuYUYTxKPPpsTn35BZFJx


$ ipfs object get QmViMFCVHgxE8dzHgiz1FedBWBuYUYTxKPPpsTn35BZFJx
{
  "Links": [
    {
      "Name": "HEAD",
      "Hash": "QmV6Y2zs2dZwE1Lx1WdcsZ1DhSVRhXrKN3dviWbJc5xZiq",
      "Size": 44483
    }
  ],
  "Data": ""
}

I asked for a link to Qm…KYo but ended up with a link to Qm…Ziq. This just because go-ipfs sorted the links within Qm…KYo and wrote out a new object. I don't think anything is incorrect here, just a little bit non-obvious. Perhaps we could document how IPFS produces canonical reps, so that other tools can follow the same rules?

Finally, please don't forget about #1926 (either close it, or merge it). Thanks!

@whyrusleeping
Copy link
Member

@eminence what tool created the ref in question?

@eminence
Copy link
Contributor Author

eminence commented Nov 3, 2015

@whyrusleeping It's a custom tool I created. It creates/manipulates PBNodes and writes them directly to disk (into $IPFS_PATH/blocks). Unlike go-ipfs, I'm not doing any sorting of my links (though I will start doing this now)

@whyrusleeping
Copy link
Member

This has been resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants