WIP: IPLD spec #37

Merged
merged 38 commits into from
Feb 12, 2016
Member

@jbenet jbenet commented Nov 8, 2015

This PR adds a new IPLD spec.

Some things TODO:

  • paths: link to path issue in go-ipfs or go-ipld
  • paths: list path resolving restrictions
  • paths: show examples of path resolving
  • examples/bitcoin: make this a real txn
  • more examples

@mildred @diasdavid could you review?

@jbenet
Member Author

jbenet commented Nov 8, 2015

Another TODO:

  • multicodec: attribute to signal the canonical multicodec for a datastructure. I.e. if a datastructure ought to use a canonical serializing format (e.g. different than cbor), we can use an attribute like @multicodec to signal that to implementations.

It may need to be stored in the serialized format, since decoding it will yield a multicodec. maybe just in the in memory logical representation, so serialization happens correctly. (With the exception for old style proto/mdagv1) (thoughts @mildred?)


TODO:
- [ ] list path resolving restrictions
- [ ] show examples
Member Author

@willglynn could you help me fill this out?

the more concise here the better, but i suspect this section may be a bit large.

see rendered doc here: https://github.com/ipfs/specs/blob/ipld-spec/merkledag/ipld.md

@jbenet
Member Author

jbenet commented Nov 8, 2015

More TODOs:

  • @attributes: describe @ escaping for the potential future @attributes. (cc @mildred)
  • Linked Data: add a section describing relationship to proper linked data formats, JSON-LD, etc.
  • Linked Data: add a JSON-LD example
  • Linked Data: add an RDF example

}
}

> ipld cat --json QmBBB...BBB/author

shouldn't this resolve to

"mlink": "QmAAA...AAA" // links to the node above.

? Same for the YAML example below.

I think you're right and this was a copy/paste typo

Member Author

ah thanks. it should actually be:

{
  "name": "Vannevar Bush"
}

addressed in 37c662a
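
For readers following the thread, here is a minimal sketch of the resolution being discussed: `QmBBB...BBB/author` follows the merkle-link stored under `author` and yields the linked node. The in-memory `STORE`, and the helper names `follow` and `resolve`, are hypothetical and only illustrate the idea; they are not the `ipld` tool's implementation.

```python
# Hypothetical in-memory block store: hash -> decoded object.
# The hashes are the elided placeholders used in the thread.
STORE = {
    "QmAAA...AAA": {"name": "Vannevar Bush"},
    "QmBBB...BBB": {"author": {"mlink": "QmAAA...AAA"}},
}


def follow(node):
    """Dereference node for as long as it is a merkle-link ({"mlink": hash})."""
    while isinstance(node, dict) and "mlink" in node:
        node = STORE[node["mlink"]]
    return node


def resolve(path):
    """Resolve '<root hash>/<segment>/...' one segment at a time."""
    root, *segments = path.split("/")
    node = STORE[root]
    for segment in segments:
        node = follow(node)[segment]
    return follow(node)


print(resolve("QmBBB...BBB/author"))
```

Under these assumptions the call returns the object `{"name": "Vannevar Bush"}`, which matches the corrected example above.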

@hackergrrl

I left a few nitpicks. On the whole I found this doc hugely helpful in understanding IPLD and its intent.

- `ipfs` is a protocol namespace (to allow the computer to discern what to do)
- `QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k` is a cryptographic hash.
- `a/b/c/d` is a path _traversal_, as in unix.
- this link traverses five objects.

Contributor

Not necessarily.

We could have the object QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k having a link named a/b/c/d directly pointing to the final object (or any combination in between).

Member Author

@mildred ah that's right we did say we would allow sparse. one question remains re ordering of links there-- do we want to take them by lexicographic order? or in order in the serialized fmt?

ordering based on the serialized fmt will be needed if links have same name/no name (someone WILL do it so ipld implementations should be written to handle the case even if we say people should not do it)

but ordering lexicographically when links do have names is useful for getting users to expect the same behavior.

how do we handle this?

> ipld cat --fmt yml $h1
---
foo: {mlink: $h2}
foo/bar: {mlink: $h3}

> ipld cat --fmt yml $h2
---
bar:
 hello: h2bar1

> ipld cat --fmt yml $h3
---
hello: h3bar2

> ipld cat --fmt yml $h1/foo/bar
# ??? should it be
---
hello: h2bar1

# or should it be
---
hello: h3bar2
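
The two candidate answers in this question correspond to two different lookup rules. The sketch below implements both against the same three objects so the difference is concrete. It assumes a hypothetical in-memory `STORE` keyed by `h1`, `h2`, `h3`; neither function is presented as the behavior the spec settled on.

```python
# Hypothetical store holding the three objects from the example above.
STORE = {
    "h1": {"foo": {"mlink": "h2"}, "foo/bar": {"mlink": "h3"}},
    "h2": {"bar": {"hello": "h2bar1"}},
    "h3": {"hello": "h3bar2"},
}


def follow(node):
    """Dereference node for as long as it is a merkle-link."""
    while isinstance(node, dict) and "mlink" in node:
        node = STORE[node["mlink"]]
    return node


def resolve_segment_first(root, path):
    """Split on '/' first, then look up one segment at a time."""
    node = STORE[root]
    for segment in path.split("/"):
        node = follow(node)[segment]
    return follow(node)


def resolve_longest_key_first(root, path):
    """At each step, prefer the longest key made of leading segments."""
    node = follow(STORE[root])
    segments = path.split("/")
    while segments:
        for n in range(len(segments), 0, -1):
            key = "/".join(segments[:n])
            if isinstance(node, dict) and key in node:
                node = follow(node[key])
                segments = segments[n:]
                break
        else:
            raise KeyError("/".join(segments))
    return node


print(resolve_segment_first("h1", "foo/bar"))      # the h2bar1 reading
print(resolve_longest_key_first("h1", "foo/bar"))  # the h3bar2 reading
```

The segment-first rule never consults the `foo/bar` key at all, which is why keys containing `/` become unreachable through paths under that rule.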

Contributor

I would not expect this to be an issue because what you describe is the compact form of the following object:

> ipld cat --fmt yml $h1
---
foo: 
  mlink: $h2
  bar:
    mlink: $h3

And in this object, only the "foo" link will be considered valid, not "foo/bar", because:

  • keys should not be allowed to contain the / character (so we will never have "foo/bar" verbatim in a key), for the same reasons / is not allowed in a unix filename.

  • in the previous object, "foo/bar" will not be considered a link, as per the "Duplicate property keys" section:

    Note that having two properties with the same name IS NOT ALLOWED, but actually impossible to prevent (someone will do it and feed it to parsers), so to be safe, we define the value of the path traversal to be the first entry in the serialized representation. For example, suppose we have the object:

Contributor

ipld cat --fmt yml $h1/foo/bar

should be h2bar1
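
The first-entry rule from the "Duplicate property keys" section quoted above can be sketched for JSON input as follows. This is only an illustration using Python's standard `json` module, whose default is to keep the last duplicate; the helper names `first_entry_wins` and `loads_first_wins` are hypothetical, and the sample input is made up for the demonstration rather than taken from the spec.

```python
import json


def first_entry_wins(pairs):
    """Build a dict from (key, value) pairs, keeping the first value per key."""
    obj = {}
    for key, value in pairs:
        obj.setdefault(key, value)
    return obj


def loads_first_wins(text):
    """Parse JSON so that the first occurrence of a duplicated key is kept."""
    return json.loads(text, object_pairs_hook=first_entry_wins)


doc = '{"a": 1, "a": 2}'
print(json.loads(doc))        # default behavior: the last duplicate is kept
print(loads_first_wins(doc))  # first-entry rule: the first duplicate is kept
```

Because `object_pairs_hook` receives the pairs in document order, the rule depends only on the serialized representation, as the quoted section requires.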

Member Author

>   • keys should not be allowed to have / character

i agree, but unlike unix pathnames, there already are datastructs out there that we should be able to store, even if the resolution through them is not perfect. I.e. if we define how the resolution would work even in this case, we avoid the problem of forcing users to change their data.

Contributor

I'd say, don't resolve links for keys that have / in them. We can still store data structures that have those keys; we just can't resolve them through paths. I don't see a problem in that. We would have a separate API to parse the local data structure without following links.

Member

Why not escape slashes, so:

> ipld cat --fmt yml $h1/foo/bar
---
hello: h2bar1

> ipld cat --fmt yml $h1/foo\/bar
---
hello: h3bar2

Contributor

I suppose you meant to put the path in single quotes; otherwise your Bourne-compatible shell will replace \/ with / and both commands are identical.

> ipld cat --fmt yml '$h1/foo\/bar'
---
hello: h3bar2

The kernel doesn't understand escaping, so when presented with the path foo\/bar, the Linux kernel sees the entry bar inside a directory called foo\. Backslashes in file names are valid, and this notation would prevent using them.

Member

Sure, you wouldn't be able to access keys like that through the fuse mount, but I don't see why that means the ipld command couldn't support escaping. If you wanted a literal backslash, you'd write \\
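
A minimal sketch of the escaping proposed in this sub-thread, assuming `\/` stands for a literal slash inside a key and `\\` for a literal backslash. The function name `split_path` and the decision to reject a trailing backslash are assumptions made for the illustration; the thread does not specify them.

```python
def split_path(path):
    """Split a path on unescaped '/', treating a backslash as an escape."""
    segments, current = [], []
    chars = iter(path)
    for ch in chars:
        if ch == "\\":
            # The character after a backslash is taken literally.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of path")
            current.append(nxt)
        elif ch == "/":
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


print(split_path("foo/bar"))     # two segments
print(split_path(r"foo\/bar"))   # one segment containing a slash
print(split_path(r"a\\b/c"))     # first segment contains a backslash
```

With this splitter, `foo/bar` and `foo\/bar` address different keys, which is the distinction the two commands above rely on.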

@mildred
Contributor

mildred commented Nov 9, 2015

> Another TODO:
>
>   • multicodec: attribute to signal the canonical multicodec for a datastructure. I.e. if a datastructure ought to use a canonical serializing format (e.g. different than cbor), we can use an attribute like @multicodec to signal that to implementations.
>
> It may need to be stored in the serialized format, since decoding it will yield a multicodec. maybe just in the in memory logical representation, so serialization happens correctly. (With the exception for old style proto/mdagv1) (thoughts @mildred?)

In that case, we should either provide key escaping, to avoid clashing with someone who wants "@multicodec" as an arbitrary key in their JSON data structure (think of an entry named @multicodec in a directory object), or transmit this information outside of the IPLD object itself (not encoded as part of the datastructure).

@mildred
Contributor

mildred commented Nov 9, 2015

👍 on this whole thing. Still a major TODO for me is the description of the @ character escaping for introducing IPLD specific directives.

By the way, why choose the @ character to introduce directives? This might be confusing in the context of JSON-LD where the same character is used (but it will be escaped in the IPLD object). We are free to choose anything ($%!#&?*+...)

@jbenet jbenet mentioned this pull request Nov 9, 2015
11 tasks
- **IPLD Serialized Formats**: a set of formats in which IPLD objects can be represented, for example JSON, CBOR, CSON, YAML, Protobuf, XML, RDF, etc.
- **IPLD Canonical Format**: a deterministic description on a serialized format that ensures the same _logical_ object is always serialized to _the exact same sequence of bits_. This is critical for merkle-linking, and all cryptographic applications.

In short: JSON documents with named merkle-links that can be traversed.

Member

traversed or resolved?

Member Author

both -- i think one (traversed) implies the other (resolved)

Member

got it, my initial thought was that merkle links can be resolved and something like a smart bitswap can traverse
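
The "IPLD Canonical Format" definition quoted above says the same logical object must always serialize to the exact same bits. A rough sketch of why that matters for merkle-linking is below. It uses sorted-key compact JSON and a plain SHA-256 hex digest purely as stand-ins; the actual canonical format and hash encoding are whatever the spec defines, and the names `canonical_bytes` and `merkle_hash` are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialize deterministically: sorted keys, no optional whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def merkle_hash(obj):
    """Hash the canonical bytes (stand-in for a real multihash)."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


a = {"name": "Vannevar Bush", "born": 1890}
b = {"born": 1890, "name": "Vannevar Bush"}  # same logical object, different key order

print(canonical_bytes(a) == canonical_bytes(b))
print(merkle_hash(a) == merkle_hash(b))
```

Without a deterministic rule such as key ordering, two implementations could emit different bytes for the same logical object and therefore disagree on its hash.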

@daviddias
Member

I don't see any specific mention of the data types the links point to, have we decided that isn't IPLD turf (and just json-ld)?

@daviddias
Member

Also, instead of adding more TODOs here, can we first go issue by issue through https://github.com/ipfs/go-ipld/issues and ipld/js-ipld-dag-cbor#2, check what is already included in the spec, add what is missing, explicitly add what was accepted, and then see what's left, bringing all of that conversation into this PR? That way we are all on the same page :)

@mildred
Contributor

mildred commented Nov 13, 2015

> I don't see any specific mention of the data types the links point to, have we decided that isn't IPLD turf (and just json-ld)?

It is possible to describe the merkle link in a JSON-LD context easily, but the context must also describe other parts of the JSON document. So it can't be a single context for all IPLD documents.

So, you'll have different contexts for small files, chunked files, directory, git blob, tree, commit, ..., and each of these contexts will be able to describe the merkle links the same way.

Also, we could imagine allowing any arbitrary key name instead of "mlink" and use the provided JSON-LD context to know which key is a merkle link, but this would mean parsing the JSON-LD context for every IPLD operation, which I believe would be way too slow.

@candeira

From IRC:

< kandinski> jbenet: the IPLD spec doesn't define where the "IPLD" initials come from
< kandinski> Linked DAG?
<@jbenet> kandinski: good point. Linked Data. but it's not "legit linked data"
<@jbenet> linked dag could work :)
< kandinski> It's InterPlanetary Linked Data
<@jbenet> but will confuse people
< dignifiedquire> but InterPlanetaryLegitData sounds way cooler

In any case, origin of initials should be referenced at the top.

Since it's going to be pronounced "Eye Pee Ell Dee" anyway, we could say something like:

"The IPLD (short for InterPlanetary Linked DirectedAcyclicGraph) is...". We do call it a "thin-waist merkle dag" anyway, which is a killer description, by the way.

@candeira

Same as above for the first mention of CRDTs. "Conflict-free replicated data type" is easier to understand for people coming to this anew.

@jbenet
Member Author

jbenet commented Nov 21, 2015

cc @mekarpeles

@jbenet
Member Author

jbenet commented Nov 21, 2015

@mildred
Contributor

mildred commented Feb 11, 2016

> i think i like solution (8) the most, with / and // and /@link/ -- the reason i want (8) over (4) is that i do want ipld paths to be expressible in a unix path, web URI/URL, and having /@link/ make it possible to do. I would prefer // the rest of the time.

I like these options as well. If the filesystem layer contracts // to / it is always possible to use /@link/, and this can be nicely represented on a filesystem.

@jbenet
Member Author

jbenet commented Feb 12, 2016

OK! #59 #61 #62 #64 are all merged! 👍 MASSIVE thanks to @mildred for pushing it through. We all are very thankful :)

What issues remain here? I think I will merge this (FINALLY!) and let's continue to iron it out with future PRs against master. I think we have a solid spec, and now https://github.com/ipfs/go-ipld/ and https://github.com/diasdavid/js-ipld/ can match it.

jbenet added a commit that referenced this pull request Feb 12, 2016
@jbenet jbenet merged commit 5e5f3ba into master Feb 12, 2016
@jbenet
Member Author

jbenet commented Feb 12, 2016

Note: let's keep the branch, as there are many links to it specifically.

@hackergrrl

Great work, @mildred, @jbenet! 👏

@parkan

parkan commented Feb 12, 2016

🎆
