A MerkleDAG Linked Web or IPLD proposal/idea #2

daviddias · 2015-08-28T11:10:03Z

In order to create a Web of information that survives over time, passing through generations of language idioms and primitives, we need a way to communicate effectively where information lives and how to interpret it.

This idea is not new and there has been several attempts to solve this problem, but to the added complexity, we always see the race to adoption fail, as it typically requires a full buy in from the developer, in order to leverage the advantages of SM/LD.

One recent attempt is JSON-LD, which takes on JSON, successful and well adopted data format used in the Web today and adds Linked-Data '@context', so that Linked-Data processors can infer the type of information and links present. One identified shortcoming of JSON-LD is its inability to coexist with normal JSON data, JSON-LD doesn't support any data that is not referenced in a given '@context', discarding that data if it passes through a JSON-LD processor.

Another issue is the present use of URL to store the schemas that describe the data. URLs are not eternal, they might disappear or the schemas might change location and since they rely on DNS, they require constant Internet access in order to understand the that that is given to us.

💡

What if we treat Linked Data in the same way we handle files, that is, we specify how the file/link is encoded, so that the decoder knows how to treat the link that will give it data.

foo: {
  '@multicodec': '/person/'
  name: 'Tim'
  age: 9000
}

This way, a processor of this object will know it will have to use a 'person' decoder, in order to make sense of this data.

Let's look now with an link:

foo: {
  '@multicodec': '/www/link/person/'
  '@value': 'http://someurl.com/person-1'
}

This tells us that our value is of type person, in a link inside the world wide web.

Now let's look if instead of using the www and http, if we used a content address filed system

foo: {
  '@multicodec': '/ipfs/link/person/'
  '@value': '/ipfs/QmbuH1ZExsQvzVEFFw9S2CivasHrQ9KmCy6zbxSymq8X5r/person-1'
}

Now our linked data processor would know that in order to fetch that object, it would have to use a IPFS, the content address filesystem.

This gives us the opportunity to link data that is not even in the Internet now, like books, articles, papers, that once uploaded or manually searched, can be part of our data structure. For e.g

foo: {
  '@multicode': '/book/'
  '@value': '/Alice's Adventures in Wonderland/chapter/Advice from a Caterpillar
}

What about the decoders? One of the benefits of Linked-Data is that once we can find the schema, we can make sense of the data because it tells us how to parse it. Well, with multicodec references, we can host the schemas on a Content Address File System, not liable by a single point of failure, that can host the schema and even the code necessary to decode that information. The way to find the decoder can be a simple 1:1 reference between the multicodec and its hash /person/ -> hash(/person/). So that they are always findable. It is like a package manager, but for data encoders/decoders.

One more thing, data structures might change over time and maybe what we consider to be a train or a ball might be not the same 10 years from now, so it is important to have versioning to enable data structures to evolve, like /ball/1.0.0.

This way, we ensure that:

we don't need to change current data structures, instead we just have to 'sprinkle' a lil bit of multicodec in top of links or new data structures we want to be self describable
we don't loose the decoders/encoders
links can be locally resolvable, without the requirement of centralized services such as DNS
we can reference data that is not even on the Internet

References:

multicodec is a self describing for protocols and or streams - https://github.com/jbenet/multicodec

The text was updated successfully, but these errors were encountered:

mildred · 2015-08-28T12:07:44Z

If you want to be completely agnostic of the JSON document, it's better not to alter it at all. JSON-LD provides this by linking to the context file using an out of band transmission: http://www.w3.org/TR/json-ld/#interpreting-json-as-json-ld

The problem with adding a key to the JSON document is that the JSON document could already have a @multicode key that means something else. Or that by adding the @multicode key, you change its semantics.

For example, if we represents a directory using a JSON document, we would have key per file name. If we add the @multicode key, we would be adding a file names @multicode in the directory.

Or perhaps some JSON-LD parser out there depends on the fact that keys starting with @ are reserved for JSON-LD semantic. We would break this parser.

I really like the idea of self describing files, but I would prefer if the multicodec was transmitted outside of the JSON document. This could be by HTTP headers, or embedding the JSON document after a multicodec header.

daviddias · 2015-08-28T14:54:04Z

The problem with adding a key to the JSON document is that the JSON document could already have a @multicode key that means something else. Or that by adding the @multicode key, you change its semantics.

You are right, I wish we could find a key that was not used by anything else, by parallel to @context, @type, @id, etc in JSON-LD, which might be used already by some JSON blobs, it is a matter of serving 99.999% of the scenarios, creating a 'good enough' solution.

For example, if we represents a directory using a JSON document, we would have key per file name. If we add the @multicode key, we would be adding a file names @multicode in the directory.

Not necessarily, we can have a multicodec for the unix file format, but we can also have a multicodec that specifies that a type of JSON blob is a directory of files and with that, only having a multicodec in the top level JSON object. The level of granularity of how data structures are defined and encoded is up to the user. Like storing things in a Hard Drive, in the beginning it is just a very long byte array, but once we have a pattern, we don't have to to specify each byte belongs to which format.

Or perhaps some JSON-LD parser out there depends on the fact that keys starting with @ are reserved for JSON-LD semantic. We would break this parser.

Good point, I wrote with '@' to leverage the bias we now have from JSON-LD that '@' is a key that will define the type of data, but we can use '#' or any other char for this matter.

I really like the idea of self describing files, but I would prefer if the multicodec was transmitted outside of the JSON document. This could be by HTTP headers, or embedding the JSON document after a multicodec header.

@jbenet also mentioned that idea and I'm also in favor of making fully self described data a 1st class citizen of a Self Describing Information System. The reason why I'm also in favor of having the option of having a kv that describes a JSON blob or a given remote link is for human readability and the ability to extend already made JSON api with a encoder/decoder for it. for e.g:

Imagine I have my.api.com/humans which till today it returns a list of humans, this API endpoint was designed without any notion of LD. Now I want to use that data in my app, but I know that the format stays the same, so I build the /humans/ codec which knows how to interpret that data, I can then have a link in my app that points to that url with a @multicodec: /humans/.

daviddias mentioned this issue Nov 13, 2015

WIP: IPLD spec ipfs/specs#37

Merged

5 tasks

daviddias closed this as completed Apr 11, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A MerkleDAG Linked Web or IPLD proposal/idea #2

A MerkleDAG Linked Web or IPLD proposal/idea #2

daviddias commented Aug 28, 2015

mildred commented Aug 28, 2015

daviddias commented Aug 28, 2015

A MerkleDAG Linked Web or IPLD proposal/idea #2

A MerkleDAG Linked Web or IPLD proposal/idea #2

Comments

daviddias commented Aug 28, 2015

mildred commented Aug 28, 2015

daviddias commented Aug 28, 2015