annotate hashes to be tagged pointers? does this lead to schemas? #21

Closed
dominictarr opened this issue Sep 6, 2014 · 21 comments

@dominictarr
Contributor

I've been thinking about this a lot since discussing it on IRC with @pfrazee the other day.
ssb has 3 types of links:

  • hash of pubkey (a Feed id)
  • hash of a message (a Message id - i.e. previous message hash on every message)
  • hash of an arbitrary blob (an attachment)

If hashes were tagged with the type of thing they point to, you get a really nice ability:
the system can index relationships automatically: that feed X is linked to feed Y (for example, because they are friends), that message B was created strictly after message A (to which it was a reply), or that some message refers to an attachment J. Without tags, the specific
meaning of each hash must be interpreted from its context, or the hashed object must be
retrieved and parsed.
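
A minimal sketch of the indexing idea, assuming each link already carries a tag. The tag names (`feed`, `msg`, `blob`) and the placeholder hash strings are hypothetical; nothing in this thread fixes them.

```javascript
// Hypothetical tag names; the thread does not settle on concrete values.
const TAGS = ['feed', 'msg', 'blob'];

// Group the outgoing links of one message by tag, without fetching or
// parsing any of the objects the hashes point to.
function indexLinks(sourceId, links) {
  const index = { feed: [], msg: [], blob: [] };
  for (const { tag, hash } of links) {
    if (!TAGS.includes(tag)) throw new Error('unknown tag: ' + tag);
    index[tag].push({ from: sourceId, to: hash });
  }
  return index;
}

const idx = indexLinks('msgB', [
  { tag: 'msg', hash: 'hashOfMsgA' },   // reply: B was created after A
  { tag: 'feed', hash: 'hashOfFeedY' }, // relationship between feeds
  { tag: 'blob', hash: 'hashOfBlobJ' }, // reference to an attachment
]);
```

Without the tag, the same function would have to retrieve each target to learn which bucket it belongs in.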

Another idea that we have discussed recently is identifying messages via the hash of their schema. There are certainly nice things about this idea, but also unknowns.
The idea would be to have a canonical representation of each schema, and then id that schema with its hash. This would allow objects to be tagged like in git, but would also allow
user applications to create new types, and to reflect on and parse the documents without
running those applications.

To combine these ideas, each link would need to be a hash plus a type hash.
That would make each link 64 bytes long, which wouldn't fit on an 80-char terminal line.
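
To make the arithmetic concrete, here is a small sketch that measures how long a 64-byte link becomes in two common text encodings. It assumes 32-byte SHA-256 hashes, which matches the 64-byte figure but is not stated explicitly in this thread.

```javascript
const crypto = require('crypto');

// Assumption: SHA-256, which produces 32-byte digests.
const hash = crypto.createHash('sha256').update('example').digest();

// Stand-in for "hash + type hash": two 32-byte values back to back.
const link = Buffer.concat([hash, hash]);

const lengths = {
  bytes: link.length,                      // 64
  hex: link.toString('hex').length,        // 128
  base64: link.toString('base64').length,  // 88
};
```

Both text forms exceed 80 characters, so the terminal-width concern holds for either encoding.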

Maybe we could just use a 1-byte tag for messages, feeds and attachments.
An attachment could be tagged by having the hash of its schema at the beginning.
Of course, this would be incompatible with most standard mimetypes, so we'd need a raw
blob as well... so that would be 4 id types? (feed, message, and attachments, tagged and raw?)
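
A rough sketch of the 1-byte tag, assuming the tag is simply prepended to the raw hash. The byte values and the names `encodeLink` and `decodeLink` are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical tag bytes; the thread does not assign concrete values.
const TAG = { feed: 0x01, message: 0x02, blob: 0x03 };
const NAME = Object.fromEntries(Object.entries(TAG).map(([k, v]) => [v, k]));

// Prepend the one-byte tag to the raw hash.
function encodeLink(type, hash) {
  if (!Object.prototype.hasOwnProperty.call(TAG, type)) {
    throw new Error('unknown link type: ' + type);
  }
  return Buffer.concat([Buffer.from([TAG[type]]), hash]);
}

// Split a tagged link back into its type name and raw hash.
function decodeLink(buf) {
  const type = NAME[buf[0]];
  if (!type) throw new Error('unknown tag byte: ' + buf[0]);
  return { type, hash: buf.subarray(1) };
}

const msgHash = crypto.createHash('sha256').update('a message').digest();
const tagged = encodeLink('message', msgHash); // 33 bytes: tag + 32-byte hash
const decoded = decodeLink(tagged);
```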

What should links be like?

We could get away with {T}{hash}, but I think there is a strong case for including other metadata in the link, such as the size of an attachment, the feed id and sequence of a message, or the IP address of a relay as part of a feed id. Sometimes this extra metadata
might be unnecessary or unwarranted, or would just create a token that is too long.

@jbenet has a similar idea over here: jbenet/random-ideas#1

@pfrazee
Contributor

pfrazee commented Sep 6, 2014

My first proposal: a base-256 numbering system that goes 0-9, A-Z, and then uses emojis for the remaining 214 numbers. Now 64 bytes fits in the 80-char terminal. ✌️ victory declared

The goal is context-free processing... what I'm wondering is, do we have the structure to do that? Links would be embedded in message bodies, right? If there's no standard schema, then we can only do contextual processing (by the apps that manage the message type).

@jbenet

jbenet commented Sep 6, 2014

I don't recommend putting in types of objects into the hash value itself. IPFS handles it this way:

https://github.com/jbenet/go-ipfs/blob/master/merkledag/merkledag.go#L19-L40

We're still discussing whether the Link struct will carry a type, or the type will be in the Data portion.

@dominictarr
Contributor Author

@pfrazee yeah, for this idea to work we'd need a standard encoding for messages (for example, msgpack). If an app really needs something special then it can use a buffer inside msgpack.

@jbenet I would love to hear your thoughts on why tagged links are bad?
It would be more complicated to support this in ipfs, because you can create arbitrary document types. In our case, we only have 3 main types of object, so tagging them seems straightforward.

@pfrazee <3 your idea for unicode'd encoding. It would actually be longer, but would look awesome. I think @substack would approve of this idea also.

@dominictarr
Contributor Author

Instead of using tag+hash it might be better to have {link: hash, meta: metadata, type: linktype}

That way the metadata can be kept in the index. This means we can link to a message and also optionally include the author id of that message (which may be useful).

Maybe we could put a type in the message. So if a message contains a link to an author with a type: 'follow' then they followed that key -- we'd have permissions about what sort of links an app was allowed to create. This scheme would be highly flexible because nodes could create multiple links of different types if they needed.

Since messages are already size limited, it's not really a problem if metadata is large.
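
A sketch of how an indexer might pull `{link, meta, type}` objects out of a message body, assuming messages decode to plain objects. The function name `extractLinks`, the field layout of the example message, and the placeholder hashes are all hypothetical.

```javascript
// Walk a decoded message and collect every object shaped like
// {link: <string>, type: <string>, meta?: <anything>}.
function extractLinks(value, found = []) {
  if (Array.isArray(value)) {
    value.forEach((v) => extractLinks(v, found));
  } else if (value && typeof value === 'object') {
    if (typeof value.link === 'string' && typeof value.type === 'string') {
      found.push({ link: value.link, type: value.type, meta: value.meta || null });
    }
    Object.values(value).forEach((v) => extractLinks(v, found));
  }
  return found;
}

const message = {
  text: 'following a friend and replying',
  follow: { link: 'hashOfFeedY', type: 'follow' },
  replyTo: { link: 'hashOfMsgA', type: 'reply', meta: { author: 'hashOfFeedX' } },
};

const links = extractLinks(message);
```

Because the link type travels with the link, the indexer needs no knowledge of the app that wrote the message.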

@pfrazee
Contributor

pfrazee commented Sep 10, 2014

I'm in favor of that. It brings us back to our issue of type semantics and assigning unique names to types (same issue as with schemas).

@jbenet

jbenet commented Sep 10, 2014

> Instead of using tag+hash it might be better to have {link: hash, meta: metadata, type: linktype}

Yes, at least do this :)

> maybe we could put a type in the message. so if a message contains a link to an author with a type: 'follow' then they followed that key -- we'd have permissions about what sort of links that app was allowed to create. This scheme would be highly flexible because nodes could create multiple links of different types if they needed.

Yep, this is roughly the model we're following.

A slightly modified version is to think of two classes of links (roughly map to raw ptrs and smart ptrs):

  1. link -- just the hash
  2. link with metadata -- an object with metadata about the link.

you implement 2 on top of 1:

  // given
  p1 = {name: "foo"}
  p2 = {name: "bar"}

  // metalink outside of the file (in the links changing / Ted Nelson / TBL 2.0 friendly way)
  // p1 and p2 don't change with creation of links.
  follows = {person: Hash(p1), follows: Hash(p2)}
  // or even straight up triple
  follows = {source: Hash(p1), target: Hash(p2), type: Hash(followRelationship)} 

  // metalink inside of the file (TBL 1.0 style)
  m1 = {text: "o hai @dominictarr!", sender: Hash(p1), recipient: Hash(p2)}  

TBL 1.0 = http web
TBL 2.0 = semantic web
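
The pseudocode above can be made runnable roughly as follows. This is a sketch under assumptions: `Hash` is taken to be SHA-256 over `JSON.stringify` output, which is not canonical (key order changes the result), and `followRelationship` is a hypothetical type object.

```javascript
const crypto = require('crypto');

// Assumption: SHA-256 over JSON text. Key order affects the digest, so a
// real system would need a canonical encoding before hashing.
function Hash(obj) {
  return crypto.createHash('sha256').update(JSON.stringify(obj)).digest('hex');
}

const p1 = { name: 'foo' };
const p2 = { name: 'bar' };
const followRelationship = { name: 'follows' }; // hypothetical type object

// Class 1: a bare link is just the hash.
const bare = Hash(p2);

// Class 2: a link with metadata is an object built out of bare links.
// p1 and p2 are not modified by creating it.
const follows = {
  source: Hash(p1),
  target: bare,
  type: Hash(followRelationship),
};

// Since the metalink is itself an object, it can be hashed and linked to.
const followsId = Hash(follows);
```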

@jbenet

jbenet commented Sep 10, 2014

((a thought alongside is that IPFS proposes links do belong IN the file, but that meaningful links are also files, so Link => Objects is a thing.))

@dominictarr
Contributor Author

@jbenet so you are saying that the link itself needs to be an object that can be linked to?
Can you describe a use case for this?

@pfrazee you are correct about the names... maybe the solution is to make any link revocable?
Then we can handle cases where problems arise?

@pfrazee
Contributor

pfrazee commented Sep 11, 2014

@dominictarr We just need a global namespace, and I think that means we either use DNS or something GUIDlike-- maybe the idea where we publish a type definition on the feed as a message and do author_hash + typedef_message_hash. That's nice because it's immutable, but it's also 64 bytes. Maybe we could get away with just typedef_message_hash but that does have a non-zero collision risk.
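
A sketch of the author_hash + typedef_message_hash identifier, assuming SHA-256 and a JSON-encoded typedef message. The message shape and the key placeholder are hypothetical.

```javascript
const crypto = require('crypto');
const sha256 = (data) => crypto.createHash('sha256').update(data).digest();

// Hypothetical inputs: a feed's public key and a typedef message it published.
const authorHash = sha256('hypothetical public key bytes');
const typedefMessage = JSON.stringify({ type: 'typedef', name: 'follow' });
const typedefHash = sha256(typedefMessage);

// Full identifier: immutable, but 64 bytes.
const typeId = Buffer.concat([authorHash, typedefHash]);

// Shorter variant: the typedef hash alone, 32 bytes.
const shortTypeId = typedefHash;
```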

@dominictarr
Contributor Author

Maybe we can just use names for now, and then change to hashed schemas when we figure that out. If the type can be up to 32 bytes long, then that will be possible.

@dominictarr
Contributor Author

Maybe we could just whitelist link types for now, and then switch to hashed schema types.
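
One way the interim whitelist could look, as a sketch. The allowed names are hypothetical; the 32-byte cap comes from the earlier comment about leaving room for a hashed schema type later.

```javascript
// Hypothetical whitelist; the thread does not list the allowed types.
const ALLOWED_LINK_TYPES = new Set(['follow', 'reply', 'attachment']);
const MAX_TYPE_BYTES = 32; // room for a 32-byte schema hash later

// Size check only: would this value fit in the type field at all?
function fitsTypeField(type) {
  return typeof type === 'string' && Buffer.byteLength(type, 'utf8') <= MAX_TYPE_BYTES;
}

// Interim rule: the type must fit the field and be on the whitelist.
function isAllowedLinkType(type) {
  return fitsTypeField(type) && ALLOWED_LINK_TYPES.has(type);
}
```

Switching to hashed schema types later would replace the `Set` lookup while keeping the size rule.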

@pfrazee
Contributor

pfrazee commented Sep 18, 2014

Maybe we should take the same stance on types for messages that we do with links -- don't ever try to enforce a global namespace, and trust developers to coordinate with each other and come up with good identifiers.

You're going to have to validate messages no matter what, and if you want something stronger to disambiguate the semantics, you can use your own identifier: { type: 'foomsg', message: { paulFooType: 'v2' }}. In practice, you can avoid most collisions with a dash: orgname-type or projectname-type. This is what the HTML custom elements do -- custom elements have to use a dash in their names.
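
A sketch of the dash convention as a check an app could run. The exact pattern is an assumption made for illustration; it is not the rule from the custom elements specification, only a similar idea.

```javascript
// Assumed pattern: lowercase alphanumerics with at least one dash
// separating a namespace from the type name.
const NAMESPACED_TYPE = /^[a-z][a-z0-9]*(-[a-z0-9]+)+$/;

function isNamespacedType(type) {
  return typeof type === 'string' && NAMESPACED_TYPE.test(type);
}

// Build a namespaced type name, rejecting results that break the convention.
function namespacedType(namespace, name) {
  const type = namespace + '-' + name;
  if (!isNamespacedType(type)) throw new Error('invalid type name: ' + type);
  return type;
}

const ok = isNamespacedType('orgname-type'); // has a namespace prefix
const bare = isNamespacedType('foomsg');     // no dash, so not namespaced
```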

@dominictarr
Contributor Author

yeah. well this will have to do for now anyway.

@jbenet

jbenet commented Sep 18, 2014

Thoughts on JSON-LD?

Part of me wants to force it, since it's trivial addition of a context. Could really really help.

Maybe the right thing to do for me is define a way to do it that doesn't force JSON (you can see JSON-LD as a Tree-LD, protobufs are trees).

Make sure you watch the JSON-LD video before dismissing it.


@pfrazee
Contributor

pfrazee commented Sep 18, 2014

Which video?

@jbenet

jbenet commented Sep 19, 2014

msporny's explanations are really good.

@dominictarr
Contributor Author

@jbenet what do you think are the key points? it would help if your links contained more context (!)

@jbenet

jbenet commented Sep 19, 2014

@dominictarr watch the JSON-LD one, he explains how JSON-LD works (pro tip: bump it up to 2x speed). I really can't do his explanation justice. He makes the semantic web actually tractable.

@dominictarr
Contributor Author

if it's so simple, why can't you explain it in a sentence or two?

@dominictarr
Contributor Author

Okay I watched the videos, but to be honest, all I got from it was that you have a link with properties and a context... it sounds like the context is a schema of some sort.
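
For readers who want the "link with properties and a context" idea in concrete form, here is a rough sketch. The `@context` maps short property names to IRIs, which is the part that behaves like a schema. The `example.org` vocabulary and the `feed:` identifiers are hypothetical, and `expandTerms` is a deliberately simplified stand-in, not a real JSON-LD processor.

```javascript
const doc = {
  '@context': {
    name: 'http://example.org/vocab#name',
    // '@type': '@id' marks the value as a link rather than plain text.
    follows: { '@id': 'http://example.org/vocab#follows', '@type': '@id' },
  },
  '@id': 'feed:hashOfFeedX',
  name: 'foo',
  follows: 'feed:hashOfFeedY',
};

// Simplified term expansion: replace each short key with the IRI the
// context assigns to it. Real JSON-LD expansion does considerably more.
function expandTerms(d) {
  const ctx = d['@context'] || {};
  const out = {};
  for (const [key, value] of Object.entries(d)) {
    if (key === '@context') continue;
    const def = ctx[key];
    const iri = typeof def === 'string' ? def : def && def['@id'];
    out[iri || key] = value;
  }
  return out;
}

const expanded = expandTerms(doc);
```

After expansion, two documents that use different short names for the same IRI describe the same property, which is how the context disambiguates types without a central registry.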

@dominictarr
Contributor Author

okay so we went with this, closing.


3 participants