Add version info to Hashes #466

Open
pchiusano opened this issue Apr 19, 2019 · 6 comments

Comments

@pchiusano
Member

See #459

@pchiusano pchiusano added this to the M1 milestone Apr 19, 2019
@pchiusano
Member Author

I'm moving this off of M1 (but open to PRs from new contributors!!). The codebase format is already going to be versioned separately, so it's less important that the base58-encoded hashes of that format include Unison version information (and the hash algorithm) now. We could choose to add this info in a later version of the codebase format, or not, since it will be redundant - within a version of the codebase format, all the hashes will be of the same type.

I see this as being more useful when displaying hashes to users and when sharing copy-pastable hashes that can have an unambiguous meaning even as Unison evolves. It might also prove useful in the implementation of the Unison inter-node protocol so we can keep that in mind, too.

Here's a proposed self-describing hash, using multiformats:

<multibase><unison-multicodec-id><unison-version-id><multihash>

Notes:

  • <multibase> will just be z for bitcoin base 58 if we're rendering the hash as text.
  • <multihash> format is just <hash-algo><hash-len-in-bytes><hash-value>
  • The <unison-multicodec-id> is added to the community table. This is the only thing that will be added to that table. The idea is we'd like to avoid spamming that community table every time there's a new version of Unison and we don't want that to be a bottleneck for doing releases of Unison.
  • The <unison-version-id> references a Unison application-specific multicodec table. (Initially, the "table" will just have one entry in it, Hash.unisonVersion1 = 1 :: Word8, just stored in the Unison source itself).
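As a rough illustration of the binary layout described above (everything after the multibase prefix), here is a minimal Haskell sketch. The names `varint`, `unisonMulticodecId`, `sha3_512`, and `selfDescribing` are hypothetical, the multicodec id `0x300000` is only a placeholder since no entry has been assigned in the community table, and the choice of sha3-512 (multihash code `0x14`) as the hash algorithm is an assumption made for the example.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8, Word64)

-- | Unsigned varint as used by multiformats: 7 bits per byte,
-- least-significant group first, high bit set on all but the last byte.
varint :: Word64 -> [Word8]
varint n
  | n < 0x80  = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80) : varint (n `shiftR` 7)

-- | PLACEHOLDER (assumption): the real value would be whatever gets
-- assigned to Unison in the community multicodec table.
unisonMulticodecId :: Word64
unisonMulticodecId = 0x300000

-- | The single entry of the Unison-specific version table, as proposed.
unisonVersion1 :: Word8
unisonVersion1 = 1

-- | Multihash code for sha3-512 (assumed here to be the hash algorithm).
sha3_512 :: Word64
sha3_512 = 0x14

-- | Builds <unison-multicodec-id><unison-version-id><hash-algo>
-- <hash-len-in-bytes><hash-value> from a raw digest.
selfDescribing :: BS.ByteString -> BS.ByteString
selfDescribing digest = BS.append (BS.pack header) digest
  where
    header =
      varint unisonMulticodecId
        ++ [unisonVersion1]
        ++ varint sha3_512
        ++ varint (fromIntegral (BS.length digest))
```

With a 64-byte digest and these placeholder values, the header is seven bytes long: four for the codec id, one each for the version, the hash algorithm, and the length.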

I'd be open to a PR for this, I would just edit the Unison.Hash module. Some implementation notes (assuming the above sounds good):

  • Open a PR against https://github.com/multiformats/multicodec#multicodec-table to add an entry for Unison. You can reference this issue.
  • I dunno if I'd bother with the multihash dependency, these formats are so simple, it's like 3 LOC...
  • The Hash type could still be a Hash ByteString, but those bytes will be:
    • <unison-multicodec-id><unison-version-id><multihash>
    • <multihash> is just <hash-algo><hash-len-in-bytes><hash-value>
    • So, basically, it's the full format minus the multibase prefix, since the bytes are assumed to be binary.
  • Then modify the base58 and fromBase58 functions accordingly.
  • And also modify the Accumulate instance here to prepend:
    • <unison-multicodec-id><unison-version-id><hash-algo><hash-len-in-bytes> to the raw bytes produced by the hash.
  • If we need varint serialization in Haskell, that's here. But I don't think that will be needed until we have more than 128 versions of Unison. 😀 The community table will just have a constant in it that we'll reference in the Haskell code.
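The text rendering step (the `z` multibase prefix followed by bitcoin base58) could look roughly like the sketch below. It is self-contained and hand-rolls base58 rather than using a library. The names `encodeBase58`, `decodeBase58`, `toText`, and `fromText` are hypothetical stand-ins and are not the actual signatures of `base58` and `fromBase58` in `Unison.Hash`.

```haskell
import qualified Data.ByteString as BS
import Data.List (elemIndex, unfoldr)

-- | The bitcoin base58 alphabet (no 0, O, I, or l).
alphabet :: String
alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

-- | Encodes bytes as base58; each leading zero byte becomes a leading '1'.
encodeBase58 :: BS.ByteString -> String
encodeBase58 bs = replicate zeros '1' ++ digits
  where
    zeros = BS.length (BS.takeWhile (== 0) bs)
    n = BS.foldl' (\acc b -> acc * 256 + toInteger b) 0 bs :: Integer
    digits = reverse (unfoldr step n)
    step 0 = Nothing
    step k = let (q, r) = k `divMod` 58 in Just (alphabet !! fromInteger r, q)

-- | Inverse of 'encodeBase58'; Nothing if a character is outside the alphabet.
decodeBase58 :: String -> Maybe BS.ByteString
decodeBase58 s = do
  vals <- traverse (`elemIndex` alphabet) s
  let zeros = length (takeWhile (== '1') s)
      n = foldl (\acc v -> acc * 58 + toInteger v) 0 vals
      step 0 = Nothing
      step k = let (q, r) = k `divMod` 256 in Just (fromInteger r, q)
  pure (BS.pack (replicate zeros 0 ++ reverse (unfoldr step n)))

-- | Renders binary hash bytes as text: multibase prefix 'z', then base58.
toText :: BS.ByteString -> String
toText raw = 'z' : encodeBase58 raw

-- | Parses the text form, rejecting any other multibase prefix.
fromText :: String -> Maybe BS.ByteString
fromText ('z' : rest) = decodeBase58 rest
fromText _ = Nothing
```

Keeping the prefix handling in `toText` and `fromText` matches the note above that the stored bytes omit the multibase prefix and only the rendered text carries it.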

@pchiusano pchiusano removed this from the M1 milestone Apr 29, 2019
@pchiusano pchiusano added the good first issue and help wanted labels Apr 29, 2019
@tysonzero

tysonzero commented Jan 28, 2020

Will this be a path towards allowing people to seamlessly store all their public Unison code on IPFS?

I'm a huge fan of all these CAS-focused projects, as I think it is absolutely the future, however I think a huge chunk of the benefit is being able to store all public CAS content on a single decentralized network.

@zipper97412

Hi, newcomer here! I am also a huge fan of CAS (mostly IPFS and IPLD). As I see things, the AST could be represented as an IPLD object that links to other code by CID, and ucm could use an IPLD store (IPFS) as its backend to handle the burden of storing artefacts. We would also get code sync and test result sync for free, just by resolving CIDs on IPFS first.
Todo:

  • Use IPLD (cbor or pb) as the AST storage format
  • Use IPFS as the main IPLD store: codebase, types, eval, namespace, etc. will be stored and published via IPFS
  • Also provide other store implementations that do not depend on IPFS but still use IPLD as the format, e.g. a local store in a folder (like the current implementation)

IPLD also defines an archive format that could later serve as the binary format for standalone executables and/or libraries, just by providing an IPLD store implementation for ucm.

I will probably open an issue for this later for comments :)

@jphastings

Did you get anywhere with this @zipper97412?

@solomon-b
Contributor

I can try to take a stab at this if @zipper97412 is busy.

@tysonzero

@pchiusano What's the reason for using a custom unison-multicodec over dag-cbor? Filecoin for example uses dag-cbor. This will give you a lot more interop with existing and future tooling, as dag-cbor is more or less the preferred multicodec outside of some dag-pb for files and folders.

@mitchellwrosen mitchellwrosen removed the good first issue label Nov 14, 2023