Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple multihash implementations #13

Closed
bmcorser opened this issue Oct 15, 2015 · 14 comments
Closed

Multiple multihash implementations #13

bmcorser opened this issue Oct 15, 2015 · 14 comments

Comments

@bmcorser
Copy link
Contributor

Just noticed there are (at least) two multihash implementations out there. Oddly, it appears as if they were created a few weeks apart:

@tehmaze's implementation is mentioned in the main ipfs documentation and actually does the hashing for the user, my implementation offers an interface and __eq__. Can they be merged under a common repo?

@moreati
Copy link

moreati commented Oct 15, 2015

I apologize in advance for bike shedding, I'm going to suggest a 3rd option. Like @bmcorser I'd like to see a single, canonical multihash package for Python, but I think it should expose an API based on stdlib's hashlib. Ideally it would be usable as a drop in replacement.

I'll do a rough draft to flesh out the idea.

@BrendanBenshoof
Copy link
Contributor

Just to exemplify this problem: https://github.com/BrendanBenshoof/py-multihash

I'm using it in UrDHT and just about any project that needs hashing.

@bmcorser
Copy link
Contributor Author

@moreati 👍 for hashlib API

@JulienPalard
Copy link
Contributor

Implemented an hashlib-compatible (PEP247) one, missing the "decode" part as the PEP obviously don't provide one. Also I'm not in the decode vocabulary, I think we can find better. And also I don't already know exactly why we would decode it :-p

BTW, here it is, just a draft, open for any merge in any directions to hopelessly avoid the https://xkcd.com/927/ effect.

https://github.com/JulienPalard/multihash

@BrendanBenshoof
Copy link
Contributor

@jelien that looks great! we should add multiple hash value
encodings/decodings other than Hex. You also have some issues with not
using varInts, but that is long-term future proofing.

On Sun, Oct 25, 2015 at 2:13 PM Julien notifications@github.com wrote:

Implemented an hashlib-compatible (PEP247) one, missing the "decode" part
as the PEP obviously don't provide one. Also I'm not in the decode
vocabulary, I think we can find better. And also I don't already know
exactly why we would decode it :-p

BTW, here it is, just a draft, open for any merge in any directions to
hopelessly avoid the https://xkcd.com/927/ effect.

https://github.com/JulienPalard/multihash


Reply to this email directly or view it on GitHub
#13 (comment).

@JulienPalard
Copy link
Contributor

Multiple hash values ? I'm not getting it, sry, what do you mean ?

varints ? I'm not sure I'm getting this one right, but, the spec for multihash is: <1-byte hash function code><1-byte digest size in bytes>, so what did I missed ?

@BrendanBenshoof
Copy link
Contributor

multiple (hash value encodings) other than hex. Essentially other
bases/alphabets like base 64 or base 58. I'd write a general:
encode(bytes,alphabet)->string and decode(string,alphabet)->bytes then wrap
it with the default alphabets.

Read: jbenet/random-ideas#1
especially the section on varints.

On Mon, Oct 26, 2015 at 11:30 AM Julien notifications@github.com wrote:

Multiple hash values ? I'm not getting it, sry, what do you mean ?

varints ? I'm not sure I'm getting this one right, but, the spec for
multihash is: <1-byte hash function code><1-byte digest size in bytes>,
so what did I missed ?


Reply to this email directly or view it on GitHub
#13 (comment).

@JulienPalard
Copy link
Contributor

I'm not sure it's the responsibility of the "hashlib" / "multihash-lib" to encode in base64.

Python already have a https://docs.python.org/3.4/library/base64.html module to do this.

Also, doing the encoding in our hashlib-compatible implementation of multihash get our implementation away from PEP247. And by doing so, we cannot longer replace our multihash implementation by another (also PEP247 compliant) one.

I got it for varints, and will do, I open a ticket, yet we have plenty of time to do so, current implementation is compatible with varint up to 128 hashes :)

@JulienPalard
Copy link
Contributor

Ticket opened about varint, I think I will NOT implement it until mandatory. The necessity will happen with a commit adding the 128th hash, that's a human intervention, not just the time flowing, so, adding this 128th will be the point in time we'll have to implement varint.

I mean, the cost of parsing a varint when we know for sure (list of known hashes is hardcoded) that we have less than a byte, is clearly a waste of time (human and CPU).

JulienPalard/multihash#1

@BrendanBenshoof
Copy link
Contributor

@julien +1 agreed/persuaded on all counts

On Mon, Oct 26, 2015, 12:04 Julien notifications@github.com wrote:

Ticket opened about varint, I think I will NOT implement it until
mandatory. The necessity will happen with a commit adding the 128th hash,
that's a human intervention, not just the time flowing, so, adding this
128th will be the point in time we'll have to implement varint.

I mean, the cost of parsing a varint when we know for sure (list of known
hashes is hardcoded) that we have less than a byte, is clearly a waste of
time (human and CPU).

JulienPalard/multihash#1
JulienPalard/multihash#1


Reply to this email directly or view it on GitHub
#13 (comment).

@amstocker
Copy link

@JulienPalard I wrote a simple unsigned-varint conversion util for a python multiaddr implementation that I started. You guys are definitely welcome to absorb or cherry pick from what I've started.

@JulienPalard
Copy link
Contributor

@amstocker That's cool, thx, I'll probably put the code ready-to-use, like in a comment, but unused as long as it's not mandatory. As I said, it's not worth the cost of using varInt as long as the hardcoded list of supported hashes does not use the first byte.

@amstocker amstocker mentioned this issue Nov 20, 2015
8 tasks
@candeira
Copy link
Contributor

I've made a plan for implementing multihash/multihashing in this comment. Feedback welcome.

@ntninja
Copy link
Contributor

ntninja commented Sep 28, 2018

Reviewing the discussion here and on #23 I conclude that https://pypi.org/project/pymultihash/ is the one-stop go to solution for computing multihashes in python. I'll look into getting this moved into https://github.com/multiformats/py-multihash/ and added to the official list of implementations.

Closing this.

@ntninja ntninja closed this as completed Sep 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants