Too many Code duplicates #89
Comments
See: #59 and the follow-up #68. Basically, 'U' happens to encode to 0x55 in ASCII/UTF-8, but 'U' itself is a symbol. Multibase only really makes sense in a text context, where we have a string of character symbols. Note: after some follow-up discussions, we realized that these really don't belong in the same table. Technically, bytes are also symbols, but I'm not aware of any text encoding that allows for both character symbols and byte symbols. The current setup causes more confusion than it's worth.
The terminology "symbol" is really confusing, because it does not belong to any data type in, as far as I know, any programming language. At least we have data type From the implementation point of view, I still don't understand how to represent a symbol, where the other Codes are having |
According to this implementation https://github.com/multiformats/js-multicodec/blob/master/src/base-table.js, there are only the following bases implemented:
And here, the so-called symbols are actually treated as hex byte values. Can I do the same if I want to implement it in another language?
Update from #76: Both js-multicodec and py-multicodec are wrong.
Copied from #76 (comment) to keep everything in this thread:
When I say symbol, I'm talking about these: https://en.wikipedia.org/wiki/Turing_machine (I agree this is confusing. It's "technically" correct, but I can't think of a better explanation that's still correct.)
So that means, in an actual implementation, a symbol has to be implemented as a special data structure, maybe something like:
class Symbol(
  val isByte: Boolean,    // true for a byte symbol, false for a character symbol
  val value: Array[Byte]  // the symbol's value as bytes
)
It's probably best to just have two tables:
Combining them under a single abstraction probably isn't worth it. For multibase, you'd just use whatever encoding your language supports. For example, the symbol 👍 has one encoding in UTF-8, another in UTF-16, and another in UTF-32. At the end of the day, that doesn't really matter. The important part is whether or not some string starts with the symbol 👍 (regardless of encoding).
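To make the encoding point concrete, here is a minimal Scala sketch. The object and helper names are hypothetical, and it assumes the JVM provides the UTF-32BE charset (it is not one of the `StandardCharsets` constants). It encodes the same symbol three ways and then asks the only question that matters at the text level: does the string start with that symbol?

```scala
import java.nio.charset.{Charset, StandardCharsets}

object SymbolEncodings {
  // The symbol from the comment above: a single code point, U+1F44D.
  val symbol: String = "👍"

  // Hypothetical helper: render bytes as space-separated lowercase hex.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  // Assumption: the runtime supports the UTF-32BE charset.
  val utf32: Charset = Charset.forName("UTF-32BE")

  // Three different byte sequences for the same symbol.
  val asUtf8: Array[Byte]  = symbol.getBytes(StandardCharsets.UTF_8)
  val asUtf16: Array[Byte] = symbol.getBytes(StandardCharsets.UTF_16BE)
  val asUtf32: Array[Byte] = symbol.getBytes(utf32)

  // The text-level question: does the string start with the symbol?
  def startsWithSymbol(text: String): Boolean = text.startsWith(symbol)
}
```

Decoding any of the three byte arrays with its own charset gives back a string for which `startsWithSymbol` is true, even though the arrays themselves differ in both length and content.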
However, a new question is: if we treat so many different things (such as |
Those all occur in a binary context. That is, they all answer the question "what does this series of bytes mean". However, multibase occurs in a text context. It answers the question "how do I convert this sequence of characters to a sequence of bytes".
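As a rough illustration of that text-context question, the sketch below converts a sequence of characters into bytes for the base16 case only. `Base16Sketch` and `decodeBase16` are hypothetical names, not part of any multibase library. The only thing taken from this thread is that base16 strings start with the character 'f'.

```scala
object Base16Sketch {
  // Hypothetical helper: decode a multibase string whose prefix character is 'f' (base16).
  def decodeBase16(text: String): Array[Byte] = {
    // The prefix is compared as a character, not as a byte.
    require(text.nonEmpty && text.head == 'f', "expected the base16 prefix character 'f'")
    val digits = text.tail
    require(digits.length % 2 == 0, "base16 payload must have an even number of hex digits")
    // Each pair of hex characters becomes one byte.
    digits.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  }
}
```

The input is a `String`, so how those characters were stored or transmitted (UTF-8, UTF-16, and so on) never enters the picture. Only the output is a byte sequence.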
import com.github.fluency03.multibase.Multibase
import com.github.fluency03.multibase.Base._
val str = "Multibase is awesome! \\o/"
Multibase.encodeString(Base32Upper, str) // BJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP
Multibase.encodeString(Base32Pad, str) // cjv2wy5djmjqxgzjanfzsaylxmvzw63lfeeqfy3zp
Multibase.encodeString(Base32PadUpper, str) // CJV2WY5DJMJQXGZJANFZSAYLXMVZW63LFEEQFY3ZP
Multibase.encodeString(Base32Z, str) // hji4sa7djcjozg3jypf31yamzci3s65mfrrofa53x
Multibase.encodeString(Base58Flickr, str) // ZxaJjNnAzU5jHQLhoLrXxcVM66Ca1VkLWAT
Multibase.encodeString(Base58BTC, str) // zYAjKoNbau5KiqmHPmSxYCvn66dA1vLmwbt
Multibase.encodeString(Base64, str) // mTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw
Multibase.encodeString(Base64Pad, str) // MTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==
Multibase.encodeString(Base64URL, str) // uTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw
Multibase.encodeString(Base64URLPad, str) // UTXVsdGliYXNlIGlzIGF3ZXNvbWUhIFxvLw==
val encodedStr: String = Multibase.encode(Base16, str.getBytes)
// encodedStr: String = f4d756c74696261736520697320617765736f6d6521205c6f2f
val decodedBytes: Array[Byte] = Multibase.decode(encodedStr)
// decodedBytes: Array[Byte] = Array(77, 117, 108, 116, 105, 98, 97, 115, 101, 32, 105, 115, 32, 97, 119, 101, 115, 111, 109, 101, 33, 32, 92, 111, 47)
val decodedStr = new String(decodedBytes)
// decodedStr: String = Multibase is awesome! \o/
If you take this as an example, you can also say: what does this series of bytes mean? That is, for this
That's a series of characters. It could encode to entirely different sequences of bytes depending on the underlying text encoding. For example, "f4d756c74696261736520697320617765736f6d6521205c6f2f" encoded in UTF-32 is
According to this Protocol Description - How does the protocol work?:
So, in this example Because it starts with |
I added the "symbols" concept in #68 in an attempt to address this exact issue. I'm now proposing that we remove it in #90 because it's clear that it's still confusing. Really, multibase is a multicodec (of sorts). However, our other multicodecs all show up in a binary context while multibase shows up in a text context, and this distinction, and why it matters, is confusing. Nit: other multicodecs usually use
I think this is also inaccurate.
Therefore, the difference
instead of
|
Such as:
- raw 0x55 and base64urlpad 'U', which is 0x55
- bencode 0x63 and base32pad 'c', which is 0x63
- dbl-sha2-256 0x56 and base32hex-upper 'V', which is 0x56
And,
- multihash 0x31 and base1 '1', which is 0x31
- multicodec 0x30 and base2 '0', which is 0x30
- dns6 0x37 and base8 '7', which is 0x37
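The duplicates listed above can be checked mechanically. The following Scala sketch is illustrative only: the object and method names are hypothetical, and the data is copied from the list in this issue. It compares each binary multicodec code with the code point of the corresponding multibase prefix character.

```scala
object CodeCollisions {
  // (multicodec name, multicodec code, multibase name, multibase prefix character),
  // copied from the list above.
  val pairs: Seq[(String, Int, String, Char)] = Seq(
    ("raw",          0x55, "base64urlpad",    'U'),
    ("bencode",      0x63, "base32pad",       'c'),
    ("dbl-sha2-256", 0x56, "base32hex-upper", 'V'),
    ("multihash",    0x31, "base1",           '1'),
    ("multicodec",   0x30, "base2",           '0'),
    ("dns6",         0x37, "base8",           '7')
  )

  // True when the prefix character's code point equals the binary code.
  def collides(code: Int, prefix: Char): Boolean = prefix.toInt == code

  // One human-readable line per pair.
  def report: Seq[String] = pairs.map { case (codec, code, base, prefix) =>
    s"$codec (0x${code.toHexString}) vs $base '$prefix': ${collides(code, prefix)}"
  }
}
```

Every pair collides only when the prefix character is read as its ASCII/UTF-8 value, which is exactly the reading that produces duplicate entries when both kinds of code share one table.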