Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalise node ID representation #261

Closed
joshuakarp opened this issue Oct 22, 2021 · 15 comments · Fixed by #318
Closed

Canonicalise node ID representation #261

joshuakarp opened this issue Oct 22, 2021 · 15 comments · Fixed by #318
Assignees
Labels
design Requires design development Standard development discussion Requires discussion epic Big issue with multiple subissues r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@joshuakarp
Copy link
Contributor

joshuakarp commented Oct 22, 2021

Specification

We'd like to canonicalise the representation of a keynode's node ID.

Recall that the node ID of a node is based on the public key of a keynode: it is a public key fingerprint. At the moment, we produce a SHA-256 hash of the public key, resulting in an array of 32 bytes. This array of bytes can be currently seen as the primitive canonicalised representation of the node ID.

The representation needs to provide:

  • fixed size encodings: the Kademlia system for node ID resolution requires a fixed-size representation of a node ID, such that we can perform bitwise XOR operations between node IDs.
  • URL-safe characters

Previously, we were using base64 as the encoding choice for a human-readable version of the node ID. This produced fixed-size node IDs of 44 characters, but used unsafe characters (e.g. /, +).

After that, we were then looking at base58btc. This has the advantage of no longer producing URL-unsafe characters, as well as some quality-of-life improvements by not including visually-similar characters. However, it can produce different sized node IDs. See below:

import * as keysUtils from './src/keys/utils';
import {
  util as forgeUtil,
} from 'node-forge';
import { base58btc } from "multiformats/bases/base58";

async function main() {
  const pemPublicKey43 = `-----BEGIN PUBLIC KEY-----
  MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA0dTpU0P4AMR3bWs9t4KQ
  WWkQZZsjqMLj4Hr2c3n6bYhqCl4e69P7tFUG6mrIGomUb/JCqZ5cik+IyDoKz/rw
  wB6vJ51ajT4nLgiCgKbiMeU3gdlXKDpJSBxi85sFFjprbmJiO41r2ZezE7bS62Sl
  k8vkPI2IPhH4H/Zun60iO0lRBMd/aWSZ03hNZHGBe1JVTzkoxLneaNOd5XRQ90KO
  TPButem2+5tU91oQsFH4VZONHuJbHEuYeiJUbFtgKNvwDe8w/0BmDoV5r9R7oEjI
  busgF0SQv6DkQILL2PfhJAsYQdY3azd07q3Nhm4YgQJg3r7XbN4+goPUwCr3073d
  rM1Ulc2zh9Oukx7iqzlGjjQWJ7q6bd2ewn/E9dlQDNc8KP4Fq08aMo1bRGSX1Pl8
  h5E5lr/FjKgNmxLwe+gBqo/iMvGYISo6IedEpw2xLcWxQrZ6yu9MzCi+6l6mr/M7
  PQqye8FzMXMxaaqIBi0WUkgg3qTynqAM/jde64GcBdLv4UfkbSkgSRlJb5srmcrc
  jckqpjeP4yRO0ex0ynkMMi78LT0yG3E/XYqFx+XD6EGEpEKMbVPwBmOyEapRbVVI
  DLBEmR6e6WuvdnyiJksCUpZhuDrVyJuVOImL8FJMFtanpPxwZe3FpNaGJQxa0m1d
  VQE/QpKzYZL0pMhBhHr/JBkCAwEAAQ==
  -----END PUBLIC KEY-----`;
  const pemPublicKey44 = `-----BEGIN PUBLIC KEY-----
  MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA3SABwHq1vJ9ydCl2eW/Q
  r3guKAPdesaxkm85k4EhJ2ywvDcw0wjh39KbcU5ymexxDGzKa3DkhrGzzHbQmS37
  Ee0Nvm6VR2PI+rhmRtJAVztAPPTMj3YHJXaqpoffILXt508jAnvQ38477tZ8hxSR
  /NfZ2OYCLJrCENXvIy6NOPWBwcXPJUL02fDPpaZsvcnoLpIptOnIEuQXL3YJHjKd
  w6yYT4g0sy0MSqp8NFPzzlcY4k5gKO1KR/G2/o2DFErd0n2a7k+SY/ChuUaZpk6U
  q88zfLSRdUCM4pe469ZRrlz7lRoQUemBrmDx0g00MS9GsSEgOZFz3xqt+/3jE+vg
  vt9Ul4929z+9onK5NvjwVR3gRDAl9IFW6P3YVdnyPUo3tNkgQ3Q9Pew/cmulGVf3
  js/DiS9QVjivscfWGbVnkRPuNlUQ0WX4mNEDtYUEGsuYIRkgvBwHYFO1TGQWn3P3
  vEfUg61LzP8WHoYxb/nmYcd14t8zt9reQnCOk+/ON+Btp1Q1ZhIiy7uxG0IGE7m2
  Nawvql9YGFheFC56TYnikm3jfw4boyaqwCeVmqRBAe/NCXcIzDK76AmEvfjzitOl
  2f8rtFMcRE9LFzAYf0bGMbtLEWmE3tMbxDeO3B+DXKtUjeRun6TWca1biyK8niI0
  xsGTzpaAygxgj5DHF8OjP9ECAwEAAQ==
  -----END PUBLIC KEY-----`;
  const publicKey43 = keysUtils.publicKeyFromPem(pemPublicKey43);
  const publicKey44 = keysUtils.publicKeyFromPem(pemPublicKey44);

  const bytes43 = keysUtils.publicKeyToFingerprintBytes(publicKey43);
  const bytes44 = keysUtils.publicKeyToFingerprintBytes(publicKey44);
  console.log('publicKeyToFingerprintBytes length 43:', bytes43.length);
  console.log('publicKeyToFingerprintBytes length 44:', bytes44.length);
  const fTypedArray43 = forgeUtil.binary.raw.decode(bytes43);
  const fTypedArray44 = forgeUtil.binary.raw.decode(bytes44);
  console.log('fTypedArray 43:', fTypedArray43);
  console.log('fTypedArray 44:', fTypedArray44);
  console.log('Hex string 43:', Buffer.from(fTypedArray43).toString('hex'));
  console.log(base58btc.baseEncode(fTypedArray43));
  console.log(keysUtils.publicKeyToFingerprint(publicKey43));

}

main();
[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm run ts-node -- nodeIds.ts 

> @matrixai/polykey@1.0.0 ts-node /home/josh/Documents/MatrixAI/js-polykey
> ts-node -r tsconfig-paths/register "nodeIds.ts"

publicKeyToFingerprintBytes length 43: 32
publicKeyToFingerprintBytes length 44: 32
fTypedArray 43: Uint8Array(32) [
  10,  45,  47,  45, 125, 109,  40,  40,
  20, 160,  60, 163, 101, 100, 189, 222,
  38,  35, 246, 172,  73, 251, 232, 221,
  13,  97, 139, 151, 240, 251, 129, 201
]
fTypedArray 44: Uint8Array(32) [
  233, 254, 155, 220,   4, 146, 225,  43,
  130, 201,  87,  50,  52, 141, 221, 156,
   15, 162, 131,  58, 120, 191, 146,  36,
   49, 133,  50,  92, 203,  95, 232,  13
]
Hex string 43: 0a2d2f2d7d6d282814a03ca36564bdde2623f6ac49fbe8dd0d618b97f0fb81c9
gj3DUVjzAGHvmP9ZRFB5geWLVRwXUSdEUtMcVvxozLx
gj3DUVjzAGHvmP9ZRFB5geWLVRwXUSdEUtMcVvxozLx

As a temporary solution, we've adopted a base32hex encoding, producing a fixed fixed 53 character node ID, such as v6n7m9vuf44pfqq133re8v91ju7a8968h4nkirjt8r7r9pbcoqqq0 (with a prepended v character from multibase): https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_711237922

Additional context

Tasks

  1. Choose a standard means of representing the node ID
  2. Change all current usages of node ID to be this new standard (e.g. within NodeGraph (kademlia), across GRPC, etc)
@joshuakarp joshuakarp added development Standard development design Requires design discussion Requires discussion labels Oct 22, 2021
@CMCDragonkai
Copy link
Member

My proposal is to just use a class NodeId extends Uint8Array object. We can also use permaproxy as well if we want to refer to an primitive or Uint8Array.

Then any kind of encoding can be used, and we always use NodeId as the official type.

We knock out the problem of dereferencing node ids improperly, and also ensure that all domains are use the byte-form canonical version of NodeId and not some encoded form.

The exact encoding algorithm we use has some important features needed:

  1. base64 has characters that cause problems like /
  2. do you need lexicographic order? if so you have to use base32hex or one of the encodings talked about here: Which of the multibase formats preserves lexicographic order? multiformats/js-multiformats#124 (comment)
  3. do you want padding? As per the answer https://stackoverflow.com/a/26632221/582917, the reason padding is used or not used depends on whether there will be a concatenation of encoded node ids without any kind of structure indicating separation, ideally we don't want padding

If we do choose base58btc, that's fine, as long as the above 3 problems are resolved, and no domains in PK are using string/encoded form of node id, and they are all using NodeId as a type, whether that is a special singleton object, or a permaproxy.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 22, 2021

Remember with this issue done, it does not matter what the encoded node id lengths are, and there is no guarantee that the all encoded node id lengths are the same length and this is fine cause it doesn't matter anymore.

At the moment in the source code, there's a need to temporarily ensure that the base encoding we are using is producing a fixed length. That's base32hex. But this is only because so many domains like kademlia, and others all assume a string base id that is of a fixed length.

@joshuakarp
Copy link
Contributor Author

Some interesting discussions that came up with Brian and I whilst we've been debugging the tests.

Previously, the node ID's "canonicalised" representation was its string representation (as a base64/base58 string). Instead, we're now using the actual, internal 32 bytes of the node ID for all calculations in the nodes domain. See https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_712546331 for more details.

@joshuakarp
Copy link
Contributor Author

This work on the node ID could also relate to #168. When we eventually move to Ed25519 keys, then the node ID will be the 32 byte sequence of the Ed25519 public key.

@CMCDragonkai
Copy link
Member

As talked about in #269, we need to support multibase encoded Node IDs as these will be used by PK_SEED_NODES env variable and related configuration. Right now I believe @joshuakarp has already made the changes to nodes domain to use the 32 byte Uint8Array, which means we actually can reuse base58btc now, but there are other domains that are still using string-encoded Node IDs, all of those will require refactoring.

Note that we do not need ordered base encoding for Node Ids, Node Ids do not have any order.

One question I have is whether this blocks testnet deployment? If not, then this can come after we have testnet working, and I don't think it blocks general publishing and distribution of PK CLI #268. If so this can be pushed to the icebox.

@joshuakarp
Copy link
Contributor Author

No, I don't think it does block testnet deployment, nor #268, so this can be pushed back.

@joshuakarp
Copy link
Contributor Author

For this issue, I think we need to do a bit more speccing out of the requirements for our string-encoded node ID. e.g. what's needed from a CLI point of view? What would we want a user to see as the representation of their node ID? Do we have length constraints on this? etc etc

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 27, 2021 via email

@joshuakarp
Copy link
Contributor Author

Seems like Wireguard uses base64 encodings: https://www.wireguard.com/quickstart/#key-generation

Interestingly, seems like the standard for SSH keys is also base64: https://datatracker.ietf.org/doc/html/rfc4716#section-3.4

@joshuakarp
Copy link
Contributor Author

Perhaps there's also merit in standardising this with our claims on the sigchain with base64url (removes unsafe URL characters for transfer): https://datatracker.ietf.org/doc/html/rfc7515

@CMCDragonkai
Copy link
Member

Note that ErrorInvalidId should be removed. Parsing whether an id is valid or not is supposed to happen on the boundaries of the program, not internally. So it should be part of parsing functions. Internally the typesystem should ensure that we have valid ids and we should not need to have runtime exceptions for this.

@joshuakarp
Copy link
Contributor Author

As a refresher, it's worthwhile to remember that we've already done some form of canonicalisation of the node ID, as discussed here: #261 (comment). See the following commit too: 5f382e2

@joshuakarp
Copy link
Contributor Author

It will also potentially be worthwhile to undertake #299 (the review of the GenericIdTypes.ts 'module') alongside this work.

@CMCDragonkai
Copy link
Member

With #318, we can use the raw typed array/buffer form of NodeId without needing to encode it first, this works by using the Id from js-id. The canonical version of NodeId is the byte-representation.

Encoded representations of NodeId can be used elsewhere where string form/human readable is needed. And in those cases, we would use base32hex in order to preserve lexicographic order.

@CMCDragonkai
Copy link
Member

Example of encoding the public key fingerprint as Id: #254 (comment)

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Requires design development Standard development discussion Requires discussion epic Big issue with multiple subissues r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy
3 participants