ID standardisation for all projects and domains #1
Comments
The above spec doesn't concern itself with deterministic IDs. That is, IDs generated by hashing some value. The UUIDv3 and UUIDv5 specs are for deterministic IDs. Incorporating deterministic IDs can be useful for dealing with MatrixAI/js-db#1. Because we are exposing values to be keys, and they are not encrypted at rest, it can be a good idea to hash those values and ensure we have a deterministic ID. This would mean that we get a similar ID representation and base encoding even if the source of the ID is deterministic data. Note that it also supports the usecase of namespaces. Namespaces sound like the machine ID that we have, and one would imagine the generation of a namespaced random identifier. Note that the UUID specs state that they use MD5 (UUIDv3) or SHA1 (UUIDv5). It sounds like UUIDv5 would override any usage of UUIDv3. See this for more info: https://stackoverflow.com/a/28776880/582917 According to https://www.sohamkamani.com/uuid-versions-explained/ it would make sense for UUIDv5 to take over. And again, the main reason not to use UUIDv1 is that we want the higher amounts of entropy, which does mean our IDs are a lot bigger. Or our compound IDs are used instead. |
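A minimal sketch of what a deterministic, namespaced ID could look like, assuming we follow the UUIDv5 layout (SHA1 of namespace plus name, then version and variant bits). The helper name `deterministicId` and the placeholder namespace are hypothetical, not part of any existing API:

```typescript
import { createHash } from 'crypto';

// Hypothetical helper: derive a 16 byte deterministic ID from a namespace and a name,
// following the UUIDv5 layout (SHA1, version nibble 5, RFC 4122 variant).
function deterministicId(namespace: Uint8Array, name: string): Uint8Array {
  const digest = createHash('sha1')
    .update(namespace)
    .update(name, 'utf8')
    .digest();
  // SHA1 yields 20 bytes; keep a copy of the first 16
  const id = new Uint8Array(digest.subarray(0, 16));
  id[6] = (id[6] & 0x0f) | 0x50; // version 5 in the high nibble of byte 6
  id[8] = (id[8] & 0x3f) | 0x80; // variant bits '10' at the top of byte 8
  return id;
}

// Placeholder namespace (assumption): in practice this could be a machine ID
const namespace = new Uint8Array(16).fill(0xab);
const idA = deterministicId(namespace, 'abc');
const idB = deterministicId(namespace, 'abc');
const idC = deterministicId(namespace, 'abd');
console.log(Buffer.from(idA).toString('hex'));
console.log(Buffer.from(idA).equals(Buffer.from(idB))); // same input gives the same ID
console.log(Buffer.from(idA).equals(Buffer.from(idC))); // different input gives a different ID
```

The value being hashed never appears in the key itself, which is the point of using this for keys that are not encrypted at rest.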
We are probably not going to follow the UUID versions strictly but instead create our IDs for our 4 separate usecases above and additional usecases as we see fit. One of the things that would be nice is if we can configure whatever ID is generated to be a composition of different properties. But some things will be mutually exclusive. |
The uuid library supports writing to a given buffer. However the buffer must be a typed array or a Node buffer, not an ArrayBuffer. This makes sense, since an ArrayBuffer cannot be written to directly; you must use a typed array. In this case, it's always a Uint8Array, which will be more portable than using Node buffers. |
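To illustrate the ArrayBuffer vs typed array point, here is a small sketch using only built-ins. `fillSequential` is a hypothetical stand-in for any library call that writes into a caller-supplied typed array:

```typescript
// An ArrayBuffer is just raw memory; reads and writes go through a view
const idData = new ArrayBuffer(16);
const idBytes = new Uint8Array(idData); // view over the same memory, no copy

// Hypothetical stand-in for a library call that fills a given typed array
function fillSequential(target: Uint8Array): void {
  for (let i = 0; i < target.length; i++) target[i] = i;
}

fillSequential(idBytes);

// A second view sees the same bytes because both share idData
const sameBytes = new Uint8Array(idData);
console.log(sameBytes[15]); // 15
console.log(idBytes.buffer === idData); // true
```

So the ID can be stored and passed around as an ArrayBuffer, with a Uint8Array created only at the point where bytes are written or read.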
Instead of expecting a CSPRNG, you can ask the user to supply a random generator:
|
The API of uuid in some situations makes use of "array of numbers" as a representation of bytes. I think this is a vestigial feature from the olden days of lacking proper binary types. Now with typed arrays, we shouldn't need to support this representation. Since we're going to be using this with LevelDB in js-db, and that is always buffers, we can focus entirely on buffer types instead of arrays of numbers. The text representation of uuids has dashes, which seems intended to help human readability. So it seems that even this strict adherence is not necessary; this shows equivalent base encodings.
Seems like we would also be able to support multibase representations in this library too. https://github.com/multiformats/js-multiformats Our preference I believe is base58btc, which is a bit longer, but easier to double click and copy. |
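Here's an example of using multibase:

// imports assumed from the uuid and multiformats packages
import * as uuid from 'uuid';
import { bases } from 'multiformats/basics';
import { base58btc } from 'multiformats/bases/base58';
import { base64 } from 'multiformats/bases/base64';

// 16 byte buffer that uuid.v5 writes into
const idBuf = new Uint8Array(16);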
uuid.v5(
'abc',
'c1b00cc0-25a3-11ec-be9b-033334f5bcba',
idBuf
);
const idEnc = base58btc.encode(idBuf);
const idEnc2 = base64.encode(idBuf);
const ds = Object.values(bases).map((c) => c.decoder).reduce(
// @ts-ignore
(d1, d2) => d1.or(d2)
);
const idBuf_ = ds.decode(idEnc);
const b1 = Buffer.from(idBuf);
const b2 = Buffer.from(idBuf_);
console.log(b1.equals(b2));
console.log(idEnc);
console.log(idEnc2);

Note that here we are using UUIDv5 to write a 16 byte UUID. This is then encoded as base58btc using multibase. To decode it, we combine all the base codecs together and reduce them into a single combined decoder. This has a type error due to multiformats/js-multiformats#121, but it works fine. If the base encoded string is not recognised, the exception is:
Or:
So any exception thrown here can be considered a parse error. The multibase encoded string looks like:
Which has a much more constrained alphabet compared to base64, and we always have the |
I wonder whether to integrate multibase directly here in ID or leave it out for the user to integrate outside, in case they are likely to use it elsewhere. This would then ensure that this library focuses on binary representations and leaves the string encoding to the user. |
One of the issues with using a CSPRNG is that acquiring data may be an asynchronous operation. This would mean that the ID generation would be asynchronous as well. Note that sometimes there are both synchronous and asynchronous APIs: https://nodejs.org/api/crypto.html#crypto_crypto_randombytes_size_callback However the webcrypto API specifies a synchronous CSPRNG.
And our usage of node-forge would mean that we have synchronous and asynchronous forms:

// assumes `random` is node-forge's random module
import { random } from 'node-forge';

async function getRandomBytes(size: number): Promise<Buffer> {
return Buffer.from(await random.getBytes(size), 'binary');
}
function getRandomBytesSync(size: number): Buffer {
return Buffer.from(random.getBytesSync(size), 'binary');
}

So our random IDs should then support both synchronous and asynchronous generation. That is, we may return the raw data or return a promise of the data. |
Note that CSPRNGs and PRNGs are both random number generators. CSPRNGs are better than PRNGs because their output is designed to be unpredictable. But this is all pluggable, and for PK's purposes we are going to always package with a CSPRNG. |
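A sketch of what "pluggable" could mean here, assuming a Node environment. The `RandomSource` type and `randomId` function are hypothetical names for illustration; only `crypto.randomBytes` is a real API:

```typescript
import { randomBytes } from 'crypto';

// Hypothetical type: any function that returns `size` random bytes
type RandomSource = (size: number) => Uint8Array;

// Default source backed by Node's CSPRNG
const defaultRandomSource: RandomSource = (size) => new Uint8Array(randomBytes(size));

// Generate a 16 byte random ID from whichever source is plugged in
function randomId(randomSource: RandomSource = defaultRandomSource): Uint8Array {
  const bytes = randomSource(16);
  if (bytes.length !== 16) {
    throw new RangeError('random source must return exactly 16 bytes');
  }
  return bytes;
}

console.log(randomId()); // uses the default CSPRNG
// A fixed source like this is only useful for tests, it is not random at all
console.log(randomId((size) => new Uint8Array(size).fill(7)));
```

The library can then ship with a CSPRNG by default while still letting users swap in their own generator.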
I've been investigating the uuidv7 functionality. It is claimed to sort lexicographically. However I'm not sure if this is the case when it is used as raw bytes without encoding. When I used lexicographic-integer https://github.com/substack/lexicographic-integer, its byte representation was also in lexicographic order, so I didn't have to hex encode to get the lexicographic order, and leveldb supports buffer keys.

// lexi.pack(number) => Array<number> - this converts a number into an array of number byte format
// Buffer.from(Array<number>) => Buffer -> this converts the array of numbers into a binary buffer
Buffer.from(lexi.pack(100))
// the result is usable as a leveldb key

This was proven by several tests that I did in js-db. So I'm going to port over uuidv7 and test whether its byte representation is still lexically ordered. |
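The kind of test meant here can be sketched without any library: encode timestamps as big-endian bytes and check that a byte-wise comparison (which is how LevelDB's default comparator orders keys) agrees with numeric order. `encodeBE` and `compareBytes` are hypothetical helpers written for this sketch:

```typescript
// Encode a non-negative integer below 2^48 as 6 big-endian bytes
function encodeBE(n: number): Uint8Array {
  const out = new Uint8Array(6);
  let rest = n;
  for (let i = 5; i >= 0; i--) {
    out[i] = rest % 256;
    rest = Math.floor(rest / 256);
  }
  return out;
}

// Byte-wise comparison of two byte arrays
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

const timestamps = [1633608789123, 255, 256, 1633608789124, 0];
// Sort by bytes only, then decode back to numbers to inspect the order
const sortedByBytes = timestamps
  .map((n) => encodeBE(n))
  .sort(compareBytes)
  .map((bytes) => bytes.reduce((acc, byte) => acc * 256 + byte, 0));
console.log(sortedByBytes); // [ 0, 255, 256, 1633608789123, 1633608789124 ]
```

This only holds because the most significant byte comes first; a little-endian encoding of the same numbers would not sort correctly as bytes.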
Ok I've reviewed the UUIDv7 and UUIDv8 specs. It looks like the UUIDv8 spec would be the most suitable to handle all of our usecases, and it would allow us to flexibly define and change the id structure. The structure is:
And it is possible to potentially truncate the public key id to fit entirely in the uuidv8. Which ids are actually in a shared context? In most cases you're always going to look up the node id before you look up the individual ids like vault id. Unless vault id were to be preserved across keynodes. It doesn't seem like you need to preserve it entirely if the vault remotes are used. While vault ids may not need to be "shared" anymore, I reckon claim ids may still be if we want to be able to refer to them. Anyway we can make it optional to fill in the node part of the UUID, or else it is just CSPRNG filled random data. |
There's no reference implementation of UUIDv8 atm, so we will just have to start with UUIDv7 reference implementation and port it over. |
So here's some interesting issues with timestamps. JavaScript has a This means:

import { performance } from 'perf_hooks';
// returns the number of milliseconds and microseconds in floating point
console.log(performance.now()); // 1206298.1301128864 => 1206298 milliseconds and 0.1301128864 milliseconds
// to convert to microseconds multiply by 1000
// 1206298130.1128864 microseconds
// 1206298130112.8864 nanoseconds
// 1206298130112886.4 picoseconds
// will be equivalent to Date.now() (but with extra precision of microseconds)
// there's not enough bit space to carry the nano seconds
console.log(performance.timeOrigin + performance.now());

However some browsers will limit this to 1 millisecond resolution. It seems chrome and nodejs don't limit it. Also it seems there's way more than microseconds here. It seems to go all the way to picoseconds. I don't see any documentation describing why this is the case. I think in this case we may have to stick with milliseconds to be portable across browsers for the library, or stick with microseconds if we stick with Nodejs. But milliseconds might be the better idea since we can share the library later. It may actually be because JS floating point numbers have limited bit space, and thus this is the largest floating point number: The usage of microseconds in a 64 bit timestamp would be able to reach 584542 years according to https://en.wikipedia.org/wiki/Microsecond: The
You see that it says "MUST never be negative", it doesn't say it's not allowed to be However by itself this is not enough, we want this to be aligned with "wall clock time" as well, so we have some rough idea of when these events themselves have happened. And if we want to align it, we need to use
The spec says it is monotonic, but this doesn't seem to be the case, how would browsers or node runtime get a monotonic clock? Oh yea it is still bounded by when the process is restarted. So I can imagine, we save the
The usage of the sequence is interesting in that it makes the ID monotonic, but a slight tweak to the above algorithm would allow one to always create a strictly monotonic clock all the time and one could just use that. See also: uuid6/uuid6-ietf-draft#41 (comment) |
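One way the strictly monotonic clock idea could be sketched, as an assumption rather than the algorithm from the spec: remember the last timestamp handed out, and if the wall clock stalls or goes backwards, step 1 ms past it. `createMonotonicClock` is a hypothetical name; the `now` parameter defaults to `Date.now`, but `performance.timeOrigin + performance.now()` could be passed instead, and `lastTs` stands for a timestamp saved from before a process restart:

```typescript
// Hypothetical factory: returns a function yielding strictly increasing millisecond timestamps
function createMonotonicClock(
  now: () => number = Date.now,
  lastTs: number = -Infinity,
): () => number {
  let previous = lastTs;
  return () => {
    const current = Math.trunc(now());
    // If the wall clock stalled or went backwards, step 1 ms past the previous value
    previous = current > previous ? current : previous + 1;
    return previous;
  };
}

// Simulated wall clock that stalls and then jumps backwards
const readings = [1000, 1000, 999, 1005];
let readingIndex = 0;
const clock = createMonotonicClock(() => readings[readingIndex++]);
console.log([clock(), clock(), clock(), clock()]); // [ 1000, 1001, 1002, 1005 ]
```

The trade-off is that under a stalled clock the timestamps drift ahead of wall clock time, which is what the separate sequence field avoids.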
Another issue is the year 2038 problem: if we only use 32 bits and only for seconds https://en.wikipedia.org/wiki/Year_2038_problem then our timestamp would not work. Given that we intend to use milliseconds or even microseconds we are likely to run out even more quickly, so we would need to use a lot more bits. The UUIDv8 scheme allows 32 bits, then 16 bits, then 12 bits for a total of 60 bits. Then the
I reckon even 48 bits would be fine here.
If we use microseconds, then 48 bits is not enough, 60 bits is better.
If we use 64 bits, we have 4 bits left for sequence. 4 bits for sequence from 0 to 15, that is 16 numbers. If we use 60 bits, we have 8 bits left, that's 256 numbers. Anyway milliseconds at 48 bits would mean we have 12 bits for sequence, and the later 8 bits are just random or part of the node identifier. So for now I reckon we use:
|
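As a rough check of the bit budgets discussed above, this sketch estimates how many years a counter of a given width lasts at a given tick rate. It assumes a Julian year of 365.25 days, and `yearsCovered` is a hypothetical helper:

```typescript
// Julian year in seconds, used only for a rough estimate
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

// Years until a counter of `bits` bits overflows, given ticks per second
function yearsCovered(bits: number, ticksPerSecond: number): number {
  return 2 ** bits / ticksPerSecond / SECONDS_PER_YEAR;
}

console.log(Math.floor(yearsCovered(36, 1)));   // seconds in 36 bits: 2177
console.log(Math.floor(yearsCovered(48, 1e3))); // milliseconds in 48 bits: 8919
console.log(Math.floor(yearsCovered(48, 1e6))); // microseconds in 48 bits: 8
console.log(Math.floor(yearsCovered(60, 1e6))); // microseconds in 60 bits: 36533
console.log(Math.floor(yearsCovered(64, 1e6))); // microseconds in 64 bits: 584542
```

This matches the reasoning that 48 bits is plenty for milliseconds but far too little for microseconds.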
I might have |
Note that webcrypto is only available on Node v15 and above. The LTS version of NodeJS is still v14, and doesn't support it, and our

import { webcrypto } from 'crypto';
const data = new Uint8Array(8);
webcrypto.getRandomValues(data);

In browsers, this is just A simple feature test would be to test if
So the best test for whether we can just use the webcrypto API would be:

// globalThis is expected to exist, but if not then use globalThis?.crypto
if (typeof globalThis.crypto?.getRandomValues === 'function') {
// use crypto.getRandomValues();
} else {
// make sure to bring in some default or something else to bring in random values
} |
Proposed API for

import id from '@matrixai/id';
const idGen = id({/* ...options */});
const id1 = idGen.next();
const id2 = idGen.next();
const id3 = idGen.next();
// if we pass in asynchronous CSPRNG or source of information
const idGenAsync = id({async: true, /* ...options */});
await idGenAsync.next();
await idGenAsync.next();

Alternatively we can use ES6 classes with

import Id from '@matrixai/id';
const idGen = new Id();
idGen.next();
await idGen.next();
// i'm not sure if it's possible to have a class with both synchronous and asynchronous iteration?

Then we can support 3 kinds of ids:
The generators will be independent from each other, so properties like monotonicity are not preserved across generators. One has to use the same generator instance. Except during process restart where we would put in the last timestamp created. Since the timestamp is internal data, one might instead ask for the last ID generated, and if you pass that in, it will ensure that the next ids are always ahead of the last id. |
Playing around with these protocols of iterable, async iterable, iterator and async iterator, I can see that it's not possible to have an object be both an async iterator and an iterator. The async iterator interface demands that It seems like our

class Id<T = Iterator<IDBuffer> | AsyncIterator<IDBuffer>> {
next(): T {
}
}

But it seems easier to just create 2 different classes to do this, plus the generator syntax simplifies a lot of this, so a function construction might be better. |
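A sketch of the "2 different constructions" idea using generator syntax. The function names `idSync` and `idAsync` and the random source types are hypothetical; the point is only that `next()` returns a result directly in one case and a promise in the other, so one object cannot satisfy both:

```typescript
type RandomSourceSync = (size: number) => Uint8Array;
type RandomSourceAsync = (size: number) => Promise<Uint8Array>;

// Synchronous generator: next() returns an IteratorResult directly
function* idSync(random: RandomSourceSync): Generator<Uint8Array, void, void> {
  while (true) {
    yield random(16);
  }
}

// Asynchronous generator: next() returns a Promise of an IteratorResult
async function* idAsync(random: RandomSourceAsync): AsyncGenerator<Uint8Array, void, void> {
  while (true) {
    yield await random(16);
  }
}

// Placeholder sources that return zeros, NOT random data, just to show the shapes
const zerosSync: RandomSourceSync = (size) => new Uint8Array(size);
const zerosAsync: RandomSourceAsync = async (size) => new Uint8Array(size);

const syncResult = idSync(zerosSync).next();
console.log(syncResult.done, syncResult.value);
idAsync(zerosAsync).next().then((asyncResult) => {
  console.log(asyncResult.done, asyncResult.value);
});
```

Both generators are also iterable, so `for...of` and `for await...of` work on them respectively.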
Example usage:

// assumes the uuid package
import * as uuid from 'uuid';

function * id () {
while (true) {
const idData = new ArrayBuffer(16);
uuid.v4({}, new Uint8Array(idData));
yield idData;
}
}
function toUUID(idData: ArrayBuffer): string {
return uuid.stringify(new Uint8Array(idData));
}
function main () {
const idGen = id();
console.log(idGen.next());
console.log(idGen.next());
console.log(idGen.next());
const u = idGen.next().value as ArrayBuffer;
console.log(u);
console.log(toUUID(u));
}
main();

The Maybe even spread but limited like It seems like we would want something like |
function *take<T>(g: Iterator<T>, l: number): Generator<T> {
for (let i = 0; i < l; i++) {
const item = g.next();
if (item.done) return;
yield item.value;
}
}
class IdSync implements IterableIterator<ArrayBuffer> {
get(): ArrayBuffer {
return this.next().value as ArrayBuffer;
}
next(): IteratorResult<ArrayBuffer, void> {
const idData = new ArrayBuffer(16);
uuid.v4({}, new Uint8Array(idData));
return {
value: idData,
done: false
};
}
[Symbol.iterator](): IterableIterator<ArrayBuffer> {
return this;
}
}
const idGen = new IdSync();
console.log([...take(idGen, 5)]);

The above creates a class that acts like a generator. But the generator type has |
It turns out trying to use top level await to do something like

let randomSource: (size: number) => Uint8Array;
if (typeof globalThis.crypto?.getRandomValues === 'function') {
randomSource = (size) => globalThis.crypto.getRandomValues(new Uint8Array(size));
} else {
const crypto = await import('crypto');
randomSource = (size) => crypto.randomBytes(size);
}
let timeSource;
if (typeof globalThis.performance?.now === 'function' && typeof globalThis.performance.timeOrigin === 'number') {
// use globalThis.performance.now() and globalThis.performance.timeOrigin
} else {
const { performance } = await import('perf_hooks');
// use performance.now() and performance.timeOrigin
}

is quite difficult. It requires some big changes:
It's just not ready yet. But one day soon it can work. So for now, the default Another thing is that libraries generally should leave the browser bundling to the platform user. That keeps libraries simple so they don't have to have polyfills and assume too much, and one can always use different libraries for different platforms. Also exploring the webcrypto API it appears acquiring random bytes is always synchronous. So in this case we will stick with a synchronous API here. |
We can do this now:

import { IdRandom, IdDeterministic, IdSortable, utils } from '@matrixai/id';

The Right now we cannot really make this library isomorphic, so it is Node specific for now (until we figure out how to write isomorphic libraries, or browserify when we need it). The APIs all return Additionally we have the When we use it in PK DB, it will need to be wrapped as |
BTW apparently Might make type changes in js-db to test this and then it will be easier to use js-id without converting. |
After reviewing everything, I'm changing It uses 36 bits for This is the structure then:
|
I'm hitting a problem for how best to write 36 bits into the array buffer. Some notes here: uuid6/uuid6-ietf-draft#11 (comment) It seems like it should be possible to left shift 12 bits, assuming that we have 64 bit integers... but maybe we don't actually have 64 bit integers in JS.
|
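For reference, JS bitwise operators work on 32 bit integers, so `<<` cannot be used on a 36 bit value. One possible workaround, sketched here as an assumption and not necessarily what this library ends up doing, is to use multiplication (safe up to 2^53) and write the result through a DataView. `packTimestamp` and `unpackTimestamp` are hypothetical helpers:

```typescript
// Pack a 36 bit seconds value and a 12 bit subsecond value into 6 big-endian bytes
function packTimestamp(sec: number, subsec: number): Uint8Array {
  if (!Number.isInteger(sec) || sec < 0 || sec >= 2 ** 36) {
    throw new RangeError('sec must be an integer that fits in 36 bits');
  }
  if (!Number.isInteger(subsec) || subsec < 0 || subsec >= 2 ** 12) {
    throw new RangeError('subsec must be an integer that fits in 12 bits');
  }
  // Multiply instead of `sec << 12`, which would truncate to 32 bits
  const combined = sec * 2 ** 12 + subsec; // below 2^48, exactly representable
  const out = new Uint8Array(6);
  const view = new DataView(out.buffer);
  view.setUint16(0, Math.floor(combined / 2 ** 32)); // top 16 bits
  view.setUint32(2, combined % 2 ** 32); // low 32 bits
  return out;
}

// Reverse of packTimestamp
function unpackTimestamp(bytes: Uint8Array): { sec: number; subsec: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, 6);
  const combined = view.getUint16(0) * 2 ** 32 + view.getUint32(2);
  return { sec: Math.floor(combined / 2 ** 12), subsec: combined % 2 ** 12 };
}

console.log((2 ** 35) << 12); // 0, the high bits are lost by the 32 bit shift
const packed = packTimestamp(1633608789, 504);
console.log(unpackTimestamp(packed)); // { sec: 1633608789, subsec: 504 }
```

BigInt shifts would also work, at the cost of converting between number and bigint at the edges.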
Ok I have a python script that I can use as reference. It seems my bigint idea above isn't working.

from uuid import UUID
from time import time_ns
from random import randint

TOTAL_BITS=128
VERSION_BITS = 4
VARIANT_BITS = 2

# Binary digits before the binary point
SEC_BITS=38

# Binary digits after the binary point
SUBSEC_BITS_S=0
SUBSEC_BITS_MS=10
SUBSEC_BITS_US=20
SUBSEC_BITS_NS=30
SUBSEC_BITS_DEFAULT=SUBSEC_BITS_NS

# Decimal digits after the decimal point
SUBSEC_DECIMAL_DIGITS_S=0 # 0
SUBSEC_DECIMAL_DIGITS_MS=3 # 0.999
SUBSEC_DECIMAL_DIGITS_US=6 # 0.999999
SUBSEC_DECIMAL_DIGITS_NS=9 # 0.999999999
SUBSEC_DECIMAL_DIGITS_DEFAULT=SUBSEC_DECIMAL_DIGITS_NS

SLICE_MASK_0 = 0xffffffffffff00000000000000000000
SLICE_MASK_1 = 0x0000000000000fff0000000000000000
SLICE_MASK_2 = 0x00000000000000003fffffffffffffff

def uuid7(t=None, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
    if t is None:
        t = time_ns()
    i = get_integer_part(t)
    f = get_fractional_part(t, subsec_decimal_digits)
    sec = i
    subsec = round(f * (2 ** subsec_bits))
    node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
    uuid_sec = sec << (subsec_bits + node_bits)
    uuid_subsec = subsec << node_bits
    # randint is inclusive on both ends, so subtract 1 to stay within node_bits
    uuid_node = randint(0, (2 ** node_bits) - 1)
    uuid_int = uuid_sec | uuid_subsec | uuid_node # 122 bits
    uuid_int = __add_version__(uuid_int) # 128 bits
    return UUID(int=uuid_int)

def uuid7_s(t=None):
    return uuid7(t, SUBSEC_BITS_S, SUBSEC_DECIMAL_DIGITS_S)

def uuid7_ms(t=None):
    return uuid7(t, SUBSEC_BITS_MS, SUBSEC_DECIMAL_DIGITS_MS)

def uuid7_us(t=None):
    return uuid7(t, SUBSEC_BITS_US, SUBSEC_DECIMAL_DIGITS_US)

def uuid7_ns(t=None):
    return uuid7(t, SUBSEC_BITS_NS, SUBSEC_DECIMAL_DIGITS_NS)

def __add_version__(uuid_int):
    slice_mask_0 = SLICE_MASK_0 >> (VERSION_BITS + VARIANT_BITS)
    slice_mask_1 = SLICE_MASK_1 >> (VARIANT_BITS)
    slice_mask_2 = SLICE_MASK_2
    slice_0 = (uuid_int & slice_mask_0) << (VERSION_BITS + VARIANT_BITS)
    slice_1 = (uuid_int & slice_mask_1) << (VARIANT_BITS)
    slice_2 = (uuid_int & slice_mask_2)
    uuid_output = slice_0 | slice_1 | slice_2
    uuid_output = uuid_output & 0xffffffffffff0fff3fffffffffffffff # clear version
    uuid_output = uuid_output | 0x00000000000070008000000000000000 # apply version
    return uuid_output

def __rem_version__(uuid_int):
    slice_0 = (uuid_int & SLICE_MASK_0) >> (VERSION_BITS + VARIANT_BITS)
    slice_1 = (uuid_int & SLICE_MASK_1) >> (VARIANT_BITS)
    slice_2 = (uuid_int & SLICE_MASK_2)
    uuid_output = slice_0 | slice_1 | slice_2
    return uuid_output

def get_integer_part(t):
    SUBSEC_DECIMAL_DIGITS_PYTHON=9
    subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
    return int(t / subsec_decimal_divisor)

def get_fractional_part(t, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
    SUBSEC_DECIMAL_DIGITS_PYTHON=9
    subsec_decimal_divisor = (10 ** SUBSEC_DECIMAL_DIGITS_PYTHON)
    return round((t % subsec_decimal_divisor) / subsec_decimal_divisor, subsec_decimal_digits)

def extract_sec(uuid):
    uuid_int = __rem_version__(uuid.int)
    uuid_sec = uuid_int >> (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS)
    return uuid_sec

def extract_subsec(uuid, subsec_bits=SUBSEC_BITS_DEFAULT, subsec_decimal_digits=SUBSEC_DECIMAL_DIGITS_DEFAULT):
    uuid_int = __rem_version__(uuid.int)
    node_bits = (TOTAL_BITS - VERSION_BITS - VARIANT_BITS - SEC_BITS - subsec_bits)
    uuid_subsec = (uuid_int >> node_bits) & ((1 << (subsec_bits)) - 1)
    return round(uuid_subsec / (2 ** subsec_bits), subsec_decimal_digits)

def list():
    print("UUIDv7 sec in sec out subsec in subsec out")
    for i in range(10):
        t = time_ns()
        u = uuid7(t)
        i = get_integer_part(t)
        f = get_fractional_part(t)
        sec = extract_sec(u)
        subsec = extract_subsec(u)
        print(u, str(i).ljust(12), str(sec).ljust(12), str(f).ljust(12), str(subsec).ljust(12))

list()
|
Ok so the first point of difference is the use of fixed point numbers instead of floating point numbers. This PDF explains how to convert a floating point number to a fixed point number: http://ee.sharif.edu/~asic/Tutorials/Fixed-Point.pdf This is used in the part of the script that converts the

f = get_fractional_part(t, subsec_decimal_digits)
subsec = round(f * (2 ** subsec_bits))

Where So This was then converted to Why is The upstream issue (uuid6/uuid6-ietf-draft#11 (comment)) says:
And also:
So here we see that this is earlier than the latest draft of the spec. The idea is to use a fixed point number with The current state of the spec says that I wonder what the advantages are of using a fixed point number here instead of just using the truncated subseconds? Maybe it's more accurate to do it this way? |
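To make the fixed point conversion concrete, here is a sketch of the same calculation as the Python line `subsec = round(f * (2 ** subsec_bits))`, specialised to whole milliseconds in 12 bits. The helper names `msToFixed` and `fixedToMs` and the choice of 12 bits are assumptions for illustration:

```typescript
const SUBSEC_BITS = 12;

// Convert whole milliseconds (0 to 999) into a 12 bit fixed point fraction of a second
function msToFixed(ms: number): number {
  return Math.round((ms / 1000) * 2 ** SUBSEC_BITS);
}

// Convert the 12 bit fixed point fraction back into whole milliseconds
function fixedToMs(fixed: number): number {
  return Math.round((fixed / 2 ** SUBSEC_BITS) * 1000);
}

console.log(msToFixed(123)); // 504, since 0.123 * 4096 = 503.808
console.log(fixedToMs(msToFixed(123))); // 123
console.log(msToFixed(999)); // 4092, still below 2^12 = 4096
```

With 12 bits there are 4096 steps for 1000 possible millisecond values, so every whole millisecond survives the round trip. Note that a fraction which rounds up to 1.000 before conversion would produce 4096 and overflow the 12 bits, so the input should be kept as whole milliseconds below 1000.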
And now I find the up to date reference implementation in python: https://github.com/uuid6/prototypes/blob/main/python/new_uuid.py. And yes, they are going ahead now with 36 bits there, and with the fixed point number system for the subsecond sections. The python mechanism supports nanoseconds. Due to JS cross-platform limitations we are stuck with milliseconds for now even though other parts of this system only work in NodeJS for now. (Nodejs technically supports nanoseconds via hrtime). |
Ok so the in the That is:

sec = 1633608789
unixts = f'{sec:032b}'
print(type(unixts)) # <class 'str'>
print(len(unixts)) # 32

The
It is 32 characters long, and it is 0-padded from the left. The The Python language can use bit strings, and turn them back into numbers with: Therefore when they finally put together all the bits at:

UUIDv7_bin = unixts + subsec_a + uuidVersion + subsec_b + uuidVariant + subsec_seq_node

They are actually just concatenating strings together, and at the end it can be turned into an integer OR hex encoded:
|
In JS, we can achieve a similar thing with
So that seems a lot easier to do than having to fiddle with typed arrays. However this reminded me of something, the bitset data structure that I've used before in js-resource-counter. The library is still maintained today as https://github.com/infusion/bitset.js and the concept of a data structure for bit arrays is located here: https://en.wikipedia.org/wiki/Bit_array. There are even faster libraries these days like https://github.com/SalvatorePreviti/roaring-node which are using compressed bitmaps, but they are overkill for this library. In the wiki article you can see that bit level manipulations can be done with bitwise ops against numbers, in this case the number acts like the "word" structure. It's a bit more esoteric as a proper understanding of the bitwise ops is needed. I think it's sufficient to actually use bitstrings, we just need to create the equivalent of
The We can do typed array as an optimisation later when we have this working correctly and develop the proper test cases. |
To convert a bitstring back to an array of numbers that can be put into a typed array, we may need to chunk it up to 8 bits each and convert to a number. Also the act of converting a number to a base 2 string outputs in big endian form. This makes sense as the 0 padding occurs on the left. So we can go with bitstrings first and then later in the future optimise to using typedarrays directly acting as a bitmap (without bringing in newer libraries). |
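The chunking described above could look something like this. `bin2bytes` and `bytes2bin` are hypothetical helper names for this sketch:

```typescript
// Convert a big-endian bitstring (length a multiple of 8) into bytes
function bin2bytes(bits: string): Uint8Array {
  if (bits.length % 8 !== 0 || /[^01]/.test(bits)) {
    throw new RangeError('expected only 0s and 1s, with a length that is a multiple of 8');
  }
  const bytes = new Uint8Array(bits.length / 8);
  for (let i = 0; i < bytes.length; i++) {
    // Each chunk of 8 characters becomes one byte
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  return bytes;
}

// Inverse: each byte becomes 8 characters, 0 padded on the left
function bytes2bin(bytes: Uint8Array): string {
  return Array.from(bytes, (byte) => byte.toString(2).padStart(8, '0')).join('');
}

const exampleBits = '00000001' + '00000010' + '11111111';
const exampleBytes = bin2bytes(exampleBits);
console.log(Array.from(exampleBytes)); // [ 1, 2, 255 ]
console.log(bytes2bin(exampleBytes) === exampleBits); // true
```

Since a 128 bit bitstring is always a multiple of 8, the full UUID bitstring maps cleanly onto a 16 byte Uint8Array.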
The spec has been updated here: https://github.com/uuid6/uuid6-ietf-draft/blob/master/draft-peabody-dispatch-new-uuid-format-02.txt But it's not published. So that should be followed. There's a bug in the python reference implementation: uuid6/prototypes#8 I'm now fixing that up in our JS port. |
I've got a working test script now:

import { performance } from 'perf_hooks';
import crypto from 'crypto';
const uuidVariantBits = '10';
const uuidVersionBits = '0111';
// this returns milliseconds (with microsecond precision)
let timestamp = performance.timeOrigin + performance.now();
console.log(timestamp);
console.log(timestamp / 1000);
// seconds as integer part
const unixts = Math.trunc(timestamp / 1000);
console.log(unixts);
console.log('%', timestamp % 1000);
// milliseconds is here
// and this is a "fractional" part
// so this is the number of seconds in "millisecond" form
// this will have to be stored in 12 bits
console.log((timestamp % 1000) / 1000);
// this is the "milliseconds" as fractional
// we know this will always be 3 decimal places
const msec = roundFixed((timestamp % 1000) / 1000, 3, 10);
console.log(msec);
const msecSize = 12;
const msecFixed = Math.round(msec * (2 ** msecSize));
console.log(msecFixed);
// other way (when parsing)
console.log(roundFixed(msecFixed / (2 ** msecSize), 3));
const unixtsBits = dec2bin(unixts, 36);
console.log(unixtsBits);
console.log(unixtsBits.length);
const msecBits = dec2bin(msecFixed, 12);
console.log(msecBits);
console.log(msecBits.length);
const seq = 0;
const seqBits = dec2bin(seq, 12);
console.log(seqBits);
// 64 bits
const randomBytes = crypto.randomBytes(8);
const randomBits = [...randomBytes].map((n) => dec2bin(n, 8)).join('');
console.log([...randomBytes]);
console.log(randomBits);
console.log(randomBits.length);
const randomBits_ = randomBits.substr(0, 62);
console.log(randomBits_);
console.log(randomBits_.length);
// nice
const uuidBits = unixtsBits + msecBits + uuidVersionBits + seqBits + uuidVariantBits + randomBits_;
// all done
console.log(uuidBits);
console.log(uuidBits.length);
function roundFixed(num: number, digits: number, base?: number){
const pow = Math.pow(base ?? 10, digits);
return Math.round((num + Number.EPSILON) * pow) / pow;
}
// binary formatted string is naturally big-endian
// will pad with 0 in front
// see: https://stackoverflow.com/a/16155417/582917
function dec2bin(dec: number, size: number): string {
  // avoid `dec >>> 0` here: it truncates to 32 bits, which is too small for the 36 bit unixts
  return Math.trunc(dec).toString(2).padStart(size, '0');
}
// this is the "integer form of 128 bits"
// i'm sure we couldn't actually do this
// but it's a large number
// const uuidInt = parseInt(uuidBits, 2);
// the uuidBits is too big, how do you convert it to hex?
// try something different
// console.log(uuidInt.toString(16));
// 61628e3e-a197-000b-6c2e-87b65fc9bc5
// 123e4567-e89b-12d3-a456-426614174000
// i'm getting only 31 characters
// not 32 characters... why?
function bin2hex(b) {
return b.match(/.{4}/g).reduce(function(acc, i) {
return acc + parseInt(i, 2).toString(16);
}, '')
}
function hex2bin(h) {
return h.split('').reduce(function(acc, i) {
return acc + ('000' + parseInt(i, 16).toString(2)).substr(-4, 4);
}, '')
}
const uuidHex = (bin2hex(uuidBits));
console.log(uuidHex);
console.log(uuidHex.length); |
I've created a number of utility functions to help the creation of
Additional utility functions may be used to encode and decode to multibase but I'll leave that for later. Tests have been added for all of the above. |
It's working so far. Some more needed to get it ready for prod. Regarding the last id tracking, I reckon if the last id is earlier than the current time origin, then we should only use it as the origin plus 1 if the |
One of the cool things I just realised is that because the |
The |
TS accepts For now I'll leave this as documented. |
Multibase encoding and decoding has been added in. However due to a problem involving jsdom: jestjs/jest#8022 I've had to upgrade the jest testing facilities. This may impact downstream PK, so we have to update TypeScript-Demo-Lib and PK related testing.
|
This is published now: https://www.npmjs.com/package/@matrixai/id All done and ready for integration into PK. |
Specification
ID generation is used in many places in PK. But the IDs must have different properties depending on the usecase.
The properties we care about are:
IDs compared to petnames give us the secure and decentralized properties, but not human-meaningful. Human meaningful names can be generated by mapping to a mnemonic. But that is outside the scope of this for now.
There are roughly 4 usecases in PK:
To resolve decentralised vs centralised, rather than assuming that the machine id should be embedded in the identifier, we would instead always expect the machine id to be appended as a suffix. Appending is superior to prepending because it ensures that sorting on the leading part of the ID still works.
We can default to using 128 bit sizes, but allow the user to specify higher or smaller sizes.
We can use a default CSPRNG, but also allow users to submit a custom CSPRNG for random number generation.
To ensure monotonicity, we want to allow the external system to save a clock state and give it to us, so we can ensure that ids are always monotonic.
We may expect our IDs to be later encoded with multibase, so we should allow this library to be composed with multibase later.
Note that ID generation is different when it's meant to be backed by a public key. That is outside of the scope of this library. These IDs are not public keys!
There are places in PK where we use https://github.com/substack/lexicographic-integer, and in those cases we may keep using that instead of this library. However those are cases where we are truly trying to store a number, like the inode indexes in EFS. In the sigchain, what we really want is IdSortable.
Additional context
Tasks
performance.now() and performance.timeOrigin APIs (make it browser possible by testing with a dynamic import?)
IdRandom
IdDeterministic
IdSortable
[ ] - Port tests over from https://github.com/uuid6/prototypes/tree/main/python for IdSortable - created our own tests
[ ] - Test that it is actually returning ArrayBuffer - can't do this because it doesn't work in jest