Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Substrate BIP-39 generate different seeds between Polkadot-API and Subkey #14631

Open
2 tasks done
Lohann opened this issue Jul 25, 2023 · 5 comments
Open
2 tasks done

Comments

@Lohann
Copy link
Contributor

Lohann commented Jul 25, 2023

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

The implementation of Substrate BIP39 is not consistent between polkadot-api and subkey for numerical junctions that have more than 64bits, example:

Using polkadot-api keyring:

import { Keyring } from '@polkadot/keyring';

const keyring = new Keyring({ type: 'sr25519', ss58Format: 42 });
const account = keyring.createFromUri("//18446744073709551616");
console.log(account.address);

// Output: 5FsxbdTHtNSLE5HCLzzGVB2oEzxZB6QJuakiHSoTNWEXD1qj

Using subkey:

$ subkey inspect --scheme=Sr25519 --network=substrate "//18446744073709551616"

Secret Key URI `//18446744073709551616` is account:
  Network ID:        substrate 
 Secret seed:       0xe95232125504305f665b8e8c891e1e1e87fcbb0b4eff38a9e100862207d2ce97
  Public key (hex):  0x11ea83f6705f1e3bd3b50695227a1c64dccd5c368648f17eb80f093695028daa
  Account ID:        0x11ea83f6705f1e3bd3b50695227a1c64dccd5c368648f17eb80f093695028daa
  Public key (SS58): 5CUCLVQzLh5oxNanu3699rqaxmkiQG4kNNd4WCDfJWiqwuYk
  SS58 Address:      5CUCLVQzLh5oxNanu3699rqaxmkiQG4kNNd4WCDfJWiqwuYk

This section of the documentation says:

purely numeric items are interpreted as integers, non-numeric items as strings.

However it doesn't define a max bitlength for the numerical value, the Javascript implementation consider integers of any size:

const RE_NUMBER = /^\d+$/;

RE_NUMBER.test(code)
        ? new BN(code, 10)
        : code

While the Rust implementation only accepts u64:

let res = if let Ok(n) = str::parse::<u64>(code) {
	// number
	DeriveJunction::soft(n)
} else {
	// something else
	DeriveJunction::soft(code)
};

Which implementation is the right one?

Steps to reproduce

No response

@burdges
Copy link

burdges commented Jul 26, 2023

There were older variations in subkey too, so maybe we should document build steps for the important versions of subkey in a wiki somewhere.

@Lohann
Copy link
Contributor Author

Lohann commented Jul 27, 2023

I think we need a formal specification for Substrate-BIP39 that excludes any ambiguity, also include a formal definition for junction derivation for all supported algorithms (ED25519, SR25519 and ecdsa), without this there's the risk of the user lost its funds when trying to migrate its mneumonic pharse to wallets with different implementations.

I'm implemeting a Dart Client for interacting with Polkadot/Substrate, and I found this issue while implemeting Substrate-bip39 in Dart. Without a formal spec I'm not sure what is the correct implementation:

final n = BigInt.tryParse(code, radix: 10);
final Uint8List bytes;
if (n != null &&
    n >= BigInt.zero &&
    n < BigInt.parse('18446744073709551616')) {
  // number
  bytes = U64Codec.codec.encode(n);
} else {
  // something else
  bytes = StrCodec.codec.encode(code);
}

or this:

final n = BigInt.tryParse(code, radix: 10);
final Uint8List bytes;
if (n != null && n >= BigInt.zero) {
  // number
  bytes = toLittleEndianBytes(n);
} else {
  // something else
  bytes = StrCodec.codec.encode(code);
}

Would be so much easier if the standard considered everything as a byte string, but now it would break compatibility.

Also would be nice to enforce some limits and constraints, it would make the implementation safer and more predictable, especially for constrained devices like hardware wallets, ex:
1 - A valid uri cannot have more than 32 junctions, or all complaint devices must support up to 32 junctions.
2 - If the uri starts with 0x it must be a raw hex encoded seed
3 - A valid URI and junction must contain only alphanumeric chars, ex: vessel ladder //$#(* would be an invalid uri
4 - A junction cannot have more than 32 chars in length (either for numeric or string junctions).
5 - A numeric junction must be 256bit unsigned integer LE encoded, otherwise is a string junction SCALE encoded.

@Lohann
Copy link
Contributor Author

Lohann commented Jul 27, 2023

I think that everything that is substrate BIP-39 related must move to substrate-bip39 repo, including junction derivation logic, I had a lot of extra work trying to implement a complaint substrate-bip39 lib because the code is spread between substrate and substrate-bip39 repos.
Having everything centralized in one repo also makes much easier for third-party libs do a native binding and support substrate-bip39.

Also if someone want to support a new signature scheme, like NIST Post Quantum ones, we can discuss how junction derivation should be implemented in one place.

@davxy
Copy link
Member

davxy commented Aug 8, 2023

If I don't miss something, currently the sources of knowledge about some key derivation details are:

Currently the wiki:

  • Doesn't specify that soft derivation is supported only for sr25519
  • The detail the OP is talking about (i.e. serialization of integers ≥ 2^64 is performed using string byte array)

@burdges is there any RFC or "official" doc describing how our HDKD algorithm (not the bitcoin one since is not compatible) should behave for integer junctions with value ≥ 2^26?

cc @lisa-parity what about starting by adding these details to the wiki?
Maybe should not be part of the subkey tool description but as a paragraph of https://docs.substrate.io/learn/accounts-addresses-keys/

@burdges
Copy link

burdges commented Aug 10, 2023

I'd think the spec team should've written something by now, but maybe not.. @drskalman ?

You'll need the code in https://github.com/w3f/schnorrkel/blob/master/src/derive.rs too probably, but maybe wallets do their own hard derivations, not sure. Yes it should be speced it nobody did it.

We make several mistakes in substrate, some inherented from bitcoin and some new, but mostly impossible to fix.

We should've less logical similarity between hard and soft derivations, so they should somehow be much more separated than in bitcoin & similar.

  • A hard derivation is an internal function of a wallet, or even secret delegation between wallets. A hard derivation scheme works for the vast majority of signature schemes, but not any known post-quantum VRF for example.
  • A soft derivation is a family of related accounts, with morally identical secret keys. A soft derivation scheme is a feature of a particular signature scheme, which only exists for some signatures, but not say BLS, and could break other feature. As an example, schnorr soft derivations break the schnorr flavor of ecRecover, but ecRecover was always a stupid thing to do.
  • Also, soft derivations could theoretically be strengthened by proper management of additional entropy, what bitcoin calls the chain code. We messed this up, due to Gav & I miss-communicating, aka the chain code should not human readable. Soft derivations were a hack for UTXOs so we botched them because we're not thinking clearly about UTXOs.
  • Aside form UTXOs, DKGs could work nicely with soft derivations, but not with hard derivations. This is where the chain code mistake becomes a problem.
  • Do not expect post-quantum schemes to have soft derivations. Some actually do of course, likely some have much more interesting tricks, but anything like this requires really careful analysis, which nobody did so far. There are people who I'd trust to do this analysis, but afaik none of them care about blockchains.

In other words, the whole "seed phrase / phrase / phrase" thing makes sense for hard derivations. It's kinda a mess for soft derivations though. Ideally we'd deploy a "new fixed" soft derivation that works under an sr25519 key pair or DKG, but doing this is not on anyone's radar. I'm not even sure if it depends upon the DKG right now.

All the above says, there is limited value in the much of the currently deployed scheme, but like I hinted above there is great value in users being able to access their old accounts on niche signers like subkey. Imho the first question should be: Do we have adequate instructions for accessing every possible past variant under which someone maybe holds DOT/KSM? I suppose this is a question for the W3F tech ed team, but I want to stress that this is the part that matters.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants