Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wallet: add doc for importing seeds #58

Merged
merged 3 commits into from
Nov 19, 2023

Conversation

apoelstra
Copy link
Contributor

This is the start of a wallet authors' document for handling seeds. It covers the rules for importing seeds. A later PR will cover rules for generating them.

@apoelstra
Copy link
Contributor Author

Copy link
Contributor

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Seeds may be encoded across multiple shares, rather than a single string"
Seeds may be split across multiple shares, rather than encoded as as single string.

We might as well offer reference libraries developers can import that do both import and generate, once we have a deterministic generation standard that's beyond reproach.

Imagine if bech32 had recommended decode() only implementation, how many wallets would support it today? (can send to bech32 but not generate them.)

It's generating the codex32 backups that drives the demand for implementing importing them.

  1. It's not necessary to enter the bit length before entering data. If the wrong length (not 16, 32, 64) is given, the implementation can confirm the unusual length. (I don't know, means the length closest to 64, 32, 16 is considered the correct length for error correction, but will consider other lengths if they can't find a matching correction.)

  2. typo "entext"
    I chose to display the user's typing field how the user types.

The corrected string I display Upper if typed mostly upper, lower if typed mostly lower.

If they want to see upper case while entering, every keyboard has a Caps Lock key.

4 character boxes doesn't divide evenly in 256-bit seeds, the final box is half filled.

That's why I used a single text field for recovery. But if you think it's best because of how nice 128-bit looks as 2 groups of 6 groups of 4 characters.

  1. "should not be able to entire mixed-case or non-bech32 characters" typo "able to enter"
    Also let them and then just erasure correct it.
    Same for an invalid header, let them type it wrong, and if the checksum doesn't pass, assume it's identical to the previously valid share(s) for error correction. But it should be pre-filled anyways, which emphasizes the need for header to match on every share input. Perhaps that field is even disabled for editing but shown for visual alignment purposes.

threshold other than 0 and share "s" or 2 - 9 is an auto-corrected/located error (1st share it's an erasure location). reused share indexes can be replaced with an erasure, implementations don't need to check already used valid indexes for a small speed up.

For importing, highlighting the specific location of insertions and deletions (if practical), is fine.

You want to give as much help as possible, and because it can only correct a few errors, it is not overwhelming to show every specific correction like it could be during initial confirmation.

  1. You can't tell the difference between a new share with an index typo and the user retyping in a previous share until they're nearly done with the checksum, since an independent share could theoretically have 25/26 symbols in common with another. Any heuristic here would ban valid share sets.

If to prevent retyping a share in vain, you want to block on the wrong share index, I suggest making a top up or disable typing so the user knows they can't continue until it's correct / unique. But remember they may have an actual typo or error in the paper here causing index reuse, so then what?

I suggest you just mark it as an erasure and let them continue. Most people with legible handwriting will make few enough errors that the checksum + known valid header and known erasures from non CHARSET characters or index reuse will have the power to correct them all.

@BenWestgate
Copy link
Contributor

I'm currently writing a python draft reference implementation of the share generation based on the lengthy discussion in the issue about it.

@apoelstra
Copy link
Contributor Author

apoelstra commented Aug 10, 2023

Imagine if bech32 had recommended decode() only implementation, how many wallets would support it today? (can send to bech32 but not generate them.)

This was the case for many wallets for a long time, actually :). For both Segwit and Taproot. In bitcoin it's inherently easier to send to addresses (you just need to figure out how to decode it into a scriptpubkey) than to send from them (then you need to actually understand the scriptpubkey).

Also let them and then just erasure correct it.

Oh, yeah, this is a good idea.

If they want to see upper case while entering, every keyboard has a Caps Lock key.

Hardware wallets do not. They often have only two buttons, or a scrollwheel.

It's not necessary to enter the bit length before entering data. If the wrong length (not 16, 32, 64) is given, the implementation can confirm the unusual length.

Ah, yeah, confirming length if it isn't one of the "normal" ones is a good idea. Though it does make it a bit harder to detect insertion/deletion errors.

4 character boxes doesn't divide evenly in 256-bit seeds, the final box is half filled.

I think this is fine. 4-character boxes are way easier to scan by eye than continuous text, and it matches the separations on the codex32 worksheets.

If to prevent retyping a share in vain, you want to block on the wrong share index, I suggest making a top up or disable typing so the user knows they can't continue until it's correct / unique. But remember they may have an actual typo or error in the paper here causing index reuse, so then what?
I suggest you just mark it as an erasure and let them continue. Most people with legible handwriting will make few enough errors that the checksum + known valid header and known erasures from non CHARSET characters or index reuse will have the power to correct them all.

I think it'd be far more common for people to try to type the same share twice than for them to make a typo in the share index. I think in practice people would have a document indicating which share indices are in what locations so they pretty-much couldn't get into a situation where they didn't know the correct index.

You want to give as much help as possible, and because it can only correct a few errors, it is not overwhelming to show every specific correction like it could be during initial confirmation.

Yes, but there's a tradeoff here -- we want to help as much as possible but we definitely do not want to import the wrong seed, because that can cause (effective) coin loss when they later import the right seed and can't find their coins.

@apoelstra
Copy link
Contributor Author

As for generation, I'd really like to keep discussion of that to #57 for now, and we can discuss it even further in a followup to this one. But it's a bigger topic and if we discuss it here it'll overwhelm the discussion of import rules.

@BenWestgate
Copy link
Contributor

BenWestgate commented Aug 11, 2023

Hardware wallets do not. They often have only two buttons, or a scrollwheel.

In that case then, they should display upper case for legibility because the users should write that way.

On HWW input style:

I liked the ledger two button keyboard that randomizes your start position in the word list(s), but it's painfully slow.

But the fastest 2 button input, space permitting (and it permits if the device can display the whole string at once) would be:

Seeing 16 and 16 characters choosing group left or right, then 8 and 8 choosing group left or right, then 4, L/R, then 2 L/R selects the character. Or an instruction manual with 32 LLLR, RLRL style codes for each character if space does not permit and it is a new product. '?' needs to be a 33rd character to mark erasure locations for correction. Perhaps holding one of the buttons a couple seconds could type it.

Already used share indexes and invalid thresholds should be removed from the lists, index "s" should auto complete if k=0.

Implementations randomizing the starting index (but not the order) of the displayed group content will prevent microphones from determining the strings entered if the 2 buttons produce distinct noise.

Only 225 button presses instead of potentially 720 average (optimal tapping) if they must scroll thru the entire alphabet sequentially.

Also have a way to go back and fix a mistake, perhaps by holding both buttons a few seconds.

If they do not have color (and perhaps even if they do) corrected error's location or group should be underlined for easy locating.

Ah, yeah, confirming length if it isn't one of the "normal" ones is a good idea. Though it does make it a bit harder to detect insertion/deletion errors.

That's what my implementation does, warns on an unusual length and says codex32 strings are usually 48, XX or XXX characters long. If they proceed with it and the checksum fails, it does only try to error correct it as if it were 48. It could of course try other lengths >48, if 48 fails to validate. I did not implement that however. It was already slow enough.

I'd prefer the length be fixed to 16-byte, 32-byte and 64-byte seeds for stronger insertion and deletion correction safety guarantees. Is there any tangible benefit to a 20-byte seed? If there is none, it should be dropped to gain tangible insert/delete correction improvements, without having to explicitly ask for seed length.

This for Bitcoin seeds, I understand any arbitrary byte data can be encoded manually. But for wallets, who wouldn't use a 128-bit or 256-bit seed?

I think this is fine. 4-character boxes are way easier to scan by eye than continuous text, and it matches the separations on the codex32 worksheets.

I'll update my implementation then. This almost requires asking the seed length (which i want to avoid) on desktop wallets to know how many empty boxes to initially draw. Unless we can find some standard stacked rows of X 4 character boxes format (the strings are really long written horizontally anyhow, like 8-9" wide in most tester's handwriting.) That neatly cuts off for 128-bit seeds while giving space to continue a longer one, the stopping point could be marked in the GUI (128-bit seed).

I think the easiest way I could avoid asking is use a single line but automatically place an em space or three-per-em space between every 4th character typed. That way it's one long box suitable for 128-bit or 256-bit but they still get the visual separation of groups of 4. I'd add an extra em space or three-per-em space between every 6 groups. Or split them on two lines. And the current state of my new confirm dialog I use monospace font so the different characters always neatly fill the 4 character boxes. If I use monospace for importing, an ordinary space or two between groups would suffice. gpg does this when displaying 160-bit hex fingerprints. 5 x 4 characters with a space between each group, then a space and then another 5x4. The center space after 5-6 groups is to avoid losing track of which group you're on.

Automatically typing the space rather than laying out boxes in advance might startle some people into thinking it was their own typo. two spaces would be more certain it's the software's behavior than one. prefilled-ms1, type , boom spaces, despite them being on opposite side of keyboard should immediately train any attentive user to expect it.

I think it'd be far more common for people to try to type the same share twice than for them to make a typo in the share index. I think in practice people would have a document indicating which share indices are in what locations so they pretty-much couldn't get into a situation where they didn't know the correct index.

I've done both, but repeated shares a bit more often. It's probably too late to change now, but this is the consequence of putting the most important character of the header last. None of my testers reported entering a share twice but only a dozen or so tested and mostly threshold 2 backups since it's a hot wallet.

If if were ms1<index><threshold><indentifier>... this would happen less. Eye ignores the constant parts and then the random index gets blended up with the random payload.

So if they don't know the correct index they can manually mark it as an erasure (long press R xor L) to proceed?

That's better for ECC than fudging the character just to get past the nanny.

...we want to help as much as possible but we definitely do not want to import the wrong seed...

Another reason to use ECC payload padding and a bip32 fingerprint or cryptographic hash of valid shares in the identifier when electronically generated.

I noticed the other day, that highlighting groups with an insertion error when confirming a new set, has to be made ambiguous with identifying the location of substitutions (the 4 character box turns red), otherwise there's only 4 deletes to try one of which is guaranteed to resolve it. That seemed too easy on screen, when they really ought to be going back and forth between the paper and the screen.

The general failure mode of an over correction will be a string that is the same as they typed except for in 5-8+ positions so verifying the entire thing is always critical, perhaps we should only lightly highlight error location groups even for the more "certain" substitution corrections then?

Knowing an insertion or deletion correction is correct is not different than knowing if there were 4 or less substitution errors. Maybe only highlight specifically the erasure corrections. But even there? So they can repair their paper copy? Displaying it doesn't help confirm the correction is right unless the character was partially legible.

Error correction back-end implementation details can get updated when electronic generation is standardized to "overclock" the correction a bit since the ID and padding are checksums by default. (and it's an easy question for the user to remember if they set the identifier, or think the backup creator did or not, random usually looks random)

@apoelstra
Copy link
Contributor Author

Only 225 button presses instead of potentially 720 average (optimal tapping) if they must scroll thru the entire alphabet sequentially.

Yikes. And this is per share. I think 720 is too many. Though the binary-tree thing I think would be hard to do on the tiny Ledger screen. 225 is presses is probably okay -- I think entering 12 BIP39 words takes roughly that many presses today. I've done a 24-word entry before, multiple times, which sucks but isn't the end of the world.

At least with our scheme we can zero in on error characters so you don't need to redo the whole thing when you find you've made a mistake.

Is there any tangible benefit to a 20-byte seed?

No, but it is a common thing that BIP39 implementations support (16-word lists). For codex32 this would be a 54-character string which is a weird size but gives you a bit of extra protection against leaked characters. But OTOH I feel like 128-bit seeds already give you at least 16, probably 32, bits of security that aren't pratically needed. I don't know if anybody actually uses 16-word seeds. I think we shouldn't bother supporting it.

This almost requires asking the seed length (which i want to avoid) on desktop wallets to know how many empty boxes to initially draw

I think the empty boxes should be at the end, not the beginning. So you're fine. Your comments about entering everything into one box but with automatic em-spaces (I'd maybe use smaller, e.g. half-character spaces) all sound good to me. Maybe I should weaken the recommendation text here to just say "visually separate groups of 4" without specifying the exact mechanism.

If if were ms1<index><threshold><indentifier>... this would happen less. Eye ignores the constant parts and then the random index gets blended up with the random payload.

Hmmm, it might not be too late to change this. Unsure how many people have actually deployed this in a hard-to-redo setting. (And we can provide guides/tooling for replacing headers; it can be done much much faster than redoing the entire checksum worksheet.) I am the only person I know who is actually using the scheme with real money, and my setup is already non-standard and needs to be replaced.

So if they don't know the correct index they can manually mark it as an erasure (long press R xor L) to proceed?

Regardless of the index position, I do like this idea (both of supporting erasure entry and of recommending people do it for the share index if they're unsure).

perhaps we should only lightly highlight error location groups even for the more "certain" substitution corrections then?

I like this. Maybe we should only provide specific error information for erasures. So if the user legitimately doesn't know a character, he's forced to say so rather than guessing.

Error correction back-end implementation details can get updated when electronic generation is standardized to "overclock" the correction

Yeah. I guess, one thing we should think about now is how a HWW, on import, can detect whether this "overclocking" has been done. Maybe the user should need to explicitly say it.

@BenWestgate
Copy link
Contributor

Only 225 button presses instead of potentially 720 average (optimal tapping) if they must scroll thru the entire alphabet sequentially.

Yikes. And this is per share. I think 720 is too many. Though the binary-tree thing I think would be hard to do on the tiny Ledger screen. 225 is presses is probably okay -- I think entering 12 BIP39 words takes roughly that many presses today. I've done a 24-word entry before, multiple times, which sucks but isn't the end of the world.

If 32 characters doesn't fit to make the first tree, you split it in thirds or quarters and let them scroll between groups of 8 or 10-11 symbols to select the first bit or two with a double press.

...weaken the recommendation text here to just say "visually separate groups of 4" without specifying the exact mechanism

This is more likely to lead to a better design being came up with than imposing something. dash is a very common character used by microsoft and apple for long random codes. 4 character groups is quite standard so that is worth recommending.

I am the only person I know who is actually using the scheme with real money, and my setup is already non-standard and needs to be replaced.

A dozen+ people alpha tested Bails which creates & imports codex32 seed backups to Bitcoin Core and shared feedback on my setup and recovery process. Nobody said they were using the alpha with real funds, but their backups will restore fine on any bip93 compliant implementation if they do.

But the old backups are missing some of the recovery features of last resort we discussed in #57 and the convenient fingerprint ident. My alpha didn't allow choosing identifiers when creating a backup, I want to add the option after we standardize share set generation.

docs/wallets.md Outdated

The process for entering shares is:

1. The user should select the bit length of the import before entering any actual data.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. User's don't necessarily know anything about bit lengths, or what they are. It doesn't appear in the shares themselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. By recommending electronic generations use 16, 32 or 64 byte master_seeds we can guess the correct length. And even for odd-ball hand generated 17-byte sets, the checksum will almost always confirm it was not a typo. User should have to confirm an "unusual" length is correct (matches what is written) however after entering it. Since the checksum can't give absolute guarantees for insertion errors. After the 1st share is validated, then similar to the header, the new length can be used for error correction of subsequent shares.

docs/wallets.md Outdated
1. The user should select the bit length of the import before entering any actual data.
1. Then the user should enter the first share. To the entext possible given screen limitations, when entering share data, data should be displayed in uppercase, visually separated into four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to entire mixed-case or non-bech32 characters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users definitely need to be able to enter at least a ? and in theory accepting non-bech32 characters could be helpful for improving desperate error recovery. (e.g. converting O to 0s).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing a non-bech32 character that has a common confusion B to 8, I to l should try the common substitutions first then turn it into an erasure by substituting ?, other non-bech can be immediately marked as ?, perhaps even as the user types to educate them of the CHARSET.

If the string has more than 8 ? or 13 contiguous (or the combos of contiguous and random mentioned in my error correction thread) the "Submit" button could change to "Attempt Repair" or "Attempt Correction" which will ask if the share set was generated by electronics. Standardized implementations in #57 will have some additional error detection bandwidth in the identifier, payload padding and potentially even share payloads themselves (tbd). Handmade or non-compliant electronic implementations will need to provide a known address to "overclock" the erasure correction.

Hypothetically, this option could appear if more than 4 substitution errors are present as well.

@apoelstra
Copy link
Contributor Author

Updated based on this discussion.

@BenWestgate in future can you post comments on specific text if you think it needs to be changed? Otherwise I fear I will lose track of suggestions.

Copy link
Contributor

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in-line comments that were missed from the original review and some new points.

docs/wallets.md Outdated
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to entire mixed-case or non-bech32 characters.
* The user should not be able to entire mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
* If the header is invalid, the wallet should highlight this and prevent the user from entering additional data until it is fixed. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an invalid header is also one whose threshold and identifier do not exactly match a previously accepted share.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeds may be encoded across multiple shares, rather than a single string.
Should be:
Seeds may be split across multiple shares, rather than encoded as as single string

docs/wallets.md Outdated
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to entire mixed-case or non-bech32 characters.
* The user should not be able to entire mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be able to entire

should not be able to enter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user should not be able to enter a previously used share index at position 9. If they do the user should be immediately warned to enter a different codex32 share than previously entered. "I am" converts it to an erasure ?, while "Thanks/OK" leaves the character unfilled and lets them start over.

docs/wallets.md Outdated

The process for entering shares is:

1. The user should select the bit length of the import before entering any actual data.
1. Then the user should enter the first share. To the entext possible given screen limitations, when entering share data, data should be displayed in uppercase, visually separated into four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
1. The user should enter the first share. To the extent possible given screen limitations, when entering share data, data should be displayed in uppercase, with visually distinct four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For subsequent shares the threshold and identifier (first 8 characters will also be pre-filled).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have "the wallet may pre-fill...". Should I change this from "may" to "must"?

Copy link
Contributor

@BenWestgate BenWestgate Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing good comes from not prefilling (and disabling editing) the threshold and identifier of subsequent shares, they'll just get an error later their shares were incompatible if the checksum passes.

I think must is fine unless you can find any good reason to let them introduce errors to their shares.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optionally the pre-filled 'ms1' can be disabled for editing as well.

@apoelstra
Copy link
Contributor Author

@BenWestgate added a new commit which should address your comments.

docs/wallets.md Outdated
1. The user should enter the first share. To the extent possible given screen limitations, when entering share data, data should be displayed in uppercase, with visually distinct four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to enter mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
* If the header is invalid, the wallet should highlight this and prevent the user from entering additional data until it is fixed. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ben already mention this, but the header is covered by the error correcting checksum. Therefore users need to be able to enter invalid header data all the way to the end so that it can be corrected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For subsequent shares, it should be pre-filled after 'ms1' with the first valid share's header and not editable. The entry cursor starts at position 9, share_index.

What Andrew wants to prevent is someone accidentally re-entering a previous share, as it'd be very frustrating on a ledger style input. The suggestion was a hard warning if share_index is not unique. If user assets they're entering a unique share, index would be replaced with an erasure / ?, otherwise they'd get a new share and enter that..

Regarding threshold 0 and index != s, both the threshold and the share index should be marked with an erasure (?) imo. And during error correction implementations should always substitute index 's' when checking threshold = '0'.

I'm of the opinion certain mistakes should be automatically substituted with ? as the user types, to increase their chances of understanding how the format works and correcting their mistake. With the exception of repeating a share index as that's a waste of user time, that should require confirmation to continue (marking it as an erasure).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inputting share_index = 's' in subsequent share entries should also always be marked as an erasure or changed to '5' (then tried as an erasure if '5' fails to correct) imo. (electronic implementations will shuffle share indexes for redundancy privacy so it's not reasonable to assume they're usually early alphabet characters)

Copy link
Contributor

@BenWestgate BenWestgate Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the 1st valid share is accepted, the length of that share should be used to inform later deletion and insertion errors.

For known wrong strings (wrong length on subsequent, wrong charset/case or any of the erasures mentioned above) the submit button might say "attempt correction".

In my experience testing my error correcting import GUI in Bails, it's much faster to accept an incorrect length entry and let the computer attempt to correct and then verify corrections match the paper, than find and fix the discrepancy yourself. As much as possible, we should allow them to make small mistakes and suggest a correction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change the text to say that the wallet "SHOULD ask for confirmation" before continuing with an invalid header.

docs/wallets.md Outdated
1. The user should enter the first share. To the extent possible given screen limitations, when entering share data, data should be displayed in uppercase, with visually distinct four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to enter mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
* If the header is invalid, the wallet should highlight this and prevent the user from entering additional data until it is fixed. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
Copy link
Contributor

@BenWestgate BenWestgate Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types of errors users make depends on input device.

For example 'B' could be a typo of 'V'. It might make sense for implementations to create a list for erasure correction of invalid characters as some are more likely than others. Similar to how when correcting an invalid share_index, there is no need to check already entered share indices.

Always substituting 'B' with '8' or 'I' with 'L' actually reduces correction ability since it turns an erasure location into an error. OTOH when the suggestion is correct, it's more powerful than marking it as an erasure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides visual similarity, keyboard proximity, there is also auditory similarity:

When spoken over the phone (and foolishly not using NATO phonetic) which characters are most likely to be mistaken for one another in popular languages.

Having a set of common mistakes (from the above 3 risks) for each bech32 character helps the correction power when an error location is known due to being non-bech32 or some of the other rules. Although easiest implementation may try to correct it as an erasure first, THEN substitute the ? for the top picks based on the character they actually typed to see if they help make it correctable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good points. I think my current text, which just gives 8 and B as an example, is okay. I can remove the example if you think it's too leading. But I don't want to try to fit a nuanced paragraph about mistake patterns into the spec, since this isn't the right place for it.

docs/wallets.md Outdated
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to enter mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
* If the header is invalid, the wallet should highlight this and prevent the user from entering additional data until it is fixed. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet may use an error-correction algorithm to determine a corrected share, but the wallet MUST show these corrections to the user rather than silently applying them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the BIP said "SHOULD" provide a corrected string not 'may', did you intend to lower the recommendation here? I have naive python brute force correcting code desktop wallets can use to fix 2-3 errors until rust-bech32 is finished. I think your lookup table approach is not feasible on HWWs though.

About half of my testers tested a version that did not provide suggested corrections and it was usable but harder, especially for a boomer with messy handwriting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will strengthen the recommendation. My original intent was to make this easy to implement but after discussing on IRC, I think error correction (or at least, error locating) is too important to usability.

docs/wallets.md Outdated
* If the header is invalid, the wallet should highlight this and prevent the user from entering additional data until it is fixed. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet may use an error-correction algorithm to determine a corrected share, but the wallet MUST show these corrections to the user rather than silently applying them.
* For substitution errors, the wallet may highlight the offending 4-character window or the offending character. It may also show the corrected character.
* If the wallet can determine insertion or deletion errors, it should highlight the offending 4-character window. When detecting insertion or deletion errors, the wallet may assume that the correct share length is a multiple of 16 bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may assume the correct share length is 16-bytes, 32-bytes or optionally 64-bytes.

docs/wallets.md Outdated
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
* If the first share had index `S`, this was the actual seed and the import process is complete.
* Otherwise, the first character of the share will be a numeric character between `2` and `9` inclusive. The user must enter this many shares in total.
Copy link
Contributor

@BenWestgate BenWestgate Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to support a case where the import progress can be asymmetrically (or symmetrically) encrypted, saved and resumed later by entering a "resume key"? For example Bails runs on a laptop, if the user is physically visiting share locations, the battery could die before a threshold is reached, not to mention issues of not powering the device off if crossing international borders.

Feel free to open a new issue for this, or I will if it's something wallets should support as it's an implementation detail of decent complexity, especially since flash memory cannot be relied upon to reliably erase itself due to wear leveling. (secure erase is done by physically shredding/burning the "resume key" afterwards)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I would like to support this. I did actually do such a process by hand and it was very useful to be able to do.

But I think of this document as a "minimum set of supported features", which is also why I make pretty weak recommendations for error correction etc. With the intention that wallets are free to try to add extra functionality. I'm not sure how best to express this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the useful but optional features using the word "May" so developers can consider if they are worth implementing or not.

Error correction seems like a "should, whereever possible" feature since it makes a large UX improvement.

Electronics allow asymmetric encryption. The header of saved share progress can index the public key of the private 'resume key' for a particular set of shares being restored in multiple sessions. This avoids traveling to shares with the symmetric key that decrypts them all. This is a nice to have, but it's extra labor for users to write a restore key and not helpful for every recovery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to how the BIP says wallets SHOULD display a corrected string. Without going into any details how to. One sentence: "Wallets MAY encrypt and store recovery progress, facilitating resumption of the process later by entering a designated 'resume key'."

I wouldn't even go into how to encrypt, seems device specific. I will eventually implement this in my project, I'll probably go the asymmetric encryption route. Then we can make a reference implementation for "save/resume"

This project uses SSS and had to solve similar problem. I gave them a few options and they went with: each time a share was gathered or transaction partially signed, they'd encrypt it by the pubkeys of all the remaining locations. So data in transit is not readable.
https://github.com/wild-kard/arctica/

I preferred writing a fresh resume key after the first save, keeping it somewhere safe, encrypting everything gathered by its public key, decrypting in a safe place to recover the secret and shredding it after.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error correction seems like a "should, whereever possible" feature since it makes a large UX improvement.

Yeah, I think you're right. Though it is a big implementation burden which sucks. I guess having reference Python code will make the situation better.

Wallets MAY encrypt and store recovery progress, facilitating resumption of the process later by entering a designated 'resume key'

I think even this much specification leads to complicated questions about key management. It's not obvious to me that the "asymmetric crypto with resume keys at each location" paradigm is the right one. I'm tempted to just say "Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once" and leaving it at that.

Copy link
Contributor

@BenWestgate BenWestgate Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a single resume key that's written down and kept somewhere safe before the user shuts down if they want to save progress. Then subsequent shares encrypt to its pubkey.

But if you think even mentioning any implementation detail opens too much a can of worms, it's fine to omit, I'm sure almost every project of appreciable size will eventually have the open issue: "I shut down while restoring and lost my progress, can we get a save feature" and they can see what works for their hardware's threat model.

"Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once"

This is great.

Without ECC, Codex32 could be harder to import than bip39, where at least mispelled semi-legible words have natural error correction. So I think it's important for it to be an all around superior replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you think even mentioning any implementation detail opens too much a can of worms, it's fine to omit, I'm sure almost every project of appreciable size will eventually have the open issue: "I shut down while restoring and lost my progress, can we get a save feature" and they can see what works for their hardware's threat model.

Yep. And I'm happy to revisit this doc later if some sort of consensus emerges about how best to do this.

Without ECC, Codex32 could be harder to import than bip39, where at least mispelled semi-legible words have natural error correction. So I think it's important for it to be an all around superior replacement.

Definitely agreed. (Though BIP39 does have a few distance-one words so sometimes you don't get natural-word error correction. I have been burned by this with real funds before. Was not impressed after writing a bunch of python seed-grinding shit and loading real seed data into it, to find the spec had screwed me like this.)

@apoelstra
Copy link
Contributor Author

Pushed another round of changes.

I wonder if we should change to a different format (e.g. HTML) at least temporarily, so that I can break bullet points across multiple lines in the text fie. I find the diff is extremely hard to read when we are editing these super-long lines.

Copy link
Contributor

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string (or secret or share) is most appropriate for the 1st entry as we do not know which it is yet.

are we using the term "seed ID" or "identifier" for this document?

docs/wallets.md Outdated
* If the checksum is invalid, the wallet may use an error-correction algorithm to determine a corrected share, but the wallet MUST show these corrections to the user rather than silently applying them.
* For substitution errors, the wallet may highlight the offending 4-character window or the offending character. It may also show the corrected character.
* If the wallet can determine insertion or deletion errors, it should highlight the offending 4-character window. When detecting insertion or deletion errors, the wallet may assume that the correct share length is a multiple of 16 bytes.
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For shares after the first, a header is invalid if its share index is 'S' or matches the share index of any previously accepted share.

You don't need to mention threshold and identifier as these SHOULD (MUST?) be prefilled with the first share's values after 'ms1' in subsequent shares.

This conveniently pre-fills two 4 character blocks by the way and looks quite nice with 10 remaining.

Personally, I would not block and request confirmation on subsequent share has share index 'S', it's obviously an isolated error, correct it as an erasure afterwards. I would block on a repeated share index though so they do not re-enter the same share accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it obviously an isolated error, or did they somehow produce an S share (or recover/store it)? It's not obvious to me at all that a S share is the result of the user mistyping the share index.

As for the threshold and identifier, while these are pre-filled, I have been assuming the user would have the ability to backspace over them and try to replace them.

Copy link
Contributor

@BenWestgate BenWestgate Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the solution is the same as reusing a previously used share index. Warn them and ask for confirmation to continue.

I suppose if it ends up being a valid codex32 secret then they are done importing regardless of threshold. And if it's not valid then we have a strong error location hint just like repeated indices.

I see no benefit to being able to backspace over them. Any change will either introduce an error or produce an invalid set of shares. Why let them mess up?

Or are you suggesting we let them input different identifiers and then do best of 3 error corrections on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no benefit to being able to backspace over them. Any change will either introduce an error or produce an invalid set of shares. Why let them mess up?

Or are you suggesting we let them input different identifiers and then do best of 3 error corrections on it?

I had something like that in mind. Maybe it's better to just refuse to let the user overwrite it. It does seem like it can pretty-much only make things worse. After all, their first share was accepted, with a passing checksum, with the other identifier. So if their next share has a different identifier, it either has a mistake in it (which would be corrected by forcing the identifier to be the same as the first one), or it's part of a different share set (in which case all bets are off).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roconnor mentioned there is more error correction possible for invalid sets if given more elements than the threshold. So an "invalid set" repair utility could relax the share index, identifier, index, length reqs and threshold share limit to do deeper corrections across damaged shares.
This is also where a known address, electronically generated identifier and/or padding can help, but the tool would need to ask about its existence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roconnor mentioned there is more error correction possible for invalid sets

We believe this, but we don't know any efficient algorithms for doing this. So I think any such thing is out of scope for this spec.

docs/wallets.md Outdated
* For substitution errors, the wallet may highlight the offending 4-character window or the offending character. It may also show the corrected character.
* If the wallet can determine insertion or deletion errors, it should highlight the offending 4-character window. When detecting insertion or deletion errors, the wallet may assume that the correct share length is a multiple of 16 bytes.
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the share and show these to the user. It MAY additinally determine corrected share data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locate errors in the string...

It MAY additionally determine corrected string data

Codex32 secrets get errors too!

docs/wallets.md Outdated
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the share and show these to the user. It MAY additinally determine corrected share data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
* To show locations of substitution errors, the wallet SHOULD highlight the offending 4-character window or the specific offending character.
* If the wallet can determine insertion or deletion errors, it SHOULD highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct share length is 16, 32 or (optionally) 64 bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters, corresponding to 128-bit, 256-bit and optional 512-bit master seeds.

docs/wallets.md Outdated
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the share and show these to the user. It MAY additinally determine corrected share data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
* To show locations of substitution errors, the wallet SHOULD highlight the offending 4-character window or the specific offending character.
* If the wallet can determine insertion or deletion errors, it SHOULD highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct share length is 16, 32 or (optionally) 64 bytes.
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
Copy link
Contributor

@BenWestgate BenWestgate Aug 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if the checksum fails, we assume the length is the closer of 48 or 74? That's dangerous if they do have a custom length string they may get a wrong correction.

If the string length is not 48, 74 or (optionally) 127 characters, the wallet should confirm that the user intends to import a non-standard string length after a checksum failure.

My implementation warned and asked for confirmation "Entered string length X is an unusual length. Codex32 strings are usually 48 or 74 characters.", then only error corrected it as if it were 48 if they didn't confirm it was intentionally non-standard length.

I have no insert and delete correction for custom length strings, although you could still try about half the usual amount. (they become "length errors" rather than "length erasures", so to speak.

Technically, if you simply append 13 (or 15) characters you always create a valid codex32 string. By checksumming the old one. So that sets an upper bound on deletion correction to at most half, 6 characters. My laptop only had enough memory to correct 2 deletions, maybe 2 deletions and 1 insertion but took minutes not seconds.

Unknown length string deletion corrections MUST be limited to at most 3.

Not sure if more informed error correction can lead to more informed candidate deletion corrections (by treating the candidate deletion locations as erasures) to improve performance. It's plausible to skip 3+ characters, especially in the longer length 74 strings.

Insertions are much faster to correct with brute force as only locations need to be tried. I could correct as many as 9 in a few minutes, 5 or 6 in useful time for a known 48 char length. I'm unsure how this impacts the odds of getting false corrections as the amount increases.

They will correct much slower for ~74 character strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed "share" to "string" and added a sentence that if you have a nonstandard length and failed checksum, then the wallet MAY "attempt correction by adding or deleting up to 3 characters".

docs/wallets.md Outdated
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the share and show these to the user. It MAY additinally determine corrected share data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
* To show locations of substitution errors, the wallet SHOULD highlight the offending 4-character window or the specific offending character.
* If the wallet can determine insertion or deletion errors, it SHOULD highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct share length is 16, 32 or (optionally) 64 bytes.
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
* If the first share had index `S`, this was the actual seed and the import process is complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the codex32 secret...

docs/wallets.md Outdated
* If the header is invalid, the wallet SHOULD highlight this and request confirmation from the user before allowing additional data to be entered. An invalid header is one that starts with a character other than `0` or `2` through `9`, or one which starts with `0` but whose share index is not `S`. For shares after the first, a header is also invalid if its threshold and identifier do not match those of the first share.
* If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the share and show these to the user. It MAY additinally determine corrected share data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
* To show locations of substitution errors, the wallet SHOULD highlight the offending 4-character window or the specific offending character.
* If the wallet can determine insertion or deletion errors, it SHOULD highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct share length is 16, 32 or (optionally) 64 bytes.
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the first string... the wallet now knows the seed ID and threshold value which SHOULD be pre-filled for subsequent shares.

Also,
seed ID or identifier?

docs/wallets.md Outdated
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
* If the first share had index `S`, this was the actual seed and the import process is complete.
* Otherwise, the first character of the share will be a numeric character between `2` and `9` inclusive. The user must enter this many shares in total.
* Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once. The details of this are currently outside of the scope of this specification.
1. The user should then enter the remaining shares, in the same manner as the first.
* The wallet may pre-fill the header (threshold value and seed ID).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may should be should. There is no upside to letting them make header errors on subsequent shares. The resulting set will be invalid even if new headers pass checksum.

docs/wallets.md Outdated
1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
* If the first share had index `S`, this was the actual seed and the import process is complete.
* Otherwise, the first character of the share will be a numeric character between `2` and `9` inclusive. The user must enter this many shares in total.
* Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once. The details of this are currently outside of the scope of this specification.
1. The user should then enter the remaining shares, in the same manner as the first.
* The wallet may pre-fill the header (threshold value and seed ID).
* If the user tries to repeat an already-entered share index, they should be prevented from entering additional data until it is corrected, with the exception that `?` may be used as a share index arbitrarily many times. The wallet may guide the user by indicating that a share index has been repeated; if the user indicates that they are not repeating the share, the share index should be replaced by `?`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

erasure correction MUST not replace the ? with a previously valid share's index.

small performance improvement that helps insertion/deletion correction as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make error correction harder to implement. I'd rather not recommend it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then what does the wallet do when it finds a "valid" correction that reuses a previous share index or has a different threshold or identifier?

This could happen when doing insertion/deletion correction.

Even my naive brute force assumes valid the previous header and only attempts corrections to share index, payload and checksum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it finds a valid correction that causes a share to be reused it should error out and make the user remove at least one of the shares from the set.

Brute force algorithms are easy to tweak to force certain conditions on the corrected string, versus standard error correction algorithms which will always find the closest match. If the closest match has an invalid share index, I'm very doubtful that the "next-closest match" which does have a valid share index will be unique, and even if it is, I don't know how to find it, and even if I could, I don't think that finding a worse match (in terms of distance) is a better idea than just rejecting the share data.

In other words, this situation can't happen except when the user has 5 or more errors in their string and there is a valid string with distance 4 or less, whose share index has already been used. So we definitely have two possible corrections (the distance-4 and the distance-5 one) and probably many more. I don't see any principled way to pick one so we should give up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation also can happen when correcting insertions and deletions since the checksum has no "4 or less" guarantee.

The correction algorithm has to consider every possible location as a potential edit. So here it can leave the pre-filled header alone.

Giving up is reasonable when multiple corrections are both valid shares and valid share set members for equal edit distance. I'd treat a "valid" corrected share that does not belong in the share set as invalid and try to do an additional round of insert/delete. That's only ~1,600x more candidates to check. The 20-bits of known ident should break any ties and produce just 0 or 1 valid correction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this alone, other than to capitalize "should".

Let me spend some time implementing error correction in rust-codex32, and maybe trying to implement insertion/deletion correction, and then see if I develop an intuition about what sorts of constraints are reasonable for correction implementations.

1. If the share length is *not* 16, 32 or 64 bytes, but the checksum passes, the wallet should confirm that the user intends to import a non-standard share length.
1. After the first share has been entered and accepted, the wallet now knows the seed ID and threshold value.
* If the first share had index `S`, this was the actual seed and the import process is complete.
* Otherwise, the first character of the share will be a numeric character between `2` and `9` inclusive. The user must enter this many shares in total.
* Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once. The details of this are currently outside of the scope of this specification.
1. The user should then enter the remaining shares, in the same manner as the first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For insertion/deletion correction, the wallet MUST assume the valid length is equal to the first share.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

   * The wallet MUST assume the valid length of all subsequent strings is equal to the valid length of the first string. If the lengths do not match, the wallet MAY attempt correction by deleting or inserting up to 3 characters.

@BenWestgate
Copy link
Contributor

Pushed another round of changes.

I wonder if we should change to a different format (e.g. HTML) at least temporarily, so that I can break bullet points across multiple lines in the text fie. I find the diff is extremely hard to read when we are editing these super-long lines.

I'm not having difficulty reviewing it yet. Do what's best for you.

Copy link
Contributor

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to not specify an obvious example:

This phrasing conveys developers should consider replacing characters that are frequently mistaken for bech32 characters, whether due to visual similarity, keyboard proximity, or spoken language confusion.

docs/wallets.md Outdated

1. The user should enter the first share. To the extent possible given screen limitations, when entering share data, data should be displayed in uppercase, with visually distinct four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
1. Once the first share is fully entered, the wallet should validate the checksum and header before accepting it.
* The user should not be able to enter mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, e.g. by attempting to replace all `B`s with `8`s.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This may be useful to guide error correction, by attempting to replace commonly confused characters.)

@apoelstra
Copy link
Contributor Author

Ok, pushed one more commit. I'd to push forward on this and try to get something mergeable this week. In parallel I'll spend some time on error correction.

@BenWestgate
Copy link
Contributor

BenWestgate commented Sep 24, 2023 via email

@BenWestgate
Copy link
Contributor

BenWestgate commented Sep 24, 2023 via email

@apoelstra
Copy link
Contributor Author

@BenWestgate force-pushed to change the lengths and to remove the "up to 3" restriction on length-correcting subsequent strings.

@BenWestgate
Copy link
Contributor

BenWestgate commented Sep 24, 2023 via email

@BenWestgate
Copy link
Contributor

Okay I'll leave my last comments here so you can get it merged:

Codex32 should be Capitalized when it begins a sentence:

For Wallet Developers

codex32 is a new format

Import Support

codex32 shares may be any length between 128 and 512 bits.

BenWestgate

This comment was marked as duplicate.

Copy link
Contributor

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to locate errors in the string and show these to the user. It MAY additionally determine corrected data, but if so, the wallet MUST show these corrections to the user rather than silently applying them.
  • To show locations of substitution errors, the wallet SHOULD highlight the offending 4-character window or the specific offending character.
  • If the wallet can determine insertion or deletion errors, it SHOULD highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).

Change this to:

  • If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to attempt to correct errors in the string. If a valid candidate correction is found, the wallet MUST show it to the user for confirmation rather than silently applying it.
  • To show locations of substitution errors, the wallet MAY highlight the offending 4-character window or the specific offending character.
  • If the wallet can determine insertion or deletion errors, it MAY highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).
  1. If the string length is not 48, 74 or 127 characters, and the checksum does not pass, then the wallet MAY attempt correction by deleting and/or inserting up to 3 characters, as long as the resulting string has a valid length for a codex32 string.

Should be:

  1. If the checksum does not pass, then the wallet MAY attempt correction by deleting and/or inserting up to 3 characters, as long as the resulting string has a valid length for a codex32 string.

I believe you've favored MAY over SHOULD here because correcting some insert/delete patterns requires brute force.

I would also swap the order of "If the wallet can determine insertion or deletion errors" and "If the checksum does not pass, then the wallet MAY attempt correction by deleting and/or inserting"

As the attempt to correct comes before the possible display of corrections.

With these changes it should be perfect.

@apoelstra
Copy link
Contributor Author

Agreed with all suggestions. I did the line reordering in a separate commit so hopefully the diff is easier to read (should appear as moved lines rather than changed ones).

@BenWestgate
Copy link
Contributor

BenWestgate commented Oct 26, 2023

  1. Once the first string is fully entered, the wallet should validate the checksum and header before accepting it.

If the checksum does not pass, then the wallet MAY attempt correction by deleting and/or inserting up to 3 characters, as long as the resulting string has a valid length for a codex32 string.
If the checksum is invalid, the wallet SHOULD use an error-correction algorithm to attempt to correct errors in the string. If a valid candidate correction is found, the wallet MUST show it to the user for confirmation rather than silently applying it.

Is clearer and more concise like this:

  • If the checksum is invalid, to attempt to correct errors in the string, wallets:
    • SHOULD substitute up to 4 characters using an error-correction algorithm.
    • MAY delete and/or insert up to 3 characters, as long as the result is a valid length codex32 string. Wallets MAY assume that the correct length is the closest of 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).
  • If a valid candidate correction is found, the wallet MUST show it to the user for confirmation rather than silently applying it.

Then for the last two repetitive bullets:

To show locations of substitution errors, the wallet MAY highlight the offending 4-character window or the specific offending character.
If the wallet can determine insertion or deletion errors, it MAY highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).

These can be one bullet if combined with the suggestion above:

  • When displaying a candidate correction, wallets MAY highlight the corrected 4-character windows and/or the specific locations of substitution, insertion and/or deletion corrections.

Since we now display a corrected string instead of highlight wrong locations, errors becomes "corrections". "Locations" over characters because corrections that delete a character are invisible.

This can be a sub-bullet under "If a valid candidate correction is found" if that formatting looks better.

  • If a valid candidate correction is found, the wallet MUST show it to the user for confirmation rather than silently applying it.
    • When displaying a candidate correction, wallets MAY highlight the corrected 4-character windows and/or the specific locations of substitution, insertion and/or deletion corrections.

Finally, notice "the wallet" vs "wallets", characters can be saved with the phrase "wallets". With the first use being "compliant wallets", "compatible wallets" or "codex32-enabled wallets". This is up to you, but it should be consistent throughout, so go ahead and change the quotes above back to "the wallet" if you prefer that or don't want to change it.

@BenWestgate
Copy link
Contributor

BenWestgate commented Oct 26, 2023

Under step 4:

  • If the checksum fails, the wallet MAY attempt correction by deleting and/or inserting up to 3 characters. However, the wallet MUST assume the valid length of all subsequent shares is equal to the valid length of the first share, so the number of characters inserted and deleted must net out to the correct length.

Either remove the mention of "up to _" here and let developers figure it out the hard way or say:

When attempting to correct subsequent shares, wallets MAY delete up to 6 characters and/or insert up to 4 characters. However, the wallet MUST assume the valid length is equal to the length of the first share, so the number of characters inserted and deleted must net out to the correct length.

alternatively, "...wallets MAY insert and/or delete characters. However..."

The edit correction can go further safely (and quickly) when the target length is known.

Here is a comment on something we may need to mention in step 2:

But insertions are like substitutions, it's unlikely a wallet could attempt to correct say 1 insertion of a missing character AND still try 4 substitutions without a High Risk of finding over corrections. Since it's similar to 5 substitutions. Insertion correction is like substituting one of the invisible '' strings around each character of an invalid string with one from the charset.

If I'm correct about this limitation, then a bullet saying the maximum amount of correction attempted MUST be erasures/2 + deletions/1.5 + substitutions + insertions <= 4. You might find a way to express this including the potential to correct contiguous erasures up to 13 (or 15?). I commented somewhere that BIP93 forgot to mention correction of these "combination" errors but they occur constantly in practice.

Where 6 deletes comes from.
Deletions are less complex than substitution errors by a factor of 32 (they always substitute '' rather than any character), erasures are less complex than a substitution error by a factor of the string length (position known vs unknown) so 32/(48-8) * 8 = 6.4 was where I came up with a maximum deletion quantity of 6. Or the checksum length / 2 (imagine appending a checksum to a valid string, then partially truncating it to make it invalid again, you should always correct to the closest valid string, trying 7 deletes would not be it.)

We said wallets SHOULD attempt error correction, but we said

"The user must be able to enter all bech32 characters as well as ? indicating an erasure."

Since we said they MUST accept '?' as a character then they MUST attempt erasure correction. Which we did NOT MENTION as the first thing to do when a string is invalid because it contains 1 or more '?' characters.

Or if you don't want to make erasure correction a MUST that will need to be changed to SHOULD accept '?' and IF they accept '?' they MUST attempt erasure correction for at least one erasure (up to 13 continuous or 8 discontinuous).

@apoelstra
Copy link
Contributor Author

Ok, pushed to:

  • Use your suggested text where it made sense.
  • Introduce the term "error-correcting wallet" and use that to make it clearer under what conditions wallets MUST/MAY do things (ECWs have more MUSTs)
  • Remove the "up to 3" condition on insertions/deletions.

Right now I specify that ? indicates an erasure but I'm silent on how the wallet treats this, because I leave erasure support as optional. If the wallet does not support erasures it can just treat ?s as substitution errors.

I also don't specify what to do in the case that there are multiple possible corrections.

docs/wallets.md Outdated
## Error Detection and Correction

Wallets MUST support detection of errors using the codex32 checksum algorithm.
Wallets MAY additionally support error correction; such wallets are referred to as "error-correcting wallets (ECWs)" and have additional requirements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mentioned on IRC, for codex32 to be easier to use than bip39 it needs error correction. That observation changed the original document from MAY correct to SHOULD correct, so MAY here seems like a regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can correct (some) errors by hand using a lookup table. So it's easier to use than bip39 regardless of what the wallet supports.

But sure, I'll change this to SHOULD.

docs/wallets.md Outdated

An ECW

* MUST support correction of up to 4 substitution errors;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as though implementations that can't correct "up to 4 substitution errors" ... "SHOULD NOT" expose error-correction function...

This is fine if there's a reference implementation to copy before anyone reads at this. Otherwise, it'll cause wallet developers to skip the ECC entirely. The ECC is very helpful even correcting up to 2 edits not "up to 4."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll write a reference implementation. I guess you are imagining that a wallet could support correcting up to 2 substitution via brute-force? Then what happens if it can't correct a string? It shouldn't say "uncorrectable" because the string may well be correctable, the wallet just has a poor implementation. So we would need extra text to discuss this case.

Better to just say "implement a standard ECC algorithm or don't bother".

Copy link
Contributor

@BenWestgate BenWestgate Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did 2 substitutions by brute force.
It will say the checksum is invalid and won't display a candidate correction. Same as with full ECC. No claim an input is not "correctable", just display the closest valid candidate found, if any.

Even standard ECC can't correct subsequent shares if the closest valid checksum share has an invalid header.

docs/wallets.md Outdated
* MUST attempt error correction of substitution and erasure errors.
* MAY attempt correction by deleting and/or inserting characters, as long as the resulting string has a valid length for a codex32 string.
* If a valid candidate correction is found, the wallet MUST show it to the user for confirmation rather than silently applying it.
* When displaying a candidate correction, wallets MAY highlight the corrected 4-character windows and/or the specific locations of substitution, insertion and/or deletion corrections.
1. After the first string has been entered and accepted, the wallet now knows the identifier, threshold value and valid length.
* If the first string had index `S`, this was the codex32 secret and the import process is complete.
* Otherwise, the fourth character of the share will be a numeric character between `2` and `9` inclusive. The user must enter this many shares in total.
* Wallets MAY encrypt and store recovery progress, to allow recovery without having all shares available at once. The details of this are currently outside of the scope of this specification.
1. The user should then enter the remaining shares, in the same manner as the first.
* The wallet SHOULD pre-fill the header (threshold value and identifier).
* If the user tries to repeat an already-entered share index, they should be prevented from entering additional data until it is corrected, with the exception that `?` may be used as a share index arbitrarily many times. The wallet may guide the user by indicating that a share index has been repeated; if the user indicates that they are not repeating the share, the share index SHOULD be replaced by `?`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we say '?' MAY be used as a share index and if they repeat an index but claim they aren't repeating, the index SHOULD be replaced by '?'. But we didn't say ECWs MUST or SHOULD interpret '?' as an erasure and we don't say what non-ECWs should do when the user tries to type a reused index and claims they are not repeating.

Maybe don't let non-ECWs type a reused share index at this position. Then we wouldn't have to substitute '?'. Another non-ECW idea, if they're trying to repeat an index, ask them to verify / re-enter the 9th character from their source copy of the data, then fail early if they confirm it's a repeat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the text so wallets SHOULD be ECWs, but if they're not error-correcting, they SHOULD NOT accept ?.

Copy link
Contributor

@BenWestgate BenWestgate Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. So non-ECW will not let user type '?' or a repeated index and will error w/ explanation until a fresh index is provided. ECW will warn and replace it with '?' if user confirms they are entering a new share. (perhaps it's an error in their source data that'll be corrected.)

Non-ECW also won't let users type an index besides 'S' when the threshold typed was '0', or any non-bech32 characters (unless auto-corrected to a bech32 one), following this theme of not letting users type errors the wallet won't correct

if k == '0' and index != 'S', ECWs can mark both as erasures after a warning to double check the source data. (We know at least one is an error, so marking two erasures matches the best case: 1 error and beats the worst case, both are errors.)

docs/wallets.md Outdated
@@ -33,21 +46,22 @@ Wallets SHOULD support import of 128- and 256-bit seeds; other lengths are optio
The process for entering codex32 strings is:

1. The user should enter the first string. To the extent possible given screen limitations, data should be displayed in uppercase with visually distinct four-character windows. The first four-character window should include the `MS1` prefix, which should be pre-filled.
* The user should not be able to enter mixed-case characters. The user must be able to enter all bech32 characters as well as `?` indicating an erasure. Wallets may allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, by attempting to replace commonly confused characters.)
* The user should not be able to enter mixed-case characters. The user MUST be able to enter all bech32 characters. ECWs MUST also allow entry of `?` which indicates an erasure (unknown character). Wallets MAY allow users to enter non-bech32 characters, at their discretion. (This may be useful to guide error correction, by attempting to replace commonly confused characters.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here ECW MUST allow '?' but only MAY support correction of erasures, so you propose '?' be filled as errors despite their known location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

docs/wallets.md Outdated
* If the wallet can determine insertion or deletion errors, it MAY highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).
1. Once the first string is fully entered, the wallet MUST validate the checksum and header before accepting it.
* If the checksum does not pass, then an ECW
* MUST attempt error correction of substitution and erasure errors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you meant MUST correct erasures when introducing the ECW wallet, this is correct, the introduction was wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"MUST attempt correction of erasures and substitution errors."

docs/wallets.md Outdated
An ECW

* MUST support correction of up to 4 substitution errors;
* MAY also support correction of up to 8 erasures;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contradicts line 53, use MUST support correction of erasures. Especially since standard QR encodings may erase 1 character to fit in 25x25.

Copy link
Contributor

@BenWestgate BenWestgate Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"MUST also support up to 8 erasures, 13 if contiguous;"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the problem here. Should I change line 32 to say "substitution or erasures" so that it's clear that any wallet that can correct 4 substitutions can also correct 4 erasures?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be more clear. But the detail I missed was that during correction without erasure correction support, '?' is treated as an arbitrary character. That's a bit different than these quotes:

BIP93

Implementations interpret "?" as an erasure.

This doc

ECWs MUST also allow entry of ? which indicates an erasure (unknown character).

I'd say something along the lines of

"'?' can/may be substituted with an arbitrary bech32 character to correct up to 4 erasures as errors."


* MUST support correction of up to 4 substitution errors;
* MAY also support correction of up to 8 erasures;
* MAY support correction of further errors, including insertion or deletion errors.
Copy link
Contributor

@BenWestgate BenWestgate Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We listed the max erasures and errors that may be corrected and still find the closest valid string. Deleting 6 characters and inserting 4 missing characters are near certainly the limit before these edits find multiple valid strings or strings that are not the closest. With the limit likely 3 and 3 on the first string with an unknown correct length. Assuming the correct length is 48 or 74 gives full power on the 1st string, at the expense of being able to correct invalid intended unusual length 1st strings that are too near 48 and 74.

Any reason to not mention these limits here, as we did for substitution errors and erasures? With 13 insertions, every string has a valid correction candidate (append checksum) but it's extremely unlikely to be the closest or intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you can only correct insertions or deletions via brute-force, and the result is never guaranteed to be unique, so there are no limits that make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a quantity of inserts or deletes that is guaranteed to NOT be unique? (besides the trivial 13 inserts)

Do we need to specify precedence between substitutions and the optional insertion/deletions? If a string becomes valid with 4 substitutions as well as 1 delete and 1 insert (and they are NOT the same correction), should we say which should be displayed?

docs/wallets.md Outdated
* To show locations of substitution errors, the wallet MAY highlight the offending 4-character window or the specific offending character.
* If the wallet can determine insertion or deletion errors, it MAY highlight the offending 4-character window or the specific location of the inserted or missing character. When detecting insertion or deletion errors, the wallet MAY assume that the correct string length is 48, 74 or (optionally) 127 characters (corresponding to 16-, 32- or 64-byte seeds).
1. Once the first string is fully entered, the wallet MUST validate the checksum and header before accepting it.
* If the checksum does not pass, then an ECW
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If the checksum is invalid, then an ECW:"

@BenWestgate
Copy link
Contributor

BenWestgate commented Oct 27, 2023

  • Remove the "up to 3" condition on insertions/deletions.

They still have a limit, just like erasures and substitutions, I meant it's higher than "up to 3" if the valid length is known.

Right now I specify that ? indicates an erasure but I'm silent on how the wallet treats this, because I leave erasure support as optional. If the wallet does not support erasures it can just treat ?s as substitution errors.

Wastes correction power but is fine.

I also don't specify what to do in the case that there are multiple possible corrections.

Eliminate any with an hrp, threshold, identifier not matching previous valid shares or with a share index that is reused or those with k = '0', index != 's'.
If multiple remain, pick the closest by edit distance. If still tied pick the one(s) with share index earliest in alphabet (since the book starts with share 'A', 'C'...)

Brute force algorithms should search from closest to furthest edit distance and return the 1st valid correction candidate found for speed.

I've never seen a wrong candidate correction even doing 6 deletes.

@apoelstra
Copy link
Contributor Author

I've never seen a wrong candidate correction even doing 6 deletes

Yes, they're extremely unlikely because the 65-bit checksum is (approximately) a random hash and if your Python code is able to grind out a solution, it's nowhere near doing 65 bits of work, which is what you'd expect in order to find an incorrect correction.

But it's still possible and will happen a nonzero number of times to any hardware vendor who sells enough units.

Eliminate any with an hrp, threshold, identifier not matching previous valid shares or with a share index that is reused or those with k = '0', index != 's'.
If multiple remain, pick the closest by edit distance. If still tied pick the one(s) with share index earliest in alphabet (since the book starts with share 'A', 'C'...)

This seems like a lot of text that's completely inapplicable to people using normal error-correcting algorithms and not trying to brute-force errors :/. And it will be very hard to produce test vectors for these.

@apoelstra
Copy link
Contributor Author

This seems like a lot of text that's completely inapplicable to people using normal

As you point out above, this isn't true -- even using a normal substitution error correcting algo, if you 'correct' a subsequent share and the result has a bad HRP or inconsistent header, we need to decide what to do.

@BenWestgate
Copy link
Contributor

This seems like a lot of text that's completely inapplicable to people using normal

As you point out above, this isn't true -- even using a normal substitution error correcting algo, if you 'correct' a subsequent share and the result has a bad HRP or inconsistent header, we need to decide what to do.

The easiest option is not give a correction, show a warning if the correction found repeated a previous share exactly. Otherwise a warning is probably more confusing than helpful.

Did we ever figure out how many valid corrections on average there are after 5 substitutions? If it's much less than potential headers, filtering for a valid header may be enough to narrow it down to a single correction candidate in most or nearly all cases..

@BenWestgate
Copy link
Contributor

This seems like a lot of text that's completely inapplicable...

"Eliminate any with..."

Shouldn't need mentioning that brute force search is faster only generating valid header candidates.

It is a real possibility with standard substitution correction algorithms so we shouldn't waste user's time asking them to confirm strings we'd reject were it typed.

If we can, it'd be more helpful to try to correct 5 substitution errors and show the first with a valid hrp, header and index.

@BenWestgate
Copy link
Contributor

BenWestgate commented Nov 13, 2023

Screenshots of a codex32 import implementation.

An invalid but correctable string is drawn red and the OK button becomes "Suggest Correction"
image
Here with a good checksum it turns green and the OK button becomes "Submit" and grabs focus (hitting enter clicks it).
image
The button is unsensitive when a correction cannot be found.
image

Right now clicking "Suggest Correction" will start the ECC search, but it would be more responsive to look for corrections every time the entry changes and leave the button unsensitive until a candidate correction is ready to display (red text) or the checksum passes (green text).

I'll delay turning the text Red for a couple seconds so that doesn't happen every time a user is nearly finished entering an error free string and the background ECC finds insertions to complete it.

@BenWestgate
Copy link
Contributor

BenWestgate commented Nov 13, 2023

In a stateless signing situation, the wallet will ask for the PSBT first then the 2nd group of the first codex32 string can often be prefilled with the default fingerprint ID yet remain editable unlike 'MS1' or subsequent shares' first two groups.

A PSBT will also always detect bad combos and tampered shares, if the recovered seed's fingerprint is not in the PSBT, reject the share set. (or the PSBT..) Since PSBTs contain an xpub it's 100% safe to "overclock" the error correction one last share (the entire set, if necessary) in a recover & sign situation.

"The extended public keys and/or BIP32 fingerprint from a PSBT or watch-only descriptor MAY be used to further guide error correction."

Recover & Sign should be the main codex32 import workflow, why use SSS and make the signer a single point of failure by storing private keys between uses? (There's reasons, but generally, SSS is chosen to split a secret so access requires 2+ things.)

Something ledger style inputs might consider is entering the first 32 characters, then displaying the checksum for users to visually confirm or deny exactly matches the last 13 characters of their backup. If the checksum does not exactly match, make the user type what they have written down for the checksum then display a candidate correction. This is near 50% speedup on subsequent shares without errors.

Reposted from IRC since a rephrasing might make it into this document as a MAY if the concept is sound.
This visually verify checksum isn't much faster than typing on a Real Keyboard but it's way faster on fake ones.
Another option along this idea is to pop up the first character of the expected (error free data) checksum in the scroll wheel keyboard to minimize typing time to simply confirming the checksum character by character if no errors happened.

@apoelstra
Copy link
Contributor Author

cc @roconnor-blockstream I think this is good to go. Could you give it a quick re-review? You can ignore all 120 of the above comments.

@roconnor-blockstream
Copy link
Collaborator

Seems fine.

@BenWestgate
Copy link
Contributor

BenWestgate commented Nov 18, 2023

FYI: If you wish to use consistent quotes throughout, a typo:

I used 'Submit' and then "Attempt Correction" when referring to option names.

Edit:
Double quotes would be consistent with Andrew's intro.

And I wrote '?' Instead of ? which we had used earlier. ? is better.

@apoelstra
Copy link
Contributor Author

Good catch. I went through and made sure that every ? is in backticks and every quote is a ".

@apoelstra
Copy link
Contributor Author

I think we're good to merge then, with your blessing @BenWestgate.

@BenWestgate
Copy link
Contributor

BenWestgate commented Nov 18, 2023

This is ready.

I will make a PR later to add a link to my Gtk Import GUI implementation of this document as an example.

The "ECW" part isn't finished, I haven't connected my error corrector to the "Attempt Correction" button, or built a window to display and Accept/Reject any results found (with corrected groups highlighted).

It fully works as a non-ECW. It decodes standard QR encodings of codex32 strings too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants