Clarify link label matching #695

dbuenzli · 2021-11-11T11:36:45Z

In the 0.30 spec we have:

One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, [...]
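
As a rough illustration of the quoted rule, here is a minimal Python sketch. It assumes that Python's `str.casefold()` is an acceptable reading of "the Unicode case fold". The Python docs describe it as the default case folding of section 3.13 of the Unicode Standard. It also assumes that the part of the rule elided above is the spec's whitespace step: trim spaces, tabs and line endings, and collapse internal runs of them to one space. `normalize_label` and `labels_match` are hypothetical names, not cmark's API.

```python
import re


def normalize_label(label: str) -> str:
    """Hypothetical helper: normalize a link label that still has its brackets."""
    # Strip the opening '[' and closing ']'.
    inner = label[1:-1]
    # Case fold; str.casefold() applies full default case folding.
    folded = inner.casefold()
    # Assumed whitespace step (elided in the quote): collapse runs of spaces,
    # tabs and line endings to one space, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", folded).strip(" \t\r\n")


def labels_match(a: str, b: str) -> bool:
    """Hypothetical helper: two labels match if their normalized forms are equal."""
    return normalize_label(a) == normalize_label(b)


print(labels_match("[Foo  Bar]", "[foo\nbar]"))  # True
print(normalize_label("[ Straße\t Weg ]"))      # strasse weg
```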

"Perform the Unicode case fold" is a bit unclear, in the sense that I had to consult cmark to see what it was doing. If I understood correctly, this is definition R4 of the Unicode Standard (p. 154), so maybe that could be referenced.
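
The R4 reading can be made concrete. Unicode's default caseless match compares `toCasefold(X)` with `toCasefold(Y)`, and that is not the same as comparing lowercase forms. The small sketch below assumes `str.casefold()` stands in for `toCasefold`. `default_caseless_match` is a hypothetical name.

```python
def default_caseless_match(x: str, y: str) -> bool:
    """Hypothetical helper: toCasefold(X) == toCasefold(Y).

    str.casefold() applies full default case folding, so one character
    may fold to several (e.g. U+00DF 'ß' -> 'ss').
    """
    return x.casefold() == y.casefold()


# Lowercasing keeps 'ß' as-is, so these would not match under lower():
print("Straße".lower() == "STRASSE".lower())        # False
print(default_caseless_match("Straße", "STRASSE"))  # True
print(default_caseless_match("\ufb01le", "FILE"))   # True ('ﬁ' ligature folds to 'fi')
```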

P.S. A better definition would likely have been R5, as it would correctly handle identifiers in different normal forms (like é precomposed as U+00E9 in one identifier and decomposed as e + U+0301 in another). However, you'd need to import Unicode normalization and its associated machinery into the definition of CommonMark.
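
Here is a sketch of what a normalization-aware comparison could look like. It assumes the intended rule is Unicode's canonical caseless match, `NFD(toCasefold(NFD(X))) == NFD(toCasefold(NFD(Y)))`, with `str.casefold()` again standing in for `toCasefold`. `canonical_caseless_match` is a hypothetical name.

```python
import unicodedata


def canonical_caseless_match(x: str, y: str) -> bool:
    """Hypothetical helper: NFD(toCasefold(NFD(X))) == NFD(toCasefold(NFD(Y)))."""

    def key(s: str) -> str:
        # Decompose first so folding sees base letters and combining marks,
        # then decompose again because case folding does not preserve NFD.
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

    return key(x) == key(y)


composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

print(composed.casefold() == decomposed.casefold())    # False: folding alone is not enough
print(canonical_caseless_match(composed, decomposed))  # True
```

The cost is the one noted above: every implementation would then need Unicode normalization tables as well as case-folding tables.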


kivikakk · 2024-07-10T15:29:30Z

I'm currently bringing Comrak up to speed on the changes to CommonMark since GFM was rebased on it, and I hit some difficulty here too, since "Unicode case fold" has no precise meaning.

I might end up imitating the mechanism used in cmark directly (generating code based on CaseFolding-x.0.0.txt), since every Unicode library out there implements a slightly different set of folding behaviours.
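
Here is a minimal sketch of the table-driven approach. It assumes the documented CaseFolding.txt line format `<code>; <status>; <mapping>; # <name>`. Full folding uses the C and F entries, and simple folding uses C and S. Only four sample lines are embedded; a real generator would read the whole file for the targeted Unicode version. `parse_case_folding` and `fold` are hypothetical names, not cmark's or Comrak's code. The simple/full switch also shows where the Eszett difference mentioned in the `caseless` note comes from.

```python
# Hypothetical sketch of a CaseFolding.txt-driven folder; not cmark's or Comrak's code.
# Four sample lines in the file's "<code>; <status>; <mapping>; # <name>" format.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_case_folding(text: str, full: bool = True) -> dict[str, str]:
    """Build a character -> folded-string table.

    Full folding keeps statuses C and F; simple folding keeps C and S.
    Turkic (T) entries are always skipped, as in default folding.
    """
    wanted = {"C", "F"} if full else {"C", "S"}
    table: dict[str, str] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (field.strip() for field in data.split(";")[:3])
        if status in wanted:
            table[chr(int(code, 16))] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


def fold(s: str, table: dict[str, str]) -> str:
    # Characters without an entry fold to themselves.
    return "".join(table.get(ch, ch) for ch in s)


full = parse_case_folding(SAMPLE)
simple = parse_case_folding(SAMPLE, full=False)
print(fold("\u00df", full))    # ss
print(fold("\u00df", simple))  # ß (no C or S entry, so unchanged)
```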

We add `caseless` to do the folding. It matches upstream closely enough[^1], unlike, e.g., ICU4X's `CaseMapper` (which doesn't fold Eszett to "ss"). Also unlike ICU4X, it doesn't require us to bump our MSRV.

A separate `--gfm-quirks` CLI option is added, since the base tests fail if we just turn on all of GFM for them.

The nice thing about `caseless` is that although its last release was 6 years ago, it depends on `unicode-normalization` ^0.1, whose latest release was 5 months ago. It's also [very easy to read][caseless], so I'm all good with this.

[^1]: Not that straightforward: commonmark/commonmark-spec#695

[caseless]: https://github.com/unicode-rs/rust-caseless/blob/v0.2.1/src/lib.rs
