Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode link label normalization (fix test 539) #277

Merged
merged 4 commits into from
Aug 5, 2022
Merged

Conversation

tatchi
Copy link
Collaborator

@tatchi tatchi commented Jul 31, 2022

Input:

[ẞ]

[SS]: /url

This is the result in master:

File "tests/spec-539.html", line 1, characters 0-0:
diff --git a/_build/default/tests/spec-539.html b/_build/default/tests/spec-539.html.new
index c31102f..4e28a6f 100644
--- a/_build/default/tests/spec-539.html
+++ b/_build/default/tests/spec-539.html.new
@@ -1 +1 @@
-<p><a href="/url">ẞ</a></p>
+<p>[ẞ]</p>
make: *** [test] Error 1   

The issue is that both labels are not being matched, hence is it not recognized as a link. To match labels, we need to normalize them (strip off leading/trailing whitespace, ...) and do a case-insensitive comparison. The unicode version of that is a bit more complex as we need to do a Unicode case folding. From the spec:

One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, strip leading and trailing spaces, tabs, and line endings, and collapse consecutive internal spaces, tabs, and line endings to a single space.

This PR adapts the normalize function to work with unicode labels too. Fortunately, I could rely on some libs (uutf, uucp, and uunf) and I even found a piece of code in the doc that does almost what's needed.

With that adapted normalize function, and SS are matched. The result is now a link as expected.

@tatchi tatchi mentioned this pull request Jul 31, 2022
Copy link
Collaborator

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement and fix! I've two minor suggestions. I'll stage my suggestions (and perhaps merge conflict cleanup) as PR into this PR, unless you prefer (and have time) to address those things before I can get to it :D

Copy link
Collaborator

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tatchi
Copy link
Collaborator Author

tatchi commented Aug 4, 2022

Thanks for the review @shonfeder, I added extra tests to cover the normalization of labels in 49d3ca2

@@ -8,7 +8,7 @@ let protect ~finally f =
finally ();
r

let disabled = [ 206; 215; 216; 519; 539 ]
let disabled = [ 206; 215; 216; 519 ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding progress. You're demolishing these deviations. Thanks so much @tatchi! 🎉

@shonfeder shonfeder merged commit 4f41b7d into master Aug 5, 2022
@shonfeder shonfeder deleted the fix-539 branch August 5, 2022 17:55
mseri pushed a commit to ocaml/opam-repository that referenced this pull request Dec 13, 2022
CHANGES:

- Expose the HTML escape function `htmlentities` (ocaml-community/omd#295 @cuihtlauac)

- Support generation of identifiers in headers (ocaml-community/omd#294, @tatchi)

- Support GitHub-Flavoured Markdown tables (ocaml-community/omd#292, @bobatkey)

- Update parser to support CommonMark Spec 0.30 (ocaml-community/omd#266, @SquidDev)

- Preserve the order of input files in the HTML output to stdout (ocaml-community/omd#258,
  @patricoferris)

- Fix all deviations from CommonMark Spec 0.30 (ocaml-community/omd#284, ocaml-community/omd#283, ocaml-community/omd#278, ocaml-community/omd#277, ocaml-community/omd#269,
  @tatchi)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants