fix: disallow more characters in links #5509
Merged
As mentioned in #5486, this is the "second" part of making the link parser a bit better. This PR disallows ASCII characters that aren't valid in domains. Similar to what GFM considers a valid domain, the only ASCII characters allowed are 0-9, A-Z, a-z, '-', '_', and '.'. However, since the parser should still be able to parse "domains" that actually require punycode, like https://köln.de, all characters outside ASCII remain allowed. I think it's fine to allow these, even though it's a bit wrong: some non-ASCII characters that aren't valid in a domain will be accepted too.
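To illustrate the rule described above, here is a minimal sketch in Python. It is not the code from this PR: the names `ALLOWED_ASCII`, `is_domain_char`, and `scan_domain` are hypothetical, and the sketch only assumes the character set listed in the description plus the "everything outside ASCII is allowed" exception.

```python
import string

# Assumption: the allowed ASCII set is exactly the one named in the description
# (0-9, A-Z, a-z, '-', '_', and '.').
ALLOWED_ASCII = set(string.ascii_letters + string.digits + "-_.")


def is_domain_char(ch: str) -> bool:
    """Return True if `ch` may appear in the domain part of a link."""
    # Non-ASCII characters are let through so that hosts which would need
    # punycode, such as köln.de, still parse.
    return ord(ch) > 127 or ch in ALLOWED_ASCII


def scan_domain(text: str) -> str:
    """Return the longest prefix of `text` made up of allowed domain characters."""
    end = 0
    while end < len(text) and is_domain_char(text[end]):
        end += 1
    return text[:end]
```

With this sketch, `scan_domain("köln.de/path")` stops at the `/` and yields `köln.de`, while a character such as `<` or `>` ends the domain immediately. The trade-off mentioned above is visible here as well: any non-ASCII character is accepted, whether or not it would be valid in a real domain name.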
Fixes #4769.