Needless URL encoding of link destinations #312

Open
hukkin opened this issue Mar 1, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@hukkin
Member

hukkin commented Mar 1, 2022

```markdown
# From
- [Rénovateur de pannes](Rénovateur-de-pannes)
# To
- [Rénovateur de pannes](R%C3%A9novateur-de-pannes)
```

Changing the link's character format is unnecessary IMHO, and it makes the link less readable and less practical to correct by hand.

Originally posted by @mdeweerd in #112 (comment)
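The From/To change above is ordinary UTF-8 percent-encoding: `é` (U+00E9) is the two bytes `C3 A9`, each written as `%XX`. A minimal sketch using Python's standard `urllib.parse.quote` reproduces the same output; this is only an illustration and is not necessarily the code path mdformat itself uses:

```python
from urllib.parse import quote

# Percent-encode a link destination the way a URI-normalizing formatter would:
# every byte outside ASCII letters, digits, "_.-~" and "/" becomes %XX.
destination = "Rénovateur-de-pannes"
encoded = quote(destination)
print(encoded)  # R%C3%A9novateur-de-pannes
```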

@hukkin added the enhancement (New feature or request) label on Mar 1, 2022
@hukkin mentioned this issue on May 9, 2022
@ericholscher

ericholscher commented Jun 24, 2022

I'm hitting this as well, and it breaks my use case. I'm using Pelican, and mdformat encodes characters that Pelican's link syntax depends on. For example:

```markdown
As our [publisher policy]({filename}../publisher-policy.md) lays out:
```

becomes:

```markdown
As our [publisher policy](%7Bfilename%7D../publisher-policy.md) lays out:
```

I'd love an option to disable encoding.

@sanmai-NL

sanmai-NL commented Jun 8, 2023

```markdown
- [_MoSCoW_ 🗳️](#moscow-%EF%B8%8F)
- [_Task ✔️_](#task-%EF%B8%8F)
- [_🛡️ Security_](#%EF%B8%8F-security)
```

Under mdformat 0.7.16, these emoji get `%EF%B8%8F` appended for some reason.
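`%EF%B8%8F` is the UTF-8 encoding of U+FE0F VARIATION SELECTOR-16, the invisible code point that follows many emoji to request emoji-style rendering. One plausible explanation, sketched below under assumptions, is that the slug step drops the emoji itself (a symbol, Unicode category `So`) but keeps the selector (a nonspacing mark, category `Mn`), which is then percent-encoded. `slugify` is a hypothetical, simplified GitHub-style rule, not the slug code the tool actually uses:

```python
import unicodedata
from urllib.parse import quote


def slugify(heading: str) -> str:
    """Hypothetical GitHub-style slug: lowercase, spaces become "-",
    keep letters, marks, numbers, "-" and "_", drop everything else."""
    kept = []
    for ch in heading.lower():
        category = unicodedata.category(ch)
        if ch == " ":
            kept.append("-")
        elif ch in "-_" or category[0] in "LMN":
            kept.append(ch)  # U+FE0F is category "Mn", so it is kept here
        # symbols such as the emoji themselves (category "So") are dropped
    return "".join(kept)


# Rendered heading text from the comment above (emphasis markers removed),
# with the variation selector written out explicitly.
headings = ["MoSCoW \U0001F5F3\ufe0f", "Task \u2714\ufe0f", "\U0001F6E1\ufe0f Security"]
for heading in headings:
    print("#" + quote(slugify(heading)))
```

Under this assumed rule the three anchors come out exactly as in the comment, which suggests the leftover selector, not the emoji, is what gets encoded.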

@kdeldycke

Same thing here: I'm trying to format the Chinese translation of my awesome-iam project, and it ends up like this:

```markdown
- [Bloom Filter](https://zh.wikipedia.org/wiki/%E5%B8%83%E9%9A%86%E8%BF%87%E6%BB%A4%E5%99%A8)
```

```diff
diff --git readme.md readme.md
index 2ec585b..462846c 100644
--- readme.md
+++ readme.md
@@ -38,46 +38,46 @@
 
 <!-- mdformat-toc start --slug=github --no-anchors --maxlevel=6 --minlevel=2 -->
 
-- [概述](#概述)
-- [安全](#安全)
-- [账户管理](#账户管理)
-- [密码学](#密码学)
-  - [标识符](#标识符)
-- [零信任网络](#零信任网络)
-- [认证](#认证)
-  - [基于密码](#基于密码)
-  - [无密码](#无密码)
-  - [安全密钥](#安全密钥)
-  - [多因素](#多因素)
-  - [基于短信](#基于短信)
-  - [公钥基础设施](#公钥基础设施)
+- [概述](#%E6%A6%82%E8%BF%B0)
+- [安全](#%E5%AE%89%E5%85%A8)
+- [账户管理](#%E8%B4%A6%E6%88%B7%E7%AE%A1%E7%90%86)
+- [密码学](#%E5%AF%86%E7%A0%81%E5%AD%A6)
+  - [标识符](#%E6%A0%87%E8%AF%86%E7%AC%A6)
+- [零信任网络](#%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C)
+- [认证](#%E8%AE%A4%E8%AF%81)
+  - [基于密码](#%E5%9F%BA%E4%BA%8E%E5%AF%86%E7%A0%81)
+  - [无密码](#%E6%97%A0%E5%AF%86%E7%A0%81)
+  - [安全密钥](#%E5%AE%89%E5%85%A8%E5%AF%86%E9%92%A5)
+  - [多因素](#%E5%A4%9A%E5%9B%A0%E7%B4%A0)
+  - [基于短信](#%E5%9F%BA%E4%BA%8E%E7%9F%AD%E4%BF%A1)
+  - [公钥基础设施](#%E5%85%AC%E9%92%A5%E5%9F%BA%E7%A1%80%E8%AE%BE%E6%96%BD)
   - [JWT](#jwt)
   - [OAuth2 & OpenID](#oauth2--openid)
   - [SAML](#saml)
-- [授权](#授权)
-  - [策略模型](#策略模型)
-  - [开源策略框架](#开源策略框架)
-  - [AWS 策略工具](#AWS-策略工具)
+- [授权](#%E6%8E%88%E6%9D%83)
+  - [策略模型](#%E7%AD%96%E7%95%A5%E6%A8%A1%E5%9E%8B)
+  - [开源策略框架](#%E5%BC%80%E6%BA%90%E7%AD%96%E7%95%A5%E6%A1%86%E6%9E%B6)
+  - [AWS 策略工具](#AWS-%E7%AD%96%E7%95%A5%E5%B7%A5%E5%85%B7)
   - [Macaroons](#macaroons)
-- [秘密管理](#秘密管理)
-  - [硬件安全模块 (HSM)](#硬件安全模块-hsm)
-- [信任与安全](#信任与安全)
-  - [用户身份](#用户身份)
-  - [欺诈](#欺诈)
+- [秘密管理](#%E7%A7%98%E5%AF%86%E7%AE%A1%E7%90%86)
+  - [硬件安全模块 (HSM)](#%E7%A1%AC%E4%BB%B6%E5%AE%89%E5%85%A8%E6%A8%A1%E5%9D%97-hsm)
+- [信任与安全](#%E4%BF%A1%E4%BB%BB%E4%B8%8E%E5%AE%89%E5%85%A8)
+  - [用户身份](#%E7%94%A8%E6%88%B7%E8%BA%AB%E4%BB%BD)
+  - [欺诈](#%E6%AC%BA%E8%AF%88)
   - [Moderation](#moderation)
-  - [威胁情报](#威胁情报)
-  - [验证码](#验证码)
-- [黑名单](#黑名单)
-  - [主机名和子域](#主机名和子域)
-  - [邮件](#邮件)
-  - [保留的 ID](#保留的-ID)
-  - [诽谤](#诽谤)
-- [隐私](#隐私)
-  - [匿名化](#匿名化)
+  - [威胁情报](#%E5%A8%81%E8%83%81%E6%83%85%E6%8A%A5)
+  - [验证码](#%E9%AA%8C%E8%AF%81%E7%A0%81)
+- [黑名单](#%E9%BB%91%E5%90%8D%E5%8D%95)
+  - [主机名和子域](#%E4%B8%BB%E6%9C%BA%E5%90%8D%E5%92%8C%E5%AD%90%E5%9F%9F)
+  - [邮件](#%E9%82%AE%E4%BB%B6)
+  - [保留的 ID](#%E4%BF%9D%E7%95%99%E7%9A%84-ID)
+  - [诽谤](#%E8%AF%BD%E8%B0%A4)
+- [隐私](#%E9%9A%90%E7%A7%81)
+  - [匿名化](#%E5%8C%BF%E5%90%8D%E5%8C%96)
   - [GDPR](#gdpr)
 - [UX/UI](#uxui)
-- [竞争分析](#竞争分析)
-- [历史](#历史)
+- [竞争分析](#%E7%AB%9E%E4%BA%89%E5%88%86%E6%9E%90)
+- [历史](#%E5%8E%86%E5%8F%B2)
 
 <!-- mdformat-toc end -->
```

Source: https://github.com/kdeldycke/awesome-iam/pull/100/files#diff-109f56ef9f23fd7bfdbf2e2c9a28b45bbe8160c71c6ee1f0f1439e0ea22103be

@sanmai-NL

Note that URIs cannot contain non-ASCII characters, so the encoding is technically correct.
Still, hopefully there's some middle ground or workaround.

https://bugs.ruby-lang.org/issues/12852

@kdeldycke

Yes. Maybe add an `--allow-iri` or `--allow-unicode-links` option to allow Internationalized Resource Identifiers (IRIs) instead of normalizing everything to URIs.

Unicode characters are extremely user-friendly for international content, for both readers and maintainers.

Note that Wikipedia renders all URLs with ASCII `%` escape codes in HTML, but lets links in MediaWiki syntax (like `[[统一资源定位符]]`, Chinese for "Uniform Resource Locator") be written in Unicode. You can check this by editing any non-English Wikipedia page.

I don't think it's unreasonable to let links and URLs in Markdown (a markup syntax) contain Unicode, and to leave it to the rendering engine to apply the appropriate URL encoding for each target (HTML, LaTeX, etc.).
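The split suggested above (keep the IRI in the Markdown source, encode only when rendering) can be sketched roughly along the lines of the IRI-to-URI mapping in RFC 3987, section 3.1: each non-ASCII character is replaced by the percent-encoded bytes of its UTF-8 form, and ASCII is left alone. `iri_to_uri` is a hypothetical helper for a renderer; it ignores IDNA conversion of hostnames and other subtleties of the RFC:

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Hypothetical render-time helper: percent-encode only non-ASCII
    characters (as UTF-8 bytes) and leave every ASCII character untouched."""
    return "".join(quote(ch, safe="") if ord(ch) > 127 else ch for ch in iri)


source = "https://zh.wikipedia.org/wiki/布隆过滤器"  # what the author writes
print(iri_to_uri(source))
# https://zh.wikipedia.org/wiki/%E5%B8%83%E9%9A%86%E8%BF%87%E6%BB%A4%E5%99%A8
```

Because ASCII passes through unchanged, a Pelican placeholder such as `{filename}../publisher-policy.md` would also survive this step as written.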

@mdeweerd

IMHO one of the goals of Markdown is to keep the source readable.

And mdformat helps to keep the source somewhat normalized.

One could argue that if the source is recognized by CommonMark reference implementations, then the source is acceptable.

Testing the Chinese "links" at https://spec.commonmark.org/dingus/ shows that the reference implementation still renders them as links.
Browsers are smart enough to URL-encode links before actually using them, so the target server still receives URL-encoded links.

Converting the links in the Markdown source to URL-encoded links is technically correct, because that is what a browser will do, but it makes the Markdown source less readable and harder for a human to adjust. IMHO there is no technical need to URL-encode the links in the Markdown source.

We can also look at the CommonMark specification, and more specifically its examples. Example 31 shows that HTML entity encoding is accepted in a link destination and that the destination is only URL-encoded in the HTML rendering.

The CommonMark specification of a link destination does not require links to be URL-encoded.

So in the end it's a matter of taste, as both approaches are technically valid. Personally, I would not convert the link representation, and would prefer options that modify this behavior to URL-encode or URL-decode links. I would probably lean toward URL-decoding links that are already URL-encoded, to make them more readable for humans.
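URL-decoding needs some care: decoding every escape would turn `%20` into a bare space or `%29` into `)`, which can change how a Markdown link destination parses. A minimal sketch, assuming the goal is only readability of non-ASCII text, decodes a run of escapes when it forms valid UTF-8 and re-emits any ASCII characters as escapes. `decode_non_ascii` is a hypothetical helper, not an existing mdformat option:

```python
import re

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def decode_non_ascii(destination: str) -> str:
    """Hypothetical helper: turn percent-escaped UTF-8 back into readable
    characters, but keep ASCII characters (space, "(", "{", ...) escaped."""

    def replace_run(match: re.Match) -> str:
        run = match.group(0)
        try:
            text = bytes.fromhex(run.replace("%", "")).decode("utf-8")
        except UnicodeDecodeError:
            return run  # not valid UTF-8: keep the escapes exactly as written
        # Non-ASCII characters are shown as-is; ASCII stays escaped (upper-case hex).
        return "".join(ch if ord(ch) > 127 else f"%{ord(ch):02X}" for ch in text)

    return _ESCAPE_RUN.sub(replace_run, destination)


print(decode_non_ascii("R%C3%A9novateur-de-pannes"))  # Rénovateur-de-pannes
print(decode_non_ascii("#%E6%A6%82%E8%BF%B0"))  # #概述
print(decode_non_ascii("%7Bfilename%7D../publisher-policy.md"))  # %7Bfilename%7D../publisher-policy.md
```

Since only non-ASCII characters are decoded, characters that matter to Markdown link syntax, such as spaces and parentheses, stay escaped.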

Projects
None yet
Development

No branches or pull requests

5 participants