Describe the bug
When converting non-English UTF-8 text to HTML with the 'header-ids' extra, the generated header ids for Russian text come out as "-1", "-2", ... instead of an error being thrown or the id being produced in Russian. For German text (not tested with other languages that use umlauts), umlauts (ä, ö, ü, ...) are replaced by their base letters (a, o, u, ...) in the ids.
To Reproduce
With English text:
# main.py
import markdown2

help_text = '''
# Header

## Table of Contents

1. [Getting Started](#getting-started)

### Getting Started {#}

To begin using the application, launch `main.py`.
'''

help_text_html = markdown2.markdown(help_text, extras=['header-ids'])
print(help_text_html)
Result (all ok):
<h1 id="header">Header</h1>
<h2 id="table-of-contents">Table of Contents</h2>
<ol>
<li><a href="#getting-started">Getting Started</a></li>
</ol>
<h3 id="getting-started">Getting Started {#}</h3>
<p>To begin using the application, launch <code>main.py</code>.</p>
With Russian text, encoding - UTF-8:
import markdown2

help_text = '''
# Руководство

## Содержание

1. [Начало работы](#начало-работы)

### Начало работы {#}

Для начала работы запустите `main.py`.
'''

help_text_html = markdown2.markdown(help_text, extras=['header-ids'])
print(help_text_html)
Output (the ids somehow come out as "-x"):
<h1 id="-1">Руководство</h1>
<h2 id="-2">Содержание</h2>
<ol>
<li><a href="#начало-работы">Начало работы</a></li>
</ol>
<h3 id="-3">Начало работы {#}</h3>
<p>Для начала работы запустите <code>main.py</code>.</p>
With German text, encoding - UTF-8 (umlauts are replaced in the ids):
(I had to change the text a bit because the German translation of the text above contains no umlauts.)
import markdown2

help_text = '''
## Handbuch

## Inhalt

1. [ü-umlaut-test-encoding](#ü-umlaut-test-encoding)

### ü-umlaut-test-encoding {#}

Führen Sie `main.py` aus, um loszulegen.
'''

help_text_html = markdown2.markdown(help_text, extras=['header-ids'])
print(help_text_html)
Output:
<h2 id="handbuch">Handbuch</h2>
<h2 id="inhalt">Inhalt</h2>
<ol>
<li><a href="#ü-umlaut-test-encoding">ü-umlaut-test-encoding</a></li>
</ol>
<h3 id="u-umlaut-test-encoding">ü-umlaut-test-encoding {#}</h3>
<p>Führen Sie <code>main.py</code> aus, um loszulegen.</p>
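The three outputs above are consistent with an ASCII-only slug step followed by a duplicate counter. Below is a minimal sketch of that mechanism, not markdown2's actual source: `ascii_slug` and `header_ids` are hypothetical names, and the assumption is that the id is built by discarding every character that cannot be encoded as ASCII after Unicode decomposition.

```python
import re
import unicodedata
from collections import Counter


def ascii_slug(text):
    """Hypothetical ASCII-only slug: an assumed mechanism, not markdown2's code."""
    # NFKD splits "ü" into "u" plus a combining diaeresis; Cyrillic letters
    # have no ASCII base, so the ASCII encode step drops them entirely.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation such as "{#}", then collapse whitespace and hyphens.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


def header_ids(headers):
    """Give empty or repeated slugs a numeric suffix so every id stays unique."""
    seen = Counter()
    ids = []
    for header in headers:
        slug = ascii_slug(header)
        seen[slug] += 1
        if not slug or seen[slug] > 1:
            slug += "-%d" % seen[slug]
        ids.append(slug)
    return ids


print(header_ids(["Header", "Table of Contents", "Getting Started {#}"]))
print(header_ids(["Руководство", "Содержание", "Начало работы {#}"]))
print(header_ids(["Handbuch", "Inhalt", "ü-umlaut-test-encoding {#}"]))
```

Under this assumption every Russian header slugs to an empty string, so only the counter suffix remains, while "ü" loses its diaeresis and becomes "u".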
Expected behavior
If only ASCII is supported, I would like to see an error thrown with a description along the lines of "Unsupported character at position XYZ". I would also expect a warning and/or error in the German case, where "ü" is preserved in the text and in the link (#ü-umlaut-test-encoding), BUT not in the id, which gets a plain "u": <h3 id="u-umlaut-test-encoding">
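To make the requested error concrete, here is a rough sketch of such a check. `check_header_ascii` is a hypothetical helper written for illustration; it is not part of markdown2, and the message wording is only a suggestion.

```python
def check_header_ascii(text):
    """Hypothetical strict check: raise instead of silently dropping characters."""
    for position, char in enumerate(text):
        if ord(char) > 127:
            raise ValueError(
                "Unsupported character %r at position %d in header %r"
                % (char, position, text)
            )
    return text


try:
    check_header_ascii("ü-umlaut-test-encoding")
except ValueError as exc:
    print(exc)
```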
Debug info
markdown2 version = 2.4.10
Any extras being used: 'header-ids'
HardMax71 changed the title from 'Converting non-english anchor tags leads to "-x" values' to 'Converting non-english anchor tags leads to "-x" values (or umlauts are replaced)' on Nov 6, 2023
Git blame shows this was last touched in April 2012, so I guess this was a compatibility limitation at the time? The wiki page also explicitly says header IDs are ASCII. @nicholasserra can you see any issues with bumping this up to UTF-8?
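For discussion, a Unicode-preserving slug could look roughly like this. It is a sketch only: `unicode_slug` is a hypothetical name, and it relies on Python 3's `re` module treating `\w` as Unicode-aware for `str` patterns.

```python
import re
import unicodedata


def unicode_slug(text):
    """Hypothetical Unicode-preserving slug; not markdown2's current behavior."""
    # NFC keeps "ü" as a single code point instead of decomposing it.
    text = unicodedata.normalize("NFC", text)
    # In Python 3, \w matches Unicode letters and digits for str patterns.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


print(unicode_slug("Начало работы {#}"))
print(unicode_slug("ü-umlaut-test-encoding {#}"))
```

With this approach the ids would match the link targets used in the report (#начало-работы and #ü-umlaut-test-encoding). Documents that already rely on the ASCII ids would get different ids, so an opt-in switch might be the safer route.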