GFM to HTML creates incompatible anchors vs. Github/Gitlab anchor logic #5057
Comments
GitHub doesn't use redcarpet any more for rendering. They use a variant of cmark.
Yeah, I wasn't 100% sure. Chasing this down ended in several dead ends, and I'm not sure which code lives where, so I can't put my finger on the exact routine. The only reason it "felt right" was that the list of stripped characters matched what I was seeing... (no luck finding the right source code with Google)
I notice that the
Here's the relevant function (

    toIdent :: ReaderOptions -> [Inline] -> String
    toIdent opts = map (\c -> if isSpace c then '-' else c)
                 . filterer
                 . map toLower . stringify
      where filterer = if isEnabled Ext_ascii_identifiers opts
                          then mapMaybe toAsciiChar
                          else filter (\c -> isLetter c || isAlphaNum c || isSpace c ||
                                             c == '_' || c == '-')
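To make the behavior of the quoted function easier to try out, here is a minimal standalone sketch of its non-ASCII branch only. It is an illustration, not pandoc's code: `toIdentSketch` is a hypothetical name, and it takes a plain `String` instead of pandoc's `[Inline]`, so the `stringify` step and the `Ext_ascii_identifiers` branch are left out.

```haskell
import Data.Char (isAlphaNum, isLetter, isSpace, toLower)

-- Hypothetical helper for illustration; mirrors the "else" branch of the
-- quoted toIdent, but on a plain String rather than [Inline].
toIdentSketch :: String -> String
toIdentSketch =
    map (\c -> if isSpace c then '-' else c)   -- whitespace becomes '-'
  . filter keep                                -- drop all other punctuation
  . map toLower                                -- lowercase first
  where
    -- Same predicate as in the quoted code.
    keep c = isLetter c || isAlphaNum c || isSpace c || c == '_' || c == '-'
```

With this sketch, `toIdentSketch "Hello, World!"` gives `"hello-world"`, and a heading such as `config/app.yml (example)` becomes `"configappyml-example"`: the slash, period, and parentheses are dropped rather than kept.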
@kivikakk might be able to help us locate the exact algorithm GitHub uses to create the automatic header identifiers, so we can match it better.
Ah, it looks like the
This should be trivial to fix.
Roger that! I manage this via CI/CD, testing a pipeline build now to verify.... insert hold music
Every example above works great (the actual in-place content as well as a bunch more), 100% fixes everything right up when adding
Great, I'm going to fix the code so that
This partially addresses #5057, fixing a bad interaction between the `ascii_identifiers` extension and the `gfm_auto_identifiers` extension, and creating identifiers that match the ones GitHub produces. This code still needs to be put somewhere common, so the `gfm_auto_identifiers` extension will work with other formats.
Happy to help. We:
That leaves us with the ID. If we've already created a heading with an identical ID, we append
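The duplicate-ID step can be sketched roughly as below. The text above is cut off before it says what gets appended, so the `-1`, `-2`, ... counter suffix used here is purely an assumption for illustration, and `dedupeIds` is a hypothetical name, not a function from pandoc or GitHub.

```haskell
import Data.List (mapAccumL)

-- Hypothetical helper. ASSUMPTION: a repeated ID gets "-N" appended, where N
-- counts earlier occurrences. The actual suffix is not shown in this thread.
dedupeIds :: [String] -> [String]
dedupeIds = snd . mapAccumL step []
  where
    -- The accumulator maps each base ID to how many times it has been seen.
    step :: [(String, Int)] -> String -> ([(String, Int)], String)
    step seen ident =
      case lookup ident seen of
        Nothing -> ((ident, 1) : seen, ident)
        Just n  -> ((ident, n + 1) : seen, ident ++ "-" ++ show n)
```

For example, `dedupeIds ["intro", "usage", "intro", "intro"]` yields `["intro", "usage", "intro-1", "intro-2"]`. The sketch does not guard against a generated ID colliding with a heading that is literally named `intro-1`.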
@kivikakk thanks, we were close but not exactly there. This helps!
Pandoc: 2.4-1 / Debian 9 (github DEB download)
Use:
pandoc -s -f gfm+backtick_code_blocks -t html -o file.html file.md
The logic Pandoc uses to generate anchors doesn't match the logic used by Github/Gitlab rendering. After searching around, I think this is the routine they use; the list of STRIPPED chars seems to match what I am seeing:
In general, Pandoc's generated format allows more markup (slashes, parens, periods, etc.) in the generated anchor than Github/Gitlab does. Source example of various headings from my Markdown that use this markup and whose anchors Pandoc does not generate correctly:
Gists of each platform's rendering, showing their anchor generation; it matches what you see rendered when the MD file is saved into the repository view (same rendering engine):
The functional problem is that manually maintained TOC lists which work correctly when using the Markdown files linked directly from Github/Gitlab do not work when Pandoc processes them to create HTML Pages out of the content. With this kind of technical writing it's hard to not use these kinds of markup in Heading elements here and there, especially when referring to filenames or keywords which can't be reworded. Thanks!
Related issues I found: #2821 #3388