GFM to HTML conversion adding extra paragraph markup around sub-list elements #6589
Comments
Pandoc switched to a new library for
Might be a bug in commonmark-hs... should we migrate it there?
Well this is puzzling, because I can't reproduce it.
You can try using https://pandoc.org/try to reproduce the issue.
While I can't recreate it using the tool above, some testing shows that it broke between 2.10 and 2.10.1: with both 2.9.2.1 and 2.10 I receive the expected HTML sans
When I switch the script back to the latest release, 2.10.1, the extra paragraphs are injected inside the lists again. Verbose mode didn't reveal anything interesting; the only change was swapping out the release version. CI/CD pipeline (
The Markdown is written in pure, generic GFM using the "4 spaces indent" style for sub-list items. The idea is that these documents display the same in GitLab/GitHub rendering as they do when converted to HTML with some CSS applied (we fixed a pandoc issue about a year ago related to these TOC entries not matching; I'm the same guy). The template is almost the same as the default pandoc one. I had to use a custom one to override some of the CSS/HTML embedded in the internal template (it's been so long that I forget exactly what; something in the header is hard-coded?), but just in case, here's the TPL file referenced above:
I create the TOCs manually so they display inside GitLab/GitHub; to be clear, pandoc is not generating the TOC in this example. A different page, which has no sub-lists in its TOC, produces HTML like this:
Another document has the mix-and-match going on:
...this last one is interesting because the sub-sub-lists (??) don't have it, so a sub-list without a sub-sub-list gets the extra
Something strange is afoot at the Circle-K...
Puzzling indeed; I can reproduce it with the pandoc from Homebrew:
I don't have master on this machine, though... and are we sure try.pandoc is running 2.10.1?
It puts the version (generated by the library) at the bottom of the page when you convert, so yes.
Can you put something into your pipeline that runs
Oh, I just noticed something that explains our divergent results.
And I can reproduce this with the
EDIT: you can work around this in your pipeline by stripping excess blank lines before passing the input to pandoc.
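The blank-line workaround above could be added to a Python-based pipeline step roughly as follows. This is a minimal sketch, not pandoc's own fix. It assumes that "excess blank lines" means runs of two or more consecutive blank (or whitespace-only) lines, and the function names are hypothetical.

```python
import re

# One newline followed by two or more lines that are empty or contain only
# spaces/tabs, i.e. a run of at least two consecutive blank lines.
_EXCESS_BLANKS = re.compile(r"\n(?:[ \t]*\n){2,}")


def collapse_blank_lines(text: str) -> str:
    """Collapse every run of 2+ blank lines into exactly one blank line.

    Hypothetical helper, applied to the Markdown before it is handed to pandoc.
    Caveat: it is not aware of fenced code blocks, so runs of blank lines
    inside code are collapsed as well.
    """
    return _EXCESS_BLANKS.sub("\n\n", text)


def read_normalized(path: str) -> str:
    """Read a Markdown file and return its text with excess blank lines removed."""
    with open(path, encoding="utf-8") as f:
        return collapse_blank_lines(f.read())
```

Single blank lines are left alone, so paragraph breaks and list structure are otherwise unchanged. The normalized text can be written to a temporary file or passed on stdin (for example, via `subprocess.run(["pandoc", "-f", "gfm", "-t", "html"], input=..., text=True)`), because pandoc reads stdin when no input file is given.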
Excellent, thank you for diagnosing and fixing this so quickly; much appreciated.
I use the latest pandoc (fetched with curl) in a CI/CD pipeline to convert Markdown (GFM) into HTML. Because I only edit these files once every few months, I don't know exactly when this started happening. The last time I edited an MD file, the (re)generated HTML was fine, so I think the change happened between versions 2.9.2.1 and 2.10. For this report, I edited the files and ran the pipeline today, which used the 2.10.1 Debian build downloaded from GitHub Releases.
The Markdown looks like this:
The processing, run in a loop over all files, creates the HTML like so:
The resulting HTML has extra embedded `<p>` elements wrapping the sub-list items, but this is inconsistent. On some Markdown pages where only a top-level list exists (a TOC with no leaves), it injects `<p>` inside the list elements. In this case, though, it's "mix and match" within the list and sub-list, like so:

As far as I can recall, the older version never injected `<p>` elements (I would have noticed, since I now have huge extra line spacing between elements). Now all the generated TOC output is a mashup where this "sometimes" happens, causing odd visual formatting. Once CSS is applied, the above ends up looking like this:

The result is semi-random (I'm sure there's a pattern hiding in there): the placement of `<p>` elements seems to depend on how the TOC is constructed (how many elements and sub-list elements it has). Pandoc definitely did not do this before; it's something new. The last time I ran my CI/CD, it used the pandoc format `gfm+backtick_code_blocks+...`, which was deprecated in the latest code (my CI/CD failed and I had to fix the script to remove it). If that helps pin down when it was last working correctly, that extension was still accepted then. Thanks!
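A regression like this could be caught in the CI/CD pipeline before it reaches the published pages by counting list items whose content is wrapped in a `<p>`. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical. It assumes that, as in this report, none of the TOC lists are meant to be loose. Under CommonMark, a list whose items are separated by blank lines is legitimately rendered with `<p>` wrappers.

```python
from html.parser import HTMLParser


class LiParagraphCounter(HTMLParser):
    """Count <li> elements whose first content is a <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.just_opened_li = False  # True right after an <li> start tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if self.just_opened_li and tag == "p":
            self.count += 1
        # Only an <li> start tag arms the check for the next element.
        self.just_opened_li = tag == "li"

    def handle_endtag(self, tag):
        self.just_opened_li = False

    def handle_data(self, data):
        # Real text between <li> and the next tag means the item is not
        # wrapped in <p>; whitespace-only data is ignored.
        if data.strip():
            self.just_opened_li = False


def count_paragraph_items(html: str) -> int:
    """Return how many list items in `html` start with a <p> wrapper."""
    parser = LiParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

A pipeline step could then fail the build when `count_paragraph_items` returns a non-zero value for a page whose TOC is expected to be tight.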