[markdown reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled #1841

timjb · 2014-12-25T01:06:20Z

This is an example from the documentation:

$ pandoc --version
pandoc 1.13.2
...
$ cat test.markdown
<table>
    <tr>
        <td>*one*</td>
        <td>[a link](http://google.com)</td>
    </tr>
</table>
$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
<pre><code>&lt;tr&gt;
    &lt;td&gt;*one*&lt;/td&gt;
    &lt;td&gt;[a link](http://google.com)&lt;/td&gt;
&lt;/tr&gt;</code></pre>
</table>

The documentation says that I should get

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
    <tr>
        <td><em>one</em></td>
        <td><a href="http://google.com">a link</a></td>
    </tr>
</table>

jgm · 2014-12-25T17:53:14Z

Indeed, this seems to be a regression. (I just tried with pandoc 1.9.4.1 and got the right result.) I'm not sure which release broke this, but I suspect the culprit is this change in version 1.13:

    + Revamped raw HTML block parsing in markdown (#1330).
      We no longer include trailing spaces and newlines in the
      raw blocks.  We look for closing tags for elements (but without
      backtracking).  Each block-level tag is its own `RawBlock`;
      we no longer try to consolidate them (though `--normalize` will do so).

Previously we parsed clumps of raw HTML tags as one block. With this change, each tag went into its own block. But that had the side effect that the indented tag gets parsed as an indented code block.

jgm · 2014-12-25T18:16:33Z

Actually, it's a bit unclear what the behavior should be. If we're really parsing markdown inside HTML tags, then anything indented four spaces should be a code block, which is exactly what we see in 1.13.2.

gforge · 2015-01-07T12:43:31Z

This turned out to be the cause of a bug in my htmlTable R package. My fix is to remove the tabs, but that hurts readability when I want to inspect the raw output. If possible, a solution based on comment tags might be an option:

<!-- Start raw html -->
<table class='gmisc_table' style='border-collapse: collapse;' >
    <thead>
    <tr>
        <th> </th>
        <th>Header</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <td>Row 1</td>
        <td>Value</td>
    </tr>
    </tbody>
</table>
<!-- End raw html -->
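
The workaround mentioned above (removing the leading indentation before the HTML reaches pandoc) could be sketched roughly as follows. This is only an illustration: `flatten_html_indentation` is a hypothetical helper, not part of pandoc or htmlTable, and it assumes the input contains no `<pre>` content or intentional indented code blocks, since those would be flattened too.

```python
def flatten_html_indentation(text: str) -> str:
    """Strip leading spaces and tabs from every line of `text`.

    Hypothetical helper for illustration only. It does not parse HTML,
    so it also flattens whitespace-sensitive content such as <pre> blocks.
    """
    return "\n".join(line.lstrip(" \t") for line in text.split("\n"))


indented = "<table>\n    <tr>\n        <td>*one*</td>\n    </tr>\n</table>"
# Every line of the result starts at column 0, so no line carries the
# four-space indentation that marks an indented code block.
print(flatten_html_indentation(indented))
```

Note that this only addresses indentation at the start of a line; whitespace that follows a tag on the same line is left untouched.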

seagreen · 2016-03-08T02:43:00Z

Perhaps an extension to turn off automatic code blocks for indented lines would be a good workaround?

pqmatagi · 2016-04-26T23:23:50Z

This appears to happen not just for indented HTML tags, but for indented content within any HTML element. For example, unindenting the simple example above fixes that case, but the problem remains here:

$ pandoc --version
pandoc 1.17.0.2
Compiled with texmath 0.8.5, highlighting-kate 0.6.2.

$ cat test2.markdown
<table>
<tr>
<td>    *one*</td>
<td>    [a link](http://google.com)</td>
</tr>
</table>

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test2.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt;</code></pre>
<td>
<pre><code>[a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

or

$ cat test3.markdown
<table>
<tr>
<td>    *one*</td> <td>    [a link](http://google.com)</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks test3.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt; &lt;td&gt;    [a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

(I encountered this when embedding raw HTML tables, generated by the R xtable package, that had "too many" leading spaces before some of the numbers, and one HTML table row per line.)

Being able to embed line-oriented markdown within HTML elements is a nice feature, but wouldn't it make sense to insist that it be at the start of an actual line, given that HTML (roughly) doesn't care? E.g.,

$ cat test4.markdown
<table>
<tr>
<td>
    This is a
    multi-line code block
</td> <td>
* [a link](http://google.com)
* another item
</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks --to=html test4.markdown
<table>
<tr>
<td>
<pre><code>This is a
multi-line code block</code></pre>
</td>
<td>
<ul>
<li><a href="http://google.com">a link</a></li>
<li>another item</li>
</ul>
</td>
</tr>
</table>
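
The rule suggested above, treating indentation as a code block only when it begins an actual line, can be illustrated with a small sketch. This is not how pandoc's markdown reader is implemented; `is_indented_code_line` is a made-up name, and a tab stop of four columns is assumed.

```python
def is_indented_code_line(line: str) -> bool:
    """Return True if the physical line itself starts with four columns
    of whitespace (spaces, or a tab expanded to a 4-column tab stop).

    Hypothetical predicate for illustration; blank lines never qualify.
    """
    if not line.strip():
        return False
    return line.expandtabs(4).startswith("    ")


# Indentation at the start of a real line counts as code...
print(is_indented_code_line("    This is a"))
# ...but spaces that merely follow an opening tag do not.
print(is_indented_code_line("<td>    *one*</td>"))
```

Under this reading, the `<td>    *one*</td>` rows from the earlier examples would be parsed as ordinary inline markdown, while the explicitly indented lines inside `<td>` would still become a code block.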

mpickering added the bug label Dec 27, 2014

jgm changed the title ~~Indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled~~ [markdwon reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled Jan 7, 2015

jgm added the status:more-discussion-needed label Jan 7, 2015

gforge mentioned this issue Jan 7, 2015

rendering tab characters in .Rmd gforge/Gmisc#9

Closed

andy-morris mentioned this issue Apr 27, 2015

Add option to disable indented code blocks? #2120

Closed

jgm changed the title ~~[markdwon reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled~~ [markdown reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled Jul 1, 2015

Zaharid added a commit to NNPDF/reportengine that referenced this issue Feb 1, 2017

Interpret LaTex math inside tables

6420ce2

We have to:

- Re-enable the extension that parses markdown inside raw html.
- Flatten the tables, so they are processed as html instead of code (see jgm/pandoc#1841).

jgm added format:Markdown reader labels Mar 9, 2017

jgm added this to the pandoc 2.0 milestone Mar 9, 2017

jgm closed this as completed in 82cc7fb May 6, 2017

davidmerfield mentioned this issue May 20, 2019

Handles indentation inside pre-formatted Markdown blocks davidmerfield/Blot#184

Merged

1 task
