[markdown reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled #1841

Closed
timjb opened this issue Dec 25, 2014 · 5 comments

timjb commented Dec 25, 2014

This is an example from the documentation:

$ pandoc --version
pandoc 1.13.2
...
$ cat test.markdown
<table>
    <tr>
        <td>*one*</td>
        <td>[a link](http://google.com)</td>
    </tr>
</table>
$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
<pre><code>&lt;tr&gt;
    &lt;td&gt;*one*&lt;/td&gt;
    &lt;td&gt;[a link](http://google.com)&lt;/td&gt;
&lt;/tr&gt;</code></pre>
</table>

The documentation says that I should get

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test.markdown 
<table>
    <tr>
        <td><em>one</em></td>
        <td><a href="http://google.com">a link</a></td>
    </tr>
</table>

jgm commented Dec 25, 2014

Indeed, this seems to be a regression. (I just tried with pandoc 1.9.4.1 and got the right result.) I'm not sure which release broke this, but I suspect the culprit is this change in version 1.13:

    + Revamped raw HTML block parsing in markdown (#1330).
      We no longer include trailing spaces and newlines in the
      raw blocks.  We look for closing tags for elements (but without
      backtracking).  Each block-level tag is its own `RawBlock`;
      we no longer try to consolidate them (though `--normalize` will do so).

Previously we parsed clumps of raw HTML tags as one block. With this change, each tag went into its own block. But that had the side effect that the indented tag gets parsed as an indented code block.

jgm commented Dec 25, 2014

Actually, it's a bit unclear what the behavior should be. If we're really parsing markdown inside HTML tags, then anything indented four spaces should be a code block, which is exactly what we see in 1.13.2.
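
The rule described here can be sketched roughly as follows. This is a simplified model in Python, not pandoc's actual parser; the function name `looks_like_indented_code` is hypothetical, and the sketch ignores details such as the rule that indented code cannot interrupt a paragraph.

```python
def looks_like_indented_code(line: str) -> bool:
    """Simplified check: a non-blank line that starts with four or more
    spaces, or with a tab, is a candidate for an indented code block."""
    if not line.strip():
        return False  # a blank line does not start a code block by itself
    return line.startswith("    ") or line.startswith("\t")


# Lines modelled on the table from the original report.
sample = [
    "<table>",
    "    <tr>",
    "        <td>*one*</td>",
    "    </tr>",
    "</table>",
]

# One flag per line: True where the line is indented enough to count as code.
flags = [looks_like_indented_code(line) for line in sample]
```

Under this model only the outer `<table>` and `</table>` lines escape the code-block rule, which matches the shape of the 1.13.2 output in the report.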

@mpickering mpickering added the bug label Dec 27, 2014
@jgm jgm changed the title Indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled [markdwon reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled Jan 7, 2015

gforge commented Jan 7, 2015

This has surfaced as a bug in my htmlTable R package. My fix is to remove the tabs, but that makes the raw output harder to read. If possible, a solution based on a comment tag might be an option:

<!-- Start raw html -->
<table class='gmisc_table' style='border-collapse: collapse;' >
    <thead>
    <tr>
        <th> </th>
        <th>Header</th>
    </tr>
    </thead>
    <tbody>
    <tr>
        <td>Row 1</td>
        <td>Value</td>
    </tr>
    </tbody>
</table>
<!-- End raw html -->
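
The workaround of removing the indentation before handing the HTML to pandoc could look roughly like this. It is a minimal sketch in Python rather than the htmlTable package's actual fix; the function name `strip_leading_indentation` is hypothetical, and the sketch assumes the fragment contains no `<pre>` or other whitespace-sensitive content.

```python
def strip_leading_indentation(html: str) -> str:
    """Remove leading spaces and tabs from every line of an HTML fragment.

    Assumption: no <pre> or similar whitespace-sensitive element is present,
    because its content would be altered as well.
    """
    return "\n".join(line.lstrip(" \t") for line in html.split("\n"))


indented = "<table>\n    <tr>\n        <td>*one*</td>\n    </tr>\n</table>"
flat = strip_leading_indentation(indented)
```

As noted above, the cost is readability: the flattened output no longer shows the nesting of the table.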

@jgm jgm changed the title [markdwon reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled [markdown reader] indentation in HTML blocks is parsed as code block when markdown_in_html_blocks is enabled Jul 1, 2015

seagreen commented Mar 8, 2016

Perhaps an extension to turn off automatic code blocks for indented lines would be a good workaround?

@pqmatagi

This appears to happen not just for indented HTML, but within any HTML element. For example, unindenting the simple example above is a fix, but:

$ pandoc --version
pandoc 1.17.0.2
Compiled with texmath 0.8.5, highlighting-kate 0.6.2.

$ cat test2.markdown
<table>
<tr>
<td>    *one*</td>
<td>    [a link](http://google.com)</td>
</tr>
</table>

$ pandoc --from=markdown+raw_html+markdown_in_html_blocks test2.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt;</code></pre>
<td>
<pre><code>[a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

or

$ cat test3.markdown
<table>
<tr>
<td>    *one*</td> <td>    [a link](http://google.com)</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks test3.markdown
<table>
<tr>
<td>
<pre><code>*one*&lt;/td&gt; &lt;td&gt;    [a link](http://google.com)&lt;/td&gt;</code></pre>
</tr>
</table>

(I encountered this when embedding raw HTML tables, generated by the R xtable package, that had "too many" leading spaces before some of the numbers, and one HTML table row per line.)

Being able to embed line-oriented markdown within HTML elements is a nice feature, but wouldn't it make sense to insist that it be at the start of an actual line, given that HTML (roughly) doesn't care? E.g.,

$ cat test4.markdown
<table>
<tr>
<td>
    This is a
    multi-line code block
</td> <td>
* [a link](http://google.com)
* another item
</td>
</tr>
</table>

$ /usr/bin/pandoc --from=markdown+raw_html+markdown_in_html_blocks --to=html test4.markdown
<table>
<tr>
<td>
<pre><code>This is a
multi-line code block</code></pre>
</td>
<td>
<ul>
<li><a href="http://google.com">a link</a></li>
<li>another item</li>
</ul>
</td>
</tr>
</table>
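
For generated tables like the test2 and test3 cases, where the extra spaces sit directly after an opening tag rather than at the start of a line, a pre-processing step along these lines might serve as a workaround. This is a sketch in Python, not something pandoc provides; the name `trim_after_opening_tags` is hypothetical, and the pattern is deliberately blunt (it also trims after inline tags and does not understand `<pre>`).

```python
import re

# An opening tag such as <td> or <td class="x">, followed by spaces or tabs.
# Closing tags (</td>) are not matched because "/" is not a letter.
_OPEN_TAG_THEN_SPACE = re.compile(r"(<[A-Za-z][^<>]*>)[ \t]+")


def trim_after_opening_tags(html: str) -> str:
    """Drop spaces and tabs that immediately follow an opening tag."""
    return _OPEN_TAG_THEN_SPACE.sub(r"\1", html)


row = "<td>    *one*</td> <td>    [a link](http://google.com)</td>"
trimmed = trim_after_opening_tags(row)
```

The single space between `</td>` and the next `<td>` is left alone, since it follows a closing tag.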

Zaharid added a commit to NNPDF/reportengine that referenced this issue Feb 1, 2017
We have to:

 - Re-enable the extension that parses markdown inside raw html.
 - Flatten the tables, so they are processed as html instead of code
   (see jgm/pandoc#1841).
@jgm jgm added this to the pandoc 2.0 milestone Mar 9, 2017
@jgm jgm closed this as completed in 82cc7fb May 6, 2017