Code blocks not rendering properly in reStructuredText #3496

Closed
AlmightyOatmeal opened this issue Mar 7, 2017 · 6 comments

AlmightyOatmeal commented Mar 7, 2017

There are several issues when converting XHTML to reStructuredText so I will try to elaborate on any/all of them.

  1. Using a simple <code /> block, the rST code block contains a preceding space, which causes rST not to render the code block as a code block. Newlines are also not preserved.

Example XHTML:

    <p>
      <code>
maven install
cd appd-report-standalone
maven exec:java
</code>
    </p>

Resulting rST:

`` maven install cd appd-report-standalone maven exec:java``

  2. When using a <code /> block within <pre />, newlines are preserved but the preceding space throws off the rendered rST.

Example XHTML:

    <pre>
      <code>maven install
cd appd-report-standalone
maven exec:java
</code>
    </pre>

Resulting rST:

::

          maven install
    cd appd-report-standalone
    maven exec:java

Where the hell are all of these mysterious spaces coming from?! The code lines are indented with 4 spaces but where the hell do the additional 6 spaces come from in the first line? It's certainly not from the XHTML. Looking at the XHTML, the <code /> block is indented with 6 spaces but then why take <pre /> literally while stripping out <code />? It doesn't really make sense and the behavior is inconsistent with item 1 vs item 2.

  3. Along with the aforementioned, it should be formatted like:

.. code-block::

    maven install
    cd appd-report-standalone
    maven exec:java

(NOTE: note the empty lines as well)

Because of formatting issues like this, I need to go and read what I already converted and use a series of regular expressions to parse my rST document and fix what was converted. This makes for a sad panda.

mb21 (Collaborator) commented Mar 7, 2017

What pandoc version are you using?

jgm (Owner) commented Mar 7, 2017

In (1), you have a space at the beginning because there are spaces after the <code> tag. These spaces are semantically significant in HTML (open in a browser and you'll see a space at the beginning of your code). So the only question is, what kind of RST output do we need to generate in order to produce a space at the beginning of this literal span? Do you know?
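The point about (1) can be checked without pandoc. The following is a minimal sketch, assuming the usual HTML behaviour for inline content where every run of whitespace (newlines included) collapses to a single space. The helper name `collapse_inline_whitespace` is hypothetical and this is not pandoc's implementation; it only illustrates why the text inside `<code>` ends up starting with a space.

```python
import re


def collapse_inline_whitespace(text: str) -> str:
    """Collapse each run of HTML whitespace into one space (hypothetical helper)."""
    # Space, tab, line feed, carriage return and form feed count as whitespace here.
    return re.sub(r"[ \t\n\r\f]+", " ", text)


# Text content of the <code> element from example (1): it begins with a newline.
code_text = "\nmaven install\ncd appd-report-standalone\nmaven exec:java\n"

collapsed = collapse_inline_whitespace(code_text)
print(repr(collapsed))
```

The leading newline becomes the leading space seen in the rST literal span in (1). The sketch also leaves a trailing space, which the reported rST output does not show.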

In (2), your <code> tag is indented 6 spaces; that's where the 6 spaces come from. <pre> says preserve whitespace.

Have you tried looking at the XHTML snippet you posted in a browser? You'll see that indeed, the first line is indented. Our RST output respects that; your suggested output does not.
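To see where the 6 spaces in (2) sit in the markup, here is a small sketch using Python's standard `html.parser` to list the tokens of a snippet shaped like the one in the report (6 spaces before `<code>`, as described in this thread). The class name `TextNodeLogger` is made up for illustration. Note that `html.parser` is only a tokenizer: it does not apply the HTML rule that drops a newline immediately after the `<pre>` start tag, so the text node shows up here as a newline plus 6 spaces.

```python
from html.parser import HTMLParser


class TextNodeLogger(HTMLParser):
    """Record start tags, end tags and text nodes in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


# Snippet shaped like example (2): <code> is indented 6 spaces inside <pre>.
SNIPPET = (
    "<pre>\n"
    "      <code>maven install\n"
    "cd appd-report-standalone\n"
    "maven exec:java\n"
    "</code>\n"
    "</pre>"
)

parser = TextNodeLogger()
parser.feed(SNIPPET)
parser.close()
events = parser.events

for kind, value in events:
    print(kind, repr(value))
```

The text node between `<pre>` and `<code>` belongs to the `<pre>` element, so its spaces are preserved and end up in front of the first code line.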

Summary: the only problem I see here is with case (1), where evidently we need to do some kind of escaping of the initial space (advice welcome).

AlmightyOatmeal (Author) commented

@jgm,

In (1), there are no spaces after the <code> node, just a newline. A newline character should not be treated as a space even if it is preceding actual content.

In regard to case (2): yes, <pre> preserves whitespace, but then Pandoc should either accept the <code> node as a literal, or take the contents of <code> as preformatted. The latter would not include the whitespace of the pretty-printed HTML, because the content is within the <code> node and not directly within the <pre> node.

I used to make a living off of living, breathing, and sleeping XML so I see a problem with case (1) and case (2); in terms of HTML, that's a different story altogether. :)

As far as recommendations from my perspective:
<two_cents>

  • A preceding newline should not be treated as a space, nor should any other newline character.
  • Pretty-printed code should not impact a <pre> node if there are child nodes; child nodes should be treated as preformatted and not impacted by the pretty-printed structure.

</two_cents> :)

AlmightyOatmeal (Author) commented

@mb21,

$ pandoc -v
pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.5, texmath 0.9, skylighting 0.1.1.4
Default user data directory: /Users/jamie/.pandoc
Copyright (C) 2006-2016 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

jgm (Owner) commented Mar 7, 2017

Re (1), open this in a browser and you'll see that the code doesn't start right after BEFORE, but on the next line.

    <p>
      BEFORE<code>
maven install
cd appd-report-standalone
maven exec:java
</code>AFTER
    </p>

so the newline after the <code> tag is not semantically insignificant in HTML. Pandoc generally treats newlines inside <code> as spaces (because generally they're there because of hard wrapping).

Re (2), well, pandoc is parsing this as HTML, so it's the HTML rules that matter. Any pretty-printer that introduces spaces between the pre tag and the code tag is changing the HTML meaning of the document.
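If the input comes from a pretty-printer that indents `<code>` inside `<pre>`, one option on the input side is to remove that whitespace before handing the file to pandoc. The following is only a sketch: the function name `tighten_pre_code` is hypothetical, it relies on regular expressions rather than a real HTML parser, and it deliberately changes the HTML meaning of the document by discarding whitespace that `<pre>` would otherwise preserve.

```python
import re

# Whitespace between an opening <pre ...> and the <code ...> that follows it.
_PRE_CODE_OPEN = re.compile(r"(<pre\b[^>]*>)\s+(<code\b[^>]*>)", re.IGNORECASE)
# Whitespace between a closing </code> and the </pre> that follows it.
_CODE_PRE_CLOSE = re.compile(r"(</code>)\s+(</pre>)", re.IGNORECASE)


def tighten_pre_code(html: str) -> str:
    """Drop pretty-printer whitespace between <pre> and its <code> child (hypothetical helper)."""
    html = _PRE_CODE_OPEN.sub(r"\1\2", html)
    return _CODE_PRE_CLOSE.sub(r"\1\2", html)


pretty_printed = (
    "<pre>\n"
    "      <code>maven install\n"
    "cd appd-report-standalone\n"
    "maven exec:java\n"
    "</code>\n"
    "    </pre>"
)

tightened = tighten_pre_code(pretty_printed)
print(tightened)
```

Whitespace inside the `<code>` element itself is left untouched, so the newlines between the three commands survive.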

What you could help us with is how to represent a code span starting with a space (or for that matter, newline) character in reStructuredText. I couldn't see a way to do it from my brief perusal of the documentation. If there's no way to do it, we could simply remove the space, or leave things as they are.

mb21 (Collaborator) commented Mar 7, 2017

> A newline character should not be treated as a space

What do you mean? Yes it should, see e.g. http://stackoverflow.com/questions/588356/why-does-the-browser-renders-a-newline-as-space

jgm closed this as completed in fd35661 on Mar 8, 2017