Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrettifyTreeprocessor stripping whitespace in <pre><code> #1263

Closed
fourpoints opened this issue May 26, 2022 · 4 comments
Closed

PrettifyTreeprocessor stripping whitespace in <pre><code> #1263

fourpoints opened this issue May 26, 2022 · 4 comments
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. core Related to the core parser code.

Comments

@fourpoints
Copy link
Contributor

fourpoints commented May 26, 2022

Again this small code section markdown.treeprocessor.py:L432. I hope I'm not too annoying hitting these edge cases 😅

My element looks like this; it's a table where each line is wrapped in <pre><code>.

<table>
  <tbody>
    <tr>
      <td class="lineno">1</td>
      <td><pre><code><span>def</span> <span>foo</span><span>()</span><span>:</span></code></pre></td>
    </tr>
    <tr>
      <td class="lineno">2</td>
      <td><pre><code>    <span>return</span> <span>2</span></code></pre></td>
    </tr>
  </tbody>
</table>

which looks like this

1 def foo():
2    return 2

The Prettifier turns this into something like this

<table>
  <tbody>
    <tr>
      <td class="lineno">1</td>
      <td><pre><code>
<span>def</span> <span>foo</span><span>()</span><span>:</span></code></pre></td>
    </tr>
    <tr>
      <td class="lineno">2</td>
      <td><pre><code>
<span>return</span> <span>2</span></code></pre></td>
    </tr>
  </tbody>
</table>

which looks like this

1
def foo():
2
return 2

Looking at Babelmark only pandoc and Python-Markdown appears to be stripping whitespace in <pre><code>, though fenced code isn't supported by default.

Possible fixes:

  • Don't change anything, keep it as it is.
  • Remove section.
  • Don't prettify if pre[0].text.isspace() is True (would fix my case).
@waylan
Copy link
Member

waylan commented May 26, 2022

When you made your previous report my first thought was that it is strange the we are altering whitespace in a code block. But then I realized that the intent was only to remove extra blank lines at the end of a code block. Perhaps the earlier code which parses the code block should be removing those blank lines instead of addressing it here.

Regardless, the issues you are encountering are specifically because the text of the element is empty and/or only contains whitespace, which is not something which can be done with the built-in features. I accepted the previous change only because that avoided an error being raised when the default value for the text attribute was encountered. In this case, the only way this specific edge case can be encountered is by a third-party extension building elements in this unexpected way. That being the case, the best way forward would be for your extension to also provide a replacement PrettifyTreeprocessor which alters the behavior.

As an aside, I find it odd that you are wrapping each individual line with <pre><code>. As there is no need to preserve line breaks, etc, why not simply wrap each line in a <code> tag only? That would completely avoid any issues with the PrettifyTreeprocessor.

@waylan
Copy link
Member

waylan commented May 26, 2022

On further reflection, it occurs to me that perhaps a general fix could be to add something like and not len(pre[0]) to the end of the if statement on line 435, which would ensure that we only act on code elements which do not contain any child elements. After all, we only want to remove extra whitespace from the end of the code element, and if it contains child elements, then the text is not at the end. I think that this is as far as I would be willing to go though. For anything beyond that, I would expect an extension to provide its own replacement.

@waylan waylan added the needs-decision A decision needs to be made regarding request. label May 26, 2022
@fourpoints
Copy link
Contributor Author

Thanks for the quick reply!

As an aside, I find it odd that you are wrapping each individual line with <pre><code>. As there is no need to preserve line breaks, etc, why not simply wrap each line in a <code> tag only?

I needed to preserve the leading whitespace (the spaces before return), so wrapping in <pre><code> seemed natural.

After all, we only want to remove extra whitespace from the end of the code element, and if it contains child elements, then the text is not at the end.

I think this may have been the original intent, so your suggested fix seems better. As long as the code doesn't have any highlighting, it seems natural to assume that it strips the end of the text, since it would perfectly fix the following:

<pre><code>def foo():
    return 2                        </code></pre>

The prettifier could rstrip the tail of the last element if there are children. If only ET had a TextElement so we wouldn't have to keep track of both .text and .tail 😅

Solution 1

if len(pre) and pre[0].tag == 'code':
    code = pre[0]
    if len(code)
        if code[-1].tail is not None:
            code[-1].tail = util.AtomicString(code[-1].tail.rstrip() + '\n')
    else:
        if code.text is not None:
            code.text = util.AtomicString(code.text.rstrip() + '\n')

Solution 2

if len(pre) and pre[0].tag == 'code':
    code = pre[0]
    if len(code) and code[-1].tail is not None:
            code[-1].tail = util.AtomicString(code[-1].tail.rstrip() + '\n')
    elif not len(code) and code.text is not None:
            code.text = util.AtomicString(code.text.rstrip() + '\n')

I've just made a custom Prettifier with the section removed, but it's probably more useful to include it by default.

@waylan
Copy link
Member

waylan commented May 26, 2022

I needed to preserve the leading whitespace (the spaces before return), so wrapping in <pre><code> seemed natural.

Ah, right. Although you could address that through CSS alone. Regardless, we do want to generally support syntax highlighters which output ET so we should probably fix the general issue.

The prettifier could rstrip the tail of the last element if there are children. If only ET had a TextElement so we wouldn't have to keep track of both .text and .tail 😅

This has been a never-ending frustration of mine. If I was to start over, I would have never used ET based on this issue alone. but I digress...

I am unsure about specifically trimming the tail. I expect that we would only ever see tails in code blocks if ET objects are custom built in an extension. That being the case, my thinking was to not alter them at all in my previous response. I would accept either in a PR. Although, I think we should probably include some tests for various edge cases to test every posable variation of the logic in the if statements. We would probably need to have each test case build up an ET object, pass that to an instance of the PrettifyTreeprocessor and then confirm that the output is as expected. This differs from most of our tests which simply take Markdown input and verify the HTML output

@waylan waylan added bug Bug report. core Related to the core parser code. confirmed Confirmed bug report or approved feature request. and removed needs-decision A decision needs to be made regarding request. labels May 26, 2022
@waylan waylan closed this as completed in dc434df May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. core Related to the core parser code.
Projects
None yet
Development

No branches or pull requests

2 participants