PrettifyTreeprocessor stripping whitespace in `<pre><code>` #1263

fourpoints · 2022-05-26T17:24:36Z

Again this small code section markdown.treeprocessor.py:L432. I hope I'm not too annoying hitting these edge cases 😅

My element looks like this; it's a table where each line is wrapped in <pre><code>.

<table>
  <tbody>
    <tr>
      <td class="lineno">1</td>
      <td><pre><code><span>def</span> <span>foo</span><span>()</span><span>:</span></code></pre></td>
    </tr>
    <tr>
      <td class="lineno">2</td>
      <td><pre><code>    <span>return</span> <span>2</span></code></pre></td>
    </tr>
  </tbody>
</table>

which looks like this

1 def foo():
2    return 2

The Prettifier turns this into something like this

<table>
  <tbody>
    <tr>
      <td class="lineno">1</td>
      <td><pre><code>
<span>def</span> <span>foo</span><span>()</span><span>:</span></code></pre></td>
    </tr>
    <tr>
      <td class="lineno">2</td>
      <td><pre><code>
<span>return</span> <span>2</span></code></pre></td>
    </tr>
  </tbody>
</table>

which looks like this

1
def foo():
2
return 2

Looking at Babelmark only pandoc and Python-Markdown appears to be stripping whitespace in <pre><code>, though fenced code isn't supported by default.

Possible fixes:

Don't change anything, keep it as it is.
Remove section.
Don't prettify if pre[0].text.isspace() is True (would fix my case).

The text was updated successfully, but these errors were encountered:

waylan · 2022-05-26T18:00:52Z

When you made your previous report my first thought was that it is strange the we are altering whitespace in a code block. But then I realized that the intent was only to remove extra blank lines at the end of a code block. Perhaps the earlier code which parses the code block should be removing those blank lines instead of addressing it here.

Regardless, the issues you are encountering are specifically because the text of the element is empty and/or only contains whitespace, which is not something which can be done with the built-in features. I accepted the previous change only because that avoided an error being raised when the default value for the text attribute was encountered. In this case, the only way this specific edge case can be encountered is by a third-party extension building elements in this unexpected way. That being the case, the best way forward would be for your extension to also provide a replacement PrettifyTreeprocessor which alters the behavior.

As an aside, I find it odd that you are wrapping each individual line with <pre><code>. As there is no need to preserve line breaks, etc, why not simply wrap each line in a <code> tag only? That would completely avoid any issues with the PrettifyTreeprocessor.

waylan · 2022-05-26T18:09:41Z

On further reflection, it occurs to me that perhaps a general fix could be to add something like and not len(pre[0]) to the end of the if statement on line 435, which would ensure that we only act on code elements which do not contain any child elements. After all, we only want to remove extra whitespace from the end of the code element, and if it contains child elements, then the text is not at the end. I think that this is as far as I would be willing to go though. For anything beyond that, I would expect an extension to provide its own replacement.

fourpoints · 2022-05-26T18:41:39Z

Thanks for the quick reply!

As an aside, I find it odd that you are wrapping each individual line with <pre><code>. As there is no need to preserve line breaks, etc, why not simply wrap each line in a <code> tag only?

I needed to preserve the leading whitespace (the spaces before return), so wrapping in <pre><code> seemed natural.

After all, we only want to remove extra whitespace from the end of the code element, and if it contains child elements, then the text is not at the end.

I think this may have been the original intent, so your suggested fix seems better. As long as the code doesn't have any highlighting, it seems natural to assume that it strips the end of the text, since it would perfectly fix the following:

<pre><code>def foo():
    return 2                        </code></pre>

The prettifier could rstrip the tail of the last element if there are children. If only ET had a TextElement so we wouldn't have to keep track of both .text and .tail 😅

Solution 1

if len(pre) and pre[0].tag == 'code':
    code = pre[0]
    if len(code)
        if code[-1].tail is not None:
            code[-1].tail = util.AtomicString(code[-1].tail.rstrip() + '\n')
    else:
        if code.text is not None:
            code.text = util.AtomicString(code.text.rstrip() + '\n')

Solution 2

if len(pre) and pre[0].tag == 'code':
    code = pre[0]
    if len(code) and code[-1].tail is not None:
            code[-1].tail = util.AtomicString(code[-1].tail.rstrip() + '\n')
    elif not len(code) and code.text is not None:
            code.text = util.AtomicString(code.text.rstrip() + '\n')

I've just made a custom Prettifier with the section removed, but it's probably more useful to include it by default.

waylan · 2022-05-26T19:26:43Z

I needed to preserve the leading whitespace (the spaces before return), so wrapping in <pre><code> seemed natural.

Ah, right. Although you could address that through CSS alone. Regardless, we do want to generally support syntax highlighters which output ET so we should probably fix the general issue.

The prettifier could rstrip the tail of the last element if there are children. If only ET had a TextElement so we wouldn't have to keep track of both .text and .tail 😅

This has been a never-ending frustration of mine. If I was to start over, I would have never used ET based on this issue alone. but I digress...

I am unsure about specifically trimming the tail. I expect that we would only ever see tails in code blocks if ET objects are custom built in an extension. That being the case, my thinking was to not alter them at all in my previous response. I would accept either in a PR. Although, I think we should probably include some tests for various edge cases to test every posable variation of the logic in the if statements. We would probably need to have each test case build up an ET object, pass that to an instance of the PrettifyTreeprocessor and then confirm that the output is as expected. This differs from most of our tests which simply take Markdown input and verify the HTML output

waylan added the needs-decision A decision needs to be made regarding request. label May 26, 2022

waylan added bug Bug report. core Related to the core parser code. confirmed Confirmed bug report or approved feature request. and removed needs-decision A decision needs to be made regarding request. labels May 26, 2022

fourpoints mentioned this issue May 26, 2022

Update PrettifyTreeprocessor <pre><code> handling (attempt 2) #1264

Merged

waylan closed this as completed in dc434df May 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PrettifyTreeprocessor stripping whitespace in `<pre><code>` #1263

PrettifyTreeprocessor stripping whitespace in `<pre><code>` #1263

fourpoints commented May 26, 2022 •

edited

Loading

waylan commented May 26, 2022 •

edited

Loading

waylan commented May 26, 2022 •

edited

Loading

fourpoints commented May 26, 2022

waylan commented May 26, 2022

PrettifyTreeprocessor stripping whitespace in <pre><code> #1263

PrettifyTreeprocessor stripping whitespace in <pre><code> #1263

Comments

fourpoints commented May 26, 2022 • edited Loading

waylan commented May 26, 2022 • edited Loading

waylan commented May 26, 2022 • edited Loading

fourpoints commented May 26, 2022

waylan commented May 26, 2022

PrettifyTreeprocessor stripping whitespace in `<pre><code>` #1263

PrettifyTreeprocessor stripping whitespace in `<pre><code>` #1263

fourpoints commented May 26, 2022 •

edited

Loading

waylan commented May 26, 2022 •

edited

Loading

waylan commented May 26, 2022 •

edited

Loading