feature_request(superfences): Tab indentation in code blocks #276

Closed
Kristinita opened this issue Apr 13, 2018 · 9 comments
Labels
P: maybe Pending approval of low priority request. T: feature Feature.

Comments

@Kristinita

Kristinita commented Apr 13, 2018

1. Summary

If I convert GFM fenced code blocks in Markdown to HTML via SuperFences, I get unexpected spaces in the code blocks.

2. Argumentation

Premises:

Conclusion:

  • I get errors when I check that my HTML files contain tabs for indentation, not spaces.

3. Configuration

Example file SashaSuperFences.md:

```json
{
	"name": "SashaSuperFences",
}
```

4. Steps to reproduce

I run in the console:

python -m markdown -x pymdownx.superfences SashaSuperFences.md

5. Expected behavior

Tab before <span class="nt">.

<div class="highlight"><pre><span></span><span class="p">{</span>
	<span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;SashaSuperFences&quot;</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>

6. Actual behavior

4 spaces before <span class="nt">.

<div class="highlight"><pre><span></span><span class="p">{</span>
    <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;SashaSuperFences&quot;</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>

Thanks.

@facelessuser
Owner

Yes, this is generally a problem with the approach Python Markdown takes with whitespace. All tabs are normalized to the default indentation (which is 4 spaces). This is usually applied to the entire file before any processor runs. You'll notice this behavior in indented code blocks as well. I believe this is done to make patterns and indentation detection more straightforward in the code. In most places, this replacement isn't a problem, but in code blocks, especially if you are using tabs to align certain things, it can be.

With all of that said, I might be able to address this for SuperFences. It will take a bit of work, but I believe it is possible. This would not apply to indented code blocks, only to fenced blocks. If I do this, I might leave the current whitespace behavior as the default for consistency with the rest of Python Markdown, but add an option to respect tabs.

In general though, I'd bring this issue up on Python Markdown, but I can probably alter SuperFences' fenced code block behavior in the future.

@facelessuser facelessuser added P: maybe Pending approval of low priority request. Priority - Low T: feature Feature. labels Apr 13, 2018
@facelessuser
Owner

Thinking about this a little more. If I do this, I'll have to account for some special cases. SuperFences expects fenced blocks to align. If we have inconsistent indentation, some tabs will have to be converted to spaces to preserve alignment.

    ```
  \tcontent
\t```

In this case the opening and closing fence markers are equivalent, as a tab is equivalent to 4 spaces when parsing. But the content line is preceded by 2 spaces and 1 tab. Indentation should be 4 spaces or 1 tab, but we have 2 spaces followed by a tab that is equivalent to 4 spaces, so the tab crosses the indentation boundary. In this case, we'd probably have to split the tab into 2 spaces of indentation and 2 spaces of content.

On the other hand, in this case we could also just have indentation absorb the entire tab. This might create unexpected formatting.

In short, to preserve tabs, it would be expected that tabs are wholly contained within the fenced content region or wholly contained in the indentation region. If they cross boundaries, they'd get translated to spaces.
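A minimal sketch of the boundary rule described above, not the SuperFences implementation. It assumes, as this comment does, that every tab counts as a fixed 4 columns; `split_indent` and its return shape are hypothetical names used only for illustration.

```python
TAB_WIDTH = 4  # assumption: each tab counts as a fixed 4 columns


def split_indent(line, indent, tab_width=TAB_WIDTH):
    """Strip `indent` columns of leading whitespace from `line`.

    Returns (content, preserved). `preserved` is False when a tab
    straddled the indentation boundary and was turned into spaces.
    """
    col = 0
    i = 0
    while col < indent and i < len(line) and line[i] in " \t":
        width = tab_width if line[i] == "\t" else 1
        if col + width > indent:
            # The tab crosses the boundary: the part that falls inside
            # the content region is emitted as spaces.
            overflow = col + width - indent
            return " " * overflow + line[i + 1:], False
        col += width
        i += 1
    return line[i:], True


# Tab wholly inside the indentation region: the content tab survives.
print(split_indent("\t\tcontent", 4))   # ('\tcontent', True)
# 2 spaces + tab crosses the 4-column boundary: 2 columns spill over as spaces.
print(split_indent("  \tcontent", 4))   # ('  content', False)
```

The alternative mentioned above, letting the indentation absorb the entire tab, would amount to returning `line[i + 1:]` without the leading spaces in the crossing case.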

Because Python Markdown cannot currently handle fenced blocks as block processors, they are actually implemented as a preprocessor. This means that we sometimes convert blocks that are actually inside indented code blocks and then have to restore them. But if we are trying to preserve tabs, our logic could potentially cause some weird results in indented code blocks when we restore.

In short, whatever we do may cause unintended side effects depending on how it is implemented. We'll have to see if the effort required is worth it.

@facelessuser
Owner

This looks to be doable when moved before the normalize_whitespace step, but unfortunately that introduces an issue. We use placeholders in the same way that Python Markdown's other extensions do. The placeholders utilize a special key and number surrounded by the control characters STX and ETX.

normalize_whitespace strips out all STX and ETX control characters, but we have to execute before normalize_whitespace. We'll probably have to look for something different to pull this off. I'd like to not introduce a complex normalize_whitespace replacement.
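To illustrate the conflict, here is a simplified stand-in that models only the STX/ETX stripping described above. The placeholder key `fence` and both function names are hypothetical; this is not Python Markdown's actual code.

```python
STX = "\u0002"  # "start of text" control character
ETX = "\u0003"  # "end of text" control character


def make_placeholder(index):
    # Hypothetical key; real extensions use their own key string.
    return f"{STX}fence:{index}{ETX}"


def strip_control_chars(source):
    # Models only the part of the normalization step that matters here:
    # every STX and ETX character is removed from the source.
    return source.replace(STX, "").replace(ETX, "")


placeholder = make_placeholder(0)
text = f"before\n{placeholder}\nafter"
normalized = strip_control_chars(text)

# The placeholder inserted before normalization can no longer be found.
print(placeholder in normalized)  # False
print(repr(normalized))           # 'before\nfence:0\nafter'
```

Once the delimiters are gone, the stashed block cannot be located and restored, which is why running before this step needs a different marker scheme.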

facelessuser added a commit that referenced this issue Apr 15, 2018
Add experimental `preserve_tabs` option that will preserve tabs inside code blocks. Tabs must be contained completely inside code block for proper preservation.  Tabs outside of the code block are converted to spaces.  Tabs that cross the indentation/code content boundary will be seen as spaces inside the code block.

Ref #276
@facelessuser
Owner

The experimental branch contains a working tab-preserving feature. I've barely tested it, but in limited testing it appears to work. See the commit for notes.

@facelessuser
Owner

My understanding of the tab conversion process was incorrect. Python Markdown actually uses the expandtabs method to expand tabs to spaces, which pads each tab out to the next tab stop rather than replacing it with a fixed number of spaces. If the tab length is 4 and you are presented with ..- (where . is a space and - is a tab), the tab will be converted to 2 spaces, while a lone - would be converted to 4. The following commit addresses this logic and now matches Python Markdown: 3d6fe65.
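The tab-stop behavior can be checked directly with Python's built-in `str.expandtabs`. The helper `tab_width_at` is a hypothetical name added here to show the arithmetic; it is not taken from the commit.

```python
TAB_LENGTH = 4  # tab length used in the comment above


def tab_width_at(col, tab_length=TAB_LENGTH):
    # Number of spaces a tab expands to when it starts at column `col`:
    # just enough to reach the next multiple of `tab_length`.
    return tab_length - (col % tab_length)


# A lone tab starts at column 0 and fills a whole tab stop.
print(repr("\t".expandtabs(TAB_LENGTH)))    # '    '
# After 2 spaces, the tab only needs 2 more columns to reach the stop.
print(repr("  \t".expandtabs(TAB_LENGTH)))  # '    '

print(tab_width_at(0))  # 4
print(tab_width_at(2))  # 2
```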

facelessuser added a commit that referenced this issue Apr 17, 2018
More efficient tab calculation and properly store content for restore. Ref #276
facelessuser added a commit that referenced this issue Apr 17, 2018
Fix corner case where fences have empty lines and no indentation.  Fix case where tabs are present in fence parameters. Ref #276
@facelessuser
Owner

Looks like we finally have something pretty solid via pull request #279.

@Kristinita
Author

@facelessuser, we are waiting for the new release.

Thanks.

facelessuser added a commit that referenced this issue Apr 19, 2018
Add experimental `preserve_tabs` option that will preserve tabs inside code blocks. Tabs must be contained completely inside code block for proper preservation.  Tabs outside of the code block are converted to spaces. Ref #276
@facelessuser
Owner

This has been released. If you run into tab-related parsing issues, let me know, but hopefully it just works.
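For readers who want to try the option, a hedged sketch of how it might be enabled through Python Markdown's `extensions` and `extension_configs` arguments. The option name `preserve_tabs` comes from the commit message in this thread; check the SuperFences documentation for the installed version before relying on it.

```python
# Extension list and per-extension settings, in the shape Python Markdown accepts.
extensions = ["pymdownx.superfences"]
extension_configs = {
    "pymdownx.superfences": {
        # Option name taken from the commit message in this thread.
        "preserve_tabs": True,
    }
}

# With Python Markdown and PyMdown Extensions installed, these would be
# passed to the converter roughly like this:
#   import markdown
#   html = markdown.markdown(
#       text,
#       extensions=extensions,
#       extension_configs=extension_configs,
#   )
```

The option is off by default, so existing documents keep the space-normalized output unless it is set explicitly.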

@Kristinita
Author

Status: ✔️ Fixed for me

Thanks.
