Ability to replace empty <p> with <br> during html2mark conversion #331

Closed
dmitrymurashenkov opened this issue Apr 4, 2019 · 3 comments

dmitrymurashenkov commented Apr 4, 2019

This is a typical case when converting HTML from a rich text editor such as TinyMCE to Markdown. The user presses the Enter key twice and gets extra paragraphs in the editor, which visually look like several blank lines in the HTML:

<p>1</p>
<p></p>
<p>2</p>

We expect it to be processed similarly to:

1<br><br><br>2

which gives:

1

<br />

2

but the actual result is:

1

2

Note that in the real case the empty <p> tag looks like this:

<p>&nbsp;</p>

So strictly speaking it is not empty, but it is still omitted.
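
One possible stopgap, sketched here under assumptions, is to rewrite empty paragraphs to <br> before the HTML reaches the converter. The class and method names below are hypothetical and not part of flexmark-java, and a regex like this only suits simple editor output such as the TinyMCE markup above, not arbitrary HTML:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of flexmark-java: pre-processes editor HTML
// so that "empty" paragraphs survive as explicit line breaks.
class EmptyParagraphPreprocessor {
    // Matches a <p> element (optionally with attributes) whose body contains
    // only whitespace, &nbsp; entities, or raw U+00A0 characters.
    private static final Pattern EMPTY_P = Pattern.compile(
            "<p(\\s[^>]*)?>(\\s|&nbsp;|\\xA0)*</p>",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every empty paragraph replaced by a single <br>.
    static String replaceEmptyParagraphs(String html) {
        return EMPTY_P.matcher(html).replaceAll("<br>");
    }

    public static void main(String[] args) {
        String html = "<p>1</p>\n<p>&nbsp;</p>\n<p>2</p>";
        System.out.println(replaceEmptyParagraphs(html));
    }
}
```

For the sample input this leaves the two non-empty paragraphs untouched and turns the middle one into <br>; the result would then be passed to the HTML-to-Markdown converter as usual.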

Options used:

MutableDataSet options = new MutableDataSet();
options.set(FlexmarkHtmlParser.OUTPUT_ATTRIBUTES_ID, false);
options.set(FlexmarkHtmlParser.THEMATIC_BREAK, "-------------------");
options.set(FlexmarkHtmlParser.SETEXT_HEADINGS, false);
options.set(FlexmarkHtmlParser.BR_AS_EXTRA_BLANK_LINES, true);
options.set(FlexmarkHtmlParser.BR_AS_PARA_BREAKS, true);
options.set(FlexmarkHtmlParser.DIV_AS_PARAGRAPH, true);
options.set(TableFormatOptions.FORMAT_TABLE_CAPTION, TableCaptionHandling.REMOVE);

dmitrymurashenkov (Author) commented

Found a workaround for my particular case with &nbsp; in the empty paragraph:

options.set(FlexmarkHtmlParser.NBSP_TEXT, "&nbsp;");

This gives:

1

&nbsp;

2

This is also rendered as several blank lines.

vsch added the 🪲 bug label Apr 4, 2019
vsch added this to the V 0.40.34 milestone Apr 4, 2019
vsch (Owner) commented Apr 4, 2019

@dmitrymurashenkov, this is a bug: BR_AS_EXTRA_BLANK_LINES is not applied when the paragraph element is empty.

1

<br />

2
.
<p>1</p>
<p></p>
<p>2</p>

Release with the fix coming shortly.

vsch (Owner) commented Apr 4, 2019

A fix for this is available. The repo and the Maven artifacts have been updated, but it may take a while for the release to show up in Maven Central.
