Empty blocks not included by convertFromHTML #231

benbriggs · 2016-03-25T02:33:01Z

HTML equivalent to a block with no text are omitted from the output of Draft.convertFromHTML.

Steps to reproduce:

const html = '<div>first block</div><div></div><div>third block</div>';
Draft.convertFromHTML(html) //returns an array of 2 blocks (the non-empty ones) instead of three

@hellendag mentioned in Slack that @CLowbrow might know most about this - I'm happy to PR a change to fix it but could use a bit more context on what the condition in joinChunks is used and not used for. Looks like this is the source of the removal of the block when it's called from here. When removing that last character of chunk A does its block ever need to be popped? My interpretation is that chunk A has a delimiter and chunk B is empty since its first character is \r.

Thank you!

The text was updated successfully, but these errors were encountered:

CLowbrow · 2016-03-25T07:28:37Z

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up. I'm open to modifying this. What kinds of things are you trying to preserve?

benbriggs · 2016-03-25T14:48:24Z

Ah I see. I'm using this module along with another that I built to go to/from HTML for saving the contents, so the empty block tags are actually coming from a transformation of ContentState. I didn't use this as my initial example since it added extra complexity but having three ordered-list-item blocks:

first
third

That ContentState is going to have three blocks that'll map to <ol><li>first</li><li></li><li>third</li></ol>. Throwing that back into convertFromHTML omits that middle block making the last one which causes noticeable differences from what Draft was rendering originally.

CLowbrow · 2016-03-25T17:55:18Z

Storing draft.js content as html is probably not the way to go. Internally, we store a serialized content state and just re-instantiate/render from that. The paste handler is more tailored to minimize the amount of follow-up editing after a paste than faithfully recreate document states.

ianstormtaylor · 2016-03-25T17:57:25Z

@CLowbrow question, what are the negatives you'd point to to storing content as HTML?

It totally makes sense that the internal convertFromHTML is used for copy-pasting, which is going to have different needs than an HTML (de)serializer, so it probably shouldn't be used for that. But just wondering if beyond that, are there specific things that would make you still not use HTML even if someone wasn't using that util?

(Wondering because my current plan is to store content as HTML, since the API I'm building needs to give it to clients in HTML form, so I figure better to (de)serialize on the client, instead of having to do that at the API level, but I may be missing big downsides.)

CLowbrow · 2016-03-25T18:05:51Z

Nothing huge. In my brain it makes more sense to store document data in the format the editor uses since you don't have to worry about doing the transformation both ways (just have to worry about converting to html on demand). Parsing html client-side to initialize the editor state is a bunch of extra work. If you really need html stored, I would probably double write an editor state and the html representation.

That being said there a probably a bunch of use cases for writing as html only that I can't think of :P

benbriggs · 2016-03-25T19:21:19Z

My use case is similar to that of @ianstormtaylor where I have clients that needs HTML to process, so I use HTML to persist to the server. In my implementation I'm actually customizing a separate version of convertFromHTML to accommodate a plugin system I'm building (i.e. it's never being used for pasted data) so I'll make the changes to that version to not omit blocks. I figured I'd explore whether this was a change that could be used in the original as well.

benbriggs · 2016-03-30T23:50:15Z

For future reference, I found a solution that seems to work. In the condition in joinChunks linked above, the block was being omitted when joining the ending 'block divider chunk" (a chunk with text of \r) to the block's chunk that is just another divider chunk, since it has no text. So the join was essentially of two divider chunks, something that (from what I can tell) only happens when creating an empty block. So I added an additional check that (A.text !== '\r' || B.text !== '\r') before popping off the end of block A.

gdehmlow · 2016-08-04T17:52:59Z

I created a duplicate issue about this (#578)

I understand that it may be a good idea to remove extra whitespace in many use cases, but I really would prefer there be an option to load the extra whitespace accurately, especially when there's nothing in the documentation that refers to this behavior. My use case: I'm using DraftJS as an email composer, and I have a snippet menu that allows me to insert snippets into the editor.

When I initially create my snippets (with Draft), the HTML in the editor itself as well as the HTML I export from the editor (to save as the snippets in my database) includes <div><br></div>, which gets rendered as a blank line in the email.

When I try to insert then these snippets into the email composer, I'm using convertFromHTML and then inserting the result as a fragment. The problem is that the convertFromHTML function strips out these blank lines.

My current solution is to instead create fragments line by line, insert then, then split the blocks manually, but this is pretty error prone as it depends on specifics aspects of the HTML I'm inserting. I'll probably switch to the custom solution @benbriggs noted above, but that feels pretty heavy handed for something that should be configurable.

benbriggs · 2016-08-04T21:33:09Z

@gdehmlow since I last wrote that I've open sourced my version: https://github.com/HubSpot/draft-convert

gdehmlow · 2016-08-04T23:55:44Z

:O that's awesome. Exactly what I've needed.. I had taken the lazy way out and forked from conversion to HTML as well so I could support custom entity exports. Will try to contribute as possible..

Taakn · 2016-10-05T16:02:26Z

Agreed the user should have a choice, or at least this should be documented.

flarnie · 2017-09-10T00:26:12Z

Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.

Going to close this for now, but we may revisit it in the future.

**what is the change?:** We don't allow pasting whitespace-only blocks - see facebookarchive#231 In the past we set the 'src' as the text content of an 'img' block when pasting/converting HTML, but sometimes the 'src' is really long and can even crash the editor. Plus that's not really what users expect when they paste an image. An empty string as a default would be better but the code in `convertFromHTML` needs refactored and for now I don't want to touch it. The camera emoji is a cute and easy fix. **why make this change?:** Right now images are being dropped by `convertFromHTML **test plan:** Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html We should do a follow-up change to add a test that catches this. **issue:** facebookarchive#1377

Summary: **what is the change?:** We don't allow pasting whitespace-only blocks - see #231 In the past we set the 'src' as the text content of an 'img' block when pasting/converting HTML, but sometimes the 'src' is really long and can even crash the editor. Plus that's not really what users expect when they paste an image. An empty string as a default would be better but the code in `convertFromHTML` needs refactored and for now I don't want to touch it. The camera emoji is a cute and easy fix. **why make this change?:** Right now images are being dropped by `convertFromHTML **test plan:** Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html We should do a follow-up change to add a test that catches this. **issue:** #1377 *Before* submitting a pull request, please make sure the following is done... 1. Fork the repo and create your branch from `master`. 2. If you've added code that should be tested, add tests! 3. If you've changed APIs, update the documentation. 4. Ensure that: * The test suite passes (`npm test`) * Your code lints (`npm run lint`) and passes Flow (`npm run flow`) * You have followed the [testing guidelines](https://github.com/facebook/draft-js/wiki/Testing-for-Pull-Requests) 5. If you haven't already, complete the [CLA](https://code.facebook.com/cla). Please use the simple form below as a guideline for describing your pull request. Thanks for contributing to Draft.js! - **Summary** [...] **Test Plan** [...] Closes #1378 Differential Revision: D5804382 fbshipit-source-id: f465a1a92155b02c5650acb9622e27e4e3714492

GeorgeWL · 2018-03-16T09:18:39Z

Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.

Going to close this for now, but we may revisit it in the future.

Whilst I agree, in most cases it makes sense to have it do that,

I feel that it should be an optional param for those many cases where it's not the behaviour which is wanted. Choice is always better than no choice and having to hack your way around it.

Smaxor5 · 2020-05-26T19:56:02Z

I honestly have never hated a library more than this one because of this problem. Watching you just scoff this off is infuriating. I'm sitting here trying to regex replace newline characters with sections just to get custom mentions with the ability to save text as paragraphs working.

f-ricci · 2020-07-03T13:49:59Z

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

GeorgeWL · 2020-07-06T14:12:51Z

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

that's kinda then avoiding the point of the function surely?

of it reading html and converting it, without requiring pre-parsing by the user

gtavantzopoulos · 2021-07-12T16:40:11Z

Using draft-js with React-Final-Form caused me a great headache because, in the initialization of the form converting the saved HTML document to the editor's content state it was removing the empty blocks or any   character that might exist at the beginning of some elements, so the RFF was receiving these changes as form mutations.

Thanks to @gdehmlow solution I've solved the issues with RFF.

robo3945 · 2021-10-01T17:10:45Z

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

you solved my problems with empty blocks. Good job

Summary: **what is the change?:** We don't allow pasting whitespace-only blocks - see facebookarchive/draft-js#231 In the past we set the 'src' as the text content of an 'img' block when pasting/converting HTML, but sometimes the 'src' is really long and can even crash the editor. Plus that's not really what users expect when they paste an image. An empty string as a default would be better but the code in `convertFromHTML` needs refactored and for now I don't want to touch it. The camera emoji is a cute and easy fix. **why make this change?:** Right now images are being dropped by `convertFromHTML **test plan:** Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html We should do a follow-up change to add a test that catches this. **issue:** facebookarchive/draft-js#1377 *Before* submitting a pull request, please make sure the following is done... 1. Fork the repo and create your branch from `master`. 2. If you've added code that should be tested, add tests! 3. If you've changed APIs, update the documentation. 4. Ensure that: * The test suite passes (`npm test`) * Your code lints (`npm run lint`) and passes Flow (`npm run flow`) * You have followed the [testing guidelines](https://github.com/facebook/draft-js/wiki/Testing-for-Pull-Requests) 5. If you haven't already, complete the [CLA](https://code.facebook.com/cla). Please use the simple form below as a guideline for describing your pull request. Thanks for contributing to Draft.js! - **Summary** [...] **Test Plan** [...] Closes facebookarchive/draft-js#1378 Differential Revision: D5804382 fbshipit-source-id: f465a1a92155b02c5650acb9622e27e4e3714492

hellendag added the question label Mar 25, 2016

coopy mentioned this issue Apr 8, 2016

Support custom inline styles when importing from HTML? #282

Open

benbriggs mentioned this issue Aug 3, 2016

convertFromHTML doesn't preserve <br>s inside <div>s #578

Closed

just1jp mentioned this issue Mar 3, 2017

(bug - easy) WYSIWYG loses blank lines between paragraphs bearded-artichokes/twine#130

Closed

flarnie closed this as completed Sep 10, 2017

flarnie mentioned this issue Sep 10, 2017

Make default text content of img blocks a little camera emoji #1378

Closed

simpleshadow mentioned this issue Jul 11, 2018

Pasting is losing some new lines? draft-js-plugins/draft-js-plugins#638

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty blocks not included by convertFromHTML #231

Empty blocks not included by convertFromHTML #231

benbriggs commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

benbriggs commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

ianstormtaylor commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

benbriggs commented Mar 25, 2016

benbriggs commented Mar 30, 2016

gdehmlow commented Aug 4, 2016

benbriggs commented Aug 4, 2016

gdehmlow commented Aug 4, 2016 •

edited

Loading

Taakn commented Oct 5, 2016

flarnie commented Sep 10, 2017

GeorgeWL commented Mar 16, 2018 •

edited

Loading

Smaxor5 commented May 26, 2020

f-ricci commented Jul 3, 2020

GeorgeWL commented Jul 6, 2020

gtavantzopoulos commented Jul 12, 2021

robo3945 commented Oct 1, 2021

Empty blocks not included by convertFromHTML #231

Empty blocks not included by convertFromHTML #231

Comments

benbriggs commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

benbriggs commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

ianstormtaylor commented Mar 25, 2016

CLowbrow commented Mar 25, 2016

benbriggs commented Mar 25, 2016

benbriggs commented Mar 30, 2016

gdehmlow commented Aug 4, 2016

benbriggs commented Aug 4, 2016

gdehmlow commented Aug 4, 2016 • edited Loading

Taakn commented Oct 5, 2016

flarnie commented Sep 10, 2017

GeorgeWL commented Mar 16, 2018 • edited Loading

Smaxor5 commented May 26, 2020

f-ricci commented Jul 3, 2020

GeorgeWL commented Jul 6, 2020

gtavantzopoulos commented Jul 12, 2021

robo3945 commented Oct 1, 2021

gdehmlow commented Aug 4, 2016 •

edited

Loading

GeorgeWL commented Mar 16, 2018 •

edited

Loading