Skip to content
This repository has been archived by the owner on Feb 6, 2023. It is now read-only.

Empty blocks not included by convertFromHTML #231

Closed
benbriggs opened this issue Mar 25, 2016 · 18 comments
Closed

Empty blocks not included by convertFromHTML #231

benbriggs opened this issue Mar 25, 2016 · 18 comments
Labels

Comments

@benbriggs
Copy link
Contributor

HTML equivalent to a block with no text are omitted from the output of Draft.convertFromHTML.

Steps to reproduce:

const html = '<div>first block</div><div></div><div>third block</div>';
Draft.convertFromHTML(html) //returns an array of 2 blocks (the non-empty ones) instead of three

@hellendag mentioned in Slack that @CLowbrow might know most about this - I'm happy to PR a change to fix it but could use a bit more context on what the condition in joinChunks is used and not used for. Looks like this is the source of the removal of the block when it's called from here. When removing that last character of chunk A does its block ever need to be popped? My interpretation is that chunk A has a delimiter and chunk B is empty since its first character is \r.

Thank you!

@CLowbrow
Copy link
Contributor

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up. I'm open to modifying this. What kinds of things are you trying to preserve?

@benbriggs
Copy link
Contributor Author

Ah I see. I'm using this module along with another that I built to go to/from HTML for saving the contents, so the empty block tags are actually coming from a transformation of ContentState. I didn't use this as my initial example since it added extra complexity but having three ordered-list-item blocks:

  1. first
  2. third

That ContentState is going to have three blocks that'll map to <ol><li>first</li><li></li><li>third</li></ol>. Throwing that back into convertFromHTML omits that middle block making the last one which causes noticeable differences from what Draft was rendering originally.

@CLowbrow
Copy link
Contributor

Storing draft.js content as html is probably not the way to go. Internally, we store a serialized content state and just re-instantiate/render from that. The paste handler is more tailored to minimize the amount of follow-up editing after a paste than faithfully recreate document states.

@ianstormtaylor
Copy link
Contributor

@CLowbrow question, what are the negatives you'd point to to storing content as HTML?

It totally makes sense that the internal convertFromHTML is used for copy-pasting, which is going to have different needs than an HTML (de)serializer, so it probably shouldn't be used for that. But just wondering if beyond that, are there specific things that would make you still not use HTML even if someone wasn't using that util?

(Wondering because my current plan is to store content as HTML, since the API I'm building needs to give it to clients in HTML form, so I figure better to (de)serialize on the client, instead of having to do that at the API level, but I may be missing big downsides.)

@CLowbrow
Copy link
Contributor

Nothing huge. In my brain it makes more sense to store document data in the format the editor uses since you don't have to worry about doing the transformation both ways (just have to worry about converting to html on demand). Parsing html client-side to initialize the editor state is a bunch of extra work. If you really need html stored, I would probably double write an editor state and the html representation.

That being said there a probably a bunch of use cases for writing as html only that I can't think of :P

@benbriggs
Copy link
Contributor Author

My use case is similar to that of @ianstormtaylor where I have clients that needs HTML to process, so I use HTML to persist to the server. In my implementation I'm actually customizing a separate version of convertFromHTML to accommodate a plugin system I'm building (i.e. it's never being used for pasted data) so I'll make the changes to that version to not omit blocks. I figured I'd explore whether this was a change that could be used in the original as well.

@benbriggs
Copy link
Contributor Author

For future reference, I found a solution that seems to work. In the condition in joinChunks linked above, the block was being omitted when joining the ending 'block divider chunk" (a chunk with text of \r) to the block's chunk that is just another divider chunk, since it has no text. So the join was essentially of two divider chunks, something that (from what I can tell) only happens when creating an empty block. So I added an additional check that (A.text !== '\r' || B.text !== '\r') before popping off the end of block A.

@gdehmlow
Copy link

gdehmlow commented Aug 4, 2016

I created a duplicate issue about this (#578)

I understand that it may be a good idea to remove extra whitespace in many use cases, but I really would prefer there be an option to load the extra whitespace accurately, especially when there's nothing in the documentation that refers to this behavior. My use case: I'm using DraftJS as an email composer, and I have a snippet menu that allows me to insert snippets into the editor.

When I initially create my snippets (with Draft), the HTML in the editor itself as well as the HTML I export from the editor (to save as the snippets in my database) includes <div><br></div>, which gets rendered as a blank line in the email.

When I try to insert then these snippets into the email composer, I'm using convertFromHTML and then inserting the result as a fragment. The problem is that the convertFromHTML function strips out these blank lines.

My current solution is to instead create fragments line by line, insert then, then split the blocks manually, but this is pretty error prone as it depends on specifics aspects of the HTML I'm inserting. I'll probably switch to the custom solution @benbriggs noted above, but that feels pretty heavy handed for something that should be configurable.

@benbriggs
Copy link
Contributor Author

@gdehmlow since I last wrote that I've open sourced my version: https://github.com/HubSpot/draft-convert

@gdehmlow
Copy link

gdehmlow commented Aug 4, 2016

:O that's awesome. Exactly what I've needed.. I had taken the lazy way out and forked from conversion to HTML as well so I could support custom entity exports. Will try to contribute as possible..

@Taakn
Copy link

Taakn commented Oct 5, 2016

Agreed the user should have a choice, or at least this should be documented.

@flarnie
Copy link
Contributor

flarnie commented Sep 10, 2017

Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.

Going to close this for now, but we may revisit it in the future.

@flarnie flarnie closed this as completed Sep 10, 2017
flarnie added a commit to flarnie/draft-js that referenced this issue Sep 10, 2017
**what is the change?:**
We don't allow pasting whitespace-only blocks - see facebookarchive#231

In the past we set the 'src' as the text content of an 'img' block when
pasting/converting HTML, but sometimes the 'src' is really long and can
even crash the editor. Plus that's not really what users expect when
they paste an image.

An empty string as a default would be better but the code in
`convertFromHTML` needs refactored and for now I don't want to touch it.
The camera emoji is a cute and easy fix.

**why make this change?:**
Right now images are being dropped by `convertFromHTML

**test plan:**
Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html

We should do a follow-up change to add a test that catches this.

**issue:**
facebookarchive#1377
facebook-github-bot pushed a commit that referenced this issue Sep 11, 2017
Summary:
**what is the change?:**
We don't allow pasting whitespace-only blocks - see #231

In the past we set the 'src' as the text content of an 'img' block when
pasting/converting HTML, but sometimes the 'src' is really long and can
even crash the editor. Plus that's not really what users expect when
they paste an image.

An empty string as a default would be better but the code in
`convertFromHTML` needs refactored and for now I don't want to touch it.
The camera emoji is a cute and easy fix.

**why make this change?:**
Right now images are being dropped by `convertFromHTML

**test plan:**
Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html

We should do a follow-up change to add a test that catches this.

**issue:**
#1377

*Before* submitting a pull request, please make sure the following is done...

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests!
3. If you've changed APIs, update the documentation.
4. Ensure that:
  * The test suite passes (`npm test`)
  * Your code lints (`npm run lint`) and passes Flow (`npm run flow`)
  * You have followed the [testing guidelines](https://github.com/facebook/draft-js/wiki/Testing-for-Pull-Requests)
5. If you haven't already, complete the [CLA](https://code.facebook.com/cla).

Please use the simple form below as a guideline for describing your pull request.

Thanks for contributing to Draft.js!

-

**Summary**

[...]

**Test Plan**

[...]
Closes #1378

Differential Revision: D5804382

fbshipit-source-id: f465a1a92155b02c5650acb9622e27e4e3714492
@GeorgeWL
Copy link

GeorgeWL commented Mar 16, 2018

Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.

Going to close this for now, but we may revisit it in the future.

Whilst I agree, in most cases it makes sense to have it do that,

I feel that it should be an optional param for those many cases where it's not the behaviour which is wanted. Choice is always better than no choice and having to hack your way around it.

@Smaxor5
Copy link

Smaxor5 commented May 26, 2020

I honestly have never hated a library more than this one because of this problem. Watching you just scoff this off is infuriating. I'm sitting here trying to regex replace newline characters with sections just to get custom mentions with the ability to save text as paragraphs working.

@f-ricci
Copy link

f-ricci commented Jul 3, 2020

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

@GeorgeWL
Copy link

GeorgeWL commented Jul 6, 2020

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

that's kinda then avoiding the point of the function surely?

of it reading html and converting it, without requiring pre-parsing by the user

@gtavantzopoulos
Copy link

Using draft-js with React-Final-Form caused me a great headache because, in the initialization of the form converting the saved HTML document to the editor's content state it was removing the empty blocks or any &nbsp; character that might exist at the beginning of some elements, so the RFF was receiving these changes as form mutations.

Thanks to @gdehmlow solution I've solved the issues with RFF.

@robo3945
Copy link

robo3945 commented Oct 1, 2021

@Smaxor5 have you tried using zero width characters like &zwnj; It renders "empty lines", the cursor is at the beginning and is not messed up when you convert with to json and back so you don't even have to strip it on the server-side.

some context for zero width characters

you solved my problems with empty blocks. Good job

midas19910709 added a commit to midas19910709/draft-js that referenced this issue Mar 30, 2022
Summary:
**what is the change?:**
We don't allow pasting whitespace-only blocks - see facebookarchive/draft-js#231

In the past we set the 'src' as the text content of an 'img' block when
pasting/converting HTML, but sometimes the 'src' is really long and can
even crash the editor. Plus that's not really what users expect when
they paste an image.

An empty string as a default would be better but the code in
`convertFromHTML` needs refactored and for now I don't want to touch it.
The camera emoji is a cute and easy fix.

**why make this change?:**
Right now images are being dropped by `convertFromHTML

**test plan:**
Manual testing with examples/draft-0-10-0/convertFromHTML/convert.html

We should do a follow-up change to add a test that catches this.

**issue:**
facebookarchive/draft-js#1377

*Before* submitting a pull request, please make sure the following is done...

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests!
3. If you've changed APIs, update the documentation.
4. Ensure that:
  * The test suite passes (`npm test`)
  * Your code lints (`npm run lint`) and passes Flow (`npm run flow`)
  * You have followed the [testing guidelines](https://github.com/facebook/draft-js/wiki/Testing-for-Pull-Requests)
5. If you haven't already, complete the [CLA](https://code.facebook.com/cla).

Please use the simple form below as a guideline for describing your pull request.

Thanks for contributing to Draft.js!

-

**Summary**

[...]

**Test Plan**

[...]
Closes facebookarchive/draft-js#1378

Differential Revision: D5804382

fbshipit-source-id: f465a1a92155b02c5650acb9622e27e4e3714492
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests