Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DomConverter should not trim whitespaces in nodes that neighbour with black–box elements (e.g. MathML) #5870

Closed
oleq opened this issue Nov 28, 2019 · 1 comment · Fixed by #8529
Assignees
Labels
package:engine squad:core Issue to be handled by the Core team. type:bug This issue reports a buggy (incorrect) behavior.

Comments

@oleq
Copy link
Member

oleq commented Nov 28, 2019

📝 Provide detailed reproduction steps (if any)

Note: I used my PoC code here that brings necessary conversion to load <math> to showcase the problem. But the issue isn't about the PoC code because it occurs before any conversion kicks in, which is at the raw HTML parsing stage.

  1. Try to load this kind of HTML into the editor
<p>Foo <math xmlns="http://www.w3.org/1998/Math/MathML">x</math> bar</p>
  1. It loads just fine
    image
  2. But now try this one (note the trailing space in the <math>):
<p>Foo <math xmlns="http://www.w3.org/1998/Math/MathML">x </math> bar</p>

✔️ Expected result

Spaces after "foo" and before "bar" should be preserved.

❌ Actual result

The later is gone:

image

📃 Other details

I debugged the issue, and the story is as follows:

  1. In DomConverter there's a private method that converts native DOM nodes into text.
  2. It does trimming from both sides if
    • there's no adjacent node,
    • adjacent node is an element,
    • adjacent node is text and its last character is a white-space (this is is the case we're entering here)
  3. The trimming is based on what is considered a "touching inline node".
  4. The "touching inline node" is found using the native tree walker.
  5. The walker enters a <math> and if it ends with some white spaces like, for instance, an indentation used to make the code human-readable (or like a trailing space from the TC), it figures this is it and gives a green light for trimming, which is obviously wrong.

The white spaces in the <math> do not render like regular white spaces in a bold or something, so they should never be considered.

We need a fix and here are some ideas:

  • Some DomConverter#objectLike collection that would collect black–box elements. It would be used in the walker to prevent it from considering black–box content and thus, it would prevent the trimming.
  • I think we could safely skip all elements in the tree walker that have Element#namespaceURI. I didn't test it but I suppose if we replaced <math> with inline <svg> (that is also formatted inside), we'd end up with the same issue.

The issue was spotted when creating a PoC for MathML .


If you'd like to see this fixed sooner, add a 👍 reaction to this post.

@oleq oleq added type:bug This issue reports a buggy (incorrect) behavior. package:engine labels Nov 28, 2019
@oleq
Copy link
Member Author

oleq commented Nov 28, 2019

WDYT @scofalik?

@Reinmar Reinmar modified the milestones: nice-to-have, next Dec 16, 2019
@pomek pomek modified the milestones: next, nice-to-have Nov 10, 2020
@niegowski niegowski modified the milestones: nice-to-have, iteration 38 Nov 25, 2020
@niegowski niegowski self-assigned this Nov 25, 2020
@niegowski niegowski added squad:dx squad:core Issue to be handled by the Core team. labels Nov 25, 2020
jodator added a commit that referenced this issue Nov 28, 2020
Feature (engine): Introduce the `DataProcessor#registerRawContentMatcher()` API that marks content sections that contains arbitrary character data and should not be parsed during conversion. See #8323.

Fix (html-embed): Editor will not crash after inserting a broken HTML. Closes #8323.

Fix (engine): `DomConverter` will not trim whitespaces in nodes that are siblings to inline raw content elements (e.g. MathML). Closes #5870.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:engine squad:core Issue to be handled by the Core team. type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants