Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only trim nodes if parent can't contain text nodes #384

Closed
vidhu opened this issue Dec 10, 2021 · 11 comments · Fixed by #394
Closed

Only trim nodes if parent can't contain text nodes #384

vidhu opened this issue Dec 10, 2021 · 11 comments · Fixed by #394
Labels
bug Something isn't working

Comments

@vidhu
Copy link
Contributor

vidhu commented Dec 10, 2021

When parsing html, in most cases we need to enable trim to prevent the validateDOMNesting errors

domToReact(html, {trim: true})

However this doesn't always work. There may be cases where a div contains a &nbsp character. During parsing, this gets trimmed out.

const html = `
  <div>     Hello</div>
`;

function App() {
  return parse(html, { trim: true });
}

CodeSandBox Demo

Note, the whitespace before the Hello aren't spaces . They are &nbsp.

Would you be open to the idea of allowing users to provide a function for trim that would allow specifying a custom trimmer to handle these special spaces? It would look something like this

const customTrim = node => {
  const data = node.data;
  // some trimming logic
  return data;
}

return parse(html, { trim: customTrim });

I could even see it being extended to check if the node type allows spaces or not and trimming based on that although this can be done in the replacer function also as suggested in #205

const customTrim = node => {
  const {type, name, data} = node;
  if(type === 'tag' && ['div', 'p', ...etc].contains(name)) {
    return data
  }
  return data.trim()
}

I'm happy to create a PR if you're open to this idea

@remarkablemark
Copy link
Owner

@vidhu Would it be possible to do what you need with replace?

@vidhu
Copy link
Contributor Author

vidhu commented Dec 10, 2021

@remarkablemark I don't think so unfortunately. The issue is that I want to trim only certain kind of whitespaces. To do this, I need to be able to specify my own trimming logic instead of this

if (trim) {
data = node.data.trim();
if (data) {
result.push(node.data);
}
} else {

replace would only help me in specifying which elements to trim, not which kind of whitespaces to trim

@remarkablemark
Copy link
Owner

Question: @vidhu in your example above, how do you differentiate if something is a normal space vs &nbsp;? Or do you only want to control which elements get trimmed?

In the current code, only text nodes are trimmed so are you asking to change that logic?

if (node.type === 'text') {
// if trim option is enabled, skip whitespace text nodes
if (trim) {
data = node.data.trim();
if (data) {
result.push(node.data);
}
} else {
result.push(node.data);
}
continue;
}

@vidhu
Copy link
Contributor Author

vidhu commented Dec 10, 2021

Hey @remarkablemark. I'm talking about the former case. I want to only trim a subset of white spaces.

I would do this like so:

const demoTrim = s => {
  return s.replaceAll(/^[\f\t\n\v\r ]+|[\f\t\n\v\r ]+$/gm, '');
}

demoTrim("    H ello    ")  // "  H ello  "
demoTrim("  sdf.    ")      // "sdf"

Here's a regex101 link to see what's being matched https://regex101.com/r/gJpVYH/1/

@remarkablemark
Copy link
Owner

Gotcha @vidhu, so back to your original example where you say the whitespace before Hello is getting trimmed out, I checked your CodeSandbox and saw that the whitespace before the Hello is maintained but that's not how it renders in the browser:

Screen Shot 2021-12-10 at 2 47 41 PM

I created a Replit and found that the whitespace is maintained for nodes that are not text (e.g., div). So maybe trim is not the issue here?

@vidhu
Copy link
Contributor Author

vidhu commented Dec 10, 2021

@remarkablemark you're right. not sure how that happens. I've updated the sandbox to show this issue better. I've replaced the characters with html's &nbsp;

const html = `
  <div>&nbsp;&nbsp;&nbsp;&nbsp;</div>
`;

function App() {
  return parse(html, { trim: true });
}

image

@remarkablemark
Copy link
Owner

Thanks for updating your example @vidhu, I think this might be coming from html-dom-parser since I believe the native DOM API parses &nbsp; to .

@vidhu
Copy link
Contributor Author

vidhu commented Dec 12, 2021

@remarkablemark hmm that confuses me. In some cases, &nbsp; is converted to UTF-8 U+00A0 and other cases it ends up as U+0020 (regular spaces). So I feel this approach of modifying the trim function wouldn't always work.


I do have an alternative approach. Could we maintain a list of tags which can't contain spaces (or more generally #text nodes) and trim according to that?

Looking at react's logic, it doesn't look like a huge list. The logic would look roughly like this.

// A set containing elements that can't contain text nodes
// Taken from https://github.com/facebook/react/blob/cae635054e17a6f107a39d328649137b83f25972/packages/react-dom/src/client/validateDOMNesting.js#L213
const elementsWithNoTextChildren = new Set([
  'tr',
  'tbody',
  'thead',
  'tfoot',
  'colgroup',
  'table',
  'head',
  'html',
  'frameset',
  '#document',
]);

if (node.type === 'text') {
  const { parent } = node;
  if (parent && elementsWithNoTextChildren.has(parent.name)) {
    // If the parent of this text node can not have text, them trim it.
    data = node.data.trim();
    if (data) {
      result.push(node.data);
    }
  } else {
    result.push(node.data);
  }
    continue;
}

With this modification, react shouldn't complain about validateDOMNesting issues and we wont have to trim everything

@vidhu vidhu changed the title Allow user to specify a function for trim Only trim nodes if parent can't contain text nodes Dec 13, 2021
@vidhu
Copy link
Contributor Author

vidhu commented Dec 14, 2021

Hey @remarkablemark Not sure if I was clear in my earlier post but I've opened a PR #394 incase it help you in evaluating if this is a good change or not.

@remarkablemark remarkablemark added the bug Something isn't working label Dec 15, 2021
@remarkablemark
Copy link
Owner

Thanks, I left comments

@remarkablemark
Copy link
Owner

Released in #395 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants