Only trim nodes if parent can't contain text nodes #384

vidhu · 2021-12-10T05:05:25Z

When parsing html, in most cases we need to enable trim to prevent the validateDOMNesting errors

domToReact(html, {trim: true})

However this doesn't always work. There may be cases where a div contains a &nbsp character. During parsing, this gets trimmed out.

const html = `
  <div>     Hello</div>
`;

function App() {
  return parse(html, { trim: true });
}

CodeSandBox Demo

Note, the whitespace before the Hello aren't spaces . They are &nbsp.

Would you be open to the idea of allowing users to provide a function for trim that would allow specifying a custom trimmer to handle these special spaces? It would look something like this

const customTrim = node => {
  const data = node.data;
  // some trimming logic
  return data;
}

return parse(html, { trim: customTrim });

I could even see it being extended to check if the node type allows spaces or not and trimming based on that although this can be done in the replacer function also as suggested in #205

const customTrim = node => {
  const {type, name, data} = node;
  if(type === 'tag' && ['div', 'p', ...etc].contains(name)) {
    return data
  }
  return data.trim()
}

I'm happy to create a PR if you're open to this idea

The text was updated successfully, but these errors were encountered:

remarkablemark · 2021-12-10T05:12:39Z

@vidhu Would it be possible to do what you need with replace?

vidhu · 2021-12-10T05:20:17Z

@remarkablemark I don't think so unfortunately. The issue is that I want to trim only certain kind of whitespaces. To do this, I need to be able to specify my own trimming logic instead of this

html-react-parser/lib/dom-to-react.js

Lines 55 to 60 in a41e352

    
           if (trim) { 
        
             data = node.data.trim(); 
        
             if (data) { 
        
               result.push(node.data); 
        
             } 
        
           } else {

replace would only help me in specifying which elements to trim, not which kind of whitespaces to trim

remarkablemark · 2021-12-10T06:28:32Z

Question: @vidhu in your example above, how do you differentiate if something is a normal space vs  ? Or do you only want to control which elements get trimmed?

In the current code, only text nodes are trimmed so are you asking to change that logic?

html-react-parser/lib/dom-to-react.js

Lines 53 to 64 in a41e352

    
           if (node.type === 'text') { 
        
             // if trim option is enabled, skip whitespace text nodes 
        
             if (trim) { 
        
               data = node.data.trim(); 
        
               if (data) { 
        
                 result.push(node.data); 
        
               } 
        
             } else { 
        
               result.push(node.data); 
        
             } 
        
             continue; 
        
           }

vidhu · 2021-12-10T15:34:36Z

Hey @remarkablemark. I'm talking about the former case. I want to only trim a subset of white spaces.

I would do this like so:

const demoTrim = s => {
  return s.replaceAll(/^[\f\t\n\v\r ]+|[\f\t\n\v\r ]+$/gm, '');
}

demoTrim("    H ello    ")  // "  H ello  "
demoTrim("  sdf.    ")      // "sdf"

Here's a regex101 link to see what's being matched https://regex101.com/r/gJpVYH/1/

remarkablemark · 2021-12-10T19:52:58Z

Gotcha @vidhu, so back to your original example where you say the whitespace before Hello is getting trimmed out, I checked your CodeSandbox and saw that the whitespace before the Hello is maintained but that's not how it renders in the browser:

I created a Replit and found that the whitespace is maintained for nodes that are not text (e.g., div). So maybe trim is not the issue here?

vidhu · 2021-12-10T20:48:05Z

@remarkablemark you're right. not sure how that happens. I've updated the sandbox to show this issue better. I've replaced the characters with html's  

const html = `
  <div>&nbsp;&nbsp;&nbsp;&nbsp;</div>
`;

function App() {
  return parse(html, { trim: true });
}

remarkablemark · 2021-12-12T22:19:10Z

Thanks for updating your example @vidhu, I think this might be coming from html-dom-parser since I believe the native DOM API parses   to .

vidhu · 2021-12-12T23:15:48Z

@remarkablemark hmm that confuses me. In some cases,   is converted to UTF-8 U+00A0 and other cases it ends up as U+0020 (regular spaces). So I feel this approach of modifying the trim function wouldn't always work.

I do have an alternative approach. Could we maintain a list of tags which can't contain spaces (or more generally #text nodes) and trim according to that?

Looking at react's logic, it doesn't look like a huge list. The logic would look roughly like this.

// A set containing elements that can't contain text nodes
// Taken from https://github.com/facebook/react/blob/cae635054e17a6f107a39d328649137b83f25972/packages/react-dom/src/client/validateDOMNesting.js#L213
const elementsWithNoTextChildren = new Set([
  'tr',
  'tbody',
  'thead',
  'tfoot',
  'colgroup',
  'table',
  'head',
  'html',
  'frameset',
  '#document',
]);

if (node.type === 'text') {
  const { parent } = node;
  if (parent && elementsWithNoTextChildren.has(parent.name)) {
    // If the parent of this text node can not have text, them trim it.
    data = node.data.trim();
    if (data) {
      result.push(node.data);
    }
  } else {
    result.push(node.data);
  }
    continue;
}

With this modification, react shouldn't complain about validateDOMNesting issues and we wont have to trim everything

vidhu · 2021-12-14T05:49:49Z

Hey @remarkablemark Not sure if I was clear in my earlier post but I've opened a PR #394 incase it help you in evaluating if this is a good change or not.

remarkablemark · 2021-12-15T02:55:25Z

Thanks, I left comments

remarkablemark · 2021-12-18T06:36:00Z

Released in #395 (comment)

vidhu changed the title ~~Allow user to specify a function for trim~~ Only trim nodes if parent can't contain text nodes Dec 13, 2021

vidhu mentioned this issue Dec 14, 2021

fix: trim only when necessary if trimming is off #394

Merged

4 tasks

remarkablemark added the bug Something isn't working label Dec 15, 2021

remarkablemark closed this as completed in #394 Dec 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only trim nodes if parent can't contain text nodes #384

Only trim nodes if parent can't contain text nodes #384

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021 •

edited

Loading

remarkablemark commented Dec 12, 2021

vidhu commented Dec 12, 2021 •

edited

Loading

vidhu commented Dec 14, 2021

remarkablemark commented Dec 15, 2021

remarkablemark commented Dec 18, 2021

Only trim nodes if parent can't contain text nodes #384

Only trim nodes if parent can't contain text nodes #384

Comments

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021

remarkablemark commented Dec 10, 2021

vidhu commented Dec 10, 2021 • edited Loading

remarkablemark commented Dec 12, 2021

vidhu commented Dec 12, 2021 • edited Loading

vidhu commented Dec 14, 2021

remarkablemark commented Dec 15, 2021

remarkablemark commented Dec 18, 2021

vidhu commented Dec 10, 2021 •

edited

Loading

vidhu commented Dec 12, 2021 •

edited

Loading