Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keeping elements with empty content #212

Closed
jakeg opened this issue Dec 28, 2017 · 7 comments
Closed

Keeping elements with empty content #212

jakeg opened this issue Dec 28, 2017 · 7 comments

Comments

@jakeg
Copy link

jakeg commented Dec 28, 2017

Using: turndown@4.0.1 via nodejs

Say you have this in your input:

<script src=""//path/to.js"></script>
<iframe src="//path/to.html"></iframe>

You would expect that turndownService.keep(['iframe', 'script']) would keep these tags in the output, but it doesn't, because the nodes have no content in them. We could add some content, but I really don't want to.

Ok, so we can use blankReplacement to surely fix this:

const turndownService = new TurndownService({
  blankReplacement (content, node) {
    if (['SCRIPT', 'IFRAME'].indexOf(node.nodeName) !== -1) {
      return `\n\n${node.outerHTML}\n\n`
    } else {
      return node.isBlock ? '\n\n' : ''
    }
  }
})

Great, it works! Well, not quite...

<script src=""//path/to.js"></script>
<!-- for whatever reason, the iframe is now nested in another element, which is otherwise empty -->
<div><iframe src="//path/to.html"></iframe></div>

Here we have a <div> around the iframe, but nothing else in that div, so that iframe is left out of the output. I'm guessing when parsing the outer div, it sees it only has an iframe in it, which is empty, so it counts the div as empty, never giving the iframe a chance to appear.

I could continue by changing my blankReplacement function, but I feel this is a bug, or at least something which others will want to have as well, so could be added as an option.

@jakeg
Copy link
Author

jakeg commented Dec 28, 2017

My full hack which seems to work for now:

const turndownService = new TurndownService({
  blankReplacement (content, node) {
    const types = ['SCRIPT', 'IFRAME']
    if (types.indexOf(node.nodeName) !== -1) {
      return `\n\n${node.outerHTML}\n\n`
    } else {
      const output = []
      node.childNodes.forEach((child) => {
        if (types.indexOf(child.nodeName) !== -1) {
          output.push(child.outerHTML)
        }
      })
      if (output.length) {
        return '\n\n' + output.join('\n\n') + '\n\n'
      } else {
        return node.isBlock ? '\n\n' : ''
      }
    }
  }
})

turndownService.keep(['iframe', 'script'])

@domchristie
Copy link
Collaborator

domchristie commented Dec 31, 2017

Thanks for raising this and for including so much detail.

I think might be fixable by modifying the isBlank function. An element is considered blank if:

  • it is not a a, th, td element (these are often purposely left blank)
  • and it's content is empty
  • and it's not a void element
  • and it does no contain any void elements

scripts and iframes are interesting cases because they are not void elements but can still be visible/usable even if their content is blank. (I think the same would apply to audio, video, and object elements?

I suppose a simple fix might be to add those elements to the isBlank check?

@ramanjitsingh
Copy link

@domchristie - I agree that iframe and script nodes should not be considered as BLANK and isBlank function should be updated to bypass them.

@mojoLyon
Copy link

@domchristie What about put those specials elements in the voidElements list. They act like it.

I've tryed to add IFRAME in the isBlank function but it's not suficient to have a correct replacement whereas adding 'iframe' in voidElements is enough to replace the iframe by a content.

In the code below in first case(isBlank) replace the iframe by an empty string and in the second one(voidElements) replace it with 'hey there'

turndownService.addRule('iframes', {
            filter: ['iframe'],
            replacement: (content, node) => {
                return 'hey there'
            },
})

@domchristie
Copy link
Collaborator

What about put those specials elements in the voidElements list. They act like it.

True, although because they are not strictly void elements (by definition), then I feel there should be a separate condition. I seem to be able to keep iframes if the isBlank function is updated to:

function isBlank (node) {
  return (
    ['A', 'TH', 'TD', 'IFRAME'].indexOf(node.nodeName) === -1 &&
    /^\s*$/i.test(node.textContent) &&
    !isVoid(node) &&
    !hasVoid(node)
  )
}

and iframe are "kept":

turndownService.keep('iframe')

@domchristie
Copy link
Collaborator

Fixed by #243

@LukeTOBrien
Copy link

LukeTOBrien commented Oct 29, 2020

Hello there,

Sorry to be a pain but I think this could be opened up again.
What I guess you need is an array in the options so the developer can choose which elements he wants not to be considered isBlank.
The fix added 'IFRAME', 'SCRIPT', 'AUDIO', 'VIDEO' to the array, but for my case I have added a rule for DIV, so I need empty divs to still be converted.
You could have something like options.processBlank = ['A', 'TH', 'TD', 'IFRAME', 'SCRIPT', 'AUDIO', 'VIDEO']; - So then I could append 'DIV' to the array if I wanted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants