Wrapping untagged content? #4

Open
jgbishop opened this issue Mar 1, 2018 · 1 comment
jgbishop commented Mar 1, 2018

Is there a way I can wrap content that doesn't already happen to be wrapped in an HTML tag? Here's a sample fragment:

<p>A paragraph here.</p>
Naked text (not in an element for some reason).
<p>Another paragraph.</p>

I'd love to be able to wrap the second line in the example above in some tag (<p> in this example). Would a custom post-processor be able to handle this? I'm not sure how your parser handles untagged elements...

Thanks for this fantastic package!

@matthiask
Owner

Hey, thanks for the kind words!

Right now, there's nothing in html-sanitizer that would help you with that. The naked text is the .tail of the <p> tag before it, and it is left alone because it contains more than just whitespace.
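To make the .tail point concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree, which uses the same .text/.tail model. It assumes the parser behind html-sanitizer follows that model, as the .tail wording suggests; the enclosing <div> is a hypothetical wrapper added only so the fragment parses as a single tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical <div> root: ElementTree needs a single top-level element.
root = ET.fromstring(
    "<div><p>A paragraph here.</p>Naked text.<p>Another paragraph.</p></div>"
)

first_p = root[0]
# Text inside the element lives in .text ...
print(repr(first_p.text))
# ... while text that follows the closing tag is stored as that element's .tail.
print(repr(first_p.tail))
# The naked text is not a child node of its own: only the two <p> elements are.
print(len(root))
```

Because the naked text is an attribute of the preceding <p> rather than a node in its own right, there is no element for a per-element hook to act on.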

I'd parse your fragment with beautifulsoup4, loop over the top-level nodes, and wrap each bare text node in a tag. Something like this (code follows because the problem was interesting 😄):

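import bs4

soup = bs4.BeautifulSoup('<p>a</p>b<p>c</p>', 'html.parser')

# Iterate over a copy, since replace_with() mutates soup.contents.
for node in list(soup.contents):
    # Wrap only bare text; skip whitespace-only strings between tags.
    if isinstance(node, bs4.element.NavigableString) and node.strip():
        tag = soup.new_tag('p')
        tag.append(str(node))
        node.replace_with(tag)

print(soup)  # <p>a</p><p>b</p><p>c</p>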
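The same wrapping can also be done directly on the .tail model, without BeautifulSoup. The following is only a sketch using the standard library's xml.etree.ElementTree; wrap_tails and the enclosing <div> are hypothetical names for illustration and are not part of html-sanitizer. It handles tail text of top-level elements only, so text that appears before the first element (the root's .text) is left untouched.

```python
import xml.etree.ElementTree as ET


def wrap_tails(root, tag="p"):
    """Wrap non-whitespace tail text of root's direct children in <tag>."""
    # Iterate over a copy because new siblings are inserted while looping.
    for child in list(root):
        if child.tail and child.tail.strip():
            wrapper = ET.Element(tag)
            wrapper.text = child.tail.strip()
            child.tail = None
            # Place the wrapper right after the element that owned the tail.
            root.insert(list(root).index(child) + 1, wrapper)
    return root


# The fragment from the question, inside a hypothetical <div> root.
root = ET.fromstring(
    "<div><p>A paragraph here.</p>\n"
    "Naked text (not in an element for some reason).\n"
    "<p>Another paragraph.</p></div>"
)
wrap_tails(root)
print(ET.tostring(root, encoding="unicode"))
```

Stripping the tail drops the newlines around the naked text, which is usually what you want for block-level wrappers; keep the original whitespace if it matters for your output.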
