Wrapping untagged content? #4

jgbishop · 2018-03-01T19:33:36Z

Is there a way I can wrap content that doesn't already happen to be wrapped in an HTML tag? Here's a sample fragment:

<p>A paragraph here.</p>
Naked text (not in an element for some reason).
<p>Another paragraph.</p>

I'd love to be able to wrap the second line in the example above in some tag (<p> in this example). Would a custom post-processor be able to handle this? I'm not sure how your parser handles untagged elements...

Thanks for this fantastic package!


matthiask · 2018-03-02T08:52:10Z

Hey, thanks for the kind words!

Right now, there's nothing in html-sanitizer which would help you with that. The naked text is the .tail of the <p> tag before it and is left alone since it contains more than only whitespace.

I'd parse your fragment with beautifulsoup4, loop through all top-level nodes, and wrap the bare text nodes. Something like this (code follows because the problem was interesting 😄):

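import bs4

soup = bs4.BeautifulSoup('<p>a</p>b<p>c</p>', 'html.parser')

# Iterate over a copy, since replace_with() modifies soup.contents.
for node in list(soup.contents):
    # Bare top-level text is a NavigableString; skip whitespace-only strings.
    if isinstance(node, bs4.element.NavigableString) and node.strip():
        tag = soup.new_tag('p')
        tag.append(str(node))
        node.replace_with(tag)

print(soup)  # <p>a</p><p>b</p><p>c</p>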
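
To illustrate the .tail behaviour described above, here is a minimal sketch using the standard library's xml.etree.ElementTree, which follows the same element/.text/.tail model. It is not part of html-sanitizer: the helper name wrap_tails and the surrounding <div> are made up for the example, and it only handles text that follows a child element (not leading text in the parent's .text).

```python
import xml.etree.ElementTree as ET


def wrap_tails(parent, tag="p"):
    """Move each child's non-whitespace .tail into a new sibling element.

    Hypothetical helper for illustration only.
    """
    # Iterate over a snapshot because we insert into parent while looping.
    for child in list(parent):
        if child.tail and child.tail.strip():
            wrapper = ET.Element(tag)
            wrapper.text = child.tail.strip()
            child.tail = None
            # Recompute the position, since earlier inserts shift indices.
            index = list(parent).index(child)
            parent.insert(index + 1, wrapper)
    return parent


# A wrapping <div> is needed because ElementTree requires a single root.
root = ET.fromstring(
    "<div><p>A paragraph here.</p>Naked text.<p>Another paragraph.</p></div>"
)

# The naked text is stored as the .tail of the preceding <p>.
tail_before = root[0].tail

wrap_tails(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The same loop should carry over to any tree that exposes .tail in this way, but whether it fits into a sanitizer pipeline depends on where the tree is accessible.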

jsonn mentioned this issue Mar 27, 2018

Allow better form handling and tag merging #6

Merged

matthiask added the idea 🦄 label Feb 19, 2019
