feat: Shortcut in safe_html
Check for signs of html or script, skip further processing if none are found.
Saves processing time for lxml parsing and manipulation.
reinhardt committed Aug 5, 2024
1 parent b8617c0 commit d5c91cf
Showing 2 changed files with 10 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Products/PortalTransforms/transforms/safe_html.py
@@ -4,6 +4,7 @@
from lxml_html_clean import Cleaner
from plone.base.interfaces import IFilterSchema
from plone.base.utils import safe_bytes
from plone.base.utils import safe_text
from plone.registry.interfaces import IRegistry
from Products.PortalTransforms.interfaces import ITransform
from Products.PortalTransforms.libtransforms.utils import bodyfinder
@@ -183,6 +184,14 @@ def cleaner_options(self):
        return options

    def scrub_html(self, orig):
        orig_text = safe_text(orig)
        # short cut if no html or script is detected
        if not orig or not (
            hasScript(orig_text)
            or "<" in orig_text
            or any((entity in orig_text for entity in html5entities.values()))
        ):
            return orig_text
        # append html tag to create a dummy parent for the tree
        html_parser = html.HTMLParser(encoding="utf-8")
        orig = safe_bytes(orig)
1 change: 1 addition & 0 deletions news/66.feature
@@ -0,0 +1 @@
Shortcut in safe_html: Check for signs of html or script, skip further processing if none are found.
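
The early-exit idea can be illustrated outside of Plone with a minimal, standard-library-only sketch. It is not the code from this commit: `needs_scrubbing`, `scrub`, and the two regular expressions are hypothetical stand-ins for `hasScript` and the `html5entities` check, whose definitions are not part of this diff, and `expensive_clean` stands in for the lxml-based cleaning.

```python
import re

# Hypothetical stand-in for hasScript(); the real helper is not shown in this diff.
_SCRIPT_HINT = re.compile(r"(?:java|vb)script\s*:", re.IGNORECASE)
# Hypothetical stand-in for the html5entities lookup: named or numeric entities.
_ENTITY_HINT = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def needs_scrubbing(text: str) -> bool:
    """Return True if the text shows any sign of markup, entities, or script."""
    return bool(
        "<" in text
        or _SCRIPT_HINT.search(text)
        or _ENTITY_HINT.search(text)
    )


def scrub(text: str, expensive_clean) -> str:
    """Run the expensive cleaner only when the cheap checks find something."""
    if not text or not needs_scrubbing(text):
        # Nothing that looks like HTML or script: skip parsing entirely.
        return text
    return expensive_clean(text)
```

The cheap checks are deliberately over-inclusive: a false positive only costs one unnecessary pass through the cleaner, whereas a false negative would let unsanitized markup through.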
