-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't include text from sanitised elements? #33
Comments
Hi Thanks! I think that's not possible currently; the only place where a tag is dropped including its contents is here: html-sanitizer/html_sanitizer/sanitizer.py Lines 309 to 317 in e0937eb
Maybe something like this would work:
Or if that doesn't work you could try adding support for dropping tags incl. their contents to the sanitizer? I don't immediately see the value of this change to my use cases, but the addition shouldn't bring too much code with it and as long as the output is well tested this doesn't seem to burdensome. |
Thanks for the response :) I'm trying to be very strict about what I let through as I sanitise user-provided XML (not HTML), it would be useful to me to have this feature. I'm not sure which way I'll go with this just yet, but seeing as you're amenable to having the functionality added, I just might go in that direction. You'll be the first to know if I do. I'm not sure how you like to manage your issue tracker. If you like I'm happy for you to close this (and I can reopen a new issue, or just a pull request, if I do implement something), or perhaps you'd prefer to mark it a feature request? Thanks for your time (and code!) |
aah yes and your suggestion sounds sensible, but my problem now is that I don't actually just want to remove text from |
Don't worry, you can leave it open if you want to -- I'll maybe close it in a few months or years if it hasn't been addressed by then, or not.
Ah yes, that's different. Maybe the element preprocessor could receive the sanitizer instance so that it can inspect the list of allowed tags. The pre- and postprocessors could be made to return the tag if it should be kept and return nothing if the element or tree has been dropped already? Making this backwards compatible would require some thinking. Or maybe there's a better way to do it. |
Hi, thanks for html_sanitizer, I've been using it for a while and it's been doing a great job.
I was wondering if there's a way to have html_sanitizer not include the text from elements that it strips out. For example, given:
I would like to have html-sanitizer return
<p>good text</p>
, but at the moment I get backbad text<p>good text</p>
I looked into doing this with an element_preprocessor, I was thinking I could simply set the text to an empty string if the tag wasn't in the list of allowed tags, but it seems that elements which aren't in the allow list don't make it into the preprocessors, e.g the preprocessor never fires for the element because the sanitizer is stripping that tag before it gets to that stage.
Any suggestions would be appreciated! :)
The text was updated successfully, but these errors were encountered: