Is an HTML strip-tags method a worthwhile stdlib addition? #14356
Comments
This sounds a lot like https://github.com/straight-shoota/sanitize. I published this as a shard for independent development and because of its complexity. I could see this shard (or a similar implementation) becoming part of stdlib if we consider the use case to be very common. It's certainly a very important feature when you're dealing with untrusted HTML content.
Absolutely excellent! I wish I had come across that in my searches.
Yeah, I guess there are quite a lot of different terms for this kind of thing, probably because you can view and solve the problem from different angles. That's not ideal for search. What did you look for, and where? Maybe we can improve discoverability a bit.
I was focused on the keyword "striptags". I looked in the HTML parsing section of awesome-crystal, and I searched GitHub for
Thanks. https://shardbox.org/categories/HTML_XML_Parsing would've brought you there as well.
I had either forgotten about or never heard of shardbox.org, but that makes me realize I would have found it with https://shards.info/search?query=html too.
This example is also similar to strip tags: https://github.com/kostya/lexbor/blob/master/examples/texts.cr
I'm working on a pure-Crystal implementation of an HTML utility function called `striptags`. Would it be a welcome contribution to stdlib, perhaps under `HTML`, which currently provides only `escape`/`unescape`?
The implementation I'm working on does a bit more than just `XML.parse(dirty).inner_text` (à la crinja's `striptags`): I have added the ability to whitelist tags and attributes that may stay in the document, and it filters out the rest.
I was surprised to see a lack of HTML strip-tags routines in the Crystal ecosystem. There are some minimal attempts, in crinja and in a handcrafted Rust aggregator; the latter seemed promising, but eventually I realized it simply wraps a function from a Rust crate.
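A minimal sketch of the allowlist idea, assuming Crystal's stdlib `XML.parse_html` (backed by libxml2) and `HTML.escape`. The names `sanitize_html`, `emit_sanitized`, `ALLOWED` and `DROP_WITH_CONTENT` are hypothetical and illustrative only; this is not the API proposed in the issue, nor the sanitize shard's API. Rather than editing the input string, it re-serializes from the parsed tree: text is re-escaped, allowlisted tags keep only allowlisted attributes, other tags are unwrapped, and `script`/`style` are dropped together with their contents.

```crystal
require "xml"
require "html"

# Hypothetical allowlist: tag name => attribute names that may be kept.
# Illustrative only; not an existing stdlib or shard API.
ALLOWED = {
  "b"      => Set(String).new,
  "i"      => Set(String).new,
  "em"     => Set(String).new,
  "strong" => Set(String).new,
  "abbr"   => Set{"title"},
}

# Tags dropped together with their contents instead of being unwrapped.
DROP_WITH_CONTENT = Set{"script", "style"}

def sanitize_html(dirty : String, allowed = ALLOWED) : String
  return "" if dirty.blank?
  # libxml2's HTML parser recovers from broken markup and may add
  # implied <html>, <body> or <p> wrappers around a fragment.
  doc = XML.parse_html(dirty)
  String.build do |io|
    doc.children.each { |child| emit_sanitized(child, allowed, io) }
  end
end

def emit_sanitized(node : XML::Node, allowed, io : IO) : Nil
  if node.text? || node.cdata?
    # Text is re-escaped, so entities decoded by the parser cannot become markup.
    io << HTML.escape(node.content)
  elsif node.element?
    name = node.name.downcase
    return if DROP_WITH_CONTENT.includes?(name)
    if keep_attrs = allowed[name]?
      io << '<' << name
      node.attributes.each do |attr|
        next unless keep_attrs.includes?(attr.name)
        io << ' ' << attr.name << "=\"" << HTML.escape(attr.content) << '"'
      end
      io << '>'
      node.children.each { |child| emit_sanitized(child, allowed, io) }
      io << "</" << name << '>'
    else
      # Tag not allowlisted: unwrap it, keeping its (sanitized) children.
      node.children.each { |child| emit_sanitized(child, allowed, io) }
    end
  end
  # Comments, DTD nodes, processing instructions, etc. are dropped.
end
```

For example, `sanitize_html(%(Hello <b onclick="steal()">world</b><script>alert(1)</script>))` should keep the text and `<b>world</b>` while dropping the `onclick` handler and the whole script. This is a sketch, not a complete sanitizer: void elements such as `<br>` would be emitted as `<br></br>` if allowlisted, and URL-bearing attributes like `href` would also need scheme checks (e.g. rejecting `javascript:`), which is why none are allowlisted here.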
NEED
This function is important for sites with user-contributed content, because it offers a way to sanitize user markup without regexes while still allowing users to contribute markup. Attempts to sanitize HTML with regexes are notorious for introducing XSS vectors.
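To make the regex concern concrete, here is a hypothetical regex-based stripper (`regex_strip_tags` is an illustrative name, not code from the issue) together with one well-known input it lets through. It is only one failure mode among several.

```crystal
# Naive regex-based tag stripper, shown only as a counterexample.
def regex_strip_tags(html : String) : String
  html.gsub(/<[^>]*>/, "")
end

# Well-formed tags are removed as expected.
puts regex_strip_tags("<b>bold</b> text") # prints: bold text

# An unterminated tag has no closing ">", so the regex never matches it and
# the payload passes through unchanged. Once this string is concatenated
# into a page, a later ">" in the surrounding markup can complete the tag
# in the browser, and the failing image load can fire the onerror handler.
payload = %(<img src=x onerror="alert(1)")
puts regex_strip_tags(payload) == payload # prints: true
```

A parser-based approach avoids this class of bug because it works on the same tree a browser would build, instead of guessing where tags begin and end.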
Comparison / Community Interest Survey
Risks
`str =~ /^\<\w+\>/` example in my searches.
Other options
Cheers!