Add ability to clean XML Document / preserve case in cleaned HTML #1930
Comments
Right, the Cleaner is currently designed to take HTML body content and clean that. I had been thinking of adding support for cleaning a complete Document (rather than a body fragment); that path would then also support XML Documents. Another feature, and probably a better one for your case, would be to enable case-insensitive attribute checks while outputting case-preserving HTML. You can almost do that now: the Cleaner compares tags by their normalized names, but does not do the same for attributes. So currently, through the Cleaner, tag case can be preserved, but not attribute case.
For just parsing (not the Cleaner, as noted above), you can preserve tag and attribute case and still use the HTML parser. E.g.:
    Document doc = Jsoup.parse(
        "<SVG viewBox=123 />",
        Parser.htmlParser()
            .settings(ParseSettings.preserveCase)
    );
    System.out.println(doc.html());
Gives:
    <SVG viewBox="123" />
Another nice-to-have may be to automatically preserve case in SVG elements when in HTML.
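The "case-insensitive check, case-preserving output" idea above can be illustrated without jsoup at all. This is a minimal, hypothetical sketch and not how the Cleaner is implemented: `CasePreservingFilter` and `keepAllowed` are made-up names, and attributes are modelled as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CasePreservingFilter {
    /**
     * Keeps the attributes whose lower-cased name is in the allow-list.
     * The comparison ignores case, but the original key is what gets kept.
     */
    public static Map<String, String> keepAllowed(Map<String, String> attrs,
                                                  Set<String> allowedLowerCase) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String normalized = e.getKey().toLowerCase(Locale.ROOT);
            if (allowedLowerCase.contains(normalized)) {
                kept.put(e.getKey(), e.getValue()); // original case survives
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("viewBox", "0 0 10 10");
        attrs.put("onLoad", "x()");
        // The allow-list is written in lower case; input case does not matter.
        System.out.println(keepAllowed(attrs, Set.of("viewbox")));
        // prints {viewBox=0 0 10 10}
    }
}
```

The point of the design is that normalization is used only for the lookup, never for the stored key, so `viewBox` passes a lower-case allow-list and is still emitted with its original case.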
Hello,
I'm trying to sanitize some SVG content and am using Jsoup for that specific case.
It is possible to get an XML Document by using the xmlParser as below:
Document document = Jsoup.parse(svg, Parser.xmlParser());
However, there is then no way to clean this XML with a whitelist (Safelist).
The Cleaner handles the content as if it were HTML.
Is there a way of doing this? I would expect it to work when XML parsing is enabled.
What I need here is to preserve the case of attributes and tags, which is only possible when using XML parsing.
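Until the Cleaner supports XML, one possible workaround is to walk the parsed document and prune it by hand. The following is a minimal sketch, assuming jsoup is on the classpath; `XmlAllowList`, `sanitize`, and the allow-lists are hypothetical names chosen for illustration. It matches tag and attribute names case-sensitively and does not validate attribute values (such as URLs), which a real sanitizer would also need to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class XmlAllowList {
    /** Removes every element and attribute whose exact name is not allowed. */
    public static String sanitize(String xml, Set<String> allowedTags,
                                  Set<String> allowedAttrs) {
        // The XML parser keeps tag and attribute case as written.
        Document doc = Jsoup.parse(xml, Parser.xmlParser());

        // select("*") returns a separate list, so removing nodes while
        // iterating over it does not disturb the iteration.
        for (Element el : doc.select("*")) {
            if (el instanceof Document) continue; // skip the root itself
            if (!allowedTags.contains(el.tagName())) {
                el.remove(); // drops the element together with its children
                continue;
            }
            // Collect the keys first, then remove, to avoid modifying the
            // attribute collection while iterating over it.
            List<String> toDrop = new ArrayList<>();
            for (Attribute attr : el.attributes()) {
                if (!allowedAttrs.contains(attr.getKey())) {
                    toDrop.add(attr.getKey());
                }
            }
            for (String key : toDrop) {
                el.attributes().remove(key);
            }
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String svg = "<svg viewBox=\"0 0 10 10\" onload=\"x()\">"
                + "<script>alert(1)</script><linearGradient id=\"g\"/></svg>";
        System.out.println(sanitize(svg,
                Set.of("svg", "linearGradient"),
                Set.of("viewBox", "id")));
    }
}
```

In this sketch a disallowed element is removed along with everything inside it, which is stricter than the Cleaner's usual behaviour of keeping the text of dropped tags.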