-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
provide a way to specify a blocklist of protocols #2047
Comments
What protocols do you want to support additionally than HTTP and HTTPS? Gopher? My fundamental concern with blocklists is that it is very easy to forget something -- or the world changes and introduces something -- that will then cause unexpected results. That's why IMO the Cleaner Safelist implementation has proven largely resilient to XSS over time. Only allowing known safe content. What is the motivating use case here? Are there that many protocols that people are actually using that it is a real chore to add them explicitly, if you know they're safe? Are you confident that in a few years somebody won't add gpt:// as a protocol? Etc. |
I actually totally see your point and usually whitelist is what I use. this time I wasn't able to convince the client otherwise. the client want to be able to use all kind of protocols such as tel, callto, mailto, all kind of ftp etc. the client basically doesn't want to restrict itself to use new protocols in the future -- without the need to adjust the software. So it is rather the opposite, the client would want to use gpt: in the future. The only thing I was able to convince is that javascript needs to be forbidden. Anyway, I see your concerns and IMO it's OK to keep it that way. Maybe another feature could be helpful which might exist already within jsoup and I did just not find it yet. A way to get the protocol from an href normalized in the sense of all kind of variants of (side question. if I use |
Yes, String html = """
<a href=" javascript:alert(1)">1</a>
<a href="javascript:alert(1)">2</a>
<a href="java\tscript:alert(1)">3</a>
""";
Document doc = Jsoup.parse(html, "https://example.com/");
Elements links = doc.select("a");
links.forEach(el -> print(el.text() + ": " + el.absUrl("href"))); Gives:
Note that I'm showing this as an example of using The Safelist isSafeTag and isSafeAttribute methods are overridable, so you could extend it to allow any protocol if you want, or perform other extended assertions.
My suggestion would be to make it admin configurable - just pipe in a list of accepted, known safe protocols.
The Safelist.basic already includes https protocols et al. And yes, links that are protocol relative are supported as long as you pass in the baseUri when parsing (either into the String html = "<a href='//google.com/'>Link</a>";
String clean = Jsoup.clean(html, "https://example.com/", Safelist.basic());
System.out.println(clean); Produces: <a href="https://google.com/" rel="nofollow">Link</a> |
was my suggestion as well. |
That was preventing the valid decode of �� to 💯. This rule must have been in the spec when initially implemented but I can't find a reference to it now. I'm assuming that the range had since been added, but can't immediately identify why it was explicitly excluded originally. Fixes #2047
(The above commit is not related to this issue, and should point to #2074 instead) |
following use case:
I want to restrict the protocol in href of
<a>
. allow everything with the exception of javascript (and all its encoded variants etc.)currently, one can only either allow all or safelist.
The text was updated successfully, but these errors were encountered: