Remove HTML escaping JSON-encoded widget state #1934
cc: @keller-mark |
Overview of the PRs / commits that touched this line:
With this PR we are reverting #1665, which I guess is not what we want (since it fixed a lot of issues). I don't know where the 2nd and 3rd commits come from (ping @SylvainCorlay and @martinRenou), but they seem to have a different purpose (can you find the PR for this?). |
Ok, google answered that: |
I don't think GHSL-2021-1025 applies anymore after spatialaudio/nbsphinx#611 (which is the improved version of #1665 cc @mgeier) |
Thanks for digging into this and having a look. I understand why avoiding closing a script tag early is desirable, but the current HTML escaping modifies much more than this specific case. Would a more precise method that covers only this case be accepted? As a side note, I think I understand the issue in GHSL-2021-1025, but Jupyter widgets generally expose a mechanism to run any JavaScript on the page (in the global scope, with or without the identified </script> vulnerability). |
Unfortunately the security issue applies more broadly than the nbsphinx case, so the nbsphinx PR does not fix the nbconvert issue.
That's true, but this comment also applies to the following code cell, and we would probably like to escape it:
from IPython.display import Javascript
display(Javascript('alert("this code runs")'))
I think those were the main reasons for introducing the In the long term we could probably think of a way to trust some well-known widget libraries like ipywidgets/ipyleaflet. |
Why is spatialaudio/nbsphinx#611 not sufficient for the widget output? So only addressing GHSL-2021-1025 and the widget output, not things like the title etc. |
Because nbsphinx is not nbconvert, and nbconvert is used in other tools. Unless I misunderstand your point? |
Yes, I mean #1665, but with the improvement of making it case insensitive (like what was done in spatialaudio/nbsphinx#611) |
Does this mean that with |
I think your fix still applies after c90b746 because it should be escaped by I may have read that thread and the code too fast, setting Maybe reintroducing #1665 instead of using |
But when |
Just for clarity, #1665 does not try to solve a security issue, but fixes a bug when a script end tag is included in a string for the JSON. So what #1665 does (or better, the way spatialaudio/nbsphinx#611 does it) should always happen indeed. |
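To make the bug-fix part concrete, here is a minimal sketch of escaping only the script end tag in the JSON text, case-insensitively. The helper name `escape_script_end_tags` is hypothetical and the regex is an assumption about the general idea, not a copy of what #1665 or spatialaudio/nbsphinx#611 actually ship. It relies on `\/` being a valid JSON escape for `/`, so the parsed data is unchanged.

```python
import json
import re

# Hypothetical helper, for illustration only (not nbconvert's or nbsphinx's code).
# Matches "</script" in any letter case and captures the tag name to preserve it.
_SCRIPT_END = re.compile(r"</(script)", re.IGNORECASE)


def escape_script_end_tags(json_text: str) -> str:
    """Rewrite '</script' as '<\\/script' so the HTML parser cannot
    close the surrounding <script> element early."""
    # In the replacement template, '\\\\' in source is a literal backslash
    # and '\\1' re-inserts the matched tag name with its original case.
    return _SCRIPT_END.sub(r"<\\/\1", json_text)


state = {"value": "<b>bold</b> </SCRIPT><script>alert(1)</script>"}
raw = json.dumps(state)
safe = escape_script_end_tags(raw)

# The end tag no longer appears literally, but the data round-trips unchanged.
print(safe)
print(json.loads(safe) == state)
```

Everything else in the JSON, including `<b>` and `&`, is left untouched, which is the difference from a full HTML escape.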
Just for my understanding, if |
Agreed 👍🏽
You could inject any HTML in the widget JSON repr (the same way you can close the script tag). There are other ways than script tags for injecting malicious code in the page (through image sources for example). I'm the one being slow this morning sorry. That code that the PR touches is behind a |
The code should probably be the following:
{%- block footer %}
{% set mimetype = 'application/vnd.jupyter.widget-state+json'%}
{%- if not resources.should_sanitize_html %}
{% if mimetype in nb.metadata.get("widgets",{})%}
<script type="{{ mimetype }}">
{{ nb.metadata.widgets[mimetype] | json_dumps | escape_html_script }}
</script>
{% endif %}
{%- else %}
{% if mimetype in nb.metadata.get("widgets",{})%}
<script type="{{ mimetype }}">
{{ nb.metadata.widgets[mimetype] | json_dumps | escape_html_keep_quotes }}
</script>
{% endif %}
{% endif %}
{{ super() }}
{%- endblock footer -%}
With |
Better, the regex version from here: spatialaudio/nbsphinx#611
But we don't insert whatever is in nb.metadata.widgets[mimetype] verbatim, but we do a For comparison:
Here, we just include the value of the But I think that after a |
I'm not an expert in security, though
|
Good question. So, can we write a water-tight |
I also don't like |
What is this called? I have never seen this, is this some kind of encoding? What should it do? |
I wonder if replacing |
This is a NULL byte, from what I've seen online this notation is only valid (valid, as in it would read |
Thanks for the active discussion. I'm just catching up...
I meant more generally that jupyter widgets provide a mechanism to run third-party JS. The security issue here is concerned with preventing widget authors from injecting malicious scripts into the widget data, but every widget has third-party JS which is loaded from a CDN by For example, if I use a Widget authors control both the widget data (which may be exploited) and widget JS (which needs no exploit to run arbitrary JS). If the widget JS is treated as a trusted source, by extension I don't see why the widget data must be sanitized.
I would be in support of this solution, although given the point above I don't see why this shouldn't just be the default.
I'm not sure how prevalent this edge case is (no pun intended) since IE support officially ended as of June 2022. |
I think this is a water-tight solution. I can't think how this can be circumvented (famous last words?) |
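For reference, one stricter variant of the idea (an assumption on my side, not necessarily the exact approach meant above) is to stop emitting `<`, `>` and `&` at all and write them as JSON `\uXXXX` escapes instead. The function name `json_for_script` is hypothetical. Since these characters can only occur inside JSON strings, the parsed data stays identical while the HTML parser never sees a tag-like sequence.

```python
import json

# Hypothetical helper, for illustration only.
# Map the HTML-significant characters to their JSON unicode escapes.
_JSON_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def json_for_script(obj) -> str:
    """Serialize obj as JSON that contains no literal '<', '>' or '&'."""
    return json.dumps(obj).translate(_JSON_SCRIPT_ESCAPES)


payload = {"html": "</script><!--<script>alert('x')</script> & more"}
embedded = json_for_script(payload)

# No tag-like sequences remain, yet a JSON parser recovers the original data.
print(embedded)
print(json.loads(embedded) == payload)
```

Unlike an HTML escape, this does not rely on entity decoding, which does not happen inside a script element anyway.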
Please apply the same fixes to the share/templates/lab/base.html.j2 file as well. |
Are there any plans to get this merged? I'd love to use v7 and this PR seems to fix the issue. Happy to help getting it merged if there's something I can do. |
Fixes #1900
The widget data in the templates are either the special JSON mime-type or within a <script> tag (JS). Transforming the JSON-encoded widget data with HTML escapes breaks application widget code. The widget author should be responsible for escaping HTML strings if necessary before encoding the data as JSON in the template - not a post-encoding mutation.
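A small sketch of the breakage, using the standard library's `html.escape` as a stand-in for the template filters (an assumption: the actual nbconvert filters may differ in detail). Character references are not decoded inside a script element, so whatever the escape produces is exactly what the widget code receives.

```python
import html
import json

state = {"description": 'Tom & "Jerry" <b>'}
encoded = json.dumps(state)

# Case 1: escaping quotes too. The JSON string delimiters become &quot;,
# so the result is no longer valid JSON at all.
fully_escaped = html.escape(encoded, quote=True)
try:
    json.loads(fully_escaped)
    parses = True
except json.JSONDecodeError:
    parses = False

# Case 2: keeping quotes. The JSON still parses, but the string values
# have been silently mutated ('&' became '&amp;', '<' became '&lt;', ...).
kept_quotes = html.escape(encoded, quote=False)
roundtrip = json.loads(kept_quotes)

print(parses)
print(roundtrip == state)
print(roundtrip["description"])
```

In both cases the widget receives data that differs from what its author encoded.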