Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Trusted sources for React elements. #3583

Closed
wants to merge 1 commit into from

Conversation

leebyron
Copy link
Contributor

@leebyron leebyron commented Apr 3, 2015

This is a proposal for tightening React's XSS security model as referenced by #3473.

React Elements are simply JS objects of a particular shape. Because React applications are frequently assembling large trees of these elements and including user data, it's possible for a malicious user to insert data that appears like a React Element and therefore render arbitrary and potentially dangerous markup.

Previous versions of React used instanceof, but that limited the ability to use multiple Reacts and tightly coupled JSX to React which we wanted to avoid.

This proposal augments {_isReactElement: true} with {_source: "randomstring"}. Using React.createElement() automatically adds this _source, but JSX could generate regular object bodies with {_source: React.getSourceID()}.

In order to use multiple Reacts, a new API React.trustSource(sourceID) is added. You can imagine using a different React instance in a webworker and using React.trustSource(sourceIDFromCallingWorkersReactGetSourceID).

To preserve back-compat, the default behavior proposed is to dangerously trust all sources, but to warn! If this proposal lands, then I imagine a future version of React will make the default behavior to not trust unknown sources, removing warnings.

@leebyron
Copy link
Contributor Author

leebyron commented Apr 3, 2015

cc @sebmarkbage @graue @yukinying

@leebyron
Copy link
Contributor Author

leebyron commented Apr 3, 2015

There's also an argument to be made for the default behavior being not trusting unknown sources from day 1 of this landing. That's technically a breaking change from v0.13, but actually matches the behavior of using instanceof with v0.12. It would remove all the warnings and just go straight for safety. I'm curious what you all think of that.

@sophiebits
Copy link
Collaborator

Maybe warn in 0.13.2 with opt-in errors, then always error in 0.14? I don't want to keep dangerouslyTrustAllSources long-term.

@leebyron
Copy link
Contributor Author

leebyron commented Apr 3, 2015

To be clear - there are no errors at present (maybe there will be in v0.14 for other reasons). Simply isValidElement returns false, so foreign elements are seen as arbitrary object literals.

dangerouslyTrustAllSources has it's place beyond whether or not we do warnings. It provides a quick fix if you're doing hyjinx and this change screws you over. I agree that long term, it makes far less sense.

This is a proposal for tightening React's XSS security model as referenced by facebook#3473.

React Elements are simply JS objects of a particular shape. Because React applications are frequently assembling large trees of these elements and including user data, it's possible for a malicious user to insert data that appears like a React Element and therefore render arbitrary and potentially dangerous markup.

Previous versions of React used instanceof, but that limited the ability to use multiple Reacts and tightly coupled JSX to React which we wanted to avoid.

This proposal replaces `{_isReactElement: true}` with `{_source: "randomstring"}`. Using React.createElement() automatically adds this _source, but JSX could generate regular object bodies with `{_source: React.getSourceID()}`.

In order to use multiple Reacts, a new API `React.trustSource(sourceID)` is added. You can imagine using a different React instance in a webworker and using `React.trustSource(sourceIDFromCallingWorkersReactGetSourceID)`.

To preserve back-compat, the default behavior proposed is to dangerously trust all sources, but to warn! If this proposal lands, then I imagine a future version of React will make the default behavior to not trust unknown sources, removing warnings.
@leebyron
Copy link
Contributor Author

leebyron commented Apr 4, 2015

The more I think about it, the more I agree with you @spicyj that this makes sense as a 0.13.2 as a partial security fix (introduces the warnings and gives you new tools) and by 0.14 the default is explicit trusting and no more warnings.

@yukinying
Copy link

I like this idea, it is very much similar to how CSP's script-source nonce ("randomstring") and draws a clear boundary between the library generated data and untrusted user input.

@syranide
Copy link
Contributor

syranide commented Apr 6, 2015

It's obviously a question of how far you want to take it, but this only prevents exploitation of React elements. It does not prevent other future sources, say if a users writes a component that takes HTML as children rather than a prop <MyComp>blabla{{__html}}</MyComp>, I would even argue that if it weren't for HTML DOM complexities we too would/should support that syntax rather than dangerouslySetInnerHTML which really seems like a temporary hack to me.

I get that explicitly separating values from elements is rather invasive (it's implicit), but it really only applies to inheriting/markup frontends and mostly codebases that do not follow the React recommendation of "isolating components" so that they can be reused in any context. Explicitly separating renderable elements (values of otherwise) from non-renderable primitives also has a number of additional benefits and most non-HTML frontends would be minimally affected.

I get that it's invasive and a hard sell when HTML is our current frontend, but the problem is introduced by our current implicit behavior and can be definitively solved if we made explicit.

@sebmarkbage
Copy link
Collaborator

@leebyron I think that it is important that compilers can generate ReactElements without coupling to the React instance. It removes the annoying need for JSX to have React in scope and these optimizations can work: #3228

To do this, the transpiler would need to have access to a source ID in its runtime instead of the React instance.

<div className="foo" />

could turn into:

{ type: 'div', props: { className: 'foo' }, sourceID: $runtime.jsxSourceID  }

However, to do this I would have to wire up my transpiler and my React instance somehow. React.trustSource($runtime.jsxSourceID) It wouldn't be seamless out-of-the-box.

It might also require multiple different sourceIDs to be active at any time which would be slower than the ideal by-ref always-true comparison.

Therefore I was thinking about maybe storing some source ID on the global that can be shared by transpilers and React, or if it is available, then React picks it up or something. What do you think?

@sebmarkbage
Copy link
Collaborator

@syranide Your comment is a bit unclear in the context of this PR. I think you're referring to your comment in #3473 that you would prefer primitives to be wrapped. I'll comment there to keep this thread about the merits of the sourceID approach on its own. We can use #3473 to discuss alternative approaches.

@syranide
Copy link
Contributor

syranide commented Apr 7, 2015

@sebmarkbage My bad, after more consideration I don't think my line of thinking is right approach for this immediate XSS issue, it's about going back and re-evaluating and that's an entirely separate discussion. This PR seems like the way to go 👍 (even if we could instanceof it wouldn't be a universal fix, while JS is currently lacking native serialization of classes it's not an intrinsic guarantee).

@brigand
Copy link
Contributor

brigand commented Apr 7, 2015

I think that it is important that compilers can generate ReactElements without coupling to the React instance.

If the goal is to prevent JSON deserializing to react elements, you could use one of the two types not allowed in JSON.

These are also cross realm and indifferent to the number of react versions, including 0.

el = { type: 'div', props: { className: 'foo' }, unsafe: undefined };
invariant(hasOwn.call(el, 'unsafe') && el.unsafe === undefined, '...');

el = { type: 'div', props: { className: 'foo' }, safe: Math.max /* pick your favorite */ };
invariant(hasOwn.call(el, 'safe') && typeof el.safe === 'function', '...';)

JSON.stringify(el) === '{"type":"div","props":{"className":"foo"}}';

@leebyron
Copy link
Contributor Author

leebyron commented Apr 8, 2015

@sebmarkbage #3228 optimization can still work in this case - the inlined object body just needs to include a _source: $somethingGoesHere.sourceID.

I'm not sure I understand why having React in scope vs $runtime in scope is better or worse. They seem isomorphic to me - you need something in scope, so why not React?

@leebyron
Copy link
Contributor Author

leebyron commented Apr 8, 2015

@brigand - we do want JSON serialization. This would be much easier to solve otherwise. JSON serialization gives us the ability to move React components across web workers and between server and client.

@sebmarkbage
Copy link
Collaborator

@leebyron JSX is a language feature so it should be able to execute without any other context. For example, array literals doesn't require the Array constructor to be in scope.

var Array = null;
var arr = []; // still works

Likewise, I shouldn't need to do:

import React from "react";
var element = <div />;

It even looks like it is unused. Even if I did want to use React for something else I should be able to call it what I want without JSX dictating the variable name:

import $R from "react";
import { Component, PropTypes } from "react";

Another neat feature is that React components doesn't even depend on React's base class necessarily.

function Bar(props) {
  return <div />;
}
class Foo {
  render() {
    return <Bar />;
  }
}

This should be perfectly valid. You can even load the modules without loading the dependency on the rest of React. You can run these in isolated unit tests without React and without any mocking. They're also agnostic to multiple instances of React. If you end up with both React v4.2.1 and React v4.2.0 because of some clowny npm versioning.

It also decouples it from React itself so that other VDOM libraries can use these elements and the same JSX transpiler.

Now if a transpiler decides to require React automatically and bring it into scope to use React.createElement, that's fine but that should be an implementation detail of the transpiler's runtime rather than a hard requirement on the developer.

@mindplay-dk
Copy link

@sebmarkbage This would be awesome 👍

@Ciantic
Copy link

Ciantic commented Apr 12, 2015

@sebmarkbage 👍 should someone open a new issue to change transpiler to work like that? It would make so much more sense.

@sebmarkbage
Copy link
Collaborator

We still have some work to make that a reality. In <0.14 a lot of things depend on "owner" which is global state inside React at the time the element is created.

In 0.14 that is still true for refs.

On Apr 12, 2015, at 8:53 AM, Jari Pennanen notifications@github.com wrote:

@sebmarkbage should someone open a new issue to change transpiler to work like that? It would make so much more sense.


Reply to this email directly or view it on GitHub.

@leebyron
Copy link
Contributor Author

JSX is a language feature so it should be able to execute without any other context.

Yes, however the _source key being proposed here is unique to a React-specific transpilation.

I think we might beyond describing semantics of JSX as I imagine whatever we do here would be a transpiler detail and not a user-facing API change for those who use JSX.

What I'm actually unsure about is how the order of operations will work. Putting this "source token" in a global somewhere like $runtime.jsxSourceID means that something else needs to be responsible for initializing that value before the JSX block runs. As currently written, a source ID is set as a side-effect when require('react') is first run, such that React.getSourceID() is guaranteed to return a correct value, because you need to acquire the React reference first. The reason for it being a function rather than an accessor is to protect against malicious or blind mutations of that value, but we can probably find other ways to protect against that.

Perhaps we can propose something like $runtime.jsxSourceID() as a function call, so the first call to this function ensures a value has been set and every other call returns the same, and it's the role of a transpiler to ensure this function exists and is in scope before a JSX expression.

@jimfb
Copy link
Contributor

jimfb commented Apr 13, 2015

@leebyron While JSON serialization is useful, I think @brigand's suggestion is interesting and potentially worth further thought/discussion. His solution doesn't prohibit moving components across a serialization boundary (@brigand's suggestion merely requires that an additional explicit action must be taken to say "Yes, I mean for this to be deserialized as a component", namely that the developer set the unserializable field on the component after deserialization). It also meets Sebastian's desired language-feature rule/syntax without resorting to smelly globals. It has the advantage that even if the attacker has full knowledge of the user's browser state (including the _source variable or random-number-seed) AND the attacker has full control over the serialized json, they still can't mount an attack. It's super simple/elegant, which scores it bonus points, and I don't think it actually prohibits anything that anyone would reasonably want to do.

@leebyron
Copy link
Contributor Author

I'm not sure if I understand the full proposal from @brigand. I was under the impression that JSON serialization isn't just potentially useful - it's our goal. It unlocks the ability for React to operate across Realms, server, webworker, etc. In my opinion safe: Math.max seems fairly arbitrary in comparison to __proto__: React.Element

@jimfb
Copy link
Contributor

jimfb commented Apr 13, 2015

His suggestion, as I understand it, is to use a value which is impossible to describe using JSON (eg. a cyclic reference). Component (de)serialization is still possible, because the deserializer can explicitly declare their intent to receive a component by re-setting the value which was "lost" during the json serialization. So you don't loose the ability to pass components across the serialization boundary (to webworkers or whatever) but you make it impossible to "accidentally" deserialize a component because components require that special value. It means you have to know when you are expecting a component vs. data because you need to set that field if you're expecting a component, but I can't think of a case where you wouldn't/couldn't easily know/determine that.

The only downside is that deserialization of components requires "an extra step", but I don't see that as a bad thing. If you're really expecting to deserialize a component, you pass it into a function that rehydrates the component.

@sebmarkbage
Copy link
Collaborator

My problem with that solution is that is limited to the current JSON limitations. It says nothing about other serialization mechanisms such as postMessage, etc. Most serialization boundaries don't support functions atm but there's nothing permanent about that. If you add a new feature to the serialization boundary, you change the security implications of the whole app.

One way you can illustrate this is using the second (reviver) argument to JSON.parse which lets you return an arbitrary rich object which may contain a function for example, for whatever reason.

One reason we care about serializability, in general, is because we don't want to traverse and copy an entire object hierarchy to make it React-compatible. One of the most powerful features of React is that you can put ReactElements in arbitrary data structures. Not just a single "children" array.

var data = [
  {
    first: [
      new Immutable.Vector([ element1, element2 ]),
      new Immutable.Map().set('foo', element3)
    ],
    second: element4
  },
  element5
];

return <Component data={data} />; // Component does something fancy with this deep data structure

Even if your serializer knew how to properly transfer these data structures, it would also have to know how to properly deserialize a React element for this realm. A lot of serialization is, in practice, not pluggable because of layers of indirection. E.g. a data fetching library.

The alternative is that you have a secondary step that recursively goes through this data structure, after data fetching, and creates local React elements in place of these elements. That requires marshalling of every single node in this potentially large tree. So the ideal scenario would go directly from deserialization into the React component.

Yes, however the _source key being proposed here is unique to a React-specific transpilation.

@leebyron Well we could drop the underscore and make it possible for anyone to use this. Then it is not React specific.

What I'm actually unsure about is how the order of operations will work. Putting this "source token" in a global somewhere like $runtime.jsxSourceID means that something else needs to be responsible for initializing

Well, to properly decouple the transpiler and React, I think they would probably need to communicate through a third-party. Otherwise, you couple React to the transpiler instead. One possible third-party is a global like $JSX_SOURCE_ID. However, to avoid a direct third-party dependency on runtime code, we can introduce the notion that anyone who plays in this ecosystem sets this global variable if it is not already set. That way it doesn't matter which player runs first. Whoever is first gets to set the value and any subsequent player reads that value.

Standard Value DOM

My hope was that we would find a way to "standardize" (at some level, not necessarily ECMAScript) both JSX and the data structure together as a generic "Virtual DOM" (Value Type DOM). We would couple the VElement data structure with <... /> just like the Array data structure is coupled to [...], or the String data structure is coupled to "..." etc.

If we implemented this on top of something like the Value Type Proposal. Then we could potentially do something like typeof element === 'velement' or typeof element === VElement.symbol.

Custom value types can be passed between realms with their identity preserved by sharing a Symbol between realms. That way this at least works between non-serialized multiple realms (like iframes). Presumably, anyone that wants to share custom value types between realms would be in a similar situation where they need to find a way to share some kind of ID.

This is primarily why I like the ID solution because it is essentially a polyfill of the shared symbol solution for realms.

Of course, this "Value DOM" type is not really JSON serializable, unless it becomes defacto standard and therefore part of common JSON protocols or common revivers above JSON. Which sort of brings us back to the situation that we can be shared between realms but not serializable boundaries.

@sebmarkbage sebmarkbage mentioned this pull request May 23, 2015
25 tasks
@sebmarkbage
Copy link
Collaborator

The current thinking around Value Types in the spec is that they will use a single Symbol that is passed between realms. This is a coordination issue that is problematic for many systems like module systems and npm versioning. You also have to coordinate it with build systems like Babel. It is currently an unsolved problem.

Therefore I would like to punt on this and make "dangerouslyTrustAllSources" the default for now, just like in 0.13. We can still have the opt-in solution IF you have the ability to coordinate this in your system.

Another consequence of the Value Types system is that you're expected to only have ONE symbol to describe these types. I think that therefore we can get rid of the multi-source system which should simplify things a bit.

We have also separated ReactDOMClient/ReactDOMServer/ReactIsomorphic so we already have a need to coordinate between these packages.

How about we only expose setTrustedSourceID on each of these packages? This causes that package to turn on "secure mode" and only allow that trusted ID. The type of the ID can be of any type for now, but will expected to become a Symbol in the future.

@sophiebits
Copy link
Collaborator

(Can we call it a sourceToken? I think that sounds a little better.)

@sebmarkbage
Copy link
Collaborator

Eventually went with #4832 instead. Pushes the burden of managing this to the Symbol system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants