
Consider surfacing sanitization fallbacks to the developer #21

Closed
koto opened this issue Oct 1, 2017 · 12 comments
@koto
Member

koto commented Oct 1, 2017

e.g. when TrustedURL gets replaced with about:invalid.

@xtofian

xtofian commented Dec 1, 2017

What might be useful is to allow registration of callbacks that are invoked when a sink is supplied with a value of a type other than the appropriate TrustedType for the sink (e.g. if .innerHTML is assigned a plain string, or a value of TrustedURL, or something else).

The callback must be a function that returns the correct TrustedType for the sink (if some other type is returned, an exception results). This requirement avoids the need for security reviewers to make special efforts to find implementations of such callbacks: The callback mechanism itself is not privileged. Of course in many cases the callback's implementation will use the unsafelyCreate function for the type; but security reviewers already need to track all instances thereof (and in larger projects will typically rely on a mechanism to ensure such reviews indeed take place, see #31).

Such a callback mechanism would be useful for a number of reasons:

  • It makes it easy to adapt a code-base that already uses framework-specific safe-value types (e.g. Closure's or Angular's SafeHtml). The callback for each type can translate corresponding types into the web-platform type (e.g. Closure SafeHtml into TrustedHTML).

  • The application can install custom sanitizers for types that can be run-time sanitized. For instance, in the context of a typical web application (i.e. one not served in a webview), tel:, sms:, whatsapp:, etc. URLs are safe, and can be run-time sanitized. A callback for the TrustedURL type can perform this run-time validation, and return URLs that pass validation via TrustedURL.unsafelyCreate.

  • It might be worth considering special treatment for the callback for the innerHTML setter: In addition to TrustedHTML, the callback may also return a list of Nodes, which will replace the current element's children. This would allow the callback to invoke an HTML sanitizer that safely produces DOM from untrustworthy HTML markup.

  • The callbacks can be used as hooks to customize logging and error reporting when a value of a type other than the expected TrustedType reaches a sink. This could be especially useful when large existing code bases are to be adapted to TrustedType mediation of the DOM; in this case the callbacks could be used to instrument DOM sink assignments, and collect detailed stack traces (in a test environment) to determine where values of non-trusted-types reach a sink.

    Similarly, the application can customize what should happen if a non-trusted-type is assigned to a sink -- throw an exception, or assign a particular innocuous value.

With the latter point in mind, it might make sense for the platform to have the non-configurable behavior to always throw an exception if a non-trusted-type is assigned to a sink (either directly if no callback is invoked, or returned from the callback if so).
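The callback mechanism described above could be sketched roughly as follows. This is a minimal stand-in, not the platform API: the TrustedHTML class, setFallbackConverter, and assignToHtmlSink are all illustrative names, and the key point being modeled is that the platform re-checks the callback's return type, so the callback itself is not privileged.

```javascript
// Stand-in for the platform type; not the real TrustedHTML.
class TrustedHTML {
  constructor(value) { this.value = value; }
  static escape(s) {
    // Entity-escaping makes any string a safe TrustedHTML member.
    return new TrustedHTML(
      s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;'));
  }
}

let fallback = null;
function setFallbackConverter(fn) { fallback = fn; }

// Models assignment to a TrustedHTML sink such as .innerHTML.
function assignToHtmlSink(value) {
  if (!(value instanceof TrustedHTML)) {
    if (!fallback) throw new TypeError('sink requires TrustedHTML');
    value = fallback(value);
    // The platform re-checks the callback's return value, so the
    // callback mechanism itself is not privileged.
    if (!(value instanceof TrustedHTML)) {
      throw new TypeError('fallback must return TrustedHTML');
    }
  }
  return value.value; // what the sink would actually consume
}

setFallbackConverter(s => TrustedHTML.escape(String(s)));
assignToHtmlSink('<b>hi</b>'); // escaped by the fallback rather than rejected
```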

@koto
Member Author

koto commented Dec 1, 2017

The polyfill allows us to enforce any behavior in the sinks, including a configurable behavior for throwing, logging and sanitizing values. However it might be problematic to have that much freedom in the native version, as the simplicity of the native TT lies in them being based on the type system (we disable the string-based versions of the sinks). Having a custom implicit string-to-TT conversion complicates that a bit.

The approach you're suggesting seems to encourage a design where the strings (or application-specific wrappers over them) continue to be used throughout the application, and the conversion would be done at the sinks themselves. I understand why this might be useful for existing type implementations (e.g. SafeTypes or Angular types), but I think it would be beneficial for those types to wrap over TT instead of strings.

That said, I feel there is a need for allowing custom, but explicit string-to-TT conversion, in the form e.g. installing sanitizers, URL filters and such. Those user-defined sanitizers could do the conversions you need, e.g. there might be a sanitizer accepting a SafeHtml instance and unwrapping it, or a sanitizer that actually HTML sanitizes a string.

To solve the custom logging issue in the native version, can we have an exception handler per-type, @mikewest? Something like:

TrustedHTML.addEventListener('error', e => { console.log(e.data /* the value causing the error */); });

@xtofian

xtofian commented Dec 6, 2017

The approach you're proposing (no implicit sanitization/conversion at sinks; all sanitization has to be explicitly invoked) is certainly more principled.

However, it's important to note that (unless I'm missing something), there is no security difference between that and implicit sanitization at the sink. My reasoning is this:

There are two basic ways to establish that a value is a member of a given trusted type:

  1. By run-time predicate. E.g.,
    • verifying that a URL in string form has an http or https scheme is sufficient to establish that it's a member of the TrustedURL type (and safe to use as a hyperlink URL, e.g. in an <a href> context)
    • Escaping <, >, &, ', " with corresponding HTML entities is sufficient to make any arbitrary string a member of the TrustedHTML type.
  2. By construction or provenance. E.g., in the Closure types, we use the fact that a URL derives from entirely application-controlled data as a reasonable proxy for the notion that it points to a trustworthy source (i.e. is a TrustedScriptURL). Or, HTML markup composed from a properly-formatted HTML tag with appropriately typed attributes, and content obtained from a TrustedHTML, yields another member of TrustedHTML.

(Aside: Most types have some members that can be established either way. For some types, there's no general way to establish membership at runtime, i.e. membership can only be established based on provenance. TrustedScriptURL is an example: There is no general predicate that can decide whether a given URL points to something trustworthy (one can, however, construct application-specific predicates, e.g. by checking that a URL points into a directory on a particular host that is known to serve trustworthy static assets).)

Obviously, in case (2), the code needs to pass the value in the form of the type -- the type serves to attach provenance to the value.

However, for values that can be turned into a trusted type based on a runtime predicate/transformation, it really doesn't matter from a security perspective whether that happens near the source or the sink, or somewhere in between: TrustedURL.sanitize makes a TrustedURL for any arbitrary input. Calling it just before assignment to HTMLAnchorElement#href (explicitly or implicitly) is just as safe as calling it somewhere near the beginning of the flow of that value.
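The run-time predicates from case (1) can be sketched concretely. These helpers are illustrative only, not platform APIs; the scheme check and the escaped character set follow the examples given above.

```javascript
// (1a) A URL string with an http(s) scheme is safe to use as a
// hyperlink URL, regardless of where the string came from.
function isSafeHyperlinkUrl(url) {
  return /^https?:/i.test(url);
}

// (1b) Escaping <, >, &, ', " with HTML entities turns any arbitrary
// string into markup that is safe to treat as TrustedHTML.
function htmlEscape(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/'/g, '&#39;')
          .replace(/"/g, '&quot;');
}
```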

There is a difference from a functional correctness perspective: Sanitization can fail. E.g. TrustedURL.sanitize would reject custom-scheme URLs (say, whatsapp://), even though a human reviewer could establish that they're safe. If such sanitization happens implicitly near the sink, this results in a functional bug. It'll be fairly apparent if there is test coverage (the link won't work). However, it could still be somewhat tedious to figure out where the value comes from and was originally constructed, which is the place where the code is best refactored to use a custom builder that supports this scheme.

This is one reason why implicit near-sink sanitization should be optional.

Note that the implicit-conversion hook as proposed in #21 (comment) does not enlarge the TCB: the platform requires the callback/hook to return a value of the correct type for the destination context. This means that the callback can use public, inherently-safe builder APIs for the types, e.g. TrustedURL.sanitize. Some hooks might need to use .unsafelyCreate, which does add to the TCB, but that's because of the use of the unchecked conversion, not because it's a sanitization callback. IOW, the callbacks are not security-sensitive code (in the sense that a mistake therein could cause a vulnerability), unless they use unsafelyCreate.

@xtofian

xtofian commented Dec 6, 2017

Based on experience with the SafeHtml types, it's my impression that allowing plain strings to flow to sinks (DOM API wrappers, or contextually escaping/sanitizing template systems) and "just work" in most cases was fairly important for making this approach practical. The safe/trusted-types builders are a bit more cumbersome than simple string concatenation/formatting, and not requiring developers to deal with that except in rare, unusual scenarios (e.g., URLs with custom schemes, HTML markup passed from a backend to a frontend) seems to be helpful.

Intuitively, developers are used to shoving strings into sinks, and it seems desirable to preserve that where it doesn't conflict with security goals.

FWIW, in our code base, there are approx 10 times as many calls to goog.dom.safe.setAnchorHref or setLocationHref (which perform run-time sanitization at the sink) as there are calls to SafeUrl.sanitize.

@koto
Member Author

koto commented Dec 6, 2017

I see the point. So, in general, you propose something akin to:

let aTrustedHTML = TrustedHTML.escape('foo');
TrustedHTML.setFallbackConverter(html => {
  console.log('sanitizing');
  return TrustedHTML.unsafelyCreate(customSanitizer.sanitize(html));
});
TrustedHTML.setFallbackConverter(_ => {}); // throws; there can be only one converter
element.innerHTML = aTrustedHTML; // fallback not called
element.innerHTML = '<foo>';      // fallback called, logs 'sanitizing'

This looks interesting, and might be a boon for adoption, especially in relation to mikewest/tc39-proposal-literals#2 - in the fallback we could check for literalness, and that would remove the need for rewriting all sinks. We'd have to make sure that:

  • Only one fallback converter per type can be used
  • It's possible to lock down this configuration setting as well (Mechanism to constrain usages of unsafelyCreate #31), such that one can reasonably audit such a function (overwriting it would create lots of functional bugs only caught in integration tests).

I'll let @mikewest comment on the actual implementation difficulty. In a polyfill, it's trivial.

@xtofian

xtofian commented Dec 6, 2017

Yes, that's what I had in mind. Perhaps hyperlink URLs are an even better example: For HTML, you could argue that HTML-sanitizing on the fly is not a clean approach in the long run (but might be expedient when adapting existing code). And if one is assigning plain text, one should just use .textContent.

With URLs, we'd do something like,

TrustedURL.setFallbackConverter(url => {
  console.log('sanitizing: ' + url);
  return TrustedURL.sanitize(url);
});

There are two plausible ways of handling values that aren't accepted by the sanitizer (e.g. foo-app://xyz): Assign an innocuous value ('about:invalid#failed-to-sanitize&url=' + encodeURIComponent(url)), or throw an exception. The former makes sense in production code, since it defangs an actual attack without totally breaking the app. The latter is preferable during testing, since it makes functional bugs more apparent.

In the Closure libraries we originally went with the former, but have then added debug-only asserts (i.e. it throws an exception in debug builds, but assigns innocuous values in prod builds).
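The debug-only-assert behavior described above might look roughly like this. The names, the DEBUG flag, and the scheme allowlist are illustrative; only the shape of the pattern (throw in debug, substitute an innocuous value in production) comes from the discussion.

```javascript
const DEBUG = false; // flipped to true in debug builds

const INNOCUOUS_URL = 'about:invalid#failed-to-sanitize';

function sanitizeUrl(url) {
  // Illustrative allowlist; a real sanitizer's scheme set would differ.
  if (/^(https?|mailto|tel):/i.test(url)) {
    return url;
  }
  if (DEBUG) {
    // Debug-only assert: surfaces the functional bug during testing.
    throw new TypeError('Could not sanitize URL: ' + url);
  }
  // Production: defang a potential attack without breaking the app,
  // keeping the rejected value around for diagnosis.
  return INNOCUOUS_URL + '&url=' + encodeURIComponent(url);
}
```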

@xtofian

xtofian commented Dec 6, 2017

I like the idea of supporting fallback to literals here. In Closure, we do this at the static checking step (the js-conformance rules allow foo.innerHTML = '<em>Hello</em>', i.e. don't require this to be re-written using the type-safe goog.dom.safe.setInnerHtml wrapper).

It's pretty common to see code like that (or more common, .innerHTML = ''). Of course those could be rewritten, but my hunch is that it's quite helpful to minimize how much developers have to deviate from what they're used to.

@xtofian

xtofian commented Dec 6, 2017

Good point re the need to lock down configuration of fallbacks. This is not a security concern (assuming the platform enforces that the correct type comes out of the fallback). However, you're right that it would be a functional-correctness nightmare if every library assumed it could install its own fallbacks.

If we go with the capability/singleton approach in #33 (comment), one way of achieving the lockdown is to expose the .setFallback functions on that capability object instead of globally. (in that case that capability object wouldn't be TrustedTypesFactory anymore, but rather something like TrustedTypesPrivilegedOperationsCapability)
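A minimal sketch of that capability-scoped lockdown, with entirely hypothetical names (the real shape would depend on how the #33 singleton proposal lands): the .setFallback functions live on a capability object handed out once at bootstrap, so only code holding it can configure fallbacks, and the app can revoke it afterwards.

```javascript
// Hypothetical capability object; not a proposed API surface.
function createPrivilegedCapability() {
  const fallbacks = new Map(); // type name -> converter function
  let revoked = false;
  return {
    setFallback(typeName, fn) {
      if (revoked) throw new Error('capability revoked');
      if (fallbacks.has(typeName)) {
        // Only one fallback per type, so libraries can't clobber
        // each other's configuration.
        throw new Error('fallback already installed for ' + typeName);
      }
      fallbacks.set(typeName, fn);
    },
    getFallback: (typeName) => fallbacks.get(typeName),
    revoke() { revoked = true; }, // app locks down after bootstrap
  };
}

const cap = createPrivilegedCapability();
cap.setFallback('TrustedURL', (url) => 'about:invalid');
cap.revoke(); // further configuration attempts now throw
```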

@mikesamuel
Collaborator

Does the below summarize the non-interface ideas above?

Scenarios

  • Library code (e.g. sanitizer) doesn't like its input, and uses a fallback return value
  • IDL endpoint gets incompatible RHS

Use cases

  • Make root causes apparent to developer
  • Allow servers to collect telemetry from clients a la report-uri
  • Allow client interface to degrade gracefully
  • Allow an app to run in report-only mode by intentionally substituting unsafe value as the fallback value

@koto koto added the spec label Jun 4, 2018
@koto
Member Author

koto commented Jul 4, 2018

Merging this with #63. It seems like having a platform-provided fallback mechanism would really ease the adoption. This could be implemented in several ways:

  • Allow for a limited number of statically defined "string-to-type" upgrade mechanisms, like url-allow-http
  • Allow for defining a programmatic fallback policy that gets automatically called when native sink is used with a string. While this can be done from userland code (by hooking into setters), there's probably a non-trivial performance cost of doing that.
  • Allow for error recovery from outside the policies (e.g. dispatch an event?). As long as the event object has enough context information (e.g. the property name, the value that was violating the policies), the error handling mechanism could be used to call the setter again with the upgraded, sanitized value. preventDefault could be used to stop the eventual CSP violation from happening.

Is there any preference to implementing one of the above variants in the browser? cc @mikewest
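The userland setter-hooking approach from the second bullet could be sketched like this. It is illustrative only: a real polyfill would have to wrap every sink property on every relevant prototype, which is where the non-trivial performance cost comes from.

```javascript
// Wraps a native accessor so plain strings are upgraded through a
// fallback policy before reaching the underlying setter.
function hookSetter(proto, prop, upgrade) {
  const desc = Object.getOwnPropertyDescriptor(proto, prop);
  Object.defineProperty(proto, prop, {
    set(value) {
      if (typeof value === 'string') value = upgrade(value);
      desc.set.call(this, value);
    },
    get: desc.get,
    configurable: true,
  });
}

// In a browser this would target a real sink, e.g. (hypothetically):
// hookSetter(Element.prototype, 'innerHTML', s => myPolicy.createHTML(s));
```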

@koto koto closed this as completed Jul 4, 2018
@koto
Member Author

koto commented Jul 4, 2018

Perhaps the SecurityPolicyViolationEvent could hold enough information for applications to implement the fallback policy when handling it? It's likely the event will be cancellable (i.e. applications could recover from it, and not cause violation reports to be sent).

Content-Security-Policy: trusted-types policyFoo policyBar fallback-allow-http

TrustedTypes.createPolicy('fallback-allow-http', (p) => {
  p.expose = true;
  p.createURL = (s) => {
    if (!isHttpUrl(s)) throw new TypeError('Nope');
    return s;
  };
});

// programmatic fallback
document.addEventListener('securitypolicyviolation', (ev) => {
  if (ev.violatedDirective == 'trusted-types') {
    console.log(ev.target, ev.propertyName); // e.g. <a>, 'href' - propertyName would have to be added to the event.
    if (ev.requestedType == 'TrustedURL') {
      // That property doesn't exist now in the event.
      // It could be inferred from ev.target & ev.propertyName.
      try {
        const url = TrustedTypes.getPolicy('fallback-allow-http').createURL(ev.blockedURI);
        ev.target[ev.propertyName] = url;
        ev.preventDefault();
      } catch (e) {}
    }
  }
});

@koto
Member Author

koto commented Jul 4, 2018

This might be hard, as the event object as currently specced purposefully doesn't include a lot of information, e.g.:

  • sample is trimmed
  • sample is present iff the violation is caused by inline script/style,
  • URIs don't have fragments

It's mostly meant to contain data that might be exported off-domain to 3rd party CSP log analyzers, whereas in TT case it's meant to help us recover in-document. It's also harder to polyfill an event-based solution, as the browsers limit what properties an event object might have.
