redact location.ancestorOrigins according to Referrer Policy #1918
Comments
One big question, which I asked in the PR, is what does "redact" mean. Since it's an origin instead of a URL, several of the referrer policies don't really apply (e.g. maybe they're no-ops). If it gets censored completely (e.g. if the referrer policy is "no-referrer"), then does the resulting array contain null? The empty string? Or is that entry just missing, so that the number of entries in the array is less than the number of ancestor browsing contexts? We'll need a comprehensive spec for (origin, referrer policy) -> censored origin. Otherwise, I think we'd need to get a sense of what other user agents besides Firefox would be interested in this spec change. I guess only Chrome implements both referrer policy and ancestorOrigins, so... @mikewest, perhaps? As for WebKit and Edge, which don't implement referrer policy but do implement ancestorOrigins: does this sound reasonable to you, as something you would do if/when you eventually implemented referrer policy? Leaving aside any commitments to implementing referrer policy. Tagging the usual suspects... @cdumez @travisleithead. Please route to more appropriate people as necessary. |
The idea is that if the referrer policy allows the origin to leak out via the referrer (which I believe all policies except "no-referrer" do) then we should just go ahead and return the origin in ancestorOrigins. So this is really about the "no-referrer" case, plus any browser configuration that has equivalent effects. As for what value should be used in the "no-referrer" case, I don't have a strong opinion. Obvious options are |
I should write some test cases, but isn't the null case already possible today with GUID URL schemes? (data:, file:, etc.) And implicitly handled, as with CORS, by serializing to the string literal "null" according to RFC6454? |
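For illustration, the serialization behavior this comment refers to can be observed with the standard `URL` API: a `data:` URL has an opaque origin, which serializes to the literal string "null". This is only a minimal sketch. `serializedOrigin` is a hypothetical helper name, and whether `ancestorOrigins` should reuse this serialization for redacted entries is exactly the question the thread leaves open.

```typescript
// Hypothetical helper (not part of any spec): returns the serialized
// origin of a URL string using the standard URL API.
function serializedOrigin(url: string): string {
  return new URL(url).origin;
}

// A data: URL has an opaque origin, which serializes to the string "null".
const opaque = serializedOrigin("data:text/plain,hello");

// An https: URL has a tuple origin (scheme, host, port); path and query are dropped.
const tuple = serializedOrigin("https://example.com/a/b?c=d");
```

Note that a redacted entry serialized this way would be indistinguishable from a genuinely opaque ancestor origin, which may or may not be desirable.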
Well, this could be defined as basically a switch on the referrer policy states (which might be the most logical internal implementation choice), but I thought that calling out to the algorithm to produce a referrer and then extracting the origin via URL parsing would be more future compatible with new policy states that might be defined. I can revisit if that seems preferable. |
IMO a switch makes the most sense, but adding it to the Referrer Policy spec would be best, since that ensures that whenever they add new policies they'll see that they need to update that algorithm as well. |
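A rough sketch of the "switch on the referrer policy" idea follows. It is not the specified algorithm: it encodes only the simplification stated earlier in the thread (every policy except "no-referrer" is treated as letting the origin through), it does not model policies whose result depends on the destination (such as "same-origin"), and the `REDACTED` placeholder value is an assumption, since the thread has not settled what a censored entry should look like. The names `redactAncestorOrigin` and `REDACTED` are hypothetical.

```typescript
// The named referrer policy values; the empty-string policy is omitted for brevity.
type ReferrerPolicy =
  | "no-referrer"
  | "no-referrer-when-downgrade"
  | "origin"
  | "origin-when-cross-origin"
  | "same-origin"
  | "strict-origin"
  | "strict-origin-when-cross-origin"
  | "unsafe-url";

// ASSUMPTION: the placeholder for a censored entry is undecided in the thread;
// the literal string "null" is used here only as a stand-in.
const REDACTED = "null";

// Hypothetical mapping (origin, referrer policy) -> censored origin.
function redactAncestorOrigin(origin: string, policy: ReferrerPolicy): string {
  switch (policy) {
    case "no-referrer":
      return REDACTED;
    case "no-referrer-when-downgrade":
    case "origin":
    case "origin-when-cross-origin":
    case "same-origin":
    case "strict-origin":
    case "strict-origin-when-cross-origin":
    case "unsafe-url":
      // Simplification taken from the discussion: the origin is passed through.
      return origin;
    default: {
      // Compile-time exhaustiveness check: adding a new policy to the
      // ReferrerPolicy type makes this assignment fail to type-check.
      const unhandled: never = policy;
      throw new Error(`Unhandled referrer policy: ${String(unhandled)}`);
    }
  }
}
```

The `never` check in the default branch mirrors the maintenance argument made above: when a new policy is added, the mapping has to be revisited before the code compiles again.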
The referrer may or may not be related to the origin in general (e.g. for a sandboxed iframe the referrer is based on its URL but the origin a unique origin). So going via some sort of "extract the referrer" algorithm to get a value to use in |
Take a look at: w3c/webappsec-referrer-policy#77 ? |
One thing that I'd like to check on, actually. What should happen if a page at origin A loads a subframe from origin A which then loads a page from origin B, if the original page is sending full referrers but the subframe is using the |
I haven't spec'd it as a barrier or ratchet, but an individual query from a On Tue, Oct 18, 2016 at 6:00 PM Boris Zbarsky notifications@github.com
OK, but that will leak the origin of the topmost page in this case, when it should be able to have a reasonable expectation of no such leakage occurring, right? |
Is that a reasonable expectation? Or should it set its own policy if it is |
As long as it's only loading things it controls, I think it is, yes. This way the decision as to whether to allow the origin to escape only has to be made in the page that actually loads cross-site things.
I'm not sure what you mean by "ratchet" here, but two simple things to specify would be that once you hit |
Note the more clearly articulated proposal I made for this in w3c/webappsec-referrer-policy#77 (comment). I thought @hillbrad was going to convert that to an HTML spec issue, but that didn't seem to happen... Anyway, I would love feedback from Blink and WebKit on whether the change I propose is something they would implement, and feedback from Edge on whether they're interested in implementing this at all, and if so under what conditions. |
Copying @RByers, @cdumez, @travisleithead to get input from Blink, WebKit, and Edge. Would be nice to make some progress here. |
For Blink, perhaps @dominiccooney or @mikewest could comment? |
@jeisinger and @estark37 are Blink's referrer policy folks, and will likely have opinions. |
What I like about @bzbarsky's proposal is that it only indirectly uses referrer policy - referrer policy ideally should only affect the referrer. Of course using the referrer afterwards for whatever is fine. I think we'd implement this if that means that Firefox will ship ancestorOrigins, and the API is still good enough to achieve the kind of protection @hillbrad et al need |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.
For what it's worth, the idea of respecting a referrer policy set by the domains in the ancestry chain is great, but neither ancestorOrigins nor the requested change goes far enough in either direction. A full URL should be available in ancestorOrigins, because a domain on its own is no more or less secure: information about a person can be grokked from the domain plus some number of other data points, so truncating the URL doesn't make much sense for user privacy if we're being strict here. Conversely, a domain (cnn.com) may be considered OK, but a page on that domain (cnn.com/vegas-shooting-kills-dozens-etc) may be considered not OK in a specific context.

On the other hand, the user has not indicated, and cannot indicate, via a referrer policy set by the middlemen that they don't want to leak information about the ancestor chain, which raises the question: should there be user-level controls for turning this information flow on or off? In my opinion, this requires a multi-part solution in which the user can turn the behavior off, as can the sites (content providers) that manage relationships with one another, but the location.href chain should be opened up fully where no restrictions are explicitly called for.

The primary case FOR doing this, from a supply chain perspective, is being assured that the message and markup you're delivering are not being framed in an inappropriate context. Advertisers, for instance, may have strict policies against placing their brand next to content related to pornography or extreme violence. This information, when locked away behind cross-origin chains of iframes, becomes unknowable. On the other hand, if a user jumps into "in private" mode and disables this information from leaking to chains of iframes, a disabled chain of unknowable origins should be enough of an indicator for an advertiser that maybe the risk isn't worth the buy opportunity, and the end user's experience and privacy are preserved. |
The current WebKit implementation is helpful to ad tech because it helps determine the validity of the embed. An advertisement can be chained from the original site through multiple intermediary iframes before the bottom-level ad content finally renders; this is normal when an ad request passes through multiple ad networks before arriving at a served ad. What ad tech wants to detect is when an ad is being served on an unwanted domain, or when something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud. |
Sure, and ad tech could just treat "no available ancestorOrigins" as "bad actor" for its purposes. Then sites can decide whether they want to leak their origin to their subframes (and allow ad tech in there) or not, right? |
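The policy suggested here, combined with the allow-list check described in the previous comment, could be sketched roughly as below. Everything in it is an assumption made for illustration: the names `classifyEmbed` and `REDACTED_ORIGIN`, the choice of "null" as the marker for a censored entry (the thread has not settled on one), and the decision to treat an empty list as suspect. The function takes a plain array so that it can run outside a browser.

```typescript
type EmbedVerdict = "ok" | "suspect";

// ASSUMPTION: censored entries appear as the string "null"; the thread
// leaves the actual representation undecided.
const REDACTED_ORIGIN = "null";

// Hypothetical check an ad script might run against its ancestor chain.
function classifyEmbed(
  ancestorOrigins: readonly string[],
  allowedOrigins: ReadonlySet<string>
): EmbedVerdict {
  // "No available ancestorOrigins" is treated as a bad sign (assumption:
  // an ad is never expected to be the top-level document).
  if (ancestorOrigins.length === 0) {
    return "suspect";
  }
  // Any redacted or unexpected ancestor makes the whole chain suspect.
  const hasBadAncestor = ancestorOrigins.some(
    (origin) => origin === REDACTED_ORIGIN || !allowedOrigins.has(origin)
  );
  return hasBadAncestor ? "suspect" : "ok";
}
```

In a browser that exposes the attribute, the input would be something like `Array.from(location.ancestorOrigins)`; a site that redacts its origin would then simply be declining ads that apply this kind of check.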
I'm a little confused by the attitude that a parent frame should remain anonymous to its subframes. If a site is being embedded by another site, doesn't it deserve to know by whom? In what legitimate scenario does a site embed an iframe (or a chain of iframes) and need to be anonymous? |
As a reminder, there's a HTML PR for this at #2480 and a WPT PR at web-platform-tests/wpt#5402. @othermaciej @johnwilander I suspect Safari picking this up would make it more likely for Firefox to ship this too (it currently does not expose this attribute at all). |
Imo, no. If it doesn't want to be framed, it has ways to avoid being framed, yes? My usual go-to example here is that imo a site should be able to embed a video from a video hosting site without exposing information about itself to a video hosting site. Under the assumption that the video hosting site allows such framing, of course. |
A person who visits a political or health blog doesn't want these URLs to be shared with giphy, facebook, and every adtech company on the planet. While it's understandable that adtech companies want to know my political views and if I have cancer or not (and as a side effect, can prevent ad fraud more easily), as a user I'd like to have a choice if my browser sends this very personal information. Embed providers are not entitled to it. They should be able to choose who can embed them (possible with frame-ancestors), and users should be able to choose whom they want to share information with. |
My counterpoint is that blocking by default will effectively hide the majority of ancestor data from ad tech, because you can't expect developers to go out of their way to add or enable allow-policies. From the ad tech point of view, if you can't reliably see the ancestors, you can't reliably detect fraud. With regard to your privacy concerns: 1) not all ad tech companies are interested in invading your privacy (although, sure, most probably are); 2) if that's something you're worried about, ad blocking is fairly effective; and 3) if the sites you're visiting are of a sensitive nature, embed advertisements, and you're concerned about your privacy, perhaps you should be evaluating those sites and their choice of ad partners. I am building an ad tech company that is not interested in tracking individual users, and I need tools to detect, prevent, and deter ad fraud. |
I have worked in adtech myself, on several sides of the ecosystem, and adtech developers are used to much more painful things than adding allow policies to websites ;) So you can expect developers to do this. You can't expect a normal person using a browser to know what's going on behind the scenes. If I, as a software developer, have no means to see which health site tracks me and which doesn't, how is a non-IT person supposed to understand this? It's the standard's job to help create browsers that protect me from bad actors, whether or not I have an ad blocker. If ad fraud can't be detected without complete surveillance, so be it? The ad industry is free to adopt business models that don't make privacy fraud easier. If a user explicitly wants to be tracked in exchange for freebies, they'd still be free to configure their browser accordingly. Thanks for your counterarguments. I'm out of this discussion, and I hope that this issue can be solved in a way that doesn't hand my browser history over to random companies by default. |
Browsers can determine whether the user is a bot at least as well as any external service can. If this is communicated in a privacy-preserving way, then fraud could be detected more effectively without having to rely on surveillance. |
That is useful, but the issue I'm talking about is running ads that are supposed to only be served on one site and running them on another site. The people seeing the ads will be legitimate users, but how will the ad tech know if the ads are being served on the intended site without the ancestor list? |
In this proposal the browser determines whether the ads are being shown on the intended site; the ad tech only gets metrics from the Metrics Server, e.g. Nielsen or similar. Anything invalid gets ignored. |
... but wouldn't bots just use lying browsers? |
It's not so much about detecting bots as it is about preventing malicious publishers from sending spoofed data via real users. |
Problem statement: This is the default case. The top level site does not have the ability to control this behavior. Proposed solution: Regarding privacy, here’s my best semi-complete list for @dliebner and @opyh. A user should be able to opt into advertising and tracking for ad supported publisher content. For example:
A site (aka publisher) should expect to be able to restrict page content, including cross domain content such as ads, to appropriate usage. For example:
It's probable that I missed some things here. The good news (and the bad news) is that OpenRTB 3.0 has a possible solution that uses blockchain-like signed ledgers to show the chain of changes to a bid request. The problem is that the adoption rate for OpenRTB is not fast. It's a big change, and it makes some big assumptions about the willingness of publishers, exchanges, and networks to adopt the new complexity and cost associated with implementing it. The biggest benefits are for adopters of header bidding. The biggest losers are probably ad networks, which is likely why there is real reluctance to adopt this version. They married a good-tasting thing with a bad-tasting thing. You can read more on the certificate chain here: What is ads.cert? @opyh has some valid points about not leaking the user's full browsing history to advertisers. @dliebner also has valid points about a trustworthy supply chain free of fraudulent publisher and exchange practices. My earlier comment is probably closer to an additional feature request for user-level controls, since this ticket addresses publisher-level controls. |
@bzbarsky, @dakami, and I had a hallway discussion at the end of TPAC about the possibility of adding location.ancestorOrigins to Firefox. bz has had longstanding concerns about the information this leaks to child frames. We arrived at a local consensus that any leakage is roughly equivalent to what already happens with the referrer, so it would make sense to redact ancestorOrigins according to referrer policy (and this could resolve that objection to a Mozilla implementation of ancestorOrigins).
/cc @smaug---- @annevk