User from user's agent, and expanding third parties #94
@@ -91,8 +91,9 @@ we've prepared [a list of these questions in Markdown](https://raw.githubusercon
</h3>

Just because information can be exposed to the web doesn’t mean that it
should be. How does exposing this information to an origin benefit a user?
Is the benefit outweighed by the potential risks? If so, how?
should be. How does exposing this information to an origin or other party
benefit a user? Is the benefit outweighed by the potential risks? If so,
how?

In answering this question, it often helps to ensure that the use cases your
feature and specification is enable are made clear in the specification
@@ -136,77 +137,170 @@ important to consider ways to mitigate the obvious impacts. For instance:
How does this specification deal with sensitive information?
</h3>

Just because data is not personal information or PII, that does not mean
that it is not sensitive information; moreover, whether any given information
is sensitive may vary from user to user. Data to consider if sensitive
includes: financial data, credentials, health information, location, or
credentials. When this data is exposed to the web, steps should be taken to
mitigate the risk of exposing it.

<p class=example>
Credential Management [[CREDENTIAL-MANAGEMENT-1]] allows sites to request
a user's credentials from a user agent's password manager in order to
sign the user in quickly and easily. This opens the door for abuse, as
a single XSS vulnerability could expose user data trivially to
JavaScript. The Credential Management API mitigates
the risk by offering the username and password as only an opaque
{{FormData}} object which cannot be directly read by JavaScript
and strongly suggests that authors use Content Security Policy [[CSP]]
with reasonable [=connect-src=] and [=form-action=]
values to further mitigate the risk of exfiltration.
</p>

<p class=example>
Geolocation information can serve many use cases at a much less granular
precision than the user agent can offer. For instance, a restaurant
recommendation can be generated by asking for a user’s city-level
location rather than a position accurate to the centimeter.
</p>

<p class=example>
A Geofencing proposal [[GEOFENCING-EXPLAINED]] ties itself to service workers and
therefore to encrypted and authenticated origins.
</p>
Personal information is not the only kind of sensitive information.
Many other kinds of information may also be sensitive.
What is or isn't sensitive information can vary
from person to person
or from place to place.
Information that would be harmless if known about
one person or group of people
could be dangerous if known about
another person or group.
Information about a person
that would be harmless in one country
might be used in another country
to detain, kidnap, or imprison them.

Note:
caste,
citizenship,
color,
credentials,
criminal record,
demographic information,
employment status,
ethnicity,
financial information,
health information,
location data,
marital status,
political beliefs,
profession,
race,
religious beliefs or nonbeliefs,
sexual preferences,
and
trans status
are all examples of sensitive information.

When a feature exposes sensitive information to the web,
its designers must take steps
to mitigate the risk of exposing the information.

<div class=example>

The Credential Management API allows sites

This PR seems to mix stylistic rewordings like this paragraph with changes in meaning, like "or other party" in the first paragraph and "user" -> "user agent" in the fingerprinting section. That makes it hard to review.

Perhaps we could arrange time, either in a side meeting or at a future meeting, to discuss this? Personally, I find document collaboration via GitHub to be more complex than other document collaboration tools, such as those provided by Google or Microsoft. In the meantime, here is some background and some related issues.

PR Background

The PR is made against the "revise-2" branch of the master document. This PR introduces the following for reviewer consideration, in addition to the changes already included in "revise-2".

a) First-parties – web site authors who are readily identifiable via the domain name displayed in the address bar.

People can be harmed by any one or all three of the above. After all, some user agent vendors have received the largest fines in relation to privacy violations. The document would benefit from recognising each group and the role they play in the specific issues of security and privacy. This was originally raised as an issue, which includes some background.

Related Issues

These related issues are important. @torgo closed an issue related to people's ability to trust supply chains with this statement: “We have already established that the statement "supply chains can be trusted" (on its own) is false.” Unfortunately, he did not provide me with any references to justify that the statement is false. Such information should be included in the document and should come from an authoritative source. We can look at other industries; they all have supply chains that people can trust. Why should the web be different? As a minimum, the document should be expanded to consider the conditions that do or do not enable a supply chain to be trusted. The referenced RFC 6973 entertains the possibility and certainly does not exclude it. I believe the onus in a consensus-driven governance model is on the proposer to convince others. As a new member, I am keen to understand the facts, and I feel these issues have been closed without discussion or clarity in the document. I've been advised to progress resolution via text changes to the document, and I'm doing so here.

Policy

The more general issue relates to W3C policy. This document implies policy. It does not seem to be the role of a standards body to make decisions that restrict people’s choice. That is the role of lawmakers. Whilst these documents are supposed to advise, they do prescribe rules akin to laws. According to the W3C Process, if a proposal is to be successful, it will at some point need to pass horizontal review. It's unlikely the AC or the Director would approve a standard that was deficient. Rightly, proposers become familiar at the outset with the documents that will be used to assess their proposal. Reviewers frequently reference these documents. Whilst these documents may not be formally classified as laws, they appear to be used as such in practice.

to request a user's credentials
from a password manager. [[CREDENTIAL-MANAGEMENT-1]]
If it exposed the user's credentials to JavaScript,
and if the page using the API were vulnerable to [=XSS=] attacks,
the user's credentials could be leaked to attackers.

The Credential Management API
mitigates this risk
by not exposing the credentials to JavaScript.
Instead, it exposes
an opaque {{FormData}} object
which cannot be read by JavaScript.
The spec also recommends
that sites configure Content Security Policy [[CSP]]
with reasonable [=connect-src=] and [=form-action=] values
to further mitigate the risk of exfiltration.

</div>

Many use cases
which require location information
can be adequately served
with very coarse location data.
For instance,
a site which recommends restaurants
could adequately serve its users
with city-level location information
instead of exposing the user's precise location.
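
The coarse-location point can be made concrete with a minimal TypeScript sketch. It is illustrative only. `Position`, `roundTo`, and `coarsenLocation` are hypothetical names. The sketch also assumes that rounding to one decimal place of a degree (roughly 11 km of latitude) is an adequate stand-in for "city-level" precision.

```typescript
// Hypothetical sketch: reduce the precision of a position before exposing it.
// Assumption: rounding to `decimals` decimal places of a degree is adequate
// coarsening; 1 decimal place is roughly 11 km of latitude.
interface Position {
  latitude: number;
  longitude: number;
}

function roundTo(value: number, decimals: number): number {
  const factor = 10 ** decimals;
  return Math.round(value * factor) / factor;
}

function coarsenLocation(precise: Position, decimals = 1): Position {
  return {
    latitude: roundTo(precise.latitude, decimals),
    longitude: roundTo(precise.longitude, decimals),
  };
}

// Usage: a restaurant recommender only ever sees the coarse value.
const precise: Position = { latitude: 51.50735, longitude: -0.12776 };
const coarse = coarsenLocation(precise);
console.log(coarse); // { latitude: 51.5, longitude: -0.1 }
```

This sketch uses deterministic rounding rather than random noise. Repeated noisy readings of the same position could be averaged back toward the precise value.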

See also

* [[DESIGN-PRINCIPLES#do-not-expose-use-of-assistive-tech]]

<h3 class=question id="persistent-origin-specific-state">
Does this specification introduce new state for an origin that persists
across browsing sessions?
</h3>

Allowing an origin to persist data on a user’s device across browsing
sessions introduces the risk that this state may be used to track a user
without their knowledge or control, either in a first party or third party
contexts. New state persistence mechanisms should not be introduced without
mitigations to prevent it from being used to track users across domains or
without control over clearing this state. And, are there specific caches
that a user agent should specially consider?

<p class=example>
Service Worker [[SERVICE-WORKERS]] intercept all requests made by an
origin, allowing sites to function perfectly even when offline. A
maliciously-injected service worker, however, would be devastating (as
documented in [[SERVICE-WORKERS#security-considerations]]).
They mitigate the risks an [=active network attacker=] or [=XSS=]
vulnerability present by requiring an encrypted and authenticated
connection in order to register a service worker.
</p>

<p class=example>
Platform-specific DRM implementations might expose origin-specific
information in order to help identify users and determine whether they
ought to be granted access to a specific piece of media. These kinds of
identifiers should be carefully evaluated to determine how abuse can be
mitigated; identifiers which a user cannot easily change are very
valuable from a tracking perspective, and protecting the identifiers from
an active network attacker is an important concern.
</p>
There are many existing mechanisms
origins can use to
store information about a user.
Cookies,
`ETag`,
`Last Modified`,
{{localStorage}},
and
{{indexedDB}}
are just a few examples.

Allowing an origin
to store data
on a user’s device
in a way that persists across browsing sessions
introduces the risk
that this state may be used
to track a user
without their knowledge or control,
either in [=first-party-site context|first-=] or [=third-party context|third-party=] contexts.

One of the ways
user agents mitigate the risk
that client-side storage mechanisms
will form a persistent identifier
is by providing users with the ability
to clear out the data stored by origins.
New state persistence mechanisms
should not be introduced
without mitigations
to prevent them
from being used
to track users
across domains
or without control
over clearing this state.
That said,
manually clearing storage
is something users do only rarely.
Spec authors should consider ways
to make new features more privacy-preserving without full storage clearing,
such as
reducing the uniqueness of values,
rotating values,
or otherwise making features no more identifying than is needed.
<!-- https://github.com/w3ctag/design-principles/issues/215 -->
Also, keep in mind that
user agents make use of several different caching mechanisms.
Which, if any, caches will store this new state?
Are additional mitigations necessary?
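
"Rotating values" can be illustrated with a minimal TypeScript sketch. `RotatingIdentifier` and `randomId` are hypothetical names, and the one-day period is an arbitrary assumption. `Math.random` stands in for the cryptographically secure generator that a real user agent would use.

```typescript
// Hypothetical sketch: an identifier that is stable within one rotation
// period and replaced with a fresh random value in the next period.
class RotatingIdentifier {
  private readonly ids = new Map<number, string>();

  constructor(private readonly periodMs: number) {}

  // Returns the identifier for the period containing `nowMs`.
  idAt(nowMs: number): string {
    const period = Math.floor(nowMs / this.periodMs);
    let id = this.ids.get(period);
    if (id === undefined) {
      id = randomId();
      // Discard values from other periods so an old identifier
      // cannot be handed out again.
      this.ids.clear();
      this.ids.set(period, id);
    }
    return id;
  }
}

// Non-cryptographic random value, for illustration only.
function randomId(): string {
  return Math.random().toString(36).slice(2) + Math.random().toString(36).slice(2);
}

const DAY_MS = 24 * 60 * 60 * 1000;
const rotating = new RotatingIdentifier(DAY_MS);
const today = rotating.idAt(0);
console.log(today === rotating.idAt(DAY_MS - 1)); // true: same period
console.log(rotating.idAt(DAY_MS)); // a fresh value for the next period
```

The map keeps only the most recently requested period. After the identifier rotates, the previous value is no longer retained anywhere in this object.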

<div class=example>

Service Workers
intercept all requests made by an origin,
which enables sites
to continue to function when the browser goes offline.
Because of this,
a maliciously-injected service worker
could compromise the user (as documented in [[SERVICE-WORKERS#security-considerations]]).

The spec mitigates the risks
an [=active network attacker=] or [=XSS=] vulnerability present
by limiting service worker registration to [=secure contexts=].
[[SERVICE-WORKERS]]

</div>

<p class=example>
Cookies, `ETag`, `Last Modified`, {{localStorage}}, {{indexedDB}}, etc. all
allow an origin to store information about a user, and retrieve it later,
directly or indirectly. User agents mitigate the risk that these kinds of
storage mechanisms will form a persistent identifier by offering users the
ability to wipe out the data contained in these types of storage.
Platform-specific DRM implementations
(such as [=content decryption modules=] in [[ENCRYPTED-MEDIA]])
might expose origin-specific information
in order to help identify users
and determine whether they ought to be granted access
to a specific piece of media.
These kinds of identifiers
should be carefully evaluated
to determine how abuse can be mitigated;
identifiers which a user cannot easily change
are very valuable from a tracking perspective,
and protecting such identifiers
from an [=active network attacker=]
is vital.
</p>

<h3 class=question id="underlying-platform-data">
@@ -221,13 +315,13 @@ communication methods.

When a specification exposes specific information about a host to an origin,
if that information changes rarely and is not variable across origins, then
it can be used to uniquely identify a user across two origins — either
directly because any given piece of information is unique or because the
combination of disparate pieces of information are unique and can be used to
form a fingerprint [[FINGERPRINTING-GUIDANCE]]. Specifications and user agents
should treat the risk of fingerprinting by carefully considering the surface
of available information, and the relative differences between software and
hardware stacks. Sometimes reducing fingerprintability may as simple as
it can be used to uniquely identify a user agent or user across two origins
— either directly because any given piece of information is unique or because
the combination of disparate pieces of information are unique and can be used
to form a fingerprint [[FINGERPRINTING-GUIDANCE]]. Specifications and user
agents should treat the risk of fingerprinting by carefully considering the
surface of available information, and the relative differences between software
and hardware stacks. Sometimes reducing fingerprintability may as simple as
ensuring consistency, i.e. ordering the list of fonts, but sometimes may be
more complex.
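
The font-ordering example can be sketched in TypeScript. `exposedFontList` is a hypothetical helper, not an API from any specification. The sketch assumes that fonts would otherwise be reported in installation order, so the order itself would distinguish users who have the same fonts.

```typescript
// Hypothetical sketch: if fonts were exposed in installation order, the
// order itself would add entropy to a fingerprint. Sorting removes it.
function exposedFontList(installedInOrder: readonly string[]): string[] {
  // Copy (and deduplicate) before sorting so the caller's array is not mutated.
  return Array.from(new Set(installedInOrder)).sort();
}

const userA = ["Noto Sans", "Arial", "Fira Code"];
const userB = ["Fira Code", "Noto Sans", "Arial"];

// The raw order differs, so it could distinguish the two users...
console.log(userA.join(",") === userB.join(",")); // false
// ...but the sorted lists are identical.
console.log(exposedFontList(userA).join(",") === exposedFontList(userB).join(",")); // true
```

Sorting removes only the entropy carried by the order. The set of installed fonts is still a fingerprinting surface that needs its own mitigations.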

@@ -239,7 +333,7 @@ exfiltrate data.
<p class=example>
The `RENDERER` string exposed by some WebGL implementations
improves performance in some kinds of applications, but does so at the
cost of adding persistent state to a user's fingerprint. These kinds of
cost of adding persistent state to a user agent's fingerprint. These kinds of
jwrosewell marked this conversation as resolved.
device-level details should be carefully weighed to ensure that the costs
are outweighed by the benefits.
</p>
@@ -258,16 +352,17 @@ entropy introduced by [disallowing direct enumeration of the plugin list](https:
If so, what kind of sensors and information derived from those sensors does
this standard expose to origins?

Information from sensors may serve as a fingerprinting vector across origins.
In addition, sensor also reveals something about my device or environment and
that fact might be what is sensitive. In addition, as technology advances,
mitigations in place at the time a specification is written may have to be
reconsidered as the threat landscape changes.
Information from sensors may serve as a fingerprinting vector on the same origin
or across origins. In addition, sensor can reveal something about the user
agent or environment and that fact might be what is sensitive. In addition, as
technology advances, mitigations in place at the time a specification is written
may have to be reconsidered as the threat landscape changes.
jwrosewell marked this conversation as resolved.

Sensor data might even become a cross-origin identifier when the sensor reading
is relatively stable, for example for short time periods (seconds, minutes, even days), and
is consistent across-origins. In fact, if two user-agents expose the same
sensor data the same way, it may become a cross-browser, possibly even a cross-device identifier.
is relatively stable, for example for short time periods (seconds, minutes,
even days), and is consistent across-origins. In fact, if two user agents
expose the same sensor data the same way, it may become a cross-browser,
possibly even a cross-device identifier.
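
Stable sensor readings could become a cross-origin identifier. One way to reduce that risk is to limit both the precision and the sampling frequency of readings before exposing them; the gyroscope example in this diff mentions lowering the sampling rate. The following is a minimal TypeScript sketch. `quantize`, `effectiveFrequency`, and both constants are hypothetical. The values are placeholders, not recommendations from any specification.

```typescript
// Hypothetical sketch: coarsen sensor readings so that small, device-specific
// offsets (for example, calibration errors) are less likely to survive as a
// stable identifier, and cap how often readings can be taken.
const QUANTIZATION_STEP = 0.1; // assumed step, in the sensor's units
const MAX_FREQUENCY_HZ = 60;   // assumed cap on sampling frequency

function quantize(reading: number, step: number = QUANTIZATION_STEP): number {
  const inverse = 1 / step;
  return Math.round(reading * inverse) / inverse;
}

function effectiveFrequency(requestedHz: number): number {
  return Math.min(requestedHz, MAX_FREQUENCY_HZ);
}

console.log(effectiveFrequency(1000)); // 60
console.log(quantize(0.0312) === quantize(0.0287)); // true: both round to 0
```

Both the step and the cap trade fidelity for privacy. A specification adopting this approach would need to justify its chosen values against its use cases.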

<p class=example>
As gyroscopes advanced, their sampling rate had to be lowered to
@@ -286,16 +381,16 @@ serve as an identifier if misused/abused [[OLEJNIK-BATTERY]].
</p>

<h3 class=question id="other-data">
What data does this specification expose to an origin? Please also
document what data is identical to data exposed by other features, in the
same or different contexts.
What data does this specification expose to an origin or other party?
Please also document what data is identical to data exposed by other
features, in the same or different contexts.
</h3>

As noted above in [[#sop-violations]], the [=same-origin policy=] is an
As noted in [[#sop-violations]], the [=same-origin policy=] is an
jwrosewell marked this conversation as resolved.
important security barrier that new features need to carefully consider.
If a specification exposes details about another origin's state, or allows
POST or GET requests to be made to another origin, the consequences can be
severe.
POST or GET requests to be made to another origin or party, the consequences
can be severe.

Also not necessary, as origin implies party.

<p class=example>
Content Security Policy [[CSP]] unintentionally exposed redirect targets
@@ -408,21 +503,18 @@ at zero, increments, and is reset — is a good example of a privacy friendly
temporary identifier.

<h3 class=question id="first-third-party">
How does this specification distinguish between behavior in first-party and
third-party contexts?
How does this specification protect end users from storing persistent data
on a user’s device across browsing session?

While I think your new question may be one worth asking in the questionnaire, I'd rather it be spun off into a separate PR that adds this question, instead of one which replaces the first-/third-party question. Among other things, this new question is significantly narrower than the original.

Maybe pursue removing or modifying the first-/third-party question in a separate PR as well.

There are two key changes in this pull request.

Before splitting the second change into another pull request, I’m keen to gauge wider views on this, and I have been encouraged to raise these issues via specific text changes. If smaller participants on the web are denied the ability to “band together” to deliver services that only larger participants can deliver due to their scale, then we have to consider the W3C’s antitrust policy. What work has been done on this document to verify that it does not unconsciously steer specifications towards this outcome? Whilst I recognise that competition and antitrust may not be subjects other people feel willing or able to engage with, that does not resolve the matter. There are other considerations related to people’s trust choices and trusting supply chains. The document, and reviewers of issues, have categorically stated that people cannot trust supply chains. I’ve not seen the evidence for this from any authoritative sources. As a minimum, the document should reference such sources for new readers like myself. Perhaps approaching the problem from the perspective of the conditions under which a person can or cannot trust a supply chain would add a much-needed dimension to the document?

I don't think this is the right forum to have that debate. We're trying to craft some guidelines here, not open up a debate about the nature of the web. Can I suggest that we go with Tess's suggestion and split this out into a new question?

Following discussion in the IWA BG, where there was broad support to discuss the subject further, and considering the opportunity at TPAC to hold a debate on these matters, I have proposed a session for October. The output of that session will help inform the future of these definitions and how they apply to the questionnaire and the work of the TAG and PING. It is clear that the current definitions or interpretations of parties are too simplistic to solve the myriad challenges we are all working on.

A TPAC breakout session proposal has been added following today's Improving Web Advertising Business Group meeting and the discussion of the session there.

</h3>

The behavior of a feature should be considered not just in the context of its
being used by a first party origin that a user is visiting but also the
implications of its being used by an arbitrary third party that the first
party includes. When developing your specification, consider the implications
of its use by third party resources on a page and, consider if support for
use by third party resources should be optional to conform to the
specification. If supporting use by third party resources is mandatory for
conformance, please explain why and what privacy mitigations are in place.
This is particularly important as user agents may take steps to reduce the
availability or functionality of certain features to third parties if the
third parties are found to be abusing the functionality.
The behavior of a feature should be considered not just in the context of
its being used by a first party origin that a user is visiting but also the
implications of its being used by other parties. When developing your
specification, consider the implications of its use by all parties. Consider
if this sufficiently protects end users from storing persistent data on a
user’s agent across browsing sessions. If supporting use by parties other
than the first party origin is mandatory for conformance, please explain why
and what privacy mitigations are in place.

<h3 class=question id="private-browsing">
How does this specification work in the context of a user agent’s Private

Could you rebase? I don't think this change is necessary given the current text.