User from user's agent, and expanding third parties #94

Closed
wants to merge 8 commits into from
273 changes: 182 additions & 91 deletions index.bs
@@ -91,8 +91,9 @@ we've prepared [a list of these questions in Markdown](https://raw.githubusercon
</h3>

Just because information can be exposed to the web doesn’t mean that it
should be. How does exposing this information to an origin benefit a user?
Is the benefit outweighed by the potential risks? If so, how?
should be. How does exposing this information to an origin or other party
Member

Could you rebase? I don't think this change is necessary given the current text.

benefit a user? Is the benefit outweighed by the potential risks? If so,
how?

In answering this question, it often helps to ensure that the use cases your
feature and specification is enable are made clear in the specification
@@ -136,77 +137,170 @@ important to consider ways to mitigate the obvious impacts. For instance:
How does this specification deal with sensitive information?
</h3>

Just because data is not personal information or PII, that does not mean
that it is not sensitive information; moreover, whether any given information
is sensitive may vary from user to user. Data to consider if sensitive
includes: financial data, credentials, health information, location, or
credentials. When this data is exposed to the web, steps should be taken to
mitigate the risk of exposing it.

<p class=example>
Credential Management [[CREDENTIAL-MANAGEMENT-1]] allows sites to request
a user's credentials from a user agent's password manager in order to
sign the user in quickly and easily. This opens the door for abuse, as
a single XSS vulnerability could expose user data trivially to
JavaScript. The Credential Management API mitigates
the risk by offering the username and password as only an opaque
{{FormData}} object which cannot be directly read by JavaScript
and strongly suggests that authors use Content Security Policy [[CSP]]
with reasonable [=connect-src=] and [=form-action=]
values to further mitigate the risk of exfiltration.
</p>

<p class=example>
Geolocation information can serve many use cases at a much less granular
precision than the user agent can offer. For instance, a restaurant
recommendation can be generated by asking for a user’s city-level
location rather than a position accurate to the centimeter.
</p>

<p class=example>
A Geofencing proposal [[GEOFENCING-EXPLAINED]] ties itself to service workers and
therefore to encrypted and authenticated origins.
</p>
Personal information is not the only kind of sensitive information.
Many other kinds of information may also be sensitive.
What is or isn't sensitive information can vary
from person to person
or from place to place.
Information that would be harmless if known about
one person or group of people
could be dangerous if known about
another person or group.
Information about a person
that would be harmless in one country
might be used in another country
to detain, kidnap, or imprison them.

Note:
caste,
citizenship,
color,
credentials,
criminal record,
demographic information,
employment status,
ethnicity,
financial information,
health information,
location data,
marital status,
political beliefs,
profession,
race,
religious beliefs or nonbeliefs,
sexual preferences,
and
trans status
are all examples of sensitive information.

When a feature exposes sensitive information to the web,
its designers must take steps
to mitigate the risk of exposing the information.

<div class=example>

The Credential Management API allows sites
Member

This PR seems to mix stylistic rewordings like this paragraph, with changes in meaning like "or other party" in the first paragraph and "user"->"user agent" in the fingerprinting section. That makes it hard to review.

Author

Perhaps we could arrange time, either in a side meeting, or at a future meeting to discuss this? Personally I find document collaboration via GitHub to be more complex than other document collaboration tools such as those provided by Google or Microsoft.

In the meantime here is some background and related issues.

PR Background

The PR is made against the "revise-2" branch of the master document. This PR introduces the following for reviewer consideration in addition to the changes already included in "revise-2".

  1. Differentiate between "user" and "user agent". For example, a fingerprint can relate to a person or to a user agent. The harms, benefits and risks we are concerned about vary accordingly.

  2. There are three different entities that might benefit or harm people’s privacy.

a) First-parties – web site authors who are readily identifiable via the domain name displayed in the address bar.
b) First-parties’ suppliers – organisations first-parties rely upon that are not identified in the address bar.
c) User agent vendors – organisations that make or control the web browser

People can be harmed by any one or all three of the above. After all, some user agent vendors have received the largest fines in relation to privacy violations.

The document would benefit from recognising each group and the role they play in the specific issues of security and privacy. This was originally raised as an issue which includes some background.

Related Issues

These related issues are important. @torgo closed an issue related to people's ability to trust supply chains with this statement: “We have already established that the statement "supply chains can be trusted" (on its own) is false.” Unfortunately, I was not given any references to justify that the statement is false. Such information should be included in the document and come from an authoritative source. Other industries all have supply chains people can trust. Why should the web be different? At a minimum, the document should be expanded to consider the conditions that do or do not enable a supply chain to be trusted.

Referenced RFC 6973 entertains the possibility and certainly does not exclude it.

I believe the onus in a consensus driven governance model is for the proposer to convince others. As a new member I am keen to understand the facts and feel these issues have been closed without discussion or clarity in the document. I've been advised to progress resolution via text changes to the document and I'm doing so here.

Policy

The more general issue relates to W3C policy. This document implies policy. It does not seem to be the role of a standards body to make decisions that restrict people’s choice. That is the role of lawmakers.

Whilst these documents are supposed to advise, they do prescribe rules akin to laws. According to the W3C Process, if a proposal is to be successful it will at some point need to pass horizontal review. It's unlikely the AC or the Director would approve a standard that was deficient.

Rightly, proposers become familiar at the outset with the documents that will be used to assess their proposal. Reviewers frequently reference these documents. Whilst these documents may not be formally classified as laws, they appear to be used as such in practice.

to request a user's credentials
from a password manager. [[CREDENTIAL-MANAGEMENT-1]]
If it exposed the user's credentials to JavaScript,
and if the page using the API were vulnerable to [=XSS=] attacks,
the user's credentials could be leaked to attackers.

The Credential Management API
mitigates this risk
by not exposing the credentials to JavaScript.
Instead, it exposes
an opaque {{FormData}} object
which cannot be read by JavaScript.
The spec also recommends
that sites configure Content Security Policy [[CSP]]
with reasonable [=connect-src=] and [=form-action=] values
to further mitigate the risk of exfiltration.

</div>
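
As a rough illustration of the Content Security Policy recommendation in the example above, the following Python sketch builds a header that restricts `connect-src` and `form-action`. The helper name `build_csp_header` and the `accounts.example.com` endpoint are assumptions made for illustration only; they do not come from the Credential Management or CSP specifications, and a real policy would need to be tailored to the site.

```python
def build_csp_header(connect_sources, form_action_sources):
    """Return a (name, value) pair for a Content-Security-Policy header.

    Both arguments are lists of CSP source expressions, such as "'self'"
    or an https:// origin.
    """
    directives = [
        "connect-src " + " ".join(connect_sources),
        "form-action " + " ".join(form_action_sources),
    ]
    return ("Content-Security-Policy", "; ".join(directives))


# Hypothetical policy: scripts may only connect to the site's own origin, and
# forms may only submit to the site itself or to an assumed sign-in endpoint.
name, value = build_csp_header(
    ["'self'"],
    ["'self'", "https://accounts.example.com"],
)
print(f"{name}: {value}")
```

Narrowing both directives limits where injected script could send data it manages to obtain, which is the exfiltration risk the example describes.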

Many use cases
which require location information
can be adequately served
with very coarse location data.
For instance,
a site which recommends restaurants
could adequately serve its users
with city-level location information
instead of exposing the user's precise location.
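
One way to picture coarse location data is to round coordinates before they are exposed. The sketch below is only an illustration under that assumption; the function name `coarsen_location` is hypothetical, and rounding by itself is not presented here as a complete privacy mitigation.

```python
def coarsen_location(latitude, longitude, decimals=1):
    """Round a coordinate pair to a fixed number of decimal places.

    One decimal place of latitude corresponds to roughly 11 km, which is
    far coarser than a precise device position.
    """
    return (round(latitude, decimals), round(longitude, decimals))


# A precise position is reduced to an approximate area before use.
print(coarsen_location(48.858844, 2.294351))
```

A restaurant recommender could work from the rounded pair and never receive the precise position.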

See also

* [[DESIGN-PRINCIPLES#do-not-expose-use-of-assistive-tech]]

<h3 class=question id="persistent-origin-specific-state">
Does this specification introduce new state for an origin that persists
across browsing sessions?
</h3>

Allowing an origin to persist data on a user’s device across browsing
sessions introduces the risk that this state may be used to track a user
without their knowledge or control, either in a first party or third party
contexts. New state persistence mechanisms should not be introduced without
mitigations to prevent it from being used to track users across domains or
without control over clearing this state. And, are there specific caches
that a user agent should specially consider?

<p class=example>
Service Worker [[SERVICE-WORKERS]] intercept all requests made by an
origin, allowing sites to function perfectly even when offline. A
maliciously-injected service worker, however, would be devastating (as
documented in [[SERVICE-WORKERS#security-considerations]]).
They mitigate the risks an [=active network attacker=] or [=XSS=]
vulnerability present by requiring an encrypted and authenticated
connection in order to register a service worker.
</p>

<p class=example>
Platform-specific DRM implementations might expose origin-specific
information in order to help identify users and determine whether they
ought to be granted access to a specific piece of media. These kinds of
identifiers should be carefully evaluated to determine how abuse can be
mitigated; identifiers which a user cannot easily change are very
valuable from a tracking perspective, and protecting the identifiers from
an active network attacker is an important concern.
</p>
There are many existing mechanisms
origins can use to
store information about a user.
Cookies,
`ETag`,
`Last Modified`,
{{localStorage}},
and
{{indexedDB}}
are just a few examples.

Allowing an origin
to store data
on a user’s device
in a way that persists across browsing sessions
introduces the risk
that this state may be used
to track a user
without their knowledge or control,
either in [=first-party-site context|first-=] or [=third-party context|third-party=] contexts.

One of the ways
user agents mitigate the risk
that client-side storage mechanisms
will form a persistent identifier
is by providing users with the ability
to clear out the data stored by origins.
New state persistence mechanisms
should not be introduced
without mitigations
to prevent them
from being used
to track users
across domains
or without control
over clearing this state.
That said,
manually clearing storage
is something users do only rarely.
Spec authors should consider ways
to make new features more privacy-preserving without full storage clearing,
such as
reducing the uniqueness of values,
rotating values,
or otherwise making features no more identifying than is needed.
<!-- https://github.com/w3ctag/design-principles/issues/215 -->
Also, keep in mind that
user agents make use of several different caching mechanisms.
Which, if any, caches will store this new state?
Are additional mitigations necessary?
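
The idea of rotating values mentioned above can be sketched as an identifier that is replaced once it reaches a maximum age. This is a minimal sketch under assumed names (`RotatingIdentifier`, `max_age_seconds`, and the injectable `clock` are all hypothetical); it is not taken from any specification.

```python
import secrets
import time


class RotatingIdentifier:
    """A random identifier that is regenerated after a maximum age."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so rotation can be tested
        self._rotate()

    def _rotate(self):
        # A fresh random value is unrelated to every earlier value.
        self._value = secrets.token_hex(16)
        self._issued_at = self._clock()

    def current(self):
        # Replace the value once it is older than the allowed age.
        if self._clock() - self._issued_at >= self._max_age:
            self._rotate()
        return self._value
```

Because each new value is drawn independently, an observer who sees only the values cannot link one rotation period to the next, which bounds how long the identifier is useful for tracking.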

<div class=example>

Service Workers
intercept all requests made by an origin,
which enables sites
to continue to function when the browser goes offline.
Because of this,
a maliciously-injected service worker
could compromise the user (as documented in [[SERVICE-WORKERS#security-considerations]]).

The spec mitigates the risks
an [=active network attacker=] or [=XSS=] vulnerability present
by limiting service worker registration to [=secure contexts=].
[[SERVICE-WORKERS]]

</div>

<p class=example>
Cookies, `ETag`, `Last Modified`, {{localStorage}}, {{indexedDB}}, etc. all
allow an origin to store information about a user, and retrieve it later,
directly or indirectly. User agents mitigate the risk that these kinds of
storage mechanisms will form a persistent identifier by offering users the
ability to wipe out the data contained in these types of storage.
Platform-specific DRM implementations
(such as [=content decryption modules=] in [[ENCRYPTED-MEDIA]])
might expose origin-specific information
in order to help identify users
and determine whether they ought to be granted access
to a specific piece of media.
These kinds of identifiers
should be carefully evaluated
to determine how abuse can be mitigated;
identifiers which a user cannot easily change
are very valuable from a tracking perspective,
and protecting such identifiers
from an [=active network attacker=]
is vital.
</p>

<h3 class=question id="underlying-platform-data">
@@ -221,7 +315,7 @@ communication methods.

When a specification exposes specific information about a host to an origin,
if that information changes rarely and is not variable across origins, then
it can be used to uniquely identify a user across two origins — either
it can be used to uniquely identify a user's agent across two origins — either
jwrosewell marked this conversation as resolved.
directly because any given piece of information is unique or because the
combination of disparate pieces of information are unique and can be used to
form a fingerprint [[FINGERPRINTING-GUIDANCE]]. Specifications and user agents
@@ -239,7 +333,7 @@ exfiltrate data.
<p class=example>
The `RENDERER` string exposed by some WebGL implementations
improves performance in some kinds of applications, but does so at the
cost of adding persistent state to a user's fingerprint. These kinds of
cost of adding persistent state to a user agent's fingerprint. These kinds of
jwrosewell marked this conversation as resolved.
device-level details should be carefully weighed to ensure that the costs
are outweighed by the benefits.
</p>
@@ -258,11 +352,11 @@ entropy introduced by [disallowing direct enumeration of the plugin list](https:
If so, what kind of sensors and information derived from those sensors does
this standard expose to origins?

Information from sensors may serve as a fingerprinting vector across origins.
In addition, sensor also reveals something about my device or environment and
that fact might be what is sensitive. In addition, as technology advances,
mitigations in place at the time a specification is written may have to be
reconsidered as the threat landscape changes.
Information from sensors may serve as a fingerprinting vector on the same origin
or across origins. In addition, sensor also reveals something about the user’s
Member

Maybe s/on the same origin or across origins/on origins/, like in #90?

Member

Also, there's some grammar weirdness in the original text too ("sensor also reveals" should probably be "sensors can reveal"). Can you fix this while you're in here? Thanks!

Author

Fixed "sensors can reveal" comment in latest commit.

agent or environment and that fact might be what is sensitive. In addition, as
technology advances, mitigations in place at the time a specification is written
may have to be reconsidered as the threat landscape changes.
jwrosewell marked this conversation as resolved.

Sensor data might even become a cross-origin identifier when the sensor reading
is relatively stable, for example for short time periods (seconds, minutes, even days), and
@@ -286,16 +380,16 @@ serve as an identifier if misused/abused [[OLEJNIK-BATTERY]].
</p>

<h3 class=question id="other-data">
What data does this specification expose to an origin? Please also
document what data is identical to data exposed by other features, in the
same or different contexts.
What data does this specification expose to an origin or other party?
Contributor

I think "origin" is fine since "parties" are origins in the web.

Author

Other parties include the user agent vendor, which I don't believe would be classified as an origin. Consider a feature where the user agent vendor is capturing personal information. Considerations related to privacy would at least include:

a) consent obtained for use of that information;
b) transparency to assure the user that the consent preferences were respected; and
c) security issues associated with the way that data is stored and deleted, among others.

Considering security and privacy beyond protocols and origins is an important theme I would like to see developed in PING and this document. What mitigations can be applied that do not relate entirely to technical standards? For example, accepting terms of service and default settings at the point of user agent setup or upgrade would enable implementors to consider solutions that are not restricted to annoying permission pop-ups, to defaults so restrictive that they impair functionality, or to approaches that expose user agent vendors to competition issues by gaining consent across vertically integrated services at the point of account setup, which is impossible for other players in the market to achieve.

I hope we can all agree that a situation where acceptable privacy can only be achieved by a very small number of implementors would not be compatible with the W3C antitrust policy or the mission and purpose of the W3C.

Contributor

The point that the user agent (vendor) is a party outside of "origins" - that effectively has "root" - is a good one. Another example is "geolocation", where information is sent out of band to the browser maker as part of the API.

Please also document what data is identical to data exposed by other features, in
jwrosewell marked this conversation as resolved.
the same or different contexts.
</h3>

As noted above in [[#sop-violations]], the [=same-origin policy=] is an
As noted in [[#sop-violations]], the [=same-origin policy=] is an
jwrosewell marked this conversation as resolved.
important security barrier that new features need to carefully consider.
If a specification exposes details about another origin's state, or allows
POST or GET requests to be made to another origin, the consequences can be
severe.
POST or GET requests to be made to another origin or party, the consequences
can be severe.
Contributor

Also not necessary as origin implies party


<p class=example>
Content Security Policy [[CSP]] unintentionally exposed redirect targets
@@ -408,21 +502,18 @@ at zero, increments, and is reset — is a good example of a privacy friendly
temporary identifier.

<h3 class=question id="first-third-party">
How does this specification distinguish between behavior in first-party and
third-party contexts?
How does this specification protect end users from storing persistent data
on a user’s device across browsing session?
Member

While I think your new question may be one worth asking in the questionnaire, I'd rather it be spun off into a separate PR that adds this question, instead of one which replaces the first-/third-party question. Among other things, this new question is significantly narrower than the original.

Member

Maybe pursue removing or modifying the first-/third-party question in a separate PR as well.

Author

There are two key changes in this pull request.

  1. Recognising the full range of parties involved in the web.
  2. Resolving first and third-party bias.

Before splitting the second change into another pull request I’m keen to gauge wider views on this and have been encouraged to raise these issues via specific text changes.

If smaller participants on the web are denied the ability to “band together” to deliver services that only larger participants can deliver due to their scale, then we have to consider the W3C’s antitrust policy. What work has been done on this document to verify it does not unconsciously steer specifications towards this outcome? Whilst I recognise that the subject of competition and antitrust may not be one other people feel willing or able to engage with, that does not resolve the matter.

There are other considerations related to people’s trust choices and trusting supply chains. The document, and reviewers of issues, have categorically stated that people can not trust supply chains. I’ve not seen the evidence for this from any authoritative sources. As a minimum the document should reference such sources for new readers like myself. Perhaps approaching the problem from the perspective of the conditions under which a person can or can not trust a supply chain would add a much needed dimension to the document?

Contributor

I don't think this is the right forum to have that debate. We're trying to craft some guidelines here, not open up a debate about the nature of the web. Can I suggest that we go with Tess's suggestion and split this out into a new question?

Author

Following discussion in the IWA BG, where there was broad support to discuss the subject further, and considering the opportunity at TPAC to hold a debate on these matters, I have proposed a session for October. The output of that session will help inform the future of these definitions and how they apply to the questionnaire and the work of the TAG and PING.

It is clear that the current definitions or interpretations of parties are too simplistic to solve the myriad of challenges we are all working on.

Author

A TPAC break session proposal has been added following the Improving Web Advertising Business Group today and discussion on the session there.

</h3>

The behavior of a feature should be considered not just in the context of its
being used by a first party origin that a user is visiting but also the
implications of its being used by an arbitrary third party that the first
party includes. When developing your specification, consider the implications
of its use by third party resources on a page and, consider if support for
use by third party resources should be optional to conform to the
specification. If supporting use by third party resources is mandatory for
conformance, please explain why and what privacy mitigations are in place.
This is particularly important as user agents may take steps to reduce the
availability or functionality of certain features to third parties if the
third parties are found to be abusing the functionality.
The behavior of a feature should be considered not just in the context of
its being used by a first party origin that a user is visiting but also the
implications of its being used by other parties. When developing your
specification, consider the implications of its use by all parties. Consider
if this sufficiently protects end users from storing persistent data on a
user’s agent across browsing sessions. If supporting use by parties other
than the first party origin is mandatory for conformance, please explain why
and what privacy mitigations are in place.

<h3 class=question id="private-browsing">
How does this specification work in the context of a user agent’s Private