Privacy policy discovery. #39
I think it would be useful to explain the relationship with P3P (probably none). Were there other prior initiatives like this to acknowledge? https://microformats.org/wiki/existing-rel-values has Also, what's the intended purpose of this? On its own this seems relatively harmless, but if it's combined in some way whereby if you have this you get to share cookies across the site boundary, not so much. |
P3P was an attempt to define a machine-readable representation of a privacy policy. This proposal only defines a link to a site's (hopefully) already-existing privacy policy prose. I think they're different in kind; I think we can learn a lot from P3P and there's good discussion to be had around it, but I don't think this proposal has much relationship to it at all.
I noted this in https://mikewest.github.io/privacy-policy-discovery/#link-type. HTTP Archive suggests that I also think the word "privacy" is a bit broader, and I could imagine someone wanting to define something more general that didn't link to the policy specifically but something else. 🤷
This is relatively harmless. :) I'm not suggesting any behavioral change, and certainly nothing with regard to storage or cookies could be justified by a link to a privacy policy. The immediate use case would be UX changes in clients (including user agents of course, but also crawlers of various sorts) that could help users discover privacy policies, not web-facing changes. |
I think it's worth pointing out P3P and saying there's no relation. At least I think that will preempt a set of concerns. A colleague pointed out that we might also want to consider other privacy policies an origin might be responsible for, such as an Android application. Presumably we'd want to clearly scope this to websites, but giving some kind of indication what other platforms could do would be good. |
I'll find some way to make that disclaimer, thanks for the suggestion.
The link type seems like it wouldn't be subject to this kind of confusion, but I agree that clarifying the purpose of the well-known redirect as being focused on the website that hosts it would be worthwhile. Do y'all think it would be worth defining specific extensions to this (e.g. |
In privacycg/proposals#39, annevk@ suggested clarifying this proposal's relationship to P3P, and discussing the scoping of the well-known URL as it regards non-web platforms. This patch attempts to do both.
Potentially, providing some kind of direction if platforms want to go that way seems worthwhile. Using a |
I agree with all of that. I added a small note to https://mikewest.github.io/privacy-policy-discovery/#scope suggesting the possibility of this kind of extension, but I agree that it's unnecessary (and unhelpful) to use that for the web. |
Thanks! I'd expand "PWA" or simply say website. |
As annevk@ suggested in privacycg/proposals#39.
Paving the The utility of a well-known link is less obvious to me, but if the scope is set to the Origin then it seems like a good option to provide to web developers. One question: In the case where both are present, both would apply? |
Thanks, @bvandersloot-mozilla! The well-known link seems likely to me to be useful for non-browser clients (e.g. crawlers). I agree with you that the link type is more likely to be immediately useful for browsers. Regarding scoping, I expect a mismatch between the well-known URL's redirect target and a link on a page to generally be a misconfiguration. I can imagine a circumstance in which the claims made about a specific page could be more strict than the claims made about an origin at large, but that seems like an edge case that I'm not sure maps to any practical use case I'm familiar with. |
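To make the "mismatch is probably a misconfiguration" idea concrete, here is a minimal sketch of how a client might compare the two declarations. The function name `same_policy` and the normalization rules (resolve relative links against the page, ignore fragments and a trailing slash) are assumptions for illustration; the proposal doesn't define a comparison algorithm.

```python
from urllib.parse import urljoin, urlsplit


def same_policy(page_url, page_link_href, well_known_target):
    """Return True when the page-level link and the well-known redirect
    target appear to name the same document (hypothetical comparison rules)."""

    def normalize(url):
        # Resolve relative hrefs against the page that declared them.
        parts = urlsplit(urljoin(page_url, url))
        # Compare scheme, host, path, and query; drop the fragment and
        # treat "/privacy" and "/privacy/" as the same path.
        path = parts.path.rstrip("/") or "/"
        return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)

    return normalize(page_link_href) == normalize(well_known_target)
```

A crawler could log a warning whenever this returns False, rather than trying to pick a winner.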
Thanks for capturing my thoughts better than I could! This also crystalizes a bit why the platform option may be undesirable without a small, known set of platforms. It also opens up use for even more non-web use-cases. E.g. |
I think it would potentially be desirable for us to define how we'd expect platforms to spell their URL (e.g. by adding their name to the path). I'm not sure it would be reasonable for us to define the meaning of any given platform name. In principle, that seems like it would require a registry (but in practice it would be a short list, so probably no harm done by codifying it).
This is a good point that I'll add to the doc. |
One nerd-sniping on The reasons that convinced me:
Are these fair points? |
I think your points are fair, but I disagree with your conclusion. :) Some thoughts inline:
I agree that the distinction between page-level and origin-level declarations is meaningful, but I think they cut in the other direction. Precisely because Additionally, I think that creating a well-understood mechanism for declaring a set of policy constraints on an entire origin's behavior is valuable. I don't think that can reasonably or semantically be done on a resource-by-resource basis.
I think you're correct to say that there's a chance of origin-level and page-level declarations pointing to distinct documents. That said, you suggest above that there's sufficiently-direct attention paid to page-level links to a privacy policy. I suggest above that even more attention is likely paid to origin-wide declarations of the same. There will certainly be cases in which there's a conflicting declaration, but given the effort that well-meaning entities put into their policy declarations, the risk of user confusion seems low in the long run.
Yup. I agree that a
I think you provided good counterexamples of domains that don't serve navigable HTML documents, but for which it would be nice to expect a declaration of policy constraints on data collection and usage. Thanks again for the feedback, @bvandersloot-mozilla! |
We're (Brave) very interested in this and would find it useful for making it easy to discover the privacy policies rather than needing to maintain lists of popular sites. Would there be any interest in also reusing this pattern for terms of service as well? That's another important link used during registration flows that would be useful to discover and surface within UI upon registration (the use case we're interested in this for). The easy discovery of these two pages will be useful for UAs to be able to better assist users during registration flows and could lead to some useful additions in FedCM I'd think as well. |
Hey @kdenhartog, thanks for your thoughts.
👍
I don't see the same level of alignment around a particular link type here. Very naively skimming HTTP Archive, I only see 665 pages that contain "terms" (many more contain "tos", but generally as part of another string, like In the absence of a clear indication of preference among web developers, I don't have any objection to adding a |
Yup, that's exactly what I had in mind. |
Why specifically is the .well-known version more useful to crawlers? Do crawlers not parse HTML as they go? I’m asking because having two distinct ways of specifying the privacy policy creates the possibility that they may be in conflict. That means we have to specify which one takes precedence if they are different. If it’s the rel link on the page, then crawlers will have to read that anyway. If it’s the well-known URL that takes precedence, then browsers will have to read that anyway and not just trust the rel. If they are required to always be the same but with no enforcement or defined precedence, then that creates the potential for confusing or deceiving users, if for example privacy policy UI showed different things in browsers and in services that obtain their content by crawling. All these options are kind of bad, so it would be better if there is only one way to specify. But having a defined precedence and having the rel link take precedence is probably the least bad possibility (b/c it makes more sense for the specific to override the general, and better to have it defined that any client potentially has to check for both than to incorrectly imply that either will do and they are guaranteed to be the same). |
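The "least bad" option described above (the page-level rel link overrides the origin-level well-known URL) could be sketched roughly as follows. This is only an illustration of that precedence rule, not something the proposal specifies; `effective_policy` and its parameters are hypothetical names.

```python
from typing import Optional


def effective_policy(page_link: Optional[str],
                     well_known_target: Optional[str]) -> Optional[str]:
    """Pick the policy URL a client would surface, assuming the specific
    (page-level) declaration overrides the general (origin-level) one."""
    if page_link:
        # A rel link on the current page wins when present.
        return page_link
    # Otherwise fall back to the well-known redirect target, which may be None.
    return well_known_target
```

Under this rule any client potentially has to check both sources, which is the cost called out in the comment.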
Hey @othermaciej, thanks for the feedback! While I think that sites generally have a single policy document they point to for their behavior on a given platform, I agree that the concern you and @bvandersloot-mozilla raise is reasonable. If we end up deciding that the link type is the only thing we need, great. :) Broadly, I have three kinds of answers for you:
So, I agree that the conflicts you're pointing to can and will happen. This seems to me like a policy problem whose risk we can mitigate at that layer by defining expectations around this mechanism more clearly, and relying on non-technical actors in the ecosystem to help us create incentives for correct usage. |
+1
+1, I think a
Just to try to add some clarity here, FedCM has defined an uncredentialed
This is currently used in the FedCM UI on sign-up (when the user is creating a new account on the website). This mechanism is different from what's being proposed here in a few ways:
It is hard to tell with a lot of confidence right now, but I have an intuition that the mechanism proposed here could be made to augment the FedCM UX indeed, as @kdenhartog points out.
One consideration here is that, for browsers (as opposed to crawlers), loading and checking for the existence of a |
As suggested by kdenhartog@ in [1]. [1]: privacycg/proposals#39 (comment)
That's what we had in mind. One of the use cases here is that browser UI could call out the ToS/privacy policy links to make these easier to find. The other thing that we're interested in experimenting with here is using a local LLM to be able to parse the privacy policy and terms of service and flag any concerning issues to the user. Obviously LLMs are a bit finicky for something like this so it's more just an experiment at this point, but having to manually maintain a list for this seemed a bit more of a headache than something like this as an option. |
This PR defines a `terms-of-service` link type that refers to a document which contains information about the agreements between a document's provider and users who wish to use the document provided. This link type was initially discussed in privacycg/proposals#39 and initially sketched in https://mikewest.github.io/privacy-policy-discovery/.
A browser looking in the current page for the appropriate link type, rather than loading another resource, would be the most appropriate step to take.
This is exactly the use case for a rel-link. Forcing the UA to go out of band to double-check that a well-known resource doesn't exist before being certain that a privacy policy doesn't exist for a page is undesirable. |
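As a rough sketch of the in-band check, here is how a client could pull these link types out of markup it already has, without any extra fetch. It uses Python's standard `html.parser`; the class and function names are made up for illustration, and restricting the search to `<link>` and `<a>` elements is an assumption.

```python
from html.parser import HTMLParser


class PolicyLinkFinder(HTMLParser):
    """Collects the first href seen for each policy-related link type."""

    def __init__(self, link_types=("privacy-policy", "terms-of-service")):
        super().__init__()
        self.link_types = set(link_types)
        self.found = {}  # link type -> first href seen

    def handle_starttag(self, tag, attrs):
        # Assumption: only <link> and <a> elements are inspected.
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated token list, compared case-insensitively.
        for token in (attributes.get("rel") or "").lower().split():
            if token in self.link_types:
                self.found.setdefault(token, href)


def find_policy_links(html):
    """Return a dict mapping each link type found in the page to its href."""
    finder = PolicyLinkFinder()
    finder.feed(html)
    return finder.found
```

An empty result only says the page declares nothing in-band; it says nothing about whether a well-known resource exists.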
I agree with @bvandersloot-mozilla that browsers currently rendering pages are quite likely to be able to extract metadata like this from the page they're currently rendering. I likewise agree that That said, there are certainly UI use cases that benefit from out-of-band checks with scope broader than an arbitrary document on an origin (see e.g. https://www.w3.org/TR/change-password-url/), and I think the conversation above has outlined even some in-band cases in which a |
(It might be worthwhile to split the |
RFC 6903 defined |
I identified four existing ways of doing this when we last discussed it in ... 2013, apparently. For the use case of researchers or civil society who might want to discover and compare privacy policies en masse, there might be some advantage to |
Thanks for the link, Nick! I didn't realize this document existed (and I'm embarrassed that I didn't think to look at the IETF for link type definitions...). Thanks also for the pointer to earlier discussion. I'm glad the proposals here landed on the same names, and I'll update my doc and PRs to point to that document instead.
From talking with folks like https://checks.google.com/, establishing and encouraging a pattern through which a |
The privacy-policy link type refers to a document which contains information about the data collection and usage practices that apply to the current context. This link type was defined in section 4 of RFC 6903 (https://datatracker.ietf.org/doc/html/rfc6903#section-4), and rediscovered in a discussion at privacycg/proposals#39.
The terms-of-service link type refers to a document which contains information about the agreements between a document's provider and users who wish to use the document provided. This link type was initially defined in RFC 6903 section 5 (https://datatracker.ietf.org/doc/html/rfc6903#section-5), then rediscovered in a discussion in privacycg/proposals#39. See also https://mikewest.github.io/privacy-policy-discovery/.
It would be ideal if sites' privacy policies were more discoverable to users, their agents, and to crawlers. To that end, I'd suggest that we:

1. Pave the `rel=privacy-policy` cowpath (based on HTTP Archive data, this appears in at least 285,421 distinct documents) by defining a `privacy-policy` link type.
2. Define a well-known URL that redirects to a host's privacy policy (e.g. `/.well-known/privacy-policy`).

There's quite a bit that could be done beyond discovery of course, but these two steps seem small, simple, and relatively easy to adopt.
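For the second step, a non-browser client (a crawler, say) could derive the well-known URL from any page URL on the host. A minimal sketch, assuming the example path above ends up being the registered one; `well_known_policy_url` is a hypothetical helper, and following the redirect is left to whatever HTTP client is in use.

```python
from urllib.parse import urlsplit, urlunsplit

# Path taken from the example in the proposal; treat it as an assumption.
WELL_KNOWN_PATH = "/.well-known/privacy-policy"


def well_known_policy_url(page_url):
    """Build the well-known privacy policy URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))
```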
I've written this up in a little more detail at https://mikewest.github.io/privacy-policy-discovery/, but there's not much to that document beyond what's written here.
WDYT?