What implementation change, if any, do we need to make given the deprecation of the US Privacy String? #60
Possibly helpful links:
The short answer is that we will need to get the GPP string using the

Some more info on GPP: These are the current sections to choose from. Section 6 (uspv1) is the USPS we are used to, which will be deprecated. These new US sections contain much more information than the USPS (11-16 fields instead of 4 fields in the USPS). The new fields are listed below:
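For reference, the section IDs in the GPP spec at the time of this discussion can be sketched as a lookup table. This is a snapshot of the IAB's registry, which may grow; the helper name is illustrative:

```javascript
// Sketch: GPP section IDs per the IAB GPP specification (snapshot; the
// registry may change over time). Section 6 (uspv1) is the deprecated
// US Privacy String; section 7 (usnatv1) is the national US section.
const GPP_SECTIONS = {
  2: "tcfeuv2",  // TCF EU v2
  5: "tcfcav1",  // TCF Canada v1
  6: "uspv1",    // US Privacy String (deprecated)
  7: "usnatv1",  // US national section
  8: "uscav1",   // California
  9: "usvav1",   // Virginia
  10: "uscov1",  // Colorado
  11: "usutv1",  // Utah
  12: "usctv1",  // Connecticut
};

// Name the sections a CMP reports as applicable.
function nameSections(sectionIds) {
  return sectionIds.map((id) => GPP_SECTIONS[id] || `unknown(${id})`);
}
```

For example, a CMP reporting applicable sections `[7]` would resolve to `["usnatv1"]`.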
Excellent!
I went through the sites in the paper that were compliant with GPC. From that list, these sites implemented GPP:

Unfortunately, all of these sites use version 1.0 of the CMP API. In this version, the GPP string could be accessed using

And this for

This complicates things on our end a bit, as we can’t get the GPP string from the return object of

First, I’m going to focus on successfully pinging the CMP API from the extension. Using this, I’ll try to identify more sites that implemented GPP (if they exist) and check which version of the CMP API they use. If we want to look at the contents of the GPP strings for sites that use version 1.0, I'll use
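The v1.0 vs. v1.1 difference described above can be sketched as follows. The command names (`ping`, `getGPPData`) come from the IAB GPP CMP API spec; the mock CMP and the string value are illustrative stand-ins, not a real page's `window.__gpp`:

```javascript
// Sketch of handling both CMP API versions (per the IAB GPP CMP API spec):
// in v1.1 the ping return object carries the GPP string directly, while
// v1.0 requires a separate getGPPData call.
function getGppString(gpp, callback) {
  gpp("ping", (pingReturn, success) => {
    if (!success) return callback(null);
    if (pingReturn.gppVersion === "1.1") {
      // v1.1: the ping return already includes gppString.
      callback(pingReturn.gppString);
    } else {
      // v1.0: a second call is needed to obtain the string.
      gpp("getGPPData", (data, ok) => callback(ok ? data.gppString : null));
    }
  });
}

// Illustrative mock of a v1.0 CMP (the string value is hypothetical).
const mockV10 = (command, cb) => {
  if (command === "ping") cb({ gppVersion: "1.0" }, true);
  if (command === "getGPPData") cb({ gppString: "DBABTA~EXAMPLE" }, true);
};

getGppString(mockV10, (s) => console.log(s)); // logs the mock string
```

On a real page the first argument would be `window.__gpp`; the mock just makes the branching testable outside a browser.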
Yes, that is good. Could we also test every site with both
Yes, that would also work.
OK, let's explore that as well then. Especially if there is a gradual change over a longer period of time, it may be worth it to look for both (for a certain time, at least).
I think I may check the version using

Does that sound ok? I'll go ahead and add columns for the GPP string before and after GPC.
But wouldn't
No, |
Ah, very enlightening, indeed. :) Yes, the
- updating REST API
- adding GPP columns + injection script to check for GPP
- adding a counter so that a site can't start or halt analysis twice (some sites were loading multiple times, which resulted in multiple entries)
I crawled the last 774 sites of the initial set of ~2000 to test the last commit. It took ~32 s per site. The success rate was 98% when insecure certificates and error pages were excluded. I believe the rest of the failures were due to sites not loading (either initially or on the reload). 74 sites in this set had GPP implemented. Some details on the commit:
The injection script in the commit accommodates all 3 categories. For all sites that have implemented v1.1, if you call a default function (i.e.

Another thing I added was a counter for how many times analysis was started and stopped for each domain. With the counter, we can prevent analysis from starting or stopping twice (or more) in a row for a particular domain. I added this because I was getting multiple entries for some sites. I realized that the sites were somehow loading more than once (or at least the event listener that responds to “load” events was being triggered more than once), which was triggering the extension to start or stop analysis more than once for a particular domain. (The load events happen close enough together that the extension has not yet updated the variable that stores whether analysis is running.)

Out of the 774 sites I crawled, these are the sites that loaded multiple times: https://www.furniturerow.com/, https://www.ultrasurfing.com/, https://www.wm.com/, https://www.fee.org/, https://www.ammoland.com/, https://www.upmc.com/. It seems rare enough that it could just be a site issue, and with the counter, analysis runs normally.
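The counter guard described above can be sketched like this. The names and structure are illustrative, not taken from the actual extension code:

```javascript
// Sketch of the start/stop counter idea: a duplicate "load" event must not
// start analysis again while it is already running, and a duplicate unload
// must not stop it twice. Names are illustrative, not the extension's code.
const state = {}; // domain -> { started: <count>, stopped: <count> }

function shouldStart(domain) {
  const s = (state[domain] ||= { started: 0, stopped: 0 });
  if (s.started > s.stopped) return false; // analysis already running
  s.started++;
  return true; // caller may start analysis
}

function shouldStop(domain) {
  const s = (state[domain] ||= { started: 0, stopped: 0 });
  if (s.started <= s.stopped) return false; // analysis not running
  s.stopped++;
  return true; // caller may stop analysis
}
```

With this, two back-to-back load events for the same domain produce exactly one start, even if the "is analysis running" flag has not been updated yet.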
Excellent!
The IAB has a website that encodes and decodes GPP strings. The site uses this JS/TS package to do the encoding and decoding. The full package cannot run locally on a computer, as it must be able to access a window object in the browser. Since the package uses an Apache 2.0 license, I decided to start from the library, remove all code that we won't need (we only need to decode), and convert it to Python so it can easily integrate with our existing Colab notebooks. The Python code is now here in Drive, and there is example usage inside the Processing_Analysis_Data Colab.
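The first step the decoder performs is bit-level: each character of a GPP segment is looked up in the base64url ("web-safe base64") alphabet and expanded to six bits, per the IAB's encoding rules. A minimal sketch of that step, written in JavaScript here for consistency with the other snippets (the port mentioned above is Python):

```javascript
// Sketch: turn one base64url-encoded GPP segment into a bit string,
// 6 bits per character, per the IAB GPP encoding rules.
const B64URL =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function segmentToBits(segment) {
  return [...segment]
    .map((ch) => {
      const idx = B64URL.indexOf(ch);
      if (idx < 0) throw new Error(`invalid GPP character: ${ch}`);
      return idx.toString(2).padStart(6, "0"); // 6-bit big-endian group
    })
    .join("");
}
```

The fixed-width fields of each section are then read off this bit string according to that section's field list.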
The last test I did on GPP (of ~700 sites) found 76 sites with GPP strings. The graph below shows how frequently various combinations of sections were implemented in the GPP strings I found. As we discussed in the meeting, the states separate different types of information disclosure (i.e. Sale, Sharing, Targeted Advertising). CA's section uses Sale and Sharing, while all other states use Sale and Targeted Advertising. The national section includes all three. The values N/A, opted out, and not opted out correspond to the value inputs as described by the IAB

I separated the different opt outs into different graphs, shown below.
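The three values in the graphs correspond to the small integer codes the IAB defines for the opt-out fields in the US sections; a sketch of that mapping (the helper name is illustrative):

```javascript
// Sketch: the IAB US sections encode opt-out fields (e.g. SaleOptOut,
// TargetedAdvertisingOptOut) as small integers:
// 0 = not applicable, 1 = opted out, 2 = did not opt out.
const OPT_OUT_VALUES = { 0: "N/A", 1: "opted out", 2: "not opted out" };

function describeOptOut(value) {
  return OPT_OUT_VALUES[value] ?? `unknown(${value})`;
}
```

So a decoded SaleOptOut field of 1 would be counted in the "opted out" bucket of the graphs.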
I am wondering if we should consider the TCF EU/Canada sections as well. I could imagine some sites saying, "if we are receiving a GPC (or other privacy preference signal), let's just opt out the user everywhere."
What is the
So, there is no "profiling" despite being defined in some laws?
Nice analysis!
I can work on that.
CA, CO, CT, UT, and VA each have their own section that has fields specific to that state's laws. usvav1 is the section for Virginia. Most sites just chose to implement usnatv1 because then they cover all state laws, as usnatv1 is the union of the fields of all the state sections.
No, there is no field for profiling. Here is a complete list of fields.
Thanks, @katehausladen! And, to clarify, we have the logic for capturing the TCF string in the crawler, right?
Currently, the crawler gets TCF strings via the GPP string (if those sections are included in it).
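Pulling a TCF section out of a captured GPP string can be sketched as below. It assumes the list of section IDs has already been decoded (e.g. from the GPP header or the CMP's ping return), since the '~'-separated sections after the header appear in exactly that order; the function name and example string are hypothetical:

```javascript
// Sketch: a GPP string has the shape "header~section~section...", with the
// sections appearing in the order of the section IDs encoded in the header.
// sectionIds is assumed to be already decoded; this just pairs them up.
function splitGppSections(gppString, sectionIds) {
  const [, ...sections] = gppString.split("~"); // drop the header
  const result = {};
  sectionIds.forEach((id, i) => { result[id] = sections[i]; });
  return result;
}

// Hypothetical string carrying sections 2 (TCF EU v2) and 7 (usnat):
const parts = splitGppSections("HEADER~TCFPART~USNATPART", [2, 7]);
```

Here `parts[2]` would be the TCF EU v2 payload, which can then be decoded with the existing TCF logic.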
Ah, yes, right, that is good! |
There may be no profiling field because the opt-out right may not cover profiling, which may be defined in the laws for other reasons, or at least may not cover opting out via privacy preference signals. We would need to check the laws. (#59) It may also simply be the case that, as GPP is under development, the IAB has not yet added profiling.
This has been merged and completed. |
The US Privacy String (including the USPAPI) will be deprecated on September 30, 2023. The IAB is introducing their Global Privacy Platform (GPP), which is not to be confused with Global Privacy Control (GPC); the two are unrelated.
What would we need to change for our crawler to work in the GPP environment?
@katehausladen and @OliverWang13, can you look into that?