Migrate domains blocked by __cfduid cookies. #1538

Closed
cowlicks opened this issue Jul 29, 2017 · 9 comments · Fixed by #2536
Labels
broken site bug heuristic Badger's core learning-what-to-block functionality important migrations Badger user data modifications

Comments

@cowlicks
Contributor

Users might still have many domains blocked because of the Cloudflare __cfduid cookie. We should unblock these.

@ghostwords
Member

ghostwords commented Jul 31, 2017

This follows up on #1361 and #1533 (comment).

We should first figure out whether this is a real problem and how big it is. One possible approach: of all the domains in error reports/GitHub issues submitted using version 2017.5.9 (when we started ignoring __cfduid) or later, how many do nothing except set/get __cfduid cookies?

@ghostwords ghostwords added bug heuristic Badger's core learning-what-to-block functionality labels Jul 31, 2017
@cowlicks
Contributor Author

cowlicks commented Aug 2, 2017

Instead of doing that, we could look at the blocked domains and check whether each one has only a __cfduid cookie. If so, the domain should be unblocked, regardless of what error reports we have. If there is another reason it was blocked, Privacy Badger will eventually block it again.
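
A minimal sketch of that check, written in Python rather than Privacy Badger's own code. The `blocked` mapping (blocked domain to the set of cookie names observed for it) and the function name are hypothetical stand-ins; the extension's actual storage format is not shown in this thread.

```python
CFDUID = "__cfduid"


def domains_to_unblock(blocked):
    """Return blocked domains whose only observed cookie is __cfduid.

    `blocked` is a hypothetical mapping of domain -> set of cookie names
    seen for that domain; it stands in for whatever Privacy Badger stores.
    """
    return sorted(
        domain
        for domain, cookie_names in blocked.items()
        if set(cookie_names) == {CFDUID}
    )


if __name__ == "__main__":
    example = {
        "cdn.example.com": {"__cfduid"},
        "tracker.example.net": {"__cfduid", "uid"},
        "static.example.org": set(),
    }
    print(domains_to_unblock(example))
```

Domains with any other cookie stay blocked, and domains with no recorded cookies are left alone, since something other than __cfduid must have caused the block.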

@ghostwords
Member

This sounds reasonable, as an implementation approach. We should get a number of specific cases to verify the migration though. Can that cookie db site (https://cookiepedia.co.uk/?) help?

@cowlicks
Contributor Author

cowlicks commented Aug 3, 2017

From here.


About this cookie:
Cookie associated with sites using CloudFlare, used to speed up page load times. According to CloudFlare it is used to override any security restrictions based on the IP address the visitor is coming from. It does not contain any user identification information.

The main purpose of this cookie is: Strictly Necessary

Key numbers for __cfduid:
Cookies with this name have been found on 14,461 websites, set by 9,650 host domains.

It has been found as a First Party cookie on 5,466 websites and a Third Party cookie on 18,290 websites.

It has been found as a Persistent cookie on 23,740 websites, with an average life span of 2,262 days.

It has been found as a Session cookie on 16 websites.

@ghostwords
Member

ghostwords commented Sep 22, 2017

First, let's get the full list of blocked domains from error reports from Privacy Badger versions after the Cloudflare workaround went out:

-- Helper table of sequence numbers 1..N, where N is the largest number of
-- comma-separated domains in the `block` column of any matching report.
-- Report ids are borrowed as the sequence, so this relies on ids 1..N existing.
DROP TABLE IF EXISTS numbers;
CREATE TEMPORARY TABLE numbers AS (
  SELECT id FROM reports WHERE id <= (
    SELECT MAX(ROUND((LENGTH(block) - LENGTH(REPLACE(block, ",", ""))) / LENGTH(",")) + 1) AS max_split_length
    FROM reports WHERE version IN (
      "2017.5.9", "2017.6.13", "2017.6.13.1", "2017.7.24", "2017.9.12", "2017.9.12.1"
    )
  )
);
-- Split each report's `block` list into one row per domain, then deduplicate.
SELECT blocked_fqdn FROM (
  SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(reports.block, ",", numbers.id), ",", -1) AS blocked_fqdn
  FROM numbers
  INNER JOIN reports ON CHAR_LENGTH(reports.block) - CHAR_LENGTH(REPLACE(reports.block, ",", "")) >= numbers.id-1
  WHERE version IN (
    "2017.5.9", "2017.6.13", "2017.6.13.1", "2017.7.24", "2017.9.12", "2017.9.12.1"
  )
) AS tmp GROUP BY blocked_fqdn;
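
For comparison, here is a rough Python sketch of the same splitting step. It assumes the `block` column values have already been fetched and filtered by version; the function name and input shape are hypothetical. Unlike the query above, which only deduplicates, this also keeps a count per domain.

```python
from collections import Counter


def count_blocked_fqdns(block_columns):
    """Count how often each domain appears across comma-separated lists.

    `block_columns` is a hypothetical iterable of strings, one per report,
    each holding that report's comma-separated blocked domains.
    """
    counts = Counter()
    for block in block_columns:
        for fqdn in block.split(","):
            fqdn = fqdn.strip()
            if fqdn:  # skip empty entries from empty or trailing-comma lists
                counts[fqdn] += 1
    return counts


if __name__ == "__main__":
    rows = ["a.example,b.example", "a.example", ""]
    for fqdn, count in count_blocked_fqdns(rows).most_common():
        print(count, fqdn)
```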

Then, let's request "/" on each domain (curl -I sends a HEAD request) and see if __cfduid is in the response headers:

for domain in $(cat domains.txt); do curl -Iv -m10 "$domain" 2>&1 | grep -q __cfduid && echo "$domain"; done
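
The grep above matches __cfduid anywhere in curl's output. A somewhat stricter check, sketched in Python under the assumption that the response header lines have already been captured as strings (the function name is hypothetical), would only accept a Set-Cookie header whose cookie name is exactly __cfduid:

```python
def sets_cfduid(header_lines):
    """Return True if any Set-Cookie header sets a cookie named __cfduid."""
    for line in header_lines:
        name, sep, value = line.partition(":")
        # Skip the status line and any header other than Set-Cookie.
        if not sep or name.strip().lower() != "set-cookie":
            continue
        # The cookie name is everything before the first "=" in the value.
        cookie_name = value.split("=", 1)[0].strip()
        if cookie_name == "__cfduid":
            return True
    return False


if __name__ == "__main__":
    headers = [
        "HTTP/1.1 200 OK",
        "Set-Cookie: __cfduid=abc123; path=/; HttpOnly",
        "Server: cloudflare",
    ]
    print(sets_cfduid(headers))
```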

@ghostwords ghostwords added the migrations Badger user data modifications label Sep 22, 2017
@ghostwords
Member

ghostwords commented Sep 22, 2017

Here is the list of Cloudflare-using domains reported as blocked in error reports from Badger versions 2017.5.9 and above, ordered by prevalence:

+-------+-----------------------------------------+
| count | blocked_fqdn                            |
+-------+-----------------------------------------+
|   326 | cdn-images-1.medium.com                 |
|   286 | cdn-static-1.medium.com                 |
|    98 | tru.am                                  |
|    96 | cdn.onesignal.com                       |
|    76 | edge.alluremedia.com.au                 |
|    72 | cdn.viglink.com                         |
|    63 | rum-static.pingdom.net                  |
|    51 | www.lightboxcdn.com                     |
|    49 | www.dianomi.com                         |
|    37 | cdn.inspectlet.com                      |
|    33 | browser-update.org                      |
|    31 | www.npttech.com                         |
|    30 | static.adzerk.net                       |
|    29 | try.abtasty.com                         |
|    28 | a.fsdn.com                              |
|    28 | static.addtoany.com                     |
|    27 | experience.tinypass.com                 |
|    26 | prebid.districtm.ca                     |
|    24 | p.bm23.com                              |
|    23 | cdn.tinypass.com                        |
|    22 | js.bronto.com                           |
|    18 | widget.uservoice.com                    |
|    18 | cdn.pbbl.co                             |
|    17 | stats.lotlinx.com                       |
|    14 | geoservice.curse.com                    |
|    14 | freegeoip.net                           |
|    12 | cdn.pubexchange.com                     |
|    11 | tag.navdmp.com                          |
|    11 | secure.statcounter.com                  |
|    11 | domain157.club                          |
|    11 | cdn.tynt.com                            |
|    10 | static.mijnwebwinkel.nl                 |
|    10 | asset.mijnwebwinkel.nl                  |
|    10 | analytics.codigo.se                     |
|    10 | files.imbox.io                          |
|     9 | cdn.districtm.ca                        |
|     9 | c.statcounter.com                       |
|     9 | rangeblessedness.men                    |
|     9 | www.marinetraffic.com                   |
|     8 | static.impresa.pt                       |
|     8 | dashboard.tinypass.com                  |
|     8 | cdn.datatables.net                      |
|     8 | onesignal.com                           |
|     8 | westen-r.life                           |
                        ...
|     1 | promo.bitmedia.io                       |
|     1 | livenewschat.os.tc                      |
+-------+-----------------------------------------+

Full list without the counts:
cloudflare.txt

@ghostwords
Member

ghostwords commented Sep 22, 2017

The same list but grouped by version:

+-------+-------------+
| count | version     |
+-------+-------------+
|    43 | 2017.9.12.1 |
|    85 | 2017.9.12   |
|   626 | 2017.7.24   |
|   671 | 2017.6.13.1 |
|     8 | 2017.6.13   |
|  1096 | 2017.5.9    |
+-------+-------------+

Just the Medium domains grouped by version:

+-------+-------------+
| count | version     |
+-------+-------------+
|     0 | 2017.9.12.1 |
|     5 | 2017.9.12   |
|    80 | 2017.7.24   |
|   155 | 2017.6.13.1 |
|     2 | 2017.6.13   |
|   381 | 2017.5.9    |
+-------+-------------+

@ghostwords
Member

ghostwords commented Sep 22, 2017

Next, we should see which of the top reported domains are only here because of __cfduid cookies and are causing site breakage.

@ghostwords
Member

ghostwords commented Oct 5, 2017

edge.alluremedia.com.au is one. New Badgers don't seem to learn to block it, but old Badgers still run into issues. Here are error report counts by month for reports where this domain is blocked:

+---------+----------+
| ym      | count(*) |
+---------+----------+
| 2017-10 |        1 |
| 2017-09 |        9 |
| 2017-08 |       14 |
| 2017-07 |       19 |
| 2017-06 |       21 |
| 2017-05 |       25 |
| 2017-04 |       39 |
| 2017-03 |       16 |
| 2017-02 |       22 |
| 2017-01 |       15 |
| 2016-12 |       11 |
| 2016-11 |       13 |
| 2016-10 |       15 |
| 2016-09 |        9 |
| 2016-08 |        9 |
| 2016-07 |        6 |
| 2016-06 |        4 |
| 2016-05 |        3 |
| 2016-04 |       10 |
| 2016-03 |       17 |
| 2016-02 |       16 |
| 2016-01 |       12 |
| 2015-12 |       14 |
| 2015-11 |       15 |
| 2015-10 |       13 |
| 2015-09 |       11 |
| 2015-08 |       10 |
| 2015-07 |        1 |
+---------+----------+
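
A per-month breakdown like this one could be computed along these lines. This is only a sketch: the `(iso_date, block)` pairs are a hypothetical shape for the report data, not the actual schema, and the query that produced the table is not shown in this thread.

```python
from collections import Counter


def monthly_counts(reports, domain):
    """Count reports per 'YYYY-MM' in which `domain` is listed as blocked.

    `reports` is a hypothetical iterable of (iso_date, block) pairs, where
    `block` is the comma-separated list of blocked domains for that report.
    """
    counts = Counter()
    for iso_date, block in reports:
        blocked = {d.strip() for d in block.split(",")}
        if domain in blocked:
            counts[iso_date[:7]] += 1  # "2017-09-03" -> "2017-09"
    # Newest month first, matching the table's ordering.
    return sorted(counts.items(), reverse=True)


if __name__ == "__main__":
    sample = [
        ("2017-10-01", "edge.alluremedia.com.au,cdn.example.com"),
        ("2017-09-03", "edge.alluremedia.com.au"),
        ("2017-09-21", "other.example.com"),
    ]
    print(monthly_counts(sample, "edge.alluremedia.com.au"))
```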

We released #1361 with Privacy Badger version 2017.5.9. I guess affected users either whitelisted the affected site, unblocked the relevant domain, or uninstalled Privacy Badger.

@ghostwords ghostwords changed the title Migrate domains blocked by __cfduid cookies. Migrate domains blocked by __cfduid cookies. Feb 12, 2019

2 participants