CNAME trackers identified via HTTP Archive crawls #102

Closed
max-ostapenko opened this issue Jul 26, 2024 · 1 comment

max-ostapenko commented Jul 26, 2024

Based on an analysis of HTTP Archive data, a number of CNAME trackers are missing from this list.

Some examples:

| Company | CNAME domain | Disguised hostname examples |
| --- | --- | --- |
| utiq.com | utiq-aws.net | utiq.kreiszeitung.de, utiq.aufeminin.com |
| actioniq.com | mr-in.com | aiq-in.ext.hp.com, aiq-in.skechers.com |
| truedata.co | truedata.co | td.popularmechanics.com, td.esquire.com |

Is it possible to extend this list?
What information is required?

The data is public in BigQuery, so here is a query for context:

CREATE TEMP FUNCTION convert_cname_json(json_str STRING)
RETURNS ARRAY<STRUCT<hostname STRING, cname STRING>>
LANGUAGE js AS """
  const obj = JSON.parse(json_str);
  const result = [];
  for (const key in obj) {
    result.push({
      hostname: key,
      cname: obj[key]
    });
  }
  return result;
""";

WITH adguard_trackers AS (
  SELECT hostname
  FROM UNNEST(["cz.affilbox.cz", "pl02.prolitteris.2cnt.net", "a8.net", "mm.actionlink.jp", "ebis.ne.jp", "0i0i0i0.com", "ads.bid", "at-o.net", "actonservice.com", "actonsoftware.com", "2o7.net", "data.adobedc.net", "sc.adobedc.net", "sc.omtrdc.net", "adocean.pl", "aquaplatform.com", "cdn18685953.ahacdn.me", "thirdparty.bnc.lt", "api.clickaine.com", "tagcommander.com", "track.sp.crdl.io", "dnsdelegation.io", "storetail.io", "e.customeriomail.com", "dataunlocker.com", "monopoly-drain.ga", "friendly-community.tk", "nc0.co", "eulerian.net", "extole.com", "extole.io", "fathomdns.com", "genieespv.jp", "ad-cloud.jp", "goatcounter.com", "heleric.com", "iocnt.net", "affex.org", "k.keyade.com", "ghochv3eng.trafficmanager.net", "online-metrix.net", "logly.co.jp", "mailgun.org", "ab1n.net", "ntv.io", "ntvpforever.com", "postrelease.com", "non.li", "tracking.bp01.net", "t.eloqua.com", "oghub.io", "go.pardot.com", "parsely.com", "custom.plausible.io", "popcashjs.b-cdn.net", "rdtk.io", "sailthru.com", "exacttarget.com", "a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com", "a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com", "a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com", "afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com", "webtrekk.net", "wt-eu02.net", "ak-is2.net", "wizaly.com"]) AS hostname
), cnames AS (
  SELECT
    cnames.cname AS cname,
    cnames.hostname AS hostname,
    page
  FROM `httparchive.all.pages`,
  UNNEST(convert_cname_json(JSON_QUERY(custom_metrics, '$.privacy.request_hostnames_with_cname'))) AS cnames
  WHERE
    date = '2024-06-01' AND
    is_root_page = TRUE
)

SELECT
  NET.REG_DOMAIN(cnames.cname) AS cname,
  NET.REG_DOMAIN(adguard_trackers.hostname) AS adguard_cname,
  COUNT(DISTINCT NET.REG_DOMAIN(cnames.hostname)) AS request_domain_count,
  ARRAY_AGG(DISTINCT cnames.hostname LIMIT 2) AS request_domain_examples,
  ARRAY_AGG(DISTINCT page LIMIT 2) AS page_examples,
FROM cnames
LEFT JOIN adguard_trackers
ON cnames.cname LIKE CONCAT('%', adguard_trackers.hostname, '%')
GROUP BY cname, adguard_cname
HAVING request_domain_count > 100
ORDER BY request_domain_count DESC
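
For readers without BigQuery access, the core of the query can be sketched in plain JavaScript. This is only an illustration under assumptions: `convertCnameJson` mirrors the UDF above, `matchTracker` approximates the `LIKE CONCAT('%', ..., '%')` join condition with a substring test and returns only the first match, and `findUnlisted` is a hypothetical helper that is not part of the query. The hostnames in `sample` are made up (`.test` domains), and the sketch skips `NET.REG_DOMAIN` and the aggregation step entirely.

```javascript
// Mirrors convert_cname_json from the query: turn a {hostname: cname}
// JSON object into an array of {hostname, cname} records.
function convertCnameJson(jsonStr) {
  const obj = JSON.parse(jsonStr);
  return Object.entries(obj).map(([hostname, cname]) => ({ hostname, cname }));
}

// Approximates the join condition cname LIKE '%tracker%' with a substring
// test. Returns the first listed tracker contained in the CNAME, or null.
function matchTracker(cname, trackers) {
  return trackers.find((tracker) => cname.includes(tracker)) ?? null;
}

// Hypothetical helper: keep only the records whose CNAME target matches
// no entry in the tracker list, i.e. candidates for extending the list.
function findUnlisted(jsonStr, trackers) {
  return convertCnameJson(jsonStr).filter(
    ({ cname }) => matchTracker(cname, trackers) === null
  );
}

// Made-up input in the shape of the request_hostnames_with_cname metric.
const sample = JSON.stringify({
  "td.news-site.test": "collect.truedata.co",
  "metrics.shop-site.test": "shop-site.sc.omtrdc.net",
});

// Two entries taken from the tracker list used in the query above.
const listed = ["sc.omtrdc.net", "2o7.net"];

console.log(findUnlisted(sample, listed));
```

With this input, only the record pointing at `truedata.co` is reported, because `sc.omtrdc.net` is already on the list. A substring match is deliberately loose, so in practice the results would still need a manual review.
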
adguard pushed a commit that referenced this issue Aug 1, 2024
Squashed commit of the following:

commit eadcad9
Author: jellizaveta <e.egorova@adguard.com>
Date:   Thu Aug 1 16:35:25 2024 +0300

    fixed the tracking domain

commit 5374723
Author: jellizaveta <e.egorova@adguard.com>
Date:   Wed Jul 31 18:39:31 2024 +0300

    Add Etracker

commit 4af922a
Author: jellizaveta <e.egorova@adguard.com>
Date:   Wed Jul 31 17:54:45 2024 +0300

    Add ActionIQ, TrueData, Utiq to config.#102
@jellizaveta
Contributor

Hello. ActionIQ, TrueData, Utiq, and Etracker (which I found while researching your request) have been added to the config.

Thank you for your contribution!
