Privacy: ClearURLs #962

Closed
2 tasks done
TPS opened this issue Jun 23, 2024 · 15 comments
Labels
Feature Request New feature or request

Comments

@TPS

TPS commented Jun 23, 2024

Prerequisites

  • I checked the documentation and understood it;
  • I checked to make sure that this issue has not already been filed;

Problem description

The ClearURLs database might be transformed into a powerful privacy-enhancing filterlist &/or userscript.

Proposed solution

The specs @ https://docs.clearurls.xyz/latest/specs/rules/ would be necessary to transform this into something end-usable.

Additional information

Originally found via svenjacobs/leon#315 (reply in thread), where several interrelated projects are thinking of how to incorporate this database themselves.

@TPS added the Feature Request label Jun 23, 2024
@TPS
Author

TPS commented Jun 25, 2024

Definitely also see DandelionSprout/adfilt#163

@krystian3w
Contributor

krystian3w commented Jun 25, 2024

Added years ago: https://github.com/AdguardTeam/FiltersRegistry/tree/master/filters/ThirdParty/filter_251_LegitimateURLShortener - 65694c6 (#401)

@TPS
Author

TPS commented Jun 25, 2024

I do use LUS, but am hoping to improve coverage for these trackers.

Not identical, but now I think LUS is a derivative of ClearURLs (& probably other sources), so maybe this is a duplicate in some sense? If you'd comment on the relationship between the 2, @DandelionSprout, it'd help.

@iam-py-test

iam-py-test commented Jun 25, 2024

Conflict of interest disclaimer: I am the assistant maintainer of the Actually Legitimate URL Shortener Tool, and current maintainer of the ClearURLs for uBo list (I did not create the original ClearURLs for uBo list; credit for that goes to rustysnake)

DandelionSprout's LUS is a derivative of ClearURLs (& probably other sources)

It is not. While a few filters have been copied from elsewhere (with credit), most have been manually added either based on user reports or tracking parameters Imre (and I) found.
Thank you

@TPS
Author

TPS commented Jun 26, 2024

@iam-py-test Thanks very much for answering. 🙇🏾‍♂️ Could you comment on how different the contents of the 2 lists are from each other?

@iam-py-test

The Actually Legitimate URL Shortener, as described, is a variety of rules manually added by Imre (DandelionSprout) and me.
ClearURLs for uBo uses a Python script to convert the ClearURLs rules into a filterlist for uBlock Origin and AdGuard (basically what you requested here). There are a few modifications to remove problematic rules, but largely it's just the ClearURLs rules.
Thanks
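
As a rough illustration of the kind of conversion described above, here is a minimal Python sketch that turns parameter rules into `$removeparam` filters. It is not the actual ClearURLs for uBo script. The `PROVIDERS` structure, the `to_removeparam` and `convert` helpers, and the sample parameter names are all hypothetical; in particular, ClearURLs identifies providers with a regular expression (`urlPattern`), which this sketch assumes has already been reduced to a plain domain. The regex form assumes the filter engine matches the pattern against the `name=value` string.

```python
import re

# Hypothetical, simplified input. Real ClearURLs providers are matched by a
# regex (`urlPattern`); here each one is assumed to be reduced to a domain.
PROVIDERS = [
    {"domain": "example.com", "rules": ["ref", "utm_[a-z]*"]},
    {"domain": None, "rules": ["fbclid"]},  # None = global, applies to all sites
]

# A rule made only of these characters is treated as a literal parameter name.
PLAIN_NAME = re.compile(r"[A-Za-z0-9_-]+")


def to_removeparam(domain, rule):
    """Build one $removeparam filter from a domain (or None) and a rule."""
    if PLAIN_NAME.fullmatch(rule):
        modifier = f"$removeparam={rule}"
    else:
        # Regex rules are anchored so they match the whole parameter name.
        modifier = f"$removeparam=/^{rule}=/"
    return f"||{domain}^{modifier}" if domain else modifier


def convert(providers):
    """Flatten all providers into a list of filter lines."""
    return [to_removeparam(p["domain"], r) for p in providers for r in p["rules"]]


if __name__ == "__main__":
    print("\n".join(convert(PROVIDERS)))
```

A real converter would also have to deal with exceptions, redirections, and rules that break sites, which is where the manual modifications mentioned above come in.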

@DandelionSprout
Member

In theory, I could attempt to merge relevant entries from ClearURLs into LUS, which I can only presume would be a win-win for most parties.

@TPS
Author

TPS commented Jun 26, 2024

@DandelionSprout 🙇🏾‍♂️ Actually, if the contents are that different, it'd make sense to keep them separate, & offer each as AG options to supplement each other & AG's other Privacy filterlists. OTOH, if the included rules overlap significantly, then it would make sense to use 1 as another source for the other, to keep down duplication.

DandelionSprout added a commit to DandelionSprout/adfilt that referenced this issue Jun 26, 2024
Only 2 entries (excluding odd ones like `keywords`) in ClearURLs had not previously been in LUS. I'd call that a high success rate for LUS considering the latter's 80-ish entries for Amazon.
@DandelionSprout
Member

So, I ran a comparison this morning about whether ClearURLs had any coverage that LUS didn't. I decided to test with Amazon, a high-coverage site in both lists.

LUS had well above 80 entries for Amazon (70 of them being specific entries). Only 2 entries that made sense (i.e. not ones like `keywords` or `_encoding`) had been in ClearURLs but not in LUS.

Although I do have conflicts of interest in the matter, I'd say that at this point ClearURLs has been obliterated in comparison. I give iam-py-test full 100% rights to make the calls on the following, with no interference from me, but I personally am getting unsure if a ClearURLs list conversion would be considered necessary nowadays. 😓
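
The comparison described above amounts to a set difference with a filter for parameters that should not be removed. A minimal Python sketch follows, assuming each list's entries for a site have already been extracted into a set of parameter names. The `uncovered` helper and every `param_*` name are made-up placeholders, not the real Amazon entries from either list; only `keywords` and `_encoding` come from the discussion.

```python
def uncovered(candidate, reference, ignore=frozenset()):
    """Return parameters in `candidate` that `reference` lacks, minus ignored ones."""
    return sorted(set(candidate) - set(reference) - set(ignore))


# Placeholder data for illustration only.
clearurls_amazon = {"keywords", "_encoding", "param_a", "param_b", "param_c"}
lus_amazon = {"param_a", "param_x", "param_y"}

# Parameters that look functional and should not count as missing coverage.
functional = {"keywords", "_encoding"}

print(uncovered(clearurls_amazon, lus_amazon, functional))
# ['param_b', 'param_c']
```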

@TPS
Author

TPS commented Jun 27, 2024

That's reasonable methodology. Possible to be more comprehensive over domain variety, like StevenBlack/hosts#1181 (comment) is for TLD variety? I've a hunch that far-less-well-known sites than Amazon may have wider coverage on ClearURLs.

@iam-py-test

Possible to be more comprehensive over domain variety, like StevenBlack/hosts#1181 (comment)?

Given both lists have many global (applies to all websites) rules, measuring such coverage would be difficult.
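
To see why global rules complicate a per-domain comparison, consider this small Python sketch of how a cleaner might combine both kinds of rule. The rule tables and the `clean_url` helper are hypothetical and only model the idea; they are not taken from either list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule tables for illustration.
GLOBAL_PARAMS = {"utm_source", "fbclid"}   # removed on every site
SITE_PARAMS = {"example.com": {"ref"}}     # removed only on matching hosts


def clean_url(url):
    """Strip global parameters everywhere and site parameters on matching hosts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    blocked = set(GLOBAL_PARAMS)
    for domain, params in SITE_PARAMS.items():
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            blocked |= params
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in blocked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_url("https://www.example.com/item?id=7&ref=abc&utm_source=mail"))
# https://www.example.com/item?id=7
```

Counting only the entries under `example.com` would report one rule, yet three parameters are actually covered on that site, so a domain-by-domain tally understates a list that leans on global rules.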

@krystian3w
Contributor

krystian3w commented Jun 27, 2024

It is definitely worth testing which exception rules deactivate global removeparam rules (AdGuard only):

removeparam rules can also be disabled by $document and $urlblock exception rules. But basic exception rules without modifiers do not do that. For example, @@||example.com^ will not disable $removeparam=p for requests to example.com, but @@||example.com^$urlblock will.

In that case, a userscript ("user.js") with an API to edit parameters will probably work better on sites where removeparam is disabled this way.

https://adguard.com/kb/general/ad-filtering/create-own-filters/#urlblock-modifier
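
The behaviour quoted above can be summarised as a small decision function. This Python sketch is a toy model of that one paragraph, not AdGuard's implementation: the `disables_removeparam` name is hypothetical, and it only recognises the two modifiers named in the quote.

```python
# Modifiers that, per the quoted documentation, make an exception rule
# switch off removeparam for matching requests.
DISABLING_MODIFIERS = {"document", "urlblock"}


def disables_removeparam(rule):
    """Return True if an exception rule would disable $removeparam (toy model)."""
    if not rule.startswith("@@"):
        return False  # not an exception rule at all
    _, separator, modifiers = rule.partition("$")
    if not separator:
        return False  # basic exception without modifiers
    names = {m.split("=", 1)[0].strip() for m in modifiers.split(",")}
    return bool(names & DISABLING_MODIFIERS)


for rule in ("@@||example.com^", "@@||example.com^$urlblock"):
    print(rule, disables_removeparam(rule))
# @@||example.com^ False
# @@||example.com^$urlblock True
```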

@zloyden
Contributor

zloyden commented Dec 6, 2024

Hi! According to our rules, a filter should be oriented towards browser content blockers, like the Legitimate URL Shortener mentioned here.

@zloyden closed this as not planned Dec 6, 2024
@krystian3w
Contributor

Since one list currently pulls in 100% of the other's rules, it is a bit like that (I have not checked how it is done, for example whether the script removes duplicates before the list update is published).

The only thing that worries me is something like the way cosmetic filters are deactivated on HTTPS-sensitive sites: here, too, rules can deactivate the removal of parameters completely.

5 participants