Invalid output #3
Indeed, before this patch we weren't decoding the following cases:
* |http://example.com/*
* |http://example.org^
This patch fixes PyFunceble/adblock-decoder#3.
Contributors:
* @smed79
@smed79 please review the test cases before I deploy/release my change: funilrys/PyFunceble@d32914b#diff-6fbb548d14d904b48cdaa09ea8c1ca04249d69cef0763217ac957605c50548a6R278-R326 Let me know if I missed a test case. Stay safe and healthy!
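For illustration, the two cases named in the patch could be decoded roughly as follows. This is a minimal sketch under stated assumptions, not PyFunceble's actual decoder: `host_from_anchored_filter` is a hypothetical helper that only handles filters starting with a single `|` followed by a URL.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_from_anchored_filter(rule: str) -> Optional[str]:
    """Return the hostname of a filter such as '|http://example.org^'.

    Hypothetical helper for illustration only. Filters that do not start
    with exactly one '|' are ignored and yield None.
    """
    if not rule.startswith("|") or rule.startswith("||"):
        return None
    # Drop the leading anchor and any '$options' suffix.
    address = rule[1:].split("$", 1)[0]
    # '^' and '*' are adblock separator/wildcard characters, not URL parts.
    for marker in ("^", "*"):
        address = address.split(marker, 1)[0]
    # urlsplit only reports a hostname when a scheme such as http:// is present.
    return urlsplit(address).hostname


if __name__ == "__main__":
    for rule in ("|http://example.com/*", "|http://example.org^"):
        print(rule, "->", host_from_anchored_filter(rule))
```

Filters without a scheme, such as `|github.io|`, return `None` here; handling them would need extra rules that this sketch leaves out.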
1st,
There is no such case in Adblock (Plus) syntax. Blocked requests (files or domains) cannot be separated by a comma, so the correct syntax can only be
or
or we have to use a regex rule, as below
2nd, excuse my ignorance, I have a question ...
What is the intended behavior? Extracting all domains for testing purposes (ACTIVE, INACTIVE or INVALID) <-- Case 1, or extracting only domains that are safe to block? <-- Case 2. For the second case (safe), the tool has to extract only domains that are flagged with the
I mean
more aggressive, include popups filters
clean output (with 0.0.0.0)
@smed79, I don't create such complex lists on my own, so I'm happy to have input from the community.
That's good to know. Will be fixed.
Actually both. But I'm willing to make some changes. Please keep in mind that the adblock-decoder is actually a wrapper around the functionalities of PyFunceble. What you describe as
That's interesting. If everyone (cc: @Yuki2718 | @ryanbr | please flag others) agrees on that, I can only see improvement. I (and probably the community too) would be grateful if you could take the time to check the test cases and let me know:
I'll then follow up with a complete rewrite of the decoder module.
TBH I don't understand what the issue is. I see
only - what's wrong with extracting
So I changed
In the other cases we are targeting a specific file¹, folder², request type³ (image, script ...) or applying the filter to a specific⁴ website.
||ttt.com/*/ban.js <--¹
||sss.com^*/img/ <--²
||uuu.com/*$script <--³
||eee.com/*$image,domain=fff.com <--⁴
So in the above example, the output should not include if an adblock list has the filter
@mapx- @okiehsch @Alex-302 @AdamWr @Khrin any test/comment will be appreciated (if you have some free time, of course).
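The distinction drawn above (rules that block a whole domain versus rules that target a file, folder, request type or specific website) could be sketched as below. The function name `safe_domain` and the `BROAD_OPTIONS` allow-list are assumptions made for this illustration; the allow-list is only inferred from the test cases in this thread that keep `$all` and `$third-party` rules in `standard`. It is not the decoder's real logic.

```python
import re
from typing import Optional

# Options assumed not to narrow a rule to a request type or a website.
# ASSUMPTION: inferred from the thread's test cases, not an official list.
BROAD_OPTIONS = {"all", "third-party"}

# '||domain' optionally followed by a single '^', with nothing after it.
WHOLE_DOMAIN = re.compile(r"^\|\|([a-z0-9.-]+)\^?$", re.IGNORECASE)


def safe_domain(rule: str) -> Optional[str]:
    """Return the domain if the rule blocks it entirely, else None."""
    pattern, _, options = rule.partition("$")
    # Any option outside the allow-list (script, image, domain=...) narrows
    # the rule, so the domain is not considered safe to block as a whole.
    if options and not set(options.split(",")) <= BROAD_OPTIONS:
        return None
    match = WHOLE_DOMAIN.match(pattern)
    return match.group(1) if match else None


if __name__ == "__main__":
    for rule in (
        "||ad.example.co.uk^",
        "||ttt.com/*/ban.js",
        "||sss.com^*/img/",
        "||uuu.com/*$script",
        "||eee.com/*$image,domain=fff.com",
    ):
        print(rule, "->", safe_domain(rule))
```

With this approach, only the first rule in the usage loop yields a domain; the four footnoted examples all return `None`.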
@funilrys see my comments before the
[
{
"subject": '##[href^="https://funceble.funilrys.com/"]',
"expected": {
"aggressive": ["funceble.funilrys.com"],
"standard": [],
},
},
{
"subject": "||test.hello.world^$domain=hello.world",
"expected": {
"aggressive": ["hello.world", "test.hello.world"],
"standard": ["test.hello.world"], # should be null because the filter is applied to a specific website
},
},
{
"subject": '##div[href^="http://funilrys.com/"]',
"expected": {"aggressive": ["funilrys.com"], "standard": []},
},
{
"subject": 'com##[href^="ftp://funceble.funilrys-funceble.com/"]',
"expected": {
"aggressive": ["funceble.funilrys-funceble.com"],
"standard": [],
},
},
{
"subject": "!@@||funceble.world/js",
"expected": {"aggressive": [], "standard": []},
},
{
"subject": "!||world.hello/*ad.xml",
"expected": {"aggressive": [], "standard": []},
},
{
"subject": "!funilrys.com##body",
"expected": {"aggressive": [], "standard": []},
},
{
"subject": "[AdBlock Plus 2.0]",
"expected": {"aggressive": [], "standard": []},
},
{
"subject": "@@||ads.example.com/notbanner^$~script",
"expected": {"aggressive": ["ads.example.com"], "standard": []},
},
{"subject": "/banner/*/img^", "expected": {"aggressive": [], "standard": []}},
{
"subject": "||ad.example.co.uk^",
"expected": {
"aggressive": ["ad.example.co.uk"],
"standard": ["ad.example.co.uk"],
},
},
{
"subject": "||ad.example.fr^$image,test",
"expected": {
"aggressive": ["ad.example.fr"],
"standard": ["ad.example.fr"], # should be null because we are targeting a specific request type
},
},
{
"subject": "||api.funilrys.com/widget/$",
"expected": {
"aggressive": ["api.funilrys.com"],
"standard": ["api.funilrys.com"], # should be null because we are targeting a specific file/folder
},
},
{
"subject": "||api.example.com/papi/action$popup",
"expected": {
"aggressive": ["api.example.com"],
"standard": ["api.example.com"], # should be null because we are targeting a specific request type
},
},
{
"subject": "||funilrys.github.io$script,image",
"expected": {
"aggressive": ["funilrys.github.io"],
"standard": ["funilrys.github.io"], # should be null because we are targeting a specific request type
},
},
{
"subject": "||example.net^$script,image",
"expected": {"aggressive": ["example.net"],
"standard": ["example.net"]}, # should be null because we are targeting a specific request type
},
{
"subject": "||static.hello.world.examoke.org/*/exit-banner.js",
"expected": {
"aggressive": ["static.hello.world.examoke.org"],
"standard": ["static.hello.world.examoke.org"], # should be null because we are targeting a specific file
},
},
{
"subject": "$domain=exam.pl|elpmaxe.pl|example.pl",
"expected": {
"aggressive": ["elpmaxe.pl", "exam.pl", "example.pl"],
"standard": [],
},
},
{
"subject": "||example.de^helloworld.com", # unlikely scenario to have a similar filter case
"expected": {
"aggressive": ["example.de"],
"standard": ["example.de"],
},
},
{
"subject": "|github.io|", # unlikely scenario
"expected": {"aggressive": ["github.io"], "standard": ["github.io"]},
},
{
"subject": "~github.com,hello.world##.wrapper",
"expected": {"aggressive": ["github.com", "hello.world"], "standard": []},
},
{
"subject": "bing.com,bingo.com#@##adBanner",
"expected": {"aggressive": ["bing.com", "bingo.com"], "standard": []},
},
{
"subject": "example.org#@##test",
"expected": {"aggressive": ["example.org"], "standard": []},
},
{
"subject": "hubgit.com|oohay.com|ipa.elloh.dlorw#@#awesomeWorld", # incorrect filter (for element hiding rules, domains are separated with commas)
"expected": {
"aggressive": ["hubgit.com|oohay.com|ipa.elloh.dlorw"],
"standard": [],
},
},
{"subject": ".com", "expected": {"aggressive": [], "standard": []}},
{
"subject": "||ggggggggggg.gq^$all",
"expected": {
"aggressive": ["ggggggggggg.gq"],
"standard": ["ggggggggggg.gq"],
},
},
{
"subject": "facebook.com##.search",
"expected": {"aggressive": ["facebook.com"], "standard": []},
},
{
"subject": "||test.hello.world^$domain=hello.world",
"expected": {
"aggressive": ["hello.world", "test.hello.world"],
"standard": ["test.hello.world"], # should be null because the filter is applied to a specific website
},
},
{
"subject": "||examplae.com",
"expected": {"aggressive": ["examplae.com"], "standard": ["examplae.com"]},
},
{
"subject": "||examplbe.com^",
"expected": {"aggressive": ["examplbe.com"], "standard": ["examplbe.com"]},
},
{
"subject": "||examplce.com$third-party",
"expected": {"aggressive": ["examplce.com"], "standard": ["examplce.com"]},
},
{
"subject": "||examplde.com^$third-party",
"expected": {"aggressive": ["examplde.com"], "standard": ["examplde.com"]},
},
{
"subject": '##[href^="https://examplee.com/"]',
"expected": {"aggressive": ["examplee.com"], "standard": []},
},
{
"subject": "||examplfe.com^examplge.com", # same as the case in the line 103
"expected": {"aggressive": ["examplfe.com"], "standard": ["examplfe.com"]},
},
{
"subject": "||examplhe.com$script,image", # same as the case in the line 56 and 84
"expected": {"aggressive": ["examplhe.com"], "standard": ["examplhe.com"]},
},
{
"subject": "||examplie.com^$domain=domain1.com|domain2.com",
"expected": {
"aggressive": [
"domain1.com",
"domain2.com",
"examplie.com",
],
"standard": ["examplie.com"], # should be null because the filter is applied to specific websites
},
},
{
"subject": 'examlple.com##[href^="http://hello.world."], '
'[href^="http://example.net/"]',
"expected": {
"aggressive": ["examlple.com", "example.net", "hello.world."],
"standard": [],
},
},
{"subject": "##.ad-href1", "expected": {"aggressive": [], "standard": []}},
{
"subject": "^hello^$domain=example.com",
"expected": {"standard": [], "aggressive": ["example.com"]},
},
{
"subject": "hello$domain=example.net|example.com",
"expected": {"standard": [], "aggressive": ["example.com", "example.net"]},
},
{
"subject": "hello^$domain=example.org|example.com|example.net",
"expected": {
"standard": [],
"aggressive": ["example.com", "example.net", "example.org"],
},
},
{
"subject": "|http://example.org/hello-world^$scripts,image",
"expected": {"aggressive": ["example.org"],
"standard": ["example.org"]}, # should be null because we are targeting a specific file/folder for a specific request type
},
{
"subject": "|http://example.org/*",
"expected": {"aggressive": ["example.org"], "standard": ["example.org"]},
},
{
"subject": "|http://example.org^",
"expected": {"aggressive": ["example.org"], "standard": ["example.org"]},
},
{
"subject": "|http://example.org",
"expected": {"aggressive": ["example.org"], "standard": ["example.org"]},
},
{
"subject": "|https://example.org/^$domain=example.com",
"expected": {
"aggressive": ["example.com", "example.org"],
"standard": ["example.org"], # should be null because the filter is applied to a specific website
},
},
{
"subject": "|ftp://example.org$domain=example.com|example.net",
"expected": {
"aggressive": ["example.com", "example.net", "example.org"],
"standard": ["example.org"], # should be null because the filter is applied to specific websites
},
},
{
"subject": "|http://example.com$script,image,domain=example.org|foo.example.net",
"expected": {
"aggressive": ["example.com", "example.org", "foo.example.net"],
"standard": ["example.com"], # should be null because the filter is applied to specific websites
},
},
{
"subject": "|http://example.com,https://example.de$script,image,domain=example.org|foo.example.net", # incorrect filter (not possible to block many sites in the same filter or we have to use a regex rule)
"expected": {
"aggressive": [
"example.com",
"example.de",
"example.org",
"foo.example.net",
],
"standard": ["example.com", "example.de"],
},
},
]
If it only scans for rules to block the entire domain,
If so, then I misunderstood the tool. That is why I asked above what the intended behavior is (2nd). 😕 I thought we could use the tool to convert an adblock list to a blacklist hosts file (Case 2).
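As a rough sketch of Case 2, a list of already extracted domains could be turned into hosts-file lines like this. `to_hosts_lines` is a hypothetical helper, not part of PyFunceble or adblock-decoder, and the `0.0.0.0` sink address simply follows the "clean output" wording used earlier in this thread.

```python
from typing import Iterable, List


def to_hosts_lines(domains: Iterable[str], sink_ip: str = "0.0.0.0") -> List[str]:
    """Format domains as hosts-file entries, deduplicated and sorted.

    Hypothetical helper for illustration; it does no validation of the
    domains it receives.
    """
    return [f"{sink_ip} {domain}" for domain in sorted(set(domains))]


if __name__ == "__main__":
    for line in to_hosts_lines(["ad.example.co.uk", "examplae.com", "examplae.com"]):
        print(line)
```

Only domains taken from whole-domain rules should be fed into such a helper, since a hosts file cannot express file, folder or request-type restrictions.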
Testing the below list
Output
The expected output should be