Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
A new standalone static filtering parser is introduced, vAPI.StaticFilteringParser. Its purpose is to parse a line of text into a representation suitable for compiling filters. It can additionally serve for syntax highlighting purposes.

As a side effect, this solves:
- uBlockOrigin/uBlock-issues#1038

This is a first draft; there is more work left to do to further perfect the implementation and extend its capabilities, especially those useful to assist filter authors.

For the time being, this commit breaks line-continuation syntax highlighting, which was already flaky prior to this commit anyway.
Showing 10 changed files with 1,895 additions and 546 deletions.
01b1ed9
Sorry, bad commit message -- many obvious English typos.
Additionally, I couldn't remember what I meant to mention in the commit message, and, as is often the case, I remembered it not long after I pushed the commit to GitHub, so here it is:
I found a long-standing issue in how some static network filters were parsed: those that start with an underscore were mistaken by uBO for pure hostname filters, which they are not. Examples from EasyList:
The above filters were obviously not meant to be parsed as pure hostname filters. This has been fixed in the above commit: a filter starting with an underscore (a valid hostname character) will no longer be considered a "pure hostname" filter. Previously, such filters ended up being stored in an HNTrie, meaning they would never match as the filter author intended.
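A minimal Python sketch of the corrected check follows, assuming a simplified definition of a "pure hostname" filter: a pattern made only of letters, digits, `.`, `-` and `_`. The function name `is_pure_hostname` and the character set are hypothetical. This is not uBO's actual JavaScript code, which handles many more cases. The point is only the new rule: a leading underscore disqualifies the pattern even though `_` is a valid hostname character.

```python
import re

# Hypothetical, simplified character set for "hostname-like" patterns.
_HOSTNAME_CHARS = re.compile(r"[A-Za-z0-9._-]+")

def is_pure_hostname(pattern: str) -> bool:
    """Hypothetical check: should `pattern` be stored as a pure hostname filter?"""
    if not _HOSTNAME_CHARS.fullmatch(pattern):
        return False
    # The fix: a pattern starting with "_" is a plain substring pattern,
    # not a hostname, so it must not end up in the hostname trie.
    if pattern.startswith("_"):
        return False
    return True

print(is_pure_hostname("ads.example.com"))  # True
print(is_pure_hostname("_ad_banner."))      # False (hypothetical pattern)
```

Without the underscore rule, a pattern like the hypothetical `_ad_banner.` would pass the character check and be treated as a hostname. It would then only match requests to that exact host, never URLs containing that substring.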
Another issue was the incorrect parsing of some hosts files, for example:
Specifically, lines with `## ` were parsed as cosmetic filters. This has also been fixed in the above commit: instances of `## ` (with a space afterward) will now be parsed as comments.
01b1ed9
01b1ed9
Expected output given the fix I had to put in.
The first line is due to this logic: there can't be a space in the middle of a pattern, since a space never occurs in a URL. So when a filter contains one, uBO discards everything before the space as irrelevant. This fixes the parsing of https://raw.githubusercontent.com/lennylxx/ipv6-hosts/master/hosts.

The second is because you created a cosmetic filter, and in that case uBO expects a list of valid hostnames before the `##`; `./` is not a valid hostname.

Edit: to be clear regarding the first pattern, the space after `##` causes the filter not to be deemed a cosmetic filter, so it is parsed as a network filter, and thus the space-in-the-middle rule applies.
01b1ed9
https://raw.githubusercontent.com/stonecrusher/filterlists-pihole/master/watchlist-internet-ph.txt
01b1ed9
Unfortunately, I don't see what I can do about this one: spaces are allowed in CSS selectors. At least it will ultimately be rejected as an invalid cosmetic filter.
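The line-classification rules discussed in this thread can be summarized in a small classifier. The Python sketch below is hypothetical and is not uBO's `vAPI.StaticFilteringParser`, which is JavaScript and covers far more syntax. The following are all assumptions made for illustration:

- the names `classify_line` and `ParsedLine`;
- the simplified hostname regex;
- treating `# ` lines (hosts-file style) and `!` lines as comments;
- treating `## ` as a comment only at the start of a line;
- keeping only the text after the last whitespace run of a network filter.

```python
import re
from dataclasses import dataclass

# Simplified hostname syntax (assumption): dot-separated labels of letters,
# digits, "-" and "_", with an optional "~" negation prefix.
_HOSTNAME = re.compile(r"~?[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*")

@dataclass
class ParsedLine:
    kind: str       # "comment", "cosmetic", "invalid", or "network"
    body: str = ""  # selector or network pattern, when relevant

def classify_line(line: str) -> ParsedLine:
    """Hypothetical classifier mirroring the rules discussed in this thread."""
    line = line.strip()
    # Blank lines, "!" lines, and hosts-style "# ..." lines are comments.
    if line == "" or line.startswith("!") or re.match(r"#(\s|$)", line):
        return ParsedLine("comment")
    pos = line.find("##")
    if pos != -1:
        after = line[pos + 2:]
        if after and not after[0].isspace():
            # Cosmetic filter: what precedes "##" must be a comma-separated
            # list of valid hostnames (empty means generic). Spaces inside
            # the selector itself are allowed, as in CSS.
            prefix = line[:pos]
            hostnames = prefix.split(",") if prefix else []
            if all(_HOSTNAME.fullmatch(h) for h in hostnames):
                return ParsedLine("cosmetic", after)
            return ParsedLine("invalid", after)
        if pos == 0:
            # "## " at the start of a line, as in some hosts files: a comment.
            return ParsedLine("comment")
    # Network filter: a URL never contains a space, so everything before the
    # last whitespace run is discarded ("0.0.0.0 example.com" -> "example.com").
    return ParsedLine("network", line.split()[-1])

for text in ["## Ad servers", "example.com##div .ad", "./##.ad", "foo ## bar"]:
    print(repr(text), "->", classify_line(text))
```

In this sketch, `foo ## bar` falls through to the network branch because of the space after `##`, so only `bar` survives. `./##.ad` is classified as an invalid cosmetic filter because `./` is not a hostname.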