[static filter syntax] irregularities when no wildcard is used #1065
Comments
By design. Unlike ABP, uBO does not use RegExp internally by default for most filters, only when it cannot be avoided, for efficiency reasons. An asterisk is required to explicitly declare that any character segment is allowed in one spot, and this includes the beginning/end of a filter; otherwise it is assumed that the beginning/end is a word boundary (if defining a word as a sequence of any of the characters from the set |
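A minimal sketch of the rule described above, not uBO's actual code. The character set that defines a "word" is not shown in this thread, so `WORD_CHARS` below is an assumption (lowercase letters, digits and `%`); `matches` is a hypothetical helper that only understands a leading and/or trailing `*`.

```python
# Assumption: the real set of "word" characters is not given in the thread.
WORD_CHARS = set("0123456789abcdefghijklmnopqrstuvwxyz%")


def is_word(ch):
    """True if ch counts as a word character under the assumed set."""
    return ch.lower() in WORD_CHARS


def matches(filter_text, url):
    """Substring match with implicit word boundaries at both filter ends.

    A leading or trailing '*' cancels the boundary on that side.
    Wildcards inside the filter are not handled in this sketch.
    """
    open_start = filter_text.startswith("*")
    open_end = filter_text.endswith("*")
    needle = filter_text.strip("*")
    if not needle:
        return True  # a bare '*' matches anything
    start = url.find(needle)
    while start != -1:
        end = start + len(needle)
        # A boundary is only required where the filter edge is a word character.
        left_ok = (open_start or not is_word(needle[0])
                   or start == 0 or not is_word(url[start - 1]))
        right_ok = (open_end or not is_word(needle[-1])
                    or end == len(url) or not is_word(url[end]))
        if left_ok and right_ok:
            return True
        start = url.find(needle, start + 1)
    return False


URL = "https://api.github.com/_private/browser/stats"
print(matches("_privat", URL))   # False: 't' is followed by the word character 'e'
print(matches("_private", URL))  # True: 'e' is followed by '/'
print(matches("_privat*", URL))  # True: the trailing '*' cancels the boundary
```

Under these assumptions the sketch reproduces the `_privat` versus `_private` difference reported in this issue. It does not try to model any other part of uBO's filter parsing.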
Could you put that into the "Static filter syntax" section of the wiki? I too was under the impression that blocking rules had implicit wildcards at beginning and end (with the exception that if the rule would otherwise start and end with a forward slash This actually makes uBO more efficient than ABP and the like, because it's possible to specify that a filter begins and ends on a word boundary without using a regex. |
They have implicit wildcards, except after implying word boundaries.
The kind of filters seen in the OP are quite rare in EasyList/EasyPrivacy. They lead to inefficient filtering because they are broad and can't be tokenized (if following ABP rules), so every single URL must be checked against every single one of these untokenizable filters. Given their rarity in EasyList/EasyPrivacy, this is not really the explanation of why uBO's static network filtering engine is more efficient. For other cases where the start or end is a word character, it's clear the intended function was not to have a wildcard interpretation. For example, in EasyList:
I really doubt that the purpose of this filter is to block all GIF images whose URL ends with Implicit word boundaries are pretty much what always happens anyway in EasyList/EasyPrivacy. I am sure that nowadays, for performance considerations re. ABP, official filter list maintainers avoid filters like those in the OP. Regarding the evaluation of static network filters, the upper hand of uBO versus ABP is not because of the differing interpretation re. leading/trailing wildcards, but rather because the majority of filters are evaluated without a regex in uBO (this would still be true if implying wildcards like ABP), while all filters are translated into regexes in ABP (someone correct me if I am wrong). Typically a regex will cause the whole URL to be scanned. With uBO, the token extracted from the URL is first used to look up a bucket of potentially matching filters and, key to the performance of static network filter evaluation, the position of the token is used as an anchor to perform a plain string comparison, with no need to scan the whole URL:
So uBO matches the anchor of the filter token to the anchor of the URL token:
The segment of the URL to compare to the filter string starts at position (31 - 1 = 30):
In the end it's a mere string comparison: There are other optimizations too for static network filter evaluation, such as using the request type/party as a hash key to narrow the number of filters to evaluate, or another optimization related to the compact storage of plain hostname-based filters, which are the most common occurrence in uBO. |
I prefer to leave it as is; it just makes more sense to explicitly use a wildcard if we want to cancel word boundaries (i.e. |
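The token lookup and anchored comparison described above can be sketched roughly as follows. This is an illustration under assumptions, not uBO's implementation: `TokenIndex`, the token pattern `[0-9a-z%]+`, and the choice of the longest token as the bucket key are all hypothetical, and wildcards, filter options and boundary checks are left out.

```python
import re
from collections import defaultdict

# Assumption: a token is a run of these characters; the real pattern may differ.
TOKEN_RE = re.compile(r"[0-9a-z%]+")


class TokenIndex:
    """Buckets plain-string filters by one token taken from each filter."""

    def __init__(self):
        # token -> list of (filter text, offset of the token inside the filter)
        self.buckets = defaultdict(list)

    def add(self, filter_text):
        filter_text = filter_text.lower()
        # Assumption: use the longest token of the filter as its bucket key.
        best = max(TOKEN_RE.finditer(filter_text),
                   key=lambda m: len(m.group()), default=None)
        if best is None:
            raise ValueError("filter cannot be tokenized: %r" % filter_text)
        self.buckets[best.group()].append((filter_text, best.start()))

    def match(self, url):
        url = url.lower()
        for tok in TOKEN_RE.finditer(url):
            for filter_text, offset in self.buckets.get(tok.group(), ()):
                # Anchor: URL token position minus token offset in the filter.
                begin = tok.start() - offset
                # One plain string comparison at a known position, no URL scan.
                if begin >= 0 and url.startswith(filter_text, begin):
                    return filter_text
        return None


URL = "https://api.github.com/_private/browser/stats"

index = TokenIndex()
index.add("/_private/")
print(index.match(URL))    # /_private/

partial = TokenIndex()
partial.add("_privat")
print(partial.match(URL))  # None
```

In this sketch the `_privat` filter is stored under the token `privat`, while the URL only yields the token `private`, so its bucket is never consulted. That is one way a token-keyed lookup ends up behaving like a word-boundary match.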
Here is a real case where implying a wildcard causes an issue: http://www.858.photos/. The images in the carousel at the top of the page are prevented from loading, because their URL is something like |
Is this the reason why the filter |
Yes, |
Interesting note - an asterisk is not required to be at the end of the string https://gitlab.com/xuhaiyang1234/AAK-Cont/issues/17#note_31436319 |
browser version/ublock version: iw-38.4.0esr / uBO-1.4.1b2
open this exact URL: https://github.com
do this: let's try to filter https://api.github.com/_private/browser/stats
None of the following filters work
||api.git
^browse
browser
stats
_privat (for some reason _private works though)
All of the filters work in ABP (even a single character does), but in uBO it seems you have to add at least one wildcard if you don't:
... but why is _private working then?
Not that those examples make much sense, I'm just trying to understand in which cases the syntax doesn't comply with ABP.