Add a new static filtering parser
A new standalone static filtering parser is introduced,
vAPI.StaticFilteringParser. Its purpose is to parse a
line of text into a representation suitable for
compiling filters. It can additionally serve for
syntax highlighting purposes.

As a side effect, this solves:
- uBlockOrigin/uBlock-issues#1038

This is a first draft; there is more work left to do
to further perfect the implementation and extend its
capabilities, especially those useful to assist filter
authors.

For the time being, this commit breaks line-continuation
syntax highlighting -- which was already flaky prior to
this commit anyway.
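
Judging from the highlighter changes in this commit, the parser exposes an analyzed line as a flat `slices` array read three entries at a time: a bit field, a start position, and a length. The sketch below is a toy illustration of that layout only. The `BIT_*` constants and the `slice` and `describe` functions are hypothetical names invented for this example and are not part of vAPI.StaticFilteringParser.

```javascript
// Hypothetical bit flags; the real parser defines its own BIT* constants.
const BIT_OTHER = 1 << 0;
const BIT_ASTERISK = 1 << 1;
const BIT_CARET = 1 << 2;

// Toy slicer: groups consecutive characters of the same class into
// [bits, position, length] triples stored in one flat array.
function slice(line) {
    const classify = c =>
        c === '*' ? BIT_ASTERISK : c === '^' ? BIT_CARET : BIT_OTHER;
    const slices = [];
    let i = 0;
    while ( i < line.length ) {
        const bits = classify(line[i]);
        let j = i + 1;
        while ( j < line.length && classify(line[j]) === bits ) { j += 1; }
        slices.push(bits, i, j - i);
        i = j;
    }
    return slices;
}

// Walk the triples one slot (three entries) at a time and pick a style,
// similar in spirit to how the highlighter advances its slot index by 3.
function describe(line) {
    const slices = slice(line);
    const out = [];
    for ( let slot = 0; slot < slices.length; slot += 3 ) {
        const bits = slices[slot+0];
        const pos = slices[slot+1];
        const len = slices[slot+2];
        const style = (bits & (BIT_ASTERISK | BIT_CARET)) !== 0
            ? 'keyword'
            : 'variable';
        out.push({ text: line.slice(pos, pos + len), style });
    }
    return out;
}
```

For example, `describe('ads*.js^')` splits the line into `ads`, `*`, `.js` and `^`, styling the wildcard and the caret as keywords and the rest as variables.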
gorhill committed Jun 4, 2020
1 parent e8c8fab commit 01b1ed9
Showing 10 changed files with 1,895 additions and 546 deletions.
1 change: 1 addition & 0 deletions src/1p-filters.html
@@ -54,6 +54,7 @@
 <script src="js/i18n.js"></script>
 <script src="js/dashboard-common.js"></script>
 <script src="js/cloud-ui.js"></script>
+<script src="js/static-filtering-parser.js"></script>
 <script src="js/1p-filters.js"></script>
 
 </body>
1 change: 1 addition & 0 deletions src/asset-viewer.html
@@ -45,6 +45,7 @@
 <script src="js/udom.js"></script>
 <script src="js/i18n.js"></script>
 <script src="js/dashboard-common.js"></script>
+<script src="js/static-filtering-parser.js"></script>
 <script src="js/asset-viewer.js"></script>
 
 </body>
3 changes: 2 additions & 1 deletion src/background.html
@@ -26,8 +26,9 @@
 <script src="js/filtering-context.js"></script>
 <script src="js/redirect-engine.js"></script>
 <script src="js/dynamic-net-filtering.js"></script>
-<script src="js/static-net-filtering.js"></script>
 <script src="js/url-net-filtering.js"></script>
+<script src="js/static-filtering-parser.js"></script>
+<script src="js/static-net-filtering.js"></script>
 <script src="js/static-ext-filtering.js"></script>
 <script src="js/cosmetic-filtering.js"></script>
 <script src="js/scriptlet-filtering.js"></script>
10 changes: 10 additions & 0 deletions src/css/codemirror.css
@@ -22,7 +22,17 @@
word-break: break-all;
}

/* CodeMirror theme overrides */
.cm-s-default .cm-string-2 { color: #a30; }
.cm-s-default .cm-comment { color: #777; }
.cm-s-default .cm-keyword { color: #90b; }
.cm-s-default .cm-error,
.CodeMirror-linebackground.error {
background-color: #ff000018;
text-decoration: underline red;
text-underline-position: under;
}

.cm-directive { color: #333; font-weight: bold; }
.cm-staticext { color: #008; }
.cm-staticnetBlock { color: #800; }
224 changes: 131 additions & 93 deletions src/js/codemirror/ubo-static-filtering.js
@@ -24,117 +24,155 @@
 'use strict';
 
 CodeMirror.defineMode("ubo-static-filtering", function() {
-    const reDirective = /^\s*!#(?:if|endif|include)\b/;
-    const reComment1 = /^\s*!/;
-    const reComment2 = /^\s*#/;
-    const reExt = /(#@?(?:\$\??|\?)?#)(?!##)/;
-    const reNet = /^\s*(?:@@)?.*(?:(\$)(?:[^$]+)?)?$/;
-    let lineStyle = null;
-    let anchorOptPos = null;
-
-    const lines = [];
-    let iLine = 0;
+    const parser = new vAPI.StaticFilteringParser(true);
+    const reDirective = /^!#(?:if|endif|include)\b/;
+    let parserSlot = 0;
+    let netOptionValueMode = false;
 
-    const lineFromLineBuffer = function() {
-        return lines.length === 1
-            ? lines[0]
-            : lines.filter(a => a.replace(/^\s*|\s+\\$/g, '')).join('');
-    };
-
-    const parseExtFilter = function() {
-        lineStyle = 'staticext';
-        for ( let i = 0; i < lines.length; i++ ) {
-            const match = reExt.exec(lines[i]);
-            if ( match === null ) { continue; }
-            anchorOptPos = { y: i, x: match.index, l: match[1].length };
-            break;
+    const colorSpan = function(stream) {
+        if ( parser.category === parser.CATNone || parser.shouldIgnore() ) {
+            stream.skipToEnd();
+            return 'comment';
         }
-    };
-
-    const parseNetFilter = function() {
-        lineStyle = lineFromLineBuffer().startsWith('@@')
-            ? 'staticnetAllow'
-            : 'staticnetBlock';
-        let i = lines.length;
-        while ( i-- ) {
-            const pos = lines[i].lastIndexOf('$');
-            if ( pos === -1 ) { continue; }
-            anchorOptPos = { y: i, x: pos, l: 1 };
-            break;
+        if ( parser.category === parser.CATComment ) {
+            stream.skipToEnd();
+            return reDirective.test(stream.string)
+                ? 'variable strong'
+                : 'comment';
         }
-    };
-
-    const highlight = function(stream) {
-        if ( anchorOptPos !== null && iLine === anchorOptPos.y ) {
-            if ( stream.pos === anchorOptPos.x ) {
-                stream.pos += anchorOptPos.l;
-                return `${lineStyle} staticOpt`;
+        if ( (parser.slices[parserSlot] & parser.BITIgnore) !== 0 ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return 'comment';
+        }
+        if ( (parser.slices[parserSlot] & parser.BITError) !== 0 ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return 'error';
+        }
+        if ( parser.category === parser.CATStaticExtFilter ) {
+            if ( parserSlot < parser.optionsAnchorSpan.i ) {
+                const style = (parser.slices[parserSlot] & parser.BITComma) === 0
+                    ? 'string-2'
+                    : 'def';
+                stream.pos += parser.slices[parserSlot+2];
+                parserSlot += 3;
+                return style;
+            }
+            if (
+                parserSlot >= parser.optionsAnchorSpan.i &&
+                parserSlot < parser.patternSpan.i
+            ) {
+                const style = (parser.flavorBits & parser.BITFlavorException) !== 0
+                    ? 'tag'
+                    : 'def';
+                stream.pos += parser.slices[parserSlot+2];
+                parserSlot += 3;
+                return `${style} strong`;
             }
-            if ( stream.pos < anchorOptPos.x ) {
-                stream.pos = anchorOptPos.x;
-                return lineStyle;
+            if ( parserSlot >= parser.patternSpan.i ) {
+                stream.skipToEnd();
+                return 'variable';
             }
+            stream.skipToEnd();
+            return '';
         }
-        stream.skipToEnd();
-        return lineStyle;
-    };
-
-    const parseMultiLine = function() {
-        anchorOptPos = null;
-        const line = lineFromLineBuffer();
-        if ( reDirective.test(line) ) {
-            lineStyle = 'directive';
-            return;
+        if ( parserSlot < parser.exceptionSpan.i ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return '';
         }
-        if ( reComment1.test(line) ) {
-            lineStyle = 'comment';
-            return;
+        if (
+            parserSlot === parser.exceptionSpan.i &&
+            parser.exceptionSpan.l !== 0
+        ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return 'tag strong';
         }
-        if ( line.indexOf('#') !== -1 ) {
-            if ( reExt.test(line) ) {
-                return parseExtFilter();
+        if (
+            parserSlot === parser.patternLeftAnchorSpan.i &&
+            parser.patternLeftAnchorSpan.l !== 0 ||
+            parserSlot === parser.patternRightAnchorSpan.i &&
+            parser.patternRightAnchorSpan.l !== 0
+        ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return 'keyword strong';
+        }
+        if (
+            parserSlot >= parser.patternSpan.i &&
+            parserSlot < parser.patternRightAnchorSpan.i
+        ) {
+            if ( (parser.slices[parserSlot] & (parser.BITAsterisk | parser.BITCaret)) !== 0 ) {
+                stream.pos += parser.slices[parserSlot+2];
+                parserSlot += 3;
+                return 'keyword strong';
             }
-            if ( reComment2.test(line) ) {
-                lineStyle = 'comment';
-                return;
+            const nextSlot = parser.skipUntil(
+                parserSlot,
+                parser.patternRightAnchorSpan.i,
+                parser.BITAsterisk | parser.BITCaret
+            );
+            stream.pos = parser.slices[nextSlot+1];
+            parserSlot = nextSlot;
+            return 'variable';
+        }
+        if (
+            parserSlot === parser.optionsAnchorSpan.i &&
+            parser.optionsAnchorSpan.l !== 0
+        ) {
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return 'def strong';
+        }
+        if (
+            parserSlot >= parser.optionsSpan.i &&
+            parser.optionsSpan.l !== 0
+        ) {
+            const bits = parser.slices[parserSlot];
+            let style;
+            if ( (bits & parser.BITComma) !== 0 ) {
+                style = 'def strong';
+                netOptionValueMode = false;
+            } else if ( (bits & parser.BITTilde) !== 0 ) {
+                style = 'keyword strong';
+            } else if ( (bits & parser.BITPipe) !== 0 ) {
+                style = 'def';
+            } else if ( netOptionValueMode ) {
+                style = 'string-2';
+            } else if ( (bits & parser.BITEqual) !== 0 ) {
+                netOptionValueMode = true;
             }
+            stream.pos += parser.slices[parserSlot+2];
+            parserSlot += 3;
+            return style || 'def';
         }
-        if ( reNet.test(line) ) {
-            return parseNetFilter();
+        if (
+            parserSlot >= parser.commentSpan.i &&
+            parser.commentSpan.l !== 0
+        ) {
+            stream.skipToEnd();
+            return 'comment';
         }
-        lineStyle = null;
+        stream.skipToEnd();
+        return '';
     };
 
     return {
-        startState: function() {
-        },
         token: function(stream) {
-            if ( iLine === lines.length || stream.string !== lines[iLine] ) {
-                iLine = 0;
-            }
-            if ( iLine === 0 ) {
-                if ( lines.length > 1 ) {
-                    lines.length = 1;
-                }
-                let line = stream.string;
-                lines[0] = line;
-                if ( line.endsWith(' \\') ) {
-                    do {
-                        line = stream.lookAhead(lines.length);
-                        if (
-                            line === undefined ||
-                            line.startsWith(' ') === false
-                        ) { break; }
-                        lines.push(line);
-                    } while ( line.endsWith(' \\') );
-                }
-                parseMultiLine();
+            if ( stream.sol() ) {
+                parser.analyze(stream.string);
+                parser.analyzeExtra(stream.string);
+                parserSlot = 0;
+                netOptionValueMode = false;
             }
-            const style = highlight(stream);
-            if ( stream.eol() ) {
-                iLine += 1;
+            let style = colorSpan(stream);
+            if ( (parser.flavorBits & parser.BITFlavorError) !== 0 ) {
+                style += ' line-background-error';
             }
-            return style;
+            style = style.trim();
+            return style !== '' ? style : null;
         },
     };
 });
5 changes: 4 additions & 1 deletion src/js/reverselookup.js
@@ -135,7 +135,10 @@ const fromNetFilter = async function(rawFilter) {
 
     const µb = µBlock;
     const writer = new µb.CompiledLineIO.Writer();
-    if ( µb.staticNetFilteringEngine.compile(rawFilter, writer) === false ) {
+    const parser = new vAPI.StaticFilteringParser();
+    parser.analyze(rawFilter);
+
+    if ( µb.staticNetFilteringEngine.compile(parser, writer) === false ) {
         return;
     }
 

5 comments on commit 01b1ed9

@gorhill (Owner, Author) commented on 01b1ed9, Jun 4, 2020

Sorry, bad commit message -- many obvious English typos.

Additionally, I couldn't remember what I meant to mention in the commit message, and as is often the case I remembered not long after I pushed the commit to GitHub, so here:

I found a long-standing issue in how some static network filters were parsed: filters starting with an underscore were mistaken by uBO for pure hostname filters when they were not. Examples from EasyList:

_468.gif
_468.htm
_728.htm
_ads.cgi
_ads.html
_adverts.js
_rebid.js

The above filters were obviously not meant to be parsed as pure hostname filters: they ended up being stored in an HNTrie, meaning they would never match as intended by the filter author. This has been fixed in the above commit: a filter starting with an underscore (a valid hostname character) will no longer be considered "pure hostname".

Another issue was the incorrect parsing of some hosts files, for example:

https://raw.githubusercontent.com/lennylxx/ipv6-hosts/master/hosts

Specifically, lines with ## were parsed as cosmetic filters. This has also been fixed in the above commit: instances of ## followed by a space will be parsed as comments.
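
As a rough illustration of the underscore rule described above, the following sketch shows one way such a check could look. It is not uBO's actual code: `isPureHostname` and `reHostnameChars` are hypothetical names, and the real criteria for a "pure hostname" filter are likely more involved.

```javascript
// Hypothetical check, not uBO's actual implementation.
// '_' is a valid hostname character, so a character test alone
// would accept patterns such as "_ads.cgi".
const reHostnameChars = /^[0-9a-z._-]+$/;

function isPureHostname(pattern) {
    if ( reHostnameChars.test(pattern) === false ) { return false; }
    // A leading underscore means the pattern is a URL fragment,
    // not a hostname, so it must not be stored as a hostname.
    if ( pattern.startsWith('_') ) { return false; }
    return true;
}
```

With this check, `example.com` still qualifies as a pure hostname, while `_ads.cgi` and `_468.gif` fall through to generic pattern matching.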

@gwarser (Contributor) commented on 01b1ed9, Jun 4, 2020

image

@gorhill (Owner, Author) commented on 01b1ed9, Jun 4, 2020

Expected output given the fix I had to put in.

The first line is because of this logic: there can't be a space in the middle of a pattern, as this never occurs in a URL. So when this happens in a filter, uBO discards everything that appears before the space as being irrelevant. This fixes parsing https://raw.githubusercontent.com/lennylxx/ipv6-hosts/master/hosts.

The second is because you created a cosmetic filter, and in such a case uBO expects a list of valid hostnames before the ##. / is not a valid hostname.

Edit: to be clear regarding the first pattern, the space after ## causes the filter to not be deemed a cosmetic filter, so it's being parsed as a network filter, and thus the space-in-the-middle rule applies.
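
The two rules described in this comment can be sketched as follows. This is a simplified illustration under stated assumptions, not the parser's real logic: `classifyLine` is a hypothetical helper, it only looks at the first `##`, and it ignores every other filter syntax.

```javascript
// Hypothetical helper illustrating the two rules above; not uBO's code.
function classifyLine(line) {
    const s = line.trim();
    const pos = s.indexOf('##');
    // Rule 1: "##" followed by a space is not deemed a cosmetic filter.
    if ( pos !== -1 && s.charAt(pos + 2) !== ' ' ) {
        return {
            type: 'cosmetic',
            hostnames: s.slice(0, pos),
            selector: s.slice(pos + 2),
        };
    }
    // Rule 2: a URL never contains a space, so for a network filter
    // everything before the last space is discarded.
    const lastSpace = s.lastIndexOf(' ');
    return {
        type: 'network',
        pattern: lastSpace === -1 ? s : s.slice(lastSpace + 1),
    };
}
```

Under these assumptions, a hosts file entry such as `2404:6800:4008:c07::62 www.google.com` reduces to the pattern `www.google.com`, and `example.com##.ad` is still recognized as a cosmetic filter.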

@gwarser (Contributor) commented on 01b1ed9, Jun 5, 2020

@gorhill (Owner, Author) commented on 01b1ed9, Jun 5, 2020

Unfortunately, I don't see what I can do for this one: spaces are allowed in CSS selectors. At least it will ultimately be rejected because it's an invalid cosmetic filter.
