cpp: Fix highlighting of unterminated raw strings
PR highlightjs#1897 switched C++ raw strings to use backreferences; however, this
breaks source files where raw strings are truncated. As with truncated
comments, it would be preferable to still highlight them.

Instead, go back to using separate begin and end regexps, but introduce
an endFilter feature to filter out false positive matches. This
internally works similarly to endSameAsBegin.

See also issue highlightjs#2259.
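
The failure mode described in the message can be reproduced with the two patterns that appear in this commit's diff. The following is only a minimal sketch: the variable names and the sample inputs are illustrative assumptions, not code from the repository.

```javascript
// Single-regexp approach (the pattern removed by this commit): the
// backreference \1 requires the closing delimiter to be present.
const rawStringWithBackref = /(?:u8?|U|L)?R"([^()\\ ]{0,16})\((?:.|\n)*?\)\1"/;

// Begin-only pattern (added by this commit): matches as soon as the
// opening delimiter has been seen.
const rawStringBegin = /(?:u8?|U|L)?R"[^()\\ ]{0,16}\(/;

// Illustrative inputs (assumed for this sketch).
const complete = 'R"foo(\nbody\n)foo"';
const truncated = 'R"foo(\nTruncated raw string\n)nope"\nStill not completed.\n';

const results = {
  completeBackref: rawStringWithBackref.test(complete),
  truncatedBackref: rawStringWithBackref.test(truncated),
  truncatedBegin: rawStringBegin.test(truncated),
};
console.log(results);
```

With the backreference pattern, the truncated input produces no match at all, so nothing is highlighted as a string. The begin-only pattern still opens the string, and deciding where it ends is left to a separate step.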
davidben authored and joshgoebel committed Mar 12, 2020
1 parent 94faa80 commit 573174e
Showing 7 changed files with 54 additions and 7 deletions.
22 changes: 21 additions & 1 deletion docs/reference.rst
@@ -190,7 +190,7 @@ endSameAsBegin
 Acts as ``end`` matching exactly the same string that was found by the
 corresponding ``begin`` regexp.

-For example, in PostgreSQL string constants can uee "dollar quotes",
+For example, in PostgreSQL string constants can use "dollar quotes",
 consisting of a dollar sign, an optional tag of zero or more characters,
 and another dollar sign. String constant must be ended with the same
 construct using the same tag. It is possible to nest dollar-quoted string
@@ -208,6 +208,26 @@ In this case you can't simply specify the same regexp for ``begin`` and
 ``end`` (say, ``"\\$[a-z]\\$"``), but you can use ``begin: "\\$[a-z]\\$"``
 and ``endSameAsBegin: true``.

+.. _endFilter:
+
+endFilter
+^^^^^^^^^
+
+**type**: function
+
+Filters ``end`` matches to implement end rules that cannot be expressed as a
+standalone regular expression.
+
+This should be a function which takes two string parameters, the string that
+matched the ``begin`` regexp and the string that matched the ``end`` regexp. It
+should return true to end the mode and false otherwise.
+
+For example, C++11 raw string constants use syntax like ``R"tag(.....)tag"``,
+where ``tag`` is any zero to sixteen character string that must be repeated at
+the end. This could be matched with a single regexp containing backreferences,
+but truncated raw strings would not highlight. Instead, ``endFilter`` can be
+used to reject ``)tag"`` delimiters which do not match the starting value.
+
 .. _lexemes:

 lexemes
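
The ``endFilter`` contract documented in this hunk (two matched strings in, a boolean out) is not specific to C++. As a hedged illustration only, the sketch below applies it to Lua long strings such as `[==[ ... ]==]`, where the closing bracket must carry the same number of `=` signs as the opening one. The mode object and its name `luaLongString` are hypothetical and are not part of this commit or of any shipped language definition.

```javascript
// Hypothetical mode definition, assuming the endFilter(begin, end)
// signature described in docs/reference.rst above.
const luaLongString = {
  className: 'string',
  begin: /\[=*\[/,   // e.g. "[[", "[=[", "[==["
  end: /\]=*\]/,     // candidate terminators: "]]", "]=]", "]==]"
  endFilter: function(begin, end) {
    // Both delimiters are two brackets plus N equals signs, so equal
    // length means the same level.
    return begin.length === end.length;
  }
};

// Exercising the filter directly, outside the highlighter:
const sameLevel = luaLongString.endFilter('[==[', ']==]');
const differentLevel = luaLongString.endFilter('[==[', ']]');
console.log(sameLevel, differentLevel);
```

Here the `end` regexp deliberately over-matches, and the filter rejects candidates whose level differs from the opener, which is the same division of labour the documentation describes for `)tag"`.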
14 changes: 9 additions & 5 deletions src/highlight.js
@@ -120,15 +120,19 @@ const HLJS = function(hljs) {
   function _highlight(languageName, code, ignore_illegals, continuation) {
     var codeToHighlight = code;

-    function endOfMode(mode, lexeme) {
-      if (regex.startsWith(mode.endRe, lexeme)) {
+    function endOfMode(mode, matchPlusRemainder, lexeme) {
+      var modeEnded = regex.startsWith(mode.endRe, matchPlusRemainder);
+      if (modeEnded && mode.endFilter) {
+        modeEnded = mode.endFilter(mode.beginValue, lexeme);
+      }
+      if (modeEnded) {
         while (mode.endsParent && mode.parent) {
           mode = mode.parent;
         }
         return mode;
       }
       if (mode.endsWithParent) {
-        return endOfMode(mode.parent, lexeme);
+        return endOfMode(mode.parent, matchPlusRemainder, lexeme);
       }
     }
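
To see the new control flow in isolation, the hunk above can be lifted into a standalone sketch. The `startsWith` helper below is an assumed stand-in for the library's internal `regex.startsWith`, and the `rawString` object is a hand-built mode with a simplified filter; neither is taken from the repository.

```javascript
// Assumed stand-in for regex.startsWith: true only when `re` matches at
// the very start of `text`.
function startsWith(re, text) {
  const match = re && re.exec(text);
  return Boolean(match) && match.index === 0;
}

// Same logic as the patched endOfMode, using the stand-in helper.
function endOfMode(mode, matchPlusRemainder, lexeme) {
  let modeEnded = startsWith(mode.endRe, matchPlusRemainder);
  if (modeEnded && mode.endFilter) {
    // The regexp matched; let the mode veto the match.
    modeEnded = mode.endFilter(mode.beginValue, lexeme);
  }
  if (modeEnded) {
    while (mode.endsParent && mode.parent) {
      mode = mode.parent;
    }
    return mode;
  }
  if (mode.endsWithParent) {
    return endOfMode(mode.parent, matchPlusRemainder, lexeme);
  }
}

// Hand-built mode for illustration; beginValue is what startNewMode
// would have recorded when the mode was entered.
const rawString = {
  endRe: /\)[^()\\ ]{0,16}"/,
  beginValue: 'R"foo(',
  endFilter: (begin, end) => end === ')' + begin.slice(2, -1) + '"',
};

const rejected = endOfMode(rawString, ')nope"\nStill not completed.\n', ')nope"');
const accepted = endOfMode(rawString, ')foo";', ')foo"');
console.log(rejected === undefined, accepted === rawString);
```

A vetoed match behaves exactly like a regexp that did not match: the function falls through to the `endsWithParent` check and otherwise returns `undefined`, so the current mode stays open.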

@@ -210,7 +214,7 @@ const HLJS = function(hljs) {
       if (mode.className) {
         emitter.openNode(mode.className);
       }
-      top = Object.create(mode, {parent: {value: top}});
+      top = Object.create(mode, {parent: {value: top}, beginValue: {value: lexeme}});
     }

     function doIgnore(lexeme) {
@@ -259,7 +263,7 @@ const HLJS = function(hljs) {
     function doEndMatch(match) {
       var lexeme = match[0];
       var matchPlusRemainder = codeToHighlight.substr(match.index);
-      var end_mode = endOfMode(top, matchPlusRemainder);
+      var end_mode = endOfMode(top, matchPlusRemainder, lexeme);
       if (!end_mode) { return; }

       var origin = top;
11 changes: 10 additions & 1 deletion src/languages/c-like.js
@@ -44,7 +44,16 @@ export default function(hljs) {
         begin: '(u8?|U|L)?\'(' + CHARACTER_ESCAPES + "|.)", end: '\'',
         illegal: '.'
       },
-      { begin: /(?:u8?|U|L)?R"([^()\\ ]{0,16})\((?:.|\n)*?\)\1"/ }
+      {
+        begin: /(?:u8?|U|L)?R"[^()\\ ]{0,16}\(/,
+        end: /\)[^()\\ ]{0,16}"/,
+        endFilter: function(begin, end) {
+          var quote = begin.indexOf('"');
+          var beginDelimiter = begin.substring(quote + 1, begin.length - 1);
+          var endDelimiter = end.substring(1, end.length - 1);
+          return beginDelimiter == endDelimiter;
+        },
+      }
     ]
   };
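
The delimiter extraction in this filter can be checked by hand against a few begin/end pairs. In the sketch below the function body is copied from the hunk above; the name `rawStringEndFilter` and the sample inputs are assumptions made for illustration.

```javascript
// Body copied from the endFilter added to c-like.js; only the name is new.
function rawStringEndFilter(begin, end) {
  var quote = begin.indexOf('"');
  // Text between the opening quote and the trailing "(".
  var beginDelimiter = begin.substring(quote + 1, begin.length - 1);
  // Text between the leading ")" and the closing quote.
  var endDelimiter = end.substring(1, end.length - 1);
  return beginDelimiter == endDelimiter;
}

const checks = {
  sameTag: rawStringEndFilter('R"foo(', ')foo"'),
  differentTag: rawStringEndFilter('R"foo(', ')nope"'),
  encodingPrefix: rawStringEndFilter('u8R"x(', ')x"'),
  emptyTag: rawStringEndFilter('R"(', ')"'),
};
console.log(checks);
```

Searching for the first `"` rather than assuming a fixed offset is what lets the same code handle the optional `u8`, `u`, `U` and `L` prefixes, and the empty-tag case falls out naturally because both substrings are empty.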

3 changes: 3 additions & 0 deletions test/markup/cpp/truncated-block-comment.expect.txt
@@ -0,0 +1,3 @@
+<span class="hljs-comment">/*
+Truncated block comment
+</span>
2 changes: 2 additions & 0 deletions test/markup/cpp/truncated-block-comment.txt
@@ -0,0 +1,2 @@
+/*
+Truncated block comment
5 changes: 5 additions & 0 deletions test/markup/cpp/truncated-raw-string.expect.txt
@@ -0,0 +1,5 @@
+<span class="hljs-string">R"foo(
+Truncated raw string
+)nope"
+Still not completed.
+</span>
4 changes: 4 additions & 0 deletions test/markup/cpp/truncated-raw-string.txt
@@ -0,0 +1,4 @@
+R"foo(
+Truncated raw string
+)nope"
+Still not completed.
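
The behaviour these fixtures pin down can be approximated with a toy scanner that handles nothing but raw strings. This is a rough sketch under stated assumptions: `highlightRawString` is a hypothetical function, far simpler than highlight.js itself, and it performs no HTML escaping. It reuses the two regexps and the delimiter comparison from the c-like.js hunk.

```javascript
const BEGIN = /(?:u8?|U|L)?R"[^()\\ ]{0,16}\(/;
const END = /\)[^()\\ ]{0,16}"/g;

function endFilter(begin, end) {
  const quote = begin.indexOf('"');
  return begin.substring(quote + 1, begin.length - 1) ===
    end.substring(1, end.length - 1);
}

// Hypothetical toy scanner: wraps the first raw string in a span, running
// to the end of the input when no accepted terminator exists.
function highlightRawString(code) {
  const begin = BEGIN.exec(code);
  if (!begin) return code;
  let stop = code.length; // default: unterminated, highlight to the end
  END.lastIndex = begin.index + begin[0].length;
  let end;
  while ((end = END.exec(code)) !== null) {
    if (endFilter(begin[0], end[0])) {
      stop = end.index + end[0].length;
      break;
    }
    // Rejected candidate: resume just past its ")" so that an overlapping
    // candidate is still considered.
    END.lastIndex = end.index + 1;
  }
  return code.slice(0, begin.index) +
    '<span class="hljs-string">' + code.slice(begin.index, stop) + '</span>' +
    code.slice(stop);
}

const truncatedInput = 'R"foo(\nTruncated raw string\n)nope"\nStill not completed.\n';
const truncatedOutput = highlightRawString(truncatedInput);
const completeOutput = highlightRawString('auto s = R"foo(a)nope"b)foo"; int x;');
console.log(truncatedOutput);
console.log(completeOutput);
```

For the truncated fixture, the `)nope"` candidate is rejected and the span runs to the end of the input, matching the shape of `truncated-raw-string.expect.txt`. For the terminated string, the span closes at `)foo"` and the trailing code is left unwrapped.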
