Merge pull request #2128 from guwirth/right-angle-brackets
Parsing of nested templates is slow
Showing 5 changed files with 329 additions and 99 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,221 @@ | ||
<html> | ||
<head> | ||
<title>Right Angle Brackets (N1757/05-0017)</title> | ||
</head> | ||
<body bgcolor=white fgcolor=black> | ||
|
||
<p align=right> | ||
<table> | ||
<tr><td>Document number:</td><td>N1757</td></tr> | ||
<tr><td></td><td>05-0017</td></tr> | ||
<tr><td>Author:</td><td>Daveed Vandevoorde</td></tr> | ||
<tr><td></td><td>Edison Design Group</td></tr> | ||
<tr><td>Date:</td><td>2005-01-14</td></tr> | ||
</table> | ||
|
||
<center><h1>Right Angle Brackets</h1></center> | ||
<center>(Revision 2)</center> | ||
|
||
<h2>Introduction</h2> | ||
<p> | ||
Ever since the introduction of angle brackets, C++ programmers have been | ||
surprised by the fact that two consecutive right angle brackets must be | ||
separated by whitespace: | ||
<blockquote><tt><pre>#include <vector> | ||
typedef std::vector<std::vector<int> > Table; // OK | ||
typedef std::vector<std::vector<bool>> Flags; // Error | ||
</pre></tt></blockquote> | ||
The problem is an immediate consequence of the “maximum munch” principle and the fact that <tt>>></tt> is a valid token (right shift) in C++. | ||
<p> | ||
This issue is a minor but persistent, annoying, and somewhat | ||
embarrassing problem. If the cost is reasonable, it therefore seems | ||
worthwhile to eliminate the surprise. | ||
<p> | ||
The purpose of this document is to explain ways to allow <tt>>></tt> to be treated as two closing angle brackets, as well as to discuss the resulting issues. A specific option is proposed along with wording that would implement the proposal in the current working paper. | ||
|
||
<h2>Constructs with Right Angle Brackets</h2> | ||
<p> | ||
The example above shows the most common context of double right angle brackets: Nested template-ids. However, the “new-style” cast syntax may also participate in such constructs. For example: | ||
<blockquote><tt><pre> | ||
static_cast<List<B>>(ld) | ||
</pre></tt></blockquote> | ||
This situation currently occurs fairly rarely because the template-ids involved always represent class types, whereas these casts usually involve pointer, pointer-to-member, or reference types. | ||
<p> | ||
However, if template aliases make it into the language (and it seems likely | ||
they will), then template-ids will be able to represent nonclass types. | ||
It seems therefore desirable to address the issue for all constructs with | ||
right angle brackets, not just for templates. | ||
<p> | ||
It is worth noting that the problem can also occur with the <tt>>>=</tt> and <tt>>=</tt> tokens. For example: | ||
<blockquote><tt><pre> | ||
void func(List<B>= default_val1); | ||
void func(List<List<B>>= default_val2); | ||
</pre></tt></blockquote> | ||
Both of these forms are currently ill-formed. It may be desirable to | ||
also address this issue, but this paper does not propose to do so. | ||
|
||
<h2>Possible Solutions</h2> | ||
<p> | ||
Solving our problem amounts to decreeing that under some circumstances | ||
a <tt>>></tt> token is treated as two right angle brackets | ||
instead of a right shift operator. As it turns out, there are several | ||
general approaches to defining those | ||
“circumstances.” | ||
<p> | ||
<b>Approach 1.</b> | ||
The first approach is the simplest: Decree that if a left angle bracket is | ||
active (i.e. not yet matched by a right angle bracket) the <tt>>></tt> token | ||
is treated as two right angle brackets instead of a shift operator, | ||
except within parentheses or brackets that are themselves within the angle brackets. | ||
A slight | ||
variation on that theme (call it “Approach 1b”) is to | ||
require at least two left angle brackets to | ||
be active since otherwise the construct would be an error (because there would be an excess of right angle brackets). | ||
<p> | ||
This strategy is similar to the treatment of the <tt>></tt> token: | ||
If a left angle bracket is active, the token is treated as a right angle | ||
bracket, except within parentheses. For example: | ||
<blockquote><tt><pre> | ||
A<(X>Y)> a; // The first > token appears within parentheses and | ||
// therefore is not a right angle bracket. The second one | ||
// <i>is</i> a right angle bracket because a left angle bracket | ||
// is active and no parentheses are more recently active. | ||
</pre></tt></blockquote> | ||
<p> | ||
Unfortunately, some programs may be broken by this approach. | ||
Consider the following example: | ||
<blockquote><tt><pre>#include <iostream> | ||
template<int I> struct X { | ||
static int const c = 2; | ||
}; | ||
template<> struct X<0> { | ||
typedef int c; | ||
}; | ||
template<typename T> struct Y { | ||
static int const c = 3; | ||
}; | ||
static int const c = 4; | ||
int main() { | ||
std::cout << (Y<X<1> >::c >::c>::c) << '\n'; | ||
std::cout << (Y<X< 1>>::c >::c>::c) << '\n'; | ||
} | ||
</pre></tt></blockquote> | ||
This program is valid today; it produces the following output: | ||
<blockquote><tt><pre>0 | ||
3 | ||
</pre></tt></blockquote> | ||
With the right angle bracket rule proposed above, the <tt>>></tt> token | ||
in the second statement would change its meaning (from right shift to double right | ||
angle bracket) and the output would therefore | ||
become: | ||
<blockquote><tt><pre>0 | ||
0 | ||
</pre></tt></blockquote> | ||
<p> | ||
<b>Approach 2.</b> | ||
To avoid the backward incompatibility, an alternative solution is to modify | ||
the rule proposed above to only treat the <tt>>></tt> token as two right | ||
angle brackets when parsing template type arguments or template template | ||
arguments, but not when parsing template nontype arguments. This approach would make <tt>A<B<int>></tt> valid, but would leave <tt>C<D<12>></tt> ill-formed. | ||
<p> | ||
Another way to view this alternative approach is that a template argument | ||
is always parsed as far as possible (which may include right shift operators). | ||
When an argument is parsed, the next token must be a comma, a <tt>></tt> | ||
treated as a single closing angle bracket, or (with this proposal) a | ||
<tt>>></tt> token treated as a double angle bracket. | ||
<p> | ||
<b>Approach 3.</b> | ||
Finally, a third way to tackle the problem is to eliminate the right shift | ||
token altogether and to modify the grammar so that two consecutive | ||
<tt>></tt> tokens are treated as a right shift operation in the appropriate circumstances. This would for example allow the following form: | ||
<blockquote><tt><pre> | ||
int i = 10000 > > x; | ||
</pre></tt></blockquote> | ||
If limited to the right shift token, this approach introduces no known | ||
new ambiguities, but it does introduce at least one backward compatibility | ||
issue: The <tt>##</tt> preprocessing token can no longer be applied to two | ||
<tt>></tt> tokens. However, it would be surprising to eliminate the | ||
right shift token and not the left shift token. Eliminating the left | ||
shift token does introduce new parsing ambiguities | ||
(e.g., <tt>&X::operator<tt><</tt> <tt><</tt>Y<tt>></tt></tt>). | ||
The shift-assign operators (<tt><<=</tt> and <tt>>>=</tt>) | ||
lead to similar considerations. It may also come as a surprise that | ||
shift operations are realized through a two-token construct, whereas | ||
other operations (e.g., prefix and postfix <tt>--</tt>, or <tt>&&</tt>) | ||
use a single two-character token. | ||
|
||
<h2>Implementation Experience</h2> | ||
<p> | ||
<b>Approach 1.</b> | ||
As mentioned, the first proposal is analogous to the existing language | ||
rule for the <tt>></tt> token. We therefore do not expect implementation difficulty for the approach. | ||
<p> | ||
<b>Approach 2.</b> | ||
The GNU and EDG C++ compilers currently implement the second proposed | ||
alternative for error recovery purposes. It would be trivial to promote | ||
the error recovery procedure to a correct parse procedure. (Other compilers | ||
appear to have a facility for the same purpose, but I do not know their exact | ||
strategy.) | ||
<p> | ||
<b>Approach 3.</b> | ||
I'm unaware of implementation experience with eliminating shift tokens | ||
and replacing them with grammar that allows two-token shift expressions. | ||
|
||
<h2>Recommendation</h2> | ||
<p> | ||
I suggest we pursue “Approach 1” (which breaks some valid programs). | ||
Specifically, I propose that if even a single left angle bracket is active, | ||
a <tt>>></tt> token not enclosed in parentheses is treated as two | ||
right angle brackets and not as a right shift operator. I do <i>not</i> | ||
recommend the variation described as “Approach 1b.” | ||
<p> | ||
My arguments for doing so are the following: | ||
<ul> | ||
<li>It leaves no remaining cases that require whitespace between | ||
two right angle brackets, which makes teaching easier.</li> | ||
<li>It treats the <tt>>></tt> token in the same way as the <tt>></tt> | ||
token, making both specification and teaching simpler.</li> | ||
<li>Programs that would change meaning are probably as contrived as the | ||
example shown above, and therefore unlikely to be found in nature. Programs that would become ill-formed (i.e., containing a nonparenthesized right-shift operator in a trailing nontype template argument) are probably slightly more common but still rare.</li> | ||
</ul> | ||
<p> | ||
(While the approach of eliminating the shift tokens (Approach 3) was presented for the | ||
sake of completeness, I find that it has enough small technical and | ||
aesthetic problems to make the other approaches far preferable.) | ||
|
||
<h2>Wording changes</h2> | ||
<p> | ||
Insert after the last normative sentence of 14.2/3, but before the example: | ||
<blockquote>Similarly, the first non-nested <tt>>></tt> is treated as two consecutive but distinct <tt>></tt> tokens, the first of which is taken as the end of the <i>template-argument-list</i> and completes the <i>template-id</i>. [ <i>Note:</i> The second <tt>></tt> token produced by this replacement rule may terminate an enclosing <i>template-id</i> construct or it may be part of a different construct (e.g., a cast). <i>--end note</i> ]</blockquote> | ||
<p> | ||
Replace the example of 14.2/3 by the following: | ||
<blockquote>[ <i>Example:</i><pre> | ||
template<int i> class X { /* ... */ }; | ||
X< 1>2 > x1; // Syntax error. | ||
X<(1>2)> x2; // Okay. | ||
|
||
template<class T> class Y { /* ... */ }; | ||
Y<X<1>> x3; // Okay, same as "Y<X<1> > x3;". | ||
Y<X<6>>1>> x4; // Syntax error. Instead, write "Y<X<(6>>1)>> x4;". | ||
</pre> | ||
</blockquote> | ||
<p> | ||
Insert just before the first "<i>Note:</i>" of translation phase "7." in 2.1/1: | ||
<blockquote>[ <i>Note:</i> The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (14.2 temp.names). <i>--end note</i> ]</blockquote> | ||
<p> | ||
Insert a new paragraph 5.2/2 that reads: | ||
<blockquote>[ <i>Note:</i> The <tt>></tt> token following the <i>type-id</i> in a <tt>dynamic_cast</tt>, <tt>static_cast</tt>, <tt>reinterpret_cast</tt>, or <tt>const_cast</tt>, may be the product of replacing a <tt>>></tt> token by two consecutive <tt>></tt> tokens (14.2 temp.names). <i>--end note</i> ]</blockquote> | ||
<p> | ||
Insert in 14/1 just after the grammar rules: | ||
<blockquote>[ <i>Note:</i> The <tt>></tt> token following the <i>template-parameter-list</i> of a <i>template-declaration</i> may be the product of replacing a <tt>>></tt> token by two consecutive <tt>></tt> tokens (14.2 temp.names). <i>--end note</i> ]</blockquote> | ||
<p> | ||
Append to 14.1/1 (following the grammar rules): | ||
<blockquote>[ <i>Note:</i> The <tt>></tt> token following the <i>template-parameter-list</i> of a <i>type-parameter</i> may be the product of replacing a <tt>>></tt> token by two consecutive <tt>></tt> tokens (14.2 temp.names). <i>--end note</i> ]</blockquote> | ||
|
||
<h2>See Also...</h2> | ||
<p> | ||
Reflector messages: c++std-ext-6767,6771,6773,6775,6779,6786,6788,6789,6792,6793,6794,6799,6801,6809. | ||
<p> | ||
Previous revisions: N1649/04-0089, N1699/0139. | ||
</body> | ||
</html> |
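
The “maximum munch” behavior that the paper's introduction blames for the problem can be seen in a toy longest-match tokenizer. The following is a minimal Java sketch, not code from this commit: the class name `MaximalMunchSketch`, the `tokenize` method, and the abbreviated punctuator table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy lexer that always takes the longest punctuator it can match. */
final class MaximalMunchSketch {

  // Assumption: a tiny table with longer entries listed before their prefixes.
  // Real C++ has many more punctuators.
  private static final String[] PUNCTUATORS = {
    ">>=", ">>", ">=", ">", "<<", "<", "::", ",", "(", ")", ";", "="
  };

  static List<String> tokenize(String source) {
    List<String> tokens = new ArrayList<>();
    int i = 0;
    while (i < source.length()) {
      char ch = source.charAt(i);
      if (Character.isWhitespace(ch)) {
        i++; // whitespace only separates tokens
        continue;
      }
      if (Character.isLetterOrDigit(ch) || ch == '_') {
        int start = i;
        while (i < source.length()
            && (Character.isLetterOrDigit(source.charAt(i)) || source.charAt(i) == '_')) {
          i++;
        }
        tokens.add(source.substring(start, i)); // identifier or number
        continue;
      }
      String match = null;
      for (String p : PUNCTUATORS) {
        if (source.startsWith(p, i)) { // first hit is the longest candidate
          match = p;
          break;
        }
      }
      if (match == null) {
        match = String.valueOf(ch); // unknown character: pass it through unchanged
      }
      tokens.add(match);
      i += match.length();
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("vector<vector<bool>> Flags;"));
    System.out.println(tokenize("vector<vector<int> > Table;"));
  }
}
```

In this sketch the first input yields a single `>>` token, while the second, with whitespace between the brackets, yields two separate `>` tokens. That difference is the surprise the paper sets out to remove.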
96 changes: 96 additions & 0 deletions
cxx-squid/src/main/java/org/sonar/cxx/channels/RightAngleBracketsChannel.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
/* | ||
* C++ Community Plugin (cxx plugin) | ||
* Copyright (C) 2010-2021 SonarOpenCommunity | ||
* http://github.com/SonarOpenCommunity/sonar-cxx | ||
* | ||
* This program is free software; you can redistribute it and/or | ||
* modify it under the terms of the GNU Lesser General Public | ||
* License as published by the Free Software Foundation; either | ||
* version 3 of the License, or (at your option) any later version. | ||
* | ||
* This program is distributed in the hope that it will be useful, | ||
* but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | ||
* Lesser General Public License for more details. | ||
* | ||
* You should have received a copy of the GNU Lesser General Public License | ||
* along with this program; if not, write to the Free Software Foundation, | ||
* Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. | ||
*/ | ||
package org.sonar.cxx.channels; | ||
|
||
import com.sonar.sslr.api.Token; | ||
import com.sonar.sslr.impl.Lexer; | ||
import org.sonar.cxx.parser.CxxPunctuator; | ||
import org.sonar.sslr.channel.Channel; | ||
import org.sonar.sslr.channel.CodeReader; | ||
|
||
/** | ||
* Solving the problem amounts to decreeing that under some circumstances a >> token is treated as | ||
* two right angle brackets instead of a right shift operator. | ||
* | ||
* According to Document number: N1757 05-0017, Right Angle Brackets (Revision 2) | ||
* | ||
* Decree that if a left angle bracket is active (i.e. not yet matched by a right angle bracket) | ||
* the >> token is treated as two right angle brackets instead of a shift operator, except within | ||
* - parentheses or | ||
* - brackets that are themselves within the angle brackets. | ||
* | ||
* A<(X>Y)> a; // The first > token appears within parentheses and | ||
* // therefore is not a right angle bracket. The second one | ||
* // is a right angle bracket because a left angle bracket | ||
* // is active and no parentheses are more recently active. | ||
*/ | ||
public class RightAngleBracketsChannel extends Channel<Lexer> { | ||
|
||
private int angleBracketLevel = 0; | ||
private int parentheseLevel = 0; | ||
|
||
@Override | ||
public boolean consume(CodeReader code, Lexer output) { | ||
char ch = (char) code.peek(); | ||
boolean consumed = false; | ||
|
||
if (ch == '<') { | ||
if (parentheseLevel == 0) { | ||
char next = code.charAt(1); | ||
if ((next != '<') && (next != '=')) { // not <<, <=, <<=, <=> | ||
angleBracketLevel++; | ||
} | ||
} | ||
} else if (angleBracketLevel > 0) { | ||
switch (ch) { | ||
case ';': // end of expression => reset | ||
angleBracketLevel = 0; | ||
parentheseLevel = 0; | ||
break; | ||
case '>': | ||
if (parentheseLevel == 0) { | ||
output.addToken(Token.builder() | ||
.setLine(code.getLinePosition()) | ||
.setColumn(code.getColumnPosition()) | ||
.setURI(output.getURI()) | ||
.setValueAndOriginalValue(">") | ||
.setType(CxxPunctuator.GT) | ||
.build()); | ||
code.pop(); | ||
consumed = true; | ||
} | ||
angleBracketLevel = Math.max(0, angleBracketLevel - 1); | ||
if (angleBracketLevel == 0) { | ||
parentheseLevel = 0; | ||
} | ||
break; | ||
case '(': | ||
parentheseLevel++; | ||
break; | ||
case ')': | ||
parentheseLevel = Math.max(0, parentheseLevel - 1); | ||
break; | ||
} | ||
} | ||
|
||
return consumed; | ||
} | ||
|
||
} |
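
The rule quoted in the Javadoc above (Approach 1 of N1757) can be tried out without SSLR. The sketch below is a hypothetical, dependency-free helper rather than the channel itself: `Approach1Sketch` and `classify` are made-up names, and it tracks open delimiters with a stack instead of the channel's two counters. It also assumes that every `<` other than `<<` or `<=` opens a template argument list, and that the input contains no comments, literals, or `->` operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of sonar-cxx: labels each '>' or '>>' under Approach 1. */
final class Approach1Sketch {

  static List<String> classify(String source) {
    Deque<Character> open = new ArrayDeque<>(); // active '<', '(' and '[' delimiters
    List<String> labels = new ArrayList<>();
    for (int i = 0; i < source.length(); i++) {
      char ch = source.charAt(i);
      char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
      if (ch == '<' && (next == '<' || next == '=')) {
        i++; // assumption: "<<" and "<=" are operators, never template brackets
      } else if (ch == '<' || ch == '(' || ch == '[') {
        open.push(ch); // assumption: every other '<' opens a template argument list
      } else if (ch == ')' || ch == ']') {
        char opener = (ch == ')') ? '(' : '[';
        while (!open.isEmpty() && open.pop() != opener) {
          // discard unmatched '<' entries nested inside the parentheses or brackets
        }
      } else if (ch == '>') {
        if (!open.isEmpty() && open.peek() == '<') {
          open.pop();
          labels.add("close"); // each half of ">>" is handled as its own '>'
        } else if (next == '>') {
          labels.add("shift"); // no active '<' here: ">>" stays a right shift
          i++;
        } else {
          labels.add("greater");
        }
      }
    }
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(classify("A<(X>Y)> a;"));
    System.out.println(classify("Y<X<1>> x3;"));
    System.out.println(classify("Y<X<(6>>1)>> x4;"));
  }
}
```

For the paper's examples, `classify("Y<X<1>> x3;")` labels both halves of `>>` as closing brackets, while in `Y<X<(6>>1)>> x4;` the parenthesized `>>` remains a shift and only the final `>>` is split.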