[lex] Reorder subclauses to better follow phases of translation #7316

Merged: 1 commit, Oct 17, 2024
236 changes: 118 additions & 118 deletions source/lex.tex
@@ -627,83 +627,6 @@
\end{example}
\indextext{token!preprocessing|)}

\rSec1[lex.digraph]{Alternative tokens}

\pnum
\indextext{token!alternative|(}%
Alternative token representations are provided for some operators and
punctuators.
\begin{footnote}
\indextext{digraph}%
These include ``digraphs'' and additional reserved words. The term
``digraph'' (token consisting of two characters) is not perfectly
descriptive, since one of the alternative \grammarterm{preprocessing-token}s is
\tcode{\%:\%:} and of course several primary tokens contain two
characters. Nonetheless, those alternative tokens that aren't lexical
keywords are colloquially known as ``digraphs''.
\end{footnote}

\pnum
In all respects of the language, each alternative token behaves the
same, respectively, as its primary token, except for its spelling.
\begin{footnote}
Thus the ``stringized'' values\iref{cpp.stringize} of
\tcode{[} and \tcode{<:} will be different, maintaining the source
spelling, but the tokens can otherwise be freely interchanged.
\end{footnote}
The set of alternative tokens is defined in
\tref{lex.digraph}.

\begin{tokentable}{Alternative tokens}{lex.digraph}{Alternative}{Primary}
\tcode{<\%} & \tcode{\{} &
\keyword{and} & \tcode{\&\&} &
\keyword{and_eq} & \tcode{\&=} \\ \rowsep
\tcode{\%>} & \tcode{\}} &
\keyword{bitor} & \tcode{|} &
\keyword{or_eq} & \tcode{|=} \\ \rowsep
\tcode{<:} & \tcode{[} &
\keyword{or} & \tcode{||} &
\keyword{xor_eq} & \tcode{\caret=} \\ \rowsep
\tcode{:>} & \tcode{]} &
\keyword{xor} & \tcode{\caret} &
\keyword{not} & \tcode{!} \\ \rowsep
\tcode{\%:} & \tcode{\#} &
\keyword{compl} & \tcode{\~} &
\keyword{not_eq} & \tcode{!=} \\ \rowsep
\tcode{\%:\%:} & \tcode{\#\#} &
\keyword{bitand} & \tcode{\&} &
& \\
\end{tokentable}%
\indextext{token!alternative|)}

\rSec1[lex.token]{Tokens}

\indextext{token|(}%
\begin{bnf}
\nontermdef{token}\br
identifier\br
keyword\br
literal\br
operator-or-punctuator
\end{bnf}

\pnum
\indextext{\idxgram{token}}%
There are five kinds of tokens: identifiers, keywords, literals,%
\begin{footnote}
Literals include strings and character and numeric literals.
\end{footnote}
operators, and other separators.
\indextext{whitespace}%
Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments
(collectively, ``whitespace''), as described below, are ignored except
as they serve to separate tokens.
\begin{note}
Whitespace can separate otherwise adjacent identifiers, keywords, numeric
literals, and alternative tokens containing alphabetic characters.
\end{note}
\indextext{token|)}

\rSec1[lex.header]{Header names}

\indextext{header!name|(}%
@@ -793,6 +716,124 @@
a \grammarterm{floating-point-literal} token.%
\indextext{number!preprocessing|)}

\rSec1[lex.operators]{Operators and punctuators}

\pnum
\indextext{operator|(}%
\indextext{punctuator|(}%
The lexical representation of \Cpp{} programs includes a number of
preprocessing tokens that are used in the syntax of the preprocessor or
are converted into tokens for operators and punctuators:

\begin{bnf}
\nontermdef{preprocessing-op-or-punc}\br
preprocessing-operator\br
operator-or-punctuator
\end{bnf}

\begin{bnf}
%% Ed. note: character protrusion would misalign various operators.
\microtypesetup{protrusion=false}\obeyspaces
\nontermdef{preprocessing-operator} \textnormal{one of}\br
\terminal{\# \#\# \%: \%:\%:}
\end{bnf}

\begin{bnf}
\microtypesetup{protrusion=false}\obeyspaces
\nontermdef{operator-or-punctuator} \textnormal{one of}\br
\terminal{\{ \} [ ] ( )}\br
\terminal{<: :> <\% \%> ; : ...}\br
\terminal{? :: . .* -> ->* \~}\br
\terminal{! + - * / \% \caret{} \& |}\br
\terminal{= += -= *= /= \%= \caret{}= \&= |=}\br
\terminal{== != < > <= >= <=> \&\& ||}\br
\terminal{<< >> <<= >>= ++ -- ,}\br
\terminal{\keyword{and} \keyword{or} \keyword{xor} \keyword{not} \keyword{bitand} \keyword{bitor} \keyword{compl}}\br
\terminal{\keyword{and_eq} \keyword{or_eq} \keyword{xor_eq} \keyword{not_eq}}
\end{bnf}

Each \grammarterm{operator-or-punctuator} is converted to a single token
in translation phase 7\iref{lex.phases}.%
\indextext{punctuator|)}%
\indextext{operator|)}

\rSec1[lex.digraph]{Alternative tokens}

\pnum
\indextext{token!alternative|(}%
Alternative token representations are provided for some operators and
punctuators.
\begin{footnote}
\indextext{digraph}%
These include ``digraphs'' and additional reserved words. The term
``digraph'' (token consisting of two characters) is not perfectly
descriptive, since one of the alternative \grammarterm{preprocessing-token}s is
\tcode{\%:\%:} and of course several primary tokens contain two
characters. Nonetheless, those alternative tokens that aren't lexical
keywords are colloquially known as ``digraphs''.
\end{footnote}

\pnum
In all respects of the language, each alternative token behaves the
same, respectively, as its primary token, except for its spelling.
\begin{footnote}
Thus the ``stringized'' values\iref{cpp.stringize} of
\tcode{[} and \tcode{<:} will be different, maintaining the source
spelling, but the tokens can otherwise be freely interchanged.
\end{footnote}
The set of alternative tokens is defined in
\tref{lex.digraph}.

\begin{tokentable}{Alternative tokens}{lex.digraph}{Alternative}{Primary}
\tcode{<\%} & \tcode{\{} &
\keyword{and} & \tcode{\&\&} &
\keyword{and_eq} & \tcode{\&=} \\ \rowsep
\tcode{\%>} & \tcode{\}} &
\keyword{bitor} & \tcode{|} &
\keyword{or_eq} & \tcode{|=} \\ \rowsep
\tcode{<:} & \tcode{[} &
\keyword{or} & \tcode{||} &
\keyword{xor_eq} & \tcode{\caret=} \\ \rowsep
\tcode{:>} & \tcode{]} &
\keyword{xor} & \tcode{\caret} &
\keyword{not} & \tcode{!} \\ \rowsep
\tcode{\%:} & \tcode{\#} &
\keyword{compl} & \tcode{\~} &
\keyword{not_eq} & \tcode{!=} \\ \rowsep
\tcode{\%:\%:} & \tcode{\#\#} &
\keyword{bitand} & \tcode{\&} &
& \\
\end{tokentable}%
\indextext{token!alternative|)}
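
As an illustration of the two paragraphs above (not part of the draft text), the following sketch shows alternative tokens standing in for their primary tokens, and the one place where the spelling remains visible: stringization. The macro `STRINGIZE` and all function names are hypothetical, chosen only for this example. It assumes a conforming compiler; some compilers need a conformance flag before they treat `and`, `not`, and similar spellings as tokens.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: applies the # operator to its argument.
#define STRINGIZE(x) #x

// The same logic written with primary and with alternative tokens.
inline bool a_and_not_b_primary(bool a, bool b) { return a && !b; }
inline bool a_and_not_b_alternative(bool a, bool b) <% return a and not b; %>

// <: :> replace [ ], and <% %> replace { }, including in initializers.
inline int last_primary() { int v[3] = {1, 2, 3}; return v[2]; }
inline int last_alternative() <% int v<:3:> = <%1, 2, 3%>; return v<:2:>; %>

// Stringization keeps the source spelling, so these two results differ.
inline std::string spelling_primary() { return STRINGIZE([); }
inline std::string spelling_alternative() { return STRINGIZE(<:); }

// Collects the checks in one place; call it from any test driver.
inline void check_alternative_tokens() {
    assert(a_and_not_b_primary(true, false) == a_and_not_b_alternative(true, false));
    assert(last_primary() == last_alternative());
    assert(spelling_primary() != spelling_alternative());
}
```

Calling `check_alternative_tokens()` exercises all three properties. Only the stringized spellings are expected to differ.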

\rSec1[lex.token]{Tokens}

\indextext{token|(}%
\begin{bnf}
\nontermdef{token}\br
identifier\br
keyword\br
literal\br
operator-or-punctuator
\end{bnf}

\pnum
\indextext{\idxgram{token}}%
There are five kinds of tokens: identifiers, keywords, literals,%
\begin{footnote}
Literals include strings and character and numeric literals.
\end{footnote}
operators, and other separators.
\indextext{whitespace}%
Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments
(collectively, ``whitespace''), as described below, are ignored except
as they serve to separate tokens.
\begin{note}
Whitespace can separate otherwise adjacent identifiers, keywords, numeric
literals, and alternative tokens containing alphabetic characters.
\end{note}
\indextext{token|)}
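
The note above can be made concrete with a small sketch (an illustration only, not draft text). The function names and the variable `notb` are hypothetical. With whitespace, `not b` is the alternative token `not` followed by the identifier `b`; without it, `notb` is a single identifier.

```cpp
// "not b": two tokens, the operator `not` applied to `b`.
inline int with_whitespace() {
    bool b = false;
    int notb = 7;
    (void)notb;      // silence the unused-variable warning
    return not b;    // !false converts to 1
}

// "notb": one token, an identifier that happens to start with "not".
inline int without_whitespace() {
    bool b = false;
    (void)b;         // silence the unused-variable warning
    int notb = 7;
    return notb;     // reads the variable, yields 7
}
```

The two functions differ only in one space character inside the return statement, yet they return different values because the token sequences differ.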

\rSec1[lex.name]{Identifiers}

\indextext{identifier|(}%
@@ -1038,47 +1079,6 @@
\indextext{keyword|)}%


\rSec1[lex.operators]{Operators and punctuators}

\pnum
\indextext{operator|(}%
\indextext{punctuator|(}%
The lexical representation of \Cpp{} programs includes a number of
preprocessing tokens that are used in the syntax of the preprocessor or
are converted into tokens for operators and punctuators:

\begin{bnf}
\nontermdef{preprocessing-op-or-punc}\br
preprocessing-operator\br
operator-or-punctuator
\end{bnf}

\begin{bnf}
%% Ed. note: character protrusion would misalign various operators.
\microtypesetup{protrusion=false}\obeyspaces
\nontermdef{preprocessing-operator} \textnormal{one of}\br
\terminal{\# \#\# \%: \%:\%:}
\end{bnf}

\begin{bnf}
\microtypesetup{protrusion=false}\obeyspaces
\nontermdef{operator-or-punctuator} \textnormal{one of}\br
\terminal{\{ \} [ ] ( )}\br
\terminal{<: :> <\% \%> ; : ...}\br
\terminal{? :: . .* -> ->* \~}\br
\terminal{! + - * / \% \caret{} \& |}\br
\terminal{= += -= *= /= \%= \caret{}= \&= |=}\br
\terminal{== != < > <= >= <=> \&\& ||}\br
\terminal{<< >> <<= >>= ++ -- ,}\br
\terminal{\keyword{and} \keyword{or} \keyword{xor} \keyword{not} \keyword{bitand} \keyword{bitor} \keyword{compl}}\br
\terminal{\keyword{and_eq} \keyword{or_eq} \keyword{xor_eq} \keyword{not_eq}}
\end{bnf}

Each \grammarterm{operator-or-punctuator} is converted to a single token
in translation phase 7\iref{lex.phases}.%
\indextext{punctuator|)}%
\indextext{operator|)}

\rSec1[lex.literal]{Literals}%
\indextext{literal|(}
