From dcf6f3281b41e963bd6b77e209bd622f630d0f9d Mon Sep 17 00:00:00 2001
From: Bruno Haible
Date: Fri, 15 Mar 2024 16:57:24 +0100
Subject: [PATCH] Disallow whitespace as the first character of a reserved-body in a reserved-statement.

In the 'reserved-statement' nonterminal, there is an ambiguity if there is
more than one whitespace character between the 'reserved-keyword' and the
first non-whitespace character of the 'reserved-body', because these
whitespace characters can be seen as part of the 's' nonterminal or as part
of the 'reserved-body' nonterminal.

According to the principles explained in #725 and the proposed resolution
of #721, it is not desired that a 'reserved-body' starts with a whitespace
character; rather, such a whitespace character is meant to be interpreted
as part of the preceding 's' nonterminal.

Test case:
```
.regex /foo/{xyz}{{hello}}
```

This patch removes this ambiguity by disallowing whitespace as the first
character of a 'reserved-body' in a reserved-statement. It thus fixes the
first part of #721.

Details:
- In the other occurrences of 'reserved-body' (in a 'reserved-annotation'
  or 'private-use-annotation'), the leading whitespace is separated out as
  well. This has no influence on the set of inputs that the
  'reserved-annotation' and 'private-use-annotation' nonterminals can
  match, but highlights that the parser should trim off this leading
  whitespace in these places before entering the reserved-body into the
  data model.
- A nonterminal 'reserved-body-part' is introduced.
---
 spec/message.abnf | 7 ++++---
 spec/syntax.md    | 7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/spec/message.abnf b/spec/message.abnf
index 8436fb9c99..465c7fd89b 100644
--- a/spec/message.abnf
+++ b/spec/message.abnf
@@ -61,13 +61,14 @@ reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
 reserved-keyword = "." name
 
 ; Reserve additional sigils for use by future versions of this specification.
-reserved-annotation = reserved-annotation-start reserved-body
+reserved-annotation = reserved-annotation-start [[s] reserved-body]
 reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"
 
 ; Reserve sigils for private-use by implementations.
-private-use-annotation = private-start reserved-body
+private-use-annotation = private-start [[s] reserved-body]
 private-start = "^" / "&"
-reserved-body = *([s] 1*(reserved-char / reserved-escape / quoted))
+reserved-body = reserved-body-part *([s] reserved-body-part)
+reserved-body-part = reserved-char / reserved-escape / quoted
 
 ; Names and identifiers
 ; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
diff --git a/spec/syntax.md b/spec/syntax.md
index 3b4384c8e8..b826899751 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -610,7 +610,7 @@ wish to use a syntax exactly like other functions. Specifically:
 A _private-use annotation_ MAY be empty after its introducing sigil.
 
 ```abnf
-private-use-annotation = private-start reserved-body
+private-use-annotation = private-start [[s] reserved-body]
 private-start = "^" / "&"
 ```
 
@@ -653,10 +653,11 @@ While a reserved sequence is technically "well-formed",
 unrecognized _reserved-annotations_ or _private-use-annotations_ have no meaning.
 
 ```abnf
-reserved-annotation = reserved-annotation-start reserved-body
+reserved-annotation = reserved-annotation-start [[s] reserved-body]
 reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"
 
-reserved-body = *([s] 1*(reserved-char / reserved-escape / quoted))
+reserved-body = reserved-body-part *([s] reserved-body-part)
+reserved-body-part = reserved-char / reserved-escape / quoted
 ```
 
 ## Markup