From 06fa1796b781dd850ead7e873d779dd1f1b510e6 Mon Sep 17 00:00:00 2001 From: "Dr. Colin Hirsch" Date: Tue, 23 Apr 2024 23:20:59 +0200 Subject: [PATCH] Work on rule reference. --- doc/Rule-Reference.md | 540 +++++++++++++++++++++++------------------- 1 file changed, 300 insertions(+), 240 deletions(-) diff --git a/doc/Rule-Reference.md b/doc/Rule-Reference.md index 11400a967..2d6fc90c3 100644 --- a/doc/Rule-Reference.md +++ b/doc/Rule-Reference.md @@ -1,42 +1,14 @@ # Rule Reference -This page contains brief descriptions of all PEGTL rule and combinator classes. +The reference documentation for all PEGTL rules and combinators. -## Failure Modes - -The information about how much input is consumed by the rules only applies when the rules succeed. -Otherwise there are two failure modes with different requirements. - -- *Local failure* is when a rule returns `false` and the rule **must** generally rewind the input to where its match attempt started. -- *Global failure* is when a rule throws an exception (usually of type `tao::pegtl::parse_error`)(usually via the control-class' `raise()` function). - -Since an exception, by default, aborts a parsing run -- hence the term "global failure" -- there are no assumptions or requirements for the throwing rule to rewind the input. - -On the other hand a local failure will frequently lead to back-tracking, i.e. the attempt to match a different rule at the same position in the input, wherefore rules that were previously attempted at the same position must rewind back to where they started in preparation of the next attempt. - -Note that in some cases it is not necessary to actually rewind on local failure, see the description of the [rewind_mode](Rules-and-Grammars.md#modes) in the section on [how to implement custom rules](Rules-and-Grammars.md#creating-new-rules), and that the PEGTL attempts to minimise superfluous rewinding by statically detecting most of these cases. 
- -## Equivalence - -Some rule classes are said to be *equivalent to* a combination of other rules. -Here, *equivalence* is with respect to which inputs are matched, but not (necessarily) how the rule is implemented. - -For rules other than `must<>` that contain "must" in their name, rule equivalence shows which rule will be used to call the control class' `raise()` function when certain sub-rules fail to match. - -## Implementation - -The "meta data and implementation mapping" section of each rule's description shows both how the rule is implemented and what the [meta data](Meta-Data-and-Visit.md) looks like. -When the list of sub-rules is empty then the definition of `subs_t` is omitted from the description. - -## Parameter Packs - -The documentation will use [(template parameter) packs](https://en.cppreference.com/w/cpp/language/parameter_pack) when zero-or-more or one-or-more of a (template) parameter are allowed. -For example `seq< R... >` accepts zero-or-more template parameters. -In the zero case, i.e. `seq<>`, we describe `R` as "empty". -When at least one parameter is given, i.e. `seq< A >` or `seq< A, B, C >`, `R` is "non-empty". ## Contents +* [Remarks](#remarks) + * [Equivalence](#equivalence) + * [Implementation](#implementation) + * [Parameter Packs](#parameter-packs) * [Meta Rules](#meta-rules) * [Combinators](#combinators) * [Convenience](#convenience) @@ -55,15 +27,42 @@ When at least one parameter is given, i.e. `seq< A >` or `seq< A, B, C >`, `R` i * [Exception Rules](#exception-rules) * [Full Index](#full-index) + +## Remarks + +#### Equivalence + +Some rule classes are said to be *equivalent to* a combination of other rules. +This *equivalence* is with respect to which inputs are matched, but is **not** (necessarily) how the rule is implemented. + +For rules other than `must<>` that contain "must" in their name, rule equivalence shows which rule will be used to call the control class' `raise()` function when certain sub-rules fail to match. 
+ +#### Implementation + +The "meta data and implementation mapping" section of each rule's description shows both how the rule is implemented and what the [meta data](Meta-Data-and-Visit.md) looks like. +When the list of sub-rules is empty then the definition of `subs_t` is omitted from the description. + +Remember that the default [control](Control-and-Debug.md) `tao::pegtl::normal` does *not* call control functions for all rules in the `tao::pegtl::internal` namespace. + +#### Parameter Packs + +The documentation will use [(template parameter) packs](https://en.cppreference.com/w/cpp/language/parameter_pack) when zero-or-more or one-or-more of a (template) parameter are allowed. +For example `seq< R... >` accepts zero-or-more template parameters. +In the zero case, i.e. `seq<>`, we describe `R` as "empty". +When at least one parameter is given, i.e. `seq< A >` or `seq< A, B, C >`, `R` is "non-empty". + + ## Meta Rules +These rules manage and change the [control](Control-and-Debug.md), [action and states](Actions-and-States.md) during a parsing run. + These rules are in namespace `tao::pegtl`. ###### `action< A, R... >` * [Equivalent] to `seq< R... >`, but: -* Uses the given class template `A` for [actions](Actions-and-States.md). -* Does not `enable` or `disable` actions while matching `R...`. +* Uses the given class template `A` as [action](Actions-and-States.md). +* Does *not* change whether actions are enabled or disabled! * [Meta data] and [implementation] mapping: - `action< A >::rule_t` is `internal::success` - `action< A, R >::rule_t` is `internal::action< A, R >` @@ -74,7 +73,7 @@ These rules are in namespace `tao::pegtl`. ###### `control< C, R... >` * [Equivalent] to `seq< R... >`, but: -* Uses the given class template `C` as [control class](Control-and-Debug.md). +* Uses the given class template `C` as [control](Control-and-Debug.md). 
* [Meta data] and [implementation] mapping: - `control< C >::rule_t` is `internal::success` - `control< C, R >::rule_t` is `internal::control< C, R >` @@ -85,7 +84,7 @@ These rules are in namespace `tao::pegtl`. ###### `disable< R... >` * [Equivalent] to `seq< R... >`, but: -* Disables all actions. +* Disables all actions while parsing `R...`. * [Meta data] and [implementation] mapping: - `disable<>::rule_t` is `internal::success` - `disable< R >::rule_t` is `internal::disable<, R >` @@ -96,7 +95,7 @@ These rules are in namespace `tao::pegtl`. ###### `enable< R... >` * [Equivalent] to `seq< R... >`, but: -* Enables all actions (if any). +* Enables all actions (if any) while parsing `R...`. * [Meta data] and [implementation] mapping: - `enable<>::rule_t` is `internal::success` - `enable< R >::rule_t` is `internal::enable< R >` @@ -108,8 +107,9 @@ These rules are in namespace `tao::pegtl`. * [Equivalent] to `seq< R... >`, but: * Replaces all state arguments with a new instance `s` of type `S`. -* `s` is constructed with the input and all previous states as arguments. -* If `seq< R... >` succeeds then `s.success()` is called with the input after the match and all previous states as arguments. +* `s` is constructed with the input and all previous states as arguments, or +* `s` is default constructed if `S` has no constructor for the above clause. +* If `R...` succeeds then `s.success()` is called with the input (after the match) and all previous states as arguments. * [Meta data] and [implementation] mapping: - `state< S >::rule_t` is `internal::success` - `state< S, R >::rule_t` is `internal::state< S, R >` @@ -117,18 +117,21 @@ These rules are in namespace `tao::pegtl`. - `state< S, R... >::rule_t` is `internal::state< S, internal::seq< R... > >` - `state< S, R... >::subs_t` is `type_list< internal::seq< R... > >` + ## Combinators Combinators (or combinator rules) are rules that combine (other) rules into new ones. 
-These are the classical **PEG** combinator rules and are defined in namespace `tao::pegtl`. +The following rules correspond to the classical [**PEG**](https://en.wikipedia.org/wiki/Parsing_expression_grammar) combinators or operators. + +These rules are in namespace `tao::pegtl`. ###### `at< R... >` * PEG **and-predicate** &*e* * Succeeds if and only if `seq< R... >` would succeed. -* Consumes nothing, i.e. rewinds after matching. -* Disables all actions. +* Consumes nothing, regardless of the result. +* Disables all actions while matching `R...`. * [Meta data] and [implementation] mapping: - `at<>::rule_t` is `internal::success` - `at< R >::rule_t` is `internal::at< R >` @@ -140,8 +143,8 @@ These are the classical **PEG** combinator rules and are defined in namespace `t * PEG **not-predicate** !*e* * Succeeds if and only if `seq< R... >` would **not** succeed. -* Consumes nothing, i.e. rewinds after matching. -* Disables all actions. +* Consumes nothing, regardless of the result. +* Disables all actions while matching `R...`. * [Meta data] and [implementation] mapping: - `not_at<>::rule_t` is `internal::failure` - `not_at< R >::rule_t` is `internal::not_at< R >` @@ -152,7 +155,8 @@ These are the classical **PEG** combinator rules and are defined in namespace `t ###### `opt< R... >` * PEG **optional** *e*? -* Optional `seq< R... >`, i.e. attempt to match `seq< R... >` and signal success regardless of the result. +* Optional `seq< R... >`, i.e. attempt to match `seq< R... >` and always return success: +* The return value of `opt< R... >` does *not* depend on whether `R...` matched. * [Equivalent] to `sor< seq< R... >, success >`. * [Meta data] and [implementation] mapping: - `opt<>::rule_t` is `internal::success` @@ -165,7 +169,7 @@ These are the classical **PEG** combinator rules and are defined in namespace `t * PEG **one-or-more** *e*+ * Matches `seq< R... >` as often as possible and succeeds if it matches at least once. -* [Equivalent] to `rep_min< 1, R... >`. 
+* [Equivalent] to `seq< R..., star< R... > >`. * `R` must be a non-empty rule pack. * [Meta data] and [implementation] mapping: - `plus< R >::rule_t` is `internal::plus< R >` @@ -207,6 +211,7 @@ These are the classical **PEG** combinator rules and are defined in namespace `t * PEG **zero-or-more** *e** * Matches `seq< R... >` as often as possible and always succeeds. +* [Equivalent] to `opt< plus< R... > >`. * `R` must be a non-empty rule pack. * [Meta data] and [implementation] mapping: - `star< R >::rule_t` is `internal::star< R >` @@ -214,16 +219,16 @@ These are the classical **PEG** combinator rules and are defined in namespace `t - `star< R... >::rule_t` is `internal::star< internal::seq< R... > >` - `star< R... >::subs_t` is `type_list< internal::seq< R... > >` + ## Convenience -The PEGTL offers a variety of convenience rules which help writing -concise grammars as well as offering performance benefits over the -equivalent implementation with classical PEG combinators. +The PEGTL contains convenience rules that serve as higher-level building blocks for concise grammars and/or as higher performance alternative to an equivalent implementation with classical PEG combinators. These rules are in namespace `tao::pegtl`. ###### `if_then_else< R, S, T >` +* Attempts to match either `S` or `T` depending on whether `R` matched. * [Equivalent] to `sor< seq< R, S >, seq< not_at< R >, T > >`. * [Meta data] and [implementation] mapping: - `if_then_else< R, S, T >::rule_t` is `internal::if_then_else< R, S, T>` @@ -281,9 +286,13 @@ Note that `S` is ignored in the grammar analysis. - `pad< R, S, T >::rule_t` is `internal::seq< internal::star< S >, R, internal::star< T > >` - `pad< R, S, T >::subs_t` is `type_list< internal::star< S >, R, internal::star< T > >` +A common mistake is to forget about the implicit `star` and use e.g. `star< blank >` for `S` (and `T`). +This attempts to match `star< star< S > >` which is an infinite loop without progress. 
+The PEGTL [grammar analysis](Grammar-Analysis.md) catches this mistake. + ###### `pad_opt< R, P >` -* Matches an optional `R` that can be padded by arbitrary many `P` or just arbitrary many `P`. +* Matches an optional `R` that can be padded by arbitrarily many `P`, or just arbitrarily many `P`. * [Equivalent] to `seq< star< P >, opt< R, star< P > > >`. * [Meta data] and [implementation] mapping: - `pad_opt< R, P >::rule_t` is `internal::seq< internal::star< P >, internal::opt< R, internal::star< P > > >` @@ -299,6 +308,7 @@ Note that `S` is ignored in the grammar analysis. * Consumes everything that the successful rules of `R...` consumed. * `R` must be a non-empty rule pack. * [Equivalent] to `opt< R >` when `R...` is a single rule. +* [Equivalent] to `opt< R1, opt< R2, opt< ... > > >` when `R...` is `R1`, `R2`, ... * [Meta data] and [implementation] mapping: - `partial< R... >::rule_t` is `internal::partial< R... >` - `partial< R... >::subs_t` is `type_list< R... >` @@ -306,12 +316,12 @@ Note that `S` is ignored in the grammar analysis. ###### `rematch< R, S... >` * Succeeds if `R` matches, and each `S` matches the input that `R` matched. -* Ignores all `S` for the [grammar analysis](Grammar-Analysis.md). * [Meta data] and [implementation] mapping: - `rematch< R, S... >::rule_t` is `internal::rematch< R, S... >` - `rematch< R, S... >::subs_t` is `type_list< R, S... >` -Note that the `S` do *not* need to match *all* of the input matched by `R` (which is why `minus` uses `eof` in its implementation). +Note that the rules in `S...` do *not* need to match *all* of the input matched by `R`. +(This is why [`minus`](#minus-m-s-) uses `eof` in its implementation.) Note that the `S...` are ignored in the grammar analysis. @@ -407,7 +417,7 @@ Note that the `S...` are ignored in the grammar analysis. ###### `until< R >` * Consumes all input until `R` matches. -* [Equivalent] to `until< R, any >`. +* [Equivalent] to `until< R, consume< 1 > >`. 
* [Meta data] and [implementation] mapping: - `until< R >::rule_t` is `internal::until< R >` - `until< R >::subs_t` is `type_list< R >` @@ -416,21 +426,22 @@ Note that the `S...` are ignored in the grammar analysis. * Matches `seq< S... >` as long as `at< R >` does not match and succeeds when `R` matches. * [Equivalent] to `seq< star< not_at< R >, S... >, R >`. -* Does not apply if `S` is an empty rule pack, see the previous entry for the semantics of `until< R >`. +* See the previous entry for `until< R >`, i.e. when `S...` is empty. * [Meta data] and [implementation] mapping: - `until< R, S >::rule_t` is `internal::until< R, S >` - `until< R, S >::subs_t` is `type_list< R, S >` - `until< R, S... >::rule_t` is `internal::until< R, internal::seq< S... > >` - `until< R, S... >::subs_t` is `type_list< R, internal::seq< S... > >` -## Action Rules -These rules are in namespace `tao::pegtl`. +## Action Rules These rules replicate the intrusive way actions were called from within the grammar in the PEGTL 0.x with the `apply<>` and `if_apply<>` rules. -The actions for these rules are classes, rather than class templates as required for `parse()` and the `action<>`-rule. +The actions for these rules are classes, rather than class templates as required for `parse()` and `action<>`. These rules *do* respect the current `apply_mode`, but do **not** use the Control class to invoke the actions. +These rules are in namespace `tao::pegtl`. + ###### `apply< A... >` * Calls `A::apply()` for all `A`, in order, with an empty input and all states as arguments. @@ -456,24 +467,25 @@ These rules *do* respect the current `apply_mode`, but do **not** use the Contro - `if_apply< R, A... >::rule_t` is `internal::if_apply< R, A... >` - `if_apply< R, A... >::subs_t` is `type_list< R >` + ## Atomic Rules -These rules are in namespace `tao::pegtl`. +Atomic rules do not rely on other rules (but might have template parameters that are not rules). -Atomic rules do not rely on other rules. 
+These rules are in namespace `tao::pegtl`. ###### `bof` * Succeeds at "beginning-of-file", i.e. when the input is at its start. * Does not consume input. * Requires an input `in` with the `in.start()` member function, and/or -* requires an input `in` where `in`direct_position()` has a `count` member. +* requires an input `in` where `in.direct_position().count` is available. * [Meta data] and [implementation] mapping: - `bof::rule_t` is `internal::bof` ###### `bol` -* Succeeds at "beginning-of-line", i.e. when the input's `column()` member function returns one. +* Succeeds at "beginning-of-line", i.e. when the input's `column()` member function returns `1`. * Does not consume input. * Requires an input with eager text position tracking, more precisely: * Requires an input `in` where `in.direct_position().column` is available. @@ -482,23 +494,33 @@ Atomic rules do not rely on other rules. ###### `consume< Count >` -TODO +* Succeeds if the input contains at least `Count` further objects, and +* unconditionally consumes `Count` objects from the input. +* Limited to the buffer size when using an [Incremental Input]. +* [Meta data] and [implementation] mapping: + - `consume< 0 >::rule_t` is `internal::success` + - `consume< N >::rule_t` is `internal::consume< N, void >` ###### `eof` -* Succeeds at "end-of-file", i.e. when the input is empty or all input has been consumed. +* Succeeds at "end-of-file", i.e. when the input is empty (all input has been consumed). * Does not consume input. * [Meta data] and [implementation] mapping: - `eof::rule_t` is `internal::eof` ###### `eol` -* Depends on the `Eol` template parameter of the input, by default: -* Matches and consumes a Unix or MS-DOS line ending, that is: +* Matches a single end-of-line as defined by the input. +* The default definition of what constitutes a line ending is system-dependent. + - On Unix-like systems a Unix line ending is matched, a single 'LF'. 
+ - On Windows systems an MS-DOS line ending 'CR LF' and Unix line endings work. +* Requires an input with an `eol_rule` type definition. * [Equivalent] to `sor< one< '\n' >, string< '\r', '\n' > >`. * [Meta data] and [implementation] mapping: - `eol::rule_t` is `internal::eol` +Note that the default behaviour can be changed either by defining `TAO_PEGTL_DEFAULT_EOL` before `tao/pegtl/system.hpp` is (indirectly) included or by supplying an end-of-line rule as template parameter to the input. + ###### `eolf` * [Equivalent] to `sor< eof, eol >`. @@ -507,7 +529,7 @@ TODO ###### `everything` -* Matches and consumes the entire input in one go, but: +* Matches and consumes the entire input in one go. * Limited by the buffer size when using an [Incremental Input]. * [Equivalent] to `until< eof, any >`. * [Meta data] and [implementation] mapping: @@ -515,163 +537,178 @@ TODO ###### `failure` -* Dummy rule that never succeeds. +* Rule that never succeeds. * Does not consume input. * [Meta data] and [implementation] mapping: - `failure::rule_t` is `internal::failure` ###### `success` -* Dummy rule that always succeeds. +* Rule that always succeeds. * Does not consume input. * [Meta data] and [implementation] mapping: - `success::rule_t` is `internal::success` -## ASCII Rules -These rules are in the inline namespace `tao::pegtl::ascii`. +## ASCII Rules -The ASCII rules operate on single bytes, without restricting the range of values to 7 bits. +The ASCII rules operate on single bytes. +Most do *not* restrict the range of matched values to 7 bits, or 0 to 127. They are compatible with input with the 8th bit set in the sense that nothing breaks in their presence. -Rules like `ascii::any` or `ascii::not_one< 'a' >` will match all possible byte values, -and all possible byte values excluding `'a'`, respectively. However the character class rules like -`ascii::alpha` only match the corresponding ASCII characters. 
-It is possible to match UTF-8 multi-byte characters with the ASCII rules, -for example the Euro sign code point `U+20AC`, which is encoded by the UTF-8 sequence `E2 82 AC`, -can be matched by either `tao::pegtl::ascii::string< 0xe2, 0x82, 0xac >` or `tao::pegtl::utf8::one< 0x20ac >`. +Rules like `ascii::any` or `ascii::not_one< 'a' >` will match all possible byte values, and all possible byte values excluding `'a'`, respectively. +The character class rules like `ascii::alpha` are understood to only match the corresponding ASCII characters. + +It is possible to match UTF-8 multi-byte characters with the ASCII rules. +For example the Euro sign code point `U+20AC`, which is encoded by the UTF-8 sequence `E2 82 AC`, +can be matched by both `ascii::string< 0xe2, 0x82, 0xac >` and `utf8::one< 0x20ac >` as both of these rules have `internal::ascii_string< 0xe2, 0x82, 0xac >` as their `rule_t`. -The ASCII rules are input-adaptive and work with inputs whose `data_t` is a 16bit or 32bit integer or enum type. +The ASCII rules operate on any input of integral or enum type of size 1. -ASCII rules do not usually rely on other rules. +These rules are in the *inline* namespace `tao::pegtl::ascii`. ###### `alnum` * Matches and consumes a single ASCII alphabetic or numeric character. -* [Equivalent] to `ranges< 'a', 'z', 'A', 'Z', '0', '9' >`. +* [Equivalent] to `(ascii::)ranges< 'a', 'z', 'A', 'Z', '0', '9' >`. ###### `alpha` * Matches and consumes a single ASCII alphabetic character. -* [Equivalent] to `ranges< 'a', 'z', 'A', 'Z' >`. +* [Equivalent] to `(ascii::)ranges< 'a', 'z', 'A', 'Z' >`. ###### `any` * Matches and consumes any single byte, including all ASCII characters. -* [Equivalent] to `bytes< 1 >`. +* [Equivalent] to `(ascii::)many< 1 >`. * [Meta data] and [implementation] mapping: - `ascii::any::rule_t` is `internal::any< internal::peek_char >` +###### `any7` + +* Matches and consumes any single "true" ASCII character that fits into 7 bits. 
+* [Equivalent] to `(ascii::)range< 0, 127 >`. +* [Meta data] and [implementation] mapping: + - `ascii::any7::rule_t` is `internal::any< internal::peek_seven >` + ###### `blank` * Matches and consumes a single ASCII horizontal space or horizontal tabulator character. -* [Equivalent] to `one< ' ', '\t' >`. +* [Equivalent] to `(ascii::)one< ' ', '\t' >`. ###### `cntrl` * Matches and consumes a single ASCII control character. -* [Equivalent] to `ranges< 0, 31, 127 >'. +* [Equivalent] to `(ascii::)ranges< 0, 31, 127 >`. ###### `cr` * Matches and consumes a single ASCII carriage return character of value `13` or `0x0d`. -* [Equivalent] to `one< '\r' >`. +* [Equivalent] to `(ascii::)one< '\r' >`. ###### `crlf` * Matches and consumes the common ASCII carriage return followed by a line feed. -* [Equivalent] to `string< '\r', '\n' >`. +* [Equivalent] to `(ascii::)string< '\r', '\n' >`. ###### `digit` -* Matches and consumes a single ASCII decimal digit character. -* [Equivalent] to `range< '0', '9' >`. +* Matches and consumes a single ASCII decimal digit. +* [Equivalent] to `(ascii::)range< '0', '9' >`. ###### `ellipsis` * Matches and consumes three dots. -* [Equivalent] to `three< '.' >`. +* [Equivalent] to `(ascii::)three< '.' >`. ###### `esc` * Matches and consumes a single ASCII escape character of value `27` or `0x1b`. -* [Equivalent] to `one< '\e' >`. +* [Equivalent] to `(ascii::)one< '\e' >`. ###### `ff` * Matches and consumes a single ASCII form feed (new page) character of value `12` or `0x0c`. -* [Equivalent] to `one< '\f' >`. +* [Equivalent] to `(ascii::)one< '\f' >`. ###### `graph` * Matches and consumes a single ASCII character traditionally defined as "printable but not space". -* [Equivalent] to `range< '33', '126' >`. +* [Equivalent] to `(ascii::)range< 33, 126 >`. ###### `ht` * Matches and consumes a single ASCII horizontal tab character of value `9`. -* [Equivalent] to `one< '\t' >`. +* [Equivalent] to `(ascii::)one< '\t' >`. 
###### `identifier_first` * Matches and consumes a single ASCII character permissible as first character of a C identifier. -* [Equivalent] to `ranges< 'a', 'z', 'A', 'Z', '_' >`. +* [Equivalent] to `(ascii::)ranges< 'a', 'z', 'A', 'Z', '_' >`. * [Meta data] and [implementation] mapping: - `ascii::identifier_first::rule_t` is `internal::ranges< internal::peek_char, 'a', 'z', 'A', 'Z', '_' >` ###### `identifier_other` * Matches and consumes a single ASCII character permissible as subsequent character of a C identifier. -* [Equivalent] to `ranges< 'a', 'z', 'A', 'Z', '0', '9', '_' >`. +* [Equivalent] to `(ascii::)ranges< 'a', 'z', 'A', 'Z', '0', '9', '_' >`. * [Meta data] and [implementation] mapping: - `ascii::identifier_first::rule_t` is `internal::ranges< internal::peek_char, 'a', 'z', 'A', 'Z', '0', '9', '_' >` ###### `identifier` * Matches and consumes an ASCII identifier as defined for the C programming language. -* [Equivalent] to `seq< identifier_first, star< identifier_other > >`. +* [Equivalent] to `seq< (ascii::)identifier_first, star< (ascii::)identifier_other > >`. * [Meta data] and [implementation] mapping: - `ascii::identifier::rule_t` is `internal::seq< identifier_first, internal::star< identifier_other > >`. ###### `istring< C... >` * Matches and consumes the given ASCII string `C...` with case insensitive matching. -* Similar to `string< C... >`, but: +* Similar to `(ascii::)string< C... >`, but: * For ASCII letters a-z and A-Z the match is case insensitive. * [Meta data] and [implementation] mapping: - `ascii::istring<>::rule_t` is `internal::success` - - `ascii::istring< C... >::rule_t` is `internal::istring< C... >` + - `ascii::istring< C... >::rule_t` is `internal::ascii_istring< C... >` ###### `keyword< C... >` * Matches and consumes a non-empty string not followed by an identifier character. -* [Equivalent] to `seq< string< C... >, not_at< identifier_other > >`. +* [Equivalent] to `seq< (ascii::)string< C... 
>, not_at< (ascii::)identifier_other > >`. * `C` must be a non-empty character pack. * [Meta data] and [implementation] mapping: - - `ascii::keyword< C... >::rule_t` is `internal::seq< internal::string< C... >, internal::not_at< internal::ranges< internal::peek_char, 'a', 'z', 'A', 'Z', '0', '9', '_' > > >` + - `ascii::keyword< C... >::rule_t` is `internal::seq< internal::ascii_string< C... >, internal::not_at< internal::ranges< internal::peek_char, 'a', 'z', 'A', 'Z', '0', '9', '_' > > >` ###### `lf` * Matches and consumes a single ASCII line feed (new line) character of value `10` or `0x0a`. -* [Equivalent] to `one< '\n' >`. +* [Equivalent] to `(ascii::)one< '\n' >`. ###### `lfcr` * Matches and consumes an uncommon ASCII line feed followed by a carriage return. -* [Equivalent] to `string< '\n', '\r' >`. +* [Equivalent] to `(ascii::)string< '\n', '\r' >`. ###### `lower` * Matches and consumes a single ASCII lower-case alphabetic character. -* [Equivalent] to `range< 'a', 'z' >`. +* [Equivalent] to `(ascii::)range< 'a', 'z' >`. ###### `many< Num >` * Succeeds when the input contains at least `Num` further bytes. * Consumes these `Num` bytes from the input. -* [Equivalent] to `rep< N, any >`. +* [Equivalent] to `rep< Num, (ascii::)any >`. +* [Meta data] and [implementation] mapping: + - `ascii::many< 0 >::rule_t` is `internal::success` + - `ascii::many< Num >::rule_t` is `internal::many< Num, internal::peek_char >` + +###### `many7< Num >` + +* Matches and consumes `Num` "true" ASCII characters that fit into 7 bits. +* [Equivalent] to `rep< Num, (ascii::)any7 >`. * [Meta data] and [implementation] mapping: - - `many< 0 >::rule_t` is `internal::success` - - `many< Num >::rule_t` is `internal::many< Num, internal::peek_char >` + - `ascii::many7< 0 >::rule_t` is `internal::success` + - `ascii::many7< Num >::rule_t` is `internal::many< Num, internal::peek_seven >` ###### `not_one< C... >` @@ -680,7 +717,14 @@ ASCII rules do not usually rely on other rules. 
* Consumes one byte when it succeeds. * [Meta data] and [implementation] mapping: - `ascii::not_one<>::rule_t` is `internal::any< internal::peek_char >` - - `ascii::not_one< C... >::rule_t` is `internal::one< result_on_found::failure, internal::peek_char, C... >` + - `ascii::not_one< C... >::rule_t` is `internal::tester< internal::not_one< internal::peek_char, C... > >`. + +###### `not_one7< C... >` + +* True ASCII version of `not_one` that only matches input bytes between 0 and 127. +* [Meta data] and [implementation] mapping: + - `ascii::not_one7<>::rule_t` is `internal::any< internal::peek_seven >` + - `ascii::not_one7< C... >::rule_t` is `internal::tester< internal::not_one< internal::peek_seven, C... > >`. ###### `not_range< C, D >` @@ -688,18 +732,25 @@ ASCII rules do not usually rely on other rules. * The next input byte is **not** in the closed range `C ... D`. * Consumes one byte when it succeeds. * [Meta data] and [implementation] mapping: - - `ascii::not_range< C, C >::rule_t` is `internal::one< result_on_found::failure, internal::peek_char, C >` - - `ascii::not_range< C, D >::rule_t` is `internal::range< result_on_found::failure, internal::peek_char, C, D >` + - `ascii::not_range< C, C >::rule_t` is `internal::tester< internal::one< internal::peek_char, C > >`. + - `ascii::not_range< C, D >::rule_t` is `internal::tester< internal::range< internal::peek_char, C, D > >`. + +###### `not_range7< C, D >` + +* True ASCII version of `not_range` that only matches input bytes between 0 and 127. +* [Meta data] and [implementation] mapping: + - `ascii::not_range7< C, C >::rule_t` is `internal::tester< internal::one< internal::peek_seven, C > >`. + - `ascii::not_range7< C, D >::rule_t` is `internal::tester< internal::range< internal::peek_seven, C, D > >`. ###### `nul` -* Matches and consumes an ASCII nul character of value `0`. -* [Equivalent] to `one< '\0' >`. +* Matches and consumes an ASCII nul character. +* [Equivalent] to `(ascii::)one< '\0' >`. 
###### `odigit` -* Matches and consumes a single ASCII octal digit character. -* [Equivalent] to `range< '0', '7' >`. +* Matches and consumes a single ASCII octal digit. +* [Equivalent] to `(ascii::)range< '0', '7' >`. ###### `one< C... >` @@ -709,12 +760,12 @@ ASCII rules do not usually rely on other rules. * Fails if `C` is an empty character pack. * [Meta data] and [implementation] mapping: - `ascii::not_one<>::rule_t` is `internal::failure` - - `ascii::not_one< C... >::rule_t` is `internal::one< result_on_found::success, internal::peek_char, C... >` + - `ascii::one< C... >::rule_t` is `internal::tester< internal::one< internal::peek_char, C... > >`. ###### `print` * Matches and consumes any single ASCII character traditionally defined as printable. -* [Equivalent] to `range< 32, 126 >`. +* [Equivalent] to `(ascii::)range< 32, 126 >`. ###### `range< C, D >` @@ -722,68 +773,63 @@ ASCII rules do not usually rely on other rules. * The next input byte is in the closed range `C ... D`. * Consumes one byte when it succeeds. * [Meta data] and [implementation] mapping: - - `ascii::range< C, C >::rule_t` is `internal::one< result_on_found::success, internal::peek_char, C >` - - `ascii::range< C, D >::rule_t` is `internal::range< result_on_found::success, internal::peek_char, C, D >` + - `ascii::range< C, C >::rule_t` is `internal::tester< internal::one< internal::peek_char, C > >`. + - `ascii::range< C, D >::rule_t` is `internal::tester< internal::range< internal::peek_char, C, D > >`. ###### `ranges< C1, D1, C2, D2, ... >` ###### `ranges< C1, D1, C2, D2, ..., E >` -* [Equivalent] to `sor< range< C1, D1 >, range< C2, D2 >, ... >`. -* [Equivalent] to `sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >`. +* [Equivalent] to `sor< (ascii::)range< C1, D1 >, (ascii::)range< C2, D2 >, ... >`. +* [Equivalent] to `sor< (ascii::)range< C1, D1 >, (ascii::)range< C2, D2 >, ..., (ascii::)one< E > >`. 
* [Meta data] and [implementation] mapping: - `ascii::ranges<>::rule_t` is `internal::failure` - - `ascii::ranges< E >::rule_t` is `internal::one< result_on_found::success, internal::peek_char, E >` - - `ascii::ranges< C, D >::rule_t` is `internal::range< result_on_found::success, internal::peek_char, C, D >` - - `ascii::ranges< C... >::rule_t` is `internal::ranges< internal::peek_char, C... >` - -###### `seven` - -* Matches and consumes any single true ASCII character that fits into 7 bits. -* [Equivalent] to `range< 0, 127 >`. + - `ascii::ranges< E >::rule_t` is `internal::tester< internal::one< internal::peek_char, E > >`. + - `ascii::ranges< C, D >::rule_t` is `internal::tester< internal::range< internal::peek_char, C, D > >`. + - `ascii::ranges< C... >::rule_t` is `internal::tester< internal::ranges< internal::peek_char, C... > >`. ###### `shebang` -* [Equivalent] to `if_must< string< '#', '!' >, until< eolf > >`. +* [Equivalent] to `if_must< (ascii::)string< '#', '!' >, until< eolf > >`. * [Meta data] and [implementation] mapping: - - `ascii::shebang::rule_t` is `internal::seq< false, internal::string< '#', '!' >, internal::until< internal::eolf > >` - - `ascii::shebang::subs_t` is `type_list< internal::string< '#', '!' >, internal::until< internal::eolf > >` + - `ascii::shebang::rule_t` is `internal::seq< false, internal::ascii_string< '#', '!' >, internal::until< internal::eolf > >` + - `ascii::shebang::subs_t` is `type_list< internal::ascii_string< '#', '!' >, internal::until< internal::eolf > >` ###### `sp` * Matches and consumes a single ASCII space character of value `32` or `0x20`. -* [Equivalent] to `one< ' ' >`. +* [Equivalent] to `(ascii::)one< ' ' >`. ###### `space` * Matches and consumes a single space, line feed, carriage return, horizontal tab, vertical tab or form feed. -* [Equivalent] to `one< ' ', '\n', '\r', '\t', '\v', '\f' >`. +* [Equivalent] to `(ascii::)one< ' ', '\n', '\r', '\t', '\v', '\f' >`. ###### `string< C... 
>` -* Matches and consumes a string, a sequence of bytes or single-byte characters. -* [Equivalent] to `seq< one< C >... >`. +* Matches and consumes a string, a sequence of bytes or ASCII characters. +* [Equivalent] to `seq< (ascii::)one< C >... >`. * [Meta data] and [implementation] mapping: - `ascii::string<>::rule_t` is `internal::success` - - `ascii::string< C >:rule_t` is `internal::one< result_on_found::success, internal::peek_char, C >` - - `ascii::string< C... >::rule_t` is `internal::string< C... >` + - `ascii::string< C >::rule_t` is `internal::tester< internal::one< internal::peek_char, C > >` + - `ascii::string< C... >::rule_t` is `internal::ascii_string< C... >` ###### `TAO_PEGTL_ISTRING( "..." )` -* Macro where `TAO_PEGTL_ISTRING( "foo" )` yields `istring< 'f', 'o', 'o' >`. +* Macro where `TAO_PEGTL_ISTRING( "foo" )` yields `(ascii::)istring< 'f', 'o', 'o' >`. * The argument must be a string literal. * Works for strings up to 512 bytes of length (excluding trailing `'\0'`). * Strings may contain embedded `'\0'`. ###### `TAO_PEGTL_KEYWORD( "..." )` -* Macro where `TAO_PEGTL_KEYWORD( "foo" )` yields `keyword< 'f', 'o', 'o' >`. +* Macro where `TAO_PEGTL_KEYWORD( "foo" )` yields `(ascii::)keyword< 'f', 'o', 'o' >`. * The argument must be a string literal. * Works for keywords up to 512 bytes of length (excluding trailing `'\0'`). * Strings may contain embedded `'\0'`. ###### `TAO_PEGTL_STRING( "..." )` -* Macro where `TAO_PEGTL_STRING( "foo" )` yields `string< 'f', 'o', 'o' >`. +* Macro where `TAO_PEGTL_STRING( "foo" )` yields `(ascii::)string< 'f', 'o', 'o' >`. * The argument must be a string literal. * Works for strings up to 512 bytes of length (excluding trailing `'\0'`). * Strings may contain embedded `'\0'`. @@ -794,7 +840,7 @@ ASCII rules do not usually rely on other rules. * These three input bytes are all `C`. * Consumes three bytes when it succeeds.
* [Meta data] and [implementation] mapping: - - `ascii::three< C >::rule_t` is `internal::string< C, C, C >` + - `ascii::three< C >::rule_t` is `internal::ascii_string< C, C, C >` ###### `two< C >` @@ -802,73 +848,67 @@ ASCII rules do not usually rely on other rules. * These two input bytes are both `C`. * Consumes two bytes when it succeeds. * [Meta data] and [implementation] mapping: - - `ascii::two< C >::rule_t` is `internal::string< C, C >` + - `ascii::two< C >::rule_t` is `internal::ascii_string< C, C >` ###### `upper` * Matches and consumes a single ASCII upper-case alphabetic character. -* [Equivalent] to `range< 'A', 'Z' >`. +* [Equivalent] to `(ascii::)range< 'A', 'Z' >`. ###### `vt` * Matches and consumes a single ASCII vertical tab of value `11` or `0x0b`. -* [Equivalent] to `one< '\v' >`. +* [Equivalent] to `(ascii::)one< '\v' >`. ###### `xdigit` * Matches and consumes a single ASCII hexadecimal digit character. -* [Equivalent] to `ranges< '0', '9', 'a', 'f', 'A', 'F' >`. +* [Equivalent] to `(ascii::)ranges< '0', '9', 'a', 'f', 'A', 'F' >`. + ## Unicode Rules These rules are available in multiple versions, -* in namespace `tao::pegtl::utf8` for UTF-8 encoded inputs, -* in namespace `tao::pegtl::utf16_be` for big-endian UTF-16 encoded inputs, -* in namespace `tao::pegtl::utf16_le` for little-endian UTF-16 encoded inputs, -* in namespace `tao::pegtl::utf32_be` for big-endian UTF-32 encoded inputs, -* in namespace `tao::pegtl::utf32_le` for little-endian UTF-32 encoded inputs, +* in namespace `tao::pegtl::utf8` for UTF-8 inputs, +* in namespace alias `tao::pegtl::utf16` for native-endian UTF-16 inputs, +* in namespace alias `tao::pegtl::utf32` for native-endian UTF-32 inputs. 
+* in namespace `tao::pegtl::utf16_be` for big-endian UTF-16 inputs, +* in namespace `tao::pegtl::utf16_le` for little-endian UTF-16 inputs, +* in namespace `tao::pegtl::utf32_be` for big-endian UTF-32 inputs, +* in namespace `tao::pegtl::utf32_le` for little-endian UTF-32 inputs, * in namespace `tao::pegtl::unicode` for native-endian input-adaptive rules. -For convenience, they also appear in multiple namespace aliases, - -* namespace alias `tao::pegtl::utf16` for native-endian UTF-16 encoded inputs, -* namespace alias `tao::pegtl::utf32` for native-endian UTF-32 encoded inputs. - -The following limitations apply to the UTF-16, UTF-32 and Unicode rules: +Only the UTF-8 rules are included with the default header `tao/pegtl.hpp`. +For the other rules the following header files need to be included as required. -* The line and column numbers are not counted correctly. -* They are not automatically included with `tao/pegtl.hpp`. - -The UTF-8 rules are included with `include/tao/pegtl.hpp` while the UTF-16 and UTF-32 rules require manual inclusion of the following files. * `tao/pegtl/utf16.hpp` * `tao/pegtl/utf32.hpp` * `tao/pegtl/unicode.hpp` -While unaligned accesses are no problem on x86 compatible processors, on other architectures they might be very slow or even crash the application. - -In the following descriptions a Unicode code point is considered *valid* when it is in the range `0` to `0x10ffff`. -The parameter N stands for the size of the encoding of the next Unicode code point in the input, i.e. +Remember that the `column` field of the input positions counts the number of input objects since the last line ending, not the actual column! -* for UTF-8 the rules are multi-byte-sequence-aware and N is either 1, 2, 3 or 4, -* for UTF-16 the rules are surrogate-pair-aware and N is either 2 or 4, and -* for UTF-32 everything is simple and N is always 4. +A Unicode code point is considered *valid* when it is in the range `0` to `0x10ffff`. 
-The UTF-16 rules also work with inputs whose `data_t` is a 16bit integer or enum type. +* The UTF-8 rules work on the same inputs as the ASCII rules whose `data_t` is an 8-bit integer or enum type. +* The UTF-16 rules work with inputs whose `data_t` is either an 8-bit or a 16-bit integer or enum type. +* The UTF-32 rules work with inputs whose `data_t` is either an 8-bit or a 32-bit integer or enum type. -The UTF-16 rules also work with inputs whose `data_t` is a 32bit integer or enum type. +In the following descriptions the parameter N stands for the size of the encoding of the next Unicode code point in the input. -The input-adaptive rules use UTF-8 on inputs whose `data_t` is an 8bit integer or enum type, native-endian UTF-16 on 16bit inputs, and native-endian UTF-32 on 32bit inputs. +* For UTF-8 the rules are multi-byte-sequence-aware and N is either 1, 2, 3 or 4. +* For UTF-16 the rules are surrogate-pair-aware and N is either 2 or 4 for 8-bit inputs, or either 1 or 2 for 16-bit inputs. +* For UTF-32 N is always 4 for 8-bit inputs, and always 1 for 32-bit inputs. -It is an error when a code unit in the range `0xd800` to `0xdfff` is encountered outside of a valid UTF-16 surrogate pair (this changed in version 2.6.0). +The input-adaptive rules use UTF-8 on 8-bit inputs, native-endian UTF-16 on 16-bit inputs, and native-endian UTF-32 on 32-bit inputs. -Unicode rules do not rely on other rules. +Note that the Unicode rules only match UTF-16 surrogates as part of a valid UTF-16 surrogate pair. ###### `any` * Succeeds when the input is not empty, and: -* The next N bytes encode a valid Unicode code point. -* Consumes the N bytes when it succeeds. +* The next N input objects encode a valid Unicode code point. +* Consumes the N input objects when it succeeds. ###### `bom` @@ -878,35 +918,35 @@ Unicode rules do not rely on other rules. * Succeeds when the input contains at least `Num` further code points. * Consumes these `Num` code points from the input.
-* [Equivalent] to `rep< N, any >`. +* [Equivalent] to `rep< Num, any >`. ###### `not_one< C... >` * Succeeds when the input is not empty, and: -* The next N bytes encode a valid Unicode code point, and: +* The next N input objects encode a valid Unicode code point, and: * `C` is an empty character pack or the input code point is **not** one of the given code points `C...`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ###### `not_range< C, D >` * Succeeds when the input is not empty, and: -* The next N bytes encode a valid Unicode code point, and: +* The next N input objects encode a valid Unicode code point, and: * The input code point `B` satisfies `B < C || D < B`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ###### `one< C... >` * Succeeds when the input is not empty, and: -* The next N bytes encode a valid Unicode code point, and: +* The next N input objects encode a valid Unicode code point, and: * `C` is a non-empty character pack and the input code point is one of the given code points `C...`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ###### `range< C, D >` * Succeeds when the input is not empty, and: -* The next N bytes encode a valid Unicode code point, and: +* The next N input objects encode a valid Unicode code point, and: * The input code point `B` satisfies `C <= B && B <= D`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ###### `ranges< C1, D1, C2, D2, ... >` @@ -919,10 +959,10 @@ Unicode rules do not rely on other rules. ###### `string< C... >` * [Equivalent] to `seq< one< C >... >`. -* [Meta data] and [implementation] mapping for UTF-8 only: - * `string<>::rule_t` is `internal::success`. - * `string< C >::rule_t` is `internal::single< internal::one< internal::peek_utf8, C > >`. - * `string< C... >::rule_t` is `internal::char_string< U... 
>` where `U...` is the UTF-8 encoding of the codepoints `C...`. +* [Meta data] and [implementation] mapping for UTF-8: + * `utf8::string<>::rule_t` is `internal::success`. + * `utf8::string< C >::rule_t` is `internal::tester< internal::one< internal::peek_utf8, C > >`. + * `utf8::string< C... >::rule_t` is `internal::ascii_string< U... >` where `U...` is the UTF-8 encoding of the codepoints `C...`. ### ICU Support @@ -932,17 +972,14 @@ Because of this external dependency the rules are not automatically included in The ICU-based rules are again available in multiple versions, * in namespace `tao::pegtl::utf8::icu` for UTF-8 encoded inputs, +* in namespace alias `tao::pegtl::utf16::icu` for native-endian UTF-16 inputs, +* in namespace alias `tao::pegtl::utf32::icu` for native-endian UTF-32 inputs. * in namespace `tao::pegtl::utf16_be::icu` for big-endian UTF-16 encoded inputs, * in namespace `tao::pegtl::utf16_le::icu` for little-endian UTF-16 encoded inputs, * in namespace `tao::pegtl::utf32_be::icu` for big-endian UTF-32 encoded inputs, * in namespace `tao::pegtl::utf32_le::icu` for little-endian UTF-32 encoded inputs, * in namespace `tao::pegtl::unicode::icu` for native-endian input-adaptive rules. -And, for convenience, they again appear in multiple namespace aliases, - -* namespace alias `tao::pegtl::utf16::icu` for native-endian UTF-16 encoded inputs, -* namespace alias `tao::pegtl::utf32::icu` for native-endian UTF-32 encoded inputs. - To use these rules it is necessary to provide an include path to the ICU library, to link the application against `libicu`, and to manually include one or more of the following header files: * `tao/pegtl/contrib/icu/utf8.hpp` @@ -962,9 +999,9 @@ Each of the above namespaces provides two basic rules for matching binary proper * `P` is a binary property defined by ICU, see [`UProperty`](http://icu-project.org/apiref/icu4c/uchar_8h.html). * `V` is a boolean value. 
* Succeeds when the input is not empty, and: -* The next N bytes encode a valid unicode code point, and: +* The next N input objects encode a valid unicode code point, and: * The code point's property `P`, i.e. [`u_hasBinaryProperty( cp, P )`](http://icu-project.org/apiref/icu4c/uchar_8h.html), equals `V`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ###### `binary_property< P >` @@ -975,9 +1012,9 @@ Each of the above namespaces provides two basic rules for matching binary proper * `P` is an enumerated property defined by ICU, see [`UProperty`](http://icu-project.org/apiref/icu4c/uchar_8h.html). * `V` is an integer value. * Succeeds when the input is not empty, and: -* The next N bytes encode a valid unicode code point, and: +* The next N input objects encode a valid unicode code point, and: * The code point's property `P`, i.e. [`u_getIntPropertyValue( cp, P )`](http://icu-project.org/apiref/icu4c/uchar_8h.html), equals `V`. -* Consumes the N bytes when it succeeds. +* Consumes the N input objects when it succeeds. ### ICU Rules for Binary Properties @@ -1267,9 +1304,10 @@ Convenience wrappers for enumerated properties that return a value instead of an * `V` is of type `std::uint8_t`. * [Equivalent] to `property_value< UCHAR_TRAIL_CANONICAL_COMBINING_CLASS, V >`. + ## Buffer Rules -These rules are used in conjunction with buffer inputs. +These rules are used in conjunction with [Incremental Input]. Unlike most other rules their implementation resides directly in namespace `tao::pegtl` instead of in `tao::pegtl::internal` with disabled control. 
@@ -1298,71 +1336,82 @@ They are in the directory `include/tao/pegtl/buffer` and are **not** included wi - `require< 0 >::rule_t` is `internal::success` - `require< N >::rule_t` is `require< N >` + ## Binary Rules These rules are available in multiple versions, -* in namespace `tao::pegtl::int8` for signed 8-bit integer values, -* in namespace `tao::pegtl::uint8` for unsigned 8-bit integer values, -* in namespace `tao::pegtl::int16_be` for big-endian 16-bit integer values, -* in namespace `tao::pegtl::int16_le` for little-endian 16-bit integer values, -* in namespace `tao::pegtl::int32_be` for big-endian 32-bit integer values, -* in namespace `tao::pegtl::int32_le` for little-endian 32-bit integer values, -* in namespace `tao::pegtl::int64_be` for big-endian 64-bit integer values, and -* in namespace `tao::pegtl::int64_le` for little-endian 64-bit integer values. -* in namespace `tao::pegtl::uint16_be` for big-endian 16-bit integer values, -* in namespace `tao::pegtl::uint16_le` for little-endian 16-bit integer values, -* in namespace `tao::pegtl::uint32_be` for big-endian 32-bit integer values, -* in namespace `tao::pegtl::uint32_le` for little-endian 32-bit integer values, -* in namespace `tao::pegtl::uint64_be` for big-endian 64-bit integer values, and -* in namespace `tao::pegtl::uint64_le` for little-endian 64-bit integer values - -Please not that the masked rules are available *only* for unsigned integers. 
+* in namespace `tao::pegtl::int8` for `std::int8_t` values, +* in namespace `tao::pegtl::uint8` for `std::uint8_t` values, +* in namespace `tao::pegtl::int16_be` for big-endian `std::int16_t` values, +* in namespace `tao::pegtl::int16_le` for little-endian `std::int16_t` values, +* in namespace `tao::pegtl::int32_be` for big-endian `std::int32_t` values, +* in namespace `tao::pegtl::int32_le` for little-endian `std::int32_t` values, +* in namespace `tao::pegtl::int64_be` for big-endian `std::int64_t` values, +* in namespace `tao::pegtl::int64_le` for little-endian `std::int64_t` values, +* in namespace `tao::pegtl::uint16_be` for big-endian `std::uint16_t` values, +* in namespace `tao::pegtl::uint16_le` for little-endian `std::uint16_t` values, +* in namespace `tao::pegtl::uint32_be` for big-endian `std::uint32_t` values, +* in namespace `tao::pegtl::uint32_le` for little-endian `std::uint32_t` values, +* in namespace `tao::pegtl::uint64_be` for big-endian `std::uint64_t` values, +* in namespace `tao::pegtl::uint64_le` for little-endian `std::uint64_t` values, +* in namespace alias `tao::pegtl::int16` for native-endian `std::int16_t` values, +* in namespace alias `tao::pegtl::int32` for native-endian `std::int32_t` values, +* in namespace alias `tao::pegtl::int64` for native-endian `std::int64_t` values, +* in namespace alias `tao::pegtl::uint16` for native-endian `std::uint16_t` values, +* in namespace alias `tao::pegtl::uint32` for native-endian `std::uint32_t` values, +* in namespace alias `tao::pegtl::uint64` for native-endian `std::uint64_t` values. The binary rules need to be manually included, i.e. they are not part of `. -These rules read one or more bytes from the input to form (and match) an 8, 16, 32 or 64-bit value, respectively, and corresponding template parameters are given as either `std::int8_t`, `std::uint8_t`, `std::int16_t`, `std::uint16_t`, `std::int32_t`, `std::uint32_t`, `std::int64_t` or `std::uin64_t`. 
+* The 8-bit rules work on the same inputs as the ASCII rules whose `data_t` is an 8-bit integer or enum type. +* The 16-bit rules work with inputs whose `data_t` is either an 8-bit or a 16-bit integer or enum type. +* The 32-bit rules work with inputs whose `data_t` is either an 8-bit or a 32-bit integer or enum type. +* The 64-bit rules work with inputs whose `data_t` is either an 8-bit or a 64-bit integer or enum type. + +In the following descriptions the parameter N stands for the size of the encoding of the next integer value according to the size of integer matched by the rule in question. -In the following descriptions, the parameter N is the size of a single value in bytes, i.e. either 1, 2, 4 or 8. -The term *input value* indicates a correspondingly sized integer value read from the input. -For inputs of values of size 1 like `char` or `std::byte` all integer rules can be used, for inputs of values of size greater than 1 like `int` or `long` only integer rules of matching size are possible. +* For 8-bit rules N is always 1. +* For 16-bit rules N is 2 for 8-bit inputs and 1 for 16-bit inputs. +* For 32-bit rules N is 4 for 8-bit inputs and 1 for 32-bit inputs. +* For 64-bit rules N is 8 for 8-bit inputs and 1 for 64-bit inputs. -Binary rules do not rely on other rules. +The term *input value* indicates an integer value of the appropriate size read from the input. ###### `any` -* Succeeds when the input contains at least N bytes. -* Consumes N bytes when it succeeds. +* Succeeds when the input contains at least N objects. +* Consumes N objects when it succeeds. ###### `many< Num >` -* Succeeds when the input contains at least `Num` times N bytes. -* Consumes these `Num` * N bytes from the input. +* Succeeds when the input contains at least `Num` times N objects. +* Consumes these `Num` * N objects from the input. * [Equivalent] to `rep< N, any >`. ###### `mask_not_one< M, C... 
>` -* Succeeds when the input contains at least N bytes, and: -* `C` is an empty character pack or the (endian adjusted) input value masked with `M` is **not** one of the given values `C...`. -* Consumes N bytes when it succeeds. +* Succeeds when the input contains at least N objects, and: +* `C` is an empty pack *or* the (endian adjusted) input value masked with `M` is **not** one of the values `C...`. +* Consumes N objects when it succeeds. ###### `mask_not_range< M, C, D >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * The (endian adjusted) input value `B` satisfies `( B & M ) < C || D < ( B & M )`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. ###### `mask_one< M, C... >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * `C` is a non-empty character pack and the (endian adjusted) input value masked with `M` is one of the given values `C...`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. ###### `mask_range< M, C, D >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * The (endian adjusted) input value `B` satisfies `C <= ( B & M ) && ( B & M ) <= D`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. ###### `mask_ranges< M, C1, D1, C2, D2, ... >` @@ -1378,27 +1427,27 @@ Binary rules do not rely on other rules. ###### `not_one< C... >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * `C` is an empty character pack or the (endian adjusted) input value is **not** one of the given values `C...`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. 
###### `not_range< C, D >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * The (endian adjusted) input value `B` satisfies `B < C || D < B`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. ###### `one< C... >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * `C` is a non-empty character pack and the (endian adjusted) input value is one of the given values `C...`. -* Consumes N bytes when it succeeds. +* Consumes N objects when it succeeds. ###### `range< C, D >` -* Succeeds when the input contains at least N bytes, and: +* Succeeds when the input contains at least N objects, and: * The (endian adjusted) input value `B` satisfies `C <= B && B <= D`. -* Consumes N byte when it succeeds. +* Consumes N objects when it succeeds. ###### `ranges< C1, D1, C2, D2, ... >` @@ -1412,6 +1461,7 @@ Binary rules do not rely on other rules. * [Equivalent] to `seq< one< C >... >`. + ## Advanced Rules ###### `function< F >` @@ -1420,8 +1470,13 @@ Binary rules do not rely on other rules. ###### `nested< R >` + ## Exception Rules +These rules are only available when compiling with exception support (which is usually the default outside of some restricted embedded applications). + +These rules are in namespace `tao::pegtl`. + ###### `if_must< R, S... >` * Attempts to match `R` and depending on the result proceeds with either `must< S... >` or `failure`. @@ -1580,7 +1635,7 @@ Note that the `true` template parameter to `internal::if_must` corresponds to th - `try_catch_std_raise_nested< R... >::rule_t` is `internal::try_catch_raise_nested< std::exception, internal::seq< R... > >` - `try_catch_std_raise_nested< R... >::subs_t` is `type_list< internal::seq< R... > >` -###### `try_catch_std_return_false< E, R... >` +###### `try_catch_std_return_false< R... >` * [Equivalent] to `seq< R... 
>`, but: * Catches exceptions of type `std::exception` (or derived), and: @@ -1624,6 +1679,7 @@ Note that the `true` template parameter to `internal::if_must` corresponds to th * The argument must be a string literal. * Works for strings up to 512 bytes of length (excluding trailing `'\0'`). + ## Full Index * [`action< A, R... >`](#action-a-r-) [(meta rules)](#meta-rules) @@ -1633,6 +1689,7 @@ Note that the `true` template parameter to `internal::if_must` corresponds to th * [`any`](#any) [(ascii rules)](#ascii-rules) * [`any`](#any-1) [(unicode rules)](#unicode-rules) * [`any`](#any-2) [(binary rules)](#binary-rules) +* [`any7`](#any7) [(ascii rules)](#ascii-rules) * [`apply< A... >`](#apply-a-) [(action rules)](#action-rules) * [`apply0< A... >`](#apply0-a-) [(action rules)](#action-rules) * [`ascii_hex_digit`](#ascii_hex_digit) [(icu rules)](#icu-rules-for-binary-properties) @@ -1716,6 +1773,7 @@ Note that the `true` template parameter to `internal::if_must` corresponds to th * [`many< Num >`](#many-num-) [(ascii rules)](#ascii-rules) * [`many< Num >`](#many-num--1) [(unicode rules)](#unicode-rules) * [`many< Num >`](#many-num--2) [(binary rules)](#binary-rules) +* [`many7< Num >`](#many7-num-) [(ascii rules)](#ascii-rules) * [`mask_not_one< M, C... >`](#mask_not_one-m-c-) [(binary rules)](#binary-rules) * [`mask_not_range< M, C, D >`](#mask_not_range-m-c-d-) [(binary rules)](#binary-rules) * [`mask_one< M, C... >`](#mask_one-m-c-) [(binary rules)](#binary-rules) @@ -1735,9 +1793,11 @@ Note that the `true` template parameter to `internal::if_must` corresponds to th * [`not_one< C... >`](#not_one-c-) [(ascii rules)](#ascii-rules) * [`not_one< C... >`](#not_one-c--1) [(unicode rules)](#unicode-rules) * [`not_one< C... >`](#not_one-c--2) [(binary rules)](#binary-rules) +* [`not_one7< C... 
>`](#not_one7-c-) [(ascii rules)](#ascii-rules) * [`not_range< C, D >`](#not_range-c-d-) [(ascii rules)](#ascii-rules) * [`not_range< C, D >`](#not_range-c-d--1) [(unicode rules)](#unicode-rules) * [`not_range< C, D >`](#not_range-c-d--2) [(binary rules)](#binary-rules) +* [`not_range7< C, D >`](#not_range7-c-d-) [(ascii rules)](#ascii-rules) * [`nul`](#nul) [(ascii rules)](#ascii-rules) * [`numeric_type< V >`](#numeric_type-v-) [(icu rules)](#icu-rules-for-enumerated-properties) * [`one< C... >`](#one-c-) [(ascii rules)](#ascii-rules)