Commit

Changes capitalization (#2146) and anchors (to avoid #561)
JJ committed Aug 21, 2018
1 parent 984ad94 commit 2e92958
Showing 5 changed files with 60 additions and 44 deletions.
2 changes: 1 addition & 1 deletion doc/Language/5to6-nutshell.pod6
Expand Up @@ -12,7 +12,7 @@ features and idioms are not).
Hence this should not be mistaken for a beginner tutorial or a promotional
overview of Perl 6; it is intended as a technical reference for Perl 6
learners with a strong Perl 5 background and for anyone porting Perl 5 code
to Perl 6 (though note that L<#Automated Translation> might be more
to Perl 6 (though note that L<#Automated translation> might be more
convenient).
A note on semantics; when we say "now" in this document, we mostly just
Expand Down
9 changes: 8 additions & 1 deletion doc/Language/glossary.pod6
Expand Up @@ -865,7 +865,14 @@ is its usual acronym.
=head1 property
In this context, it either refers to an L<object property|https://docs.perl6.org/language/objects#index-entry-Property>, which is the value of an instance variable, or an L<Unicode property|https://docs.perl6.org/language/regexes#Unicode_Properties> which are codepoint features that allow programs to identify what kind of entity they represent, that is, if they are a letter, or a number, or something completely different like a control character.
In this context, it refers either to an
L<object property|/language/objects#index-entry-Property>,
which is the value of an instance variable, or to a
L<Unicode property|/language/regexes#Unicode_properties>,
which is a codepoint feature that allows programs to identify what kind of
entity the codepoint represents, that is, whether it is a letter, a number, or
something completely different like a control character.
X<|pugs>
=head1 pugs
Expand Down
85 changes: 47 additions & 38 deletions doc/Language/regexes.pod6
Expand Up @@ -10,7 +10,7 @@ Regular expressions, I<regexes> for short, are a sequence of characters that
describe a pattern of text. Pattern matching is the process of matching
those patterns to actual text.
=head1 X<Lexical Conventions|quote,/ /;quote,rx;quote,m>
=head1 X<Lexical conventions|quote,/ /;quote,rx;quote,m>
Perl 6 has special syntax for writing regexes:
Expand Down Expand Up @@ -242,14 +242,15 @@ Note that the character classes C«<same>», C«<wb>» and C«<ww>» are
so called zero-width assertions, which do not really match a
character.
=head2 X«Unicode Properties|regex,<:property>»
=head2 X«Unicode properties|regex,<:property>»
The character classes mentioned so far are mostly for convenience; another
approach is to use Unicode character properties. These come in the form
C«<:property>», where C<property> can be a short or long Unicode General
Category name. These use pair syntax.
To match against a Unicode property you can use either smartmatch or L<C<uniprop>|/routine/uniprop>:
To match against a Unicode property you can use either smartmatch or
L<C<uniprop>|/routine/uniprop>:
"a".uniprop('Script'); # OUTPUT: «Latin␤»
"a" ~~ / <:Script<Latin>> /; # OUTPUT: «「a」␤»
Expand Down Expand Up @@ -325,7 +326,7 @@ parentheses; for example:
say $0 if 'perl6' ~~ /\w+(<:Ll+:N>)/ # OUTPUT: «「6」␤»
=head2 X«Enumerated Character Classes and Ranges|regex,<[ ]>;regex,<-[ ]>»
=head2 X«Enumerated character classes and ranges|regex,<[ ]>;regex,<-[ ]>»
Sometimes the pre-existing wildcards and character classes are not
enough. Fortunately, defining your own is fairly simple. Within C«<[ ]>»,
Expand Down Expand Up @@ -587,7 +588,7 @@ string of non-whitespace characters.
Even in non-backtracking contexts, the alternation operator C<||> tries
all the branches in order until the first one matches.
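As a minimal sketch of the ordering rule just described (the sample strings are chosen only for illustration), the first branch that matches wins, even when a later branch would match more text:

```raku
# Sequential alternation: branches are tried left to right,
# and the first one that matches wins, even if a later one is longer.
say 'food' ~~ / fo || food /;       # OUTPUT: «「fo」␤»
say 'food' ~~ / food || fo /;       # OUTPUT: «「food」␤»

# A branch that fails is skipped in favour of the next one.
say 'food' ~~ / bar || foo /;       # OUTPUT: «「foo」␤»

# The same ordering applies with ratcheting (no backtracking) switched on.
say 'food' ~~ / :r fo || food /;    # OUTPUT: «「fo」␤»
```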
=head1 X<Longest Alternation: C<|>|regex,|>
=head1 X<Longest alternation: C<|>|regex,|>
In short, in regex branches separated by C<|>, the longest token match wins,
independent of the textual ordering in the regex. However, what C<|> really
Expand Down Expand Up @@ -649,7 +650,7 @@ Arrays can also be interpolated into a regex to achieve the same effect:
my @increasingly-edible = <f fo foo food>;
say 'food' ~~ /@increasingly-edible/; # OUTPUT: «「food」␤»
This is documented further under L<Regex Interpolation|#Regex_Interpolation>,
This is documented further under L<Regex Interpolation|#Regex_interpolation>,
below.
=head1 X<Conjunction: C<&&>|regex,&&>
Expand Down Expand Up @@ -686,7 +687,7 @@ Regexes search an entire string for matches. Sometimes this is not what
you want. Anchors match only at certain positions in the string, thereby
anchoring the regex match to that position.
=head2 X<Start of String and End of String|regex,^;regex,$>
=head2 X<Start of string and end of string|regex,^;regex,$>
The C<^> anchor only matches at the start of the string:
Expand Down Expand Up @@ -729,7 +730,7 @@ The following is a multi-line string:
# 'and' is at the start of a line -- not the string
say so $str ~~ /^and /; # OUTPUT: «False␤»
=head2 X<Start of Line and End of Line|regex,^^;regex,$$>
=head2 X<Start of line and end of line|regex,^^;regex,$$>
The C<^^> anchor matches at the start of a logical line. That is, either
at the start of the string, or after a newline character. However, it does not
Expand Down Expand Up @@ -774,7 +775,7 @@ two leading spaces each.
# matched at the last line
say so $str ~~ / '."' $$/; # OUTPUT: «True␤»
=head2 X«Word Boundary|regex, <|w>;regex, <!|w>»
=head2 X«Word boundary|regex, <|w>;regex, <!|w>»
To match any word boundary, use C«<|w>» or C«<?wb>». This is similar to
X«C<\b>|regex deprecated,\b» of other languages.
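A short sketch of how these assertions behave; the sample strings are only illustrative. Since the assertions are zero-width, they constrain where a match may occur without consuming any characters:

```raku
# <|w> succeeds between a word character and a non-word character.
say 'two-words' ~~ / two <|w> '-' words /;    # OUTPUT: «「two-words」␤»

# There is no boundary in the middle of a word, so this fails.
say so 'twowords' ~~ / two <|w> words /;      # OUTPUT: «False␤»

# <!|w> asserts the opposite: the position is not a word boundary.
say 'twowords' ~~ / two <!|w> words /;        # OUTPUT: «「twowords」␤»

# <?wb> is the named form of the same boundary assertion.
say so 'two-words' ~~ / two <?wb> '-' /;      # OUTPUT: «True␤»
```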
Expand Down Expand Up @@ -860,7 +861,7 @@ lookahead and lookbehind assertions.
Technically, anchors are also zero-width assertions, and they can look
both ahead and behind.
=head2 X<Lookahead Assertions|regex,before>
=head2 X<Lookahead assertions|regex,before>
To check that a pattern appears before another pattern, use a
lookahead assertion via the C<before> assertion. This has the form:
Expand Down Expand Up @@ -942,7 +943,9 @@ These are, as in the case of lookahead, zero-width assertions which do not I<con
say "atfoobar" ~~ / (.**3) .**2 <?after foo> bar /;
# OUTPUT: «「atfoobar」␤ 0 => 「atf」␤»
where we capture the first 3 of the 5 characters before bar, but only if C<bar> is preceded by C<foo>. The fact that the assertion is zero-width allows us to use part of the characters in the assertion for capture.
where we capture the first 3 of the 5 characters before bar, but only if C<bar>
is preceded by C<foo>. The fact that the assertion is zero-width allows us to
use part of the characters in the assertion for capture.
Expand Down Expand Up @@ -1065,7 +1068,9 @@ it in a variable first:
X<|:my>
C<:my> helps scoping the C<$c> variable within the regex and beyond; in this case we can use it in the next sentence to show what has been matched inside the regex. This can be used for debugging inside regular expressions, for instance:
C<:my> helps scoping the C<$c> variable within the regex and beyond; in this
case we can use it in the next sentence to show what has been matched inside the
regex. This can be used for debugging inside regular expressions, for instance:
my $paragraph="line\nline2\nline3";
$paragraph ~~ rx| :my $counter = 0; ( \V* { ++$counter } ) *%% \n |;
Expand All @@ -1086,7 +1091,8 @@ say HasOur.parse('Þor is mighty'); # OUTPUT: «「Þor is mighty」␤»
say $HasOur::our; # OUTPUT: «Þor␤»
=end code
Once the parsing has been done successfully, we use the FQN name of the C<$our> variable to access its value, that can be none other than C<Þor>
Once the parsing has been done successfully, we use the fully qualified name
(FQN) of the C<$our> variable to access its value, which can be none other than
C<Þor>.
=head2 X<Named captures|regex, Named captures>
Expand Down Expand Up @@ -1136,9 +1142,10 @@ C<\K>.
say 'abc' ~~ / a <( b )> c/; # OUTPUT: «「b」␤»
say 'abc' ~~ / <(a <( b )> c)>/; # OUTPUT: «「bc」␤»
As in the example above, you can see C«<(» sets the start point and C«)>» sets the
endpoint; since they are actually independent of each other, the inner-most start point
wins (the one attached to C<b>) and the outer-most end wins (the one attached to C<c>).
As in the example above, you can see C«<(» sets the start point and C«)>» sets
the endpoint; since they are actually independent of each other, the inner-most
start point wins (the one attached to C<b>) and the outer-most end wins (the one
attached to C<c>).
=head1 Substitution
Expand Down Expand Up @@ -1430,7 +1437,7 @@ list of predefined subrules is listed in
L<S05-regex|https://design.perl6.org/S05.html#Predefined_Subrules> of design
documents.
=head1 X<Regex Interpolation|regex, Regex Interpolation>
=head1 X<Regex interpolation|regex, Regex Interpolation>
If you want to build a regex using a pattern given at runtime, regex
interpolation is what you are looking for.
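As a rough illustration (the variable names C<$literal> and C<$pattern> are hypothetical, chosen for this sketch), a plain variable is interpolated as a literal string, while the C«<$variable>» form compiles the string as a regex:

```raku
# Hypothetical variables holding text that is only known at runtime.
my $literal = 'a.c';
my $pattern = 'a.c';

# A bare variable is matched literally: the dot must be an actual dot.
say 'abc a.c' ~~ / $literal /;      # OUTPUT: «「a.c」␤»

# Inside angle brackets the string is treated as regex source,
# so the dot matches any character.
say 'abc a.c' ~~ / <$pattern> /;    # OUTPUT: «「abc」␤»
```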
Expand Down Expand Up @@ -1575,7 +1582,7 @@ like C<:overlap> are appended to the match call:
}
# OUTPUT: «ba␤aA␤»
=head2 X<Regex Adverbs|regex adverb,:ignorecase;regex adverb,:i>
=head2 X<Regex adverbs|regex adverb,:ignorecase;regex adverb,:i>
Adverbs that appear at the time of a regex declaration are part of the
actual regex and influence how the Perl 6 compiler translates the regex into
Expand Down Expand Up @@ -2150,12 +2157,13 @@ my $string = 'PostgreSQL is an SQL database!';
say $string ~~ /(.+)(SQL) (.+) $1/; # OUTPUT: 「PostgreSQL is an SQL」
=end code
What happens in the above example is that the string has to be matched against the
second occurrence of the word I<SQL>, eating all characters before and leaving out
the rest.
What happens in the above example is that the string has to be matched against
the second occurrence of the word I<SQL>, eating all characters before and
leaving out the rest.
Since it is possible to execute a piece of code within a regular expression, it is also possible
to inspect the L<Match|/type/Match> object within the regular expression itself:
Since it is possible to execute a piece of code within a regular expression, it
is also possible to inspect the L<Match|/type/Match> object within the regular
expression itself:
=begin code :preamble<my $string = '';>
my $iteration = 0;
Expand Down Expand Up @@ -2186,10 +2194,10 @@ Capture 2 = is an
showing that the string has been split around the second occurrence of I<SQL>, that
is the repetition of the first capture (C<$/[1]>).
With that in place, it is now possible to see how the engine backtracks
to find the above match: it does suffice to move the C<show-captures>
in the middle of the regular expression, in particular before the repetition of the
first capture C<$1> to see it in action:
With that in place, it is now possible to see how the engine backtracks to find
the above match: it suffices to move the C<show-captures> call into the middle
of the regular expression, in particular before the repetition of the first
capture C<$1>, to see it in action:
=begin code :preamble<my $string = '';>
my $iteration = 0;
Expand All @@ -2207,8 +2215,8 @@ sub show-captures( Match $m ){
$string ~~ / (.+)(SQL) (.+) { show-captures( $/ ); } $1 /;
=end code
The output will be much more verbose and will show several iterations, with the last one
being the I<winning>. The following is an excerpt of the output:
The output will be much more verbose and will show several iterations, with the
last one being the I<winning> one. The following is an excerpt of the output:
=begin code :lang<text>
=== Iteration 1 ===
Expand Down Expand Up @@ -2260,11 +2268,11 @@ say $string ~~ /(.+)(SQL) (.+) $1/; # OUTPUT: 「PostgreSQL is an SQL」
say $string ~~ / :r (.+)(SQL) (.+) $1/; # OUTPUT: Nil
=end code
The fact is that, as shown in the I<iteration 1> output, the first match
of the regular expression engine will be C<PostgreSQL is an >, C<SQL>, C< database>
that does not leave out any room for matching another occurrence of the word I<SQL>
(as C<$1> in the regular expression). Since the engine is not able to get backward and change the
path to match, the regular expression fails.
As shown in the I<iteration 1> output, the first match of the
regular expression engine will be C<PostgreSQL is an >, C<SQL>, C< database>,
which leaves no room for matching another occurrence of the word
I<SQL> (as C<$1> in the regular expression). Since the engine is not able to go
backward and change the path to match, the regular expression fails.
It is worth noting that disabling backtracking will not prevent the engine
from trying several ways to match the regular expression.
Expand Down Expand Up @@ -2312,8 +2320,8 @@ Capture 1 = database!
[SQL][ database!]
=end code
Even using the L<:r|/language/regexes#ratchet> adverb to prevent backtracking will not
change things:
Even using the L<:r|/language/regexes#ratchet> adverb to prevent backtracking
will not change things:
=begin code :preamble<my $string = '';>
my $iteration = 0;
Expand Down Expand Up @@ -2345,8 +2353,9 @@ Capture 1 = database!
[SQL][ database!]
=end code
This demonstrate that disabling backtracking does not mean disabling possible multiple
iterations of the matching engine, but rather disabling the backward matching tuning.
This demonstrates that disabling backtracking does not mean disabling possible
multiple iterations of the matching engine, but rather disabling the backward
tuning of the match.
=head1 C<$/> changes each time a regular expression is matched
Expand Down
2 changes: 1 addition & 1 deletion doc/Language/traps.pod6
Expand Up @@ -934,7 +934,7 @@ When there are multiple matching alternations, for those separated by
C<||>, the first matching alternation wins; for those separated by C<|>,
which to win is decided by LTM strategy. See also:
L<documentation on C<||>|/language/regexes#Alternation:_||> and
L<documentation on C<|>|/language/regexes#Longest_Alternation:_|>.
L<documentation on C<|>|/language/regexes#Longest_alternation:_|>.
For simple regexes just using C<||> instead of C<|>
will get you familiar semantics, but if writing grammars then it's useful to
Expand Down
6 changes: 3 additions & 3 deletions doc/Language/variables.pod6
Expand Up @@ -369,7 +369,7 @@ The C<:> twigil declares a formal named parameter to a block or subroutine.
Variables declared using this form are a type of placeholder variable too.
Therefore the same things that apply to variables declared using the C<^>
twigil also apply here (with the exception that they are not positional and
therefore not ordered using Unicode order, of course). So this:
therefore not ordered using Unicode order, of course). For instance:
say { $:add ?? $^a + $^b !! $^a - $^b }( 4, 5 ) :!add
# OUTPUT: «-1␤»
Expand Down Expand Up @@ -473,8 +473,8 @@ say $foo; # Exception! "Variable '$foo' is not declared"
This dies because C<$foo> is only defined as long as we are in the same
scope.
In order to create more than one variable with a lexical scope in the same sentence
surround the variables with parentheses:
In order to create more than one variable with a lexical scope in the same
statement, surround the variables with parentheses:
my ( $foo, $bar );
Expand Down
