
Add to documentation of -a in perlrun #9


Closed
wants to merge 2 commits into from

Conversation

1nickt commented Jul 30, 2015

No description provided.

MartinMcGrath (Contributor) commented:

This repo is a mirror only; see perlhack/Super Quick Patch Guide.

1nickt (Author) commented Jul 31, 2015

Thanks.

On Fri, Jul 31, 2015 at 2:18 AM, Martin McGrath notifications@github.com
wrote:

This repo is a mirror only, see perlhack/Super Quick Patch Guide
http://perldoc.perl.org/perlhack.html#SUPER-QUICK-PATCH-GUIDE



1nickt closed this Aug 3, 2015
p5p pushed a commit that referenced this pull request Sep 15, 2016
This macro follows Unicode Corrigendum #9 to allow non-character code
points.  These are still discouraged but not completely forbidden.

Code that isn't intended to operate on arbitrary text from other sources
is best served by the original definition, but code that does, such as
source code control, should switch to this definition if it wants to be
Unicode-strict.

Perl can't adopt C9 wholesale, as it might create security holes in
existing applications that rely on Perl keeping non-chars out.
p5p pushed a commit that referenced this pull request Sep 15, 2016
p5p pushed a commit that referenced this pull request Sep 17, 2016
p5p pushed a commit that referenced this pull request Sep 18, 2016
demerphq added a commit that referenced this pull request Oct 25, 2022
demerphq added a commit that referenced this pull request Nov 5, 2022
demerphq added a commit that referenced this pull request Nov 5, 2022
demerphq added a commit that referenced this pull request Nov 5, 2022
demerphq added a commit that referenced this pull request Nov 5, 2022
demerphq added a commit that referenced this pull request Dec 31, 2022
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
- floor(), abs_floor(), ceil() and abs_ceil() added
- roundoption integrated as fifth argument to format_number()
- see Perl#9
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
- to export all constants use :constants or :all
- see Perl#9
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jan 30, 2023
- Documentation changes
 - explain undef as argument to round() and format_number()
- see Perl#9
demerphq added a commit that referenced this pull request Feb 8, 2023
demerphq added a commit that referenced this pull request Feb 19, 2023
demerphq added a commit that referenced this pull request Feb 20, 2023
khwilliamson pushed a commit to khwilliamson/perl5 that referenced this pull request Jul 16, 2023
fix stack usage in vcmp method (cmp overload)
khwilliamson added a commit to khwilliamson/perl5 that referenced this pull request Sep 28, 2024
I'm uncertain about this commit.

There are three separate DFA tables already in core.  One accepts Perl
extended UTF-8; one accepts only strict Unicode UTF-8; and the third
accepts the modified Unicode UTF-8 spelled out in Unicode Corrigendum #9.

Both Unicode varieties reject surrogate code points and anything above
U+10FFFF.  The C9 table accepts non-character code points, but the
strict table rejects them.

Without this commit, the function always uses the most restrictive
table for the DFA.  Anything that table accepts is always valid.
Anything it rejects is potentially problematic, so the function calls a
non-inlined function to examine the input more slowly, determining
whether it is acceptable and/or whether a warning needs to be raised.

This commit examines the input flags to determine which DFA to use
in this situation.  The benefit is that the slower routine could be
avoided for many more code points.

But the vast, vast majority of calls to this function aren't for any
problematic code points, so the extra cost of this will very rarely be
recouped.  Translation from UTF-8 is critically important, and we want
it to be as fast as possible.  I would not even consider this commit if
the extra cost weren't very small.

A complicating factor is that 2048 (approximately 20% of the total)
Korean Hangul syllable code points are not handled by the strict table,
so they must be handled by the slower function, though at its very
beginning.  These code points are never problematic, so it is
unfortunate that they have to go through the slower function.  Still,
this function will rarely be called with them.  Only the strict table
has this problem.

The way this commit works is to have a table containing pointers to the
three DFA tables.  The function looks at the input flags: if none are
present, it uses the loosest DFA; if any restrictions are present, it
adds 1 to the index; and if the C9 restrictions are present, it adds an
extra 1.  Each addition comes from casting a flag test to bool.  If the
bool casts didn't generate conditionals, the only cost would be two
additions and an indirection, and I would say that cost is so tiny that
this would be worth it.  But I looked at godbolt, and casting to bool
requires a comparison on both modern clang and gcc.  That makes me
unsure of the tradeoff.

Another option would be to use just two DFAs: the loosest and the most
strict.  Then there would be a single conditional, and the Hangul
syllables would still be handled by the DFA when no flags restrict the
input.
khwilliamson added a commit to khwilliamson/perl5 that referenced this pull request Sep 29, 2024
khwilliamson added a commit to khwilliamson/perl5 that referenced this pull request Sep 29, 2024
khwilliamson added a commit to khwilliamson/perl5 that referenced this pull request Oct 1, 2024
2 participants