Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$1 dynamic scoping breaks with recursion #6543

Open
p5pRT opened this issue May 29, 2003 · 13 comments
Open

$1 dynamic scoping breaks with recursion #6543

p5pRT opened this issue May 29, 2003 · 13 comments

Comments

@p5pRT
Copy link

p5pRT commented May 29, 2003

Migrated from rt.perl.org#22369 (status was 'open')

Searchable as RT22369$

@p5pRT
Copy link
Author

p5pRT commented May 29, 2003

From frederik@ugcs.caltech.edu

The perlre man page says:

The numbered variables ($1, $2, $3, etc.) and the related punctuation
set ($+, $&, $`, $', and $^N) are all dynamically scoped until the
end of the enclosing block or until the next successful match,
whichever comes first.

The phrase "until the next successful match" is not quite clear to me.
Unless it means the obvious "until the next successful match in the
same enclosing scope", I would have interpreted it as "until the next
successful match anywhere". But the latter is not true (otherwise the
dynamic scoping would be useless); instead the statement seems to
indicate that the variables will be overwritten only if another match
is done using the same piece of code, even in a different scope:

sub r
{
shift =~ /(.*)/;
if(shift) {
r("bar", 0);
print "$1\n";
}
}
r("foo", 1);

# prints: bar
# (expected: foo)

In addition to strange action-at-a-distance, this behavior means that
replacing a call to one function with a call to another which has an
identical body to the first can change the behavior of your program:

# t is a copy of r
sub t
{
shift =~ /(.*)/;
if(shift) {
t("bar", 0);
print "$1\n";
}
}
sub r
{
shift =~ /(.*)/;
if(shift) {
t("bar", 0);
print "$1\n";
}
}
r("foo", 1);

# prints: foo

I know of no other programming language that breaks this common
invariant of functions. Furthermore, I can't imagine why the above
behavior would be a desirable feature of Perl, unless the only
possible fix results in considerably slower performance. So I am
reporting it as a bug, either in the regular expression matching code,
or in the documentation, which should be clearer if this behavior is
indeed intentional.

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.0:

Configured by Debian Project at Mon Feb 17 13:43:44 UTC 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.19-powerpc-smp, archname=powerpc-linux-thread-multi
    uname='linux voltaire 2.4.19-powerpc-smp #1 smp mon sep 9 09:11:02 edt 2002 ppc unknown unknown gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=powerpc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.0 -Darchlib=/usr/lib/perl/5.8.0 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.0 -Dsitearch=/usr/local/lib/perl/5.8.0 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.0 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing'
    ccversion='', gccversion='3.2.3 20030210 (Debian prerelease)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.1.so, so=so, useshrplib=true, libperl=libperl.so.5.8.0
    gnulibc_version='2.3.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    


@INC for perl v5.8.0:
    /etc/perl
    /usr/local/lib/perl/5.8.0
    /usr/local/share/perl/5.8.0
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8.0
    /usr/share/perl/5.8.0
    /usr/local/lib/site_perl
    .


Environment for perl v5.8.0:
    HOME=/home/frederik
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/frederik/bin:/usr/X11R6/bin:/usr/local/bin:/usr/bin:/bin:/home/frederik/sbin:/usr/X11R6/sbin:/usr/local/sbin:/usr/sbin:/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT
Copy link
Author

p5pRT commented May 30, 2003

From @rgs

"frederik@​ugcs.caltech.edu (via RT)" <perlbug-followup@​perl.org> wrote​:

sub r
{
shift =~ /(.*)/;
if(shift) {
r("bar", 0);
print "$1\n";
}
}
r("foo", 1);

# prints​: bar
# (expected​: foo)

As I *know* how the digit variables are implemented, I actually expect
"bar" there, but you're right, most people probably expect "foo".

Basically regexp vars ($1 and alii) are associated to regexps. There's
one set of such variables per regexp. In addition, there's some magic
to change the "current" regexp -- the one in scope -- at scope exit.
There's only one regexp in your code snippet I quoted, thus only one $1
variable slot, so it's normal that you get only one value out of $1.

This could be better documented.

@p5pRT
Copy link
Author

p5pRT commented May 21, 2007

From guest@guest.guest.xxxxxxxx

Someone, please do document this. This bit me too recently. Unfortunately, I cannot
document it myself, not fully understanding how dynamic scoping works (hence the need for
better documentation :-).

At least this much I understand​:

$u = ",echle etn sJ";
$t = "\nrka rPrhoatu";
$_ = $u.$t;
sub foo { s/(.)//s or return; bar(); print chop $$1 }
sub bar { s/(.)//s or return; foo(); print chop $$1 }
foo

@p5pRT
Copy link
Author

p5pRT commented Jun 27, 2011

From perlbug@spamwagon.com

When PerlIO::via runs the WRITE stack (or possibly any stack),
it does not appear to create a new dynamic scope for each layer
method called. Modifying the global regular expression match
variables like $1 in a lower IO layer can cause $1 to have an
unexpected value in the IO layer above.

Consider the following example. Both PerlIO::via::RegexBug and
PerlIO::via::RegexBugWorkaround perform the same function. They
iterate through the supplied $buffer character by character and
print a space before each one. PerlIO::via::RegexBugWorkaround
preserved the value of $1 in a lexical variable while calling into
the lower layer to print the space, while PerlIO::via::RegexBug
relies on $1's value to remain the same over the call to
print $fh ' ';

PerlIO::via::RegexBugWorkaround can be layered as expected, and
produces output prepended with one or two spaces, depending on
how many times it's been pushed onto the stack.

PerlIO::via::RegexBug works fine when pushed once. When pushed
twice, the value of $1 changes over the "print $fh ' ';" call,
as the lower instance of PerlIO::via::RegexBug matched a space
against the ' ' string.

package PerlIO::via::RegexBug;
use strict;

sub PUSHED {
my ($class, $mode, $fh) = @_;

return bless {}, $class;
}

sub WRITE {
my ($this, $buffer, $fh) = @_;
my $len = length $buffer;
while ($buffer =~ m/\G(.)/gs) {
my $c = $1;
print $fh ' ';
if ($c ne $1) {
die "Invalid output char expected:'$c' got:'$1'\n";
}
print $fh $1;
}
$len;
}

1;

package PerlIO::via::RegexBugWorkaround;
use strict;

sub PUSHED {
my ($class, $mode, $fh) = @_;

return bless {}, $class;
}

sub WRITE {
my ($this, $buffer, $fh) = @_;
my $len = length $buffer;
while ($buffer =~ m/\G(.)/gs) {
my $c = $1;
print $fh ' ';
print $fh $c;
}
$len;
}

1;


package main;

my @lines = <DATA>;
my $out = \*STDOUT;

Header('NORMAL');
TestOutput($out);

Header('WITH 1xRegexBugWorkaround');
binmode $out, ":via(RegexBugWorkaround)";
TestOutput($out);

Header('WITH 2xRegexBugWorkaround');
binmode $out, ":via(RegexBugWorkaround)";
TestOutput($out);

binmode $out, ":pop";
binmode $out, ":pop";


Header('NORMAL');
TestOutput($out);

Header('WITH 1xRegexBug');
binmode $out, ":via(RegexBug)";
TestOutput($out);

Header('WITH 2xRegexBug');
binmode $out, ":via(RegexBug)";
TestOutput($out);

sub Header {
print STDERR "="x40,"\n";
print STDERR "="x4," ",@_,"\n";
print STDERR "="x40,"\n";
}

sub TestOutput {
my $out = shift;
foreach (@lines) {
print $out $_;
}
}

__DATA__
This is some text
which may match
isnt that fun
Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.12.3:

Configured by 1 at Sun May 15 16:53:01 2011.

Summary of my perl5 (revision 5 version 12 subversion 3) configuration:
  
  Platform:
    osname=MSWin32, osvers=5.1, archname=MSWin32-x86-multi-thread
    uname='Win32 strawberryperl 5.12.3.0 #1 Sun May 15 09:44:53 2011 i386'
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags =' -s -O2 -DWIN32 -DHAVE_DES_FCRYPT 
-DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
-fno-strict-aliasing -mms-bitfields -DPERL_MSVCRT_READFIX',
    optimize='-s -O2',
    cppflags='-DWIN32'
    ccversion='', gccversion='4.4.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='long
long', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE"
-L"C:\strawberry\c\lib"'
    libpth=C:\strawberry\c\lib C:\strawberry\c\i686-w64-mingw32\lib
    libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32
-ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr
-lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool
-lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid
-lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32
    libc=, so=dll, useshrplib=true, libperl=libperl512.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE"
-L"C:\strawberry\c\lib"'

Locally applied patches:
   


@INC for perl 5.12.3:
    C:/strawberry/perl/site/lib
    C:/strawberry/perl/vendor/lib
    C:/strawberry/perl/lib
    .


Environment for perl 5.12.3:
    HOME=C:\users\Jerbraun
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=C:\strawberry\perl\bin
    PERLDOC=-n "groff.exe -E -mtty-char -Tascii -P-c"
    PERL_BADLANG (unset)
    PERL_JSON_BACKEND=JSON::XS
    PERL_YAML_BACKEND=YAML
    SHELL (unset)


@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2011

From @cpansprout

On Mon Jun 27 16​:13​:21 2011, perlbug@​spamwagon.com wrote​:

When PerlIO​::via runs the WRITE stack (or possibly any stack),
it does not appear to create a new dynamic scope for each layer
method called. Modifying the global regular expression match
variables like $1 in a lower IO layer can cause $1 to have an
unexpected value in the IO layer above.

You are right in saying ‘possibly any stack’. This is the same as bug
#22369. (But I disagree with the assessment in that ticket that it is
just a matter of documenting it. This *is* a bug that needs to be fixed,
IMO, since it is so counter-intuitive.)

@p5pRT
Copy link
Author

p5pRT commented Jul 3, 2011

The RT System itself - Status changed from 'new' to 'open'

@toddr
Copy link
Member

toddr commented Feb 14, 2020

I don't follow how you could fix it at this @cpansprout and possible @demerphq "spamwagon", I don't see how you could possibly change its behavior at this point without causing a even more confusion.

@hvds
Copy link
Contributor

hvds commented Feb 15, 2020

I think if we were to come up with a technical fix (which I'd certainly like to see), the next step would be various levels of smoking to see if this would cause widespread breakage or not.

My first instinct is that a) the kind of recursion that tickles it would be relatively rare; b) that in most cases where people know that they have encountered it, their workarounds would continue to work if it were fixed; c) that in most cases where people don't realise they've encountered it, fixing it would silently fix a bug for them; d) that most exceptions will be deliberately too-clever-by-half code in JAPHs or the like.

I can't think of an easy way to test that instinct other than to try it, unless readers report that they've knowingly used this in production code.

Hugo

@khwilliamson
Copy link
Contributor

khwilliamson commented Feb 15, 2020 via email

@demerphq
Copy link
Collaborator

This ticket contains two different bugs. I am not sure if the first bug is really a bug, but the second part does look like a bug.

@khwilliamson any thoughts?

@demerphq
Copy link
Collaborator

FWIW, I looked into this. The code that executes via hooks does not do a SAVETMPS and FREETMPS. Just ENTER/LEAVE. But when I added SAVETMPS and FREETMPS it did not seem to help. Reading more it seemed like the context code changed in 5.24 or so, and I am still wrapping my head around its implications. I think that it should be creating a context block to use for rolling back PL_curpm, but I admit I could be totally wrong. @iabyn do you have any thoughts on that?

@iabyn
Copy link
Contributor

iabyn commented Dec 31, 2022 via email

@demerphq
Copy link
Collaborator

But since WRITE is now being called recursively (once for each layer), surely this is exactly the same bug as #1?

I dont know. That was what i was unsure of. If something is triggered "magically", should it behave differently than if it is executed normally. I was thinking maybe it should, and I found some hints in the docs about context that seemed to suggest as much.

PerlIO::via invokes the callbacks using call_sv(), which itself calls pp_entersub(), which pushes a CXt_SUB context as normal, which saves a copy of PL_curpm as normal, which is restored as normal on sub return.

That was what I didn't understand when I wrote my comment.

But this also ties in with the fact that at the moment we don't properly split regexes (i.e. SVt_REGEXP SVs) into separate fixed and dynamic parts. Really the info such as capture indexes should be separate in each SV, while multiple SVs would all link to the same ref-counted 'fixed' data structure.

Yes. I did something like this, with the "mother_re" stuff, which helped somewhat, but I think I wasn't brave enough to completely split out the capture buffer state from the rest. I am now however. I will take care of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants