Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in Perl_re_op_compile at regcomp.c:8488 #17775

Closed
dur-randir opened this issue May 9, 2020 · 14 comments
Closed

Segfault in Perl_re_op_compile at regcomp.c:8488 #17775

dur-randir opened this issue May 9, 2020 · 14 comments
Assignees
Milestone

Comments

@dur-randir
Copy link
Member

dur-randir commented May 9, 2020

This is a bug report for perl from sergey.aleynikov@gmail.com,
generated with the help of perlbug 1.41 running under perl 5.31.10.

[Please describe your issue here]

While fuzzing perl v5.31.9-70-g0c96aa4b7b built with afl and run
under libdislocator, I found the following program

0=~/(?[(?^:(?[[0]])(?0))0])/

to cause a segfault. GDB stack trace is:

0x0000555555636622 in Perl_re_op_compile (patternp=0x0, pat_count=1, expr=0x555555981bd0,
    eng=0x55555594ad20 <PL_core_reg_engine>, old_re=0x0, is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c:8488
8488	        ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - REGNODE_OFFSET(scan));
(gdb) bt
#0  0x0000555555636622 in Perl_re_op_compile (patternp=0x0, pat_count=1, expr=0x555555981bd0,
    eng=0x55555594ad20 <PL_core_reg_engine>, old_re=0x0, is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c:8488
#1  0x00005555555b2fb2 in Perl_pmruntime (o=0x555555981c08, expr=0x555555981bd0, repl=0x0, flags=1, floor=0) at op.c:8359
#2  0x00005555556185a4 in Perl_yyparse (gramtype=258) at perly.y:1293
#3  0x00005555555d0207 in S_parse_body (env=0x0, xsinit=0x5555555a01df <xs_init>) at perl.c:2574
#4  0x00005555555cf0cb in perl_parse (my_perl=0x555555955260, xsinit=0x5555555a01df <xs_init>, argc=3, argv=0x7fffffffe1a8,
    env=0x0) at perl.c:1869
#5  0x00005555555a011d in main (argc=3, argv=0x7fffffffe1a8, env=0x7fffffffe1c8) at perlmain.c:132

This is a regression in blead, bisect points to d8d1ded is the first bad commit

commit d8d1dede53afc4f33cf63203b0992459fe964dc3
Author: Karl Williamson <khw@cpan.org>
Date:   Mon Feb 17 12:07:07 2020 -0700

    Improve handling of nested qr/(?[...])/

    A set operations expression can contain a previously-compiled one
    interpolated in.  Prior to this commit, some heuristics were employed
    to verify it actually was such a thing, and not a sort of look-alike
    that wasn't necessarily valid.  The heuristics actually forbade legal
    ones.  I don't know of any illegal ones that were let through, but it is
    certainly possible.  Also, the error/warning messages referred to the
    heuristics, and were unhelpful at best.

    The technique used instead in this commit is to return a regop only used
    by this feature for any nested compilations.  This guarantees that the
    caller can determine if the result is valid, and what that result is
    without having to do any heuristics or inspecting any flags.  The
    error/warning messages are changed to reflect this, and I believe are
    now helpful.

    This fixes the bugs in #16779
    https://github.com/Perl/perl5/issues/16779#issuecomment-563987618
[Please do not change anything below this line]
Flags:
category=core
severity=medium
Site configuration information for perl 5.31.10:

Configured by root at Fri Mar 13 17:15:02 MSK 2020.

Summary of my perl5 (revision 5 version 31 subversion 10) configuration:
Commit id: 0c96aa4
Platform:
osname=linux
osvers=4.19.0-8-amd64
archname=x86_64-linux
uname='linux dorothy 4.19.0-8-amd64 #1 smp debian 4.19.98-1 (2020-01-26) x86_64 gnulinux '
config_args='-de -Dusedevel -Doptimize=-O2'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='8.3.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.28.so
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.28'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

@inc for perl 5.31.10:
lib
/usr/local/lib/perl5/site_perl/5.31.10/x86_64-linux
/usr/local/lib/perl5/site_perl/5.31.10
/usr/local/lib/perl5/5.31.10/x86_64-linux
/usr/local/lib/perl5/5.31.10

Environment for perl 5.31.10:
HOME=/home/afl
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.UTF-8
LC_TIME=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.30.0-dbg/bin:/opt/local/bin:/usr/texbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PERLBREW_HOME=/home/afl/.perlbrew
PERLBREW_MANPATH=/home/afl/perlbrew/perls/perl-5.30.0-dbg/man
PERLBREW_PATH=/home/afl/perlbrew/bin:/home/afl/perlbrew/perls/perl-5.30.0-dbg/bin
PERLBREW_PERL=perl-5.30.0-dbg
PERLBREW_ROOT=/home/afl/perlbrew
PERLBREW_SHELLRC_VERSION=0.88
PERLBREW_VERSION=0.88
PERL_BADLANG (unset)
@khwilliamson khwilliamson added this to the 5.32.0 milestone May 9, 2020
@hvds
Copy link
Contributor

hvds commented May 12, 2020

At an initial look, it appears the chunk at regcomp.c:16465 isn't working as intended - it is relying on the documented return value of reg() (same as regbranch()):

/* On success, returns the offset at which any next node should be placed into
 * the regex engine program being compiled.

.. but stepping through regbranch(), it appears that it actually returns the offset of the first node it has emitted - which in this case is probably always 1.

The size of the extended charclass node appears to be 4, and the patch below appears to now give the intended error instead of a core:

% ./miniperl -e 'qr{(?[(?^:(?[[0]])(?0))0])}'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^:(?[[0]])(?0))0])/ at -e line 1.
Expecting interpolated extended charclass in regex; marked by <-- HERE in m/(?[(?^:(?[[0]])(?0)) <-- HERE 0])/ at -e line 1.

@khwilliamson please take a look whether this looks sane.

I'll try to dig into the history of the discrepancy between documentation and reality for the return values of reg() and regbranch().

--- a/regcomp.c
+++ b/regcomp.c
@@ -16466,7 +16466,7 @@ redo_curchar:
                            /* If more than a single node returned, the nested
                             * parens evaluated to more than just a (?[...]),
                             * which isn't legal */
-                        || node != 1) {
+                        || RExC_emit != 1 + 4) {
                         vFAIL("Expecting interpolated extended charclass");
                     }
                     resultant_invlist = (SV *) ARGp(REGNODE_p(node));

jkeenan added a commit that referenced this issue May 12, 2020
@jkeenan
Copy link
Contributor

jkeenan commented May 12, 2020

At an initial look, it appears the chunk at regcomp.c:16465 isn't working as intended - it is relying on the documented return value of reg() (same as regbranch()):

/* On success, returns the offset at which any next node should be placed into
 * the regex engine program being compiled.

.. but stepping through regbranch(), it appears that it actually returns the offset of the first node it has emitted - which in this case is probably always 1.

The size of the extended charclass node appears to be 4, and the patch below appears to now give the intended error instead of a core:

% ./miniperl -e 'qr{(?[(?^:(?[[0]])(?0))0])}'
The regex_sets feature is experimental in regex; marked by <-- HERE in m/(?[ <-- HERE (?^:(?[[0]])(?0))0])/ at -e line 1.
Expecting interpolated extended charclass in regex; marked by <-- HERE in m/(?[(?^:(?[[0]])(?0)) <-- HERE 0])/ at -e line 1.

@khwilliamson please take a look whether this looks sane.

I'll try to dig into the history of the discrepancy between documentation and reality for the return values of reg() and regbranch().

--- a/regcomp.c
+++ b/regcomp.c
@@ -16466,7 +16466,7 @@ redo_curchar:
                            /* If more than a single node returned, the nested
                             * parens evaluated to more than just a (?[...]),
                             * which isn't legal */
-                        || node != 1) {
+                        || RExC_emit != 1 + 4) {
                         vFAIL("Expecting interpolated extended charclass");
                     }
                     resultant_invlist = (SV *) ARGp(REGNODE_p(node));

@hvds I placed your patch in a smoke-me branch, smoke-me/jkeenan/hvds/gh-17775. While it's PASSing and FAILing on our smoke-testers in the normal places, it's FAILing on two of our github actions. See:
https://github.com/Perl/perl5/runs/665552670
https://github.com/Perl/perl5/runs/665552641

I do not know how reliable these github actions setups are, so I can't say whether they are the basis for rejecting your patch. Can you take a look?

Thank you very much.
Jim Keenan

@hvds
Copy link
Contributor

hvds commented May 12, 2020

I placed your patch in a smoke-me branch, smoke-me/jkeenan/hvds/gh-17775. While it's PASSing and FAILing on our smoke-testers in the normal places, it's FAILing on two of our github actions. See:
https://github.com/Perl/perl5/runs/665552670
https://github.com/Perl/perl5/runs/665552641

These are definitely test failures caused by the patch:

re/reg_mesg.t                                                    (Wstat: 0 Tests: 3174 Failed: 2)
  Failed tests:  452, 1268
re/regex_sets.t                                                  (Wstat: 65280 Tests: 27 Failed: 0)
  Non-zero exit status: 255

I'll take a look.

@hvds
Copy link
Contributor

hvds commented May 12, 2020

The regex_sets failure is:

Expecting interpolated extended charclass in regex; marked by <-- HERE in m/(?[ \p{Digit} & (?^:(?[ \p{Thai} + \p{Lao} ])) <-- HERE  ])/ at re/regex_sets.t line 61.

.. and this appears to be because the size of the extended charclass op is 2 on 32-bit, not 4.

To fix that I'll need to work out how to ask the size of it properly, unless @khwilliamson already knows.

The two failures in reg_mesg.t are probably occurring for the same reason: the regexp introduced as a test for #16649 now fails with "Expecting interpolated extended charclass" instead of the expected message.

@khwilliamson
Copy link
Contributor

It will take me a few minutes to get the code to do the size properly.

@khwilliamson
Copy link
Contributor

I replaced that smoke test with one that should work on 32 bit systems, using the standard way to find the size of a node

I'm thinking for 5.33 that this paradigm is used in enough places that there should be a macro, say, NODE_EQUIVALENTS(node) that encapsulates it.

@hvds
Copy link
Contributor

hvds commented May 12, 2020

I'll try to dig into the history of the discrepancy between documentation and reality for the return values of reg() and regbranch().

It looks like that part of the documentation was already wrong when first added as part of f55b7b2.

I think what both these functions actually now return is the offset of a newly inserted node that either needs its 'next' pointer updated to point to whatever will get added next, or starts a chain whose 'next' pointers can be followed to find such a node - it appears the primary use (other than as a success marker) is to pass into a subsequent REGTAIL call.

@xsawyerx
Copy link
Member

I'm not sure where we are on this.

@hvds
Copy link
Contributor

hvds commented May 21, 2020

I'm not sure where we are on this.

Assuming smokes are clean, we should go with Karl's patch - @khwilliamson, @jkeenan?

@jkeenan
Copy link
Contributor

jkeenan commented May 21, 2020 via email

@xsawyerx
Copy link
Member

@khwilliamson thoughts?

@khwilliamson
Copy link
Contributor

I'm fine with taking my revision to @hvds patch

@xsawyerx
Copy link
Member

I made a comment on the commit. There seems to be an XXX: wrong added to a line there. Is this something that should be resolved before we merge the fix?

@xsawyerx xsawyerx self-assigned this May 24, 2020
xsawyerx pushed a commit that referenced this issue May 24, 2020
xsawyerx pushed a commit that referenced this issue May 25, 2020
@xsawyerx
Copy link
Member

Resolved. Thanks, everyone. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants