Segfault at long lines for regexp #17230
Comments
With a debugging build I get an assertion without the second s///:
|
I get this on 5.35.10 |
I tested on Cygwin perl-5.32.1. It does not segfault, but the result is clearly wrong.
Looks like a 31-bit overflow. So a workaround might be to simply change an int to a long: I expect very few will have lines with 2^63 chars.
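To make the suspected overflow concrete, here is a minimal C sketch of what happens when the two run lengths from the test commands (2^31 - 50 and 2^31 + 50) are forced into a signed 32-bit value. The helper name `truncate_to_i32` is hypothetical and this is not code from perl itself; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit length and
 * reinterpret them as a signed 32-bit value, the way a too-narrow counter
 * would see it. */
static int32_t truncate_to_i32(int64_t len)
{
    uint32_t low;
    int32_t out;

    assert(len >= 0);                 /* lengths are never negative */
    low = (uint32_t)len;              /* well-defined: reduces modulo 2^32 */
    memcpy(&out, &low, sizeof out);   /* reinterpret the bits, no undefined behaviour */
    return out;
}
```

Under these assumptions a length of 2^31 - 50 survives unchanged, while 2^31 + 50 comes back as a negative number, which is the kind of value a length or offset should never hold.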
|
On Mon, 21 Mar 2022, 20:03 prosaole, ***@***.***> wrote:
I tested on CygWin perl-5.32.1.
It does not seg fault, but the result is clearly wrong.
$ (seq 10;perl -e 'print "\0"x(2**31+50),"a"') | perl -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
2r
29r
4294967297r
$ (seq 10;perl -e 'print "\0"x(2**31-50),"a"') | perl -pe '$/=undef; s/(([^\0]+\0{0,10})+)/(length $1)."r\n"/ge; s/(\0+)/(length $1)."\n"/ge;'
2r
29r
2147483588
1r
Looks like a 31-bit overflow. So a workaround might be to simply change
an int to a long: I expect very few will have lines with 2^63 chars.
I don't think that is possible. This logic suffers from the semi-predicate
problem and uses -1 to represent "no match". We would need to change the
offset data, and possibly other counters, from I32 to I64.
Yves
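The semi-predicate point can be sketched in C as follows. The structs and function names below are hypothetical stand-ins, not perl's real data structures. The idea is that -1 is an in-band "no match" marker, so the offset type has to stay signed, and widening it means changing every field that stores an offset rather than a single int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_MATCH (-1)   /* in-band sentinel: a valid offset is never negative */

/* Hypothetical stand-ins for a capture-offset pair; not perl's real structs. */
typedef struct { int32_t start, end; } offs32;
typedef struct { int64_t start, end; } offs64;

/* 32-bit version: offsets past INT32_MAX cannot be represented, so the only
 * safe thing this sketch can do is record "no match" and report failure. */
static bool set_offs32(offs32 *o, int64_t start, int64_t end)
{
    assert(0 <= start && start <= end);
    if (end > INT32_MAX) {
        o->start = o->end = NO_MATCH;
        return false;
    }
    o->start = (int32_t)start;
    o->end = (int32_t)end;
    return true;
}

/* 64-bit version: same sentinel, wider fields. */
static void set_offs64(offs64 *o, int64_t start, int64_t end)
{
    assert(0 <= start && start <= end);
    o->start = start;
    o->end = end;
}

static bool matched32(const offs32 *o) { return o->start != NO_MATCH; }
static bool matched64(const offs64 *o) { return o->start != NO_MATCH; }
```

The sentinel itself carries over unchanged to the wider type; the cost is in touching every structure and counter that stores an offset.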
|
This still exists in 5.37.12 |
#21012 fixes the crash (or assertion failure when I tried it), but is still limited by the default REG_INFTY:
Unfortunately a lot of the regexp code assumes that REG_INFTY fits in an I32:
When I was scanning for I32 issues I assumed that was by design, so I didn't consider them as something to fix. Should they be fixed?
|
"Fixed" implies they are broken, which is a little debatable. Quantifier sizes are stored inside of the regops and the largest size value we can store inside of a regop is 32 bits, adding support for 64 bit values would be awkward, as the data in the compiled regexp program is 32 bit aligned. I am not convinced we should change this, and if we do it is going to involve a lot more than just changing some I32's to IV's. Every CURLY opcode would have to pay a price for the change, or we would have to bifurcate the CURLY opcodes. None of this would be straight forward. Might be better to simply throw an exception if we are matching against a string longer than I32_MAX bytes. There are a lot of parts of the regex engine that simply arent expecting Setting REG_INFTY to a value larger I32_MAX should throw an exception at perl startup or something like that.
We could do it for strings longer than 2GB, but i am not sure how we would do it for /lines/. |
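The "throw an exception for over-long strings" idea amounts to a length check before matching starts. A rough C sketch, assuming a hypothetical `check_subject_length` helper and status enum (neither exists in perl under these names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch. */
typedef enum { SUBJECT_OK = 0, SUBJECT_TOO_LONG = 1 } subject_status;

/* Reject any subject string whose byte length does not fit in a signed
 * 32-bit integer, so later 32-bit arithmetic on offsets cannot wrap. */
static subject_status check_subject_length(size_t len)
{
    /* the comparison below assumes size_t is at least 32 bits wide */
    assert(sizeof(size_t) >= sizeof(int32_t));
    return len > (size_t)INT32_MAX ? SUBJECT_TOO_LONG : SUBJECT_OK;
}
```

A check like this works on the whole string. It does not address the "lines" case raised above, since a long line can sit inside a longer string that was read in one piece.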
I wasn't suggesting that regops support very large quantifier values, though I assume now that an infinite quantifier is currently represented by REG_INFTY in the op, which I hadn't realized. (I hadn't looked.) To get the effect I was thinking of, when the regop specifies REG_INFTY we could treat that as SSize_t_MAX when matching. As to line length, we can already match strings over 2GB (using the #21012 branch):
it's just that a single quantifier won't match more than 2G-1 times. I'll take a closer look at actually trying to implement this, from a quick look over the code it doesn't look unreasonable to do. |
Yes. In 0678333 I changed it from a U16 to I32.
Yeah, I was a bit slow yesterday; of course we can do that. We used to do that with the 16-bit REG_INFTY as well. The regops STAR and PLUS for the * and + quantifiers also need to be changed; I think they don't use REG_INFTY.
I might do the same, but don't let that stop you from doing it first. I have other priorities just at the moment. Thanks for following up on this! |
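The approach agreed on here, treating a stored REG_INFTY as "no upper limit" at match time, might look roughly like the C sketch below. The names `SKETCH_REG_INFTY`, `SKETCH_UNBOUNDED`, `effective_max` and `greedy_repeats` are hypothetical, the value chosen for the stand-in REG_INFTY is an assumption, and INT64_MAX stands in for SSize_t_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_REG_INFTY INT32_MAX   /* assumption: stand-in for perl's REG_INFTY */
#define SKETCH_UNBOUNDED INT64_MAX   /* assumption: stand-in for SSize_t_MAX */

/* Translate the 32-bit maximum stored in the regop into the limit the
 * matcher should honour: the reserved value means "unbounded". */
static int64_t effective_max(int32_t op_max)
{
    assert(op_max >= 0);
    return op_max == SKETCH_REG_INFTY ? SKETCH_UNBOUNDED : (int64_t)op_max;
}

/* How many repeats a greedy quantifier could take when `available`
 * repeats are present in the input. */
static int64_t greedy_repeats(int64_t available, int32_t op_max)
{
    int64_t limit = effective_max(op_max);

    assert(available >= 0);
    return available < limit ? available : limit;
}
```

The regop keeps its 32-bit field; only the counter used while matching is wide. As noted above, STAR and PLUS would need equivalent handling because they may not go through REG_INFTY at all.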
On 4/26/23 19:12, Tony Cook wrote:
I wasn't suggesting that regops support very large quantifier values,
though I assume now that an infinite quantifier is currently represented
by REG_INFTY in the op, which I hadn't realized. (I hadn't looked.)
To get the effect I was thinking of, when the regop specifies REG_INFTY
we could treat that as SSize_t_MAX when matching.
It used to be that REG_INFTY was reserved and the regexec.c code would
substitute a much larger value for it as the upper limit of a match. I
haven't looked lately.
…
As to line length, we can already match strings over 2GB (using the
#21012 branch):
$ ./miniperl -we '$y = "abcd"; $x = $y x 0x8000_0000; printf "%x\n", length $x; $x =~ /((?:......)*)/; printf "%x\n", length $1'
200000000
1fffffffe
it's just that a single quantifier won't match more than 2G-1 times.
I'll take a closer look at actually trying to implement this, from a
quick look over the code it doesn't look unreasonable to do.
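The numbers in the miniperl run quoted above can be checked by hand. A small C sketch of the arithmetic follows; the function names are hypothetical. A four-byte string repeated 0x8000_0000 times gives 0x200000000 bytes, and a six-character group fits into that a whole number of times that is still below 2^31 - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Number of complete repetitions of a fixed-width group in a string. */
static int64_t whole_repeats(int64_t total_len, int64_t group_len)
{
    assert(total_len >= 0 && group_len > 0);
    return total_len / group_len;
}

/* Length covered by those complete repetitions. */
static int64_t matched_length(int64_t total_len, int64_t group_len)
{
    return whole_repeats(total_len, group_len) * group_len;
}
```

For a total of 0x200000000 bytes and a group width of 6, this gives 1431655765 repetitions covering 0x1fffffffe bytes, which agrees with the printed output. The repeat count stays under the 2G-1 ceiling, so this example matches a string over 2GB without hitting the quantifier limit.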
|
This is a bug report for perl from perlbug@tange.dk.
generated with the help of perlbug 1.41 running under perl 5.30.0.
[Please describe your issue here]
This seg faults:
I had expected it to not seg fault.
These give very different output:
I had expected them to give almost the same output.
It seems the regexp engine is unhappy about lines > 2GB.
[Please do not change anything below this line]
Flags:
category=core
severity=medium
Site configuration information for perl 5.30.0:
Configured by tange at Mon Oct 28 22:16:49 CET 2019.
Summary of my perl5 (revision 5 version 30 subversion 0) configuration:
Platform:
osname=linux
osvers=4.15.0-58-generic
archname=x86_64-linux
uname='linux aspire 4.15.0-58-generic #64-ubuntu smp tue aug 6 11:12:41 utc 2019 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Dprefix=/mnt/4tb/home/tange/localperl'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
bincompat5005=undef
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='7.4.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.27.so
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.27'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
@inc for perl 5.30.0:
/mnt/4tb/home/tange/localperl/lib/site_perl/5.30.0/x86_64-linux
/mnt/4tb/home/tange/localperl/lib/site_perl/5.30.0
/mnt/4tb/home/tange/localperl/lib/5.30.0/x86_64-linux
/mnt/4tb/home/tange/localperl/lib/5.30.0
/mnt/4tb/home/tange/localperl/lib/site_perl/5.24.0
/mnt/4tb/home/tange/localperl/lib/site_perl/5.22.2
/mnt/4tb/home/tange/localperl/lib/site_perl
Environment for perl 5.30.0:
HOME=/mnt/4tb/home/tange
LANG=C
LANGUAGE=C
LC_ALL=en_US.UTF-8
LC_TIME=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=.:/mnt/4tb/home/tange/bin:/mnt/4tb/home/tange/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/sbin:/mnt/4tb/home/tange/.local/bin:/mnt/4tb/home/tange/.cargo/bin:/usr/lib/oracle/xe/app/oracle/product/10.2.0/server/bin
PERL_BADLANG (unset)
PERL_MB_OPT=--install_base "/home/tange/perl5"
PERL_MM_OPT=INSTALL_BASE=/home/tange/perl5
SHELL=/bin/bash