Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.11.0b3 chockes on regex sub from rjsmin, time grows seemingly exponentially #94675

Closed
hroncok opened this issue Jul 7, 2022 · 10 comments
Labels
tests Tests in the Lib/test dir topic-regex type-bug An unexpected behavior, bug, or error

Comments

@hroncok
Copy link
Contributor

hroncok commented Jul 7, 2022

Bug report

When building chromium in Fedora 37 with Python 3.11.0b3, the build hangs seemingly forever.

While isolating the hang, we've noticed it happens in rjsmin, at https://github.com/ndparker/rjsmin/blob/1.2.0/rjsmin.py#L361

I've further reduced the javascript input as much as I could but I kept the regex pattern intact.

Here's the reporucer:

import re, sys

# pattern from rjsmin
# Copyright 2011 - 2021 André Malo or his licensors, as applicable.
# Apache License Version 2.0
# https://github.com/ndparker/rjsmin/blob/1.2.0/rjsmin.py#L196
pattern = '(?<=[(,=:\\[!&|?{};\\r\\n+*-])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?'
# (originally reported as) pattern = '([^\\047"\\140/\\000-\\040]+)|((?:(?:\\047[^\\047\\\\\\r\\n]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^\\047\\\\\\r\\n]*)*\\047)|(?:"[^"\\\\\\r\\n]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^"\\\\\\r\\n]*)*")|(?:\\140[^\\140\\\\]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^\\140\\\\]*)*\\140))[^\\047"\\140/\\000-\\040]*)|(?<=[)])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))(?=(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*\\.(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*[a-z])|(?<=[(,=:\\[!&|?{};\\r\\n+*-])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?|(?<=[\\000-#%-,./:-@\\[-^\\140{-~-]return)(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:((?:(?://[^\\r\\n]*)?[\\r\\n]))(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?|(?<=[^\\000-!#%&(*,./:-@\\[\\\\^{|~])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:((?:(?://[^\\r\\n]*)?[\\r\\n]))(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040"#%-\\047)*,./:-@\\\\-^\\140|-~])|(?<=[^\\000-#%-,./:-@\\[-^\\140{-~-])((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=[^\\000-#%-,./:-@\\[-^\\140{-~-])|(?<=\\+)((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=\\+)|(?<=-)((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=-)|(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))+|(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+'


input_data = """a(function() {{
      {}
}});
""".format('/' * int(sys.argv[1]))

print(re.compile(pattern).sub('', input_data))

The original input reduced from Chromium's JavaScript file looks like:

a(function() {
      //////////////////////////////////////////////////////////////////////////
}); 

And here how long it takes with Python 3.10.5 and 3.11.0b3 for a reduced number of slashes.

$ time python3.10 reproducer.py 30
real    0m0,033s

$ time python3.11 reproducer.py 30
real    0m0,082s

$ time python3.10 reproducer.py 40
real    0m0,034s

$ time python3.11 reproducer.py 40
real    0m6,902s

$ time python3.10 reproducer.py 45
real    0m0,032s

$ time python3.11 reproducer.py 45
real    1m25,004s

$ time python3.10 reproducer.py 50
real    0m0,031s

$ time python3.11 reproducer.py 50
??? killed after 10+ minutes

For 74 slashes, I guess it would run forever.

Your environment

  • CPython versions tested on: 3.11.0b3
  • Operating system and architecture: Fedora Linux 35 or 37, x86_64
@hroncok hroncok added the type-bug An unexpected behavior, bug, or error label Jul 7, 2022
@hroncok
Copy link
Contributor Author

hroncok commented Jul 7, 2022

A more "real" example is to pipe the javascript input to https://github.com/ndparker/rjsmin/blob/1.2.0/rjsmin.py

$ cat file_to_pipe_in.js 
a(function() {
      /////////////////////////////////////////////
});

$ time python3.10 rjsmin.py < file_to_pipe_in.js
a(function(){});
real	0m0,039s
user	0m0,031s
sys	0m0,009s

$ time python3.11 rjsmin.py < file_to_pipe_in.js
...

@hroncok
Copy link
Contributor Author

hroncok commented Jul 7, 2022

I've simplified the regex to:

pattern = '(?<=[(,=:\\[!&|?{};\\r\\n+*-])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?'

The following ones are even shorter and exhibit the same behavior but slightly different times, so I am not editing it in the original report:

pattern = r'(?<=[({}])(?:(?:(?://[^\n]*)?[\n])(?:[\000-\040])*)*((?:/[^/\[\n]*(?:(?:[^\n]|(?:\[\n]*(?:]*)*\]))[^/\[]*)*/))((?:(?:(?://[^\n]*)?[\n])(?:[\000-\040]|(?:/\*[^*]*\*+(?:[^/*]\*+)*/))*)+(?=[^\000-\040&)+,.:;=?\]|}-]))'
pattern = r'(?<=[({}])(((//[^\n]*)?[\n])([\000-\040])*)*((/[^/\[\n]*(([^\n]|(\[\n]*(]*)*\]))[^/\[]*)*/))((((//[^\n]*)?[\n])([\000-\040]|(/\*[^*]*\*+([^/*]\*+)*/))*)+(?=[^\000-\040);\]}]))'

@AlexWaygood AlexWaygood added topic-regex 3.11 only security fixes 3.12 bugs and security fixes labels Jul 8, 2022
@hauntsaninja
Copy link
Contributor

Can you repro on latest main?
A slowdown in Chromium build was previously reported here #91404 (comment)
I believe #93928 did not make beta 3

@hauntsaninja
Copy link
Contributor

(If #93928 did indeed fix, it would be nice to add your reduced example as a test case!)

@hroncok
Copy link
Contributor Author

hroncok commented Jul 8, 2022

Will test.

@hroncok
Copy link
Contributor Author

hroncok commented Jul 8, 2022

This is fixed on main.

@hroncok
Copy link
Contributor Author

hroncok commented Jul 8, 2022

#93928 also fixes it on 3.11

@hroncok hroncok changed the title Python 3.11 chockes on regex sub from rjsmin, time grows seemingly exponentially Python 3.11.0b3 chockes on regex sub from rjsmin, time grows seemingly exponentially Jul 8, 2022
@AlexWaygood AlexWaygood added tests Tests in the Lib/test dir and removed 3.11 only security fixes 3.12 bugs and security fixes labels Jul 8, 2022
@hroncok
Copy link
Contributor Author

hroncok commented Jul 8, 2022

@hauntsaninja Any idea how are the "does not take too long" cases handled in cpython tests?

@hroncok
Copy link
Contributor Author

hroncok commented Jul 8, 2022

I've added #94685 -- hoping that the general timeout thing would stop it on b3

hroncok added a commit to hroncok/cpython that referenced this issue Jul 8, 2022
Use multiprocessing to kill the test after SHORT_TIMEOUT.
hroncok added a commit to fedora-python/cpython that referenced this issue Jul 8, 2022
This fixes a speed regression in the re module
which prevented chromium from building in Fedora.

Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure"

This reverts commit 6e3eee5.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

(cherry picked from commit 4beee0c)
Co-authored-by: Gregory P. Smith <greg@krypto.org>

pythongh-94675: Add a regression test for rjsmin re slowdown

Co-authored-by: Miro Hrončok <miro@hroncok.cz>
hroncok added a commit to fedora-python/cpython that referenced this issue Jul 11, 2022
This tests a speed regression in the re module
which prevented chromium from building in Fedora.

python#94685
hrnciar pushed a commit to fedora-python/cpython that referenced this issue Jul 26, 2022
This tests a speed regression in the re module
which prevented chromium from building in Fedora.

python#94685
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 3, 2022
…H-94685)

Adds a regression test for an re slowdown observed by rjsmin.
Uses multiprocessing to kill the test after SHORT_TIMEOUT.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Christian Heimes <christian@python.org>
(cherry picked from commit fe23c00)

Co-authored-by: Miro Hrončok <miro@hroncok.cz>
gpshead pushed a commit that referenced this issue Aug 3, 2022
Adds a regression test for an re slowdown observed by rjsmin.
Uses multiprocessing to kill the test after SHORT_TIMEOUT.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Christian Heimes <christian@python.org>
miss-islington added a commit that referenced this issue Aug 3, 2022
Adds a regression test for an re slowdown observed by rjsmin.
Uses multiprocessing to kill the test after SHORT_TIMEOUT.

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Co-authored-by: Christian Heimes <christian@python.org>
(cherry picked from commit fe23c00)

Co-authored-by: Miro Hrončok <miro@hroncok.cz>
@kumaraditya303
Copy link
Contributor

Fixed by #94685

DaanDeMeyer pushed a commit to DaanDeMeyer/python-rpm that referenced this issue Nov 17, 2022
…uilding

For details, see python/cpython#94675

Backported from upstream 3.11 branch + python/cpython#94685

Needs bootstrap for test_distutils (assert _sre.MAGIC == MAGIC, "SRE module mismatch").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-regex type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants