-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex fails expected match #47936
Comments
Your regex is matching at least one
|
Why does it not match the last three characters though? Regex need not match the full string, but should be able to match a substring. For example: julia> m = match(r"a+[bc]c", "ababc")
RegexMatch("abc") Here, the last three letters are being matched. Similarly, OP's regex should match the same last three letters. |
Also: julia> match(r"a[bc]+c", "ababc")
RegexMatch("abc") Seems to have regressed in 1.6, btw as with 1.5: julia> match(r"a+[bc]+c", "ababc")
RegexMatch("abc") |
Nice catch, I missed that this was the intention! |
Cursory testing suggests this is due to anchoring - see https://regex101.com/r/YoewW3/1 and toggle the |
Change and/or regression in PCRE2 then, it was updated from 10.36 to 10.40 between 1.7 and 1.8 (v1.7.3...v1.8.3#diff-077a0e0d8ff036a0ed8b93361a2c26cfc8ce3d96049890896e25c20d06e07461R97). |
Which is weird, since we don't pass the anchoring flag in that codepath, as far as I can tell 🤔 It does work though if we pass in
|
Nope, must have been earlier than that: julia> VERSION
v"1.5.1"
julia> match(r"a+[bc]+c", "ababc")
RegexMatch("abc") julia> VERSION
v"1.6.7"
julia> match(r"a+[bc]+c", "ababc") |
Works in 1.6.6, and indeed that same version bump of PCRE was backported for 1.6.7 (v1.6.6...v1.6.7#diff-077a0e0d8ff036a0ed8b93361a2c26cfc8ce3d96049890896e25c20d06e07461L94). |
The only change in PCRE related to anchoring I can find the in the changelog between 10.36 and 10.40 is this, in 10.38: So we could try with 10.37 to see if 10.38 is to blame, and the current most recent version 10.42 and if that also doesn't work, open an issue upstream? |
FWIW, the regex seems to work in pcre2 10.40:
|
Well it also doesn't work if I build PCRE2 10.37 from source on a current-ish master, but it DOES work if I build with 10.36. Set So something must have changed in PCRE2 v10.37, with regards to the flags we are passing. PCRE2Project/pcre2@pcre2-10.36...pcre2-10.37 I can't spot anything that jumps out as being the culprit just from those commits though..
|
For what is worth, last week I built pcre2 10.41 and 10.42: https://github.com/JuliaBinaryWrappers/PCRE2_jll.jl. I haven't got around updating it here in Julia, but if anyone wants to do it before me (I don't know when I'd be able to do it in the next few weeks), please go ahead. |
Based on jakobnissen and martinholters's counterexamples that simply remove a julia> match(r"a+[bc]{1,2}c", "ababc")
RegexMatch("abc")
julia> match(r"(a)+[bc]+c", "ababc")
RegexMatch("abc", 1="a") Here are some similar cases that share the issue: julia> match(r"a{1,2}[bc]+c", "ababc")
julia> match(r"(a+)[bc]+c", "ababc")
|
As mentioned above, recent master with PCRE2 10.36 works, while PCRE2 10.37 does not, leading me to believe this is an issue with PCRE. To be clear, I also think now that anchoring was a red herring. Here's what I get for your examples on a local build with 10.42: julia> ccall((:pcre2_config_8, libpcre2_8), Cint, (UInt32, Ref{UInt8}), 11, vstr)
17
julia> VersionNumber(split(unsafe_string(pointer(vstr)), " ")[1])
v"10.42.0"
julia> match(r"a+[bc]+c", "ababc")
RegexMatch("abc")
julia> match(r"a+[bc]{1,2}c", "ababc")
RegexMatch("abc")
julia> match(r"(a)+[bc]+c", "ababc")
RegexMatch("abc", 1="a")
julia> match(r"a{1,2}[bc]+c", "ababc")
RegexMatch("abc")
julia> match(r"(a+)[bc]+c", "ababc")
RegexMatch("abc", 1="a")
julia> versioninfo()
Julia Version 1.10.0-DEV.147
Commit 7b10d5fe01* (2022-12-14 19:15 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
Threads: 24 on 24 virtual cores
Environment:
JULIA_NUM_THREADS = 24 So looks like this really was a bug in PCRE and us bumping the version should fix it in the next release. Should be backportable too. @giordano which places besides |
See #47174 (comment) |
I confirm this is fixed by upgrading to PCRE2 10.42 in #47940
|
Yikes. This is a bad regression. Glad we have tests against this now. Chagrinned that we didn't. |
I created a minimal working example for a problem in the regular expression engine. I was able to reproduce the issue on the latest stable release 1.8.3, on 1.9.0-alpha1, and on master built today. It seems to be a problem with the engine failing to backtrack.
The text was updated successfully, but these errors were encountered: