Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds exception when using deepcopy with regex #31260

Closed
tjgreen42 opened this issue Mar 5, 2019 · 8 comments
Closed

Bounds exception when using deepcopy with regex #31260

tjgreen42 opened this issue Mar 5, 2019 · 8 comments
Labels
bug Indicates an unexpected problem or unintended behavior good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"

Comments

@tjgreen42
Copy link

tjgreen42 commented Mar 5, 2019

julia> match(deepcopy(r""), "")
ERROR: BoundsError: attempt to access ""
  at index [1829569203750929]
Stacktrace:
 [1] prevind(::String, ::Int64, ::Int64) at ./strings/basic.jl:460
 [2] prevind at ./strings/basic.jl:455 [inlined]
 [3] prevind at ./strings/basic.jl:454 [inlined]
 [4] match(::Regex, ::String, ::Int64, ::UInt32) at ./regex.jl:214
 [5] match at ./regex.jl:207 [inlined]
 [6] match(::Regex, ::String) at ./regex.jl:222
 [7] top-level scope at none:0```

As discussed on Slack with @StefanKarpinski and @vtjnash, the right fix is probably to provide an overload of deepcopy_internal for Regex which makes a copy of the underlying PCRE object, ideally through the PCRE API if such functionality exists.

@tjgreen42
Copy link
Author

While inspecting the Regex/PCRE code, I also noticed that the call to compile inside match is probably unnecessary, since the Regex inner constructor already calls compile.

@StefanKarpinski StefanKarpinski added bug Indicates an unexpected problem or unintended behavior help wanted Indicates that a maintainer wants help on an issue or pull request good first issue Indicates a good issue for first-time contributors to Julia strings "Strings!" labels Mar 5, 2019
@JeffBezanson
Copy link
Member

Is it actually meaningful to copy a regex, or could we make deepcopy_internal just return the original object?

@tjgreen42
Copy link
Author

@JeffBezanson Dunno, what about multithreading, I can imagine people might be tempted to use deepcopy to create independent copies of an object to avoid concurrency issues?

@StefanKarpinski
Copy link
Member

Is it actually meaningful to copy a regex, or could we make deepcopy_internal just return the original object?

This seems like it would be fine and the simplest solution to this, although it doesn't address the multithreading issue. But that issue would still exist anyway since multiple threads could have references to the same uncopied regex object, which means that if we want to make regexes threadsafe, we'd either need to put a mutex around the use of match_data or make it per-thread.

@Moelf
Copy link
Contributor

Moelf commented Mar 8, 2019

Is this already fixed?

julia> match(deepcopy(r""), "")
RegexMatch("")

@eulerkochy
Copy link
Contributor

Well, I was experimenting for this issue, and I found out for Regex, out of the 7 fields, one of them is an Array, which implies deepcopy(r"") === r"" will always be false. The simplest way out as @JeffBezanson suggested would be to make a separate deepcopy_internal for Regex type and avoid this issue. That being said, it might happen this might be one of the several types that are facing this problem, (basically any type with Array as one their field). What's the way out in those cases? We can't hardcode everytime, right?

@vtjnash
Copy link
Member

vtjnash commented Mar 26, 2019

Is it actually meaningful to copy a regex

Depends on what you think "deepcopy" should mean. There's bits of internal state (mostly write-once/immutable), and an allocation cache (mutable) inside. But otoh, there's no copy method since it doesn't expose a mutation API (although some mutation does happen, as a side-effect of usage).

@vtjnash
Copy link
Member

vtjnash commented Mar 26, 2019

That being said, it might happen this might be one of the several types that are facing this problem, (basically any type with Array as one their field)

I think the issue is often with types that have a Ptr field and/or a finalizer (often those go together)—and Regex has both. This issue has also been previously discussed in #16667.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"
Projects
None yet
Development

No branches or pull requests

6 participants