multi threading #522

Open
jmbnyc opened this issue Oct 8, 2024 · 6 comments


@jmbnyc

jmbnyc commented Oct 8, 2024

How are you supposed to allow a compiled regex pattern to be matched from multiple threads concurrently?

If I have a JIT stack that is thread local, but the pcre2_match_context is effectively tied to the pattern, then I believe this call will cause problems because it modifies the context:

pcre2_jit_stack_assign

Can you advise the best way to have a compiled pattern concurrently matched without locking?

@jmbnyc
Author

jmbnyc commented Oct 8, 2024

I think I figured this out by reading the code. All params to the jit must be thread local: the match data, the match context, and the JIT stack that is attached to the match context. Please confirm. It would be nice if the MT section made this extremely clear. It seems to make sense now, but it was not clear before I read the code and observed how the match context was being used (as a conduit for some params).

@carenas
Contributor

carenas commented Oct 8, 2024

> All params to the jit must be thread local

For your setup, the pcre2_jit_compile() call itself should be done on each thread as explained in the documentation.

Additionally, if you use a custom thread stack then that needs to be assigned to each thread independently as explained in the JIT documentation.

The match_data (which could be reused by multiple serial calls to pcre2_match() in the same thread) cannot be shared between threads.

@ltrzesniewski
Contributor

> The pcre2_jit_compile() call itself must be done on each thread as explained in the documentation.

Are you sure? That's not how I understand the documentation: it says to lock the pcre2_code* before calling pcre2_jit_compile, or essentially make sure it's JIT compiled only once.

@carenas
Contributor

carenas commented Oct 9, 2024

> Are you sure?

I was thinking that the fact the information is all spread around is confusing, especially as the interpreter is lock free and thread safe, and will only need a mutex when the patterns are compiled on demand.

JIT uses a mutex internally (although that is configurable as well) for its memory allocation of executable code, and uses a stack at match time that can't be shared between threads. It also creates non-PIC code, so pcre2_jit_compile() will need to be called again on a pcre2_code that was created with pcre2_code_copy().

Maybe we need a pcre2thread.3 man page.

FWIW, most of the time you CAN safely call pcre2_jit_compile() on the same pcre2_code more than once, and indeed the documentation encourages you to do so. You could also call pcre2_jit_compile() only once, as long as you make sure that each thread uses a different JIT stack at match time, which needs to be done implicitly and is trickier to get right.

@zherczeg
Collaborator

zherczeg commented Oct 9, 2024

I am sorry if it is confusing. You need to compile the JIT code only once, with pcre2_jit_compile(). That call is not thread safe, but you can do it right after the normal compilation to avoid parallel compilation.

As for matching, your second comment is also correct. You can run the code in parallel from any number of threads, but each thread needs its own match data and JIT stack. I don't think you need a unique match context.

The "CONTROLLING THE JIT STACK" section here gives you more info:
https://www.pcre.org/current/doc/html/pcre2jit.html

@jmbnyc
Author

jmbnyc commented Oct 9, 2024

My issue was that I was doing

:pcre2_jit_stack_assign(_pMatchContext, nullptr, pJitStack)

just before calling the match function. In the above, my code had _pMatchContext as an instance variable associated with a pcre2_code object and pJitStack is thread local.

This causes a problem if another thread uses the regex object and does the same thing concurrently (_pMatchContext would be the same, but pJitStack would be a different thread local).

Thus, the fix was to have a thread-local match context and to assign the JIT stack once, when the thread-local context is created.
