
How to debug a potential jit issue #517

Closed
jmbnyc opened this issue Oct 7, 2024 · 7 comments

Comments

@jmbnyc

jmbnyc commented Oct 7, 2024

My team and I are still working to confirm this, but we are seeing a crash inside pcre2_jit_match. Unfortunately, gdb is not very helpful because we get a huge set of stack frames showing ??. We might be able to do better if we pulled the PCRE2 code into our main code base instead of loading it as a library. In the meantime, how can we debug this? Do you have any suggestions on how we can narrow down the issue we might be encountering? We are using version 42.

@carenas
Contributor

carenas commented Oct 8, 2024

Stack traces through sljit-generated code will normally contain those ?? frames if the crash is in the generated code.

If you have a core dump of the crash, then a backtrace and a disassembly around the crashing instruction (x/16i $pc-32) will help, together with the regular expression that triggered it (especially if it is reproducible).

@carenas
Contributor

carenas commented Oct 8, 2024

> We are using version 42.

I assume you mean 10.42. If your application is multithreaded, you should probably upgrade to 10.44. Also see #435.

@zherczeg
Collaborator

zherczeg commented Oct 8, 2024

gdb usually does not support backtraces through JIT code. If the issue can be reproduced easily, then it is better to put a breakpoint just before the JIT code is executed:
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_match.c#L91

You can use the gdb ignore or condition commands to stop at the right time, and then get a backtrace. If you don't know how many times the breakpoint needs to be ignored, set a huge ignore count such as 1000000 and run until the crash. Then use info breakpoints to see how many times the breakpoint was hit before the crash. That number minus 1 is a good value for the next ignore.

@jmbnyc
Author

jmbnyc commented Oct 8, 2024

Does this mean anything to those of you who have been kind enough to respond?

```
(gdb) x/16i $pc-32
0x2e5c8c5 <pcre2_jit_match_8+496>: and $0x48,%al
0x2e5c8c7 <pcre2_jit_match_8+498>: mov -0x8(%rbp),%eax
0x2e5c8ca <pcre2_jit_match_8+501>: mov 0x18(%rax),%rax
0x2e5c8ce <pcre2_jit_match_8+505>: mov %rax,-0xa0(%rbp)
0x2e5c8d5 <pcre2_jit_match_8+512>: mov -0x38(%rbp),%rdx
0x2e5c8d9 <pcre2_jit_match_8+516>: lea -0xa0(%rbp),%rax
0x2e5c8e0 <pcre2_jit_match_8+523>: mov %rax,%rdi
0x2e5c8e3 <pcre2_jit_match_8+526>: callq *%rdx
=> 0x2e5c8e5 <pcre2_jit_match_8+528>: mov %eax,-0x10(%rbp)
0x2e5c8e8 <pcre2_jit_match_8+531>: jmp 0x2e5c903 <pcre2_jit_match_8+558>
0x2e5c8ea <pcre2_jit_match_8+533>: mov -0x38(%rbp),%rdx
0x2e5c8ee <pcre2_jit_match_8+537>: lea -0xa0(%rbp),%rax
0x2e5c8f5 <pcre2_jit_match_8+544>: mov %rdx,%rsi
0x2e5c8f8 <pcre2_jit_match_8+547>: mov %rax,%rdi
0x2e5c8fb <pcre2_jit_match_8+550>: callq 0x2e5c652 <jit_machine_stack_exec>
0x2e5c900 <pcre2_jit_match_8+555>: mov %eax,-0x10(%rbp)
```

@zherczeg
Collaborator

zherczeg commented Oct 9, 2024

The crash is not in the JIT-generated code; it is in pcre2_jit_match_8. The callq *%rdx is an indirect call, and its target is loaded by mov -0x38(%rbp),%rdx. You should check whether rbp contains a valid stack location; perhaps the call does not restore it properly. It would be good to know what is being called there.

This is probably the location:
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_match.c#L171

@zherczeg
Collaborator

zherczeg commented Oct 9, 2024

I need to correct myself. If you use pcre2_match and pcre2_jit_stack_assign then you need a separate match context. If you use pcre2_jit_match() you don't need it.

@jmbnyc
Author

jmbnyc commented Oct 9, 2024

zherczeg,
Thanks for your response. I came to the same conclusion and found that the cause was concurrent calls to pcre2_jit_stack_assign with the same match context but different JIT stack memory. As I mentioned in another post, once I read the code it was obvious that the match context must be thread local (in my code). Net/net, my thread-local matching state now contains the match data, the match context, and the JIT stack. Each regex pattern is matched using a thread-local object, so the JIT stack assignment can be done once during thread-local initialization.

I appreciate the help here, as it allowed me to debug and figure this out. As I mentioned, the docs did not make it completely clear that the match data, match context, and JIT stack all need to be thread local to allow concurrent matching against a pattern (represented by a pcre2_code object). I probably should have read the code first, because it makes very clear what is required to get thread-safe concurrent matching.

@zherczeg zherczeg closed this as completed Oct 9, 2024