Stack overflow in _yr_re_emit() #674
Comments
Slightly different issue in 5e2d279 with ASAN:
I just debugged an issue for a friend who had a bunch of rules break due to the limit of 2000. He has some rules which have very long hex strings in them, and bumping the value up works fine for him, but I'm curious whether you would consider bumping it up to 3000 (that more than covers the one rule I saw from him). I realize arbitrary limits are, by definition, arbitrary, but I'm hoping we can bump this so as not to break existing rules.
Yes, sure. As long as it avoids the stack overflow we should go for the largest number.
Just to make sure I could reproduce the problem, I checked out
My plan was that once I knew how to trigger it I would keep increasing the value to find the largest value for
@wxsBSD actually I had a similar problem and had to create my own rule to trigger the bug. I just used a very long regexp, like /xxxxxxxxx...{a few thousand more}.....xxxxxxxxxxxx/. Maybe @fumfel has a smaller stack size limit and the bug is triggered on his system but not on yours. Check your stack size limit with
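In case it helps others reproduce the crash: the original payload is not included above, but a rule like the one @plusvic describes can be generated with a small program. The sketch below is only an illustration; the file name, rule name, and repetition count are arbitrary choices, and whether the result actually overflows the stack (rather than being rejected once a depth limit is in place) depends on the YARA version and the system's stack limit.

```c
/* Sketch: write a YARA rule whose regexp is a run of a few thousand
 * literal characters, which (as I understand the regexp compiler)
 * produces a deep concatenation tree and drives _yr_re_emit()
 * into deep recursion. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int repeats = 10000;  /* arbitrary; "a few thousand" per the comment above */
    FILE* f = fopen("yara_so_yr_re_emit.yar", "w");

    if (f == NULL)
    {
        perror("fopen");
        return EXIT_FAILURE;
    }

    fputs("rule stack_overflow_test\n{\n  strings:\n    $re = /", f);

    for (int i = 0; i < repeats; i++)
        fputc('x', f);

    fputs("/\n  condition:\n    $re\n}\n", f);
    fclose(f);

    return EXIT_SUCCESS;
}
```

Compiling the generated file (e.g. yara yara_so_yr_re_emit.yar somefile, as in the reproduction steps below) should exercise the same code path.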
I forgot about
@plusvic 8192KB, but AFAIR ASAN uses different stack limits (2/4 MB) for its own purposes.
We had many rules break with the 2000 limit. I had to go all the way up to 6000. Is this safe? Does it just potentially use more resources? I'm not sure I understand the implications.
@marnao a 6000 limit should be safe too. On a 64-bit system my estimation is that each recursive call to _yr_re_emit() consumes around 180 bytes of stack space, so setting the limit to 10000 would consume around 1.8MB of stack. With the typical 8MB stack space that's less than 25%. However, I wonder what kind of regular expressions are hitting the 2000 limit; I'm afraid that by raising the limit I could be encouraging massive regular expressions that don't actually make sense.
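To put those numbers in context, here is a hedged sketch that compares the estimated stack usage of a few candidate limits against the process's actual stack limit via getrlimit(). The 180 bytes-per-frame figure is just the estimate quoted above, not a measured constant, and the candidate limits are the values mentioned in this thread.

```c
/* Sketch: estimate the stack consumed by N recursive calls to
 * _yr_re_emit(), assuming ~180 bytes per call (the estimate from the
 * discussion above), and compare it to the process stack limit. */
#include <stdio.h>
#include <sys/resource.h>

#define BYTES_PER_FRAME 180  /* rough per-call cost on a 64-bit system */

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
    {
        perror("getrlimit");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
    {
        printf("stack limit: unlimited\n");
        return 0;
    }

    printf("stack limit: %llu KB\n", (unsigned long long) rl.rlim_cur / 1024);

    const int limits[] = { 2000, 3000, 6000, 10000 };  /* values from this thread */

    for (int i = 0; i < (int) (sizeof(limits) / sizeof(limits[0])); i++)
    {
        unsigned long long bytes = (unsigned long long) limits[i] * BYTES_PER_FRAME;

        printf("limit %5d -> ~%4llu KB (%.1f%% of the stack)\n",
               limits[i], bytes / 1024,
               100.0 * (double) bytes / (double) rl.rlim_cur);
    }

    return 0;
}
```

With the typical 8192 KB stack this prints roughly 21% for the 10000 case, which matches the "less than 25%" estimate above.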
@plusvic thanks for the quick response as always. That's very helpful context. I think you have a legitimate concern about encouraging poor use cases. Some of our worst-offending rules were externally sourced, i.e. we did not write them. I can't speak for those, but the signatures we wrote that resulted in lengthy hex strings were either:
Hope this helps!
Stack overflow in _yr_re_emit()
Tested on Git HEAD: cdbacf5
Payload
To reproduce:
yara yara_so_yr_re_emit.yar strings
ASAN: