-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
On Android pthread_attr_getstack() includes the guard pages in the returned stack size #1535
Comments
We're working around this by going back to a 4k stack guard temporarily. We can add detection of the Hermes library and downgrade security for apps using it in the future. It's very problematic though since the ABI says the stack probe size is 64k so even if we use 4k that doesn't mean everything will use 4k. |
I am sorry, we will be happy to fix this, but can you explain in more detail what the actual problem in Hermes is? When is it crashing and how? What is the error message? Do you have a symbolicated stack trace? Do you have a minimal repro? |
This is the stack overflow detection code that's buggy: hermes/lib/Support/OSCompatPosix.cpp Line 638 in 0ff6c2a
It doesn't take into account the guard size in the calculations, which isn't always 4k. It works with a 4k or 16k guard but breaks on the way to 64k. 64-bit ARM on Linux has standardized a 64k guard since it's better for security but Android hasn't moved to it from 4k. GrapheneOS uses a 64k guard, but we're working on a new release which among other changes temporarily rolls back to 4k due to this issue in Hermes. If you test on a non-end-of-life Pixel with the current release of GrapheneOS you can replicate the crash. You could also manually set a 64k guard size in your code and see that it results in a crash if you want an easy way to do it without GrapheneOS. Hermes is currently using the default guard size, which is 64k rather than 4k on GrapheneOS. However, setting it manually to 4k would also get 64k since it's a minimum value just like glibc for arm64. |
The calculation wrongly thinks the stack has overflowed and Hermes aborts, wrongly thinking there's an overflow. Do not know the exact details of how it goes wrong, just that it breaks with the default guard size set to 64k instead of 4k which is definitely a bug. It's possible it just doesn't allocate large enough stacks for a 64k guard and needs to factor it in. Not sure exactly what the code is doing, not at all familiar with it. |
I am sorry, you will have to work closely with us here, since we don't have Graphene installed. Let's try to figure out what is going on. I am looking at the code and I don't see how the guard size is relevant at all for this check. The code is pretty simple. It obtains the stack bounds, subtracts a configurable "gap" (64KB by default) for additional safety, and checks whether the stack pointer falls within that region. So, for the check to be failing, the stack bounds must be incorrect. How is that possible? And why does the size of the guard matter at all? |
You can explicitly set the guard size to 64k when creating the threads and it should fail in the same way. |
I just tried that on arm64 Linux: https://gist.github.com/tmikov/5cc745cf16cc35118fb459b0d4414f85. It worked as expected. We really can't do anything more about this until we can understand what the bug is. |
It needs to be done on Android. Bionic handles things differently than glibc. |
@thestinger it appears that Bionic has a bug and it includes the guard size in the reported stack size. The problem is not in the size being 64KB specifically, but that we assume that the entire reported stack size is valid. Also, the problem appears only when there is an actual stack overflow. We will work to fix this. |
To be clear, the problem I discovered has different symptoms than what you are describing. The problem only appears when there is an actual stack overflow. Our stack checking fails to catch it, resulting in a sigsegv. Sure that is bad, but the app was already crashing. |
That's different from what I'm describing since Bionic adds space for the stack guard so there should be just as much space available with a larger stack guard and it shouldn't cause an overflow, Also, the stack checking is what's causing the abort we see rather than a stack overflow. |
@tmikov That's not what I was reporting here. I suggest not working around how Android reports the guard page as part of |
@thestinger I wrote a small test app, creating a thread with 64KB stack guard size. I tested it on MacOS, Linux and Android, all arm64. It worked normally on all platforms and did not trigger a stack overflow prematurely. I do not see how any guard size can trigger a premature stack overflow. The stack overflow check is simply a comparison of the current stack pointer with the bounds of the stack. The guard size is irrelevant in that calculation. I especially don't see how it can cause a premature SIGSEGV. I did however discover that, only on Android, stack overflow is not correctly detected with 64KB guard size, causing a crash instead of a controlled JS exception. This is explained fully by the difference of behavior of At the moment I am satisfied that I understand what I am seeing and I have a workaround for it. I am grateful that we caught this problem. Again, it only affects code that is experiencing an actual stack overflow, causing a SIGSEGV instead of gracefully throwing a JS exception. This is not great, but that code was already broken. If you are seeing different kinds of crashes, there must be a different explanation. Happy to work with you on it, but we need more information on order to diagnose it. Unfortunately for now we can't install GrapheneOS on our devices, so we will have to rely on you to run some experiments. If you can run https://gist.github.com/tmikov/fc3e91a16b9debb8877e4257a551fd08 in GrapheneOS, it will be interesting to observe the behavior in these two cases:
|
We know exactly why this happens but changing the behavior breaks compatibility with apps. In general, behavior like this could only be changed for a new target API level. Android itself has code in ART depending on it working the way that it does. It's because they set both the internal mmap base and stack base to the mapping base instead of offsetting the stack base. It can't be easily changed due to compatibility needing to be preserved. Could try to convince them to change it for a new target API level but they have bigger things to worry about. It shouldn't break many apps and the apps it breaks should be able to deal with it since they're messing with low level, subtle details. We cannot change this behavior though. You could change your code to work around this Bionic behavior but then your code will break if it's changed in Bionic. I would expect that to only happen with a new target API level but they could decide to do it with a new major yearly release for all apps instead. I'd expect too much breaks to do it without the new target API level.
It is quite possible the app we encountered depends on catching a stack overflow. Only 1 app has been reported as broken so far and it's likely much more would have broken if it caused problems outside of extreme edge cases. React Native is extremely popular but 1 user has reported 1 app stopped working after the app was updated, and it's an obscure app. |
@tmikov It does appear to be the case that this app triggers a stack overflow and normally recovers from it, but with the guard size raised to 64k it can't recover because the stack overflow check is misaligned with what Bionic actually does and allows it to hit the guard. |
It looks a lot like the developers of this app worked around a bug by catching an exception and simply not fixing it. That's why it's the only app which broke. Some of the stuff we find with our security hardening is truly strange. Still not as weird as how tons of banking apps use the 32-bit ARM syscall ABI perhaps by accident on pure 64-bit devices as part of their anti-tampering frameworks. |
Android's implementation does not appear to violate the POSIX requirements. It requires that the address range is readable and writable for setstack, not getstack. A way to work around this without hard-wiring a specific Bionic behavior would be to configure the thread with a custom mapping with a 64k guard on both ends for arm64 and 4k for 32-bit arm so that you control the layout and can make assumptions about it. I don't think there's any hope of convincing Google to change Bionic since what they're doing is not explicitly forbidden. |
Summary: Related to: facebook#1535 It turns out that on Android/Bionic `pthread_attr_getstack()` includes the stack guard pages in the returned size. Whether that is a bug or not is debatable, but it differs in behavior from Linux/glibc. Plus, returning a stack range that is not actually not all accessible is less than useful. In practice it worked with our approach because the default stack guard size is 4KB on Android, while our "native gap" is 64KB. Our gap is larger than the stack guard size, so we would still correctly detect and catch stack overflow. However, if the stack guard size is 64KB (or larger), we would not detect the overflow and crash. It is possible to manually set a larger guard size, plus security-oriented OS-es like GrapheneOS do it by default. The fix is to subtract the guard size from the available stack size when running on Bionic. Added tests checking the behavior in the main thread, in a default thread and in a thread with a 64KB guard. Two separate tests are performed under these circumstances: - Unbounded recursion checking for native stack overflow - Manually reading the available stack range to ensure it is all accessible. Differential Revision: D64308925
Fix in #1538 |
Summary: Pull Request resolved: #1538 Related to: #1535 It turns out that on Android/Bionic `pthread_attr_getstack()` includes the stack guard pages in the returned size. Whether that is a bug or not is debatable, but it differs in behavior from Linux/glibc. Plus, returning a stack range that is not actually not all accessible is less than useful. In practice it worked with our approach because the default stack guard size is 4KB on Android, while our "native gap" is 64KB. Our gap is larger than the stack guard size, so we would still correctly detect and catch stack overflow. However, if the stack guard size is 64KB (or larger), we would not detect the overflow and crash. It is possible to manually set a larger guard size, plus security-oriented OS-es like GrapheneOS do it by default. The fix is to subtract the guard size from the available stack size when running on Bionic. Added tests checking the behavior in the main thread, in a default thread and in a thread with a 64KB guard. Two separate tests are performed under these circumstances: - Unbounded recursion checking for native stack overflow - Manually reading the available stack range to ensure it is all accessible. Reviewed By: avp Differential Revision: D64308925 fbshipit-source-id: cdc5e8df6dc0aefd2735d67384d9a7728151d75c
Summary: Ported from Hermes D64308925. Related to: #1535 It turns out that on Android/Bionic `pthread_attr_getstack()` includes the stack guard pages in the returned size. Whether that is a bug or not is debatable, but it differs in behavior from Linux/glibc. Plus, returning a stack range that is not actually not all accessible is less than useful. In practice it worked with our approach because the default stack guard size is 4KB on Android, while our "native gap" is 64KB. Our gap is larger than the stack guard size, so we would still correctly detect and catch stack overflow. However, if the stack guard size is 64KB (or larger), we would not detect the overflow and crash. It is possible to manually set a larger guard size, plus security-oriented OS-es like GrapheneOS do it by default. The fix is to subtract the guard size from the available stack size when running on Bionic. Added tests checking the behavior in the main thread, in a default thread and in a thread with a 64KB guard. Two separate tests are performed under these circumstances: - Unbounded recursion checking for native stack overflow - Manually reading the available stack range to ensure it is all accessible. Reviewed By: avp Differential Revision: D64309284 fbshipit-source-id: 83b881fbb336cbab5bfde2520eae68d0ca4dacde
Bug Description
64-bit ARM requires 64k as the minimum guard size since it uses this as the default stack probe size to reduce the overhead of
-fstack-clash-protection
compared to a 4k size. GrapheneOS uses 64k as the default guard size and adjusts a guard size below 64k to 64k similarly to glibc. Hermes handles this incorrectly and crashes with a stack overflow error. This is causing apps which update to the new version to crash on GrapheneOS, reducing app compatibility. This is a severe issue and you're requiring that operating systems have a security vulnerability in order to be compatible with Hermes.We plan on reporting the lack of the required minimum guard size as a security vulnerability in Android, therefore it should get fixed in a monthly security update. Apps using React Native / Hermes would then break with the stock OS across devices if they update to this version. We'll be reported it soon and it usually takes them a minimum of 3 months to ship a patch, although probably longer for something like this. The issue we see is you don't control when apps update and they're currently slowly updating or starting out using a broken version which they may not upgrade from for years.
Steps To Reproduce
Run apps using Hermes on one of the supported GrapheneOS devices and notice the crash from the stack overflow detection having a false positive.
https://grapheneos.org/faq#recommended-devices
https://grapheneos.org/install/web
The Expected Behavior
Should take the guard size into account properly and avoid crashing. The actual guard size that's used needs to be required rather than assuming it isn't rounded up higher than 4k.
The text was updated successfully, but these errors were encountered: