Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing structs by value on the stack is incorrect on arm64 #1259

Closed
rfk opened this issue Oct 26, 2020 · 13 comments
Closed

Passing structs by value on the stack is incorrect on arm64 #1259

rfk opened this issue Oct 26, 2020 · 13 comments

Comments

@rfk
Copy link

rfk commented Oct 26, 2020

The version of libffi bundled with JNA appears to incorrectly pass structs on the stack on arm64 devices.

Consider a function that takes struct arguments by value, like this example which just returns a field from its ninth struct argument:

typedef struct WrappedLong {
    int64_t inner;
} WrappedLong;

int64_t return_the_ninth_argument(WrappedLong arg1, WrappedLong arg2, WrappedLong arg3, WrappedLong arg4, WrappedLong arg5, WrappedLong arg6, WrappedLong arg7, WrappedLong arg8, WrappedLong arg9) {
  return arg9.inner;
}

When I try to call this function via JNA on an arm64 device, is returns a large and random-looking integer rather than the value passed in the ninth argument. I've published a minimal example to show how I'm calling this code via JNA, since the declarations seem like they would be a bit unwieldy copy-pasted into a github comment:

The linked file is part of a tiny Android application that I have been using to investigate this issue and which can be used to reproduce it in full: rfk/arm64-jna-structbug.

Running this example app works as expected on an x86_64 device, and it works as expected on an arm64 device if I modify it to return e.g. the eight argument rather than the ninth.

I believe this to be a bug in the libffi bundled with JNA:

  • The first 8 arguments to this function are passed in registers, but subsequent arguments are passed on the stack.
  • The code for passing small structs on the stack appears to copy a pointer to the argument value rather than copying the argument value itself.

If I change the linked line of code to memcpy from ecif->avalue[i] instead of ecif->avalue + i then my example app seems to work as expected, correctly returning the value passed as the ninth argument.

I took a quick look in the equivalent code in upstream libffi and the issue seems to be fixed there, although as far as I can tell it was fixed as a side-effect of this significant refactor/cleanup, so I'm not sure if it was ever reported upstream as a bug.

Since upstream libffi seems to have diverged quite a bit from the version in this repo, I wanted to ask for advice before proceeding any further here. Would this be best approached by trying to update the bundled version of libffi, or as a standalone fix?

For completeness: I reproduced this issue on a Samsung Galaxy S8 running Android 9, which is an arm64-v8a device, using both the released JNA 5.6.0 and a local build of the latest JNA master from github.

@matthiasblaesing
Copy link
Member

Thank you for your analysis. To address the core question:

Since upstream libffi seems to have diverged quite a bit from the version in this repo, I wanted to ask for advice before proceeding any further here. Would this be best approached by trying to update the bundled version of libffi, or as a standalone fix?

I suspect, that with windows entering new territory (arm64) and apple also jumping that wagon, updates in libffi are to be expected (at least in the build system part), which we will also need. I don't see benefit in doing/repeating FFI upstream work here in the repository. What is more, JNA can already be build against a system installed libffi, so I'd consider updating the bundled version to be save.

There are instructions for updating the bundled libffi in native/README.libffi. This reads as updates are expected.

@twall anything from your point, that would prevent updating to a new upstream version of libff?

@twall
Copy link
Contributor

twall commented Nov 1, 2020 via email

@twall
Copy link
Contributor

twall commented Nov 1, 2020 via email

@matthiasblaesing
Copy link
Member

I traced through the updates of native libffi and found these commits, that touched the imported code and thus would need to be checked:

The code at 951de1a is mostly in sync with 3.2.1, around that commit I found these:

  • a3afbfe Revert incorrect bitwise size check; sizes can not be guaranteed to be aligned or padded, so check small struct sizes explicitly
  • 2577b95 fix mingw struct return when using SYSV ABI, fix msvcc stdcall closures
  • 5ce1c84 fix return type jump tables to accommodate addition of FFI_TYPE_COMPLEX
  • 52eed7b Fix windows builds under MSVC, 32- and 64-bit
  • 4e8c935 Apply input from @joshtriplett on more general stdcall handling
  • 4836f05 don't align arguments in stdcall
  • ec16947 update w32ce-arm build, native
  • 1f1972b remove msvc90 dependency, spurious '-g' flags in libffi, fix compiler warnings against dlmalloc.c

A diff that could only be traced into the 3.2.1 merge was:

diff -urb ../../libffi/src/x86/ffi.c libffi/src/x86/ffi.c
--- ../../libffi/src/x86/ffi.c	2020-11-02 18:35:07.606547408 +0100
+++ libffi/src/x86/ffi.c	2020-11-02 18:42:52.212778456 +0100
@@ -329,6 +329,7 @@
 #ifndef X86_WIN64
   if (cif->abi == FFI_SYSV || cif->abi == FFI_UNIX64)
 #endif
+  if (cif->abi != FFI_STDCALL)
     cif->bytes = (cif->bytes + 15) & ~0xF;
 #endif

@twall
Copy link
Contributor

twall commented Nov 2, 2020 via email

@matthiasblaesing
Copy link
Member

You might be able to set libffi repo as an additional remote and merge libffi updates locally. I recall I had a git subtree setup so that I could do just that.

It is even documented: native/README.libffi. I would use the upstream repository of libffi as the base, but otherwise this might work. The big question in the end is: Is it easier to check if all JNA patches have been integrated upstream and switch to vanialla libffi or is it easier to solve merge conflicts.

@twall
Copy link
Contributor

twall commented Nov 4, 2020 via email

@dkocher
Copy link
Contributor

dkocher commented Nov 6, 2020

Relates to #1238.

@tresf
Copy link
Contributor

tresf commented Nov 12, 2020

can we build from the embedded libffi source/can we update it?

Edit: Sorry, I had missed the native/README.libffi which has explicit instructions for bumping the subtree. According to conversation in #1259, there's divergence issues so I'm still a bit confused as to how best to tackle this.

Originally posted by @tresf in #1238 (comment)

I doubt you'll get many merge conflicts unless a module were completely rewritten, and even then you've got the few commits to isolate the actual changes to be (re-)applied.

I tried this. I added upstream libffi/libffi's master, did the pull with squash per README and I received > 90 conflicts. Am I doing something wrong, or should I revert back to the brute-force replacement used in #1264 while making sure to cherry-pick the commits identified atop our directory?

I'm also willing to fork, point, and submodule, I have a bit of experience with this, but it doesn't come without consequences.

@twall
Copy link
Contributor

twall commented Nov 12, 2020 via email

@tresf
Copy link
Contributor

tresf commented Nov 12, 2020

I started evaluating the commits linked above but it appears libffi has changed too much for nearly all commits listed, so my evaluation isn't very comprehensive.

Click to expand
Commit Description Status
a3afbfe Revert incorrect bitwise size check; sizes can not be guaranteed to be aligned or padded, so check small struct sizes explicitly ❓: x86/ffi.c has completely changed.
2577b95 fix mingw struct return when using SYSV ABI, fix msvcc stdcall closures src/x86/win32.S is missing.
5ce1c84 fix return type jump tables to accommodate addition of FFI_TYPE_COMPLEX src/x86/win32.S is missing, src/x86/win64.S doesn't match.
52eed7b Fix windows builds under MSVC, 32- and 64-bit ffitarget.h has the patched. Most other files have completely changed.
4e8c935 Apply input from @joshtriplett on more general stdcall handling ❓: x86/ffi.c has completely changed although similar 64-bit guards seem to be present throughout.
4836f05 don't align arguments in stdcall x86/ffi.c has completely changed. The X86_WIN32| FFI_STDCALL guard seems to be missing, but it may be done elsewhere.
ec16947 update w32ce-arm build, native ❓ Files have completely changed.
1f1972b remove msvc90 dependency, spurious '-g' flags in libffi, fix compiler warnings against dlmalloc.c ❓ Makefile changes seem OK. dlmalloc.c fixes can't easily be determined.

A diff that could only be traced into the 3.2.1 merge was: (x86/ffi.c's [X86_WIN32| FFI_STDCALL] from 4836f05.)

I noticed this too however the code has changed so much, we'd need someone much more familiar with it to know if the patch is needed or exists elsewhere. Here's what it looks like upstream...

for (i = 0, n = cif->nargs; i < n; i++)
{
ffi_type *t = cif->arg_types[i];
bytes = FFI_ALIGN (bytes, t->alignment);
bytes += FFI_ALIGN (t->size, FFI_SIZEOF_ARG);
}
cif->bytes = bytes;

can you generalize at all about the nature of the conflicts? that's way more than I would have expected.

Hard to put into words, but a lot goes into conflict, so perhaps I'm doing it wrong? Even the .gitignore and .travis.yml files went into conflict, which I wouldn't expect us to touch: https://gist.github.com/tresf/fb4034ccc75cb55eb23dd43a3cea3cb1

@rfk
Copy link
Author

rfk commented Jul 20, 2021

I saw that the JNA 5.7 release includes an update the libffi, so I just tried it with my testcase and am no longer able to reproduce the bug. Thanks folks! I think this can be safely closed.

@dbwiddis
Copy link
Contributor

Thanks for the update! Closing per your report.

badboy added a commit to badboy/uniffi-rs that referenced this issue Aug 3, 2021
This (partially) reverts commit 937f9b7

The upstream JNA bug[1] has been resolved.
JNA 5.7 includes the fix.

We're upgrading our Docker test image to 5.8 already.

[1]: java-native-access/jna#1259
badboy added a commit to badboy/uniffi-rs that referenced this issue Aug 3, 2021
This (partially) reverts commit 937f9b7

The upstream JNA bug[1] has been resolved.
JNA 5.7 includes the fix.

We're upgrading our Docker test image to 5.8 already.

Fixes mozilla#334

[1]: java-native-access/jna#1259
badboy added a commit to badboy/uniffi-rs that referenced this issue Aug 3, 2021
This (partially) reverts commit 937f9b7

The upstream JNA bug[1] has been resolved.
JNA 5.7 includes the fix.

We're upgrading our Docker test image to 5.8 already.

Fixes mozilla#334

[1]: java-native-access/jna#1259
badboy added a commit to badboy/uniffi-rs that referenced this issue Aug 4, 2021
This (partially) reverts commit 937f9b7

The upstream JNA bug[1] has been resolved.
JNA 5.7 includes the fix.

We're upgrading our Docker test image to 5.8 already.

Fixes mozilla#334

[1]: java-native-access/jna#1259
rfk pushed a commit to mozilla/uniffi-rs that referenced this issue Aug 5, 2021
This (partially) reverts commit 937f9b7

The upstream JNA bug[1] has been resolved.
JNA 5.7 includes the fix.

We're upgrading our Docker test image to 5.8 already.

Fixes #334

[1]: java-native-access/jna#1259
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants