Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi_spec fails with --release #12592

Closed
GeopJr opened this issue Oct 11, 2022 · 19 comments · Fixed by #12601
Closed

ffi_spec fails with --release #12592

GeopJr opened this issue Oct 11, 2022 · 19 comments · Fixed by #12601
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:specs topic:compiler:interpreter

Comments

@GeopJr
Copy link
Contributor

GeopJr commented Oct 11, 2022

Bug Report

Running:

make all
make compiler_spec

works as expected and spec passes.

Running:

make all release=1
make compiler_spec release=1

fails with:

Failures:

  1) Crystal::FFI::CallInterface .new sum struct
     Failure/Error: pointer_value.should eq 50

       Expected: 50
            got: 0

     # spec/compiler/ffi/ffi_spec.cr:166

Finished in 16:17 minutes
10780 examples, 1 failures, 0 errors, 15 pending

Failed examples:

crystal spec spec/compiler/ffi/ffi_spec.cr:132 # Crystal::FFI::CallInterface .new sum struct
make: *** [Makefile:92: compiler_spec] Error 1

Crystal:

./bin/crystal -v
Using compiled compiler at .build/crystal
Crystal 1.6.1-dev [809c49e] (2022-10-11)

LLVM: 14.0.6
Default target: x86_64-unknown-linux-gnu

This affects the Nix builds: NixOS/nixpkgs#173928
and possibly the solus ones: https://dev.getsol.us/source/crystal/browse/master/package.yml$36 (they remove spec/compiler/ffi/ffi_spec.cr)

@GeopJr GeopJr added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label Oct 11, 2022
@straight-shoota
Copy link
Member

Oh, that's an interesting one. Rare to catch because specs usually don't run in release mode ;)

A bit simplified reproduction step:

bin/crystal spec spec/compiler/ffi/ffi_spec.cr --release

And I don't think the compiler itself needs to be built in release mode. (though it might be relevant which LLVM version it uses)

I have no idea what this might be about. But considering that the Crystal compiler doesn't do anything special in release mode, except for telling LLVM to optimize the generated code. This suggests it might be an LLVM bug.

@asterite
Copy link
Member

asterite commented Oct 12, 2022

I just tried a program that uses ffi with the same shape, so that it compiles faster (I don't know why that ffi_spec is so slow to compile, it probably requires unneeded files) and compiling in release mode works the same way (it works fine for me.)

@asterite
Copy link
Member

Ah, yes, it seems ffi_spec requires the compiler and all of that... is there a reason for that? I think it's just to get a temp path, but I'm sure we could refactor it so that it requires less things.

@asterite
Copy link
Member

Also for me on Mac it works fine compiled in release mode. So maybe it's a problem with how the FFI bindings are defined for linux.

@straight-shoota
Copy link
Member

I don't know why that ffi_spec is so slow to compile, it probably requires unneeded files

Yes, I noticed that too. In theory it shouldn't use most of the compiler code, so that shouldn't really matter for compilation complexity. 🤔

@asterite
Copy link
Member

In theory, yes. In practice, instance variable initializers are always type-checked, even if the type doesn't end up being used, and that might end up requiring the entire compiler's source code.

Definitely something to improve... though I have a feeling that that would be impossible to change at this point.

@dmgk
Copy link
Contributor

dmgk commented Oct 12, 2022

FWIW, I'm getting the same spec failure on FreeBSD (llvm 14.0.6):

$ LLVM_CONFIG=/usr/local/bin/llvm-config14 gmake release=1 progress=1 stats=1 compiler_spec
Failures:

  1) Crystal::FFI::CallInterface .new sum struct
     Failure/Error: pointer_value.should eq 50

       Expected: 50
            got: 0

     # spec/compiler/ffi/ffi_spec.cr:166

@straight-shoota
Copy link
Member

I've managed to reduce the problem to the following pure Crystal program. It works without --release, but in release mode, pointer_value is 0.
There's no C library involved, I'm just using libdl (via Loader) to

require "compiler/crystal/ffi"

fun set_pointer(p : Pointer(Void)) : Int32
  p.as(Int32*).value = 123
  42
end

call_interface = Crystal::FFI::CallInterface.new Crystal::FFI::Type.sint32, [
  Crystal::FFI::Type.pointer,
] of Crystal::FFI::Type

# Locate function pointer to set_pointer in the current program
handle = LibC.dlopen(nil, LibC::RTLD_LAZY | LibC::RTLD_GLOBAL)
function_pointer = LibC.dlsym(handle, "set_pointer")

pointer_value = 11_i32
arg_pointers = Slice[
  Pointer(Int32*).malloc(1, pointerof(pointer_value)).as(Void*),
  Pointer(Void).null,
]

return_value = 0i32
call_interface.call(function_pointer, arg_pointers.to_unsafe, pointerof(return_value).as(Void*))
p! pointer_value
p! return_value

There seems to be some connection to accessing the return value. Even just commenting out the last line (p! return_value) results in a correct value for pointer_value.

@asterite
Copy link
Member

Bold guess: you are not supposed to pass pointerof(...) something to FFI. You should always pass heap memory. It works if you change it to this:

pointer_value = Pointer(Int32).malloc(1, 11_i32)
arg_pointers = Slice[
  Pointer(Int32*).malloc(1, pointer_value).as(Void*),
  Pointer(Void).null,
]

return_value = 0i32
call_interface.call(function_pointer, arg_pointers.to_unsafe, pointerof(return_value).as(Void*))
p! pointer_value.value

@asterite
Copy link
Member

Another test someone could do is do the same program in C and see if it works.

@straight-shoota
Copy link
Member

I came up with the following C program:

// $ clang .test/ffi.c -O3 -lffi
#include <stdio.h>
#include <stdlib.h>
#include <ffi.h>

int set_pointer(int* p){
  *p = 123;
  return 42;
}

int main(){
  ffi_cif cif;
  ffi_type *args[1];
  void *values[1];
  ffi_arg rc;

  int value = 11;
  int** pointer = malloc(1);
  *pointer = &value;

  /* Initialize the argument info vectors */
  args[0] = &ffi_type_pointer;
  values[0] = pointer;

  /* Initialize the cif */
  ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1, &ffi_type_sint, args);

  ffi_call(&cif, FFI_FN(set_pointer), &rc, values);
  printf("%d\n", value);
  printf("%ld\n", rc);

  return 0;
}

It does not fail in release mode (O3).
I think it should be roughly equivalent, but I'm really not good at C, so I have probably overlooked some things.

@asterite
Copy link
Member

The equivalent Crystal program of that C program is using StaticArray. And if you do that, it works. So there's something fishy about using a Slice.

@straight-shoota
Copy link
Member

Hm, that's strange because the original code actually uses StaticArray. I think I have changed that somewhere in between and at that point the error reproduced in both variants. Now it doesn't.

@asterite
Copy link
Member

I can't get the C program to work locally (setup issues), but what happens if you use malloc for the values, instead of a "static array" in C? Does it also work well?

@asterite
Copy link
Member

Needless to say, I have no idea why it doesn't work with heap memory.

@straight-shoota
Copy link
Member

It works well with void **values = (void**) malloc(1);.

@asterite
Copy link
Member

So strange! The next thing I did was using LibC.malloc on Crystal's side, but it fails too in release mode 🤔

@HertzDevil
Copy link
Contributor

HertzDevil commented Oct 12, 2022

What I could gather on aarch64-apple-darwin is that in release mode pointer_value is right next to return_value on the stack, and the return value is done as a 64-bit store, e.g.:

pointerof(pointer_value) # => 0x16b683184
pointerof(return_value)  # => 0x16b683180

# FFI does the equivalent of the following:
pointerof(return_value).as(Int64*).value = Int64.new!(set_pointer(...))

This is documented behavior:

In most situations, ‘libffi’ will handle promotion according to the ABI. However, for historical reasons, there is a special case with return values that must be handled by your code. In particular, for integral (not struct) types that are narrower than the system register size, the return value will be widened by ‘libffi’. ‘libffi’ provides a type, ffi_arg, that can be used as the return type. For example, if the CIF was defined with a return type of char, ‘libffi’ will try to store a full ffi_arg into the return value.

The reason we don't see this on non-release mode is because there is extra stack space between the two variables. On the other hand the CIF call will still corrupt something else on the stack. This C program above works because the return value is properly declared as an ffi_arg.

ffi_arg is platform-specific, for example defined here for x86. If we want to test Crystal::FFI::CallInterface directly, then the return_values must all use that type (or its signed counterpart ffi_sarg) regardless of what spec/compiler/data/ffi/sum.c uses as C return types.

I believe the interpreter is unaffected since the return value is always placed on the top of the interpreter stack, according to Crystal::Repl::Interpreter.lib_call.

@straight-shoota
Copy link
Member

straight-shoota commented Oct 12, 2022

Oh, great find. I know I've read that before and I've had it somewhere in my head, but couldn't connect the dots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:specs topic:compiler:interpreter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants