Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix symbol trampolines for OSX and export jl_n_threads #38938

Merged
merged 10 commits into from
Dec 24, 2020

Conversation

imciner2
Copy link
Contributor

The symbol trampoline for OSX wasn't prepending the underscore to the name properly when defining the trampoline function, so it wasn't appearing in a form that it could be linked against.

I am marking as a draft because I think there may be an issue with the libjulia public library on 32-bit Windows as well where there is a mismatch between underscore prepending and no underscore (at least by my reading of the trampoline file for i686). I don't have a 32-bit Windows available to actually test this on though, so some help there would be appreciated.

Also expose jl_n_threads as a public data object in the library since it is exposed as JL_DLLEXPORT in the main code.

Fixes #38925

@fingolfin
Copy link
Member

I wonder if somebody who as access to an Apple M1 Mac could double check there that the exports are right?

@vchuravy vchuravy requested a review from staticfloat December 18, 2020 15:45
It turns out that aarch64 assembly syntax uses `;` as the comment
character, not as the statement separator.  So we need to polyfill
that.  We also set an alignment of 4 bytes (`2^2`) which is required on
Apple targets, but is also a good idea on most other aarch64 machines.
Although we were dispatching to the right symbol, the name of the
re-exported symbol was lacking its leading underscore.
@staticfloat
Copy link
Member

Ah, thanks for this @imciner2! Good catch.

I'm going to push a commit to this branch that fixes some problems with the aarch64-apple-darwin build and verifies that these symbols have the proper names there as well.

I'm also going to push a commit to fix the incorrectly-exported name on win32. Thanks for your attention to detail; please do open any further issues you find!

@imciner2 imciner2 marked this pull request as ready for review December 19, 2020 16:20
Without this, it was defined in both `libjulia` and
`libjulia-internal`, causing `libjulia-internal`'s updates to not be
visible.
@staticfloat
Copy link
Member

staticfloat commented Dec 19, 2020

Okay, so I took a look into the x86_64-apple-darwin error; turns out it's because when we export a data symbol from libjulia-internal, we actually have to define it in libjulia (which you did here) and then not define it in libjulia-internal but rather import it from libjulia, so that the storage is shared. Without that, when libjulia-internal does something like jl_n_threads = 1, it stores it in its own local copy and the libjulia version stays at zero, which is what's happening on the mac buildbot here, erroring out with a thread bounds check (because it thinks it has zero threads, instead of 1).

@staticfloat staticfloat reopened this Dec 19, 2020
@staticfloat
Copy link
Member

There's still an error on aarch64 linux; something is going wrong with the trampolines, but that shouldn't stop us from merging this. I will work through the aarch64-linux issues with Jameson or Keno next week. This should get merged once it goes green on all but aarch64-linux.

@KristofferC KristofferC added the backport 1.6 Change should be backported to release-1.6 label Dec 19, 2020
@KristofferC KristofferC mentioned this pull request Dec 19, 2020
26 tasks
@imciner2
Copy link
Contributor Author

Thanks for looking into the macOS CI failure and fixing that. The need to only have the definition inside the public library and reference that from the internal library should probably be documented somewhere so that it isn't forgotten.

@imciner2
Copy link
Contributor Author

Hmm, looks like the Windows CI is now failing because of a definition error with jl_n_threads.

JL_DLLEXPORT should be used because it will automatically switch
between importing and exporting the symbol as needed.
@imciner2
Copy link
Contributor Author

Ok, I think I have fixed the Windows symbol issue - I believe it needs to be defined as JL_DLLEXPORT instead of JL_DLLIMPORT so that it has proper visibility when needed (that macro will automatically switch between importing and exporting the symbol). That is also the macro that other exported variables such as jl_top_module use.

@imciner2
Copy link
Contributor Author

@staticfloat I was just looking at the symbol exports, and I think there is a problem with trying to export jl_n_threads as it is currently implemented. The definitions given inside jl_exports.h are for a const void*, but jl_n_threads needs to be an int and is not an actual pointer value. Right now it is abusing the pointer as an int, but that could lead to issues if the size of a pointer isn't the same size as an int on platforms. The other exported data all appear to be pointers anyway, so they probably won't have this issue but the jl_n_threads will.

I have an idea of how to fix this, so I will try prototyping that on Sunday (it is late my in time currently).

@staticfloat
Copy link
Member

Yep, you're right, it should be JL_DLLEXPORT which gets switched accordingly, thanks!

Right now it is abusing the pointer as an int, but that could lead to issues if the size of a pointer isn't the same size as an int on platforms.

This should only be a problem if a pointer is smaller than an int, which is never the case. What solution were you thinking of?

Before it was always creating the variables as a pointer
even if they are being referenced as a differnt type in the
internal library.
The declaration has been moved to julia.h, which is already included
in this file.
@imciner2
Copy link
Contributor Author

What solution were you thinking of?

I've just pushed the changes I had in mind. Basically it is just another list of data that also has a type associated with it. While it may be safe right now to abuse a pointer that way, I would say we shouldn't count on it. The fix was fairly simple anyway.

Yep, you're right, it should be JL_DLLEXPORT which gets switched accordingly, thanks!

As for the JL_DLLEXPORT, I actually did some reading on the Windows symbol visibility and apparently putting extern _declspec(dllexport) actually creates a declaration anyway (equivalent to just _declspec(dllimport) apparently) - so 🤷. I also did some other searching and other types such as jl_any_type are defined extern JL_DLLIMPORT, so that wasn't the cause of the windows buildbot failures I think. I don't have a Windows setup for compiling Julia currently, so I want to see what the buildbots do with this most recent change.

@staticfloat
Copy link
Member

@vtjnash Just pinging you to take a quick look at this; I'm trying to remember what we decided on data symbols here. Now that I think harder, I think maybe we honestly do want JL_DLLIMPORT because there is no time where julia.h is included that we're actually exporting the value; since they live in libjulia-internal.

I'm also not entirely sure what we want to do about non-pointer-types being exported. The last time we ran into this (jl_world_counter) you preferred to just remove it from the exports.

@vtjnash
Copy link
Member

vtjnash commented Dec 21, 2020

I think we just said there wasn't a case where it had been needed before now

@staticfloat
Copy link
Member

Alright, Jameson and I like this, so let's just provide the type for all variables; for the others right now you can specify void * as we currently do, since we don't have the proper types available at compile time here.

I will be looking into the aarch64 failure today, along with the windows failure.

@fingolfin
Copy link
Member

Thank you for working on this.

Is there a good rule which functions and variables marked with JL_DLLEXPORT are actually re-exported, and which are not? I just hacked up a quick&dirty list to compare the list of such functions against jl_exported_funcs.inc and there are lot "missing" from the latter (perhaps intentionally?), e.g. jl_get_root_task, jl_module_import_as, jl_module_use_as, ...

On Windows, we have a special list of libraries that we search for
default symbol resolution.  Now that we have symbols defined in
`libjulia` and then imported by `libjulia-internal`, we need to make
certain that we search `libjulia-internal` first (so that we find
non-trampoline functions first) and then `libjulia` (so that we do in
fact eventually find the symbols at all).
@staticfloat
Copy link
Member

@vtjnash could I get a quick look over 9489e64 it seems to be correct, but it would be good to get your eyes on it too.

@@ -1,12 +1,27 @@
#include "../../src/jl_exported_funcs.inc"

// On macOS, we need to prepend underscores on symbols
#if defined(__APPLE__) && defined(__MACH__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When were you running 9.x Classic MacOS userpace, and not using the mach kernel???

@vtjnash vtjnash merged commit 48c4283 into JuliaLang:master Dec 24, 2020
@KristofferC KristofferC removed the backport 1.6 Change should be backported to release-1.6 label Jan 6, 2021
@fingolfin
Copy link
Member

@KristofferC the backport1.6 label was removed, but this PR does not seem to be completely backported; indeed, the most important commit (from my POV) in this PR here was c33bd77 which fixes issue #38925 but the error ("Symbol not found: _jl_cstr_to_string") still occurs in the release-1.6 branch; and indeed as far as I can tell, that commit is neither backported in the release-1.6 branch, nor in PR #39160. As far a I can tell, that's the only missing commit from this PR, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JLLs built against libjulia_jll 1.5 are unusable in Julia 1.6 on macOS (breaks CxxWrap)
5 participants