Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s390x: Implement tls_value #4616

Merged
merged 1 commit into from
Aug 10, 2022
Merged

Conversation

uweigand
Copy link
Member

@uweigand uweigand commented Aug 4, 2022

Implement the tls_value for s390 in the ELF general-dynamic mode.

Notable differences to the x86_64 implementation are:

  • We use a __tls_get_offset libcall instead of __tls_get_addr.
  • The current thread pointer (stored in a pair of access registers)
    needs to be added to the result of __tls_get_offset.
  • __tls_get_offset has a variant ABI that requires the address of
    the GOT (global offset table) is passed in %r12.

This means we need a new libcall entries for __tls_get_offset.
In addition, we also need a way to access _GLOBAL_OFFSET_TABLE_.
The latter is a "magic" symbol with a well-known name defined
by the ABI and recognized by the linker. This patch introduces
a new ExternalName::KnownSymbol variant to support such names
(originally due to @afonso360).

We also need to emit a relocation on a symbol placed in a
constant pool, as well as an extra relocation on the call
to __tls_get_offset required for TLS linker optimization.

Needed by the cg_clif frontend.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:module labels Aug 4, 2022
@afonso360
Copy link
Contributor

#4546 introduces a KnownSymbol mechanism, which sounds like it would be useful here.

@uweigand
Copy link
Member Author

uweigand commented Aug 4, 2022

#4546 introduces a KnownSymbol mechanism, which sounds like it would be useful here.

Looking at this, I'm wondering why the whole mechanism of allowing names to be replaced externally is even necessary here. I can see where this is useful for libcalls, in case you want to substitute another library. But for something like _GLOBAL_OFFSET_TABLE_, that name is defined by the ABI and hard-coded into the linker, there is no way to override this anyway.

So maybe it would be simpler to just add a feature to allow the backend to simply hard-code a symbol name?

@afonso360
Copy link
Contributor

bjorn3 mentioned that we may need it in the JIT to perform TLS lookups

@uweigand
Copy link
Member Author

uweigand commented Aug 5, 2022

bjorn3 mentioned that we may need it in the JIT to perform TLS lookups

I'm still not sure I see it - at least in the case of _GLOBAL_OFFSET_TABLE_, this is not a "real" symbol either way. When creating an object file used by the system linker, the name has to be exactly this string as specified by the ABI, otherwise the required special processing will not trigger in the linker. If we re-implement linker functionality in the JIT, the JIT will simply have to perform that special processing itself, which should not involve any symbol name at all. There is no scenario I can envision where it makes sense to replace a reference to _GLOBAL_OFFSET_TABLE_ with a reference to some symbol with another name.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 5, 2022

In the JIT no object files are created and it is not possible to use the libc tls implementation. My idea has been to intercept the libc function call to get the tls address and implement it in the jit, entirely avoiding libc in the process (except for a single native tls access to get a list of all jit tls values for the current thread)

the JIT will simply have to perform that special processing itself, which should not involve any symbol name at all.

Unless the jit consumer does the implementation.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 5, 2022

Reading the pr description it doesn't seem like the way I was planning to implement it on x86_64 will work on s390x, as on s390x the generated code already accesses the tls register, while on x86_64 this is left for the libcall that can be replaced.

@afonso360
Copy link
Contributor

afonso360 commented Aug 5, 2022

Reading the pr description it doesn't seem like the way I was planning to implement it on x86_64 will work on s390x, as on s390x the generated code already accesses the tls register, while on x86_64 this is left for the libcall that can be replaced.

I think this also applies to the COFF TLS model, we never perform any libcall's, its all relocations. And if we want to implement it in the JIT we are also going to need to handle that separately in the JIT linker right?

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 5, 2022

I think this also applies to the COFF TLS model, we never perform any libcall's, its all relocations.

Even on Windows it would be possible to use ELF tls in the JIT.

And if we want to implement it in the JIT we are also going to need to handle that separately in the JIT linker right?

At least a prototype can be done outside of the jit in cg_clif itself.

@uweigand
Copy link
Member Author

uweigand commented Aug 5, 2022

Reading the pr description it doesn't seem like the way I was planning to implement it on x86_64 will work on s390x, as on s390x the generated code already accesses the tls register, while on x86_64 this is left for the libcall that can be replaced.

So the way GD TLS works on s390x is that you 1) have a constant pool entry that contains a relocation which the linker will resolve to the GOT offset of a pair of GOT slots holding the TLS module index and offset for the target symbol; 2) load this GOT offset and pass it to __tls_get_offset, which will load the module index and offset from the GOT, and return the TLS section offset of the requested symbol, and 3) add the value of the current thread pointer to that TLS offset.

There is a linker optimization in play, for the case where you don't need the full general-dynamic support, that will do two things: 1) change the relocation on the constant pool entry to directly place the TLS section offset of the target symbol there (instead of the GOT offset of the module/offset pair), and replace the call to __tls_get_offset with a no-op. This will lead the generated code to effectively, instead, load the TLS section offset from the constant pool and add the current thread pointer, with no libcall involved.

W.r.t. implementing TLS in the JIT, I guess that depends to a large extent on what exactly should be supported. But the fact that we add the thread pointer in itself doesn't change much - if nothing better is possible, you can always implement __tls_get_offset in terms of your __tls_get_addr implementation and just subtract the current thread pointer before returning.

This will require overloading __tls_get_offset (and __tls_get_addr) - but those are (and should remain) normal libcalls that can already be overloaded by the JIT.

This discussion is about _GLOBAL_OFFSET_TABLE_, which is needed here as parameter to __tls_get_offset. (Of course there can also be other uses of _GLOBAL_OFFSET_TABLE_ unrelated to TLS.) This has to be handled by the JIT itself, because whatever value is used as _GLOBAL_OFFSET_TABLE_ must match the GOT start that is also used to resolve GOT-relative relocations. I don't see any way to implement _GLOBAL_OFFSET_TABLE_ by just redirecting to some other symbol provided outside the JIT.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 5, 2022

This discussion is about GLOBAL_OFFSET_TABLE, which is needed here as parameter to __tls_get_offset. (Of course there can also be other uses of GLOBAL_OFFSET_TABLE unrelated to TLS.) This has to be handled by the JIT itself, because whatever value is used as GLOBAL_OFFSET_TABLE must match the GOT start that is also used to resolve GOT-relative relocations. I don't see any way to implement GLOBAL_OFFSET_TABLE by just redirecting to some other symbol provided outside the JIT.

For the JIT case it would probably have a value of 0, or whatever arbitrary value the JIT wants as it isn't actually necessary there.

@uweigand
Copy link
Member Author

uweigand commented Aug 5, 2022

For the JIT case it would probably have a value of 0, or whatever arbitrary value the JIT wants as it isn't actually necessary there.

Well, the TLS relocation resolves to an offset relative to the GOT, so the value GLOBAL_OFFSET_TABLE must match how these offsets were computed. It's probably true you can choose any arbitrary value in the JIT, including zero, as long as it's used consistently.

However, this just reinforces my point that the JIT will resolve this, even if to always 0. It will never be replaced by some other symbol, therefore having a replaceable name is not useful.

@uweigand
Copy link
Member Author

uweigand commented Aug 5, 2022

I've now updated the patch to include the KnownSymbol mechanism by @afonso360. As discussed above, I've not included any "lookup" function to provide names from the outside for the well-known symbols. Instead, they only have internal names for display/debug purposes (just like libcalls), and the well-known external name is hard-coded in the object backend. (The JIT backend wouldn't need any symbol name, it can just implement the semantics directly.)

@bjorn3 does this look reasonable to you?

Implement the tls_value for s390 in the ELF general-dynamic mode.

Notable differences to the x86_64 implementation are:
- We use a __tls_get_offset libcall instead of __tls_get_addr.
- The current thread pointer (stored in a pair of access registers)
  needs to be added to the result of __tls_get_offset.
- __tls_get_offset has a variant ABI that requires the address of
  the GOT (global offset table) is passed in %r12.

This means we need a new libcall entries for __tls_get_offset.
In addition, we also need a way to access _GLOBAL_OFFSET_TABLE_.
The latter is a "magic" symbol with a well-known name defined
by the ABI and recognized by the linker.  This patch introduces
a new ExternalName::KnownSymbol variant to support such names
(originally due to @afonso360).

We also need to emit a relocation on a symbol placed in a
constant pool, as well as an extra relocation on the call
to __tls_get_offset required for TLS linker optimization.

Needed by the cg_clif frontend.
@uweigand
Copy link
Member Author

uweigand commented Aug 9, 2022

Hi @cfallin @bjorn3 @afonso360 as far as I can see, this version no longer has any open issues and should be good to go. (It will conflict with #4546 as it pulls it parts of that change, but that should be easy to resolve once it's in.) Is this now OK, or are there any further changes needed?

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me; I'm not familiar with the s390x-specific conventions here but the KnownSymbol abstraction is reasonable and otherwise everything looks like I would expect. Thanks!

@cfallin
Copy link
Member

cfallin commented Aug 9, 2022

(It will conflict with #4546 as it pulls it parts of that change, but that should be easy to resolve once it's in.)

@uweigand it's not totally clear to me, do you mean you want to merge this PR first and then resolve the conflicts in #4546, or wait for #4546 then resolve conflicts here?

@uweigand
Copy link
Member Author

(It will conflict with #4546 as it pulls it parts of that change, but that should be easy to resolve once it's in.)

@uweigand it's not totally clear to me, do you mean you want to merge this PR first and then resolve the conflicts in #4546, or wait for #4546 then resolve conflicts here?

I think we should merge this PR first. I copied the implementation of KnownSymbol initially from #4546, but then added a few changes to address issues that @bjorn3 had pointed out. So the implementation here should be the correct one.

@cfallin cfallin merged commit 50fcab2 into bytecodealliance:main Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:module cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants