Poor optimization of thread local globals on OSX #41067

Open
llvmbot opened this issue May 3, 2019 · 0 comments
Labels
bugzilla Issues migrated from bugzilla

llvmbot commented May 3, 2019

Bugzilla Link: 41722
Version: 8.0
OS: MacOS X
Reporter: LLVM Bugzilla Contributor
CC: @TNorthover

Extended Description

Multiple calls to tlv_get_addr are often generated for a single use of a thread-local variable on OSX. This issue was discovered by looking at the assembly generated by rustc, and is discussed in more detail here:

rust-lang/rust#60341 (comment)

I know very little about LLVM, so hopefully this all makes sense. The linked IR [1] demonstrates the issue. The optimizer often emits IR that references thread_local globals multiple times, even though the unoptimized IR references them only once. This is often associated with getelementptr.

In the final assembly, the tlv_get_addr dance is then done twice:

movq _foo@TLVP(%rip), %rdi
callq *(%rdi)

For larger structures with multiple members, the problem gets worse, resulting in many redundant calls to tlv_get_addr. In contrast, when targeting Linux, __tls_get_addr@PLT is only invoked once.
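As a rough illustration of the "larger structure with multiple members" case, here is a minimal Rust sketch. It is not the code from the linked discussion: the `Counters` struct, the `COUNTERS` static, and the `record` function are hypothetical names. Whether this exact snippet reproduces the redundant tlv_get_addr calls depends on the rustc/LLVM version and optimization level, so treat that as an assumption to check in the generated assembly.

```rust
use std::cell::Cell;

// Hypothetical thread-local structure with several members.
struct Counters {
    hits: Cell<u64>,
    misses: Cell<u64>,
}

thread_local! {
    // One thread-local global; each field access is an offset from its base address.
    static COUNTERS: Counters = Counters {
        hits: Cell::new(0),
        misses: Cell::new(0),
    };
}

// Touches several fields of the same thread-local in one function.
// Ideally the base address is looked up once and reused for every field.
fn record(hit: bool) -> (u64, u64) {
    COUNTERS.with(|c| {
        if hit {
            c.hits.set(c.hits.get() + 1);
        } else {
            c.misses.set(c.misses.get() + 1);
        }
        (c.hits.get(), c.misses.get())
    })
}
```

Building something like this for an OSX target with optimizations enabled and inspecting the assembly for `record` is one way to count how many times the thread-local address is looked up.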

Maybe there's a good reason the address isn't cached on OSX, but I'm hoping there isn't :).
