-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[ObjC][Gen] Emit ObjC strings to respective sections #84300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an abstraction over these sections? I understand that a lot of this is ObjC-specific and therefore (at least for now) Darwin-specific, but it's unfortunate to have to spell out these section strings out all over the place. Among other things, it means we're not in a good position to make similar section optimizations for other platforms, for globals for which that makes sense.
a0713ad
to
fda1a55
Compare
lib/IRGen/GenDecl.cpp
Outdated
auto sectionName = | ||
useOSLogSection ? "__TEXT,__oslogstring,cstring_literals" : ""; | ||
sectionName = | ||
useOSLogSection ? "__TEXT,__oslogstring,cstring_literals" : sectionName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the usages of bool useOSLogSection
that pass true
should be moved to pass sectionName
as IRGenModule::OSLogStringSectionName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately useOSLogSection
does more than just set the section name. See the first few lines of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would not be impossible to use ==
from StringRef
up there, but feels wasteful and it looks like neither StringRef::operator==
nor memcmp
might have early returns for that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you know, this might be a good reason to go ahead and pass some kind of enum constant here rather than just a section name. You can then map that to a section name in this function, but since it's an enum, you can also easily check for the os_log section and give that its special-case behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder if the behavior you're talking about earlier in the function might be what's causing this test failure. The current behavior is to use different uniquing sets for os_log
strings and everything else, but of course os_log
strings are also the only strings that are getting a special section. Now you're allowing other strings to get special sections, but that special section is not factoring into the uniquing, which means that we can use the same string data for different purposes. The linker is known to treat some of these sections specially, and it's quite possible that different linkers do that slightly differently, possibly with platform-specific deviations. I suspect that Clang uses different uniquing sets for strings that are going into different sections; we should probably match that.
Also, for correctness in a hopefully-impossible case, when you're mapping enumerators to section names, you should probably use the same MachO-only logic for these ObjC sections as the code is doing for os_log
. Non-MachO platforms use a different schema for section names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've added an enum and change GlobalStrings
to handle different sections. Hopefully the tests should pass now.
@rjmccall do you have any other concerns? |
Add the `--{no-}separate-cstring-literal-sections` option to emit cstring literals into sections defined by their section name. This allows for changes like swiftlang/swift#84300 and swiftlang/swift#84236 to actually have an affect. The default behavior has not changed. The reason this is useful is because strings in different sections might have different access patterns at runtime. By splitting these strings into separate sections, we may reduce the number of page faults during startup. For example, the ObjC runtime accesses all strings in `__objc_classname` before main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is a good first step; the call sites look reasonable, and if we want to change how IRGenModule::ObjCClassNameSetionName
resolves in the future to provide better abstraction, that'll be easy.
@rjmccall sounds good! And thanks for accepting. Again, I don't have commit access so please test/land for me :) |
…8720) Add the `--{no-}separate-cstring-literal-sections` option to emit cstring literals into sections defined by their section name. This allows for changes like swiftlang/swift#84300 and swiftlang/swift#84236 to actually have an affect. The default behavior has not changed. The reason this is useful is because strings in different sections might have different access patterns at runtime. By splitting these strings into separate sections, we may reduce the number of page faults during startup. For example, the ObjC runtime accesses all strings in `__objc_classname` before main.
@swift-ci Please smoke test |
The test failure on macOS looks real. I don't understand why it's happening, but it looks very real. |
Thanks. I'll take a look |
Actually I just rebased and the test doesn't seem to fail on my macos. I'm not sure what it was. |
ea56cdb
to
9cfc0a3
Compare
@swift-ci Please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. I wonder if avoiding the DenseMap
is a concern for others, or just early optimization.
lib/IRGen/IRGenModule.h
Outdated
GlobalStrings; | ||
llvm::StringMap<std::pair<llvm::GlobalVariable*, llvm::Constant*>> | ||
GlobalOSLogStrings; | ||
llvm::DenseMap< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked if using IndexedMap
is possible with the enum class
and a lambda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexedMap
doesn't automatically grow when accessing values. If we want to avoid dynamic allocations, we can use std::array<>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, but before these changes we already had the OSLog and the global string maps, even if the OSLog might not have been used, so sizing the pre-initializing the storage didn't seem like a big loss. Point is moot since the usage of StringMap
seems to be required to avoid all the duplication problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this doesn't actually have to support an arbitrary section name, just a few enumerable choices, there are a few tricks you could use here to avoid the map of maps. The simplest would just be to have an array of StringMap
s. If StringMap
supported looking up a Twine
, you could also just prepend the enumerator to the string, but unfortunately it doesn't, and copying the string is probably worse than an extra lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that we can't use the enum as a key because ObjCMethodName
and ObjCPropertyName
both map to the same section: __objc_methname
. The reason I used different enums is because that is what Clang does and I wasn't sure if we eventually wanted to emit them into a separate section or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One can map the semantic names enum into a section names enum and use the second enum for any of those tricks trying to avoid the generic StringMap
or DenseMap
. But I don't know if this makes sense at all. For a map with 4-5 entries, we might be discussing a very silly amount of time/space complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think emulating Clang is probably the right approach for now. You could just make one enumerator be an alias for the other, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just make one enumerator be an alias for the other, though.
That's a good idea. Would you like me to do this and use an array? Or is it fine to use a map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An array is definitely the more efficient data structure, if you don't mind.
Looks like another test broke because some |
When This made me realize that we might have duplicate strings in different sections if some caller of I also happened to see in the IR of the
Is there an easy way to check to see if we have duplicate strings across sections? |
I think we always want these to be de-duplicated if possible, including across sections. But the linker can handle that, and emitting duplicates when we want them in different sections is probably the best thing for the linker to deal with as an input.
You mean, is there a way to get FileCheck to check whether any strings are duplicated? No, I can't think of one. You could add an IRGen flag that emits diagnostics about duplicated strings in the LLVM module if there's a specific pattern you think is problematic, but of course, those duplicated strings can happen for legitimate reasons. |
@swift-ci Please test |
Thanks, that looks good to me. |
For some reason github isn't processing my last commit. And the windows tests claim to have merge conflicts? I'm testing a rebase and if it succeeds I'll push it. |
54b2393
to
e99fc5f
Compare
@swift-ci please test |
@rjmccall looks like the tests passed. Are we ready to land? |
Clang emits ObjC strings into special sections
https://github.com/llvm/llvm-project/blob/d5e7c27d53887e6ae490d8e26193a54987728458/clang/lib/CodeGen/CGObjCMac.cpp#L4056-L4072
As far as I can tell from
CGObjCMac.cpp
:__objc_methtype
__objc_classname
__objc_methname
See also #84236.