Skip to content

Conversation

ellishg
Copy link
Contributor

@ellishg ellishg commented Sep 15, 2025

Clang emits ObjC strings into special sections
https://github.com/llvm/llvm-project/blob/d5e7c27d53887e6ae490d8e26193a54987728458/clang/lib/CodeGen/CGObjCMac.cpp#L4056-L4072

As far as I can tell from CGObjCMac.cpp:

  • types go into __objc_methtype
  • class, category, and protocol names go into __objc_classname
  • method names, property names, and property types go into __objc_methname

See also #84236.

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an abstraction over these sections? I understand that a lot of this is ObjC-specific and therefore (at least for now) Darwin-specific, but it's unfortunate to have to spell out these section strings out all over the place. Among other things, it means we're not in a good position to make similar section optimizations for other platforms, for globals for which that makes sense.

@ellishg ellishg force-pushed the objc-strings-sectios branch from a0713ad to fda1a55 Compare September 15, 2025 20:43
Comment on lines 6056 to 6064
auto sectionName =
useOSLogSection ? "__TEXT,__oslogstring,cstring_literals" : "";
sectionName =
useOSLogSection ? "__TEXT,__oslogstring,cstring_literals" : sectionName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the usages of bool useOSLogSection that pass true should be moved to pass sectionName as IRGenModule::OSLogStringSectionName?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately useOSLogSection does more than just set the section name. See the first few lines of this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not be impossible to use == from StringRef up there, but feels wasteful and it looks like neither StringRef::operator== nor memcmp might have early returns for that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you know, this might be a good reason to go ahead and pass some kind of enum constant here rather than just a section name. You can then map that to a section name in this function, but since it's an enum, you can also easily check for the os_log section and give that its special-case behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder if the behavior you're talking about earlier in the function might be what's causing this test failure. The current behavior is to use different uniquing sets for os_log strings and everything else, but of course os_log strings are also the only strings that are getting a special section. Now you're allowing other strings to get special sections, but that special section is not factoring into the uniquing, which means that we can use the same string data for different purposes. The linker is known to treat some of these sections specially, and it's quite possible that different linkers do that slightly differently, possibly with platform-specific deviations. I suspect that Clang uses different uniquing sets for strings that are going into different sections; we should probably match that.

Also, for correctness in a hopefully-impossible case, when you're mapping enumerators to section names, you should probably use the same MachO-only logic for these ObjC sections as the code is doing for os_log. Non-MachO platforms use a different schema for section names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've added an enum and change GlobalStrings to handle different sections. Hopefully the tests should pass now.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 22, 2025

@rjmccall do you have any other concerns?

ellishg added a commit to llvm/llvm-project that referenced this pull request Sep 22, 2025
Add the `--{no-}separate-cstring-literal-sections` option to emit
cstring literals into sections defined by their section name. This
allows for changes like swiftlang/swift#84300
and swiftlang/swift#84236 to actually have an
affect. The default behavior has not changed.

The reason this is useful is because strings in different sections might
have different access patterns at runtime. By splitting these strings
into separate sections, we may reduce the number of page faults during
startup. For example, the ObjC runtime accesses all strings in
`__objc_classname` before main.
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is a good first step; the call sites look reasonable, and if we want to change how IRGenModule::ObjCClassNameSetionName resolves in the future to provide better abstraction, that'll be easy.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 22, 2025

@rjmccall sounds good! And thanks for accepting. Again, I don't have commit access so please test/land for me :)

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 22, 2025
…8720)

Add the `--{no-}separate-cstring-literal-sections` option to emit
cstring literals into sections defined by their section name. This
allows for changes like swiftlang/swift#84300
and swiftlang/swift#84236 to actually have an
affect. The default behavior has not changed.

The reason this is useful is because strings in different sections might
have different access patterns at runtime. By splitting these strings
into separate sections, we may reduce the number of page faults during
startup. For example, the ObjC runtime accesses all strings in
`__objc_classname` before main.
@rjmccall
Copy link
Contributor

@swift-ci Please smoke test

@rjmccall
Copy link
Contributor

The test failure on macOS looks real. I don't understand why it's happening, but it looks very real.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 23, 2025

The test failure on macOS looks real. I don't understand why it's happening, but it looks very real.

Thanks. I'll take a look

@ellishg
Copy link
Contributor Author

ellishg commented Sep 23, 2025

The test failure on macOS looks real. I don't understand why it's happening, but it looks very real.

Actually I just rebased and the test doesn't seem to fail on my macos. I'm not sure what it was.

@ellishg ellishg force-pushed the objc-strings-sectios branch from ea56cdb to 9cfc0a3 Compare September 23, 2025 17:45
@rjmccall
Copy link
Contributor

@swift-ci Please test

Copy link
Contributor

@drodriguez drodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. I wonder if avoiding the DenseMap is a concern for others, or just early optimization.

GlobalStrings;
llvm::StringMap<std::pair<llvm::GlobalVariable*, llvm::Constant*>>
GlobalOSLogStrings;
llvm::DenseMap<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked if using IndexedMap is possible with the enum class and a lambda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexedMap doesn't automatically grow when accessing values. If we want to avoid dynamic allocations, we can use std::array<>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, but before these changes we already had the OSLog and the global string maps, even if the OSLog might not have been used, so sizing the pre-initializing the storage didn't seem like a big loss. Point is moot since the usage of StringMap seems to be required to avoid all the duplication problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this doesn't actually have to support an arbitrary section name, just a few enumerable choices, there are a few tricks you could use here to avoid the map of maps. The simplest would just be to have an array of StringMaps. If StringMap supported looking up a Twine, you could also just prepend the enumerator to the string, but unfortunately it doesn't, and copying the string is probably worse than an extra lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that we can't use the enum as a key because ObjCMethodName and ObjCPropertyName both map to the same section: __objc_methname. The reason I used different enums is because that is what Clang does and I wasn't sure if we eventually wanted to emit them into a separate section or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can map the semantic names enum into a section names enum and use the second enum for any of those tricks trying to avoid the generic StringMap or DenseMap. But I don't know if this makes sense at all. For a map with 4-5 entries, we might be discussing a very silly amount of time/space complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think emulating Clang is probably the right approach for now. You could just make one enumerator be an alias for the other, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just make one enumerator be an alias for the other, though.

That's a good idea. Would you like me to do this and use an array? Or is it fine to use a map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An array is definitely the more efficient data structure, if you don't mind.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 24, 2025

Looks like another test broke because some .{{[0-9]+}} suffixes changed. I'll push a fix shortly.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 24, 2025

When GlobalStrings was a map with CStringSectionType as the key, it would treat ObjCMethodName and ObjCPropertyName strings as different, even though they both emit into the __objc_methname section. This caused duplicate strings in that section, and some tests failed. I've fixed the tests and I'm currently running validation tests on my machine.

This made me realize that we might have duplicate strings in different sections if some caller of getAddrOfGlobalString() incorrectly emitted into the default section, or some other section. I do know that, until llvm/llvm-project#158720, lld and I believe ld64 actually ignore string section names and emit all strings into the __cstring section where they can be deduplicated anyway.

I also happened to see in the IR of the IRGen/objc_properties.swift test that the bareIvar string was in __TEXT,__objc_methname,cstring_literals and __TEXT,__swift5_reflstr.

@"\01L_selector_data(bareIvar)" = private global [9 x i8] c"bareIvar\00", section "__TEXT,__objc_methname,cstring_literals", align 1
@0 = private constant [9 x i8] c"bareIvar\00", section "__TEXT,__swift5_reflstr, regular", no_sanitize_address

Is there an easy way to check to see if we have duplicate strings across sections?

@rjmccall
Copy link
Contributor

When GlobalStrings was a map with CStringSectionType as the key, it would treat ObjCMethodName and ObjCPropertyName strings as different, even though they both emit into the __objc_methname section. This caused duplicate strings in that section, and some tests failed. I've fixed the tests and I'm currently running validation tests on my machine.

This made me realize that we might have duplicate strings in different sections if some caller of getAddrOfGlobalString() incorrectly emitted into the default section, or some other section. I do know that, until llvm/llvm-project#158720, lld and I believe ld64 actually ignore string section names and emit all strings into the __cstring section where they can be deduplicated anyway.

I think we always want these to be de-duplicated if possible, including across sections. But the linker can handle that, and emitting duplicates when we want them in different sections is probably the best thing for the linker to deal with as an input.

Is there an easy way to check to see if we have duplicate strings across sections?

You mean, is there a way to get FileCheck to check whether any strings are duplicated? No, I can't think of one. You could add an IRGen flag that emits diagnostics about duplicated strings in the LLVM module if there's a specific pattern you think is problematic, but of course, those duplicated strings can happen for legitimate reasons.

@rjmccall
Copy link
Contributor

@swift-ci Please test

@rjmccall
Copy link
Contributor

Thanks, that looks good to me.

@ellishg
Copy link
Contributor Author

ellishg commented Sep 26, 2025

For some reason github isn't processing my last commit. And the windows tests claim to have merge conflicts? I'm testing a rebase and if it succeeds I'll push it.

@ellishg ellishg force-pushed the objc-strings-sectios branch from 54b2393 to e99fc5f Compare September 26, 2025 01:39
@drodriguez
Copy link
Contributor

@swift-ci please test

@ellishg
Copy link
Contributor Author

ellishg commented Sep 30, 2025

@rjmccall looks like the tests passed. Are we ready to land?

@drodriguez drodriguez merged commit 2c919e1 into swiftlang:main Sep 30, 2025
5 checks passed
@ellishg ellishg deleted the objc-strings-sectios branch September 30, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants