Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with swift.index_while_building #561

Closed
jerrymarino opened this issue Feb 3, 2021 · 9 comments
Closed

Performance issues with swift.index_while_building #561

jerrymarino opened this issue Feb 3, 2021 · 9 comments

Comments

@jerrymarino
Copy link
Contributor

Good morning! I'm again looking at how to use the feature swift.index_while_building with Bazel and found it to introduce significant build performance issues. The impact of using this flag is a 300% increase in swift compilation times as opposed to not using it.

From what I can tell, it adds multiple GB of output to the build - as there is a per swift_library index, all system modules and deps seem to be indexed O(N swift_library) times. rules_ios uses the program index-import which churns trying to import these indicies for longer than it'd takes Xcode to index the source code after build. Using a "global index" - a single index mitigates the perf impact, but I'm not sure how this would work with remote caching.

I was curious how "index while building" is working with rules_swift and Bazel in general for larger builds. XCHammer doesn't have this feature and works around it by letting Xcode index a focused subset after the build.

@jerrymarino jerrymarino changed the title Performance issues / useage of swift.index_while_building Performance issues with swift.index_while_building Feb 3, 2021
@brentleyjones
Copy link
Collaborator

brentleyjones commented Feb 3, 2021

At Target we used a global index, but only when running locally: 6b77a6b.

Remote cache hits would still download to each module folder, but post build we would copy those into the global index. Locally built modules would go directly into the global index. We then only index-import from the global index, also only importing units and records that changed: MobileNativeFoundation/index-import@6bb07ce

We personally found that the largest hit was the vast number of files and directories being made (and re-made, since declare_directory would delete the indexstore before re-populating it). I wrote up about it here: https://brentley.dev/xcodebuild-vs-bazel-incremental/#multiple-index-stores


I would love to see a first class solution for a global index store, but I think it would require changes to Bazel: bazelbuild/bazel#7527

@jerrymarino
Copy link
Contributor Author

@brentleyjones great to hear that you've got this working! Also thanks for taking the time to write up how you're dealing with this, I'm going to give your blog post a read 👍

Remote cache hits would still download to each module folder, but post build we would copy those into the global index.

So you're writing a per-swift_library-index into the cache - then locally you pull from the cache; locally the build also uses a global index store. I assumed the file .."/rules_swift_global_index_enabled" is how you're able to get the worker to manage this.

I was thinking it'd be ideal to stripping out a minimal subset of records before writing to the cache - perhaps that's not 100% necessary if the artifacts are reasonably small. I was also thinking to leverage -index-ignore-system-modules to minimize the size here.

With the incremental index-import MobileNativeFoundation/index-import@6bb07ce you're able to get index-import running at a reasonable speed importing the out of band global index into DerivedData? I was hoping index-import would be fast enough and assumed it might have been if people didn't devise a way to patch Xcode to use bazel's index.


Yep it seems like Bazel needs some solution to bazelbuild/bazel#7527. Like the clang module cache this is basically a mutable tree artifact that many actions need to mutate 🤔

@amberdixon
Copy link

I was also thinking to leverage -index-ignore-system-modules to minimize the size here.

Looks like this feature was introduced here, yay! #563

@keith
Copy link
Member

keith commented Feb 8, 2021

Definitely pass that to reduce extra work

@brentleyjones
Copy link
Collaborator

My analysis was with that flag on, btw.

@jerrymarino
Copy link
Contributor Author

Nice, I think this is a sensible default! I tried passing -index-ignore-system-modules but it's still a large regression. I can post up stats.

Basically there is a O(N indexes * M imports) problem here, so repos with a ton of code will have same problem but with internal code: N*M indexing internal code anyhow. Making a per module index ( N * M ) is the real root of the issue that going with a global index will fix.

In addition to burning CPU and I/O, I get OOMs due to the amount of code it's loading in memory: specifically when --remote_cache=some and --disk_cache=.

I think to implement "Index while building" in Bazel there's some options.

Use the original architecture of a global index and implicit cache

Remote exec / caching

Workers use a global index internally, and then write records and units into the remote cache by copying material from the worker's global index into bazel-out. Perhaps using a tree artifact to obfuscate this to Bazel is the way to go.

Locally

Local machines w/o cache writes write to a global index directly. Ideally Xcode can just read from this index, possibly with a patch to Xcode or somehow preventing the re-mapping

Implement "per module" indexing

Possibility 1. Perhaps we can add a feature to clang and swift that only index files in the module. This could be nice for other reasons
Possibility 2. Make everything a system import to piggy back onto -index-ignore-system-modules. This has tradeoffs in clang for diagnostics.

@segiddins
Copy link
Contributor

Use the original architecture of a global index and implicit cache

This potentially seems doable with the worker, similar to how it gives a global module cache? We'd just need to be able to copy out the relevant bits of the index to the desired output directory after the swiftc invocation

@jerrymarino
Copy link
Contributor Author

jerrymarino commented Feb 12, 2021

Yep it seems totally reasonable and I think we'd just need a C++ program to extract the relevant units/records. The other piece to this is making it work with the swift and objc rules. In practice, this rules_swift feature doesn't work with a large majority of the objc code and there isn't yet a way to handle objc code and remote caching I know if.

The other simple/pragmatic option is to disable "Index while building" in the interim: driving this with Xcode, letting it manage the indexes incrementally as you open files after a build. With a focused project it will quickly index the subset of the code in the editor w/o paying to index the entire graph. Even a theoretical optimized approach that uses a global index will have a noticeable impact on build perf.

jerrymarino added a commit to jerrymarino/index-import that referenced this issue Feb 19, 2021
This PR adds the capability to import output files from a compilation:
to improve performance and experiment with using a "global shared index"
while remote caching a subset for lib. The github issue has some overall
mentions of this bazelbuild/rules_swift#561

Index while building is performant in clang and swift becauase it
conditionally writes data based on files in a global index cache reused
across all clang/swift invocations in the build. The overhead of 3-5%
performance hit in the whitepaper of "Index while building" was hinged
on the fact it'd be using a global index cache.  With the per library
index as rules_swift implemented this feature, "Index while building"
adds a 300% increase in build times in my testing.

Longer term, can add flags to clang and swift indexing to be aware of
remote compilation and caching, this is a quick hack incase that doesn't
ever happen.

Remote build possibilities:

The plan is to be able to use a global index on compilation fix
compilation performance and minimize the O ( M imports * N modules )
disk usage. I'd then intend to use this code to import a subset of the
global index material into the remote cache by putting it in `bazel-out`
at `swift_library`'s "index-store" path. Eventually a feature can be
added to rules_cc following the same pattern for .m, .cpp

Local build possibilities:

I plan to import swift_library cache remote cache hits from `bazel-out`
into Xcode's index.

However, importing output files as a capability is useful even if that
doesn't play out: if we orient index-import around output files, then it
doesn't have to enumerate and process an entire index ever again.
Another build related program ran aas an aspect or via BEP could read
outputs and then invoke index-import with these outputs.
jerrymarino added a commit to bazel-ios/rules_ios that referenced this issue Feb 23, 2021
jerrymarino added a commit to bazel-ios/rules_ios that referenced this issue Feb 25, 2021
* Writeup doc on Index While Building

Overall architecture of design for "Index While Building" in Bazel

- MobileNativeFoundation/index-import#53
- bazelbuild/rules_swift#567
- bazelbuild/rules_swift#561
brentleyjones pushed a commit to MobileNativeFoundation/index-import that referenced this issue Mar 12, 2021
This PR adds the capability to import output files from a compilation:
to improve performance and experiment with using a "global shared index"
while remote caching a subset for lib. The github issue has some overall
mentions of this bazelbuild/rules_swift#561

Index while building is performant in clang and swift becauase it
conditionally writes data based on files in a global index cache reused
across all clang/swift invocations in the build. The overhead of 3-5%
performance hit in the whitepaper of "Index while building" was hinged
on the fact it'd be using a global index cache.  With the per library
index as rules_swift implemented this feature, "Index while building"
adds a 300% increase in build times in my testing.

Longer term, can add flags to clang and swift indexing to be aware of
remote compilation and caching, this is a quick hack incase that doesn't
ever happen.

Remote build possibilities:

The plan is to be able to use a global index on compilation fix
compilation performance and minimize the O ( M imports * N modules )
disk usage. I'd then intend to use this code to import a subset of the
global index material into the remote cache by putting it in `bazel-out`
at `swift_library`'s "index-store" path. Eventually a feature can be
added to rules_cc following the same pattern for .m, .cpp

Local build possibilities:

I plan to import swift_library cache remote cache hits from `bazel-out`
into Xcode's index.

However, importing output files as a capability is useful even if that
doesn't play out: if we orient index-import around output files, then it
doesn't have to enumerate and process an entire index ever again.
Another build related program ran aas an aspect or via BEP could read
outputs and then invoke index-import with these outputs.
@keith
Copy link
Member

keith commented Oct 26, 2021

Closed by #567

@keith keith closed this as completed Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants