Faster incremental imports #46

brentleyjones · 2020-06-15T13:03:33Z

"Only transfer units if they are newer" is an alternative implementation of "Only rewrite unit files when outdated" (#32).
"Allow specifying direct paths to files to import" lets you specify exactly which paths have changed, for even finer granularity. It also strides over unit paths instead of index stores, which in our case (a single index store) allowed near-full parallelism (4x faster on 4 cores, with a stride of 64 over 4568 files).
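To illustrate the idea of striding over unit paths rather than over index stores, here is a minimal Python sketch. It is not the code in index-import.cpp: `import_unit` is a hypothetical stand-in for the real per-unit work, and Python threads only illustrate the partitioning, not the speedup reported above.

```python
from concurrent.futures import ThreadPoolExecutor


def import_unit(path):
    # Hypothetical placeholder for the real per-unit work
    # (reading, remapping, and writing one unit file).
    return path.upper()


def import_units_strided(unit_paths, stride=64, workers=4):
    """Process unit paths in batches of `stride`, spread over a thread pool."""
    # Cut the flat list of unit paths into fixed-size batches.
    batches = [unit_paths[i:i + stride] for i in range(0, len(unit_paths), stride)]

    def run_batch(batch):
        return [import_unit(p) for p in batch]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the batches in input order.
        results = list(pool.map(run_batch, batches))

    # Flatten the per-batch results back into one list.
    return [item for batch in results for item in batch]
```

Because the batches come from the list of unit paths, a single index store with thousands of units still yields many independent work items.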

DavidGoldman · 2020-06-17T15:18:40Z

Maybe we should remove the per-index-store parallelism in favor of the unit-path parallelism? Also, have you tested making the stride boil down to N threads (e.g. with 20 files and 4 threads, stride = 20 / 4 = 5)?

Also, I think the "Only transfer units if they are newer" feature should be optional (behind a setting), WDYT?
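The stride suggestion above can be sketched as follows. This is only an illustration of the arithmetic; the helper names are hypothetical, and rounding up (so that no more than N batches are produced when the division is not exact) is an assumption the thread does not spell out.

```python
import math


def stride_for(num_files, num_threads):
    """Stride that splits num_files into at most num_threads batches."""
    # Round up so a remainder does not create an extra batch;
    # never return 0, which would be an invalid step size.
    return max(1, math.ceil(num_files / num_threads))


def partition(items, num_threads):
    """Split items into consecutive batches, one per thread where possible."""
    stride = stride_for(len(items), num_threads)
    return [items[i:i + stride] for i in range(0, len(items), stride)]
```

With 20 files and 4 threads this gives a stride of 5 and four batches, matching the example in the comment.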

brentleyjones · 2020-06-17T15:23:36Z

> Maybe we should remove the per-index-store parallelism in favor of the unit-path parallelism?

For sure. It might make sense for that to be a subsequent PR, since the code around unit path parallelism depends on the fact that I'm passing in the direct paths.

> Also, have you tested making the stride boil down to N threads (e.g. with 20 files and 4 threads, stride = 20 / 4 = 5)?

I did not test that yet.

> Also, I think the "Only transfer units if they are newer" feature should be optional (behind a setting), WDYT?

Yeah, I could be behind that, as long as it was the default.
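As a rough sketch of what an "only transfer if newer" check behind a default-on setting could look like (in Python rather than the tool's C++, with a hypothetical function and parameter name, and assuming modification times are what gets compared):

```python
import os
import shutil


def transfer_if_newer(src, dst, only_if_newer=True):
    """Copy src to dst. Returns True if a copy happened, False if skipped."""
    if only_if_newer and os.path.exists(dst):
        # Skip when the destination is at least as new as the source.
        if os.path.getmtime(src) <= os.path.getmtime(dst):
            return False
    # copy2 also copies the modification time, so an unchanged source
    # is skipped on the next call.
    shutil.copy2(src, dst)
    return True
```

Passing `only_if_newer=False` restores the unconditional behavior, which is the opt-out discussed above.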

DavidGoldman · 2020-06-18T15:28:48Z

> > Maybe we should remove the per-index-store parallelism in favor of the unit-path parallelism?

> For sure. It might make sense for that to be a subsequent PR, since the code around unit path parallelism depends on the fact that I'm passing in the direct paths.

> > Also, have you tested making the stride boil down to N threads (e.g. with 20 files and 4 threads, stride = 20 / 4 = 5)?

> I did not test that yet.

Worth testing; I have a feeling it might be optimal.

> > Also, I think the "Only transfer units if they are newer" feature should be optional (behind a setting), WDYT?

> Yeah, I could be behind that, as long as it was the default.

How are you planning on using this? For our integration I was planning on having per-target/source index zip files, which we re-import (after extracting into a single index store dir) if they have changed. I guess this approach makes sense as long as you're having Bazel build this index locally (so the timestamps are set when it is built)?

brentleyjones · 2020-06-18T16:02:57Z

> How are you planning on using this? ... I guess this approach makes sense as long as you're having Bazel build this index locally (so the timestamps are set when it is built)?

We ensure that all of our indexes are in a single store dir (via https://github.com/target/rules_swift/commit/dc0c9ef0db74e24895d06cf27ade0321371e3ecf for incremental builds and copying them from individual targets if downloading from a cache). So for us the timestamps are correct since they are the result of a fresh build.

DavidGoldman · 2020-06-18T18:13:07Z

> > How are you planning on using this? ... I guess this approach makes sense as long as you're having Bazel build this index locally (so the timestamps are set when it is built)?

> We ensure that all of our indexes are in a single store dir (via target/rules_swift@dc0c9ef for incremental builds and copying them from individual targets if downloading from a cache). So for us the timestamps are correct since they are the result of a fresh build.

Ah I see, so you produce the index store as a side-effect, completely external to Bazel, even on CI? When you initially download from the cache you preserve timestamps on all of the unit/record files?

brentleyjones · 2020-06-18T18:36:41Z

> Ah I see, so you produce the index store as a side-effect, completely external to Bazel, even on CI?

We produce the indexes via Bazel, but that patch allows normal remote-cache downloading of the indexes, and when that happens we move those indexes to the same location the patch would put them (the moving is done completely external to Bazel though). The way the patch works though might be described as "completely external to Bazel" since Bazel doesn't track those indexes when building locally.

> When you initially download from the cache you preserve timestamps on all of the unit/record files?

Hopefully the above explains that Bazel still downloads the indexes. I'm not sure how it determines timestamps in that case, but even if it's the current time, from the PoV of the build, that is correct, since it had to rebuild and the timestamps would be the current time in that case as well.

DavidGoldman · 2020-06-18T18:51:57Z

> > Ah I see, so you produce the index store as a side-effect, completely external to Bazel, even on CI?

> We produce the indexes via Bazel, but that patch allows normal remote-cache downloading of the indexes, and when that happens we move those indexes to the same location the patch would put them (the moving is done completely external to Bazel though). The way the patch works though might be described as "completely external to Bazel" since Bazel doesn't track those indexes when building locally.

> > When you initially download from the cache you preserve timestamps on all of the unit/record files?

> Hopefully the above explains that Bazel still downloads the indexes. I'm not sure how it determines timestamps in that case, but even if it's the current time, from the PoV of the build, that is correct, since it had to rebuild and the timestamps would be the current time in that case as well.

I see, I was thinking of doing something similar, with the optimization instead being we don't reimport per-target index stores unless they have changed (e.g. different timestamp on the zip itself). Then we don't need the optimization you're adding here although it could be useful if you're remapping a locally made index store.

It seems like your approach may be using folders though (tree artifacts) in which case the timestamps might be a bit different.
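The alternative described above, re-importing a per-target archive only when its own timestamp has changed, might look roughly like this. It is a sketch under assumptions: the function name and the in-memory `seen_mtimes` dictionary are hypothetical, and a real integration would need to persist that state between runs.

```python
import os


def changed_archives(archive_paths, seen_mtimes):
    """Return the archives whose mtime differs from the last recorded one."""
    changed = []
    for path in archive_paths:
        mtime = os.stat(path).st_mtime_ns
        # A missing entry (first run) also counts as changed.
        if seen_mtimes.get(path) != mtime:
            changed.append(path)
            seen_mtimes[path] = mtime
    return changed
```

Only the archives returned here would need to be extracted and re-imported; everything else is skipped without looking at individual unit files.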

DavidGoldman · 2020-06-30T19:52:06Z

> > > Ah I see, so you produce the index store as a side-effect, completely external to Bazel, even on CI?

> > We produce the indexes via Bazel, but that patch allows normal remote-cache downloading of the indexes, and when that happens we move those indexes to the same location the patch would put them (the moving is done completely external to Bazel though). The way the patch works though might be described as "completely external to Bazel" since Bazel doesn't track those indexes when building locally.

> > > When you initially download from the cache you preserve timestamps on all of the unit/record files?

> > Hopefully the above explains that Bazel still downloads the indexes. I'm not sure how it determines timestamps in that case, but even if it's the current time, from the PoV of the build, that is correct, since it had to rebuild and the timestamps would be the current time in that case as well.

> I see, I was thinking of doing something similar, with the optimization instead being we don't reimport per-target index stores unless they have changed (e.g. different timestamp on the zip itself). Then we don't need the optimization you're adding here although it could be useful if you're remapping a locally made index store.

> It seems like your approach may be using folders though (tree artifacts) in which case the timestamps might be a bit different.

Does putting the up-to-date check behind a flag (enabled by default) seem reasonable then?

brentleyjones · 2020-06-30T19:53:56Z

> Does putting the up-to-date check behind a flag (enabled by default) seem reasonable then?

Yes. I can add a flag for it, unless @kastiglione thinks it shouldn't have one.

keith · 2020-08-28T18:47:19Z

flag sounds fine

brentleyjones · 2020-10-19T13:33:12Z

Now that I'm back from paternity leave I'll get around to this soon!

segiddins · 2020-12-11T22:23:51Z

ping @keith @brentleyjones

brentleyjones · 2020-12-11T22:42:40Z

I'll poke around with this next week. Sorry for the delay.

brentleyjones · 2021-01-25T14:00:03Z

The first commit has landed with #52. I'll make a new PR soon (sooner than last time 😉) for the remaining commit.

jerrymarino · 2021-02-20T02:48:39Z

@brentleyjones I've added the ability to import indexes from a specific compilation; this is useful for managing a global index in Bazel with minimal overhead.

I also realized that we can use this approach to incrementally import units and records as they are compiled in the build system, either via an aspect or by reading a BEP stream.

Since the unit contains a pointer to the record, it just pulls in the dependent records for the unit.
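A small sketch of the "unit points to its records" idea follows. The data model is hypothetical: a plain dictionary stands in for whatever the index store format actually holds, and this is not the implementation in #53.

```python
def records_for_units(unit_to_records, changed_units):
    """Collect the records referenced by the given units, without duplicates."""
    needed = []
    seen = set()
    for unit in changed_units:
        # Each unit lists the records it depends on; unknown units add nothing.
        for record in unit_to_records.get(unit, ()):
            if record not in seen:
                seen.add(record)
                needed.append(record)
    return needed
```

Importing a freshly compiled unit then only requires copying the records this function returns, rather than scanning the whole store.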

jerrymarino · 2021-02-20T02:49:01Z

#53 - this one should be able to use the other bits you've added for incremental too 👍

brentleyjones · 2022-05-04T21:20:22Z

I think #53 covers the remaining part of this for now. If there is still desire for whatever was left here, let me know and I'll work on getting a new PR rolling.

amberdixon mentioned this pull request Jun 18, 2020

Only move indexstores generated by build bazel-ios/rules_ios#78

Merged

amberdixon mentioned this pull request Jun 24, 2020

Separate objective-c indexstore units and records by objc_library bazel-ios/rules_ios#79

Merged

kastiglione reviewed Jun 24, 2020

index-import.cpp (outdated, resolved)

brentleyjones force-pushed the better-incremental branch from 0c1b563 to ed0e30e on June 24, 2020 21:58

kastiglione mentioned this pull request Aug 6, 2020

Only rewrite unit files when outdated #32

Closed

Brentley Jones added 2 commits November 18, 2020 08:32

Only transfer units if they are newer

1c64e29

Allow specifying direct paths to files to import

6bb07ce

This allows specifying a subset of an index store to import, if you know which files changed.

brentleyjones force-pushed the better-incremental branch from ed0e30e to 6bb07ce on November 18, 2020 14:33

brentleyjones mentioned this pull request Jan 22, 2021

Add option to only transfer units if they are newer #52

Merged

brentleyjones closed this May 4, 2022
