Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate outputs based on better than timestamp #371

Closed
natebosch opened this issue Aug 17, 2017 · 2 comments
Closed

Invalidate outputs based on better than timestamp #371

natebosch opened this issue Aug 17, 2017 · 2 comments

Comments

@natebosch
Copy link
Member

Today a changed file can cascade to any output which may end up rebuilding way more than is optimal. If we track some type of content hash we can get away with pruning the build graph more eagerly.

@jakemac53
Copy link
Contributor

This will also help for correctness, so I am going to go ahead and tackle it.

See #536 for an example.

@jakemac53 jakemac53 removed their assignment Oct 26, 2017
@jakemac53
Copy link
Contributor

Actually I am only going to tackle invalidating inputs, not also invaliding outputs based on whether files actually changed (at least right now).

jakemac53 added a commit that referenced this issue Nov 1, 2017
…stamps (#547)

All `AssetNode`s now have a `Digest` field (from package:crypto). These are computed eagerly, and must be available for any node which has a corresponding asset that exists. It is also serialized as a part of the asset graph (base64 encoded for json compatibility for now).

There is now an additional interface `DigestAssetReader` which adds the `Future<Digest> digest(id)` method to a normal asset reader, and the `RunnerAssetReader` interface now requires that interface to be implemented.

`AssetGraph.build` is now a static method which returns a `Future<AssetGraph>`, and requires a `DigestAssetReader`. It eagerly reads in digests for all source assets.

MD5 is the current hash function used, because collisions are not a concern in this case (we don't use the hashes for lookups, only to compare new/old values for a single file). Bazel also uses md5 today, as well as the analyzer for summaries, so this is also consistent with those (although they may change it is still kind of nice). It is also significantly faster than sha1 in practice.

This does complicate some of the tests a bit unfortunately (search this pr for `computeHash` in tests....), but its a necessary evil imo. I don't see a way to simplify it significantly.

I didn't yet completely remove the `lastModified` method, it is still used when checking if a build script itself has been updated. I wanted to keep the scope of this as small as I could, so I will follow up with a different pull request to update that as well, and remove `lastModified` entirely.

Partially fixes #536
Unblocks #371 and #412
@jakemac53 jakemac53 self-assigned this Nov 10, 2017
jakemac53 added a commit that referenced this issue Nov 14, 2017
Highlights are as follows:

- Fixed some behavior around replacing `SyntheticAssetNode`s with real ones. Previously we would remove all the outputs from the graph and clean up their inputs sets which wasn't correct (we want to retain all those edges and nodes, and just swap out the node in question).
- When invalidating nodes, we no longer pre-emptively delete them. In fact, I never explicitly delete them in this cl I just rely on the asset writer to overwrite them (unless their primary input is deleted).
- The inputs set for `GeneratedAssetNode`s is now ordered to ensure the combined md5 hash will always be the same for the same inputs.

Fixes #371
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants