From 854abd2b0c67875b8de3294e51da586ca7d5cbbd Mon Sep 17 00:00:00 2001 From: David Federman Date: Fri, 13 Oct 2023 14:13:14 -0700 Subject: [PATCH 1/7] Update Project Cache docs to reflect new functionality --- documentation/specs/project-cache.md | 251 ++++++++++++++++++--------- 1 file changed, 170 insertions(+), 81 deletions(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index 7c3a2fad63a..40afc4881a7 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -1,10 +1,10 @@ # Summary -Project cache is a new assembly-based plugin extension point in MSBuild which determines whether a build request (a project) can be skipped during build. The main expected benefit is reduced build times via [caching and/or distribution](https://github.com/dotnet/msbuild/blob/main/documentation/specs/static-graph.md#weakness-of-the-old-model-caching-and-distributability). +Project cache is a new assembly-based plugin extension point in MSBuild which determines whether a build request (a project) can be skipped during build. The main expected benefit is reduced build times via [caching and/or distribution](static-graph.md#weakness-of-the-old-model-caching-and-distributability). # Motivation -As the introduction to [static graph](https://github.com/dotnet/msbuild/blob/main/documentation/specs/static-graph.md#what-is-static-graph-for) suggests, large and complex repos expose the weaknesses in MSBuild's scheduling and incrementality models as build times elongate. This project cache plugin lets MSBuild natively communicate with existing tools that enable build caching and/or distribution, enabling true scalability. +As the introduction to [static graph](static-graph.md#what-is-static-graph-for) suggests, large and complex repos expose the weaknesses in MSBuild's scheduling and incrementality models as build times elongate. This project cache plugin lets MSBuild natively communicate with existing tools that enable build caching and/or distribution, enabling true scalability. Visual Studio is one beneficiary. This plugin inverts dependencies among build systems: instead of higher level build engines ([Cloudbuild](https://www.microsoft.com/research/publication/cloudbuild-microsofts-distributed-and-caching-build-service/), [Anybuild](https://github.com/AnyBuild/AnyBuild), [BuildXL](https://github.com/microsoft/BuildXL), etc) calling into MSBuild, MSBuild calls into them, keeping MSBuild's external APIs and command line arguments largely unchanged and thus reusable by Visual Studio. @@ -14,106 +14,195 @@ This change also simplifies and unifies user experiences. MSBuild works the same - The plugin should tell MSBuild whether a build request needs building. If a project is skipped, then the plugin needs to ensure that: - it makes the filesystem look as if the project built - - it returns sufficient information back to MSBuild such that MSBuild can construct a valid [BuildResult](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/BackEnd/Shared/BuildResult.cs#L30-L33) for its internal scheduling logic, such that future requests to build a skipped project are served directly from MSBuild's internal caches. + - it returns sufficient information back to MSBuild such that MSBuild can construct a valid [`BuildResult`](/src/Build/BackEnd/Shared/BuildResult.cs#L30-L33) for its internal scheduling logic, such that future requests to build a skipped project are served directly from MSBuild's internal caches. # High-level design -- For each [BuildRequestData](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/BackEnd/BuildManager/BuildRequestData.cs#L83) ([ProjectInstance](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/Instance/ProjectInstance.cs#L71), Global Properties, Targets) submitted to the [BuildManager](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/BackEnd/BuildManager/BuildManager.cs#L38), MSBuild asks the plugin whether to build the request or not. - - If the BuildRequestData is based on a project path instead of a ProjectInstance, the project is evaluated by the BuildManager. -- If the plugin decides to build, then MSBuild proceeds building the project as usual. -- If the plugin decides to skip the build, it needs to return back to MSBuild the target results that the build request would have produced. It can either provide the results directly, or instruct MSBuild to run a set of less expensive targets on the projects with the same effect as the expensive targets. - - MSBuild injects the BuildResult into its Scheduler, so that future projects that need to call into the skipped project have the target results they need served directly from MSBuild's internal cache. + +Conceptually, there are two parts of caching: "cache get" and "cache add". "Cache get" is MSBuild asking the plugin if it wants to handle a build request, ie by fetching from some cache. "Cache add" is, upon cache miss, MSBuild providing enough information to the plugin during the build of the build request for the plugin add the results to its cache and safely be able to retrieve it for some future build. + +The "cache get" functionality was introduced in 16.9, while "cache add" was added in 17.8. + +## Plugin discovery + - Plugin dlls are discovered by MSBuild via a new special purpose `ProjectCachePlugin` [items](https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-items). - These items can get injected into a project's import graph by package managers via the [PackageReference](https://docs.microsoft.com/en-us/nuget/consume-packages/package-references-in-project-files) item. - MSBuild will discover the plugin by searching project evaluations for `ProjectCachePlugin` items. -- Plugin instances reside only in the BuildManager node. Having it otherwise (plugin instances residing in all nodes) means forcing the plugins to either deal with distributed state or implement a long lived service. We consider this high complexity cost to not be worth it. We also want to avoid serializing the ProjectInstance between nodes, which is expensive. -- The plugin instance will get called in reverse topo sort order (from dependencies up towards dependents). Building in reverse topo sort order is common between Visual Studio solution builds and higher build engines. -- Plugins can function with and without a static graph. When a static graph is not provided, hints about the graph entry points are provided (details in Defining the "graph" when static graph is not available). -- A single plugin is supported (for now). - -# APIs and calling patterns -- Plugin APIs are found [here](https://github.com/cdmihai/msbuild/tree/projectCache/src/Build/BackEnd/Components/ProjectCache). - -## From BuildManager API users who have a project dependency graph at hand and want to manually issue builds for each graph node in reverse topo sort order. -- Users set [BuildParameters.ProjectCacheDescriptor](https://github.com/cdmihai/msbuild/blob/projectCache/src/Build/BackEnd/Components/ProjectCache/ProjectCacheDescriptor.cs) which triggers MSBuild to instantiate the plugin and call `ProjectCacheBase.BeginBuildAsync` on it in `BuildManager.BeginBuild`. - - `BuildManager.BeginBuild` does not wait for the plugin to initialize. The first query on the plugin will wait for plugin initialization. -- All the build requests submitted in the current `BuildManager.BeginBuild/EndBuild` session will get checked against the plugin instance. -- Only the user provided top level build requests are checked against the cache. The build requests issued recursively from the top level requests are not checked against the cache, since it is assumed that users issue build requests in reverse toposort order. Therefore when a project builds its references, those references should have already been built and present in MSBuild's internal cache, provided either by the project cache plugin or real builds. -- `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync`. -- There is no static graph instantiated by MSBuild in this case and the user needs to set `ProjectCacheDescriptor.EntryPoints`. - -## From command line -- Requires /graph. It is the easiest way to implement the plugin: - - The static graph has all the project instances in the same process, makes it easy to find and keep plugin instances in one process. - - Builds bottom up, so by the time a project is considered, all of its references and their build results are already present in the Scheduler. -- User calls msbuild /graph. -- MSBuild constructs the static graph. -- The graph builder finds and loads the plugin into the `BuildManager`. - - Each graph node has a ProjectInstance, which is used to search for the plugin. - - If a project defines a plugin, then all projects in the graph must define that same plugin. - - The `ProjectGraph` is passed to the plugin upon initialization, so the plugin can avoid building its own static graph (in case it needs a graph). -- From this point on the calling patterns are similar to the `BuildParameters.ProjectCacheDescriptor` flow described [above](#from-buildmanager-api-users-who-have-a-project-dependency-graph-at-hand-and-want-to-manually-issue-builds-for-each-graph-node-in-reverse-topo-sort-order). The only difference is that the plugin is not instantiated in `BuildManager.BeginBuild`, but between graph construction and graph build. - - However, if `BuildParameters.ProjectCacheDescriptor` was set and a plugin was instantiated, it will take precedence. In this case graph build will not even search the graph nodes for plugins. - -## From Visual Studio, a temporary workaround -- Ideally, Visual Studio would use the [above APIs](#from-buildmanager-api-users-who-have-a-project-dependency-graph-at-hand-and-want-to-manually-issue-builds-for-each-graph-node-in-reverse-topo-sort-order) to set project cache plugins. Since VS evaluates all projects in a solution, it could search for `ProjectCachePlugin` items and provide them back to MSBuild during real builds via `BuildParameters.ProjectCacheDescriptor`. Until that happens, a workaround will be used: - - The workaround logic activates only when MSBuild detects that it's running under VS. - - Plugin discovery - - When VS evaluates projects via "new Project(..)" (it does this on all the solution projects on solution load), the evaluator will search for and store all detected plugins in a static field on the `BuildManager`. - - Plugin usage: - - The first build request will check the static state for the presence of plugins. If there's a plugin, it will instantiate it via plugin.BeginBuild. - -# Details -- Plugin discovery - - Each project defines an item containing the path to the plugin DLL: ```xml - + + + ``` -- Plugin acquisition - - Via the dependency manager of choice. PackageReference / Nuget for managed projects, pacman / vcpkg / nuget on packages.config for C++. The package contents injects the plugin item into the project import graph. -- Defining the "graph" when static graph is not available - - Plugins need to know the top level entry point for various reasons, but without a static graph the entry points need to be explicitly declared or inferred. - - Entry points are set via `ProjectCacheDescriptor.EntryPoints`. - - The Visual Studio workaround will use the `SolutionPath` global property as the graph entry point. -- Returning a valid BuildResult object on cache hits. - - On cache hits, MSBuild skips the project, but needs a BuildResult with target results to send back to the [Scheduler](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/BackEnd/Components/Scheduler/Scheduler.cs#L25). +- Progrmatic usage of `BuildManager` can also set `BuildParameters.ProjectCacheDescriptor` to apply a plugin to all requests. + +## Plugin lifetime + +- Plugin instances reside only in the `BuildManager` node. Having it otherwise (plugin instances residing in all nodes) means forcing the plugins to either deal with distributed state or implement a long lived service. We consider this high complexity cost to not be worth it. We also want to avoid serializing the `ProjectInstance` between nodes, which is expensive. +- `BuildManager.BeginBuild` calls `ProjectCacheBase.BeginBuildAsync` on all discovered plugins. This allows plugins to start any required initialization work. It does not wait for the plugins to fully initialize, ie it is a "fire-and-forget" call at this point. The first query on the plugin will wait for plugin initialization. +- `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync` on all discovered plugins. This allows plugins to perform any required cleanup work. This is a blocking call which will be awaited before th build can complete. +- Only the user provided top level build requests are checked against the cache. The build requests issued recursively from the top level requests are not checked against the cache, since it is assumed that users issue build requests in reverse toposort order. Therefore when a project builds its references, those references should have already been built and present in MSBuild's internal cache, provided either by the project cache plugin or real builds. In other words, projects which are not well-described in the graph (eg using `` tasks directly) will not benefit from the cache. +- The plugin instance will get called in reverse topo sort order (from dependencies up towards dependents). This happens based on the requirement of using `/graph`. Building in reverse topo sort order is common between Visual Studio solution builds and higher build engines. +- Plugins can function with and without a static graph. When a static graph is not provided, hints about the graph entry points are provided (details in Defining the "graph" when static graph is not available). + +## Cache get scenario + +- For each [`BuildRequestData`](/src/Build/BackEnd/BuildManager/BuildRequestData.cs#L83) ([`ProjectInstance`](/src/Build/Instance/ProjectInstance.cs#L71), Global Properties, Targets) submitted to the [`BuildManager`](/src/Build/BackEnd/BuildManager/BuildManager.cs#L38), MSBuild asks the plugin whether to build the request or not. + - If the `BuildRequestData` is based on a project path instead of a `ProjectInstance`, the project is evaluated by the `BuildManager`. +- If the plugin decides to build, then MSBuild proceeds building the project as usual. +- If the plugin decides to skip the build, it needs to return back to MSBuild the target results that the build request would have produced. It can either provide the results directly, or instruct MSBuild to run a set of less expensive targets on the projects with the same effect as the expensive targets ("proxy targets"). + - MSBuild injects the `BuildResult` into its Scheduler, so that future projects that need to call into the skipped project have the target results they need served directly from MSBuild's internal cache. - Plugins have three options: - - Worst: plugins fake the build results for each target. We consider this brittle since the plugins will have to be updated whenever the build logic changes. - - Better: plugins tell MSBuild to run a proxy target as a replacement for the expensive target (e.g. it tells MSBuild to run `GetTargetPath` and use those results for the Build target). See the [ProjectReference protocol](https://github.com/dotnet/msbuild/blob/main/documentation/ProjectReference-Protocol.md) for more details. + - Worst: plugins fake the build results for each target based on assumptions about how the target executes. We consider this brittle since the plugins will have to be updated whenever the build logic changes. + - Better: plugins tell MSBuild to run a proxy target as a replacement for the expensive target (e.g. it tells MSBuild to run `GetTargetPath` and use those results for the `Build` target). See the [ProjectReference protocol](/documentation/ProjectReference-Protocol.md) for more details. - Proxy target assumptions: - They are very fast and only retrieve items and properties from the evaluated state (like `GetTargetPath`). - They do not mutate state (file system, environment variables, etc). - They do not MSBuild task call into other projects. - - The BuildManager schedules the proxy targets to build on the in proc node to avoid ProjectInstance serialization costs. - - Best: when the plugin's infrastructure (e.g. cloudbuild or anybuild builder nodes) runs and caches the build, it can tell MSBuild to serialize the BuildResult to a file via [BuildParameters.OutputResultsCacheFile](https://github.com/dotnet/msbuild/blob/d39f2e4f5f3d461bc456f9abed9adec4a2f0f542/src/Build/BackEnd/BuildManager/BuildParameters.cs#L767) or the `/outputResultsCache` command line argument. Then, on cache hits, the plugins deserialize the BuildResult and send it back to MSBuild. This is the most correct option, as it requires neither guessing nor proxy targets. Whatever the build did, that's what's returned. - - This is not yet possible. Outputting results cache files needs to first be decoupled from `/isolate`. - - Potential Issue: serialization format may change between runtime msbuild and the cache results file, especially if binary serialization is used. -- Configuring plugins - - Plugin configuration options can be provided as metadata on the `ProjectCachePlugin` item. + - The BuildManager schedules the proxy targets to build on the in-proc node to avoid `ProjectInstance` serialization costs. + - Best: A real `BuildResult` from a previous build is provided. This can either be done by serializing the `HandleProjectFinishedAsync`, or when the plugin's infrastructure (e.g. CloudBuild or AnyBuild builder nodes) runs and caches the build, it can tell MSBuild to serialize the BuildResult to a file via [BuildParameters.OutputResultsCacheFile](/src/Build/BackEnd/BuildManager/BuildParameters.cs#L767) or the `/outputResultsCache` command line argument. Then, on cache hits, the plugins deserialize the `BuildResult` and send it back to MSBuild. This is the most correct option, as it requires neither guessing nor proxy targets. Whatever a previous build did, that's exactly what's returned. + - Potential Issue: serialization format may change between writing and reading the `BuildResult`, especially if binary serialization is used. + +## Cache add scenario + +- Upon a cache miss, MSBuild will generally handle a request as normal, ie by building it. +- To facilitate the plugin being able to hande future builds, MSBuild can report file accesses as well as the build result for the BuildRequestData. +- MSBuild disables the in-proc node and uses [Detours](https://github.com/microsoft/Detours) to observe file accesses of the worker nodes. It forwards this information to the plugin for it to use as desired. + - This functionality has some implementation restrictions so will require additional opt-in. Specifically, the `/ReportFileAccesses` command-line flag or by setting `BuildParameters.ReportFileAccesses` for programatic use of `BuildManager`. If this is not set, no file accesses will be reported to the plugin, however the plugin will still be notified of the build result. +- Due to the experimental nature of the feature, `/ReportFileAccesses` is only available with MSBuild.exe (ie. the Visual Studio install; not `dotnet`) and only from the command-line. The Visual Studio IDE does not set `BuildParameters.ReportFileAccesses`. +- As described above, it is recommended to serialize the `BuildResult` from `HandleProjectFinishedAsync` for later replay. + +# APIs and calling patterns + +## Plugin API +[ProjectCachePluginBase](/src/Build/BackEnd/Components/ProjectCache/ProjectCachePluginBase.cs) is an abstract class which plugin implementors will subclass. It contains the following methods: + +```cs +public abstract Task BeginBuildAsync(CacheContext context, PluginLoggerBase logger, CancellationToken cancellationToken); + +public abstract Task EndBuildAsync(PluginLoggerBase logger, CancellationToken cancellationToken); + +public abstract Task GetCacheResultAsync(BuildRequestData buildRequest, PluginLoggerBase logger, CancellationToken cancellationToken); + +public virtual void HandleFileAccess(FileAccessContext fileAccessContext, FileAccessData fileAccessData); + +public virtual void HandleProcess(FileAccessContext fileAccessContext, ProcessData processData); + +public virtual Task HandleProjectFinishedAsync(FileAccessContext fileAccessContext, BuildResult buildResult, PluginLoggerBase logger, CancellationToken cancellationToken); +``` + +## Configuring plugins + +Plugins may need configuration options provided by the user. This can be done via metadata on the `ProjectCachePlugin` item: + ```xml - + + + $(PluginSetting1) + $(PluginSetting2) + $(PluginSetting3) + + ``` -- Configuring MSBuild to query the caches but not do any builds (bin-place from the cache without building anything): - - From command line: `msbuild /graph:NoBuild` - - From APIs: `GraphBuildRequestData.GraphBuildRequestDataFlags.{NoBuild}` -- Logging - - Log messages from `Plugin.{BeginBuild, EndBuild}` do not have a parent build event context and get displayed at the top level in the binlog. - - Log messages from querying a project get parented under that project's logging context. - - This is not yet implemented. For now, all plugin log messages do not have a parent event context. + +This can then be accessed by the plugin in `BeginBuildAsync` as a dictionary via `CacheContext.PluginSettings`. + +Note: As it is likely that plugins will be distributed through NuGet packages and those packages would define the `ProjectCachePlugin` item in a props or targets file in the package, it's recommended for plugin authors to have settings backed by MSBuild properties as in the example above. This allows the user to easily configure a plugin simply by setting the properties and including the `PackageReference`. + +## Enabling from command line + +- Requires `/graph` to light up cache get scenarios. +- Requires `/reportfileaccesses` to light up cache add scenarios. +- The static graph has all the project instances in the same process, makes it easy to find and keep plugin instances in one process. +- MSBuild constructs the static graph and build bottom up, so by the time a project is considered, all of its references and their build results are already present in the Scheduler. + +## Enabling from Visual Studio, a temporary workaround + +- Ideally, Visual Studio would provide a `ProjectGraph` instance. Until that happens, a workaround is needed. +- The workaround logic activates only when MSBuild detects that it's running under VS. +- When VS evaluates projects via `new Project(..)` (it does this on all the solution projects on solution load), the evaluator will search for and store all detected plugins in a static field on the `BuildManager`. +- The first build request will check the static state for the presence of plugins. If there's a plugin, it will initialize it at that point. +- Plugins will be given the graph entry points instead of the entire graph in this scenario. +- There is currently no way to enable cache add scenarios in Visual Studio. + +# Detours (cache add scenario) + +In order for MSBuild to observe the file accesses as part of the build, it uses Detours on the worker nodes. In this way the Scheduler node will events for all file accesses done by the worker nodes. As the Scheduler knows what build request a worker node is working on at any given moment, it is able to properly associate the file access with a build request and dispatch these augmented events to plugins via the plugins' `HandleFileAccess` and `HandleProcess` implementations. + +Note that the Scheduler node cannot use Detours on itself, so the in-proc node is disabled when repoting file accesses. Additionally task yielding is disabled since it would leave to improperly associated file accesses. + +## Pipe synchronization + +Because the Detours implementation being used communicates over a pipe, and node communication is also over a pipe, and pipes are async, there is some coordination required for ensuring that file accesses are associated with the proper build request. For example, if a "project finished" signal comes through the node communication pipe, but the detours pipe still has a queue of file accesses which have not been processed yet, those file accesses might be processed after the worker node has moved onto some other project. + +To address this problem, when a worker node finishes a project it will emit a dummy file access with a specific format known to MSBuild. When the scheduler not recieves as "project finished" event over the node communication pipe, it will wait to determine that the project is actually finished until it also recieves the dummy file access. This ensures that the all file accesses associated with the project have fully flushed from the pipe before the scheduler deterines the project is finished and schedules new work to the worker node (which would trigger new file accesses). + +# Plugin implementation guidance and simple example design + +The following will describe a very basic (and not very correct) plugin implementation. + +In practice, plugins will have to choose the specific level of correctness they're willing to trade off for the ability to get cache hits. Any machine state *could* impact build results, and the plugin implementation will need to determine what state matters and what doesn't. An obvious example to consider would be the content of the project file. An example which has trade-offs would be the processes' environment variables. Even the current time could possibly impact the build ("if Tuesday copy this file"), but if considered caching would be quite infeasible. + +## Fingerprinting + +A "fingerprint" describes each unique input which went into the building a build request. The more granular the fingerprint, the more "correct" the caching is, as described above. + +In this example, we will only consider the following as inputs, and thus part of the fingerprint: +- The global properties of the build request (eg `Configuration=Debug`, `Platform=AnyCPU`) +- The content hash of the project file +- The content hash of files defined in specific items we know contribute to the build, like `` and ``. +- The fingerprint of referenced projects + +Again, this is for illustrative purposes and a real implementation will want to use additional state for fingerprinting depending on the environment in which it runs and the correctness requirements. + +It can make sense for a fingerprint to be a hash of its inputs, so effectively is a byte array which can be represented by a string. + +At the beginning of the build, the plugin's `BeginBuildAsync` method will be called. As part of the `CacheContext`, the plugin is either given the graph or the entry points to the graph for which it can create a graph from. The plugin can use this graph to do some initial processing, like predicting various inputs which a project is likely to use. This information can then be stored to help construct a fingerprint for a build request later. + +## Cache storage + +Any storage mechanism can be used as a cache implementation, for example [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs/), or even just the local filesystem. At least in this example the only real requirement is that it can be used effectively as a key-value store. In many cases it can be useful for content to be keyed by its hash, and for the metadata file to be keyed by the fingerprint. In particular when content is keyed by hash, it is effectively deduplicated across multiple copies of the same file, which is common in builds. + +For illustration purposes, consider our cache implementation is based on a simple filesystem which a separate metadata and content directory inside it. Under the metadata dir, each file is a metadata file where the filename matches the fingerprint it's describing. Under the content dir, each file is a content file where the filename matches the hash of the content itself. + +## First build (cache population) + +In the very first build there will be no cache hits so the "cache add" scenario will be most relevant here. + +For a given project, `GetCacheResultAsync` will be invoked, but will end up returning a cache miss since the cache is empty. + +MSBuild will then build the project normally but under a detoured worker node. Because of this, the plugin will recieve `HandleFileAccess` and `HandleProcess` events. In this example implementation we will ignore `HandleProcess`. For `HandleFileAccess`, the plugin will simply store all `FileAccessData`s for a `FileAccessContext` to build up a list of all file accesses during the build. The plugin may decide to avoid storing the entire `FileAccessData` and instead just peel off the data it finds relevant (eg. paths, whether it was a read or write, etc). + +Once MSBuild is done building the project, it will call the plugin's `HandleProjectFinishedAsync`. Now the plugin knows the project is done and can process the results and add them a cache. In general it's only useful to cache successful results, so the plugin should filter out non-success results. The `FileAccessContext` provided can then be used to retrieve the list of `FileAccessData` the plugin recieved. These `FileAccessData` can be processed to understand which files were read and writted as part of the build. + +In our example, we can use the read files to construct a fingerprint for the build request. We can then add the files written during the build ("outputs") to some cache implementation. + +The plugin would then create some metadata describing the outputs (eg. the paths and hashes) and the serialized `BuildResult`, and associate it with the fingerprint and put that assocation in the cache. + +To illustrate this, consider a project with fingerprint `F` which wrote a single file `O` with hash `H` and had `BuildResult R`. The plugin could create a metadata file `M` which describes the outputs of the build (the path and hash of `O`) as well as the serialized `R`. Using the cache implementation described above, the plugin would write the following two files to the cache: + - `metadata/F -> M` + - `content/H -> O` + +This can then be used for future builds. + + ## Second Build (cache hits) + + In the second build we have a populated cache and so it could be possible to get cache hits. + + For a given project, `GetCacheResultAsync` will be invoked. The plugin can fingerprint the request and use that fingerprint to look up in its cache. If the cache entry exists, it can declare a cache hit. + +In the example above, if all inputs are the same as in the first build, we should end up with a fingerprint `F`. We look up in the metadata part of the cache (file `metadata/F`) and find that it exists. This means we have a cache hit. We can fetch that metadata `M` from the cache and find that is describes the output with path `O` and hash `H`. The plugin would then copy `content/H` to `O` and return the deserialized `BuildResult R` contained in `M` to MSBuild. + +If the inputs were not the same as in the first build, for example if a `Compile` item (a .cs file) changed, the fingerprint would be something else besides `F` and so would not have corresponding cache entries for it, indicating a cache miss. This will then go through the "cache add" scenario described above to populate the cache with the new fingerprint. # Caveats +- Without the "cache add" scenario enabled, the content which powers "cache get" must be populated by some external entity, for example some higher-order build engine. - Absolute paths circulating through the saved build results - Absolute paths will likely break the build, since they'd be captured on the machine that writes to the cache. + - Plugins can attempt to normalize well-known paths, like the repo root, but this can be brittle and there may be unknown path types. - Slow connections. In a coffee shop it might be faster to build everything instead of downloading from the cache. Consider racing plugin checks and building: if the bottom up build traversal reaches a node that's still querying the cache, cancel the cache query and build the node instead. - Inferring what targets to run on each node when using /graph - - Msbuild /graph requires that the [target inference protocol](https://github.com/dotnet/msbuild/blob/main/documentation/specs/static-graph.md#inferring-which-targets-to-run-for-a-project-within-the-graph) is good enough. + - Msbuild /graph requires that the [target inference protocol](static-graph.md#inferring-which-targets-to-run-for-a-project-within-the-graph) is good enough. - Small repos will probably be slower with plugin implementations that access the network. Remote distribution and caching will only be worth it for repos that are large enough. -# Future work -- On cache misses plugins can build the project with IO monitoring and write to the local cache. As far as we can tell there are two main possibilities: - - plugins build the projects themselves in isolation (without projects building their reference, probably by setting `BuildProjectReferences` to false) by calling msbuild.exe. - - plugins request msbuild to build the projects on special out of proc nodes whose IO system calls can be monitored. - -# Potential work of dubious value -- Allow multiple plugin instances and query them based on some priority, similar to sdk resolvers. +# Potential future work of dubious value - Enable plugins to work with the just-in-time top down msbuild traversal that msbuild natively does when it's not using `/graph`. - Extend the project cache API to allow skipping individual targets or tasks instead of entire projects. This would allow for smaller specialized plugins, like plugins that only know to distribute, cache, and skip CSC.exe calls. From 1e78fa90a4987b15709d472c62d4ceec1c9ee143 Mon Sep 17 00:00:00 2001 From: David Federman Date: Fri, 20 Oct 2023 10:52:58 -0700 Subject: [PATCH 2/7] PR feedback --- documentation/specs/project-cache.md | 52 +++++++++++----------------- 1 file changed, 21 insertions(+), 31 deletions(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index 40afc4881a7..2e15b5bc3c7 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -18,7 +18,7 @@ This change also simplifies and unifies user experiences. MSBuild works the same # High-level design -Conceptually, there are two parts of caching: "cache get" and "cache add". "Cache get" is MSBuild asking the plugin if it wants to handle a build request, ie by fetching from some cache. "Cache add" is, upon cache miss, MSBuild providing enough information to the plugin during the build of the build request for the plugin add the results to its cache and safely be able to retrieve it for some future build. +Conceptually, there are two parts of caching: "cache get" and "cache add". "Cache get" is MSBuild asking the plugin if it wants to handle a build request, ie by fetching from some cache. "Cache add" is, upon cache miss, MSBuild providing enough information to the plugin during the build of the build request for the plugin to add the results to its cache and safely be able to retrieve it for some future build. The "cache get" functionality was introduced in 16.9, while "cache add" was added in 17.8. @@ -32,20 +32,21 @@ The "cache get" functionality was introduced in 16.9, while "cache add" was adde ``` -- Progrmatic usage of `BuildManager` can also set `BuildParameters.ProjectCacheDescriptor` to apply a plugin to all requests. +- Programmatic usage of `BuildManager` can also set `BuildParameters.ProjectCacheDescriptor` to apply a plugin to all requests. ## Plugin lifetime - Plugin instances reside only in the `BuildManager` node. Having it otherwise (plugin instances residing in all nodes) means forcing the plugins to either deal with distributed state or implement a long lived service. We consider this high complexity cost to not be worth it. We also want to avoid serializing the `ProjectInstance` between nodes, which is expensive. - `BuildManager.BeginBuild` calls `ProjectCacheBase.BeginBuildAsync` on all discovered plugins. This allows plugins to start any required initialization work. It does not wait for the plugins to fully initialize, ie it is a "fire-and-forget" call at this point. The first query on the plugin will wait for plugin initialization. -- `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync` on all discovered plugins. This allows plugins to perform any required cleanup work. This is a blocking call which will be awaited before th build can complete. -- Only the user provided top level build requests are checked against the cache. The build requests issued recursively from the top level requests are not checked against the cache, since it is assumed that users issue build requests in reverse toposort order. Therefore when a project builds its references, those references should have already been built and present in MSBuild's internal cache, provided either by the project cache plugin or real builds. In other words, projects which are not well-described in the graph (eg using `` tasks directly) will not benefit from the cache. -- The plugin instance will get called in reverse topo sort order (from dependencies up towards dependents). This happens based on the requirement of using `/graph`. Building in reverse topo sort order is common between Visual Studio solution builds and higher build engines. -- Plugins can function with and without a static graph. When a static graph is not provided, hints about the graph entry points are provided (details in Defining the "graph" when static graph is not available). + - `BeginBuildAsync` may be called with or without a `ProjectGraph`, depending on MSBuild has one to provide. When it is not provided, hints about the graph entry points are provided with which the plugin may decide to construct the `ProjectGraph` itself, if desired. +- `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync` on all discovered plugins. This allows plugins to perform any required cleanup work. This is a blocking call which will be awaited before the build can complete. +- The plugin instance will get called in reverse topological sort order (from dependencies up towards dependents). This happens when performing a graph build (`/graph`), Visual Studio solution builds, and commonly in higher build engines. +- Only the top-level build requests are checked against the cache. Build requests issued recursively from the top-level requests, for example a project building its dependencies, are not checked against the cache. However, because the build requests are assumed to be issued in reverse topological sort order, those requests should have already been built and present in MSBuild's internal result cache, provided either by the project cache plugin or real builds. A consequence of this is that, projects which are not well-described in the graph (eg using `` tasks directly) will not benefit from the cache. ## Cache get scenario - For each [`BuildRequestData`](/src/Build/BackEnd/BuildManager/BuildRequestData.cs#L83) ([`ProjectInstance`](/src/Build/Instance/ProjectInstance.cs#L71), Global Properties, Targets) submitted to the [`BuildManager`](/src/Build/BackEnd/BuildManager/BuildManager.cs#L38), MSBuild asks the plugin whether to build the request or not. + - If the `BuildRequestData` is based on a project path instead of a `ProjectInstance`, the project is evaluated by the `BuildManager`. - If the plugin decides to build, then MSBuild proceeds building the project as usual. - If the plugin decides to skip the build, it needs to return back to MSBuild the target results that the build request would have produced. It can either provide the results directly, or instruct MSBuild to run a set of less expensive targets on the projects with the same effect as the expensive targets ("proxy targets"). @@ -64,30 +65,19 @@ The "cache get" functionality was introduced in 16.9, while "cache add" was adde ## Cache add scenario - Upon a cache miss, MSBuild will generally handle a request as normal, ie by building it. -- To facilitate the plugin being able to hande future builds, MSBuild can report file accesses as well as the build result for the BuildRequestData. -- MSBuild disables the in-proc node and uses [Detours](https://github.com/microsoft/Detours) to observe file accesses of the worker nodes. It forwards this information to the plugin for it to use as desired. - - This functionality has some implementation restrictions so will require additional opt-in. Specifically, the `/ReportFileAccesses` command-line flag or by setting `BuildParameters.ReportFileAccesses` for programatic use of `BuildManager`. If this is not set, no file accesses will be reported to the plugin, however the plugin will still be notified of the build result. -- Due to the experimental nature of the feature, `/ReportFileAccesses` is only available with MSBuild.exe (ie. the Visual Studio install; not `dotnet`) and only from the command-line. The Visual Studio IDE does not set `BuildParameters.ReportFileAccesses`. +- MSBuild uses [Detours](https://github.com/microsoft/Detours) to observe file accesses of the worker nodes. To facilitate the plugin being able to handle future builds, it forwards this information as well as the build result to the plugin for it to use as desired, for example to add to a cache. + - This functionality has some implementation restrictions so will require additional opt-in. Specifically, the `/ReportFileAccesses` command-line flag or by setting `BuildParameters.ReportFileAccesses` for programmatic use of `BuildManager`. If this is not set, no file accesses will be reported to the plugin, however the plugin will still be notified of the build result. + - The in-proc node is disabled since MSBuild is unable to use Detours on the currently running process. It also would not want to capture the file accesses of the plugins themselves. + - Detours adds some overhead to file accesses. Based on initial experimentation, it's around 10-15%. There's the overhead of the plugin adding to the cache. Caching becomes valuable if it can save more than the overhead on average. +- Due to the experimental nature of the feature, `/ReportFileAccesses` is only available with MSBuild.exe (ie. the Visual Studio install; not `dotnet`), only for the x64 flavor (not x86 or arm64), and only from the command-line. The Visual Studio IDE does not set `BuildParameters.ReportFileAccesses`. - As described above, it is recommended to serialize the `BuildResult` from `HandleProjectFinishedAsync` for later replay. # APIs and calling patterns ## Plugin API -[ProjectCachePluginBase](/src/Build/BackEnd/Components/ProjectCache/ProjectCachePluginBase.cs) is an abstract class which plugin implementors will subclass. It contains the following methods: +[ProjectCachePluginBase](/src/Build/BackEnd/Components/ProjectCache/ProjectCachePluginBase.cs) is an abstract class which plugin implementors will subclass. -```cs -public abstract Task BeginBuildAsync(CacheContext context, PluginLoggerBase logger, CancellationToken cancellationToken); - -public abstract Task EndBuildAsync(PluginLoggerBase logger, CancellationToken cancellationToken); - -public abstract Task GetCacheResultAsync(BuildRequestData buildRequest, PluginLoggerBase logger, CancellationToken cancellationToken); - -public virtual void HandleFileAccess(FileAccessContext fileAccessContext, FileAccessData fileAccessData); - -public virtual void HandleProcess(FileAccessContext fileAccessContext, ProcessData processData); - -public virtual Task HandleProjectFinishedAsync(FileAccessContext fileAccessContext, BuildResult buildResult, PluginLoggerBase logger, CancellationToken cancellationToken); -``` +See the [Plugin implementation guidance and simple example design](#plugin-implementation-guidance-and-simple-example-design) section for guidance for plugin implementations. ## Configuring plugins @@ -111,7 +101,7 @@ Note: As it is likely that plugins will be distributed through NuGet packages an - Requires `/graph` to light up cache get scenarios. - Requires `/reportfileaccesses` to light up cache add scenarios. -- The static graph has all the project instances in the same process, makes it easy to find and keep plugin instances in one process. +- The static graph has all the project instances in the same process, making it easy to find and keep plugin instances in one process. - MSBuild constructs the static graph and build bottom up, so by the time a project is considered, all of its references and their build results are already present in the Scheduler. ## Enabling from Visual Studio, a temporary workaround @@ -125,15 +115,15 @@ Note: As it is likely that plugins will be distributed through NuGet packages an # Detours (cache add scenario) -In order for MSBuild to observe the file accesses as part of the build, it uses Detours on the worker nodes. In this way the Scheduler node will events for all file accesses done by the worker nodes. As the Scheduler knows what build request a worker node is working on at any given moment, it is able to properly associate the file access with a build request and dispatch these augmented events to plugins via the plugins' `HandleFileAccess` and `HandleProcess` implementations. +In order for MSBuild to observe the file accesses as part of the build, it uses Detours on the worker nodes. In this way the Scheduler node will emit events for all file accesses done by the worker nodes. As the Scheduler knows what build request a worker node is working on at any given moment, it is able to properly associate the file access with a build request and dispatch these augmented events to plugins via the plugins' `HandleFileAccess` and `HandleProcess` implementations. Note that the Scheduler node cannot use Detours on itself, so the in-proc node is disabled when repoting file accesses. Additionally task yielding is disabled since it would leave to improperly associated file accesses. ## Pipe synchronization -Because the Detours implementation being used communicates over a pipe, and node communication is also over a pipe, and pipes are async, there is some coordination required for ensuring that file accesses are associated with the proper build request. For example, if a "project finished" signal comes through the node communication pipe, but the detours pipe still has a queue of file accesses which have not been processed yet, those file accesses might be processed after the worker node has moved onto some other project. +Because the Detours implementation being used communicates over a pipe, and nodes communicate over a pipe as well, and pipes are async, there is some coordination required to ensure that file accesses are associated with the proper build request. For example, if a "project finished" signal comes through the node communication pipe, but the detours pipe still has a queue of file accesses which have not been processed yet, those file accesses might be processed after the worker node has moved onto some other project. -To address this problem, when a worker node finishes a project it will emit a dummy file access with a specific format known to MSBuild. When the scheduler not recieves as "project finished" event over the node communication pipe, it will wait to determine that the project is actually finished until it also recieves the dummy file access. This ensures that the all file accesses associated with the project have fully flushed from the pipe before the scheduler deterines the project is finished and schedules new work to the worker node (which would trigger new file accesses). +To address this problem, when a worker node finishes a project it will emit a dummy file access with a specific format known to MSBuild. When the scheduler node receives as "project finished" event over the node communication pipe, it will wait to determine that the project is actually finished until it also receives the dummy file access. This ensures that the all file accesses associated with the project have fully flushed from the pipe before the scheduler determines the project is finished and schedules new work to the worker node (which would trigger new file accesses). # Plugin implementation guidance and simple example design @@ -148,7 +138,7 @@ A "fingerprint" describes each unique input which went into the building a build In this example, we will only consider the following as inputs, and thus part of the fingerprint: - The global properties of the build request (eg `Configuration=Debug`, `Platform=AnyCPU`) - The content hash of the project file -- The content hash of files defined in specific items we know contribute to the build, like `` and ``. +- The content hash of files defined in specific items we know contribute to the build, like `` and `` - The fingerprint of referenced projects Again, this is for illustrative purposes and a real implementation will want to use additional state for fingerprinting depending on the environment in which it runs and the correctness requirements. @@ -171,7 +161,7 @@ For a given project, `GetCacheResultAsync` will be invoked, but will end up retu MSBuild will then build the project normally but under a detoured worker node. Because of this, the plugin will recieve `HandleFileAccess` and `HandleProcess` events. In this example implementation we will ignore `HandleProcess`. For `HandleFileAccess`, the plugin will simply store all `FileAccessData`s for a `FileAccessContext` to build up a list of all file accesses during the build. The plugin may decide to avoid storing the entire `FileAccessData` and instead just peel off the data it finds relevant (eg. paths, whether it was a read or write, etc). -Once MSBuild is done building the project, it will call the plugin's `HandleProjectFinishedAsync`. Now the plugin knows the project is done and can process the results and add them a cache. In general it's only useful to cache successful results, so the plugin should filter out non-success results. The `FileAccessContext` provided can then be used to retrieve the list of `FileAccessData` the plugin recieved. These `FileAccessData` can be processed to understand which files were read and writted as part of the build. +Once MSBuild is done building the project, it will call the plugin's `HandleProjectFinishedAsync`. Now the plugin knows the project is done and can process the results and add them to a cache. In general it's only useful to cache successful results, so the plugin should filter out non-success results. The `FileAccessContext` provided can then be used to retrieve the list of `FileAccessData` the plugin recieved. These `FileAccessData` can be processed to understand which files were read and written as part of the build. In our example, we can use the read files to construct a fingerprint for the build request. We can then add the files written during the build ("outputs") to some cache implementation. @@ -189,7 +179,7 @@ This can then be used for future builds. For a given project, `GetCacheResultAsync` will be invoked. The plugin can fingerprint the request and use that fingerprint to look up in its cache. If the cache entry exists, it can declare a cache hit. -In the example above, if all inputs are the same as in the first build, we should end up with a fingerprint `F`. We look up in the metadata part of the cache (file `metadata/F`) and find that it exists. This means we have a cache hit. We can fetch that metadata `M` from the cache and find that is describes the output with path `O` and hash `H`. The plugin would then copy `content/H` to `O` and return the deserialized `BuildResult R` contained in `M` to MSBuild. +In the example above, if all inputs are the same as in the first build, we should end up with a fingerprint `F`. We look up in the metadata part of the cache (file `metadata/F`) and find that it exists. This means we have a cache hit. We can fetch that metadata `M` from the cache and find that it describes the output with path `O` and hash `H`. The plugin would then copy `content/H` to `O` and return the deserialized `BuildResult R` contained in `M` to MSBuild. If the inputs were not the same as in the first build, for example if a `Compile` item (a .cs file) changed, the fingerprint would be something else besides `F` and so would not have corresponding cache entries for it, indicating a cache miss. This will then go through the "cache add" scenario described above to populate the cache with the new fingerprint. From cd782c5c3b40416eb4d2302ee89ab5edd7b73cb9 Mon Sep 17 00:00:00 2001 From: David Federman Date: Wed, 20 Dec 2023 13:22:27 -0800 Subject: [PATCH 3/7] PR feedback Co-authored-by: Rainer Sigwald --- documentation/specs/project-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index 2e15b5bc3c7..96cc741b0ba 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -40,7 +40,7 @@ The "cache get" functionality was introduced in 16.9, while "cache add" was adde - `BuildManager.BeginBuild` calls `ProjectCacheBase.BeginBuildAsync` on all discovered plugins. This allows plugins to start any required initialization work. It does not wait for the plugins to fully initialize, ie it is a "fire-and-forget" call at this point. The first query on the plugin will wait for plugin initialization. - `BeginBuildAsync` may be called with or without a `ProjectGraph`, depending on MSBuild has one to provide. When it is not provided, hints about the graph entry points are provided with which the plugin may decide to construct the `ProjectGraph` itself, if desired. - `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync` on all discovered plugins. This allows plugins to perform any required cleanup work. This is a blocking call which will be awaited before the build can complete. -- The plugin instance will get called in reverse topological sort order (from dependencies up towards dependents). This happens when performing a graph build (`/graph`), Visual Studio solution builds, and commonly in higher build engines. +- The plugin instance will get called in reverse topological sort order (from referenced projects up towards referencing projects). This happens when performing a graph build (`/graph`), Visual Studio solution builds, and commonly in higher build engines. - Only the top-level build requests are checked against the cache. Build requests issued recursively from the top-level requests, for example a project building its dependencies, are not checked against the cache. However, because the build requests are assumed to be issued in reverse topological sort order, those requests should have already been built and present in MSBuild's internal result cache, provided either by the project cache plugin or real builds. A consequence of this is that, projects which are not well-described in the graph (eg using `` tasks directly) will not benefit from the cache. ## Cache get scenario From b344365ea37a48db0dd8f006a66c47b888220067 Mon Sep 17 00:00:00 2001 From: David Federman Date: Wed, 20 Dec 2023 13:22:40 -0800 Subject: [PATCH 4/7] PR feedback Co-authored-by: Rainer Sigwald --- documentation/specs/project-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index 96cc741b0ba..f030c110e8c 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -41,7 +41,7 @@ The "cache get" functionality was introduced in 16.9, while "cache add" was adde - `BeginBuildAsync` may be called with or without a `ProjectGraph`, depending on MSBuild has one to provide. When it is not provided, hints about the graph entry points are provided with which the plugin may decide to construct the `ProjectGraph` itself, if desired. - `BuildManager.EndBuild` calls `ProjectCacheBase.EndBuildAsync` on all discovered plugins. This allows plugins to perform any required cleanup work. This is a blocking call which will be awaited before the build can complete. - The plugin instance will get called in reverse topological sort order (from referenced projects up towards referencing projects). This happens when performing a graph build (`/graph`), Visual Studio solution builds, and commonly in higher build engines. -- Only the top-level build requests are checked against the cache. Build requests issued recursively from the top-level requests, for example a project building its dependencies, are not checked against the cache. However, because the build requests are assumed to be issued in reverse topological sort order, those requests should have already been built and present in MSBuild's internal result cache, provided either by the project cache plugin or real builds. A consequence of this is that, projects which are not well-described in the graph (eg using `` tasks directly) will not benefit from the cache. +- Only the top-level build requests are checked against the cache. Build requests issued recursively from the top-level requests, for example a project building its dependencies, are not checked against the cache. However, because the build requests are assumed to be issued in reverse topological sort order, those requests should have already been built and present in MSBuild's internal result cache, provided either by the project cache plugin or real builds. A consequence of this is that projects which are not well-described in the graph (e.g. using `` tasks directly) will not benefit from the cache. ## Cache get scenario From dd06d5490b998d14464908e211427f81bf005ef7 Mon Sep 17 00:00:00 2001 From: David Federman Date: Wed, 20 Dec 2023 13:22:52 -0800 Subject: [PATCH 5/7] PR feedback Co-authored-by: Rainer Sigwald --- documentation/specs/project-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index f030c110e8c..e7785e8c0a4 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -149,7 +149,7 @@ At the beginning of the build, the plugin's `BeginBuildAsync` method will be cal ## Cache storage -Any storage mechanism can be used as a cache implementation, for example [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs/), or even just the local filesystem. At least in this example the only real requirement is that it can be used effectively as a key-value store. In many cases it can be useful for content to be keyed by its hash, and for the metadata file to be keyed by the fingerprint. In particular when content is keyed by hash, it is effectively deduplicated across multiple copies of the same file, which is common in builds. +Any storage mechanism can be used as a cache implementation, for example [Azure Blob Storage](https://azure.microsoft.com/products/storage/blobs/), or even just the local filesystem. At least in this example the only real requirement is that it can be used effectively as a key-value store. In many cases it can be useful for content to be keyed by its hash, and for the metadata file to be keyed by the fingerprint. In particular when content is keyed by hash, it is effectively deduplicated across multiple copies of the same file, which is common in builds. For illustration purposes, consider our cache implementation is based on a simple filesystem which a separate metadata and content directory inside it. Under the metadata dir, each file is a metadata file where the filename matches the fingerprint it's describing. Under the content dir, each file is a content file where the filename matches the hash of the content itself. From 97a51c298464e81e013e2746b795d061232aae2f Mon Sep 17 00:00:00 2001 From: David Federman Date: Wed, 20 Dec 2023 13:23:04 -0800 Subject: [PATCH 6/7] PR feedback Co-authored-by: Rainer Sigwald --- documentation/specs/project-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index e7785e8c0a4..66beb99cfe9 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -151,7 +151,7 @@ At the beginning of the build, the plugin's `BeginBuildAsync` method will be cal Any storage mechanism can be used as a cache implementation, for example [Azure Blob Storage](https://azure.microsoft.com/products/storage/blobs/), or even just the local filesystem. At least in this example the only real requirement is that it can be used effectively as a key-value store. In many cases it can be useful for content to be keyed by its hash, and for the metadata file to be keyed by the fingerprint. In particular when content is keyed by hash, it is effectively deduplicated across multiple copies of the same file, which is common in builds. -For illustration purposes, consider our cache implementation is based on a simple filesystem which a separate metadata and content directory inside it. Under the metadata dir, each file is a metadata file where the filename matches the fingerprint it's describing. Under the content dir, each file is a content file where the filename matches the hash of the content itself. +For illustration purposes, consider our cache implementation is based on a simple filesystem with a separate metadata and content directory inside it. Under the metadata dir, each file is a metadata file where the filename matches the fingerprint it's describing. Under the content dir, each file is a content file where the filename matches the hash of the content itself. ## First build (cache population) From 95220f68ec71553261512aed5316e71bff96749e Mon Sep 17 00:00:00 2001 From: David Federman Date: Wed, 20 Dec 2023 13:26:29 -0800 Subject: [PATCH 7/7] PR feedback --- documentation/specs/project-cache.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/specs/project-cache.md b/documentation/specs/project-cache.md index 66beb99cfe9..b0ce961313d 100644 --- a/documentation/specs/project-cache.md +++ b/documentation/specs/project-cache.md @@ -168,7 +168,7 @@ In our example, we can use the read files to construct a fingerprint for the bui The plugin would then create some metadata describing the outputs (eg. the paths and hashes) and the serialized `BuildResult`, and associate it with the fingerprint and put that assocation in the cache. To illustrate this, consider a project with fingerprint `F` which wrote a single file `O` with hash `H` and had `BuildResult R`. The plugin could create a metadata file `M` which describes the outputs of the build (the path and hash of `O`) as well as the serialized `R`. Using the cache implementation described above, the plugin would write the following two files to the cache: - - `metadata/F -> M` + - `metadata/F -> M:"{outputs: [{path: 'path/to/O', hash: H}], result: R}"` - `content/H -> O` This can then be used for future builds.