CI Installation performance #6376
-
@dmichon-msft I'd like to help push this idea along, but we need a better name for the feature. "Phased install" isn't very accurate since your prototype ended up not actually implementing per-project phases (given that significant speed gains were already achieved by the other optimizations). A key aspect involves distributing the work among a pool of worker threads, so... What if we call your algorithm a threaded install for PNPM?
-
@dmichon-msft sorry for not seeing this discussion earlier. This is amazing! This is actually partly what @mcollina suggested doing recently. Could you make a PR with this improvement? Even if it is not ready yet, we can work together to polish it.
-
@nachoaldamav this might be interesting to you, as you were also experimenting with sync fs operations in ultra.
-
@zkochan
-
I am not sure if this has any relation, but I've noticed in my own project with ~1800 dependencies that on my CI machines, even if the pnpm store and node_modules are fully cached and PNPM reports that it's downloading nothing, it still takes 15-25 seconds for `pnpm install` to complete.
Not sure if there's something I can do to make it go faster, given that this is delaying almost every step in our pipeline and adding many minutes to CI runs overall - it seems that locally it would just report that everything is already up to date.
-
Summary
I've recently been doing work on dependency installation performance, since installations on my team's Linux CI agents are rather slower than I'd like (our repository serves about 300 developers at Microsoft). Following some guidance from one of my colleagues who has been doing installation performance work for an internal fork of Yarn, I was able to put together a CI-mode installer that reads the pnpm-lock.yaml and installs all the packages in the same layout (as seen by user code) in about 8% of the time.
Old Install (`pnpm install --frozen-lockfile`)

- pnpm version: 7.26.1
- total packages: 7032 downloaded, 7406 installed, 8297 linked (including workspace packages)
- mode: no cache, yes lockfile, no node_modules
- agent: D32ads_v5 Azure SKU
- os: Linux
- duration: 5 minutes, 14.5 seconds
New Install (`rush phased-install`) (local custom algorithm)

- total packages: 7032 downloaded, 7406 installed, 8297 linked (including workspace packages)
- mode: no cache, yes lockfile, no node_modules
- agent: D32ads_v5 Azure SKU
- os: Linux
- duration: 23.6 seconds
How the custom installer works

1. Reads `pnpm-lock.yaml` and constructs a task graph.
2. Downloads tarballs over `node:http2` (the `connect(url, options)` API) or the `node:https` module. Configure the `https.Agent` to ensure that the timeout is at least 60 seconds.
3. Collects each tarball into an `ArrayBuffer` and `postMessage`s it (with transfer) to a worker thread for parsing. Allow up to `#CPUs * 0.9` (configurable) concurrent worker threads. This ensures that the main thread is not doing any significant CPU work.
4. In the worker, gunzips into a `SharedArrayBuffer`, then synchronously parses the raw TAR buffer, generating a Map from filename to `{ mode, offset, length }`.
5. Posts the `SharedArrayBuffer` and Map back to the orchestrator via `postMessage`, so that the remaining work can be load balanced again, and so that the orchestrator can get access to the information in `package.json`.
6. Uses synchronous `mkdirSync`/`openSync`/`writeSync`/`closeSync` calls to produce the output files (a sketch of the worker side follows this list).
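For concreteness, here is a minimal sketch (not the prototype's actual code) of what the worker side of steps 4-6 could look like. The `UnpackRequest` message shape and `targetDir` field are invented for illustration, the sketch writes files in the same worker rather than posting the index back for rebalancing as described above, and integrity checking and error handling are omitted.

```ts
// unpack-worker.ts - minimal sketch of the worker side (steps 4-6); not the prototype's code.
import { parentPort } from 'node:worker_threads';
import { gunzipSync } from 'node:zlib';
import { mkdirSync, openSync, writeSync, closeSync } from 'node:fs';
import * as path from 'node:path';

// Hypothetical message shape: the orchestrator transfers the raw .tgz bytes plus a target folder.
interface UnpackRequest {
  tarGz: ArrayBuffer;
  targetDir: string;
}

interface TarEntry {
  mode: number;
  offset: number;
  length: number;
}

parentPort!.on('message', ({ tarGz, targetDir }: UnpackRequest) => {
  // Gunzip synchronously, then copy into a SharedArrayBuffer so the raw TAR bytes
  // can later be shared with the orchestrator without another copy.
  const decompressed = gunzipSync(Buffer.from(tarGz));
  const shared = new SharedArrayBuffer(decompressed.length);
  const tar = Buffer.from(shared);
  decompressed.copy(tar);

  // Walk the 512-byte TAR headers, building filename -> { mode, offset, length }.
  const entries = new Map<string, TarEntry>();
  let offset = 0;
  while (offset + 512 <= tar.length) {
    const name = tar.toString('utf8', offset, offset + 100).replace(/\0.*$/, '');
    if (!name) break; // two all-zero blocks terminate the archive
    const mode = parseInt(tar.toString('utf8', offset + 100, offset + 108), 8) || 0o644;
    const size = parseInt(tar.toString('utf8', offset + 124, offset + 136), 8) || 0;
    const typeflag = tar.toString('utf8', offset + 156, offset + 157);
    if (typeflag === '0' || typeflag === '\0') {
      entries.set(name, { mode, offset: offset + 512, length: size });
    }
    offset += 512 + Math.ceil(size / 512) * 512; // file data is padded to 512-byte blocks
  }

  // Produce the output files with plain synchronous fs calls; this thread has nothing
  // better to do, and the async machinery would only add overhead.
  for (const [name, entry] of entries) {
    // npm tarballs conventionally prefix entries with "package/".
    const outPath = path.join(targetDir, name.replace(/^package\//, ''));
    mkdirSync(path.dirname(outPath), { recursive: true });
    const fd = openSync(outPath, 'w', entry.mode);
    writeSync(fd, tar, entry.offset, entry.length);
    closeSync(fd);
  }

  // Hand the buffer and index back so the orchestrator can read package.json from it.
  parentPort!.postMessage({ shared, entries });
});
```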
While the download is ongoing, the installer can be busy creating all the necessary `node_modules` symlink layouts, since that doesn't depend on the downloads.

Performance notes
On Windows, unpack time increases by about 10x relative to Linux. I haven't found any ways around this; the parse/unpack routine is already about 2x as fast as the native `tar.exe`.

The biggest performance gain here comes from using synchronous I/O in worker threads, since Node's async fs APIs are ultimately just synchronous I/O performed on a pool of threads, but by controlling the threads directly a lot of that overhead can be avoided.
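As a rough illustration of "controlling the threads directly", the following sketch keeps a fixed pool of roughly `#CPUs * 0.9` workers fed from a queue and transfers each tarball's `ArrayBuffer` to the worker so no copy is made. The file name `unpack-worker.js`, the `Job` shape, and `enqueueUnpack` are assumptions for the example, not the prototype's API.

```ts
// orchestrator.ts - minimal sketch of a fixed worker pool driven directly from the main thread.
import { Worker } from 'node:worker_threads';
import * as os from 'node:os';

interface Job {
  tarGz: ArrayBuffer;
  targetDir: string;
}

// Leave a little headroom for the main thread and the download sockets.
const poolSize = Math.max(1, Math.floor(os.cpus().length * 0.9));
const idle: Worker[] = [];
const queue: Job[] = [];

function dispatch(worker: Worker, job: Job): void {
  // Transfer the ArrayBuffer rather than copying it; the main thread no longer needs it.
  worker.postMessage(job, [job.tarGz]);
}

for (let i = 0; i < poolSize; i++) {
  const worker = new Worker(new URL('./unpack-worker.js', import.meta.url));
  // Each result message means the worker is free again, so feed it the next queued tarball.
  worker.on('message', () => {
    const next = queue.shift();
    if (next) {
      dispatch(worker, next);
    } else {
      idle.push(worker);
    }
  });
  idle.push(worker);
}

// Called by the download layer whenever a tarball has been fully received.
export function enqueueUnpack(job: Job): void {
  const worker = idle.pop();
  if (worker) {
    dispatch(worker, job);
  } else {
    queue.push(job);
  }
}
```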
The next biggest performance gain is the use of multiple HTTP/2 connections for the registry communication. Using multiple connections separates TCP congestion control and allows the registry's load balancer to distribute them across multiple servers, while still taking advantage of HTTP/2's ability to send a large number of streams over the same connection.

Edit: It appears that using HTTP/1.1 doesn't make a significant performance difference vs. HTTP/2, unless the number of concurrent streams is much higher than the number of open sockets.
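A hedged sketch of the multiple-connection approach: open a small, fixed number of `node:http2` sessions to the registry and spread tarball requests across them round-robin, so each connection gets its own TCP congestion window while still multiplexing many streams. The registry URL, session count, and `downloadTarball` are placeholders; authentication (the Authorization header mentioned below) and retries are omitted.

```ts
// download.ts - minimal sketch of spreading tarball requests over a few HTTP/2 sessions.
import { connect, constants, type ClientHttp2Session } from 'node:http2';

const REGISTRY = 'https://registry.npmjs.org'; // placeholder; the prototype's registry is configurable
const SESSION_COUNT = 4;                       // a handful of connections, not one per request

const sessions: ClientHttp2Session[] = Array.from({ length: SESSION_COUNT }, () => connect(REGISTRY));
let next = 0;

export function downloadTarball(tarballPath: string): Promise<Buffer> {
  // Round-robin across sessions: each connection gets its own TCP congestion window and
  // can land on a different server behind the registry's load balancer, while every
  // session still multiplexes many concurrent streams.
  const session = sessions[next++ % sessions.length];

  return new Promise((resolve, reject) => {
    const req = session.request({ [constants.HTTP2_HEADER_PATH]: tarballPath });
    const chunks: Buffer[] = [];
    req.on('data', (chunk: Buffer) => chunks.push(chunk));
    req.on('end', () => resolve(Buffer.concat(chunks)));
    req.on('error', reject);
    req.end();
  });
}
```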
It's entirely possible that performance could be improved a fair bit further, since I still used Node's async I/O for symlink creation instead of handing that off to the same pool of workers. Additionally, it might be beneficial to perform the downloads directly in the worker thread that will also perform the parsing, so that integrity checking and gunzip can be streamed without blocking the main thread. This would also avoid the need to copy the tarball to a single buffer before decompression.
Parse cost is dominated by gunzip.
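To illustrate the streaming idea above, here is a minimal sketch that hashes the compressed bytes and gunzips them as they arrive, instead of first copying the whole tarball into one buffer. It assumes the SRI-style `sha512-<base64>` integrity strings that pnpm-lock.yaml records; `checkAndGunzip` and its signature are invented for the example.

```ts
// streaming-check.ts - minimal sketch of overlapping integrity checking and gunzip with the download.
import { createHash } from 'node:crypto';
import { createGunzip } from 'node:zlib';
import type { Readable } from 'node:stream';

// `integrity` is the lockfile's SRI string, e.g. "sha512-<base64 digest>".
export async function checkAndGunzip(tarballStream: Readable, integrity: string): Promise<Buffer> {
  const [algorithm, expected] = integrity.split('-', 2);
  const hash = createHash(algorithm);
  const gunzip = createGunzip();
  const chunks: Buffer[] = [];

  // Hash the compressed bytes as they arrive; the same bytes flow straight into gunzip.
  tarballStream.on('data', (chunk: Buffer) => hash.update(chunk));

  // Decompress incrementally instead of buffering the whole tarball first.
  for await (const chunk of tarballStream.pipe(gunzip)) {
    chunks.push(chunk as Buffer);
  }

  if (hash.digest('base64') !== expected) {
    throw new Error(`integrity mismatch (${algorithm})`);
  }
  return Buffer.concat(chunks); // raw TAR bytes, ready for the synchronous parser
}
```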
I can share detailed code from the prototype; there's nothing proprietary beyond the fact that I've currently hardcoded the registry domain and the process of obtaining the Authorization header, and that's easily abstracted.