Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Improve pnp loader speed and memory #6671

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

goloveychuk
Copy link
Contributor

@goloveychuk goloveychuk commented Jan 31, 2025

What's the problem this PR addresses?

Firstly, thanks for moving industry forward with pnp.
I've tried to migrate big project to pnp and found out that it's slower (even with compression=0) and uses much more memory.
So much memory that OOM kills jest workers in our CI.
Which is a bummer, because pnp has a potential to be faster than native fs. Probably.

So to target this, I've implemented readonly zip reader in native js.

Speed

It's pretty fast, there is bench/run.ts script which I'm running for all my yarn cache, results for me are:

yarn node -r ./scripts/setup-ts-execution bench/run.ts 
files count: 18976 size 21888.08 MB
MiniZipFS: try 1
{ took: 10877.897208, allFiles: 3313709 }
MiniZipFS: try 2
{ took: 9407.813458999999, allFiles: 3313709 }
MiniZipFS: try 3
{ took: 9956.555375, allFiles: 3313709 }
ZipFS: try 1
{ took: 22209.03275, allFiles: 3313709 }
ZipFS: try 2
{ took: 21734.838125000002, allFiles: 3313709 }
ZipFS: try 3
{ took: 21009.119666, allFiles: 3313709 }

Now, it's possible to make it even faster, if I benchmark zip parsing only (readZipSync), it takes 1500ms on warm fs cache.
Which means, that registerListing, registerEntry impl is not optimal.

Memory

As I understand, there is issue with memory management with libzip compiled to asm.js.

yarn node -r ./scripts/setup-ts-execution bench/run.ts  --mem
files count: 18976 size 21888.08 MB
MiniZipFS: try 1
{
  p50: '577.78 MB',
  p90: '582.83 MB',
  p95: '588.58 MB',
  p99: '588.58 MB'
}
MiniZipFS: try 2
{
  p50: '616.94 MB',
  p90: '618.89 MB',
  p95: '619.66 MB',
  p99: '621.78 MB'
}
MiniZipFS: try 3
{
 p50: '644.38 MB',
 p90: '645.3 MB',
 p95: '645.3 MB',
 p99: '645.3 MB'
}
ZipFS: try 1
{
  p50: '1030.02 MB',
  p90: '1825.63 MB',
  p95: '2750.5 MB',
  p99: '2815.41 MB'
}
ZipFS: try 2
{
  p50: '948.2 MB',
  p90: '1191.19 MB',
  p95: '2336.78 MB',
  p99: '2378.06 MB'
}
ZipFS: try 3
{
  p50: '1366.63 MB',
  p90: '1933.64 MB',
  p95: '1979.81 MB',
  p99: '2031.67 MB'
}

More speed improvements

Even with new zip implementation, pnp is still slower then fs, although pnp has a potential to be faster.

pnp:

yarn node -r ./scripts/setup-ts-execution bench/run-resolve.ts 
/Users/vadymh/github/berry/.yarn/cache/micromatch-npm-4.0.5-cfab5d7669-a749888789.zip/node_modules/micromatch/index.js
5549.224292

Native fs

1357.4191660000001

...

Caveats

Security of solution should be checked.
Possible vectors of attack - path traversal, mode (set uid, set gid, exec flag, perms).
Should check what libzip does to mitigate this, but maybe, because it's readonly and no extraction done, it's safe. I'm not sure.

If you are interested, I'll finish this PR and investigate further performance improvements.

Checklist

  • I have set the packages that need to be released for my changes to be effective.
  • I will check that all automated PR checks pass before the PR gets reviewed.

@goloveychuk goloveychuk changed the title Improve pnp loader speed and memory [POC] Improve pnp loader speed and memory Feb 1, 2025
@goloveychuk
Copy link
Contributor Author

@RDIL @merceyz @arcanis can you pls check? Are you interested in finishing this pr?

@arcanis
Copy link
Member

arcanis commented Feb 3, 2025

I like the idea (I experimented myself with using raw JS instead of wasm in #5738, although it was for a different section of the code), however I'd have issues with duplicating the ZipFS implementation which is already gnarly enough as it is. Do you think it could instead be done by implementing the raw libzip interface?

Also, did you check actual performances? I think https://github.com/andrewrk/poop allows you to check both memory usage & performances in the same pass.

@goloveychuk
Copy link
Contributor Author

goloveychuk commented Feb 3, 2025

I will check performance with this tool tomorrow.

Regarding avoiding zipfs duplication - it's possible to abstract libzip and native js zip with interface, with small or no changes to orig zipfs interface.
But if we want to check more improvements and make peformance on par with native fs, we will need to change code in a way that it's fast for readonly but not convenient for other ops.
In this context, it makes sense to have different implementations, one is performance-focused readonly, second support full ops set.

@goloveychuk
Copy link
Contributor Author

I'm currently suspecting lookup file logic is making it slower than fs.

@merceyz
Copy link
Member

merceyz commented Feb 3, 2025

Some previous attempts at doing this: #2757 and #3228

@goloveychuk
Copy link
Contributor Author

goloveychuk commented Feb 4, 2025

Benchmarks:
poopuses linux APIs, but I'm on macos. I've run other tools:
Time:
CleanShot 2025-02-04 at 12 17 24@2x
Memory:
CleanShot 2025-02-04 at 12 21 02@2x
Also there are results in PR header.
All benchmarks shows same: native js 2 times faster for reading (opening + listing) zip archives, but most importantly, uses much less memory:
time -v shows 600 vs 3800
process.memoryUsage().rss shows 600 vs 2000-2800 (not peak but p99, see second output in header)

@goloveychuk
Copy link
Contributor Author

goloveychuk commented Feb 5, 2025

@arcanis how we can move from this point?
I see two directions.

  1. try to optimize pnp runtime further for critical paths (enhanced-resolve resolution, commonjs resolve, esm resolve, file reading, which probably translates to stat, readFile, resolveToUnqualified, )
  2. make common interface for minizip and libzip and merge PR, since it's already gives substantial improvements. Investigate runtime improvements in other PRs.
    WDYT?

@arcanis
Copy link
Member

arcanis commented Feb 11, 2025

That sounds promising. I think the best approach for now is to first land your implementation in its current state (ie without more optimizations), as it's already a significant endeavour we'll want to test in prod for some time.

To that end, can you look at unifying the interface + adding a setting to select which version to use? I think for this first iteration we can use a simple environment variable checked from within the loader; something like PNP_ZIP=js that, when set, will pass a different Zip implementation to ZipOpenFS (here).

@goloveychuk goloveychuk mentioned this pull request Feb 14, 2025
9 tasks
@goloveychuk
Copy link
Contributor Author

@arcanis #6688
Some todos are left, but main implementation is ready.
I've left a flag in .yarnrc, so it can be enabled for all developers for project, which will easy testing in my company.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants