feat: improve npm v7 support by walking the dependency tree #15
This restores full & proper support for `npm` v7, making the audit output equivalent to pre-v7 when auditing with `npm` v7.
The backstory:
`npm` v7 changed the structure of its audit output.
Importantly, it no longer includes a list of the specific versions of vulnerable
packages it found in the tree, nor does it guarantee a complete dependency path.
As a result, the short-term solution originally landed to provide basic npm v7
support used only the vulnerable package name, meaning ignores were no longer
per vulnerability, but instead per advisory + package.
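To make that difference concrete, here's a small TypeScript sketch. The `Finding` shape and the `<advisory>|<path>` id format are assumptions for illustration, not necessarily what `audit-app` uses internally, and `123` in the usage note is just a placeholder advisory number.

```typescript
// Hypothetical finding shape, for illustration only.
interface Finding {
  advisoryId: number;
  name: string; // the vulnerable package
  paths: string[][]; // each path lists package names from a top-level dependency down to `name`
}

// Per-vulnerability ignore ids: one id per dependency path (the pre-v7 granularity).
const idsPerPath = (finding: Finding): string[] =>
  finding.paths.map(path => `${finding.advisoryId}|${path.join('>')}`);

// Per-advisory + package id: every path collapses into a single id (the v7 stop-gap).
const idPerPackage = (finding: Finding): string =>
  `${finding.advisoryId}|${finding.name}`;
```

With paths `mkdirp>minimist` and `optimist>minimist`, the first scheme produces two ids, so ignoring `123|mkdirp>minimist` still reports the `optimist` path; the second produces only `123|minimist`, which silences both at once.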
The solution:
All of the information we want is stored in the dependency tree that `npm` uses
to do its thing (it's pretty much the whole point of the tree), so if we can get
a complete representation of the tree, we can figure out the versions & paths
ourselves.
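As a rough illustration of "figuring out the versions & paths ourselves", here's a depth-first walk over a hypothetical in-memory tree. `TreeNode`, `Match`, and `findVulnerable` are made-up names, and the version check is passed in as a predicate instead of doing real semver range matching.

```typescript
// Hypothetical in-memory dependency tree node.
interface TreeNode {
  name: string;
  version: string;
  dependencies: TreeNode[];
}

interface Match {
  version: string;
  path: string[]; // package names from the top-level dependency down to the match
}

// Collect every occurrence of `name` whose version the predicate flags,
// together with the path that leads to it.
const findVulnerable = (
  roots: TreeNode[],
  name: string,
  isVulnerable: (version: string) => boolean,
): Match[] => {
  const matches: Match[] = [];
  const visit = (node: TreeNode, path: string[], seen: Set<TreeNode>): void => {
    if (seen.has(node)) return; // guard against dependency cycles
    const here = [...path, node.name];
    if (node.name === name && isVulnerable(node.version)) {
      matches.push({ version: node.version, path: here });
    }
    const nextSeen = new Set(seen).add(node);
    for (const dep of node.dependencies) visit(dep, here, nextSeen);
  };
  for (const root of roots) visit(root, [], new Set<TreeNode>());
  return matches;
};
```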
This is actually something `npm` already supports doing for us: it's the
`npm list` command. Even better, you can pass it specific package names +
constraints, and it'll give you the paths to only those named packages that
match the constraints.
So originally I did a rough implementation of that: we'd call `npm audit`, and
if the results were v7's output, we'd then call `npm list <packages>`. That
worked great, except it turns out `npm list` requires `node_modules`, because it
uses the dependency tree that exists on disk (termed the "actual" tree). This
meant we couldn't use it as a solution, because it would require running `npm i`
first, which means systems like CIs would have to meet all external requirements
(like node versions and c-libs), and OS-specific dependencies (like `fsevents`)
wouldn't be shown on the tree if your OS didn't match (since the dependency
wouldn't be installed).
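For reference, the branching in that first implementation (call `npm audit`, and only go to `npm list` when the output is in the v7 format) could look roughly like this. The report shapes are trimmed-down assumptions about `npm audit --json`, keyed on the `auditReportVersion` field that v7 reports carry. The function names are hypothetical, and the sketch only builds the `npm ls` arguments rather than spawning `npm`.

```typescript
// Assumed, heavily trimmed shapes of `npm audit --json` output.
interface NpmV6AuditOutput {
  advisories: Record<string, unknown>;
}

interface NpmV7AuditOutput {
  auditReportVersion: 2;
  vulnerabilities: Record<string, { name: string; range: string }>;
}

type NpmAuditOutput = NpmV6AuditOutput | NpmV7AuditOutput;

// v7 reports carry `auditReportVersion`; v6 reports don't have it.
const isNpmV7AuditOutput = (
  output: NpmAuditOutput,
): output is NpmV7AuditOutput => 'auditReportVersion' in output;

// Arguments for `npm ls`, restricted to the vulnerable name@range specs.
const npmListArgs = (output: NpmV7AuditOutput): string[] => [
  'ls',
  '--json',
  '--all',
  ...Object.keys(output.vulnerabilities).map(key => {
    const vuln = output.vulnerabilities[key];
    return `${vuln.name}@${vuln.range}`;
  }),
];
```

Passing these as an argument array (for example to `child_process.execFile`) avoids shell-quoting ranges like `<1.2.3`, but as described above, `npm ls` still needs an installed `node_modules` to produce anything useful.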
This wouldn't be an issue if we could have `npm list` use the virtual tree,
which is what `npm` computes initially (and stores in the lockfile) in order to
then compute the actual tree (aka, figure out how to write the virtual tree to
disk).
So that's what I did.
But while I was doing so, I kept thinking about how else we could get this
information, because calling `npm list` adds some extra overhead, plus I didn't
know when my PR would get landed & released (if at all). And I realised
something: `package-lock.json` is required for `npm audit` to work (and has to
be in sync with `package.json`), and `npm` only needs `package.json` &
`package-lock.json` in order to do `npm audit`, so we can use `package.json` &
`package-lock.json` ourselves without adding any extra requirements for using
`audit-app` 🎉

Now, `npm` actually publishes the component that is responsible for interacting
with and managing the dependency tree & `node_modules` as its own package, so
that it can be used on its own: it's called `arborist`. In theory we could use
it for this, but `arborist` provides a lot more than we need, since it has to
support far more than just walking the dependency tree, which makes it a very
big dependency to pull in just for v7 lockfile support.
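Using `package.json` & `package-lock.json` ourselves starts with checking that the lockfile is usable. Below is a minimal sketch that takes the two already-parsed JSON files, requires the v2 `packages` section, and checks that the lockfile's root entry (keyed by `""`) agrees with `package.json`. The types and `lockfileProblems` are hypothetical, and the check only compares one direction (entries declared in `package.json`).

```typescript
type DepMap = Record<string, string>;

// Only the dependency fields matter here; both package.json and the
// lockfile's root entry (packages[""]) carry them.
interface Manifest {
  dependencies?: DepMap;
  devDependencies?: DepMap;
  optionalDependencies?: DepMap;
}

interface Lockfile {
  lockfileVersion: number;
  packages?: Record<string, Manifest>;
}

const DEP_FIELDS = ['dependencies', 'devDependencies', 'optionalDependencies'] as const;

// Returns human-readable problems; an empty array means the lockfile looks usable.
const lockfileProblems = (pkg: Manifest, lock: Lockfile): string[] => {
  if (lock.lockfileVersion < 2 || lock.packages === undefined) {
    return ['lockfile has no "packages" section (lockfileVersion 2+ is needed)'];
  }
  const root: Manifest = lock.packages[''] ?? {};
  const problems: string[] = [];
  for (const field of DEP_FIELDS) {
    const wanted: DepMap = pkg[field] ?? {};
    const locked: DepMap = root[field] ?? {};
    for (const name of Object.keys(wanted)) {
      if (locked[name] !== wanted[name]) {
        problems.push(
          `${field}.${name}: package.json wants "${wanted[name]}", lockfile has "${locked[name] ?? '(missing)'}"`,
        );
      }
    }
  }
  return problems;
};
```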
Fortunately, walking the virtual dependency tree is actually a relatively
straightforward task, because the whole point is that it describes the tree
without external restrictions like supported OSs, de-duplication, engine
versions, etc. Those only come into play when figuring out how to represent the
virtual tree in the current environment (aka, installing dependencies), which is
something `arborist` does. This means I was able to write a lightweight tree
walker that should be comparable to `arborist`, without adding a lot of extra
dependencies or size to `audit-app`; this in turn allows us to calculate the
missing information for vulnerable packages when auditing v7 lockfiles.
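Here's a cut-down sketch of what walking the virtual tree from a v2 lockfile can look like. It is not the walker in this PR. It assumes the lockfile's `packages` map (keyed by install location such as `node_modules/a/node_modules/b`), resolves each dependency Node-style by checking the nearest `node_modules` folder first and then each ancestor, and ignores version ranges entirely. `parentOf`, `resolveLocation`, and `walkLockfile` are hypothetical names.

```typescript
// Trimmed shape of an entry in the v2 lockfile's "packages" map (an assumption;
// real entries carry more fields).
interface LockEntry {
  version?: string;
  dependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

type Packages = Record<string, LockEntry>; // keyed by location, e.g. "node_modules/a"

interface Found {
  name: string;
  version: string;
  path: string[]; // package names from a top-level dependency down to this one
}

// "node_modules/a/node_modules/b" -> "node_modules/a"; top-level entries -> "" (the root).
const parentOf = (location: string): string => {
  const idx = location.lastIndexOf('/node_modules/');
  return idx === -1 ? '' : location.slice(0, idx);
};

// Node-style lookup: try <from>/node_modules/<name>, then each ancestor up to the root.
const resolveLocation = (packages: Packages, from: string, name: string): string | undefined => {
  let base = from;
  for (;;) {
    const candidate = base === '' ? `node_modules/${name}` : `${base}/node_modules/${name}`;
    if (candidate in packages) return candidate;
    if (base === '') return undefined;
    base = parentOf(base);
  }
};

const walkLockfile = (packages: Packages): Found[] => {
  const found: Found[] = [];
  const visit = (location: string, path: string[], onPath: Set<string>): void => {
    const entry = packages[location];
    if (entry === undefined) return;
    const deps = {
      ...entry.dependencies,
      ...entry.optionalDependencies,
      // devDependencies are only installed for the root project
      ...(location === '' ? entry.devDependencies : undefined),
    };
    for (const name of Object.keys(deps)) {
      const depLocation = resolveLocation(packages, location, name);
      // skip optional deps absent from the lockfile, and break cycles
      if (depLocation === undefined || onPath.has(depLocation)) continue;
      const depPath = [...path, name];
      found.push({ name, version: packages[depLocation].version ?? 'unknown', path: depPath });
      visit(depLocation, depPath, new Set(onPath).add(depLocation));
    }
  };
  visit('', [], new Set(['']));
  return found;
};
```

This sketch does no memoisation, so a shared dependency is re-walked once per path that reaches it; that's the cost of reporting every path. It also doesn't follow `link` entries (`file:` dependencies and workspaces): such an entry has no dependencies of its own, so the walk stops there.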
I've tested it against a number of repos, including some with unusual trees, and
so far it looks to be working fine, which makes sense when you think about it: a
lot of the weird edge-case stuff really only happens when you're installing the
dependencies to an actual file system.
The primary "edge-case" I've found so far is with the handling of `file:`
dependencies, which is mainly impactful because they're how `npm` 7 workspaces
operate. Specifically, it's not possible to accurately determine if a `file:`
dependency is a direct dependency, pulled in by another package, or both, or if
it's a workspace dependency.
The bottom line is that `file:` dependencies are treated as if they're direct
dependencies, in addition to the usual dependency tree pathing; in practice this
means if a package depends on a `file:` dependency which itself pulls in a
vulnerable package, then that vulnerable package will be counted twice: once for
the top-level package, and then again from the `file:` dependency.

I don't think this should create any big issues because:

- `file:` dependencies are complex and weird anyway, with their own security
  cases; this is doubly so for packages depending on `file:` dependencies
- `npm`/`arborist` itself doesn't handle these any better, so there aren't
  really any alternatives anyway
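To illustrate where the double count comes from: if each `file:` target is treated as an extra root, anything it pulls in is reached once from the project root (through the link) and once from the link target itself. The sketch below assumes the v2 lockfile records a `file:` dependency as a `link: true` entry whose `resolved` field points at the linked folder. `LinkEntry` and `walkRoots` are hypothetical.

```typescript
// Only the fields needed here; other lockfile fields are omitted (an assumption).
interface LinkEntry {
  link?: boolean;
  resolved?: string; // for link entries: the location of the linked folder, e.g. "packages/lib"
}

// Hypothetical helper: the project root ("") plus the target of every link entry,
// so that each file: dependency is also walked as if it were a top-level project.
const walkRoots = (packages: Record<string, LinkEntry>): string[] => {
  const roots = new Set<string>(['']);
  for (const location of Object.keys(packages)) {
    const entry = packages[location];
    if (entry.link === true && entry.resolved !== undefined) roots.add(entry.resolved);
  }
  return Array.from(roots);
};
```

With `node_modules/lib` linking to `packages/lib`, walking from both `""` and `"packages/lib"` reports a vulnerable dependency of `lib` under two paths, which is the behaviour described above.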
This work has also highlighted that it would probably be best to refactor a
bunch of the code so that it's cleaner and better structured, specifically by
breaking out the auditing functions into their own files, grouped by what
they're actually doing.
I think the test layout could use a refactor too; ideally it'd be great to have
a proper e2e suite that could be run.
However, I've decided to get this PR landed first because I think it greatly
influences what a good layout would be, so refactoring first would just be doing
work that'd need refactoring very soon afterwards anyway.
Finally, I think there is some normalizing of the audit output that we should
do:

- remove the `.>` prefix in paths from `pnpm`
- change `pnpm` output to be one-per-path (to match `npm` v7)
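Both normalizations are small. Here's a sketch, assuming a simplified finding shape where each path is a `>`-joined string. The `Finding` type, both function names, and the example paths in the comments are hypothetical.

```typescript
// Hypothetical, simplified finding: one advisory with one or more ">"-joined paths.
interface Finding {
  advisoryId: number;
  paths: string[];
}

// pnpm paths start at the project itself (something like ".>a>b"); drop that leading ".>".
const stripRootPrefix = (path: string): string =>
  path.startsWith('.>') ? path.slice(2) : path;

// Split each finding into one finding per path, with the prefix removed.
const normalizePnpmFindings = (findings: Finding[]): Finding[] => {
  const normalized: Finding[] = [];
  for (const finding of findings) {
    for (const path of finding.paths) {
      normalized.push({ ...finding, paths: [stripRootPrefix(path)] });
    }
  }
  return normalized;
};
```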