feat: improve npm v7 support by walking the dependency tree #15

G-Rath · 2021-07-01T23:28:57Z

This restores full & proper support for npm v7, making audit output equivalent
to pre-v7 - specifically now when auditing with npm v7:

Vulnerability paths are now full paths
Versions of the packages that are vulnerable are now listed in table output

The backstory:

npm v7 changed the audit output to something like this

{
  "auditReportVersion": 2,
  "vulnerabilities": {
    "minimist": {
      "name": "minimist",
      "severity": "low",
      "via": [
        {
          "source": 1179,
          "name": "minimist",
          "dependency": "minimist",
          "title": "Prototype Pollution",
          "url": "https://npmjs.com/advisories/1179",
          "severity": "low",
          "range": "<0.2.1 || >=1.0.0 <1.2.3"
        }
      ],
      "effects": ["mkdirp"],
      "range": "<0.2.1 || >=1.0.0 <1.2.3",
      "nodes": ["node_modules/minimist"],
      "fixAvailable": {
        "name": "mkdirp",
        "version": "0.5.5",
        "isSemVerMajor": false
      }
    },
    "mkdirp": {
      "name": "mkdirp",
      "severity": "low",
      "via": ["minimist"],
      "effects": [],
      "range": "0.4.1 - 0.5.1",
      "nodes": ["node_modules/mkdirp"],
      "fixAvailable": {
        "name": "mkdirp",
        "version": "0.5.5",
        "isSemVerMajor": false
      }
    }
  },
  "metadata": {
    "vulnerabilities": {
      "info": 0,
      "low": 2,
      "moderate": 0,
      "high": 0,
      "critical": 0,
      "total": 2
    },
    "dependencies": {
      "prod": 3,
      "dev": 0,
      "optional": 0,
      "peer": 0,
      "peerOptional": 0,
      "total": 2
    }
  }
}

Importantly, it no longer includes a list of the specific versions of vulnerable
packages it found in the tree, nor does it guarantee a complete and full
dependency path. As a result, the original short-term solution that was landed
to provide basic npm v7 support used only the vulnerable package name, meaning
ignores were no longer per vulnerability, but instead per advisory + package.

The solution:

All of this information that we want is stored in the dependency tree that npm
uses to do its thing (it's actually pretty much most of the point of the tree),
so if we can get a complete representation of the tree, we can figure out the
versions & paths ourselves.

This is actually something npm already supports doing for us - it's the
npm list command. Even better, you can pass it specific package names +
constraints, and it'll give you the paths to only those named packages that
match the constraints:

npm lists 'make-dir@<=2' --json

output

❯ npm lists 'make-dir@<=2' --json
{
  "version": "0.0.0",
  "name": "pr-statistician-frontend",
  "dependencies": {
    "webpack": {
      "version": "4.46.0",
      "resolved": "https://registry.npmjs.org/webpack/-/webpack-4.46.0.tgz",
      "dependencies": {
        "terser-webpack-plugin": {
          "version": "1.4.5",
          "resolved": "https://registry.npmjs.org/terser-webpack-plugin/-/terser-webpack-plugin-1.4.5.tgz",
          "dependencies": {
            "find-cache-dir": {
              "version": "2.1.0",
              "resolved": "https://registry.npmjs.org/find-cache-dir/-/find-cache-dir-2.1.0.tgz",
              "dependencies": {
                "make-dir": {
                  "version": "2.1.0",
                  "resolved": "https://registry.npmjs.org/make-dir/-/make-dir-2.1.0.tgz"
                }
              }
            }
          }
        }
      }
    }
  }
}

So originally I did a rough implementation for that: we'd call npm audit, and
if the results were v7's output, we'd then call npm list <packages>.

That worked great, except it turns out
npm list requires node_modules,
because it uses the dependency tree that exists on disk (termed the "actual"
tree) - this meant we couldn't use it as a solution because

You'd have to be able to do a successful npm i which means systems like CIs
would have to meet all external requirements like node versions and c-libs,
and
It meant OS-specific dependencies (like fsevents) wouldn't be shown on the
tree if your OS didn't match (since the dependency wouldn't be installed).

This wouldn't be an issue if we could have npm list use the virtual tree,
which is what npm computes initially (and stores in the lockfile) in order to
then compute the actual tree (aka, figure out how to write the virtual tree to
disk).

So that's what I did.

But, while I was doing so, I kept thinking about how else we could get this,
because calling npm list adds some extra overhead plus I didn't know when my
PR would get landed & released (if at all) - and I realised something:

the virtual dependency tree is stored in the package-lock.json
this file is required for npm audit to work (and has to be in sync with
package.json)
we need to know the path to the package.json & package-lock.json in order
to do npm audit
so, we should be able to read the package.json & package-lock.json
ourselves without adding any extra requirements for using audit-app 🎉

Now, npm actually publish the component that is responsible for interacting
with and managing the dependency tree & node_modules as its own package for it
to be used on its own - it's called
arborist; in theory we could
use for this, but arborist provides a lot more than we need since it has to
support far more than just walking the dependency tree, so it's a very big
dependency to be using for just v7 lock-file support.

Fortunately, walking the virtual dependency tree is actually a relatively
straightforward task because the whole point is that it describes the tree
without external restrictions like supported OSs, de-duplications, engine
versions, etc - those only come into play when figuring out how to represent the
virtual tree in the current environment (aka, installing dependencies), which is
something arborist does.

This means I was able to write a lightweight tree walker that should be
comparable to arborist, without adding a lot of extra dependencies or size to
audit-app; this in turn allows us to calculate the missing information for
vulnerable packages when auditing v7 lock-files.

I've tested it against a number of repos and trees, with a number of variations
and weird trees, and so far it looks to be working fine - which thinking about
it makes sense: a lot of the weird edge-case stuff really is when you're
installing the dependencies to an actual file system.

The primary "edge-case" I've found so far is with the handling of file:
dependencies, which is mainly impactful because they're how npm 7 workspaces
operate. Specifically, it's not possible to accurately determine if a file:
dependency is a direct dependency or pulled in by another package or both or if
they're a workspace dependency.

The bottom line is that file: dependencies are treated as if they're direct
dependencies, in addition to the usual dependency tree pathing; in practice this
means if a package depends on a file: dependency which itself pulls in a
vulnerable package, then that vulnerable package will be counted twice: once for
the top-level package, and then again from the file: dependency.

I don't think this should create any big issues because:

the paths are still correct, there's a few more
file: dependencies are complex and weird anyway, with their own security
cases - this is doubly so for packages depending on file: dependencies
npm/arborist itself doesn't handle these any better, so there isn't really
any alternatives anyway

This work has also highlighted that it would probably be best to refactor a
bunch of the code so that it's cleaner and better structured - specifically
about breaking out the auditing functions into their own grouped files based on
what they're actually doing.

I think the test layout could use a refactor to - ideally it'd be great to have
a proper e2e suite that could be run.

However, I've decided to get this PR landed first because I think it greatly
influences what a good layout would be, so refactoring first would just be doing
work that'd need refactoring very soon afterwards anyway.

Finally, I think there are some normalizing of the audit output that we should
do:

remove .> prefix in paths from pnpm
repeat the versions in npm v6 & pnpm output to be one-per-path (to match npm
v7)

G-Rath added 3 commits July 2, 2021 11:00

feat: improve npm v7 support by walking the dependency tree

f1c0a62

feat: add support for workspaces & file: dependencies

6f05fb9

docs: update README with information about npm v7 support

d864a18

G-Rath merged commit b7694d8 into main Jul 22, 2021

G-Rath deleted the implement-better-npm-7-support branch July 22, 2021 23:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve npm v7 support by walking the dependency tree #15

feat: improve npm v7 support by walking the dependency tree #15

G-Rath commented Jul 1, 2021

feat: improve npm v7 support by walking the dependency tree #15

feat: improve npm v7 support by walking the dependency tree #15

Conversation

G-Rath commented Jul 1, 2021

The backstory:

The solution: