Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve npm v7 support by walking the dependency tree #15

Merged
merged 3 commits into from
Jul 22, 2021

Conversation

G-Rath
Copy link
Owner

@G-Rath G-Rath commented Jul 1, 2021

This restores full & proper support for npm v7, making audit output equivalent
to pre-v7 - specifically now when auditing with npm v7:

  • Vulnerability paths are now full paths
  • Versions of the packages that are vulnerable are now listed in table output

The backstory:

npm v7 changed the audit output to something like this
{
  "auditReportVersion": 2,
  "vulnerabilities": {
    "minimist": {
      "name": "minimist",
      "severity": "low",
      "via": [
        {
          "source": 1179,
          "name": "minimist",
          "dependency": "minimist",
          "title": "Prototype Pollution",
          "url": "https://npmjs.com/advisories/1179",
          "severity": "low",
          "range": "<0.2.1 || >=1.0.0 <1.2.3"
        }
      ],
      "effects": ["mkdirp"],
      "range": "<0.2.1 || >=1.0.0 <1.2.3",
      "nodes": ["node_modules/minimist"],
      "fixAvailable": {
        "name": "mkdirp",
        "version": "0.5.5",
        "isSemVerMajor": false
      }
    },
    "mkdirp": {
      "name": "mkdirp",
      "severity": "low",
      "via": ["minimist"],
      "effects": [],
      "range": "0.4.1 - 0.5.1",
      "nodes": ["node_modules/mkdirp"],
      "fixAvailable": {
        "name": "mkdirp",
        "version": "0.5.5",
        "isSemVerMajor": false
      }
    }
  },
  "metadata": {
    "vulnerabilities": {
      "info": 0,
      "low": 2,
      "moderate": 0,
      "high": 0,
      "critical": 0,
      "total": 2
    },
    "dependencies": {
      "prod": 3,
      "dev": 0,
      "optional": 0,
      "peer": 0,
      "peerOptional": 0,
      "total": 2
    }
  }
}

Importantly, it no longer includes a list of the specific versions of vulnerable
packages it found in the tree, nor does it guarantee a complete and full
dependency path. As a result, the original short-term solution that was landed
to provide basic npm v7 support used only the vulnerable package name, meaning
ignores were no longer per vulnerability, but instead per advisory + package.

The solution:

All of this information that we want is stored in the dependency tree that npm
uses to do its thing (it's actually pretty much most of the point of the tree),
so if we can get a complete representation of the tree, we can figure out the
versions & paths ourselves.

This is actually something npm already supports doing for us - it's the
npm list command. Even better, you can pass it specific package names +
constraints, and it'll give you the paths to only those named packages that
match the constraints:

npm lists 'make-dir@<=2' --json
output
❯ npm lists 'make-dir@<=2' --json
{
  "version": "0.0.0",
  "name": "pr-statistician-frontend",
  "dependencies": {
    "webpack": {
      "version": "4.46.0",
      "resolved": "https://registry.npmjs.org/webpack/-/webpack-4.46.0.tgz",
      "dependencies": {
        "terser-webpack-plugin": {
          "version": "1.4.5",
          "resolved": "https://registry.npmjs.org/terser-webpack-plugin/-/terser-webpack-plugin-1.4.5.tgz",
          "dependencies": {
            "find-cache-dir": {
              "version": "2.1.0",
              "resolved": "https://registry.npmjs.org/find-cache-dir/-/find-cache-dir-2.1.0.tgz",
              "dependencies": {
                "make-dir": {
                  "version": "2.1.0",
                  "resolved": "https://registry.npmjs.org/make-dir/-/make-dir-2.1.0.tgz"
                }
              }
            }
          }
        }
      }
    }
  }
}

So originally I did a rough implementation for that: we'd call npm audit, and
if the results were v7's output, we'd then call npm list <packages>.

That worked great, except it turns out
npm list requires node_modules,
because it uses the dependency tree that exists on disk (termed the "actual"
tree) - this meant we couldn't use it as a solution because

  1. You'd have to be able to do a successful npm i which means systems like CIs
    would have to meet all external requirements like node versions and c-libs,
    and
  2. It meant OS-specific dependencies (like fsevents) wouldn't be shown on the
    tree if your OS didn't match (since the dependency wouldn't be installed).

This wouldn't be an issue if we could have npm list use the virtual tree,
which is what npm computes initially (and stores in the lockfile) in order to
then compute the actual tree (aka, figure out how to write the virtual tree to
disk).

So that's what I did.

But, while I was doing so, I kept thinking about how else we could get this,
because calling npm list adds some extra overhead plus I didn't know when my
PR would get landed & released (if at all) - and I realised something:

  • the virtual dependency tree is stored in the package-lock.json
  • this file is required for npm audit to work (and has to be in sync with
    package.json)
  • we need to know the path to the package.json & package-lock.json in order
    to do npm audit
  • so, we should be able to read the package.json & package-lock.json
    ourselves without adding any extra requirements for using audit-app 🎉

Now, npm actually publish the component that is responsible for interacting
with and managing the dependency tree & node_modules as its own package for it
to be used on its own - it's called
arborist; in theory we could
use for this, but arborist provides a lot more than we need since it has to
support far more than just walking the dependency tree, so it's a very big
dependency to be using for just v7 lock-file support.

Fortunately, walking the virtual dependency tree is actually a relatively
straightforward task because the whole point is that it describes the tree
without external restrictions like supported OSs, de-duplications, engine
versions, etc - those only come into play when figuring out how to represent the
virtual tree in the current environment (aka, installing dependencies), which is
something arborist does.

This means I was able to write a lightweight tree walker that should be
comparable to arborist, without adding a lot of extra dependencies or size to
audit-app; this in turn allows us to calculate the missing information for
vulnerable packages when auditing v7 lock-files.

I've tested it against a number of repos and trees, with a number of variations
and weird trees, and so far it looks to be working fine - which thinking about
it makes sense: a lot of the weird edge-case stuff really is when you're
installing the dependencies to an actual file system.

The primary "edge-case" I've found so far is with the handling of file:
dependencies, which is mainly impactful because they're how npm 7 workspaces
operate. Specifically, it's not possible to accurately determine if a file:
dependency is a direct dependency or pulled in by another package or both or if
they're a workspace dependency.

The bottom line is that file: dependencies are treated as if they're direct
dependencies, in addition to the usual dependency tree pathing; in practice this
means if a package depends on a file: dependency which itself pulls in a
vulnerable package, then that vulnerable package will be counted twice: once for
the top-level package, and then again from the file: dependency.

I don't think this should create any big issues because:

  • the paths are still correct, there's a few more
  • file: dependencies are complex and weird anyway, with their own security
    cases - this is doubly so for packages depending on file: dependencies
  • npm/arborist itself doesn't handle these any better, so there isn't really
    any alternatives anyway

This work has also highlighted that it would probably be best to refactor a
bunch of the code so that it's cleaner and better structured - specifically
about breaking out the auditing functions into their own grouped files based on
what they're actually doing.

I think the test layout could use a refactor to - ideally it'd be great to have
a proper e2e suite that could be run.

However, I've decided to get this PR landed first because I think it greatly
influences what a good layout would be, so refactoring first would just be doing
work that'd need refactoring very soon afterwards anyway.

Finally, I think there are some normalizing of the audit output that we should
do:

  • remove .> prefix in paths from pnpm
  • repeat the versions in npm v6 & pnpm output to be one-per-path (to match npm
    v7)

@G-Rath G-Rath merged commit b7694d8 into main Jul 22, 2021
@G-Rath G-Rath deleted the implement-better-npm-7-support branch July 22, 2021 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant