Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude non-essential files from published package (especially test/**) #277

Closed
broofa opened this issue Feb 18, 2022 · 15 comments
Closed

Comments

@broofa
Copy link

broofa commented Feb 18, 2022

Excluding the test directory from the npm published files will shave ~75% off the size of the package.

'Probably want to also exclude appveyor.yml and example.

$ mkdir -p /tmp/foo
$ cd /tmp/foo
$ npm install resolve
$ du -sh node_modules/resolve

4.0K	node_modules/resolve/LICENSE
4.0K	node_modules/resolve/SECURITY.md
4.0K	node_modules/resolve/appveyor.yml
4.0K	node_modules/resolve/async.js
4.0K	node_modules/resolve/bin
8.0K	node_modules/resolve/example
4.0K	node_modules/resolve/index.js
 52K	node_modules/resolve/lib
4.0K	node_modules/resolve/package.json
 12K	node_modules/resolve/readme.markdown
4.0K	node_modules/resolve/sync.js
304K	node_modules/resolve/test
@ljharb
Copy link
Member

ljharb commented Feb 18, 2022

They are in fact quite essential; they’re included by design. npm install foo && npm explore foo && npm install && npm test should always work.

Duplicate of #258. Duplicate of #235. Duplicate of #207. Duplicate of #171. Duplicate of #180. See not just #82, but #149 (review), #132 (review), #128 (review), #59 (comment).

I’d be happy to exclude appveyor.yml if those mere bytes (not 4K, that’s just the block size limits of your hard drive) are worth excluding. Excluding examples would be a breaking change, so we could do it in v2, i suppose - except that examples are ran as part of tests, so they’d need to remain.

@ljharb ljharb closed this as completed Feb 18, 2022
@broofa
Copy link
Author

broofa commented Feb 18, 2022

npm install foo && npm explore foo && npm install && npm test should always work

This doesn't seem like a valid reason for shipping test files in production. I've never once had a case where a user expected this to work, nor where I felt it would have been helpful.

Bottom line: Test scripts just aren't something anyone expects to run as part of a production environment. In part because production environments often go through additional toolchains that will break this sort of thing.

@ljharb
Copy link
Member

ljharb commented Feb 18, 2022

Sure - but the other bottom line is that package size doesn't matter in practice - what matters is the memory or bundle size of your program, and none of these files affect that. There's certainly tools that try to imply that package size matters, but this is just FUD.

@jason-ha
Copy link

@broofa Very much agree. Jordan did not reference #235 where we had a longer discussion that was eventually not replied to. It seems that Jordan is very convinced despite reoccurring requests. I am back here checking on things because security scanning tools continue to crawl though this dependency and find issues. With such demand out there it would be helpful to provide community with the option to not get the overhead (which is not just about size). There could be two packages. One with just what is needed and another with whatever extras.

@ljharb
Copy link
Member

ljharb commented May 10, 2022

@jason-ha thanks for that; i'll update the list to include that one also.

@ljharb
Copy link
Member

ljharb commented May 10, 2022

@jason-ha if npm offered a way to have the "default" installation be minimal, and a "full" installation be something that could be opted into, i would absolutely do that immediately - but npm doesn't offer it. Having two packages would not solve anything because then I wouldn't have the "full" one installed when I suddenly find myself without internet and needing to test dependencies.

Please feel free to file bugs on any security scanning tools that incorrectly flag these files.

@jason-ha
Copy link

I will direct you to the #235 thread about the tooling. Reading through some related issues it is not just tooling my team uses that is impacted.

It seems that two packages would very much solve the issue. In your use case of having everything, you would never reference the production only package. The other folks who never want to deal with the resolve tests as part of their production environment would never reference the "full" package.

@ljharb
Copy link
Member

ljharb commented May 10, 2022

The problem is that I always want the full package, transitively.

@jason-ha
Copy link

Either I am not understanding you or you are misunderstanding the proposal. Proposing a new package; let's call it resolve-prod-only. It only contains the essential files. You would not use it. You keep on using resolve.

@ljharb
Copy link
Member

ljharb commented May 10, 2022

@jason-ha I'm saying that even when resolve is 10 levels deep, I still want its test installed. The only way to achieve that is to either have the status quo, or, have every dependent use the full package.

The interest you highlight means that folks would switch to resolve-prod-only, which would defeat the point.

@jason-ha
Copy link

Please help me understand why you want the tests installed in all situations. I can understand there are packages you work with and would get to stick with resolve (full) dependency. Why in anyone's usage do you get to insist that tests and whatever are installed [apart from being the package owner of course :) ] ?

It might be helpful to walk through the npm "default"/"full" scenario since you are saying that would work for you. What is a case now where resolve package is N (>1) dependency levels deep, what is install process now, and how would install process change? Is the "default" or "full" option something that exists per dependency or a top-level option such that at top every package transitively involved would get the "full" install?

I am also wondering if there may be a solution where tests are packaged apart from resolve. When you get to the local npm install step or your npm install foo && npm explore foo && npm install && npm test expectation that is when the tests are really installed. From the previous conversation on 235 it is the first three steps that you do to prepare to be offline and npm test is executed later.

@ljharb
Copy link
Member

ljharb commented May 10, 2022

First of all, this isn't just for resolve; it's for hundreds of other packages I maintain as well - resolve is just one of them.

The problem is that my local npm cache is only guaranteed to contain what's in my projects' transitive dependency graphs. Thus, resolve appears, but resolve-full doesn't. npm install --prefer-offline works fine with a primed npm cache.

The problem is that I can't predict when I'll need resolve-full, and when I do, I might not have internet available. So, the only way I can ensure it's always present - so that it's always present when I need it, internet or no - is to have it always be present.

npm could solve this potentially by allowing a package to have multiple "distributions", and then the minimal one would be the default, and I'd set an npm config value to always install the "full" one. However, short of that, this is the only practical approach.

@jason-ha
Copy link

Definitely understand that is it not just resolve and the issue goes to all packages built in this way. Packaging of things beyond the minimal is not a best practice for secure products. But security conversation should probably happen in issue 235 where there is more content to date.

I am not suggesting resolve-full; I want to but expect that is dead on arrival. I am suggesting resolve-prod-only. You keep using resolve; no changes for you. Please say how leaving resolve as is and providing an alternate for all those finding the minimal package advantageous is a problem.

I was looking for "official" guidance on this question and NPMJS does recommend chucking things that are not required like test data. It also indicates that a smoke test is fine, but (a) that is not a whole test suite and (b) doesn't really say it is for use once installed. And elsewhere recommendation for testing a module is to create a test external to the package and see also here.

It is not clear to me if you are completely opposed to the other solution I was wondering about where the tests are brought in via the install in package scope step. As far as I can tell that can meet your caching and offline requirements.

@ljharb
Copy link
Member

ljharb commented May 11, 2022

My use case only works if everyone chooses the full package. That’s the problem.

@jason-ha
Copy link

Please explain everyone. This project has 12.6M repos / 10k packages that depend on it. I don't see how you need every single instance of those uses to have full package.

Can you help explore alternatives to the current pattern that resolve this issue for consumers and are acceptable for your situation? I want to encourage your "will try anything once" part of your profile. I see two possibilities right now (beyond the rejected two package "distributions") that don't require package manager features:

  1. the previously listed pattern where tests are brought in via install
  2. leverage your git cache - details below

To the extent that I understand your use case, you can before going offline ensure your local git repositories are synced with remote. When you explore into package you will know the version and may grab all of the additional files from your local repository at the same version. Now you can test. You will even have the additional option to validate that what came installed matches your repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants