Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RRFC] Only use registry provided URLs to fetch tarballs #472

Open
iarna opened this issue Oct 11, 2021 · 6 comments
Open

[RRFC] Only use registry provided URLs to fetch tarballs #472

iarna opened this issue Oct 11, 2021 · 6 comments

Comments

@iarna
Copy link
Contributor

iarna commented Oct 11, 2021

Motivation ("The Why")

npm 7 introduced a change where it computes tarball URLs from the registry name instead of using the URL provided by the registry. This causes third party registries that don't host their own tarballs to fail.

How

Current Behaviour

Tarball URLs are rewritten to one's that are not guaranteed to match the one's that are registry provided nor in the lockfile. This amounted to a substantial change to the npm protocol.

Desired Behaviour

This seems to have been implemented so that when you change your default registry, npm would download tarballs from the new registry regardless of the lockfile. This is clearly desirable behavior, but the assumptions about how third party registries store their tarballs are not always correct.

I would propose that:

A) The registry config in use when a lock-file is generated should be recorded to the lockfile.
B) When the currently configured registr(y|ies) changes (determined by comparing config to lockfile), npm will refresh tarball URLs by querying metadata from the new registry. This would allow for the new npm7 functionality without breaking third party registries that don't host their tarballs the same way that npmjs.org does.

Open Questions

  • When npm finds a registry change or new tarball URLs should it update the lockfile?
    • Naively, the answer feels like yes, but my impression is that the core use case of this feature is for environments where the same lockfile is used both in public environments (with npmjs.org) and private one's (with a private registry). If so, updating the lockfile every time would not be desirable.ee

References

@darcyclarke darcyclarke added the Agenda will be discussed at the Open RFC call label Oct 12, 2021
@isaacs
Copy link
Contributor

isaacs commented Oct 13, 2021

Initial thoughts (we'll talk it through more in the meeting today, so any/all of this subject to change, of course). I'm definitely leaning in favor of this proposal.

  • Tracking the registry in use in the lockfile is an extremely good idea. I'm surprised no one suggested it before now, but I guess most good ideas look obvious once you see them 😅 I think this can be done as a non-breaking change to the registry format, not sure if it's worthy of bumping the lockfileVersion (which is a pretty big change), but if not, it does mean we'll have to check for the existence of the field. (We already do this with the requires: true flag, so it's not terrible if we go that way.)
  • Historical note, the https://registry.npmjs.org/ string has been slightly magical for quite some time, and gets replaced with the --registry config. We tried to change things to use a more obvious token that was clearly magical, like ${REGISTRY} or something, but that broke a lot of stuff, and would have made lockfiles unusable by npm v6. I'm not sure we'll be able to de-magic it entirely, but this is a step in that direction, at least explicitly defining what the magic registry field was at the time the tree was resolved.
  • Really, the main user-facing behavior shift in this suggestion, for most users, would be some extra HTTP requests for packuments when you change your registry, but that's probably not going to impact performance too horribly. The internal complexity will be forking our "just grab the resolved field, whatever it is" behavior.
  • The users who might be impacted by this (and the reason it is a breaking change) is if you have a passthrough proxy that doesn't modify the response body at all, and is used as the --registry in an environment where the registry is not reachable. In those cases, it'll cause npm to try to fetch the tarballs from the dist.tarball url, which is https://registry.npmjs.org/..., rather going through the proxy. However, in that kind of case, what they ought to do is configure it as a true HTTP proxy, since npm has support for that. So, I'm fine with breaking that use case, but it is a breaking change that will affect someone, I'm sure.

When npm finds a registry change or new tarball URLs should it update the lockfile?

If --save is set, or for other reasons we'd be saving a lockfile, then I'd say yes, it definitely should. (For the same reason we update resolved, integrity, etc.)

In environments where you're running npm ci, or otherwise doing a no-save reification, we don't write the lockfile at all, so we shouldn't change it.

@iarna
Copy link
Contributor Author

iarna commented Oct 13, 2021

I would also note that the problematic rewriting behavior only happens with scoped modules. IMO at the VERY least it should not have different behaviors for scoped and unscoped modules. (This was a bug in my repro. XD)

Also, lockfiles are not required -- the repro does not use a lockfile, it simply breaks basic installs.

@arcanis
Copy link

arcanis commented Oct 15, 2021

In case you find it useful, we solved this problem in Yarn by specifying that "standard registry patterns" are of the form:

/<scope>/<name>/-/<name>-<version>.tgz

If the tarball in metadata matches this URL pattern, Yarn won't store the tarball path and will derive it automatically from whatever is the currently configured registry. If it doesn't, Yarn will store the full URL within the lockfile (which will break the "easy registry switch scenario"). We found this system of graceful degradation working fairly well for us.

@ljharb
Copy link
Contributor

ljharb commented Oct 15, 2021

Notably, I believe that won't match any extant Artifactory registry.

@arcanis
Copy link

arcanis commented Oct 15, 2021

Indeed, but we didn't get requests to make it work nonetheless. Notably, a few issues have been raised by people using them, as evidenced by the logs they shared, but they never complained about it.

There's also a decent case that as the registry maintainers, you probably have more pull to standardize this behavior if you so decided (ref https://github.com/npm/feedback/discussions/55).

@darcyclarke darcyclarke removed the Agenda will be discussed at the Open RFC call label Oct 20, 2021
@fishcharlie
Copy link

What are the next steps on this? Would love to help contribute in anyway I can. npm/cli#3533 is still an incredibly infuriating problem in npm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants