Storing post installation artifacts in offline mirror #50
base: master
## Modification to `yarn` offline mirror structure and `yarn.lock`
- Store post-installation artifacts under a post-installation subdirectory when using the yarn offline mirror. The _resolved_ field should reflect this change as well.
Why does the resolved field need to change?
We now need to store a path (`post-install/foo.tar.gz#xxxx`) instead of just a file name (`foo.tar.gz#xxxx`). Very minor difference, but I thought I should call it out.
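To make the difference concrete, here is a minimal sketch of how a consumer could tell the two `resolved` forms apart, assuming the `post-install/<file>#<hash>` shape quoted above. `parseResolved` and `POST_INSTALL_DIR` are hypothetical names, not part of Yarn.

```typescript
// Hypothetical: the "post-install/" prefix comes from this discussion, not from any Yarn API.
const POST_INSTALL_DIR = "post-install";

interface ResolvedEntry {
  directory: string | null; // subdirectory inside the offline mirror, if any
  fileName: string;         // tarball file name, e.g. "foo.tar.gz"
  hash: string | null;      // text after "#", if present
  isPostInstall: boolean;   // true when the tarball lives under POST_INSTALL_DIR
}

function parseResolved(resolved: string): ResolvedEntry {
  // Split off the "#<hash>" suffix first.
  const hashIndex = resolved.indexOf("#");
  const location = hashIndex === -1 ? resolved : resolved.slice(0, hashIndex);
  const hash = hashIndex === -1 ? null : resolved.slice(hashIndex + 1);
  // Anything before the last "/" is treated as the mirror subdirectory.
  const slashIndex = location.lastIndexOf("/");
  const directory = slashIndex === -1 ? null : location.slice(0, slashIndex);
  const fileName = location.slice(slashIndex + 1);
  return { directory, fileName, hash, isPostInstall: directory === POST_INSTALL_DIR };
}
```

With a flat mirror, `parseResolved("foo.tar.gz#xxxx").isPostInstall` is `false`; for `"post-install/foo.tar.gz#xxxx"` it is `true`, and the file name and hash are unchanged.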
Is the path configurable, i.e. is it ever going to change to something other than 'post-install'?
It could be configurable via a yarn option or even an environment variable. The key feature is that we need some way in the stored file itself to tell us that it is a post-install artifact.
Imagine a project adding a new dependency with only the offline mirror: the Yarn CLI must know which installation steps should be skipped. From my point of view, the only place we can store this information is in the file name / directory path.
On second thought, there is an alternative to storing the post-installation information in the file name / directory name. We could add a file, say `.post-install`, to the stored artifact. In this case, we would not need to change the resolved field or the current structure of the offline mirror, which is a flat list.
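A minimal sketch of the marker-file alternative, assuming the mirror tooling can list a tarball's entry paths (reading the archive itself is left out). The `.post-install` name is the one suggested above; `hasPostInstallMarker` is a hypothetical helper.

```typescript
// Hypothetical marker name proposed in this comment; Yarn does not define it.
const MARKER = ".post-install";

// Returns true if the marker sits at the artifact root or directly inside a
// top-level folder (npm tarballs conventionally wrap their contents in "package/").
function hasPostInstallMarker(entries: readonly string[]): boolean {
  return entries.some((entry) => {
    const parts = entry.replace(/^\.\//, "").split("/").filter((p) => p.length > 0);
    return (
      (parts.length === 1 && parts[0] === MARKER) ||
      (parts.length === 2 && parts[1] === MARKER)
    );
  });
}
```

Because the flag travels inside the tarball, the mirror can stay a flat list of files and `resolved` keeps its current form; the trade-off is that the archive has to be opened to learn whether it is a post-install artifact.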
This is a complex problem, but I don't think it will work out via the offline mirror.
*How should this feature be introduced and taught to existing Yarn users?*
Explain the intended use case with an illustrated workflow.
# Drawbacks
- I think it might work for a subset of Node.js npm packages that don't read or write to folders outside of their own package. This won't be a generic solution for packages heavy on native code; we are working on https://github.com/jordwalke/esy to address that.
- The offline mirror is designed to be cross-platform because it caches things at the fetching phase. This feature would be platform specific, and in some cases machine specific (sometimes binaries store local paths), and it is a linking-phase cache.
- Are there examples of packages that either read/write to folders outside of their own package or store absolute paths of the local machine? The explicit assumption in the RFC is that very few, if any, packages have this kind of behavior.
- I specifically avoided the platform dependency issue in an effort to limit the reach of this RFC. I guess this is a can of worms that I cannot avoid :-(.
  There are two main ways to deal with platform-specific code: storing precompiled binaries, or compiling during installation. Some prior art includes Python wheels (https://www.python.org/dev/peps/pep-0427/), Ruby Gems (http://guides.rubygems.org/specification-reference/#platform=), and Go (https://golang.org/src/go/build/doc.go). Python and Go store platform-dependent binaries in their packages, while Ruby recompiles during Gem installation.
  Storing platform-dependent post-installation packages via a scheme similar to the one outlined in this RFC is my preferred choice.
  Pros:
  - Guaranteed consistent installation across machines with the same OS / arch / Node version combination.
  - Compatible with NREs.
  - Adds no cost for package owners. The choice of which OS / arch / Node version combinations to support is made by the post-installation package maintainer, presumably someone who has those specific needs.
  - Possible to tar up the entire set of installed packages and copy it to other machines with the same platform. This means it will be possible to track a single version as a tar file across its life cycle, which can be a great feature for enterprises.
  Cons:
  - The matrix of OS / arch / Node versions to support can explode. This is somewhat mitigated by the fact that maintainers can choose how big a matrix they want to support.
  - Will not work if there is machine-specific code, like linking to an absolute path.
  - Adds further complexity to Yarn (or a plugin).
- I chose to reuse the offline mirror because I don't want to introduce another cache. If it is conceptually cleaner to have a separate cache for post-installation artifacts, that's a change we should make to this RFC.
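To illustrate how the OS / arch / Node version matrix from the pros and cons above grows, here is a minimal sketch of a hypothetical artifact naming scheme. The naming format, `artifactName`, and `supportMatrix` are assumptions for illustration only; in Node the real values could come from `process.platform`, `process.arch`, and `process.versions.modules` (the Node ABI version).

```typescript
// Hypothetical naming scheme for platform-specific post-install artifacts.
interface Platform {
  os: string;      // e.g. "linux", "darwin", "win32"
  arch: string;    // e.g. "x64", "arm64"
  nodeAbi: string; // Node ABI version, e.g. "57"
}

// Assumes an unscoped package name (a scoped "@scope/name" would need escaping).
function artifactName(pkg: string, version: string, p: Platform): string {
  return `${pkg}-${version}-${p.os}-${p.arch}-node${p.nodeAbi}.tgz`;
}

// Every combination a maintainer chooses to support needs its own artifact.
function supportMatrix(oses: string[], arches: string[], abis: string[]): Platform[] {
  const result: Platform[] = [];
  for (const os of oses) {
    for (const arch of arches) {
      for (const nodeAbi of abis) {
        result.push({ os, arch, nodeAbi });
      }
    }
  }
  return result;
}
```

Supporting three OSes, two architectures, and three Node ABI versions already means 3 × 2 × 3 = 18 artifacts per package version, which is the explosion the cons list refers to.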
- Here is an example with node-gyp: "Parallel workers running install scripts can interfere" (yarn#1874).
- Yep, I know the pain, but we have to deal with it, as many projects share the same yarn.lock files and offline mirror .tgz files across all OSes.
- I am pretty sure a postinstall cache should be independent from the offline mirror.
This observation leads to the assumption that most installation scripts only modify files and directories within their own folders. Consequently, we can store the post-installation content as artifacts without worrying about inter-module dependencies.
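The assumption above can be checked mechanically if you know which paths an install script wrote. A minimal sketch, assuming that list of absolute POSIX paths is already available (collecting it, for example by tracing the script, is out of scope) and ignoring symlinks; `writesOutsidePackage` and its helpers are hypothetical names.

```typescript
// Split an absolute POSIX path into segments, resolving "." and ".." lexically
// (no filesystem access, so symlinks are not followed).
function segments(absPath: string): string[] {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop(); // ".." at the root stays at the root
    else out.push(seg);
  }
  return out;
}

// True when `target` is `packageDir` itself or lies somewhere beneath it.
function isInside(packageDir: string, target: string): boolean {
  const base = segments(packageDir);
  const t = segments(target);
  return t.length >= base.length && base.every((seg, i) => t[i] === seg);
}

// Which of the paths an install script wrote escape its package folder?
function writesOutsidePackage(packageDir: string, writtenPaths: readonly string[]): string[] {
  return writtenPaths.filter((p) => !isInside(packageDir, p));
}
```

Comparing whole segments rather than string prefixes keeps a sibling such as `node_modules/foobar` from being mistaken for part of `node_modules/foo`. A package for which this returns an empty list fits the RFC's assumption; anything else could not safely be stored as a self-contained artifact.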
The offline mirror kicks in at the `fetch` phase. After the .tgz file is extracted into the global cache folder, the `link` phase starts. During the `link` phase, files are copied from the cache into node_modules (taking hoisting into account), and then `lifecycle scripts` are executed that modify some files in those node_modules.
You would have to generate a new .tgz file for each package folder that was modified during the `lifecycle scripts` phase, disable their `lifecycle scripts`, and then modify the yarn.lock file to point to the new .tgz file.
That could be quite hard to implement without bringing too much complexity into Yarn.
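A minimal sketch of the last step described in this comment (pointing yarn.lock at the repacked tarballs), assuming a heavily simplified in-memory lockfile model. The `skipLifecycleScripts` flag is hypothetical and is not a real yarn.lock field; packing the modified folders into new .tgz files is assumed to have happened already.

```typescript
// Simplified, hypothetical model of a yarn.lock entry; the real lockfile has more fields.
interface LockEntry {
  version: string;
  resolved: string;
  skipLifecycleScripts?: boolean; // hypothetical flag, not a real yarn.lock field
}

type Lockfile = Record<string, LockEntry>;

// For every package whose folder changed during the lifecycle-scripts phase,
// point `resolved` at the newly packed tarball and mark its scripts as already run.
// Returns a new lockfile object; the input is left untouched.
function pointAtPostInstallArtifacts(
  lockfile: Lockfile,
  repacked: Record<string, string>, // lockfile key -> "post-install/<file>#<hash>"
): Lockfile {
  const next: Lockfile = {};
  for (const key of Object.keys(lockfile)) {
    const entry = lockfile[key];
    if (Object.prototype.hasOwnProperty.call(repacked, key)) {
      next[key] = { ...entry, resolved: repacked[key], skipLifecycleScripts: true };
    } else {
      next[key] = { ...entry };
    }
  }
  return next;
}
```

Even in this reduced form, each repacked entry changes meaning (it is no longer the file "as downloaded"), which is the source of the complexity this comment is concerned about.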
It sounds like your suggestion is to use a separate cache not related to offline mirror to store those artifacts?
Yes, I think this will make the offline mirror cache too confusing.
The idea of the offline mirror cache is that it stores each file exactly as it was downloaded from a remote repository; this RFC adds a lot of new conditions.
- Installation scripts may serve legitimate purposes in certain circumstances
- Requires significant effort to educate node module writers
- Working on a per-package basis and updating all dependent packages might take a long time for the necessary changes to propagate.
The npm community is large and free to do anything, so it will be impossible to enforce any kind of behavior.
The right thing to do would be for community members to work with the packages individually to make them installable through a mirror (sinopia-based mirrors have the same problem) and without Internet access: raise issues, send PRs, fork.
This may be a dumb question, but why do some npm packages need Internet access to be installed? Why can't they hold all the needed information within the package itself (aside from defined dependencies)?
PhantomJS, for example, actually downloads its platform-specific binary upon npm module installation. The npm module is just a wrapper.
I suppose it could package up each target platform/architecture binary and only configure the intended one for that runtime.
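A minimal sketch of that idea, assuming a hypothetical package layout that bundles one prebuilt binary per platform/architecture instead of downloading it at install time. The `vendor/...` paths and `selectBinary` are illustrative only and do not describe PhantomJS's actual packaging; at runtime the arguments would typically be `process.platform` and `process.arch`.

```typescript
// Hypothetical layout: one bundled binary per "<platform>-<arch>" key.
const BUNDLED_BINARIES: Record<string, string> = {
  "linux-x64": "vendor/linux-x64/phantomjs",
  "darwin-x64": "vendor/darwin-x64/phantomjs",
  "win32-x64": "vendor/win32-x64/phantomjs.exe",
};

// Pick the binary for the current runtime, failing loudly instead of downloading.
function selectBinary(platform: string, arch: string): string {
  const key = `${platform}-${arch}`;
  if (!Object.prototype.hasOwnProperty.call(BUNDLED_BINARIES, key)) {
    throw new Error(`No bundled binary for ${key}; supported: ${Object.keys(BUNDLED_BINARIES).join(", ")}`);
  }
  return BUNDLED_BINARIES[key];
}
```

The cost of this design is package size (every consumer downloads every binary); the benefit is that installation needs no network access beyond fetching the package itself.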
Although I agree that the right thing to do is to work with the package owner to remove network dependencies, the process has proven slow and sometimes unresponsive. We not only need to work with the owner of the package in question; in some cases, we need to work with dependent packages and dependents of dependents as well.
That is the default assumption - a package is released "as is" and I think it is an exception when a package author has time to support more use cases.
- Updates to installation-time downloads are ignored / require explicit action
Installation scripts tend to download the latest version of dependencies. A stored post-installation artifact will always contain the same versions of those dependencies and thus potentially will not have the latest ones. To update such installation-time downloaded dependencies, explicit action from offline mirror maintainers will be required.
# Alternatives
As I said above, this RFC goes beyond the concerns of the Offline Mirror feature.
I think the problem may be solved by caching and sharing a built package in some way.
This may not work across platforms and machines; it depends on each package and on how a project is built.
I would try:
- disabling `lifecycle scripts` for a package that needs Internet access (maybe with this setting in package.json)
- before the package is installed from the Yarn cache, replacing the cache with the prebuilt content. Packing, sharing, and replacing in the cache could be automated in some way by Yarn, a plugin, or a third-party script.
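A minimal sketch of the decision this alternative implies, assuming a hypothetical per-package setting (`needsNetworkOnInstall`) and a hypothetical location of shared prebuilt content (`prebuiltArchive`). Neither exists in Yarn or package.json today; the function only plans the steps and performs no installation.

```typescript
interface PackageInfo {
  name: string;
  needsNetworkOnInstall: boolean; // hypothetical per-package setting, e.g. read from package.json
  prebuiltArchive?: string;       // hypothetical location of shared prebuilt content
}

type Step =
  | { kind: "run-scripts"; name: string }                                       // normal install
  | { kind: "replace-cache-with-prebuilt"; name: string; archive: string }      // scripts disabled
  | { kind: "missing-prebuilt"; name: string };                                 // scripts disabled, nothing to use

// Packages that need the network never get their lifecycle scripts run; they are
// served from prebuilt content instead, or reported when none has been shared yet.
function planInstall(packages: readonly PackageInfo[]): Step[] {
  return packages.map((pkg): Step => {
    if (!pkg.needsNetworkOnInstall) return { kind: "run-scripts", name: pkg.name };
    if (pkg.prebuiltArchive === undefined) return { kind: "missing-prebuilt", name: pkg.name };
    return { kind: "replace-cache-with-prebuilt", name: pkg.name, archive: pkg.prebuiltArchive };
  });
}
```

Unlike the RFC's approach, nothing here touches the offline mirror or `resolved`: the prebuilt content is kept separately and only swapped into the Yarn cache before linking.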
I think those that operate in an NRE would likely be less concerned about cross-platform compatibility.
> I think those that operate in an NRE would likely be less concerned about cross-platform compatibility

I'm at Red Hat, working in NREs on multiple architectures.
I would look at this from another angle: some FB employees are working on a generic solution to bring appropriate binary compilation to Yarn (yarnpkg/yarn#480 (comment)).
Initial version of the RFC for storing post-installation artifacts in the offline mirror.
Derived from part of the discussion in yarnpkg/yarn#393 with @bestander.