Improving support for symlinks in the module system #46
@phestermcs Please stop trying to change the process. I'm literally in the middle of writing a response to the issues you brought up in the PR right now. Let's do things the way that the Node project does things, respecting the established process, and try to follow the lead set out by the project maintainers. |
I'm attempting to offer solutions for the following when using symlinks:
|
Pinging @bmeck. We should consider the impending ESM support and any impact module system changes might have with that in mind. |
Re nodejs/node#10132 (comment)
I see. So, to make sure we're on the same page, let me try to describe the problem, and y'all can tell me if I'm missing anything. Suppose two apps with the same logical dependency requirements:
Let's further stipulate that
We install in
Some time later, after some updates have been published to bar, we install in
If we use an approach like pnpm and ied, where every instance of So this arrangement would thus be impossible:
This is of course trivial to do without symlinking into a store, because you just have multiple copies of stuff. This is the "disk is cheap" approach. It works pretty well today, because disk is pretty cheap, but it does incur some less obvious costs like building packages, downloading multiple times, etc. There are some other ways to mitigate those costs, of course, and they should be explored separately. However, in this challenge, we want to say that there's exactly one copy on disk of Furthermore, in any satisfying solution, (A) there must be a logically correct dependency graph loaded at run time (ie, everything gets a copy of what it depends on, meaning that there are 2 copies of If I'm not understanding the nature of the logical puzzle, please let's stop and clarify. I don't want to rush to solutions until the problem is very well understood. |
No known problems with ESM on this front. If we were talking archive formats that serialize symlinks like ASAR / NODA / WebPackage we might have things to think about, but none are currently supported by Node. |
Now that we have established the technical challenge and agreed on the terms of it, I'd like to seek alignment on the heuristics to be used in evaluating potential solutions. Here's a sketch of a list of what's important to me, personally, in ranked priority order. This is a question of values, and it's necessarily somewhat subjective, so I'd love to get some input from others in the nodejs core maintainer group about what the project's priorities are.
Does this seem like a fair set of heuristics for judging solutions? I realize that these will be somewhat in conflict with one another, but that's why they call it a trade-off :) If it seems that I'm being overly methodical and pedantic, it is because any change to the module system demands extreme diligence. Last time there was a disruption here, I promised that I would do everything in my power to help avoid it in the future. |
Hah, my intent is that we make any change as un-exceptional as possible ;) |
Regarding the condition of the challenge, I think it should also include a way out of symdir cycles when representing module dependency cycles. |
@phestermcs Can you elaborate on what you mean by "a way out of symdir cycles"? |
So, is it a case like this? A -> B So, my app depends on A, A depends on B, B depends on C, and C depends on A. You want to avoid situations where I can end up with a path like Am I understanding it properly? If so, that seems like a fair restriction to me. |
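A minimal sketch of how a package manager could detect such a dependency cycle before linking anything, assuming the dependency graph is available as a plain object (the `graph` shape and the `findCycle` name are illustrative, not taken from any tool discussed here):

```javascript
// Depth-first search that returns the first dependency cycle reachable
// from `start`, or null if there is none.
// `graph` is a hypothetical map: package name -> array of dependency names.
function findCycle(graph, start) {
  const stack = [];          // current chain of packages being visited
  const onStack = new Set(); // fast membership test for `stack`
  const done = new Set();    // packages already proven cycle-free

  function visit(name) {
    if (onStack.has(name)) {
      // Seen again while still on the chain: report the loop portion.
      return [...stack.slice(stack.indexOf(name)), name];
    }
    if (done.has(name)) return null;
    stack.push(name);
    onStack.add(name);
    for (const dep of graph[name] || []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    onStack.delete(name);
    done.add(name);
    return null;
  }

  return visit(start);
}

// The case described above: app -> A -> B -> C -> A
const graph = { app: ['A'], A: ['B'], B: ['C'], C: ['A'] };
console.log(findCycle(graph, 'app'));
```

With a cycle identified up front, a linker can decide where to stop creating nested links rather than discovering an unbounded path at run time.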
Correct. Fwiw, Sticking with subordinate |
Yes, sticking with the somewhat naive "link everything into the store" approach, where things in the store have a |
@phestermcs I mean you have a folder structure like this:
That won't work because updating a second app updates the This is the naive "use symlinks into a central store" approach, but it falls down on 2 points. (I'm using "naive" as a compliment here, meaning "do the simplest, most obvious, least clever thing".) |
@isaacs Ok, I thought that's what you may have meant; just double checking. But that does remind me of still another "meet the challenge" point. There also is the case of handling "bundled" node_modules as well. i.e. It should still be possible to have a module in a machine store that does have a '/node_modules' subfolder but all of its content is 'locked' to that module (no symlinks to outside of module), most simply as copies. For example |
@phestermcs Please let's keep the scope of this problem and solution limited, or else we'll never be able to make progress in reasonable time. Saying that any solution is backwards compatible means that any existing solutions to those problems (for example, just copying files into folders) will continue to work just fine. |
Yes, backwards compatibility means that current apps work with the future node. Forwards compatibility, where future apps work with current node, is much harder to attain for any semantic change or bug fix. Presumably the entire point is that there's a problem we want to fix or something we want to do that is not currently possible. |
Here's another proposal possibility that might work.
I haven't tested it yet, but I believe that would solve the peer dependency issue addressed by The first module store symlink approach is to create actual nested directory hierarchies, but symlink the contents of the packages into their destinations. This results in a layout on disk that is more parallel to the naive "disk is cheap" approach, and that parallel is appealing. However, it necessarily means that the package manager has to keep a record of files in each package, and it's WAY more actual symbolic links. If packages are linked from their development location into the module store, then the developer will have to refresh all of their module installations whenever a file is added to the package. (Maybe that happens rarely? Or can be somehow detected/automated?) So, it doesn't score very highly on the 5th heuristic (minimizing package manager complexity). That would look something like this in practice (where
The second module store symlink approach is to lay files out on disk with a It looks like this:
I'll try to find some time to make a patch for this minimal module system change so that we can investigate it further and try to shake out some other tradeoffs. Once there are at least 2 or 3 options, we can figure out which one satisfies the criteria the best, and with a minimum of new footguns. |
@phestermcs Both module store approaches would depend on the same change to the module loader, described at the start of my post in 3 bullets. The existing module loader already knows what to do with the package.json file as described. The only internal change to node would be that the module search paths include both the |
If the symlink-folder will be always named This structure would be enough, wouldn't it? package.json is in contents anyway...
|
This solution will work when someone requires the package via the entry point. However, frequently people are using a different file from the package, like |
I wrote up a patch for what I'm suggesting: isaacs/node@d0f9a99 It isn't yet ready for a PR (not least because it needs tests, docs, and the like), and I'd like to drive this discussion towards a shared understanding of the problem and requirements of a solution. I only wrote this patch because sometimes it's better to communicate thoughts in code rather than prose :)
Indeed! That's a shortcoming of the Another unaddressed concern: loading deps from modules loaded within the dependent module. For example, if we slightly alter my
// foo/index.js
console.error('in foo', module.paths)
require('./other-module.js')
// foo/other-module.js
console.error('in foo/other-module.js', module.paths)
var bar = require('bar')
console.log('foo using %s', bar)
Then we get this result:
This can be fixed by also passing along the parent module paths to its children modules. I'll add that in soon. @phestermcs I must admit I don't understand what you're talking about with respect to |
Yeah, this fixes the issue: isaacs/node@f727658
EDIT: updated commit link after rebasing to node master |
Effectively, all modules derive their
Example
In practice the structure and how it is accessed (tooling launching tooling, etc) is more complex and has different patterns, so this is a contrived simplification to highlight the fundamental problem inherent to all of them.
With current behavior of converting "mod"'s path to its realpath, and using as the main modules In your proposed solution, using a realpath in the search list will also have the same effect, just lower down in the link tree. |
@phestermcs I'm trying to find solutions that do not add additional new API surface, including new folder paths that users might be confused by. Adding And since it only adds support for something that didn't work at all before, and only at a lower priority than the current search paths, there's very little chance of someone's existing use case being disrupted. |
Adding lookup paths will have a performance impact as well. We've done a lot of work recently to make all this faster and if we go the |
@phestermcs @mikeal Yes, I agree, that's a concern. Adding 2 or 3 additional folder lookup paths is probably fine. Adding 10 is less likely to be ok. So, if we go from this:
[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
'/Users/isaacs/dev/.store/foo/node_modules',
'/Users/isaacs/dev/.store/node_modules',
'/Users/isaacs/dev/node_modules',
'/Users/isaacs/node_modules',
'/Users/node_modules',
'/node_modules' ]
to this:
[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
'/Users/isaacs/dev/.store/foo/node_modules',
'/Users/isaacs/dev/.store/node_modules',
'/Users/isaacs/dev/node_modules',
'/Users/isaacs/node_modules',
'/Users/node_modules',
'/node_modules',
'/Users/isaacs/dev/app4/node_modules/foo/node_modules',
'/Users/isaacs/dev/app4/node_modules' ]
then that feels a lot less risky to me than this:
[ '/Users/isaacs/dev/.store/foo/1.2.3/node_modules',
'/Users/isaacs/dev/.store/foo/1.2.3+node_modules',
'/Users/isaacs/dev/.store/foo/node_modules',
'/Users/isaacs/dev/.store/foo+node_modules',
'/Users/isaacs/dev/.store/node_modules',
'/Users/isaacs/dev/.store+node_modules',
'/Users/isaacs/dev/node_modules',
'/Users/isaacs/dev+node_modules',
'/Users/isaacs/node_modules',
'/Users/isaacs+node_modules',
'/Users/node_modules',
'/Users+node_modules',
'/node_modules',
'/Users/isaacs/dev/app4/node_modules/foo/node_modules',
'/Users/isaacs/dev/app4/node_modules/foo+node_modules',
'/Users/isaacs/dev/app4/node_modules',
'/Users/isaacs/dev/app4+node_modules' ] |
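To make the cost comparison concrete, here is a rough sketch of how the second (nine-entry) list could be derived: walk up from the real path, then walk up from the symlinked location, and append only the directories not already present. The function names are made up for illustration, and the de-duplication step is an assumption about one way to arrive at that list, not a description of what the patch does.

```javascript
const path = require('path').posix; // posix paths so the example is platform-independent

// Walk from `dir` up to the root, collecting a node_modules folder at each
// level (skipping levels that are themselves named node_modules).
function nodeModulePaths(dir) {
  const paths = [];
  let current = path.resolve(dir);
  for (;;) {
    if (path.basename(current) !== 'node_modules') {
      paths.push(path.join(current, 'node_modules'));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached '/'
    current = parent;
  }
  return paths;
}

// Hypothetical combination: realpath-based lookups first, then any extra
// lookups contributed by the symlink location, at lower priority.
function combinedPaths(realDir, linkDir) {
  const seen = new Set();
  const out = [];
  for (const p of [...nodeModulePaths(realDir), ...nodeModulePaths(linkDir)]) {
    if (!seen.has(p)) {
      seen.add(p);
      out.push(p);
    }
  }
  return out;
}

console.log(combinedPaths(
  '/Users/isaacs/dev/.store/foo/1.2.3',
  '/Users/isaacs/dev/app4/node_modules/foo'
));
```

Because the two locations share most of their ancestors, only two extra directories end up being searched in this case, which is the "2 or 3 additional lookups" scenario rather than the doubling shown in the third list.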
For now, don't think of my solution as 'the' solution, but as a fully working one.. a baseline.. a bar to meet. a way to actually experience any solution. |
1. A file symlink that is not the main module. Haven't tested it.
2. A file symlink that is a link to a symlink, even as the main module. Still fails.
3. Since anm only looks one symlink deep....../home/isaacs/node_modules won't be in the module search path. I'm pretty sure that is the desired behavior though, isn't it?
I have been thinking about it, and I've concluded that symlinks per se aren't needed to pull off machine-level stores. For instance, it would be an acceptable compromise to wrap commands with a shim, because npm already does this on Windows. If we're going to need to wrap top-level applications in a shim regardless, we might as well use that shim to monkey-patch |
@wmhilton the ESM proposal does not allow modifying the resolution algorithm. The very specific hooks that are safe to allow are shown in the slides from the Sept TC39 meeting. I would take a look at that and see if it is possible using ESM cache manipulation or if your solution only working in CJS works. |
@wmhilton #46 (comment)
Warning! This is definitely possible, but leads almost invariably to a very dark and twisted maze of shims and complexity. The reason that node and npm switched to exclusively using
I've spoken about this with a few folks who actually see the "update metadeps at a distance" as a feature rather than a downside of ied and linked module directories. (Not people using ied itself, but using npm link, and eager to try out ied/pnpm for this reason, because "it sounds like npm link, but even better".) Again, I'm not sure how this changes the breakage/value considerations here, but at least, it's worth keeping in mind that removing support for "lookup deps based on realpath" is not going to be an unmitigated benefit.
@bmeck I think that we should be very wary of adding any features that will make ESM have to behave (even more) significantly different from CJS modules. The more parallels we can maintain, the better it'll be for people transitioning from one to the other. Thanks for sharing these slides. I'll review these this week and try to see if I can imagine a way to conceptualize the same effective behavior in a way that doesn't violate the constraints of ESM loading. The nice thing about both the current behavior and
At first glance, it looks like it'll be very difficult to maintain this unless we go the route (like |
Just to be clear, I mean this warning in the sense of "That is going to be an exciting adventure!", and I strongly encourage you to try it, if you've got the time and are in the mood for adventure. But it definitely won't be safe, so it shouldn't be something that npm or node-core try to do :) |
@isaacs I wanted to try your fork of Node with a slightly modified pnpm. I had this error when trying to run the pnpm tests:
Is your fork of Node ready for experimenting with? Let me know if I can help with something. |
@isaacs Keep us up to date. I am concerned about the potential startup-time increase from the extra searching if symlinks are very common, and about resolution becoming more non-deterministic than it already is. |
@bmeck Yeah, the ep-46 branch is probably dead on the vine because of the nondeterminism. The startup time is likely not going to be relevant. It only adds 4-5 lstat calls total, since the stats are cached, and all modules in use are likely linked into the same store path. Now that realpath uses the system builtin, it's also a lot faster than it used to be. @zkochan It should work fine, I'm surprised that any tests are failing. Does it work with node on master? |
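As an illustration of why cached stats keep the added cost small, here is a sketch of a memoized `lstat` wrapper. It is not Node's internal cache; `makeCachedLstat` and its `diskCalls` counter are invented for this example.

```javascript
const fs = require('fs');
const os = require('os');

// Returns an lstat-like function that touches the disk at most once per path.
function makeCachedLstat() {
  const cache = new Map();
  let diskCalls = 0;

  function cachedLstat(p) {
    if (!cache.has(p)) {
      diskCalls += 1;
      let kind = null; // null means "does not exist or not readable"
      try {
        const st = fs.lstatSync(p);
        kind = st.isSymbolicLink() ? 'symlink' : st.isDirectory() ? 'dir' : 'file';
      } catch (err) {
        kind = null;
      }
      cache.set(p, kind);
    }
    return cache.get(p);
  }

  cachedLstat.diskCalls = () => diskCalls;
  return cachedLstat;
}

const lstat = makeCachedLstat();
const dir = os.tmpdir();
lstat(dir);
lstat(dir);
lstat(dir);
console.log('disk lookups:', lstat.diskCalls());
```

If every module in a process is linked into the same store path, most lookups repeat the same handful of directories, so only the first check for each one reaches the filesystem.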
@isaacs is there a change in the |
@isaacs I tried with master. No issues there |
@thealphanerd Ah, ok. I'd thought it was using the native realpath and then falling back to the JavaScript implementation, but I guess that was just a suggestion I heard somewhere and didn't make it into core. @zkochan Is it possible to reproduce that test in a reasonably minimal way? Anything that works on master should work on ep-46. |
Thanks! And yes, it's precisely because it is crazy and out there that I'm going to try to do it in user space via monkey patching rather than by branching node core. It'll be fun. When/if I get a machine-store loader working I'll post back with a link. |
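For readers unfamiliar with what "monkey patching in user space" can look like, here is a deliberately tiny sketch. It wraps `Module._resolveFilename`, which is an internal, undocumented hook of the CommonJS loader and may change between Node versions. The alias table and the name `hypothetical-store-pkg` are invented for the example; this is not the loader described in the comment above.

```javascript
const Module = require('module');

// Hypothetical alias table: request name -> what to resolve instead.
// A real machine-store loader would presumably map to paths in the store.
const aliases = new Map([['hypothetical-store-pkg', 'path']]);

const originalResolve = Module._resolveFilename;

// Rewrite aliased requests, then defer to the original resolver.
Module._resolveFilename = function (request, ...rest) {
  const target = aliases.has(request) ? aliases.get(request) : request;
  return originalResolve.call(this, target, ...rest);
};

const viaAlias = require('hypothetical-store-pkg');
console.log(viaAlias === require('path'));
```

A patch like this affects every `require` call in the process, which is part of why the surrounding discussion treats the approach as powerful but unsafe for core or npm to rely on.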
I noticed that @isaacs's solution uses the real path for This got me thinking: should there be some new variable that will have the symlink location of the file? The symlink equivalents of |
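A small experiment showing the behavior in question, under default settings (no `--preserve-symlinks`): a module required through a symlinked directory reports the real location in `__dirname`, not the link it was reached through. The temporary directory layout below is made up for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical layout: <root>/store/foo is the real package directory and
// <root>/app/node_modules/foo is a symlink pointing at it.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'symlink-demo-')));
const realDir = path.join(root, 'store', 'foo');
const linkDir = path.join(root, 'app', 'node_modules', 'foo');

fs.mkdirSync(realDir, { recursive: true });
fs.mkdirSync(path.dirname(linkDir), { recursive: true });
fs.writeFileSync(
  path.join(realDir, 'index.js'),
  'module.exports = { dirname: __dirname, paths: module.paths }\n'
);
// 'junction' only matters on Windows; other platforms ignore the type.
fs.symlinkSync(realDir, linkDir, 'junction');

const loaded = require(path.join(linkDir, 'index.js'));
console.log('required via :', linkDir);
console.log('__dirname    :', loaded.dirname);
console.log('first lookup :', loaded.paths[0]);
```

The module has no built-in way to learn that it was reached via `app/node_modules/foo`, which is the gap a "symlink location" variable would fill.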
I have an update regarding this thread. I've been working on changing pnpm, so that it does not rely on the The latest version of pnpm (which is 0.51.2) uses a global (machine) store and works without any changes in Node.js. We did a lot of tweaks to make it work, but the main ones are:
So it is achievable to create a global store without changes in Node.js and without |
Just updating this to say I no longer care about symlink support. Since we've managed to come up with two ways to create a global store without using symlinks (require-hacking and hard links) I think the global store use case is no longer a valid justification for changing the behavior of require in node core. |
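A sketch of why hard links avoid the realpath question altogether: a hard-linked file is the same inode under a second name, so there is no link for the loader to resolve. The directory names below are placeholders and do not reflect pnpm's actual store layout.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical layout: one copy of the file in a store, hard-linked into an app.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'store-demo-')));
const storeFile = path.join(root, 'store', 'bar', 'index.js');
const appFile = path.join(root, 'app', 'node_modules', 'bar', 'index.js');

fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.mkdirSync(path.dirname(appFile), { recursive: true });
fs.writeFileSync(storeFile, "module.exports = 'bar from the store'\n");
fs.linkSync(storeFile, appFile); // second directory entry for the same file

const sameInode = fs.statSync(storeFile).ino === fs.statSync(appFile).ino;
const resolved = fs.realpathSync(appFile);

console.log({ sameInode, resolvedStaysInApp: resolved === appFile });
console.log(require(appFile));
```

The trade-off is that hard links apply to files rather than directories and only work within a single filesystem, so the package manager has to link each file individually.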
@zkochan @wmhilton Interesting, thanks for the update. That still doesn't solve the symlinked peer-dependency use-case though, correct? |
@isaacs Er... what was that use case again? |
@wmhilton
I do |
@isaacs your example is a little bit confusing. Plugin will most likely have framework as a peer dependency, and pnpm links peer deps to the node_modules folder of the dependent packages. So On the other hand, if Time will show, but this approach seems to work fine with apps like eslint, karma, babel |
What system writes cmd shims that update the Dig up the discussions that led to |
I forked This is my personal opinion, but I think the solution should be either the same as |
Is this for development purposes? Or are symlinks and --preserve-symlinks used in production? I'm just confused because I've installed packages that use peer dependencies and I've never seen npm insert a symlink for them.
Maybe, but I have reason to believe at this point it could be solved outside of node core. |
@jasnell is this still desired since |
Continuation of the discussions nodejs/node#10132 and nodejs/node#10107 ... Please use this thread to discuss the high-level architecture, merits, and long-term impact of the proposed change so that the PR discussion can be focused on the actual code review.