Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFI: What are the know issues with --preserve-symlinks #9673

Closed
ghost opened this issue Nov 18, 2016 · 14 comments
Closed

RFI: What are the know issues with --preserve-symlinks #9673

ghost opened this issue Nov 18, 2016 · 14 comments
Labels
module Issues and PRs related to the module subsystem. question Issues that look for answers.

Comments

@ghost
Copy link

ghost commented Nov 18, 2016

I was informed nodejs would ideally prefer symlinks are always preserved, but there are known edge cases that prevent defaulting node to this behavior.

I'm requesting someone knowledgeable of the issues enumerate them, ideally separating them into hypotheticals and actuals if possible

For example, the documentation alludes to one hypothetical:

Note, however, that using --preserve-symlinks can have other side effects. Specifically, symbolically linked native modules can fail to load if those are linked from more than one location in the dependency tree (Node.js would see those as two separate modules and would attempt to load the module multiple times, causing an exception to be thrown).

From the OS perspective, loading the same so or dll 2 or more times would not in-and-of-itself cause a fault. It also appears the Modules loader code creates a node js-side Module instance for each absolute require('addon.node') path, passing the Module instance to the loaded addon to initialize. This also would not necessarily cause a fault, as the library is creating and attaching state to the particular Module instance. It's not clear in any docs if v8 would fault if an attempt was made to create a FunctionTemplate in the same Isolate for the same native function, which could occur if an addon's Init() function were called more than once; v8 could easily return the same template on subsequent calls (similar to how an OS's dlopen or LoadLibary call behaves). If the addon defined static state that was in fact state meant to be specific to a Module instance, that could create odd behavior between Module instances, and potentially fault, but has anyone actually experienced any of this, or what the docs alluded to, in the real world?

This is an actual that occurs in the real world:

--preserve-symlinks does preserve symlinks for all require() calls, but strangely not that of the entry.js file passed on node's command line. This can create real issues when package managers attempt to run lifecyle events that are node command-lets who's entry.js is located in a symlinked folder under some node_modules folder (assuming the command-let itself was configured in package.json to launch with --preserve-symlinks).

Providing technical details of all know issues with --preserve-symlinks is greatly appreciated.

@bmeck
Copy link
Member

bmeck commented Nov 18, 2016

I don't prefer symlinks be preserved due to lots of cases around cache consistency and memory bloat.

Enabling them breaks lots of the npm ecosystem that expect shared modules via things like npm link, linklocal, etc. to share the same space in require.cache.

This was attempted in core during v6, but broke vast amounts of things both feature wise and due to things like mass memory usage increase in some cases.

@mscdex mscdex added module Issues and PRs related to the module subsystem. question Issues that look for answers. labels Nov 18, 2016
@ghost
Copy link
Author

ghost commented Nov 18, 2016

@bmeck Thanks for your input.

I'm looking for something maybe a little less anecdotal and more technically precise, if possible.

For example, you state:

...lots of cases around cache consistency and memory bloat.

Regarding cache consistency, I know node keeps a map, or cache, of absolute paths to module instances, but I'm pretty sure it's always consistent in returning the same module instance for a given absolute path, regardless if that path was a resolved-symlink-path, a preserved-symlink-path, or an original-path, so it's not clear as to what you technically mean by cache consistency?

I can understand if there were 'old' installs of node_modules hierarchies, where symlinking had been a technique used to reduce storage space in those cases where a common module was used by several higher order modules, that then preserving the symlinks would result in all those places (as distinguished by the path) causing node to create several module instances in the module map (cache) for what would essentially be the same module, and that this could cosume substantially more memory. I say 'old' because it's now common for npm and yarn, etc. to bubble up common modules as high as possible, and to install copies of all modules, so symlinking in those cases wouldn't necessary take any more memory. Would you agree? (and agreeing doesn't mean there aren't still real issues in defaulting or explicitly preserving symlinks when running against 'old' node_modules installs).

Hoping you can be a bit more technically precise with this one though:

Enabling them breaks lots of the npm ecosystem that expect shared modules via things like npm link, linklocal, etc. to share the same space in require.cache.

Can you provide a concrete example I could reproduce that would exhibit the problem you are referring to? Specifically some specific module or thing from the npm ecosystem, that was relying on the fact linklocal symlinked something for sharing, and that then "broke" in some way? Also, what space are you referring to in require.cache (which I assume is the same Modules cache you were referring to earlier), the folder hierarchy?

I know right now that the opt-in --preserve-symlinks switch does not preserve the symlink of the entry.js file passed on node's command line (for some unknown reason). I also know npm can run lifecycle events that are essentially command lines that launch node, where they never specify the --preserve-symlinks (and they probably shouldn't). These two behaviors can create inconsistencies with when and where symlinks can and cannot be preserved while running lifecycle scripts, or modules that auto-load plugins, and other such things, and this can easily create the perception symlinking has issues, and attempting to preserve only makes things much worse.

I also would like to understand if when it was attempted to default node to always preserve symlinks, as you alluded to at some point in v6's life, if it also was not preserving the symlink of the entry.js passed on the command line, in exactly same way it's not preserved today when using the --preserve-symlinks switch. Do you know if that was the case or not?

Ok, so you may be asking "WTF is this guy asking all this for"?

I have a way to fix this stuff, while also helping the development time experience, by allowing modules on a machine to be physically stored once in a global location, but symlinked to in all cases from anywhere on the same machine, while still preserving specific and varying local dependency tree versions, no matter how many places they're used. This will enable install times to go from minutes to seconds, and save 10's of GBs of storage (assuming a developer has dozens of node projects installed on their machine, which many do)

Going by the responses (sometimes almost hostile if not non-existent) of several others from related issues I've posted on several other repo's, I'm positive you totally and utterly disagree that's realistically plausible, think I'm an absolute utter crackpot, that I have no idea what I'm doing nor what I'm up against, that it could never be done, that it will break the entire ecosystem, and yada yada yada..

All I can tell you is that I've been building software for 35+ years, and I've seen and done just about everything; OS's, languages, frameworks, problem domains, architectures, methodologies, industries... I mean, everything (see my avatar? that's what I look like in real life from all of it).. Maybe that seems a bit like I'm tooting my own horn, but give me the benefit of the doubt, that I might actually know what I'm doing, and help me get concrete, reproducible information, so I can have even more empirical evidence to confirm to all the nay-sayers what I'm doing will actually work.

My overall opinion on this is that the biggest challenge I'm facing is entirely about preconceived notions, misunderstanding, 2nd hand hearsay, and a bunch of anecdotal confusion. Absolutely there are definite technical issues, but they can be overcome pretty easily and they're nothing near what most everyone seems to think, and by providing a simple way of opting-in to the thing, there actually is a way to keep the ecosystem running just fine without breaking things, and still be able to move forward.

It would be a huge favor, if you could help me on this issue to collect any further details and steps of reproduction you may be aware of with respect to running node while preserving symlinks (or even using symlinks for that matter)

Warm Regards

@sam-github
Copy link
Contributor

@phestermcs You may wish to look into running citgm to see the ecosystem breakage that occurred when this was enabled. Any proposal is going to have to be shown to not break the world anyhow, so maybe start with that to see what happened for yourself.

@ghost
Copy link
Author

ghost commented Nov 18, 2016

@sam-github Thank you! this is exactly the kind of stuff I'm looking for, if what you're implying is that a version of node was released (or tested prior to release) that defaulted to preserving symlinks, and that the problems this created when running modules through citgm with that version of node, were captured as issues in the citgm repo? that'd be nice.

Id like to think I'm a part of the ecosystem and so preserving it is also in my own interest. I've also been part of many ecosystems in my time, and know just how bad it can hurt when they break (like lost customers and money kinda hurt).. see my avatar? that's from 'experience' (i.e. mistakes that hurt).

Know I innately understand and emphatically support bringing about change with out breaking the world, especially around broadly reaching, underpinning platform technology, and will absolutely do all I can to ensure such a thing before screaming for the change.

However, a little support would be nice. For example, I've already created a very small PR for node to enable my broader proposal, and one thing that would help empirically prove or disprove the world won't break, is for others to get that version of node without having to build themselves, so we can more effectively test.

What's the probability of nodejs incorporating the PR, and not documenting or publicizing the switch, treating it as "experimental", if it can be shown with the switch off, behavior doesn't change and nothing breaks? Who do I have to schmooze around here??

@Fishrock123
Copy link
Contributor

See also Isaac's post on the original bug PR that later introduced the flag: #3402 (comment)

@sam-github
Copy link
Contributor

Who do I have to schmooze around here?

shmoozing not required, providing code that does what you want, and that does not break the world (as evidenced by citgm) is required.

@jasnell
Copy link
Member

jasnell commented Nov 18, 2016

Note that the native modules reload issue is most definitely not hypothetical.
Also, enabling the symlinks flag by default leads to significant module cache bloat when using circular dependencies.

Each of these are very real problems with using this by default which is why it is opt in only

@ghost
Copy link
Author

ghost commented Nov 18, 2016

@jasnell Thank you. I took it as hypothetical given the way in which it was conveyed in the docs. I would so love it if you could technically describe the reason it's not hypothetical (what I wouldn't give to be talked to in deep technical terms), and in my broader solution, the problem is simply solved by just making multiple copies as necessary.

Re cache bloat and circular deps, I can certainly understand that, but that occurs in the way things are implemented today, and the way in which symlinks are used today. In my solution, while module cycles can certainly exist, it would by physically impossible for cycles to exist in the folder structure, and I would alter the 'bookkeeping' to separate the path used to identify a module from the path used to load a module.

@sam-github Then I will provide such a thing imminently :)

@Fishrock123 I will definitely read, but my solution makes a fundamental change that may negate the issues that surfaced. time and empirical evidence will tell

@ghost
Copy link
Author

ghost commented Nov 21, 2016

@sam-github @jasnell @Fishrock123 You might be interested in this PR: Symlinks Just Work.

@ghost ghost closed this as completed Nov 21, 2016
@ghost
Copy link
Author

ghost commented Nov 27, 2016

@jasnell If you have any details on why add-ons crashed when multi-symlinked, that would be very helpful in reproducing.

I know the OS's don't crash simply from calling dlopen more than once; they work like require() does and return an existing instance.

I can reason that there's some global state that gets clobbered, and I can simulate by setting a static on first Init() call, then checking and storing to address 0 on second to cause crash. Would that be representative enough for valid reproduction, based on your understanding of the problem?

@jasnell
Copy link
Member

jasnell commented Nov 27, 2016 via email

@ghost
Copy link
Author

ghost commented Nov 27, 2016

Thank you for replying. I have fixed the problem (and have tests to prove) of multiple symlinks resulting in the creation and caching of multiple Module instances for the same physical module. This eliminates the memory bloat problem (as the tests prove), and should eliminate crashing with add-ons, as they'd only be loaded once. I would like to reproduce the problem with an add-on, so I can then show the problem has been fixed, and require more detail, or an add-on that we know fails in this way.

Just asking the OS to load an .so or .dll, which is the binary form of an add-on, will not crash a process by itself.

From dlopen docs:

Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.

And LoadLibrary docs:

The system maintains a per-process reference count on all loaded modules. Calling LoadLibrary increments the reference count. Calling the FreeLibrary or FreeLibraryAndExitThread function decrements the reference count. The system unloads a module when its reference count reaches zero or when the process terminates (regardless of the reference count).

The second time an add-on is loaded, the OS will just return the same handle. However, its node registered Initialize() method will get passed a newly created Module instance when called additional times. I can see a couple of different ways from this point how an add-on would crash, but they all fundamentally come down to either the add-author, or something in v8, using global state in a conflicting way.

Can you please provide more detail, or provide an add-on that fails in this way, so I can recreate this problem, to then show it has been fixed?

@ghost
Copy link
Author

ghost commented Nov 28, 2016

@jasnell I'm using the following to repo crashing from multiple symlinks to same physical add-on. I have tested and it does crash in v7.2.0 with --preserve-symlinks (and does not crash with my fixes in place).

Could you review the following and opinion if it recreates the problem enough?

addoff.cc

#include <node.h>

namespace addoff {

using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Object;
using v8::String;
using v8::Value;
    
void Method(const FunctionCallbackInfo\x3cValue>& args) {
  static int callcount = 0;
  Isolate* isolate = args.GetIsolate();
  const char* message = callcount == 0 
    ? "...add-off here... hey... i smell smoke..."
    : "...i'm still here... hmm... guess it was nothing...";
  args.GetReturnValue().Set(String::NewFromUtf8(isolate, message));
  ++callcount;
}

void LightBomb(Local\x3cObject> exports) {
  #define UNLIT 0
  #define LIT 1
  static int bomb = UNLIT;

  if (bomb == LIT) 
    // os's LOVe this
    {*((int*)0) = 0;} 

  if (bomb == UNLIT)
    // lets make things interesting, shall we?
    {bomb = LIT;} 

  NODE_SET_METHOD(exports, "isLit", Method);
}

NODE_MODULE(addoff, LightBomb)

@ghost
Copy link
Author

ghost commented Dec 4, 2016

@sam-github citgm v1.7.0 results identical to release version.

@jasnell Reproduced add-on crashing, as well as showing it's fixed

More info here

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module Issues and PRs related to the module subsystem. question Issues that look for answers.
Projects
None yet
Development

No branches or pull requests

5 participants