Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core) resolve cwd using realpath when running yarn commands in a workspace #2659

Merged
merged 4 commits into from
May 3, 2021

Conversation

ganemone
Copy link
Contributor

What's the problem this PR addresses?

The default behavior in node is to follow symlinks when resolving files/directories. This
was also the behavior in yarn until this change: bef2039#diff-35b120213ef3f1d3cb1ebad42f2694b4079bb5ccf752a98279a09b0849c4058fR386.
That optimization changed the behavior to preserve symlinks when running a yarn script.

How did you fix it?
This keeps the optimization but uses realpath to ensure symlinks are followed.

...

Checklist

  • I have set the packages that need to be released for my changes to be effective.
  • I will check that all automated PR checks pass before the PR gets reviewed.

@ganemone ganemone requested a review from arcanis as a code owner March 30, 2021 20:42
@@ -376,7 +376,7 @@ async function initializeWorkspaceEnvironment(workspace: Workspace, {binFolder,
)
);

return {manifest: workspace.manifest, binFolder, env, cwd: cwd ?? workspace.cwd};
return {manifest: workspace.manifest, binFolder, env, cwd: cwd ?? ppath.dirname(await xfs.realpathPromise(ppath.join(workspace.cwd, `package.json` as Filename)))};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the dirname/join? Realpath should return the real path even if the last component is a folder, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mimicks the behavior before the optimization change. Also for projects using bazel, which is probably the most common situation where you would run into symlinks, (and also our use case) folders are not linked but files are.

@arcanis
Copy link
Member

arcanis commented Mar 31, 2021

It seems simple enough, but I think we should consider two things:

  • First, can you describe a high-level reasons: why a workspace is a symlink, and why that matters when initializing a project environment? I haven't met this kind of setup in the wild before, so it would be valuable to at least know it exists and why some projects rely on it.

  • I'm worried about using different logics in different places (realpath when initializing the env, sympaths everywhere else). If the workspace cwd must be a realpath, then perhaps it should be done in workspace.cwd itself. On the other hand, if it would break something, then perhaps it would be a sign that we shouldn't use realpaths.

@ganemone
Copy link
Contributor Author

  • First, can you describe a high-level reasons: why a workspace is a symlink, and why that matters when initializing a project environment? I haven't met this kind of setup in the wild before, so it would be valuable to at least know it exists and why some projects rely on it.

The default behavior in node is to follow symlinks. Most tools (yarn included prior to this optimization change) default to this behavior also. I've seen this in jest, flow, typescript, babel, etc. Long term I would recommend respecting the NODE_PRESERVE_SYMLINKS environment variable, but for now my goal was just preserving the previous behavior which is needed to work with projects that use bazel.

  • I'm worried about using different logics in different places (realpath when initializing the env, sympaths everywhere else). If the workspace cwd must be a realpath, then perhaps it should be done in workspace.cwd itself. On the other hand, if it would break something, then perhaps it would be a sign that we shouldn't use realpaths.

I'm not really in a position to do a large refactor of how paths are managed in yarn. This PR at least keeps the behavior consistent. In the future it would probably be good to have logic that could follow the NODE_PRESERVE_SYMLINKS environment variable whenever resolving paths to keep in sync with how node would resolve these paths.

@arcanis
Copy link
Member

arcanis commented Mar 31, 2021

The default behavior in node is to follow symlinks. Most tools (yarn included prior to this optimization change) default to this behavior also. I've seen this in jest, flow, typescript, babel, etc. Long term I would recommend respecting the NODE_PRESERVE_SYMLINKS environment variable, but for now my goal was just preserving the previous behavior which is needed to work with projects that use bazel.

Whether Node preserves symlinks or not isn't directly related to whether Yarn should do it to - I want to have a clear semantical reason why one should be preferred over the other: I don't want, in a year from now, to be unable to say why exactly this choice actually mattered. Similarly, since we're in-between major releases (the next one will be 3.0), fixing the regression is less important than figuring out the behavior we want to follow in the future.

I have little experience with Bazel, so I don't know exactly what makes symlinks important there. Can you detail a bit more?

I'm not really in a position to do a large refactor of how paths are managed in yarn.

Don't worry about this, if necessary I can handle that. But before it comes to that I want to be convinced of the direction things should take - which makes the problem more abstract than technical 😃

@ganemone
Copy link
Contributor Author

If I was arguing from a purely technical perspective I would say realpaths should never be used. The idea of resolving a symlink to the original path kinda defeats the purpose of the symlink. It is the definition of a leaky abstraction.

Unfortunately because node chose this behavior by default, the majority of tools in the open source community either follow this pattern explicitly, or rely on symlinks being resolved and will break if they are not resolved. Currently we are seeing issues with both eslint and flow when running from the bazel sandbox (which is full of symlinks) on the new version of yarn due to the change in how symlinks were followed.

The way I see it there are 3 options for yarn.

  1. Always resolve symlink paths to their real paths (this was the behavior previously afaict)
  2. Never resolve symlinks to their real paths
  3. Resolve symlinks based on some configuration (it would make sense to me to use the same environment variable that node uses)

My preference would be to restore the previous behavior to unblock teams using tools like bazel, but to work towards option 3. Ideally all tools (eslint, flow, etc) would be configurable in this regard. I started to do this work for some of these tools but many projects are reluctant to change. For example: jestjs/jest#9976

@ganemone
Copy link
Contributor Author

Regarding symlinks in bazel, basically bazel creates a sandbox with only the files included in your bazel dep tree. These files are symlinks back to the original repo. Ideally we would not follow symlinks, but it has been really difficult to get that working with the plethora of open source tools that don't support it.

@arcanis arcanis merged commit 0d4f86c into yarnpkg:master May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants