Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change in run_node behaviour from 4.4.1 to 4.4.2 #3054

Closed
dymart opened this issue Oct 31, 2021 · 5 comments · Fixed by #3059 or #3331
Closed

change in run_node behaviour from 4.4.1 to 4.4.2 #3054

dymart opened this issue Oct 31, 2021 · 5 comments · Fixed by #3059 or #3331

Comments

@dymart
Copy link
Contributor

dymart commented Oct 31, 2021

🐞 bug report

Affected Rule

There seems to have been a change to the setup or execution of run_node.
When updating to 4.4.2 the rule that uses run_node would no longer build correctly.
The rules work as expected under 4.4.1 and 4.4.0.

Is this a regression?

This seems to be a regression or a change in behaviour between minor releases. I'm not sure if the previous behaviour was taking advantage of unintentional functionality or if the change was intentional.

When updating to 4.4.2 the rule that uses run_node would no longer build correctly.
The rules work as expected under 4.4.1 and 4.4.0.

Description

The rule is running a nextjs build with run_node. With 4.4.1 everything works as expected. When changing to 4.4.2 the build breaks. I'm not very familiar with what nextjs does for it's build process in the background but a change in the nodejs rules seems to have changed it's ability to complete a build.

🔬 Minimal Reproduction

A branch to demonstrate the problem has been created: https://github.com/ubiquitoustech/rules_nextjs/tree/debug
The debug branch is the one that includes the setup to debug the problem.
In the workspace under examples/with-next-css is the example build that breaks.

from examples/with-next-css run:
$ bazel build prod

The workspace is setup to run on the broken commit initially.
https://github.com/ubiquitoustech/rules_nextjs/blob/50d111513d56111d5db3e03415f8471361def33a/examples/with-next-css/WORKSPACE#L72

Then if you switch the workspace to use the previous commit it builds.
You would need to comment out the above import and remove the comment from this one and build again.
https://github.com/ubiquitoustech/rules_nextjs/blob/50d111513d56111d5db3e03415f8471361def33a/examples/with-next-css/WORKSPACE#L55

This seems to be the commit that changes functionality: f6db6c9

The repo is large right now because there are some extra commits from rules_nodejs included in the external folder of examples/with-next-css

🔥 Exception or Error

The error is one from the nextjs build and I'm not sure what the root cause of that error would be.

This seems to be the line in nextjs that is the error. https://github.com/vercel/next.js/blob/99abb8bfd7f662efb942763332b87f4348b8844d/packages/next/build/index.ts#L136

This also seems to be the source of the problem for nextjs: https://github.com/vercel/next.js/blob/99abb8bfd7f662efb942763332b87f4348b8844d/packages/next/export/index.ts#L696

I'm not sure what changed with how nodejs dealt with symlinks but this seems to have changed the interaction with these parts of nextjs.

the problem seems to be related to nextjs not finding the build directory that is symlinked by bazel.

if the distDir in the next.config.js file is set to only '.next' the build works. If the distDir is set to the build directory that bazel links it throws an error. This build directory can be and environment variable or hard coded it doesn't make a difference it causes the same error with the nextjs build. This only occurs after the above mentioned commit to how the linker works.

I'm not exactly sure what changed in the rules_nodejs commit mentioned above that cause this change but it seems to have changed something.

I can help debug this more when I have more time. Any help resolving this would be greatly appreciated and mean that an open version of rules for nextjs would be available.

Thanks

🌍 Your Environment

ubuntu 18.04.6 LTS

bazel version 4.2.1 in .bazelversion file

rules_nodejs versions and commits are specified in repo

@dymart
Copy link
Contributor Author

dymart commented Nov 2, 2021

@gregmagolan
Copy link
Collaborator

gregmagolan commented Nov 4, 2021

Thanks for the detailed report. I took a look at your repro & your fix in #3058. The problem seems to be that after the refactor, the linker is now creating a node_modules symlink at the root of the bazel-out tree where as before the refactor it wasn't.

In short, in 4.4.2 there is now node_modules symlinked at,

execroot/<wksp>/bazel-out/darwin-fastbuild/bin/node_modules -> execroot/<wksp>/node_modules

while in 4.4.1 this wasn't created.

The error for the nextjs build AFAICT is that it is resolving react to the two different node_modules trees in the build which is breaks the react build. React requires a single version at a single node_modules location. I've seen the error produced by next.js before due to this problem in another scenario:

Error occurred prerendering page "/500". Read more: https://nextjs.org/docs/messages/prerender-error
Error: Minified React error #321; visit https://reactjs.org/docs/error-decoder.html?invariant=321 for the full message or use the non-minified dev environment for full errors and additional helpful warnings.

I'll put up a PR to switch the behaviour back to the way it was in 4.4.1.

The execroot/<wksp>/bazel-out/darwin-fastbuild/bin/node_modules is redundant anyway since standard node_modules resolution will go down the tree and find execroot/<wksp>/node_modules anyway.

@gregmagolan
Copy link
Collaborator

@dymart I create snapshot build with the fix from #3059

http_archive(
    name = "build_bazel_rules_nodejs",
    sha256 = "f097789fa672061b381ff3db0f3ce305745df20ecc0491f9f53e17a45e1fc240",
    urls = ["https://github.com/aspect-dev/rules_nodejs-builds/raw/4.4.2+b757f102/build_bazel_rules_nodejs-snapshot_builds-snapshot.tar.gz"],
)

Tried that in your repro and it resolves the issue

$ bazel build prod
INFO: Invocation ID: c09317ba-c420-481a-9045-679b6142b695
INFO: Analyzed target //:prod (240 packages loaded, 7327 targets configured).
INFO: Found 1 target...
INFO: From next prod:
warn  - No ESLint configuration detected. Run next lint to begin setup
info  - Checking validity of types...
info  - Creating an optimized production build...
info  - Compiled successfully
info  - Collecting page data...
info  - Generating static pages (0/3)
info  - Generating static pages (3/3)
info  - Finalizing page optimization...

Page                                       Size     First Load JS
┌ ○ /                                      380 B          71.5 kB
├   └ css/82bd5ed75a6ca55e.css             57 B
├   /_app                                  0 B            71.1 kB
└ ○ /404                                   194 B          71.3 kB
+ First Load JS shared by all              71.1 kB
  ├ chunks/framework-e29a9982e0e7a36f.js   42 kB
  ├ chunks/main-71cd0fcb1fa739d4.js        27.8 kB
  ├ chunks/pages/_app-50a142e5a30749cb.js  494 B
  ├ chunks/webpack-514908bffb652963.js     770 B
  └ css/2efe555bc9fbace9.css               57 B

○  (Static)  automatically rendered as static HTML (uses no initial props)

Target //:prod up-to-date:
  dist/bin/prod
INFO: Elapsed time: 7.568s, Critical Path: 0.31s
INFO: 2 processes: 1 remote cache hit, 1 internal.
INFO: Build completed successfully, 2 total actions

@dymart
Copy link
Contributor Author

dymart commented Nov 4, 2021

@gregmagolan thanks so much for taking a look and creating a new pr! Really appreciate it!

@mistic
Copy link
Contributor

mistic commented Feb 16, 2022

@gregmagolan I don't think the problem is correctly fixed in a Windows environment and in the worst case I believe the fixes introduced at v4.4.3 have brought other problems with them. If you try the to run bazel build prod on that same debug branch using the rules v4.4.3 in a Windows machine it will throw a large error output but where we can saw things like:

../../../../downloads/rules_nextjs/examples/with-next-css/node_modules/react/cjs/react.production.min.js
There are multiple modules with names that only differ in casing.
This can lead to unexpected behavior when compiling on a filesystem with other case-semantic.
Use equal casing. Compare these module identifiers:
* C:\Users\IEUser\downloads\rules_nextjs\examples\with-next-css\node_modules\react\cjs\react.production.min.js
    Used by 1 module(s), i. e.
    C:\Users\IEUser\downloads\rules_nextjs\examples\with-next-css\node_modules\react\index.js
* C:\users\ieuser\downloads\rules_nextjs\examples\with-next-css\node_modules\react\cjs\react.production.min.js
    Used by 1 module(s), i. e.
    C:\users\ieuser\downloads\rules_nextjs\examples\with-next-css\node_modules\react\index.js

I've found that to be a problem when trying to upgrade a project I'm working on from rules v4.4.1 into v4.4.3. I was getting a strange webpack build problem around mini-css-extract-plugin which can only be caused in case we are loading multiple webpack versions from different locations (or multiple mini-css-extract-plugins from different locations) in the same build context. I was able to track down the problem started to appear after a76d000 which makes sense as it was the commit that brings the fixes for that initially problem. Somehow it broke the node modules layout on Windows that needs to assure each node module discovered along the process is only being loaded from one location at a time while it also respects all the transitive dependencies like it was doing before that commit.

Aso to mention it is working okay on MacOS as it runs the builds in a sandbox environment and doesnt seems to be affected by those linker changes for runfiles. @gregmagolan could you have a look into this please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants