Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chapel: resolve failures in mac m1 #107405

Closed

Conversation

bhavanijayakumaran
Copy link
Contributor

Signed-off-by: bhavanijayakumaran 82669529+bhavanijayakumaran@users.noreply.github.com

  • [ x] Have you followed the guidelines for contributing?
  • [x ] Have you ensured that your commits follow the commit style guide?
  • [x ] Have you checked that there aren't other open pull requests for the same formula update/change?
  • [ x] Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • [ x] Is your test running fine brew test <formula>, where <formula> is the name of the formula you're submitting?
  • [x ] Does your build pass brew audit --strict <formula> (after doing brew install --build-from-source <formula>)? If this is a new formula, does it pass brew audit --new <formula>?

@BrewTestBot BrewTestBot added the python Python use is a significant feature of the PR or issue label Aug 5, 2022
@carlocab carlocab changed the title Fixed home brew formula chapel.rb to resolve failures in mac m1 chapel: resolve failures in mac m1 Aug 5, 2022
@carlocab
Copy link
Member

carlocab commented Aug 5, 2022

What failures on M1 are you referring to? Does this require brew install --HEAD chapel?

This will fail brew audit, btw, so you probably shouldn't have marked that item as complete in the PR template.

@bhavanijayakumaran
Copy link
Contributor Author

Chapel build fails with
[Fail] Compilation output:
mv: rename /private/tmp/chpl-brew.deleteme-U5CzFi/hello6-taskpar-dist.tmp to /private/tmp/chapel-test-20220804-87278-h3rn08/.chpl/chapel-test-tZwPUP/hello6-taskpar-dist: No such file or directory
error: mv /private/tmp/chpl-brew.deleteme-U5CzFi/hello6-taskpar-dist.tmp /private/tmp/chapel-test-20220804-87278-h3rn08/.chpl/chapel-test-tZwPUP/hello6-taskpar-dist

@tzinsky
Copy link
Contributor

tzinsky commented Aug 5, 2022

The error that Bhavani listed above is occurring M1 installs of the chapel bottle. Our local builds are not encountering this issue and feel that it should be regenerated to resolve the issue. As for the addition of the head tag as something we could do to facilitate better testing of our package, but did not consider the issues with failing audit.

@carlocab
Copy link
Member

carlocab commented Aug 5, 2022

You can add head with

diff --git a/Formula/chapel.rb b/Formula/chapel.rb
index 6b20db9a0d2..fd3881e9067 100644
--- a/Formula/chapel.rb
+++ b/Formula/chapel.rb
@@ -4,6 +4,7 @@ class Chapel < Formula
   url "https://github.com/chapel-lang/chapel/releases/download/1.27.0/chapel-1.27.0.tar.gz"
   sha256 "558b1376fb7757a5e1f254c717953f598a3e89850c8edd1936b8d09c464f3e8b"
   license "Apache-2.0"
+  head "https://github.com/chapel-lang/chapel.git", branch: "main"
 
   bottle do
     sha256 arm64_monterey: "47e7d368c685aed62e699dbb9e87a881ab7d6a65873425b10d83d58458b62557"

Could you elaborate on when this failure started? What caused it? Adding head doesn't change how chapel is built, so I still don't follow how adding it fixes anything. Do you just need it to be rebuilt? If so, the correct way to do that is to bump the revision so that users get the rebuild with brew upgrade.

We should also update the test to catch the failure.

@tzinsky
Copy link
Contributor

tzinsky commented Aug 5, 2022

Could you elaborate on when this failure started? What caused it?

Unfortunately, we don't have answers to either of those questions. We found the error when one of our team members updated their homebrew setup at the end of July. It appears to only affect M1 systems. There is some known issues with how we handle the paths to build tools and we have a PR for that fix going in for our 1.28 release.

As for the need to rebuild, our team believes a rebuild of the bottle will resolve the path issues, and I will say that I was not aware that simply changing the revision would do that. More pushups for Tim.

We do want to add the head tag to facilitate some automated testing we are working to get setup internally and will move it to the location you suggested. We will also work on the updated tests to help catch this error.

Thanks for your assistance...

@carlocab
Copy link
Member

carlocab commented Aug 5, 2022

Here's the patch that you need:

diff --git a/Formula/chapel.rb b/Formula/chapel.rb
index 6b20db9a0d2..16c1148f7d9 100644
--- a/Formula/chapel.rb
+++ b/Formula/chapel.rb
@@ -4,6 +4,8 @@ class Chapel < Formula
   url "https://github.com/chapel-lang/chapel/releases/download/1.27.0/chapel-1.27.0.tar.gz"
   sha256 "558b1376fb7757a5e1f254c717953f598a3e89850c8edd1936b8d09c464f3e8b"
   license "Apache-2.0"
+  revision 1
+  head "https://github.com/chapel-lang/chapel.git", branch: "main"
 
   bottle do
     sha256 arm64_monterey: "47e7d368c685aed62e699dbb9e87a881ab7d6a65873425b10d83d58458b62557"

@carlocab
Copy link
Member

carlocab commented Aug 5, 2022

Also:

Please use the preferred commit-message style for homebrew/core. We put the name of the formula first in commit-message headings.

For new formulae:

At Homebrew, we like to put the name of the formula up front like so: foobar 7.3 (new formula).

For existing formulae:

The preferred commit message format for simple version updates is foobar 7.3 and for fixes is foobar: fix flibble matrix..

Refer to the commit style guide for more details. Also, when making further changes to your pull request, use the following guidelines to make sure that @BrewTestBot can merge your commits:

  • One formula per commit; one commit per formula.
  • Keep merge commits out of the pull request.

@bhavanijayakumaran
Copy link
Contributor Author

bhavanijayakumaran commented Aug 5, 2022

Thanks carlocab!

I will make the suggested changes

@mppf
Copy link

mppf commented Aug 5, 2022

Could you elaborate on when this failure started? What caused it?

I've made a PR for Chapel to try to avoid the problem in the future: chapel-lang/chapel#20385 . From that PR description:

... historically we have saved the linker to use in a file override-ld and then the LLVM backend reads this file and uses the selected linker in order to link the binary. We recently observed some problems with this strategy in the context of Homebrew. The Homebrew Chapel package had this override-ld file referring to clang in a particular location; but the Homebrew clang package was upgraded and clang was no longer available at that location.

In particular, we had saved /opt/homebrew/Cellar/llvm/14.0.6/bin/clang++ but some Homebrew package change moved it to /opt/homebrew/Cellar/llvm/14.0.6_1/bin/clang++. I think the long-term fix is to avoid saving these paths; but the near-term fix is to re-spin the bottle.

I have not checked the situation on linuxbrew but I think that it's very likely that the GCC 12 upgrade is running into a similar issue (see also #106755 (comment) ).

@carlocab
Copy link
Member

carlocab commented Aug 5, 2022

Thanks for the explanation, @mppf.

It seems like override-ld should point to a fixed directory instead. For M1 macOS, it should be /opt/homebrew/opt/llvm/bin/clang++. That's the path exposed to the build environment, but I'm guessing the chapel build either relies on llvm-config, or is a bit overzealous about resolving symlinks.

If you tell me where the override-ld file lives it should be fairly simple to fix it (assuming it's a text file). We can also read it during the test to make sure it doesn't reference a Cellar path.

@mppf
Copy link

mppf commented Aug 5, 2022

Yes we are using llvm-config and patching that would presumably also have avoided this issue.

On my system, the relevant override-ld is lib/darwin/llvm/x86_64/cpu-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/hwloc-bundled/re2-bundled/fs-none/lib_pic-none/san-none/override-ld but obviously that path would be different on an M1. And yes, it's just a text file with a path in it.

In any case this override-ld logic is completely irrelevant for the Homebrew package because that's built with CHPL_COMM=none and override-ld only actually matters when mpicxx is needed to link. My Chapel PR should improve that situation for the future.

@iMichka iMichka mentioned this pull request Aug 5, 2022
6 tasks
Signed-off-by: bhavanijayakumaran <82669529+bhavanijayakumaran@users.noreply.github.com>
@carlocab
Copy link
Member

carlocab commented Aug 6, 2022

Or, maybe to make it simpler, since we run util/test/checkChplInstall, maybe that test should also check that any absolute paths that chapel is keeping track of still exist?

That will cause the test to fail when any of the dependencies change prompting us to rebuild.

Copy link
Member

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to merge now since this helps fix existing problems.

However, I'd like to see a follow up that:

  1. Fixes the test to catch this breakage
  2. Fixes the install so that we don't hardcode fragile Cellar paths

@BrewTestBot
Copy link
Member

🤖 A scheduled task has triggered a merge.

@cho-m cho-m mentioned this pull request Aug 6, 2022
6 tasks
@mppf
Copy link

mppf commented Aug 8, 2022

@carlocab - thanks. I did have one question about this

Or, maybe to make it simpler, since we run util/test/checkChplInstall, maybe that test should also check that any absolute paths that chapel is keeping track of still exist?

That will cause the test to fail when any of the dependencies change prompting us to rebuild.

Wasn't the Chapel test already failing e.g. in PR #106755 ? Is the issue just that the fact that Chapel needed a rebuild was totally unclear from the error message? From the CI test run for that PR I see this

  [Fail] Test job failed to compile - Chapel is not installed correctly
  [Fail] Compilation output:
  mv: cannot stat '/tmp/chpl-linuxbrew.deleteme-DvjGFm/hello6-taskpar-dist.tmp': No such file or directory
  error: mv /tmp/chpl-linuxbrew.deleteme-DvjGFm/hello6-taskpar-dist.tmp /tmp/chapel-test-20220804-3395235-1ge9xfx/.chpl/chapel-test-LhS1J8/hello6-taskpar-dist
  error: Make Binary - Linking
  Error: chapel: failed

which is the failure caused by the path no longer being present. The fact that the error looks like a problem with mv is an bug upstream which my upstream PR chapel-lang/chapel#20385 should fix. That PR will also stop saving the fragile Cellar path. I am expecting that PR to make it in to Chapel 1.28 which we are expecting to release mid-September. Since that is only about a month away I am hoping we can just re-spin the Chapel bottle as-needed until we release the 1.28 Chapel formula.

mppf added a commit to chapel-lang/chapel that referenced this pull request Aug 11, 2022
Move override-ld to printchplenv

This PR addresses a future work item left by PR #18880. In particular,
historically we have saved the linker to use in a file `override-ld` and
then the LLVM backend reads this file and uses the selected linker in
order to link the binary. We recently observed some problems with this
strategy in the context of Homebrew. The Homebrew Chapel package had this
`override-ld` file referring to clang in a particular location; but the
Homebrew clang package was upgraded and clang was no longer available at
that location. (See also
Homebrew/homebrew-core#107405 ).

So, this PR moves the `override-ld` logic to something supported by the
chplenv Python scripts. One wrinkle with that is that the Makefile logic
was using `GASNET_LD_REQUIRES_MPI` but that variable is not stored in the
Gasnet .pc file that is used by the chplenv scripts in LLVM compilation.
So, this PR uses the approximation of determining
`GASNET_LD_REQUIRES_MPI` by searching for `mpi` in the `GASNET_LD`
command name.

Additionally, while debugging the issue, I noticed some odd things about
the error handling within `executeAndWait`, so this PR cleans that
function up a bit as well. It also removes some arguments to that
function that aren't really necessary. It adds a C++ dyno test of this
function to make sure the return value of the function is 0 when the
command runs and something other than 0 when the command does not exist.

The chpl-env-gen.h needed some adjustment to work with this change so I
tidied it up to stop filtering out environment variables that often have
characters that can't be in a `#define` and instead to use a more
comprehensive pattern when replacing such characters with `_`.

Future Work:
 * include https://bitbucket.org/berkeleylab/gasnet/pull-requests/554 in
   our bundled GASNet to actually use `GASNET_LD_REQUIRES_MPI` in the
   logic to match the previous behavior.

Reviewed by @arezaii 

- [x] full local testing
- [x] XC gasnet+aries testing
- [x] XC ugni testing
- [x] XC mpi testing
- [x] Mac OS X testing
- [x] testing Hello world on a real ibv system
@github-actions github-actions bot added the outdated PR was locked due to age label Sep 8, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age python Python use is a significant feature of the PR or issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants