-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft: Support MacOS Intel and M1 #128
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
(I did not mean to click ready for review 👀) |
Hi @cemlyn007 - first of all, thank you for adding this support! This is a really useful PR for the external community. Unfortunately we lack the internal infrastructure to be able to run unit/integration tests and the resources to ensure that the MacOS build stays healthy. We could add a section in the README.md linking to your forked github repo. If you want, you can rename it to something explicit like |
Hi @ebrevdo, good to hear from you! Yeah I would be happy to maintain |
The choice of PyPi package name is up to you. We could also add a pointer to that. Please make it clear on your GitHub readme and PyPi package readme that the fork and associated packages are not maintained by Google/DeepMind and any questions or support requests should go directly to you and your github Issues page. |
Hi Cemlyn, I'm hoping to also get an easy set-up of reverb and launchpad for local testing and development on M1. I am unable to build this branch, and I was hoping you might be able to help. Alongside this I'm trying to compile a quick step-by-step to getting all of the software consistently working correctly.
Where
import platform
platform.platform()
curl -fLO https://releases.bazel.build/5.3.0/release/bazel-5.3.0-darwin-arm64 && chmod +x bazel-5.3.0-darwin-arm64
git clone https://github.com/deepmind/reverb
cd reverb
gh pr checkout 128
python configure.py
/oss_build.sh --clean true --clear_bazel_cache true --tf_dep_override "tensorflow~=2.12.0" --release --python "3.9" I also tried doing this specifying explicitly the following versions (based on the dates of the package releases and the comments in on these PRs): tensorflow-macos==2.12.0
tensorflow-metal==0.8.0 I surprisingly get an SSL-themed error: ERROR: /private/var/tmp/_bazel_maxsmith/42858cc935d0f8a77bf887408b1437d5/external/boringssl/BUILD:146:11: Compiling src/ssl/bio_ssl.cc failed: (Exit 1): cc_wrapper.sh failed: error executing command external/local_config_cc/cc_wrapper.sh -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics ... (remaining 46 arguments skipped)
external/boringssl/src/ssl/bio_ssl.cc:16:37: error: member access into incomplete type 'BIO' (aka 'bio_st')
return reinterpret_cast<SSL *>(bio->ptr);
^
/usr/local/include/openssl/types.h:89:16: note: forward declaration of 'bio_st'
typedef struct bio_st BIO;
^
external/boringssl/src/ssl/bio_ssl.cc:104:27: error: unexpected type name 'SSL': expected expression
OPENSSL_PUT_ERROR(SSL, ERR_R_SHOULD_NOT_HAVE_BEEN_CALLED);
^
external/boringssl/src/ssl/bio_ssl.cc:104:32: error: use of undeclared identifier 'ERR_R_SHOULD_NOT_HAVE_BEEN_CALLED'
OPENSSL_PUT_ERROR(SSL, ERR_R_SHOULD_NOT_HAVE_BEEN_CALLED);
^
external/boringssl/src/ssl/bio_ssl.cc:112:10: error: member access into incomplete type 'BIO' (aka 'bio_st')
bio->shutdown = static_cast<int>(num);
^ I was wondering if you might have an idea of how to resolve this error and/or if I missed anything in my set-up notes? I'd be trying to get your branch of LaunchPad to work next. Thank you regardless for your effort on both packages. |
Hi Max, A bit about my setup:
Note I have not tried experimenting with Next I would manually set the
This all works for me (just tested in case). Looking at your error, it looks like a local copy of openssl has been detected ( |
I think this is a similar issue with the same problem, although you may want to find a solution that doesn't break other things for you -- EDIT -- |
Thanks for the quick reply! So, it's a bit hacky, but I got it to work by temporarily uninstalling openssl. I had: Strangely, conda create -n py39
conda activate py39
conda install python=3.9 -c conda-forge
pip install tensorflow-macos==2.12.0
go get github.com/bazelbuild/bazelisk
go install github.com/bazelbuild/bazelisk@latest
export PATH=$PATH:$(go env GOPATH)/bin
git clone https://github.com/deepmind/reverb
cd reverb
gh pr checkout 128
./oss_build.sh --release Apple M1 Pro
Thank you so much again. I have spent an unfortunate amount of time trying to get this to work locally (trying a view docker options as well). I'm going to verify LaunchPad now. |
I was wrong about the Bazel version. I used 4.2.1 to build Reverb, but then I had to use 7.0.0 to build LaunchPad (from your PR google-deepmind/launchpad#40). LaunchPad also required me to have OpenSSL uninstalled. I'm kind of curious why OpenSSL is so painful, but it's honestly such a relief to have this all working after how much time I've sunk into trying to get it all working (tried docker, emulated docker, etc.). I love the Google software stack, but it can be incredibly painful to use externally. LaunchPad's build, unmodified, "completes successfully" but has some clear errors in the final deliverable. I haven't had the time to dig into exactly what's happening here, but I'm documenting it here. Essentially, it looks like a
for everything in the launchpad namespace. So it quietly doesn't build everything but the final build is successful because everything is skipped. Then when you try and use launchpad:
If you modify the setup so that
|
Just in case, can you make sure to test importing launchpad in a current working directory that doesn't have launchpad as a relative Python module in it's folder? (The little code snippet is just to see if your import error is fixed by being somewhere where there is no folder called courier) -- EDIT -- My advice above is probably wrong, please could you try importing tensorflow before importing launchpad etc? |
Yeah, I had made the mistake of running reverb in the reverb directory before, so your original intuition on common mistakes was not far off. :) It appears that does fix it. How did you intuit that was the issue? If you import lp before tf, they both fail, but the other way around they work just fine (shown below). The error codes seem pretty unrelated as well. Thank you so much again for your help. (py39) [14:28:06] maxsmith@MaxBook-Pro:~
$mkdir /tmp/123 && cd /tmp/123 && python -c "import tensorflow; from courier.python import server"
(py39) [14:28:28] maxsmith@MaxBook-Pro:/tmp/123
$python Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:53:08)
[Clang 15.0.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import launchpad as lp
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/launchpad/__init__.py", line 36, in <module>
from launchpad.nodes.courier.node import CourierHandle
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/launchpad/nodes/courier/node.py", line 21, in <module>
import courier
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/courier/__init__.py", line 29, in <module>
from courier.python.py_server import Server # pytype: disable=import-error
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/courier/python/py_server.py", line 31, in <module>
from courier.python import server
ImportError: dlopen(/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/courier/python/server.so, 0x0002): symbol not found in flat namespace '__ZN3tsl8profiler8internal13g_trace_levelE'
>>> import tensorflow as tf
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/tensorflow/__init__.py", line 37, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 37, in <module>
from tensorflow.python.eager import context
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 31, in <module>
from tensorflow.python import pywrap_tfe
File "/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/tensorflow/python/pywrap_tfe.py", line 25, in <module>
from tensorflow.python._pywrap_tfe import *
ImportError: dlopen(/Users/maxsmith/mambaforge/envs/py39/lib/python3.9/site-packages/tensorflow/python/_pywrap_tfe.so, 0x0002): symbol not found in flat namespace '__ZN10tensorflow13DeviceFactory19GetAnyDeviceDetailsEiPNSt3__113unordered_mapINS1_12basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES8_NS1_4hashIS8_EENS1_8equal_toIS8_EENS6_INS1_4pairIKS8_S8_EEEEEE'
>>> quit() (py39) [14:28:48] maxsmith@MaxBook-Pro:/tmp/123
$python Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:53:08)
[Clang 15.0.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> import launchpad as lp
>>> |
Interesting, I've never tried to import launchpad then tensorflow on Mac, surprised that it looks like you cannot successfully import tensorflow after failing to import launchpad. I suspect Reverb has the same issue? I do suspect I have a mistake in my build implementation where TensorFlow is not automatically being imported upon importing Launchpad. For the time being I just get always remember to import tensorflow before trying to import launchpad or reverb (sometimes by adding import tensorflow to the init.py for reverb and launchpad. If you know of a better solution please let me know |
Reverb actually works totally fine without importing tensorflow beforehand. So the problem is isolated to launchpad. We might be able to compare the build files across the two packages to see if there's anything to remedy the issue. |
Apologies for the delay. I spent some time fiddling with this to see if I could get it to work. The first takeaway I have is that I can confirm that the same build instructions work for Now, as for the error that Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from launchpad import courier
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/launchpad/__init__.py", line 36, in <module>
from launchpad.nodes.courier.node import CourierHandle
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/launchpad/nodes/courier/node.py", line 21, in <module>
import courier
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/__init__.py", line 29, in <module>
from courier.python.py_server import Server # pytype: disable=import-error
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/python/py_server.py", line 31, in <module>
from courier.python import server
ImportError: dlopen(/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/python/server.so, 0x0002): symbol not found in flat namespace '__ZN3tsl8profiler8internal13g_trace_levelE'
>>> from launchpad import LaunchType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/launchpad/__init__.py", line 36, in <module>
from launchpad.nodes.courier.node import CourierHandle
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/launchpad/nodes/courier/node.py", line 21, in <module>
import courier
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/__init__.py", line 29, in <module>
from courier.python.py_server import Server # pytype: disable=import-error
File "/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/python/py_server.py", line 31, in <module>
from courier.python import server
ImportError: dlopen(/Users/maxsmith/mambaforge/envs/py310/lib/python3.10/site-packages/courier/python/server.so, 0x0002): symbol not found in flat namespace '__ZN3tsl8profiler8internal13g_trace_levelE' The file exists, so it's upset about finding the presumably first symbol
It is not dependent on tensorflow:
I don't think it's directly the fault of server, because I can't find
If you look at server:
So anyways, that's what I've fiddled with to this point. I'm hoping I can look around a bit more later this week. Let me know if this inspires any thoughts or if I made any mistakes in reasoning! |
Hi @ebrevdo, we're still using the Mac ARM64 wheel at InstaDeep, with GitHub offering Mac runners, do you think there is more of an appetite at DeepMind to support Mac? I think it would wonderful if we could get Reverb to officially support Mac ARM64 as it would streamline the user experience for playing around with other DeepMind & Google-Research libraries like RLDS? Of course I would be incredibly enthusiastic for collaborating in achieving this. |
@maxosmith please could I ask if you are still able to compile reverb from your end? I am getting:
|
I had the same issue @cemlyn007 and your hot fix worked for me also |
@cemlyn007 the issue was resolved in: abseil/abseil-cpp#1289. I've tried it with the following patches to your recipe.
conda create -n py39
conda activate py39
conda install python=3.9 -c conda-forge
pip install tensorflow-macos==2.16.1
// Converts a tensorflow::Status object to an absl::Status object.
inline absl::Status FromTensorflowStatus(const tensorflow::Status& status) {
return status;
}
// Converts an absl::Status object to a tensorflow::Status object.
inline tensorflow::Status ToTensorflowStatus(const absl::Status& status) {
return status;
} |
Put up a PR cemlyn007#1 |
Appreciate you contributing the PR, merging it now, hope you have a good week! |
upgraded absl - kudos to @vyeevani
Hi there! Thanks for this PR/fix! I also got it running on M1 Pro Sonoma 14.6! I had some small issues though:
So, when point 4) is considered these steps worked:
One other question I had was about the versioning:
So, my question is if this reasoning is correct, that the things work, or do you see any issues (e.g. I didn't see anything related to protoc)? Is there a way to test if versions do align or would that only be visible if something breaks? Would integrating the commits reverb is ahead of this PR improve things? (I see there are conflicts but don't know how severe they are) |
Hi @drakon, hope you're having a good week! My draft MR is by no means in a great condition, just a ~working condition ;)! It certainly has some hardcoded stuff that at the moment I'd expect the builder (you, me et al) to edit before compiling but I suspect we could make it less hardcoded and MRs are certainly welcome and I'd be happy to review and merge them if it helps the cause! I think but might be wrong as it has been some time since I have modified this MR, that protoc comes from downloading the particular version of gRPC used. I'd say the current "test" is to compile it and see if the compiled wheel can be used to run the Reverb tests? As for supporting newer versions of TensorFlow, am happy for this MR to be brought up to date, if we can maintain compatibility with TensorFlow 2.12 and onwards that would be great but might not be a hard requirement. As for your Bazel versioning issues, sorry I'm not a massive Bazel / C++ compiling expert but I guess it sounds like your conda environment was interfering a bit? Maybe it was modifying some path environment variables causing Bazel to not detect some files properly but that's a wild guess on my part. As for the go get, I think the go team deprecated and removed go get in favour of go install, but you can get better documentation on installing Bazelisk from their GitHub readme. |
Hi @cemlyn007 thanks for your answer! Hope you having/had a good week too! :) Sure, just wanted to share what issue I had; in case someone else runs into the same (self-caused?) things. Could be the conda environment yes. I haven't really used it for something else so maybe I haven't "reset" it properly. Might test that again to see. My thought went more into the direction that you both might have the same issue as I believe you both compiled with 4.2.1 and only after that later with 5.3.0 and might not have started with a clean environment. Regarding PR: I was considering trying to improve and help but to be honest I realised that at this point I'm way too unfamiliar with bazel, packaging in general and in particular specific changes you made to be of any help and not making things worse. But I'm glad it seems to work now. When I get to familiarising myself more and change something, happy to report back/create PR! Thanks again for the work you already put into it! |
Hi @cemlyn007, I wanted to let you know that I have uploaded your fork and published it to PyPI ( Thanks! |
Good evening,
I have managed to get Reverb to build on MacOS Intel and on the M1, I have tested this with a built version of Launchpad (M1) on Acme with Python 3.9. Hoping that we can work together to get this in!
Looking forward to hearing from anybody interested in collaborating!
Kind regards,
Cemlyn