Strategies for getting Tensorflow-Java on Apple Silicon? #394
Self-note: it seems the above error is due to building against a 1.8 JDK instead of something more recent.
Don't try to compile TF-Java using Rosetta: you'll pull in a TF binary with AVX instructions, which will cause a SIGILL and take down the JVM. I've not tried to compile it on an M1 since we bumped to TF 2.6.0 and made some build changes; I can take a look at doing that. Theoretically you should be able to run

As for other ML frameworks, I've personally got XGBoost and ONNX Runtime working in Java on an M1 Mac and contributed any fixes back upstream. We had ONNX Runtime working a month or two after the M1 came out. Anything that's pure Java will work just fine on an M1, but I've not looked at dl4j or djl, which both have large native libraries inside.
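Since an x86_64 JVM under Rosetta is exactly what pulls in the AVX-laden binary, a quick sanity check is to look at which architecture the running JVM reports before loading any native libraries. A minimal sketch (the class name is mine, not part of TF-Java):

```java
public class ArchCheck {
    public static void main(String[] args) {
        // os.arch reports the JVM's own architecture, not the hardware's:
        // an x86_64 JVM running under Rosetta reports "x86_64"/"amd64",
        // while a native Apple Silicon JVM reports "aarch64".
        String arch = System.getProperty("os.arch");
        String os = System.getProperty("os.name");
        boolean nativeArm = os.toLowerCase().contains("mac") && arch.equals("aarch64");
        System.out.println(os + " / " + arch
                + (nativeArm ? " (native arm64 JVM)" : ""));
    }
}
```

If this prints `x86_64` or `amd64` on an M1 machine, the JVM itself is translated, and loading an x86 TF binary from it risks the SIGILL described above.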
@Craigacp Building under Rosetta but using a TF build config file without the AVX instructions is not an option, then? I was not aware that it pulls a TF binary; I was under the impression that it pulls the TF repo and compiles TF as part of the TF-J build process.
@Craigacp Any pointer on how to get ONNX going? Because this is what I get on the home page... LOL [Edit: I presume you did some cross-compiling to get it to work. Going through the docs right now...]
Java is slow under Rosetta, as it messes with the JIT. You could compile TF without AVX support under Rosetta, but it would probably be fairly slow, and at that point I'm not sure what the utility of it is.
I've not tried cross-compiling. Check out the ONNX Runtime repo on an M1 Mac and then compile it as normal for Java.
Well, my main dev machine is now an M1, obviously. So the utility lies in developing TF models locally, and then training them on a TPU/x86 cloud-based machine. I just want to avoid any pain in my development process. To compile it under Rosetta, I presume I need an x86 JVM installed on top of other x86 tools like Bazel, right?
Yes, you'll need a full x86 development stack, including Python and probably compilers as well, and then you might need to change how the build finds the compilers to make sure it picks the x86 ones.
Some people seem to be able to get arm64 binaries for TF 2.6.0; for example, see tensorflow/tensorflow#52160 (comment): `BUILD_FLAGS="--cpu=darwin_arm64 --host-cpu=darwin_arm64" mvn clean install`
@saudet That did not work, unfortunately. I am able to compile the TensorFlow repo (tensorflow/tensorflow#52160 (comment)), but then TF-J fails with

On the other hand, diving into ./tensorflow-core/tensorflow-core-api, where the TF core should be built, I was able to start the compilation (
Update: bumping Google's Error Prone to `<errorprone.version>2.10.0</errorprone.version>` fixes this error.
Update: changing .bazelrc in tensorflow-core/tensorflow-core-api to

In addition, I bumped .bazelversion to 4.2.1 and changed build.sh to run Bazel under sudo, which gets that Maven compile unit going. It still contains references to the setup on my dev machine, but we are advancing ;-)
Some of you will be happy: I got the whole thing compiled. However, I had to skip the tests, as the build was failing on that part, and there were some warnings about TARGET_OS_IPHONE. Apart from that, it looks pretty good:
The build fails with Java 8 (arm64)

[Update]

but unfortunately it fails because in pom.xml we use `--add-exports` flags for the JVM, which are not supported by the 1.8 JDK. I am not sure why we need these flags in the first place (I know what the flag is supposed to do), and therefore what a workaround could be. Anyone?
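One common Maven pattern that works around this (a sketch, not something the TF-Java pom actually contains) is to guard the flag behind a profile that only activates on JDK 9+, so a 1.8 JDK never sees it. The exported package below is a guess based on the `com.sun.tools.javac` error mentioned elsewhere in the thread:

```xml
<!-- Hypothetical profile: only define the --add-exports argument when
     building on JDK 9 or later; a 1.8 JDK rejects the flag outright. -->
<profile>
  <id>jdk9-plus-exports</id>
  <activation>
    <jdk>[9,)</jdk>
  </activation>
  <properties>
    <compiler.exports.arg>--add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED</compiler.exports.arg>
  </properties>
</profile>
```

The property can then be referenced from the compiler plugin's `<compilerArgs>`, leaving it empty on older JDKs.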
If it builds with 11, why do you need to build it with 8? It should produce Java 8 compatible jar files even when compiled on 11.
Because I want to integrate this into a project which uses Spark NLP, and that only runs on a Java 8 VM. As far as I understand, jars compiled for Java 11 do not run on older JVMs.
TF-Java is compiled on 11 but targets 8, and so will produce class files which are compatible with Java 8.
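If you want to verify this yourself, the target version is recorded in each class file's header as the "major version" (52 = Java 8, 55 = Java 11, 61 = Java 17). A small self-contained checker, assuming nothing about TF-Java itself:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersion {
    // Reads the class-file major version from a stream of class bytes.
    static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();        // skip minor version
        return in.readUnsignedShort(); // major version
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own bytecode as a demonstration; point it
        // at entries extracted from a TF-Java jar to check those instead.
        try (InputStream in = ClassVersion.class
                .getResourceAsStream("ClassVersion.class")) {
            System.out.println("major version: " + majorVersion(in));
        }
    }
}
```

If every class in the jar reports 52 or lower, it will load on a Java 8 VM regardless of which JDK compiled it.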
Ah... I was not aware of this. That means we are good to go. Will you pick up what we did and get the jars onto Sonatype?
I'm trying to replicate what you have on my M1 Mac so I can figure out what the test failures are, but I'm getting issues compiling protobuf. We can't easily deploy to Maven Central, as our builds are done through GitHub Actions and they don't have any Apple Silicon runners.
@Craigacp I think I solved that by altering .bazelrc, cf. #394 (comment), or this: `build --define=ABSOLUTE_JAVABASE=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home`. Not sure, in fact; I did many things and documented only half of them.
I had to make some modifications to the pom files to get it to build the appropriate jars, and having to run the build as the superuser worries me. I also had to add my JVM as a Bazel build flag. However, I didn't need to do anything else to my

We're upgrading to TF 2.7.0 at the moment; I'll rerun the build after that has merged, as it might fix some of the issues.
If I remember correctly, the tests failed on a dimension mismatch on the input matrix of a NN layer. Mind you, I tried to compile against Java 8, cf. my misunderstanding above.
Could you try to build a clean checkout of this branch: https://github.com/Craigacp/tensorflow-java/tree/apple-silicon ? It'll require sudo, and I don't want that in an actual build, but it would be a useful check if someone else can build it.
@Craigacp Trying to do so. However, I need Google Error Prone bumped to 2.10.0, and what about Bazel: 3.7.2 or 4.2.1?
In fact, I remember I went for Bazel 4.2.1 because there are no pre-compiled Bazel binaries for macOS arm64, and I wanted to avoid compiling Bazel from source.
Error Prone should work on Java 11. That build is hard-coded to expect an Azul Zulu 11 installed on the system. The Bazel version is set to 4.2.1 and the whole thing should build with

Are you still trying to use Bazel

I set the bazelversion to 4.2.1. Maybe it hasn't cleaned the build properly?
It does not work

Yes, but I got the whole thing compiled with 4.2.1 last weekend.
Ah, yes, we'll need to update the profiles in the pom.xml files a bit, like PR bytedeco/javacpp-presets#1092, for this to work. Are you saying you've already done this? Or should I do it?
I was able to resolve it by downgrading the command line tools and Xcode to version 13.1.6! It goes through like before.
@DevinTDHa, you just saved my life. I've been trying to build TF 2.10 on my M1 for a while and was blocked by this Other than that, building 2.10 is pretty straightforward, and I've also fixed how the op exporter links to TF. I would like to update the TF-Java repo so that the latest snapshot can be easily built on M1 machines. @saudet, I've also updated JavaCPP to 1.5.8. Still, I'm facing some new problems that don't seem related to the M1 this time but more to 2.10; I'll take a look later, but any advice from you would be more than welcome:
Yeah, we should do version upgrades for TF Core separately. Put that in a branch and I'll take a look at it.
Based on that error message, something like this should fix that one, though:

```java
.put(new Info("absl::Span<const tensorflow::SourceLocation>").annotations("@Span")
        .valueTypes("@Cast(\"const tensorflow::SourceLocation*\") SourceLocation")
        .pointerTypes("SourceLocation"))
```
Yep, that did the trick, thanks! Still, I'm now hitting issues when JavaCPP tries to load the

The JNI library is there and looks OK; here's the output of its

I'll try to debug that, but I've also pushed a temporary branch with my actual code, in case you'd like to give it a try, @saudet. Don't forget that it apparently only works with Xcode CL Tools 13.x, which might require you to downgrade
I don't have access to a Mac like that to check it out, but make sure with, for example,
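One way to confirm a native library's architecture without leaving the JVM (besides tools like `file` or `lipo -info`) is to read the first eight bytes of its Mach-O header. This is a sketch: the constants come from Apple's mach-o headers, and the synthetic byte array in `main` stands in for reading a real `.dylib`/`.jnilib` file:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MachOArch {
    // 64-bit Mach-O magic (stored little-endian on disk) and CPU types.
    static final int MH_MAGIC_64    = 0xFEEDFACF;
    static final int CPU_TYPE_X86_64 = 0x01000007;
    static final int CPU_TYPE_ARM64  = 0x0100000C;

    // Decides the architecture from the first 8 bytes of a Mach-O file.
    static String arch(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MH_MAGIC_64) return "not a 64-bit Mach-O";
        int cpuType = buf.getInt();
        switch (cpuType) {
            case CPU_TYPE_ARM64:  return "arm64";
            case CPU_TYPE_X86_64: return "x86_64";
            default:              return "unknown (0x" + Integer.toHexString(cpuType) + ")";
        }
    }

    public static void main(String[] args) {
        // Synthetic arm64 header for illustration; in practice, read the
        // first 8 bytes of the library file with Files.readAllBytes().
        byte[] fake = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(MH_MAGIC_64).putInt(CPU_TYPE_ARM64).array();
        System.out.println(arch(fake)); // prints: arm64
    }
}
```

Note this ignores universal ("fat") binaries, whose files start with a different magic; `lipo -info` handles those directly.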
Ok, coming back on this. It seems it is, yes:
Also, if I understand the JavaCPP debug traces correctly, the TF cc and framework libraries have been loaded correctly:

I don't know if it's relevant, but these libraries have an extra flag
@saudet, any more guidance you can provide on this? I confirm that binaries built with Bazel (
Or could it be related to that change? #394 (comment)
That seems to get added when there are thread-local variables:

No, I don't see how that's related. Since I don't have such a Mac here, it's hard for me to debug, but many users are using other presets for Mac on ARM with no problems at all: bytedeco/javacpp-presets#1069

Where does "-undefined dynamic_lookup" get added? You might want to remove that and see if any important-looking symbols are missing.
Actually, I have absolutely no clue where that argument is coming from; the config of the JavaCPP task triggering this build is simply this:
Ah, I was looking in the wrong place. JavaCPP adds it automatically so that it behaves more like Linux by default:
Ok... so removing

If I add
It doesn't look too important. Try moving "tensorflow_framework@.2" from the "preload" list to "link" and see if that works:
@kgoderis, am I correct in understanding you got the build to work on the M1 with Bazel? Would you be able to share that code?
Oh, good news, everyone!! So @saudet, effectively moving

Now, I also had to solve the remaining missing symbol "that didn't look too important" by skipping

... and that being said, I'm planning to push these changes to make the codebase in the TF-Java repo compilable on M1 machines, and that also includes an upgrade to... well, I was at 2.10.0 when I started, but I should probably now upgrade to 2.11.0 😄 Just recalling that what @DevinTDHa said previously is still valid: we need to downgrade to Xcode 13.x to get this working until Apple (hopefully) fixes the "malformed trie" issue. That's it, thanks everyone! I'll let you all know when that PR gets merged.
Yes, those libraries are long deprecated; I don't think anyone uses them anymore.
Ok, I'll do that in a separate PR... one day 😅
Hey @karllessard, as per usual, thank you for your work on this project. I am wondering if there has been progress on Apple Silicon support for this project? Also, I noticed https://github.com/tensorflow/java/actions/runs/7013329954 the other day, which gave me a glint of hope that maybe we'd get Apple Silicon support AND a bump to TF 2.15, a potent and exciting combination of wins. However, it's unclear whether that was just testing for another purpose, or testing with the intention of making a new release of this library. Hope you are well.
We're working on reducing the build process complexity and have added support for building macOS arm64 jars locally without running Bazel (so it's much simpler and more likely to work without user intervention). It's not finished yet, as we're hitting issues with Windows, which mean either we still need to run a full Bazel build on Windows, or we need to wait for Intel to fix the libtensorflow builds on Windows. We might merge it into
Thanks @mattomatic, +1 to everything that Adam mentioned! I believe we'll get that merged soon (with or without Windows).
Yay, that's awesome; thanks for the update. I don't fall into the camp who require Windows support, so it's a relief to know that things seem to work aside from that.
Hey, I've come across the same issue and wasn't able to downgrade Xcode because of the new macOS version (it worked before). I don't have any experience with JNI, so I had no idea what to try next, but I had luck changing the JDK from an aarch64 build to an x86_64 one (Temurin 21 without the aarch64 tag), and now it works! I don't know why, but it does; maybe that's interesting for someone with more knowledge, or anyone with the same issue who needs a solution fast.
Using an x86 build of TF-Java on Apple Silicon is a bad idea. Apple didn't implement support for AVX vector instructions in Rosetta, so it will crash the JVM whenever it tries to use them. This can be hard to predict, as some codepaths may fall back to non-vector instructions, but when you hit one it'll cause a SIGILL.
TF-Java 1.0.0-rc1 has Apple Silicon binaries: https://github.com/tensorflow/java/releases/tag/v1.0.0-rc.1
Like some others, I need to get TensorFlow-Java running on an M1-based machine, certainly now that Apple has released a TensorFlow distribution for the M1.
[I know there is #252, but I want to revive the discussion after Apple's recent efforts.]
Before even attempting this, I was wondering whether any of the underlying strategies make sense, or alternatively, actually work.
[This one fails based on the current HEAD (java.lang.NoSuchMethodError: 'java.lang.Iterable com.sun.tools.javac.code.Scope$WriteableScope.getSymbolsByName(com.sun.tools.javac.util.Name, com.sun.tools.javac.util.Filter)'). It does not even get to the TF native build phase.]
Compile from source using x86 tools (e.g. in an "arch -x86_64 zsh" shell), taking into account specific guidelines, e.g. removing usage of specific instruction sets. Consequently, run the Java jar using an x86 JVM, i.e. under Rosetta.
Any other angle to look at the problem?
[For that matter, how to leverage other ML frameworks on the M1, e.g. deeplearning4j?]