Crashing on Mac M1 (with appropriate changes to code) #1

Closed
rwoodard-prog opened this issue Nov 11, 2022 · 4 comments

@rwoodard-prog

(This is probably more of an issue with https://github.com/JohnSnowLabs/spark-nlp than with this code, but this repo is a convenient testing ground for Mac M1 experiments. I am cross-referencing this issue with the JSL issue I submitted, JohnSnowLabs/spark-nlp#13079.)

This code crashes when using spark-nlp-m1 on a Mac M1.

$ git log -1 --oneline
e3ce52a (HEAD -> master, origin/master, origin/HEAD) Bump to Spark NLP 4.2.1

$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

$ vim build.sbt  # (see diff output below)

$ vim src/main/scala/Main.scala  # (see diff output below)

$ sbt assembly
...
[success] Total time: 65 s (01:05), completed Nov 11, 2022 11:59:35 AM

$ spark-submit --class "Main" target/scala-2.12/spark-nlp-starter-assembly-4.2.1.jar
...
pos_anc download started this may take some time.
Approximate size to download 3.9 MB
Download done! Loading the resource.
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
Download done! Loading the resource.
ner_dl download started this may take some time.
Approximate size to download 13.6 MB
Download done! Loading the resource.
Exception in thread "main" java.lang.UnsatisfiedLinkError: no jnitensorflow in java.library.path
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
...

My edits:

$ git diff
diff --git a/build.sbt b/build.sbt
index e1f948d..db40eb8 100644
--- a/build.sbt
+++ b/build.sbt
@@ -30,7 +30,7 @@ libraryDependencies ++= {
     "org.apache.spark" %% "spark-core" % sparkVer % Provided,
     "org.apache.spark" %% "spark-mllib" % sparkVer % Provided,
     "org.scalatest" %% "scalatest" % scalaTestVersion % "test",
-    "com.johnsnowlabs.nlp" %% "spark-nlp" % sparkNLP)
+    "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % sparkNLP)
 }

 /** Disables tests in assembly */
diff --git a/src/main/scala/Main.scala b/src/main/scala/Main.scala
index 9bee85a..535da00 100644
--- a/src/main/scala/Main.scala
+++ b/src/main/scala/Main.scala
@@ -8,6 +8,7 @@ object Main {
   val spark: SparkSession = SparkSession.builder
     .appName("spark-nlp-starter")
     .master("local[*]")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.1")
     .getOrCreate

   def main(args: Array[String]): Unit = {

I got that last line from SparkNLP.scala.

@rwoodard-prog
Author

rwoodard-prog commented Nov 16, 2022

The JSL issue on Mac M1 linked above has been solved and closed. I thought that would also solve this problem. It did solve the jnitensorflow error, but then a new one appeared:

pos_anc download started this may take some time.
Approximate size to download 3.9 MB
Download done! Loading the resource.
glove_100d download started this may take some time.
Approximate size to download 145.3 MB
Download done! Loading the resource.
Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: /var/folders/jj/v8v5hy3d22ndmxj58b_736kw0000gp/T/librocksdbjni2717270241799440176.jnilib
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2633)
	at java.base/java.lang.Runtime.load0(Runtime.java:768)
	at java.base/java.lang.System.load(System.java:1837)
	at org.rocksdb.NativeLibraryLoader.loadLibraryFromJar(NativeLibraryLoader.java:79)
	at org.rocksdb.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:57)
	at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:69)
	at org.rocksdb.RocksDB.<clinit>(RocksDB.java:38)
	at com.johnsnowlabs.storage.RocksDBConnection.<init>(RocksDBConnection.scala:27)

I know that you and @DevinTDHa have worked on the librocksdbjni issue here. The Facebook RocksDB team says it should be working here.

My Azul Java 1.8 failed with a wrong-architecture error, so I tried building this starter app with Java 11, Spark 3.3.1, and Spark NLP 4.2.3.

(The above error is with the env I described in the closing comment of the above issue.)
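
A "wrong architecture" failure of this kind typically means the JVM and the native library it tries to load were built for different CPU architectures (for example, an x86_64 JVM running under Rosetta versus an arm64 `.jnilib`, or the reverse). The thread does not say which case applied here. As a minimal sketch for checking what the JVM itself reports, the following can be compiled and run with the same Java installation that `spark-submit` uses. The object name `JvmArchCheck` is hypothetical and not part of spark-nlp-starter.

```scala
// Hypothetical diagnostic helper; not part of spark-nlp-starter or Spark NLP.
object JvmArchCheck {

  /** Architecture reported by the running JVM, e.g. "aarch64" or "x86_64". */
  def jvmArch: String = System.getProperty("os.arch")

  /** True when the JVM is a native arm64 build running on macOS. */
  def isNativeAppleSilicon: Boolean =
    System.getProperty("os.name").toLowerCase.contains("mac") && jvmArch == "aarch64"

  def main(args: Array[String]): Unit = {
    println(s"java.version = ${System.getProperty("java.version")}")
    println(s"os.name      = ${System.getProperty("os.name")}")
    println(s"os.arch      = $jvmArch")
    println(s"native Apple Silicon JVM: $isNativeAppleSilicon")
  }
}
```

If `os.arch` prints `x86_64` on an M1 machine, the JVM is an Intel build running under translation, and arm64-only native libraries will not load into it.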

--- a/build.sbt
+++ b/build.sbt
@@ -7,11 +7,11 @@ val scalaTestVersion = "3.2.9"

 name := "spark-nlp-starter"

-version := "4.2.1"
+version := "4.2.3"

 scalaVersion := "2.12.15"

-javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
+javacOptions ++= Seq("-source", "11", "-target", "11")

 licenses := Seq("Apache-2.0" -> url("https://opensource.org/licenses/Apache-2.0"))

@@ -22,15 +22,15 @@ developers in ThisBuild := List(
     email = "maziyar.panahi@iscpif.fr",
     url = url("https://github.com/maziyarpanahi")))

-val sparkVer = "3.3.0"
-val sparkNLP = "4.2.1"
+val sparkVer = "3.3.1"
+val sparkNLP = "4.2.3"

 libraryDependencies ++= {
   Seq(
     "org.apache.spark" %% "spark-core" % sparkVer % Provided,
     "org.apache.spark" %% "spark-mllib" % sparkVer % Provided,
     "org.scalatest" %% "scalatest" % scalaTestVersion % "test",
-    "com.johnsnowlabs.nlp" %% "spark-nlp" % sparkNLP)
+    "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % sparkNLP)
 }

--- a/src/main/scala/Main.scala
+++ b/src/main/scala/Main.scala
@@ -8,6 +8,7 @@ object Main {
   val spark: SparkSession = SparkSession.builder
     .appName("spark-nlp-starter")
     .master("local[*]")
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.3")
     .getOrCreate

   def main(args: Array[String]): Unit = {

For some reason, two versions of rocksdbjni are downloaded:

[info] Fetching artifacts of
https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.20.3/rocksdbjni-6.20.3.jar
  100.0% [##########] 34.4 MiB (1.6 MiB / s)
https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.29.5/rocksdbjni-6.29.5.jar
  100.0% [##########] 50.0 MiB (2.0 MiB / s)
$ ls -l ~/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/rocksdb/rocksdbjni
total 0
drwxr-xr-x  12 me  staff  384 Nov 16 12:20 6.20.3
drwxr-xr-x  12 me  staff  384 Nov 16 12:20 6.29.5

But dependencies only show one version:

$ sbt dependencyTree
[info] welcome to sbt 1.6.2 (Azul Systems, Inc. Java 11.0.17)
[info] loading global plugins from /Users/me/.sbt/1.0/plugins
[info] loading settings for project spark-nlp-starter-build from assembly.sbt,plugins.sbt ...
[info] loading project definition from /Users/me/github/maziyarpanahi/spark-nlp-starter/project
[info] loading settings for project spark-nlp-starter from build.sbt ...
[info] set current project to spark-nlp-starter (in build file:/Users/me/github/maziyarpanahi/spark-nlp-starter/)
[info] default:spark-nlp-starter_2.12:4.2.3 [S]
[info]   +-com.johnsnowlabs.nlp:spark-nlp-m1_2.12:4.2.3 [S]
[info]     +-com.amazonaws:aws-java-sdk-bundle:1.11.828
[info]     +-com.github.universal-automata:liblevenshtein:3.0.0
[info]     | +-com.google.code.findbugs:annotations:3.0.1
[info]     | | +-com.google.code.findbugs:jsr305:3.0.1
[info]     | | +-net.jcip:jcip-annotations:1.0
[info]     | |
[info]     | +-com.google.protobuf:protobuf-java-util:3.0.0-beta-3
[info]     | | +-com.google.code.gson:gson:2.3
[info]     | | +-com.google.protobuf:protobuf-java:3.0.0-beta-3
[info]     | |
[info]     | +-com.google.protobuf:protobuf-java:3.0.0-beta-3
[info]     | +-it.unimi.dsi:fastutil:7.0.12
[info]     | +-org.projectlombok:lombok:1.16.8
[info]     | +-org.slf4j:slf4j-api:1.7.21
[info]     |
[info]     +-com.johnsnowlabs.nlp:tensorflow-m1_2.12:0.4.3 [S]
[info]     +-com.navigamez:greex:1.0
[info]     | +-dk.brics.automaton:automaton:1.11-8
[info]     |
[info]     +-com.typesafe:config:1.4.2
[info]     +-org.rocksdb:rocksdbjni:6.29.5
[info]
[success] Total time: 2 s, completed Nov 16, 2022, 12:33:16 PM

Any thoughts?

Thank you.

@maziyarpanahi
Owner

Hi @rwoodard-prog

Could you please create a new issue regarding RocksDB support on M1 in https://github.com/JohnSnowLabs/spark-nlp so we can officially have it on JIRA and follow up on it?

Many thanks

@DevinTDHa

The answer to this issue is in the main repo:

JohnSnowLabs/spark-nlp#13106 (comment)

@rwoodard-prog
Author

@DevinTDHa found the solution (see thread in spark-nlp above). With Spark downgraded to 3.1.3, this code runs successfully:

+--------------------+
|result              |
+--------------------+
|[Google, TensorFlow]|
|[Paris]             |
+--------------------+

+------------------------------------------------------------------------------+
|result                                                                        |
+------------------------------------------------------------------------------+
|[NNP, VBZ, VBN, DT, NN, IN, DT, NN, NN, IN, DT, JJ, NNP, NN, VBG, NN]         |
|[DT, NNP, NN, MD, RB, VB, DT, JJ, NN, ,, VBG, JJ, NN, NNS, IN, JJ, JJ, NNS, .]|
+------------------------------------------------------------------------------+
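
For anyone reproducing this, a minimal sketch of the relevant `build.sbt` lines follows. Only the Spark downgrade to 3.1.3 is confirmed in this thread. Keeping Spark NLP at 4.2.3 and the `spark-nlp-m1` artifact is an assumption carried over from the diff earlier in the issue. Because the Spark dependencies are marked `Provided`, they are not bundled into the assembly jar, so the Spark installation behind `spark-submit` would need to be 3.1.3 as well.

```scala
// build.sbt (fragment). Only sparkVer = "3.1.3" is confirmed by this thread;
// sparkNLP = "4.2.3" and the spark-nlp-m1 artifact are assumptions carried
// over from the diff earlier in the issue.
val sparkVer = "3.1.3"
val sparkNLP = "4.2.3"

libraryDependencies ++= Seq(
  // Provided: left out of the assembly jar and supplied by spark-submit at runtime.
  "org.apache.spark" %% "spark-core" % sparkVer % Provided,
  "org.apache.spark" %% "spark-mllib" % sparkVer % Provided,
  "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % sparkNLP
)
```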

Thank you, @DevinTDHa and @maziyarpanahi.
