Cannot use OAP-MLlib to run PCA, ALS and K-means since commit "7a7b8638490b0054540d7e329f14d538c6ac27dc" #107

Closed
haojinIntel opened this issue Jul 28, 2021 · 3 comments · Fixed by #111

Comments

@haojinIntel
Collaborator

Since commit "7a7b8638490b0054540d7e329f14d538c6ac27dc", PCA, ALS and K-means can no longer be run with Intel-MLlib. The error messages are shown below:

2021-07-28 14:40:37,577 INFO util.Instrumentation: [7062d2c9] training: numPartitions=24 storageLevel=StorageLevel(1 replicas)
2021-07-28 14:40:37,586 INFO util.Instrumentation: [7062d2c9] {"ratingCol":"rating","numItemBlocks":8,"implicitPrefs":true,"numUserBlocks":8,"rank":50,"itemCol":"item","userCol":"user","regParam":0.1,"maxIter":10}
Exception in thread "main" java.lang.UnsatisfiedLinkError: /tmp/MLlibDAL_1bd8b0d7-bb91-4b6d-a866-caa2b05dde8c/lib/libMLlibDAL.so: libOpenCL.so.1: cannot open shared object file: No such file or directory
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
        at java.lang.Runtime.load0(Runtime.java:809)
        at java.lang.System.load(System.java:1086)
        at org.apache.spark.ml.util.LibLoader.loadFromJar(LibLoader.java:115)
        at org.apache.spark.ml.util.LibLoader.loadLibMLlibDAL(LibLoader.java:72)
        at org.apache.spark.ml.util.LibLoader.loadLibraries(LibLoader.java:49)
        at org.apache.spark.ml.util.Utils$.checkClusterPlatformCompatibility(Utils.scala:116)
        at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:922)
        at org.apache.spark.ml.recommendation.ALS.$anonfun$fit$1(ALS.scala:709)
        at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
        at scala.util.Try$.apply(Try.scala:213)
        at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
        at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:691)
        at com.intel.hibench.sparkbench.ml.ALSExample$.run(ALSExample.scala:95)
        at com.intel.hibench.sparkbench.ml.ALSExample$.main(ALSExample.scala:66)
        at com.intel.hibench.sparkbench.ml.ALSExample.main(ALSExample.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-07-28 14:40:39,189 INFO spark.SparkContext: Invoking stop() from shutdown hook
2021-07-28 14:40:39,216 INFO server.AbstractConnector: Stopped Spark@e11edae{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-07-28 14:40:39,217 INFO ui.SparkUI: Stopped Spark web UI at http://bdpe-sky2:4040
2021-07-28 14:40:39,227 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2021-07-28 14:40:39,259 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2021-07-28 14:40:39,260 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2021-07-28 14:40:39,271 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2021-07-28 14:40:39,322 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-07-28 14:40:39,342 INFO memory.MemoryStore: MemoryStore cleared
2021-07-28 14:40:39,343 INFO storage.BlockManager: BlockManager stopped
2021-07-28 14:40:39,363 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-07-28 14:40:39,370 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.35:50486; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.35:50496; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.31:42902; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.35:50494; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.31:42904; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,393 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.31:42906; closing connection
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
        at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,434 INFO spark.SparkContext: Successfully stopped SparkContext
2021-07-28 14:40:39,435 INFO util.ShutdownHookManager: Shutdown hook called

After rolling back to commit "299a0120d2cfe5f1cd501ca370d800202998fbf7", everything works well.
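The `UnsatisfiedLinkError` above names two files: the bundled `libMLlibDAL.so` was found, but the dynamic linker could not open `libOpenCL.so.1`, one of its native dependencies. A minimal sketch for telling this case apart from a missing `libMLlibDAL.so` is shown below. `NativeLoadDiagnostics` and its methods are hypothetical (not part of OAP-MLlib), and the regex assumes the Linux message format seen in the log.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of OAP-MLlib. Assumes the Linux message format:
 *   "<library path>: <missing dependency>: cannot open shared object file: ..."
 */
public final class NativeLoadDiagnostics {

    // Captures the soname of the dependency the dynamic linker could not open.
    private static final Pattern MISSING_DEP = Pattern.compile(
            ":\\s*([^:\\s]+\\.so(?:\\.\\d+)*):\\s*cannot open shared object file");

    private NativeLoadDiagnostics() {}

    /** Returns the missing dependency named in an UnsatisfiedLinkError message, if any. */
    public static Optional<String> missingDependency(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = MISSING_DEP.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Tries System.load on a path; returns a diagnosis string on failure, empty on success. */
    public static Optional<String> tryLoad(String absolutePath) {
        try {
            System.load(absolutePath);
            return Optional.empty();
        } catch (UnsatisfiedLinkError e) {
            String reason = missingDependency(e.getMessage())
                    .map(dep -> "missing dependency " + dep)
                    .orElse("library itself could not be loaded");
            return Optional.of(absolutePath + ": " + reason);
        }
    }

    public static void main(String[] args) {
        String msg = "/tmp/MLlibDAL_1bd8b0d7-bb91-4b6d-a866-caa2b05dde8c/lib/libMLlibDAL.so: "
                + "libOpenCL.so.1: cannot open shared object file: No such file or directory";
        System.out.println(missingDependency(msg).orElse("none")); // prints libOpenCL.so.1
    }
}
```

Parsing the message rather than only catching the error makes it clear that the fix is on the node's library setup, not in how the jar extracts `libMLlibDAL.so`.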

@haojinIntel
Collaborator Author

@xwu99 @zhixingheyi-tian Please help track this issue.

@xwu99
Collaborator

xwu99 commented Jul 28, 2021

Adding GPU support for the algorithms requires packaging all SYCL native dependencies, which has not been figured out yet. One workaround is to install the oneAPI libraries on all nodes and source them in the Spark environment. Another workaround is to separate the build of the GPU algorithms.
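For the first workaround, a rough pre-flight check could be run on each node after sourcing the oneAPI environment, to confirm that `libOpenCL.so.1` is reachable through `LD_LIBRARY_PATH`. `SharedLibraryCheck` is a hypothetical helper, not part of OAP-MLlib or oneAPI. It only inspects `LD_LIBRARY_PATH` and ignores `ld.so.cache` and the default system directories, so a negative result is a hint rather than proof.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/**
 * Hypothetical pre-flight check, not part of OAP-MLlib or oneAPI.
 * Only LD_LIBRARY_PATH is searched; ld.so.cache and default dirs are not.
 */
public final class SharedLibraryCheck {

    private SharedLibraryCheck() {}

    /** Searches each ':'-separated directory in searchPath for a file named soname. */
    public static Optional<Path> findInSearchPath(String soname, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : searchPath.split(":")) {
            if (dir.isEmpty()) {
                continue; // ld.so treats an empty entry as the current directory; skipped here
            }
            Path candidate = Paths.get(dir, soname);
            // isRegularFile follows symlinks, so libOpenCL.so.1 -> libOpenCL.so.1.x counts
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String soname = args.length > 0 ? args[0] : "libOpenCL.so.1";
        Optional<Path> hit = findInSearchPath(soname, System.getenv("LD_LIBRARY_PATH"));
        System.out.println(hit.map(p -> "found " + p)
                .orElse(soname + " not found on LD_LIBRARY_PATH"));
    }
}
```

Because the driver and the executors load `libMLlibDAL.so` independently, the check is only meaningful when run in the same environment Spark uses on every node.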

@zhixingheyi-tian zhixingheyi-tian changed the title Cannot using Intel-MLlib to run PCA, ALS and K-means thanks to the commit "7a7b8638490b0054540d7e329f14d538c6ac27dc" Cannot using OAP-MLlib to run PCA, ALS and K-means thanks to the commit "7a7b8638490b0054540d7e329f14d538c6ac27dc" Jul 29, 2021
@xwu99
Copy link
Collaborator

xwu99 commented Aug 2, 2021

#108

2 participants