Since commit "7a7b8638490b0054540d7e329f14d538c6ac27dc", PCA, ALS and K-means can no longer be run with Intel-MLlib. The error messages are shown below:
2021-07-28 14:40:37,577 INFO util.Instrumentation: [7062d2c9] training: numPartitions=24 storageLevel=StorageLevel(1 replicas)
2021-07-28 14:40:37,586 INFO util.Instrumentation: [7062d2c9] {"ratingCol":"rating","numItemBlocks":8,"implicitPrefs":true,"numUserBlocks":8,"rank":50,"itemCol":"item","userCol":"user","regParam":0.1,"maxIter":10}
Exception in thread "main" java.lang.UnsatisfiedLinkError: /tmp/MLlibDAL_1bd8b0d7-bb91-4b6d-a866-caa2b05dde8c/lib/libMLlibDAL.so: libOpenCL.so.1: cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at org.apache.spark.ml.util.LibLoader.loadFromJar(LibLoader.java:115)
at org.apache.spark.ml.util.LibLoader.loadLibMLlibDAL(LibLoader.java:72)
at org.apache.spark.ml.util.LibLoader.loadLibraries(LibLoader.java:49)
at org.apache.spark.ml.util.Utils$.checkClusterPlatformCompatibility(Utils.scala:116)
at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:922)
at org.apache.spark.ml.recommendation.ALS.$anonfun$fit$1(ALS.scala:709)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.recommendation.ALS.fit(ALS.scala:691)
at com.intel.hibench.sparkbench.ml.ALSExample$.run(ALSExample.scala:95)
at com.intel.hibench.sparkbench.ml.ALSExample$.main(ALSExample.scala:66)
at com.intel.hibench.sparkbench.ml.ALSExample.main(ALSExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-07-28 14:40:39,189 INFO spark.SparkContext: Invoking stop() from shutdown hook
2021-07-28 14:40:39,216 INFO server.AbstractConnector: Stopped Spark@e11edae{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-07-28 14:40:39,217 INFO ui.SparkUI: Stopped Spark web UI at http://bdpe-sky2:4040
2021-07-28 14:40:39,227 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2021-07-28 14:40:39,259 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2021-07-28 14:40:39,260 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2021-07-28 14:40:39,271 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2021-07-28 14:40:39,322 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-07-28 14:40:39,342 INFO memory.MemoryStore: MemoryStore cleared
2021-07-28 14:40:39,343 INFO storage.BlockManager: BlockManager stopped
2021-07-28 14:40:39,363 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-07-28 14:40:39,370 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.35:50486; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.35:50496; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.31:42902; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.35:50494; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,389 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,byteCount=111939764,body=FileSegmentManagedBuffer[file=/opt/Beaver/hibench/sparkbench/assembly/target/sparkbench-assembly-8.0-SNAPSHOT-dist.jar,offset=0,length=111939764]] to /192.168.32.31:42904; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,393 ERROR server.TransportRequestHandler: Error sending result StreamResponse[streamId=/files/oap-mllib-1.2.0.jar,byteCount=111230771,body=FileSegmentManagedBuffer[file=/opt/Beaver/OAP/oap_jar/oap-mllib-1.2.0.jar,offset=0,length=111230771]] to /192.168.32.31:42906; closing connection
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:606)
at io.netty.channel.nio.NioEventLoop.closeAll(NioEventLoop.java:762)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:524)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
2021-07-28 14:40:39,434 INFO spark.SparkContext: Successfully stopped SparkContext
2021-07-28 14:40:39,435 INFO util.ShutdownHookManager: Shutdown hook called
When we roll back to commit "299a0120d2cfe5f1cd501ca370d800202998fbf7", everything works well.
When we added GPU support for the algorithms, we needed to package all SYCL native dependencies, which has not been figured out yet. One workaround is to install the oneAPI libraries on all nodes and source them in the Spark environment. Another workaround is to separate the build of the GPU algorithms.
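To confirm that the first workaround has taken effect on a given node, it can help to check whether the dynamic loader is able to resolve `libOpenCL.so.1`, the library named in the `UnsatisfiedLinkError` above. The following is only a minimal sketch in Python, not part of OAP MLlib: the function name `can_load_shared_library` is hypothetical, and the check assumes it is run in the same shell environment (after sourcing oneAPI) that the Spark driver and executors will use.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can resolve and load `name`.

    Hypothetical helper for illustration; it relies only on ctypes.CDLL,
    which raises OSError when the shared object cannot be opened.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libOpenCL.so.1 is the dependency reported missing in the stack trace.
    lib = "libOpenCL.so.1"
    status = "found" if can_load_shared_library(lib) else "MISSING"
    print(f"{lib}: {status}")
```

Running this on every node (driver and executors) should report `found` once the oneAPI libraries are installed and visible to the loader; a `MISSING` result suggests the node would hit the same error when `libMLlibDAL.so` is loaded.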
zhixingheyi-tian changed the title on Jul 29, 2021
from: Cannot using Intel-MLlib to run PCA, ALS and K-means thanks to the commit "7a7b8638490b0054540d7e329f14d538c6ac27dc"
to: Cannot using OAP-MLlib to run PCA, ALS and K-means thanks to the commit "7a7b8638490b0054540d7e329f14d538c6ac27dc"