Cannot run K-means algorithm with oap-mllib on AWS EMR when using bigdata profile. #120

Closed
haojinIntel opened this issue Aug 16, 2021 · 1 comment · Fixed by #124
Labels
bug Something isn't working

Comments

@haojinIntel
Collaborator

We tried to use oap-mllib to run K-means on AWS EMR and got an error message like:

Job aborted due to stage failure: Task 4970 in stage 12.0 failed 4 times, most recent failure: Lost task 4970.3 in stage 12.0 (TID 30597) (ip-172-31-17-241.us-east-2.compute.internal executor 19): java.lang.UnsatisfiedLinkError: com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.dInit(JI)J
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.dInit(Native Method)
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.initHomogenNumericTable(Unknown Source)
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.<init>(Unknown Source)
	at com.intel.daal.data_management.data.HomogenNumericTable.<init>(Unknown Source)
	at com.intel.daal.data_management.data.Matrix.<init>(Unknown Source)
	at org.apache.spark.ml.util.OneDAL$.vectorsToDenseNumericTable(OneDAL.scala:373)
	at org.apache.spark.ml.util.OneDAL$.$anonfun$rddVectorToMergedTables$3(OneDAL.scala:445)
	at org.apache.spark.ml.util.OneDAL$.$anonfun$rddVectorToMergedTables$3$adapted(OneDAL.scala:437)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
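As context (an aside, not from the original report): a `java.lang.UnsatisfiedLinkError` on a native method such as `dInit` generally means the JVM found the Java declaration but could not resolve the native symbol, e.g. because the backing shared library (here oneDAL's) was never loaded on the executor, or the loaded library does not export that symbol. A minimal, self-contained sketch of how this error class arises (`NativeLinkDemo` and its `dInit` are hypothetical stand-ins, unrelated to the real oneDAL classes):

```java
// Sketch: calling a native method with no loaded implementation throws
// java.lang.UnsatisfiedLinkError, the same error class seen in the trace above.
public class NativeLinkDemo {
    // Declared native, but no System.loadLibrary(...) call ever provides
    // an implementation, so invoking it cannot be linked at call time.
    static native long dInit(long cObject, int tag);

    public static void main(String[] args) {
        try {
            dInit(0L, 0);
        } catch (UnsatisfiedLinkError e) {
            System.out.println("caught: " + e.getClass().getName());
        }
    }
}
// prints: caught: java.lang.UnsatisfiedLinkError
```

On a Spark cluster the analogous failure usually traces back to the native libraries being missing, or not on the executors' library path.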

The cluster contains 1 master and 3 workers. Each worker has 96 vcores and 384 GB of memory. The K-means configuration is shown below:

hibench.kmeans.bigdata.num_of_clusters          5
hibench.kmeans.bigdata.dimensions               1000
hibench.kmeans.bigdata.num_of_samples           25000000
hibench.kmeans.bigdata.samples_per_inputfile    10000
hibench.kmeans.bigdata.k                        300
hibench.kmeans.bigdata.max_iteration            40
hibench.kmeans.storage.level                    MEMORY_ONLY
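For scale (a back-of-envelope aside, not from the original report): the profile above implies a dense double-precision dataset whose raw size can be computed directly from the settings:

```python
# Raw input size implied by the bigdata K-means profile above
# (dense doubles, ignoring Spark/JVM object overhead).
samples = 25_000_000       # hibench.kmeans.bigdata.num_of_samples
dims = 1_000               # hibench.kmeans.bigdata.dimensions
bytes_per_double = 8

raw_bytes = samples * dims * bytes_per_double
print(f"{raw_bytes / 2**30:.1f} GiB")  # → 186.3 GiB
```

That is roughly 186 GiB of raw feature data cached at MEMORY_ONLY across the executors, before any JVM overhead.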

The Spark configuration is shown below:

spark.driver.extraLibraryPath             /opt/benchmark-tools/oap/lib
spark.driver.extraClassPath               /opt/benchmark-tools/oap/oap_jars/oap-mllib-1.2.0.jar
spark.executor.extraLibraryPath           /opt/benchmark-tools/oap/lib
spark.executor.extraClassPath             /opt/benchmark-tools/oap/oap_jars/oap-mllib-1.2.0.jar
hibench.yarn.executor.num     18
hibench.yarn.executor.cores   8
spark.executor.memory 42g
spark.executor.memoryOverhead 8g
spark.driver.memory 100g
spark.dynamicAllocation.enabled  false
spark.default.parallelism 144
spark.sql.shuffle.partitions  144
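Since the stack trace points at a native-linkage failure, one quick check (a hedged diagnostic sketch, not part of the report) is whether the directory configured in `spark.executor.extraLibraryPath` actually contains shared libraries on every worker node:

```shell
# Check whether a library directory contains any .so files.
# The path below is the one from spark.executor.extraLibraryPath above;
# run this on each worker, adjusting the path for your cluster.
check_native_libs() {
    dir="$1"
    if [ -d "$dir" ] && ls "$dir"/*.so >/dev/null 2>&1; then
        echo "native libraries found in $dir"
    else
        echo "no native libraries in $dir"
    fi
}
check_native_libs /opt/benchmark-tools/oap/lib
```

If the directory is empty or missing on any executor host, the JVM cannot resolve the native oneDAL symbols and fails exactly as shown in the trace.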
@haojinIntel
Collaborator Author

@xwu99 @zhixingheyi-tian Please help track this issue. Thanks!
