Cannot run K-means algorithm with oap-mllib on AWS EMR when using bigdata profile. #120

Closed
haojinIntel opened this issue Aug 16, 2021 · 1 comment · Fixed by #124
Labels
bug Something isn't working

Comments

@haojinIntel
Collaborator

We tried to use oap-mllib to run K-means on AWS EMR and got an error message like:

Job aborted due to stage failure: Task 4970 in stage 12.0 failed 4 times, most recent failure: Lost task 4970.3 in stage 12.0 (TID 30597) (ip-172-31-17-241.us-east-2.compute.internal executor 19): java.lang.UnsatisfiedLinkError: com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.dInit(JI)J
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.dInit(Native Method)
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.initHomogenNumericTable(Unknown Source)
	at com.intel.daal.data_management.data.HomogenNumericTableByteBufferImpl.<init>(Unknown Source)
	at com.intel.daal.data_management.data.HomogenNumericTable.<init>(Unknown Source)
	at com.intel.daal.data_management.data.Matrix.<init>(Unknown Source)
	at org.apache.spark.ml.util.OneDAL$.vectorsToDenseNumericTable(OneDAL.scala:373)
	at org.apache.spark.ml.util.OneDAL$.$anonfun$rddVectorToMergedTables$3(OneDAL.scala:445)
	at org.apache.spark.ml.util.OneDAL$.$anonfun$rddVectorToMergedTables$3$adapted(OneDAL.scala:437)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1423)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1350)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1414)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1237)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
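As context (an aside, not from the original report): a `java.lang.UnsatisfiedLinkError` on a native method such as `dInit` generally means the JVM found the Java declaration but could not resolve the native symbol, e.g. because the backing shared library (here oneDAL's) was never loaded on the executor, or the loaded library does not export that symbol. A minimal, self-contained sketch of how this error class arises (`NativeLinkDemo` and its `dInit` are hypothetical stand-ins, unrelated to the real oneDAL classes):

```java
// Sketch: calling a native method with no loaded implementation throws
// java.lang.UnsatisfiedLinkError, the same error class seen in the trace above.
public class NativeLinkDemo {
    // Declared native, but no System.loadLibrary(...) call ever provides
    // an implementation, so invoking it cannot be linked at call time.
    static native long dInit(long cObject, int tag);

    public static void main(String[] args) {
        try {
            dInit(0L, 0);
        } catch (UnsatisfiedLinkError e) {
            System.out.println("caught: " + e.getClass().getName());
        }
    }
}
// prints: caught: java.lang.UnsatisfiedLinkError
```

On a Spark cluster the analogous failure usually traces back to the native libraries being missing, or not on the executors' library path.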

The cluster contains 1 master and 3 workers. Each worker has 96 vcores and 384 GB of memory. The K-means configuration is shown below:

hibench.kmeans.bigdata.num_of_clusters          5
hibench.kmeans.bigdata.dimensions               1000
hibench.kmeans.bigdata.num_of_samples           25000000
hibench.kmeans.bigdata.samples_per_inputfile    10000
hibench.kmeans.bigdata.k                        300
hibench.kmeans.bigdata.max_iteration            40
hibench.kmeans.storage.level                    MEMORY_ONLY
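For scale (a back-of-envelope aside, not from the original report): the profile above implies a dense double-precision dataset whose raw size can be computed directly from the settings:

```python
# Raw input size implied by the bigdata K-means profile above
# (dense doubles, ignoring Spark/JVM object overhead).
samples = 25_000_000       # hibench.kmeans.bigdata.num_of_samples
dims = 1_000               # hibench.kmeans.bigdata.dimensions
bytes_per_double = 8

raw_bytes = samples * dims * bytes_per_double
print(f"{raw_bytes / 2**30:.1f} GiB")  # → 186.3 GiB
```

That is roughly 186 GiB of raw feature data cached at MEMORY_ONLY across the executors, before any JVM overhead.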

The Spark configuration is shown below:

spark.driver.extraLibraryPath             /opt/benchmark-tools/oap/lib
spark.driver.extraClassPath               /opt/benchmark-tools/oap/oap_jars/oap-mllib-1.2.0.jar
spark.executor.extraLibraryPath           /opt/benchmark-tools/oap/lib
spark.executor.extraClassPath             /opt/benchmark-tools/oap/oap_jars/oap-mllib-1.2.0.jar
hibench.yarn.executor.num     18
hibench.yarn.executor.cores   8
spark.executor.memory 42g
spark.executor.memoryOverhead 8g
spark.driver.memory 100g
spark.dynamicAllocation.enabled  false
spark.default.parallelism 144
spark.sql.shuffle.partitions  144
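Since the stack trace points at a native-linkage failure, one quick check (a hedged diagnostic sketch, not part of the report) is whether the directory configured in `spark.executor.extraLibraryPath` actually contains shared libraries on every worker node:

```shell
# Check whether a library directory contains any .so files.
# The path below is the one from spark.executor.extraLibraryPath above;
# run this on each worker, adjusting the path for your cluster.
check_native_libs() {
    dir="$1"
    if [ -d "$dir" ] && ls "$dir"/*.so >/dev/null 2>&1; then
        echo "native libraries found in $dir"
    else
        echo "no native libraries in $dir"
    fi
}
check_native_libs /opt/benchmark-tools/oap/lib
```

If the directory is empty or missing on any executor host, the JVM cannot resolve the native oneDAL symbols and fails exactly as shown in the trace.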
@haojinIntel
Collaborator Author

@xwu99 @zhixingheyi-tian Please help track this issue. Thanks!
