[ML-248][LinearRegression] Add Linear Regression GPU algorithm #261

Merged on May 6, 2023 (48 commits)
Changes from 41 commits

Commits
4a4dde2
File init
argentea Mar 6, 2023
8b30e98
Add linear regression framework
argentea Mar 10, 2023
15c1482
Get LinearRegressionDALImpl done
argentea Mar 17, 2023
77c50b9
Scala almost done
argentea Mar 17, 2023
93523f3
Cpp code done
argentea Mar 31, 2023
4a6ec77
Fix typo
argentea Mar 31, 2023
1417964
Add CPU/GPU macro
argentea May 5, 2023
0d0df26
Remove spark313
argentea Apr 4, 2023
7262f32
Prepare for review
argentea Apr 13, 2023
55db6b1
Remove useless file
argentea Apr 13, 2023
6d90a6f
Remove other branch edited file
argentea Apr 13, 2023
fb0408b
Clean files
argentea Apr 13, 2023
73e453a
Do native code link
argentea Apr 20, 2023
82e55ca
Function complete
argentea Apr 25, 2023
2dababc
Format cpp
argentea Apr 20, 2023
9841368
Fomat cpp clean up
argentea Apr 25, 2023
2896731
restruct cpu code
argentea Apr 25, 2023
3dc3f8b
Fix typo
argentea Apr 25, 2023
030c5b9
Add cpu parameter
argentea Apr 25, 2023
23a8b67
Active cpu even use CPU_GPU_PROFILE
argentea Apr 25, 2023
41e12f3
Solve namespace problem
argentea Apr 25, 2023
0d9e16d
Fix variable name mismatch
argentea Apr 25, 2023
06e45fa
Fix typo
argentea Apr 25, 2023
3c8a42c
Format cpp
argentea Apr 25, 2023
16139bf
Clean up
argentea Apr 25, 2023
d380f82
Clean up debug log in scala
argentea Apr 25, 2023
f784acc
Fix typo
argentea Apr 25, 2023
915f6f0
Revert kmeans change
argentea Apr 26, 2023
40aaa94
Remove CPU_ONLY
argentea Apr 26, 2023
dc97232
Revert jni change
argentea Apr 26, 2023
35b66f1
run.sh is used to run CPU mode
argentea May 4, 2023
6877dbd
Add sampile run script for gpu
argentea May 4, 2023
ebdbec6
Update head file
argentea May 4, 2023
a36e591
Native code adapt new communicator
argentea May 5, 2023
2fa795d
Change scala communicator api
argentea May 5, 2023
35ae17d
Rename LR cpp file and generate new bridge file
argentea May 5, 2023
f092bb0
Remove debug code
argentea May 5, 2023
6412408
Patition problem will be fixed at data convertion
argentea May 5, 2023
7124f17
Revert useless change
argentea May 5, 2023
80bfc0d
Fix indentation
argentea May 5, 2023
95563a1
Fix space problem
argentea May 5, 2023
88c23fe
Fix space
argentea May 5, 2023
1481a56
Revert "Fix space problem"
argentea May 5, 2023
16f1844
Format cpp
argentea May 5, 2023
567e813
Make parameter simpler
argentea May 5, 2023
02dedb7
Format cpp
argentea May 5, 2023
833ff4a
Fix sparse data process
argentea May 5, 2023
b406cfc
Refine error msg
argentea May 6, 2023
20 changes: 20 additions & 0 deletions examples/linear-regression/GetIntelGpuResources.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# This script is a basic example of a GPU resource discovery script. It appears to be
# adapted from Spark's sample script for NVIDIA GPUs (the nvidia-smi based lines are
# kept commented out below); this Intel GPU variant simply echoes a hardcoded address list.
# It is not guaranteed to work on all setups so please test and customize as needed
# for your environment. It can be passed into SPARK via the config
# spark.{driver/executor}.resource.gpu.discoveryScript to allow the driver or executor to discover
# the GPUs it was allocated. It assumes you are running within an isolated container where the
# GPUs are allocated exclusively to that driver or executor.
# It outputs a JSON formatted string that is expected by the
# spark.{driver/executor}.resource.gpu.discoveryScript config.
#
# Example output: {"name": "gpu", "addresses":["0","1","2","3","4","5","6","7"]}

#ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed -e ':a' -e 'N' -e'$!ba' -e 's/\n/","/g'`
#echo {\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}
#ADDRS="0","1","2","3","4","5","6","7"
#echo {\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}

echo {\"name\": \"gpu\", \"addresses\":[\"0\",\"1\",\"2\",\"3\"]}
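The nvidia-smi lines above are kept only as commented-out reference; this Intel variant simply echoes a fixed address list. As a rough sketch (not part of this PR), the same script could instead be handed to the standard Spark discovery-script configs its header mentions, rather than the static IntelGpuResourceFile.json used by run-gpu-standalone.sh below. Env variables and paths are reused from this PR's run scripts; the exact wiring is an assumption:

#!/usr/bin/env bash
# Sketch only: let Spark discover GPUs through GetIntelGpuResources.sh instead
# of reading a static resources file. Uses the same conf/env.sh as the other
# run scripts in this PR.
source ../../conf/env.sh

$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER \
  --conf "spark.oap.mllib.device=GPU" \
  --conf "spark.executor.resource.gpu.discoveryScript=$PWD/GetIntelGpuResources.sh" \
  --conf "spark.executor.resource.gpu.amount=1" \
  --conf "spark.task.resource.gpu.amount=1" \
  --jars $OAP_MLLIB_JAR \
  --class org.apache.spark.examples.ml.LinearRegressionExample \
  target/oap-mllib-examples-$OAP_MLLIB_VERSION.jar \
  $HDFS_ROOT/data/sample_linear_regression_data.txt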
1 change: 1 addition & 0 deletions examples/linear-regression/IntelGpuResourceFile.json
@@ -0,0 +1 @@
[{"id":{"componentName": "spark.worker","resourceName":"gpu"},"addresses":["0","1","2","3"]}]
42 changes: 42 additions & 0 deletions examples/linear-regression/run-gpu-standalone.sh
@@ -0,0 +1,42 @@
#!/usr/bin/env bash

source ../../conf/env.sh

# Data file is from Spark Examples (data/mllib/sample_linear_regression_data.txt) and put in examples/data
# The data file should be copied to $HDFS_ROOT before running examples
DATA_FILE=$HDFS_ROOT/data/sample_linear_regression_data.txt

APP_JAR=target/oap-mllib-examples-$OAP_MLLIB_VERSION.jar
APP_CLASS=org.apache.spark.examples.ml.LinearRegressionExample

DEVICE=GPU
RESOURCE_FILE=$PWD/IntelGpuResourceFile.json
WORKER_GPU_AMOUNT=4
EXECUTOR_GPU_AMOUNT=1
TASK_GPU_AMOUNT=1

# Should run in standalone mode
time $SPARK_HOME/bin/spark-submit --master $SPARK_MASTER \
--num-executors $SPARK_NUM_EXECUTORS \
--executor-cores $SPARK_EXECUTOR_CORES \
--total-executor-cores $SPARK_TOTAL_CORES \
--driver-memory $SPARK_DRIVER_MEMORY \
--executor-memory $SPARK_EXECUTOR_MEMORY \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.default.parallelism=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.sql.shuffle.partitions=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.driver.extraClassPath=$SPARK_DRIVER_CLASSPATH" \
--conf "spark.executor.extraClassPath=$SPARK_EXECUTOR_CLASSPATH" \
--conf "spark.oap.mllib.device=$DEVICE" \
--conf "spark.worker.resourcesFile=$RESOURCE_FILE" \
--conf "spark.worker.resource.gpu.amount=$WORKER_GPU_AMOUNT" \
--conf "spark.executor.resource.gpu.amount=$EXECUTOR_GPU_AMOUNT" \
--conf "spark.task.resource.gpu.amount=$TASK_GPU_AMOUNT" \
--conf "spark.shuffle.reduceLocality.enabled=false" \
--conf "spark.network.timeout=1200s" \
--conf "spark.task.maxFailures=1" \
--jars $OAP_MLLIB_JAR \
--class $APP_CLASS \
$APP_JAR $DATA_FILE \
2>&1 | tee LinearRegression-$(date +%m%d_%H_%M_%S).log
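The comment at the top of run-gpu-standalone.sh notes that sample_linear_regression_data.txt must be copied to $HDFS_ROOT before the example is run. A minimal staging sketch, assuming the copy of the data file kept under examples/data in this repo (the relative source path is an assumption):

# Sketch only: stage the sample data into HDFS for the run scripts.
hdfs dfs -mkdir -p $HDFS_ROOT/data
hdfs dfs -put ../data/sample_linear_regression_data.txt $HDFS_ROOT/data/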

3 changes: 2 additions & 1 deletion examples/linear-regression/run.sh
@@ -19,13 +19,14 @@ time $SPARK_HOME/bin/spark-submit --master $SPARK_MASTER \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.default.parallelism=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.sql.shuffle.partitions=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.oap.mllib.device=$DEVICE" \
--conf "spark.driver.extraClassPath=$SPARK_DRIVER_CLASSPATH" \
--conf "spark.executor.extraClassPath=$SPARK_EXECUTOR_CLASSPATH" \
--conf "spark.oap.mllib.device=$DEVICE" \
--conf "spark.shuffle.reduceLocality.enabled=false" \
--conf "spark.network.timeout=1200s" \
--conf "spark.task.maxFailures=1" \
--jars $OAP_MLLIB_JAR \
--class $APP_CLASS \
$APP_JAR $DATA_FILE \
2>&1 | tee LinearRegression-$(date +%m%d_%H_%M_%S).log

LinearRegressionExample.scala
@@ -18,8 +18,10 @@
// scalastyle:off println
package org.apache.spark.examples.ml

import org.apache.spark.sql.Row
import org.apache.spark.ml.linalg.Vector
import scopt.OptionParser

import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{DataFrame, SparkSession}

@@ -110,6 +112,14 @@ object LinearRegressionExample {
val training = spark.read.format("libsvm")
.load(params.input).toDF("label", "features")

training.select("label", "features").printSchema()
val featuresRDD = training
.select("label", "features").rdd.map {
case Row(label: Double, feature: Vector) => new LabeledPoint(label, feature.toDense)
}
import spark.implicits._
val df = featuresRDD.toDF("label", "features")
df.show(false)
val lir = new LinearRegression()
.setFeaturesCol("features")
.setLabelCol("label")
@@ -120,7 +130,7 @@

// Train the model
val startTime = System.nanoTime()
val lirModel = lir.fit(training)
val lirModel = lir.fit(df)
val elapsedTime = (System.nanoTime() - startTime) / 1e9
println(s"Training time: $elapsedTime seconds")

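The Scala change above densifies the libsvm input before training: each row is pattern-matched into a LabeledPoint via toDense, and fit() is then called on the converted DataFrame (df) instead of the raw sparse one, which may be related to the "Fix sparse data process" commit in this PR. Below is a self-contained sketch of that flow; the object name, the local[*] master, and the hardcoded data path are assumptions for illustration only, not part of the diff.

import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical standalone sketch of the dense-conversion step used by the example.
object DenseLinearRegressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DenseLinearRegressionSketch")
      .master("local[*]") // assumption: the real example is launched via spark-submit
      .getOrCreate()
    import spark.implicits._

    // libsvm input loads as sparse vectors; densify each row into a LabeledPoint.
    val training = spark.read.format("libsvm")
      .load("data/sample_linear_regression_data.txt") // assumed local path
      .toDF("label", "features")

    val dense = training.select("label", "features").rdd.map {
      case Row(label: Double, features: Vector) => LabeledPoint(label, features.toDense)
    }.toDF("label", "features")

    val lir = new LinearRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")

    val model = lir.fit(dense)
    println(s"Coefficients: ${model.coefficients}, Intercept: ${model.intercept}")

    spark.stop()
  }
}

With the conversion in place, the estimator sees only dense feature vectors, matching the change from fit(training) to fit(df) in the diff.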