[ML-254] [Random forest] Enable Random Forest Regressor algorithm on OAP-Mllib #277

Merged May 12, 2023
120 commits
7a9ac34
Migrate KMeans daal to DPC++ (#209)
minmingzhu Jul 14, 2022
d51c4dd
Migrate pca daal to DPC++ (#223)
minmingzhu Aug 9, 2022
d34c194
[ML-226] Migrate correlation daal to DPC++ (#215)
minmingzhu Aug 31, 2022
490144e
[ML-227] Migrate moments daal to DPC++ (#229)
minmingzhu Sep 13, 2022
be3850a
add RF
Feb 7, 2023
7b2071f
update
minmingzhu Feb 14, 2023
32d289e
update
minmingzhu Mar 10, 2023
5a26b51
update
minmingzhu Mar 22, 2023
12b0fc9
RF example pom.xml
minmingzhu Mar 22, 2023
9a173f2
add debug log
minmingzhu Mar 22, 2023
ff2ae5f
update
minmingzhu Mar 23, 2023
dfca421
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 23, 2023
f48928f
update log
minmingzhu Mar 23, 2023
3fcffaa
update debug log
minmingzhu Mar 23, 2023
06a3593
update debug log
minmingzhu Mar 23, 2023
a7560f7
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 23, 2023
5043986
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 23, 2023
784c7d3
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 23, 2023
cc9847a
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 24, 2023
b2a2d4b
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 24, 2023
014447d
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 24, 2023
4fc46b1
update DecisionForestOneAPIImpl.cpp
minmingzhu Mar 24, 2023
982cec9
update debug log
minmingzhu Mar 24, 2023
e26ab1e
update debug log
minmingzhu Mar 24, 2023
c4ea999
update debug log
minmingzhu Mar 24, 2023
d26412c
update debug log
minmingzhu Mar 26, 2023
b4560c1
update debug log
minmingzhu Mar 26, 2023
a8d2a78
update debug log
minmingzhu Mar 26, 2023
7a8ac1e
update debug log
minmingzhu Mar 26, 2023
220262e
update debug log
minmingzhu Mar 26, 2023
94801f0
update debug log
minmingzhu Mar 26, 2023
1811880
update debug log
minmingzhu Mar 26, 2023
efcb210
update debug log
minmingzhu Mar 26, 2023
318b44b
update debug log
minmingzhu Mar 26, 2023
0440667
update debug log
minmingzhu Mar 26, 2023
97407b5
update debug log
minmingzhu Mar 26, 2023
1fe2be4
update debug log
minmingzhu Mar 26, 2023
e9da413
update debug log
minmingzhu Mar 26, 2023
f4c279d
update debug log
minmingzhu Mar 26, 2023
a5f13b7
update debug log
minmingzhu Mar 26, 2023
6b27475
update debug log
minmingzhu Mar 26, 2023
da97fbd
update debug log
minmingzhu Mar 26, 2023
76c4cc3
update debug log
minmingzhu Mar 26, 2023
cb4f920
update debug log
minmingzhu Mar 26, 2023
be855cb
update debug log
minmingzhu Mar 26, 2023
43c2566
update debug log
minmingzhu Mar 26, 2023
12d89e5
update debug log
minmingzhu Mar 26, 2023
71e33d0
update debug log
minmingzhu Mar 26, 2023
e7ab144
update debug log
minmingzhu Mar 27, 2023
0ac4ee7
update debug log
minmingzhu Mar 27, 2023
4f77164
update debug log
minmingzhu Mar 27, 2023
8271188
update debug log
minmingzhu Mar 27, 2023
066dd8d
update debug log
minmingzhu Mar 27, 2023
a361a0e
update debug log
minmingzhu Mar 27, 2023
6727177
update debug log
minmingzhu Mar 27, 2023
77ed7d0
update debug log
minmingzhu Mar 27, 2023
491125c
update debug log
minmingzhu Mar 27, 2023
4f31eb5
update debug log
minmingzhu Mar 27, 2023
5e40b3f
update debug log
minmingzhu Mar 27, 2023
cc7b019
update debug log
minmingzhu Mar 27, 2023
62556f1
update debug log
minmingzhu Mar 27, 2023
0642a86
update debug log
minmingzhu Mar 27, 2023
6c22bc8
update debug log
minmingzhu Mar 27, 2023
f808263
update
minmingzhu Mar 28, 2023
4ca559d
add RF classifier unit test
minmingzhu Mar 28, 2023
4c70a81
1. update code style
minmingzhu Mar 30, 2023
4da0a70
update
minmingzhu Apr 13, 2023
cbdd01f
update
minmingzhu Apr 13, 2023
a52a95c
update
minmingzhu Apr 13, 2023
c207381
update
minmingzhu Apr 18, 2023
1a2c169
update
minmingzhu Apr 21, 2023
182b7a3
update
minmingzhu Apr 21, 2023
6ddb4a3
update
minmingzhu Apr 23, 2023
96773fe
update
minmingzhu Apr 23, 2023
3fcb270
Migrate KMeans daal to DPC++ (#209)
minmingzhu Jul 14, 2022
4fffd72
Migrate pca daal to DPC++ (#223)
minmingzhu Aug 9, 2022
ad427e8
[ML-226] Migrate correlation daal to DPC++ (#215)
minmingzhu Aug 31, 2022
3864a1d
update
minmingzhu Mar 10, 2023
bde8b89
update
minmingzhu Apr 19, 2023
2ec2827
update
minmingzhu Apr 19, 2023
59578fb
1. remove extra code
minmingzhu Apr 20, 2023
473360f
update
minmingzhu Apr 20, 2023
86aee94
update
minmingzhu Apr 20, 2023
ea1ede3
Migrate KMeans daal to DPC++ (#209)
minmingzhu Jul 14, 2022
2c19a71
Migrate pca daal to DPC++ (#223)
minmingzhu Aug 9, 2022
a127b29
[ML-226] Migrate correlation daal to DPC++ (#215)
minmingzhu Aug 31, 2022
3879c21
[ML-227] Migrate moments daal to DPC++ (#229)
minmingzhu Sep 13, 2022
c8cfb22
add RF
Feb 7, 2023
bfd6817
update
minmingzhu Mar 10, 2023
1a9de1b
1. update code style
minmingzhu Mar 30, 2023
f3e6163
update
minmingzhu Apr 13, 2023
b00d2c1
update
minmingzhu Apr 19, 2023
a2a611f
enable RF regression on oap-mllib
minmingzhu Apr 6, 2023
63b94d3
update
minmingzhu Apr 10, 2023
1474093
update
minmingzhu Apr 10, 2023
a38ea47
rename catalog for RF example
minmingzhu Apr 11, 2023
3a2790b
update RF example
minmingzhu Apr 11, 2023
f4857c9
update
minmingzhu Apr 12, 2023
9b28117
update
minmingzhu Apr 12, 2023
9895d4d
update
minmingzhu Apr 12, 2023
3a6bce5
update
minmingzhu Apr 20, 2023
e796db2
update
minmingzhu Apr 20, 2023
e933a7d
update
minmingzhu Apr 23, 2023
1d86740
update
minmingzhu Apr 23, 2023
65f03f2
update
minmingzhu Apr 23, 2023
6f2b425
update
minmingzhu Apr 24, 2023
b8ebba0
update
minmingzhu Apr 25, 2023
f6fa0bd
update
minmingzhu Apr 26, 2023
fe5e7e1
update
minmingzhu May 11, 2023
9837639
update
minmingzhu May 11, 2023
f7295eb
update
minmingzhu May 11, 2023
ad122f4
update
minmingzhu May 11, 2023
a46edb1
Update MLlibRandomForestClassifierSuite.scala
minmingzhu May 11, 2023
1c9b751
Update MLlibRandomForestClassifierSuite.scala
minmingzhu May 11, 2023
eae5ef9
update DecisionForestRegressorImpl.cpp
minmingzhu May 11, 2023
6ad125f
update DecisionForestRegressorImpl.cpp
minmingzhu May 11, 2023
8fe87bd
Update DecisionForestRegressorImpl.cpp
xwu99 May 12, 2023
c631c25
Update DecisionForestRegressorImpl.cpp
xwu99 May 12, 2023
091283a
Update DecisionForestRegressorImpl.cpp
xwu99 May 12, 2023
ff9ead9
Update DecisionForestClassifierImpl.cpp
xwu99 May 12, 2023
75 changes: 75 additions & 0 deletions examples/random-forest-pyspark/random_forest_regressor_example.py
@@ -0,0 +1,75 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
Random Forest Regressor Example.
"""
import sys

from pyspark.ml import Pipeline
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.feature import VectorIndexer
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.linalg import DenseVector
from pyspark.sql import Row, SparkSession

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("RandomForestRegressorExample") \
        .getOrCreate()

    if len(sys.argv) != 2:
        print("Require data file path as input parameter")
        sys.exit(1)

    # Load and parse the data file, converting it to a DataFrame.
    data_sparse = spark.read.format("libsvm").load(sys.argv[1]).toDF("label", "features_sparse")
    # Convert the sparse feature vectors to dense vectors.
    data = data_sparse.rdd.map(lambda x: Row(label=x[0], features=DenseVector(x[1].toArray()))).toDF()
    data.printSchema()
    data.show()

    # Automatically identify categorical features, and index them.
    # Set maxCategories so features with > 4 distinct values are treated as continuous.
    featureIndexer = \
        VectorIndexer(inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)

    # Split the data into training and test sets (30% held out for testing)
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Train a RandomForest model.
    rf = RandomForestRegressor(featuresCol="indexedFeatures")

    # Chain indexer and forest in a Pipeline
    pipeline = Pipeline(stages=[featureIndexer, rf])

    # Train model. This also runs the indexer.
    model = pipeline.fit(trainingData)

    # Make predictions.
    predictions = model.transform(testData)

    # Select example rows to display.
    predictions.select("prediction", "label", "features").show(5)

    # Select (prediction, true label) and compute test error
    evaluator = RegressionEvaluator(
        labelCol="label", predictionCol="prediction", metricName="rmse")
    rmse = evaluator.evaluate(predictions)
    print("Root Mean Squared Error (RMSE) on test data = %g" % rmse)

    rfModel = model.stages[1]
    print(rfModel)  # summary only

    spark.stop()
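The example above reports the test error with metricName="rmse". As a rough illustration of what that metric measures, here is a minimal pure-Python sketch of root mean squared error. It is not part of this PR and not how Spark computes the value internally; the function name `rmse` and the sample numbers are made up for illustration.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error over paired labels and predictions (illustrative helper)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    # Average the squared differences, then take the square root.
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical values: one of four predictions is off by 1.0.
    labels = [0.0, 1.0, 1.0, 0.0]
    predictions = [0.0, 1.0, 0.0, 0.0]
    print("RMSE = %g" % rmse(labels, predictions))  # prints "RMSE = 0.5"
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, which is worth keeping in mind when comparing CPU and GPU runs on the same split.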
38 changes: 38 additions & 0 deletions examples/random-forest-pyspark/run-gpu-standalone-regressor.sh
@@ -0,0 +1,38 @@
#!/usr/bin/env bash

source ../../conf/env.sh

# Input is Spark's sample data file "sample_libsvm_data.txt" (LIBSVM format)
# The data file should be copied to $HDFS_ROOT/data before running examples
DATA_FILE=$HDFS_ROOT/data/sample_libsvm_data.txt

DEVICE=GPU
RESOURCE_FILE=$PWD/IntelGpuResourceFile.json
WORKER_GPU_AMOUNT=4
EXECUTOR_GPU_AMOUNT=1
TASK_GPU_AMOUNT=1
APP_PY=random_forest_regressor_example.py


time $SPARK_HOME/bin/spark-submit --master $SPARK_MASTER \
--num-executors $SPARK_NUM_EXECUTORS \
--executor-cores $SPARK_EXECUTOR_CORES \
--total-executor-cores $SPARK_TOTAL_CORES \
--driver-memory $SPARK_DRIVER_MEMORY \
--executor-memory $SPARK_EXECUTOR_MEMORY \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.default.parallelism=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.sql.shuffle.partitions=$SPARK_DEFAULT_PARALLELISM" \
--conf "spark.driver.extraClassPath=$SPARK_DRIVER_CLASSPATH" \
--conf "spark.executor.extraClassPath=$SPARK_EXECUTOR_CLASSPATH" \
--conf "spark.oap.mllib.device=$DEVICE" \
--conf "spark.worker.resourcesFile=$RESOURCE_FILE" \
--conf "spark.worker.resource.gpu.amount=$WORKER_GPU_AMOUNT" \
--conf "spark.executor.resource.gpu.amount=$EXECUTOR_GPU_AMOUNT" \
--conf "spark.task.resource.gpu.amount=$TASK_GPU_AMOUNT" \
--conf "spark.shuffle.reduceLocality.enabled=false" \
--conf "spark.network.timeout=1200s" \
--conf "spark.task.maxFailures=1" \
--jars $OAP_MLLIB_JAR \
$APP_PY $DATA_FILE \
2>&1 | tee random_forest_regressor-$(date +%m%d_%H_%M_%S).log
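The script passes IntelGpuResourceFile.json via spark.worker.resourcesFile, but the file's contents are not shown in this diff. The sketch below generates a file in the layout that the Spark standalone documentation describes for that setting: a JSON array of entries with an `id` (component and resource name) and a list of `addresses`. The helper name `build_worker_resources` and the GPU addresses "0" to "3" are assumptions chosen to match WORKER_GPU_AMOUNT=4, not values taken from the repository.

```python
import json


def build_worker_resources(gpu_count, resource_name="gpu"):
    """Build the resource list for a standalone worker (hypothetical helper).

    Addresses are assumed to be the strings "0" .. str(gpu_count - 1).
    """
    addresses = [str(i) for i in range(gpu_count)]
    return [
        {
            "id": {"componentName": "spark.worker", "resourceName": resource_name},
            "addresses": addresses,
        }
    ]


if __name__ == "__main__":
    # Four GPUs, matching WORKER_GPU_AMOUNT=4 in the script above.
    print(json.dumps(build_worker_resources(4), indent=2))
```

The number of addresses should match spark.worker.resource.gpu.amount; with one GPU per executor and per task, as configured above, each task would be assigned one of these addresses.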
@@ -3,4 +3,5 @@
public class RandomForestResult {
    public long predictionNumericTable;
    public long probabilitiesNumericTable;
    public long importancesNumericTable;
}
@@ -15,6 +15,7 @@ public class HomogenTableImpl implements HomogenTableIface {
    private long cObject;
    private TableMetadata metadata;
    private Common.ComputeDevice device;

    protected HomogenTableImpl(Common.ComputeDevice computeDevice) {
        super();
        this.device = computeDevice;
5 changes: 5 additions & 0 deletions mllib-dal/src/main/native/DecisionForestClassifierImpl.cpp
@@ -333,6 +333,11 @@ Java_com_intel_oap_mllib_classification_RandomForestClassifierDALImpl_cRFClassif
            maxTreeDepth, seed, maxBins, bootstrap, comm, resultObj);
        return hashmapObj;
    }
    default: {
        std::cout << "RandomForest (native): The compute device "
                  << "is not supported!" << std::endl;
        exit(-1);
    }
    }
    return nullptr;
}