Merge pull request #60 from HongW2019/doc
[ML-23]Backport master Readme to branch-1.1
zhixingheyi-tian authored Apr 30, 2021
2 parents c0d03ac + 6d89e1e commit 6fc0d07
Showing 4 changed files with 23 additions and 11 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -1,3 +1,7 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -45,6 +49,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -56,6 +62,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
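The sizing rule above can be sketched as a small helper. This is only an illustration: the function name and the 25% headroom factor are assumptions made for the example, not part of OAP MLlib.

```python
import math

def suggested_executor_memory_overhead_mb(dataset_mb, num_executors, headroom=1.25):
    # Rule of thumb from the guide: overhead should exceed dataset size / executor count.
    # A headroom factor keeps the result comfortably above that minimum.
    return math.ceil(dataset_mb / num_executors * headroom)

# For example, a 200 GB dataset (204800 MB) on 50 executors:
# 204800 / 50 = 4096 MB, times 1.25 headroom -> 5120 MB
```

The result would then be set as `spark.executor.memoryOverhead=5120m` (or rounded up to `5g`).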

### Sanity Check

#### Setup `env.sh`
2 changes: 1 addition & 1 deletion docs/OAP-Installation-Guide.md
@@ -36,7 +36,7 @@ Once finished steps above, you have completed OAP dependencies installation and

The dependencies below are required by OAP. All of them are included in the OAP Conda package and are installed in your cluster automatically when you Conda-install OAP. Ensure you have activated the environment you created in the previous steps.

- [Arrow](https://github.com/Intel-bigdata/arrow)
- [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://anaconda.org/intel/memkind)
- [Vmemcache](https://anaconda.org/intel/vmemcache)
13 changes: 6 additions & 7 deletions docs/User-Guide.md
@@ -1,7 +1,3 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -13,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces as Spark MLlib and aims to produce identical results. However, due to the nature of floating-point operations, results may deviate slightly from the original; we do our best to keep the error within an acceptable range.
For algorithms that are not accelerated by OAP MLlib, the original Spark MLlib implementation is used.

## Online Documentation

You can find all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -49,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -60,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
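As a minimal sketch, assuming a 200 GB dataset spread over 50 executors (so at least 4 GB of native memory per executor), the overhead could be added alongside the class-path entries in `spark-defaults.conf`; the `5g` value is an illustrative assumption, not a recommended default:

```
# 200 GB / 50 executors = 4 GB per executor; rounded up for headroom
spark.executor.memoryOverhead 5g
```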

### Sanity Check

#### Setup `env.sh`
9 changes: 6 additions & 3 deletions docs/index.md
@@ -9,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces as Spark MLlib and aims to produce identical results. However, due to the nature of floating-point operations, results may deviate slightly from the original; we do our best to keep the error within an acceptable range.
For algorithms that are not accelerated by OAP MLlib, the original Spark MLlib implementation is used.

## Online Documentation

You can find all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -45,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -56,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
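Equivalently, the overhead can be passed on the `spark-submit` command line together with the other settings; the `5g` value here is an illustrative assumption for a large dataset, not a recommended default:

```
spark-submit --conf "spark.executor.memoryOverhead=5g" ...
```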

### Sanity Check

#### Setup `env.sh`
