Merge pull request #60 from HongW2019/doc
[ML-23]Backport master Readme to branch-1.1
zhixingheyi-tian authored Apr 30, 2021
2 parents c0d03ac + 6d89e1e commit 6fc0d07
Showing 4 changed files with 23 additions and 11 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -1,3 +1,7 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -45,6 +49,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -56,6 +62,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
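The sizing rule above can be sketched as a small helper. This is only an illustration: the function name and the 25% headroom factor are assumptions made for the example, not part of OAP MLlib.

```python
import math

def suggested_executor_memory_overhead_mb(dataset_mb, num_executors, headroom=1.25):
    # Rule of thumb from the guide: overhead should exceed dataset size / executor count.
    # A headroom factor keeps the result comfortably above that minimum.
    return math.ceil(dataset_mb / num_executors * headroom)

# For example, a 200 GB dataset (204800 MB) on 50 executors:
# 204800 / 50 = 4096 MB, times 1.25 headroom -> 5120 MB
```

The result would then be set as `spark.executor.memoryOverhead=5120m` (or rounded up to `5g`).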

### Sanity Check

#### Setup `env.sh`
2 changes: 1 addition & 1 deletion docs/OAP-Installation-Guide.md
@@ -36,7 +36,7 @@ Once finished steps above, you have completed OAP dependencies installation and

The dependencies below are required by OAP. All of them are included in the OAP Conda package and are installed in your cluster automatically when you Conda-install OAP. Ensure you have activated the environment you created in the previous steps.

- [Arrow](https://github.com/Intel-bigdata/arrow)
- [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://anaconda.org/intel/memkind)
- [Vmemcache](https://anaconda.org/intel/vmemcache)
13 changes: 6 additions & 7 deletions docs/User-Guide.md
@@ -1,7 +1,3 @@
##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.

##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

# OAP MLlib

## Overview
@@ -13,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces as Spark MLlib and aims to produce identical results. However, due to the nature of floating-point operations, results may deviate slightly from the original; we do our best to keep the error within an acceptable range.
For algorithms that are not accelerated by OAP MLlib, the original Spark MLlib implementation is used.

## Online Documentation

You can find all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -49,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -60,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
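As a minimal sketch, assuming a 200 GB dataset spread over 50 executors (so at least 4 GB of native memory per executor), the overhead could be added alongside the class-path entries in `spark-defaults.conf`; the `5g` value is an illustrative assumption, not a recommended default:

```
# 200 GB / 50 executors = 4 GB per executor; rounded up for headroom
spark.executor.memoryOverhead 5g
```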

### Sanity Check

#### Setup `env.sh`
9 changes: 6 additions & 3 deletions docs/index.md
@@ -9,9 +9,6 @@ OAP MLlib is an optimized package to accelerate machine learning algorithms in
OAP MLlib maintains the same API interfaces as Spark MLlib and aims to produce identical results. However, due to the nature of floating-point operations, results may deviate slightly from the original; we do our best to keep the error within an acceptable range.
For algorithms that are not accelerated by OAP MLlib, the original Spark MLlib implementation is used.

## Online Documentation

You can find all the OAP MLlib documents on the [project web page](https://oap-project.github.io/oap-mllib).

## Getting Started

@@ -45,6 +42,8 @@ Intel® oneAPI Toolkits components used by the project are already included into

### Spark Configuration

#### General Configuration

Users usually run Spark applications on __YARN__ in __client__ mode. In that case, you only need to add the following configurations to `spark-defaults.conf` or to the `spark-submit` command line before running.

```
@@ -56,6 +55,10 @@ spark.driver.extraClassPath /path/to/oap-mllib-x.x.x.jar
spark.executor.extraClassPath ./oap-mllib-x.x.x.jar
```

#### OAP MLlib Specific Configuration

OAP MLlib uses oneDAL as its implementation backend, and oneDAL requires sufficient native memory on each executor. For large datasets, depending on the algorithm, you may need to tune `spark.executor.memoryOverhead` to allocate enough native memory. Setting this value larger than __dataset size / executor number__ is a good starting point.
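Equivalently, the overhead can be passed on the `spark-submit` command line together with the other settings; the `5g` value here is an illustrative assumption for a large dataset, not a recommended default:

```
spark-submit --conf "spark.executor.memoryOverhead=5g" ...
```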

### Sanity Check

#### Setup `env.sh`
