This repository has been archived by the owner on Sep 18, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* [NSE-130] support decimal round and abs (#166)
  * support decimal round and abs
  * remove duplicate cast in multiply/divide
* [NSE-161] adding format check (#165)
  * adding format check Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * formating code Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * adding google format Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * reformat with new style Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * lower clang version to 10 Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * adding script to format code Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* [NSE-170]improve sort shuffle code (#171)
  * improve sort shuffle code Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix format Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * pass by ref in builder Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix string/decimal builder Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* [NSE-62]Fixing issue0062 for package arrow dependencies in jar with refresh2 (#172)
  * Add arrow build and dependency support
  * Add compress.sh default value
  * Fix bug for parameter's default value
  * Add CACHE PATH
* fix copy bitmap in InplaceSort (#174)
* [NSE-153] fix window results (#175)
  * fix window sort memory Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * remove unused code Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix windown w/o avg Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix format Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix decimal sort Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* Fix issue 179 for arrow include directory (#181)
* Fix issue0191 for .so file copy to tmp. (#192)
* Following NSE-153, optimize fallback conditions for columnar window (#189)
* Fix q14a/b segfault (#193) Signed-off-by: Chendi Xue <chendi.xue@intel.com>
* [NSE-194]Turn on several Arrow parameters (#195)
  * Turn on several Arrow parameters
  * Change SIMD Level Setting
* Hashmap build opt for semi/anti/exists join (#197) Signed-off-by: Chendi Xue <chendi.xue@intel.com>
* [NSE-198] support the month() and dayofmonth() functions with DateType (#199)
* [NSE-206] fix doc link, update limitations (#205)
  * fix doc link Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * update arrow data source Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * adding limitations Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * mention limits Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* [NSE-170] using unsafe appender (#203)
  This patch adds an unsafe appender which will reserve space before builder array. The boolean builder is not touched as only malloc small sized memory The string builder are not touched as it's diffcult to pre-allocate the space
  * using unsafe appender Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * fix format Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
  * update arrow branch Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Co-authored-by: Rui Mo <rui.mo@intel.com>
Co-authored-by: Wei-Ting Chen <weiting.chen@intel.com>
Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: JiaKe <ke.a.jia@intel.com>
1 parent 5e89cd3, commit 692d574
Showing 114 changed files with 12,569 additions and 12,418 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
--- | ||
BasedOnStyle: Google | ||
DerivePointerAlignment: false | ||
ColumnLimit: 90 |
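A `.clang-format` file like the one above is typically picked up by running `clang-format -style=file` on sources located beneath it. The following is a minimal sketch of collecting those sources, not the formatting script this commit adds: `list_cpp_sources` is a hypothetical helper and the extension list is an assumption.

```shell
# Sketch: list C++ sources under a directory, one path per line, sorted.
# list_cpp_sources is a hypothetical helper; the extension list is an assumption.
list_cpp_sources() {
  find "$1" -type f \( -name '*.cc' -o -name '*.cpp' -o -name '*.h' \) | sort
}

# Format in place using the nearest .clang-format file (requires clang-format):
# list_cpp_sources cpp/src | xargs clang-format -i -style=file
```

The `cpp/src` path in the usage comment is only an illustration; point the helper at whatever directory holds your sources.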
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# llvm-7.0: | ||
Arrow Gandiva depends on LLVM, and the current version strictly requires LLVM 7.0: the build fails if any other version is installed. | ||
``` shell | ||
wget http://releases.llvm.org/7.0.1/llvm-7.0.1.src.tar.xz | ||
tar xf llvm-7.0.1.src.tar.xz | ||
cd llvm-7.0.1.src/ | ||
cd tools | ||
wget http://releases.llvm.org/7.0.1/cfe-7.0.1.src.tar.xz | ||
tar xf cfe-7.0.1.src.tar.xz | ||
mv cfe-7.0.1.src clang | ||
cd .. | ||
mkdir build | ||
cd build | ||
cmake .. -DCMAKE_BUILD_TYPE=Release | ||
cmake --build . -j | ||
cmake --build . --target install | ||
# check whether clang has also been compiled; if not, build it separately: | ||
cd tools/clang | ||
mkdir build | ||
cd build | ||
cmake .. | ||
make -j | ||
make install | ||
``` | ||
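Because the build is described as failing with any LLVM other than 7.0, it may help to check the installed version first. This is a minimal sketch: `is_llvm7` is a hypothetical helper, and it assumes `llvm-config` is on the `PATH` when the check runs.

```shell
# Sketch: succeed only for version strings in the 7.0 series, such as "7.0.1".
# is_llvm7 is a hypothetical helper name, not part of Arrow or LLVM.
is_llvm7() {
  case "$1" in
    7.0|7.0.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report on the installed LLVM, if llvm-config is available.
if command -v llvm-config >/dev/null 2>&1; then
  llvm_version="$(llvm-config --version)"
  if is_llvm7 "$llvm_version"; then
    echo "LLVM $llvm_version matches the required 7.0 series"
  else
    echo "LLVM $llvm_version found, but Gandiva is documented here as needing 7.0"
  fi
fi
```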
|
||
# cmake: | ||
Arrow downloads packages during compilation. If your installed cmake lacks SSL support, you can optionally build cmake from source as follows. | ||
``` shell | ||
wget https://github.com/Kitware/CMake/releases/download/v3.15.0-rc4/cmake-3.15.0-rc4.tar.gz | ||
tar xf cmake-3.15.0-rc4.tar.gz | ||
cd cmake-3.15.0-rc4/ | ||
./bootstrap --system-curl --parallel=64 #parallel num depends on your server core number | ||
make -j | ||
make install | ||
cmake --version | ||
# expected output: cmake version 3.15.0-rc4 | ||
``` | ||
|
||
# Apache Arrow | ||
``` shell | ||
git clone https://github.com/Intel-bigdata/arrow.git | ||
cd arrow && git checkout branch-0.17.0-oap-1.0 | ||
mkdir -p cpp/release-build | ||
cd cpp/release-build | ||
cmake -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_GANDIVA_JAVA=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON -DARROW_HDFS=ON -DARROW_BOOST_USE_SHARED=ON -DARROW_JNI=ON -DARROW_DATASET=ON -DARROW_WITH_PROTOBUF=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_LZ4=ON -DARROW_FILESYSTEM=ON -DARROW_JSON=ON .. | ||
make -j | ||
make install | ||
|
||
# build java | ||
cd ../../java | ||
# change property 'arrow.cpp.build.dir' to the relative path of cpp build dir in gandiva/pom.xml | ||
mvn clean install -P arrow-jni -am -Darrow.cpp.build.dir=../cpp/release-build/release/ -DskipTests | ||
# if you are behind a proxy, also pass the SOCKS proxy settings | ||
mvn clean install -P arrow-jni -am -Darrow.cpp.build.dir=../cpp/release-build/release/ -DskipTests -DsocksProxyHost=${proxyHost} -DsocksProxyPort=1080 | ||
``` | ||
|
||
Run the tests: | ||
``` shell | ||
mvn test -pl adapter/parquet -P arrow-jni | ||
mvn test -pl gandiva -P arrow-jni | ||
``` | ||
|
||
# Copy binary files to oap-native-sql resources directory | ||
The oap-native-sql plugin builds a stand-alone jar file that bundles its Arrow dependency. If you build Arrow yourself, you must therefore replace the bundled libraries with the files listed below. | ||
You can find these files in the Apache Arrow installation directory or release directory. The example below assumes Apache Arrow has been installed in /usr/local/lib64. | ||
``` shell | ||
# NATIVE_SQL_ENGINE_DIR is the path to your native-sql-engine checkout | ||
cp /usr/local/lib64/libarrow.so.17 $NATIVE_SQL_ENGINE_DIR/cpp/src/resources | ||
cp /usr/local/lib64/libgandiva.so.17 $NATIVE_SQL_ENGINE_DIR/cpp/src/resources | ||
cp /usr/local/lib64/libparquet.so.17 $NATIVE_SQL_ENGINE_DIR/cpp/src/resources | ||
``` |
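The copy step above can be wrapped so that a missing library is reported before anything is packaged. This is a minimal sketch under stated assumptions: `copy_arrow_libs` is a hypothetical helper, and both directories are passed in as arguments rather than taken from any project setting.

```shell
# Sketch: copy the three Arrow shared libraries, stopping at the first missing file.
# copy_arrow_libs is a hypothetical helper; $1 is the source dir, $2 the destination.
copy_arrow_libs() {
  src_dir="$1"
  dst_dir="$2"
  for lib in libarrow.so.17 libgandiva.so.17 libparquet.so.17; do
    if [ ! -f "$src_dir/$lib" ]; then
      echo "missing: $src_dir/$lib" >&2
      return 1
    fi
    cp "$src_dir/$lib" "$dst_dir/" || return 1
  done
}

# Usage (both paths are assumptions about your layout):
# copy_arrow_libs /usr/local/lib64 /path/to/native-sql-engine/cpp/src/resources
```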
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Spark Configurations for Native SQL Engine | ||
|
||
Add the following configuration to spark-defaults.conf: | ||
|
||
``` | ||
##### Columnar Process Configuration | ||
spark.sql.sources.useV1SourceList avro | ||
spark.sql.join.preferSortMergeJoin false | ||
spark.sql.extensions com.intel.oap.ColumnarPlugin | ||
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager | ||
# note native sql engine depends on arrow data source | ||
spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/spark-columnar-core-<version>-jar-with-dependencies.jar:$HOME/miniconda2/envs/oapenv/oap_jars/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar | ||
spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/spark-columnar-core-<version>-jar-with-dependencies.jar:$HOME/miniconda2/envs/oapenv/oap_jars/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar | ||
spark.executorEnv.LIBARROW_DIR $HOME/miniconda2/envs/oapenv | ||
spark.executorEnv.CC $HOME/miniconda2/envs/oapenv/bin/gcc | ||
###### | ||
``` | ||
|
||
Before starting Spark, export the following environment variables: | ||
|
||
``` | ||
export CC=$HOME/miniconda2/envs/oapenv/bin/gcc | ||
export LIBARROW_DIR=$HOME/miniconda2/envs/oapenv/ | ||
``` | ||
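Since Spark must be started with both variables exported, a small pre-flight check can catch a forgotten export. This is a minimal sketch: `check_native_sql_env` is a hypothetical helper and is not part of the plugin.

```shell
# Sketch: verify that CC and LIBARROW_DIR are set and non-empty.
# check_native_sql_env is a hypothetical helper name.
check_native_sql_env() {
  status=0
  if [ -z "${CC:-}" ]; then
    echo "CC is not set" >&2
    status=1
  fi
  if [ -z "${LIBARROW_DIR:-}" ]; then
    echo "LIBARROW_DIR is not set" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_native_sql_env && echo "environment looks ready for Spark"
```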
|
||
For details on arrow-data-source.jar, refer to [Unified Arrow Data Source](https://oap-project.github.io/arrow-data-source/). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Spark Native SQL Engine Installation | ||
|
||
For detailed testing scripts, please refer to [solution guide](https://github.com/Intel-bigdata/Solution_navigator/tree/master/nativesql) | ||
|
||
## Install Googletest and Googlemock | ||
|
||
``` shell | ||
yum install gtest-devel | ||
yum install gmock | ||
``` | ||
|
||
## Build Native SQL Engine | ||
|
||
``` shell | ||
git clone -b ${version} https://github.com/oap-project/native-sql-engine.git | ||
cd native-sql-engine | ||
cd cpp/ | ||
mkdir build/ | ||
cd build/ | ||
cmake .. -DTESTS=ON | ||
make -j | ||
``` | ||
|
||
``` shell | ||
cd ../../core/ | ||
mvn clean package -DskipTests | ||
``` | ||
|
||
### Additional Notes | ||
[Notes for Installation Issues](./InstallationNotes.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
### Notes for Installation Issues | ||
* Before installing, if another version of oap-native-sql is already installed, remove all of its installed libraries and headers from the system paths: libarrow* libgandiva* libspark-columnar-jni* | ||
|
||
* libgandiva_jni.so was not found inside JAR | ||
|
||
Change the property 'arrow.cpp.build.dir' to $ARROW_DIR/cpp/release-build/release/ in gandiva/pom.xml. If you do not want to edit pom.xml, pass the property on the command line instead: | ||
|
||
``` | ||
mvn clean install -P arrow-jni -am -Darrow.cpp.build.dir=/root/git/t/arrow/cpp/release-build/release/ -DskipTests -Dcheckstyle.skip | ||
``` | ||
|
||
* No rule to make target '../src/protobuf_ep', needed by `src/proto/Exprs.pb.cc' | ||
|
||
Remove the existing libprotobuf installation so that the find_package() script can download protobuf itself. | ||
|
||
* libprotobuf.so.13 cannot be found among the shared libraries | ||
|
||
Copy libprotobuf.so.13 from $OAP_DIR/oap-native-sql/cpp/src/resources to /usr/lib64/ | ||
|
||
* unable to load libhdfs: libgsasl.so.7: cannot open shared object file | ||
|
||
libgsasl is missing; run `yum install libgsasl` | ||
|
||
* CentOS 7.7 does not appear to provide the glibc version we require, so binaries packaged on F30 won't work there. | ||
|
||
``` | ||
20/04/21 17:46:17 WARN TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, 10.0.0.143, executor 6): java.lang.UnsatisfiedLinkError: /tmp/libgandiva_jni.sobe729912-3bbe-4bd0-bb96-4c7ce2e62336: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /tmp/libgandiva_jni.sobe729912-3bbe-4bd0-bb96-4c7ce2e62336) | ||
``` | ||
|
||
* Missing symbols due to old GCC version. | ||
|
||
``` | ||
[root@vsr243 release-build]# nm /usr/local/lib64/libparquet.so | grep ZN5boost16re_detail_10710012perl_matcherIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS6_EEENS_12regex_traitsIcNS_16cpp_regex_traitsIcEEEEE14construct_initERKNS_11basic_regexIcSD_EENS_15regex_constants12_match_flagsE | ||
_ZN5boost16re_detail_10710012perl_matcherIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS6_EEENS_12regex_traitsIcNS_16cpp_regex_traitsIcEEEEE14construct_initERKNS_11basic_regexIcSD_EENS_15regex_constants12_match_flagsE | ||
``` | ||
|
||
All packages need to be compiled with a newer GCC: | ||
|
||
``` | ||
[root@vsr243 ~]# export CXX=/usr/local/bin/g++ | ||
[root@vsr243 ~]# export CC=/usr/local/bin/gcc | ||
``` | ||
|
||
* Cannot connect to HDFS @sr602 | ||
|
||
Neither vsr606 nor vsr243 can connect to HDFS @sr602, so the tests must be skipped (`-DskipTests`) to generate the jar. | ||
|
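The `GLIBC_2.29` error in the notes above can be anticipated by comparing the local glibc version with the required one before deploying binaries. This is a minimal sketch: `version_ge` is a hypothetical helper that relies on the `-V` (version sort) option of GNU `sort`, and the 2.29 threshold is taken from the error message above.

```shell
# Sketch: succeed when version $1 is at least version $2 (dotted numeric versions).
# version_ge is a hypothetical helper; it relies on GNU sort's -V option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the local glibc against the 2.29 named in the error message.
if command -v getconf >/dev/null 2>&1; then
  glibc_version="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  if [ -n "$glibc_version" ] && ! version_ge "$glibc_version" 2.29; then
    echo "glibc $glibc_version is older than 2.29"
  fi
fi
```

Version sort is used instead of a plain string comparison so that, for example, 2.9 is treated as older than 2.29.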
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
# OAP Developer Guide | ||
|
||
This document contains instructions and scripts for installing the necessary dependencies and building OAP. | ||
More detailed information is available in the documentation of each OAP module below. | ||
|
||
* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/master/docs/Developer-Guide.md) | ||
* [PMem Common](https://github.com/oap-project/pmem-common) | ||
* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle#5-install-dependencies-for-shuffle-remote-pmem-extension) | ||
* [Remote Shuffle](https://github.com/oap-project/remote-shuffle) | ||
* [OAP MLlib](https://github.com/oap-project/oap-mllib) | ||
* [Arrow Data Source](https://github.com/oap-project/arrow-data-source) | ||
* [Native SQL Engine](https://github.com/oap-project/native-sql-engine) | ||
|
||
## Building OAP | ||
|
||
### Prerequisites for Building | ||
|
||
OAP is built with [Apache Maven](http://maven.apache.org/) and Oracle Java 8. The main tools that must be installed on your cluster are listed below. | ||
|
||
- [Cmake](https://help.directadmin.com/item.php?id=494) | ||
- [GCC > 7](https://gcc.gnu.org/wiki/InstallingGCC) | ||
- [Memkind](https://github.com/memkind/memkind/tree/v1.10.1-rc2) | ||
- [Vmemcache](https://github.com/pmem/vmemcache) | ||
- [HPNL](https://github.com/Intel-bigdata/HPNL) | ||
- [PMDK](https://github.com/pmem/pmdk) | ||
- [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html) | ||
- [Arrow](https://github.com/Intel-bigdata/arrow) | ||
|
||
- **Requirements for Shuffle Remote PMem Extension** | ||
If you enable the Shuffle Remote PMem extension with RDMA, refer to [PMem Shuffle](https://github.com/oap-project/pmem-shuffle) to configure and validate RDMA in advance. | ||
|
||
The scripts below automatically install the dependencies above **except RDMA**. Switch to the **root** account and run: | ||
|
||
``` | ||
# git clone -b <tag-version> https://github.com/Intel-bigdata/OAP.git | ||
# cd OAP | ||
# sh $OAP_HOME/dev/install-compile-time-dependencies.sh | ||
``` | ||
|
||
Run the following command to learn more. | ||
|
||
``` | ||
# sh $OAP_HOME/dev/scripts/prepare_oap_env.sh --help | ||
``` | ||
|
||
Run the following command to automatically install a specific dependency, such as Maven. | ||
|
||
``` | ||
# sh $OAP_HOME/dev/scripts/prepare_oap_env.sh --prepare_maven | ||
``` | ||
|
||
|
||
### Building | ||
|
||
To build the OAP package, run the command below. The resulting tarball, named `oap-$VERSION-bin-spark-$VERSION.tar.gz`, is placed under the directory `$OAP_HOME/dev/release-package`. | ||
``` | ||
$ sh $OAP_HOME/dev/compile-oap.sh | ||
``` | ||
|
||
To build a specific OAP module, such as `oap-cache`, run: | ||
``` | ||
$ sh $OAP_HOME/dev/compile-oap.sh --oap-cache | ||
``` | ||
|
||
|
||
### Running OAP Unit Tests | ||
|
||
Set up the build environment manually for Intel MLlib. If your default GCC version is older than 7.0, you also need to export `CC` and `CXX` before using `mvn`. Run: | ||
|
||
``` | ||
$ export CXX=$OAP_HOME/dev/thirdparty/gcc7/bin/g++ | ||
$ export CC=$OAP_HOME/dev/thirdparty/gcc7/bin/gcc | ||
$ export ONEAPI_ROOT=/opt/intel/inteloneapi | ||
$ source /opt/intel/inteloneapi/daal/2021.1-beta07/env/vars.sh | ||
$ source /opt/intel/inteloneapi/tbb/2021.1-beta07/env/vars.sh | ||
$ source /tmp/oneCCL/build/_install/env/setvars.sh | ||
``` | ||
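Whether the `CC` and `CXX` exports are needed depends on the default GCC version, which can be checked in a script. This is a minimal sketch: `needs_gcc_override` is a hypothetical helper, and it assumes `gcc -dumpversion` prints a version whose first dot-separated field is the major number.

```shell
# Sketch: succeed when the given GCC version string has a major version below 7.
# needs_gcc_override is a hypothetical helper name.
needs_gcc_override() {
  major="${1%%.*}"
  [ "$major" -lt 7 ] 2>/dev/null
}

# Report on the default compiler, if gcc is on the PATH.
if command -v gcc >/dev/null 2>&1; then
  if needs_gcc_override "$(gcc -dumpversion)"; then
    echo "default GCC is older than 7; export CC and CXX before running mvn"
  fi
fi
```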
|
||
Run all the tests: | ||
|
||
``` | ||
$ mvn clean test | ||
``` | ||
|
||
Run the unit tests of a specific OAP module, such as `oap-cache`: | ||
|
||
``` | ||
$ mvn clean -pl com.intel.oap:oap-cache -am test | ||
``` | ||
|
||
### Building SQL Index and Data Source Cache with PMem | ||
|
||
#### Prerequisites for building with PMem support | ||
|
||
When using SQL Index and Data Source Cache with PMem, first complete the steps in [Prerequisites for building](#prerequisites-for-building) to ensure the required dependencies are installed. | ||
|
||
#### Building package | ||
|
||
You can build OAP with PMem support using the command below: | ||
|
||
``` | ||
$ sh $OAP_HOME/dev/compile-oap.sh | ||
``` | ||
Or run: | ||
|
||
``` | ||
$ mvn clean -q -Ppersistent-memory -Pvmemcache -DskipTests package | ||
``` |