K-Means UDTF & Intel oneDAL initial integration #669

Open: wants to merge 67 commits into base: master

Commits (67):
f2da94a  Nov 30, 2020  Adding DAAL k-means as UDTF
5fbc3ff  Nov 30, 2020  DAAL paths fix
d6a5ce9  Nov 30, 2020  Include fixed
78e2c73  Nov 30, 2020  DAAL part moved into top of CMakeLists.txt
f7e0c65  Nov 30, 2020  Compilation fix
2280c5c  Nov 30, 2020  Compilation fix
d44fd77  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in UdfTest
b1f0ae7  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in TableUpdateDeleteBenchmark
5694a1c  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in initDB
8ac2eda  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in ImportExportTest
2e90cf7  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in test is another way
11ca191  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in PersistentStorageTest
f67a914  Dec 1, 2020  Compilation fix
a8491cc  Dec 1, 2020  CMakeLists.txt modification to suport DAAL in ForeignStorageCacheTest
b8407e0  Dec 2, 2020  DOUBLE replaced by FLOAT in Calcite factory registration function for…
2b213a6  Dec 2, 2020  Object creation for Calcite function registration
9625a89  Dec 2, 2020  Experiment to exclude NUMERIC from list of operand types
9bfba2e  Dec 2, 2020  Experimental code to provide workaround
8fedf0c  Dec 3, 2020  Revert "Experimental code to provide workaround"
39df235  Dec 3, 2020  Revert "Experiment to exclude NUMERIC from list of operand types"
88ef2a8  Dec 3, 2020  Sizer arg pos fixed, signature for Calcite fixed
d36b8fa  Dec 3, 2020  Zer-copy approach for k-means assignments
7da03c5  Dec 24, 2020  Time measurement added to UDTF k-means
4a66dba  Dec 24, 2020  Todd's suggestion of code midifcation to improve performance of UDFs
baa629c  Jan 13, 2021  Experimental code to understand total time and execution time in Omni…
79a33e2  Jan 13, 2021  Time measurement added to sql_execute_impl
52fe343  Jan 15, 2021  Additional time measurement added to sql_execute_impl
3d5ecd8  Jan 19, 2021  Revert "Additional time measurement added to sql_execute_impl"
67c0288  Jan 19, 2021  Revert "Time measurement added to sql_execute_impl"
435b14e  Jan 19, 2021  Revert "Experimental code to understand total time and execution time…
189f935  Jun 17, 2021  Merge branch 'master' of https://github.com/omnisci/omniscidb
606a35f  Jun 25, 2021  Merge branch 'master' of https://github.com/omnisci/omniscidb
55649e6  Jun 28, 2021  k_means fixed to use latest interfaces
51b6c7d  Jun 29, 2021  Compilation fix
ca8b027  Jun 29, 2021  Compilation fix
845e4c6  Jun 29, 2021  Compilation fix
99604c2  Jun 29, 2021  cmake file fix
2805a9b  Jul 5, 2021  Calcite registration was removed
fd0ad33  Jul 5, 2021  cmake file fixed according to @Pearu suggestion
8d1e6f1  Jul 5, 2021  Calcite registration removal fixed
8617ce6  Jul 7, 2021  Column::value_type added
8d336ff  Jul 9, 2021  Code cleanup
0a9321b  Jul 9, 2021  Merge branch 'master' of https://github.com/omnisci/omniscidb
c9c8224  Jul 9, 2021  Code cleanup
89e3371  Jul 12, 2021  Java compilation fix
d981edf  Jul 15, 2021  New cmake integration with oneDAL
86ba155  Jul 15, 2021  mapd-deps.sh generation fixed
e27fcc6  Jul 15, 2021  Cmake package name fixed for oneDAL
f1f00d6  Jul 19, 2021  oneAPI vars scripts is used instead of oneDAL script to have TBB in t…
d037b17  Jul 19, 2021  Compilation fix
ddd9bba  Jul 19, 2021  cmake file code cleanup
84f2b12  Jul 19, 2021  cmake file improvement: oneDAL include directory assighment via cmake…
fa52d62  Jul 19, 2021  Get rid of output from setvars script
4e341e6  Jul 19, 2021  Cmake file improvement experiment
cf81045  Jul 19, 2021  Cmake file improvement experiment partially reverted
17e513f  Jul 19, 2021  Compilation fix
ef1efc7  Jul 20, 2021  oneDAL cmake config settings fixed
b284fe0  Jul 21, 2021  Static linking for oneDAL instead of dynamic
39eb9f6  Jul 21, 2021  Compilation fix
87c2491  Jul 22, 2021  LATEST_TBB_LIBRARIES added to cmake variables
3ccc0f5  Jul 22, 2021  LATEST_TBB_LIBRARIES usage for tests added
e5ee331  Jul 29, 2021  Merge branch 'master' of https://github.com/omnisci/omniscidb
6d7ecad  Jul 29, 2021  TBB full path experiment
733e410  Jul 29, 2021  Latest TBB for omnisci_server
cfd44bc  Jul 29, 2021  TBB full path replaced by environment variable usage
2a77ec6  Jul 30, 2021  Some formatting issues fixed (that introduced during previous merges)
1cdf997  Jul 30, 2021  ENABLE_ONEDAL set to off by default

CMakeLists.txt (23 changes: 21 additions, 2 deletions)
@@ -304,6 +304,20 @@ if(ENABLE_MLPACK)
   add_definitions("-DHAVE_MLPACK")
 endif()
 
+option(ENABLE_ONEDAL "Use Intel oneDAL" OFF)
+if(ENABLE_ONEDAL)
+  set(USE_DPCPP no)
+  set(USE_NEW_IFACES no)
+  set(TARGET_LINK static)
+  find_package(oneDAL REQUIRED)
+  list(APPEND ONEDAL_LIBRARIES "libonedal_core.a")
+  list(APPEND ONEDAL_LIBRARIES "libonedal_thread.a")
+  list(APPEND LATEST_TBB_LIBRARIES "$ENV{TBBROOT}/lib/intel64/gcc4.8/libtbb.so.12")
+  list(APPEND LATEST_TBB_LIBRARIES "$ENV{TBBROOT}/lib/intel64/gcc4.8/libtbbmalloc.so.2")
+  include_directories(${oneDAL_INCLUDE_DIRS})
+  add_definitions("-DHAVE_ONEDAL")
+endif()
+
 if(MSVC)
   include_directories(include_directories("${LIBS_PATH}/include/pdcurses"))
 else()
@@ -730,6 +744,11 @@ endif()
 
 list(APPEND MAPD_LIBRARIES ${TBB_LIBS})
 
+if(ENABLE_ONEDAL)
+  list(APPEND MAPD_LIBRARIES ${ONEDAL_LIBRARIES})
+endif()
+
+
 if(ENABLE_CANONICAL_RAFT)
   list(APPEND MAPD_LIBRARIES raft_canonical)
 endif()
@@ -812,11 +831,11 @@ add_custom_target(rerun_cmake ALL
 )
 add_dependencies(omnisci_server rerun_cmake)
 
-target_link_libraries(omnisci_server mapd_thrift thrift_handler ${MAPD_LIBRARIES} ${Boost_LIBRARIES} ${CMAKE_DL_LIBS} ${CUDA_LIBRARIES} ${PROFILER_LIBS} ${ZLIB_LIBRARIES} ${LOCALE_LINK_FLAG})
+target_link_libraries(omnisci_server mapd_thrift thrift_handler ${MAPD_LIBRARIES} ${Boost_LIBRARIES} ${CMAKE_DL_LIBS} ${CUDA_LIBRARIES} ${PROFILER_LIBS} ${ZLIB_LIBRARIES} ${LOCALE_LINK_FLAG} ${LATEST_TBB_LIBRARIES})
 
 target_link_libraries(initdb mapd_thrift thrift_handler ${MAPD_LIBRARIES} ${Boost_LIBRARIES} ${CMAKE_DL_LIBS}
   ${CUDA_LIBRARIES} ${PROFILER_LIBS} ${ZLIB_LIBRARIES} ${BLOSC_LIBRARIES}
-  ${LOCALE_LINK_FLAG})
+  ${LOCALE_LINK_FLAG} ${LATEST_TBB_LIBRARIES})
 
 macro(set_dpkg_arch arch_in arch_out)
   if("${arch_in}" STREQUAL "x86_64")
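The `ENABLE_ONEDAL` option reaches the C++ sources through the preprocessor: `add_definitions("-DHAVE_ONEDAL")` defines a macro for every translation unit, and `TableFunctions.hpp` wraps both the `daal.h` include and the `k_means` UDTF in `#ifdef HAVE_ONEDAL`. The sketch below shows that gating pattern in isolation; `kOneDalEnabled`, `available_udtfs`, and `check_build_consistency` are hypothetical names for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HAVE_ONEDAL is defined only when CMake is configured with -DENABLE_ONEDAL=ON
// (via add_definitions("-DHAVE_ONEDAL")); the option defaults to OFF.
#ifdef HAVE_ONEDAL
constexpr bool kOneDalEnabled = true;
#else
constexpr bool kOneDalEnabled = false;
#endif

// Hypothetical helper: names of table functions compiled into this build.
// The oneDAL-backed k_means exists only when HAVE_ONEDAL is defined.
std::vector<std::string> available_udtfs() {
  std::vector<std::string> names = {"row_copier", "column_list_row_sum"};
#ifdef HAVE_ONEDAL
  names.push_back("k_means");
#endif
  return names;
}

// Hypothetical sanity check: k_means is listed exactly when oneDAL is enabled.
void check_build_consistency() {
  const std::vector<std::string> names = available_udtfs();
  assert(names.size() == (kOneDalEnabled ? 3u : 2u));
}
```

A related CMake detail: `LATEST_TBB_LIBRARIES` is only populated inside the `if(ENABLE_ONEDAL)` block, so in a default build the `${LATEST_TBB_LIBRARIES}` appended to the `omnisci_server` and `initdb` link lines expands to nothing.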
QueryEngine/CMakeLists.txt (2 changes: 1 addition, 1 deletion)
@@ -237,7 +237,7 @@ add_custom_target(QueryEngineFunctionsTargets
   ${CMAKE_CURRENT_BINARY_DIR}/GeosRuntime.bc
 )
 
-set(TABLE_FUNCTION_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/TableFunctions.hpp ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/TableFunctionsTesting.hpp)
+set(TABLE_FUNCTION_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/TableFunctions.hpp ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/TableFunctionsTesting.hpp ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/MLFunctions.hpp)
 if(ENABLE_MLPACK)
   list(APPEND TABLE_FUNCTION_HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/TableFunctions/MLFunctions.hpp)
 endif()
QueryEngine/OmniSciTypes.h (4 changes: 4 additions, 0 deletions)
@@ -155,6 +155,8 @@ struct GeoMultiPolygon {
 
 template <typename T>
 struct Column {
+  using value_type = T;
+
   T* ptr_;        // row data
   int64_t size_;  // row count
 
@@ -216,6 +218,8 @@ struct Column {
  */
 template <typename T>
 struct ColumnList {
+  using value_type = T;
+
   int8_t** ptrs_;     // ptrs to columns data
   int64_t num_cols_;  // the length of columns list
   int64_t size_;      // the size of columns
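The new `value_type` aliases let `k_means` recover element types from its parameters with `decltype`, rather than hard-coding `float` and `int` in the oneDAL template arguments. A self-contained sketch, using simplified stand-ins for `Column` and `ColumnList` that reproduce only the members visible in the hunks above (the real structs have more); the alias `element_type_of` and the function `deduce_types` are hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Simplified stand-ins for the structs in QueryEngine/OmniSciTypes.h
// (assumption: only the members shown in the diff are needed here).
template <typename T>
struct Column {
  using value_type = T;
  T* ptr_;        // row data
  int64_t size_;  // row count
};

template <typename T>
struct ColumnList {
  using value_type = T;
  int8_t** ptrs_;     // ptrs to columns data
  int64_t num_cols_;  // the length of columns list
  int64_t size_;      // the size of columns
};

// Hypothetical alias: strip the reference and const from a parameter's declared
// type, then read value_type (the same steps k_means spells out inline).
template <typename Param>
using element_type_of =
    typename std::remove_cv_t<std::remove_reference_t<Param>>::value_type;

// Parameters shaped like part of the k_means UDTF; the body only checks types.
void deduce_types(const ColumnList<float>& input, Column<int>& output_cluster) {
  using float_type = element_type_of<decltype(input)>;
  using assignments_type = element_type_of<decltype(output_cluster)>;
  static_assert(std::is_same<float_type, float>::value, "ColumnList<float> holds float");
  static_assert(std::is_same<assignments_type, int>::value, "Column<int> holds int");
}
```

Both `remove_reference_t` and `remove_cv_t` are needed: `decltype(input)` is `const ColumnList<float>&`, and `::value_type` cannot be looked up through a reference type. Deriving the types this way presumably keeps the `SOANumericTable`/`HomogenNumericTable` template arguments in step if the UDTF signature later changes element type.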
QueryEngine/TableFunctions/TableFunctions.hpp (73 changes: 73 additions, 0 deletions)
@@ -17,6 +17,11 @@
 #include "../../QueryEngine/OmniSciTypes.h"
 #include "../../Shared/funcannotations.h"
 
+#ifdef HAVE_ONEDAL
+#include <type_traits>
+#include "daal.h"
+#endif
+
 // clang-format off
 /*
 UDTF: row_copier(Column<double>, RowMultiplier) -> Column<double>
@@ -248,4 +253,72 @@ EXTENSION_NOINLINE int32_t column_list_row_sum__cpu_(const ColumnList<int32_t>&
   return output_num_rows;
 }
 
+#ifdef HAVE_ONEDAL
+
+// clang-format off
+/*
+UDTF: k_means(Cursor<Column<int>, ColumnList<float>>, int, int, RowMultiplier) -> Column<int>, Column<int>
+*/
+// clang-format on
+
+EXTENSION_NOINLINE int32_t k_means(const Column<int>& input_ids,
+                                   const ColumnList<float>& input,
+                                   const int num_clusters,
+                                   const int num_iterations,
+                                   const int output_multiplier,
+                                   Column<int>& output_ids,
+                                   Column<int>& output_cluster) {
+  using namespace daal::algorithms;
+  using namespace daal::data_management;
+
+  // Float data type
+  using float_type =
+      std::remove_cv_t<std::remove_reference_t<decltype(input)>>::value_type;
+
+  // Assignments data type
+  using assignments_type =
+      std::remove_cv_t<std::remove_reference_t<decltype(output_cluster)>>::value_type;
+
+  // Data dimensions
+  const size_t num_rows = input_ids.size();
+  const size_t num_columns = input.numCols();
+
+  // Prepare input data as structure of arrays (SOA) as columnar format (zero-copy)
+  const auto dataTable = SOANumericTable::create(num_columns, num_rows);
+  for (size_t i = 0; i < num_columns; ++i) {
+    dataTable->setArray<float_type>(input[i].ptr_, i);
+  }
+
+  // Initialization phase of K-Means
+  kmeans::init::Batch<float_type, kmeans::init::randomDense> init(num_clusters);
+  init.input.set(kmeans::init::data, dataTable);
+  init.compute();
+  const NumericTablePtr centroids = init.getResult()->get(kmeans::init::centroids);
+
+  // Prepare output data as homogeneous numeric table to allow zero-copy for assignments
+  const auto assignmentsTable =
+      HomogenNumericTable<assignments_type>::create(output_cluster.ptr_, 1, num_rows);
+  const kmeans::ResultPtr result(new kmeans::Result);
+  result->set(kmeans::assignments, assignmentsTable);
+  result->set(kmeans::objectiveFunction,
+              HomogenNumericTable<float>::create(1, 1, NumericTable::doAllocate));
+  result->set(kmeans::nIterations,
+              HomogenNumericTable<int>::create(1, 1, NumericTable::doAllocate));
+
+  // Clustering phase of K-Means
+  kmeans::Batch<> algorithm(num_clusters, num_iterations);
+  algorithm.input.set(kmeans::data, dataTable);
+  algorithm.input.set(kmeans::inputCentroids, centroids);
+  algorithm.parameter().resultsToEvaluate = kmeans::computeAssignments;
+  algorithm.setResult(result);
+  algorithm.compute();
+
+  // Copying from input_ids to output_ids
+  output_ids = input_ids;
+
+  return num_rows;
+}
+
+#endif
+
 #include "MLFunctions.hpp"
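For readers without oneDAL installed, the computation that `k_means` delegates to `kmeans::init::Batch` and `kmeans::Batch` can be approximated in plain C++. This is an illustrative sketch of Lloyd's algorithm, not oneDAL's implementation, under these assumptions: the feature columns are read in place (the same structure-of-arrays layout the UDTF hands to `SOANumericTable`), centroids are seeded from the first `k` rows instead of the `randomDense` method the UDTF requests, and one cluster index per row is written into a caller-owned buffer, much as the zero-copy `assignmentsTable` does. The function name `lloyd_kmeans_soa` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch of Lloyd's k-means over structure-of-arrays input.
// columns[j] points to num_rows values of feature j; the data is not copied.
// assignments must hold num_rows ints and is filled in place.
void lloyd_kmeans_soa(const std::vector<const float*>& columns,
                      std::size_t num_rows,
                      int num_clusters,
                      int num_iterations,
                      int* assignments) {
  const std::size_t num_columns = columns.size();
  const std::size_t k = static_cast<std::size_t>(num_clusters);
  assert(num_clusters > 0 && k <= num_rows);

  // Seed centroid c with row c (deterministic; the UDTF uses random seeding).
  // Centroids are stored row-major: centroids[c * num_columns + j].
  std::vector<float> centroids(k * num_columns);
  for (std::size_t c = 0; c < k; ++c) {
    for (std::size_t j = 0; j < num_columns; ++j) {
      centroids[c * num_columns + j] = columns[j][c];
    }
  }

  // Assignment step: each row goes to the centroid with the smallest
  // squared Euclidean distance.
  auto assign = [&]() {
    for (std::size_t i = 0; i < num_rows; ++i) {
      float best = std::numeric_limits<float>::max();
      int best_c = 0;
      for (std::size_t c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (std::size_t j = 0; j < num_columns; ++j) {
          const float d = columns[j][i] - centroids[c * num_columns + j];
          dist += d * d;
        }
        if (dist < best) {
          best = dist;
          best_c = static_cast<int>(c);
        }
      }
      assignments[i] = best_c;
    }
  };

  for (int it = 0; it < num_iterations; ++it) {
    assign();
    // Update step: move each centroid to the mean of the rows assigned to it.
    std::vector<float> sums(k * num_columns, 0.0f);
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t i = 0; i < num_rows; ++i) {
      const std::size_t c = static_cast<std::size_t>(assignments[i]);
      ++counts[c];
      for (std::size_t j = 0; j < num_columns; ++j) {
        sums[c * num_columns + j] += columns[j][i];
      }
    }
    for (std::size_t c = 0; c < k; ++c) {
      if (counts[c] == 0) {
        continue;  // leave an empty cluster's centroid where it was
      }
      for (std::size_t j = 0; j < num_columns; ++j) {
        centroids[c * num_columns + j] = sums[c * num_columns + j] / counts[c];
      }
    }
  }
  assign();  // final assignments against the last centroids
}
```

Seeding from the first rows keeps the sketch deterministic and easy to test, but it depends on row order. For example, with two well-separated groups stored as rows 0-1 and rows 2-3, both initial seeds fall in the first group, yet after a couple of iterations the rows still split into the two groups.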
Tests/CMakeLists.txt (2 changes: 1 addition, 1 deletion)
@@ -115,7 +115,7 @@ endif()
 # Tests + Microbenchmarks
 add_executable(TableUpdateDeleteBenchmark TableUpdateDeleteBenchmark.cpp)
 
-set(EXECUTE_TEST_LIBS gtest mapd_thrift QueryRunner ${MAPD_LIBRARIES} ${CMAKE_DL_LIBS} ${CUDA_LIBRARIES} ${Boost_LIBRARIES} ${ZLIB_LIBRARIES} ${PROFILER_LIBS})
+set(EXECUTE_TEST_LIBS gtest mapd_thrift QueryRunner ${MAPD_LIBRARIES} ${CMAKE_DL_LIBS} ${CUDA_LIBRARIES} ${Boost_LIBRARIES} ${ZLIB_LIBRARIES} ${PROFILER_LIBS} ${LATEST_TBB_LIBRARIES})
 set(THRIFT_HANDLER_TEST_LIBRARIES thrift_handler ${EXECUTE_TEST_LIBS})
 
 # Replace Licensing library with TestLicensing library
scripts/mapd-deps-prebuilt.sh (6 changes: 6 additions, 0 deletions)
@@ -104,6 +104,10 @@ if [ "$ID" == "ubuntu" ] ; then
       python-yaml \
       libxerces-c-dev \
       swig
+  sudo wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB -O - | sudo apt-key add -
+  sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
+  sudo $PACKAGER update
+  sudo $PACKAGER install intel-oneapi-dal-devel
 
   # Set up gcc-8 as default gcc
   sudo update-alternatives \
@@ -148,6 +152,8 @@ VK_LAYER_PATH=\$PREFIX/etc/vulkan/explicit_layer.d
 CMAKE_PREFIX_PATH=\$PREFIX:\$CMAKE_PREFIX_PATH
 
 export LD_LIBRARY_PATH PATH VULKAN_SDK VK_LAYER_PATH CMAKE_PREFIX_PATH
+
+source /opt/intel/oneapi/setvars.sh > /dev/null
 EOF
 
 PROFPATH=/etc/profile.d/xx-mapd-deps.sh
scripts/mapd-deps.sh.in (2 changes: 2 additions, 0 deletions)
@@ -27,3 +27,5 @@ export PATH=$PREFIX/maven/bin:$PATH
 
 export VULKAN_SDK=$PREFIX
 export VK_LAYER_PATH=$PREFIX/etc/vulkan/explicit_layer.d
+
+source /opt/intel/oneapi/setvars.sh > /dev/null