[NSE-400] Native Arrow Row to columnar support #637
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/native-sql-engine/issues Then could you also rename the commit message and pull request title in the following format?
See also:
…RowToColumnarExec
@haojinIntel can you do a rebase? The format check is fixed in master.
val res = new Iterator[ColumnarBatch] {
  private val converters = new RowToColumnConverter(localSchema)
  private var last_cb: ColumnarBatch = null
  private var elapse: Long = 0
new buf here
// Allocate a large buffer to store the numRows rows
val bufferSize = 134217728 // 128M; could estimate the buffer size based on the data type
val allocator = SparkMemoryUtils.contextAllocator()
val arrowBuf: ArrowBuf = allocator.buffer(bufferSize)
reuse buf allocated in #95
}
val timeZoneId = SparkSchemaUtils.getLocalTimezoneID()
val arrowSchema = ArrowUtils.toArrowSchema(schema, timeZoneId)
val schemaBytes: Array[Byte] = ConverterUtils.getSchemaBytesBuf(arrowSchema)
move these variables to init
out_data.null_count = null_count;
*array = MakeArray(std::make_shared<arrow::ArrayData>(std::move(out_data)));
return arrow::Status::OK();
} else if (type->id() == arrow::Int8Type::type_id) {
use switch here
* Support ArrowRowToColumnar Optimization
* Replace expired code
* Add the code to convert recordbatch to columnarBatch
* Add unit test on java size
* Update the unit tests
* Fix the bug when reading decimal value from unsafeRow
* Use ArrowRowToColumnarExec instead of RowToArrowColumnarExec
* Use clang-format to standardize the CPP code format
* enable arrowRowToColumnarExec
* Add the metrics for ArrowRowToColumnarExec
* Add the metrics for ArrowRowToColumnarExec and unsupport Codegen
* Add parameter 'spark.oap.sql.columnar.rowtocolumnar' to control ArrowRowToColumnarExec
* Remove useless code
* Release arrowbuf after return recordbatch
* Fix the processTime metric for ArrowRowToColumnarExec
* Refine the code of ArrowRowToColumnar operator
* Add more metrics to detect the elapse time of each action
* Small fix about allocating buffer for unsafeRow
* Remove useless code
* Remove useless metrics for ArrowRowToColumnarExec
* Fall back to use java RowToColumnarExec when the row is not unsafeRow Type
* Fix the bug for decimal format
* fix format

Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
* Use Hadoop 3.2 as default hadoop.version (#652)
* [NSE-661] Add left/right trim in WSCG
* [NSE-675] Add instr expression support (#676)
* Initial commit
* Add the support in wscg
* [NSE-674] Add translate expression support (#672)
* Initial commit
* Add StringTranslate for subquery checking
* Code refactor
* Change arrow branch for unit test [will revert at last]
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit bf74356.
* Port the function to wscg
* Change arrow branch for unit test [will revert at last]
* Format native code
* Fix a bug
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit 3a53fa2.
* [NSE-681] Add floor & ceil expression support (#682)
* Initial commit
* Add ceil expression support
* Change arrow branch for unit test [will revert at last]
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit 5fb2f4b.
* [NSE-647] Leverage buffered write in shuffle (#648) Closes #647
* [NSE-400] Native Arrow Row to columnar support (#637)
* fix leakage in rowtocolumn (#683) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Co-authored-by: Wei-Ting Chen <weiting.chen@intel.com>
Co-authored-by: PHILO-HE <feilong.he@intel.com>
Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: haojinIntel <hao.jin@intel.com>
What changes were proposed in this pull request?
Native Arrow Row to columnar support
How was this patch tested?
Pass Jenkins.