[NSE-400] Native Arrow Row to columnar support #637
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/native-sql-engine/issues Then could you also rename the commit message and pull request title in the following format?
See also:
…RowToColumnarExec
@haojinIntel can you do a rebase? The format check is fixed in master.
val res = new Iterator[ColumnarBatch] {
  private val converters = new RowToColumnConverter(localSchema)
  private var last_cb: ColumnarBatch = null
  private var elapse: Long = 0
new buf here
// Allocate a large buffer to store the numRows rows
val bufferSize = 134217728 // 128M; could estimate the buffer size based on the data type
val allocator = SparkMemoryUtils.contextAllocator()
val arrowBuf: ArrowBuf = allocator.buffer(bufferSize)
reuse buf allocated in #95
}
val timeZoneId = SparkSchemaUtils.getLocalTimezoneID()
val arrowSchema = ArrowUtils.toArrowSchema(schema, timeZoneId)
val schemaBytes: Array[Byte] = ConverterUtils.getSchemaBytesBuf(arrowSchema)
move these variables to init
out_data.null_count = null_count;
*array = MakeArray(std::make_shared<arrow::ArrayData>(std::move(out_data)));
return arrow::Status::OK();
} else if (type->id() == arrow::Int8Type::type_id) {
use switch here
* Support ArrowRowToColumnar Optimization
* Replace expired code
* Add the code to convert recordbatch to columnarBatch
* Add unit test on java size
* Update the unit tests
* Fix the bug when reading decimal value from unsafeRow
* Use ArrowRowToColumnarExec instead of RowToArrowColumnarExec
* Use clang-format to standardize the CPP code format
* enable arrowRowToColumnarExec
* Add the metrics for ArrowRowToColumnarExec
* Add the metrics for ArrowRowToColumnarExec and unsupport Codegen
* Add parameter 'spark.oap.sql.columnar.rowtocolumnar' to control ArrowRowToColumnarExec
* Remove useless code
* Release arrowbuf after return recordbatch
* Fix the processTime metric for ArrowRowToColumnarExec
* Refine the code of ArrowRowToColumnar operator
* Add more metrics to detect the elapse time of each action
* Small fix about allocating buffer for unsafeRow
* Remove useless code
* Remove useless metrics for ArrowRowToColumnarExec
* Fall back to use java RowToColumnarExec when the row is not unsafeRow Type
* Fix the bug for decimal format
* fix format

Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
* Use Hadoop 3.2 as default hadoop.version (#652)
* [NSE-661] Add left/right trim in WSCG
* [NSE-675] Add instr expression support (#676)
* Initial commit
* Add the support in wscg
* [NSE-674] Add translate expression support (#672)
* Initial commit
* Add StringTranslate for subquery checking
* Code refactor
* Change arrow branch for unit test [will revert at last]
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit bf74356.
* Port the function to wscg
* Change arrow branch for unit test [will revert at last]
* Format native code
* Fix a bug
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit 3a53fa2.
* [NSE-681] Add floor & ceil expression support (#682)
* Initial commit
* Add ceil expression support
* Change arrow branch for unit test [will revert at last]
* Revert "Change arrow branch for unit test [will revert at last]" This reverts commit 5fb2f4b.
* [NSE-647] Leverage buffered write in shuffle (#648) Closes #647
* [NSE-400] Native Arrow Row to columnar support (#637)
* fix leakage in rowtocolumn (#683) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Co-authored-by: Wei-Ting Chen <weiting.chen@intel.com>
Co-authored-by: PHILO-HE <feilong.he@intel.com>
Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: haojinIntel <hao.jin@intel.com>
What changes were proposed in this pull request?
Native Arrow Row to columnar support
How was this patch tested?
Pass Jenkins.