Wip hashagg opt3 #4

Open
wants to merge 91 commits into
base: master

zhouyuan
Owner

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

zhouyuan and others added 4 commits April 20, 2022 21:25
This patch disables SMJ when its child is a local limit

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
…roject#847)

This patch adds support for expressions: length, char_length, locate, regexp_extract.
The codegen part will be added in the next PR.

* Enable length/char_length/locate to be workable

* Add regexp_extract expression support

* Correct the return type and add subquery checking

* Change arrow branch for test [will revert at last]

* Let supportColumnarCodegen return false

* Check codegen support for columnar BHJ with condition

* Fallback non-literal regex case

* Remove the assert for bytes read metric in a unit test

* Revert "Change arrow branch for test [will revert at last]"

This reverts commit 19f8942.
* Add substring_index support

* Fix a compile issue

* Change arrow branch for test [will revert at last]

* Revert "Change arrow branch for test [will revert at last]"

This reverts commit e11e9db.

* Return false for checking codegen support
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename the commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}
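
As a minimal sketch, a title in the requested format could be checked with a regular expression. The helper name `parse_nse_title` and the exact pattern are assumptions made for illustration; they are not taken from the bot.

```python
import re

# Assumed pattern: "[NSE-<digits>] <non-empty message>".
_TITLE_RE = re.compile(r"^\[NSE-(\d+)\]\s+(\S.*)$")


def parse_nse_title(title: str):
    """Return (issue_id, message) if the title follows the format, else None."""
    match = _TITLE_RE.match(title.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

A title such as `[NSE-728] Upgrade to Arrow 7.0.0` would pass this check, while the current title of this pull request would not.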


@zhouyuan zhouyuan closed this Apr 25, 2022
@zhouyuan zhouyuan reopened this Apr 25, 2022
Eman Copty and others added 11 commits April 25, 2022 20:51
* code gen changes

* use logDebug instead of logWarning
* implement replace function

* set columnar codegen to false
* [NSE-728] Upgrade to Arrow 7.0.0 (oap-project#729)

Known issues of current Arrow 7.0.0 support:

1. Data Source writing / ORC reading is disabled;
2. Data Source filter pushdown is disabled;
3. FastPFor compression is leading to unexpected concurrent memory writes. Use LZ4 instead.

* fix get_physical_plan issue

* Revert "[NSE-728] Upgrade to Arrow 7.0.0 (oap-project#729)"

This reverts commit e329253.

Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
 (oap-project#894)

* merge master and branch shuffle_opt_fillbyreducer. To submit PR to upstream
Implemented fill by reducer

* format code

* Allocate large block of memory then slice to each buffer

* wip, rebase to master

* to rebase to master

* return to original

* added memory leak check in test

* Done

* disable alignment allocation in benchmark since arrow doesn't support it

* optimize validity buffer assignment: initialize the validity buffer as true once allocated and skip the initialization during split
fix validity buffer bug

* fix out of memory test

* fix setbitsto bug
remove nullcnt

* add shuffle test

* remove unused variables

* allocate validity buffer from pool

* fix bug
set validity buffer after allocation
fix handling of the last bits after processing the validity buffer

* Add arrow check for batch size and part number
use uint32 as row number size

* format code

* fix format

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
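
The "allocate large block of memory then slice to each buffer" step in the commit above can be illustrated roughly as follows. This is only a sketch in Python rather than the native code: the function name `allocate_sliced` and the 64-byte alignment are assumptions, and the real change works on Arrow buffers.

```python
def allocate_sliced(sizes, alignment=64):
    """Allocate one block and hand out a zero-copy slice for each buffer.

    `sizes` holds the requested byte size of each buffer. Each slice starts
    at an offset padded to `alignment` (an assumed value, for illustration).
    """
    offsets = []
    total = 0
    for size in sizes:
        offsets.append(total)
        # Round the size up to the next multiple of `alignment`.
        total += -(-size // alignment) * alignment

    block = bytearray(total)   # single allocation
    view = memoryview(block)   # slicing a memoryview does not copy
    slices = [view[off:off + size] for off, size in zip(offsets, sizes)]
    return block, slices
```

The point of the design is that one allocation replaces many small ones, and each buffer is just a window into the shared block.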
This patch adds the missing columnar expression replacement for substring_index
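
For readers unfamiliar with the expression, the behavior of `substring_index` as documented for Spark SQL can be sketched as below. This is a plain Python illustration, not the columnar implementation added by the patch; returning an empty string for an empty delimiter is an assumption about edge-case handling.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Return the part of `s` before `count` occurrences of `delim`.

    A positive count keeps everything to the left of the count-th delimiter
    (counting from the left); a negative count keeps everything to the right
    of the count-th delimiter (counting from the right).
    """
    if count == 0 or delim == "":
        return ""  # assumed edge-case behavior
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])
```

For example, `substring_index("www.apache.org", ".", 2)` gives `"www.apache"`.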
This patch adds one more configuration for deployments where the jar command is not in PATH

export JAR=/path/to/jar
--conf spark.executorEnv.JAR=/path/to/jar

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
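
One way to picture the lookup this configuration enables is sketched below. It is a hypothetical illustration only: the function name `resolve_jar_cmd` and the fallback order are assumptions, and the patch's actual lookup code is not shown here.

```python
import os
import shutil


def resolve_jar_cmd(env=None, which=shutil.which):
    """Pick the jar command: an explicit JAR setting first, then PATH.

    `env` defaults to the process environment; `which` is injectable so the
    lookup can be exercised without depending on the local machine.
    """
    env = os.environ if env is None else env
    explicit = env.get("JAR")
    if explicit:
        return explicit
    # Fall back to searching PATH; returns None if jar is not found.
    return which("jar")
```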
@zhouyuan zhouyuan force-pushed the wip_hashagg_opt3 branch 2 times, most recently from fa1b0ff to d6ff00a Compare May 10, 2022 03:26
@zhouyuan zhouyuan force-pushed the wip_hashagg_opt3 branch 2 times, most recently from e5d2efe to d5bcc6b Compare May 10, 2022 14:29
zhztheplayer and others added 6 commits May 11, 2022 16:33
"INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y LIMIT 2" drains 4 rows into table x using Arrow write extension
The issue is that GlobalLimit used a special setNumRows() which only affects the record batch level; the internal vectors are not changed.

This patch adds a new API for setting rows on vectors
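
The mismatch described above can be reproduced with a toy model. The classes and method names below (`Vector`, `Batch`, `set_num_rows_on_vectors`, `rows_written`) are hypothetical stand-ins, not the project's API; the sketch only shows why a writer that reads the vectors still sees 4 rows after a batch-level limit of 2.

```python
from dataclasses import dataclass


@dataclass
class Vector:
    values: list

    def truncate(self, n: int) -> None:
        del self.values[n:]


@dataclass
class Batch:
    vectors: list
    num_rows: int

    def set_num_rows(self, n: int) -> None:
        # Batch-level only: the vectors keep their original length.
        self.num_rows = n

    def set_num_rows_on_vectors(self, n: int) -> None:
        # Hypothetical fixed variant: also shrink every vector.
        self.num_rows = n
        for vec in self.vectors:
            vec.truncate(n)


def rows_written(batch: Batch) -> int:
    """A writer that trusts the vector length, not the batch row count."""
    return len(batch.vectors[0].values)
```

With a 4-row batch, calling `set_num_rows(2)` leaves `rows_written` at 4, whereas `set_num_rows_on_vectors(2)` brings it down to 2.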
…#915)

Further optimize the validity buffer split: read 8 bits at a time and set the destination.
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
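
A rough sketch of the idea, under stated assumptions: validity bitmaps are LSB-first (as in Arrow), destination bitmaps start as all-valid, and the source is consumed one byte (8 rows) at a time so that fully valid bytes need no bit writes. The function names and the per-row partition list are illustrative, not the native implementation.

```python
def get_bit(bitmap, i):
    """Read bit i from an LSB-first bitmap."""
    return (bitmap[i >> 3] >> (i & 7)) & 1


def split_validity(src, part_ids, num_parts):
    """Route each row's validity bit to its partition, 8 source bits at a time.

    `src` is the source validity bitmap and `part_ids[i]` is the partition
    of row i. Returns the per-partition bitmaps and row counts.
    """
    num_rows = len(part_ids)
    counts = [0] * num_parts
    for p in part_ids:
        counts[p] += 1

    # Destination bitmaps are initialized as all-valid once allocated.
    dst = [bytearray(b"\xff" * ((c + 7) // 8)) for c in counts]
    pos = [0] * num_parts  # next row offset inside each partition

    for base in range(0, num_rows, 8):
        bits = src[base >> 3]           # fetch 8 validity bits at once
        n = min(8, num_rows - base)     # the last byte may be partial
        if bits == 0xFF:
            # Fast path: every row is valid, so only the offsets advance.
            for row in range(base, base + n):
                pos[part_ids[row]] += 1
            continue
        for i in range(n):
            p = part_ids[base + i]
            if not (bits >> i) & 1:
                # Clear the destination bit for a null row.
                dst[p][pos[p] >> 3] &= 0xFF ^ (1 << (pos[p] & 7))
            pos[p] += 1
    return dst, counts
```

Because the destination starts as all-valid, only null rows cost a write, which is what makes the byte-at-a-time check worthwhile when most rows are non-null.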
* Add pmod support

* Set false for code gen checking

* Change arrow branch for test [will revert at last]

* Revert "Change arrow branch for test [will revert at last]"

This reverts commit 85fd4f3.
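
For context, `pmod` is the positive modulus. A minimal Python sketch of the semantics, assuming JVM-style truncated remainder and a null (here `None`) result for a zero divisor, follows. It illustrates the expression's meaning and is not the native kernel added by this commit.

```python
def _trunc_rem(a: int, n: int) -> int:
    """Remainder with the sign of the dividend, as on the JVM."""
    r = abs(a) % abs(n)
    return -r if a < 0 else r


def pmod(a: int, n: int):
    """Positive modulus: shift a negative remainder back into range."""
    if n == 0:
        return None  # assumed: a zero divisor yields null
    r = _trunc_rem(a, n)
    if r < 0:
        return _trunc_rem(r + n, n)
    return r
```

With a positive divisor the result is never negative, so `pmod(-7, 3)` is 2 where a plain truncated remainder would give -1.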
* Initial commit

* Add unit test

* Fix compile issues

* Fix compile issue in ut and remove decimal support

* Fix runtime issues

* Add separate action class for handling string type input

* Get attribute for first agg func

* Fix bugs in supporting numeric types

* Add ignoreNulls node in making arrow function

* Handle special case

* Remove a redundant variable

* Exclude first agg in code gen

* Add a unit test for testing group by case

* Format the native code
* Test for removing memset

* merge Numeric type case

* Add #define

* Only remove memset

* Only add macro

* Recover memset and Only add macro

* Use cmov for C2R

* Improve Vector usage

* Remove String case

* Remove memset in Init and add memset in Write

* Add memset for fixedwidth type and add benchmark

* get optimized code from FelixYBW Repo

* Fix int8_t

* Fix String/Binary Buffer

* Fix Multi Rows Buffer Error

* Add native UT and benchmark

* Add Buffer UT in columnar_to_row_converter_test.cc

* Adapt new interfaces

* Fix length and offset in JNI

* Add AVX512 Flags

* Fix GHA

* Add GHA fixes

* make properties enable

* Add CXXFlags

* Fix UT bugs

* Fix clang format

* Add .
PHILO-HE and others added 28 commits July 4, 2022 16:41
* Initial commit

* Cover no condition case for BHJ & SHJ
* Initial commit

* Implement doColumnarCodeGen

* Handle different input types
* Revert "disable row_number() temporary (oap-project#994)"

This reverts commit b973977.

* improve row_number()

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* [NSE-943] Optimize String/Binary Type for Row2Columnar

* Fix TPCDS queries

* Add __AVX512BW__ Check

* [WIP][NSE-943] Utilize CPU Cache by first-row-second-column and fixed-width type Optimization

* Extract vector from function

* Add optimizations

* Add remaining optimization

* Remove ListType in Native R2C

* Fix Spark UT

* Clean code

* Fix clang format
* s/string/string_view in sort

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* improve timsort

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
… time zone in handling unix timestamp (oap-project#1021)

* Trim user-specified format in time expression

* Support other formats

* Change arrow branch [will revert at last]

* Fix issues

* Do some converts

* Support more formats for from_unixtime

* Align with spark's timezone awareness

* Refine the code

* Add some comment

* Correct the expected results in a UT

* Revert "Change arrow branch [will revert at last]"

This reverts commit 11f0977.
* Initial commit

* Change arrow branch [will revert at last]

* Refine the code

* Ignore a UT

* Revert "Change arrow branch [will revert at last]"

This reverts commit 60a7b34.
…oject#1017)

* use TimSort for STRING/DECIMAL onekey based sorting

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* fix sort unit test

std::sort behaves like a stable sort most of the time, while this Timsort is not stable.

This patch changes the sort unit tests to align with the Timsort result.

Repeating the gtest run 1000 times gives stable results.

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* fix format

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* test log

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* remove too many tests

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* fix sort external test

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
* fix: missing WSCG check for keys in join

* change comment

* remove UnsupportedOperationException in join key check
* Turn on the support for get_json_object

* Change arrow branch [will revert at last]

* Revert "Change arrow branch [will revert at last]"

This reverts commit 53b8fc8.
* Initial commit

* Cast short type to int32
* Initial commit

* Change arrow branch [will revert at last]

* Revert "Change arrow branch [will revert at last]"

This reverts commit 545b1b4.
* Initial commit

* Ignore some test failures
* Initial commit

* Fix bugs

* Small fix on code format

* Fix bugs for find_in_set

* Change arrow branch [will revert at last]

* Revert "Change arrow branch [will revert at last]"

This reverts commit c727c69.
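
For context, the behavior of `find_in_set` as documented for Spark SQL can be sketched in a few lines. This is a plain Python illustration of the semantics, not the native implementation fixed in this commit.

```python
def find_in_set(s: str, str_list: str) -> int:
    """Return the 1-based index of `s` in a comma-delimited list.

    Returns 0 when `s` is not found or when `s` itself contains a comma.
    """
    if "," in s:
        return 0
    for index, item in enumerate(str_list.split(","), start=1):
        if item == s:
            return index
    return 0
```

For example, `find_in_set("ab", "abc,b,ab,c,def")` gives 3.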
…ributeReference (oap-project#1041)

* Initial commit

* Consider leaf expression

* Revert some changes and support to get attr for conv/lpad

* Remove the handling for MakeTimestamp
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
7 participants