[CH-178]Support cache on locals for remote hdfs files #203

Draft
wants to merge 656 commits into
base: clickhouse_backend


lgbo-ustc

@lgbo-ustc lgbo-ustc commented Nov 18, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add a local-disk cache for remote HDFS files, so that later queries can read the cached copies directly.

The following configurations need to be set:

spark.gluten.sql.columnar.backend.ch.runtime_conf.runtime_settings.use_local_cache_for_remote_storage

default is false

spark.gluten.sql.columnar.backend.ch.runtime_conf.local_cache_for_remote_fs.root_dir

default is "local_cache_root"

spark.gluten.sql.columnar.backend.ch.runtime_conf.local_cache_for_remote_fs.limit_size

default is 10G
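
As a rough illustration of how the three settings above might be passed to a job, the sketch below renders them as `spark-submit --conf key=value` arguments. The keys are taken from this description; the values are the stated defaults, except that the cache is switched on. The helper name `to_spark_submit_args` is hypothetical, and the value formats the backend actually accepts (for example `10G` for the size limit) are assumed to match what is written above.

```python
# Sketch only: renders the settings from the PR description as spark-submit flags.
# The key names come from the description; value formats are assumed, not verified.
PREFIX = "spark.gluten.sql.columnar.backend.ch.runtime_conf"

LOCAL_CACHE_CONF = {
    # Default is false; set to true to enable the local cache.
    f"{PREFIX}.runtime_settings.use_local_cache_for_remote_storage": "true",
    # Stated default root directory for cached files.
    f"{PREFIX}.local_cache_for_remote_fs.root_dir": "local_cache_root",
    # Stated default size limit (format assumed to be as written in the description).
    f"{PREFIX}.local_cache_for_remote_fs.limit_size": "10G",
}


def to_spark_submit_args(conf):
    """Hypothetical helper: turn a dict of settings into ['--conf', 'k=v', ...]."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags on one line so they can be pasted after `spark-submit`.
    print(" ".join(to_spark_submit_args(LOCAL_CACHE_CONF)))
```

The same key/value pairs could equally be placed in `spark-defaults.conf`; the dict form is used here only to keep the three keys in one place.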

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

KochetovNicolai and others added 30 commits December 13, 2021 11:52
Backport ClickHouse#32270 to 21.9: Fix possible Pipeline stuck in case of StrictResize processor.
fix setcap in docker

(cherry picked from commit 42787cf)
Backport ClickHouse#32117 to 21.9: Dictionaries custom query condition fix
Backport ClickHouse#32359 to 21.9: Fix usage of non-materialized skip indexes
Backport ClickHouse#31859 to 21.9: keeper session timeout doesn't work
Backport ClickHouse#32201 to 21.9: Try fix 'Directory tmp_merge_<part_name>' already exists
Backport ClickHouse#32755 to 21.9: fix crash fuzzbits with multiply same fixedstring
lgbo-ustc and others added 14 commits October 24, 2022 14:35
* support sort op

* fixed null order

* fixed null ordering
* add JNIEXPORT and JNICALL

* add a concurrentMap implementation

* add reserve no exception

* Revert "add JNIEXPORT and JNICALL"

This reverts commit 24f3f71.

* add reserve no exception

* change reserve function
* support count(*)

support count(*)/count(1)

* fixed code style

* update variables' names
…ouse#181)

Support non-HA mode for ClickHouse reading from HDFS.

Close ClickHouse#180 .
…ncat/instr/char_length/replace/abs/chr/ceil/floor/exp/power (ClickHouse#172)

* add functions concat/char_length/instr

* drop functions related with clickhouse/clickhouse repo

* add function abs/chr/ceil/floor/exp/power/pmod

* adjust function order

* swap args of function replace
…se#163)

* support calculate backing length of different types

* remove comment

* rename symbols

* apply BackingDataLengthCalculator

* support decimal from ch column to spark row

* fix decimal issue in ch column to spark row

* refactor SparkRowInfo

* fix building error

* wip

* implement demo

* dev map

* finish map and tuple

* fix building error

* finish writer dev

* fix code style

* ready to improve spark row to ch column

* wip

* finish array/map/tuple reader

* fix building error

* add some uts

* finish debug

* commit again

* finish plan convert

* add benchmark

* improve performance

* try to optimize spark row to ch column

* continue

* optimize SparkRowInfo::SparkRowInfo

* wrap functions

* improve performance

* improve from 360ms to 240 ms

* finish optimizing performance

* add benchmark for BM_SparkRowTOCHColumn_Lineitem

* refactor spark row reader

* finish tests

* revert cmake

* fix code style

* fix code style

* fix memory leak

* fix build error

* fix building error in debug mode

* add test data file

* add build type, convert ch type to substrait type

* refactor jni interface: native column type

* fixbug of decimal

* replace decimal.parquet

* add data array.parquet

* add test data map.parquet

* add test data file

* finish debug

* wip

* fix logging

* fix address problem

* fix core dump

* fix code style

* throw exception when complex types in substrait plan is in nullable

* make ch complex type nullable

* support nullable complex types

* add tests for parquet nullable

* add uts for all types

* debug gtest_parquet_read

* fix issue: Kyligence#166

* remove stdout log

* fix bug of binary null

* remove logs

* remove useless files
* support more math functions

* rename some functions

* add debug logging

* revert log level

* support function greatest and least

* support cast binary

* support quarter
* add prewhere support

* ignore delta directory

* fix prewhere parse error when the condition contains an `in` function

* fix is_not_null result type error
* [CH-190] enable tests in GlutenDataFrameAggregateSuite

* [CH-190] fix review comments
@kyligence-git
Collaborator

Can one of the admins verify this patch?

@lgbo-ustc lgbo-ustc force-pushed the local_cache branch 3 times, most recently from fa89bd0 to 920d901 Compare November 18, 2022 10:20