[Enhancement](load) http load using SQL #21621

Closed
wants to merge 5 commits into from

Conversation

zzzzzzzs
Contributor

@zzzzzzzs zzzzzzzs commented Jul 7, 2023

This PR implements a new HTTP load, similar to #21172.

With this HTTP load, the load parameters are expressed in a SQL statement, which makes the load more convenient to use.
User guide:
curl -v --location-trusted -u user1:password -H "sql: 'sql string'" -T example.csv http://127.0.0.1:8030/api/v2/_load

SQL string:
INSERT INTO db1.table1 select * from http(format="csv",xxxx,xxxx) where t1 > 10;

example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1(k1) select k1 from http(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/v2/_load
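
For callers that build the request in code rather than with curl, the example above could be assembled roughly as follows. This is a minimal sketch, not part of the PR: the class and method names are hypothetical, the /api/v2/_load endpoint and the http() table function are taken from this proposal and may not exist in any released version, and the request is only built here, not sent. Note that java.net.http.HttpClient does not replicate curl's --location-trusted behavior, so redirect handling would need separate attention.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative and not from the PR.
public class HttpLoadRequestSketch {

    // Render properties as "key" = "value" pairs, as in the curl example.
    public static String renderProps(Map<String, String> props) {
        return props.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\" = \"" + e.getValue() + "\"")
                .collect(Collectors.joining(", "));
    }

    // Build the SQL string carried in the "sql" header.
    public static String buildSql(String table, String column, Map<String, String> props) {
        return "insert into " + table + "(" + column + ") select " + column
                + " from http(" + renderProps(props) + ")";
    }

    // Build (but do not send) a PUT request, mirroring `curl -T`.
    public static HttpRequest buildRequest(String url, String user, String password,
                                           String sql, byte[] body) {
        String credentials = user + ":" + password;
        String basicAuth = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + basicAuth)
                .header("sql", sql)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) keeps the properties in the order they were added, which makes the generated SQL predictable.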

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

The current implementation uses an HTTP request to submit a SQL statement to the BE.
@github-actions
Contributor

github-actions bot commented Jul 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

@zzzzzzzs
Contributor Author

zzzzzzzs commented Jul 7, 2023

run buildall

@yiguolei
Contributor

yiguolei commented Jul 9, 2023

Great feature. I have wanted to implement this for a long time.

@yiguolei
Contributor

yiguolei commented Jul 9, 2023

Is insert into db.table select expr1,expr2 from http() better?

@yiguolei
Contributor

yiguolei commented Jul 9, 2023

@zzzzzzzs I think you could refer to PR #16940.

@zzzzzzzs zzzzzzzs closed this Jul 10, 2023
@zzzzzzzs zzzzzzzs reopened this Jul 10, 2023
@zzzzzzzs
Contributor Author

Is insert into db.table select expr1,expr2 from http() better?

Yes. I have done the basic work so far, and I will keep improving it.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zzzzzzzs zzzzzzzs changed the title [Enhancement](load) submit a sql to be side using a new http load [Enhancement](load) http load using SQL Jul 21, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

zzzzzzzs added 2 commits July 22, 2023 02:27
Conflicts:
	be/src/vec/exec/format/csv/csv_reader.cpp
	fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zzzzzzzs
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@dataroaring dataroaring left a comment

Does select col1+col2, col2 from http() work?

DEFINE_COUNTER_METRIC_PROTOTYPE_2ARG(http_load_duration_ms, MetricUnit::MILLISECONDS);
DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(http_load_current_processing, MetricUnit::REQUESTS);

void HttpLoadAction::_parse_format(const std::string& format_str,
Contributor

This duplicates code in stream_load.cpp; we should refactor it. BTW, please add unit tests.

Contributor Author

Done. PR: #22304

30: optional string delete_condition // delete
31: optional string hidden_columns
32: optional bool trim_double_quotes // trim double quotes for csv
33: optional i32 skip_lines // csv skip line num, only used when csv header_type is not set.
Contributor

We should group these options into categories, such as scanner_options. @TangSiyang2001, what is your opinion?

params.setHiddenColumns(paramMap.get("hidden_columns"));
params.setTrimDoubleQuotes(Boolean.valueOf(paramMap.getOrDefault("trim_double_quotes", "false")));
params.setSkipLines(Integer.valueOf(paramMap.getOrDefault("skip_lines", "0")));
params.setPartialColumns(Boolean.valueOf(paramMap.getOrDefault("partial_columns", "false")));
Contributor

We should write this code much more cleanly.
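
One possible direction for this cleanup, offered only as a sketch: move the repeated getOrDefault-and-convert pattern into small typed helpers. The class and method names below are hypothetical and not from the PR. The helpers are also deliberately stricter than the original lines, which is an assumption about the desired behavior: Boolean.valueOf silently turns any value other than "true" into false, whereas this version rejects unrecognized values.

```java
import java.util.Map;

// Hypothetical helper class; not part of the PR.
public class LoadParamParser {

    // Return the boolean for `key`, or `defaultValue` when the key is absent.
    // Unlike Boolean.valueOf, values other than true/false are rejected.
    public static boolean getBool(Map<String, String> params, String key, boolean defaultValue) {
        String raw = params.get(key);
        if (raw == null) {
            return defaultValue;
        }
        String value = raw.trim();
        if (value.equalsIgnoreCase("true")) {
            return true;
        }
        if (value.equalsIgnoreCase("false")) {
            return false;
        }
        throw new IllegalArgumentException("Invalid boolean for '" + key + "': " + raw);
    }

    // Return the integer for `key`, or `defaultValue` when the key is absent.
    public static int getInt(Map<String, String> params, String key, int defaultValue) {
        String raw = params.get(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid integer for '" + key + "': " + raw, e);
        }
    }

    // Intended call sites, assuming the same paramMap and params as above:
    //   params.setTrimDoubleQuotes(getBool(paramMap, "trim_double_quotes", false));
    //   params.setSkipLines(getInt(paramMap, "skip_lines", 0));
    //   params.setPartialColumns(getBool(paramMap, "partial_columns", false));
}
```

The error messages name the offending key, so a malformed option surfaces as a clear load failure rather than a bare NumberFormatException.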

8: optional string partitions
9: optional string temporary_partitions
10: optional string columns
11: required string format
Contributor

Do not use required.

@zzzzzzzs zzzzzzzs closed this Aug 25, 2023
4 participants