Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend):change query_handle to return data stream #5556

Merged
merged 4 commits into from
Sep 28, 2022
Merged

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Sep 26, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!

change query_handle to return data stream

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#5233

@ZENOTME ZENOTME force-pushed the zj/stream_query branch 2 times, most recently from ba2e65b to 3f22b73 Compare September 26, 2022 11:07
@codecov
Copy link

codecov bot commented Sep 26, 2022

Codecov Report

Merging #5556 (5fc18fa) into main (c9dc5b9) will decrease coverage by 0.05%.
The diff coverage is 61.76%.

❗ Current head 5fc18fa differs from pull request most recent head bdbf753. Consider uploading reports for the commit bdbf753 to get more accurate results

@@            Coverage Diff             @@
##             main    #5556      +/-   ##
==========================================
- Coverage   74.31%   74.25%   -0.06%     
==========================================
  Files         907      907              
  Lines      142996   143193     +197     
==========================================
+ Hits       106269   106334      +65     
- Misses      36727    36859     +132     
Flag Coverage Δ
rust 74.25% <61.76%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/handler/explain.rs 0.00% <0.00%> (ø)
src/frontend/src/handler/query.rs 0.00% <0.00%> (ø)
src/frontend/src/handler/variable.rs 0.00% <0.00%> (ø)
...rontend/src/scheduler/distributed/query_manager.rs 14.01% <0.00%> (+0.38%) ⬆️
src/frontend/src/scheduler/local.rs 0.00% <0.00%> (ø)
src/utils/pgwire/src/pg_protocol.rs 61.33% <62.50%> (+0.04%) ⬆️
src/utils/pgwire/src/pg_extended.rs 88.90% <75.00%> (-0.22%) ⬇️
src/utils/pgwire/src/pg_response.rs 70.11% <80.00%> (+1.82%) ⬆️
src/frontend/src/handler/describe.rs 89.25% <100.00%> (+0.08%) ⬆️
src/frontend/src/handler/show.rs 87.24% <100.00%> (+0.08%) ⬆️
... and 178 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/frontend/src/scheduler/distributed/query_manager.rs Outdated Show resolved Hide resolved
@@ -69,10 +72,15 @@ pub async fn handle_query(
};

let rows_count = match stmt_type {
StatementType::SELECT => rows.len() as i32,
StatementType::SELECT => 0_i32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for query statement, rows_count is meaningless because it's stream mode. We will calculate the row count when we receive the data.
I have added the related comment in the rows_count field of PgResponse.

// row count of effected row. Used for INSERT, UPDATE, DELETE, COPY, and other statements that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if we use Option here.

@TennyZhuang TennyZhuang changed the title feat(frontend):change query_hanlde to return data stream feat(frontend):change query_handle to return data stream Sep 26, 2022
@@ -29,7 +32,7 @@ use crate::scheduler::{
};
use crate::session::{OptimizerContext, SessionImpl};

pub type QueryResultSet = Vec<Row>;
pub type QueryResultSet = BoxStream<'static, std::result::Result<Row, BoxedError>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use generic to avoid this Box? There could be millions of rows in the result set, and I'm not sure whether this impacts the performance a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it may impact the performance. I try to use generic but it will make so many places full of generic param. I think it's not appropriate to use generic param( at least the way I use .
I think there are other ways to solve this problem:

  • use the type implement stream trait to avoid generic param
  • use Vec instead of Row to decrease the time of call next()
    BoxStream<'static, std::result::Result<Vec,BoxedError>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it may impact the performance.

Could you elaborate this point? 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boxstream is Box, it's a dynamic bind. So I guess every time we call 'stream.next()' to get a row, it needs to call by some mechanism like vtable. Hence if there are millions of rows in the result set, we may need to access vtable millions of times.

Copy link
Member

@xxchan xxchan Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeees, Sorry I misread it as "may not impact" 😄😄😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the type implement stream trait to avoid generic param

Yes. We may Type Alias Impl Trait here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Vec instead of Row to decrease the time of call next()
BoxStream<'static, std::result::Result<Vec,BoxedError>>

I have used this way to decrease the time of calling stream.next() .
And I will create a new PR to create a new stream type to avoid Boxstream later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create an issue for that

@@ -69,10 +72,15 @@ pub async fn handle_query(
};

let rows_count = match stmt_type {
StatementType::SELECT => rows.len() as i32,
StatementType::SELECT => 0_i32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if we use Option here.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

values: Vec<Row>,
row_desc: Vec<PgFieldDescriptor>,
) -> Self {
Self {
stmt_type,
row_cnt,
row_cnt: row_cnt.unwrap_or(0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also change the field to Option?

@ZENOTME ZENOTME linked an issue Sep 28, 2022 that may be closed by this pull request
@mergify mergify bot merged commit 960e11b into main Sep 28, 2022
@mergify mergify bot deleted the zj/stream_query branch September 28, 2022 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Frontend should not collect all data before responding.
4 participants