
feat(sink): support async for bigquery sink #17488

Merged
merged 23 commits into main from xxh/bg-async
Sep 3, 2024
Conversation

xxhZs
Contributor

@xxhZs xxhZs commented Jun 27, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

In #17383 (comment), we found that waiting for writes to complete can block for a long time, affecting throughput.
So this PR adds async support for the BigQuery sink.

bench with this pr:
avg: 108025 rows/s
p90: 116736 rows/s
p95: 116736 rows/s
p99: 118784 rows/s

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xxhZs xxhZs requested a review from a team as a code owner June 27, 2024 10:36
@xxhZs xxhZs requested a review from wenym1 July 1, 2024 08:17
Contributor

@wenym1 wenym1 left a comment


Actually we can avoid spawning a tokio task to poll the response stream. We can implement our own BigQueryLogSinker, which polls the log_reader and the response_stream with a select.

let (expect_offset, mut rx) = self.client.get_subscribe();
let future = Box::pin(async move {
    loop {
        match rx.recv().await {
Contributor


I think instead you can create a oneshot channel for each AppendRowsRequest so that you won't need this shared broadcast channel. When the spawned worker knows that a request is handled, it notifies the rx with the oneshot tx, and here the future can be simply the rx.
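For illustration, a minimal std-only sketch of the per-request reply-channel pattern suggested here. The `Request` type and `per_request_roundtrip` function are hypothetical, and a bounded `std::sync::mpsc` channel stands in for a tokio `oneshot`; this is not the PR's actual code:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request type: each request carries its own one-shot reply
// sender, so the worker notifies exactly one waiter per request and no
// shared broadcast channel is needed.
struct Request {
    payload: &'static str,
    done_tx: mpsc::SyncSender<Result<(), String>>,
}

fn per_request_roundtrip() -> Result<(), String> {
    let (req_tx, req_rx) = mpsc::channel::<Request>();

    // Spawned worker: handles each request, then notifies its dedicated channel.
    let worker = thread::spawn(move || {
        for req in req_rx {
            let _ = req.payload; // the real worker would send this to BigQuery
            let _ = req.done_tx.send(Ok(()));
        }
    });

    // Caller side: a fresh one-shot channel per request; "the future can
    // simply be the rx" (here, a blocking recv).
    let (done_tx, done_rx) = mpsc::sync_channel(1);
    req_tx
        .send(Request { payload: "row-batch", done_tx })
        .map_err(|e| e.to_string())?;
    let result = done_rx.recv().map_err(|e| e.to_string())?;

    drop(req_tx); // close the request channel so the worker loop ends
    worker.join().map_err(|_| "worker panicked".to_string())?;
    result
}
```

In async code the same shape works with `tokio::sync::oneshot`, where awaiting the receiver replaces the blocking `recv`.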

let append_req = AppendRowsRequest {
    write_stream: write_stream.clone(),
    offset: None,
    trace_id: Uuid::new_v4().hyphenated().to_string(),
Contributor


Can you track the trace_id locally instead of incrementing self.offset? I think the response stream should return the same trace_id in the corresponding response.

Contributor Author


I tried it, but there is no ID in the returned message.

@xxhZs xxhZs requested a review from wenym1 July 10, 2024 08:52
xxhZs added 2 commits July 19, 2024 12:39
svae

add fix
@xxhZs
Contributor Author

xxhZs commented Jul 22, 2024

Since each BigQuery row can't exceed 10 MB, new logic is added to split the payload by row size.
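A rough sketch of this kind of size-based splitting. The function name and the greedy packing strategy are illustrative assumptions, not the PR's actual implementation:

```rust
// Greedily pack encoded rows into batches that each stay under a byte
// limit, so no single request exceeds the service's size cap.
fn split_by_size(rows: Vec<Vec<u8>>, max_batch_bytes: usize) -> Vec<Vec<Vec<u8>>> {
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut current_bytes = 0;
    for row in rows {
        // Start a new batch if adding this row would overflow the limit.
        if !current.is_empty() && current_bytes + row.len() > max_batch_bytes {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += row.len();
        current.push(row);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Each resulting batch would then become one request, which is why a single chunk can map to multiple in-flight responses.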

@fuyufjh fuyufjh removed the request for review from a team July 23, 2024 08:03
Comment on lines 111 to 115
while resp_num > 1 {
    self.offset_queue.push_back(None);
    resp_num -= 1;
}
self.offset_queue.push_back(Some(offset));
Member


Why insert resp_num - 1 Nones and 1 Some(offset)? Could you please add some comments here?

Contributor Author


Because if a chunk is too large, we split it into multiple async tasks, so we need to wait until all of those tasks are finished before we can truncate. Here, None means the response belongs to an intermediate task of the split chunk; only when the final Some(offset) response arrives can the chunk be truncated.
And I will add comments.


fn default_retry_times() -> usize {
    5
pub async fn wait_next_offset(&mut self) -> Result<Option<TruncateOffset>> {
Member


I guess async fn next_offset is enough to indicate that the function call may need to be "waited".

Comment on lines 119 to 130
if let Some(Some(TruncateOffset::Barrier { .. })) = self.offset_queue.front() {
    return Ok(self.offset_queue.pop_front().unwrap());
}
self.resp_stream
    .next()
    .await
    .ok_or_else(|| SinkError::BigQuery(anyhow::anyhow!("end of stream")))??;
self.offset_queue.pop_front().ok_or_else(|| {
    SinkError::BigQuery(anyhow::anyhow!(
        "should have pending chunk offset when we receive new response"
    ))
})
Member


Why can we directly pop and return only when the front of the queue is TruncateOffset::Barrier? Is it possible to have other variants of TruncateOffset in the queue? And what happens if there are?

Could you please add some comments here?

Contributor Author


Since we don't go to the bg sink to create an async task when we receive a barrier, there's no need to wait here for the
And i will add comments.

Member

@stdrc stdrc Jul 29, 2024


Since we don't go to the bg sink to create an async task when we receive a barrier, there's no need to wait here for the

Can't quite get it. Could you elaborate in proper English?

Contributor Author

@xxhZs xxhZs Jul 29, 2024


There are 3 cases:

  1. Some(barrier): since we don't send barriers to BigQuery, there is no response to wait for. But we do record barriers in the log store, so we need to truncate.
  2. Some(chunk): since we send the chunk to BigQuery and record it in the log store, we need to wait for the response and then truncate.
  3. None: this means we split a large RisingWave chunk into multiple requests to BigQuery (for this chunk, in offset_queue, we enqueue None, None, ..., Some(chunk)). So we need to wait for the responses of all the Nones as well as the chunk, and truncate at Some(chunk).
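The three cases above can be modeled with a small std-only sketch. The `Offset` enum and `next_truncate` function are simplified illustrations, not the PR's actual types:

```rust
use std::collections::VecDeque;

// Each log-store offset is either a barrier (no BigQuery request sent) or a
// chunk; a chunk split into n requests is queued as n-1 `None` placeholders
// followed by `Some(Offset::Chunk(..))`.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Offset {
    Barrier(u64),
    Chunk(u64),
}

// Returns the next offset that may be truncated, consuming one response per
// queued non-barrier entry. Returns None when the queue or responses run out.
fn next_truncate(
    queue: &mut VecDeque<Option<Offset>>,
    responses: &mut impl Iterator<Item = ()>,
) -> Option<Offset> {
    // Case 1: a barrier needs no response; truncate immediately.
    if let Some(Some(Offset::Barrier(_))) = queue.front() {
        return queue.pop_front().unwrap();
    }
    // Cases 2 and 3: consume one response per entry; `None` entries are the
    // intermediate sub-requests of a split chunk.
    loop {
        responses.next()?;
        match queue.pop_front()? {
            Some(offset) => return Some(offset),
            None => continue,
        }
    }
}
```

Under this model, a barrier at the front pops without touching the response stream, while a split chunk drains one response per placeholder before its offset is returned.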

}
pub struct BigQueryLogSinker {
    writer: BigQuerySinkWriter,
    bigquery_future_manager: BigQueryFutureManager,
Member

Why not future_manager? We don't need to add so many prefixes in private fields/types. And I guess it may not be so suitable to call it a "manager" here.

Also, do we really need a BigQueryFutureManager type here? It doesn't seem to "manage" anything; we still directly access bigquery_future_manager.offset_queue below😂

Contributor Author


For async sinks, we have a FutureManager to manage the async tasks and truncate our log store chunks after the async tasks are done. This BigQueryFutureManager does the same job as FutureManager, just customized for the BigQuery sink.

})
let mut client = conn.conn();

let (tx, rx) = mpsc::channel(BIGQUERY_SEND_FUTURE_BUFFER_MAX_SIZE);
Contributor


We can use a bounded channel here since we already limit the queue size in the select!

.map_err(|e| SinkError::BigQuery(e.into()))?
.into_inner();
loop {
if let Some(append_rows_response) = resp_stream
Contributor


When we reach the end of the resp_stream, we will keep yielding (). This is not correct. Instead of using if let Some(...), we'd better just resp_stream.message().await.ok_or_else(|| ...) to turn None into an end-of-stream error.

Contributor Author

@xxhZs xxhZs Aug 28, 2024


It doesn't return () at the end of the stream. After receiving each reply, if there are no errors, it returns a meaningless ();
the goal is to signal that a message was received with no errors.

Contributor


I know the purpose of returning a meaningless (). What I meant was that, when resp_stream.message() returns None, we should break the loop with an EndOfStream error, or just simply break the loop, instead of ignoring it.

In the current code, after resp_stream returns None, a () will still be yielded; when polled again, resp_stream returns None again and another () is yielded, and this repeats endlessly, while the external code is not aware that the response stream has actually stopped.
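A minimal illustration of the suggested fix, using a plain iterator as a stand-in for the gRPC response stream (the function and types here are hypothetical):

```rust
// Turn end-of-stream (None) into a hard error instead of silently skipping it.
fn next_response(
    stream: &mut impl Iterator<Item = Result<u32, String>>,
) -> Result<u32, String> {
    // Mirrors `resp_stream.message().await.ok_or_else(...)??`: the outer `?`
    // surfaces end-of-stream as an error, the inner `?` surfaces a
    // per-message error.
    let msg = stream.next().ok_or_else(|| "end of stream".to_string())??;
    Ok(msg)
}
```

Once the stream is exhausted, every subsequent call fails loudly rather than yielding a meaningless success forever.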

if let Some(Some(TruncateOffset::Barrier { .. })) = self.offset_queue.front() {
    return Ok(self.offset_queue.pop_front().unwrap());
}
self.resp_stream
Contributor


Here the correctness depends on the number of non-barrier items (either None or Some(TruncateOffset::Chunk)) in the queue being the same as the number of inflight requests. However, in the implementation, it's possible to have a resp_num of 0 when we write a chunk. For such a chunk there is no inflight request, but it still adds an item to the queue, which causes inconsistency between the number of queue items and inflight requests.

I think instead of using none to represent that a chunk is split into multiple requests, we'd better store (TruncateOffset, remaining_resp_num) as the queue item. Code will be like the following

if let Some((offset, remaining_resp_num)) = self.offset_queue.front_mut() {
    if *remaining_resp_num == 0 {
        return Ok(self.offset_queue.pop_front().unwrap().0);
    }
    while *remaining_resp_num > 0 {
        self.resp_stream.next().await...??;
        *remaining_resp_num -= 1;
    }
} else {
    return pending().await;
}

save

save
Contributor

@wenym1 wenym1 left a comment


Rest LGTM

@@ -100,8 +100,8 @@ Starrocks:
BigQuery:
Contributor


Revert the change.

fn default_max_batch_rows() -> usize {
    1024
struct BigQueryFutureManager {
    // `offset_queue` holds the Some corresponding to each future.
Contributor


Please update the comment.

@xxhZs xxhZs enabled auto-merge September 3, 2024 08:43
@xxhZs xxhZs added this pull request to the merge queue Sep 3, 2024
Merged via the queue into main with commit c0ce8a8 Sep 3, 2024
29 of 30 checks passed
@xxhZs xxhZs deleted the xxh/bg-async branch September 3, 2024 09:24