fix: evict hash join cache every n messages. #8731

Merged: 2 commits, merged on Mar 23, 2023.
`src/stream/src/executor/hash_join.rs` (31 changes: 26 additions, 5 deletions)

@@ -52,6 +52,10 @@ use crate::task::AtomicU64Ref;
/// enum is not supported in const generics.
// TODO: Use enum to replace this once [feature(adt_const_params)](https://github.com/rust-lang/rust/issues/95174) is completed.
pub type JoinTypePrimitive = u8;

/// Evict the cache every n rows.
Review comment (Member):
Should eviction be triggered by counting input rows or output rows? 🤔

Reply (Contributor Author):
There is really no perfect choice here. We just hope this change mitigates the issue rather than fully resolving it.

Counting input rows works here because each input row corresponds to one cache entry. If OOM still happens in some cases, we should consider decreasing `EVICT_EVERY_N_ROWS` (even to 1).
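To illustrate how the threshold controls eviction frequency, here is a minimal sketch that simulates the per-row counter in isolation. `count_evictions` is a hypothetical helper, not part of the codebase, and it assumes exactly one counter increment per input row, as in `evict_cache`.

```rust
/// Hypothetical helper: simulate the per-row counter used by `evict_cache`
/// and return how many times eviction would fire for `total_rows` input rows.
fn count_evictions(total_rows: u32, evict_every_n_rows: u32) -> u32 {
    assert!(evict_every_n_rows > 0, "threshold must be positive");
    let mut cnt_rows_received: u32 = 0;
    let mut evictions = 0;
    for _ in 0..total_rows {
        // One increment per input row, then evict and reset at the threshold.
        cnt_rows_received += 1;
        if cnt_rows_received == evict_every_n_rows {
            evictions += 1;
            cnt_rows_received = 0;
        }
    }
    evictions
}

fn main() {
    // With a threshold of 1024, 10_000 rows trigger 9 evictions.
    println!("{}", count_evictions(10_000, 1024));
    // Lowering the threshold to 1 evicts on every row.
    println!("{}", count_evictions(10_000, 1));
}
```

Lowering the threshold trades more frequent (and therefore more costly) eviction calls for a tighter bound on how far the cache can overshoot its target between evictions.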

Reply (@fuyufjh, Member, Mar 23, 2023):
Actually, I think evicting on every chunk is good enough 🥵

The counting approach certainly works, but evicting per chunk is the simplest one.

Reply (Contributor Author):
One of the motivations for `EVICT_EVERY_N_ROWS` is to decouple eviction timing from chunk size.
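The chunk-size argument can be checked with a small simulation. This is a hypothetical sketch (neither function exists in the codebase) that assumes a stream of fixed-size chunks: per-chunk eviction fires more often as chunks shrink, while a row counter fires at the same rate regardless of chunk size.

```rust
/// Hypothetical: evict once per chunk, including a final partial chunk.
fn per_chunk_evictions(total_rows: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be positive");
    (total_rows + chunk_size - 1) / chunk_size
}

/// Hypothetical: evict every `n` rows, with the counter carried across chunks.
fn per_n_rows_evictions(total_rows: usize, chunk_size: usize, n: usize) -> usize {
    assert!(chunk_size > 0 && n > 0, "chunk size and n must be positive");
    let mut counter = 0;
    let mut evictions = 0;
    let mut remaining = total_rows;
    while remaining > 0 {
        let rows_in_chunk = remaining.min(chunk_size);
        for _ in 0..rows_in_chunk {
            counter += 1;
            if counter == n {
                evictions += 1;
                counter = 0;
            }
        }
        remaining -= rows_in_chunk;
    }
    evictions
}

fn main() {
    for chunk_size in [16, 256, 4096] {
        println!(
            "chunk_size={chunk_size}: per-chunk={}, every-1024-rows={}",
            per_chunk_evictions(8192, chunk_size),
            per_n_rows_evictions(8192, chunk_size, 1024),
        );
    }
}
```

For 8192 rows, per-chunk eviction runs 512 times with 16-row chunks but only twice with 4096-row chunks, while counting rows yields 8 evictions in every case.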

const EVICT_EVERY_N_ROWS: u32 = 1024;

#[allow(non_snake_case, non_upper_case_globals)]
pub mod JoinType {
use super::JoinTypePrimitive;
@@ -242,6 +246,8 @@ pub struct HashJoinExecutor<K: HashKey, S: StateStore, const T: JoinTypePrimitiv
metrics: Arc<StreamingMetrics>,
/// The maximum size of the chunk produced by executor at a time
chunk_size: usize,
/// Counts the rows received; reset to 0 once it reaches `EVICT_EVERY_N_ROWS`.
cnt_rows_received: u32,

/// watermark column index -> `BufferedWatermarks`
watermark_buffers: BTreeMap<usize, BufferedWatermarks<SideTypePrimitive>>,
@@ -603,6 +609,7 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,
append_only_optimize,
metrics,
chunk_size,
cnt_rows_received: 0,
watermark_buffers,
}
}
@@ -662,6 +669,7 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,
chunk,
self.append_only_optimize,
self.chunk_size,
&mut self.cnt_rows_received,
) {
left_time += left_start_time.elapsed();
yield Message::Chunk(chunk?);
@@ -687,6 +695,7 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,
chunk,
self.append_only_optimize,
self.chunk_size,
&mut self.cnt_rows_received,
) {
right_time += right_start_time.elapsed();
yield Message::Chunk(chunk?);
@@ -752,14 +761,23 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,
// `commit` them here.
self.side_l.ht.flush(epoch).await?;
self.side_r.ht.flush(epoch).await?;

// We need to manually evict the cache to the target capacity.
self.side_l.ht.evict();
self.side_r.ht.evict();

Ok(())
}

// We need to manually evict the cache.
fn evict_cache(
side_update: &mut JoinSide<K, S>,
side_match: &mut JoinSide<K, S>,
cnt_rows_received: &mut u32,
) {
*cnt_rows_received += 1;
if *cnt_rows_received == EVICT_EVERY_N_ROWS {
side_update.ht.evict();
side_match.ht.evict();
*cnt_rows_received = 0;
}
}
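The counter logic in `evict_cache` can be exercised without the real executor. The sketch below rests on stand-ins: `MockHashTable` and `MockJoinSide` are hypothetical types that only record `evict` calls, replacing the real `JoinSide` and its hash table, so it demonstrates the counting and reset behavior, not actual cache shrinking.

```rust
const EVICT_EVERY_N_ROWS: u32 = 1024;

/// Hypothetical stand-in for the join hash table: it only records how often
/// `evict` is called instead of shrinking a real LRU cache.
#[derive(Default)]
struct MockHashTable {
    evict_calls: u32,
}

impl MockHashTable {
    fn evict(&mut self) {
        self.evict_calls += 1;
    }
}

/// Hypothetical stand-in for `JoinSide`, keeping only the `ht` field.
#[derive(Default)]
struct MockJoinSide {
    ht: MockHashTable,
}

/// Mirrors the shape of `evict_cache` above: bump the shared counter once per
/// row and evict both sides when it reaches the threshold.
fn evict_cache(
    side_update: &mut MockJoinSide,
    side_match: &mut MockJoinSide,
    cnt_rows_received: &mut u32,
) {
    *cnt_rows_received += 1;
    if *cnt_rows_received == EVICT_EVERY_N_ROWS {
        side_update.ht.evict();
        side_match.ht.evict();
        *cnt_rows_received = 0;
    }
}

fn main() {
    let (mut left, mut right) = (MockJoinSide::default(), MockJoinSide::default());
    let mut cnt = 0u32;
    // Simulate 3000 rows arriving on the left input.
    for _ in 0..3000 {
        evict_cache(&mut left, &mut right, &mut cnt);
    }
    // 3000 rows: 2 evictions per side, with 3000 - 2 * 1024 = 952 rows pending.
    println!("{} {} {}", left.ht.evict_calls, right.ht.evict_calls, cnt);
}
```

Both sides are evicted together, so a row arriving on one input also trims the other input's cache.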

fn handle_watermark(
&mut self,
side: SideTypePrimitive,
@@ -850,6 +868,7 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,
chunk: StreamChunk,
append_only_optimize: bool,
chunk_size: usize,
cnt_rows_received: &'a mut u32,
) {
let chunk = chunk.compact();

@@ -870,6 +889,8 @@ impl<K: HashKey, S: StateStore, const T: JoinTypePrimitive> HashJoinExecutor<K,

let keys = K::build(&side_update.join_key_indices, chunk.data_chunk())?;
for ((op, row), key) in chunk.rows().zip_eq_debug(keys.iter()) {
Self::evict_cache(side_update, side_match, cnt_rows_received);

let matched_rows: Option<HashValueType> =
Self::hash_eq_match(key, &mut side_match.ht).await?;
match op {
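Because `cnt_rows_received` lives on the executor and is passed by `&mut` into each chunk's processing (from both the left and right input branches), the count carries over chunk boundaries instead of restarting per chunk. A hypothetical skeleton with a tiny threshold of 4 shows this; `Executor`, `on_chunk`, and `evicted_after_rows` are illustrative names only, not the real executor API.

```rust
/// Illustrative threshold, much smaller than the real 1024.
const EVICT_EVERY_N_ROWS: u32 = 4;

/// Hypothetical executor skeleton holding the shared counter, as
/// `HashJoinExecutor` does with `cnt_rows_received`.
struct Executor {
    cnt_rows_received: u32,
    total_rows: usize,
    evicted_after_rows: Vec<usize>,
}

impl Executor {
    fn new() -> Self {
        Self { cnt_rows_received: 0, total_rows: 0, evicted_after_rows: Vec::new() }
    }

    /// Per-chunk processing receives `&mut u32` to the executor's counter,
    /// mirroring the `cnt_rows_received: &'a mut u32` parameter above.
    fn process_chunk(cnt: &mut u32, total: &mut usize, log: &mut Vec<usize>, rows: &[i64]) {
        for _row in rows {
            *total += 1;
            *cnt += 1;
            if *cnt == EVICT_EVERY_N_ROWS {
                // Eviction would happen here, before this row is matched.
                log.push(*total);
                *cnt = 0;
            }
            // ... probe the other side's cache and update this side's cache ...
        }
    }

    fn on_chunk(&mut self, rows: &[i64]) {
        Self::process_chunk(
            &mut self.cnt_rows_received,
            &mut self.total_rows,
            &mut self.evicted_after_rows,
            rows,
        );
    }
}

fn main() {
    let mut exec = Executor::new();
    // Three chunks of 3 rows, e.g. alternating between left and right inputs.
    exec.on_chunk(&[1, 2, 3]);
    exec.on_chunk(&[4, 5, 6]);
    exec.on_chunk(&[7, 8, 9]);
    // The counter carries over chunk boundaries: evictions fire at rows 4 and 8.
    println!("{:?}", exec.evicted_after_rows);
}
```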