common: value encoding for list and struct is slower than key encoding #5713

Closed
Gun9niR opened this issue Oct 8, 2022 · 4 comments
Labels
component/common (common components, such as array, data chunk, expression), type/perf

Comments

@Gun9niR
Contributor

Gun9niR commented Oct 8, 2022

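In #5165, we benchmarked serialization/deserialization (ser/de) of key encoding and value encoding, and found that value-encoding ser/de for List and Struct is slower than key encoding. The performance gap comes from the use of protobuf: Struct and List values are first converted into an owned protobuf byte buffer, which is then copied into the output behind a length prefix.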

```rust
// Value encoding: fixed-width scalars are written little-endian directly into `buf`.
// Helpers such as `serialize_str` and `serialize_decimal` are defined elsewhere in the crate.
fn serialize_value(value: ScalarRefImpl<'_>, mut buf: impl BufMut) {
    match value {
        ScalarRefImpl::Int16(v) => buf.put_i16_le(v),
        ScalarRefImpl::Int32(v) => buf.put_i32_le(v),
        ScalarRefImpl::Int64(v) => buf.put_i64_le(v),
        ScalarRefImpl::Float32(v) => buf.put_f32_le(v.into_inner()),
        ScalarRefImpl::Float64(v) => buf.put_f64_le(v.into_inner()),
        ScalarRefImpl::Utf8(v) => serialize_str(v.as_bytes(), buf),
        ScalarRefImpl::Bool(v) => buf.put_u8(v as u8),
        ScalarRefImpl::Decimal(v) => serialize_decimal(&v, buf),
        ScalarRefImpl::Interval(v) => serialize_interval(&v, buf),
        ScalarRefImpl::NaiveDate(v) => serialize_naivedate(v.0.num_days_from_ce(), buf),
        ScalarRefImpl::NaiveDateTime(v) => {
            serialize_naivedatetime(v.0.timestamp(), v.0.timestamp_subsec_nanos(), buf)
        }
        ScalarRefImpl::NaiveTime(v) => {
            serialize_naivetime(v.0.num_seconds_from_midnight(), v.0.nanosecond(), buf)
        }
        // Struct and List go through protobuf: the value is encoded into an
        // owned Vec<u8> first, and that Vec is then copied into `buf`.
        ScalarRefImpl::Struct(s) => {
            serialize_struct_or_list(s.to_protobuf_owned(), buf);
        }
        ScalarRefImpl::List(list) => {
            serialize_struct_or_list(list.to_protobuf_owned(), buf);
        }
    }
}

// Writes a u32 little-endian length prefix, then copies the protobuf bytes.
fn serialize_struct_or_list(bytes: Vec<u8>, mut buf: impl BufMut) {
    buf.put_u32_le(bytes.len() as u32);
    buf.put_slice(bytes.as_slice());
}
```
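The Struct/List branch above pays for an extra allocation (the owned `Vec<u8>`) plus a copy into `buf` for every value. A minimal sketch of one way to skip the intermediate buffer, assuming the output is a `Vec<u8>` (a generic `impl BufMut` is append-only, so it cannot patch bytes it has already written): reserve the 4-byte prefix, encode the payload in place, then backpatch the length. `encode_list_in_place` and the `i32` element payload are hypothetical stand-ins, not RisingWave code.

```rust
/// Hypothetical: writes a u32 little-endian length prefix followed by the
/// payload, encoding the payload directly into `buf` instead of building a
/// temporary Vec<u8> and copying it.
fn encode_list_in_place(values: &[i32], buf: &mut Vec<u8>) {
    // Reserve space for the length prefix; it is filled in afterwards.
    let prefix_pos = buf.len();
    buf.extend_from_slice(&[0u8; 4]);

    // Encode the payload straight into the output buffer.
    let payload_start = buf.len();
    for &v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }

    // Backpatch the prefix with the payload length in bytes.
    let payload_len = (buf.len() - payload_start) as u32;
    buf[prefix_pos..prefix_pos + 4].copy_from_slice(&payload_len.to_le_bytes());
}
```

If the project's protobuf library can report a message's encoded size up front (prost's `Message::encoded_len`, for example), the same effect is possible with a plain `BufMut`: write the length first, then encode the message directly into `buf`.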

Possible solutions:

  • Just use key encoding instead
  • Explore more efficient interfaces of protobuf
  • Find some other data format
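As a rough illustration of the last option, here is a minimal sketch of a hand-rolled, protobuf-free layout for a list of `i32`: a `u32` element count followed by little-endian elements. The layout and the names `encode_i32_list` / `decode_i32_list` are hypothetical and not a format the project adopted. Storing the count first lets the decoder size its output once instead of growing it while parsing.

```rust
/// Hypothetical layout: [count: u32 LE][elem_0: i32 LE] ... [elem_{n-1}: i32 LE]
fn encode_i32_list(values: &[i32], buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(values.len() as u32).to_le_bytes());
    for &v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
}

/// Decodes a list written by `encode_i32_list`. Returns the values and the
/// number of bytes consumed, or `None` if the input is truncated.
fn decode_i32_list(input: &[u8]) -> Option<(Vec<i32>, usize)> {
    let c = input.get(0..4)?;
    let count = u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as usize;
    // Total encoded size, guarding against overflow on corrupt counts.
    let end = 4usize.checked_add(count.checked_mul(4)?)?;
    let body = input.get(4..end)?;

    // The element count is known up front, so allocate exactly once.
    let mut out = Vec::with_capacity(count);
    for chunk in body.chunks_exact(4) {
        out.push(i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Some((out, end))
}
```

Returning the consumed byte count lets a caller decode several such values back to back from one buffer.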

Related context:

@Gun9niR added the type/feature, component/common, and type/perf labels on Oct 8, 2022
@github-actions bot added this to the release-0.1.14 milestone on Oct 8, 2022
@xiangjinwu
Contributor

Should be fixed by #4672

@waruto210
Contributor

#5880 greatly improved the performance of value-encoding ser/de for List and Struct, but deserialization of value-encoded List is still slower than key encoding.
Should I close this issue?

@Gun9niR
Contributor Author

Gun9niR commented Oct 17, 2022

Should we investigate why de is still slower before closing this issue?

@waruto210
Contributor

> Should we investigate why de is still slower before closing this issue?

According to #5897, this is likely caused by lazy allocation.


3 participants