-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Serde Integration (~80% faster) #4861
Conversation
match i64::try_from(v) { | ||
Ok(v) => self.serialize_i64(v), | ||
Err(_) => { | ||
let mut buffer = [0_u8; u64::FORMATTED_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional complexity to support rountripping values u64::MAX > v > i64::MAX
seemed not worth it, so we just fallback to serializing to a string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you imagine doing it via i128
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, or as a u64 variant. Given JSON only reliably roundtrips f64, I don't think this is a very common use-case worth optimising for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @tustvold
/// Formatting to a string only to parse it back again is rather wasteful, | ||
/// it may be possible to tweak the tape representation to avoid this | ||
/// | ||
/// Need to use macro as const generic expressions are unstable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
match i64::try_from(v) { | ||
Ok(v) => self.serialize_i64(v), | ||
Err(_) => { | ||
let mut buffer = [0_u8; u64::FORMATTED_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you imagine doing it via i128
or something?
@@ -54,6 +55,25 @@ pub enum TapeElement { | |||
/// | |||
/// Contains the offset into the [`Tape`] string data | |||
Number(u32), | |||
|
|||
/// The high bits of a i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
TapeElement::StartList(end_idx) => Ok(end_idx + 1), | ||
TapeElement::StartObject(end_idx) => Ok(end_idx + 1), | ||
_ => Err(self.error(cur_idx, expected)), | ||
TapeElement::EndObject(_) | TapeElement::EndList(_) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing the catch all
* Store decoded numerics in JSON tape * Add arrow-json serde benchmarks * Fix timestamp serialize * Clippy
Which issue does this PR close?
Closes #.
Rationale for this change
Encoding numerics directly in the tape drastically improves the performance of the serde integration.
It additionally opens the door to eager parsing in the future, which may yield performance improvements for regular JSON decoding.
I have confirmed this does not regress the performance of the JSON decoder
What changes are included in this PR?
Are there any user-facing changes?