-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements writing e-expressions in binary 1.1 #722
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
@@ -71,8 +71,9 @@ num-integer = "0.1.44" | |||
num-traits = "0.2" | |||
arrayvec = "0.7" | |||
smallvec = {version ="1.9.0", features = ["const_generics"]} | |||
bumpalo = {version = "3.14.0", features = ["collections", "std"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Version 3.15.3
includes Vec
optimizations that led to substantial speed-ups:
@@ -142,7 +142,6 @@ pub fn criterion_benchmark(c: &mut Criterion) { | |||
} | |||
|
|||
fn roundtrip_var_uint_test(unsigned_values: &[u64]) -> IonResult<Vec<u8>> { | |||
println!("Roundtripping unsigned values as VarUInts to check for correctness."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ These println!
statements were being run even when the associated benchmark was not, leading to weird output.
use ion_rs::lazy::encoder::value_writer::{StructWriter, ValueWriter}; | ||
use ion_rs::RawSymbolTokenRef; | ||
|
||
fn write_struct_with_string_values(value_writer: impl ValueWriter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Each of these methods takes an impl ValueWriter
, allowing us to pass in any writer in the set of {text, binary} x {v1.0, v1.1}
and ensuring that the same logic is used for each.
//! This program demonstrates implementing WriteAsIon using Ion 1.1's e-expressions for a more | ||
//! compact encoding. It uses raw-level writer APIs that end users are unlikely to leverage. | ||
//! Ion 1.1 is not yet finalized; the encoding produced by this example and the APIs it uses | ||
//! are very likely to change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This program generates reasonable example data wherein each struct has the same general layout and values are likely to recur. Its output can be used for data size and compression testing.
// is 10 bytes. | ||
|
||
if bytes_available < 10 { | ||
cold_path! {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This code path is for encoding FlexUInt
s that require 9 or 10 bytes to encode, which is exceedingly rare. There's no logic change here, just indentation for cold_path!
.
pub trait MacroArgsWriter: SequenceWriter { | ||
// TODO: methods for writing tagless encodings | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ For the moment, the MacroArgsWriter
only supports tagged arguments.
// The ValueWriter implementation of these methods moves `self`. In contrast, all of the methods | ||
// in the SequenceWriter interface take `&mut self`, which adds another lifetime to the mix. The | ||
// borrow checker is not currently able to tell that `&mut self`'s lifetime will outlive the | ||
// closure argument's. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I took another crack at fixing this only to remember why it wasn't possible. 😅
@@ -31,3 +37,128 @@ impl<'top, D: LazyDecoder> From<Never> for MacroExpr<'top, D> { | |||
unreachable!("macro in Ion 1.0 (method: into)") | |||
} | |||
} | |||
|
|||
impl AnnotatableValueWriter for Never { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Implementing all of these traits for Never
allows it to be used to satisfy macro-related associated types in Ion 1.0, statically preventing those types/methods from being used.
@@ -22,15 +22,15 @@ use std::fmt::{Display, Formatter}; | |||
use crate::types::Str; | |||
/// Configures and constructs new instances of [Reader]. | |||
pub struct ReaderBuilder { | |||
// This will be set to to `Some` catalog when the reader builder is created with catalog information. | |||
// Default value for this field will be set to `None`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This field used to be an Option
. I've updated the comment in the constructor.
digest = { version = "0.9", optional = true } | ||
ice_code = "0.1.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we can trust the author of this dependency? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pretty shady to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that's why it's so cold.
@@ -1,13 +1,14 @@ | |||
use bumpalo::collections::Vec as BumpVec; | |||
use bumpalo::Bump as BumpAllocator; | |||
use delegate::delegate; | |||
use ice_code::ice as cold_path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI—I think it's fine to use ice! { ... }
without renaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good feedback, thanks. I was wondering if I should push the name cold_path
upstream since it's a bit more self-documenting (albeit less fun 😛). I'll leave this for the moment since it's a pretty easy find/replace to change later. @nirosys, do you have an opinion one way or another on the macro name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the self-documenting nature of cold_path
, since it's a bit easier to understand without familiarity of what ice_cold does. But, I don't feel too strongly about it, since it does rely on knowing what a cold path is to be more understandable.
SymbolId(sid) if self.flex_uint_encoding => { | ||
FlexUInt::write_u64(self.buffer(), sid as u64)?; | ||
// and then opcode 0x90, which indicates symbol ID 0. | ||
cold_path! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question—while this one instance of using ice
/cold_path
might not make much difference on its own, do we need to be concerned about this macro inflating the compiled code?
I.e.: Is cold_path!
going to inflate the size of the compiled code as compared to something like having a function like this so that ice
/cold_path
doesn't generate a new function for both the 0x80
and 0x90
cases?
#[inline(never)]
fn cold_extend_from_slice(self, slice: &[u8]) -> IonResult<()> {
self.buffer().extend_from_slice_copy(slice)
}
cold_path! {{ | ||
let _ = Self::write_large_i64(output, value, encoded_size_in_bytes); | ||
}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, any particular reason we need to use cold_path!
instead of just marking write_large_i64
as #[inline(never)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downsides of marking the callee #inline(never)
are:
- That information isn't visible at the callsite, so a reader would need to go investigate it separately.
- It prevents that method from being used in non-cold paths. (This isn't a problem for this method in particular since it's never going to be on the hot path, but it is an issue more generally.)
- If any of the method's arguments are statically known at the call site,
cold_path!
bakes them into the resulting function so the assembly doesn't have to push or pop them from the call stack. - There may be other functionality that's part of the cold path but not part of the method. For example:
fn foo(&self) -> IonResult<()> {
if self.something() {
return Ok(());
}
cold_path! {{
// This fn contains most of the cold path logic...
fallible_fn()
// ...but we need to map its result to an IonResult
.map_err(IonError::from)
}}
}
Adds write support for e-expressions in the binary writer.
Additionally:
write_many_structs
benchmark that compares time to write structs in Ion 1.0, Ion 1.1 w/length prefixes, Ion 1.1 w/delimited containers, and Ion 1.1 w/macros.ice_code
as a dependency to label cold code paths without having to construct new methods.delegate_value_writer_to!
macro that cuts the number of places theValueWriter
trait method collection is written out, simplifying refactoring.Related to #671.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.