-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes more ion-tests for NativeReader
#377
Conversation
Fixes include: * Blobs can have whitespace between chunks of base64 characters. Example: `{{aGVsbG8=}}` is the same as `{{ a G Vsb G8 = }}`. * Clobs can have whitespace before and after their quoted text. Example: `{{"hello"}}` is the same as `{{ "hello" }}`. * Clobs (which are text of an unspecified encoding) no longer erroneously support Unicode escape sequences (\u and \U). * S-expressions no longer require whitespace between self-delimiting values. Example: `(+++foo)` parses successfully as an s-expression with an operator, `+++`, and a symbol, `foo`. * S-expression operators followed by negative numbers no longer fail to parse. Examples: * `-3` is a negative int * `--3` is an operator, `--`, followed by an int, `3`. * Struct field names can be long strings. `{'''a''': foo}` is the same as `{a: foo}`. * Decimal and float exponents can start with a `+`. * Decimal and float exponents can start with multiple leading zeros. * All escaped newlines (`\` followed by unescaped \r or \r\n) are now discarded. (Previously, only `\` followed by \n was discarded). * Skip lists have been updated to reflect the current functionality.
Fixes include: * `NativeReader` now loads struct fields correctly. * `RawBinaryReader::ion_type` will return `None` if the current item is `StreamItem::Nothing`. Previously it would return the `IonType` of the most recently encountered value. * The text parser will allow spaces between an annotations sequence and the annotated value. (`foo::bar:: 5`) * Clobs using long string syntax now respect escapes that appear mid-fragment. * Long string fragments can have comments in the interstitial space. (`'''Hello''' /*comment*/ ''', world!''''`)
Codecov Report
@@ Coverage Diff @@
## main #377 +/- ##
==========================================
+ Coverage 91.57% 91.61% +0.04%
==========================================
Files 75 75
Lines 11099 11121 +22
==========================================
+ Hits 10164 10189 +25
+ Misses 935 932 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
@@ -295,7 +295,7 @@ impl<R: IonDataSource> StreamReader for RawBinaryReader<R> { | |||
// If the cursor is nested inside a parent object, don't attempt to read beyond the end of | |||
// the parent. Users can call '.step_out()' to progress beyond the container. | |||
if self.cursor.bytes_read >= parent.value_end_exclusive() { | |||
return Ok(RawStreamItem::Nothing); | |||
return Ok(self.set_current_item(RawStreamItem::Nothing)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ When the reader encountered the end of a container, it would return StreamItem::Nothing
without recording that as the current StreamItem
. This meant that later calls to methods like ion_type()
might operate on stale data.
match self.current() { | ||
Value(ion_type) | Null(ion_type) => Some(ion_type), | ||
_ => None, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ self.cursor.value.header.ion_type
is the IonType
of the most recently encountered value in the stream. It is not overwritten when the reader encounters a non-Value
like an IVM or the end of a container. This method now reports the IonType
based on the reader's current RawStreamItem
.
pub(crate) fn parse_annotations(input: &str) -> IResult<&str, Vec<RawSymbolToken>> { | ||
many1(parse_annotation)(input) | ||
terminated(many1(parse_annotation), opt(whitespace))(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Whitespace can now appear between the last annotation delimiter (::
) and the annotated value.
foo::bar:: 5
foo::bar::
5
foo :: bar ::
5
})(input) | ||
// Matches the head of the string up to the next `'''` or `\`. | ||
// Will not match if the `'''` or `\` is at the very beginning of the string. | ||
long_string_fragment_without_escaped_text(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ For the moment, this logic is identical to that used for long strings. This may change in a future PR as I believe there are additional escape sequence rules to consider for clob
(such as newline normalization). For now, this is more correct than what it replaces.
@@ -36,7 +36,7 @@ fn long_string(input: &str) -> IResult<&str, TextValue> { | |||
terminated( | |||
many1(terminated( | |||
delimited(tag("'''"), long_string_body, tag("'''")), | |||
opt(whitespace), | |||
opt(whitespace_or_comments), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Long strings can have comments between fragments.
'''Hello''' /*comment 1*/
// comment 2
''' world!'''
fn long_string_fragment_without_escaped_text(input: &str) -> IResult<&str, StringFragment> { | ||
pub(in crate::text::parsers) fn long_string_fragment_without_escaped_text( | ||
input: &str, | ||
) -> IResult<&str, StringFragment> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ cargo fmt
/// If there are no more values at this level, returns `Ok(None)`. | ||
/// If an error occurs while materializing the value, returns an `Err`. | ||
/// Advances the reader to the next value in the stream and uses [Self::materialize_current] | ||
/// to materialize it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The original implementation of materialize_next()
would advance the reader to the next item in the stream and then consume it recursively.
When the reader went to read struct fields, it would call materialize_next()
and then try to read the field name. This failed for container values because materialize_next()
would advance the cursor beyond the current field in the process of materializing it.
I broke materialize_next()
into two methods: materialize_current()
, which materializes the current item in the stream without first advancing the cursor, and materialize_next()
, which is a convenience function that calls reader.next()
and then materialize_current()
. This allows materialize_struct
to advance the reader to the next item in the stream once, read its field name, and then consume it via materialize_current
.
child_elements.push((owned::text_token(field.as_ref()), element)); | ||
let value = self | ||
.materialize_current()? | ||
.expect("materialize_current() returned None for user data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question:
This expect
panics because there was no value
attached to given field_name
?
If yes, maybe more descriptive message here:
.expect("materialize_current() returned None for user data"); | |
.expect(format!("Field: {} has no value attached to it", field)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expect panics because there was no value attached to given field_name?
No, it expect
s because we've already proven that there was a nullable value and that there weren't any errors while reading:
// v-- Here we confirm that there's definitely either a value or a null.
while let StreamItem::Value(_) | StreamItem::Null(_) = self.reader.next()? {
let field = self.reader.field_name()?;
child_elements.push((owned::text_token(field.as_ref()), element));
let value = self
.materialize_current()? // <-- here we use `?` to handle any errors
// At this point, there's definitely a value (or null) and there weren't any errors...
.expect("materialize_current() returned None for user data");
// ...so we can use `expect` to dismiss the `None` case.
I agree that the error message could be clearer; I'll fix it in a follow-on PR to avoid complicating the rebases I need to do in other PRs.
This PR depends on #376. Once that has been merged, I'll rebase this on top of
main
.Fixes include:
NativeReader
now loads struct fields correctly.RawBinaryReader::ion_type
will returnNone
if the currentitem is
StreamItem::Nothing
. Previously it would return theIonType
of the most recently encountered value.and the annotated value. (
foo::bar:: 5
)mid-fragment.
(
'''Hello''' /*comment*/ ''', world!'''
)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.