Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes more ion-tests for NativeReader #377

Merged
merged 7 commits into from
May 11, 2022
Merged

Fixes more ion-tests for NativeReader #377

merged 7 commits into from
May 11, 2022

Conversation

zslayton
Copy link
Contributor

This PR depends on #376. Once that has been merged, I'll rebase this on top of main.


Fixes include:

  • NativeReader now loads struct fields correctly.
  • RawBinaryReader::ion_type will return None if the current
    item is StreamItem::Nothing. Previously it would return the
    IonType of the most recently encountered value.
  • The text parser will allow spaces between an annotations sequence
    and the annotated value. (foo::bar:: 5)
  • Clobs using long string syntax now respect escapes that appear
    mid-fragment.
  • Long string fragments can have comments in the interstitial space.
    ('''Hello''' /*comment*/ ''', world!''')

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

zslayton added 3 commits May 9, 2022 17:58
Fixes include:
* Blobs can have whitespace between chunks of base64 characters.
  Example: `{{aGVsbG8=}}` is the same as `{{  a G Vsb G8 = }}`.
* Clobs can have whitespace before and after their quoted text.
  Example: `{{"hello"}}` is the same as `{{  "hello"  }}`.
* Clobs (which are text of an unspecified encoding) no longer
  erroneously support Unicode escape sequences (\u and \U).
* S-expressions no longer require whitespace between self-delimiting
  values. Example: `(+++foo)` parses successfully as an s-expression
  with an operator, `+++`, and a symbol, `foo`.
* S-expression operators followed by negative numbers no longer
  fail to parse. Examples:
   * `-3` is a negative int
   * `--3` is an operator, `--`, followed by an int, `3`.
* Struct field names can be long strings. `{'''a''': foo}`
  is the same as `{a: foo}`.
* Decimal and float exponents can start with a `+`.
* Decimal and float exponents can start with multiple leading zeros.
* All escaped newlines (`\` followed by unescaped \r or \r\n) are
  now discarded. (Previously, only `\` followed by \n was discarded).
* Skip lists have been updated to reflect the current functionality.
Fixes include:
* `NativeReader` now loads struct fields correctly.
* `RawBinaryReader::ion_type` will return `None` if the current
  item is `StreamItem::Nothing`. Previously it would return the
  `IonType` of the most recently encountered value.
* The text parser will allow spaces between an annotations sequence
  and the annotated value. (`foo::bar::   5`)
* Clobs using long string syntax now respect escapes that appear
  mid-fragment.
* Long string fragments can have comments in the interstitial space.
  (`'''Hello''' /*comment*/ ''', world!''''`)
@codecov
Copy link

codecov bot commented May 10, 2022

Codecov Report

Merging #377 (66cfcb4) into main (3dcd1b6) will increase coverage by 0.04%.
The diff coverage is 96.00%.

@@            Coverage Diff             @@
##             main     #377      +/-   ##
==========================================
+ Coverage   91.57%   91.61%   +0.04%     
==========================================
  Files          75       75              
  Lines       11099    11121      +22     
==========================================
+ Hits        10164    10189      +25     
+ Misses        935      932       -3     
Impacted Files Coverage Δ
src/value/native_reader.rs 97.22% <93.75%> (-1.24%) ⬇️
src/binary/raw_binary_reader.rs 93.81% <100.00%> (+0.32%) ⬆️
src/text/parsers/annotations.rs 96.55% <100.00%> (ø)
src/text/parsers/clob.rs 100.00% <100.00%> (ø)
src/text/parsers/string.rs 98.07% <100.00%> (ø)
src/types/timestamp.rs 92.16% <0.00%> (-0.14%) ⬇️
src/stream_reader.rs 100.00% <0.00%> (ø)
src/binary/raw_binary_writer.rs 95.01% <0.00%> (+0.13%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3dcd1b6...66cfcb4. Read the comment docs.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

@@ -295,7 +295,7 @@ impl<R: IonDataSource> StreamReader for RawBinaryReader<R> {
// If the cursor is nested inside a parent object, don't attempt to read beyond the end of
// the parent. Users can call '.step_out()' to progress beyond the container.
if self.cursor.bytes_read >= parent.value_end_exclusive() {
return Ok(RawStreamItem::Nothing);
return Ok(self.set_current_item(RawStreamItem::Nothing));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ When the reader encountered the end of a container, it would return StreamItem::Nothing without recording that as the current StreamItem. This meant that later calls to methods like ion_type() might operate on stale data.

match self.current() {
Value(ion_type) | Null(ion_type) => Some(ion_type),
_ => None,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ self.cursor.value.header.ion_type is the IonType of the most recently encountered value in the stream. It is not overwritten when the reader encounters a non-Value like an IVM or the end of a container. This method now reports the IonType based on the reader's current RawStreamItem.

pub(crate) fn parse_annotations(input: &str) -> IResult<&str, Vec<RawSymbolToken>> {
many1(parse_annotation)(input)
terminated(many1(parse_annotation), opt(whitespace))(input)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Whitespace can now appear between the last annotation delimiter (::) and the annotated value.

foo::bar::      5
foo::bar::
5
foo     ::     bar     :: 


5

})(input)
// Matches the head of the string up to the next `'''` or `\`.
// Will not match if the `'''` or `\` is at the very beginning of the string.
long_string_fragment_without_escaped_text(input)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ For the moment, this logic is identical to that used for long strings. This may change in a future PR as I believe there are additional escape sequence rules to consider for clob (such as newline normalization). For now, this is more correct than what it replaces.

@@ -36,7 +36,7 @@ fn long_string(input: &str) -> IResult<&str, TextValue> {
terminated(
many1(terminated(
delimited(tag("'''"), long_string_body, tag("'''")),
opt(whitespace),
opt(whitespace_or_comments),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Long strings can have comments between fragments.

'''Hello''' /*comment 1*/
// comment 2
''' world!'''

fn long_string_fragment_without_escaped_text(input: &str) -> IResult<&str, StringFragment> {
pub(in crate::text::parsers) fn long_string_fragment_without_escaped_text(
input: &str,
) -> IResult<&str, StringFragment> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ cargo fmt

/// If there are no more values at this level, returns `Ok(None)`.
/// If an error occurs while materializing the value, returns an `Err`.
/// Advances the reader to the next value in the stream and uses [Self::materialize_current]
/// to materialize it.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The original implementation of materialize_next() would advance the reader to the next item in the stream and then consume it recursively.

When the reader went to read struct fields, it would call materialize_next() and then try to read the field name. This failed for container values because materialize_next() would advance the cursor beyond the current field in the process of materializing it.

I broke materialize_next() into two methods: materialize_current(), which materializes the current item in the stream without first advancing the cursor, and materialize_next(), which is a convenience function that calls reader.next() and then materialize_current(). This allows materialize_struct to advance the reader to the next item in the stream once, read its field name, and then consume it via materialize_current.

@zslayton zslayton requested review from jobarr-amzn and desaikd May 10, 2022 16:01
@zslayton zslayton mentioned this pull request May 11, 2022
child_elements.push((owned::text_token(field.as_ref()), element));
let value = self
.materialize_current()?
.expect("materialize_current() returned None for user data");
Copy link
Contributor

@desaikd desaikd May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:
This expect panics because there was no value attached to given field_name?
If yes, maybe more descriptive message here:

Suggested change
.expect("materialize_current() returned None for user data");
.expect(format!("Field: {} has no value attached to it", field));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expect panics because there was no value attached to given field_name?

No, it expects because we've already proven that there was a nullable value and that there weren't any errors while reading:

        // v-- Here we confirm that there's definitely either a value or a null.
        while let StreamItem::Value(_) | StreamItem::Null(_) = self.reader.next()? {
            let field = self.reader.field_name()?;
            child_elements.push((owned::text_token(field.as_ref()), element));
            let value = self
                .materialize_current()? // <-- here we use `?` to handle any errors
                // At this point, there's definitely a value (or null) and there weren't any errors...
                .expect("materialize_current() returned None for user data");
                // ...so we can use `expect` to dismiss the `None` case.

I agree that the error message could be clearer; I'll fix it in a follow-on PR to avoid complicating the rebases I need to do in other PRs.

Base automatically changed from test-fixes to main May 11, 2022 18:32
@zslayton zslayton merged commit 3ef4b55 into main May 11, 2022
@zslayton zslayton deleted the struct-fixes branch May 11, 2022 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants