Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GODRIVER-1235 Skip embedded documents and arrays correctly in extJSONValueReader #544

Merged
merged 10 commits into from
Dec 9, 2020
Merged

GODRIVER-1235 Skip embedded documents and arrays correctly in extJSONValueReader #544

merged 10 commits into from
Dec 9, 2020

Conversation

benjirewis
Copy link
Contributor

@benjirewis benjirewis commented Nov 30, 2020

GODRIVER-1235

Adds logic to correctly skip embedded documents and embedded arrays in extJSONValueReader by creating one function to skip both iteratively using the parser's advanceState() function. This addresses errors in unmarshalling extended JSON with undefined fields with nested documents or arrays.

Adds tests for unmarshalling extended JSON with undefined fields in unmarshal_test.go.

@benjirewis benjirewis marked this pull request as ready for review November 30, 2020 15:26
@benjirewis benjirewis requested a review from iwysiu December 2, 2020 20:08
@benjirewis benjirewis changed the title GODRIVER-1235 Skip embedded documents correctly in extJSONValueReader GODRIVER-1235 Skip embedded documents and arrays correctly in extJSONValueReader Dec 2, 2020
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I just have a question about whether we can use peekType to skip documents and request some additional tests.

bson/bsonrw/extjson_reader.go Show resolved Hide resolved
_, err = ejvr.p.peekType()
typ, err = ejvr.p.peekType()
// account for embedded arrays
if typ == bsontype.Array {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also need to check for bsontype.Document and call skipDocument? What if this is unmarshaling an array of documents?

Edit: it appears the thorough tests you wrote already check that! Looking a little deeper, I think extJSONParser.peekType, unlike extJSONParser.readValue will recurse into the document. If that is the case, could we use peekType to implement skipDocument? Maybe that is a little cheaper than readValue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! The skipObject() function now skips both documents and arrays and uses only peekType().

bson/unmarshal_test.go Show resolved Hide resolved
@benjirewis benjirewis requested a review from kevinAlbs December 3, 2020 21:51
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterative approach is nicer! I think we can avoid peekType entirely. Let me know if you'd like to discuss.

for err == nil {
_, err = ejvr.p.peekType()
if err == ErrEOA || err == ErrEOD {
err = nil
if ejvr.p.depth == initialDepth-1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, I pointed you in the wrong direction. Looking at advanceState, depth only applies to documents, not arrays. Sorry about that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right, all good!

@@ -104,6 +104,13 @@ func (ejp *extJSONParser) peekType() (bsontype.Type, error) {
case jpsSawEndArray:
// this would only be a valid state if we were in array mode, so return end-of-array error
err = ErrEOA
case jpsSawEndObject:
if ejp.peekMode() == jpmObjectMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over this again, I still feel like this is a little fragile. I think I'd prefer not modifying peekType if possible.

The comment on L105 justifies why the peekMode() is not checked for the array case. So, I thought it would be safe to remove this condition, but that led me astray.

Here is my rough understanding.

extJSONParser maintains a stack of modes for parsing nested arrays and objects. advanceState calls pushMode when encountering an [ or { token, and calls popMode when encountering a ] or } token to check that brackets are balanced and correct.

So when advanceState returns jpsSawEndObject, the mode stack for the object that was being read was just popped. Consequently, I think this condition is only true when popping a nested object. For example:

[ { "a" : { "b": 1 } } ]

When reading the first } this condition will be true.
But when reading the second }, we'll pop the jpmObjectMode. The mode stack will contain one jpmArrayMode. So this condition is false.

In summary, I don't quite understand the need for the check, and I am not sure why removing this check breaks tests.

There are other oddities about peekType (e.g. not all states are handled in this switch, so I think peekType just returns the 0 type in those cases...).

So unless we have a solid understanding, I'd like to vote for another approach. I think we can bypass peekType entirely by using advanceState (explained below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points, and thank you for looking so in-depth into peekType(). I think I agree, and I've switched to using advanceState() with the algorithm you described below. The only thing we lose is the error-checking in peekType(); advanceState() advances forward blindly and does not return errors, so now neither does skipObject(). There are some disadvantages to this I may bring up offline...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, I am not too concerned with losing some skip validation on malformed BSON. The previous behavior was incorrect on valid BSON.


func (ejvr *extJSONValueReader) skipArray() error {
// read entire array until ErrEOA (using peekType)
func (ejvr *extJSONValueReader) skipObject() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can tweak this to only use advanceState.

The initial state, ejvr.p.s should to be jpsSawBeginArray or jpsSawBeginObject at the beginning of this function. I think we can do something like:

  • initialize count to 1
  • call advanceState
  • if the state is jpsSawBeginArray or jpsSawBeginObject, increment count
  • if the state is jpsSawEndArray or jpsSawEndObject, decrement count
  • repeat until count is 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! That algorithm seems to work perfectly for all existing examples.

@kevinAlbs kevinAlbs self-requested a review December 7, 2020 16:33
Copy link
Contributor

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

bson/bsonrw/extjson_reader.go Show resolved Hide resolved
@@ -104,6 +104,13 @@ func (ejp *extJSONParser) peekType() (bsontype.Type, error) {
case jpsSawEndArray:
// this would only be a valid state if we were in array mode, so return end-of-array error
err = ErrEOA
case jpsSawEndObject:
if ejp.peekMode() == jpmObjectMode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, I am not too concerned with losing some skip validation on malformed BSON. The previous behavior was incorrect on valid BSON.

Copy link
Contributor

@divjotarora divjotarora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic! The code changes all LGTM and I just have a couple of minor comments on the tests. I'll hold off on approving until those are done, but great work!

bson/unmarshal_test.go Outdated Show resolved Hide resolved
bson/unmarshal_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@divjotarora divjotarora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work!

@benjirewis benjirewis merged commit cbd4540 into mongodb:master Dec 9, 2020
@benjirewis benjirewis deleted the godriver1235 branch December 9, 2020 16:40
benjirewis added a commit that referenced this pull request Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants