GODRIVER-1235 Skip embedded documents and arrays correctly in extJSONValueReader #544
Looking good! I have a question about whether we can use `peekType` to skip documents, and I'd like to request some additional tests.
bson/bsonrw/extjson_reader.go
Outdated
_, err = ejvr.p.peekType()
typ, err = ejvr.p.peekType()
// account for embedded arrays
if typ == bsontype.Array {
Does this also need to check for `bsontype.Document` and call `skipDocument`? What if this is unmarshaling an array of documents?
Edit: it appears the thorough tests you wrote already check that! Looking a little deeper, I think `extJSONParser.peekType`, unlike `extJSONParser.readValue`, will recurse into the document. If that is the case, could we use `peekType` to implement `skipDocument`? Maybe that is a little cheaper than `readValue`.
Great point! The `skipObject()` function now skips both documents and arrays and uses only `peekType()`.
The iterative approach is nicer! I think we can avoid `peekType` entirely. Let me know if you'd like to discuss.
bson/bsonrw/extjson_reader.go
Outdated
for err == nil {
_, err = ejvr.p.peekType()
if err == ErrEOA || err == ErrEOD {
err = nil
if ejvr.p.depth == initialDepth-1 {
Gah, I pointed you in the wrong direction. Looking at `advanceState`, `depth` only applies to documents, not arrays. Sorry about that!
Ah, you're right, all good!
bson/bsonrw/extjson_parser.go
Outdated
@@ -104,6 +104,13 @@ func (ejp *extJSONParser) peekType() (bsontype.Type, error) {
case jpsSawEndArray:
// this would only be a valid state if we were in array mode, so return end-of-array error
err = ErrEOA
case jpsSawEndObject:
if ejp.peekMode() == jpmObjectMode {
Looking over this again, I still feel like this is a little fragile. I think I'd prefer not modifying `peekType` if possible.
The comment on L105 justifies why `peekMode()` is not checked for the array case. So, I thought it would be safe to remove this condition, but that led me astray.
Here is my rough understanding.
`extJSONParser` maintains a stack of modes for parsing nested arrays and objects. `advanceState` calls `pushMode` when encountering a `[` or `{` token, and calls `popMode` when encountering a `]` or `}` token to check that brackets are balanced and correct.
So when `advanceState` returns `jpsSawEndObject`, the mode for the object that was being read has just been popped from the stack. Consequently, I think this condition is only true when popping a nested object. For example:
[ { "a" : { "b": 1 } } ]
When reading the first `}`, this condition will be true.
But when reading the second `}`, we'll pop the `jpmObjectMode`. The mode stack will contain one `jpmArrayMode`. So this condition is false.
In summary, I don't quite understand the need for the check, and I am not sure why removing this check breaks tests.
There are other oddities about `peekType` (e.g. not all states are handled in this `switch`, so I think `peekType` just returns the 0 type in those cases...).
So unless we have a solid understanding, I'd like to vote for another approach. I think we can bypass `peekType` entirely by using `advanceState` (explained below).
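To make the mode-stack reasoning above concrete, here is a minimal standalone sketch. It is not the driver's code: the `mode` type and the `topAfterEachClose` helper are hypothetical stand-ins for `jpmArrayMode`/`jpmObjectMode` and the parser's `pushMode`/`popMode`, and it scans raw characters instead of real tokens. It reports which mode sits on top of the stack right after each `}` is popped.

```go
package main

import "fmt"

// mode is a hypothetical stand-in for the parser's array/object modes.
type mode int

const (
	arrayMode mode = iota
	objectMode
)

func (m mode) String() string {
	if m == objectMode {
		return "object"
	}
	return "array"
}

// topAfterEachClose pushes a mode on '[' or '{', pops on ']' or '}', and
// records the mode left on top of the stack right after each '}' is popped
// ("empty" if nothing remains). Simplification: it scans characters, so it
// would miscount brackets that appear inside string literals.
func topAfterEachClose(input string) ([]string, error) {
	var stack []mode
	var tops []string
	for _, r := range input {
		switch r {
		case '[':
			stack = append(stack, arrayMode)
		case '{':
			stack = append(stack, objectMode)
		case ']', '}':
			want := arrayMode
			if r == '}' {
				want = objectMode
			}
			if len(stack) == 0 || stack[len(stack)-1] != want {
				return nil, fmt.Errorf("unbalanced %q", r)
			}
			stack = stack[:len(stack)-1] // pop the mode that was just closed
			if r == '}' {
				top := "empty"
				if len(stack) > 0 {
					top = stack[len(stack)-1].String()
				}
				tops = append(tops, top)
			}
		}
	}
	if len(stack) != 0 {
		return nil, fmt.Errorf("%d unclosed bracket(s)", len(stack))
	}
	return tops, nil
}

func main() {
	tops, err := topAfterEachClose(`[ { "a" : { "b": 1 } } ]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(tops) // prints: [object array]
}
```

For the example input, the first `}` leaves an object mode on top (the condition would hold) and the second leaves an array mode (the condition would not), which matches the walk-through above.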
Great points, and thank you for looking so in-depth into `peekType()`. I think I agree, and I've switched to using `advanceState()` with the algorithm you described below. The only thing we lose is the error-checking in `peekType()`; `advanceState()` advances forward blindly and does not return errors, so now neither does `skipObject()`. There are some disadvantages to this I may bring up offline...
Per offline discussion, I am not too concerned with losing some skip validation on malformed BSON. The previous behavior was incorrect on valid BSON.
bson/bsonrw/extjson_reader.go
Outdated
func (ejvr *extJSONValueReader) skipArray() error {
// read entire array until ErrEOA (using peekType)
func (ejvr *extJSONValueReader) skipObject() error {
I think we can tweak this to only use `advanceState`.
The initial state, `ejvr.p.s`, should be `jpsSawBeginArray` or `jpsSawBeginObject` at the beginning of this function. I think we can do something like:
- initialize `count` to 1
- call `advanceState`
- if the state is `jpsSawBeginArray` or `jpsSawBeginObject`, increment `count`
- if the state is `jpsSawEndArray` or `jpsSawEndObject`, decrement `count`
- repeat until `count` is 0
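The counting steps above can be sketched as follows. This is a standalone illustration, not the driver's `skipObject`: the `skipNested` helper is hypothetical and walks raw bytes rather than calling `advanceState`. It also returns an error when the input runs out, which is an addition made here so the loop always terminates. Like the steps above, it treats `[` and `{` alike, so it does not detect mismatched bracket kinds.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// skipNested returns the index just past the bracket that closes the array or
// object opened at input[start]. It mirrors the counting idea: start at 1 for
// the opening bracket, add 1 for every nested open, subtract 1 for every
// close, and stop when the count reaches 0.
func skipNested(input string, start int) (int, error) {
	if start < 0 || start >= len(input) || (input[start] != '[' && input[start] != '{') {
		return 0, errors.New("start must point at '[' or '{'")
	}
	count := 1 // the opening bracket at start has already been seen
	for pos := start + 1; pos < len(input); pos++ {
		switch input[pos] {
		case '"':
			// Skip the string literal so brackets inside it are not counted.
			for pos++; pos < len(input) && input[pos] != '"'; pos++ {
				if input[pos] == '\\' {
					pos++ // skip the escaped character
				}
			}
		case '[', '{':
			count++
		case ']', '}':
			count--
			if count == 0 {
				return pos + 1, nil
			}
		}
	}
	return 0, errors.New("input ended before the value was closed")
}

func main() {
	input := `{"a": [1, {"b": "}"}], "c": 2}`
	start := strings.IndexByte(input, '[')
	end, err := skipNested(input, start)
	if err != nil {
		panic(err)
	}
	fmt.Println("skipped:", input[start:end])
	fmt.Println("rest:   ", input[end:])
}
```

Because the loop only tracks a counter, nesting of any depth is handled iteratively without recursion.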
Sounds good! That algorithm seems to work perfectly for all existing examples.
LGTM!
This looks fantastic! The code changes all LGTM and I just have a couple of minor comments on the tests. I'll hold off on approving until those are done, but great work!
LGTM. Great work!
GODRIVER-1235
Adds logic to correctly skip embedded documents and embedded arrays in `extJSONValueReader` by creating one function to skip both iteratively using the parser's `advanceState()` function. This addresses errors in unmarshalling extended JSON when undefined fields contain nested documents or arrays.
Adds tests for unmarshalling extended JSON with undefined fields in `unmarshal_test.go`.