-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Use fastjson to parse mutation data string #772
refactor: Use fastjson to parse mutation data string #772
Conversation
f2262a9
to
2b59a63
Compare
Codecov Report
@@ Coverage Diff @@
## develop #772 +/- ##
===========================================
+ Coverage 58.70% 59.11% +0.41%
===========================================
Files 153 153
Lines 17083 16977 -106
===========================================
+ Hits 10028 10036 +8
+ Misses 6121 6023 -98
+ Partials 934 918 -16
|
71111f7
to
e7b0b75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, code is quite a bit nice than it was before :) Did you benchmark it at all? Got a few minor comments for you.
// UpdateWithKeys updates documents matching the given DocKeys. | ||
// | ||
// The provided updater must be a string Patch, string Merge Patch, a parsed Patch, or parsed Merge Patch | ||
// else an ErrInvalidUpdater will be returned. | ||
// | ||
// Returns an ErrDocumentNotFound if a document is not found for any given DocKey. | ||
UpdateWithKeys(context.Context, []DocKey, interface{}) (*UpdateResult, error) | ||
UpdateWithKeys(context.Context, []DocKey, string) (*UpdateResult, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Thanks for this type change - this always confused me and I thought there was more types that this could handle but the diff suggests it was just the two. Do check in with John though if you haven't already to make sure we are not losing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
As far as I could tell the value passed was always a string but yes I will be checking with John.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to be able to support native Go types map[string]interface{}
If using the programatic approach, but at the moment, its only used/tested from the POV of the query system, which only handles strings.
I would nice to still support this, but its prob lower priority, and supporting native types gets in the way of using the fastjson
approach outlined in this PR anyway. So its fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at the last commit for a potential solution to keep Go maps as a possible updater
) error { | ||
keyStr, ok := doc["_key"].(string) | ||
if !ok { | ||
return errors.New("Document is missing key") | ||
return errors.New("document is missing key") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Making these changes within commits that contain more significant changes refactors/feat changes does make the review harder, adding noise that makes it harder to focus on stuff that needs attention. Would suggest doing cosmetic stuff like this in separate commits in the future
if _, ok := mval.(map[string]interface{}); ok { | ||
|
||
mergeMap := make(map[string]*fastjson.Value) | ||
merge.Visit(func(k []byte, v *fastjson.Value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Why bother with the extra variable/iteration and move the contents of the for loop into this Visit
function? Feels a bit odd to have both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I can't return an error
from the anonymous function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could assign the error to a variable defined outside of the anon-func though, and return that? Then you can stick to a single iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is that if we have say 20 fields and we have an error on the first one, there is no way of stopping the Visit
method from iterating over all the other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah true, it wont break :) Thanks
db/collection_update.go
Outdated
} | ||
|
||
val := client.NewCBORValue(fd.Typ, cval) | ||
val := client.NewCBORValue(fd.Typ, mergeCBOR[mfield]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This feels odd to me and I think the code would be clearer and slightly safer and efficient if you just use the object to created with validateFieldSchema
instead of fetching it from the map. Fetching it suggests that it might be something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's assigned directly to the map hence why it's then fetching from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but instead of doing:
foo := bar()
map[x] = foo
y := map[x]
foobar(y)
you could just do
foo := bar()
map[x] = foo
foobar(foo)
This way there is no ambiguity over what is being passed into fooBar, and the developer reading and the CPU has to do slightly less work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that
map[x] = bar()
foobar(map[x])
is actually less CPU intensive. I also find very little ambiguity here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance is secondary here, but you are comparing the cost of fetching an item from a map to the cost of doing nothing.
Similar thought process regarding readability. Only when reading the developer has to also wonder why to would insert something in to a map, and then immediately try and find it again - suggesting that it might have changed between the two lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seam that compiler efficiency would lead to the
foo := bar()
map[x] = foo
foobar(foo)
being the better option. https://godbolt.org/z/cqEYoWWxc
I disagree that it improves readability as there is only an error check of 3 lines between the assignment and where it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 lines (now) and a double take to make sure it is not shared between threads etc. or mutated within any funcs that may be called within those 3 lines.
Is just an odd thing to see IMO. Leave it in if you want though.
db/collection_update.go
Outdated
func convertNillableArrayWithConverter[TIn any, TOut any](val any, converter func(TIn) TOut) ([]*TOut, error) { | ||
if val == nil { | ||
return nil, nil | ||
if zeroValue == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This is bit odd, it looks like you have two functions in here not one :) Suggest using two functions instead of an extra param that is unused for half the callers and a big if block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how you see that it's unused. It's used for all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If zeroValue is nil, it is not used. The param is used to determine if function A or function B is used in that instance, and it is a compile-time constant here - used to toggle a code branch at runtime.
zeroValue == nil
is known at compile time, but you are making it appear (to both developer and CPU) that it is a runtime thing, making the code-flow more complicated than it needs to be.
db/collection_update.go
Outdated
func getArray[T any]( | ||
val *fastjson.Value, | ||
typeGetter func(*fastjson.Value) (T, error), | ||
zeroValue any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Why is zeroValue of type any
, and not of type T
? I can't spot the reason for that, and if you use type T
you remove the potential runtime failure that you are having to check for in the body of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because zeroValue cannot be nil
it type is T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean for the purpose of the big-if I referred to in another comment? That feels like an abuse of the type system if the zero value of T is not of type T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is the zero value of T
cannot be nil
. So we can either do:
func getArray[T any](
val *fastjson.Value,
typeGetter func(*fastjson.Value) (T, error),
zeroValue T,
) (any, error) {
if val.Type() == fastjson.TypeNull {
return nil, nil
}
valArray, err := val.Array()
if err != nil {
return nil, err
}
arr := make([]T, len(valArray))
for i, arrItem := range valArray {
if arrItem.Type() == fastjson.TypeNull {
arr[i] = zeroValue
continue
}
arr[i], err = typeGetter(arrItem)
if err != nil {
return nil, err
}
}
return arr, nil
}
func getNillableArray[T any](
val *fastjson.Value,
typeGetter func(*fastjson.Value) (T, error),
) ([]*T, error) {
if val.Type() == fastjson.TypeNull {
return nil, nil
}
valArray, err := val.Array()
if err != nil {
return nil, err
}
arr := make([]*T, len(valArray))
for i, arrItem := range valArray {
if arrItem.Type() == fastjson.TypeNull {
arr[i] = nil
continue
}
v, err := typeGetter(arrItem)
if err != nil {
return nil, err
}
arr[i] = &v
}
return arr, nil
}
or keep it the way I currently have it:
func getArray[T any](
val *fastjson.Value,
typeGetter func(*fastjson.Value) (T, error),
zeroValue any,
) (any, error) {
if val.Type() == fastjson.TypeNull {
return nil, nil
}
valArray, err := val.Array()
if err != nil {
return nil, err
}
if zeroValue == nil {
arr := make([]*T, len(valArray))
for i, arrItem := range valArray {
if arrItem.Type() == fastjson.TypeNull {
arr[i] = nil
continue
}
v, err := typeGetter(arrItem)
if err != nil {
return nil, err
}
arr[i] = &v
}
return arr, nil
}
arr := make([]T, len(valArray))
for i, arrItem := range valArray {
if arrItem.Type() == fastjson.TypeNull {
var ok bool
arr[i], ok = zeroValue.(T)
if !ok {
return nil, errors.New("zeroValue should be of the same type as the array items type")
}
continue
}
arr[i], err = typeGetter(arrItem)
if err != nil {
return nil, err
}
}
return arr, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to set arr[i] = zeroValue
in getArray
, you just continue - no zeroValue required (leave as array default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Unless we want the option to set a different default value.
So you prefer the two function approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, I see the current as two functions pretending to be one - the main function body is split into two, and only one of those two will execute depending on an input parameter that is (currently) always a compile-time constant.
db/collection_update.go
Outdated
|
||
func getBool(v *fastjson.Value) (bool, error) { | ||
b, err := v.Bool() | ||
return b, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I'm not nitpicking here, just curious - why are you doing this and not just return v.Bool()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂️ I think I was tired last night when I added these get functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol fair enough 😁 wasn't sure if it was a personal preference
switch patch.(type) { | ||
case []map[string]interface{}: | ||
if parsedUpdater.Type() == fastjson.TypeArray { | ||
isPatch = true | ||
case map[string]interface{}: | ||
isPatch = false | ||
default: | ||
} else if parsedUpdater.Type() != fastjson.TypeObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Did you ensure / test that this still does what we want it to do? I ask this because these lines were not covered by test coverage previously, and after this change, they still aren't.
https://app.codecov.io/gh/sourcenetwork/defradb/compare/772/diff
suggest: If not too painful to hit these, would be nice to have some tests asserting that your change works properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess (if nothing else catches it first), that something like update_users(data: "1") {...
could hit the new version, but I've not tried it.
Would be good to have, our 'negative' tests are quite lacking at the moment too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added unit test to cover those cases. Let me know if you like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that fred! I do see they coverage is hit now. However I was wondering why not add integration tests rather than unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) I forgot we added a location of Collection integration tests - I would say Fred's new tests are integration tests (they call only public funcs from what I see (minus setup which just follows the db_test.go
format and is easily ported).
Adding a todo from me for this, as they should be really easy to move to tests/integration/collection/...
, the utils there should slim them down a bit, and they will be much less liable to sudden deletion if their host file ever becomes a liability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and they will be much less liable to sudden deletion if their host file ever becomes a liability
Not sure I understand this. I'll ask during standup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Any notable thoughts/changes have already been brought up and adressed by others!
4f60e99
to
e4840fa
Compare
db/db_test.go
Outdated
_, err = col.UpdateWithKey(ctx, doc.Key(), `{{ | ||
"Name": "Eric" | ||
}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Perhaps something more obvious to spot as an invalid JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A double curly brace makes it pretty obvious to me. What would make it more obvious to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double brace requires reading the other 2 lines, and is very similar to the character next to it (e.g. how often have we all missed a bracket when coding without intelisense), the rest of the string is also valid json.
IMO it would be more obvious if the error was on the focus line (not the func call, but where "Name": "Eric"
currently lies), and if the error was larger in size - e.g. the below:
_, err = col.UpdateWithKey(ctx, doc.Key(), `{
:--------------INVALID_JSON---------------
}`)
@@ -275,9 +269,9 @@ func (c *collection) updateWithFilter( | |||
// Get the document, and apply the patch | |||
doc := docMap.ToMap(query.Value()) | |||
if isPatch { | |||
err = c.applyPatch(txn, doc, patch.([]map[string]interface{})) | |||
// todo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: All todo's should have an associated ticket number. I know we had todo previously in the updateWithKey
too suggest linking both with the same ticket if it will be relevant to both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah similar to comment RE dead code, I'd suggest deleting this if entirely (or going through the hassle of adding commit/branch to an issue if you really want it available). But if not, we did agree to only keep todos with a link to a ticket.
@@ -291,28 +285,38 @@ func (c *collection) updateWithFilter( | |||
return results, nil | |||
} | |||
|
|||
func (c *collection) applyPatch( | |||
func (c *collection) applyPatch( //nolint:unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: So all the //nolint:unused
indicate that this is now all dead code? Why not completely remove them? OR are these being saved to work on in the future for patch support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking): I would suggest deletion here too - if it is dead and untested it can be assumed to be broken and will be of little value when actually implementing the feature (let alone to anyone reading this file in the meantime). If there is a ticket for this somewhere you can always dump the code in there if really wanted, or the commit hash that removes it, or if really fussed over it a branch name/commit that adds this function back in (a commit that adds it in is far easier to rebase that e.g. a current snapshot pre-removal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this would increase the scope of this PR quite a bit. The reason this is happening is because there was a code path that was available but never actually used. Adding the unit test to hit the path resulted in an error because the feature is not actually fully supported. I suggest we leave it with //nolint for this PR and we can clean it up in a separate PR.
732e196
to
d6502df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, with a todo to address and a suggestion - thanks for moving the tests - they look good and seem to cover the code well :)
tests := []testUtils.TestCase{ | ||
{ | ||
Description: "Test update users with filter and invalid JSON", | ||
Docs: map[string][]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Probably much better to use the doc defined at the top of the test func instead of redefining it with the same values - the current setup is implicitly dependent on the dockeys being deterministic, and it hides the desired relationship and makes it easy to unwittingly edit the values within the test cases (and hide that from anyone reading)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are absolutely right. I should have done that. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
}, | ||
}, | ||
}, | ||
ExpectedError: "cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"\\n\\t\\t\\t\\t\\t\\t}\"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): Expected error should work with partials, and the current string contains stuff that we dont care about, and adds failure points that we'd probably rather not fail this test.
Suggest trimming it down a bit to the below (or similar):
cannot parse JSON: cannot parse object: cannot find opening '\"\" for object key; unparsed tail: \"Name: \\\"Eric\\\"
Are a couple more tests like this that could benefit IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. cannot parse JSON: cannot parse object
is even probably enough here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
@@ -22,6 +22,8 @@ import ( | |||
) | |||
|
|||
type TestCase struct { | |||
Description string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: You like the test cases? I didn't, and deliberately didn't bother adding them to this lib lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean Description
? I like to see this info when there are failed tests. Especially that we don't get a specific line number with these test structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I mean an array of TestCases - I've started avoiding doing that, and use one TestCase per test function (reduces the usefulness of the description). Sorry my original comment was poorly worded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that I can declare things once and reuse in the list of test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) I liked that about your tests here, I think I would naturally use a file level var/func (global in Go, which is a bit bad) and share it between tests
Description This PR makes use of the fastjson package to improve mutation data string parsing. Doing so also simplifies the validateFieldSchema function.
Relevant issue(s)
Resolves #124
Description
This PR makes use of the fastjson package to improve mutation data string parsing. Doing so also simplifies the validateFieldSchema function.
Tasks
How has this been tested?
Using integration tests
Specify the platform(s) on which this was tested: