Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Simplify fetcher interface #1746

Merged

Conversation

AndrewSisley
Copy link
Contributor

Relevant issue(s)

Resolves #1589

Description

Simplifies the fetcher interface, removing 2 of the 3 FetchFoo functions. As well as making it simpler, this also means that all (one) of the functions have to behave in the same way (e.g. deleted doc support).

There is a fair amount going on here, suggest reviewing commit by commit - it should help explain what is going on and why.

@AndrewSisley AndrewSisley added area/query Related to the query component refactor This issue specific to or requires *notable* refactoring of existing codebases and components action/no-benchmark Skips the action that runs the benchmark. labels Aug 2, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.7 milestone Aug 2, 2023
@AndrewSisley AndrewSisley requested a review from a team August 2, 2023 11:36
@AndrewSisley AndrewSisley self-assigned this Aug 2, 2023
@AndrewSisley AndrewSisley force-pushed the 1589-fetcher-interface branch from c1679c0 to 284b1b5 Compare August 2, 2023 11:37
@codecov
Copy link

codecov bot commented Aug 2, 2023

Codecov Report

Patch coverage: 74.48% and project coverage change: +0.25% 🎉

Comparison is base (0647f97) 75.42% compared to head (8c0b6eb) 75.68%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1746      +/-   ##
===========================================
+ Coverage    75.42%   75.68%   +0.25%     
===========================================
  Files          208      208              
  Lines        21801    21735      -66     
===========================================
+ Hits         16443    16448       +5     
+ Misses        4210     4142      -68     
+ Partials      1148     1145       -3     
Flag Coverage Δ
all-tests 75.68% <74.48%> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
db/collection_get.go 72.73% <25.00%> (-8.52%) ⬇️
db/collection_index.go 96.64% <50.00%> (-0.82%) ⬇️
planner/scan.go 88.50% <50.00%> (-1.40%) ⬇️
lens/fetcher.go 69.65% <68.18%> (+19.28%) ⬆️
db/fetcher/encoded_doc.go 75.68% <82.50%> (+10.05%) ⬆️
db/fetcher/fetcher.go 76.31% <90.24%> (+0.95%) ⬆️

... and 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0647f97...8c0b6eb. Read the comment docs.

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one minor todo.

I like the change and the simplification it brings.

schemaVersionID string
status client.DocumentStatus
properties map[client.FieldDescription]*encProperty
decodedpropertyCache map[client.FieldDescription]any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: change to decodePropertyCache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? decode is a verb, and if there are values in here, this means it has been decoded (past-tense) (and I think the past-tense becomes an adjective, which the present tense cannot be)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I did a typo in my change request: decodedPropertyCache. I just want it to be camelcased properly.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah 😆 I didnt spot that at all - thanks Fred, will change of course

  • Fix casing

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions and suggestions and one todo.

One notable question regarding the interface change was if we still want it to match the Enumerable interface?

Will run thru this again (went thru it all, but want to digest, and might have some other thoughts depending on the answer/approach to the above question)

But overall, very good!

Comment on lines 164 to 165
// only get filter fields if filter=true
if onlyFilterProps && !prop.IsFilter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: (This might be a problem from the original implementation) But I think the onlyFilterProps should precede the cached value check/assignment?

Copy link
Contributor Author

@AndrewSisley AndrewSisley Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see what you mean, but I thought this check was purely a performance thing from within this function - returning extra stuff would not be an issue, and if it is cached then the performance cost should be negligible?

Although, having written that out it still seems technically incorrect to return the value, even if there is no downside RE user visible behaviour (incl. perf.), I'll move it and see if anything breaks 😁

  • Move

@@ -514,14 +522,49 @@ func (df *DocumentFetcher) processKV(kv *keyValue) error {

df.execInfo.FieldsFetched++

df.doc.Properties[fieldDesc] = property
df.doc.properties[fieldDesc] = property

return nil
}

// FetchNext returns a raw binary encoded document. It iterates over all the relevant
// keypairs from the underlying store and constructs the document.
func (df *DocumentFetcher) FetchNext(ctx context.Context) (EncodedDocument, ExecInfo, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Wasn't part of this refactor to make it also match the Enumerable interface? If so, I think it would be better to do this under a single PR rather than split, since its all very related.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, we can likely move the ExecInfo off the main Fetch method (it feels a little weird here anyway) and have a dedicated method to get the info instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't part of this refactor to make it also match the Enumerable interface?

We for sure spoke a bit about that, and I started to look into it, but it felt slightly forced (due to a slight mismatch of funcs like Close and Reset, Init, Start, etc) , and had much less value IMO. This PR currently makes things much better, and cheaper for myself and Islam to build upon, changing to Enumerable would not speed up the Lens work, and I doubt it would help Islam.

We can look at making it Enumerable later I think, for now it is effort I would rather not spend, and would likely have very little impact on our productivity in the medium term.

we can likely move the ExecInfo off the main Fetch method (it feels a little weird here anyway) and have a dedicated method to get the info instead.

I also thought about doing that too, but it is not a new problem, and is pretty common to have stats like this outputted from funcs like this as return values. It is also extra effort now that does not benefit us anytime soon as far as I can see, and would only slow down the merge of changes that do benefit us right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We for sure spoke a bit about that, and I started to look into it, but it felt slightly forced (due to a slight mismatch of funcs like Close and Reset, Init, Start, etc)

OK no problem. Just remembered talking about it and wanted to make sure it was addressed (regardless of in favor or against).

I also thought about doing that too, but it is not a new problem, and is pretty common to have stats like this outputted from funcs like this

Correct, this isnt a new problem, and not really all that important. The comment was only in relation to if we moved forward with the enumerable interface matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembered talking about it and wanted to make sure it was addressed

Yeah, I should probably have noted this in the PR description - I nearly did and then changed my mind as it wasnt noted in the related issue.

The comment was only in relation to if we moved forward with the enumerable interface matching.

Does sound like the cleanest way to handle it with an enumerable-derived interface :)

@@ -183,162 +182,3 @@ func TestFetcherGetAllPrimaryIndexEncodedDocMultiple(t *testing.T) {
assert.NoError(t, err)
assert.NotNil(t, encdoc)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to raise (my usual ;) ) objection to removing some of these unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function no longer exists, there is nothing to unit test. FetchNext is covered above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres two notable things here in these removed tests that arent covered in the tests that remain. 1) Decoding 2) Value checks. Both of these have value and can be either integrated in to the remaining tests, or we can keep the deleted ones and just update them to use the new APIs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Decoding

If we want unit test for decoding I think they should look quite different to these, and be far more comprehensive - the ones deleted only ever test a single string value, and one low, positive, integer. I don't think any time would really be saved by converting these tests to decoding unit tests.

  1. Value checks

That's weird, but you are right, the other tests never actually check the result of FetchNext, only that it doesn't error 🤷 - I still very much don't think they are worth keeping though, given that they are offering very little user protection vs our integration tests

@AndrewSisley AndrewSisley requested a review from jsimnz August 3, 2023 11:10
@AndrewSisley AndrewSisley force-pushed the 1589-fetcher-interface branch from 284b1b5 to 407c867 Compare August 3, 2023 11:12
And handle within the fetcher instance that fetches it.  This means the status is now available via fetcher.FetchNext, an important step towards making the 3 fetcher functions equal.
Grants deleted doc support to all three FetchFoo funcs as the two others call FetchNext.  All 3 funcs should now behave the same way, just outputing the result in a slightly different format.
As well as being inconsistent it is blockking the addition of a Properties() func
This allows a minor refactor, and provides access to the properties of the EncodedDoc from outside of the interfaced object.
Simplifies both the EncodedDoc interface, and the internals of the implementation.
FetchNextDoc and FetchNextDecoded now can just call FetchNext.
@AndrewSisley AndrewSisley force-pushed the 1589-fetcher-interface branch from b2f9ab0 to 8c0b6eb Compare August 3, 2023 13:30
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving now. All looks good. Would still in theory like the see some aspect of the tests preserved, but ill leave that discretion to you

@AndrewSisley AndrewSisley merged commit 07380d8 into sourcenetwork:develop Aug 3, 2023
@AndrewSisley AndrewSisley deleted the 1589-fetcher-interface branch August 3, 2023 17:53
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
## Relevant issue(s)

Resolves sourcenetwork#1589

## Description

Simplifies the fetcher interface, removing 2 of the 3 FetchFoo
functions. As well as making it simpler, this also means that all (one)
of the functions have to behave in the same way (e.g. deleted doc
support).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify Fetcher interface
3 participants