-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add materialized views #3000
feat: Add materialized views #3000
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3000 +/- ##
===========================================
- Coverage 79.49% 79.41% -0.08%
===========================================
Files 329 331 +2
Lines 25225 25670 +445
===========================================
+ Hits 20051 20384 +333
- Misses 3756 3828 +72
- Partials 1418 1458 +40
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 16 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
4b78c74
to
b1542ab
Compare
cli/view_refresh.go
Outdated
to the view will recieve items accessible to the user refreshing the view's permissions. | ||
|
||
Example: refresh all views | ||
defradb view refresh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: defradb client view refresh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers Keenan, nice spot :)
- Fix CLI doc
AuthorView { | ||
name | ||
books { | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: did you forget to uncomment or remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did! Thanks Keenan
- Uncomment test, and revert the second half of the test
cli/view_refresh.go
Outdated
View is refreshed as the current user, meaning results returned for all subsequent query requests | ||
to the view will recieve items accessible to the user refreshing the view's permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Thanks for documenting this.
thought: It makes me think though that this is most likely not the way we want to handle access long term. Access should be reflecting the rights of the user doing the request, not that of the user who refreshed the view. Curious to have your opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the current behaviour long term, and see very few alternatives (we could have the node-user refresh the view).
Worth remembering is that medium-long term view items will be shareable across P2P without needing to share the underlying docs. And we have Lens stuff at play too (not all view items are sourced from any document, Lens can create their own).
The view has its own set of permissions that should be respected, although this was not handled when ACP was introduced to Defra #2018.
IMO a view is a copy of the transformed data, a user has created a view and is sharing it with who ever they see fit, as a convenient alternative to printing the results out and mailing it to people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth remembering is that medium-long term view items will be shareable across P2P without needing to share the underlying docs.
This is specially why we can't have this behaviour long term. We can't allow some random user to access the contents of a view because the user that created the view had the rights to the documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user that created the view would need to give that random user access to it.
There is nothing we (or anyone else, really) can do to stop users sharing data they already have read access to - that is not a technical problem (see print and mail example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring view-viewers to also have read access to the underlying data would totally undermine what I see as the primary use case - exposing data aggregates without exposing the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user that created the view would need to give that random user access to it.
Right. Hence why I'm saying that the description is not what we want long term.
There is nothing we (or anyone else, really) can do to stop users sharing data they already have read access to - that is not a technical problem (see print and mail example).
This tells me we have a misunderstanding. We can clarify things over a call if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed on Discord, Fred and I were chatting about different things. I'll tweak the wording.
- Reword docs
client/collection_description.go
Outdated
// If it is true, they will be, if false, the data returned on query will be calculated | ||
// at query-time from source. | ||
// | ||
// At the moment this can only be set to `false` if this collection sources it's data from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: its data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) Cheers
- Typo
client/db.go
Outdated
@@ -196,6 +196,14 @@ type Store interface { | |||
transform immutable.Option[model.Lens], | |||
) ([]CollectionDefinition, error) | |||
|
|||
// RefreshViews refreshes the caches of all views matching the given options. If no options are set all views |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: If no options are set,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving this one as is, I don't think the comma helps, and I think I'm usually guilty of over-using them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding that without the comma it reads If no options are set-all-views...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed on discord, adding comma
- add comma
// RefreshViews refreshes the caches of all views matching the given options. If no options are set all views | ||
// will be refreshed. | ||
// | ||
// The cached result is dependent on the ACP settings of the source data and the permissions of the user making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: This makes me think that there might be need for setting who is allowed to cache views on a given node. Otherwise this could be abused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes of course, we'll either want it covered by admin ACP or allow users to own views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth documenting this in: #2640
internal/core/key.go
Outdated
// CollectionRootID is the Root of the Collection that this item belongs to. | ||
CollectionRootID uint32 | ||
|
||
// ItemID is the unique (to this CollectionRootID) of the View item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: is the unique ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers, took me a while to figure out what you meant on re-read :)
- Add
ID
after closing of brackets
{ "op": "replace", "path": "/1/IsMaterialized", "value": false } | ||
] | ||
`, | ||
ExpectedError: "non-materialized collections (only views) are not supported. Collection: User", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This reads a little weird. I would suggest to move the parenthesis at the end non-materialized collections are not supported. (only views)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change, but I'll probably just remove the brackets.
- tweak error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just half way through. Looks good so far.
if existingCol.IsMaterialized && !col.IsMaterialized { | ||
// If the collection is being de-materialized - delete any cached values. | ||
// Leaving them around will not break anything, but it would be a waste of | ||
// storage space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: thanks for comments. Very helpful
internal/db/view.go
Outdated
return err | ||
} | ||
|
||
hasNext, err := source.Next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I think hasValue
better reflects the nature of the var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, although I think hasNext
is used more frequently in the codebase.
- Rename to
hasValue
0df32d8
to
c6ac553
Compare
9b2b488
to
d7ce01b
Compare
Switch will be extended shortly, andis nicer than lots of if-elses
2f86bc1
to
4f15328
Compare
4f15328
to
b3ecbc4
Compare
cli/view_refresh.go
Outdated
View is refreshed as the current user, meaning results returned for all subsequent query requests | ||
to the view will receive items generated using the user refreshing the view's permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: This is better but still not very clear. I assume a new user looking at this would still get confused.
suggestion: I said this at some point previously but I can't remember where exactly and John just reminded me of it: We should probably materialize views from the view point of the node and not that of a given user. Queries on the view can then be filtered through ACP for the user doing the request. This will alleviate the need to have multiple materialized versions of the view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will alleviate the need to have multiple materialized versions of the view.
This is an alternative, and long term we probably want both anyway (e.g. here's my personal quick view, go and have a look).
I just picked the easiest, as there is time pressure on this PR and getting the node user is a bit more involved.
This is better but still not very clear. I assume a new user looking at this would still get confused
Suggestions appreciated, it is a moderately complex thing to state in a line or two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets clear up the question of what user is used to generate the view and then we can improve the documentation. I'll bring it up during standup so we can get consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets clear up the question of what user is used to generate the view and then we can improve the documentation.
That sounds like unwanted scope creep forced into a time-sensitive and deliberately lean initial PR, but sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in standup, user-identity is fine for now, long term we may want something else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in the standup. This implementation is working for the use case that triggered this PR and as such we can leave it as is.
As for the wording of the documentation, here is an possible alternative:
View is refreshed as the current user, meaning results returned for all subsequent query requests | |
to the view will receive items generated using the user refreshing the view's permissions. | |
View is refreshed as the current user, meaning the cached items will reflect that user's | |
permissions. Subsequent query requests to the view, regardless of user, will receive | |
items from that cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update, thanks Fred
- Update doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
{ "op": "replace", "path": "/2/IsMaterialized", "value": true } | ||
] | ||
`, | ||
ExpectedError: "materialized views do not support ACP. Collection: UserView", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Thanks for not forgetting this :)
} | ||
}`, | ||
Results: map[string]any{ | ||
// Even though UserView was created after the document was created, the results are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: It says: Even though UserView was created after the document was created
but correct me if I am wrong weather UserView
is created after or even before the results will always be empty unless the views are refreshed manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I want the createDoc before the view though, as it is more likely to change/break in the future (e.g. auto-refreshing on View create) - this test will catch that as a behaviour change, whereas creating the doc after defining the view would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -117,3 +107,142 @@ func (n *viewNode) Close() error { | |||
|
|||
return nil | |||
} | |||
|
|||
func convertBetweenMaps(srcMap *core.DocumentMapping, dstMap *core.DocumentMapping, src core.Doc) core.Doc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: would be nice to have a comment for this function. From the name it's not clear why it's called "convert" although both source and destination have the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, the first two params to this function are srcMap
and dstMap
, and the place it is called is heavily documented with the reason for calling it.
if err != nil { | ||
return err | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: return err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I dislike that if it is not with the immediate result of another func call (e.g. return foo()
), otherwise it looks like it is actually returning a non-nil error. Leaving as is.
} | ||
|
||
var err error | ||
n.currentValue, err = core.UnmarshalViewItem(n.documentMapping, result.Value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is this view stuff really part of the core
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted it in encoding but that created a circular dependency, so it had to go here.
@@ -220,6 +227,22 @@ type CreateView struct { | |||
ExpectedError string | |||
} | |||
|
|||
type RefreshViews struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: would be nice to have the struct documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: The coverage file names will clash currently as you forget to add the new type after line 225
Just append after this block:
coverage\
_${{ matrix.os }}\
_${{ matrix.client-type }}\
_${{ matrix.database-type }}\
_${{ matrix.mutation-type }}\
_${{ matrix.lens-type }}\
_${{ matrix.acp-type }}\
_${{ matrix.database-encryption }}\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers Shahzad, sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Good job Andy!
## Relevant issue(s) Resolves #3014 ## Description Failed run due to code-cov upload file's name clashing: https://github.com/sourcenetwork/defradb/actions/runs/10893445627/job/30228379591 Happened in merge commit: [#4989901](4989901) Just needed to fix the typo introduced in [`15f244d` (#3000)](15f244d) Should be `matrix.view-type` not `matrix.matrix.view-type`, my bad I missed it even while re-reviewing.
Relevant issue(s)
Resolves #2951
Description
Adds materialized views. Also makes materialized views the default (see discord discussion).
The caching behaviour of views in tests is now selected via an environment variable, meaning (with the exception of a few specific examples) a test with a view will test both cacheless and materialized variants - in the CI this adds a new dimension to the matrix, although materialized views are only executed using the simple settings (in-mem store, go client, etc) for now.
#2999 has been mostly fixed in this PR, but not completely - this is why some logic in the lens node has changed.