-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Model Col. SchemaVersions and migrations on Cols #2286
feat: Model Col. SchemaVersions and migrations on Cols #2286
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #2286 +/- ##
===========================================
- Coverage 74.15% 73.83% -0.32%
===========================================
Files 258 257 -1
Lines 25656 25739 +83
===========================================
- Hits 19024 19003 -21
- Misses 5302 5408 +106
+ Partials 1330 1328 -2
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 19 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
e62de3d
to
a6dd001
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a brief review. Looks good so far. Have just some minor comments and question.
I might have another look at it some time later.
client/db.go
Outdated
// present in the database. | ||
// | ||
// If true is provided, the new schema versions will be made active and previous versions deactivated, otherwise | ||
// [SetDefaultSchemaVersion] should be called to do so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: adjust doc to [SetActiveSchemaVersion]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheers :) Will do a search to make sure there aren't others I've missed too :)
- Fix remaining SetDefaultSchemaVersion
@@ -92,6 +111,10 @@ func (col CollectionDescription) QuerySources() []*QuerySource { | |||
return sourcesOfType[*QuerySource](col) | |||
} | |||
|
|||
func (col CollectionDescription) CollectionSources() []*CollectionSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: please add documentation to the public method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks :)
- Doc CollectionSources()
client/descriptions.go
Outdated
@@ -111,6 +134,28 @@ type QuerySource struct { | |||
Query request.Select | |||
} | |||
|
|||
// QuerySource represents a collection data source from another collection instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: CollectionSource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😁 Copy-pasta
- Fix doc
db/collection.go
Outdated
continue | ||
// The collection version may exist before the schema version was created locally. This is | ||
// because migrations for the globally known schema version may have been registered locally | ||
// (typically to handle documentes synced over P2P at higher versions) before the local schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: documents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
- Fix typo
db/collection.go
Outdated
@@ -196,31 +199,92 @@ func (db *db) updateSchema( | |||
return err | |||
} | |||
|
|||
if setAsDefaultVersion { | |||
cols, err := description.GetCollectionsBySchemaVersionID(ctx, txn, previousVersionID) | |||
// After creating the new schema version, we need to new create collection versions for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "to create new collection"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording was deliberate, to try and avoid ambiguity between many collections with one root, and a brand new collection-set with a different root. It is done in a few places.
Will have another think on this, but please do share any further thoughts of your own in the meantime :)
- "to create new collection" vs "create collection versions" vs other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just that "we need to new create" has new in the wrong position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh 😁 I didn't spot that on re-read even, will fix lol
} | ||
|
||
cols := make([]client.CollectionDescription, 0) | ||
for res := range q.Next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: you could you datastore.DeserializePrefix[uint32]
here.
It could be also used for GetCollections
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think DeserializePrefix
would be beneficial here, it hides the interaction with the datastore, and creates more work for the machine here (second iteration required, an unused set for Key), and is another func for the reader to have to navigate through when they are reading.
GetCollections
is a better fit than this one, but I remain reluctant due to the extra layer.
sourceCols = append(sourceCols, col) | ||
} | ||
|
||
for _, sourceCol := range sourceCols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I think we should try to use index-based approach as much as possible:
for i := range sourceCols {
// ...
if source.SourceCollectionID == sourceCols[i].ID {
It might seem to be a little improvement, but it add up significantly.
This is one of the things I don't like about Go: it makes things look deceptively simple.
When one writes C++ code he is (or should be) constantly aware of costs of every instruction.
With Go it feels different (talking from my perfonal experienece). But it's the same CPU cycles that we waste.
If you really want to refer to this var as sourceCol
you can extract it as a pointer:
for i := range courseCols {
courseCol := &courseCols[i]
I think we all should try to follow this approach as much as possible.
Note: it applies to other parts in the code as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) I do disagree with this, and do see it as a personal styling preference of yours. I find a more 'modern' for i, foo
loop to be much more readable and less error prone. The performance difference can go either way too, and is typically very very small.
I do see courseCol := &courseCols[i]
as the worst of both worlds lol. Especially when it is creating a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a personal styling preference of yours
It's not personal. I think it is everybody's preference to have faster code than slower.
Yes, you are right, "modern" style is a bit more readable, but I don't think it outweighs the benefits of faster code.
If you look at one instance of a loop it seem like very small impact, but this approach is used in lots of places, which means we do a lot of copying. This adds up.
I do see courseCol := &courseCols[i] as the worst of both worlds lol. Especially when it is creating a pointer.
How is it worst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not personal. I think it is everybody's preference to have faster code than slower.
Performance is not the ultimate goal of most code. Typically we trade performance for readability (otherwise we'd all code in assembly).
Your suggestion is not always going to be more performant either. foo[i]
is not free.
How is it worst?
It does what for i, foo
does, but in a more unexpected way, and it uses a pointer. Depending on the size of foo
that can be more expensive than copying, and it can result in more GC pressure.
In my opinion:
Using for i {
everywhere is seen by me as a premature optimisation with an unproven and likely negligible impact on our users and a very real negative impact on readability.
And readability issues breed bad and unperformant code. Prioritising blanket micro-optimisations like this could ultimately result in much worse performance for our users.
And given that we differ, and a very significant part of our profession appears to also differ (else for i, foo
would not exist in Go and other langs), I see this as a personal preference and is not something that should be imposed on any developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using for i { everywhere
I didn't say everywhere. Some types (like ints) are much cheaper to copy by values. I should be decided on case-by-case basis.
CollectionDescription
is cheaper to pass around by pointer.
Anyway... didn't want to start a long conversation. It's a suggestion after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that for i, foo := range bar
improves readability and the extra allocation here has very little impact on performance. Not worth switching to for i := range bar
.
@@ -269,7 +269,13 @@ func (s *server) PushLog(ctx context.Context, req *pb.PushLogRequest) (*pb.PushL | |||
if len(cols) == 0 { | |||
return nil, client.NewErrCollectionNotFoundForSchema(schemaRoot) | |||
} | |||
col := cols[0] | |||
var col client.Collection | |||
for _, c := range cols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: just another reminder to use index-based iteration approach
a6dd001
to
2909d8e
Compare
e9aea9a
to
4dfc50e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the changes. Just a handful of comments to take care of before approval and I have something I want to bring up for discussion on Discord. Thanks Andy!
sourceCols = append(sourceCols, col) | ||
} | ||
|
||
for _, sourceCol := range sourceCols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that for i, foo := range bar
improves readability and the extra allocation here has very little impact on performance. Not worth switching to for i := range bar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for making the changes :)
Unrelated to main change, spotted due to annoying warning.
Another param will be added shortly, and it is already a bit long.
This will be used later.
e2c8d81
to
af16fc7
Compare
…#2286) ## Relevant issue(s) Resolves sourcenetwork#2198 ## Description Models Collection SchemaVersions and migrations on Collections, instead of in the Lens Registry. Primary motivators for the change: 1. Boost the DevEx RE Lens and schema versions. It seemed pretty painful before to find transforms, yet alone figure out where that lay in the version history, or add them to a collection. 2. Unify the Collection history/schema history with the View system, so that they both roughly work in the same way. Reducing the cognitive and maintenance overhead of having two different systems for the same thing for both us and the users. Noteworthy stuff: 1. The branching of schema is not new, it is just more visible with this change. There is nothing we can really do to block it in a distributed system. I do now see it as supported, excluding Lens, which still needs a relatively minor tweak in order to handle the migration-pathing. 2. The model change in client/descriptions.go 3. The removal of stuff from the LensRegistry. This is currently kept to a minimum, however I would very much like to completely remove LensRegistry in the near future. 4. The CLI and the Http clients used to rely on the fact that db.SetMigration and registry.SetMigration did the same thing. This is no longer the case. 5. The CLI and the Http clients followed a different, flatter structure than the Go client, flattening the lens registry into the `lens`/`schema` pathways. This was inconsistent, and only worked when (4) was as it was. This has been changed out of necessity and the 3 clients now have matching structures. 6. The CLI still paths everything under `schema`, and the http client everything under `lens`, which is strange, but out of scope. I will open a ticket to make them the same. 7. Some of the schema versions in the existing schema tests have changed, I think there was a bug in the framework where it was not caring about their accuracy. It now cares. 8. Users can now specify their transforms when patching a schema. 9. Out of the 4 collection getting funcs on `client.Store`, only `GetAllCollections` has the ability to get inactive collections. I currently very much dislike that we have 4 functions for the same thing in the Go client, especially given that this is not reflected in the Http and CLI clients. In another ticket I would very much like us to collapse these four functions into one, and have that one function support the fetching of inactive collections.
Relevant issue(s)
Resolves #2198
Description
Models Collection SchemaVersions and migrations on Collections, instead of in the Lens Registry.
There is quite a lot going on this PR, and most of it is shoved into the last commit, there was quite a lot that had to happen at the same time, and probably a few changes that could have been broken out to other commits, weren't.
Probably the easiest way to view the user visible changes is in
tests/integration/schema/updates/with_schema_branch_test.go
. There is a new test action, and the remodelling should be quite visible there.Primary motivators for the change:
Noteworthy stuff:
lens
/schema
pathways. This was inconsistent, and only worked when (4) was as it was. This has been changed out of necessity and the 3 clients now have matching structures.schema
, and the http client everything underlens
, which is strange, but out of scope. I will open a ticket to make them the same.client.Store
, onlyGetAllCollections
has the ability to get inactive collections. I currently very much dislike that we have 4 functions for the same thing in the Go client, especially given that this is not reflected in the Http and CLI clients. In another ticket I would very much like us to collapse these four functions into one, and have that one function support the fetching of inactive collections.Todo: