Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add basic group by functionality #43

Merged
merged 9 commits into from
Nov 22, 2021
Merged

feat: Add basic group by functionality #43

merged 9 commits into from
Nov 22, 2021

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Nov 16, 2021

Adds basic group by functionality to defraDb. Closes #25 Closes #46

Note for reviewers:

  • This does not add support for joins within child groups, although the gql clients may suggest that it does (see https://github.com/sourcenetwork/defradb/tree/sisley/group-by-child-join with failing test and partial (and very WIP implementation)), please let me know your thoughts on whether this can wait or not (Group By: Support joins within _group items #46) Fixed in Group By: Support joins within _group items #46 and cherry-picked into here, see commits (Handle pipe nodes in addSubPlan and Add support for joins within groups)
  • Adding non-grouped fields at the parent query level is permitted by gql, but the behaviour is undefined (will just render the last value); e.g. user (groupBy: [Age]) {Age, Name}
  • Commits are an absolute mess at the moment and I will clean them up once we decide that the functionality and implementation are alright to merge. They will be squashed, and then possibly re-commited in a sensible fashion. Please review all changes at once via the Files changed.
  • Aggregates are not added here

To do:

  • Clean up commits once people are mostly happy with implementation (previous perma-stashed in sisley/group-by-pre-squash)
  • Cleanup any other 'perma-stash' remote branches
  • Proper godocs for node(s)
  • Decide whether to break out the auxiliary components (such as pipeNode) from the group.go file (I'm really in two minds here, input very very welcome)
  • One-to-many join tests are flacky in this branch - develop doesn't seem to have this issue - find and fix
  • Squash fixup commit and inner join commits

@AndrewSisley AndrewSisley added feature New feature or request area/query Related to the query component labels Nov 16, 2021
@AndrewSisley AndrewSisley self-assigned this Nov 16, 2021
@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 5949c8b in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 16, 2021

@ handle error

//todo@ handle error
}
fields[parser.GroupFieldName] = &gql.Field{
Type: gql.NewList(gqlType),
}


This comment was generated by todo based on a todo comment in 5949c8b in #43. cc @sourcenetwork.

@jsimnz jsimnz linked an issue Nov 16, 2021 that may be closed by this pull request
@AndrewSisley
Copy link
Contributor Author

AndrewSisley commented Nov 16, 2021

More info on the flacky tests:

  • Only one-many are affected (incl. non-group tests)
  • error surfaces ~ln90 in executor.go from the call gql.ValidateDocument(schema, ast, nil)
  • occurs ~ 1 in 10 times an affected test runs
  • spamming db.SchemaManager().ResolveTypes() or gqlObject.Fields() appears to have no effect on the rate of failure
  • schema.TypeMap()["author"].(*gql.Object).Fields()["published"].Args is empty when tests fail (~ln 90, executor.go), and populated when they pass

Once the go cache decides it has passed, it can be really hard to make it fail again so clean the cache first (go clean -testcache).

Lazy command for me to reproduce (may take a few runs, warning - contains local aliases if you are not me):
go.t.c && go.t.r TestQueryOneToManyWithNumericGreaterThanFilterOnParentAndChild

UPDATE: Issue fixed - see commit "Track expanded items by object and name" - issue may have been affecting production (if object has more than one parent record - e.g. book with an author and a publisher - might be worth adding tests for outside of this PR)

@AndrewSisley
Copy link
Contributor Author

this is incorrect for joins within _group collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)

This comment was generated by todo based on a todo comment in 5949c8b in #43. cc @sourcenetwork.

Issue #46

@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in fb55cd3 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 7fe6c2b in #43. cc @sourcenetwork.

@AndrewSisley AndrewSisley changed the title Add basic group by functionality feat: Add basic group by functionality Nov 16, 2021
@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in b97a202 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 26d25ec in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 6a66e1a in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 16, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 3a24a75 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 17, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 7423a89 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 17, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in fc4d2e8 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 17, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 605e732 in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 17, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in b6741dc in #43. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 17, 2021

this is incorrect for joins within `_group` collections, and should be corrected when possible

// @todo: this is incorrect for joins within `_group` collections, and should be corrected when possible
// Find the first scan node in the plan, we assume that it will be for the correct collection
scanNode := p.walkAndFindPlanType(plan.plan, &scanNode{}).(*scanNode)
// Check for any existing pipe nodes in the plan, we should use it if there is one
pipe, hasPipe := p.walkAndFindPlanType(plan.plan, &pipeNode{}).(*pipeNode)


This comment was generated by todo based on a todo comment in 82bf22e in #43. cc @sourcenetwork.

@AndrewSisley AndrewSisley linked an issue Nov 17, 2021 that may be closed by this pull request
Tracking by object only means that arguements on child objects only get added to the one field if more than one property requires them
'f' is used everywhere else in the file
Allows a lazy-loaded cache with arbitrary reads
Note, does not yet fully implement the planNode interface and is only used within the groupNode
Test currently fails, needs more work plus cleanup.  Stashing here and focusing on the main feature.
Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me, is there any final things you wanted/needed to do before this gets merged?

Is this PR also fixing those OneToMany test cases that we're randomly failing? I see you added some changes to type generation stuff.

This is one of those things that might need an engineering note (or section in the DB Tech Spec) so we have a brief overview of the implementation, as well as ideas/thoughts to extend it with aggregates if you're open to writing that.

@AndrewSisley
Copy link
Contributor Author

AndrewSisley commented Nov 22, 2021

All looks good to me, is there any final things you wanted/needed to do before this gets merged?

Is this PR also fixing those OneToMany test cases that we're randomly failing? I see you added some changes to type generation stuff.

This is one of those things that might need an engineering note (or section in the DB Tech Spec) so we have a brief overview of the implementation, as well as ideas/thoughts to extend it with aggregates if you're open to writing that.

Test flackiness was fixed with commit "Track expanded items by object and name" - it might have been a production issue, but I haven't verified that.

Not before it gets merged, but I would be interested in benchmarking this with large (+1,000,000) keys at somepoint - I don't know about the Go implementation but most normal map implementations can get quite slow with large numbers of records and it might be worth optimizing that at somepoint.

Happy to look at adding stuff to the tech spec - will do before I pick up a new task

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jsimnz
Copy link
Member

jsimnz commented Nov 22, 2021

I believe Go's map implementation holds up pretty well, but we can certainly keep an eye on it

@jsimnz jsimnz merged commit d717b8c into develop Nov 22, 2021
@jsimnz jsimnz added the sred/merged SR&ED activity: Merged label Nov 22, 2021
@AndrewSisley
Copy link
Contributor Author

Added a short section to the tech spec under relationships

@AndrewSisley AndrewSisley deleted the sisley/group-by branch November 22, 2021 23:47
@orpheuslummis orpheuslummis mentioned this pull request Jan 29, 2022
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Remove commented out test code

* Track expanded items by object and name

Tracking by object only means that arguements on child objects only get added to the one field if more than one property requires them

* Rename variable

'f' is used everywhere else in the file

* Add pipe node to defra db

Allows a lazy-loaded cache with arbitrary reads

* Add data-source/arbitrary join node

Note, does not yet fully implement the planNode interface and is only used within the groupNode

* Add group-by functionality to defra db

* Add support for joins within groups

Test currently fails, needs more work plus cleanup.  Stashing here and focusing on the main feature.

* Handle pipe nodes in addSubPlan

* FIXUP - Move child group field propogation into p.GroupBy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/query Related to the query component feature New feature or request sred/merged SR&ED activity: Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Group By: Support joins within _group items Query GroupBy operation
2 participants