Query optimizations #478
Conversation
Looks like it's failing the unit tests. Can you take a look at the results?
Force-pushed from d38889e to 9a77dfe.
I really don't like the idea of changing the queries to get a static list of column names, but it'll have to do for now until we get rid of the blobs in the schema. I don't like having an extra place to remember to maintain columns for objects.
Just as an addendum to the last comment: I think having an arbitrary set of columns in the [...]

My feeling is that the column changes should be dropped for this PR, so that we can get the performance gain of getting rid of the unnecessary joins, and perform the column stuff in a separate PR. The design of the query builder PR needs to be iterated on further, IMO.

If we absolutely must include it in this PR, my thinking would be that the existing find interfaces should be left as they originally were (getting all columns), with a new interface added that accepts a list of columns. That way, the calling code has control over which columns it queries for, and it is more explicit that the GraphQL code will be querying for specific columns. I think, though, that this should probably be done separately as part of a larger refactoring of our query interfaces.
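A minimal sketch of the kind of column-accepting interface suggested above, assuming plain database/sql; the function names here are illustrative, not the project's actual API:

package example

import (
    "database/sql"
    "fmt"
    "strings"
)

// findSceneColumns is the hypothetical additional interface: the caller passes
// the columns it needs, so e.g. GraphQL resolvers can skip blob columns they
// never return. The column list is assumed to come from trusted calling code.
func findSceneColumns(db *sql.DB, id int, columns []string) (*sql.Rows, error) {
    query := fmt.Sprintf("SELECT %s FROM scenes WHERE id = ? LIMIT 1",
        strings.Join(columns, ", "))
    return db.Query(query, id)
}

// findScene keeps the original behaviour of fetching every column.
func findScene(db *sql.DB, id int) (*sql.Rows, error) {
    return findSceneColumns(db, id, []string{"*"})
}

Calling code would then be explicit about what it fetches, e.g. findSceneColumns(db, 1, []string{"id", "title", "path"}).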
I agree that it's not a great solution. Ideally the resolvers would figure out the required columns based on the query and only fetch those, but that would probably make for a really hairy resolver. Unfortunately, once I revert the column changes, the performance gains largely evaporate. I'll try to come up with a cleaner solution, but as you say, the query builders should probably be refactored altogether, and I don't think I'm the right person to do that.
Assuming we remove the blobs from the tables, is there going to be an appreciable performance difference between doing a select * and selecting only the required columns?

If there is, then it makes sense to refactor the query functions to accept columns. Otherwise, I would suggest changing the unnecessary joins and leaving the column selection as is, with the view that the blob stuff should be done as a priority for the next release.

I'm going to submit a PR to your branch with an alternative approach, but I am hesitant to merge it into 0.2, presuming we release reasonably soon, since there's a decent risk of regression.
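A rough way to check whether the difference is appreciable is to benchmark both query shapes. This sketch is an assumption-laden illustration: an in-memory SQLite database via mattn/go-sqlite3 and a made-up scenes table with a cover blob; dropping the blob column approximates the "blobs removed" case discussed above.

package example

import (
    "database/sql"
    "testing"

    _ "github.com/mattn/go-sqlite3"
)

// setupDB builds a throwaway scenes-like table with a large blob per row.
func setupDB(b *testing.B) *sql.DB {
    db, err := sql.Open("sqlite3", ":memory:")
    if err != nil {
        b.Fatal(err)
    }
    if _, err := db.Exec(`CREATE TABLE scenes (id INTEGER PRIMARY KEY, title TEXT, cover BLOB)`); err != nil {
        b.Fatal(err)
    }
    cover := make([]byte, 256*1024) // fake 256 KB cover image
    for i := 0; i < 100; i++ {
        if _, err := db.Exec(`INSERT INTO scenes (title, cover) VALUES (?, ?)`, "scene", cover); err != nil {
            b.Fatal(err)
        }
    }
    return db
}

// benchQuery runs the given query repeatedly and drains the result set.
func benchQuery(b *testing.B, query string) {
    db := setupDB(b)
    defer db.Close()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        rows, err := db.Query(query)
        if err != nil {
            b.Fatal(err)
        }
        for rows.Next() {
            // just iterate; scanning is omitted to keep the sketch short
        }
        rows.Close()
    }
}

func BenchmarkSelectStar(b *testing.B)    { benchQuery(b, `SELECT * FROM scenes`) }
func BenchmarkSelectColumns(b *testing.B) { benchQuery(b, `SELECT id, title FROM scenes`) }

Run with go test -bench .; the gap between the two benchmarks is roughly the cost attributable to fetching unneeded columns.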
I've had a quick go at it. You can take a look at the branch here: https://github.com/WithoutPants/stash/tree/select-columns

I still don't really like the idea in general. When we make a call that gets a [...]
Force-pushed from 2922e7b to 352787a.
Column changes have been reverted, as has the bulk fetching, since it proved to be slower than fetching everything one by one without the column change. Also removed some groupBys in the performer findByScene, since it's really slow and doesn't make much sense anyway: if someone for some reason attaches the same performer multiple times to a scene, then we should just return that. Finally, refactored the scene query to use the querybuilder, because why not.
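An illustrative before/after of the groupBy removal described above; the table and column names are assumptions, and the real queries in the codebase are more involved:

package example

// Before: GROUP BY forces the database to sort/aggregate the joined rows just
// to de-duplicate performers, which is slow on larger join results.
const performersForSceneGrouped = `
SELECT performers.* FROM performers
JOIN performers_scenes ON performers_scenes.performer_id = performers.id
WHERE performers_scenes.scene_id = ?
GROUP BY performers.id
`

// After: the join table should only hold one row per (performer, scene) pair,
// so the GROUP BY is dropped; a duplicate attachment would simply be returned.
const performersForScene = `
SELECT performers.* FROM performers
JOIN performers_scenes ON performers_scenes.performer_id = performers.id
WHERE performers_scenes.scene_id = ?
`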
Changes look OK and tested fine during my quick testing. #513 is now merged, so if you can merge in the develop branch, then we can ensure that at least the new unit tests pass.
Can you please add the following unit tests to querybuilder_scene_test.go and remove the applicable TODO comments:
func TestSceneCountByTagID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    sceneCount, err := sqb.CountByTagID(tagIDs[tagIdxWithScene])
    if err != nil {
        t.Fatalf("error calling CountByTagID: %s", err.Error())
    }
    assert.Equal(t, 1, sceneCount)
    sceneCount, err = sqb.CountByTagID(0)
    if err != nil {
        t.Fatalf("error calling CountByTagID: %s", err.Error())
    }
    assert.Equal(t, 0, sceneCount)
}

func TestSceneCountByMovieID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    sceneCount, err := sqb.CountByMovieID(movieIDs[movieIdxWithScene])
    if err != nil {
        t.Fatalf("error calling CountByMovieID: %s", err.Error())
    }
    assert.Equal(t, 1, sceneCount)
    sceneCount, err = sqb.CountByMovieID(0)
    if err != nil {
        t.Fatalf("error calling CountByMovieID: %s", err.Error())
    }
    assert.Equal(t, 0, sceneCount)
}

func TestSceneCountByStudioID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    sceneCount, err := sqb.CountByStudioID(studioIDs[studioIdxWithScene])
    if err != nil {
        t.Fatalf("error calling CountByStudioID: %s", err.Error())
    }
    assert.Equal(t, 1, sceneCount)
    sceneCount, err = sqb.CountByStudioID(0)
    if err != nil {
        t.Fatalf("error calling CountByStudioID: %s", err.Error())
    }
    assert.Equal(t, 0, sceneCount)
}

func TestFindByMovieID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    scenes, err := sqb.FindByMovieID(movieIDs[movieIdxWithScene])
    if err != nil {
        t.Fatalf("error calling FindByMovieID: %s", err.Error())
    }
    assert.Len(t, scenes, 1)
    assert.Equal(t, sceneIDs[sceneIdxWithMovie], scenes[0].ID)
    scenes, err = sqb.FindByMovieID(0)
    if err != nil {
        t.Fatalf("error calling FindByMovieID: %s", err.Error())
    }
    assert.Len(t, scenes, 0)
}

func TestFindByPerformerID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    scenes, err := sqb.FindByPerformerID(performerIDs[performerIdxWithScene])
    if err != nil {
        t.Fatalf("error calling FindByPerformerID: %s", err.Error())
    }
    assert.Len(t, scenes, 1)
    assert.Equal(t, sceneIDs[sceneIdxWithPerformer], scenes[0].ID)
    scenes, err = sqb.FindByPerformerID(0)
    if err != nil {
        t.Fatalf("error calling FindByPerformerID: %s", err.Error())
    }
    assert.Len(t, scenes, 0)
}

func TestFindByStudioID(t *testing.T) {
    sqb := models.NewSceneQueryBuilder()
    scenes, err := sqb.FindByStudioID(studioIDs[studioIdxWithScene])
    if err != nil {
        t.Fatalf("error calling FindByStudioID: %s", err.Error())
    }
    assert.Len(t, scenes, 1)
    assert.Equal(t, sceneIDs[sceneIdxWithStudio], scenes[0].ID)
    scenes, err = sqb.FindByStudioID(0)
    if err != nil {
        t.Fatalf("error calling FindByStudioID: %s", err.Error())
    }
    assert.Len(t, scenes, 0)
}
Three of these tests are failing (TestSceneCountByTagID, TestSceneCountByMovieID and TestFindByMovieID). I believe it is due to the selectAll("all") calls in lines 30 and 36 of querybuilder_scene.go. This is also causing the movies and tags pages to error out.
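For context on why selectAll("all") would break things: assuming the helper simply builds a SELECT prefix from a table name (the real implementation may differ), passing the literal string "all" produces SQL against a table that does not exist.

package example

// Assumed shape of the helper; not copied from the real codebase.
func selectAll(tableName string) string {
    return "SELECT " + tableName + ".* FROM " + tableName + " "
}

// selectAll("scenes") -> "SELECT scenes.* FROM scenes "
// selectAll("all")    -> "SELECT all.* FROM all "
// Under that assumption, any count/find query built on the second prefix fails,
// which would match the failing tests and the movies/tags pages erroring out.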
pkg/models/querybuilder_scene.go (Outdated)

@@ -10,7 +10,9 @@ import (
    "github.com/stashapp/stash/pkg/database"
)

var scenesForPerformerQuery = selectAll("scenes") + `
const TABLE_NAME = "scenes"
Go constants declared at the package level are visible package-wide, so TABLE_NAME is not a good constant name for a scene-specific value. Suggest changing this to sceneTable or similar.
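For reference, the distinction being suggested, as it might look in pkg/models (sceneTable is the proposed replacement, not part of the diff above):

package models

// Exported and generic: visible to every importer of the models package, and
// says nothing about which table it names.
const TABLE_NAME = "scenes"

// Unexported and specific: scoped to the package and self-describing.
const sceneTable = "scenes"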
Duh, I keep forgetting that Go is basically C, scope and all. Fixed.
* Remove slow and largely pointless groupBys
* Change scene.query to use querybuilder
This PR optimizes performer and scene queries by doing a few things:
The performance gains for my test cases have been:
120 Performers: ~560ms -> 30ms.
60 Scenes with performers and custom covers: 220ms -> 70ms.
Note that this is with the migration to move the performer image to the last column. Otherwise the difference for performers is likely less significant.
The scene query is harder to optimize due to a larger number of joins for each entity, but it's still a nice gain.
The diff is a bit hard to follow, and I didn't test all the filters extensively, so it's possible I missed something, but generally speaking there should be no changes other than the performance improvements.
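A side note on the migration mentioned above (moving the performer image to the last column): SQLite stores a row's columns in declaration order, and a large blob can push part of the row onto overflow pages. Columns stored after the blob then require walking those overflow pages even when the blob itself isn't selected, whereas a blob in the last position is simply never read by queries that skip it. Since SQLite cannot reorder columns in place, such a migration usually rebuilds the table. A rough sketch under those assumptions; the column names and the project's actual migration scripts will differ:

package example

// Table-rebuild pattern for moving the image blob to the last position:
// create a new table, copy the data, drop the old table, rename.
const movePerformerImageMigration = `
CREATE TABLE performers_new (
    id INTEGER PRIMARY KEY,
    name TEXT,
    -- ... remaining non-blob columns ...
    image BLOB -- moved to the end so most queries never touch it
);
INSERT INTO performers_new (id, name, image)
    SELECT id, name, image FROM performers;
DROP TABLE performers;
ALTER TABLE performers_new RENAME TO performers;
`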