Query optimizations #478

ghost · 2020-04-18T16:44:52Z

This PR optimizes performer and scene queries by doing a few things:

Selecting only columns that are actually used by graphql. The primary purpose of this is excluding performer images and scene covers from the queries since they are never used.
Fetching all entity data at once instead of first querying for ids, and then querying each individual id for the data. Despite the sqlite manual's claims, doing a single query is significantly faster, at least in my tests.
Removing pointless joins. Many places entity tables are joined to do counts and existence checks, despite the join table alone being sufficient. E.g: Joining performers with scenes through performer_scenes and counting scenes.id, rather than just counting performers_scenes.scene_id.

The performance gains for my test cases have been:
120 Performers: ~560ms -> 30ms.
60 Scenes with performers and custom covers: 220ms -> 70ms.

Note that this is with the migration to move the performer image to the last column. Otherwise the difference for performers is likely less significant.

The scene query is harder to optimize due to a larger number of joins for each entity, but it's still a nice gain.

The diff is a bit hard to follow, and I didn't test all the filters extensively, so it's possible I missed something, but generally speaking there should be no changes other than the performance improvements.

WithoutPants · 2020-04-19T04:27:51Z

Looks like its failing the unit tests. Can you take a look at the results of make it?

pkg/models/querybuilder_scene.go

misssing field

WithoutPants · 2020-04-22T00:14:36Z

I really don't like the idea of changing the queries to get a static list of column names, but it'll have to do for now until we get rid of the blobs in the schema. I don't like having an extra place to remember to maintain columns for objects.

WithoutPants · 2020-04-22T01:00:50Z

Just as an addendum to last comment, I think having an arbitrary set of columns in the Find functions are a bad idea. It is non-obvious to the calling code just what is being returned.

My feeling is that the column changes should be dropped for this PR, so that we can get the performance gain of getting rid of the unnecessary joins, and perform the column stuff in a separate PR. The design of the query builder PR needs to be iterated on further imo.

If we absolutely must include it in this PR, my thinking would be that the existing find interfaces should be left as they originally were (getting all columns) and adding a new interface that accepts a list of columns. That way, the calling code has control over what columns it queries for, and makes it more explicit that the graphql code will be querying for specific columns.

I think though that this should probably be done separately as part of a larger refactoring of our query interfaces.

ghost · 2020-04-22T10:33:29Z

I agree that it's not a great solution. Ideally the resolvers would figure out the required columns based on the query and only fetch those, but that would probably make for a really hairy resolver.

Unfortunately once I revert the column changes, the performance gains largely evaporate. I'll try to come up with a cleaner solution, but as you say the query builders should probably be refactored altogether, and I don't think I'm the right person to do that.

WithoutPants · 2020-04-24T05:42:46Z

Assuming we remove the blobs from the tables, is there going to be an appreciative performance difference between doing select * vs select <specific columns>?

If there is, then it makes sense to refactor the query functions to accept columns. Otherwise, I would suggest changing the unnecessary joins and leaving the column selection as is with the view that the blob stuff should be done as priority for the next release.

I'm going to submit a PR to your branch with an alternative approach, but I am hesitant to merge it into 0.2 presuming we release reasonably soon, since there's a decent risk of regression.

WithoutPants · 2020-04-24T06:43:01Z

I've had a quick go at it. You can take a look at the branch here: https://github.com/WithoutPants/stash/tree/select-columns

I still don't really like the idea in general. When we make a call that gets a Scene or Performer object, we should really be fully populating the object, because otherwise other functions have no idea how much an object has been populated. I can see this being a massive maintenance headache.

ghost · 2020-04-29T09:56:13Z

Column changes have been reverted, as have the bulk fetching since it proved to be slower than fetching everything one by one without the column change.

Also removed some groupBys in performer findByScene since it's really slow and doesn't make much sense anyway. If someone for some reason attaches the same performer multiple times to a scene then we should just return that.

Finally, refactored scene query to use the querybuilder because why not.

WithoutPants · 2020-05-04T07:04:58Z

Changes look ok and tested ok during my quick testing. #513 is now merged, so if you can merge in the develop branch then we can ensure that at least the new unit tests pass.

WithoutPants

Can you please add the following unit tests to querybuilder_scene_test.go and remove the applicable TODO comments:

func TestSceneCountByTagID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	sceneCount, err := sqb.CountByTagID(tagIDs[tagIdxWithScene])

	if err != nil {
		t.Fatalf("error calling CountByTagID: %s", err.Error())
	}

	assert.Equal(t, 1, sceneCount)

	sceneCount, err = sqb.CountByTagID(0)

	if err != nil {
		t.Fatalf("error calling CountByTagID: %s", err.Error())
	}

	assert.Equal(t, 0, sceneCount)
}

func TestSceneCountByMovieID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	sceneCount, err := sqb.CountByMovieID(movieIDs[movieIdxWithScene])

	if err != nil {
		t.Fatalf("error calling CountByMovieID: %s", err.Error())
	}

	assert.Equal(t, 1, sceneCount)

	sceneCount, err = sqb.CountByMovieID(0)

	if err != nil {
		t.Fatalf("error calling CountByMovieID: %s", err.Error())
	}

	assert.Equal(t, 0, sceneCount)
}

func TestSceneCountByStudioID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	sceneCount, err := sqb.CountByStudioID(studioIDs[studioIdxWithScene])

	if err != nil {
		t.Fatalf("error calling CountByStudioID: %s", err.Error())
	}

	assert.Equal(t, 1, sceneCount)

	sceneCount, err = sqb.CountByStudioID(0)

	if err != nil {
		t.Fatalf("error calling CountByStudioID: %s", err.Error())
	}

	assert.Equal(t, 0, sceneCount)
}

func TestFindByMovieID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	scenes, err := sqb.FindByMovieID(movieIDs[movieIdxWithScene])

	if err != nil {
		t.Fatalf("error calling FindByMovieID: %s", err.Error())
	}

	assert.Len(t, scenes, 1)
	assert.Equal(t, sceneIDs[sceneIdxWithMovie], scenes[0].ID)

	scenes, err = sqb.FindByMovieID(0)

	if err != nil {
		t.Fatalf("error calling FindByMovieID: %s", err.Error())
	}

	assert.Len(t, scenes, 0)
}

func TestFindByPerformerID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	scenes, err := sqb.FindByPerformerID(performerIDs[performerIdxWithScene])

	if err != nil {
		t.Fatalf("error calling FindByPerformerID: %s", err.Error())
	}

	assert.Len(t, scenes, 1)
	assert.Equal(t, sceneIDs[sceneIdxWithPerformer], scenes[0].ID)

	scenes, err = sqb.FindByPerformerID(0)

	if err != nil {
		t.Fatalf("error calling FindByPerformerID: %s", err.Error())
	}

	assert.Len(t, scenes, 0)
}

func TestFindByStudioID(t *testing.T) {
	sqb := models.NewSceneQueryBuilder()

	scenes, err := sqb.FindByStudioID(performerIDs[studioIdxWithScene])

	if err != nil {
		t.Fatalf("error calling FindByStudioID: %s", err.Error())
	}

	assert.Len(t, scenes, 1)
	assert.Equal(t, sceneIDs[sceneIdxWithStudio], scenes[0].ID)

	scenes, err = sqb.FindByStudioID(0)

	if err != nil {
		t.Fatalf("error calling FindByStudioID: %s", err.Error())
	}

	assert.Len(t, scenes, 0)
}

Three of these tests are failing (TestSceneCountByTagID, TestSceneCountByMovieID and TestFindByMovieID). I believe it is due to the selectAll("all") calls in lines 30 and 36 of querybuilder_scene.go. This is also causing the movies and tags pages to error out.

WithoutPants · 2020-05-09T00:40:33Z

pkg/models/querybuilder_scene.go

@@ -10,7 +10,9 @@ import (
 	"github.com/stashapp/stash/pkg/database"
 )

-var scenesForPerformerQuery = selectAll("scenes") + `
+const TABLE_NAME = "scenes"


Golang constants are visible module-wide, so TABLE_NAME is not a good constant name for a scene-specific value. Suggest changing this to sceneTable or similar.

Duh, I keep forgetting that go is basically C, scope and all. Fixed.

* Remove slow and largely pointless groupbys * Change scene.query to use querybuilder

ghost force-pushed the query-optimizations branch from d38889e to 9a77dfe Compare April 19, 2020 12:31

bnkai mentioned this pull request Apr 20, 2020

Export performance optimization #475

Merged

bnkai previously approved these changes Apr 20, 2020

View reviewed changes

bnkai reviewed Apr 20, 2020

View reviewed changes

pkg/models/querybuilder_scene.go Outdated Show resolved Hide resolved

bnkai approved these changes Apr 21, 2020

View reviewed changes

WithoutPants mentioned this pull request Apr 24, 2020

[RFC] Changing image storage from DB blob to file system #425

Closed

ghost force-pushed the query-optimizations branch from 2922e7b to 352787a Compare April 29, 2020 09:47

WithoutPants mentioned this pull request May 1, 2020

Querybuilder integration tests #513

Merged

ghost force-pushed the query-optimizations branch from 352787a to 4508340 Compare May 4, 2020 15:21

WithoutPants requested changes May 8, 2020

View reviewed changes

WithoutPants requested changes May 9, 2020

View reviewed changes

Infinite added 10 commits May 9, 2020 08:45

Optimize performer and scene queries

371a8a0

Formatting

4c64368

Remove old code

4ac0c24

Add missing column

8fac44e

Add o_counter column

a739e0e

Re-add scenes join to performers query

e3ba504

Revert change to scenes qb.All

680ef01

Revert column changes

8587fcc

Revert query changes

e72ee07

Remove slow and largely pointless groupbys

bbbe210

Infinite added 4 commits May 9, 2020 08:46

Change scene.query to use querybuilder

2079dc5

Fix Movie.FindBySceneId

68d834f

Add additional tests from @WithoutPants and fix bug

f962e4c

Rename sceneTable constZ

7681b2a

ghost force-pushed the query-optimizations branch from 2dd38ee to 7681b2a Compare May 9, 2020 06:50

Infinite and others added 2 commits May 9, 2020 08:53

Revert go.mod

832cda5

Add test for FindNameBySceneID

50921cd

WithoutPants approved these changes May 11, 2020

View reviewed changes

WithoutPants merged commit 8ba7678 into stashapp:develop May 11, 2020

WithoutPants mentioned this pull request Aug 31, 2020

Fix age filtering regression #778

Merged

Tweeticoats pushed a commit to Tweeticoats/stash that referenced this pull request Feb 1, 2021

Query optimizations (stashapp#478)

3508349

* Remove slow and largely pointless groupbys * Change scene.query to use querybuilder

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query optimizations #478

Query optimizations #478

ghost commented Apr 18, 2020

WithoutPants commented Apr 19, 2020

WithoutPants commented Apr 22, 2020

WithoutPants commented Apr 22, 2020 •

edited

Loading

ghost commented Apr 22, 2020

WithoutPants commented Apr 24, 2020

WithoutPants commented Apr 24, 2020

ghost commented Apr 29, 2020

WithoutPants commented May 4, 2020

WithoutPants left a comment

WithoutPants May 9, 2020

ghost May 9, 2020

Query optimizations #478

Query optimizations #478

Conversation

ghost commented Apr 18, 2020

WithoutPants commented Apr 19, 2020

WithoutPants commented Apr 22, 2020

WithoutPants commented Apr 22, 2020 • edited Loading

ghost commented Apr 22, 2020

WithoutPants commented Apr 24, 2020

WithoutPants commented Apr 24, 2020

ghost commented Apr 29, 2020

WithoutPants commented May 4, 2020

WithoutPants left a comment

Choose a reason for hiding this comment

WithoutPants May 9, 2020

Choose a reason for hiding this comment

ghost May 9, 2020

Choose a reason for hiding this comment

WithoutPants commented Apr 22, 2020 •

edited

Loading