
Export performance optimization #475

Merged: 6 commits merged into stashapp:develop on Apr 24, 2020
Conversation

@bnkai (Collaborator) commented Apr 18, 2020

Improved the export functionality to utilize multiple threads when creating the JSON files.
In my case (NVMe SSD for db and metadata, dual 6c/12t processor) the export function seemed to be CPU bottlenecked due to single-thread usage. I suspect the base64 conversion for images might have something to do with that.
I added some worker threads (GOMAXPROCS in number by default); a rough sketch of the approach is included at the end of this description.
The difference in performance is more evident for the scenes part in my case.

~11k scenes (~700 MB in the metadata/scenes folder)
Time to export scenes:
single thread: 3m33s
4 workers: 50s
8 workers: 26s
12 workers: 19s
24 workers: 15s

By default the number of workers is set to the number of CPUs.
You can experiment by setting the GOMAXPROCS environment variable,
e.g. by running stash like this (Linux, 4 workers):
GOMAXPROCS=4 ./stash-linux
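
For reference, a minimal sketch of the worker-pool approach, assuming a hypothetical Scene type and exportScene function that stand in for stash's real scene model and JSON-writing code (not the actual implementation):

import (
	"runtime"
	"sync"
)

// Scene and exportScene are placeholders for illustration only.
type Scene struct{ ID int }

func exportScene(s Scene) {
	// marshal the scene (including its base64-encoded cover image) and write
	// the per-scene JSON file under the metadata/scenes folder
}

// exportScenes fans the scenes out to GOMAXPROCS worker goroutines.
func exportScenes(scenes []Scene) {
	jobs := make(chan Scene)
	var wg sync.WaitGroup

	workers := runtime.GOMAXPROCS(0) // CPU count by default; honours the GOMAXPROCS env var
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range jobs {
				exportScene(s)
			}
		}()
	}

	for _, s := range scenes {
		jobs <- s
	}
	close(jobs)
	wg.Wait()
}

Each worker does its own JSON marshalling (including the base64 image encoding), so the CPU-heavy part spreads across cores instead of saturating a single one.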

@bnkai bnkai added the improvement Something needed tweaking. label Apr 18, 2020
@WithoutPants (Collaborator) commented

Seems strange that the bottleneck is the CPU.

@bnkai (Collaborator, Author) commented Apr 18, 2020

I didn't expect that either. I noticed that a single thread out of 24 (12c/24t CPU) was constantly maxed out during export. My db and metadata are located on NVMe (benches 1 GB/s+) and I have plenty of RAM available as well (48 GB), so the bottleneck was something else.

@WithoutPants (Collaborator) commented

In the original code, I profiled the export process, with the following graph:

[pprof graph: original export code]

I did a quick change to add FindNamesBySceneID to PerformerQueryBuilder which selects the names of the performers only, and called that from ExportScenes. On the database I was testing on, the export went from taking 2.17 minutes to 53 seconds. The graph for that run is below:

[pprof graph: after the FindNamesBySceneID change]
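
As a rough illustration of that idea (not the actual implementation), a name-only query can skip the full performer rows, and in particular the image blobs, entirely. The join-table and column names below (performers_scenes, performer_id, scene_id) are assumptions for the sketch:

import "database/sql"

// findPerformerNamesBySceneID fetches only the performer names for a scene
// instead of whole performer rows.
func findPerformerNamesBySceneID(db *sql.DB, sceneID int) ([]string, error) {
	rows, err := db.Query(
		`SELECT performers.name
		 FROM performers
		 JOIN performers_scenes ON performers_scenes.performer_id = performers.id
		 WHERE performers_scenes.scene_id = ?`, sceneID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	return names, rows.Err()
}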

I'm guessing it's the blobs in the database that are causing the slowdown (yet another mark for not storing them there).

That was with 10 minutes of investigation or so. With enough time we can probably optimise further. I'd prefer finding and fixing the bottlenecks before we optimise by adding threads.

For profiling I added this code:

import (
	"os"
	"runtime/pprof"
)

f, _ := os.Create("export.prof") // error handling omitted for brevity
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()

// code to be profiled

Then I used go tool pprof -svg stash <path to export.prof> > image.svg to generate the graph image.

@bnkai (Collaborator, Author) commented Apr 20, 2020

Are you sure you were CPU bottlenecked and not I/O bound?

I tried changing the part you mentioned and got a nice speed bump, but not as big as yours.
From my test: 11k files, 536 performers (NVMe SSD).

single worker before change:
[pprof graph: profile01]
after change:
[pprof graph: profile01-opt]

There is a ~40 sec gain: 4m50s -> 4m10s.
As you can see, no I/O threads are present in the before/after graphs, compared to your profiling.

8 workers:
[pprof graph: profile08]

8 workers after change:
[pprof graph: profile08-opt]

35s -> 26s from the change.

Overall, 4m10s -> 26s going from 1 worker with the change to 8 workers with the change.

I'll try some more profiling to see if I can find something else that explains my CPU bottleneck, but in my case threading still gives substantial gains. Can you double check your CPU usage while doing the export to see if any of your CPU cores is maxed out?

@bnkai (Collaborator, Author) commented Apr 20, 2020

Lol, FindBySceneID in TagQueryBuilder was using an unnecessary join with scenes.
Removing the join got it down to
33s for 1 worker
[pprof graph: profile01-opt2]
and
6.5s for 8 workers
[pprof graph: profile08-opt]

It seems @InfiniteTF has the join removal included in #478 also
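
For context, a sketch of the kind of query simplification being described here; the table and column names (scenes_tags, tag_id, scene_id) are assumptions for illustration rather than stash's exact schema:

// Before: joins scenes even though the scene ID already lives on the join table.
const findTagsBySceneIDBefore = `
	SELECT tags.* FROM tags
	JOIN scenes_tags ON scenes_tags.tag_id = tags.id
	JOIN scenes ON scenes_tags.scene_id = scenes.id
	WHERE scenes.id = ?`

// After: filter on the join table directly, so the scenes table is never touched.
const findTagsBySceneIDAfter = `
	SELECT tags.* FROM tags
	JOIN scenes_tags ON scenes_tags.tag_id = tags.id
	WHERE scenes_tags.scene_id = ?`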

@bnkai (Collaborator, Author) commented Apr 20, 2020

Replaced encoding/json with github.com/json-iterator/go in jsonschema and
got from 33s to 21s for 1 worker, and from 6.5s to 4.8s using 8 workers.
1 worker:
[pprof graph: profile01-opt3]
8 workers:
[pprof graph: profile08-opt3]
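
The swap is roughly a drop-in replacement, since json-iterator exposes an encoding/json-compatible API; a minimal sketch (the helper below is hypothetical, ConfigCompatibleWithStandardLibrary is the library's standard-library-compatible configuration):

import jsoniter "github.com/json-iterator/go"

// json mirrors the encoding/json package API, so existing Marshal/Unmarshal
// call sites can stay unchanged.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

func marshalIndented(v interface{}) ([]byte, error) {
	return json.MarshalIndent(v, "", "  ")
}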

With the last change we have:
21s for 1 worker
11s for 2 workers
6.7s for 4 workers
4.8s for 8 workers

In my case I still have a CPU bottleneck (1 core maxed out), but now it only lasts for 20s.

@WithoutPants (Collaborator) commented

Exporting when the directory structure hasn't been created results in only mappings.json and scraped.json being created.

@bnkai (Collaborator, Author) commented Apr 21, 2020

The directory structure is created once during the setup, I think. Did you change that in the settings?

@WithoutPants (Collaborator) commented

I think it was a problem at my end. I must've deleted the directory structure before running the develop version, but deleted it while the branch version was already running, meaning that it didn't recreate the structure. In fact, if I look at the logs after exporting, I can see errors because the directories don't exist. Regardless, this is certainly nothing to do with your changes.

If we are going to include creating the directory structure as part of the export, then can you combine the repeated code (at startup and during export) into a separate function?
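
A hypothetical sketch of such a shared helper, called both at startup and before the export writes any files; the function name and the subdirectory list are illustrative assumptions, not the actual stash layout:

import (
	"os"
	"path/filepath"
)

// ensureMetadataDirs recreates the metadata directory tree if it is missing.
func ensureMetadataDirs(metadataPath string) error {
	for _, sub := range []string{"scenes", "galleries", "performers", "studios"} {
		if err := os.MkdirAll(filepath.Join(metadataPath, sub), 0755); err != nil {
			return err
		}
	}
	return nil
}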

@bnkai (Collaborator, Author) commented Apr 22, 2020

Deleting the structure while stash was running explains your case.
Working on the extra function now.

@WithoutPants WithoutPants merged commit 9b1518b into stashapp:develop Apr 24, 2020
@WithoutPants WithoutPants added this to the Version 0.2.0 milestone Apr 24, 2020