Export performance optimization #475
Seems strange that the bottleneck is the CPU.
I didn't expect that either. I noticed that a single thread out of 24 (12c/24t CPU) was constantly maxed out during export. My DB and metadata are located on NVMe (benchmarks at 1 GB/s+) and I also have plenty of RAM available (48 GB), so the bottleneck was something else.
In the original code, I profiled the export process, with the following graph: I did a quick change to add I'm guessing it's the blobs in the database that are causing the slowdown (yet another mark for not storing them there). That was with 10 minutes of investigation or so. With enough time we can probably optimise further. I'd prefer finding and fixing the bottlenecks before we optimise by adding threads. For profiling I added this code:
Then used
Lol, FindBySceneID in TagQueryBuilder was using an unnecessary join with scenes. It seems @InfiniteTF also included the join removal in #478.
Exporting when the directory structure hasn't been created results in only
The directory structure is created once during setup, I think. Did you change that in the settings?
I think it was a problem at my end. I think I must've deleted the directory structure before running the If we are going to include creating the directory structure as part of the export, then can you combine the repeated code (at startup and during export) into a separate function?
Deleting the structure while stash was running explains your case.
* recreate metadata path if needed, before exporting data
Improved the export functionality to use multiple threads when creating the JSON files.
In my case (NVMe SSD for DB and metadata, dual 6c/12t processor) the export function seemed to be CPU-bottlenecked due to single-thread usage. I suspect the base64 conversion for images might have something to do with that.
I added some worker threads (by default, as many as GOMAXPROCS).
The difference in performance is most evident for the scenes part in my case:
~11k scenes (~700 MB in the metadata/scenes folder)
Time to export scenes:
Single thread: 3m33s
4 workers:     50s
8 workers:     26s
12 workers:    19s
24 workers:    15s
By default, the number of workers is set to the number of CPUs.
You can experiment with this by setting the GOMAXPROCS environment variable,
for example by running stash like this (Linux, 4 workers):
GOMAXPROCS=4 ./stash-linux
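
For readers unfamiliar with the pattern, here is a rough sketch of a worker pool sized by GOMAXPROCS that encodes records to JSON in parallel. It is not the code from this PR: the scene type and exportScenes function are hypothetical stand-ins, and the real export also writes files and handles images.

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"sync"
)

// scene is a hypothetical stand-in for a record being exported.
type scene struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

// exportScenes marshals every scene to JSON using the given number of worker
// goroutines. If workers < 1 it falls back to runtime.GOMAXPROCS(0), which
// reports the current setting (the number of CPUs unless overridden).
func exportScenes(scenes []scene, workers int) ([][]byte, error) {
	if workers < 1 {
		workers = runtime.GOMAXPROCS(0)
	}
	jobs := make(chan int)
	out := make([][]byte, len(scenes))
	errs := make([]error, len(scenes))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so the writes
			// to out[i] and errs[i] never overlap.
			for i := range jobs {
				out[i], errs[i] = json.Marshal(scenes[i])
			}
		}()
	}
	for i := range scenes {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	scenes := []scene{{1, "a"}, {2, "b"}, {3, "c"}}
	out, err := exportScenes(scenes, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out), string(out[0]))
}
```

Because the pool size comes from runtime.GOMAXPROCS(0), starting a program like this with GOMAXPROCS=4 in the environment yields four workers without any code change.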