Asynchronous Folder Thumbnailing #381

Closed
desertwitch opened this issue Nov 4, 2021 · 13 comments

Comments

@desertwitch
Contributor

desertwitch commented Nov 4, 2021

At the moment, non-cached folder thumbnails are not fetched asynchronously. Instead, one database query is sent synchronously for every folder, and the directory listing (gallery) is built only after all of them have returned. The listing is therefore delayed until a thumbnail has been received for every single folder, which can make the whole application unavailable while these queries run on folders with many subfolders (tested with 700 subfolders: it takes minutes!).

It would be better to build the directory listing (gallery) with blank folder thumbnails first, so that it becomes available quickly, and then fill in the actual folder thumbnails asynchronously, as is already done when viewing folders that contain only images (not subfolders, or subfolders and images). As with images, there should also be an option to fetch folder thumbnails only for the folders that are currently visible (lazy image rendering).

This is what is currently happening BEFORE the directory listing is built (with the loading bar filling on top of the page):

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3380,"Dir1/*"]

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3381,"Dir2/*"]

[...] x number of subfolders in navigated directory

I therefore propose that the directory listing (gallery) is built immediately with blank folder thumbnails, which are then replaced by the actual folder thumbnails while the above queries run asynchronously in the background. Unfinished queries should also be removed from the asynchronous queue when the user navigates away from the folder before all thumbnails have been built, instead of continuing to run in the background and eating up database resources (as currently happens).
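A minimal sketch of this proposal, assuming a generic preview lookup function: the listing is returned at once with blank thumbnails, the lookups run afterwards, and any queued lookups are dropped when the caller aborts. All names here (`FolderTile`, `PreviewLookup`, `listWithDeferredPreviews`) are hypothetical and not taken from the project's code.

```typescript
// Hypothetical types; none of these names come from the project's code base.
interface FolderTile {
  path: string;
  thumbnail: string | null; // null means "blank placeholder"
}

// Stand-in for the per-folder database query shown in the log above.
type PreviewLookup = (path: string) => Promise<string | null>;

// Returns the tiles immediately (all blank) and fills them in the background.
// `done` resolves when every lookup has finished or the signal was aborted.
function listWithDeferredPreviews(
  paths: string[],
  lookup: PreviewLookup,
  signal: AbortSignal
): { tiles: FolderTile[]; done: Promise<void> } {
  const tiles: FolderTile[] = paths.map((path) => ({ path, thumbnail: null }));

  const done = (async () => {
    for (const tile of tiles) {
      // The user navigated away: skip all lookups that have not started yet.
      if (signal.aborted) return;
      const thumbnail = await lookup(tile.path);
      // Do not touch the tiles if the abort happened while we were waiting.
      if (signal.aborted) return;
      tile.thumbnail = thumbnail;
    }
  })();

  return { tiles, done };
}
```

In this sketch a caller would create one `AbortController` per navigation, render `tiles` right away, and call `abort()` when the user leaves the folder. A lookup that is already in flight still completes, but its result is discarded and no further lookups are started.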

P.S. I have tested this on a project with all folders pre-indexed and pre-thumbnailed on an SSD, so there should be no further bottlenecks apart from these synchronous thumbnailing queries.

P.P.S. I have run further tests and can confirm that these queries, which fetch the folder thumbnails from the database, are the bottleneck here. If I let the function in question return "null" for every requested folder thumbnail, the directory listing becomes available instantly with blank thumbnails, even for folders with 700+ subfolders.

This might also be relevant for #299

@desertwitch
Contributor Author

desertwitch commented Nov 4, 2021

Did some more testing. It seems that what actually delays the directory listing is that, for each folder, a media object from inside that folder is requested first for the thumbnailing job. This results in one database query per folder to retrieve a single media object that is used for thumbnailing later on.

Right now it seems that each folder's media object is pulled first, then (when all are complete) the directory listing is shown to the user, and then asynchronous thumbnailing starts. Is getting all the media objects for thumbnailing first really a necessity for the directory listing, rather than only for the later thumbnailing job? If it is not a necessity for the directory listing, it could be done together with the thumbnailing job asynchronously AFTER the directory listing is presented with temporary blank thumbnails.

@bpatrik
Owner

bpatrik commented Nov 8, 2021

hi,

Directory listing is a bit complex.
It depends on:

  • DB or no DB usage
  • last index time
  • indexing severity

Let's take a simple / most common case:
DB, recently indexed, low/medium reindexing severity:

  1. no re-indexing will occur, or it will be done asynchronously; this happens here:
    public async listDirectory(relativeDirectoryName: string,
  2. checks the requested dir:
    const dir = await this.selectParentDir(connection, directoryPath.name, directoryPath.parent);
  3. adds more data to the requested dir:
    await this.fillParentDir(connection, dir);
  4. gets the faces and subdirectory thumbnails:
    protected async fillParentDir(connection: Connection, dir: ParentDirectoryDTO): Promise<void> {
  5. gets the preview for subdirectories, one at a time, from here:
    public async fillPreviewForSubDir(connection: Connection, dir: SubDirectoryDTO): Promise<void> {
    , using this:
    public async getPreviewForDirectory(dir: { id: number, name: string, path: string }): Promise<PreviewPhotoDTOWithID> {

The last step got recently introduced as people wanted to customize directory thumbnails. Caching the preview photo instead of querying it all the time would probably significantly speed up the listing.

I got rather busy lately, I do not think I will have the time for that any time soon, sorry :/

@desertwitch
Contributor Author

desertwitch commented Nov 10, 2021

Thanks for the answer, what I don't understand is why the database queries for the preview media need to happen before the directory listing is shown. Can they not happen afterwards so the directory listing is shown first? If I open a folder with 700 subfolders the directory listing takes minutes because all these database queries for the preview media are run first.

Each of these queries fetches a single media object from one folder, and all of them run before the directory listing is shown; the listing only appears once every query has completed. Is it not better to have the directory listing available as fast as possible for navigating, rather than getting a single media object from each folder (for later preview thumbnail creation) before the actual directory listing?

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3380,"Dir1/*"]

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3380,"Dir2/*"]

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3380,"Dir3/*"]

query: SELECT "media"."id" AS "media_id", "media"."name" AS "media_name", "directory"."name" AS "directory_name", "directory"."path" AS "directory_path", "directory"."id" AS "directory_id" FROM "media_entity" "media" INNER JOIN "directory_entity" "directory" ON "directory"."id"="media"."directoryId" WHERE ("media"."directoryId" = ? OR "directory"."path" GLOB ?) ORDER BY LENGTH("directory"."path") DESC, "media"."metadataRating" DESC, "media"."metadataCreationdate" DESC LIMIT 1 -- PARAMETERS: [3380,"Dir4/*"]

@bpatrik
Owner

bpatrik commented Nov 28, 2021

Yes, indeed this would be an optimization.

But I would rather just de-normalize the table and "cache" the preview image, then the listing would go with one query instead of N.
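To illustrate the difference between N preview queries and one query against a de-normalized table, here is a small in-memory sketch. The row shapes, the `previewMediaId` column name, and the `queryCount` counter are assumptions made for illustration; the real schema is not shown in this thread, and the real preview query also considers nested subfolders, which this sketch ignores.

```typescript
// Hypothetical in-memory stand-ins for the media and directory tables.
interface MediaRow {
  id: number;
  directoryId: number;
  rating: number;
}

interface DirectoryRow {
  id: number;
  parentId: number | null;
  name: string;
  previewMediaId: number | null; // assumed de-normalized "cached" preview column
}

// Counts simulated database round trips.
let queryCount = 0;

// Normalized layout: one lookup per subfolder, so N subfolders cost N queries.
function previewPerFolder(media: MediaRow[], directoryId: number): number | null {
  queryCount++;
  const candidates = media
    .filter((m) => m.directoryId === directoryId)
    .sort((a, b) => b.rating - a.rating); // best-rated first, like the ORDER BY above
  return candidates.length > 0 ? candidates[0].id : null;
}

// De-normalized layout: the preview id travels with each directory row,
// so a single query returns the whole listing including previews.
function listWithCachedPreview(dirs: DirectoryRow[], parentId: number): DirectoryRow[] {
  queryCount++;
  return dirs.filter((d) => d.parentId === parentId);
}
```

With 700 subfolders, the first approach would issue 700 preview lookups on top of the listing query, while the second stays at one. The cost moves to keeping `previewMediaId` up to date when the underlying media changes.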

@desertwitch
Contributor Author

desertwitch commented Nov 28, 2021

Yes, indeed this would be an optimization.

But I would rather just de-normalize the table and "cache" the preview image, then the listing would go with one query instead of N.

That sounds like a good idea, but I honestly have no idea how to do it.
It would be great if the performance of folders with many subfolders could be improved this way. :-)

Currently browsing into folders with many subfolders causes the whole application to freeze up during these numerous database queries and even the port to become unresponsive for other users until the database queries are finished, unfortunately.

@bpatrik
Owner

bpatrik commented Dec 11, 2021

Yeah it would require a more complex development.

DataManagers know when there was a database update. The album manager, for example, does something similar: it invalidates the album previews:

public async onNewDataVersion(): Promise<void> {
  this.isDBValid = false;
}

private async updateAlbums(): Promise<void> {
  if (this.isDBValid === true) {
    return;
  }
  Logger.debug(LOG_TAG, 'Updating derived album data');
  const connection = await SQLConnection.getConnection();
  const albums = await connection.getRepository(AlbumBaseEntity).find();
  for (const a of albums) {
    await AlbumManager.updateAlbum(a as SavedSearchEntity);
  }
  this.isDBValid = true;
}

Something similar would be needed for directories too, probably a bit smarter: not every directory needs to be updated when a new photo is added, only the parent directories of that photo.
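As a rough sketch of the "only the parent directories" idea, the function below lists the directories whose cached preview would need to be invalidated for a given photo. It assumes gallery-relative paths separated by `/` and uses the empty string for the gallery root; the function name `directoriesToInvalidate` is hypothetical.

```typescript
// Returns the directory that holds the photo plus every ancestor directory,
// ending with the gallery root (represented here as the empty string).
// Assumes gallery-relative paths separated by "/".
function directoriesToInvalidate(photoPath: string): string[] {
  const parts = photoPath.split("/").filter((p) => p.length > 0);
  parts.pop(); // drop the file name, keep only directory segments

  const result: string[] = [];
  while (parts.length > 0) {
    result.push(parts.join("/"));
    parts.pop(); // move one level up
  }
  result.push(""); // the gallery root is always an ancestor
  return result;
}

// Example: a photo at "Dir1/Sub/photo.jpg" affects "Dir1/Sub", "Dir1" and the root.
const affected = directoriesToInvalidate("Dir1/Sub/photo.jpg");
```

Sibling folders such as `Dir2` never appear in the result, so their cached previews would stay valid.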

@myroik

myroik commented Jan 9, 2022

+1, it's a shame that the application stops responding and needs a full restart because of the endless load time of the listing and preview images... It cannot be used for a very large database (like a photographer's).

@bpatrik
Owner

bpatrik commented Jan 14, 2022

776c8e8 introduces a column to store the directory previews. This should speed up the listing. (Note: the very first listing will still be slow. Assigning previews is lazy: they are assigned only when they are needed.)
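The lazy assignment described here can be sketched roughly as follows. This is not the code from 776c8e8; the record shape, the `previewValid` flag, and the `getPreview` helper are assumptions used only to show why the first listing is slow and later ones are fast.

```typescript
// Hypothetical directory record with a stored preview column.
interface DirectoryRecord {
  id: number;
  previewMediaId: number | null; // stored preview; null can also mean "folder has no media"
  previewValid: boolean;         // false until the preview has been assigned once
}

// Returns the stored preview, computing and storing it on first use only.
function getPreview(
  dir: DirectoryRecord,
  compute: (directoryId: number) => number | null // stand-in for the slow preview query
): number | null {
  if (!dir.previewValid) {
    // Slow path: runs only the first time this directory is listed.
    dir.previewMediaId = compute(dir.id);
    dir.previewValid = true;
  }
  // Fast path: every later listing reads the stored value.
  return dir.previewMediaId;
}
```

A separate validity flag is used in this sketch because `null` is a legitimate result for an empty folder, so the stored value alone cannot tell "not computed yet" apart from "computed, nothing found".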

@desertwitch
Contributor Author

Amazing, thanks a lot!
Can't wait to test it once it's pushed to Docker.

@bpatrik
Owner

bpatrik commented Jan 15, 2022

nightly should have it already

@desertwitch
Contributor Author

I have just had the chance to test this and the performance is significantly improved. A test database on MySQL with 2.5 million images in 800 folders (4 folders with 200 subfolders each) is running smoothly with minor loading times and no crashes. Initial preview building times are acceptable and once the database is filled with preview IDs there is almost no loading time for the directory listings.

Thank you so much for addressing this issue and taking the time to code this, I am sure people with especially large databases will be very happy that this works so well now. THANK YOU! 🥇

bpatrik added a commit that referenced this issue Jan 17, 2022
The job fills, directory, albums and persons' thumbnails
@bpatrik
Owner

bpatrik commented Jan 17, 2022

Wow, nice size of a gallery. Happy that it works; I did not have time for extensive testing.
In 69fedd6 I created a Preview Filling job that should populate all previews (Album, Person and Directory). I recommend running it right after indexing.

@desertwitch
Contributor Author

This has been implemented and has been working well since 69fedd6.
