Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues caused by UniFile findFile implementation #705

Closed
6 tasks done
raxod502 opened this issue Apr 23, 2024 · 12 comments
Closed
6 tasks done

Performance issues caused by UniFile findFile implementation #705

raxod502 opened this issue Apr 23, 2024 · 12 comments
Labels
Bug Something isn't working

Comments

@raxod502
Copy link
Contributor

Steps to reproduce

The performance of Mihon degrades linearly with the number of downloaded chapters for a specific manga. You can experience this behavior by selecting a manga with a large number of chapters and downloading about 1,000 of them. You will then observe the following behavior:

  1. Adding additional chapters to the download queue is extremely slow. With about 1,000 chapters downloaded, a delay of about 10 seconds is incurred per chapter added to the queue. So, for example, adding 100 additional chapters to the queue takes about 15 minutes.
  2. Navigating between chapters in the reader is extremely slow. With about 1,000 chapters downloaded, the entire interface of Mihon freezes for about 10 seconds every time a new chapter must be loaded from disk.

Expected behavior

Adding any number of chapters to the download queue should take no observable time, because this requires at most one filesystem operation to list existing downloaded chapters, no network calls, and no significant data processing.

Loading an already-downloaded chapter from disk should take no observable time unless the chapter is large, because this requires at most one filesystem operation to read the chapter images from disk.

Actual behavior

Mihon uses an unusual library called UniFile for its filesystem operations. This is described as a fork of the Android library DocumentFile, which describes itself as follows:

Representation of a document backed by either a android.provider.DocumentsProvider or a raw file on disk. This is a utility class designed to emulate the traditional File interface. It offers a simplified view of a tree of documents, but it has substantial overhead

The substantial overhead is difficult to overstate, it turns out, when dealing with directories that have a large number of files. Observe the following. When Mihon adds chapters to the download queue, it invokes the Downloader.queueChapters method, which uses the DownloadProvider.findChapterDir method to filter out chapters that have already been downloaded:

// Filter out those already downloaded.
.filter { provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null }

The DownloadProvider.findChapterDir iterates through four possible filepath structures returned by the DownloadProvider.getValidChapterDirNames method (handling both a .cbz suffix and an underscore prefix), and invokes the UniFile.findFile method for each one:

return getValidChapterDirNames(chapterName, chapterScanlator).asSequence()
.mapNotNull { mangaDir?.findFile(it, true) }
.firstOrNull()

The intent of this method, which is documented upstream here, is to check for the existence of a file in a directory. Normally, this operation would be near-instantaneous, as checking for file existence is O(1). However, because the DocumentFile interface acts as an abstraction over a large number of virtual document store providers, some of which do not support random access, the implementation of findFile actually lists every child of the directory individually, fetches each of their attributes, and then iterates over the returned list to see if the desired file is found:

https://github.com/tachiyomiorg/UniFile/blob/7c257e1c642caa5ec7a41c640d61cd096f7f5c2e/library/src/main/java/com/hippo/unifile/TreeDocumentFile.java#L236-L241

This has the consequence that the performance of the method degrades linearly according to how many files exist in the directory - and it is called four times per chapter. Thus, with 1,000 chapters already downloaded, and 100 requested by the user to download, only 100 (or 400, to handle backward compatibility) system calls should be needed to check each chapter. However, due to the findFile implementation, a total of 400,000 system calls are made instead, because each chapter traverses the entire download directory. In practice, this can mean a delay of 15 minutes or more, with heavy IO usage on the device which quickly drains battery life.

Since the original issue and pull request tracker for Tachiyomi has been lost, it is unclear to me why this library was introduced in the first place, or if anybody else noticed this significant shortcoming. However, the problem completely blocks me from using the download system in Mihon at all for manga that have more than a thousand chapters, so I am willing to do the necessary work to identify an appropriate fix for this performance issue, with accompanying documentation and/or tests, that maintains an acceptable level of backwards compatibility.

Any insight from the maintainers would be appreciated, especially in terms of the rationale for the existing system around UniFile.

Crash logs

No response

Mihon version

f27ca3b

Android version

Android 14 (GrapheneOS)

Device

Google Pixel 6a

Other details

This issue causes #669.

I have verified through added logging and timing that the code path I mentioned is responsible for the performance issues around the download queue. I have not verified specifically that it is also responsible for the reader slowdown, but this seems almost certainly to be the case to me. (My testing is impaired by the fact that downloads are so slow I cannot complete them on the development version of the app.)

Acknowledgements

  • I have searched the existing issues and this is a new ticket, NOT a duplicate or related to another open or closed issue.
  • I have written a short but informative title.
  • I have gone through the FAQ and troubleshooting guide.
  • I have updated the app to version 0.16.5.
  • I have updated all installed extensions.
  • I will fill out all of the requested information in this form.
@raxod502 raxod502 added the Bug Something isn't working label Apr 23, 2024
@raxod502
Copy link
Contributor Author

I see that the reason for this "document tree" abstraction may be that it is what is returned by the directory picker / permissions model that is currently in use by Mihon:

https://developer.android.com/reference/androidx/activity/result/contract/ActivityResultContracts.OpenDocumentTree

return rememberLauncherForActivityResult(
contract = ActivityResultContracts.OpenDocumentTree(),
) { uri ->
if (uri != null) {
val flags = Intent.FLAG_GRANT_READ_URI_PERMISSION or
Intent.FLAG_GRANT_WRITE_URI_PERMISSION
context.contentResolver.takePersistableUriPermission(uri, flags)
UniFile.fromUri(context, uri)?.let {
storageDirPref.set(it.uri.toString())
}
}
}

For example, on my device, the returned URI is content://com.android.externalstorage.documents/tree/primary%3AScratch%2FMihonDev/document/primary%3AScratch%2FMihonDev, rather than an actual filesystem path. I also see options for Termux and Nextcloud in the document picker, although these both seem hilariously inappropriate as storage locations for local downloads of manga.

I guess the questions are:

  1. Is it intended that storage locations other than the local filesystem can be selected? If so, what is the use case?
  2. Is it possible to bypass the DocumentFile API and access the filesystem in a more expedient manner? If so, should that be done by DocumentContract methods as suggested in the DocumentFile reference, or are there even lower-level interfaces?
  3. If we are using the DocumentFile or DocumentContract API, do efficient methods exist for checking file existence? Clearly the ones we are using are wrong.
  4. If no efficient methods exist for checking file existence in the official APIs, are there workarounds? Do we need to maintain our own cache of filesystem state? (God, I hope not.) What do other apps do?

I will do some research.

@FooIbar
Copy link
Contributor

FooIbar commented Apr 24, 2024

Being able to support other storage locations is just a feature of SAF, which has to be used due to scoped storage.
And those file operations are IPC calls rather than system calls, which is why they are much slower.
For this case, it can be optimized by lifting listFiles outside of the lambda.

@AngelaDMerkel
Copy link

I have this issue, but in addition to causing poor performance it also causes crashes. I cannot even open the application without deleting some chapters through a file manager.

@raxod502
Copy link
Contributor Author

I have gotten a local development environment set up and am setting about adjusting UniFile to use more efficient DocumentContract methods to implement the commonly used filesystem operations. Given that Mihon uses UniFile methods that have O(n) performance instead of O(1) in many different places, I think fixing the issue at the library level will be the most fruitful. Pull request to follow.

@raxod502
Copy link
Contributor Author

I happen to disagree with you on that point, but let's keep the discussion on-topic and related to the bug report in this thread.

@raxod502
Copy link
Contributor Author

#728 and tachiyomiorg/UniFile#2, together, fix the reported problem. There may be other areas as well where Mihon's performance suffers due to unoptimized usage of SAF, but this at least remediates a number of problem hotspots by fixing a commonly used library subroutine - for example, this also improves the performance of createDirectory since it uses findFile as a subroutine, and even just findFile is used directly in about a dozen separate places in Mihon.

@FooIbar
Copy link
Contributor

FooIbar commented Apr 30, 2024

tachiyomiorg/UniFile#2 is definitely a good improvement, but do you have benchmark results of that vs lifting listFiles?
I'm a bit concerned because TreeDocumentFile.exists is also somewhat expensive.
As I said they are IPC calls and IIRC each of them will cost about 20-30 ms.

Actually, I think we could consider running them in parallel as they are rather independent.

@MajorTanya
Copy link
Contributor

Any insight from the maintainers would be appreciated, especially in terms of the rationale for the existing system around UniFile.

From checking back through the git history of the project, UniFile was first added to the project in 2016 with 6f29716 (archive of the PR). Since then, there have been changes to both it and its uses, the last of which I could find was about 5 months ago with 46aeab9, a74a689, and ca54984 in late November 2023.

I wasn't even aware of Tachiyomi at that point, let alone involved with it, so I have no insights to share.

@raxod502
Copy link
Contributor Author

raxod502 commented May 1, 2024

do you have benchmark results of that vs lifting listFiles?

Sure! To perform the following tests, I used a source with 1,000 chapters already downloaded, and then initiated the download of 50 additional chapters. Based on timestamps of added logging lines, I determined the total time from initiating the add-to-queue operation to the first chapter beginning to download. Here are the results for various implementations:

  • No changes: 164.295s
  • Your UniFile change: 80.110s (2x speedup)
  • My UniFile change + accompanying Mihon patch: 4.364s (38x speedup)
  • Precomputing file list, no other changes: 4.042s (41x speedup)

The file list precomputation I did in a very silly way just to get the performance numbers - it would have to be redone more intelligently if we move forward with that option. Here is what I did:

Patch against DocumentsContractApi21.java
diff --git i/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java w/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
index 2dddf00..52bb777 100644
--- i/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
+++ w/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
@@ -26,10 +26,12 @@ import android.provider.DocumentsContract;
 import android.util.Log;
 
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
 
 @TargetApi(Build.VERSION_CODES.LOLLIPOP)
-final class DocumentsContractApi21 {
+public final class DocumentsContractApi21 {
     private DocumentsContractApi21() {}
 
     private static final String TAG = DocumentsContractApi21.class.getSimpleName();
@@ -110,7 +112,15 @@ final class DocumentsContractApi21 {
         return results.toArray(new Uri[results.size()]);
     }
 
+    private static Map<Uri, NamedUri[]> filesCache = new HashMap<>();
+    public static boolean cachingEnabled = false;
+
     public static NamedUri[] listFilesNamed(Context context, Uri self) {
+        if (cachingEnabled && filesCache.containsKey(self)) {
+            Log.w(TAG, "Returning from cache for URI: " + self);
+            return filesCache.get(self);
+        }
+
         final ContentResolver resolver = context.getContentResolver();
         final Uri childrenUri = DocumentsContract.buildChildDocumentsUriUsingTree(self,
                 DocumentsContract.getDocumentId(self));
@@ -139,7 +149,13 @@ final class DocumentsContractApi21 {
             closeQuietly(c);
         }
 
-        return results.toArray(new NamedUri[results.size()]);
+        NamedUri[] rv = results.toArray(new NamedUri[results.size()]);
+        if (cachingEnabled) {
+            Log.w(TAG, "Populating in cache for URI: " + self);
+            filesCache.put(self, rv);
+        }
+
+        return rv;
     }
 
     public static Uri renameTo(Context context, Uri self, String displayName) {
Patch against Downloader.kt
diff --git i/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt w/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
index 7c8238d5a..9dc7698ad 100644
--- i/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
+++ w/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
@@ -1,6 +1,7 @@
 package eu.kanade.tachiyomi.data.download
 
 import android.content.Context
+import com.hippo.unifile.DocumentsContractApi21
 import com.hippo.unifile.UniFile
 import eu.kanade.domain.chapter.model.toSChapter
 import eu.kanade.domain.manga.model.getComicInfo
@@ -270,11 +271,17 @@ class Downloader(
     fun queueChapters(manga: Manga, chapters: List<Chapter>, autoStart: Boolean) {
         if (chapters.isEmpty()) return
 
+        logcat { "Turning ON caching" }
+        DocumentsContractApi21.cachingEnabled = true;
+
         val source = sourceManager.get(manga.source) as? HttpSource ?: return
         val wasEmpty = queueState.value.isEmpty()
         val chaptersToQueue = chapters.asSequence()
             // Filter out those already downloaded.
-            .filter { provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null }
+            .filter {
+                logcat { "Checking download directory for chapter ${it.name}" }
+                provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null
+            }
             // Add chapters to queue from the start.
             .sortedByDescending { it.sourceOrder }
             // Filter out those already enqueued.
@@ -307,6 +314,9 @@ class Downloader(
                 DownloadJob.start(context)
             }
         }
+
+        logcat { "Turning OFF caching" }
+        DocumentsContractApi21.cachingEnabled = false;
     }
 
     /**

One more thing I've noticed while doing this testing is that the implementation and usage of findChapterDir is extremely inefficient. It actually makes seven filesystem queries every time you call it (e.g. checking all the parent directories and then each possible file extension), and does no caching. Even with the actual filesystem ops stubbed out, when combined that still loses more than 30ms per call with the repeated list scanning and such.

If we fully precomputed the filesystem queries ahead of time (I didn't test this yet because it would require a bunch of changes), then that would get things even faster than my benchmarks above with the file list precomputed.

That said, even though the list precomputation produces more speedup on its own than the improved findFile implementation, the two changes can both be applied since they produce performance improvements in different areas (e.g., the better findFile will improve reader performance as well, the issue mentioned in #669).

So, I would vote for the following:

  1. I need to fix tachiyomiorg/unifile#2 to fallback to slower-but-correct behavior if somebody uses a provider other than ExternalStorageProvider - should be pretty easy.
  2. There is a minor compilation error in tachiyomiorg/unifile#1 to fix, I expect that is also a quick solve.
  3. We can apply both tachiyomiorg/unifile#1 and tachiyomiorg/unifile#2 as well as my accompanying Massively improve findFile performance #728, that should improve things considerably.
  4. Next step is to make another PR for Mihon that reduces the number of repetitive filesystem queries in the first place, by pre-listing the downloaded files properly (which will be faster than my hacky implementation above).

@FooIbar
Copy link
Contributor

FooIbar commented May 1, 2024

Nice! That looks promising.
I have a POC of running findChapterDir in parallel, would you mind testing #731 on top of the findFile optimization?

@raxod502
Copy link
Contributor Author

raxod502 commented May 4, 2024

I will run some tests to verify that all visible performance issues relating to large numbers of downloaded chapters are now resolved, and if so, close this issue. Otherwise, will file more PRs to address the other issues.

@raxod502
Copy link
Contributor Author

raxod502 commented May 4, 2024

I compiled a release build and, let me tell you, the difference is unreal. Reader performance has gone from freezing the UI for 10-15 seconds every time you flip a page, to no perceptible delay. There are further optimizations that we could make, for example #731 and the caching I mentioned in point 4 under #705 (comment), but I think this is good enough to mitigate almost all of the user-visible problems in most cases.

🎉

@raxod502 raxod502 closed this as completed May 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants