Performance issues caused by UniFile findFile implementation #705

raxod502 · 2024-04-23T01:38:08Z

Steps to reproduce

The performance of Mihon degrades linearly with the number of downloaded chapters for a specific manga. You can experience this behavior by selecting a manga with a large number of chapters and downloading about 1,000 of them. You will then observe the following behavior:

Adding additional chapters to the download queue is extremely slow. With about 1,000 chapters downloaded, a delay of about 10 seconds is incurred per chapter added to the queue. So, for example, adding 100 additional chapters to the queue takes about 15 minutes.
Navigating between chapters in the reader is extremely slow. With about 1,000 chapters downloaded, the entire interface of Mihon freezes for about 10 seconds every time a new chapter must be loaded from disk.

Expected behavior

Adding any number of chapters to the download queue should take no observable time, because this requires at most one filesystem operation to list existing downloaded chapters, no network calls, and no significant data processing.

Loading an already-downloaded chapter from disk should take no observable time unless the chapter is large, because this requires at most one filesystem operation to read the chapter images from disk.

Actual behavior

Mihon uses an unusual library called UniFile for its filesystem operations. This is described as a fork of the Android library DocumentFile, which describes itself as follows:

Representation of a document backed by either a android.provider.DocumentsProvider or a raw file on disk. This is a utility class designed to emulate the traditional File interface. It offers a simplified view of a tree of documents, but it has substantial overhead

The substantial overhead is difficult to overstate, it turns out, when dealing with directories that have a large number of files. Observe the following. When Mihon adds chapters to the download queue, it invokes the Downloader.queueChapters method, which uses the DownloadProvider.findChapterDir method to filter out chapters that have already been downloaded:

mihon/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt

Lines 276 to 277 in f27ca3b

    
           // Filter out those already downloaded. 
        
           .filter { provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null }

The DownloadProvider.findChapterDir iterates through four possible filepath structures returned by the DownloadProvider.getValidChapterDirNames method (handling both a .cbz suffix and an underscore prefix), and invokes the UniFile.findFile method for each one:

mihon/app/src/main/java/eu/kanade/tachiyomi/data/download/DownloadProvider.kt

Lines 84 to 86 in f27ca3b

    
           return getValidChapterDirNames(chapterName, chapterScanlator).asSequence() 
        
               .mapNotNull { mangaDir?.findFile(it, true) } 
        
               .firstOrNull()

The intent of this method, which is documented upstream here, is to check for the existence of a file in a directory. Normally, this operation would be near-instantaneous, as checking for file existence is O(1). However, because the DocumentFile interface acts as an abstraction over a large number of virtual document store providers, some of which do not support random access, the implementation of findFile actually lists every child of the directory individually, fetches each of their attributes, and then iterates over the returned list to see if the desired file is found:

https://github.com/tachiyomiorg/UniFile/blob/7c257e1c642caa5ec7a41c640d61cd096f7f5c2e/library/src/main/java/com/hippo/unifile/TreeDocumentFile.java#L236-L241

This has the consequence that the performance of the method degrades linearly according to how many files exist in the directory - and it is called four times per chapter. Thus, with 1,000 chapters already downloaded, and 100 requested by the user to download, only 100 (or 400, to handle backward compatibility) system calls should be needed to check each chapter. However, due to the findFile implementation, a total of 400,000 system calls are made instead, because each chapter traverses the entire download directory. In practice, this can mean a delay of 15 minutes or more, with heavy IO usage on the device which quickly drains battery life.

Since the original issue and pull request tracker for Tachiyomi has been lost, it is unclear to me why this library was introduced in the first place, or if anybody else noticed this significant shortcoming. However, the problem completely blocks me from using the download system in Mihon at all for manga that have more than a thousand chapters, so I am willing to do the necessary work to identify an appropriate fix for this performance issue, with accompanying documentation and/or tests, that maintains an acceptable level of backwards compatibility.

Any insight from the maintainers would be appreciated, especially in terms of the rationale for the existing system around UniFile.

Crash logs

No response

Mihon version

f27ca3b

Android version

Android 14 (GrapheneOS)

Device

Google Pixel 6a

Other details

This issue causes #669.

I have verified through added logging and timing that the code path I mentioned is responsible for the performance issues around the download queue. I have not verified specifically that it is also responsible for the reader slowdown, but this seems almost certainly to be the case to me. (My testing is impaired by the fact that downloads are so slow I cannot complete them on the development version of the app.)

Acknowledgements

I have searched the existing issues and this is a new ticket, NOT a duplicate or related to another open or closed issue.
I have written a short but informative title.
I have gone through the FAQ and troubleshooting guide.
I have updated the app to version 0.16.5.
I have updated all installed extensions.
I will fill out all of the requested information in this form.

The text was updated successfully, but these errors were encountered:

raxod502 · 2024-04-23T02:05:46Z

I see that the reason for this "document tree" abstraction may be that it is what is returned by the directory picker / permissions model that is currently in use by Mihon:

https://developer.android.com/reference/androidx/activity/result/contract/ActivityResultContracts.OpenDocumentTree

mihon/app/src/main/java/eu/kanade/presentation/more/settings/screen/SettingsDataScreen.kt

Lines 107 to 120 in f27ca3b

    
           return rememberLauncherForActivityResult( 
        
               contract = ActivityResultContracts.OpenDocumentTree(), 
        
           ) { uri -> 
        
               if (uri != null) { 
        
                   val flags = Intent.FLAG_GRANT_READ_URI_PERMISSION or 
        
                       Intent.FLAG_GRANT_WRITE_URI_PERMISSION 
        
                   context.contentResolver.takePersistableUriPermission(uri, flags) 
        
                   UniFile.fromUri(context, uri)?.let { 
        
                       storageDirPref.set(it.uri.toString()) 
        
                   } 
        
               } 
        
           }

For example, on my device, the returned URI is content://com.android.externalstorage.documents/tree/primary%3AScratch%2FMihonDev/document/primary%3AScratch%2FMihonDev, rather than an actual filesystem path. I also see options for Termux and Nextcloud in the document picker, although these both seem hilariously inappropriate as storage locations for local downloads of manga.

I guess the questions are:

Is it intended that storage locations other than the local filesystem can be selected? If so, what is the use case?
Is it possible to bypass the DocumentFile API and access the filesystem in a more expedient manner? If so, should that be done by DocumentContract methods as suggested in the DocumentFile reference, or are there even lower-level interfaces?
If we are using the DocumentFile or DocumentContract API, do efficient methods exist for checking file existence? Clearly the ones we are using are wrong.
If no efficient methods exist for checking file existence in the official APIs, are there workarounds? Do we need to maintain our own cache of filesystem state? (God, I hope not.) What do other apps do?

I will do some research.

FooIbar · 2024-04-24T06:25:07Z

Being able to support other storage locations is just a feature of SAF, which has to be used due to scoped storage.
And those file operations are IPC calls rather than system calls, which is why they are much slower.
For this case, it can be optimized by lifting listFiles outside of the lambda.

AngelaDMerkel · 2024-04-26T12:27:37Z

I have this issue, but in addition to causing poor performance it also causes crashes. I cannot even open the application without deleting some chapters through a file manager.

raxod502 · 2024-04-28T04:30:49Z

I have gotten a local development environment set up and am setting about adjusting UniFile to use more efficient DocumentContract methods to implement the commonly used filesystem operations. Given that Mihon uses UniFile methods that have O(n) performance instead of O(1) in many different places, I think fixing the issue at the library level will be the most fruitful. Pull request to follow.

raxod502 · 2024-04-30T02:27:24Z

I happen to disagree with you on that point, but let's keep the discussion on-topic and related to the bug report in this thread.

raxod502 · 2024-04-30T03:00:37Z

#728 and tachiyomiorg/UniFile#2, together, fix the reported problem. There may be other areas as well where Mihon's performance suffers due to unoptimized usage of SAF, but this at least remediates a number of problem hotspots by fixing a commonly used library subroutine - for example, this also improves the performance of createDirectory since it uses findFile as a subroutine, and even just findFile is used directly in about a dozen separate places in Mihon.

FooIbar · 2024-04-30T04:16:59Z

tachiyomiorg/UniFile#2 is definitely a good improvement, but do you have benchmark results of that vs lifting listFiles?
I'm a bit concerned because TreeDocumentFile.exists is also somewhat expensive.
As I said they are IPC calls and IIRC each of them will cost about 20-30 ms.

Actually, I think we could consider running them in parallel as they are rather independent.

MajorTanya · 2024-04-30T08:44:02Z

Any insight from the maintainers would be appreciated, especially in terms of the rationale for the existing system around UniFile.

From checking back through the git history of the project, UniFile was first added to the project in 2016 with 6f29716 (archive of the PR). Since then, there have been changes to both it and its uses, the last of which I could find was about 5 months ago with 46aeab9, a74a689, and ca54984 in late November 2023.

I wasn't even aware of Tachiyomi at that point, let alone involved with it, so I have no insights to share.

raxod502 · 2024-05-01T00:54:37Z

do you have benchmark results of that vs lifting listFiles?

Sure! To perform the following tests, I used a source with 1,000 chapters already downloaded, and then initiated the download of 50 additional chapters. Based on timestamps of added logging lines, I determined the total time from initiating the add-to-queue operation to the first chapter beginning to download. Here are the results for various implementations:

No changes: 164.295s
Your UniFile change: 80.110s (2x speedup)
My UniFile change + accompanying Mihon patch: 4.364s (38x speedup)
Precomputing file list, no other changes: 4.042s (41x speedup)

The file list precomputation I did in a very silly way just to get the performance numbers - it would have to be redone more intelligently if we move forward with that option. Here is what I did:

Patch against DocumentsContractApi21.java

diff --git i/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java w/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
index 2dddf00..52bb777 100644
--- i/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
+++ w/library/src/main/java/com/hippo/unifile/DocumentsContractApi21.java
@@ -26,10 +26,12 @@ import android.provider.DocumentsContract;
 import android.util.Log;
 
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
 
 @TargetApi(Build.VERSION_CODES.LOLLIPOP)
-final class DocumentsContractApi21 {
+public final class DocumentsContractApi21 {
     private DocumentsContractApi21() {}
 
     private static final String TAG = DocumentsContractApi21.class.getSimpleName();
@@ -110,7 +112,15 @@ final class DocumentsContractApi21 {
         return results.toArray(new Uri[results.size()]);
     }
 
+    private static Map<Uri, NamedUri[]> filesCache = new HashMap<>();
+    public static boolean cachingEnabled = false;
+
     public static NamedUri[] listFilesNamed(Context context, Uri self) {
+        if (cachingEnabled && filesCache.containsKey(self)) {
+            Log.w(TAG, "Returning from cache for URI: " + self);
+            return filesCache.get(self);
+        }
+
         final ContentResolver resolver = context.getContentResolver();
         final Uri childrenUri = DocumentsContract.buildChildDocumentsUriUsingTree(self,
                 DocumentsContract.getDocumentId(self));
@@ -139,7 +149,13 @@ final class DocumentsContractApi21 {
             closeQuietly(c);
         }
 
-        return results.toArray(new NamedUri[results.size()]);
+        NamedUri[] rv = results.toArray(new NamedUri[results.size()]);
+        if (cachingEnabled) {
+            Log.w(TAG, "Populating in cache for URI: " + self);
+            filesCache.put(self, rv);
+        }
+
+        return rv;
     }
 
     public static Uri renameTo(Context context, Uri self, String displayName) {

Patch against Downloader.kt

diff --git i/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt w/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
index 7c8238d5a..9dc7698ad 100644
--- i/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
+++ w/app/src/main/java/eu/kanade/tachiyomi/data/download/Downloader.kt
@@ -1,6 +1,7 @@
 package eu.kanade.tachiyomi.data.download
 
 import android.content.Context
+import com.hippo.unifile.DocumentsContractApi21
 import com.hippo.unifile.UniFile
 import eu.kanade.domain.chapter.model.toSChapter
 import eu.kanade.domain.manga.model.getComicInfo
@@ -270,11 +271,17 @@ class Downloader(
     fun queueChapters(manga: Manga, chapters: List<Chapter>, autoStart: Boolean) {
         if (chapters.isEmpty()) return
 
+        logcat { "Turning ON caching" }
+        DocumentsContractApi21.cachingEnabled = true;
+
         val source = sourceManager.get(manga.source) as? HttpSource ?: return
         val wasEmpty = queueState.value.isEmpty()
         val chaptersToQueue = chapters.asSequence()
             // Filter out those already downloaded.
-            .filter { provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null }
+            .filter {
+                logcat { "Checking download directory for chapter ${it.name}" }
+                provider.findChapterDir(it.name, it.scanlator, manga.title, source) == null
+            }
             // Add chapters to queue from the start.
             .sortedByDescending { it.sourceOrder }
             // Filter out those already enqueued.
@@ -307,6 +314,9 @@ class Downloader(
                 DownloadJob.start(context)
             }
         }
+
+        logcat { "Turning OFF caching" }
+        DocumentsContractApi21.cachingEnabled = false;
     }
 
     /**

One more thing I've noticed while doing this testing is that the implementation and usage of findChapterDir is extremely inefficient. It actually makes seven filesystem queries every time you call it (e.g. checking all the parent directories and then each possible file extension), and does no caching. Even with the actual filesystem ops stubbed out, when combined that still loses more than 30ms per call with the repeated list scanning and such.

If we fully precomputed the filesystem queries ahead of time (I didn't test this yet because it would require a bunch of changes), then that would get things even faster than my benchmarks above with the file list precomputed.

That said, even though the list precomputation produces more speedup on its own than the improved findFile implementation, the two changes can both be applied since they produce performance improvements in different areas (e.g., the better findFile will improve reader performance as well, the issue mentioned in #669).

So, I would vote for the following:

I need to fix tachiyomiorg/unifile#2 to fallback to slower-but-correct behavior if somebody uses a provider other than ExternalStorageProvider - should be pretty easy.
There is a minor compilation error in tachiyomiorg/unifile#1 to fix, I expect that is also a quick solve.
We can apply both tachiyomiorg/unifile#1 and tachiyomiorg/unifile#2 as well as my accompanying Massively improve findFile performance #728, that should improve things considerably.
Next step is to make another PR for Mihon that reduces the number of repetitive filesystem queries in the first place, by pre-listing the downloaded files properly (which will be faster than my hacky implementation above).

FooIbar · 2024-05-01T05:47:48Z

Nice! That looks promising.
I have a POC of running findChapterDir in parallel, would you mind testing #731 on top of the findFile optimization?

Related to #705

raxod502 · 2024-05-04T20:52:43Z

I will run some tests to verify that all visible performance issues relating to large numbers of downloaded chapters are now resolved, and if so, close this issue. Otherwise, will file more PRs to address the other issues.

raxod502 · 2024-05-04T21:42:15Z

I compiled a release build and, let me tell you, the difference is unreal. Reader performance has gone from freezing the UI for 10-15 seconds every time you flip a page, to no perceptible delay. There are further optimizations that we could make, for example #731 and the caching I mentioned in point 4 under #705 (comment), but I think this is good enough to mitigate almost all of the user-visible problems in most cases.

🎉

raxod502 added the Bug Something isn't working label Apr 23, 2024

raxod502 mentioned this issue Apr 23, 2024

Ability to disable chapter transitions #669

Closed

4 tasks

FooIbar mentioned this issue Apr 29, 2024

When I download 40 chapter or more it start to become slow #723

Closed

6 tasks

This was referenced Apr 30, 2024

Massively improve findFile performance tachiyomiorg/UniFile#2

Merged

Massively improve findFile performance #728

Merged

FooIbar mentioned this issue May 2, 2024

Mihon take long to load .cbz offline manga #735

Closed

6 tasks

AntsyLich added a commit that referenced this issue May 4, 2024

Remove some legacy folder/file name lookup for download

e55e5f6

Related to #705

raxod502 mentioned this issue May 4, 2024

Run findChapterDir in parallel #731

Closed

raxod502 closed this as completed May 4, 2024

github-actions bot locked as resolved and limited conversation to collaborators May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance issues caused by UniFile findFile implementation #705

Performance issues caused by UniFile findFile implementation #705

raxod502 commented Apr 23, 2024

raxod502 commented Apr 23, 2024

FooIbar commented Apr 24, 2024

AngelaDMerkel commented Apr 26, 2024

raxod502 commented Apr 28, 2024

raxod502 commented Apr 30, 2024

raxod502 commented Apr 30, 2024

FooIbar commented Apr 30, 2024

MajorTanya commented Apr 30, 2024

raxod502 commented May 1, 2024

FooIbar commented May 1, 2024

raxod502 commented May 4, 2024

raxod502 commented May 4, 2024

Performance issues caused by UniFile findFile implementation #705

Performance issues caused by UniFile findFile implementation #705

Comments

raxod502 commented Apr 23, 2024

Steps to reproduce

Expected behavior

Actual behavior

Crash logs

Mihon version

Android version

Device

Other details

Acknowledgements

raxod502 commented Apr 23, 2024

FooIbar commented Apr 24, 2024

AngelaDMerkel commented Apr 26, 2024

raxod502 commented Apr 28, 2024

raxod502 commented Apr 30, 2024

raxod502 commented Apr 30, 2024

FooIbar commented Apr 30, 2024

MajorTanya commented Apr 30, 2024

raxod502 commented May 1, 2024

FooIbar commented May 1, 2024

raxod502 commented May 4, 2024

raxod502 commented May 4, 2024