Quadratic complexity when reading large entries #21
Make a `.tar` archive containing a single large file, `archivemount` it, and copy out the single element in the archive:
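The pasted terminal transcript was lost here, but the commands were presumably along these lines (a sketch; the file name and sizes are hypothetical):

```sh
# Hypothetical reproduction; names and sizes are illustrative.
dd if=/dev/urandom of=big.file bs=1M count=1024
tar cf big.tar big.file

mkdir -p mnt
archivemount big.tar mnt

# Watch dd's MB/s figure fall as the copy proceeds.
dd if=mnt/big.file of=/dev/null bs=128K status=progress
```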
Notice that the "MB/s" `dd` throughput slows down (in the interactive output, not copy/pasted above) as time goes on. It looks like there's some sort of super-linear algorithm involved, and the whole thing takes more than a minute. In comparison, a straight `tar` extraction takes less than a second.

Sprinkling some logging in the `_ar_read` function in `archivemount.c` shows that the `dd` leads to multiple `_ar_read` calls. In the steady state, it reads `size = 128 KiB` each time, with the `offset` argument incrementing on each call.
Inside that `_ar_read` function, if we take the `if (node->modified)` false branch, then each call (sketched in C after this list):

- calls `archive_read_new`, even though it's the same archive every time;
- finds the `archive_entry` for the file. Again, this is once per call, even if it's conceptually the same `archive_entry` object used in previous calls; and
- copies `offset` bytes from the start of the entry's contents to a 'trash' buffer, before finally copying out the payload that `_ar_read` is actually interested in.
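In C, the pattern looks roughly like this. It's a minimal sketch of the behavior described above, not archivemount's actual code: only the `archive_read_*` and `archive_entry_*` calls are real libarchive API, the helper and the `"big.tar"` path are hypothetical, and error handling is omitted.

```c
#include <archive.h>
#include <archive_entry.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: open the archive and step through its headers
 * until we reach the wanted entry. */
static void open_and_find_entry(struct archive *a, const char *path) {
  struct archive_entry *entry;
  archive_read_support_filter_all(a);
  archive_read_support_format_all(a);
  archive_read_open_filename(a, "big.tar", 16384); /* assumed archive */
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    if (strcmp(archive_entry_pathname(entry), path) == 0) return;
    archive_read_data_skip(a); /* not the one we want */
  }
}

static ssize_t ar_read_sketch(const char *path, char *buf,
                              size_t size, off_t offset) {
  struct archive *a = archive_read_new(); /* fresh handle on EVERY call */
  open_and_find_entry(a, path);           /* re-walks headers, EVERY call */

  char trash[16384];
  off_t skipped = 0;
  while (skipped < offset) { /* decode and discard all bytes before offset */
    size_t chunk = sizeof(trash);
    if ((off_t)chunk > offset - skipped) chunk = (size_t)(offset - skipped);
    ssize_t n = archive_read_data(a, trash, chunk);
    if (n <= 0) break;
    skipped += n;
  }

  ssize_t got = archive_read_data(a, buf, size); /* the payload we wanted */
  archive_read_free(a);
  return got;
}
```

With `offset` growing by 128 KiB per call, the skip loop re-decodes everything before `offset` on every single call.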
The total number of bytes produced by `archive_read_data` calls is therefore quadratic in the decompressed size of the archive entry. For example, a 1 GiB entry read in 128 KiB chunks takes 8192 calls, and since the k-th call decodes roughly k × 128 KiB, the calls together decode about 4 TiB. This is slow enough for `.tar` files but probably worse for `.tar.gz` files. The whole thing is reminiscent of the Shlemiel the Painter story.

There may be some complications if we're re-writing the archive, but when mounting read-only, we should be able to get much better performance.
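For the read-only case, one plausible shape for a fix (a sketch under assumptions; `struct cached_entry` and `reopen_and_find_entry` are hypothetical, not archivemount's actual structures) is to cache the open libarchive handle and the current decode position per file, creating a fresh handle only when a read seeks backwards:

```c
#include <archive.h>
#include <sys/types.h>

struct cached_entry {
  struct archive *a; /* handle already positioned inside the entry's data */
  off_t pos;         /* bytes of the entry decoded so far */
};

/* Hypothetical: does the archive_read_new() + header walk from the
 * sketch above and leaves the handle at the start of the entry's data. */
struct archive *reopen_and_find_entry(const char *path);

static ssize_t cached_read(struct cached_entry *ce, const char *path,
                           char *buf, size_t size, off_t offset) {
  if (ce->a == NULL || offset < ce->pos) {
    /* First read, or a backwards seek: only now do we pay for a reopen. */
    if (ce->a != NULL) archive_read_free(ce->a);
    ce->a = reopen_and_find_entry(path);
    ce->pos = 0;
  }
  char trash[16384];
  while (ce->pos < offset) { /* forward skip only; bytes decoded in
                                earlier calls are never decoded again */
    size_t chunk = sizeof(trash);
    if ((off_t)chunk > offset - ce->pos) chunk = (size_t)(offset - ce->pos);
    ssize_t n = archive_read_data(ce->a, trash, chunk);
    if (n <= 0) return n;
    ce->pos += n;
  }
  ssize_t got = archive_read_data(ce->a, buf, size);
  if (got > 0) ce->pos += got;
  return got;
}
```

For a sequential reader like `dd`, `offset` always equals `ce->pos`, the skip loop never runs, and each byte of the entry is decoded exactly once, so the total work becomes linear in the decompressed size.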
Comments

https://github.com/google/fuse-archive