-
-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZIM Fuse Filesystem #400
base: main
Are you sure you want to change the base?
ZIM Fuse Filesystem #400
Conversation
setting up the project
Implemented a tree (trie-like) where a path is broken down per directories. For example, my/new/path1.jpg my/new/path2.jpg are represented as / (root) my new path1.jpg path2.jpg
Implemented some common FUSE functions. We can browse the ZIM file using ls, cd, etc. and open files in text editors or using commands such as "cat" now.
Updated the README to add information about zimfuse
I guess this would be an implementation of kiwix/overview#79 ? |
mappedNodes owns the nodes
Some entries had long filenames, this change reduces that to 100 characters which is will within the allowed limit. This introduces possible name collisions which will be fixed in the following commit
now if two files have the same name, the 2nd file is changed to "originalFileName(1)"
When reading ZIMs with a lot of files (for example: mediawiki_en_all), getting their filesize took a lot of time. There were 2 choices: 1. Read filesize while creating the tree - this gave fast subsequent responses (on commands such as ls) but the fuse initialization took a good amount of time. 2. Find filesize when requested by a command (and save it for future requests) - This provides fast initialization but the first read takes time (though, not as much as getting all filesizes - only the requested ones). They are later saved for subsequent requests. I went with option 2.
Yes. |
@juuz0 Thanks but most of the CI is broken? Can you fix it? |
CI fails because 'fuse3' is not available on the system...maybe something to add in kiwix-build later? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @juuz0.
But I have few general comments better addressed on the full PR than each commit:
- This kind of tree is a perfect use case for OOP and inheritance.
A basicNode
withname
,parent
. Then aDir
node (inheritingNode
) withchildren
and aLeaf
node. - You define a
Node::Ptr
asunique_ptr
, but theNode
tree is composed ofNode*
.Node::Ptr
is only used inTree::mappedNodes
. This makeTree
the direct owner of all the nodes andNodes
only making "reference" to other nodes. It would make more sense to haveDir
nodes the owners of their children. - The
Tree::mappedNodes
is kind of global cache. But I'm not sure we need it. Either "directories" contains few entries and loop in thevector
should be enough or not, and in this case, it would be better to transformNode::children
into amap
. We would have the O(1) access at node level and top of ensuring us the unity of children names. - The
Tree::statCache
could be removed by simply move the cachedstruct stat
in theNode
itself. - Then the
Tree
could probably be removed as it is simply aDir
node without parent. - The
collisionCount
is local to each node but "global" to all children. This means that four twinsfoo
(x2) andbar
(x2), you will find the sanitized namesfoo
,foo(1)
,bar
andbar(2)
. It should bebar(1)
. - Redirects are not properly handle. You resolve the redirection and return the content of the target but it may break relative links in html. You should treat redirect as symlink (and so implement
readlink
)
On top of that, I wonder about the memory usage of the tree.
wikipedia_fr_all_maxi contains 7029908 entries (leafs). And size_of(Node)
is 144 (without counting the actual data store in it (names/fullPath/originalPath bytes, children ptr). So it means that we need at least 965MiB for the nodes only. Probably more that 2GiB if we add the path and children ptr.
We can reduce that by carefully define Node
structure and what we store but at the end, we will always use a lot of data.
(We may simply don't care and tell user that mounting zim files need a lot of memory, at least for now)
@@ -28,13 +28,14 @@ find_library_in_compiler = meson.version().version_compare('>=0.31.0') | |||
rt_dep = dependency('rt', required:false) | |||
docopt_dep = dependency('docopt', static:static_linkage) | |||
|
|||
with_writer = host_machine.system() != 'windows' | |||
with_writer_and_mount = host_machine.system() != 'windows' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to have a on_windows
, or split this in two variables with_writer
and with_mount
.
We should not link writer and mount compilation together (at least not explicitly in one variable)
meson.build
Outdated
@@ -28,13 +28,14 @@ find_library_in_compiler = meson.version().version_compare('>=0.31.0') | |||
rt_dep = dependency('rt', required:false) | |||
docopt_dep = dependency('docopt', static:static_linkage) | |||
|
|||
with_writer = host_machine.system() != 'windows' | |||
with_writer_and_mount = host_machine.system() != 'windows' | |||
|
|||
if with_writer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be renamed also.
Fixes kiwix/overview#79
Mount any ZIM file to your filesystem
Usage:
zimfuse zimFile.zim mountDir
mountDir should exist before using zimfuse.