
Support read-only subset of Virtio-FS file system #1062

Closed
wkozaczuk opened this issue Nov 21, 2019 · 5 comments

Comments

@wkozaczuk
Collaborator

Some resources:

This would require implementing a new virtio device driver and a new filesystem. The challenge is that the Linux implementation (https://gitlab.com/virtio-fs/linux/blob/virtio-fs/fs/fuse/virtio_fs.c and https://gitlab.com/virtio-fs/linux/blob/virtio-fs/fs/fuse/inode.c) uses the FUSE layer, which OSv does not implement and probably never will. For starters, we could limit the initial implementation to a read-only subset.
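
For orientation, the read-only subset boils down to speaking a handful of FUSE opcodes over the device's request queue. Below is a minimal sketch of the wire framing; the struct layouts follow the public FUSE ABI (include/uapi/linux/fuse.h), while the particular opcode subset listed is an assumption about what "read-only" requires:

    // Sketch of the FUSE wire framing a read-only virtio-fs guest would
    // need. Layouts follow the public FUSE ABI; the opcode subset below
    // is an assumption, not a spec requirement.
    #include <cstdint>

    enum fuse_ro_opcode : uint32_t {
        FUSE_LOOKUP  = 1,   // resolve a name to a nodeid
        FUSE_GETATTR = 3,   // stat-like metadata
        FUSE_OPEN    = 14,
        FUSE_READ    = 15,
        FUSE_RELEASE = 18,
        FUSE_INIT    = 26,  // version/feature handshake, sent first
        FUSE_OPENDIR = 27,
        FUSE_READDIR = 28,
    };

    // Every request starts with this header, followed by an
    // opcode-specific payload; every reply starts with fuse_out_header.
    struct fuse_in_header {
        uint32_t len;      // total request length, header included
        uint32_t opcode;   // one of the FUSE_* opcodes
        uint64_t unique;   // request id, echoed back in the reply
        uint64_t nodeid;   // inode the operation applies to
        uint32_t uid, gid, pid;
        uint32_t padding;
    };

    struct fuse_out_header {
        uint32_t len;      // total reply length
        int32_t  error;    // 0 or a negative errno
        uint64_t unique;   // matches the request's unique id
    };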

nyh pushed a commit that referenced this issue Feb 9, 2020
This patch provides an initial implementation of a read-only subset
of the new virtio-fs filesystem. The first of its two main parts is a new
virtio driver - virtio-fs - that handles interactions with the virtio device
as specified in the new 1.2 virtio spec -
https://stefanha.github.io/virtio/virtio-fs.html#x1-41500011. The second
part implements a VFS sublayer that provides the virtio-fs filesystem in OSv
and in essence serves as a passthrough between the abstract VFS layer, the
virtio-fs device and the host filesystem.
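
Concretely, that sublayer amounts to a vnode-operations table whose read-side entries translate into FUSE requests while every mutating entry fails with EROFS. A rough sketch (the vnops shape and the send_fuse_* helpers are hypothetical, not OSv's exact VFS interface):

    #include <cerrno>
    #include <fcntl.h>

    struct vnode;   // stand-ins for the VFS types
    struct uio;

    // Hypothetical helpers wrapping the FUSE round trips to the device.
    int send_fuse_open(vnode* vp);
    int send_fuse_read(vnode* vp, uio* io);

    static int virtiofs_open(vnode* vp, int flags)
    {
        if (flags & (O_WRONLY | O_RDWR)) {
            return EROFS;           // read-only subset: refuse writes
        }
        return send_fuse_open(vp);  // becomes a FUSE_OPEN request
    }

    static int virtiofs_read(vnode* vp, uio* io)
    {
        return send_fuse_read(vp, io);  // becomes a FUSE_READ request
    }

    static int virtiofs_write(vnode*, uio*)
    {
        return EROFS;               // every mutating op fails the same way
    }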

The main motivation is to provide a way to run applications on OSv
without the need to create a specific image for each app or to duplicate
files already available on the host. In essence, one should eventually be
able to execute an app on a Linux host by reading the binaries exposed
through the virtio-fs daemon. For that reason the decision to implement the
read-only subset of the FUSE interface was deliberate, and we may never
implement full read-write functionality.

Please note that this initial implementation is missing readdir
functionality. Also, the implementation of the file read operation is
pretty naive and probably quite slow - each read requires an exit to
the host. Eventually we should optimize the latter by adding some sort
of caching logic (like in ROFS) and ideally implement DAX (Direct Access),
which would provide direct guest/host mapping.
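
For reference, the caching logic meant here could be as simple as a per-file block cache keyed by block number, so repeated reads of the same region cost only one host exit. A minimal sketch under that assumption (names are hypothetical, not the actual ROFS code):

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <unordered_map>
    #include <vector>

    class read_cache {
    public:
        static constexpr size_t block_size = 4096;

        // fetch(block_no, buf) stands in for the FUSE_READ round trip.
        template <typename Fetch>
        size_t read(uint64_t off, void* dst, size_t len, Fetch fetch) {
            size_t done = 0;
            while (done < len) {
                uint64_t block = (off + done) / block_size;
                auto it = _blocks.find(block);
                if (it == _blocks.end()) {   // miss: one exit to the host
                    std::vector<uint8_t> data(block_size);
                    fetch(block, data.data());
                    it = _blocks.emplace(block, std::move(data)).first;
                }
                size_t in_block = (off + done) % block_size;
                size_t n = std::min(len - done, block_size - in_block);
                std::memcpy(static_cast<uint8_t*>(dst) + done,
                            it->second.data() + in_block, n);
                done += n;   // hit: served from guest memory, no exit
            }
            return done;
        }

    private:
        std::unordered_map<uint64_t, std::vector<uint8_t>> _blocks;
    };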

Here are the steps required to mount arbitrary directory from
host and run an app from it:

1. Get the latest qemu source from master (> 4.2) and build it along with the virtiofsd
daemon in the qemu source directory:
   - mkdir build && cd build
   - ../configure --prefix=$PWD --target-list=x86_64-softmmu
   - make -j8 && make -j4 virtiofsd

2. Apply this patch to OSv tree.

3. Add the following line to static/etc/fstab:
   '/dev/virtiofs1 /virtiofs virtiofs defaults 0 0'

4. Add the following line to usr.manifest.skel:
   '/virtiofs: ../../static'

5. Build a standard OSv image with the native-example app, and run it once with the following command so that the image's command line is set properly:
   ./scripts/run.py -e '/virtiofs/hello'

6. Give the virtiofsd daemon permission to execute unshare (http://man7.org/linux/man-pages/man2/unshare.2.html):
   sudo setcap cap_sys_admin+ep build/virtiofsd

7. Start the virtiofsd daemon in another terminal:
   ./build/virtiofsd --socket-path=/tmp/vhostqemu -o source=<OSV_ROOT>/apps/native-example -o cache=always -d

8. Finally, run OSv by manually starting qemu with the proper parameters (eventually we will change run.py to support this). Please note
   the parameters for the new device:

<QEMU_BUILD_ROOT>/build/x86_64-softmmu/qemu-system-x86_64 \
-m 4G \
-smp 4 \
-vnc :1 \
-gdb tcp::1234,server,nowait \
-device virtio-blk-pci,id=blk0,drive=hd0,scsi=off,bootindex=0 \
-drive file=<OSV_ROOT>/build/last/usr.img,if=none,id=hd0,cache=none,aio=native \
-netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 \
-device virtio-net-pci,netdev=un0 \
-device virtio-rng-pci \
-enable-kvm \
-cpu host,+x2apic \
-chardev stdio,mux=on,id=stdio,signal=off \
-mon chardev=stdio,mode=readline \
-device isa-serial,chardev=stdio \
-chardev socket,id=char0,path=/tmp/vhostqemu \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
-object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa node,memdev=mem

(This last line may not be necessary.)

For more info read here - https://virtio-fs.gitlab.io/howto-qemu.html.

For more information about VirtioFS please read this -
https://vmsplice.net/~stefan/virtio-fs_%20A%20Shared%20File%20System%20for%20Virtual%20Machines.pdf
and https://virtio-fs.gitlab.io/index.html#faq.

Partially implements #1062

Signed-off-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Message-Id: <20200208232200.15009-2-jwkozaczuk@gmail.com>
@wkozaczuk wkozaczuk changed the title Support Virtio-FS file system Support read-only subset of Virtio-FS file system Feb 10, 2020
@wkozaczuk
Collaborator Author

wkozaczuk commented Feb 10, 2020

Even though commit bae4381 added the initial virtio-fs implementation, there are still many outstanding elements missing or needing improvement:

  • Implement virtiofs_readdir() - implemented with a16e5c9
  • Implement virtiofs_statfs()
  • Make the implementation of virtiofs_read() more efficient in terms of exits by adding some kind of caching logic, possibly like ROFS has
  • Implement the DAX window (see https://virtio-fs.gitlab.io/design.html) using the FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING commands (see the sketch after this list) - implemented (7a2eaf2 and other commits)
  • Allow booting OSv with VirtioFS mount as root (just like ZFS, ROFS or RAMS can now) - implemented with f398a3b
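
For the DAX item above, the mapping requests carry small fixed payloads; the layouts below follow the public FUSE ABI (include/uapi/linux/fuse.h) and are shown only as a sketch:

    #include <cstdint>

    // FUSE_SETUPMAPPING maps [foffset, foffset + len) of the file behind
    // fh at offset moffset inside the DAX window; FUSE_REMOVEMAPPING
    // tears such ranges down again.
    #define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0)
    #define FUSE_SETUPMAPPING_FLAG_READ  (1ull << 1)

    struct fuse_setupmapping_in {
        uint64_t fh;        // file handle from FUSE_OPEN
        uint64_t foffset;   // offset into the file
        uint64_t len;       // length of the mapping
        uint64_t flags;     // FUSE_SETUPMAPPING_FLAG_*
        uint64_t moffset;   // offset into the DAX window
    };

    struct fuse_removemapping_in {
        uint32_t count;     // number of fuse_removemapping_one entries
    };

    struct fuse_removemapping_one {
        uint64_t moffset;   // window offset of the range to unmap
        uint64_t len;
    };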

foxeng added a commit to foxeng/osv that referenced this issue Aug 19, 2020
This makes the necessary and pretty straightforward additions to the
loader to support using virtio-fs as the root filesystem. It also makes
minimal changes to scripts/build to add support there as well. Note
that, to obtain a directory with contents specified by the manifest
files, usable as the virtio-fs host directory, one can use the existing
'export' and 'export_dir' (previously undocumented) options to
scripts/build.

Ref cloudius-systems#1062.
wkozaczuk pushed a commit that referenced this issue Sep 9, 2020
This makes the necessary and pretty straightforward additions to the
loader to support using virtio-fs as the root filesystem. It also makes
minimal changes to scripts/build to add support there as well. Note
that, to obtain a directory with contents specified by the manifest
files, usable as the virtio-fs host directory, one can use the existing
'export' and 'export_dir' (previously undocumented) options to
scripts/build.

Ref #1062.

Signed-off-by: Fotis Xenakis <foxen@windowslive.com>
Message-Id: <AM0PR03MB62921196B1CC7E09D5EFA29DA62C0@AM0PR03MB6292.eurprd03.prod.outlook.com>
@foxeng
Contributor

foxeng commented Oct 18, 2020

Updating the list of pending items, feature-wise:

Gap fillings, some improvements etc. are on the way. Unfortunately, due to a current lack of time, this will be a somewhat long road (read: a couple of months?).

@wkozaczuk
Collaborator Author

@foxeng I think I will close this story and create new ones for each pending item. Also, regarding mmap support, I think it would be nice to implement it to map host pages directly, as Linux does. Please see this Linux patch - https://lore.kernel.org/lkml/20200819221956.845195-1-vgoyal@redhat.com/ and these comments:

"This patch series adds DAX support to virtiofs filesystem. This allows
bypassing guest page cache and allows mapping host page cache directly
in guest address space.

When a page of file is needed, guest sends a request to map that page
(in host page cache) in qemu address space. Inside guest this is
a physical memory range controlled by virtiofs device. And guest
directly maps this physical address range using DAX and hence gets
access to file data on host.

This can speed up things considerably in many situations. Also this
can result in substantial memory savings as file data does not have
to be copied in guest and it is directly accessed from host page
cache."

@foxeng
Contributor

foxeng commented Oct 20, 2020

Totally agreed on replacing this with more fine-grained, easier-to-address issues.

Regarding mmap, the comment from the Linux patch above is actually a description of how the DAX window works in the context of virtio-fs (and Linux specifically, when referring to the guest page cache). Right now I lack the specifics around mmap in OSv (it's been a number of months since I looked into it, and I didn't end up implementing it back then), so I am not sure how this applies.

It might be of value, mainly for future reference, to leave a short overview of our current status here. Assuming a virtio-fs device supporting the DAX window:

  • We try to satisfy all reads using the window. If for any reason that fails, we fall back to issuing a FUSE_READ to the host (see the "if (dax_mgr) {" check in the read path).
  • The first thing the DAX window manager (fs/virtiofs/virtiofs_dax.{hh,cc}) does is check whether the requested data is already in the window. If so ("cache hit"), it copies the data back directly, satisfying the request (see the "// Requested data (at least some initial) is already mapped" comment).
  • If the data is not found in the window ("cache miss") and the manager decides to bring it in, it issues a FUSE_SETUPMAPPING request; the data is mapped into the window and then copied back as above (see the "// Map file" comment).

The ideal scenario for mmap, IMHO, would be to provide access directly to the DAX window, thus avoiding the (intra-OSv) copy that is part of read's semantics.
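
Putting the three bullets above together, the read path looks roughly like this (all names - dax_manager, fuse_read, virtiofs_do_read - are illustrative, not the actual fs/virtiofs code):

    #include <cstddef>
    #include <cstdint>

    struct file_handle;

    struct dax_manager {
        // Returns 0 on a "cache hit", i.e. the range was already mapped.
        int copy_if_mapped(file_handle& f, uint64_t off, void* buf, size_t len);
        // Issues FUSE_SETUPMAPPING, then copies; returns 0 on success.
        int map_and_copy(file_handle& f, uint64_t off, void* buf, size_t len);
    };

    extern dax_manager* dax_mgr;  // null when the device lacks a DAX window
    int fuse_read(file_handle& f, uint64_t off, void* buf, size_t len);

    int virtiofs_do_read(file_handle& f, uint64_t off, void* buf, size_t len)
    {
        if (dax_mgr) {
            // 1. Cache hit: copy straight out of the mapped window.
            if (dax_mgr->copy_if_mapped(f, off, buf, len) == 0)
                return 0;
            // 2. Cache miss: map the range in, then copy as above.
            if (dax_mgr->map_and_copy(f, off, buf, len) == 0)
                return 0;
        }
        // 3. Fallback: plain FUSE_READ round trip to the host.
        return fuse_read(f, off, buf, len);
    }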

@wkozaczuk
Collaborator Author

@foxeng Closing this one in favor of two smaller, specific issues I have just created - #1163 and #1164.
