Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Expose enum api #136

Closed
wants to merge 8 commits into from
Closed

Conversation

wmanley
Copy link
Contributor

@wmanley wmanley commented Mar 18, 2021

This allows implementing a FUSE filesystem by providing a FnMut(req: ll::AnyRequest) -> Result<Response, Errno> and using enum matching rather than by implementing trait Filesystem. This has the advantages that:

  • You can apply natural Rust error handing with ? and return Err(...).
  • Delegation is much more natural. For example, in your implementation you might look up the target file by INodeNo and call a method there passing the whole AnyRequest object, rather than having to forward every one of the Filesystem methods individually.
    • Easy delegation lends itself to writing middlewares and abstractions in the future without compromising on the power of the current API for users that really want to do things that are wacky. So instead of a high-level and a low-level API a la libfuse, users could pick-and-mix depending on their needs, and evolve smoothly one way or the other without a complete rewrite.
  • The one-reply for one request is guaranteed statically, rather than dynamically with the Reply API, so the compiler will tell you when you've got it wrong, rather than finding issues in testing.
  • The enum API uses newtypes to help avoid mistakes.
  • There's less boilerplate when implementing a FS - MkNod(x) => { is a lot shorter than fn mknod(&mut self, req: &Request, parent: u64, name: &OsStr, mode: u32, _umask: u32, _rdev: u32, reply: ReplyEntry) {
  • We can more easily add things to the API in the future as the ABI with the kernel evolves without breaking backwards compatibilty. They become functions on the Operation structs rather than additional arguments to trait methods
  • Adding async support or multithreading support doesn't require duplicating Reply or Request. We just need a new serve function. For example you might have a serve_mt that takes a Fn(req: ll::AnyRequest) -> Result<Response, Errno> + Send + Sync or a serve_async that takes a Fn(req: ll::AnyRequest) -> Future<Result<Response, Errno>>. Only the dispatch code needs implementing, rather than the whole API surface of Request and Reply.
  • The existing trait API is preserved for backwards compatibility, but a new pre_request(ll::AnyRequest) method is added to aid porting.

Have a look at examples/simple_enum.rs to get a feel for what this might look like.

TODO:

Maybe for this PR, maybe for a later one:

  • Add serve_mt - Run filesystem multithreaded
  • Add serve_async - Run filesystem async

@wmanley wmanley mentioned this pull request Mar 18, 2021
8 tasks
Ok(())
}

fn dispatch(&mut self, req: &AnyRequest) -> Result<Response, Errno> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this returns Result<Response, Error> now. Being able to return Errno and use Rust's standard handling of errors is way better. And returning Response so that it's checked at compile time is a nice improvement too.

I'm less sure about replacing all the Filesystem methods with this single dispatch method though. There are two main downsides that I see:

  1. users are forced to add this really big match clause. They can of course delegate to helper methods, but then it's pretty much like the old interface.
  2. adding a new enum value with default handling is complicated. The current Filesystem trait has default handling for many methods, since there are a bunch that don't really need to be implemented for basic filesystems. I guess we can export a function like default_handler, but it seems less nice to make users have a default clause in the match which calls it.

For the goal of making ABI changes backwards compatible (which I like, btw!), how about changing Filesystem so that all the methods take a single argument which is the the enum struct used here, and they would all return Result<Response, Errno>? To preserve backwards compatibility with the existing API, we could introduce it as Filesystem2 to start with, and deprecate Filesystem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. users are forced to add this really big match clause. They can of course delegate to helper methods, but then it's pretty much like the old interface.

The difference with delegation with the enum API is that you can delegate multiple operations at once, rather than having to delegate each operation separately.

For example: I have a filesystem where I want to handle / differently to by-commit/*, also differently to by-ref/**. With the enum I can essentially do:

fn handler(req: AnyRequest) -> Result<Response, Errno> {
    let inode_type = req.inode_no() & 0xf000_0000_0000_0000 >> 60;
    match inode_type {
        0 => handler_root(req),
        1 => handler_by_commit(req),
        2 => handler_by_ref(req),
        ...
        _ => Err(Errno::ENOENT),
    }
}

Similarly if I have some helper (like a middleware) that wants to handle some requests, but not all:

fn my_helper(req: AnyRequest) -> Result<Response, Errno>;

fn handler(req: AnyRequest) -> Result<Response, Errno> {
    match {
        op::Forget(_) | op::Lookup(_) => return my_helper(),
        _ => (),
    }

    ....
}

I don't need to add that code to each of my trait method implementations.

  1. adding a new enum value with default handling is complicated. The current Filesystem trait has default handling for many methods, since there are a bunch that don't really need to be implemented for basic filesystems. I guess we can export a function like default_handler, but it seems less nice to make users have a default clause in the match which calls it.

I hadn't considered this. My assumption was that all operations return ENOSYS by default, so I was thinking that you'd just have a _ => Err(Errno::ENOSYS) handler at the bottom of the match, but I like your idea of delegating to a default_handler instead. The nice thing is that the user can delegate all the operations they choose not to handle in a single line.

For the goal of making ABI changes backwards compatible (which I like, btw!), how about changing Filesystem so that all the methods take a single argument which is the the enum struct used here, and they would all return Result<Response, Errno>? To preserve backwards compatibility with the existing API, we could introduce it as Filesystem2 to start with, and deprecate Filesystem.

If we wanted a async or a multithreaded version we'd also need FilesystemAsync where each method returns Future<...> and FilesystemMultiThreaded where each method takes &self rather than &mut self. The enum API removes the coupling of how the user code is called from what operations are available, so instead there is no duplication. It also has advantages WRT async methods in traits. We don't need to box and use &dyn for every trait method.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted a async or a multithreaded version we'd also need FilesystemAsync where each method returns Future<...> and FilesystemMultiThreaded where each method takes &self rather than &mut self. The enum API removes the coupling of how the user code is called from what operations are available, so instead there is no duplication. It also has advantages WRT async methods in traits. We don't need to box and use &dyn for every trait method.

This is definitely a strong argument. It'd be great to have async without duplicating a bunch of code! Are you sure this will work though? These enums all contain borrows, and so I feel like that won't play nicely with async, in that all the lifetimes will need to be 'static.

p.s.
I'm traveling until next week, so may be a little slow to reply

Copy link
Contributor Author

@wmanley wmanley Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this will work though? These enums all contain borrows, and so I feel like that won't play nicely with async, in that all the lifetimes will need to be 'static.

I'm pretty sure. The data will still be owned by the future higher up in the async call stack. The code would look something like:

pub async fn serve_async<Handler>(chan: Channel, handler: Handler) -> io::Result<()>
where
    Handler: Fn(AnyRequest<'_>) -> Result<Response, Errno>,
{
    loop {
        let mut buf = vec![];
        chan.read_into(buf).await?;
        spawn(handle_one(buf, handler.clone(), chan.sender()))
    }
}

async fn handle_one<Handler>(buf: Vec<u8>, handler: Handler, sender: Sender) -> Result<Response, Errno>
where
    Handler: Fn(AnyRequest<'_>) -> Result<Response, Errno>,
{
    let req = AnyRequest::try_from(buf)?;
    let resp = match handler(req).await {
        Ok(resp) => resp,
        Err(errno) => Response::new_error(errno),
    };
    sender.send(resp);
}

So the buffer is owned by an async stack frame higher up in the async stack. References to that buffer can be passed to lower stack frames with a non-static lifetime. Of course this is not a stack in the traditional sense. rustc will transform the handle_one async fn into a self-referential struct that looks like:

struct HandleOneFuture {
    buf: Vec<u8>,
    req: AnyRequest<'buf>,
} 

but we don't have to worry about such details.

I don't see any reason that this shouldn't work, but I don't have code that actually does this yet. Baby-steps and all that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I think you're right. My grasp of the borrow checker is still a little shaky :) I played around in Rust Playground, and the thing that I thought wouldn't work seems to work fine, so ya let's do!

wmanley added 8 commits April 7, 2021 11:35
In the next commit I'll convert it to be implemented in terms of our
`ll::Operation` enum.  I'm doing this as a seperate commit to highlight
the diff.
TODO: Also test trait API
We want u8 data, but with u64 alignment so we can cast the data to our
`fuse_abi` structs.  This represents such a buffer implemented with only
safe code.
Can be used to mount and serve requests for filesystems implemented in
terms of the enum API.

Note: This is a little different to the `mount` and `mount2` APIs in that
we've split out mounting from actually serving.  This is neat as it removes
the need for a separate `spawn_mount` API.  If the user wants to run their
filesystem from a different thread they can just call `serve_sync` in that
thread.

It also provides the flexibility to do other work between `mount` and
`serve` - such as readyness notification.
@cberner
Copy link
Owner

cberner commented May 1, 2021

Just wanted to check in on this, since I haven't heard from you in a while. Are you still working on this feature, @wmanley?

@wmanley
Copy link
Contributor Author

wmanley commented May 3, 2021

Just wanted to check in on this, since I haven't heard from you in a while. Are you still working on this feature, @wmanley?

Yes. I just got busy with some other things.

I did reach somewhat of an impasse with this work though. There's a lot of API duplication between this API and the existing one. It makes the documentation confusing as there isn't one clear way to do things. To resolve this I think I'll break this into two crates - one with the new enum API and one with the existing one. They can live in the same repo and the trait Filesystem API crate would depend on the other. The alternative would be to just abandon backwards compatibility with this crate.

@cberner
Copy link
Owner

cberner commented May 4, 2021

Cool. Hmm, I'm not very excited about two separate crates for the reasons I mentioned before. Maybe it can be in a separate module?

@wmanley
Copy link
Contributor Author

wmanley commented May 7, 2021

There are two advantages I see for there being a separate crate - at least for the time being:

  1. Independent releases - the enum API is a less-mature API than the existing one - so we'll probably not get it right first go. The ability to release and iterate separately from disrupting the current API is useful.
  2. Documentation - My hope is to improve the documentation to make it much easier for people write correct FUSE filesystems. To accomplish this I think it's important for documentation to have focus. This work essentially duplicates all the APIs of the fuser crate. I think Python's "There should be one-- and preferably only one --obvious way to do it." is important for learnability.

So that's the motivation.

I could put the whole of the existing API in a module, but I can't see many advantages to that vs. a separate crate - albeit in the same git repo.

@cberner
Copy link
Owner

cberner commented May 8, 2021

Hmm, I think I stand by my previous argument that the trust benefits from a single crate outweigh those:

  1. lets just put it behind a feature flag, like --experimental-api
  2. ya, I definitely see this argument about duplicate APIs being confusing. I think it's easier to explain in a single crate which to use when, than have two separate crates that people have to read both documentation for and then choose between

@cberner
Copy link
Owner

cberner commented Jun 2, 2021

Just a heads up, I'm probably going to revert the partial implementation that's merged, in a few weeks, if you're no longer actively working on it.
We can re-add it later, when things are completed, but I don't like having a bunch of dead code around

@cberner cberner closed this Jun 7, 2021
@wmanley
Copy link
Contributor Author

wmanley commented Jun 7, 2021

I'll rebase this once #146 is merged.

@wmanley wmanley mentioned this pull request Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants