Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File System API #8

Closed
Gozala opened this issue May 16, 2018 · 36 comments
Closed

File System API #8

Gozala opened this issue May 16, 2018 · 36 comments
Assignees

Comments

@Gozala
Copy link
Contributor

Gozala commented May 16, 2018

Low level random access read/write APIs similar to fs.read

Gecko has multiple FileSystem APIs available most recommended one (last time I checked) was OS.File which likely could be (wrapped &) exposed as WebExtensions API. For random access read / writes there are still nsIFileInputStream & nsIFileOutputStream APIs and legacy add-on API wrappers to expose them in node compatibleAPI. For reading / writing at specific byte offset OS.File.prototype.setPosition could be used.

🐞 1246236

@Gozala Gozala added this to the File System APIs milestone May 16, 2018
@Gozala
Copy link
Contributor Author

Gozala commented May 17, 2018

From @pfrazee

If we're looking for a 1:1 with what Beaker can do, having a Filesystem API that can watch folders for changes, do read/writes within files, and handle "sparse" files, should all be fairly important.

In the short-term: we can get Dat working on IndexedDB, but performance would suffer a bit. In Beaker we also watch local folders to autopublish to dats, and that wouldn't be possible here.

From @mafintosh

Noting we'd need random access read/write fs support here to make it as performant as the node/beaker impl.

Also, we rely on native code to run our utp stack. We can run without it, but we might need some udp support here else.

From @rpl

an API for direct filesystem access has been debated a number of time over the last two years but we haven't yet reached an agreement about it, it is not that we will never provide one, but it has not been an "easy topic", and we don't have it on roadmap for the next two releases (62 and 63).

One of the tricky things about an API for direct filesystem access (besides the security/privacy concerns) is that the extension code runs in a sandboxed child process which cannot directly access files on disk (and so any disk access should be mediated by the main process, possibly not on its main thread).

In the meantime, the extension that needs some kind of random access files have only two options:

  • nativeMessaging, and so basically leave the external native app to access the files on behalf of the extension
  • IDBMutableFile, which is a non-standard extension to the IndexedDB API which allows IndexedDB to associate to a key a mutable file (instead of just an immutable Blob/File like it is supported on most of the modern browsers), and the mutable file profiles a (non-standard) API to read and write the file with random access (the file is stored on disk as a real file, read/write should happen in a separate IndexedDB thread, and it can be used in the sandboxed child extension process):
    https://developer.mozilla.org/en-US/docs/Web/API/IDBMutableFile and https://developer.mozilla.org/en-US/docs/Web/API/LockedFile

As a side note, issues with micro-tasks and Promise scheduling has been recently fixed in Firefox 60, and so the IDBRequest API that IndexedDB APIs provides can now be easily wrapped into a promise based API without issues with the IDB transaction being closed before the promise has a chance to run (https://bugzilla.mozilla.org/show_bug.cgi?id=1193394#c164).

From @pfrazee

Helpful insight, Luca. The downside of FS access through the main thread is messaging overhead, right?

IDBMutableFile might end up being a performance benefit compared to the rest of IndexedDB.

There are basically 2 concerns:

  1. Internal storage performance. This is what needs random read/write control.

  2. UX for dev workflows. This doesn't need total FS access, but it does need the extension to be able to access folders that the user specifies. That access then needs to include watching and read/write.

From @rpl

We try to avoid to run heavy tasks on the main thread because it creates "user perceivable" performance issues, especially the main thread on the main process (which if it is too busy, it would make the browser UI to lag)

And so, given that the child process are sandboxes by default and cannot access the filesystem, IndexedDB ensure that we run the most expensive code in a separate thread of the main process.

IDBMutableFile could help with "1. internal storage performance" (even if in a non-standard way),
but unfortunately there isn't currently any very good workaround for "2. UX for dev workflows" :-( (at least not one that doesn't involve "using an external helper native application and nativeMessaging").

There is actually another non-standard feature (coming from chrome/webkit), which would allow a user to provide read access to an entire directory tree (e.g. by let the user to drag and drop a directory instead of a single file, or with a special option on an input element), but it doesn't provide write access to that directory tree (and no watching):
https://developer.mozilla.org/en-US/docs/Web/API/File_and_Directory_Entries_API#Getting_access_to_a_file_system

It may be probably useful for an extension that wants/needs to allow a user to transfer a directory tree into a number of IDBMutableFile object stored in IDB, to be modified by the extension (but currently then there is no easy/performant way to sync those changes back into the original directory tree on the filesystem).

In the long run, once an actual FileSystem WebExtension API may finally be implemented, that API is very likely to provide "sandboxed read/write" to a directory (or directories) that the user explicitly "specify / give access" to the extension (e.g. to reduce the number of security concerns that have been raised during the past discussions about providing such a powerful/dangerous API to an extension).

But "watching the directory tree for changes" seems also an interesting feature to keep in mind, so that we do not forget about it once an actual API implementation is being proposed.

@Gozala
Copy link
Contributor Author

Gozala commented May 17, 2018

Respondig to @rpl

We try to avoid to run heavy tasks on the main thread because it creates "user perceivable" performance issues, especially the main thread on the main process (which if it is too busy, it would make the browser UI to lag)

And so, given that the child process are sandboxes by default and cannot access the filesystem, IndexedDB ensure that we run the most expensive code in a separate thread of the main process.

As things stand now we were planning on using OS.File API that does all the IO off the main thread. As you also pointed out there is no way to do IO anywhere other the parent / main process so sadly that would imply some overhead of pumping data from Extension Process -> Parent Process -> OS.FIle Thread and reverse. I imagine that is also more or less what IndexedDB API has to do is it not ?

Only other option I see (and seem to be getting increasingly more in favor of) to expose a ServiceWorker like API say ProtocolWorker that would be sandboxed version of ChromeWorker in the parent process with an access to TCP / UDP sockets, MDNS, FileSystem. That way all of the IO will actually happen right there and no data juggling will be required across all of these processes, except when data will be read from nsIChannel of the corresponding nsIProtocolHandler in the child process (which is sadly what happens when document is loaded).

Major downside of this approach is that I would need to replicate large portion of what WebExtensions do (like permissions, lazy loading, and probably a lot more). Still I would love to hear your opinion in this regard.

There is actually another non-standard feature (coming from chrome/webkit), which would allow a user to provide read access to an entire directory tree (e.g. by let the user to drag and drop a directory instead of a single file, or with a special option on an input element), but it doesn't provide write access to that directory tree (and no watching):
https://developer.mozilla.org/en-US/docs/Web/API/File_and_Directory_Entries_API#Getting_access_to_a_file_system

This actually more or less what Beaker is doing right now except it's not drag and drop based but rather they provide a prompt where user can choose a directory, which I imagine should not be too difficult to add. Write & Watch is essential though.

@pfrazee I wonder what requirements would FileSystem API would have to satisfy to not limit beaker in any way while at the same time was scoped limited in scope like FileSystem.requestFileSystem() allowing user to choose a target directory to expose. It seems that it would be a better option from the Firefox WebExtensions perspective as it reduces the harm malicious add-on could cause.

@rpl If we were to design our FileSystem API in the following principal would there be any way to persist access that user granted across the browser restarts ?

But "watching the directory tree for changes" seems also an interesting feature to keep in mind, so that we do not forget about it once an actual API implementation is being proposed.

That reminds me that this used to be an API not present in any way in Gecko back in Add-on SDK days at least.

@pfrazee
Copy link

pfrazee commented May 18, 2018

The ProtocolWorker idea is interesting - is there any precedent for putting an extension in the main thread? Other than the security implications it may not be a total performance win: we'll want the Dat networking stack to be off main so that it doesn't hurt overall perf.

RE the file access: yeah I think we could work within requestFileSystem() if it let the user pick the directory and then gave us full capabilities.

@Gozala
Copy link
Contributor Author

Gozala commented May 19, 2018

The ProtocolWorker idea is interesting - is there any precedent for putting an extension in the main thread?

Up until WebExtensions Firefox used to run all of the add-on's in the main thread (which is different from the content threads), but it also used to be a reason why Firefox was becoming a lot less responsive for users with many add-ons installed.

With WebExtensions as far as I understand actual add-on code is loaded in separate extension process, content scripts are loaded in the content processes (usually referred to as child processes). Still all the IO happens on in the main process (usually referred to as parent process) as far as I understand intentionally so to enable sandboxing. APIs exposed to the extensions usually have corresponding code (implemented by WebExtensions team) run in the main (thread of the main) process with which they exchange async messages to do IO related tasks.

Other than the security implications it may not be a total performance win: we'll want the Dat networking stack to be off main so that it doesn't hurt overall perf.

I should elaborate how protocol handlers work I think to explain why I think ProtocolWorker would be a better option.

Protocol handlers in gecko need to be installed both in main process and each content process. As far as I can tell one in main process is only used to create URLs while the one's in content processes are used to load content. So when tab say is attempting to load dat://${hash} it will as corresponding content process provide nsIChannel (which is similar to stream in node) since content process needs to do IO it will have to send message to main process, but even though main process can do IO it needs to pass the request to the extension process which has protocol implementation logic, but extension process need to perform some IO to read / fetch that content so it needs to send message back to main process which will stream chunks of data back to extension process which will pre-process it and send it back to main process so that it can finally forward it to the corresponding content process so tab can load it. In summary this turns out be at least 6 steps that requires copying actual data ArrayBuffer 3 times (as it goes across processes), and that is if the whole thing can be served in single round-trip.

# content process main process extension process
1 📧 Request url 📭 💤
2 💤 📧 Request url 📭
3 💤 📭 📧fs.read(file)
4 💤 📧 ReadArrayBuffer ♻️ Copy ArrayBuffer
5 💤 ♻️ Copy ArrayBuffer 📧 Respond ArrayBuffer
6 ♻️ Copy ArrayBuffer 📧 Respond ArrayBuffer 💤

If protocol handler doesn IO operations for a request and responds in k chunks total number of copying for that request will be n + 2*k.

It's not necessarily going to be slow, but it also does not seems to me that there is much value in running protocol handler logic in extension process. I believe main reason why add-on code runs in the extension process in first place is to provide it with DOM access, which in case of protocol handler does not seem to make much sense.

Now what I was suggesting to do was to spawn a dedicated worker thread on the main process and load protocol handler code there. It still should not block main process, but unlike the above case it would be possible to transfer ArrayBuffers across main threads without copying.

So in this case if protocol handler does n IO operations for a requqest and responds in k chunk total number of copying for that request will be k.

@Gozala Gozala mentioned this issue May 21, 2018
12 tasks
@Gozala Gozala self-assigned this May 24, 2018
@Gozala
Copy link
Contributor Author

Gozala commented May 24, 2018

@mafintosh @lidel @olizilla Do you care about having control of file open / close files ? Or would sparse read / write with automatic close satisfy requirements ?

@Gozala
Copy link
Contributor Author

Gozala commented May 25, 2018

For self reference there seems to be a "@mozilla.org/toolkit/filewatcher/native-file-watcher;1" XPCOM class implementing nsINativeFileWatcherService that presumably could be used for file fs watching.

@Gozala
Copy link
Contributor Author

Gozala commented May 25, 2018

I wrote down the interface definition for this API and could use some feedback:
https://github.com/mozilla/libdweb/blob/master/src/FileSystem/API.js

  • If there is something missing please speak up.
  • If there is something strictly unnecessary please let me know, I'd be happy to reduce the scope.
  • If for whatever reason mounting (as describe in the comment)[https://github.com/mozilla/libdweb/blob/66aeb6d57086e2f6c4526ee2687ce949b7173e49/src/FileSystem/API.js#L4-L11] isn't going to work for you please speak up.

P.S: API mostly follows existing OS.File. I do understand that node fs compatible API would have being likely more desirable, but that would have introduced extra layer of adapters and node Stream implementation, which I'd rather see happen in user land.

@pfrazee
Copy link

pfrazee commented May 25, 2018

Seems right. I'll pass it on to maf. Is the idea of createUnique to create a file with an unused name?

@Gozala
Copy link
Contributor Author

Gozala commented May 25, 2018

Seems right. I'll pass it on to maf.

👍

Is the idea of createUnique to create a file with an unused name?

Yes (I think it just prefixes number to the basename). It's just already existed in the gecko (see OS.File.openUnique for more details), presumably because there was a demand for it internally, but I'm not sure if it's going to be at all useful in other context.

@Gozala
Copy link
Contributor Author

Gozala commented May 25, 2018

I've changed my mind on the proposed API a bit, let me elaborate and please let me know if you see concerns with it.

I am considering removing path manipulation utils and instead expose root URL on the mounted FileSystem instance in the form of file:///path/to/chosen/dir/. This has two benefits IMO:

  1. We no longer need all the path manipulation stuff as you'd be able to use URL for that.
  2. Unix / Windows differences get abstracted away

Maybe there's a good reason against it (if you think of one please let me know), I can't think of one myself.

@mafintosh
Copy link

@Gozala using the API you propose in the link above how would I do a read/write at a specific byte offset? As I read your spec that seems to be two different async operations?

Other than that it looks pretty solid. In general we rely on this storage API https://github.com/random-access-storage/random-access-storage#var-storage--randomaccessstorageoptions btw

For the auto open/close stuff ... I'd prefer this to be as low level as possible. Having control over open/close allows us to serve 1000s of files in the browser potentially as we can tune when to open and close them (we already do this in our node code). If that is really tricky I'm sure we can live without it as long as the file isn't opened/closed for every write/read op as that would prob kill our perf.

@dominictarr
Copy link

dominictarr commented May 26, 2018

@mafintosh to read the second kilobyte: open(path, {read: true}).read({byteSize: 1024, byteOffset: 1024})

@Gozala I would definitely want control of open,close, link, unlink, move (aka rename).
move is important to get atomic updates - for example, stream a file in, and then move it into place when it's complete. move is a single operation so if the streaming fails, you don't move the file and the system stays consistent.

I'd recommend using fd instead of path as the first argument to read. Oh, I see that it becomes the OO style by opening the file and getting a ReadableFile which has read on it (and no path or fd argument). I think i personally prefer simpler positional args for low level stuff read(fd, start, end, buffer, cb) because that is a lower common denomenator and already close to node's api. If you went with the OO interface, I would probably end up polyfilling it back down to the simpler api anyway.

important: read options should include a buffer to read into so that it's possible to reuse memory - for high performance it's necessary to avoid memory allocations, also it would enable reading directly into a webassembly context without another memory copy. This will make a big difference to high performance applications!

@Gozala
Copy link
Contributor Author

Gozala commented May 31, 2018

@Gozala using the API you propose in the link above how would I do a read/write at a specific byte offset? As I read your spec that seems to be two different async operations?

Exactly what @dominictarr said (quoting below)

@mafintosh to read the second kilobyte: open(path, {read: true}).read({byteSize: 1024, byteOffset: 1024})

For the auto open/close stuff ... I'd prefer this to be as low level as possible. Having control over open/close allows us to serve 1000s of files in the browser potentially as we can tune when to open and close them (we already do this in our node code). If that is really tricky I'm sure we can live without it as long as the file isn't opened/closed for every write/read op as that would prob kill our perf.

👍 I assumed so, but still wanted to make sure. I expect that proposed open, close APIs would cover that.

@Gozala
Copy link
Contributor Author

Gozala commented May 31, 2018

@Gozala I would definitely want control of open,close, link, unlink, move (aka rename).

👍 I assume existing open, close, removeFile, removeDirectory and move capture all of the listed operations except link which is unfortunately not available in gecko.

I don't know on why API to create hard links isn't available it could be simply because it was not necessary from with in gecko or it could be related to platform portability or anything else. I can try to find that out. In the meantime I can provide some pointers:

Filesystem access in gecko is achieved via ctypes that allows loading native system libraries, on Unix it seems to use libc.so / libSystem.B.dylib and on windows it's kernel32.dll I think all of them have link so technically it should be possible to implement it.

Now that being said how important is fs.link ? Asking because there are two roadblocks:

  1. It is not something we would be able to do with-in the WebExtensions experimantal APIs and would require changes to be made with-in mozilla central that is far more difficult and will consume a lot more time (assuming no push-back). Never the less I have filed Bug 1465929 to request it.

  2. There is consensus with-in web-extensions team use of indexd-db using moz proprietary IDBMutableFile extension is a preferred solution allowing virtualized Filesystem API. Which would be able to provide most of the API described, but as far as I can tell it won't have fs.link or fs.symlink not natively anyway. I'll touch more on this in a separate comment.

I'd recommend using fd instead of path as the first argument to read. Oh, I see that it becomes the OO style by opening the file and getting a ReadableFile which has read on it (and no path or fd argument). I think i personally prefer simpler positional args for low level stuff read(fd, start, end, buffer, cb) because that is a lower common denomenator and already close to node's api. If you went with the OO interface, I would probably end up polyfilling it back down to the simpler api anyway.

👍

Truth is under the hood I pass around file descriptors across processes anyway so I see no reason to not expose fs.read(ReadableFile, byteOffset, byteLength, buffer):Promise<ArrayBuffer> as on the host process that's how it is anyhow. I'll just rename fs.read to fs.readFile.

As of the callbacks, as implementation does cross process resource juggling and underlying primitives to do so exposes promise based APIs and does all the bookkeeping I see no benefit to exposing callback based API other than compatibility with node. But even then API won't be compatible and would require boilerplate to do additional bookkeeping which I find hard to justify. It also worth pointing out that I expect that users will end up abstracting these APIs to some common API so I'd prefer not to avoid any extra work that would go into API compatibility layer.

If you still think there is a reason I should reconsider this please pursue me to do so.

important: read options should include a buffer to read into so that it's possible to reuse memory - for high performance it's necessary to avoid memory allocations, also it would enable reading directly into a webassembly context without another memory copy. This will make a big difference to high performance applications!

I'm afraid that's not going to be possible. Short version is due to "sandboxing" all the IO is performed on separate OS process and then data is being copied to the process that consumes this data. I have described this into more detail above. So even if I were to allow passing buffers to write into, I'd still have buffers allocated in this process and be copying from them into buffer you pass so at the end of the day it would be less performant.

That being said I've being told that since all of the IO in gecko is done the same way it's well optimized and likely to get even better over time and it should not be a bottleneck. I have being discussing of maybe running protocol handlers in main process in a separate thread, but it seems that really defeats security guarantees of "sandboxing" and so it's very unlikely to happen.

@Gozala
Copy link
Contributor Author

Gozala commented May 31, 2018

As I have being working towards implementation and discussing aspects of it with WebExtensions team I came to realize that there is a need to reconsider some of the API choices, I'd like to bring those up here provide some insight on why and describe alteration I'm considering instead.

Background

  1. Security aspects of FileSystem access are overwhelmingly difficult. To quote my colleague "designing the method by which users make an informed decision to grant an extension access to the filesystem is tricky".
  2. It is important to reduce amount of privileges user can grant to an add-on to reduce attack vectors.
  3. It is hard to design UX that would allow add-ons to request set of "must have" privileges and "nice to have" privileges that later allow turning some of the nice to haves to must haves, especially so that user can make sense what is going on and by who's behalf.
  4. Cases that can be served with virtualized file system using indexed-db and IDBMutableFile extension must is it as it greatly reduces harm that malicious add-on can do. It could also avoid unnecessary user prompting.
  5. As @pfrazee pointed out there are cases where actual files are better as that allows use of other tools to make changes.

Next

I have also discovered @rpl put togather idb-file-storage library that already provides Virtualized FileSystem access using IDBMutableFile that provides random file read / write requested by @mafintosh and it works completely in user-land. I think it would be best to converge on a common API with idb-file-storage as that would allow one to use virtual filesystem access without any user prompts for default case and have compatible scoped FileSystem API for cases where actual FileSystem access is desired. Later case on FileSystem.mount(...) would prompt user where user will be able to choose desired directory for restricted storage.

Your feedback

There are some consequences if we were to do this:

  1. We would need to emulate directories somehow maybe via index or a custom files ? I'm also unsure of performance characteristics that would have.
  2. There will be no way to set permissions. Do any of you actually care ?
  3. There are no symlinks or hardlinks but those could probably be emulated as well.
  4. There is no watch but given that virtual filestore will be per addon it can be emulated as well.

Essentially I wonder how much do you all care about the list above, do we even care to emulate it for virtual FS case ?

For now I'm concentrating on non virtual filesystem access that would prompt user to choose directory to get access to. In that regard I'm also wondering if you have some thoughts on:

Say you do FileSystem.mount({read:true}) and user chose some directory, but later you want to also an ability to write and watch what's do you think is reasonable ?

  1. Just ask user to choose directory again but pre-select one that has limited access ? (That might confuse user).
  2. Ask user somehow if (s)he's willing to also grant write access ? (What if they don't want to grant write into that directory but would rather choose a different one ?). Difficulty here is in displaying directory path).

@Gozala
Copy link
Contributor Author

Gozala commented May 31, 2018

@dominictarr in regards to

for example, stream a file in, and then move it into place when it's complete. move is a single operation so if the streaming fails, you don't move the file and the system stays consistent.

Please not that currently I don't expose any streaming into / out of file primitives, maybe once https://streams.spec.whatwg.org/ are available in gecko we'll do that, but until now it's lot of work, increased surface for bugs and is doomed to be obsolete in longer term.

@dominictarr
Copy link

dominictarr commented Jun 1, 2018

@Gozala regards streaming: if we have low level write (append) and read, then I will implement streaming my self. Any built in streaming thing is only gonna wrap those primitives anyway. (in my opinion, whatwg streams are very heavy and complicated)
So what I say "stream in" that's what I meant, but I need move.

Personally, I don't have any designs using links, so I can live without those.
Also, allowing the user to select a directory is not important to me, I want files for backend stuff that the user does not directly interact with. The permission could just say "do you give this website file storage" and assign it a directory somewhere far away from the user's home folder.

clarification of the buffer and callbacks, you say:

as implementation does cross process resource juggling and underlying primitives to do so exposes promise

so this means the actual file access happens in a separate process, and then is copied into the process were the javascript runs? This isn't just a firefox api, this is an api we want other engines to also implement. Do you think the architecture will be similar in chrome and others. Even without the buffer option it this will still be much better than nothing.

@dominictarr
Copy link

I didn't know about IDBMutableFile! investigating this now!

@dominictarr
Copy link

I tried to get IDBMutableFile working... but only got to:

InvalidStateError: A mutation operation was attempted on a file storage that did not allow mutations.
and various othe things that don't make sense...

although the idb-file-storage demos do seem to work. I want a file system api that actually looks like a file system api. it should make sense to someone who understands ordinary file systems... implementing a polyfill that works on top of IDBMutableFile would be good but I would be very sad if IDB's warts end up on this file system api.

@dominictarr
Copy link

okay, I got IDBMutableFiles working via idb-file-storage module.

@lidel
Copy link
Contributor

lidel commented Jun 1, 2018

@Gozala from IPFS perspective: key use for this new File System API would be a new backend for js-ipfs-repo, a more powerful alternative to current level-js store on top of IndexedDB that we could use in web-ext contexts. User does not directly interact with backend storage and ideally use of this API in default mode should not trigger any user prompts apart from approving 'storage' permission during extension install. Use of virtual filesystem and the four issues you raised (dirs, links, permissions, watch) do not impact this use case, as far I can tell.

The FileSystem.mount introducing real filesystem feels like a thing on its own, as it is bound to abstractions closer to the end user. It could enable use cases such as syncing arbitrary directories over pubsub/libp2p, but exposing user-picked paths comes with significant UX and security challenges.

It should require a separate permission in extension's manifest, unless there is a mode for mounting anonymous, writable directory within user's profile without any prompt (eg. ~/.mozilla/firefox/<profile>/extension/<ext-id>/fs/<fs-uuid>), which essentially makes it work like a virtual fs but with perks of being a real fs. I suspect a big chunk of use cases will just require a storage without the need for asking user for target location.

Say you do FileSystem.mount({read:true}) and user chose some directory, but later you want to also an ability to write and watch what's do you think is reasonable ? [..]

  1. Just ask user to choose directory again but pre-select one that has limited access

If I had to pick, this one feels less confusing. Or just keep the API super simple and throw an Error and let extension developer to solve UX of communicating the need for unmounting and mounting a directory with extended privileges.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 7, 2018

@dominictarr in regards to

for example, stream a file in, and then move it into place when it's complete. move is a single operation so if the streaming fails, you don't move the file and the system stays consistent.

Please not that currently I don't expose any streaming into / out of file primitives, maybe once https://streams.spec.whatwg.org/ are available in gecko we'll do that, but until now it's lot of work, increased surface for bugs and is doomed to be obsolete in longer term.

@Gozala regards streaming: if we have low level write (append) and read, then I will implement streaming my self. Any built in streaming thing is only gonna wrap those primitives anyway. (in my opinion, whatwg streams are very heavy and complicated)
So what I say "stream in" that's what I meant, but I need move.

👍 I don't really like whatwg streams all that much either, but they will have one very important feature that would be impossible to match in userland, which is if you have readable in a process A and some native transformer and then that is piped into a Writable browsers will be able to do all the IO and processing without ever entering JS event loop or copying data back and forth. That's not to justify whatwg streams API complexity, I wish it was simple (I failed to pursue working group in that)* but that ship is sailed.

Personally, I don't have any designs using links, so I can live without those.

👍

Also, allowing the user to select a directory is not important to me, I want files for backend stuff that the user does not directly interact with. The permission could just say "do you give this website file storage" and assign it a directory somewhere far away from the user's home folder.

I've decided to go with a following:

  • If you just want efficient read/writes to disk, you should choose IDBMutableFile. It is already available and does not requires complex user interactions for permission etc.. I might pursue writing user-space library that wraps it to provide drop-in replacement for FileSystem API.
  • If you want to actually operate on filesystem (to allow users use other tools on the same files) then you should choose FileSystem API I'm currently working. It would allow you to mount directories and will prompt user for permissions. (Readme has a demo of it in action).

I could be pursueded to also expose VirtualFileSystem API which would be just like FileSystem API except it will write files on the disk somewhere in firefox profile and will prompt user asking if it's ok to do so. I think it would only make sense if we find it impossible to implement VirtualFileSystem on top of IDBMutableFile.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 7, 2018

@Gozala from IPFS perspective: key use for this new File System API would be a new backend for js-ipfs-repo, a more powerful alternative to current level-js store on top of IndexedDB that we could use in web-ext contexts. User does not directly interact with backend storage and ideally use of this API in default mode should not trigger any user prompts apart from approving 'storage' permission during extension install. Use of virtual filesystem and the four issues you raised (dirs, links, permissions, watch) do not impact this use case, as far I can tell.

👍 Sounds like what I'm aiming for (see #8 (comment)) would fit this perfectly.

The FileSystem.mount introducing real filesystem feels like a thing on its own, as it is bound to abstractions closer to the end user. It could enable use cases such as syncing arbitrary directories over pubsub/libp2p, but exposing user-picked paths comes with significant UX and security challenges.

I think mostly this useful for whenever you want to allow users to operate on the content with other tools. Currently there is a subset of the API implemented with demo and .gif in readme showing UX interaction.

It should require a separate permission in extension's manifest, unless there is a mode for mounting anonymous, writable directory within user's profile without any prompt (eg. ~/.mozilla/firefox//extension//fs/), which essentially makes it work like a virtual fs but with perks of being a real fs.

I think this use case should just use IDBMutableFile. Am I missing use case where real fs but hidden from user is preferred over IDBMutableFile ignoring the API part of things. Only one I can think of is when you started off with VirtualFileSystem and then on users request need to move it to real directory. That would be a very simple OP if backend used real FS but then I don't think it's enough to justify this.

I suspect a big chunk of use cases will just require a storage without the need for asking user for target location.

Yes that is why I prefer IDBMutableFile base solution for that, it also has benefit of having quota management built-in. With a real FS backend it would be lot more difficult to constraint as other tools could exceed your quota etc...

If I had to pick, this one feels less confusing. Or just keep the API super simple and throw an Error and let extension developer to solve UX of communicating the need for unmounting and mounting a directory with extended privileges.

Current flow act's as follows:

  • If you FileSystem.mount({ read:true }) that will prompt user and will allow picking desired directory. That will return Promise<{url:"file:///Users/gozala/picked/dir/', readable:true, writable:false, watchable:false }> if and when user followed through, or promise will be rejected if user did not. It's your responsibility to store (associated with virtual fs id) URL (will elaborate why below).
  • You could optionally provide url when mounting FileSystem.mount({ url:volumeURL, read:true }) returned promise will reject unless user had previously picked that directory and will succeed without prompting a user otherwise. This will work across browser sessions.
  • You could request additional privileges following example above FileSystem.mount({url:"file:///Users/gozala/picked/dir/', read:true, write:true}) that would prompt user but in this case popup will just ask if it's ok to also write to that directory, in this case user won't be choosing any directories.
  • If you ask for less privileges next time around, say user already granted read+write and in next session you just ask for read this won't cause any prompts, furthermore asking read+write in the next session will still succeed without user prompts. In other words you don't need to remember permissions you have just ask for minimum we will still keep max permissions user granted.
  • You can mount multiple directories. Although user will have to pick each one.

I am not really happy that user prompt is a doorhanger which can be confusing as it may seem as if page is asking you permission instead of an add-on and it's also associated with a tab, which makes very little sense. I would love if it pointed to the add-on button instead or to a hamburger firefox button if add-on does not have it, but that is something I'll look to improve after I can get APIs out.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 7, 2018

clarification of the buffer and callbacks, you say:

as implementation does cross process resource juggling and underlying primitives to do so exposes promise

so this means the actual file access happens in a separate process, and then is copied into the process were the javascript runs?

Yes, except we have JS running in both processes, it's just that process that has file access is privileged and one where data is copied to is restricted / sandboxed and that is where extension and web content code runs (although there separate processes for extensions and web content).

This isn't just a firefox api, this is an api we want other engines to also implement. Do you think the architecture will be similar in chrome and others.

It is what had being referred as process sandboxing you can find more details here https://wiki.mozilla.org/Security/Sandbox and gecko ported that from Chromium so Chrome had it far earlier. No idea about safari but my guess is they do that as well.

Even without the buffer option it this will still be much better than nothing.

I have already landed open / close / read / write / stat and an example excising them 😉 :
https://github.com/mozilla/libdweb/tree/master/demo/fs

@Gozala
Copy link
Contributor Author

Gozala commented Jun 7, 2018

One more piece of communication I want to do here. It turns out that exposing any kind of classes under web-extensions API is really difficult (as far as I can tell not something that can be done without landing pieces into Firefox) which is why I end up going with a sugar-free API (that might please @dominictarr 😉). I'll update all the interface definitions to reflect that but here is a short summary:

  • All the path manipulation stuff will be omitted, in fact all the places that were took file paths will take file:// URLs instead (Might have to switch to filesystem:// URLs in the future). Reasons being:
    • Cuts down huge API surface.
    • There are user space libraries for it.
    • URL resolutions avoids all kind of cross-platform issues.
  • There will be no longer FileSystem instances that are scoped to mounted directory. 😞 Instead when you mount you'll get object like { url:'file:///path/to/dir', readable:true, writable:true, watchable:false} and it will be on you to remember root url and resolve to it.
  • All the described APIs will become static:
    • FileSystem.open(url, {read:true}):Promise<Reader>
    • FileSystem.open(url, {write:true}):Promise<Writer>
    • File.read(Reader, ?{?offset, ?size}):Promise<ArrayBuffer>
    • File.write(Writer, ArrayBuffer, ?{?offset, ?size}):Promise<number> // bytes writter
    • File.stat(Reader|Writer):Promise<Stat>
    • File.close(Reader|Writer):Promise<void>
    • FileSystem.read(url, ?{?offset, ?size})
    • FileSystem.write(url, ArrayBuffer, ?{?offset, ?size}):Promise<number> // bytes writter
    • FileSystem.stat(url):Promise<Stat>
  • I will drop APIs that don't exist in node and no-one seem to care about:
    • writeAtomic
    • openUnique

@mafintosh
Copy link

👍 on the sugar free api.

@dominictarr
Copy link

@Gozala did we loose move?

@Gozala
Copy link
Contributor Author

Gozala commented Jun 9, 2018

@Gozala did we loose move?

Nope, I just did not list everything. Will only drop what's mentioned in the drop list.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 10, 2018

Unfortunately I found out that file watching implementation in Gecko had was only implemented for windows. There is a very old Bug 958280 on file.

@pfrazee
Copy link

pfrazee commented Jun 10, 2018

That's too bad!

@Gozala
Copy link
Contributor Author

Gozala commented Jun 14, 2018

That's too bad!

Yeah that is unfortunate. We could likely get an implementation over time, but in the meantime I would suggest using Native Messaging API in combination with a tiny node app that would notify changes back.

1 similar comment
@Gozala
Copy link
Contributor Author

Gozala commented Jun 14, 2018

That's too bad!

Yeah that is unfortunate. We could likely get an implementation over time, but in the meantime I would suggest using Native Messaging API in combination with a tiny node app that would notify changes back.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 14, 2018

I also have discovered by following bugzilla issues on the watch subject that @Noitidart did js-ctypes based implementation for watching https://github.com/Noitidart/jscFileWatcher which might be another way to go about it.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 14, 2018

And Bug 958280 is where OS.File.watch addition had being proposed / discussed.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 20, 2018

Submitted bug to add support for writing in given offsets https://bugzilla.mozilla.org/show_bug.cgi?id=1469974

@Gozala
Copy link
Contributor Author

Gozala commented Jul 23, 2018

Implementation (that lacks file watching) has landed. I don't think having this open makes sense, all the followup work should get tracked in corresponding issues.

@Gozala Gozala closed this as completed Jul 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants