File System API #8
Comments
From @pfrazee
From @mafintosh
From @rpl
From @pfrazee
From @rpl
|
Responding to @rpl
As things stand now, we were planning on using the OS.File API, which does all the IO off the main thread. As you also pointed out, there is no way to do IO anywhere other than the parent / main process, so sadly that would imply some overhead of pumping data from Only other option I see (and seem to be getting increasingly more in favor of) is to expose a ServiceWorker like API say Major downside of this approach is that I would need to replicate a large portion of what WebExtensions do (like permissions, lazy loading, and probably a lot more). Still, I would love to hear your opinion in this regard.
This is actually more or less what Beaker is doing right now, except it's not drag and drop based; rather they provide a prompt where the user can choose a directory, which I imagine should not be too difficult to add. Write & Watch is essential though. @pfrazee I wonder what requirements a FileSystem API would have to satisfy to not limit Beaker in any way while at the same time being limited in scope like @rpl If we were to design our
That reminds me that this used to be an API not present in any way in Gecko back in Add-on SDK days at least. |
The RE the file access: yeah I think we could work within |
Up until WebExtensions, Firefox used to run all of the add-ons in the main thread (which is different from the content threads), which was also a reason why Firefox was becoming a lot less responsive for users with many add-ons installed. With WebExtensions, as far as I understand, the actual add-on code is loaded in a separate extension process, and content scripts are loaded in the content processes (usually referred to as child processes). Still, all the IO happens in the main process (usually referred to as the parent process), as far as I understand intentionally so, to enable sandboxing. APIs exposed to the extensions usually have corresponding code (implemented by the WebExtensions team) that runs in the main (thread of the main) process, with which they exchange async messages to do IO related tasks.
I should elaborate how protocol handlers work I think to explain why I think Protocol handlers in Gecko need to be installed both in the main process and in each content process. As far as I can tell, the one in the main process is only used to create URLs, while the ones in the content processes are used to load content. So when tab say is attempting to load
If protocol handler does It's not necessarily going to be slow, but it also does not seem to me that there is much value in running protocol handler logic in the extension process. I believe the main reason why add-on code runs in the extension process in the first place is to provide it with DOM access, which in the case of a protocol handler does not seem to make much sense. Now what I was suggesting to do was to spawn a dedicated worker thread on the main process and load protocol handler code there. It still should not block the main process, but unlike the above case it would be possible to transfer So in this case if protocol handler does |
@mafintosh @lidel @olizilla Do you care about having control over opening / closing files? Or would sparse read / write with automatic close satisfy your requirements? |
For self reference there seems to be a |
I wrote down the interface definition for this API and could use some feedback:
P.S: The API mostly follows the existing OS.File. I do understand that a node fs compatible API would likely have been more desirable, but that would have introduced an extra layer of adapters and a node Stream implementation, which I'd rather see happen in user land. |
Seems right. I'll pass it on to maf. Is the idea of |
👍
Yes (I think it just prefixes a number to the basename). It just already existed in Gecko (see |
I've changed my mind on the proposed API a bit, let me elaborate and please let me know if you see concerns with it. I am considering removing
Maybe there's a good reason against it (if you think of one please let me know), I can't think of one myself. |
@Gozala using the API you propose in the link above how would I do a read/write at a specific byte offset? As I read your spec that seems to be two different async operations? Other than that it looks pretty solid. In general we rely on this storage API https://github.com/random-access-storage/random-access-storage#var-storage--randomaccessstorageoptions btw For the auto open/close stuff ... I'd prefer this to be as low level as possible. Having control over open/close allows us to serve 1000s of files in the browser potentially as we can tune when to open and close them (we already do this in our node code). If that is really tricky I'm sure we can live without it as long as the file isn't opened/closed for every write/read op as that would prob kill our perf. |
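The point about tuning when files are opened and closed can be illustrated with a small descriptor pool. This is only a sketch under stated assumptions: it uses Node's synchronous `fs` calls as a stand-in for whatever open / close primitives the WebExtensions API ends up exposing, and `FdPool`, `acquire`, `isOpen` and `closeAll` are hypothetical names, not part of any proposal in this thread.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: keeps at most `limit` files open and closes the
// least recently used descriptor when that limit would be exceeded.
class FdPool {
  private limit: number;
  // Map preserves insertion order, so the first key is the least recently used.
  private open = new Map<string, number>();

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns an open descriptor for `file`, reusing a cached one when possible.
  acquire(file: string): number {
    const cached = this.open.get(file);
    if (cached !== undefined) {
      // Re-insert to mark this file as most recently used.
      this.open.delete(file);
      this.open.set(file, cached);
      return cached;
    }
    if (this.open.size >= this.limit) {
      const oldest = this.open.keys().next().value as string;
      fs.closeSync(this.open.get(oldest) as number);
      this.open.delete(oldest);
    }
    const fd = fs.openSync(file, "r");
    this.open.set(file, fd);
    return fd;
  }

  isOpen(file: string): boolean {
    return this.open.has(file);
  }

  closeAll(): void {
    this.open.forEach((fd) => fs.closeSync(fd));
    this.open.clear();
  }
}

// Usage: three files, but only two descriptors allowed at a time.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "fdpool-"));
const files = ["a", "b", "c"].map((name) => {
  const file = path.join(dir, name);
  fs.writeFileSync(file, name);
  return file;
});
const pool = new FdPool(2);
pool.acquire(files[0]);
pool.acquire(files[1]);
pool.acquire(files[0]); // refreshes "a", so "b" becomes the oldest
pool.acquire(files[2]); // evicts "b"
const openAfter = files.map((file) => pool.isOpen(file));
pool.closeAll();
```

A pool like this only works if the underlying API leaves open / close under the caller's control, which is the requirement being argued for here.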
@mafintosh to read the second kilobyte: @Gozala I would definitely want control of open,close, link, unlink, move (aka rename). I'd recommend using fd instead of path as the first argument to important: read options should include a buffer to read into so that it's possible to reuse memory - for high performance it's necessary to avoid memory allocations, also it would enable reading directly into a webassembly context without another memory copy. This will make a big difference to high performance applications! |
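For reference, this is roughly what "read the second kilobyte into a reused buffer" looks like with Node's `fs.readSync`, which takes a file descriptor, a target buffer and an absolute position. It is a sketch of the calling pattern being requested, not of the proposed WebExtensions API; `readAt` is a hypothetical wrapper and the temporary file exists only to make the example self-contained.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical wrapper: read `length` bytes starting at absolute file offset
// `position` into the start of a caller-supplied buffer. Returns bytes read.
function readAt(fd: number, target: Uint8Array, position: number, length: number): number {
  return fs.readSync(fd, target, 0, length, position);
}

// Build a 3 KiB file: first KiB filled with 1s, second with 2s, third with 3s.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "offset-")), "data.bin");
const data = Buffer.alloc(3072);
data.fill(1, 0, 1024);
data.fill(2, 1024, 2048);
data.fill(3, 2048, 3072);
fs.writeFileSync(file, data);

const fd = fs.openSync(file, "r");
const scratch = Buffer.alloc(1024); // allocated once, reused for every read

const secondKbBytes = readAt(fd, scratch, 1024, 1024);
const firstByteOfSecondKb = scratch[0];

const thirdKbBytes = readAt(fd, scratch, 2048, 1024); // overwrites scratch in place
const firstByteOfThirdKb = scratch[0];

fs.closeSync(fd);
```

The offset is part of the read call itself, so there is a single operation per read and no separate "set position" step.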
Exactly what @dominictarr said (quoting below)
👍 I assumed so, but still wanted to make sure. I expect that proposed |
👍 I assume existing I don't know why an API to create hard links isn't available; it could be simply because it was not necessary from within Gecko, or it could be related to platform portability or anything else. I can try to find that out. In the meantime I can provide some pointers: filesystem access in Gecko is achieved via ctypes, which allows loading native system libraries; on Unix it seems to use Now, that being said, how important is
👍 Truth is, under the hood I pass around file descriptors across processes anyway, so I see no reason to not expose As for the callbacks: since the implementation does cross-process resource juggling, and the underlying primitives to do so expose promise based APIs and do all the bookkeeping, I see no benefit to exposing a callback based API other than compatibility with node. But even then the API won't be compatible and would require boilerplate to do additional bookkeeping, which I find hard to justify. It's also worth pointing out that I expect users will end up abstracting these APIs to some common API, so I'd prefer to avoid any extra work that would go into an API compatibility layer. If you still think there is a reason I should reconsider this, please persuade me to do so.
I'm afraid that's not going to be possible. The short version is that, due to "sandboxing", all the IO is performed in a separate OS process and then the data is copied to the process that consumes it. I have described this in more detail above. So even if I were to allow passing buffers to write into, I'd still have buffers allocated in this process and be copying from them into the buffer you pass, so at the end of the day it would be less performant. That being said, I've been told that since all of the IO in Gecko is done the same way, it's well optimized and likely to get even better over time, and it should not be a bottleneck. I have been discussing maybe running protocol handlers in the main process in a separate thread, but it seems that really defeats the security guarantees of "sandboxing", so it's very unlikely to happen. |
As I have been working towards an implementation and discussing aspects of it with the WebExtensions team, I came to realize that there is a need to reconsider some of the API choices. I'd like to bring those up here, provide some insight on why, and describe the alterations I'm considering instead.
Background
Next
I have also discovered that @rpl put together the idb-file-storage library that already provides virtualized FileSystem access using
Your feedback
There are some consequences if we were to do this:
Essentially I wonder how much you all care about the list above; do we even care to emulate it for the virtual FS case? For now I'm concentrating on non-virtual filesystem access that would prompt the user to choose a directory to get access to. In that regard I'm also wondering if you have some thoughts on: Say you do
|
@dominictarr in regards to
Please note that currently I don't expose any primitives for streaming into / out of a file. Maybe once https://streams.spec.whatwg.org/ are available in Gecko we'll do that, but until then it's a lot of work, an increased surface for bugs, and doomed to be obsolete in the longer term. |
@Gozala regarding streaming: if we have low level write (append) and read, then I will implement streaming myself. Any built-in streaming thing is only gonna wrap those primitives anyway. (In my opinion, whatwg streams are very heavy and complicated.) Personally, I don't have any designs using links, so I can live without those. Clarification of the buffer and callbacks, you say:
So this means the actual file access happens in a separate process, and then is copied into the process where the JavaScript runs? This isn't just a Firefox API; this is an API we want other engines to also implement. Do you think the architecture will be similar in Chrome and others? Even without the buffer option this will still be much better than nothing. |
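The claim that streaming can live in user land on top of a low-level positional read can be sketched as a generator that pulls fixed-size chunks. This is an illustration only: it uses Node's `fs.readSync` as a stand-in for the low-level read primitive discussed in this thread, and `readChunks` is a hypothetical name.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical user-land "stream": yields the file's content chunk by chunk,
// using nothing but a positional read on an already open descriptor.
function* readChunks(fd: number, chunkSize: number): Generator<Buffer> {
  let position = 0;
  while (true) {
    const chunk = Buffer.alloc(chunkSize);
    const bytesRead = fs.readSync(fd, chunk, 0, chunkSize, position);
    if (bytesRead === 0) return; // end of file
    position += bytesRead;
    yield chunk.subarray(0, bytesRead); // trim the final, shorter chunk
  }
}

// Usage: a 2500 byte file read in 1024 byte chunks.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "chunks-")), "data.bin");
fs.writeFileSync(file, Buffer.alloc(2500, 7));

const fd = fs.openSync(file, "r");
const sizes: number[] = [];
for (const chunk of readChunks(fd, 1024)) {
  sizes.push(chunk.length);
}
fs.closeSync(fd);
```

With a promise-based read the same shape would be an async generator; the synchronous variant is used here only to keep the sketch short.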
I didn't know about IDBMutableFile! investigating this now! |
I tried to get IDBMutableFile working... but only got to:
although the idb-file-storage demos do seem to work. I want a file system api that actually looks like a file system api. it should make sense to someone who understands ordinary file systems... implementing a polyfill that works on top of IDBMutableFile would be good but I would be very sad if IDB's warts end up on this file system api. |
okay, I got IDBMutableFiles working via idb-file-storage module. |
@Gozala from the IPFS perspective: the key use for this new File System API would be a new backend for js-ipfs-repo, a more powerful alternative to the current level-js store on top of IndexedDB that we could use in web-ext contexts. The user does not directly interact with backend storage, and ideally use of this API in default mode should not trigger any user prompts apart from approving the 'storage' permission during extension install. Use of a virtual filesystem and the four issues you raised (dirs, links, permissions, watch) do not impact this use case, as far as I can tell. The It should require a separate permission in the extension's manifest, unless there is a mode for mounting an anonymous, writable directory within the user's profile without any prompt (eg.
If I had to pick, this one feels less confusing. Or just keep the API super simple and throw an Error and let extension developer to solve UX of communicating the need for unmounting and mounting a directory with extended privileges. |
@dominictarr in regards to
Please note that currently I don't expose any primitives for streaming into / out of a file. Maybe once https://streams.spec.whatwg.org/ are available in Gecko we'll do that, but until then it's a lot of work, an increased surface for bugs, and doomed to be obsolete in the longer term.
👍 I don't really like whatwg streams all that much either, but they will have one very important feature that would be impossible to match in userland: if you have a Readable in process A and some native transformer, and that is piped into a Writable, browsers will be able to do all the IO and processing without ever entering the JS event loop or copying data back and forth. That's not to justify the complexity of the whatwg streams API. I wish it were simple (I failed to persuade the working group on that), but that ship has sailed.
👍
I've decided to go with a following:
I could be persuaded to also expose |
👍 Sounds like what I'm aiming for (see #8 (comment)) would fit this perfectly.
I think this is mostly useful whenever you want to allow users to operate on the content with other tools. Currently there is a subset of the API implemented, with a demo and a .gif in the readme showing the UX interaction.
I think this use case should just use
Yes that is why I prefer
The current flow acts as follows:
I am not really happy that user prompt is a doorhanger which can be confusing as it may seem as if page is asking you permission instead of an add-on and it's also associated with a tab, which makes very little sense. I would love if it pointed to the add-on button instead or to a hamburger firefox button if add-on does not have it, but that is something I'll look to improve after I can get APIs out. |
Yes, except we have JS running in both processes. It's just that the process that has file access is privileged, and the one the data is copied to is restricted / sandboxed; that is where extension and web content code runs (although there are separate processes for extensions and web content).
It is what has been referred to as process sandboxing; you can find more details here https://wiki.mozilla.org/Security/Sandbox and Gecko ported that from Chromium, so Chrome had it far earlier. No idea about Safari, but my guess is they do that as well.
I have already landed open / close / read / write / stat and an example exercising them 😉 : |
One more piece of communication I want to do here. It turns out that exposing any kind of classes under the web-extensions API is really difficult (as far as I can tell, not something that can be done without landing pieces into Firefox), which is why I ended up going with a sugar-free API (that might please @dominictarr 😉). I'll update all the interface definitions to reflect that, but here is a short summary:
|
👍 on the sugar free api. |
@Gozala did we lose |
Nope, I just did not list everything. Will only drop what's mentioned in the drop list. |
Unfortunately I found out that the file watching implementation in Gecko was only implemented for Windows. There is a very old Bug 958280 on file. |
That's too bad! |
Yeah that is unfortunate. We could likely get an implementation over time, but in the meantime I would suggest using Native Messaging API in combination with a tiny node app that would notify changes back. |
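A minimal sketch of the "tiny node app" half of that workaround, under stated assumptions: native messaging frames each message as UTF-8 JSON preceded by a 32-bit length in native byte order, and `fs.watch` is Node's built-in watcher. The function names and the `{ eventType, filename }` message shape are hypothetical; the extension side and the native app manifest are not shown.

```typescript
import * as fs from "fs";
import * as os from "os";

// Frame a message the way native messaging expects it on stdout:
// a 32-bit unsigned length in native byte order, followed by UTF-8 JSON.
function encodeNativeMessage(message: unknown): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = Buffer.alloc(4);
  if (os.endianness() === "LE") {
    header.writeUInt32LE(body.length, 0);
  } else {
    header.writeUInt32BE(body.length, 0);
  }
  return Buffer.concat([header, body]);
}

// Hypothetical entry point: watch `dir` and report every change to the
// extension. The message shape here is an assumption, not a defined protocol.
function watchAndNotify(dir: string): fs.FSWatcher {
  return fs.watch(dir, (eventType, filename) => {
    process.stdout.write(encodeNativeMessage({ eventType, filename }));
  });
}

// Example of a single framed notification.
const framed = encodeNativeMessage({ eventType: "change", filename: "notes.txt" });
```

The app's entry point would call `watchAndNotify` with the directory to watch; the extension would receive each notification through the port it opened to the native app.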
I have also discovered, by following Bugzilla issues on the watch subject, that @Noitidart did a js-ctypes based implementation for watching, https://github.com/Noitidart/jscFileWatcher, which might be another way to go about it. |
And Bug 958280 is where |
Submitted bug to add support for writing in given offsets https://bugzilla.mozilla.org/show_bug.cgi?id=1469974 |
Implementation (that lacks file watching) has landed. I don't think having this open makes sense, all the followup work should get tracked in corresponding issues. |
Low level random access read/write APIs similar to fs.read
Gecko has multiple FileSystem APIs available; the most recommended one (last time I checked) was OS.File, which likely could be (wrapped &) exposed as a WebExtensions API.
For random access read / writes there are still nsIFileInputStream & nsIFileOutputStream APIs and legacy add-on API wrappers to expose them in a node compatible API. For reading / writing at a specific byte offset, OS.File.prototype.setPosition could be used. 🐞 1246236