Meta VFS for adding storage options #447
Can you say how you think using the SQLite VFS API compares to using the Emscripten filesystem API? I ask because I just built a VFS that fetches chunks of the database using HTTP range requests, using just the Emscripten filesystem API, and it works fine with only a few changes to sql.js: 837d25d. The Emscripten FS API is very similar to the SQLite one; it's just one level lower, providing a POSIX API for SQLite instead of the SQLite-specific API. Demo: https://phiresky.github.io/youtube-sponsorship-stats/?uploader=Adam+Ragusea It seems like Emscripten also provides an IndexedDB file storage backend that could maybe be used in a similar fashion? https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-idbfs
@phiresky I haven't written an Emscripten FS, so this is based on my understanding of it. As you note, the SQLite VFS interface is higher level than the Emscripten FS interface, so I think it's a little easier to understand and implement. A more fundamental difference, however, is that there is no way I know of to use an Emscripten FS to bridge synchronous calls to an asynchronous API, whereas a SQLite VFS can use Asyncify. I assume that to satisfy this synchronous requirement you use synchronous XHR. It's great that you can use sync XHR; that's fine for HTTP requests.

One might think that we could use sync XHR as a bridge to any asynchronous API by installing a Service Worker. That would be wonderful, but unfortunately at least Chrome and Safari don't support this.

Emscripten does provide IDBFS, but because of the synchronous restriction, it is actually MEMFS with manual two-way mirroring to IndexedDB. Or, viewed another way, it's IndexedDB with everything cached in memory. So it won't support a filesystem larger than available memory, even if no files are actively being used. In addition, I believe that the current implementation of IDBFS stores the entire file as an object, so sync will slow down as the files grow even if the changes are small (this behavior could be changed with a reimplementation).

I don't believe one can implement a blocking lock with an Emscripten FS (other than with sync XHR to an external server). That means that if you want to support concurrent access to an IDBFS database, e.g. a user has two tabs open on your page, you have to arbitrate above the filesystem level. You can hack together a locking scheme at the library level, but how do you know whether you need a shared read lock or an exclusive write lock? It's a lot easier and more elegant to handle locks exactly when SQLite tells you it needs them and why.

My toy IndexedDB VFS will support a database up to the browser storage limit, which could be larger than available memory.
It doesn't support locking either, but that's a design choice - the framework itself doesn't exclude it. I think adding support for concurrent access from same-origin contexts is quite doable. Many applications can live within the constraints of Emscripten FS - that is not in dispute - but extending via the SQLite VFS opens potentially useful possibilities outside those constraints. If you want to mount over a WebSocket or encrypt with WebCrypto, for example, I don't know how you do that with Emscripten FS. I also think that the SQLite VFS is conceptually simpler, and so is more accessible (e.g. easier to read existing implementations) because it is higher level and only includes what SQLite uses.
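To make the "higher level" comparison concrete: the SQLite VFS interface boils down to a handful of file methods. Below is a rough, hypothetical illustration (not the actual sql.js or wa-sqlite code); method names and return codes follow SQLite's `sqlite3_io_methods` C interface, and the registration machinery around such an object is assumed, not shown.

```javascript
// Hypothetical sketch of the kind of file object a Javascript meta-VFS
// would call into. Constants are transcribed from sqlite3.h.
const SQLITE_OK = 0;
const SQLITE_IOERR_SHORT_READ = 522;

class MemoryFile {
  constructor() { this.data = new Uint8Array(0); }

  xWrite(buffer, offset) {
    const end = offset + buffer.byteLength;
    if (end > this.data.byteLength) {      // grow the backing store on demand
      const grown = new Uint8Array(end);
      grown.set(this.data);
      this.data = grown;
    }
    this.data.set(buffer, offset);
    return SQLITE_OK;
  }

  xRead(buffer, offset) {
    const available = Math.max(0, this.data.byteLength - offset);
    const n = Math.min(buffer.byteLength, available);
    buffer.set(this.data.subarray(offset, offset + n));
    if (n < buffer.byteLength) {
      buffer.fill(0, n);                   // SQLite requires zero-fill on a short read
      return SQLITE_IOERR_SHORT_READ;
    }
    return SQLITE_OK;
  }

  xFileSize() { return this.data.byteLength; }
}
```

Compare this with an Emscripten FS backend, which must instead satisfy POSIX-style stream operations; the VFS version maps one-to-one onto what SQLite actually asks for.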
Thanks for the detailed response. By the way, did you try using Atomics.wait() with a separate worker for communication instead of Asyncify?
I have not tried Atomics at all. I generally try to avoid features not supported across the evergreen browsers, but because you mentioned it I see that the holdout Safari has it in preview behind a flag, so hopefully that will be coming soon. It seems like there might be a nice development path in initially creating a VFS with Asyncify, debugging and testing in a single context, then moving the VFS class to a separate worker and replacing it with a proxy that no longer needs Asyncify (or never moving it if performance isn't an issue).
I just tried it and using Atomics for async stuff (without Asyncify) works! In my case I'm trying to write a virtual table to access the DOM right now, which has to be async since my SQLite is running in a Worker but the DOM is only accessible in the main thread. Here's some (shitty) code:

sqlite.worker.ts:

```ts
// sends a request to the main thread via postMessage,
// then synchronously waits for the result via a SharedArrayBuffer
function doAsyncRequestToMainThread(request: MainThreadRequest) {
  // todo: dynamically adjust this for response size
  const sab = new SharedArrayBuffer(1024 * 1024);
  // first element is for atomic synchronisation, second element is the length of the response
  const metaArray = new Int32Array(sab, 0, 2);
  metaArray[0] = 1;
  // send message to main thread
  self.postMessage({
    action: "eval",
    notify: sab,
    request,
  });
  Atomics.wait(metaArray, 0, 1); // wait until first element is not =1
  const dataLength = metaArray[1];
  console.log("got response of length", dataLength);
  // needs to be copied because TextDecoder and TextEncoder are not supported on SharedArrayBuffers (for now)
  const dataArray = new Uint8Array(sab, 2 * 4, dataLength).slice();
  const res = new TextDecoder().decode(dataArray);
  return JSON.parse(res);
}
```

main.ts:

```ts
worker.addEventListener("message", async (e) => {
  if (e.data && e.data.action === "eval") {
    const metaArray = new Int32Array(e.data.notify, 0, 2);
    const dataArray = new Uint8Array(e.data.notify, 2 * 4);
    const response = JSON.stringify(await handleRequest(e.data.request));
    const text = new TextEncoder().encode(response);
    dataArray.set(text, 0); // need to copy here because:
    // sadly TextEncoder.encodeInto: Argument 2 can't be a SharedArrayBuffer or an ArrayBufferView backed by a SharedArrayBuffer [AllowShared]
    // otherwise this would be better:
    /* const encodeRes = new TextEncoder().encodeInto(response, dataArray);
    if (encodeRes.read !== response.length) {
      console.log(encodeRes, response.length);
      throw Error(`not enough space for response: ${response.length} > ${dataArray.byteLength}`);
    } */
    metaArray[1] = text.length;
    Atomics.notify(metaArray, 0);
  }
});
```

There's some restrictions:

Now I can do this:

So useful 🤣
Hello! @rhashimoto I am willing to merge a pull request with your VFS interface once it's fully developed, tested, and documented. And I do agree that the interface and its various implementations for different storage backends should be maintained separately. Of course, having a synchronous IndexedDB API available in web workers would make everything infinitely easier. MDN states that
I am not sure to whom this interest should be addressed, but I think we should at least make ourselves known to the standards committee in charge of that. And I do agree that using Emscripten's IndexedDB backend doesn't make much sense. Its behavior can easily be replicated by library users without any change in sql.js itself.
Here are some timings for various VFS combinations. For each combination I did some simple operations:
I used three VFS implementations. "unix" is the default in-memory filesystem that SQL.js already uses. "mem" is a synchronous in-memory VFS I wrote, and "async mem" is an asynchronous in-memory VFS I wrote. I tested with two builds: one regular build that only permits a synchronous VFS, and one with Asyncify that allows either a synchronous or an asynchronous VFS. I noticed that a lot of the filesystem traffic is for the journal file, so I also tested with the journal settings changed. Here are average times over 100 iterations:
I see these interesting things in the results:
I created a demo page with several SQLite VFS combinations of filesystems (default, memory, IndexedDB) and builds (standard, Asyncify). It seems to work on Chrome, Firefox, and Safari (haven't tried Edge). Note that it restores your SQL from the previous session, and you can execute a portion of your SQL by highlighting it first. This is all just to get some idea of what can be done, what things might cost, and what is the best way to implement them.

The interesting VFS is obviously IndexedDB - the others are mainly for comparison with it. IDB is used like a block device and is limited by the origin storage limits imposed by the browser, instead of by memory. Unlike the other VFS implementations, when using IDB the database is still there after you reload the page. If multiple tabs are open, completed transactions in one tab are visible across all tabs. I implemented a crude locking scheme that should make this safe (more on this below).

IDB does feel quite a bit slower. Of course, it's always going to be slow compared with RAM; it takes a performance hit from using Asyncify and has IDB on top of that. I didn't implement any caching other than for checking file size, so all reads and writes use an IDB transaction. Looking at the access pattern, caching the first block of a database file might be a win because SQLite keeps checking a change counter there. I'm not sure how much benefit a general LRU-type cache would be. A lot of the slowness (most, I think) comes from the default

Locking is a rabbit hole all by itself. The Web Locks API is nearly perfect for a VFS...except that it isn't supported on Safari. Service Worker is supported across the evergreen browsers, but using it for locking across windows and workers would take some work. In the demo I used a lock file in IDB as an exclusive lock (i.e. multiple readers not allowed). I think it works, and the implementation is small and all in one place (unlike a Service Worker approach), but this is not a high-quality solution (ignoring the lack of shared locks, which could be addressed): it polls to test availability (inviting starvation) and it can leave a lock file in place on an abnormal termination (orphaned lock). So if you leave a transaction open and close the tab, you may have to use browser Dev Tools to clear IndexedDB for the page origin (just the "locks" database would be sufficient) for it to work again.
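The polling lock-file scheme described above can be sketched independently of IndexedDB. This is a hypothetical illustration, not the demo's actual code: `tryAcquire` and `release` stand in for the IndexedDB-backed operations (where an `add()` on a "locks" store fails if the key already exists, making acquisition atomic) and are stubbed with a Map here so the control flow is self-contained.

```javascript
// Stub backing store; a real implementation would use an IndexedDB
// object store whose add() rejects when the lock record already exists.
const locks = new Map();
const tryAcquire = async (name) =>
  locks.has(name) ? false : (locks.set(name, Date.now()), true);
const release = async (name) => { locks.delete(name); };

async function acquireExclusive(name, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (!(await tryAcquire(name))) {        // poll until the lock frees up
    if (Date.now() >= deadline) throw new Error(`lock timeout: ${name}`);
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return () => release(name);                // caller must invoke this to unlock
}
```

The sketch makes the two weaknesses mentioned visible: polling invites starvation, and if the tab dies between acquire and release, the lock record is orphaned until someone clears it.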
Nice work on these benchmarks! I'll have to play with them - I've been wanting to compare journal modes and the like to have a set of recommended best practices for sql.js users. One thing I'm immediately curious about is the overhead percentage for various aspects of your test. For example, I would expect typical usage to involve creating/opening a DB, performing a LOT of transactions, and then closing the DB when the session is done. In your benchmarks, I would expect there to be a lot of overhead in the opening and closing of the DB. That's likely intentional, but it's the first thing I want to tweak and test for my own expected usage.

For our (planned) usage of sql.js, we are okay with a local, in-memory database losing data, but ideally, before we talk to our server, we will need a consistent flushed-to-disk version of the DB so that a crash after that would restore us to a consistent state. This would probably allow us to use atypical database semantics that give us the speed of PRAGMA synchronous=off, but occasionally take a hit when flushing the DB to disk before a server API call.
@Taytay There isn't very much overhead for opening and closing a database, at least in terms of I/O. That seems unintuitive because we're trained to consider opening and closing files as expensive operations, but a database moves a lot of those costs (e.g. flushing) inside transactions, which leaves much less to do outside. Here's a log of what happens on open:
Everything there except
That takes three reads. One of them could have been read from cache, but my VFS doesn't have a cache. Here's a single row insertion:

The numbers on the

In my benchmark, I didn't wrap my insertions (or anything else) in a BEGIN...COMMIT block, so inserting each row is its own transaction. My guess is that's what dominates the timings. I mainly put the reopening in to make sure that SQLite couldn't use its own internal cache and the data really was making a successful round-trip.
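To illustrate why per-row transactions dominate: batching the inserts between BEGIN and COMMIT pays the journal/sync cost once per batch instead of once per row. Here is a sketch of the pattern; the `db` object is a stand-in that merely records statements so the example is self-contained, whereas with sql.js the equivalent calls would be `db.run(...)` on a real `Database` (ideally with one reused prepared statement instead of string interpolation).

```javascript
// Stand-in "database" that records each statement it is asked to run.
const db = { log: [], run(sql) { this.log.push(sql); } };

function bulkInsert(rows) {
  db.run("BEGIN");
  try {
    for (const [key, value] of rows) {
      // a real implementation would bind parameters on a prepared statement
      db.run(`INSERT INTO kv (key, value) VALUES ('${key}', '${value}')`);
    }
    db.run("COMMIT");                 // one journal flush for the whole batch
  } catch (e) {
    db.run("ROLLBACK");               // keep the file consistent on failure
    throw e;
  }
}
```

Without the BEGIN/COMMIT wrapper, every INSERT above would be its own implicit transaction, which is exactly the journal traffic the log shows.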
This is fascinating @rhashimoto. I'm now curious about even more! This page is a good explanation of the assumptions that SQLite makes regarding the filesystem when using the journal. I think this could help us implement the VFS in a more performant way. (It also indicates why the journal might be slower than the WAL): https://www.sqlite.org/atomiccommit.html I think there's gonna be a good combo here that gives us the safety we want, with the file-system semantics that are IndexedDB friendly. 🤞 anyway...
@Taytay Regarding WAL mode, that's a question for someone who knows more about database implementation and its trade-offs than I do. I probably only know enough to be horribly wrong. Regarding the journal mode for those logs, it was DELETE, i.e. the default. I've been wondering myself why there isn't an

I don't want to get too far down into the details of implementing any specific VFS in this thread. There assuredly will be many opportunities to optimize performance and tune for specific usages, like the ones you've mentioned. I'm highly gratified that this has sparked your and others' interest, and that was one reason I started the discussion: to find out if anyone else wanted this. I'm hoping to pivot to the other motivating reason, which is to figure out how to provide the platform for anyone to write their own VFS using only Javascript and without involving Emscripten.

I don't think I'm prepared to integrate Meta VFS into sql.js all by myself. The biggest issue is that using an asynchronous VFS requires an asynchronous API that sql.js currently does not have. We could start with only synchronous VFS support, which still can be useful with the existing API (e.g. the Atomics experiment by @phiresky), but the best browser support (at least for now) and easiest development are unlocked with asynchronous. I'm not really interested in defining or implementing this new API; for the demo, I wrote my own simple API directly on the exported SQLite C API. So I'm hoping that perhaps some others can help out. @lovasoa Would you accept a PR that only had the minimal Meta VFS code, with low-level tests written using the exported C API?

There are some other things that make it frustrating to work in the sql.js tree, so much so that I moved my development to a repo I made from scratch. The restriction to ES5 Javascript is pretty painful - harder to read and maintain - and I don't know how anyone will write an asynchronous API (Promise is ES6, async/await even later). I played with adjusting the Emscripten flags to transpile to ES5 in release builds, which worked, but then updating the ESLint rules broke on the worker code, which Emscripten doesn't process, and that's when I gave up working directly in the tree. If someone good at tooling could take this on, that would make things better all around. #438 and #284 would also make life a little nicer for those of us who like buildless development.
One more thing I found while experimenting is

I just published the code and an article about my http-vfs and my DOM virtual table, by the way: https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/ It shows an exact page read log for each query (based on the dbstat virtual table), which could be useful.

I think those many "weird" writes to the journal file are explained by this from the file format spec:

It's probably writing the old page content, the page ID, and the checksum in three separate writes. So in your IndexedDB thing you could align your rows with those three items. I mean, that kinda sounds like WAL mode but with extra steps. The WAL file is literally just a set of pages that are overlaid over the main db file during reads. Can't you always have an IDB transaction open that is committed, and a new one created, when xSync is called?
I think that IDB's auto-committing transactions make this complicated. It might work as-is if you require the application to be synchronous during an SQLite transaction, i.e. it won't do something like
@phiresky Congrats on your very nice blog post. @rhashimoto Off-topic, but I found wa-sqlite and its meta async VFS implementation extremely easy to use. Very well done. It was easy to build an HTTP Range-request based VFS (inspired by @phiresky 😉) without any local persistence or sequential pre-fetch. Maybe litestream’s approach to WAL streaming is informative? https://litestream.io/how-it-works/
I don’t think it’s possible to have a long-running read transaction open due to the single-threaded nature of SQLite in WASM, but maybe something similar can be achieved at the VFS or library level.
Hello! I have built a project that solves this problem as well. For the last couple of years I've built a completely local app (https://actualbudget.com), and it works well with Electron because I have native bindings to SQLite. It's been bugging me to get it working well on the web though; up until now I've used an awkward system of writing down changes manually and occasionally flushing the entire db. I really just want SQLite to control writes. I was very inspired by @phiresky's post about using SQLite over HTTP. I realized the problem of IDB being async though. In early June I also discovered the technique of using

From there I went off to build a solution, and it works very well! Excited to show you all my project: https://github.com/jlongster/absurd-sql. absurd-sql is a filesystem backend for sql.js that allows persistent storage, primarily IndexedDB for now (but it could use something else later). I have ported the beta version of my app to use this. This is using my persistent storage layer: https://app-next.actualbudget.com/ - go ahead and "try demo" and inspect IDB storage to see it. You can also see my benchmark demo: https://priceless-keller-d097e5.netlify.app

This whole thing is a bit absurd, honestly, because browsers except Chrome use SQLite to implement IDB, so we're storing SQLite in SQLite. It's also absurd how much faster this is than using raw IDB. I actually emailed @lovasoa in June when I saw this was possible, and he pointed me to this thread. I meant to post earlier, but I kept seeing new possibilities in my research and I didn't want to report back until I had confirmed some theories. I have learned a lot in my research, and I have a lot of opinions about everything in this thread. I'm writing a long blog post explaining everything in detail, and will publicly launch the project tomorrow. I will link the post tomorrow when it's live.

Note: my work does require a few changes in sql.js, but not much.
You can see my commits here: https://github.com/jlongster/sql.js/commits/master. I will open a PR soon. Until then, I will summarize my research here:

The goal of absurd-sql

My goal is to simply have a persistent version of sql.js. That's it. It should ideally be a turnkey solution for

It's not a goal to allow people to write their own VFSs. This is a big difference from @rhashimoto's wa-sqlite library. These are quite different goals and there is room for both projects. For me, I just want something to plug into

The filesystem is the API

Because I don't care about VFSs, whether I achieve my goal via a VFS or an FS layer is just an implementation detail. After some research, I decided that using the filesystem as the API was a lot better than using a VFS. Why? A few reasons:
That code zero-fills the buffer and returns the right value for a short read. I get all of that for free. Emscripten already has a bunch of work to simulate a unix-like environment, so it makes sense that we can just lean into that.
Now everything under

However, I will admit that I did need to customize the
Avoid Asyncify at all costs - SharedArrayBuffer is the answer

To solve the async problem, the two solutions are Asyncify or SharedArrayBuffer. Performance is already a sensitive issue; there's no way I'm going to use Asyncify and add a lot more overhead. It also scares me in general to include an invasive transform in my stack; I'd much rather use SQLite compiled directly. SQLite being completely synchronous is a feature. It's a simple and fast model. However, we do need to perform async work inside of the FS methods. It's incredibly fast and works well.
Safari is the only browser that doesn't enable SAB yet, but it will in the future. Until then, we can support it with a fallback implementation. In that mode, it reads the entire DB into memory on open so that reads work, and for writes it queues them up and writes them async in the background. The only problem with this is multiple connections; if another connection has performed a write while it's trying to write, it must bail and it can never write again. In practice, this means if you have the app opened in two tabs, only one of them can perform writes. This is acceptable to me for now until SAB is enabled.

Reaching amazing perf

IndexedDB is a slow database. The only operation that is somewhat fast is advancing a cursor; everything else (opening a transaction, opening a cursor, etc.) is very slow (especially in Chrome). We cannot afford to do any of these things for every single read or write. I found some critical optimizations, and I'll show the difference between my project and wa-sqlite. My goal was to have a table like

First let's do some writes. absurd-sql (with sql.js) code:
wa-sqlite code:
Please let me know if I did something wrong there. Both of these ran with a block size of 8192,
I'm honestly a little skeptical here -- I didn't expect writes to be so different, because I thought we both queued them up and did a bulk write at the

Now for reads. absurd-sql code:
wa-sqlite code:
Again, let me know if the wa-sqlite code is wrong. I'm not posting this to try to put down wa-sqlite; I'm writing down my research so that everyone can benefit and maybe it can do the same optimizations. I also benchmarked it against raw IndexedDB, meaning as if I had implemented the same code against IndexedDB directly. With the speed it has, it puts raw IndexedDB to shame. Here are some graphs that illustrate it. (All of the benchmarks are here, particularly the

These are the same benchmarks as above, first doing the writes:

And then doing the

So what are the optimizations?

I realized that the blocking

That means if we do

And now that we have a single read transaction, if we're doing lots of sequential read requests we can open a cursor and advance it! That is another huge perf boost. See the

So lots of sequential reads will just open one cursor in a transaction and iterate through. Similar optimizations occur with writes; when SQLite locks the file for writing, we open a

Proper locking semantics

Another key thing from my research: locking is super important. Since each tab in the browser will be a separate connection, we have to make sure that we aren't going to overwrite data. We already have IndexedDB transaction semantics to work with. Here's the method that implements locking, and the comment above it is helpful: https://github.com/jlongster/absurd-sql/blob/master/src/indexeddb/worker.js#L423

When a SHARED lock is requested, we open a

The only problem is that another thread might have written data down between the time that we requested a

This is only possible because we have long-lived IDB transactions, because we can open a

So

The key thing is that we don't maintain any lock state. It's all represented as IDB transactions. The fantastic thing about that is that a db can never be left permanently locked. I've noticed that browsers will forcefully unlock IDB when you refresh the page, i.e. kill all transactions.
We gain the assurance that all SQLite locks will also be cleared.

Always use an in-memory journal and no WAL

A WAL doesn't make sense. In our case, either way data is going to end up in IDB. And while reading through the SQLite code, a WAL seems to require shared memory via

Anything other than an in-memory journal also doesn't make sense. We don't need hot journals. IDB already gives us an atomicity guarantee -- either the writes worked or they didn't. We still need the in-memory journal so we can do

Conclusion

Anyway, sorry for this huge dump. It's been a lot of research! I'm happy to discuss this more; I probably missed some stuff. I'm going to release my project tomorrow. I'm really happy that this actually worked out. Again, you can see it working here: https://app-next.actualbudget.com/.
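The sequential-read/cursor idea above can be reduced to a small decision rule: watch the block offsets SQLite requests and switch to a cursor once the pattern is sequential. The following is a schematic of that heuristic only, with hypothetical names (not absurd-sql's actual code); the real implementation would drive an IndexedDB cursor inside the long-lived `readonly` transaction rather than just report a boolean.

```javascript
// Models only the decision "should we serve this read from a cursor scan
// instead of a per-block random get?". The threshold is an assumption.
class ReadPatternTracker {
  constructor(blockSize) {
    this.blockSize = blockSize;
    this.lastOffset = null;
    this.sequentialRun = 0;
  }

  // Called once per xRead; returns true when the access pattern looks
  // sequential enough that opening a cursor would pay off.
  onRead(offset) {
    this.sequentialRun =
      this.lastOffset !== null && offset === this.lastOffset + this.blockSize
        ? this.sequentialRun + 1
        : 0;
    this.lastOffset = offset;
    return this.sequentialRun >= 2; // heuristic: third consecutive block
  }
}
```

A full-table scan (block 0, 1, 2, ...) trips the heuristic quickly, while index-driven random access never does, which matches the behavior described: bulk reads ride one cursor, point lookups stay as individual gets.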
Thank you very much for this research. I'm very excited about your upcoming PR. It will give a common interface on top of which we can have both an IndexedDB and an HTTP backend as independent libraries, which is great.
Did you contact @kripken or open an issue in emscripten about that?
@rhashimoto I've been thinking about it, and I think you can benefit from long-lived transactions too. Even though your functions are async, when you return data to SQLite, if it is going to continuously read more data it's going to call back into your VFS synchronously. So I think I can store an opened transaction locally, and on the next read check if it's still available. I think you'd see similar perf if you did that (not 100% sure about write transactions yet, haven't thought it through). You'll have to somehow check if the transaction gets closed though.
Not yet, and I totally agree. It's very minimal: https://github.com/jlongster/sql.js/blob/master/src/vfs.c But yes, ideally we wouldn't have to manage the locks like that. It's also annoying because we have to manually register the FS lock/unlock methods so they can be called, and there's potential for memory leaks because of how we have to manage everything. Will follow up on this.
That's really nice work, @jlongster! I love how the locking can be handled so elegantly with IndexedDB transactions; I think that's my favorite part. Your wa-sqlite benchmark code looks reasonable. The IndexedDB VFS in wa-sqlite has not been tuned for performance because I wanted to keep the sample code simple. In particular, I think it gets crushed in the write benchmark because its cache implementation is too crude. That being said, however, your approach doesn't depend on a cache, so it sidesteps all those tuning problems completely.
This statement seems a little strong. There could be memory-intensive apps for which a large transaction might not fit in memory.
Yes, your post got me thinking about that, too!
Thanks @rhashimoto! It's too bad I didn't find your project until late into my research because I would have reached out earlier. Thank you for all your work! Yeah, I try to do as little as possible, so depending on SQLite's cache, let IDB do the locking etc. It leads to some nice guarantees. I am also happy about how locking worked out; I had no idea how to solve the problem of an orphaned lock before that, and from what I've seen with IDB it was way too easy for that to happen if I tried to manage that state myself.
Totally fair. I don't fully understand what happens with large transactions, tbh, or how cache spilling works. I won't say "doesn't make sense" again, but rather "this is the recommendation". I need to think more about what happens with large transactions. (In my lib, opening a journal will be very slow because it spins up a new worker and connects to a new database. Each file has its own db & worker, so it will greatly slow down transactions.)
Just realized I never linked to my post; it's here: https://jlongster.com/future-sql-web I'll add a link to wa-sqlite from my project as an alternative. Sorry I haven't put up a PR to sql.js yet, today has been crazy busy. Will follow up tomorrow :)
@jlongster: I have been following your work for some time now. I'm very excited about what you've done with absurd-sql! It fills an important missing piece for sql.js and its adoption, and manages to do so in a surprisingly performant manner. Thanks for taking the time to dive into your assumptions here, and nice work!
Thank you! Thanks for others here doing this research too! I am SO sorry I dropped the ball here. I had put off a lot of stuff to get this launched, so I was really busy catching up on things for the last couple weeks. Going to work on a PR for emscripten and then will open a PR here too, and link the emscripten PR. |
Opened up a PR! Sorry for the wait. Will keep up with this from now on. #481
FYI Safari apparently re-enabled support for SharedArrayBuffer in release 15.2 last month. This means that the Atomics trick used by @phiresky and @jlongster to call asynchronous APIs (e.g. IndexedDB) synchronously should be usable across the four major evergreen browsers. This feature requires setting COEP/COOP headers, which does put some restrictions on what a web app can do. For example, this can interfere with third-party OAuth and payment integrations.
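For reference, the cross-origin isolation that SharedArrayBuffer requires means serving every document with two response headers. Sketched here as an nginx fragment, as an assumption about the deployment; equivalent settings exist for any server or hosting platform:

```nginx
# Cross-origin isolation headers required for SharedArrayBuffer + Atomics.
# These make the page crossOriginIsolated, but block cross-origin subresources
# that don't opt in via CORP/CORS (the OAuth/payment caveat mentioned above).
add_header Cross-Origin-Opener-Policy "same-origin" always;
add_header Cross-Origin-Embedder-Policy "require-corp" always;
```

In the page, `self.crossOriginIsolated` reports whether the headers took effect before attempting to construct a SharedArrayBuffer.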
I am looking for a way to use a storage backend such as Torrent or Sia/SkyNet, so I want both @phiresky's XHR HTTP backend for fetching and @jlongster's persistence. Is there a way your works could be merged or combined? Would it be easier to include the HTTP backend in absurd-sql, or easier the other way around? I also found this: https://github.com/riyaz-ali/sqlite3.js by @riyaz-ali.
You could also look over this unmerged SQLite over WebTorrent PR for a different SQLite library.
And there is now also this: https://boredcaveman.xyz/post/0x2_static-torrent-website-p2p-queries.html
Chrome-based browsers have implemented the Access Handle for the Origin Private File System (and supposedly it's in development for Safari). I'm wondering if that would be a more performant alternative to using IndexedDB... it seems like that's the use case the authors had in mind:

Would it be possible to combine this meta-VFS with an origin private file system implementation?
Safari was actually first to ship Access Handle in its production channel. Chrome and Edge made it to production last month. It's Firefox where it is still in development. I played with it as a SQLite back-end a couple of months ago in Chrome Canary, and my benchmarks had it slower than IndexedDB at that time (though a lot nicer to program with).
The SQLITE_IOCAP_SEQUENTIAL property means that information is written to disk in the same order as calls to xWrite(). So if you return SQLITE_IOCAP_SEQUENTIAL from xDeviceCharacteristics, SQLite will omit the xSync for the journal.
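In a Javascript VFS, that suggestion amounts to OR-ing SQLITE_IOCAP_SEQUENTIAL into the value returned from xDeviceCharacteristics. A minimal sketch follows: the constant values are transcribed from sqlite3.h, while the class name and surrounding VFS registration are hypothetical.

```javascript
// I/O capability flags, transcribed from sqlite3.h.
const SQLITE_IOCAP_SAFE_APPEND = 0x00000200;
const SQLITE_IOCAP_SEQUENTIAL  = 0x00000400;

class SequentialBackedFile {
  // SQLite calls this on an open file; OR together every property the
  // storage backend genuinely guarantees -- advertising one it doesn't
  // honor risks corruption after a crash.
  xDeviceCharacteristics() {
    return SQLITE_IOCAP_SEQUENTIAL; // writes land in xWrite() call order
  }
}
```

An IndexedDB-backed file that funnels all writes through one ordered transaction queue plausibly satisfies this property, which is what makes the journal-xSync omission attractive there.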
Hey guys! I was able to achieve the same performance for

So, I adopted James' benchmark from the comment #447 (comment), and got almost the same amazing performance for both:

```ts
runInTransaction(db, async () => {
  const time = new Date().getTime();
  setLogs((l) => [...l, "Start inserting..."]);
  for (let i = 0; i < 100; i++) {
    await runQuery(
      db,
      sql`INSERT INTO kv (key, value) VALUES ${sql.join(
        [...Array(10_000)].map(
          () => sql`(${makeId()}, ${((Math.random() * 100) | 0).toString()})`
        )
      )}`
    );
  }
  runAfterTransactionCommitted(db, async () => {
    setLogs((l) => [
      ...l,
      `Done inserting in ${(new Date().getTime() - time) / 1000}s`,
    ]);
    const summingTime = new Date().getTime();
    setLogs((l) => [...l, `Summing...`]);
    await runQuery(db, sql`SELECT SUM(value) FROM kv`);
    setLogs((l) => [
      ...l,
      `Done summing in ${(new Date().getTime() - summingTime) / 1000}s`,
    ]);
  });
});
```

Both of these ran with a block size of

I usually get such values for
And such values for
Here is the result with cache=0 for wa-sqlite new VFS, just in case:
You can see the difference between

So, what does it mean? We don't need to use

I hope I didn't make any mistakes in benchmarking or in the implementations, so I will be glad if you point me to something I missed. Also, I need to note — I used a patched version of

Also, you can play around with a more complex demo — https://main--trong-example.netlify.app/

example3.mp4
Also, the code for the VFS is pretty messy — I am going to refactor it soon 🙂 Also, I appreciate all the work that @rhashimoto did — his discussions in the repo were very helpful! But the downside is that

I hope somebody (or maybe me 😅) will make an MIT version of sqlite + Asyncify (with all respect to rhashimoto! But such a license really restricts code usage a lot). Overall, not so much work needed to be done at first look 🤔
@quolpr I'm impressed with the effort that must have gone into this, especially writing a new IndexedDB VFS. I have these thoughts:
You are right! I corrected the posts. Thanks!
Yeah, agree. But for now, I have also made some heavier tests — and the performance of the wa-sqlite VFS is still good. Also, I was not able to use absurd-sql with a 64kb block size, while a 64kb block size with wa-sqlite gives some performance boost too.
You are right, but regarding size — the brotli size difference (I took the -Oz version of sql.js) is around 60kb. But of course it will take more time to unbrotli and load the

I used these commands to test:
Yeah, I hope it will happen. Cause right now restrictions are too strict.
Yeah! You did an excellent job. It's so easy to write any backend you want just in JS 🙂
60 KB in bandwidth and 15% in insert time seem like pretty reasonable costs for Asyncify, assuming that holds up. I wouldn't go so far as to claim that is the "same performance", though, unless you collect other benchmarks where the edge goes the opposite way or indicate that this one is an uncommon outlier.
UPDATE: Code links in this comment are stale. Check the thread for the latest info.
I have an experimental sql.js branch with changes that enable defining a SQLite VFS in Javascript. It's a platform for creating a VFS, so I've been calling it a meta-VFS. The idea is to enable more storage options in the browser, beyond the current in-memory filesystem, as has been suggested a number of times, e.g. #302.
The branch doesn't define any VFS itself, but in a separate repo I have examples of creating an in-memory VFS and an IndexedDB VFS (using it as a block device) with only Javascript. Neither is a huge amount of code - less than 350 lines for IndexedDB and no function over a page - so I think they're not difficult to understand.
None of this is production-ready code (mainly because of lack of testing), and the branch is not currently suitable for a pull request. But I wanted to start a discussion on whether adding something like this makes sense, and if it does how to incorporate it.
I know that the maintainers have limited time and adding multiple storage options to the code base isn't going to help with that. So my idea is to provide only the capability for storage extensions, i.e. just the meta-VFS, to allow developers to create the extensions themselves outside the scope of the project.
There are some drawbacks to this idea, among them:
Any thoughts, especially from maintainers?
(As an aside the above mentioned branch uses Emscripten modularization to build ES6 modules, so perhaps of interest for #284 and #438.)