-
-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding durability for ClojureScript #14
base: master
Are you sure you want to change the base?
Conversation
fix unit tests
Also remove unnecessary changes to project.clj and .gitignore
Yes! I initially skipped CLJS implementation because I thought localStorage is too small to be useful, and IndexedDB is async. What are you guys using for storage? |
Cool. We're using SQLite via OPFS. From the OPFS approaches that SQLite offers we chose the OPFS pool approach because it offers a sync handle. The async approach proved too difficult and we're unsure if it's doable |
Please let me know when it’s ready for review |
Sure. This is ready for review |
@tonsky Sorry. There was actually still another bug with deletion. Since I'm going to be away on holiday vacation, I'm putting this back until draft until I get back the first week of next year. Any low-effort feedback would be welcome. Cheers |
Sure. Will give it a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great, but I have couple of notes
(defprotocol IStorage | ||
(restore [this address]) | ||
(accessed [this address]) | ||
(store [this node address]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure what address
here does? Clojure version only takes node and storage chooses and returns address, so that it can be compatible with e.g. auto-increment. I think we should keep it consistent between versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea here is to reuse the existing address to reduce the storage usage and get rid of gc if possible.
We observed 0.1MB increases for db.sqlite when editing one block in Logseq.
I agree that we should keep it consistent between versions, maybe someone can implement the same idea with the Clojure version, or I can remove both address
and _dirty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, you can’t reuse addresses though? Because user can keep a reference to an old version of database that is referencing the old version of the block? If you rewrite it the old db ref will stop working?
There’s actually a whole deal about it in Java version, you kind of need to know all alive references to run GC and not break any of them. That was the initial promise of Datomic and Datascript — dbs are immutable, if you keep a reference to them they’ll keep working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for not getting back to you sooner.
What I did is to keep using any existing address instead of generating a new address for each branch node and leaf, it seems that the Clojure version of Datascript has to remove all the unused addresses during GC here, those unused addresses increases the storage usage unless they're garbage collected. We aim to delete unused addresses immediately instead of delaying to GC to reduce the storage overhead.
(deftype Leaf [keys ^:mutable _address ^:mutable _dirty] | ||
IStore | ||
(store-aux [this storage] | ||
(if (or _dirty (nil? _address)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the idea behind _dirty
flag here? In Clojure version, if node has address, it is persisted. If address is nil, it is not persisted (==dirty?). Do we really need two separate flags for this?
@logseq-cldwalker What are your main insights into asynchronous support? @pkpkpk and I have also started working on async version of the persistent-sorted-set. I think both async and sync execution models have benefits and drawbacks. |
@tonsky Thanks for looking into this PR and all the beautiful work for the Clojure community! |
@whilo Hey! It'll be nice to have async support so that people can choose to store the data in IndexedDB. |
@tiensonqin thanks to you for your consistent support and for such a significant contribution in such a tricky part of the system! |
@tiensonqin Thank you for laying that out, that makes sense. A solution I originally developed for replikativ with the hitchhiker-tree and we are currently revisiting (replikativ/datahike#429) is to stream tree fragment deltas incrementally and then update the db root after everything is in store/storage. That way you can realize your synchronous DataScript scenario. You just need to transact first into a storage system somewhere and then react to its confirmation/updates. I think this would be a simple and nice model to synchronize logseq, but I don't know whether you want to treat the markdown files or the DataScript as the primary source of truth. |
@tonsky Are there things you're waiting for from us? I think Tienson addressed most of the feedback |
Oops, sorry, no. I’ll take a look |
Okay, I think I finally understand what are you doing. Sorry it took so long to catch up. I like the idea! It can’t be the default mode though, default should be normal persistent data structure where you can keep references to old copies. But as an option, I’d like to have it too. I can imagine an app that can only keep one reference to the latest DB at all times. If that eliminates GC, I see how it is beneficial. So let’s say we want to get rid of GC at all. Right now you have two behaviours: some addressed get erased on next I propose we move it all the next store.
So it’s not exactly address reuse, more like freeing addresses as we go and allocating new ones. If a storage doesn’t want to clean up freed addresses immediately, it can make the implementation noop. Would that work for you? |
Tienson is out right now for Chinese New Year. Hopefully he can respond soon after he gets back. In the meantime, we've been able to use this PR successfully on databases up to 9.3M datoms which translates to ~1.4G on disk |
Awesome! |
Hi @tonsky. Thanks for this handy library! This is a conversational PR to see if you'd be interested in our ClojureScript durability that @tiensonqin added as part of our datascript.storage cljs implementation. We are using this forked datascript in our product's feature branch and it is working well for our needs. The existing cljs tests on this repo are passing as our the relevant upstream datascript.storage cljs tests. I'm aware there's some minor whitespace and commenting that needs to be cleaned up here as well as cljs storage tests that need to be ported. I could help with those but for any design questions I'd defer to @tiensonqin. Would you be interested in a contribution like this?