
Adding durability for ClojureScript #14

Open · wants to merge 28 commits into base: master

Conversation

@logseq-cldwalker logseq-cldwalker commented Dec 14, 2023

Hi @tonsky. Thanks for this handy library! This is a conversational PR to see if you'd be interested in the ClojureScript durability that @tiensonqin added as part of our datascript.storage cljs implementation. We are using this forked datascript in our product's feature branch, and it is working well for our needs. The existing cljs tests on this repo are passing, as are the relevant upstream datascript.storage cljs tests. I'm aware there's some minor whitespace and commenting that needs to be cleaned up here, as well as cljs storage tests that need to be ported. I could help with those, but for any design questions I'd defer to @tiensonqin. Would you be interested in a contribution like this?

@logseq-cldwalker logseq-cldwalker marked this pull request as draft December 14, 2023 15:26
@tonsky (Owner) commented Dec 16, 2023

Yes! I initially skipped CLJS implementation because I thought localStorage is too small to be useful, and IndexedDB is async. What are you guys using for storage?

@logseq-cldwalker (Author) commented:

Cool. We're using SQLite via OPFS. Of the OPFS approaches that SQLite offers, we chose the OPFS pool approach because it offers a sync handle. The async approach proved too difficult, and we're unsure if it's doable.

@tonsky (Owner) commented Dec 19, 2023

Please let me know when it’s ready for review

@logseq-cldwalker logseq-cldwalker marked this pull request as ready for review December 19, 2023 22:13
@logseq-cldwalker (Author) commented:

Sure. This is ready for review

@logseq-cldwalker (Author) commented:

@tonsky Sorry. There was actually still another bug with deletion. Since I'm going to be away on holiday vacation, I'm putting this back into draft until I get back the first week of next year. Any low-effort feedback would be welcome. Cheers

@logseq-cldwalker logseq-cldwalker marked this pull request as draft December 21, 2023 14:38
@tonsky (Owner) commented Dec 21, 2023

Sure. Will give it a look

@tonsky (Owner) left a comment:

Overall looks great, but I have a couple of notes.

src-clojure/me/tonsky/persistent_sorted_set.cljs (comment outdated, resolved)
(defprotocol IStorage
(restore [this address])
(accessed [this address])
(store [this node address])
@tonsky (Owner) commented:
I’m not sure what address here does? Clojure version only takes node and storage chooses and returns address, so that it can be compatible with e.g. auto-increment. I think we should keep it consistent between versions
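To make the mismatch concrete, here is a minimal sketch (not from the PR diff) of the contract tonsky describes for the Clojure version: `store` takes only the node, and the storage itself chooses and returns the address. The `memory-storage` implementation below is a hypothetical example with auto-incremented addresses.

```clojure
;; Sketch of the Clojure-side shape: storage allocates addresses.
(defprotocol IStorage
  (store [this node]
    "Persist node; the storage chooses and returns an address for it.")
  (restore [this address]
    "Load and return the node previously stored at address."))

;; Hypothetical in-memory backend with auto-incrementing addresses:
(defn memory-storage []
  (let [*state (atom {:next-addr 0 :nodes {}})]
    (reify IStorage
      (store [_ node]
        (let [{:keys [next-addr]}
              (swap! *state
                     (fn [{:keys [next-addr] :as st}]
                       (-> st
                           (assoc-in [:nodes next-addr] node)
                           (update :next-addr inc))))]
          ;; the address just used is one less than the new counter
          (dec next-addr)))
      (restore [_ address]
        (get-in @*state [:nodes address])))))
```

Because the caller never supplies an address, such a protocol stays compatible with backends like SQLite auto-increment keys, which is the consistency concern raised above.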

@tiensonqin commented Jan 4, 2024

The main idea here is to reuse the existing address to reduce storage usage and get rid of GC if possible.
We observed a 0.1 MB increase in db.sqlite when editing one block in Logseq.

I agree that we should keep it consistent between versions; maybe someone can implement the same idea in the Clojure version, or I can remove both address and _dirty.

@tonsky (Owner) commented:
Wait, you can’t reuse addresses though? A user can keep a reference to an old version of the database that references the old version of the block; if you rewrite it, the old db ref will stop working.

There’s actually a whole deal about this in the Java version: you need to know all live references to run GC without breaking any of them. That was the initial promise of Datomic and DataScript: dbs are immutable, and if you keep a reference to them they’ll keep working.

@tiensonqin commented Jan 31, 2024

I'm sorry for not getting back to you sooner.
What I did is keep using any existing address instead of generating a new address for each branch node and leaf. The Clojure version of DataScript has to remove all the unused addresses during GC here; those unused addresses increase storage usage until they're garbage collected. We aim to delete unused addresses immediately, instead of waiting for GC, to reduce the storage overhead.

(deftype Leaf [keys ^:mutable _address ^:mutable _dirty]
IStore
(store-aux [this storage]
(if (or _dirty (nil? _address))
@tonsky (Owner) commented:
What is the idea behind the _dirty flag here? In the Clojure version, if a node has an address, it is persisted; if the address is nil, it is not persisted (== dirty?). Do we really need two separate flags for this?
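For illustration, here is a minimal sketch of the single-field convention described above, where `_address` alone encodes persistence state. The helper name `store-if-needed` is hypothetical; `store` is assumed to be the storage protocol function that allocates and returns an address.

```clojure
;; Sketch of the Clojure-side convention: one field, two meanings.
;;   (.-_address node) is nil  -> node is new/modified, needs storing
;;   (.-_address node) is addr -> node is already persisted at addr
(defn store-if-needed [storage node]
  (if (some? (.-_address node))
    (.-_address node)                  ;; already persisted, nothing to do
    (let [addr (store storage node)]   ;; storage picks a fresh address
      (set! (.-_address node) addr)    ;; remember where it now lives
      addr)))
```

Under this convention a separate `_dirty` flag is redundant: mutating a node resets its `_address` to nil, which is exactly the "needs storing" signal.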

src-clojure/me/tonsky/persistent_sorted_set.cljs (comment outdated, resolved)
@whilo (Contributor) commented Jan 3, 2024

@logseq-cldwalker What are your main insights into asynchronous support? @pkpkpk and I have also started working on an async version of the persistent-sorted-set. I think both async and sync execution models have benefits and drawbacks.

@tiensonqin commented:
@tonsky Thanks for looking into this PR and all the beautiful work for the Clojure community!

@tiensonqin commented:
@whilo Hey!
I think the main reason for Logseq is that it would take tons of effort to migrate the existing code base to be asynchronous. We still face the challenge that OPFS with SQLite can only be used in a web worker, which is great for not blocking the main UI thread, but it means both queries and transactions will be async. We'll soon experiment with keeping an in-memory DataScript db in the main UI thread for caching, synced with the full db in the worker.

It'll be nice to have async support so that people can choose to store the data in IndexedDB.

@logseq-cldwalker logseq-cldwalker marked this pull request as ready for review January 4, 2024 12:56
@tonsky (Owner) commented Jan 4, 2024

@tiensonqin thanks to you for your consistent support and for such a significant contribution in such a tricky part of the system!

@whilo (Contributor) commented Jan 5, 2024

@tiensonqin Thank you for laying that out; that makes sense. A solution I originally developed for replikativ with the hitchhiker-tree, and that we are currently revisiting (replikativ/datahike#429), is to stream tree fragment deltas incrementally and then update the db root after everything is in storage. That way you can realize your synchronous DataScript scenario: you transact first into a storage system somewhere and then react to its confirmations/updates. I think this would be a simple and nice model for synchronizing Logseq, but I don't know whether you want to treat the markdown files or DataScript as the primary source of truth.

@logseq-cldwalker (Author) commented:

@tonsky Are there things you're waiting for from us? I think Tienson addressed most of the feedback

@tonsky (Owner) commented Jan 18, 2024

Oops, sorry, no. I’ll take a look

@tonsky (Owner) commented Feb 1, 2024

@tiensonqin

Okay, I think I finally understand what you are doing. Sorry it took so long to catch up.

I like the idea! It can’t be the default mode though, default should be normal persistent data structure where you can keep references to old copies.

But as an option, I’d like to have it too. I can imagine an app that can only keep one reference to the latest DB at all times. If that eliminates GC, I see how it is beneficial.

So let’s say we want to get rid of GC entirely. Right now you have two behaviours: some addresses get erased on the next store (the ones being reused, marked with _dirty), and some are erased immediately as the database is changed (delete storage [unused-addresses]).

I propose we move it all to the next store.

  1. Add some sort of queue of freed up addresses to the top level of the tree (through dynamic var?)

  2. When a node gets changed/split/merged, its address is added to that queue. The node itself does not remember its last address; it just gets set to nil, same as in the clj implementation. This lets us get rid of the _dirty flag (has .-address == stored).

  3. When the time comes to store new version of the set, we first do (delete storage unused-addresses) and then let the storage allocate new addresses. Upside: it can happen in a batch call.

So it’s not exactly address reuse, more like freeing addresses as we go and allocating new ones.

If a storage doesn’t want to clean up freed addresses immediately, it can make the implementation noop.

Would that work for you?
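The three-step proposal above can be sketched roughly as follows. All names here (`free-address!`, `walk-and-store`, `*freed-addresses*`) are hypothetical, and `delete` is assumed to be the storage's batch-delete operation; this is an illustration of the flow, not the PR's implementation.

```clojure
;; Step 1: a queue of freed addresses at the top level of the tree,
;; here modeled as a dynamic var bound per store cycle.
(def ^:dynamic *freed-addresses* nil)

;; Step 2: a changed/split/merged node pushes its old address onto
;; the queue and forgets it; nil address == "needs storing", so no
;; separate _dirty flag is required.
(defn free-address! [node]
  (when-some [addr (.-_address node)]
    (vswap! *freed-addresses* conj addr)
    (set! (.-_address node) nil)))

;; Step 3: at store time, first batch-delete everything freed since
;; the last store, then let the storage allocate fresh addresses for
;; all nodes whose address is nil.
(defn store-set! [storage set freed-addresses]
  (delete storage freed-addresses)       ;; one batch call
  (walk-and-store storage set))          ;; returns the new root address
```

As the comment above says, this is not address reuse so much as freeing addresses as you go and allocating new ones, and a storage that does not want eager cleanup can simply implement `delete` as a no-op.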

@logseq-cldwalker (Author) commented:

Tienson is out right now for Chinese New Year; hopefully he can respond soon after he gets back. In the meantime, we've been able to use this PR successfully on databases of up to 9.3M datoms, which translates to ~1.4 GB on disk.

@tonsky (Owner) commented Feb 13, 2024

Awesome!
