Implement "import JSON from Roam" #31

Closed
tangjeff0 opened this issue May 6, 2020 · 24 comments

@tangjeff0
Collaborator

tangjeff0 commented May 6, 2020

Assigned to @jeroenvandijk

This issue covers importing only Roam's JSON export.

Requires generating :block/uids for pages since Roam's export lacks them... 🤷‍♂️🧐

Also, there are some properties in the JSON that may need to be omitted; otherwise the datascript transaction throws an exception. Consider using clojure.spec, specter, or something like that to ignore non-essential attributes for now.
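
A minimal sketch of the two steps described above, written in plain JavaScript rather than clojure.spec or specter: fill in a uid for any page that lacks one, and copy only a whitelist of keys so that unknown properties never reach the transaction. The key names (`title`, `string`, `uid`, `children`), the 9-character uid shape, and all helper names are assumptions for illustration, not confirmed details of Roam's export or of the Athens importer.

```javascript
// Hypothetical whitelist: keys assumed to be essential for an import.
const KEEP = ['title', 'string', 'uid', 'children'];

// Assumption: uids are 9 characters drawn from a URL-safe alphabet.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

function genUid(length = 9) {
  let uid = '';
  for (let i = 0; i < length; i++) {
    uid += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return uid;
}

// Recursively copy a page or block, dropping non-whitelisted keys
// and generating a uid when the node has none.
function cleanNode(node) {
  const out = {};
  for (const key of KEEP) {
    if (key in node && key !== 'children') out[key] = node[key];
  }
  if (!out.uid) out.uid = genUid();
  if (Array.isArray(node.children)) {
    out.children = node.children.map(cleanNode);
  }
  return out;
}

function cleanExport(pages) {
  return pages.map(cleanNode);
}
```

`Math.random` is enough for a sketch but gives no uniqueness guarantee; a real importer would probably want to check generated uids against the ones already present.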

@jeroenvandijk jeroenvandijk self-assigned this May 8, 2020
@jeroenvandijk
Contributor

@tangjeff0 I did some exploratory work. I pretty printed the json dump and I did some analysis on the attributes https://gist.github.com/jeroenvandijk/6713bf0af4fe5bc82ebf4a261766537b#file-pretty_print_json_export-edn

I think there is more information in the dump than you thought? For instance, in many cases there is a :block/uid, but it's not at the top level, only under children it seems. Also, in the diagrams many things are actually uids (e.g. from, to, id, maybe more).

Might take some time to get this right. Maybe we can do a cross reference with your datom export to make sure we are not missing any data.

@tangjeff0
Collaborator Author

Yeah, last I recall :block/uid was present for all child blocks, but not for some pages. I didn't dig into diagrams, but all of them use blocks under the hood. You can see how that would look in datoms here.

@jeroenvandijk
Contributor

jeroenvandijk commented May 12, 2020

Making some progress with Meander (gist)

It's not there yet, it's still missing some mapping, but slowly getting there. I think/hope it's a matter of understanding Meander better. The expected result is a self-explaining import script.

I'm asking for help in the #meander slack channel

@jeroenvandijk
Contributor

jeroenvandijk commented May 14, 2020

Some more WIP and background here:

Transformation is more or less complete. Next step is getting it in the right shape to transact to Datascript

@palashkaria

palashkaria commented Jul 1, 2020

@jeroenvandijk @tangjeff0 I have been thinking of this for a while & following along on discord, but I have a couple of ideas here:

  1. Instead of supporting Roam JSON (which has incomplete info; as you mention, page UIDs are missing), why don't we just use the query method? (there's window.roamAlphaAPI for this)

    • The advantage: we get the whole data
    • I can see concerns with this method being hard to use, but we can easily build this into a browser extension using window.roamAlphaAPI.q, say roam-toolkit (or a tiny one of our own/a small script)
  2. There are other ways to get data out, e.g. by taking a dump of the local IndexedDB. This might be a useful idea for exporting stuff from Athens itself (Import/Export #98)

    • For example, the memex extension by worldbrain, they ask you to download and run a local server & save files to your hard drive - it's basically a dump of the db which works on import

    • they use worldbrain/storex for this, which is a way to describe a schema that can talk across types of DBs/devices. For example, you can have the same db running on both the browser's IndexedDB (via Dexie) & SQLite in React Native (storex docs). They also have something called storex-sync in the works for cross-device syncing of this data, offline (via a local server)

    • also, memex runs completely offline, only on indexedDb, so that shows how powerful idxDb can be

Please lmk thoughts on this. I would like to contribute to this area :)

@tangjeff0
Collaborator Author

tangjeff0 commented Jul 1, 2020

@palashkaria I think roam-toolkit could be a great solution if we decide to go with roamAlphaAPI. The only problem here is that this export is not perfect. If you look at the return value of roamAlphaAPI.q, you can see we lose all the namespaces for attributes.

  • This means we don't know if the attribute "time" maps to :edit/time or :create/time.
  • We don't know if "email" maps to :create/email or :edit/email.

On the plus side, this export seems to handle data structures well, like the refs attribute, which is a set of 3 vectors.

There may be other export issues I'm not aware of. If we can find an interoperable example of each attribute described in Notion, I'm happy to use this method. I don't think :x/email or :x/time attributes are that important ultimately. I think :block/uid, :node/title, :block/string, :block/order, :block/children are the most important, which all seem to work.
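
To make the ambiguity concrete, here is a small sketch that strips namespaces from keyword-style keys and reports which bare names collide. The helper names are hypothetical, and the string form of the keys (`":edit/time"`) is an assumption about how an entity might be represented in JavaScript.

```javascript
// Strip the namespace from a keyword-style key, e.g. ":edit/time" -> "time".
function stripNamespace(key) {
  const slash = key.indexOf('/');
  return slash === -1 ? key.replace(/^:/, '') : key.slice(slash + 1);
}

// Return [bareName, originalKeys] pairs for every bare name that
// more than one namespaced key maps to.
function findCollisions(entity) {
  const seen = new Map(); // bare name -> list of original keys
  for (const key of Object.keys(entity)) {
    const bare = stripNamespace(key);
    if (!seen.has(bare)) seen.set(bare, []);
    seen.get(bare).push(key);
  }
  return [...seen].filter(([, keys]) => keys.length > 1);
}

const sampleEntity = {
  ':block/uid': 'QP_YnHL-l',
  ':create/time': 1591331394897,
  ':edit/time': 1593616207210,
  ':create/email': 'example@gmail.com',
  ':edit/email': 'example@gmail.com',
};

// Reports that "time" and "email" each come from two different attributes.
console.log(findCollisions(sampleEntity));
```

Attributes such as :block/uid survive the stripping unambiguously, which matches the observation that the most important attributes still work.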


Regarding your second point, I love worldbrain. How do we query and pull data from IndexedDB? If we can get it out in datom format, that'd be great!

@palashkaria

palashkaria commented Jul 2, 2020

@tangjeff0 to get around that, we can use .pull, maybe; Something like this:

const data = window.roamAlphaAPI.q(`[:find ?e :where [?e ?a ?v]]`).map(dbId => {
	// each result row is a one-element array holding the entity id
	return window.roamAlphaAPI.pull('[*]', dbId[0]);
});

this will give an array of such objects

:block/uid: "MKCclfcvD",
:create/email: "example@gmail.com",
:create/time: 1593616207301,
:db/id: 94,
:edit/email: "example@gmail.com",
:edit/time: 1593616207302,
:node/title: "roam/css",


:block/open: false
:block/order: 39
:block/string: "    - <3"
:block/uid: "QP_YnHL-l"
:create/email: "example@gmail.com"
:create/time: 1591331394897
:db/id: 68
:edit/email: "example@gmail.com"
:edit/time: 1593616207210

Although I think it might be possible to do it in one query too; not quite sure how.

(Just trying out a pull inside .q did not work)

window.roamAlphaAPI.q(`[:find (pull ?e []) :where [?e ?a ?v]]`)

@tangjeff0
Collaborator Author

tangjeff0 commented Jul 2, 2020

Wow, great idea using pull syntax. This q-map-pull works perfectly! @palashkaria

roamAlphaAPI.q(`[:find [?e ...] :where [?e _ _]]`).map(e => roamAlphaAPI.pull('[*]', e))
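
As a rough sketch, the q-map-pull one-liner can be wrapped in a function that takes the API object as a parameter, so the same code can run in the browser against `window.roamAlphaAPI` or against a stub. The function name and the idea of injecting the API are assumptions of this sketch; only `q` and `pull` come from the thread.

```javascript
// `api` is expected to look like window.roamAlphaAPI: an object with
// q(query) and pull(pattern, entityId). Passing it in (an assumption of
// this sketch) lets the function be exercised without a Roam page.
function exportAllEntities(api) {
  // The [?e ...] find spec returns a flat list of entity ids.
  const ids = api.q('[:find [?e ...] :where [?e _ _]]');
  // Pull every attribute of each entity, one call per id.
  return ids.map((id) => api.pull('[*]', id));
}

// In the browser console on a Roam page, usage might look like:
//   const dump = exportAllEntities(window.roamAlphaAPI);
//   const json = JSON.stringify(dump);
```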

Want to begin making the roam-toolkit plugin?

FYI the way to do it in one query is:

roamAlphaAPI.q(`
[:find [(pull ?e [*] ) ...]
 :where [?e ?a ?v]]`)

But this again loses the attribute namespaces 😢 but we can still use your JS map idea

How about IndexedDB? What can we pull from there?

@palashkaria

palashkaria commented Jul 2, 2020

@tangjeff0 is this all the data? The prev query was missing some stuff I think; could you test this one out?


window.roamAlphaAPI.q(`[:find ?e ?a ?v :where [?e ?a ?v]]`).map(x => window.roamAlphaAPI.pull('[*]', x[0]))
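
One thing to watch with this variant, assuming the alpha API follows the usual Datalog behaviour of returning one `[e, a, v]` row per datom: the same entity id shows up once for every attribute it has, so mapping `pull` over the rows repeats work and yields duplicate objects. A sketch that collects the distinct ids first (the function name and the injected `api` parameter are hypothetical):

```javascript
// Rows are [e, a, v] triples, so an entity id appears once per datom.
// Collect the distinct ids first, then pull each entity exactly once.
function pullDistinct(api) {
  const rows = api.q('[:find ?e ?a ?v :where [?e ?a ?v]]');
  const ids = [...new Set(rows.map((row) => row[0]))];
  return ids.map((id) => api.pull('[*]', id));
}
```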


@palashkaria

palashkaria commented Jul 2, 2020

@tangjeff0 IndexedDB is pretty involved, & can look very weird with roam internals exposed
You can go to your Application tab in inspect > storage & look at dbs there. What these are useful for is to make a backup of your whole roam db etc

an example of db data



tx: "[["~:db.fn/retractAttribute",["~:block/uid","jjXiTSxJe"],"~:block/refs"],["^ ","^1","jjXiTSxJe","^2",[["^ ","^1","MfxMc1zIG"]]]]"


tx: "[["~:db/add","uuid4236c39a-e712-xxxx-xxxx-9ecb03ec833e","~:block/uid","Vrv2DiaVE"],["^ ","^1","jjXiTSxJe","~:block/children",[["^ ","~:block/string","","~:create/email","example@gmail.com","~:create/time",1593104696364,"^1","Vrv2DiaVE","~:block/open",true,"~:edit/time",1593104696364,"~:db/id","uuid4236c39a-xxx-xxx-bac0-9ecb03ec833e","~:edit/email","example@gmail.com","~:block/order",1]]]]"

--

@palashkaria

palashkaria commented Jul 2, 2020

Also, please test out the query with more kinds of things. I am unable to verify if everything in Notion works properly with this. Do we have a validator of any kind? Maybe just importing into Athens?

I'll also figure out the roam-toolkit part if this pans out.

@tangjeff0
Collaborator Author

Please provide the JS necessary to access the IndexedDB data. We can compare window.roamAlphaAPI with the IndexedDB data.

What you showed me are not Roam internals. In fact, I believe it is datascript-transit. Athens uses the same thing but with localStorage right now:

[Image: Screen Shot 2020-07-02 at 11 36 55 AM]

We can compare data from these two sources. If they are the same, then it is a perfect export. @palashkaria

@tangjeff0
Collaborator Author

(But I’m pretty confident that q-map-pull works as expected)

@palashkaria

palashkaria commented Jul 2, 2020

@tangjeff0 ah yes, so that's all there is but no direct schema/datom exports. We can export these transaction dbs using [Dexie](https://github.com/dfahlander/Dexie.js/), as blobs, somewhat like this:

import {exportDB} from 'dexie-export-import';
import Dexie from 'dexie';
import { saveAs } from 'file-saver';


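// List every IndexedDB database on this origin, then export each one
// as a JSON blob and trigger a download for it.
Dexie.getDatabaseNames(dbNames => {
	dbNames.forEach((dbName) => {
		// No schema is declared, so Dexie opens the existing database as-is.
		const db = new Dexie(dbName);
		db.open().then(() => {
			exportDB(db).then(blob => {
				saveAs(blob, `${dbName}.json`);
			})
		})
	})
})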

Note: This will download a lot of files - your browser will probably warn you

  • Also, a side note: localStorage is synchronous, so it would be slow; IndexedDB might be a better idea for Athens eventually (I suggested using it with Dexie.js)

https://web.dev/storage-for-the-web/#other

@palashkaria

palashkaria commented Jul 2, 2020

@tangjeff0 all things considered, let's go ahead with q-map-pull itself (although it might be a bit slow because of the iterations)

You are sure there's no way to get namespaces with attributes in pull-expr? I tried searching for that, but I'm just a beginner at datascript; nothing in the datalog docs around this

@tangjeff0
Collaborator Author

tangjeff0 commented Jul 2, 2020

@palashkaria The datalog semantics are correct. The issue is that namespaced keywords are mostly a Clojure (edn) construct that requires additional serialization to export to JSON.
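
One way to picture the "additional serialization" mentioned here, as a hedged sketch: if the full keyword is kept as a plain string key, JSON can carry it unchanged, and the namespace can be recovered on the other side by parsing that string. `parseKeyword` is a hypothetical helper, not part of Roam, Athens, or datascript.

```javascript
// Parse ":create/time" into { namespace: "create", name: "time" }.
// Keywords without a namespace, such as ":title", get namespace null.
function parseKeyword(str) {
  if (typeof str !== 'string' || !str.startsWith(':')) {
    throw new Error(`not a keyword string: ${str}`);
  }
  const body = str.slice(1);
  const slash = body.indexOf('/');
  if (slash === -1) return { namespace: null, name: body };
  return { namespace: body.slice(0, slash), name: body.slice(slash + 1) };
}

// Keys kept as full keyword strings survive a JSON round trip unchanged.
const keyed = { ':create/time': 1593616207301, ':edit/time': 1593616207302 };
const restored = JSON.parse(JSON.stringify(keyed));
const parsedKeys = Object.keys(restored).map(parseKeyword);
```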

Namespaced keywords are considered harmful by some.

@tangjeff0
Collaborator Author

tangjeff0 commented Jul 20, 2020

@palashkaria implemented lossless import in #288. Feel free to start on the roam-toolkit plugin!

@tangjeff0
Collaborator Author

I would also understand if you didn't want to put this directly in roam-toolkit, so we could just make this a small script as you also suggested in #31 (comment). But you (and @Stvad) are the main contributors there so I'll let you decide.

@sagarjauhari

Trying to figure out the status of this feature - is the roam import supported atm?

@tangjeff0
Collaborator Author

Not supported atm @sagarjauhari , waiting for performance improvements before importing this. Many people have large Roam dbs.

@Mayeu

Mayeu commented Nov 12, 2020

I don't know if you saw, but there is now a "lossless EDN export" in Roam.

@tangjeff0
Collaborator Author

Yeah, that makes it even easier @Mayeu. Thanks!

@TDHTTTT

TDHTTTT commented Mar 21, 2021

Hi I am wondering if this is currently being worked on. If not, how can I help? Thanks!

@tomisme
Contributor

tomisme commented Mar 22, 2021

@TDHTTTT I think the latest work is at #561

shanberg added a commit to shanberg/athens that referenced this issue Sep 2, 2022
* feat: readable table query

* fix: clearer style for disabled taskbox

* improvement: neater query table