3. Loading Data
This page provides some basic processes to get started loading data.
Data can be inserted into Asami using either entities or statements.
"Entities" is just the name given to objects defined as maps of keys to values. This is a common format for JSON and EDN data, which is often available in files or from data APIs.
Native EDN data can be loaded as a single object, or a sequence of objects.
As an example, consider the following EDN file named data.edn:
[{:id "bennet"
:type "family"
:name "Bennet"
:children [{:name "Jane"}
{:name "Elizabeth"}
{:name "Mary"}
{:name "Catherine"}
{:name "Lydia"}]}
{:id "bingley"
:type "family"
:name "Bingley"
:children [{:name "Charles"}
{:name "Caroline"}
{:name "Louisa" :surname "Hurst"}]}
{:id "fitzwilliam"
:type "family"
:name "Fitzwilliam"
:children [{:name "Catherine" :surname "de Bourgh"}
{:name "Anne" :surname "Darcy"}]}]
This can be loaded by parsing the file as EDN, and transacting it as :tx-data. As it is a small file, a call to clojure.core/slurp is one way to load the file as a string, where it can be parsed and inserted.
(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(def data (edn/read-string (slurp "data.edn")))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-data data})
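Once transacted, the data can be queried to confirm that it loaded. The following is a quick sketch of such a query, using the attributes from the file above; it should return the three family names:
(d/q '[:find ?name
       :where
       [?family :type "family"]
       [?family :name ?name]]
     (d/db conn))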
JSON files can be loaded in the same way as EDN, with keys typically converted to Clojure keywords automatically. The JSON equivalent to the data above would be:
[{"id": "bennet",
"type": "family",
"name": "Bennet",
"children": [{"name": "Jane"},
{"name": "Elizabeth"},
{"name": "Mary"},
{"name": "Catherine"},
{"name": "Lydia"}]},
{"id": "bingley",
"type": "family",
"name": "Bingley",
"children": [{"name": "Charles"},
{"name": "Caroline"},
{"name": "Louisa", "surname": "Hurst"}]}
{"id": "fitzwilliam",
"type": "family",
"name": "Fitzwilliam",
"children": [{"name": "Catherine", "surname": "de Bourgh"},
{"name": "Anne", "surname": "Darcy"}]}]
Both EDN and JSON can be parsed directly from a file instead of as a string, and this is faster and more memory efficient. This example loads JSON data using the Cheshire JSON library:
(require '[asami.core :as d])
(require '[cheshire.core :as json])
(require '[clojure.java.io :as io])
(def data (json/parse-stream (io/reader "data.json") true))
(d/transact conn {:tx-data data})
The data loaded by this technique is identical to the equivalent EDN in the previous section.
Note that the JSON example above passed true as a parameter when parsing the stream. This automatically converts keys into keywords. While this is preferred, Asami can also work with raw strings in place of keywords. This may be necessary if some of the strings contain characters that are not legal in Clojure keywords. For instance, the file page.json contains:
{ "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
"@type": "sc:Canvas",
"height": 3223,
"images": [
{ "@id": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/seq-25/",
"@type": "oa:Annotation",
"motivation": "sc:painting",
"on": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
"resource": {
"@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025/full/pct:100/0/default.jpg",
"@type": "dctypes:Image",
"format": "image/jpeg",
"height": 3223,
"service": {
"@context": "http://iiif.io/api/image/2/context.json",
"@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
"profile": "http://iiif.io/api/image/2/level2.json"},
"width": 2040}}],
"label": "Page 25",
"metadata": [{"label": "Library of Congress Resource URL",
"value": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/?sp=25"}],
"related": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/?sp=25",
"service": {
"@context": "http://iiif.io/api/image/2/context.json",
"@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
"profile": "http://iiif.io/api/image/2/level2.json"},
"thumbnail": {
"@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025/full/pct:12.5/0/default.jpg",
"format": "image/jpeg",
"height": 402,
"width": 255},
"width": 2040}
The fields @id, @type, and @context all contain an "@" character, which can't appear in a Clojure keyword. Keys with space characters are another common issue. It is possible to provide a function to JSON parsing libraries to convert keys into a keyword-compatible format, but this can be difficult if the allowed keys are unknown ahead of time.
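For example, here is a rough sketch of such a key function, passed as Cheshire's key-fn argument in place of true. The safe-key helper is hypothetical, and simply rewrites "@" and space characters before keywordizing:
(require '[cheshire.core :as json])
(require '[clojure.java.io :as io])
(require '[clojure.string :as str])

;; hypothetical helper: replace characters that are not keyword-friendly
(defn safe-key [k]
  (keyword (str/replace k #"[@\s]" "_")))

;; "@id" becomes :_id, "@type" becomes :_type, and so on
(def data (json/parse-stream (io/reader "page.json") safe-key))
As noted above, this becomes awkward when the full set of keys is not known in advance, which is why the string-based approach below is often simpler.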
To work with data like this, it can be loaded in the same way as before, except without converting keys to keywords:
(require '[asami.core :as d])
(require '[cheshire.core :as json])
(require '[clojure.java.io :as io])
(def data (json/parse-stream (io/reader "page.json")))
(def conn (d/connect "asami:mem://page"))
(d/transact conn {:tx-data data})
Note that this file contained a single object rather than a sequence. A single object is still valid and is treated as a sequence containing one entity.
The resulting data now uses strings as attributes, which will change the format of queries. For instance, to ask the above data for the height and width of all images:
(d/q '[:find ?height ?width
       :where
       [?image "format" "image/jpeg"]
       [?image "height" ?height]
       [?image "width" ?width]]
     conn)
Asami statements can be inserted directly as :db/add operations. These can also appear in a transaction sequence.
For instance, this data structure has 2 entities which refer to the same 3rd entity:
[{:id "charles"
:name "Charles"
:home {:id "scarborough"
:town "Scarborough"
:county "Yorkshire"}}
{:id "jane"
:name "Jane"
:home {:id "scarborough"}}]
This can be represented by adding the following statements:
[[:db/add :a/node-1000 :id "charles"]
[:db/add :a/node-1000 :name "Charles"]
[:db/add :a/node-1001 :id "scarborough"]
[:db/add :a/node-1001 :town "Scarborough"]
[:db/add :a/node-1001 :county "Yorkshire"]
[:db/add :a/node-1000 :home :a/node-1001]
[:db/add :a/node-1002 :id "jane"]
[:db/add :a/node-1002 :name "Jane"]
[:db/add :a/node-1002 :home :a/node-1001]]
This uses 3 keywords (:a/node-1000, :a/node-1001, and :a/node-1002) to represent the objects.
It is also possible to mix these statements with entities:
[[:db/add :a/node-1000 :id "charles"]
[:db/add :a/node-1000 :name "Charles"]
[:db/add :a/node-1001 :id "scarborough"]
[:db/add :a/node-1001 :town "Scarborough"]
[:db/add :a/node-1001 :county "Yorkshire"]
[:db/add :a/node-1000 :home :a/node-1001]
[:db/add :a/node-1002 :id "jane"]
[:db/add :a/node-1002 :name "Jane"]
[:db/add :a/node-1002 :home :a/node-1001]
{:id "Elizabeth" :sister {:id "jane"}}]
Loading a file containing a sequence like this can be done by:
(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
;; clojure.edn/read requires a java.io.PushbackReader
(def data (edn/read (java.io.PushbackReader. (io/reader "adds.edn"))))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-data data})
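Once transacted, the references between nodes can be followed in a query. As a sketch, the following should return "Charles" and "Jane", since both of their :home attributes point at the same Scarborough node:
(d/q '[:find ?name
       :where
       [?home :town "Scarborough"]
       [?person :home ?home]
       [?person :name ?name]]
     (d/db conn))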
An alternative to :db/add statements is available when all of the data is already in triple form. In this case, the triples can be sent directly using :tx-triples instead of :tx-data.
The above data in triple form would appear as:
[[:a/node-1000 :id "charles"]
[:a/node-1000 :name "Charles"]
[:a/node-1001 :id "scarborough"]
[:a/node-1001 :town "Scarborough"]
[:a/node-1001 :county "Yorkshire"]
[:a/node-1000 :home :a/node-1001]
[:a/node-1002 :id "jane"]
[:a/node-1002 :name "Jane"]
[:a/node-1002 :home :a/node-1001]]
Loading is almost the same, with only the parameter label changing:
(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
(def data (edn/read (java.io.PushbackReader. (io/reader "triples.edn"))))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-triples data})
Nodes that indicate entities are often represented using Asami Internal Nodes. These are serialized in EDN as #a/n[1234] where the number can be any positive long integer. Loading data that contains these elements requires a reader that is found in asami.graph/node-reader.
An alternative to the triples above could include these internal nodes:
[[#a/n[1000] :id "charles"]
[#a/n[1000] :name "Charles"]
[#a/n[1001] :id "scarborough"]
[#a/n[1001] :town "Scarborough"]
[#a/n[1001] :county "Yorkshire"]
[#a/n[1000] :home #a/n[1001]]
[#a/n[1002] :id "jane"]
[#a/n[1002] :name "Jane"]
[#a/n[1002] :home #a/n[1001]]]
This would then be loaded by specifying the reader:
(require '[asami.core :as d])
(require '[asami.graph :as g])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
(def data (edn/read {:readers g/node-reader} (java.io.PushbackReader. (io/reader "triples.edn"))))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-triples data})
Asami can also export data, which can then be imported into another store. You should specify a particular database to export from, rather than the connection, but connections are still accepted, in which case the latest database is selected:
(require '[asami.core :as d])
;; load up existing data
(def conn (d/connect "asami:local://existing"))
;; export this data to a file
(spit "export.edn" (d/export-str conn))
The data can then be imported into another store. Unlike exporting, importing must go through a connection, where it will update the most recent database:
(require '[asami.core :as d])
(def conn2 (d/connect "asami:mem://newdata"))
(d/import-data conn2 (slurp "export.edn"))
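To check that the import worked, the new connection can be queried. A minimal sketch that simply retrieves every statement now in the store:
(d/q '[:find ?e ?a ?v
       :where [?e ?a ?v]]
     (d/db conn2))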
Data may also be sent directly from one database to another. For instance, the value of a local database from some point in the past could be sent to an in-memory database to experiment with:
(require '[asami.core :as d])
(def conn-existing (d/connect "asami:local://existing"))
(def conn-new (d/connect "asami:mem://new"))
;; get a database from conn-existing for some time in the past
(def past-data (d/as-of (d/db conn-existing) #inst "2021-06-28T23:08:16.949-00:00"))
(d/import-data conn-new (d/export-data past-data))