Skip to content
/ igraph Public

Defines a protocol for viewing data as Subject-Predicate-Object graphs, with a simple typedef implementing same

License

Notifications You must be signed in to change notification settings

ont-app/igraph

Repository files navigation

NaturalLexicon logo ont-app/igraph

IGraph defines a protocol which aims to provide a general interface to a variety of graph-based representations (RDF, datascript, datomic, loom, ...)

It also defines a Graph datatype which implements IGraph.

There is a 15-minute video introduction here.

Contents


Installation

This is deployed to clojars:

Clojars Project

Require thus:

(:require 
  [ont-app.igraph.core :as igraph] ;; for the IGraph protocol and related stuff
  [some.igraph.implementation ...] ;; implements IGraph
  )
           

Motivation

One of the defining characteristics of Clojure is that it revolves around a minimal set of basic data structures.

I think it can be argued that the collection primitives in Clojure can be approximately ordered by their degree of expressiveness:

  • seq - first/rest
  • < set - membership
  • < vector - indexable members
  • < map - a collection of associations

The conceit of IGraph is that there is room for a new collection primitive with one higher level of expressiveness:

  • < graph - a collection of named relationships between named entities

This is informed to a large degree by the RDF model, and aims to align with linked data encoded in RDF, while keeping direct dependencies to the RDF stack to a minimum.

The IGraph protocol

This protocol defines the basic operations over a graph conceived of as a set of triples S-P-O, where subject S and object O typically name entities, and property P is a named relation that applies between those entities.

This is directly inspired by the RDF model, but the requirement that these identifiers adhere strictly to RDF specifications for URIs, and that literal values be restricted to a small set of scalars is relaxed quite a bit.

Methods summary

The IGraph protocol specifies the following methods:

Member access

  • (normal-form g) -> {s {p #{o...}...}...}
  • (subjects g) -> (s ...), a lazy sequence
  • (get-p-o g s) -> {p #{o...} ...}
  • (get-o g s p) -> #{o ...}
  • (ask g s p o) -> truthy
  • (query g q) -> implementation-dependent query results

Content manipulation

  • (mutability g) -> One of #{::read-only ::immutable ::mutable ::accumulate-only}

invoke to support IFn

  • (g) = (normal-form g)
  • (g s) -> {p #{o...} ...}
  • (g s p) -> #{o ...}
  • (g s p o) -> truthy

Member access

Normal form

Any implemetation of this protocol, regardless of its native representation must be expressable in IGraph's Normal Form.

As an example, let's start with a graph called 'eg' with four triples:

> (igraph/normal-form eg)
{:john 
  {:isa #{:person}, 
   :likes #{:beef}},
 :mary 
 {:isa #{:person}, 
  :likes #{:chicken}}}
>

These are facts about two subjects, :john and :mary with two facts each.

John is a person who likes beef.

Mary is also a person, and likes chicken.

The normal form has three tiers. The "s-level" is a map from each subject to a "p-level" description of that subject. The normal form for descriptions is a map from a property identifier to an "o-level" set of objects for said subject and property.

What I'm aiming for here is a form that's

  • extremely regular and simple
  • lends itself to expressing and thinking about basic set operations on graphs.
A note on the keyword identifiers used in these examples

To keep things simple and readable, none of the keywords used in these examples are namespaced.

In practice you will probably want to used namespaced keywords, and some implementations of IGraph, e.g. those that interact directly with RDF-based representations, will expect them.

Tractability

It is expected that while many implementations of IGraph will be in-memory data structures of modest size, others might be huge knowledge bases provided on a server somewhere (Wikidata, for example). In the latter case it is always acceptable throw an ::igraph/Intractable for any method that warrants it:

(throw (ex-info "Normal form for Wikidata is intractable" 
  {:type ::igraph/Intractable}))

subjects

The subjects method must return a lazy sequence of the complete set of subjects in the graph (modulo tractability):

> (igraph/subjects eg)
`(:john :mary)
> (type (igraph/subjects eg))
clojure.lang.LazySeq
>

get-p-o

We must be able to get the p-level description of any subject with get-p-o:

> (igraph/get-p-o eg :john)
{:isa #{:person}, :likes #{:beef}}
>

get-o

We must be able to get the o-level set of objects for any subject and predicate with get-o:

> (igraph/get-o eg :john :isa)
#{:person}
>

ask

We must be able to test for whether any particular triple is in the graph with ask (any truthy response will do).

> (igraph/ask eg :john :likes :beef) 
:beef 
> (igraph/ask eg :john :likes :chicken) 
nil
>

query

We must be able to query the graph using a format appropriate to the native representation. This example uses the format expected by ont-app.igraph.graph/Graph, described below:

> (igraph/query eg [[:?person :isa :person]])
#{{:?person :mary} {:?person :john}}
>

In this case, the result is a set of binding maps, mapping :?variables to values, similar to the result set of a SPARQL query.

For comparison, here is a sketch of an equivalent SPARQL query, which would be appropriate if our IGraph protocol was targeted to a SPARQL endpoint which we might call sparql-eg:

> (query sparql-eg 
  "PREFIX : <http://path/to/my/ns#> 
   SELECT * WHERE 
   {
     ?person a :person
   }")
[{:person :mary} {:person :john}]
> 

invoke for arities 0-3

An instance of IGraph must provide invoke implementations as follows:

Without arguments, it must return Normal Form (or throw an ::igraph/Intractable):

> (eg)
{:john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:chicken}}}
>

With a single "s" argument, it must treat the argument as the subject of get-p-o:

> (eg :john)
{:isa #{:person}, :likes #{:beef}},
>

With two arguments "s" and "p", a set of objects must be returned:

> (eg :mary :likes)
#{:chicken}
>

This will often be the value of get-o, but it may also accept as the "p" argument a traversal function, described below.

With three arguments "s" "p" and "o", the response must be truthy:

> (eg :mary :likes :chicken)
:chicken
>
>
> (eg :mary :likes :beef)
nil
>

This will often be equivalent to ask, but again, the "p" argument can be a traversal function, described below.

Content Manipulation

There a several factors to take into account when adding or removing content from a graph.

Some graphs, (such as a public SPARQL endpoint to which one does not have UPDATE permissions) may not be subject to modification. Other native representations (such as a SPARQL endpoint with UPDATE permissions) might best be treated as mutable graphs.

Naturally, other things being equal, the preferred solution is to use immutable graphs when it is possible to do so. The examples in this README will mostly be applied to immutable graphs unless stated otherwise.

mutability

The mutability method returns one of the following values

  • ::igraph/read-only - there is no means for altering the contents of the graph
  • ::igraph/immutable - the graph implements IGraphImmutable
  • ::igraph/mutable - the graph implements IGraphMutable
  • ::igraph/accumulate-only - the graph implements IGraphAccumulateOnly, the approach used in Datomic

The add-to-graph multimethod

IGraph defines a multimethod add-to-graph, dispatched on the type of graph, and a function triples-format. This multimethod can inform mutable, immutable and accumulate-only graphs.

Naturally Normal Form is one possible format:

> (igraph/triples-format {:john {:likes# #{:beef}}})
:normal-form
>

Another possible value is :vector, with a subject and at least one P-O pair:

> (igraph/triples-format [:john :likes :beef])
:vector
> (igraph/triples-format [:john :isa :person :likes :beef])
:vector
>

Finally, we have :vector-of-vectors:

> (igraph/triples-format  [[:john :isa :person] [:mary :isa :person]])
:vector-of-vectors
>

Any implementation of IGraph should support adding to the graph in all of these formats.

The remove-from-graph multimethod

IGraph also defines multimethod remove-from-graph, dispatched on the graph types and a function triples-removal-format. This multimethod can inform both mutable and immutable graphs.

The triples-removal-format function returns the same keywords as triples-format, but adds one more: :underspecified-triple, a vector with fewer than 3 elements:

> (igraph/triples-removal-format [:john])
:underspecified-triple
> (igraph/triples-removal-format [:john :likes])
:underspecified-triple
>

triples-removal-format assigns the :vector-of-vectors flag to a vector of either :vector or :underspecified-vector. All implementations of IGraph should support each of these flags.

This allows us to subtract any format that could also be added, plus all [s * *] or all [s p *].

The IGraphImmutable protocol

An add or subtract operation to an immutable graph returns a cheap copy of the original graph modified per the argument provided.

add

Calling (add g to-add) must return an immutable graph such that the graph now contains to-add. Any triples in to-add which are already in the graph should be skipped.

See the notes above about the add-to-graph multimethod.

Typically adding to a graph in code is most easily expressed using a vector or a vector of vectors:

> (igraph/normal-form 
    (igraph/add 
      eg 
      [[:chicken :subClassOf :meat]
       [:beef :subClassOf :meat]
       ]))
{:john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:chicken}},
 :chicken {:subClassOf #{:meat}},
 :beef {:subClassOf #{:meat}}}
>

We can use the Normal Form of one graph to add it to another:

> (meats)
{:chicken {:subClassOf #{meat}}
 :beef {:subClassOf #{meat}}}
>
> (igraph/normal-form (add eg (meats)))
{:john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:chicken}},
 :chicken {:subClassOf #{:beef}},
 :beef {:subClassOf #{:beef}}}
> 

subtract

The multimethod remove-from-graph supports the subtract operation, dispatched on the type of the graph and triples-removal-format, described above:

> (igraph/normal-form (igraph/subtract eg [:john]))
{:mary {:isa #{:person}, :likes #{:chicken}}}
>
> (igraph/normal-form (igraph/subtract eg [:john :likes]))
{:john {:isa #{:person}}, 
 :mary {:isa #{:person}, :likes #{:chicken}}}
>

The IGraphMutable protocol

Some graphs' native representations are implemented as mutable repositories. To support this, the IGraphMutable protocol provides methods add! and subtract!.

The add-to-graph and remove-from-graph multimethods should still inform the logic here, and the behavior should be essentially the same, with the exception that the graph returned is the same object, mutated as specified.

add!

(add! g to-add) -> g, where g is both the argument and return value.

An error should be thrown if (mutablility g) != ::igraph/mutable.

subtract!

(subtract! g to-subtract) -> g, where g is both the argument and return value.

An error should be thrown if (mutablility g) != ::igraph/mutable.

The IGraphAccumulateOnly protocol

A graph whose native representation is based on Datomic implements what Datomic calls an "Accumulate-only" approach to adding and removing from a graph. To support this, the IGraphAccumulateOnly protocol provides methods claim (corresponding to the datomic 'add' operation), and retract. In this scheme the state of the graph can be rolled back to any point in its history. See the Datomic documentation for details.

The add-to-graph and remove-from-graph multimethods should still inform the logic here, and the behavior should be essentially the same, with the exception that the graph returned now points to the most recent state of the graph after making the modification. Any given instantiation of the graph will remain immutable.

claim

(claim g to-add) -> g', where g is an append-only graph, and g' now points to the most recent state of g's transactor.

An error should be thrown if (mutablility g) != ::igraph/accumulate-only.

retract

(retract g to-retract) -> g', where g is an append-only graph, and g' now points to the most recent state of g's transactor.

An error should be thrown if (mutablility g) != ::igraph/accumulate-only.

The IGraphSet protocol

It will make sense for many implementations of IGraph also to implement the basic set operations, defined in IGraphSet. Set operations may not be suitable between very large graphs.

For purposes of demonstration, let's assume a second graph other-eg:

> (igraph/normal-form other-eg)
{:mary {:isa #{:person}, :likes #{:pork}},
 :waldo {:isa #{:person}, :likes #{:beer}}}
>

I think examples of each operation should serve to describe them.

Methods summary

  • (union g1 g2) -> A new graph with all triples from both graphs
  • (difference g1 g2) -> A new graph with triples in g1 not also in g2
  • (intersection g1 g2) -> A new graph with only triples shared in both graphs

union

> (igraph/normal-form (igraph/union eg other-eg))
{:john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:pork :chicken}},
 :waldo {:isa #{:person}, :likes #{:beer}}}
>

intersection

> (igraph/normal-form (igraph/intersection eg other-eg)) 
{:mary {:isa #{:person}}
>

difference

> (igraph/normal-form (igraph/difference eg other-eg))
{:john {:isa #{:person}, :likes #{:beef}}, 
 :mary {:likes #{:chicken}}}
>
> (igraph/normal-form (igraph/difference other-eg eg))
{:mary {:likes #{:pork}}, :waldo {:isa #{:person}, :likes #{:beer}}}
>

Traversal

Clojure and other functional programming languages have a reduce idiom, which allows the user to create aggregations over a sequence by providing a "reducer function" expressing the relationship between each member of that sequence and the resulting aggregation.

IGraph defines a traverse function to allow the user to create aggregations over the contents of a graph by providing a traversal function, which is analogous to a reducer function, but is nessesarily a bit more involved.

  • (traverse g traversal context acc queue) -> acc'

  • (traverse g traversal acc queue) -> acc' ;; default context = {}

  • (traverse g traversal queue) -> acc' ;; default context = {}, default acc = []

    ... traversing g per the traversal function, starting with the first element of queue, possibly informed by context.

This function will repeatedly call the traversal function until queue is empty, returning the final value for acc. Each call to the traversal function returns modified versions of context, acc and queue.

To illustrate traversal, let's expand on our eg graph by adding some type structure:

Assume we have a graph called 'eg-with-types':

> (def eg-with-types 
    (add eg
      [[:person :subClassOf :thing]
       [:beef :subClassOf :meat]
       [:chicken :subClassOf :meat]
       [:meat :subClassOf :food]
       [:beer :subClassOf :beverage]
       [:beverage :subClassOf :consumable]
       [:food :subClassOf :consumable]
       [:consumable :subClassOf :thing]]))
eg-with-types
> (eg-with-types)
{:consumable {:subClassOf #{:thing}},
 :beef {:subClassOf #{:meat}},
 :person {:subClassOf #{:thing}},
 :beer {:subClassOf #{:beverage}},
 :meat {:subClassOf #{:food}},
 :food {:subClassOf #{:consumable}},
 :beverage {:subClassOf #{:consumable}},
 :pork {:subClassOf #{:meat}},
 :john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:chicken}},
 :chicken {:subClassOf #{:meat}}}

Our eg-with-types now provides a bit more context for what's going on with our heroes John and Mary. Note that :isa and :subClassOf differ in their domain. :isa relates an instance to its class, while :subClassOf relates a class to its parent.

The traverse function

Here's an example of how the traverse function works, starting with a traversal function we'll call subClassOf*, which follows and accumulates all :subClassOf links, starting with an initial queue of say, [:meat :beer]:

> (igraph/traverse eg-with-types subClassOf* {} #{} [:meat :beer])
#{:consumable :beer :meat :food :beverage :thing}
>

The arguments for traverse are

  • g - an invariant graph
  • traversal-fn - A function [g c acc q] -> [c' acc' q'], defining the logic of each step in the traversal
  • context - (optional) a map holding the traversal history plus whatever traversal-fn may want to track. Default is {}
  • acc - (optional) accumulates the resulting value of the traversal. Default is [].
  • queue - the starting queue

Traversal functions

The traversal function takes 4 arguments and returns a vector of length 3.

> (subClassOf* eg-with-types {} #{} [:meat])
[{} #{:meat} (:food)]
>
> (subClassOf* eg-with-types {} #{:meat} '(:food))
[{} #{:meat :food} (:consumable)]
>

The first argument is the invariant graph itself.

The second argument (and first element returned) is the context, which subClassOf* leaves unchanged. Context is used by traverse to avoid cycles, and will be explained in detail below. More sophisticated traversal functions may use the context as a kind of blackboard.

The third argument (and precursor to the second element returned) is the value to be accumulated, identical to its counterpart in the reduce idiom.

The fourth argument (and precursor to the third element returned) is the traversal queue. It must be sequential, and may be ordered in any way that makes sense. An empty queue signals and end of the traversal, at which point traverse will return the value of the accumulator.

Here's a possible definition of subClassOf*:

(defn subClassOf* [g c acc q]
  "Traversal function to accumulate super-classes."
  (let [elt (first q)]
    [c                      ;; context is unchanged
     (conj acc elt)         ;; updating the accumulator
     (reduce conj 
       (rest q) 
       (g elt :subClassOf)) ;; adding super-classes to the queue
       ]))

Context

The context argument to traverse and its traversal function is a map containing key-values which may inform the course of the traversal, but are not part of the accumulated value. This will include:

  • :history set by traverse, and updated to hold all elements encountered in the course of the traversal. In order to avoid cycles, any element in the history will be skipped should it ever re-appear at the head of the queue.

The traverse function also supports these optional keys in the context:

  • :skip? (optional), a function (fn[x] ...) -> truthy, applicable to the head of the queue, which will override :history.

  • :seek (optional), a function (fn [context acc]...) -> acc'. If specified, this function will be called at the beginning of each traversal, and if truthy and non-empty, the traversal will end immediately with the value of acc.

In addition, the traversal function may use the context as a blackboard to communicate between iterations of the traversal. For example, you may want to prune and re-order your queue based on a set of heuristics, details of which are stored in the context.

The queue

The queue argument must be sequential, but is otherwise unrestricted. An empty queue signals the end of the traversal, at which point traverse will return the accumulated value.

Note that conj-ing to a vector in the traversal function suggests a breadth-first traversal, while conj-ing to a seq suggests a depth-first tranversal.

More sophisticated traversal functions may use the context to inform logic to prune and re-order the queue to optimize the traversal.

Traversal utilities

IGraph provides utilities to express several common types of traversal functions.

transitive-closure

  • (trasitive-closure p) -> (fn [g context acc to-visit] ...) -> [context' acc' queue'],

    This returns a traversal function which will accumulate all o s.t. any s in the queue is associated with o through zero or more p links.

So in the example above, the subClassOf* function could be defined thus:

(def subClassOf* (igraph/transitive-closure :subClassOf))

traverse-link

  • (traverse-link p) -> (fn [g context acc queue] ...) -> [context acc' []],

The function returned here will accumulate all o s.t. for all s in queue, (g s p o) is truthy:

> (igraph/traverse 
    eg-with-types 
    (igraph/traverse-link :isa) 
    #{} 
    [:john :mary])
#{:person}
>

maybe-traverse-link

  • (maybe-traverse-link p) -> (fn [g context acc queue] ...) -> [context acc' []]

Matches 0 or 1 occurrences of p:

> (igraph/traverse eg-with-types 
    (igraph/maybe-traverse-link :isa) 
    #{} 
    [:john :mary])
#{:person :john :mary}
>

traverse-or

  • (traverse-or & ps) -> (fn [g context acc queue] ...) -> [context acc' []],

Where ps is one or more traversal functions, merging all of their outputs.

Keyword arguments are interpreted as an implicit traverse-link.

> (def subsumed-by (igraph/traverse-or :isa :subClassOf))
subsumed-by
> (igraph/traverse eg-with-types subsumed-by #{} [:john])
#{:person}
>
> (igraph/traverse eg-with-types subsumed-by #{} [:meat])
#{:food}
>

Traversal composition with t-comp

Composition functions are composable with a 'short form' and a 'long form'.

short form

Short-form composition can be used when the traversal function meets the following criteria:

  • None of the component functions manipulate the traversal context
  • Each component function accumulates a sequential value suitable to serve as the initial queue of the component function that follows it.

Such functions can be called as a simple vector:

> (def instance-of 
    (igraph/t-comp [:isa (igraph/transitive-closure :subClassOf)]))
>
> (igraph/traverse eg-with-types instance-of #{} [:john])
#{:person :thing}
>

long form

In cases where you want to compose a traversal function that cannot meet the criteria above, then instead of passing to traversal-comp a vector of traversal functions, you pass in a map with the following keys:

{ :path  [:<traversal-stage-1> :<traversal-stage-2> ...]
   :<traversal-stage-1> {:fn <traversal-fn>
                         :doc <docstring> (optional)
                         :into <initial accumulator> (default [])
                         :local-context-fn <context> (default nil)
                         :update-global-context (default nil)
                      }
   :<traversal-stage-2> ...
   ...
 }

A call to (t-comp [:a :b :c]) is equivalent to calling (t-comp {:path [:a :b :c]}).

These parameters should allow you as much control as you need over the flow of contexts between each stage of traversal, and over the flow of outputs from any one stage into the input queue of its next stage.

However, most of the time, the short form is sufficient, and at this point, the long form has not been tested heavily.

the :path parameter

This is a vector of traversal function specifications. Each traversal function specification must be either:

  • A traversal function
  • A keyword with an entry in the long-form map
  • A keyword eligible as an implicit traverse-link

If the traversal function specification is itself a function, it will be applied directly.

If the traversal function specification is a keyword, and the t-comp map has a matching entry for that keyword, it will look for and interpret a map with the parameters described in the next section.

If the spec is a keyword without an entry in the long-form map, it is assumed to be a candidate for an implicit traverse-link, i.e. a graph element in 'p' position in g.

traversal specification parameters
  • :fn - a traversal function
  • :doc (optional) - a docstring
  • :into (optional) - a container to which the output should be coerced (default [])
  • :local-context-fn (optional) - a function [global-context] -> local context producing the context for this stage of the traversal.
  • :update-global-context (optional) - a function [global-context local-context] -> global-context', capturing whatever aspects of the current stage of traversal may be of interest to subsequent stages.

Using traversal functions as a p argument to invoke

Recall that implementations of IGraph should provide invoke functions with 0-3 arguments.

Two of these functions involve specification of a p parameter:

(g s p) -> {<o>...}

(g s p o) -> truthy.

This is informed by a multimethod dispatched on whether p is a function.

  • (match-or-traverse g s p) -> #{...}
  • (match-or-traverse g s p o) -> truthy

A typical declaration for an IGraph implementation will contain these two method declarations:

  #?(:clj clojure.lang.IFn
     :cljs cljs.core/IFn)
  ...
  (invoke [g s p] (igraph/match-or-traverse g s p))
  (invoke [g s p o] (igraph/match-or-traverse g s p o))
  ...

If the p argument is a function, then p will be expected to match the signature of a traversal function, and the output of the method will be the value of its traversal, starting with queue [s].

If p is not a function it will be matched directly against elements of the graph.

So given the traversal functions in the examples above:

> (eg-with-types :beef subClassOf*)
#{:consumable :beef :meat :food :thing}
>
> (eg-with-types :beef subClassOf* :food)
:food
>
> (eg-with-types :john (igraph/t-comp [:likes subClassOf*]))
#{:consumable :beef :meat :food :thing} ;; classes of stuff John likes
>

cardinality-1 utilites

Requiring normal form to provide a set as its 3rd-tier representation has the advantage of ensuring that the normal form is as simple and regular as possible, and makes it easy to think about set operations over graphs. However, it can be a bit unwieldy when dealing with the many cases where the descriptive map's keys reliably map to a single scalar value.

The following utilities are provided to help with this:

  • (unique [x]) -> x - translates a singleton sequence to its only value
  • (flatten-description (g s)) Automatically translates the p-o description into a simple k-v mappings wherever only a single v exists.
  • (normalize-flat-description m) is the inverse of flatten-description.
  • (assert-unique g s p o) - replaces one singleton object with another.
    • there are analogous functions assert-unique! and claim-unique for Mutable and Accumlate-only protocols respectively

unique

The unique function takes a sequence and an optional on-ambiguity argument. Default on-ambiguity throws ex-info of type ::igraph/Non-unique.

> (eg-with-types :john :isa)
{:person}
>
> (igraph/unique (eg-with-types :john :isa))
:person
>
> (igraph/unique (eg-with-types :beef subClassOf*))
Execution error (ExceptionInfo) at ont-app.igraph.core/unique$fn (core.cljc:640).
Unique called on non-unique collection
>
> (igraph/unique (eg-with-types :beef subClassOf*)
                 first) ;; arbitrary disambiguation
:consumable

Sometimes defining the as an alias for unique reads better, and is easier to type:

> (def the igraph/unique)
> (the (eg-with-types :john :isa))
:person
>

flatten-description

(igraph/flatten-description (eg-with-types :john))
{:isa :person, :likes :beef}
>
> (let [g (igraph/add 
             eg 
            [:john :likes :beer :has-vector [1 2 3]])
        ]
    (igraph/flatten-description (g :john)))
{:isa :person, :likes #{:beef :beer}, :has-vector [1 2 3]}
>

normalize-flat-description

This is the inverse of flatten-description:

> (igraph/normalize-flat-description 
    {:isa :person, :likes #{:beef :beer}, :has-vector [1 2 3]})
{:isa #{:person}, :likes #{:beef :beer}, :has-vector #{[1 2 3]}}
>
> (let [g (igraph/add 
             eg 
             {:john (igraph/normalize-flat-description {:likes :beer})})
        ]
    (g :john))
{:isa #{:person}, :likes #{:beef :beer}}
>

assert-unique

We can replace one singleton value with another using (assert-unique g s p o) -> g':

> (let [g (igraph/assert-unique eg :john :isa :man)]
    (g :john))
{:likes #{:beef}, :isa #{:man}}
>

I/O

In general writing the normal form of a graph to a stream and applying the reader to it on the other end should be fairly straightforward. Any predicates bearing reader-choking objects will of course need to be filtered out.

At this point, only the :clj platform is directly supported with a pair of functions to read/write to the file system.

write-to-file

(write-to-file [path g] ...) -> path

Will write an edn file with the normal form contents of g.

read-from-file

(read-from-file [g path] ...) -> g'

Will read the normal form contents of path into g.

Other utilities

reduce-spo

  • (reduce-spo f acc g) -> acc', such that f is called on each triple in g. Where f := (fn [acc s p o]...) -> acc'. Cf. reduce-kv.
> (defn tally-triples [tally s p o]
    (inc tally))
> (igraph/reduce-spo tally-triples 0 eg)
4

Implementations

The ont-app.igraph.graph module makes one implementation of IGraph available without any additional dependencies, and so far there are four other libraries in the ont-app project which implement this protocol.

Other implementations are planned, and I'd be interested to learn of any implementations published by other parties.

ont-app.igraph.graph/Graph

The IGraph library comes with ont-app.igraph.graph, whose Graph deftype is a very lightweight implementation of IGraph.

Its native representation is just Normal Form. Any hashable object can technically be provided for any s, p, or o, but be advised that other IGraph implementations often expect to keep non-identifiers a the "o" level.

(require '[ont-app.igraph.graph :as g])

Graph creation

Use make-graph to create a new graph, with an optional :contents argument.

> (def eg (g/make-graph))
eg
> (eg)
{}
>
> (def eg
    (g/make-graph 
      :contents {:john {:isa #{:person}, :likes #{:beef}},
                 :mary {:isa #{:person}, :likes #{:chicken}}}
eg
> (eg)
{:john {:isa #{:person}, :likes #{:beef}},
 :mary {:isa #{:person}, :likes #{:chicken}}}
>

The :contents argument must be in Normal Form.

Querying

Querying is done with a very simple vector-of-triples graph pattern using keywords starting with ":?" to serve as variables. It returns an unordered set of binding maps. This is very minimalistic. Any selecting, ordering, grouping or aggregation needs to be done downstream from the call.

> (igraph/query eg [[:?liker :likes :?liked]])
#{{:?liker :john, :?liked :beef} 
  {:?liker :mary, :?liked :chicken}}
>

Traversal functions can be specified in p position:

> (igraph/query eg-with-types [[:?liker :likes ?liked]
                               [?liked subClassOf* :?liked-class]])
#{{:?liked :beef, :?liked-class :consumable, :?liker :john}
  {:?liked :beef, :?liked-class :beef, :?liker :john}
  {:?liked :chicken, :?liked-class :food, :?liker :mary}
  {:?liked :chicken, :?liked-class :chicken, :?liker :mary}
  {:?liked :chicken, :?liked-class :consumable, :?liker :mary}
  {:?liked :beef, :?liked-class :meat, :?liker :john}
  {:?liked :beef, :?liked-class :food, :?liker :john}
  {:?liked :beef, :?liked-class :thing, :?liker :john}
  {:?liked :chicken, :?liked-class :thing, :?liker :mary}
  {:?liked :chicken, :?liked-class :meat, :?liker :mary}}
>

sparql-client

https://github.com/ont-app/sparql-client

Implements a mutable IGraph for a SPARQL endpoint. Initializtion requires configuring query and update endpoints, and the query language is SPARQL.

Keyword identifiers are expected to be namespaced, and rely on the ont-app/vocabulary library, which uses namespace metadata to intercede between Clojure namespaces and RDF namespaces.

Set operations are not supported.

igraph-jena

https://github.com/ont-app/igraph-jena

Implements a mutable IGraph for Apache Jena.

Keyword identifiers are expected to be namespaced, and rely on the ont-app/vocabulary library, which uses namespace metadata to intercede between Clojure namespaces and RDF namespaces.

Set operations are supported.

Currently supports Jena version 4.

datascript-graph

https://github.com/ont-app/datascript-graph

This implements IGraph for a datascript native representation, and may as such may need to be initialized with some schema declarations. Query language is datalog. Immutable, with set operations.

datomic-client

https://github.com/ont-app/datomic-client

This implements IGraph for the Datomic Client API. The query language is datalog. Mutability model is Accumulate Only. Set operations are not supported.

igraph-grafter

https://github.com/ont-app/igraph-grafter

A port of the IGraph protocols to Grafter.

Testing support

The ont-app.igraph.test-support module provides utilities to developers of downstream implementations to confirm compliance with the various protocols defined here.

The graph_test.cljc file should serve as an example.

It starts with an initialized instance of ont_app.igraph.graph.Graph intended to hold a report containing the results of a battery of tests. This report should be initialized with a triple to declare a function [data] -> test-graph, which should take various bodies of canonical test data and return an instance of the graph under examination....

(require [ont-app.igraph.test-support :as ts])

(defn make-test-graph
  "Creates an instance of the graph I want to test"
  ^ont_app.igraph.graph.Graph [data]
  (g/make-graph :contents data))

(defn make-standard-report
  "Creates a configured report graph."
  []
  (-> 
   (g/make-graph)
   (igraph/add [::ts/StandardIGraphImplementationReport
                ::ts/makeGraphFn make-test-graph])))

Where:

  • the ::ts/makeGraphFn property ranges over fn [data] -> graph
  • data is an appropriate argument to one of the add methods
  • graph is an instance of the IGraph impelementation to be tested

Then we can apply a battery of standard tests, passing in the report and collecting test results.

(deftest standard-implementation-tests
  "Standard tests against examples in the IGraph README for immutable set-enabled graphs"
  (let [report (-> (make-standard-report)
                   (ts/test-readme-eg-access)
                   (ts/test-readme-eg-mutation)
                   (ts/test-readme-eg-set-operations)
                   (ts/test-readme-eg-traversal)
                   (ts/test-cardinality-1)
                   ;; Watch this space for bug-fix tests
                   )
     
        ]
    ;; You could also run (ts/run-standard-implementation-tests))
    ;; `report` with be a graph of test results, some of which might be of type Failed...
    ...

The we can query for errors in the report...

    (is (empty? (ts/query-for-failures report)))))

If it's not empty, the bindings returned will contain descriptions of the failures with hopefully helpful information.

When your graph requires a schema

Some graph representations such as datascript and datomic require that you provide a schema which may be part of your graph content.

In such cases you should provide a triple like this:

(add report [::ts/StandardIGraphImplementationReport ::ts/schemaGraph <schema-graph>])

Where schema-graph contains a graph with just the schema content. Various tests will use this to filter out this stuff when comparing test results to canonical data.

Developer notes

The Makefile has targets for most of the usual stuff.

It presumes that your ~/.clojure/deps.edn has aliases as follows:

{
 :aliases {
           ;; Borkdude's Kondo linter
           ;; typical usage:
           ;; clojure -M:kondo --lint src
           ;; For help: clojure -M:kondo  --help
           :kondo
           {:extra-deps {clj-kondo/clj-kondo {:mvn/version "RELEASE"}}
            :main-opts  ["-m" "clj-kondo.main"]
            }
           ;; Document generator
           ;; Call with clojure -X:codox
           :codox
           {
            :extra-deps {codox/codox {:mvn/version "0.10.8"}}
            :exec-fn codox.main/generate-docs
            :exec-args {:output-path "doc"}
            }
           ;; outdated
           ;; c.f. Lein ancient
           ;; clojure -M:outdated --help for help
           ;; Typical usage: clojure -M:outdated
           :outdated
           {
            ;; Note that it is `:deps`, not `:extra-deps`
            :deps {com.github.liquidz/antq {:mvn/version "RELEASE"}}
            :main-opts ["-m" "antq.core"]
            }
           } ;; /end aliases

}

The nvd target presumes installation of https://github.com/rm-hull/nvd-clojure

Future work

  • Ports to loom, ubergraph, and other graph-oriented libraries
  • There will be a regime for providing annotations for reified triples (for weights and such).
  • Ports to table-based representations
  • igraph.graph will have query planning and indexing
  • Some kind of a scheme to bring all the various query formats under a single tent

Acknowledgements

Thanks to Ram Krishnan for his feedback and advice.

License

Copyright © 2019-22 Eric D. Scott

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

Natural Lexicon logo

Natural Lexicon logo - Copyright © 2020 Eric D. Scott. Artwork by Athena M. Scott.

Released under Creative Commons Attribution-ShareAlike 4.0 International license. Under the terms of this license, if you display this logo or derivates thereof, you must include an attribution to the original source, with a link to https://github.com/ont-app, or http://ericdscott.com.

About

Defines a protocol for viewing data as Subject-Predicate-Object graphs, with a simple typedef implementing same

Resources

License

Stars

Watchers

Forks

Packages

No packages published