Updates readme with info about versioning.
gunnarvelle committed Sep 12, 2023
1 parent 1fa7740 commit b16123c
Showing 1 changed file with 46 additions and 7 deletions.
53 changes: 46 additions & 7 deletions README.md
@@ -28,8 +28,8 @@ hierarchy with new types later.

The taxonomy data model consists of *entities* and *connections* between entities.

The central entities in the taxonomy are Node (Subject, Topic, Resource) and Resource-type. The taxonomy stores metadata
for each entity, such as name and content URI. Translations of names can also be stored.
The central entities in the taxonomy are Node (Subject, Topic, Resource, Programme, Node) and Resource-type. The taxonomy
stores metadata for each entity, such as name and content URI. Translations of names can also be stored.

In addition to the entities, the taxonomy stores the connections you make between entities. Each connection also has
metadata attached to it, such as which entities are connected, and whether this connection is the primary connection
@@ -84,7 +84,7 @@ flowchart TB
resT3 --> pRes2
```

### Subjects and topics
### Nodes (Subjects, topics and resources)

First, create a subject with the name Mathematics with a POST call to `/v1/nodes` and nodetype SUBJECT. When this call
returns you'll get a location. This location contains the path to this node within the nodes resource, e.g.,
@@ -200,7 +200,7 @@ When you get all resources for a subject or topic you can choose to get only res
corresponding to the ID of Articles will give you a list of three entities: Sine and Cosine, What is probability, and Adding probability.

### Multiple parent connections
Multiple parent connections for nodes are not allowed, except for resources! But as some topics such as Statistics may
Multiple parent connections are not allowed for topics, although other node types may have them. But as some topics such as Statistics may
be relevant in several subjects, it is possible to have the same contentURI for several topics. This allows us to create
a structure where Statistics is a topic in Mathematics, but it is also a topic in Social Studies.

@@ -219,7 +219,7 @@ a structure where Statistics is a topic in Mathematics, but it is also a topic i
```

In the figure above, primary connections are shown in bold while secondary connections are shown in normal width. Only
resources can have primary connections, and have only *one* primary parent. If you set a different connection to
resources can have non-primary connections, and each resource has only *one* primary parent. If you set a different connection to
be primary, another primary connection will become secondary.

The figure above shows how Statistics is a topic in both Mathematics and Social Studies. If you list all the topics in
@@ -313,9 +313,9 @@ sub topics and resources below Statistics:


A topic can be marked as a root context by making a POST call to `/v1/contexts`, and removed by making a DELETE call to the same.
(Additionally you could update the root-variable on the node by making POST to `/v1/nodes/{id}`)
(Additionally, you can update the context variable on the node by making a PUT call to `/v1/nodes/{id}`.)
All subjects are defined as root contexts. Please note that POST and DELETE calls to `/v1/contexts` do not create or remove nodes,
it only marks those nodes as being a root context or not.
they only mark those nodes as being a context or not.

To list all root contexts, make a GET call to `/v1/contexts`. The contexts will be listed with their ID, (translated) name and path.

@@ -378,3 +378,42 @@ The results will contain *Equations with one variable*, *Equations with two vari

If a resource is used in several subjects, you can tag it as core or supplementary material in each of the subjects
separately.


## Versioning of a complete API

Since all nodes in the tree/graph are connected, data versioning is handled with schema-based multi-tenancy. The tenancy
is connected to a Version entity in the database, and all requests to `/v1/versions` are routed to the base schema,
`taxonomy_api`. This makes all version handling consistent.

When a new version is created, a custom script is triggered that clones another schema, based on the parameters provided.
This script is fetched from [pg-clone-schema](https://github.com/denishpatel/pg-clone-schema/blob/master/clone_schema.sql).
Versions can be based on either the base schema or another version. A version can have one of three statuses: BETA, PUBLISHED,
or ARCHIVED. When a BETA version is published, the current PUBLISHED version is set to ARCHIVED. There is no practical limit to
how many versions can be present in the application. Each version gets a corresponding hash value, and the schema name
is based on this value, e.g. `taxonomy_api_abcd`.
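
The status handling described above can be sketched roughly as follows. This is an illustrative model only, not the service's actual code: the `Version` class, the `publish` function and the rule that only a BETA version may be published are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    BETA = "BETA"
    PUBLISHED = "PUBLISHED"
    ARCHIVED = "ARCHIVED"


@dataclass
class Version:
    """Hypothetical in-memory stand-in for a row in the version table."""
    hash: str
    status: Status

    @property
    def schema_name(self) -> str:
        # The schema name is derived from the hash, e.g. taxonomy_api_abcd.
        return f"taxonomy_api_{self.hash}"


def publish(versions: list[Version], target_hash: str) -> None:
    """Publish one version and archive the currently published one, if any."""
    target = None
    for version in versions:
        if version.hash == target_hash:
            target = version
    if target is None:
        raise KeyError(f"unknown version hash: {target_hash}")
    # Assumption: only BETA versions are eligible for publishing.
    if target.status is not Status.BETA:
        raise ValueError("only a BETA version can be published")
    for version in versions:
        if version.status is Status.PUBLISHED:
            version.status = Status.ARCHIVED
    target.status = Status.PUBLISHED
```

After `publish(versions, "ef01")`, the previously published version is ARCHIVED and `ef01` is the only PUBLISHED one.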

An HTTP header, `VersionHash`, is used to select which schema to communicate with. If no header is provided, the code defaults
to the published version if there is one. If no versions have been created, the base schema is used. Values for the header can
be found by fetching the list of versions from the versions endpoint.

Note that only GET requests use the published version when no header is provided. PUT and POST use the base schema. This is to
support backwards compatibility for clients. Clients are advised to always use the header value `VersionHash: default` when
interacting with the API, or to specify the correct version hash from the version table.
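
As a rough illustration of the fallback rules above, the schema selection could be modelled like this. The function name and signature are hypothetical, and the sketch assumes that every non-GET method without a header goes to the base schema (the text only names PUT and POST). The special value `default` is not modelled here.

```python
from typing import Optional

BASE_SCHEMA = "taxonomy_api"


def resolve_schema(method: str,
                   version_hash: Optional[str],
                   published_hash: Optional[str]) -> str:
    """Pick a schema name from the request method and the VersionHash header.

    version_hash   -- value of the VersionHash header, or None if absent
    published_hash -- hash of the currently published version, or None
    """
    if version_hash is not None:
        # An explicit hash addresses the schema of that version directly.
        return f"{BASE_SCHEMA}_{version_hash}"
    if method.upper() == "GET" and published_hash is not None:
        # Reads without a header fall back to the published version.
        return f"{BASE_SCHEMA}_{published_hash}"
    # Writes without a header, or no published version: use the base schema.
    return BASE_SCHEMA
```

For example, `resolve_schema("GET", None, "abcd")` selects `taxonomy_api_abcd`, while `resolve_schema("POST", None, "abcd")` selects `taxonomy_api`.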

### Publishing changes

When you have a published version and want to make the changes available to clients, you can make either a full or a partial
publish of specified changes. A full publish is the easiest and fastest, and is done simply by making a new version based
on the base schema and then marking that version as published.

A partial publish is more time-consuming, but gives more control over what is being published. This is done by using the
`/v1/nodes/{id}/publish` endpoint. In addition to the id of a node in the path, this endpoint takes two URL parameters: sourceId
and targetId. These parameters specify the publicIds of the source and target versions between which the node is to be published. If
sourceId is omitted, the base schema is used.
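
A minimal sketch of how a client might assemble the URL for a partial publish. The helper name, the host and the ID values are made up for illustration, and the sketch assumes targetId is always supplied. The HTTP method for this endpoint is not stated above, so only the URL is built.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def publish_url(base_url: str,
                node_id: str,
                target_id: str,
                source_id: Optional[str] = None) -> str:
    """Build the URL for /v1/nodes/{id}/publish with its query parameters."""
    params = {"targetId": target_id}
    if source_id is not None:
        # When sourceId is left out, the base schema is used as the source.
        params["sourceId"] = source_id
    # Keep ':' readable in the path segment; everything else is escaped.
    path_id = quote(node_id, safe=":")
    return f"{base_url}/v1/nodes/{path_id}/publish?{urlencode(params)}"
```

Calling `publish_url("http://localhost:5000", "urn:topic:1", "v2")` omits sourceId, so the node would be published from the base schema to the version identified by `v2`.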

The endpoint goes recursively through the node and _all_ its children and copies all data from source to target. This is done
by registering the publicId of all node connections and nodes in the changelog table. A scheduled task in ChangelogService
checks this table every second and starts processing changes whenever any are available. This makes it possible to share
the data copying between several servers in a cluster. The further down a node tree a node is published, the less time
the copying takes, since fewer descendants have to be copied.
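
The recursive registration can be pictured with the following sketch. It is not the ChangelogService implementation: the `Node` class, the function name and the ID formats are hypothetical, and the structure is assumed to be a plain tree, so nodes with several parents are not de-duplicated.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    public_id: str
    # Each child is a (connection publicId, child node) pair.
    children: list[tuple[str, "Node"]] = field(default_factory=list)


def collect_changelog_ids(node: Node) -> list[str]:
    """Return the publicIds of a node, its connections and all descendants."""
    ids = [node.public_id]
    for connection_id, child in node.children:
        ids.append(connection_id)
        ids.extend(collect_changelog_ids(child))
    return ids
```

Publishing from a node deeper in the tree yields a shorter list, which is why copying from a lower node takes less time.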
