This repository has been archived by the owner on Mar 15, 2021. It is now read-only.

RFC0018: Context resource #31

Merged 3 commits on Nov 7, 2018.

2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@ REST API).
|0015|[Schema resource](content/schema-resource/index.md)|rejected|2018-08-22|#28|
|0016|[Blob resource](content/blob-resource/index.md)|approved|2018-09-28|#29|
|0017|[Full item redaction](content/full-item-redacted/index.md)|approved|2018-09-25|#30|
-|0018|Context resource|draft|-|#31|
+|0018|[Context resource](content/context-resource/index.md)|approved|2018-11-07|#31|
|0019|[Boolean datatype](content/boolean-datatype/index.md)|approved|2018-08-29|#32|
|0020|[Blob normalisation](content/blob-normalisation/index.md)|approved|2018-09-07|#33|
|0021|[Archive resource](content/archive-resource/index.md)|approved|2018-10-31|#35|
326 changes: 326 additions & 0 deletions content/context-resource/index.md
@@ -0,0 +1,326 @@
---
rfc: 0018
start_date: 2018-08-13
decision_date: 2018-11-07
pr: openregister/registers-rfcs#31
status: approved
---

# Context resource

## Summary

This RFC proposes a context resource to surface the current metadata of the
register.

Dependencies:

* [RFC0013: Multihash](../multihash/index.md)


## Motivation

Currently, the register metadata is not exposed consistently. Some is exposed
via the register resource, some via the field register and some from the
register register.

Also, the current specification and implementation don't agree on what the
register resource is.

The main driver for this RFC is to have a computed resource for the latest
metadata in a way that makes the register self-contained (i.e. a user should
not need to interact with any other register to get the metadata for a
register).


## Explanation

The **context** is the metadata snapshot that applies to a given log size.

### Definition

```elm
type alias Context =
    { id : Name
    , copyright : Maybe String
    , custodian : Maybe String
    , description : Maybe String
    , hashingAlgorithm : HashingAlgorithm
    , licence : Maybe String
    , rootHash : Hash
    , schema : Schema
    , statistics : Statistics
    , status : Status
    , title : Maybe String
    }
```

Each attribute is explained in its own section below.


#### Id

* Type: Name (defined in the specification as `[a-z][a-z0-9-]*`; the
  future specification will make this clearer).

The register identifier. As things stand, this value acts as a unique
identifier within the environment the register belongs to. This RFC does not
intend to change that.
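
As an illustration only, the `Name` pattern quoted above can be checked with a regular expression. This is a minimal sketch: the function name `is_valid_name` is hypothetical, and it assumes the pattern must match the whole value rather than a prefix.

```python
import re

# Pattern quoted in this RFC for the Name type.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_name(value: str) -> bool:
    """Return True when the whole string matches the Name pattern."""
    # Assumption: the full value must match, not just a prefix.
    return NAME_PATTERN.fullmatch(value) is not None


for candidate in ("multihash", "start-date", "Start-Date", "9lives", ""):
    print(candidate, is_valid_name(candidate))
```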


#### Copyright

* Type: Optional String

The register copyright. E.g. `© Crown copyright`. Similar to the `copyright`
field in the Register register. Must be present for any published register.


#### Custodian

* Type: Optional String

The same value currently exposed in the Register resource. Must be present for
any published register.


#### Description

* Type: Optional String

> **Review comment:** Any reason why this can't be mandatory if we return it here? A register should always have a human-readable description; otherwise, how do people know to use it?
>
> **Contributor Author:** Well, ideally yes, and it is something we will probably want to enforce for registers that are part of our catalogue, but I can foresee registers not having a description at early or immature stages.


The human readable description of the register.


#### Hashing algorithm

* Type: HashingAlgorithm

The hashing algorithm used throughout the register.

> **Review comment:** I was wondering if we actually need this. When we've implemented multihash, I think you should be able to determine the hashing algorithm used by looking at its root hash.
>
> **Contributor Author:** I think it's a nice thing to offer and not at a big cost. Why do you think it shouldn't be there? The fact that you can infer it doesn't mean it is useless or harmful.
>
> **Review comment:** Fair enough, it's just a general preference for less information returned, but I don't have strong opinions about this.


```elm
type alias HashingAlgorithm =
    { id : Name
    , functionType : UVarint
    , digestLength : Int
    }
```

This depends on [RFC0013: Multihash](../multihash/index.md).

|Name|Description|
|-|-|
|`digest-length`|The byte length of the digest. E.g. `0x20`|
|`function-type`|The identifier of the hash function. E.g. `0x12`, `0xb220`|
|`id`| The name of the algorithm. E.g. `sha2-256`|

See also: [Multihash table function type
identifiers](https://github.com/multiformats/multihash/blob/master/hashtable.csv).
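
To show how the two numeric attributes relate to a multihash value, the following is a minimal sketch that splits a hex-encoded multihash into its function type, digest length and digest. It assumes the multihash layout of two unsigned varints followed by the digest; the helper names `read_uvarint` and `split_multihash` are hypothetical and not part of this RFC.

```python
from typing import Tuple


def read_uvarint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        # The low 7 bits carry data; the high bit marks continuation.
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def split_multihash(hex_value: str) -> Tuple[int, int, bytes]:
    """Return (function_type, digest_length, digest) for a hex multihash."""
    raw = bytes.fromhex(hex_value)
    function_type, position = read_uvarint(raw)
    digest_length, position = read_uvarint(raw, position)
    digest = raw[position:]
    if len(digest) != digest_length:
        raise ValueError("digest length does not match the prefix")
    return function_type, digest_length, digest


function_type, digest_length, _ = split_multihash(
    "1220e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
print(hex(function_type), digest_length)
```

A client could compare the decoded values with the `hashing-algorithm` object to confirm that both describe the same function.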


#### Licence

* Type: Optional String

The licence that applies to the data. Must be present for any published register.

This could be restricted to a suitable list of licences and validated with an
external tool: the technology would still allow free text, but each register
controller could enforce what is allowed.
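
The kind of external check described above could look like the sketch below. Everything here is an assumption for illustration: the allow-list contents, the function name `publication_problems`, and the idea of running the check before publication are not defined by this RFC. The required keys follow the "Must be present for any published register" notes on copyright, custodian and licence.

```python
from typing import Any, List, Mapping

# Hypothetical allow-list; this RFC does not define one.
ALLOWED_LICENCES = {"MIT", "OGL-UK-3.0"}

# Attributes this RFC says must be present for any published register.
REQUIRED_WHEN_PUBLISHED = ("copyright", "custodian", "licence")


def publication_problems(context: Mapping[str, Any]) -> List[str]:
    """List the reasons a context would fail the publication check."""
    problems = []
    for key in REQUIRED_WHEN_PUBLISHED:
        if not context.get(key):
            problems.append(f"missing {key}")
    licence = context.get("licence")
    if licence and licence not in ALLOWED_LICENCES:
        problems.append(f"licence not allowed: {licence}")
    return problems


draft = {"id": "multihash", "licence": "Custom licence"}
print(publication_problems(draft))
```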


#### Root hash

* Type: Hash

The root hash for the register. Note that signatures will be addressed in
another RFC. The root hash is not strictly part of the metadata; it is derived
from the data.

#### Schema

* Type: Schema

The set of attributes that define the data allowed in the register.

```elm
type alias Attribute =
    { id : Name
    , datatype : Datatype
    , title : Maybe String
    , description : Maybe Text
    }

type alias Schema =
    Set Attribute
```

> **Review comment:** This seems like what we refer to in other places as fields. Is there a deliberate attempt to move away from this to schema? If so, we should be consistent across other instances, e.g. RSF. If not, we should call it fields.
>
> **Review comment:** I'm also wondering whether this would prevent us from supporting another metadata standard.
>
> For example, if we wanted to provide things like csvw or json schema it would be great if we just had one resource and you could request the standard you want, using content types. But I'm not sure if that would fit this model where the schema is part of a larger object.
>
> **Contributor Author:** @MatMoore I did the research on using json schema, csvw schemas and json-ld, and the result is that Registers is not really 100% compatible with them due to our primitives. Both csvw and json-schema let you define your own patterns, which could work for us as a sort-of compatible translation, but it's messier than it seems.
>
> Now, csvw in particular allows you to provide schema + context, although probably not everything this context resource is offering. I'm not sure it's worth extracting (again) the schema just because of that.
>
> @gidsg It is deliberate indeed. The "fields" approach builds on a central vocabulary held in the fields register and it has been proven a mistake. Moving away from this central model to a local schema definition predates me, but I totally agree with it.
>
> Changing terminology to schema and attributes is important because a) it's not the same and b) when it is (i.e. field vs attribute) I want to be able to talk about previous concepts with minimal confusion.
>
> RSF is something we can change when we are ready and comfortable, but to my eyes it is not important enough to reject or hold this RFC. After all, RSF is an internal format that so far does not even have a stable specification.

|Name|Description|
|-|-|
|Id|Attribute identifier (e.g. `start-date`)|
|Datatype|Datatype identifier (e.g. `datetime`)|
|Title|Human readable attribute name (e.g. `Start date`)|
|Description|Human readable attribute description (e.g. `The date a record first became relevant to a register.`)|
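
As a rough sketch of how a client might load a schema, the code below turns a list of attribute objects into a mapping keyed by attribute `id`. It assumes that identifiers are unique within a schema, which the `Set Attribute` definition suggests but does not state; the names `Attribute` and `parse_schema` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Attribute:
    id: str
    datatype: str
    title: Optional[str] = None
    description: Optional[str] = None


def parse_schema(items: List[Dict[str, Any]]) -> Dict[str, Attribute]:
    """Build a schema keyed by attribute id, rejecting duplicate ids."""
    schema: Dict[str, Attribute] = {}
    for item in items:
        attribute = Attribute(
            id=item["id"],
            datatype=item["datatype"],
            title=item.get("title"),
            description=item.get("description"),
        )
        # Assumption: an attribute id may appear only once per schema.
        if attribute.id in schema:
            raise ValueError(f"duplicate attribute id: {attribute.id}")
        schema[attribute.id] = attribute
    return schema


schema = parse_schema([
    {"id": "start-date", "datatype": "datetime", "title": "Start date"},
    {"id": "name", "datatype": "name"},
])
print(sorted(schema))
```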


#### Statistics

* Type: Statistics

The summary of objects stored in the register. This overlaps with the Register
resource and is not strictly part of the metadata; it is derived from the data.


#### Status

* Type: Status (Active, Retired). Defaults to Active.

The status of the register. Either active or retired. This addresses the
problem of being able to retire an entire register (see
[issue #17](https://github.com/openregister/registers-rfcs/issues/17)).

```elm
type Status
    = Active { startDate : Timestamp }
    | Retired
        { startDate : Timestamp
        , endDate : Timestamp
        , replacement : Maybe Url
        , reason : Text
        }
```

> **@MatMoore (Aug 30, 2018):** I think this is the first time registers have linked to other registers by URL.
>
> This could cause problems if that URL breaks but the register is still available elsewhere.
>
> Another way of doing this would be to use a CURIE then rely on a catalog to resolve this to something (and this could be a catalog outside of GDS's control if GDS is no longer hosting the register register). Although then you couldn't replace a register with a non-register.
>
> **Contributor Author:** Well, it is the first time indeed. The intention is to point to the successor, which should be a fixed and clear appointment. I imagine the situation you describe being resolved by a regular HTTP 301 redirection. I'm not sure the indirection added by a CURIE is worth it.
>
> **Review comment:** Fair enough 👍
>
> **Review comment:** Right now I cannot see where this information would be persisted. Is there a companion to this that addresses how this is represented in RSF/Entry log? If not, I think this should be pulled out into a separate RFC that describes Status and its representation.
>
> **Review comment:** @arnau please can you address this comment?
>
> **Contributor Author:** @gidsg status must be part of the register metadata, which means it has to be encoded in RSF. The way to express that will highly depend on the metadata solution we end up having but, if it is in line with what we have right now, it will be a system entry with the appropriate key pointing to a blob with this information.

|Name|Description|
|-|-|
|Start date| Date when the register started.|
|End date| Date when the register was retired.|
|Replacement| A URL to a register that replaces the current one.|
|Reason| A human readable explanation of why the register is no longer active.|
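
One way a client might model the two status variants is sketched below. It reads the hyphenated keys used by the HTTP resource (`start-date`, `end-date`, `replacement`, `reason`) and assumes that the presence of `end-date` is what distinguishes a retired register, which this RFC does not state explicitly. The class and function names, and the sample URL, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class Active:
    start_date: str


@dataclass(frozen=True)
class Retired:
    start_date: str
    end_date: str
    reason: str
    replacement: Optional[str] = None


def parse_status(data: Dict[str, Any]) -> Union[Active, Retired]:
    """Map a status object to Active or Retired."""
    # Assumption: a status without "end-date" describes an active register.
    if data.get("end-date") is None:
        return Active(start_date=data["start-date"])
    return Retired(
        start_date=data["start-date"],
        end_date=data["end-date"],
        reason=data["reason"],
        replacement=data.get("replacement"),
    )


print(parse_status({"start-date": "2018-12"}))
print(parse_status({
    "start-date": "2018-12",
    "end-date": "2020-01",
    "reason": "Superseded by another register.",
    "replacement": "https://register.example/",  # hypothetical URL
}))
```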


#### Title

* Type: Optional String

The human readable name of the register.


### HTTP resource

***
#### Endpoint

```
GET /context
```

#### Parameters

|Name|Type|Description|
|-|-|-|
|`total-entries` | Optional `Integer`| The log size to compute the snapshot from. Note this only applies when the metadata can be versioned.|
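
For illustration, a request URL for this endpoint could be assembled as in the sketch below. The base URL `https://register.example` and the function name `context_url` are hypothetical; only the `/context` path and the `total-entries` parameter come from this RFC.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin


def context_url(base_url: str, total_entries: Optional[int] = None) -> str:
    """Build the URL of the context resource, optionally for a log size."""
    url = urljoin(base_url, "/context")
    if total_entries is not None:
        if total_entries < 0:
            raise ValueError("total-entries must not be negative")
        url += "?" + urlencode({"total-entries": total_entries})
    return url


print(context_url("https://register.example"))
print(context_url("https://register.example", total_entries=10))
```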


#### Response attributes

|Name|Type|
|-|-|
|`id`| Name |
|`copyright`| Optional String |
|`custodian`| Optional String |
|`description`| Optional String |
|`hashing-algorithm`| HashingAlgorithm |
|`licence`| Optional String |
|`root-hash`| Hash |
|`schema`| List Attribute |
|`statistics`| Statistics |
|`status`| Status |
|`title`| Optional String |

#### Hashing algorithm attributes

|Name|Type|
|-|-|
|`digest-length`| Integer |
|`function-type`| Integer |
|`id`| Name |

See also: [Multihash table function type
identifiers](https://github.com/multiformats/multihash/blob/master/hashtable.csv).

#### Attribute attributes

|Name|Type|
|-|-|
|`id`| Name |
|`datatype`| Datatype |
|`cardinality`| Cardinality |
|`title`| Optional String |
|`description`| Optional String |

#### Statistics attributes

|Name|Type|
|-|-|
|`total-entries`| Integer |
|`total-items`| Integer |
|`total-records`| Integer |

#### Status attributes

|Name|Type|
|-|-|
|`start-date`| Datetime |
|`end-date`| Optional Datetime |
|`replacement`| Optional Url |
|`reason`| Optional String |

***

***
**EXAMPLE:**


```http
GET /context HTTP/1.1
Accept: application/json
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "id": "multihash",
  "title": "The Multihash register",
  "description": "List of multihash codes.",
  "custodian": "IPFS team",
  "hashing-algorithm": {
    "id": "sha2-256",
    "function-type": 18,
    "digest-length": 32
  },
  "statistics": {
    "total-entries": 0,
    "total-items": 0,
    "total-records": 0
  },
  "copyright": "Copyright (c) 2016 Protocol Labs Inc.",
  "licence": "MIT",
  "root-hash": "1220e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "status": { "start-date": "2018-12" },
  "schema": [
    {"id": "name", "datatype": "name", "cardinality": "1"},
    {"id": "function-type", "datatype": "integer", "cardinality": "1"},
    {"id": "digest-length", "datatype": "integer", "cardinality": "1"}
  ]
}
```

***
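
The `root-hash` in the example above can be reproduced with a short sketch. It assumes that the root hash of an empty log is the SHA-256 digest of empty input (as in RFC 6962 Merkle trees) and that the value is the hex encoding of a multihash; the function name `multihash_sha2_256` is hypothetical.

```python
import hashlib


def multihash_sha2_256(data: bytes) -> str:
    """Hex-encode a sha2-256 multihash of the given bytes."""
    digest = hashlib.sha256(data).digest()
    # 0x12 is the sha2-256 function type; the digest length is 32 (0x20).
    # Both values fit in a single-byte unsigned varint.
    prefix = bytes([0x12, len(digest)])
    return (prefix + digest).hex()


example_root_hash = (
    "1220e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
# Assumption: an empty log hashes the empty byte string.
print(multihash_sha2_256(b"") == example_root_hash)
```

The leading `12` and `20` bytes correspond to the `function-type` (18) and `digest-length` (32) values in the example response.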

### Considerations

The context resource makes the register resource obsolete. Because users
depend on the register resource, it will remain available for as long as
necessary.

Once the context resource is available, the register resource should send a
warning HTTP header and a `Link` header with an alternate relation pointing to
the context resource.