Standardizing state changes in resources (history, undo, sync) #161

Open
joepio opened this issue Apr 3, 2020 · 7 comments

@joepio

joepio commented Apr 3, 2020

Solid specs standardize how to represent the current state of a resource (RDF, in some valid serialization format), but there is no part of the spec that describes how to store or share the deltas / changes in data / patches / transactions (I'm just calling them Deltas from now on).

Why store & standardize deltas

Having an append-only event log / ledger that describes every single mutation in a pod can provide some cool features:

  • Full time travel, on every resource (using Memento, perhaps?)
  • Resource versioning
  • Generate Activity Streams
  • Synchronizing data between systems (peer to peer!)
  • Lightweight synchronization of small changes between client and server (per-triple adjustments, instead of full resource replacement)
  • Easy (incremental) backups
  • Add new serialization options / state representations by adding systems that play back the delta log (e.g. an Elasticsearch loader)
  • Notification / alert systems
  • Easier debugging
  • Audit logs

One might argue that we don't need a standardized event system for most of these features - every Solid server implementation could create its own way of dealing with versioning, for example.

However, I think standardizing this would improve data portability. If these events are standardized, the user can maintain undo / version history across Solid servers. And besides individual advantages, it would enable more powerful and performant data synchronization. Even very large resources could be updated incrementally, triple for triple.

And besides, since RDF is a relatively simple model, I think standardizing this will not be too complicated.

What the standard should define

  • The model of a delta - both atomic (mutation to some field) and larger (set of mutations, in one batch)
  • How an RDF graph should be mutated upon playing back a set of deltas (i.e. the methods in deltas)
  • How to exchange such an event log
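
As a rough illustration of the first two points, here is a minimal sketch of a delta model and a playback function. The shape of the delta objects (`add` / `remove` arrays of triples) and the names `tripleKey` and `applyDeltas` are assumptions made for this example; they are not taken from any existing spec.

```javascript
// A triple is a plain object { s, p, o } of strings (IRIs or literals).
// A delta is a batch of atomic mutations: { remove: [...], add: [...] }.
// Both shapes are hypothetical, chosen only for illustration.

// Serialize a triple to a string so it can be stored in a Set.
function tripleKey({ s, p, o }) {
  return JSON.stringify([s, p, o]);
}

// Play back an ordered list of deltas against a graph (a Set of triple keys).
// Within a single delta, removals are applied before additions.
function applyDeltas(graph, deltas) {
  const next = new Set(graph);
  for (const delta of deltas) {
    for (const t of delta.remove ?? []) next.delete(tripleKey(t));
    for (const t of delta.add ?? []) next.add(tripleKey(t));
  }
  return next;
}

const me = "https://example.org/profile#me";
const log = [
  { add: [{ s: me, p: "https://schema.org/givenName", o: "Tim" }] },
  {
    remove: [{ s: me, p: "https://schema.org/givenName", o: "Tim" }],
    add: [{ s: me, p: "https://schema.org/givenName", o: "Timothy" }],
  },
];

// Replaying the whole log gives the current state;
// replaying only a prefix gives an earlier version.
const current = applyDeltas(new Set(), log);
const previous = applyDeltas(new Set(), log.slice(0, 1));
```

Replaying a prefix of the log is what would make versioning and time travel fall out of the same mechanism as synchronization.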

What this means for clients

Currently, (most) Solid apps write to a pod by writing a full RDF resource. This works fine for smaller documents, but it becomes very inefficient and error-prone when resources consist of more triples. Therefore, I think that clients should be able to send these state changes to their pod, and both the pod and the client app should be able to parse the delta and apply it to their RDF store.
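
To make the difference concrete, the sketch below computes what a client would need to send when one value in a resource changes. The function name `diffGraphs` and the `{ remove, add }` result shape are assumptions for this example, and it assumes the statements contain no blank nodes (those cannot be compared by plain string equality).

```javascript
// Each statement is one N-Triples line (a string).
// Returns the statements to remove and to add in order to turn `before` into `after`.
function diffGraphs(before, after) {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    remove: before.filter((t) => !afterSet.has(t)),
    add: after.filter((t) => !beforeSet.has(t)),
  };
}

const before = [
  '<https://example.org/doc#it> <https://schema.org/name> "Draft" .',
  '<https://example.org/doc#it> <https://schema.org/author> <https://example.org/profile#me> .',
];
const after = [
  '<https://example.org/doc#it> <https://schema.org/name> "Final" .',
  '<https://example.org/doc#it> <https://schema.org/author> <https://example.org/profile#me> .',
];

// Only the changed name statement ends up in the delta;
// the unchanged author statement is not transmitted at all.
const delta = diffGraphs(before, after);
```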

Ways to standardize event logs

Some initiatives already exist that aim to standardize how deltas should be serialized and interpreted.

Some things to keep in mind when considering (or designing) a delta standard:

  • How hard is it for (new) developers to comprehend and use?
  • How performant is it?
  • Can it be streamed?
  • Does it already exist / does it have working implementations?
  • Does it allow for encryption and signatures (useful for P2P)?

RDF-Delta

This standard consists of two concepts: RDF Patch and the RDF Patch log. It introduces a new serialization format, similar to Turtle, in which letters placed before statements encode the mutation to apply. It also supports header items, e.g. to reference previous commits.

An Apache Jena implementation and a CLI app already exist.
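
To give a feel for the format, the sketch below serializes a set of changes into RDF Patch-style text. The row codes used here (`H` for a header row, `TX` / `TC` for transaction begin and commit, `A` for add, `D` for delete) are how I understand the RDF Patch documentation; the function name `toRdfPatch`, its input shape, and the example identifiers are assumptions. Terms are expected to be already written in N-Triples syntax.

```javascript
// Serialize one change set as RDF Patch-style text.
// `id` and `prev` are patch identifiers; `del` and `add` are arrays of
// [subject, predicate, object] terms, already in N-Triples syntax.
function toRdfPatch({ id, prev, add = [], del = [] }) {
  const lines = [`H id <${id}> .`];
  if (prev) lines.push(`H prev <${prev}> .`); // link to the previous patch in the log
  lines.push("TX .");
  for (const [s, p, o] of del) lines.push(`D ${s} ${p} ${o} .`);
  for (const [s, p, o] of add) lines.push(`A ${s} ${p} ${o} .`);
  lines.push("TC .");
  return lines.join("\n");
}

const patchText = toRdfPatch({
  id: "uuid:0686c69d-8f89-4496-acb5-744f0157a8db", // made-up identifiers
  prev: "uuid:3ee0eca0-6d5f-4b4d-85db-f69ab1167aca",
  del: [["<https://example.org/profile#me>", "<https://schema.org/givenName>", '"Tim"']],
  add: [["<https://example.org/profile#me>", "<https://schema.org/givenName>", '"Timothy"']],
});
```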

linked-delta

linked-delta is serialized in n-quads and uses the fourth column to semantically describe how a triple should be changed (e.g. update the value, add it, remove it). We created this and use it in our e-democracy application Argu to communicate state changes (when resource attributes change) between back-end and front-end.

The main benefit of this solution is that it is lightweight and does not require a new serialization format, and n-quads is the RDF serialization format that's the easiest to write a parser for. Since the fourth column uses IRIs, the spec is inherently extensible: any IRI can be added, which means that in the future we might come up with many other methods than "add" or "replace". However, this might introduce complexity, since loaders (apps that play back the deltas) now might have to deal with unknown methods.

Some implementations exist: link-lib (browser, TypeScript) and linked_rails (Ruby on Rails, server side).

  // This is how you can describe and process a linked-delta in a JS app:
  store.processDelta([
    new Statement(
      subject, // https://timbl.inrupt.net/profile/card#me
      predicate, // https://schema.org/firstName
      object, // "Tim"
      ld("replace"), // => http://purl.org/linked-delta/replace
    ),
  ]);
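
For illustration, a loader that plays back such quads and copes with unknown methods might look roughly like the sketch below. This is not how link-lib is implemented: the function name `applyLinkedDelta`, the Map-based store, the simplified semantics of `add`, `replace` and `remove`, and applying statements in array order are all assumptions for this example.

```javascript
const LD = "http://purl.org/linked-delta/";

// store: Map from "subject predicate" to a Set of object values.
// quads: array of [subject, predicate, object, method] strings.
// Returns the quads whose method IRI was not recognized, so the caller
// can decide whether to ignore them, log them, or reject the whole delta.
function applyLinkedDelta(store, quads) {
  const skipped = [];
  for (const [s, p, o, method] of quads) {
    const key = `${s} ${p}`;
    const values = new Set(store.get(key) ?? []);
    if (method === LD + "add") {
      values.add(o);
    } else if (method === LD + "replace") {
      values.clear(); // drop every existing value for this subject + predicate
      values.add(o);
    } else if (method === LD + "remove") {
      values.delete(o);
    } else {
      skipped.push([s, p, o, method]); // unknown method: leave the store untouched
      continue;
    }
    if (values.size > 0) store.set(key, values);
    else store.delete(key);
  }
  return skipped;
}

const profile = "https://example.org/profile#me";
const givenName = "https://schema.org/givenName";
const store = new Map();
const skipped = applyLinkedDelta(store, [
  [profile, givenName, "Tim", LD + "add"],
  [profile, givenName, "Timothy", LD + "replace"],
  [profile, givenName, "T.", "https://example.org/methods/merge"], // hypothetical unknown method
]);
```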

This spec does not (yet) standardize the level above a set of quads - and I do think it makes sense to standardize how we denote who created a delta, whether it's signed, when it's created, what the previous hash is (to make a cryptographically valid ledger).
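
A minimal sketch of what that level could look like is shown below: each log entry records its author, creation time and the hash of the previous entry, so tampering with history becomes detectable. The field names (`previousHash`, `author`, `createdAt`, `quads`, `hash`) and the choice of SHA-256 over a JSON encoding are assumptions for this example, and signatures are left out entirely.

```javascript
const { createHash } = require("node:crypto");

// Hash the content of an entry together with the hash of its predecessor.
// Any extra fields on the entry (such as `hash` itself) are ignored.
function hashEntry({ previousHash, author, createdAt, quads }) {
  return createHash("sha256")
    .update(JSON.stringify([previousHash, author, createdAt, quads]))
    .digest("hex");
}

// Return a new log with the delta appended and chained to the last entry.
function appendDelta(log, { author, createdAt, quads }) {
  const previousHash = log.length > 0 ? log[log.length - 1].hash : null;
  const entry = { previousHash, author, createdAt, quads };
  return [...log, { ...entry, hash: hashEntry(entry) }];
}

// Check that every entry points at its predecessor and has an intact hash.
function verifyLog(log) {
  return log.every((entry, i) => {
    const expectedPrevious = i === 0 ? null : log[i - 1].hash;
    return entry.previousHash === expectedPrevious && entry.hash === hashEntry(entry);
  });
}

const author = "https://example.org/profile#me";
let ledger = [];
ledger = appendDelta(ledger, {
  author,
  createdAt: "2020-04-03T10:00:00Z",
  quads: [[author, "https://schema.org/givenName", "Tim", "http://purl.org/linked-delta/replace"]],
});
ledger = appendDelta(ledger, {
  author,
  createdAt: "2020-04-03T10:05:00Z",
  quads: [[author, "https://schema.org/givenName", "Timothy", "http://purl.org/linked-delta/replace"]],
});
const ledgerIsValid = verifyLog(ledger);
```

Changing any field of an earlier entry makes `verifyLog` return `false`, because the stored hash no longer matches the entry's content.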

Currently, the order in which statements appear in a linked-delta document has no semantic meaning; instead, there are rules that determine in what order a parser (loader?) should interpret the delta statements.

N3 Patches

Tim Berners-Lee mentioned N3 Patches during a meeting some time ago, as an alternative, but I failed to find more about this.

LD-Patch

LD-Patch is a W3C working group spec that also introduces a new serialization language.

SPARQL updates

SPARQL-Update supports INSERT and DELETE, so you could use these SPARQL Update strings to store deltas.
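
As a sketch of what such a stored delta could look like, the function below turns a set of removed and added triples into a SPARQL Update string using `DELETE DATA` and `INSERT DATA`. The function name `toSparqlUpdate` and the `{ remove, add }` input shape are assumptions for this example; terms are expected to be already serialized, and blank nodes are not handled (`DELETE DATA` does not allow them).

```javascript
// remove / add: arrays of [subject, predicate, object] terms in SPARQL syntax.
function toSparqlUpdate({ remove = [], add = [] }) {
  const block = (triples) =>
    triples.map(([s, p, o]) => `  ${s} ${p} ${o} .`).join("\n");
  const operations = [];
  if (remove.length > 0) operations.push(`DELETE DATA {\n${block(remove)}\n}`);
  if (add.length > 0) operations.push(`INSERT DATA {\n${block(add)}\n}`);
  // Multiple operations in one update request are separated by ";".
  return operations.join(" ;\n");
}

const update = toSparqlUpdate({
  remove: [["<https://example.org/profile#me>", "<https://schema.org/givenName>", '"Tim"']],
  add: [["<https://example.org/profile#me>", "<https://schema.org/givenName>", '"Timothy"']],
});
```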

Using PROV / other reification methods

Maybe the right way to store changes is to express them in RDF, perhaps using the PROV ontology.
This would of course eliminate the need for a new serialization format.

However, I feel like it should be trivial / really simple to convert these change statements into valid RDF.

Atomic Commits

see https://docs.atomicdata.dev/commits/intro.html

disclaimer: This is a design of my own.

It's a JSON-based serialization of state changes that allows for full traceability using cryptographic signatures. It's implemented and used in atomic server and atomic data browser. It only works with a strict subset of RDF.

TL;DR

Using deltas to communicate state changes is efficient and makes P2P state sharing easier. Storing deltas makes it easier to deal with backups, versioning, undo, and adding new query options. Various solutions exist, but perhaps we need something else.

Most importantly, we should pick one, and I'd love to hear your thoughts on this!

@RubenVerborgh
Contributor

Missing above seems to be the issue of blank nodes and canonicalization (see Aidan Hogan and others).

> Tim Berners-Lee mentioned N3 Patches during a meeting some time ago, as an alternative, but I failed to find more about this.

Implemented by yours truly in https://github.com/solid/node-solid-server/blob/v5.2.2/lib/handlers/patch/n3-patch-parser.js (nodeSolidServer/node-solid-server#516)

> how to store or share the deltas

Can you maybe be a bit more precise about the problem we are solving? Because storage is not a Solid concern (the specs only govern exchange). Is this about PATCH?

Because versioning itself is just Memento (and that design is part of server architectures).

@joepio
Author

joepio commented Apr 3, 2020

> Can you maybe be a bit more precise about the problem we are solving? Because storage is not a Solid concern (the specs only govern exchange). Is this about PATCH?
>
> Because versioning itself is just Memento (and that design is part of server architectures).

I'm mostly thinking about client-server (two-way) communication, e.g. during collaborative document editing, but I think that standardized deltas can be useful in many contexts. And for many of these use cases, storing the deltas themselves is important (auditability + P2P state replication = why git is awesome). Now I agree that we should not care about how these deltas are stored (any Solid server implementation can do whatever it likes), but providing a standard interface for accessing and appending these deltas is something that the spec maybe should cover.

@RubenVerborgh
Contributor

> I'm mostly thinking about client-server (two way) communication, e.g. during collaborative document editing, but I think that standardized deltas can be useful in many contexts.

OK but then we should probably have collaborative editing as an issue/use case.

> And for many of these use cases, storing the deltas itself is important

exposing; slight difference, but important in the Solid context, because the specs only govern the exchanges.

@gsvarovsky

> OK but then we should probably have collaborative editing as an issue/use case.

I've just done a scrape of related use-cases from Solid and w3, and written it up on the forum, and indeed, it's not expressed directly.

However solid/user-stories#22: "As a developer, I want to be able to subscribe to a stream of pod events" supports this ticket in general.

@TallTed
Contributor

TallTed commented Mar 10, 2021

Using NQuads for the deltas, and using the context (fourth) column as an "action" indicator only works if you're applying these deltas to a single graph, i.e., that your target is not working with Named Graphs, which at least some Solid servers (will) do. Something to consider as this suggestion moves forward...

@joepio
Author

joepio commented Mar 10, 2021

> Using NQuads for the deltas, and using the context (fourth) column as an "action" indicator only works if you're applying these deltas to a single graph, i.e., that your target is not working with Named Graphs, which at least some Solid servers (will) do. Something to consider as this suggestion moves forward...

My colleague Thom suggested using a query parameter in the 'action' field, if you use named graphs.
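
A rough sketch of that idea: the target graph is carried as a query parameter on the action IRI, so the fourth column can express both the method and the named graph. The parameter name `graph` and the helper names `actionIri` and `parseActionIri` are assumptions for this example.

```javascript
const LD = "http://purl.org/linked-delta/";

// Build the fourth-column IRI for a method, optionally targeting a named graph.
function actionIri(method, graph) {
  const url = new URL(LD + method);
  if (graph) url.searchParams.set("graph", graph); // the graph IRI is percent-encoded
  return url.href;
}

// Split a fourth-column IRI back into the method IRI and the target graph.
// `graph` is null when the delta applies to the default graph.
function parseActionIri(iri) {
  const url = new URL(iri);
  const graph = url.searchParams.get("graph");
  url.search = "";
  return { method: url.href, graph };
}

const action = actionIri("replace", "https://example.org/graphs/profile");
const parsed = parseActionIri(action);
const plain = parseActionIri(actionIri("add"));
```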

@joepio
Author

joepio commented Jan 13, 2022

I'm happy to see N3 Patch has been invented and added to the spec. This is an interesting alternative to the existing specs mentioned earlier. I'm still a bit sceptical about relying on the N3 serialization format, as it will be new to most developers and it can be quite hard to parse.

Anyway, if N3 patches are persisted (and named) by a server, it becomes possible to construct versions / a history.
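
For reference, a minimal sketch of how a client could build such a patch request is shown below. The `solid:InsertDeletePatch`, `solid:inserts` and `solid:deletes` terms and the `text/n3` content type come from the Solid Protocol's N3 Patch section; the helper names `buildN3Patch` and `buildPatchRequest` and the input shape are assumptions for this example, and `solid:where` conditions are left out.

```javascript
// deletes / inserts: arrays of triples written as N3 strings, without the final dot.
function buildN3Patch({ deletes = [], inserts = [] }) {
  const formula = (triples) => `{ ${triples.map((t) => `${t} .`).join(" ")} }`;
  const parts = ["_:patch a solid:InsertDeletePatch"];
  if (deletes.length > 0) parts.push(`  solid:deletes ${formula(deletes)}`);
  if (inserts.length > 0) parts.push(`  solid:inserts ${formula(inserts)}`);
  return `@prefix solid: <http://www.w3.org/ns/solid/terms#>.\n${parts.join(";\n")}.`;
}

// Build the options object for a PATCH request carrying the N3 Patch.
function buildPatchRequest(patch) {
  return {
    method: "PATCH",
    headers: { "Content-Type": "text/n3" },
    body: buildN3Patch(patch),
  };
}

const request = buildPatchRequest({
  deletes: ['<https://example.org/profile#me> <https://schema.org/givenName> "Tim"'],
  inserts: ['<https://example.org/profile#me> <https://schema.org/givenName> "Timothy"'],
});
// The request could then be sent with, for example:
//   fetch("https://example.org/profile", request)
```

A server that keeps each accepted request body, together with a name and a timestamp, would have the raw material for such a history.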


5 participants