
[Design draft] Actions, data streams, and middleware

Thom van Kalkeren edited this page Aug 7, 2018 · 5 revisions

Use cases

  • Implementing (virtual) resource executors (current schema:action system, logout, etc.)
  • Logging data requests
  • Batching data requests
  • Create side-effects / chain implementation?
  • Cross-thread/worker communication and processing
  • Consistent transformation of server-provided data (e.g. namespace transformation, local counter-cache generation)
  • Inter-app broadcasting of resource changes

Current solution; dependency injection

The simplest solution is one that's always possible in JS: allowing the action handler to be swapped by the end-user. Such a mechanism would be limited to a single object, and that object is bound to the non-generic interface, resulting in either:

  • The emergence of one behemoth implementation, or
  • an abstraction library which works around the existing interface, since the ability to share and complement implementations of different data-transforming modules dramatically increases the usefulness of the system (as proven in numerous libraries such as redux, express/koa, and others).

Current RDFlib implementation

RDFlib allows for actions to be executed on certain data changes, called propertyActions. These are callbacks which can be registered for when statements containing predefined predicate IRIs enter the store. This allows for reasoning on triple-level data and other operations on new data.

Though suitable for basic inferencing because it executes code directly when a new triple hits the store, it becomes difficult when needing to calculate changes over larger sets or transactions in the store (say, a parsed response) since it's unknown whether the transaction finished at the time of processing. Though the end-user can replicate behaviour by callbacks/chaining, it cannot be properly ensured that all new data is processed while performing operations on the entire dataset without running queries.

Current Link implementation

Link currently goes the dependency injection route, proxying low-level store operations (i.e. add, delete) to keep track of resource changes (and singleton-ing IRI objects for performance).

The action execution system (execActionByIRI) is limited to schema:Action objects that result in network requests. Though this allows the system to declaratively describe the hypermedia system through data, it doesn't allow for more dynamic operations on the data store.

Proposed solution: middleware

Each (large) action in the system is assigned an IRI, and thus is executed or passed down via each layer in the middleware. Each layer in turn may transform the input and pass the result to the next middleware.

Possible functional signature:

  • (store: LinkedRenderStore|IndexedFormula) => (next: Function) => Promise
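A minimal sketch of how the functional signature could compose, assuming a redux-like extra currying step that receives the action (the signature above leaves that step out). `Store`, `Action`, `Handler`, `compose`, and the `ll:store/add` IRI are hypothetical stand-ins for illustration, not existing Link or rdflib APIs.

```typescript
// Hypothetical stand-ins; the real LinkedRenderStore/IndexedFormula are not modelled.
interface Store { statements: string[] }
interface Action { iri: string; payload?: string }
type Handler = (action: Action) => Promise<unknown>;
type Middleware = (store: Store) => (next: Handler) => Handler;

// Development logger: records the action IRI, then passes the action on untouched.
const createLogger = (log: string[]): Middleware => () => (next) => async (action) => {
  log.push(`dispatch ${action.iri}`);
  return next(action);
};

// Last layer: consumes (assumed) ll:store/* actions, forwards everything else.
const storeWriter: Middleware = (store) => (next) => async (action) => {
  if (!action.iri.startsWith("ll:store/")) return next(action);
  if (action.payload !== undefined) store.statements.push(action.payload);
  return store.statements.length;
};

// Builds the chain right-to-left so the first layer in the array runs first.
function compose(store: Store, layers: Middleware[]): Handler {
  const unhandled: Handler = async (action) => {
    throw new Error(`Unhandled action: ${action.iri}`);
  };
  return layers.reduceRight<Handler>((next, layer) => layer(store)(next), unhandled);
}

const demoLog: string[] = [];
const demoStore: Store = { statements: [] };
const dispatch = compose(demoStore, [createLogger(demoLog), storeWriter]);
const pending = dispatch({ iri: "ll:store/add", payload: "<ex:a> <ex:b> <ex:c>" });
```

Each layer decides for itself whether to call `next`, which is the next-call model discussed under the stacking considerations further down. In this sketch `pending` resolves to the number of statements held by the stand-in store.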

Possible observable/rxjs signature:

  • (store: LinkedRenderStore|IndexedFormula) => (nextPipe: Observable) => value$

Considerations

When using streaming parsers, the results would be introduced into the store one triple/resource at a time. This would somewhat defeat the purpose of parts of this proposal. Reverting to document parsers is not an option either, however.

There might be an option to define two middleware types: streaming and block middleware. Streaming middleware takes a triple stream and is run while data is entering the store; block middleware is run after a transaction has been parsed but before it is committed. This however requires the ability to run (possibly parallel) transactions, support for which (to the best of my knowledge) is lacking.
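The split between the two middleware types could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `StreamingMiddleware` and `BlockMiddleware` interfaces, the `StagedStore` class (a toy staging area, not rdflib's IndexedFormula), and the `ex:commentCount` predicate.

```typescript
type Triple = [string, string, string]; // subject, predicate, object

// Runs per triple while data enters; returning null drops the triple.
interface StreamingMiddleware { onTriple(triple: Triple): Triple | null }
// Runs once per parsed transaction, before anything is committed.
interface BlockMiddleware { onTransaction(triples: Triple[]): Triple[] }

class StagedStore {
  readonly committed: Triple[] = [];
  private readonly streaming: StreamingMiddleware[];
  private readonly block: BlockMiddleware[];

  constructor(streaming: StreamingMiddleware[], block: BlockMiddleware[]) {
    this.streaming = streaming;
    this.block = block;
  }

  // Stages a parsed response, runs both middleware types, then commits in one go.
  ingest(parsed: Triple[]): void {
    const staged: Triple[] = [];
    for (const triple of parsed) {
      let current: Triple | null = triple;
      for (const layer of this.streaming) {
        if (current === null) break;
        current = layer.onTriple(current);
      }
      if (current !== null) staged.push(current);
    }
    const result = this.block.reduce((triples, layer) => layer.onTransaction(triples), staged);
    this.committed.push(...result);
  }
}

// Streaming example: namespace transformation on each incoming triple.
const expandSchemaPrefix: StreamingMiddleware = {
  onTriple: ([s, p, o]) => [s, p.replace(/^schema:/, "http://schema.org/"), o],
};

// Block example: a counter cache that needs the whole transaction to be correct.
const commentCounter: BlockMiddleware = {
  onTransaction: (triples) => {
    const counts = new Map<string, number>();
    for (const [s, p] of triples) {
      if (p === "http://schema.org/comment") counts.set(s, (counts.get(s) ?? 0) + 1);
    }
    const derived: Triple[] = [];
    counts.forEach((count, subject) => derived.push([subject, "ex:commentCount", String(count)]));
    return [...triples, ...derived];
  },
};

const stagedStore = new StagedStore([expandSchemaPrefix], [commentCounter]);
stagedStore.ingest([
  ["ex:post", "schema:comment", "ex:c1"],
  ["ex:post", "schema:comment", "ex:c2"],
]);
```

The counter cache is the kind of work that per-triple processing can't do reliably, since it only knows the final count once the transaction is complete.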

Stacking; example

Since rdflib support is not assumed, these are Link/ontola specific. The DataProcessor and RDFStore are wrappers around Fetcher and IndexedFormula respectively.

Development-only middlewares are denoted with the capital letter D.

stack                | consumes                     | produces
D action logger      | all actions                  | console output
D performance logger | all actions                  | console output
ontola actions       | ontola:*                     | any
DataProcessor        | ll:network/*                 | ll:store/*
Linked-Delta         | ll:add,replace,delete etc    | ll:store/*
RDFStore             | ll:store/*                   | data changes

Where

  • The action logger would log the action to the console or some debugging service and call next.
  • The performance logger starts a timer, calls next, waits for it to yield, and then logs the time taken.
  • The ontola actions handler consumes application-specific tasks such as triggering sign-out.
  • The DataProcessor would allow current methods such as getEntity to be called.
  • The Linked-Delta handler automatically processes all linked-delta graph triples.
  • The RDFStore commits triples to the store, possibly with transaction management.
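To make the consumes/produces flow concrete, here is a simplified, synchronous sketch of part of this stack. The `consumes` matcher, the `Layer` type, and the concrete IRIs `ll:network/fetch`, `ll:store/add`, and `ontola:logout` are assumptions made for this example; the performance logger and the Linked-Delta handler are left out for brevity.

```typescript
interface Action { iri: string; payload?: string }
type Next = (action: Action) => void;
type Layer = (action: Action, next: Next) => void;

// Matches the patterns used in the stack listing; a trailing * is a prefix wildcard.
const consumes = (pattern: string, iri: string): boolean =>
  pattern.endsWith("*") ? iri.startsWith(pattern.slice(0, -1)) : iri === pattern;

const trace: string[] = [];     // stands in for console output
const committed: string[] = []; // stands in for the triples held by the store

// D action logger: sees every action, then calls next.
const actionLogger: Layer = (action, next) => {
  trace.push(`log ${action.iri}`);
  next(action);
};

// ontola actions: consumes application-specific tasks such as sign-out.
const ontolaActions: Layer = (action, next) => {
  if (consumes("ontola:*", action.iri)) {
    trace.push(`ontola handled ${action.iri}`);
  } else {
    next(action);
  }
};

// DataProcessor: consumes ll:network/* and produces an ll:store/* action.
const dataProcessor: Layer = (action, next) => {
  if (consumes("ll:network/*", action.iri)) {
    // A real implementation would fetch and parse here; the payload is faked.
    next({ iri: "ll:store/add", payload: `<${action.payload}> <ex:fetched> "true"` });
  } else {
    next(action);
  }
};

// RDFStore: consumes ll:store/* and produces data changes.
const rdfStore: Layer = (action, next) => {
  if (consumes("ll:store/*", action.iri)) {
    if (action.payload !== undefined) committed.push(action.payload);
  } else {
    next(action);
  }
};

function buildStack(layers: Layer[]): Next {
  const end: Next = (action) => { trace.push(`unhandled ${action.iri}`); };
  return layers.reduceRight<Next>((next, layer) => (action) => layer(action, next), end);
}

const run = buildStack([actionLogger, ontolaActions, dataProcessor, rdfStore]);
run({ iri: "ll:network/fetch", payload: "ex:resource" });
run({ iri: "ontola:logout" });
```

Note that the logger only sees the original actions: the `ll:store/add` action produced by the DataProcessor travels downwards and never passes the layers above it.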

Stacking; functional considerations

Should each middleware call the next like in redux/express/koa, or should the stack be predetermined?

When using a (classical) next-call based approach, each middleware is responsible for calling the next function. In redux, the middleware is expected to pass the current value along. This is the opposite of koa, where each middleware seems to have pre-bound arguments to ensure consistency (arguably at a small run-time cost) and state changes are mediated via a context object. The redux model allows any middleware to propagate ad-hoc events.

When using a predetermined stack with observable streams, the pipes have to be set up ahead of time and therefore must define their switching logic in a declarative manner, possibly resulting in more readable/predictable and performant behaviour.

Shared state

Considerations

Each middleware is able to transform incoming actions, create side-effects (I/O), and create new (ad-hoc) actions. Roughly three function types exist (AFAIK, I'm no mathematician): pure functions, which use their arguments to produce a result; methods, which consume a domain to produce their result (or modify the domain directly); and combinations of those two. So the primary ways of passing state into a function are either arguments or access to the domain.

The mechanism by which arguments are consumed is inherently coupled to the programming interface. Different actions would require different argument types (e.g. a logging system which is generic, delta update processing which might accept a Response, or a schema:Action processor which accepts a request body graph), but every middleware should ideally be able to interpret the arguments in order to enable more powerful behaviour (e.g. avoiding the need for a separate 'graph logger' and 'response logger'). The mechanism for executing actions and optionally passing arguments should be uniform yet simple (just the IRI should suffice).

Arguments and consistency

So at one extreme there is domain-only state: all the arguments, possibly including the function name, have to be prepared in the store before execution by creating an execution context object. At the other extreme, all arguments are passed as literal arguments to the middleware.

Since simplicity, consistency and predictability are desirable features when it comes to application reasoning, it makes sense to require the middleware to act as a pure function. This however limits the possibilities for implementing more powerful middlewares, i.e. allowing both reading and writing to the store (the domain).

Acquiring consistent behaviour across time while keeping powerful possibilities, if possible, would therefore require some constraints as to what a middleware may do in order to process an action. It is self-evident that if subsequent middleware layers read and write to the same data as the normal application works on, inconsistent behaviour will arise since middleware operations aren't timed to other read-write operations (like the fetcher).

So if the store is to be used in processing (to enable advanced possibilities), it must be guaranteed that the (functional) result of each layer is a function of either the arguments or an externally write-protected resource in the store. To isolate each layer, and thus keep the inter-action execution flow consistent across store changes, layers are free to access the store but can't use the data to change the execution flow. In other words, all data on which the execution path depends must fall within the functional context, which means that the code path of each event will be deterministic, but that the resulting changes to the store and I/O are (possibly) not.

Arguments; argument list vs context argument vs data context

Three different models are considered for executing actions:

  • Argument list - Each execution request consists of an array, starting with the function IRI and optional arguments following.
  • Context argument - Each execution request consists of two arguments, the action IRI and a (writable?) context object on which additional information to execute the action is stored.
  • Data context - Like the context argument, but the second argument is an IRI as well (or blank node) to an in-store resource holding the execution arguments.

Argument list

This method, like the others, can be implemented with further constraints (e.g. required argument order, constraints on the shape), but that would basically turn it into the context argument design. So the argument definition is left completely open to the action.
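A small sketch of what an open argument list could look like in practice. The `ExecRequest` type, the `handlers` registry, and the `ex:` IRIs are hypothetical names chosen for this example.

```typescript
// An execution request: the function IRI followed by arbitrary, action-defined arguments.
type ExecRequest = [string, ...unknown[]];
type ArgHandler = (...args: unknown[]) => unknown;

// Hypothetical registry; each handler alone knows the shape of its arguments.
const handlers: Record<string, ArgHandler> = {
  "ex:sum": (...args) => (args as number[]).reduce((a, b) => a + b, 0),
  "ex:logout": () => "signed out",
};

function exec(request: ExecRequest): unknown {
  const [iri, ...args] = request;
  const handler = handlers[iri];
  if (handler === undefined) throw new Error(`No handler for ${iri}`);
  return handler(...args);
}

// A generic middleware can only treat the arguments as opaque values.
function describe(request: ExecRequest): string {
  const [iri, ...args] = request;
  return `${iri} called with ${args.length} opaque argument(s)`;
}

const sum = exec(["ex:sum", 1, 2, 3]);
const description = describe(["ex:sum", 1, 2, 3]);
```

The cast inside `ex:sum` is the trade-off in miniature: the handler gets exactly the arguments it wants, while `describe` has nothing to go on beyond the IRI and the argument count.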

Pros:

  • Very high performance: each argument can be fully optimized to the action (e.g. passing a SharedArrayBuffer for threaded calculations).

Cons:

  • Highly specific: middleware can't make assumptions about the function context, which makes implementing generic functions difficult. This can possibly be mitigated by pre-defining actions in a vocabulary with a specialized ontology.

Context argument

Plain object - The simplest version would be a POJsO, a basic Object on which either well-known or action-specific attributes can be written and used across layers using plain JS.

Pros:

  • High performance; basic JS objects are very cheap to instantiate and pass around.
  • Simple and versatile; accessing and writing data is straightforward JS.

Cons:

  • Key collisions; the freedom of a JS object can also create collisions if different action sets/middlewares don't namespace their instance data.
  • Non-LD; semantics about the (action) data are lost without an accompanying action vocabulary.
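The key collision con can be illustrated with a few lines. This is a sketch under assumptions: the `ActionContext` and `ContextLayer` types and the prefix-based key naming are illustrative conventions, not something Link defines.

```typescript
type ActionContext = Record<string, unknown>;
type ContextLayer = (iri: string, context: ActionContext) => void;

// Runs every layer against one shared, writable context object.
const runLayers = (iri: string, layers: ContextLayer[]): ActionContext => {
  const context: ActionContext = {};
  layers.forEach((layer) => layer(iri, context));
  return context;
};

// Un-namespaced: both layers claim "status", so the later write silently wins.
const fetchStatus: ContextLayer = (_iri, ctx) => { ctx["status"] = 200; };
const authStatus: ContextLayer = (_iri, ctx) => { ctx["status"] = "signed-in"; };

// Namespaced with (hypothetical) prefixes: both values survive.
const fetchStatusNs: ContextLayer = (_iri, ctx) => { ctx["ll:status"] = 200; };
const authStatusNs: ContextLayer = (_iri, ctx) => { ctx["ontola:status"] = "signed-in"; };

const clobbered = runLayers("ex:action", [fetchStatus, authStatus]);
const preserved = runLayers("ex:action", [fetchStatusNs, authStatusNs]);
```

Prefixing keys avoids the collision, but the meaning of each key still lives only in the code that reads it.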

Graph - In resolving name collisions and semantic loss, an action specific graph can be passed with the action which contains the data needed to process the action.

Pros:

  • Consistent; It fits the underlying paradigm since it uses an RDF store to resolve arguments.
  • Well-defined; Barring the subject of the arguments list (if any), basic RDF semantics will apply to calling functions, preventing name collisions (and theoretically, inconsistent implementations).

Cons:

  • Slow/clunky; An entire graph has to be instantiated and later searched in order to pass a basic string argument.

Data context

If the arguments are to be constrained to either a single resource or a single graph, the main store can be used to prepare the action and provide the execution context. The argument data will sit next to the application data, but should be shielded from external modification once execution has started.
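A toy sketch of the idea, assuming a write-protection primitive that rdflib does not offer today. `ToyStore`, `withProtected`, `execute`, and the `ex:` IRIs are all invented for this example.

```typescript
type Triple = [string, string, string]; // subject, predicate, object

class ToyStore {
  private readonly triples: Triple[] = [];
  private readonly locked = new Set<string>();

  add(triple: Triple): void {
    if (this.locked.has(triple[0])) throw new Error(`${triple[0]} is write-protected`);
    this.triples.push(triple);
  }

  value(subject: string, predicate: string): string | undefined {
    const match = this.triples.find(([s, p]) => s === subject && p === predicate);
    return match === undefined ? undefined : match[2];
  }

  // Shields a resource for the duration of `run`, then releases it.
  withProtected<T>(subject: string, run: () => T): T {
    this.locked.add(subject);
    try {
      return run();
    } finally {
      this.locked.delete(subject);
    }
  }
}

// The whole call is (action IRI, context IRI); the arguments live in the store.
function execute(store: ToyStore, action: string, contextIri: string): string {
  return store.withProtected(contextIri, () => {
    const target = store.value(contextIri, "ex:target") ?? "(none)";
    return `${action} -> ${target}`;
  });
}

const toyStore = new ToyStore();
toyStore.add(["_:ctx1", "ex:target", "ex:post/1"]);
const outcome = execute(toyStore, "ex:like", "_:ctx1");

// Writes to the context resource are rejected while an action is executing.
const rejections: string[] = [];
toyStore.withProtected("_:ctx1", () => {
  try {
    toyStore.add(["_:ctx1", "ex:target", "ex:post/2"]);
  } catch (error) {
    rejections.push(String(error));
  }
});
```

Since reads are not blocked, views could still inspect `_:ctx1` while the action runs, which is the interactivity this model is after.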

Pros:

  • Uses already present objects; the store is already instantiated and made available to each middleware for access, which would reduce the arguments list to a single IRI. A blank node for application actions, a named node for server-prepared actions, or a data-less named node for methods without arguments (though hard to discern when the action should be resolved).
  • Highly interactive; The argument object (and possibly the status) can be read by the views while it's executing.

Cons:

  • Mixes data and action context; For deterministic results, parts of the store should be shielded from modification during execution, which might require deep changes to rdflib.

Arguments; conclusion

None yet, but there might be some combination of the above methods that gives the right trade-offs for multiple use-cases.

Link changes

Replaced LRS method entrypoints:

  • getEntity
  • execActionByIRI
  • parseResponse?
  • processDelta?

Requires breaking implementation changes:

  • subscribe
  • buffer/broadcast: To decrease duplicate work caused by propagating many tiny changes rapidly, the current buffer system stores all changes until some 'acceptable' level of changes is reached (flushes are marked manually by calling broadcast, which will schedule an update or execute immediately depending on the arguments). The current changeBuffer implementation might be rewritten as a (top-level) buffered stream propagating data changes in the system, with the ability to manually emit a closing notifier.
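A rough sketch of such a buffer with a manual closing notifier. It is not the current changeBuffer code: the `ChangeBuffer` class and its method names are assumptions, and it only covers the immediate flush, not the scheduled variant.

```typescript
type Listener = (changes: string[]) => void;

class ChangeBuffer {
  private pending = new Set<string>();
  private readonly listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Queues a changed resource IRI; duplicates within one batch are collapsed.
  push(iri: string): void {
    this.pending.add(iri);
  }

  // The manual closing notifier: emits the current batch and starts a new one.
  broadcast(): void {
    if (this.pending.size === 0) return;
    const batch = Array.from(this.pending);
    this.pending = new Set<string>();
    this.listeners.forEach((listener) => listener(batch));
  }
}

const buffer = new ChangeBuffer();
const batches: string[][] = [];
buffer.subscribe((changes) => batches.push(changes));

buffer.push("ex:a");
buffer.push("ex:b");
buffer.push("ex:a"); // collapsed into the same batch
buffer.broadcast();
buffer.push("ex:c");
buffer.broadcast();
buffer.broadcast(); // nothing pending, so nothing is emitted
```

Subscribers receive one array per flush, so three rapid changes to two resources cause a single update rather than three.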