Skip to content

VIATRA Query Engine Internals

Zoltán Ujhelyi edited this page Apr 2, 2024 · 1 revision

Overview

Most important concepts

  • ViatraQueryEngine: the central element of the API.
  • QueryScope: the set of model elements the ViatraQueryEngine should work on.
    • The project itself implements only EMFScope right now.
  • IQueryRuntimeContext: wrapper class for the query backends to access the model. The access can be both indexed and non-indexed.
  • IQueryBackend: an executor of graph pattern matches; it instantiates IQueryResultProviders. Query Backends needs to be registered to the ViatraQueryEngine itself.
    • RetePatternMatcher is the fully incremental implementation of the IQueryResultProvider interface.
    • LocalSearchResultProvider is a local search based implementation of the IQueryResultProvider interface.
  • The ViatraQueryMatcher interface is the public matcher API: generated instances are type-safe.

PSystem

The aim of this document is to give an overview on the PSystem representation in VIATRA Query. The PSystem is a declarative description of the constraint network that is used internally to represent queries, argue on them, and create query evaluation plans. The PSystem is used both by the VIATRA and IncA projects as an intermediate representation of the model queries, and both the Rete-based and Local Search-based query backends consume it as their starting point. In the layered architecture of query evaluators which build on VIATRA Query, this is the first component which is entirely modeling platform/framework independent. See the following page for the role of PSystem with respect to other representations: Query processing.

Overview

PQuery: represents a graph pattern and consists of a disjunction of pattern bodies (PBody)

PBody: a collection of pattern matcher constraints (PConstraint)

PParameter: represents a parameter of a PQuery (graph pattern). A parameter has a name and a type.

PVariable: represents the pattern parameters and the local variables inside a given PBody. Each PParameter appears as a separate PVariable in each PBody. A variable has a name and optionally a type.

PConstraint: represents a condition expressed on certain PVariables of a PBody, that is required to hold for matches to fulfill that body. Can be either enumerable (one is able to list all tuples of variable values satisfying the constraint) or deferred (one can only check whether a given tuple of variable values satisfy it, once the variable values, or at least some of them, are available).

PDisjunction: a set of PBodies representing separate conditions (in OR relationship) making up a PQuery. The PQuery itself is defined as a canonical disjunction of bodies, but PDisjunctionRewriters may rewrite the query into a different form with equivalent meaning, which would be represented as another disjunction of bodies. Such rewriters can be used as a preprocessing step before query evaluation.

The following diagram shows an overview of the relationship between these concepts.

Detailed PConstraint semantics

Constraint name Syntax Semantics Example
Equality Equality(PBody pSystem, PVariable who, PVariable withWhom) Checks whether the who variable is equal to the withWhom variable CompareConstraint with the == operator is reduced to an Equality.
Inequality Inequality(PBody pSystem, PVariable who, PVariable withWhom) Checks whether the who variable is NOT equal to the withWhom variable. CompareConstraint with the != operator is reduced to an Equality.
ExpressionEvaluation ExpressionEvaluation(PBody pSystem, IExpressionEvaluator evaluator, PVariable outputVariable) Evaluates the expression defined inside the evaluator. The outputVariable is not used for check expression, but the eval expression may return a value and so it will be stored in that variable. CheckConstraint is reduced to an ExpressionEvaluation, where the outputVariable can simply be null.
ExportedParameter ExportedParameter(PBody pSystem, PVariable parameterVariable, String parameterName) Establishes a binding between a PParameter and the PVariable that is created in a given PBody for that parameter. --
PositivePatternCall (enumerable) PositivePatternCall(PBody pSystem, Tuple variablesTuple, PQuery pattern) Allows composing patterns by calling the given pattern. The actual arguments of the pattern call are contained inside the variablesTuple. PatternCompositionConstraint is reduced to a PositivePatternCall if the neg keyword is NOT applied.
NegativePatternCall NegativePatternCall(PBody pSystem, Tuple actualParametersTuple, PQuery query) Allows composing patterns by calling the given pattern as a NAC. PatternCompositionConstraint is reduced to a NegativePatternCall if the neg keyword is applied.
BinaryTransitiveClosure (enumerable) BinaryTransitiveClosure(PBody pSystem, Tuple variablesTuple, PQuery pattern) Computes the transitive closure of the binary relation which is defined by the given pattern. PatternCompositionConstraint is reduced to a BinaryTransitiveClosure if the plus symbol is applied.
TypeConstraint (enumerable) TypeConstraint(PBody pSystem, Tuple variablesTuple, IInputKey inputKey) Represents an enumerable type constraint that asserts that values substituted for the given tuple of variables form a tuple that belongs to an enumerable extensional relation identified by the inputKey. Unary type constraints and path expression segments are reduced to TypeConstraints.
TypeFilterConstraint TypeFilterConstraint(PBody pSystem, Tuple variablesTuple, IInputKey inputKey) Same as above, but for the case where the extensional relation identified by the inputKey is non-enumerable (only tuple membership can be checked). e.g. a count or eval may imply a Java type for their result variable, which is a non-enumerable constraint
PatternMatchCounter PatternMatchCounter(PBody pSystem, Tuple actualParametersTuple, PQuery query, PVariable resultVariable) Counts the number of matches of the called pattern that are incident on given variables, and stores the resulting number in an output variable. A count find expression is reduced to a PatternMatchCounter constraint on the result variable.
ConstantValue (enumerable) ConstantValue(PBody pSystem, PVariable variable, Object value) Asserts that the given variable takes the given value. Any literal value (number, boolean, enum reference, string, etc.) mentioned in the pattern language is turned into a virtual variable and a ConstantValue constraint on the variable.

Query Processing for Rete networks

This figure shows the workflow of the recipe generation in VIATRA Query.

Local Search

This page summarizes the improvement ideas concerning the local search-based pattern matching capabilites of VIATRA Query.

  • Hybrid pattern matching:
    • New ISearchOperation to support calling an incremental matcher
    • Provide input for the flattener: which patterns should NOT be flattened. Implement this by creating an interface with a method that expects a constraint and a pattern. (Interface added)
  • Implement a weight function to estimate the cost of a given search plan over a model
  • Learning algorithm: the planner shall be able to tune its internal parameters regarding the planning process. This could be done by providing (small) example models and measuring pattern matching performance.
  • Provide a low-level code generator: instead of the current execution method, that is based on sequential exectuion of ISearchOperations contained in plans, generate executable code. This might be in a form of several embedded for loops and if conditions.
  • Support the POperationCompiler with normalization capability to remove redundant constraints thus decrease the search plan size.
  • Investigate if new VQL elements (e.g. containment) should be added that are only supported by either the local search-based or the RETE algorithm.
  • Support parallelism for plan execution. For example, the search graph traversal can be done simultaneously starting from different nodes.
  • Discuss different plan generation options and additional capabilities here.

Known problems

Based on the fact that the local search matcher itself does not use the EMF notification API, in some cases the RETE and the local search-based matchers may return different match sets after model modification. At the current state of development this issue can be reproduced by running the School tests "deleteAllYear" or "deleteTeacher2", where EObjects, which are not part of the EMF model, are still reached by object level references by the local search matcher, generating unexpected matches. When calling EcoreUtil#delete on the object to-be-removed, the phenomenon does not occur. By rewriting the mentioned tests to use this EMF utility class the mentioned tests are also passing.

It is important to also note that the local search matcher might use the base index to get nodes/edges located in the model. This case (while all elements are collected using the index) the result set will be the same as the RETE matcher, for the base index does listen to the notifications from the EMF model.