Skip to content

In Network Query Processing

Henning Hasemann edited this page Sep 30, 2013 · 8 revisions

This Page describes briefly the encoding of queries for the In-Network Query Processing component (INQP). The implementation can be found in wiselib.testing/algorithms/rdf/inqp/.

Queries

A query consists of the following:

  • A numeric id (one byte, assumed to uniquely identify the query during its existence in the network)
  • A set of operators

General things to note:

  • Values are always 4 byte long and can be either integers, floats or strings (identified by hash value)
  • For hash we use SDBM, see http://www.cse.yorku.ca/~oz/hash.html and wiselib.testing/algorithms/hash/sdbm.h

Operators

Operators and their parameters are encoded in a compact format and can be transmitted individually to the INQP communicator. INQP will start processing the query once its set of operators is complete. All operator types have a common set of base fields, described below. Depending on the operator type, each operator also features different options. Operators form a query tree, each operator has 2 input ports and one output (for some operators only the left input port is being used). Each operator implicitely also defines a subsequent projection (this way, unenecessary allocations for throw-away columns are avoided).

Base fields

QUERY ID OPERATOOR ID TYPE PARENT ID PROJECTION INFO Type-specific options...
1 byte 1 byte 1 byte 1 byte 4 bytes  

Query Id

Numerical query id, see above

Operator Id

Numerical id of this operator. Must be unique in this query tree. 0 is reserved for special purposes, operators should be numbered such that operators higher in the tree have a higher operator id.

Type

Operator type, one of

Value Meaning
'g' Graph Pattern Selection
's' (General) Selection
'j' Simple Local Join
'c' Collect
'C' Construct
'a' Aggregate
'D' Delete

Parent Id

The set of operators in a query form a directed tree, in general, data is input at the leaves and output at the root. The parent id specifies where to connect the output port of the current operator, that is a target operator (by operator id) plus a port (left or right).

The MSB defines the parent port, the remaining 7 bits the parent operator id, i.e.:

parent_id = (parent_op_id & 0x7f) | (left ? 0 : 0x80)

By definition the root of the operator tree has no parent which is denoted by parent_id = 0.

Projection Info

The projection info defines for each output column of the operator it is attached to, whether to keep it or to drop it. For easier processing, the projection info also re-states the type of each column it defines to be kept. Data in INQP can have three types: INTEGER, FLOAT or STRING.

For each column, 2 bits are stored, defining whether to drop/ignore the column or to keep it and if so, its type.

Hi Low Meaning
0 0 IGNORE
0 1 INTEGER
1 0 FLOAT
1 1 STRING

The first output column is defined by the two least-significant bits of the first byte of the projection info, the second column by the 3rd and 4th bit, etc...

Graph Pattern Selection

Type
Selection
Inputs
0
Outputs
1
Options length
4, 8 or 12 byte
BITMASK (VALUE 1) (VALUE 2) (VALUE 3)
1 byte 4 byte 4 byte 4 byte

The Graph Patterns Selection (GPS) is (currently) the only operator that is allowed to appear as a leaf in the query tree. It never has childs and receives its data directly from the tuple store. It will select all tuples that match the specified pattern. The selection pattern is specified as follows: BITMASK defines which attributes of the input data are to be compared (bit 3 = subject, bit 2 = predicate, bit 1 = LSB = object). According to that bitmask a number of comparison values follows that will be used to compare the according column to.

E.g:

(?foo 0x1234 0x5678) translates into

{ BIN(011), 0, 0, 0x12, 0x34, 0, 0, 0x56, 0x78 }

In contrast to other operators the projection info in GPS is not only used for (re-)stating the type of column to be output but can also be used to trigger a conversion from the string-only tuplestore.

General Selection

Type
Selection
Inputs
1
Outputs
1
Options length
2..? byte
#CRITERIA #VALUES Criteria... Values...
1 byte 1 byte  

The general selection allows more sophisticated comparisons. It defines a number of critera, followed by a number of values.

The first criterion always refers to the first input column and defines how to compare it and to what. If the criterion does not set the AGAIN flag (MSB), the next criterion will refer to the following column. If it does set the AGAIN flag, it will refer to the same column.

AGAIN CRITERION VALUE INDEX
1 bit 4 bit 3 bit
MSB LSB

The table belows gives possible values for a criterion field, that can be or'ed together (i.e. 0x20 | 0x80 | 3 means "the current attribute has to be > than the attribute with index 3, another criterion for this attribute will follow").

0x00 Ignore (always matches)
0x08 ==
0x10 <
0x18 <=
0x20 >
0x28 >=
0x30 !=
0x80 AGAIN
0-7 Value index

The value index may refer to constants (=values given with operator) and/or input columns. The lower #VALUES indices refer to the values given with the operator, higher indices refer to values from the input row.

For the following examples we use for "clarity" (in line with the table above):

IGN = 0x00 LT = 0x10 AGAIN = 0x80

Example:

2, 1, IGN, LT | 0, 0x12, 0x23, 0x45, 0x67

means: 2 critera, 1 value, ignore 0th column, 1st column has to be lower than the value with index 0, which is the provided constant 0x12234567.

Example:

2, 0, IGN, LT | 0

means: 1 criteria, 0 values, ignore 0th column, 1st column has to be lower than the value with index 0. Here this is not a constant, as there are 0 constants provided, but the first column of the input row, similarily:

2, 0, IGN, LT | 2

would compare 1st column < 2nd colmumn. This could be equivalently written as:

3, 0, IGN, IGN, GT | 1

Simple Local Join

Type
Join
Inputs
2
Outputs
1
Options length
1 byte

The simple local join simply compares an attribute of its left child with an attribute of its right child (for equality). Both attribute indices are identified by a single byte, the left attribute index in the high bits, the right one in the low bits, i.e.:

byte = (left_attr_index << 4) | right_attr_index;

A crossjoin can be achieved by setting

byte = 0xff;

Collect

Type
Distributed
Inputs
1
Outputs
0
Options length
0 byte

The collect operator simple sends its input tuple to the sink for further processing (sorting, non-local joins etc...).

Aggregation

Type
Distributed Aggregation
Inputs
1
Outputs
0
Options length
1..? byte
0 GROUP BY this attribute
1 SUM()
2 AVG()
3 COUNT()
4 MIN()
5 MAX()
0x80 AGAIN
#OPERATORS Aggregation Operations...
1 byte  
Clone this wiki locally