-
Notifications
You must be signed in to change notification settings - Fork 57
In Network Query Processing
This Page describes briefly the encoding of queries for the In-Network Query Processing component (INQP). The implementation can be found in wiselib.testing/algorithms/rdf/inqp/.
A query consists of the following:
- A numeric id (one byte, assumed to uniquely identify the query during its existence in the network)
- A set of operators
General things to note:
- Values are always 4 byte long and can be either integers, floats or strings (identified by hash value)
- For hash we use SDBM, see http://www.cse.yorku.ca/~oz/hash.html and wiselib.testing/algorithms/hash/sdbm.h
Operators and their parameters are encoded in a compact format and can be transmitted individually to the INQP communicator. INQP will start processing the query once its set of operators is complete. All operator types have a common set of base fields, described below. Depending on the operator type, each operator also features different options. Operators form a query tree, each operator has 2 input ports and one output (for some operators only the left input port is being used). Each operator implicitely also defines a subsequent projection (this way, unenecessary allocations for throw-away columns are avoided).
QUERY ID | OPERATOOR ID | TYPE | PARENT ID | PROJECTION INFO | Type-specific options... |
1 byte | 1 byte | 1 byte | 1 byte | 4 bytes |
Numerical query id, see above
Numerical id of this operator. Must be unique in this query tree. 0 is reserved for special purposes, operators should be numbered such that operators higher in the tree have a higher operator id.
Operator type, one of
Value | Meaning |
'g' | Graph Pattern Selection |
's' | (General) Selection |
'j' | Simple Local Join |
'c' | Collect |
'C' | Construct |
'a' | Aggregate |
'D' | Delete |
The set of operators in a query form a directed tree, in general, data is input at the leaves and output at the root. The parent id specifies where to connect the output port of the current operator, that is a target operator (by operator id) plus a port (left or right).
The MSB defines the parent port, the remaining 7 bits the parent operator id, i.e.:
parent_id = (parent_op_id & 0x7f) | (left ? 0 : 0x80)
By definition the root of the operator tree has no parent which is denoted by parent_id = 0.
The projection info defines for each output column of the operator it is attached to, whether to keep it or to drop it. For easier processing, the projection info also re-states the type of each column it defines to be kept. Data in INQP can have three types: INTEGER, FLOAT or STRING.
For each column, 2 bits are stored, defining whether to drop/ignore the column or to keep it and if so, its type.
Hi | Low | Meaning |
0 | 0 | IGNORE |
0 | 1 | INTEGER |
1 | 0 | FLOAT |
1 | 1 | STRING |
The first output column is defined by the two least-significant bits of the first byte of the projection info, the second column by the 3rd and 4th bit, etc...
- Type
- Selection
- Inputs
- 0
- Outputs
- 1
- Options length
- 4, 8 or 12 byte
BITMASK | (VALUE 1) | (VALUE 2) | (VALUE 3) |
1 byte | 4 byte | 4 byte | 4 byte |
The Graph Patterns Selection (GPS) is (currently) the only operator that is allowed to appear as a leaf in the query tree. It never has childs and receives its data directly from the tuple store. It will select all tuples that match the specified pattern. The selection pattern is specified as follows: BITMASK defines which attributes of the input data are to be compared (bit 3 = subject, bit 2 = predicate, bit 1 = LSB = object). According to that bitmask a number of comparison values follows that will be used to compare the according column to.
E.g:
(?foo 0x1234 0x5678) translates into { BIN(011), 0, 0, 0x12, 0x34, 0, 0, 0x56, 0x78 }
In contrast to other operators the projection info in GPS is not only used for (re-)stating the type of column to be output but can also be used to trigger a conversion from the string-only tuplestore.
- Type
- Selection
- Inputs
- 1
- Outputs
- 1
- Options length
- 2..? byte
#CRITERIA | #VALUES | Criteria... Values... |
1 byte | 1 byte |
The general selection allows more sophisticated comparisons. It defines a number of critera, followed by a number of values.
The first criterion always refers to the first input column and defines how to compare it and to what. If the criterion does not set the AGAIN flag (MSB), the next criterion will refer to the following column. If it does set the AGAIN flag, it will refer to the same column.
AGAIN | CRITERION | VALUE INDEX |
1 bit | 4 bit | 3 bit |
MSB LSB |
The table belows gives possible values for a criterion field, that can be or'ed together (i.e. 0x20 | 0x80 | 3 means "the current attribute has to be > than the attribute with index 3, another criterion for this attribute will follow").
0x00 | Ignore (always matches) |
0x08 | == |
0x10 | < |
0x18 | <= |
0x20 | > |
0x28 | >= |
0x30 | != |
0x80 | AGAIN |
0-7 | Value index |
The value index may refer to constants (=values given with operator) and/or input columns. The lower #VALUES indices refer to the values given with the operator, higher indices refer to values from the input row.
For the following examples we use for "clarity" (in line with the table above):
IGN = 0x00 LT = 0x10 AGAIN = 0x80
Example:
2, 1, IGN, LT | 0, 0x12, 0x23, 0x45, 0x67
means: 2 critera, 1 value, ignore 0th column, 1st column has to be lower than the value with index 0, which is the provided constant 0x12234567.
Example:
2, 0, IGN, LT | 0
means: 1 criteria, 0 values, ignore 0th column, 1st column has to be lower than the value with index 0. Here this is not a constant, as there are 0 constants provided, but the first column of the input row, similarily:
2, 0, IGN, LT | 2
would compare 1st column < 2nd colmumn. This could be equivalently written as:
3, 0, IGN, IGN, GT | 1
- Type
- Join
- Inputs
- 2
- Outputs
- 1
- Options length
- 1 byte
The simple local join simply compares an attribute of its left child with an attribute of its right child (for equality). Both attribute indices are identified by a single byte, the left attribute index in the high bits, the right one in the low bits, i.e.:
byte = (left_attr_index << 4) | right_attr_index;
A crossjoin can be achieved by setting
byte = 0xff;
- Type
- Distributed
- Inputs
- 1
- Outputs
- 0
- Options length
- 0 byte
The collect operator simple sends its input tuple to the sink for further processing (sorting, non-local joins etc...).
- Type
- Distributed Aggregation
- Inputs
- 1
- Outputs
- 0
- Options length
- 1..? byte
0 | GROUP BY this attribute |
1 | SUM() |
2 | AVG() |
3 | COUNT() |
4 | MIN() |
5 | MAX() |
0x80 | AGAIN |
#OPERATORS | Aggregation Operations... |
1 byte |