-
Notifications
You must be signed in to change notification settings - Fork 22
Metric Parameter for Aggregate Queries (OLAP)
[Table of Contents](https://github.com/dell-oss/Doradus/wiki/OLAP Databases: Table-of-Contents) | Previous | Next
OLAP Aggregate Queries: Metric Parameter for Aggregate Queries
The aggregate query metric parameter has multiple formats, as described below.
In its basic form, the metric parameter is a comma-separated list of metric functions. Each function performs a statistical calculation on a scalar or link field. The general syntax of a metric function is:
function(field)
Where function is a metric function name and field defines what the metric function is computed on. The field must be a scalar or link defined in the application’s schema. It can belong to the perspective table, or it can be a path to a field linked to the perspective table (e.g., DirectReports.Name
).
If the metric function parameter is a field path with a reflexive link, the transitive operator can be used. Example:
COUNT=(DirectReports^)
This function counts all DirectReports
link values for each perspective object (i.e., all employees in the org chart reporting to each person). Scalar fields of transitively-linked objects can also be used as the metric function’s parameter. For example:
MAX(Manager^.FirstName)
This function finds the maximum FirstName
(highest, alphabetically) of all managers “up” the org chart for each perspective object. For more information on the transitive operator, see the section Transitive Function.
The supported metric functions are summarized below:
-
COUNT(field)
: Counts the values for the specified field. If the field is multi-valued with respect to the perspective object, all values are counted for each selected object. For example,COUNT(Tags)
tallies allTags
values of all objects. The COUNT function also allows the special value "*
", which counts the selected objects in the perspective table regardless of any fields. That is,COUNT(*)
counts objects instead of values. -
DISTINCT(field)
: This metric is similar to COUNT except that it totals unique values for the given field. For example,COUNT(Size)
finds the total number of values of theSize
field, whereasDISTINCT(Size)
finds the number of uniqueSize
values. -
SUM(field)
: Sums the non-null values for the given numeric field. The field’s type must beinteger
,long
,float
, ordouble
. -
AVERAGE(field)
: Computes the average value for the given field, which must beinteger
,long
,float
,double
, ortimestamp
. Note that theAVERAGE
function uses SQL null-elimination semantics. This means that objects for which the metric field does not have a value are not considered for computation even though the object itself was selected. As an example, consider an aggregate query that computesAVERAGE(foo)
for four selected objects, whose value forfoo
are 2, 4, 6, and null. The value computed will be 4 ((2+4+6)/3) not 3 ((2+4+6+0)/4) because the object with the null field is eliminated from the computation. -
MIN(field)
: Computes the minimum value for the given field. For scalar fields,MIN
computes the lowest value found based on the field type’s natural order. For link fields,MIN
computes the lowest object ID found in the link field’s values based on the string form of the object ID. -
MAX(field)
: Computes the maximum value for the given field, which must be a predefined scalar or link field. -
MAXCIUNT(field)
: This function computes the maximum cardinality of the given scalar or link field. For example, for theMessage
table,MAXCOUNT(ExternalRecipients)
returns the highest number ofExternalRecipients
values found for selected objects. -
MINCOUNT(field)
: This function computes the minimum cardinality of the given field. If any selected objects have no value for the given field, the result will be 0.
When applied to a single-valued field, the result of both MINCOUNT
and MAXCOUNT
will be either 0 or 1.
Example metric functions are shown below:
COUNT(*)
DISTINCT(Tags)
MAX(Sender.Person.LastName)
AVERAGE(Size)
MAXCOUNT(Sender.Person.DirectReports)
The metric parameter can be a comma-separated list of metric functions. All functions are computed concurrently as objects are selected. An example metric parameter with multiple functions is shown below:
MIN(Size),MAX(Size),COUNT(InternalRecipients)
The results of multi-metric aggregate queries are described later.
Doradus OLAP allows the metric parameter to be a list of metric expressions. A metric expression is a set of one or more algebraic clauses containing metric functions, constants, and the DATEDIFF
function. Multiple clauses are combined with basic math operators (+
, -
, *
, and /
) and parentheses. This allows arbitrary calculations to be performed for selected objects. Examples of metric expressions are shown below:
MAX(Size) + COUNT(InternalRecipients.Person.Domain)
COUNT(*) * 2
SUM(Size) / COUNT(Sender.Person.MessageAddresses) / (DATEDIFF(DAY, "2013-11-01", NOW(PST)) + 1)
As with metric functions, metric expressions can be used in global and grouped aggregate queries. Multiple metric expressions can be computed in a single aggregate query. Each metric expression compute a value for each group. Computations are performed as integers or floating-point values as necessary. When a metric expression attempts to divide by 0, the metric value is returned as "Infinity". Parentheses override default operator evaluation precedence in the usual manner. Calculations are performed across all objects or for each group when a grouping parameter is provided.
Metric expressions can use the DATEDIFF
function to calculate the difference between two timestamp constants in a specific granularity. Its result is an integer that may be positive, negative, or zero. The DATEDIFF
function is similar to that used by SQL Server and uses the following syntax:
DATEDIFF(<units>, <start date>, <end date>)
Where:
-
<units>
can be any of the following mnemonics:SECOND
,MINUTE
,HOUR
,DAY
,WEEK
,MONTH
,QUARTER
, andYEAR
. The units mnemonic must be uppercase. -
<start date>
and<end date>
are timestamp literals or theNOW()
function. All date/time components are optional from right-to-left except for year. Example literal values are"2013-11-13 11:03:02.571"
,"2013-11-13 11:03"
, and "2013-11-13"
. Quotes are optional if the value has only a year component.
The two timestamp values are truncated to their <units>
precision, similar to the TRUNCATE
function. The difference between the two timestamps in then computed in the requested units, producing a numeric result:
-
If the truncated
<start date>
is less than the truncated<end date>
, the result is a positive number. -
If the truncated
<start date>
is greater than the truncated<end date>
, the result is a negative number. -
If the two truncated timestamps are the same, the result is zero.
Common practice is to use an <end date>
>= <start date>
to produce a positive value. Hence, when NOW()
is used, it normally appears as the <end date>
parameter. Since DATEDIFF
produces a constant, it is most useful when combined with other computations, e.g., to divide an average by the number of days in a query date range. For example:
.../Message?m=COUNT(*)/DATEDIFF(DAY, "2013-12-01", NOW())&q=SendDate:["2013-12-01" TO NOW()]
This query counts the number of messages sent between 2013-12-01 and "now", inclusively, and divides by the number of days between these same dates.
When the parameter to a metric function is a link path, the WHERE
function can be used to filter values passed to the metric calculation. When the WHERE
function follows a link name, it filters objects connected via that link. For example:
GET /Email/Message/_aggregate?range=0&m=MIN(Sender.Person.WHERE(Department:sales).Office)
This query selects all Message
objects, but the WHERE
filter selects only those Person
objects whose Department
contains the term sales
. Hence, the MIN
metric finds the lowest (first alphabetic) Office
value whose sender is in the sales department.
When the WHERE
function appears at the beginning of the metric function, it filters perspective objects, acting in conjunction with the query selection expression, if present. When used in this way, the WHERE
function must be followed by .*field*
where field
is the metric field name or link path. For example:
GET /Email/Person/_aggregate?range=0&m=COUNT(WHERE(Department:sales).*)&q=Office:aliso
Since the COUNT
function’s outer WHERE
filter is followed by .*
, the function counts objects. The WHERE
expression is AND-ed with the query expression Office:also
. Therefore, this query is similar (but not identical) to the following:
GET /Email/Person/_aggregate?range=0&m=COUNT(*)&q=Office:aliso AND Department:sales
Because the scope of an outer WHERE function remains at the perspective table, multiple WHERE functions can be chained together to filter perspective objects. Example:
GET /Email/Message/_aggregate?range=0
&m=MIN(WHERE(Sender.Person.Department:sales).WHERE(InternalRecipients.Person.Office:aliso).Size)
&q=Tags=AfterHours
This query finds the smallest Message.Size
value of all messages where (1) the message is tagged with AfterHours
, (2) at least one sender belongs to the sales
department, and (3) at least one internal recipient resides in the aliso
office.
Note that filtering objects within the metric function does not produce exactly the same results as filtering objects in the query expression. In the previous example, the query expression Tags=AfterHours
determines which objects are initially selected. The number of these objects is reflected in the <totalobjects>
element returned by the query. However, the objects actually passed to the metric function are filtered by the other WHERE
functions. This number is probably less than <totalobjects>
but otherwise unknown unless the metric function is COUNT(*)
. A typical result is shown below:
<results>
<aggregate metric="MIN(WHERE(Sender.Person.Department:sales).
WHERE(InternalRecipients.Person.Office:aliso).Size)" query="Tags=AfterHours"/>
<totalobjects>6029</totalobjects>
<value>4802</value>
</results>
Suppose we moved the outer WHERE functions to the query parameter:
GET /Email/Message/_aggregate?range=0&m=MIN(Size)
&q=Sender.Person.Department:sales AND InternalRecipients.Person.Office:aliso AND Tags=AfterHours
Here, the objects are filtered by the query expression, so only those selected are reflected in the <totalobjects>
element returned by the query. A typical result is shown below:
<results>
<aggregate metric="MIN(Size)" query="Sender.Person.Department:sales AND InternalRecipients.Person.Office:aliso AND Tags=AfterHours"/>
<totalobjects>2</totalobjects>
<value>4802</value>
</results>
Compared to the previous query, which selected 6029 objects, this query selected only 2 objects. This also shows that selecting objects in the query expression (&q) is more efficient, especially for large data sets.
However, sometimes metric-level filtering is the only way to get the results we need. For example, suppose we want to count messages sent by people in two different departments but grouped by the same office. We could use the following multi-metric query:
GET /Email/Message/_aggregate?range=0
&m=COUNT(WHERE(Sender.Person.Department:sales).*),
COUNT(WHERE(Sender.Person.Department:support).*)
&f=TOP(3,Sender.Person.Office)
This query performs two COUNT(*)
functions: the first selects messages whose sender belong to sales
, the second those sender belong to support
. By definition, when a TOP or BOTTOM grouping field is used with a multi-metric query, the groups are generated from the sorted values of the first metric function. Secondary metric functions follow the same grouping pattern so that the metric computations are correlated. A typical result for this query looks like this:
<results>
<aggregate metric="COUNT(WHERE(Sender.Person.Department:sales).*),
COUNT(WHERE(Sender.Person.Department:support).*)" query="*"
group="TOP(3,Sender.Person.Office)"/>
<totalobjects>6030</totalobjects>
<groupsets>
<groupset group="TOP(3,Sender.Person.Office)"
metric="COUNT(WHERE(Sender.Person.Department:sales).*)">
<summary>81</summary>
<totalgroups>40</totalgroups>
<groups>
<group>
<metric>68</metric>
<field name="Sender.Person.Office">Maidenhead</field>
</group>
<group>
<metric>5</metric>
<field name="Sender.Person.Office">Madrid</field>
</group>
<group>
<metric>2</metric>
<field name="Sender.Person.Office">Aliso Viejo 5</field>
</group>
</groups>
</groupset>
<groupset group="TOP(3,Sender.Person.Office)"
metric="COUNT(WHERE(Sender.Person.Department:support).*)">
<summary>16</summary>
<totalgroups>40</totalgroups>
<groups>
<group>
<metric>14</metric>
<field name="Sender.Person.Office">Maidenhead</field>
</group>
<group>
<metric>0</metric>
<field name="Sender.Person.Office">Madrid</field>
</group>
<group>
<metric>2</metric>
<field name="Sender.Person.Office">Aliso Viejo 5</field>
</group>
</groups>
</groupset>
</groupsets>
</results>
This allows the offices with the most email senders in sales
to be directly compared to the number of email senders in support
for the same offices. If we executed two separate aggregate queries—one for each function—but with the same grouping parameter, we would get different, uncorrelated groups.
Technical Documentation
[Doradus OLAP Databases](https://github.com/dell-oss/Doradus/wiki/Doradus OLAP Databases)
- Architecture
- OLAP Database Overview
- OLAP Data Model
- Doradus Query Language (DQL)
- OLAP Object Queries
- OLAP Aggregate Queries
- OLAP REST Commands
- Architecture
- Spider Database Overview
- Spider Data Model
- Doradus Query Language (DQL)
- Spider Object Queries
- Spider Aggregate Queries
- Spider REST Commands
- [Installing and Running Doradus](https://github.com/dell-oss/Doradus/wiki/Installing and Running Doradus)
- [Deployment Guidelines](https://github.com/dell-oss/Doradus/wiki/Deployment Guidelines)
- [Doradus Configuration and Operation](https://github.com/dell-oss/Doradus/wiki/Doradus Configuration and Operation)
- [Cassandra Configuration and Operation](https://github.com/dell-oss/Doradus/wiki/Cassandra Configuration and Operation)