Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Metric Parameter for Aggregate Queries (OLAP)

JoeWinter edited this page Feb 19, 2015 · 1 revision

[Table of Contents](https://github.com/dell-oss/Doradus/wiki/OLAP Databases: Table-of-Contents) | Previous | Next
OLAP Aggregate Queries: Metric Parameter for Aggregate Queries


The aggregate query metric parameter has multiple formats, as described below.

Metric Functions

In its basic form, the metric parameter is a comma-separated list of metric functions. Each function performs a statistical calculation on a scalar or link field. The general syntax of a metric function is:

function(field)

Where function is a metric function name and field defines what the metric function is computed on. The field must be a scalar or link defined in the application’s schema. It can belong to the perspective table, or it can be a path to a field linked to the perspective table (e.g., DirectReports.Name).

If the metric function parameter is a field path with a reflexive link, the transitive operator can be used. Example:

COUNT=(DirectReports^)

This function counts all DirectReports link values for each perspective object (i.e., all employees in the org chart reporting to each person). Scalar fields of transitively-linked objects can also be used as the metric function’s parameter. For example:

MAX(Manager^.FirstName)

This function finds the maximum FirstName (highest, alphabetically) of all managers “up” the org chart for each perspective object. For more information on the transitive operator, see the section Transitive Function.

The supported metric functions are summarized below:

  • COUNT(field): Counts the values for the specified field. If the field is multi-valued with respect to the perspective object, all values are counted for each selected object. For example, COUNT(Tags) tallies all Tags values of all objects. The COUNT function also allows the special value "*", which counts the selected objects in the perspective table regardless of any fields. That is, COUNT(*) counts objects instead of values.

  • DISTINCT(field): This metric is similar to COUNT except that it totals unique values for the given field. For example, COUNT(Size) finds the total number of values of the Size field, whereas DISTINCT(Size) finds the number of unique Size values.

  • SUM(field): Sums the non-null values for the given numeric field. The field’s type must be integer, long, float, or double.

  • AVERAGE(field): Computes the average value for the given field, which must be integer, long, float, double, or timestamp. Note that the AVERAGE function uses SQL null-elimination semantics. This means that objects for which the metric field does not have a value are not considered for computation even though the object itself was selected. As an example, consider an aggregate query that computes AVERAGE(foo) for four selected objects, whose value for foo are 2, 4, 6, and null. The value computed will be 4 ((2+4+6)/3) not 3 ((2+4+6+0)/4) because the object with the null field is eliminated from the computation.

  • MIN(field): Computes the minimum value for the given field. For scalar fields, MIN computes the lowest value found based on the field type’s natural order. For link fields, MIN computes the lowest object ID found in the link field’s values based on the string form of the object ID.

  • MAX(field): Computes the maximum value for the given field, which must be a predefined scalar or link field.

  • MAXCIUNT(field): This function computes the maximum cardinality of the given scalar or link field. For example, for the Message table, MAXCOUNT(ExternalRecipients) returns the highest number of ExternalRecipients values found for selected objects.

  • MINCOUNT(field): This function computes the minimum cardinality of the given field. If any selected objects have no value for the given field, the result will be 0.

When applied to a single-valued field, the result of both MINCOUNT and MAXCOUNT will be either 0 or 1.

Example metric functions are shown below:

COUNT(*)
DISTINCT(Tags)
MAX(Sender.Person.LastName)
AVERAGE(Size)
MAXCOUNT(Sender.Person.DirectReports)

Multiple Metric Functions

The metric parameter can be a comma-separated list of metric functions. All functions are computed concurrently as objects are selected. An example metric parameter with multiple functions is shown below:

MIN(Size),MAX(Size),COUNT(InternalRecipients)

The results of multi-metric aggregate queries are described later.

Metric Expressions

Doradus OLAP allows the metric parameter to be a list of metric expressions. A metric expression is a set of one or more algebraic clauses containing metric functions, constants, and the DATEDIFF function. Multiple clauses are combined with basic math operators (+, -, *, and /) and parentheses. This allows arbitrary calculations to be performed for selected objects. Examples of metric expressions are shown below:

MAX(Size) + COUNT(InternalRecipients.Person.Domain)

COUNT(*) * 2

SUM(Size) / COUNT(Sender.Person.MessageAddresses) / (DATEDIFF(DAY, "2013-11-01", NOW(PST)) + 1)

As with metric functions, metric expressions can be used in global and grouped aggregate queries. Multiple metric expressions can be computed in a single aggregate query. Each metric expression compute a value for each group. Computations are performed as integers or floating-point values as necessary. When a metric expression attempts to divide by 0, the metric value is returned as "Infinity". Parentheses override default operator evaluation precedence in the usual manner. Calculations are performed across all objects or for each group when a grouping parameter is provided.

DATEDIFF Function

Metric expressions can use the DATEDIFF function to calculate the difference between two timestamp constants in a specific granularity. Its result is an integer that may be positive, negative, or zero. The DATEDIFF function is similar to that used by SQL Server and uses the following syntax:

DATEDIFF(<units>, <start date>, <end date>)

Where:

  • <units> can be any of the following mnemonics: SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, and YEAR. The units mnemonic must be uppercase.

  • <start date> and <end date> are timestamp literals or the NOW() function. All date/time components are optional from right-to-left except for year. Example literal values are "2013-11-13 11:03:02.571", "2013-11-13 11:03", and "2013-11-13". Quotes are optional if the value has only a year component.

The two timestamp values are truncated to their <units> precision, similar to the TRUNCATE function. The difference between the two timestamps in then computed in the requested units, producing a numeric result:

  • If the truncated <start date> is less than the truncated <end date>, the result is a positive number.

  • If the truncated <start date> is greater than the truncated <end date>, the result is a negative number.

  • If the two truncated timestamps are the same, the result is zero.

Common practice is to use an <end date> >= <start date> to produce a positive value. Hence, when NOW() is used, it normally appears as the <end date> parameter. Since DATEDIFF produces a constant, it is most useful when combined with other computations, e.g., to divide an average by the number of days in a query date range. For example:

.../Message?m=COUNT(*)/DATEDIFF(DAY, "2013-12-01", NOW())&q=SendDate:["2013-12-01" TO NOW()]

This query counts the number of messages sent between 2013-12-01 and "now", inclusively, and divides by the number of days between these same dates.

WHERE Filter

When the parameter to a metric function is a link path, the WHERE function can be used to filter values passed to the metric calculation. When the WHERE function follows a link name, it filters objects connected via that link. For example:

GET /Email/Message/_aggregate?range=0&m=MIN(Sender.Person.WHERE(Department:sales).Office)

This query selects all Message objects, but the WHERE filter selects only those Person objects whose Department contains the term sales. Hence, the MIN metric finds the lowest (first alphabetic) Office value whose sender is in the sales department.

When the WHERE function appears at the beginning of the metric function, it filters perspective objects, acting in conjunction with the query selection expression, if present. When used in this way, the WHERE function must be followed by .*field* where field is the metric field name or link path. For example:

GET /Email/Person/_aggregate?range=0&m=COUNT(WHERE(Department:sales).*)&q=Office:aliso

Since the COUNT function’s outer WHERE filter is followed by .*, the function counts objects. The WHERE expression is AND-ed with the query expression Office:also. Therefore, this query is similar (but not identical) to the following:

GET /Email/Person/_aggregate?range=0&m=COUNT(*)&q=Office:aliso AND Department:sales

Because the scope of an outer WHERE function remains at the perspective table, multiple WHERE functions can be chained together to filter perspective objects. Example:

GET /Email/Message/_aggregate?range=0
	&m=MIN(WHERE(Sender.Person.Department:sales).WHERE(InternalRecipients.Person.Office:aliso).Size)
	&q=Tags=AfterHours

This query finds the smallest Message.Size value of all messages where (1) the message is tagged with AfterHours, (2) at least one sender belongs to the sales department, and (3) at least one internal recipient resides in the aliso office.

Note that filtering objects within the metric function does not produce exactly the same results as filtering objects in the query expression. In the previous example, the query expression Tags=AfterHours determines which objects are initially selected. The number of these objects is reflected in the <totalobjects> element returned by the query. However, the objects actually passed to the metric function are filtered by the other WHERE functions. This number is probably less than <totalobjects> but otherwise unknown unless the metric function is COUNT(*). A typical result is shown below:

<results>
	<aggregate metric="MIN(WHERE(Sender.Person.Department:sales).
		WHERE(InternalRecipients.Person.Office:aliso).Size)" query="Tags=AfterHours"/>
	<totalobjects>6029</totalobjects>
	<value>4802</value>
</results>

Suppose we moved the outer WHERE functions to the query parameter:

GET /Email/Message/_aggregate?range=0&m=MIN(Size)
	&q=Sender.Person.Department:sales AND InternalRecipients.Person.Office:aliso AND Tags=AfterHours

Here, the objects are filtered by the query expression, so only those selected are reflected in the <totalobjects> element returned by the query. A typical result is shown below:

<results>
	<aggregate metric="MIN(Size)" query="Sender.Person.Department:sales AND InternalRecipients.Person.Office:aliso AND Tags=AfterHours"/>
	<totalobjects>2</totalobjects>
	<value>4802</value>
</results>

Compared to the previous query, which selected 6029 objects, this query selected only 2 objects. This also shows that selecting objects in the query expression (&q) is more efficient, especially for large data sets.

However, sometimes metric-level filtering is the only way to get the results we need. For example, suppose we want to count messages sent by people in two different departments but grouped by the same office. We could use the following multi-metric query:

GET /Email/Message/_aggregate?range=0
	&m=COUNT(WHERE(Sender.Person.Department:sales).*),
		COUNT(WHERE(Sender.Person.Department:support).*)
	&f=TOP(3,Sender.Person.Office)

This query performs two COUNT(*) functions: the first selects messages whose sender belong to sales, the second those sender belong to support. By definition, when a TOP or BOTTOM grouping field is used with a multi-metric query, the groups are generated from the sorted values of the first metric function. Secondary metric functions follow the same grouping pattern so that the metric computations are correlated. A typical result for this query looks like this:

<results>
	<aggregate metric="COUNT(WHERE(Sender.Person.Department:sales).*),
		COUNT(WHERE(Sender.Person.Department:support).*)" query="*" 
		group="TOP(3,Sender.Person.Office)"/>
	<totalobjects>6030</totalobjects>
	<groupsets>
		<groupset group="TOP(3,Sender.Person.Office)" 
			metric="COUNT(WHERE(Sender.Person.Department:sales).*)">
			<summary>81</summary>
			<totalgroups>40</totalgroups>
			<groups>
				<group>
					<metric>68</metric>
					<field name="Sender.Person.Office">Maidenhead</field>
				</group>
				<group>
					<metric>5</metric>
					<field name="Sender.Person.Office">Madrid</field>
				</group>
				<group>
					<metric>2</metric>
					<field name="Sender.Person.Office">Aliso Viejo 5</field>
				</group>
			</groups>
		</groupset>
		<groupset group="TOP(3,Sender.Person.Office)" 
			metric="COUNT(WHERE(Sender.Person.Department:support).*)">
			<summary>16</summary>
			<totalgroups>40</totalgroups>
			<groups>
				<group>
					<metric>14</metric>
					<field name="Sender.Person.Office">Maidenhead</field>
				</group>
				<group>
					<metric>0</metric>
					<field name="Sender.Person.Office">Madrid</field>
				</group>
				<group>
					<metric>2</metric>
					<field name="Sender.Person.Office">Aliso Viejo 5</field>
				</group>
			</groups>
		</groupset>
	</groupsets>
</results>

This allows the offices with the most email senders in sales to be directly compared to the number of email senders in support for the same offices. If we executed two separate aggregate queries—one for each function—but with the same grouping parameter, we would get different, uncorrelated groups.

Clone this wiki locally