Configuring Metrics

Overview

Fili metrics are either named aggregations of Druid metrics or named expressions over other Fili metrics. They range from simple arithmetic to complex combinations of aggregations and post-aggregations.

There are two types of metrics:

  1. First order metrics are metrics that directly aggregate a Druid metric. For example, you might have two metrics, page_views and additive_page_views, which compute the longSums of their equivalent Druid metrics, druid_page_views and druid_additive_page_views.

  2. Higher order metrics are metrics defined in terms of other metrics. For example, you might have a total_page_views metric that is the sum of page_views and additive_page_views.
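To make the distinction concrete, the following plain-Java sketch computes the three example metrics by hand over a couple of in-memory rows. It does not use Fili or Druid at all; the class name `MetricOrderSketch`, its helper methods, and the sample values are made up for illustration.

```java
import java.util.List;
import java.util.Map;

// Illustration only: plain Java, no Fili or Druid classes involved.
class MetricOrderSketch {

    // First-order: aggregate one raw (Druid-style) column directly, like a longSum.
    static long longSum(List<Map<String, Long>> rows, String druidColumn) {
        return rows.stream().mapToLong(row -> row.get(druidColumn)).sum();
    }

    // Higher-order: defined only in terms of other, already computed metrics.
    static long totalPageViews(long pageViews, long additivePageViews) {
        return pageViews + additivePageViews;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        List<Map<String, Long>> rows = List.of(
                Map.of("druid_page_views", 10L, "druid_additive_page_views", 3L),
                Map.of("druid_page_views", 5L, "druid_additive_page_views", 2L)
        );
        long pageViews = longSum(rows, "druid_page_views");
        long additivePageViews = longSum(rows, "druid_additive_page_views");
        System.out.println(totalPageViews(pageViews, additivePageViews)); // prints 20
    }
}
```

In Fili itself the aggregation runs inside Druid rather than in application code; the sketch only shows which metric depends on what.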

Loading Metrics

Fili relies on a MetricDictionary to resolve names into metrics. This means you need to define two things:

  1. The names of your metrics

  2. The metrics themselves

Naming Metrics

You can name your metrics by implementing the ApiMetricName interface. The interface has two responsibilities:

  1. It provides a formal name to the metrics that can be used by other parts of the system (like the BaseTableLoader).

  2. It determines if the metric is valid for a given TimeGrain.

For example, consider the following enum:

    public enum ExampleApiMetricName implements ApiMetricName {
        PAGE_VIEWS,
        ADDITIVE_PAGE_VIEWS,
        TOTAL_PAGE_VIEWS;
      
        private final TimeGrain minimumGrain;

        ExampleApiMetricName() {
            this.minimumGrain = DefaultTimeGrain.DAY;
        }

        @Override
        public String getApiName() {
            return EnumUtils.enumJsonName(this);
        }

        @Override
        public boolean isValidFor(TimeGrain grain) {
            // Check that the requested grain is at least as coarse as the metric's minimum grain.
            return grain.compareTo(minimumGrain) >= 0;
        }
        ...
    }

This enum specifies that all metrics are valid for time grains at the day level and coarser (week, month, year, etc.). The WikiApiMetricName in the fili-wikipedia-example provides a more complete example.
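The grain check can be tried out in isolation. The sketch below is a minimal stand-in, not Fili code: `SketchGrain` is a hypothetical enum ordered from fine to coarse that plays the role of TimeGrain, and `GrainCheckSketch` copies the comparison from `isValidFor`.

```java
// Illustration only: SketchGrain is a stand-in for Fili's TimeGrain, ordered fine to coarse.
enum SketchGrain { HOUR, DAY, WEEK, MONTH }

class GrainCheckSketch {
    // Same minimum grain as the example enum above.
    static final SketchGrain MINIMUM_GRAIN = SketchGrain.DAY;

    // Mirrors isValidFor: valid when the requested grain is at least as coarse as the minimum.
    static boolean isValidFor(SketchGrain grain) {
        return grain.compareTo(MINIMUM_GRAIN) >= 0;
    }

    public static void main(String[] args) {
        // Prints one line per grain, e.g. "HOUR -> false" and "DAY -> true".
        for (SketchGrain grain : SketchGrain.values()) {
            System.out.println(grain + " -> " + isValidFor(grain));
        }
    }
}
```

Because enum constants compare by declaration order, a request at the hour grain is rejected while day, week, and month are accepted.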

You also need to give Fili the names of the Druid metrics. This is done by implementing the FieldName interface in much the same way as ApiMetricName, except that Druid metric names do not require a minimum time grain.

Implementing FieldName allows you to feed the Druid metric names into the BaseTableLoader, which uses them to configure the physical tables. See Binding Resources for more information about loading tables. The WikiDruidMetricName enum provides an example.
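As a rough sketch, a Druid metric name enum for the running example might look like the following. To keep the block self-contained it declares its own stand-in interface, `SketchFieldName`; the single `asName()` method is an assumption about the shape of FieldName, and the enum name `ExampleDruidMetricName` is hypothetical. Check the real interface before copying this.

```java
import java.util.Locale;

// Stand-in for Fili's FieldName interface; the single asName() method is an assumption.
interface SketchFieldName {
    String asName();
}

// Hypothetical enum listing the Druid metrics behind the example API metrics.
enum ExampleDruidMetricName implements SketchFieldName {
    DRUID_PAGE_VIEWS,
    DRUID_ADDITIVE_PAGE_VIEWS;

    @Override
    public String asName() {
        // Lower-case the constant name, e.g. DRUID_PAGE_VIEWS -> druid_page_views.
        return name().toLowerCase(Locale.ENGLISH);
    }
}
```

Unlike the API metric enum, there is no grain field here, because Druid metric names carry no minimum time grain.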

Building and Loading Metrics

Next, you need to write the code that builds the metrics and loads them into the MetricDictionary at Fili start up. To do so, you need to implement the MetricLoader interface, which has a single method loadMetricDictionary.

For example, suppose you want to register the three page view metrics introduced in the Overview.

The loadMetricDictionary method might then look something like this:

private MetricMaker longSumMaker;
private MetricMaker sumMaker;

@Override
public void loadMetricDictionary(MetricDictionary metricDictionary) {
    buildMetricMakers(metricDictionary);
    List<MetricInstance> metricInstances = buildMetricInstances(metricDictionary);
    addToMetricDictionary(metricDictionary, metricInstances);
}

MetricMakers

A MetricMaker knows how to construct a LogicalMetric. A LogicalMetric is a named Druid query plus a Mapper for post-Druid processing. For example, the longSumMaker knows how to construct a longSum aggregation, while the sumMaker knows how to construct an arithmetic post-aggregation using addition.

For the running example, a longSumMaker and a sumMaker are needed:

private void buildMetricMakers(MetricDictionary metricDictionary) {
    longSumMaker = new LongSumMaker(metricDictionary);
    sumMaker = new ArithmeticMaker(metricDictionary, ArithmeticPostAggregationFunction.PLUS);
}

MetricInstances

A MetricInstance knows how to use a MetricMaker to make a metric. In the running example, there are three metrics: page_views, additive_page_views, and total_page_views. A MetricInstance is needed for each metric:

private List<MetricInstance> buildMetricInstances(MetricDictionary metricDictionary) {
    return Arrays.<MetricInstance>asList(
            new MetricInstance(PAGE_VIEWS, longSumMaker, DRUID_PAGE_VIEWS),
            new MetricInstance(ADDITIVE_PAGE_VIEWS, longSumMaker, DRUID_ADDITIVE_PAGE_VIEWS),
            new MetricInstance(TOTAL_PAGE_VIEWS, sumMaker, ADDITIVE_PAGE_VIEWS, PAGE_VIEWS)
    );
}

Observe that this is where you tie metrics to their dependencies. Since page_views and additive_page_views are both first-order metrics, they depend on their respective Druid metrics. Meanwhile, total_page_views depends on additive_page_views and page_views.

Creating Metrics and loading the MetricDictionary

Finally, the metrics need to be made and added to the MetricDictionary. In the example, this is handled by the addToMetricDictionary method:

private void addToMetricDictionary(MetricDictionary metricDictionary, List<MetricInstance> metrics) {
    metrics.stream().map(MetricInstance::make).forEach(metricDictionary::add);
}
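The ordering of the instances matters if a higher-order metric looks up its dependencies in the dictionary while it is being made. The sketch below illustrates that idea with simplified stand-ins only: `LoaderFlowSketch`, `SketchInstance`, and the textual "formulas" are hypothetical, and the assumption that dependencies are resolved from the dictionary at make time is not spelled out in this document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: simplified stand-ins, not Fili's MetricDictionary or MetricInstance.
class LoaderFlowSketch {

    // Stand-in for a MetricInstance: a metric name, a kind of maker, and dependency names.
    static final class SketchInstance {
        final String name;
        final boolean firstOrder;
        final List<String> dependencies;

        SketchInstance(String name, boolean firstOrder, String... dependencies) {
            this.name = name;
            this.firstOrder = firstOrder;
            this.dependencies = List.of(dependencies);
        }
    }

    // Builds a textual formula for one metric, looking up dependencies in the dictionary.
    static String make(SketchInstance instance, Map<String, String> dictionary) {
        if (instance.firstOrder) {
            // First-order: the dependency is a Druid metric name, not a dictionary entry.
            return "longSum(" + instance.dependencies.get(0) + ")";
        }
        List<String> operands = new ArrayList<>();
        for (String dependency : instance.dependencies) {
            String formula = dictionary.get(dependency);
            if (formula == null) {
                throw new IllegalStateException("Unknown dependency: " + dependency);
            }
            operands.add(formula);
        }
        return String.join(" + ", operands);
    }

    // Makes each metric in list order and adds it to the dictionary.
    static Map<String, String> load(List<SketchInstance> instances) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        for (SketchInstance instance : instances) {
            dictionary.put(instance.name, make(instance, dictionary));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = load(List.of(
                new SketchInstance("page_views", true, "druid_page_views"),
                new SketchInstance("additive_page_views", true, "druid_additive_page_views"),
                new SketchInstance("total_page_views", false, "additive_page_views", "page_views")
        ));
        System.out.println(dictionary.get("total_page_views"));
    }
}
```

Under that assumption, listing total_page_views before its two dependencies would fail, which is why the example lists the first-order metrics first.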

The fili-wikipedia-example has a sample metric loader called WikiMetricLoader.

Of course, Fili also needs to be told about the MetricLoader that you just defined. See Binding Resources for details on how to do that.

Custom Metrics

Most custom metrics will be simple operations on metrics that already exist, using makers that already exist. In this case, defining the new metric is as simple as adding the following line to your buildMetricInstances method (or equivalent):

   new MetricInstance(NEW_METRIC_NAME, metricMaker, DEPENDENT, METRIC, NAMES)

and adding NEW_METRIC_NAME to your implementation of ApiMetricName.

See Built-in Metrics for a list of makers that come with Fili.

Custom Makers

Sometimes you need more than what Fili provides out of the box. Perhaps you need to perform a calculation that cannot be expressed in terms of other metrics, or you are working with a datatype that Druid does not support natively. In such cases, you can define your own custom maker. As a running example, consider the ArithmeticMaker, which models post-aggregation arithmetic.

First, you need to decide what kind of metric you want to define: first-order or higher-order.

If the metric is first-order, then you should extend RawAggregationMetricMaker. You will also likely have to add a custom Druid aggregation to your Druid cluster.

If the metric is higher-order, then you should extend MetricMaker.

ArithmeticMaker builds higher-order metrics, so it extends MetricMaker.

The bulk of the work in defining a custom Maker is in overriding the makeInner method, which performs the actual construction of the LogicalMetric:

@Override
protected LogicalMetric makeInner(String metricName, List<String> dependentMetrics) {
...
}

makeInner generally performs the following steps:

  1. Merge Dependent Queries: If there is more than one dependent metric, merge the queries of each dependent metric into a single query. This can be accomplished using the MetricMaker::getMergedQuery method.

    Since ArithmeticMaker takes at least two other metrics, its dependent metrics need to be merged:

    TemplateDruidQuery mergedQuery = getMergedQuery(dependentMetrics);

    A TemplateDruidQuery is the scaffolding for a Druid query, and it knows how to merge with another TemplateDruidQuery.

  2. Build Aggregators and Post-Aggregators: Construct the aggregations and post-aggregations the query depends on.

    In the case of ArithmeticMaker, the query consists of the aggregations performed by its dependent metrics, a field accessor for each aggregation, and a single arithmetic post-aggregation.

    Set<Aggregation> aggregations = mergedQuery.getAggregations();
    
    //Creates a field-accessor post-aggregation for the aggregator in each dependentMetric.
    List<PostAggregation> operands = dependentMetrics.stream()
            .map(this::getNumericField)
            .collect(Collectors.toList());
    PostAggregation arithmeticPostAgg = new ArithmeticPostAggregation(metricName, function, operands);

  3. Build the inner query: Construct the inner query, if the metric requires query nesting.

    The ArithmeticMaker uses the inner query constructed by the getMergedQuery method. See AggregationAverageMaker for an example maker that builds a more interesting inner query.

    TemplateDruidQuery innerQuery = mergedQuery.getInnerQuery();

  4. Build TemplateDruidQuery: Construct a TemplateDruidQuery.

    ArithmeticMaker constructs the following TemplateDruidQuery:

    TemplateDruidQuery templateDruidQuery = new TemplateDruidQuery(
            aggregations,
            Collections.singleton(arithmeticPostAgg),
            innerQuery,
            mergedQuery.getTimeGrain()
    );

  5. Build Mapper: Construct a Mapper. If a metric does not require post-Druid processing, then an instance of NoOpResultSetMapper should be used.

    The ArithmeticMaker uses a ColumnMapper that is injected at construction time as resultSetMapper. So all that needs to be done here is construct a new version of resultSetMapper with the name of the metric being constructed:

    ColumnMapper mapper = resultSetMapper.withColumnName(metricName);

  6. Build LogicalMetric: Construct and return the LogicalMetric.

    return new LogicalMetric(templateDruidQuery, mapper, metricName);

Mappers

Mappers are subclasses of ResultSetMapper that allow us to perform post-Druid processing in a row-wise fashion. Fili constructs the post-Druid workflow by iterating through each LogicalMetric and composing their Mappers into a function chain. When the Druid result comes in, the result set is then passed through each link in the chain in the order of the metrics defined in the query.
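The function-chain idea can be sketched with plain `java.util.function.Function` values. This is an illustration of the composition order only, not Fili's workflow code: rows are modelled as plain maps, and `MapperChainSketch`, `chain`, and `onColumn` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongUnaryOperator;

// Illustration only: rows are plain maps and mappers are plain functions, not Fili types.
class MapperChainSketch {

    // Composes the per-metric mappers into one function, applied in list order.
    static Function<Map<String, Object>, Map<String, Object>> chain(
            List<Function<Map<String, Object>, Map<String, Object>>> mappers
    ) {
        Function<Map<String, Object>, Map<String, Object>> composed = Function.identity();
        for (Function<Map<String, Object>, Map<String, Object>> mapper : mappers) {
            composed = composed.andThen(mapper);
        }
        return composed;
    }

    // Hypothetical mapper factory: returns a copy of the row with one numeric column transformed.
    static Function<Map<String, Object>, Map<String, Object>> onColumn(String column, LongUnaryOperator op) {
        return row -> {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put(column, op.applyAsLong(((Number) row.get(column)).longValue()));
            return copy;
        };
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("page_views", 10L);
        Map<String, Object> mapped = chain(List.of(
                onColumn("page_views", value -> value + 1),
                onColumn("page_views", value -> value * 2)
        )).apply(row);
        System.out.println(mapped.get("page_views")); // prints 22, i.e. (10 + 1) * 2
    }
}
```

Swapping the two mappers would give 21 instead, which is why the order of the metrics in the query matters for the chain.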

To define a Mapper, you need to override two methods: map(Result result, Schema schema) and map(Schema schema). The first allows you to modify a single row in the result set. The second allows you to modify the result schema.

In order to allow Result processing in a (moderately) type-safe way, the Result class provides a variety of methods for extracting the value of a metric column of the appropriate type:

  1. getMetricValue
  2. getMetricValueAsNumber
  3. getMetricValueAsString
  4. getMetricValueAsBoolean
  5. getMetricValueAsJsonNode

The first returns the metric value as an Object. The others cast the result to the appropriate type (BigDecimal in the case of getMetricValueAsNumber).

NonNumericMetrics contains simple sample mappers for each of the non-numeric metric types.

SketchRoundUpMapper is an example of a mapper for numeric metrics.

RowNumMapper is an example of a mapper that adds a column.
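For a feel of how the row method and the schema method work together when a mapper adds a column, here is a minimal sketch in the spirit of a row-numbering mapper. It is not the RowNumMapper source: `SketchMapper` and `RowNumberSketchMapper` are hypothetical stand-ins that model a row as a map and a schema as a list of column names, and the column name `rowNum` is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: SketchMapper mimics the two-method shape described above with plain collections.
abstract class SketchMapper {
    // Row hook: returns the (possibly modified) row.
    abstract Map<String, Object> map(Map<String, Object> row, List<String> schema);

    // Schema hook: returns the (possibly modified) list of column names.
    abstract List<String> map(List<String> schema);
}

// Hypothetical mapper that adds a running row number to every row.
class RowNumberSketchMapper extends SketchMapper {
    private int nextRowNumber = 1;

    @Override
    Map<String, Object> map(Map<String, Object> row, List<String> schema) {
        // Copy the row, then add the new column value.
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.put("rowNum", nextRowNumber++);
        return copy;
    }

    @Override
    List<String> map(List<String> schema) {
        // The new column must also appear in the schema.
        List<String> copy = new ArrayList<>(schema);
        copy.add("rowNum");
        return copy;
    }
}
```

The point of the sketch is that a mapper which adds a column has to change both the rows and the schema; whether Fili's own mappers copy or mutate their inputs is not covered here.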

Complex Metrics

Complex (non-numeric) metrics are configured the same as custom numeric metrics. Fili supports all native JSON types:

  1. Numbers
  2. Strings
  3. Booleans
  4. Objects/Lists

Numbers, Strings, and Booleans are parsed into the corresponding Java types. JSON Objects and Lists are extracted from the Druid response as JsonNode instances. By default, Fili will pass the results from Druid on to the user unchanged. If post-Druid processing is required, a Mapper can be added to the mapper workflow stage. See Custom Metrics for details on how to add a Mapper to the workflow.

If Druid returns a JSON null, then Fili will parse it into the Java null.