Refactor to decouple availability from table schemas #165

Merged · 5 commits · Mar 13, 2017
CHANGELOG.md (58 changes: 57 additions & 1 deletion)
@@ -10,6 +10,21 @@ Current

### Added:

- [Major refactor of availability, schemas, and tables](https://github.com/yahoo/fili/pull/165)
    * `ImmutableAvailability` - provides an immutable, typed replacement for maps of column availabilities
    * New `Table` implementations:
        * `BasePhysicalTable` - core implementation
        * `ConcretePhysicalTable` - creates an `ImmutableAvailability`
    * `Schema` implementations:
        * `BaseSchema` - has `Columns` and `Granularity`
        * `PhysicalTableSchema` - adds `ZonedTimeGrain` and name mappings to the base
        * `LogicalTableSchema` - adds a builder from a table group to the base
        * `ResultSetSchema` - adds transforming with-ers to the base
    * `ApiName`, `TableName` - added static factories from `String` to `Name` (see the sketch after this list)
    * `ErrorMessageFormat` - added for errors during the `ResultSetMapper` cycle
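A minimal usage sketch of the new pieces. The `ResultSetSchema` constructor shape is taken from the async-workflow hunk further down in this PR; the `TableName.of` factory name is an assumption read off the "static factory from `String` to `Name`" bullet, not confirmed by this diff:

```java
import com.yahoo.bard.webservice.data.ResultSetSchema;
import com.yahoo.bard.webservice.data.config.names.TableName;
import com.yahoo.bard.webservice.druid.model.query.AllGranularity;

import java.util.Collections;

public class RefactorSketch {
    public static void main(String[] args) {
        // Schemas are now small immutable values: a granularity plus columns.
        // (Constructor shape matches the diff in the async package below.)
        ResultSetSchema schema = new ResultSetSchema(AllGranularity.INSTANCE, Collections.emptySet());

        // Assumed factory name; the changelog only says "from String to Name".
        TableName name = TableName.of("wikipedia_hourly");

        System.out.println(schema + " / " + name);
    }
}
```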

- [Added default base class for all dimension types](https://github.com/yahoo/fili/pull/177)
    * Added base classes `DefaultKeyValueStoreDimensionConfig`, `DefaultLookupDimensionConfig` and `DefaultRegisteredLookupDimensionConfig` to create default dimensions.
@@ -29,7 +44,31 @@ Current

### Changed:

- [Major refactor of availability, schemas, and tables](https://github.com/yahoo/fili/pull/165)
    * `Schema` and `Table` became interfaces
    * `Table` has-a `Schema` (see the sketch after this list)
    * `PhysicalTable` extends `Table`; the interface only supports read-only operations
    * `Schema` is constructed as immutable; `Column`s no longer bind to a `Schema`
        * Removed the `addNew...Column` method
    * `Schema` implementations are now `BaseSchema`, `PhysicalTableSchema`, `LogicalTableSchema`, and `ResultSetSchema`
    * `DimensionLoader` uses `ConcretePhysicalTable`
    * `PhysicalTableDefinition` made some fields private, accepts iterables, and returns immutable dimensions
    * `ResultSet` constructor parameter order swapped
    * `ResultSetMapper` now depends on `ResultSetSchema`
    * `TableDataSource` constructor arg narrows: `PhysicalTable` -> `ConcretePhysicalTable`
    * `DataApiRequest` constructor arg narrows: `Table` -> `LogicalTable`
    * `DruidQueryBuilder` now builds data source models polymorphically from the new physical tables
    * `ApiFilter` schema validation moved to `DataApiRequest`
    * Guava version bumped to 21.0
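A rough sketch of the reshaped type relationships, assuming accessor names that mirror the bullets above (illustrative only, not the exact Fili signatures):

```java
import java.util.Set;

// Stand-in marker types so the sketch compiles on its own.
interface Column {}
interface Granularity {}
interface Availability {}

// Schema is now an interface and immutable after construction;
// columns no longer bind back to the schema that holds them.
interface Schema {
    Set<Column> getColumns();
    Granularity getGranularity();
}

// "Table has-a Schema": composition rather than inheritance.
interface Table {
    Schema getSchema();
}

// Read-only by design: the addColumn/removeColumn-style mutators are gone.
interface PhysicalTable extends Table {
    Availability getAvailability();  // assumed accessor for the immutable availability
}
```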

- [Request IDs now support underscores.](https://github.com/yahoo/fili/pull/176)

- Added support for extensions defining new Query types
    * `TestDruidWebService` assumes unknown query types behave like `GroupBy`, `TimeSeries`, and `TopN`
@@ -55,6 +94,11 @@ Current

### Fixed:

- [Major refactor of availability, schemas, and tables](https://github.com/yahoo/fili/pull/165)
    * Ordering of fields on serialization could be inconsistent if intermediate stages used `HashSet` or `HashMap`.
    * Several constructors switched to accept `Iterable` and return `LinkedHashSet` to emphasize the importance of ordering and to prevent `HashSet` intermediates, which disrupt ordering. (A small illustration follows this list.)
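A JDK-only illustration of the failure mode and the fix:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class OrderingDemo {
    public static void main(String[] args) {
        // HashSet iterates in hash-bucket order, so any serialization that
        // walks it can emit fields in an order that varies by JVM or run.
        Set<String> hashed = new HashSet<>(Arrays.asList("dimension", "metric", "timestamp"));

        // LinkedHashSet preserves insertion order, making downstream
        // serialization deterministic -- hence the constructor change.
        Set<String> ordered = new LinkedHashSet<>(Arrays.asList("dimension", "metric", "timestamp"));

        System.out.println(hashed);   // order not guaranteed
        System.out.println(ordered);  // [dimension, metric, timestamp]
    }
}
```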


- [Fix Lookup Dimension Serialization](https://github.com/yahoo/fili/pull/187)
    * Fixed a bug where a lookup dimension was serialized as a dimension spec in both the outer and the inner query

@@ -68,6 +112,18 @@ Current

### Removed:

- [Major refactor of availability, schemas, and tables](https://github.com/yahoo/fili/pull/165)
    * Removed `ZonedSchema` (all methods moved to its child class `ResultSetSchema`)
    * `PhysicalTable` no longer supports mutable availability (see the migration sketch after this list)
        * Removed `addColumn`, `removeColumn`, `getWorkingIntervals`, and `commit`
        * `resetColumns` moved to `BasePhysicalTable`
        * Other mutators no longer exist; availability is immutable
        * `getAvailableIntervals` removed (`availability.getAvailableIntervals` replaces it)
    * `DruidResponseParser` `buildSchema` removed; its work moved to the `ResultSetSchema` constructor
    * `BaseTableLoader` removed redundant `buildLogicalTable` methods

v0.7.36 - 2017/01/30

AbstractBinderFactory.java
@@ -167,6 +167,7 @@ public abstract class AbstractBinderFactory implements BinderFactory {
SYSTEM_CONFIG.getPackageVariableName("loader_scheduler_thread_pool_size"),
LOADER_SCHEDULER_THREAD_POOL_SIZE_DEFAULT
);
public static final String SYSTEM_CONFIG_TIMEZONE_KEY = "timezone";

private ObjectMappersSuite objectMappers;

@@ -355,7 +356,10 @@ protected Class<? extends DruidResponseParser> buildDruidResponseParser() {
*/
protected Clock getClock() {
return Clock.system(
-                ZoneId.of(SYSTEM_CONFIG.getStringProperty(SYSTEM_CONFIG.getPackageVariableName("timezone"), "UTC"))
+                ZoneId.of(SYSTEM_CONFIG.getStringProperty(
+                        SYSTEM_CONFIG.getPackageVariableName(SYSTEM_CONFIG_TIMEZONE_KEY),
+                        "UTC"
+                ))
);
}
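For context, a JDK-only sketch of the pattern `getClock()` implements: pin a `Clock` to a configured zone with a UTC fallback, so time-dependent code can share one injectable clock. `System.getProperty` stands in for `SYSTEM_CONFIG`, and the unqualified key mirrors `SYSTEM_CONFIG_TIMEZONE_KEY` above:

```java
import java.time.Clock;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class ClockDemo {
    public static void main(String[] args) {
        // Stand-in for SYSTEM_CONFIG.getStringProperty(..., "UTC").
        String zone = System.getProperty("timezone", "UTC");

        // One shared, zone-pinned clock; tests can swap in Clock.fixed(...).
        Clock clock = Clock.system(ZoneId.of(zone));
        System.out.println(ZonedDateTime.now(clock));
    }
}
```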


DruidDimensionsLoader.java
@@ -17,6 +17,7 @@
import com.yahoo.bard.webservice.druid.model.query.DruidSearchQuery;
import com.yahoo.bard.webservice.druid.model.query.RegexSearchQuerySpec;
import com.yahoo.bard.webservice.druid.model.query.SearchQuerySpec;
import com.yahoo.bard.webservice.table.ConcretePhysicalTable;
import com.yahoo.bard.webservice.table.PhysicalTableDictionary;
import com.yahoo.bard.webservice.web.handlers.RequestContext;

@@ -110,6 +111,8 @@ public DruidDimensionsLoader(
.collect(Collectors.toList());

dataSources = physicalTableDictionary.values().stream()
.filter(physicalTable -> physicalTable instanceof ConcretePhysicalTable)
.map(physicalTable -> (ConcretePhysicalTable) physicalTable)
.map(TableDataSource::new)
.collect(Collectors.toList());
}
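The filter-then-cast idiom added above, shown on plain JDK types; the stream keeps only the narrower type so the cast in the next stage is always safe:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterCastDemo {
    public static void main(String[] args) {
        // Analogous to: filter(t -> t instanceof ConcretePhysicalTable)
        //               .map(t -> (ConcretePhysicalTable) t)
        List<Integer> lengths = Stream.of((CharSequence) "a", new StringBuilder("bb"), "ccc")
                .filter(cs -> cs instanceof String)  // keep only the subtype
                .map(cs -> (String) cs)              // cast is safe after the filter
                .map(String::length)
                .collect(Collectors.toList());
        System.out.println(lengths);  // [1, 3]
    }
}
```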

@@ -3,8 +3,8 @@
package com.yahoo.bard.webservice.async;

 import com.yahoo.bard.webservice.data.ResultSet;
+import com.yahoo.bard.webservice.data.ResultSetSchema;
 import com.yahoo.bard.webservice.druid.model.query.AllGranularity;
-import com.yahoo.bard.webservice.table.Schema;
import com.yahoo.bard.webservice.web.PreResponse;
import com.yahoo.bard.webservice.web.responseprocessors.ResponseContext;
import com.yahoo.bard.webservice.web.responseprocessors.ResponseContextKeys;
@@ -61,7 +61,11 @@ public static PreResponse buildErrorPreResponse(Throwable throwable) {
}

return new PreResponse(
-                new ResultSet(Collections.emptyList(), new Schema(AllGranularity.INSTANCE)),
+                new ResultSet(new ResultSetSchema(
+                        AllGranularity.INSTANCE,
+                        Collections.emptySet()),
+                        Collections.emptyList()
+                ),
responseContext
);
}
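The "constructor parameter order swapped" changelog bullet, distilled from the hunk above; both argument orders appear in the diff itself:

```java
import com.yahoo.bard.webservice.data.ResultSet;
import com.yahoo.bard.webservice.data.ResultSetSchema;
import com.yahoo.bard.webservice.druid.model.query.AllGranularity;

import java.util.Collections;

public class EmptyResultSets {
    // Post-#165 argument order: schema first, rows second.
    // (Pre-#165 it was rows first, then the now-removed Schema type.)
    public static ResultSet empty() {
        return new ResultSet(
                new ResultSetSchema(AllGranularity.INSTANCE, Collections.emptySet()),
                Collections.emptyList()
        );
    }
}
```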

ResponseException.java
@@ -84,6 +84,7 @@ public ResponseException(
* @param statusType Status type of the response
* @param druidQuery The druid query being processed
* @param error Exception object with error details
*
* @deprecated In order to ensure correct serialization of the Druid Query, an ObjectWriter with all appropriate
* configuration should be passed in to the constructor
*/
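A sketch of the non-deprecated path the javadoc above points at: build a Jackson `ObjectWriter` carrying the query-serialization configuration and pass it to the constructor. The exact `ResponseException` overload is not shown in this hunk, so the call site is left as a comment:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

public class WriterSetup {
    public static ObjectWriter druidQueryWriter() {
        // Configure the mapper the same way the service serializes Druid
        // queries, then snapshot that configuration in an immutable writer.
        ObjectMapper mapper = new ObjectMapper();
        return mapper.writer();
    }
    // Usage (overload assumed, not shown in this diff):
    // new ResponseException(statusType, druidQuery, error, druidQueryWriter());
}
```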

DruidQueryBuilder.java
@@ -11,6 +11,7 @@
import com.yahoo.bard.webservice.druid.model.datasource.DataSource;
import com.yahoo.bard.webservice.druid.model.datasource.QueryDataSource;
import com.yahoo.bard.webservice.druid.model.datasource.TableDataSource;
import com.yahoo.bard.webservice.druid.model.datasource.UnionDataSource;
import com.yahoo.bard.webservice.druid.model.filter.Filter;
import com.yahoo.bard.webservice.druid.model.having.Having;
import com.yahoo.bard.webservice.druid.model.orderby.LimitSpec;
@@ -21,6 +22,7 @@
import com.yahoo.bard.webservice.druid.model.query.GroupByQuery;
import com.yahoo.bard.webservice.druid.model.query.TimeSeriesQuery;
import com.yahoo.bard.webservice.druid.model.query.TopNQuery;
import com.yahoo.bard.webservice.table.ConcretePhysicalTable;
import com.yahoo.bard.webservice.table.LogicalTable;
import com.yahoo.bard.webservice.table.LogicalTableDictionary;
import com.yahoo.bard.webservice.table.PhysicalTable;
@@ -30,6 +32,8 @@
import com.yahoo.bard.webservice.table.resolver.PhysicalTableResolver;
import com.yahoo.bard.webservice.web.DataApiRequest;

import com.google.common.collect.Sets;

import org.joda.time.DateTimeZone;
import org.joda.time.Interval;
import org.slf4j.Logger;
@@ -199,51 +203,36 @@ protected GroupByQuery buildGroupByQuery(
granularity = template.getTimeGrain().buildZonedTimeGrain(timeZone);
}

+        DataSource dataSource;
         if (!template.isNested()) {
             LOG.trace("Building a single pass druid groupBy query");

-            // The data source is the table directly, since there is no nested query below us
-            DataSource dataSource = new TableDataSource(table);
-
-            // Filters must be applied at the lowest level as they exclude data from aggregates
-            return new GroupByQuery(
-                    dataSource,
+            dataSource = buildTableDataSource(table);
+        } else {
+            LOG.trace("Building a multi pass druid groupBy query");
+            // Build the inner query without an order by, since we only want to do that at the top level
+            // Sorts don't apply to inner queries and Filters only apply to the innermost query
+            GroupByQuery query = buildGroupByQuery(
+                    template.getInnerQuery(),
+                    table,
                     granularity,
+                    timeZone,
                     groupByDimensions,
                     filter,
                     having,
-                    template.getAggregations(),
-                    template.getPostAggregations(),
                     intervals,
-                    druidOrderBy
+                    (LimitSpec) null
             );
+            dataSource = new QueryDataSource(query);
+            // Filters have been handled by the inner query, are not needed/allowed on the outer query
+            filter = null;
         }

-        LOG.trace("Building a multi pass druid groupBy query");
-
-        // Build the inner query without an order by, since we only want to do that at the top level
-        TemplateDruidQuery nestedQuery = template.getInnerQuery();
-        GroupByQuery query = buildGroupByQuery(
-                nestedQuery,
-                table,
-                granularity,
-                timeZone,
-                groupByDimensions,
-                filter,
-                having,
-                intervals,
-                (LimitSpec) null
-        );
-
-        // The data source is the inner query we just built
-        DataSource dataSource = new QueryDataSource(query);
-
-        // Build the wrapping query without filters
+        // Filters must be applied at the lowest level as they exclude data from aggregates
         return new GroupByQuery(
                 dataSource,
                 granularity,
                 groupByDimensions,
-                (Filter) null,
+                filter,
                 having,
                 template.getAggregations(),
                 template.getPostAggregations(),
@@ -252,6 +241,21 @@ protected GroupByQuery buildGroupByQuery(
         );
     }

+    /**
+     * Build a data source from a table.
+     *
+     * @param table  A fact table or fact table view
+     *
+     * @return A table datasource for a fact table or a union data source for a fact table view
+     */
+    private DataSource buildTableDataSource(PhysicalTable table) {
+        if (table instanceof ConcretePhysicalTable) {
+            return new TableDataSource((ConcretePhysicalTable) table);
+        } else {
+            return new UnionDataSource(Sets.newHashSet(table));
+        }
+    }

/**
* Builds a druid topN query.
*
@@ -308,10 +312,8 @@ protected TopNQuery buildTopNQuery(
LOG.trace("Building a single pass druid topN query");

-        // The data source is the table directly, since there is no nested query below us
-        DataSource dataSource = new TableDataSource(table);
-
         return new TopNQuery(
-                dataSource,
+                buildTableDataSource(table),
// The check that the set of dimensions has exactly one element is currently done above
granularity,
groupByDimension.iterator().next(),
@@ -325,7 +327,7 @@
}

/**
-     * Builds a druid timeseries query.
+     * Builds a druid TimeSeries query.
*
* @param template The query template. Not nested since nesting is not supported in druid timeseries queries
* @param table The physical table that underlies the lowest-level datasource
@@ -367,11 +369,8 @@ protected TimeSeriesQuery buildTimeSeriesQuery(

LOG.trace("Building a single pass druid timeseries query");

-        // The data source is the table directly, since there is no nested query below us
-        DataSource dataSource = new TableDataSource(table);
-
         return new TimeSeriesQuery(
-                dataSource,
+                buildTableDataSource(table),
granularity,
filter,
template.getAggregations(),