-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to decouple availability from table schemas #165
Conversation
😱 |
|
||
columns.addAll( | ||
Streams.stream(metricEntries) | ||
.map(entry-> new MetricColumnWithValueType(entry.getKey(), entry.getValue().asText())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this matters but there is a space missing between entry
and ->
* @return name | ||
*/ | ||
public String getName() { | ||
return this.name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
might be unnecessary.
tableGroup.getApiMetricNames().stream() | ||
.filter(apiMetricName -> apiMetricName.isValidFor(granularity)) | ||
.map(ApiMetricName::getApiName) | ||
.map(name -> new LogicalMetricColumn(name, metricDictionary.get(name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be easier to read if this is indented by 2 tabs more
@NotNull Map<String, String> logicalToPhysicalColumnNames | ||
) { | ||
super(columns); | ||
this.granularity = timeGrain; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
might be unnecessary
this.granularity = timeGrain; | ||
|
||
this.logicalToPhysicalColumnNames = Collections.unmodifiableMap(logicalToPhysicalColumnNames); | ||
this.physicalToLogicalColumnNames = Collections.unmodifiableMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
might be unnecessary
* @return true if this table supports this column explicitly | ||
*/ | ||
public boolean containsLogicalName(String | ||
logicalName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String logicalName
can be put on the same line
if (result != null) { | ||
return result; | ||
} | ||
return Collections.emptyList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two return
's can be combined to return result == null ? Collections.emptyList() : result
*/ | ||
public class ImmutableWrapperMap<K, V> implements Map<K, V> { | ||
|
||
Map<K, V> map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, if we implement Map
, does it make sense to have this? Looks like HashMap
doesn't do it this way http://grepcode.com/file_/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java/?v=source
@archolewa I'll restructure the commits tomorrow. |
f9045c4
to
4620d8c
Compare
There was so much interdependency I couldn't even figure out how to simplify via several commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just style change and questions. Thanks!
@@ -219,8 +219,7 @@ private ResultSet getResultSet(JsonNode serializedResultSet) { | |||
* | |||
* @return ZonedSchema object generated from the JsonNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix comment here from ZonedSchema
to ResultSetSchema
@@ -229,21 +228,25 @@ private ZonedSchema getZonedSchema(JsonNode schemaNode) { | |||
); | |||
|
|||
//Recreate ZonedSchema from granularity and timezone values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix comment here
/** | ||
* The schema for a result set. | ||
*/ | ||
public class ResultSetSchema extends BaseSchema implements Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not need to implement Schema
since BaseSchema
implements Schema
already?
Map<String, Object> schemaComponents = new HashMap<>(); | ||
|
||
String timeId = (schema instanceof ZonedSchema) ? | ||
((ZonedSchema) schema).getDateTimeZone().getID() : | ||
SYSTEM_CONFIG.getStringProperty(SYSTEM_CONFIG.getPackageVariableName("timezone"), "UTC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are completely discarding the timezone
configuration property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time zone is now riding along via the granularity rather than in the schema top level.
definition.getGrain(), | ||
definition.getLogicalToPhysicalNames() | ||
); | ||
new LinkedHashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra code.
dataSourceName, | ||
DAY.buildZonedTimeGrain(DateTimeZone.UTC), | ||
[] as Set | ||
, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move the comma back?
dataSourceName, | ||
DAY.buildZonedTimeGrain(DateTimeZone.UTC), | ||
[] as Set | ||
, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move the comma back?
dataSourceName, | ||
DAY.buildZonedTimeGrain(DateTimeZone.UTC), | ||
[] as Set | ||
, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move the comma back?
"NETWORK", | ||
DAY.buildZonedTimeGrain(UTC), | ||
columns | ||
, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we move the comma back?
@@ -0,0 +1,88 @@ | |||
Support Union Data Source Well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to merge these rfc notes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part 1
And this will definitely need a CHANGELOG entry (or possibly many)
@@ -167,6 +167,7 @@ | |||
SYSTEM_CONFIG.getPackageVariableName("loader_scheduler_thread_pool_size"), | |||
LOADER_SCHEDULER_THREAD_POOL_SIZE_DEFAULT | |||
); | |||
public static final String SYSTEM_CONFIG_TIMEZONE = "timezone"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In places where the constant is the property key, we've generally taken the approach of including _KEY
in the constant name, so that there's no confusion and expectation that this constant holds the actual timezone, rather than the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// The data source is the table directly, since there is no nested query below us | ||
DataSource dataSource = new TableDataSource(table); | ||
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long do we expect this code to live like this? I'm leery of the possibly unsafe cast w/o a check / log nearby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we are casting to a ConceretePhysicalTable
in each of these building methods. Should we just update the signatures of tehse methods to take a ConcretePhysicalTable
instead, and then do the cast before invoking these methods?
Also, what's the planned alternative to this? It looks like we're implicitly assuming that when we go to actually build a druid query, we are working with actual druid tables. Which makes sense until you factor in union datasources, unless union datasource are also considered ConcretePhysicalTables
(haven't gotten that far).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'm putting in a slightly more resilient variation. However, we should still tweak this in some way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a little tech debt here. Existing union data sources will construct with either non-Concrete physical tables or sets of tables. Table data source will only be usable by concrete physical tables. This polymorphism will push up the stack, but I'd like to take it as tech debt right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound reasonable.
/** | ||
* The schema for a result set. | ||
*/ | ||
public class ResultSetSchema extends BaseSchema implements Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to declare that it implements Schema
, since extends BaseSchema
does that.
import javax.validation.constraints.NotNull; | ||
|
||
/** | ||
* The schema for a result set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if we could expand a bit on what it means to be the schema of a result set. If we can't, that's fine, but I'd love to see this class comment be more useful / descriptive than just repeating the class name.
Things like expected use, meaning, and what it adds beyond (or how it differs from) a Schema or BaseSchema would all be good candidates for info to add here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the next commit is better.
/** | ||
* The granularity of the ResultSet. | ||
*/ | ||
private Granularity granularity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be final
?
*/ | ||
public ResultSetSchema( | ||
@NotNull Granularity granularity, Iterable<Column> columns | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can unwrap
* | ||
* @param resultSetSchema the result set schema being copied | ||
*/ | ||
public ResultSetSchema(ResultSetSchema resultSetSchema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed. Since ResultSetSchema
is immutable, creating a new object with exactly the same contents isn't needed because it's exactly the same as simply using the 1st instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part 2
@@ -61,7 +61,9 @@ public static PreResponse buildErrorPreResponse(Throwable throwable) { | |||
} | |||
|
|||
return new PreResponse( | |||
new ResultSet(Collections.emptyList(), new Schema(AllGranularity.INSTANCE)), | |||
new ResultSet( | |||
new ResultSetSchema(AllGranularity.INSTANCE, Collections.emptySet()), Collections.emptyList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Param list should "chop" down if it needs to wrap. (Collections.emptyList()
belongs on a new line)
* | ||
* @return the result set being constructed | ||
*/ | ||
public ResultSetSchema withAddColumn(Column c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it's not particularly confusing what c
is, but could we get a better name for the param in the interest of self-documenting (if nothing else)?
static ApiMetricName of(String name) { | ||
return new ApiMetricName() { | ||
@Override | ||
public boolean isValidFor(final TimeGrain grain) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
not needed
* | ||
* @param name the name being wrapped | ||
* | ||
* @return an anonymous subclass instance of ApiMetricName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste error
public String asName() { | ||
return name; | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this implement equals and hashcode?
/** | ||
* Constructor. | ||
* | ||
* @param name The column name | ||
*/ | ||
protected MetricColumn(String name) { | ||
public MetricColumn(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this made public b/c the static "factory" thing went away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
protected Result map(Result result, Schema schema) { | ||
return result; | ||
protected Result map(Result result, ResultSetSchema schema) { | ||
throw new UnsupportedOperationException("This code should never be reached."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should it never be reached? It's not clear why this method is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is potentially called only from the general map function, and it isn't here.
Result newResult; | ||
ResultSetSchema schema = map(resultSet.getSchema()); | ||
MetricColumn column = schema.getColumn(ROW_NUM_COLUMN_NAME, MetricColumn.class).orElseThrow( | ||
() -> new IllegalStateException("Unexpected missing column") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should say which column is missing?
And should this be in the ErrorMessage enum?
And we should probably log this as an error
, since it's likely to kill the request I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just going to remove this check. We literally just added this column to the schema the line before.
if (columnName == null) { | ||
throw new IllegalStateException("Cannot map results without a column name"); | ||
} | ||
|
||
MetricColumn metricColumn = (MetricColumn) schema.getColumn(columnName); | ||
MetricColumn metricColumn = schema.<MetricColumn>getColumn(columnName, MetricColumn.class).orElseThrow( | ||
() -> new IllegalStateException("Unexpected missing column") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should say which column is missing?
And should this be in the ErrorMessage enum?
And we should probably log this as an error
, since it's likely to kill the request I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next commit will add capture for ResultSetMapper errors above where mapping is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've push the error handling for this up into ResultSetResponseProcessor.
super(DefaultDataSourceType.TABLE, Collections.singleton(physicalTable)); | ||
|
||
this.name = physicalTable.getFactTableName(); | ||
this.name = physicalTable.getAvailability().getDataSourceNames().stream().findFirst().get().asName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify this, and because ConcretePhyiscalTable
can only have 1 backing table, could/should we push this logic into ConcretePhysicalTable
and hide it behind a getTableName()
method?
And actually, after looking a bit deeper, ConcretePhyiscalTable
already / still has a getFactTableName
method that does this logic, so we should use that and rely on the contract / concept of ConcretePhyiscalTable
.
71e52d5
to
7e024ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor quibbles and questions. In general though, it looks great.
@@ -202,8 +203,9 @@ protected GroupByQuery buildGroupByQuery( | |||
if (!template.isNested()) { | |||
LOG.trace("Building a single pass druid groupBy query"); | |||
|
|||
// TODO FIXME hardcoding to concrete for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue for fixing this?
// The data source is the table directly, since there is no nested query below us | ||
DataSource dataSource = new TableDataSource(table); | ||
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we are casting to a ConceretePhysicalTable
in each of these building methods. Should we just update the signatures of tehse methods to take a ConcretePhysicalTable
instead, and then do the cast before invoking these methods?
Also, what's the planned alternative to this? It looks like we're implicitly assuming that when we go to actually build a druid query, we are working with actual druid tables. Which makes sense until you factor in union datasources, unless union datasource are also considered ConcretePhysicalTables
(haven't gotten that far).
@@ -40,11 +39,17 @@ | |||
* @param jsonResult Druid results in json | |||
* @param schema Schema for results | |||
* @param queryType the type of query, note that this implementation only supports instances of | |||
* @param dateTimeZone the time zone used for format the results | |||
* {@link DefaultQueryType} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line goes with the parameter queryType
, not dateTimeZone
@@ -2,11 +2,11 @@ | |||
// Licensed under the terms of the Apache license. Please see LICENSE.md file distributed with this work for terms. | |||
package com.yahoo.bard.webservice.table.resolver; | |||
|
|||
import com.yahoo.bard.webservice.table.PhysicalTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be kept where it was.
generateGranularity(schemaNode.get(SCHEMA_GRANULARITY).asText(), timezone), | ||
timezone | ||
); | ||
Granularity granularity = generateGranularity(schemaNode.get(SCHEMA_GRANULARITY).asText(), timezone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used.
*/ | ||
public ResultSet(List<Result> results, Schema schema) { | ||
public ResultSet(ResultSetSchema schema, List<Result> results) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we flipping the order of parameters? Seems like an unnecessary backwards incompatible change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the old order was stupid
/** | ||
* The schema for a result set. | ||
*/ | ||
public class ResultSetSchema extends BaseSchema implements Schema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bit of inconsistency here concerning whether or not a ResultSetSchema
has an order. It takes as an argument a simple Set
, but withAddColumn
refers to a final
column, which only makes sense if the Schema
has an ordering.
So in other words, we have a behavior, where the original columns are unordered, until we add another column, in which case the new schema has a fixed (arbitrary) order such that the new columns added always appear last?
I think I remember us talking about something like this, where something took an unordered collection and spat out an ordered collection, and I remember being ok with it, but I don't remember if it was for this class or a different one. But either way, this "no fixed order until we add a new column then order" is kind of weird.
If we want this schema to have some kind of order, then we should probably take in a LinkedHashSet
. Or try to explain that it imposes a (potentially arbitrary) order on the columns depending on how the passed in set's iterator happens to spit out the columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm switching to an Iterable in LinkedHashSet out idiom
One of the libraries we use supports that and it makes a lot of sense because it unambiguously commuicates that we're set-ifying, and the Iterable makes it clear that we're concerned with order while being polymorphic across sources.
@@ -2,9 +2,9 @@ | |||
// Licensed under the terms of the Apache license. Please see LICENSE.md file distributed with this work for terms. | |||
package com.yahoo.bard.webservice.table.resolver; | |||
|
|||
import com.yahoo.bard.webservice.table.PhysicalTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should stay where it was.
import com.yahoo.bard.webservice.table.PhysicalTable; | ||
import com.yahoo.bard.webservice.data.metric.TemplateDruidQuery; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -4,12 +4,12 @@ | |||
|
|||
import static com.yahoo.bard.webservice.web.ErrorMessageFormat.TABLE_SCHEMA_UNDEFINED; | |||
|
|||
import com.yahoo.bard.webservice.table.PhysicalTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7e024ba
to
fce490b
Compare
👍 |
c4c01bb
to
21ac864
Compare
👍 The upcoming PRs will change many things in here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part A
new ResultSet(new ResultSetSchema( | ||
AllGranularity.INSTANCE, | ||
Collections.emptySet()), | ||
Collections.emptyList()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/notablocker
Wrap the closing paren onto the next line
@@ -35,6 +38,9 @@ | |||
import org.slf4j.Logger; | |||
import org.slf4j.LoggerFactory; | |||
|
|||
import avro.shaded.com.google.common.collect.Sets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is certainly not the right import. We shouldn't depend on a shadowed import in another library.
// The data source is the table directly, since there is no nested query below us | ||
DataSource dataSource = new TableDataSource(table); | ||
//DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just delete this? If not, add a comment explaining why we're keeping makes sense.
// The data source is the table directly, since there is no nested query below us | ||
DataSource dataSource = new TableDataSource(table); | ||
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have the same "concrete if concrete otherwise Union" logic that the GroupBy builder method has?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Fix incoming.
Set<TableName> names = table.getAvailability().getDataSourceNames(); | ||
if (names.isEmpty()) { | ||
throw new IllegalStateException("Misconfigured table with no backing datasource."); | ||
} else if (names.size() == 1 && table instanceof ConcretePhysicalTable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the TimeseriesQueryBuild method yet a 3rd version of constructing the DataSource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored in next checkin
* | ||
* @param logicalName Logical name to lookup in physical table | ||
* @return Translated logicalName if applicable | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
JavaDoc not needed, since it's a duplicate of the interface
public Map<Column, List<Interval>> getAvailableIntervals() { | ||
return getAvailability().getAvailableIntervals(); | ||
} | ||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank line needed.
* | ||
* @return tableEntries map of column to set of available intervals | ||
* | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
JavaDoc not needed, since it's a duplicate of the interface
* @return The time grain of this physical table | ||
* | ||
* @deprecated use getSchema().getGranularity() | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
JavaDoc not needed, since it's a duplicate of the interface
*/ | ||
public void resetColumns(SegmentMetadata segmentMetadata, DimensionDictionary dimensionDictionary) { | ||
Map<String, Set<Interval>> dimensionIntervals = segmentMetadata.getDimensionIntervals(); | ||
Map<String, Set<Interval>> metricIntervals = segmentMetadata.getMetricIntervals(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
Can just inline thse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part B
|
||
@Override | ||
public LinkedHashSet<Column> getColumns() { | ||
return columns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be (somewhere, probably the constructor) made to be immutable (if possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harder than it seems. There are no types that ARE immutable but also extend LinkedHashSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I see. That's dumb.
import javax.validation.constraints.NotNull; | ||
|
||
/** | ||
* An instance of Physical table that is backed by a single fact table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
It's not an instance, but perhaps it's an implementation?
/** | ||
* Create a concrete physical table. | ||
* | ||
* @param name Fili name of the physical table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc should make it more clear what the differences is between this and the other constructor, particularly around things like what values are used as defaults...
* Getter. | ||
* | ||
* @return The granularity | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
This javadoc isn't needed, since this is just a getter.
|
||
/** | ||
* Physical Table represents a druid table. | ||
* An interface describing a config level physical table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's odd that this interface references config. Isn't it a very core interface, and used by lots more than config?
It's also a bit self-referential... Perhaps we cast it in a FactSourceRepresentation light?
/** | ||
* An availability which guarantees immutability on its contents. | ||
*/ | ||
//public class ImmutableAvailability extends ImmutableWrapperMap<Column, List<Interval>> implements Availability { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
//public class ImmutableAvailability extends ImmutableWrapperMap<Column, List<Interval>> implements Availability { | ||
public class ImmutableAvailability implements Availability { | ||
|
||
private final TableName name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
Technically, because you're holding an interface here which itself has no immutability assertion, this class cannot ensure immutability.
|
||
@Override | ||
public Set<TableName> getDataSourceNames() { | ||
return Sets.newHashSet(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this conversion in the constructor?
|
||
@Override | ||
public int hashCode() { | ||
return columnIntervals.hashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to include the TableName?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs addressing?
if (this == obj) { | ||
return true; | ||
} | ||
if (obj instanceof ImmutableAvailability) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to pay attention to TableName...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reached end of changes.
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[0])?.get(BardDimensionField.DESC) == "u" | ||
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.DESC) == "4" | ||
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[3])?.get(BardDimensionField.DESC) == "" | ||
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[3])?.get(BardDimensionField.ID) == "foo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these changing ordinal and order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea. @garyluoex , did you change these?
resultWithNullDimensionKey.getDimensionRow(schema.columns.toArray()[0])?.get(BardDimensionField.ID) == "" | ||
resultWithNullDimensionKey.getDimensionRow(schema.columns.toArray()[0])?.get(BardDimensionField.DESC) == | ||
resultWithNullDimensionKey.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.ID) == "" | ||
resultWithNullDimensionKey.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.DESC) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the ordinal change?
Note: This seems to apply to many other places in this file as well.
MetricColumn.addNewMetricColumn(schema, "lookback_pageViews") | ||
MetricColumn.addNewMetricColumn(schema, "retentionPageViews") | ||
ResultSet resultSet = new DruidResponseParser().parse(jsonResult, schema, DefaultQueryType.LOOKBACK) | ||
ResultSetSchema schema = new ResultSetSchema(DAY, [new MetricColumn("pageViews"), new MetricColumn("lookback_pageViews"), new MetricColumn("retentionPageViews")].toSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
Rather than calling .toSet()
, it's more idiomatic to call as Set
in Groovy.
Note: Applies to other places in this PR as well.
|
||
then: | ||
thrown(UnsupportedOperationException) | ||
|
||
} | ||
|
||
def "Build the schema from the query"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To where did this test move?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved it here: ResultSetResponseProcessorSpec (next commit)
} | ||
|
||
Schema schema = new ResultSetSchema(DAY, columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
Can just return the new ResultSetSchema(...)
directly. No need to assign to temp var
|
||
Map<Column, List<Interval>> availabilityMap1 = [:] | ||
Map<Column, List<Interval>> availabilityMap2 = [:] | ||
Map<Column, List<Interval>> availabilityMap3 = [:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give these better names? "1", "2", "3" isn't very clear about what these will hold.
responseContext.put("missingIntervals", ["a","b","c", new SimplifiedIntervalList([interval]), bigDecimal]) | ||
responseContext.put( | ||
"missingIntervals", | ||
["a", "b", "c", new SimplifiedIntervalList([interval]), bigDecimal] as ArrayList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to coerce this to an ArrayList
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arraylist is serializable
} | ||
ResultSetSchema schema = new ResultSetSchema(granularity, columns) | ||
return schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
Just return it directly?
ScanSearchProviderManager.getInstance("gender") | ||
) | ||
def filtered_metric_name = "FOO_NO_BAR" | ||
Set<ApiMetricName> metricNames = (["FOO", filtered_metric_name].collect() { ApiMetricName.of(it)}) as Set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlocker
empty parens are not needed on collect
here
import com.yahoo.bard.webservice.web.DataApiRequest | ||
import com.yahoo.bard.webservice.web.ResponseFormatType | ||
|
||
import com.fasterxml.jackson.databind.JsonNode | ||
|
||
import org.joda.time.DateTimeZone | ||
|
||
import avro.shaded.com.google.common.collect.Sets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import
Making Schema very immutable. Changing column build pattern as a result.
9822ebf
to
6384e6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few final things. Nothing big.
@@ -219,37 +241,19 @@ protected GroupByQuery buildGroupByQuery( | |||
); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is wrong here, and I think there's a regression where, in a nested case, filters are also being applied at the outer query, when they were no before. I don't think this is correctness issue, but it's definitely an optimality issue in terms of query size, as well as work Druid is doing for filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, never mind. I see what's happening. I missed the fact that filters get set to null
when building the inner query. We should definitely add a comment on that, and probably toss a comment on the outer query filter location, so that it's obvious what had happened (it's easy to miss).
|
||
// The data source is the table directly, since there is no nested query below us | ||
DataSource dataSource = new TableDataSource(table); | ||
filter = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should definitely add a comment here to call this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* | ||
* @return The logical table built | ||
*/ | ||
public LogicalTable buildLogicalTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these were public methods, are we sure we can / want to just remove them? I'm pretty sure other code bases were relying on them for configuration, so if we could deprecate them instead, that would be better.
)); | ||
} | ||
|
||
public void setAvailability(Availability availability) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be protected
?
return super.toString() + " factTableName: " + factTableName + " alignment: " + getTableAlignment(); | ||
} | ||
/** | ||
* Get the time grain from the physical table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc doesn't match what the method seems to be doing, based on it's name.
return physicalToLogicalColumnNames.getOrDefault(physicalName, Collections.singleton(physicalName)); | ||
} | ||
@Deprecated | ||
Set<Column> getColumns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of these deprecated methods, they can be made default
and the implementations can be lifted / removed from the BasePhysicalTable
up into this interface. Right now, there's a mix of some of the deprecated methods being defaulted (but still implemented), some deprecated, but their implementations pushed into the base class, etc.
public ImmutableAvailability(TableName tableName, Map<Column, List<Interval>> map) { | ||
this.name = tableName; | ||
columnIntervals = ImmutableMap.copyOf(map); | ||
dataSourceNames = Sets.newHashSet(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make it into an immutable set (either ImmutableSet.of(name)
or something using Collections)
resultSet[0].getMetricValueAsNumber(schema.getColumn("pageViews")) == 1 as BigDecimal | ||
resultSet[0].getMetricValueAsNumber(schema.getColumn("time_spent")) == 2 as BigDecimal | ||
Result firstResult = resultSet.get(0) | ||
List<Column> columns = new ArrayList<>(schema.columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused
Set<Column> dimensionColumns | ||
Column ageColumn | ||
Column genderColumn | ||
Column unknownColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should all be DimensionColumn
-typed
@@ -43,13 +36,17 @@ class DruidResponseParserSpec extends Specification { | |||
|
|||
@Shared DimensionDictionary dimensionDictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/NotABlockerBecauseYouDidn'tChangeIt
It would be great to make this not be shared so that we can't leak state across tests. (or at least not as easily)
|
||
import org.joda.time.Interval; | ||
|
||
import avro.shaded.com.google.common.collect.Sets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 pending getting the build to pass.
7f2e498
to
5772ead
Compare
No description provided.