Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to decouple availability from table schemas #165

Merged
merged 5 commits into from
Mar 13, 2017

Conversation

michael-mclawhorn
Copy link
Contributor

No description provided.

@archolewa
Copy link
Contributor

😱


columns.addAll(
Streams.stream(metricEntries)
.map(entry-> new MetricColumnWithValueType(entry.getKey(), entry.getValue().asText()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this matters but there is a space missing between entry and ->

* @return name
*/
public String getName() {
return this.name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be unnecessary.

tableGroup.getApiMetricNames().stream()
.filter(apiMetricName -> apiMetricName.isValidFor(granularity))
.map(ApiMetricName::getApiName)
.map(name -> new LogicalMetricColumn(name, metricDictionary.get(name)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to read if this is indented by 2 tabs more

@NotNull Map<String, String> logicalToPhysicalColumnNames
) {
super(columns);
this.granularity = timeGrain;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be unnecessary

this.granularity = timeGrain;

this.logicalToPhysicalColumnNames = Collections.unmodifiableMap(logicalToPhysicalColumnNames);
this.physicalToLogicalColumnNames = Collections.unmodifiableMap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be unnecessary

* @return true if this table supports this column explicitly
*/
public boolean containsLogicalName(String
logicalName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String logicalName can be put on the same line

if (result != null) {
return result;
}
return Collections.emptyList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two return's can be combined to return result == null ? Collections.emptyList() : result

*/
public class ImmutableWrapperMap<K, V> implements Map<K, V> {

Map<K, V> map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, if we implement Map, does it make sense to have this? Looks like HashMap doesn't do it this way http://grepcode.com/file_/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java/?v=source

@michael-mclawhorn
Copy link
Contributor Author

@archolewa I'll restructure the commits tomorrow.

@michael-mclawhorn
Copy link
Contributor Author

There was so much interdependency I couldn't even figure out how to simplify via several commits.

Copy link
Collaborator

@garyluoex garyluoex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just style change and questions. Thanks!

@@ -219,8 +219,7 @@ private ResultSet getResultSet(JsonNode serializedResultSet) {
*
* @return ZonedSchema object generated from the JsonNode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix comment here from ZonedSchema to ResultSetSchema

@@ -229,21 +228,25 @@ private ZonedSchema getZonedSchema(JsonNode schemaNode) {
);

//Recreate ZonedSchema from granularity and timezone values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix comment here

/**
* The schema for a result set.
*/
public class ResultSetSchema extends BaseSchema implements Schema {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not need to implement Schema since BaseSchema implements Schema already?

Map<String, Object> schemaComponents = new HashMap<>();

String timeId = (schema instanceof ZonedSchema) ?
((ZonedSchema) schema).getDateTimeZone().getID() :
SYSTEM_CONFIG.getStringProperty(SYSTEM_CONFIG.getPackageVariableName("timezone"), "UTC");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are completely discarding the timezone configuration property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time zone is now riding along via the granularity rather than in the schema top level.

definition.getGrain(),
definition.getLogicalToPhysicalNames()
);
new LinkedHashSet<>();
Copy link
Collaborator

@garyluoex garyluoex Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra code.

dataSourceName,
DAY.buildZonedTimeGrain(DateTimeZone.UTC),
[] as Set
,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move the comma back?

dataSourceName,
DAY.buildZonedTimeGrain(DateTimeZone.UTC),
[] as Set
,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move the comma back?

dataSourceName,
DAY.buildZonedTimeGrain(DateTimeZone.UTC),
[] as Set
,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move the comma back?

"NETWORK",
DAY.buildZonedTimeGrain(UTC),
columns
,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move the comma back?

@@ -0,0 +1,88 @@
Support Union Data Source Well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to merge these rfc notes?

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 1

And this will definitely need a CHANGELOG entry (or possibly many)

@@ -167,6 +167,7 @@
SYSTEM_CONFIG.getPackageVariableName("loader_scheduler_thread_pool_size"),
LOADER_SCHEDULER_THREAD_POOL_SIZE_DEFAULT
);
public static final String SYSTEM_CONFIG_TIMEZONE = "timezone";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In places where the constant is the property key, we've generally taken the approach of including _KEY in the constant name, so that there's no confusion and expectation that this constant holds the actual timezone, rather than the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// The data source is the table directly, since there is no nested query below us
DataSource dataSource = new TableDataSource(table);
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long do we expect this code to live like this? I'm leery of the possibly unsafe cast w/o a check / log nearby

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are casting to a ConceretePhysicalTable in each of these building methods. Should we just update the signatures of tehse methods to take a ConcretePhysicalTable instead, and then do the cast before invoking these methods?

Also, what's the planned alternative to this? It looks like we're implicitly assuming that when we go to actually build a druid query, we are working with actual druid tables. Which makes sense until you factor in union datasources, unless union datasource are also considered ConcretePhysicalTables (haven't gotten that far).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm putting in a slightly more resilient variation. However, we should still tweak this in some way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a little tech debt here. Existing union data sources will construct with either non-Concrete physical tables or sets of tables. Table data source will only be usable by concrete physical tables. This polymorphism will push up the stack, but I'd like to take it as tech debt right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound reasonable.

/**
* The schema for a result set.
*/
public class ResultSetSchema extends BaseSchema implements Schema {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to declare that it implements Schema, since extends BaseSchema does that.

import javax.validation.constraints.NotNull;

/**
* The schema for a result set.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if we could expand a bit on what it means to be the schema of a result set. If we can't, that's fine, but I'd love to see this class comment be more useful / descriptive than just repeating the class name.

Things like expected use, meaning, and what it adds beyond (or how it differs from) a Schema or BaseSchema would all be good candidates for info to add here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if the next commit is better.

/**
* The granularity of the ResultSet.
*/
private Granularity granularity;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be final?

*/
public ResultSetSchema(
@NotNull Granularity granularity, Iterable<Column> columns
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can unwrap

*
* @param resultSetSchema the result set schema being copied
*/
public ResultSetSchema(ResultSetSchema resultSetSchema) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed. Since ResultSetSchema is immutable, creating a new object with exactly the same contents isn't needed because it's exactly the same as simply using the 1st instance.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2

@@ -61,7 +61,9 @@ public static PreResponse buildErrorPreResponse(Throwable throwable) {
}

return new PreResponse(
new ResultSet(Collections.emptyList(), new Schema(AllGranularity.INSTANCE)),
new ResultSet(
new ResultSetSchema(AllGranularity.INSTANCE, Collections.emptySet()), Collections.emptyList()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Param list should "chop" down if it needs to wrap. (Collections.emptyList() belongs on a new line)

*
* @return the result set being constructed
*/
public ResultSetSchema withAddColumn(Column c) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it's not particularly confusing what c is, but could we get a better name for the param in the interest of self-documenting (if nothing else)?

static ApiMetricName of(String name) {
return new ApiMetricName() {
@Override
public boolean isValidFor(final TimeGrain grain) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final not needed

*
* @param name the name being wrapped
*
* @return an anonymous subclass instance of ApiMetricName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste error

public String asName() {
return name;
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this implement equals and hashcode?

/**
* Constructor.
*
* @param name The column name
*/
protected MetricColumn(String name) {
public MetricColumn(String name) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this made public b/c the static "factory" thing went away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

protected Result map(Result result, Schema schema) {
return result;
protected Result map(Result result, ResultSetSchema schema) {
throw new UnsupportedOperationException("This code should never be reached.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should it never be reached? It's not clear why this method is not supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is potentially called only from the general map function, and it isn't here.

Result newResult;
ResultSetSchema schema = map(resultSet.getSchema());
MetricColumn column = schema.getColumn(ROW_NUM_COLUMN_NAME, MetricColumn.class).orElseThrow(
() -> new IllegalStateException("Unexpected missing column")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should say which column is missing?

And should this be in the ErrorMessage enum?

And we should probably log this as an error, since it's likely to kill the request I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just going to remove this check. We literally just added this column to the schema the line before.

if (columnName == null) {
throw new IllegalStateException("Cannot map results without a column name");
}

MetricColumn metricColumn = (MetricColumn) schema.getColumn(columnName);
MetricColumn metricColumn = schema.<MetricColumn>getColumn(columnName, MetricColumn.class).orElseThrow(
() -> new IllegalStateException("Unexpected missing column")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should say which column is missing?

And should this be in the ErrorMessage enum?

And we should probably log this as an error, since it's likely to kill the request I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next commit will add capture for ResultSetMapper errors above where mapping is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've push the error handling for this up into ResultSetResponseProcessor.

super(DefaultDataSourceType.TABLE, Collections.singleton(physicalTable));

this.name = physicalTable.getFactTableName();
this.name = physicalTable.getAvailability().getDataSourceNames().stream().findFirst().get().asName();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify this, and because ConcretePhyiscalTable can only have 1 backing table, could/should we push this logic into ConcretePhysicalTable and hide it behind a getTableName() method?

And actually, after looking a bit deeper, ConcretePhyiscalTable already / still has a getFactTableName method that does this logic, so we should use that and rely on the contract / concept of ConcretePhyiscalTable.

Copy link
Contributor

@archolewa archolewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor quibbles and questions. In general though, it looks great.

@@ -202,8 +203,9 @@ protected GroupByQuery buildGroupByQuery(
if (!template.isNested()) {
LOG.trace("Building a single pass druid groupBy query");

// TODO FIXME hardcoding to concrete for now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue for fixing this?

// The data source is the table directly, since there is no nested query below us
DataSource dataSource = new TableDataSource(table);
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are casting to a ConceretePhysicalTable in each of these building methods. Should we just update the signatures of tehse methods to take a ConcretePhysicalTable instead, and then do the cast before invoking these methods?

Also, what's the planned alternative to this? It looks like we're implicitly assuming that when we go to actually build a druid query, we are working with actual druid tables. Which makes sense until you factor in union datasources, unless union datasource are also considered ConcretePhysicalTables (haven't gotten that far).

@@ -40,11 +39,17 @@
* @param jsonResult Druid results in json
* @param schema Schema for results
* @param queryType the type of query, note that this implementation only supports instances of
* @param dateTimeZone the time zone used for format the results
* {@link DefaultQueryType}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line goes with the parameter queryType, not dateTimeZone

@@ -2,11 +2,11 @@
// Licensed under the terms of the Apache license. Please see LICENSE.md file distributed with this work for terms.
package com.yahoo.bard.webservice.table.resolver;

import com.yahoo.bard.webservice.table.PhysicalTable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be kept where it was.

generateGranularity(schemaNode.get(SCHEMA_GRANULARITY).asText(), timezone),
timezone
);
Granularity granularity = generateGranularity(schemaNode.get(SCHEMA_GRANULARITY).asText(), timezone);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used.

*/
public ResultSet(List<Result> results, Schema schema) {
public ResultSet(ResultSetSchema schema, List<Result> results) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we flipping the order of parameters? Seems like an unnecessary backwards incompatible change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the old order was stupid

/**
* The schema for a result set.
*/
public class ResultSetSchema extends BaseSchema implements Schema {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of inconsistency here concerning whether or not a ResultSetSchema has an order. It takes as an argument a simple Set, but withAddColumn refers to a final column, which only makes sense if the Schema has an ordering.

So in other words, we have a behavior, where the original columns are unordered, until we add another column, in which case the new schema has a fixed (arbitrary) order such that the new columns added always appear last?

I think I remember us talking about something like this, where something took an unordered collection and spat out an ordered collection, and I remember being ok with it, but I don't remember if it was for this class or a different one. But either way, this "no fixed order until we add a new column then order" is kind of weird.

If we want this schema to have some kind of order, then we should probably take in a LinkedHashSet. Or try to explain that it imposes a (potentially arbitrary) order on the columns depending on how the passed in set's iterator happens to spit out the columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm switching to an Iterable in LinkedHashSet out idiom
One of the libraries we use supports that and it makes a lot of sense because it unambiguously commuicates that we're set-ifying, and the Iterable makes it clear that we're concerned with order while being polymorphic across sources.

@@ -2,9 +2,9 @@
// Licensed under the terms of the Apache license. Please see LICENSE.md file distributed with this work for terms.
package com.yahoo.bard.webservice.table.resolver;

import com.yahoo.bard.webservice.table.PhysicalTable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay where it was.

import com.yahoo.bard.webservice.table.PhysicalTable;
import com.yahoo.bard.webservice.data.metric.TemplateDruidQuery;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -4,12 +4,12 @@

import static com.yahoo.bard.webservice.web.ErrorMessageFormat.TABLE_SCHEMA_UNDEFINED;

import com.yahoo.bard.webservice.table.PhysicalTable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@archolewa
Copy link
Contributor

👍

@garyluoex
Copy link
Collaborator

👍 The upcoming PRs will change many things in here

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part A

new ResultSet(new ResultSetSchema(
AllGranularity.INSTANCE,
Collections.emptySet()),
Collections.emptyList()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/notablocker

Wrap the closing paren onto the next line

@@ -35,6 +38,9 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import avro.shaded.com.google.common.collect.Sets;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly not the right import. We shouldn't depend on a shadowed import in another library.

// The data source is the table directly, since there is no nested query below us
DataSource dataSource = new TableDataSource(table);
//DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just delete this? If not, add a comment explaining why we're keeping makes sense.

// The data source is the table directly, since there is no nested query below us
DataSource dataSource = new TableDataSource(table);
DataSource dataSource = new TableDataSource((ConcretePhysicalTable) table);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have the same "concrete if concrete otherwise Union" logic that the GroupBy builder method has?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Fix incoming.

Set<TableName> names = table.getAvailability().getDataSourceNames();
if (names.isEmpty()) {
throw new IllegalStateException("Misconfigured table with no backing datasource.");
} else if (names.size() == 1 && table instanceof ConcretePhysicalTable) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the TimeseriesQueryBuild method yet a 3rd version of constructing the DataSource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored in next checkin

*
* @param logicalName Logical name to lookup in physical table
* @return Translated logicalName if applicable
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

JavaDoc not needed, since it's a duplicate of the interface

public Map<Column, List<Interval>> getAvailableIntervals() {
return getAvailability().getAvailableIntervals();
}
/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line needed.

*
* @return tableEntries map of column to set of available intervals
*
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

JavaDoc not needed, since it's a duplicate of the interface

* @return The time grain of this physical table
*
* @deprecated use getSchema().getGranularity()
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

JavaDoc not needed, since it's a duplicate of the interface

*/
public void resetColumns(SegmentMetadata segmentMetadata, DimensionDictionary dimensionDictionary) {
Map<String, Set<Interval>> dimensionIntervals = segmentMetadata.getDimensionIntervals();
Map<String, Set<Interval>> metricIntervals = segmentMetadata.getMetricIntervals();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

Can just inline thse

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part B


@Override
public LinkedHashSet<Column> getColumns() {
return columns;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be (somewhere, probably the constructor) made to be immutable (if possible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harder than it seems. There are no types that ARE immutable but also extend LinkedHashSet

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see. That's dumb.

import javax.validation.constraints.NotNull;

/**
* An instance of Physical table that is backed by a single fact table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

It's not an instance, but perhaps it's an implementation?

/**
* Create a concrete physical table.
*
* @param name Fili name of the physical table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc should make it more clear what the differences is between this and the other constructor, particularly around things like what values are used as defaults...

* Getter.
*
* @return The granularity
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

This javadoc isn't needed, since this is just a getter.


/**
* Physical Table represents a druid table.
* An interface describing a config level physical table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd that this interface references config. Isn't it a very core interface, and used by lots more than config?

It's also a bit self-referential... Perhaps we cast it in a FactSourceRepresentation light?

/**
* An availability which guarantees immutability on its contents.
*/
//public class ImmutableAvailability extends ImmutableWrapperMap<Column, List<Interval>> implements Availability {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

//public class ImmutableAvailability extends ImmutableWrapperMap<Column, List<Interval>> implements Availability {
public class ImmutableAvailability implements Availability {

private final TableName name;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

Technically, because you're holding an interface here which itself has no immutability assertion, this class cannot ensure immutability.


@Override
public Set<TableName> getDataSourceNames() {
return Sets.newHashSet(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this conversion in the constructor?


@Override
public int hashCode() {
return columnIntervals.hashCode();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to include the TableName?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs addressing?

if (this == obj) {
return true;
}
if (obj instanceof ImmutableAvailability) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to pay attention to TableName...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reached end of changes.

result.getDimensionRow((DimensionColumn) schema.columns.toArray()[0])?.get(BardDimensionField.DESC) == "u"
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.DESC) == "4"
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[3])?.get(BardDimensionField.DESC) == ""
result.getDimensionRow((DimensionColumn) schema.columns.toArray()[3])?.get(BardDimensionField.ID) == "foo"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these changing ordinal and order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. @garyluoex , did you change these?

resultWithNullDimensionKey.getDimensionRow(schema.columns.toArray()[0])?.get(BardDimensionField.ID) == ""
resultWithNullDimensionKey.getDimensionRow(schema.columns.toArray()[0])?.get(BardDimensionField.DESC) ==
resultWithNullDimensionKey.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.ID) == ""
resultWithNullDimensionKey.getDimensionRow((DimensionColumn) schema.columns.toArray()[2])?.get(BardDimensionField.DESC) ==
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the ordinal change?

Note: This seems to apply to many other places in this file as well.

MetricColumn.addNewMetricColumn(schema, "lookback_pageViews")
MetricColumn.addNewMetricColumn(schema, "retentionPageViews")
ResultSet resultSet = new DruidResponseParser().parse(jsonResult, schema, DefaultQueryType.LOOKBACK)
ResultSetSchema schema = new ResultSetSchema(DAY, [new MetricColumn("pageViews"), new MetricColumn("lookback_pageViews"), new MetricColumn("retentionPageViews")].toSet())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

Rather than calling .toSet(), it's more idiomatic to call as Set in Groovy.

Note: Applies to other places in this PR as well.


then:
thrown(UnsupportedOperationException)

}

def "Build the schema from the query"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To where did this test move?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved it here: ResultSetResponseProcessorSpec (next commit)

}

Schema schema = new ResultSetSchema(DAY, columns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

Can just return the new ResultSetSchema(...) directly. No need to assign to temp var


Map<Column, List<Interval>> availabilityMap1 = [:]
Map<Column, List<Interval>> availabilityMap2 = [:]
Map<Column, List<Interval>> availabilityMap3 = [:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give these better names? "1", "2", "3" isn't very clear about what these will hold.

responseContext.put("missingIntervals", ["a","b","c", new SimplifiedIntervalList([interval]), bigDecimal])
responseContext.put(
"missingIntervals",
["a", "b", "c", new SimplifiedIntervalList([interval]), bigDecimal] as ArrayList
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to coerce this to an ArrayList?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arraylist is serializable

}
ResultSetSchema schema = new ResultSetSchema(granularity, columns)
return schema
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

Just return it directly?

ScanSearchProviderManager.getInstance("gender")
)
def filtered_metric_name = "FOO_NO_BAR"
Set<ApiMetricName> metricNames = (["FOO", filtered_metric_name].collect() { ApiMetricName.of(it)}) as Set
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlocker

empty parens are not needed on collect here

import com.yahoo.bard.webservice.web.DataApiRequest
import com.yahoo.bard.webservice.web.ResponseFormatType

import com.fasterxml.jackson.databind.JsonNode

import org.joda.time.DateTimeZone

import avro.shaded.com.google.common.collect.Sets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong import

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final things. Nothing big.

@@ -219,37 +241,19 @@ protected GroupByQuery buildGroupByQuery(
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is wrong here, and I think there's a regression where, in a nested case, filters are also being applied at the outer query, when they were no before. I don't think this is correctness issue, but it's definitely an optimality issue in terms of query size, as well as work Druid is doing for filtering.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, never mind. I see what's happening. I missed the fact that filters get set to null when building the inner query. We should definitely add a comment on that, and probably toss a comment on the outer query filter location, so that it's obvious what had happened (it's easy to miss).


// The data source is the table directly, since there is no nested query below us
DataSource dataSource = new TableDataSource(table);
filter = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should definitely add a comment here to call this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*
* @return The logical table built
*/
public LogicalTable buildLogicalTable(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these were public methods, are we sure we can / want to just remove them? I'm pretty sure other code bases were relying on them for configuration, so if we could deprecate them instead, that would be better.

));
}

public void setAvailability(Availability availability) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be protected?

return super.toString() + " factTableName: " + factTableName + " alignment: " + getTableAlignment();
}
/**
* Get the time grain from the physical table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc doesn't match what the method seems to be doing, based on it's name.

return physicalToLogicalColumnNames.getOrDefault(physicalName, Collections.singleton(physicalName));
}
@Deprecated
Set<Column> getColumns();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these deprecated methods, they can be made default and the implementations can be lifted / removed from the BasePhysicalTable up into this interface. Right now, there's a mix of some of the deprecated methods being defaulted (but still implemented), some deprecated, but their implementations pushed into the base class, etc.

public ImmutableAvailability(TableName tableName, Map<Column, List<Interval>> map) {
this.name = tableName;
columnIntervals = ImmutableMap.copyOf(map);
dataSourceNames = Sets.newHashSet(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should make it into an immutable set (either ImmutableSet.of(name) or something using Collections)

resultSet[0].getMetricValueAsNumber(schema.getColumn("pageViews")) == 1 as BigDecimal
resultSet[0].getMetricValueAsNumber(schema.getColumn("time_spent")) == 2 as BigDecimal
Result firstResult = resultSet.get(0)
List<Column> columns = new ArrayList<>(schema.columns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused

Set<Column> dimensionColumns
Column ageColumn
Column genderColumn
Column unknownColumn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be DimensionColumn-typed

@@ -43,13 +36,17 @@ class DruidResponseParserSpec extends Specification {

@Shared DimensionDictionary dimensionDictionary
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/NotABlockerBecauseYouDidn'tChangeIt

It would be great to make this not be shared so that we can't leak state across tests. (or at least not as easily)


import org.joda.time.Interval;

import avro.shaded.com.google.common.collect.Sets;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong import

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 pending getting the build to pass.

Javadoc tightening
Stream simplification
Test variable renames
@michael-mclawhorn michael-mclawhorn merged commit 5772ead into master Mar 13, 2017
@michael-mclawhorn michael-mclawhorn deleted the RefactorTable branch March 13, 2017 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants