Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor PhysicalTableDefinition and Update BaseTableLoader #207

Merged
merged 5 commits into from
Apr 17, 2017

Conversation

garyluoex
Copy link
Collaborator

@garyluoex garyluoex commented Mar 29, 2017

  • PhysicalTableDefinition is now an abstract class, construct using ConcretePhysicalTableDefinition instead
  • PhysicalTableDefinition now requires a build methods to be implemented that builds a physical table
  • BaseTableLoader now constructs physical tables by calling build method on PhysicalTableDefinitions in buildPhysicalTablesWithDependency
  • buildDimensionSpanningTableGroup method in BaseTableLoader now uses loadPhysicalTablesWithDependency instead of removed loadPhysicalTables
  • buildDimensionSpanningTableGroup method in BaseTableLoader now does not take druid metric as arguments, instead PhysicalTableDefinition does

LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName());
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " +
currentTableName
.asName());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move up a line

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think this should wrap differently in general:

throw new RuntimeException(
        "Unable to resolve physical table dependency for physical table: " + currentTableName.asName()
);

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial feedback.

import com.google.common.collect.ImmutableSet;

import org.joda.time.DateTimeZone;
import avro.shaded.com.google.common.collect.ImmutableSet;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong import

@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities(
}

/**
* Load a new physical table into the dictionary and return the loaded physical table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This JavaDoc doesn't seem to do a good job at explaining the name of the method. In particular, why does the method call out WithDependency? What does that mean? The doc doesn't indicate what's going on regarding that.

@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities(
}

/**
* Load a new physical table into the dictionary and return the loaded physical table.
*
* @param definitions A config object for the physical table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should say something like table configurations to load? (ie. plural instead of singular)

@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities(
}

/**
* Load a new physical table into the dictionary and return the loaded physical table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be plural instead of singular?

* @param definitions A config object for the physical table
* @param dictionaries The resource dictionaries for reading and storing resource data
*
* @return The physical table created
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be plural?

// If the dependent table definition is currently being build or missing
if (Objects.isNull(currentTableDefinition)) {
LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName());
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the message into a single String.format and pass the same string to both users, rather than duplicate the message building.

// If the dependent table definition is currently being build or missing
if (Objects.isNull(currentTableDefinition)) {
LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName());
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also want to indicate that this error could be because of a circular dependency, since we're not sure at this point why the definition isn't available.

) {
this(name, grain.buildZonedTimeGrain(DateTimeZone.UTC), dimensions);
}
public abstract Optional<PhysicalTable> build(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return Optional? When do we ever expect that a table couldn't be built that isn't an error condition?

Copy link
Collaborator Author

@garyluoex garyluoex Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much now, maybe good for debugging, but it feels right that it should return an Optional when you build something. Maybe some fili application will extend some new dimension and dimension definition that will fail to build for some reason. Let me know if you want me to remove the Optional or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment elsewhere with deeper thoughts. tldr: Yes, let's remove the Optional wrapper.

physicalTableDictionary.put(currentPhysicalTable.getName(), currentPhysicalTable);
} else {
LOG.error("Unable to build physical table: " + currentTableDefinition.getName().asName());
throw new RuntimeException("Unable to build physical table: " + currentTableDefinition.getName().asName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the message into a single String.format and pass the same string to both users, rather than duplicate the message building.

Map<TableName, PhysicalTableDefinition> mutableCopyDefinitionMap =
buildPhysicalTableDefinitionDictionary(definitions);

definitions.forEach(definition -> buildPhysicalTablesWithDependency(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This construct feels odd... It's like we're doing a recursion, but modifying a structure and filling up a bucket as we go. I think things would be much cleaner if we used a more traditional DepthFirstBuilding approach, where the return value of the method returns a LinkedHashSet of all of the tables it built, or perhaps just the 1 table it wanted to build.

LinkedHashSet<PhysicalTable> physicalTables = loadPhysicalTablesWithDependency(
tableDefinitions,
dictionaries
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion yesterday, for this to work we need to have both the list of table names or table definitions which define the table group and ALSO have the total set of table definitions for building dependent tables.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments for non-test changes

*
* @return A table group binding all the tables for a logical table view together.
*/
public TableGroup buildDimensionSpanningTableGroup(
Set<ApiMetricName> apiMetrics,
Set<FieldName> druidMetrics,
Set<TableName> tableNames,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include tableNames as a parameter? Can we just get them from the tableDefinitions instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tableDefinitions is a super set of tableNames

Copy link
Collaborator

@cdeszaq cdeszaq Apr 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Though it's not that tableDefinitions is a superset of tableNames if we're building a table group using a table that's already been built into another group. Perhaps we could rename tableNames to something like tablesInTableGroup or something that indicates their purpose more clearly?

physicalTables.add(table);
}
Map<String, PhysicalTableDefinition> availableTableDefinitions =
buildPhysicalTableDefinitionDictionary(tableDefinitions);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should wrap on the parameter list, not the assignment.

LinkedHashSet<PhysicalTable> physicalTables = tableNames.stream()
.map(
tableName -> physicalTableDictionary.computeIfAbsent(
tableName.asName(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call can be pulled up as a primary map step: .map(TableName::asName)

key -> buildPhysicalTablesWithDependency(key, availableTableDefinitions, dictionaries)
)
).collect(Collectors.toCollection(LinkedHashSet::new));


//Derive the logical dimensions by taking the union of all the physical dimensions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're messing with this method, it would be nice to lift this comment up into the JavaDoc somehow, since this is the core of what the name of the method means, and the existing JavaDoc doesn't really address the "dimension spanning" aspect at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this repeated elsewhere, I don't think these are 'logical' or 'physical' dimensions, just dimensions here. '..all the dimensions from tables in the group' perhaps.

@@ -159,6 +167,138 @@ public void loadLogicalTableWithGranularities(
}

/**
* Iterate through given definitions and builds all corresponding physical table with satisfied dependency.
*
* @param currentTableName Iterator for the mutable definition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this an iterator? It looks like it's just a String?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Param still seems to have an odd description?

public Optional<PhysicalTable> build(
ResourceDictionaries dictionaries,
DataSourceMetadataService metadataService
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can unwrap

this.timeGrain = timeGrain;
this.metricNames = ImmutableSet.copyOf(metricNames);
this.dimensionConfigs = ImmutableSet.copyOf(dimensionConfigs);
this.logicalToPhysicalNames = Collections.unmodifiableMap(buildLogicalToPhysicalNames(dimensionConfigs));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want this class to be primarily a holder for data needed to build a physical table, we need a constructor that takes this logicalToPhysicalNames mapping in as a parameter, rather than just assuming we can build it from the dimension configs.

The reason for this stems from the reason for the mapping in the first place: Dimensions do not have the same physical name on all physical tables. Because of that, you'll either need to have a complete set of DimensionConfig objects for every table, or you need to be able to pass the map into the PTDefinition directly.

(Note that this concern likely ripples down to the constructors of the subclasses as well.

public abstract Set<TableName> getDependentTableNames();

/**
* Given a set of metric names and a data source metadata service, build the corresponding physical table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this javadoc is correct?

) {
this(name, grain.buildZonedTimeGrain(DateTimeZone.UTC), dimensions);
}
public abstract Optional<PhysicalTable> build(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment elsewhere with deeper thoughts. tldr: Yes, let's remove the Optional wrapper.

if (physicalName == null) {
throw new RuntimeException("No physical name found for dimension: "
+ config.getApiName());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this behavior is correct, but I'm also not sure what the correct behavior is.

Per the docs on DimensionConfig::getPhysicalName, this value: This field (if set) is used as the only physical name, which means that it is OK not to have a value for this field. It also means that if it's not set, the expectation is that there is more than 1 physical name for this dimension, and that information is being set on the PhysicalTables through the mapping directly.

That said, if this method is being made available as a default / helper (ie. if no mapping was provided), which is the expected use of this I think, then there should be a single physical name present for each dimension.

The other concern I have is that by throwing an exception, we're forcing users to specify the physical name of the dimension. While I think it's highly likely that most Fili instances will have different API names and physical names for dimensions, would it ease onboarding (ie. reduce required configuration) if we instead had this method log a warning, and then use the API name as the default physical name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that this not even be a warning. I think we should simply default the physical name to apiName.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not a warning, but log something so that it's not surprising that we're filling in the blanks. Info or Debug levels would be fine by me, as long as the method indicates it's going to do this.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing big for the tests, other than having more descriptive names for the definitions.

Optional<PhysicalTable> build(
ResourceDictionaries dictionaries,
DataSourceMetadataService metadataService
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can unwrap

PhysicalTableDefinition definition3
PhysicalTableDefinition definition4
PhysicalTableDefinition definition5
PhysicalTableDefinition definition6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we give these more descriptive names? Seeing it in code, definition4 means nothing, but selfDependentDefinition is much more meaningful, for example.

dimNames = TestApiDimensionName.getByLogicalTable(SHAPES)
tableNames = physDefs.stream().map({it -> it.getName()}).collect(Collectors.toSet())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're in groovy, it's generally more idiomatic to use groovy's built-in collection processing, closures, and property access:

physDefs.collect {it.name} as Set

dicts.physicalDictionary.containsValue(t1)
dicts.physicalDictionary.containsValue(t2)
dicts.physicalDictionary.size() == 6
group.physicalTables.stream().map({it.getTableName()}).collect(Collectors.toSet()) as Set == tableNames
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be groovier: group.physicalTables.collect {it.tableName} as Set

Note: Applies elsewhere in PR

List<PhysicalTableDefinition> allDefs = physDefs as List
def "loading physical tables with dependency loads all satisfied dependency physical tables"() {
given:
Set<TableName> currentTableNames = [definition1.getName(), definition2.getName(), definition3.getName()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

property access is groovier: definition1.name, definition2.name, definition3.name

Note: Applies elsewhere in PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are not testing the behavior of getName() anymore?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

property access in groovy goes through the getter/setter methods. If you want direct field access, it's a different operator (something like definition1@.name I think). Groovy realized that the more common case is getter/setter access, rather than field access, and so they made accessing JavaBean-style data easier.

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some feedback. I need to do a fresh run at this after the next round of changes.

"circular dependency", currentTableName);

LOG.error(message);
throw new RuntimeException(message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this an IllegalStateException

if (physicalName == null) {
throw new RuntimeException("No physical name found for dimension: "
+ config.getApiName());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that this not even be a warning. I think we should simply default the physical name to apiName.

CHANGELOG.md Outdated
* `BaseTableLoader` now constructs physical tables by calling `build` method on `PhysicalTableDefinition`s in `buildPhysicalTablesWithDependency`
* `buildDimensionSpanningTableGroup` method in `BaseTableLoader` now uses `loadPhysicalTablesWithDependency` instead of removed `loadPhysicalTables`
* `buildDimensionSpanningTableGroup` method in `BaseTableLoader` now does not take druid metric as arguments, instead `PhysicalTableDefinition` does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: druid metrics

key -> buildPhysicalTablesWithDependency(key, availableTableDefinitions, dictionaries)
)
).collect(Collectors.toCollection(LinkedHashSet::new));


//Derive the logical dimensions by taking the union of all the physical dimensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this repeated elsewhere, I don't think these are 'logical' or 'physical' dimensions, just dimensions here. '..all the dimensions from tables in the group' perhaps.

@garyluoex garyluoex force-pushed the RefactorPhysicalDef branch from 621a17b to 8f4da9a Compare April 17, 2017 19:10
@garyluoex garyluoex force-pushed the RefactorPhysicalDef branch from 424169d to 0dfa45f Compare April 17, 2017 21:08
@cdeszaq cdeszaq merged commit 08d0467 into master Apr 17, 2017
@garyluoex garyluoex deleted the RefactorPhysicalDef branch April 17, 2017 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants