-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor PhysicalTableDefinition and Update BaseTableLoader #207
Conversation
cb7774a
to
ed06b70
Compare
ed06b70
to
4a10981
Compare
4a10981
to
4e58f37
Compare
4e58f37
to
9847d6f
Compare
LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName()); | ||
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " + | ||
currentTableName | ||
.asName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move up a line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think this should wrap differently in general:
throw new RuntimeException(
"Unable to resolve physical table dependency for physical table: " + currentTableName.asName()
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial feedback.
import com.google.common.collect.ImmutableSet; | ||
|
||
import org.joda.time.DateTimeZone; | ||
import avro.shaded.com.google.common.collect.ImmutableSet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import
@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities( | |||
} | |||
|
|||
/** | |||
* Load a new physical table into the dictionary and return the loaded physical table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This JavaDoc doesn't seem to do a good job at explaining the name of the method. In particular, why does the method call out WithDependency
? What does that mean? The doc doesn't indicate what's going on regarding that.
@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities( | |||
} | |||
|
|||
/** | |||
* Load a new physical table into the dictionary and return the loaded physical table. | |||
* | |||
* @param definitions A config object for the physical table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should say something like table configurations to load
? (ie. plural instead of singular)
@@ -159,6 +155,113 @@ public void loadLogicalTableWithGranularities( | |||
} | |||
|
|||
/** | |||
* Load a new physical table into the dictionary and return the loaded physical table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be plural instead of singular?
* @param definitions A config object for the physical table | ||
* @param dictionaries The resource dictionaries for reading and storing resource data | ||
* | ||
* @return The physical table created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be plural?
// If the dependent table definition is currently being build or missing | ||
if (Objects.isNull(currentTableDefinition)) { | ||
LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName()); | ||
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the message into a single String.format
and pass the same string to both users, rather than duplicate the message building.
// If the dependent table definition is currently being build or missing | ||
if (Objects.isNull(currentTableDefinition)) { | ||
LOG.error("Unable to resolve physical table dependency for physical table: " + currentTableName.asName()); | ||
throw new RuntimeException("Unable to resolve physical table dependency for physical table: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May also want to indicate that this error could be because of a circular dependency, since we're not sure at this point why the definition isn't available.
) { | ||
this(name, grain.buildZonedTimeGrain(DateTimeZone.UTC), dimensions); | ||
} | ||
public abstract Optional<PhysicalTable> build( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this return Optional
? When do we ever expect that a table couldn't be built that isn't an error condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much now, maybe good for debugging, but it feels right that it should return an Optional
when you build something. Maybe some fili application will extend some new dimension and dimension definition that will fail to build for some reason. Let me know if you want me to remove the Optional
or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment elsewhere with deeper thoughts. tldr: Yes, let's remove the Optional
wrapper.
physicalTableDictionary.put(currentPhysicalTable.getName(), currentPhysicalTable); | ||
} else { | ||
LOG.error("Unable to build physical table: " + currentTableDefinition.getName().asName()); | ||
throw new RuntimeException("Unable to build physical table: " + currentTableDefinition.getName().asName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract the message into a single String.format
and pass the same string to both users, rather than duplicate the message building.
Map<TableName, PhysicalTableDefinition> mutableCopyDefinitionMap = | ||
buildPhysicalTableDefinitionDictionary(definitions); | ||
|
||
definitions.forEach(definition -> buildPhysicalTablesWithDependency( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This construct feels odd... It's like we're doing a recursion, but modifying a structure and filling up a bucket as we go. I think things would be much cleaner if we used a more traditional DepthFirstBuilding approach, where the return value of the method returns a LinkedHashSet of all of the tables it built, or perhaps just the 1 table it wanted to build.
LinkedHashSet<PhysicalTable> physicalTables = loadPhysicalTablesWithDependency( | ||
tableDefinitions, | ||
dictionaries | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion yesterday, for this to work we need to have both the list of table names or table definitions which define the table group and ALSO have the total set of table definitions for building dependent tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments for non-test changes
* | ||
* @return A table group binding all the tables for a logical table view together. | ||
*/ | ||
public TableGroup buildDimensionSpanningTableGroup( | ||
Set<ApiMetricName> apiMetrics, | ||
Set<FieldName> druidMetrics, | ||
Set<TableName> tableNames, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to include tableNames
as a parameter? Can we just get them from the tableDefinitions
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tableDefinitions
is a super set of tableNames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Though it's not that tableDefinitions
is a superset of tableNames
if we're building a table group using a table that's already been built into another group. Perhaps we could rename tableNames
to something like tablesInTableGroup
or something that indicates their purpose more clearly?
physicalTables.add(table); | ||
} | ||
Map<String, PhysicalTableDefinition> availableTableDefinitions = | ||
buildPhysicalTableDefinitionDictionary(tableDefinitions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should wrap on the parameter list, not the assignment.
LinkedHashSet<PhysicalTable> physicalTables = tableNames.stream() | ||
.map( | ||
tableName -> physicalTableDictionary.computeIfAbsent( | ||
tableName.asName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call can be pulled up as a primary map
step: .map(TableName::asName)
key -> buildPhysicalTablesWithDependency(key, availableTableDefinitions, dictionaries) | ||
) | ||
).collect(Collectors.toCollection(LinkedHashSet::new)); | ||
|
||
|
||
//Derive the logical dimensions by taking the union of all the physical dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're messing with this method, it would be nice to lift this comment up into the JavaDoc somehow, since this is the core of what the name of the method means, and the existing JavaDoc doesn't really address the "dimension spanning" aspect at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this repeated elsewhere, I don't think these are 'logical' or 'physical' dimensions, just dimensions here. '..all the dimensions from tables in the group' perhaps.
@@ -159,6 +167,138 @@ public void loadLogicalTableWithGranularities( | |||
} | |||
|
|||
/** | |||
* Iterate through given definitions and builds all corresponding physical table with satisfied dependency. | |||
* | |||
* @param currentTableName Iterator for the mutable definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this an iterator? It looks like it's just a String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Param still seems to have an odd description?
public Optional<PhysicalTable> build( | ||
ResourceDictionaries dictionaries, | ||
DataSourceMetadataService metadataService | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can unwrap
this.timeGrain = timeGrain; | ||
this.metricNames = ImmutableSet.copyOf(metricNames); | ||
this.dimensionConfigs = ImmutableSet.copyOf(dimensionConfigs); | ||
this.logicalToPhysicalNames = Collections.unmodifiableMap(buildLogicalToPhysicalNames(dimensionConfigs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want this class to be primarily a holder for data needed to build a physical table, we need a constructor that takes this logicalToPhysicalNames
mapping in as a parameter, rather than just assuming we can build it from the dimension configs.
The reason for this stems from the reason for the mapping in the first place: Dimensions do not have the same physical name on all physical tables. Because of that, you'll either need to have a complete set of DimensionConfig objects for every table, or you need to be able to pass the map into the PTDefinition directly.
(Note that this concern likely ripples down to the constructors of the subclasses as well.
public abstract Set<TableName> getDependentTableNames(); | ||
|
||
/** | ||
* Given a set of metric names and a data source metadata service, build the corresponding physical table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this javadoc is correct?
) { | ||
this(name, grain.buildZonedTimeGrain(DateTimeZone.UTC), dimensions); | ||
} | ||
public abstract Optional<PhysicalTable> build( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment elsewhere with deeper thoughts. tldr: Yes, let's remove the Optional
wrapper.
if (physicalName == null) { | ||
throw new RuntimeException("No physical name found for dimension: " | ||
+ config.getApiName()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this behavior is correct, but I'm also not sure what the correct behavior is.
Per the docs on DimensionConfig::getPhysicalName
, this value: This field (if set) is used as the only physical name
, which means that it is OK not to have a value for this field. It also means that if it's not set, the expectation is that there is more than 1 physical name for this dimension, and that information is being set on the PhysicalTables through the mapping directly.
That said, if this method is being made available as a default / helper (ie. if no mapping was provided), which is the expected use of this I think, then there should be a single physical name present for each dimension.
The other concern I have is that by throwing an exception, we're forcing users to specify the physical name of the dimension. While I think it's highly likely that most Fili instances will have different API names and physical names for dimensions, would it ease onboarding (ie. reduce required configuration) if we instead had this method log a warning, and then use the API name as the default physical name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that this not even be a warning. I think we should simply default the physical name to apiName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps not a warning, but log something so that it's not surprising that we're filling in the blanks. Info or Debug levels would be fine by me, as long as the method indicates it's going to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing big for the tests, other than having more descriptive names for the definitions.
Optional<PhysicalTable> build( | ||
ResourceDictionaries dictionaries, | ||
DataSourceMetadataService metadataService | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can unwrap
PhysicalTableDefinition definition3 | ||
PhysicalTableDefinition definition4 | ||
PhysicalTableDefinition definition5 | ||
PhysicalTableDefinition definition6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give these more descriptive names? Seeing it in code, definition4
means nothing, but selfDependentDefinition
is much more meaningful, for example.
dimNames = TestApiDimensionName.getByLogicalTable(SHAPES) | ||
tableNames = physDefs.stream().map({it -> it.getName()}).collect(Collectors.toSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're in groovy, it's generally more idiomatic to use groovy's built-in collection processing, closures, and property access:
physDefs.collect {it.name} as Set
dicts.physicalDictionary.containsValue(t1) | ||
dicts.physicalDictionary.containsValue(t2) | ||
dicts.physicalDictionary.size() == 6 | ||
group.physicalTables.stream().map({it.getTableName()}).collect(Collectors.toSet()) as Set == tableNames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be groovier: group.physicalTables.collect {it.tableName} as Set
Note: Applies elsewhere in PR
List<PhysicalTableDefinition> allDefs = physDefs as List | ||
def "loading physical tables with dependency loads all satisfied dependency physical tables"() { | ||
given: | ||
Set<TableName> currentTableNames = [definition1.getName(), definition2.getName(), definition3.getName()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
property access is groovier: definition1.name, definition2.name, definition3.name
Note: Applies elsewhere in PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we are not testing the behavior of getName()
anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
property access in groovy goes through the getter/setter methods. If you want direct field access, it's a different operator (something like definition1@.name
I think). Groovy realized that the more common case is getter/setter access, rather than field access, and so they made accessing JavaBean-style data easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some feedback. I need to do a fresh run at this after the next round of changes.
"circular dependency", currentTableName); | ||
|
||
LOG.error(message); | ||
throw new RuntimeException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make this an IllegalStateException
if (physicalName == null) { | ||
throw new RuntimeException("No physical name found for dimension: " | ||
+ config.getApiName()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that this not even be a warning. I think we should simply default the physical name to apiName.
CHANGELOG.md
Outdated
* `BaseTableLoader` now constructs physical tables by calling `build` method on `PhysicalTableDefinition`s in `buildPhysicalTablesWithDependency` | ||
* `buildDimensionSpanningTableGroup` method in `BaseTableLoader` now uses `loadPhysicalTablesWithDependency` instead of removed `loadPhysicalTables` | ||
* `buildDimensionSpanningTableGroup` method in `BaseTableLoader` now does not take druid metric as arguments, instead `PhysicalTableDefinition` does | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: druid metrics
key -> buildPhysicalTablesWithDependency(key, availableTableDefinitions, dictionaries) | ||
) | ||
).collect(Collectors.toCollection(LinkedHashSet::new)); | ||
|
||
|
||
//Derive the logical dimensions by taking the union of all the physical dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this repeated elsewhere, I don't think these are 'logical' or 'physical' dimensions, just dimensions here. '..all the dimensions from tables in the group' perhaps.
621a17b
to
8f4da9a
Compare
424169d
to
0dfa45f
Compare
PhysicalTableDefinition
is now an abstract class, construct usingConcretePhysicalTableDefinition
insteadPhysicalTableDefinition
now requires abuild
methods to be implemented that builds a physical tableBaseTableLoader
now constructs physical tables by callingbuild
method onPhysicalTableDefinition
s inbuildPhysicalTablesWithDependency
buildDimensionSpanningTableGroup
method inBaseTableLoader
now usesloadPhysicalTablesWithDependency
instead of removedloadPhysicalTables
buildDimensionSpanningTableGroup
method inBaseTableLoader
now does not take druid metric as arguments, insteadPhysicalTableDefinition
does