-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove restriction for single physical dimension to multiple lookup dimension #112
Conversation
9ef0e67
to
9084e10
Compare
9084e10
to
a8097e7
Compare
@@ -38,7 +40,7 @@ | |||
private Map<Column, Set<Interval>> workingIntervals; | |||
private final Object mutex = new Object(); | |||
private final Map<String, String> logicalToPhysicalColumnNames; | |||
private final Map<String, String> physicalToLogicalColumnNames; | |||
private final Map<String, List<String>> physicalToLogicalColumnNames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason we're making this a List
rather than a Collection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Set
fit better (since it should be a unique set of logical dimensions)? I'm not sure what are the advantages of just using Collection
, fill me in if you have time, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really more about avoiding overspecification. The places we use it only needs a Collection
(we're iterating over every element). And at least in the current implementation we don't seem overly concerned with uniqueness. Though having the same dimension twice could lead to weirdness if addNewDimensionColumn
isn't idempotent.
I'm not too concerned about it, it's just that we usually default to Collection
unless we have a good reason for using a more specific type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to have any headaches around nested types. Let's just use set for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will make it Set
since it makes sense to add a uniqueness constraint and doesn't make sense that there are duplicates.
continue; | ||
} | ||
DimensionColumn dimensionColumn = DimensionColumn.addNewDimensionColumn(this, dimension); | ||
workingIntervals.put(dimensionColumn, nameIntervals.getValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can streamify this easily enough, I think:
getLogicalColumnName(physicalName).stream()
.map(dimensionDictionary::findByApiName)
.filter(dimension -> dimension != null)
.forEach(dimension -> {
DimensionColumn dimensionColumn = DimensionColumn.addNewDimensionColumn(this, dimension);
workingIntervals.put(dimensionColumn, nameIntervals.getValue());
});
Note that you could put DimensionColumn.addNewDimensionColumn(this, dimension)
in a map
rather than the first statement of a forEach
, but addNewDimensionColumn
has side-effects, and side-effecting operations in a map
operation feels ugly to me.
@@ -237,8 +244,8 @@ public String getPhysicalColumnName(String logicalName) { | |||
* @param physicalName Physical name to lookup in physical table | |||
* @return Translated physicalName if applicable | |||
*/ | |||
private String getLogicalColumnName(String physicalName) { | |||
return physicalToLogicalColumnNames.getOrDefault(physicalName, physicalName); | |||
private List<String> getLogicalColumnName(String physicalName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this could probably be a Collection
rather than a List
.
.entrySet() | ||
.stream() | ||
.collect(StreamUtils.toLinkedMap(Map.Entry::getValue, Map.Entry::getKey)) | ||
this.logicalToPhysicalColumnNames.entrySet().stream().sequential().collect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing the sequential()
here is fine right? Will remove it in the next commit if no one complains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe sequential
just forces the stream to be sequential, even if it was originally parallel. However, by default the stream of a set is sequential, so sequential
doesn't actually do anything.
So yes, I would suggest removing the sequential.
The fact that this didn't break a test worries me. Please add a test to see that the correct set of logical column names is returned from the method in both the single-dimension-to-physical-name situation and the multiple-dimension-to-physical-name situation. |
private List<String> getLogicalColumnName(String physicalName) { | ||
return physicalToLogicalColumnNames.getOrDefault(physicalName, Arrays.asList(physicalName)); | ||
private Set<String> getLogicalColumnName(String physicalName) { | ||
return physicalToLogicalColumnNames.getOrDefault(physicalName, new HashSet<>(Arrays.asList(physicalName))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pluralize this method name. getLogicalColumnNames
@@ -14,6 +14,9 @@ Current | |||
|
|||
### Changed: | |||
|
|||
- [Remove restriction for single physical dimension to multiple lookup dimensions](https://github.com/yahoo/fili/pull/112) | |||
* Modified physical dimension name to logical dimension name mapping into a `Map<String, List<String>>` instead of `Map<String, String>` in `PhysicalTable.java` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in synch with the code change.
workingIntervals.put(dimensionColumn, nameIntervals.getValue()); | ||
|
||
getLogicalColumnName(physicalName).stream() | ||
.sequential() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is extra again, will remove in the next commit again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing big, just a few trivial cleanups. 👍
|
||
getLogicalColumnNames(physicalName).stream() | ||
.map(dimensionDictionary::findByApiName) | ||
.filter(dimension -> dimension != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects::isNotNull
simplifies this I think
.filter(dimension -> dimension != null) | ||
.forEach( | ||
dimension -> { | ||
DimensionColumn dimensionColumn = DimensionColumn.addNewDimensionColumn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a .map()
call on the stream, since you're just transforming the Dimension
into a DimensionColumn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the operations will cause side-affects right (adding things to existing things)? so if the stream runs in parallel things will get synchronization problem. forEach
is a reduce operation that allows side-affects I believe. Let me know if I'm wrong or not making sense, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @garyluoex on this one. It's not a big deal, especially since we don't run around randomly making streams parallel (and we shouldn't, not without carefully examining the operations in the stream first), but it just feels deeply and fundamentally wrong to me to put a side-effectful operation in a map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate on why it feels wrong to me, I generally prefer to be hit across the head with a big neon sign when an operation has side-effects, especially when the operation returns a value as well. forEach does that, while a map
operation does not.
def "test physical to logical mapping is constructed correctly"() { | ||
setup: | ||
def oneDimPhysicalTable = new PhysicalTable("test table", DAY.buildZonedTimeGrain(UTC), ['dimension':'druidDim']) | ||
def twoDimPhysicalTable = new PhysicalTable("test table", DAY.buildZonedTimeGrain(UTC), ['dimension1':'druidDim', 'dimension2':'druidDim']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we have good reason to be taking advantage of Groovy's duck-typing, we should be explicitly specifying the types of the variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make things shorter and simpler to read since the type should be obvious from the assignment initialization in this case (it feels like it is more intuitive to me), let me know if there are any consequences of doing this or it is just good practice to declare type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the context of our project, it's considered good practice (I don't know enough about the larger Groovy community to know if that's true more generally).
If nothing else, it improves IDE support, because the IDE knows what the variable type is.
It can also make things more readable, because it's easy for readers to see what the variable type this (though this can be partially offset because def
is more concise, so which is more readable is going to be a matter of opinion).
Duck typing has its advantages, but it has its disadvantages too. Here we're not using any of the advantages, but we're still saddled with the disadvantages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will change it then, thanks!
private String getLogicalColumnName(String physicalName) { | ||
return physicalToLogicalColumnNames.getOrDefault(physicalName, physicalName); | ||
private Set<String> getLogicalColumnNames(String physicalName) { | ||
return physicalToLogicalColumnNames.getOrDefault(physicalName, new HashSet<>(Arrays.asList(physicalName))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use Collections.singleton
.
.map(dimensionDictionary::findByApiName) | ||
.filter(Objects::nonNull) | ||
.forEach( | ||
dimension -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation seems off here. dimension ->
should be indented eight spaces, and everything inside the lambda should be indented four spaces past that.
Had a few small comments, 👍 once they've been addressed. |
Map<String, List<String>>
instead ofMap<String, String>
inPhysicalTable.java