-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PermissiveAvailability & PermissiveConcretePhysicalTable #190
Conversation
Need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only evaluated this from GitHub (haven't pulled the code locally yet), so I didn't look at anything deeper that surface-code (so, no inspections on extension vs. implementation, etc.), but I wanted to get this level of feedback to you so you could start looking at it.
I'll dive a bit deeper shortly.
@NotNull ZonedTimeGrain timeGrain, | ||
@NotNull Set<Column> columns, | ||
@NotNull Map<String, String> logicalToPhysicalColumnNames, | ||
@NotNull Availability availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the exact same constructor as exists in ConcretePhysicalTable
, and there are otherwise no methods in this class that are not directly from the parent. What added value is this class giving us?
As a minimum, I would expect that the constructor here would require a PermissiveAvailability
, rather than any Availability
, otherwise there's no assertion that this PhysicalTable
is going to do anything "permissive".
import java.util.Set; | ||
|
||
/** | ||
* An instance of permissive Physical table that is backed by a single fact table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless a class is set up as a singleton, it isn't really an instance. It may be an implementation, which is perhaps what you meant?
Since this is a specialization of a ConcretePhysicalTable
, the JavaDoc on this class should talk about what is different about this implementation. Why does this class exist, what is it for, what concepts does it add on top of what ConcretePhysicalTable
already does.
To put it a different way, merely adding (essentially) the class name to the javadoc of the parent class doesn't really make this a valuable / useful comment. 😄
import java.util.stream.Collectors; | ||
|
||
/** | ||
* An availability which doesn't force immutability on its contents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really the case that "permissive" means "mutable"? My understanding was that "permissive" in this context has more to do with something related to how intervals from multiple columns were counted, to determine if all columns were available or not. It's entirely possible I'm misunderstanding something, but if mutability is really the question here, I think we should rename this class to better reflect that.
|
||
@Override | ||
public Set<TableName> getDataSourceNames() { | ||
return Collections.singleton(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than make this wrapping every time getDataSourceNames
is called, we should do it only once in the constructor and store the collection.
@@ -0,0 +1,80 @@ | |||
// Copyright 2016 Yahoo Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
expect: | ||
permissiveAvailability.getAllAvailableIntervals() == [ | ||
(column1): [interval1].toSet(), | ||
(column2): [interval2].toSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, using as Set
is generally better
dataSourceConstraint.getAllColumnNames() >> ['column1', 'column2'] | ||
|
||
expect: | ||
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be wrong. interval1
is not the intersection of the intervals for the two columns, it's the union.
Also, additional test cases for many more overlap / non-overlap cases would be good to include here. In particular:
- full overlap (
start/end, start/end
) - 0 overlap (
-10/-1, 0/10
) - 0 overlap abutting (
-10/0, 0/10
) - partial front overlap (
0/10, 5/15
) - partial back overlap (
0/10, -5/5
) - full front overlap (
0/10, 5/10
) - full back overlap (
0/10, 0/5
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, PermissiveAvailability
in this case should return the union. This is how I understand "Concrete"(intersection) and "Permissive"(union) availavilities.
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1]) | ||
} | ||
|
||
def "getAvailableInterval returns empty interval if given column not configured"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not configured where? It's unclear because the setup clearly configures it somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ignored
interval is not associated with any column in setup. I changed ignored
to unConfiguredColumn
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1]) | ||
} | ||
|
||
def "getAvailableInterval returns empty interval if given column not configured"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need to run this same check for getAllAvailableIntervals
, since it's behavior should be similar.
Or rather, it shouldn't include columns in the underlying metadata that this availability isn't configured for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is done in the "getAllAvailability returns the union of all availabilities for each columns configured to the table" test? (see it in the next commit)
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList() | ||
} | ||
|
||
def "getAvailableInterval returns empty interval if given empty column request"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct behavior? Should this throw an exception instead? It's not clear to me what's right, since "all data is available if I don't need data from any columns" seems like it might be a reasonable response in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic is if you don't ask for anything, you don't get anything
, consider the case where the user asks for a column not on the table in which is filtered out in the availability (although this should not be possible).
* @param columns The columns for this table | ||
* @param logicalToPhysicalColumnNames Mappings from logical to physical names | ||
* @param permissiveAvailability the instance of Permissive Availability which makes sure that | ||
* the table is backed by permissive availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get rid of the indention on the second line here, use intellij's auto format will help.
* @param permissiveAvailability the instance of Permissive Availability which makes sure that | ||
* the table is backed by permissive availability | ||
*/ | ||
protected PermissiveConcretePhysicalTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be public and does not take a permissiveAvailability
as its argument, it should construct the availability and pass it to super
.
import javax.validation.constraints.NotNull; | ||
|
||
/** | ||
* An availability which allows multiple disconnected intervals(union) on its contents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of saying multiple disconnected intervals, just say missing intervals might be more appropriate.
*/ | ||
public class PermissiveAvailability implements Availability { | ||
|
||
private final TableName tableName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed if we are going to store a Set<TableName> dataSourceNames
this.tableName = tableName; | ||
this.columns = columns; | ||
this.metadataService = metadataService; | ||
dataSourceNames = Collections.singleton(tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a this.
to be consistent with everything else so that it looks nicer.
} | ||
|
||
|
||
Map<Column, Set<Interval>> x = getAllAvailableIntervals(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x
is not acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I forgot to remove this debug line.
return map.entrySet().stream() | ||
.collect( | ||
Collectors.toMap( | ||
{Map.Entry it -> new Column(it.key)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we try remove the type cast and directly use it
? since intellij complains anyways, if it compile and runs might as well remove the type cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a cast in this context, it's a type indication, but otherwise you are correct, Groovy's dynamic typing means that it's not needed here (unless we wanted it for clarity)
Column column1 | ||
Column column2 | ||
Interval interval1 | ||
Interval interval2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grouping the ones with the same type into a single line make things look cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I generally hate lumping things onto one line. It may just be my personal preference, but it makes it harder to see how many things there are, and harder to see what changes if variable names need to change. It also makes it more difficult to initialize variables when they are defined.
It's not a big deal, but I definitely would rather have more lines, with each definition on it's own line, rather than not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for reference, here's the relevant style guide indication for this: https://google.github.io/styleguide/javaguide.html#s4.8.2-variable-declarations
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList() | ||
} | ||
|
||
def "getAvailableInterval returns empty interval if given empty column request"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic is if you don't ask for anything, you don't get anything
, consider the case where the user asks for a column not on the table in which is filtered out in the availability (although this should not be possible).
column2 = new Column('column2') | ||
|
||
interval1 = new Interval('2010-01-01/2020-12-31') | ||
interval2 = new Interval('2017-01-01/2020-12-31') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the setup things that are not used by more than one test into the single test that uses it.
Set<String> allColumnNames = constraints.getAllColumnNames(); | ||
|
||
if (allColumnNames.isEmpty()) { | ||
throw new IllegalArgumentException("DataSourceConstraint needs to have at least 1 constraint column"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow the behavior if things are empty in ConcreteAvailability
, we do not want different behavior for each Availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last two things, otherwise good
Map<String, Set<Interval>> allAvailableIntervals = metadataService.getAvailableIntervalsByTable( | ||
dataSourceNames.stream() | ||
.findFirst() | ||
.orElseThrow(NullPointerException::new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary since the constructor only allows a single name to be passed in, also throwing a null pointer exception with out any additional information is not recommended.
@@ -68,17 +70,12 @@ public PermissiveAvailability( | |||
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) { | |||
Set<String> allColumnNames = constraints.getAllColumnNames(); | |||
|
|||
if (allColumnNames.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to ConcreteAvailability
the behavior of getAvailableIntervals
behavior, it should filter out to only the columns that is configured in this table. Should add a test to ensure this behavior.
@garyluoex @cdeszaq More comments welcome :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, pretty good. I think there can be a lot of code deduplication between ConcreteAvailability
/ PermissiveAvailability
(which I think should be renamed to PermissiveConcreteAvailability
to match the PhysicalTable
naming pattern.)
I left a bunch of comments along those DRY lines, but I also did a bunch of playing with the code to see where it could go. Rather than throw that away, here's what I ended up with for those classes. Feel free to take what you think is good from them:
public class PermissiveAvailability extends ConcreteAvailability {
/**
* Constructor.
*
* @param tableName The name of the data source
* @param columns A set of columns associated with the data source
* @param metadataService A service containing the data source segment data
*/
public PermissiveAvailability(
@NotNull TableName tableName,
@NotNull Set<Column> columns,
@NotNull DataSourceMetadataService metadataService
) {
super(tableName, columns, metadataService);
}
@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Set<String> requestColumns = filterRequestColumns(constraints);
return getIntervalsForColumnNames(requestColumns)
.flatMap(Set::stream)
.collect(SimplifiedIntervalList.getCollector());
}
}
public class ConcreteAvailability implements Availability {
private final TableName name;
private final Set<TableName> names;
private final Set<Column> columns;
private final DataSourceMetadataService metadataService;
private final Set<String> columnNames;
/**
* Constructor.
*
* @param tableName The name of the data source associated with this ImmutableAvailability
* @param columns The columns associated with the table and availability
* @param metadataService A service containing the datasource segment data
*/
public ConcreteAvailability(
TableName tableName,
@NotNull Set<Column> columns,
@NotNull DataSourceMetadataService metadataService
) {
this.name = tableName;
this.metadataService = metadataService;
this.names = Collections.singleton(tableName);
this.columns = columns;
this.columnNames = columns.stream()
.map(Column::getName)
.collect(Collectors.toSet());
}
protected Set<Column> getColumns() {
return columns;
}
protected DataSourceMetadataService getMetadataService() {
return metadataService;
}
@Override
public Set<TableName> getDataSourceNames() {
return names;
}
@Override
public Map<Column, Set<Interval>> getAllAvailableIntervals() {
Map<String, Set<Interval>> allAvailableIntervals = getAllAvailableIntervalsByName();
return columns.stream()
.collect(
Collectors.toMap(
Function.identity(),
column -> allAvailableIntervals.getOrDefault(column.getName(), Collections.emptySet())
)
);
}
@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Set<String> requestColumns = filterRequestColumns(constraints);
return new SimplifiedIntervalList(
getIntervalsForColumnNames(requestColumns)
.reduce(null, IntervalUtils::getOverlappingSubintervals)
);
}
protected Set<String> filterRequestColumns(DataSourceConstraint constraints) {
return constraints.getAllColumnNames().stream()
.filter(columnNames::contains)
.collect(Collectors.toSet());
}
protected Map<String, Set<Interval>> getAllAvailableIntervalsByName() {
return metadataService.getAvailableIntervalsByTable(name);
}
protected Stream<Set<Interval>> getIntervalsForColumnNames(Set<String> columnNames) {
if (columnNames.isEmpty()) {
return Stream.empty();
}
Map<String, Set<Interval>> allAvailableIntervals = getAllAvailableIntervalsByName();
return columnNames.stream()
.map(columnName -> allAvailableIntervals.getOrDefault(columnName, Collections.emptySet()));
}
}
* An implementation of permissive Physical table that is backed by a single fact table. | ||
* <p> | ||
* This is different from its parent <tt>ConcretePhysicalTable</tt> by using a different | ||
* Availability type, which is <tt>PermissiveAvailability</tt> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should talk about what it means to use a different type of availability. In particular, pointing out which methods of a PhysicalTable are impacted by this would be a good start.
import javax.validation.constraints.NotNull; | ||
|
||
/** | ||
* An availability which allows missing intervals on its contents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
dataSourceNames.stream() | ||
.findFirst() | ||
.get() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method appears to be doing exactly the same thing as ConcreteAvailability
with the only difference being we unpack the collection of table names each time. We should not have this duplication. If a PermissiveAvailability
is ConcreteAvailability
, why not make it so?
Either way, we should not have to unpack this every time, since it's useless work. Instead, just hold the collection and the single table name if we need to.
} | ||
|
||
@Override | ||
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs a fairly rich comment explaining what getAvailableIntervals
does, since this is the meat of this class (ie. it's the reason the class exists)
dataSourceNames.stream() | ||
.findFirst() | ||
.get() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a clear duplicate of work that's being done in the earlier method. Can we de-duplicate the code? Perhaps pull the duplicated code out into a getAvailableIntervals
method? (and possibly have this live in the ConcreteAvailability
class, since it too seems to do the same thing twice)
'2017-01-01/2017-02-01' | '2016-10-01/2017-01-15' | ['2016-10-01/2017-02-01'] // partial back overlap (0/10, -5/5) | ||
'2017-01-01/2017-02-01' | '2017-01-15/2017-02-01' | ['2017-01-01/2017-02-01'] // full front overlap (0/10, 5/10) | ||
'2017-01-01/2017-02-01' | '2017-01-01/2017-01-15' | ['2017-01-01/2017-02-01'] // full back overlap (0/10, 0/5) | ||
'2017-01-01/2017-02-01' | '2017-01-15/2017-01-25' | ['2017-01-01/2017-02-01'] // fully contain (0/10, 3/9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would cool to have these comments as a string variable and have that expanded into the test name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdeszaq Agree. How do I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to do it would be like this:
Bar mockBar = Mock(Bar)
@Unroll
def "The #thing always does #actionCount #action whenever it is poked and #reason"() {
given: "A configured thing and a mock"
thing."set$initialParam"(initialState)
when: "It gets poked"
thing.poke()
then: "It has taken the action the correct number of times"
actionCount * mockBar."action"()
where:
thing | actionCount | action | initialParam | initialState | reason
new Dog() | 1 | "jump" | "Barking" | true | "it is barking"
}
This example is a bit fancy, and shows off much more than how to get a string to unroll as part of the test name, but you can see that the reason
where
variable is only used as an unrolled parameter in the test name.
In a nutshell, you can put a #
in front of any where
parameter in the test name when using @Unroll
, and the value of that variable will be swapped into the test name when Spock generates the tests (which is on for each iteration in the where
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the other stuff that's in that example shows off how Groovy will let you use strings as method names :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically, for this test, of you make the where
block like this:
where:
firstInterval | secondInterval | expected | reason
'2017-01-01/2017-02-01' | '2017-01-01/2017-02-01' | ['2017-01-01/2017-02-01'] | "full overlap (start/end, start/end)"
'2017-01-01/2017-02-01' | '2018-01-01/2018-02-01' | ['2017-01-01/2017-02-01', '2018-01-01/2018-02-01'] | "0 overlap (-10/-1, 0/10)"
'2017-01-01/2017-02-01' | '2017-02-01/2017-03-01' | ['2017-01-01/2017-03-01'] | "0 overlap, abutting (-10/0, 0/10)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-03-01' | ['2017-01-01/2017-03-01'] | "partial front overlap (0/10, 5/15)"
'2017-01-01/2017-02-01' | '2016-10-01/2017-01-15' | ['2016-10-01/2017-02-01'] | "partial back overlap (0/10, -5/5)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-02-01' | ['2017-01-01/2017-02-01'] | "full front overlap (0/10, 5/10)"
'2017-01-01/2017-02-01' | '2017-01-01/2017-01-15' | ['2017-01-01/2017-02-01'] | "full back overlap (0/10, 0/5)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-01-25' | ['2017-01-01/2017-02-01'] | "fully contain (0/10, 3/9)"
Then you can add it to the test name, like this:
getAvailableIntervals returns the union of requested columns when available intervals have #reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant! Thanks! I'll take a note.
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList( | ||
expected.stream() | ||
.map{it -> new Interval(it)} | ||
.collect(Collectors.toSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified using groovy: expected.collect{new Interval(it)} as Set
permissiveAvailability.dataSourceNames == Collections.singleton(TableName.of('table')) | ||
} | ||
|
||
def "getAllAvailability returns all availabilities for each columns configured to the table"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method name here in the description doesn't seem to match the method under test
permissiveAvailability.dataSourceNames == Collections.singleton(TableName.of('table')) | ||
} | ||
|
||
def "getAllAvailability returns all availabilities for each columns configured to the table"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have a test like this for only returning availabilities for columns that are configured on the Availability, even if they are in the underlying DataSourceMetadataService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new test is the same as getAvailableInterval returns only configured column
|
||
import java.util.stream.Collectors | ||
|
||
class PermissiveAvailabilitySpec extends Specification { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If PermissiveAvailability
extends from ConcreteAvailability
, then this Spec only needs to worry about the specific differences in behavior from it's base class, which will likely simplify this Spec significantly.
That said, if the comments I've left here can make the Spec for the parent better, then they should be rolled into the Parent spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on removing test that don't test the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial. Still need to dig into most recent test changes.
this.metadataService = metadataService; | ||
this.names = Collections.singleton(tableName); | ||
this.columns = columns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should turn this into an ImmutableSet
so that callers can't modify it if they get it via getColumns
this.columns = columns; | ||
this.columnNames = columns.stream() | ||
.map(Column::getName) | ||
.collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also be made into an ImmutableSet
, though it matters a little less, since it's never returned directly. Better to be safe though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the couple of final things, this is looking really good. Just a few small cleanups!
} | ||
|
||
@Unroll | ||
def "The spanning intervals are returned even when individual columns are not fully covering the interval"() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Unroll
tests should have some sort of where
expansion, to differentiate the test iterations and indicate what is being tested in each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though it makes more sense to remove @Unroll
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a data-driven test (ie. a test with a where
block), we should always use @Unroll
. The reason is that @Unroll
makes a test (as seen by the JUnit Test Runner) for each "row" in a where
table. Without that, all of the iterations in the where
block are run as a single test, so a failure with the 1st row marks the entire test as a failure, instead of allowing each iteration to fail on it's own, which gives much more information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good.
I like the readability of the tests particularly.
@@ -25,8 +26,10 @@ | |||
public class ConcreteAvailability implements Availability { | |||
|
|||
private final TableName name; | |||
private final Set<TableName> names; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's align the variable name with the getter name. e.g. dataSourceNames
|
||
/** | ||
* An implementation of permissive Physical table that is backed by a single fact table. | ||
* <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A permissive table that is backed by a single fact table or a concrete table with permissive availability? I would have thought the latter.
return getIntervalsForColumnNames(requestColumns) | ||
.flatMap(Set::stream) | ||
.collect(SimplifiedIntervalList.getCollector()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Permissive availability shouldn't use the constraints to limit the available times.
Instead the columns from the Availability should be used.
.reduce(null, IntervalUtils::getOverlappingSubintervals) | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, I believe this could return a null pointer exception.
The way to fix this would be to empty check the request columns before doing the stream and returning an empty SIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option:
return new SimplifiedIntervalList(
requestColumns.isEmpty() ?
Collections.emptySet() :
getIntervalsForColumnNames(requestColumns)
.reduce(null, IntervalUtils::getOverlappingSubintervals)
);
|
||
import java.util.stream.Collectors | ||
|
||
class PermissiveAvailabilitySpec extends Specification { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on removing test that don't test the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once that small tweak is addressed, this is good to go. So, after 1 more commit, this will be squashed and merged.
*/ | ||
@Override | ||
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) { | ||
return getIntervalsForColumnNames(getColumns().stream().map(Column::getName).collect(Collectors.toSet())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than recompute this, we should pull this collection from the parent class: https://github.com/yahoo/fili/pull/190/files#diff-6923730d44e29354ccfcc8d1ca622d8eR33
We may need to add a protected
getter for it to the parent, but that's where we should use it from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing is that the permissive tests are wrong to the feature we're expecting.
* @param columns The columns for this table | ||
* @param logicalToPhysicalColumnNames Mappings from logical to physical names | ||
* @param availability the instance of Availability specifying the availabilities of table columns | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitepace: two spaces after param name.
} | ||
|
||
def "getAvailableIntervals returns only configured column"() { | ||
given: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar 'only configured columns'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could be lifted up to ConcreteAvailabilitySpec.
expect: | ||
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three tests have incorrect results. The reason is not obvious is because you haven't initialized the metadata service they use with actual intervals. They should be returning the full set of intervals supplied by the columns of the permissive table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these two comments are my last concern.
} | ||
|
||
def "getAvailableIntervals returns empty list of intervals if given empty column request"() { | ||
given: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You changed the test behavior but not the test name.
} | ||
|
||
def "getAvailableIntervals returns empty list of intervals if given column not configured"() { | ||
given: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You changed the test behavior but not the test name.
Once you've got these last changes in, please rebase onto the updated UDCore3 and resolve any conflicts. |
There is too much changes to the point that I need to re-review this piece.
since this needs a rebase onto the underlying PR, but once this is cleanly rebased, I'm happy to give this a quick glance and fast re-approval
* @param columns The columns for this table | ||
* @param logicalToPhysicalColumnNames Mappings from logical to physical names | ||
* @param dataSourceMetadataService data source metadata service containing availability data for the table | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spacing: two spaces after param name.
|
||
public Set<String> getCachedColumnNames() { | ||
return cachedColumnNames; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's drop the 'cached' prefix on the member variables in this class and in the getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the variable name change will make me happy with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few final things, should be fairly trivial, and the comment updates to reflect the behavior change for Permissive availability vs. it's previous version.
@@ -99,4 +99,4 @@ class ConcreteAvailabilitySpec extends Specification{ | |||
expect: | |||
concreteAvailability.getAvailableIntervals(constraint) == new SimplifiedIntervalList() | |||
} | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline needed at end of file
@Override | ||
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) { | ||
Map<String, List<Interval>> allAvailableIntervals = getMetadataService() | ||
.getAvailableIntervalsByTable(getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be using getLatestAvailableIntervalsByTable
from the parent class.
Also means that getName
from the parent class can go away.
* @return the union of all available intervals | ||
*/ | ||
@Override | ||
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is unused, we should probably rename the parameter to something like ignored
(and probably make sure the javadoc on this method and class explains why it doesn't matter. @michael-mclawhorn should be able to better explain why it doesn't matter if there are questions.
CHANGELOG.md
Outdated
@@ -8,6 +8,9 @@ pull request if there was one. | |||
Current | |||
------- | |||
### Added: | |||
- [PermissiveAvailability and PermissiveConcretePhysicalTable](https://github.com/yahoo/fili/pull/190) | |||
* Added `PermissiveConcretePhysicalTable` and `PermissiveAvailability` to model table in druid datasource and its availability in the new table availability structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this addition is really a re-add of the old behavior (mostly), just in a new place, we should make sure to call that out. Also, since the behavior is subtly different than the old behavior, we should explain that change in here too. @michael-mclawhorn or @will-lauer can help explain it I think if there are questions.
|
||
public TableName getName() { | ||
return name; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two methods are unused if getLatestAvailableIntervalsByTable
is used in the permissive availability.
return name; | ||
} | ||
|
||
public Set<String> getCachedColumnNames() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should make this protected
, since it's intended as a helper method more than anything else, right?
static Map<Column, Set<Interval>> constructColumnToIntervalMap(Map<String, Set<String>> columnToIntervalInString) { | ||
return columnToIntervalInString.collectEntries{ | ||
key, | ||
value -> [ (new Column(key)): value.collect{new Interval(it)} ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, format for Groovy closures is to put the closure params on the opening line, and then wrap after that:
[:].collectEntries {key, value ->
[(key): value]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
4c65f95
to
a36562f
Compare
Closing in favor of #200, which is the rebased version of this. |
No description provided.