Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PermissiveAvailability & PermissiveConcretePhysicalTable #190

Closed
wants to merge 10 commits into from
Closed

Add PermissiveAvailability & PermissiveConcretePhysicalTable #190

wants to merge 10 commits into from

Conversation

QubitPi
Copy link
Contributor

@QubitPi QubitPi commented Mar 6, 2017

No description provided.

@garyluoex
Copy link
Collaborator

Need CHANGELOG.md, also add a test to test PermissiveConcretePhysicalTable correctly creates PermissiveAvailability and that the getAvailableIntervals and getAllAvailableIntervals uses the PermissiveAvailability appropriately.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only evaluated this from GitHub (haven't pulled the code locally yet), so I didn't look at anything deeper that surface-code (so, no inspections on extension vs. implementation, etc.), but I wanted to get this level of feedback to you so you could start looking at it.

I'll dive a bit deeper shortly.

@NotNull ZonedTimeGrain timeGrain,
@NotNull Set<Column> columns,
@NotNull Map<String, String> logicalToPhysicalColumnNames,
@NotNull Availability availability
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the exact same constructor as exists in ConcretePhysicalTable, and there are otherwise no methods in this class that are not directly from the parent. What added value is this class giving us?

As a minimum, I would expect that the constructor here would require a PermissiveAvailability, rather than any Availability, otherwise there's no assertion that this PhysicalTable is going to do anything "permissive".

import java.util.Set;

/**
* An instance of permissive Physical table that is backed by a single fact table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless a class is set up as a singleton, it isn't really an instance. It may be an implementation, which is perhaps what you meant?

Since this is a specialization of a ConcretePhysicalTable, the JavaDoc on this class should talk about what is different about this implementation. Why does this class exist, what is it for, what concepts does it add on top of what ConcretePhysicalTable already does.

To put it a different way, merely adding (essentially) the class name to the javadoc of the parent class doesn't really make this a valuable / useful comment. 😄

import java.util.stream.Collectors;

/**
* An availability which doesn't force immutability on its contents.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really the case that "permissive" means "mutable"? My understanding was that "permissive" in this context has more to do with something related to how intervals from multiple columns were counted, to determine if all columns were available or not. It's entirely possible I'm misunderstanding something, but if mutability is really the question here, I think we should rename this class to better reflect that.


@Override
public Set<TableName> getDataSourceNames() {
return Collections.singleton(name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than make this wrapping every time getDataSourceNames is called, we should do it only once in the constructor and store the collection.

@@ -0,0 +1,80 @@
// Copyright 2016 Yahoo Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

expect:
permissiveAvailability.getAllAvailableIntervals() == [
(column1): [interval1].toSet(),
(column2): [interval2].toSet()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, using as Set is generally better

dataSourceConstraint.getAllColumnNames() >> ['column1', 'column2']

expect:
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to be wrong. interval1 is not the intersection of the intervals for the two columns, it's the union.

Also, additional test cases for many more overlap / non-overlap cases would be good to include here. In particular:

  • full overlap (start/end, start/end)
  • 0 overlap (-10/-1, 0/10)
  • 0 overlap abutting (-10/0, 0/10)
  • partial front overlap (0/10, 5/15)
  • partial back overlap (0/10, -5/5)
  • full front overlap (0/10, 5/10)
  • full back overlap (0/10, 0/5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, PermissiveAvailability in this case should return the union. This is how I understand "Concrete"(intersection) and "Permissive"(union) availavilities.

permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1])
}

def "getAvailableInterval returns empty interval if given column not configured"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not configured where? It's unclear because the setup clearly configures it somewhere.

Copy link
Contributor Author

@QubitPi QubitPi Mar 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ignored interval is not associated with any column in setup. I changed ignored to unConfiguredColumn

permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList([interval1])
}

def "getAvailableInterval returns empty interval if given column not configured"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to run this same check for getAllAvailableIntervals, since it's behavior should be similar.

Or rather, it shouldn't include columns in the underlying metadata that this availability isn't configured for.

Copy link
Contributor Author

@QubitPi QubitPi Mar 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is done in the "getAllAvailability returns the union of all availabilities for each columns configured to the table" test? (see it in the next commit)

permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList()
}

def "getAvailableInterval returns empty interval if given empty column request"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct behavior? Should this throw an exception instead? It's not clear to me what's right, since "all data is available if I don't need data from any columns" seems like it might be a reasonable response in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic is if you don't ask for anything, you don't get anything, consider the case where the user asks for a column not on the table in which is filtered out in the availability (although this should not be possible).

* @param columns The columns for this table
* @param logicalToPhysicalColumnNames Mappings from logical to physical names
* @param permissiveAvailability the instance of Permissive Availability which makes sure that
* the table is backed by permissive availability
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get rid of the indention on the second line here, use intellij's auto format will help.

* @param permissiveAvailability the instance of Permissive Availability which makes sure that
* the table is backed by permissive availability
*/
protected PermissiveConcretePhysicalTable(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be public and does not take a permissiveAvailability as its argument, it should construct the availability and pass it to super.

import javax.validation.constraints.NotNull;

/**
* An availability which allows multiple disconnected intervals(union) on its contents.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saying multiple disconnected intervals, just say missing intervals might be more appropriate.

*/
public class PermissiveAvailability implements Availability {

private final TableName tableName;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed if we are going to store a Set<TableName> dataSourceNames

this.tableName = tableName;
this.columns = columns;
this.metadataService = metadataService;
dataSourceNames = Collections.singleton(tableName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a this. to be consistent with everything else so that it looks nicer.

}


Map<Column, Set<Interval>> x = getAllAvailableIntervals();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is not acceptable.

Copy link
Contributor Author

@QubitPi QubitPi Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I forgot to remove this debug line.

return map.entrySet().stream()
.collect(
Collectors.toMap(
{Map.Entry it -> new Column(it.key)},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try remove the type cast and directly use it? since intellij complains anyways, if it compile and runs might as well remove the type cast.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a cast in this context, it's a type indication, but otherwise you are correct, Groovy's dynamic typing means that it's not needed here (unless we wanted it for clarity)

Column column1
Column column2
Interval interval1
Interval interval2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouping the ones with the same type into a single line make things look cleaner

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I generally hate lumping things onto one line. It may just be my personal preference, but it makes it harder to see how many things there are, and harder to see what changes if variable names need to change. It also makes it more difficult to initialize variables when they are defined.

It's not a big deal, but I definitely would rather have more lines, with each definition on it's own line, rather than not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for reference, here's the relevant style guide indication for this: https://google.github.io/styleguide/javaguide.html#s4.8.2-variable-declarations

permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList()
}

def "getAvailableInterval returns empty interval if given empty column request"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic is if you don't ask for anything, you don't get anything, consider the case where the user asks for a column not on the table in which is filtered out in the availability (although this should not be possible).

column2 = new Column('column2')

interval1 = new Interval('2010-01-01/2020-12-31')
interval2 = new Interval('2017-01-01/2020-12-31')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the setup things that are not used by more than one test into the single test that uses it.

Set<String> allColumnNames = constraints.getAllColumnNames();

if (allColumnNames.isEmpty()) {
throw new IllegalArgumentException("DataSourceConstraint needs to have at least 1 constraint column");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow the behavior if things are empty in ConcreteAvailability, we do not want different behavior for each Availability.

Copy link
Collaborator

@garyluoex garyluoex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last two things, otherwise good

Map<String, Set<Interval>> allAvailableIntervals = metadataService.getAvailableIntervalsByTable(
dataSourceNames.stream()
.findFirst()
.orElseThrow(NullPointerException::new)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary since the constructor only allows a single name to be passed in, also throwing a null pointer exception with out any additional information is not recommended.

@@ -68,17 +70,12 @@ public PermissiveAvailability(
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Set<String> allColumnNames = constraints.getAllColumnNames();

if (allColumnNames.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to ConcreteAvailability the behavior of getAvailableIntervals behavior, it should filter out to only the columns that is configured in this table. Should add a test to ensure this behavior.

@QubitPi
Copy link
Contributor Author

QubitPi commented Mar 8, 2017

@garyluoex @cdeszaq More comments welcome :)

garyluoex
garyluoex previously approved these changes Mar 8, 2017
Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, pretty good. I think there can be a lot of code deduplication between ConcreteAvailability / PermissiveAvailability (which I think should be renamed to PermissiveConcreteAvailability to match the PhysicalTable naming pattern.)

I left a bunch of comments along those DRY lines, but I also did a bunch of playing with the code to see where it could go. Rather than throw that away, here's what I ended up with for those classes. Feel free to take what you think is good from them:

public class PermissiveAvailability extends ConcreteAvailability {

    /**
     * Constructor.
     *
     * @param tableName The name of the data source
     * @param columns A set of columns associated with the data source
     * @param metadataService A service containing the data source segment data
     */
    public PermissiveAvailability(
            @NotNull TableName tableName,
            @NotNull Set<Column> columns,
            @NotNull DataSourceMetadataService metadataService
    ) {
        super(tableName, columns, metadataService);
    }

    @Override
    public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
        Set<String> requestColumns = filterRequestColumns(constraints);
        return getIntervalsForColumnNames(requestColumns)
                .flatMap(Set::stream)
                .collect(SimplifiedIntervalList.getCollector());
    }
}

public class ConcreteAvailability implements Availability {

    private final TableName name;
    private final Set<TableName> names;
    private final Set<Column> columns;
    private final DataSourceMetadataService metadataService;
    private final Set<String> columnNames;

    /**
     * Constructor.
     *
     * @param tableName The name of the data source associated with this ImmutableAvailability
     * @param columns The columns associated with the table and availability
     * @param metadataService A service containing the datasource segment data
     */
    public ConcreteAvailability(
            TableName tableName,
            @NotNull Set<Column> columns,
            @NotNull DataSourceMetadataService metadataService
    ) {
        this.name = tableName;
        this.metadataService = metadataService;
        this.names = Collections.singleton(tableName);
        this.columns = columns;
        this.columnNames = columns.stream()
                .map(Column::getName)
                .collect(Collectors.toSet());
    }

    protected Set<Column> getColumns() {
        return columns;
    }

    protected DataSourceMetadataService getMetadataService() {
        return metadataService;
    }

    @Override
    public Set<TableName> getDataSourceNames() {
        return names;
    }

    @Override
    public Map<Column, Set<Interval>> getAllAvailableIntervals() {

        Map<String, Set<Interval>> allAvailableIntervals = getAllAvailableIntervalsByName();
        return columns.stream()
                .collect(
                        Collectors.toMap(
                                Function.identity(),
                                column -> allAvailableIntervals.getOrDefault(column.getName(), Collections.emptySet())
                        )
                );
    }

    @Override
    public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
        Set<String> requestColumns = filterRequestColumns(constraints);
        return new SimplifiedIntervalList(
                getIntervalsForColumnNames(requestColumns)
                        .reduce(null, IntervalUtils::getOverlappingSubintervals)
        );
    }

    protected Set<String> filterRequestColumns(DataSourceConstraint constraints) {
        return constraints.getAllColumnNames().stream()
                .filter(columnNames::contains)
                .collect(Collectors.toSet());
    }

    protected Map<String, Set<Interval>> getAllAvailableIntervalsByName() {
        return metadataService.getAvailableIntervalsByTable(name);
    }

    protected Stream<Set<Interval>> getIntervalsForColumnNames(Set<String> columnNames) {
        if (columnNames.isEmpty()) {
            return Stream.empty();
        }

        Map<String, Set<Interval>> allAvailableIntervals = getAllAvailableIntervalsByName();
        return columnNames.stream()
                .map(columnName -> allAvailableIntervals.getOrDefault(columnName, Collections.emptySet()));
    }
}

* An implementation of permissive Physical table that is backed by a single fact table.
* <p>
* This is different from its parent <tt>ConcretePhysicalTable</tt> by using a different
* Availability type, which is <tt>PermissiveAvailability</tt>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should talk about what it means to use a different type of availability. In particular, pointing out which methods of a PhysicalTable are impacted by this would be a good start.

import javax.validation.constraints.NotNull;

/**
* An availability which allows missing intervals on its contents.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

dataSourceNames.stream()
.findFirst()
.get()
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method appears to be doing exactly the same thing as ConcreteAvailability with the only difference being we unpack the collection of table names each time. We should not have this duplication. If a PermissiveAvailability is ConcreteAvailability, why not make it so?

Either way, we should not have to unpack this every time, since it's useless work. Instead, just hold the collection and the single table name if we need to.

}

@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs a fairly rich comment explaining what getAvailableIntervals does, since this is the meat of this class (ie. it's the reason the class exists)

dataSourceNames.stream()
.findFirst()
.get()
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a clear duplicate of work that's being done in the earlier method. Can we de-duplicate the code? Perhaps pull the duplicated code out into a getAvailableIntervals method? (and possibly have this live in the ConcreteAvailability class, since it too seems to do the same thing twice)

'2017-01-01/2017-02-01' | '2016-10-01/2017-01-15' | ['2016-10-01/2017-02-01'] // partial back overlap (0/10, -5/5)
'2017-01-01/2017-02-01' | '2017-01-15/2017-02-01' | ['2017-01-01/2017-02-01'] // full front overlap (0/10, 5/10)
'2017-01-01/2017-02-01' | '2017-01-01/2017-01-15' | ['2017-01-01/2017-02-01'] // full back overlap (0/10, 0/5)
'2017-01-01/2017-02-01' | '2017-01-15/2017-01-25' | ['2017-01-01/2017-02-01'] // fully contain (0/10, 3/9)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would cool to have these comments as a string variable and have that expanded into the test name...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdeszaq Agree. How do I do that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to do it would be like this:

Bar mockBar = Mock(Bar)

@Unroll
def "The #thing always does #actionCount #action whenever it is poked and #reason"() {
    given: "A configured thing and a mock"
    thing."set$initialParam"(initialState)

    when: "It gets poked"
    thing.poke()

    then: "It has taken the action the correct number of times"
    actionCount * mockBar."action"()

    where:
    thing     | actionCount | action | initialParam | initialState | reason
    new Dog() | 1           | "jump" | "Barking"    | true         | "it is barking"
}

This example is a bit fancy, and shows off much more than how to get a string to unroll as part of the test name, but you can see that the reason where variable is only used as an unrolled parameter in the test name.

In a nutshell, you can put a # in front of any where parameter in the test name when using @Unroll, and the value of that variable will be swapped into the test name when Spock generates the tests (which is on for each iteration in the where block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the other stuff that's in that example shows off how Groovy will let you use strings as method names :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically, for this test, of you make the where block like this:

where:
firstInterval           | secondInterval          | expected        | reason
'2017-01-01/2017-02-01' | '2017-01-01/2017-02-01' | ['2017-01-01/2017-02-01'] | "full overlap (start/end, start/end)"
'2017-01-01/2017-02-01' | '2018-01-01/2018-02-01' | ['2017-01-01/2017-02-01', '2018-01-01/2018-02-01'] | "0 overlap (-10/-1, 0/10)"
'2017-01-01/2017-02-01' | '2017-02-01/2017-03-01' | ['2017-01-01/2017-03-01'] | "0 overlap, abutting (-10/0, 0/10)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-03-01' | ['2017-01-01/2017-03-01'] | "partial front overlap (0/10, 5/15)"
'2017-01-01/2017-02-01' | '2016-10-01/2017-01-15' | ['2016-10-01/2017-02-01'] | "partial back overlap (0/10, -5/5)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-02-01' | ['2017-01-01/2017-02-01'] | "full front overlap (0/10, 5/10)"
'2017-01-01/2017-02-01' | '2017-01-01/2017-01-15' | ['2017-01-01/2017-02-01'] | "full back overlap (0/10, 0/5)"
'2017-01-01/2017-02-01' | '2017-01-15/2017-01-25' | ['2017-01-01/2017-02-01'] | "fully contain (0/10, 3/9)"

Then you can add it to the test name, like this:

getAvailableIntervals returns the union of requested columns when available intervals have #reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant! Thanks! I'll take a note.

permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList(
expected.stream()
.map{it -> new Interval(it)}
.collect(Collectors.toSet())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified using groovy: expected.collect{new Interval(it)} as Set

permissiveAvailability.dataSourceNames == Collections.singleton(TableName.of('table'))
}

def "getAllAvailability returns all availabilities for each columns configured to the table"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method name here in the description doesn't seem to match the method under test

permissiveAvailability.dataSourceNames == Collections.singleton(TableName.of('table'))
}

def "getAllAvailability returns all availabilities for each columns configured to the table"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have a test like this for only returning availabilities for columns that are configured on the Availability, even if they are in the underlying DataSourceMetadataService.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new test is the same as getAvailableInterval returns only configured column


import java.util.stream.Collectors

class PermissiveAvailabilitySpec extends Specification {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PermissiveAvailability extends from ConcreteAvailability, then this Spec only needs to worry about the specific differences in behavior from it's base class, which will likely simplify this Spec significantly.

That said, if the comments I've left here can make the Spec for the parent better, then they should be rolled into the Parent spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on removing test that don't test the difference.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial. Still need to dig into most recent test changes.

this.metadataService = metadataService;
this.names = Collections.singleton(tableName);
this.columns = columns;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should turn this into an ImmutableSet so that callers can't modify it if they get it via getColumns

this.columns = columns;
this.columnNames = columns.stream()
.map(Column::getName)
.collect(Collectors.toSet());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be made into an ImmutableSet, though it matters a little less, since it's never returned directly. Better to be safe though.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the couple of final things, this is looking really good. Just a few small cleanups!

}

@Unroll
def "The spanning intervals are returned even when individual columns are not fully covering the interval"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Unroll tests should have some sort of where expansion, to differentiate the test iterations and indicate what is being tested in each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though it makes more sense to remove @Unroll :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a data-driven test (ie. a test with a where block), we should always use @Unroll. The reason is that @Unroll makes a test (as seen by the JUnit Test Runner) for each "row" in a where table. Without that, all of the iterations in the where block are run as a single test, so a failure with the 1st row marks the entire test as a failure, instead of allowing each iteration to fail on it's own, which gives much more information.

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good.
I like the readability of the tests particularly.

@@ -25,8 +26,10 @@
public class ConcreteAvailability implements Availability {

private final TableName name;
private final Set<TableName> names;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's align the variable name with the getter name. e.g. dataSourceNames


/**
* An implementation of permissive Physical table that is backed by a single fact table.
* <p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A permissive table that is backed by a single fact table or a concrete table with permissive availability? I would have thought the latter.

return getIntervalsForColumnNames(requestColumns)
.flatMap(Set::stream)
.collect(SimplifiedIntervalList.getCollector());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Permissive availability shouldn't use the constraints to limit the available times.
Instead the columns from the Availability should be used.

.reduce(null, IntervalUtils::getOverlappingSubintervals)
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, I believe this could return a null pointer exception.
The way to fix this would be to empty check the request columns before doing the stream and returning an empty SIL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option:

        return new SimplifiedIntervalList(
                requestColumns.isEmpty() ?
                        Collections.emptySet() :
                        getIntervalsForColumnNames(requestColumns)
                                .reduce(null, IntervalUtils::getOverlappingSubintervals)
        );


import java.util.stream.Collectors

class PermissiveAvailabilitySpec extends Specification {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on removing test that don't test the difference.

cdeszaq
cdeszaq previously approved these changes Mar 15, 2017
Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once that small tweak is addressed, this is good to go. So, after 1 more commit, this will be squashed and merged.

*/
@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
return getIntervalsForColumnNames(getColumns().stream().map(Column::getName).collect(Collectors.toSet()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than recompute this, we should pull this collection from the parent class: https://github.com/yahoo/fili/pull/190/files#diff-6923730d44e29354ccfcc8d1ca622d8eR33

We may need to add a protected getter for it to the parent, but that's where we should use it from.

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is that the permissive tests are wrong to the feature we're expecting.

* @param columns The columns for this table
* @param logicalToPhysicalColumnNames Mappings from logical to physical names
* @param availability the instance of Availability specifying the availabilities of table columns
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitepace: two spaces after param name.

}

def "getAvailableIntervals returns only configured column"() {
given:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar 'only configured columns'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could be lifted up to ConcreteAvailabilitySpec.

expect:
permissiveAvailability.getAvailableIntervals(dataSourceConstraint) == new SimplifiedIntervalList()
}

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn Mar 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three tests have incorrect results. The reason is not obvious is because you haven't initialized the metadata service they use with actual intervals. They should be returning the full set of intervals supplied by the columns of the permissive table.

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these two comments are my last concern.

}

def "getAvailableIntervals returns empty list of intervals if given empty column request"() {
given:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You changed the test behavior but not the test name.

}

def "getAvailableIntervals returns empty list of intervals if given column not configured"() {
given:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You changed the test behavior but not the test name.

@michael-mclawhorn
Copy link
Contributor

Once you've got these last changes in, please rebase onto the updated UDCore3 and resolve any conflicts.

@QubitPi QubitPi changed the base branch from UDCore3 to JekyllSync March 17, 2017 15:32
@QubitPi QubitPi changed the base branch from JekyllSync to UDCore3 March 17, 2017 15:32
@garyluoex garyluoex dismissed their stale review March 19, 2017 23:57

There is too much changes to the point that I need to re-review this piece.

@cdeszaq cdeszaq dismissed their stale review March 21, 2017 22:03

since this needs a rebase onto the underlying PR, but once this is cleanly rebased, I'm happy to give this a quick glance and fast re-approval

@michael-mclawhorn michael-mclawhorn modified the milestone: mclawhor Mar 22, 2017
@michael-mclawhorn michael-mclawhorn changed the base branch from UDCore3 to SegmentMetadataLoaderBindingBug March 22, 2017 00:12
@michael-mclawhorn michael-mclawhorn changed the base branch from SegmentMetadataLoaderBindingBug to UDCore3 March 22, 2017 00:13
* @param columns The columns for this table
* @param logicalToPhysicalColumnNames Mappings from logical to physical names
* @param dataSourceMetadataService data source metadata service containing availability data for the table
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing: two spaces after param name.


public Set<String> getCachedColumnNames() {
return cachedColumnNames;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop the 'cached' prefix on the member variables in this class and in the getter.

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the variable name change will make me happy with this.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final things, should be fairly trivial, and the comment updates to reflect the behavior change for Permissive availability vs. it's previous version.

@@ -99,4 +99,4 @@ class ConcreteAvailabilitySpec extends Specification{
expect:
concreteAvailability.getAvailableIntervals(constraint) == new SimplifiedIntervalList()
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline needed at end of file

@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Map<String, List<Interval>> allAvailableIntervals = getMetadataService()
.getAvailableIntervalsByTable(getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be using getLatestAvailableIntervalsByTable from the parent class.

Also means that getName from the parent class can go away.

* @return the union of all available intervals
*/
@Override
public SimplifiedIntervalList getAvailableIntervals(DataSourceConstraint constraints) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is unused, we should probably rename the parameter to something like ignored (and probably make sure the javadoc on this method and class explains why it doesn't matter. @michael-mclawhorn should be able to better explain why it doesn't matter if there are questions.

CHANGELOG.md Outdated
@@ -8,6 +8,9 @@ pull request if there was one.
Current
-------
### Added:
- [PermissiveAvailability and PermissiveConcretePhysicalTable](https://github.com/yahoo/fili/pull/190)
* Added `PermissiveConcretePhysicalTable` and `PermissiveAvailability` to model table in druid datasource and its availability in the new table availability structure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this addition is really a re-add of the old behavior (mostly), just in a new place, we should make sure to call that out. Also, since the behavior is subtly different than the old behavior, we should explain that change in here too. @michael-mclawhorn or @will-lauer can help explain it I think if there are questions.


public TableName getName() {
return name;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods are unused if getLatestAvailableIntervalsByTable is used in the permissive availability.

return name;
}

public Set<String> getCachedColumnNames() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should make this protected, since it's intended as a helper method more than anything else, right?

static Map<Column, Set<Interval>> constructColumnToIntervalMap(Map<String, Set<String>> columnToIntervalInString) {
return columnToIntervalInString.collectEntries{
key,
value -> [ (new Column(key)): value.collect{new Interval(it)} ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, format for Groovy closures is to put the closure params on the opening line, and then wrap after that:

[:].collectEntries {key, value ->
    [(key): value]
}

Copy link
Contributor

@michael-mclawhorn michael-mclawhorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@garyluoex garyluoex force-pushed the UDCore3 branch 3 times, most recently from 4c65f95 to a36562f Compare March 24, 2017 21:51
@cdeszaq cdeszaq changed the base branch from UDCore3 to master March 27, 2017 17:05
@cdeszaq
Copy link
Collaborator

cdeszaq commented Mar 27, 2017

Closing in favor of #200, which is the rebased version of this.

@cdeszaq cdeszaq closed this Mar 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants