Better programmatic generation of metadata json in tests #412

Merged

Conversation

Contributor

@QubitPi QubitPi commented Jun 23, 2017

Address issue #392

The strategy is to have all metadata specs under com.yahoo.bard.webservice.metadata extend BaseDataSourceMetadataSpec, which is the base specification for all metadata tests. Each metadata Spec overrides childSetupSpec() and calls or overrides the generate*() methods to initialize metadata-related resources.

Take "intervals" as an example:

  1. If you don't need any of the resources (including "intervals") defined in the base Spec, you do nothing.
  2. If you need "intervals" in your test, you override childSetupSpec() and call the generation method in your Spec as follows:
@Override
def childSetupSpec() {
    intervals = generateIntervals()
}
  3. If you need "intervals" in your test but with a different set of intervals, you also override generateIntervals():
@Override
def childSetupSpec() {
    intervals = generateIntervals()
}

@Override
Map<String, Interval> generateIntervals() {
    [
        "interval1": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"),
        "interval2": Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
        "interval3": Interval.parse("2015-01-03T00:00:00.000Z/2015-01-04T00:00:00.000Z"),
        "interval123": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-04T00:00:00.000Z")
    ]
}
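
For context, here is a minimal sketch of what the base Spec looks like under this strategy (the field set, the default interval values, and the assumption that setupSpec() delegates to childSetupSpec() are illustrative, not necessarily the exact code in this PR):

    import org.joda.time.Interval
    import spock.lang.Shared
    import spock.lang.Specification

    abstract class BaseDataSourceMetadataSpec extends Specification {

        @Shared Map<String, Interval> intervals

        def setupSpec() {
            // Child Specs hook in here and initialize only the resources they need.
            childSetupSpec()
        }

        // No-op by default; child Specs override this and call (or override) the generate*() methods.
        def childSetupSpec() {}

        // Default resource package; a child Spec overrides this for a customized set of intervals.
        Map<String, Interval> generateIntervals() {
            [
                    interval1: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"),
                    interval2: Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z")
            ]
        }
    }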

@QubitPi QubitPi added the WIP label Jun 23, 2017
@QubitPi QubitPi force-pushed the better-programmatic-generation-of-metadata-json-in-tests branch from a094a2e to d1969ef on November 16, 2017 07:10
@QubitPi QubitPi force-pushed the better-programmatic-generation-of-metadata-json-in-tests branch from 1825d9c to d722376 on November 17, 2017 06:51
@QubitPi QubitPi force-pushed the better-programmatic-generation-of-metadata-json-in-tests branch from d722376 to 607c8b2 on November 17, 2017 06:55
Contributor

@archolewa archolewa left a comment

Overall, the ideas look reasonable. Mostly comments around trying to make the Groovy code a bit cleaner, more concise, and/or easier to read.

@Shared
String version1
Integer binversion1
Contributor

What is this?

Contributor

And why is it an Integer rather than an int?

Contributor Author

@QubitPi QubitPi Nov 20, 2017

@archolewa A binary version (binVersion) is used as a required argument to construct a DataSegment. That argument is an Integer.

@Shared
DataSegment segment1

long size2
Contributor

What are these the size of?

Contributor Author

@QubitPi QubitPi Nov 20, 2017

@archolewa Same reason as binVersion above


@Shared
List<String> dimensions123
def shutdownSpec() {
Contributor

This should be cleanupSpec. Also, if we're going to have a childSetupSpec, then we should have a childCleanupSpec as well, so that people can clean up whatever they set up.
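
For illustration, the paired hooks could look roughly like this in the base Spec (a sketch only; cleanupSpec() is the standard Spock fixture method, and the child hook names follow the suggestion above):

    // Spock runs cleanupSpec() once after all features; it delegates to the child hook.
    def cleanupSpec() {
        childCleanupSpec()
    }

    // No-op by default; a child Spec overrides this to tear down whatever its childSetupSpec() created.
    def childCleanupSpec() {}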

[
"interval1": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"),
"interval2": Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
"interval12": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-03T00:00:00.000Z")
Contributor

You don't need to wrap quotes around String keys in Groovy:

     interval1: Interval.parse(...),
     interval2: Interval.parse(...),
     ...

version2 = DateTimeFormat.fullDateTime().print(DateTime.now())
RangeSet<DateTime> generateRangeSet() {
rangeSet = TreeRangeSet.create()
rangeSet.add(Range.closedOpen(intervals["interval12"].getStart(), intervals["interval12"].getEnd()))
Contributor

Similarly, when accessing string keys in a map, you can just write map.key:

        rangeSet.add(Range.closedOpen(intervals.interval12.getStart(), intervals.interval12.getEnd()))

Though since you're pulling out interval12 twice, I'd probably pull it out into a variable:

       Interval interval12 = intervals.interval12

Contributor Author

@QubitPi QubitPi Nov 20, 2017

@archolewa Thank you very much for the tip! I've changed this, as well as the other places where intervals are accessed, in the same way.

[
new DataSegment(
tableName,
intervals["interval1"],
Contributor

Groovy has a syntactic shortcut for accessing string keys: intervals.interval1


def shutdownSpec() {
DateTimeZone.setDefault(currentTZ)
[
Contributor

Personally, I prefer to use an explicit return statement in functions/methods that are longer than one or two lines. Makes it clearer what you're doing with that last line.
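
For example, here is a self-contained sketch of that preference applied to a method like generateRangeSet() above (the interval parameter exists only to make the snippet standalone; in the Spec the interval comes from the shared intervals map):

    import com.google.common.collect.Range
    import com.google.common.collect.RangeSet
    import com.google.common.collect.TreeRangeSet
    import org.joda.time.DateTime
    import org.joda.time.Interval

    RangeSet<DateTime> generateRangeSet(Interval interval12) {
        RangeSet<DateTime> rangeSet = TreeRangeSet.create()
        rangeSet.add(Range.closedOpen(interval12.getStart(), interval12.getEnd()))
        // Explicit return; otherwise the method's last expression is rangeSet.add(...)'s boolean result.
        return rangeSet
    }

    assert !generateRangeSet(Interval.parse("2015-01-01T00:00:00.000Z/2015-01-03T00:00:00.000Z")).isEmpty()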

@@ -170,10 +145,10 @@ class DataSourceMetadataLoadTaskSpec extends Specification {
druidWS.jsonResponse = {fullDataSourceMetadataJson}

expectedIntervalsMap = [:]
dimensions123.each {
expectedIntervalsMap.put(new DimensionColumn(dimensionDict.findByApiName(it)), [interval123] as Set)
dimensions.each {
Contributor

Rather than building an empty map, and then using an each to populate it, I would initialize it from the beginning:

expectedIntervalsMap = dimensions123
              .collect { dimensionDict.findByApiName(it) }
              .collectEntries{[(it): [interval123] as Set]} 

And then use assignment syntax for the metrics:

    metrics123
            .collect { new MetricColumn(it)}
            .each { expectedIntervalsMap[it] = [intervals.interval123] as Set }

Contributor Author

@archolewa Thank you very much for teaching me this good practice! I'll make a note of this 😉

TestApiDimensionName dim1 = TestApiDimensionName.BREED
TestApiDimensionName dim2 = TestApiDimensionName.SPECIES
TestApiDimensionName dim3 = TestApiDimensionName.SEX
List<TestApiDimensionName> generateDimensions() {
Contributor

Rather than making this a list, I would make this a Map<String, TestApiDimensionName>. Then, in extending classes, you can use dimensions.dim3.asName() rather than dimensions[2].asName(). This allows people writing specs to give their dimensions meaningful names and then use them (though admittedly the names we typically use aren't all that more meaningful than dimensions[2]).

The same applies to metrics.
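
To make that concrete, a sketch of the map-based generator being suggested (the dim1/dim2/dim3 keys are placeholders; spec authors would pick meaningful names):

    Map<String, TestApiDimensionName> generateDimensions() {
        [
                dim1: TestApiDimensionName.BREED,
                dim2: TestApiDimensionName.SPECIES,
                dim3: TestApiDimensionName.SEX
        ]
    }

    // In an extending Spec:
    // dimensions.dim3.asName()   // instead of dimensions[2].asName()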

Contributor Author

@QubitPi QubitPi Nov 20, 2017

@archolewa Beautiful! 👍 This is exactly what I was considering before looking at this comment

And thank you for letting me know that we can index a map in this smart way in Groovy: dimensions.dim3.asName()

intervals = generateIntervals()
dimensions = generateDimensions()
metrics = generateMetrics()
versions = generateVersions()
Contributor

We keep doing this in child classes. Since this simply assigns fields that are in the parent class, why not move this into the parent class?

Contributor Author

@archolewa Would you like to move just versions = generateVersions(), or all of the other generate*() calls as well?

Contributor

I meant the entire method, sorry I wasn't clear.

Contributor Author

Well, if we move all of these into the parent and (in some cases) override generateIntervals() in a child Spec, for example, then another generate*() method might throw an error. For instance:

    @Override
    Map<String, Interval> generateIntervals() {
        [
                "interval1": Interval.parse(intervalString1),
                "interval3": Interval.parse(intervalString3),
                "interval13": Interval.parse(intervalString13),
                "interval23": Interval.parse(intervalString23)
        ]
    }

Then this default in the base Spec won't work, because intervals.interval2 no longer exists:

    Map<String, DataSegment> generateSegments() {
        NumberedShardSpec partition1 = Stub(NumberedShardSpec) {
            getPartitionNum() >> 0
        }

        NumberedShardSpec partition2 = Stub(NumberedShardSpec) {
            getPartitionNum() >> 1
        }

        return [
                segment1: new DataSegment(
                        tableName,
                        intervals.interval1,
                        versions.version1,
                        null,
                        dimensions.keySet().toList(),
                        metrics.keySet().toList(),
                        partition1,
                        binaryVersions.binaryVersion1,
                        sizes.size1
                ),
                segment2: new DataSegment(
                        tableName,
                        intervals.interval1,
                        versions.version2,
                        null,
                        dimensions.keySet().toList(),
                        metrics.keySet().toList(),
                        partition2,
                        binaryVersions.binaryVersion1,
                        sizes.size2
                ),
                segment3: new DataSegment(
                        tableName,
                        intervals.interval2,
                        versions.version1,
                        null,
                        dimensions.keySet().toList(),
                        metrics.keySet().toList(),
                        partition1,
                        binaryVersions.binaryVersion1,
                        sizes.size1
                ),
                segment4: new DataSegment(
                        tableName,
                        intervals.interval2,
                        versions.version2,
                        null,
                        dimensions.keySet().toList(),
                        metrics.keySet().toList(),
                        partition2,
                        binaryVersions.binaryVersion1,
                        sizes.size2
                )
        ]
    }

Contributor

@archolewa archolewa left a comment

I meant to request changes...

@QubitPi QubitPi removed the TESTING label Nov 20, 2017
Contributor Author

QubitPi commented Nov 20, 2017

@archolewa I applied the dictionary method to all shared variables in BaseDataSourceMetadataSpec, because I think your suggestion is such a good idea that it should be applied more broadly.

In addition, in the last commit I cleaned up all metadata tests a little (organized initialization into the setup() method, applied more idiomatic Groovy syntax, etc.).

}
def setup() {
dimensions13 = [
dimensions.(TestApiDimensionName.BREED.asName()).asName(),
Contributor

I didn't know you could do that (wrap TestApiDimensionName.BREED.asName() in parentheses to use the dot syntax rather than the [] syntax).

Contributor Author

@archolewa I did that just in case people change the BREED API name in the future, so they don't break this test 😌 Does that look good to you?

dimensions.(TestApiDimensionName.BREED.asName()).asName(),
dimensions.(TestApiDimensionName.SEX.asName()).asName()
]
quotedDimensions = dimensions13.collect() { '"' + it + '"'}
Contributor

@archolewa archolewa Nov 20, 2017

Any particular reason you can't do /"$it"/? (See the Groovy docs on slashy strings.)
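
That is, something along these lines (an equivalent rewrite of the line under discussion; slashy strings interpolate $it like GStrings but let the double quotes appear unescaped):

    quotedDimensions = dimensions13.collect { /"$it"/ }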

Contributor Author

I should do that, yes. Thanks for the doc link 😉

@QubitPi QubitPi force-pushed the better-programmatic-generation-of-metadata-json-in-tests branch from e90c411 to 9751ea9 on November 22, 2017 00:42
intervals.interval1,
versions.version1,
null,
dimensions.keySet().toList(),
Collaborator

Could use the more Groovy way: dimensions.keySet() as List. Same for all the following toList() calls.
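
A tiny standalone illustration of that coercion (the map contents here are arbitrary):

    Set<String> keys = [breed: 1, sex: 2].keySet()
    List<String> keyList = keys as List   // Groovy coercion; equivalent to keys.toList()
    assert keyList == ['breed', 'sex']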

Interval interval2 = Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z")
Interval interval3 = Interval.parse("2015-01-03T00:00:00.000Z/2015-01-04T00:00:00.000Z")
Interval interval123 = Interval.parse("2015-01-01T00:00:00.000Z/2015-01-04T00:00:00.000Z")
List<String> dimensions13
Collaborator

In general, one thing about your strategy that confuses me is that, in the original Spec, we have something like:

List<String> dimensions123 = [dim1, dim2, dim3]*.asName()		
List<String> dimensions13 = [dim1, dim3]*.asName()

Your strategy chooses to use generateDimensions() to populate the old dimensions123 as the shared variable dimensions, but dimensions13 is populated in setup(). It's a bit inconsistent to me that two similar variables from the old Spec get set in different ways.

I might be wrong, but here is my suggestion:

  1. Define all generateBlah() methods in BaseDataSourceMetadataSpec and, for each generateBlah(), populate the map with all the data that will be used across the child Specs; you can name the keys to distinguish which child Spec uses each entry:
    Map<String, Interval> generateIntervals() {
        return [
                loadTaskSpecInterval1: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"),
                loadTaskSpecInterval2: Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
                loadTaskSpecInterval3: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
                SegmentSpecInterval1: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-04T00:00:00.000Z"),
                ...
        ]
    }
  2. Don't override any generateBlah() in your child Spec; then, in your setup():
def setup() {
        Interval loadTaskSpecInterval1 = intervals.loadTaskSpecInterval1
        Interval loadTaskSpecInterval2 = intervals.loadTaskSpecInterval2
        ....

With these changes you can avoid calling those generateBlah() functions over and over again in your childSetupSpec(), as mentioned by @archolewa. You don't even need a childSetupSpec(), because the parent Spec does all the population for you. 😼

Contributor Author

@QubitPi QubitPi Nov 22, 2017

dimensions123 is gone. dimensions serves as a resource container for all dimensions that are to be used in a Spec

If we initialize all the resources in the way mentioned above, we haven't solved the original problem, because we would just be changing from piling up hardcoded variables to piling up hardcoded keys. In addition, the base Spec won't become pluggable and expressive. For example, suppose Andrew wrote a child spec with a resource called "expectedInterval": "2017/2018", and Tarrant writes a separate spec that also wants a resource called "expectedInterval", but with the value "2018/2019". Tarrant would then be forced to use either a different name or a different interval value, which makes him a little unhappy, and it gets worse when variable names affect the readability and maintainability of the child Specs. Moreover, if Tarrant compromises and reuses the same "expectedInterval", and Andrew then assigns expectedInterval a different value in his test logic, Tarrant will be angry to see his test break because someone outside of his test "polluted" his testing resources (this seriously undermines the decoupling of two separate classes). In the end, the base spec would again take a long time to load and initialize, and become hard to read and hard to maintain as more child specs are written, which is exactly the original problem we are trying to solve.

The reason for not generating all these resources up front, and instead having children extend and override, is that if you override generateIntervals(), for example, generateSegments() may throw errors in other Specs, since segments depend on the generated intervals. What we have in this new paradigm is a default resource package; if you want a customized package (intervals + dimensions + segments), you define your own clean world: your resources won't be seen by others, won't affect others, and others' customizations won't have any impact on you either.

Collaborator

Sounds reasonable

Contributor Author

QubitPi commented Nov 22, 2017

Thanks @archolewa !
Thanks @tarrantzhang for helping to review late at night 👍 !

@QubitPi QubitPi merged commit 888dd8c into master Nov 22, 2017
@QubitPi QubitPi deleted the better-programmatic-generation-of-metadata-json-in-tests branch November 22, 2017 07:24