-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better programmatic generation of metadata json in tests #412
Better programmatic generation of metadata json in tests #412
Conversation
a094a2e
to
d1969ef
Compare
1825d9c
to
d722376
Compare
d722376
to
607c8b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the ideas look reasonable. Mostly comments around trying to make the Groovy code a bit cleaner, more concise, and/or easier to read.
@Shared | ||
String version1 | ||
Integer binversion1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why is it an Integer
rather than an int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa A binary version(binVersion
) is used as a required argument to construct a DataSegment
. That argument is an Integer
@Shared | ||
DataSegment segment1 | ||
|
||
long size2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these the size of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa Same reason as binVersion
above
|
||
@Shared | ||
List<String> dimensions123 | ||
def shutdownSpec() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be cleanupSpec
. Also if we're going to have a childSetupSpec
then we should have a childCleanupSpec
as well, so that people can cleanup whatever they setup.
[ | ||
"interval1": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"), | ||
"interval2": Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z"), | ||
"interval12": Interval.parse("2015-01-01T00:00:00.000Z/2015-01-03T00:00:00.000Z") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to wrap quotes around String keys in Groovy:
interval1: Interval.parse(...),
interval2: Interval.parse(...),
...
version2 = DateTimeFormat.fullDateTime().print(DateTime.now()) | ||
RangeSet<DateTime> generateRangeSet() { | ||
rangeSet = TreeRangeSet.create() | ||
rangeSet.add(Range.closedOpen(intervals["interval12"].getStart(), intervals["interval12"].getEnd())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, when accessing string keys in a map, you can just write map.key
:
rangeSet.add(Range.closedOpen(intervals.interval12.getStart(), intervals.intervals12.getEnd
Though since you're pulling out interval12 twice, I'd probably pull it out into a variable:
Interval interval12 = intervals.interval12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa Thank you very much for the tip! I've changed this as well as how intervals are accessed in the same way.
[ | ||
new DataSegment( | ||
tableName, | ||
intervals["interval1"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Groovy has a syntactic shortcut for accessing string keys: intervals.interval1
|
||
def shutdownSpec() { | ||
DateTimeZone.setDefault(currentTZ) | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I prefer to use an explicit return
statement in functions/methods that are longer than one or two lines. Makes it clearer what you're doing with that last line.
@@ -170,10 +145,10 @@ class DataSourceMetadataLoadTaskSpec extends Specification { | |||
druidWS.jsonResponse = {fullDataSourceMetadataJson} | |||
|
|||
expectedIntervalsMap = [:] | |||
dimensions123.each { | |||
expectedIntervalsMap.put(new DimensionColumn(dimensionDict.findByApiName(it)), [interval123] as Set) | |||
dimensions.each { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than building an empty map, and then using an each
to populate it, I would initialize it from the beginning:
expectedIntervalsMap = dimensions123
.collect { dimensionDict.findByApiName(it) }
.collectEntries{[(it): [interval123] as Set]}
And then use assignment syntax for the metrics:
metrics123
.collect { new MetricColumn(it)}
.each { expectedIntervalsMap[it] = [intervals.interval123] as Set }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa Thank you very much for teaching me this good practice! I'll take a note on this 😉
TestApiDimensionName dim1 = TestApiDimensionName.BREED | ||
TestApiDimensionName dim2 = TestApiDimensionName.SPECIES | ||
TestApiDimensionName dim3 = TestApiDimensionName.SEX | ||
List<TestApiDimensionName> generateDimensions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than making this a list
, I would make this a Map<String, TestApiDimensionName>
. Then, in extending classes, you can use dimensions.dim3.asName()
rather than dimensions[2].asName()
. This allows people writing specs to give their dimensions meaningful names and then use them (though admittedly the names we typically use aren't all that more meaningful than dimensions[2]
).
The same applies to metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa Beautiful! 👍 This is exactly what I was considering before looking at this comment
And thank you for letting me know that we can index a map in this smart way in Groovy: dimensions.dim3.asName()
intervals = generateIntervals() | ||
dimensions = generateDimensions() | ||
metrics = generateMetrics() | ||
versions = generateVersions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We keep doing this in children classes. Since this simply assigns fields that are in the parent class, why not move this into the parent class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa Would you like to move versions = generateVersions()
or all other generate*()
's ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the entire method, sorry I wasn't clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, If we move all of these and (in some cases) we override generateIntervals()
in child Spec, for example, then other generate
method might throw an error. For instance
@Override
Map<String, Interval> generateIntervals() {
[
"interval1": Interval.parse(intervalString1),
"interval3": Interval.parse(intervalString3),
"interval13": Interval.parse(intervalString13),
"interval23": Interval.parse(intervalString23)
]
}
Then this default in base Spec won't work, because intervals.interval2
does not exist
Map<String, DataSegment> generateSegments() {
NumberedShardSpec partition1 = Stub(NumberedShardSpec) {
getPartitionNum() >> 0
}
NumberedShardSpec partition2 = Stub(NumberedShardSpec) {
getPartitionNum() >> 1
}
return [
segment1: new DataSegment(
tableName,
intervals.interval1,
versions.version1,
null,
dimensions.keySet().toList(),
metrics.keySet().toList(),
partition1,
binaryVersions.binaryVersion1,
sizes.size1
),
segment2: new DataSegment(
tableName,
intervals.interval1,
versions.version2,
null,
dimensions.keySet().toList(),
metrics.keySet().toList(),
partition2,
binaryVersions.binaryVersion1,
sizes.size2
),
segment3: new DataSegment(
tableName,
intervals.interval2,
versions.version1,
null,
dimensions.keySet().toList(),
metrics.keySet().toList(),
partition1,
binaryVersions.binaryVersion1,
sizes.size1
),
segment4: new DataSegment(
tableName,
intervals.interval2,
versions.version2,
null,
dimensions.keySet().toList(),
metrics.keySet().toList(),
partition2,
binaryVersions.binaryVersion1,
sizes.size2
)
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to request changes...
@archolewa I applied the dictionary method to all shared variables in In addition, in the last commit, I cleaned up all meta data tests a little bit(organizing init stuff to |
} | ||
def setup() { | ||
dimensions13 = [ | ||
dimensions.(TestApiDimensionName.BREED.asName()).asName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know you could do that (wrap TestApiDimensionName.BREED.asName()
in parenthesis to use the dot syntax, rather than the []
syntax).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archolewa I did that just incase if people change BREED
API name in the future, they don't break this test 😌 Does that look good to you?
dimensions.(TestApiDimensionName.BREED.asName()).asName(), | ||
dimensions.(TestApiDimensionName.SEX.asName()).asName() | ||
] | ||
quotedDimensions = dimensions13.collect() { '"' + it + '"'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason you can't do /"$it"/
? Slashy strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should do that, yes. Thanks for the doc link 😉
e90c411
to
9751ea9
Compare
intervals.interval1, | ||
versions.version1, | ||
null, | ||
dimensions.keySet().toList(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use the more groovy way: dimensions.keySet() as List
. Same for all the following toList()
Interval interval2 = Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z") | ||
Interval interval3 = Interval.parse("2015-01-03T00:00:00.000Z/2015-01-04T00:00:00.000Z") | ||
Interval interval123 = Interval.parse("2015-01-01T00:00:00.000Z/2015-01-04T00:00:00.000Z") | ||
List<String> dimensions13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, one thing your strategy confuse me is that in the original Spec, we have something like:
List<String> dimensions123 = [dim1, dim2, dim3]*.asName()
List<String> dimensions13 = [dim1, dim3]*.asName()
Your strategy choose to use generateDimensions()
to populate your new dimensions123
as the shared variable dimensions
, but for dimensions13
it's populate in the setup()
. It's a bit inconsistent for me that two similar variables in the old Spec get to be set in different ways.
I might be wrong, but here is my suggestion:
- Define all
generateBlah()
inBaseDataSourceMetadataSpec
and for eachgenerateBlah()
, populate the map with all possible data you will be using in all child Spec, you can rename the keys to distinguish between which child Spec will be using it:
Map<String, Interval> generateIntervals() {
return [
loadTaskSpecInterval1: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-02T00:00:00.000Z"),
loadTaskSpecInterval2: Interval.parse("2015-01-02T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
loadTaskSpecInterval3: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-03T00:00:00.000Z"),
SegmentSpecInterval1: Interval.parse("2015-01-01T00:00:00.000Z/2015-01-04T00:00:00.000Z"),
...
]
}
- Don't override any
generateBlah()
in your child Spec, then in yoursetup()
:
def setup() {
Interval loadTaskSpecInterval1 = intervals.loadTaskSpecInterval1
Interval loadTaskSpecInterval2 = intervals.loadTaskSpecInterval2
....
With these change you can avoid calling those generateBlah()
functions over and over again in your childSetupSepc()
as mentioned by @archolewa . You don't even need a childSetupSepc()
because the parent Spec does all the population for you. 😼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dimensions123
is gone. dimensions
serves as a resource container for all dimensions that are to be used in a Spec
If we initialize all the resources in the way mentioned above, we didn't solve the original problem because in that way we are changing from piling up hardcoded variables to piling up hardcoded strings. In addition, base spec won't become pluggable and expressive. For example, Andrew wrote a child spec which had a resource called "expectedInterval": "2017/2018", but Tarrant also writes a separate spec but also wants to have a resource called "expectedInterval": "2018/2019"(with different value). In that case, Tarrant will be forced to use a different name or different interval value, which makes him a little unhappy. And things becomes worse if variable name affects readability and maintainability of child Spec. Moreover, if Tarrant makes a compromise and use the same "expectedInterval" and Andrew assign expectedInterval
to a different value in his test logic, Tarrant will be angry to see his test is not working because someone outside of his test "polluted" his testing resources(This seriously affects the decoupling behavior of two separate classes). In the end, base spec will again take long time to be loaded and initialized, hard to read, and hard to maintain, as more child specs are written, which is exactly what the original problem we are trying to solve.
The reason for not generating these resources but having child extend and override is that, for example, if you override generateInterval
, generateSegment
throws error in other Specs, since segments depends on generated intervals. What we have in this new paradigm is that we offer default resource package, but if you want a customized package(intervals + dimensions + segments), go define your clean and own world and your things won't be seen by others, won't affect others, and other's customization won't have any impact on you neither
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable
Thanks @archolewa ! |
Address issue #392
The strategy is to have all metadata specs under
com.yahoo.bard.webservice.metadata
to extendBaseDataSourceMetadataSpec
, which is a base specification of all metadata tests. All metadata Specs overridechildSetupSpec()
, and call/overridegenerate*()
methods to initiate metadata related resources.Take "intervals" as an example:
childSetupSpec()
and call its generation method in your Spec as follows