-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: Move random data generation methods from CometCastSuite to new DataGenerator class #426
Conversation
@@ -907,28 +866,6 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { | |||
} | |||
} | |||
|
|||
// TODO Commented out to work around scalafix since this is currently unused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this commented out code since it is no longer needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes looks good to me except one minor comment.
However, the CI is failing, you may need to take a look at that.
@@ -35,6 +35,9 @@ import org.apache.comet.expressions.{CometCast, Compatible} | |||
class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { | |||
import testImplicits._ | |||
|
|||
/** Create a data generator using a fixed seed so that tests are reproducible */ | |||
private val gen = new DataGenerator(new Random(42)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to make this instance a field in the Companion object of RandomDataGenerator?
So that other test suites can all use the same random data generator by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand what you are suggesting, I think you are saying to add this:
object DataGenerator {
val DEFAULT = new DataGenerator(new Random(42))
}
And then reference that from the test suite?
private val gen = DataGenerator.DEFAULT
I think this could lead to some non-deterministic behavior. For example, running CometCastSuite
on its own from my IDE would start with the freshly created generator, but running the same suite after after test suites have run would result in different inputs.
I think it would be better for each test to create a new instance of the generator with a fixed seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could define it as a def
rather than a val
to force a new creation each time. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the new suggested approach, that’s better and should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @advancedxy. I have implemented your suggestion.
Yes, this change effectivey changed the seed and it has exposed new compatibility issues in existing tests. I am investigating. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #426 +/- ##
============================================
+ Coverage 34.02% 34.18% +0.16%
+ Complexity 857 850 -7
============================================
Files 116 116
Lines 38565 38552 -13
Branches 8517 8523 +6
============================================
+ Hits 13120 13178 +58
+ Misses 22691 22609 -82
- Partials 2754 2765 +11 ☔ View full report in Codecov by Sentry. |
@@ -108,7 +108,7 @@ object CometCast { | |||
Compatible() | |||
case DataTypes.ByteType | DataTypes.ShortType | DataTypes.IntegerType | | |||
DataTypes.LongType => | |||
Compatible() | |||
Incompatible(Some("Not all invalid inputs are detected")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #431 to fix this
private val gen = DataGenerator.DEFAULT | ||
|
||
/** Number of random data items to generate in each test */ | ||
private val dataSize = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the size is increased 10x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. At 1000 I saw string -> short fail, but not the other string -> integer tests. When I increased this to 10000 then all of the string -> integer tests failed, so seems like we get better coverage at this level
Double.NaN, | ||
Double.PositiveInfinity, | ||
Double.NegativeInfinity, | ||
0.0d) ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cover
1.0d,
-1.0d,
Int.MinValue.toDouble,
Int.MaxValue.toDouble,
0.0d,
-0.0d
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. But I think it could be addressed in a follow-up pr.
Double.NaN, | ||
Double.PositiveInfinity, | ||
Double.NegativeInfinity, | ||
0.0d) ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. But I think it could be addressed in a follow-up pr.
Merged. Thanks @andygrove |
… DataGenerator class (apache#426) * add new DataGenerator class * move more data gen methods * ignore some failing tests, update compat docs * address feedback * fix regression (cherry picked from commit fcf7d5b)
Which issue does this PR close?
N/A
Rationale for this change
Start moving random data generation out of
CometCastSuite
and into a class that can be used from other test suites.What changes are included in this PR?
How are these changes tested?
Existing tests