-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flink: refactor sink tests to reduce the number of combinations with parameterized tests #10777
Conversation
998bee4
to
85168ca
Compare
@@ -118,26 +118,4 @@ protected String getFullQualifiedTableName(String tableName) { | |||
static String getURI(HiveConf conf) { | |||
return conf.get(HiveConf.ConfVars.METASTOREURIS.varname); | |||
} | |||
|
|||
static String toWithClause(Map<String, String> props) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is moved to SqlBase
@@ -41,7 +40,7 @@ | |||
import org.junit.jupiter.api.extension.RegisterExtension; | |||
import org.junit.jupiter.api.io.TempDir; | |||
|
|||
public abstract class TestBase extends TestBaseUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestBaseUtils
seems like a pure util class with static methods only
@@ -169,39 +154,6 @@ public void testOverwriteTable() throws Exception { | |||
icebergTable, Lists.newArrayList(SimpleDataUtil.createRecord(2, "b"))); | |||
} | |||
|
|||
@TestTemplate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved these tests to the new TestFlinkTableSinkExtended
which only tests on the dimensions of isStreamingJob
boolean value. Currently every test run with 20+ combinations of parameters. Don't want to add range distribution test here to further slow down the test run time.
} | ||
|
||
@TestTemplate | ||
public void testJobNoneDistributeMode() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved all tests related to distribution mode to TestFlinkIcebergSinkDistributionMode
. we don't need to test different file formats (Avro, Orc, Parquet) for distribution mode.
85168ca
to
bf7aaa9
Compare
@rodmeneses: Could you please take a look. I think it will greatly affect your testing for Sink V2 |
import org.junit.jupiter.api.io.TempDir; | ||
|
||
/** | ||
* This tests the more extended features of Flink sink. Extract them separately since it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: These
instead of This
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This
is actually correct. but I see why it is confusing. let me change it to This class tests ...
*/ | ||
@ExtendWith(ParameterizedTestExtension.class) | ||
public class TestFlinkTableSinkExtended extends SqlBase { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will remove. but one empty line at the beginning is also ok with Iceberg coding style
|
||
@TempDir protected File warehouseRoot; | ||
|
||
protected final String flinkDatabase = CATALOG + "." + DATABASE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these ones could be also static
. Also make them uppercase when changing to static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
protected TableEnvironment getTableEnv() { | ||
if (tEnv == null) { | ||
synchronized (this) { | ||
EnvironmentSettings.Builder settingsBuilder = EnvironmentSettings.newInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically the implementation of the singleton pattern ask for another if (tEnv == null)
after synchronized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably copied it somewhere. we don't actually need double checked locking. I will just make the method synchronized to simplify the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor comments, otherwise lgtm
…parameterized tests. E.g. it is not really necessary to test different file format for distribution mode (hash, range). I found this necessary when trying to add new tests for the new range distribution feature. We often run 20+ combinations for every test/method in some classes.
bf7aaa9
to
b7f39a2
Compare
…ergSinkExtended without combinational parameters
* @param catalogName The catalog to drop | ||
* @param ifExists If we should use the 'IF EXISTS' when dropping the catalog | ||
*/ | ||
protected void dropCatalog(String catalogName, boolean ifExists) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do a throughout cleanup instead of dropping random catalogs?
Like:
- Dropping every catalog other than the dafault
- Dropping every database other than the default
- Dropping every table other than the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now, inherited classes clean up created tables and databases explicitly, e.g. TableSourceTestBase
. We can implement what you suggested. but it will be a couple more SQL executions (1) list databases (2) for each database (usually one), list tables.
@AfterEach
public void clean() {
sql("DROP TABLE IF EXISTS %s.%s", DATABASE_NAME, TABLE_NAME);
dropDatabase(DATABASE_NAME, true);
dropCatalog(CATALOG_NAME, true);
}
I am leaning a little more toward the existing approach. Or if we want to change it, we can do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to reduce the complexity of writing tests.
If we create a good cleanup method we can use everywhere then we just call that in an AfterEach
method and forget about it altogether. Otherwise we need to remember which database/table was created during the tests. Also, if you different test methods creating different tables, then you need to add logic to the cleanup to differentiate those.
I feel more inclined to the generic cleanup
method, even if it adds a few more calls to the test runs.
What do you think @rodmeneses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if it adds a few more calls to the test runs.
note that this is for each test method. with parameterized tests, some classes have dozens of tests. this can add up quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have numbers?
Higher test cases means higher probability of a mistake to forget to delete a db or a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to redo the catalog clean up pattern across the board, I think it is better to do it in a separate PR if we can get consensus on the new approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
thanks @rodmeneses and @pvary for the review |
…toring. 1.20 porting is not needed as the 1.20 code was based on the hash that already included PR apache#10777.
…parameterized tests (apache#10777)
E.g. it is not really necessary to test different file format for distribution mode (hash, range). I found this when trying to add new tests for the new range distribution feature. We often run 20+ combinations for every test/method in some classes.