Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Remove corr in group-by.sql and separate supported SQLQueryTest list for backends #3774

Merged
merged 9 commits into from
Nov 24, 2023

Conversation

marin-ma
Copy link
Contributor

@marin-ma marin-ma commented Nov 20, 2023

Velox corr has better computation logic but it fails Spark's precision check. Based on the discussion in facebookincubator/velox#7204, we decide to use original Velox corr and mute the precision check failure in Gluten UT.

This PR contains following changes:

  1. Moved supported SQL tests into Backends API TestApi.
  2. Copied Spark's inputs/group-by.sql, input/udf/udf-group-by.sql and results/group-by.sql.out, results/udf/udf-group-by.sql.out and removed the query contains "corr".
  3. Fixed SQL tests file name matching, using equals instead of contains.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

3 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@marin-ma
Copy link
Contributor Author

@PHILO-HE @rui-mo Could you help to review? Thanks!

@@ -38,4 +38,6 @@ trait Backend {
def broadcastApi(): BroadcastApi

def settings(): BackendSettingsApi

def testApi(): TestApi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @marin-ma, as this interface is just for test use, can we move it to some place in test module? Maybe, into BackendTestSettings? Thanks!

ignoreList.exists(
t => testCase.name.toLowerCase(Locale.ROOT).contains(t.toLowerCase(Locale.ROOT)))
t => testCase.name.toLowerCase(Locale.ROOT).equals(t.toLowerCase(Locale.ROOT)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, I suppose we wil be missing some sql files under subdirectory. Can we specially handle group-by.sql only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using contains can lead to some unexpected files being added. For example, group-by.sql was in the supported list but udf/udf-group-by.sql wasn't. But the latter one was also tested. If we are sure that one file is supported, it should be explicitly added to the supported list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's just what I mean. With this change, we may be missing some tests for the files under subdirectory. There are some other sql files under subdirectory but share the same name with those in root directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep using contains and exclude udf/udf-group-by.sql? If there are more test cases in udf/udf-group-by.sql, maybe we can merge them into the group-by.sql added by ourselves.
This can make sure other tests are not missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended that we use contains to make udf/udf-group-by.sql a test? I think it would be better to explicitly add it to the supported list. Moreover, the current supported list inconsistently mixes some files specified with subdirectories and others with only the filename. It would be confusing for others who are trying to add new files to the list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to use contains derives from Spark SQLQueryTestSuite. It makes sense to me if we need to change it in Gluten test.

}
}

private val SUPPORTED_SQL_QUERY_LIST_SPARK32: Set[String] =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Do you mean some of the filenames listed here might not specifying the subdirectory? If so, we should add the missing subdirectories.

Copy link

Run Gluten Clickhouse CI

@marin-ma
Copy link
Contributor Author

marin-ma commented Nov 22, 2023

@rui-mo I would suggest that I perform a final check to ensure that all files belonging to any subdirectories are specified correctly, and to make sure no files are missing. And we use equals instead of contains to eliminate any uncertainty. What do you think?

@rui-mo
Copy link
Contributor

rui-mo commented Nov 22, 2023

@marin-ma I think it makes sense as long as we don't miss any test.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@marin-ma
Copy link
Contributor Author

@rui-mo @PHILO-HE Could you help to review again? Thanks!

val overwriteTestCaseNames = overwriteTestCases.map(_.name)
listFilesRecursively(new File(inputFilePath))
.flatMap(createTestCase(_, inputFilePath, goldenFilePath))
.filterNot(testCase => overwriteTestCaseNames.contains(testCase.name)) ++ overwriteTestCases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the thoughts here to use contains instead of equals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overwriteTestCaseNames is Seq[String], here it checks whether the testCase.name is in the list.

Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I believe commit oap-project/velox@64b00b0 could be removed after this PR. cc @JkSelf

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@marin-ma
Copy link
Contributor Author

Thanks. I believe commit oap-project/velox@64b00b0 could be removed after this PR. cc @JkSelf

Yes. The test PR to remove that commit also passed CI. #3783

@marin-ma marin-ma merged commit 41b8850 into apache:main Nov 24, 2023
16 checks passed
@marin-ma marin-ma changed the title [VL] Remove corr in group-by.sql [VL] Remove corr in group-by.sql and separate supported SQLQueryTest list for backends Nov 24, 2023
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3774_time.csv log/native_master_11_23_2023_df3eb372c_time.csv difference percentage
q1 33.10 34.09 0.987 102.98%
q2 22.72 24.43 1.712 107.54%
q3 37.22 37.11 -0.107 99.71%
q4 36.43 35.49 -0.942 97.41%
q5 70.57 70.74 0.174 100.25%
q6 7.16 6.77 -0.397 94.46%
q7 83.97 85.18 1.214 101.45%
q8 85.69 84.73 -0.956 98.88%
q9 124.85 118.73 -6.114 95.10%
q10 45.90 43.59 -2.306 94.97%
q11 20.13 19.61 -0.524 97.40%
q12 23.72 23.77 0.047 100.20%
q13 45.47 45.78 0.308 100.68%
q14 17.57 17.52 -0.055 99.69%
q15 28.44 28.14 -0.308 98.92%
q16 14.73 15.14 0.413 102.80%
q17 100.86 100.38 -0.474 99.53%
q18 147.12 147.08 -0.048 99.97%
q19 14.48 12.95 -1.532 89.43%
q20 27.09 28.07 0.985 103.64%
q21 219.98 218.99 -0.987 99.55%
q22 13.31 12.93 -0.374 97.19%
total 1220.51 1211.23 -9.284 99.24%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants