68 commits
612396c  Initial commit (andylam-db, Nov 29, 2023)
83e6a51  Add CrossDbmsQueryTestSuite (andylam-db, Nov 29, 2023)
c691f83  Small changes for niceness (andylam-db, Nov 29, 2023)
24a02f0  Save point (andylam-db, Nov 29, 2023)
315847a  Add sqllogictest-select1.sql in postgres-crosstest input file (andylam-db, Nov 29, 2023)
5e087e1  Add header for class suite (andylam-db, Nov 29, 2023)
33b52b2  Small changes (andylam-db, Nov 29, 2023)
81202c4  Trying to get the input files set up.. (andylam-db, Nov 29, 2023)
c9fba5b  Passing test!! (andylam-db, Nov 29, 2023)
8b5ed22  Ignore all earlier tests first (andylam-db, Nov 29, 2023)
fea3975  Small comment changes (andylam-db, Nov 30, 2023)
678a0ef  More comments (andylam-db, Nov 30, 2023)
309a7e2  Address simple comments (andylam-db, Nov 30, 2023)
5c52b9c  A little more small changes.. (andylam-db, Nov 30, 2023)
ad867df  More small changes and removal of redundant code (andylam-db, Nov 30, 2023)
bef8011  Generate golden files for SQLQueryTestSuite (andylam-db, Nov 30, 2023)
a6e7e2a  Revert "Generate golden files for SQLQueryTestSuite" (andylam-db, Nov 30, 2023)
1409d63  Ignore "postgres-crosstest" in SQLQueryTestSuite (andylam-db, Nov 30, 2023)
a552d86  Add comment to clarify why it is ignored (andylam-db, Nov 30, 2023)
7a9cb80  Fix compilation failures (andylam-db, Nov 30, 2023)
66f41a5  Regenerate golden files for sqllogictest-select1.sql.out (andylam-db, Nov 30, 2023)
69614d3  Tiny changes.. (andylam-db, Nov 30, 2023)
2bf43df  Add postgresql back to ignorelist (andylam-db, Nov 30, 2023)
aba83b8  Add exception handling in CrossDbmsQueryTestSuite (andylam-db, Nov 30, 2023)
bf4036d  Do refactoring so that we can add an additional input argument for Cr… (andylam-db, Dec 1, 2023)
08e2cfb  Generate with postgres (andylam-db, Dec 1, 2023)
2ec381c  Fix compilation error with ThriftServerQueryTestSuite (andylam-db, Dec 1, 2023)
9c2d283  Generate golden files with SQLQueryTestSuite (andylam-db, Dec 1, 2023)
ab4ab12  Fixed a bug where tests weren't running against the golden file (andylam-db, Dec 4, 2023)
0166aed  Use local spark session (andylam-db, Dec 4, 2023)
2e45a67  Small comment change (andylam-db, Dec 4, 2023)
496cd82  Merge master (andylam-db, Dec 4, 2023)
dd4597e  Revert change in ThriftServerQueryTestSuite (andylam-db, Dec 4, 2023)
cb71979  Remove DialectConverter.. (andylam-db, Dec 4, 2023)
6cd52f1  Don't run the tests if the cross dbms is not specified (andylam-db, Dec 5, 2023)
7f39707  Remove sqllogictest (andylam-db, Dec 18, 2023)
615f4d9  Small niceness changes (andylam-db, Dec 18, 2023)
cf345bd  Do null -> NULL replacement, and put 2 tests in (andylam-db, Dec 18, 2023)
cdef387  Add docker compose and bash scripts for easy postgres instance (andylam-db, Dec 18, 2023)
36be209  Trivial changes (andylam-db, Dec 18, 2023)
d037331  Trivial changes (andylam-db, Dec 18, 2023)
750a1de  Fix typo (andylam-db, Dec 19, 2023)
089c292  Add header comments, and make CrossDbmsQuerySuite an abstract class (andylam-db, Dec 19, 2023)
62d1760  Trivial changes (andylam-db, Dec 19, 2023)
def270b  Use prepared statements (andylam-db, Dec 19, 2023)
fdb90a4  Remove sql file changes for now (andylam-db, Dec 19, 2023)
8c21ceb  Header comment improvements (andylam-db, Dec 19, 2023)
50c2dfb  Trivial changes (andylam-db, Dec 19, 2023)
939b8f2  Add custom postgres command (andylam-db, Dec 19, 2023)
db01061  Modify exists-having to be compatible with psql (andylam-db, Dec 19, 2023)
d8c7bc9  Merge branch 'master' into crossdbms (andylam-db, Dec 19, 2023)
7200f7d  Modify query slightly (andylam-db, Dec 19, 2023)
1d79345  Add readme and add ONLY_IF for most sql files in subquery dir (andylam-db, Dec 20, 2023)
83cd8d4  Rewrite exists-having (andylam-db, Dec 20, 2023)
f491d1f  Update comment (andylam-db, Dec 20, 2023)
cc039ae  Merge master (andylam-db, Dec 21, 2023)
ce0e106  Ignore subquery-offset.sql (andylam-db, Dec 21, 2023)
1d7c620  Create new file PostgreSQLQueryTestSuite.scala and fix indent (andylam-db, Dec 22, 2023)
0e542cf  Move from postgres-results -> crossdbms-results (andylam-db, Jan 2, 2024)
5d4c577  Fix silly paste (andylam-db, Jan 3, 2024)
68ea6f2  Fix unintended removal of results file (andylam-db, Jan 3, 2024)
1e1a283  In middle of refactoring.. (andylam-db, Jan 3, 2024)
5756c8e  Major refactoring in progress.. (andylam-db, Jan 3, 2024)
81a61f7  Passing tests (andylam-db, Jan 3, 2024)
cf7b044  Add documnetation and delete previous files (andylam-db, Jan 3, 2024)
a32a693  Move functions from PostgreSQLQueryTestSuite -> CrossDbmsQueryTestSuite (andylam-db, Jan 3, 2024)
e08f7a4  Merge master.. (andylam-db, Jan 3, 2024)
18f821f  Merge branch 'master' into crossdbms (andylam-db, Jan 4, 2024)
CrossDbmsQueryTestSuite.scala
@@ -0,0 +1,183 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.sql.jdbc

import java.io.File
import java.sql.ResultSet

import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLQueryTestHelper
import org.apache.spark.sql.catalyst.util.fileToString

/**
 * This suite builds on SQLQueryTestSuite and allows us to run other DBMSs against the SQL test
 * golden files (which SQLQueryTestSuite generates and tests against) to cross-check Spark's
 * results for correctness. Note that it is not currently run on all SQL input files by default,
 * because the SQL dialects of Spark and the other DBMSs are not fully compatible.
 *
 * This suite adds a new comment argument, --ONLY_IF, which indicates the DBMSs for which a SQL
 * file is eligible. The recognized system names are defined in the companion object. For example,
 * if you have a SQL file named `describe.sql` and want to indicate that Postgres is incompatible,
 * add the following comment to the input file:
 * --ONLY_IF spark
 */
trait CrossDbmsQueryTestSuite extends DockerJDBCIntegrationSuite with SQLQueryTestHelper {

  val DATABASE_NAME: String

  protected val baseResourcePath = {
    // We use a path based on Spark home for 2 reasons:
    //   1. Maven can't get correct resource directory when resources in other jars.
    //   2. We test subclasses in the hive-thriftserver module.
    getWorkspaceFilePath("sql", "core", "src", "test", "resources", "sql-tests").toFile
  }
  protected val inputFilePath = new File(baseResourcePath, "inputs").getAbsolutePath
  protected val customInputFilePath: String
  protected val goldenFilePath = new File(baseResourcePath, "results").getAbsolutePath

  protected def listTestCases: Seq[TestCase] = {
    listFilesRecursively(new File(customInputFilePath)).flatMap { file =>
      val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
      val absPath = file.getAbsolutePath
      val testCaseName = absPath.stripPrefix(customInputFilePath).stripPrefix(File.separator)
      RegularTestCase(testCaseName, absPath, resultFile) :: Nil
    }.sortBy(_.name)
  }

  def createScalaTestCase(testCase: TestCase): Unit = {
    testCase match {
      case _: RegularTestCase =>
        // Create a test case to run this case.
        test(testCase.name) {
          runSqlTestCase(testCase, listTestCases)
        }
      case _ =>
        ignore(s"Ignoring test cases that are not [[RegularTestCase]] for now") {
          log.debug(s"${testCase.name} is not a RegularTestCase and is ignored.")
        }
    }
  }

  protected def runSqlTestCase(testCase: TestCase, listTestCases: Seq[TestCase]): Unit = {
    val input = fileToString(new File(testCase.inputFile))
    val (comments, code) = splitCommentsAndCodes(input)
    val queries = getQueries(code, comments, listTestCases)

    val dbmsConfig = comments.filter(_.startsWith(CrossDbmsQueryTestSuite.ONLY_IF_ARG))
      .map(_.substring(CrossDbmsQueryTestSuite.ONLY_IF_ARG.length))
    // If `--ONLY_IF` is found, check if the DBMS being used is allowed.
    if (dbmsConfig.nonEmpty && !dbmsConfig.contains(DATABASE_NAME)) {
      log.info(s"This test case (${testCase.name}) is ignored because it indicates that it is " +
        s"not eligible with $DATABASE_NAME.")
    } else {
      runQueriesAndCheckAgainstGoldenFile(queries, testCase)
    }
  }

  protected def runQueriesAndCheckAgainstGoldenFile(
      queries: Seq[String], testCase: TestCase): Unit = {
    // The local Spark session is needed because we use Spark analyzed plan to check if the query
    // result is already semantically sorted, below.
    val localSparkSession = spark.newSession()
    val conn = getConnection()
    val stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)

    val outputs: Seq[QueryTestOutput] = queries.map { sql =>
      val output = {
        try {
          val sparkDf = localSparkSession.sql(sql)
          val isResultSet = stmt.execute(sql)
          val rows = ArrayBuffer[Row]()
          if (isResultSet) {
            val rs = stmt.getResultSet
            val metadata = rs.getMetaData
            while (rs.next()) {
              val row = Row.fromSeq((1 to metadata.getColumnCount).map { i =>
                val value = rs.getObject(i)
                if (value == null) {
                  "NULL"
                } else {
                  value
                }
              })
              rows.append(row)
            }
          }
          val output = rows.map(_.mkString("\t")).toSeq
          if (isSemanticallySorted(sparkDf.queryExecution.analyzed)) {
            output
          } else {
            // Sort the answer manually if it isn't sorted.
            output.sorted
          }
        } catch {
          case NonFatal(e) => Seq(e.getClass.getName, e.getMessage)
        }
      }

      ExecutionOutput(
        sql = sql,
        // Don't care about the schema for this test. Only care about correctness.
        schema = None,
        output = output.mkString("\n"))
    }
    conn.close()
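
    // For reference, a golden file stores three segments per query, each introduced by a
    // "-- !query" marker line (the SQLQueryTestSuite format). An illustrative, hypothetical entry:
    //   -- !query
    //   SELECT 1
    //   -- !query schema
    //   struct<1:int>
    //   -- !query output
    //   1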

    // Read back the golden files.
    var curSegment = 0
    val expectedOutputs: Seq[QueryTestOutput] = {
      val goldenOutput = fileToString(new File(testCase.resultFile))
      val segments = goldenOutput.split("-- !query.*\n")
      outputs.map { output =>
        val result = ExecutionOutput(
          segments(curSegment + 1).trim, // SQL
          None, // Schema
          normalizeTestResults(segments(curSegment + 3))) // Output
        // Assume that the golden file always has all 3 segments.
        curSegment += 3
        result
      }
    }

    // Compare results.
    assertResult(expectedOutputs.size, s"Number of queries should be ${expectedOutputs.size}") {
      outputs.size
    }

    outputs.zip(expectedOutputs).zipWithIndex.foreach { case ((output, expected), i) =>
      assertResult(expected.sql, s"SQL query did not match for query #$i\n${expected.sql}") {
        output.sql
      }
      assertResult(expected.output, s"Result did not match for query #$i\n${expected.sql}") {
        output.output
      }
    }
  }
}

object CrossDbmsQueryTestSuite {

  final val POSTGRES = "postgres"
  // Argument in input files to indicate that the sql file is restricted to certain systems.
  final val ONLY_IF_ARG = "--ONLY_IF "
}
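
For illustration, here is how the directive described in the scaladoc looks in practice. A hypothetical input file that should run only under Spark's own SQLQueryTestSuite (the file name and query are made up; the --ONLY_IF line follows the convention above) might begin:

  -- Hypothetical input file: sql-tests/inputs/subquery/my-spark-only-test.sql
  --ONLY_IF spark
  SELECT array(1, 2, 3);

Because DATABASE_NAME is "postgres" for the Postgres suite below, runSqlTestCase logs and skips such a file instead of executing it.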
PostgreSQLQueryTestSuite.scala
@@ -0,0 +1,73 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.sql.jdbc

import java.io.File
import java.sql.Connection

import org.apache.spark.tags.DockerTest

/**
 * READ THIS IF YOU ADDED A NEW SQL TEST AND THIS SUITE IS FAILING:
 * Your new SQL test is automatically opted into this suite. It is likely failing because it is
 * not compatible with the default Postgres dialect. You have two options:
 * 1. (Recommended) Modify your queries to be compatible with both systems. This is recommended
 *    because your queries will then also run against Postgres, giving higher confidence in their
 *    correctness, and you won't have to manually verify the golden files generated by your test.
 * 2. Add this line to your .sql file: --ONLY_IF spark
 *
 * Note: to run this test suite against a specific Postgres version (e.g. postgres:15.1):
 * {{{
 *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
 *     ./build/sbt -Pdocker-integration-tests
 *     "testOnly org.apache.spark.sql.jdbc.PostgreSQLQueryTestSuite"
 * }}}
 */
@DockerTest
class PostgreSQLQueryTestSuite extends CrossDbmsQueryTestSuite {

  val DATABASE_NAME = CrossDbmsQueryTestSuite.POSTGRES
  // Scope to only subquery directory for now.
  protected val customInputFilePath: String = new File(inputFilePath, "subquery").getAbsolutePath

  override val db = new DatabaseOnDocker {
    override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", "postgres:15.1-alpine")
    override val env = Map(
      "POSTGRES_PASSWORD" -> "rootpass"
    )
    override val usesIpc = false
    override val jdbcPort = 5432

    override def getJdbcUrl(ip: String, port: Int): String =
      s"jdbc:postgresql://$ip:$port/postgres?user=postgres&password=rootpass"
  }

  override def dataPreparation(conn: Connection): Unit = {
    conn.prepareStatement(
      // Custom function `double` to imitate Spark's function, so that more tests are covered.
      """
        |CREATE OR REPLACE FUNCTION double(numeric_value numeric) RETURNS double precision
        |  AS 'select CAST($1 AS double precision);'
        |  LANGUAGE SQL
        |  IMMUTABLE
        |  RETURNS NULL ON NULL INPUT;
        |""".stripMargin
    ).executeUpdate()
  }

  listTestCases.foreach(createScalaTestCase)
}