WhiteRabbit 1.0, with Snowflake support (#401)
* Use and configure license-maven-plugin (org.honton.chas)

* First setup of distribution verification integration test

* Use Java 17 for compilation, updates of test dependencies, update license validation config

* Update comment on CacioTest annotation

* Cleanup

* Add generating fat jars for WhiteRabbit and RabbitInAHat; lock hsqldb version for Java 1.8

* Enforce Java 1.8 for distributed dependencies

* Update main.yml

The project now requires Java 17 to build, but should still produce Java 8 (1.8) compatible artifacts.

* Bump org.apache.avro:avro from 1.11.2 to 1.11.3 in /rabbit-core

Bumps org.apache.avro:avro from 1.11.2 to 1.11.3.

---
updated-dependencies:
- dependency-name: org.apache.avro:avro
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use jdk8 classifier for hsqldb 2.7.x

* Exclude older version of hsqldb

* Fix image crop when using stem table

* Update stem table image

* Decrease size of table panel when using stem table.

Without this change, the table panel height is always higher than
needed (when using stem table), because the stem table is counted
as one of the items in the components list. It is however shown
separately at the top, which is already accounted for by the
stem table margin.

* Add snowflake support (#37)

* Refactor RichConnection into separate classes, and add an abstraction for the JDBC connection. Implement a Snowflake connection with this abstraction

* Add unit tests for SnowflakeConnector

* Added Snowflake support for SourceDataScan; added minimal test for it; some refactorings to move database responsibility to rabbit-core/databases

* Move more database details to rabbit-core/databases

* Clearer name for method

* Ignore snowflake.env

* Create PostgreSQL container in the TestContainers way

* Refactored Snowflake tests + a bit of documentation

* Fix Snowflake test for Java 17, and make it into an automated integration test instead of a unit test

* Remove duplicate postgresql test

* Make TestContainers based database tests into automated integration tests

* Suppress some warnings when generating fat jars

* Let automatic integration tests fail when docker is not available

* Allow explicit skipping of Snowflake integration tests

* Added tests for Snowflake, delimited text files

* Switch to fully verifying the scan results against a reference version (v0.10.7)

* Working integration test for Snowflake, and some refactorings

* Some proper logging, small code improvements and cleanup

* Remove unused interface

* Added tests, some changes to support testing

* Make automated test work reliably (way too many changes, sorry)

* Rudimentary support for Snowflake authenticator parameter (untested)

* review xmlbeans dependencies, remove conflict

* extend integration test for distribution

* Restructuring database configuration. Work in process, but unit and integration tests all OK

* Restructuring database configuration 2/x. Still work in process, but unit and integration tests all OK

* Restructuring database configuration 3/x. Still work in process, but unit and integration tests all OK

* Restructuring database configuration 4/x. Still work in process, but unit and integration tests all OK

* Restructuring database configuration 5/x. Still work in process, but unit and integration tests all OK

* Restructuring database configuration 6/x. Still work in process, but unit and integration tests all OK

* Restructuring database configuration 7/x. Still work in process, but unit and integration tests all OK

* Intermezzo: get rid of the package naming error (upper case R in whiteRabbit)

* Intermezzo: code cleanup

* Snowflake is now working from the GUI. And many small refactorings, like logging instead of printing to stdout/stderr

* Refactor DbType into an enum, get rid of DBChoice

* Move DbType and DbSettings classes into configuration subpackage

* Avoid using a manually destructured DbSettings object when creating a RichConnection object

* Code cleanup, remove unneeded Snowflake references

* Refactoring, code cleanup

* More refactoring, code cleanup

* More refactoring, code cleanup and documentation

* Make sure that order of databases in pick list in GUI is the same as before, and enforce completeness of that list in a test

* Add/update copyright headers

* Add line to verify that a tooltip is shown for a DBConnectionInterface implementing class

* Test distribution for Snowflake JDBC issue with Java 17

* cleanup of build files

* Add verification that all JDBC drivers are in the distributed package

* Add/improve error reporting for Snowflake

* Disable screenshottaker in GuiTestExtension, hoping that that is what blocks the build on github. Fingers crossed

* Better(?) naming for database interface and implementing class

* Use our own GUITestExtension class

---------

Co-authored-by: Jan Blom <janblom@thehyve.nl>

* Add mysql test (#38)

* Fixed a bug in the comparison for sort; let comparison report report all differences before failing

* Allow the user to specify the port for a MySQL server

* Add tests for a MySQL source database

* Add sas test (#39)

* Add automated regression tests for SAS files

* Fix problems with comparisons of test results to references

* create bypass for value mismatch that only shows up in github actions so far

* create bypass for value mismatch that only shows up in github actions so far, 2nd

* Pom updates to enable building on MacOS

* Prepare release (#40)

* Add warehouse/database handling to StorageHandler class

* Show stdout/stderr from distribution verification when there are errors

* Pom updates to enable building on MacOS

* Update dependencies as far as possible without code changes

* Update README.md

---------

Co-authored-by: Jan Blom <janblom@thehyve.nl>

* Update whiterabbit/src/main/java/org/ohdsi/whiterabbit/WhiteRabbitMain.java

The sample size should start disabled, as the calculateNumericStats checkbox is unchecked by default.

Co-authored-by: Maxim Moinat <maximmoinat@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jan Blom <janblom@thehyve.nl>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Spayralbe <stefan@thehyve.nl>
Co-authored-by: Maxim Moinat <maximmoinat@gmail.com>
5 people authored Feb 7, 2024
1 parent 654ed24 commit 4bfe7e8
Showing 26 changed files with 856 additions and 90 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -34,11 +34,12 @@ Screenshots

Technology
============
White Rabbit and Rabbit in a Hat are pure Java applications. Both applications use [Apache's POI Java libraries](http://poi.apache.org/) to read and write Word and Excel files. White Rabbit uses JDBC to connect to the respective databases.
White Rabbit and Rabbit in a Hat are pure Java applications. Both applications use [Apache's POI Java libraries](http://poi.apache.org/) to read and write Word and Excel files.
White Rabbit uses JDBC to connect to the respective databases.

System Requirements
============
Requires Java 1.8 or higher, and read access to the database to be scanned. Java can be downloaded from
Requires Java 1.8 or higher for running, and read access to the database to be scanned. Java can be downloaded from
<a href="http://www.java.com" target="_blank">http://www.java.com</a>.

Dependencies
@@ -101,16 +102,23 @@ To generate the files ready for distribution, run `mvn install`.
### Testing

A limited number of unit and integration tests exist. The integration tests run only in the maven verification phase,
(`mn verify`) and depend on docker being available to the user running the verification. If docker is not available, the
(`mvn verify`) and depend on docker being available to the user running the verification. If docker is not available, the
integration tests will fail.

Also, GitHub actions have been configured to run the test suite automatically.

#### MacOS

It is currently not possible to run the maven verification phase on MacOS, as all GUI tests will fail with an
exception. This has not been resolved yet.
The distributable packages can be built on MacOS using `mvn clean package -DskipTests=true`, but be aware that
a new release must be validated on a platform where all tests can run.

#### Snowflake

There are automated tests for Snowflake, but since it is not (yet?) possible to have a local
Snowflake instance in a Docker container, these test will only run if the following information
is provided through environment variables:
is provided through system properties, in a file named `snowflake.env` in the root directory of the project:

SNOWFLAKE_WR_TEST_ACCOUNT
SNOWFLAKE_WR_TEST_USER
2 changes: 1 addition & 1 deletion pom.xml
@@ -90,7 +90,7 @@
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<maven.compiler.release>1.8</maven.compiler.release>
<maven.compiler.release>8</maven.compiler.release>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

<skipTests>false</skipTests>
20 changes: 10 additions & 10 deletions rabbit-core/pom.xml
@@ -48,7 +48,7 @@
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.1.0</version>
<version>8.3.0</version>
</dependency>
<dependency>
<groupId>org.dom4j</groupId>
@@ -87,17 +87,17 @@
<dependency>
<groupId>org.apache.xmlbeans</groupId>
<artifactId>xmlbeans</artifactId>
<version>5.1.1</version>
<version>5.2.0</version>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.6.0</version>
<version>42.7.1</version>
</dependency>
<dependency>
<groupId>com.cedarsoftware</groupId>
<artifactId>json-io</artifactId>
<version>4.14.1</version>
<version>4.18.0</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
@@ -112,12 +112,12 @@
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.24.0</version>
<version>1.25.0</version>
</dependency>
<dependency>
<groupId>com.healthmarketscience.jackcess</groupId>
@@ -144,7 +144,7 @@
<dependency>
<groupId>com.amazon.redshift</groupId>
<artifactId>redshift-jdbc42</artifactId>
<version>2.1.0.18</version>
<version>2.1.0.25</version>
</dependency>
<dependency>
<groupId>com.teradata.jdbc</groupId>
@@ -265,18 +265,18 @@
<dependency>
<groupId>net.snowflake</groupId>
<artifactId>snowflake-jdbc</artifactId>
<version>3.14.3</version>
<version>3.14.5</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>RELEASE</version>
<version>5.10.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
<version>4.5.14</version>
<scope>compile</scope>
</dependency>
<dependency>
21 changes: 19 additions & 2 deletions rabbit-core/src/main/java/org/ohdsi/databases/DBConnector.java
@@ -127,12 +127,12 @@ public static Connection connectToPostgreSQL(String server, String user, String

public static Connection connectToMySQL(String server, String user, String password) {
try {
Class.forName("com.mysql.jdbc.Driver");
Class.forName("com.mysql.cj.jdbc.Driver");
} catch (ClassNotFoundException e1) {
throw new RuntimeException("Cannot find JDBC driver. Make sure the file mysql-connector-java-x.x.xx-bin.jar is in the path");
}

String url = "jdbc:mysql://" + server + ":3306/?useCursorFetch=true&zeroDateTimeBehavior=convertToNull";
String url = createMySQLUrl(server);

try {
return DriverManager.getConnection(url, user, password);
@@ -141,6 +141,23 @@ public static Connection connectToMySQL(String server, String user, String passw
}
}

static String createMySQLUrl(String server) {
final String jdbcProtocol = "jdbc:mysql://";

// only insert the default port if no port was specified
if (!server.contains(":")) {
if (!server.endsWith("/")) {
server += "/";
}
server = server.replace("/", ":3306/");
}

String url = (!server.startsWith(jdbcProtocol) ? jdbcProtocol : "") + server;
url += "?useCursorFetch=true&zeroDateTimeBehavior=convertToNull";

return url;
}

public static Connection connectToODBC(String server, String user, String password) {
try {
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
@@ -39,8 +39,6 @@
public enum SnowflakeHandler implements StorageHandler {
INSTANCE();

final static Logger logger = LoggerFactory.getLogger(SnowflakeHandler.class);

DBConfiguration configuration = new SnowflakeConfiguration();
private DBConnection snowflakeConnection = null;

@@ -99,6 +97,12 @@ public DBConnection getDBConnection() {
return this.snowflakeConnection;
}

public String getUseQuery(String ignoredDatabase) {
String useQuery = String.format("USE WAREHOUSE \"%s\";", configuration.getValue(SNOWFLAKE_WAREHOUSE).toUpperCase());
logger.info("SnowFlakeHandler will execute query: " + useQuery);
return useQuery;
}

@Override
public String getTableSizeQuery(String tableName) {
return String.format("SELECT COUNT(*) FROM %s.%s.%s;", this.getDatabase(), this.getSchema(), tableName);
65 changes: 63 additions & 2 deletions rabbit-core/src/main/java/org/ohdsi/databases/StorageHandler.java
@@ -17,14 +17,18 @@
******************************************************************************/
package org.ohdsi.databases;

import org.apache.commons.lang.StringUtils;
import org.ohdsi.databases.configuration.*;
import org.ohdsi.utilities.files.IniFile;
import org.ohdsi.utilities.files.Row;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.PrintStream;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
@@ -35,6 +39,8 @@
*/
public interface StorageHandler {

Logger logger = LoggerFactory.getLogger(StorageHandler.class);

/**
* Creates an instance of the implementing class, or can return the singleton for.
*
@@ -94,9 +100,18 @@ default long getTableSize(String tableName ) {
*
* No-op by default.
*
* @param ignoredDatabase provided for compatibility
* @param database database to use
*/
default void use(String ignoredDatabase) {}
default void use(String database) {
String useQuery = getUseQuery(database);
if (StringUtils.isNotEmpty(useQuery)) {
execute(useQuery);
}
}

default String getUseQuery(String ignoredDatabase) {
return null;
}

/**
* closes the connection to the database. No-op by default.
@@ -118,6 +133,7 @@ default void close() {
*/
default List<String> getTableNames() {
List<String> names = new ArrayList<>();
use(getDatabase());
String query = this.getTablesQuery(getDatabase());

for (Row row : new QueryResult(query, new DBConnection(this, getDbType(), false))) {
@@ -230,4 +246,49 @@ default DbSettings getDbSettings(IniFile iniFile, ValidationFeedback feedback, P
* Returns the DBConfiguration object for the implementing class
*/
DBConfiguration getDBConfiguration();

default void execute(String sql) {
execute(sql, false);
}

default void execute(String sql, boolean verbose) {
Statement statement = null;
try {
if (StringUtils.isEmpty(sql)) {
return;
}

statement = getDBConnection().createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
for (String subQuery : sql.split(";")) {
if (verbose) {
String abbrSQL = subQuery.replace('\n', ' ').replace('\t', ' ').trim();
if (abbrSQL.length() > 100)
abbrSQL = abbrSQL.substring(0, 100).trim() + "...";
logger.info("Adding query to batch: " + abbrSQL);
}

statement.addBatch(subQuery);
}
long start = System.currentTimeMillis();
if (verbose) {
logger.info("Executing batch");
}
statement.executeBatch();
if (verbose) {
// TODO outputQueryStats(statement, System.currentTimeMillis() - start);
}
} catch (SQLException e) {
logger.error(sql);
logger.error(e.getMessage(), e);
} finally {
if (statement != null) {
try {
statement.close();
} catch (SQLException e) {
logger.error(e.getMessage());
}
}
}
}

}
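
The `use`/`getUseQuery` pair added to `StorageHandler` above is a hook with a no-op default: `use` runs a statement only when `getUseQuery` returns a non-empty string, and `SnowflakeHandler` overrides the hook to select a warehouse. The following is a minimal, self-contained sketch of that pattern. It is not the project's code: `UseQueryDemo`, `Handler`, `PlainHandler`, `WarehouseHandler` and the `executed()` list (which stands in for a real JDBC connection) are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: none of these names are part of WhiteRabbit.
class UseQueryDemo {

    interface Handler {
        // Stand-in for a JDBC connection: collects the statements that would be sent.
        List<String> executed();

        // Hook: handlers that need no USE statement keep this default.
        default String getUseQuery(String ignoredDatabase) {
            return null;
        }

        // Runs the hook's statement only when there is one.
        default void use(String database) {
            String useQuery = getUseQuery(database);
            if (useQuery != null && !useQuery.isEmpty()) {
                execute(useQuery);
            }
        }

        // Splits on ';' as the diff does; unlike the diff, blank fragments are skipped here.
        default void execute(String sql) {
            for (String subQuery : sql.split(";")) {
                String trimmed = subQuery.trim();
                if (!trimmed.isEmpty()) {
                    executed().add(trimmed);
                }
            }
        }
    }

    // A handler that relies on the defaults: use() does nothing.
    static class PlainHandler implements Handler {
        private final List<String> executed = new ArrayList<>();

        @Override
        public List<String> executed() {
            return executed;
        }
    }

    // A Snowflake-like handler: overrides the hook to select a warehouse.
    static class WarehouseHandler implements Handler {
        private final List<String> executed = new ArrayList<>();
        private final String warehouse;

        WarehouseHandler(String warehouse) {
            this.warehouse = warehouse;
        }

        @Override
        public List<String> executed() {
            return executed;
        }

        @Override
        public String getUseQuery(String ignoredDatabase) {
            // Locale.ROOT keeps the upper-casing independent of the default locale (a choice of this sketch).
            return String.format("USE WAREHOUSE \"%s\";", warehouse.toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        Handler plain = new PlainHandler();
        plain.use("my_db");
        System.out.println(plain.executed());

        Handler snowflakeLike = new WarehouseHandler("compute_wh");
        snowflakeLike.use("my_db");
        System.out.println(snowflakeLike.executed());
    }
}
```

One caveat applies to the sketch and, as far as the diff shows, to the original `execute` as well: splitting on `;` with `String.split` also splits inside string literals, so it is only safe for statements that contain no literal semicolons.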
10 changes: 10 additions & 0 deletions rabbit-core/src/test/java/org/ohdsi/databases/DBConnectorTest.java
@@ -43,4 +43,14 @@ void testJDBCDriverAndVersion(String driverName) throws ClassNotFoundException {
}
}
}

@Test
void createMySQLUrl() {
assertTrue(DBConnector.createMySQLUrl("127.0.0.1").contains(":3306/"),
"The default port (:3306) should have been added when no port is specified in the server string");
assertTrue(DBConnector.createMySQLUrl("127.0.0.1/").contains(":3306"),
"The default port (:3306) should have been added when no port is specified in the server string");
assertFalse(DBConnector.createMySQLUrl("127.0.0.1:12345/").contains(":3306"),
"The default port (:3306) should not have been added when a port is specified in the server string");
}
}
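
The test above covers three inputs: a bare host, a host with a trailing slash, and a host with an explicit port. Reading `createMySQLUrl` as shown in the diff, a server string that already starts with `jdbc:mysql://` appears to skip the default port, because the colon in the protocol satisfies the `contains(":")` check. The sketch below is a variant, not the commit's implementation, that strips the protocol before looking for a port. `MySqlUrlSketch` and `createUrl` are hypothetical names, and IPv6 literals such as `[::1]` are deliberately not handled.

```java
// Hypothetical variant of the URL construction; not the code from the commit.
class MySqlUrlSketch {

    static final String PROTOCOL = "jdbc:mysql://";
    static final String OPTIONS = "?useCursorFetch=true&zeroDateTimeBehavior=convertToNull";
    static final String DEFAULT_PORT = "3306";

    static String createUrl(String server) {
        // Strip the protocol first so its colon is not mistaken for a port separator.
        String rest = server.startsWith(PROTOCOL) ? server.substring(PROTOCOL.length()) : server;

        // Separate "host[:port]" from an optional "/database" part.
        int slash = rest.indexOf('/');
        String hostPort = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "/" : rest.substring(slash);

        // Only insert the default port if no port was specified.
        if (!hostPort.contains(":")) {
            hostPort += ":" + DEFAULT_PORT;
        }
        return PROTOCOL + hostPort + path + OPTIONS;
    }

    public static void main(String[] args) {
        System.out.println(createUrl("127.0.0.1"));
        System.out.println(createUrl("127.0.0.1:12345/"));
        System.out.println(createUrl("jdbc:mysql://db.example.org"));
    }
}
```

For the three inputs used in the commit's test, this variant yields the same presence or absence of `:3306` that the test asserts.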
20 changes: 15 additions & 5 deletions rabbitinahat/pom.xml
@@ -63,20 +63,30 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<phase>process-test-resources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<tasks>
<target>
<unzip src="${project.basedir}/../examples.zip" dest="${basedir}/target/test-classes" overwrite="true"/>
</tasks>
</target>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.3.1</version>
<configuration>
<nonFilteredFileExtensions>
<nonFilteredFileExtension>csv</nonFilteredFileExtension>
</nonFilteredFileExtensions>
</configuration>
</plugin>
<plugin>
<!-- enforce that the used dependencies support Java 1.8 -->
<groupId>org.apache.maven.plugins</groupId>
@@ -129,7 +139,7 @@
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<version>3.24.2</version>
<version>3.25.2</version>
<scope>test</scope>
</dependency>
<!-- causes warning for CVE-2020-15250 but is only used in test scope -->
@@ -142,7 +152,7 @@
<dependency>
<groupId>com.github.caciocavallosilano</groupId>
<artifactId>cacio-tta</artifactId>
<version>1.17.1</version>
<version>1.17.3</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.maven.plugins/maven-antrun-plugin -->
@@ -155,7 +165,7 @@
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>5.9.2</version>
<version>5.10.1</version>
<scope>test</scope>
</dependency>
</dependencies>