Merge pull request #13 from nextflow-io/abhinav/google-bigquery
Add Google BigQuery support
Showing 18 changed files with 539 additions and 28 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Google BigQuery integration setup | ||
|
||
## Pre-requisites | ||
|
||
1. A Google Cloud project with BigQuery APIs enabled | ||
2. A service account with sufficient permissions | ||
|
||
## Usage | ||
|
||
In the example below, the [NCBI SRA Metadata](https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery/) dataset is assumed to be the data source. Refer to the official [NCBI docs](https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery/) to set up the `nih-sra-datastore` within your BigQuery console. | ||
|
||
*NOTE*: For Google BigQuery, you do not need to specify the `user` and `password` fields, because the credentials are provided by your service account credentials JSON file. | ||
|
||
### Configuration | ||
|
||
```nextflow config | ||
//NOTE: Replace the values in the config file as per your setup | ||
params { | ||
google_bigquery_db = "nih-sra-datastore.sra.metadata" | ||
google_project_id = "<YOUR_GOOGLE_PROJECT_ID>" | ||
google_service_account_email = "<YOUR_GOOGLE_SERVICE_ACCOUNT_EMAIL>" | ||
google_service_account_key = "<YOUR_GOOGLE_SERVICE_ACCOUNT_KEY_LOCATION>" | ||
} | ||
plugins { | ||
id 'nf-bigquery@0.0.1' | ||
} | ||
sql { | ||
db { | ||
googlebigquery { | ||
url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=${params.google_project_id};OAuthType=0;OAuthServiceAcctEmail=${params.google_service_account_email};OAuthPvtKeyPath=${params.google_service_account_key};" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
### Pipeline | ||
|
||
Once the configuration has been set up correctly, you can use it in Nextflow code as shown below: | ||
|
||
```nextflow | ||
include { fromQuery } from 'plugin/nf-bigquery' | ||
def googleSqlQuery = """ | ||
SELECT * | ||
FROM `nih-sra-datastore.sra.metadata` | ||
WHERE organism = 'Mycobacterium tuberculosis' | ||
AND bioproject = 'PRJNA670836' | ||
LIMIT 2; | ||
""" | ||
Channel.fromQuery(googleSqlQuery, db: 'googlebigquery') | ||
.view() | ||
``` | ||
|
||
### Output | ||
|
||
When you execute the above code, the query results are printed to the console: | ||
|
||
```console | ||
[SRR6797500, WGS, SAN RAFFAELE, public, SRX3756197, 131677, Illumina HiSeq 2500, PAIRED, RANDOM, GENOMIC, ILLUMINA, SRS3011891, SAMN08629009, Mycobacterium tuberculosis, SRP128089, 2018-03-02, PRJNA428596, 165, null, 201, 383, null, 131677_WGS, Pathogen.cl, null, uncalculated, uncalculated, null, null, null, bam, sra, s3, s3.us-east-1, {k=assemblyname, v=GCF_000195955.2}, {k=bases, v=383901808}, {k=bytes, v=173931377}, {k=biosample_sam, v=MTB131677}, {k=collected_by_sam, v=missing}, {k=collection_date_sam, v=2010/2014}, {k=host_disease_sam, v=Tuberculosis}, {k=host_sam, v=Homo sapiens}, {k=isolate_sam, v=Clinical isolate18}, {k=isolation_source_sam_ss_dpl262, v=Not applicable}, {k=lat_lon_sam, v=Not collected}, {k=primary_search, v=131677}, {k=primary_search, v=131677_210916_BGD_210916_100.gatk.bam}, {k=primary_search, v=131677_WGS}, {k=primary_search, v=428596}, {k=primary_search, v=8629009}, {k=primary_search, v=PRJNA428596}, {k=primary_search, v=SAMN08629009}, {k=primary_search, v=SRP128089}, {k=primary_search, v=SRR6797500}, {k=primary_search, v=SRS3011891}, {k=primary_search, v=SRX3756197}, {k=primary_search, v=bp0}, {"assemblyname": "GCF_000195955.2", "bases": 383901808, "bytes": 173931377, "biosample_sam": "MTB131677", "collected_by_sam": ["missing"], "collection_date_sam": ["2010/2014"], "host_disease_sam": ["Tuberculosis"], "host_sam": ["Homo sapiens"], "isolate_sam": ["Clinical isolate18"], "isolation_source_sam_ss_dpl262": ["Not applicable"], "lat_lon_sam": ["Not collected"], "primary_search": "131677"}] | ||
``` |
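The connection URL in the configuration above is an endpoint followed by semicolon-separated `key=value` properties. As a rough illustration of that shape, the Groovy sketch below assembles the same string from its parts. The helper name `bigQueryJdbcUrl` is hypothetical and not part of the plugin; the property names are copied from the configuration example.

```groovy
// Hypothetical helper (not part of nf-bigquery): builds a URL with the same
// shape as the `sql.db.googlebigquery.url` value in the configuration example.
String bigQueryJdbcUrl(String projectId, String serviceAccountEmail, String keyPath) {
    def endpoint = 'jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443'
    // Property names are taken verbatim from the README configuration
    def props = [
        ProjectId            : projectId,
        OAuthType            : '0',
        OAuthServiceAcctEmail: serviceAccountEmail,
        OAuthPvtKeyPath      : keyPath,
    ]
    // Each property becomes `key=value`, joined and terminated with ';'
    return endpoint + ';' + props.collect { k, v -> "${k}=${v}" }.join(';') + ';'
}
```

No `user` or `password` property appears in the URL, which matches the note above about service account credentials.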
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,140 @@ | ||
/* | ||
* Copyright 2020-2022, Seqera Labs | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
plugins { | ||
id 'java-library' | ||
id 'groovy' | ||
id 'idea' | ||
id 'de.undercouch.download' version '4.1.2' | ||
} | ||
|
||
group = 'io.nextflow' | ||
// DO NOT SET THE VERSION HERE | ||
// THE VERSION FOR PLUGINS IS DEFINED IN THE `/resources/META-INF/MANIFEST.NF` file | ||
java { | ||
toolchain { | ||
languageVersion = JavaLanguageVersion.of(11) | ||
} | ||
} | ||
|
||
idea { | ||
module.inheritOutputDirs = true | ||
} | ||
|
||
repositories { | ||
mavenCentral() | ||
maven { url = 'https://s3-eu-west-1.amazonaws.com/maven.seqera.io/releases' } | ||
maven { url = 'https://s3-eu-west-1.amazonaws.com/maven.seqera.io/snapshots' } | ||
} | ||
|
||
configurations { | ||
// see https://docs.gradle.org/4.1/userguide/dependency_management.html#sub:exclude_transitive_dependencies | ||
runtimeClasspath.exclude group: 'org.slf4j', module: 'slf4j-api' | ||
} | ||
|
||
sourceSets { | ||
main.java.srcDirs = [] | ||
main.groovy.srcDirs = ['src/main'] | ||
main.resources.srcDirs = ['src/resources'] | ||
test.groovy.srcDirs = ['src/test'] | ||
test.java.srcDirs = [] | ||
test.resources.srcDirs = [] | ||
} | ||
|
||
ext{ | ||
nextflowVersion = '22.08.1-edge' | ||
} | ||
|
||
dependencies { | ||
compileOnly "io.nextflow:nextflow:$nextflowVersion" | ||
compileOnly 'org.slf4j:slf4j-api:1.7.10' | ||
compileOnly 'org.pf4j:pf4j:3.4.1' | ||
|
||
api("org.codehaus.groovy:groovy-sql:3.0.10") { transitive = false } | ||
|
||
api project(":plugins:nf-sqldb") | ||
|
||
// JDBC driver setup for Google BigQuery: the third-party JARs are downloaded and set up by the Gradle tasks below. | ||
// Reference https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers | ||
api files('src/dist/lib/GoogleBigQueryJDBC42.jar') | ||
//NOTE: Had to remove the slf4j jar due to a conflict | ||
implementation fileTree(dir: 'src/dist/lib/libs', include: '*.jar') | ||
|
||
|
||
testImplementation "io.nextflow:nextflow:$nextflowVersion" | ||
testImplementation "org.codehaus.groovy:groovy:3.0.10" | ||
testImplementation "org.codehaus.groovy:groovy-nio:3.0.10" | ||
testImplementation("org.codehaus.groovy:groovy-test:3.0.10") { exclude group: 'org.codehaus.groovy' } | ||
testImplementation("cglib:cglib-nodep:3.3.0") | ||
testImplementation("org.objenesis:objenesis:3.2") | ||
testImplementation("org.spockframework:spock-core:2.1-groovy-3.0") { | ||
exclude group: 'org.codehaus.groovy'; | ||
exclude group: 'net.bytebuddy' | ||
} | ||
testImplementation('org.spockframework:spock-junit4:2.1-groovy-3.0') { | ||
exclude group: 'org.codehaus.groovy'; | ||
exclude group: 'net.bytebuddy' | ||
} | ||
testImplementation('com.google.jimfs:jimfs:1.1') | ||
|
||
testImplementation(testFixtures("io.nextflow:nextflow:$nextflowVersion")) | ||
testImplementation(testFixtures("io.nextflow:nf-commons:$nextflowVersion")) | ||
} | ||
|
||
test { | ||
useJUnitPlatform() | ||
} | ||
|
||
/** | ||
* Google BigQuery | ||
* The following tasks download and confirm the MD5 checksum of the ZIP archive | ||
* for Simba BigQuery JDBC driver and extract its contents to the build directory | ||
* Reference: https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers | ||
*/ | ||
task downloadBigqueryDep(type: Download) { | ||
src 'https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip' | ||
dest new File(buildDir, 'downloads/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip') | ||
overwrite false | ||
} | ||
|
||
task verifyBigqueryDep(type: Verify, dependsOn: downloadBigqueryDep) { | ||
src new File(buildDir, 'downloads/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip') | ||
algorithm 'MD5' | ||
checksum '2e54169cfba2050f0a0f01bcf12c8aa7' | ||
} | ||
|
||
task unzipBigqueryDep(dependsOn: verifyBigqueryDep, type: Copy) { | ||
from zipTree(new File(buildDir, 'downloads/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip')) | ||
into "${buildDir}/downloads/unzip/googlebigquery" | ||
} | ||
unzipBigqueryDep.doLast{ | ||
file("${buildDir}/downloads/unzip/googlebigquery/libs/slf4j-api-1.7.36.jar").delete() | ||
} | ||
|
||
// Files under src/dist are included into the distribution zip | ||
// https://docs.gradle.org/current/userguide/application_plugin.html | ||
task copyBigqueryDep(dependsOn: unzipBigqueryDep, type: Copy) { | ||
from file(new File(buildDir, 'downloads/unzip/googlebigquery/GoogleBigQueryJDBC42.jar')) | ||
into "src/dist/lib" | ||
} | ||
|
||
task copyBigqueryLibs(dependsOn: copyBigqueryDep, type: Copy) { | ||
from file(new File(buildDir, 'downloads/unzip/googlebigquery/libs')) | ||
into "src/dist/lib/libs" | ||
} | ||
|
||
project.copyPluginLibs.dependsOn('copyBigqueryLibs') | ||
project.compileGroovy.dependsOn('copyBigqueryLibs') |
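The `verifyBigqueryDep` task compares the MD5 digest of the downloaded archive against a pinned checksum. For readers unfamiliar with that step, here is a minimal plain-Groovy sketch of an equivalent check. The function name `md5Hex` is an assumption for illustration and is not how the `de.undercouch.download` plugin is implemented.

```groovy
import java.security.MessageDigest

// Illustrative only: computes the lowercase hex MD5 digest of a file,
// reading it in 8 KiB chunks so large archives are not loaded into memory.
String md5Hex(File file) {
    def digest = MessageDigest.getInstance('MD5')
    file.eachByte(8192) { byte[] buffer, int bytesRead ->
        digest.update(buffer, 0, bytesRead)
    }
    return digest.digest().encodeHex().toString()
}
```

A build could then fail fast when `md5Hex(archive)` differs from the expected value, here `2e54169cfba2050f0a0f01bcf12c8aa7` as pinned in the task above.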
17 changes: 17 additions & 0 deletions
plugins/nf-bigquery/src/main/nextflow/sql/BigQueryDriverRegistry.groovy
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
package nextflow.sql | ||
|
||
import nextflow.sql.config.DriverRegistry | ||
|
||
|
||
/** | ||
* @author : jorge <jorge.aguilera@seqera.io> | ||
* | ||
*/ | ||
class BigQueryDriverRegistry extends DriverRegistry { | ||
|
||
BigQueryDriverRegistry(){ | ||
super() | ||
addDriver('bigquery','com.simba.googlebigquery.jdbc.Driver') | ||
} | ||
|
||
} |
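The registry associates the JDBC sub-protocol `bigquery` with the Simba driver class, so a URL starting with `jdbc:bigquery:` resolves to that driver. The internals of `DriverRegistry` are not shown in this diff, so the sketch below is only a guess at the general idea. `SketchRegistry` and its lookup logic are hypothetical stand-ins, not the nf-sqldb implementation.

```groovy
// Illustrative stand-in for a driver lookup; NOT the nf-sqldb DriverRegistry.
class SketchRegistry {
    private final Map<String, String> drivers = [:]

    // Registers a driver class name under a JDBC sub-protocol name
    void addDriver(String name, String className) {
        drivers[name] = className
    }

    // Takes the token between 'jdbc:' and the next ':' and looks it up;
    // returns null for non-JDBC URLs or unknown sub-protocols.
    String urlToDriver(String url) {
        if (!url?.startsWith('jdbc:')) return null
        def parts = url.split(':')
        return parts.length > 1 ? drivers[parts[1]] : null
    }
}

def registry = new SketchRegistry()
registry.addDriver('bigquery', 'com.simba.googlebigquery.jdbc.Driver')
```

Under this assumed design, both the short form `jdbc:bigquery:` and the full URL from the README resolve to the same driver class, since only the sub-protocol token is inspected.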
34 changes: 34 additions & 0 deletions
plugins/nf-bigquery/src/main/nextflow/sql/BigQuerySqlPlugin.groovy
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
/* | ||
* Copyright 2020-2022, Seqera Labs | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* | ||
*/ | ||
|
||
package nextflow.sql | ||
|
||
import nextflow.sql.config.DriverRegistry | ||
import org.pf4j.PluginWrapper | ||
|
||
/** | ||
* Implements BigQuerySQL plugin for Nextflow | ||
* | ||
* @author : jorge <jorge.aguilera@seqera.io> | ||
*/ | ||
class BigQuerySqlPlugin extends SqlPlugin { | ||
|
||
BigQuerySqlPlugin(PluginWrapper wrapper) { | ||
super(wrapper) | ||
DriverRegistry.DEFAULT = new BigQueryDriverRegistry() | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Manifest-Version: 1.0 | ||
Plugin-Class: nextflow.sql.BigQuerySqlPlugin | ||
Plugin-Id: nf-bigquery | ||
Plugin-Provider: Seqera Labs | ||
Plugin-Version: 0.0.1 | ||
Plugin-Requires: >=22.08.1-edge |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# | ||
# Copyright 2020-2022, Seqera Labs | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
nextflow.sql.ChannelSqlExtension |
49 changes: 49 additions & 0 deletions
plugins/nf-bigquery/src/test/nextflow/sql/bigquery/BigQuerySqlDataSourceTest.groovy
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
/* | ||
* Copyright 2020-2022, Seqera Labs | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* | ||
*/ | ||
|
||
package nextflow.sql.bigquery | ||
|
||
import nextflow.sql.BigQueryDriverRegistry | ||
import nextflow.sql.config.DriverRegistry | ||
import nextflow.sql.config.SqlDataSource | ||
import spock.lang.Specification | ||
/** | ||
* | ||
* @author Paolo Di Tommaso <paolo.ditommaso@gmail.com> | ||
*/ | ||
class BigQuerySqlDataSourceTest extends Specification { | ||
|
||
def 'should map url to driver' () { | ||
given: | ||
DriverRegistry.DEFAULT = new BigQueryDriverRegistry() | ||
def helper = new SqlDataSource([:]) | ||
|
||
expect: | ||
helper.urlToDriver(JDBC_URL) == DRIVER | ||
where: | ||
JDBC_URL | DRIVER | ||
'jdbc:postgresql:database' | 'org.postgresql.Driver' | ||
'jdbc:sqlite:database' | 'org.sqlite.JDBC' | ||
'jdbc:h2:mem:' | 'org.h2.Driver' | ||
'jdbc:mysql:some-host' | 'com.mysql.cj.jdbc.Driver' | ||
'jdbc:mariadb:other-host' | 'org.mariadb.jdbc.Driver' | ||
'jdbc:duckdb:' | 'org.duckdb.DuckDBDriver' | ||
'jdbc:awsathena:' | 'com.simba.athena.jdbc.Driver' | ||
'jdbc:bigquery:' | 'com.simba.googlebigquery.jdbc.Driver' | ||
} | ||
|
||
} |