Hello Azure SQL Database from Cromwell on Azure [VS-812] #8220

Merged Feb 27, 2023 (32 commits)
127 changes: 127 additions & 0 deletions scripts/variantstore/azure/HelloAzure.wdl
@@ -0,0 +1,127 @@
version 1.0

workflow HelloAzure {
input {
String sql_server
String sql_database
File utf8_token_file
File python_script
File ammonite_script
}
meta {
description: "Workflow to say Hello to Azure SQL database from sqlcmd, Python, and Ammonite (Java ecosystem) contexts"
}
parameter_meta {
sql_server: {
description: "Name of the Azure SQL Database Server without .database.windows.net suffix"
}
sql_database: {
description: "Name of the Database within the Azure SQL Database Server"
}
utf8_token_file: {
description: "A file with a UTF-8 encoded access token generated with credentials that can access the Azure SQL Database Server, e.g. `az account get-access-token --resource=https://database.windows.net/ --query accessToken --output tsv > db_access_token.txt`"
}
}

call HelloFromSqlcmd {
input:
sql_server = sql_server,
sql_database = sql_database,
token_file = utf8_token_file
}

call HelloFromPython {
input:
sql_server = sql_server,
sql_database = sql_database,
python_script = python_script,
token_file = utf8_token_file
}

call HelloFromAmmonite {
input:
sql_server = sql_server,
sql_database = sql_database,
ammonite_script = ammonite_script,
token_file = utf8_token_file
}
}

task HelloFromSqlcmd {
input {
String sql_server
String sql_database
File token_file
}
meta {
description: "Say hello to Azure SQL Database from sqlcmd using a database access token"
}
command <<<
# Prepend date, time and pwd to xtrace log entries.
PS4='\D{+%F %T} \w $ '
set -o errexit -o nounset -o pipefail -o xtrace

# sqlcmd is particular about the formatting and encoding of its access token: no whitespace, and UTF-16LE.
# Python is particular too, but there the conversion happens inside the script. The Java / Ammonite path
# appears to accept the plain ASCII / UTF-8 token without any conversion by the caller.
cut -f 1 ~{token_file} | tr -d '\n' | iconv -f ascii -t UTF-16LE > /tmp/db_access_token.txt

sqlcmd -S tcp:~{sql_server}.database.windows.net,1433 -d ~{sql_database} -G -Q 'select @@version as "Hello Azure SQL Database!"' -P /tmp/db_access_token.txt
>>>
runtime {
docker: "us.gcr.io/broad-dsde-methods/variantstore:coa-2023-02-22"
}
output {
String out = read_string(stdout())
}
}

task HelloFromPython {
input {
String sql_server
String sql_database
File python_script
File token_file
}
meta {
description: "Say hello to Azure SQL Database from Python -> pyodbc -> unixodbc -> MS ODBC driver"
}
command <<<
# Prepend date, time and pwd to xtrace log entries.
PS4='\D{+%F %T} \w $ '
set -o errexit -o nounset -o pipefail -o xtrace

python3 ~{python_script} --server ~{sql_server} --database ~{sql_database} --token-file ~{token_file}
>>>
runtime {
docker: "us.gcr.io/broad-dsde-methods/variantstore:coa-2023-02-22"
}
output {
String out = read_string(stdout())
}
}

task HelloFromAmmonite {
input {
String sql_server
String sql_database
File ammonite_script
File token_file
}
meta {
description: "Say hello to Azure SQL Database from Ammonite/Java -> JDBC -> MS JDBC driver"
}
command <<<
# Prepend date, time and pwd to xtrace log entries.
PS4='\D{+%F %T} \w $ '
set -o errexit -o nounset -o pipefail -o xtrace

amm ~{ammonite_script} --server ~{sql_server} --database ~{sql_database} --tokenFile ~{token_file}
>>>
runtime {
docker: "us.gcr.io/broad-dsde-methods/variantstore:coa-2023-02-22"
}
output {
String out = read_string(stdout())
}
}
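Review note on the token handling in `HelloFromSqlcmd`: the `cut | tr | iconv` pipeline keeps the first tab-separated field of each line, drops newlines, and re-encodes the ASCII token as UTF-16LE. A minimal Python sketch of the same transformation follows. The helper name `prepare_sqlcmd_token` is hypothetical and not part of this PR, and `"abc"` is a stand-in rather than a real token.

```python
def prepare_sqlcmd_token(raw: str) -> bytes:
    # Hypothetical helper mirroring: cut -f 1 | tr -d '\n' | iconv -f ascii -t UTF-16LE
    # cut -f 1: keep the first tab-separated field of every line
    # (lines without a tab pass through unchanged).
    first_fields = [line.split("\t", 1)[0] for line in raw.split("\n")]
    # tr -d '\n': join the lines with no separator.
    token = "".join(first_fields)
    # iconv -f ascii: fail on non-ASCII input rather than silently converting it.
    token.encode("ascii")
    # iconv -t UTF-16LE: two bytes per ASCII character, no byte-order mark.
    return token.encode("utf-16-le")


if __name__ == "__main__":
    print(prepare_sqlcmd_token("abc\n"))  # b'a\x00b\x00c\x00'
```

The trailing newline that `az ... --output tsv` leaves in the token file is removed by the join step, which is the part sqlcmd is sensitive to.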
18 changes: 18 additions & 0 deletions scripts/variantstore/azure/build_coa_docker.sh
@@ -0,0 +1,18 @@
if [ $# -lt 1 ]; then
echo "USAGE: ./build_coa_docker.sh [DOCKER_TAG_STRING]"
echo " e.g.: ./build_coa_docker.sh $(date -I)"
exit 1
fi

set -o errexit -o nounset -o pipefail -o xtrace

BASE_REPO="broad-dsde-methods/variantstore"
REPO_WITH_TAG="${BASE_REPO}:coa-${1}"
GCR_TAG="us.gcr.io/${REPO_WITH_TAG}"

docker build . -t "${REPO_WITH_TAG}" -f cromwell_on_azure.Dockerfile

docker tag "${REPO_WITH_TAG}" "${GCR_TAG}"
docker push "${GCR_TAG}"

echo "Docker image pushed to \"${GCR_TAG}\""
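Review note: the image names the script builds and pushes are derived purely from its first argument. The sketch below restates that derivation in Python so the naming can be checked against the `docker:` values in `HelloAzure.wdl`. The function name `image_names` is hypothetical; the constants are copied from the script.

```python
from typing import Tuple

BASE_REPO = "broad-dsde-methods/variantstore"


def image_names(tag: str) -> Tuple[str, str]:
    # Mirrors REPO_WITH_TAG and GCR_TAG in build_coa_docker.sh.
    repo_with_tag = f"{BASE_REPO}:coa-{tag}"
    gcr_tag = f"us.gcr.io/{repo_with_tag}"
    return repo_with_tag, gcr_tag


if __name__ == "__main__":
    local_name, pushed_name = image_names("2023-02-22")
    print(local_name)
    print(pushed_name)
```

With the tag `2023-02-22` the pushed name matches the image referenced by the three WDL tasks.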
65 changes: 65 additions & 0 deletions scripts/variantstore/azure/cromwell_on_azure.Dockerfile
@@ -0,0 +1,65 @@
# Docker image with a grab bag of utilities for Cromwell on Azure exploration spikes. It is not optimized for size
# or anything else; for now it simply bundles everything potentially useful.
FROM ubuntu:20.04

# Azure CLI
# https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt#option-2-step-by-step-installation-instructions
RUN apt-get update
RUN apt-get install --assume-yes ca-certificates curl apt-transport-https lsb-release gnupg

RUN mkdir -p /etc/apt/keyrings
RUN curl -sLS https://packages.microsoft.com/keys/microsoft.asc | \
gpg --dearmor | \
tee /etc/apt/keyrings/microsoft.gpg > /dev/null
RUN chmod go+r /etc/apt/keyrings/microsoft.gpg

# ENV AZ_REPO=$(lsb_release -cs)
# Hardcode to focal/20.04 for consistency with the base image above and sqlcmd setup below
ENV AZ_REPO=focal
RUN echo "deb [arch=`dpkg --print-architecture` signed-by=/etc/apt/keyrings/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | \
tee /etc/apt/sources.list.d/azure-cli.list

RUN apt-get update
RUN apt-get install --assume-yes azure-cli

# Install sqlcmd (Microsoft SQL client)
# https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-setup-tools?view=sql-server-ver16&tabs=ubuntu-install%2Credhat-offline#install-tools-on-linux
# Also sneak in an installation of the driver for Microsoft databases via `msodbcsql18`.
RUN curl https://packages.microsoft.com/keys/microsoft.asc | \
apt-key add -

RUN curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | \
tee /etc/apt/sources.list.d/msprod.list

RUN apt-get update

# sneaky EULA "acceptance" https://stackoverflow.com/a/42383714
ENV ACCEPT_EULA=Y

# ODBC and Microsoft ODBC SQL driver
RUN apt-get install --assume-yes mssql-tools unixodbc-dev msodbcsql18
ENV PATH=$PATH:/opt/mssql-tools/bin

# Python
RUN apt-get install --assume-yes python3-pip
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

# Temurin 11 JDK
# https://askubuntu.com/a/1386901
RUN apt-get install --assume-yes wget
RUN wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | apt-key add -
RUN echo "deb https://packages.adoptium.net/artifactory/deb $(awk -F= '/^VERSION_CODENAME/{print$2}' /etc/os-release) main" | tee /etc/apt/sources.list.d/adoptium.list
RUN apt-get update && apt-get install --assume-yes temurin-11-jdk

# Coursier / Ammonite for scripting in the Java ecosystem
# https://get-coursier.io/docs/cli-installation#linux
#
# Use the statically linked version for now to get around a broken dynamically linked launcher
# https://github.com/coursier/coursier/issues/2624
# https://stackoverflow.com/a/75232986/21269164
RUN curl -fL "https://github.com/coursier/launchers/raw/master/cs-x86_64-pc-linux-static.gz" | gzip -d > /usr/local/bin/cs
RUN chmod +x /usr/local/bin/cs
RUN mkdir -p /coursier/bin
ENV PATH=/coursier/bin:$PATH
RUN cs setup --install-dir /coursier/bin --yes
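Review note: since this image is a grab bag, a quick smoke test that the expected executables are on `PATH` could catch a broken layer early. The sketch below is one possible check and is not part of this PR. The tool list is an assumption read off the Dockerfile and the WDL (`az`, `sqlcmd`, `python3`, `java`, `cs`, and `amm`, the last assuming `cs setup` installs the Ammonite launcher that the WDL invokes).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed list of executables the image is expected to provide.
EXPECTED_TOOLS = ["az", "sqlcmd", "python3", "java", "cs", "amm"]


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    # Return the names that cannot be resolved on PATH.
    # `which` is injectable so the check can be exercised without the image.
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        raise SystemExit(f"Missing from PATH: {', '.join(missing)}")
    print("All expected tools found")
```

Run inside the container, a non-zero exit would point at the layer that failed to install its tool.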
65 changes: 65 additions & 0 deletions scripts/variantstore/azure/hello_from_ammonite.sc
@@ -0,0 +1,65 @@
// Ammonite script to say Hello to Azure SQL Database from the Java ecosystem.

// Package management
import $ivy.`com.microsoft.sqlserver:mssql-jdbc:12.2.0.jre11`
import $ivy.`com.azure:azure-identity:1.4.6`
// ANTLR 4 appears to be required by one of the above Microsoft packages but the dependency is not expressed explicitly
// so it is not imported automatically. Without adding this import ourselves Ammonite fails to compile this script.
import $ivy.`org.antlr:antlr4:4.12.0`

// Imports
import com.azure.core.credential.*
import com.azure.identity.*
import com.microsoft.sqlserver.jdbc.SQLServerDataSource
import java.nio.charset.StandardCharsets
import java.nio.file.*
import java.sql.*
import java.util.*


// Nearly everything taken from
// https://learn.microsoft.com/en-us/azure/app-service/tutorial-connect-msi-azure-database?tabs=sqldatabase%2Csystemassigned%2Cjava%2Cwindowsclient#3-modify-your-code
def getAccessTokenViaRequest(): String = {
val creds = new DefaultAzureCredentialBuilder().build()
val request = new TokenRequestContext()
request.addScopes("https://database.windows.net//.default");
val accessToken = creds.getToken(request).block()
accessToken.getToken()
}

// Generate a token via
// az account get-access-token --resource=https://database.windows.net/ --query accessToken --output tsv > db_access_token.txt
// Note this produces a token file with a confounding trailing newline, the code below has to `trim()`.
// Also note that unlike the sqlcmd and Python contexts, there is nothing here about UTF-16LE encoding; the Java
// ecosystem seems to deal with the ASCII / UTF-8 access token just fine without the caller doing anything special.
def getAccessTokenViaFile(tokenFile: String): String = {
// https://www.digitalocean.com/community/tutorials/java-read-file-to-string
val token = new String(Files.readAllBytes(Paths.get(tokenFile)), StandardCharsets.UTF_8)
token.trim()
}

@main
def main(server: String, database: String, tokenFile: Option[String] = None) = {
val ds = new SQLServerDataSource()

val token = tokenFile match {
case Some(file) => getAccessTokenViaFile(file)
case None => getAccessTokenViaRequest()
}
ds.setAccessToken(token)
ds.setServerName(s"${server}.database.windows.net")
ds.setDatabaseName(database)

val connection = ds.getConnection()
val statement = connection.createStatement()

val resultSet = statement.executeQuery("""

select @@version as "Hello Azure SQL Database!"

""")

resultSet.next()
val result = resultSet.getString(1)
print(result)
}
83 changes: 83 additions & 0 deletions scripts/variantstore/azure/hello_from_python.py
@@ -0,0 +1,83 @@
from azure.identity import DefaultAzureCredential

import argparse
import pyodbc
import struct


def read_token_from_file(token_file_name):
    # https://learn.microsoft.com/en-us/azure/app-service/tutorial-connect-msi-azure-database?tabs=sqldatabase%2Csystemassigned%2Cpython%2Cwindowsclient#3-modify-your-code
    with open(token_file_name, encoding="utf-8") as token_file:
        token_str = token_file.read().rstrip().encode("UTF-16-LE")
        return token_str


def fetch_token():
    credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
    token_str = credential.get_token("https://database.windows.net/.default").token.encode("UTF-16-LE")
    return token_str


def token_to_struct(token_str: bytes):
    token_struct = struct.pack(f'<I{len(token_str)}s', len(token_str), token_str)
    return token_struct


def build_connection_string(server, database):
    driver = "{ODBC Driver 18 for SQL Server}"
    return f'DRIVER={driver};SERVER={server}.database.windows.net;DATABASE={database}'


def connect_via_token(connection_string, token_struct):
    # PEP8 has no appreciation for the beauty of ALL_CAPS symbolic constants in functions.
    # noinspection PyPep8Naming
    SQL_COPT_SS_ACCESS_TOKEN = 1256
    return pyodbc.connect(connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct})


def connect_via_msi(connection_string):
    return pyodbc.connect(connection_string + ";Authentication=ActiveDirectoryMsi")


def query_and_print(connection):
    query = """

        select @@version as "Hello Azure SQL Database!"

    """

    cursor = connection.cursor()
    cursor.execute(query)
    row = cursor.fetchone()

    while row:
        print(row[0])
        row = cursor.fetchone()


if __name__ == '__main__':
    # All taken from
    # https://techcommunity.microsoft.com/t5/apps-on-azure-blog/how-to-connect-azure-sql-database-from-python-function-app-using/ba-p/3035595
    parser = argparse.ArgumentParser(allow_abbrev=False, description='Say Hello to Azure SQL Database from Python')
    parser.add_argument('--server', type=str, help='Azure SQL Server name', required=True)
    parser.add_argument('--database', type=str, help='Azure SQL Server database', required=True)
    parser.add_argument('--token-file', type=str, help='Azure SQL Database access token', required=False)
    parser.add_argument('--msi-auth', action='store_true',
                        help='Use MSI (Managed Service Identity) Authentication.',
                        required=False, default=False)
    args = parser.parse_args()

    connection_string = build_connection_string(args.server, args.database)

    if args.token_file:
        token = read_token_from_file(args.token_file)
        token_struct = token_to_struct(token)
        connection = connect_via_token(connection_string, token_struct)
    elif args.msi_auth:
        connection = connect_via_msi(connection_string)
    else:
        token = fetch_token()
        token_struct = token_to_struct(token)
        connection = connect_via_token(connection_string, token_struct)

    query_and_print(connection)
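Review note on `token_to_struct`: the value handed to the ODBC driver is a 4-byte little-endian length followed by the UTF-16LE token bytes. The standalone sketch below repeats the script's packing and adds an inverse for inspection. `struct_to_token` is a hypothetical helper that does not exist in this PR, and `"ab"` stands in for a real token.

```python
import struct


def token_to_struct(token_str: bytes) -> bytes:
    # Same packing as hello_from_python.py: uint32 length (little-endian), then the raw bytes.
    return struct.pack(f"<I{len(token_str)}s", len(token_str), token_str)


def struct_to_token(token_struct: bytes) -> str:
    # Hypothetical inverse, useful only for checking the layout by hand.
    (length,) = struct.unpack_from("<I", token_struct, 0)
    return token_struct[4:4 + length].decode("utf-16-le")


if __name__ == "__main__":
    packed = token_to_struct("ab".encode("utf-16-le"))
    print(packed.hex())             # 0400000061006200
    print(struct_to_token(packed))  # ab
```

Note that the length prefix counts bytes, not characters: a two-character ASCII token occupies four bytes once encoded as UTF-16LE.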
9 changes: 9 additions & 0 deletions scripts/variantstore/azure/requirements.txt
@@ -0,0 +1,9 @@
azure-identity
azure-keyvault-secrets
azure-mgmt-resource
azure-mgmt-sql
azure-mgmt-storage
azure-mgmt-subscription
azure-storage-blob
inflection
pyodbc