feat(openchallenges): add Mariadb Connection and Load the EDAM Concepts #2680

Closed
wants to merge 26 commits into from
Changes from 7 commits
217 changes: 215 additions & 2 deletions apps/openchallenges/edam-etl/poetry.lock

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions apps/openchallenges/edam-etl/project.json
@@ -4,6 +4,13 @@
  "sourceRoot": "apps/openchallenges/edam-etl/src",
  "projectType": "application",
  "targets": {
    "install-libmariadb-dev": {
      "executor": "nx:run-commands",
      "options": {
        "command": "pip wheel --no-cache-dir --use-pep517 'mariadb (==1.1.10)'",
        "cwd": "{projectRoot}"
      }
    },
    "create-config": {
      "executor": "nx:run-commands",
      "options": {
3 changes: 3 additions & 0 deletions apps/openchallenges/edam-etl/pyproject.toml
@@ -11,6 +11,9 @@ python = "3.12.0"
pandas = "2.2.1"
requests = "2.31.0"
regex = "2023.12.25"
mariadb = "^1.1.10"
mysql-connector = "^2.2.9"
Member

The version should be pinned. You can't simply change this value; instead, reinstall the package with Poetry while specifying the version: poetry add mysql-connector@2.2.9.

Same comments for the packages below.

Contributor Author

We discussed the problems with using Poetry to install this package back in May. It did not work. I will try again; hopefully a Poetry update has improved support for MariaDB and MySQL.

sqlalchemy = "^2.0.30"

[tool.poetry.group.dev.dependencies]

66 changes: 66 additions & 0 deletions apps/openchallenges/edam-etl/src/main.py
Member

The following command fails:

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ nx serve openchallenges-edam-etl

> nx run openchallenges-edam-etl:serve

> poetry run python src/main.py

Traceback (most recent call last):
  File "/workspaces/sage-monorepo/apps/openchallenges/edam-etl/src/main.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
Warning: command "poetry run python src/main.py" exited with non-zero status code

Member

Running the script with Docker fails:

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ nx serve-detach openchallenges-edam-etl

> nx run openchallenges-edam-etl:serve-detach

> docker/openchallenges/serve-detach.sh openchallenges-edam-etl

[+] Running 2/2
 ✔ Container openchallenges-mariadb   Healthy                                                                                                                                                               0.6s 
 ✔ Container openchallenges-edam-etl  Started                                                                                                                                                               1.2s 

————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————

 NX   Successfully ran target serve-detach for project openchallenges-edam-etl (2s)

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ docker logs openchallenges-edam-etl
/usr/local/lib/python3.12/site-packages/mysql/connector/abstracts.py:130: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if group is 'connector_python':
/usr/local/lib/python3.12/site-packages/mysql/connector/optionfiles.py:98: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if group is 'connector_python':
Traceback (most recent call last):
  File "/opt/app/src/main.py", line 168, in <module>
EDAM Version: None
OC DB URL: jdbc:mysql://openchallenges-mariadb:3306/challenge_service
Downloading the EDAM concepts from GitHub (CSV file)...
Error downloading EDAM concepts: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/Sage-Bionetworks/edamontology/main/releases/EDAM_None.csv
    main()
  File "/opt/app/src/main.py", line 164, in main
    connect_to_mariadb(USERNAME, PASSWORD, PORT, HOST, DB, df)
                                                           ^^
UnboundLocalError: cannot access local variable 'df' where it is not associated with a value
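
The log above suggests two separate problems: the version is unset (hence the EDAM_None.csv URL), and the database call is reached even though no dataframe was built after the failed download. The following is a minimal sketch of one way to guard against both. It uses stand-ins throughout: the downloader stub, the placeholder URL, the version "1.25", and the run_pipeline name are all assumptions for illustration, not the PR's code.

```python
from typing import Optional


def download_edam_csv(url: str, version: str) -> Optional[bool]:
    """Stand-in for the PR's downloader: pretend only version 1.25 exists."""
    return True if version == "1.25" else None


def run_pipeline(version: Optional[str]) -> str:
    """Return a status string instead of touching a real database."""
    if not version:
        # Fail fast: an unset version would otherwise yield a URL ending in EDAM_None.csv.
        return "missing version"
    url = f"https://example.invalid/EDAM_{version}.csv"  # placeholder URL
    if not download_edam_csv(url, version):
        # Return early so nothing below can reference a dataframe that was never built.
        return "download failed"
    df = ["concept row"]  # stand-in for transform_to_dataframe(version)
    # Only reached when df exists; the real script would call connect_to_mariadb(..., df) here.
    return f"loaded {len(df)} rows"


for candidate in (None, "0.0", "1.25"):
    print(candidate, "->", run_pipeline(candidate))
```

The design choice is simply to keep every use of df inside the branch that assigns it, or to return before reaching it.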

@@ -3,6 +3,10 @@
import requests
from os import getenv
from typing import Optional
import mariadb
import sys
import mysql.connector
from sqlalchemy import create_engine

# Get config from the environment variables

@@ -11,7 +15,66 @@
print(f"EDAM Version: {VERSION}")
print(f"OC DB URL: {OC_DB_URL}")

# Initialize required connection variables from environment variables

USERNAME = getenv("USERNAME")
PASSWORD = getenv("PASSWORD")
PORT = getenv("PORT")
DB = getenv("DB")
HOST = getenv("HOST")
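
The script already prints a JDBC-style OC DB URL, so the host, port, and database could in principle be derived from it rather than from five separate variables. Below is a rough sketch of that idea, which also includes the port in the SQLAlchemy URL and percent-encodes the credentials. The helper names and the sample credentials are hypothetical, and whether the project wants to reuse the JDBC URL this way is an assumption.

```python
from urllib.parse import quote_plus, urlsplit


def parse_jdbc_url(jdbc_url: str) -> tuple[str, int, str]:
    """Split a jdbc:mysql://host:port/database URL into (host, port, database)."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"Not a JDBC URL: {jdbc_url}")
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    database = parts.path.strip("/")
    if not parts.hostname or not database:
        raise ValueError(f"Missing host or database in: {jdbc_url}")
    # 3306 is the default MariaDB/MySQL port; used only when the URL omits one.
    return parts.hostname, parts.port or 3306, database


def build_engine_url(username: str, password: str, jdbc_url: str) -> str:
    """Build a mysql+mysqlconnector URL string for SQLAlchemy's create_engine."""
    host, port, database = parse_jdbc_url(jdbc_url)
    # Percent-encode credentials so characters like "@" or "/" don't break the URL.
    return (
        f"mysql+mysqlconnector://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


engine_url = build_engine_url(
    "etl_user",   # hypothetical credentials
    "p@ss/word",
    "jdbc:mysql://openchallenges-mariadb:3306/challenge_service",
)
print(engine_url)
```

The resulting string would be passed to create_engine in place of the f-string used in the diff, which omits the port.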

def connect_to_mariadb(username: str, password: str, port: str, host: str, database: str, df: pd.DataFrame) -> None:
    """Connect to the MariaDB database"""
    try:
        conn = mariadb.connect(
            user = username,
            password = password,
            host = host,
            port = int(port),
            database = database
        )
        print("Establishing a connection to the MariaDB Platform.")

        # Get the cursor
        cur = conn.cursor()
        print("Connection has been established to MariaDB Platform!")

        # Commit the transaction
        conn.commit()
        print("The table edam_etl has been added to the edam database!")

        # Create a SQLAlchemy engine from the MySQL Connector connection
        engine = create_engine(f'mysql+mysqlconnector://{username}:{password}@{host}/{database}')

        # Drop the table if it exists
        cur.execute("DROP TABLE IF EXISTS edam_etl")

        # Create the table with columns
        cur.execute("""
            CREATE TABLE edam_etl (
                id INT PRIMARY KEY,
                class_id VARCHAR(255),
                preferred_label VARCHAR(255)
            )
        """)

        #Load the concepts
        df.to_sql(
            name = "edam_etl",
            con = engine,
            if_exists = "append",
            index = False
        )

        print("The table edam_etl has been populated with the EDAM concepts!")

        # Close the connection
        conn.close()

    except mariadb.Error as e:
        print(f"Error connecting to MariaDB Platform: {e}")
        sys.exit(1)
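
In the function above, the commit and the "table has been added" message come before the table is dropped and created, and two different drivers (mariadb and mysql-connector via SQLAlchemy) write to the same table. As a point of comparison, here is a minimal sketch of the drop, create, insert, commit order using only generic DB-API (PEP 249) calls on a single connection. It uses sqlite3 as a stand-in so it can run without a database server; the sample rows are made up, and load_concepts is a hypothetical name.

```python
import sqlite3
from contextlib import closing


def load_concepts(conn, rows) -> int:
    """Recreate edam_etl and load rows using only DB-API calls."""
    with closing(conn.cursor()) as cur:
        cur.execute("DROP TABLE IF EXISTS edam_etl")
        cur.execute(
            """
            CREATE TABLE edam_etl (
                id INT PRIMARY KEY,
                class_id VARCHAR(255),
                preferred_label VARCHAR(255)
            )
            """
        )
        # "?" is the placeholder style sqlite3 uses; check the paramstyle of the real driver.
        cur.executemany(
            "INSERT INTO edam_etl (id, class_id, preferred_label) VALUES (?, ?, ?)",
            rows,
        )
    # Commit after the writes so a success message reflects work that actually happened.
    conn.commit()
    return len(rows)


# Made-up rows and an in-memory database standing in for MariaDB.
sample_rows = [
    (1, "example_class_1", "Example label one"),
    (2, "example_class_2", "Example label two"),
]
conn = sqlite3.connect(":memory:")
try:
    inserted = load_concepts(conn, sample_rows)
    count = conn.execute("SELECT COUNT(*) FROM edam_etl").fetchone()[0]
    print(f"Inserted {inserted} rows; table now holds {count}.")
finally:
    conn.close()
```

With a dataframe, the rows could come from df.itertuples(index=False); alternatively, the whole load could go through the SQLAlchemy engine alone, which would make the second driver unnecessary.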

def download_edam_csv(url: str, version: str) -> Optional[bool]:
    """Download EDAM concepts from GitHub or S3 bucket (CSV file)"""
    print("Downloading the EDAM concepts from GitHub (CSV file)...")
@@ -92,6 +155,8 @@ def print_info_statistics(df: pd.DataFrame) -> None:
else:
print("No data available.")

# def load_edam_dataframe(df: pd.DataFrame) -> None:


def main() -> None:
    """Main function to execute preceding functions"""
@@ -101,6 +166,7 @@ def main() -> None:
if download_edam_csv(url, VERSION):
df: pd.DataFrame = transform_to_dataframe(VERSION)
print_info_statistics(df)
connect_to_mariadb(USERNAME, PASSWORD, PORT, HOST, DB, df)


if __name__ == "__main__":
Binary file added mariadb-1.1.10-cp310-cp310-linux_x86_64.whl
mdsage1 marked this conversation as resolved.
Binary file not shown.
Binary file added packaging-24.0-py3-none-any.whl
Binary file not shown.