feat(openchallenges): add Mariadb Connection and Load the EDAM Concepts #2680

Closed
wants to merge 26 commits into from

mdsage1 (Contributor) commented May 15, 2024

Description

An EDAM ETL process needs to be developed to incorporate the EDAM ontology into the MariaDB database, linking the ontology to existing data. This PR addresses the load portion.

Related Issue

Contributes to #2524
Contributes to #2548

Fixes #2548

Changelog

  1. Create a connection to MariaDB using Python
  2. Load the data from the pandas DataFrame that matches the content of this file
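The two changelog steps above could look roughly like the following minimal sketch. It assumes the `mysql.connector` module (available from both the `mysql-connector` package referenced in this PR and Oracle's `mysql-connector-python`), a JDBC-style URL such as the `jdbc:mysql://openchallenges-mariadb:3306/challenge_service` value printed in the container logs, and a DataFrame whose column names match the target table. The function names and the `edam_concept` table are hypothetical; this is not the PR's `connect_to_mariadb` implementation.

```python
import math
from urllib.parse import urlparse


def parse_jdbc_url(jdbc_url: str) -> tuple[str, int, str]:
    """Split a JDBC MySQL URL into (host, port, database)."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"Not a JDBC URL: {jdbc_url!r}")
    parsed = urlparse(jdbc_url[len("jdbc:"):])  # e.g. mysql://host:3306/db
    return parsed.hostname, parsed.port or 3306, parsed.path.lstrip("/")


def build_insert_sql(table: str, columns: list[str]) -> str:
    """Build a parameterized INSERT using the %s placeholders mysql.connector expects."""
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


def clean_row(values) -> tuple:
    """Replace float NaN (pandas' missing value) with None so it is stored as SQL NULL."""
    return tuple(None if isinstance(v, float) and math.isnan(v) else v for v in values)


def load_dataframe(df, table: str, jdbc_url: str, user: str, password: str) -> int:
    """Insert every row of a pandas DataFrame into `table` and return the row count."""
    import mysql.connector  # imported lazily so the helpers above work without the driver

    host, port, database = parse_jdbc_url(jdbc_url)
    rows = [clean_row(r) for r in df.itertuples(index=False, name=None)]
    sql = build_insert_sql(table, list(df.columns))
    conn = mysql.connector.connect(
        host=host, port=port, user=user, password=password, database=database
    )
    try:
        cursor = conn.cursor()
        cursor.executemany(sql, rows)  # one parameterized statement, many rows
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

Using `%s` placeholders with `executemany` lets the driver handle quoting, and converting NaN to `None` stores missing values as SQL `NULL` instead of failing on a float NaN.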

Preview

This is the output as the project is run:

[Screenshot: output of the project run]

This runs automatically, but you need to run nx serve-detach <project_name>.


mdsage1 marked this pull request as ready for review June 27, 2024 15:38
tschaffter (Member)

@mdsage1 Could you please share an update here about this PR?

mdsage1 (Contributor Author) commented Aug 1, 2024

@tschaffter This PR was marked ready for review. I thought I was getting emails about the checks failing, but they were not about this PR; I seem to be getting emails about other PRs. I've synced this branch with the changes in the main branch, and all CI/CD checks pass. Please review the changes and let me know your thoughts. Thanks!

mdsage1 requested a review from tschaffter August 1, 2024 22:50

tschaffter (Member) left a comment


Thanks for the update. Unfortunately, I'm encountering several issues (see comments).

Please add or update the following information in the original post of this PR:

  • Changelogs
  • Step by step instructions on how to use the project openchallenges-edam-etl (prepare, build, run, etc.)
  • Include a Preview section that shows that the PR works (refer to other PRs for examples)

poetry install --with prod,dev
Member


The command nx prepare openchallenges-edam-elt shows this warning:

Group(s) not found: dev (via --with), prod (via --with)

Contributor Author


If this is the command you used, the l and the t are in the wrong order: nx prepare openchallenges-edam-elt should be nx prepare openchallenges-edam-etl. I will run tests after Sage Week to confirm. I'm currently trying to finish infrastructure changes for an imaging de-identification challenge whose submission queue opens on Monday.

[tool.poetry.group.prod.dependencies]

[tool.poetry.group.test.dependencies]
mysql-connector = "^2.2.9"
Member


The version should be pinned. You can't simply change this value; instead, reinstall the package with Poetry while specifying the version: poetry add mysql-connector@2.2.9.

Same comments for the packages below.

Contributor Author


We discussed the problems with using Poetry to install this package back in May; it did not work. I will try again, and hopefully a Poetry update has since improved support for MariaDB and MySQL.

Member


The following command fails:

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ nx serve openchallenges-edam-etl

> nx run openchallenges-edam-etl:serve

> poetry run python src/main.py

Traceback (most recent call last):
  File "/workspaces/sage-monorepo/apps/openchallenges/edam-etl/src/main.py", line 1, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
Warning: command "poetry run python src/main.py" exited with non-zero status code
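The `ModuleNotFoundError` suggests that `poetry run` used an environment where the project's dependencies were not installed. As a hedged sketch, a start-up check built on `importlib.util.find_spec` can report every missing module at once instead of failing on the first import. The `REQUIRED_MODULES` list and the `check_environment` helper are hypothetical; the list would need to mirror the actual imports in `src/main.py`, and the check would have to run before those third-party imports.

```python
import importlib.util

# Hypothetical list; it should mirror the third-party imports used by src/main.py.
REQUIRED_MODULES = ("numpy", "pandas", "requests", "mysql.connector")


def missing_modules(names):
    """Return the module names that cannot be found in the current interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports "a" first and raises if "a" itself is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_environment(names=REQUIRED_MODULES):
    """Exit with one readable message listing every missing module."""
    absent = missing_modules(names)
    if absent:
        raise SystemExit(
            "Missing modules: " + ", ".join(absent)
            + ". Install the project's dependencies with Poetry before running src/main.py."
        )
```

Calling `check_environment()` as the first statement of `src/main.py`, ahead of `import numpy as np`, would replace the traceback with a single message naming every missing package.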

Member


Running the script with Docker fails:

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ nx serve-detach openchallenges-edam-etl

> nx run openchallenges-edam-etl:serve-detach

> docker/openchallenges/serve-detach.sh openchallenges-edam-etl

[+] Running 2/2
 ✔ Container openchallenges-mariadb   Healthy  0.6s
 ✔ Container openchallenges-edam-etl  Started  1.2s

 NX   Successfully ran target serve-detach for project openchallenges-edam-etl (2s)

vscode@12a2d2b44a0f:/workspaces/sage-monorepo$ docker logs openchallenges-edam-etl
/usr/local/lib/python3.12/site-packages/mysql/connector/abstracts.py:130: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if group is 'connector_python':
/usr/local/lib/python3.12/site-packages/mysql/connector/optionfiles.py:98: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if group is 'connector_python':
EDAM Version: None
OC DB URL: jdbc:mysql://openchallenges-mariadb:3306/challenge_service
Downloading the EDAM concepts from GitHub (CSV file)...
Error downloading EDAM concepts: 404 Client Error: Not Found for url: https://raw.githubusercontent.com/Sage-Bionetworks/edamontology/main/releases/EDAM_None.csv
Traceback (most recent call last):
  File "/opt/app/src/main.py", line 168, in <module>
    main()
  File "/opt/app/src/main.py", line 164, in main
    connect_to_mariadb(USERNAME, PASSWORD, PORT, HOST, DB, df)
                                                           ^^
UnboundLocalError: cannot access local variable 'df' where it is not associated with a value
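The log points to two separate problems: the EDAM version was never set (the URL ends in `EDAM_None.csv`), and after the download failed, the script printed the error and still called `connect_to_mariadb(..., df)` with `df` unassigned. Below is a minimal sketch of a fail-fast flow. `edam_csv_url` and `run` are hypothetical names; the download and load steps are passed in as callables so the control flow can be shown without network or database access, and the setting that supplies the version is left out because its name is not known here. The URL pattern is copied from the failing request in the log.

```python
EDAM_CSV_URL = (
    "https://raw.githubusercontent.com/Sage-Bionetworks/edamontology/"
    "main/releases/EDAM_{version}.csv"
)


def edam_csv_url(version):
    """Return the CSV URL for an EDAM release, refusing to build 'EDAM_None.csv'."""
    if not version:
        raise ValueError("EDAM version is not set")
    return EDAM_CSV_URL.format(version=version)


def run(version, download, load):
    """Download, then load; a failed download stops here instead of reaching load()."""
    url = edam_csv_url(version)
    try:
        df = download(url)
    except Exception as exc:  # broad on purpose: any download failure should abort
        # Re-raise with context rather than printing and continuing with `df` unset.
        raise RuntimeError(f"Error downloading EDAM concepts from {url}") from exc
    load(df)
    return df
```

Raising instead of printing means a missing version or a 404 stops the container with a clear cause, rather than surfacing later as an `UnboundLocalError`.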

mdsage1 (Contributor Author) commented Aug 2, 2024

> Thanks for the update. Unfortunately, I'm encountering several issues (see comments).
>
> Please add or update the following information in the original post of this PR:
>
>   • Changelogs
>   • Step by step instructions on how to use the project openchallenges-edam-etl (prepare, build, run, etc.)
>   • Include a Preview section that shows that the PR works (refer to other PRs for examples)

There is a Preview section in the initial PR comment that shows the PR worked when it was marked ready for review two months ago. It would be helpful if changes to how PR documentation should be handled were well documented, and were not applied to PRs that were already awaiting review before the change was introduced. I will document the changelogs as best I can; however, they may not be complete, since changelogs weren't required when this work was done. Furthermore, this PR is an addition rather than a change, so I'm not sure what you'd like included in the changelog. Could you please provide an example of a changelog for the addition of a feature? This feature was assigned to me in several steps, and since the feature request was broken into multiple issues, I'm unsure whether I should simply duplicate the issue description in the PR. I'm also unsure where the changelog should be located, given the complexity of the OpenChallenges infrastructure. Please let me know where best to document step-by-step instructions for using the project, since this is the first new project I've created. Thanks!

tschaffter (Member)

Warning

This branch was created before the repo migrated to pnpm as the Node.js package manager. Consider creating a new branch from main. Alternatively, we could meet in a call to update this branch together.

tschaffter (Member)

This PR hasn't been updated in a while so I'll go ahead and close it. Since this branch still relies on the Yarn package manager, it would be easier to create it again from main rather than update it.

Successfully merging this pull request may close these issues.

[Task] Load the transformed EDAM data into the OC Challenge Service DB
2 participants