[Task] Load the transformed EDAM data into the OC Challenge Service DB #2548

Open
1 task done
Tracked by #2546
tschaffter opened this issue Mar 4, 2024 · 7 comments

@tschaffter
Member

tschaffter commented Mar 4, 2024

What product(s) is this story for?

OpenChallenges

As a user, I want

No response

Description

Depends on #2547

Load the transformed EDAM data generated in #2547 into the OC Challenge Service DB.

Acceptance criteria

Running the following commands downloads, transforms, and loads the EDAM data into the OC Challenge Service DB:

  • nx serve openchallenges-edam-etl
  • nx serve-detach openchallenges-edam-etl

Tasks

  • Read the DB connection config parameters from environment variables defined in .env
  • Define a parameter that controls whether the script should overwrite the data in the DB if data already exists
  • Connect to the OC Challenge Service DB
  • Detect if the EDAM data already exist in the DB
  • Delete the existing data if needed
  • Load the data into the DB
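The detect / delete / load tasks above could be combined roughly as follows. This is a minimal sketch, not the project's implementation. It assumes a DB-API 2.0 connection (such as one returned by mariadb.connect()) and a hypothetical edam_concept table with three hypothetical columns; the real schema and the format of the data transformed in #2547 are not described in this issue. SQLite stands in for MariaDB in the demo only.

```python
import sqlite3
from typing import Iterable, Sequence

# Hypothetical target table and columns; the real OC Challenge Service schema may differ.
TABLE = "edam_concept"
COLUMNS = ("id", "class_id", "preferred_label")


def edam_data_exists(conn) -> bool:
    """Return True if TABLE already contains at least one row."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {TABLE}")
    (count,) = cur.fetchone()
    return count > 0


def load_edam_data(conn, rows: Iterable[Sequence], overwrite: bool = False) -> int:
    """Insert rows into TABLE and return how many were inserted.

    If the table already has data and overwrite is False, nothing is written
    and 0 is returned. With overwrite=True, the DELETE and the INSERTs are
    committed together, so a failure rolls back and keeps the previous data.
    """
    if edam_data_exists(conn) and not overwrite:
        return 0
    rows = list(rows)
    # qmark placeholders are accepted by both the mariadb connector and sqlite3.
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO {TABLE} ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    cur = conn.cursor()
    try:
        cur.execute(f"DELETE FROM {TABLE}")  # no-op when the table is empty
        if rows:
            cur.executemany(sql, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return len(rows)


def demo() -> list:
    """Exercise the load logic against an in-memory SQLite DB (stand-in for MariaDB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {TABLE} (id INTEGER PRIMARY KEY, class_id TEXT, preferred_label TEXT)")
    load_edam_data(conn, [(1, "example_1", "Example 1")])
    load_edam_data(conn, [(2, "example_2", "Example 2")])  # skipped: data already exists
    load_edam_data(conn, [(3, "example_3", "Example 3")], overwrite=True)
    return conn.execute(f"SELECT id FROM {TABLE}").fetchall()
```

Doing the delete and the insert in one transaction means an interrupted overwrite should not leave the table empty, provided the table uses a transactional storage engine and autocommit is off (the default for the mariadb connector).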

Anything else?

See #2524 and its PR to get familiar with the environment of the project openchallenges-edam-etl.

Have you linked this story to a GitHub Project?

  • I have linked this story to a GitHub Project and set its metadata.
@tschaffter tschaffter changed the title Load the transformat EDAM data into the OC Challenge Service DB [Task] Load the transformat EDAM data into the OC Challenge Service DB Mar 4, 2024
@tschaffter tschaffter changed the title [Task] Load the transformat EDAM data into the OC Challenge Service DB [Task] Load the transformed EDAM data into the OC Challenge Service DB Apr 5, 2024
@tschaffter
Member Author

Added to Sprint 24.4

@tschaffter
Member Author

Moved to Backlog

@mdsage1
Contributor

mdsage1 commented May 14, 2024

Update: 05/13/2024
Challenges: N/A
Remaining Tasks: Implement the load step of the ETL process so that the generated dataset is available in MariaDB

@mdsage1
Contributor

mdsage1 commented May 15, 2024

Update: 05/15/2024
Challenges:

@tschaffter I have written Python code to connect to MariaDB. According to the documentation I found, the OC_DB_URL value in the .env file (jdbc:mysql://openchallenges-mariadb:3306/edam_etl) isn't a format the Python connector uses. Resource1 and Resource2 are the resources I followed for connecting to MariaDB.

I've received this error when using jdbc:mysql://openchallenges-mariadb:3306/edam_etl as the Host:
Error connecting to MariaDB Platform: Plugin jdbc:mysql could not be loaded: /usr/lib/x86_64-linux-gnu/libmariadb3/plugin/jdbc:mysql.so: cannot open shared object file: No such file or directory
Warning: command "poetry run python src/main.py" exited with non-zero status code

I get this error when I change the host to openchallenges-mariadb and don't use the OC_DB_URL in the .env file:

Error connecting to MariaDB Platform: Can't connect to local server through socket '/run/mysqld/mysqld.sock' (2)
Warning: command "poetry run python src/main.py" exited with non-zero status code

I'm wondering if the variables are being assigned incorrectly.
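For reference, the Python connector takes the host, port, and database as separate arguments, so a JDBC URL passed as the host is not interpreted as a URL (the first error above suggests the connector tried to treat part of it as a plugin name). If a single URL variable were kept, it could be split into those parts first. A minimal sketch using only the standard library; jdbc_url_to_connect_kwargs is a hypothetical helper, and the user and password would still need to come from separate settings:

```python
from urllib.parse import urlsplit


def jdbc_url_to_connect_kwargs(jdbc_url: str) -> dict:
    """Split a JDBC-style MySQL/MariaDB URL into connector keyword arguments.

    Hypothetical helper: returns host, database and (if present) port, which
    could be passed to mariadb.connect(**kwargs, user=..., password=...).
    """
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    parts = urlsplit(jdbc_url[len(prefix):])  # e.g. mysql://host:3306/db
    if parts.scheme not in ("mysql", "mariadb"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    kwargs = {"host": parts.hostname, "database": parts.path.lstrip("/")}
    if parts.port is not None:
        kwargs["port"] = parts.port
    return kwargs
```

For the value in .env this yields host openchallenges-mariadb, port 3306, and database edam_etl.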

@tschaffter
Member Author

I believe we have solved the issue since your last message. Feel free to remove the config variable OC_DB_URL. We use it in the OC microservices because the DB client we use in Java accepts this URL as a parameter, which may not be the case for the Python DB client you use.
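Without OC_DB_URL, the connection settings and the overwrite parameter from the task list can be read as separate environment variables. The sketch below assumes the variables are already exported into the process environment (how the project loads .env is not shown here), and the names OC_DB_HOST, OC_DB_PORT, OC_DB_USER, OC_DB_PASSWORD, OC_DB_NAME, and OC_DB_OVERWRITE are hypothetical. Failing fast on missing values may also help with the second error above: clients typically fall back to a local socket when the host is missing or set to localhost.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; align them with whatever the project's .env defines.
REQUIRED = ("OC_DB_HOST", "OC_DB_USER", "OC_DB_PASSWORD", "OC_DB_NAME")


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    user: str
    password: str
    database: str
    overwrite: bool  # whether existing EDAM data may be replaced

    def connect_kwargs(self) -> dict:
        """Keyword arguments for mariadb.connect(host=..., port=..., ...)."""
        return {
            "host": self.host,
            "port": self.port,
            "user": self.user,
            "password": self.password,
            "database": self.database,
        }


def load_db_config(env: Optional[Mapping[str, str]] = None) -> DbConfig:
    """Build a DbConfig from environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing DB config variables: {', '.join(missing)}")
    return DbConfig(
        host=env["OC_DB_HOST"],
        port=int(env.get("OC_DB_PORT", "3306")),
        user=env["OC_DB_USER"],
        password=env["OC_DB_PASSWORD"],
        database=env["OC_DB_NAME"],
        overwrite=env.get("OC_DB_OVERWRITE", "false").strip().lower() in ("1", "true", "yes"),
    )
```

A caller could then connect with mariadb.connect(**load_db_config().connect_kwargs()).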

@mdsage1
Contributor

mdsage1 commented May 16, 2024

@tschaffter For PR #2680, it looks like the CI/pr (pull_request) check is failing because it cannot find an installation of MariaDB Connector/C, which the mariadb Python package requires at build time (the log also notes that mariadb 1.1.10 may not support PEP 517 builds). I was able to work around this in the dev container by running the indicated command in the terminal, but I assume that doesn't carry over to the PR. The error says Connector/C must be preinstalled, but I'm unsure how that works within microservices. Should I create a script that performs this step within the app folder?

Using virtualenv: /workspaces/sage-monorepo/apps/openchallenges/edam-etl/.venv
Installing dependencies from lock file

Package operations: 1 install, 0 updates, 0 removals

  • Installing mariadb (1.1.10)

ChefBuildError

Backend subprocess exited when trying to invoke get_requires_for_build_wheel

/bin/sh: 1: mariadb_config: not found
Traceback (most recent call last):
  File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/etc/poetry/venv/lib/python3.10/site-packages/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
    self.run_setup()
  File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 487, in run_setup
    super().run_setup(setup_script=setup_script)
  File "/tmp/tmppwky36mf/.venv/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 27, in <module>
  File "/tmp/tmp783xeam4/mariadb-1.1.10/mariadb_posix.py", line 62, in get_config
    cc_version = mariadb_config(config_prg, "cc_version")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp783xeam4/mariadb-1.1.10/mariadb_posix.py", line 28, in mariadb_config
    raise EnvironmentError(
OSError: mariadb_config not found.

This error typically indicates that MariaDB Connector/C, a dependency which
must be preinstalled, is not found.
If MariaDB Connector/C is not installed, see installation instructions
If MariaDB Connector/C is installed, either set the environment variable
MARIADB_CONFIG or edit the configuration file 'site.cfg' to set the
'mariadb_config' option to the file location of the mariadb_config utility.

at /etc/poetry/venv/lib/python3.10/site-packages/poetry/installation/chef.py:164 in _prepare
160│
161│ error = ChefBuildError("\n\n".join(message_parts))
162│
163│ if error is not None:
→ 164│ raise error from None
165│
166│ return path
167│
168│ def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with mariadb (1.1.10) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "mariadb (==1.1.10)"'.
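The error message above names two ways for the build to locate Connector/C: a mariadb_config utility on PATH, or the MARIADB_CONFIG environment variable (the site.cfg option is not covered here). A small preflight check could run before poetry install, for example as a step in the project's project.json, and fail with a clearer message. This is a hypothetical helper; the actual fix is still to install Connector/C in the dev container and CI image (on Debian/Ubuntu, for example, via the libmariadb-dev package).

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_mariadb_config(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path to mariadb_config, or None if it cannot be found.

    Checks MARIADB_CONFIG first, then falls back to searching PATH.
    """
    env = os.environ if env is None else env
    explicit = env.get("MARIADB_CONFIG")
    if explicit:
        return explicit if os.path.isfile(explicit) else None
    return which("mariadb_config")


def preflight() -> int:
    """Print a diagnostic and return a process exit code (0 = OK)."""
    path = find_mariadb_config()
    if path is None:
        print(
            "mariadb_config not found: install MariaDB Connector/C "
            "(e.g. libmariadb-dev on Debian/Ubuntu) or set MARIADB_CONFIG."
        )
        return 1
    print(f"Using mariadb_config at {path}")
    return 0
```

A standalone script wrapping this would end with raise SystemExit(preflight()), so that a missing dependency stops the install step with a non-zero status.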

@tschaffter
Member Author

tschaffter commented May 16, 2024

@mdsage1 Hint: Look at the files in the EDAM ETL project folder, in particular project.json. There is a perfect place in there where the pip command could be added.
