I have an issue running Kedro with ThreadRunner to execute the following pipeline:
The primary layer shown in the Kedro Viz above is a series of 21 SQLScriptDataset objects (a pandas.sql_dataset.SQLQueryDataset subclass which formats input queries in a special way using parameters in the catalog and then calls super().__init__).
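The subclassing pattern described above can be sketched roughly as follows. This is not the actual SQLScriptDataset code, which is not shown in this report: BaseQueryDataset is a hypothetical stand-in for SQLQueryDataset so the snippet runs without Kedro installed, and the query_params argument and the use of str.format for templating are assumptions made for illustration.

```python
from typing import Optional


class BaseQueryDataset:
    """Hypothetical stand-in for SQLQueryDataset so the sketch runs without Kedro."""

    def __init__(self, sql: str, credentials: dict) -> None:
        self.sql = sql
        self.credentials = credentials


class SQLScriptDataset(BaseQueryDataset):
    """Fills a query template with catalog parameters, then defers to the parent."""

    def __init__(
        self,
        sql: str,
        credentials: dict,
        query_params: Optional[dict] = None,  # assumed name, not from the report
    ) -> None:
        # Format the query first, then let the parent class handle the rest.
        formatted_sql = sql.format(**(query_params or {}))
        super().__init__(sql=formatted_sql, credentials=credentials)


dataset = SQLScriptDataset(
    sql="SELECT * FROM {schema}.orders WHERE order_year = {year}",
    credentials={"con": "sqlite://"},
    query_params={"schema": "sales", "year": 2024},
)
print(dataset.sql)  # SELECT * FROM sales.orders WHERE order_year = 2024
```

In the real project the parent would be the Kedro dataset class, which also sets up the database connection when the dataset is created.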
This Kedro pipeline is triggered as part of a CommandJob in Azure Machine Learning (AML), using a command_job.py which runs a Kedro session with something like this:
In AML, these jobs can be run on two types of compute: a Compute Instance, which is an Ubuntu VM used for development, and Clusters, which are managed infrastructures that allow for the creation of single/multi-node computes for deployment.
When executing the CommandJob, essentially running kedro run with ThreadRunner on a Cluster, the job fails. However, this issue does not occur when running the same job on a Compute Instance, or when run locally from source using kedro run.
These command jobs run with the same environment image in both cases.
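For readers less familiar with ThreadRunner: it runs nodes from a thread pool, so several dataset loads can reach the database at the same time. The sketch below imitates that access pattern with the standard library only. It swaps in sqlite3 for Oracle and ThreadPoolExecutor for ThreadRunner, and load_one and load_all are hypothetical names, so it illustrates the concurrency shape rather than reproducing the failure.

```python
from __future__ import annotations

import sqlite3
from concurrent.futures import ThreadPoolExecutor


def load_one(name: str) -> tuple[str, int]:
    """Simulate one dataset load on a connection owned by the calling thread."""
    # Each call opens and closes its own connection, so no connection
    # object is ever shared between worker threads.
    conn = sqlite3.connect(":memory:")
    try:
        (value,) = conn.execute("SELECT 1").fetchone()
        return name, value
    finally:
        conn.close()


def load_all(names: list[str], max_workers: int = 4) -> dict[str, int]:
    """Load all datasets concurrently, at most max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(load_one, names))


# 21 loads, matching the number of datasets in the primary layer.
results = load_all([f"dataset_{i}" for i in range(21)], max_workers=4)
print(len(results))  # 21
```

Lowering max_workers here reduces how many connections are open at once, which is the same knob the runner configuration exposes.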
Steps to Reproduce
Set up an AML Cluster with the specified environment.
Execute the CommandJob to run the Kedro pipeline with ThreadRunner.
Observe the error.
Expected Result
I would expect the job to run successfully on the cluster, as it does on other compute instances with the same configuration.
gitgud5000 changed the title to "Database Connection Failure on AML clusters using kedro ThreadRunner" on Jun 13, 2024
Hi @ArmandoRl1, could you give more details on your setup? @gitgud5000 already gave a good writeup but the more information we have about this the better.
Problem/Error
After most or all of the datasets in the primary layer are loaded, SQLAlchemy produces the following error:
Actual Result
The job fails with the following error:
Attempts to Resolve
Changed max_workers in the runner configuration.
Tried oracledb and cx-Oracle, no luck.
Logs
Here is a log file of a run with 'echo_pool': 'debug' and a similar setup, with 5 SQLScriptDataset objects as input.
Running in AzureML.log
Your Environment
kedro version: 0.19.6
kedro-datasets version: 3.0.0
cx-Oracle version: 8.3.0
Standard_D16_v3