Description
When trying to refer to a database/table in Databricks, we get:
kedro.io.core.DataSetError: Failed while loading data from data set SparkHiveDataSet
cannot resolve '`namespace`' given input columns: [databaseName];;
Context
We want to access data catalogs within Databricks by referring to database/table pairs instead of DBFS file paths, for better governance. SparkHiveDataSet is the current method until the Delta dataset implementation comes in.
Steps to Reproduce
1. Create a SparkHiveDataSet.
2. Run a Kedro pipeline with the dbconnect flow and observe the error.
Expected Result
The dataset should load the data from the table so that the pipeline can process it further.
Actual Result
Loading fails with kedro.io.core.DataSetError.
My Debug
The issue comes from this line in the codebase: the column name "namespace" is hardcoded. In Databricks, the command `show databases` returns a column called databaseName, hence the error above about not being able to resolve the column.
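As an illustration of the debug note above, here is a minimal sketch of choosing the database column by inspecting the output of `show databases` instead of hardcoding it. The helper name `resolve_database_column` and the constant `CANDIDATE_DB_COLUMNS` are hypothetical and not part of Kedro; the two candidate names are the only ones mentioned in this issue.

```python
from typing import Iterable

# Column names seen in this issue: "namespace" is what the dataset expects,
# "databaseName" is what the error message shows Databricks returning.
# (Hypothetical constant, not part of Kedro.)
CANDIDATE_DB_COLUMNS = ("namespace", "databaseName")


def resolve_database_column(columns: Iterable[str]) -> str:
    """Return the first known database-name column present in `columns`."""
    available = list(columns)
    for name in CANDIDATE_DB_COLUMNS:
        if name in available:
            return name
    raise ValueError(
        f"None of {CANDIDATE_DB_COLUMNS} found in SHOW DATABASES output: {available}"
    )
```

With a Spark session, the input would presumably be `spark.sql("show databases").columns`, and the returned name could then be used in the filter in place of the hardcoded string.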
kedro.io.core.DataSetError: Failed while loading data from data set SparkHiveDataSet
cannot resolve '`namespace`' given input columns: [databaseName];;
Your Environment
Kedro version used: 0.17.5
Python version used : 3.7
Operating system and version: Mac/BigSur
Databricks: Azure, DBR 7.1 LTS
Hi @vihag, our Delta dataset implementation will be in a release we want to get out in the next month. Until then, it's probably easier for you to subclass the HiveDataSet you want to use and override/extend it to work for your purposes.
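One way the suggested subclassing might look is sketched below as a mixin, so it can be read without Kedro installed. It rests on assumptions that should be checked against the installed Kedro version: that the dataset's existence check lives in a method named `_exists`, and that the dataset exposes `_get_spark()`, `_database` and `_table`. The class name `DatabaseColumnAgnosticExists` is hypothetical.

```python
class DatabaseColumnAgnosticExists:
    """Mixin overriding `_exists` so the database column name is not hardcoded.

    Assumption: the host class provides `_get_spark()`, `_database`, `_table`.
    """

    # Column names mentioned in this issue for the `show databases` output.
    _DB_COLUMNS = ("namespace", "databaseName")

    def _exists(self) -> bool:
        spark = self._get_spark()
        databases = spark.sql("show databases")
        # Pick whichever known column the current runtime actually returns.
        db_column = next(
            (c for c in self._DB_COLUMNS if c in databases.columns), None
        )
        if db_column is None:
            return False
        known = {row[db_column] for row in databases.collect()}
        if self._database not in known:
            return False
        tables = spark.sql(f"show tables in {self._database}")
        return any(row["tableName"] == self._table for row in tables.collect())


# Hypothetical usage, assuming Kedro 0.17.x and PySpark are installed:
# from kedro.extras.datasets.spark import SparkHiveDataSet
#
# class DatabricksHiveDataSet(DatabaseColumnAgnosticExists, SparkHiveDataSet):
#     pass
```

The subclass could then be referenced from the catalog by its full import path. Other methods of the dataset may need similar treatment; this sketch only covers the existence check.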
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.