[KED-2141] Impala support for kedro, custom SQLdataset #504

Closed
noklam opened this issue Sep 8, 2020 · 9 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@noklam
Contributor

noklam commented Sep 8, 2020

Description

I would like to use Kedro with Impala. Is it possible to extend the current SQLDataset so that it can create a connection itself, instead of accepting only a connection string?

Context

Integration with Impala is useful.

Possible Implementation

I am using the Python package impyla. To query anything with pandas, I first need to create a connection.

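import pandas as pd
from impala.dbapi import connect

# Placeholder credentials; replace with real values.
user = "user"
password = "password"
host = "host"

# Open a DB-API connection to Impala, then let pandas run the query over it.
conn = connect(host=host, port=21050, user=user, password=password)
sql = 'select * from trable'
df = pd.read_sql(sql, conn)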

In theory, as long as I can implement the connection creation logic myself, the rest should match the standard pandas SQL dataset.
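To make the idea concrete, here is a minimal sketch of a dataset that builds its own connection from a factory function rather than from a connection string. It is not Kedro's API: `ConnectionQueryDataSet`, `connection_factory`, and `demo_connection` are hypothetical names, the class only imitates the `_load`/`_save`/`_describe` shape of a Kedro dataset, and an in-memory SQLite database stands in for Impala so the example runs without impyla, pandas, or Kedro installed.

```python
import sqlite3
from typing import Any, Callable, Dict, List


class ConnectionQueryDataSet:
    """Hypothetical read-only dataset that creates its own DB-API connection.

    A plain class that imitates the _load/_save/_describe shape of a Kedro
    dataset; it does not subclass anything from Kedro.
    """

    def __init__(self, sql: str, connection_factory: Callable[[], Any]):
        self._sql = sql
        # Called on every load; for Impala this could wrap impala.dbapi.connect.
        self._connection_factory = connection_factory

    def _load(self) -> List[Dict[str, Any]]:
        conn = self._connection_factory()
        try:
            cursor = conn.cursor()
            cursor.execute(self._sql)
            columns = [col[0] for col in cursor.description]
            # With pandas installed, pd.read_sql(self._sql, conn) could be used instead.
            return [dict(zip(columns, row)) for row in cursor.fetchall()]
        finally:
            conn.close()

    def _save(self, data: Any) -> None:
        raise NotImplementedError("This sketch only supports loading.")

    def _describe(self) -> Dict[str, Any]:
        return {"sql": self._sql}


def demo_connection() -> sqlite3.Connection:
    """Stand-in for an Impala connection: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trable (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO trable VALUES (1, 'a'), (2, 'b')")
    return conn


dataset = ConnectionQueryDataSet("select * from trable order by id", demo_connection)
rows = dataset._load()
```

For Impala, the factory could presumably be something like `lambda: connect(host=host, port=21050, user=user, password=password)`, with `_load` returning `pd.read_sql(self._sql, conn)`. That variant is untested here.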

@noklam noklam added the Issue: Feature Request New feature or improvement to existing feature label Sep 8, 2020
@yetudada
Contributor

Hi @noklam! Thank you for this query. We think it may be possible to support Impala in the way that you would prefer by creating your own version of the SQLDataSet. We detail how to create a custom dataset in our documentation: https://kedro.readthedocs.io/en/stable/07_extend_kedro/01_custom_datasets.html

Do let us know if this would be helpful, if so then I'll be able to close this issue.

@noklam
Contributor Author

noklam commented Sep 17, 2020

@yetudada I think that as long as I can create a connection, it should fit the SQLAlchemy API.

However, I am struggling to find a reliable way to connect to Impala on Windows.
I tried ODBC and impyla; both often fail to connect or even freeze the program.

It would be great if someone from the Kedro team has experience with this.

@921kiyo
Contributor

921kiyo commented Oct 2, 2020

I've logged a ticket for how Kedro could support Impala :)

@921kiyo 921kiyo changed the title Impala support for kedro, custom SQLdataset [KED-2141] Impala support for kedro, custom SQLdataset Oct 2, 2020
@brendalf

Is there anything left to do here?

@merelcht
Member

merelcht commented Mar 8, 2021

Hi @brendalf, there hasn't been an attempt yet to implement this, so if you're interested in helping out we'd be more than happy to accept a PR 😄

@noklam
Contributor Author

noklam commented Mar 10, 2021

Any thoughts from the Kedro team? If not, I may try to implement one, but I cannot guarantee it will work on every platform.

I am mainly a Windows user, and getting Impala to work on Windows is a bit tricky.

@merelcht
Member

@noklam This isn't a priority for the team at the moment, so we'd be very happy to accept a PR.

@noklam
Contributor Author

noklam commented Mar 10, 2021 via email

@stale

stale bot commented May 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


5 participants