Create a new dataset allowing querying a Google Bigquery table with a custom SQL statement #1039

Merged: 20 commits, Nov 23, 2021

simonpicard
Contributor

Description

Adds a new dataset, GBQQueryDataset, which queries a Google BigQuery table with a custom SQL statement. The query can be supplied either directly as a SQL string parameter of the dataset or as a path to a file containing the query.

Linked to #1032
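
For readers who want a feel for the "inline SQL or path to a query file" behaviour described above, here is a minimal, self-contained sketch. It is not the code from this PR: the class name `QueryDatasetSketch`, its parameter names, the "exactly one of the two" rule, and the injected `run_query` callable are all assumptions made for illustration, so that the example runs without BigQuery credentials.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class QueryDatasetSketch:
    """Hypothetical stand-in for a query-based dataset (not Kedro's code).

    It only illustrates accepting the SQL either inline or from a file.
    """

    def __init__(
        self,
        sql: Optional[str] = None,
        filepath: Optional[str] = None,
        run_query: Optional[Callable[[str], Any]] = None,
    ) -> None:
        # Assumption of this sketch: exactly one source of SQL is allowed.
        if (sql is None) == (filepath is None):
            raise ValueError("Provide exactly one of 'sql' or 'filepath'.")
        self._sql = sql
        self._filepath = Path(filepath) if filepath is not None else None
        # `run_query` is injected so the sketch needs no BigQuery access.
        # By default it simply returns the query text unchanged.
        self._run_query = run_query or (lambda query: query)

    def _resolve_sql(self) -> str:
        # Prefer the inline statement; otherwise read the query file.
        if self._sql is not None:
            return self._sql
        return self._filepath.read_text(encoding="utf-8")

    def load(self) -> Any:
        # Hand the resolved SQL to whatever executes the query.
        return self._run_query(self._resolve_sql())


if __name__ == "__main__":
    # Inline SQL; the table name below is made up for the example.
    dataset = QueryDatasetSketch(sql="SELECT * FROM my_dataset.my_table")
    print(dataset.load())
```

In a real pipeline the injected callable would be replaced by an actual BigQuery query call, and the dataset would normally be declared in the data catalog rather than constructed by hand.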

Development notes

  • Created GBQQueryDataset in kedro/extras/datasets/pandas/gbq_dataset.py
  • Tested the dataset, using the existing tests for GBQTableDataset and SQLQueryDataset as inspiration
  • Added GBQQueryDataset to the __init__ of the kedro/extras/datasets/pandas module
  • Included GBQQueryDataset in docs/source/15_api_docs/kedro.extras.datasets.rst for auto doc generation

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@datajoely
Contributor

This looks great @simonpicard ! We'll review properly in the week. Thanks for contributing!

datajoely requested review from SajidAlamQB, lorenabalan and merelcht, and removed the request for idanov and yetudada (November 15, 2021 09:54)
Member

merelcht left a comment

LGTM! 👍 Thanks for the contribution @simonpicard!

Contributor

lorenabalan left a comment

LGTM, happy to merge once all comments have been addressed. 🙌 Thank you for the contribution!

simonpicard and others added 9 commits November 17, 2021 16:48
Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
Co-authored-by: Lorena Bălan <lorena.balan@quantumblack.com>
Co-authored-by: Lorena Bălan <lorena.balan@quantumblack.com>
Co-authored-by: Lorena Bălan <lorena.balan@quantumblack.com>
@simonpicard
Contributor Author

simonpicard commented Nov 23, 2021

Thanks for the review, team!

> LGTM, happy to merge once all comments have been addressed. 🙌 Thank you for the contribution!

@lorenabalan I updated the code according to your comments. Please let me know if you notice anything else that needs updating.

Contributor

lorenabalan left a comment

Excellent, thank you for following up, @simonpicard! I left just a couple of minor docstring comments; otherwise LGTM. Will merge this today.

simonpicard and others added 2 commits November 23, 2021 13:58
Co-authored-by: Lorena Bălan <lorena.balan@quantumblack.com>
Co-authored-by: Lorena Bălan <lorena.balan@quantumblack.com>
@lorenabalan
Contributor

I'll take over this one and get it merged. Not sure why the CI checks are not showing.

lorenabalan merged commit ce58b4d into kedro-org:master Nov 23, 2021
Galileo-Galilei pushed a commit to Galileo-Galilei/kedro that referenced this pull request Feb 19, 2022
lvijnck pushed a commit to lvijnck/kedro that referenced this pull request Apr 7, 2022
Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
4 participants