-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to load SQL queries from a file for SQLQueryDataSet #887
Add an option to load SQL queries from a file for SQLQueryDataSet #887
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @BenjaminLevyQB ! Don't forget to update the RELEASE.md
to include the changes as well as your name on the contributor list.
Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
…jaminLevyQB/kedro into feature/sql-dataset-filepath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your excellent contribution! ⭐ I like this feature a lot but have two little concerns:
- Is this technically a breaking change? The signature for
SQLQueryDataSet
has changed so that anyone calling it with positional arguments before (likeSQLQueryDataSet("SELECT * FROM table", credentials)
presumably won't work now. The principal use will be through catalog.yml, for which this doesn't matter, but I think anyone using the Python API will suffer a breaking change. Curious if others agree here - if so then this should go to develop rather than master - What happens if a user specifies both
sql
andfilepath
? Currently it looks likefilepath
will take precedent, but maybe this should throw an error message instead.
What's left on this PR - I think it should be ready to go right? We should close #886 for reasons outlined here |
@datajoely looks good to me -- sorry about the delay, can I get a review on this so it can be merged? |
@BenjaminLevyQB I think there a few comments from @AntonyMilneQB (who is now on parental leave) to address |
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
Updated in response to @AntonyMilneQB's comments:
|
…to feature/sql-dataset-filepath
Can I get a review on the updated code? Thanks! |
Hi @BenjaminLevyQB, thanks for your updates on this and apologies for the delayed review - I just got back from leave. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice feature addition! Left a few suggestions, let me know what you think. 😊
Here you can find all available arguments for `open`: | ||
https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open | ||
All defaults are preserved, except `mode`, which is set to `r` when loading | ||
and to `w` when saving. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and to `w` when saving. | |
and to `w` when saving. |
This can be removed, we don't actually set mode to w
ourselves anymore.
|
||
def _load(self) -> pd.DataFrame: | ||
load_args = self._load_args.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this return a shallow copy instead of deepcopy? I think the latter is the one we want for dictionaries generally. 🤔
|
||
def _load(self) -> pd.DataFrame: | ||
load_args = self._load_args.copy() | ||
|
||
if "sql" not in load_args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why not if "filepath" in load_args
? 😅 Or if self._filepath
if we go with the previous suggestion.
@BenjaminLevyQB amazing work so far - some little bits needed before we can get this in. I want to release 0.17.6 shortly so this would be a wonderful addition. |
…jaminLevyQB/kedro into feature/sql-dataset-filepath
Thank you @lorenabalan and @AntonyMilneQB for you excellent comments! The requested changes have been made!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for all your work on this @BenjaminLevyQB. I'll make it sure it gets merged 🙂
…dro-org#887) Signed-off-by: Laurens Vijnck <laurens_vijnck@mckinsey.com>
Description
Adds a new filepath argument to the constructor of SQLQueryDataSet that allows the user to specify a file containing a SQL query instead of having to write the entire query in the catalog, allowing users to keep the catalog clean and take advantage of SQL syntax highlighting in the case of long queries
Development notes
Checklist
RELEASE.md
fileNotice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.