Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIDataSet reduce number of arguments and move to Load args #1633

Merged
merged 22 commits into from
Sep 12, 2022

Conversation

daBlesr
Copy link
Contributor

@daBlesr daBlesr commented Jun 19, 2022

Description

Resolves #711

Development notes

I have changed the implementation of APIDataSet by removing all the specific configuration parameters, and generally passing load_args to a request instance as keyword arguments. Also, users are now able to construct different kind of Authenticator objects to use with requests.

Checklist

  • Read the contributing guidelines
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@daBlesr daBlesr requested review from yetudada and idanov as code owners June 19, 2022 02:04
daBlesr added 6 commits June 19, 2022 04:15
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
…tantiate authenticator objects from config.

Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
…eaming, certs etc.

Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
@daBlesr daBlesr force-pushed the feature/api-dataset-load-args branch from b8ba671 to 82584e6 Compare June 19, 2022 02:20
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for working on this @daBlesr! Really nice that someone has picked this up since it had got a bit stuck.

Moving arguments to load_args definitely makes sense to me here given the API of requests.request that you posted in the issue. Like you suggest, we aim to keep the interface as close to the underlying library as possible so this seems like a very good improvement to our APIDataSet 👍

Your changes to auth also very much make sense and I totally see where you're coming from here, but I think they should not be part of this PR. Let's limit the scope here to just be the load_args part. The auth question is more of an open question (as are credentials in kedro in general) - I will post more about this in #711.

RELEASE.md Outdated Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Outdated Show resolved Hide resolved
daBlesr added 3 commits June 20, 2022 18:52
…dded migration docs. Changed test for simple auth, to include `load_args`

Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
@daBlesr
Copy link
Contributor Author

daBlesr commented Jun 20, 2022

Hi @AntonyMilneQB, thanks for reviewing. I have removed the auth changes. You've seen a working example, so it might come in handy in the future when generalising for other Datasets 😉 . Also I have updated the docs as per your comment.

@antonymilne antonymilne self-requested a review June 20, 2022 20:21
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, especially on the tests!! Thank you so much for this contribution. I have lots of small comments and questions but nothing big so very happy to approve this 😀

One potential problem I foresee here is that it seems that some requests arguments expect tuple rather than just a general Iterable, notably auth and cert. This is a problem because when a user specifies these arguments in yaml, they always come out as a list. This was one of the original motivations for #1101.

The only fix I can think for this is for us to manually go through load_args and convert anything that is a list to a tuple (checking for Iterable won't work for at least cert because it could also be a string, which we don't want to convert to tuple!). That seems kind of ugly though. Interested what you think! Possibly our great configuration overhaul will provide a better solution here (like a native way of putting Python lists into yaml e.g. #1011).

Also interested in seeing you have any comments on this issue on credentials, partly inspired by the problems you've uncovered here! 🙏 #1646

RELEASE.md Outdated Show resolved Hide resolved
RELEASE.md Outdated Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Outdated Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Outdated Show resolved Hide resolved
kedro/extras/datasets/api/api_dataset.py Outdated Show resolved Hide resolved
tests/extras/datasets/api/test_api_dataset.py Show resolved Hide resolved
tests/extras/datasets/api/test_api_dataset.py Outdated Show resolved Hide resolved
tests/extras/datasets/api/test_api_dataset.py Outdated Show resolved Hide resolved
daBlesr added 4 commits June 25, 2022 16:07
… converted to tuple before sending the request. Changed docs. Added tests.

Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion from lists to tuples seems good, but I'm slightly confused about what you ended up doing with the types for credentials/auth - feels like you opted to go for something non-breaking here but didn't revert everything you intended to? I don't mind where we end up here (your original solution vs. this new one) so long as all the code, docstrings, release notes are consistent on it. Please do feel free to merge whenever you're happy with it 🙂

Thank you again for all your work on this!

kedro/extras/datasets/api/api_dataset.py Show resolved Hide resolved
auth = tuple(auth)
self._auth = credentials or self._load_args_auth
self._cert = self._load_args.pop("cert", None)
self._timeout = self._load_args.pop("timeout", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: when the user hasn't specified a timeout, it will end up as timeout: None in self._request_args and hence the _describe method. Might be slightly neater to only pass these arguments to _request_args if we really want to - something like:

if "cert" is in self._load_args:
    self._load_args["cert"] = self._convert_type(self._load_args["cert"])
# just pass to self._request_args using **self._load_args

timeout: int = 60,
credentials: Union[Iterable[str], AuthBase] = None,
load_args: Dict[str, Any] = None,
credentials: Union[Tuple[str, str], List[str]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to revert the type hint and docstring here if it can still be AuthBase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still doubting about this, since technically it works. However, I think it might be confusing to the reader, since APIDataSet are automatically constructed from the catalog, and AuthBases cannot be inferred this way. So I am wondering, in what cases would this constructor type be used when using Kedro? At this moment, for a user that wants to do a request using an AuthBase instance, should imo take a different approach, such as extending APIDataSet. However, if you have other ideas, I'll put AuthBase in the typing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very slow reply I know, sorry!! 😅 The use case here would be that the APIDataSet is constructed directly using the Python API rather than from the data catalog. In this case you can easily pass arbitrary Python objects to the constructor. This is unusual but does occur and is something we support. So I've put AuthBase back in the type hint 🙁

RELEASE.md Outdated Show resolved Hide resolved
tests/extras/datasets/api/test_api_dataset.py Show resolved Hide resolved
daBlesr added 3 commits June 29, 2022 14:30
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
…ture/api-dataset-load-args

� Conflicts:
�	kedro/extras/datasets/api/api_dataset.py

Signed-off-by: Niels Drost <codingdutchman@gmail.com>
Signed-off-by: Niels Drost <codingdutchman@gmail.com>
@afuetterer
Copy link

Hi, what is the status of this PR? Do you need more time working on this?

Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
antonymilne and others added 2 commits September 12, 2022 09:25
Signed-off-by: Antony Milne <antony.milne@quantumblack.com>
@antonymilne
Copy link
Contributor

Hi @afuetterer, thanks for the push here and sorry for the huge delay - this was 99% ready to merge and then seemed to have stalled. I've just polished it off now and I think we should be ready to merge.

Thank you @daBlesr for all your hard work on this, it's much appreciated 🙏

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for the contribution @daBlesr ⭐ and for finishing the PR @AntonyMilneQB ⭐

@antonymilne antonymilne merged commit d29eb3b into kedro-org:develop Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants