Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2565] Register DataCatalog in settings instead of through the registration hook. #1064

Merged
merged 11 commits into from
Dec 6, 2021

Conversation

merelcht
Copy link
Member

@merelcht merelcht commented Nov 23, 2021

Description

DataCatalog is currently registered via hooks - however we would like this to be via settings.py instead.

Development notes

  • Followed the same refactoring the DataCatalog registration as done for the ConfigLoader
  • Kept the DataCatalog as part of the KedroContext for this PR. The removal will happen seperately.
  • Default register_catalog hook_impl has been removed
  • DATA_CATALOG_CLASS has been added to _ProjectSettings class impl
  • DATA_CATALOG_CLASS has been added (commented) to settings.py
  • Default DATA_CATALOG_CLASS is set to kedro.io.DataCatalog
  • Left the ProjectHooks in place for now as we have a separate task in the backlog to get rid of the registration hooks fully.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@merelcht merelcht requested a review from idanov as a code owner November 23, 2021 17:08
@merelcht merelcht self-assigned this Nov 23, 2021
@merelcht merelcht force-pushed the KED-2565-data-catalog-in-settings branch from 5f54348 to f97f6b1 Compare November 23, 2021 17:16
@merelcht merelcht requested a review from yetudada as a code owner November 23, 2021 17:16
Copy link
Contributor

@lorenabalan lorenabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking this up and making it such a clean PR to review. Made a few suggestions, let me know what you think. Everything looks good otherwise.

@merelcht merelcht force-pushed the KED-2565-data-catalog-in-settings branch from 1b02c30 to 02f55c9 Compare November 25, 2021 11:59
@merelcht merelcht requested a review from lorenabalan November 29, 2021 10:09
Copy link
Contributor

@lorenabalan lorenabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff! 💥

@merelcht merelcht force-pushed the KED-2565-data-catalog-in-settings branch from 03798c2 to 4c6a7b9 Compare November 29, 2021 17:28
Copy link
Contributor

@antonymilne antonymilne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General 👍 👍 👍 from me but with a few questions...

  • Do we want to be exposing DATA_CATALOG_ARGS as we do CONFIG_LOADER_ARGS? I don't know if this has been considered already. Personally I have no strong feelings on it - we can always add it on in a separate PR or some time in the future if it turns out it's needed. I suspect @lorenabalan or @idanov would know the answer to whether we want to do this anyway
  • Maybe I missed it, but I couldn't see any tests to make sure this is actually working ok? You have the BadCatalog but I didn't see any happy path tests analogous to the config loader tests that use MyConfigLoader

Completely separate from this PR, but before we release 0.18 we should really go through the release notes and reorganise them. e.g. to put all the registration hook change stuff together. Same goes for settings.py actually, to make sure that it's all nicely ordered and clear and consistent.

@merelcht
Copy link
Member Author

  • Do we want to be exposing DATA_CATALOG_ARGS as we do CONFIG_LOADER_ARGS? I don't know if this has been considered already. Personally I have no strong feelings on it - we can always add it on in a separate PR or some time in the future if it turns out it's needed. I suspect @lorenabalan or @idanov would know the answer to whether we want to do this anyway

Good question. I'm not sure if we've discussed this or not 🤔 The DataCatalog is constructed slightly differently from the ConfigLoader though, it calls from_config which takes specific parameters so we probably need to add some input validation. I haven't really looked at it in depth though so maybe I'm talking nonsense.

  • Maybe I missed it, but I couldn't see any tests to make sure this is actually working ok? You have the BadCatalog but I didn't see any happy path tests analogous to the config loader tests that use MyConfigLoader

The happy paths are the already existing tests. Follow up to this task will be to remove the catalog from KedroContext, so I think it makes sense to revisit/rewrite the tests as part of that as well.

@antonymilne
Copy link
Contributor

Good question. I'm not sure if we've discussed this or not 🤔 The DataCatalog is constructed slightly differently from the ConfigLoader though, it calls from_config which takes specific parameters so we probably need to add some input validation. I haven't really looked at it in depth though so maybe I'm talking nonsense.

I don't think the fact that it's constructed from from_config would make a difference here. We could do

        catalog = settings.DATA_CATALOG_CLASS.from_config(
            catalog=conf_catalog,
            credentials=conf_creds,
            load_versions=load_versions,
            save_version=save_version,
            **settings.DATA_CATALOG_ARGS
        )

...analogous to how we do

 return config_loader_class(
                conf_source=str(self._project_path / settings.CONF_SOURCE),
                env=env,
                runtime_params=extra_params,
                **settings.CONFIG_LOADER_ARGS,
            )

Overall to me, at a glance it makes sense that we should provide DATA_CATALOG_ARGS since, as I understand it, data catalog is on an equivalent footing to config loader and hence should have the same options for configurability and extensibility. This is in contrast to CONTEXT_CLASS, for which I'm guessing we are deliberately not enabling CONTEXT_ARGS. But I guess others may argue that we should not enable this functionality until there's a proven user need for it.

The happy paths are the already existing tests. Follow up to this task will be to remove the catalog from KedroContext, so I think it makes sense to revisit/rewrite the tests as part of that as well.

Do these test what happens if I overwrite DATA_CATALOG_CLASS with my own custom class? I couldn't see that but could definitely just be missing it!

Co-authored-by: Antony Milne <49395058+AntonyMilneQB@users.noreply.github.com>
@lorenabalan
Copy link
Contributor

My personal preference would be to leave it as is for now because I don't think we know enough information about that workflow, if it even exists. With the ConfigLoader we at least could look at two classes (templated vs regular) and infer a pattern, but there's no feasible option for an alternative for DataCatalog out of the box - only potentially user-defined options which we have no visibility over.

If the workflow exists, I would rather make the catalog go through the same exercise as the config loader, and see if we can define a clearer interface, rather than force it now and feel patchy. We can add it later as a feature addition, not a breaking change.

@merelcht
Copy link
Member Author

merelcht commented Dec 6, 2021

Thanks for your input @lorenabalan and @AntonyMilneQB! I will merge this PR now as is and if we find that users need to pass arguments to the catalog we can create a follow up task to investigate and add this functionality.

@merelcht merelcht merged commit 2658e98 into develop Dec 6, 2021
@merelcht merelcht deleted the KED-2565-data-catalog-in-settings branch December 6, 2021 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants