Make Kedro instantiate datasets from `kedro_datasets` with higher priority than `kedro.extras.datasets` #1734
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
This was tested manually with local testing; maybe that's enough already. Optionally we could add a test to make sure the …
…o_datasets_higher_priority-#1494 Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
RELEASE.md
Outdated
@@ -11,8 +11,9 @@
# Upcoming Release 0.18.4

## Major features and improvements
* The config loader objects now implement `UserDict` and the configuration is accessed through `conf_loader['catalog']`
* You can configure config file patterns through `settings.py` without creating a custom config loader
* Make Kedro instantiate datasets from `kedro.datasets` with higher priority than `kedro.extras.datasets`. `kedro.datasets` is the namespace for the new `kedro-datasets` python package.
This should be `kedro_datasets` instead of `kedro.datasets`, right?
Updated, thanks!
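To illustrate the priority rule in the release note, here is a minimal sketch of prefix-ordered class resolution. It is not Kedro's implementation: `resolve_class` and `DEFAULT_PREFIXES` are hypothetical names, and the prefix order is assumed from the PR description. The demo uses standard-library modules, where `OrderedDict` is exposed by both `collections` and `typing`, so it runs without Kedro installed.

```python
import collections
import importlib
import typing

# Assumed search order, highest priority first. The real list lives in Kedro
# and may differ; this tuple is only for illustration.
DEFAULT_PREFIXES = ("kedro_datasets.", "kedro.extras.datasets.", "")


def resolve_class(type_path, prefixes=DEFAULT_PREFIXES):
    """Return the first object found when trying each prefix in order."""
    for prefix in prefixes:
        module_path, _, class_name = (prefix + type_path).rpartition(".")
        if not module_path:
            # A bare class name with an empty prefix has no module to import.
            continue
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Package not installed: fall through to the next prefix.
            continue
        if hasattr(module, class_name):
            return getattr(module, class_name)
    raise ValueError(f"Class '{type_path}' not found")


# Both modules expose a name called OrderedDict, so the prefix order decides
# which one is returned.
first = resolve_class("OrderedDict", prefixes=("collections.", "typing."))
second = resolve_class("OrderedDict", prefixes=("typing.", "collections."))
```

With this ordering, a dataset name that exists in both packages would resolve to the higher-priority package, while names that exist only in the lower-priority package still resolve as before.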
…ub.com:kedro-org/kedro into feat/init_kedro_datasets_higher_priority-#1494 Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
I like that you managed to create a test for this!! 👏 🏆
Nice work!
Really clever test, but I wonder if it could be a bit clearer. I left two comments but otherwise happy to approve.
tests/io/test_data_catalog.py
Outdated
def test_config_import_kedro_datasets(self, sane_config, mocker):
    """Test kedro.extras.datasets default path to the dataset class"""
    # Spy _load_obj because kedro_datasets is installed
    class DummyMock:
I think `DummyMock` and `dummy_mock` could have more meaningful names. Would `DummyLoader` work here?
@jmholzer I managed to remove the dummy object. It turns out I can spy on the `kedro.io.core` module object directly, so the `patch` is not needed.
tests/io/test_data_catalog.py
Outdated
def test_config_import_kedro_datasets(self, sane_config, mocker):
    """Test kedro.extras.datasets default path to the dataset class"""
    # Spy _load_obj because kedro_datasets is installed
    class DummyMock:
Could this be a function instead of a class? That may also make the test clearer. I checked the `pytest-mock` docs, but it's not obvious whether it can be in this case.
This is a good question. This isn't possible: the spy needs to work with a class and a method. I will add comments explaining it.
Good work 🎉.
I like the test, the comments and structure make things clear.
Signed-off-by: Nok Chan nok.lam.chan@quantumblack.com
Description
Close #1494
We are going to introduce `kedro-datasets` as a new Kedro plugin for datasets. It will be instantiated with a higher priority in case of conflicting datasets. `kedro.extras.datasets` will be removed completely in 0.19.0, but for 0.18.x we will keep it compatible, so we need to make sure the `kedro-datasets` package has a higher priority.
Development notes
Tested manually. Adding an automatic test is tricky since we don't want to introduce `kedro-datasets` into the test dependencies.
More notes on test
I tried to avoid touching `sys.modules`, as it seems to be problematic and creates some unknown side effects in our tests. We did it for a few tests to force a reload. I can't find a better way to test a non-existing module; mocking a module is not trivial.
In the end, I went with spying, which basically swaps the method with a custom one (it's a bit annoying that it has to be a class; you can't spy on a function), but it works. I would be very pleased if there is a better way to do this.
Checklist
RELEASE.md
file