[DataCatalog]: `add_feed_dict()` performance bottleneck #3912
Comments
xref #3930, which also discusses catalog mutability
Interestingly, I found this from 2021 :P #951
As I understand it, we use the In my opinion, to solve this issue, we should first consider cancelling immutability. Regarding the current ticket, it's unclear to me why the current situation is a problem. Do we frequently encounter scenarios where
Indeed, that's the topic of #3930 (some examples from plugins are linked there). Admittedly, some follow-up questions could be asked to better understand in which cases we want to allow mutability in the
@ElenaKhaustova, could you please comment: is it correct that the solution proposed in that ticket leads to the loss of dataset immutability? Before implementing it, we need to agree on this loss, as described in #3930.
That's the case when people use a multi-runner for tuning parameters, which under the hood creates numerous similar datasets with namespaces and thus extensively uses the
Well, technically one can modify it now using private methods as well. The suggestion for now is just to use
@ElenaKhaustova, thank you for the explanation. I have two questions:
Solved in #4218
Description
The current implementation of `add_feed_dict()` leads to performance bottlenecks because it calls the `add()` method, which duplicates the `_FrozenDatasets` structure on each call. This results in `O(N^2)` complexity and unnecessary slowdowns, especially when the catalog has many entries.

We propose a more efficient approach that updates the datasets collection directly, without copying `_FrozenDatasets` structures.

Context
kedro/kedro/io/data_catalog.py, line 694 in 27f5405
kedro/kedro/io/data_catalog.py, line 626 in 27f5405
kedro/kedro/io/data_catalog.py, line 108 in 27f5405
Steps to Reproduce
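No reproduction steps were recorded here. As a rough illustration of the quadratic behaviour described above, the sketch below uses a hypothetical stand-in class (`CopyingCollection`, which is not Kedro's actual `_FrozenDatasets`) that is rebuilt from the previous collection on every addition, and counts how many entries get copied in total. The names `CopyingCollection` and `add_one_by_one` are assumptions made for this example only.

```python
class CopyingCollection:
    """Hypothetical stand-in: a collection rebuilt from scratch on every addition."""

    copies = 0  # total number of entries copied across all rebuilds

    def __init__(self, *collections):
        for collection in collections:
            # Accept either a plain dict or another CopyingCollection.
            items = collection if isinstance(collection, dict) else collection.__dict__
            CopyingCollection.copies += len(items)
            self.__dict__.update(items)


def add_one_by_one(n):
    """Mimic a feed-dict loop that adds entries one at a time, copying each time."""
    CopyingCollection.copies = 0
    datasets = CopyingCollection()
    for i in range(n):
        # Every addition re-copies all existing entries plus the new one.
        datasets = CopyingCollection(datasets, {f"ds_{i}": i})
    return datasets, CopyingCollection.copies


if __name__ == "__main__":
    for n in (100, 1_000):
        _, copies = add_one_by_one(n)
        print(f"{n} entries -> {copies} copied items")
```

Adding `n` entries this way copies `n * (n + 1) / 2` items in total, so the cost grows quadratically with the number of catalog entries.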
Suggested Implementation
Modify the `_FrozenDatasets` constructor so that it only accepts a `dict[str, AbstractDataset]`. Keep using `self.__dict__.update()` in the constructor to add datasets to `_FrozenDatasets`. When extending the `_FrozenDatasets` collection, as in the `add()` method, use `_FrozenDatasets.__dict__.update()`. We can also consider adding a `_FrozenDatasets._update()` method that wraps the `_FrozenDatasets.__dict__.update()` logic, and using it both in the constructor and in `add()`.
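The suggestion above could be sketched roughly as follows. This is a simplified illustration, not Kedro's code: `FrozenDatasetsSketch` and `CatalogSketch` are hypothetical names, `Any` stands in for `AbstractDataset`, and the `__setattr__` guard with its message is an assumption about how the frozen collection rejects direct assignment.

```python
from __future__ import annotations

from typing import Any


class FrozenDatasetsSketch:
    """Hypothetical simplification of the proposed `_FrozenDatasets`."""

    def __init__(self, datasets: dict[str, Any] | None = None) -> None:
        # The constructor only takes a plain dict of datasets.
        self._update(datasets or {})

    def _update(self, datasets: dict[str, Any]) -> None:
        # Writing through __dict__ bypasses the __setattr__ guard below
        # and extends the collection in place, without copying it.
        self.__dict__.update(datasets)

    def __setattr__(self, key: str, value: Any) -> None:
        # Assumed guard: direct attribute assignment is not allowed.
        raise AttributeError("Use the catalog's add() method to register datasets.")


class CatalogSketch:
    """Hypothetical stand-in for the catalog that owns the collection."""

    def __init__(self) -> None:
        self.datasets = FrozenDatasetsSketch()

    def add(self, name: str, dataset: Any) -> None:
        # In-place update instead of rebuilding the whole collection.
        self.datasets._update({name: dataset})

    def add_feed_dict(self, feed_dict: dict[str, Any]) -> None:
        for name, data in feed_dict.items():
            self.add(name, data)
```

With this shape, each `add()` call touches only the new entry, so feeding `N` entries costs `O(N)` overall while direct attribute assignment on the collection still raises.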