Modularize default argument handling for datasets #14
Hi @deepyaman, thanks a lot for pointing this out. It's something I've thought of proposing to fix quite a few times, but it was never a priority. Another solution to consider would be to have:

```python
class Base:
    default_save_args = {}
    default_load_args = {}

    @property
    def _load_args(self):
        return ({**self.default_load_args, **self._load_args_}
                if hasattr(self, '_load_args_') else {**self.default_load_args})

    @property
    def _save_args(self):
        return ({**self.default_save_args, **self._save_args_}
                if hasattr(self, '_save_args_') else {**self.default_save_args})

    @_load_args.setter
    def _load_args(self, load_args):
        self._load_args_ = load_args if load_args is not None else {}

    @_save_args.setter
    def _save_args(self, save_args):
        self._save_args_ = save_args if save_args is not None else {}


class Child(Base):
    default_save_args = {'index': False}

    def __init__(self, load_args=None, save_args=None):
        self._load_args = load_args
        self._save_args = save_args
```

So that:
This would avoid the […]. I also like your proposition of making the […]. Btw, I'm not sure if what I suggested is the right way.
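For illustration, here is a runnable condensation of the proposal above (only the `_save_args` half is shown; the merge behavior for `_load_args` is identical):

```python
class Base:
    default_save_args = {}

    @property
    def _save_args(self):
        # Merge class-level defaults with any user-supplied args;
        # user-supplied keys win over defaults.
        return ({**self.default_save_args, **self._save_args_}
                if hasattr(self, '_save_args_') else {**self.default_save_args})

    @_save_args.setter
    def _save_args(self, save_args):
        self._save_args_ = save_args if save_args is not None else {}


class Child(Base):
    default_save_args = {'index': False}

    def __init__(self, save_args=None):
        self._save_args = save_args


print(Child()._save_args)                        # {'index': False}
print(Child(save_args={'sep': ','})._save_args)  # {'index': False, 'sep': ','}
print(Child(save_args={'index': True})._save_args)  # {'index': True}
```

Note how a user-supplied `'index': True` overrides the class default, while unrelated keys are merged alongside it.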
@tsanikgr Thanks for the feedback! I believe the property implementations can be further simplified as:

```python
@property
def _load_args(self):
    return {**self.default_load_args, **getattr(self, '_load_args_', {})}
```

(and the analogous method for `_save_args`).
👏 Indeed! Unfortunately, we can't avoid the setters; otherwise, assigning to the attribute in `__init__` would fail.
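To see why the setters can't be dropped: a property without a setter is read-only, so the assignment in `__init__` raises `AttributeError` (illustrative sketch):

```python
class NoSetter:
    default_load_args = {}

    @property
    def _load_args(self):
        return {**self.default_load_args, **getattr(self, '_load_args_', {})}

    def __init__(self, load_args=None):
        # Assigning to a setter-less property raises AttributeError.
        self._load_args = load_args


try:
    NoSetter()
except AttributeError as exc:
    print('assignment failed:', exc)
```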
Thank you so much for this @deepyaman! We'll await feedback from @idanov on this and will get back to you.
Thanks @deepyaman for raising this. We've had discussions internally and this has come up a few times already. However, I don't think this is something that should be added: we should leave each implementation to deal with default arguments as it sees fit; some datasets might not even have default arguments. The dataset abstraction is deliberately not very prescriptive, in order to allow for heterogeneous datasets that are very different in their own ways. Adding more structure to the abstraction would make that functionality useful for only some datasets and not others, which would undermine the abstraction: abstract classes are not meant to remove code repetition, but to create useful and general abstractions. Also, in this case we are talking about very little code repetition, which does not justify paying the price of a more complex and worse abstraction. Having heterogeneous classes means the code will remain federated by nature, which will inevitably result in some code repetition here and there, and that's not such a bad thing.
See https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction
@idanov I understand your point. That being said, there are options (from the […]). Finally, from the […].
At the very minimum, I feel like it would be an improvement to standardize this functionality across existing datasets, even if the code is identical/copy-pasted in each. Thoughts?
@deepyaman I agree with your last point, so I suppose we can do at least that. I like your idea for […].
@idanov Sounds great! I've updated #15 with the above changes.
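The thread converges on per-dataset class attributes rather than a shared base-class mechanism. As a hedged sketch (the dataset name and defaults are illustrative, and this is not reproduced from #15), the `DEFAULT_LOAD_ARGS`/`DEFAULT_SAVE_ARGS` pattern discussed above could look like:

```python
import copy


class MyCSVDataSet:
    # Class-level defaults, visible programmatically on the class itself.
    DEFAULT_LOAD_ARGS = {}
    DEFAULT_SAVE_ARGS = {'index': False}

    def __init__(self, load_args=None, save_args=None):
        # deepcopy so that mutating the instance's args can never
        # leak back into the shared class-level defaults.
        self._load_args = copy.deepcopy(self.DEFAULT_LOAD_ARGS)
        if load_args is not None:
            self._load_args.update(load_args)
        self._save_args = copy.deepcopy(self.DEFAULT_SAVE_ARGS)
        if save_args is not None:
            self._save_args.update(save_args)


ds = MyCSVDataSet(save_args={'sep': ','})
print(ds._save_args)                    # {'index': False, 'sep': ','}
print(MyCSVDataSet.DEFAULT_SAVE_ARGS)   # {'index': False} (unchanged)
```

This keeps the merge logic identical across datasets while letting each dataset declare (or omit) its own defaults.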
Description
Near-identical code to handle default arguments is replicated in almost every dataset implementation. Worse still, the functionality across these datasets is the same, but the implementations are inconsistent.
Context
When I want to implement a new dataset, I look at existing datasets as a baseline for implementing my own. However, there are inconsistencies between these datasets, from the more minor (`save_args` handled after `load_args` for some datasets), to the slightly more significant (special casing where there are no default arguments on some datasets but not others), and worse (one case where arguments are evaluated for truthiness instead of `is not None`); see https://github.com/quantumblacklabs/kedro/blob/0.14.1/kedro/contrib/io/azure/csv_blob.py#L109-L113 for an example representing several of the above. I don't know which one to follow to maintain consistency across the codebase.
Possible Implementation
#15
By having `DEFAULT_LOAD_ARGS`/`DEFAULT_SAVE_ARGS` attributes, users can also see the defaults programmatically (with the caveat that this is a drawback if you consider the few cases where such arguments don't apply, like no save on `SqlQueryDataSet`, or in general on `LambdaDataSet`/`MemoryDataSet`).
Possible Alternatives
[…] `AbstractDataSet` and thereby only apply to those with `load_args`/`save_args` […] `__init__` method (not preferred)