Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor StreamSets and the chaining [ch13505] #8

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jleifnf
Copy link

@jleifnf jleifnf commented Jul 2, 2021

https://app.clubhouse.io/pingthings-ws/story/13505

made method-chaining filter(start, end) step optional, allow aligned_windows and windows to take ‘start’ and ‘end’ which would be consistent with Stream

@jleifnf jleifnf requested review from RamiroCope and chrisjryan July 2, 2021 15:40
Copy link

@chrisjryan chrisjryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the motivation for this is interesting, and I like that it uses filter under the hood. Though the interface is a little unusual to me and I think it could have a simpler approach. How about changing:

def windows(self, *args, **kwargs):
    ...

to

def windows(self, width, depth, start=None, end=None):
    assert (start and end) or (not start and not end), 'If using `start` and `end`, specify both.'

I think this would have similar functionality to what we're going for but in a simpler way (both at the interface & in implementation).

Also, I tend to favor when there is 1 way to do something in an API rather than more than one. Even with the changes suggested above, this PR would let users have 2 ways of inputing start and end, which may add some confusion... Though I understand that removing the old way would be a breaking change (some customers' code would stop working), so maybe we should consider removing the old way some other time.

p.s.: It might be interesting to check in w/ Allen to learn more about the motivation behind the filter/chain interface as-is, and the pros/cons considered when making that decision. As discussed in DS Office Hour, to me it seems similar to boto3's S3 interface (docs here), so perhaps this was chosen so as to be more intuitive for folks familiar with that library.

@@ -1107,13 +1107,19 @@ def clone(self):
setattr(clone, attr, deepcopy(val))
return clone

def windows(self, width, depth):
def windows(self, *args, **kwargs):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original version (using specific args, with expressive variable names) was clearer to understand and use, and simplified the code since it didn't need the if/else statements below. Is there a way we can implement the functionality you want without using *args & *kwargs?

start, end = None, None
if len(args) < 2 and len(kwargs) < 2:
raise UserWarning("Require 'width' and 'depth'")
elif len(args) == 2 and len(kwargs) == 0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems surprising to me that the user can specify width and depth in either the args or the kwargs. Also, it might be unclear to the user how to order width & depth when specifying as args.

raise UserWarning("Require 'width' and 'depth'")
elif len(args) == 2 and len(kwargs) == 0:
width, depth = args
elif len(args) == 0 and len(kwargs) == 2:
Copy link

@chrisjryan chrisjryan Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that, if the user gave start & end in the kwargs but not width & depth, this could lead to unclear errors.

width, depth = kwargs.get('width', None), kwargs.get('depth', None)
elif len(args) == 0 and len(kwargs) == 4:
start, end, width, depth = [kwargs.get(k, None) for k in ('start', 'end', 'width', 'depth')]
elif len(args) == 4 and len(kwargs) == 0:
Copy link

@chrisjryan chrisjryan Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall that having multiple ways to order & submit the input args could add more confusion than usability.

@jleifnf
Copy link
Author

jleifnf commented Jul 2, 2021

I was trying to use the args/kwargs because if we use the keywords like windows(self, width, depth, start=None, end=None):, then it's not the same order as it is Stream: stream.windows(start, end, width, depth).

@chrisjryan
Copy link

chrisjryan commented Jul 2, 2021

I see, that helps. I still feel like the code as-is may not be the best approach though.

@jleifnf What behavior do we expect if the user does not specify start and end? [edit: I'm guessing that requires an upstream filter() call, else it would throw an error.]

@jleifnf
Copy link
Author

jleifnf commented Jul 2, 2021

What behavior do we expect if the user does not specify start and end?

@chrisjryan, I expect the user is still trying to use it as streamset.filter(start, end).windows(width, depth) or they are trying to do the same as Stream windows(start, end, width, depth) when not specifying.

Edit:
It would throw an error, when not specifying start and end. Just not within the refactored code, it'll be down the line when actually doing data fetching and transforming, as with current version when it's just streamset.windows(width, depth).to_dataframe().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants