-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fields parameter in join to fetch a subset of fields from rhs #535
Conversation
Your free trial has expired. To keep using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month or reach us at help@ellipsis.dev |
a551b13
to
6e58bea
Compare
fennel/client_tests/test_dataset.py
Outdated
] | ||
] | ||
|
||
c = rating.join(revenue, how="left", on=[str(cls.movie)], right_fields=["revenue"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/right_fields/rhs_fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used right_fields because the param for join on is called right_on and not rhs_on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just use fields
here since that's quite disambiguated on its own (there is no analog of left_fields)? cc @aditya-nambiar
c = a.join(b, how="inner", on=["user_id"], right_fields=["rank"]) # type: ignore | ||
return c | ||
|
||
assert "doesn't exist in allowed fields of right schema" in str(e.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add assert that we print the pipeline and dataset name in the error message so that user can figure where the issue is
c = a.join(b, how="inner", on=["user_id"], right_fields=["user_id"]) # type: ignore | ||
return c | ||
|
||
assert "doesn't exist in allowed fields of right schema" in str(e.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user doesnt know what the allowed fields are? can we tell them eg value fields and or timestamp of the rhs dataset
c = a.join(b, how="inner", on=["user_id"], right_fields=["timestamp"]) # type: ignore | ||
return c | ||
|
||
assert "already exists in left schema" in str(e.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please assert on teh full message rather than using "in"
fennel/datasets/test_dataset.py
Outdated
"on":{ | ||
"a1":"b1" | ||
}, | ||
"withinLow":"259200s", | ||
"withinHigh":"31536000s" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the how missing ? We should not modify what the protos used to generate since that is stored state on disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the test throw an error if I copied wrong json?
fennel/datasets/datasets.py
Outdated
# TODO(sat): Are the same checks here and in SchemaValidator redundant? | ||
# If right_fields is set, check that it contains elements from right schema values and timestamp only | ||
if self.right_fields is not None and len(self.right_fields) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to raise error in SchemaValidator as we can give a lot more context about the error there
ebf24a6
to
760113d
Compare
c5c8b70
to
64c9a60
Compare
Add right_fields parameter in join to fetch only a subset of fields from rhs