Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(datatypes): infer the correct shape for many existing ops #9334

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

NickCrews
Copy link
Contributor

Found this when looking into #9333

Usually when using eg duckdb's REGEXP_REPLACE() function, the pattern and replacement are always scalar constants. But actually it can accept columnar for all three of it's arguments.

These sort of assumptions appeared to be all over the place in the code base. I found these fixs by grepping for "shape_like" and manually looking at all the instances.

I didn't add any test cases, that felt like a monumental thing to do I wasn't willing to put the time in for. But IDK, I'm not sure I would even want to put in all the tests, it would be SO much boilerplate.

To deal with Nodes sometimes having optional args, I modified shape_like() to be more flexible.
IDK, that was sort of a lazy approach, we could be much more verbose and have each op do this filtering individually, but I didn't that was worth it.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, I think it's a good idea!

Just a question about where we should handle the None check.

ibis/expr/rules.py Show resolved Hide resolved
@NickCrews NickCrews marked this pull request as ready for review June 10, 2024 22:14
@cpcloud cpcloud added this to the 9.2 milestone Jun 13, 2024
@cpcloud cpcloud changed the title fix: fix and improve shape inference in many ops fix(datatypes): infer the correct shape for many existing ops Jul 16, 2024
@cpcloud cpcloud added bug Incorrect behavior inside of ibis internals Issues or PRs related to ibis's internal APIs labels Jul 16, 2024
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'll rebase and merge when CI passes.

Usually when using eg duckdb's REGEXP_REPLACE() function, the pattern and replacement are always scalar
constants. But actually it can accept columnar for all three of it's arguments.

These sort of assumptions appeared to be all over the place in the code base. I found these fixs by grepping for "shape_like"
and manually looking at all the instances.

I didn't add any test cases, that felt like a monumental thing
to do I wasn't willing to put the time in for. But IDK,
I'm not sure I would even want to put in all the tests, it would be SO much boilerplate.

To deal with Nodes sometimes having optional args,
I modified shape_like() to be more flexible.
IDK, that was sort of a lazy approach, we could be
much more verbose and have each op do this filtering individually, but I didn't that was worth it.
@NickCrews NickCrews merged commit 7a0b21e into ibis-project:main Jul 16, 2024
82 checks passed
@NickCrews NickCrews deleted the shape-fix branch July 16, 2024 17:37
@NickCrews
Copy link
Contributor Author

@cpcloud hopefully that was OK for me to hit the merge button, the CI passed! Thank you!

NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 16, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
cpcloud pushed a commit to NickCrews/ibis that referenced this pull request Jul 23, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
cpcloud pushed a commit to NickCrews/ibis that referenced this pull request Jul 24, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis internals Issues or PRs related to ibis's internal APIs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants