Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PhysicalExtensionCodec consistently #10075

Merged
merged 4 commits into from
Apr 15, 2024

Conversation

joroKr21
Copy link
Contributor

Which issue does this PR close?

Follup to #9436.

Rationale for this change

We need to use the physical extension codec consistently, so that UDFs embedded in more complex expressions can also be serialized and deserialized. Most importantly in parse_required_physical_expr which calls recursively into parse_physical_expr and should be providing the original codec.

What changes are included in this PR?

Add a PhysicalExtensionCodec parameter to more of the deserialization functions that were using a DefaultPhysicalExtensionCodec previously. I have left the the signature of public functions unchanged but they should be changed in the next major release (left a TODO comments):

  • parse_physical_window_expr
  • parse_protobuf_hash_partitioning
  • parse_protobuf_file_scan_config

Are these changes tested?

I extended the test in roundtrip_physical_plan.rs to use a more complex expression.

Are there any user-facing changes?

No, but left a TODO for future changes.

@joroKr21 joroKr21 force-pushed the physical-extension-codec branch from 73c1310 to a6993eb Compare April 14, 2024 06:08
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @joroKr21 . I think some of the backwards compatible API stuff isn't strictly necessary but I also think it is fine to leave

I had some suggestions for additional tests, but otherwise this looks very nice.

cc @thinkharderdev

/// * `input_schema` - The Arrow schema for the input, used for determining expression data types
/// when performing type coercion.
/// * `codec` - An extension codec used to decode custom UDFs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for improving the comments here as well ❤️

)
}

// TODO: Make this the public function on next major release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what is on main will effectively become the next major release (38.0.0) so there is no reason to avoid API changes here.

If we want to backport this to a stable 37.1.0 release (e.g. #9904 ) then we should avoid API changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see so if we want the backwards compatible changes it would be against branch-37

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the breaking changes here in 1f0040c

}

#[test]
fn roundtrip_distinct_count() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth a test for round tripping window functions with a user defined function too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 280feda

Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @joroKr21 -- thanks again

@alamb alamb merged commit 36f6e0f into apache:main Apr 15, 2024
24 checks passed
@joroKr21 joroKr21 deleted the physical-extension-codec branch April 15, 2024 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants