Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify serialization by removing redundant PrimitiveScalarValue #3612

Merged
merged 6 commits into from
Oct 25, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Sep 25, 2022

Draft as it builds on #3547

Which issue does this PR close?

Re #3531

This scratches an itch I had while last working with this code

Rationale for this change

While working on #3531 I noticed that PrimitiveScalarType duplciated the functionality of DataType -- this makes working with the protobuf serialization code, which is already hard, even harder.

As the DataFusion community grows, we need to keep our code as easy to work with as possible to make reviews as well as new contirbutions easy.

What changes are included in this PR?

  1. Remove PrimitiveScalarValue and use the existing Arrow DataType

Are there any user-facing changes?

No

@alamb alamb marked this pull request as draft September 25, 2022 14:03
PrimitiveScalarType null_value = 19;
// was PrimitiveScalarType null_value = 19;
// Null value of any type
ArrowType null_value = 33;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a dumb question but why are nulls typed? This encoding of scalarvalue seems to conflate encoding the schema with encoding the values, which seems unfortunate.

Perhaps we could take a look at what substrait does and copy that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean why are nulls typed in general? Basically because of how ScalarValue:: is implemented (as an Option<> around the underlying native type). I think @jimexist tried to clean it up at some point and make ScalarValue::None and then all the variants like ScalarValue::Int8 have values like i8 rather than Option<i8>.

I can't remember what the problem was but it didn't work easily.

In my opinion at least the serialization should follow how they are implemented in ScalarValue and if we improve ScalarValue then we can also improve the serialization code

@alamb alamb force-pushed the alamb/cleanup_serialization branch from ecf9925 to d754b94 Compare October 23, 2022 11:23
@@ -803,48 +804,9 @@ message Decimal128{
int64 s = 3;
}

// Contains all valid datafusion scalar type except for
// List
enum PrimitiveScalarType{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these types are already handled in ArrowType

@@ -1359,184 +1246,6 @@ fn vec_to_array<T, const N: usize>(v: Vec<T>) -> [T; N] {
})
}

//Does not typecheck lists
fn typechecked_scalar_value_conversion(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all redundant with code that (now) exists in ScalarValue

@alamb alamb requested a review from tustvold October 23, 2022 11:28
@alamb alamb marked this pull request as ready for review October 23, 2022 11:28
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine for me, not sure what stability guarantees we provide on this API. Theoretically we should go through a period of supporting both, but I don't know if this is worth the effort

@alamb
Copy link
Contributor Author

alamb commented Oct 24, 2022

This looks fine for me, not sure what stability guarantees we provide on this API. Theoretically we should go through a period of supporting both, but I don't know if this is worth the effort

Current answer I believe is "we provide no guarantees" (even though in theory we could provide a backwards compatible APIs). For example #3547 was likely not backward compatible but yet did not seem to cause changes.

cc @andygrove and @yahoNanJing I think Ballista is the other major user of this API

@alamb
Copy link
Contributor Author

alamb commented Oct 24, 2022

I plan to merge this tomorrow unless someone has additional comments or would like more time to review

@alamb alamb merged commit 54d2870 into apache:master Oct 25, 2022
@alamb alamb deleted the alamb/cleanup_serialization branch October 25, 2022 17:25
@ursabot
Copy link

ursabot commented Oct 25, 2022

Benchmark runs are scheduled for baseline = 48f73c6 and contender = 54d2870. 54d2870 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
…pache#3612)

* Simplify serialization by removing redundant PrimitiveScalarValue

* comments

* it compiles

* Add additional scalar value null construction

* reserve old field name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants