[SPARK-51065][SQL] Disallowing non-nullable schema when Avro encoding is used for TransformWithState #49751
base: master
"STATE_STORE_SCHEMA_MUST_BE_NULLABLE" : {
  "message" : [
    "If schema evolution is enabled, all the fields in the schema for column family <columnFamilyName> must be nullable.",
    "Please set the 'spark.sql.streaming.stateStore.encodingFormat' to 'UnsafeRow' or make the schema nullable.",
The encoding format is stored in the offset log, though, so maybe the message should just tell users to make the schema nullable?
schemas.map { case (colFamilyName, schema) =>
  // assert that each field is nullable if schema evolution is enabled
  schema.valueSchema.fields.foreach { field =>
    if (!field.nullable && ensureNullableFields) {
Should we just always enforce this for transformWithState?
@ericm-db, can you add the SPARK ticket to the PR title?
@ericm-db, also, is the test failure related to the change?
schemas.map { case (colFamilyName, schema) =>
  // assert that each field is nullable if schema evolution is enabled
  schema.valueSchema.fields.foreach { field =>
    if (!field.nullable && shouldCheckNullable && !isInternal(colFamilyName)) {
Why do we need to treat internal column families differently?
What changes were proposed in this pull request?
Currently, we effectively set all fields in the state schema to nullable, regardless of what the user specifies. With this change, if a field is specified as non-nullable and Avro encoding is used, we throw an error.
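A minimal, self-contained Scala sketch of this check, mirroring the condition in the review snippet (`!field.nullable && shouldCheckNullable && !isInternal(colFamilyName)`). `FieldSpec`, `ColumnFamilySchema`, `StateSchemaNotNullableException`, and `NullabilityCheck` are hypothetical stand-ins, not Spark APIs. Two further details are assumptions: that internal column families are recognized by a `$` name prefix, and that the check is enabled when the encoding format is `"avro"`.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType and the
// per-column-family state schema; the real code operates on Spark's own types.
final case class FieldSpec(name: String, nullable: Boolean)
final case class ColumnFamilySchema(valueFields: Seq[FieldSpec])

// Hypothetical exception tagged with the error class name introduced by the PR.
final class StateSchemaNotNullableException(
    val columnFamilyName: String,
    val fieldName: String)
  extends IllegalArgumentException(
    s"[STATE_STORE_SCHEMA_MUST_BE_NULLABLE] Field '$fieldName' in the schema for " +
      s"column family $columnFamilyName must be nullable.")

object NullabilityCheck {
  // Assumption: internal column families are identified by a "$" name prefix.
  def isInternal(colFamilyName: String): Boolean = colFamilyName.startsWith("$")

  // Assumption: the check applies only when the Avro encoding format is configured.
  def shouldCheckNullable(encodingFormat: String): Boolean =
    encodingFormat.equalsIgnoreCase("avro")

  // Throws on the first non-nullable value field found in a user-facing
  // (non-internal) column family when the check is enabled.
  def validate(schemas: Map[String, ColumnFamilySchema], encodingFormat: String): Unit = {
    val check = shouldCheckNullable(encodingFormat)
    schemas.foreach { case (colFamilyName, schema) =>
      schema.valueFields.foreach { field =>
        if (!field.nullable && check && !isInternal(colFamilyName)) {
          throw new StateSchemaNotNullableException(colFamilyName, field.name)
        }
      }
    }
  }
}
```

For example, a hypothetical column family `countState` whose value schema has a non-nullable `count` field passes `validate(..., "unsaferow")` but fails `validate(..., "avro")`; declaring the field nullable makes both calls succeed.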
Why are the changes needed?
To keep the schema we actually use in parity with the user-specified schema.
Does this PR introduce any user-facing change?
Yes. A `STATE_STORE_SCHEMA_MUST_BE_NULLABLE` error is thrown if a field in the schema is defined as non-nullable and Avro encoding is used.
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No