
java.io.NotSerializableException when using JavaSerializer in v5.1.0, v6.1.0 #274

Closed
kevinwallimann opened this issue Feb 4, 2022 · 0 comments · Fixed by #275
Labels: bug Something isn't working
kevinwallimann commented Feb 4, 2022

Description

With the new configurable schema converter feature (#268, #269), the class DefaultSchemaConverter is instantiated by default as the member variable schemaConverter in AvroDataToCatalyst. Even though AvroDataToCatalyst, as a case class, is serializable by default, serialization fails when using the JavaSerializer, with the following error message:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: za.co.absa.abris.avro.sql.DefaultSchemaConverter
Serialization stack:
	- object not serializable (class: za.co.absa.abris.avro.sql.DefaultSchemaConverter, value: za.co.absa.abris.avro.sql.DefaultSchemaConverter@1ce2ce83)
	- field (class: za.co.absa.abris.avro.sql.AvroDataToCatalyst, name: schemaConverter, type: interface za.co.absa.abris.avro.sql.SchemaConverter)
	- object (class za.co.absa.abris.avro.sql.AvroDataToCatalyst, from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))
	- field (class: org.apache.spark.sql.catalyst.expressions.IsNotNull, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
	- object (class org.apache.spark.sql.catalyst.expressions.IsNotNull, isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]}))))
	- field (class: org.apache.spark.sql.execution.FilterExec, name: condition, type: class org.apache.spark.sql.catalyst.expressions.Expression)
	- object (class org.apache.spark.sql.execution.FilterExec, Filter isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))
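The failure mode can be sketched outside Spark. In this minimal sketch the class names are simplified stand-ins for the ABRiS types, not the real implementation: Java serialization of a serializable case class fails as soon as one of its fields references a class that does not extend Serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for DefaultSchemaConverter: note it does NOT extend Serializable.
class Converter

// Stand-in for AvroDataToCatalyst: case classes extend Serializable by default,
// but the eagerly initialised field drags Converter into the serialized object graph.
case class Expr(schema: String) {
  private val converter = new Converter
}

object FailureDemo extends App {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try {
    out.writeObject(Expr("schema"))
    println("serialized")
  } catch {
    // The non-serializable Converter field is the culprit.
    case e: NotSerializableException => println(s"NotSerializableException: ${e.getMessage}")
  }
}
```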

How to fix

Add the Serializable trait to the SchemaConverter trait.
Make schemaConverter lazy
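Both fixes can be sketched as follows, again with simplified stand-ins for the ABRiS types rather than the actual change in #275: the converter trait extends Serializable, and the member is a lazy val, so a round trip through Java serialization succeeds.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Fix 1: the trait extends Serializable, so every implementation is serializable.
trait SchemaConverter extends Serializable {
  def convert(schema: String): String
}

class DefaultSchemaConverter extends SchemaConverter {
  override def convert(schema: String): String = schema // stub for illustration
}

// Stand-in for AvroDataToCatalyst.
case class AvroExpr(readerSchema: String) {
  // Fix 2: lazy, so the converter is only instantiated when first used.
  private lazy val schemaConverter: SchemaConverter = new DefaultSchemaConverter
  def run(in: String): String = schemaConverter.convert(in)
}

object FixDemo extends App {
  val expr = AvroExpr("""{"type":"string"}""")
  val bos  = new ByteArrayOutputStream()
  new ObjectOutputStream(bos).writeObject(expr) // no longer throws
  val back = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    .readObject().asInstanceOf[AvroExpr]
  println(back.run("ok")) // prints "ok"
}
```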

@kevinwallimann kevinwallimann added the bug Something isn't working label Feb 4, 2022
@kevinwallimann kevinwallimann self-assigned this Feb 4, 2022
@kevinwallimann kevinwallimann changed the title java.io.NotSerializableException when using JavaSerializer java.io.NotSerializableException when using JavaSerializer in v5.1.0, v6.1.0 Feb 4, 2022