Member

@gengliangwang gengliangwang commented Jul 13, 2018

What changes were proposed in this pull request?

Currently, in the Avro data source module:

  1. the Avro Deserializer converts input Avro format data to Row, and then converts the Row to InternalRow.
  2. the Avro Serializer converts InternalRow to Row, and then outputs Avro format data.

This PR allows direct conversion between InternalRow and Avro format data.
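
To illustrate what direct conversion means here, the following is a minimal, dependency-free sketch, not the code from this patch. It assumes a toy schema with two field types and resolves one writer per field up front, so each record is written straight into an internal row with no intermediate row object. Every name in it (`MutableInternalRow`, `FieldType`, `newConverter`) is a hypothetical stand-in, not a Spark or Avro class.

```scala
// Toy model only: hypothetical stand-ins, not Spark or Avro classes.
object DirectConversionSketch {
  // Hypothetical field types for a flat record schema.
  sealed trait FieldType
  case object IntField extends FieldType
  case object StringField extends FieldType

  // Hypothetical mutable row that plays the role of an internal row.
  final class MutableInternalRow(size: Int) {
    private val slots = new Array[Any](size)
    def update(ordinal: Int, value: Any): Unit = slots(ordinal) = value
    def get(ordinal: Int): Any = slots(ordinal)
  }

  // A writer copies one non-null source value into the row at a given ordinal.
  type FieldWriter = (MutableInternalRow, Int, Any) => Unit

  // Pick the writer for a field type. This runs once per schema, not once per record.
  private def newWriter(fieldType: FieldType): FieldWriter = fieldType match {
    case IntField =>
      (row, ordinal, value) => row.update(ordinal, value.asInstanceOf[Int])
    case StringField =>
      // In this toy model, any source value is normalized to a String.
      (row, ordinal, value) => row.update(ordinal, value.toString)
  }

  // Build a converter that writes each record directly into a fresh internal row.
  def newConverter(schema: Seq[FieldType]): Seq[Any] => MutableInternalRow = {
    val writers = schema.map(newWriter).toArray
    record => {
      val row = new MutableInternalRow(writers.length)
      var i = 0
      while (i < writers.length) {
        val value = record(i)
        if (value == null) row.update(i, null) else writers(i)(row, i, value)
        i += 1
      }
      row
    }
  }
}
```

The design choice being sketched is that schema matching happens once when the converter is built, so the per-record loop only dispatches to already-chosen writers.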

How was this patch tested?

Unit test

SparkQA commented Jul 13, 2018

Test build #92975 has finished for PR 21762 at commit bb7a43c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType)
      • sealed trait CatalystDataUpdater
      • final class RowUpdater(row: InternalRow) extends CatalystDataUpdater
      • final class ArrayDataUpdater(array: ArrayData) extends CatalystDataUpdater
      • class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable: Boolean)
      • class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex)
      • class SerializableSchema(@transient var value: Schema)

def deserialize(data: Any): Any = converter(data)

/**
* Creates a writer to writer avro values to Catalyst values at the given ordinal with the given
Contributor

nit: a writer to write

def setInt(ordinal: Int, value: Int): Unit = set(ordinal, value)
def setLong(ordinal: Int, value: Long): Unit = set(ordinal, value)
def setDouble(ordinal: Int, value: Double): Unit = set(ordinal, value)
def setFloat(ordinal: Int, value: Float): Unit = set(ordinal, value)
Contributor

Seems we don't need these default implementations.

* This function takes an avro schema and returns a sql schema.
*/
- def toSqlType(avroSchema: Schema): SchemaType = {
+ def toCatalystType(avroSchema: Schema): SchemaType = {
Contributor

maybe we don't need to rename it?

SparkQA commented Jul 15, 2018

Test build #93026 has finished for PR 21762 at commit aa5e79e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 9603087 Jul 15, 2018
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
Currently, the Avro Deserializer converts input Avro format data to `Row`, and then converts the `Row` to `InternalRow`.
The Avro Serializer, in turn, converts `InternalRow` to `Row`, and then outputs Avro format data.
This PR allows direct conversion between `InternalRow` and Avro format data.

Unit test

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes apache#21762 from gengliangwang/avro_io.

(cherry picked from commit 9603087)