Avro to arrow schema conversion fails when a field has a default type that is not string #8209

@yongkyunlee

Description

Describe the bug

There is a bug in arrow-avro when parsing an Avro record schema that contains a field whose default value is not a string.

The record schema should be parsed as a Complex schema (the comment says "A complex type such as record, array, map, etc."), but it is parsed as a Type schema instead.

To Reproduce

When we try to parse an Avro schema that contains a field with an int default value, such as

{
  "type": "record",
  "name": "R",
  "fields": [
    {"name": "a", "type": "int", "default": 0}
  ]
}

by running

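// `schema_json` is a &str holding the schema JSON shown above.
// `Schema` and `ComplexType` are the arrow-avro schema types.
let schema: Schema = serde_json::from_str(schema_json).expect("schema should parse");
match &schema {
    Schema::Complex(ComplexType::Record(_)) => {}
    other => panic!("expected record schema, got: {:?}", other),
}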

The code panics with the message

expected record schema, got: Type(Type { type: Ref("record"), attributes: Attributes { logical_type: None, additional: {"fields": Array [Object {"default": Number(0), "name": String("a"), "type": String("int")}], "name": String("R")} } })

Expected behavior

The parsed schema should look something like

Complex(Record(Record { name: "R", namespace: None, doc: None, aliases: [], fields: [Field { name: "a", doc: None, type: TypeName(Primitive(Int)), default: Some(Number(0)) }], attributes: Attributes { logical_type: None, additional: {} } }))

This matches what already happens for string defaults. If we parse an Avro schema whose field has a string default value, such as

{
  "type": "record",
  "name": "R",
  "fields": [
    {"name": "a", "type": "string", "default": "hello"}
  ]
}

the parsed schema is

Complex(Record(Record { name: "R", namespace: None, doc: None, aliases: [], fields: [Field { name: "a", doc: None, type: TypeName(Primitive(String)), default: Some(String("hello")) }], attributes: Attributes { logical_type: None, additional: {} } }))

The parsing should be consistent regardless of what default types the record fields have.
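
One way this kind of inconsistency can arise, offered here only as an illustration and not confirmed as the cause in arrow-avro, is a parser that tries a structured variant first and silently falls back to a catch-all variant when the structured one rejects a field. The std-only sketch below models that behavior. Every name in it (`Json`, `Parsed`, `record_string_only`, `record_any_default`, `parse_schema`) is hypothetical and none of them are arrow-avro types or functions.

```rust
// Hypothetical, std-only model of "try each variant in order" parsing.
// These are NOT arrow-avro's real definitions.

#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Num(i64),
}

#[derive(Debug, PartialEq)]
enum Parsed {
    // Structured variant: what the caller expects for a record schema.
    Record { field: String, default: Json },
    // Catch-all variant: only remembers the raw type name.
    Generic { type_ref: String },
}

// Assumed buggy rule: the structured variant accepts only string defaults.
fn record_string_only(field: &str, default: &Json) -> Result<Parsed, String> {
    match default {
        Json::Str(_) => Ok(Parsed::Record {
            field: field.to_string(),
            default: default.clone(),
        }),
        other => Err(format!("invalid type for default: {:?}", other)),
    }
}

// Assumed fixed rule: any JSON value is accepted as a default.
fn record_any_default(field: &str, default: &Json) -> Result<Parsed, String> {
    Ok(Parsed::Record {
        field: field.to_string(),
        default: default.clone(),
    })
}

// Try the structured variant first; on error, discard the error and
// fall back to the catch-all variant.
fn parse_schema(
    field: &str,
    default: &Json,
    record_rule: fn(&str, &Json) -> Result<Parsed, String>,
) -> Parsed {
    record_rule(field, default).unwrap_or_else(|_err| Parsed::Generic {
        type_ref: "record".to_string(),
    })
}

fn main() {
    let s = Json::Str("hello".to_string());
    let n = Json::Num(0);
    // String default: the structured variant succeeds under both rules.
    println!("{:?}", parse_schema("a", &s, record_string_only));
    // Int default under the string-only rule: falls back to Generic.
    println!("{:?}", parse_schema("a", &n, record_string_only));
    // Int default under the permissive rule: parsed as Record.
    println!("{:?}", parse_schema("a", &n, record_any_default));
}
```

In this model the fallback hides the real error ("invalid type for default"), so the caller sees only a differently shaped result, which resembles the symptom reported above. Whether arrow-avro's parser behaves this way would need to be checked against its source.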
