Support Avro ref type in source #17020

Closed
xxchan opened this issue May 30, 2024 · 3 comments · Fixed by #17052 or #19746

@xxchan
Member

xxchan commented May 30, 2024

We added ref type support for Debezium Avro, but not for other Avro sources, which makes no sense.

A little background: in Debezium, the after and before fields share the same schema.

let schema = resolver.to_resolved(&outer_schema)?;

BTW, this line was introduced in this huge refactor: #10096, which is too confusing and scary. How did it work before?

@xxchan xxchan self-assigned this May 30, 2024
@github-actions github-actions bot added this to the release-1.10 milestone May 30, 2024
@tabVersion
Contributor

Resolving refs in a schema is an internal API and has some bugs, so we use our patched crate in Cargo.toml. Not sure whether @xiangjinwu included the fix when migrating deps.

@xiangjinwu
Contributor

> Not sure whether @xiangjinwu included the fix when migrating deps.

Not yet.

> Resolving refs in a schema is an internal API and has some bugs, so we use our patched crate in Cargo.toml.

The bug is either because we use the internal API the wrong way, or because the API itself is poorly designed and hard to use. In short, we need to replace almost all references to Schema with ResolvedSchema and use functions that take schemata (a vec of schemas) rather than a single schema. I will work with @xxchan on resolving this.
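The direction described above, keeping refs in place and passing around a table of named schemas, can be sketched with a toy model. Everything here (the `Schema` enum, `Resolved`, `collect`) is hypothetical and far smaller than the real apache_avro types; it only illustrates why a lookup needs the whole set of schemata rather than a single schema.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Avro schema. This is NOT the apache_avro type.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Enum { name: String, symbols: Vec<String> },
    Record { name: String, fields: Vec<(String, Schema)> },
    Ref { name: String },
}

/// Toy counterpart of a "resolved schema": the schemas are left untouched
/// and a name table is carried alongside them.
struct Resolved<'a> {
    names: HashMap<&'a str, &'a Schema>,
}

impl<'a> Resolved<'a> {
    /// Build the name table from every schema in `schemata`.
    fn new(schemata: &[&'a Schema]) -> Self {
        let mut names = HashMap::new();
        for &schema in schemata {
            collect(schema, &mut names);
        }
        Resolved { names }
    }

    /// Follow a `Ref` to its definition; any other schema is returned as-is.
    /// Returns `None` when the referenced name is unknown.
    fn deref(&self, schema: &'a Schema) -> Option<&'a Schema> {
        match schema {
            Schema::Ref { name } => self.names.get(name.as_str()).copied(),
            other => Some(other),
        }
    }
}

/// Walk a schema and register every named type that is defined inside it.
fn collect<'a>(schema: &'a Schema, names: &mut HashMap<&'a str, &'a Schema>) {
    match schema {
        Schema::Enum { name, .. } => {
            names.insert(name.as_str(), schema);
        }
        Schema::Record { name, fields } => {
            names.insert(name.as_str(), schema);
            for (_, field) in fields {
                collect(field, names);
            }
        }
        Schema::Ref { .. } => {}
    }
}
```

With this shape, code that walks a schema calls `deref` at each field instead of expecting refs to have been inlined ahead of time, so a name defined once can be referenced from any depth.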

@xiangjinwu
Contributor

xiangjinwu commented Jul 15, 2024

The current cloning hack implementation fails to handle the following situation:

{
  "type": "record",
  "name": "Root",
  "fields": [
    {
      "name": "f1",
      "type": {
        "type": "record",
        "name": "Nested",
        "fields": [
          {
            "name": "f1",
            "type": {
              "type": "enum",
              "name": "Case",
              "symbols": ["A", "B", "C"]
            }
          },
          {
            "name": "f2",
            "type": "Case"
          }
        ]
      }
    },
    {
      "name": "f2",
      "type": "Nested"
    }
  ]
}

In ResolvedAvroSchema::resolved_schema, the field f2.f2 is not resolved and is still a ref:

Record(
    RecordSchema {
        name: Name {
            name: "Root",
            namespace: None,
        },
        aliases: None,
        doc: None,
        fields: [
            RecordField {
                name: "f1",
                doc: None,
                aliases: None,
                default: None,
                schema: Record(
                    RecordSchema {
                        name: Name {
                            name: "Nested",
                            namespace: None,
                        },
                        aliases: None,
                        doc: None,
                        fields: [
                            RecordField {
                                name: "f1",
                                doc: None,
                                aliases: None,
                                default: None,
                                schema: Enum(
                                    EnumSchema {
                                        name: Name {
                                            name: "Case",
                                            namespace: None,
                                        },
                                        aliases: None,
                                        doc: None,
                                        symbols: [
                                            "A",
                                            "B",
                                            "C",
                                        ],
                                        default: None,
                                        attributes: {},
                                    },
                                ),
                                order: Ascending,
                                position: 0,
                                custom_attributes: {},
                            },
                            RecordField {
                                name: "f2",
                                doc: None,
                                aliases: None,
                                default: None,
                                schema: Enum(
                                    EnumSchema {
                                        name: Name {
                                            name: "Case",
                                            namespace: None,
                                        },
                                        aliases: None,
                                        doc: None,
                                        symbols: [
                                            "A",
                                            "B",
                                            "C",
                                        ],
                                        default: None,
                                        attributes: {},
                                    },
                                ),
                                order: Ascending,
                                position: 1,
                                custom_attributes: {},
                            },
                        ],
                        lookup: {
                            "f1": 0,
                            "f2": 1,
                        },
                        attributes: {},
                    },
                ),
                order: Ascending,
                position: 0,
                custom_attributes: {},
            },
            RecordField {
                name: "f2",
                doc: None,
                aliases: None,
                default: None,
                schema: Record(
                    RecordSchema {
                        name: Name {
                            name: "Nested",
                            namespace: None,
                        },
                        aliases: None,
                        doc: None,
                        fields: [
                            RecordField {
                                name: "f1",
                                doc: None,
                                aliases: None,
                                default: None,
                                schema: Enum(
                                    EnumSchema {
                                        name: Name {
                                            name: "Case",
                                            namespace: None,
                                        },
                                        aliases: None,
                                        doc: None,
                                        symbols: [
                                            "A",
                                            "B",
                                            "C",
                                        ],
                                        default: None,
                                        attributes: {},
                                    },
                                ),
                                order: Ascending,
                                position: 0,
                                custom_attributes: {},
                            },
                            RecordField {
                                name: "f2",
                                doc: None,
                                aliases: None,
                                default: None,
                                schema: Ref {
                                    name: Name {
                                        name: "Case",
                                        namespace: None,
                                    },
                                },
                                order: Ascending,
                                position: 1,
                                custom_attributes: {},
                            },
                        ],
                        lookup: {
                            "f1": 0,
                            "f2": 1,
                        },
                        attributes: {},
                    },
                ),
                order: Ascending,
                position: 1,
                custom_attributes: {},
            },
        ],
        lookup: {
            "f1": 0,
            "f2": 1,
        },
        attributes: {},
    },
)
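One plausible reading of the dump above is that a ref gets replaced by a clone of its definition, but the clone is never visited again, so refs inside it survive. The following toy model reproduces that shape. It is a sketch under that assumption, not the actual RisingWave or apache_avro code, and all names in it (`collect_names`, `inline_shallow`, `inline_deep`, `contains_ref`, `example_root`) are hypothetical.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Avro schema. This is NOT the apache_avro type.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Enum { name: String, symbols: Vec<String> },
    Record { name: String, fields: Vec<(String, Schema)> },
    Ref { name: String },
}

/// Record every named definition exactly as written (inner refs untouched).
fn collect_names(schema: &Schema, names: &mut HashMap<String, Schema>) {
    match schema {
        Schema::Enum { name, .. } => {
            names.insert(name.clone(), schema.clone());
        }
        Schema::Record { name, fields } => {
            names.insert(name.clone(), schema.clone());
            for (_, field) in fields {
                collect_names(field, names);
            }
        }
        Schema::Ref { .. } => {}
    }
}

/// Shallow inlining: a ref becomes a clone of its definition,
/// but that clone is not visited, so refs inside it remain.
fn inline_shallow(schema: &Schema, names: &HashMap<String, Schema>) -> Schema {
    match schema {
        Schema::Ref { name } => names.get(name).cloned().unwrap_or_else(|| schema.clone()),
        Schema::Record { name, fields } => Schema::Record {
            name: name.clone(),
            fields: fields.iter().map(|(n, f)| (n.clone(), inline_shallow(f, names))).collect(),
        },
        Schema::Enum { .. } => schema.clone(),
    }
}

/// Deep inlining: the looked-up definition is resolved again.
/// Caveat: this never terminates on a self-referencing record.
fn inline_deep(schema: &Schema, names: &HashMap<String, Schema>) -> Schema {
    match schema {
        Schema::Ref { name } => match names.get(name) {
            Some(definition) => inline_deep(definition, names),
            None => schema.clone(),
        },
        Schema::Record { name, fields } => Schema::Record {
            name: name.clone(),
            fields: fields.iter().map(|(n, f)| (n.clone(), inline_deep(f, names))).collect(),
        },
        Schema::Enum { .. } => schema.clone(),
    }
}

/// True if any `Ref` is left anywhere in the schema tree.
fn contains_ref(schema: &Schema) -> bool {
    match schema {
        Schema::Ref { .. } => true,
        Schema::Record { fields, .. } => fields.iter().any(|(_, f)| contains_ref(f)),
        Schema::Enum { .. } => false,
    }
}

/// The Root / Nested / Case schema from the JSON above, in toy form.
fn example_root() -> Schema {
    let case = Schema::Enum {
        name: "Case".into(),
        symbols: vec!["A".into(), "B".into(), "C".into()],
    };
    let nested = Schema::Record {
        name: "Nested".into(),
        fields: vec![
            ("f1".into(), case),
            ("f2".into(), Schema::Ref { name: "Case".into() }),
        ],
    };
    Schema::Record {
        name: "Root".into(),
        fields: vec![
            ("f1".into(), nested),
            ("f2".into(), Schema::Ref { name: "Nested".into() }),
        ],
    }
}
```

On `example_root()`, `inline_shallow` leaves Root.f2's inner f2 as a ref to Case, matching the dump, while `inline_deep` removes it. Since Avro permits recursive named types, full inlining cannot work for every schema, which is one argument for keeping refs and resolving them through a name table instead.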
