Redpanda Schema Registry is more permissive than the reference implementation, causing compatibility issues with Materialize #4297

Open
umanwizard opened this issue Apr 15, 2022 · 4 comments
Labels: area/schema-registry (Schema Registry service within Redpanda), community, kind/bug (Something isn't working)

umanwizard commented Apr 15, 2022

See here: https://github.com/MaterializeInc/database-issues/issues/3459

Redpanda's Schema Registry (SR) accepts schemas that violate the Avro spec and that are rejected by Confluent's SR (and by Materialize).

JIRA Link: CORE-887

@umanwizard umanwizard added the kind/bug Something isn't working label Apr 15, 2022
@emaxerrno (Contributor)

Thank you for the report, @umanwizard!

@BenPope BenPope self-assigned this Apr 17, 2022
@jcsp jcsp added the area/schema-registry Schema Registry service within Redpanda label Dec 12, 2022
@umanwizard (Author)

@benesch, since Materialize apparently decided to delete all of its publicly accessible issues, could you please paste the relevant content from the now-private issue here?

benesch commented Sep 29, 2024

Sure, see below.


From @AnotherGenZ on 2022 April 15:

What version of Materialize are you using?

v0.26.0 (07670312b)

How did you install Materialize?

Docker image

What was the issue?

Connecting Materialize to Redpanda and Redpanda's schema registry results in an Avro parse error for some of my schemas that contain multiple fields of the same enum type.

SQL Error [XX000]: ERROR: validating avro schema: Schema parse error: Sub-schema with name PS2Observer.Loadout encountered multiple times

Reduced Avro schema for reproduction:

{
  "name" : "EventName",
  "type" : "record",
  "fields" : [ {
    "name" : "attacker_loadout_id",
    "type" : {
      "name" : "Loadout",
      "type" : "enum",
      "symbols" : [ "Unknown", "Class1", "Class2", "Class3" ]
    }
  }, {
    "name" : "character_id",
    "type" : "string"
  }, {
    "name" : "character_loadout_id",
    "type" : {
      "name" : "Loadout",
      "type" : "enum",
      "symbols" : [ "Unknown", "Class1", "Class2", "Class3" ]
    }
  }, {
    "name" : "timestamp",
    "type" : {
      "type" : "long",
      "logicalType" : "timestamp-millis"
    }
  }],
  "namespace" : "PS2Observer"
}

Relevant log output

No response

From @umanwizard on 2022 April 15:

That is not a valid Avro schema: Loadout is defined twice, and the Avro spec does not allow a schema to contain multiple definitions of the same fullname (here, PS2Observer.Loadout).
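To make the rule concrete, here is a minimal sketch in plain Python (no Avro library) of the kind of check that the Apache library and Materialize apply and that, per this issue, Redpanda's registry did not: walk the parsed schema JSON, compute each named type's fullname from the namespace in effect, and report any fullname defined more than once. The helpers `fullname` and `find_duplicate_names` are hypothetical names for illustration, and the sketch only understands records, enums, fixed, arrays, maps, and unions; it is not a full Avro validator.

```python
import json

# Avro's named types; "error" is the record-like type used in protocols.
NAMED_TYPES = {"record", "error", "enum", "fixed"}


def fullname(name, namespace):
    """A dotted name is already a fullname; otherwise prefix the namespace in effect."""
    if "." in name:
        return name
    return f"{namespace}.{name}" if namespace else name


def find_duplicate_names(schema):
    """Return every fullname that a parsed Avro schema defines more than once."""
    seen = set()
    duplicates = []

    def walk(node, namespace):
        if isinstance(node, list):  # a union: check every branch
            for branch in node:
                walk(branch, namespace)
            return
        if not isinstance(node, dict):  # a primitive or a reference by name
            return
        kind = node.get("type")
        if isinstance(kind, (list, dict)):  # e.g. {"type": {...}} wrappers
            walk(kind, namespace)
        elif kind in NAMED_TYPES:
            name = fullname(node["name"], node.get("namespace", namespace))
            if name in seen:
                duplicates.append(name)
            seen.add(name)
            # Types nested inside a named type inherit its namespace.
            inner = name.rpartition(".")[0]
            for field in node.get("fields", []):  # only records/errors have fields
                walk(field["type"], inner)
        elif kind == "array":
            walk(node["items"], namespace)
        elif kind == "map":
            walk(node["values"], namespace)

    walk(schema, "")
    return duplicates


# A trimmed copy of the reproduction schema from this issue.
repro = json.loads("""{
  "name": "EventName", "type": "record", "namespace": "PS2Observer",
  "fields": [
    {"name": "attacker_loadout_id",
     "type": {"name": "Loadout", "type": "enum", "symbols": ["Unknown", "Class1"]}},
    {"name": "character_loadout_id",
     "type": {"name": "Loadout", "type": "enum", "symbols": ["Unknown", "Class1"]}}
  ]
}""")
print(find_duplicate_names(repro))  # ['PS2Observer.Loadout']
```

Because the second enum has no `namespace` of its own, it inherits PS2Observer from the enclosing record, which is why both definitions collide on the same fullname.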

Apache's official Python Avro library fails in a similar way:

~ ❯❯❯ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import avro.schema
>>> schema = avro.schema.parse(open('foo.avsc', 'rb').read())
Traceback (most recent call last):
  File "/home/brennan/.local/lib/python3.9/site-packages/avro/schema.py", line 341, in __init__
    type_schema = make_avsc_object(type_, names)
  File "/home/brennan/.local/lib/python3.9/site-packages/avro/schema.py", line 1147, in make_avsc_object
    return EnumSchema(
  File "/home/brennan/.local/lib/python3.9/site-packages/avro/schema.py", line 579, in __init__
    NamedSchema.__init__(self, "enum", name, namespace, names, other_props)
  File "/home/brennan/.local/lib/python3.9/site-packages/avro/schema.py", line 263, in __init__
    new_name = names.add_name(name, namespace, self)
  File "/home/brennan/.local/lib/python3.9/site-packages/avro/name.py", line 160, in add_name
    raise avro.errors.SchemaParseException(f'The name "{to_add.fullname}" is already in use.')
avro.errors.SchemaParseException: The name "PS2Observer.Loadout" is already in use.

That schema can be corrected by having the second field refer to the existing Loadout type by name, rather than redefining it, like so:

{
  "name" : "EventName",
  "type" : "record",
  "fields" : [ {
    "name" : "attacker_loadout_id",
    "type" : {
      "name" : "Loadout",
      "type" : "enum",
      "symbols" : [ "Unknown", "Class1", "Class2", "Class3" ]
    }
  }, {
    "name" : "character_id",
    "type" : "string"
  }, {
    "name" : "character_loadout_id",
    "type" : "Loadout"
  }, {
    "name" : "timestamp",
    "type" : {
      "type" : "long",
      "logicalType" : "timestamp-millis"
    }
  }],
  "namespace" : "PS2Observer"
}
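The corrected schema works because a bare name such as "Loadout" is resolved against the enclosing namespace (PS2Observer) and must match a type defined earlier in the depth-first, left-to-right parse order of the schema. A minimal sketch of that resolution step in plain Python follows; `unresolved_references` is a hypothetical helper, and it assumes the same Avro subset as a plain schema file (no protocols, no aliases).

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "error", "enum", "fixed"}


def fullname(name, namespace):
    """A dotted name is already a fullname; otherwise prefix the namespace in effect."""
    if "." in name:
        return name
    return f"{namespace}.{name}" if namespace else name


def unresolved_references(schema):
    """Return type references that do not match a named type defined earlier."""
    defined = set()
    missing = []

    def walk(node, namespace):
        if isinstance(node, str):
            if node not in PRIMITIVES:  # a reference to a named type
                name = fullname(node, namespace)
                if name not in defined:
                    missing.append(name)
        elif isinstance(node, list):  # a union
            for branch in node:
                walk(branch, namespace)
        elif isinstance(node, dict):
            kind = node.get("type")
            if isinstance(kind, (list, dict)):
                walk(kind, namespace)
            elif kind in NAMED_TYPES:
                name = fullname(node["name"], node.get("namespace", namespace))
                # Registered before its fields, so recursive records resolve.
                defined.add(name)
                for field in node.get("fields", []):
                    walk(field["type"], name.rpartition(".")[0])
            elif kind == "array":
                walk(node["items"], namespace)
            elif kind == "map":
                walk(node["values"], namespace)
            elif isinstance(kind, str):
                walk(kind, namespace)  # a primitive, or {"type": "SomeName"}

    walk(schema, "")
    return missing


# A trimmed copy of the corrected schema from this issue.
fixed = json.loads("""{
  "name": "EventName", "type": "record", "namespace": "PS2Observer",
  "fields": [
    {"name": "attacker_loadout_id",
     "type": {"name": "Loadout", "type": "enum", "symbols": ["Unknown", "Class1"]}},
    {"name": "character_loadout_id", "type": "Loadout"}
  ]
}""")
print(unresolved_references(fixed))  # []
```

Swapping the two fields, so that the bare "Loadout" reference comes before the enum definition, would make the helper report PS2Observer.Loadout, since a name must be defined before it is used.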

@umanwizard (Author)

Thank you!
