Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError when parsing non-nested logicalType #726

Closed
bengels97 opened this issue Aug 23, 2024 · 6 comments
Closed

TypeError when parsing non-nested logicalType #726

bengels97 opened this issue Aug 23, 2024 · 6 comments

Comments

@bengels97
Copy link
Contributor

Describe the bug

A clear and concise description of what the bug is.

{
  "type": "record",
  "name": "TestEvent",
  "namespace": "com.example",
  "doc": "Bla bla",
  "fields": [
    {
      "name": "occurredAt",
        "type": "long",
        "logicalType": "timestamp-millis"
      }
  ]
}

Given the above example. When parsing the logical type in BaseGenerator.parse_logical_type a TypeError is raised due to a parsing issue when trying to access field["logicalType"].

  File "/Users/xxx/Documents/Development/xxxx/xxxx/.venv/lib/python3.11/site-packages/dataclasses_avroschema/model_generator/lang/python/base.py", line 321, in parse_logical_type
    logical_type = field["logicalType"]
                   ~~~~~^^^^^^^^^^^^^^^
TypeError: list indices must be integers or slices, not str

To Reproduce

Steps to reproduce the behavior

The issue can be reproduced by trying to generate a model using the above schema:

model_generator.render(schema=schema, model_type=ModelType.PYDANTIC)

It is resolved by specifying the type and logicalType fields under a type key. like so:

{
  "type": "record",
  "name": "TestEvent",
  "namespace": "com.example",
  "doc": "Bla bla",
  "fields": [
    {
      "name": "occurredAt",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      }
    }
  ]
}

Expected behavior

A clear and concise description of what you expected to happen.

I was not able to find whether the above schema is valid AVRO or not (ChatGPT told me it is, but I don't see that as a reliable source). Reformatting seems like a simple solution. But if the above AVRO is valid, then parse_logical_type should be able to parse it.

Ideally the above schema should return the below rendered_class, the same way that the nested type field does.

class ComExampleTestEvent(BaseModel):
    """
    Bla bla
    """
    occurredAt: datetime.datetime
@bengels97
Copy link
Contributor Author

bengels97 commented Aug 23, 2024

If we all agree on the example schema being a valid AVRO schema, I don't mind resolving the bug.
A very easy fix would be to include an extra condition to the if condition to assign variable field inside BaseGenerator.parse_logical_type like so:

field = field if field_name is None or ~isinstance(field["type"], dict) else field["type"]

instead of the current condition:

field = field if field_name is None else field["type"]

@marcosschroh
Copy link
Owner

marcosschroh commented Aug 23, 2024

Hi @bengels97

Thanks for reporting the issue. In the past the avro spec defined that complex and logicalTypes are defined using {"type": {"type: {...}, ...}}. Unfortunately, the only available documentation is the current version (1.12.0), somehow it is not possible to access to previous ones.

Because the avro spec documentation is super unclear, I checked the avro rust and fstavro implementation and It is possible to define logicalTypes without the need to define type twice.

I think we should support this use case, so go forward with your PR. Then, the question is what to do when generating avro schemas from models model --> schema ? Currently, we create the schema with {"type": {"type: {...}, ...}}, maybe we should use the "new way".

@bengels97
Copy link
Contributor Author

For our own purposes we wouldn't necessarily need to be able to generate the same schema from the model. So personally I'm fine with generating the schema with the default {"type": {"type: {...}, ...}}.

If compatibility is broken for anyone, they could always generate the schema with include_original_schema=True

@bengels97
Copy link
Contributor Author

Created a PR to meet the expected behaviour. Please take a look and let me know if you have any feedback. Have not included the generation of model -> schema in the scope of this PR.

@marcosschroh
Copy link
Owner

Probably this issue is related to #727 as well

@marcosschroh
Copy link
Owner

Closed by #734

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants