Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cdk]: correctly raise unsupported logical type errors when parsing avro #36888

Merged
merged 4 commits into from
Jun 6, 2024

Conversation

imrehg
Copy link
Contributor

@imrehg imrehg commented Apr 8, 2024

A typo in the lookup of the field that is causing issues with the Avro schema discovery resulted in error raised by the error being raised. Correcly extract the field, and add a test to check that this continues to work.

Related Slack thread: https://airbytehq.slack.com/archives/C021JANJ6TY/p1712557994937269

…sing avro

A typo in the lookup of the field that is causing issues with the Avro
schema discovery resulted in error raised by the error being raised.
Correcly extract the field, and add a test to check that this continues to work.
@imrehg imrehg requested a review from a team as a code owner April 8, 2024 07:20
Copy link

vercel bot commented Apr 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 6, 2024 4:11am

@octavia-squidington-iii octavia-squidington-iii added CDK Connector Development Kit community labels Apr 8, 2024
Copy link
Member

@marcosmarxm marcosmarxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @imrehg I requested to the connector team review your change.

Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @imrehg, thank you for the fix!

It looks good to me, and given it was clearly a typo, and code above uses the correct logicalType key, it's fine to merge by me.

@brianjlai, can you take a quick glance too, to make sure I'm not hallucinating?

The problem, though, is that you would still get the error in this scenario — it's just that you'll see what logical type caused it. Right?

@imrehg
Copy link
Contributor Author

imrehg commented Apr 9, 2024

The problem, though, is that you would still get the error in this scenario — it's just that you'll see what logical type caused it. Right?

Hey @natikgadzhi, you hit the nail on the head indeed!


To give a bit more context:

I was trying to ingest an avro file that has a logical type that is not directly supported:

...
{"name": "end_time", "type": ["null", {"logicalType": "datetime", "type": "string"}]},
...

Without this change I had to guess that this was the issue, while with the fix I got to see which field was non-ingestable by AirByte.

Having said that, fastavro loads the file just fine when I try it, and e.g. loads this field as a string,

... 'end_time': '2024-03-27T09:30:00+00:00', ...

So this change is a (necessary?) band-aid at the moment, and ideally other logical types should be possible to load as well, likely into their declared primitive types (ie. here into string, without trying to interpret it, as it is done when mapping logical types to JSON types in AirByte:

AVRO_LOGICAL_TYPE_TO_JSON = {
"decimal": {"type": "string"},
"uuid": {"type": "string"},
"date": {"type": "string", "format": "date"},
"time-millis": {"type": "integer"},
"time-micros": {"type": "integer"},
"timestamp-millis": {"type": "string", "format": "date-time"},
"timestamp-micros": {"type": "string"},
"local-timestamp-millis": {"type": "string", "format": "date-time"},
"local-timestamp-micros": {"type": "string"},
# fastavro does not support duration https://fastavro.readthedocs.io/en/latest/logical_types.html
}

This is the first time I'm working with avro but my guess would be a logic like this:

  1. is there a logicalType declared?
  2. if yes, is the type there a known avro type (and/or known other logicalType? not sure if that is possible, but I guess it should)?
  3. if the type is known, use that type for the field and continue interpreting the schema

If there was a change, so actually all fastavro-loadable files could be loaded by AirByte, that would be the best outcome of this.

Is it possible that this above can happen for the CDK? Should I open a separate issue for it? (I guess yes.) Should I try to submit a MR? (I guess no on that one, given how your team does think about the right ways of fixing things up :)

@imrehg imrehg changed the title [source-s3]: correctly raise unsupported logical type errors when parsing avro [cdk]: correctly raise unsupported logical type errors when parsing avro Apr 9, 2024
@natikgadzhi
Copy link
Contributor

@alafanechere another one in a batch of PRs to CDK that I would love to get in.

@natikgadzhi
Copy link
Contributor

We're winning at CI now, @imrehg, setting this to automerge.

@natikgadzhi natikgadzhi temporarily deployed to community-ci-auto June 6, 2024 04:07 — with GitHub Actions Inactive
@natikgadzhi natikgadzhi merged commit d55995d into airbytehq:master Jun 6, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit community team/extensibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants