Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on invalid type/logicalType #368

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

stampy88
Copy link
Contributor

@stampy88 stampy88 commented Apr 4, 2024

  • Ran into a nasty issue where I had incorrectly used a time.Duration with a long Avro type and timestamp-millis. Hamba was encoding the entire int64 for this case instead of erroring on an invalid schema. When Spark/SQL tried to convert this into a microseconds (it wants everything as micros), it eventually tries to do this:
def millisToMicros(millis: Long): Long = {
    Math.multiplyExact(millis, MICROS_PER_MILLIS)
}

And Math.multiplyExact detects an overflow because it is trying to multiply the long by 1000, which doesn't fit. This fix will give the user an error if they use the logicalType incorrectly. The error will look something like avro: time.Duration is unsupported for Avro long and logicalType timestamp-micros.

- Ran into a nasty issue where I had incorrectly used a `time.Duration` with a `long` Avro type and `timestamp-millis`. Hamba was encoding the entire int64 for this case instead of erroring on an invalid schema. When Spark/SQL tried to [convert](https://github.com/apache/spark/blob/v3.5.0/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L142C38-L142C52) this into a microseconds (it wants everything as micros), it eventually tries to do this:
```
def millisToMicros(millis: Long): Long = {
    Math.multiplyExact(millis, MICROS_PER_MILLIS)
}
```
And [Math.multiplyExact](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Math.html#multiplyExact(long,int)) detects an overflow because it is trying to multiply the long by 1000, which doesn't fit.
This fix will give the user an error if they use the `logicalType` incorrectly. The error will look something like `avro: time.Duration is unsupported for Avro long and logicalType timestamp-micros`.
@stampy88 stampy88 force-pushed the invalid_logical_type_combination branch from 4c7f21e to bdc185a Compare April 4, 2024 19:41
@nrwiersma nrwiersma merged commit 4894c0b into hamba:main Apr 5, 2024
3 checks passed
@stampy88 stampy88 deleted the invalid_logical_type_combination branch April 6, 2024 15:57
@hhromic
Copy link
Contributor

hhromic commented Apr 9, 2024

Hey, I know this has been merged already but I had a question. Sorry to hijack this PR conversation!

I don't see why using a time.Duration is invalid for a long Avro type annotated with either a timestamp-millis or timestamp-micros logical type. While I do agree that it was incorrect for time.Duration to be encoded blindly as int64 (and thus nanos), I think there is no reason for hamba/avro to not do the correct conversion automatically.

  • If logical type is timestamp-millis then convert from Duration.Milliseconds().
  • If logical type is timestamp-micros then convert from Duration.Microseconds().

At least that is what I would expect the encoder to do because it has everything necessary to do the conversion automatically.

Is there any reason why this behaviour is not desired? I can contribute with it if you feel is a good idea.

EDIT: Ah, sorry, I just realized that time.Duration is not a timestamp to begin with, so is invalid no matter the logical type. Apologies for the noise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants