
AVRO-2307: [java] list of primitive #2389

Merged: 4 commits, Aug 17, 2023

Conversation

clesaec
Contributor

@clesaec clesaec commented Jul 24, 2023

What is the purpose of the change

AVRO-2307: sub-task 1.
As described in JIRA:
“Another challenge we’ve come across is that lists of primitive types (floats in our case) are always boxed into Object Floats by Avro. In our case, this resulted in millions of Objects / second getting allocated, causing pathological GC pressure.”

So this PR introduces list-of-primitives classes that reduce memory consumption when GenericData holds arrays of boolean, int, long, float, or double values.
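To illustrate the idea, here is a minimal, hypothetical sketch of a float-backed list in the spirit of the PR (this is not the actual PrimitivesArrays code; the class name and growth policy are illustrative). Elements live in a `float[]`, so no `Float` boxes are retained in storage, and a primitive `add(float)` overload avoids boxing on the write path entirely:

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical sketch: a List<Float> whose backing store is a float[],
// so elements are boxed only when read through the List API, never stored boxed.
class FloatArraySketch extends AbstractList<Float> {
  private float[] elements = new float[10];
  private int size = 0;

  @Override
  public Float get(int i) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException("index " + i + ", size " + size);
    }
    return elements[i]; // boxed on access only
  }

  // Primitive overload: no boxing at all on this path.
  public void add(float value) {
    if (size == elements.length) {
      elements = Arrays.copyOf(elements, size * 2);
    }
    elements[size++] = value;
  }

  @Override
  public boolean add(Float value) {
    add(value.floatValue()); // unbox once, then store as primitive
    return true;
  }

  @Override
  public int size() {
    return size;
  }
}
```

Callers that hold a reference of the concrete type can use the `add(float)` overload directly and never allocate a `Float` wrapper, which is the allocation-rate win described in the JIRA quote above.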

Verifying this change

A new unit test class, PrimitivesArraysTest, is added, and the rest of the code exercises the new classes via the GenericData.newArray method.

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@github-actions github-actions bot added the Java Pull Requests for Java binding label Jul 24, 2023

@github-advanced-security github-advanced-security bot left a comment


CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@clesaec clesaec merged commit 3735035 into apache:master Aug 17, 2023
RanbirK pushed a commit to RanbirK/avro that referenced this pull request May 13, 2024
@bechhansen

bechhansen commented Sep 6, 2024

Hi @clesaec

It looks like this change broke Avro deserialization of arrays of items using logicalTypes in Quarkus.

In our existing solution, we now get this error after bumping Quarkus to a version that uses Avro 1.12.0.

2024-09-06 14:14:44,674 ERROR [io.sma.rea.mes.kafka] (smallrye-kafka-consumer-thread-0) SRMSG18249: Unable to recover from the deserialization failure (topic: testobject), configure a DeserializationFailureHandler to recover from errors.:
java.lang.ClassCastException: class java.time.Instant cannot be cast to class java.lang.Long (java.time.Instant and java.lang.Long are in module java.base of loader 'bootstrap')
    at org.apache.avro.generic.PrimitivesArrays$LongArray.add(PrimitivesArrays.java:132)
    at java.base/java.util.AbstractList.add(AbstractList.java:113)
    at org.apache.avro.generic.GenericDatumReader.addToArray(GenericDatumReader.java:333)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:294)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:184)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:181)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:168)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
    at io.apicurio.registry.serde.avro.AvroKafkaDeserializer.readData(AvroKafkaDeserializer.java:117)
    at io.apicurio.registry.serde.AbstractKafkaDeserializer.readData(AbstractKafkaDeserializer.java:142)
    at io.apicurio.registry.serde.AbstractKafkaDeserializer.deserialize(AbstractKafkaDeserializer.java:122)
    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.lambda$deserialize$1(DeserializerWrapper.java:77)
    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.wrapDeserialize(DeserializerWrapper.java:109)
    at io.smallrye.reactive.messaging.kafka.fault.DeserializerWrapper.deserialize(DeserializerWrapper.java:77)
    at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:73)
    at org.apache.kafka.clients.consumer.internals.CompletedFetch.parseRecord(CompletedFetch.java:321)
    at org.apache.kafka.clients.consumer.internals.CompletedFetch.fetchRecords(CompletedFetch.java:283)
    at org.apache.kafka.clients.consumer.internals.FetchCollector.fetchRecords(FetchCollector.java:168)
    at org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:134)
    at org.apache.kafka.clients.consumer.internals.Fetcher.collectFetch(Fetcher.java:145)
    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.pollForFetches(LegacyKafkaConsumer.java:693)
    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:617)
    at org.apache.kafka.clients.consumer.internals.LegacyKafkaConsumer.poll(LegacyKafkaConsumer.java:590)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874)
    at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$poll$4(ReactiveKafkaConsumer.java:199)
    at io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
    at io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
    at io.smallrye.mutiny.operators.uni.UniOperatorProcessor.onItem(UniOperatorProcessor.java:47)
    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.forwardTo(UniMemoizeOp.java:123)
    at io.smallrye.mutiny.operators.uni.UniMemoizeOp.subscribe(UniMemoizeOp.java:67)
    at io.smallrye.mutiny.operators.AbstractUni.subscribe(AbstractUni.java:36)
    at io.smallrye.mutiny.operators.uni.UniRunSubscribeOn.lambda$subscribe$0(UniRunSubscribeOn.java:27)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
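The top of the trace shows the mechanism: a long-backed list's add path casts the incoming Object to Long, while the reader's logical-type conversion has produced a java.time.Instant. The sketch below reproduces that failure mode in isolation (it is not Avro's actual PrimitivesArrays code; the class name and layout are illustrative):

```java
import java.time.Instant;
import java.util.AbstractList;

// Illustrative reproduction of the regression's mechanism: a list backed by
// long[] whose add(int, Object) assumes every element is a Long, so a
// logical-type value such as Instant triggers a ClassCastException.
class LongOnlyList extends AbstractList<Object> {
  private long[] data = new long[4];
  private int size = 0;

  @Override
  public void add(int index, Object o) {
    // Fails with ClassCastException when o is an Instant (or any non-Long);
    // a sketch, so no element shifting or growth is implemented.
    data[index] = (Long) o;
    size++;
  }

  @Override
  public Object get(int i) {
    return data[i]; // boxed to Long on access
  }

  @Override
  public int size() {
    return size;
  }
}
```

Adding `42L` succeeds, but adding `Instant.now()` throws `ClassCastException`, matching the first frame of the trace above where a timestamp value reaches a long-primitive array.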

@clesaec
Contributor Author

clesaec commented Sep 6, 2024

@opwvhk, could you have a look at this? I'm on vacation and no longer participate much in the Apache project. (Nice to see 1.12.0 is available 🙂)

if (o == null) {
  return;
}
this.add(location, o.floatValue());


This introduces a loss of precision: it narrows the Double to a float and then widens it back to double. An example:

jshell> Double d = 40.001
d ==> 40.001
jshell> double d2 = d.floatValue()
d2 ==> 40.000999450683594

So, for example, if a field in the schema is an array of doubles and you use the Avro JSON parser to read a record with that schema from a JSON object where the value of that field is [40.001], the resulting object will contain an array with the element 40.000999450683594. This caused failures in unit tests that compared values with Precision.equals, because that difference is much larger than Precision's default epsilon.
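The same round trip can be shown in plain Java (the helper name is illustrative; the expected value is the one from the jshell session quoted above):

```java
// Demonstrates the double -> float -> double round trip described above:
// narrowing to float discards mantissa bits that widening cannot restore.
class DoubleRoundTrip {
  static double throughFloat(Double d) {
    // What an add(...) overload taking floatValue() effectively does
    // for a double-element schema.
    return (double) d.floatValue();
  }

  public static void main(String[] args) {
    Double original = 40.001;
    double roundTripped = throughFloat(original);
    // Matches the jshell output above: 40.000999450683594
    System.out.println(original + " -> " + roundTripped);
  }
}
```

The absolute error here is roughly 1e-6, far above the default epsilon a comparison utility like Precision.equals uses, which is why the affected unit tests failed.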

Contributor

It looks like Confluent Schema Registry may also be running into this regression.

5 participants